Nginx + Memcached: Questions / Advice

Nginx + Memcached: Questions / Advice

Neil Sheth
Hello,

I'm looking to add some increased caching to our setup, and was
interested in integrating memcached with nginx.  I just had a few
questions, looking for a little direction!

First, our current setup has an nginx front-end serving static content
(images, js, css, etc), with two backend servers running apache / php.
 Currently, we utilize memcached on our backend, storing some snippets
of html and caching some of our more expensive db queries.

First question - has anyone done a comparison between setting up the
memcached integration through nginx and just serving the pages out of
memcached on the backend?  That is, we already have to insert the
whole page into memcached on the backend.  So, either I serve out of
memcached (and avoid the overhead of the apache hit), or I just have
apache / php query memcached and return the page.
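
For reference, my understanding is that the nginx side of the first
option would look roughly like this (the "page:$uri" key scheme and the
addresses are placeholders - the key would have to match whatever our
PHP code uses when it stores the page):

    upstream backend_apache {
        server 10.0.0.1:80;   # our two apache/php boxes (addresses made up)
        server 10.0.0.2:80;
    }

    location / {
        set $memcached_key "page:$uri";
        memcached_pass 127.0.0.1:11211;
        default_type text/html;            # memcached stores no Content-Type
        error_page 404 502 504 = @apache;  # cache miss or error -> backend
    }

    location @apache {
        proxy_pass http://backend_apache;
    }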

The latter would be much easier to implement - I'm not sure what the
performance difference would be.

One reason it would be easier to implement the caching through our
backend: we only need to cache traffic from users who aren't logged
in.  We "could" do this through nginx if we set a cookie for logged-in
users, have nginx read that cookie, and bypass memcached if the cookie
is found.
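
Something like this is what I'm picturing for the cookie check (the
cookie name is made up, and I'd want to verify that proxy_pass inside
an if behaves sanely here):

    location / {
        set $memcached_key "page:$uri";

        # Logged-in users (who get this cookie at login) skip the cache.
        if ($http_cookie ~* "logged_in=") {
            proxy_pass http://backend_apache;
        }

        memcached_pass 127.0.0.1:11211;
        default_type text/html;
        error_page 404 502 504 = @apache;
    }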


If we have nginx serving up content from memcached - how is gzipping
handled?  Do we store it in the cache gzip'd?
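
From what I've read, the memcached module just returns the stored
bytes verbatim, so it seems we'd either store the pages uncompressed
and let nginx's gzip filter compress them on the way out, or store
them gzip'd and add the Content-Encoding header ourselves (which
breaks any client that doesn't accept gzip).  The first seems safer -
something like:

    location / {
        set $memcached_key "page:$uri";
        memcached_pass 127.0.0.1:11211;
        default_type text/html;   # the gzip filter keys off the content type
        gzip on;                  # compress the memcached response on the way out
        error_page 404 502 504 = @apache;
    }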

Another thing - we do a bit of A/B testing of our content.  So, to
fully track that, we'd need some percentage of sessions to bypass the
cache.  From nginx, that's a bit more tricky, as we don't have the
session information if things are served out of memcached.  So, I was
thinking, I could just route a certain percentage of requests that
have an external referrer back to our backend, and cookie those users
to also bypass the cache for the rest of the session.
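
Roughly what I'm picturing (all the names are made up, and sampling by
the client address is crude - newer nginx versions apparently have
split_clients for a proper split):

    location / {
        set $memcached_key "page:$uri";
        set $ab "";

        # Referrer is external (or absent)?
        if ($http_referer !~* "ourdomain\.com") { set $ab "R"; }

        # Crude ~10% sample: client address ends in 0.
        if ($remote_addr ~ "0$") { set $ab "${ab}0"; }

        # Already sampled earlier in this session?
        if ($http_cookie ~* "ab_bypass=") { set $ab "R0"; }

        if ($ab = "R0") {
            # The backend sets the ab_bypass cookie so the rest of the
            # session also skips the cache.
            proxy_pass http://backend_apache;
        }

        memcached_pass 127.0.0.1:11211;
        default_type text/html;
        error_page 404 502 504 = @apache;
    }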

Looking at the memcached module documentation - how do you specify
multiple memcached servers?  It appears that it would treat them as
mirrors, not as a distributed cache?
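
From what I can tell, pointing memcached_pass at a stock round-robin
upstream would spread requests, not keys, across the servers, which
would wreck the hit rate.  The third-party upstream_hash module (and,
in much newer nginx, a built-in hash directive) distributes by key
instead - though the hash would also have to match how our PHP client
distributes its writes, or gets and sets would land on different
servers:

    upstream memcached_pool {
        hash $memcached_key;     # requires the upstream hash module
        server 10.0.1.1:11211;
        server 10.0.1.2:11211;
    }

    location / {
        set $memcached_key "page:$uri";
        memcached_pass memcached_pool;
        default_type text/html;
        error_page 404 502 504 = @apache;
    }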

I think that's it.  In any case, the main thing is, would the
increased performance outweigh the additional complexity, if anyone's
examined that in more detail (serving the cached pages via apache vs
nginx directly)?  Anything else I should be aware of?

Thanks!

Re: Nginx + Memcached: Questions / Advice

Merlin-2
Neil,

It sounds like you are looking for agreement on your rationale.  If so, then yes, it seems generally sound.  As you are no doubt aware, nginx serving from a memcached backend directly will certainly be much faster than serving from memcached via PHP and apache and THEN to nginx (in answer to your question).  However, as you figured out, the memcached module is not currently flexible enough to accommodate your other needs by itself (maybe it doesn't need to, either), so there is something to be said for the flexibility you get by choosing the key on the backend.

Personally, I would recommend keeping it flexible: use memcached on the front-end for the general case, as that is the most efficient, but make it simple to switch back to the backend during A/B testing.  If you didn't want to maintain separate configs that you include via a symlink, you could probably implement this much the same way people have done maintenance pages, by checking for the existence of a file - but of course that is an extra check on each request, so it will impact performance, and you might as well have stuck with just the backend.
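
For illustration, the file-check variant I have in mind would be along
these lines (the file path is arbitrary, and the memcached details
match Neil's setup only by assumption):

    location / {
        set $memcached_key "page:$uri";

        # Touch this file to send everything to the backend (e.g. while
        # an A/B test runs); remove it to serve from memcached again.
        # This costs a stat() on every request.
        if (-f /var/run/nginx_bypass_cache) {
            proxy_pass http://backend_apache;
        }

        memcached_pass 127.0.0.1:11211;
        default_type text/html;
        error_page 404 502 504 = @apache;
    }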

With regard to using multiple upstream servers: from what I can tell from the wiki documentation (http://wiki.nginx.org/NginxHttpMemcachedModule), you can use multiple backends by pointing memcached_pass at a backend defined in an upstream block, and then specifying with memcached_next_upstream which events will cause the next upstream to be queried.
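
In config terms, I read that as something like this (untested):

    upstream memcached_cluster {
        server 10.0.1.1:11211;
        server 10.0.1.2:11211;
    }

    location / {
        set $memcached_key "page:$uri";
        memcached_pass memcached_cluster;

        # Events that make nginx try the next server in the upstream;
        # not_found (a plain cache miss) can be added or left out.
        memcached_next_upstream error timeout;

        error_page 404 502 504 = @apache;
    }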

This would lead me to believe that it always uses the same upstream until a failure, then moves on to the next one if you have defined cases for that.  I might have a chance to look through the code later, or simply try it, but I can't promise.  Please let us know if you find out!

Thanks,
Merlin


Re: Nginx + Memcached: Questions / Advice

Merlin-2
One note on rereading my message: I was not trying to say that the stat() from the file-existence check would slow things down so much that it becomes as "not fast" as going through memcached->PHP->apache (though depending on things, it sometimes might), but rather that either you care about performance or you care about flexibility, and you should maximize one or the other, not necessarily both.  In the scheme of things, neither delay will likely matter or be noticeable to anyone, even with both.  It was simply a matter of simplicity ;).

- Merlin
