Avoiding Nginx restart when rsyncing cache across machines


Avoiding Nginx restart when rsyncing cache across machines

Quintin Par

I run a mini CDN for a static site by having Nginx cache machines (in different locations) in front of the origin, load balanced by Cloudflare.

Periodically I run an rsync pull to update the cache on each of these machines. This works well, except that I've realized I need to restart Nginx, because a reload doesn't update the cache in memory.
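
For concreteness, the pull is roughly along these lines (the hostname and paths are illustrative, not my actual setup):

    # mirror the cache directory from a reference cache machine
    rsync -a --delete cache-master.example.com:/var/cache/nginx/ /var/cache/nginx/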

 

I really want to avoid the restart. Is this possible? Or maybe I am doing something wrong here.


- Quintin


Re: Avoiding Nginx restart when rsyncing cache across machines

Maxim Dounin
Hello!

On Tue, Sep 11, 2018 at 04:45:42PM -0700, Quintin Par wrote:

> I run a mini CDN for a static site by having Nginx cache machines (in
> different locations) in front of the origin, load balanced by Cloudflare.
>
> Periodically I run an rsync pull to update the cache on each of these
> machines. This works well, except that I've realized I need to restart
> Nginx, because a reload doesn't update the cache in memory.
>
> I really want to avoid the restart. Is this possible? Or maybe I am
> doing something wrong here.

You are not expected to modify cache contents yourself.  Doing so
will likely cause various troubles - including nginx not using new
files placed into the cache after it was loaded from disk, not
maintaining the configured cache max_size, and so on.

If you want to control cache contents yourself by syncing data
across machines, you may have better luck using proxy_store
and normal files instead.
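
A minimal sketch of such a setup (hostnames and paths are only
placeholders):

    # serve the local copy if present; otherwise fetch it from
    # the origin and store the response as a plain file on disk
    location / {
        root      /data/mirror;
        try_files $uri @fetch;
    }

    location @fetch {
        proxy_pass         http://origin.example.com;
        proxy_store        on;
        proxy_store_access user:rw group:rw all:r;
        root               /data/mirror;
    }

Files stored this way are ordinary files on disk, so syncing them
between machines needs no nginx restart or reload.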

--
Maxim Dounin
http://mdounin.ru/

Re: Avoiding Nginx restart when rsyncing cache across machines

Quintin Par

Hi Maxim,

 

Thank you for this. It opened my eyes.

 

Not to sound demanding, but do you have any examples (code) of proxy_store being used as a CDN? What's most important to me is the initial cache warming: I should be able to start a new machine with 30 GB of cache vs. a cold start.

 

Thanks once again.

 


- Quintin



Re: Avoiding Nginx restart when rsyncing cache across machines

Lucas Rolff-2
Can I ask why you need to start with a warm cache directly? Sure, it will lower the requests to the origin, but you could implement a secondary caching layer (using nginx) if you wanted to. You'd have your primary cache in, let's say, 10 locations spread across 3 continents (US, EU, Asia), and then a second layer consisting of a smaller number of locations (1 instance per continent). This way you'll warm up faster when you add new servers, and it won't really affect your origin server. A rough sketch of the edge side follows below.
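
On the edge servers, that could look roughly like this (hostnames, zone names and sizes are made up for illustration):

    # edge: cache locally; misses go to the continent's mid-tier
    # cache rather than straight to the origin
    proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=edge:100m
                     max_size=30g inactive=60d;

    server {
        listen 80;

        location / {
            proxy_cache       edge;
            proxy_cache_valid 200 30d;
            # the mid-tier runs the same config, except that its
            # proxy_pass points at the real origin
            proxy_pass        http://mid-tier-eu.example.com;
        }
    }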

It's also a lot cleaner, because you're able to use proxy_cache, which is really what (in my opinion) you should use when you're building caching proxies.

Generally I'd just slowly warm up new servers prior to putting them into production: get a list of the top X files accessed, and loop over them to pull them in as normal HTTP requests (sketched below).
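
A rough version of that warm-up, assuming a combined-format access log on an existing server (the log path and hostname are assumptions):

    # fetch the 1000 most requested URLs through the new edge
    awk '{ print $7 }' /var/log/nginx/access.log \
        | sort | uniq -c | sort -rn | head -n 1000 \
        | awk '{ print $2 }' \
        | while read -r uri; do
              curl -s -o /dev/null "http://new-edge.example.com$uri"
          done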

There are plenty of decent solutions (some more complex than others), but there should really never be a reason to have to sync your cache across machines - even for new servers.


Re: Avoiding Nginx restart when rsyncing cache across machines

Quintin Par

Hi Lucas,

 

The cache is pretty big and I want to limit unnecessary requests if I can. Cloudflare is in front of my machines and I pay for load balancing, firewall, and Argo, among others. So there is a cost per request.

 

Admittedly I have a not-so-complex cache architecture, i.e. all cache machines in front of the origin, and it has worked so far. This is also because I am not that great a programmer/admin :-)

 

My optimization is not primarily around hits to the origin, but rather bandwidth and number of requests.

 


- Quintin



Re: Avoiding Nginx restart when rsyncing cache across machines

Peter Booth via nginx
Quintin,

Are most of your requests for dynamic or static content?
Are the requests clustered such that there are a lot of requests for a few (between 5 and 200, say) URLs?
If three different people make the same request, do they get personalized or identical content returned?
How long are the cached resources valid for?

I have seen layered caches deliver enormous benefit, both in terms of performance and in ensuring availability - which is usually synonymous with "protecting the backend." That protection was most useful when, for example, I was working on a site that would get mentioned in a TV show at a known time of day every week. nginx proxy_cache was invaluable in helping the site stay up and responsive when hit with enormous spikes of requests.

This is nuanced, subtle stuff though.

Is your site something that you can disclose publicly?


Peter




Re: Avoiding Nginx restart when rsyncing cache across machines

Quintin Par

Hi Peter,

 

Here are my stats for this week: https://imgur.com/a/JloZ37h . The "Bypass" share is only because I was experimenting with some cache-warmer scripts. This is primarily a static website.

Here’s my URL hit distribution: https://imgur.com/a/DRJUjPc

If three people are making the same request, they get identical content. No personalization. The pages are cached for 200 days, with inactive in proxy_cache_path set to 60 days - roughly the setup sketched below.
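
In config terms that is approximately the following (the zone name, sizes and paths here are placeholders rather than the real values):

    # entries stay fresh for 200 days, but are evicted from disk
    # if not accessed for 60 days
    proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=cdn:100m
                     max_size=30g inactive=60d;

    location / {
        proxy_cache       cdn;
        proxy_cache_valid 200 200d;
        proxy_pass        http://origin.example.com;
    }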

 

This is embarrassing, but my CDN is primarily $5 DigitalOcean machines across the web with this Nginx cache setup. The server response time averages 0.29 seconds; prior to my ghetto CDNing it was 0.98 seconds. I am pretty proud that I have survived several Slashdot effects on the $5 machines serving cached content, peaking at 2,500 requests/second, without any issues.

 

Since this is working well, I don't want to do any layered caching unless there is a compelling reason.


- Quintin



Re: Avoiding Nginx restart when rsyncing cache across machines

Lucas Rolff-2
> The cache is pretty big and I want to limit unnecessary requests if I can.

30 GB of cache and ~400k hits isn't a lot.

> Cloudflare is in front of my machines and I pay for load balancing, firewall, and Argo, among others. So there is a cost per request.

It doesn't matter that you pay for load balancing, firewall, Argo etc. - implementing a secondary caching layer won't increase your costs on the CloudFlare side of things, because you're not communicating via CloudFlare but rather directly between machines - you'd connect your X locations to a smaller set of locations, with traffic going directly between your DigitalOcean instances - so no CloudFlare costs are involved.

Communication between your CDN servers and your origin server also (IMO) shouldn't go via any CloudFlare-related products, so additional hits on the origin will be "free" at the expense of a bit higher load - however, since only a subset of locations would request from the origin, and those then serve as the origin for your other servers, you're effectively decreasing the origin traffic.

You should easily be able to get a 97-99% offload of your origin (in my own setup it's at 99.95% at this point), even without using a secondary layer, and performance can be improved further by using features such as:

http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_background_update

http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_use_stale

Nginx is smart enough to do a sub-request in the background to check whether the origin resource has been updated (using Last-Modified or ETags, for example) - this way the origin communication stays small anyway. The relevant directives are sketched below.
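
Roughly like this (the zone name and upstream are placeholders):

    location / {
        proxy_cache                   cdn;
        proxy_cache_valid             200 10m;
        # serve the cached (possibly stale) copy immediately and
        # refresh it with a background sub-request
        proxy_cache_background_update on;
        proxy_cache_use_stale         updating error timeout;
        # revalidate with If-Modified-Since / If-None-Match
        proxy_cache_revalidate        on;
        proxy_pass                    http://origin.example.com;
    }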

The only load balancer / Argo / firewall costs you should have are for the "CDN server -> end user" traffic, and that won't increase or decrease whether you do a normal proxy_cache setup or a setup with a secondary cache layer.

You also won’t increase costs by doing a warmup of your CDN servers – you could do something as simple as:

curl -o /dev/null -k -I --resolve cdn.yourdomain.com:443:127.0.0.1 https://cdn.yourdomain.com/img/logo.png

(Note the --resolve port has to match the URL's port - 443 for https - or curl will ignore the entry and resolve via DNS.)

You could do the same with python or another language if you’re feeling more comfortable there.

However, using a method like the above keeps your warmup "local": since you're resolving cdn.yourdomain.com to localhost, requests that are not yet cached will use whatever is configured in your proxy_pass in the nginx config.
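
Looped over a list of URLs (the file name is just an example):

    # warm every URL in urls.txt through the local nginx, while
    # keeping the real CDN hostname for Host/SNI
    while read -r url; do
        curl -s -o /dev/null -k --resolve cdn.yourdomain.com:443:127.0.0.1 "$url"
    done < urls.txt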

> Admittedly I have a not-so-complex cache architecture, i.e. all cache machines in front of the origin, and it has worked so far

I would say it's complex if you have to sync your content - many pull-based CDNs simply do a normal proxy_cache + proxy_pass setup, without syncing content, and then use some of the nifty features (such as proxy_cache_background_update and the updating parameter of proxy_cache_use_stale) to decrease the origin traffic, possibly implementing a secondary layer if they're still doing a lot of origin traffic (e.g. because of having a lot of "edge servers"). With something like 10 servers, I wouldn't even consider a secondary layer unless your origin is under heavy load and can't handle 10 possible clients (the CDN servers).

Best Regards,
Lucas Rolff



Re: Avoiding Nginx restart when rsyncing cache across machines

Maxim Dounin
Hello!

On Wed, Sep 12, 2018 at 12:41:15PM -0700, Quintin Par wrote:

> Not to sound demanding, but do you have any examples (code) of
> proxy_store being used as a CDN? What's most important to me is the
> initial cache warming: I should be able to start a new machine with
> 30 GB of cache vs. a cold start.

Simple examples of using proxy_store can be found in the
documentation, see here:

http://nginx.org/r/proxy_store

It usually works well when you need to mirror static files which
never change.  Note though that if you need to implement cache
expiration, or need to preserve custom response headers, this
might be a challenge.
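
A common workaround for expiration is an external cleanup job, for example (the path and age are illustrative, and this assumes the filesystem records access times):

    # remove mirrored files not accessed for 60 days; nginx will
    # re-fetch them from the origin on the next request
    find /data/mirror -type f -atime +60 -delete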

--
Maxim Dounin
http://mdounin.ru/

Re: Avoiding Nginx restart when rsyncing cache across machines

Quintin Par

Hi Lucas,

 

Thank you for this - gems all over. I didn't know curl had --resolve.

 

This is a more generic question: how does one ensure cache consistency on all edges? Do people resort to a combination of expiry + background update + stale responding? What if one edge and the origin were updated to the latest and I now want all the other 1,000 edges updated within a minute, but the content expiry is 100 days?


- Quintin



Re: Avoiding Nginx restart when rsyncing cache across machines

Lucas Rolff-2
> How does one ensure cache consistency on all edges?

I wouldn't - you can never really rely on anything being cached consistently; there will always be stuff that doesn't follow the standards and thus can give an inconsistent state for one or more users.

What I'd do would simply be to purge the files whenever needed (and possibly warm them up if you want them to be "hot" when visitors arrive); sure, the first 1-2 visitors in each location might get a slightly slower request, but that's about it. One way to expose a purge endpoint is sketched below.
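
The stock proxy_cache_purge directive is part of the commercial subscription, but the third-party ngx_cache_purge module gives you roughly this (the zone name is a placeholder, and the key must match your proxy_cache_key):

    # assumes: proxy_cache_key "$scheme$host$request_uri";
    location ~ /purge(/.*) {
        allow 127.0.0.1;   # restrict who may purge
        deny  all;
        proxy_cache_purge cdn "$scheme$host$1";
    }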

Alternatively you could just set a very low Cache-Control; when you're using proxy_cache_background_update and proxy_cache_use_stale updating, nginx will ask the origin server if the file has changed - so if it hasn't, you'll simply get a 304 from the origin (if the origin supports it). You'll do more requests to the origin, but the traffic will be minimal because it just returns 304 Not Modified (plus some headers).

Best Regards,
Lucas Rolff



Re: Avoiding Nginx restart when rsyncing cache across machines

nginx mailing list
One more approach is to never change the contents of a resource without also changing its name. One example would be the cache_key feature in Rails, where resources have a path based on some ID and their updated_at value. Whenever you modify a resource, its URL changes, so the old cached copy simply stops being requested (a sketch of how the edges can exploit this is below).
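
With names like that, the edges can cache such resources essentially forever; roughly (the URL pattern and names are just an example):

    # fingerprinted assets (e.g. /assets/app-5f3a9c.css) never
    # change under the same name, so cache them aggressively
    location ~* ^/assets/.+-[0-9a-f]+\.(css|js|png|jpg)$ {
        proxy_cache       cdn;
        proxy_cache_valid 200 365d;
        add_header        Cache-Control "public, max-age=31536000, immutable";
        proxy_pass        http://origin.example.com;
    }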

Sent from my iPhone


Re: Avoiding Nginx restart when rsyncing cache across machines

wld75
It is fairly simple to hack nginx and use Lua to reload the cache on a timer or via a request. The code is already there; it's just a matter of calling it again.

Posted at Nginx Forum: https://forum.nginx.org/read.php?2,281179,281225#msg-281225


Question

Saint Michael
I am a new developer and need to publish several database tables with one-to-many relationships, etc. What web framework is fastest to learn? I am looking at Mojolicious or Catalyst, but don't know if they are necessary. For a new project, what parts would you choose? I have read that Nginx is the best application server; I don't know if I need anything else.
