Nginx proxy cache purge process does not clean up items fast enough for new elements


Nginx proxy cache purge process does not clean up items fast enough for new elements

j94305
Hi,

We have an nginx instance fronting our object storage, caching large
objects. Objects are as large as 100GB. The nginx cache max_size is set to
about 3.5TB.

When there is a surge of requests for large objects and the disk quickly
fills up, nginx runs into an out-of-disk-space error. I expected the cache
manager to evict items on an LRU basis and make room for the new elements,
but that does not happen.

I can reproduce the problem with a simple test case:

Config:

proxy_cache_path /tmp/cache levels=1:2 keys_zone=cache_one:256m inactive=2d
max_size=16G use_temp_path=off;

Test:

1. Run a request to download a 15GB file; it is served correctly and
   stored in the cache.
2. Run a second request to download a different 10GB file; it fails with
   errors like these:

2019/10/04 11:49:08 [crit] 20206#20206: *21 pwritev()
"/tmp/cache/9/fa/a301d42ca6e5d4188c38ecf56aa3afa9.0000000001" has written
only 221184 of 229376 while reading upstream, client: 127.0.0.1, server:
eos_cache_filer, request: "GET...
2019/10/04 12:07:29 [crit] 21201#21201: *487 pwrite()
"/tmp/cache/9/fa/a301d42ca6e5d4188c38ecf56aa3afa9.0000000002" failed (28: No
space left on device) while reading upstream, client: 127.0.0.1, server:
eos_cache_filer, request:

Can I tune some cache manager parameters to make this work? Is there a way
to disable buffering in such a case? Ideally the download should not fail;
nginx should just disable caching and buffering.

Thanks
Sachin


Re: Nginx proxy cache purge process does not clean up items fast enough for new elements

Maxim Dounin
Hello!

On Wed, Oct 16, 2019 at 05:24:01AM -0400, [hidden email] wrote:

> We have an nginx instance fronting our object storage, caching large
> objects. Objects are as large as 100GB. The nginx cache max_size is set to
> about 3.5TB.
>
> When there is a surge of requests for large objects and the disk quickly
> fills up, nginx runs into an out-of-disk-space error. I expected the cache
> manager to evict items on an LRU basis and make room for the new elements,
> but that does not happen.
>
> [...]
> Can I tune some cache manager parameters to make this work? Is there a way
> to disable buffering in such a case? Ideally the download should not fail;
> nginx should just disable caching and buffering.

The cache manager works in parallel with the worker processes that fill up
the cache.  Further, with "max_size=" it only starts cleaning things once
the max_size limit is reached.  Hence it is possible for the total size of
the cache to grow larger than the configured max_size.

It is recommended to keep max_size smaller than the disk space actually
available, and to keep the difference large enough for at least 10 seconds
of cache filling (10 seconds is how long the cache manager sleeps when it
has nothing to do), preferably more.

In particular, the difference is expected to be larger than the maximum
size of a single cache item; otherwise adding one cache item can fail even
though the max_size limit has not yet been reached.  This is probably what
happens in your case.

Note well that temporary files, whether or not you use "use_temp_path=off",
are not counted towards the cache size.  You have to reserve some space for
temporary files as well.
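
For example, a sizing sketch (the path, volume size, and fill rate are
illustrative assumptions, not taken from the setup above): at a fill rate
of about 1 GB/s, 10 seconds of filling is roughly 10 GB, and with 100GB
objects the headroom must also cover at least one full object plus its
temporary files:

# Illustrative sketch: on a hypothetical 4TB cache volume, cap the cache
# well below the volume size, leaving headroom for the largest single
# object (~100GB) plus in-flight temporary files.
proxy_cache_path /data/cache levels=1:2 keys_zone=cache_one:256m
                 inactive=2d max_size=3300g use_temp_path=off;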

--
Maxim Dounin
http://mdounin.ru/

Re: Nginx proxy cache purge process does not clean up items fast enough for new elements

j94305
Thank you, Maxim. Is there any way I can make the cache manager a bit more
aggressive in pruning and purging? We already leave 20% of the space free
on the disks, but the concurrent request rate for large files can be huge,
and we still run into this issue.

What are your thoughts on disabling buffering in such cases? This is not a
fatal error, so nginx could stop buffering, switch to streaming mode, and
let the request succeed with an error line in error.log.

Thanks again for your help on this.


Re: Nginx proxy cache purge process does not clean up items fast enough for new elements

Hung Nguyen-2
Hello,

I don't think the way you currently use nginx as a caching proxy is best
practice.

Serving large files and storing each whole file in the cache under a large
number of requests is like burning your disk: even if the nginx cache
manager could delete and refill the cache fast enough, it would keep
writing and deleting files endlessly.

Another option would be to use the nginx slice module and serve large files
via range requests. The nginx cache would then work far better.
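
A sketch of that approach, modeled on the slice example in the nginx
documentation (the upstream name "storage_backend" and the 10m slice size
are placeholders):

# Sketch only: cache fixed-size slices instead of whole 100GB objects, so
# each cache item stays small and individually evictable.
location / {
    slice              10m;
    proxy_cache        cache_one;
    proxy_cache_key    $uri$is_args$args$slice_range;
    proxy_set_header   Range $slice_range;
    proxy_cache_valid  200 206 1h;
    proxy_pass         http://storage_backend;
}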

--
Hưng

> On Oct 16, 2019, at 20:44, [hidden email] <[hidden email]> wrote:
>
> [...]

Re: Nginx proxy cache purge process does not clean up items fast enough for new elements

Maxim Dounin
In reply to this post by j94305
Hello!

On Wed, Oct 16, 2019 at 09:44:08AM -0400, [hidden email] wrote:

> Thank you, Maxim. Is there any way I can make the cache manager a bit more
> aggressive in pruning and purging? We already leave 20% of the space free
> on the disks, but the concurrent request rate for large files can be huge,
> and we still run into this issue.

Most likely you are hitting space limits due to temporary files, which
aren't managed by the cache manager at all.  My recommendation would be to
consider proxy_cache_lock and/or "proxy_cache_use_stale updating" to reduce
the number of concurrent requests trying to cache the same files; see here:

http://nginx.org/r/proxy_cache_lock
http://nginx.org/r/proxy_cache_use_stale

You may also want to tune things like proxy_cache_min_uses to reduce
unneeded caching of sporadic requests.
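
Taken together, a sketch of those suggestions (the values are illustrative
guesses, not recommendations from this thread):

# Sketch only; tune the numbers to your traffic.
proxy_cache           cache_one;
# Let only one request at a time populate a given cache element.
proxy_cache_lock      on;
# Serve a stale copy while the element is being updated.
proxy_cache_use_stale updating;
# Skip caching for objects requested fewer than 2 times.
proxy_cache_min_uses  2;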

> What are your thoughts on disabling buffering in such cases? This is not a
> fatal error, so nginx could stop buffering, switch to streaming mode, and
> let the request succeed with an error line in error.log.

Yes, in theory.  But the question is how complex and error-prone the
resulting code would be.  Hence the current choice is to treat such errors
as fatal.

--
Maxim Dounin
http://mdounin.ru/

Re: Nginx proxy cache purge process does not clean up items fast enough for new elements

j94305
Thank you, we use proxy_cache_lock as well, but in certain weird burst
scenarios it still ends up filling the disk.


Re: Nginx proxy cache purge process does not clean up items fast enough for new elements

Maxim Dounin
Hello!

On Thu, Oct 17, 2019 at 08:35:58AM -0400, [hidden email] wrote:

> Thank you, we use proxy_cache_lock as well, but in certain weird burst
> scenarios it still ends up filling the disk.

There are two timeouts for proxy_cache_lock to tune,
proxy_cache_lock_age and proxy_cache_lock_timeout:

http://nginx.org/r/proxy_cache_lock_age
http://nginx.org/r/proxy_cache_lock_timeout

The defaults (5 seconds each) are fairly small and certainly aren't
optimal for 100-gigabyte files.
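
For example (a sketch; the values are placeholders that would have to be
tuned to actual transfer times for 100GB objects):

# Sketch: raise the lock age so a slow 100GB fill is not requested again
# from upstream, and the lock timeout so waiting requests aren't passed
# to upstream uncached too early.
proxy_cache_lock         on;
proxy_cache_lock_age     30m;
proxy_cache_lock_timeout 30m;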

--
Maxim Dounin
http://mdounin.ru/