Strange behavior on proxy cache at high load spike

Strange behavior on proxy cache at high load spike

vergil
Hi,
this has been bugging me for some time now. I have nginx 1.16.0 configured
as a proxy cache as follows:

proxy_cache_path           /dev/shm/nginx_cache levels=1:2
                           keys_zone=proxy:1024m max_size=1024m inactive=60m;
proxy_temp_path            /dev/shm/nginx_proxy_tmp;
proxy_cache_use_stale      updating;
proxy_cache_lock           on;
proxy_cache_lock_timeout   30s;

Most of the time everything is fine and working as expected. One peculiarity
of the deployment setup is that there are expected spikes in requests (end
clients updating daily data) to a few locations. Response size varies between
1M and 1.5M non-gzipped. Log snippet from such a spike:

[2020-05-03T00:00:44] "GET /api/34/guide?date=2020-05-03 HTTP/1.0" 200
445984  cache: HIT request time: 50.211 sec
[2020-05-03T00:00:44] "GET /api/34/guide?date=2020-05-03 HTTP/1.0" 200
780472  cache: HIT request time: 52.891 sec
[2020-05-03T00:00:44] "GET /api/34/guide?date=2020-05-03 HTTP/1.0" 200
85432  cache: HIT request time: 33.284 sec
[2020-05-03T00:00:44] "GET /api/34/guide?date=2020-05-03 HTTP/1.0" 200
57920  cache: HIT request time: 34.957 sec
[2020-05-03T00:00:44] "GET /api/34/guide?date=2020-05-03 HTTP/1.0" 200
401096  cache: HIT request time: 49.991 sec
[2020-05-03T00:00:44] "GET /api/34/guide?date=2020-05-03 HTTP/1.0" 200
244712  cache: HIT request time: 48.412 sec
[2020-05-03T00:00:44] "GET /api/34/guide?date=2020-05-03 HTTP/1.0" 200
101360  cache: HIT request time: 34.955 sec
[2020-05-03T00:00:44] "GET /api/34/guide?date=2020-05-03 HTTP/1.0" 200
102808  cache: HIT request time: 34.753 sec
...
[2020-03-24T00:02:16] "GET /api/34/guide?date=2020-05-03 HTTP/1.0" 200
1526025  cache: HIT request time: 48.671 sec

Monitoring du on the cache location shows a max of 1.1G, e.g.:
1.1G    /dev/shm/nginx_cache
0       /dev/shm/nginx_proxy_tmp

After about 2 minutes the response 'stabilizes' at the correct size (in this
example 1526025 bytes). The problem is also amplified because clients
validate the response and progressively retry if it is corrupted.

There are no unusual lines in the error log or in the Linux (CentOS) system
messages, and there is no cache 'UPDATING' status, just HITs (I guess this
rules out an upstream server issue). Is it possible we have an issue with
reading cached entries from /dev/shm during peak times?

I would kindly ask for hints on where to start looking and debugging.
Big thanks in advance.

Posted at Nginx Forum: https://forum.nginx.org/read.php?2,287951,287951#msg-287951


Re: Strange behavior on proxy cache at high load spike

J.R.
> After about 2 minutes the response 'stabilizes' at the correct size (in
> this example 1526025 bytes). The problem is also amplified because clients
> validate the response and progressively retry if it is corrupted.

What is the response your upstream is sending back? If the 'corrupted'
data is still a 200, then nginx will cache that... You need to make
sure it's sending back a 5xx if it's overloaded or whatever error
would be relevant.

You might want to consider expanding your 'use_stale' like:

'proxy_cache_use_stale error timeout invalid_header http_500 http_502
http_503 http_504;'
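
For context, a minimal sketch of how that could sit alongside the lock
settings already in place (the location and upstream names here are
placeholders, not from the original post):

location /api/ {
    proxy_pass                http://backend;
    proxy_cache               proxy;
    proxy_cache_lock          on;
    proxy_cache_lock_timeout  30s;
    # serve a stale copy rather than hitting the upstream again when it
    # errors out or times out, in addition to the existing 'updating'
    proxy_cache_use_stale     error timeout invalid_header updating
                              http_500 http_502 http_503 http_504;
}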

Why are you wasting space duplicating the data in the same shm? Just set
'use_temp_path=off' in the proxy_cache_path directive and be done with it.
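
Based on the proxy_cache_path line from the original post, that would look
roughly like this (untested sketch):

proxy_cache_path  /dev/shm/nginx_cache  levels=1:2
                  keys_zone=proxy:1024m max_size=1024m inactive=60m
                  use_temp_path=off;

With use_temp_path=off the response is written directly into the cache
directory instead of being staged under proxy_temp_path first, so the
separate /dev/shm/nginx_proxy_tmp location is no longer needed.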

What is the valid cache time for the content (i.e. the headers)? If they
are missing or set to 'no-cache', then you are obviously going to have
issues...
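
If the upstream headers cannot easily be fixed, a fallback on the nginx
side could be to force a validity period for good responses, e.g. (sketch,
the 10m value is arbitrary):

proxy_cache_valid  200  10m;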