100% CPU use in ngx_http_finalize_connection

100% CPU use in ngx_http_finalize_connection

Richard Stanway
Hello,
I recently moved our site to a new server running Linux 4.9, Debian
8.7 (64-bit) with nginx 1.11.11 from the nginx repository. Our config
is straightforward: epoll, a few proxy and FastCGI backends, a handful
of vhosts (some with HTTP/2), and the geoip module loaded. No AIO, no
threads, no timer_resolution.

After some time, nginx worker processes get stuck at 100% CPU use in
what seems to be ngx_http_finalize_connection. New requests hitting
the worker are completely stalled. Eventually all nginx workers become
stuck and the sites become unreachable.

I'm running older versions of nginx on the same versions of Debian and
Linux at other sites without a problem, but the server giving me
trouble also receives much more traffic than the others. Because of
that traffic, the debug log grows incredibly large, which makes it
difficult to isolate the error. I've posted a one-second excerpt of
the core debug log, captured while some of the workers were at 100%,
at http://pastebin.com/hqzGzjTV; however, I'm not sure it contains
enough information. I'll look into enabling HTTP-level logging if
necessary.

Has anyone experienced anything similar to this or have any ideas
where to start looking to debug this?

Thanks.

nginx version: nginx/1.11.11
built by gcc 4.9.2 (Debian 4.9.2-10)
built with OpenSSL 1.0.1t  3 May 2016
TLS SNI support enabled
configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx
--modules-path=/usr/lib/nginx/modules
--conf-path=/etc/nginx/nginx.conf
--error-log-path=/var/log/nginx/error.log
--http-log-path=/var/log/nginx/access.log
--pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock
--http-client-body-temp-path=/var/cache/nginx/client_temp
--http-proxy-temp-path=/var/cache/nginx/proxy_temp
--http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp
--http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp
--http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx
--group=nginx --with-compat --with-file-aio --with-threads
--with-http_addition_module --with-http_auth_request_module
--with-http_dav_module --with-http_flv_module
--with-http_gunzip_module --with-http_gzip_static_module
--with-http_mp4_module --with-http_random_index_module
--with-http_realip_module --with-http_secure_link_module
--with-http_slice_module --with-http_ssl_module
--with-http_stub_status_module --with-http_sub_module
--with-http_v2_module --with-mail --with-mail_ssl_module --with-stream
--with-stream_realip_module --with-stream_ssl_module
--with-stream_ssl_preread_module --with-cc-opt='-g -O2
-fstack-protector-strong -Wformat -Werror=format-security
-Wp,-D_FORTIFY_SOURCE=2 -fPIC' --with-ld-opt='-Wl,-z,relro -Wl,-z,now
-Wl,--as-needed -pie'


#0  0x000055d533ab87e8 in ngx_pfree (pool=0x55d536202fe0,
p=0x55d5361636c0) at src/core/ngx_palloc.c:282
#1  0x000055d533af54d9 in ngx_http_set_keepalive (r=<optimized out>)
at src/http/ngx_http_request.c:3000
#2  ngx_http_finalize_connection (r=<optimized out>) at
src/http/ngx_http_request.c:2556
#3  0x000055d533af0d8b in ngx_http_core_content_phase
(r=0x55d536136f10, ph=0x55d537cbf210) at
src/http/ngx_http_core_module.c:1391
#4  0x000055d533aeb29d in ngx_http_core_run_phases
(r=r@entry=0x55d536136f10) at src/http/ngx_http_core_module.c:860
#5  0x000055d533aeb392 in ngx_http_handler (r=r@entry=0x55d536136f10)
at src/http/ngx_http_core_module.c:843
#6  0x000055d533af669e in ngx_http_process_request (r=0x55d536136f10)
at src/http/ngx_http_request.c:1921
#7  0x000055d533adeda4 in ngx_epoll_process_events (cycle=<optimized
out>, timer=<optimized out>, flags=<optimized out>) at
src/event/modules/ngx_epoll_module.c:902
#8  0x000055d533ad5caa in ngx_process_events_and_timers
(cycle=cycle@entry=0x55d5357ba110) at src/event/ngx_event.c:242
#9  0x000055d533adcc31 in ngx_worker_process_cycle
(cycle=cycle@entry=0x55d5357ba110, data=data@entry=0x12) at
src/os/unix/ngx_process_cycle.c:749
#10 0x000055d533adb583 in ngx_spawn_process
(cycle=cycle@entry=0x55d5357ba110, proc=proc@entry=0x55d533adcbb0
<ngx_worker_process_cycle>, data=data@entry=0x12,
name=name@entry=0x55d533b71db0 "worker process",
respawn=respawn@entry=-4) at src/os/unix/ngx_process.c:198
#11 0x000055d533adce50 in ngx_start_worker_processes
(cycle=0x55d5357ba110, n=24, type=-4) at
src/os/unix/ngx_process_cycle.c:358
#12 0x000055d533addae7 in ngx_master_process_cycle
(cycle=0x55d5357ba110) at src/os/unix/ngx_process_cycle.c:243
#13 0x000055d533ab5e56 in main (argc=<optimized out>, argv=<optimized
out>) at src/core/nginx.c:375
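
For context, frame #0 is inside ngx_pfree, which only scans the pool's
list of large allocations. From memory, in 1.11.x it looks roughly like
this (a sketch of src/core/ngx_palloc.c, not a verbatim excerpt):

ngx_int_t
ngx_pfree(ngx_pool_t *pool, void *p)
{
    ngx_pool_large_t  *l;

    /* walk the pool's large-allocation list looking for p */
    for (l = pool->large; l; l = l->next) {
        if (p == l->alloc) {
            ngx_log_debug1(NGX_LOG_DEBUG_ALLOC, pool->log, 0,
                           "free: %p", l->alloc);
            ngx_free(l->alloc);
            l->alloc = NULL;

            return NGX_OK;
        }
    }

    /* p was not found among the large allocations */
    return NGX_DECLINED;
}

Unless pool->large itself were corrupted, this loop terminates on its
own, so the spin is more likely in the caller, ngx_http_set_keepalive
(frame #1), which keeps calling ngx_pfree.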

Re: 100% CPU use in ngx_http_finalize_connection

Richard Stanway
I caught another loop, this time using nginx-debug with the source
available. It seems to be stuck in a loop calling ngx_pfree on
something that has already been freed. I don't understand the source
well enough to know exactly what's going on, but the parameters to
ngx_pfree are the same every time and the code keeps looping over this
section.

(gdb) frame 0
#0  0x0000555e0d19cfce in ngx_http_set_keepalive (r=0x555e0f52c1b0) at
src/http/ngx_http_request.c:2987
2987                ngx_free_chain(c->pool, ln);
(gdb) list
2982        if (hc->free) {
2983            for (cl = hc->free; cl; /* void */) {
2984                ln = cl;
2985                cl = cl->next;
2986                ngx_pfree(c->pool, ln->buf->start);
2987                ngx_free_chain(c->pool, ln);
2988            }
2989
2990            hc->free = NULL;
2991        }
(gdb) s
2983            for (cl = hc->free; cl; /* void */) {
(gdb)
2986                ngx_pfree(c->pool, ln->buf->start);
(gdb)
2985                cl = cl->next;
(gdb)
2986                ngx_pfree(c->pool, ln->buf->start);
(gdb)
ngx_pfree (pool=0x555e0db19680, p=0x555e0f52d790) at src/core/ngx_palloc.c:279
279     {
(gdb)
282         for (l = pool->large; l; l = l->next) {
(gdb)
293         return NGX_DECLINED;
(gdb)
282         for (l = pool->large; l; l = l->next) {
(gdb)
283             if (p == l->alloc) {
(gdb)
283             if (p == l->alloc) {
(gdb)
283             if (p == l->alloc) {
(gdb)
293         return NGX_DECLINED;
(gdb)
294     }
(gdb)
ngx_http_set_keepalive (r=0xfffffffffffffffb) at
src/http/ngx_http_request.c:2987
2987                ngx_free_chain(c->pool, ln);
(gdb)
2983            for (cl = hc->free; cl; /* void */) {
(gdb)
2987                ngx_free_chain(c->pool, ln);
(gdb)
2983            for (cl = hc->free; cl; /* void */) {
(gdb)
2986                ngx_pfree(c->pool, ln->buf->start);
(gdb)
2985                cl = cl->next;
(gdb)
2986                ngx_pfree(c->pool, ln->buf->start);
(gdb)
ngx_pfree (pool=0x555e0db19680, p=0x555e0f52d790) at src/core/ngx_palloc.c:279
279     {
(gdb)
282         for (l = pool->large; l; l = l->next) {
(gdb)
293         return NGX_DECLINED;
(gdb)
282         for (l = pool->large; l; l = l->next) {
(gdb)
283             if (p == l->alloc) {
(gdb)
283             if (p == l->alloc) {
(gdb)
283             if (p == l->alloc) {
(gdb)
293         return NGX_DECLINED;
(gdb)
294     }
(gdb)
ngx_http_set_keepalive (r=0xfffffffffffffffb) at
src/http/ngx_http_request.c:2987
2987                ngx_free_chain(c->pool, ln);
(gdb)
2983            for (cl = hc->free; cl; /* void */) {
(gdb)
2987                ngx_free_chain(c->pool, ln);
(gdb)
2983            for (cl = hc->free; cl; /* void */) {
(gdb)
2986                ngx_pfree(c->pool, ln->buf->start);
(gdb)
2985                cl = cl->next;
(gdb)
2986                ngx_pfree(c->pool, ln->buf->start);
(gdb)
ngx_pfree (pool=0x555e0db19680, p=0x555e0f52d790) at src/core/ngx_palloc.c:279
279     {
(gdb)
282         for (l = pool->large; l; l = l->next) {
(gdb)
293         return NGX_DECLINED;
(gdb)
282         for (l = pool->large; l; l = l->next) {
(gdb)
283             if (p == l->alloc) {
(gdb)
283             if (p == l->alloc) {
(gdb)
283             if (p == l->alloc) {
(gdb)
293         return NGX_DECLINED;
(gdb)
294     }
(gdb)
ngx_http_set_keepalive (r=0xfffffffffffffffb) at
src/http/ngx_http_request.c:2987
2987                ngx_free_chain(c->pool, ln);
(gdb)
2983            for (cl = hc->free; cl; /* void */) {
(gdb)
2987                ngx_free_chain(c->pool, ln);
(gdb)
2983            for (cl = hc->free; cl; /* void */) {
(gdb)
2986                ngx_pfree(c->pool, ln->buf->start);
(gdb)
2985                cl = cl->next;
(gdb)
2986                ngx_pfree(c->pool, ln->buf->start);
(gdb)
ngx_pfree (pool=0x555e0db19680, p=0x555e0f52d790) at src/core/ngx_palloc.c:279
279     {
(gdb)
282         for (l = pool->large; l; l = l->next) {
(gdb)
293         return NGX_DECLINED;
(gdb)
282         for (l = pool->large; l; l = l->next) {
(gdb)
283             if (p == l->alloc) {
(gdb)
283             if (p == l->alloc) {
(gdb)
283             if (p == l->alloc) {
(gdb)
293         return NGX_DECLINED;
(gdb)
294     }
(gdb)
ngx_http_set_keepalive (r=0xfffffffffffffffb) at
src/http/ngx_http_request.c:2987
2987                ngx_free_chain(c->pool, ln);
(gdb)
2983            for (cl = hc->free; cl; /* void */) {
(gdb)
2987                ngx_free_chain(c->pool, ln);

(and so on...)
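
What the trace shows is that ngx_pfree keeps returning NGX_DECLINED and
cl never becomes NULL, so the for loop over hc->free never terminates;
the same pointer is passed to ngx_pfree on every pass, which is what a
traversal of a chain containing a cycle looks like. The toy program
below is only an illustration of that failure mode, with simplified
stand-ins rather than nginx's real types: one way such a cycle can
arise is a link being reused from a free list without its stale next
pointer being reset, so that returning the "busy" chain to the free
list later closes the list on itself.

#include <stdio.h>

/* simplified stand-in for a chain link: a singly linked list node */
typedef struct link_s  link_t;

struct link_s {
    link_t  *next;
    char     name;
};

int
main(void)
{
    link_t   a = { NULL, 'a' }, b = { NULL, 'b' };
    link_t  *free_list, *busy, *cl, *ln;
    int      i;

    /* initial free list: a -> b -> NULL */
    a.next = &b;
    free_list = &a;

    /* reuse the head of the free list as a busy link, but forget to
     * reset its stale next pointer (the suspected bug) */
    busy = free_list;
    free_list = busy->next;        /* free list is now: b -> NULL   */
    /* busy->next = NULL;             <- the missing reset          */

    /* later, return the busy chain to the free list link by link;
     * because busy->next still points at b, b is prepended a second
     * time and the list becomes circular: b -> a -> b -> ...        */
    for (cl = busy; cl; /* void */) {
        ln = cl;
        cl = cl->next;
        ln->next = free_list;
        free_list = ln;
    }

    /* a cleanup loop of the form
     *     for (cl = free_list; cl; cl = cl->next)
     * would now never terminate; cap it here to show the repetition */
    for (cl = free_list, i = 0; cl && i < 8; cl = cl->next, i++) {
        printf("%c ", cl->name);
    }
    printf("...\n");               /* prints: b a b a b a b a ...    */

    return 0;
}

Once the chain is circular, the cleanup loop captured in gdb above can
never reach NULL, which matches the behaviour seen here.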


On Fri, Mar 24, 2017 at 1:31 PM, Richard Stanway
<[hidden email]> wrote:

> [...]

Re: 100% CPU use in ngx_http_finalize_connection

Maxim Dounin
Hello!

On Fri, Mar 24, 2017 at 01:31:35PM +0100, Richard Stanway wrote:

> After some time, nginx worker processes get stuck at 100% CPU use in
> what seems to be ngx_http_finalize_connection. New requests hitting
> the worker are completely stalled. Eventually all nginx workers become
> stuck and the sites become unreachable.

[...]

> #0  0x000055d533ab87e8 in ngx_pfree (pool=0x55d536202fe0,
> p=0x55d5361636c0) at src/core/ngx_palloc.c:282
> #1  0x000055d533af54d9 in ngx_http_set_keepalive (r=<optimized out>)
> at src/http/ngx_http_request.c:3000
> #2  ngx_http_finalize_connection (r=<optimized out>) at
> src/http/ngx_http_request.c:2556
> #3  0x000055d533af0d8b in ngx_http_core_content_phase
> (r=0x55d536136f10, ph=0x55d537cbf210) at
> src/http/ngx_http_core_module.c:1391

I think I see the problem.
Please try the following patch:

diff --git a/src/http/ngx_http_request.c b/src/http/ngx_http_request.c
--- a/src/http/ngx_http_request.c
+++ b/src/http/ngx_http_request.c
@@ -2904,6 +2904,7 @@ ngx_http_set_keepalive(ngx_http_request_
             }
 
             cl->buf = b;
+            cl->next = NULL;
 
             hc->busy = cl;
             hc->nbusy = 1;
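
The reset matters because cl is taken from hc->free, so without it
cl->next presumably still points at the rest of the free list;
hc->busy then drags those free links along with it, and once the
chains are later walked and released the list can close on itself,
which would explain the never-terminating cleanup loops seen at
src/http/ngx_http_request.c:2982 and :3000. Roughly, the reuse path
with the patch applied would look like this (a paraphrase of the
surrounding 1.11.11 code, not a verbatim excerpt):

            if (hc->free) {
                cl = hc->free;
                hc->free = cl->next;   /* cl->next still points into hc->free */

            } else {
                cl = ngx_alloc_chain_link(c->pool);
                if (cl == NULL) {
                    ngx_http_close_request(r, 0);
                    return;
                }
            }

            cl->buf = b;
            cl->next = NULL;           /* added: detach cl from the free list */

            hc->busy = cl;
            hc->nbusy = 1;

With next cleared, releasing the busy link later can no longer drag
stale links back onto the free list, so the traversal terminates.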

--
Maxim Dounin
http://nginx.org/

Re: 100% CPU use in ngx_http_finalize_connection

Richard Stanway
Hi Maxim,
Thanks for the quick patch! I've applied it to our server and will
monitor the results. Usually the problem starts to occur within 1-2
hours of a restart, so I'll post again later today with an update.

On Fri, Mar 24, 2017 at 2:03 PM, Maxim Dounin <[hidden email]> wrote:

> I think I see the problem.
> Please try the following patch:
>
> [...]

Re: 100% CPU use in ngx_http_finalize_connection

Maxim Dounin
Hello!

On Fri, Mar 24, 2017 at 02:11:12PM +0100, Richard Stanway wrote:

> Hi Maxim,
> Thanks for the quick patch! I've applied it to our server and will
> monitor the results. Usually the problem starts to occur within 1-2
> hours of a restart, so I'll post again later today with an update.

A version with the fix, nginx 1.11.12, has been released.
Thanks for reporting this.

--
Maxim Dounin
http://nginx.org/

Re: 100% CPU use in ngx_http_finalize_connection

Richard Stanway
Thanks Maxim, everything is looking great after the patch.

On Fri, Mar 24, 2017 at 4:26 PM, Maxim Dounin <[hidden email]> wrote:

> A version with the fix, nginx 1.11.12, has been released.
> Thanks for reporting this.