proxy_upstream_next while no live upstreams


WuBingzheng
Hi all,

I have an upstream configuration with nginx 1.8.1:

    upstream test {
      server 192.168.0.5;
      server 192.168.0.6;
    }

Question 1:
Assume both of the 2 servers are down.
The first request tries both of them, fails, and gets a 502 response. Nginx marks both of them as DOWN. This is OK.
The second request arrives and finds there are no live upstreams, so nginx resets both servers to UP, logs "no live upstreams", and returns 502.
My question is: for the second request, nginx does NOT try the 2 servers, but just returns 502 immediately. Is this expected?

From the code in ngx_http_upstream_next(), ft_type=NGX_HTTP_UPSTREAM_FT_NOLIVE always leads to ngx_http_upstream_finalize_request() rather than ngx_http_upstream_connect().


Question 2: (not related with Question 1)
In my production environment, 192.168.0.5 is UP, and 192.168.0.6 is DOWN.
There are a few access log entries with $upstream_addr as "192.168.0.6, test" and $status as 502.
There were no error logs about failures connecting to or reading from 192.168.0.5, which means this server is UP, so I think the request should try 192.168.0.5 after 192.168.0.6.
But it does not try 192.168.0.5; it just logs "no live upstreams" and returns 502.
Log entries like this are very rare, and I cannot reproduce or debug the problem.
I am asking here in case someone else knows the problem.


Thanks in advance,
Wu
_______________________________________________
nginx mailing list
[hidden email]
http://mailman.nginx.org/mailman/listinfo/nginx

Re: proxy_upstream_next while no live upstreams

Maxim Dounin
Hello!

On Wed, May 10, 2017 at 04:26:06PM +0800, Wu Bingzheng wrote:

> I have an upstream configuration with nginx 1.8.1:
>
>     upstream test {
>       server 192.168.0.5;
>       server 192.168.0.6;
>     }
>
> Question 1:
> Assume both of the 2 servers are down.
> The first request tries both of them, fails, and gets a 502
> response. Nginx marks both of them as DOWN. This is OK.
> The second request arrives and finds there are no live upstreams,
> so nginx resets both servers to UP, logs "no live upstreams",
> and returns 502.
> My question is: for the second request, nginx does NOT try
> the 2 servers, but just returns 502 immediately. Is this
> expected?

Yes, as long as all servers in an upstream group are already
considered unavailable, nginx will return 502 without trying to
connect to them.

You may control when servers are considered unavailable using the
max_fails and fail_timeout parameters of the server directives,
see here:

http://nginx.org/en/docs/http/ngx_http_upstream_module.html#max_fails 

Note well that nginx versions before 1.11.5 reset all servers once
all servers are unavailable, effectively returning just one 502
per worker process.  Since nginx 1.11.5, it will wait for
fail_timeout to expire:

    *) Change: now if there are no available servers in an upstream, nginx
       will not reset number of failures of all servers as it previously
       did, but will wait for fail_timeout to expire.
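
As an illustration of the two parameters mentioned above, here is a
minimal sketch of the upstream block from the question with explicit
max_fails and fail_timeout values. The values 3 and 30s are arbitrary
assumptions chosen for the example, not recommendations; the documented
defaults are max_fails=1 and fail_timeout=10s.

```nginx
upstream test {
    # A server is considered unavailable for 30s once 3 attempts to
    # talk to it have failed within a 30s window (illustrative values).
    server 192.168.0.5 max_fails=3 fail_timeout=30s;
    server 192.168.0.6 max_fails=3 fail_timeout=30s;
}
```

A higher max_fails makes nginx slower to give up on a server, and
fail_timeout sets both the counting window and how long the server then
stays out of rotation.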

> Question 2: (not related with Question 1)
> In my production environment, 192.168.0.5 is UP, and 192.168.0.6
> is DOWN.
> There are a few access log entries with $upstream_addr as
> "192.168.0.6, test" and $status as 502.
> There were no error logs about failures connecting to or reading
> from 192.168.0.5, which means this server is UP, so I think the
> request should try 192.168.0.5 after 192.168.0.6.
> But it does not try 192.168.0.5; it just logs "no live upstreams"
> and returns 502.
> Log entries like this are very rare, and I cannot reproduce or
> debug the problem.
> I am asking here in case someone else knows the problem.

See above, this is exactly what is expected to happen when a
request to an upstream server fails.  The 502 / "no live upstreams"
you are seeing is a result of all servers being considered
unavailable.  There are only a few such errors because you are using
nginx 1.8.1, which quickly resets the failure counters of all
servers in such a situation.  With recent nginx versions, 502 errors
will be returned until fail_timeout expires.

If you want nginx to completely ignore errors on the only working
upstream server in your environment, consider using "server ...
max_fails=0".  Alternatively, consider using a fail_timeout value
that is appropriate for your environment.
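
The max_fails=0 suggestion could be sketched as follows, assuming
192.168.0.5 is the server that should never be taken out of rotation.
Only max_fails=0 comes from the advice above; applying it to this
particular server is an assumption based on the question.

```nginx
upstream test {
    # max_fails=0 disables failure accounting for this server, so
    # nginx never marks it as unavailable.
    server 192.168.0.5 max_fails=0;

    # Default accounting (max_fails=1, fail_timeout=10s) still applies
    # to the other server.
    server 192.168.0.6;
}
```

The trade-off is that requests keep being sent to 192.168.0.5 even
while it is genuinely down.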

--
Maxim Dounin
http://nginx.org/

Re:Re: proxy_upstream_next while no live upstreams

WuBingzheng
Thanks for the answer.

Maybe you missed something in Question 2. The server 192.168.0.5 never fails.
I think nginx should not return 502 if there is at least one server that never fails.
To be exact, the server has not failed in the last hour, and fail_timeout is the default 10 seconds.



Re: Re: proxy_upstream_next while no live upstreams

Maxim Dounin
Hello!

On Wed, May 10, 2017 at 10:27:16PM +0800, Wu Bingzheng wrote:

> Maybe you missed something in Question 2. The server 192.168.0.5 never fails.
> I think nginx should not return 502 if there is at least one server that never fails.
> To be exact, the server has not failed in the last hour, and fail_timeout is the default 10 seconds.

How do you know that the server never fails?

The "no live upstreams" error indicates that it failed from nginx's
point of view, and was considered unavailable.

Note that "failure" might not be something specifically logged by
nginx, but a response with a specific http code you've configured
in proxy_next_upstream, see http://nginx.org/r/proxy_next_upstream.
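
To make the point about configured HTTP codes concrete, here is a
minimal sketch of a location block. The location path and the choice of
codes are assumptions for illustration; only the proxy_next_upstream
directive and the upstream name "test" come from this thread.

```nginx
location / {
    proxy_pass http://test;

    # error, timeout and invalid_header always count as failed
    # attempts. The http_502/http_503/http_504 codes count as failed
    # attempts only because they are listed here, so a backend that
    # answers with one of them moves toward max_fails even though
    # the connection itself succeeded.
    proxy_next_upstream error timeout http_502 http_503 http_504;
}
```

With such a setting, a server can be marked unavailable purely on the
basis of the status codes it returns.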

--
Maxim Dounin
http://nginx.org/

Re:Re: Re: proxy_upstream_next while no live upstreams

WuBingzheng

The last request before this 502 request was almost 20 minutes earlier and its response code was 200.

The proxy_next_upstream conf:
    proxy_next_upstream error timeout invalid_header http_502 http_503 http_504;

Here is the access log. The upstream server 192.168.0.6 is DOWN. Line 10 is the 502 request:

  1 [03/May/2017:14:35:38 -0400] "POST /x/y HTTP/1.1" 200  "192.168.0.6:8181, 192.168.0.5:8181" 0.012 0.001, 0.011
  2 [03/May/2017:14:35:38 -0400] "POST /x/y HTTP/1.1" 200  "192.168.0.5:8181" 0.013 0.013
  3 [03/May/2017:14:54:30 -0400] "POST /x/y HTTP/1.1" 200  "192.168.0.5:8181" 0.206 0.206
  4 [03/May/2017:15:03:08 -0400] "POST /x/y HTTP/1.1" 200  "192.168.0.5:8181" 0.154 0.154
  5 [03/May/2017:15:40:51 -0400] "POST /x/y HTTP/1.1" 200  "192.168.0.6:8181, 192.168.0.5:8181" 0.012 0.000, 0.012
  6 [03/May/2017:15:40:51 -0400] "POST /x/y HTTP/1.1" 200  "192.168.0.5:8181" 0.014 0.014
  7 [03/May/2017:15:40:51 -0400] "POST /x/y HTTP/1.1" 200  "192.168.0.5:8181" 0.016 0.016
  8 [03/May/2017:15:40:51 -0400] "POST /x/y HTTP/1.1" 200  "192.168.0.5:8181" 0.017 0.017
  9 [03/May/2017:15:40:51 -0400] "POST /x/y HTTP/1.1" 200  "192.168.0.5:8181" 0.011 0.011
 10 [03/May/2017:15:59:06 -0400] "POST /x/y HTTP/1.1" 502  "192.168.0.6:8181, test_backend" 0.000 0.000, 0.000
 11 [03/May/2017:15:59:07 -0400] "POST /x/y HTTP/1.1" 200  "10.255.222.206:8181" 0.260 0.260




Re: Re: Re: proxy_upstream_next while no live upstreams

Maxim Dounin
Hello!

On Fri, May 12, 2017 at 01:24:14PM +0800, Wu Bingzheng wrote:

>
> The last request before this 502 request was almost 20 minutes earlier and its response code was 200.
>
> The proxy_next_upstream conf:
>     proxy_next_upstream error timeout invalid_header http_502 http_503 http_504;
>
> Here is the access log. The upstream server 192.168.0.6 is DOWN. Line 10 is the 502 request:
>
>   1 [03/May/2017:14:35:38 -0400] "POST /x/y HTTP/1.1" 200  "192.168.0.6:8181, 192.168.0.5:8181" 0.012 0.001, 0.011
>   2 [03/May/2017:14:35:38 -0400] "POST /x/y HTTP/1.1" 200  "192.168.0.5:8181" 0.013 0.013
>   3 [03/May/2017:14:54:30 -0400] "POST /x/y HTTP/1.1" 200  "192.168.0.5:8181" 0.206 0.206
>   4 [03/May/2017:15:03:08 -0400] "POST /x/y HTTP/1.1" 200  "192.168.0.5:8181" 0.154 0.154
>   5 [03/May/2017:15:40:51 -0400] "POST /x/y HTTP/1.1" 200  "192.168.0.6:8181, 192.168.0.5:8181" 0.012 0.000, 0.012
>   6 [03/May/2017:15:40:51 -0400] "POST /x/y HTTP/1.1" 200  "192.168.0.5:8181" 0.014 0.014
>   7 [03/May/2017:15:40:51 -0400] "POST /x/y HTTP/1.1" 200  "192.168.0.5:8181" 0.016 0.016
>   8 [03/May/2017:15:40:51 -0400] "POST /x/y HTTP/1.1" 200  "192.168.0.5:8181" 0.017 0.017
>   9 [03/May/2017:15:40:51 -0400] "POST /x/y HTTP/1.1" 200  "192.168.0.5:8181" 0.011 0.011
>  10 [03/May/2017:15:59:06 -0400] "POST /x/y HTTP/1.1" 502  "192.168.0.6:8181, test_backend" 0.000 0.000, 0.000
>  11 [03/May/2017:15:59:07 -0400] "POST /x/y HTTP/1.1" 200  "10.255.222.206:8181" 0.260 0.260

Looking at response status codes in access logs is not enough to
understand whether a server is up or down, for at least the
following reasons:

- there might be other requests currently in flight which are not
  yet logged;

- errors may occur while sending the response body, and hence the
  status code will not show that there was an error.

It is usually a good idea to look into error logs instead.
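
As a complement to the error log (not a replacement for it), the access
log can also record what each upstream attempt returned. The following
is a sketch only: the format name "upstream_debug" and the log path are
hypothetical, while the variables are standard nginx ones.

```nginx
# Placed in the http {} context.
# $upstream_addr and $upstream_status list one entry per attempted
# server, so a retried request shows the status of every attempt.
log_format upstream_debug '[$time_local] "$request" $status '
                          '"$upstream_addr" "$upstream_status" '
                          '$request_time $upstream_response_time';

access_log /var/log/nginx/upstream_debug.log upstream_debug;
```

This still only shows requests that have finished, so the two caveats
listed above continue to apply.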

--
Maxim Dounin
http://nginx.org/

Re:Re: Re: Re: proxy_upstream_next while no live upstreams

WuBingzheng

Because the last request before this 502 request was almost 20 minutes earlier, there was no error log entry in the 20 minutes before it.

This is somewhat strange, and happens only very rarely.

I know it is difficult to debug this without a reproduction. I am just asking here to see whether this is a known issue.

Thanks for your answer.

Wu



