No live upstreams with a single upstream

wld75
While running a load test that injects 10k TPS across 3 Nginx instances, we
are seeing spikes of errors where Nginx returns HTTP 502 and logs the
message 'no live upstreams while connecting to upstream'.  No other errors,
such as connection errors, are logged.

Also, we have a single upstream virtual IP (we use iptables to balance load
across the backend) and according to the docs the upstream should never be
marked as down in this case:

'If there is only a single server in a group, max_fails, fail_timeout and
slow_start parameters are ignored, and such a server will never be
considered unavailable'

Testing locally with our config confirms this: I cannot reproduce the 'no
live upstreams while connecting to upstream' message when simulating
connection and read errors with a single upstream.

To debug, I tried enabling debug logs, but under load that degraded
performance too much.  I also traced the worker process with strace and
didn't find any socket or other errors during the 502 spike.

I was able to reproduce this issue on Nginx 1.12.2 and 1.15.3.

So given that we don't see any source error and we have a single upstream,
I'm interested to know what other scenarios could result in a 502 with the
log message 'no live upstreams while connecting to upstream'?

Posted at Nginx Forum: https://forum.nginx.org/read.php?2,281255,281255#msg-281255

_______________________________________________
nginx mailing list
[hidden email]
http://mailman.nginx.org/mailman/listinfo/nginx

Re: No live upstreams with a single upstream

Maxim Dounin
Hello!

On Tue, Sep 18, 2018 at 06:02:46AM -0400, domleb wrote:

> So given that we don't see any source error and we have a single upstream,
> I'm interested to know what other scenarios could result in a 502 with the
> log message 'no live upstreams while connecting to upstream'?

Could you please show the upstream configuration you are using?

With a single server in the upstream block, the "no live upstreams"
error may happen if:

- the server is marked "down" in the configuration, or
- the server reached the max_conns limit.
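
The two conditions above can be modeled with a small sketch. This is not
nginx's source code: the `Peer` class and the `pick_peer` function are
hypothetical names, and failure accounting is simplified (fail_timeout is
left out). It only illustrates why a single-server group can still produce
"no live upstreams" even though max_fails is ignored for it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Peer:
    """Hypothetical stand-in for one server entry in an upstream block."""
    name: str
    down: bool = False   # server marked "down" in the configuration
    max_conns: int = 0   # 0 means no connection limit
    conns: int = 0       # connections currently active to this peer
    fails: int = 0       # failed attempts recorded for this peer
    max_fails: int = 1


def pick_peer(peers: List[Peer]) -> Optional[Peer]:
    """Return a usable peer, or None (the "no live upstreams" case)."""
    single = len(peers) == 1
    for peer in peers:
        if peer.down:
            continue  # explicitly disabled in the configuration
        if peer.max_conns and peer.conns >= peer.max_conns:
            continue  # connection limit reached
        # Failure accounting is skipped when the group has a single server.
        if not single and peer.max_fails and peer.fails >= peer.max_fails:
            continue
        return peer
    return None
```

With `Peer("vip", max_conns=2, fails=100)` as the only entry, `pick_peer`
returns the peer despite the recorded failures, and returns `None` as soon
as `conns` reaches 2.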

Also note that "a single server" does not apply to cases when
there is a single hostname which resolves to multiple IP addresses
(this defines multiple servers at once).
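
One way to check whether a hostname used in a server directive expands to
several addresses is to ask the system resolver on the nginx host. The
sketch below is only an approximation of what nginx sees when it loads the
configuration; `upstream_addresses` is a hypothetical helper, and the
`resolver` parameter exists only so the lookup can be replaced in tests.

```python
import socket
from typing import Callable, List


def upstream_addresses(host: str, port: int = 80,
                       resolver: Callable = socket.getaddrinfo) -> List[str]:
    """Return the distinct IP addresses that `host` resolves to."""
    infos = resolver(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the textual IP address.
    return sorted({info[4][0] for info in infos})
```

If the returned list has more than one entry, the group should be treated
as having multiple servers, so max_fails and fail_timeout apply again.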

--
Maxim Dounin
http://mdounin.ru/

Re: No live upstreams with a single upstream

wld75
I removed our max_conns limit and that resolved the issue - thanks for the
help.
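
For anyone who wants to keep a max_conns limit, a rough estimate of the
concurrency a load test will generate can help in sizing it. The sketch
below applies Little's law; the function names and the example numbers
(a limit of 100, 50 ms latency) are assumptions for illustration and are
not taken from our configuration. Note also that, per the nginx docs, the
limit works per worker process unless the upstream group resides in
shared memory.

```python
def concurrent_connections(requests_per_second: float,
                           latency_seconds: float) -> float:
    """Little's law: average in-flight requests = arrival rate * latency."""
    return requests_per_second * latency_seconds


def latency_budget(max_conns: int, requests_per_second: float) -> float:
    """Upstream latency (seconds) at which average concurrency hits max_conns."""
    return max_conns / requests_per_second


# 10k TPS spread evenly over 3 nginx instances.
per_instance_tps = 10_000 / 3

# Assumed 50 ms upstream latency: about 167 connections in flight.
in_flight = concurrent_connections(per_instance_tps, 0.050)

# Assumed limit of 100: reached once average latency is about 30 ms.
budget = latency_budget(100, per_instance_tps)
```

These are averages, so a short latency spike on the backend can push the
connection count over the limit well before the mean suggests it.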

It might be worth changing the log message in this case, as I believe the
upstream is still live and there are no other log messages to indicate what
the problem is.

Posted at Nginx Forum: https://forum.nginx.org/read.php?2,281255,281298#msg-281298
