feature request: warn when domain name resolves to several addresses

Roger Pack
I noticed that in ngx_http_proxy_module

proxy_pass http://localhost:8000/uri/;
"If a domain name resolves to several addresses, all of them will be
used in a round-robin fashion. In addition, an address can be
specified as a server group."

However this can be confusing for end users who innocently put the
domain name "localhost" then find that round-robin across ipv6 and
ipv4 is occurring, ref:
https://stackoverflow.com/a/58924751/32453
https://stackoverflow.com/a/52550758/32453

Suggestion/feature request: If a domain name resolves to several
addresses, log a warning in the error.log file, or at least in the
output of -T.  Then there won't be unexpected round-robins occurring
and "supposedly single" servers being considered unavailable due to
timeouts, surprising people like myself.

Thank you for your attention, and for nginx, it's rocking fast! :)

-Roger Pack-
_______________________________________________
nginx mailing list
[hidden email]
http://mailman.nginx.org/mailman/listinfo/nginx

Re: feature request: warn when domain name resolves to several addresses

Maxim Dounin
Hello!

On Tue, Nov 19, 2019 at 10:47:01AM -0700, Roger Pack wrote:

> I noticed that in ngx_http_proxy_module
>
> proxy_pass http://localhost:8000/uri/;
> "If a domain name resolves to several addresses, all of them will be
> used in a round-robin fashion. In addition, an address can be
> specified as a server group."
>
> However this can be confusing for end users who innocently put the
> domain name "localhost" then find that round-robin across ipv6 and
> ipv4 is occurring, ref:
> https://stackoverflow.com/a/58924751/32453

This seems to be your own answer, and it looks incorrect to me.
In particular, the 499 error is logged when the client closes
the connection, and there is no need to have more than one backend
server specified to see 499 errors.

> https://stackoverflow.com/a/52550758/32453

Changing "localhost" to "127.0.0.1" here "works" because having just
one address triggers slightly different logic in the upstream
code: with just one address, max_fails / fail_timeout logic is
disabled, and nginx always uses the (only) address available, even
if there are errors.
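
As a hedged illustration of the two cases described above, the sketch
below assumes that "localhost" resolves to both 127.0.0.1 and ::1 on the
host in question; apart from the proxy_pass line quoted earlier, nothing
in it is taken from a configuration in this thread:

```nginx
# Sketch only; assumes "localhost" resolves to 127.0.0.1 and ::1 on this host.
location /uri/ {
    # Name with several addresses: each address is treated as a separate
    # server, used round-robin, and subject to max_fails / fail_timeout.
    proxy_pass http://localhost:8000/uri/;

    # Single literal address: max_fails / fail_timeout logic is disabled
    # and this address is always used, even if there are errors.
    # proxy_pass http://127.0.0.1:8000/uri/;
}
```

Only one of the two proxy_pass lines should be active at a time.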

The underlying problem is still the same though: backends cannot
cope with the load, and there are errors.

(And no, it's not a DNS failure - DNS is only used when nginx
resolves the name in the proxy_pass directive while parsing
configuration on startup.)

> Suggestion/feature request: If a domain name resolves to several
> addresses, log a warning in error.log file somehow, or at least in the
> output of -T, to warn  somehow.  Then there won't be unexpected
> round-robins occurring and "supposedly single" servers being
> considered unavailable due to timeouts, surprising people like myself.

Multiple addresses are fairly normal, and I don't think that
logging a warning is a good idea.

--
Maxim Dounin
http://mdounin.ru/

Re: feature request: warn when domain name resolves to several addresses

Roger Pack
On Tue, Nov 19, 2019 at 12:01 PM Maxim Dounin <[hidden email]> wrote:
>
> Hello!

Hi back again :)

> On Tue, Nov 19, 2019 at 10:47:01AM -0700, Roger Pack wrote:
>
> > I noticed that in ngx_http_proxy_module
> >
> > proxy_pass http://localhost:8000/uri/;
> > "If a domain name resolves to several addresses, all of them will be
> > used in a round-robin fashion. In addition, an address can be
> > specified as a server group."
> >
> > However this can be confusing for end users who innocently put the
> > domain name "localhost" then find that round-robin across ipv6 and
> > ipv4 is occurring, ref:
> > https://stackoverflow.com/a/58924751/32453
>
> This seems to be your own answer, and it looks incorrect to me.
> In particular, the 499 error is logged when the client closes
> connection, and there is no need to have more than one backend
> server specified to see 499 errors.

True, those cases were covered in some other answers to that question,
but I'll add a note. :)
It can also be logged when the backend server times out, at least
empirically that seems to be the case...
see also https://serverfault.com/questions/523340/post-request-is-repeated-with-nginx-loadbalanced-server-status-499/783624#783624

> > https://stackoverflow.com/a/52550758/32453
>
> Changing "localhost" to "127.0.0.1" here "works" because having just
> one address triggers slightly different logic in the upstream
> code: with just one address, max_fails / fail_timeout logic is
> disabled, and nginx always uses the (only) address available, even
> if there are errors.

Right.  The confusion in my mind is that people configuring nginx will
use one backend, "localhost", and assume they have set up a
"single server" type server group, since they have listed only one
host.  But they have not...
See for instance https://stackoverflow.com/a/52550758

> The underlying problem is still the same though: backends cannot
> cope with the load, and there are errors.

Right.  However with the "single server" scenario this behavior is
handled differently (it doesn't exhaust the server group of available
servers and begin to return with 502's exclusively for a time, as it
did in my instance...).

Basically if, while setting it up, you happen to forward to 127.0.0.1,
it will work fine, no "periods of 502's" (though you may get some
504's).

But if you forward it to "localhost" you may be surprised one day to
discover that you are getting "periods of 502's" if any connections
time out (> 60s) for any reason, since it only takes 2 of those to
exhaust your entire server group.
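
To make the behavior above easier to see, here is a rough sketch of what
the "localhost" case amounts to if written out as an explicit server
group.  The group name app_backend is hypothetical, the two addresses
assume a dual-stack "localhost", and max_fails=1 / fail_timeout=10s are
the documented defaults of the server directive, spelled out for clarity:

```nginx
# Hypothetical explicit equivalent of proxying to a dual-stack "localhost".
upstream app_backend {
    # One failed attempt within 10s marks the server unavailable for 10s.
    server 127.0.0.1:8000 max_fails=1 fail_timeout=10s;
    server [::1]:8000     max_fails=1 fail_timeout=10s;
}

server {
    location /uri/ {
        proxy_pass http://app_backend/uri/;
    }
}
```

With these settings, two timeouts close together can leave no server to
try for a while.  Per the documentation, max_fails=0 disables the
accounting of attempts, which may be an option for setups that really
have just one backend.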

> (And no, it's not a DNS failure - DNS is only used when nginx
> resolves the name in the proxy_pass directive while parsing
> configuration on startup.)
>
> > Suggestion/feature request: If a domain name resolves to several
> > addresses, log a warning in error.log file somehow, or at least in the
> > output of -T, to warn  somehow.  Then there won't be unexpected
> > round-robins occurring and "supposedly single" servers being
> > considered unavailable due to timeouts, surprising people like myself.
>
> Multiple addresses are fairly normal, and I don't think that
> logging a warning is a good idea.

I'm just saying...it might help somebody like me out, in the future.
There be dragons...or maybe the default error log could be configured
to make it more obvious to people what is going on?
(https://stackoverflow.com/a/52550758)

Or possibly the "-T" output could be enhanced to add "this server
group resolves to this many total unique servers" or something.
Your call of course, regardless :)

Thanks for the help and the conversation, all the best.

-Roger Pack-

Re: feature request: warn when domain name resolves to several addresses

Maxim Dounin
Hello!

On Tue, Nov 19, 2019 at 07:26:35PM -0700, Roger Pack wrote:

> On Tue, Nov 19, 2019 at 12:01 PM Maxim Dounin <[hidden email]> wrote:
>
> > On Tue, Nov 19, 2019 at 10:47:01AM -0700, Roger Pack wrote:
> >
> > > I noticed that in ngx_http_proxy_module
> > >
> > > proxy_pass http://localhost:8000/uri/;
> > > "If a domain name resolves to several addresses, all of them will be
> > > used in a round-robin fashion. In addition, an address can be
> > > specified as a server group."
> > >
> > > However this can be confusing for end users who innocently put the
> > > domain name "localhost" then find that round-robin across ipv6 and
> > > ipv4 is occurring, ref:
> > > https://stackoverflow.com/a/58924751/32453
> >
> > This seems to be your own answer, and it looks incorrect to me.
> > In particular, the 499 error is logged when the client closes
> > connection, and there is no need to have more than one backend
> > server specified to see 499 errors.
>
> True, those cases were covered in some other answers to that question,
> but I'll add a note. :)
> It can also be logged when the backend server times out, at least
> empirically that seems to be the case...
> see also https://serverfault.com/questions/523340/post-request-is-repeated-with-nginx-loadbalanced-server-status-499/783624#783624

It is logged when the client closes the connection, only.  But the
reasons why the client closes the connection might be different.

In particular, when the backend server times out, it means that
processing the request takes a long time.  And if processing
takes time, it is likely that the client will give up waiting and
will close the connection, resulting in 499.

> > > https://stackoverflow.com/a/52550758/32453
> >
> > Changing "localhost" to "127.0.0.1" here "works" because having just
> > one address triggers slightly different logic in the upstream
> > code: with just one address, max_fails / fail_timeout logic is
> > disabled, and nginx always uses the (only) address available, even
> > if there are errors.
>
> Right.  The confusion in my mind is that people configuring Nginx will
> use one backend "localhost", and assume they have set it up for a
> "single server" type server group.
> Since they have listed only one host.  But it has not...
> See for instance https://stackoverflow.com/a/52550758
>
> > The underlying problem is still the same though: backends cannot
> > cope with the load, and there are errors.
>
> Right.  However with the "single server" scenario this behavior is
> handled differently (it doesn't exhaust the server group of available
> servers and begin to return with 502's exclusively for a time, as it
> did in my instance...).
>
> Basically if, while setting it up, you happen to forward to 127.0.0.1,
> it will work fine, no "periods of 502's" (though you may get some
> 504's).
>
> But if you forward it to "localhost" you may be surprised one day to
> discover that you are getting "periods of 502's" if any connections
> timeout (> 60s) for any reason.  Since only 2 of those and your entire
> server group has been exhausted.

I don't think people know and/or expect the difference in handling
between single address and multiple addresses, regardless of
whether they know there are multiple addresses, or not.  As such,
a configuration-time warning won't help.

Rather, we can consider explaining the difference.  Alternatively,
we can make it go away - either by changing the single-address case
to be identical to the multiple-addresses one, or vice versa.  Or even
by making this configurable.

(Actually, the multiple-addresses case was previously handled
differently, closer to the single-address approach, and resulted
in just one 502, with "quick recovery" of all servers on the first
request.  But some time ago this was changed to follow
fail_timeout instead, as quick recovery of all servers seems to
cause more harm than good in most configurations.)

> > (And no, it's not a DNS failure - DNS is only used when nginx
> > resolves the name in the proxy_pass directive while parsing
> > configuration on startup.)
> >
> > > Suggestion/feature request: If a domain name resolves to several
> > > addresses, log a warning in error.log file somehow, or at least in the
> > > output of -T, to warn  somehow.  Then there won't be unexpected
> > > round-robins occurring and "supposedly single" servers being
> > > considered unavailable due to timeouts, surprising people like myself.
> >
> > Multiple addresses are fairly normal, and I don't think that
> > logging a warning is a good idea.
>
> I'm just saying...it might help somebody like me out, in the future.
> There be dragons...or maybe the default error log could be configured
> to make it more obvious to people what is going on?
> (https://stackoverflow.com/a/52550758)

From the error log things are expected to be pretty obvious -
nginx logs the original errors, and it also logs when it cannot
pick an upstream server to use ("no live upstreams", which means
"all upstream servers are disabled due to errors").  Further, it
also logs when it disables a server, though it happens on the
"warn" level.
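
For reference, a minimal sketch of raising the log verbosity so that
messages at the "warn" level are recorded as well.  The path is an
assumption and varies by installation; the error_log directive defaults
to the "error" level, which does not include "warn" messages:

```nginx
# Path is an assumption; adjust to the local installation.
# "warn" also records messages about servers being temporarily disabled.
error_log /var/log/nginx/error.log warn;
```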

The main problem is that people hardly look into error logs at
all.  For example, the answer you are referring to only provides
access log information, and this is what makes it confusing.  On
the other hand, another answer to the same question is based on
the "no live upstreams" error message from the question, and
correctly refers to the max_fails/fail_timeout parameters.

--
Maxim Dounin
http://mdounin.ru/

Re: feature request: warn when domain name resolves to several addresses

Roger Pack
On Wed, Nov 20, 2019 at 12:28 PM Maxim Dounin <[hidden email]> wrote:

>
> Hello!
>
> On Tue, Nov 19, 2019 at 07:26:35PM -0700, Roger Pack wrote:
>
> > On Tue, Nov 19, 2019 at 12:01 PM Maxim Dounin <[hidden email]> wrote:
> >
> > > On Tue, Nov 19, 2019 at 10:47:01AM -0700, Roger Pack wrote:
> > >
> > > > I noticed that in ngx_http_proxy_module
> > > >
> > > > proxy_pass http://localhost:8000/uri/;
> > > > "If a domain name resolves to several addresses, all of them will be
> > > > used in a round-robin fashion. In addition, an address can be
> > > > specified as a server group."
> > > >
> > > > However this can be confusing for end users who innocently put the
> > > > domain name "localhost" then find that round-robin across ipv6 and
> > > > ipv4 is occurring, ref:
> > > > https://stackoverflow.com/a/58924751/32453
> > >
> > > This seems to be your own answer, and it looks incorrect to me.
> > > In particular, the 499 error is logged when the client closes
> > > connection, and there is no need to have more than one backend
> > > server specified to see 499 errors.
> >
> > True, those cases were covered in some other answers to that question,
> > but I'll add a note. :)
> > It can also be logged when the backend server times out, at least
> > empirically that seems to be the case...
> > see also https://serverfault.com/questions/523340/post-request-is-repeated-with-nginx-loadbalanced-server-status-499/783624#783624
>
> It is logged when the client closes the connection, only.  But
> reasons why the client closes the connection might be different.
>
> In particular, when the backend server times out, it means that
> processing the request takes a long time.  And if processing
> takes time, it is likely that the client will give up waiting and
> will close the connection, resulting in 499.

OK, you're right, thank you for the hint.  It turns out our client had
a 60s timeout, so basically we'd see a "connection timed out" error log
entry and a "499 response" in quick succession and thought they were
related.

Thank you, that helped me figure out what was going on with my system!

> > > > https://stackoverflow.com/a/52550758/32453
> > >
> > > Changing "localhost" to "127.0.0.1" here "works" because having just
> > > one address triggers slightly different logic in the upstream
> > > code: with just one address, max_fails / fail_timeout logic is
> > > disabled, and nginx always uses the (only) address available, even
> > > if there are errors.
> >
> > Right.  The confusion in my mind is that people configuring Nginx will
> > use one backend "localhost", and assume they have set it up for a
> > "single server" type server group.
> > Since they have listed only one host.  But it has not...
> > See for instance https://stackoverflow.com/a/52550758
> >
> > > The underlying problem is still the same though: backends cannot
> > > cope with the load, and there are errors.
> >
> > Right.  However with the "single server" scenario this behavior is
> > handled differently (it doesn't exhaust the server group of available
> > servers and begin to return with 502's exclusively for a time, as it
> > did in my instance...).
> >
> > Basically if, while setting it up, you happen to forward to 127.0.0.1,
> > it will work fine, no "periods of 502's" (though you may get some
> > 504's).
> >
> > But if you forward it to "localhost" you may be surprised one day to
> > discover that you are getting "periods of 502's" if any connections
> > timeout (> 60s) for any reason.  Since only 2 of those and your entire
> > server group has been exhausted.
>
> I don't think people know and/or expect the difference in handling
> between single address and multiple addresses, regardless of
> whether they know there are multiple addresses, or not.  As such,
> a configuration-time warning won't help.
>
> Rather, we can consider explaining the difference.  Alternatively,
> we can make it go away - either by changing the single-address case
> to be identical to the multiple-addresses one, or vice versa.  Or even
> by making this configurable.
> (Actually, previously multiple-addresses case was handled
> differently, closer to the single-address approach, and resulted
> in just one 502, with "quick recovery" of all servers on the first
> request.  But some time ago this was changed to follow
> fail_timeout instead, as quick recovery of all servers seems to
> cause more harm than good in most configurations.)

Yeah, it might make sense to make the behavior similar.  Maybe never
disable the "last server marked as available" (of a server group) or
to enforce the 10s fail_timeout for single server (if it was useful
for multiple...then again maybe single is supposed to be a simpler
configuration?).

Or maybe add a warning to the documentation near where it says "If a
domain name resolves to several addresses, all of them will be used in
a round-robin fashion."  Something like:

"If you specify a hostname like 'localhost' and your system supports
both IPv4 and IPv6, the hostname can be interpreted to mean two
different servers.  Specify an exact IP address, like '127.0.0.1', if
you wish to avoid this ambiguity."

Also, the documentation for max_fails, fail_timeout and slow_start
could maybe include a note that they are ignored in the case of a
single server.

> > > (And no, it's not a DNS failure - DNS is only used when nginx
> > > resolves the name in the proxy_pass directive while parsing
> > > configuration on startup.)
> > >
> > > > Suggestion/feature request: If a domain name resolves to several
> > > > addresses, log a warning in error.log file somehow, or at least in the
> > > > output of -T, to warn  somehow.  Then there won't be unexpected
> > > > round-robins occurring and "supposedly single" servers being
> > > > considered unavailable due to timeouts, surprising people like myself.
> > >
> > > Multiple addresses are fairly normal, and I don't think that
> > > logging a warning is a good idea.
> >
> > I'm just saying...it might help somebody like me out, in the future.
> > There be dragons...or maybe the default error log could be configured
> > to make it more obvious to people what is going on?
> > (https://stackoverflow.com/a/52550758)
>
> From the error log things are expected to be pretty obvious -
> nginx logs the original errors, and it also logs when it cannot
> pick an upstream server to use ("no live upstreams", which means
> "all upstream servers are disabled due to errors").  Further, it
> also logs when it disables a server, though it happens on the
> "warn" level.

It might be nice to log that at the error level, or possibly to add it
to the "upstream timed out" error message, e.g. "upstream timed out,
marking server as unavailable" or something like that (if easier :).

A few more ideas for that error message: it could be enhanced a bit,
e.g. "upstream timed out after x seconds" plus "trying next server"
(or "giving up"), depending on what nginx does next.  Just for quicker
understanding of what decisions are being made (and which configs are
being respected).

> The main problem is that people hardly look into error logs at
> all.  For example, the answer you are referring to only provides
> access log information, and this is what makes it confusing.  On
> the other hand, another answer to the same question is based on
> the "no live upstreams" error message from the question, and
> correctly refers to the max_fails/fail_timeout parameters.

I looked at the error logs when problems started happening (502's), so
the error logs are useful! :)

My answer references the error log (or at least does now, with some
recent changes):

https://stackoverflow.com/a/58924751/32453
Some others don't :)

Thanks for your thoughtful replies.
Cheers!
-Roger-