UDP Load balancer does not scale


UDP Load balancer does not scale

drook
Hi

I am trying to set up a UDP load balancer using Nginx. Initially, I
configured 4 upstream servers with two server processes running on each of
them. It gave a throughput of around 24000 queries per second when tested
with dnsperf. When I try to add two more upstream servers, the throughput
does not increase as expected. In fact, it deteriorates to the range of
5000 queries per second, with the following errors:

[warn] 5943#0: *10433175 upstream server temporarily disabled while proxying
connection, udp client: xxx.xxx.xxx.29, server: 0.0.0.0:53, upstream:
"xxx.xxx.xxx.224:53", bytes from/to client:80/0, bytes from/to
upstream:0/80
[error] 5943#0: *10085077 no live upstreams while connecting to upstream,
udp client: xxx.xxx.xxx.224, server: 0.0.0.0:53, upstream: "dns_upstreams",
bytes from/to client:80/0, bytes from/to upstream:0/0

I understand that the above error appears when Nginx doesn't receive
responses from an upstream in time, and the server is temporarily marked as
unavailable. I used to get this error even with 4 upstream servers, but it
was resolved after adding the following additional configuration:

user nginx;
worker_processes 4;
worker_rlimit_nofile 65535;

load_module "/usr/lib64/nginx/modules/ngx_stream_module.so";

error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;

events {
    worker_connections  10240;
}

stream {
    upstream dns_upstreams {
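              # after max_fails failed attempts within fail_timeout, a server is
              # considered unavailable for the duration of fail_timeout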
              server xxx.xxx.xxx.0:53 max_fails=2000 fail_timeout=30s;
              server xxx.xxx.xxx.0:6363 max_fails=2000 fail_timeout=0s;
              server xxx.xxx.xxx.187:53 max_fails=2000 fail_timeout=30s;
              server xxx.xxx.xxx.187:6363 max_fails=2000 fail_timeout=30s;
              server xxx.xxx.xxx.183:53 max_fails=2000 fail_timeout=30s;
              server xxx.xxx.xxx.183:6363 max_fails=2000 fail_timeout=30s;
              server xxx.xxx.xxx.212:53 max_fails=2000 fail_timeout=30s;
              server xxx.xxx.xxx.212:6363 max_fails=2000 fail_timeout=30s;
    }

    server {
        listen 53 udp;
        proxy_pass dns_upstreams;
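        # proxy_timeout: close the UDP session if nothing arrives within 1s;
        # proxy_responses: expect exactly one response datagram per client datagram.
        # A query that gets no response in time counts as a failed attempt.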
        proxy_timeout 1s;
        proxy_responses 1;
    }
}

Even though this configuration works fine with 4 upstream servers, it
doesn't help when I increase the number of servers.

The Nginx server has plenty of memory and CPU capacity remaining when
running with 4 upstream servers as well as with 6. The dnsperf client is not
a bottleneck here, because it can generate much more load in a different
setup. Also, each individual upstream server can serve a bit more than 5000
requests per second.

I am trying to get some hints about why I am seeing more upstream failures,
and eventual unavailability, when I add more servers. If anybody has faced a
similar issue in the past and can give me some pointers to solve it, that
would be of great help.

Thanks,
Ajmal

Posted at Nginx Forum: https://forum.nginx.org/read.php?2,274257,274257#msg-274257

Re: UDP Load balancer does not scale

Maxim Konovalov
Hello,

On 16/05/2017 10:10, ajmalahd wrote:

> Hi
>
> I am trying to set up a UDP load balancer using Nginx. Initially, I
> configured 4 upstream servers with two server processes running on each of
> them. It gave a throughput of around 24000 queries per second when tested
> with dnsperf. When I try to add two more upstream servers, the throughput
> does not increase as expected. In fact, it deteriorates to the range of
> 5000 queries per second, with the following errors:
>
[...]

With adequate hardware you should expect at least 10x more.

I would double-check whether you are CPU bound on the server and
client side (yes, I read your note about that, but it's worth
checking again).  It could be that nginx is using just one
worker/CPU.  The same goes for the client.
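
For example, something along these lines would let every worker pull UDP
traffic off its own listening socket instead of everything going through
one worker (an untested sketch on top of your existing config; reuseport
needs a reasonably recent nginx and kernel):

worker_processes  auto;              # one worker per CPU core
worker_cpu_affinity auto;            # pin each worker to its own core

stream {
    # upstream dns_upstreams { ... } # unchanged from your config

    server {
        # reuseport gives each worker its own listening socket, so the
        # kernel spreads incoming datagrams across all workers
        listen 53 udp reuseport;
        proxy_pass dns_upstreams;
        proxy_timeout 1s;
        proxy_responses 1;
    }
}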

Then check with your preferred tool that there is no bottleneck at the
network layer / UDP.  Check the OS network metrics for any obvious limits.

Also, it makes sense to read the following great slides on UDP
performance tuning by Toshiaki Makita of NTT:

http://textlab.io/doc/15478046/boost-udp-transaction-performance

--
Maxim Konovalov