Nginx throttling issue?


Nginx throttling issue?

John Melom

Hi,

 

I am load testing our system using JMeter as a load generator.  We execute a script consisting of an https request executed in a loop.  The loop does not contain a think time, since at this point I am not trying to emulate a “real user”; I want to get a quick look at our system capacity.  Load on our system is increased by increasing the number of JMeter threads executing our script.  Each JMeter thread references different data.

 

Our system is in AWS with an ELB fronting Nginx, which serves as a reverse proxy for our Docker Swarm application cluster.

 

At moderate loads, a subset of our https requests start experiencing a 1 second delay in addition to their normal response time.  The delay is not due to resource contention; system utilizations remain low.  The response times cluster around 4 values:  0 milliseconds, 50 milliseconds, 1 second, and 1.050 seconds.  Right now, I am most interested in understanding and eliminating the 1 second delay that gives the clusters at 1 second and 1.050 seconds.

 

The attachment shows a response time scatterplot from one of our runs.  The x-axis is the number of seconds into the run, the y-axis is the response time in milliseconds.  The plotted data shows the response time of requests at the time they occurred in the run.

 

If I run the test bypassing the ELB and Nginx, this delay does not occur. 

If I bypass the ELB, but include Nginx in the request path, the delay returns.

 

This leads me to believe the 1 second delay is coming from Nginx. 

 

One possible candidate is Nginx DDoS throttling.  Since all requests come from the same JMeter system, I expect they share the same originating IP address.  I attempted to control the throttling by setting limit_req as shown in the nginx.conf fragment below:

 

http {
    limit_req_zone $binary_remote_addr zone=perf:20m rate=10000r/s;

    server {
        location /myReq {
            limit_req zone=perf burst=600;
            proxy_pass xxx.xxx.xxx.xxx;
        }
        ...
    }
}

 

The thinking behind the values set in this conf file is that my aggregate demand would not exceed 10000 requests per second, so throttling of requests should not occur.  If there were short bursts more intense than that, the burst value would buffer these requests.
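That sizing rationale can be sanity-checked with a little arithmetic (my sketch, not from the mail): at rate=10000r/s the zone earns one request "token" every 0.1 ms, so burst=600 absorbs a 600-request spike, i.e. about 60 ms of traffic at the full configured rate.

```python
rate_rps = 10_000   # limit_req_zone rate from the fragment above
burst = 600         # limit_req burst from the fragment above

# Time to earn one request token under the configured rate.
refill_interval_ms = 1000 / rate_rps
# Duration of a full-rate spike the burst buffer can absorb.
burst_absorbed_ms = burst * refill_interval_ms

print(refill_interval_ms)   # 0.1
print(burst_absorbed_ms)    # 60.0
```

So unless the offered load exceeds 10,000 r/s for longer than ~60 ms at a stretch, this zone should not reject anything.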

 

This tuning did not change my results.  I still get the 1 second delay.

 

Am I implementing this correctly?

Is there something else I should be trying?

 

The responses are not large, so I don’t believe limit_rate is the answer.

I have a small number of intense users, so limit_conn does not seem likely to be the answer either.

 

Thanks,

 

John Melom

Performance Test Engineer

Spōk, Inc.

+1 (952) 230 5311 Office

[hidden email]

 


 



NOTE: This email message and any attachments are for the sole use of the intended recipient(s) and may contain confidential and/or privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you have received this e-mail in error, please contact the sender by replying to this email, and destroy all copies of the original message and any material included with this email.
_______________________________________________
nginx mailing list
[hidden email]
http://mailman.nginx.org/mailman/listinfo/nginx

Attachment: rawRespScatterplot.png (57K)

Re: Nginx throttling issue?

Peter Booth
You’re correct that this is the ddos throttling.  The real question is what do you want to do?  JMeter with zero think time is an imperfect load generator, and this is only one complication. The bigger one is the open/closed model issue. With your design you have back pressure from your system under test to your load generator: a JMeter virtual user will only ever issue a request when the prior one completes. Real users are not so well behaved, which is why your test results will always be over-optimistic with this design.

A better approach is to use a load generator that replicates the desired request distribution without triggering the ddos protection. wrk2, Tsung, and httperf are candidates, as well as the cloud-based load generator services. Also see Neil Gunther’s paper on how to combine multiple JMeter instances to replicate real-world traffic patterns.
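The closed-model cap is easy to quantify with Little's law (my illustration, not from the thread; the function name and numbers are made up):

```python
# In a closed model, each of N virtual users waits for its response before
# sending the next request, so offered throughput is bounded by
# X = N / (R + Z), with R = response time and Z = think time.

def closed_model_throughput(num_users: int, response_time_s: float, think_time_s: float) -> float:
    """Max request rate a closed-model generator (e.g. JMeter threads) can offer."""
    return num_users / (response_time_s + think_time_s)

# 100 JMeter threads, 50 ms responses, zero think time:
print(closed_model_throughput(100, 0.050, 0.0))   # 2000.0 requests/second

# If responses slow to 1.050 s (the delayed cluster), the same 100 threads
# automatically back off -- this is the back pressure described above:
print(closed_model_throughput(100, 1.050, 0.0))   # ~95.2 requests/second
```

An open-model generator keeps arrivals coming at the configured rate regardless of response time, which is why it stresses the system more realistically.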

Peter

Sent from my iPhone

On Mar 26, 2018, at 4:21 PM, John Melom <[hidden email]> wrote:

[...]

Re: Nginx throttling issue?

Maxim Dounin
In reply to this post by John Melom
Hello!

On Mon, Mar 26, 2018 at 08:21:27PM +0000, John Melom wrote:

> [...]

There are no magic 1 second delays in nginx - unless you've
configured something explicitly.

Most likely, the 1 second delay is coming from TCP retransmission
timeout during connection establishment due to listen queue
overflows.  Check "netstat -s" to see if there are any listen
queue overflows on your hosts.
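A sketch of that check (mine, not from the mail). The counter text below is canned sample output so the parsing is visible; on a real host you would pipe in `netstat -s` or `nstat -az` instead:

```shell
# Canned nstat-style counters (illustrative values, not real output).
counters='TcpExtListenOverflows           0                  0.0
TcpExtListenDrops               0                  0.0'

# Sum the overflow/drop counters. Anything > 0 means SYNs were dropped and
# clients had to retransmit; the initial retransmission timeout is on the
# order of 1 second, matching the observed delay cluster.
overflows=$(printf '%s\n' "$counters" | awk '/ListenOverflows|ListenDrops/ { sum += $2 } END { print sum + 0 }')
echo "listen queue overflows/drops: $overflows"
```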

[...]

--
Maxim Dounin
http://mdounin.ru/

RE: Nginx throttling issue?

John Melom
In reply to this post by Peter Booth

Peter,

 

Thanks for your reply.

 

What I’d really like is to understand how to tune nginx to avoid the delays when I run my tests. 

 

I am comfortable with the overly optimistic results from my current “closed model” test design.  Once I determine my system’s throughput limits I will introduce significant think times into my scripts so that much larger user populations are required to produce the same work demand.  This will more closely approximate an “open model” test design.

 

Could you provide more explanation as to why a different load generation tool would avoid triggering a DDOS response from nginx?  My first guess would have been that they would also generate requests from a single IP address, and thus look the same as a JMeter load.

 

I did try my test with JMeter driving workload from 2 different machines at the same time.  I ran each machine's workload at a low enough level that individually it did not trigger the 1 second delay.  The combined workload did trigger the delay for each of the JMeter workload generators.  I’m not sure how many machines would be required to avoid the collective response from nginx.

 

Thanks,

 

John

 

 

From: nginx [mailto:[hidden email]] On Behalf Of Peter Booth
Sent: Monday, March 26, 2018 3:57 PM
To: [hidden email]
Subject: Re: Nginx throttling issue?

 

[...]

RE: Nginx throttling issue?

John Melom
In reply to this post by Maxim Dounin
Maxim,

Thank you for your reply.  I will look to see if "netstat -s" detects any listen queue overflows.

John


-----Original Message-----
From: nginx [mailto:[hidden email]] On Behalf Of Maxim Dounin
Sent: Tuesday, March 27, 2018 6:55 AM
To: [hidden email]
Subject: Re: Nginx throttling issue?

[...]

RE: Nginx throttling issue?

John Melom
Hi Maxim,

I've looked at the nstat data and found the following values for counters:

> nstat -az | grep -i listen
TcpExtListenOverflows           0                  0.0
TcpExtListenDrops               0                  0.0
TcpExtTCPFastOpenListenOverflow 0                  0.0


nstat -az | grep -i retra
TcpRetransSegs                  12157              0.0
TcpExtTCPLostRetransmit         0                  0.0
TcpExtTCPFastRetrans            270                0.0
TcpExtTCPForwardRetrans         11                 0.0
TcpExtTCPSlowStartRetrans       0                  0.0
TcpExtTCPRetransFail            0                  0.0
TcpExtTCPSynRetrans             25                 0.0

Assuming the above "Listen" counters provide data about the overflow issue you mention, then there are no overflows on my system.  While retransmissions are happening, it doesn't seem they are related to listen queue overflows.


Am I looking at the correct data items?  Is my interpretation of the data correct?  If so, do you have any other ideas I could investigate?

Thanks,

John

-----Original Message-----
From: nginx [mailto:[hidden email]] On Behalf Of John Melom
Sent: Tuesday, March 27, 2018 8:52 AM
To: [hidden email]
Subject: RE: Nginx throttling issue?

[...]

Re: Nginx throttling issue?

Peter Booth
John,

I think that you need to understand what is happening on your host throughout the duration of the test, specifically with the TCP connections. If you run netstat, grep for tcp, and do this in a loop every, say, five seconds, you’ll see how many connections are created at peak.
If the thing you are testing exists in production then you are lucky. You can do the same in production and see what it is that you need to replicate.

You didn’t mention whether you have persistent connections (HTTP keep-alive) configured. This is key to maximizing scalability. You did say that you were using SSL. If it were me, I’d use a load generator that more closely resembles the behavior of real users on a website; wrk2, Tsung, httperf, and Gatling are examples of some that do. Using JMeter with zero think time is a very common anti-pattern that doesn’t behave anything like real users. I think of it as the lazy performance tester pattern.

Imagine a real web server under heavy load from human beings. You will see thousands of concurrent connections but fewer concurrent requests in flight. With the JMeter zero-think-time model you are either creating new connections or reusing them: either you have a shitload of connections and your nginx process starts running out of file handles, or you are jamming requests down a single connection. Neither resembles reality.

If you are committed to using JMeter for some reason, then use more instances with real think times. Each instance’s connections will have a different source port.
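The sampling loop described above might look like the sketch below (my wording, not Peter's). It counts connections by TCP state against canned `netstat`-style lines so the logic is visible; the live loop is shown in a comment:

```shell
# Canned `netstat -ant` style lines (the TCP state is the 6th field).
sample='tcp        0      0 10.0.0.1:443   10.0.0.9:51000   ESTABLISHED
tcp        0      0 10.0.0.1:443   10.0.0.9:51001   ESTABLISHED
tcp        0      0 10.0.0.1:443   10.0.0.9:51002   TIME_WAIT'

# Count connections per TCP state.
states=$(printf '%s\n' "$sample" | awk '{ count[$6]++ } END { for (s in count) print s, count[s] }' | sort)
echo "$states"

# Live version, sampled every 5 seconds during the test run:
#   while sleep 5; do date +%T; netstat -ant | awk '{ c[$6]++ } END { for (s in c) print s, c[s] }'; done
```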

Sent from my iPhone

> On Apr 4, 2018, at 5:20 PM, John Melom <[hidden email]> wrote:
>
> [...]

Re: Nginx throttling issue?

nginx mailing list
Even though it shouldn't be reaching your limits, limit_req delays in 1-second increments, which sounds like it could be responsible for this. You should see error-log entries (severity warning) if this happens. Have you tried without the limit_req option? You can also use the nodelay option to avoid the delaying behavior.
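A sketch of that suggestion applied to the earlier fragment (same zone and location assumed; the upstream address is still the original redacted placeholder, with the scheme added since proxy_pass requires one):

```nginx
http {
    limit_req_zone $binary_remote_addr zone=perf:20m rate=10000r/s;

    server {
        location /myReq {
            # nodelay: requests within the burst are passed through
            # immediately instead of being delayed to pace the zone rate.
            limit_req zone=perf burst=600 nodelay;
            proxy_pass http://xxx.xxx.xxx.xxx;
        }
    }
}
```

The simplest diagnostic remains a comparison run with limit_req removed entirely: if the 1-second cluster persists, the cause is elsewhere.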



On Thu, Apr 5, 2018 at 6:45 AM, Peter Booth <[hidden email]> wrote:
John,

I think that you need to understand what is happening on your host throughout the duration of the test. Specifically, what is happening with the TCP connections. If you run netstat, grep for tcp, and do this in a loop every, say, five seconds, then you'll see how many connections get created at peak.
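The sampling loop Peter describes can be sketched as a small shell helper. The function name `summarize_states` is mine, not from the thread; the demonstration at the end uses canned netstat-style lines so the snippet runs anywhere.

```shell
# Count TCP connections per state; feed it `netstat -ant` (or `ss -ant`).
summarize_states() {
    awk '/^tcp/ {print $6}' | sort | uniq -c | sort -rn
}

# During a load test you would poll it, e.g.:
#   while sleep 5; do date; netstat -ant | summarize_states; done

# Demonstration on canned netstat-style output:
printf 'tcp 0 0 10.0.0.1:443 10.0.0.2:50001 ESTABLISHED\ntcp 0 0 10.0.0.1:443 10.0.0.2:50002 TIME_WAIT\ntcp 0 0 10.0.0.1:443 10.0.0.2:50003 ESTABLISHED\n' | summarize_states
```

Watching the ESTABLISHED and TIME_WAIT counts over the run shows whether connections are being churned or reused.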
If the thing you are testing exists in production then you are lucky. You can do the same in production and see what it is that you need to replicate.

You didn't mention whether you had persistent connections (HTTP keep-alive) configured. This is key to maximizing scalability. You did say that you were using SSL. If it were me, I'd use a load generator that more closely resembles the behavior of real users on a website. wrk2, Tsung, httperf, and Gatling are examples of some that do. Using JMeter with zero think time is a very common anti-pattern that doesn't behave anything like real users. I think of it as the lazy performance tester pattern.
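For the keep-alive point, enabling persistent upstream connections in an nginx reverse proxy looks roughly like this. A sketch only: the upstream name and address are illustrative, not taken from this thread.

```nginx
upstream swarm_app {
    server 10.0.0.10:8080;
    keepalive 64;                    # pool of idle upstream connections
}

server {
    location / {
        proxy_pass http://swarm_app;
        proxy_http_version 1.1;          # required for upstream keep-alive
        proxy_set_header Connection "";  # strip client's Connection header
    }
}
```

Without `proxy_http_version 1.1` and a cleared Connection header, nginx opens a fresh upstream connection per request.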

Imagine a real web server under heavy load from human beings. You will see thousands of concurrent connections but fewer concurrent requests in flight. With the JMeter zero-think-time model you are either creating new connections or reusing them, so either you have a huge number of connections and your nginx process starts running out of file handles, or you are jamming requests down a single connection, neither of which resembles reality.

If you are committed to using JMeter for some reason, then use more instances with real think times. Each instance's connection will have a different source port.

Sent from my iPhone

> On Apr 4, 2018, at 5:20 PM, John Melom <[hidden email]> wrote:
>
> Hi Maxim,
>
> I've looked at the nstat data and found the following values for counters:
>
>> nstat -az | grep -i listen
> TcpExtListenOverflows           0                  0.0
> TcpExtListenDrops               0                  0.0
> TcpExtTCPFastOpenListenOverflow 0                  0.0
>
>
> nstat -az | grep -i retra
> TcpRetransSegs                  12157              0.0
> TcpExtTCPLostRetransmit         0                  0.0
> TcpExtTCPFastRetrans            270                0.0
> TcpExtTCPForwardRetrans         11                 0.0
> TcpExtTCPSlowStartRetrans       0                  0.0
> TcpExtTCPRetransFail            0                  0.0
> TcpExtTCPSynRetrans             25                 0.0
>
> Assuming the above "Listen" counters provide data about the overflow issue you mention, there are no overflows on my system.  While retransmissions are happening, it doesn't seem that they are related to listen queue overflows.
>
>
> Am I looking at the correct data items?  Is my interpretation of the data correct?  If so, do you have any other ideas I could investigate?
>
> Thanks,
>
> John
>
> -----Original Message-----
> From: nginx [mailto:[hidden email]] On Behalf Of John Melom
> Sent: Tuesday, March 27, 2018 8:52 AM
> To: [hidden email]
> Subject: RE: Nginx throttling issue?
>
> Maxim,
>
> Thank you for your reply.  I will look to see if "netstat -s" detects any listen queue overflows.
>
> John
>
>
> -----Original Message-----
> From: nginx [mailto:[hidden email]] On Behalf Of Maxim Dounin
> Sent: Tuesday, March 27, 2018 6:55 AM
> To: [hidden email]
> Subject: Re: Nginx throttling issue?
>
> Hello!
>
>> On Mon, Mar 26, 2018 at 08:21:27PM +0000, John Melom wrote:
>>
>> I am load testing our system using Jmeter as a load generator.
>> We execute a script consisting of an https request executing in a
>> loop.  The loop does not contain a think time, since at this point I
>> am not trying to emulate a “real user”.  I want to get a quick look at
>> our system capacity.  Load on our system is increased by increasing
>> the number of Jmeter threads executing our script.  Each Jmeter thread
>> references different data.
>>
>> Our system is in AWS with an ELB fronting Nginx, which serves as a
>> reverse proxy for our Docker Swarm application cluster.
>>
>> At moderate loads, a subset of our https requests start experiencing
>> a 1 second delay in addition to their normal response time.  The
>> delay is not due to resource contention.
>> System utilizations remain low.  The response times cluster around 4
>> values:  0 milliseconds, 50 milliseconds, 1 second, and 1.050
>> seconds.  Right now, I am most interested in understanding and
>> eliminating the 1 second delay that gives the clusters at 1 second and
>> 1.050 seconds.
>>
>> The attachment shows a response time scatterplot from one of our runs.
>> The x-axis is the number of seconds into the run, the y-axis is the
>> response time in milliseconds.  The plotted data shows the response
>> time of requests at the time they occurred in the run.
>>
>> If I run the test bypassing the ELB and Nginx, this delay does not
>> occur.
>> If I bypass the ELB, but include Nginx in the request path, the delay
>> returns.
>>
>> This leads me to believe the 1 second delay is coming from Nginx.
>
> There are no magic 1 second delays in nginx - unless you've configured something explicitly.
>
> Most likely, the 1 second delay is coming from TCP retransmission timeout during connection establishment due to listen queue overflows.  Check "netstat -s" to see if there are any listen queue overflows on your hosts.
>
> [...]
>
> --
> Maxim Dounin
> http://mdounin.ru/
> _______________________________________________
> nginx mailing list
> [hidden email]
> http://mailman.nginx.org/mailman/listinfo/nginx
>
> ________________________________
> NOTE: This email message and any attachments are for the sole use of the intended recipient(s) and may contain confidential and/or privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you have received this e-mail in error, please contact the sender by replying to this email, and destroy all copies of the original message and any material included with this email.


Re: Nginx throttling issue?

Maxim Dounin
Hello!

On Fri, Apr 06, 2018 at 07:11:36PM +0200, Richard Stanway via nginx wrote:

>  Even though it shouldn't be reaching your limits, limit_req does delay in
> 1 second increments which sounds like it could be responsible for this. You

Delays as introduced by limit_req (again, only if explicitly
configured) use millisecond granularity.  In the particular case,
configured with rate=10000r/s and burst=600, the maximum possible
delay would be 60ms.
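The arithmetic behind that 60ms figure, as a quick sketch: with burst but without nodelay, the last queued request in a full burst waits burst/rate seconds before being served.

```python
# Worst-case queueing delay under limit_req with burst (no nodelay):
# the last of `burst` queued requests waits burst/rate seconds.
rate_rps = 10_000   # rate=10000r/s from the posted nginx.conf
burst = 600         # burst value referenced above

max_delay_ms = burst / rate_rps * 1000
print(f"max limit_req delay: {max_delay_ms:.0f} ms")  # 60 ms
```

Two orders of magnitude short of the observed 1 second delay, which is why the retransmission-timeout explanation fits better.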

--
Maxim Dounin
http://mdounin.ru/