
What are some very rough estimates of when it makes sense to look at these low-level network settings when scaling an application? I assume the default settings are good enough for moderate loads, but at what point does this stuff become a bottleneck?

Are the default settings here reasonable for most cases, or are they more like something you should tune even if you're not really pushing any limits?



My NGINX webserver configuration on AWS behind an ALB is:

/etc/sysctl.conf:

    net.core.wmem_max = 12582912
    net.core.rmem_max = 12582912
    net.ipv4.tcp_rmem = 10240 87380 12582912
    net.ipv4.tcp_wmem = 10240 87380 12582912
    fs.file-max = 1000000
    net.ipv4.ip_local_port_range = 1024 65535
    net.ipv4.tcp_tw_recycle = 1
    net.ipv4.tcp_tw_reuse = 1
    net.ipv4.tcp_max_syn_backlog = 262144
    net.ipv4.tcp_syncookies = 0
    net.ipv4.tcp_fin_timeout = 3
    net.ipv4.tcp_syn_retries = 2
    net.ipv4.tcp_synack_retries = 2
    net.ipv4.tcp_no_metrics_save = 1
    net.ipv4.tcp_max_orphans = 262144
    net.core.somaxconn = 1000000
nginx.conf (just the relevant directives):

    worker_rlimit_nofile 102400;

    events {
      worker_connections 102400;
      multi_accept on;
    }

    http {
        server {
          listen 80 default_server reuseport backlog=102400;
          ...
        }    
    }
As you can see, the socket and backlog-related values have been cranked way up, and I've never had any problems with this configuration. Because these servers are behind an ALB, I don't know how relevant some of these settings are, since the SYN/SYN-ACK round trips happen between the server and the load balancer, not the remote clients. I could be wrong, and maybe there's something I'm missing, but I've never had any performance problems related to TCP connections in the kernel or NGINX.
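
If you want to check whether those backlog-related settings ever actually come into play, the kernel exposes counters for SYN drops and listen-queue overflows. A rough way to look, assuming Linux with iproute2 and net-tools installed (port 80 here is just the example from the config above):

    # Recv-Q vs. Send-Q on a listening socket = current accept-queue depth vs. configured backlog
    ss -lnt 'sport = :80'

    # Cumulative counters; if these grow, the backlog/SYN tuning is doing real work
    netstat -s | grep -i listen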


I think with an ALB you'll see pooled connections (HTTP keep-alive or HTTP/2), so I would expect the number of TCP connections to stay pretty low. With HTTP/2 it could theoretically be as low as one.
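
One quick way to sanity-check that, assuming the backends listen on port 80, is to count the established connections on the web server itself:

    # Established TCP connections to the local web server (header line skipped)
    ss -tn state established '( sport = :80 )' | tail -n +2 | wc -l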


Generally, those settings start to hurt somewhere in the single-digit thousands of connections per second (per process). I'd say it's much more important to start monitoring those numbers once you reach single-digit hundreds of connections per second than to set a fixed point at which to act. (Hundreds of connections per second is a pretty normal "just got traction" level, so if you see steady usage around there, monitor.)

Of course, YMMV. High latency networks reduce those numbers.

Anyway, I don't see why the numbers aren't 100 times larger by default, but there's probably a reason.
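
If you want a rough connections-per-second figure to compare against those thresholds, here's a minimal sketch; it assumes PassiveOpens is the seventh field on the Tcp line of /proc/net/snmp, which holds on current Linux kernels:

    # New inbound TCP connections accepted over one second (delta of the PassiveOpens counter)
    a=$(awk '/^Tcp: [0-9]/ {print $7}' /proc/net/snmp); sleep 1
    b=$(awk '/^Tcp: [0-9]/ {print $7}' /proc/net/snmp); echo "$((b - a)) connections/second"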



