SRX

SRX1400 - large number of hosts, SNAT allocation failure, TCP retransmission

  • 1.  SRX1400 - large number of hosts, SNAT allocation failure, TCP retransmission

    Posted 03-04-2018 08:39

    I'm kind of new here, but I've been studying SRX Junos for about 6 months. Sometimes I run into caveats, and this forum has been my support. I'm thankful for that and for the community's effort.

                However, there's a problem that I've been studying a lot:

                We, a public university, have a setup of two SRX1400s in HA with a 10 Gbps link and a 1 Gbps Internet connection. There are about 30 SNATs for different purposes, each with approximately 10 to 100 hosts/clients, and their Internet access is pretty good.

                However, there's one SNAT for a public Wi-Fi network that can easily reach 3000 hosts/clients, and its Internet access is really poor (~76% packet loss), while packet loss on the link between host/client and the SRX1400 gateway is 0%. The first problem was DNS over UDP: the queries didn't reach the outside DNS servers, so domain resolution failed first, and then TCP connections to external servers couldn't be established. So I brought an interface of our DNS server inside the network, and the DNS query success rate rose to 100%. That made the problem more "tangible".

                Next, I checked the CPU load (~0.30, ok), memory (~30% free, ok) and our NAT logs, and saw a lot of this message:

                RT_FLOW_SESSION_CLOSE: session closed source NAT allocation failure
                
                Another symptom is the large number of ACK retransmissions.
                
                So...
                
                First, I increased the aging timeout of the session flow:
                    set security flow aging early-ageout 20
                    
                But, no success.

                So I tried to understand how the SRX creates sessions and learned that there's a default destination-based limit of 128 concurrent sessions. I created a screen to raise this limit, adapting the instructions described here:
                    https://www.juniper.net/documentation/en_US/junos/topics/concept/denial-of-service-firewall-destination-based-session-limit-understanding.html
                    https://www.juniper.net/documentation/en_US/junos/topics/example/denial-of-service-firewall-destination-based-session-limit-setting-cli.html
                to increase the destination-based session limit in the INTERNAL_OPENWIFI zone, so that a large number of clients could access the same host at the "same" time.
                
                But I'm still getting these SNAT allocation failure messages (no success).
                
                The number of sessions is ~80000, with ~7000 invalidated sessions (I think this number is pretty high), but the session limit of the SRX is about 2^20 (1048576), so the number of sessions is way below the maximum (I think this is good).
                
                I have the impression that the SRX is doing some kind of WFQ (Weighted Fair Queuing) between the SNAT flows (INTERNAL_{Zone1|Zone2|...|ZoneN} -> UNTRUST), so it could be reserving the same bandwidth for SNATs with fewer hosts. However, I didn't find any source that confirms this or explains how to "tame" it, if it really exists.
                
                If someone could help me with this, it would help us and a lot of users 🙂


    #TCP
    #SNAT
    #HOSTS
    #retransmission
    #LARGE
    #number


  • 2.  RE: SRX1400 - large number of hosts, SNAT allocation failure, TCP retransmission
    Best Answer

    Posted 03-12-2018 10:12

    Hello guys,

    I found the solution to the problem and it's pretty straightforward 🙂

     

    Each source NAT has an IP address defined in its pool. Each IP address provides ~65535 ports (less 1024, reserved), and each entry in the NAT table consumes one of them. So, if you have an SNAT whose clients need more than ~65535 concurrent sessions, add another IP address to the pool.

     

    Ex: 3000 clients with 25 sessions each = 75000 sessions <- Not good, that is more than a single address can provide! Adding a second address gives you ~131070 (less 2048) entries in the NAT table, and your users will have normal access.
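
    The arithmetic above can be sketched in a few lines of Python. This is only a rough sizing aid, not anything the SRX itself runs: the function names are made up for illustration, and it assumes every concurrent session consumes exactly one port on one pool address, which is a simplification of how port translation really behaves.

```python
import math

PORTS_PER_IP = 65535    # approximate port range of one pool address
RESERVED_PORTS = 1024   # low ports treated as reserved, as in the post above


def usable_ports(pool_ips: int) -> int:
    """Approximate number of NAT table entries a pool of this size can hold."""
    return pool_ips * (PORTS_PER_IP - RESERVED_PORTS)


def pool_ips_needed(clients: int, sessions_per_client: int) -> int:
    """Smallest pool size whose usable ports cover the expected sessions.

    Assumption: one concurrent session uses one port.
    """
    sessions = clients * sessions_per_client
    return max(1, math.ceil(sessions / (PORTS_PER_IP - RESERVED_PORTS)))


if __name__ == "__main__":
    print(usable_ports(1))            # one address: 65535 - 1024
    print(usable_ports(2))            # two addresses: 131070 - 2048
    print(pool_ips_needed(3000, 25))  # the public Wi-Fi case from this thread
```

    With the numbers from this thread, 3000 clients at 25 sessions each need 75000 ports, so one address is not enough and the sketch asks for two.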