SRX

Traffic to node 1 is blocked when HA data plane is in active-active mode

  • 1.  Traffic to node 1 is blocked when HA data plane is in active-active mode

    Posted 05-01-2020 12:11

    Hi, all,

     

    Let me copy and paste this KB article, since it relates directly to my question:

    SUMMARY:
    This article explains why traffic that goes to node 1 is blocked when HA data plane is running in active-active mode, and source NAT pool (no port translation) contains only one IP address.
    
    SYMPTOMS:
    The Source Network Address Translation (NAT) in high availability (HA) was configured as follows:
    
    {primary:node0}[edit security nat]
    root# show | display set
    set security nat source pool 1 address 1.1.1.1/32
    set security nat source pool 1 port no-translation
    set security nat source rule-set 1 from zone untrust
    set security nat source rule-set 1 to zone trust
    set security nat source rule-set 1 rule 1 match source-address 0.0.0.0/0
    set security nat source rule-set 1 rule 1 then source-nat pool 1
    
    After committing, the following error was seen:
     
    {primary:node0}[edit]
    root# commit
    [edit security nat source pool 1]
    'port'
    warning: Ha data plane will be running in active-active mode, source NAT pool (no port translation) contains too few addresses(at least 2 addresses needed), traffic goes to node 1 will be BLOCKED!
    node0:
    configuration check succeeds
    node1:
    [edit security nat source pool 1]
    'port'
    warning: Ha data plane will be running in active-active mode, source NAT pool (no port translation) contains too few addresses(at least 2 addresses needed), traffic goes to node 1 will be BLOCKED!
    commit complete
    node0:
    commit complete
    
    Although the commit succeeded, traffic was blocked after the data RG (for example, RG1) failed over from Node0 to Node1.
    
    CAUSE:
    By default, an SRX chassis cluster runs its data plane in active-active mode. When IP-based source NAT is performed, the ports are divided equally between the two nodes for NAT purposes: 1-32k on node1 and 32k-65k on node0.
    
    However, in this case the pool contained a single IP address and port translation was disabled, so there were no ports to divide between the nodes. The pool can then only be distributed by address, which requires a minimum of two IP addresses so that each node receives one.
    
    SOLUTION:
    Add more IP addresses to the pool:
    
    set security nat source pool 1 address 1.1.1.0/30 <-- here
    set security nat source pool 1 port no-translation
    set security nat source rule-set 1 from zone trust
    set security nat source rule-set 1 to zone untrust
    set security nat source rule-set 1 rule 1 match source-address 0.0.0.0/0
    set security nat source rule-set 1 rule 1 then source-nat pool 1
    
    Or remove port no-translation from the pool:
    
    set security nat source pool 1 address 1.1.1.1/32
    delete security nat source pool 1 port no-translation <--here
    set security nat source rule-set 1 from zone trust
    set security nat source rule-set 1 to zone untrust
    set security nat source rule-set 1 rule 1 match source-address 0.0.0.0/0
    set security nat source rule-set 1 rule 1 then source-nat pool 1
    
    Or switch to static NAT:
    
    set security nat static rule-set 1 from zone untrust
    set security nat static rule-set 1 rule 1 match destination-address 1.1.1.1/32
    set security nat static rule-set 1 rule 1 then static-nat prefix 10.1.1.1/32

    We have a bunch of SIP trunks across the SRX. All of our SIP signaling servers and media servers use private IP addresses, and the SRX currently performs static NAT for all of them. (NAT-related SIP signaling/SDP issues are handled in software; the SIP ALG is intentionally turned off, so the SRX is not aware of SIP.) As the business grows, we are running out of public IP addresses, and I need to find a way to solve this. What I want to achieve is a SIP media load balancer on the SRX.

     

    Since SIP SDP allocates the media server IP and port for a SIP session without the SRX's awareness (there will be NO STUN involved), the SRX cannot change the source port when performing NAT (port no-translation). We can also give out only one IP address for the media servers, because with a NAT address pool the SRX would not dynamically know which public IP to source NAT with.

     

    I can change the software to start media first, so that the session is established on the SRX and return traffic flows to the right media server, and I can also make sure that no two backend media servers use the same source port. Even so, this "Traffic to node 1 is blocked when HA data plane is in active-active mode" warning is now a showstopper for me, because none of the workarounds is applicable to my situation.

     

    I am confused about why an active-active cluster is relevant. I am running an active-standby cluster, so shouldn't the SRX just use the whole 65K port range (in my case, with no port translation)? Or does this KB article apply only to active-active clusters, so that I can safely ignore the warning if I am running an active-standby HA cluster?



  • 2.  RE: Traffic to node 1 is blocked when HA data plane is in active-active mode
    Best Answer

    Posted 05-01-2020 13:55

    By default, the data plane of a chassis cluster runs in active-active mode, regardless of the user configuration.

    root@srx> show chassis cluster information detail
    node0:
    --------------------------------------------------------------------------
    Redundancy mode:
    Configured mode: active-active
    Operational mode: active-active

     

    To fix the issue, you can change the cluster redundancy mode to active-backup, as described in KB21263.
    - Both nodes need to be rebooted simultaneously after the config change.
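
    For reference, a minimal sketch of what that change might look like. The statement name and hierarchy below are an assumption recalled from KB21263, not something shown in this thread, and the statement may not appear in CLI completion on every release, so check the KB for your platform and Junos version before committing:

    ```
    # Assumed statement (verify against KB21263 before use)
    set chassis cluster redundancy-mode active-backup
    # Commit from the primary node so the change is synchronized to both nodes
    commit
    ```

    The new mode only takes effect after both nodes have been rebooted, so plan a maintenance window. The output below shows how to confirm the operational mode afterwards.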

    root@srx> show chassis cluster information detail | match "mode|node[01]:"
    node0:
    --------------------------------------------------------------------------
    Redundancy mode:
    Configured mode: active-backup
    Operational mode: active-backup
    node1:
    --------------------------------------------------------------------------
    Redundancy mode:
    Configured mode: active-backup
    Operational mode: active-backup

     

     



  • 3.  RE: Traffic to node 1 is blocked when HA data plane is in active-active mode

    Posted 05-01-2020 14:07

    Thank you so much. I only have one redundancy group (besides redundancy group 0, of course), but the reboot part is a bummer.