SRX

Ask questions and share experiences about the SRX Series, vSRX, and cSRX.

Host-bound traffic on SRX340 chassis cluster

  • 1.  Host-bound traffic on SRX340 chassis cluster

    Posted 07-13-2019 10:40

    Hello all,

     

    We have an SRX340 chassis cluster in active/active configuration with a few redundancy groups.

    reth1 consists of ge-0/0/3 and ge-5/0/3 and is configured as a trunk port for VLAN `test`. It is in redundancy group 1.

     

    VLAN `test` is bound to l3-interface irb.100, which has a layer 3 configuration:
    `family inet address 10.10.10.1`

     

    irb.100 is part of security zone `trust` that has `host-inbound-traffic system-services ping` configured.

     

    We have link monitoring in place that makes sure that redundancy group 1 fails over to node1 when the link fails.

    Failover works as expected: redundancy group 1 becomes primary on node1 when I, for example, disconnect the ge-0/0/3 link.
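
    For readers reproducing this setup, link monitoring of this kind is usually expressed with `interface-monitor` statements under the redundancy group. The following is a minimal sketch only: the weight of 255 and the monitoring of ge-5/0/3 are assumptions, since the attached configuration is not reproduced in this thread. Lines beginning with `#` are annotations.

    ```
    # Assumed weights: 255 equals the default failover threshold,
    # so a single monitored link going down triggers a failover of group 1.
    set chassis cluster redundancy-group 1 interface-monitor ge-0/0/3 weight 255
    set chassis cluster redundancy-group 1 interface-monitor ge-5/0/3 weight 255
    ```

    The resulting state can then be checked per redundancy group with `show chassis cluster status redundancy-group 1` and `show chassis cluster interfaces`.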

     

    However, pings are not answered while redundancy group 1 is active on the secondary node (node1). As soon as an RE failover (redundancy group 0) occurs and node1 becomes the primary node in the cluster, pings are answered again.

     

    Is this by design? It is a major issue for us because we have routed L3 traffic going through these ports, and this would basically force us to do a full RE failover whenever any of the links fails.

     

    Thanks a lot in advance!

     

    Pascal

     

     



  • 2.  RE: Host-bound traffic on SRX340 chassis cluster

     
    Posted 07-13-2019 13:13

    Hi Pascal,

     

    Off the top of my head, there isn't a known problem with active-active/Z-mode traffic.

     

    When you ping, do you see any interface error counters increment in the 'show interfaces extensive' output for this irb?

     

    Alternatively, can you see the pings in the output of 'monitor traffic interface <ifname>'? Please also try this on the individual child ports.
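
    As a rough sketch, the two checks above could be run as follows. The interface names are taken from the first post; the `matching "icmp"` filter and `no-resolve` are additions to keep the capture readable and are not part of the original request. Lines beginning with `#` are annotations.

    ```
    # Error counters on the IRB interface
    show interfaces irb.100 extensive

    # Capture on the IRB and then on each child port of reth1
    monitor traffic interface irb.100 matching "icmp" no-resolve
    monitor traffic interface ge-0/0/3 matching "icmp" no-resolve
    monitor traffic interface ge-5/0/3 matching "icmp" no-resolve
    ```

    Note that `monitor traffic` only shows packets that reach the Routing Engine, so an empty capture does not by itself prove that the packet never arrived at the port.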

     

    Cheers

    Pooja

    Please Mark My Solution Accepted if it Helped, Kudos are Appreciated too!!!



  • 3.  RE: Host-bound traffic on SRX340 chassis cluster

    Posted 07-13-2019 13:53

    Hi Pooja,

     

    Thanks a lot for your response. I forgot to mention that I was on 15.1X49-D170 and have now upgraded to 15.1X49-D180, with no luck.

    I browsed through the release notes, but couldn't find any issues either.

     

    I checked on my test device and am actually getting ARP responses from the SRX, but the ICMP requests are not answered.

    I also tested whether traffic (ping) flows between two devices on different reth interfaces within the `test` VLAN. It works fine if both are connected to the same chassis and fails if they are not.

     

    On the irb.100 interface, the Self and ICMP packet counters increase and the error counters are all at 0. There is still no response when the host is connected to the secondary chassis.

     

    Cheers,

    Pascal



  • 4.  RE: Host-bound traffic on SRX340 chassis cluster

     
    Posted 07-13-2019 14:07

    Thank you for letting me know, Pascal. The only known limitation when it comes to Z-mode is IPsec VPN traffic in a dual active-backup scenario.

     

    The other test you performed, with echo requests between two different devices in the same `test` VLAN, might be far more useful.

     

    Can you do/provide the following please?

     

    1. Share the interface configuration for the reth.x and irb.x interfaces that the traffic transits in this new test

    2. Share the VLAN-related configuration and the 'show vlans' output for this `test` VLAN

    3. Collect a flow trace while you run this test

     

    Limit the test to about 5 echo requests so that the trace data doesn't get overwritten.

     

    Cheers

    Pooja




  • 5.  RE: Host-bound traffic on SRX340 chassis cluster

    Posted 07-13-2019 14:29

    Thanks again. I'll need to investigate how to collect the flow trace, but will provide it soon.

    Please find attached the (stripped down, I tried to leave everything relevant in) configuration. Maybe that provides some hints already?

     

    `show vlans` didn't give any interesting output (I think). The only thing I noticed is that when I tried to run it on the secondary node, it told me that "the l2-learning subsystem is not running", but I suppose that's expected on a secondary? It was working fine on the primary.

    Attachment(s)

    txt
    config.txt   4 KB 1 version


  • 6.  RE: Host-bound traffic on SRX340 chassis cluster

     
    Posted 07-13-2019 14:38

    Hi Pascal,

     

    That would mean the L2 learning daemon is not running.

     

    Could you check whether restarting the l2-learning service is a quick fix?

     

    If not, do follow the instructions I shared for the traces.

     

    Cheers

    Pooja




  • 7.  RE: Host-bound traffic on SRX340 chassis cluster
    Best Answer

    Posted 07-13-2019 14:38

    Off the top of my head, you have configured a chassis cluster with switching but only with a layer 3 fabric. To use switching in a cluster, you also need separate interfaces for the "switch fabric" (swfab0 + swfab1). That could be the reason for the behaviour you are seeing.

     

    If you don't actually need switching on the SRX gateways, I would suggest converting the configuration to logical units with vlan-tagging on reth1:

     

    set interfaces reth1 unit 100 vlan-id 100

    set interfaces reth1 unit 100 family inet address 10.101.41.1/24

    set security zones security-zone TEST interfaces reth1.100

    etc...
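
    If switching on the cluster is required, the switching fabric mentioned above is configured by assigning one dedicated physical port per node to swfab0 and swfab1. A minimal sketch, assuming ge-0/0/4 (node0) and ge-5/0/4 (node1) are free and cabled to each other; these port names are placeholders and do not come from the thread. Lines beginning with `#` are annotations.

    ```
    # swfab0 carries the switching fabric on node0, swfab1 on node1.
    # Port names below are hypothetical; pick unused ports on each node.
    set interfaces swfab0 fabric-options member-interfaces ge-0/0/4
    set interfaces swfab1 fabric-options member-interfaces ge-5/0/4
    ```

    After committing, `show chassis cluster ethernet-switching interfaces` should list both swfab interfaces and their member ports.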

     



  • 8.  RE: Host-bound traffic on SRX340 chassis cluster

    Posted 07-13-2019 14:53

    @jonashauge wrote:

    Off the top of my head, you have configured a chassis cluster with switching but only with a layer 3 fabric. To use switching in a cluster, you also need separate interfaces for the "switch fabric" (swfab0 + swfab1). That could be the reason for the behaviour you are seeing.


    Hi Jonas,

     

    Thank you so much for the hint, this has indeed solved the issue!

    Spent the entire day trying to figure this out and feel a bit dumb now.

    I hadn't found instructions on swfab in the manual when setting up the device. 

     

    Huge thanks to Pooja as well for the super helpful debugging instructions.



  • 9.  RE: Host-bound traffic on SRX340 chassis cluster

     
    Posted 07-13-2019 14:58

    Thanks for letting us know Pascal!

     

    Have a good one

    Pooja



  • 10.  RE: Host-bound traffic on SRX340 chassis cluster

    Posted 07-13-2019 14:42

    Thanks for the capture instructions!

     

    I checked the flow sessions in the meantime, and they look perfectly normal to me, with one exception: when I connect my test host to the secondary, the number of flow sessions increases over time. I ended up seeing 10+ sessions after a little while.

     

    Command output attached.

    Attachment(s)

    txt
    flow-sessions.txt   3 KB 1 version


  • 11.  RE: Host-bound traffic on SRX340 chassis cluster

     
    Posted 07-13-2019 14:08

    Pascal,

     

    Here's how you configure flow traceoptions:

    Refer to https://kb.juniper.net/KB16108
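
    The linked article is not reproduced here. As a rough sketch, flow traceoptions restricted to the ping test typically look like the following. The file name `flow-trace`, the filter names, and the test host address 10.10.10.50 are placeholders and are not taken from the thread; 10.10.10.1 is the irb.100 address from the first post. Lines beginning with `#` are annotations.

    ```
    # Write the trace to /var/log/flow-trace (file name is an assumption)
    set security flow traceoptions file flow-trace
    set security flow traceoptions flag basic-datapath

    # One filter per direction so only the test pings are traced
    # (10.10.10.50 is a hypothetical test host)
    set security flow traceoptions packet-filter ping-out source-prefix 10.10.10.50/32 destination-prefix 10.10.10.1/32
    set security flow traceoptions packet-filter ping-back source-prefix 10.10.10.1/32 destination-prefix 10.10.10.50/32
    ```

    After the test, the trace can be read with `show log flow-trace`, and the tracing should be removed again with `delete security flow traceoptions` once the data has been collected.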

     

    Cheers

    Pooja



  • 12.  RE: Host-bound traffic on SRX340 chassis cluster

     
    Posted 07-13-2019 14:10

    Also Pascal,

     

    What does the output of 'show security flow session source-prefix <icmp src-ip>' look like?

    I am interested in seeing the entries for both node0 and node1 for this command.

     

    Cheers

    Pooja