Backup routing instance ignored on ACX2200

    Posted 01-22-2021 06:35
    Hi Folks, 

    We've set up a fail-over scenario on an ACX2200, with OSPF on two upstream interfaces and a single client interface with the client's public prefix assigned.

    ge-0/0/0 is primary, ge-0/0/1 is backup and ge-0/0/2 is the client interface.

    ge-0/0/0 and ge-0/0/1 are both connected, with an OSPF adjacency established at all times on each and a default route received over each. ge-0/0/1 is in a backup routing instance, and RIB groups are used to move the default route and client prefix between the tables.
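
    A minimal sketch of the kind of configuration described above, not the exact config in use. The rib-group names (IF-TO-BACKUP, BACKUP-TO-MAIN), the virtual-router instance type and OSPF area 0.0.0.0 are assumptions for illustration. The instance name Backup and the interface names come from the outputs in this post. Whatever metric or policy decides which default route wins is omitted.

    ```
    # Primary upstream runs OSPF in the main instance (area is assumed)
    set protocols ospf area 0.0.0.0 interface ge-0/0/0.0

    # Backup upstream sits in its own routing instance (instance type is assumed)
    set routing-instances Backup instance-type virtual-router
    set routing-instances Backup interface ge-0/0/1.0
    set routing-instances Backup protocols ospf area 0.0.0.0 interface ge-0/0/1.0

    # Copy the direct routes of the main instance (including the client prefix
    # on ge-0/0/2.0) into Backup.inet.0 (rib-group name is hypothetical)
    set routing-options rib-groups IF-TO-BACKUP import-rib [ inet.0 Backup.inet.0 ]
    set routing-options interface-routes rib-group inet IF-TO-BACKUP

    # Copy OSPF routes learned in the Backup instance into inet.0
    # (rib-group name is hypothetical)
    set routing-options rib-groups BACKUP-TO-MAIN import-rib [ Backup.inet.0 inet.0 ]
    set routing-instances Backup protocols ospf rib-group BACKUP-TO-MAIN
    ```

    In each rib-group the first table listed in import-rib is the primary table, so it has to be the table of the instance where the group is applied.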

    This scenario works perfectly on the SRX300 devices we've deployed as client CPEs, but some clients need Gb throughput, so we've repurposed some ACX2200s for the task. On those, the failover just isn't working.

    It has been a bit of a journey so far. The routing tables have looked perfect all along: the routed prefix is in both the primary and Backup routing tables, and the default route moved accordingly as fail-over and fail-back situations were simulated. The PFE, however, was a different story. On the original firmware we were using (17.3R3.10), the Backup PFE table held only a default route (with the next hop being the upstream gateway) and no entry for the client prefix, the result being a TTL expiry as the packet bounced between this ACX and the secondary gateway. In other words, it looked like this (all IP addresses changed to private addresses for privacy):

    show route 10.251.1.2 
    
    inet.0: 561 destinations, 1117 routes (561 active, 0 holddown, 0 hidden)
    + = Active Route, - = Last Active, * = Both
    
    10.251.1.0/24      *[Direct/0] 00:00:13
                        > via ge-0/0/2.0
                        [OSPF/10] 00:00:12, metric 1003
                        > to 10.134.108.17 via ge-0/0/1.0
    
    Backup.inet.0: 562 destinations, 565 routes (562 active, 0 holddown, 0 hidden)
    + = Active Route, - = Last Active, * = Both
    
    10.251.1.0/24      *[Direct/0] 00:00:13
                        > via ge-0/0/2.0
                        [OSPF/10] 00:00:12, metric 1003
                        > to 10.134.108.17 via ge-0/0/1.0

    while the PFE looked like this:

    show pfe route ip prefix 10.251.1.2 
    
    Slot 0
    
    
    IPv4 Route Table 0, default.0, 0x80000:
    Destination                       NH IP Addr      Type     NH ID Interface
    --------------------------------- --------------- -------- ----- ---------
    10.251.1/24                                        Resolve   586 RT-ifl 325 ge-0/0/2.0 ifl 325
    
    
    IPv4 Route Table 5, Backup.5, 0x0:
    Destination                       NH IP Addr      Type     NH ID Interface
    --------------------------------- --------------- -------- ----- ---------
    default                           10.134.108.17    Unicast   579 RT-ifl 0 ge-0/0/1.0 ifl 330


    For completeness, this was the status during a simulated fail-over (ge-0/0/0 disabled):

    show route 10.251.1.2  
    
    inet.0: 560 destinations, 560 routes (560 active, 0 holddown, 0 hidden)
    + = Active Route, - = Last Active, * = Both
    
    10.251.1.0/24      *[Direct/0] 00:08:48
                        > via ge-0/0/2.0
    
    Backup.inet.0: 562 destinations, 562 routes (562 active, 0 holddown, 0 hidden)
    + = Active Route, - = Last Active, * = Both
    
    10.251.1.0/24      *[Direct/0] 00:08:48
                        > via ge-0/0/2.0
    
    show pfe route ip prefix 10.251.1.2  
    
    Slot 0
    
    
    IPv4 Route Table 0, default.0, 0x80000:
    Destination                       NH IP Addr      Type     NH ID Interface
    --------------------------------- --------------- -------- ----- ---------
    10.251.1/24                                        Resolve   586 RT-ifl 325 ge-0/0/2.0 ifl 325
    
    IPv4 Route Table 5, Backup.5, 0x0:
    Destination                       NH IP Addr      Type     NH ID Interface
    --------------------------------- --------------- -------- ----- ---------
    default                           10.134.108.17    Unicast   579 RT-ifl 0 ge-0/0/1.0 ifl 330
    
    In this scenario, pings to 10.251.1.1 (this device) are successful, but pings to 10.251.1.2 (the client *past* this device) fail with TTL expired as they bounce between 10.134.108.20 (this device) and 10.134.108.17 (the backup default gateway). Even though the route table looks good, the PFE output shows why this happens: the Backup table has no entry for 10.251.1/24, so the lookup falls through to the default route and the packet goes back upstream.
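
    For anyone wanting to reproduce the comparison, these are the views involved, listed from routing table down to PFE. The first and third commands are the ones used for the outputs in this post. The second, which shows the kernel forwarding table for the Backup instance, is an addition that was not part of the captures above, so treat it as a suggestion rather than something already checked here.

    ```
    # RIB view: matching routes in inet.0 and Backup.inet.0
    show route 10.251.1.2

    # Kernel forwarding-table view for the Backup instance
    # (not captured in this post; added as an assumption of what to check next)
    show route forwarding-table destination 10.251.1.2 table Backup

    # PFE view: what the hardware lookup will actually match
    show pfe route ip prefix 10.251.1.2
    ```

    If the prefix is present in the first two views but missing or different in the third, the discrepancy is between the routing engine and the PFE rather than in the rib-group configuration.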

    A firmware upgrade solved this problem, and we now have a route of type Table for the prefix in the Backup PFE table. Again, for completeness, this is the status now (in failover):

    show route 10.251.1.2 
    
    inet.0: 560 destinations, 560 routes (560 active, 0 holddown, 0 hidden)
    + = Active Route, - = Last Active, * = Both
    
    10.251.1.0/24      *[Direct/0] 00:05:21
                        > via ge-0/0/2.0
    
    Backup.inet.0: 562 destinations, 562 routes (562 active, 0 holddown, 0 hidden)
    + = Active Route, - = Last Active, * = Both
    
    10.251.1.0/24      *[Direct/0] 00:05:21
                        > via ge-0/0/2.0
    
    show pfe route ip prefix 10.251.1.2 
    
    Slot 0
    
    
    IPv4 Route Table 0, default.0, 0x80000:
    Destination                       NH IP Addr      Type     NH ID Interface
    --------------------------------- --------------- -------- ----- ---------
    10.251.1/24                                        Resolve   524 RT-ifl 325 ge-0/0/2.0 ifl 325
    
    IPv4 Route Table 5, Backup.5, 0x0:
    Destination                       NH IP Addr      Type     NH ID Interface
    --------------------------------- --------------- -------- ----- ---------
    10.251.1/24                                          Table     1 RT-ifl 0


    The problem, however, is that while we no longer get a TTL expiry on pings to the protected prefix, there are no responses either (although, again, the IP on ge-0/0/2 responds normally). Clients on IPs within the prefix have no connectivity, and I have no idea why not! Again, we have lots of these configurations on our SRX300 CPE devices and they work just fine.

    If anybody has managed to get this far and has anything at all they'd like me to try PLEASE let me know, I've hit a complete brick wall at this stage!

    Thanks for bearing with me!!


    ------------------------------
    Ciaran Kendellen
    ------------------------------