Is ECMP supposed to work on SRX cluster?

1. Is ECMP supposed to work on SRX cluster?

0 Recommend
p.k
Posted 07-03-2011 00:20

Hi All,

I'm trying to set up ECMP (per-flow load balancing) to 2 different uplinks in
a chassis cluster in my lab. The links are

se-1/0/0 - on node0
se-5/0/1 - on node1

I have a static route with two next hops and a policy that enables load balancing:

set routing-options static route 0.0.0.0/0 next-hop 172.18.1.1
set routing-options static route 0.0.0.0/0 next-hop 172.18.2.1
set routing-options forwarding-table export balance
set policy-options policy-statement balance then load-balance per-packet

In forwarding table, both next hops are seen,

lab@jsrx# run show route forwarding-table
Routing table: default.inet
Internet:
Destination        Type RtRef Next hop           Type Index NhRef Netif
default            user     0                    ulst 262142     2
                              ff.3.0.21          ucst   549     2 se-1/0/0.0
                              ff.3.0.21          ucst   564     3 se-5/0/1.0

However, all my transit sessions go out through se-1/0/0 (tried with ping and telnet):

Session ID: 296, Policy name: default-policy/2, State: Active, Timeout: 1794, Valid
In: 172.20.100.10/54872 --> 172.31.15.1/23;tcp, If: reth0.100, Pkts: 53, Bytes: 2945
Out: 172.31.15.1/23 --> 172.20.100.10/54872;tcp, If: se-1/0/0.0, Pkts: 41, Bytes: 2757

Session ID: 300, Policy name: default-policy/2, State: Active, Timeout: 1792, Valid
In: 172.20.100.10/61428 --> 172.31.15.1/23;tcp, If: reth0.100, Pkts: 36, Bytes: 2046
Out: 172.31.15.1/23 --> 172.20.100.10/61428;tcp, If: se-1/0/0.0, Pkts: 29, Bytes: 2055

Session ID: 329, Policy name: default-policy/2, State: Active, Timeout: 1784, Valid
In: 172.20.100.10/62054 --> 172.31.15.1/23;tcp, If: reth0.100, Pkts: 10, Bytes: 547
Out: 172.31.15.1/23 --> 172.20.100.10/62054;tcp, If: se-1/0/0.0, Pkts: 9, Bytes: 529

By the way, show route shows

0.0.0.0/0          *[Static/5] 00:57:57
                      to 172.18.1.1 via se-1/0/0.0
                    > to 172.18.2.1 via se-5/0/1.0

Does this mean that ECMP is not supported on a chassis cluster? I haven't found any
indication of that in the documentation, however. This is Junos 10.2R3.10.

#ECMP
2. RE: Is ECMP supposed to work on SRX cluster?
Best Answer

1 Recommend
Erdem
Posted 07-03-2011 02:12

ECMP on a cluster is indeed a special case. I don't know of any documentation about it, but when I tried it I observed similar behavior. As far as I can remember, if there are two active routes for the same destination, the SRX prefers the one with a local interface rather than crossing the FAB link.
This would make sense for A/A deployments; otherwise the FAB link would become the bottleneck.

You could test this theory by performing a failover. If I'm right, the sessions should then use se-5/0/1 instead.
3. RE: Is ECMP supposed to work on SRX cluster?

0 Recommend
p.k
Posted 07-03-2011 02:37

Hi motd,

Thanks a lot for your reply. I tried the RG failover, but the routing did not
fail over, and traffic now traverses the fabric link and leaves through se-1/0/0.

I've failed over both RGs:

lab@jsrx> show chassis cluster status
Cluster ID: 1
Node                  Priority          Status    Preempt Manual failover

Redundancy group: 0 , Failover count: 1
    node0                   200         secondary      no       yes
    node1                   255         primary        no       yes

Redundancy group: 1 , Failover count: 5
    node0                   200         secondary      yes      yes
    node1                   255         primary        yes      yes

And sessions are still leaving through node0:

{primary:node1}
lab@jsrx> show security flow session
node0:
--------------------------------------------------------------------------

Session ID: 2425, Policy name: default-policy/2, State: Active, Timeout: 1624, Valid
In: 172.20.100.10/50561 --> 172.31.15.1/23;tcp, If: reth0.100, Pkts: 0, Bytes: 0
Out: 172.31.15.1/23 --> 172.20.100.10/50561;tcp, If: se-1/0/0.0, Pkts: 0, Bytes: 0

Session ID: 2437, Policy name: default-policy/2, State: Active, Timeout: 1790, Valid
In: 172.20.100.10/55822 --> 172.31.15.1/23;tcp, If: reth0.100, Pkts: 0, Bytes: 0
Out: 172.31.15.1/23 --> 172.20.100.10/55822;tcp, If: se-1/0/0.0, Pkts: 0, Bytes: 0
Total sessions: 2

node1:
--------------------------------------------------------------------------

Session ID: 548, Policy name: default-policy/2, State: Backup, Timeout: 1622, Valid
In: 172.20.100.10/50561 --> 172.31.15.1/23;tcp, If: reth0.100, Pkts: 28, Bytes: 1611
Out: 172.31.15.1/23 --> 172.20.100.10/50561;tcp, If: se-1/0/0.0, Pkts: 23, Bytes: 1664

Session ID: 593, Policy name: default-policy/2, State: Backup, Timeout: 1790, Valid
In: 172.20.100.10/55822 --> 172.31.15.1/23;tcp, If: reth0.100, Pkts: 24, Bytes: 1399
Out: 172.31.15.1/23 --> 172.20.100.10/55822;tcp, If: se-1/0/0.0, Pkts: 20, Bytes: 1490
Total sessions: 2

It looks like the SRX cluster always uses the first entry from the forwarding table
and ECMP is not working:

lab@jsrx> show route forwarding-table
Routing table: default.inet
Internet:
Destination        Type RtRef Next hop           Type Index NhRef Netif
default            user     0                    ulst 262142     2
                              ff.3.0.21          ucst   549     2 se-1/0/0.0
                              ff.3.0.21          ucst   564     2 se-5/0/1.0

Any other ideas (except Junos upgrade)?
4. RE: Is ECMP supposed to work on SRX cluster?

0 Recommend
Erdem
Posted 07-03-2011 03:24

The sessions still appear to be active on node0 and backup on node1, so it looks like they were created while node0 was primary. Have you tried setting up new telnet sessions after the failover?

I'll see if I can find some of my old lab notes. We never implemented this because the customer wanted more granular control over which link was used, and we ended up with FBF+VRs.
5. RE: Is ECMP supposed to work on SRX cluster?

0 Recommend
p.k
Posted 07-03-2011 05:15

Hi

These were new sessions, and they always go to node0's se-1/0/0.

I've tried upgrading to 11.1R3 now and it is the same, ECMP not working,
everything goes to se-1/0/0.

I'm adding another record to my list of things I don't understand about
SRX routing. Some other things, by the way, are:

- How is the routing table consulted for the reverse route during session init (routing instances, zones, etc - what matters?)
- How exactly do routing changes propagate into the session

In many cases it "just works", but it would be good to have a detailed
explanation, because you never know in which case it "just won't work"
(as with this ECMP question)...
6. RE: Is ECMP supposed to work on SRX cluster?

1 Recommend
Erdem
Posted 07-03-2011 06:25

That's interesting. I'll see what I can find tomorrow; I'm quite sure this was working in my lab setup. But that could have been with reth interfaces (we didn't use non-reth interfaces because before 10.2R3 we experienced packet loss on the FAB link).

All good questions; Juniper should write some KB articles about all this. This is where the SRX differs from the routers: it is stateful, and because you can also use per-packet filters on the SRX (for example for FBF), this can get really confusing.

How is the routing table consulted for the reverse route during session init (routing instances, zones, etc - what matters?)
Each interface is in exactly one virtual-router. Either the default one or a routing-instance of type virtual-router. When the first packet of a session arrives on an interface, a route lookup is performed in the corresponding table, for both source and destination.

Say you have something like this:
Client ---- [reth1] (Client-VR) (Server-VR) [reth2] ---- Server

If the client sets up a connection to the server, a destination route lookup is performed in Client-VR and a session is created. When the server responds, the SRX performs a session lookup, finds the existing session, and looks up the reverse route in Client-VR again. Server-VR doesn't even need a route back to the client. Any FBF filters you may have applied on reth2 are ignored as well!

I use this behavior a lot to connect multiple ISPs to the SRX, each with its own PA address space. If a client connects to an IP from ISP1, it is important that the response is routed back through that same ISP because ISP2 would simply drop the traffic. So place each ISP in its own virtual-router and everything will work fine.
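
As a rough sketch of the per-ISP layout described above, something along these lines could be used. All instance names, interface names, and addresses here are hypothetical placeholders (the addresses are documentation prefixes), not taken from the setup in this thread. How client traffic is steered into each virtual-router (FBF, rib-groups, and so on) is deliberately left out.

```
# Hypothetical interfaces and addresses: one uplink per ISP
set interfaces ge-0/0/1 unit 0 family inet address 192.0.2.2/30
set interfaces ge-0/0/2 unit 0 family inet address 198.51.100.2/30

# Each ISP uplink lives in its own virtual-router with its own default route
set routing-instances ISP1-VR instance-type virtual-router
set routing-instances ISP1-VR interface ge-0/0/1.0
set routing-instances ISP1-VR routing-options static route 0.0.0.0/0 next-hop 192.0.2.1
set routing-instances ISP2-VR instance-type virtual-router
set routing-instances ISP2-VR interface ge-0/0/2.0
set routing-instances ISP2-VR routing-options static route 0.0.0.0/0 next-hop 198.51.100.1
```

With this split, a connection arriving on the ISP1 uplink has its reverse route looked up in ISP1-VR, so the reply should leave through the same ISP.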

How exactly do routing changes propagate into the session
We have had quite a few discussions about this and it is on my list of things to test. Here are some things I do know:
- the route can't change to another egress zone. That would require a new policy lookup which simply doesn't happen.
- if the original route is still valid, nothing changes and the traffic is still routed that way. I often change route preferences to redirect traffic and this only affects new sessions, existing sessions still use the old path.
Sometimes this can be a problem. This is a good example: http://geogeeks.net/blog/2010/12/juniper-srx-udp-problem/

Questions still to be answered:
- Can a session fail over from one interface to another if both interfaces are in the same zone?
- Can a session fail over from an Ethernet interface to a VPN interface? ScreenOS always had a problem with this, but it's possible in the more recent versions.

I need more time for labs 🙂
7. RE: Is ECMP supposed to work on SRX cluster?

0 Recommend
p.k
Posted 07-03-2011 11:33

Hi

Thanks for the info. Yes, a lot of lab testing is still needed 🙂
Please update the thread if you get more info.

And I forgot to add another "don't understand" point. The book "Junos Security" says that
asymmetric routing is 100% supported on SRX, but gives no details. Again, is that true
in general case (different zones/routing instances)? In what exact cases it will or will
not work?
8. RE: Is ECMP supposed to work on SRX cluster?

0 Recommend
Erdem
Posted 07-03-2011 12:21

And I forgot to add another "don't understand" point. The book "Junos Security" says that
asymmetric routing is 100% supported on SRX, but gives no details. Again, is that true
in general case (different zones/routing instances)? In what exact cases it will or will
not work?
The main restriction is that the zones need to be the same. Asymmetric routing sometimes occurs if you have two BGP peers. You never know which route is used by the client so a request may be received on interface1 but your reverse route may point to interface2. If both these interfaces are in the same zone, that will work. If they are in different zones, the response packets will be dropped with a "zone mismatch" error.
This also implies that both interfaces need to be in the same routing instance as a zone can only be used in a single instance (I really wish they would remove that limitation).
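
A minimal sketch of the "same zone" condition, assuming two hypothetical uplink interfaces (ge-0/0/1.0 and ge-0/0/2.0) and a zone named untrust; none of these names come from the thread:

```
# Both uplinks are bound to the same security zone, so a request that
# arrives on one and whose reply is routed out the other still matches
# the session's zone
set security zones security-zone untrust interfaces ge-0/0/1.0
set security zones security-zone untrust interfaces ge-0/0/2.0
```

If the two interfaces were placed in different zones instead, this is the case where the "zone mismatch" drops described above would be expected.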
9. RE: Is ECMP supposed to work on SRX cluster?

0 Recommend
Erdem
Posted 07-05-2011 02:20

@motd wrote:
And I forgot to add another "don't understand" point. The book "Junos Security" says that
asymmetric routing is 100% supported on SRX, but gives no details. Again, is that true
in general case (different zones/routing instances)? In what exact cases it will or will
not work?
The main restriction is that the zones need to be the same. Asymmetric routing sometimes occurs if you have two BGP peers. You never know which route is used by the client so a request may be received on interface1 but your reverse route may point to interface2. If both these interfaces are in the same zone, that will work. If they are in different zones, the response packets will be dropped with a "zone mismatch" error.
This also implies that both interfaces need to be in the same routing instance as a zone can only be used in a single instance (I really wish they would remove that limitation).
And saying this limitation wasn't there in the first junos-es releases... still wondering why they removed it.

Grtz,
Frac
10. RE: Is ECMP supposed to work on SRX cluster?

0 Recommend
p.k
Posted 07-05-2011 12:44

Hi motd, Hi Frac,

Thanks for your replies. Regarding asymmetric routing, do you know if it
works on high-end SRX (3k/5k)? For example, packet comes in NPC1, leaves
through NPC2. The return packet is routed back through NPC3. Assume zone is
the same. The doc says that session is installed in incoming and outgoing
NPC, so it will be 1 and 2, and NPC3 will not know about the session and
drop the traffic. Right? Unfortunately I can't check this in lab...
11. RE: Is ECMP supposed to work on SRX cluster?

1 Recommend
Erdem
Posted 07-09-2011 06:17

After performing some tests, I have to correct what I wrote earlier. Route changes do affect sessions, but only if no routing-instances are involved.
If a session is established over interface X and a routing change then sends the traffic out via interface Y, the following happens:
- If X and Y are in the same zone => no problem. All traffic is now sent out via interface Y and, as expected, the session can still be matched. Note that in "show security flow session" the egress interface still shows X even though the traffic is now routed through interface Y.
- If X and Y are in different zones => the egress packets are dropped with the message "packet dropped, pak dropped since re-route failed". The session remains in the session table and all subsequent packets are dropped with the same error message. If the "route-change-timeout" setting is configured, the session timeout is reduced to that value as soon as this error is encountered. The timeout doesn't change when the routing changes, only when traffic is seen.
- If the session exits through another routing-instance, route changes do NOT affect it. This is why in our setup I can change the default route to another ISP without losing the already established sessions. New sessions follow the new route; the existing ones still use the original route.
I couldn't find a way to cause the session to be re-routed by changing routes in the client-side or server-side routing-instances.
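
For reference, the "route-change-timeout" setting mentioned above is configured under the flow options. The value below is an arbitrary example, not one taken from these tests:

```
# Age out sessions whose route changed to an unusable path after 10 seconds
# (10 is only an illustrative value)
set security flow route-change-timeout 10
```

Going by the test results above, this should only shorten the lifetime of sessions that hit the re-route failure once traffic is seen; it would not make a session move to an interface in a different zone.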
Regarding asymmetric routing, do you know if it
works on high-end SRX (3k/5k)? For example, packet comes in NPC1, leaves
through NPC2. The return packet is routed back through NPC3. Assume zone is
the same. The doc says that session is installed in incoming and outgoing
NPC, so it will be 1 and 2, and NPC3 will not know about the session and
drop the traffic. Right? Unfortunately I can't check this in lab...
Don't know about that one. I would assume that because NPC3 doesn't know about the session, the packet is sent to the CP, which knows about the session and installs it in NPC3 as well. But I don't have the lab equipment available to test this either. I know there is a detailed explanation of this in the courseware (I think in AJSEC), but I don't have access to it right now. I wish I could get it in PDF.
12. RE: Is ECMP supposed to work on SRX cluster?

0 Recommend
p.k
Posted 07-10-2011 03:43

Hi motd,

Thanks a lot for those lab results, this is what I really wanted to know but was too lazy (or busy?)
to test myself...

I've got the AJSEC book. It has some details about packet processing in the Appendix, but
I could not find a clear answer to the asymmetric routing question. It is not obvious to me
whether the return packet from a different NPC will be sent to the CP, and whether the CP will be able to
understand that it belongs to an existing session and that the new NPC should be programmed
for the same session...
13. RE: Is ECMP supposed to work on SRX cluster?

0 Recommend
Erdem
Posted 01-14-2012 22:42

pk, really nice and informative discussion ...

regards
14. RE: Is ECMP supposed to work on SRX cluster?

0 Recommend
Erdem
Posted 04-17-2012 21:23

http://kb.juniper.net/InfoCenter/index?page=content&id=KB23417

SRX supports ECMP flow-based forwarding after 12.1.
15. RE: Is ECMP supposed to work on SRX cluster?

0 Recommend
p.k
Posted 04-18-2012 03:06

Hi JJJ

Thanks for updating this old thread. Can you confirm that this is also
working on the cluster? Sorry but I lost too much blood with this problem
so I will not mark thread as "solved" until I'm absolutely sure it works...
16. RE: Is ECMP supposed to work on SRX cluster?

0 Recommend
p.k
Posted 03-12-2013 02:45

Hi All

Just an update: I tested ECMP with a cluster running 12.1X44-D10.4. It works if I have several uplinks on one node, but for uplinks connected to different nodes there is no load balancing.

However, if load balancing is turned on (export policy to forwarding table, etc.), outgoing traffic seems to choose the incoming node's uplink, so fabric link forwarding is minimized this way.

So I will mark Motd's first answer as a solution for now.
17. RE: Is ECMP supposed to work on SRX cluster?

0 Recommend
asbestos-muffin
Posted 05-10-2013 18:37

Hi, All.

Given that the SRX creates the session ingress/egress interfaces based on the forwarding table, as shown in another thread, how would ECMP influence it now?

If reth0 and reth1 are both ECMP default routes, is there a guaranteed behavior that sessions created for incoming connections at reth0 would not use reth1 for return traffic because of ECMP?

I am migrating our design to multiple VRs for ISPs and came across the KB for ECMP support on 12.1. It would be great if someone could confirm the SRX behavior in this use case.

- asbestos-muffin
18. RE: Is ECMP supposed to work on SRX cluster?

0 Recommend
sm_mc
Posted 10-10-2019 11:19

I know this is a very old thread, but I have recently been facing the very same problem on a clustered SRX1500 running the JTAC-recommended 15.1X49-D170.4. I am updating the thread because I may have found a workable solution, which I have been testing.

The topology in the thread aims to ECMP traffic over dual active ISP interfaces on an A/A SRX cluster. With physical interfaces, the traffic egresses through the primary route interface even though both next-hop addresses are available in the forwarding table.

After moving the physical interfaces to ethernet-switching as members of a VLAN, with the VLAN l3-interface as the ISP IP end-point, ECMP load-balancing does work. I assume this is because the VLAN exists on both units regardless of the location of the physical link, and routing over the VLAN inherently sends the ECMP traffic over the FAB link.

#show vlans | display set

set vlans ISP1 vlan-id 901
set vlans ISP1 l3-interface irb.901
set vlans ISP2 vlan-id 902
set vlans ISP2 l3-interface irb.902

#show routing-instances TRAFFIC | display set

set routing-instances TRAFFIC instance-type virtual-router
set routing-instances TRAFFIC interface irb.901
set routing-instances TRAFFIC interface irb.902
set routing-instances TRAFFIC routing-options graceful-restart
set routing-instances TRAFFIC routing-options static route 0.0.0.0/0 next-hop 1.1.1.1
set routing-instances TRAFFIC routing-options static route 0.0.0.0/0 next-hop 2.2.2.2
set routing-instances TRAFFIC routing-options static route 0.0.0.0/0 preference 10

#show interfaces | display set

set interfaces ge-0/0/1 unit 0 family ethernet-switching vlan members ISP1
set interfaces ge-7/0/12 unit 0 family ethernet-switching vlan members ISP2
set interfaces irb unit 901 family inet address 1.1.1.2/30
set interfaces irb unit 902 family inet address 2.2.2.1/30
