Junos OS

Strange aggregation issue on eBGP

  • 1.  Strange aggregation issue on eBGP

     
    Posted 11-06-2018 07:30

    Hi,

     

    We have now connected our two data networks together for redundancy and have run into a very strange issue.

     

    Just to be clear:

     

    THW Core ae0 ----------------------------------- ae0 Core HEX

     

    iBGP is running between the core loopback interfaces.

    ISIS is the IGP

    eBGP peering over xe-1/2/5

     

    THW Core aggregated routes remain advertised when the connection is made.

    HEX aggregated routes disappear completely.

     

    If I disconnect the data centres, the HEX aggregated routes re-appear.

     

    Any ideas anyone?

     



  • 2.  RE: Strange aggregation issue on eBGP

    Posted 11-06-2018 08:44
    What do you mean by disappear?

    Are the routes hidden? Your question is not clear to me, but that could be expected behavior.

    Can you explain your topology and who advertises these aggregate routes?


  • 3.  RE: Strange aggregation issue on eBGP

     
    Posted 11-06-2018 09:06

    Hi Kingsman,

     

    Funny you should mention that as I have an update..... and this will be good learning for troubleshooting.

     

    Take the two sites, as follows:

     

    THW CORE ---------------------------------  HEX CORE

     

    THW Core has aggregated, advertised routes to one upstream ISP and HEX Core has different advertised aggregated routes to another upstream ISP.

     

    When they were separate the routing was fine to the upstream ISPs. When connected it was not. However, after completing some troubleshooting I found out the following:

     

    While disconnected

    run show route advertising-protocol bgp <peer address>

     

    All aggregated routes appeared.

     

    After connection was made between the data centres

    run show route advertising-protocol bgp <peer address>

     

    No aggregated routes showed

     

    I know why this is occurring now. All routing from HEX Core is now going across iBGP or IS-IS (I need to investigate a little further) to THW Core and out to that upstream ISP. Naturally, as the HEX Core routes are not advertised there, nothing will work from HEX, only from THW.

     

    So, I'm getting there with the troubleshooting, now I have to find a way of making HEX routes exit via HEX Core and not THW Core.

     

    I will leave this open and update it as I go, as a troubleshooting record for other people.

     

    I'm open to any suggestions on what to look for to influence the routes (local preference, MEDs)......

     

    Thanks



  • 4.  RE: Strange aggregation issue on eBGP

    Posted 11-06-2018 09:46
    OK. I am not sure what looked funny to you.

    But here are my thoughts on this issue.

    Assume THW CORE advertises aggregate route “A” to its upstream ISP and HEX CORE advertises aggregate route “B” to its upstream.

    Now you say that when you interconnect THW and HEX with iBGP and IS-IS, HEX stops advertising aggregate route “B” to its upstream provider. Correct me if I am wrong.

    If so, could you confirm what export policy is applied on HEX to advertise the route to its upstream? Only the active route in the routing table gets exported.

    You need to check the policy and the route in the routing table after interconnecting HEX and THW. One possible scenario: you are exporting the route from protocol aggregate, but the same prefix is also received from THW via IS-IS. IS-IS has a better (lower) default route preference than an aggregate, so the IS-IS route replaces the aggregate as the active route in the routing table and the export policy no longer matches.

    Compare the export policy against the active route in the routing table.
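
    A minimal sketch of those checks, run from operational mode on HEX Core. The prefix 192.168.10.0/24 and the policy name to-upstream are placeholders for illustration, not taken from the actual configuration:

    ```
    # Which protocol owns the active route for the aggregate prefix?
    show route 192.168.10.0/24 exact detail

    # Is the aggregate itself active, and does it have contributing routes?
    show route protocol aggregate detail

    # What is actually being sent to the upstream peer?
    show route advertising-protocol bgp <peer address>

    # What does the export policy match on? (hypothetical policy name)
    show configuration policy-options policy-statement to-upstream
    ```

    If the first command shows an IS-IS route as active and the aggregate as inactive, an export term that matches "from protocol aggregate" will not export that prefix.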


  • 5.  RE: Strange aggregation issue on eBGP

     
    Posted 11-07-2018 00:57

    Hi Kingsman,

     

    Yes. Preference of an internal route is what I suspected too. I will investigate today and let you know the results.

     

    There have to be "live" contributing routes for aggregate routes to be advertised in BGP.... these do exist.... so, I'll check the routing and forwarding tables and the policies and see what I can find there.

     

    If you want me to post the config here then please say.

     

    Thanks



  • 6.  RE: Strange aggregation issue on eBGP

     
    Posted 11-07-2018 08:26

    Hi Kingsman,

     

    Okay. I will try and explain this as best I can over forum messaging.

     

    When looking at the routing tables I found that IS-IS was being preferred, but the reason for this was, unknown to me, someone on THW-Core was advertising the complete internal network. So, I removed that aggregate route and the ISO configuration on the connecting interfaces, hey presto, all of the aggregated routes for HEX re-appeared. All good so far...... except....

     

    One of the tests we need to complete is a loss of upstream ISP on one site and ensure the routes traverse the interconnects and exit the opposing site, therefore ensuring failover. So, for example, if we shut the upstream ISP interface on HEX, we expect all routing to go to THW from HEX across the core routers. Again, here is the topology (Basic):

     

    THW CORE -------------------------- HEX CORE

     

    Port xe-1/2/5 is to the Upstream ISP.

     

    Disable xe-1/2/5 on HEX Core. I expect all routes that did exit xe-1/2/5 on HEX to now traverse across the connection to THW and exit xe-1/2/5 on THW.

     

    So, I disabled the xe-1/2/5 interface on HEX Core and checked the routes on THW. None of the HEX routes appeared (I did add them to the THW aggregate list):

     

    run show route advertising-protocol bgp <peer address>   on THW. Only THW routes appeared.

     

    run show route protocol aggregate detail  - No contributing routes showing.

     

    So, I found the right policy that was only adding "Direct" routes to iBGP. So, I configured the following as a test:

     

    On HEX Core:

    set policy-options policy-statement internal-bgp-peers term 2 from protocol isis

     

    And the routes appeared on THW.

     

    So, now I am left with the following position:

     

    I can route from a DSL CPE to THW xe-1/2/5 interface, but no further. The routes appear in the aggregated table, they also appear in the contributing list.... I have also checked on "lookingglass" and the routes are seen there in the BGP tables. It's almost like iBGP is not getting the routes to eBGP somehow.

     

    I have attached a basic overview of the network.....

     

    Please let me know what other information you would like to point me in the right direction.

     

    I will carry on investigating.

     

    Thanks

     

     



  • 7.  RE: Strange aggregation issue on eBGP

     
    Posted 11-08-2018 00:57

    As an add on to this..... A thought I had regarding what is currently occurring is that the "other" upstream ISP might not be accepting those new routes. Waiting for a response from them as to what filters they have in place.

     

    Edit add on:

    No. It appears the other upstream ISP is accepting those routes.

     

    So, given the diagram I attached, imagine a CPE DSL Customer the other side of Wholesale, comes into HEX-LNS-02, out of ae1 to HEX-CORE-02 and, normally, through xe-1/2/5 to the upstream ISP. But in this DR test I have disabled xe-1/2/5 on HEX-CORE-02. Now, when I complete the following command on THW-CORE-01:

     

    Let's say the two routes should be:

    192.168.10.0/30

    192.168.100.0/30

    set routing-options aggregate route 192.168.10.0/24

    set routing-options aggregate route 192.168.100.0/24

     

    run show route advertising-protocol bgp <peer address>  ----  I see the following:

    * 192.168.10.0/24 Self I
    * 192.168.10.0/30 Self I

    * 192.168.100.0/24 Self I
    * 192.168.100.0/30 Self I

     

    And if I run the following command:

     

    run show route protocol aggregate detail:

    192.168.10.0/24 (1 entry, 1 announced)
    *Aggregate Preference: 130
    Next hop type: Reject, Next hop index: 0
    Address: 0x2a2f284
    Next-hop reference count: 13
    State: <Active Int Ext>
    Local AS: 11111
    Age: 18:42:58
    Validation State: unverified
    Task: Aggregate
    Announcement bits (3): 0-KRT 2-BGP_RT_Background 7-Resolve tree 4
    AS path: I (LocalAgg)
    Flags: Depth: 0 Active
    AS path list:
    AS path: I Refcount: 2
    Contributing Routes (1):
    192.168.10.0/30 proto BGP

    192.168.100.0/24 (1 entry, 1 announced)
    *Aggregate Preference: 130
    Next hop type: Reject, Next hop index: 0
    Address: 0x2a2f284
    Next-hop reference count: 13
    State: <Active Int Ext>
    Local AS: 11111
    Age: 17:42:08
    Validation State: unverified
    Task: Aggregate
    Announcement bits (3): 0-KRT 2-BGP_RT_Background 7-Resolve tree 4
    AS path: I (LocalAgg)
    Flags: Depth: 0 Active
    AS path list:
    AS path: I Refcount: 1
    Contributing Routes (1):
    192.168.100.1/32 proto BGP

     

    If I complete a traceroute from the CPE at the HEX side, I can get through to the THW-CORE-01 ae0 interface. If I run the following command on THW-CORE-01 against the ae0 interface:

     

    run monitor traffic interface ae0 no-resolve size 1500 matching "net 192.168.10.1"

     

    and ping the xe-1/2/5 interface on THW-CORE-01, I see the packet flow. When I run traceroute from the CPE I get to the ae0 interface and no further, and if I try to ping 8.8.8.8 I see no traffic.... so, I think there is an issue with the advertising or the policy on THW-CORE-01 rather than HEX-CORE-02.......

     

    Any help here would be great .....  if the above has not completely confused you  🙂

     

    But with the attached network topology it will make sense...



  • 8.  RE: Strange aggregation issue on eBGP

     
    Posted 11-08-2018 03:15

    I am not sure I follow the topology so apologies if this is off. 

     

    The issue I see is how the site will know that the ISP has failed at the remote site.

    We need the aggregate routes for the remote site ready to advertise over our own ISP link, but only when the remote ISP is down.

     

    It would be easy to advertise them all the time with either different AS-path prepending, local pref to the ISP, or different prefix lengths.  But turning this on and off based on an event on the remote router is tricky.

     

    My simple solution would be:

    For the primary site, advertise two aggregate routes, each being half of the IP space.  This will make the longest match win, and most traffic will come here.

     

    On the backup site, advertise the full aggregate as a single prefix so it is in the ISP tables and ready to go but generally is not used.

     

    This could then be up all the time with no event detection needed.
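
    A rough sketch of what that could look like in configuration. The prefix 10.10.0.0/23 and the policy name to-isp are made up for illustration, and this assumes each site has at least one contributing route for the aggregates it originates (an aggregate only becomes active when a contributing route exists):

    ```
    # Primary site: originate and advertise the two more specific halves
    set routing-options aggregate route 10.10.0.0/24
    set routing-options aggregate route 10.10.1.0/24
    set policy-options policy-statement to-isp term aggregates from protocol aggregate
    set policy-options policy-statement to-isp term aggregates from route-filter 10.10.0.0/24 exact
    set policy-options policy-statement to-isp term aggregates from route-filter 10.10.1.0/24 exact
    set policy-options policy-statement to-isp term aggregates then accept
    set policy-options policy-statement to-isp term reject-rest then reject

    # Backup site: originate and advertise only the covering prefix
    set routing-options aggregate route 10.10.0.0/23
    set policy-options policy-statement to-isp term aggregates from protocol aggregate
    set policy-options policy-statement to-isp term aggregates from route-filter 10.10.0.0/23 exact
    set policy-options policy-statement to-isp term aggregates then accept
    set policy-options policy-statement to-isp term reject-rest then reject
    ```

    With both sites up, the /24s attract the inbound traffic to the primary site. If the primary site's ISP session goes away, the /24s are withdrawn and the /23 from the backup site takes over.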

     

     

     



  • 9.  RE: Strange aggregation issue on eBGP

     
    Posted 11-08-2018 04:00

    Hi Steve,

     

    No need for apologies. I am very appreciative of the help here.

     

    I have attached here a new, more detailed network topology to try and show where BGP is running and the traffic flow. I have created two made up customers. So, with regards to the attached document, here is the scenario:

     

    2 x data centres: 1 at Harbour Exchange (HEX) and one at Telehouse West (THW). We will have a 50/50 split estate across the two sites.

     

    A customer at site HEX (as marked with IP 192.168.12.10 on the diagram) would have a traffic flow through HEX-LNS-02, across interface ae1 to ae1 on HEX-CORE-02 and then out of xe-1/2/5 on HEX-CORE-02 to the upstream ISP (marked as ASN 23456 on the diagram).

     

    The same flow would occur on THW (but obviously through the THW equipment to the upstream ISP marked as ASN 98765 on the diagram).

     

    So, one of the Disaster Recovery tests I need to complete is to simulate a loss of one ISP from one of the sites. So, on HEX-CORE-02 I disable interface xe-1/2/5 to simulate the loss (this is from remote so commit confirmed is always used).

     

    Now the traffic flow should be (so no loss of service for customers at HEX):

     

    Customer at 192.168.12.10 will route through HEX-LNS-02, then the ae1 interface to HEX-CORE-02, then across iBGP, so ae0 to THW-CORE-01 and then out of xe-1/2/5 at THW-CORE-01 to the upstream ISP on ASN 98765.

     

    Hopefully, with the diagram and that explanation it should look a lot better.

     

    Please let me know what information you would like to see?

     

    policies?

    BGP configuration?

    Route details?

    Output from troubleshooting commands?

     

    Anything you need, then please let me know.

     

    I will continue troubleshooting and will also try your suggestion.

     

    Thanks Steve

     

     



  • 10.  RE: Strange aggregation issue on eBGP

     
    Posted 11-08-2018 09:30

    Update:

     

    Okay. So, I have got the routing working by enabling IS-IS on the ae0 interfaces, but this appears to have caused another issue....

     

    I cannot peer with the HEX upstream ISP when re-enabling the port.

    Reason: Prefix limit reached.

     

    It appears that the internal IGP is carrying the complete Internet routing table between the two sites. Not sure if that is meant to happen. More troubleshooting required.



  • 11.  RE: Strange aggregation issue on eBGP

     
    Posted 11-09-2018 03:00

    For the IGP, it sounds like IS-IS is importing BGP routes, which you will not want in your case, since the only use for the IGP is to carry the loopbacks and internal routing.  All the BGP routes should be exchanged over the BGP peerings.

     

    For the iBGP, are these the only two sites, or are there multiple sites and a route reflector somewhere?

     

    It also sounds like you don't need full tables for this setup.  Since you have each site using only one ISP they only really need the default route to reach the internet.   You would only need full tables to mix the upstream usage of more than one ISP.

     

    Between the sites on iBGP you would advertise the default route from the ISP and your customer prefixes for reachability.  On the import policy you would mark these at a lower local preference so that this default is only used when the local ISP is lost.
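
    As a rough sketch, the import side could look like the following. The group name internal-peers is the one used elsewhere in this thread; the policy name ibgp-import and the value 50 are assumptions for illustration (anything below the default local preference of 100 would do):

    ```
    # Accept the default route learned from the iBGP peer, but make it less preferred
    set policy-options policy-statement ibgp-import term backup-default from protocol bgp
    set policy-options policy-statement ibgp-import term backup-default from route-filter 0.0.0.0/0 exact
    set policy-options policy-statement ibgp-import term backup-default then local-preference 50
    set policy-options policy-statement ibgp-import term backup-default then accept
    set protocols bgp group internal-peers import ibgp-import
    ```

    While the local ISP session is up, its default (local preference 100) stays active. When it is withdrawn, the iBGP-learned default becomes the active route.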

     

    The export policies to the ISP are as I described above.  By advertising the more specific prefixes at the preferred site, most traffic will arrive there, but the shorter covering prefix will be advertised from the other site and available for failover when the primary site is lost.

     



  • 12.  RE: Strange aggregation issue on eBGP

     
    Posted 11-09-2018 08:25

    Hi Steve,

     

    Thank you for the reply. Much appreciated.....

     

    From an iBGP perspective, I have configured the following group which only has 1 policy applied:

     

    THW-CORE-01:

    set protocols bgp group internal-peers type internal
    set protocols bgp group internal-peers local-address 192.168.1.2  -----   loopback address
    set protocols bgp group internal-peers export internal-bgp-route
    set protocols bgp group internal-peers peer-as 200994
    set protocols bgp group internal-peers neighbor 192.168.1.5  -----   peer loopback

     

    set policy-options policy-statement internal-bgp-route term 2 from protocol direct
    set policy-options policy-statement internal-bgp-route term 2 then accept
    set policy-options policy-statement internal-bgp-route term next-hop-self then next-hop self

     

    This is replicated on HEX.

    Can I assume then, given your information, that this policy is not correct? Should I remove the "next-hop-self"? (I'm not sure I should, or the iBGP peer will not know where to route everything.)

     



  • 13.  RE: Strange aggregation issue on eBGP

     
    Posted 11-09-2018 17:28

    I assume you have other terms in the policy? 

     

    This would only send directly connected routes which I assume are your customer prefixes downstream.

     

    To have the backup ISP link you would need to send over the ISP route.  Currently this is the full BGP table that you mentioned, and there also does not seem to be enough memory to share it.  Here I suggest you take a default route only instead, then have a term in this policy that matches and sends only the BGP default route.
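
    A sketch of such a term, added to the internal-bgp-route policy shown earlier in the thread. The term names bgp-default and other-bgp are made up, and the final reject assumes that no other BGP-learned routes need to cross the interconnect:

    ```
    # Send only the BGP-learned default route to the iBGP peer
    set policy-options policy-statement internal-bgp-route term bgp-default from protocol bgp
    set policy-options policy-statement internal-bgp-route term bgp-default from route-filter 0.0.0.0/0 exact
    set policy-options policy-statement internal-bgp-route term bgp-default then next-hop self
    set policy-options policy-statement internal-bgp-route term bgp-default then accept

    # Stop the rest of the BGP table from falling through to the default export policy
    set policy-options policy-statement internal-bgp-route term other-bgp from protocol bgp
    set policy-options policy-statement internal-bgp-route term other-bgp then reject
    ```

    Without a terminating reject, active BGP routes that no term accepts are still advertised by the default BGP export policy, which would explain the full table being exchanged.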

     

    The import policy would accept this bgp default route from the iBGP peer and set a local preference lower than the local ISP default route so it will only be a backup should the local route be lost.

     



  • 14.  RE: Strange aggregation issue on eBGP

     
    Posted 11-13-2018 00:42

    Hi Steve,

     

    So, currently, I have the following configured and set on the eBGP peering interface:

    set protocols bgp group External-Peers export isis-default

    set policy-options policy-statement isis-default term ipv4 from protocol static
    set policy-options policy-statement isis-default term ipv4 from route-filter 0.0.0.0/0 exact
    set policy-options policy-statement isis-default term ipv4 then accept

     

    Is this what I should also export in iBGP but with the obvious option of changing the local-preference so it becomes a backup?

     



  • 15.  RE: Strange aggregation issue on eBGP
    Best Answer

     
    Posted 11-13-2018 03:18

    Almost.  You don't want to create a static default route at all.  You will want that route to come in via BGP from the upstream carrier.  That way the route will leave when you lose the carrier link. 

     

    If you have this as a static route, it won't go away in some cases, only if you lose the physical link to the carrier.  In order for failover to occur, you need the default route to be removed when the upstream path is gone for any reason, not just physical link failure.
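
    A minimal sketch of taking only a default from the carrier, assuming the carrier has agreed to send one on the session. The group name External-Peers is from the earlier post; the policy name from-upstream is made up:

    ```
    # Accept only 0.0.0.0/0 from the upstream eBGP peer
    set policy-options policy-statement from-upstream term default from route-filter 0.0.0.0/0 exact
    set policy-options policy-statement from-upstream term default then accept
    set policy-options policy-statement from-upstream term reject-rest then reject
    set protocols bgp group External-Peers import from-upstream
    ```

    After the commit, "show route 0.0.0.0/0 exact" should list the default as a BGP route, and it should disappear as soon as the session to the carrier drops.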

     



  • 16.  RE: Strange aggregation issue on eBGP

     
    Posted 11-13-2018 03:50

    Hi Steve,

     

    Okay, I have just set the default route and removed the static and configured it as an "export" policy on the iBGP group.

     

    Now, here is the issue we are experiencing:

     

    1: The complete table has gone, which is great.

    2: If I now complete a traceroute from a DSL subscriber on HEX (as per the diagram) when I shut down the upstream ISP (GTT) interface, I get a loop from the Core to the LNS, backwards and forwards.

     

    On the Core router I have a 0.0.0.0/0 default route pointing towards the LNS. If I remove this then I have no routes available for DSL subscribers and also lose connectivity to HEX (commit confirmed is useful 🙂).

     

    This is a little confusing as with the default created policy applied as an export on the iBGP group, I would have thought it would have worked.

     

    It may be that I need to put the iBGP physical link (not the peering loopback) into IS-IS and retry.



  • 17.  RE: Strange aggregation issue on eBGP

     
    Posted 11-13-2018 05:41

    Hi Steve,

     

    The solution is nearly always a little piece of configuration that is missed.

     

    So, when they were two separate sites, the default route had to point back to the LNS. And it was STILL pointing to the LNS. I changed this to point to the peer physical interfaces between the iBGP peers and, hey presto, it is all working exactly as we want it to.
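
    For anyone following along, the change was roughly of this shape. The addresses are placeholders (10.0.0.9 standing in for the LNS, 10.0.0.2 for the far end of the core-to-core link), not the real ones:

    ```
    # Old static default pointed back at the LNS, which caused the Core/LNS loop
    delete routing-options static route 0.0.0.0/0 next-hop 10.0.0.9

    # New static default points across the interconnect to the other core
    set routing-options static route 0.0.0.0/0 next-hop 10.0.0.2

    # Working remotely, so roll back automatically unless confirmed within 5 minutes
    commit confirmed 5
    ```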

     

    I do have a separate question that I discovered during this troubleshooting, but I will ask that in a separate topic.

     

    Thank you for your help Steve, much appreciated.