Routing


Challenging : OSPF adjacency flapping between Full to Loading...

Surya, 08-22-2014 07:36

  • 1.  Challenging : OSPF adjacency flapping between Full to Loading...

     
    Posted 08-21-2014 14:30

    Hi all !

     

    I have an OSPF adjacency flapping between FULL and EXCHANGE that I'm not able to solve. The adjacency is between a Juniper M320 (JunOS 9.3S5.1) and a Cisco uBR7225VXR (CMTS). Both ends of the adjacency are in the global routing table (no VRF).

     

    1. Hello interval and dead interval are OK => the adjacency passed the TWO-WAY state.
    2. MTU is OK => the adjacency passed the EXCHANGE state; show interface and traceoptions show the same MTU.
    3. DBD packets, LS Updates and LS Acks are OK.
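Matching timers are only part of what a received Hello must agree on. A minimal Python sketch of the per-Hello checks described in RFC 2328 (sections 8.2 and 10.5) follows; `HelloParams` and `hello_mismatches` are hypothetical names, and the sample values come from the configs posted later in this thread (the Juniper netmask is assumed to match the Cisco /28).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HelloParams:
    """Hypothetical container for the Hello fields a receiver compares."""
    area_id: str
    network_mask: str
    hello_interval: int  # seconds
    dead_interval: int   # seconds
    e_bit: bool          # ExternalRoutingCapability: cleared in stub areas


def hello_mismatches(local: HelloParams, received: HelloParams,
                     point_to_point: bool = False) -> list[str]:
    """Return the fields that would make the receiver discard the Hello."""
    problems = []
    if local.area_id != received.area_id:           # OSPF header check (RFC 2328, 8.2)
        problems.append("area_id")
    if not point_to_point and local.network_mask != received.network_mask:
        problems.append("network_mask")             # mask is ignored on p2p links (10.5)
    if local.hello_interval != received.hello_interval:
        problems.append("hello_interval")
    if local.dead_interval != received.dead_interval:
        problems.append("dead_interval")
    if local.e_bit != received.e_bit:               # stub-area setting must agree
        problems.append("e_bit")
    return problems


# Sample values: area and timers from the posted configs; e_bit=True assumes a non-stub area.
juniper = HelloParams("8.0.3.0", "255.255.255.240", 5, 40, e_bit=True)
cisco = HelloParams("8.0.3.0", "255.255.255.240", 5, 40, e_bit=True)
print(hello_mismatches(juniper, cisco))  # []
```

Note that a mismatch here would keep the neighbor from forming at all, whereas this thread shows an adjacency that reaches FULL and then drops, which points away from static parameter mismatches.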

     

    The Cisco device complains that it doesn't see its own router ID in the Hello packets:

    WTF: OSPF: Cannot see ourself in hello from x.x.x.x on Port-channel1.850, state INIT
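The "Cannot see ourself" message corresponds to the receiver not finding its own router ID in the Neighbor list of the Hello. Per RFC 2328 (section 10.5), finding it raises the 2-WayReceived event; not finding it raises 1-WayReceived, which sends an established neighbor back to Init. A minimal Python sketch of that check on a raw OSPFv2 Hello (starting at the OSPF header) follows; `build_hello` is a hypothetical test helper and the Juniper router ID 192.0.2.1 is made up.

```python
import ipaddress
import struct

OSPF_HEADER = struct.Struct("!BBH4s4sHH8s")  # RFC 2328 A.3.1, 24 bytes
HELLO_FIXED = struct.Struct("!4sHBBI4s4s")    # RFC 2328 A.3.2 fixed part, 20 bytes


def hello_neighbors(ospf_packet: bytes) -> list[str]:
    """Return the neighbor router IDs listed in an OSPFv2 Hello.

    Only bytes up to the OSPF packet length are read, so trailing data
    (such as Cisco LLS blocks) is ignored.
    """
    version, ptype, length, *_ = OSPF_HEADER.unpack_from(ospf_packet, 0)
    if version != 2 or ptype != 1:
        raise ValueError("not an OSPFv2 Hello")
    start = OSPF_HEADER.size + HELLO_FIXED.size
    return [str(ipaddress.IPv4Address(ospf_packet[off:off + 4]))
            for off in range(start, length, 4)]


def hello_event(ospf_packet: bytes, my_router_id: str) -> str:
    """Neighbor state machine event triggered by this Hello (RFC 2328, 10.5)."""
    if my_router_id in hello_neighbors(ospf_packet):
        return "2-WayReceived"
    return "1-WayReceived"  # an established neighbor falls back to Init


def build_hello(router_id: str, neighbors: list[str]) -> bytes:
    """Hypothetical helper: minimal Hello with zero checksum and no authentication."""
    body = HELLO_FIXED.pack(ipaddress.IPv4Address("255.255.255.240").packed,
                            5, 0x02, 1, 40, bytes(4), bytes(4))
    body += b"".join(ipaddress.IPv4Address(n).packed for n in neighbors)
    header = OSPF_HEADER.pack(2, 1, OSPF_HEADER.size + len(body),
                              ipaddress.IPv4Address(router_id).packed,
                              ipaddress.IPv4Address("8.0.3.0").packed, 0, 0, bytes(8))
    return header + body


full = build_hello("192.0.2.1", ["89.158.252.0"])  # lists the Cisco RID
stripped = build_hello("192.0.2.1", [])             # Cisco RID missing
print(hello_event(full, "89.158.252.0"))      # 2-WayReceived
print(hello_event(stripped, "89.158.252.0"))  # 1-WayReceived
```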

     

    The Juniper device complains that it stopped receiving Hello packets, so the dead timer expired:

    OSPF neighbor 10.122.0.14 (ae0.850 area 8.0.3.0) state changed from Full to Down due to InActiveTimer (event reason: neighbor was inactive and declared dead) (nbr helped: 0)

     

    I tried both traceoptions on the Juniper and debugging on the Cisco, but I'm not able to understand what happened. No errors, no warnings!

     

    So I sniffed the traffic between the Juniper and the Cisco and saw that:

    1. The Cisco and the Juniper correctly exchange Hello packets, except that I suspect (according to the Wireshark timestamps) that the Juniper doesn't respect the hello interval: the hello interval is set to 1 s, but I saw many Hello packets within less than 1 s.
    2. For an unknown reason, the Juniper removes the Cisco router ID from the OSPF Hello packets it sends to the Cisco.
    3. I was not able to determine whether the Juniper removes the Cisco router ID from the OSPF Hello packets AFTER it declares the neighbor down (the capture shows it received the Hello packets!) or BEFORE it declares it down.
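Point 1 can be checked mechanically once the Hello timestamps are exported from the capture. A minimal Python sketch, assuming the capture has already been reduced to (timestamp, source address) pairs for OSPF Hellos; the sample data and the Juniper address 10.122.0.1 are hypothetical.

```python
from collections import defaultdict


def short_hello_gaps(hellos, hello_interval, tolerance=0.5):
    """Report consecutive Hellos from the same source that arrive much faster
    than hello_interval.

    hellos: iterable of (timestamp_seconds, source_ip) pairs, assumed to be
    exported from a capture already filtered down to OSPF Hellos.
    tolerance: a gap shorter than tolerance * hello_interval is reported.
    Returns a list of (source, previous_ts, next_ts, gap_seconds).
    """
    by_source = defaultdict(list)
    for ts, src in hellos:
        by_source[src].append(ts)

    findings = []
    for src, times in by_source.items():
        times.sort()
        for prev, nxt in zip(times, times[1:]):
            gap = nxt - prev
            if gap < hello_interval * tolerance:
                findings.append((src, prev, nxt, round(gap, 3)))
    return findings


# Hypothetical sample: 10.122.0.1 stands in for the Juniper, 10.122.0.14 is the Cisco.
capture = [(0.0, "10.122.0.1"), (1.0, "10.122.0.1"), (1.2, "10.122.0.1"), (2.2, "10.122.0.1"),
           (0.5, "10.122.0.14"), (1.5, "10.122.0.14")]
print(short_hello_gaps(capture, hello_interval=1))  # [('10.122.0.1', 1.0, 1.2, 0.2)]
```

A reported gap is not proof of a bug by itself; it only flags Hellos worth correlating with the moments the adjacency drops.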

     

    I would appreciate your help !

     

    Thank you !

     

    Salah

    JNCIE-ENT

    JNCIE-SP

     

    Attachment(s): Packet_capture.zip (120 KB)


  • 2.  RE: Challenging : OSPF adjacency flapping between Full to Loading...

    Posted 08-21-2014 16:11

    Hi,

     

    It's quite an interesting problem. I am not familiar with Cisco, but I will still share my opinion.

     

     

    From your snoop I noticed that the Junos device is declaring the adjacency down at packets 56, 112, 164, 238, etc...

    Since the configured dead interval is 4 s, we must assume that the last Hello received and processed from the Cisco device was sent 4 s + epsilon earlier. If we check the snoop again, we notice that 4 seconds earlier the Cisco device always sends a unicast Hello message to the Junos device. And apparently Junos always ignores the multicast Hellos that follow.
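The reasoning above can be checked with a small timeline simulation. This is a minimal Python sketch, assuming the receiver restarts its inactivity timer only for Hellos it actually processes; the Hello timings below are hypothetical, not taken from the attached capture.

```python
def dead_timer_expiries(processed_hellos, dead_interval, end_time):
    """Return the times at which the inactivity (RouterDeadInterval) timer fires.

    processed_hellos: sorted arrival times, in seconds, of the Hellos the receiver
    actually accepted; each one restarts the timer. Hellos that are seen on the
    wire but discarded must be left out.
    end_time: time at which the simulation stops.
    """
    expiries = []
    last = None
    for t in processed_hellos:
        if last is not None and t - last > dead_interval:
            expiries.append(last + dead_interval)
        last = t
    if last is not None and end_time - last > dead_interval:
        expiries.append(last + dead_interval)
    return expiries


# Hypothetical timeline: one multicast Hello per second, plus unicast Hellos at 0 s and 10 s.
multicast = [float(t) for t in range(21)]
unicast = [0.0, 10.0]

print(dead_timer_expiries(sorted(multicast + unicast), 4, 20))  # []
print(dead_timer_expiries(unicast, 4, 20))                      # [4.0, 14.0]
```

If only the unicast Hellos are processed, the neighbor is declared dead exactly one dead interval after each unicast Hello, which is the pattern described above.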

     

    From RFC 2328:

     

            On broadcast networks and physical point-to-point networks,
            Hello packets are sent every HelloInterval seconds to the IP
            multicast address AllSPFRouters.  On virtual links, Hello
            packets are sent as unicasts (addressed directly to the other
            end of the virtual link) every HelloInterval seconds. On Point-
            to-MultiPoint networks, separate Hello packets are sent to each
            attached neighbor every HelloInterval seconds.

     

    I am assuming your LAG is a regular broadcast network, so I don't see why the Cisco device would send unicast Hello packets. How is Junos supposed to deal with them? On its side, Junos never sends unicast Hellos.

     

    My idea: maybe Junos, after seeing the first erroneous unicast Hellos, is incorrectly discarding the following valid multicast Hellos.

     

    As a means to test this hypothesis, we could check the configuration on the Cisco device, or we could try to configure a firewall filter on the Junos device to discard unicast Hellos. That's not easy at first sight: how do we distinguish unicast Hellos from other valid unicast OSPF messages? From the snoop, it seems that all unicast Hellos sent from the Cisco have a packet size of 94... Well... this sounds crazy, but why not try? 🙂



  • 3.  RE: Challenging : OSPF adjacency flapping between Full to Loading...

    Posted 08-21-2014 16:28

    Take a look at this article. It seems the hello interval is too aggressive for OSPF, even though that value can be configured. If you need more aggressive failure detection, use BFD instead.

    http://www.juniper.net/documentation/en_US/junos13.3/topics/example/ospf-timers-configuring.html

     

    Adjust the hello interval to about 5 seconds and then monitor it for stability.



  • 4.  RE: Challenging : OSPF adjacency flapping between Full to Loading...

     
    Posted 08-21-2014 19:45
    With the default dead-interval of 40 s, this means that no Hellos were received on the M320 for 40 s. Can you confirm that there isn't a lot of host-bound traffic or interface congestion?


  • 5.  RE: Challenging : OSPF adjacency flapping between Full to Loading...

     
    Posted 08-22-2014 02:06

    Hi All !

     

    Thanks for your replies. I forgot to mention that I did play with the Hello timers (hello-interval and dead-interval) to check stability. Whatever values I use, I still see this flapping! To reduce the database exchange, I also used a stub area, but the flapping was still there...

     

    I also checked for traffic drops and packet errors: all at zero (no drops, no packet errors)!

     

    user@JUNIPER_re0> show configuration protocols ospf   
    traffic-engineering;
    reference-bandwidth 100g;
    /* OUTPUT OMITTED */
    area 8.0.3.0 {
        interface ae0.850 {
            hello-interval 5;
            dead-interval 40;
        }
    }
    

     

    Cisco :

    CISCO#show run | section router ospf 850
    router ospf 850
     router-id 89.158.252.0
     log-adjacency-changes
     passive-interface default
     no passive-interface Port-channel1.850
     network 10.122.0.0 0.0.0.15 area 8.0.3.0
     network 89.158.252.0 0.0.0.0 area 8.0.3.0
     default-metric 20
    CISCO#show run int Port-channel1.850
    Building configuration...
    
    Current configuration : 229 bytes
    !
    interface Port-channel1.850
     description BSOD
     encapsulation dot1Q 850
     ip address 10.122.0.14 255.255.255.240
     ip mtu 1570
     ip ospf cost 10
     ip ospf hello-interval 5
     ip ospf dead-interval 40
     ip ospf priority 10
     mpls label protocol ldp
     mpls ip
    end

    Any more ideas?

     



  • 6.  RE: Challenging : OSPF adjacency flapping between Full to Loading...

     
    Posted 08-22-2014 05:39

    I investigated further, and I can say that after the adjacency transitions to the FULL state, the routing engine doesn't receive any OSPF Hello packets from the Cisco, although the Cisco sends them. Comparing traceoptions + tcpdump (monitor traffic interface) on the Juniper with debugging on the Cisco over specific time intervals helped me reach that diagnosis. But the Juniper seems able to see (or maybe process) them again once the session has been torn down... (sorry, I'm not sure about my English 🙂 )

    There is a switch (Extreme Networks) between the Juniper and the Cisco. The configuration of this switch was double-checked and everything is OK. The last thing I have to do is sniff the traffic between the switch and the Juniper to understand a) whether the switch drops some OSPF Hello packets or b) whether the Juniper is not able to process them (they reach the Juniper PFE but not the RE...).

     

    Is there any command that will help me to see if the OSPF packets are present on the PFE and not on the RE ?

     

    Thanks for your help!

    Salah


  • 7.  RE: Challenging : OSPF adjacency flapping between Full to Loading...

     
    Posted 08-22-2014 07:00
    The quickest way to check would be to apply a firewall filter that counts the incoming OSPF packets and also run the CLI command "monitor traffic interface ae0.850" to see whether the RE receives the OSPF packets. This would help you identify any mismatch between the PFE and the RE.


  • 8.  RE: Challenging : OSPF adjacency flapping between Full to Loading...

     
    Posted 08-22-2014 07:34

    Do you mean that the firewall filter is applied in the PFE for this kind of packet (OSPF Hello)?

    Thanks for your reply!


  • 9.  RE: Challenging : OSPF adjacency flapping between Full to Loading...

     
    Posted 08-22-2014 07:36
    Yes, that's correct.


  • 10.  RE: Challenging : OSPF adjacency flapping between Full to Loading...

     
    Posted 08-24-2014 23:48

    I tried that this morning, but it's not easy to troubleshoot like this: I can't distinguish between the different OSPF packet types. So I will know whether I receive an OSPF packet, but not whether it's a Hello or a DD packet...
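Although a simple counter cannot tell packet types apart, an offline capture can: the second byte of the OSPF header is the packet Type (1 = Hello, 2 = Database Description, 3 = LS Request, 4 = LS Update, 5 = LS Acknowledgment, per RFC 2328 appendix A.3.1). A minimal Python sketch, assuming raw IPv4 packets are available from a capture; the stub packets in the example are hypothetical.

```python
OSPF_TYPES = {1: "Hello", 2: "Database Description", 3: "Link State Request",
              4: "Link State Update", 5: "Link State Acknowledgment"}


def ospf_type(ip_packet: bytes) -> str:
    """Classify an IPv4 packet carrying OSPFv2 by the Type field of the OSPF header."""
    if ip_packet[9] != 89:                # IPv4 protocol number 89 = OSPF
        raise ValueError("not OSPF")
    ihl = (ip_packet[0] & 0x0F) * 4       # IPv4 header length in bytes
    return OSPF_TYPES.get(ip_packet[ihl + 1], "unknown")


# Hypothetical 20-byte IPv4 header stub: version/IHL 0x45, protocol 89, rest zeroed.
ip_header = bytes([0x45]) + bytes(8) + bytes([89]) + bytes(10)
print(ospf_type(ip_header + bytes([2, 1]) + bytes(22)))  # Hello
print(ospf_type(ip_header + bytes([2, 2]) + bytes(22)))  # Database Description
```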

     

     

    --

    Salah



  • 11.  RE: Challenging : OSPF adjacency flapping between Full to Loading...

    Posted 08-25-2014 01:51

    Hi Salah,

     

    At least you should be able to determine whether both types of Hello packets are received :

    1. unicast Hellos: sent by the Cisco device to the physical IP address of the Junos device.
    2. multicast Hellos: sent to 224.0.0.5

    As I explained in my previous post, I suspect that the Junos device is only receiving and processing the unicast Hellos. For some reason, it seems as if the multicast Hellos are filtered somewhere.
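Counting the two kinds of Hellos separately makes this hypothesis easy to test. A minimal Python sketch, assuming the Hellos have already been extracted from a capture as (source, destination) address pairs; comparing the counts from a capture taken on the wire with those from `monitor traffic interface ae0.850` would show which class never reaches the RE. The Juniper address 10.122.0.1 is hypothetical.

```python
import ipaddress
from collections import Counter


def classify_hellos(hellos):
    """Count Hellos per (source, kind): "multicast" when the destination is a
    multicast group (224.0.0.5, AllSPFRouters, on broadcast links), otherwise
    "unicast".

    hellos: iterable of (source_ip, destination_ip) strings, assumed to come
    from a capture already filtered down to OSPF Hellos.
    """
    counts = Counter()
    for src, dst in hellos:
        kind = "multicast" if ipaddress.IPv4Address(dst).is_multicast else "unicast"
        counts[(src, kind)] += 1
    return counts


# Hypothetical sample: 10.122.0.14 is the Cisco address from its config.
sample = [("10.122.0.14", "224.0.0.5"),
          ("10.122.0.14", "224.0.0.5"),
          ("10.122.0.14", "10.122.0.1")]
print(classify_hellos(sample))
```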

     



  • 12.  RE: Challenging : OSPF adjacency flapping between Full to Loading...

    Posted 08-25-2014 10:32

    You will only see the DD packets during adjacency formation. After the database exchange (the master/slave phase) is complete, the routers use LSR, LSU and LSAck packets to update the OSPF database. The best option to troubleshoot OSPF is to enable OSPF traceoptions, which record all OSPF transactions. Then also look at the OSPF database (detail/extensive), where you can match on a RID. Can you show the OSPF config? This sounds like a situation where a GRE tunnel is being detected by OSPF as a route to a remote destination. I'm not saying this is the case, but it produces similar results.

    >show ospf ?

    This should help you troubleshoot:

    http://www.juniper.net/documentation/en_US/junos13.2/topics/task/configuration/ospf-tracing.html

     

    Here is some background information

    http://www.juniper.net/techpubs/en_US/junos11.4/topics/concept/ospf-routing-packets-overview.html

    http://www.juniper.net/techpubs/en_US/junos11.4/topics/concept/ospf-timers-overview.html

     



  • 13.  RE: Challenging : OSPF adjacency flapping between Full to Loading...

     
    Posted 08-25-2014 12:40

    Did you try to include "packet-length" parameter which would help to match only Hello packets?

     

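    family inet {
        filter count_ospf_hello {
            /* Count OSPF packets with an IP length of 80 bytes (assumed to be the Hellos on this link) */
            term a {
                from {
                    packet-length 80;
                    protocol ospf;
                }
                /* count is non-terminating, so matching packets are still accepted */
                then count ospf_hello;
            }
            /* Accept all other traffic unchanged */
            term b {
                then accept;
            }
        }
    }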



  • 14.  RE: Challenging : OSPF adjacency flapping between Full to Loading...

    Posted 08-25-2014 12:42
    Apply it to the loopback interface.


  • 15.  RE: Challenging : OSPF adjacency flapping between Full to Loading...

     
    Posted 08-25-2014 12:47
    >>> apply it to the loopback interface

    Yes, it would work, but it will count the Hellos arriving on all interfaces, so it is only ideal if you have a single OSPF session.

    If not, it is better to apply it at the interface level, under the logical unit where you want to count the OSPF Hello packets.


  • 16.  RE: Challenging : OSPF adjacency flapping between Full to Loading...

    Posted 08-25-2014 12:54

    Add one or two more match conditions, for example:

    from interface <int-name>
    from address <>
    from source-address <int-ip>



  • 17.  RE: Challenging : OSPF adjacency flapping between Full to Loading...

     
    Posted 08-25-2014 13:22
    As I said, it can be done, but isn't that too much overhead when the same result can be achieved simply by applying the filter on the interface subunit?


  • 18.  RE: Challenging : OSPF adjacency flapping between Full to Loading...

    Posted 08-25-2014 13:58

    It's not too much overhead. Juniper systems are capable of handling the debug options, and this is only a temporary test. In my opinion, it is also a better way to see whether the packets are making it up to the routing engine, since they are handled by the RE. However, if you are not comfortable with applying it to the lo0 interface, that is okay.

    Have you enabled traceoptions for protocol ospf? if yes, what did you find? If no, why? That would be the first place to begin troubleshooting ospf problems.



  • 19.  RE: Challenging : OSPF adjacency flapping between Full to Loading...

    Posted 09-10-2014 10:18

    Hi Daboss,

     

    Just out of curiosity, did you manage to fix your problem?



  • 20.  RE: Challenging : OSPF adjacency flapping between Full to Loading...

    Posted 05-16-2018 16:48

    Hi

     

    I am also facing a similar issue.

    Initially it was an IP MTU mismatch, which I resolved.

    OSPF usually works fine, but randomly the Juniper device sends too many Hello packets, say 4 packets within a short span of time. Of those 4 Hello packets, the last 2 are sent without listing the router ID of the ASR.

    Since the ASR receives Hello packets without its own router ID, it logs "Cannot see ourself in hello from <juniper router>, state INIT".

    After a few seconds, the issue resolves itself.

    I am able to capture the packets on the ASR when the issue is happening, but since it happens randomly, I am not able to capture it on the Juniper device (any thoughts on how to capture when the issue is occurring?).

    Since I have a Riverbed Steelhead between the ASR and the Juniper, I want to identify where the issue is happening.

    It looks like you have the same issue, so it should be the Juniper.

    How did you resolve this issue?

    Regards

    Logesh