Junos OS

Tail-dropped packets EX4200

  • 1.  Tail-dropped packets EX4200

    Posted 02-12-2019 06:35

    Hello, I have 10 EX4200-48T switches, and I am getting tail drops on the ge-0/0/ interfaces:

     

    > show interfaces statistics | match Error | except Link-level | except "Output errors: 0" 
      Input errors: 0, Output errors: 6997
      Input errors: 0, Output errors: 428
      Input errors: 0, Output errors: 8424
      Input errors: 0, Output errors: 7107
      Input errors: 0, Output errors: 8544
      Input errors: 0, Output errors: 8731
      Input errors: 0, Output errors: 8299
      Input errors: 0, Output errors: 8225
      Input errors: 0, Output errors: 8252
      Input errors: 0, Output errors: 8402
      Input errors: 0, Output errors: 8330
      Input errors: 0, Output errors: 6992
      Input errors: 0, Output errors: 6981
      Input errors: 0, Output errors: 8136
      Input errors: 0, Output errors: 8461
      Input errors: 0, Output errors: 8093
      Input errors: 0, Output errors: 8075
      Input errors: 0, Output errors: 8265
      Input errors: 0, Output errors: 8298
      Input errors: 0, Output errors: 8380
      Input errors: 0, Output errors: 8588
      Input errors: 0, Output errors: 8429
      Input errors: 0, Output errors: 8512
      Input errors: 0, Output errors: 8582
      Input errors: 0, Output errors: 8629
    Forwarding classes: 16 supported, 4 in use
    Egress queues: 8 supported, 4 in use
    Queue: 0, Forwarding classes: best-effort
      Queued:
      Transmitted:
        Packets              :                337719
        Bytes                :              74149542
        Tail-dropped packets :                  9669
        RL-dropped packets   :                     0
        RL-dropped bytes     :                     0
    

     

    But some of these ports carry almost no traffic, and it happens on all switches:

      Input rate     : 0 bps (0 pps)
      Output rate    : 16944 bps (32 pps)
      Input errors: 0, Output errors: 9418
    

     

    I tried changing the buffer size, but that did not help.

    > show configuration class-of-service    
    shared-buffer {
        percent 100;
    }
    

    Thank you.



  • 2.  RE: Tail-dropped packets EX4200

    Posted 05-16-2019 11:46

    Hi,

     

    Tail-dropped packets happen because of buffer exhaustion. Do you have a custom CoS configuration on the device?



  • 3.  RE: Tail-dropped packets EX4200

    Posted 05-17-2019 02:52

    Hi,

    It can also be caused by flow control. Please try disabling flow control on the link.



  • 4.  RE: Tail-dropped packets EX4200

    Posted 09-17-2019 07:05

    Good day, I apologize for the late reply. Thank you all for your responses.

     

    On the xe-0/0/0 interface:

    MAC control frames 0 0
    MAC pause frames 0 0

     

    And the same on all other interfaces.



  • 5.  RE: Tail-dropped packets EX4200

    Posted 05-19-2019 18:03

    You can try giving more bandwidth to the best-effort queue and see if that improves things. Also check whether other queues are taking up more bandwidth than necessary.

     

    Another theory is that the switch may be experiencing a bottleneck, for example when traffic arriving on a 10 Gb/s port is going out of a 1 Gb/s port.
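
    As a rough sketch of the speed-mismatch theory above: while a burst arrives faster than the egress port can drain it, the egress queue grows at the difference between the two rates. The function name and the 100 kB burst size below are hypothetical, chosen only for illustration, and the model ignores everything else sharing the buffer.

```python
def peak_backlog_bytes(burst_bytes: int, ingress_bps: float, egress_bps: float) -> float:
    """Peak egress-queue backlog (bytes) for one burst that arrives at
    ingress_bps while the egress port drains at egress_bps."""
    if ingress_bps <= egress_bps:
        # The port drains at least as fast as traffic arrives: no backlog.
        return 0.0
    # Time it takes the whole burst to arrive on the faster link.
    arrival_time_s = burst_bytes * 8 / ingress_bps
    # During that time the queue grows at (ingress - egress) bits per second.
    return (ingress_bps - egress_bps) * arrival_time_s / 8


# Hypothetical 100 kB burst from a 10 Gb/s port toward a 1 Gb/s port.
backlog = peak_backlog_bytes(100_000, 10e9, 1e9)
print(f"peak backlog: {backlog:.0f} bytes")
```

    In this simplified model roughly nine tenths of the burst has to sit in the egress buffer, so a short burst can cause tail drops even when the average rate on the port is very low.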

     




  • 6.  RE: Tail-dropped packets EX4200

    Posted 09-17-2019 07:08

    On a problem interface:

    Input rate : 3847176 bps (387 pps)
    Output rate : 584248 bps (371 pps)
    Input errors: 0, Output errors: 2145011

     

    I see the same problem on idle servers, where the rate was about 1 pps.



  • 7.  RE: Tail-dropped packets EX4200

    Posted 06-06-2019 04:15

    Good Day,

     

    Could you please provide the following output for one of the affected interfaces, so we can check the QoS settings:

    show class-of-service interface ge-0/3/0 comprehensive

     

    Thanks!



  • 8.  RE: Tail-dropped packets EX4200

    Posted 09-17-2019 07:09
    > show class-of-service interface ge-0/3/0 comprehensive 
    error: command is not valid on the ex4200-48t
    
    > show class-of-service interface ge-0/0/3    
    Physical interface: ge-0/0/3, Index: 133
    Maximum usable queues: 8, Queues in use: 4
      Scheduler map: <default>, Index: 2
      Congestion-notification: Disabled
    
      Logical interface: ge-0/0/3.0, Index: 74
    Object                  Name                   Type                    Index
    Classifier              ieee8021p-untrust      untrust                    16
    


  • 9.  RE: Tail-dropped packets EX4200

    Posted 10-29-2019 03:03

    Good Day Roman90,

     

    I strongly believe there is a configuration issue.

    Since an interface with very low outgoing traffic is suffering from drops, and this is seen on many devices, it could be that the queue simply has no dedicated bandwidth in the configuration.

    Could you please provide the class-of-service part of the configuration for review?

     

    Thank you!



  • 10.  RE: Tail-dropped packets EX4200

    Posted 11-01-2019 04:08

    Hej

    It is hard to address without seeing the CoS config. Can you provide the full CoS config for the problem interface?

    #show class-of-service interface

     

    Do you only see the problem in the BE queue, or do other queues drop traffic as well?
    > show interfaces queue <interface-name>

     

    Also, are you monitoring your network? You might be getting bursts at certain times that cause the drops, which would explain why at other times there appears to be no traffic.
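
    To illustrate why a burst can be invisible in monitoring graphs, here is a small sketch. It assumes a hypothetical 300-second polling interval and a single 1 ms burst at 1 Gb/s line rate; neither figure comes from the thread.

```python
LINE_RATE_BPS = 1e9       # 1 Gb/s access port
POLL_INTERVAL_S = 300     # hypothetical 5-minute polling interval
BURST_DURATION_S = 0.001  # hypothetical 1 ms burst at full line rate


def average_bps(bytes_sent: int, interval_s: float) -> float:
    """Average rate a counter-based poller would report for the interval."""
    return bytes_sent * 8 / interval_s


# Bytes sent during the burst, with no other traffic in the interval.
burst_bytes = round(LINE_RATE_BPS / 8 * BURST_DURATION_S)
avg = average_bps(burst_bytes, POLL_INTERVAL_S)

print(f"burst: {burst_bytes} bytes at line rate")
print(f"average over the polling interval: {avg:.0f} bps")
print(f"reported utilisation: {avg / LINE_RATE_BPS:.6%}")
```

    The port is fully saturated for that millisecond, yet the graph would show only a few kbps, which is in line with the low output rates reported on the ports that are dropping.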

    Regards
    Oscar



  • 11.  RE: Tail-dropped packets EX4200

    Posted 02-11-2020 10:39

    Hi,

     

    Packet congestion can be difficult to troubleshoot and resolve; oversubscription and bursty traffic may be causing this.

     

    One question: did you get better or worse results after setting 'shared-buffer' to 100%, compared to the default of 50%?

     

    Benjamin



  • 12.  RE: Tail-dropped packets EX4200

    Posted 03-13-2020 15:50

    Hey Guys,

     

    Just to say we're seeing the same thing too.

    We were also advised to raise the shared buffer to 100%. It worked for a little while and has reduced the effect of the issue (however, it should be noted that the person who recommended it said the buffer is 95% by default, not 50%).

     

    I'm guessing either the device is on its way out, or there is microbursting at play, which most devices don't handle well.



  • 13.  RE: Tail-dropped packets EX4200

     
    Posted 03-14-2020 06:41

    If you can identify the ingress/egress port pairs where the congestion is occurring, you might be able to alleviate the issue by moving them to the same switch and chip in the virtual chassis, which minimizes the path required and the potential for congestion.

     



  • 14.  RE: Tail-dropped packets EX4200

    Posted 03-15-2020 13:15

    This is something I want to try, but somehow we've ended up with a 4-member stack where 2 members are fibre only and the other 2 are copper only, with all the customers on the copper members and the 10gig uplinks on the fibre members. I think the copper members don't have the uplink card to support 10G uplinks at the moment, and I'm not even sure it's available for those devices. Even if it were, that's not a hot-swappable part, so we'd have to turn everything off, take the card out of the fibre devices, put it in the copper devices, then power everything back on.

     

    At that point, we might as well look into 4300s if the company can afford it. Incidentally, the dedicated VC ports are supposedly 32 Gb/s interfaces (I'm just going off the output of "show virtual-chassis vc-port") and the 10G uplinks have ~2-3 Gb/s of traffic going through them. So whilst the path is hugely inefficient, as everything has to travel through a VC port and then an uplink to get out, I don't see how we're even close to bottlenecking. That's why it looks more like a glitch, bug, or hardware issue, or micro-bursting (impossible to see on graphs unless you have something called telemetry?), though it is understood that the 4200/4550's queuing system is outdated. Also, we have many other stacks in various implementations (some performing a lot of routing, some just providing pure L2 capabilities) which aren't having this issue. Most people who experience this also seem to be on 15.X code, so I'm wondering if there's anything going on there...

     

    Anyway, it is an interesting one, if we get the downtime, re-cabling a more efficient setup will be interesting to try. I think it was mentioned each 24-port section has an ASIC dedicated to it, so was thinking of maybe distributing ports in the fashion of:

    ge-0/0/0 : ge-0/0/24

    ge-0/0/1 : ge-0/0/25

    ge-0/0/2 : ge-0/0/26

    .......

    ge-0/0/23 : ge-0/0/47
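
    The pairing idea above can be generated programmatically. This is only a sketch: the function name is hypothetical, and the split of 24 ports per ASIC is the assumption stated in the post, not something verified here. Ports on a 48-port member are numbered 0 to 47, so the last pair ends at ge-0/0/47.

```python
def asic_port_pairs(member: int = 0, pic: int = 0, ports_per_asic: int = 24):
    """Pair port i on the first (assumed) ASIC with port i + ports_per_asic
    on the second, returning Junos-style interface names."""
    return [
        (f"ge-{member}/{pic}/{i}", f"ge-{member}/{pic}/{i + ports_per_asic}")
        for i in range(ports_per_asic)
    ]


pairs = asic_port_pairs()
for first, second in pairs:
    print(f"{first} : {second}")
```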



  • 15.  RE: Tail-dropped packets EX4200

     
    Posted 03-16-2020 02:58

    Since you might not be hitting traffic limits, do you have a class-of-service configuration in place where the queues might be what is generating the tail drops?

     



  • 16.  RE: Tail-dropped packets EX4200

    Posted 03-19-2020 17:52

    I think there is actually CoS on a single port.

    JTAC didn't seem to mention it though.

    Since we'd seen a CPU increase, we had a look at what was causing the main CPU load. Polling the stack seemed to make the mib2 process spike quite frequently, and for a while we thought that might be causing it. But we disabled SNMP completely: for the first 30 minutes there were no drops, then I checked back about 10 hours later and sadly saw that some drops had appeared.

     

    Re-enabling SNMP polling access to the device just increases the rate at which the drops occur. I will take a look at the CoS config on the single port tomorrow and see if anything stands out. I find it weird that a single port's config would cause basically every other active switch port to drop packets, but frequently in the world of computers and networking, one digit or character can cause catastrophically different results. It's also strange that the interface carrying the most traffic is the one with the fewest drops, and it is not the port that has been configured with CoS.

     

    One weird thing to note is that there are zero drops on the 10gig uplink interfaces in either direction, and they're technically carrying the most traffic; they are, however, on the other two members, which are separate from the copper members that are currently experiencing issues. And whilst drops happen on almost every port, it's NOT EVERY port; it does always appear to be the same 10 ports spread across two members.



  • 17.  RE: Tail-dropped packets EX4200

    Posted 03-20-2020 10:42
    #Checking for config with CoS
    show configuration | match "class|cos" |except login | display set
    set class-of-service shared-buffer percent 100
    set class-of-service interfaces ge-2/0/14 shaping-rate 200m
    
    #Checking Interface with CoS - also matches the filter and policer names
    show configuration | match "ge-2/0/14|-sanitised-|200M-firewall-filter" | display set
    set interfaces ge-2/0/14 description "RESERVED QinQ - #HIDDEN#"
    set interfaces ge-2/0/14 unit 0 family ethernet-switching port-mode access
    set interfaces ge-2/0/14 unit 0 family ethernet-switching vlan members 957
    set interfaces ge-2/0/14 unit 0 family ethernet-switching filter input -sanitised-
    set class-of-service interfaces ge-2/0/14 shaping-rate 200m
    set firewall family ethernet-switching filter -sanitised- term 1 then accept
    set firewall family ethernet-switching filter -sanitised- term 1 then policer 200M-firewall-filter
    set firewall family ethernet-switching filter -sanitised- term 2 then accept
    set firewall policer 200M-firewall-filter if-exceeding bandwidth-limit 200m
    set firewall policer 200M-firewall-filter if-exceeding burst-size-limit 125k
    set firewall policer 200M-firewall-filter then discard
    
    #Surprisingly, this interface has drops today - hasn't had it before (don't worry about the Half Duplex, it's a cosmetic error with the 15.1 code)
    
    show interfaces ge-2/0/14 extensive | match "phy|speed|duplex|error|bps|flap"
    Physical interface: ge-2/0/14, Enabled, Physical link is Up
      Link-level type: Ethernet, MTU: 1514, LAN-PHY mode, Speed: Auto, Duplex: Half-duplex, BPDU Error: None, MAC-REWRITE Error: None, Loopback: Disabled, Source filtering: Disabled,
      Last flapped   : 2018-09-11 06:17:45 UTC (79w3d 11:17 ago)
       Input  bytes  :           5007373448               441112 bps
       Output bytes  :            891298374                45256 bps
      Input errors:
        Errors: 0, Drops: 0, Framing errors: 0, Runts: 0, Policed discards: 0, L3 incompletes: 0, L2 channel errors: 0, L2 mismatch timeouts: 0, FIFO errors: 0, Resource errors: 0
      Output errors:
        Carrier transitions: 0, Errors: 0, Drops: 16, Collisions: 0, Aged packets: 0, FIFO errors: 0, HS link CRC errors: 0, MTU errors: 0, Resource errors: 0
        CRC/Align errors                         0                0
        FIFO errors                              0                0
            Link mode: Full-duplex, Flow control: None, Remote fault: OK, Link partner Speed: 1000 Mbps
                                  %            bps     %           usec
         Input  bytes  :                    0                    0 bps
         Output bytes  :                    0                    0 bps
    
    
    #In comparison, all the other interfaces that have drops
     show interfaces extensive | match drops | no-more | except "Drops: 0"
        Carrier transitions: 0, Errors: 0, Drops: 216210, Collisions: 0, Aged packets: 0, FIFO errors: 0, HS link CRC errors: 0, MTU errors: 0, Resource errors: 0
        Carrier transitions: 0, Errors: 0, Drops: 20, Collisions: 0, Aged packets: 0, FIFO errors: 0, HS link CRC errors: 0, MTU errors: 0, Resource errors: 0
        Carrier transitions: 0, Errors: 0, Drops: 9029, Collisions: 0, Aged packets: 0, FIFO errors: 0, HS link CRC errors: 0, MTU errors: 0, Resource errors: 0
        Carrier transitions: 0, Errors: 0, Drops: 297, Collisions: 0, Aged packets: 0, FIFO errors: 0, HS link CRC errors: 0, MTU errors: 0, Resource errors: 0
        Carrier transitions: 0, Errors: 0, Drops: 12361, Collisions: 0, Aged packets: 0, FIFO errors: 0, HS link CRC errors: 0, MTU errors: 0, Resource errors: 0
        Carrier transitions: 0, Errors: 0, Drops: 5609, Collisions: 0, Aged packets: 0, FIFO errors: 0, HS link CRC errors: 0, MTU errors: 0, Resource errors: 0
        Carrier transitions: 0, Errors: 0, Drops: 16, Collisions: 0, Aged packets: 0, FIFO errors: 0, HS link CRC errors: 0, MTU errors: 0, Resource errors: 0
        Carrier transitions: 0, Errors: 0, Drops: 17, Collisions: 0, Aged packets: 0, FIFO errors: 0, HS link CRC errors: 0, MTU errors: 0, Resource errors: 0
        Carrier transitions: 0, Errors: 0, Drops: 1093, Collisions: 0, Aged packets: 0, FIFO errors: 0, HS link CRC errors: 0, MTU errors: 0, Resource errors: 0
        Carrier transitions: 0, Errors: 0, Drops: 6989, Collisions: 0, Aged packets: 0, FIFO errors: 0, HS link CRC errors: 0, MTU errors: 0, Resource errors: 0
        Carrier transitions: 0, Errors: 0, Drops: 3480, Collisions: 0, Aged packets: 0, FIFO errors: 0, HS link CRC errors: 0, MTU errors: 0, Resource errors: 0
        Carrier transitions: 0, Errors: 0, Drops: 31859, MTU errors: 0, Resource errors: 0
    
    #Doubt this is important but - classifier for the CoS interface and the interface that has the most drops is the same (I think all interfaces are in this default Classifier)
     show class-of-service interface ge-2/0/14.0
      Logical interface: ge-2/0/14.0, Index: 150
    Object                  Name                   Type                    Index
    Classifier              ieee8021p-untrust      untrust                    16
    
     show class-of-service interface ae4.0
      Logical interface: ae4.0, Index: 75
    Object                  Name                   Type                    Index
    Classifier              ieee8021p-untrust      untrust                    16
    
    show class-of-service classifier name ieee8021p-untrust
    Classifier: ieee8021p-untrust, Code point type: ieee-802.1, Index: 16
      Code point         Forwarding class                    Loss priority
      000                best-effort                         low
      001                best-effort                         low
      010                best-effort                         low
      011                best-effort                         low
      100                best-effort                         low
      101                best-effort                         low
      110                best-effort                         low
      111                best-effort                         low
    

    Looks like we're going to have to start reading the CoS books to get this to work the way we want - may try disabling the CoS we have in place now just to see if it affects it 🙂
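
    For reference, the policer values in the configuration above can be put into rough numbers. This sketch assumes a simplified single token-bucket model and that the Junos suffixes k and m mean 1,000 and 1,000,000; the helper name is hypothetical. Note the policer is applied as an input filter, so this arithmetic describes ingress discards rather than the egress tail drops.

```python
def parse_junos_value(value: str) -> int:
    """Convert a value such as '200m' or '125k' to an integer,
    assuming k = 1_000, m = 1_000_000, g = 1_000_000_000."""
    suffixes = {"k": 1_000, "m": 1_000_000, "g": 1_000_000_000}
    value = value.strip().lower()
    if value[-1] in suffixes:
        return int(value[:-1]) * suffixes[value[-1]]
    return int(value)


bandwidth_bps = parse_junos_value("200m")  # bandwidth-limit 200m
burst_bytes = parse_junos_value("125k")    # burst-size-limit 125k
line_rate_bps = 1_000_000_000              # 1 Gb/s link partner speed

# Time for an empty bucket to refill completely at the policed rate.
refill_ms = burst_bytes * 8 / bandwidth_bps * 1000

# How long a line-rate burst is tolerated before a full bucket empties:
# tokens drain at line rate while being refilled at the policed rate.
tolerated_ms = burst_bytes * 8 / (line_rate_bps - bandwidth_bps) * 1000

print(f"bucket refill time: {refill_ms:.2f} ms")
print(f"line-rate burst tolerated: {tolerated_ms:.2f} ms")
```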



  • 18.  RE: Tail-dropped packets EX4200

    Posted 03-25-2020 05:38

    Nothing worked in the end.

    Disabling CoS just made it worse, and drops started happening on interfaces where they weren't happening before.

    We're going to try and upgrade to 4300s.

    Code 12.1 works fine, but I doubt I'd be able to get a downgrade approved. The latest 15.x SR code mentions nothing about this issue, so it's probably not addressed, or not actually a bug (micro-bursts etc.). Pinpointing the actual issue is going to be too difficult (it's extremely sporadic and drops happen across multiple ports).

    It is annoying that it doesn't look like the device is being maxed out, but it's probably getting destroyed at sub millisecond speed; poor thing 🙂

    Thanks for the suggestions.