Do you happen to have flow control enabled?
> show interfaces xe-1/1/0
Physical interface: xe-1/1/0, Enabled, Physical link is Up
Interface index: 187, SNMP ifIndex: 625
Description: Uplink to QFX 2/2 AE30
Link-level type: Ethernet, MTU: 1522, Speed: 10Gbps, Duplex: Full-Duplex, BPDU Error: None, MAC-REWRITE Error: None,
Loopback: Disabled, Source filtering: Disabled,
Flow control: Disabled
Flow control can cause massive tail drops.
Micro-bursts are an unavoidable evil of networking. Some solve them with larger buffers, but then you add delay instead, and who wants a packet that is half a second old? In some cases it may be better to receive packets with delay than to drop them, but not always. I think your problem is that you receive traffic on a high-speed interface (10 G?) and try to put it out on a slower one (1 G?). More interfaces in the LAG may solve the issue, but not always, as the packet distribution across LAG members is not always even. Your best bet is to upgrade the link if at all possible. In which switch do you see the drops? I guess in the EX4550, as that should be your central aggregation given that the EX4200s are 1 G switches. Where does the traffic come from?
Another way of mitigating tail drops is to split the connection into VLAN groups. If you have, say, 2 x 1 G carrying all your VLANs now, splitting out the worst VLAN to a LAG of its own will reduce or eliminate packet loss for the other VLANs. The EX4200 only has 2.5 MB of packet buffer per PFE. On a 48-port switch, PFE 0 serves the first 24 interfaces, PFE 1 serves the next 24, and the last PFE serves the expansion slot (PIC 1). This means that if you have a 48-port switch, you may well see an improvement by splitting the LAG ports between the port groups: say ae0 has members ge-0/0/0 and ge-0/0/24, ae1 has ge-0/0/1 and ge-0/0/25, and so on. This will utilize the buffers from both PFEs for all LAGs. If you have the EX4200-24F, splitting LAGs between the ge-0/0/x and ge-0/1/x interfaces may work as well. Depending on your traffic patterns, this may or may not be optimal, but it's worth thinking about.
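To make the port-group idea concrete, here is a minimal Python sketch that checks whether each LAG's members land on more than one PFE. It only encodes the 48-port EX4200 layout as described above (ports 0-23 on PFE 0, ports 24-47 on PFE 1, PIC 1 on the last PFE, numbered 2 here). The function names and the example LAG table are hypothetical, and the mapping is an assumption taken from this post, not something read from the switch.

```python
import re

# Matches Junos-style names such as ge-0/0/24 or xe-1/1/0 (type-fpc/pic/port).
_IFNAME = re.compile(r"^[a-z]+-(\d+)/(\d+)/(\d+)$")


def pfe_for_port(ifname):
    """Return the PFE number for an interface on a 48-port EX4200.

    Assumed layout (from the description above, not queried from hardware):
    PIC 0 ports 0-23 -> PFE 0, PIC 0 ports 24-47 -> PFE 1, PIC 1 -> PFE 2.
    The FPC (virtual chassis member) number is ignored: each member has its
    own PFEs, so only compare results for ports on the same member.
    """
    m = _IFNAME.match(ifname)
    if m is None:
        raise ValueError(f"unrecognized interface name: {ifname!r}")
    _fpc, pic, port = (int(g) for g in m.groups())
    if pic == 1:
        return 2
    if pic != 0 or not 0 <= port <= 47:
        raise ValueError(f"port outside the assumed layout: {ifname!r}")
    return 0 if port < 24 else 1


def lag_pfe_spread(lags):
    """Map each LAG name to the sorted list of PFEs its members use."""
    return {
        ae: sorted({pfe_for_port(member) for member in members})
        for ae, members in lags.items()
    }


if __name__ == "__main__":
    # Hypothetical LAG membership, following the example in the post.
    lags = {
        "ae0": ["ge-0/0/0", "ge-0/0/24"],   # split across both port groups
        "ae1": ["ge-0/0/1", "ge-0/0/2"],    # both members on the same PFE
    }
    for ae, pfes in lag_pfe_spread(lags).items():
        note = "spread" if len(pfes) > 1 else "single PFE"
        print(f"{ae}: PFEs {pfes} ({note})")
```

A LAG reported as "single PFE" is a candidate for moving one member to the other port group.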
Buffer memory and tail drops are only relevant for traffic sent out on an interface, not for traffic received on it.
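For a rough sense of scale, the following Python sketch works out how long an egress buffer can absorb a burst and how much delay a full buffer adds. The inputs are assumptions for illustration: a 10 Gbps ingress feeding a 1 Gbps egress (the guess made above), and the whole 2.5 MB per-PFE buffer being available to a single queue, which is optimistic since the buffer is shared. MB is taken as 10^6 bytes.

```python
def burst_absorb_seconds(buffer_bytes, in_bps, out_bps):
    """Time until the buffer overflows while traffic arrives faster than it drains."""
    if in_bps <= out_bps:
        return float("inf")  # the queue never builds up
    return buffer_bytes * 8 / (in_bps - out_bps)


def max_queue_delay_seconds(buffer_bytes, out_bps):
    """Worst-case queuing delay: time to drain a completely full buffer."""
    return buffer_bytes * 8 / out_bps


# Assumed figures, see the paragraph above.
BUFFER_BYTES = 2_500_000
INGRESS_BPS = 10e9
EGRESS_BPS = 1e9

fill = burst_absorb_seconds(BUFFER_BYTES, INGRESS_BPS, EGRESS_BPS)
delay = max_queue_delay_seconds(BUFFER_BYTES, EGRESS_BPS)

if __name__ == "__main__":
    print(f"buffer overflows after {fill * 1e3:.2f} ms of line-rate burst")
    print(f"a full buffer adds up to {delay * 1e3:.1f} ms of delay")
```

Under these assumptions the buffer fills in roughly 2.2 ms and a full buffer adds about 20 ms of delay, which is why a burst far too short to show up in averaged interface counters can still cause tail drops.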
Original Message:
Sent: 12-03-2020 02:22
From: Abed AL-Rahman Bishara
Subject: Virtual chassis tail drop packets
Hi
I think the best way to accomplish this task (finding micro-bursts) is port mirroring to a different port.
Right now we cannot port mirror from the 20G LAG interfaces to another device, since we would need something powerful enough to handle that amount of traffic.
Do you have any other ideas for accomplishing this task?
Thank you!
------------------------------
Abed AL-Rahman Bishara
Original Message:
Sent: 11-30-2020 14:40
From: Unknown User
Subject: Virtual chassis tail drop packets
It seems you have already increased the shared buffer to 100 percent, so stopping the drops will require finding the source of the traffic. Unlike sustained congestion, which can be mitigated with CoS, micro-bursts can't be fixed with CoS.
Regards,
Original Message:
Sent: 11-30-2020 13:21
From: Abed AL-Rahman Bishara
Subject: Virtual chassis tail drop packets
Hi
Thanks for your answer
It's definitely not a physical problem.
If it is a micro-burst, what are the right commands to increase the bandwidth of queue 0 and avoid the tail drops?
Do you have a sample configuration for an EX4550 switch?
Thanks
------------------------------
Abed AL-Rahman Bishara
Original Message:
Sent: 11-30-2020 13:01
From: Unknown User
Subject: Virtual chassis tail drop packets
Hello Abed,
- First make sure this is not a layer 1 issue (a physical problem in any of the child interfaces of this AE).
- Check the duplex setting of the port.
- Confirm whether these drops are a result of congestion (exceeding the interface bandwidth).
If you are not facing any of the above problems, then most likely these drops are a result of micro-bursts (a short spike of packets received in a small interval at a rate much higher than the configured guaranteed bandwidth for a given queue).
You will have to capture some sample packets on that interface and analyze what type of traffic is causing the micro-bursts so you can identify their source.
Regards,
Original Message:
Sent: 11-30-2020 12:35
From: Abed AL-Rahman Bishara
Subject: Virtual chassis tail drop packets
Hello
We have a mixed virtual chassis that includes 4 x EX4200 and 2 x EX4550,
running 15.1R5.5,
and we're experiencing a problem with tail-dropped packets.
For example:
admin@inter-BB> show interfaces queue ae3
Physical interface: ae3, Enabled, Physical link is Up
  Interface index: 132, SNMP ifIndex: 770
  Description: Uplink UCS-2
Forwarding classes: 16 supported, 4 in use
Egress queues: 8 supported, 4 in use
Queue: 0, Forwarding classes: best-effort
  Queued:
  Transmitted:
    Packets              : 23692449417
    Bytes                : 16787250219509
    Tail-dropped packets : 1173192
    RL-dropped packets   : 0
    RL-dropped bytes     : 0
Queue: 1, Forwarding classes: assured-forwarding
  Queued:
  Transmitted:
    Packets              : 0
    Bytes                : 0
    Tail-dropped packets : 0
    RL-dropped packets   : 0
    RL-dropped bytes     : 0
Queue: 5, Forwarding classes: expedited-forwarding
  Queued:
  Transmitted:
    Packets              : 0
    Bytes                : 0
    Tail-dropped packets : 0
    RL-dropped packets   : 0
    RL-dropped bytes     : 0
Queue: 7, Forwarding classes: network-control
  Queued:
  Transmitted:
    Packets              : 61618
    Bytes                : 6935602
    Tail-dropped packets : 0
    RL-dropped packets   : 0
    RL-dropped bytes     : 0
We enabled this setting, but it didn't help:
set class-of-service shared-buffer percent 100
We moved some interfaces from the EX4200 switches to the EX4550 switches, but that didn't help.
We also tried the following settings, but they didn't help either:
set class-of-service drop-profiles terminal fill-level 100 drop-probability 100
set class-of-service schedulers best-effort transmit-rate percent 100
set class-of-service schedulers best-effort buffer-size percent 100
set class-of-service schedulers best-effort priority strict-high
We opened a ticket with TAC support, but there has been no progress so far.
Has anyone encountered this problem and managed to solve it?
------------------------------
Abed AL-Rahman Bishara
------------------------------