Switching


Ask questions and share experiences about EX and QFX portfolios and all switching solutions across your data center, campus, and branch locations.
  • 1.  Slow performance on EX4650

    Posted 09-17-2025 15:49

    We are trying to migrate from a Juniper EX4550 (EX4550-32F) Virtual Chassis running version 15.1R7-S12 to a Juniper EX4650-48Y-8C Virtual Chassis running version 23.4R2.13.

    When we moved the existing connections over to the EX4650 we experienced some really slow performance, so we moved the cables back to the EX4550 and the problem went away.  A summary of what we did and tried is shown below.  My question is how to diagnose and fix the slow performance problem on the Juniper EX4650.

    1).  The new EX4650 was set up in a test environment, where the Virtual Chassis was formed and the Junos version was installed.

    2).  The new EX4650 also received its initial configuration in the test lab environment, where it was able to connect to the network for management purposes.

    3).  The new EX4650 was configured almost exactly like the EX4550, except its uplink ports are in a LAG using interfaces 0/0/47 and 1/0/47 instead of 0/0/31 and 1/0/31 (on the EX4550).

            a.  Also, every port on the EX4650 is configured with "family ethernet-switching storm-control default", while the EX4550 does not have this configuration on any of its ports.

            b.  I tested the routable VLANs; connections worked with ping testing and by connecting a test ESXi host over a spare 10 Gb cable.

    4).  I then racked and mounted the EX4650 in a server rack directly below the production EX4550 to allow for a smooth deployment and more testing.

    5).  I did move 2 more connections over for testing before the migration.

           a.  Two separate 1 Gb heartbeat connections for some storage arrays would not work well on the EX4650.

           b.  We were using 1 Gb Direct Attach Copper (DAC) jumper cables on the EX4550.

           c.  I then connected the same 1 Gb Ethernet connections (without the DAC adapter) to a Juniper EX4300 Gb switch, and the same connections worked well.

    6).  For additional testing I then moved 2 x 10 Gb DAC jumper cables that were currently in use on the EX4550 to the EX4650.

           a.  Those 2 test connections worked well and they were directly connected to the ESXi Hosts from the EX4650. 

           b.  All of the DAC connections on the EX4550, except for the 2 heartbeat connections, used SFPP-PC015, and that model DAC is not listed on the EX4650 hardware compatibility web page: https://apps.juniper.net/home/ex4650-48y/hardware-compatibility

           c.  Those same DACs were moved over to the EX4650 switch and the performance was not very good.  Note: the FS P/N has changed from SFPP-PC015 to SFP-10G-PC015, but the product is the same: https://www.fs.com/products/39781.html

       

    7).  After we moved the same DACs back to the EX4550, the performance for the servers was good again.

    I suspect the problem with the Juniper EX4650 connections is that we should use DACs that are supported, tested, and listed on the EX4650 hardware compatibility web page, rather than the 10 Gb DACs that are currently connected to the EX4550.  I have read that if you use an unsupported DAC transceiver, network performance is unpredictable.  The interface configuration on the switches looks to be the same.
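
    To help diagnose the link itself, a few standard Junos operational commands can confirm whether the switch recognizes the third-party DACs and whether the interfaces are accumulating errors (the interface name below is a placeholder; substitute your own):

    ```
    show chassis hardware                                            # lists the transceivers/DACs as Junos identifies them
    show interfaces diagnostics optics xe-0/0/47                     # optics/DOM data (passive DACs report little or none)
    show interfaces xe-0/0/47 extensive | match "error|drop|pause"   # per-interface error, drop, and pause counters
    ```

    If an unsupported DAC is the cause, it often shows up as an unrecognized part number in `show chassis hardware` or as steadily incrementing input errors on the affected port.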



    ------------------------------
    KJS ADMIN
    ------------------------------


  • 2.  RE: Slow performance on EX4650

    Posted 09-18-2025 09:46

    How did you configure the 1Gb/s ports on the 4650?



    ------------------------------
    Olivier Benghozi
    ------------------------------



  • 3.  RE: Slow performance on EX4650

    Posted 09-19-2025 16:10
    Edited by KJS ADMIN 09-19-2025 17:33
    1).  We did not disable DRS when we unplugged the DAC cable from the EX4550 and moved it to the EX4650.
    2).  Hence the automatic migrations kicked in right away.
    3).  After trying just one ESXi host for testing, we disconnected and reconnected one DAC cable at a time (for example, mgmt) and then tested performance.
    4).  We realized that the performance of the test ESXi host and the VMs on it was good until vMotion was manually triggered to move a VM from that host to another host.
    5).  Hence we moved the vMotion cable from the EX4550 to the EX4650 and manually triggered a VM migration while all of the other ESXi host connections were on the EX4550, and that triggered the slowness.
    [Diagram]
    6).  I ran a packet capture on a VM on the test ESXi host; it was unresponsive, and pings from the other ESXi hosts to the test ESXi host were upwards of 300-700 milliseconds, while before the vMotion trigger they were less than 1 millisecond.
    7).  After I moved the DAC cable back to the EX4550 the problem went away and I was able to save the Wireshark packet capture.
    8).  I saw a lot of TCP retransmissions, and I think those retransmissions caused the ESXi server's bad performance.
           a.  There were a lot of network errors listed in the packet capture, matching these Wireshark filters:
    tcp.analysis.fast_retransmission || tcp.analysis.duplicate_ack || tcp.analysis.retransmission || tcp.analysis.lost_segment || tcp.analysis.out_of_order || tcp.analysis.window_full
    9).  I verified that the MTU is set to 1500.
    10).  I do not see any CRC errors or drops on the uplink port of the EX4650.
    Any thoughts?
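
    Since Wireshark shows retransmissions but the uplink shows no CRC errors, it may also be worth checking MAC-level flow control and queue-drop counters on the EX4650 host-facing port; a minimal sketch (the interface name is a placeholder):

    ```
    show interfaces xe-0/0/10 extensive | match "pause"   # MAC pause frames received/transmitted (flow control)
    show interfaces queue xe-0/0/10                       # per-queue tail/RED drop counters
    ```

    Retransmissions with clean CRC counters usually point at drops from queuing/buffering or pause-frame behavior rather than a bad cable.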

     



    ------------------------------
    KJS ADMIN
    ------------------------------



  • 4.  RE: Slow performance on EX4650

    Posted 09-19-2025 17:27

    We are not using that 1 Gb connection any longer.

    But since you asked, I manually changed the speed on that port:

    set chassis fpc 1 pic 0 port 20 speed 1g

    But we ended up just moving that connection to a different switch. 
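
    One possible gotcha worth checking (an assumption on my part; verify against the EX4650 port-speed documentation): on the EX4650, speed is set per port under the chassis hierarchy, and ports are grouped in quads that generally must run the same speed family, so changing one port can affect its neighbors. A sketch with illustrative port numbers:

    ```
    set chassis fpc 1 pic 0 port 20 speed 1g   # the quad containing port 20 (ports 20-23) may need a matching speed
    show interfaces ge-1/0/20 media            # verify negotiated speed/duplex after commit
    ```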



    ------------------------------
    KJS ADMIN
    ------------------------------



  • 5.  RE: Slow performance on EX4650

    Posted 09-19-2025 17:35
    Edited by KJS ADMIN 09-19-2025 17:35

    I think this might be a routing or cabling problem.  I will try to:

    1).  Update the JUNOS. 

    2).  Temporarily disable VMware DRS.

    3).  Move all of the cables over from all of the ESXi hosts.

            a.  And the storage array.

    4).  Evaluate the performance before I trigger vmotion.

    5).  Then trigger vmotion and see what happens.

     

    I suspect that the vMotion, iSCSI, and storage array connections will have little to no packet loss if they are all connected to the same switch.



    ------------------------------
    KJS ADMIN
    ------------------------------



  • 6.  RE: Slow performance on EX4650

    Posted 24 days ago

    We ended up fixing this performance problem by looking at how another EX4650 was configured and used at our backup site.

    Since the backup switch did not have this performance problem, I used it as a baseline.  If the backup switch did not have specific commands, I removed those commands from the production switch (the one we are testing).  After removing those commands, there were no more problems on the production switch.

    The backup network is a good testing baseline.  The following are the commands I removed from the new production switch that we want to use.

    1).  I removed:  set routing-options nonstop-routing

    2).  I removed: set system syslog file messages match " "

    3).  I removed the following 2 commands:

            a.  >set interfaces vlan unit 0 family inet
            b.  >set interfaces vlan unit 1 family inet

    4).  I removed: set protocols dcbx interface all

    5).  I removed the commands:

    set interfaces irb unit 0 family inet dhcp vendor-id Juniper-ex4650-48y-8c-XH3722230774
    set interfaces irb unit 0 family inet6 dhcpv6-client client-type stateful
    set interfaces irb unit 0 family inet6 dhcpv6-client client-ia-type ia-na
    set interfaces irb unit 0 family inet6 dhcpv6-client client-identifier duid-type duid-ll
    set interfaces irb unit 0 family inet6 dhcpv6-client vendor-id Juniper:ex4650-48y-8c:XH3722230774

    6).  I removed: 

    set system phone-home server https://redirect.juniper.net
    set system phone-home rfc-compliant

    7).  I removed the following commands:

    set system processes general-authentication-service traceoptions file radius
    set system processes general-authentication-service traceoptions flag all

    8).  I added a command that was missing from the production switch:

    set system radius-options attributes nas-ip-address #...

    9). I removed the following commands:  

    set system services netconf ssh
    set system services netconf rfc-compliant
    set system services netconf yang-compliant

    10).  I removed the following commands:

    set protocols layer2-control nonstop-bridging
    set protocols layer2-control bpdu-block disable-timeout 300

    ------------

    After removing the above commands the performance problem went away on the production switch.
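
    To confirm DCBX is actually no longer running toward the hosts after the change, Junos has operational commands for it:

    ```
    show dcbx neighbors                          # should return no neighbors once DCBX is removed
    show configuration protocols | display set   # verify the dcbx and layer2-control statements are gone
    ```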

    I then re-added all of the above commands to the backup switch (running an older OS version) and I was not able to reproduce the performance problem.

    ------------

    When troubleshooting the problem on the production EX4650 (which was not working correctly), the performance problem was triggered by initiating VMware vMotion from the SAN.  My plan is to add one command back at a time during a maintenance window and test the performance.

    Then to add the remaining ESXi hosts and connections from the EX4550 to the EX4650.
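
    When re-adding one command at a time, `commit confirmed` is a useful safety net: it rolls the change back automatically unless you confirm it within the timeout. A sketch:

    ```
    configure
    set protocols dcbx interface all   # re-add one candidate command at a time
    commit confirmed 10                # auto-rollback in 10 minutes unless confirmed
    # ...test performance, then:
    commit                             # confirm the change if performance is still good
    ```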



    ------------------------------
    KJS ADMIN
    ------------------------------



  • 7.  RE: Slow performance on EX4650

    Posted 24 days ago

    Thanks for letting us know :)

    By the way, a production switch/router should not have any traceoptions configured (except while debugging a problem), as traceoptions add extra processing load.



    ------------------------------
    Olivier Benghozi
    ------------------------------



  • 8.  RE: Slow performance on EX4650

    Posted 23 days ago

    Thank you for the details in this. 



    ------------------------------
    Randy Shulse
    ------------------------------



  • 9.  RE: Slow performance on EX4650

    Posted 22 days ago

    Any other advice?  The only command that I think the switch should really need is

    set routing-options nonstop-routing

    for future upgrade purposes.  Otherwise the other commands are not really necessary.
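
    One note on nonstop-routing: on Junos it is paired with graceful Routing Engine switchover (GRES), which NSR requires before the configuration will commit. A minimal sketch:

    ```
    set chassis redundancy graceful-switchover   # GRES; required for NSR
    set routing-options nonstop-routing
    ```

    In a Virtual Chassis this keeps routing state synchronized to the backup Routing Engine, which is what makes hitless upgrades practical.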

     



    ------------------------------
    KJS ADMIN
    ------------------------------



  • 10.  RE: Slow performance on EX4650

    Posted 16 days ago

    The network card that we have in the ESXi hosts is the Intel(R) Ethernet Controller X710 for 10GbE SFP+, and it does not fully support the DCBX protocols.

    It seems the 2 commands that most likely caused problems are:

    1).  set protocols dcbx interface all

    2).  set protocols layer2-control bpdu-block disable-timeout 300

    We will see how the remaining ESXi hosts work on the Juniper EX4650 after they are migrated as well.  Thus far, 2 of the 5 ESXi hosts that had bad performance problems on the EX4650 (but not on the EX4550) have been working very well on the EX4650 for about 30 days.

    According to ChatGPT:

    Why Removing the DCBX Config Fixed It

    When you removed:

    set protocols dcbx interface all

    The EX4650 no longer attempted DCBX negotiation on its interfaces. As a result:

    • The switch treated the ESXi host like a regular Ethernet peer

    • No attempts to negotiate PFC/ETS

    • No malformed or mismatched DCBX TLVs

    • The Intel X710 just did basic Layer 2 forwarding - which it handles fine

    • Performance normalized, even under vMotion


      What This Likely Means

      1. The EX4550's DCBX behavior is "passive" or non-intrusive

        • It supports basic DCBX TLVs, but likely does not enforce or fully negotiate PFC/ETS.

        • This means the Intel X710 (which does not support full DCBX under ESXi) doesn't care, and the link behaves like a standard Ethernet connection.

        • No side effects. So it "just works."

      2. The EX4650's DCBX implementation is "active" or stricter

        • When set protocols dcbx interface all is applied, it actively tries to exchange capabilities like PFC, ETS, maybe even tries to reconfigure traffic classes or priorities.

        • Since the Intel X710 can't fully respond (under ESXi), this mismatch leads to degraded performance:

          • Bad flow control negotiation

          • Pause frames in one direction

          • Queuing issues

          • Incorrect priority tagging

          • Buffer exhaustion under high throughput (like vMotion)

      3. vMotion stresses the link

        • Under normal usage, traffic might be low enough that the side effects are hidden.

        • But vMotion saturates the link, revealing any instability in pause/flow control, mis-negotiated CoS priorities, or buffer handling.
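
    If you want to confirm the pause/PFC theory rather than infer it, both sides expose counters. A hedged sketch (the interface and vmnic names are placeholders, and the esxcli syntax should be verified against your ESXi version):

    ```
    # On the EX4650 (Junos):
    show interfaces xe-0/0/10 extensive | match "pause"   # pause frames exchanged with the host

    # On the ESXi host:
    esxcli network nic dcb status get -n vmnic4           # DCB/PFC negotiation state as the X710 sees it
    ```

    Pause counters incrementing during a vMotion run, with DCBX enabled but not after it is removed, would directly support the mismatch explanation above.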



    ------------------------------
    KJS ADMIN
    ------------------------------