SRX


Ask questions and share experiences about the SRX Series, vSRX, and cSRX.
  • 1.  SRX4100 traffic drops due to high CPU usage by nsd process during commits

    Posted 19 days ago

    Hey all. 

    We've involved JTAC on this issue, but we haven't heard anything from them and we are in a hurry to find out what this issue might be. 

    The thing is that we have an SRX4100 where we see traffic drops during every commit. Even the simplest commit triggers the issue, e.g. changing the description on an interface. We observe that during the commits, the nsd process consumes almost 100% CPU for a short period of time. This timespan matches the timespan in which we see packet drops. 
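
    For anyone who wants to check the same kind of overlap on their own box, here is a minimal sketch of how the "CPU spike matches packet drops" comparison could be done offline. It assumes you have already collected (timestamp, nsd CPU %) samples and a list of drop timestamps by whatever means you have; the function names, the 90% threshold, the one-second slack and the sample data are all hypothetical and do not come from any Juniper tool.

```python
from datetime import datetime, timedelta


def busy_windows(samples, threshold=90.0):
    """Group consecutive samples at or above `threshold` into (start, end) windows."""
    windows = []
    start = prev = None
    for ts, cpu in sorted(samples):
        if cpu >= threshold:
            if start is None:
                start = ts
            prev = ts
        elif start is not None:
            windows.append((start, prev))
            start = None
    if start is not None:
        windows.append((start, prev))
    return windows


def drops_inside(windows, drop_times, slack=timedelta(seconds=1)):
    """Return the drop timestamps that fall inside any busy window, allowing some slack."""
    return [d for d in drop_times
            if any(s - slack <= d <= e + slack for s, e in windows)]


# Hypothetical measurements: one CPU sample per second, two observed drops.
t0 = datetime(2025, 1, 1, 12, 0, 0)
samples = [(t0 + timedelta(seconds=i), cpu)
           for i, cpu in enumerate([3, 5, 98, 99, 97, 4, 2])]
drops = [t0 + timedelta(seconds=3), t0 + timedelta(seconds=30)]

windows = busy_windows(samples)
matched = drops_inside(windows, drops)
print(windows)
print(matched)
```

    If most drops land inside the busy windows, the two are at least correlated in time. That alone does not show that nsd is the cause.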

    There are several things that we have observed:

    • This issue is only seen on our SRX4100s configured with MNHA. We have several SRX4100s set up in chassis cluster without the issue. 
    • The SRX has almost 1000 policy rules, but we've removed all of them and this does not seem to solve anything. Also this is way below the number of rules that the SRX4100 is capable of. 
    • We've removed several DNS-name address book entries that didn't resolve to anything, but this did not help. 
    • We've tried to remove all SRG1+ except one, to see if that helped. But no fix. 
    • This is not load related because this SRX has been taken out of the production network and does not currently handle any traffic. 

    We have seen this PR related to nsd, but it is fixed in 23.4R2-S4 and this is the same version as we are currently running. Also our problem does not match the triggers in the PR. 

    I understand that it is hard for any of you to determine the cause, but I am interested to hear whether anyone has seen similar issues and, most of all, whether anyone has pointers on how we can work around this until a fixed software version arrives. 



    ------------------------------
    Best regards
    Vidar Stokke
    ------------------------------


  • 2.  RE: SRX4100 traffic drops due to high CPU usage by nsd process during commits

    Posted 19 days ago

    Hello

    It's VERY IMPORTANT to remember that these high-end gateways have many routes to remember. I believe that redundancy is a serious point in all this, even if it isn't a cluster. With that said, there may not even be a single point of failure. Communication is another point of interest. I have learned that H.323 plays a big part in long distance because of its ability to traverse old networks. My phones have just recently gotten better at long-distance communication. I want to try VPLS, but my ASUS Wi-Fi routers are doing this in a DNS server role. Hmmmmm. I've tried posting here about memory issues as well. Seems many NEED this. My Xfinity phones are like old landlines, and I've learned some things. Hope this helps. Let us know.

    Perhaps request system storage cleanup

    https://community.juniper.net/discussion/high-routing-memory-utilization#bm1bdff71b-8321-4f16-8db8-019622b82c0d 



    ------------------------------
    Adrian Aguinaga
    B.S.C.M. I.T.T. Tech
    (Construction Management)
    A.A.S. I.T.T. Tech
    (Drafting & Design)
    ------------------------------



  • 3.  RE: SRX4100 traffic drops due to high CPU usage by nsd process during commits

    Posted 18 days ago

    No idea on a fix, but some ideas on poking around more:

    • Use "| display detail" on commit commands to see if you can correlate the time of high CPU with a particular portion of the commit
    • Split the commit into "commit prepare" and then "commit activate" to see whether high CPU happens during checks or during activation
    • Try "set security traceoptions" to see if there's anything logged that might give a clue as to what we can do about it
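
    To make the first idea a bit more concrete, here is a rough sketch that takes saved "commit | display detail" output and reports how long each step took before the next one started. The line layout it expects (date, time, timezone, colon, message) is an assumption about what the device prints, and the messages in the sample are placeholders, not real Junos output. The names step_durations and SAMPLE are made up for illustration.

```python
import re
from datetime import datetime

# Assumed layout: "YYYY-MM-DD HH:MM:SS TZ: message"
LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \S+: (.+)$")


def step_durations(text):
    """Return [(seconds_until_next_step, message), ...] for all but the last timestamped line."""
    steps = []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:  # lines without a leading timestamp are ignored
            when = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            steps.append((when, m.group(2)))
    return [((nxt[0] - cur[0]).total_seconds(), cur[1])
            for cur, nxt in zip(steps, steps[1:])]


# Placeholder sample, NOT captured from a device.
SAMPLE = """\
2025-01-01 12:00:00 UTC: placeholder step one
2025-01-01 12:00:01 UTC: placeholder step two
2025-01-01 12:00:09 UTC: placeholder step three
"""

durations = step_durations(SAMPLE)
slowest = max(durations)
print(slowest)
```

    The slowest step is only a hint. If the CPU spike starts after the last timestamped line, the commit steps themselves are probably not where the time goes.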

    Hoping for clues ...



    ------------------------------
    Nikolay Semov
    ------------------------------



  • 4.  RE: SRX4100 traffic drops due to high CPU usage by nsd process during commits

    Posted 17 days ago

    Hi again Nikolay. 

    Thank you very much for the nice pointers. I will do this today and check for clues. 



    ------------------------------
    Best regards
    Vidar Stokke
    ------------------------------



  • 5.  RE: SRX4100 traffic drops due to high CPU usage by nsd process during commits

    Posted 18 days ago

    We have several customers running MNHA, but I'm not sure if that's with the SRX4100 or other models. The model type shouldn't really matter though. Are you sure the MNHA parameters are set up as they should be and that the traffic path(s) between the nodes are OK? I guess you have already checked this guide:

    https://www.juniper.net/documentation/us/en/software/junos/high-availability/topics/example/mnha-configuration-example.html

    If you get no response from JTAC, make the urgency of the problem clear (realistic expectations and requirements) in the case notes and call them on the phone and request to talk to the case owner. If that doesn't help, use the "escalate" button in the case portal. Surely, your local Juniper SE team can help out too? I have a feeling I know who they are ;) Perhaps there are special recommendations on Junos releases for MNHA, have them check that!




  • 6.  RE: SRX4100 traffic drops due to high CPU usage by nsd process during commits

    Posted 17 days ago

    Hi!

    Thank you very much for your answer. 

    Yeah.... I've been through the guide several times looking for clues. Everything seems up to speed with the MNHA configuration. 

    When it comes to JTAC, my reseller and local SE team have escalated this and I hope for a response soon. Fingers crossed. 



    ------------------------------
    Best regards
    Vidar Stokke
    ------------------------------



  • 7.  RE: SRX4100 traffic drops due to high CPU usage by nsd process during commits

    Posted 2 days ago

    Hey guys. 

    An update on this issue:

    • First of all, the commands Nikolay recommended were useful tips, but did not give me any specific pointers towards our issue. Thanks again Nikolay for the great pointers. 
    • Secondly, we saw that the issue happened right after the commit had completed, not during the commit. 

    But to come to our current conclusion and discoveries:

    • The issue is probably not related to high CPU caused by nsd, but JTAC is not certain of this. 
    • JTAC was unable to reproduce this in their labs with our configuration. 
    • We tested the following without any effect:
      • Moved the ICL link away from the lo0 interface to revenue ports on the SRX4100s with L2 between them. No fix. 
      • Disabled encryption on the ICL link when using revenue ports. No fix. 
      • Removed all aggregated interfaces and used only single ports (based on suggestion from JTAC). No fix. 

    But... during our investigations we did see that there were issues with the virtual MAC address on the virtual IPs, seen from the connected switches' point of view. It did not get updated correctly on the switch side during failover. Based on this, and as a "shot in the dark", we removed "use-virtual-mac" under all virtual-ips in the MNHA configuration. And... this actually "fixed" our issue... or is a "workaround", if you will. 
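
    In case someone wants to try the same workaround on a config with many virtual IPs, here is a small sketch that reads configuration in "set" format and prints the matching "delete" lines for every "use-virtual-mac" statement under a virtual-ip. The hierarchy and addresses in the sample are my assumption of what such lines look like and are not copied from our devices, so check them against your own "display set" output. The function name delete_commands is made up for illustration.

```python
def delete_commands(set_config):
    """Turn every 'set ... virtual-ip ... use-virtual-mac' line into a 'delete' line."""
    out = []
    for line in set_config.splitlines():
        words = line.split()
        if (words[:1] == ["set"]
                and "virtual-ip" in words
                and words[-1] == "use-virtual-mac"):
            out.append(" ".join(["delete"] + words[1:]))
    return out


# Assumed example lines; the exact hierarchy on a real device may differ.
SAMPLE = """\
set chassis high-availability services-redundancy-group 1 virtual-ip 1 ip 192.0.2.1/32
set chassis high-availability services-redundancy-group 1 virtual-ip 1 use-virtual-mac
set chassis high-availability services-redundancy-group 2 virtual-ip 1 use-virtual-mac
set interfaces xe-0/0/0 description uplink
"""

commands = delete_commands(SAMPLE)
for cmd in commands:
    print(cmd)
```

    The script only prints candidate commands. Review them, and test on a node that is out of production, before committing anything.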

    We are still working with JTAC to find out WHY this is happening. We run the same setup on SRX1600s without any issues. We don't think that anything is happening on the switch right after the commits, but we can't say for sure. 

    So that's an update on our issue, in case someone else ever experiences similar problems. I will keep you posted. 



    ------------------------------
    Best regards
    Vidar Stokke
    ------------------------------