We have an interesting issue that has just cropped up on our EX4200 switch stack at a school we manage. Every few days, the DHCP helper service (DHCP service) will fail and stop forwarding packets to the DHCP servers. This has worked for nearly 2 years, but all of a sudden, all DHCP stops after a few days. We can restart the DHCP service on the stack and it comes back and works for a while.
Any ideas out there on where to look? We did add 2 IP cameras to the network (in their own subnet), which coincides with the problem cropping up, but I can't imagine what 2 IP cameras could do to cause this when we have hundreds of Chromebooks, PCs and student devices connecting during the school year without issue. We're in summer session, which means that besides the cameras and the wireless APs, there are only about a dozen devices running, so load should be minimal.
From the attached logs, you can see that the switch runs fine until the errors appear and then no more DHCP...
This log message means that you have a memory leak in the DHCP daemon.
Can you please share the config of the dhcp-relay and dhcp relay statistics?
"show config forwarding-options dhcp-relay"
"show dhcp relay statistics"
If the relay-option-82 is configured then it could be similar to PR https://prsearch.juniper.net/PR1277433
If it is PR1277433, then the only option is to upgrade.
Hello Arpitch, Thanks for the quick reply. We had suspected a memory leak but were unsure why it cropped up just now.
Here are the results of various commands:
@MDF-ServerRoom-Stack> show configuration forwarding-options dhcp-relay
Yields no results.
@MDF-ServerRoom-Stack> show dhcp relay statistics
warning: dhcp-service subsystem not running - not needed by configuration.
@MDF-ServerRoom-Stack> show version
JUNOS EX Software Suite [15.1R5.5]
JUNOS FIPS mode utilities [15.1R5.5]
JUNOS Online Documentation [15.1R5.5]
JUNOS EX 4200 Software Suite [15.1R5.5]
JUNOS Web Management Platform Package [15.1R5.5]
JUNOS Web Management Application package [15.1A3]
JUNOS Online Documentation [15.1R5.5]
@MDF-ServerRoom-Stack> show configuration | display set | match helper
set forwarding-options helpers bootp server 10.140.0.10 routing-instance vir-Access
set forwarding-options helpers bootp server 10.140.0.11 routing-instance vir-Access
set forwarding-options helpers bootp interface vlan.1408
set forwarding-options helpers bootp interface vlan.255
set forwarding-options helpers bootp interface vlan.1401
set forwarding-options helpers bootp interface vlan.110
set forwarding-options helpers bootp interface vlan.252
set forwarding-options helpers bootp interface vlan.1410
set forwarding-options helpers bootp interface vlan.1432
set forwarding-options helpers bootp interface vlan.1440
set forwarding-options helpers bootp interface vlan.100
set forwarding-options helpers bootp interface vlan.1412
My bad. DHCP relay uses jdhcpd, not "dhcpd".
Helpers use the dhcpd and fud processes.
Can you collect "show helper statistics" and see whether anything looks abnormal?
Do the IP cameras that you added use DHCP?
Sometimes restarting just dhcpd does not clear the leak properly.
Can you try rebooting the whole box, or even upgrading to one of the JTAC recommended versions listed in KB https://kb.juniper.net/InfoCenter/index?page=content&id=KB21476
Here are the results:
@MDF-ServerRoom-Stack> show helper statistics
BOOTP:
    Received packets: 2055
    Forwarded packets: 3011
    Dropped packets: 0
        Due to no interface in DHCP Relay database: 0
        Due to no matching routing instance: 0
        Due to an error during packet read: 0
        Due to an error during packet send: 0
        Due to invalid server address: 0
        Due to no valid local address: 0
        Due to no route to server/client: 0
        Due to received on ICL interface: 0
The cameras do use DHCP. They are Axis cameras with up-to-date firmware, so I don't think there is a known issue with them, but I don't know for certain.
I do have the latest JTAC-recommended firmware but haven't installed it yet. I can do that tonight (after everyone leaves). The helper stats don't look off, although with 2000 requests I'm not sure whether that's since restarting the service this morning or since the switch was rebooted about a month ago (when we first started seeing this issue).
These stats reset when you reboot the router or when you clear them manually using the command "clear helper statistics".
How long does it take to reach 85% of RLIMIT_DATA after you restart DHCP?
Please keep us posted of the results after the upgrade.
I believe it takes about 4 to 5 days to run out of RAM and fail...
I'll let you know what we see after the update. If the stats are since reboot and not restart of the dhcp service, then 2000 is right in line with what I'd expect in a month or so. Stack has only been up 42 days.
Since this is an EX4200 VC, could you please try 12.3R12-S12 (not S13) and see what results you get? My suggestion for legacy EX is to stay away from 15.1, but if you plan or want to use this code, then run R7 at minimum.
Please keep us posted on whether the upgrade helped or not.
Also keep monitoring the dhcp process to see whether its memory use is increasing along with the helper statistics:
show system processes extensive | match dhcp
show helper statistics
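For anyone who wants to turn those periodic readings into a trend, here is a minimal Python sketch. It assumes you have copied the memory column for the dhcp process from "show system processes extensive | match dhcp" by hand at known times, and that sizes use top-style K/M/G suffixes. The function names and the sample readings are hypothetical, made up for illustration, not taken from this switch.

```python
import re

# Multipliers for top-style size suffixes (assumed binary units).
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a top-style size such as '4520K' or '12M' to bytes.

    A bare number is treated as bytes (an assumption).
    """
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([KMG])?", text.strip().upper())
    if not match:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS.get(unit, 1))


def growth_per_day(samples):
    """Least-squares slope, in bytes per day, of (hours_elapsed, size_text) samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    xs = [hours / 24.0 for hours, _ in samples]
    ys = [parse_size(size) for _, size in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    denom = sum((x - mean_x) ** 2 for x in xs)
    if denom == 0:
        raise ValueError("samples must be taken at different times")
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / denom


# Hypothetical readings: (hours since the first sample, memory column as shown).
samples = [(0, "4096K"), (24, "5120K"), (48, "6144K"), (72, "7168K")]
rate = growth_per_day(samples)
print(f"dhcp process growth: {rate / 1024:.0f} KB/day")
```

A slope that stays clearly positive over several days while "show helper statistics" shows only modest traffic would point toward a leak rather than normal load; a flat slope suggests the memory is stable.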
I rolled the JUNOS back to the one you recommended, 12.3r12.4, and the memory is staying exactly where it is supposed to. We did have to adjust the config a bit, as the downgrade changed our aggregated port count and we lost our uplink to the routers... BUT, after a quick trip to the clients, adjusting one line and a commit, BAM, the network was up and all seemed well. It's been up for a few days now, and memory is right where it should be.
Thank you for your help! We'll keep monitoring it, but I don't anticipate any issues at this time.
Hi swgarland!
Thanks a lot for the confirmation.
It was a pleasure working with you.