vSRX Latest Trial Version: 15.1X49-D70.3
vCenter 5.50, ESXi: 5.5.0U3
I deployed one vSRX with a basic config and all seemed OK. I then deployed two new instances and tried to set up a simple chassis cluster. As soon as clustering was enabled and the nodes rebooted, I suffered from disappearing interfaces: any combination of the fxp, control link, or fabric interfaces would fail to come up on one or both instances. After multiple reboots they might all come up; reboot again and any of them might not.
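For reference, I enabled the cluster with the standard operational-mode commands (cluster-id 1 is just an example value):

```
user@vsrx0> set chassis cluster cluster-id 1 node 0 reboot
user@vsrx1> set chassis cluster cluster-id 1 node 1 reboot
```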
I am using 1 x dvSwitch for fxp and Control link, and another for fabric interfaces.
The control link now seems stable on both vSRXs, but the fabric interface is quite flaky. I have enabled promiscuous mode, forged transmits, and MAC address changes on all switches, and at the moment both firewalls are deployed on the same host to eliminate any external network issues.
Sometimes all interfaces refuse to come up, and looking at the ports on the dvSwitch the link states don't show as Link Up. Checking the message log, the only thing I can find is:
jsrpd: PVIDB: Attribute 'jsrpd.use_tvp_eeprom' not present in dB
jsrpd: failed to initialize sockets
Also, many times I get an error that the jsrp-service subsystem is not running, and if left in this state it then drops to debug.
I am trialling this for an immediate project, and although first impressions of a single instance were good, enabling clustering hasn't been a great success. How do other people find chassis clustering on VMware? I understand there is always added complexity and variance in the underlying infrastructure, but we have a fairly plain VMware configuration, so I had hoped it would have been a fairly quick way to deploy 😕
Results of show int terse:
node 0: no ge interfaces and no fab interfaces mapped to ge, but em1 and fxp0 are both up
node 1: fab interfaces are present, but fab reports down for ge-7/0/0 (understandable, since it's not even present)
both fab interfaces are on the same dvSwitch
1: After enabling the cluster, the vNICs map to interfaces as follows:
vNIC1 → fxp0
vNIC2 → em0 (used for the control link)
vNIC3 onwards → the ge-x/x/x interfaces. One of the ge interfaces will be mapped to the fab interface.
2: I hope you have the correct mapping of the interfaces. You can add more interfaces as described in this document: https://www.juniper.net/techpubs/en_US/vsrx15.1x49-d40/topics/task/configuration/security-vsrx-vmware-adding-interfaces.html
3: Is there any specific reason why you have different IPs on the fab0.0 interface on the two boxes?
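For completeness, the fab links are normally tied to member interfaces along these lines - a minimal sketch, with ge-0/0/0 and ge-7/0/0 as example members (the fab0.0/fab1.0 addresses are assigned internally and don't normally need manual configuration):

```
set interfaces fab0 fabric-options member-interfaces ge-0/0/0
set interfaces fab1 fabric-options member-interfaces ge-7/0/0
```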
Thanks for the reply,
I did add the additional NICs as per the document (even though there shouldn't have been a need to).
So I stripped it right back to basics - I made a new vSwitch for all interfaces (previously I was using dvSwitches) - and bingo it worked first time. When the cluster came up with em0, fab and the reth interface it all seemed stable, so I went back to the original deployment:
2 x dvSwitches, both vSRXs on the same ESXi host: back to instability and interfaces not coming up.
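For context, a minimal reth setup typically looks like the sketch below; interface names, priorities, and the address are illustrative placeholders, not necessarily my exact config:

```
set chassis cluster reth-count 1
set chassis cluster redundancy-group 1 node 0 priority 200
set chassis cluster redundancy-group 1 node 1 priority 100
set interfaces ge-0/0/1 gigether-options redundant-parent reth0
set interfaces ge-7/0/1 gigether-options redundant-parent reth0
set interfaces reth0 redundant-ether-options redundancy-group 1
set interfaces reth0 unit 0 family inet address 192.0.2.1/24
```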
I made a new dvSwitch with a new VLAN on the network and no existing config anywhere, and put all interfaces into that: same result.
I then moved the interfaces back into their previously designated dvSwitches: same result.
I then vMotioned one of the vSRXs to a different host, and immediately all interfaces came up and the cluster formed correctly, without even a reboot of the vSRX. Many reboots and tests later, I'm happy to report it is all stable.
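For anyone reproducing this, I verified the cluster state with the usual status commands:

```
show chassis cluster status
show chassis cluster interfaces
```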
Just for testing, I vMotioned the other vSRX to the second host, and immediately the troubles began again. So it looks like it is an issue when using dvSwitches with both nodes on the same ESXi host, while vSwitches are OK. Luckily, in my production design I will use anti-affinity rules to separate the vSRXs onto different hosts anyway, but this is what I have found so far.
I also managed to get the latest release to trial from my Juniper rep, so I'll deploy these again and see if the problem is still apparent. I did find a release note regarding a change in MAC address behaviour with vSwitches, but IIRC this relates only to reth interfaces.
For anyone else having this problem: this happened to me with all the same symptoms. Enable "Expose hardware assisted virtualization to the guest OS" in the CPU settings of your vSRX VM. Once I did this, all these problems went away and performance was much better.
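If you prefer to set this outside the vSphere client, the same option corresponds to the standard nested-HV parameter in the VM's .vmx file (the VM must be powered off when you edit it):

```
vhv.enable = "TRUE"
```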
This is documented in the link below:
Without this, the nested RE becomes slow and you may see issues like the one you described.
Thanks - enabling the option Expose hardware-assisted virtualization to guest OS made a huge difference, I missed this when initially deploying.