Hi all!
I have an MX480 as a border gateway with three BGP sessions to two providers: two sessions with provider 1 and one with provider 2.
The provider 1 sessions are preferred: the first is the main session (higher local preference) and the second is its backup. The provider 2 session is the backup of the backup. However, the two provider 1 sessions are not fully independent. In some cases, when the main session goes down, traffic immediately switches to the second session. In other cases, the main session is down and the second session stays up, but the provider's BGP convergence can take tens of minutes. For such cases I would like to switch traffic (by shutting down the provider 1 backup session) if the border router has had no connection to the Internet for 30 seconds, checked for example by pinging 8.8.8.8. So I started looking into what can be done with event scripts.
Long story short, I have the configuration below. I expect that when BGP neighbor 10.0.130.210 goes down, everything in the "then" section will be executed:
set event-options policy bgp-down events rpd_bgp_neighbor_state_changed
set event-options policy bgp-down attributes-match rpd_bgp_neighbor_state_changed.old-state matches Established
set event-options policy bgp-down attributes-match rpd_bgp_neighbor_state_changed.peer-name matches 10.0.130.210
set event-options policy bgp-down then change-configuration commands "activate protocol bgp group toPE1"
set event-options policy bgp-down then change-configuration commands "top set interfaces ge-3/2/3 description test"
set event-options policy bgp-down then change-configuration commit-options log "changing config due to event script bgp-down"
set event-options policy bgp-down then execute-commands commands "ping 10.0.191.1"
set event-options policy bgp-down then execute-commands commands "show bgp summary"
Then I tried to reproduce the problem by deactivating the BGP session on the neighbor's (10.0.130.210) side. But the script does not work, and I suppose this is because the event is never triggered. Deactivating lines 2-3 (the attributes-match statements) does not change anything. The only thing I see in the logs is:
Aug 16 14:16:15 MX480-RE0 rpd[5402]: bgp_handle_notify:4235: NOTIFICATION received from 10.0.130.210 (External AS 65002): code 6 (Cease) subcode 3 (Peer Unconfigured)
and nothing about "rpd_bgp_neighbor_state_changed".
If I deactivate the L3 interface on the neighbor's side instead, I see:
Aug 16 18:19:46 MX480-RE0 rpd[5402]: bgp_io_mgmt_cb:1777: NOTIFICATION sent to 10.0.130.210 (External AS 65002): code 4 (Hold Timer Expired Error), Reason: holdtime expired for 10.0.130.210 (External AS 65002), socket buffer sndcc: 57 rcvcc: 0 TCP state: 4, snd_una: 208574444 snd_nxt: 208574482 snd_wnd: 16384 rcv_nxt: 1714757300 rcv_adv: 1714773684, hold timer 90s, hold timer remain 0s, last sent 5s, TCP port (local 179, remote 59862)
It still does not work, and there is still nothing about "rpd_bgp_neighbor_state_changed".
So why is this not working?
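A possible cause worth ruling out (an assumption, not a confirmed diagnosis for this router): rpd appears to write the RPD_BGP_NEIGHBOR_STATE_CHANGED syslog message only when BGP state-change logging is enabled with log-updown, and the event policy matches on that message. A minimal sketch that enables it globally (it can also be set per group or per neighbor):
# configuration mode; assumes log-updown is not already set
set protocols bgp log-updown
After a commit and another session flap, the message should show up if this was the cause. Assuming the default "messages" log file, it can be checked from operational mode with:
show log messages | match RPD_BGP_NEIGHBOR_STATE_CHANGED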
Any thoughts on whether this can be solved without scripting? The main problem is to detect the case where the main session is down and the second is still up, but there is no connectivity to the Internet.
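A possible script-free direction, sketched under assumptions and not tested here: an RPM probe pings 8.8.8.8, and an event policy reacts to the PING_TEST_FAILED event that the probe generates. The probe owner and test names (inet-check, google-dns), the policy name inet-down, the timer values, and the neighbor address 192.0.2.1 are all hypothetical placeholders, and toPE1 is only assumed to be the provider 1 group.
# hypothetical names and values: 6 probes, 5 s apart, roughly 30 s per test
set services rpm probe inet-check test google-dns probe-type icmp-ping
set services rpm probe inet-check test google-dns target address 8.8.8.8
set services rpm probe inet-check test google-dns probe-count 6
set services rpm probe inet-check test google-dns probe-interval 5
set services rpm probe inet-check test google-dns test-interval 10
set services rpm probe inet-check test google-dns thresholds total-loss 6
# 192.0.2.1 stands in for the provider 1 backup neighbor (placeholder address)
set event-options policy inet-down events ping_test_failed
set event-options policy inet-down attributes-match ping_test_failed.test-owner matches inet-check
set event-options policy inet-down attributes-match ping_test_failed.test-name matches google-dns
set event-options policy inet-down then change-configuration commands "deactivate protocols bgp group toPE1 neighbor 192.0.2.1"
set event-options policy inet-down then change-configuration commit-options log "backup session disabled, no Internet reachability"
This sketch has no recovery path: the neighbor stays deactivated until it is reactivated by hand. It would also fire when 8.8.8.8 is unreachable for unrelated reasons, so a real setup would probably probe more than one target.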
------------------------------
Vladlen London
------------------------------