All valid concerns, I think. The good thing is that all RPM tests in a probe have to fail in order for the probe to be considered down.
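To make that "all tests must fail" rule concrete, here is a minimal Python sketch of the decision only. It is not Junos code: the function name and the test names ("gateway", "google-dns") are hypothetical, chosen to mirror the two targets mentioned in the original post.

```python
def probe_is_down(test_failed: dict[str, bool]) -> bool:
    """Return True only when every test in the probe has failed."""
    # Assumption: a probe with no tests is treated as up, so an empty
    # definition can never trigger a failover by itself.
    return bool(test_failed) and all(test_failed.values())


# Only 8.8.8.8 stops answering: the probe stays up, no failover.
print(probe_is_down({"gateway": False, "google-dns": True}))  # False

# Both targets are unreachable: the probe is down.
print(probe_is_down({"gateway": True, "google-dns": True}))   # True
```

So if Google alone starts rate-limiting ICMP, a second healthy target in the same probe is enough to keep the connection marked as up.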
I'm wary of using the ISP gateway because 1) the gateway address sometimes lives very close to your device and would be unaffected by upstream outages; and 2) even when (1) doesn't apply, a problem deeper within the ISP network may not affect your ability to reach the gateway.
Even Google DNS is somewhat regional -- it only tells you that you can reach the closest Google point of presence that advertises 8.8.8.0/24 to the world. But good enough is, well, good enough. I think Cloudflare's 1.1.1.1 is anycast like that as well.
If you have a second site under your control (hopefully with its own two ISPs), then pinging that would be nice as well. Sometimes I wonder what would happen if one day 8.8.8.8 stops responding -- the sheer number of alert emails that would start flying around might make a seasoned spammer jealous.
As far as the ip-monitoring action goes, I'm not a fan of changing the default route by policy like that, because when both connections are working, your backup connection is essentially "unreachable." Instead, I put each ISP connection in its own virtual router and then export the default routes into the main routing instance (or any routing instance that needs them; you could even divide traffic and use both connections at the same time). The route import / export policies can then be driven by the presence or absence of dummy routes (e.g. 10.255.255.255/32 or something like that) in the ISP routing instances, which the ip-monitoring policies insert there.
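The idea can be modeled in a few lines of Python. This is only a sketch of the selection logic, not Junos configuration, and it rests on assumptions: the marker route is taken to mean "probe failed" (the opposite convention would work just as well), and the class, function names, next-hop addresses and preference values are hypothetical.

```python
from dataclasses import dataclass, field

DEFAULT = "0.0.0.0/0"
MARKER = "10.255.255.255/32"  # dummy route; the prefix is just the example above


@dataclass
class IspInstance:
    name: str
    next_hop: str
    preference: int  # lower value wins (hypothetical ordering)
    routes: set[str] = field(default_factory=lambda: {DEFAULT})

    def set_probe_state(self, down: bool) -> None:
        # Stand-in for the ip-monitoring policy: the marker route exists
        # only while the RPM probe for this ISP is failing.
        if down:
            self.routes.add(MARKER)
        else:
            self.routes.discard(MARKER)


def exported_defaults(instances: list[IspInstance]) -> list[IspInstance]:
    """Instances whose default route passes the export policy, best first."""
    usable = [i for i in instances
              if DEFAULT in i.routes and MARKER not in i.routes]
    return sorted(usable, key=lambda i: i.preference)


isp1 = IspInstance("ISP1", "203.0.113.1", preference=10)
isp2 = IspInstance("ISP2", "198.51.100.1", preference=20)

print([i.name for i in exported_defaults([isp1, isp2])])  # ['ISP1', 'ISP2']

isp1.set_probe_state(down=True)
print([i.name for i in exported_defaults([isp1, isp2])])  # ['ISP2']
```

Note that each instance keeps its own default route regardless of probe state, which is the point: the backup connection stays reachable from the outside, and only what gets exported to the main instance changes.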
------------------------------
Nikolay Semov
------------------------------
Original Message:
Sent: 04-22-2026 09:20
From: ALFREDO JO
Subject: ISP Failover on SRX - RPM probes best practices?
Hey all,
Curious how others are handling dual-ISP failover on their SRXs.
We're currently using RPM probes with ICMP pings to a couple of endpoints - the ISP gateway and 8.8.8.8 - tied to an ip-monitoring policy that shifts the preferred route to ISP2 when both probes fail. Works, but I'm not fully confident in it.
My main concern is the reliability of using public IPs like 8.8.8.8 as probe targets. Google and Cloudflare have no obligation to respond to our pings - they could rate-limit or deprioritize ICMP at any point and we'd trigger a false failover. Given how many devices around the world are pinging 8.8.8.8 every few seconds, I wouldn't blame them if they did.
A few things I've been thinking about:
- Probing the ISP gateway directly since that's part of our circuit and they can't ignore it
- Switching from ICMP to TCP probes (port 53 or 443) since those are harder to suppress
- Using HTTP probes against something like captive.apple.com since those endpoints exist specifically for connectivity detection
- BGP + BFD if anyone's gone that route on business circuits
What are you all doing in production? Any gotchas with the ip-monitoring + preferred-route approach we should be aware of, like flapping during brownouts?
Thanks
------------------------------
ALFREDO JO
------------------------------