Junos OS

Expand all | Collapse all

Problem with RPM and event policies

  • 1.  Problem with RPM and event policies

    Posted 12-15-2020 11:39

    Hello,
    I've been setting up automatic internet fail-over on our SRX-300 devices and have a problem with the event handling.

    The "back-to-normal-maininet" event policy (to be run when RPM indicates the ping monitor is succeeding) is being run multiple times per day when there should be no reason to. The "no-prefer-maininet-pingfail" event policy (to be run when the RPM monitor fails) is never run at any time before these events.

    Any ideas?

    SRX-300
    JunOS 18.2R3-S2.9



    
    
    system {
      services {
        syslog {
          file rpm.log {
              any any;
              match rmopd;
              archive size 5m files 1;
              explicit-priority;
          }
      }
    }
    services {
        rpm {
            traceoptions {
                file RPM_status.txt size 10m files 2 world-readable;
                flag all;
            }
            probe main-inet-probe {
                test probe-oogle {
                    probe-type icmp-ping;
                    target address 8.8.8.8;
                    probe-count 3;
                    probe-interval 1;
                    test-interval 3;
                    thresholds {
                        successive-loss 3;
                    }
                    destination-interface ge-0/0/5.0;
                    next-hop 71.13.X.U;
                }
            }
        }
    }
    event-options {
        policy no-prefer-maininet-pingfail {
            events [ ping_test_failed ping_probe_failed ];
            within 30 {
                trigger until 4;
            }
            within 25 {
                trigger on 3;
            }
            attributes-match {
                ping_test_failed.test-owner matches main-inet-probe;
                ping_test_failed.test-name matches probe-oogle;
            }
            then {
                priority-override {
                    severity warning;
                }
                change-configuration {
                    retry count 3 interval 20;
                    commands {
                        "set routing-options static route 0.0.0.0/0 preference 42000";
                    }
                    commit-options {
                        log "RPM ping fail, disable primary default route";
                    }
                }
                execute-commands {
                    commands {
                        # this is just help me better monitor when these events are triggered
                        "request snmp spoof-trap mplsTunnelDown";
                    }
                }
            }
        }
        policy back-to-normal-maininet {
            events ping_test_completed;
            within 20 {
                trigger on 3;
            }
            within 25 {
                trigger until 4;
            }
            attributes-match {
                ping_test_completed.test-owner matches main-inet-probe;
                ping_test_completed.test-name matches probe-oogle;
            }
            then {
                priority-override {
                    severity warning;
                }
                change-configuration {
                    retry count 3 interval 20;
                    commands {
                        "delete routing-options static route 0.0.0.0/0 preference";
                    }
                    commit-options {
                        log "RPM ping success, return to normal routing";
                    }
                }
                execute-commands {
                    commands {
                        # this is just help me better monitor when these events are triggered
                        "request snmp spoof-trap mplsTunnelUp";
                    }
                }
            }
        }
        traceoptions {
            file EVENTOPTIONS.txt size 5m files 2 world-readable;
            flag all;
        }
    }
    ​



    % grep '09:18:' /var/log/RPM_status.txt | egrep -v 'Next-hop|rmop_calc_jitter'
    ...
    Nov 18 09:18:32 test_done: sent 3, test 0
    Nov 18 09:18:32 PING_TEST_COMPLETED: pingCtlOwnerIndex = main-inet-probe, pingCtlTestName = probe-oogle
    Nov 18 09:18:32 RTM_CHANGE gencfg for probe main-inet-probe, test probe-oogle to state PASS
    Nov 18 09:18:37 test_done: sent 3, test 0
    Nov 18 09:18:37 PING_TEST_COMPLETED: pingCtlOwnerIndex = main-inet-probe, pingCtlTestName = probe-oogle
    Nov 18 09:18:37 RTM_CHANGE gencfg for probe main-inet-probe, test probe-oogle to state PASS
    Nov 18 09:18:42 test_done: sent 3, test 0
    Nov 18 09:18:42 PING_TEST_COMPLETED: pingCtlOwnerIndex = main-inet-probe, pingCtlTestName = probe-oogle
    Nov 18 09:18:42 RTM_CHANGE gencfg for probe main-inet-probe, test probe-oogle to state PASS
    Nov 18 09:18:47 test_done: sent 3, test 0
    Nov 18 09:18:47 PING_TEST_COMPLETED: pingCtlOwnerIndex = main-inet-probe, pingCtlTestName = probe-oogle
    Nov 18 09:18:47 RTM_CHANGE gencfg for probe main-inet-probe, test probe-oogle to state PASS
    Nov 18 09:18:56 test_done: sent 3, test 0
    Nov 18 09:18:56 PING_TEST_COMPLETED: pingCtlOwnerIndex = main-inet-probe, pingCtlTestName = probe-oogle
    Nov 18 09:18:56 RTM_CHANGE gencfg for probe main-inet-probe, test probe-oogle to state PASS
    Nov 18 09:19:05 handle_timeout: time out 3, test 0 entry->probe_state 1
    Nov 18 09:19:05 handle_timeout: time out 3, test 0
    Nov 18 09:19:05 test_done: sent 3, test 0
    Nov 18 09:19:05 PING_TEST_FAILED: pingCtlOwnerIndex = main-inet-probe, pingCtlTestName = probe-oogle
    Nov 18 09:19:05 RTM_CHANGE gencfg for probe main-inet-probe, test probe-oogle to state FAIL
    Nov 18 09:19:05 icmp_notify: not sent, test 0, ping 2
    Nov 18 09:19:10 test_done: sent 3, test 0
    Nov 18 09:19:10 PING_TEST_COMPLETED: pingCtlOwnerIndex = main-inet-probe, pingCtlTestName = probe-oogle
    Nov 18 09:19:10 RTM_CHANGE gencfg for probe main-inet-probe, test probe-oogle to state PASS
    Nov 18 09:19:15 test_done: sent 3, test 0
    Nov 18 09:19:15 PING_TEST_COMPLETED: pingCtlOwnerIndex = main-inet-probe, pingCtlTestName = probe-oogle
    Nov 18 09:19:15 RTM_CHANGE gencfg for probe main-inet-probe, test probe-oogle to state PASS
    Nov 18 09:19:22 test_done: sent 3, test 0
    Nov 18 09:19:22 PING_TEST_COMPLETED: pingCtlOwnerIndex = main-inet-probe, pingCtlTestName = probe-oogle
    Nov 18 09:19:22 RTM_CHANGE gencfg for probe main-inet-probe, test probe-oogle to state PASS
    Nov 18 09:19:26 handle_timeout: time out 1, test 0 entry->probe_state 1
    Nov 18 09:19:26 handle_timeout: time out 1, test 0
    Nov 18 09:19:27 icmp_notify: not sent, test 0, ping 0
    Nov 18 09:19:27 test_done: sent 3, test 0
    Nov 18 09:19:27 PING_TEST_FAILED: pingCtlOwnerIndex = main-inet-probe, pingCtlTestName = probe-oogle
    Nov 18 09:19:27 RTM_CHANGE gencfg for probe main-inet-probe, test probe-oogle to state FAIL
    Nov 18 09:19:32 test_done: sent 3, test 0
    Nov 18 09:19:32 PING_TEST_COMPLETED: pingCtlOwnerIndex = main-inet-probe, pingCtlTestName = probe-oogle
    Nov 18 09:19:32 RTM_CHANGE gencfg for probe main-inet-probe, test probe-oogle to state PASS
    Nov 18 09:19:37 test_done: sent 3, test 0
    Nov 18 09:19:37 PING_TEST_COMPLETED: pingCtlOwnerIndex = main-inet-probe, pingCtlTestName = probe-oogle
    Nov 18 09:19:37 RTM_CHANGE gencfg for probe main-inet-probe, test probe-oogle to state PASS
    Nov 18 09:19:46 test_done: sent 3, test 0
    Nov 18 09:19:46 PING_TEST_COMPLETED: pingCtlOwnerIndex = main-inet-probe, pingCtlTestName = probe-oogle
    Nov 18 09:19:46 RTM_CHANGE gencfg for probe main-inet-probe, test probe-oogle to state PASS
    Nov 18 09:19:56 test_done: sent 3, test 0
    Nov 18 09:19:56 PING_TEST_COMPLETED: pingCtlOwnerIndex = main-inet-probe, pingCtlTestName = probe-oogle
    Nov 18 09:19:56 RTM_CHANGE gencfg for probe main-inet-probe, test probe-oogle to state PASS
    Nov 18 09:20:01 test_done: sent 3, test 0
    Nov 18 09:20:01 PING_TEST_COMPLETED: pingCtlOwnerIndex = main-inet-probe, pingCtlTestName = probe-oogle
    ...

    % grep '09:18:' /var/log/rpm.log
    ...
    Nov 18 09:18:32 ws-firewall rmopd[2051]: %DAEMON-6-PING_TEST_COMPLETED: pingCtlOwnerIndex = main-inet-probe, pingCtlTestName = probe-oogle
    Nov 18 09:18:37 ws-firewall rmopd[2051]: %DAEMON-6-PING_TEST_COMPLETED: pingCtlOwnerIndex = main-inet-probe, pingCtlTestName = probe-oogle
    Nov 18 09:18:42 ws-firewall rmopd[2051]: %DAEMON-6-PING_TEST_COMPLETED: pingCtlOwnerIndex = main-inet-probe, pingCtlTestName = probe-oogle
    Nov 18 09:18:47 ws-firewall rmopd[2051]: %DAEMON-6-PING_TEST_COMPLETED: pingCtlOwnerIndex = main-inet-probe, pingCtlTestName = probe-oogle
    Nov 18 09:18:56 ws-firewall rmopd[2051]: %DAEMON-6-PING_TEST_COMPLETED: pingCtlOwnerIndex = main-inet-probe, pingCtlTestName = probe-oogle
    Nov 18 09:19:05 ws-firewall rmopd[2051]: %DAEMON-6-PING_TEST_FAILED: pingCtlOwnerIndex = main-inet-probe, pingCtlTestName = probe-oogle
    Nov 18 09:19:10 ws-firewall rmopd[2051]: %DAEMON-6-PING_TEST_COMPLETED: pingCtlOwnerIndex = main-inet-probe, pingCtlTestName = probe-oogle
    Nov 18 09:19:15 ws-firewall rmopd[2051]: %DAEMON-6-PING_TEST_COMPLETED: pingCtlOwnerIndex = main-inet-probe, pingCtlTestName = probe-oogle
    Nov 18 09:19:22 ws-firewall rmopd[2051]: %DAEMON-6-PING_TEST_COMPLETED: pingCtlOwnerIndex = main-inet-probe, pingCtlTestName = probe-oogle
    Nov 18 09:19:27 ws-firewall rmopd[2051]: %DAEMON-6-PING_TEST_FAILED: pingCtlOwnerIndex = main-inet-probe, pingCtlTestName = probe-oogle
    Nov 18 09:19:32 ws-firewall rmopd[2051]: %DAEMON-6-PING_TEST_COMPLETED: pingCtlOwnerIndex = main-inet-probe, pingCtlTestName = probe-oogle
    Nov 18 09:19:37 ws-firewall rmopd[2051]: %DAEMON-6-PING_TEST_COMPLETED: pingCtlOwnerIndex = main-inet-probe, pingCtlTestName = probe-oogle
    Nov 18 09:19:46 ws-firewall rmopd[2051]: %DAEMON-6-PING_TEST_COMPLETED: pingCtlOwnerIndex = main-inet-probe, pingCtlTestName = probe-oogle
    Nov 18 09:19:56 ws-firewall rmopd[2051]: %DAEMON-6-PING_TEST_COMPLETED: pingCtlOwnerIndex = main-inet-probe, pingCtlTestName = probe-oogle
    Nov 18 09:20:01 ws-firewall rmopd[2051]: %DAEMON-6-PING_TEST_COMPLETED: pingCtlOwnerIndex = main-inet-probe, pingCtlTestName = probe-oogle
    ...

    % egrep '09:(18|19|20):' /var/log/EVENTOPTIONS.txt | egrep 'PING|Execut'
    ...
    Nov 18 09:18:32 Processing event <PING_TEST_COMPLETED>
    Nov 18 09:18:32 Processing event <PING_TEST_COMPLETED>
    Nov 18 09:18:32 Policy <back-to-normal-maininet>'s conditions dont match for event <PING_TEST_COMPLETED>
    Nov 18 09:18:37 event_filter_tree_find: Event id PING_TEST_COMPLETED not found
    Nov 18 09:18:37 Processing event <PING_TEST_COMPLETED>
    Nov 18 09:18:37 Policy <back-to-normal-maininet>'s conditions dont match for event <PING_TEST_COMPLETED>
    Nov 18 09:18:37 event_filter_tree_find: Event id PING_TEST_COMPLETED not found
    Nov 18 09:18:42 Processing event <PING_TEST_COMPLETED>
    Nov 18 09:18:42 Executing policy: <back-to-normal-maininet>
    Nov 18 09:18:42 Changing priority, for policy 'back-to-normal-maininet' triggered on receipt of event 'PING_TEST_COMPLETED'
    Nov 18 09:18:42 executing commands for policy 'back-to-normal-maininet' triggered on receipt of event 'PING_TEST_COMPLETED', with '/tmp/evt_cmd_0TKvCe' as the temporary command file, and '/dev/null' as the temporary output file
    Nov 18 09:18:42 Executing commands </usr/libexec/invoke-commands /tmp/evt_cmd_0TKvCe /dev/null XML>
    Nov 18 09:18:42 event_filter_tree_find: Event id PING_TEST_COMPLETED not found
    Nov 18 09:18:44 changing configuration for policy 'back-to-normal-maininet' triggered on receipt of event 'PING_TEST_COMPLETED', with '/tmp/evt_cmd_kasf4N' as the temporary command file
    Nov 18 09:18:47 Processing event <PING_TEST_COMPLETED>
    Nov 18 09:18:47 Policy <back-to-normal-maininet>'s conditions dont match for event <PING_TEST_COMPLETED>
    Nov 18 09:18:47 event_filter_tree_find: Event id PING_TEST_COMPLETED not found
    Nov 18 09:18:56 Processing event <PING_TEST_COMPLETED>
    Nov 18 09:18:56 Policy <back-to-normal-maininet>'s conditions dont match for event <PING_TEST_COMPLETED>
    Nov 18 09:18:56 event_filter_tree_find: Event id PING_TEST_COMPLETED not found
    Nov 18 09:19:05 Processing event <PING_TEST_FAILED>
    Nov 18 09:19:05 Policy <no-prefer-maininet-pingfail>'s conditions dont match for event <PING_TEST_FAILED>
    Nov 18 09:19:06 event_filter_tree_find: Event id PING_TEST_FAILED not found
    Nov 18 09:19:10 Processing event <PING_TEST_COMPLETED>
    Nov 18 09:19:10 Policy <back-to-normal-maininet>'s conditions dont match for event <PING_TEST_COMPLETED>
    Nov 18 09:19:10 event_filter_tree_find: Event id PING_TEST_COMPLETED not found
    Nov 18 09:19:16 Processing event <PING_TEST_COMPLETED>
    Nov 18 09:19:16 Executing policy: <back-to-normal-maininet>
    Nov 18 09:19:16 Changing priority, for policy 'back-to-normal-maininet' triggered on receipt of event 'PING_TEST_COMPLETED'
    Nov 18 09:19:16 executing commands for policy 'back-to-normal-maininet' triggered on receipt of event 'PING_TEST_COMPLETED', with '/tmp/evt_cmd_XjFc9d' as the temporary command file, and '/dev/null' as the temporary output file
    Nov 18 09:19:16 event_filter_tree_find: Event id PING_TEST_COMPLETED not found
    Nov 18 09:19:16 Executing commands </usr/libexec/invoke-commands /tmp/evt_cmd_XjFc9d /dev/null XML>
    Nov 18 09:19:22 Processing event <PING_TEST_COMPLETED>
    Nov 18 09:19:22 Executing policy: <back-to-normal-maininet>
    Nov 18 09:19:22 Changing priority, for policy 'back-to-normal-maininet' triggered on receipt of event 'PING_TEST_COMPLETED'
    Nov 18 09:19:22 executing commands for policy 'back-to-normal-maininet' triggered on receipt of event 'PING_TEST_COMPLETED', with '/tmp/evt_cmd_6Zjtdq' as the temporary command file, and '/dev/null' as the temporary output file
    Nov 18 09:19:22 event_filter_tree_find: Event id PING_TEST_COMPLETED not found
    Nov 18 09:19:22 Executing commands </usr/libexec/invoke-commands /tmp/evt_cmd_6Zjtdq /dev/null XML>
    Nov 18 09:19:27 Processing event <PING_TEST_FAILED>
    Nov 18 09:19:27 Policy <no-prefer-maininet-pingfail>'s conditions dont match for event <PING_TEST_FAILED>
    Nov 18 09:19:27 event_filter_tree_find: Event id PING_TEST_FAILED not found
    Nov 18 09:19:27 changing configuration for policy 'back-to-normal-maininet' triggered on receipt of event 'PING_TEST_COMPLETED', with '/tmp/evt_cmd_gk3W5N' as the temporary command file
    Nov 18 09:19:28 changing configuration for policy 'back-to-normal-maininet' triggered on receipt of event 'PING_TEST_COMPLETED', with '/tmp/evt_cmd_QiyK2W' as the temporary command file
    Nov 18 09:19:32 Processing event <PING_TEST_COMPLETED>
    Nov 18 09:19:32 Policy <back-to-normal-maininet>'s conditions dont match for event <PING_TEST_COMPLETED>
    Nov 18 09:19:32 event_filter_tree_find: Event id PING_TEST_COMPLETED not found
    Nov 18 09:19:37 Processing event <PING_TEST_COMPLETED>
    Nov 18 09:19:37 Policy <back-to-normal-maininet>'s conditions dont match for event <PING_TEST_COMPLETED>
    ...


  • 2.  RE: Problem with RPM and event policies

     
    Posted 12-17-2020 09:23

    Hi,

    Something that could be simpler to deploy and that the SRX300 supports starting on 15.1X49-D35 is IP Monitoring. Here is the word-by-word description on what it does:

    "Using IP monitoring with route failover, you can track an IP address or a set of IP addresses using a real-time performance monitoring (RPM) probe. If the RPM probe fails, you can inject a route into the routing table. After the RPM probe successfully reaches its target, the route is withdrawn from both the routing and forwarding tables."

    This would look like this per your current configuration:

    set services rpm probe main-inet-probe test probe-oogle target address 8.8.8.8
    set services rpm probe main-inet-probe test probe-oogle probe-count 3
    set services rpm probe main-inet-probe test probe-oogle probe-interval 1
    set services rpm probe main-inet-probe test probe-oogle test-interval 3
    set services rpm probe main-inet-probe test probe-oogle thresholds successive-loss 3
    set services rpm probe main-inet-probe test probe-oogle destination-interface ge-0/0/5.0
    set services rpm probe main-inet-probe test probe-oogle hardware-timestamp
    set services rpm probe main-inet-probe test probe-oogle next-hop 71.13.X.U

    set services ip-monitoring policy probe match rpm-probe main-inet-probe
    set services ip-monitoring policy payment then preferred-route route 0.0.0.0/0 next-hop 2.1.A.B

    https://www.juniper.net/documentation/en_US/release-independent/nce/topics/task/configuration/internet-protocol-route-monitoring.html 



    ------------------------------
    Juan Ospina
    ------------------------------



  • 3.  RE: Problem with RPM and event policies

    Posted 12-21-2020 10:44
    -------------------------------------------
    Original Message:
    Sent: 12-17-2020 09:22
    From: Juan Ospina
    Subject: Problem with RPM and event policies


    Juan

    My example configuration does not show it but I also need to alter DNS settings and change some OSPF exports. That is why I am using this method.

    The above method is well documented and used by many people so I was hoping someone would have an idea why I am seeing such odd behavior.

    Thanks.

    Hi,

    Something that could be simpler to deploy and that the SRX300 supports starting on 15.1X49-D35 is IP Monitoring. Here is the word-by-word description on what it does:

    "Using IP monitoring with route failover, you can track an IP address or a set of IP addresses using a real-time performance monitoring (RPM) probe. If the RPM probe fails, you can inject a route into the routing table. After the RPM probe successfully reaches its target, the route is withdrawn from both the routing and forwarding tables."

    This would look like this per your current configuration:

    set services rpm probe main-inet-probe test probe-oogle target address 8.8.8.8
    set services rpm probe main-inet-probe test probe-oogle probe-count 3
    set services rpm probe main-inet-probe test probe-oogle probe-interval 1
    set services rpm probe main-inet-probe test probe-oogle test-interval 3
    set services rpm probe main-inet-probe test probe-oogle thresholds successive-loss 3
    set services rpm probe main-inet-probe test probe-oogle destination-interface ge-0/0/5.0
    set services rpm probe main-inet-probe test probe-oogle hardware-timestamp
    set services rpm probe main-inet-probe test probe-oogle next-hop 71.13.X.U

    set services ip-monitoring policy probe match rpm-probe main-inet-probe
    set services ip-monitoring policy payment then preferred-route route 0.0.0.0/0 next-hop 2.1.A.B

    https://www.juniper.net/documentation/en_US/release-independent/nce/topics/task/configuration/internet-protocol-route-monitoring.html 



    ------------------------------
    Juan Ospina