
Expert Advice: Design Considerations for “Long-Lived” On-box SLAX Scripts

By Erdem posted 08-11-2015 14:35

  


 


 

 

NOTE: This applies to SLAX version 1.0 and higher.

 

As with most design work, writing SLAX scripts involves trade-offs.

 

Junos OS automation scripts serve many different functions. Sometimes you need a script that provides a one-shot report of some device counters (for troubleshooting, say) or provides a shortcut for updating the configuration. Other times you need a script that operates over a long period of time (such as one that collects or reports counters) and behaves more like a system “daemon”.

 

As it turns out, there are two basic approaches to making an on-box SLAX script provide daemon-like (daemonic?) functionality:

 

  • Schedule the script as a timer-driven event script, so that it is executed much like a cron job.
  • Create a self-restarting script, one that causes a new copy of itself to be started when the current copy exits.

You may well ask, “What would each of these approaches look like?” or “What are the pros and cons of these two approaches?” And I would reply, “Those are very good questions. You are clearly an intelligent, engaged and discerning reader. Please continue.”

 

Approach 1: The Timer-Driven Script

 

This is probably the more straightforward of the two approaches, and it might be all you need to set up repetitive execution of most basic scripts. If you already have an op script that performs some set of operations and exits, all you need to do is craft a generate-event timer, plus an event policy that runs your script each time the timer fires at your prescribed interval. For example, the following generate-event timer causes an event named "Every5Minutes" to fire at 5-minute (300-second) intervals:

 

```
event-options {
    generate-event {
        Every5Minutes time-interval 300;
    }
}
```

 

Now, if you have a script, for example pfe-mon.slax, that you would like to execute as an event script every 5 minutes, all you need to do is add an event policy to the same event-options configuration, like so:

 

```
event-options {
    generate-event {
        Every5Minutes time-interval 300;
    }
    policy run_pfe-mon {
        events Every5Minutes;
        then {
            event-script pfe-mon.slax;
        }
    }
}
```

 

And that is that. The policy run_pfe-mon will ensure that your script (presuming it was correctly written and installed into /var/db/scripts/event) will execute automatically every 5 minutes, ad infinitum. You’re now done; go have a latte.
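One detail worth checking: on most Junos OS releases, eventd only runs an event script that is also enabled in the configuration under [edit event-options event-script]. A minimal sketch, assuming pfe-mon.slax is already installed in /var/db/scripts/event (confirm the exact hierarchy in the documentation for your release):

```
event-options {
    event-script {
        /* Enable the script so that policy run_pfe-mon is allowed to run it */
        file pfe-mon.slax;
    }
}
```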

 

Approach 2: Self-Restarting Script

 

A slightly more involved approach entails designing your script so that, just before it exits, it emits a unique “leaving now” syslog message. Along with that script change, you must also craft an event policy that detects your new “leaving now” message and executes your script, starting the cycle all over again.

 

For example, the code to emit the “leaving now” syslog message in your event script wackamole.slax would use the jcs:syslog() function and would look something like this:

 

```slax
expr jcs:syslog("daemon.info", "Normal completion for wackamole script. Exiting.");
```

 

You don’t have to use the same facility.severity value (daemon.info) as above.

And the event policy (wackamole-done) that would catch your “leaving now” message would look like this:

 

```
event-options {
    policy wackamole-done {
        events system;
        attributes-match {
            system.message matches "Normal completion for wackamole script. Exiting.";
        }
        then {
            event-script wackamole.slax;
        }
    }
}
```

 

So, using this combination of syslog message and event policy, as soon as your script exits, eventd starts a fresh copy, ad infinitum. You may well be thinking, “This sounds a little silly and/or unnecessarily complex. Why on earth would I ever want to do something like that?” If so, then you’re playing right into my trap: the next section.

 

When Would I Choose One Approach over the Other? What Are the Pros and Cons to Consider?

 

Now that we have two different approaches to solve the same issue, it’s instructive to understand when one might be more appropriate than the other.

 

CPU Usage

 

When you execute a script (either interactively as an op script or automatically from an event policy), Junos OS creates a new process to run your code. From a system resource point of view, process creation is a brief but fairly CPU-intensive operation; a lot of context must be established before your script executes a single line of code. (Managing CPU usage within your SLAX script is a fine little topic for a future TechWiki article.)

 

If your script doesn’t need to run more often than about once every 5 minutes, the overhead of process creation is probably not much of a concern, which makes a simple timer-driven invocation the better fit. So, get another latte; you’re still done.

 

On the other hand, if your script needs to run much more frequently (say, at 60-second intervals), implementing it as a basic timer-driven script is probably a Bad Idea: you’ll be flogging the system with a lot of process-creation overhead. For that type of “high-fidelity” operation, you would be better off creating an in-script loop to handle the repetition.
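To put rough numbers on that overhead, take a target interval of T seconds and X loop iterations per process launch (X = 1 for a plain timer-driven script) and count the process launches per day. The X = 30 row uses an illustrative value, not a recommendation:

```latex
N_{\text{launches/day}} = \frac{86400}{T \cdot X}
\qquad
\begin{aligned}
T = 300,\ X = 1  &\;\Rightarrow\; N = 288 \\
T = 60,\ X = 1   &\;\Rightarrow\; N = 1440 \\
T = 60,\ X = 30  &\;\Rightarrow\; N = 48
\end{aligned}
```

Dropping from a 5-minute to a 1-minute timer multiplies the launches fivefold. Looping inside each launch divides them by X.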

 

When you need such an in-script loop with a relatively short interval, your script can maintain the timing between iterations itself with judicious use of functions like jcs:sleep() and date:seconds() inside the loop. There’s one annoyance with this approach: there is no such thing as an infinite loop (for example, a “while(1)” operation) in SLAX. There are hard limits on the size of the counter loops you can create and on the number of times a function or template can recurse.

 

This lack of a while() loop is the main reason for a self-restarting script. With a self-restarting script, your code can execute its main loop for “X” iterations, then emit a “leaving now” syslog message and exit. The eventd process simply starts the whole thing over again. This approach allows you to “amortize” the script’s initialization (startup) CPU cost over X loop iterations.
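Putting those pieces together, a self-restarting script might be structured like the minimal sketch below. It is not the author’s wackamole.slax. The parameter names $iterations and $interval, their default values, and the do-work template are hypothetical placeholders. A recursive template stands in for the missing while() loop, so keep $iterations comfortably below your platform’s recursion limit.

```slax
version 1.0;

ns junos = "http://xml.juniper.net/junos/*/junos";
ns xnm = "http://xml.juniper.net/xnm/1.1/xnm";
ns jcs = "http://xml.juniper.net/junos/commit-scripts/1.0";

import "../import/junos.xsl";

/* Hypothetical tuning knobs: loop passes per launch, seconds between passes */
param $iterations = 30;
param $interval = 60;

match / {
    <event-script-results> {
        /* Run the bounded main loop first */
        call main-loop($remaining = $iterations);

        /* Then log the "leaving now" message that the wackamole-done
         * policy matches, so eventd starts a fresh copy after this exit */
        expr jcs:syslog("daemon.info", "Normal completion for wackamole script. Exiting.");
    }
}

/* SLAX has no while(1), so recursion provides the loop: each call runs one
 * pass, sleeps, and recurses until $remaining reaches zero */
template main-loop($remaining) {
    if ($remaining > 0) {
        call do-work();
        expr jcs:sleep($interval);
        call main-loop($remaining = $remaining - 1);
    }
}

/* Hypothetical placeholder for the real per-pass work */
template do-work() {
    /* collect and report counters here */
}
```

When the last pass finishes, the script logs its message and exits, which releases its memory. The event policy then starts the next copy.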

 

Memory Usage

 

When your SLAX script starts, it also starts to consume memory. It begins with a base allocation and uses more memory each time it updates or instantiates variables. There is no “garbage collection” in SLAX, so as your script processes RPC responses and updates variables, its process memory usage only ever increases. Your script’s allocated memory is released back to Junos OS only when the script exits. If your script process exceeds its system limits, the kernel kills it, utterly without remorse and with no questions asked.

If you are writing a self-restarting script (or even an op script) that has a loop with a LOT of iterations, memory usage might well become a concern for you. For a self-restarting script, you can address this by tuning the number of iterations in its loop. More iterations per invocation mean less process-creation (startup) overhead, but also more memory allocated over the life of the loop. Fewer iterations per invocation mean restarting the script more often, but each exit frees the memory allocated so far. It’s a trade-off you can tune for your particular circumstances. If, however, your script exceeds its process memory limit and is killed by the kernel, it has no way to detect this: it dies without emitting its “leaving now” message, so nothing asks Junos OS to start another copy. So sad.
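One way to make that iteration count tunable without editing the script is to pass it in from the event policy. The following is a sketch, assuming your release supports the arguments statement under event-script and that the script declares a matching global parameter (the name iterations and the value 30 are hypothetical):

```
event-options {
    policy wackamole-done {
        events system;
        attributes-match {
            system.message matches "Normal completion for wackamole script. Exiting.";
        }
        then {
            event-script wackamole.slax {
                /* Delivered to the script as the global parameter $iterations */
                arguments {
                    iterations 30;
                }
            }
        }
    }
}
```

With this setup, tuning the memory/startup trade-off becomes a configuration change rather than a script edit.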

 

Scheduling

 

For timer-driven scripts, it’s a good idea to take pains to ensure that each run completes before the timer event policy starts a new one. Otherwise, you can run into “process pile-ups” (multiple copies of the script executing at the same time) and data consistency issues (such as a newly started script overwriting data from a copy that has not yet finished). Since there is no SLAX function available to kill() other processes, a new process cannot stop older running copies of itself. Any timer-driven script should therefore be able to check for already running copies of itself and act accordingly. (We’ll talk about this in more detail in another future TechWiki article.)

 

Self-restarting scripts really don’t need to worry about a new copy starting “on top” of an existing copy, since they’re only started when the current copy exits. (Technically, this isn’t 100 percent true: something had to start the script initially. There are a couple of approaches to this, and we will talk about that in yet another future TechWiki article.) 

 

For self-restarting scripts using loops, you can set regular intervals with fixed jcs:sleep() operations. Interestingly, it’s also possible to vary the jcs:sleep() length for each interval by checking how much time has elapsed since the start of that interval. This can mitigate the “timestamp jitter” that might occur when some loop iterations take longer than others (due to system load or other external factors).
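The sketch below shows that variable-length sleep in minimal form. It assumes the EXSLT date:seconds() function is available through the dates-and-times namespace declared at the top. The $iterations and $interval parameters and the do-work template are hypothetical placeholders. Each pass records its start time, does its work, and then sleeps only for whatever remains of the interval. If the pass overran, it skips the sleep entirely.

```slax
version 1.0;

ns junos = "http://xml.juniper.net/junos/*/junos";
ns xnm = "http://xml.juniper.net/xnm/1.1/xnm";
ns jcs = "http://xml.juniper.net/junos/commit-scripts/1.0";
ns date = "http://exslt.org/dates-and-times";

import "../import/junos.xsl";

/* Hypothetical knobs: loop passes per launch and target seconds per pass */
param $iterations = 30;
param $interval = 60;

match / {
    <event-script-results> {
        call paced-loop($remaining = $iterations);
    }
}

template paced-loop($remaining) {
    if ($remaining > 0) {
        /* Seconds since the epoch at the start of this pass */
        var $start = date:seconds();

        call do-work();

        /* Sleep only for what is left of the interval; if the pass
         * overran, start the next pass immediately */
        var $elapsed = date:seconds() - $start;
        var $pause = $interval - $elapsed;
        if ($pause > 0) {
            expr jcs:sleep($pause);
        }

        call paced-loop($remaining = $remaining - 1);
    }
}

/* Hypothetical placeholder for the real per-pass work */
template do-work() {
    /* collect and report counters here */
}
```

The pause shrinks whenever a pass runs long, so passes stay close to the $interval grid instead of drifting later on every iteration.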

 

Summary

 

I hope this article was helpful. We touched on a couple of areas that we’ll revisit in more detail in future articles. In addition, I’m planning a couple of SLAX automation articles on topics like script robustness and configurability, and data persistence.

 

If you have questions (or corrections!) for this article, or questions or ideas for future articles, please reply here and we’ll get cracking on it.

 

 

Written by Douglas McPherson

Solutions Consultant at Juniper

 

