FAQ: How do AI-Scripts affect Junos system resource utilization?

By Elevate posted 03-30-2016 06:04



How do AI-Scripts affect Junos system resource utilization?


AI-Scripts have a negligible impact on Junos system resources. Event-based scripts run only when a particular event is observed, and the script that runs is specific to that event. Scripts do not scan logs or perform other lengthy tasks. They execute specific commands and finish in about a minute. Proactive scripts operate in much the same way, but are triggered by a timer rather than by a specific event.


The primary purpose of AI-Scripts is to detect the set of events active on a device, and collect appropriate data as quickly as possible after an event occurs on the device. The primary purpose of Junos OS is to maintain routing and security activity above all else, including the AI-Scripts data collection process.


CPU Utilization

To ensure that Junos OS can best manage the CPU to perform critical functions, all AI-Scripts processing is executed at a lower scheduler priority. This means that AI-Scripts processing always cedes the CPU to critical tasks. AI-Scripts relies on standard Junos OS scheduling behavior to achieve this.


AI-Scripts Release 5.0R4 and later contain improvements specifically designed to allow Junos OS to continue operating without any interference from processing done by AI-Scripts. For example, when a critical router task needs 75% of the CPU, and AI-Scripts also needs 75% of the CPU to keep up with its work, the critical router task, running at higher priority, gets 75% of the CPU. AI-Scripts is restricted to the remaining 25% due to its lower priority. As a result, the critical router task keeps up with its work and no side effects occur. If the critical task gets busier and requires 90% of the CPU, it still gets 90% of the CPU and AI-Scripts is restricted to the remaining 10%. The critical router task always gets the CPU it needs, while AI-Scripts uses less CPU and takes longer to finish collecting the required information, without disturbing the critical router task.
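The arithmetic in this example can be sketched as a small Python model. It is a deliberately simplified illustration that assumes strict priority between one critical task and AI-Scripts; the function name `split_cpu` is hypothetical, and a real scheduler weighs many processes rather than two.

```python
def split_cpu(critical_demand, low_priority_demand, total=100.0):
    """Return (critical_share, low_priority_share) in percent.

    Simplified model: the critical task is served first, and the
    low-priority work only receives whatever CPU is left over.
    """
    critical_share = min(critical_demand, total)
    low_share = min(low_priority_demand, total - critical_share)
    return critical_share, low_share


# AI-Scripts wants 75% of the CPU in both cases.
for critical in (75.0, 90.0):
    crit, low = split_cpu(critical, 75.0)
    print(f"critical task: {crit:.0f}%  AI-Scripts: {low:.0f}%")
# critical task: 75%  AI-Scripts: 25%
# critical task: 90%  AI-Scripts: 10%
```

In this model the low-priority share shrinks as the critical demand grows, which is why data collection takes longer on a busy device.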

This behavior does not mean that the CPU never reaches 100% utilization while AI-Scripts processes are executing. It does ensure that whatever portion of the CPU is allotted to AI-Scripts is not needed by other Junos OS functions. Extended periods of 100% CPU utilization while the low-priority processing completes are normal and do not affect critical functions.

This priority setting can be verified by observing the output of the show system processes extensive Junos OS command. All AI-Scripts processes use a NICE value of 20, as displayed in the NICE column of the output.

root@host> show system processes extensive
last pid: 54331;  load averages:  0.48,  0.20,  0.11  up 10+05:26:18    11:45:07
134 processes: 16 running, 105 sleeping, 1 zombie, 12 waiting
Mem: 215M Active, 121M Inact, 551M Wired, 13M Cache, 112M Buf, 70M Free 
  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
 1500 root        5  76    0   513M 57040K select 0 268.7H 96.19% flowd_octeon_hm
   22 root        1 171   52     0K    16K RUN    0 211.5H 76.66% idle: cpu0
54323 root        1  79    0 51708K 17164K select 0   0:01  2.98% cli
   23 root        1 -20 -139     0K    16K WAIT   0 136:33  0.00% swi7: clock:
 1783 root        1   4    0  2348K   860K kqread 0   4:34  0.00% tail
 1519 root        1  76    0 15116K  5972K select 0   4:32  0.00% rtlogd
   21 root        1 171   52     0K    16K RUN    1   3:47  0.00% idle: cpu1
 1523 root        3  76    0 15536K  4564K select 0   3:46  0.00% wland
   26 root        1 -16    0     0K    16K -      0   3:45  0.00% yarrow
 1494 root        1  76    0 23572K  8436K select 0   0:49  0.00% cosd
 1511 root        1  76    0 15224K  5880K select 0   0:48  0.00% jsrpd
 1518 root        1  76    0 22548K  7144K select 0   0:48  0.00% smihelperd
 1485 root        1  76    0 14508K  4360K select 0   0:48  0.00% craftd
   38 root        1 171   52     0K    16K pgzero 0   0:35  0.00% pagezero
   39 root        1 -16    0     0K    16K psleep 0   0:26  0.00% bufdaemon
   56 root        1  -8    0     0K    16K mdwait 0   0:24  0.00% md0
 1777 root        1  76    0 44508K 22540K select 0   0:22  0.00% mgd
 1528 root        1   5    0  3012K  1016K ttyin  0   0:00  0.00% getty
54324 root        1  76    0 44508K  6212K select 0   0:00  0.00% mgd
 1776 labroot     1   8    0  3424K  1108K wait   0   0:00  0.00% sh
54331 root        1  76    0 24592K  1996K CPU0   0   0:00  0.00% top
52527 root        1   8   20  3424K  1608K wait   0   0:00  0.00% sh
52540 root        1  -8   20  3416K  1564K piperd 0   0:00  0.00% sh
52539 root        1  -8   20  3196K  1320K piperd 0   0:00  0.00% grep
52538 root        1   4   20  2312K   896K fifoor 0   0:00  0.00% cat
 1322 root        1  -8    0     0K    16K mdwait 0   0:00  0.00% md3
    7 root        1   8    0     0K    16K -      0   0:00  0.00% thread taskq
   31 root        1 -48 -167     0K    16K WAIT   0   0:00  0.00% swi0: uart
 1332 root        1  -8    0     0K    16K mdwait 0   0:00  0.00% md4


The value of 20 is not exclusive to AI-Scripts processes, but in the above example only AI-Scripts processes are using that value. The only processes running at lower priority, indicated by a larger NICE value, are the low-level system processes ‘idle’ and ‘pagezero’. All other processes take precedence over AI-Scripts execution.
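When the output is long, the NICE column can be filtered with a short script. The following Python sketch assumes the output has been saved as text and that each process row has the 12 columns shown above, with NICE as the fifth; the names `ProcRow`, `parse_process_rows`, and `low_priority` are illustrative and not part of Junos OS or AI-Scripts.

```python
from typing import List, NamedTuple


class ProcRow(NamedTuple):
    pid: int
    user: str
    nice: int
    command: str


def parse_process_rows(output: str) -> List[ProcRow]:
    """Extract process rows from saved 'show system processes extensive' text."""
    rows = []
    for line in output.splitlines():
        # Split into at most 12 fields so commands such as "idle: cpu0"
        # stay together in the last field.
        fields = line.split(None, 11)
        if len(fields) < 12 or not fields[0].isdigit():
            continue  # summary line, header line, or blank line
        try:
            nice = int(fields[4])
        except ValueError:
            continue
        rows.append(ProcRow(int(fields[0]), fields[1], nice, fields[11].strip()))
    return rows


def low_priority(rows: List[ProcRow], nice_value: int = 20) -> List[ProcRow]:
    """Return the rows whose NICE value equals nice_value."""
    return [row for row in rows if row.nice == nice_value]


SAMPLE = """\
 1500 root        5  76    0   513M 57040K select 0 268.7H 96.19% flowd_octeon_hm
   22 root        1 171   52     0K    16K RUN    0 211.5H 76.66% idle: cpu0
52527 root        1   8   20  3424K  1608K wait   0   0:00  0.00% sh
52539 root        1  -8   20  3196K  1320K piperd 0   0:00  0.00% grep
"""

for row in low_priority(parse_process_rows(SAMPLE)):
    print(row.pid, row.command)
# 52527 sh
# 52539 grep
```

Because a NICE value of 20 is not exclusive to AI-Scripts, a match only shows that a process runs at that priority, not that it belongs to AI-Scripts.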


The above example was taken when no AI-Script was executing. When an event profile is triggered, AI-Scripts creates more processes. These processes always run at lower priority, as indicated by a NICE value of 20, and terminate when the event profile handling is complete.


Storage Utilization


The AI-Scripts data collection sequence limits its disk utilization in the following ways:


  • AI-Scripts check the utilization capacity of the device filesystem both at the start of Juniper Message Bundle (JMB) generation and during the collection of data, as part of generating the JMB. This check is performed while creating all types of JMB.
  • On a device with dual Routing Engines, the primary and backup Routing Engines check both the current disk and the destination disk to which the JMB will be transferred. Both checks are done at the start of JMB creation and during data collection, because disk utilization can change by the time an attachment, such as a Request Support Information (RSI) or log archive, is completed. AI-Scripts also perform an additional check on the destination disk before copying the JMB.
  • If a JMB cannot be copied, AI-Scripts wait and retry the copy operation once every hour.
  • AI-Scripts issues a warning message, then ceases additional filesystem consumption, when disk usage reaches defined limits. The default behavior is to issue warning logs at 50% or more utilization, then to stop writing new attachment data to disk when utilization reaches 75%. The values of these thresholds can be modified with an op script.
  • If the primary Routing Engine has reached the full threshold, JMB data generated on the backup Routing Engine remains on the backup Routing Engine. AI-Scripts retries copying the JMB data once every hour until the copy succeeds, or until three days have elapsed, at which time the data is deleted from the backup Routing Engine. The three-day threshold can be modified with an op script.
  • Any event JMB that is triggered when the disk is at full threshold is automatically dampened (ignored).
  • An attempt to create an on-demand JMB while the maximum disk utilization threshold is exceeded results in a record of the attempt, but no data file attachments are generated. This is reflected in the status messages file, which is still created. The log file gets generated but contains a single entry stating the disk is considered full.
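The default thresholds and the retry window in the list above can be modeled in a few lines of Python. This is only a sketch of the described behavior, not the AI-Scripts implementation: the function names and return strings are hypothetical, and `used_percent` relies on `shutil.disk_usage`, which may not compute utilization the same way the device does.

```python
import shutil

WARN_PERCENT = 50.0          # default: warning logs at 50% or more
FULL_PERCENT = 75.0          # default: stop writing new attachment data at 75%
MAX_RETENTION_HOURS = 3 * 24  # default: give up on the copy after three days


def disk_state(used_pct, warn=WARN_PERCENT, full=FULL_PERCENT):
    """Classify a utilization percentage as 'ok', 'warn', or 'full'."""
    if used_pct >= full:
        return "full"
    if used_pct >= warn:
        return "warn"
    return "ok"


def used_percent(path="/"):
    """Read the current utilization of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def copy_action(hours_since_created, destination_used_pct):
    """Decide what happens to a JMB waiting on the backup Routing Engine."""
    if disk_state(destination_used_pct) != "full":
        return "copy"
    if hours_since_created >= MAX_RETENTION_HOURS:
        return "delete"
    return "retry in one hour"


print(disk_state(42.0))        # ok
print(disk_state(50.0))        # warn
print(disk_state(80.0))        # full
print(copy_action(5, 80.0))    # retry in one hour
print(copy_action(72, 80.0))   # delete
```

Passing `used_percent()` to `disk_state` gives a live reading on the machine where the sketch runs; the thresholds are parameters because the text notes they can be changed with an op script.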


Memory Utilization

Junos OS, by default, limits each SLAX event script (cscript) process to 128 MB of memory. If a SLAX script attempts to exceed this limit, it terminates and a system log entry is generated. Junos OS also limits every system to a maximum of 16 simultaneously executing event scripts, except for some low-end systems, such as the SRX100, which are limited to four because of memory constraints on those devices. Each AI-Scripts event profile requires two event scripts, so no more than eight AI-Scripts event profiles can be processed at a time.


The limit of 16 simultaneous event scripts is imposed system-wide, so event scripts that are not part of AI-Scripts also count toward the total.
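The relationship between the event script limit and the number of event profiles is simple integer arithmetic, sketched below in Python. The function name and parameters are illustrative, and the results for a limit of four and for a device running other event scripts are derived from the figures above rather than stated by the source.

```python
def max_concurrent_profiles(script_limit=16, other_event_scripts=0,
                            scripts_per_profile=2):
    """Event profiles that fit under the system-wide event script limit."""
    available = max(script_limit - other_event_scripts, 0)
    return available // scripts_per_profile


print(max_concurrent_profiles())                        # 8
print(max_concurrent_profiles(script_limit=4))          # 2
print(max_concurrent_profiles(other_event_scripts=3))   # 6
```

The last call shows why non-AI-Scripts event scripts matter: three of them running at once leave room for only six event profiles instead of eight.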


Rapid Event Generation


When an event is detected by AI-Scripts, it automatically collects diagnostic data for the event and packages the data as part of a JMB data set. As a safety mechanism, Junos OS limits, by default, the maximum number of concurrent JMB generation processes to 16. In addition to the limit set by Junos OS, AI-Scripts limits the number of shell scripts it initiates by checking the number of active scripts before starting a new one.
Even in a stress test that initiates 200 simultaneous events, only four AI-Scripts shell processes are started, and the remainder are dropped. Additional processes are started only after the original processes begin to complete. Because AI-Scripts monitor impactful events, more than four of them occurring simultaneously indicates that the system is distressed, so additional data collection by AI-Scripts is stopped to avoid contributing to the distress.
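The check-before-start behavior can be illustrated with a small admission-control sketch in Python. It assumes a cap of four active processes, taken from the stress test described above; the `ScriptLimiter` class and its methods are hypothetical and only model the counting, not how AI-Scripts actually launches shell scripts.

```python
class ScriptLimiter:
    """Admit new work only while fewer than max_active scripts are running."""

    def __init__(self, max_active=4):
        self.max_active = max_active
        self.active = 0
        self.dropped = 0

    def try_start(self):
        """Return True if a new script may start; otherwise count a drop."""
        if self.active >= self.max_active:
            self.dropped += 1
            return False
        self.active += 1
        return True

    def finish(self):
        """Record that one running script has completed."""
        if self.active > 0:
            self.active -= 1


limiter = ScriptLimiter()
started = sum(limiter.try_start() for _ in range(200))
print(started, limiter.dropped)   # 4 196

limiter.finish()                  # one of the original scripts completes
print(limiter.try_start())        # True
```

Dropping rather than queuing the excess events keeps the extra load bounded on a device that is already under stress.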