Junos OS


Ask questions and share experiences about Junos OS.
  • 1.  RE IIC Access error

    Posted 07-14-2022 10:59
    I am seeing the minor alarm "RE0 IIC Access error" on my MX480 router. Has anyone else come across this, or can anyone explain what it means?

    ------------------------------
    NITIKA THAKUR
    ------------------------------


  • 2.  RE: RE IIC Access error

     
    Posted 12 days ago
    Nitika,

    It's been a while since you posted, but I thought I'd reply for anyone looking at this post in future. We have experienced the "RE0 IIC Access error" on a number of our MX240s and opened a few cases with JTAC to identify the root cause. In summary, they advised that these are transient errors that can be ignored (though it is still a concern that we get them). The explanation from JTAC was as follows:

    The IIC controller is used for communication over the I2C bus, which is used to read environmental values (PSU voltage, temperature sensors, etc.) from the FRUs in the system. Based on the output of the command requested in the previous emails, I could see that multiple modules are presenting active errors (i.e. TMP75_EXHAUST_48, TMP75_INTAKE_4E, I2CLTC3880, and I2CADS7830). Please note that the error counts are very low, and transient hardware errors of this kind are quite common in modern computer systems. Many operating systems, especially consumer OSes on PCs or smartphones, do not generate alarms for them. However, the Juniper MX router is a mission-critical device and we choose to report as much information as we can. For example, every 5 seconds the ichsmb device driver issues a 2-byte read request to the PSU device. This request is generated periodically by chassisd to read the PSU status. If there is any read issue, such as a read timeout due to bus signal collision, a noisy link, etc., the system will generate the minor alarm even if the next request succeeds.

     

    Please find below a similar KB article for the reported alarm:

    https://kb.juniper.net/InfoCenter/index?page=content&id=KB33407&act=login

     

    The RE (e.g. RE0) raises the "RE0 IIC Access error" alarm when a resiliency I2C bus access error is detected, and under normal circumstances the alarm clears within 24 hours. JTAC suggested rebooting RE0 in a safe maintenance window; if the reboot does not fix it, an RMA should be created for RE0. In most cases these minor alarms are informational only and there is no service impact at all.

     

    In previous Junos releases, the minor alarm was generated whenever there was a transient hardware failure (e.g. an ichsmb read timeout), which might not be necessary. Those requests are periodic, so one or two missed reads are not a problem.

     

    We do have an internal PR/1615863 that suppresses the alarm unless there are three consecutive failures, to avoid alarming the user:

    Transient ichsmb failures are retried three times. Failures are logged on every instance, but successful reads are silent. The calling function in chassisd should log an error message only once all 3 retries fail, to avoid alarming the user. The ichsmb driver continues to log all failures so that there is visibility of failures when required.
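
    The behaviour described above can be sketched roughly as follows. This is only an illustration of the "report only after three consecutive failures" idea, not Juniper's code: the names `read_with_retries` and `make_flaky_reader` are hypothetical, "three retries" is assumed to mean three attempts in total, and the debug log line merely stands in for the driver-level failure logging.

```python
import logging

log = logging.getLogger("chassisd_sketch")  # hypothetical logger name

MAX_ATTEMPTS = 3  # assumption: "three retries" means three attempts in total


def read_with_retries(read_once, max_attempts=MAX_ATTEMPTS):
    """Call read_once() up to max_attempts times.

    Returns (value, alarm). alarm is True only if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            value = read_once()
        except IOError as exc:
            # Stand-in for driver-level logging: every failure is recorded.
            log.debug("read attempt %d failed: %s", attempt, exc)
            continue
        # A successful read is silent and clears the need for an alarm.
        return value, False
    # Caller-level error: reported once, only after all attempts failed.
    log.error("read failed %d times in a row", max_attempts)
    return None, True


def make_flaky_reader(outcomes):
    """Hypothetical stand-in for a bus read that plays back fixed outcomes.

    Each entry is either the bytes to return or None to simulate a timeout.
    """
    it = iter(outcomes)

    def read_once():
        outcome = next(it)
        if outcome is None:
            raise IOError("read timeout")
        return outcome

    return read_once
```

    With this sketch, a reader that times out once and then succeeds returns its value with no alarm, whereas a reader that times out three times in a row returns `(None, True)`.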

     

    The enhancement has been applied in the following releases:

    Junos OS Evolved: 22.1R3-EVO, 22.2R2-EVO, 22.3R1-EVO, 22.4R1-EVO

    Junos OS: 21.2R3-S2, 22.1R3, 22.2R2, 22.3R1
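
    For anyone wanting to check a running release against the Junos list above, here is a minimal sketch. It is an assumption-laden helper, not an official tool: `has_fix` and `FIXED_IN` are hypothetical names, it handles only classic Junos version strings of the form `21.2R3-S2`, and it assumes that branches newer than the latest one listed also carry the change. For branches the list does not cover it returns `None` rather than guessing.

```python
import re

# Fix releases for classic Junos as listed in the thread, keyed by branch.
# Values are (R number, S number), with 0 meaning "no service release".
FIXED_IN = {
    (21, 2): (3, 2),  # 21.2R3-S2
    (22, 1): (3, 0),  # 22.1R3
    (22, 2): (2, 0),  # 22.2R2
    (22, 3): (1, 0),  # 22.3R1
}

# Matches the leading "major.minorRn" part, with an optional "-Sn" suffix.
_VERSION = re.compile(r"^(\d+)\.(\d+)R(\d+)(?:-S(\d+))?")


def has_fix(version):
    """Return True or False if the list above decides it, otherwise None."""
    if "EVO" in version.upper():
        return None  # Junos OS Evolved is not handled by this sketch
    m = _VERSION.match(version)
    if m is None:
        return None  # unrecognised version string
    major, minor, rel = int(m.group(1)), int(m.group(2)), int(m.group(3))
    service = int(m.group(4) or 0)
    branch = (major, minor)
    if branch in FIXED_IN:
        return (rel, service) >= FIXED_IN[branch]
    if branch > max(FIXED_IN):
        return True  # assumption: later branches include the change
    return None  # branch not covered by the list in the thread
```

    For example, `has_fix("21.2R3-S1")` is `False`, `has_fix("22.2R2")` is `True`, and `has_fix("21.4R3")` is `None` because that branch is not in the list.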

     Hope the above helps




  • 3.  RE: RE IIC Access error

    Posted 12 days ago
    Thank you, Booge, for the detailed explanation of the issue and the information.

    ------------------------------
    Nitz
    ------------------------------