Notice of Warm Resets on the Cross-Connect Board of NG-SDH Equipment

Abstract: Communication through the 485 communication module fails frequently. As a result, warm resets occur frequently on the cross-connect board of optical transmission NG-SDH equipment, which may interrupt services.

 

Trigger Condition:
The 485 communication module on the board is faulty.
Problem Phenomena:
1. The boards on the NE report the COMMUN_FAIL alarm. The third parameter of the alarm is 0x01 or 0x02, which respectively indicates that the first or second 485 communication channel is abnormal.
2. Warm resets occur on the cross-connect board frequently.

3. The service board reports the TR_LOC alarm (in a version later than R003, the fourth parameter is 0x04 or 0x20, which respectively indicates that the cross-connect board in slot 9 or 10 is detected to be faulty; in the R003 version or earlier, the fourth parameter is 0x03, which indicates that both cross-connect boards are detected to be faulty).

4. The downstream NE reports the MS_AIS alarm.
Identification Method:
The problem can be inferred if the following conditions are true:
1. The version of the NE is one of the affected versions.
2. The COMMUN_FAIL alarm is reported, and the parameter indicates that the 485 communication module is faulty (the third parameter is 0x01 or 0x02).

3. Warm resets occur on the cross-connect board frequently.

4. If all the services that traverse the local NE are interrupted, the corresponding line board on the downstream NE reports the MS_AIS alarm and the service board on the local NE reports the TR_LOC alarm (in a version later than R003, the fourth parameter is 0x04 or 0x20, which respectively indicates that the cross-connect board in slot 9 or 10 is detected to be faulty; in the R003 version or earlier, the fourth parameter is 0x03, which indicates that both cross-connect boards are detected to be faulty).
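The checks above can be sketched in code. The following is a minimal illustration, not vendor tooling: the `Alarm` structure, the function names, and the way alarm parameters are supplied are all assumptions made for this example. Only the alarm names and parameter values come from this notice. Condition 4 is treated as corroborating evidence, because it applies only when services are interrupted.

```python
from dataclasses import dataclass

# Parameter meanings are taken from this notice; everything else is hypothetical.
COMMUN_FAIL_CHANNELS = {0x01: "first 485 channel", 0x02: "second 485 channel"}
TR_LOC_LATER_THAN_R003 = {
    0x04: "cross-connect board in slot 9",
    0x20: "cross-connect board in slot 10",
}
TR_LOC_R003_OR_EARLIER = {0x03: "both cross-connect boards"}


@dataclass
class Alarm:
    """Hypothetical alarm record: params[2] is the third parameter, params[3] the fourth."""
    name: str
    params: tuple


def faulty_485_channel(alarm):
    """Return the abnormal 485 channel named by a COMMUN_FAIL alarm, else None."""
    if alarm.name != "COMMUN_FAIL" or len(alarm.params) < 3:
        return None
    return COMMUN_FAIL_CHANNELS.get(alarm.params[2])


def tr_loc_faulty_board(alarm, later_than_r003):
    """Return the board(s) a TR_LOC alarm marks as faulty, else None."""
    if alarm.name != "TR_LOC" or len(alarm.params) < 4:
        return None
    table = TR_LOC_LATER_THAN_R003 if later_than_r003 else TR_LOC_R003_OR_EARLIER
    return table.get(alarm.params[3])


def matches_notice(version_affected, alarms, frequent_warm_resets):
    """Conditions 1 to 3: affected version, 485 fault reported, frequent warm resets."""
    has_485_fault = any(faulty_485_channel(a) is not None for a in alarms)
    return bool(version_affected and has_485_fault and frequent_warm_resets)
```

For example, a `COMMUN_FAIL` alarm whose third parameter is 0x02, seen on an affected version together with frequent warm resets, satisfies `matches_notice`. A `TR_LOC` alarm can then be passed to `tr_loc_faulty_board` to see which cross-connect board the service board considers faulty.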

[Cause Analysis]
When the 485 communication module is faulty, the cross-connect board receives a large number of 485 communication anomaly messages, and the CPU of the cross-connect board is kept busy processing them. As a result, warm resets occur on the cross-connect board. If three warm resets occur on the cross-connect board within five minutes, the board is considered faulty. If the NE houses a standby cross-connect board that is in the normal state, a switchover of the cross-connect boards is triggered once the active board is considered faulty; after the switchover, the newly active cross-connect board is likewise considered faulty if three warm resets occur on it within five minutes. In this case, the service board inserts the MS-AIS alarm and the services are interrupted.
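The rule of three warm resets within five minutes can be illustrated with a sliding-window counter. This is only a sketch of the rule as stated in this notice, not the board's actual firmware logic. The class name, the use of timestamps in seconds, and the treatment of a reset exactly five minutes old as still inside the window are assumptions.

```python
from collections import deque

RESET_LIMIT = 3          # warm resets, per this notice
WINDOW_SECONDS = 5 * 60  # five minutes, per this notice


class WarmResetMonitor:
    """Hypothetical tracker for warm resets on one cross-connect board."""

    def __init__(self):
        self._resets = deque()  # timestamps (seconds) of recent warm resets

    def record_reset(self, timestamp):
        """Record a warm reset; return True if the board would be considered faulty."""
        self._resets.append(timestamp)
        # Drop resets that fall outside the five-minute window ending now.
        while timestamp - self._resets[0] > WINDOW_SECONDS:
            self._resets.popleft()
        return len(self._resets) >= RESET_LIMIT
```

With resets at 0 s, 100 s and 250 s, the third call returns `True`. With resets at 0 s, 200 s and 400 s, the first reset has aged out by the third call, so the board is not yet considered faulty.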
[Impact and Risk]
If three warm resets occur on the cross-connect board within five minutes, the board is considered faulty. If the NE houses a standby cross-connect board that is in the normal state, a switchover of the cross-connect boards is triggered once the active board is considered faulty; after the switchover, the newly active cross-connect board is likewise considered faulty if three warm resets occur on it within five minutes. In this case, the service board inserts the MS-AIS alarm and the services are interrupted.

Recovery Measures:

1. Permanent recovery measures: This problem is difficult to locate. After it occurs, contact the R&D department for fault locating and board replacement, or purchase replacement boards from a professional Huawei network product distributor.
2. Temporary recovery measures: If the services are interrupted, perform a cold reset (or a reset) on a cross-connect board. After the cross-connect board restarts, take the following avoidance measures.

In these commands, the parameter “a” in blue indicates the slot ID of the board, which is represented in hex.
Note: This avoidance measure keeps services from being interrupted by disabling the logical watchdog, but it does not stop the cross-connect board from warm resetting.
After this avoidance measure is taken, the cross-connect board may still undergo warm resets, and services may be interrupted if a switchover occurs.

Solution:

Upgrade the NE to V100R003C02SPC620 (5.xx.13.63), or to V100R008C02SPC200 (5.xx.18.50P01) or a later version.
The upgrade does not eliminate the warm resets of the cross-connect board, however. The warm resets can be resolved by replacing the board that has the 485 fault.
