Pay Attention to Service Interruption Caused by FPGA Faults on Cross-Connect Boards

[Problem Description]
Trigger conditions:

When the FPGA of a cross-connect board is faulty and enters an incorrect state, it misjudges the status of the service board and triggers MSP switching based on that abnormal state, which interrupts services.


Symptom:
MSP switching occurs, services are interrupted, and the faulty cross-connect board resets or an active/standby switchover occurs.

Identification method:
The problem can be confirmed when both of the following conditions are met:

  •  MSP switching occurs. Only services configured on the MS are interrupted, while other services of the NE are normal.
  •  The faulty cross-connect board resets or an active/standby switchover occurs.
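
The two conditions above can be expressed as a simple check. The following is a minimal sketch only: the function name and its four boolean inputs are hypothetical and would have to be filled in from alarms and logs collected on the NE.

```python
def problem_triggered(msp_switched: bool,
                      only_ms_services_down: bool,
                      xc_board_reset: bool,
                      xc_active_standby_switched: bool) -> bool:
    """Return True when both identification conditions are met.

    All parameter names are illustrative assumptions, not fields of any
    real management interface.
    """
    # Condition 1: MSP switching occurred and only the services configured
    # on the MS are interrupted (other services of the NE are normal).
    condition_1 = msp_switched and only_ms_services_down
    # Condition 2: the faulty cross-connect board was reset, or an
    # active/standby switchover occurred.
    condition_2 = xc_board_reset or xc_active_standby_switched
    return condition_1 and condition_2
```

Both conditions must hold together. An MSP switching event without a cross-connect board reset or switchover does not match this problem.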

[Root Cause]
Scenario 1: The faulty cross-connect board is the standby board. When its FPGA enters an incorrect state, the FPGA wrongly concludes that the line board is running abnormally and wrongly treats itself as the active board, so it triggers the MSP switching protocol. The real active cross-connect board, however, detects that the service board is normal and does not switch its cross-connect matrix in response to the MSP switching protocol. As a result, services configured on the MS are interrupted.
Scenario 2: The faulty cross-connect board is the active board. When its FPGA enters an incorrect state, the FPGA wrongly detects that the service board is running abnormally, so it triggers MSP switching and switches cross-connect pages. The standby board detects no abnormality and does not switch cross-connect pages, so the cross-connect pages on the active and standby boards differ. The active board then resets automatically because of the logical status error, and the standby board becomes the active board after the reset. Services are ultimately interrupted because of the difference in cross-connect pages between the active and standby boards.
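
The sequence in Scenario 2 can be illustrated with a toy model. This is a sketch under stated assumptions, not the board's actual logic: the XcBoard class, its fields, and the use of a 0/1 page index are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class XcBoard:
    """Toy stand-in for a cross-connect board (hypothetical fields)."""
    role: str                  # "active" or "standby"
    page: int = 0              # cross-connect page currently in use (0 or 1)
    fpga_faulty: bool = False


def simulate_scenario_2(active: XcBoard, standby: XcBoard) -> bool:
    """Return True if the toy model ends with services interrupted."""
    if not active.fpga_faulty:
        return False  # healthy FPGA: no false MSP switching in this model

    # The faulty FPGA misjudges the service board, triggers MSP switching,
    # and switches its own cross-connect page.
    active.page = 1 - active.page

    # The standby board sees nothing abnormal and keeps its page, so the
    # two boards may now hold different pages.
    pages_differ = active.page != standby.page

    # The active board resets because of the logical status error, and the
    # standby board takes over as the active board.
    active.role, standby.role = "standby", "active"

    # The new active board uses a page that does not match the MSP
    # switching state; the model treats this as a service interruption.
    return pages_differ
```

In this model the interruption comes from the page mismatch that survives the switchover, not from the reset itself.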

[Impact and Risk]
Impact: Services configured on the MS are interrupted.
Risk: The FPGA fault is occasional, so the problem rarely occurs.

[Measures and Solutions]
Recovery measures:
Re-enable the MSP protocol, and then replace the faulty cross-connect board as soon as possible.
Preventive measures:
Upgrade the NE to R10C03SPC208 or a later version. The upgrade prevents a faulty cross-connect board from causing this service interruption.
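
When many NEs need to be screened, the version threshold above could be checked with a small script. This is a minimal sketch that assumes version strings contain R, C, and optional SPC fields and that later versions compare higher field by field. That ordering rule, the helper names, and the handling of a missing SPC field as 0 are assumptions, and other suffixes are not handled.

```python
import re

FIXED_VERSION = "R10C03SPC208"  # threshold named in this notice

# Matches the R, C, and optional SPC numeric fields of a version string.
_VERSION_RE = re.compile(r"R(\d+)C(\d+)(?:SPC(\d+))?")


def parse_version(text: str) -> tuple:
    """Extract (R, C, SPC) as integers; a missing SPC is treated as 0."""
    match = _VERSION_RE.search(text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    r, c, spc = match.groups()
    return int(r), int(c), int(spc or 0)


def has_fix(ne_version: str) -> bool:
    """Return True if ne_version is at or above the fixed version."""
    return parse_version(ne_version) >= parse_version(FIXED_VERSION)
```

For example, has_fix("R10C03SPC200") returns False and has_fix("R10C03SPC208") returns True. The result should be confirmed against the official version mapping before it is relied on.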
Solution:
Upgrade the NE to R10C03SPC208 or a later version and replace the faulty cross-connect board.
Material handling after replacement:
N/A

[Inspector Applicable or Not]
None
[Rectification Scope and Time Requirements]
N/A
[Rectification Guide]
N/A
[Attachment]
None
