Monthly Archives: August 2014

If You Encounter Service Interruption on STM-64 Boards, Please Read Below

[Problem Description]
Trigger conditions:
A linear or ring MSP switching on STM-64 Boards is triggered by an SF event (R_LOS, R_LOF, or MS_AIS) and the SF event lasts for less than 10 ms.


Fault symptom:
After the SF event clears, the MSP protocol returns to the normal state, but it is on the idle state page (without the MS_APS_INDI_EX alarm) at one end and on the switching state page (with the MS_APS_INDI_EX alarm) at the other end. The services are interrupted and cannot be restored until the boards are upgraded by Huawei engineers. Alternatively, you can buy a replacement board from a Huawei network product distributor.
Identification method:
1. Check whether boards of the “involved versions” are used and whether the symptoms match those described in this document.
2. Check whether any of the following boards exists in the switched span: SSN1SL64, SSN1SF64, SSN3SL64, or SSN1SLD64, and whether the board software version is 7.15.
3. Check the K byte events and confirm that, when the SF event ends, the EVENT-PARA of K_RECEIVED is the same as one received before.
PG-ID  EVENT-NO  EVENT-VALUE  EVENT-PARA  DATE-TIME            TIME-STAMP
2      598       K_RECEIVED   0x0540      2012-01-13 01:54:09  0x00ec58c4
2      599       K_DIR        0x0000      2012-01-13 01:54:09  0x00ec58cd
2      618       K_RECEIVED   0xb54a      2012-02-10 17:38:32  0x0163e7fd
2      619       K_DIR        0x0002      2012-02-10 17:38:32  0x0163e806
2      620       SF_CLEARS    0x0000      2012-02-10 17:43:02  0x021df4d6
2      621       K_RECEIVED   0x0540      2012-02-10 17:43:02  0x021df616
2      622       K_DIR        0x0000      2012-02-10 17:43:02  0x021df61f

As the K byte events above show, the west side receives a K byte (0x0540) when the SF event clears, although the NE at the other end, where the MSP switching happened, did not send this K byte. The same K byte (0x0540) had already been received earlier.
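The check in step 3 can be sketched as a small script. This is only an illustration: the function names are hypothetical, the input is assumed to be the whitespace-separated K byte event list shown above, and "history K byte" is interpreted here as any K_RECEIVED value that was seen before the most recent one.

```python
# Sample rows copied from the K byte event list in this notice
# (columns: PG-ID, EVENT-NO, EVENT-VALUE, EVENT-PARA, DATE, TIME, TIME-STAMP).
SAMPLE_LOG = """\
2 598 K_RECEIVED 0x0540 2012-01-13 01:54:09 0x00ec58c4
2 599 K_DIR 0x0000 2012-01-13 01:54:09 0x00ec58cd
2 618 K_RECEIVED 0xb54a 2012-02-10 17:38:32 0x0163e7fd
2 619 K_DIR 0x0002 2012-02-10 17:38:32 0x0163e806
2 620 SF_CLEARS 0x0000 2012-02-10 17:43:02 0x021df4d6
2 621 K_RECEIVED 0x0540 2012-02-10 17:43:02 0x021df616
2 622 K_DIR 0x0000 2012-02-10 17:43:02 0x021df61f
"""


def parse_events(text):
    """Return (event_no, event_name, event_para) tuples from the event list."""
    events = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[1].isdigit():
            continue  # skip the header row and blank lines
        # Some copies of the log contain a multiplication sign instead of "x".
        para = int(fields[3].replace("×", "x"), 16)
        events.append((int(fields[1]), fields[2], para))
    return events


def find_stale_k_bytes(events):
    """Flag the first K_RECEIVED after SF_CLEARS if it repeats an old value.

    Assumption: "old" means any K_RECEIVED value seen before the most
    recent one, which is how this sketch reads "history K byte".
    """
    history = []          # K_RECEIVED values seen so far, in order
    sf_just_cleared = False
    suspects = []
    for number, name, para in events:
        if name == "SF_CLEARS":
            sf_just_cleared = True
        elif name == "K_RECEIVED":
            if sf_just_cleared and para in history[:-1]:
                suspects.append((number, para))
            history.append(para)
            sf_just_cleared = False
    return suspects


if __name__ == "__main__":
    for number, para in find_stale_k_bytes(parse_events(SAMPLE_LOG)):
        print(f"event {number}: K byte 0x{para:04x} repeats an earlier value")
```

A hit from this script is only a hint. It still has to be confirmed that the NE at the other end did not actually send the K byte.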
[Root Cause]
When an SF event is transiently reported, an exception occurs when a board of an “involved version” processes the K byte. That is, the board incorrectly reports a history K byte to the MSP protocol. As a result, the MSP protocol enters an abnormal state and services are interrupted.
[Impact and Risks]
The MSP switching fails and services are interrupted.
[Measures and Solutions]
Recovery measures:
Restart the MSP protocol.
Solutions:
1. Load the V1R8C02SPH201 hot patch to the line boards.

2. Upgrade to R8C02SPC300 or a later version, or choose a new board from a Huawei network product distributor.

[Rectification Guide]
It is recommended that you load the hot patch to the line boards to resolve this problem. For details, see the OptiX NG-SDH Hot Patch Upgrade-20120625-A.
For the software upgrade method, refer to the corresponding software upgrade guide.


Cautions for Version Upgrade on OptiX OSN 8800

Summary:
NEs using software packages including the Huawei OSN products OptiX OSN 8800 T32 V100R007C00SPC100, V100R007C00SPC200, and V100R007C02SPC100 need to be upgraded to the related GA versions or maintenance versions. Precautions need to be taken during the version upgrade.


[Problem Description]
On the live network, NEs using software packages including the OptiX OSN 8800 T32 V100R007C00SPC100, V100R007C00SPC200, and V100R007C02SPC100 are at risk during long-term operation. They need to be upgraded to mainstream maintenance versions according to version policies.
During the version upgrade, some 100G boards are at risk, so spare boards must be prepared for them before the upgrade.
The following versions need to be upgraded:

  • OptiX OSN 8800 T32 V100R007C00SPC100 (ESS version)
  • OptiX OSN 8800 T32 V100R007C00SPC200 (ESS version)
  • OptiX OSN 8800 T32 V100R007C02SPC100 (TR5 version)

Trigger condition:
When NEs are using software packages including the OptiX OSN 8800 T32 V100R007C00SPC100, V100R007C00SPC200, and V100R007C02SPC100, they need to be upgraded to the related GA versions or maintenance versions according to the upgrade paths listed in the following table.

(Table: upgrade paths)

Symptom:
None
Identification method:
The NE uses a software package of the OptiX OSN 8800 T32 V100R007C00SPC100, V100R007C00SPC200, or V100R007C02SPC100. OptiX OSN 8800 equipment is available from Huawei network product distributors.

[Root Cause]
NEs using software packages including the OptiX OSN 8800 T32 V100R007C00SPC100, V100R007C00SPC200, and V100R007C02SPC100 are at risk during long-term operation. They need to be upgraded to GA versions or maintenance versions according to upgrade policies.

[Impact and Risk]
An NE using a software package of the OptiX OSN 8800 T32 V100R007C00SPC100, V100R007C00SPC200, or V100R007C02SPC100 is at risk during the version upgrade if it uses a TN54NS4, TN54TSC, or TN54TTX board.
Specifically, the TN54NS4, TN54TSC, and TN54TTX boards use an earlier-version FPGA that has a defect. Due to this defect, there is a 0.45% probability during the NE upgrade that the NE becomes suspended when the later-version FPGA is loaded and cannot be recovered. When this occurs, the board must be replaced.
The NE is at risk and special measures need to be taken (as described in “Preventive measures”) if the TN54NS4, TN54TSC, or TN54TTX board meets the following conditions:
1. The board manufacturing date is earlier than March 31, 2013. In other words, the value indicating the date in the board barcode is equal to or less than “D3”.
2. When the board undergoes a cold reset during NE upgrade, services over the board are interrupted and alarms such as OTUk_LOF and OTUk_DEG are reported.

[Measures and Solutions]
Recovery measures:
None
Workarounds:
None
Preventive measures:
Upgrade NEs that use a software package of OptiX OSN 8800 T32 V100R007C00SPC100, V100R007C00SPC200, or V100R007C02SPC100 to the related GA or maintenance version.
If the TN54NS4, TN54TSC, or TN54TTX board is used on the NE, the following operations are recommended during the NE upgrade:
Isolate the boards and prepare spare boards at a ratio of 1% of the boards (at least two spare boards of the same board type). After the NE upgrade is completed, release the board isolation. If the services are interrupted and alarms such as OTUk_LOF and OTUk_DEG are reported after automatic board package matching is completed and the boards undergo a cold reset, replace the boards at both ends.
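The spare ratio and the 0.45% figure quoted in this notice lend themselves to quick planning arithmetic. The sketch below is illustrative only: the function names are hypothetical, the spare count is computed per board type, and rounding the 1% up to a whole board is an assumption that the notice does not state.

```python
def spare_boards_needed(board_count):
    """Spares for one board type: 1% of the boards, but at least two.

    Assumption: a fractional result is rounded up to a whole board.
    """
    if board_count <= 0:
        return 0
    one_percent = -(-board_count // 100)  # integer ceiling of board_count / 100
    return max(2, one_percent)


def expected_unrecoverable_boards(board_count, probability=0.0045):
    """Expected number of boards lost to the FPGA defect during the upgrade."""
    return board_count * probability


if __name__ == "__main__":
    for count in (40, 250, 1000):
        print(count, spare_boards_needed(count),
              round(expected_unrecoverable_boards(count), 2))
```

For 1000 boards of one type this gives 10 spares against an expected 4.5 unrecoverable boards, so the recommended ratio leaves some margin.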

Notice of Warm Resets on Cross-Connect Board on your NG-SDH

Abstract: The 485 communication module fails frequently. As a result, warm resets occur frequently on the cross-connect board of NG-SDH optical transmission equipment, which may cause service interruption.


Trigger Condition:
The 485 communication module on the board is faulty.
Problem Phenomena:
1. The boards on the NE report the COMMUN_FAIL alarm. The third parameter of the alarm is 0x01 or 0x02, which indicates that the first or second 485 communication channel, respectively, is abnormal.
2. Warm resets occur on the cross-connect board frequently.
3. The service board reports the TR_LOC alarm (in a version later than R003, the fourth parameter is 0x04 or 0x20, which indicates that the cross-connect board in slot 9 or 10, respectively, is detected to be faulty; in the R003 version or earlier, the fourth parameter is 0x03, which indicates that both cross-connect boards are detected to be faulty).
4. The downstream NE reports the MS_AIS alarm.
Identification Method:
The problem can be inferred if the following conditions are true:
1. The version of the NE is one of the affected versions.
2. The COMMUN_FAIL alarm is reported, and the parameter indicates that the 485 communication module is faulty (the third parameter is 0x01 or 0x02).
3. Warm resets occur on the cross-connect board frequently.
4. If all the services that traverse the local NE are interrupted, the corresponding line board on the downstream NE reports the MS_AIS alarm and the service board on the local NE reports the TR_LOC alarm (in a version later than R003, the fourth parameter is 0x04 or 0x20, which indicates that the cross-connect board in slot 9 or 10, respectively, is detected to be faulty; in the R003 version or earlier, the fourth parameter is 0x03, which indicates that both cross-connect boards are detected to be faulty).
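The alarm parameter values listed in this notice can be collected into a small lookup. The following is a minimal sketch with hypothetical function names. It encodes only the values stated here and returns None for anything else.

```python
# Third parameter of COMMUN_FAIL, as described in this notice.
COMMUN_FAIL_CHANNEL = {
    0x01: "first 485 communication channel abnormal",
    0x02: "second 485 communication channel abnormal",
}

# Fourth parameter of TR_LOC, which depends on the NE version.
TR_LOC_LATER_THAN_R003 = {
    0x04: "cross-connect board in slot 9 detected faulty",
    0x20: "cross-connect board in slot 10 detected faulty",
}
TR_LOC_R003_OR_EARLIER = {
    0x03: "both cross-connect boards detected faulty",
}


def decode_commun_fail(third_param):
    """Return the 485 channel meaning, or None if the value is not listed."""
    return COMMUN_FAIL_CHANNEL.get(third_param)


def decode_tr_loc(fourth_param, later_than_r003):
    """Return the TR_LOC meaning for the given version family, or None."""
    table = TR_LOC_LATER_THAN_R003 if later_than_r003 else TR_LOC_R003_OR_EARLIER
    return table.get(fourth_param)


if __name__ == "__main__":
    print(decode_commun_fail(0x02))
    print(decode_tr_loc(0x20, later_than_r003=True))
    print(decode_tr_loc(0x03, later_than_r003=False))
```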

[Cause Analysis]
When the 485 communication module is faulty, the cross-connect board receives many 485 communication anomaly messages and its CPU is kept busy processing them. As a result, warm resets occur on the cross-connect board. If three warm resets occur on the cross-connect board within five minutes, the board is considered faulty. (If the NE houses a standby cross-connect board in the normal state, a switching of the cross-connect boards is triggered after the active board is considered faulty. After the switching, the currently active cross-connect board is also considered faulty if three warm resets occur on it within five minutes.) In this case, the service board inserts the MS_AIS alarm and the services are interrupted.
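The "three warm resets within five minutes" rule is a sliding-window count. The sketch below shows one way to apply it to a list of reset timestamps taken from logs. The function name is hypothetical, and treating a span of exactly five minutes as inside the window is an assumption.

```python
from collections import deque
from datetime import datetime, timedelta


def first_fault_time(reset_times, max_resets=3, window=timedelta(minutes=5)):
    """Return the time of the reset that makes the board count as faulty.

    A board is treated as faulty once `max_resets` warm resets fall within
    `window`. Returns None if that never happens.
    """
    recent = deque()
    for moment in sorted(reset_times):
        recent.append(moment)
        # Drop resets that are older than the window relative to this one.
        while moment - recent[0] > window:
            recent.popleft()
        if len(recent) >= max_resets:
            return moment
    return None


if __name__ == "__main__":
    start = datetime(2014, 8, 1, 10, 0)
    resets = [start, start + timedelta(minutes=2), start + timedelta(minutes=4)]
    print(first_fault_time(resets))
```

The same function can be run once per cross-connect board, since after a switching the newly active board is judged by the same rule.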
[Impact and Risk]
If three warm resets occur on the cross-connect board within five minutes, the board is considered faulty. (If the NE houses a standby cross-connect board in the normal state, a switching of the cross-connect boards is triggered after the active board is considered faulty. After the switching, the currently active cross-connect board is also considered faulty if three warm resets occur on it within five minutes.) In this case, the service board inserts the MS_AIS alarm and the services are interrupted.

Recovery Measures:

1. Complete recovery measures: Locating this problem is not easy. After the problem occurs, contact the R&D department for fault locating and board replacement, or purchase new boards from a Huawei network product distributor.
2. Temporary recovery measures: If the services are interrupted, perform a cold reset on or reset the cross-connect board. After the cross-connect board restarts, take the following avoidance measures.

In these commands, the parameter “a” in blue indicates the slot ID of the board, represented in hexadecimal.
Note: This avoidance measure prevents service interruption by disabling the logical watchdog, but it does not stop the warm resets of the cross-connect board.
After this avoidance measure is taken, the cross-connect board may still undergo warm resets, and the services may be interrupted if a switching occurs.

Solution:

Upgrade the NE to the V100R003C02SPC620 (5.xx.13.63) version, or to V100R008C02SPC200 (5.xx.18.50P01) or a later version.
The upgrade does not stop the warm resets of the cross-connect board. To solve that problem, replace the board that has the 485 fault.

Beware of Using a Single Cross-connect Board for OSN 7500

[Problem Description]
Trigger condition:

The problem described in this pre-warning is occasionally triggered when the following conditions are all met:
1. More than 16 processing boards are used on an NE.
2. Only one cross-connect board is powered on.
Symptom:
The NE reports a COMMUN_FAIL alarm, or MSP switching fails.


Identification Methods:
Check the following:
  • Whether there is any COMMUN_FAIL alarm that has a parameter whose value is 01 or 02.
  • Whether only one cross-connect board is powered on.
  • Whether the number of service processing boards in the line slots is more than 15.

Parameters of the COMMUN_FAIL alarm:
725127 COMMUN_FAIL SA NEW_BOARD MJ start 2012-11-07 09:42:33+08:00 None SA NEW_BOARD board=10;01 00 01 ff ff;

725127 COMMUN_FAIL SA NEW_BOARD MJ start 2012-11-07 09:42:33+08:00 None SA NEW_BOARD board=3;01 00 02 ff ff;

The first parameter indicates a slot number, which may be any slot. Pay attention to the fourth parameter: a fault exists if its value is 01 or 02, which indicates a 485 communication failure on the first or second channel, respectively.
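The alarm lines above can be picked apart with a regular expression. This is a sketch under stated assumptions: the function name and the returned dictionary keys are hypothetical, and the channel value is read from the third byte after `board=N;`, which is where 01 and 02 appear in the two sample lines. That position corresponds to the "fourth parameter" only if the `board=` field is counted as the first parameter.

```python
import re

# Matches "board=<slot>;<hex bytes separated by spaces>;" at the end of an alarm line.
ALARM_RE = re.compile(r"board=(\d+);((?:[0-9a-fA-F]{2}\s*)+);")


def parse_commun_fail(line):
    """Extract the slot and the 485 channel from a COMMUN_FAIL alarm line.

    Returns None if the line is not a COMMUN_FAIL alarm in the expected form.
    """
    if "COMMUN_FAIL" not in line:
        return None
    match = ALARM_RE.search(line)
    if match is None:
        return None
    params = [int(byte, 16) for byte in match.group(2).split()]
    # Assumption: the channel indicator is the third byte of the byte list.
    channel = params[2] if len(params) >= 3 and params[2] in (1, 2) else None
    return {"slot": int(match.group(1)), "params": params, "channel_485": channel}


if __name__ == "__main__":
    sample = ("725127 COMMUN_FAIL SA NEW_BOARD MJ start 2012-11-07 09:42:33+08:00 "
              "None SA NEW_BOARD board=10;01 00 01 ff ff;")
    print(parse_commun_fail(sample))
```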

[Root Cause]
The problem is caused by the related components on the cross-connect and service processing boards of OptiX OSN 7500 NEs delivered before October, 2012. Specifically, the delay in the transmit and receive directions on the 485 bus is long, and the drive capability of the 485 bus is mainly provided by the circuit on the cross-connect board. When only one cross-connect board is inserted and used, the drive capability of the bus decreases by 50%. In this case, if many processing boards are used, there is a high probability that errors or conflicts occur in the communication data. As a result, communication between boards on the NE fails, and in some cases a COMMUN_FAIL alarm (485 communication) is reported. Therefore, using a single cross-connect board on an OptiX OSN 7500 is not allowed in this case. The relevant components manufactured after October, 2012 have been improved: the 485 components on all boards are replaced with components that have a shorter delay, and their circuits are optimized. Therefore, boards delivered after October, 2012 do not have this problem.

Scenario, by delivery date:
OptiX OSN 7500 equipment delivered before October, 2012: If more than 16 service processing boards are used, two cross-connect boards must be configured.
OptiX OSN 7500 equipment delivered after October, 2012: No such problem exists.

However, because newly delivered boards may work with old NEs and boards, it is recommended that you do not use a single cross-connect board on any Huawei optical network product such as the OptiX OSN 7500.

[Impact and Risks]
1. The NE reports a COMMUN_FAIL alarm.
2. MSP switching may occasionally fail.
[Measures and Solutions]
Recovery measures:
Insert a cross-connect board into the idle cross-connect board slot. Additional boards are available from Huawei network product distributors.
Workaround:
Do not use a single cross-connect board on any OptiX OSN 7500 NE on the network.
Corrective action:
Boards delivered after October, 2012 do not have this problem.

Have You Ever Encountered Chip Failures of the TN55NO2 and TN55TOX Boards on the OptiX OSN 8800?

Summary: The clock chips used on some TN55NO2 and TN55TOX boards of Huawei OptiX OSN 8800 NEs have design defects. As a result, the failure rate of the chips is 5% after they have been in service for 2 years. If a clock chip fails, services on the board are interrupted and cannot be restored even after you remove and reinsert the board or perform a cold reset on it.



[Problem Description]
Trigger conditions:

The chip failure rate is 5% within 2 years. The chip failure rate is proportional to the service period and is independent of application scenarios.
Symptom:
On the faulty board, a Hard_Bad alarm is reported and cannot be cleared, and services are interrupted.
Identification method:
1. Check the barcodes of the boards against the attachment. If the barcodes are included in the attachment provided on the official website, the boards are at risk of a clock chip failure.
2. If the previously described symptom occurs on a TN55TOX01 or TN55NO201 board, the board is at risk of a clock chip failure.
You can also query the Zarlink register using the following commands.

(Figure: board commands)

[Root Cause]
The clock chip has design defects. As a result, the TN55NO2 and TN55TOX boards have a 5% failure rate within 2 years.
[Impact and Risk]
OptiX OSN 8800: On a board with a defective clock chip, a Hard_Bad alarm is reported and services are interrupted. The services cannot be restored even after you remove and reinsert or perform a cold reset on the board. The chip failure rate is 5% within 2 years and is proportional to the service period.
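Because the failure rate is described as proportional to the service period, a rough estimate for other service periods follows from a straight line through the stated point (5% at 2 years). The sketch below is only that linear reading. The function names are hypothetical and the notice gives no data beyond the single 5% figure.

```python
def estimated_failure_fraction(years_in_service, rate=0.05, period_years=2.0):
    """Linear estimate: 5% at 2 years, scaled with the service period.

    Assumption: the proportionality holds over the whole period of interest.
    The result is capped at 1.0.
    """
    if years_in_service <= 0:
        return 0.0
    return min(1.0, rate * years_in_service / period_years)


def expected_failed_boards(board_count, years_in_service):
    """Expected number of boards with a failed clock chip."""
    return board_count * estimated_failure_fraction(years_in_service)


if __name__ == "__main__":
    for years in (1, 2, 4):
        print(years, estimated_failure_fraction(years),
              expected_failed_boards(200, years))
```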

[Measures and Solutions]
Recovery measures:
Replace the boards that have a defective clock chip.
Solutions:
Replace the boards that have a defective clock chip.


Material handling after replacement:
For markets outside China, scrap the boards directly. For markets inside China, send the boards back to the Spare Parts Center for disposal.

Cautions for the Upgrade Failure on OptiX OSN 550

Summary:
During the upgrade of an OptiX OSN 550 earlier than V100R007C10, if the NE has a TNH2EGT1 board, SCC switching occurs repeatedly during activation. After the SCC switching times out, the NE rolls back to the original version, resulting in an upgrade failure.


[Problem Description]

Fault symptoms:
During the upgrade of an OptiX OSN 550 earlier than V100R007C10, if the NE has a TNH2EGT1 board, SCC switching occurs repeatedly during activation. After the SCC switching times out, the NE rolls back to the original version, resulting in an upgrade failure.

Trigger conditions:

The problem may occur if the following conditions are met:

  • An OptiX OSN 550 with the TNH2EGT1 board is upgraded.
  • The source version and target version of the upgrade are both earlier than V100R007C10.

More Huawei transmission boards are available from Huawei network product distributors.

Identification method:
This problem can be identified if the following conditions are met.

  • The source version and target version of the upgrade are both earlier than V100R007C10.
  • The OptiX OSN 550 has a TNH2EGT1 board installed.
  • The following information is recorded in the black box errlog reset record:

ExcptErr: taskname=004tSQMProcess, pc=4981b7ac, SR=40009032, FaultAddr=400, DataBuff=25ce364.
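The three identification conditions can be checked mechanically. The sketch below uses hypothetical function names and assumes that comparing only the V, R, and C numbers of a version string is sufficient, ignoring any SPC or SPH suffix. The errlog signature is the task name from the record shown above.

```python
import re

VERSION_RE = re.compile(r"V(\d+)R(\d+)C(\d+)")
ERRLOG_SIGNATURE = "taskname=004tSQMProcess"


def version_key(version):
    """Turn 'V100R007C10...' into (100, 7, 10) for comparison."""
    match = VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_earlier(version, reference="V100R007C10"):
    """True if `version` sorts before `reference` by its V/R/C numbers."""
    return version_key(version) < version_key(reference)


def matches_identification(source, target, has_tnh2egt1, errlog_text):
    """True only when all three identification conditions hold."""
    return (is_earlier(source) and is_earlier(target)
            and has_tnh2egt1
            and ERRLOG_SIGNATURE in errlog_text)


if __name__ == "__main__":
    errlog = ("ExcptErr: taskname=004tSQMProcess, pc=4981b7ac, SR=40009032, "
              "FaultAddr=400, DataBuff=25ce364.")
    print(matches_identification("V100R006C00", "V100R007C00", True, errlog))
```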

[Root Cause]
The software causes an abnormal alarm detecting task during the upgrade reset of the TNH2EGT1 board. Therefore, the active SCC board resets and the package loading state rolls back to the state of activating the standby board. After the standby SCC board resets, the SCC tries to activate the TNH2EGT1 board, which again triggers abnormal board tasks and a reset of the active SCC board. In this condition, the package loading status switches back and forth between activating the SCC and activating the TNH2EGT1 board. As a result, the activation task times out and the upgrade fails.

[Impact and Risks]
If abnormal tasks occur, the NE upgrade fails.

[Measures and Solutions]
Recovery measures: None

Workaround:
Step 1 This step is critical. Create an upgrade task. On the page shown below, select Pause Before Current Operation, Dispense and Activate.

(Figure: step 1)

Step 2 Execute the upgrade task and wait until the package loading is complete.
Step 3 Run the following commands to quarantine all the EGT1 boards.
:swdl-add-excludebd:BID1,0,0,0;
:swdl-add-excludebd:BID2,0,0,0;
BID1 and BID2 indicate the slot numbers of the boards.

(Figure: step 2)

Step 4 Resume the activation task to complete the upgrade.
Step 5 Wait for ten minutes after the activation is completed and then run commands to de-quarantine the EGT1 boards.

(Figure: step 5)

Notes:
Execute step 3 only after step 2 is complete.
Execute step 5 ten minutes after the activation succeeds.
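When several boards have to be quarantined in step 3, the command lines can be generated from a list of slot numbers. This sketch only fills in the command template quoted in step 3. The function name and the slot numbers 3 and 5 are hypothetical. The de-quarantine command is not given in the text, so it is not generated here.

```python
def exclude_commands(slot_numbers):
    """Build one ':swdl-add-excludebd:' line per slot, skipping duplicates."""
    commands = []
    seen = set()
    for slot in slot_numbers:
        if not isinstance(slot, int) or slot <= 0:
            raise ValueError(f"invalid slot number: {slot!r}")
        if slot in seen:
            continue  # do not quarantine the same board twice
        seen.add(slot)
        commands.append(f":swdl-add-excludebd:{slot},0,0,0;")
    return commands


if __name__ == "__main__":
    # Slots 3 and 5 are made-up examples; use the real slots of the boards.
    for command in exclude_commands([3, 5]):
        print(command)
```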

Solution:
Upgrade the OptiX OSN 550 to V100R007C10 or a later version.

You Might Face TD Alarms When Using Pluggable Optical Modules in WDM Products

Summary:
If no intra-board cross-connection from the WDM side to the client side is configured for boards that support intra-board cross-connections, these boards may falsely report TD alarms when their optical port lasers are turned on.

 

Huawei transmission XCS board

[Problem Description]

Trigger conditions:
The problem occurs on a board when the following conditions are present:
  • The board is listed in the table above.
  • No intra-board cross-connection shown in Figure 1 is configured on the board. Specifically, no unidirectional cross-connection from the WDM side to the client side is configured on the board.
  • Pluggable modules are used on the board in Huawei optical network equipment and the optical port lasers are turned on.

Symptom:
TD alarms are reported.
Identification method:
The problem occurs if the trigger conditions and symptom are satisfied.

[Root Cause]
When no intra-board cross-connection shown in Figure 1 is configured on the board, the module bias current increases. In addition, the laser transmit efficiency decreases when the ambient temperature increases. To maintain stable module power, the bias current further increases. When the bias current increases to a value greater than the alarm threshold, TD alarms are reported. The laser transmit efficiency varies according to modules. Therefore, not all modules have this problem.
[Impact and Risk]
The board falsely reports TD alarms. Consequently, operators may consider that the optical modules are faulty.

[Measures and Solutions]
Recovery measures:
Take either of the following measures:
Turn off the lasers after confirming that no service is deployed.
Configure intra-board cross-connections for the related optical ports.
Workarounds:
Turn on the lasers after configuring intra-board cross-connections.
Preventive measures:
Update the related product manuals. Specifically, add descriptions in the product manuals that intra-board cross-connections shown in Figure 1 must be configured for the boards listed in the table above if the lasers on these boards are turned on to test light emitting.

 

Notice for Abnormal Service Interruption on OSN 9500

[Problem Description]
Trigger condition:
This problem occurs when both of the following conditions are met:
1. The higher order cross-connect board on the OptiX OSN 9500 NE uses software of the problematic version.
2. Service addition/deletion, SNCP switching, MSP switching, or ASON rerouting operations are performed twice or more within 5 minutes.

Symptoms:
Symptoms may vary depending on the specific operations that have been performed:
1. In case of service addition, services cannot be added.
2. In case of service deletion, services fail to be deleted, while the NMS and the NE software indicate that the services have been deleted.
3. In case of the other three operations (that is, SNCP switching, MSP switching, and ASON rerouting), services are interrupted.
Identification methods:
The problem described in the pre-warning is triggered when the following conditions are all met:
1. The device is an OptiX OSN 9500 and its software version is one of the defective versions.
2. The cross-connect matrix changes at least twice within 5 minutes, for example due to service configuration addition/deletion, SNCP switching, MSP switching, or ASON rerouting.
3. When the services are abnormal, there are abnormal records in the black box, such as “130 2011-10-26 5:4:25 0x1 41 0x43484543 0x4B205843 L. 541475145” in BB0.log or “CHECK XC FAIL” in BB1.log.
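The two 32-bit hex words in the BB0.log record decode as ASCII to "CHEC" and "K XC", which appears to match the "CHECK XC FAIL" text in BB1.log. The sketch below scans log text for either form. The function names are hypothetical, and treating every 8-digit hex word in a BB0.log line as ASCII is an assumption based on this single sample record.

```python
import re

HEX_WORD_RE = re.compile(r"0x([0-9A-Fa-f]{8})\b")


def decode_hex_words(record):
    """Concatenate the ASCII decoding of every 8-digit hex word in a record."""
    record = record.replace("×", "x")  # some copies use a multiplication sign
    words = HEX_WORD_RE.findall(record)
    return "".join(bytes.fromhex(word).decode("ascii", errors="replace")
                   for word in words)


def has_xc_check_failure(bb0_text="", bb1_text=""):
    """True if either black box log contains the cross-connect check record."""
    if "CHECK XC FAIL" in bb1_text:
        return True
    return any("CHECK XC" in decode_hex_words(line)
               for line in bb0_text.splitlines())


if __name__ == "__main__":
    record = "130 2011-10-26 5:4:25 0x1 41 0x43484543 0x4B205843 L. 541475145"
    print(repr(decode_hex_words(record)))
    print(has_xc_check_failure(bb0_text=record))
```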

[Root Cause]
The software of the higher order cross-connect boards has defects. During the cross-connection matrix update, task preemption occurs occasionally. As a result, the newly deployed or deleted services are affected. Huawei cross-connect boards are available from Huawei network product distributors.

[Impact and Risks]
The updated services fail. Based on the R&D personnel analysis, there is a small probability that the problem occurs.
[Measures and Solutions]
Recovery measures:
Perform active/standby protection switching on the cross-connect boards.
Workarounds:
None.
Solution:
Upgrade the OptiX OSN 9500 NEs involved to one of the following versions:
V100R006C03SPH202 and versions later than V100R006C03
V100R006C05SPC200 and versions later than V100R006C05

Notice on Rectification for SDH Trail Interruption on the U2000

Summary: The SDH trail management window displays two trails with the same source or/and sink. A user regards one of them as a duplicate or useless service and deactivates it, which interrupts unrelated trails.



[Problem Description]
Trigger conditions:
SDH trail search is performed on one client while SDH trail creation is performed on another client simultaneously.
This problem also occurs on a single client if a user creates an SDH trail when automatic SDH trail search is in progress.
Fault symptom:
The SDH trail management window displays two trails with the same source or/and sink (NEs, ports, and timeslots). In this situation, the following fault symptoms occur:

  • When a user queries the cross-connections of the two trails, cross-connections are displayed only for one of them or only some cross-connections are displayed for both of them.
  • If user-defined information is configured, the information is displayed only for one of them.
  • If the trails are VC4 server trails, client trails can be queried only for one of them or only some client trails can be queried for both of them.

Deactivating any one of the trails will cause interruption of unrelated trails.
Identification method:

  • Automatic: See the Readme file in the attached tool package.
  • GUI-based: Check the SDH trail management window for two trails with the same source or/and sink.
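The GUI-based check amounts to grouping trails by their endpoints. The sketch below is not the attached tool. It is a minimal illustration with a hypothetical function name and a hypothetical data layout in which each trail is a dictionary whose source and sink are (NE, port, timeslot) tuples.

```python
from collections import defaultdict


def find_suspect_trails(trails):
    """Return endpoints that are used as source, or as sink, by several trails.

    Each trail is a dict with 'name', 'source' and 'sink', where source and
    sink are (NE, port, timeslot) tuples. This layout is an assumption.
    """
    by_endpoint = defaultdict(set)
    for trail in trails:
        by_endpoint[("source", trail["source"])].add(trail["name"])
        by_endpoint[("sink", trail["sink"])].add(trail["name"])
    return {key: sorted(names)
            for key, names in by_endpoint.items() if len(names) > 1}


if __name__ == "__main__":
    trails = [
        {"name": "X", "source": ("NE-B", 1, 1), "sink": ("NE-C", 1, 1)},
        {"name": "X+", "source": ("NE-B", 1, 1), "sink": ("NE-C", 1, 1)},
        {"name": "Y", "source": ("NE-A", 2, 1), "sink": ("NE-C", 3, 1)},
    ]
    for (role, endpoint), names in find_suspect_trails(trails).items():
        print(role, endpoint, names)
```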

[Root Cause]
1. When trail search begins, the U2000 loads network-wide trails to the memory for comparing them with those found after the search. Based on the comparison, the U2000 determines whether there are new trails.
2. Before trail search begins, the U2000 divides NEs into different areas. During trail search, the U2000 loads cross-connections area by area. As shown in the schematic diagram, the trail search starts from area A. Within this period, a user creates trail X between an NE in area B and another NE in area C.
3. When the trail search proceeds to areas B and C, the cross-connections of trail X form trail X+.
4. Trail X does not exist at the beginning. Therefore, the U2000 determines that trail X+ is a new trail and saves it to the database.
5. Trail X and trail X+ coexist in the database and use the same source and sink. Either trail X or trail X+ has no cross-connection.
6. It is also possible that trail X and trail X+ use the same source or sink and both of them have only some cross-connections, which depends on the timing of the search and creation.
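The sequence in steps 1 to 5 can be reproduced with a toy simulation. It is a deliberately simplified model, not the U2000 implementation: trails are reduced to the set of areas they cross, the function name is hypothetical, and the partial cross-connection case in step 6 is not modelled.

```python
def run_trail_search(database, search_order, created_during_search):
    """Simulate an area-by-area trail search with concurrent trail creation.

    database: {trail_name: set of areas the trail crosses}
    search_order: areas in the order they are searched
    created_during_search: {area: {trail_name: set of areas}} for trails a
        user creates while that area is being searched
    """
    db = {name: set(areas) for name, areas in database.items()}
    snapshot = set(db)      # step 1: trails loaded to memory when search begins
    discovered = {}         # trail name -> areas whose cross-connections were loaded
    for area in search_order:
        # A user on another client creates trails while this area is searched.
        for name, areas in created_during_search.get(area, {}).items():
            db[name] = set(areas)
        # Step 2: load the cross-connections of this area.
        for name, areas in db.items():
            if area in areas:
                discovered.setdefault(name, set()).add(area)
    # Steps 3 and 4: a fully assembled trail missing from the snapshot is
    # treated as new and saved again, here under the name with a "+" suffix.
    for name, areas in discovered.items():
        if name not in snapshot and areas == db[name]:
            db[name + "+"] = set(areas)
    return db


if __name__ == "__main__":
    result = run_trail_search(
        database={"T1": {"A"}},
        search_order=["A", "B", "C"],
        created_during_search={"A": {"X": {"B", "C"}}},
    )
    print(sorted(result))
```

If trail X already exists when the search begins, it is part of the snapshot and no second copy is saved, which is why searching only while no trail creation is in progress avoids the problem.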



[Impact and Risks]
After a user deactivates one of the two trails with the same source or/and sink, the other trail is unexpectedly interrupted as well as the client trails of the other trail.

[Measures and Solutions]

Recovery measures:

  • Upgrade the current U2000 to one of the following patch versions or a later version:

T2000 V200R007C03SPC108, U2000 V100R002C01CP5131, U2000 V100R003C00CP1052, U2000 V100R005C00CP6031, U2000 V100R006C00CP2001, U2000 V100R006C02SPC300

  • If you do not want to install a patch, delete the duplicate trails from the network layer and search for network-wide SDH trails. Ensure that trail creation is not performed during the search. The tool attached in this document can find duplicate trails.

Workarounds:

  • Search for SDH trails when trail creation is not performed.
  • Delete the existing duplicate trails from the network layer and search for network-wide SDH trails to restore the desired trail. Do not deactivate a trail directly.
  • Do not deactivate trails in batches.

If deactivation has been performed, verify that service interruption does not occur before deleting a trail.

Solutions:
Search for network-wide SDH trails after installing one of the following patches:

  • T2000 V200R007C03SPC109
  • U2000 V100R002C01CP5032 (in offices that require security rectification)
  • U2000 V100R002C01CP5033 (in offices that do not require security rectification)
  • U2000 V100R005C00CP6137
  • U2000 V100R006C00CP3102
  • U2000 V100R006C02SPC301

This tool (a script) can check for trails with the same source or/and sink on the U2000. To run the tool, you must enter the password of the database user sa. The tool creates three temporary tables and one stored procedure during a check and deletes them after the check is complete. The tool does not modify any user data. After a check is complete, the tool returns the names of duplicate trails.

 

Effect on Board Services in Slots 7 and 8 When OSN 1500B Is Configured with TPS Protection Groups

[Problem Description]
Trigger condition:
The problem described in the pre-warning is triggered when the following four conditions are all met:

1. TPS is triggered or restored by the TPS protection group consisting of transmission boards in slots 12 and 13.

2. The NE type is OptiX OSN 1500B and its version is included in the “Versions Involved” part.

3. The NE is configured with a TPS protection group consisting of processing boards in slots 12 and 13.

4. The NE is configured with a service processing board that needs to work with an interface board in slot 7 or 8, such as SSR1PD1 and SSR2PD1. In addition, services are configured on the board.

Symptoms:
The fault scenarios are as follows:
Scenario 1: Boards in slots 12 and 13 form a TPS protection group, and a board in slot 7 or 8 is not configured with TPS protection.
When TPS is triggered by the TPS protection group consisting of boards in slots 12 and 13, services on the board in slot 7 or 8 are interrupted and the T_ALOS alarm is reported. After the TPS is restored, the T_ALOS alarm is cleared and the services on the board are recovered.
Scenario 2: Boards in slots 12 and 13 form a TPS protection group, and a board in slot 7 or 8 is configured with the TPS protection group.
1. TPS is not triggered on the boards in slots 7 and 8:
When the TPS protection group consisting of boards in slots 12 and 13 triggers TPS, services on the boards in slots 7 and 8 are interrupted, and the boards in slots 7 and 8 report the T_ALOS alarm. When the TPS protection group consisting of boards in slots 12 and 13 is restored to the idle state, the T_ALOS alarm is cleared and the services on the boards in slots 7 and 8 are recovered.
2. TPS is triggered on the board in slot 7 or 8:
When the TPS protection group consisting of boards in slots 12 and 13 is restored to the idle state, services on the board in slot 7 or 8 are interrupted and the T_ALOS alarm is reported. When TPS on the board in slot 7 or 8 is restored, the services on the board are recovered and the T_ALOS alarm is cleared.
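The scenarios above can be summarised as a small truth table: services on slots 7 and 8 are affected whenever the switching state of the slot 12/13 protection group differs from the switching state of TPS on slot 7 or 8. This is an interpretation of the scenario descriptions rather than a statement from the notice, and the function name is hypothetical.

```python
def slot_7_8_services_interrupted(tps_12_13_switched, tps_7_8_switched=False):
    """Model of the fault scenarios as an exclusive-or of the two TPS states.

    tps_7_8_switched stays False for a board without TPS protection
    (scenario 1) or with TPS configured but not triggered (scenario 2, case 1).
    """
    return tps_12_13_switched != tps_7_8_switched


if __name__ == "__main__":
    print("12/13 switched | 7/8 switched | services on 7/8 interrupted")
    for group_12_13 in (False, True):
        for group_7_8 in (False, True):
            print(group_12_13, group_7_8,
                  slot_7_8_services_interrupted(group_12_13, group_7_8))
```

Under this reading, services are normal when both groups are idle, which is consistent with the recovery measure of bringing both protection groups back to the idle state.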

Identification methods:
The problem described in the pre-warning is triggered when the following three conditions are all met:
1. The NE type is OptiX OSN 1500B and its version is included in the “Versions Involved” part.
2. The NE is configured with TPS protection groups consisting of the service processing boards in slots 12 and 13. In addition, the NE is configured with a service processing board in slot 7 or 8 that needs to work with an interface board (such as SSR1PD1 and SSR2PD1).
3. Services on the boards in slots 7 and 8 are interrupted and the T_ALOS alarm is reported, which is triggered by TPS or TPS restoration in the TPS protection group consisting of boards in slots 12 and 13.
[Root Cause]
The software version has defects. Boards in slot 12 and 13 form a TPS protection group. When TPS is triggered or restored, relays on boards in slots 16 and 17 are switched no matter whether two interface boards are required to be configured on the service processing board in slot 13.
If two interface boards are required by the board in slot 13, the valid slots are slots 16 and 17.
If one interface board is required by the board in slot 13, the valid slot is slot 16.
When the service processing board in slot 7 or 8 requires an interface board (the interface board is in slot 15 or 17), services on the boards in slots 7 and 8 are affected by TPS or TPS restoration in the TPS protection group consisting of the boards in slots 12 and 13.

[Impact and Risks]
TPS and TPS restoration interrupt the services on the boards in slots 7 and 8 of an NE that is configured with a TPS protection group consisting of boards in slots 12 and 13.
[Measures and Solutions]
Recovery measures:
For scenario 1, it is recommended that you restore TPS. For scenario 2, it is recommended that you perform TPS operations so that both TPS protection groups are in the idle state.
Workarounds:
Currently, there are two methods available for working around the problem.

1. Do not configure TPS protection groups for boards in slots 12 and 13.
2. Configure TPS protection groups for boards in slots 12 and 13 on an OptiX OSN 1500B subrack, but in slots 7 and 8 either configure no service processing boards or configure only service processing boards that do not work with interface boards.