Monthly Archives: April 2015

10G services on OSN 8800 layer interrupted

1. 10G services on OSN 8800 layer interrupted

2. Three 10G services were impacted for more than 20 seconds.

1. The input optical power on the line boards fluctuated by about 2.2 dB, probably triggered by an abnormal fiber condition. Many code errors then appeared on the receiving side at the 8030 site, finally resulting in service interruption.
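
To put the 2.2 dB figure in perspective, the following minimal sketch converts a dB change into a linear power ratio using the standard definition of the decibel. The function name is illustrative only and is not part of any Huawei tool.

```python
def db_to_power_ratio(delta_db: float) -> float:
    """Convert a power change in dB to a linear power ratio (10^(dB/10))."""
    return 10 ** (delta_db / 10)


# A 2.2 dB swing means the received power varied by a factor of about 1.66.
ratio = db_to_power_ratio(2.2)
print(f"2.2 dB corresponds to a power ratio of {ratio:.2f}")
```

A swing of this size can move a receiver that runs close to its sensitivity limit into the range where code errors appear, which is consistent with the sequence described above.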

1. Both the working and protection line boards received many line code errors:

1.         Check the related fiber connectors. If the fiber connectors are dirty, clean or replace them.

2.         If the alarm persists, check the fiber jumper. If the fiber jumper is bent excessively, damaged, or aged, adjust or replace it.

3.         Check that the optical interface connector is well inserted.

4.         Check that the fiber connector is clean.

5.         Check that the cable is intact.

6.         Add or remove optical attenuators so that the optical power is within the proper range.

7.         Check whether the fiber is damaged. If yes, replace the fiber.

8.         If bit errors still occur, replace the line board that reports bit errors at the local station.

1.         Carry out the link optimization activity.

2.         Monitor the OSNR on a routine basis.


Rectification Notice of Faulty MOS Transistors on TN51XCS Boards

Summary:
Metal-oxide semiconductor (MOS) transistors from one lot used on TN51XCS boards intended for the Huawei OptiX OSN 8800 have a defect. Because of the defect, the noise introduced by the 12 V power supply on the TN51XCS board is out of range and interferes with the signals transmitted over the adjacent high-speed buses. This interference results in signal degrade (SD). Consequently, bit errors are generated when service boards receive the signals.

[Problem Description]
Trigger conditions:
When the following conditions are met, the issue occurs:
1. A cross-connect board listed in the attachment is used on a network.
2. During deployment or network expansion, slots 6–8, 12–14, 25–27, and 29–31 on the OptiX OSN 8800 I subrack house the following Huawei service boards and cross-connections that traverse the cross-connect board are configured on the service boards:
TN55TQX, TN55TDX, TN53NQ2, TN11TOM, TN52TOM, TN11TQM, TN12TQM, TN51THM, TN11LOG, TN12LOG, TN11TQS, TN52TOG, TN11TQX, TN11TDX, TN12TDX, TN52TQX, TN11NS2, TN11NS2T, TN11ND2, TN12ND2-TP, TN12ND2-XFP, TN52ND2-TP, TN52ND2-FP, TN11ND2T, TN12NS2, TN11NS2-B, TN11NS2-AT, TN12NS2T, TN12ND2, TN12ND2T, TN51NQ2, TN52NS2, TN52NS2T, TN52ND2, TN52ND2T, TN52NQ2, TN11NS3, TN52NS3, TN11TSXL, and TN12TMX

Symptom:
1. Concurrent symptoms on the tributary board:
− According to the query of the current performance events, there is bit error second (BES) or background block error ratio (BBER) in the ODUk PM section. In addition, BES and BBER increase, as indicated by multiple queries. When the performance deteriorates, a BUS_ERR alarm is reported.
− According to the query of the performance events on the line board that is interconnected with the tributary board (under the condition that the non-intrusive monitoring in the ODUk PM section of the corresponding channel is enabled), there is no BES or BBER in the section.
2. Symptoms on line boards:
According to the query of the current performance of the line board at downstream site 2 (under the condition that the non-intrusive monitoring in the ODUk PM section of the corresponding channel is enabled), there is BES or BBER in the section and they increase as indicated by multiple queries; however, there is no BES or BBER in the OTUk section on the line board. When the performance deteriorates, a BUS_ERR alarm is reported on the line board that is directly interconnected with the cross-connect board.

[Figure: site 1 and site 2]
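
The two symptom patterns above can be written as a pair of checks. This is a minimal sketch: the `PmReading` class and its field names are hypothetical summaries of what an engineer reads from the performance queries, not an interface of the NE or the NMS.

```python
from dataclasses import dataclass


@dataclass
class PmReading:
    """Hypothetical summary of one board's performance queries."""
    odu_pm_errors: bool        # BES or BBER present in the ODUk PM section
    odu_pm_rising: bool        # counts increase across repeated queries
    otu_errors: bool = False   # BES or BBER present in the OTUk section


def matches_tributary_symptom(trib: PmReading, peer_line: PmReading) -> bool:
    """Symptom 1: errors rise on the tributary board, none on the peer line board."""
    return trib.odu_pm_errors and trib.odu_pm_rising and not peer_line.odu_pm_errors


def matches_line_symptom(downstream_line: PmReading) -> bool:
    """Symptom 2: ODUk PM errors rise downstream while the OTUk section is clean."""
    return (downstream_line.odu_pm_errors
            and downstream_line.odu_pm_rising
            and not downstream_line.otu_errors)


print(matches_tributary_symptom(PmReading(True, True), PmReading(False, False)))
print(matches_line_symptom(PmReading(True, True, otu_errors=False)))
```

Both checks assume that non-intrusive monitoring of the ODUk PM section is enabled on the line board, as the symptom description requires. A match only indicates that the pattern fits; the bar code and inspection script remain the deciding criteria.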

Identification method:
The issue occurs when the following conditions are met:
1. The bar code of the TN51XCS board is listed in the attachment.
2. The issue is confirmed by the inspection script listed in the attachment when the service boards are TN55TQX, TN55TDX, TN53NQ2, or TN52NS3.

[Root Cause]
MOS transistors from one lot used on TN51XCS boards have a defect. Because of the defect, the noise introduced by the 12 V power supply on a TN51XCS board is out of range and interferes with the signals transmitted over the adjacent high-speed buses. This interference results in SD. Consequently, bit errors are generated when service boards receive the signals.

[Impact and Risk]
When the issue occurs, bit errors are generated in services, and services may even be interrupted.

[Measures and Solutions]
Recovery measures:
None

Workarounds:
Remove the involved service boards and insert them in other slots to temporarily restore services.

Preventive measures:
Replace the faulty cross-connect board with a cross-connect board whose bar code is not listed in the attachment.

 

 

How to Prevent a License Defect Flash Exhaustion on the Main Control Board

Abstract: When the license module processes interchange messages for hot standby between the working and protection main control boards, commissioning information accumulates in the flash and finally exhausts the flash space. As a result, configuration data fails to be saved and database backup fails.

[Problem Description]
Trigger conditions:
Two main control boards are configured on an NE of a version listed in “Involved Versions”.
Symptom:
The flash space on the protection main control board is consumed twice as fast as that on the working main control board. Therefore, this issue occurs on the protection main control board earlier than on the working main control board.
1. If the flash space is insufficient for periodic database backup, the backup fails and the DBMS_ERROR alarm is reported.
2. Extra small files need to be generated in the case of configuration deployment or dynamic services. If the flash space is insufficient, these files cannot be generated, causing an NE reset and inconsistency between the working and protection main control boards (the result of :hbu-get-backup-info is 0x00000002).
3. If the flash space is insufficient during an NE upgrade, the upgrade fails.
Identification method:

  •  An NE of a version listed in “Involved Versions” is deployed.
  •  The NE has two main control boards.

[Root Cause]
When the license module processes interchange messages for hot standby between the working and protection main control boards, commissioning information accumulates in the flash and finally exhausts the flash space. (The working main control board consumes 1 M flash space per 20 days and the protection main control board consumes 1 M flash space per 10 days.)
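
The consumption rates above allow a rough estimate of how long a board can run before its flash fills up. The sketch below assumes that "1 M" means one megabyte and uses a hypothetical free-space figure of 8 M; the real value must be read from the NE.

```python
# Rates taken from the root cause description: days needed to consume 1 M of flash.
WORKING_DAYS_PER_M = 20
PROTECTION_DAYS_PER_M = 10


def days_until_full(free_m: float, days_per_m: float) -> float:
    """Estimate the days until the given free flash space (in M) is consumed."""
    return free_m * days_per_m


free_space_m = 8  # hypothetical example value, not a figure from the notice
print(days_until_full(free_space_m, WORKING_DAYS_PER_M))     # 160
print(days_until_full(free_space_m, PROTECTION_DAYS_PER_M))  # 80
```

With the same free space, the protection board reaches exhaustion in half the time, which is why the issue shows up there first.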

[Impact and Risks]
The flash space will be exhausted.
1. If the flash space is insufficient for periodic database backup, the backup will fail and all new configuration data may be lost upon a reset.
2. If the flash space is insufficient, the required small files cannot be generated, causing an NE reset and inconsistency between the working and protection main control boards (the result of :hbu-get-backup-info is 0x00000002). In addition, a reset on the working main control board may result in loss of new configuration data.
3. If the flash space is insufficient during an NE upgrade, the upgrade fails.

[Measures and Solutions]
Recovery measures:

  • If the DBMS_ERROR alarm is reported, do as follows:

1. Run the following commands to query whether the HBUMSG.TXT file exists on the working or protection main control board:
:sftm-show-dir:working board,"ofs1/license"
:sftm-show-dir:protection board,"ofs1/license"
2. If yes, run the following commands to delete the file:
:sftm-delete-file:working board,"ofs1/license/HBUMSG.TXT"
:sftm-delete-file:protection board,"ofs1/license/HBUMSG.TXT"

  •  If the protection main control board is reset, do as follows:

1. If data synchronization fails between the working and protection main control boards, issue the following command to the working main control board:
:sm-set-nebusy:0,0,0,0,none
2. Run the following commands to query whether the HBUMSG.TXT file exists on the working or protection main control board:
:sftm-show-dir:working board,"ofs1/license"
:sftm-show-dir:protection board,"ofs1/license"
3. If yes, run the following commands to delete the file:
:sftm-delete-file:working board,"ofs1/license/HBUMSG.TXT"
:sftm-delete-file:protection board,"ofs1/license/HBUMSG.TXT"

  • If the working main control board is reset, do as follows:

1. If data synchronization fails between the working and protection main control boards, issue the following command to the working main control board:
:sm-set-nebusy:0,0,0,0,none
2. Run the following commands to query whether the HBUMSG.TXT file exists on the working or protection main control board:
:sftm-show-dir:working board,"ofs1/license"
:sftm-show-dir:protection board,"ofs1/license"
3. If yes, run the following commands to delete the file:
:sftm-delete-file:working board,"ofs1/license/HBUMSG.TXT"
:sftm-delete-file:protection board,"ofs1/license/HBUMSG.TXT"
4. If new configuration data is lost, reconfigure the new services.
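
Because the same query and delete commands recur in every recovery case, they can be generated from one place to avoid typing mistakes. The helper below is a hypothetical convenience sketch: it only builds the command strings quoted in this notice and does not connect to any NE. The board identifiers are the same placeholders used above and must be replaced with the real board IDs.

```python
LICENSE_DIR = "ofs1/license"
HBUMSG_PATH = "ofs1/license/HBUMSG.TXT"


def cleanup_commands(board_ids):
    """Build the query commands for all boards, followed by the delete commands."""
    commands = [f':sftm-show-dir:{bid},"{LICENSE_DIR}"' for bid in board_ids]
    commands += [f':sftm-delete-file:{bid},"{HBUMSG_PATH}"' for bid in board_ids]
    return commands


# Placeholders exactly as written in the notice; substitute the real board IDs.
for cmd in cleanup_commands(["working board", "protection board"]):
    print(cmd)
```

The delete commands should be issued only if the preceding query shows that HBUMSG.TXT exists.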

Workarounds:
Use the health check tool to periodically (recommended: every six months) check the NEs of the involved versions on the live network. For details on the health check tool, see “Support for the Health Check Tool.”

Solution:
For OSN 3500/OSN 7500 products, upgrade the software to V200R011C00SPC300 or a later version.
For OSN 1500 products, upgrade the software to V200R011C00SPC300 or a later version, or to V200R011C01SPC103 or a later version.

Material handling after replacement:
None

[Support for the Health Check Tool]
The SmartKit NSE2700 V200R006C00 Inspector is supported and can be upgraded to the latest version. The health check item is “Common_Platform_Checking/Check if HBUMSG.TXT file exist in active and standby SCC”.

[Rectification Scope and Time Requirements]
None

 

Abnormal performance records on the tributary board of the OSN 500

When the performance of the tributary board on the OSN 500 is queried, E1 performance entries appear for a T1 service:

:per-get-curdata:9,1,0,1,perall,15m; 
                                         PER_GET_CURDATA                                              
  BID   PERIOD  STARTTIME                      EID                   OPPORT  CHAN          VALUE      
  9     15m     1990-01-03 01:45:00+08:00      t1_les_sdh            1       1             214        
  9     15m     1990-01-03 01:45:00+08:00      t1_lses_sdh           1       1             214        
  9     15m     1990-01-03 01:45:00+08:00      e1_les_sdh            1       1             214   //E1 performance    
  9     15m     1990-01-03 01:45:00+08:00      e1_lses_sdh           1       1             214   //E1 performance 

NULL 
The software has a bug: both T1 and E1 performance entries appear in the performance record when performance is queried.

The issue affects Huawei MSTP OSN 500 V100R002C01SPH307 and all earlier versions. It is solved in V100R002C01SPC309 and later versions.
N/A

When a T1 service was abnormal, the tributary board performance was checked, and both T1 and E1 performance entries were found. A check of the board software then revealed a defect in the performance records. The relevant code of this version was modified and verified, which resolves the abnormal performance records.

On Huawei SDH equipment versions that have this problem, ignore the E1 performance entries if only T1 services are configured, and ignore the T1 performance entries if only E1 services are configured.
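
On an affected version, the spurious entries can be filtered out of a saved query result. The sketch below is an assumption-laden example: it parses text laid out like the PER_GET_CURDATA output shown earlier and keeps only the entries whose event ID starts with the configured service type. The function names and the dictionary layout are illustrative, not part of any Huawei tool.

```python
SAMPLE = """\
  BID   PERIOD  STARTTIME                      EID                   OPPORT  CHAN          VALUE
  9     15m     1990-01-03 01:45:00+08:00      t1_les_sdh            1       1             214
  9     15m     1990-01-03 01:45:00+08:00      t1_lses_sdh           1       1             214
  9     15m     1990-01-03 01:45:00+08:00      e1_les_sdh            1       1             214   //E1 performance
  9     15m     1990-01-03 01:45:00+08:00      e1_lses_sdh           1       1             214   //E1 performance
"""


def parse_rows(text):
    """Parse data rows: BID PERIOD DATE TIME EID OPPORT CHAN VALUE [comment]."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows start with a numeric board ID; headers and commands do not.
        if len(parts) >= 8 and parts[0].isdigit():
            rows.append({"bid": int(parts[0]), "period": parts[1], "eid": parts[4],
                         "port": int(parts[5]), "chan": int(parts[6]),
                         "value": int(parts[7])})
    return rows


def keep_configured(rows, service):
    """Keep only entries for the configured service type, for example 'T1' or 'E1'."""
    prefix = service.lower() + "_"
    return [row for row in rows if row["eid"].startswith(prefix)]


for row in keep_configured(parse_rows(SAMPLE), "T1"):
    print(row["eid"], row["value"])
```

Filtering on the event ID prefix relies on the naming seen in the sample output (`t1_...` and `e1_...`); other event IDs may need different handling.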

Attention: Abnormal OSC Communication of the OptiX OSN 1800

The networking is as follows:
Station A and station B are each configured with the LWX2 and the DMD2S. Each SCC is configured with one 1551-nm optical module to implement the OSC communication. The configuration of the two stations is the same. The NE version is 5.67.1.22. After optical power commissioning is complete, the ECC between station A and station B fails.

Possible causes:
1.       The optical power is incorrect.
2.       The optical module is faulty.
3.       The SCC software is defective.

The SCC software has a bug. When the PHY chip used by the OSC optical port on the SCC works in BASE-EFX mode, special requirements are raised for the optical module. That is, the chip can be linked only when the amplitude of the electrical input signals of the optical module is higher than 800 mV. The amplitude of the electrical input signals of the currently used optical module is about 800 mV. Therefore, the OSC communication may fail.

  1. The LWX2 does not support the ESC. Station A and station B use the optical module on the SCC to implement the OSC communication. Check the receive optical power of the RM1/TM1 optical port on the SCC. The check result shows that the receive optical power is normal. There is no abnormal alarm. Run :cm-get-eccroute. The remote NE cannot be queried. Run :cm-get-bdinfo to query the DCC state. The query result shows that the DCC state is rx-f. Reset the SCC. The fault persists.
  2. Install OptiX OSN 1800 A and OptiX OSN 1800 B in the same equipment room. Interconnect the RM1s/TM1s of the two OptiX OSN 1800s with the added fixed optical attenuators through fibers. Run :cm-get-eccroute. The remote NE still cannot be queried. The DCC state is still rx-f.
  3. Confirmation by the relevant experts shows that the SCC software has a bug. When the PHY chip used by the OSC optical port on the SCC works in BASE-EFX mode, special requirements are raised for the optical module. That is, the chip can be linked only when the amplitude of the electrical input signals of the optical module is higher than 800 mV. The amplitude of the electrical input signals of the currently used optical module is about 800 mV. Therefore, the OSC communication may fail.
  4. Software mitigation is implemented in the latest version (5.67.01.24). After the NE software is upgraded to this version, the OSC communication on the OptiX OSN 1800 is normal and the problem is solved.

How to Prevent Transient FAN_FAIL on OptiX OSN 6800

Summary:
When only one SCC board is configured for OptiX OSN 6800, there is a low probability that the fan tray assembly with FPGA V200 transiently reports a FAN_FAIL alarm.

[Problem Description]
Trigger condition:
The FPGA version of the fan tray assembly on OptiX OSN 6800 is V200, and only one SCC board is configured.

Symptom:
The indicator on the fan tray assembly blinks red and green alternately. When it blinks red, the fans speed up and a FAN_FAIL alarm is reported on the NMS. When it blinks green, the fan speed returns to normal.

Identification method:
Identify the problem as follows:
Step 1 On the NMS, query for FAN_FAIL alarms that are reported and cleared at random.
Step 2 On the NMS, query the FPGA version.
1. On the main menu, choose Inventory > Physical Inventory.
2. Under Physical Inventory Type, click Board.
3. On the Board List tab, click Filter. In the Filter dialog box that is displayed, enter fan in Board Name to filter fan boards. Then click OK.
4. In the window that displays the filter result, view the FPGA version information.

If the FPGA version is V200, go to the next step.
Step 3 Run the :cfg-get-phybd command to query SCC configuration. If only one SCC board (TN51SCC or TN52SCC) is queried in slots 17 and 18, then only one SCC board is configured.
Step 4 At the equipment room, check the indicator status on the fan tray assembly. The highlight in the following figure shows the indicator position. If the indicator blinks red and green alternately, and the fans speed up when it blinks red and return to the normal speed when it blinks green, the fan tray assembly is involved in the problem.

[Figure: indicator position on the fan tray assembly]
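
The remote part of this identification (steps 2 and 3) reduces to two conditions, sketched below. The function names and the slot-to-board dictionary are hypothetical; the values would be transcribed by hand from the NMS inventory and the :cfg-get-phybd result. The on-site indicator check in step 4 is still needed for confirmation.

```python
SCC_SLOTS = (17, 18)
SCC_BOARD_TYPES = {"TN51SCC", "TN52SCC"}


def has_single_scc(slot_to_board: dict) -> bool:
    """Return True if exactly one of slots 17 and 18 holds an SCC board."""
    count = sum(1 for slot in SCC_SLOTS
                if slot_to_board.get(slot) in SCC_BOARD_TYPES)
    return count == 1


def fan_tray_may_be_affected(fpga_version: str, slot_to_board: dict) -> bool:
    """Return True if the fan tray FPGA is V200 and only one SCC is configured."""
    return fpga_version.strip().upper() == "V200" and has_single_scc(slot_to_board)


print(fan_tray_may_be_affected("V200", {17: "TN52SCC"}))
print(fan_tray_may_be_affected("V200", {17: "TN52SCC", 18: "TN52SCC"}))
```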

[Root Cause]
When OptiX OSN 6800 is equipped with only one SCC board, the fan board with FPGA V200 is quite sensitive to the spurs of the fan speed adjustment signals from the SPI bus, which can result in the FAN_FAIL alarm. If the ambient temperature of the equipment room exceeds the normal operation temperature range (0℃ to 45℃), there is a higher probability that this alarm is reported.

[Impact and Risk]
The transient FAN_FAIL alarms have no impact on services but reduce customer satisfaction.
[Measures and Solutions]
Recovery measures:
None.
Workarounds:
Configure active and standby SCC boards if permitted.
Preventive measures:
Replace the fan tray assembly whose FPGA version is V200 with one whose FPGA version is V210.