Monthly Archives: November 2014

Pay Attention to Service Interruption Caused by Faulty FPGAs on Cross-Connect Boards

[Problem Description]
Trigger conditions:

The FPGA of a cross-connect board is faulty and enters an incorrect state. As a result, the FPGA misjudges the status of the service board as abnormal and triggers MSP switching, resulting in service interruption.



Symptom:
MSP switching occurs, services are interrupted, and the faulty cross-connect board is reset or an active/standby switchover occurs.

Identification method:
The problem is identified when both of the following conditions are met:

  • MSP switching occurs, and only the services configured on the MS are interrupted while other services on the NE are normal.
  • The faulty cross-connect board is reset or an active/standby switchover occurs.

[Root Cause]
Scenario 1: The cross-connect board functions as the standby board. When its FPGA is faulty and in an incorrect state, the FPGA wrongly considers the running status of the line board to be abnormal, mistakenly treats its own board as the active board, and triggers the MSP switching protocol. However, the actual active cross-connect board detects that the service board is normal and therefore does not switch the cross-connect matrix according to the MSP switching protocol. As a result, services configured on the MS are interrupted.
Scenario 2: The cross-connect board functions as the active board. When its FPGA is faulty and in an incorrect state, the FPGA wrongly determines that the running status of the service board is abnormal, triggers MSP switching, and switches the cross-connect pages. The standby board does not detect any abnormality and therefore does not switch its cross-connect pages, so the cross-connect pages on the active and standby boards become inconsistent. In addition, the active board automatically resets because of a logic state error, and the standby board becomes the active board. Services are ultimately interrupted because of this inconsistency between the cross-connect pages of the two boards.
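The page mismatch in Scenario 2 can be illustrated with a toy model. The Python sketch below is a deliberate simplification and an assumption, not Huawei's implementation. In this model, each board holds a single cross-connect page name. Services survive the switchover only if the board that ends up active uses the page that the MSP switch requires. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CrossConnectBoard:
    """Toy model: a board that only remembers which cross-connect page it uses."""
    role: str
    page: str = "working"


def services_survive(standby_follows_switch: bool) -> bool:
    """Replay Scenario 2 and report whether MS services stay up."""
    active = CrossConnectBoard("active")
    standby = CrossConnectBoard("standby")

    # The faulty FPGA on the active board wrongly sees an abnormal service
    # board, triggers MSP switching, and switches its own page.
    required_page = "protection"
    active.page = required_page

    # In the fault case the standby board sees a normal service board and
    # keeps its old page; passing True models both boards agreeing on the page.
    if standby_follows_switch:
        standby.page = required_page

    # The active board resets on a logic error and the standby board takes over.
    new_active = standby
    new_active.role = "active"
    return new_active.page == required_page


print(services_survive(standby_follows_switch=False))  # False: pages differ
```

The point of the model is only that the reset hands control to a board whose page was never switched. The real boards exchange far more state than one page name.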

[Impact and Risk]
Impact: Services configured on the MS are interrupted.
Risk: FPGA failures of this kind are sporadic, so the problem rarely occurs.

[Measures and Solutions]
Recovery measures:
Re-enable the MSP protocol, and then promptly replace the faulty cross-connect board.
Preventive measures:
Upgrade the NE to R10C03SPC208 or later to avoid service interruption caused by a faulty cross-connect board.
Solution:
Upgrade the NE to R10C03SPC208 or a later version and replace the faulty cross-connect board.
Material handling after replacement:
N/A

[Inspector Applicable or Not]
None
[Rectification Scope and Time Requirements]
N/A
[Rectification Guide]
N/A
[Attachment]
None


Pay Attention to Alarms Reported on Some Boards of OptiX Metro 6100 and OptiX BWS 1600G

[Problem Description]
Trigger conditions:
This issue is triggered when operations such as writing logs or saving configurations are performed on the flash memory of boards such as the C9ELOG board used on OptiX Metro 6100 or OptiX BWS 1600G.


Symptom:
During operation of OptiX Metro 6100 or OptiX BWS 1600G, a WRG_BD_TYPE alarm is reported on a board. A MAIL_ERR alarm may also be reported on the board.
Identification method:
The issue is identified when both of the following conditions are met:
1. A WRG_BD_TYPE or MAIL_ERR alarm is reported on a board, and the physical board type of the board reporting the WRG_BD_TYPE alarm is 00 01.
2. The software versions of the involved boards are the following versions or earlier:
OptiX BWS 1600G:
SSE2SC1/E2SC2: 2.16
SSE3RPC03: 3.18
SSE3ELOG: 3.26
SSE1MCA01/03/07/08: 1.28
SSE8ELOG/SSE8LBF: 8.33
SSE8ETMX: 8.33
SSE8TMR: 8.33
SSE1ROP01: 2.24
SSE1FMU02: 1.14
OptiX Metro 6100:
SSC9SC2: 2.16
SSC9ELOG: 3.24
SSL2MCA/C7MCA/C7MCA01/SSCBETMX: 1.27
SSCBELOG: 8.31
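Checked by hand, the two conditions amount to a per-board threshold lookup. The Python sketch below encodes the version lists above and tests one board against both conditions. It is an illustration under stated assumptions, not a Huawei tool:

- `matches_issue` and `AFFECTED_UP_TO` are hypothetical names.
- Versions are assumed to compare field by field as integers, so 2.9 counts as earlier than 2.16.
- "SSE1MCA01/03/07/08" is expanded into four board names.
- Condition 1 is read as requiring the 00 01 board type only when WRG_BD_TYPE is present.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a board software version such as "3.26" into (3, 26)."""
    return tuple(int(part) for part in text.split("."))


# Last affected software version per board, copied from the lists above.
AFFECTED_UP_TO = {
    "OptiX BWS 1600G": {
        "SSE2SC1": "2.16", "E2SC2": "2.16",
        "SSE3RPC03": "3.18",
        "SSE3ELOG": "3.26",
        "SSE1MCA01": "1.28", "SSE1MCA03": "1.28",
        "SSE1MCA07": "1.28", "SSE1MCA08": "1.28",
        "SSE8ELOG": "8.33", "SSE8LBF": "8.33",
        "SSE8ETMX": "8.33",
        "SSE8TMR": "8.33",
        "SSE1ROP01": "2.24",
        "SSE1FMU02": "1.14",
    },
    "OptiX Metro 6100": {
        "SSC9SC2": "2.16",
        "SSC9ELOG": "3.24",
        "SSL2MCA": "1.27", "C7MCA": "1.27",
        "C7MCA01": "1.27", "SSCBETMX": "1.27",
        "SSCBELOG": "8.31",
    },
}


def matches_issue(product: str, board: str, version: str,
                  alarms: set[str], physical_type: str) -> bool:
    """Check both identification conditions for one board."""
    limit = AFFECTED_UP_TO.get(product, {}).get(board)
    if limit is None:
        return False  # board is not in the affected list
    # Condition 1: WRG_BD_TYPE or MAIL_ERR is reported; a WRG_BD_TYPE alarm
    # counts only when the physical board type reads 00 01 (assumed reading).
    if not alarms & {"WRG_BD_TYPE", "MAIL_ERR"}:
        return False
    if "WRG_BD_TYPE" in alarms and physical_type != "00 01":
        return False
    # Condition 2: the board software is the listed version or earlier.
    return parse_version(version) <= parse_version(limit)


print(matches_issue("OptiX Metro 6100", "SSC9ELOG", "3.24",
                    {"WRG_BD_TYPE"}, "00 01"))  # True
```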
[Root Cause]
When flash memory write operations are performed on boards such as the C9ELOG board used on OptiX Metro 6100 or OptiX BWS 1600G, memory may occasionally be overwritten. As a result, the message queue is blocked, the board type is reported as 00 01, and the WRG_BD_TYPE and MAIL_ERR alarms are reported.

[Impact and Risk]
When the fault occurs, the involved boards cannot be configured. Services are not affected.
[Measures and Solutions]
Recovery measures:
Remove and re-insert the involved boards.
Workarounds:
None
Preventive measures:
Upgrade the software package to Huawei OptiX Metro 6100 V100R008C04SPC600 or OptiX BWS 1600G V100R007C02SPC700 or later.

Precautions on Using the Service Password to Recover Faults on Oracle M-series Servers

[Problem Description]

If an Oracle M-series server (M4000 or M5000) is not covered by Oracle's maintenance warranty service, Oracle does not provide a service password. Without the service password, services cannot be restored if the server has any of the following faults:
1. If an Oracle M-series server such as the M4000 or M5000 is powered off unexpectedly, its hardware status may be marked as faulty even though the hardware is functioning properly. Apply to Oracle for the service password. After you use the service password to clear the faulty status, the server can start properly.
2. The indicators for some alarms, such as over-temperature alarms, remain red even after the alarms are cleared. Apply to Oracle for the service password to clear the alarms.
3. If some server parts generate many alarms, for example, for memory unit verification failures or damaged fans or power supplies, the parts are added to the blacklist. Apply to Oracle for the service password to remove them from the blacklist. Otherwise, the parts cannot be activated even after they are replaced.

[Impact and Risk]
If the servers are not covered by Oracle's warranty service, you cannot obtain the service password. As a result, you cannot clear faults on Oracle M-series servers or restore services.

[Workaround and Solution]
The service manager at each representative office must organize maintenance engineers to perform the following steps:
1. Obtain the serial numbers (S/N) of all Oracle M-series servers used in the office.
2. Query the maintenance status of the Oracle M-series servers:

  • For Oracle servers with local vendor services, call Oracle's local technical support hotline to query the maintenance status. For the hotline numbers, refer to the attachment Oracle Global Support Capability_2012Q3.
  • For Oracle servers without local vendor services, send an email to transfer@huawei.com to query the maintenance status.

3. The service manager at each representative office must advise customers to purchase Oracle's maintenance warranty service for Oracle servers that are not covered by it:

  • In regions with Oracle local vendor services, purchase Oracle's maintenance services.
  • In regions without Oracle local vendor services, purchase Huawei Integrated services, and then contact the following service contact to purchase Oracle's maintenance warranty service through China Oracle.

According to Oracle's latest notification, after the firmware is upgraded to version 1115, the service password is no longer needed to recover faults on Oracle M-series servers. For operation details, refer to the attachment <M-Server Firmware Update Procedure>.

 

Pay Attention to TNH2SL1D and TNH2SP3D Boards Failing to Go Online on OptiX OSN 500

[Background Information]
TNH2SL1D, TNH2SP3D, and TNH2PL3T boards are developed based on OptiX OSN 550 V100R003 and are used on OptiX OSN 500 V100R002 to replace TNH1SL1D, TNH1SP3D, and TNH1PL3T boards.
TNH2 boards are applied to OptiX OSN 500 V100R002 using hardware independence technology. OptiX OSN 500 software earlier than V100R002SPC305 does not contain the driver for TNH2 boards. Instead, the NEs dynamically load the driver stored in the flash memory of the TNH2 boards and then access it through the same interface used for the driver of TNH1 boards. In this way, TNH1 boards can be replaced by TNH2 boards without changing the NE software.
The driver stored in the flash memory of TNH2 boards is called the hardware independent driver.


[Problem Description]
Trigger conditions:
The OptiX OSN 500 version is earlier than V100R002C01SPC305 (5.62.02.16). A TNH2SL1D board (independent driver version: 120) or TNH2SP3D board (independent driver version: 120) is installed.
Symptom:
After a TNH2SL1D board is installed in the extended slot of the NE, the board fails to go online and the physical board cannot be queried on the NMS. After a TNH2SP3D board is installed, the NE becomes unreachable. If a TNH2SL1D or TNH2SP3D board is installed on an OptiX OSN 500 running V100R002C01SPC305 (5.62.02.16) or later, the board goes online normally.
Identification method:

This issue is triggered when both of the following conditions are met:
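1. The OptiX OSN 500 version is earlier than V100R002C01SPC305 (5.62.02.16). In the following sample output of the :ver command, the NeSoft version 5.62.02.12P03 is earlier than 5.62.02.16.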
:ver
BIOS 8.26.21T01 20131122 14:00:56 inactive
ExtBios 9.26.21T01 20131122 14:03:21 active
NeSoft(P) 5.62.02.12P03 20110907 00:29:05
Platform(D) 5.00.13.B221 20100105 10:37:20
Logic (U301)230
Dsp
2. The independent driver version of the TNH2SL1D board or TNH2SP3D board is 120.
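The first condition can be checked by reading the NeSoft line of the :ver output, and the two conditions can then be combined. The Python sketch below makes two assumptions. First, the NeSoft field maps to the NE version as shown in this notice, where 5.62.02.16 corresponds to V100R002C01SPC305. Second, a patch suffix such as P03 can be ignored for the comparison. `nesoft_version` and `board_at_risk` are hypothetical helper names.

```python
import re

# V100R002C01SPC305 corresponds to NeSoft 5.62.02.16 (from the notice).
FIXED_NESOFT = (5, 62, 2, 16)
AFFECTED_DRIVER_VERSION = 120


def nesoft_version(ver_output: str) -> tuple[int, ...]:
    """Extract the NeSoft version from ':ver' output as a tuple of ints."""
    match = re.search(r"^NeSoft\S*\s+(\d+(?:\.\d+)+)", ver_output, re.MULTILINE)
    if match is None:
        raise ValueError("no NeSoft line in :ver output")
    # A patch suffix such as "P03" is not captured, so 5.62.02.12P03 is
    # compared as 5.62.02.12.
    return tuple(int(part) for part in match.group(1).split("."))


def board_at_risk(ver_output: str, driver_version: int) -> bool:
    """True if both identification conditions hold."""
    return (nesoft_version(ver_output) < FIXED_NESOFT
            and driver_version == AFFECTED_DRIVER_VERSION)


SAMPLE_VER = """BIOS 8.26.21T01 20131122 14:00:56 inactive
ExtBios 9.26.21T01 20131122 14:03:21 active
NeSoft(P) 5.62.02.12P03 20110907 00:29:05
Platform(D) 5.00.13.B221 20100105 10:37:20
"""
print(nesoft_version(SAMPLE_VER))      # (5, 62, 2, 12)
print(board_at_risk(SAMPLE_VER, 120))  # True
```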

If the board is installed in slot 3, the address for querying the independent driver version is 0x53460090. If the board is installed in slot 4, the address for querying the independent driver version is 0x53C60090.
For the TNH2SL1D board, the three digits after tnh2slxd.hwx indicate the independent driver version.
:dm:0x53c60090
53c60090 31 71 53 64 38 35 30 32 2e 6f 20 2d 3e 20 74 6e 1qSd8502.o.->.tn
53c600a0 68 32 73 6c 78 64 2e 68 77 78 00 31 32 30 00 4c h2slxd.hwx.120.L
53c600b0 75 53 4c 31 51 31 31 30 54 30 32 5f 45 6e 74 72 uSL1Q110T02_Entr
53c600c0 79 00 5d 00 00 10 00 b9 c1 05 00 00 00 00 00 00 y.].............
53c600d0 3f 91 45 84 68 34 8a 09 0a 41 50 57 dc 0c b6 b3 ?.E.h4...APW....
53c600e0 d0 7f 14 61 52 c8 fe 2f 1e 75 98 3a 5f f6 6b f8 ...aR../.u.:_.k.
53c600f0 be d4 42 91 d5 be c9 0b 73 be 05 51 33 61 3d 2f ..B.....s..Q3a=/
53c60100 0b 1c cb 85 79 d8 a2 0d c2 67 18 79 b1 f4 25 a7 ....y....g.y..%.
53c60110 42 52 88 f8 51 c0 a8 ca 6b a1 d1 9c d3 de a9 cf BR..Q...k.......
53c60120 94 18 2f fb d3 c0 96 ef 95 ff b7 13 c2 6c 65 19 ../..........le.

For the TNH2SP3D board, the three digits after tnh2sp3d.hwx indicate the independent driver version.
:dm:0x53460090
53460090 33 44 53 64 35 39 37 2e 6f 20 2d 3e 20 74 6e 68 3DSd597.o.->.tnh
534600a0 32 73 70 33 64 2e 68 77 78 00 31 32 30 00 54 75 2sp3d.hwx.120.Tu
534600b0 53 50 33 44 31 31 30 54 30 32 5f 45 6e 74 72 79 SP3D110T02_Entry
534600c0 00 5d 00 00 10 00 18 0d 04 00 00 00 00 00 00 3f .].............?
534600d0 91 45 84 68 34 8a 09 0a 41 50 57 dc 0c b6 b3 c7 .E.h4...APW.....
534600e0 d5 86 19 0b ce 72 2c 71 ea cf af fb 52 aa d3 99 .....r,q....R...
534600f0 04 8c 14 4f 68 70 7b 2d 02 74 fc 7a bb 2f 8e 42 ...Ohp{-.t.z./.B
53460100 51 8e cc 90 d6 e8 9c 45 07 93 31 4c 20 36 66 20 Q......E..1L.6f.
53460110 09 fa 2a 2a a1 28 4c 4d 46 9d 11 5b fd 01 76 ac ..**.(LMF..[..v.
53460120 e2 0c 9e 6f 03 9e 33 1b a5 46 23 01 bd da e5 58 ...o..3..F#....X
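Reading the version out of the dump by eye is error-prone. The Python sketch below does it mechanically and builds the :dm command for a slot from the addresses listed above. It assumes the dump layout shown above: on each line, an address, 16 hex bytes, then an ASCII column. It also assumes the version is the three ASCII digits that follow the driver file name and its terminating NUL byte (00). `dm_command`, `dump_to_bytes`, and `driver_version` are hypothetical names.

```python
# Address of the driver file name, by slot (from the notice).
DRIVER_INFO_ADDRESS = {3: 0x53460090, 4: 0x53C60090}

# File name that precedes the three-digit driver version, per board type.
DRIVER_FILE = {"TNH2SL1D": b"tnh2slxd.hwx", "TNH2SP3D": b"tnh2sp3d.hwx"}


def dm_command(slot: int) -> str:
    """Build the memory-display command shown above for a given slot."""
    return f":dm:{DRIVER_INFO_ADDRESS[slot]:#x}"


def dump_to_bytes(dump: str) -> bytes:
    """Rebuild raw memory from dump lines: address, 16 hex bytes, ASCII."""
    data = bytearray()
    for line in dump.splitlines():
        fields = line.split()
        if len(fields) < 17:
            continue  # the command echo or a blank line
        data.extend(int(byte, 16) for byte in fields[1:17])
    return bytes(data)


def driver_version(board: str, dump: str) -> int:
    """Return the three digits that follow the driver file name and a NUL byte."""
    data = dump_to_bytes(dump)
    marker = DRIVER_FILE[board] + b"\x00"
    start = data.find(marker)
    if start < 0:
        raise ValueError(f"{board} driver name not found in dump")
    digits = data[start + len(marker):start + len(marker) + 3]
    return int(digits.decode("ascii"))


SAMPLE_DM = """:dm:0x53460090
53460090 33 44 53 64 35 39 37 2e 6f 20 2d 3e 20 74 6e 68 3DSd597.o.->.tnh
534600a0 32 73 70 33 64 2e 68 77 78 00 31 32 30 00 54 75 2sp3d.hwx.120.Tu
"""
print(dm_command(3))                          # :dm:0x53460090
print(driver_version("TNH2SP3D", SAMPLE_DM))  # 120
```

Only the first two dump lines are needed here, because the version digits appear within the first 32 bytes for both boards.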

[Root Cause]
If the NE version is earlier than V100R002C01SPC305 (5.62.02.16), the NE software does not contain the driver for the TNH2SL1D or TNH2SP3D board, so the NE must load the independent driver from the flash memory of the board. If the independent driver version of the TNH2SL1D or TNH2SP3D board is 120, the driver interface defined by the driver differs from the interface defined by the NE software. As a result, the board fails to go online, or NE tasks are suspended.
[Impact and Risk]
After a TNH2SL1D board is installed, the board fails to go online. After a TNH2SP3D board is installed, the NE becomes unreachable.

[Measures and Solutions]
Recovery measures:
Remove the TNH2SL1D or TNH2SP3D board.
Workarounds:
Before installing a TNH2SL1D or TNH2SP3D board, upgrade the OptiX OSN 500 to V100R002C01SPC305 (5.62.02.16) or later.

Preventive measures:
Upgrade the independent driver or the OptiX OSN 500 software.
1. Upgrade the independent driver:
Upgrade the independent driver of the TNH2SL1D board to version 131, and that of the TNH2SP3D board to version 130. For details, see the attachment "OptiX OSN 500 Independent Driver Upgrade&Downgrade Guide".
2. Upgrade the OptiX OSN 500 software:
Upgrade the NE software to V100R002C01SPC305 (5.62.02.16) or later.

[Rectification Scope and Time Requirements]
N/A
[Rectification Instructions]
N/A
[Attachment]
None

Notice on Bare Power Terminals Configured for OptiX OSN 3500

Abstract: OptiX OSN 3500 type-III cabinets (BOM: 02113256) have been manufactured and delivered by the India Supply Center since September 2011. A cabinet process inspection shows that the cabinet power OT terminals delivered by the India Supply Center are not fully wrapped in heat shrink tubing. A full-cabinet test shows that, in all power-on scenarios, bare power terminals in the cabinet may spark, burning themselves or the cabinet, or even causing personal injury. Idle power terminals must be insulated and must not be left bare.


[Problem Description]
Trigger conditions:
OptiX OSN 3500 type-III cabinets (BOM: 02113256) manufactured and delivered by the India Supply Center have idle, bare power terminals.
Fault symptoms:
In all power-on scenarios, bare power terminals may spark, burning themselves or the cabinet, or even causing personal injury.
Identification methods:
If bare power terminals spark, or the power terminals or the cabinet burn upon power-on, query the cabinet BOM code. The problem is identified if the cabinet BOM code is in the delivery list. (For the delivery list, refer to the following attachments.)

[Root Cause]
Cabinet power terminals must be wrapped in insulating bags. However, the cabinet power terminals provided by the India Supply Center are not wrapped in insulating bags, because the related regulations were not communicated to the India Supply Center.

[Impact and Risks]
In all power-on scenarios, bare power terminals may spark, burning themselves or the cabinet, or even causing personal injury. Therefore, all idle and bare power terminals must be rectified.
Mandatory rectification: On live networks, the detailed rectification requirements will be specified in the corresponding rectification notice to be released by the product line. In new network construction, the power terminals on all involved cabinets must be replaced before project delivery.

Material Type    Limitation
Cabinet          Bare power terminals may spark, burning themselves or the cabinet, or even causing personal injury.

[Measures and Solutions]
Recovery measures:
After identifying this problem based on the cabinet BOM code, go to the site and wrap the bare power terminals with adhesive PVC insulation tape as follows:
Step 1 Ensure that the circuit breaker for the power cable that is not connected to a subrack is switched off. If the circuit breaker is on, switch it off before performing the next step.
Step 2 Wear gloves and fully and securely wrap the bare power terminals with PVC insulation tape.

----End
Remarks:
Check the contract for the system control board BOM code (for the system control board delivery list, refer to the following attachment), locate the system control board based on the BOM code, and then determine the location of the cabinet from the location of the system control board.
Workarounds:
None
Solutions:
Frontline engineers must perform the rectification promptly according to the corresponding rectification notice. For new network construction projects, they must complete the rectification for all involved cabinets before project delivery.

Cautions for OTU2_LOF Alarms Reported on TN55NPO2(E) Boards in OptiX OSN 8800 Products

Summary:
The TN55NPO2(E) board reports an OTU2_LOF alarm on corresponding channels because of large leakage current on the X5R capacitor of the board. This problem has been reported several times on live networks.


[Problem Description]
Trigger condition:
None
Symptom:
When the problem occurs, the TN55NPO2(E) board reports an OTU2_LOF alarm on corresponding channels and services on the channels are interrupted.

Identification method:
The problem is caused by defective GN2405 components, which are used only on TN55NPO2(E) boards. If a TN55NPO2(E) board reports an OTU2_LOF alarm on the corresponding channels, use either of the following methods to determine whether the board is affected:

  • On the peer board, run the command for bypassing the PLL on each channel with the OTU2_LOF alarm. If the alarm is cleared and services are restored on the channels, the board is affected.
  • On the peer board, run the command for reading the LOL status register inside the GN2405 component about 200 times on each channel with the OTU2_LOF alarm, and send the collected data to Huawei R&D for analysis.

[Root Cause]
GN2405 components from one production lot are defective. Specifically, the X5R filter capacitors on the LF pins of the GN2405 components in this lot develop large leakage current (at the microampere level). The cause is either cracks or gaps resulting from unqualified materials, mechanical stress, or electrical stress, or environmental changes such as changes in temperature or humidity. When this occurs on a GN2405 component, the PLL inside the component works abnormally and locks onto a frequency other than the service frequency, causing the OTU2_LOF alarm.

[Impact and Risk]
The TN55NPO2(E) boards are faulty, affecting customer experience.
The OTU2_LOF alarm is reported and services on corresponding channels are interrupted.

[Measures and Solutions]
Recovery measures:

Perform either of the following recovery measures:

  • Replace the faulty boards.
  • Run the command for bypassing the PLL on each channel with the OTU2_LOF alarm. This clears the OTU2_LOF alarm and quickly restores services. Pay attention to the following points when running the command:
      • If you run the command on a channel without the OTU2_LOF alarm, services on that channel will be interrupted for approximately 1 s. Before running the command, either confirm which channels have the OTU2_LOF alarm, or perform SNCP switching to move the services on the faulty board to the related protection board.
      • The command applies only to V100R006C01SPC200 and later versions.
      • You can run the command on channels that do not carry services to prevent the problem.
      • Do not perform cold resets or power-off restarts on boards where services have been restored by running the command. If such a board undergoes a cold reset or power-off restart, run the command on it again.
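The precautions above reduce to a version check plus a filter over the channels of a board. The Python sketch below is hypothetical and makes these assumptions:

- The `Channel` fields and `channels_to_bypass` are illustrative names.
- The version parser assumes the plain VxxxRxxxCxx[SPCxxx] pattern, with no SPH or other suffix, and treats a release without an SPC number as SPC000.
- Issuing the actual PLL bypass command is outside its scope.

The function returns the channels on which, according to the precautions, the command can be run without the 1 s service hit.

```python
import re
from dataclasses import dataclass

# Earliest version that supports the PLL bypass command (from the notice).
MIN_VERSION = "V100R006C01SPC200"


def version_key(version: str) -> tuple[int, ...]:
    """Split a VxxxRxxxCxx[SPCxxx] string into comparable integers."""
    match = re.fullmatch(r"V(\d+)R(\d+)C(\d+)(?:SPC(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version}")
    # A release without an SPC number is treated as SPC000 (assumption).
    return tuple(int(group or 0) for group in match.groups())


@dataclass
class Channel:
    name: str
    otu2_lof: bool               # channel currently reports OTU2_LOF
    carries_service: bool        # channel carries live traffic
    moved_by_sncp: bool = False  # services already switched to protection


def channels_to_bypass(ne_version: str, channels: list[Channel]) -> list[str]:
    """List channels where bypassing the PLL should not cause a 1 s hit."""
    if version_key(ne_version) < version_key(MIN_VERSION):
        raise ValueError("PLL bypass needs V100R006C01SPC200 or later")
    return [
        ch.name for ch in channels
        if ch.otu2_lof or not ch.carries_service or ch.moved_by_sncp
    ]


chans = [Channel("1", True, True), Channel("2", False, True),
         Channel("3", False, False)]
print(channels_to_bypass("V100R006C01SPC200", chans))  # ['1', '3']
```

Per the last precaution, the bypass does not survive a cold reset or power-off restart. The list would need to be recomputed and the command rerun on that board afterwards.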

Workarounds:
None
Preventive measures:
Perform either of the following preventive measures to resolve the problem:

  • Software solution: On the peer board, run the command for bypassing the PLL on each channel with the OTU2_LOF alarm to quickly restore services. Do not run the command on a channel without the OTU2_LOF alarm; otherwise, services on that channel will be transiently interrupted for approximately 1 s.
  • Hardware solution: Replace the faulty board with a newly delivered board, on which the X5R filter capacitor on the LF pin of the GN2405 component has been replaced with an X7R filter capacitor. The BOM list was updated in June 2013.