Prewarning on Service Unavailability of H806GPBH & H806GPBD Boards of MA5600T

Keywords: H806GPBH, H806GPBD, DDR3 cache, slow Internet access, board reset

Abstract:

When a large number of users are connected to the H806GPBH/H806GPBD boards, the DDR3 read operation occasionally becomes abnormal and the external DDR3 cache returns incorrect data, causing wrong service packets. As a result, slow Internet access, dialup failure, and board reset may occur.

 

[Problem Description]

Trigger conditions

1. A large number of users are connected to the H806GPBH/H806GPBD board, traffic is heavy, or a traffic burst occurs. The problem becomes more likely once the number of users exceeds 300, and the probability rises with the user count.

2. The device runs a patch version earlier than V800R008SPC321, V800R010SPC111, V800R011SPC109, V800R012SPC106, or V800R013C00SPC205 (whichever applies to its release).

 

Both conditions are necessary for the problem to occur, but meeting them does not guarantee that it will (for details, see the root cause analysis).

Symptom:

Users connected to the H806GPBH/H806GPBD boards encounter slow Internet access or dialup failures. In some cases the board resets.

Location method 1 (manual):

When an H806GPBH or H806GPBD board encounters any fault mentioned above, check the DDR cache through

the transparent channel. Then, determine the problem based on the read/write result.

Step 1 Check whether the OLT and board versions are earlier than V800R008SPC321, V800R010SPC111,

V800R011SPC109, V800R012SPC106, or V800R013C00SPC205.

MA5600T(config)#display patch all
Software Version:MA5600V800R011C00
SPC100
SPH103
HP1102
-----------------------------------------------------------------------
Current Patch State:
-----------------------------------------------------------------------
Patch Name        Patch State     Delivery     Attribute     Dependency
-----------------------------------------------------------------------
SPC100            running         common       cold patch    NO
SPH103            running         common       hot patch     NO
HP1102            running         common       hot patch     NO
-----------------------------------------------------------------------
Total:3
Patches in the system cannot be rolled back
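
The version check in Step 1 can be sketched in Python as follows. This is a minimal illustration, not a Huawei tool: the function name `needs_patch` and the parsing rules are assumptions, the thresholds are the patch versions listed in the trigger conditions (SPC205 for R013), and it compares only the SPC (cold patch) number, so it does not account for hot patches.

```python
import re

# Fixed patch (SPC number) per release, as listed in the trigger conditions.
FIXED_SPC = {
    "R008": 321,
    "R010": 111,
    "R011": 109,
    "R012": 106,
    "R013": 205,
}


def needs_patch(display_patch_output: str) -> bool:
    """Return True if the running SPC patch is older than the fixed one.

    Hypothetical helper: expects the raw text of `display patch all`.
    """
    release = re.search(r"V800(R\d{3})", display_patch_output)
    if release is None or release.group(1) not in FIXED_SPC:
        raise ValueError("release not recognised or not covered by this notice")
    # Collect every SPC number reported anywhere in the output.
    spcs = [int(n) for n in re.findall(r"SPC(\d+)", display_patch_output)]
    current = max(spcs, default=0)  # no SPC listed: treat as unpatched
    return current < FIXED_SPC[release.group(1)]
```

For the sample output above (V800R011 with SPC100), the function returns True, because SPC100 is earlier than SPC109.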

Step 2 Enter the transparent channel of the board.

MA5680T(config)#diagnose

MA5680T(diagnose)%%su

Challenge:E8BUH36K

Please input password:                — password (can be obtained using a password generation tool)

MA5680T(su)%%transparent on 0/slotid                      —slotid indicates the slot ID of the board.

Serial redirect function is enabled now!

Step 3 Run the following three groups of commissioning commands in quick succession. Each group writes three test patterns to one register and reads each pattern back. If all three values read back in a group are incorrect, the DDR3 partition associated with that group is faulty. If the results of one or more groups indicate a fault, the problem may occur. To ensure accuracy, keep the interval between the three groups short.

MA5680T(su)%%tm set indirect-reg 0x48000040 0x55555555

Write register 0x48000040 0x55555555 successfully!

MA5680T(su)%%tm dis indirect-reg 0x48000040 0x1

0x40000040: 55555555

MA5680T(su)%%tm set indirect-reg 0x48000040 0xaaaaaaaa

Write register 0x48000040 0xaaaaaaaa successfully!

MA5680T(su)%%tm dis indirect-reg 0x48000040 0x1

0x40000040: aaaaaaaa

MA5680T(su)%%tm set indirect-reg 0x48000040 0xffffffff

Write register 0x48000040 0xffffffff successfully!

MA5680T(su)%%tm dis indirect-reg 0x48000040 0x1

0x40000040: ffffffff

MA5680T(su)%%tm set indirect-reg 0x58000040 0x55555555

Write register 0x58000040 0x55555555 successfully!

MA5680T(su)%%tm dis indirect-reg 0x58000040 0x1

0x50000040: 55555555

MA5680T(su)%%tm set indirect-reg 0x58000040 0xaaaaaaaa

Write register 0x58000040 0xaaaaaaaa successfully!

MA5680T(su)%%tm dis indirect-reg 0x58000040 0x1

0x50000040: aaaaaaaa

MA5680T(su)%%tm set indirect-reg 0x58000040 0xffffffff

Write register 0x58000040 0xffffffff successfully!

MA5680T(su)%%tm dis indirect-reg 0x58000040 0x1

0x50000040: ffffffff

MA5680T(su)%%tm set indirect-reg 0x68000040 0x55555555

Write register 0x68000040 0x55555555 successfully!

MA5680T(su)%%tm dis indirect-reg 0x68000040 0x1

0x60000040: 55555555

MA5680T(su)%%tm set indirect-reg 0x68000040 0xaaaaaaaa

Write register 0x68000040 0xaaaaaaaa successfully!

MA5680T(su)%%tm dis indirect-reg 0x68000040 0x1

0x60000040: aaaaaaaa

MA5680T(su)%%tm set indirect-reg 0x68000040 0xffffffff

Write register 0x68000040 0xffffffff successfully!

MA5680T(su)%%tm dis indirect-reg 0x68000040 0x1

0x60000040: ffffffff

 

If “Read register fail errorcode” is displayed during the test, the problem is confirmed.

MA5680T(su)%%tm display indirect-reg 0x68000040 1

0x68000040:

Read register fail errorcode=1082982587!   —This problem can be identified as long as this output is displayed.

----End
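
The judgment rule in Step 3 can be sketched in Python as below, assuming the session has been captured as text. The names `faulty_groups` and `has_read_failure` are hypothetical, and the sketch assumes that an "incorrect" return value means the value read back differs from the value just written. A read that fails outright is counted as incorrect.

```python
import re
from collections import defaultdict
from typing import List

SET_RE = re.compile(r"tm set indirect-reg (0x[0-9a-fA-F]+) (0x[0-9a-fA-F]+)")
READ_RE = re.compile(r"^\s*0x[0-9a-fA-F]+:\s*([0-9a-fA-F]{8})\s*$")


def has_read_failure(transcript: str) -> bool:
    """True if any read returned 'Read register fail errorcode'."""
    return "Read register fail" in transcript


def faulty_groups(transcript: str) -> List[str]:
    """Return the written register addresses whose read-backs were all wrong."""
    results = defaultdict(list)  # address -> one True/False per write/read pair
    pending = None               # (address, value written) awaiting its read-back
    for line in transcript.splitlines():
        written = SET_RE.search(line)
        if written:
            pending = (written.group(1).lower(), int(written.group(2), 16))
            continue
        if pending is None:
            continue
        if "Read register fail" in line:
            results[pending[0]].append(False)
            pending = None
            continue
        read = READ_RE.match(line)
        if read:
            results[pending[0]].append(int(read.group(1), 16) == pending[1])
            pending = None
    # A group is faulty only if every read-back in it was incorrect.
    return [addr for addr, oks in results.items() if oks and not any(oks)]
```

Applied to the healthy transcript shown in Step 3, `faulty_groups` returns an empty list and `has_read_failure` returns False.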

Location method 2 (PMI tool)

If a board shows the symptoms described above, upgrade the preventive maintenance inspection (PMI) tool using the upgrade package linked below, and then run PMI to identify the problem. Install the required patch if the following PMI result is displayed: “Detected DDR error. Solution: Update to SPC321 if is R8; update to R11SPC109 if is R11. Should you have any question, please contact R&D, Zhouhao 00140882.”

 

For the PMI tool upgrade package, see the link:

http://support.huawei.com/support/pages/editionctrl/catalog/ShowSoftDetail.do?actionFlag=displaySoftInfo&node_id=000001539334&web_doc_id=SW0000627564&colID=ROOTENWEB|CO0000000174

[Root Cause]

When packet traffic is heavy, the interval between the FPGA's read and write operations on the DDR3 is occasionally too short to meet the DDR3 requirement. As a result, the DDR3 becomes abnormal and packets are corrupted, causing slow Internet access, dialup failure, or even board reset.

 

[Impact and Risk]

This problem occurs with a low probability and may cause slow Internet access, dialup failure, and

occasional board reset, which will affect live-network services.

 

[Measures and Solutions]

Recovery measures:

Resetting the affected board clears the fault. However, because the problem is intermittent and another DDR3 exception, with the same symptoms, may occur later, install the required patch on the faulty board.

Preventive measures:

None

Solutions:

Install the patch that matches the device release to resolve the problem: V800R008SPC321, V800R010SPC111, V800R011SPC109, V800R012SPC106, or V800R013C00SPC105.

 

[Warning Expiration]

This warning expires automatically once the trigger conditions are no longer met.
