Monthly Archives: April 2016

Ring MSP Configuration Failures on SSN1SLQ16 Boards on OSN 3500 Products

Keywords:

SSN1SLQ16, MSTP, OptiX OSN 3500

[Problem Description]

Trigger condition:

1. The version of an OSN 3500 NE is one of the versions listed in Versions Involved

in the preceding table.

2. The logical board is SSN1SLQ16.

3. Configuration data is downloaded from the network management system (NMS), or

ports are deleted and then added.

Symptom:

• During the configuration of the multiplex section protection (MSP), the error message “Non-pair board that is checked. Error Code: 38742” is displayed.

• A board replacement (such as replacing SSN1SLQ16 with SSN2SLQ16) fails, and the error code is 41264.

• The downloading from the NMS to the NE is partially successful, and the error code is 38742.

 

If this issue is triggered on the OSN 3500 NE of a version listed in Versions Involved

in the preceding table and not addressed in a timely manner, it still exists after an upgrade.

Identification method:

This issue can be identified when both of the following conditions are met:

1. The logical board is SSN1SLQ16, and the ring MSP is configured on a port of the SSN1SLQ16 logical board. An error is reported, and the error code is 38742.

2. The OSN 3500 NE is upgraded from one of the versions listed in Versions Involved.

[Root Cause]

When the ring MSP is configured on optical ports (at the STM-16 rate or a higher rate) that

support the shared MSP, the ring MSP can be configured in non-paired slots. For an NE of

a version listed in Versions Involved, the configuration files of ports on the SSN1SLQ16

logical board are registered as not supporting the shared MSP. However, the configuration

file of the SSN1SLQ16 logical board is registered as supporting the shared MSP. As a result,

when the SSN1SLQ16 logical board is added, the automatically added ports support the

shared MSP. When the ports are deleted and then added again, port configuration files are

invoked. The newly added ports do not support the shared MSP. Therefore, when the ring

MSP is configured, the error message “Non-pair board that is checked. Error Code:

38742” is displayed.

The logical board is added first during the downloading from the NMS, and the optical ports

are automatically added. The automatically added optical ports support the shared MSP (the

MSP can be configured in non-paired slots). Then all the optical ports are deleted, and optical

ports to be used are added using the NMS. Therefore, this issue exists on all ports of the

SSN1SLQ16 logical board after configuration data is downloaded from the NMS. If the ring

MSP is configured on a port, an error message is displayed during the downloading from the

NMS.
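The mismatch between the board-level and port-level registrations can be pictured with a small model. This is only an illustrative sketch: the class, the two flags, and the port count of 4 are assumptions made for the example and do not correspond to real NE software structures. Only the error code 38742 and the delete-then-add sequence come from the description above.

```python
from dataclasses import dataclass, field

# Hypothetical flags mirroring the registration mismatch on affected versions.
BOARD_FILE_SHARED_MSP = True    # board configuration file: shared MSP supported
PORT_FILE_SHARED_MSP = False    # port configuration file: shared MSP not supported
ERR_NON_PAIR_BOARD = 38742      # "Non-pair board that is checked"


@dataclass
class LogicalBoard:
    name: str
    # Maps port number -> whether the port supports the shared MSP.
    ports: dict = field(default_factory=dict)

    def add_board(self, port_count: int) -> None:
        # Ports created automatically with the board follow the board file.
        self.ports = {p: BOARD_FILE_SHARED_MSP for p in range(1, port_count + 1)}

    def delete_port(self, port: int) -> None:
        del self.ports[port]

    def add_port(self, port: int) -> None:
        # A port added on its own is built from the port file.
        self.ports[port] = PORT_FILE_SHARED_MSP

    def configure_ring_msp(self, port: int, paired_slot: bool) -> int:
        """Return 0 on success, or the error code on failure."""
        if not paired_slot and not self.ports[port]:
            return ERR_NON_PAIR_BOARD
        return 0


board = LogicalBoard("SSN1SLQ16")
board.add_board(port_count=4)            # port count assumed for illustration
before = board.configure_ring_msp(1, paired_slot=False)

board.delete_port(1)                     # delete and then add the port again
board.add_port(1)
after = board.configure_ring_msp(1, paired_slot=False)

print(before, after)
```

In this model the first configuration attempt succeeds and the second one returns 38742, which matches the sequence in which the symptom appears after ports are deleted and added again.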

[Impact and Risk]

The ring MSP configuration on the SSN1SLQ16 board in a non-paired slot fails. The downloading

from the NMS to an NE configured with the ring MSP also fails, and the error code is 38742.

However, the delivery of other services on the NE is not affected.

[Measures and Solutions]

Recovery measures:

Delete and then add the logical board on an NE of a version listed in Versions Involved. The

automatically added logical ports do not have this issue. However, the preceding operations are

complex because services are carried on the live network. It is recommended to upgrade the NE

to a version in which this issue has been resolved, and then download configuration data from

the NMS.

Preventive measures:

Do not delete and then add ports for the SSN1SLQ16 logical board on an NE of a version listed

in Versions Involved. If a board provides the version replacement function, it is not

recommended to configure the SSN1SLQ16 board as the logical board.

Solution:

Upgrade NEs to V100R008C02SPC300 or later versions, or to V100R010C03SPC200 or later

versions.
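When many NEs have to be checked against these two versions, the comparison can be scripted. The sketch below is an assumption-laden helper, not a vendor tool: it assumes version strings of the form VxxxRxxxCxx with an optional SPCxxx suffix, and it interprets "or later" only within the same V/R/C branch. A result of False therefore means "not confirmed by this check", not "definitely affected".

```python
import re

# Assumed version string layout: V100R008C02SPC300 (the SPC part is optional).
VERSION_RE = re.compile(r"^V(\d+)R(\d+)C(\d+)(?:SPC(\d+))?$")

# Versions named in the Solution above.
FIXED_IN = ("V100R008C02SPC300", "V100R010C03SPC200")


def parse_version(text: str):
    """Split a version string into (V, R, C, SPC) integers; SPC defaults to 0."""
    match = VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    v, r, c, spc = match.groups()
    return int(v), int(r), int(c), int(spc or 0)


def fix_confirmed(ne_version: str) -> bool:
    """True if the version is on a listed branch at or above the fixed SPC."""
    v, r, c, spc = parse_version(ne_version)
    for fixed in FIXED_IN:
        fv, fr, fc, fspc = parse_version(fixed)
        if (v, r, c) == (fv, fr, fc) and spc >= fspc:
            return True
    return False


for version in ("V100R008C02SPC200", "V100R008C02SPC300", "V100R010C03SPC203"):
    print(version, fix_confirmed(version))
```

Versions on other branches should be checked manually against the Versions Involved list.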

 

If this issue is triggered before an upgrade, the issue still exists after the upgrade. In this case,

choose any of the following three methods to resolve the issue:

Delete and then add ports of the logical board again.

Delete and then add the logical board again.

Download configuration data from the NMS.

Material handling after replacement:

N/A

[Inspector Applicable or Not]

N/A

[Rectification Scope and Time Requirements]

N/A

[Rectification Instructions]

N/A

[Attachment]

N/A


What to Do About Occasional Hot Patch Installation Failures

Summary:

When the active system control board on an NG-SDH product of a version listed in

Versions Involved in the preceding table starts from a reset, there is a possibility

that the patch package module cannot obtain the software version of the active system

control board. As a result, software version verification fails when a patch is loaded,

and the patch cannot be installed on the active system control board.

[Problem Description]

Trigger condition:

1. A patch of a version listed in Versions involved in the preceding table is installed

on OSN 1500/OSN 2500/OSN 3500/OSN 3500 II/OSN 7500 equipment.

2. The system control board starts from a reset.

Symptom:

A hot patch cannot be installed on the active system control board.

Identification method:

1. The version of an OSN 1500/OSN 2500/OSN 3500/OSN 3500 II/OSN 7500 NE is

V100R010C03.

2. Query the NE version by running the following command or using the NMS.

 

[Figure: NE version query command and output]

3. Run the following command to query the version of the active system control board

recorded in the patch package:

:mon-get-dump:18,"PATCH.IPATCH.CPATCH","018"

The numbers in red represent the slot ID of the active system control board.

The command output indicates that the version of the active system control board is

empty (the ProgVer field behind the slot ID of the active system control board is empty,

as shown in the following figure).

 

[Figure: command output in which the ProgVer field of the active system control board is empty]

[Root Cause]

After the active system control board starts from a reset, the patch package module issues

a command to the software management module to query the software version. Because the

CPU is busy, the software management module does not send the software version to the

patch package module within the timeout period. As a result, the query for the software version

times out, and the software version of the active system control board recorded in the patch

package is empty. When the patch is installed, the NE software verifies the software version of

the active system control board and finds that it is not consistent. The verification fails, and the

installation of the hot patch for the active system control board is stopped.
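The failure chain described above (late reply, empty recorded version, failed verification) can be sketched as follows. The function names and the delay and timeout values are hypothetical and chosen only for illustration; the actual timeout used by the NE software is not stated in this notice.

```python
from typing import Optional


def query_software_version(actual_version: str,
                           response_delay_s: float,
                           timeout_s: float) -> Optional[str]:
    """Model the patch package module querying the software management module.

    Returns the version if the reply arrives within the timeout period;
    otherwise returns None, which stands for an empty recorded version.
    """
    if response_delay_s > timeout_s:
        return None
    return actual_version


def can_install_hot_patch(recorded_version: Optional[str],
                          actual_version: str) -> bool:
    """Verification passes only if the recorded version matches the real one."""
    return recorded_version is not None and recorded_version == actual_version


ACTUAL = "V100R010C03"

# Normal start: the reply arrives in time (values are illustrative).
normal = query_software_version(ACTUAL, response_delay_s=0.5, timeout_s=2.0)

# Start from a reset with a busy CPU: the reply arrives too late.
busy = query_software_version(ACTUAL, response_delay_s=5.0, timeout_s=2.0)

print(can_install_hot_patch(normal, ACTUAL))
print(can_install_hot_patch(busy, ACTUAL))
```

The second case corresponds to the empty ProgVer field: the version was never recorded, so verification fails even though the board is running the correct software.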

[Impact and Risks]

A patch cannot be installed for the active system control board. Issues which can be resolved by

installing a hot patch remain unresolved.

[Measures and Solutions]

Preventive measure:

Before installing a patch, run the following command to query whether the version of the active

system control board recorded in the patch package is empty:

:mon-get-dump:18,"PATCH.IPATCH.CPATCH","018"

The numbers in red represent the slot ID of the active system control board.

In normal cases, the ProgVer field behind the slot ID of the active system control board records

the detailed version number, as shown in the following figure.

 

[Figure: command output in which the ProgVer field records the detailed version number]

When exceptions occur, the ProgVer field behind the slot ID of the active system control board is

empty, as shown in the following figure.

 

[Figure: command output in which the ProgVer field is empty]

 

If the version is empty, warm reset the active system control board, or perform an active/standby

switchover between the system control boards (for details, see recovery measures). Then, query the

software version of the active system control board again to ensure that the software version is recovered.

Recovery measures:

1. When possible, warm reset the active system control board.

2. If an NE houses an active system control board and a standby system control board, a patch has been installed on the standby system control board, and batch backup between the active and standby system control boards is complete, manually trigger an active/standby switchover between the two boards to resolve the issue when possible.

Solution:

Upgrade the NE software to V100R010C03SPC220 (which will be released in the first quarter of 2015)

or a later version, in which the patch going-online mechanism of the active system control board is

optimized and the query for the software version of the active system control board will not time out.

[Rectification Scope and Time Requirements]

N/A

[Rectification Instructions]

N/A

[Appendix]

N/A

[Inspector Applicable or Not]

Use the inspector to check the entire network. Upon detection of an NE that is suspected to have this issue,

it is recommended to perform the recovery measures and then upgrade the NE.

Version of the inspector: SmartKit V200R009C00SPC201 or later

Inspector upgrade package:

Common_Inspector_V200R009_ON_20140909163631680.exe

Inspector_V200R009_ON_OptiX OSN 1500, OptiX OSN 1500+,+_20140909163631780.exe

Test case name (which will be released at the end of September 2014): Check whether the version number

of the active system control board cannot be obtained and the hot patch cannot take effect.

Why Broadcast Packets Cannot Be Transmitted Successfully on Several HMSTP Data Boards?

Keywords: OptiX OSN 3500/7500/7500 II, ELAN, LAG, PW, UNI, MSTP

Summary:

When a physical port on an SSN1PEG8, SSN1PEX2, SSN2PEX1, TNN1EG8, TNN1EG16, or TNN1EX2 board

of the OptiX OSN 3500/7500/7500 II in a link aggregation group (LAG) is configured in an ELAN, and the

board has only one UNI and one PW interface in the ELAN, some broadcast packets cannot be transmitted

successfully in the ELAN.

 

[Problem Description]

Trigger conditions:

This fault occurs if the following three conditions are all met:

• The SSN1PEG8, SSN1PEX2, SSN2PEX1, TNN1EG8, TNN1EG16, or TNN1EX2 boards of the OptiX OSN 3500/7500/7500 II run a version involved in this pre-warning.

• A port of these data boards in a LAG is configured in an ELAN, and each board has only one UNI and one PW interface in the ELAN.

• There are broadcast packets in the ELAN.

Fault symptoms:

Broadcast packets cannot be transmitted from the UNI to the PW interface, whereas unicast packets can be

transmitted from the UNI to the PW interface.

Identification method:

  • Check whether the versions of the equipment are involved in this pre-warning.

[Figure: equipment version query]

  • Check for LAG configurations.

[Figure: LAG configuration]

  • Check whether the board in a LAG has only one UNI and one PW interface in one ELAN.

Note:

Network management personnel can check configurations of UNI interfaces and PW interfaces as shown in the following figure.

[Figure: UNI and PW interface configuration]

  • Check whether broadcast packets cannot be transmitted from the UNI to the PW interface, whereas unicast packets can be transmitted successfully from the UNI to the PW interface.

If all four of the preceding checks are positive, the board has this fault.
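The four checks can be combined into a single predicate when audit data is already available in tabular form. The sketch below is a minimal illustration: the record type and its field names are hypothetical, and each field is assumed to have been filled in by hand or from NMS exports. Only the board types and the conditions themselves come from this notice.

```python
from dataclasses import dataclass

# Board types named in this pre-warning.
AFFECTED_BOARDS = {
    "SSN1PEG8", "SSN1PEX2", "SSN2PEX1", "TNN1EG8", "TNN1EG16", "TNN1EX2",
}


@dataclass
class BoardElanView:
    """Hypothetical per-board record for one ELAN."""
    board_type: str
    version_involved: bool        # check 1: version is in the pre-warning scope
    port_in_lag: bool             # check 2: a LAG member port is in the ELAN
    uni_count: int                # check 3: UNIs of this board in the ELAN
    pw_count: int                 # check 3: PW interfaces of this board in the ELAN
    broadcast_uni_to_pw_ok: bool  # check 4: observed broadcast forwarding
    unicast_uni_to_pw_ok: bool    # check 4: observed unicast forwarding


def matches_pre_warning(board: BoardElanView) -> bool:
    """Return True only if all four identification checks are positive."""
    return (
        board.board_type in AFFECTED_BOARDS
        and board.version_involved
        and board.port_in_lag
        and board.uni_count == 1
        and board.pw_count == 1
        and not board.broadcast_uni_to_pw_ok
        and board.unicast_uni_to_pw_ok
    )


suspect = BoardElanView("TNN1EG8", True, True, 1, 1, False, True)
healthy = BoardElanView("TNN1EG8", True, True, 2, 1, True, True)
print(matches_pre_warning(suspect), matches_pre_warning(healthy))
```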

 

[Root Cause]

Due to software bugs of the SD8850 chips used in the OptiX OSN 3500/7500/7500 II, when a port of these data boards in a LAG is configured in an ELAN, the upstream broadcast packets from a UNI cannot be copied to the board. Therefore, if a PW interface and a UNI are on the same board, broadcast packets cannot be transmitted from the UNI to the PW interface.

 

[Impact and Risk]

When the trigger conditions are met, broadcast packets cannot be transmitted successfully from the UNI to

the PW interface.

 

[Measure and Solutions]

Recovery measure:

Modify configurations so that the UNI and the PW interface are not on the same board.

Preventive measure:

Plan networking so that the UNI and the PW interface are not on the same board.

Solution:

Upgrade to V200R0013C00SPC100 or a later version.

 

[Inspector Applicable or Not]

Inspection case name:

Broadcast packets cannot be transmitted successfully on several HMSTP data boards

 

[Rectification Scope and Time Requirement]

N/A

 

[Rectification Instruction]

N/A

The SSE3LWF Board Reports the OTU_LOF Alarm When Interworking with the SSE1TMR

The SSE3LWF board reports the OTU_LOF alarm when interworking with the SSE1TMR.

Product

OptiX BWS 1600G

Fault Type

Equipment Interconnection

Optical Transponder Unit

OTU_LOF

Symptom


The SSE1TMR board at station B is forced to emit light during the commissioning. Then, the

SSE3LWF board at station C reports the OTU_LOF alarm. Perform an inloop on the WDM side

of the SSE3LWF at station C. The services are normal. The adjacent channels are also normal.

In addition, dispersion has been ruled out as the cause of the alarm. The OTU_LOF

alarm persists.

Cause Analysis

The SSE1TMR supports AFEC and FEC encoding/decoding modes and the FEC auto-adaptation

function. The board selects the FEC encoding/decoding mode automatically according to the FEC

mode of the input signals. No manual setting is required.

When the SSE1TMR reports R_LOS and is forced to emit light, the FEC mode it uses when inserting

ODU_AIS is the FEC mode that it most recently attempted to adapt to.

If the FEC mode of the TMR is not the same as the FEC mode of the downstream LWF, the LWF

reports the OTU_LOF alarm. If the FEC mode of the TMR is the same as the FEC mode of the

downstream LWF, the downstream LWF reports the ODU_AIS alarm.
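The rule above reduces to a single comparison. The following sketch assumes the situation described in this case (the TMR reports R_LOS and is forced to emit light); the enum and function names are hypothetical and exist only to make the rule explicit.

```python
from enum import Enum


class FecMode(Enum):
    FEC = "FEC"
    AFEC = "AFEC"


def downstream_alarm(tmr_last_adapted: FecMode, lwf_mode: FecMode) -> str:
    """Alarm expected on the downstream LWF while the TMR inserts ODU_AIS.

    tmr_last_adapted is the FEC mode the TMR most recently attempted to
    adapt to; lwf_mode is the FEC mode of the downstream LWF.
    """
    if tmr_last_adapted == lwf_mode:
        return "ODU_AIS"
    return "OTU_LOF"


print(downstream_alarm(FecMode.AFEC, FecMode.FEC))
print(downstream_alarm(FecMode.FEC, FecMode.FEC))
```

A mismatch yields OTU_LOF, which is the symptom in this case; once the TMR receives a valid input signal and adapts to the correct mode, the mismatch disappears.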

Procedure

Enable the WDM-side laser of the SSE3LWF at station A and make sure that the input optical power

of the SSE1TMR is normal. The SSE1TMR automatically selects the proper FEC mode. Then, the

OTU_LOF on the downstream board is cleared.

Reference Information

None.

Warning of Unexpected NE Resets Resulting from Resetting Boards Without a CPU on MSTP NEs

[Problem Description]

Triggering conditions

• An NE houses N4GSCC boards.

• The NE runs a version involved.

• A technician reseats a board without a CPU or its serving processing board on the NE.

The following lists boards without a CPU:

Interface boards: SSN1D75S, SSN1D12S, SSN1D34S, SSN1EU08, SSN1EU08A, SSN1OU08, SSN1TSB4,

SSN1TSB8, SSN1EU04, SSN1C34S, SSN1MU04, SSN2OU08, SSN1D12B, SSN1ETF8, SSN1EFF8, SSN1ETS8,

SSN1DM12, SSN1ETF8A, SSN1ETF8A, SSR1L75S, SSR1L12S, SSN1EOD41, SSN1PETF8, SSN1CQ1, SSN1MD75,

SSN1MD12, SSN1PEFF8, TNN1CO1, TNN1ETMC, TNN1AFO1, TNN1D75E, TNN1D12E, SSN1CMD2, and

TNN1EFF8.

Service boards: SSN1FIB, SSN1MR2A, SSR1MR2B, SSN1MR2C, TNN1XMD2, and DCU.

Auxiliary boards: SSQ2PIU, SSN1PIU, SST1PIU, SSR2PIU, SSN1FAN, SSN1FANB, SSN1FANA, SSN1FANC,

and SSN1RPWR.

Symptoms

The system control boards of a problematic NE reset unexpectedly, and the NE is unreachable to the NMS

for a short time.

Identification method

Check whether the unexpected resets cause the NE to become unreachable to the NMS.

Check whether the reset task is tVos1s.

Run the :errlog command, and check whether the following exception information is recorded for the period during which the NE is unreachable.

No.6: 2014-8-20 7:5:57

BOARD=24 TYPE=0xf0000010 SOFTTYPE=2

# 2014-8-20 7:5:13

ExcptErr: taskname=tVos1s, pc=235970, SR=1230, FaultAddr=300, DataBuff=2dcc26c

SymBol::ResetCommTaskFun__10CCommonCmdS EIPADDR=0x2399ac, Depth=0, 7:5

SymBol::Vos1sTask__Fv EIPADDR=0x1aa4664, Depth=1, 7:5

SymBol::tskAllTaskEntry EIPADDR=0x1a69510, Depth=2, 7:5

SymBol::vxTaskEntry EIPADDR=0x1b79070, Depth=3, 7:5

SymBol::0x0 EIPADDR=0000, Depth=4, 7:5

SymBol::0x0 EIPADDR=0000, Depth=5, 7:5
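When errlog output has been saved to a text file, the check for the tVos1s task can be automated. The sketch below is a minimal parser written against the excerpt shown above; the regular expressions assume that other records use the same layout, which this notice does not guarantee, and the function names are hypothetical.

```python
import re

# Patterns derived from the excerpt above; the layout is assumed to be stable.
EXCPT_RE = re.compile(r"ExcptErr:\s*taskname=(\w+),\s*pc=([0-9a-fA-F]+)")
SYMBOL_RE = re.compile(r"SymBol::(\S+)\s+EIPADDR=")

SAMPLE = """\
No.6: 2014-8-20 7:5:57
BOARD=24 TYPE=0xf0000010 SOFTTYPE=2
# 2014-8-20 7:5:13
ExcptErr: taskname=tVos1s, pc=235970, SR=1230, FaultAddr=300, DataBuff=2dcc26c
SymBol::ResetCommTaskFun__10CCommonCmdS EIPADDR=0x2399ac, Depth=0, 7:5
SymBol::Vos1sTask__Fv EIPADDR=0x1aa4664, Depth=1, 7:5
"""


def summarize_exception(errlog_text: str):
    """Return the task name, program counter, and call stack symbols, or None."""
    match = EXCPT_RE.search(errlog_text)
    if match is None:
        return None
    return {
        "task": match.group(1),
        "pc": match.group(2),
        "symbols": SYMBOL_RE.findall(errlog_text),
    }


def matches_known_reset(errlog_text: str) -> bool:
    """True if the record shows an exception in the tVos1s task."""
    info = summarize_exception(errlog_text)
    return info is not None and info["task"] == "tVos1s"


print(summarize_exception(SAMPLE))
print(matches_known_reset(SAMPLE))
```

The returned symbol list can be compared by eye with the call stack in the excerpt to confirm that the record matches this issue.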

 

[Root Cause]

The compiler used for system control boards of the N4 type is defective. As a result, when a board without a CPU is removed, deregistration of the callback function fails and a stale pointer remains. When the board is installed or removed again, the corresponding task occasionally fails, resulting in a reset.

 

[Impact and Risk]

NEs involved are unreachable to the NMS for a short time and cannot respond to operations in time.

 

[Measures and Solutions]

Recovery measures

NEs involved can automatically recover.

Workarounds

None

Preventive measures

Upgrade NEs involved to V100R008C02SPC500 or a later R008 version or V100R010C03SPC203 or a later version.

Material handling after replacement

None

 

[Inspector Applicable or Not]

None

 

[Rectification Scope and Time Requirements]

None

 

[Rectification Instructions]

None

 

[Attachment]

None

 

HARD_BAD and MOD_COM_FAIL Alarms Persist After the OptiX OSN 1800II Restarts After Power-Off

Summary: When the OptiX OSN 1800II device restarts after power-off, the SCC board reports the HARD_BAD and MOD_COM_FAIL alarms, and the alarms persist.

Product Family: WDM     Product Model: OptiX OSN 1800II

[Problem Description]

Trigger conditions:

This problem occasionally occurs (about a 1% probability) when OptiX OSN 1800II devices

using V100R005C10SPC100 or earlier versions restart after power-off.

Symptoms:

• The faulty SCC board reports the HARD_BAD and MOD_COM_FAIL alarms, and the alarms persist.

• Impact on services:

− For an NE with only one SCC board: If this problem occurs, all data services on the local NE are interrupted and cannot be automatically restored.

− For an NE with dual SCC boards: Data services on OTU boards can be automatically restored after a service interruption of 15 minutes. Data services connected to the faulty board are interrupted and cannot be automatically restored.

Identification method:

Step 1 Check the alarms reported on the live network. If the HARD_BAD and MOD_COM_FAIL alarms are displayed, go to Step 2.

Step 2 Perform a cold reset on the faulty board. If the fault is rectified after the cold reset, this problem is confirmed.

 

[Root Cause]

The chip reset process on boards using V100R005C10SPC100 and earlier versions is defective. As a result, there is a certain probability that the central switching board cannot go online when the board powers on.

 

[Impact and Risk]

• For an NE with only one SCC board: If this problem occurs, all data services on the local NE are interrupted and cannot be automatically restored. The alarms persist.

• For an NE with dual SCC boards: Data services on OTU boards can be automatically restored after a service interruption of 15 minutes. Data services connected to the faulty board are interrupted and cannot be automatically restored. The alarms persist.

 

[Measures and Solutions]

Recovery measures:

Cold reset the faulty SCC board.

Workarounds:

None

Preventive measures:

Upgrade the OptiX OSN 1800II to V100R005C10SPC200 or a later version (the NMS must be upgraded to V200R014C50SPC200 accordingly).