Monthly Archives: December 2014

Cautions for Unavailable Services After the N2EFT8/N2EFT8A Board Is Connected to Third-Party Equipment

Abstract:
The Generic Framing Procedure (GFP) implementation on the N2EFT8/N2EFT8A board strictly verifies the GFP frames received from third-party equipment. After the board is connected to third-party equipment, the frames are discarded and services are unavailable.

[Problem Description]
Trigger conditions:
1. The following three conditions are met during deployment:
(1) The NE software is a version earlier than V100R010C03SPC202.
(2) The board uses the GFP encapsulation mode.
(3) The NE is connected to third-party equipment.

2. The following three conditions are met during maintenance:
(1) The board uses the GFP encapsulation mode.
(2) The NE is connected to third-party equipment.
(3) The N1EFT8/N1EFT8A board is replaced with an N2EFT8/N2EFT8A board on an NE whose version is earlier than V100R010C03SPC202.

Fault symptom:
Services are unavailable after the N2EFT8/N2EFT8A board is connected to third-party equipment.
Identification method:
1. The NE software is a version earlier than V100R010C03SPC202.
2. The board uses the GFP encapsulation mode.
3. The NE is connected to third-party equipment.
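The three identification conditions can be checked mechanically. The following is a minimal sketch, not a Huawei tool: the function names and the version-ordering rule (compare the V, R, C, and SPC numbers in that order) are assumptions made for illustration.

```python
import re

# Assumed layout of an NE version string such as "V100R010C03SPC202".
_VERSION_RE = re.compile(r"^V(\d+)R(\d+)C(\d+)(?:SPC(\d+))?$")

FIXED_VERSION = "V100R010C03SPC202"  # first version without the problem, per this notice


def parse_version(text):
    """Return (V, R, C, SPC) as integers; a missing SPC field counts as 0."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) if part else 0 for part in match.groups())


def is_at_risk(ne_version, encapsulation, peer_is_third_party):
    """True when all three identification conditions hold."""
    return (
        parse_version(ne_version) < parse_version(FIXED_VERSION)
        and encapsulation.upper() == "GFP"
        and bool(peer_is_third_party)
    )


print(is_at_risk("V100R010C03SPC200", "GFP", True))   # True
print(is_at_risk("V100R010C03SPC202", "GFP", True))   # False
print(is_at_risk("V100R008C02SPC200", "LAPS", True))  # False
```

Version strings that do not follow the assumed layout raise an error instead of being silently treated as safe.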

[Root Cause]
The chip has a defect. The x4140 register value of the N2EFT8/N2EFT8A board is 0xacfb and the corresponding bit5/bit6 is 1. This triggers a consistency check for payload headers as shown in the following figure. If the GFP frames sent from the peer end fail to meet related requirements, they are discarded.

(Figure: root cause)
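The bit values quoted in the root cause can be verified from the register value itself. This small sketch assumes that bit 0 is the least significant bit, which the notice does not state explicitly.

```python
REGISTER_VALUE = 0xACFB  # value of the x4140 register quoted in this notice


def bit_is_set(value, bit):
    """Return True if the given bit (bit 0 = least significant) is 1."""
    return (value >> bit) & 1 == 1


print(format(REGISTER_VALUE, "016b"))  # 1010110011111011
print(bit_is_set(REGISTER_VALUE, 5))   # True
print(bit_is_set(REGISTER_VALUE, 6))   # True
```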

 

[Impact and Risks]
Services are interrupted when related requirements are met.

[Measures and Solutions]
Recovery measures:
None
Workarounds:
1. Change the encapsulation mode to LAPS.
2. Change the GFP type of the third-party equipment to 0x1001 and the HEC check value of the type field to 0x1352.
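The HEC value in workaround 2 can be cross-checked. In ITU-T G.7041, the type HEC (tHEC) is a CRC-16 over the two-byte type field with generator polynomial x^16 + x^12 + x^5 + 1 and an all-zero initial value. The sketch below (the function name is illustrative) computes it bit by bit for type 0x1001.

```python
def gfp_thec(type_field):
    """CRC-16 (poly 0x1021, init 0, MSB first) over the 16-bit GFP type field."""
    crc = 0
    for byte in type_field.to_bytes(2, "big"):
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


print(hex(gfp_thec(0x1001)))  # 0x1352
```

The result matches the value given in the workaround, so the two numbers are a consistent type field and tHEC pair.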

Solutions:
The problem is triggered by hardware but can be prevented by software. Upgrade the NE version to V100R010C03SPC202 or a later version.
If the current NE version is V100R008C02SP200, a weak mapping mode can be used. Upgrade the software of the N2EFT8/N2EFT8A board only.
Solution of material handling after replacement:
N/A
[Rectification Scope and Time Requirements]
N/A

 

 

 


Cautions for Recurring K Byte Caused by Optical Path Jitter of the RMS of the NG-SDH Product

[Problem Description]
Trigger conditions:
The RMS is in the switchover state, and the status of the optical fiber in the switchover section jitters. The MS protocol repeatedly switches between the state of waiting for recovery and the switchover state. As a result, the K byte in the loop recurs.

Fault symptom:
Scenario 1:
The services that pass through the switchover section remain interrupted. The MS protocol at the switchover site stops and starts once every second. The APS_INDI alarm remains active and does not change with the protocol state. The fault persists even after the optical path stops jittering.
Scenario 2:
The services that pass through the switchover section are interrupted and then recover automatically.

Identification method:
1. Check whether the equipment is of versions listed in the “Related Version” row.
2. If the services are interrupted, query the bb9.log file of each cross-connect board in the MS ring. If a cross-connect board has records like the following, a fault exists.
:log-query:10,"bb9.log"
bb9.log 2009-12-16 11:49:10 Level:2, RpsAdpt.cpp, Line:7404, PgId[1] RpsRestartProtocol Success!
bb9.log 2009-12-16 11:49:15 Level:2, RpsAdpt.cpp, Line:7404, PgId[1] RpsRestartProtocol Success!
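When many cross-connect boards have to be checked, the log records can be scanned with a script. The following is a rough sketch that assumes the record layout shown in the two sample lines; the function name is hypothetical, and reading PgId as a protection group ID is an assumption.

```python
import re
from datetime import datetime

# Two records copied from the notice; a real log would be read from a file.
SAMPLE_LOG = """\
bb9.log 2009-12-16 11:49:10 Level:2, RpsAdpt.cpp, Line:7404, PgId[1] RpsRestartProtocol Success!
bb9.log 2009-12-16 11:49:15 Level:2, RpsAdpt.cpp, Line:7404, PgId[1] RpsRestartProtocol Success!
"""

_RESTART_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}).*PgId\[(\d+)\]\s+RpsRestartProtocol"
)


def find_protocol_restarts(log_text):
    """Return a list of (timestamp, PgId) for each RpsRestartProtocol record."""
    restarts = []
    for line in log_text.splitlines():
        match = _RESTART_RE.search(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            restarts.append((stamp, int(match.group(2))))
    return restarts


restarts = find_protocol_restarts(SAMPLE_LOG)
print(len(restarts))                                      # 2
print((restarts[1][0] - restarts[0][0]).total_seconds())  # 5.0
```

Any board whose log yields a non-empty list matches the identification rule in step 2.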

[Root Cause]
To solve the problem of a possible cross-connect board reset when the entire ring is in the pass-through state and the K byte recurs frequently, the line board judges whether a frequently recurring K byte is received.
If a frequently recurring K byte is received, the line board requires the cross-connect board to restart the protocol for self-healing.
If, during handling, the line board does not judge whether the recurring K byte is received multiple times within one second, it requires the cross-connect board to restart the protocol whenever a recurring K byte persists for a long time.

[Impact and Risks]
When the optical path status changes frequently, the K byte recurs, leading to protocol restart. If the MS is in the switchover state, the services are interrupted transiently. If the K byte cycle is broken after protocol restart, the services are automatically recovered. Otherwise, the services remain interrupted.

[Measures and Solutions]
Recovery measures:
No intervention is required if the services are automatically recovered.
Recover the services as follows if the services remain interrupted:
1. Shut down the lasers at both ends of the switchover section.
2. Clear the historical K bytes on all the line boards that belong to the MSP ring, using either of the following methods:
− Run the following command:
:optp:bid (hex),0,9b,1,08,bc,portid (hex),mspportid (hex, usually 0)
− Perform a soft reset on all the line boards that belong to the MSP ring.
3. Perform forcible switchover after the protocol restarts successfully.
4. Switch on the lasers. Cancel the forcible switchover after the status of the optical path in the switchover section becomes steady.

Workarounds:
None
Solutions:
Upgrade the NE software to the following versions:
V1R7C02SPC011 (5.xx.17.31p03) and later V1R7 versions
V1R8C02B01L (5.xx.18.50) + V1R8C02SPH011 and later V1R8 versions
V1R9C04SPC100 (5.21.19.22P01) and later versions, for the MSTP TDM version
V2R11C00 and later versions, for the MSTP+ version
Upgrading to the latest recommended maintenance version is suggested.

[Rectification Scope and Time Requirements]
If the V100R008C02B01L (5.**.18.50) version is adopted, the hot patch must be installed for rectification.

[Rectification Guide]
Install the hot patch of the V100R008C02SPH011 version on the cross-connect board to rectify the fault.
See 《OptiX NG-SDH Product Hot Patch Version Upgrade Guide (Toolkit)-20091202-B》 or 《OptiX NG-SDH Product Hot Patch Version Upgrade Guide (Navigator)-20091202-B》.

Beware of Software Watchdog Reset on the System Control Board of OptiX OSN Equipment

Summary:
A software watchdog reset occasionally occurs on the system control board of an NE on OSN products when ASON gold services are created, rerouted, optimized, or changed among protection levels.

[Problem Description]
Trigger conditions:
New gold services are created or the existing gold services are rerouted, optimized, or changed among protection levels.

Symptom:
NEs are unreachable and abnormally reset.
Identification method:
1. A version that has the problem is used at the site.
2. The preceding trigger conditions are met.
3. The preceding symptom occurs.
4. Run the :mon-get-errlog:bid; command (bid indicates the slot ID of the active system control board). The "in Soft Dog Reset" information is found in the reset log, as follows:
fatal task errorcode=110009, Line 0 in Soft Dog Reset # 2012-7-16 8:3:33
5. Query debugbuf.log. Records similar to the following are found, and the tIonNbsSch task uses more than 90% of CPU resources. In the following records, the tIonNbsSch task uses 98.80% of CPU resources, more than any other task.

(Figure: debugbuf.log records)

Note:
To query debugbuf.log, run the :log-query:bid,"debugbuf.log"; command (bid indicates the slot ID of the active system control board).
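Step 4 of the identification method can be automated once the reset log has been exported as text. This is a minimal sketch that assumes the record format of the single sample line above; the function name is hypothetical.

```python
import re
from datetime import datetime

# Reset-log line copied from the notice.
SAMPLE_LINE = "fatal task errorcode=110009, Line 0 in Soft Dog Reset # 2012-7-16 8:3:33"

_RESET_RE = re.compile(
    r"errorcode=(\d+).*Soft Dog Reset\s*#\s*"
    r"(\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2})"
)


def parse_soft_dog_reset(line):
    """Return (error code, reset time) for a Soft Dog Reset record, else None."""
    match = _RESET_RE.search(line)
    if match is None:
        return None
    # strptime accepts the unpadded fields used in this log ("8:3:33").
    when = datetime.strptime(match.group(2), "%Y-%m-%d %H:%M:%S")
    return int(match.group(1)), when


print(parse_soft_dog_reset(SAMPLE_LINE))
# (110009, datetime.datetime(2012, 7, 16, 8, 3, 33))
```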

[Root Cause]
During route computation for gold services, if two trails with the same cost but different numbers of hops exist, a code processing error occurs, resulting in a software watchdog reset on the NE.
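The notice does not publish the faulty code, so the following is only an illustration of the triggering condition, not the vendor's algorithm. It shows one way a route selection can handle two candidate trails with equal cost but different hop counts deterministically. All names and numbers are hypothetical.

```python
from typing import NamedTuple


class Trail(NamedTuple):
    name: str
    cost: int
    hops: int


def pick_trail(candidates):
    """Choose the lowest-cost trail; break cost ties by hop count, then by name."""
    if not candidates:
        raise ValueError("no candidate trails")
    return min(candidates, key=lambda t: (t.cost, t.hops, t.name))


trails = [
    Trail("A-B-C-D", cost=30, hops=3),
    Trail("A-E-D", cost=30, hops=2),   # same cost, fewer hops
    Trail("A-F-G-D", cost=45, hops=3),
]
print(pick_trail(trails).name)  # A-E-D
```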
[Impact and Risk]
If a fiber cut occurs while the ASON NE is being reset, services are interrupted.

[Measures and Solutions]
Recovery measures:
None
Preventive measures:
Change ASON gold services to silver or diamond services.
Solution:
Upgrade the NE to the following versions or install corresponding hot patches.
NG SDH: V100R008C02SPC200+SPH203
NG SDH: V100R008C02SPC500+SPH505
NG SDH: V100R010C03SPC203 and later versions
OCS 9500: V100R006C03SPC200+SPH206 (If the JSCC board is used on the live network, replace it with the ESCC board.)
OCS 9500: V100R006C05SPC203 and later versions

[Rectification Scope and Time Requirements]
N/A
[Rectification Guide]
N/A
[Attachment]
N/A
[Inspector Applicable or Not]
N/A

Beware of Warm Reset of the System Control Board on OptiX OSN Equipment

Summary:
When the ASON network is unstable, fiber cuts occur frequently, and ASON services are rerouted constantly, the system control board of an NE is prone to reset if network resources are insufficient or residual cross-connections exist.

[Problem Description]
Trigger conditions:
The problem is triggered if the following conditions are met:

  • The network is an ASON network and fiber cuts occur frequently.
  • There are tunnel services and reroute failure events in the ASON domain.
  • There are CP_TEL_PATH_MIS and CP_TEL_MSP_MIS alarms in the ASON domain.

Symptom:
The system control board is abnormally reset. Depending on the network scale, this problem occasionally occurs on the SSN1GSCC board and rarely occurs on the SSN4GSCC board. The NE is transiently unreachable from the NMS while the system control board is being reset.

(Figure: Huawei SSN1GSCC board)

Identification method:
1. A version that has the problem is used at the site.
2. The preceding trigger conditions are met.
3. The preceding symptom occurs.
4. There are a large number of rerouting failure events in the abnormal events on the NMS. The error code is 40510 (indicating the label allocation failure) and the faulty NE is the NE where the system control board is abnormally reset.
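For step 4, exported abnormal events can be tallied per NE to see which one accumulates label allocation failures. The record layout, the NE names, and the second error code below are hypothetical; only error code 40510 comes from this notice.

```python
from collections import Counter

LABEL_ALLOCATION_FAILURE = 40510  # error code named in this notice

# Hypothetical abnormal-event records: (NE name, error code).
events = [
    ("NE-101", 40510),
    ("NE-101", 40510),
    ("NE-205", 12345),  # made-up code standing in for an unrelated event
    ("NE-101", 40510),
    ("NE-317", 40510),
]


def label_failures_per_ne(records):
    """Count label allocation failures (error code 40510) for each NE."""
    return Counter(ne for ne, code in records if code == LABEL_ALLOCATION_FAILURE)


counts = label_failures_per_ne(events)
print(counts.most_common(1))  # [('NE-101', 3)]
```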

[Root Cause]
When label allocation fails at the last node of a tunnel service, an error in internal code processing leaves the requested memory unreleased, resulting in a memory leak. If rerouting label allocation keeps failing for a long time, the memory of the system control board is used up. The system control board is then reset because it fails to obtain memory.
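The leak pattern described here, a resource that is requested and then not released on a failure path, can be modeled in a few lines. This is a toy model for illustration only, not the board software: the class and function names are hypothetical, and a counter stands in for memory.

```python
class RequestPool:
    """Toy stand-in for the memory requested per rerouting attempt."""

    def __init__(self):
        self.outstanding = 0

    def acquire(self):
        self.outstanding += 1

    def release(self):
        self.outstanding -= 1


def allocate_label(node_index, last_index):
    """Hypothetical allocator that fails at the last node of the tunnel."""
    if node_index == last_index:
        raise RuntimeError("label allocation failed")


def reroute_leaky(pool, nodes):
    pool.acquire()
    for i in range(nodes):
        allocate_label(i, nodes - 1)  # raises before release() is reached
    pool.release()


def reroute_fixed(pool, nodes):
    pool.acquire()
    try:
        for i in range(nodes):
            allocate_label(i, nodes - 1)
    finally:
        pool.release()  # runs on success and on failure


def run(attempts, reroute):
    """Repeat a failing reroute and return how many requests were never released."""
    pool = RequestPool()
    for _ in range(attempts):
        try:
            reroute(pool, nodes=4)
        except RuntimeError:
            pass  # the failure is reported and rerouting is retried
    return pool.outstanding


print(run(1000, reroute_leaky))  # 1000
print(run(1000, reroute_fixed))  # 0
```

In the leaky variant, every failed attempt leaves one request outstanding, which is why a long period of rerouting failures eventually exhausts a fixed amount of memory.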

[Impact and Risk]
If a fiber cut occurs while the ASON NE is being reset, services are interrupted.

[Measures and Solutions]
Recovery measures:
None
Preventive measures:

  • Handle the CP_TEL_PATH_MIS and CP_TEL_MSP_MIS alarms promptly during daily maintenance.
  • If services are frequently rerouted, clear the corresponding port alarms and channel alarms as soon as possible to prevent rerouting failures.

Solution:
Upgrade the NE to the following versions or install corresponding hot patches.
NG SDH: V100R008C02SPC200+SPH203
NG SDH: V100R008C02SPC500+SPH505
NG SDH: V100R010C03SPC203+SPH209
NG SDH: V100R010C03SPC208 and later versions
OCS 9500: V100R006C03SPC200+SPH206 (If the JSCC board is used on the live network, replace it with the ESCC board.)
OCS 9500: V100R006C05SPC203 and later versions

[Rectification Scope and Time Requirements]

N/A

[Rectification Guide]
N/A
[Attachment]
N/A
[Inspector Applicable or Not]
N/A

Pay Attention to Occasional Packet Loss or Ethernet Service Interruption on the SSN1EAS2 Board of NG SDH Products

[Problem Description]
Trigger conditions:
1. SSN1EAS2 boards use FPGA 110.
2. Cross-connect boards in slot 10 are active cross-connect boards.
3. The PCB of a cross-connect board is SSN1SXCS1, SST1PSXCSA1, SST1PSXCSA, SST2PSXCS, or SSN2SXCS.
4. When the preceding three conditions are all met, the problem occurs occasionally (about 10% probability), depending on the specific SSN1EAS2 board and cross-connect board.

Note:

During an upgrade on the live network, the problem is easily triggered when the cross-connect board in slot 10 replaces that in slot 9 as an active one.
When an SSN1EAS2 board works with a cross-connect board whose PCB is SSN1SXCS1, SST1PSXCSA1, SST1PSXCSA, SST2PSXCS, or SSN2SXCS, there is a high probability that the problem occurs. When an SSN1EAS2 board works with other types of cross-connect boards, the problem has never occurred on live networks or in test environments. For details about the risky types of cross-connect boards, see Risky Types of Cross-Connect Boards That May Encounter Header Jitters on FPGA 110 of SSN1EAS2 Boards When They Work Together.

Symptoms:
1. Some packets of Ethernet services on an SSN1EAS2 board are lost, or Ethernet services are interrupted.
2. The board may repeatedly or occasionally report service alarms related to SDH or GFP services, such as B3_SD, HP_UNEQ, HP_RDI, T_LOSEX, ALM_GFP_DLFD, and FCS_ERR.
3. The active and standby cross-connect boards may report BUS_ERR alarms simultaneously, and alarm parameters indicate that the SSN1EAS2 board caused the alarm.
A cold reset neither resolves nor triggers the problem.
Identification methods:
1. If all of the following conditions are met, these fault symptoms are most probably caused by the SSN1EAS2 board:
− An SSN1EAS2 board uses FPGA 110.
− The cross-connect board in slot 10 is the active one.
− The cross-connect board is the type listed in the attachment Risky Types of Cross-Connect Boards That May Encounter Header Jitters on FPGA 110 of SSN1EAS2 Boards When They Work Together.

2. If the fault symptoms on the SSN1EAS2 board are cleared after the cross-connect board in slot 9 replaces that in slot 10 as the active one, the problem is caused by the SSN1EAS2 board.
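The first identification method is a conjunction of three checks, which can be expressed as a small predicate. This sketch uses the PCB types listed in the trigger conditions; the function and parameter names are hypothetical, and how the inputs are obtained from the NE is out of scope.

```python
# PCB types listed in the trigger conditions of this notice.
RISKY_XC_PCBS = {"SSN1SXCS1", "SST1PSXCSA1", "SST1PSXCSA", "SST2PSXCS", "SSN2SXCS"}


def eas2_at_risk(fpga_version, active_xc_slot, xc_pcb):
    """True when all three trigger conditions for the SSN1EAS2 problem hold."""
    return (
        fpga_version == 110
        and active_xc_slot == 10
        and xc_pcb.upper() in RISKY_XC_PCBS
    )


print(eas2_at_risk(110, 10, "SSN1SXCS1"))  # True
print(eas2_at_risk(110, 9, "SSN1SXCS1"))   # False
print(eas2_at_risk(120, 10, "SSN1SXCS1"))  # False
```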

[Root Cause]
The cross-clock-domain design of the FPGA on the SSN1EAS2 board has a bug. When the cross-connect board in slot 10 functions as the active one, the headers output to the MAPPER and VSC9128 chips jitter at the period of 77 Mbit/s. As a result, the VSC9128 chips fail to correctly receive service signals from the cross-connect boards. In addition, the service signals transmitted to the cross-connect boards may jitter, resulting in bidirectional packet loss or service interruption.

[Impact and Risk]
Some packets of Ethernet services are lost, or Ethernet services are interrupted in the upstream and downstream directions.
[Measures and Solutions]
Recovery measures:
Switch the active cross-connect board from slot 10 to slot 9.
Workarounds:
When an SSN1EAS2 board uses FPGA 110 or earlier, avoid using the cross-connect board in slot 10 as the active cross-connect.

Preventive Solutions:
1. Upgrade the FPGA (BOM: 05020AAE) used on the SSN1EAS2 board to version 120 or later, which fixes the cross-clock-domain design bug.
− For V100R008 and V100R009 versions, upgrade the device to V100R010C03SPC203 or later.
− For V100R010 versions, upgrade the device to V100R010C03SPC203 or later.
− For V200R011 versions, upgrade the device to V200R011C02SPC106 or later.
− For V200R012 versions, upgrade the device to V200R012C00SPC101/V200R012C01 or later.
During an upgrade, the board version needs to match the device version specified in the version mapping.
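The per-family upgrade targets listed above can be kept in a lookup table. The sketch below assumes that the version family is the leading VxxxRxxx part of the device version string; the function name is hypothetical, and the sketch does not check whether the current version already includes the fix.

```python
import re

# Minimum fixed device version per version family, copied from the list above.
MIN_FIXED_VERSION = {
    "V100R008": "V100R010C03SPC203",
    "V100R009": "V100R010C03SPC203",
    "V100R010": "V100R010C03SPC203",
    "V200R011": "V200R011C02SPC106",
    "V200R012": "V200R012C00SPC101/V200R012C01",
}


def upgrade_target(current_version):
    """Return the minimum fixed version for the family of current_version."""
    match = re.match(r"^(V\d+R\d+)", current_version.strip())
    if match is None or match.group(1) not in MIN_FIXED_VERSION:
        raise KeyError(f"no upgrade target listed for {current_version!r}")
    return MIN_FIXED_VERSION[match.group(1)]


print(upgrade_target("V100R008C02SPC200"))  # V100R010C03SPC203
print(upgrade_target("V200R011C01"))        # V200R011C02SPC106
```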

On the NE whose version is V100R010C03SPC202, the SSN1EAS2 board can use the software of V100R010C03SPC203 in a weak mapping mode. The software includes BIOS, board software, FPGA, and EPLD.

2. SSN3EAS2 boards can be used. Both SSN1EAS2 and SSN3EAS2 boards are 10GE Ethernet service processing boards. When using an SSN3EAS2 board, ensure that the device version supports the SSN3EAS2 board.

 

Notice of iManager U2000/N2000 BMS/T2000 Version Incorporation

Summary:
The delivery and maintenance of NMS products face many challenges, and NMSs run in a complex environment. NMS versions need to be incorporated to eliminate NMS running risks, prevent major accidents, improve customer experience, and enhance customer satisfaction.

[Problem Description]
The U2000 V100R002C01, U2000 V100R003C00, U2000 V100R005C00, N2000 BMS V200R012C03, N2000 BMS V200R012C05, T2000 V200R006C03, T2000 V200R007C03 have been in service for a long time. The latest patches provided by Huawei resolve many major issues.
1. For the U2000 V100R002C01, the following issues occur on the live network:
− The U2000 client fails to be upgraded using the client automatic upgrade (CAU) tool.
− The number of consumed licenses doubles, which is abnormal.
− Massive data is stored in the FTP directory, resulting in insufficient disk space.
− Certain NE functions are unavailable if the NE licenses are enabled.
− Starting routers (including switches) during the polling period sometimes causes a core dump on the NE management process.
− Taking NEs offline sometimes causes a core dump on the NE management process.
− Discrete cross-connections are generated after ASON rerouting.
− The database cannot be restored.
2. For the U2000 V100R003C00, the following issues occur on the live network:
− Users cannot log in to the U2000 when electronic label export is in progress.
− The U2000 shuts down unexpectedly.
− A large number of NEs are disconnected intermittently.
− Automatically discovered ONTs cannot be confirmed.
− Incorrect optical module information is displayed for more than 5000 NEs.
− Services are interrupted after a user deletes tunnels that carry MS-PWs.
− Services are interrupted after a user deletes server trails.
− The database encounters exceptions because of overlong SQL statements.
3. For the U2000 V100R005C00, the following issues occur on the live network:
− A core dump occurs on the performance management system (PMS) process.
− Ethernet port aliases disappear after synchronization.
− When the U2000 manages a datacom NE whose name contains some special characters, the datacom NE panel becomes blank.
− Services are interrupted after a user deletes the tunnels that carry MS-PWs.
− E2E WDM trails fail to be created.
− Nodes fail to be added using the trail cutover automated tool (TCAT).
4. For the N2000 BMS V200R012C03, the following issues occur on the live network:
− The client exits unexpectedly.
− NEs are unreachable.
− The value-added service (VAS) profile cannot be modified using TL1 commands.
5. For the N2000 BMS V200R012C05, the following issues occur on the live network:
− The VAS profile sometimes fails to be applied to ONTs.
− MDUs cannot go online.
− The configuration script management function cannot be used.
6. For the T2000 V200R006C03, the following issues occur on the live network:
− The element management system (EMS) process restarts due to daylight saving time (DST) switching.
− Deleting a service interrupts other services.
− Cross-connections cannot be created due to cross-connection ID conflicts.
7. For the T2000 V200R007C03, the following issues occur on the live network:
− A core dump occurs on the NE management process due to subnetwork connection protection (SNCP) switching.
− More ASON-SDH trail licenses are consumed than expected.
[Workarounds]
No workarounds can be provided for the above-mentioned issues. Version incorporation must be carried out.
[Preventive Measures]
Related representative offices submit rectification applications to incorporate the involved versions.
The following table lists the versions before and after incorporation.

(Table: versions before and after incorporation)

[Rectification Scope and Time Requirements]
1. This rectification notice applies only to global security-insensitive countries and regions. Do not implement this notice in security-sensitive countries and regions. Security-sensitive countries and regions include countries and regions where Vodafone networks are deployed, USA, Canada, Australia, New Zealand, Switzerland, Japan, EEA, and their non-native territories and dependencies. The EEA covers 30 European countries, including France, Italy, Holland, Belgium, Luxembourg, Germany, Ireland, Denmark, the UK, Greece, Portugal, Spain, Austria, Sweden, Finland, Poland, Latvia, Lithuania, Estonia, Hungary, Czech, Slovakia, Slovenia, Malta, Cyprus, Bulgaria, Romania, Norway, Iceland, and Liechtenstein.
2. The U2000, N2000 BMS, and T2000 versions should be incorporated worldwide once this rectification notice is issued. The work must be done before Jun 30, 2014.
3. The version incorporation is not required for the sites that will upgrade NMSs to other versions (not included in the above-mentioned versions). The technical support contacts must record such sites.

[Rectification Instructions]
1. Related representative offices submit applications at Batch Replacement Workflow. For details, see the attachment.

2. During the NMS upgrade or patch installation, the NMS is unavailable but services keep running on NEs. Therefore, in a single-server system, a temporary NMS must be deployed for network monitoring.
3. Engineers must read through the upgrade guide or patch installation guide before performing the NMS upgrade or patch installation. In addition, engineers must strictly follow the guides.
4. For NMSs in the EOS state, engineers in representative offices and customers must upgrade the software or hardware according to Overview of Huawei Network OSS Service Product and Version Lifecycle. These NMSs are not included in this version incorporation.