Monthly Archives: November 2015

Precaution About PTN series Service Interruption

Abstract: When VPLS services are deployed, an E-LAN loop causes a broadcast storm and MAC address flapping, which eventually interrupts services.

Product Line: Carrier IP                       Product Family: PTN
Product Model: PTN3900, PTN3900-8, PTN1900, PTN960, PTN950, PTN910

Trigger conditions:

When deploying VPLS services, avoid loops formed through PWs, UNI interfaces, or external devices. Typical loop scenarios are as follows:

PW logical loop within a single VPLS: if two parallel PWs with the same source and sink are added to a VPLS without a split horizon group configured, a loop is certain to form.

[Figure: PTN_1]

[Figure: PTN_2]

Loop formed between a VPLS and another service or an external device.

[Figure: PTN_3]

VPLS UNI loop: a loop forms when, for example, two UNIs are connected to each other, two UNIs connect to the same E-LAN on a switch or OLT, or a port is looped back.

[Figure: PTN_4]

[Figure: PTN_5]
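The first scenario comes down to how a VPLS floods broadcast frames. The Python below is a minimal sketch of the general split horizon rule, not PTN firmware behavior; the `Port` type, the port names, and the group name `hub` are hypothetical. A frame is never sent back out of its ingress port, and it is not forwarded between ports that belong to the same split horizon group.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Port:
    """A hypothetical VPLS attachment: a PW or a UNI."""
    name: str
    kind: str                  # "PW" or "UNI"
    shg: Optional[str] = None  # split horizon group name, if any


def flood_targets(ingress: Port, ports: List[Port]) -> List[str]:
    """Return the ports that a broadcast frame received on `ingress` is flooded to."""
    targets = []
    for port in ports:
        if port == ingress:
            continue  # never send a frame back out of the port it arrived on
        if ingress.shg is not None and port.shg == ingress.shg:
            continue  # split horizon: no forwarding between members of one group
        targets.append(port.name)
    return targets


# Two parallel PWs between the same pair of NEs, plus one local UNI.
without_shg = [Port("PW1", "PW"), Port("PW2", "PW"), Port("UNI1", "UNI")]
with_shg = [Port("PW1", "PW", "hub"), Port("PW2", "PW", "hub"), Port("UNI1", "UNI")]

print(flood_targets(without_shg[0], without_shg))  # ['PW2', 'UNI1']
print(flood_targets(with_shg[0], with_shg))        # ['UNI1']
```

Without a group, a broadcast that arrives on PW1 is sent back to the same remote NE over PW2; that NE floods it back again, so the frame keeps circulating between the two NEs.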

Fault appearance:

A large amount of broadcast traffic appears on the VPLS links, causing a broadcast storm, link congestion, or MAC address flapping, and eventually service interruption.

Judgment method:

Broadcast traffic on the VPLS links far exceeds its usual level. Normally, broadcast and multicast traffic is relatively low. When the problem occurs, the port RMON performance data shows a large amount of broadcast traffic, for example 1 MB or more on an FE port. The checking method is shown in the figure below.

[Figure: PTN_6]
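Once RMON broadcast counters have been collected per port, the judgment can be expressed as a simple filter. This is a sketch under stated assumptions: the 1 MB level comes from the example above, but whether it is per second or per sampling period is not stated, and the "10 times the usual level" ratio and the port names are hypothetical values chosen for illustration.

```python
from typing import Dict, List

# "1 MB" is the example level given in the text for an FE port; the
# measurement interval behind it is an assumption.
FE_BROADCAST_THRESHOLD_BYTES = 1_000_000


def storm_suspects(current: Dict[str, int],
                   baseline: Dict[str, int],
                   ratio: float = 10.0,
                   threshold: int = FE_BROADCAST_THRESHOLD_BYTES) -> List[str]:
    """Flag ports whose RMON broadcast byte count is abnormally high.

    A port is flagged if it reaches the absolute threshold, or if it exceeds
    its usual (baseline) broadcast volume by `ratio` times (hypothetical value).
    """
    suspects = []
    for port, now in current.items():
        usual = baseline.get(port, 0)
        if now >= threshold or (usual > 0 and now >= ratio * usual):
            suspects.append(port)
    return sorted(suspects)


usual = {"FE-1": 20_000, "FE-2": 15_000}
now = {"FE-1": 2_500_000, "FE-2": 18_000}
print(storm_suspects(now, usual))  # ['FE-1']
```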

Traffic congestion occurs, and alarms such as FLOW_OVER, ETH_TX_FLOW_OVER, and ETH_RX_FLOW_OVER are reported. MAC address flapping appears: PWs or UNIs learn incorrect MAC addresses. If you query the MAC address self-learning table repeatedly, you will find the same MAC address being learned from different interfaces (PW/UNI) at different times.

[Figure: PTN_7]
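Repeatedly querying the MAC address self-learning table and comparing the results is easy to automate. The sketch below assumes each query has already been parsed into a dictionary mapping MAC address to learning interface; that parsing step, the MAC addresses, and the interface names are hypothetical.

```python
from typing import Dict, List


def flapping_macs(snapshots: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Given successive MAC table queries (MAC -> learning interface),
    return every MAC learned on more than one interface, with those
    interfaces in the order they were first seen."""
    history: Dict[str, List[str]] = {}
    for table in snapshots:
        for mac, interface in table.items():
            seen = history.setdefault(mac, [])
            if interface not in seen:
                seen.append(interface)
    return {mac: ifs for mac, ifs in history.items() if len(ifs) > 1}


queries = [
    {"00e0-fc12-3456": "PW 1", "00e0-fc00-0001": "UNI 1"},
    {"00e0-fc12-3456": "UNI 2", "00e0-fc00-0001": "UNI 1"},
    {"00e0-fc12-3456": "PW 1", "00e0-fc00-0001": "UNI 1"},
]
print(flapping_macs(queries))  # {'00e0-fc12-3456': ['PW 1', 'UNI 2']}
```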

Root Cause:

When a loop forms in a VPLS, broadcast traffic is copied repeatedly and causes a broadcast storm. The looped frames also cause MAC address flapping, because frames with the same source MAC address keep arriving on different interfaces.

Impact and Risk:

A broadcast storm congests links and can interrupt other services, while MAC address flapping affects the VPLS service itself.


Notice on Prewarning for the PATCH_ERR Alarm Falsely Reported on the NG SDH V100R008C02SPH203

Summary: On an OptiX OSN 3500 running V100R008C02SPH203 that houses active and standby system control boards, the standby system control board periodically performs file patrol and also reports version information to the active system control board to synchronize patch files. The two operations occasionally conflict, causing patch file synchronization to fail and the PATCH_ERR alarm to be reported. The alarm is automatically cleared after patch file synchronization succeeds in the next period.
Product Line: Transport network product line         Product Family: MSTP products
Product Model: OptiX OSN 1500/2500/3500/3500 II/7500             Keywords: OptiX OSN 3500
[Problem Description]
Trigger conditions:
The problem occurs when the following conditions are present:
The OptiX OSN 1500/2500/3500/3500 II/7500 runs the V100R008C02SPH203 patch.
The OptiX OSN 1500/2500/3500/3500 II/7500 houses active and standby system control boards of one of the following types: SSQ3CXLN, SSR1CXL, SSR2CXLN, SSN3GSCC, SSN3GSCC-B2, or SSN4GSCC.
Symptom:
The NE reports the PATCH_ERR alarm occasionally and the alarm persists for 10 to 20 minutes.
Identification method:
Run the following commands or query the version number on the NMS to check whether the involved NE is an OptiX OSN 1500/2500/3500/3500 II/7500 V100R008C02SPH203.

[Figures: OSN3500]

Run the following command to check whether the active and standby system control boards run properly.

[Figure: OSN3500]

Check historical alarms and confirm that one or more PATCH_ERR alarm records like the following are displayed and that no current PATCH_ERR alarm exists.

[Figure: OSN3500]
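The identification steps combine the trigger conditions with the alarm pattern. The sketch below encodes them as two predicates. The version string and board types come from this notice; the function names are hypothetical, and requiring both boards of the pair to be listed types is an assumption about how the board condition is meant.

```python
from typing import List

AFFECTED_VERSION = "V100R008C02SPH203"
AFFECTED_BOARDS = {"SSQ3CXLN", "SSR1CXL", "SSR2CXLN",
                   "SSN3GSCC", "SSN3GSCC-B2", "SSN4GSCC"}


def matches_trigger(version: str, scc_boards: List[str]) -> bool:
    """Trigger conditions: the affected patch plus an active/standby pair of
    the listed system control boards (both boards assumed to be listed types)."""
    return (version == AFFECTED_VERSION
            and len(scc_boards) >= 2
            and all(board in AFFECTED_BOARDS for board in scc_boards))


def is_false_patch_err(version: str, scc_boards: List[str],
                       historical_patch_err: int, current_patch_err: bool) -> bool:
    """The known false alarm: trigger conditions met, PATCH_ERR records in
    the alarm history, and no PATCH_ERR alarm currently active."""
    return (matches_trigger(version, scc_boards)
            and historical_patch_err > 0
            and not current_patch_err)


print(is_false_patch_err("V100R008C02SPH203", ["SSN4GSCC", "SSN4GSCC"], 2, False))  # True
```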

[Root Cause]
The OptiX OSN 3500 runs the V100R008C02SPH203 patch. The standby system control board reports version information to the patch package on the active system control board every 23 or 60 minutes to request patch file synchronization. This operation occasionally conflicts with the periodic file patrol (every 30 minutes by default) performed on the standby system control board, causing synchronization failure and PATCH_ERR alarms. After patch file synchronization succeeds in the next period, the alarm is automatically cleared.

[Figure: OSN3500]
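To see why the conflict is only occasional, the sketch below walks a timeline of 23-minute synchronization requests and reports which ones start while a 30-minute-period file patrol is running. It assumes both timers start at minute 0 and that each patrol keeps the board busy for `patrol_minutes` (2 here); neither the phase nor the patrol duration is stated in this notice, so the specific minutes printed are illustrative only.

```python
from typing import List


def conflicting_syncs(sync_period: int, patrol_period: int,
                      patrol_minutes: int, horizon: int) -> List[int]:
    """Return the minutes at which a synchronization request starts while a
    file patrol is in progress. Both timers are assumed to start at minute 0,
    and each patrol is assumed to last `patrol_minutes`."""
    conflicts = []
    for t in range(0, horizon, sync_period):
        # t lies inside a patrol window if it falls within the first
        # `patrol_minutes` of a patrol cycle.
        if t % patrol_period < patrol_minutes:
            conflicts.append(t)
    return conflicts


print(conflicting_syncs(23, 30, 2, 700))     # [0, 391, 690]
print(conflicting_syncs(23, 1440, 2, 2880))  # [0]
```

Under the same assumptions, the 24-hour patrol period introduced in V100R008C02SPH505 (second call) makes overlaps far rarer than the default 30-minute period.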

[Impact and Risk]
The NE falsely reports the PATCH_ERR alarm occasionally whereas patches on the standby system control board are normal.
[Measures and Solutions]
Preventive measures:
The following three preventive measures are available:
Method 1:
Disable file patrol on the standby system control board, and enable it again one hour later. This reduces the likelihood that PATCH_ERR alarms are falsely reported.
Disable file patrol on the standby system control board by running the following command.

[Figure: OSN3500]

SlaveBid indicates the slot ID of the standby system control board.
File patrol is successfully disabled if the query returns results like the following, with the file patrol state displayed as USERSTOP.

[Figure: OSN3500]

Enable file patrol on the standby system control board one hour later.

[Figure: OSN3500]

File patrol is successfully enabled if the query returns results like the following, with the file patrol state displayed as START.

[Figure: OSN3500]

----End
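Method 1 is a disable, wait, enable sequence with a state check after each step. The actual commands appear only in the figures, so `disable`, `enable`, and `query_state` below are hypothetical callables you would wire to your own NE session; only the slot parameter (SlaveBid), the one-hour delay, and the USERSTOP/START states come from this notice. The demo uses an in-memory fake and skips the real wait.

```python
import time
from typing import Callable, Dict


def pause_file_patrol(slave_bid: int,
                      disable: Callable[[int], None],
                      enable: Callable[[int], None],
                      query_state: Callable[[int], str],
                      delay_s: float = 3600.0,
                      sleep: Callable[[float], None] = time.sleep) -> None:
    """Disable file patrol on the standby system control board, verify
    USERSTOP, wait (one hour by default), re-enable, and verify START.
    The three callables are hypothetical wrappers, not a Huawei API."""
    disable(slave_bid)
    if query_state(slave_bid) != "USERSTOP":
        raise RuntimeError(f"file patrol in slot {slave_bid} was not disabled")
    sleep(delay_s)
    enable(slave_bid)
    if query_state(slave_bid) != "START":
        raise RuntimeError(f"file patrol in slot {slave_bid} was not re-enabled")


# Dry run against a fake board state; slot 10 is a hypothetical SlaveBid.
state: Dict[int, str] = {}
pause_file_patrol(
    slave_bid=10,
    disable=lambda slot: state.__setitem__(slot, "USERSTOP"),
    enable=lambda slot: state.__setitem__(slot, "START"),
    query_state=lambda slot: state[slot],
    sleep=lambda seconds: None,  # skip the real one-hour wait
)
print(state)  # {10: 'START'}
```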
Method 2:
Lower the severity of the PATCH_ERR alarm at the affected sites.
Method 3:
Mask the PATCH_ERR alarm on the NMS at the affected sites. Unmask the alarm after the standby system control board is reseated or replaced.
Recovery measures:
The PATCH_ERR alarm is automatically cleared within 10 to 20 minutes after it is reported.
Solutions:
Upgrade the NE to V100R008C02SPH505, which changes the file patrol period on the standby system control board to 24 hours.
Upgrade the NE to V100R010C03, which optimizes patch package synchronization and avoids conflicts with file patrol on the system control board.
Material handling after replacement:
N/A
[Inspector Applicable or Not]
N/A
[Rectification Scope and Time Requirements]
N/A
[Rectification Instructions]
N/A
[Attachments]
N/A

Guide to Clearing the DBMS_ERROR Alarm Reported After an Upgrade Due to a Residual SNCP Pointer

Constraint: This recovery guide applies only to V200R011C03SPC200, V200R012C00, and V200R012C01.

1 Identifying the Problem

1.1 Querying the Database

Run the :dbms-query:"23310000.dbf",mdb command. The query result is provided in the attachment “Result.txt”.

1.2 Checking Whether a Residual SNCP Pointer Exists

As shown in the attachment referenced in 1.1 “Querying the Database”, the 3112103 field is 00 but the 23312301 field is not 00000000, so a residual SNCP pointer exists.
Records 1, 2, 5, and 6 need to be recovered, as shown below:

[Figures: SNCP]
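The check in 1.2 is a per-record predicate. Assuming the query output has already been parsed into one dictionary per record keyed by field name (the attachment's layout is not reproduced here, so the parsing step and the sample values below are hypothetical), residual records can be picked out as follows:

```python
from typing import Dict, List


def residual_sncp_records(records: List[Dict[str, str]]) -> List[int]:
    """Return the 1-based numbers of records holding a residual SNCP pointer:
    field 3112103 is 00 but field 23312301 is not 00000000."""
    residual = []
    for number, record in enumerate(records, start=1):
        pointer = record.get("23312301", "00000000")  # absent field = no pointer
        if record.get("3112103") == "00" and pointer != "00000000":
            residual.append(number)
    return residual


# Hypothetical parsed records; field names follow the guide, values are made up.
records = [
    {"14112301": "2469", "3112103": "00", "23312301": "a1b20000"},
    {"14112301": "2468", "3112103": "01", "23312301": "a1b20000"},
    {"14112301": "246a", "3112103": "00", "23312301": "00000000"},
]
print(residual_sncp_records(records))  # [1]
```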

2 Recovering the Database

2.1 Identifying the Recovery Mode

2.1.1 V200R011C03SPC200, V200R012C00, or V200R012C01

If the NE version is V200R011C03SPC200, V200R012C00, or V200R012C01, run the :alm-get-curdata-ext:0,DBMS_ERROR,0; command to check whether the DBMS_ERROR alarm is reported.
If the DBMS_ERROR alarm is not reported, perform the steps in 2.2 “Recovery Method When No DBMS_ERROR Alarm Is Reported”; if the DBMS_ERROR alarm is reported, perform the steps in 2.3 “Recovery Method When the DBMS_ERROR Alarm Is Reported”.

2.1.2 Versions Earlier than V200R011C03SPC200

If the NE version is earlier than V200R011C03SPC200 and a residual SNCP pointer exists, check whether the DBMS_ERROR alarm will be reported after the upgrade. If it will not be reported, upgrade the NE normally and then perform the steps in 2.2 “Recovery Method When No DBMS_ERROR Alarm Is Reported”. If it will be reported, activation of the processing board will time out, so perform the steps in 2.3 “Recovery Method When the DBMS_ERROR Alarm Is Reported”.
The method to check whether the DBMS_ERROR alarm will be reported after the upgrade is as follows:
Locate the residual SNCP pointer records in the query result. If the value of a residual record's 23312301 field is displayed twice in the query result, the DBMS_ERROR alarm will be reported after the upgrade; otherwise, it will not be reported.
Take the query result in 1.1 “Querying the Database” as an example. By referring to 1.2 “Checking Whether a Residual SNCP Pointer Exists”, confirm that records 5 and 6 are residual data. The value of the 23312301 field is c7240000, which is displayed twice in the query result, so the DBMS_ERROR alarm will be reported.

[Figures: SNCP]
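The decisions in 2.1 can be sketched as a count over the raw query text plus a branch. This assumes "displayed twice" means at least two whole-token occurrences of the value, and the sample text below is a made-up layout, not the real Result.txt format. For an NE that is not yet upgraded, the prediction stands in for the actual alarm state when choosing the procedure.

```python
import re
from typing import Iterable


def occurrences(query_output: str, value: str) -> int:
    """Count whole-token, case-insensitive occurrences of a hex value."""
    pattern = r"\b" + re.escape(value) + r"\b"
    return len(re.findall(pattern, query_output, flags=re.IGNORECASE))


def alarm_expected_after_upgrade(query_output: str,
                                 residual_values: Iterable[str]) -> bool:
    """2.1.2: the alarm will be reported if a residual record's 23312301
    value is displayed twice (treated here as at least twice)."""
    return any(occurrences(query_output, v) >= 2 for v in residual_values)


def recovery_section(dbms_error_reported: bool) -> str:
    """2.1.1: choose the recovery procedure."""
    return "2.3" if dbms_error_reported else "2.2"


sample = "rec 5 ... 23312301=c7240000\nrec 6 ... 23312301=c7240000\n"  # hypothetical layout
expected = alarm_expected_after_upgrade(sample, ["c7240000"])
print(expected, recovery_section(expected))  # True 2.3
```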

2.2 Recovery Method When No DBMS_ERROR Alarm Is Reported

2.2.1 Running Commands to Recover the Database

:cfg-gsp-test:"modifyPO 23310000 poid 23312301 00000000"
poid is the decimal value converted from the hexadecimal value of the 14112301 field in each residual SNCP pointer record. The values of the other fields are fixed (note the spaces between parameters).
For records 1 and 2, the decimal values for 2469 and 246b in the 14112301 field are 9321 and 9323 respectively.
Run the following two commands:
:cfg-gsp-test:"modifyPO 23310000 9321 23312301 00000000"
:cfg-gsp-test:"modifyPO 23310000 9323 23312301 00000000"
Recover all residual records in sequence.
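Converting each 14112301 value and filling in the fixed parameters is mechanical, so the command list can be generated rather than typed. The sketch below reproduces the command format given above; the function name is hypothetical.

```python
from typing import Iterable, List


def modify_po_commands(poids_hex: Iterable[str]) -> List[str]:
    """Build one modifyPO command per residual record.

    poid is the 14112301 field value converted from hexadecimal to decimal;
    the other parameters are fixed and separated by single spaces."""
    return [
        f':cfg-gsp-test:"modifyPO 23310000 {int(h, 16)} 23312301 00000000"'
        for h in poids_hex
    ]


for command in modify_po_commands(["2469", "246b"]):  # records 1 and 2
    print(command)
# :cfg-gsp-test:"modifyPO 23310000 9321 23312301 00000000"
# :cfg-gsp-test:"modifyPO 23310000 9323 23312301 00000000"
```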

2.2.2 Checking Recovery Results

Run the :dbms-query:"23310000.dbf",mdb command to confirm that all residual SNCP pointer records have been recovered.

2.2.3 Saving the Database

Run the :dbms-copy-all:drdb,fdb command.
The recovery is complete.

2.3 Recovery Method When the DBMS_ERROR Alarm Is Reported

2.3.1 Temporarily Allowing External Operations

Run the :sm-set-nebusy:0,0,0,0,none command.
While the DBMS_ERROR alarm is reported, no external operation is allowed on the NE. After the command is executed, operations on the NE are allowed for five minutes. If the residual records have not been recovered within five minutes, run the command again. Do not modify the NE configuration during this period.

2.3.2 Running Commands to Recover the Database

:cfg-gsp-test:"modifyPO 23310000 poid 23312301 00000000"
poid is the decimal value converted from the hexadecimal value of the 14112301 field in each residual SNCP pointer record. The values of the other fields are fixed (note the spaces between parameters).
For records 1 and 2, the decimal values for 2469 and 246b in the 14112301 field are 9321 and 9323 respectively.

Run the following two commands:

:cfg-gsp-test:"modifyPO 23310000 9321 23312301 00000000"
:cfg-gsp-test:"modifyPO 23310000 9323 23312301 00000000"
Recover all residual records in sequence.
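Because each :sm-set-nebusy:0,0,0,0,none call opens only a five-minute window, a long list of modifyPO commands may need the unlock re-issued partway through. The sketch below shows one way to pace that. `run` is a hypothetical function that sends one command to the NE, `margin_s` is an assumed safety margin rather than a value from this guide, and the demo uses a fake clock and placeholder command names.

```python
import time
from typing import Callable, Iterable, List

UNLOCK = ":sm-set-nebusy:0,0,0,0,none"  # from 2.3.1
WINDOW_S = 300.0                       # five-minute operation window


def run_in_busy_windows(commands: Iterable[str],
                        run: Callable[[str], None],
                        now: Callable[[], float] = time.monotonic,
                        margin_s: float = 30.0) -> None:
    """Send `commands` in order, re-running the unlock command whenever the
    current five-minute window is about to expire."""
    run(UNLOCK)
    opened = now()
    for command in commands:
        if now() - opened >= WINDOW_S - margin_s:
            run(UNLOCK)  # window nearly used up: open a new one
            opened = now()
        run(command)


sent: List[str] = []
ticks = iter([0.0, 10.0, 280.0, 281.0])  # fake clock readings in seconds
run_in_busy_windows(["cmd-1", "cmd-2"], sent.append, now=lambda: next(ticks))
print(sent)
# [':sm-set-nebusy:0,0,0,0,none', 'cmd-1', ':sm-set-nebusy:0,0,0,0,none', 'cmd-2']
```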

2.3.3 Waiting for Alarm Clearance and Completing Active/Standby Backup

After completing all recovery operations, wait for about 45 minutes. Run the :alm-get-curdata-ext:0,DBMS_ERROR,0; command to confirm that the DBMS_ERROR alarm is cleared. Run the :hbu-get-backup-info command and confirm that the returned value is 3.
The recovery is complete.
If the returned value is not 3, contact Huawei engineers.

2.4 Upgrade and Recovery Measures for the NE on Which the DBMS_ERROR Alarm Is Reported After an Upgrade

The upgrade procedure is adjusted so that the system control board and the processing board are loaded and activated separately: upgrade the system control board first, and upgrade the processing board after the fault is rectified.

2.4.1 Upgrading Active and Standby System Control Boards Based on Simulation Package Loading

When confirming the simulation loading and activation sequence, select Pause Before Current Group for every group that does not contain the system control board. As a result, the NE stops activating further boards once the system control board upgrade is complete. The settings are shown below:

[Figures: configuration activation]

2.4.2 Temporarily Allowing External Operations

Run the :sm-set-nebusy:0,0,0,0,none command.
External operations are not allowed on the NE during backup. After the command is executed, operations on the NE are allowed for about five minutes. If the residual records have not been recovered within five minutes, run the command again. Do not modify the NE configuration during this period.

2.4.3 Running the Command to Recover the Database

:cfg-gsp-test:"modifyPO 23310000 poid 23312301 00000000"
poid is the decimal value converted from the hexadecimal value of the 14112301 field in each residual SNCP pointer record. The values of the other fields are fixed (note the spaces between parameters).
For records 1 and 2, the decimal values for 2469 and 246b in the 14112301 field are 9321 and 9323 respectively.
Run the following two commands:
:cfg-gsp-test:"modifyPO 23310000 9321 23312301 00000000"
:cfg-gsp-test:"modifyPO 23310000 9323 23312301 00000000"
Recover all residual records in sequence.

2.4.4 Waiting for Alarm Clearance and Completing Active/Standby Backup

After about 45 minutes, run the :alm-get-curdata-ext:0,DBMS_ERROR,0; command to confirm that the DBMS_ERROR alarm is cleared. Run the :hbu-get-backup-info command and confirm that the returned value is 3.
If the returned value is not 3, contact Huawei engineers.

2.4.5 Continuing Tasks and Completing Processing Board Activation

[Figures: Board Activation]

The recovery is complete.

Prewarning for Maintenance of the AC Power Board on the MA561x

Summary: When the PAIA/PAIB AC power board on an MA5610/MA5616 is replaced, removing the board without first powering it off (by disconnecting the 220 V AC power connector from the panel) poses potential safety hazards.

Product Line:  Access Network                   Product Model: MA5616

Keywords: MA5610/MA5616   PAIA  PAIB  AC power board  safety specifications

Problem Description 

Trigger conditions

During replacement of the PAIA/PAIB AC power board on an MA5610/MA5616, the operator removes the board without first powering it off (disconnecting the 220 V AC power plug from the panel).

Symptom

Removing the power board before disconnecting the 220 V power plug from the board panel poses potential safety hazards.

Identification method

MA5610s/MA5616s with PAIA or PAIB power boards delivered to sites before September 1, 2009 are affected.
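If site inventory data is available, the identification rule reduces to a filter on model, power board, and delivery date. The record layout and NE names below are hypothetical; only the models, board types, and the September 1, 2009 cutoff come from this prewarning.

```python
from datetime import date
from typing import List

CUTOFF = date(2009, 9, 1)  # boards delivered before this date are affected


def affected_units(inventory: List[dict]) -> List[str]:
    """Return NEs that match the identification method: an MA5610/MA5616
    with a PAIA or PAIB power board delivered to site before the cutoff."""
    hits = []
    for unit in inventory:
        if (unit["model"] in {"MA5610", "MA5616"}
                and unit["power_board"] in {"PAIA", "PAIB"}
                and unit["delivered"] < CUTOFF):
            hits.append(unit["ne"])
    return hits


inventory = [
    {"ne": "NE-01", "model": "MA5616", "power_board": "PAIA", "delivered": date(2009, 5, 20)},
    {"ne": "NE-02", "model": "MA5616", "power_board": "PAIB", "delivered": date(2010, 2, 1)},
    {"ne": "NE-03", "model": "MA5610", "power_board": "PAIB", "delivered": date(2008, 11, 3)},
]
print(affected_units(inventory))  # ['NE-01', 'NE-03']
```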

 

Root Cause

The H831PAIA/H831PAIB AC power boards used on the MxUs do not fully meet safety specifications, which may pose safety hazards to operators who remove the boards without following safety rules.

Impact and Risk

There are potential safety hazards.

Measures and Solutions

Preventive measures

The safety specifications of the AC power boards delivered after September 2009 have been strengthened.

Recovery measures

Use the inspection tool to check boards on the live network, and rectify any boards with potential safety hazards.

For the inspection tool script, detailed delivery data, and the rectification method, contact CS coordinator Xue Guoqiang (employee ID: 98735) for the China market or Nie Bo (employee ID: 92548) for markets outside China.

Prewarning Expiration Condition

This prewarning will expire after the board check is complete.