Monthly Archives: February 2015

Be aware of Lower Order SNCP Services Interruption on NG SDH Products

Summary:
Due to a bug in the NE software of V100R010C03, the lower order ingress bus table is not recovered from the lowxcbus.cfg file after a standby system control board is switched to be the active system control board. As a result, the matrix configuration on the active system control board and that on the cross-connect board are inconsistent. If services are configured after the switchover and a switchover of lower order SNCP services then occurs, the lower order SNCP services may be interrupted.

[Problem Description]
Fault symptoms:
Lower order SNCP services are interrupted after an SNCP switchover.
Trigger conditions:
Lower order SNCP services may be interrupted after an SNCP switchover if services are configured, or boards are added or deleted, after a switchover between the active and standby system control boards.

Identification method:
The problem occurs if all of the following conditions are met:

  • The NE is an NG SDH NE with the version of V100R010C03SPC203 (5.21.20.54/5.36.20.54) or an earlier V100R010C03 version.
  • Lower order SNCP services have been configured.
  • A switchover between the active system control board and standby system control board has occurred.
  • Services or protection are configured, or boards are added or deleted, after the switchover, which triggers verification.
  • A switchover of lower order SNCP services has occurred.
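The identification conditions above can be sketched as a simple check. This is only an illustration: the `NeStatus` fields and both function names are hypothetical, and the version parser assumes a plain "V100R010C03" string optionally followed by "SPC" and a number, with no hot-patch suffix.

```python
import re
from dataclasses import dataclass

# Assumed format: "V100R010C03" optionally followed by "SPC<number>".
_VERSION_RE = re.compile(r"V100R010C03(?:SPC(\d+))?")


def is_affected_version(version: str) -> bool:
    """Return True for V100R010C03SPC203 or an earlier V100R010C03 version."""
    version = version.strip()
    if not version.startswith("V100R010C03"):
        return False  # a different release
    match = _VERSION_RE.fullmatch(version)
    if match is None:
        # For example a hot-patch suffix, which this sketch does not model.
        raise ValueError(f"cannot parse version string: {version!r}")
    spc = int(match.group(1)) if match.group(1) else 0
    return spc <= 203


@dataclass
class NeStatus:
    """Hypothetical record of the facts collected for one NE."""
    version: str
    has_lower_order_sncp: bool
    scc_switchover_occurred: bool
    changed_after_switchover: bool  # services/protection configured, boards added or deleted
    sncp_switchover_occurred: bool


def problem_conditions_met(ne: NeStatus) -> bool:
    """The problem occurs only if all of the listed conditions are met."""
    return all((
        is_affected_version(ne.version),
        ne.has_lower_order_sncp,
        ne.scc_switchover_occurred,
        ne.changed_after_switchover,
        ne.sncp_switchover_occurred,
    ))
```

Because every condition must hold, an NE on an affected version that has not had a system control board switchover is not flagged by this check.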

[Root Cause]

After the active and standby system control boards on an NE of the involved version have switched over, the lower order bus table on the current system control board fails to be recovered from the lowxcbus.cfg file due to a bug in the NE software. After the switchover, the current active system control board (the original standby system control board) still uses the original bus table. The table is not updated until the system control board is reset. In this case, data on the active system control board and data in the lowxcbus.cfg file are different. The lower order ingress numbers stored on the active system control board and those stored on the cross-connect board are different. After a switchover of lower order SNCP services, there is a possibility that the services are interrupted.

[Impact and Risks]
There is a possibility that lower order SNCP services are interrupted.

[Measures and Solutions]
Recovery measures: Perform the following two steps:

  • Deactivate and then activate the interrupted lower order SNCP services on the NMS.
  • Warm reset the active system control board, the standby cross-connect board, and the active cross-connect board one by one. Reset the next board only after the previous board has come online. If the NE is an ASON NE, observe the precautions in the appendix.
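The "one by one" rule in the second step can be sketched as a loop that resets a board and then waits for it to come online before moving on. The callables `warm_reset` and `is_online` are hypothetical placeholders for whatever NMS operation is used; they are not real NMS APIs, and the timeout and polling values are arbitrary.

```python
import time
from typing import Callable, Sequence

# Order taken from the recovery measure above.
RESET_ORDER = (
    "active system control board",
    "standby cross-connect board",
    "active cross-connect board",
)


def warm_reset_in_order(
    boards: Sequence[str],
    warm_reset: Callable[[str], None],   # hypothetical: triggers a warm reset
    is_online: Callable[[str], bool],    # hypothetical: True once the board is online
    timeout_s: float = 600.0,            # arbitrary value for illustration
    poll_s: float = 5.0,                 # arbitrary value for illustration
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Reset each board, and start the next only after the previous is online."""
    for board in boards:
        warm_reset(board)
        deadline = time.monotonic() + timeout_s
        while not is_online(board):
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{board} did not come online; stopping")
            sleep(poll_s)
```

Stopping with an error when a board fails to come online keeps the remaining boards untouched, so that the two cross-connect boards are never reset at the same time.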

Workaround:
Use the Inspector tool to check an entire network. If a problematic NE is found, perform the recovery measure and then upgrade the NE to V100R010C03SPC203+V100R010C03SPH205 or a later version.
Version of the Inspector: SmartKit V200R008C00SPC100 or later
Tool upgrade packages:
Common_Inspector_V200R008_ON_20130111154652680.exe
Inspector_V200R008_ON_OptiX 10G MADM,OptiX 155_622,+_20130111154652680.exe
Upgrade the SmartKit automatically by downloading the upgrade package when prompted.
Name of the test case: Checking the SNCP configuration whether is consistency between XCS board and BUS
Directory of the test case: OptiX OSN 1500/2500/3500/7500 configuration data check/SNCP configuration check


Solution:
Perform the workaround measure before an upgrade and upgrade the target NE to V100R010C03SPC203+V100R010C03SPH205 or a later version.

Material handling after replacement:
None

[Rectification Scope]
Not involved

[Rectification Guide]
None

[Appendix]
Precautions for resetting system control boards on ASON NEs:

  • Back up the databases of all ASON NEs on the network. If any exception occurs after a reset, restore services using the backed-up databases.
  • Use the PMI tool to check an NE before a reset. The check covers only the ASON data, which is available by choosing Intelligence_Data_Checking > ASON_Data_Analyse_Before_Reset. If any error is reported, contact the designated contact person.
  • Run the following commands to manually synchronize databases:

:dbms-copy-all:mdb,drdb;

:dbms-copy-all:drdb,fdb0;

:dbms-copy-all:drdb,fdb1;

  • Query the synchronization status between the active and standby system control boards by issuing the :hbu-get-backup-info; command. A warm reset can be performed only when the returned value is 0x00000003.
  • Warm reset the active system control board on the NMS.
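The synchronization and status check in the precautions above can be sketched as a gate in front of the warm reset. The command strings come from this appendix. The `send` callable is a hypothetical placeholder for the command channel, and the sketch assumes it returns the value of :hbu-get-backup-info; as a hexadecimal string such as "0x00000003"; the real output format may differ.

```python
from typing import Callable

# Commands as listed in the appendix, in order.
SYNC_COMMANDS = (
    ":dbms-copy-all:mdb,drdb;",
    ":dbms-copy-all:drdb,fdb0;",
    ":dbms-copy-all:drdb,fdb1;",
)
BACKUP_QUERY = ":hbu-get-backup-info;"
READY_VALUE = 0x00000003  # a warm reset is allowed only for this value


def backup_ready(returned_value: str) -> bool:
    """Interpret the returned value, assumed to be a hexadecimal string."""
    return int(returned_value.strip(), 16) == READY_VALUE


def pre_reset_check(send: Callable[[str], str]) -> bool:
    """Synchronize the databases, then report whether a warm reset may proceed.

    `send` is hypothetical: it issues one command and returns its output.
    """
    for command in SYNC_COMMANDS:
        send(command)
    return backup_ready(send(BACKUP_QUERY))
```

If `pre_reset_check` returns False, the warm reset should not be started.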

 

 


Technical Case: Optical Network

Abstract:
This case addresses emergency situations on an optical network that comprises Huawei WDM and NG-SDH networks and a T2000 dual-system structure. It provides emergency measures for incidents such as abnormal operation of the T2000, NG-SDH, and WDM systems, to guide maintenance engineers in restoring the system as soon as possible.
1 OBJECTIVE
The optical network emergency solution applies when Huawei devices running on the WDM and NG-SDH networks encounter an emergency, for example: abnormal service trails, abnormal network traffic, abnormal device alarms, packet forwarding failures, board faults, or interface faults. Its purpose is to provide an emergency solution when such an incident happens.

2 Network
2.1 Detailed Network Information
The optical network comprises one NG-SDH ASON network and one backbone WDM network. The NG-SDH ASON network is made up of OSN 7500 equipment, and the backbone WDM network is made up of 1600G equipment. The NG-SDH network is built on top of the 1600G backbone WDM network and carries the customer's diamond services. Detailed network information is provided in the attachment.
2.1.1 WDM Backbone network topology
The WDM backbone network is made up of 1600G equipment.
Detailed DCN design information for the WDM backbone network.

2.1.2 NG-OSN ASON network topology
The NG-SDH ASON network is made up of OSN 7500 equipment. The main network topology framework is as follows.

NG-OSN ASON network topology

3.1 Carried Services Analysis
The NG-SDH ASON network is carried by the WDM backbone network. The WDM backbone network adds and drops wavelength services in the appropriate sections according to the different service requirements, and acts as the bearer network for customer demand. The NG-SDH ASON network carries ASON services. Currently, the SLA level of all ASON services configured on the ASON network is diamond.

3.1.1 Service and Running Status Analysis
Over eleven months of operation, the WDM backbone network has been running stably. However, frequent cuts to the link fiber provided by the customer affect the stability of the WDM backbone network. This is an important problem that our maintenance team faces in the NOC. An important part of the team's job is therefore to supervise the customer so that fiber cuts, which interrupt the WDM backbone network, are resolved in time.
The NG-SDH ASON network is currently running stably. All SLA services configured by the customer are diamond services, and the inspection results show that the services are running stably and normally. However, the customer chose the Revertive parameter when configuring diamond services on the ASON network. When the active service is interrupted, it is rerouted to another trail. If the customer's link fiber stays cut for a long time and cannot be recovered, changing or modifying the service on the NMS with the Revertive parameter configured turns the SLA service into a disperse service, which affects the stability and efficiency of the ASON network. We advise the customer to change Revertive SLA services to non-revertive services.

3.1.2 NMS running status Analysis
The NMS is the T2000. The daily checklists show that the T2000 NMS is running normally and stably. The project is currently adding many new WDM 1600G and NG-OSN 7500 NEs. In line with the original design, the WDM and OSN ASON networks are to be split onto two separate NMS servers for monitoring. This plan is now being carried out, and once it is completed, the running status of the NMS will improve.
3.2 Detailed Devices Information
The optical network devices include the OTM, OADM, and OLA of the WDM network and the OSN 7500 of the NG-SDH ASON network. The hardware and software configuration and the detailed device information for both the WDM and ASON networks are as follows.

3.2.1 Basic device information
Basic device information for the optical network includes the types of Huawei WDM and Huawei NG-SDH devices, site configuration information, and so on. Detailed information is in the attachment.
3.2.2 Device configuration
Device configuration information for the WDM and Huawei NG-OSN networks includes NE board versions, NE information, and detailed ASON service information, as in the following attachment:

3.3 Risks Analysis
Describe all the risks that have been found during routine network inspection, network evaluation, and troubleshooting, together with the possible workarounds and solutions.
3.3.1 Network Risk Analysis
After eleven months of maintaining the WDM and NG-OSN networks in the NOC, we have a thorough understanding of the running status of the WDM and ASON networks. Current network risks mainly concern the following items:
1. Frequent cuts to the link fiber provided by the customer affect the stability of the WDM backbone network. This is an important problem that our maintenance team faces in the NOC, and an important part of the team's job is to supervise the customer so that fiber cuts, which interrupt the WDM backbone network, are resolved in time.
2. The customer chose the Revertive parameter when configuring diamond services on the ASON network. When the active service is interrupted, it is rerouted to another trail. If the customer's link fiber stays cut for a long time and cannot be recovered, changing or modifying the service on the NMS with the Revertive parameter configured turns the SLA service into a disperse service, which affects the stability and efficiency of the ASON network.
3. Because the uploaded WDM and NG-SDH NEs are newly created, the board software versions of the NEs do not match the design.
3.3.2 Device Risk Analysis
1. According to the precautions for WDM and NG-OSN, a WDM NE name cannot include spaces. If an NE name includes a space, the OAMS server cannot identify the NE. For more related precautions, refer to the precaution notification.
2. When new NG-SDH OSN 7500 NEs were uploaded to the NMS, GSCC board software was found to be missing. Some new NEs need software downloaded to the board.
3. The stability and efficiency of the DCN of the T2000 NMS server are affected by the growing number of new WDM and NG-OSN NEs. A plan has already been made to split WDM and NG-OSN onto separate servers for monitoring.
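The first device risk above, NE names that contain spaces, is easy to screen for before NEs are created. The following is a minimal sketch; the function name and the example NE names are hypothetical, and how the list of names is exported from the NMS is left open.

```python
from typing import Iterable, List


def names_with_spaces(ne_names: Iterable[str]) -> List[str]:
    """Return the NE names that contain any whitespace character."""
    return [name for name in ne_names if any(ch.isspace() for ch in name)]


# Hypothetical NE names, for illustration only.
flagged = names_with_spaces(["OTM-01", "OLA 02", "OADM_03"])
print(flagged)
```

Any name returned by the check should be renamed without spaces.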

4 Emergency Solution
When serious problems such as service interruption occur on equipment in the WDM and NG-SDH ASON networks, the emergency solution helps equipment maintenance personnel locate and remove the fault quickly. It focuses on service recovery, divides issues into different scenarios, and gives detailed emergency solutions for each, to ensure stable running of the optical transmission system and to minimize emergency incidents.
4.1 Principles for Emergency Solution
Principles when critical issues happen: notify the related people in time, locate the issue quickly, and recover services quickly.
Problem Notification: Report the problem within the time limit defined by our workflow.
Fast Issue Location: Locate the problem based on the problem report as soon as possible.
Fast Service Recovery: Recover the service by adjusting network parameters, rebooting boards, replacing boards, or switching services over to other links or devices.
When handling serious problems such as service interruption by following this flow, take the following precautions.
1. Restore the services as soon as possible. If a backup route is available, switch the services to the backup route.
2. Analyze the fault symptoms, and handle the fault only after locating the cause. When the cause is unclear, do not perform operations at random, to prevent the problem from becoming more severe.
3. If any problem occurs, contact Huawei for technical support and cooperate with Huawei to handle the problem and minimize the service interruption time.
4. Record data during the fault handling, and store the original data related to the fault.

4.2 Critical problem Emergency handling workflow of WDM
For service interruption caused by external factors such as power failure and fiber cuts, or improper operation, or software and hardware faults, refer to this flow to locate faults quickly, or to obtain assistance, until the service is resumed.
The emergency handling workflow for WDM is as follows:
Detailed information is in the attachment.
Emergency Handling:
4.3 Critical problem Emergency handling workflow of NG-OSN
This workflow provides measures for handling emergencies such as a service interruption in a Huawei OSN optical transmission system that is built with OptiX series products. It can be used as troubleshooting guidance so that maintenance personnel can quickly restore the equipment to its normal running state.
The emergency handling workflow for NG-SDH is as follows:

OptiX OSN system emergency handling flow
4.4 Critical problem Emergency handling workflow of ASON
This workflow describes the measures to take when services in an ASON network are interrupted. The guidelines focus on how to restore services in the shortest possible time when a fault occurs.
The emergency handling workflow for ASON is as follows:
Detailed information is in the attachment.
4.5 Critical problem Emergency handling workflow of iManager T2000
This guide provides emergency measures for incidents such as abnormal running of the OptiX iManager T2000/T2100, to guide maintenance staff in restoring the system as soon as possible, to keep OptiX iManager T2000/T2100 systems running stably, and to avoid incidents as far as possible.

5 History Problems and Solutions
Operation and maintenance of the optical network is a long-term and highly important project. Summarize the problems that have occurred in the past so as to improve the Critical Problem Response Process. Handling of these types of problems should be strengthened in the Huawei Optical Network Maintenance and Emergency Response Implementation Guide.