[Problem Description]
Trigger condition:
Scenario 1: If a large number of ASON tunnels are rerouted, reverted, or optimized while the system control board is busy, there is a low probability that lower order services in the tunnels are lost. This affects Huawei OSN products such as the OptiX OSN 1500/2500/3500/7500.
Scenario 2: If algorithm rollback occurs when ASON tunnel rerouting fails, tunnel services that fail to be rerouted may be interrupted.
Symptom:
Scenario 1: For normal silver tunnels, all lower order services are lost and interrupted during rerouting, route reversion, or route optimization. For associated silver tunnels, all lower order services in the faulty tunnel are lost, but are not interrupted due to the protection of the associated tunnel.
Scenario 2: For normal silver tunnels, all lower order services can be queried normally, but all the services report alarms and are interrupted. For associated silver tunnels, all lower order services in the faulty tunnel report alarms, but are not interrupted due to the protection of the associated tunnel.
Identification method:
Scenario 1: If all the following conditions are met, lower order services are lost.
1. The NE version is NG-SDH V100R008C02SPC200 (5.xx.18.50P01) or any earlier V100R008C01 or V100R008C02 version.
2. Residual lower order static cross-connections fail to access the ASON tunnels, but exist in other tunnels.
3. A “vector is empty,but state is verified!” message exists in the asonlog.txt log file at the corresponding time.
To identify abnormal records in the asonlog.txt log file:
1. Upload mfs/log/asonlog.txt and ofs2/log/asonlog*.gz to the PC using the data collection tool.
2. Open the asonlog.txt log that covers the time of the fault and search for the “vector is empty,but state is verified!” message. If the message is recorded in the log, the problem has occurred. A sample excerpt follows:
50 ms TimeOut Calling SendAVerifyToFcl:Tunnel.cfgRc=0 6- 2 20: 1:35
fcl_ason_init.cpp,533 Current State:VERIFYING 6- 2 20: 1:35
Call Verify by 1006,Result=0x0,rbyRspFlag=1,Rud=0x121E006 6- 2 20: 1:50
Call PostConfirm by 1006,Result=0x0,rbyRspFlag=2 6- 2 20: 2:14
mo_Ne.cpp,1621 Current State:VERIFIED 6- 2 20: 2:14
vector is empty,but state is verified! 6- 2 20: 2:14
Rsp To ASON.wCmd=0x891A,CmdIdx=0x0,Len=608, Para=1A, 1, 0, D, 0, 1, 1, 1,1A, 1, 0, D, 0, 1, 1, 0,1A, 1, 0, E, 0, 1, 1, 1,1A, 1, 0, E, 0, 1, 1, 0,1C, 1, 0,3F, 0, 1, 1, 1,1C, 1, 0,3D, 0, 1, 1, 1,1C, 1, 0,3D, 0, 1, 1, 0,1C, 1, 0,3B, 0, 1, 1, 1,1C, 1, 0, 6- 2 20: 2:14
Rsp To ASON.wCmd=0x8A00,CmdIdx=0x4112,Len=2, Para= 0, 0, 6- 2 20: 2:41
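The log search in step 2 can also be scripted once the files are on the PC. The following is a minimal sketch, not a Huawei tool: the function names `read_log_lines` and `find_marker` are hypothetical, the timestamp pattern is inferred only from the excerpt above (month-day hour:minute:second at the end of each line), and UTF-8 decoding with replacement is an assumption about the file encoding.

```python
import gzip
import re

MARKER = "vector is empty,but state is verified!"
# Trailing timestamp as seen in the excerpt, e.g. "6- 2 20: 2:14" (assumed layout).
STAMP = re.compile(r"(\d{1,2})-\s*(\d{1,2})\s+(\d{1,2}):\s*(\d{1,2}):\s*(\d{1,2})\s*$")


def read_log_lines(path):
    """Yield text lines from a plain or gzip-compressed log file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            yield line.rstrip("\n")


def find_marker(lines, marker=MARKER):
    """Return (line_number, (month, day, hour, minute, second) or None) per hit."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if marker in line:
            match = STAMP.search(line)
            stamp = tuple(int(g) for g in match.groups()) if match else None
            hits.append((number, stamp))
    return hits


# Lines copied from the excerpt above.
sample = [
    "mo_Ne.cpp,1621 Current State:VERIFIED 6- 2 20: 2:14",
    "vector is empty,but state is verified! 6- 2 20: 2:14",
    "Rsp To ASON.wCmd=0x8A00,CmdIdx=0x4112,Len=2, Para= 0, 0, 6- 2 20: 2:41",
]
print(find_marker(sample))  # [(2, (6, 2, 20, 2, 14))]
```

To scan a collected file instead of the sample, pass `read_log_lines(...)` with the local path of the uploaded asonlog.txt or asonlog*.gz file to `find_marker`. A hit only satisfies condition 3; conditions 1 and 2 still have to be checked separately.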
Scenario 2: If all the following conditions are met, tunnel services that fail to be rerouted are interrupted.
1. The NE version is NG-SDH V100R008C02SPC200 (5.xx.18.50P01) or any earlier V100R008C01 or V100R008C02 version.
2. The tunnel services are normal according to the query result on the NMS, and the logical layer is correctly configured.
3. A “tunnel rollback” message exists in the asonlog.txt log file at the corresponding time.
To identify abnormal records in the asonlog.txt log file:
1. Upload mfs/log/asonlog.txt and ofs2/log/asonlog*.gz to the PC using the data collection tool.
2. Open the asonlog.txt log that covers the time of the fault and search for the “tunnel rollback” message. If the message is recorded in the log, the problem has occurred. A sample excerpt follows:
From=0x8949,CmdIdx=0x7E,Len=10, Para= B, 1, 0,1D, 8, 1, 0,20, 0, 1, 3-23 9:23: 3
From=0x8949,CmdIdx=0x82,Len=10, Para= 8, 1, 0,20, B, 1, 0,1E, 0, 1, 3-23 9:23: 3
From=0x8949,CmdIdx=0x85,Len=10, Para= 8, 1, 0, 8, B, 1, 0,1D, 0, 1, 3-23 9:23: 3
From=0x8949,CmdIdx=0x88,Len=10, Para= 8, 1, 0, D, B, 1, 0,13, 0, 1, 3-23 9:23: 3
From=0x8949,CmdIdx=0x8B,Len=10, Para= 8, 1, 0,18, B, 1, 0,11, 0, 1, 3-23 9:23: 3
From=0x8949,CmdIdx=0x8E,Len=10, Para= B, 1, 0,1E, 8, 1, 0,1D, 0, 1, 3-23 9:23: 3
From=0x8949,CmdIdx=0x96,Len=10, Para= 8, 1, 0, 3, B, 1, 0,10, 0, 1, 3-23 9:23: 3
50 ms TimeOut Calling SendAVerifyToFcl:Tunnel.cfgRc=20494 3-23 9:23: 3
tunnel rollback 13,19 3-23 9:23: 3
tunnel rollback 24,17 3-23 9:23: 3
tunnel rollback 30,29 3-23 9:23: 3
tunnel rollback 3,16 3-23 9:23: 3
Rsp To ASON.wCmd=0x8949,CmdIdx=0x7E,Len=2, Para=97,5B, 3-23 9:23: 4
Rsp To ASON.wCmd=0x8949,CmdIdx=0x82,Len=2, Para=97,5B, 3-23 9:23: 4
Rsp To ASON.wCmd=0x8949,CmdIdx=0x85,Len=2, Para=97,5B, 3-23 9:23: 4
Rsp To ASON.wCmd=0x8949,CmdIdx=0x88,Len=2, Para=50, E, 3-23 9:23: 4
Rsp To ASON.wCmd=0x8949,CmdIdx=0x8B,Len=2, Para=50, E, 3-23 9:23: 4
Rsp To ASON.wCmd=0x8949,CmdIdx=0x8E,Len=2, Para=50, E, 3-23 9:23: 4
Rsp To ASON.wCmd=0x8949,CmdIdx=0x96,Len=2, Para=50, E, 3-23 9:23: 4
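When many log files have to be checked, the “tunnel rollback” records can be extracted with a short script. This is a minimal sketch under stated assumptions: the function name `rollback_records` is hypothetical, the line layout is taken only from the excerpt above, and the meaning of the two numbers after “tunnel rollback” is not given in this notice, so the script keeps them as an uninterpreted pair.

```python
import re

# Layout assumed from the excerpt: "tunnel rollback <n>,<n> <timestamp text>".
ROLLBACK = re.compile(r"tunnel rollback\s+(\d+),\s*(\d+)\s+(.*\S)\s*$")


def rollback_records(lines):
    """Return ((first, second), timestamp_text) for each rollback line found."""
    records = []
    for line in lines:
        match = ROLLBACK.search(line)
        if match:
            pair = (int(match.group(1)), int(match.group(2)))
            records.append((pair, match.group(3)))
    return records


# Lines copied from the excerpt above.
sample = [
    "50 ms TimeOut Calling SendAVerifyToFcl:Tunnel.cfgRc=20494 3-23 9:23: 3",
    "tunnel rollback 13,19 3-23 9:23: 3",
    "tunnel rollback 3,16 3-23 9:23: 3",
    "Rsp To ASON.wCmd=0x8949,CmdIdx=0x7E,Len=2, Para=97,5B, 3-23 9:23: 4",
]
for pair, stamp in rollback_records(sample):
    print(pair, stamp)
```

An empty result means the log does not contain the message for the period it covers. A non-empty result only satisfies condition 3; the NE version and the NMS query result still have to be checked.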
[Root Cause]
Scenario 1: When a large number of ASON tunnel services are rerouted, reverted, or optimized, the NE processes the tunnel commands and buffers the processing results in the message queue. The buffering duration is 20 seconds, after which the timeout management module clears the command package from the buffer. Because rerouting, reversion, and optimization of tunnel services are complex and take a long time, the response message may already have been cleared by the time it is retrieved, so the response to the tunnel rerouting command is lost. When the wait for the response message times out, the ASON tunnel considers that the NE failed to process the tunnel commands. In fact, the tunnel services have already been transferred on the NE, and this mismatch causes the loss of lower order services.
Scenario 2: When ASON tunnel rerouting fails due to an abnormality, the NE performs a rollback. Due to defects of the tunnel rollback processing mechanism, the tunnel rerouting is successful in the cross-connect matrix, but fails at the configuration layer. That is, the logical configurations of the tunnel services are different from algorithm-side configurations, causing service interruption.
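The timing problem behind scenario 1 can be illustrated with a toy model. This is only a sketch of the mechanism described above, not the NE software: the class `ResponseBuffer`, its methods, and the command identifiers are hypothetical, and only the 20-second buffering duration comes from this notice.

```python
BUFFER_SECONDS = 20  # buffering duration stated in the root cause description


class ResponseBuffer:
    """Toy model: hold command results and drop them once they are too old."""

    def __init__(self, ttl=BUFFER_SECONDS):
        self.ttl = ttl
        self._items = {}  # command id -> (result, time stored)

    def put(self, command_id, result, now):
        """Buffer the processing result of a command at time `now` (seconds)."""
        self._items[command_id] = (result, now)

    def sweep(self, now):
        """Play the role of the timeout module: clear entries held for ttl or longer."""
        expired = [cid for cid, (_, stored) in self._items.items()
                   if now - stored >= self.ttl]
        for cid in expired:
            del self._items[cid]

    def take(self, command_id, now):
        """Return the buffered result, or None if it has already been cleared."""
        self.sweep(now)
        entry = self._items.pop(command_id, None)
        return entry[0] if entry else None


buffer = ResponseBuffer()
buffer.put("reroute-1", "OK", now=0)  # both commands succeed at t = 0 s
buffer.put("reroute-2", "OK", now=0)
fetched_in_time = buffer.take("reroute-1", now=5)  # read within 20 s
fetched_late = buffer.take("reroute-2", now=25)    # read after the buffer was cleared
print(fetched_in_time, fetched_late)  # OK None
```

In the model, the second command succeeded but its result is gone by the time it is read, so the caller sees a failure for an operation that was actually applied. This is the same kind of mismatch that the scenario 1 root cause describes.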
[Impact and Risk]
Scenario 1: For normal silver tunnels, all lower order services are lost and interrupted during rerouting, route reversion, or route optimization. For associated silver tunnels, all lower order services in the faulty tunnel are lost, but are not interrupted due to the protection of the associated tunnel.
Scenario 2: For normal silver tunnels, all lower order services can be queried normally, but all the services report alarms and are interrupted. For associated silver tunnels, all lower order services in the faulty tunnel report alarms, but are not interrupted due to the protection of the associated tunnel.
[Measures and Solutions]
Recovery measures:
Scenario 1: Delete the existing server trails and then re-create the ASON server trails by specifying lower order service timeslots at the source and sink nodes. Alternatively, use the ASONDOCTOR tool to generate the configuration restoration script.
Scenario 2: Warm reset the active system control board and then warm reset the active and standby cross-connect boards.
Workarounds:
For scenario 1 and scenario 2, change the silver tunnels to associated silver tunnels to avoid service interruption.
Preventive measures:
For scenario 1 and scenario 2, upgrade the NE to V100R008C02SPC200 and install the V100R008C02SPH201 or later hot patches; or upgrade the NE to a version later than V100R008C02SPC200.
Material handling after replacement:
N/A
[Inspector Applicable or Not]
N/A
[Rectification Scope and Time Requirements]
N/A
[Rectification Instructions]
N/A
[Attachment]
None