Summary: Because of a timing control problem in the latch chip, the latch output level is random when the board is powered on. This can burn out the triodes on the board and cause abnormal switching between the working and protection channels of the board.
[Problem Description]
Trigger conditions:
The problem may be triggered when the board is powered on. The probability of occurrence is low.
Symptom:
(1) A protection switching failure occurs on a board;
(2) The NMS reports that the working channel remains the same after a protection group switching command is issued on the NMS.
The fault is confirmed when both symptoms are present.
Identification method:
A board has the triode burning fault if its working channel remains the same after a protection group switching command is issued on the NMS.
See the attached Identification Method of Triode Fault on the DCP and OLP Boards.
[Root Cause]
The latch chip that controls the status of the optical switch has a timing control problem, so its output level is random when the board is powered on. As a result, there is a probability that the upper and lower triodes of the optical switch drive circuit are both conducting at the same time, which damages the triodes after they have operated for some time. The damage causes a sudden increase in the current through the triodes, which burns them out and consequently leads to abnormal switching between the working and protection channels of the board.
A switching test can resolve the abnormal level status of the latch chip.
[Impact and Risk]
The optical switch fails to switch and the service is interrupted.
[Measures and Solutions]
Recovery measures:
Replace the board.
Preventive measure:
No preventive measures need to be taken.
Solutions:
1. Upgrade the logic.
Upgrade NEs to OptiX OSN 6800 V100R006C01SPC500/OptiX OSN 8800 V100R006C01SPC500, OptiX OSN 6800 V100R006C03SPC500/OptiX OSN 8800 V100R006C03SPC500, or later.
2. Boards produced after May 22, 2012 are free of this problem.
Live network measures:
1. Use a preventive maintenance tool to perform preventive maintenance. Focus on the check result for this prewarning issue.
2. If errors occur during preventive maintenance with the preventive maintenance tool, manually identify the boards that have this prewarning issue.
a. Identification methods:
- Run the :cfg-get-bdverinfo:$bid command to query whether the PCB version of the OLP/DCP board on an NE is the F version.
- If the PCB is of the F version, run the :ops-get-oppsbdmap:0,0 command to query whether the OLP/DCP board has been configured with protection groups.
- If the OLP/DCP board is not configured with protection groups, the OLP/DCP board is considered a risky board.
- If the OLP/DCP board has been configured with protection groups, run the :ops-get-oppsevtlog:$opid command to query switching logs of protection groups. If two SWITCH_CHANNEL records exist at the same time, the OLP/DCP board may have the prewarning issue.
b. Handling suggestions:
- For risky boards, configure protection groups on the boards in a timely manner. Then apply for a service interruption window and conduct the switching test twice. If protection switching fails, replace the board.
- For boards that may have this prewarning issue, apply for a service interruption window and conduct the switching test twice. If protection switching fails, replace the board.
3. Conduct the switching test twice each time the board is powered on.
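The manual identification steps and handling suggestions in measure 2 can be sketched as a small decision function. This is only an illustrative sketch, not a tool shipped with the product. The names BoardStatus, classify_board, and recommended_action are hypothetical. The function does not run or parse the NE commands; it assumes an operator has already read the PCB version letter, the protection group configuration, and the times of the SWITCH_CHANNEL records from the command output. It also assumes that "two SWITCH_CHANNEL records exist at the same time" means two records that carry the same time value.

```python
from collections import Counter
from enum import Enum
from typing import Sequence


class BoardStatus(Enum):
    """Hypothetical labels for the outcomes described in measure 2."""
    NOT_IN_SCOPE = "PCB is not of the F version"
    RISKY = "risky board (no protection group configured)"
    SUSPECT = "may have the prewarning issue"
    NO_INDICATION = "no indication in the switching logs"


def classify_board(pcb_version: str,
                   has_protection_group: bool,
                   switch_channel_times: Sequence[str]) -> BoardStatus:
    """Apply the identification steps to values read manually from the NE.

    pcb_version: version letter read from the :cfg-get-bdverinfo output.
    has_protection_group: result of checking the :ops-get-oppsbdmap output.
    switch_channel_times: time of each SWITCH_CHANNEL record taken from the
        :ops-get-oppsevtlog output (any consistent string form; assumption).
    """
    # Step 1: only boards with an F-version PCB are checked further.
    if pcb_version.strip().upper() != "F":
        return BoardStatus.NOT_IN_SCOPE
    # Step 2: an F-version board without protection groups is a risky board.
    if not has_protection_group:
        return BoardStatus.RISKY
    # Step 3: two SWITCH_CHANNEL records with the same time mark a suspect board.
    counts = Counter(switch_channel_times)
    if any(n >= 2 for n in counts.values()):
        return BoardStatus.SUSPECT
    return BoardStatus.NO_INDICATION


def recommended_action(status: BoardStatus) -> str:
    """Map a status to the handling suggestion given in this notice."""
    test = ("apply for a service interruption window, conduct the switching "
            "test twice, and replace the board if protection switching fails")
    if status is BoardStatus.RISKY:
        return "configure protection groups, then " + test
    if status is BoardStatus.SUSPECT:
        return test
    return "no action from measure 2"


if __name__ == "__main__":
    status = classify_board("F", True, ["t1", "t1", "t2"])
    print(status.value, "->", recommended_action(status))
```

The check is kept as a pure function over already-collected values so that it makes no assumption about the format of the command output.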
Material handling after replacement:
Send the materials back to the producing sector for reverse maintenance.
[Rectification Scope and Time Requirements]
No rectification scope and time requirements are involved.
[Rectification Instructions]
No rectification instructions are involved.