[Problem Description]
Trigger conditions:
Two lower order cross-connection protection groups are configured on an OptiX OSN 9500. If any of the following operations is performed, the problem occurs.
1. Configure lower order services, or convert ordinary lower order services to lower order SNCP.
2. Run commands or perform operations on the NMS to adjust concatenated timeslots.
3. Download configurations from the NMS.
Symptoms:
1. Lower order services fail to be configured. The error code “38735” is returned, indicating that the verification services on the matrix algorithm submodule fail to be connected.
2. After service configuration, the system control board may be reset.
3. The system control board is frequently reset when concatenated timeslots are adjusted on the NMS or by running commands.
Identification method:
OptiX OSN 9500 NEs are configured with two lower order cross-connection protection groups. Note that only OSN 9500 NEs with a V100R6 version or later can be configured with two lower order cross-connection protection groups.
Scenario 1:
Lower order services fail to be configured. The error code “38735” is returned, indicating that the verification services on the matrix algorithm submodule fail to be connected. Alternatively, the system control board may be warm reset.
Scenario 2:
The system control board is frequently reset when concatenated timeslots are adjusted on the NMS or by running commands.
[Root Cause]
During concatenated timeslot adjustment, higher order concatenation needs to be established between the two cross-connection protection groups so that concatenated lower order TU timeslots can be searched for and allocated to lower order cross-connections. The algorithm for allocating concatenated lower order timeslots is defective. In some specific scenarios it allocates invalid lower order timeslots, which leads to array index out-of-bounds errors and memory access exceptions during subsequent allocation. As a result, NEs are abnormally reset.
The service configuration failure that occurs after configurations are downloaded from the NMS is caused by the same defect in the algorithm for allocating concatenated lower order TU timeslots. Invalid TU timeslots are allocated, and they fail the lower order algorithm verification when they are used as service sinks.
[Impact and Risk]
1. Lower order services may fail to be configured. There is a low probability that the system control boards are reset.
2. Lower order resources cannot be optimized by adjusting concatenated timeslots. Otherwise, NEs may be frequently reset.
3. Configurations may fail to be downloaded from the NMS.
4. If an NE is faulty, the system control board may fail to be started after warm reset or active/standby switching.
5. This problem does not affect:
- Operations such as rerouting and optimization after lower order services are transmitted to ASON tunnels.
- SNCP switching.
- MSP switching.
- Active/Standby switching of cross-connect boards.
[Measures and Solutions]
Recovery measures:
1. If new services fail to be configured, select other timeslots or change the sequence of service configurations.
2. If the NE is frequently reset after concatenated timeslots are adjusted, restore services by restoring the NE database package.
Workarounds:
1. Do not adjust concatenated timeslots on the live network.
2. Avoid configuring lower order services in batches before the problem is resolved. If lower order services must be configured, prepare service configuration rules based on the service configuration on the live network so that new services do not cross the two protection groups. For details about configuration rules, see the attachment Guide to Sorting Service Configurations.
3. Back up databases periodically. Back up databases immediately after any lower order service configuration. These methods ensure that a faulty NE can be restored by restoring databases.
4. Check the backup database. If any exception is found, restore the database.
Use the following method to check the database. (Overseas offices should ask R&D for help. R&D will provide the tool.)
1. Open the cfglxc.dbf file with a DBF file reader.
2. Export data from the database.
3. Save the exported data to an Excel spreadsheet.
4. Check the HXCIDCC and CCTUID columns. For each record, the values in the two columns should be either both 0 or both non-zero. If one value is 0 and the other is non-zero, the database may have exceptions. Request R&D engineers to analyze the database.
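The column check above can be scripted once the data has been exported. The following is a minimal sketch, not an official tool. It assumes that the exported data has been saved as CSV text with a header row containing the column names HXCIDCC and CCTUID, that blank cells count as 0, and that the intended rule is that both values in a record are 0 or both are non-zero. The function name and the sample data are hypothetical.

```python
import csv
import io


def find_inconsistent_rows(csv_text):
    """Return (row_number, HXCIDCC, CCTUID) for each record in which
    exactly one of the two values is 0 (assumed to indicate an exception)."""
    suspicious = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Row 1 is the header, so data rows are numbered from 2.
    for row_number, row in enumerate(reader, start=2):
        # Assumption: a blank cell is treated as 0.
        hxc = int(row["HXCIDCC"].strip() or 0)
        tu = int(row["CCTUID"].strip() or 0)
        # Flag the record when one value is 0 and the other is not.
        if (hxc == 0) != (tu == 0):
            suspicious.append((row_number, hxc, tu))
    return suspicious


# Hypothetical sample export, for illustration only.
sample = """HXCIDCC,CCTUID
0,0
5,12
0,7
3,0
"""

if __name__ == "__main__":
    for row_number, hxc, tu in find_inconsistent_rows(sample):
        print(f"Row {row_number}: HXCIDCC={hxc}, CCTUID={tu}")
```

Any record reported by such a script would still need to be analyzed by R&D engineers. The script only narrows down which records to send.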
Preventive measures:
1. Upgrade the OSN 9500 NEs involved to V100R006C05SPC200 or a later version.
2. Install hot patch R006C03SPH203 on these OSN 9500 NEs. This patch will be released in March 2013.
[Rectification Instructions]
For details about how to upgrade software, refer to the related software upgrade guide.