Abnormal traffic when moving the EoS service from OSN3500 to OSN8800

This article shares a case of abnormal traffic seen when moving the EoS service from OSN3500 to OSN8800.

Issue description

When the EoS service was cut over from OSN3500 to OSN8800, the traffic became much higher.

In the cutover below, when the EoS traffic was moved from OSN3500 to OSN8800, some traffic became much higher. (The fibers marked in red are the fibers to be moved. Different colors mean different equipment: blue and green are the two OSN8800s, purple and red are the two OSN3500s, and yellow is third-party equipment.)

Handling Process

Step 1: Inside the EGSH there is an ELINE between the VCTRUNK and the ports; normally this part would not cause any traffic abnormality. We then checked the real-time traffic of the HUNQ2 ports in the blue OSN8800 and the real-time traffic of the EG16, and found that the traffic coming from HUNQ2 was much less than the traffic going out from EG16. So the traffic was being broadcast in the ELAN between HUNQ2 and EG16.

Step 2: No other parameter seemed related to a potential traffic broadcast. Because there is a sharing LAG between the HUNQ2 boards, we decided to test the LAG first: change the LAG mode, then check the real-time traffic 5 minutes after the change.

Step 3: The LAG mode clearly affected the traffic.

Root cause

The root cause lies in a detail of the OSN8800 LAG working mechanism. On OSN9800, OSN1800V or OSN3500, the forwarding table of an ELAN service is normally shared between the LAG ports, but on OSN8800 the forwarding table is not shared between LAG ports on different boards.

To understand how this happens, consider the example below:

Rule 1: The ELAN forwarding table learns from the source MAC (S-MAC) and forwards based on the destination MAC (D-MAC), and the forwarding table is saved in the downstream boards.

Rule 2: The ELAN only knows the HUNQ2 LAG group as one logical port. It does not know the main port or the standby port of the LAG. When traffic is assigned to this LAG group, an algorithm decides whether it goes to the main port or the standby port.

When everything is normal, the forwarding table of each downstream board looks as below: the X1-Y1 pair and the X2-Y2 pair go over the main path, and the X3-Y3 pair goes over the standby path.

So how did the problem happen? The problem is in the LAG algorithm. Normally, when the LAG algorithm assigns the traffic from MAC-Y2 to MAC-X2 to the main port of the LAG, it should assign the traffic from MAC-X2 to MAC-Y2 to the main port too.
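The expected symmetric behavior can be illustrated with a minimal sketch. It assumes the member port is chosen from an XOR of the source and destination MAC, which is one common way to make the choice direction-independent; the actual OSN8800 algorithm is not described in this case, and the function names and MAC values here are hypothetical.

```python
from typing import Sequence


def mac_to_int(mac: str) -> int:
    """Convert a MAC address string such as 00:00:00:00:00:a2 to an integer."""
    return int(mac.replace(":", "").replace("-", ""), 16)


def select_member(src_mac: str, dst_mac: str, members: Sequence[str]) -> str:
    """Pick a LAG member port for a flow (hypothetical symmetric hash).

    XOR is commutative, so the flow A->B and the flow B->A produce the
    same key and therefore land on the same member port.
    """
    key = mac_to_int(src_mac) ^ mac_to_int(dst_mac)
    return members[key % len(members)]


members = ["main", "standby"]
x2 = "00:00:00:00:00:a2"  # hypothetical address standing in for MAC-X2
y2 = "00:00:00:00:00:b2"  # hypothetical address standing in for MAC-Y2

forward = select_member(x2, y2, members)  # X2 -> Y2
reverse = select_member(y2, x2, members)  # Y2 -> X2
print(forward, reverse)  # both directions use the same member port
```

With a selection like this, both directions of the X2-Y2 pair always cross the same board, which is the property the per-board forwarding tables depend on.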

However, the LAG algorithm of OSN8800, in sharing LAG with the source-and-destination-MAC mode, may malfunction: it may assign the traffic from MAC-Y2 to MAC-X2 to the main port while the peer side assigns the traffic from MAC-X2 to MAC-Y2 to the standby port.

For example, if the assignment of the traffic from MAC-X2 to MAC-Y2 goes wrong, the forwarding table on the router side becomes as below, while the forwarding table on the MW side remains the same. Traffic of this pair, X2-Y2, is then broadcast on both sides, because a matching entry can never be found in the forwarding table. This is how the traffic broadcast happens.
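The mechanism can be reproduced with a small simulation. This is only a simplified model under stated assumptions, not the real OSN8800 implementation: each LAG member board learns the S-MAC of frames it sends into the LAG, and a frame arriving from the peer is looked up only in the table of the board it arrives on. The class and function names are hypothetical.

```python
class LagNode:
    """One node with a two-member LAG (simplified, hypothetical model)."""

    def __init__(self, shared_table: bool = False):
        if shared_table:
            # Models equipment where the LAG members share one table.
            table = set()
            self.tables = {"main": table, "standby": table}
        else:
            # Models per-board tables that are not shared between boards.
            self.tables = {"main": set(), "standby": set()}

    def send_to_lag(self, src_mac: str, member: str) -> None:
        # The frame crosses the selected member board, which learns the S-MAC.
        self.tables[member].add(src_mac)

    def receive_from_lag(self, dst_mac: str, member: str) -> str:
        # The D-MAC is looked up only in the table of the receiving board.
        return "unicast" if dst_mac in self.tables[member] else "flood"


def run(member_x_to_y: str, member_y_to_x: str, shared_table: bool = False):
    """Exchange a few X2<->Y2 frames and record how each one is forwarded."""
    router_side = LagNode(shared_table)
    mw_side = LagNode(shared_table)
    results = []
    for _ in range(3):
        router_side.send_to_lag("X2", member_x_to_y)
        results.append(mw_side.receive_from_lag("Y2", member_x_to_y))
        mw_side.send_to_lag("Y2", member_y_to_x)
        results.append(router_side.receive_from_lag("X2", member_y_to_x))
    return results


print("symmetric:          ", run("main", "main"))
print("asymmetric:         ", run("standby", "main"))
print("asymmetric, shared: ", run("standby", "main", shared_table=True))
```

In this model the symmetric case floods only the first frame before learning completes, the asymmetric case with per-board tables never stops flooding, and the asymmetric case with a shared table recovers after the first frame. That matches the difference described above between OSN8800 and the equipment that shares the table between LAG ports.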

Solution

Change the LAG mode from sharing to non-sharing, or keep the LAG member ports on the same board.

Summary

Do not use sharing LAG with the source-and-destination-MAC mode on OSN8800. Other modes are fine. Here we used non-sharing.
