How to Troubleshoot High CPU Usage While Passing High-Load Data Traffic Through an S6700


When 9 Gbps of traffic, entering on a single port, is passed through an S6700 switch that is part of an RRPP ring, CPU usage rises to 60%. When the traffic is stopped, CPU usage drops back to its normal values.

This is a big problem because data traffic is supposed to be forwarded in hardware by the ASIC, without involving the CPU. So why is the CPU at 60%? Is that normal for this amount of traffic coming from a single port? Of course not, so what is happening here?

Handling Process

I analyzed the tasks handled by the CPU using display cpu-usage (only the most representative tasks are shown here):

bcmL2MOD.0   17%   0/8285f38d   tS16                            // Task that handles MAC address flapping
bcmCNTR.0     8%   0/40c54910   tS17
PPI           7%   0/36d7ee7e   PPI Product Process Interface   // Adaptation-layer task that maintains chip interface status
bmLINK.0      2%   0/160654ff   tS1a                            // Scans port status and notifies the application modules of status changes
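
When the full task list is long, it can help to rank the tasks by CPU share offline. The following is a minimal Python sketch, assuming the output has been saved as text and that each task line has the shape "name, percentage, runtime" seen in the excerpt above. The function name top_tasks and the regular expression are illustrative and not part of any Huawei tool.

```python
import re

# Assumed line shape (taken from the excerpt above): <task> <cpu>% <runtime> ...
TASK_LINE = re.compile(r"^(\S+)\s+(\d+)%\s+(\S+)")


def top_tasks(output, limit=3):
    """Return (task, cpu_percent) pairs sorted by CPU share, highest first."""
    tasks = []
    for line in output.splitlines():
        match = TASK_LINE.match(line.strip())
        if match:
            tasks.append((match.group(1), int(match.group(2))))
    tasks.sort(key=lambda item: item[1], reverse=True)
    return tasks[:limit]


SAMPLE = """\
bcmL2MOD.0           17%         0/8285f38d       tS16
bcmCNTR.0             8%         0/40c54910       tS17
PPI                   7%         0/36d7ee7e       PPI Product Process Interface
bmLINK.0              2%         0/160654ff       tS1a
"""

for name, cpu in top_tasks(SAMPLE):
    print(f"{name}: {cpu}%")
```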

So the switch is seeing MAC address flapping from the RRPP ring? A useful discovery, but there is no traffic in the ring other than the test traffic.
Are the two related? The customer had an unusual way of testing high-load traffic: the traffic generator / measuring device was plugged into one port (e.g. xg0/0/13) and loopback internal was configured on the egress port (e.g. xg0/0/14), so every frame sent to the egress port came back in through that same port.
How does this explain the high CPU usage? It is quite simple: the frames sent by the traffic generator all carry the same source MAC address, and they arrive on both xg0/0/13 and, once looped back, xg0/0/14. The switch sees the same MAC address moving between the two ports and processes it as MAC address flapping, which is what causes the high CPU usage. The output of display mac-address flapping record confirms this:

["TES1-EQI1.192"]display mac-address flapping record
S  : start time
E  : end time
(Q) : quit vlan
(D) : error down
Move-Time             VLAN MAC-Address   Original-Port   Move-Ports      MoveNum
S:2014-10-03 00:09:20 1818 cafe-beef-cafe XGE0/0/13       XGE0/0/14       35186
E:2014-10-03 00:09:53
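
To put this record in perspective: 35186 moves in the 33 seconds between the start and end times is roughly 1066 MAC moves per second, which is consistent with the load seen on the bcmL2MOD.0 task. Below is a small Python sketch of that arithmetic. It assumes that MoveNum counts the moves recorded between the S: and E: timestamps; flap_rate is a hypothetical helper name.

```python
from datetime import datetime


def flap_rate(start, end, move_num):
    """Average MAC moves per second between the S: and E: timestamps."""
    fmt = "%Y-%m-%d %H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    seconds = delta.total_seconds()
    if seconds <= 0:
        raise ValueError("end time must be later than start time")
    return move_num / seconds


# Values copied from the flapping record above.
rate = flap_rate("2014-10-03 00:09:20", "2014-10-03 00:09:53", 35186)
print(f"{rate:.0f} moves per second")  # 1066 moves per second
```
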

Root Causes

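The customer's test method: to pass high-load traffic through the S6700 switch, the loopback internal command was enabled on the egress port. The looped-back frames carried the same source MAC address as the incoming test traffic, so the switch detected continuous MAC address flapping between the two ports.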
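
The mechanism can be illustrated with a deliberately simplified Python model of source-MAC learning. This is only a sketch: real learning happens in the switch hardware and involves aging and VLANs, and the name count_mac_moves is hypothetical. It shows why a single source MAC address seen alternately on two ports produces a move for almost every frame, while the same traffic without a loopback produces none.

```python
def count_mac_moves(frames):
    """Count how often a source MAC is relearned on a different port.

    frames: iterable of (ingress_port, source_mac) tuples in arrival order.
    """
    mac_table = {}  # source MAC -> port it was last learned on
    moves = 0
    for port, mac in frames:
        learned = mac_table.get(mac)
        if learned is not None and learned != port:
            moves += 1  # same MAC seen on another port: a MAC move
        mac_table[mac] = port
    return moves


# A generator frame enters on XGE0/0/13, then returns via the looped-back XGE0/0/14.
looped = [("XGE0/0/13", "cafe-beef-cafe"), ("XGE0/0/14", "cafe-beef-cafe")] * 5
# The same traffic without the loopback: the MAC is only ever seen on one port.
direct = [("XGE0/0/13", "cafe-beef-cafe")] * 10

print(count_mac_moves(looped))  # 9
print(count_mac_moves(direct))  # 0
```

In this model every frame after the first triggers a move in the looped case, so the number of moves grows with the frame rate of the test traffic.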


Please note: do not use loopbacks when testing traffic performance with traffic sourced from a measuring device / traffic generator.