Wednesday, April 25, 2018 9:27 am

Debug like a Warrior. Part II


At NetBurner we do our best to deliver a turnkey experience that is easy, fast and as seamless as possible. Yet there are times when you need to sharpen your sword and drive into battle against hordes of… Bugs! This doesn’t happen in most simple out-of-the-box applications; however, one of the advantages of NetBurner is that we enable cavalier and experienced engineers to develop and tailor the embedded software for customized functionality. This can sometimes introduce less predictable, difficult-to-eradicate bugs, gremlins, goblins or demons. In this sequel, we’ll pick up with a few more techniques that will further train and empower you to recognize and exterminate those digital foes like a true warrior. If you missed the first part of this article, you can catch up here.

“A warrior must only take care that his spirit is never broken.”---Chozan Shissai

The Network Died, Now What?

[Flowchart: The network died, now what? Decision points: Does it work on start up? Can you find the NetBurner with IPSETUP? Has it died? Out of buffers? Is the network address & mask correct? Outcomes: diagnose buffer loss, fix network address/mask issues, suspect a hardware or firewall issue, look at tasks, or run the specific failure flowchart.]

Having your network die is like a warhorse lying down just as you prepare to storm into battle -- sword drawn. Let’s sort this dire situation out like a warrior. The very first step is to figure out what died. Here’s a handy block flow diagram to get you through this. The subsequent sections address some of the less obvious blocks in the diagram.

Out of Buffers

The entire network stack works by passing around pre-allocated buffers. This is a limited resource, and misbehaving code can use it up. To check the buffer pool, add the following debugging code, which reports the free buffer count. It can go in a number of different places, depending on the structure of the application. One option is to put it inside the infinite loop in UserMain() and use a timer to call it every so often.

#include 
// Exposes the function:   WORD GetFreeCount();
// Then, somewhere in your code (for example, the loop in UserMain()):
iprintf("Free buffers = %d\r\n", (int)GetFreeCount());

Now run this and carefully watch the messages. If this number decreases every time the module reports it, and the system stops responding when it reaches zero... DING! This is your issue.
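
Watching the console by eye works, but the "decreasing every time" check is easy to automate. The sketch below is a hypothetical helper (BufferTrendMonitor is not part of the NetBurner API) that keeps the last few readings and flags a suspected leak when every reading in the window is lower than the one before it. It assumes readings are taken at a steady interval; normal traffic makes the count move up and down, so a strictly falling window is only a heuristic, and the window size is yours to tune.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical helper (not NetBurner API): remembers the last `window`
// free-buffer readings and reports a suspected leak when the count fell
// at every step across the whole window.
class BufferTrendMonitor {
public:
    explicit BufferTrendMonitor(std::size_t window) : window_(window) {
        assert(window >= 2);  // at least two samples are needed to see a trend
    }

    // Record one reading, e.g. the value returned by GetFreeCount().
    void AddSample(uint16_t freeCount) {
        samples_.push_back(freeCount);
        if (samples_.size() > window_) {
            samples_.pop_front();  // keep only the most recent readings
        }
    }

    // True only when the window is full and each reading is lower than the last.
    bool LeakSuspected() const {
        if (samples_.size() < window_) {
            return false;
        }
        for (std::size_t i = 1; i < samples_.size(); ++i) {
            if (samples_[i] >= samples_[i - 1]) {
                return false;
            }
        }
        return true;
    }

private:
    std::size_t window_;
    std::deque<uint16_t> samples_;
};
```

On the module, you might call AddSample(GetFreeCount()) from the same periodic spot in UserMain() and print a warning whenever LeakSuspected() returns true.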

Why buffers might go to zero

The primary cause of buffer loss (taking buffers from the available pool and failing to release them) is mishandled UDP sockets or UDP-registered FIFOs. If you set up a UDP socket to receive, or if you use the UDP object class and register a FIFO to receive UDP packets, then you must read those packets. If you don’t, unread UDP packets will accumulate in the UDP receive queue until you run out of buffers. Alternatively, you can exhaust all buffers with a UDP transmit loop that does not wait until each packet has completely gone out on the wire before creating another one.
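
To make the mechanism concrete, here is a toy model, not NetBurner code: one fixed pool of buffers shared by everything, and a UDP receive queue in which every unread packet holds on to a buffer. BufferPool and UdpRxQueue are hypothetical names chosen for illustration. Because the pool is shared, one neglected queue starves the rest of the stack, which is consistent with the whole system going unresponsive rather than just one socket.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>

// Toy model (hypothetical, not NetBurner code): a fixed number of
// pre-allocated buffers shared by every part of the "stack".
class BufferPool {
public:
    explicit BufferPool(std::size_t total) : free_(total) {}
    // Take one buffer; returns false when the pool is empty.
    bool Take() {
        if (free_ == 0) return false;
        --free_;
        return true;
    }
    void Give() { ++free_; }  // return one buffer to the pool
    std::size_t FreeCount() const { return free_; }

private:
    std::size_t free_;
};

// Toy UDP receive queue: each queued packet holds one buffer until read.
class UdpRxQueue {
public:
    explicit UdpRxQueue(BufferPool& pool) : pool_(pool) {}

    // Called by the "stack" when a packet arrives; dropped if no buffer is free.
    bool OnPacketArrived(int payload) {
        if (!pool_.Take()) return false;
        packets_.push(payload);
        return true;
    }

    // Called by the application; reading releases the packet's buffer.
    bool Read(int& payload) {
        if (packets_.empty()) return false;
        payload = packets_.front();
        packets_.pop();
        pool_.Give();
        return true;
    }

private:
    BufferPool& pool_;
    std::queue<int> packets_;
};
```

If the application never calls Read(), OnPacketArrived() starts failing as soon as the pool is empty; each Read() hands exactly one buffer back.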

A secondary, less common cause of buffer loss is trying to do I/O within a user-written interrupt routine. There are several things you may not do from within an interrupt routine, and calling any of the functions listed below inside one can get you into some misery:

  • All μC/OS critical section functions:
    • USER_ENTER_CRITICAL
    • USER_EXIT_CRITICAL
    • UCOS_ENTER_CRITICAL
    • UCOS_EXIT_CRITICAL
  • All μC/OS init, pend, and task-control functions (all OSxxPendNoWait functions are okay):
    • OSxxInit
    • OSxxPend
    • OSCritEnter
    • OSChangePrio
    • OSTaskDelete
    • OSLock
    • OSUnlock
    • OSTaskCreate
    • OSTimeDly
  • I/O Functions:
    • write
    • writeall
    • read
    • printf
    • fprintf
    • iprintf
    • scanf
    • gets
    • puts
  • Memory management functions:
    • malloc
    • free
    • new
    • delete

As you can see from the list above, trying to use any sort of I/O function inside an interrupt can cause a lot of headaches. One possible failure is interference with the buffer-handling queues, which prevents used buffers from being released.
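
The usual way around these restrictions is to do as little as possible in the interrupt and defer the real work, including any I/O, to a task. The portable sketch below makes no NetBurner calls: a std::atomic counter stands in for the interrupt-to-task hand-off, and FakeIsr() and ServiceEventsFromTask() are hypothetical names. On a NetBurner target the hand-off would typically be a semaphore, with OSSemPost() in the ISR and OSSemPend() in the task; post calls do not appear in the list above, but confirm in the NetBurner documentation which calls are interrupt-safe for your tools version.

```cpp
#include <atomic>
#include <cassert>
#include <cstdio>

// Hypothetical hand-off between "interrupt" and task context. On real
// hardware this would typically be a semaphore rather than a counter.
std::atomic<unsigned> g_pendingEvents{0};

// Interrupt side: no printf, no malloc, no pend calls. Just record the event.
void FakeIsr() {
    g_pendingEvents.fetch_add(1, std::memory_order_relaxed);
}

// Task side: runs in normal task context, where I/O is allowed.
// Atomically takes all pending events and returns how many were handled.
unsigned ServiceEventsFromTask() {
    unsigned n = g_pendingEvents.exchange(0, std::memory_order_acq_rel);
    for (unsigned i = 0; i < n; ++i) {
        std::printf("Handled interrupt event\r\n");
    }
    return n;
}
```

The design keeps the interrupt routine to a single lock-free operation; everything that touches buffers, consoles, or the heap happens later, in a task.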

From time to time, we have also found internal NetBurner bugs that lost packets in strange corner cases. And with the upcoming beta releases of our new WiFi and SSL drivers, it’s possible you have found a path to lose buffers that we did not. If this is the case, thank you! Please report it to NetBurner support.

Is the Network Address and mask correct?

The system can get its network addresses via DHCP, or it can use static settings. A full network address consists of four things:

  1. IP Address
  2. Network Mask
  3. Gateway address (only needed if you are routing to addresses off the local network)
  4. DNS address (only needed if you are addressing things by name, e.g. www.NetBurner.com instead of 34.199.141.96)

The validity of these settings is covered in detail in the NetBurner Programmer’s Guide.

Here’s an abridged version: the bitwise AND of the IP address and the network mask must equal the bitwise AND of the gateway address and the network mask. In other words:

(ip & mask) == (gateway & mask) // This must be true

Otherwise, the gateway is not on the local network. The DNS address must also hold a valid value.
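
The rule is simple enough to check in code before you commit a static configuration. This is a minimal sketch assuming IPv4 addresses held as 32-bit integers; MakeIPv4() and GatewayIsLocal() are hypothetical helpers, not NetBurner functions, and the addresses are only examples. Because AND works bit by bit, the check gives the same answer in either byte order, as long as all three values use the same one.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: build a 32-bit IPv4 address from dotted-quad parts,
// e.g. MakeIPv4(10, 1, 1, 170) for 10.1.1.170.
constexpr uint32_t MakeIPv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
    return (uint32_t(a) << 24) | (uint32_t(b) << 16) | (uint32_t(c) << 8) | uint32_t(d);
}

// Hypothetical helper implementing the rule from the text:
// (ip & mask) == (gateway & mask) means the gateway is on the local network.
constexpr bool GatewayIsLocal(uint32_t ip, uint32_t mask, uint32_t gateway) {
    return (ip & mask) == (gateway & mask);
}
```

With a 255.255.255.0 mask, a device at 10.1.1.170 accepts 10.1.1.1 as its gateway but rejects 10.1.2.1; widening the mask to 255.255.0.0 makes 10.1.2.1 local as well.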

Look at Tasks

The previous article in this series gave you a way to look at what tasks are doing using the TaskScan utility. Unfortunately, if a user-created high-priority task never yields or otherwise blocks the network, that method is not going to work. If you end up in this block of the flow diagram, create a dump of the TaskScan or textual task dump results, open a NetBurner support ticket with the details of what’s happening, and we will be happy to help you figure out what’s going on.

The Webserver Dies

Occasionally, the web server running on your NetBurner may hit a bump in the road and lock up. While it’s not quite as troubling as having your warhorse lie down on you right before a fight, it is a lot like showing up to the battlefield and realizing you left your sword with your other suit of armor. The chart below walks you through debugging the most common pitfalls encountered when this beast rears its ugly head.

[Flowchart: Web server dies. Steps: Run TaskScan. Has the HTTP task died in a user function? (If your user function blocks, it stops all HTTP processing until it completes.) Outcomes: figure out where it's stuck and why, asking NetBurner Support for help, or look at the TCP socket loss reports.]

Look at the TCP socket loss reports

Add the following code to your project:

#include <../system/tcp_internal.h>
// Then call:
socket_struct::ShowSocketInfo(false);

This dumps some information about each open socket. It will look something like this:

Socket[32]:
  State: ESTABLISED
  gpFlags: 0x80
  myIP: 10.1.1.170 - theirIP: 10.1.1.159
  myPort: 80 - theirPort: 56777
  max_listen: 0 - cur_listen: 0
  pNextActive: 0x0 - pPrevActive: 0x0
  NextTimeAction: 4361
  theirWindow: 3532485379 - myWindow: 3517094507
  TxBuff Depth: 0 - RxBuff Depth: 0
Socket[36]:
  State: LISTEN

The most interesting value is the “State”. By default, the NetBurner system has 32 sockets. If all 32 sockets are listed in this report, you are out of TCP sockets and need to figure out why. Again, if you get to this point and nothing is obvious, include this data when you contact NetBurner support, and we will help you dig deeper.
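
When the dump is long, counting entries by hand gets tedious. The sketch below is a hypothetical host-side helper, not part of the NetBurner tools: it takes a ShowSocketInfo() dump captured from the serial console, counts the Socket[...] entries, and tallies them by State. It assumes the output format shown above and the default of 32 sockets mentioned in the text; adjust kDefaultSocketCount if your build is configured differently.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical summary of a captured ShowSocketInfo() dump.
struct SocketSummary {
    std::size_t total = 0;                        // number of "Socket[" entries
    std::map<std::string, std::size_t> byState;   // e.g. "LISTEN" -> count
};

// Count socket entries and tally the first word after each "State:" label.
SocketSummary SummarizeSocketDump(const std::string& dump) {
    SocketSummary summary;
    const std::string stateKey = "State:";
    std::istringstream in(dump);
    std::string line;
    while (std::getline(in, line)) {
        if (line.find("Socket[") != std::string::npos) {
            ++summary.total;
        }
        std::size_t pos = line.find(stateKey);
        if (pos != std::string::npos) {
            std::istringstream rest(line.substr(pos + stateKey.size()));
            std::string state;
            rest >> state;  // skips spaces and any trailing '\r' from the console
            if (!state.empty()) {
                ++summary.byState[state];
            }
        }
    }
    return summary;
}

// Default socket count, as stated in the article.
constexpr std::size_t kDefaultSocketCount = 32;

// True when every socket appears in the dump, i.e. none are left.
bool LooksOutOfSockets(const SocketSummary& s) {
    return s.total >= kDefaultSocketCount;
}
```

Comparing tallies from dumps taken a few minutes apart shows which state is growing, which is useful detail to include in a support ticket.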

Your outbound TCP connection dies

Ok, we’re running out of medieval warfare metaphors here! Needless to say, having your TCP connection fail on you can be extremely frustrating. Not only will it impact your application, but several of the NetBurner tools rely on TCP network communication, and being stripped of those can often make you feel vulnerable and alone… kind of like showing up to battle with no horse, armor or sword (looks like we found one after all). The chart below takes you through the steps needed to address the most common causes for TCP connection failure to help you pinpoint exactly what’s going on.

[Flowchart: Outbound TCP connection/web client diagnosis. Decision points: Are you using a name (DNS) or IP address? Did it work, then die? Is the IP on the local network? Is DNS resolving to an address? Does the NetBurner have a DNS address and gateway? Does it have a gateway on the local network? Is the connect call returning an error? Is it TCP_ERR_NONE_AVAIL (-5)? Is it TCP_ERR_CON_RESET (-6)? Did you specify the local port in your connection call? Is this a TCP connection that remains open? Can your PC resolve the name? Outcomes: try rerunning DNS resolution, as dynamic servers can move; read the notes on TCP keepalive and continuously connected sockets; fix the DHCP server or network config; the name is bad; try a different DNS server, or capture a Wireshark trace of the failure and contact NetBurner support; look at the TCP socket loss reports; don't specify the local port; if you're sure the server you are connecting to is up, contact NetBurner support with the error code; otherwise, contact NetBurner support with a Wireshark capture of the error.]

Notes about TCP keepalive

By design, a TCP connection does not send anything over the wire if there is nothing to be sent. This means that if you have a TCP connection to a remote device and the power goes off, the cable gets unplugged, or the building is overrun by the Night King and the army of the dead, the TCP connection has no way of knowing the other side is dead unless it tries to send data.

There are two ways to see if a socket is dead:

  1. Send it some data.
  2. Send it a keepalive request. The details of the keepalive are spelled out, with working code, in the example nburn/examples/standardstack/tcp/TCP_simple_keepalive. Here’s a quick code snippet:

#include 
// Find out the last time we received any kind of packet from the other side of the connection.
DWORD TcpGetLastRxTime(int fd);  // Returns the number of TimeTicks since the last RX
// Send a keepalive; the other side should respond.
void TcpSendKeepAlive(int fd);
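
These two calls are enough to build a simple liveness check. The sketch below is a hypothetical policy function, not taken from the TCP_simple_keepalive example: it assumes you already know how long the connection has been silent, in TimeTicks, from TcpGetLastRxTime(). After idleLimit ticks of silence it asks for a keepalive probe, and if nothing at all arrives within a further grace period it reports the peer as dead. Any packet from the other side, including the keepalive reply, resets the silence, so the function needs no state of its own.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical actions for a polling loop to take on one connection.
enum class LinkAction { kNone, kSendKeepAlive, kDeclareDead };

// Hypothetical policy: ticksSinceRx is how long the peer has been silent.
//   silent < idleLimit                       -> nothing to do
//   idleLimit <= silent < idleLimit + grace  -> probe with a keepalive
//   silent >= idleLimit + grace              -> treat the peer as dead
LinkAction DecideKeepAlive(uint32_t ticksSinceRx, uint32_t idleLimit, uint32_t grace) {
    if (ticksSinceRx < idleLimit) {
        return LinkAction::kNone;
    }
    // Subtracting instead of adding idleLimit + grace avoids overflow.
    if (ticksSinceRx - idleLimit < grace) {
        return LinkAction::kSendKeepAlive;
    }
    return LinkAction::kDeclareDead;
}
```

On the module, you might poll this every second or so: call TcpSendKeepAlive(fd) on kSendKeepAlive, and close the socket and reconnect on kDeclareDead. The probe repeats on every poll inside the grace window, so pick a grace period that spans a few polls.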

One could write many volumes on how to debug network issues; this article is necessarily brief.

Further Support

If you are having network issues, the best path forward is to work down these flowcharts until you get confused or the results are unclear. Then open a support request with as much information as possible about what is wrong, including the steps you have taken to debug it yourself. As part of your support request, please include at a minimum:

  • What is connected to what
  • Whether it ever worked
  • How long it ran before it died
  • As much of the diagnostic flow as you have completed

As always, if there is something you’d like us to expand upon or you have questions, please feel free to comment below or contact us directly. We hope this has hardened your skills as you fight the good fight. From one embedded warrior to another -- Tallyho! And remember:

“Warriors do not win victories by beating their heads against walls, but by overtaking the walls. Warriors jump over walls; they don't demolish them.” --- The Teachings of Don Juan, Carlos Castaneda
Last modified on Thursday, April 26, 2018 3:23 am
