Serial overrun on Linux

Working with serial lines can sometimes give you big headaches. I have an embedded PC based on a 386SX processor running at 40 MHz. This PC doesn’t do much, but it runs programs that use the serial line intensively. Things didn’t work as well as expected, so I looked carefully at what was going on… the beast was losing bytes! This was promptly confirmed by the /proc/tty/driver/serial entry: if you see “oe:X” there (where X is a positive number), it means that one of your UARTs has detected overrun errors.
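
Checking for overruns is easy to automate. Below is a minimal Python sketch that assumes the port-line format printed by the kernel’s serial driver, for example “0: uart:16550A port:000003F8 irq:4 tx:1024 rx:2048 oe:12 RTS|CTS|DTR”, where the oe counter is usually omitted while it is still zero. The exact set of fields varies between kernel versions, and the helper names are hypothetical.

```python
import re
from typing import Dict

PROC_SERIAL = "/proc/tty/driver/serial"

# A port line looks roughly like
#   0: uart:16550A port:000003F8 irq:4 tx:1024 rx:2048 oe:12 RTS|CTS|DTR
# Error counters such as "oe:" are usually printed only once they are non-zero.
PORT_LINE = re.compile(r"^\s*(\d+):\s+uart:(\S+)")
OE_FIELD = re.compile(r"\boe:(\d+)")


def overrun_counts(text: str) -> Dict[int, int]:
    """Return {port number: overrun count} for every port with a detected UART."""
    counts: Dict[int, int] = {}
    for line in text.splitlines():
        port = PORT_LINE.match(line)
        if port is None or port.group(2) == "unknown":
            continue  # header line, or no UART present at this port
        oe = OE_FIELD.search(line)
        counts[int(port.group(1))] = int(oe.group(1)) if oe else 0
    return counts


def main() -> None:
    try:
        with open(PROC_SERIAL) as f:
            text = f.read()
    except OSError as exc:  # file missing, or not running as root
        print(f"cannot read {PROC_SERIAL}: {exc}")
        return
    for port, oe in sorted(overrun_counts(text).items()):
        print(f"ttyS{port}: oe={oe}" + ("  <-- overruns!" if oe else ""))


if __name__ == "__main__":
    main()
```

On most systems the file is readable only by root, so run the script as root. A non-zero count on the port you care about is the symptom described above.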

So what’s an overrun? An overrun happens when the UART receives data while its FIFO buffer is full. Why is the FIFO full? Because Linux didn’t service the serial interrupt quickly enough. Why is Linux so slow? Linux is not a real-time OS and doesn’t guarantee any interrupt response time. So Linux is not that slow, but my PC really is… What happens is that network interrupts are serviced before serial interrupts. Furthermore, IDE disk interrupts can also take too long. The worst case, of course, is when you’re servicing a disk interrupt, then you have to service the network interrupt, and only after that can you service the serial interrupt, which in fact arrived right after the disk interrupt began.
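
A little arithmetic makes the timing argument concrete. The Python sketch below assumes 8N1 framing (10 bits on the wire per byte) and a 16550A-style 16-byte receive FIFO. The receive trigger level can be set to 1, 4, 8 or 14, but which level the kernel driver actually programs is an assumption here, so both 8 and 14 are shown. The sketch computes how long the kernel has, once the receive interrupt fires, before a continuous stream of incoming bytes causes the first loss:

```python
def char_time_us(baud: int, bits_per_char: int = 10) -> float:
    """Time on the wire for one character, in microseconds.

    bits_per_char=10 assumes 8N1 framing: 1 start + 8 data + 1 stop bit.
    """
    return bits_per_char * 1_000_000 / baud


def service_budget_us(baud: int, fifo_depth: int = 16, trigger_level: int = 14,
                      bits_per_char: int = 10) -> float:
    """Time between the receive interrupt and the first lost byte.

    Assumes bytes arrive back to back. The UART raises its receive interrupt
    when trigger_level bytes are in the FIFO. The remaining
    (fifo_depth - trigger_level) slots then fill up, and one more byte is
    assembled in the shift register; if it completes while the FIFO is still
    full, it is dropped and the overrun counter is incremented.
    """
    spare_chars = fifo_depth - trigger_level + 1
    return spare_chars * char_time_us(baud, bits_per_char)


if __name__ == "__main__":
    # 16550A: 16-byte receive FIFO, trigger level selectable as 1, 4, 8 or 14.
    for baud in (9600, 115200):
        for trigger in (8, 14):
            budget = service_budget_us(baud, fifo_depth=16, trigger_level=trigger)
            print(f"{baud:>6} baud, trigger {trigger:>2}: {budget / 1000:.2f} ms")
```

With trigger level 14, the budget is about 3.1 ms at 9600 baud but only about 0.26 ms at 115200 baud. A single long disk or network interrupt can easily exceed the faster budget. For a UART without a FIFO, such as the 16450 (fifo_depth=1, trigger_level=1), the budget shrinks to a single character time.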

Fixing serial overruns is therefore a rather complex problem, since it’s really a kernel-level problem. Googling the subject, I found several ideas to explore:

  • configure the IDE disk to use DMA with hdparm -d 1 /dev/hda: with DMA, interrupts stay masked for a shorter time during disk transfers (in my case this doesn’t work, since I’m using a DiskOnModule, which does not support DMA)
  • make the disk interrupt handler interruptible with hdparm -u 1 /dev/hda, which lets the driver unmask other interrupts while it processes a disk interrupt
  • use irqtune to re-prioritize the IRQs on the interrupt controller. This software is no longer maintained and doesn’t work out of the box on 2.4.x kernels.

Using hdparm -u wasn’t enough to solve my problem… so I kept looking for a solution, and I found one! I recompiled my kernel with the low-latency patch and the preemptible kernel patch. These are usually used for multimedia applications, which need good responsiveness to deliver content in real time, but it turns out they work for my purpose too!

My serial overruns are completely gone at 9600 baud. However, I still get some when running at 115200 baud. Moreover, I can trigger serial overruns by running find / -type f | grep -v /proc/ | xargs md5sum in the background… I can’t work miracles with this slow processor. If you have more ideas to further improve the situation, I’m willing to try!
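
A before/after comparison makes this kind of experiment easier to repeat. The following Python sketch assumes root access to /proc/tty/driver/serial and the usual “oe:” counter format of that file. It snapshots the counters, runs the same disk-heavy load as in the text (with its output discarded), and reports which ports gained overruns. Keep the serial link busy while it runs, because a UART only overruns while it is receiving data.

```python
import re
import subprocess
from typing import Dict

PROC_SERIAL = "/proc/tty/driver/serial"  # usually readable by root only

# Port lines look like "0: uart:16550A ... rx:2048 oe:12 ..."; the oe field is
# normally omitted while it is zero. Ports with "uart:unknown" are skipped.
PORT_RE = re.compile(r"^\s*(\d+):\s+uart:(?!unknown)\S+(.*)$", re.M)
OE_RE = re.compile(r"\boe:(\d+)")


def snapshot(text: str) -> Dict[int, int]:
    """Map port number -> overrun count for every detected UART."""
    result = {}
    for port, rest in PORT_RE.findall(text):
        m = OE_RE.search(rest)
        result[int(port)] = int(m.group(1)) if m else 0
    return result


def new_overruns(before: Dict[int, int], after: Dict[int, int]) -> Dict[int, int]:
    """Ports whose overrun counter increased, with the size of the increase."""
    return {p: after[p] - before.get(p, 0)
            for p in after if after[p] > before.get(p, 0)}


def read_proc() -> str:
    with open(PROC_SERIAL) as f:
        return f.read()


if __name__ == "__main__":
    load = "find / -type f | grep -v /proc/ | xargs md5sum > /dev/null"
    before = snapshot(read_proc())
    subprocess.run(load, shell=True)
    print(new_overruns(before, snapshot(read_proc())) or "no new overruns")
```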

Comments

  1. Did you consider a kernel upgrade to 2.6?

  2. Anonymous says:

    What kind of interface are you using with the DiskOnModule?

  3. If this is a 16450 UART, which is likely given the processor of the machine, it only has one byte’s worth of buffer (or is it actually one bit? I can’t quite remember!), so if you’re running at 115kbps you need to service the serial interrupt exceedingly quickly indeed to not lose the byte. I used to use a 486 DX2 66MHz as a gateway box masquerading my LAN to an ISDN modem, and it had problems with running out of buffer when running the serial at 115kbps. I don’t know if it’s an option in your case, but I bought a pretty cheap ISA “fast” serial card which had a more modern 16650 UART, with 16 bytes of buffer, and the problems went away.

    HTH,
    Rob

  4. Why do you use such old stuff?

    If you want something better, use the latest real-time patch from Ingo Molnar.
    The O(1) scheduler in 2.6 already improved things a lot in this area, and stock 2.6.13 and 2.6.14 include lots of his low-latency patches coming directly out of the -rt patchset.

    What made you stick with something that ancient?

  5. In reality, the hardware is not old… it’s quite new, but it’s built around an i386-compatible CPU. Why an i386? Because it’s cheap…

    The UART is a 16550A, so luckily it has a 16-byte buffer! The DiskOnModule is connected to a 44-pin IDE connector.

    I’m still using 2.4.x because I have the feeling that it’s lighter than 2.6.x in an embedded context… and also because I have some custom modules that were written for Linux 2.4.x and would have to be ported if I switched to 2.6.x.

  6. Anonymous says:

    2.6 has several additional features which make it possible to shrink the kernel a good bit further than a 2.4 kernel. You have much more control over which components are included or not, even down to the basic stuff that you wouldn’t normally think to leave out.

    Do you mind if I ask what the function of your embedded device is?

  7. The embedded device is a “serial-to-Ethernet converter”: you can manage its serial port via TCP/IP. It’s used to remotely drive any serial device.

  8. Kurt Roeckx says:

    Are you using some kind of flow control? The best would of course be hardware flow control, but whether you can use it depends on the cable you have.

  9. Of course, hardware flow control is enabled! Any serial device that has to handle big chunks of data uses it.

  10. Kurt Roeckx says:

    Did you check that your cable actually supports hardware flow control? There are a lot of cables out there with just 3 wires: RX, TX and signal ground. With those you can’t do hardware flow control. What you need is one with 7 wires, but there are 2 types of those: one that loops RTS/CTS back locally, and one that doesn’t. Depending on how the serial port is set up, you need one or the other.

  11. My cable is OK. I tested it both physically and via software: the software handling the serial line can set RTS/DTR and read CTS/DSR/RNG/DCD, and I checked that changing RTS on one side changes CTS on the other side, etc.
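
Comments 8 to 11 revolve around RTS/CTS hardware flow control and verifying the cable. The Python sketch below shows one way a program could enable CRTSCTS through termios and repeat the RTS-to-CTS cable check with the TIOCMGET, TIOCMBIS and TIOCMBIC ioctls that Python’s termios and fcntl modules expose on Linux. It is not the software mentioned in comment 11, and the function names are hypothetical. The two device paths are taken from the command line and are assumed to be joined by the cable under test.

```python
import fcntl
import os
import struct
import sys
import termios
import time

# Modem-control lines as exposed by Python's termios module on Linux.
LINES = [("RTS", termios.TIOCM_RTS), ("DTR", termios.TIOCM_DTR),
         ("CTS", termios.TIOCM_CTS), ("DSR", termios.TIOCM_DSR),
         ("RNG", termios.TIOCM_RI), ("DCD", termios.TIOCM_CD)]


def describe(bits: int) -> str:
    """Render a TIOCMGET bit mask as e.g. 'RTS|CTS'."""
    return "|".join(name for name, mask in LINES if bits & mask) or "none"


def enable_rtscts(fd: int) -> None:
    """Turn on RTS/CTS hardware flow control (CRTSCTS in c_cflag)."""
    attrs = termios.tcgetattr(fd)
    attrs[2] |= termios.CRTSCTS  # attrs[2] is c_cflag
    termios.tcsetattr(fd, termios.TCSANOW, attrs)


def get_modem_bits(fd: int) -> int:
    """Read the current state of the modem-control lines."""
    buf = fcntl.ioctl(fd, termios.TIOCMGET, struct.pack("i", 0))
    return struct.unpack("i", buf)[0]


def set_rts(fd: int, on: bool) -> None:
    """Raise (TIOCMBIS) or drop (TIOCMBIC) RTS on this port."""
    request = termios.TIOCMBIS if on else termios.TIOCMBIC
    fcntl.ioctl(fd, request, struct.pack("i", termios.TIOCM_RTS))


def check_cable(dev_a: str, dev_b: str) -> None:
    """Toggle RTS on port A and print the lines seen by port B."""
    flags = os.O_RDWR | os.O_NOCTTY | os.O_NONBLOCK  # don't wait for DCD
    a = os.open(dev_a, flags)
    try:
        b = os.open(dev_b, flags)
        try:
            for on in (True, False):
                set_rts(a, on)
                time.sleep(0.1)  # give the line a moment to settle
                state = "on" if on else "off"
                print(f"{dev_a} RTS {state}: {dev_b} sees "
                      f"{describe(get_modem_bits(b))}")
        finally:
            os.close(b)
    finally:
        os.close(a)


if __name__ == "__main__" and len(sys.argv) == 3:
    check_cable(sys.argv[1], sys.argv[2])
```

With a cable that crosses RTS to CTS, port B should report CTS only while port A’s RTS is on. The check leaves flow control untouched, because with CRTSCTS enabled the driver may drive RTS itself. An application would call enable_rtscts() on the port it actually uses.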

Trackbacks

  1. […] This is a never ending story for me. The first time I’ve had problems with Linux’s handling of serial UART dates back to 2005 (see my previous blog post on buffer overruns). At that time I could improve the situation by applying two patches (kernel-preempt and low latency). […]