Working with serial lines can sometimes give you big headaches. I have an embedded PC based on a 386SX processor running at 40 MHz. This PC doesn't do much, but it runs programs that use the serial line intensively. Things didn't work as well as expected, so I looked carefully at what was going on… the beast was losing bytes! This was promptly confirmed by the /proc/tty/driver/serial entry: if a port's line there shows “oe:X” (where X is a positive number), that port's UART has detected X overrun errors.
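To keep an eye on this counter without reading the file by hand, the entry can be parsed. Below is a minimal Python sketch, assuming the usual line layout (for example "0: uart:16550A port:000003F8 irq:4 tx:1234 rx:5678 oe:7 RTS|DTR") in which, as far as I know, the "oe:" field only appears once a port has recorded at least one overrun. The function name overrun_counts is just illustrative.

```python
import re

# A port line starts with its index, e.g. "0: uart:16550A port:000003F8 irq:4 ..."
PORT_RE = re.compile(r"^\s*(\d+):")
# The overrun counter on that line, e.g. "oe:7"
OE_RE = re.compile(r"\boe:(\d+)")


def overrun_counts(text):
    """Return {port_index: overrun_count} for the contents of
    /proc/tty/driver/serial. Ports whose line has no "oe:" field are
    reported as 0; the "serinfo:" header line is skipped."""
    counts = {}
    for line in text.splitlines():
        port = PORT_RE.match(line)
        if not port:
            continue  # header, or anything that is not a port line
        oe = OE_RE.search(line)
        counts[int(port.group(1))] = int(oe.group(1)) if oe else 0
    return counts


# Usage (reading the file usually requires root; index N is normally ttySN):
#   with open("/proc/tty/driver/serial") as f:
#       for port, oe in sorted(overrun_counts(f.read()).items()):
#           if oe:
#               print("ttyS%d: %d overrun errors" % (port, oe))
```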
So what's an overrun? An overrun happens when the UART receives data while its FIFO buffer is full. Why is the FIFO full? Because Linux didn't service the serial interrupt quickly enough. Why is Linux so slow? Linux is not a real-time OS and doesn't guarantee any interrupt response time; Linux itself isn't that slow, but my PC really is… What happens is that network interrupts are handled before serial interrupts. Furthermore, IDE disk interrupts can also take a long time. The worst case, of course, is when you're handling a disk interrupt, then have to handle a network interrupt, and only after that can you handle the serial interrupt, which in fact arrived right after the disk interrupt started.
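To put rough numbers on this, here is a back-of-the-envelope Python sketch of that worst case. It assumes a 16550A-style UART with a 16-byte receive FIFO that raises its interrupt once 8 bytes are waiting, and 8N1 framing (10 bits per character); none of these values were measured on my machine, and the 600 µs and 300 µs handler times are made up for illustration. Once the interrupt is raised, the serial handler can only wait as long as it takes to fill the remaining FIFO space; if the disk and network handlers together take longer, bytes are lost.

```python
# Back-of-the-envelope overrun check. All defaults are assumptions:
# 8N1 framing (10 bits per character), a 16-byte FIFO, interrupt at 8 bytes.

def char_time_us(baud, bits_per_char=10):
    """Time needed to receive one character, in microseconds."""
    return bits_per_char * 1_000_000 / baud


def overrun_budget_us(baud, fifo_depth=16, trigger=8, bits_per_char=10):
    """Approximate time the serial interrupt can stay unserviced, once raised
    with `trigger` bytes waiting, before the FIFO is full and data is lost."""
    return (fifo_depth - trigger) * char_time_us(baud, bits_per_char)


def overruns(baud, blocking_handlers_us, **uart):
    """True if the handlers serviced before the serial one (e.g. disk, then
    network) together take longer than the FIFO can absorb."""
    return sum(blocking_handlers_us) > overrun_budget_us(baud, **uart)


# Hypothetical worst case: 600 us in the disk handler, then 300 us of network.
for baud in (9600, 115200):
    print(baud, round(overrun_budget_us(baud)), overruns(baud, [600, 300]))
# prints:
# 9600 8333 False
# 115200 694 True
```

Under these assumptions the budget is about 8.3 ms at 9600 baud but only about 0.7 ms at 115200 baud, which is why a slow machine can cope at a low rate and still overrun at a high one.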
So fixing serial overruns is a rather complex problem, since it's really a kernel-related problem. Googling the subject, I found several ideas to explore:
- configure the IDE disk to use DMA, which shortens the time during which the kernel keeps interrupts masked (in my case this doesn't work, since I'm using a DiskOnModule, which does not support DMA):
hdparm -d 1 /dev/hda
- make the disk interrupt handler interruptible (i.e. let the driver unmask other IRQs while it processes a disk interrupt) with
hdparm -u 1 /dev/hda
- use irqtune to re-prioritize the IRQs on the interrupt controller. This software is no longer maintained and doesn't work out of the box on kernel 2.4.x.
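Whichever of these settings you try, it is worth checking that it actually took effect, since a device like a DiskOnModule can refuse DMA. Running hdparm with only the device name prints the current flags; the Python sketch below pulls the "name = value" lines out of that output. The layout it expects (lines such as "unmaskirq = 1 (on)" and "using_dma = 0 (off)") is an assumption based on common hdparm versions and may differ on yours; hdparm_flags is a hypothetical helper name.

```python
import re

# Matches summary lines such as " unmaskirq    =  1 (on)" at the start of a line.
# Only the leading number of each value is kept (e.g. 1023 for the geometry line).
FLAG_RE = re.compile(r"^[ \t]*(\w+)[ \t]*=[ \t]*(\d+)", re.MULTILINE)


def hdparm_flags(text):
    """Turn hdparm's "name = value (...)" summary lines into {name: value}.
    The output layout is an assumption; check it against your hdparm."""
    return {name: int(value) for name, value in FLAG_RE.findall(text)}


# Usage (hypothetical; needs root and a drive at /dev/hda):
#   import subprocess
#   out = subprocess.run(["hdparm", "/dev/hda"], stdout=subprocess.PIPE,
#                        universal_newlines=True).stdout
#   flags = hdparm_flags(out)
#   print("using_dma:", flags.get("using_dma"), "unmaskirq:", flags.get("unmaskirq"))
```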
hdparm -u wasn't enough to solve my problem… so I kept looking for a solution, and I found one! I recompiled my kernel with the low-latency patch and the preemptible kernel patch. These are usually used for multimedia applications, which need good responsiveness to deliver content in real time, but it turns out they work for my purpose too!
My serial overruns are completely gone at 9600 baud. However, I still get some when running at 115200 baud. Moreover, I can provoke serial overruns by running
find / -type f | grep -v /proc/ | xargs md5sum
in the background… I can't work miracles with this slow processor… If you have more ideas to further improve the situation, I'm willing to try!
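To compare kernels or settings under the same background load, it helps to turn this kind of test into a number. A minimal Python sketch: sample the overrun counters, run the load, sample again, and report the difference per port. It assumes the serial application keeps receiving data during the run; read_oe stands for any callable returning {port: overrun count} (for instance a small parser of /proc/tty/driver/serial), and both overrun_delta and read_proc_serial are hypothetical names.

```python
def overrun_delta(read_oe, workload):
    """Run `workload` and return, per port, how many overrun errors were
    recorded while it ran. `read_oe` is any callable returning
    {port: overrun_count}; the serial application must keep receiving
    data during the run, otherwise there is nothing to overrun."""
    before = read_oe()
    workload()
    after = read_oe()
    return {port: after[port] - before.get(port, 0) for port in after}


# Usage sketch (read_proc_serial is a hypothetical reader of
# /proc/tty/driver/serial; the load is the find/md5sum pipeline above):
#   import subprocess
#   load = lambda: subprocess.run(
#       "find / -type f | grep -v /proc/ | xargs md5sum > /dev/null",
#       shell=True)
#   print(overrun_delta(read_proc_serial, load))
```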