Com Ports Under NetBSD

TOPICS

What Do Other People Experience?
How Do Com Ports Work?
Why Does a FIFO Matter?
I have SILO errors; WHY?
Can I fix my SILO errors?
I want to push HUGE amounts of data through the line.
Things which should be done.

Com ports are a sore spot in the belly of the i386 port of NetBSD. Over the years, numerous people have reported problems with dropped characters, FIFO errors, lockups and so on. Some people prove immune and escape these problems altogether; others have obviously offended the packet gods. In the interest of information propagation, this page details some of the issues surrounding com ports, potential solutions, and comments by others.

What Do Other People Experience?

Mostly you need to be on the NetBSD mailing lists to hear about this sort of thing. Sometimes it comes up in the NetBSD newsgroups. I also maintain a page of data here; please add your experiences.

How Do Com Ports Work?

Under NetBSD, com devices work in the following manner. Data comes in from the modem to the UART. The UART, when it feels full, triggers an interrupt. NetBSD, when it hears the interrupt, invokes the comintr() routine which copies the UART's data into the tty system. There are several types of serial port UARTs:

8250 - old as time and dumb as rocks
16450 - also dumb
16550 - 16 byte FIFOs
16650 - 32 byte FIFOs
16750 - 64 byte FIFOs
ESP - 1024 byte FIFOs
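
The receive path described above can be modelled in user space. The following is only a toy sketch, not the NetBSD driver: `struct sim_uart`, `sim_uart_rx()` and `sim_comintr()` are hypothetical names, and the model simply drops and counts a byte that arrives when the FIFO is full (real chips differ in which byte is lost).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_FIFO_MAX 64

/* Toy model of a UART receive side (hypothetical, for illustration). */
struct sim_uart {
	uint8_t  fifo[SIM_FIFO_MAX];
	size_t   depth;     /* 1 for an 8250/16450, 16 for a 16550 */
	size_t   count;     /* bytes currently waiting */
	unsigned overruns;  /* bytes lost because nobody drained in time */
};

static void
sim_uart_init(struct sim_uart *u, size_t depth)
{
	u->depth = depth > SIM_FIFO_MAX ? SIM_FIFO_MAX : depth;
	u->count = 0;
	u->overruns = 0;
}

/* A byte arrives from the line; it is lost if the FIFO is already full. */
static void
sim_uart_rx(struct sim_uart *u, uint8_t c)
{
	if (u->count == u->depth) {
		u->overruns++;
		return;
	}
	u->fifo[u->count++] = c;
}

/*
 * Stand-in for the interrupt handler: copy waiting bytes, oldest first,
 * into a tty-side buffer.  Returns the number of bytes copied.
 */
static size_t
sim_comintr(struct sim_uart *u, uint8_t *ttybuf, size_t ttylen)
{
	size_t n = u->count < ttylen ? u->count : ttylen;

	memcpy(ttybuf, u->fifo, n);
	memmove(u->fifo, u->fifo + n, u->count - n);
	u->count -= n;
	return n;
}
```

With a depth of 1, three bytes arriving before the handler runs leave one byte in the model and two counted as overruns; with a depth of 16 the same burst survives intact.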

Why Does a FIFO Matter?

FIFO stands for First In First Out; it's a hardware queue onto which the incoming bytes are placed for the system software to retrieve. Similarly, outgoing bytes are placed onto the transmit FIFO for the modem to send. The size of the FIFO is important, because it defines the number of bytes that can be received before data is dropped. In the oldest UART chip, the 8250, there is but 1 byte for incoming data! This means that the system needs to retrieve that byte before another one appears. If the system is too slow, bytes are lost; this is a silo error.

10 bits per byte (8 bits data, 1 start, 1 stop)
Baud		Bytes/s	Sec/Byte	us/Byte
2400		240	0.004167	4167
9600		960	0.001042	1042
14400		1440	0.000694	 694
28800		2880	0.000347	 347
38400		3840	0.000260	 260
57600		5760	0.000174	 174
115200		11520	0.000087	  87

So with a port running at 57600, you have ~170 microseconds to recover each byte before it is overwritten by the following byte. This can be a problem because of the interrupt time of Ethernet and other devices, which can be over 200us. Under NetBSD 1.0, the interrupt dispatch time for the com interrupt was ~25us on a 486/33.
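
The per-byte time budget can be recomputed from the line rate, assuming 10 bits on the wire per byte. This is a minimal sketch; the function names are illustrative and not taken from any driver.

```c
#include <stdint.h>

/* Bits on the wire per byte: 1 start + 8 data + 1 stop. */
#define WIRE_BITS_PER_BYTE 10

/* Bytes per second at a given line rate in bits per second. */
static uint32_t
bytes_per_sec(uint32_t bps)
{
	return bps / WIRE_BITS_PER_BYTE;
}

/*
 * Microseconds available per byte, rounded to the nearest microsecond.
 * Returns 0 for rates too low to carry one byte per second.
 */
static uint32_t
usec_per_byte(uint32_t bps)
{
	uint32_t bytes = bytes_per_sec(bps);

	if (bytes == 0)
		return 0;
	return (1000000u + bytes / 2) / bytes;
}
```

For example, usec_per_byte(57600) gives 174 and usec_per_byte(115200) gives 87.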

A FIFO multiplies the time allowed before data is lost. A 16 byte FIFO at 57600 provides between 1400us (trigger at 8) and 2100us (trigger at 4). It can provide a maximum of 2600us (trigger at 1). The NetBSD com device triggers interrupts at 8 characters; therefore, when using a 16550, if anything in the system holds off the com interrupt for longer than 1400us, characters can be lost, causing a silo overflow.
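
The headroom figures above are the remaining FIFO space after the trigger point, multiplied by the time per byte. A rough sketch of that arithmetic follows; `fifo_headroom_usec()` is a hypothetical helper, and it truncates rather than rounds, so its results are slightly lower than the rounded figures in the text.

```c
#include <stdint.h>

/*
 * Time, in microseconds, between the interrupt firing at `trigger` bytes
 * and the FIFO of `fifo_size` bytes filling up, at `bps` bits per second
 * with 10 bits on the wire per byte.  Returns 0 on nonsensical input.
 */
static uint32_t
fifo_headroom_usec(uint32_t fifo_size, uint32_t trigger, uint32_t bps)
{
	uint32_t bytes_per_sec = bps / 10;

	if (bytes_per_sec == 0 || trigger > fifo_size)
		return 0;
	/* 64-bit intermediate so that large FIFOs do not overflow. */
	return (uint32_t)(((uint64_t)(fifo_size - trigger) * 1000000u) /
	    bytes_per_sec);
}
```

For a 16 byte FIFO at 57600 this yields 1388us with the trigger at 8, 2083us at 4 and 2604us at 1.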

The Hayes ESP (and other large FIFO) board is a sure-fire cure for silo lossage. Its enormous buffers provide ~88,392us before overflow, which corresponds to 1016 bytes at 87us each (115200 with the trigger at 8). The NetBSD, FreeBSD, and I assume Linux com drivers have been ported to support this device.

I have SILO errors; WHY?

The questions you should be able to answer are: What type of UART do I have? How fast am I running my COM port? How loaded is my system? If you have a 16450 or 8250 UART, you should consider investing in a 16550 or better serial card. If you have a 16550 or better and you get frequent silo errors, you should consider slowing down the COM port from 115.2k to 57.6k or 38.4k. If you've got a dead slow old machine and you're determined to use it as a network router, consider an ESP card.

Can I fix my SILO errors?

Here's some information from Onno regarding 'fast interrupts'. I've not tried these; no idea if they work.

For those of you who are having com driver troubles on NetBSD/i386, try ftp://ftp.fwi.uva.nl/pub/comp/NetBSD/fvdl/com-290596.tar.gz.
Thanks to Frank for putting it up for ftp.
This is the faster serial driver as constructed by Onno van der Linden (onno@simplex.nl). To use it, you need to:

- Have a NetBSD/i386 -current which includes the recent IPL_HIGH changes
- Apply the icu.s.diff patch to /usr/src/sys/arch/i386/isa/icu.s
- drop com.c into /usr/src/sys/dev/isa
- recompile

I want to push HUGE amounts of data through the line.

One person runs an incoming satellite newsfeed through his 386 box. He runs his Hayes ESP serial port at 115.2k, but was forced to make a few tty changes to keep up with the large amount of data:

>sys/kern/tty.c: change 3 clalloc()s in ttymalloc() to 4k.
>sys/sys/tty.h: change TTYHOG to 4k.

These alterations allow the tty subsystem to buffer more data than normal (1k).

Things which should be done.

Certain benchmarking could make debugging some of these matters easier. In particular, it'd be nice to know what sort of time machines spend in network or disk I/O. What is the interrupt latency on the com port? Do the inner workings of the tty layer negatively affect the com port? This sort of thing can be done with an oscilloscope hooked to the IRQ lines.

Does the blocking and deprioritising of interrupts work on i386? Here are some old messages talking about this sort of thing:

From: "John F. Woods" 
Date: Thu, 06 Jul 1995 08:22:36 -0400

If there is a microsecond-resolution clock available on the x86, you can
use the trick I used for interrupt latency calculations:  modify the spl
routines to figure out when and from where they were called, and keep track
of who holds the interrupt mask.  (This is going to be much trickier on the
386 than on the machine where I did this last, which had only one interrupt
level.)  If you have time, you can put macro wrappers around the spl routines
to pass in identification strings (like __FILE__ and __LINE__, or even custom
strings if you have lots of time), but it shouldn't be a tremendous amount of
work (I hope) to have splxxx grovel up the stack to find a likely candidate
for blame.

This would give a very complete picture of the interrupt situation, though
it might involve more time than you're able to put into it.  (I might even
try tackling this myself sometime, since it's a favorite hobby-horse of
mine, but I'm not sure I have enough interesting hardware to really find
unusual cases.  I suppose that once I get it working, others could take
the same code and run it on different hardware.)
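
As a rough illustration of the macro-wrapper idea in the message above: the sketch below is user-space bookkeeping only, not NetBSD's spl code. All names (`spl_trace_raise()`, `spl_trace_lower()`, `TRACED_SPLRAISE`, `TRACED_SPLLOWER`) are hypothetical, the caller supplies the timestamp in microseconds, and nesting and multiple interrupt levels are ignored.

```c
#include <stdint.h>

/* Hypothetical record of who raised the interrupt mask and for how long. */
struct spl_trace {
	const char *holder_file;  /* caller that currently holds the mask */
	int         holder_line;
	uint64_t    raised_at;    /* microsecond timestamp of the raise */
	uint64_t    worst_usec;   /* longest hold seen so far */
	const char *worst_file;   /* caller responsible for the longest hold */
	int         worst_line;
};

static struct spl_trace spl_trace_state;

/* Note who raised the mask and when. */
static void
spl_trace_raise(const char *file, int line, uint64_t now_usec)
{
	spl_trace_state.holder_file = file;
	spl_trace_state.holder_line = line;
	spl_trace_state.raised_at = now_usec;
}

/* On lowering, remember the caller with the longest hold time. */
static void
spl_trace_lower(uint64_t now_usec)
{
	uint64_t held = now_usec - spl_trace_state.raised_at;

	if (held > spl_trace_state.worst_usec) {
		spl_trace_state.worst_usec = held;
		spl_trace_state.worst_file = spl_trace_state.holder_file;
		spl_trace_state.worst_line = spl_trace_state.holder_line;
	}
}

/* Wrappers that pass in the call site, as the message suggests. */
#define TRACED_SPLRAISE(now) spl_trace_raise(__FILE__, __LINE__, (now))
#define TRACED_SPLLOWER(now) spl_trace_lower(now)
```

After a run, worst_file and worst_line name the call site that kept interrupts masked the longest.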

From: Charles Hannum 
Date: Thu, 6 Jul 1995 12:32:08 -0400

There are many things that can cause interrupts to be delayed for
short periods.  Combined with the overhead of actually entering the
interrupt routine and calling up to the line discipline input routine,
the total latency before the next character is read can easily be too
high for an unbuffered serial port.  There are a few ways to partially
fix this:

1) Add an extra layer of buffering, to shorten the path inside the
interrupt handler.  This has been done.

2) Give tty interrupts a higher priority.  I was planning to do this
soon.  You could go further and (almost) never allow the lower half of
the interrupt handler to be blocked.  This would give you close to the
minimum possible latency.

3) Modify the line discipline interface to allow passing up a larger
number of characters at once.  For things like SLIP and PPP, this
could significantly reduce overhead by eliminating function calls and
allowing the line discipline to have tighter loops for several things.
This would help prevent overflowing the secondary buffer.

I would guess that items 2 and 3 could be implemented in a weekend.  I
don't currently have a weekend free, though.  (hint, hint)

The FreeBSD code mostly does item 2.  It goes to a slight extreme on
item 3, by inlining part of the generic tty interface into the serial
driver.  While this probably improves performance, it's not really
acceptable from an architectural viewpoint.

From: Robert Dobbs 
Date: Thu, 6 Jul 1995 15:05:25 -0700

Heres an outline of things that'd need to be done to support multiple
character transfers.  Note that nothing need be broken until the last
step when the new functions are enabled.  Everything else can be done 
while retaining the original interface.

* new linesw function for multiple char transfer: l_mrint
        /sys/include/conf.h     add l_mrint to linesw structure
        /sys/kern/tty_conf.c    add NULL entries for l_mrint initialization

* check the new l_mrint in com.c: if not NULL, use it rather than l_rint
        /sys/dev/isa/com.c
        /* other places? */

* add l_mrint interface stubs for slip and ppp
        /sys/net/if_sl.c        slmultinput(int *p, int size, struct tty *tp)
        /sys/net/if_ppp.c       pppmultinput(int *p, int size, struct tty *tp)

* modify the slip and ppp stub functions to work ;)
        in if_sl.c, this would essentially entail wrapping the code from
        slinput() in a while loop.  this would be a first step.
        i assume if_ppp.c would be the same way.
* "turn on" the minput routines by setting the l_mrint pointer to the
        proper function.
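
The second step of the outline above (use l_mrint when it is set, otherwise fall back to l_rint) might look roughly like this. It is a sketch only: `struct linesw_sketch`, `deliver_input()` and the two counting handlers are hypothetical, `struct tty` here is a stand-in rather than the kernel structure, and the l_mrint signature is copied from the slmultinput() prototype in the message.

```c
#include <stddef.h>

/* Stand-in for the kernel's struct tty; just counts what it is handed. */
struct tty {
	int calls;  /* number of line discipline calls made */
	int chars;  /* number of characters delivered */
};

/* Cut-down line switch with the proposed extra entry. */
struct linesw_sketch {
	int (*l_rint)(int c, struct tty *tp);
	int (*l_mrint)(int *p, int size, struct tty *tp);  /* may be NULL */
};

/* Driver side: hand a batch of characters to the line discipline. */
static void
deliver_input(const struct linesw_sketch *ld, int *p, int size,
    struct tty *tp)
{
	int i;

	if (ld->l_mrint != NULL) {
		ld->l_mrint(p, size, tp);  /* one call for the whole batch */
		return;
	}
	for (i = 0; i < size; i++)         /* original interface */
		ld->l_rint(p[i], tp);
}

/* Example handlers that only count calls and characters. */
static int
count_rint(int c, struct tty *tp)
{
	(void)c;
	tp->calls++;
	tp->chars++;
	return 0;
}

static int
count_mrint(int *p, int size, struct tty *tp)
{
	(void)p;
	tp->calls++;
	tp->chars += size;
	return 0;
}
```

With l_mrint left NULL, five characters cost five calls; once it points at count_mrint, the same five characters arrive in a single call.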

From: Chris G Demetriou 
Date: Wed, 12 Jul 1995 05:01:37 -0400

> Heres an outline of things that'd need to be done to support multiple
> character transfers.  Note that nothing need be broken until the last
> step when the new functions are enabled.  Everything else can be done 
> while retaining the original interface.

to my mind, if a "multiple input" routine is going to be created, it
should completely replace the old "single input" routine.  it's not
very much extra work at all, to pass a pointer and a constant, than to
pass a variable, and it relieves the need to maintain two both
functions.

that's a much more 'sweeping' change, though...