
linux pipes



Wes recently mentioned some problems using pipes for loose data
synchronization, so I thought I'd take a look.

The interesting files in the kernel appear to be:
	fs/pipe.c
	include/linux/pipe_fs_i.h
	include/linux/limits.h
From these it's apparent that the data structure used for Linux pipes
is really simple: a fixed-size array.  New data goes on the end, old
data comes off the "start".  Data in the pipe doesn't move.  This means
that a sequence of reads and writes that never completely consumes the
pending data will leave a trail that crawls out to the end of the
array, at which point writes block (or fail, for non-blocking I/O)
until that last bit of data is consumed.
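
To make the crawl concrete, here is a toy model of the buffer as I've
described it.  It is only a sketch of the behavior, not the kernel
code: the class and function names are made up for illustration, and
the assumption that both offsets rewind once the pipe drains is mine.

```python
class LinearPipeModel:
    """Toy model of a fixed-size pipe buffer with no wraparound."""

    def __init__(self, size=4096):
        self.size = size
        self.start = 0   # offset of the oldest unread byte
        self.end = 0     # offset one past the newest byte

    def pending(self):
        return self.end - self.start

    def write(self, n):
        """Append n bytes atomically; return False if they do not fit."""
        if self.end + n > self.size:
            return False     # the tail has reached the end of the array
        self.end += n
        return True

    def read(self, n):
        """Consume up to n bytes; return the number consumed."""
        n = min(n, self.pending())
        self.start += n
        if self.start == self.end:
            # Assumed: a fully drained pipe rewinds to the front.
            self.start = self.end = 0
        return n


def writes_until_failure(model, write_size, read_size, limit=100000):
    """Alternate write/read; return (successful writes, bytes pending)."""
    for count in range(limit):
        if not model.write(write_size):
            return count, model.pending()
        model.read(read_size)
    return limit, model.pending()
```

Priming the model with a single byte (model.write(1)) and then calling
writes_until_failure(model, 100, 100) returns (40, 1): the 41st write
is refused with only one byte actually pending.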

Normally, this is of no consequence.  The array size itself is
4096 bytes, and since most applications will be using that
exact same size, the semantics work out to:
	application A writes 4096 bytes
			application B reads 4096 bytes
	application A writes 4096 bytes
			application B reads 4096 bytes
	...
Not a whole lot of buffering happening there, but otherwise fine.

Now, the interesting case is if you perform I/O in units of less than
4096 bytes, coupled with non-blocking I/O.  Reads of less than 4096
bytes can leave data in the pipe.  Writes of 4096 bytes or less are
atomic--either all the data is written, or none is written.  So, if
you are doing a series of reads and writes in rapid succession such
that the reads aren't quite keeping up with the writes, the queued data
will creep out to the end of the buffer, and then the next write will
fail.  This can happen even if very little data is actually pending in
the pipe--all that is necessary is that the pipe was never completely
empty at any point while the last 4096 bytes (at most) of data were
being written.  Select-wise, Linux pipes are also strange:  they are
either ready for reading (pipe is not empty) or ready for writing (pipe
is completely empty), but never both.  However, this is correct
behavior when PIPE_SIZE == PIPE_BUF.
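
If you want to check what a particular system does, a small probe
along these lines should work.  It is a sketch only: probe_pipe is a
name I made up, and the result depends on the kernel you run it on.
It reports PIPE_BUF and what select() says about an empty pipe and a
pipe holding a single byte.

```python
import fcntl
import os
import select


def probe_pipe():
    """Return (PIPE_BUF, readiness when empty, readiness with 1 byte).

    Each readiness value is a (readable, writable) pair from select().
    """
    r, w = os.pipe()
    try:
        # Non-blocking write end, so a full pipe raises instead of hanging.
        flags = fcntl.fcntl(w, fcntl.F_GETFL)
        fcntl.fcntl(w, fcntl.F_SETFL, flags | os.O_NONBLOCK)

        pipe_buf = os.fpathconf(w, "PC_PIPE_BUF")

        def ready():
            # Timeout 0: poll the current state without waiting.
            rd, wr, _ = select.select([r], [w], [], 0)
            return bool(rd), bool(wr)

        empty = ready()
        os.write(w, b"x")
        one_byte = ready()
        return pipe_buf, empty, one_byte
    finally:
        os.close(r)
        os.close(w)
```

With the behavior described above I would expect the one-byte case to
come back as (True, False).  A system whose pipe buffer is larger than
PIPE_BUF should report (True, True) instead.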

Just for the sake of comparison, OpenBSD implements pipes using a
similar array in kernel address space, but does handle wraparound,
unlike Linux.  Limit-wise, OpenBSD only guarantees atomicity for a
smaller number of bytes:
	PIPE_BUF	512
but by default buffers a larger amount of data
	PIPE_SIZE	16384
and for applications that do big writes, will support at most 32 pipes
of a larger size:
	BIG_PIPE_SIZE	65536
OpenBSD also has more complicated logic for select such that a pipe can
be simultaneously ready to read & write, and has high water/low water
logic to decouple read and write application buffer size interaction.
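
For contrast, here is the same kind of toy model with wraparound.
Again this is only a sketch and not the OpenBSD code: the names are
invented, the high water/low water logic is left out, and the
readiness rule (writable whenever PIPE_BUF bytes are free) is my
simplifying assumption.

```python
class RingPipeModel:
    """Toy circular pipe buffer: only the pending count limits a write."""

    def __init__(self, size=16384, pipe_buf=512):
        self.size = size
        self.pipe_buf = pipe_buf
        self.start = 0     # offset of the oldest unread byte
        self.pending = 0   # bytes queued; end is (start + pending) % size

    def write(self, n):
        """Append n bytes atomically; return False if they do not fit."""
        if self.pending + n > self.size:
            return False
        self.pending += n
        return True

    def read(self, n):
        """Consume up to n bytes; return the number consumed."""
        n = min(n, self.pending)
        self.start = (self.start + n) % self.size   # wrap around the array
        self.pending -= n
        return n

    def readiness(self):
        """(readable, writable) under the assumed rule described above."""
        free = self.size - self.pending
        return self.pending > 0, free >= self.pipe_buf
```

Priming this model with one byte and then alternating 100-byte writes
and reads never fails, however long it runs, and readiness() stays at
(True, True) throughout.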

				-Marcus