Python2.2 + mailbox. Bug????

Heiko Wundram heikowu at ceosg.de
Fri May 21 13:32:03 EDT 2004


On Friday, 21 May 2004 18:18, Noam Raphael wrote:
> I assume that when you write "./program.py < mboxfile.txt", Python knows
> that sys.stdin is a regular file, so it can do seek on it (for example,
> go back to its beginning).

It's not Python that knows it can seek on the file. When you redirect a file 
into a program with "<", the shell forks, and the child process calls 
filefd = open(file, O_RDONLY); dup2(filefd, 0) (fd 0 = stdin) just before it 
execs the program. sys.stdin is always just connected to file descriptor 0 
as it was passed in; here the shell has connected it to a regular file, 
which is seekable.
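
To make the redirection step concrete, here is a rough sketch (POSIX only, 
modern Python 3, not the shell's actual code) of what happens for 
"program < file". The function name run_with_stdin_from_file is made up for 
this illustration.

```python
import os
import sys


def run_with_stdin_from_file(path, argv):
    """Run argv with fd 0 connected to the regular file at path.

    Hypothetical helper: a sketch of what a shell does for `prog < path`.
    """
    pid = os.fork()
    if pid == 0:
        # Child process: rewire stdin, then replace ourselves with the program.
        try:
            fd = os.open(path, os.O_RDONLY)
            if fd != 0:
                os.dup2(fd, 0)   # fd 0 (stdin) now refers to the file
                os.close(fd)     # the extra descriptor is no longer needed
            os.execvp(argv[0], argv)
        finally:
            # Only reached if open/dup2/exec failed.
            os._exit(127)
    # Parent process: wait for the child and report its exit status.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
```

A program started this way can call seek on its stdin, because fd 0 is a 
regular file and nothing about Python is involved in that decision.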

> When you write "cat mboxfile.txt | 
> program.py", the program cat outputs the file mboxfile.txt into
> sys.stdin byte by byte, so you can't do seek on it.

When you pipe one program into another, a pipe is exactly what gets created: 
the shell calls readfd, writefd = pipe(). The write end is connected to 
stdout of the first program (dup2(writefd, 1); fd 1 = stdout) when the shell 
forks to start it, and the read end is connected to stdin of the second 
program (dup2(readfd, 0); fd 0 = stdin, as before), again when the shell 
forks to start it. So sys.stdin of the Python program, which is still just 
file descriptor 0, is now connected to a pipe. Pipes are not seekable, and 
that's exactly what the exception is telling you.
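
You can check the difference yourself. The following is a small sketch 
(assuming a POSIX system and Python 3; the helper names is_seekable and 
compare_pipe_and_file are invented here) that tries lseek on a regular file 
and on the read end of a pipe.

```python
import os
import tempfile


def is_seekable(fd):
    """Return True if lseek() succeeds on fd, False if the OS refuses it."""
    try:
        os.lseek(fd, 0, os.SEEK_CUR)
        return True
    except OSError:
        # Pipes (and sockets) fail here, typically with ESPIPE.
        return False


def compare_pipe_and_file():
    """Return (regular_file_seekable, pipe_seekable)."""
    read_fd, write_fd = os.pipe()
    try:
        with tempfile.TemporaryFile() as regular:
            return is_seekable(regular.fileno()), is_seekable(read_fd)
    finally:
        os.close(read_fd)
        os.close(write_fd)
```

On a typical Unix system compare_pipe_and_file() should report that the 
regular file is seekable and the pipe is not.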

So, what do we learn from this? The mbox parser needs a seekable file to do 
its work (err, I guess the format itself wouldn't need this, but who knows, 
look at the source, Luke!), so you need to pass it a file-like object which 
implements seek (or at least one backed by a seekable file descriptor, which 
a pipe is not).
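
One possible workaround, as a sketch only: if the input turns out not to be 
seekable, read it completely into memory and work on the copy. This assumes 
Python 3 and a binary stream (for example sys.stdin.buffer); the name 
seekable_copy is made up, and how the result is handed to the mailbox code 
depends on your Python version, so that part is left out.

```python
import io


def seekable_copy(stream):
    """Return stream itself if it can seek, else an in-memory copy of it.

    Hypothetical helper; expects a binary file-like object.
    """
    try:
        if stream.seekable():
            return stream
    except AttributeError:
        # Older or minimal file-like objects may lack seekable().
        pass
    # Not seekable (e.g. a pipe): buffer everything in a BytesIO,
    # which supports seek() and tell().
    return io.BytesIO(stream.read())


# Possible use in a script reading a mailbox from stdin:
#   import sys
#   mbox_file = seekable_copy(sys.stdin.buffer)
```

Keep in mind that this holds the whole input in memory, which may not be 
what you want for a very large mailbox.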

HTH!

Heiko.



