[Python-Dev] Better text processing support in py2k?

Andy Robinson andy@robanal.demon.co.uk
Wed, 29 Dec 1999 00:34:43 -0800 (PST)


--- Skip Montanaro <skip@mojam.com> wrote:
>     fast/memory-intensive/clear
>     slow/memory-conserving/not-as-clear
>     fast/memory-conserving/fairly-muddy
> 
> Any particular reason that the readline method can't
> return an iterator that supports __getitem__ and
> buffers input?  (Again, remember this is for py2k, so
> the potential breakage such a change might cause is a
> consideration, but not a showstopper.)

Why not generalize fileinput to do buffering instead?
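
A minimal sketch (modern Python, not the fileinput API) of
the kind of object Skip describes: it wraps any file-like
object, reads input in fixed-size chunks, and hands out
lines through the sequential __getitem__ protocol that for
loops fall back on (index 0, 1, 2, ... until IndexError).
The class name BufferedLines and its chunk_size parameter
are hypothetical.

```python
import io


class BufferedLines:
    """Hypothetical line reader: buffers raw chunks, hands out lines by index.

    Only sequential access is supported, matching the old iteration
    protocol that calls __getitem__(0), __getitem__(1), ... until IndexError.
    """

    def __init__(self, stream, chunk_size=8192):
        self._stream = stream
        self._chunk_size = chunk_size
        self._buffer = ""   # unread text carried over between chunks
        self._index = 0     # index of the next line we expect to hand out
        self._eof = False

    def _next_line(self):
        # Refill the buffer until it holds a complete line or input ends.
        while "\n" not in self._buffer and not self._eof:
            chunk = self._stream.read(self._chunk_size)
            if chunk:
                self._buffer += chunk
            else:
                self._eof = True
        if not self._buffer:
            return None
        pos = self._buffer.find("\n")
        if pos == -1:
            # Final line without a trailing newline.
            line, self._buffer = self._buffer, ""
        else:
            line, self._buffer = self._buffer[:pos + 1], self._buffer[pos + 1:]
        return line

    def __getitem__(self, i):
        if i != self._index:
            raise TypeError("BufferedLines only supports sequential access")
        line = self._next_line()
        if line is None:
            raise IndexError(i)
        self._index += 1
        return line


# list() falls back on __getitem__ because the class defines no __iter__.
lines = list(BufferedLines(io.StringIO("alpha\nbeta\ngamma"), chunk_size=4))
# lines == ['alpha\n', 'beta\n', 'gamma']
```

Random access is rejected rather than emulated: supporting
it would mean keeping every line read so far in memory,
which is the memory-intensive option from Skip's list.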

More generally, Java has the notion of 'stackable
streams': for example, construct a 'BufferedFile'
around a 'File', then perhaps a 'line-oriented file'
around that, and so on.  Each one takes a file-like
object as an argument to its constructor.  Things you
might want to do:
- buffering
- international encoding conversions
- line delimiters other than CR/LF/CRLF
- read/write Python objects (i.e. use pickle/marshal)
- easy interfaces to parsers
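
Modern Python's io module (PEP 3116) stacks layers in this
way. The sketch below builds a buffered, decoding,
record-splitting reader from real io classes plus one
hypothetical layer, DelimitedRecords, standing in for the
"line delimiters other than CR/LF/CRLF" item; an in-memory
io.BytesIO takes the place of an actual file.

```python
import io


class DelimitedRecords:
    """Hypothetical layer: splits a text stream on an arbitrary delimiter."""

    def __init__(self, text_stream, delimiter, chunk_size=4096):
        if not delimiter:
            raise ValueError("delimiter must be non-empty")
        self._stream = text_stream
        self._delimiter = delimiter
        self._chunk_size = chunk_size

    def __iter__(self):
        buffer = ""
        while True:
            chunk = self._stream.read(self._chunk_size)
            if not chunk:
                break
            buffer += chunk
            # Emit every complete record; keep the (possibly partial) tail,
            # so a delimiter split across two chunks is still recognized.
            *records, buffer = buffer.split(self._delimiter)
            yield from records
        if buffer:
            yield buffer


# Layer 1: raw bytes (an in-memory stand-in for a real file).
raw = io.BytesIO("caf\u00e9;na\u00efve;r\u00e9sum\u00e9".encode("utf-8"))
# Layer 2: buffering.
buffered = io.BufferedReader(raw)
# Layer 3: decoding UTF-8 bytes into str.
text = io.TextIOWrapper(buffered, encoding="utf-8")
# Layer 4: records separated by ';' instead of line endings.
records = list(DelimitedRecords(text, ";"))
# records == ['café', 'naïve', 'résumé']
```

Each layer knows only about the file-like object directly
beneath it, so swapping the encoding or the delimiter means
changing one constructor call.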

This took me a couple of hours to get used to (at
the time I thought 'Yuk!' when I first saw four
nested constructors), but it gives you very precise
control and a lot of versatility when handling files.
It's an idiom Python does not use much, but maybe it
should.

I'd argue that some enhancements to fileinput.py
(adding a few stream classes to serve as building
blocks for these operations) would get us the power
you want, and a lot more versatility besides.
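
As one example of such a building block, here is a sketch
of a layer for the "read/write Python objects" case.
ObjectStream is a hypothetical class, not part of fileinput
or any standard module; it wraps any binary file-like
object and relies only on the real pickle.dump and
pickle.load calls.

```python
import io
import pickle


class ObjectStream:
    """Hypothetical layer: a sequence of pickled objects on a binary stream."""

    def __init__(self, binary_stream):
        self._stream = binary_stream

    def write(self, obj):
        # Each object is appended as its own self-delimiting pickle.
        pickle.dump(obj, self._stream)

    def __iter__(self):
        while True:
            try:
                obj = pickle.load(self._stream)
            except EOFError:
                # pickle.load raises EOFError once the stream is exhausted.
                return
            yield obj


buf = io.BytesIO()
out = ObjectStream(buf)
out.write({"id": 1})
out.write([2, 3])
buf.seek(0)
objects = list(ObjectStream(buf))
# objects == [{'id': 1}, [2, 3]]
```

As with pickle generally, only read streams from sources
you trust.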





=====
Andy Robinson
Robinson Analytics Ltd.
------------------
My opinions are the official policy of Robinson Analytics Ltd.
They just vary from day to day.