[Python-Dev] Better text processing support in py2k?
Andy Robinson
andy@robanal.demon.co.uk
Wed, 29 Dec 1999 00:34:43 -0800 (PST)
--- Skip Montanaro <skip@mojam.com> wrote:
> fast/memory-intensive/clear
> slow/memory-conserving/not-as-clear
> fast/memory-conserving/fairly-muddy
>
> Any particular reason that the readline method can't return an iterator
> that supports __getitem__ and buffers input? (Again, remember this is for
> py2k, so the potential breakage such a change might cause is a
> consideration, but not a showstopper.)
Why not generalize fileinput to do buffering instead?
More generally, Java has the notion of 'stackable
streams' - e.g. construct a 'BufferedFile' around a
'File', maybe construct a 'Line-oriented file' around
that etc. Each one takes a file-like object as an
argument to the constructor. Things you might want to
do:
- buffering
- international encoding conversions
- line delimiters other than CR/LF/CRLF
- read/write Python objects (i.e. use pickle/marshal)
- easy interfaces to parsers
This took me a couple of hours to get used to (and at
the time I thought 'Yuk!' when I first saw four
nested constructors), but it gives you very precise
control and a lot of versatility when handling files.
It's an idiom Python does not use much, but maybe it
should.
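To make the stacking idea concrete, here is a rough sketch in Python, not a proposal for an actual API. It uses the layered classes from the standard io module for the buffering and decoding layers. DelimitedReader and read_records are hypothetical names invented for this illustration, and the BytesIO object merely stands in for a real file.

```python
import io


class DelimitedReader:
    """Hypothetical layer: yields records split on an arbitrary delimiter."""

    def __init__(self, stream, delimiter, chunk_size=4096):
        self.stream = stream          # any object whose read(n) returns str
        self.delimiter = delimiter
        self.chunk_size = chunk_size

    def __iter__(self):
        pending = ""
        while True:
            chunk = self.stream.read(self.chunk_size)
            if not chunk:
                break
            pending += chunk
            # Everything before the last delimiter is a complete record;
            # the tail may be partial, so carry it into the next round.
            *records, pending = pending.split(self.delimiter)
            yield from records
        if pending:
            yield pending


def read_records(raw_bytes, encoding="latin-1", delimiter="|"):
    raw = io.BytesIO(raw_bytes)                           # stands in for a file
    buffered = io.BufferedReader(raw)                     # buffering layer
    text = io.TextIOWrapper(buffered, encoding=encoding)  # decoding layer
    # A tiny chunk size is used only to exercise the carry-over logic.
    return list(DelimitedReader(text, delimiter, chunk_size=4))
```

Each layer only needs the object beneath it to behave like a file, so the layers can be reordered or swapped independently, which is the property the Java design is after.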
I'd argue that maybe some enhancements to fileinput.py
- adding some streams to provide building blocks for
these operations - would get us the power you want and
a lot more versatility besides.
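As one possible shape for such a building block, the sketch below relies on the openhook argument and the hook_encoded helper that the standard fileinput module provides in current Python versions. The hook decides how each file is opened, so a decoding layer can be slotted in without changing the loop that consumes the lines. read_lines is a hypothetical helper name used only for this example.

```python
import fileinput


def read_lines(paths, encoding="utf-8"):
    """Read every line from the given files, decoding with `encoding`."""
    # Note: with an empty `paths`, fileinput falls back to sys.argv / stdin.
    hook = fileinput.hook_encoded(encoding)
    with fileinput.input(files=paths, openhook=hook) as stream:
        return [line.rstrip("\n") for line in stream]
```

The hook is just a callable taking a filename and a mode and returning a file-like object, so other layers (decompression, say) could be supplied the same way.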
=====
Andy Robinson
Robinson Analytics Ltd.
------------------
My opinions are the official policy of Robinson Analytics Ltd.
They just vary from day to day.