Comparison of parsers in python?

andrew cooke andrew at acooke.org
Sun Sep 20 15:12:22 CEST 2009


> The file size of a wig file can be very large (GB). Most tasks on this
> file format do not need the parser to save all the lines read from
> the file in memory to produce the parsing result. I'm wondering if
> pyparsing is capable of parsing large wig files by keeping only the
> minimum required information in memory.

ok, now you are getting into the kind of detail where you will need to
ask the authors of individual packages.

lepl is stream-oriented and should behave as you want (it will only
keep in memory what it needs, and will read data gradually from a
file), but (1) it's fairly new and i have not tested the memory use, so
there may be unexpected memory leaks; (2) it's python 2.6/3 only;
(3) parsing line-based formats like this is not yet supported very
well (you can do it, but you have to match the newline character
explicitly to find the end of a line); (4) the community for support is
small.

so i would suggest asking on the pyparsing list for advice on using
that with large data files (you are getting closer to the point where
i would recommend lepl - but of course i am biased as i wrote it).
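for comparison, the streaming idea itself can be sketched in plain python without any parser library. this is only a sketch, not lepl or pyparsing code: parse_wig is a hypothetical name, and it assumes a simplified wig file containing only variableStep/fixedStep declarations followed by their data lines (track, browser and comment lines are skipped, span is ignored, and bedGraph-style lines are not handled). because the file object is iterated line by line and records are yielded from a generator, only the current declaration's state is held in memory, however large the file is.

```python
import io
from typing import Iterator, Optional, TextIO, Tuple


def parse_wig(stream: TextIO) -> Iterator[Tuple[str, int, float]]:
    """Yield (chrom, position, value) one data line at a time.

    Hypothetical helper for a simplified wig format (assumption:
    only variableStep/fixedStep sections; span is ignored).
    Only the state of the current declaration is kept in memory.
    """
    mode: Optional[str] = None    # "variableStep" or "fixedStep"
    chrom: Optional[str] = None
    pos, step = 0, 1              # used by fixedStep sections only

    for raw in stream:            # file objects are read lazily, line by line
        line = raw.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue              # skip blank, header and comment lines
        if line.startswith(("variableStep", "fixedStep")):
            fields = line.split()
            mode = fields[0]
            # turn "chrom=chr1 start=10" into {"chrom": "chr1", "start": "10"}
            params = dict(f.split("=", 1) for f in fields[1:])
            chrom = params["chrom"]
            if mode == "fixedStep":
                pos = int(params["start"])
                step = int(params.get("step", 1))
            continue
        if mode == "variableStep":
            p, v = line.split()   # each data line is "position value"
            yield chrom, int(p), float(v)
        elif mode == "fixedStep":
            yield chrom, pos, float(line)  # each data line is just "value"
            pos += step
        else:
            raise ValueError(f"data line before any declaration: {line!r}")


if __name__ == "__main__":
    sample = io.StringIO(
        "track type=wiggle_0\n"
        "variableStep chrom=chr1\n"
        "100 1.5\n"
        "200 2.0\n"
        "fixedStep chrom=chr2 start=10 step=5\n"
        "0.5\n"
        "0.7\n"
    )
    for record in parse_wig(sample):
        print(record)
```

with a real file you would pass open("data.wig") instead of the StringIO sample; the generator never builds a list of all lines, so memory use stays flat as the file grows.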

andrew

ps: is there somewhere i can download example files? this would be
useful for my own testing. thanks.



More information about the Python-list mailing list