Python "compiler" is too slow for processing large data files???

Andrew MacIntyre andymac at bullseye.apana.org.au
Fri Sep 6 00:25:31 EDT 2002


[posted and mailed]

On Wed, 28 Aug 2002, Ron Horn wrote:

{...}

> Simple example - I can import or exec this file to load my data (my real app
> has int, float, and string data):
> ------ try5a3.py --------
> list1 = [
>     (323, 870, 46, ),
>     (810, 336, 271, ),
>     (572, 55, 596, ),
>     (337, 256, 629, ),
>     (31, 702, 16, ),
> ]
> print len(list1)
> ---------------------------
>
> Anyway, as my data files went from just a few lines, up to about 8000 lines
> (with 10 values in each line for total of about 450KB of text), the time to
> 'exec' the file became too slow (e.g. 15 seconds) and used too much memory
> (e.g. 50MB) (using ms-windows, python 2.2.1).  It is the "compile" phase,
> because if I re-run and there is a *.pyc file available, the import goes very
> fast (no compilation required).

Hmmm...  Your problem looks familiar.

Tim Peters checked a change to Python's parser into CVS on both the head
(2.3-to-be) and the 2.2 maintenance branches; it works around
malloc()/realloc()/free() problems that show up on several platforms when
compiling very long expressions - see test_longexp.py in Python's
regression test suite.

The symptoms include gross memory consumption and/or very long runtimes
for test_longexp.py.
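
(For reference, test_longexp.py is tiny - it just builds and evaluates one
enormous list display in a single expression.  Roughly the following,
paraphrased from memory rather than copied from Lib/test/:

# roughly the shape of Lib/test/test_longexp.py (paraphrased, not verbatim)
REPS = 65580
l = eval("[" + "2," * REPS + "]")   # one expression with ~65000 elements
print len(l)                        # prints REPS on a healthy build

On an affected build that little script is what eats the memory and time.)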

The fix will be available in 2.2.2 as well as 2.3 when released.
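
If you want to check whether your build is affected without waiting for the
releases, something along these lines separates the compile cost from the
execution cost.  This is a rough sketch, not code from the thread - the row
count and values just mimic the data file Ron describes:

import time

# build a module-sized literal comparable to the ~8000-line data file
rows = 8000
row = "    (323, 870, 46, 810, 336, 271, 572, 55, 596, 337),\n"
source = "list1 = [\n" + row * rows + "]\nprint len(list1)\n"

start = time.time()
code = compile(source, "<generated>", "exec")   # this is the slow phase
print "compile: %.2f seconds" % (time.time() - start)

start = time.time()
exec code                                       # running the code object is cheap
print "exec:    %.2f seconds" % (time.time() - start)

On an affected interpreter nearly all of the time (and memory) goes into the
compile() call, which matches the fast re-import Ron sees once the .pyc file
exists.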

--
Andrew I MacIntyre                     "These thoughts are mine alone..."
E-mail: andymac at bullseye.apana.org.au  (pref) | Snail: PO Box 370
        andymac at pcug.org.au             (alt) |        Belconnen  ACT  2616
Web:    http://www.andymac.org/               |        Australia




