string parsing screwing up on large files?

Bengt Richter bokr at oz.net
Sat Dec 20 08:37:49 EST 2003


On 19 Dec 2003 18:55:29 -0800, danl_kramer at yahoo.com (Daniel Kramer) wrote:

>Hello, I'm fairly new to python but I've written a script that takes
>in a special text file (a renderman .rib to be specific).. and filters
>some of the commands.  The .rib file is a simple text file, but in
>some cases it's very large.. can be 20megs or more at times.
>
>The script steps through each line looking for keywords and changes the
>line if necessary but most lines just pass in and out of the script
>un-modified.  The problem is sometimes the lines aren't written out
>correctly and it's an intermittent problem.  If I re-run the script
>again on the same input usually it works fine.  After filtering about
>100 files I might get 4 or 5 that come out bad.. simply re-running
>those fixes them.
>
>Anyone know what I might look for? It's possible that the machine is
>under a lot of i/o load and/or cpu load when it happens, but not sure
>about that.. I normally send this processing to a render farm, so it's
>hard to predict exactly what sort of load is going on at that time. It
>feels like a buffer isn't getting flushed before the text is written
>out.. or something like that.
>
>Any suggestions where I might look?
>
What is telling you that some lines aren't correct? Renderman syntax errors?
Maybe if you saved the bad file(s) and re-ran the changes until you got a good
one, and then ran diff -u goodfile badfile to see how things were actually
changing, it would become clear. Or if not, you could post some diffs and
the code that should be accomplishing the changes, and we could go from there.
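
If diff isn't handy on the render farm machines, a rough Python equivalent
for locating the first point of divergence might look like the sketch below.
The function name first_mismatch and the file names are made up for
illustration; nothing here is taken from your script. It reads both files in
binary mode so that truncated lines, stray NUL bytes, or line-ending changes
show up as they really are on disk.

```python
from itertools import zip_longest


def first_mismatch(good_path, bad_path):
    """Return (line_number, good_line, bad_line) for the first differing
    line, or None if the two files are identical.

    A missing line (one file is shorter) is reported as None.
    """
    with open(good_path, "rb") as good, open(bad_path, "rb") as bad:
        # zip_longest keeps going after the shorter file ends, so a
        # truncated output file is reported instead of silently ignored.
        pairs = zip_longest(good, bad)
        for lineno, (good_line, bad_line) in enumerate(pairs, start=1):
            if good_line != bad_line:
                return lineno, good_line, bad_line
    return None


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python first_mismatch.py good.rib bad.rib
    print(first_mismatch(sys.argv[1], sys.argv[2]))
```

It iterates line by line, so a 20 MB .rib should not need to fit in memory
at once. If the first mismatch is a short or missing final line, that would
point toward output not being flushed or closed before something else read
the file.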

Is the code threaded? Are you perhaps clobbering something across threads
occasionally? Accidental name collisions? Unsynchronized accesses?

It would also help to mention which platform, Python version, etc. you are running.
Maybe there is a file system bug that an upgrade would fix? It doesn't happen often,
but it might be worth googling for on your platform.
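
One way to collect those details for a post is a small script like the
following. It is only a sketch: environment_report is a name invented here,
and it assumes a Python recent enough to have the standard platform module
and f-strings.

```python
import platform
import sys


def environment_report():
    """Collect basic interpreter and OS details worth quoting in a bug report."""
    return {
        "python": sys.version.split()[0],  # version number only
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),  # OS name, release, architecture
        "machine": platform.machine(),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running it on one of the farm machines that produced a bad file, rather than
on your workstation, is more likely to show the relevant setup.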

Regards,
Bengt Richter



