[Tutor] Problem When Iterating Over Large Test Files

Steven D'Aprano steve at pearwood.info
Thu Jul 19 01:54:26 CEST 2012


On Wed, Jul 18, 2012 at 04:33:20PM -0700, Ryan Waples wrote:
> I'm seeing some unexpected output when I use a script (included at
> end) to iterate over large text files.  I am unsure of the source of
> the unexpected output and any help would be much appreciated.

It may help if you can simplify your script to the smallest amount of 
code which demonstrates the problem. See here for more details:

http://sscce.org/

More suggestions follow below.

> In my output I am seeing lines that don't occur in the original file,
> and that don't match any lines in the original file.

How do you know? What are you doing to test that they don't match the 
original?

I'm not suggesting that you are wrong, I'm just trying to see what steps 
you have already taken.
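
One way to test this, as a rough sketch only: collect every line of the 
original file in a set, then report each output line that is not in it. 
The function name and the file paths are made up for illustration, and 
the comparison assumes that only the trailing newline should be ignored.

```python
def lines_not_in_original(original_path, output_path):
    """Return (line_number, line) pairs for output lines that do not
    appear anywhere in the original file."""
    # Hold every distinct original line in a set for fast lookup.
    # For very large files this may need a lot of memory.
    with open(original_path) as f:
        original_lines = set(line.rstrip("\n") for line in f)

    missing = []
    with open(output_path) as f:
        # Number the output lines from 1 so they match an editor's view.
        for number, line in enumerate(f, 1):
            line = line.rstrip("\n")
            if line not in original_lines:
                missing.append((number, line))
    return missing
```

An empty list back means that every output line also occurs somewhere in 
the original, although not necessarily in the same position.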


> The incidences
> of badly formatted lines don't seem to match up with any patterns in
> the data file, and occur across multiple different data files.

Do they occur at random, or is this repeatable?

That is, if you get this mysterious output for files A, B, H and Q 
(say), do you *always* get them for A, B, H and Q?
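
A cheap way to check repeatability is to run the script twice on the 
same input, writing to two different output files, and compare a 
checksum of each. This is only a sketch: the function name and chunk 
size are arbitrary choices, and the file names in the usage note are 
hypothetical.

```python
import hashlib

def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks so that
    large files do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

If file_digest("run1.txt") and file_digest("run2.txt") differ for the 
same input, the output is not repeatable from run to run.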


> I've included 20 consecutive lines of input and output.  Each of these
> 5 'records' should have been selected and printed to the output file.

Earlier, you stated that each record should be four lines. But your 
sample data starts with a record of three lines.
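
To see whether a file really does divide into four-line records, 
something like the following could help. It is a minimal sketch that 
assumes nothing about the record format beyond the line count; the 
function names are invented for illustration.

```python
from itertools import islice

def read_records(path, lines_per_record=4):
    """Yield each record as a list of lines without trailing newlines."""
    with open(path) as f:
        while True:
            # islice pulls the next few lines from the open file.
            record = [line.rstrip("\n")
                      for line in islice(f, lines_per_record)]
            if not record:
                break
            yield record

def short_records(path, lines_per_record=4):
    """Return (record_number, record) for groups with too few lines.

    Grouping is purely by count, so only the final group can be short.
    A short final group means a line is missing or extra somewhere in
    the file, not necessarily at the end.
    """
    return [(number, record)
            for number, record
            in enumerate(read_records(path, lines_per_record), 1)
            if len(record) != lines_per_record]
```

An empty list from short_records() says only that the total line count 
is a multiple of four. It does not prove that every record is well 
formed.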

More to follow later (time permitting).


-- 
Steven


