[Numpy-discussion] Question about improving genfromtxt errors

Tue Sep 29 16:36:59 EDT 2009

On 09/29/2009 01:30 PM, Pierre GM wrote:
> On Sep 29, 2009, at 1:57 PM, Bruce Southey wrote:
>
>    
>> On 09/29/2009 11:37 AM, Christopher Barker wrote:
>>      
>>> Pierre GM wrote:
>>>
>>>        
>> Probably more of a concern than memory is the execution time involved
>> in printing these problem rows.
>>
> The rows with problems will be printed outside the loop (with at least
> an associated warning, or possibly by raising an exception). My concern
> is whether to store only the tuples (index of the row, number of
> columns) for the invalid rows, or just create a list of the number of
> columns for every row that I'd parse afterwards. The first solution
> requires an extra test in the loop; the second may waste some memory.
> Bah, I'll figure it out. Please send me some test cases so that I can
> time/test the best option.
Hi,
The first case just has to handle a missing delimiter - actually I 
expect that most of my cases would relate to this. So here is simple 
Python code to generate an arbitrarily large list with the occasional 
missing delimiter.

I set it up so that it reads the desired number of rows and the 
frequency of bad rows from the Linux command line:
$ time python tbig.py 1000000 100000
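
For anyone who cannot open the attachment, the idea can be sketched 
roughly as follows. This is not the attached tbig.py, only a minimal 
illustration under assumptions: the function name make_rows, the 
five-column layout, the comma delimiter, and the choice to fuse the 
first two fields are all hypothetical.

```python
def make_rows(n_rows, bad_every, n_cols=5, delimiter=","):
    """Yield n_rows delimited lines; every bad_every-th line lacks one delimiter.

    Hypothetical helper for illustration only (not the attached tbig.py).
    """
    for i in range(1, n_rows + 1):
        fields = [str(i + j) for j in range(n_cols)]
        if bad_every and i % bad_every == 0:
            # Simulate a missing delimiter by fusing the first two fields.
            fields[0:2] = [fields[0] + fields[1]]
        yield delimiter.join(fields)


# Small demonstration: 10 rows, every 5th row is one column short.
for line in make_rows(10, 5):
    print(line)
```

The resulting lines can be written to a file, or collected in a list, 
and handed to genfromtxt with a matching delimiter to reproduce the 
failure.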

With the extra prints that I added to io.py commented out, it takes 
about 22 seconds to finish when all the delimiters are correct. With 
the missing delimiters, it takes 20.5 seconds to crash.
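
For timing purposes, the two bookkeeping options Pierre describes above 
could look roughly like this. This is only a sketch of my reading of 
them, not the code in io.py: the function names, the plain str.split 
parsing, and the expected-column argument are assumptions.

```python
def invalid_rows_tuples(lines, expected, delimiter=","):
    """Option 1: test inside the loop; keep (row index, n columns) for bad rows only."""
    invalid = []
    for i, line in enumerate(lines):
        n = len(line.split(delimiter))
        if n != expected:  # the extra test in the loop
            invalid.append((i, n))
    return invalid


def invalid_rows_counts(lines, expected, delimiter=","):
    """Option 2: record every row's column count; find the bad rows afterwards."""
    counts = [len(line.split(delimiter)) for line in lines]  # one entry per row
    return [(i, n) for i, n in enumerate(counts) if n != expected]


sample = ["1,2,3", "4,5", "6,7,8", "9"]
print(invalid_rows_tuples(sample, 3))
print(invalid_rows_counts(sample, 3))
```

Both return the same (row index, number of columns) pairs; the first 
pays for a comparison on every row, the second for a list as long as 
the input.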

Bruce

-------------- next part --------------
A non-text attachment was scrubbed...
Name: tbig.py
Type: text/x-python
Size: 530 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20090929/5572dd90/attachment.py>