[Numpy-discussion] seeking advice on a fast string->array conversion

Christopher Barker Chris.Barker at noaa.gov
Tue Nov 16 13:01:11 EST 2010


On 11/16/10 8:57 AM, Darren Dale wrote:
> In my case, I am making an assumption about the integrity of the file.

That does make things easier, but less universal. I guess this is the 
whole trade-off about "reusable code". It sure is a lot easier to write 
code that does the one thing you need than something general-purpose.

>> Anyone know what the advantage of ato* is over scanf()/fscanf()?
>>
>> Also, why are you doing string parsing rather than parsing the files
>> directly, wouldn't that be a bit faster?
>
> Rank inexperience, I guess. I don't understand what you have in mind.

If your goal is to read numbers from an ASCII file, you can use 
fromfile() directly, rather than reading the file (or some of it) into a 
string and then using fromstring(). Also, in C, you can use fscanf() to 
read the file directly (of course, under the hood, it's putting stuff in 
strings somewhere along the line, but presumably in an optimized way).
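As a rough illustration of the two NumPy routes, here is a minimal sketch assuming a whitespace-separated file of floats. The sample data and the temporary file are made up for the example; the point is only that fromfile() can parse the file itself, so the intermediate Python string is unnecessary.

```python
import os
import tempfile

import numpy as np

# Write a small whitespace-separated ASCII file to stand in for real data.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("1.5 2.25\n-3e2 4")
    path = f.name

try:
    # Two-step route: read the text into a Python string, then parse it.
    with open(path) as fh:
        text = fh.read()
    from_string = np.fromstring(text, dtype=np.float64, sep=" ")

    # One-step route: let NumPy parse the file directly.
    # A non-empty sep selects text mode; a space in sep matches zero or
    # more whitespace characters, so newlines work as separators too.
    from_file = np.fromfile(path, dtype=np.float64, sep=" ")
finally:
    os.remove(path)

print(from_file)
```

Both calls should produce the same float64 array; the difference is only whether the whole file passes through a Python string first.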

> scanf/fscanf don't actually convert strings to numbers, do they?

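Yes, that's exactly what they do: they read text, match it against a 
format string, and store the converted numbers through the pointers you 
pass in.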

http://en.wikipedia.org/wiki/Scanf

The C lib may very well use ato* under the hood.
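To make the difference concrete, here is a rough sketch that calls the C library's atof() and sscanf() from Python via ctypes. It assumes Linux/glibc or a similar platform where ctypes.util.find_library("c") locates libc and passing a pointer argument to a variadic C function works through ctypes (this may not hold everywhere, e.g. some ARM64 systems handle variadic calls differently). The wrapper names are hypothetical. Both functions turn text into a double, but only sscanf() reports how many items it converted.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX-like system where find_library("c") finds the C library.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# atof has a fixed signature, so declare it for correct double handling.
libc.atof.argtypes = [ctypes.c_char_p]
libc.atof.restype = ctypes.c_double


def parse_with_atof(text):
    """Hypothetical wrapper: atof() gives 0.0 for both "0" and garbage."""
    return libc.atof(text.encode())


def parse_with_sscanf(text):
    """Hypothetical wrapper: return (items_converted, value).

    sscanf returns the number of successful conversions, 0 if the input
    did not match, or a negative value (EOF) if input ended first.
    """
    value = ctypes.c_double()
    # sscanf is variadic, so argtypes is left unset; byref() passes &value.
    count = libc.sscanf(text.encode(), b"%lf", ctypes.byref(value))
    return count, value.value


print(parse_with_atof("abc"), parse_with_sscanf("abc")[0])
```

With this, parse_with_atof("abc") and parse_with_atof("0") both return 0.0, whereas parse_with_sscanf("abc") reports a count of 0. Checking that count is how an fscanf()-driven loop can stop cleanly at bad input or at end of file.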

My idea at this point is to write a function in Cython that takes a file 
and a numpy dtype, converts the dtype to a scanf format string, then 
calls fscanf (or scanf) to parse the file. My existing scanner code 
more or less does that, but the format string is hard-coded for either 
floats or doubles.
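A minimal sketch of the dtype-to-format step, assuming only scalar dtypes and simple structured (record) dtypes need handling. The names dtype_to_scanf, record_format and the lookup table are hypothetical, not existing NumPy or Cython APIs. One detail worth getting right is that scanf needs "%lf" for a C double but "%f" for a C float.

```python
import numpy as np

# Hypothetical table from NumPy type codes (dtype.char) to scanf conversions.
# Only C types with a direct scanf counterpart are listed.
_SCANF_CONVERSIONS = {
    "b": "%hhd",  # signed char
    "B": "%hhu",  # unsigned char
    "h": "%hd",   # short
    "H": "%hu",   # unsigned short
    "i": "%d",    # int
    "I": "%u",    # unsigned int
    "l": "%ld",   # long
    "L": "%lu",   # unsigned long
    "q": "%lld",  # long long
    "Q": "%llu",  # unsigned long long
    "f": "%f",    # float  (scanf stores into float*)
    "d": "%lf",   # double (scanf needs the l modifier for double*)
    "g": "%Lf",   # long double
}


def dtype_to_scanf(dtype):
    """Return the scanf conversion for a scalar NumPy dtype (hypothetical)."""
    dt = np.dtype(dtype)
    try:
        return _SCANF_CONVERSIONS[dt.char]
    except KeyError:
        raise ValueError(f"no scanf conversion for dtype {dt!r}") from None


def record_format(dtype):
    """Build a space-separated format for a structured dtype (hypothetical).

    Fields are taken in declaration order; subarray fields are not handled.
    """
    dt = np.dtype(dtype)
    if dt.names is None:
        return dtype_to_scanf(dt)
    return " ".join(dtype_to_scanf(dt.fields[name][0]) for name in dt.names)
```

For example, record_format([("x", "d"), ("n", "i")]) gives "%lf %d", which a Cython wrapper could then pass to fscanf() once per record.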

>> I've got some C extension code for simple parsing of text files into
>> arrays of floats or doubles (using fscanf). I'd be curious how the
>> performance compares to what you've got. Let me know if you're interested.
>
> I'm curious, yes.

OK -- I'll whip up a test similar to yours -- stay tuned!

-Chris



-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov


