Is a Pythonic version of scanf() or sscanf() planned?

Sun Oct 4 18:23:34 EDT 2009

On Sun, 4 Oct 2009 13:18:22 -0400,
	Simon Forman <sajmikins at gmail.com> wrote:
> On Sun, Oct 4, 2009 at 5:29 AM, Martien Verbruggen
><martien.verbruggen at invalid.see.sig> wrote:
>> On Sun, 4 Oct 2009 01:17:18 +0000 (UTC),
>>        Grant Edwards <invalid at invalid.invalid> wrote:
>>> On 2009-10-03, ryniek90 <ryniek90 at gmail.com> wrote:
>>>
>>>> So, is a core Python implementation of *scanf()* planned, or has
>>>> one been planned?

>>> Given the bad behavior and general fragility of scanf(), I
>>> doubt there's much demand for something equally broken for
>>> Python.
>>
>> scanf() is not broken. It's just hard to use correctly for unpredictable
>> input.
>>
>> Having something equivalent in Python would be nice where most or all of
>> your input is numerical, i.e. floats or integers. Using the re module,
>> or splitting and converting everything with int() or float() slows down
>> your program rather spectacularly. If those conversions could be done
>> internally, and before storing the input as Python strings, the speed
>> improvements could be significant.

> I haven't tried it, but couldn't you use scanf from ctypes?

I have just tried it. I wasn't aware of ctypes, being relatively new to
Python. :)

However, using ctypes actually makes the simple test program I wrote
slower, rather than faster. Probably the extra conversions needed
between ctypes' internal types and Python's own types eat up more time
than sscanf() saves. Built-in scanf()-like functionality would not need
to convert the same information two or three times. It would parse the
bytes coming in from the input stream directly, and set the values of
the appropriate Python variables directly.
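To make the idea of a Pythonic sscanf() concrete, here is a minimal
sketch in Python 3. The name sscanf_py and its behaviour are
hypothetical, not an existing or planned API: it supports only %d, %f
and %s, assumes the fields are separated by whitespace, and ignores any
literal text in the format. It still goes through Python strings and a
regular expression, so it only illustrates the interface; it would not
deliver the speed gain described above.

```python
import re

# Hypothetical helper: only %d, %f and %s are supported, and the fields
# are assumed to be separated by whitespace (literal text is ignored).
_PATTERNS = {
    "d": (r"([-+]?\d+)", int),
    "f": (r"([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)", float),
    "s": (r"(\S+)", str),
}

def sscanf_py(text, fmt):
    """Parse text with a tiny scanf-like format; return a tuple or None."""
    patterns = []
    converters = []
    for spec in re.findall(r"%([dfs])", fmt):
        pattern, convert = _PATTERNS[spec]
        patterns.append(pattern)
        converters.append(convert)
    # Leading whitespace is skipped, as scanf() does for these conversions.
    regex = r"\s*" + r"\s+".join(patterns)
    match = re.match(regex, text)
    if match is None:
        return None
    return tuple(convert(group)
                 for convert, group in zip(converters, match.groups()))
```

For example, sscanf_py("1 2 3.0 4.5 6", "%d%d%f%f%f") returns
(1, 2, 3.0, 4.5, 6.0), and a line that does not match returns None.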

A contrived example:
Assume an input file with two integers and three floats per line,
separated by spaces. The output should be the same two integers,
followed by the average of the three floats.
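The post does not say how the input file was produced or how many lines
it had. For anyone wanting to repeat the comparison, a hypothetical
generator (Python 3) for lines in the format just described might look
like this; the value ranges, the six decimal places and the seed are
arbitrary choices, not taken from the original test.

```python
import random

def generate_lines(n_lines, seed=0):
    """Yield n_lines strings 'int int float float float', newline-terminated."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same file
    for _ in range(n_lines):
        a = rng.randint(-1000, 1000)
        b = rng.randint(-1000, 1000)
        u, v, w = (rng.uniform(-100.0, 100.0) for _ in range(3))
        yield f"{a} {b} {u:.6f} {v:.6f} {w:.6f}\n"
```

For example, with open("input.txt", "w") as f:
f.writelines(generate_lines(1_000_000)) would write a million-line
file; the file name and line count are again assumptions.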

In pure Python there is string manipulation (reading each line from the
file, and split()) and conversion of floats going on:

from sys import stdin

# Python 2: split each line, then convert the three float fields.
for line in stdin:
    a, b, u, v, w = line.split()
    print a, b, (float(u) + float(v) + float(w))/3.0

(17.57s user 0.07s system 99% cpu 17.728 total)
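On Python 3, where print is a function, the same pure-Python approach
could be written as below. Separating the parsing from the I/O is only a
convenience for testing, not a claimed speed-up, and the function name
averages is made up for this sketch.

```python
def averages(lines):
    """Yield 'a b mean' for each 'a b u v w' input line (Python 3)."""
    for line in lines:
        a, b, u, v, w = line.split()
        yield "{} {} {}\n".format(a, b, (float(u) + float(v) + float(w)) / 3.0)
```

To run it as a filter, import sys and call
sys.stdout.writelines(averages(sys.stdin)).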

With ctypes, it becomes something like:

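from sys import stdin
from ctypes import cdll, byref, c_int, c_float
from ctypes.util import find_library

libc = cdll.LoadLibrary(find_library('c'))
a = c_int()
b = c_int()
u = c_float()
v = c_float()
w = c_float()
for line in stdin:
    # sscanf() stores the converted fields through the pointers and
    # returns how many it converted; skip malformed lines instead of
    # printing stale values from the previous line.
    if libc.sscanf(line, "%d%d%f%f%f",
            byref(a), byref(b), byref(u), byref(v), byref(w)) != 5:
        continue
    print "{0} {1} {2}".format(a.value, b.value,
            (u.value + v.value + w.value)/3.0)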

(22.21s user 0.10s system 98% cpu 22.628 total)
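Under Python 3 the ctypes call needs one extra change: a str argument is
passed to C as a wchar_t pointer, so both the input line and the format
must be bytes (for example, by reading from sys.stdin.buffer). A minimal
sketch, assuming a POSIX/glibc-style C library; declaring argtypes for
the two fixed char * parameters follows the ctypes documentation's
advice for variadic functions. The helper name parse_line is
hypothetical.

```python
import ctypes
from ctypes.util import find_library

# Assumption: a POSIX C library. If find_library() returns None,
# CDLL(None) opens the symbols already loaded into the process.
libc = ctypes.CDLL(find_library("c"))
sscanf = libc.sscanf
# Declare only the fixed (non-variadic) parameters; the pointers are extras.
sscanf.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
sscanf.restype = ctypes.c_int

a, b = ctypes.c_int(), ctypes.c_int()
u, v, w = ctypes.c_float(), ctypes.c_float(), ctypes.c_float()

def parse_line(line):
    """Parse one bytes line; return (a, b, mean), or None if sscanf fails."""
    n = sscanf(line, b"%d%d%f%f%f",
               ctypes.byref(a), ctypes.byref(b),
               ctypes.byref(u), ctypes.byref(v), ctypes.byref(w))
    if n != 5:
        return None
    return a.value, b.value, (u.value + v.value + w.value) / 3.0
```

Used as a filter, this would be a loop over sys.stdin.buffer calling
parse_line(line) for each line.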

We no longer need split() and the three string-to-float conversions,
but now we have the five ctypes objects, five byref() calls per line,
and the .value dereferences at the end. And that makes it slower,
unfortunately. (Maybe I'm still doing things a bit clumsily and it could
be faster.)
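One way to see where the time goes, rather than guessing, is to time the
individual steps with timeit. The sketch below (Python 3, same libc
assumptions as for ctypes above, hypothetical sample line) compares
split() plus float() against the sscanf() call and against just creating
the five byref() pointers. No numbers are given here because they depend
entirely on the machine and Python version.

```python
import timeit
from ctypes import CDLL, byref, c_char_p, c_float, c_int
from ctypes.util import find_library

LINE = "12 34 1.5 2.5 3.5\n"        # hypothetical sample input line
LINE_BYTES = LINE.encode("ascii")    # sscanf() needs bytes on Python 3

libc = CDLL(find_library("c"))
libc.sscanf.argtypes = [c_char_p, c_char_p]
a, b = c_int(), c_int()
u, v, w = c_float(), c_float(), c_float()

def split_and_float():
    """Pure Python: split the line and convert the three float fields."""
    fields = LINE.split()
    return float(fields[2]) + float(fields[3]) + float(fields[4])

def ctypes_sscanf():
    """ctypes: let sscanf() convert, then read back the .value attributes."""
    libc.sscanf(LINE_BYTES, b"%d%d%f%f%f",
                byref(a), byref(b), byref(u), byref(v), byref(w))
    return u.value + v.value + w.value

def byref_only():
    """Just the cost of building the five pointer arguments."""
    return (byref(a), byref(b), byref(u), byref(v), byref(w))

def measure(number=100_000):
    """Return total seconds for `number` calls of each step."""
    steps = {"split+float": split_and_float,
             "sscanf": ctypes_sscanf,
             "byref only": byref_only}
    return {name: timeit.timeit(fn, number=number)
            for name, fn in steps.items()}
```

Calling print(measure()) prints the three totals side by side.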

It's not really a big deal: as I said before, if I really need the
speed, I'll write C:

#include <stdio.h>
int main(void)
{
    int a, b;
    float u, v, w;

    while (scanf("%d%d%f%f%f", &a, &b, &u, &v, &w) == 5)
	printf("%d %d %f\n", a, b, (u + v + w)/3.0);

    return 0;
}

(5.96s user 0.06s system 99% cpu 6.042 total)

Martien
-- 
                             | 
Martien Verbruggen           | There is no reason anyone would want a
first.last at heliotrope.com.au | computer in their home. -- Ken Olsen,
                             | president DEC, 1977