[pypy-dev] Converting from Python strings to C strings slow?

Carl Friedrich Bolz cfbolz at gmx.de
Sat Jan 26 11:27:27 CET 2008


Martin C. Martin wrote:
> 
> Maciek Fijalkowski wrote:
>> Martin C. Martin wrote:
>>> Hi,
>>>
>>> There seems to be a lot of overhead when passing a large string (23 
>>> Meg) to C compiled RPython code.  For example, this code:
>>>
>>> def small(text):
>>>     return 3
>>>
>>> t = Translation(small)
>>> t.annotate()
>>> t.rtype()
>>> f3 = t.compile_c()
>>>
>>> st = time.time()
>>> z = f3(xml)
>>> print time.time() - st
>>>
>>>   
>> This is wrong. You should even get a warning; the proper call is 
>> t.annotate([str]).
> 
> Oops, yes, I've been working with variations of this all day, and I 
> hadn't actually compiled & run the example in the email, although I'd 
> done something equivalent.
> 
>> Besides, this is not the official way of writing rpython standalone 
>> programs.
> 
> Thanks, but I'm not trying to write a standalone program, I need to call 
> some 3rd party libraries.  For example, the string comes from one of a 
> couple dozen socket connections, managed by Twisted.  So I just want 
> my inner loop in RPython.  The inner loop turns XML into a MySQL 
> statement, which the main python program can then send to a database.
> 
> So I need to get a big string into RPython, and a smaller (but still 
> pretty big) string out of it.

Couldn't you just use a subprocess: read the string from stdin and write 
the result to stdout? That is quite likely no slower than the way strings 
are passed in and out now, and it has several advantages. You would need 
to use os.read and os.write, since sys.stdin/stdout are not supported in 
RPython, but apart from that it should work fine.
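A minimal sketch of the worker side (written here in modern Python with 
bytes for clarity; in RPython proper these would be plain str, and the 
XML-to-SQL step is just a placeholder since I don't know your actual 
inner loop):

```python
import os

def read_all(fd):
    # Accumulate the input in fixed-size chunks until EOF.
    # sys.stdin is not supported in RPython, but os.read/os.write are.
    chunks = []
    while True:
        data = os.read(fd, 4096)
        if not data:
            break
        chunks.append(data)
    return b"".join(chunks)

def entry_point(argv):
    text = read_all(0)      # the big XML string arrives on stdin
    result = text           # placeholder for your XML -> SQL inner loop
    os.write(1, result)     # the (smaller) answer goes back on stdout
    return 0
```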

One of those advantages is that if you use the Translation class, your 
RPython program will use reference counting, which is our slowest GC. 
With a subprocess you get the benefit of our much better generational GC.
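On the CPython/Twisted side you would drive the worker roughly like this 
(a sketch; './worker' would be your compiled binary, and I use 'cat' as a 
stand-in here so the snippet is self-contained):

```python
import subprocess

def run_worker(big_string, cmd):
    # Start the compiled RPython worker, feed the big string on stdin,
    # and collect the result from stdout.
    proc = subprocess.Popen(cmd,
                            stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE)
    out, _ = proc.communicate(big_string)
    return out

# In real use: run_worker(xml, ["./worker"]); 'cat' just echoes stdin.
result = run_worker(b"<doc>23 megabytes of XML</doc>", ["cat"])
```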

Cheers,

Carl Friedrich



