[pypy-dev] Converting from Python strings to C strings slow?

Martin C. Martin martin at martincmartin.com
Sun Jan 27 09:51:46 CET 2008



Carl Friedrich Bolz wrote:
> Martin C. Martin wrote:
>>
>> Thanks, but I'm not trying to write a standalone program, I need to 
>> call some 3rd party libraries.  For example, the string comes from one 
>> of a couple dozen of socket connections, managed by Twisted.  So I 
>> just want my inner loop in RPython.  The inner loop turns XML into a 
>> MySQL statement, which the main python program can then send to a 
>> database.
>>
>> So I need to get a big string into RPython, and a smaller (but still 
>> pretty big) string out of it.
> 
> Couldn't you just use a subprocess, read the string from stdin and write 
> the result to stdout? It's quite likely that this is not slower than the 
> way strings are passed in and out now and has many advantages. You would 
> need to use os.read and os.write, since sys.stdin/stdout is not 
> supported in RPython, but apart from that it should work fine.
> 
> One of them is that if you use the Translation class, your RPython 
> program will use reference counting, which is our slowest GC. If you use 
> a subprocess you get the benefits of our much better generational GC.
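
For reference, the subprocess approach suggested above could be sketched
roughly as follows. This is plain Python that sticks to os.read and
os.write on file descriptors 0 and 1; I have not checked it under RPython
translation, and read_all, write_all and transform are hypothetical names,
with transform standing in for the real XML-to-SQL step.

```python
import os

def read_all(fd, chunk_size=65536):
    """Read from fd until end of file and return everything as bytes."""
    chunks = []
    while True:
        chunk = os.read(fd, chunk_size)
        if not chunk:  # an empty read means EOF
            break
        chunks.append(chunk)
    return b"".join(chunks)

def write_all(fd, data):
    """Write all of data to fd, retrying after partial writes."""
    while data:
        written = os.write(fd, data)
        data = data[written:]

def transform(data):
    """Hypothetical placeholder for the XML -> SQL conversion."""
    return data.upper()

def main():
    # fd 0 is stdin, fd 1 is stdout
    write_all(1, transform(read_all(0)))
    return 0
```

The parent program would start this as a child process, write the big
string to its stdin, close it, and read the result from its stdout.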

What I'm really looking for is a way to write most of my applications in 
a dynamic language (because it's more productive to write & maintain), 
then, if and when performance is a problem, have some way to speed it up. 
PyPy promises to do this even before performance is a problem, which 
will be great!

Until that comes, I was hoping for a language where I could give some 
hints to the compiler or runtime to speed it up.  Things like "although 
this binding could change each time through the loop, it doesn't 
actually change, so there's no need to do a hash lookup for every 
access."  Or "this variable is always an int."
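
To illustrate the first kind of hint: the closest thing I can do by hand in
today's Python is to hoist the lookup out of the loop myself. A minimal
sketch (the function names are made up for illustration, and how much it
saves depends on the interpreter):

```python
import math

def sum_sqrt_global(values):
    total = 0.0
    for v in values:
        # 'math' is looked up in the module globals and 'sqrt' on the
        # module object on every iteration, although neither changes.
        total += math.sqrt(v)
    return total

def sum_sqrt_local(values):
    sqrt = math.sqrt  # bind once; local variable access avoids the lookups
    total = 0.0
    for v in values:
        total += sqrt(v)
    return total
```

Both functions return the same result; the second just states by hand what
I would like to be able to tell the compiler.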

The only language I know of that can do that is Lisp, which is a strong 
possibility.  But Lisp's syntax is more verbose and lower level than that 
of modern dynamic languages, it doesn't have as many libraries, and it 
doesn't have an IDE with auto completion or a good source-level debugger.  
I had hoped Groovy would be like that, with its optional typing and 
Java-inspired syntax and semantics, but sadly, the developers valued 
dynamism, however rarely used, over performance.

So the next best thing is to rewrite the performance-critical parts in 
some other language.  I had hoped RPython would be that language for 
Python, but it turns out not to be.  I could rewrite in C++, but the 
semantics of C++ are very different from Python's, so interfacing the two 
becomes verbose and awkward.  The ctypes module looks good for calling C 
libraries that weren't originally designed to work with Python.  But it 
doesn't have a good way (or any way?) to manipulate Python objects from 
C.  Even Java's JNI makes for a lot of boilerplate code to translate 
back and forth.
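
As an example of the part ctypes does handle well, here is a small sketch
that passes a Python string to the C library's strlen.  It assumes a POSIX
system where the C library can be located this way; c_strlen is a
hypothetical wrapper name.

```python
import ctypes
import ctypes.util

# Load the C library. On POSIX, if find_library returns None,
# CDLL(None) falls back to the symbols already loaded in the process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

def c_strlen(text):
    """Encode a Python string to bytes and measure it with C's strlen."""
    return libc.strlen(text.encode("utf-8"))
```

Calling c_strlen("hello") hands the encoded bytes to C as a char pointer
and gets a plain integer back, with no wrapper code written in C.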

So it looks like my best bet may be Groovy, which interacts with Java 
seamlessly.  A year ago, when I last checked, the IDEs weren't up to the 
job, but maybe that's changed.

And once PyPy is done, that may be an even better solution.

Best,
Martin


