[Cython] GSoC Proposal - Reimplement C modules in CPython's standard library in Cython.

Stefan Behnel stefan_ml at behnel.de
Tue Apr 12 20:22:05 CEST 2011


Arthur de Souza Ribeiro, 12.04.2011 14:59:
> Hi Stefan, yes, I'm working on this. In fact, I'm trying to recompile the json
> module (http://docs.python.org/library/json.html), adding some type
> definitions and Cython features to make the code faster.

Cool.


> I'm running into trouble with some things too. I'm going to list them here so
> that you can give me some tips on how to solve them.
>
> 1 - Compiling package modules - the json module is inside a package (files:
> __init__.py, decoder.py, encoder.py, scanner.py). Is there a way to compile
> all the modules of the package with Cython?

The __init__.py doesn't really look performance critical. It's better to 
leave such modules in plain Python; that improves readability by reducing 
surprises and simplifies reuse by other implementations.

That being said, you can compile each module separately, just use the 
"cython" command line tool for that, or write a little distutils script as in

http://docs.cython.org/src/quickstart/build.html#building-a-cython-module-using-distutils

Don't worry too much about a build integration for now.


> 2 - Because of issue #1, I'm running the tests manually: I go to
> %Python-dir%/Lib/tests/json_tests, get the files corresponding to the
> tests Python runs, and run them by hand.

That's fine.


> 3 - To measure the performance of the module, I'm thinking about using the
> timeit function in the unit tests for the project. I think running a large
> number of executions would make it possible to compare the timings.

That's ok for a start; artificial benchmarks are good for testing specific 
functionality. However, unit tests tend to be short running with a lot of 
overhead, so later on, you will need to use real code to benchmark the 
modules. I would expect that there are benchmarks for JSON implementations 
around, and you can just generate a large JSON file and run loads and dumps 
on it.
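
As a rough illustration of such a benchmark, the following sketch builds a 
large synthetic document and times dumps and loads on it with timeit, using 
the standard library json module. The document shape, the sizes, and the 
helper names make_document and benchmark are arbitrary choices for this 
example; to compare implementations, one would pass the compiled module in 
place of json.

```python
import json
import timeit


def make_document(n_records=10000):
    """Build a synthetic JSON-compatible structure (hypothetical shape, illustration only)."""
    return [
        {
            "id": i,
            "name": "user%d" % i,
            "score": i * 0.5,
            "active": i % 2 == 0,
            "tags": ["a", "b", "c"],
            "nested": {"x": i, "y": None},
        }
        for i in range(n_records)
    ]


def benchmark(module=json, n_records=10000, repeat=5, number=3):
    """Return the best per-call time in seconds for module.dumps and module.loads."""
    data = make_document(n_records)
    text = module.dumps(data)
    # Take the minimum over several repeats to reduce noise from other processes.
    dumps_time = min(timeit.repeat(lambda: module.dumps(data),
                                   repeat=repeat, number=number)) / number
    loads_time = min(timeit.repeat(lambda: module.loads(text),
                                   repeat=repeat, number=number)) / number
    return dumps_time, loads_time


if __name__ == "__main__":
    dumps_time, loads_time = benchmark()
    print("dumps: %.4f s  loads: %.4f s" % (dumps_time, loads_time))
```

Taking the minimum of several repeats, rather than the mean, is the usual 
timeit advice, since slower runs mostly reflect interference from the rest 
of the system rather than the code under test.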


> 4 - I didn't create the .pxd files; some problems are happening. The compiler
> reports that some methods are not defined, but they are defined. I will try
> to investigate this further.

When reporting usage related problems (preferably on the cython-users 
mailing list), it's best to present the exact error messages and the 
relevant code snippets, so that others can quickly understand what's going 
on and/or reproduce the problem.


> The code is in this repository:
> https://github.com/arthursribeiro/JSON-module
> Your feedback would be very important, so that I can improve my skills
> and be able to contribute to the project sooner.

I'd strongly suggest implementing this in pure Python (.py files instead of 
.pyx files), with externally provided static types for performance. A 
single code base is very advantageous for a large project like CPython, 
worth much more than squeezing out the last 5% of performance.
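
A minimal sketch of what such a single code base could look like, assuming 
Cython's "pure Python mode" with an augmenting .pxd file: the module below 
is plain Python that CPython runs unchanged, while the static types live in 
a separate .pxd file with the same base name (shown only in a comment, since 
it is Cython rather than Python syntax). The module name scanner_helpers and 
the function skip_whitespace are hypothetical and not taken from the json 
package.

```python
# scanner_helpers.py: hypothetical module name, used only for illustration.
# It stays ordinary Python, so CPython and other implementations can run it
# unchanged. When Cython compiles this file, it also reads an augmenting
# scanner_helpers.pxd placed next to it, which could contain, e.g.:
#
#     cpdef Py_ssize_t skip_whitespace(s, Py_ssize_t pos)
#
# That declaration would turn the function into a cpdef function with a
# C-typed position argument and return value, without touching this file.

WHITESPACE = " \t\n\r"


def skip_whitespace(s, pos):
    """Return the index of the first non-whitespace character at or after pos."""
    end = len(s)
    while pos < end and s[pos] in WHITESPACE:
        pos += 1
    return pos
```

Without Cython, or without the .pxd file, nothing changes: the module still 
imports and runs as ordinary Python, which is the point of keeping one code 
base.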


> I think some things implemented in this rewriting process are going to be
> useful when doing this with C modules...

Well, if you can get the existing Python implementation up to a speed 
comparable to that of the C implementation, then there is no need to care 
about the C module anymore. Even if you can get only 90% of a module to run 
at comparable speed, and need to keep 10% in plain C, that's already a huge 
improvement in terms of maintainability.

Stefan


More information about the cython-devel mailing list