[Cython] GSoC Proposal - Reimplement C modules in CPython's standard library in Cython.
stefan_ml at behnel.de
Tue Apr 12 20:22:05 CEST 2011
Arthur de Souza Ribeiro, 12.04.2011 14:59:
> Hi Stefan, yes, I'm working on this. In fact, I'm trying to recompile the json
> module (http://docs.python.org/library/json.html), adding some type
> definitions and Cython things to get the code faster.
> I'm running into trouble with some things too, so I'm going to enumerate them
> here so that you can give me some tips about how to solve them.
> 1 - Compile package modules - the json module is inside a package (files:
> __init__.py, decoder.py, encoder.py, decoder.py). Is there a way to generate
> the Cython modules just like they get generated by Cython?
The __init__.py doesn't really look performance critical. It's better to
leave that module in plain Python; that improves readability by reducing
surprises and simplifies reuse by other implementations.
That being said, you can compile each module separately, just use the
"cython" command line tool for that, or write a little distutils script as in
Don't worry too much about a build integration for now.
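For illustration, a build script along these lines might look like the minimal sketch below. It assumes a recent Cython that provides `Cython.Build.cythonize` and uses setuptools in place of distutils. The module paths and the project name are placeholders for wherever the copied json package lives, and `__init__.py` is left out so that it stays plain Python.

```python
# setup.py: hypothetical build script; paths and project name are assumptions.
import sys

# Modules to compile. __init__.py is deliberately not listed, so it stays
# an ordinary Python module.
MODULES = [
    "json/decoder.py",
    "json/encoder.py",
    "json/scanner.py",
]


def main():
    # Imported lazily so the file can be read or imported without the
    # build tools being installed.
    from setuptools import setup
    from Cython.Build import cythonize

    setup(
        name="json-cython-experiment",  # placeholder name
        ext_modules=cythonize(MODULES),
    )


# Run the build only when a command such as "build_ext" was passed.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

With such a file in place, `python setup.py build_ext --inplace` would compile the listed modules next to their sources.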
> 2 - Because I'm getting in trouble with issue #1, I'm running the tests
> manually: I go to %Python-dir%/Lib/tests/json_tests, take the files
> corresponding to the tests Python runs, and run them manually.
> 3 - To measure the performance of the module, I'm thinking about using the
> timeit function in the unit tests for the project. I think a good number of
> executions could be made, and it would be possible to compare the timings.
That's OK for a start; artificial benchmarks are good for testing specific
functionality. However, unit tests tend to be short-running with a lot of
overhead, so later on you will need to use real code to benchmark the
modules. I would expect that there are benchmarks for JSON implementations
around, and you can just generate a large JSON file and run loads and dumps
on it.
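As a rough sketch of that kind of benchmark, the snippet below builds a larger synthetic document and times `json.dumps` and `json.loads` with `timeit`. The document shape, the record count and the helper names `make_document` and `benchmark` are made up for illustration; a real benchmark would rather use real-world data.

```python
import json
import random
import timeit


def make_document(n_records=2000, seed=0):
    """Build a list of small dicts as a stand-in for a large JSON file."""
    rng = random.Random(seed)
    return [
        {
            "id": i,
            "name": "item-%d" % i,
            "score": rng.random(),
            "tags": ["a", "b", "c"],
            "nested": {"flag": i % 2 == 0, "value": None},
        }
        for i in range(n_records)
    ]


def benchmark(document, repeat=5, number=10):
    """Return (seconds per dumps call, seconds per loads call)."""
    text = json.dumps(document)
    # Take the best of several runs to reduce noise from other processes.
    dumps_time = min(
        timeit.repeat(lambda: json.dumps(document), repeat=repeat, number=number)
    )
    loads_time = min(
        timeit.repeat(lambda: json.loads(text), repeat=repeat, number=number)
    )
    return dumps_time / number, loads_time / number


if __name__ == "__main__":
    dumps_s, loads_s = benchmark(make_document())
    print("dumps: %.6f s/call, loads: %.6f s/call" % (dumps_s, loads_s))
```

Running the same script once against the plain Python modules and once against the compiled ones gives numbers that can be compared directly.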
> 4 - I didn't create the .pxd files; some problems are happening. It says
> methods are not defined, but they are defined. I will try to investigate
> this better.
When reporting usage-related problems (preferably on the cython-users
mailing list), it's best to present the exact error messages and the
relevant code snippets, so that others can quickly understand what's going
on and/or reproduce the problem.
> The code is in this repository:
> https://github.com/arthursribeiro/JSON-module
> Your feedback would be very important, so that I can improve my skills and
> be able to start working on the project sooner.
I'd strongly suggest implementing this in pure Python (.py files instead of
.pyx files), with externally provided static types for performance. A
single code base is very advantageous for a large project like CPython,
much more than the ultimate 5% better performance.
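To make the idea concrete, here is a minimal sketch. The module name `skipper` and the function are invented for illustration and are not taken from the json package. The .py file is ordinary Python and runs unchanged in any interpreter; the static types live in a separate .pxd file with the same base name, which only Cython reads. The .pxd content is shown as a comment and should be taken as an untested sketch.

```python
# skipper.py: plain Python, no Cython syntax (hypothetical module).
WHITESPACE = " \t\n\r"


def skip_whitespace(s, start):
    """Return the index of the first non-whitespace character at or after start."""
    i = start
    n = len(s)
    while i < n and s[i] in WHITESPACE:
        i += 1
    return i


# skipper.pxd: sketch of the matching file with the external type
# declarations. It would sit next to skipper.py and be picked up by
# Cython at compile time:
#
#     cimport cython
#
#     @cython.locals(i=cython.Py_ssize_t, n=cython.Py_ssize_t)
#     cpdef skip_whitespace(s, Py_ssize_t start)
```

Uncompiled, `skip_whitespace("  \n x", 0)` returns 4 like any other Python function; compiled, the loop counters can become C integers without touching the .py source.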
> I think some things implemented in this rewriting process are going to be
> useful when doing this with C modules...
Well, if you can get the existing Python implementation up to a speed mostly
comparable to that of the C implementation, then there is no need to care
about the C module anymore. Even if you can get only 90% of a module to run
at comparable speed, and need to keep 10% in plain C, that's already a huge
improvement in terms of maintainability.