LZMA compression support in 3.3

Hello all, I'd like to propose the addition of a new module in Python 3.3. The 'lzma' module will provide support for compression and decompression using the LZMA algorithm, and the .xz and .lzma file formats. The matter has already been discussed on the tracker <http://bugs.python.org/issue6715>, where there seems to be a consensus that this is a desirable feature. What are your thoughts? The proposed module's API will be very similar to that of the bz2 module; the only differences will be additional keyword arguments to some functions, for specifying container formats and detailed compressor options. The implementation will also be similar to bz2 - basic compressor and decompressor classes written in C, with convenience functions and a file interface implemented on top of those in Python. I've already done some work on the C parts of the module; I'll push that to my sandbox <http://hg.python.org/sandbox/nvawda/> in the next day or two. Cheers, Nadeem
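For illustration, here is roughly what the proposed bz2-style API looks like in use. The sketch follows the lzma module as it later shipped in 3.3; at this point in the discussion the exact names and keyword arguments were still open, so treat them as indicative:

    import lzma

    # One-shot compression; the extra keyword arguments select the container
    # format and compression preset, which is the main difference from bz2.
    data = b"example payload" * 1000
    compressed = lzma.compress(data, format=lzma.FORMAT_XZ, preset=6)
    assert lzma.decompress(compressed) == data

    # Incremental (streaming) use, mirroring bz2.BZ2Compressor:
    comp = lzma.LZMACompressor(format=lzma.FORMAT_XZ)
    chunks = [comp.compress(data[i:i + 4096]) for i in range(0, len(data), 4096)]
    chunks.append(comp.flush())
    assert lzma.decompress(b"".join(chunks)) == data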

When I reviewed lzma, I found that this approach might not be appropriate. lzma has many more options and aspects that allow tuning and selection, and a Python LZMA library should provide the same feature set as the underlying C library. So I would propose that a very thin C layer is created around the C library that focuses on the actual algorithms, and that any higher layers (in particular file formats) are done in Python. Regards, Martin

On Sat, Aug 27, 2011 at 4:50 PM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
I probably shouldn't have used the word "basic" here - these classes expose all the features of the underlying library. I was rather trying to underscore that the rest of the module is implemented in terms of these two classes. As for file formats, these are handled by liblzma itself; the extension module just selects which compressor/decompressor initializer function to use depending on the value of the "format" argument. Our code won't contain anything along the lines of GzipFile; all of that work is done by the underlying C library. Rather, the LZMAFile class will be like BZ2File - just a simple filter that passes the read/written data through an LZMACompressor or LZMADecompressor as appropriate. Cheers, Nadeem
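To illustrate the "simple filter" idea: a file object wrapper can keep every byte of I/O in Python and hand only in-memory buffers to the compressor classes. This is a toy sketch, not the actual LZMAFile implementation:

    import lzma

    class _DecompressingReader:
        """Toy read-only filter: all I/O happens here, in Python; the C level
        only ever sees in-memory buffers via LZMADecompressor."""

        def __init__(self, fileobj):
            self._fp = fileobj                  # an already-open binary file
            self._decomp = lzma.LZMADecompressor()
            self._buf = b""

        def read(self, size=-1):
            while size < 0 or len(self._buf) < size:
                raw = self._fp.read(8192)
                if not raw:
                    break
                self._buf += self._decomp.decompress(raw)
            if size < 0:
                out, self._buf = self._buf, b""
            else:
                out, self._buf = self._buf[:size], self._buf[size:]
            return out

    # Usage: with open("archive.xz", "rb") as f:
    #            data = _DecompressingReader(f).read()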

This is exactly what I worry about. I think adding file I/O to bz2 was a mistake, as this doesn't integrate with Python's IO library (it used to, but after stdio was dropped they became incompatible). Indeed, for Python 3.2, BZ2File has been removed from the C module and lifted to Python. IOW, the _lzma C module must not do any I/O, neither directly nor indirectly (through liblzma). The approach of gzip.py (doing I/O and file formats in pure Python) is exactly right. Regards, Martin

On Sun, Aug 28, 2011 at 1:15 AM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
PEP 399 also comes into play - we need a pure Python version for PyPy et al (or a plausible story for why an exception should be granted). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sun, 28 Aug 2011 01:36:50 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
The plausible story being that we basically wrap an existing library? I don't think PyPy et al have pure Python versions of the zlib or OpenSSL, do they? If we start taking PEP 399 conformance to such levels, we might as well stop developing CPython. cheers Antoine.

On Sun, Aug 28, 2011 at 1:40 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
It's acceptable for the Python version to use ctypes in the case of wrapping an existing library, but the Python version should still exist. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sat, Aug 27, 2011 at 5:42 PM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
Not sure whether you already have this: supporting the tarfile module would be nice.
Yes, got that - issue 5689. Also of interest is issue 5411 - adding .xz support to distutils. But I think that these are separate projects that should wait until the lzma module is finalized. On Sat, Aug 27, 2011 at 5:40 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Indeed, PEP 399 specifically notes that exemptions can be granted for modules that wrap external C libraries. On Sat, Aug 27, 2011 at 5:52 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I'm not too sure about that - PEP 399 explicitly says that using ctypes is frowned upon, and doesn't mention anywhere that it should be used in this sort of situation. Cheers, Nadeem

On Sun, Aug 28, 2011 at 1:58 AM, Nadeem Vawda <nadeem.vawda@gmail.com> wrote:
Note to self: do not comment on python-dev at 2 am, as one's ability to read PEPs correctly apparently suffers :) Consider my comment withdrawn, you're quite right that PEP 399 actually says this is precisely the case where an exemption is a reasonable idea. Although I believe it's likely that PyPy will wrap it with ctypes anyway :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sat, Aug 27, 2011 at 9:04 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I'd like to better understand why ctypes is (sometimes) frowned upon. Is it the brittleness? Tendency to segfault? If yes, is there a way of making ctypes less brittle - say, by carefully matching it against a specific version of a .so/.dll before starting to make heavy use of said .so/.dll? FWIW, I have a partial implementation of a module that does xz from Python using ctypes. It only does in-memory compression and decompression (not stream compression or decompression to or from a file), because that was all I needed for my current project, but it runs on CPython 2.x, CPython 3.x, and PyPy. I don't think it runs on Jython, but I've not looked at that carefully - my code falls back on subprocess if ctypes doesn't appear to be all there. It's at http://stromberg.dnsalias.org/svn/xz_mod/trunk/xz_mod.py
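For readers unfamiliar with the approach, the core of such a ctypes wrapper looks something like the sketch below. The function shown (lzma_version_string) is part of liblzma's public API; whether find_library actually locates the library on a given platform is exactly the portability question raised in the replies:

    import ctypes
    import ctypes.util

    _name = ctypes.util.find_library("lzma")    # may return None on some systems
    if _name is None:
        raise OSError("liblzma not found; a wrapper could fall back to the xz binary here")

    liblzma = ctypes.CDLL(_name)

    # lzma_version_string() returns a static "X.Y.Z" string. Note that the
    # result type has to be declared by hand; ctypes cannot check it against
    # the headers.
    liblzma.lzma_version_string.restype = ctypes.c_char_p
    liblzma.lzma_version_string.argtypes = []
    print(liblzma.lzma_version_string())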

I'd like to better understand why ctypes is (sometimes) frowned upon.
Is it the brittleness? Tendency to segfault?
That, and Python should work completely if ctypes is not available.
FWIW, I have a partial implementation of a module that does xz from Python using ctypes.
So does it work on Sparc/Solaris? On OpenBSD? On ARM-Linux? Does it work if the xz library is installed into /opt/sfw/xz? Regards, Martin

On Sat, Aug 27, 2011 at 1:21 PM, "Martin v. Löwis" <martin@v.loewis.de>wrote:
What are the most major platforms ctypes doesn't work on? It seems like there should be some way of coming up with an xml file describing the types of the various bits of data and formal arguments - perhaps using gccxml or something like it.
So far, I've only tried it on a couple of Linuxes and Cygwin. I intend to try it on a large number of *ix variants in the future, including OS X and Haiku. I doubt I'll test OpenBSD, but I'm likely to test on FreeBSD and Dragonfly again. With regard to /opt/sfw/xz, if ctypes.util.find_library(library) is smart enough to look there, then yes, xz_mod should find libxz there. On Cygwin, ctypes.util.find_library() wasn't smart enough to find a Cygwin DLL, so I coded around that. But it finds the library OK on the Linuxes I've tried so far. (This is part of a larger project, a backup program. The backup program has been tested on a large number of OS's, but I've not done another broad round of testing yet since adding the ctypes+xz code.)

On Sat, Aug 27, 2011 at 10:41 PM, Dan Stromberg <drsalists@gmail.com> wrote:
The problem is that you would need to do this check at runtime, every time you load up the library - otherwise, what happens if the user upgrades their installed copy of liblzma? And we can't expect users to have the liblzma headers installed, so we'd have to try and figure out whether the library was ABI-compatible from the shared object alone; I doubt that this is even possible.

On Sat, Aug 27, 2011 at 2:38 PM, Nadeem Vawda <nadeem.vawda@gmail.com>wrote:
I was thinking about this as I was getting groceries a bit ago. Why -can't- we expect the user to have liblzma headers installed? Couldn't it just be a dependency in the package management system? BTW, gcc-xml seems to be only for C++ (?), but long ago, around the time people were switching from K&R to ANSI C, there were programs like "mkptypes" that could parse a .c/.h and output prototypes. It seems we could do something like this on module init. IMO, we really, really need some common way of accessing C libraries that works for all major Python variants.

On Sat, Aug 27, 2011 at 3:26 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Well, uhhhhh, yeah. Not sure what your point is. 1) We could easily work with the dev / nondev distinction by taking a dependency on the -dev version of whatever we need, instead of the nondev version. 2) It's a rather arbitrary distinction that's being drawn between dev and nondev today. There's no particular reason why the line couldn't be drawn somewhere else.
Also, under Windows, most users don't have development stuff installed at all.
Yes... But if the nature of "what development stuff is" were to change, they'd have different stuff. Also, we wouldn't have to parse the .h's every time a module is loaded - we could have a timestamp file (or database) indicating when we last parsed a given .h. Also, we could query the package management system for the version of lzma that's currently installed on module init. Also, we could include our own version of lzma. Granted, this was a mess when zlib needed to be patched, but even this one might be worth it for the improved library unification across Python implementations.

On Sat, Aug 27, 2011 at 4:27 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Sure. Now please convince Linux distributions first, because this particular subthread is going nowhere.
I hope you're not a solipsist. Anyway, if the mere -discussion- of embracing a standard and safe way of making C libraries callable from all the major Python implementations is "going nowhere" before the discussion has even gotten started, I fear for Python's future. Repeat aloud to yourself: Python != CPython. Python != CPython. Python != CPython. Has this topic been discussed to death? If so, then say so. It's rude to try to kill the thread summarily before it gets started, sans discussion, sans explanation, sans commentary on whether new additions to the topic have surfaced or not.

On Sat, Aug 27, 2011 at 4:27 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Interesting. You seem to want to throw an arbitrary barrier between Python, the language, and accomplishing something important for said language. Care to tell me why I'm wrong? I'm all ears. I'll note that you've deleted:
...which makes it more than apparent that we needn't convince Linux distributors of #2, which you seem to prefer to focus on. Why was it in your best interest to delete #1, without even commenting on it?

Dan, I once had more or less the same opinion/question as you with regard to ctypes, but I now see at least 3 problems.

1) It seems hard to write it correctly. There are currently 47 open ctypes issues, with 9 being feature requests, leaving 38 behavior-related issues. Tom Heller has not been able to work on it since the beginning of 2010 and has formally withdrawn as maintainer. No one else that I know of has taken his place.

2) It is not trivial to use it correctly. I think it needs a SWIG-like companion script that can write at least first-pass ctypes code from the .h header files. Or maybe it could/should use header info at runtime (with the .h bundled with a module).

3) It seems to be slower than compiled C extension wrappers. That, at least, was the discovery of someone who re-wrote pygame using ctypes. (The hope was that using ctypes would aid porting to 3.x, but the time penalty was apparently too much for time-critical code.)

If you want to see more use of ctypes in the Python community (though not necessarily immediately in the stdlib), feel free to work on any one of these problems. A fourth problem is that people capable of working on ctypes are also capable of writing C extensions, and most prefer that. Or some work on Cython, which is a third solution. -- Terry Jan Reedy
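A small illustration of Terry's second point, using libm on a Unix-like system rather than liblzma for brevity: unless the caller declares argument and result types by hand, ctypes assumes C int everywhere, which is a common source of subtle bugs:

    import ctypes
    import ctypes.util

    libm = ctypes.CDLL(ctypes.util.find_library("m"))

    # Without these declarations, passing Python floats raises ArgumentError,
    # and the return value would be misread as a C int -- ctypes has no way
    # to check our declarations against math.h.
    libm.pow.argtypes = [ctypes.c_double, ctypes.c_double]
    libm.pow.restype = ctypes.c_double

    print(libm.pow(2.0, 10.0))   # 1024.0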

On Sun, Aug 28, 2011 at 6:58 AM, Terry Reedy <tjreedy@udel.edu> wrote:
This is sort of already available: -- http://starship.python.net/crew/theller/ctypes/old/codegen.html -- http://svn.python.org/projects/ctypes/trunk/ctypeslib/ It just appears to have never made it into CPython. I've used it successfully on a small project. Schiavo Simon

Hi, sorry for hooking in here with my usual Cython bias and promotion. When the question comes up what a good FFI for Python should look like, it's an obvious reaction from my part to throw Cython into the game. Terry Reedy, 28.08.2011 06:58:
Cython has an active set of developers and a rather large and growing user base. It certainly has lots of open issues in its bug tracker, but most of them are there because we *know* where the development needs to go, not so much because we don't know how to get there. After all, the semantics of Python and C/C++, between which Cython sits, are pretty much established. Cython compiles to C code for CPython, (hopefully soon [1]) to Python+ctypes for PyPy and (mostly [2]) C++/CLI code for IronPython, which boils down to the same build time and runtime kind of dependencies that the supported Python runtimes have anyway. It does not add dependencies on any external libraries by itself, such as the libffi in CPython's ctypes implementation. For the CPython backend, the generated code is very portable and is self-contained when compiled against the CPython runtime (plus, obviously, libraries that the user code explicitly uses). It generates efficient code for all existing CPython versions starting with Python 2.4, with several optimisations also for recent CPython versions (including the upcoming 3.3).
2) It is not trivial to use it correctly.
Cython is basically Python, so Python developers with some C or C++ knowledge tend to get along with it quickly. I can't say yet how easy it is (or will be) to write code that is portable across independent Python implementations, but given that that field is still young, there's certainly a lot that can be done to aid this.
From my experience, this is a "nice to have" more than a requirement. It has been requested for Cython a couple of times, especially by new users, and there are a couple of scripts out there that do this to some extent. But the usual problem is that Cython users (and, similarly, ctypes users) do not want a 1:1 mapping of a library API to a Python API (there's SWIG for that), and you can't easily get more than a trivial mapping out of a script. But, yes, a one-shot generator for the necessary declarations would at least help in cases where the API to be wrapped is somewhat large.
Cython code can be as fast as C code, and in some cases, especially when developer time is limited, even faster than hand-written C extensions. It allows for a straightforward optimisation path from regular Python code down to the speed of C, and trivial interaction with C code itself, if the need arises. Stefan [1] The PyPy port of Cython is currently being written as a GSoC project. [2] The IronPython port of Cython was written to facilitate a NumPy port to the .NET environment. It's currently not a complete port of all Cython features.

On Sun, Aug 28, 2011 at 11:23 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:
Cython does sound attractive for cross-Python-implementation use. This is exciting.
Hm, the main use that was proposed here for ctypes is to wrap existing libraries (not to create nicer APIs, that can be done in pure Python on top of this). In general, an existing library cannot be called without access to its .h files -- there are probably struct and constant definitions, platform-specific #ifdefs and #defines, and other things in there that affect the linker-level calling conventions for the functions in the library. (Just like Python's own .h files -- e.g. the extensive renaming of the Unicode APIs depending on narrow/wide build) How does Cython deal with these? I wonder if for this particular purpose SWIG isn't the better match. (If SWIG weren't universally hated, even by its original author. :-)
-- --Guido van Rossum (python.org/~guido)

On Mon, Aug 29, 2011 at 12:27 PM, Guido van Rossum <guido@python.org> wrote:
SWIG is nice when you control the C/C++ side of the API as well and can tweak it to be SWIG-friendly. I shudder at the idea of using it to wrap arbitrary C++ code, though. That said, the idea of using SWIG to emit Cython code rather than C/API code may be one well worth exploring. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Aug 28, 2011, at 7:27 PM, Guido van Rossum wrote:
Unfortunately I don't know a lot about this, but I keep hearing about something called "rffi" that PyPy uses to call C from RPython: <http://readthedocs.org/docs/pypy/en/latest/rffi.html>. This has some shortcomings currently, most notably the fact that it needs those .h files (and therefore a C compiler) at runtime, so it's currently a non-starter for code distributed to users. Not to mention the fact that, as you can see, it's not terribly thoroughly documented. But, that "ExternalCompilationInfo" object looks very promising, since it has fields like "includes", "libraries", etc. Nevertheless it seems like it's a bit more type-safe than ctypes or cython, and it seems to me that it could cache some of that information that it extracts from header files and store it for later when a compiler might not be around. Perhaps someone with more PyPy knowledge than I could explain whether this is a realistic contender for other Python runtimes?

2011/8/29 Glyph Lefkowitz <glyph@twistedmatrix.com>:
This is incorrect. rffi is actually quite like ctypes. The part you are referring to is probably rffi_platform [1], which invokes the compiler to determine constant values and struct offsets, or ctypes_configure, which does need runtime headers [2]. [1] https://bitbucket.org/pypy/pypy/src/92e36ab4eb5e/pypy/rpython/tool/rffi_plat... [2] https://bitbucket.org/pypy/pypy/src/92e36ab4eb5e/ctypes_configure/ -- Regards, Benjamin

Guido van Rossum wrote:
SIP is an alternative to SWIG: http://www.riverbankcomputing.com/software/sip/intro http://pypi.python.org/pypi/SIP and there are a few others as well: http://wiki.python.org/moin/IntegratingPythonWithOtherLanguages
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 29 2011)

Guido van Rossum, 29.08.2011 04:27:
The same applies to Cython, obviously. The main advantage of Cython over ctypes for this is that the Python-level wrapper code is also compiled into C, so whenever the need for a thicker wrapper arises in some part of the API, you don't lose any performance in intermediate layers.
In the CPython backend, the header files are normally #included by the generated C code, so they are used at C compilation time. Cython has its own view on the header files in separate declaration files (.pxd). Basically, they look like this:

    # file "mymath.pxd"
    cdef extern from "aheader.h":
        double PI
        double E
        double abs(double x)

These declaration files usually only contain the parts of a header file that are used in the user code, either manually copied over or extracted by scripts (that's what I was referring to in my reply to Terry). The complete 'real' content of the header file is then used by the C compiler at C compilation time. The user code employs a "cimport" statement to import the declarations at Cython compilation time, e.g.

    # file "mymodule.pyx"
    cimport mymath
    print mymath.PI + mymath.E

would result in C code that #includes "aheader.h", adds the C constants "PI" and "E", converts the result to a Python float object and prints it out using the normal CPython machinery. This means that declarations can be reused across modules, just like with header files. In fact, Cython actually ships with a couple of common declaration files, e.g. for parts of libc, NumPy or CPython's C-API. I don't know that much about the IronPython backend, but from what I heard, it uses basically the same build-time mechanisms and generates a thin C++ wrapper and a corresponding CLI part as a glue layer. The ctypes backend for PyPy works differently in that it generates a Python module from the .pxd files that contains the declarations as ctypes code. Then, the user code imports that normally at Python runtime. Obviously, this means that there are cases where the Cython-level declarations, and thus the generated ctypes code, will not match the ABI for a given target platform. So, in the worst case, there is a need to manually adapt the ctypes declarations in the Python module that was generated from the .pxd. Not worse than the current situation, though, and the rest of the Cython wrapper will compile into plain Python code that simply imports the declarations from the .pxd modules. But there's certainly room for improvements here. Stefan

On 29 August 2011 10:39, Stefan Behnel <stefan_ml@behnel.de> wrote:
One thing that would make it easier for me to understand the role of Cython in this context would be to see a simple example of the type of "thin wrapper" we're talking about here. The above code is nearly this, but the pyx file executes "real code". For example, how do I simply expose pi and abs from math.h? Based on the above, I tried a pyx file containing just the code

    cdef extern from "math.h":
        double pi
        double abs(double x)

but the resulting module exported no symbols. What am I doing wrong? Could you show a working example of writing such a wrapper? This is probably a bit off-topic, but it seems to me that whenever Cython comes up in these discussions, the implications of Cython-as-an-implementation-of-python obscure the idea of simply using Cython as a means of writing thin library wrappers. Just to clarify - the above code (if it works) seems to me like a nice simple means of writing wrappers. Something involving this in a pxd file, plus a pyx file with a whole load of dummy

    def abs(x): return cimported_module.abs(x)

definitions, seems ok, but annoyingly clumsy. (Particularly for big APIs). I've kept python-dev in this response, on the assumption that others on the list might be glad of seeing a concrete example of using Cython to build wrapper code. But anything further should probably be taken off-list... Thanks, Paul. PS This would also probably be a useful addition to the Cython wiki and/or the manual. I searched both and found very little other than a page on wrapping C++ classes (which is not very helpful for simple C global functions and constants).

Hi, I agree that this is getting off-topic for this list. I'm answering here in a certain detail to lighten things up a bit regarding thin and thick wrappers, but please move further usage related questions to the cython-users mailing list. Paul Moore, 29.08.2011 12:37:
Yes, that's the idea. If all you want is an exact, thin wrapper, you are better off with SWIG (well, assuming that performance is not important for you - Cython is a *lot* faster). But if you use it, or any other plain glue code generator, chances are that you will quickly learn that you do not actually want a thin wrapper. Instead, you want something that makes the external library easily and efficiently usable from Python code. Which means that the wrapper will be thin in some places and thick in others, sometimes very thick in selected places, and usually growing thicker over time. You can do this by using a glue code generator and writing the rest in a Python wrapper on top of the thin glue code. It's just that Cython makes such a wrapper much more efficient (for CPython), be it in terms of CPU performance (fast Python interaction, overhead-free C interaction, native C data type support, various Python code optimisations), or in terms of parallelisation support (explicit GIL-free threading and OpenMP), or just general programmer efficiency, e.g. regarding automatic data conversion or ease and safety of manual C memory management.
Recent Cython versions have support for directly exporting C values (e.g. enum values) at the Python module level. However, the normal way is to explicitly implement the module API as you guessed, i.e.

    cimport mydecls   # assuming there is a mydecls.pxd

    PI = mydecls.PI

    def abs(x):
        return mydecls.abs(x)

Looks simple, right? Nothing interesting here, until you start putting actual code into it, as in this (totally contrived and untested, but much more correct) example:

    from libc cimport math

    cdef extern from *:
        # these are defined by the always included Python.h:
        long LONG_MAX, LONG_MIN

    def abs(x):
        if isinstance(x, float):
            # -> C double
            return math.fabs(x)
        elif isinstance(x, int):
            # -> may or may not be a C integer
            if LONG_MIN <= x <= LONG_MAX:
                return <unsigned long> math.labs(x)
            else:
                # either within "long long" or raise OverflowError
                return <unsigned long long> math.llabs(x)
        else:
            # assume it can at least coerce to a C long,
            # or raise ValueError or OverflowError or whatever
            return <unsigned long> math.labs(x)

BTW, there is some simple templating/generics-like type merging support upcoming in a GSoC to simplify this kind of type specific code.
Cython is not a glue code generator, it's a full-fledged programming language. It's Python, with additional support for C data types. That makes it great for writing non-trivial wrappers between Python and C. It's not so great for the trivial cases, but luckily, those are rare. ;)
Agreed. The best place for asking about Cython usage is the cython-users mailing list.
Hmm, ok, I guess that's because it's too simple (you actually guessed how it works) and a somewhat rare use case. In most cases, wrappers tend to use extension types, as presented here: http://docs.cython.org/src/tutorial/clibraries.html Stefan

On Mon, Aug 29, 2011 at 2:39 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:
Yes, this is a very nice advantage. The only advantage that I can think of for ctypes is that it doesn't require a toolchain -- you can just write the Python code and get going. With Cython you will always have to invoke the Cython compiler. Another advantage may be that it works *today* for PyPy -- I don't know the status of Cython for PyPy. Also, (maybe this was answered before?), how well does Cython deal with #include files (especially those you don't have control over, like the ones typically required to use some lib<foo>.so safely on all platforms)? -- --Guido van Rossum (python.org/~guido)

Guido van Rossum wrote:
Pyrex/Cython deal with it by generating C code that includes the relevant headers, so the C compiler expands all the macros, interprets the struct declarations, etc. All you need to do when writing the .pyx file is follow the same API that you would if you were writing C code to use the library. -- Greg

Guido van Rossum wrote:
On Mon, Aug 29, 2011 at 2:17 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
You might be reading more into that statement than I meant. You have to supply Pyrex/Cython versions of the C declarations, either hand-written or generated by a tool. But you write them based on the advertised C API -- you don't have to manually expand macros, work out the low-level layout of structs, or anything like that (as you often have to do when using ctypes). -- Greg

"Martin v. Löwis", 30.08.2011 10:46:
I had written a bit about this here: http://thread.gmane.org/gmane.comp.python.devel/126340/focus=126419 Stefan

On Tue, Aug 30, 2011 at 9:49 AM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
To elaborate, with CPython it looks pretty solid, at least for functions and constants (does it do structs?). You must manually declare the name and signature of a function, and Pyrex/Cython emits C code that includes the header and calls the function with the appropriate types. If the signature you declare doesn't match what's in the .h file you'll get a compiler error when the C code is compiled. If (perhaps on some platforms) the function is really a macro, the macro in the .h file will be invoked and the right thing will happen. So far so good. The problem lies with the PyPy backend -- there it generates ctypes code, which means that the signature you declare to Cython/Pyrex must match the *linker* level API, not the C compiler level API. Thus, if in a system header a certain function is really a macro that invokes another function with a permuted or augmented argument list, you'd have to know what that macro does. I also don't see how this would work for #defined constants: where does Cython/Pyrex get their value? ctypes doesn't have their values. So, for PyPy, a solution based on Cython/Pyrex has many of the same downsides as one based on ctypes where it comes to complying with an API defined by a .h file. -- --Guido van Rossum (python.org/~guido)
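A concrete way to see the limitation, on a typical Unix-like system: a real function symbol is visible through the dynamic linker, but a #defined constant or macro never makes it into the shared object, so there is nothing for ctypes (or a ctypes-generating backend) to bind to:

    import ctypes
    import ctypes.util

    libc = ctypes.CDLL(ctypes.util.find_library("c"))

    # strlen is an exported function symbol, so declaring and calling it works:
    libc.strlen.restype = ctypes.c_size_t
    libc.strlen.argtypes = [ctypes.c_char_p]
    print(libc.strlen(b"hello"))        # 5

    # BUFSIZ is only a macro in stdio.h, not a linker-level symbol:
    try:
        libc.BUFSIZ
    except AttributeError as exc:
        print("no such symbol:", exc)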

On 8/30/2011 1:05 PM, Guido van Rossum wrote:
Thank you for this elaboration. My earlier comment that ctypes seems to be hard to use was based on observation of posts to python-list presenting failed attempts (which have included somehow getting function signatures wrong) and a sense that ctypes was somehow bypassing the public compiler API to make a more direct access via some private api. You have explained and named that as the 'linker API', so I understand much better now. Nothing like 'linker API' or 'signature' appears in the ctypes doc. All I could find about discovering specific function calling conventions is "To find out the correct calling convention you have to look into the C header file or the documentation for the function you want to call." Perhaps that should be elaborated to explain, as you did above, the need to trace macro definitions to find the actual calling convention and the need to be aware that macro definitions can change to accommodate implementation detail changes even as the surface calling conventions seems to remain the same. -- Terry Jan Reedy

Guido van Rossum, 30.08.2011 19:05:
Sure. They even coerce from Python dicts and accept keyword arguments in Cython.
Right.
Right again. The declarations that Cython uses describe the API at the C or C++ level. They do not describe the ABI. So the situation is the same as with ctypes, and the same solutions (or work-arounds) apply, such as generating additional glue code that calls macros or reads compile time constants, for example. That's the approach that the IronPython backend has taken. It's a lot more complex, but also a lot more versatile in the long run. Stefan

On Tue, Aug 30, 2011 at 10:05 AM, Guido van Rossum <guido@python.org> wrote:
It's certainly a harder problem. For most simple constants, Cython/Pyrex might be able to generate a series of tiny C programs with which to find CPP symbol values:

    #include "file1.h"
    ...
    #include "filen.h"
    main() { printf("%d", POSSIBLE_CPP_SYMBOL1); }

...and again with %f, %s, etc. The typing is quite a mess, and code fragments would probably be impractical. But since the C preprocessor is supposedly Turing complete, maybe there's a pleasant surprise waiting there. But hopefully clang has something that'd make this easier. SIP's approach of using something close to, but not identical to, the .h's sounds like it might be pretty productive - especially if the derivative of the .h's could be automatically derived using a Python script, with minor tweaks to the inputs on .h upgrades. But SIP itself is apparently C++-only.
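A hedged sketch of that probe idea, in Python: generate, compile and run a tiny C program to recover one #defined integer. The helper name and the example header/macro are placeholders, and it assumes a C compiler is available, which is precisely the build-time dependency being debated here:

    import os
    import subprocess
    import tempfile

    def probe_int_macro(header, macro, cc="cc"):
        """Return the value of `macro` as defined by `header`, found by
        compiling and running a one-line C program."""
        src = ('#include <%s>\n#include <stdio.h>\n'
               'int main(void) { printf("%%ld", (long)%s); return 0; }\n'
               % (header, macro))
        with tempfile.TemporaryDirectory() as tmp:
            c_file = os.path.join(tmp, "probe.c")
            exe = os.path.join(tmp, "probe")
            with open(c_file, "w") as f:
                f.write(src)
            subprocess.check_call([cc, c_file, "-o", exe])
            return int(subprocess.check_output([exe]).decode())

    # Example (hypothetical): probe_int_macro("limits.h", "INT_MAX")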

Dan Stromberg, 01.09.2011 19:56:
The user will commonly declare #defined values as typed external variables and callable macros as functions in .pxd files. These manually typed "macro" functions allow users to tell Cython what it should know about how the macros will be used. And that would allow it to generate C/C++ glue code for them that uses the declared types as a real function signature and calls the macro underneath.
and code fragments would probably be impractical.
Not necessarily at the C level but certainly for a ctypes backend, yes.
But hopefully clang has something that'd make this easier.
For figuring these things out, maybe. Not so much for solving the problems they introduce. Stefan

Dan Stromberg wrote:
http://www.riverbankcomputing.co.uk/software/sip/intro "What is SIP? One of the features of Python that makes it so powerful is the ability to take existing libraries, written in C or C++, and make them available as Python extension modules. Such extension modules are often called bindings for the library. SIP is a tool that makes it very easy to create Python bindings for C and C++ libraries. It was originally developed to create PyQt, the Python bindings for the Qt toolkit, but can be used to create bindings for any C or C++ library. " It's not C++ only. The code for SIP is also in C. Jeremy

On Sat, Aug 27, 2011 at 11:58 PM, Terry Reedy <tjreedy@udel.edu> wrote:
I am trying to work through getting these issues resolved. The hard part so far has been getting reviews and commits. The following patches are awaiting review (the patch for issue 11241 has been accepted, just not applied): 1. http://bugs.python.org/issue9041 2. http://bugs.python.org/issue9651 3. http://bugs.python.org/issue11241 I am more than happy to keep working through these issues, but I need some help getting the patches actually applied since I don't have commit rights. -- # Meador

Meador Inge <meadori <at> gmail.com> writes:
I raised a question about this patch (in the issue tracker).
2. http://bugs.python.org/issue9651 3. http://bugs.python.org/issue11241
I presume, since Amaury has commit rights, that he could commit these. Regards, Vinay Sajip

I also have some patches sitting on the tracker for some time: http://bugs.python.org/issue12764 http://bugs.python.org/issue11835 http://bugs.python.org/issue12528 which also fixes http://bugs.python.org/issue6069 and http://bugs.python.org/issue11920 http://bugs.python.org/issue6068 which also fixes http://bugs.python.org/issue6493 Thank you, Vlad On Tue, Aug 30, 2011 at 6:09 AM, Vinay Sajip <vinay_sajip@yahoo.co.uk>wrote:

On Sat, Aug 27, 2011 at 3:14 PM, Dan Stromberg <drsalists@gmail.com> wrote:
IMO, we really, really need some common way of accessing C libraries that works for all major Python variants.
We have one. It's called writing an extension module. ctypes is a crutch because it doesn't realistically have access to the header files. It's a fine crutch for PyPy, which doesn't have much of an alternative. It's also a fine crutch for people who need something to run *now*. It's a horrible strategy for the standard library. If you have a better proposal please do write it up. But so far you are mostly exposing your ignorance and insisting dramatically that you be educated. -- --Guido van Rossum (python.org/~guido)

On Sat, Aug 27, 2011 at 8:57 PM, Guido van Rossum <guido@python.org> wrote:
And yet Cext's are full of CPython-isms. I've said in the past that Python has been lucky in that it had only a single implementation for a long time, but still managed to escape becoming too defined by the idiosyncrasies of that implementation - that's quite impressive, and is probably our best indication that Python has had leadership with foresight. In the language proper, I'd say I still believe this, but Cext's are sadly not a good example.
ctypes is a crutch because it doesn't realistically have access to the header files.
Well, actually, header files are pretty easy to come by. I bet you've installed them yourself many times. In fact, you've probably even automatically brought some of them in via a package management system of one form or another without getting your hands dirty. As a thought experiment, imagine having a ctypes configuration system that looks around a computer for .h's and .so's (etc) with even 25% of the effort expended by GNU autoconf. Instead of building the results into a bunch of .o's, the results are saved in a .ct file or something. If you build in some reasonable default locations to look in, plus the equivalent of some -I's and -L's (and maybe -rpath's) as needed, you probably end up with a pretty comparable system. (typedef's might be a harder problem - that's particularly worth discussing, IMO - your chance to nip this in the bud with a reasoned explanation why they can't be handled well!)

It's a fine crutch for PyPy, which doesn't have much of an alternative.

Wait - a second ago I thought I was to believe that C extension modules were the one true way of interfacing with C code across all major implementations? Are we perhaps saying that CPython is "the" major implementation, and that we want it to stay that way? I personally feel that PyPy has arrived as a major implementation. The backup program I've been writing in my spare time runs great on PyPy (and the CPythons from 2.5.x, and pretty well on Jython). And PyPy has been maturing very rapidly ('just wish they'd do 3.x!).

It's also a fine crutch for people who need something to run *now*. It's a horrible strategy for the standard library.

I guess I'm coming to see this as dogma. If ctypes is augmented with type information and/or version information and where to find things, wouldn't it become safe and convenient? Or do you have other concerns? Make a list of things that can go wrong with ctypes modules. Now make a list of things that can go wrong with C extension modules. Aren't they really pretty similar - missing .so, .so in a weird place, and especially: .so with a changed interface? C really isn't a very safe language - not like http://en.wikipedia.org/wiki/Turing_%28programming_language%29 or something. Perhaps it's a little easier to mess things up with ctypes today (a recompile doesn't fix, or at least detect, as many problems), but isn't it at least worth thinking about how that situation could be improved?

If you have a better proposal please do write it up. But so far you are mostly exposing your ignorance and insisting dramatically that you be educated.

I'm not sure why you're trying to avoid having a discussion. I think it's premature to dive into a proposal before getting other people's thoughts. Frankly, 100 people tend to think better than one - at least, if the 100 people feel like they can talk. I'm -not- convinced ctypes are the way forward. I just want to talk about it - for now. ctypes have some significant advantages - if we can find a way to eliminate and/or ameliorate their disadvantages, they might be quite a bit nicer than Cext's.

On Sat, Aug 27, 2011 at 10:36 PM, Dan Stromberg <drsalists@gmail.com> wrote:
I have to apologize, I somehow misread your "all Python variants" as a mixture of "all CPython versions" and "all platforms where CPython runs". While I have no desire to continue this discussion, you are most welcome to do so. -- --Guido van Rossum (python.org/~guido)

Guido van Rossum wrote:
I think Dan means some way of doing this without having to hand-craft a different one for each Python implementation. If we're really serious about the idea that "Python is not CPython", this seems like a reasonable thing to want. Currently the Python universe is very much centred around CPython, with the other implementations perpetually in catch-up mode. My suggestion on how to address this would be something akin to Pyrex or Cython. I gather that there has been some work recently on adding different back-ends to Cython to generate code for different Python implementations. -- Greg

On Sat, Aug 27, 2011 at 9:47 PM, Terry Reedy <tjreedy@udel.edu> wrote:
Please note that the code I'm talking about is not the same as the patches by Per Øyvind Karlsen that are attached to the tracker issue. I have been doing a completely new implementation of the module, specifically to address the concerns raised by Martin and Antoine. (As for why I haven't posted my own changes yet - I'm currently an intern at Google, and they want me to run my code by their open-source team before releasing it into the wild. Sorry for the delay and the confusion.)
I talked to Antoine about this on IRC; he didn't seem to think a PEP would be necessary. But a summary of the discussion on the tracker issue might still be a useful thing to have, given how long it's gotten.
As stated in my earlier response to Martin, I intend to do this. Aside from I/O, though, there's not much that _can_ be done in Python - the rest is basically just providing a thin wrapper for the C library. On Sat, Aug 27, 2011 at 9:58 PM, Dan Stromberg <drsalists@gmail.com> wrote:
I'd like to better understand why ctypes is (sometimes) frowned upon.
Is it the brittleness? Tendency to segfault?
The problem (as I understand it) is that ABI changes in a library will cause code that uses it via ctypes to break without warning. With an extension module, you'll get a compile failure if you rely on things that change in an incompatible way. With a ctypes wrapper, you just get incorrect answers, or segfaults.
This might be feasible for a specific application running in a controlled environment, but it seems impractical for something as widely-used as the stdlib. Having to include a whitelist of acceptable library versions would be a substantial maintenance burden, and (compatible) new versions would not work until the library whitelist gets updated. Cheers, Nadeem

I've updated the issue <http://bugs.python.org/issue6715> with a patch containing my work so far - the LZMACompressor and LZMADecompressor classes, along with some tests. These two classes should provide a fairly complete interface to liblzma; it will be possible to implement LZMAFile on top of them, entirely in Python. Note that the C code does no I/O; this will be handled by LZMAFile. Please take a look, and let me know what you think. Cheers, Nadeem

I've posted an updated patch to the bug tracker, with a complete implementation of the lzma module, including 100% test coverage for the LZMAFile class (which is implemented entirely in Python). It doesn't include ReST documentation (yet), but the docstrings are quite detailed. Please take a look and let me know what you think. Cheers, Nadeem

Another update - I've added proper documentation. Now the code should be pretty much complete - all that's missing is the necessary bits and pieces to build it on Windows. Cheers, Nadeem

Dan Stromberg, 27.08.2011 21:58:
Maybe unwieldy code and slow execution on CPython? Note that there's a ctypes backend for Cython being written as part of a GSoC, so it should eventually become possible to write C library wrappers in Cython and have it generate a ctypes version to run on PyPy. That, together with the IronPython backend that is on its way, would give you a way to write fast wrappers for at least three of the major four Python implementations, without sacrificing readability or speed in one of them. Stefan

On Sun, 28 Aug 2011 01:52:51 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
I think you're taking this too seriously. Our extension modules (_bz2, _ssl...) are *already* optional even on CPython. If the library or its development headers are not available on the system, building these extensions is simply skipped, and the test suite passes nonetheless. The only required libraries for passing the tests being basically the libc and the zlib. Regards Antoine.

PEP 399 also comes into play - we need a pure Python version for PyPy et al (or a plausible story for why an exception should be granted).
No, we don't. We can grant an exception, which I'm very willing to do. The PEP lists wrapping a specific C-based library as a plausible reason.
It's acceptable for the Python version to use ctypes
Hmm. To me, *that's* unacceptable. In the specific case, having a pure-Python implementation would be acceptable to me, but I'm skeptical that anybody is willing to produce one. Regards, Martin

On Sat, Aug 27, 2011 at 5:15 PM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
It is not my intention for the _lzma C module to do I/O - that will be done by the LZMAFile class, which will be written in Python. My comparison with bz2 was in reference to the state of the module after it was rewritten for issue 5863. Saying "anything along the lines of GzipFile" was a bad choice of wording; what I meant is that the LZMAFile class won't handle the problem of picking apart the .xz and .lzma container formats. That is handled by liblzma (operating entirely on in-memory buffers). It will do _only_ I/O, in a similar fashion to the BZ2File class (as of changeset 2cb07a46f4b5, to avoid ambiguity ;) ). Cheers, Nadeem

On 8/27/2011 9:47 AM, Nadeem Vawda wrote:
As I read the discussion, the idea has been more or less accepted in principle. However, the current patch is not and needs changes.
I believe Antoine suggested a PEP. It should summarize the salient points in the long tracker discussion into a coherent exposition and flesh out the details implied above. (Perhaps they are already in the proposed doc addition.)
I would follow Martin's suggestions, including doing all i/o with the io module and the following: "So I would propose that a very thin C layer is created around the C library that focuses on the actual algorithms, and that any higher layers (in particular file formats) are done in Python." If we minimize the C code we add and maximize what is done in Python, that would maximize the ease of porting to other implementations. This would conform to the spirit of PEP 399. -- Terry Jan Reedy

When I reviewed lzma, I found that this approach might not be appropriate. lzma has many more options and aspects that allow tuning and selection, and a Python LZMA library should provide the same feature set as the underlying C library. So I would propose that a very thin C layer is created around the C library that focuses on the actual algorithms, and that any higher layers (in particular file formats) are done in Python. Regards, Martin

On Sat, Aug 27, 2011 at 4:50 PM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
I probably shouldn't have used the word "basic" here - these classes expose all the features of the underlying library. I was rather trying to underscore that the rest of the module is implemented in terms of these two classes. As for file formats, these are handled by liblzma itself; the extension module just selects which compressor/decompressor initializer function to use depending on the value of the "format" argument. Our code won't contain anything along the lines of GzipFile; all of that work is done by the underlying C library. Rather, the LZMAFile class will be like BZ2File - just a simple filter that passes the read/written data through a LZMACompressor or LZMADecompressor as appropriate. Cheers, Nadeem

This is exactly what I worry about. I think adding file I/O to bz2 was a mistake, as this doesn't integrate with Python's IO library (it used to, but now after dropping stdio, they were incompatible. Indeed, for Python 3.2, BZ2File has been removed from the C module, and lifted to Python. IOW, the _lzma C module must not do any I/O, neither directly nor indirectly (through liblzma). The approach of gzip.py (doing IO and file formats in pure Python) is exactly right. Regards, Martin

On Sun, Aug 28, 2011 at 1:15 AM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
PEP 399 also comes into play - we need a pure Python version for PyPy et al (or a plausible story for why an exception should be granted). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sun, 28 Aug 2011 01:36:50 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
The plausible story being that we basically wrap an existing library? I don't think PyPy et al have pure Python versions of the zlib or OpenSSL, do they? If we start taking PEP 399 conformance to such levels, we might as well stop developing CPython. cheers Antoine.

On Sun, Aug 28, 2011 at 1:40 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
It's acceptable for the Python version to use ctypes in the case of wrapping an existing library, but the Python version should still exist. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sat, Aug 27, 2011 at 5:42 PM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
Not sure whether you already have this: supporting the tarfile module would be nice.
Yes, got that - issue 5689. Also of interest is issue 5411 - adding .xz support to distutils. But I think that these are separate projects that should wait until the lzma module is finalized. On Sat, Aug 27, 2011 at 5:40 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Indeed, PEP 399 specifically notes that exemptions can be granted for modules that wrap external C libraries. On Sat, Aug 27, 2011 at 5:52 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I'm not too sure about that - PEP 399 explicitly says that using ctypes is frowned upon, and doesn't mention anywhere that it should be used in this sort of situation. Cheers, Nadeem

On Sun, Aug 28, 2011 at 1:58 AM, Nadeem Vawda <nadeem.vawda@gmail.com> wrote:
Note to self: do not comment on python-dev at 2 am, as one's ability to read PEPs correctly apparently suffers :) Consider my comment withdrawn, you're quite right that PEP 399 actually says this is precisely the case where an exemption is a reasonable idea. Although I believe it's likely that PyPy will wrap it with ctypes anyway :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sat, Aug 27, 2011 at 9:04 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I'd like to better understand why ctypes is (sometimes) frowned upon. Is it the brittleness? Tendency to segfault? If yes, is there a way of making ctypes less brittle - say, by carefully matching it against a specific version of a .so/.dll before starting to make heavy use of said .so/.dll? FWIW, I have a partial implementation of a module that does xz from Python using ctypes. It only does in-memory compression and decompression (not stream compression or decompression to or from a file), because that was all I needed for my current project, but it runs on CPython 2.x, CPython 3.x, and PyPy. I don't think it runs on Jython, but I've not looked at that carefully - my code falls back on subprocess if ctypes doesn't appear to be all there. It's at http://stromberg.dnsalias.org/svn/xz_mod/trunk/xz_mod.py

I'd like to better understand why ctypes is (sometimes) frowned upon.
Is it the brittleness? Tendency to segfault?
That, and Python should work completely if ctypes is not available.
FWIW, I have a partial implementation of a module that does xz from Python using ctypes.
So does it work on Sparc/Solaris? On OpenBSD? On ARM-Linux? Does it work if the xz library is installed into /opt/sfw/xz? Regards, Martin

On Sat, Aug 27, 2011 at 1:21 PM, "Martin v. Löwis" <martin@v.loewis.de>wrote:
What are the most major platforms ctypes doesn't work on? It seems like there should be some way of coming up with an xml file describing the types of the various bits of data and formal arguments - perhaps using gccxml or something like it.
So far, I've only tried it on a couple of Linuxes and Cygwin. I intend to try it on a large number of *ix variants in the future, including OS/X and Haiku. I doubt I'll test OpenBSD, but I'm likely to test on FreeBSD and Dragonfly again. With regard to /opt/sfw/xz, if ctypes.util.find_library(library) is smart enough to look there, then yes, xz_mod should find libxz there. On Cygwin, ctypes.util.find_library() wasn't smart enough to find a Cygwin DLL, so I coded around that. But it finds the library OK on the Linuxes I've tried so far. (This is part of a larger project, a backup program. The backup program has been tested on a large number of OS's, but I've not done another broad round of testing yet since adding the ctypes+xz code)

On Sat, Aug 27, 2011 at 10:41 PM, Dan Stromberg <drsalists@gmail.com> wrote:
The problem is that you would need to do this check at runtime, every time you load up the library - otherwise, what happens if the user upgrades their installed copy of liblzma? And we can't expect users to have the liblzma headers installed, so we'd have to try and figure out whether the library was ABI-compatible from the shared object alone; I doubt that this is even possible.

On Sat, Aug 27, 2011 at 2:38 PM, Nadeem Vawda <nadeem.vawda@gmail.com>wrote:
I was thinking about this as I was getting groceries a bit ago. Why -can't- we expect the user to have liblzma headers installed? Couldn't it just be a dependency in the package management system? BTW, gcc-xml seems to be only for C++ (?), but long ago, around the time people were switching from K&R to Ansi C, there were programs like "mkptypes" that could parse a .c/.h and output prototypes. It seems we could do something like this on module init. IMO, we really, really need some common way of accessing C libraries that works for all major Python variants.

On Sat, Aug 27, 2011 at 3:26 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Well, uhhhhh, yeah. Not sure what your point is. 1) We could easily work with the dev / nondev distinction by taking a dependency on the -dev version of whatever we need, instead of the nondev version. 2) It's a rather arbitrary distinction that's being drawn between dev and nondev today. There's no particular reason why the line couldn't be drawn somewhere else.
Also, under Windows, most users don't have development stuff installed at all.
Yes... But if the nature of "what development stuff is" were to change, they'd have different stuff. Also, we wouldn't have to parse the .h's every time a module is loaded - we could have a timestamp file (or database) indicating when we last parsed a given .h. Also, we could query the package management system for the version of lzma that's currently installed on module init. Also, we could include our own version of lzma. Granted, this was a mess when zlib needed to be patched, but even this one might be worth it for the improved library unification across Python implementations.

On Sat, Aug 27, 2011 at 4:27 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Sure. Now please convince Linux distributions first, because this particular subthread is going nowhere.
I hope you're not a solipsist. Anyway, if the mere -discussion- of embracing a standard and safe way of making C libraries callable from all the major Python implementations is "going nowhere" before the discussion has even gotten started, I fear for Python's future. Repeat aloud to yourself: Python != CPython. Python != CPython. Python != CPython. Has this topic been discussed to death? If so, then say so. It's rude to try to kill the thread summarily before it gets started, sans discussion, sans explanation, sans commentary on whether new additions to the topic have surfaced or not.

On Sat, Aug 27, 2011 at 4:27 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Interesting. You seem to want to throw an arbitrary barrier between Python, the language, and accomplishing something important for said language. Care to tell me why I'm wrong? I'm all ears. I'll note that you've deleted:
...which makes it more than apparent that we needn't convince Linux distributors of #2, which you seem to prefer to focus on. Why was it in your best interest to delete #1, without even commenting on it?

Dan, I once had the more or less the same opinion/question as you with regard to ctypes, but I now see at least 3 problems. 1) It seems hard to write it correctly. There are currently 47 open ctypes issues, with 9 being feature requests, leaving 38 behavior-related issues. Tom Heller has not been able to work on it since the beginning of 2010 and has formally withdrawn as maintainer. No one else that I know of has taken his place. 2) It is not trivial to use it correctly. I think it needs a SWIG-like companion script that can write at least first-pass ctypes code from the .h header files. Or maybe it could/should use header info at runtime (with the .h bundled with a module). 3) It seems to be slower than compiled C extension wrappers. That, at least, was the discovery of someone who re-wrote pygame using ctypes. (The hope was that using ctypes would aid porting to 3.x, but the time penalty was apparently too much for time-critical code.) If you want to see more use of ctypes in the Python community (though not necessarily immediately in the stdlib), feel free to work on any one of these problems. A fourth problem is that people capable of working on ctypes are also capable of writing C extensions, and most prefer that. Or some work on Cython, which is a third solution. -- Terry Jan Reedy

On Sun, Aug 28, 2011 at 6:58 AM, Terry Reedy <tjreedy@udel.edu> wrote:
This is sort of already available:
- http://starship.python.net/crew/theller/ctypes/old/codegen.html
- http://svn.python.org/projects/ctypes/trunk/ctypeslib/
It just appears to have never made it into CPython. I've used it successfully on a small project. Schiavo Simon

Hi, sorry for hooking in here with my usual Cython bias and promotion. When the question comes up what a good FFI for Python should look like, it's an obvious reaction from my part to throw Cython into the game. Terry Reedy, 28.08.2011 06:58:
Cython has an active set of developers and a rather large and growing user base. It certainly has lots of open issues in its bug tracker, but most of them are there because we *know* where the development needs to go, not so much because we don't know how to get there. After all, the semantics of Python and C/C++, between which Cython sits, are pretty much established. Cython compiles to C code for CPython, (hopefully soon [1]) to Python+ctypes for PyPy and (mostly [2]) C++/CLI code for IronPython, which boils down to the same build time and runtime kind of dependencies that the supported Python runtimes have anyway. It does not add dependencies on any external libraries by itself, such as the libffi in CPython's ctypes implementation. For the CPython backend, the generated code is very portable and is self-contained when compiled against the CPython runtime (plus, obviously, libraries that the user code explicitly uses). It generates efficient code for all existing CPython versions starting with Python 2.4, with several optimisations also for recent CPython versions (including the upcoming 3.3).
2) It is not trivial to use it correctly.
Cython is basically Python, so Python developers with some C or C++ knowledge tend to get along with it quickly. I can't say yet how easy it is (or will be) to write code that is portable across independent Python implementations, but given that that field is still young, there's certainly a lot that can be done to aid this.
From my experience, this is a "nice to have" more than a requirement. It has been requested for Cython a couple of times, especially by new users, and there are a couple of scripts out there that do this to some extent. But the usual problem is that Cython users (and, similarly, ctypes users) do not want a 1:1 mapping of a library API to a Python API (there's SWIG for that), and you can't easily get more than a trivial mapping out of a script. But, yes, a one-shot generator for the necessary declarations would at least help in cases where the API to be wrapped is somewhat large.
Cython code can be as fast as C code, and in some cases, especially when developer time is limited, even faster than hand-written C extensions. It allows for a straightforward optimisation path from regular Python code down to the speed of C, and trivial interaction with C code itself, if the need arises. Stefan
[1] The PyPy port of Cython is currently being written as a GSoC project.
[2] The IronPython port of Cython was written to facilitate a NumPy port to the .NET environment. It's currently not a complete port of all Cython features.

On Sun, Aug 28, 2011 at 11:23 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:
Cython does sound attractive for cross-Python-implementation use. This is exciting.
Hm, the main use that was proposed here for ctypes is to wrap existing libraries (not to create nicer APIs, that can be done in pure Python on top of this). In general, an existing library cannot be called without access to its .h files -- there are probably struct and constant definitions, platform-specific #ifdefs and #defines, and other things in there that affect the linker-level calling conventions for the functions in the library. (Just like Python's own .h files -- e.g. the extensive renaming of the Unicode APIs depending on narrow/wide build) How does Cython deal with these? I wonder if for this particular purpose SWIG isn't the better match. (If SWIG weren't universally hated, even by its original author. :-)
-- --Guido van Rossum (python.org/~guido)

On Mon, Aug 29, 2011 at 12:27 PM, Guido van Rossum <guido@python.org> wrote:
SWIG is nice when you control the C/C++ side of the API as well and can tweak it to be SWIG-friendly. I shudder at the idea of using it to wrap arbitrary C++ code, though. That said, the idea of using SWIG to emit Cython code rather than C/API code may be one well worth exploring. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Aug 28, 2011, at 7:27 PM, Guido van Rossum wrote:
Unfortunately I don't know a lot about this, but I keep hearing about something called "rffi" that PyPy uses to call C from RPython: <http://readthedocs.org/docs/pypy/en/latest/rffi.html>. This has some shortcomings currently, most notably the fact that it needs those .h files (and therefore a C compiler) at runtime, so for now it's a non-starter for code distributed to users. Not to mention the fact that, as you can see, it's not terribly thoroughly documented. But that "ExternalCompilationInfo" object looks very promising, since it has fields like "includes", "libraries", etc. Nevertheless it seems like it's a bit more type-safe than ctypes or Cython, and it seems to me that it could cache some of the information it extracts from header files and store it for later, when a compiler might not be around. Perhaps someone with more PyPy knowledge than I could explain whether this is a realistic contender for other Python runtimes?

2011/8/29 Glyph Lefkowitz <glyph@twistedmatrix.com>:
This is incorrect. rffi is actually quite like ctypes. The part you are referring to is probably rffi_platform [1], which invokes the compiler to determine constant values and struct offsets, or ctypes_configure, which does need runtime headers [2]. [1] https://bitbucket.org/pypy/pypy/src/92e36ab4eb5e/pypy/rpython/tool/rffi_plat... [2] https://bitbucket.org/pypy/pypy/src/92e36ab4eb5e/ctypes_configure/ -- Regards, Benjamin

Guido van Rossum wrote:
SIP is an alternative to SWIG: http://www.riverbankcomputing.com/software/sip/intro http://pypi.python.org/pypi/SIP and there are a few others as well: http://wiki.python.org/moin/IntegratingPythonWithOtherLanguages
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 29 2011)

Guido van Rossum, 29.08.2011 04:27:
The same applies to Cython, obviously. The main advantage of Cython over ctypes for this is that the Python-level wrapper code is also compiled into C, so whenever the need for a thicker wrapper arises in some part of the API, you don't lose any performance in intermediate layers.
In the CPython backend, the header files are normally #included by the generated C code, so they are used at C compilation time. Cython has its own view on the header files in separate declaration files (.pxd). It basically looks like this:

    # file "mymath.pxd"
    cdef extern from "aheader.h":
        double PI
        double E
        double abs(double x)

These declaration files usually only contain the parts of a header file that are used in the user code, either manually copied over or extracted by scripts (that's what I was referring to in my reply to Terry). The complete 'real' content of the header file is then used by the C compiler at C compilation time. The user code employs a "cimport" statement to import the declarations at Cython compilation time, e.g.

    # file "mymodule.pyx"
    cimport mymath
    print mymath.PI + mymath.E

would result in C code that #includes "aheader.h", adds the C constants "PI" and "E", converts the result to a Python float object and prints it out using the normal CPython machinery. This means that declarations can be reused across modules, just like with header files. In fact, Cython actually ships with a couple of common declaration files, e.g. for parts of libc, NumPy or CPython's C-API. I don't know that much about the IronPython backend, but from what I heard, it uses basically the same build time mechanisms and generates a thin C++ wrapper and a corresponding CLI part as glue layer. The ctypes backend for PyPy works differently in that it generates a Python module from the .pxd files that contains the declarations as ctypes code. Then, the user code imports that normally at Python runtime. Obviously, this means that there are cases where the Cython-level declarations, and thus the generated ctypes code, will not match the ABI for a given target platform. So, in the worst case, there is a need to manually adapt the ctypes declarations in the Python module that was generated from the .pxd. Not worse than the current situation, though, and the rest of the Cython wrapper will compile into plain Python code that simply imports the declarations from the .pxd modules. But there's certainly room for improvements here. Stefan

On 29 August 2011 10:39, Stefan Behnel <stefan_ml@behnel.de> wrote:
One thing that would make it easier for me to understand the role of Cython in this context would be to see a simple example of the type of "thin wrapper" we're talking about here. The above code is nearly this, but the pyx file executes "real code". For example, how do I simply expose pi and abs from math.h? Based on the above, I tried a pyx file containing just the code

    cdef extern from "math.h":
        double pi
        double abs(double x)

but the resulting module exported no symbols. What am I doing wrong? Could you show a working example of writing such a wrapper? This is probably a bit off-topic, but it seems to me that whenever Cython comes up in these discussions, the implications of Cython-as-an-implementation-of-python obscure the idea of simply using Cython as a means of writing thin library wrappers. Just to clarify - the above code (if it works) seems to me like a nice simple means of writing wrappers. Something involving this in a pxd file, plus a pyx file with a whole load of dummy

    def abs(x):
        return cimported_module.abs(x)

definitions, seems ok, but annoyingly clumsy. (Particularly for big APIs). I've kept python-dev in this response, on the assumption that others on the list might be glad of seeing a concrete example of using Cython to build wrapper code. But anything further should probably be taken off-list... Thanks, Paul. PS This would also probably be a useful addition to the Cython wiki and/or the manual. I searched both and found very little other than a page on wrapping C++ classes (which is not very helpful for simple C global functions and constants).

Hi, I agree that this is getting off-topic for this list. I'm answering here in a certain detail to lighten things up a bit regarding thin and thick wrappers, but please move further usage related questions to the cython-users mailing list. Paul Moore, 29.08.2011 12:37:
Yes, that's the idea. If all you want is an exact, thin wrapper, you are better off with SWIG (well, assuming that performance is not important for you - Cython is a *lot* faster). But if you use it, or any other plain glue code generator, chances are that you will quickly learn that you do not actually want a thin wrapper. Instead, you want something that makes the external library easily and efficiently usable from Python code. Which means that the wrapper will be thin in some places and thick in others, sometimes very thick in selected places, and usually growing thicker over time. You can do this by using a glue code generator and writing the rest in a Python wrapper on top of the thin glue code. It's just that Cython makes such a wrapper much more efficient (for CPython), be it in terms of CPU performance (fast Python interaction, overhead-free C interaction, native C data type support, various Python code optimisations), or in terms of parallelisation support (explicit GIL-free threading and OpenMP), or just general programmer efficiency, e.g. regarding automatic data conversion or ease and safety of manual C memory management.
Recent Cython versions have support for directly exporting C values (e.g. enum values) at the Python module level. However, the normal way is to explicitly implement the module API as you guessed, i.e.

    cimport mydecls    # assuming there is a mydecls.pxd

    PI = mydecls.PI

    def abs(x):
        return mydecls.abs(x)

Looks simple, right? Nothing interesting here, until you start putting actual code into it, as in this (totally contrived and untested, but much more correct) example:

    from libc cimport math

    cdef extern from *:
        # these are defined by the always included Python.h:
        long LONG_MAX, LONG_MIN

    def abs(x):
        if isinstance(x, float):
            # -> C double
            return math.fabs(x)
        elif isinstance(x, int):
            # -> may or may not be a C integer
            if LONG_MIN <= x <= LONG_MAX:
                return <unsigned long> math.labs(x)
            else:
                # either within "long long" or raise OverflowError
                return <unsigned long long> math.llabs(x)
        else:
            # assume it can at least coerce to a C long,
            # or raise ValueError or OverflowError or whatever
            return <unsigned long> math.labs(x)

BTW, there is some simple templating/generics-like type merging support upcoming in a GSoC to simplify this kind of type specific code.
Cython is not a glue code generator, it's a full-fledged programming language. It's Python, with additional support for C data types. That makes it great for writing non-trivial wrappers between Python and C. It's not so great for the trivial cases, but luckily, those are rare. ;)
Agreed. The best place for asking about Cython usage is the cython-users mailing list.
Hmm, ok, I guess that's because it's too simple (you actually guessed how it works) and a somewhat rare use case. In most cases, wrappers tend to use extension types, as presented here: http://docs.cython.org/src/tutorial/clibraries.html Stefan

On Mon, Aug 29, 2011 at 2:39 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:
Yes, this is a very nice advantage. The only advantage that I can think of for ctypes is that it doesn't require a toolchain -- you can just write the Python code and get going. With Cython you will always have to invoke the Cython compiler. Another advantage may be that it works *today* for PyPy -- I don't know the status of Cython for PyPy. Also, (maybe this was answered before?), how well does Cython deal with #include files (especially those you don't have control over, like the ones typically required to use some lib<foo>.so safely on all platforms)? -- --Guido van Rossum (python.org/~guido)
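As an illustration of the no-toolchain point, about the smallest possible ctypes-only wrapper: it assumes liblzma is installed and exports lzma_version_string(), which takes no arguments and returns a const char *.

    import ctypes, ctypes.util

    # Load the shared library at runtime; no headers or compiler needed.
    _lzma = ctypes.CDLL(ctypes.util.find_library("lzma"))

    # Declare the signature by hand - this is exactly the information that
    # would otherwise come from the .h file.
    _lzma.lzma_version_string.restype = ctypes.c_char_p
    _lzma.lzma_version_string.argtypes = []

    print(_lzma.lzma_version_string())    # e.g. b'5.2.5'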

Guido van Rossum wrote:
Pyrex/Cython deal with it by generating C code that includes the relevant headers, so the C compiler expands all the macros, interprets the struct declarations, etc. All you need to do when writing the .pyx file is follow the same API that you would if you were writing C code to use the library. -- Greg

Guido van Rossum wrote:
On Mon, Aug 29, 2011 at 2:17 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
You might be reading more into that statement than I meant. You have to supply Pyrex/Cython versions of the C declarations, either hand-written or generated by a tool. But you write them based on the advertised C API -- you don't have to manually expand macros, work out the low-level layout of structs, or anything like that (as you often have to do when using ctypes). -- Greg

"Martin v. Löwis", 30.08.2011 10:46:
I had written a bit about this here: http://thread.gmane.org/gmane.comp.python.devel/126340/focus=126419 Stefan

On Tue, Aug 30, 2011 at 9:49 AM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
To elaborate, with CPython it looks pretty solid, at least for functions and constants (does it do structs?). You must manually declare the name and signature of a function, and Pyrex/Cython emits C code that includes the header and calls the function with the appropriate types. If the signature you declare doesn't match what's in the .h file you'll get a compiler error when the C code is compiled. If (perhaps on some platforms) the function is really a macro, the macro in the .h file will be invoked and the right thing will happen. So far so good. The problem lies with the PyPy backend -- there it generates ctypes code, which means that the signature you declare to Cython/Pyrex must match the *linker* level API, not the C compiler level API. Thus, if in a system header a certain function is really a macro that invokes another function with a permuted or augmented argument list, you'd have to know what that macro does. I also don't see how this would work for #defined constants: where does Cython/Pyrex get their value? ctypes doesn't have their values. So, for PyPy, a solution based on Cython/Pyrex has many of the same downsides as one based on ctypes where it comes to complying with an API defined by a .h file. -- --Guido van Rossum (python.org/~guido)
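A small ctypes illustration of that last point about #defined constants (O_CREAT is just a convenient example of a value that exists only in the preprocessor):

    import ctypes, ctypes.util

    libc = ctypes.CDLL(ctypes.util.find_library("c"))

    print(libc.open)       # open() is an exported symbol, so dlsym() finds it
    try:
        libc.O_CREAT       # a #define leaves no symbol in the shared library
    except AttributeError:
        print("O_CREAT is invisible at the linker level")
    # Its value has to come from somewhere else entirely: hard-coding it,
    # parsing the header, or compiling a small probe program against it.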

On 8/30/2011 1:05 PM, Guido van Rossum wrote:
Thank you for this elaboration. My earlier comment that ctypes seems to be hard to use was based on observation of posts to python-list presenting failed attempts (which have included somehow getting function signatures wrong) and a sense that ctypes was somehow bypassing the public compiler API to make more direct access via some private API. You have explained and named that as the 'linker API', so I understand much better now. Nothing like 'linker API' or 'signature' appears in the ctypes doc. All I could find about discovering specific function calling conventions is "To find out the correct calling convention you have to look into the C header file or the documentation for the function you want to call." Perhaps that should be elaborated to explain, as you did above, the need to trace macro definitions to find the actual calling convention, and the need to be aware that macro definitions can change to accommodate implementation detail changes even as the surface calling conventions seem to remain the same. -- Terry Jan Reedy

Guido van Rossum, 30.08.2011 19:05:
Sure. They even coerce from Python dicts and accept keyword arguments in Cython.
Right.
Right again. The declarations that Cython uses describe the API at the C or C++ level. They do not describe the ABI. So the situation is the same as with ctypes, and the same solutions (or work-arounds) apply, such as generating additional glue code that calls macros or reads compile time constants, for example. That's the approach that the IronPython backend has taken. It's a lot more complex, but also a lot more versatile in the long run. Stefan

On Tue, Aug 30, 2011 at 10:05 AM, Guido van Rossum <guido@python.org> wrote:
It's certainly a harder problem. For most simple constants, Cython/Pyrex might be able to generate a series of tiny C programs with which to find CPP symbol values:

    #include "file1.h"
    ...
    #include "filen.h"
    main() { printf("%d", POSSIBLE_CPP_SYMBOL1); }

...and again with %f, %s, etc. The typing is quite a mess, and code fragments would probably be impractical. But since the C preprocessor is supposedly Turing complete, maybe there's a pleasant surprise waiting there. But hopefully clang has something that'd make this easier. SIP's approach of using something close to, but not identical to, the .h's sounds like it might be pretty productive - especially if the derivative of the .h's could be automatically derived using a Python script, with minor tweaks to the inputs on .h upgrades. But SIP itself is apparently C++-only.
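A hedged sketch of how the probe idea above could be automated for integer-valued constants; it assumes a C compiler is available as "cc", and the header/symbol names in the example call are arbitrary:

    import os, subprocess, tempfile, textwrap

    def probe_int_constant(header, symbol):
        # Generate, compile and run a tiny C program that prints the constant.
        src = textwrap.dedent("""\
            #include <%s>
            #include <stdio.h>
            int main(void) { printf("%%ld", (long)(%s)); return 0; }
            """) % (header, symbol)
        with tempfile.TemporaryDirectory() as tmp:
            c_file = os.path.join(tmp, "probe.c")
            exe = os.path.join(tmp, "probe")
            with open(c_file, "w") as f:
                f.write(src)
            subprocess.check_call(["cc", c_file, "-o", exe])
            return int(subprocess.check_output([exe]))

    print(probe_int_constant("fcntl.h", "O_CREAT"))    # e.g. 64 on Linux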

Dan Stromberg, 01.09.2011 19:56:
The user will commonly declare #defined values as typed external variables and callable macros as functions in .pxd files. These manually typed "macro" functions allow users to tell Cython what it should know about how the macros will be used. And that would allow it to generate C/C++ glue code for them that uses the declared types as a real function signature and calls the macro underneath.
and code fragments would probably be impractical.
Not necessarily at the C level but certainly for a ctypes backend, yes.
But hopefully clang has something that'd make this easier.
For figuring these things out, maybe. Not so much for solving the problems they introduce. Stefan

Dan Stromberg wrote:
http://www.riverbankcomputing.co.uk/software/sip/intro "What is SIP? One of the features of Python that makes it so powerful is the ability to take existing libraries, written in C or C++, and make them available as Python extension modules. Such extension modules are often called bindings for the library. SIP is a tool that makes it very easy to create Python bindings for C and C++ libraries. It was originally developed to create PyQt, the Python bindings for the Qt toolkit, but can be used to create bindings for any C or C++ library. " It's not C++ only. The code for SIP is also in C. Jeremy

On Sat, Aug 27, 2011 at 11:58 PM, Terry Reedy <tjreedy@udel.edu> wrote:
I am trying to work through getting these issues resolved. The hard part so far has been getting reviews and commits. The following patches are awaiting review (the patch for issue 11241 has been accepted, just not applied):
1. http://bugs.python.org/issue9041
2. http://bugs.python.org/issue9651
3. http://bugs.python.org/issue11241
I am more than happy to keep working through these issues, but I need some help getting the patches actually applied since I don't have commit rights. -- # Meador

Meador Inge <meadori <at> gmail.com> writes:
I raised a question about this patch (in the issue tracker).
2. http://bugs.python.org/issue9651 3. http://bugs.python.org/issue11241
I presume, since Amaury has commit rights, that he could commit these. Regards, Vinay Sajip

I also have some patches that have been sitting on the tracker for some time:
http://bugs.python.org/issue12764
http://bugs.python.org/issue11835
http://bugs.python.org/issue12528 which also fixes http://bugs.python.org/issue6069 and http://bugs.python.org/issue11920
http://bugs.python.org/issue6068 which also fixes http://bugs.python.org/issue6493
Thank you, Vlad
On Tue, Aug 30, 2011 at 6:09 AM, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:

On Sat, Aug 27, 2011 at 3:14 PM, Dan Stromberg <drsalists@gmail.com> wrote:
IMO, we really, really need some common way of accessing C libraries that works for all major Python variants.
We have one. It's called writing an extension module. ctypes is a crutch because it doesn't realistically have access to the header files. It's a fine crutch for PyPy, which doesn't have much of an alternative. It's also a fine crutch for people who need something to run *now*. It's a horrible strategy for the standard library. If you have a better proposal please do write it up. But so far you are mostly exposing your ignorance and insisting dramatically that you be educated. -- --Guido van Rossum (python.org/~guido)

On Sat, Aug 27, 2011 at 8:57 PM, Guido van Rossum <guido@python.org> wrote:
And yet Cext's are full of CPython-isms. I've said in the past that Python has been lucky in that it had only a single implementation for a long time, but still managed to escape becoming too defined by the idiosyncrasies of that implementation - that's quite impressive, and is probably our best indication that Python has had leadership with foresight. In the language proper, I'd say I still believe this, but Cext's are sadly not a good example.
ctypes is a crutch because it doesn't realistically have access to the header files.
Well, actually, header files are pretty easy to come by. I bet you've installed them yourself many times. In fact, you've probably even automatically brought some of them in via a package management system of one form or another without getting your hands dirty. As a thought experiment, imagine having a ctypes configuration system that looks around a computer for .h's and .so's (etc.) with even 25% of the effort expended by GNU autoconf. Instead of building the results into a bunch of .o's, the results are saved in a .ct file or something. If you build in some reasonable default locations to look in, plus the equivalent of some -I's and -L's (and maybe -rpath's) as needed, you probably end up with a pretty comparable system. (typedefs might be a harder problem - that's particularly worth discussing, IMO - your chance to nip this in the bud with a reasoned explanation why they can't be handled well!)
It's a fine crutch for PyPy, which doesn't have much of an alternative.
Wait - a second ago I thought I was to believe that C extension modules were the one true way of interfacing with C code across all major implementations? Are we perhaps saying that CPython is "the" major implementation, and that we want it to stay that way? I personally feel that PyPy has arrived as a major implementation. The backup program I've been writing in my spare time runs great on PyPy (and the CPythons from 2.5.x, and pretty well on Jython). And PyPy has been maturing very rapidly ('just wish they'd do 3.x!).
It's also a fine crutch for people who need something to run *now*. It's a horrible strategy for the standard library.
I guess I'm coming to see this as dogma. If ctypes is augmented with type information and/or version information and where to find things, wouldn't it become safe and convenient? Or do you have other concerns? Make a list of things that can go wrong with ctypes modules. Now make a list of things that can go wrong with C extension modules. Aren't they really pretty similar - missing .so, .so in a weird place, and especially: .so with a changed interface? C really isn't a very safe language - not like http://en.wikipedia.org/wiki/Turing_%28programming_language%29 or something. Perhaps it's a little easier to mess things up with ctypes today (a recompile doesn't fix, or at least detect, as many problems), but isn't it at least worth thinking about how that situation could be improved?
If you have a better proposal please do write it up. But so far you are mostly exposing your ignorance and insisting dramatically that you be educated.
I'm not sure why you're trying to avoid having a discussion. I think it's premature to dive into a proposal before getting other people's thoughts. Frankly, 100 people tend to think better than one - at least, if the 100 people feel like they can talk. I'm -not- convinced ctypes are the way forward. I just want to talk about it - for now. ctypes have some significant advantages - if we can find a way to eliminate and/or ameliorate their disadvantages, they might be quite a bit nicer than Cext's.

On Sat, Aug 27, 2011 at 10:36 PM, Dan Stromberg <drsalists@gmail.com> wrote:
I have to apologize, I somehow misread your "all Python variants" as a mixture of "all CPython versions" and "all platforms where CPython runs". While I have no desire to continue this discussion, you are most welcome to do so. -- --Guido van Rossum (python.org/~guido)

Guido van Rossum wrote:
I think Dan means some way of doing this without having to hand-craft a different one for each Python implementation. If we're really serious about the idea that "Python is not CPython", this seems like a reasonable thing to want. Currently the Python universe is very much centred around CPython, with the other implementations perpetually in catch-up mode. My suggestion on how to address this would be something akin to Pyrex or Cython. I gather that there has been some work recently on adding different back-ends to Cython to generate code for different Python implementations. -- Greg

On Sat, Aug 27, 2011 at 9:47 PM, Terry Reedy <tjreedy@udel.edu> wrote:
Please note that the code I'm talking about is not the same as the patches by Per Øyvind Karlsen that are attached to the tracker issue. I have been doing a completely new implementation of the module, specifically to address the concerns raised by Martin and Antoine. (As for why I haven't posted my own changes yet - I'm currently an intern at Google, and they want me to run my code by their open-source team before releasing it into the wild. Sorry for the delay and the confusion.)
I talked to Antoine about this on IRC; he didn't seem to think a PEP would be necessary. But a summary of the discussion on the tracker issue might still be a useful thing to have, given how long it's gotten.
As stated in my earlier response to Martin, I intend to do this. Aside from I/O, though, there's not much that _can_ be done in Python - the rest is basically just providing a thin wrapper for the C library. On Sat, Aug 27, 2011 at 9:58 PM, Dan Stromberg <drsalists@gmail.com> wrote:
I'd like to better understand why ctypes is (sometimes) frowned upon.
Is it the brittleness? Tendency to segfault?
The problem (as I understand it) is that ABI changes in a library will cause code that uses it via ctypes to break without warning. With an extension module, you'll get a compile failure if you rely on things that change in an incompatible way. With a ctypes wrapper, you just get incorrect answers, or segfaults.
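A contrived, self-contained illustration of that failure mode (the struct and its fields are invented for the example; with a real library, the "new" layout would live inside the .so):

    import ctypes

    class OldInfo(ctypes.Structure):
        # Layout the ctypes wrapper was originally written against.
        _fields_ = [("version", ctypes.c_int),
                    ("block_size", ctypes.c_int)]

    class NewInfo(ctypes.Structure):
        # Layout after the library inserted a field in a new release.
        _fields_ = [("version", ctypes.c_int),
                    ("flags", ctypes.c_int),
                    ("block_size", ctypes.c_int)]

    real = NewInfo(version=2, flags=0, block_size=4096)
    # Reinterpret the same memory through the stale declaration:
    stale = ctypes.cast(ctypes.byref(real), ctypes.POINTER(OldInfo)).contents
    print(stale.block_size)    # prints 0, not 4096 - a wrong answer, no error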
This might be feasible for a specific application running in a controlled environment, but it seems impractical for something as widely-used as the stdlib. Having to include a whitelist of acceptable library versions would be a substantial maintenance burden, and (compatible) new versions would not work until the library whitelist gets updated. Cheers, Nadeem

I've updated the issue <http://bugs.python.org/issue6715> with a patch containing my work so far - the LZMACompressor and LZMADecompressor classes, along with some tests. These two classes should provide a fairly complete interface to liblzma; it will be possible to implement LZMAFile on top of them, entirely in Python. Note that the C code does no I/O; this will be handled by LZMAFile. Please take a look, and let me know what you think. Cheers, Nadeem
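A rough sketch of that layering (not the actual patch; it only assumes the decompressor object has a decompress() method, as the proposed LZMADecompressor presumably will, by analogy with bz2.BZ2Decompressor):

    class _DecompressReader:
        # Minimal read-only filter: file object in, decompressed bytes out.

        def __init__(self, fileobj, decompressor, chunk_size=8192):
            self._fp = fileobj            # any binary file object (does the real I/O)
            self._decomp = decompressor   # e.g. an LZMADecompressor instance
            self._chunk = chunk_size
            self._buffer = b""

        def read(self, size=-1):
            # Pull compressed chunks and feed them through the decompressor
            # until we have enough bytes (or hit end of stream).
            while size < 0 or len(self._buffer) < size:
                raw = self._fp.read(self._chunk)
                if not raw:
                    break
                self._buffer += self._decomp.decompress(raw)
            if size < 0:
                data, self._buffer = self._buffer, b""
            else:
                data, self._buffer = self._buffer[:size], self._buffer[size:]
            return data

A full LZMAFile would add writing, seeking and proper io integration on top of something like this, but none of that needs to touch C.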

I've posted an updated patch to the bug tracker, with a complete implementation of the lzma module, including 100% test coverage for the LZMAFile class (which is implemented entirely in Python). It doesn't include ReST documentation (yet), but the docstrings are quite detailed. Please take a look and let me know what you think. Cheers, Nadeem

Another update - I've added proper documentation. Now the code should be pretty much complete - all that's missing is the necessary bits and pieces to build it on Windows. Cheers, Nadeem

Dan Stromberg, 27.08.2011 21:58:
Maybe unwieldy code and slow execution on CPython? Note that there's a ctypes backend for Cython being written as part of a GSoC, so it should eventually become possible to write C library wrappers in Cython and have it generate a ctypes version to run on PyPy. That, together with the IronPython backend that is on its way, would give you a way to write fast wrappers for at least three of the four major Python implementations, without sacrificing readability or speed in any of them. Stefan

On Sun, 28 Aug 2011 01:52:51 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
I think you're taking this too seriously. Our extension modules (_bz2, _ssl...) are *already* optional even on CPython. If the library or its development headers are not available on the system, building these extensions is simply skipped, and the test suite passes nonetheless. The only libraries required for passing the tests are basically libc and zlib. Regards Antoine.

PEP 399 also comes into play - we need a pure Python version for PyPy et al (or a plausible story for why an exception should be granted).
No, we don't. We can grant an exception, which I'm very willing to do. The PEP lists wrapping a specific C-based library as a plausible reason.
It's acceptable for the Python version to use ctypes
Hmm. To me, *that's* unacceptable. In the specific case, having a pure-Python implementation would be acceptable to me, but I'm skeptical that anybody is willing to produce one. Regards, Martin

On Sat, Aug 27, 2011 at 5:15 PM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
It is not my intention for the _lzma C module to do I/O - that will be done by the LZMAFile class, which will be written in Python. My comparison with bz2 was in reference to the state of the module after it was rewritten for issue 5863. Saying "anything along the lines of GzipFile" was a bad choice of wording; what I meant is that the LZMAFile class won't handle the problem of picking apart the .xz and .lzma container formats. That is handled by liblzma (operating entirely on in-memory buffers). It will do _only_ I/O, in a similar fashion to the BZ2File class (as of changeset 2cb07a46f4b5, to avoid ambiguity ;) ). Cheers, Nadeem

On 8/27/2011 9:47 AM, Nadeem Vawda wrote:
As I read the discussion, the idea has been more or less accepted in principle. However, the current patch is not and needs changes.
I believe Antoine suggested a PEP. It should summarize the salient points in the long tracker discussion into a coherent exposition and flesh out the details implied above. (Perhaps they are already in the proposed doc addition.)
I would follow Martin's suggestions, including doing all i/o with the io module and the following: "So I would propose that a very thin C layer is created around the C library that focuses on the actual algorithms, and that any higher layers (in particular file formats) are done in Python." If we minimize the C code we add and maximize what is done in Python, that would maximize the ease of porting to other implementations. This would conform to the spirit of PEP 399. -- Terry Jan Reedy
participants (20)
- "Martin v. Löwis"
- Antoine Pitrou
- Barry Warsaw
- Benjamin Peterson
- Dan Stromberg
- Glyph Lefkowitz
- Greg Ewing
- Guido van Rossum
- Jeremy Sanders
- M.-A. Lemburg
- Meador Inge
- Nadeem Vawda
- Nick Coghlan
- Paul Moore
- Ross Lagerwall
- Simon Cross
- Stefan Behnel
- Terry Reedy
- Vinay Sajip
- Vlad Riscutia