C-API status of Python 3?

Hi all,
I would like to know how stable the C-API of Python 3 is, or what the expected release level (beta?) would be at which I can expect it to stabilise. What is the plan here?
The background is Cython, which will need to support Python 3 one day or another, so I wanted to know from which point on it will make sense to start thinking about a migration plan.
Thanks,
Stefan

Stefan Behnel wrote:
I would like to know how stable the C-API of Python 3 is, or what the expected release level (beta?) would be at which I can expect it to stabilise. What is the plan here?
The background is Cython, which will need to support Python 3 one day or another, so I wanted to know from which point on it will make sense to start thinking about a migration plan.
The 3.0 API isn't stable yet. I plan to rename some of the functions before the first beta is released. Currently the naming scheme is too confusing:
PyUnicode - str
PyString - bytes
PyBytes - bytearray
See? :)
The documentation for the PyString functions is outdated and IIRC the PyBytes docs are nonexistent.
Christian

On Sat, Mar 1, 2008 at 12:14 PM, Christian Heimes <lists@cheimes.de> wrote: ...
The 3.0 API isn't stable yet. I plan to rename some of the functions before the first beta is released. Currently the naming scheme is too confusing:
PyUnicode - str
PyString - bytes
PyBytes - bytearray
See? :)
Yep, but please do keep the PyUnicode for str and PyString for bytes (as macros/synonyms of PyStr and PyBytes if you want!-) to help the task of porting existing extensions... the bytearray functions should no doubt be PyBytearray, though.
Alex

Alex Martelli wrote:
Yep, but please do keep the PyUnicode for str and PyString for bytes (as macros/synonyms of PyStr and PyBytes if you want!-) to help the task of porting existing extensions... the bytearray functions should no doubt be PyBytearray, though.
Yeah, we've already planned to keep PyUnicode as prefix for str type functions. It makes perfect sense, not only from the historical point of view.
But for PyString I planned to rename the prefix to PyBytes. In my opinion we are going to regret it if we keep too many legacy names from 2.x. In order to make the migration process easier I can add a header file that provides PyString_* functions as aliases for PyBytes_*
Comments?
Christian

On 2008-03-02 14:47, Christian Heimes wrote:
Alex Martelli wrote:
Yep, but please do keep the PyUnicode for str and PyString for bytes (as macros/synonyms of PyStr and PyBytes if you want!-) to help the task of porting existing extensions... the bytearray functions should no doubt be PyBytearray, though.
Yeah, we've already planned to keep PyUnicode as prefix for str type functions. It makes perfect sense, not only from the historical point of view.
But for PyString I planned to rename the prefix to PyBytes. In my opinion we are going to regret it if we keep too many legacy names from 2.x. In order to make the migration process easier I can add a header file that provides PyString_* functions as aliases for PyBytes_*
Comments?
+1
Why not also make unicode() the default type constructor and only keep str() as alias to simplify porting (perhaps with a warning)?
The term "string" is just too overloaded with all kinds of misinterpretations. The term "string" just refers to a string of bytes - a variable length array so to speak. However, depending on the application space, "string" is used as synonym for "text string" just as well as "data string".
Removing the term "string" altogether would make it easier for people to understand that Py3k only has unicode (for text data) and bytes (for binary data).
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 02 2008)
Python/Zope Consulting and Support ... http://www.egenix.com/ mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611

Why not also make unicode() the default type constructor and only keep str() as alias to simplify porting (perhaps with a warning) ?
The term "string" is just too overloaded with all kinds of misinterpretations. The term "string" just refers to a string of bytes - a variable length array so to speak. However, depending on the application space, "string" is used as synonym for "text string" just as well as "data string".
Removing the term "string" altogether would make it easier for people to understand that Py3k only has unicode (for text data) and bytes (for binary data).
I agree that "string" is very overloaded, but calling it "unicode" is sort of like calling integers "int32" -- that is, you're talking about the implementation rather than the type.
In most programming languages that aren't at the machine level (like C is), "string" really is a sequence of text characters, not a "string of bytes", and that's probably the term that should be used for Python going forward, despite the legacy issues it involves.
Personally, I feel that "string" (for text) and "bytes" (for binary data represented as a sequence of bytes) are appropriate terms for Python. Keep "unicode" for a release or two as an alias for "string". But isn't all this in a PEP somewhere already?
Bill
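As an illustration of the text/bytes split argued for above, here is a minimal sketch of how the two types behave in Python 3 as eventually released. The sample string and the variable names are made up for the example; only `str.encode`, `bytes.decode` and the built-in types themselves are real.

```python
# str holds text (Unicode code points); bytes holds binary data.
text = "na\u00efve caf\u00e9"      # hypothetical sample text with two non-ASCII characters
data = text.encode("utf-8")        # str -> bytes always needs an explicit encoding

print(type(text).__name__, len(text))  # length counted in characters
print(type(data).__name__, len(data))  # length counted in bytes

# Going back from bytes to str names the encoding again.
round_trip = data.decode("utf-8")

# The two types never mix implicitly.
try:
    text + data
except TypeError:
    mixing_allowed = False
else:
    mixing_allowed = True

print(round_trip == text, mixing_allowed)
```

The point of the sketch is only that the conversion is explicit in both directions, which is the property the naming debate is trying to make visible.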

On 2008-03-02 20:39, Bill Janssen wrote:
Why not also make unicode() the default type constructor and only keep str() as alias to simplify porting (perhaps with a warning) ?
The term "string" is just too overloaded with all kinds of misinterpretations. The term "string" just refers to a string of bytes - a variable length array so to speak. However, depending on the application space, "string" is used as synonym for "text string" just as well as "data string".
Removing the term "string" altogether would make it easier for people to understand that Py3k only has unicode (for text data) and bytes (for binary data).
I agree that "string" is very overloaded, but calling it "unicode" is sort of like calling integers "int32" -- that is, you're talking about the implementation rather than the type.
Hmm in that case, we'd have to call it "ucs2" or "ucs4" depending on how Python was compiled ;-)
In most programming languages that aren't at the machine level (like C is), "string" really is a sequence of text characters, not a "string of bytes", and that's probably the term that should be used for Python going forward, despite the legacy issues it involves.
I'm not bound to "unicode" at all; I just don't think that using "string" for text data will really make people think twice often enough, and then you end up having binary data in a "string" again, with the only difference being that it's now using the Unicode type internally. My personal favorite is "text" for text data.
Personally, I feel that "string" (for text) and "bytes" (for binary data represented as a sequence of bytes) are appropriate terms for Python. Keep "unicode" for a release or two as an alias for "string". But isn't all this in a PEP somewhere already?
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 03 2008)

M.-A. Lemburg wrote:
Why not also make unicode() the default type constructor and only keep str() as alias to simplify porting (perhaps with a warning) ?
-1 on making us type 7 characters instead of 3 all over the place.
The term "string" is just too overloaded with all kinds of misinterpretations. The term "string" just refers to a string of bytes - a variable length array so to speak.
I disagree -- "string" has come to mean "string of characters" unless otherwise qualified. Using one to hold non-characters is just an aberration that was necessary in Python 2 because there wasn't much alternative.
-- Greg

On 2008-03-02 23:11, Greg Ewing wrote:
M.-A. Lemburg wrote:
Why not also make unicode() the default type constructor and only keep str() as alias to simplify porting (perhaps with a warning) ?
-1 on making us type 7 characters instead of 3 all over the place.
Oh well... how about "text()" ?
The term "string" is just too overloaded with all kinds of misinterpretations. The term "string" just refers to a string of bytes - a variable length array so to speak.
I disagree -- "string" has come to mean "string of characters" unless otherwise qualified. Using one to hold non-characters is just an aberration that was necessary in Python 2 because there wasn't much alternative.
Buffer objects have been around for years and for exactly this purpose.
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 03 2008)

On Sun, Mar 2, 2008 at 3:26 PM, M.-A. Lemburg <mal@egenix.com> wrote:
On 2008-03-02 23:11, Greg Ewing wrote:
M.-A. Lemburg wrote:
Why not also make unicode() the default type constructor and only keep str() as alias to simplify porting (perhaps with a warning) ?
-1 on making us type 7 characters instead of 3 all over the place.
Oh well... how about "text()" ?
Sorry, this was discussed and decided long ago. I'm not going to change this now. The type is called string or some variation thereof in most other popular languages.
The term "string" is just too overloaded with all kinds of misinterpretations. The term "string" just refers to a string of bytes - a variable length array so to speak.
I disagree -- "string" has come to mean "string of characters" unless otherwise qualified. Using one to hold non-characters is just an aberration that was necessary in Python 2 because there wasn't much alternative.
Historically that's incorrect. In 1990, when Unicode hadn't even been invented, 'str' was very intentionally designed to hold text and data equally well.
Buffer objects have been around for years and for exactly this purpose.
No, buffer objects were not invented to *hold* binary data. The buffer API was invented to *reference* bytes that were owned by 3rd party C libraries. Its descendant in Py3k, 'memoryview' (see PEP 3118) has the same purpose without having the same bugs. For *holding* bytes in Py3k we'll use bytes (immutable) or bytearray (mutable).
--Guido van Rossum (home page: http://www.python.org/~guido/)
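The hold/reference distinction drawn above can be sketched at the Python level roughly as follows, assuming Python 3 as released. The variable names are invented for the example; `bytes`, `bytearray` and `memoryview` are the built-ins named in the message.

```python
# bytes: immutable object that holds its own data.
frozen = bytes(b"abc")
try:
    frozen[0] = ord("z")
except TypeError:
    frozen_is_immutable = True
else:
    frozen_is_immutable = False

# bytearray: mutable object that holds its own data.
buf = bytearray(b"abc")
buf[0] = ord("z")              # in-place modification is allowed

# memoryview: references another object's buffer without copying it.
view = memoryview(buf)
view[1] = ord("y")             # writes through to the underlying bytearray
snapshot = bytes(buf)          # explicit copy into an immutable bytes object
view.release()                 # drop the reference to buf's buffer

print(frozen_is_immutable, snapshot)
```

Here the bytearray is only a stand-in for memory owned elsewhere; the view itself owns nothing, which is why the write through `view` shows up in `buf`.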

On 3/2/08, Christian Heimes <lists@cheimes.de> wrote:
Alex Martelli wrote:
Yep, but please do keep the PyUnicode for str and PyString for bytes (as macros/synonyms of PyStr and PyBytes if you want!-) to help the task of porting existing extensions... the bytearray functions should no doubt be PyBytearray, though.
Yeah, we've already planned to keep PyUnicode as prefix for str type functions. It makes perfect sense, not only from the historical point of view.
But for PyString I planned to rename the prefix to PyBytes. In my opinion we are going to regret it if we keep too many legacy names from 2.x. In order to make the migration process easier I can add a header file that provides PyString_* functions as aliases for PyBytes_*
+1 on only doing this via a header that must be explicitly included by modules wanting the compatibility names.

On Sun, Mar 2, 2008 at 10:39 AM, Gregory P. Smith <greg@krypto.org> wrote:
On 3/2/08, Christian Heimes <lists@cheimes.de> wrote:
Alex Martelli wrote:
Yep, but please do keep the PyUnicode for str and PyString for bytes (as macros/synonyms of PyStr and PyBytes if you want!-) to help the task of porting existing extensions... the bytearray functions should no doubt be PyBytearray, though.
Yeah, we've already planned to keep PyUnicode as prefix for str type functions. It makes perfect sense, not only from the historical point of view.
But for PyString I planned to rename the prefix to PyBytes. In my opinion we are going to regret it if we keep too many legacy names from 2.x. In order to make the migration process easier I can add a header file that provides PyString_* functions as aliases for PyBytes_*
+1 on only doing this via a header that must be explicitly included by modules wanting the compatibility names.
OK, as long as it's also supplied (and presumably empty) for 2.6 -- my key concern is facilitating the maintenance of a single codebase for C-coded Python extensions that can be compiled for both 2.6 and 3.0. (I'm also thinking of SWIG and similar utilities, but those can probably best be tweaked to emit rather different C code for the two cases; still, that C code will also include some C snippets hand-coded by the extension author/maintainer, e.g. via SWIG typemaps &c, so easing the "single codebase" approach may help there too).
I don't think we want to go the route of code translators/generators for C-coded Python extensions (the way we do for Python code via 2to3), and the fewer #if's and #define's C extension authors/maintainers are required to devise (in order to support both 2.6 and 3.0), the likelier it is that we'll see 3.0 support in popular C-coded Python extensions sooner rather than later.
Alex

On Sunday 02 March 2008, Alex Martelli wrote:
On Sun, Mar 2, 2008 at 10:39 AM, Gregory P. Smith <greg@krypto.org> wrote:
On 3/2/08, Christian Heimes <lists@cheimes.de> wrote:
Alex Martelli wrote:
Yep, but please do keep the PyUnicode for str and PyString for bytes (as macros/synonyms of PyStr and PyBytes if you want!-) to help the task of porting existing extensions... the bytearray functions should no doubt be PyBytearray, though.
Yeah, we've already planned to keep PyUnicode as prefix for str type functions. It makes perfect sense, not only from the historical point of view.
But for PyString I planned to rename the prefix to PyBytes. In my opinion we are going to regret it if we keep too many legacy names from 2.x. In order to make the migration process easier I can add a header file that provides PyString_* functions as aliases for PyBytes_*
+1 on only doing this via a header that must be explicitly included by modules wanting the compatibility names.
OK, as long as it's also supplied (and presumably empty) for 2.6 -- my key concern is facilitating the maintenance of a single codebase for C-coded Python extensions that can be compiled for both 2.6 and 3.0. (I'm also thinking of SWIG and similar utilities, but those can probably best be tweaked to emit rather different C code for the two cases; still, that C code will also include some C snippets hand-coded by the extension author/maintainer, e.g. via SWIG typemaps &c, so easing the "single codebase" approach may help there too).
I don't think we want to go the route of code translators/generators for C-coded Python extensions (the way we do for Python code via 2to3), and the fewer #if's and #define's C extension authors/maintainers are required to devise (in order to support both 2.6 and 3.0), the likelier it is that we'll see 3.0 support in popular C-coded Python extensions sooner rather than later.
Speaking for myself, this isn't going to make any difference as pre-2.6 versions of Python still need to be supported. More of a pain is if 2.6 introduces source level incompatibilities with 2.5 (as 2.5 did with 2.4).
Phil

Christian Heimes wrote:
Stefan Behnel wrote:
I would like to know how stable the C-API of Python 3 is, or what the expected release level (beta?) would be at which I can expect it to stabilise. What is the plan here?
The release schedule in PEP 3000 says "August 2008" for 3.0 final, is that still the current goal? Can I expect the C-API to stabilise by June, then? That's where we are planning a Cython workshop with a couple of sprints. Py3k support might be worth targeting - if we can rely on a fixed target by then.
The background is Cython, which will need to support Python 3 one day or another, so I wanted to know from which point on it will make sense to start thinking about a migration plan.
The 3.0 API isn't stable yet. I plan to rename some of the functions before the first beta is released. Currently the naming scheme is too confusing:
PyUnicode - str
PyString - bytes
PyBytes - bytearray
See? :)
I see. :) I actually expect the string semantics to be amongst the harder changes (at least, it's the most visible from a C-API point of view).
However, names are not a big problem if you generate code anyway. Behaviour is what matters most for Cython. And we're already trying to adapt Cython's syntax to Py3k's, although that's not a requirement in all cases, as Cython lives with a couple of differences already. Keeping old syntax around and mapping it to the new C-API makes it easier to migrate existing Cython code.
Hmmm, I even guess that the biggest problem might be porting Cython itself...
The documentation for the PyString functions is outdated and IIRC the PyBytes docs are nonexistent.
Ok, so I guess it would at least be a good idea to wait for the docs to be fixed, then.
Thanks,
Stefan

Stefan Behnel wrote:
The release schedule in PEP 3000 says "August 2008" for 3.0 final, is that still the current goal? Can I expect the C-API to stabilise by June, then? That's where we are planning a Cython workshop with a couple of sprints. Py3k support might be worth targeting - if we can rely on a fixed target by then.
Yes, August 2008 is still our goal, and I still think it's a realistic one. The C API should be mostly stabilized around May, when we target the first beta.
I actually expect the string semantics to be amongst the harder changes (at least, it's the most visible from a C-API point of view).
However, names are not a big problem if you generate code anyway. Behaviour is what matters most for Cython. And we're already trying to adapt Cython's syntax to Py3k's, although that's not a requirement in all cases, as Cython lives with a couple of differences already. Keeping old syntax around and mapping it to the new C-API makes it easier to migrate existing Cython code.
The semantics are easier in Python 3.x than in the 2.x series. Old style classes are gone, longs are gone and integers are PyLong based, the distinction between bytes and text is much easier ...
Christian
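Two of the simplifications listed above are visible from Python code as well as from C. The following is a small sketch, assuming Python 3 as released; the class name `Plain` and the variable names are hypothetical and chosen only for the example.

```python
# One integer type with arbitrary precision: there is no separate "long".
big = 2 ** 100
print(type(big).__name__, big.bit_length())

# Every class is a new-style class: an instance of type, derived from object,
# even when no base class is written.
class Plain:
    pass

is_new_style = isinstance(Plain, type)
bases_ok = Plain.__mro__ == (Plain, object)
print(is_new_style, bases_ok)
```

For generated C code this suggests fewer type checks to emit, though the exact C-level consequences depend on the final API and are not shown here.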
Participants (9):
- Alex Martelli
- Bill Janssen
- Christian Heimes
- Greg Ewing
- Gregory P. Smith
- Guido van Rossum
- M.-A. Lemburg
- Phil Thompson
- Stefan Behnel