C-API status of Python 3?

Hi all,

I would like to know how stable the C-API of Python 3 is, or at what release level (beta?) I can expect it to stabilise. What is the plan here? The background is Cython, which will need to support Python 3 sooner or later, so I wanted to know from which point on it will make sense to start thinking about a migration plan.

Thanks,
Stefan

Stefan Behnel wrote:
The 3.0 API isn't stable yet. I plan to rename some of the functions before the first beta is released. Currently the naming scheme is too confusing:

PyUnicode - str
PyString - bytes
PyBytes - bytearray

See? :) The documentation for the PyString functions is outdated and IIRC the PyBytes docs are nonexistent. Christian

On Sat, Mar 1, 2008 at 12:14 PM, Christian Heimes <lists@cheimes.de> wrote: ...
Yep, but please do keep PyUnicode for str and PyString for bytes (as macros/synonyms of PyStr and PyBytes if you want!-) to help the task of porting existing extensions... the bytearray functions should no doubt be PyBytearray, though. Alex

Alex Martelli wrote:
Yeah, we've already planned to keep PyUnicode as the prefix for str type functions. It makes perfect sense, and not only from the historical point of view. But for PyString I planned to rename the prefix to PyBytes. In my opinion we are going to regret it if we keep too many legacy names from 2.x. To make the migration process easier I can add a header file that provides PyString_* functions as aliases for PyBytes_*. Comments? Christian
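The prefixes that eventually shipped can be checked from a current CPython 3 interpreter through `ctypes.pythonapi`, which exposes the interpreter's own C-API symbols. This is a minimal sketch assuming CPython 3.3 or later (needed for `PyUnicode_GetLength`); `c_api_size` is a hypothetical helper written only for this illustration. It suggests the scheme settled as `PyUnicode_*` for str, `PyBytes_*` for bytes and `PyByteArray_*` for bytearray, with no `PyString_*` symbol exported.

```python
import ctypes

api = ctypes.pythonapi  # CPython's own C-API; calls are made with the GIL held


def c_api_size(func_name, obj):
    """Call a C-API length function by name on obj (hypothetical helper).

    Note: this configures the shared function object cached on ctypes.pythonapi.
    """
    func = getattr(api, func_name)
    func.argtypes = [ctypes.py_object]
    func.restype = ctypes.c_ssize_t
    return func(obj)


# Released Python 3 naming: PyUnicode -> str, PyBytes -> bytes,
# PyByteArray -> bytearray.
print(c_api_size("PyUnicode_GetLength", "h\u00e9llo"))   # 5 code points
print(c_api_size("PyBytes_Size", b"abc"))                # 3 bytes
print(c_api_size("PyByteArray_Size", bytearray(b"ab")))  # 2 bytes

# The 2.x PyString_* prefix is not exported by Python 3.
print(hasattr(api, "PyString_Size"))  # False
```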

On 2008-03-02 14:47, Christian Heimes wrote:
+1

Why not also make unicode() the default type constructor and only keep str() as an alias to simplify porting (perhaps with a warning)?

The term "string" is just too overloaded with all kinds of misinterpretations. The term "string" just refers to a string of bytes - a variable length array, so to speak. However, depending on the application space, "string" is used as a synonym for "text string" just as well as "data string". Removing the term "string" altogether would make it easier for people to understand that Py3k only has unicode (for text data) and bytes (for binary data).

-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 02 2008)
:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611

I agree that "string" is very overloaded, but calling it "unicode" is sort of like calling integers "int32" -- that is, you're talking about the implementation rather than the type. In most programming languages that aren't at the machine level (like C is), "string" really is a sequence of text characters, not a "string of bytes", and that's probably the term that should be used for Python going forward, despite the legacy issues it involves. Personally, I feel that "string" (for text) and "bytes" (for binary data represented as a sequence of bytes) are appropriate terms for Python. Keep "unicode" for a release or two as an alias for "string". But isn't all this in a PEP somewhere already? Bill

On 2008-03-02 20:39, Bill Janssen wrote:
Hmm in that case, we'd have to call it "ucs2" or "ucs4" depending on how Python was compiled ;-)
I'm not attached to "unicode" at all. I just don't think using "string" for text data will make people think twice often enough, and then you end up with binary data in a "string" again, the only difference being that it now uses the Unicode type internally. My personal favorite is "text" for text data.
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 03 2008)

M.-A. Lemburg wrote:
Why not also make unicode() the default type constructor and only keep str() as alias to simplify porting (perhaps with a warning) ?
-1 on making us type 7 characters instead of 3 all over the place.
I disagree -- "string" has come to mean "string of characters" unless otherwise qualified. Using one to hold non-characters is just an aberration that was necessary in Python 2 because there wasn't much alternative. -- Greg
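In the released Python 3 the split Greg describes is enforced by the type system: `str` holds text, `bytes` holds binary data, and moving between them takes an explicit encode or decode. A small illustration; UTF-8 is an arbitrary codec choice here.

```python
text = "na\u00efve"            # str: a sequence of Unicode code points
data = text.encode("utf-8")    # bytes: produced by an explicit encode

print(len(text), len(data))    # 5 6: U+00EF needs two bytes in UTF-8

# Mixing text and binary data raises instead of coercing silently.
try:
    text + data
except TypeError as exc:
    print("TypeError:", exc)

# Getting text back also means naming the codec explicitly.
print(data.decode("utf-8") == text)  # True
```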

On 2008-03-02 23:11, Greg Ewing wrote:
Oh well... how about "text()"?
Buffer objects have been around for years and for exactly this purpose. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 03 2008)

On Sun, Mar 2, 2008 at 3:26 PM, M.-A. Lemburg <mal@egenix.com> wrote:
Sorry, this was discussed and decided long ago. I'm not going to change this now. The type is called string or some variation thereof in most other popular languages.
Historically that's incorrect. In 1990, when Unicode hadn't even been invented, 'str' was very intentionally designed to hold text and data equally well.
Buffer objects have been around for years and for exactly this purpose.
No, buffer objects were not invented to *hold* binary data. The buffer API was invented to *reference* bytes that were owned by 3rd party C libraries. Its descendant in Py3k, 'memoryview' (see PEP 3118) has the same purpose without having the same bugs. For *holding* bytes in Py3k we'll use bytes (immutable) or bytearray (mutable). -- --Guido van Rossum (home page: http://www.python.org/~guido/)
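The division of labour Guido describes can be seen in a few lines of Python 3: `bytes` and `bytearray` own their data (immutable and mutable respectively), while `memoryview` only references another object's buffer. A minimal sketch with arbitrary sample values.

```python
frozen = b"spam"             # bytes: immutable container of bytes
buf = bytearray(b"spam")     # bytearray: mutable container of bytes

try:
    frozen[0] = ord("S")     # item assignment is not supported on bytes
except TypeError:
    print("bytes is immutable")

buf[0] = ord("S")            # bytearray can be changed in place
print(buf)                   # bytearray(b'Spam')

# memoryview holds no data of its own: it references buf's buffer,
# so writing through the view changes buf without a copy.
view = memoryview(buf)
view[1:3] = b"PA"
print(buf)                   # bytearray(b'SPAm')
print(memoryview(frozen).readonly)  # True: a view of bytes cannot write
```

While `view` is alive, resizing `buf` (for example with `buf.append(0)`) raises `BufferError`, so the view cannot be left pointing at freed memory.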

On Sun, Mar 2, 2008 at 10:39 AM, Gregory P. Smith <greg@krypto.org> wrote:
OK, as long as it's also supplied (and presumably empty) for 2.6 -- my key concern is facilitating the maintenance of a single codebase for C-coded Python extensions that can be compiled for both 2.6 and 3.0. (I'm also thinking of SWIG and similar utilities, but those can probably best be tweaked to emit rather different C code for the two cases; still, that C code will also include some C snippets hand-coded by the extension author/maintainer, e.g. via SWIG typemaps &c, so easing the "single codebase" approach may help there too). I don't think we want to go the route of code translators/generators for C-coded Python extensions (the way we do for Python code via 2to3), and the fewer #if's and #define's C extension authors/maintainers are required to devise (in order to support both 2.6 and 3.0), the likelier it is that we'll see 3.0 support in popular C-coded Python extensions sooner rather than later. Alex
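Alex's single-codebase concern is usually handled in C with preprocessor aliases. A rough runtime analogue, for code that binds to the C-API dynamically through `ctypes`, is to try the Python 3 name first and fall back to the 2.x spelling. This is a sketch only: `first_symbol` is a hypothetical helper, and it assumes a CPython interpreter where `ctypes.pythonapi` is available.

```python
import ctypes


def first_symbol(*names):
    """Return (name, function) for the first C-API symbol that exists.

    Hypothetical helper: tries each spelling in order, so one code path
    can work whether the interpreter exports the 3.x or the 2.x name.
    """
    for name in names:
        try:
            return name, getattr(ctypes.pythonapi, name)
        except AttributeError:  # symbol not exported by this interpreter
            continue
    raise LookupError("none of %r is exported" % (names,))


name, size_of = first_symbol("PyBytes_Size", "PyString_Size")
size_of.argtypes = [ctypes.py_object]
size_of.restype = ctypes.c_ssize_t
print(name, size_of(b"single codebase"))
```

Under Python 3 the lookup resolves to `PyBytes_Size`; the fallback only matters for an interpreter that exports the older `PyString_Size`.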

Christian Heimes wrote:
The release schedule in PEP 3000 says "August 2008" for 3.0 final, is that still the current goal? Can I expect the C-API to stabilise by June, then? That's where we are planning a Cython workshop with a couple of sprints. Py3k support might be worth targeting - if we can rely on a fixed target by then.
I see. :) I actually expect the string semantics to be amongst the harder changes (at least, it's the most visible from a C-API point of view). However, names are not a big problem if you generate code anyway. Behaviour is what matters most for Cython. And we're already trying to adapt Cython's syntax to Py3k's, although that's not a requirement in all cases, as Cython lives with a couple of differences already. Keeping old syntax around and mapping it to the new C-API makes it easier to migrate existing Cython code. Hmmm, I even guess that the biggest problem might be porting Cython itself...
The documentation for the PyString functions is outdated and IIRC the PyBytes docs are non existing.
Ok, so I guess it would at least be a good idea to wait for the docs to be fixed, then. Thanks, Stefan

Stefan Behnel wrote:
Yes, August 2008 is still our goal. I still think it's a realistic goal. The C API should be mostly stable by May, when we target the first beta.
The semantics are easier in Python 3.x than in the 2.x series. Old style classes are gone, longs are gone and integers are PyLong based, the distinction between bytes and text is much clearer ... Christian
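Christian's points about integers and classes are visible from plain Python code in any Python 3 release. A short sketch; the variable and class names are arbitrary.

```python
import builtins

big = 2 ** 100
print(type(1) is type(big) is int)   # True: one integer type, no long
print(hasattr(builtins, "long"))     # False: the long builtin is gone


class Plain:                         # no explicit base class
    pass


print(issubclass(Plain, object))     # True: every class is new-style
print(type(Plain) is type)           # True: default metaclass is type
```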

participants (9)
- Alex Martelli
- Bill Janssen
- Christian Heimes
- Greg Ewing
- Gregory P. Smith
- Guido van Rossum
- M.-A. Lemburg
- Phil Thompson
- Stefan Behnel