C-API status of Python 3? - Python-Dev

Christian Heimes

March 2008

3:14 a.m.

Stefan Behnel wrote:

The 3.0 API isn't stable yet. I plan to rename some of the functions before the first beta is released. Currently the naming schema is too confusing:

PyUnicode - str
PyString - bytes
PyBytes - bytearray

See? :)

The documentation for the PyString functions is outdated and IIRC the PyBytes docs are nonexistent.

Christian

Christian Heimes

8:47 p.m.

Alex Martelli wrote:

...

Yeah, we've already planned to keep PyUnicode as the prefix for str type functions. It makes perfect sense, and not only from the historical point of view. But for PyString I planned to rename the prefix to PyBytes. In my opinion we are going to regret it if we keep too many legacy names from 2.x. In order to make the migration process easier I can add a header file that provides PyString_* functions as aliases for PyBytes_*.

Comments?

Christian
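A rough sketch of the kind of compatibility header Christian offers, written as a small Python generator. It assumes the rename he describes has already happened, so that the PyBytes_* names refer to the immutable bytes type (as they do in Python 3.0 as released). The script, the guard name PYSTRING_COMPAT_H and the short list of function suffixes are hypothetical, illustrative choices, not taken from CPython. The PY_MAJOR_VERSION check is also an assumption: it leaves the header empty on 2.x, where PyString_* is the native API.

```python
# Hypothetical helper: emit a C header that maps legacy PyString_* names
# onto PyBytes_* names. The suffix list is illustrative, not a complete
# inventory of the C-API.
LEGACY_SUFFIXES = [
    "FromString",
    "FromStringAndSize",
    "AsString",
    "Size",
    "Check",
]


def make_compat_header(suffixes, guard="PYSTRING_COMPAT_H"):
    """Return header text aliasing PyString_<suffix> to PyBytes_<suffix>.

    The aliases sit inside a PY_MAJOR_VERSION check, so on 2.x (where
    PyString_* is the native API) the header expands to nothing.
    """
    lines = [
        f"#ifndef {guard}",
        f"#define {guard}",
        "/* Include after Python.h so that PY_MAJOR_VERSION is defined. */",
        "#if PY_MAJOR_VERSION >= 3",
    ]
    for suffix in suffixes:
        lines.append(f"#define PyString_{suffix} PyBytes_{suffix}")
    lines += ["#endif", f"#endif /* {guard} */", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    print(make_compat_header(LEGACY_SUFFIXES), end="")
```

Generating the header from a list keeps the alias set in one place; a hand-written header would work just as well for a handful of names.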

Bill Janssen

2:39 a.m.

...

I agree that "string" is very overloaded, but calling it "unicode" is sort of like calling integers "int32" -- that is, you're talking about the implementation rather than the type. In most programming languages that aren't at the machine level (like C is), "string" really is a sequence of text characters, not a "string of bytes", and that's probably the term that should be used for Python going forward, despite the legacy issues it involves.

Personally, I feel that "string" (for text) and "bytes" (for binary data represented as a sequence of bytes) are appropriate terms for Python. Keep "unicode" for a release or two as an alias for "string".

But isn't all this in a PEP somewhere already?

Bill
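Bill's distinction between the type (text) and its implementation (an encoding) can be illustrated with Python 3 as eventually released, where str holds characters and an explicit encode() step produces bytes. This is a minimal sketch of that behaviour, not code from the thread; the sample string is arbitrary.

```python
# A str is a sequence of characters; the escapes keep the sample unambiguous.
text = "na\u00efve caf\u00e9"          # "naïve café"

# Encoding turns text into bytes; the byte count depends on the encoding.
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")

print(len(text), len(utf8), len(utf16))   # → 10 12 20

# Decoding with the same encoding recovers the original text.
assert utf8.decode("utf-8") == text
assert utf16.decode("utf-16-le") == text
```

The character count stays the same however the text is stored; only the encoded byte sequences differ.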

Greg Ewing

5:11 a.m.

M.-A. Lemburg wrote:

...

Why not also make unicode() the default type constructor and only keep str() as alias to simplify porting (perhaps with a warning) ?

-1 on making us type 7 characters instead of 3 all over the place.

...

I disagree -- "string" has come to mean "string of characters" unless otherwise qualified. Using one to hold non-characters is just an aberration that was necessary in Python 2 because there wasn't much alternative.

--
Greg

Guido van Rossum

8:14 a.m.

On Sun, Mar 2, 2008 at 3:26 PM, M.-A. Lemburg <mal@egenix.com> wrote:

...

On 2008-03-02 23:11, Greg Ewing wrote:

...
M.-A. Lemburg wrote:

...
Why not also make unicode() the default type constructor and only keep str() as alias to simplify porting (perhaps with a warning) ?

-1 on making us type 7 characters instead of 3 all over the place.

Oh well... how about "text()" ?

Sorry, this was discussed and decided long ago. I'm not going to change this now. The type is called string or some variation thereof in most other popular languages.

...

...
...
The term "string" is just too overloaded with all kinds of misinterpretations. The term "string" just refers to a string of bytes - a variable length array so to speak.

I disagree -- "string" has come to mean "string of characters" unless otherwise qualified. Using one to hold non-characters is just an aberration that was necessary in Python 2 because there wasn't much alternative.

Historically that's incorrect. In 1990, when Unicode hadn't even been invented, 'str' was very intentionally designed to hold text and data equally well.

...

Buffer objects have been around for years and for exactly this purpose.

No, buffer objects were not invented to *hold* binary data. The buffer API was invented to *reference* bytes that were owned by 3rd party C libraries. Its descendant in Py3k, 'memoryview' (see PEP 3118) has the same purpose without having the same bugs. For *holding* bytes in Py3k we'll use bytes (immutable) or bytearray (mutable).

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
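Guido's distinction between *holding* bytes and *referencing* them can be sketched in Python 3 as released: bytearray and bytes own their data, while a memoryview (PEP 3118) is a window onto another object's buffer. A minimal illustration with arbitrary sample values:

```python
# bytearray *holds* its bytes and is mutable.
data = bytearray(b"hello")

# memoryview *references* data's buffer without copying it (PEP 3118).
view = memoryview(data)
view[0] = ord("j")                 # writing through the view changes data
assert data == bytearray(b"jello")

# bytes *holds* an immutable copy of the current contents.
frozen = bytes(data)
data[0] = ord("h")                 # mutate the owner in place
assert view[0] == ord("h")         # the view sees the change...
assert frozen == b"jello"          # ...the independent copy does not

try:
    frozen[0] = ord("x")           # bytes rejects item assignment
    bytes_is_mutable = True
except TypeError:
    bytes_is_mutable = False
print("bytes is mutable:", bytes_is_mutable)
```

The view never owns anything: every read or write goes straight to the bytearray's storage, which is the "reference, don't hold" role Guido describes.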

Alex Martelli

2:06 a.m.

On Sun, Mar 2, 2008 at 10:39 AM, Gregory P. Smith <greg@krypto.org> wrote:

...

OK, as long as it's also supplied (and presumably empty) for 2.6 -- my key concern is facilitating the maintenance of a single codebase for C-coded Python extensions that can be compiled for both 2.6 and 3.0. (I'm also thinking of SWIG and similar utilities, but those can probably best be tweaked to emit rather different C code for the two cases; still, that C code will also include some C snippets hand-coded by the extension author/maintainer, e.g. via SWIG typemaps &c, so easing the "single codebase" approach may help there too.) I don't think we want to go the route of code translators/generators for C-coded Python extensions (the way we do for Python code via 2to3), and the fewer #if's and #define's C extension authors/maintainers are required to devise (in order to support both 2.6 and 3.0), the likelier it is that we'll see 3.0 support in popular C-coded Python extensions sooner rather than later.

Alex

Stefan Behnel

12:44 p.m.

Christian Heimes wrote:

...

Stefan Behnel wrote:

...
I would like to know how stable the C-API of Python 3 is, or what the expected release level (beta?) would be at which I can expect it to stabilise. What is the plan here?

The release schedule in PEP 3000 says "August 2008" for 3.0 final, is that still the current goal? Can I expect the C-API to stabilise by June, then? That's where we are planning a Cython workshop with a couple of sprints. Py3k support might be worth targeting - if we can rely on a fixed target by then.

...

...
The background is Cython, which will need to support Python 3 one day or another, so I wanted to know from which point on it will make sense to start thinking about a migration plan.

The 3.0 API isn't stable yet. I plan to rename some of the functions before the first beta is released. Currently the naming schema is too confusing:

PyUnicode - str
PyString - bytes
PyBytes - bytearray

See? :)

I see. :) I actually expect the string semantics to be amongst the harder changes (at least, it's the most visible from a C-API point of view). However, names are not a big problem if you generate code anyway. Behaviour is what matters most for Cython. And we're already trying to adapt Cython's syntax to Py3k's, although that's not a requirement in all cases, as Cython lives with a couple of differences already. Keeping old syntax around and mapping it to the new C-API makes it easier to migrate existing Cython code. Hmmm, I even guess that the biggest problem might be porting Cython itself...
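For a sense of the behavioural changes Stefan expects to matter most, here is a minimal sketch of how str and bytes behave in Python 3 as released (the 2008 alphas under discussion may have differed in details). This is the kind of behaviour that generated code would need to respect regardless of what the C functions end up being called; the sample values are arbitrary.

```python
# In Python 2.6, "abc" + b"def" concatenates two str objects;
# in Python 3, mixing str and bytes raises TypeError.
try:
    "abc" + b"def"
    mixing_allowed = True
except TypeError:
    mixing_allowed = False

# Indexing bytes yields an int; indexing str yields a one-character str.
first_byte = b"abc"[0]
first_char = "abc"[0]

# Crossing the text/bytes boundary needs an explicit encoding.
combined = "abc" + b"def".decode("ascii")

print(mixing_allowed, first_byte, first_char, combined)  # → False 97 a abcdef
```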

...

The documentation for the PyString functions is outdated and IIRC the PyBytes docs are nonexistent.

Ok, so I guess it would at least be a good idea to wait for the docs to be fixed, then.

Thanks,
Stefan

