Re: [Cython] Preparing the language level change - Re: [cython-users] Cython 0.29 beta 1 released
Stefan Behnel wrote on 21.09.2018 at 17:46:
Robert Bradshaw wrote on 21.09.2018 at 17:30:
I agree that this doesn't really feel like a language-level thing.
There seem to be three desired behaviors here:
- language_level=2, where currently "abc" is always a bytes object
- language_level=3, where currently "abc" is always a unicode object
- a third option, where "abc" is a str object (depending on the runtime).
We should support all three of these modes.
Correction: with "language_level=2", unprefixed string literals are "str", i.e. "bytes" in Py2 and "unicode" in Py3. "language_level=3" always makes them unicode strings, and that's what Jeroen was referring to.
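A quick way to check the corrected rules is a small Python table of what an unprefixed literal such as "abc" becomes at runtime. This is an illustrative sketch only: `unprefixed_literal_type` is a hypothetical name, and the function simply encodes the rules as stated in this message rather than calling Cython.

```python
def unprefixed_literal_type(language_level, runtime_major):
    """Return the runtime type of an unprefixed literal such as "abc".

    Hypothetical model of the rules stated in this thread; it does not
    invoke Cython. Types are reported as "bytes" or "unicode".
    """
    if runtime_major not in (2, 3):
        raise ValueError("runtime_major must be 2 or 3")
    if language_level == 2:
        # Native 'str': bytes on Python 2, unicode text on Python 3.
        return "bytes" if runtime_major == 2 else "unicode"
    if language_level == 3:
        # Always a unicode string, regardless of the runtime.
        return "unicode"
    raise ValueError("language_level must be 2 or 3")


for level in (2, 3):
    for major in (2, 3):
        print("language_level=%d on Py%d: %s"
              % (level, major, unprefixed_literal_type(level, major)))
```

Only the language_level=2 row depends on the runtime, which is exactly the "type changing" behavior discussed next.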
It's really difficult to guess what the most common use case is here. The only reason why the type-changing unprefixed "str" literals are relevant is that Py2 cannot handle Unicode in some cases (and Unicode strings are memory hogs in Py2), but Py3 requires Unicode in most cases (and handles it efficiently). So, the problem here is really Py2, and the problem will go away as soon as we dump support for it. Until then, however, we're trying to find a solution to make the language level switch bearable and ease the transition.
I added a new directive "str_is_str=True" which can be combined with "language_level=3" to get the desired behaviour. It keeps the 'str' builtin type as it is (it would otherwise become 'unicode' with level 3) and keeps unprefixed string literals as type 'str' in Py2 and Py3. Everything else should depend solely on the language_level switch.
https://github.com/cython/cython/commit/cea42915c5e9ea1da9187aa3c55f3f16d04b...
I think we're now set for the release. I'll prepare a candidate.
Stefan
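The effect of the proposed "str_is_str" directive on both the literal type and the builtin name 'str' can be modeled in a few lines of Python. This is a hypothetical sketch of the rules described in the message, not Cython's implementation; `str_semantics` and `StrSemantics` are made-up names, and types are reported as the strings "bytes" and "unicode".

```python
from typing import NamedTuple


class StrSemantics(NamedTuple):
    literal: str      # runtime type of an unprefixed literal like "abc"
    str_builtin: str  # runtime type that the builtin name 'str' refers to


def str_semantics(language_level, str_is_str, runtime_major):
    """Hypothetical model of the proposed 'str_is_str' directive.

    The native 'str' type is bytes on Python 2 and unicode on Python 3.
    """
    if language_level not in (2, 3) or runtime_major not in (2, 3):
        raise ValueError("language_level and runtime_major must be 2 or 3")
    native_str = "bytes" if runtime_major == 2 else "unicode"
    if language_level == 3 and not str_is_str:
        # Level 3 alone: literals are unicode and 'str' is treated as 'unicode'.
        return StrSemantics(literal="unicode", str_builtin="unicode")
    # Level 2, or level 3 combined with str_is_str=True: both stay native 'str'.
    return StrSemantics(literal=native_str, str_builtin=native_str)


for flag in (False, True):
    for major in (2, 3):
        print("language_level=3, str_is_str=%s, Py%d -> %s"
              % (flag, major, str_semantics(3, flag, major)))
```

In this model the flag only changes anything on a Python 2 runtime; on Python 3 the native 'str' already is the unicode type.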
On 24-09-2018 14:05, Stefan Behnel wrote:
I added a new directive "str_is_str=True" which can be combined with "language_level=3" to get the desired behaviour. It keeps the 'str' builtin type as it is (it would otherwise become 'unicode' with level 3) and keeps unprefixed string literals as type 'str' in Py2 and Py3. Everything else should depend solely on the language_level switch.
For consistency with CPython's "from __future__ import unicode_literals", wouldn't it be better to call this directive "str_literals"?
I realize there isn't 100% overlap in the functionality of the two, but I find the "str_is_str" name not very descriptive.
Thanks.
Cheers,
Dan
Daniele Nicolodi wrote on 25.09.2018 at 00:28:
For consistency with the CPython
from __future__ import unicode_literals
wouldn't it be better to call this directive "str_literals"?
I realize there isn't 100% overlap in the functionality of the two, but I find the "str_is_str" name not very descriptive.
I started off with "unicode_literals=False", and then renamed it because this name didn't cover the change of "str" to "unicode" (i.e. renaming the usages of the builtin type internally, so that "str(x)" actually calls "unicode(x)").
Looks like this bikeshed needs painting, so let's have a quick(!) discussion or vote.
- should this feature touch the builtin type at all?
- more opinions on the name?
Stefan
On 9/25/18 1:24 AM, Stefan Behnel wrote:
I started off with "unicode_literals=False", and then renamed it because this name didn't cover the change of "str" to "unicode" (i.e. renaming the usages of the builtin type internally, so that "str(x)" actually calls "unicode(x)").
Looks like this bikeshed needs painting, so let's have a quick(!) discussion or vote.
- should this feature touch the builtin type at all?
- more opinions on the name?
I think the name str_is_str will be confusing to developers who begin with Python 3 because they'll probably assume str is unicode. The name str_is_bytes might be better, or maybe unprefixed-string-literals-are-bytes. A bit verbose, but this should be something used with old code that's used with Python 3; newly written code should use b'' literals.
I do wonder how usable this will be in practice, because passing a bytes instance to something that expects a unicode instance may lead to problems.
John
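As a general illustration of the hazard John mentions (independent of which directive name wins), Python 3 does not implicitly convert between bytes and text: mixing them either raises `TypeError` or silently compares unequal. The helper `mixing_problems` is a hypothetical name; the behavior it exercises is standard Python 3.

```python
def mixing_problems():
    """Record what happens in Python 3 when bytes are used where text is expected."""
    text = "abc"
    data = b"abc"
    results = {}

    # Concatenating text and bytes raises TypeError; there is no implicit coercion.
    try:
        text + data
    except TypeError:
        results["concat"] = "TypeError"

    # Text methods reject bytes arguments.
    try:
        text.startswith(data[:1])
    except TypeError:
        results["startswith"] = "TypeError"

    # Equality does not raise: bytes and text simply compare unequal,
    # which can hide the bug instead of reporting it.
    results["equal"] = data == text

    # An explicit decode is the safe way to compare or combine them.
    results["decoded_equal"] = data.decode("ascii") == text
    return results


print(mixing_problems())
```

On Python 2 the same mix would instead trigger an implicit ASCII decode, which only fails once non-ASCII data shows up.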
John Ehresman wrote on 25.09.2018 at 16:27:
I think the name str_is_str will be confusing to developers who begin with Python 3 because they'll probably assume str is unicode.
It is unicode for them, at least in Py3. In Py2, it's str. Thus the name "str_is_str": it's "str" in both versions.
Stefan