Making the most of internal UTF8

Is there a tutorial about how to best take advantage of PyPy's internal UTF8? The docs say that PyPy now uses UTF8 internally to represent unicode. So, for an old codger, that sounds like we are back to a point where ASCII characters just act normally again, like in Python v.2, since ASCII IS UTF8. In other words, within the range of ASCII characters, the UTF8 representation is identical to the ASCII representation. So, does that mean we can put the 'u' and 'b' prefix nightmares behind us? It would help some diehards finally make the switch to v.3.X, in spite of the bad taste that lingers from early attempts. There is a universe of v.2.X code out there, in production, and something like this would go a long way toward motivating folks to go ahead and update. The fewer the barriers, the more likely the movement. Oh, and incidentally, it would help promote the use of PyPy, as well ... Thanks! Jerry S.

Hi Jerry, On Wed, 26 Feb 2020 at 16:09, Jerry Spicklemire <jspicklemire@gmail.com> wrote:
Is there a tutorial about how to best take advantage of PyPy's internal UTF8?
For better or for worse, this is only an internal feature. It has no effect for the end user. In particular, Python programs written for PyPy3.6 and for CPython3.6 should work identically. The fact that PyPy uses UTF-8 internally is not visible to the Python program; otherwise, the two would never be cross-compatible. A bientôt, Armin.
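As an illustration of the point above (a sketch added for this thread, not code from Armin's message): the observable behaviour of str is defined by the language, so the snippet below should give the same results on PyPy3.6 and CPython3.6. The variable names are made up for the example.

```python
# str operations work on code points, whatever the interpreter stores internally.
text = "na\u00efve caf\u00e9 \u2615"  # contains three non-ASCII characters

# len() and indexing count code points, not UTF-8 bytes.
n_code_points = len(text)
n_utf8_bytes = len(text.encode("utf-8"))
third_char = text[2]

print(n_code_points, n_utf8_bytes, repr(third_char))
```

The byte count only becomes visible after an explicit encode(); the str object itself never exposes its internal layout.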

To expand Armin's answer, the two most "visible" effects for end users are:
- some_unicode.encode('utf-8') is essentially free (because the string is already UTF-8 internally)
- some_bytes.decode('utf-8') is very cheap (it just needs to check that some_bytes is valid UTF-8)
ciao, Anto
(Replying to Armin Rigo's message of Wed, Feb 26, 2020 at 4:47 PM.)
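A minimal sketch of the two operations Anto mentions, using only standard str and bytes methods. The helper name is_valid_utf8 is hypothetical, introduced here for illustration; the results are the same on any Python 3, and only the cost differs between interpreters.

```python
text = "PyPy \u2665 UTF-8"

# Encoding to UTF-8 and decoding back round-trips the string.
encoded = text.encode("utf-8")
decoded = encoded.decode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(decoded == text)               # the round trip preserves the text
print(is_valid_utf8(encoded))        # bytes produced by encode() are valid
print(is_valid_utf8(b"\xff\xfe"))    # 0xff never occurs in valid UTF-8
```

To compare the actual cost on PyPy and CPython, one could time the encode() and decode() calls with the standard timeit module on each interpreter.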

On Wed, Feb 26, 2020 at 09:08:49AM -0600, Jerry Spicklemire wrote:
You can think of 'u' as being the default in Python 3, where 'b' was the default in Python 2 (bytes, not ASCII), but most stdlib functions would accept bytes as strings. So in Python 3, you don't need 'u', and you only occasionally need 'b' or to convert between the two. The defaults are generally better for the programming most people do, IMO.
m
--
Matt Billenstein matt@vazor.com http://www.vazor.com/
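To make Matt's point concrete, here is a small Python 3 sketch (the variable names are illustrative only). It also touches the original question: for pure-ASCII text the ASCII and UTF-8 encodings produce the same bytes, but str and bytes remain distinct types that do not mix implicitly.

```python
# In Python 3 the 'u' prefix is accepted but changes nothing.
u_prefix_is_noop = (u"hello" == "hello")

ascii_text = "hello"
ascii_bytes = b"hello"

# For ASCII-only text, both encodings yield identical bytes.
same_encoding = (
    ascii_text.encode("ascii") == ascii_text.encode("utf-8") == ascii_bytes
)

# str and bytes still cannot be combined without an explicit conversion.
try:
    ascii_text + ascii_bytes
except TypeError:
    mixing_raises = True
else:
    mixing_raises = False

# The explicit conversion is a single call in either direction.
converted = ascii_bytes.decode("utf-8") + ascii_text

print(u_prefix_is_noop, same_encoding, mixing_raises, converted)
```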

participants (5): Antonio Cuni, Armin Rigo, Dan Stromberg, Jerry Spicklemire, Matt Billenstein