"convert" string to bytes without changing data (encoding)

Terry Reedy tjreedy at udel.edu
Thu Mar 29 12:49:19 EDT 2012


On 3/29/2012 11:30 AM, Ross Ridge wrote:

> No, Evan in his own words admitted that his post was meant to be harsh,

I agree that he should have restrained and censored his writing.

> Just because I refuse to drink the
> "it's impossible to represent strings as a series of bytes" kool-aid

I do not believe *anyone* has made that claim. Is this meant to be a 
wild exaggeration? As wild as Evan's?

In my first post on this thread, I made three truthful claims.

1. A 3.x text string is logically a sequence of Unicode 'characters' 
(code points).

2. The Python language definition does not require that a string be 
bytes or become bytes unless and until it is explicitly encoded.

3. The intentionally hidden byte implementation of strings on byte 
machines is version and system dependent. The bytes used for a 
particular character are (in 3.3) context dependent.
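
A minimal sketch of the three claims, assuming CPython 3.3 or later. The 
sample string and all variable names are made up for illustration, and 
the sys.getsizeof comparison is an implementation detail, not something 
the language guarantees.

```python
import sys

s = "na\u00efve \u20ac"  # 'naïve €', a hypothetical sample string

# Claim 1: a str is a sequence of code points, whatever the storage is.
code_points = [ord(ch) for ch in s]

# Claim 2: bytes exist only after an explicit encode, and the bytes you
# get depend on the encoding you ask for.
utf8 = s.encode("utf-8")
utf16 = s.encode("utf-16-le")

# Some encodings cannot represent the string at all.
try:
    s.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# Claim 3 (CPython-specific assumption): two strings of the same length
# can occupy different amounts of memory, because the internal width per
# character depends on the widest character in the string.
ascii_only = "a" * 100
with_euro = "a" * 99 + "\u20ac"
sizes = (sys.getsizeof(ascii_only), sys.getsizeof(with_euro))

print(len(s), len(utf8), len(utf16), latin1_ok, sizes)
```

The same seven code points come out as 10 bytes in one encoding and 14 
in another, so there is no single "the bytes" of a string to expose.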

As it turns out, the OP had mistakenly assumed that the hidden byte 
implementation of 3.3 strings was both well-defined and UTF-8. It is 
neither, and it (almost certainly) never will be UTF-8: Guido and most 
other devs strongly want string indexing (and hence slice endpoint 
finding) to be O(1), which a variable-width encoding cannot offer 
directly.
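
To see why UTF-8 storage conflicts with O(1) indexing, here is a rough 
sketch of what finding the n-th code point in UTF-8 bytes involves. The 
function utf8_byte_offset is hypothetical, written only for this 
illustration, and is not how any Python implementation stores or 
indexes strings.

```python
def utf8_byte_offset(data: bytes, index: int) -> int:
    """Return the byte offset at which code point number `index` starts.

    Assumes `data` is valid UTF-8 and `index` is non-negative.
    """
    seen = 0
    for offset, byte in enumerate(data):
        # Continuation bytes have the form 0b10xxxxxx; any other byte
        # begins a new code point.
        if byte & 0xC0 != 0x80:
            if seen == index:
                return offset
            seen += 1
    raise IndexError("code point index out of range")


text = "a\u20acb"            # 'a', EURO SIGN, 'b'
data = text.encode("utf-8")  # 1 byte + 3 bytes + 1 byte

offsets = [utf8_byte_offset(data, i) for i in range(len(text))]
print(offsets)
```

Because characters take between one and four bytes, the loop has to walk 
the data from the start, so the cost grows with the index instead of 
staying constant.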

So all of the above is moot as far as the OP's problem is concerned. I 
already gave him the three standard solutions.

-- 
Terry Jan Reedy



