"convert" string to bytes without changing data (encoding)
Grant Edwards
invalid at invalid.invalid
Wed Mar 28 15:44:02 EDT 2012
On 2012-03-28, Prasad, Ramit <ramit.prasad at jpmorgan.com> wrote:
>
>>You can't generally just "deal with the ascii portions" without
>>knowing something about the encoding. Say you encounter a byte
>>greater than 127. Is it a single non-ASCII character, or is it the
>>leading byte of a multi-byte character? If the next byte is less
>>than 128, is it an ASCII character, or a continuation of the previous
>>character? For UTF-8 you could safely assume ASCII, but without
>>knowing the encoding, there is no way to be sure. If you just assume
>>it's ASCII and manipulate it as such, you could be messing up
>>non-ASCII characters.
>
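A minimal sketch of that ambiguity, assuming Python 3 and using
Shift_JIS as the example multi-byte encoding (my choice for
illustration, nothing in the thread names it). The trail byte of a
two-byte character can have the same value as an ASCII character, so
an "ASCII-only" edit can corrupt it:

```python
# KATAKANA LETTER SO (U+30BD) is a two-byte character in Shift_JIS.
text = "\u30bd"
raw = text.encode("shift_jis")

# The second byte has the same value as an ASCII backslash (0x5C).
trail_byte = raw[1]

# A naive "ASCII-only" manipulation: replace every backslash byte.
# This silently alters the second half of the two-byte character.
naive = raw.replace(b"\\", b"/")

# UTF-8 does not have this problem: every byte of a multi-byte
# sequence is >= 0x80, so bytes below 0x80 are always plain ASCII.
utf8 = text.encode("utf-8")
utf8_all_high = all(b >= 0x80 for b in utf8)
```

Without knowing that raw is Shift_JIS, there is no way to tell that
the 0x5C byte is not a real backslash.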
> Technically, ASCII goes up to 256
No, ASCII only defines 0-127. Values >=128 are not ASCII.
From https://en.wikipedia.org/wiki/ASCII:
ASCII includes definitions for 128 characters: 33 are non-printing
control characters (now mostly obsolete) that affect how text and
space is processed and 95 printable characters, including the space
(which is considered an invisible graphic).
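
A quick way to check those numbers, sketched here assuming Python 3
(the variable names are just for illustration):

```python
# All 128 ASCII code points (0-127) decode cleanly as ASCII.
ascii_text = bytes(range(128)).decode("ascii")

# Byte value 128 is outside ASCII, so decoding it must fail.
try:
    bytes([128]).decode("ascii")
    high_byte_is_ascii = True
except UnicodeDecodeError:
    high_byte_is_ascii = False

# 33 control characters: 0-31 plus DEL (127).
control = [c for c in range(128) if c < 32 or c == 127]

# 95 printable characters: space (32) through tilde (126).
printable = [c for c in range(128) if 32 <= c < 127]
```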
--
Grant Edwards               grant.b.edwards at gmail.com
Yow! Used staples are good with SOY SAUCE!