"convert" string to bytes without changing data (encoding)

Grant Edwards invalid at invalid.invalid
Wed Mar 28 15:44:02 EDT 2012


On 2012-03-28, Prasad, Ramit <ramit.prasad at jpmorgan.com> wrote:
> 
>>You can't generally just "deal with the ascii portions" without
>>knowing something about the encoding.  Say you encounter a byte
>>greater than 127.  Is it a single non-ASCII character, or is it the
>>leading byte of a multi-byte character?  If the next byte is less
>>than 128, is it an ASCII character, or a continuation of the previous
>>character?  For UTF-8 you could safely assume ASCII, but without
>>knowing the encoding, there is no way to be sure.  If you just assume
>>it's ASCII and manipulate it as such, you could be messing up
>>non-ASCII characters.
> 
> Technically, ASCII goes up to 256

No, ASCII only defines 0-127.  Values >=128 are not ASCII.

From https://en.wikipedia.org/wiki/ASCII:

  ASCII includes definitions for 128 characters: 33 are non-printing
  control characters (now mostly obsolete) that affect how text and
  space is processed and 95 printable characters, including the space
  (which is considered an invisible graphic).
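
A quick way to see both points in Python 3 is the rough sketch below. The
helper name is_ascii and the sample byte values are only illustrations I
picked, not anything from the original question.

```python
# Sketch only: is_ascii and the sample bytes are illustrative choices.

def is_ascii(data: bytes) -> bool:
    """Return True if every byte falls in the ASCII range 0-127."""
    return all(b < 128 for b in data)

# All 128 ASCII code points (0-127) decode cleanly.
ascii_text = bytes(range(128)).decode("ascii")

# Byte 128 is outside ASCII, so the strict ascii codec rejects it.
try:
    bytes([128]).decode("ascii")
    rejected = False
except UnicodeDecodeError:
    rejected = True

# The same two bytes mean different things under different encodings,
# which is why the encoding has to be known before touching them.
raw = b"\xc3\xa9"
as_utf8 = raw.decode("utf-8")      # one character: U+00E9
as_latin1 = raw.decode("latin-1")  # two characters: U+00C3 and U+00A9
```

Here latin-1 is just one example of a single-byte encoding that assigns
characters to 128-255; those assignments are extensions, not ASCII.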

-- 
Grant Edwards               grant.b.edwards        Yow! Used staples are good
                                  at               with SOY SAUCE!
                              gmail.com            


