"convert" string to bytes without changing data (encoding)
ramit.prasad at jpmorgan.com
Thu Mar 29 19:36:34 CEST 2012
> > Technically, ASCII goes up to 256 but they are not A-z letters.
> Technically, ASCII is 7-bit, so it goes up to 127.
> No, ASCII only defines 0-127. Values >=128 are not ASCII.
> From https://en.wikipedia.org/wiki/ASCII:
> ASCII includes definitions for 128 characters: 33 are non-printing
> control characters (now mostly obsolete) that affect how text and
> space is processed and 95 printable characters, including the space
> (which is considered an invisible graphic).
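The 0-127 limit quoted above is easy to check from Python 3. The following is only a minimal sketch: the helper name is_ascii and the sample string are made up for illustration, and Latin-1 is used as just one example of an 8-bit "extended ASCII" encoding.

```python
def is_ascii(s: str) -> bool:
    """Return True if every code point in s falls in the ASCII range 0-127."""
    return all(ord(c) < 128 for c in s)


text = "caf\u00e9"  # the last character is U+00E9 (decimal 233), outside ASCII

print(is_ascii("plain text"))  # True
print(is_ascii(text))          # False

# The ascii codec refuses anything at or above 128.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print("cannot encode as ASCII:", exc.reason)

# An 8-bit encoding such as Latin-1 stores the same character in one byte,
# while UTF-8 needs two bytes for it.
latin1 = text.encode("latin-1")  # b'caf\xe9'
utf8 = text.encode("utf-8")      # b'caf\xc3\xa9'
print(latin1, utf8)
```

Note that the byte 0xE9 only means U+00E9 if both sides agree the data is Latin-1; other 8-bit code pages assign that value to different characters.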
Doh! I was mistaking extended ASCII for ASCII. Thanks for the correction.
Ramit Prasad | JPMorgan Chase Investment Bank | Currencies Technology
> -----Original Message-----
> From: python-list-bounces+ramit.prasad=jpmorgan.com at python.org
> [mailto:python-list-bounces+ramit.prasad=jpmorgan.com at python.org] On
> Behalf Of MRAB
> Sent: Wednesday, March 28, 2012 2:50 PM
> To: python-list at python.org
> Subject: Re: "convert" string to bytes without changing data (encoding)
> On 28/03/2012 20:02, Prasad, Ramit wrote:
> >> >The right way to convert bytes to strings, and vice versa, is via
> >> >encoding and decoding operations.
> >> If you want to dictate to the original poster the correct way to do
> >> things then you don't need to do anything more than that. You don't need
> >> to pretend, like Chris Angelico, that there isn't a direct mapping from
> >> his Python 3 implementation's internal representation of strings
> >> to bytes in order to label what he's asking for as being "silly".
> > It might be technically possible to recreate the internal implementation,
> > or get the byte data. That does not mean it will make any sense or
> > be understood in a meaningful manner. I think Ian summarized it
> > very well:
> >>You can't generally just "deal with the ascii portions" without
> >>knowing something about the encoding. Say you encounter a byte
> >>greater than 127. Is it a single non-ASCII character, or is it the
> >>leading byte of a multi-byte character? If the next byte is less
> >>than 128, is it an ASCII character, or a continuation of the previous
> >>character? For UTF-8 you could safely assume ASCII, but without
> >>knowing the encoding, there is no way to be sure. If you just assume
> >>it's ASCII and manipulate it as such, you could be messing up
> >>non-ASCII characters.
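To make Ian's point concrete, here is a rough Python 3 sketch. The function upper_ascii_bytes and the sample strings are invented for this illustration and are not from the thread. It shows an explicit encode/decode round trip, the same bytes read under two different encodings, and byte-level "ASCII" manipulation that is harmless for UTF-8 but corrupts UTF-16 data.

```python
# Explicit round trip: str -> bytes -> str, naming the encoding both ways.
s = "na\u00efve"           # five characters; the third is U+00EF
data = s.encode("utf-8")   # b'na\xc3\xafve' (six bytes)
assert data.decode("utf-8") == s

# The same bytes under a different assumed encoding give different text.
as_latin1 = data.decode("latin-1")
print(len(s), len(as_latin1))  # 5 6


def upper_ascii_bytes(raw: bytes) -> bytes:
    """Upper-case every byte in the range 0x61-0x7A ('a'-'z'), leaving the rest."""
    return bytes(b - 32 if 0x61 <= b <= 0x7A else b for b in raw)


# In UTF-8 every byte below 128 is a real ASCII character, so this is safe.
safe = upper_ascii_bytes(data).decode("utf-8")

# In UTF-16 a byte in that range can be half of a non-ASCII character.
# U+0161 is stored little-endian as 0x61 0x01, so its first byte looks like 'a'.
wide = "\u0161".encode("utf-16-le")
broken = upper_ascii_bytes(wide).decode("utf-16-le")
print(broken == "\u0161")  # False: the character changed to U+0141
```

The byte-level function never learns which encoding it was handed, which is why the result is only trustworthy when the caller already knows the data is UTF-8 or another ASCII-compatible encoding.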