[Python-ideas] Python 3.x and bytes

Fri May 20 17:16:46 CEST 2011

On Fri, May 20, 2011 at 11:05 PM, Ethan Furman <ethan at stoneleaf.us> wrote:
> which is now
>
> field_type = chr(hdr[11])

This is definitely a modelling problem, and exactly the kind of
thinking that the bytes model in Py3k is intended to combat.

Bytes are not text, even when you're dealing primarily with ASCII. The
world where that mindset worked consistently and reliably is ancient
history (and many non-English speakers still suffer annoying software
glitches due to the fact that English speakers have been able to get
by with only ASCII for so long).

If you want a subscript on a bytes object to create another bytes
object, then slice it, just as you would a list. If you want the
integer value, index it.
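To make the index/slice distinction concrete, here is a minimal sketch. It uses a made-up 12-byte header; the contents of `hdr` are a hypothetical stand-in, not the actual file format discussed in the thread:

```python
# Hypothetical header bytes, chosen only for illustration: "C" sits at index 11.
hdr = b"NAME" + b"\x00" * 7 + b"C"

# Indexing a bytes object yields an int (the byte's value).
value = hdr[11]
print(value)            # 67
print(type(value))      # <class 'int'>

# Slicing yields another bytes object, just as slicing a list yields a list.
one_byte = hdr[11:12]
print(one_byte)         # b'C'
print(type(one_byte))   # <class 'bytes'>

# The list analogy: indexing gives an element, slicing gives a list.
items = [10, 20, 30]
print(items[1])         # 20
print(items[1:2])       # [20]
```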

> So because my single element access to the byte string lost its bytes type, I may no longer get the correct result.

Umm, no. You may not get the correct result because you're telling
Python to interpret a value as a Unicode code point when it is
actually no such thing (given your example, I assume it is actually
cp1251-encoded text). Therefore, instead of:

chr(hdr[11]) # Only makes sense for a sequence of Unicode code points

you want something like:

hdr[11:12].decode('cp1251') # Makes sense for a cp1251 encoded byte sequence
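A minimal sketch of why the two spellings diverge, assuming (as above) that byte 11 of a hypothetical `hdr` holds cp1251-encoded text. The byte value 0xC0 is chosen purely for illustration. For plain ASCII bytes the two results happen to coincide, which is exactly what hides the bug:

```python
# Hypothetical header: byte 11 is 0xC0, which cp1251 maps to Cyrillic 'А'.
hdr = b"NAME" + b"\x00" * 7 + b"\xc0"

# chr() treats the integer 0xC0 as a Unicode code point: U+00C0 'À'.
wrong = chr(hdr[11])

# decode() interprets the one-byte slice as cp1251 text: U+0410 'А'.
right = hdr[11:12].decode("cp1251")

print(hex(ord(wrong)))  # 0xc0
print(hex(ord(right)))  # 0x410

# For an ASCII byte such as b"C" both approaches agree, masking the error.
ascii_hdr = b"NAME" + b"\x00" * 7 + b"C"
print(chr(ascii_hdr[11]) == ascii_hdr[11:12].decode("cp1251"))  # True
```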

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia