
On Fri, May 20, 2011 at 11:05 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
> which is now
>
>     field_type = chr(hdr[11])
This is definitely a modelling problem, and exactly the kind of thinking that the bytes model in Py3k is intended to combat. Bytes are not text, even when you're dealing primarily with ASCII. The world where that mindset worked consistently and reliably is ancient history (and many non-English speakers still suffer annoying software glitches due to the fact that English speakers have been able to get by with only ASCII for so long).

If you want a subscript on a bytes object to create another bytes object, then slice it, just as you would a list. If you want the integer value, index it.
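
A minimal sketch of the index-versus-slice distinction. The header contents here are made up for illustration; only the offset 11 comes from the example under discussion.

```python
# Hypothetical 16-byte header: zero padding with an ASCII 'C' at offset 11.
hdr = bytes(11) + b"C" + bytes(4)

# Indexing a bytes object yields the integer value of that byte.
as_int = hdr[11]

# Slicing yields another bytes object (here, of length 1).
as_bytes = hdr[11:12]

# Lists behave the same way: index for the element, slice for a sub-list.
items = [10, 20, 30]
element = items[1]
sublist = items[1:2]

print(type(as_int).__name__, as_int)
print(type(as_bytes).__name__, as_bytes)
print(element, sublist)
```

The slice keeps the bytes type, so it can still be compared with other bytes objects or decoded later.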
> So because my single element access to the byte string lost its bytes type, I may no longer get the correct result.
Umm, no. You may not get the correct result because you're telling Python to interpret a value as a Unicode code point when it is actually no such thing (given your example, I assume it is actually cp1251 encoded text).

Therefore, instead of:

    chr(hdr[11])  # Only makes sense for a sequence of Unicode code points

you want something like:

    hdr[11:12].decode('cp1251')  # Makes sense for a cp1251 encoded byte sequence

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
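
A minimal sketch of why the two spellings differ, assuming the data really is cp1251 as guessed above. The header contents and the choice of the Cyrillic letter "Д" are made up for illustration.

```python
# Hypothetical 16-byte header whose byte at offset 11 is cp1251-encoded "Д" (0xC4).
hdr = bytes(11) + "Д".encode("cp1251") + bytes(4)

# chr() treats the integer 0xC4 as a Unicode code point: U+00C4, "Ä".
wrong = chr(hdr[11])

# Decoding the one-byte slice interprets 0xC4 as cp1251: U+0414, "Д".
right = hdr[11:12].decode("cp1251")

# For a pure-ASCII byte the two approaches happen to agree, which is
# why the chr() spelling can appear to work on English-only data.
ascii_hdr = bytes(11) + b"C" + bytes(4)
ascii_via_chr = chr(ascii_hdr[11])
ascii_via_decode = ascii_hdr[11:12].decode("cp1251")

print(repr(wrong), repr(right))
print(repr(ascii_via_chr), repr(ascii_via_decode))
```

The two results only diverge for byte values of 0x80 and above, so the bug stays hidden until non-ASCII data shows up.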