[Python-Dev] Python 3.x and bytes

Tue Jun 14 00:11:15 CEST 2011

Thank you all for the responses.  Rather than reply to each, I just made 
one big summary.  :)
----------------------------------------------------------------

Martin v. Löwis wrote:
 > Ethan Furman wrote:
 >> # constants
 >>
 >> EOH  = b'\r'[0]
 >> CHAR = b'C'[0]
 >> DATE = b'D'[0]
 >> FLOAT = b'F'[0]
 >> INT = b'I'[0]
 >> LOGICAL = b'L'[0]
 >> MEMO = b'M'[0]
 >> NUMBER = b'N'[0]
 >>
 >> This is not beautiful code.
 >
 > In this case, I think the intent would be better captured with
 >
 > def ASCII(c):
 >     return c.encode('ascii')
 >
 > EOH     = ASCII('\r') # 0D
 > CHAR    = ASCII('C')  # 43
 > DATE    = ASCII('D')  # 44
 > FLOAT   = ASCII('F')  # 46
 > INT     = ASCII('I')  # 49
 > LOGICAL = ASCII('L')  # 4C
 > MEMO    = ASCII('M')  # 4D
 > NUMBER  = ASCII('N')  # 4E
 >
 > This expresses the intent that a) these are really byte values,
 > not characters, and b) the specific choice of byte values was
 > motivated by ASCII.

Definitely easier to read.  If I go this route I'll probably use ord(), 
though, since ASCII and Unicode are the same for the first 128 code 
points, and there will be plenty of places to error out with a more 
appropriate message if I get garbage.  Since I really don't care what 
the actual integer values are, I'll skip those comments, too.
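
A rough sketch of how the two spellings compare, assuming the goal is an 
integer constant.  The non-ASCII character is only there to show where 
each approach does or does not complain:

```python
# For ASCII characters, ord() and the first byte of the ASCII encoding agree.
CHAR = ord('C')
same_value = (CHAR == 'C'.encode('ascii')[0] == b'C'[0])

# ord() accepts any single character, so garbage is not caught here ...
non_ascii = ord('\xe9')          # e-acute; outside the ASCII range

# ... whereas .encode('ascii') refuses it immediately.
try:
    '\xe9'.encode('ascii')
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True

print(same_value, non_ascii, encode_failed)
```

So with ord() any validation has to happen somewhere else, which is the 
trade-off described above.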
----------------------------------------------------------------

Hagen Fürstenau wrote:
 > You still have the alternative
 >
 > EOH = ord('\r')
 > CHAR = ord('C')
 > ...
 >
 > which looks fine to me.

Yes it does.  I just dislike the (to me unnecessary) extra function 
call.  For those tuning in late to this thread, these are workarounds 
for this not working:

field_type = header[11] # field_type is now an int, not a 1-byte bstr
if field_type == b'C':  # b'C' is a 1-byte bstr, so this always fails
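
A small sketch of that behavior.  The header value is made up; only 
byte 11 matters here:

```python
# Hypothetical 12-byte header whose last byte is the field type.
header = b'\x00' * 11 + b'C'

field_type = header[11]                    # indexing bytes yields an int
is_int = isinstance(field_type, int)

matches_bytes = (field_type == b'C')       # int vs. bytes: never equal
matches_ord = (field_type == ord('C'))     # int vs. int: works
matches_slice = (header[11:12] == b'C')    # slicing keeps the bytes type

print(is_int, matches_bytes, matches_ord, matches_slice)
```

Slicing instead of indexing is another way around it, at the cost of 
writing the index twice.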
----------------------------------------------------------------

Greg Ewing wrote:
 > Guido van Rossum wrote:
 >>> On Thu, May 19, 2011 at 1:43 AM, Nick Coghlan wrote:
 >>>> Proposals to address this include:
 >>>> - introduce a "character" literal to allow c'a' as an alternative
 >>>> to ord('a')
 >>
 >> -1; the result is not a *character* but an integer.
 >
 > Would you be happier if it were spelled i'a' instead?

That would work for me, although I would prefer a'a' (for ASCII).  :)
----------------------------------------------------------------

Stephen J. Turnbull wrote:
 > Put mascara on a pig, and you have a pig with mascara on, not Bette
 > Davis.  I don't necessarily think you're doing anybody a service by
 > making the hack of using ASCII bytes as mnemonics more beautiful.  I
 > think Martin's version is as beautiful as this code should get.

I'll either use Martin's or Nick's.  The point of beauty here is the 
ease of readability.  I think less readable is worse, and we shouldn't 
have to write ugly, hard-to-read, or inefficient code just because we 
have to deal with byte streams that aren't Unicode.
----------------------------------------------------------------

Nick Coghlan wrote:
 > Agreed, but:
 >
 > EOH, CHAR, DATE, FLOAT, INT, LOGICAL, MEMO, NUMBER = b'\rCDFILMN'
 >
 > is a shorter way to write the same thing.
 >
 > Going two per line makes it easier to mentally map the characters:
 >
 > EOH, CHAR = b'\rC'
 > DATE, FLOAT = b'DF'
 > INT, LOGICAL = b'IL'
 > MEMO, NUMBER = b'MN'

Wow.  I didn't realize that could be done.  That very nearly makes up 
for not being able to do it one char at a time.

Thanks, Nick!
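
For anyone else who had not seen it: iterating over a bytes object 
yields ints, so ordinary sequence unpacking does the work.  A minimal 
sketch using the same names; the mismatch case at the end is only for 
illustration:

```python
# Each name receives the integer value of the corresponding byte.
EOH, CHAR, DATE, FLOAT, INT, LOGICAL, MEMO, NUMBER = b'\rCDFILMN'

print(EOH, CHAR, NUMBER)

# The number of names must match the number of bytes, as with any unpacking.
mismatch = None
try:
    A, B = b'ABC'
except ValueError as exc:
    mismatch = str(exc)

print(mismatch)
```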
----------------------------------------------------------------

~Ethan~