[Python-Dev] bytes thoughts

Baptiste Carvello baptiste13 at altern.org
Thu Mar 2 01:59:04 CET 2006


some more thoughts about the bytes object:

1) it would be nice to have a trivial way to convert a bytes object to an int / 
long, and vice versa.

Rationale:

while most manipulation of binary data will happen with bytes objects, some 
operations are better done with ints, such as bit manipulation with the &, |, ~ 
and ^ operators. So we should make sure there is no impedance mismatch between 
these two ways of editing binary data. Accessing one byte at a time is not 
sufficient, because the part of the data you want to edit might span several 
bytes, or simply straddle a byte boundary.

Toy implementation:

 >>> class bytes(list):
...     def from_int(cls, value, length):
...         return cls([(value >> 8*i) % 256 for i in range(length)[::-1]])
...     from_int=classmethod(from_int)
...     def int(self):
...         return sum([256**i*n for i,n in enumerate(self[::-1])])
...
 >>>

The length argument to from_int is necessary to produce a fixed number of bytes, 
even if the leading bytes are 0.
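For comparison, Python 3 (which postdates this message) ended up with built-in methods for exactly this pair of conversions: `int.to_bytes(length, byteorder)` and `int.from_bytes(data, byteorder)`. A minimal sketch of the round trip, assuming big-endian order to match the toy implementation above (most significant byte first), and showing how the length argument keeps leading zero bytes:

```python
# Python 3: convert between an int and a fixed-length, big-endian bytes object.
value = 0o444  # 292 == 0x0124

# to_bytes needs an explicit length; leading zero bytes are kept.
data = value.to_bytes(3, "big")
print(data)  # b'\x00\x01$'

# from_bytes is the inverse; leading zero bytes do not change the value.
assert int.from_bytes(data, "big") == value

# A length that is too short raises OverflowError instead of truncating.
try:
    value.to_bytes(1, "big")
except OverflowError as exc:
    print("too short:", exc)
```

One design difference worth noting: the toy `from_int` silently drops the high-order bits when the length is too small (because of the `% 256`), whereas `to_bytes` raises an error.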

Use case:

let's say you have a binary record made of 7 bits of padding and 3x3 bits of 
unix permissions. You want to change the user permissions, and store the record 
back into a bytes object:

 >>> record=bytes([1,36]) # this could be a slice of a preexisting bytes object
 >>> perms=record.int()
 >>> print oct(perms)
0444
 >>> perms &=~( 7 <<6 )   # clear the bits corresponding to user permissions
 >>> perms |=   6 <<6     # set the bits to the new value
 >>> print oct(perms)
0644
 >>> record=bytes.from_int(perms,2)
 >>>
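The same edit can be wrapped in a small reusable helper. This is a sketch for Python 3 using the built-in `int.from_bytes` / `int.to_bytes`; the name `replace_bits` and its parameters are hypothetical, and it assumes the record is read as one big-endian unsigned integer with bit positions counted from the least significant bit. Note that the user permission bits (6 to 8) already straddle the two bytes: bits 6 and 7 live in the low byte and bit 8 in the high byte, which is exactly the case where per-byte access gets awkward.

```python
def replace_bits(record: bytes, shift: int, width: int, value: int) -> bytes:
    """Hypothetical helper: return a copy of `record` with the `width` bits
    starting at bit `shift` (counted from the least significant bit) set to
    `value`. The record is treated as one big-endian unsigned integer, so
    the field may cross byte boundaries."""
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the field")
    n = int.from_bytes(record, "big")
    mask = ((1 << width) - 1) << shift   # ones over the field
    n = (n & ~mask) | (value << shift)   # clear the field, then set it
    # Raises OverflowError if shift + width goes past the end of the record.
    return n.to_bytes(len(record), "big")


record = bytes([1, 36])                # 0o444
new = replace_bits(record, 6, 3, 6)    # user permissions -> rw-
print(list(new))                       # [1, 164]
print(oct(int.from_bytes(new, "big"))) # 0o644
```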

2) a common case of interactive use is to display a bytes string as a character 
string in order to spot which parts are text. In that case you can ignore 
non-ASCII characters and replace everything that cannot be printed with a space 
(as some hex editors do), so you don't need to care about encodings.

 >>> import string
 >>> def printable(c):
...     if c not in string.printable: return ' '
...     if c.isspace(): return ' '
...     return c
...
 >>> class bytes(list):
...     def printable_ascii(self):
...         return u"".join([printable(chr(i)) for i in self])
...
 >>> nb=bytes([48,0,10,12,34,65,66])
 >>> print nb.printable_ascii()
0   "AB
 >>>
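For comparison, in Python 3 iterating over a bytes object yields ints, so the same filter can be written as a plain function. This is a sketch under that assumption; the function name simply mirrors the method above. It keeps the same set of characters as the `printable()` filter: `string.printable` minus whitespace is exactly the visible ASCII range 0x21 to 0x7E.

```python
def printable_ascii(data: bytes) -> str:
    # Iterating over bytes yields ints in range(256).
    # Keep visible ASCII (0x21-0x7E); whitespace, control characters
    # and non-ASCII bytes all become a space, as in a hex editor.
    return "".join(chr(b) if 0x21 <= b <= 0x7E else " " for b in data)


print(printable_ascii(bytes([48, 0, 10, 12, 34, 65, 66])))  # 0   "AB
```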

by the way, what will chr return in py3k?

Cheers,
BC


