[Python-Dev] bytes thoughts
Baptiste Carvello
baptiste13 at altern.org
Thu Mar 2 01:59:04 CET 2006
some more thoughts about the bytes object:
1) it would be nice to have a trivial way to convert a bytes object to an int /
long, and vice versa.
Rationale:
while manipulating binary data will happen mostly with bytes objects, some
operations are better done with ints, like the bit manipulations with the &|~^
operators. So we should make sure there is no impedance mismatch between those 2
ways of editing binary data. Getting an individual byte at a time is not
sufficient, because the part of the data you want to edit might span over a few
bytes, or simply fall across a byte boundary.
Toy implementation:
>>> class bytes(list):
...     def from_int(cls, value, length):
...         return cls([(value >> 8*i) % 256 for i in range(length)[::-1]])
...     from_int=classmethod(from_int)
...     def int(self):
...         return sum([256**i*n for i,n in enumerate(self[::-1])])
...
>>>
The length argument to from_int is necessary to create a fixed number of bytes,
even if those bytes are 0.
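For readers who want to try the round trip without shadowing any builtin name,
here is a minimal sketch of the same idea as two plain functions. It assumes
big-endian order (most significant byte first), as in the toy class; the names
int_to_byte_values and byte_values_to_int are hypothetical and not part of any
proposal. Unlike the toy from_int, which silently drops high bits, this sketch
raises an error when the value does not fit, which is only one possible choice.

```python
def int_to_byte_values(value, length):
    """Split a non-negative integer into `length` byte values, big-endian."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if value >> (8 * length):
        # Assumption of this sketch: refuse values that need more bytes.
        raise ValueError("value does not fit in %d bytes" % length)
    # Walk from the most significant byte down to the least significant one.
    return [(value >> (8 * i)) & 0xFF for i in reversed(range(length))]


def byte_values_to_int(values):
    """Combine byte values (most significant first) into one integer."""
    result = 0
    for n in values:
        result = (result << 8) | n
    return result
```

With these, int_to_byte_values(292, 2) gives [1, 36], and passing that list to
byte_values_to_int gives 292 back. Leading zero bytes are preserved because the
length is explicit.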
Use case:
let's say you have a 2-byte binary record made of 7 bits of padding and 3x3 bits
of unix permissions. You want to change the user permissions, and store the
record back to a bytes object:
>>> record=bytes([1,36]) # this could be a slice of a preexisting bytes object
>>> perms=record.int()
>>> print oct(perms)
0444
>>> perms &=~( 7 <<6 ) # clear the bits corresponding to user permissions
>>> perms |= 6 <<6 # set the bits to the new value
>>> print oct(perms)
0644
>>> record=bytes.from_int(perms,2)
>>>
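The clear-then-set step in the session above can be wrapped in a small helper.
This is only a sketch: the name set_bit_field and its parameters are
hypothetical, and it works on a plain integer such as the one returned by the
toy int() method.

```python
def set_bit_field(value, shift, width, new_bits):
    """Return `value` with the `width`-bit field at bit offset `shift` replaced."""
    if new_bits < 0 or new_bits >> width:
        raise ValueError("new_bits does not fit in %d bits" % width)
    mask = ((1 << width) - 1) << shift   # ones over the field, zeros elsewhere
    return (value & ~mask) | (new_bits << shift)
```

For the record above, set_bit_field(292, 6, 3, 6) returns 420, that is, octal
444 becomes octal 644. The user permission field occupies bits 6 to 8, so it
falls across the boundary between the two bytes, which is the case where editing
one byte at a time gets awkward.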
2) a common case of interactive use is to display a bytes string as a character
string in order to spot which parts are text. In this case you ignore non-ASCII
characters, and replace everything that cannot be printed with a space (as some
hex editors do). So you don't need to care about encodings.
>>> import string
>>> def printable(c):
...     if not c in string.printable: return ' '
...     if c.isspace(): return ' '
...     return c
...
>>> class bytes(list):
...     def printable_ascii(self):
...         return u"".join([printable(chr(i)) for i in self])
...
>>> nb=bytes([48,0,10,12,34,65,66])
>>> print nb.printable_ascii()
0   "AB
>>>
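The same display rule can be sketched as a standalone function over a sequence
of byte values. The function name printable_ascii is reused here only for
illustration, and the sketch assumes chr() maps a small integer to a
one-character string, so it does not depend on what chr ends up returning for
values above 127: anything outside the visible ASCII set becomes a space.

```python
import string

# Visible ASCII characters: digits, letters and punctuation, no whitespace.
VISIBLE = set(string.digits + string.ascii_letters + string.punctuation)


def printable_ascii(values):
    """Render byte values as text, replacing anything not visible with a space."""
    chars = []
    for n in values:
        c = chr(n)
        chars.append(c if c in VISIBLE else " ")
    return "".join(chars)
```

The output always has one character per input byte, so column positions line up
with byte offsets, as in a hex editor's text pane.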
by the way, what will chr return in py3k ?
Cheers,
BC