Just an idea of usability fix for Python 3.

hexdump module (function or bytes method is better) as simple, easy and intuitive way for dumping binary data when writing programs in Python.

hexdump(bytes) - produce human readable dump of binary data, byte-by-byte representation, separated by space, 16-byte rows

Rationale:

1. Debug.
   Generic binary data can't be output to console. A separate helper is needed to print, log or store its value in human readable format in database. This takes time.

2. Usability.
   binascii is ugly: name is not intuitive any more, there are a lot of functions, and it is not clear how it relates to unicode.

3. Serialization.
   It is convenient to have format that can be displayed in a text editor. Simple tools encourage people to use them.

Practical example:
>>> print(b)
� � � �� �� � �� �� � � � �
>>> b
'\xe6\xb0\x08\x04\xe7\x9e\x08\x04\xe7\xbc\x08\x04\xe7\xd5\x08\x04\xe7\xe4\x08\x04\xe6\xb0\x08\x04\xe7\xf0\x08\x04\xe7\xff\x08\x04\xe8\x0b\x08\x04\xe8\x1a\x08\x04\xe6\xb0\x08\x04\xe6\xb0\x08\x04'
>>> print(binascii.hexlify(b))
e6b00804e79e0804e7bc0804e7d50804e7e40804e6b00804e7f00804e7ff0804e80b0804e81a0804e6b00804e6b00804
>>> data = hexdump(b)
>>> print(data)
E6 B0 08 04 E7 9E 08 04 E7 BC 08 04 E7 D5 08 04
E7 E4 08 04 E6 B0 08 04 E7 F0 08 04 E7 FF 08 04
E8 0B 08 04 E8 1A 08 04 E6 B0 08 04 E6 B0 08 04
>>> # achieving the same output with binascii is overcomplicated
>>> data_lines = [binascii.hexlify(b)[i:min(i+32, len(binascii.hexlify(b)))] for i in xrange(0, len(binascii.hexlify(b)), 32)]
>>> data_lines = [' '.join(l[i:min(i+2, len(l))] for i in xrange(0, len(l), 2)).upper() for l in data_lines]
>>> print('\n'.join(data_lines))
E6 B0 08 04 E7 9E 08 04 E7 BC 08 04 E7 D5 08 04
E7 E4 08 04 E6 B0 08 04 E7 F0 08 04 E7 FF 08 04
E8 0B 08 04 E8 1A 08 04 E6 B0 08 04 E6 B0 08 04
On the other side, getting rather useless binascii output from hexdump() is quite trivial:
>>> data.replace(' ','').replace('\n','').lower()
'e6b00804e79e0804e7bc0804e7d50804e7e40804e6b00804e7f00804e7ff0804e80b0804e81a0804e6b00804e6b00804'
But more practical, for example, would be counting offset from hexdump:
>>> print(''.join('%05x: %s\n' % (i*16, l) for i, l in enumerate(hexdump(b).split('\n'))))
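The behaviour described in this message (byte-by-byte, space-separated, uppercase, 16-byte rows) can be sketched in a few lines of Python 3. This is only an illustration under assumptions: the `rowsize` parameter and the `with_offsets` helper are hypothetical names, not taken from the proposal or from any published module.

```python
def hexdump(data: bytes, rowsize: int = 16) -> str:
    """Return an uppercase, space-separated dump with `rowsize` bytes per row."""
    rows = (data[i:i + rowsize] for i in range(0, len(data), rowsize))
    return "\n".join(" ".join(f"{byte:02X}" for byte in row) for row in rows)


def with_offsets(dump: str, rowsize: int = 16) -> str:
    """Prefix each row of a dump with the hex offset of its first byte."""
    return "\n".join(
        f"{i * rowsize:05x}: {line}" for i, line in enumerate(dump.split("\n"))
    )


# First 24 bytes of the sample data from the message above.
sample = bytes.fromhex("e6b00804e79e0804e7bc0804e7d50804e7e40804e6b00804")
print(with_offsets(hexdump(sample)))
```

Stripping the spaces and newlines and lowercasing the result gives back the same digits that `binascii.hexlify` produces, which is the round trip the message describes.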
Etc.

Conclusion:
By providing better building blocks on basic level Python will become a better tool for more useful tasks.

References:
[1] http://stackoverflow.com/questions/2340319/python-3-1-1-string-to-hex
[2] http://en.wikipedia.org/wiki/Hex_dump

--
anatoly t.
On Sat, May 12, 2012 at 11:59:03AM +0300, anatoly techtonik <techtonik@gmail.com> wrote:
> Just an idea of usability fix for Python 3. hexdump module (function or bytes method is better) as simple, easy and intuitive way for dumping binary data when writing programs in Python.

Well, you know, the way to add such modules to Python is via Cheeseshop.

Oleg.
--
Oleg Broytman http://phdru.name/ phd@phdru.name
Programmers don't die, they just GOSUB without RETURN.
On Sat, May 12, 2012 at 12:15 PM, Oleg Broytman <phd@phdru.name> wrote:
> On Sat, May 12, 2012 at 11:59:03AM +0300, anatoly techtonik <techtonik@gmail.com> wrote:
> > Just an idea of usability fix for Python 3. hexdump module (function or bytes method is better) as simple, easy and intuitive way for dumping binary data when writing programs in Python.
>
> Well, you know, the way to add such modules to Python is via Cheeseshop.

Done. https://pypi.python.org/pypi/hexdump

--
anatoly t.
On Mon, 29 Apr 2013 00:48:43 +0300, anatoly techtonik <techtonik@gmail.com> wrote:
> On Sat, May 12, 2012 at 12:15 PM, Oleg Broytman <phd@phdru.name> wrote:
> > On Sat, May 12, 2012 at 11:59:03AM +0300, anatoly techtonik <techtonik@gmail.com> wrote:
> > > Just an idea of usability fix for Python 3. hexdump module (function or bytes method is better) as simple, easy and intuitive way for dumping binary data when writing programs in Python.
> >
> > Well, you know, the way to add such modules to Python is via Cheeseshop.

Actually, I think a hexdump() function in pprint would be a nice addition. I find myself wanting it when inspecting some binary protocols (e.g. pickle :-)).

Regards

Antoine.
On Mon, Apr 29, 2013 at 11:59 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
> Actually, I think a hexdump() function in pprint would be a nice addition. I find myself wanting it when inspecting some binary protocols (e.g. pickle :-)).
Python 2.7 had

>>> 'alkdjfa'.encode('hex')
'616c6b646a6661'

So why not:

>>> b'asdf'.decode('hexdump')
'61 73 64 66'
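For comparison, here is a small sketch of what Python 3 offers in place of the Python 2 `'...'.encode('hex')` shortcut. The `'hexdump'` codec suggested above is hypothetical and does not exist; the last line below only imitates its suggested output with an ordinary expression.

```python
import codecs

payload = b"asdf"

# The bytes-to-bytes hex codec is still reachable through the codecs module.
hex_bytes = codecs.encode(payload, "hex_codec")

# bytes.hex() returns the same digits as a str.
hex_text = payload.hex()

# Space-separated form, imitating the output suggested for a 'hexdump' codec.
spaced = " ".join(f"{byte:02x}" for byte in payload)

print(hex_bytes, hex_text, spaced)
```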
Yuval
On Mon, 29 Apr 2013 14:32:46 +0300, Yuval Greenfield <ubershmekel@gmail.com> wrote:
> On Mon, Apr 29, 2013 at 11:59 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
> > Actually, I think a hexdump() function in pprint would be a nice addition. I find myself wanting it when inspecting some binary protocols (e.g. pickle :-)).
>
> Python 2.7 had
>
> >>> 'alkdjfa'.encode('hex')
> '616c6b646a6661'
>
> So why not:
>
> >>> b'asdf'.decode('hexdump')
> '61 73 64 66'

Command-line hexdump has a bit more options and abilities, such as wrapping to N character width, printing an ASCII transcript beside the representation, etc. To support this flexibility, a module function is better than a codec :-)

Regards

Antoine.
On Mon, Apr 29, 2013 at 2:43 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
> Command-line hexdump has a bit more options and abilities, such as wrapping to N character width, printing an ASCII transcript beside the representation, etc.
>
> To support this flexibility, a module function is better than a codec :-)

I agree. I also agree that pprint is a good place for this. Though you could:

b'asdf'.decode('hexdump80chars')
b'asdf'.decode('hexdump40chars')
b'asdf'.decode('hexdump80chars-trans')
b'asdf'.decode('hexdump40chars-trans')

Jokes aside, this makes me wonder why decode/encode work like they do. It'd be more sensible to:

b'asdf'.decode.utf16(little_endian=True)
'asdf'.encode.utf8(bom=True)

Yuval
On 5/12/2012 4:59 AM, anatoly techtonik wrote:
> Just an idea of usability fix for Python 3. hexdump module (function or bytes method is better) as simple, easy and intuitive way for dumping binary data when writing programs in Python.
>
> hexdump(bytes) - produce human readable dump of binary data, byte-by-byte representation, separated by space, 16-byte rows
Hexdump, as you propose it, does three things. In each case, it fixes a parameter that could reasonably have a different value.

1. Splits the hex characters into groups of two characters, each representing one byte. For some uses, larger chunks would be more useful.

2. Uppercases the alpha hex characters. This is a holdover from the ancient all-uppercase world, where there was no choice. While it may make the block visually more 'even' and 'aesthetic' when it is not actually being read, it makes it harder to tell the difference between a 0-9 digit and an alpha digit. B and 8 become very similar. There is justification for binascii.hexlify using lowercase.

3. Groups the hex-represented units into lines of 16 each. This is only useful when the bytes come from memory with hex addresses, when the point is to determine the specific bytes at specific addresses. For displaying decimal-length byte strings, 25 bytes per line would be better.

What it does not do.

4. Break lines into blocks. One might want to break up multiple lines of 25 into blocks of four lines each.

5. Label the rows and columns with either hex or decimal labels.

6. Add 'dotted ascii' translation to reveal embedded ascii strings.

Output: choices are an iterator of lines, a list of lines, and a string with embedded newlines. The second and third are easily derived from the first, so I propose the first as the best choice. An iterator can also be used to write to a file.

A flexible module would be a good addition to pypi if not there already. Let's see....

hexencoder 1.0 hex encode decode and compare
This project offers 3 basic tools for manipulating binary files: 1) flexible hexdump
Home Page: http://sourceforge.net/projects/hexencoder

I did not look to see how flexible is 'flexible', but there it is.
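The three fixed parameters and the iterator-of-lines output discussed above can be sketched as a generator. Everything here is an assumption made for illustration: the name `hexlines` and the parameters `group`, `per_line` and `upper` are hypothetical and do not come from hexencoder or any other existing package.

```python
from typing import Iterator


def hexlines(data: bytes, group: int = 1, per_line: int = 16,
             upper: bool = False) -> Iterator[str]:
    """Yield one line of hex text for every `per_line` bytes of `data`.

    `group` is the number of bytes written together without a space.
    """
    if group < 1 or per_line < 1:
        raise ValueError("group and per_line must be positive")
    fmt = "{:02X}" if upper else "{:02x}"
    for start in range(0, len(data), per_line):
        row = data[start:start + per_line]
        units = (
            "".join(fmt.format(byte) for byte in row[i:i + group])
            for i in range(0, len(row), group)
        )
        yield " ".join(units)


data = bytes(range(30))
as_iterator = hexlines(data, group=5, per_line=25)   # the primary form
as_list = list(hexlines(data, group=5, per_line=25)) # derived: list of lines
as_string = "\n".join(as_list)                       # derived: single string
print(as_string)
```

The list and string forms are each one expression away from the iterator, which is the argument made above for treating the iterator as the primary output.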
> Rationale: 1. Debug. Generic binary data can't be output to console.
That depends on the console. Old IBM PCs had a character for every byte. That was meant for line-drawing, accents, and symbols, but could also be used for binary dumps. I believe there are Windows codepages that will do similar. Any bytes can be decoded as latin-1 and then printed.
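The latin-1 remark can be checked directly. The sketch below only demonstrates the round trip; whether the printed text is readable still depends on the console, since control characters pass through unchanged.

```python
raw = bytes(range(256))

# latin-1 maps each byte value 0..255 to the code point with the same number,
# so decoding never fails and encoding restores the original bytes.
text = raw.decode("latin-1")
restored = text.encode("latin-1")

# repr() escapes the control characters instead of sending them to the console.
sample = b"\xe6\xb0\x08\x04"
print(repr(sample.decode("latin-1")))
```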
> A separate helper is needed to print, log or store its value in human readable format in database. This takes time.

A custom helper gives custom output.

> 2. Usability. binascii is ugly: name is not intuitive any more, there are a lot of functions, and it is not clear how it relates to unicode.

Even if there are lots of functions, one might be added. What does 'it' refer to? hexdump or binascii? Both are about binary bytes and not about unicode characters, so neither relates to abstract unicode. Encoded unicode characters are binary data like any other, though if the encoding is utf-16 or utf-32, one would want 2 or 4 bytes dumped together, as I suggested above.

--
Terry Jan Reedy
On 12 May 2012 09:59, anatoly techtonik <techtonik@gmail.com> wrote:
> hexdump(bytes) - produce human readable dump of binary data,

+1 on this basic function, that would be very nice in the stdlib. Now I always need to go and dig up my own function from somewhere.

A certain deal of bikeshedding would be required on the function signature however, I'd go with something like:

hexdump(data, rowsize=16, offsets=True, ascii=True)

Where rowsize is the number of bytes on one row, offsets controls showing the byte number (in hex) of the first byte of each row and ascii controls showing the 7-bit printable characters in a right hand column.

This would cover my needs, I'm sure other people will come up with more must-haves.

Regards,
Floris

--
Debian GNU/Linux -- The Power of Freedom
www.debian.org | www.gnu.org | www.kernel.org
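One possible reading of the signature suggested in this message, as a minimal sketch. The signature itself is from the message; the column widths, the 8-digit offset, the '.' placeholder and the row padding are assumptions chosen for illustration.

```python
def hexdump(data: bytes, rowsize: int = 16, offsets: bool = True,
            ascii: bool = True) -> str:
    """Format `data` as rows of hex bytes with optional offset and text columns.

    Note: the `ascii` parameter name follows the message and shadows the
    built-in ascii() inside this function.
    """
    if rowsize < 1:
        raise ValueError("rowsize must be positive")
    lines = []
    for start in range(0, len(data), rowsize):
        row = data[start:start + rowsize]
        columns = []
        if offsets:
            # Byte number (in hex) of the first byte of the row.
            columns.append(f"{start:08x}:")
        hex_column = " ".join(f"{byte:02x}" for byte in row)
        if ascii:
            # Pad so the text column lines up on a short final row.
            columns.append(hex_column.ljust(rowsize * 3 - 1))
            # 0x20..0x7e are the printable 7-bit characters; others become '.'.
            columns.append(
                "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in row)
            )
        else:
            columns.append(hex_column)
        lines.append(" ".join(columns))
    return "\n".join(lines)


print(hexdump(b"Python-ideas\x00\x01\x02\xff hexdump"))
```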
Rather than bikeshedding, why not implement the common formats and flags implemented by the venerable 'od' command? It's been time-tested...

On Sat, May 12, 2012 at 9:20 AM, Floris Bruynooghe <flub@devork.be> wrote:
> On 12 May 2012 09:59, anatoly techtonik <techtonik@gmail.com> wrote:
> > hexdump(bytes) - produce human readable dump of binary data,
>
> +1 on this basic function, that would be very nice in the stdlib. Now I always need to go and dig up my own function from somewhere.
>
> A certain deal of bikeshedding would be required on the function signature however, I'd go with something like:
>
> hexdump(data, rowsize=16, offsets=True, ascii=True)
>
> Where rowsize is the number of bytes on one row, offsets controls showing the byte number (in hex) of the first byte of each row and ascii controls showing the 7-bit printable characters in a right hand column.
>
> This would cover my needs, I'm sure other people will come up with more must-haves.
>
> Regards,
> Floris
>
> --
> Debian GNU/Linux -- The Power of Freedom
> www.debian.org | www.gnu.org | www.kernel.org

--
--Guido van Rossum (python.org/~guido)
participants (8)
- anatoly techtonik
- Antoine Pitrou
- Eric V. Smith
- Floris Bruynooghe
- Guido van Rossum
- Oleg Broytman
- Terry Reedy
- Yuval Greenfield