On 5/12/2012 4:59 AM, anatoly techtonik wrote:
Just an idea of usability fix for Python 3. hexdump module (function or bytes method is better) as simple, easy and intuitive way for dumping binary data when writing programs in Python.
hexdump(bytes) - produce human readable dump of binary data, byte-by-byte representation, separated by space, 16-byte rows
Hexdump, as you propose it, does three things. In each case, it fixes a parameter that could reasonably have a different value.
1. Splits the hex characters into groups of two characters, each representing one byte. For some uses, larger chunks would be more useful.
2. Uppercases the alpha hex characters. This is a holdover from the ancient all-uppercase world, where there was no choice. While it may make the block visually more 'even' and 'aesthetic' when not actually being read, it makes it harder to tell the difference between a 0-9 digit and an alpha digit. B and 8 become very similar. There is justification for binascii.hexlify using lowercase.
3. Groups the hex-represented units into lines of 16 each. This is only useful when the bytes come from memory with hex addresses, when the point is to determine the specific bytes at specific addresses. For displaying decimal-length byte strings, 25 bytes per line would be better.
What it does not do.
4. Break lines into blocks. One might want to break up multiple lines of 25 into blocks of four lines each.
5. Label the rows and columns with either hex or decimal labels.
6. Add 'dotted ascii' translation to reveal embedded ascii strings.
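To make the six points concrete, here is a rough sketch of a dump function in which each fixed choice becomes a parameter. The name dump_lines and all of its parameter names are made up for illustration; nothing like it exists in the stdlib. It handles row labels only, not column labels, and it does not pad a short last row.

```python
import binascii

def dump_lines(data, unit=1, per_line=16, upper=False,
               block=0, offsets=None, ascii_col=False):
    """Yield one text line per row of `data` (a bytes-like object).

    unit      -- bytes per hex group (point 1)
    upper     -- uppercase the hex digits (point 2)
    per_line  -- bytes per row (point 3)
    block     -- rows per block; 0 means no blank separator lines (point 4)
    offsets   -- None, 'hex' or 'dec' row labels (point 5)
    ascii_col -- append a dotted-ascii column (point 6)
    """
    step = 2 * unit  # two hex characters per byte
    for row, start in enumerate(range(0, len(data), per_line)):
        chunk = bytes(data[start:start + per_line])
        digits = binascii.hexlify(chunk).decode('ascii')  # lowercase output
        if upper:
            digits = digits.upper()
        line = ' '.join(digits[i:i + step]
                        for i in range(0, len(digits), step))
        if offsets == 'hex':
            line = '%08x  %s' % (start, line)
        elif offsets == 'dec':
            line = '%8d  %s' % (start, line)
        if ascii_col:
            # Printable ascii is shown as itself, everything else as a dot.
            text = ''.join(chr(b) if 32 <= b < 127 else '.' for b in chunk)
            line = '%s  |%s|' % (line, text)
        if block and row and row % block == 0:
            yield ''  # blank line between blocks
        yield line
```

For example, dump_lines(data, per_line=25, block=4, offsets='dec') would give the decimal-oriented layout described in points 3 to 5.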
Output: choices are an iterator of lines, a list of lines, and a string with embedded newlines. The second and third are easily derived from the first, so I propose the first as the best choice. An iterator can also be used to write to a file.
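A small illustration of why the iterator is the most general choice. The generator hex_rows is a hypothetical stand-in for whatever the real function would be, and io.StringIO stands in for an open text file.

```python
import io

def hex_rows(data, per_line=16):
    """Yield one line of space-separated hex per row, without a newline."""
    for start in range(0, len(data), per_line):
        yield ' '.join('%02x' % b for b in data[start:start + per_line])

data = bytes(range(20))

as_list = list(hex_rows(data))          # list of lines
as_string = '\n'.join(hex_rows(data))   # one string with embedded newlines

out = io.StringIO()                     # stands in for an open text file
out.writelines(line + '\n' for line in hex_rows(data))
```

Going the other way, from a finished string back to lazily produced lines, would mean building the whole dump in memory first.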
A flexible module would be a good addition to pypi if not there already. Let's see....
hexencoder 1.0 hex encode decode and compare This project offers 3 basic tools for manipulating binary files: 1) flexible hexdump Home Page: http://sourceforge.net/projects/hexencoder
I did not look to see how flexible is 'flexible', but there it is.
- Debug. Generic binary data can't be output to console.
That depends on the console. Old IBM PCs had a character for every byte. That was meant for line-drawing, accents, and symbols, but could also be used for binary dumps. I believe there are Windows codepages that will do similar. Any bytes can be decoded as latin-1 and then printed.
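The latin-1 point can be checked directly: every byte value 0-255 maps to the code point with the same number, so decoding never fails and encoding gets the original bytes back. What the console then shows for control characters and the high range is still up to the console.

```python
data = bytes(range(256))          # every possible byte value

text = data.decode('latin-1')     # never raises: byte n becomes chr(n)
round_trip = text.encode('latin-1')

# ascii() gives an escaped form that is safe to print on any console.
escaped = ascii(text[:4])
```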
A separate helper is needed to print, log or store its value in human readable format in database. This takes time.
A custom helper gives custom output.
- Usability. binascii is ugly: name is not intuitive any more, there are a lot
of functions, and it is not clear how it relates to unicode.
Even if there are lots of functions, one might be added. What does 'it' refer to? hexdump or binascii? Both are about binary bytes and not about unicode characters, so neither relates to abstract unicode. Encoded unicode characters are binary data like any other, though if the encoding is utf-16 or utf-32, one would want 2 or 4 bytes dumped together, as I suggested above.
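A quick sketch of the 2- and 4-byte grouping for encoded text. grouped_hex is a made-up helper built on binascii.hexlify; the big-endian codecs are used so that no BOM is emitted and each group reads as the code unit value.

```python
import binascii

def grouped_hex(data, unit):
    """Return the hex of `data` with `unit` bytes per space-separated group."""
    digits = binascii.hexlify(data).decode('ascii')
    step = 2 * unit  # two hex characters per byte
    return ' '.join(digits[i:i + step] for i in range(0, len(digits), step))

s = 'h\u00e9\u20ac'  # 'h', e-acute, euro sign

utf8_dump = grouped_hex(s.encode('utf-8'), 1)       # byte by byte
utf16_dump = grouped_hex(s.encode('utf-16-be'), 2)  # one group per 16-bit unit
utf32_dump = grouped_hex(s.encode('utf-32-be'), 4)  # one group per code point
```

With utf-8 the groups do not line up with characters at all, which is the sense in which the dump is about bytes and not about unicode.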