Problems with hex-conversion functions

Sun Sep 6 00:23:45 EDT 2009

On Fri, 04 Sep 2009 15:28:10 -0700, Arnon Yaari wrote:

> Hello everyone.
> Perhaps I'm missing something, but I see several problems with the two
> hex-conversion function pairs that Python offers: 1. binascii.hexlify
> and binascii.unhexlify 2. bytes.fromhex and bytes.hex
> 
> Problem #1:
> bytes.hex is not implemented, although it was specified in PEP 358. This
> means there is no symmetrical function to accompany bytes.fromhex.

That would probably be an oversight. Patches are welcome.

> Problem #2:
> Both pairs perform the same function, although The Zen Of Python
> suggests that
> "There should be one-- and preferably only one --obvious way to do it."

That is not a prohibition against multiple ways of doing something. It is 
a recommendation that there should be one obvious way (as opposed to no 
way at all, or thirty five non-obvious ways) to do things. Preferably 
only one obvious way, but it's not a prohibition against there being an 
obvious way and a non-obvious way.

> I do not understand why PEP 358 specified the bytes function pair
> although it mentioned the binascii pair...

Because there are three obvious ways of constructing a sequence of bytes:

(1) from a sequence of characters, with an optional encoding;

(2) from a sequence of pairs of hex digits, such as from a hex dump of a 
file;

(3) from a sequence of integers.

(1) and (2) are difficult to distinguish -- should "ab45" be interpreted 
as four characters, "a" "b" "4" and "5", or as two pairs of hex digits 
"ab" and "45"? The obvious solution is to have two different bytes 
constructors.
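
To make the ambiguity concrete, here is a minimal sketch, assuming a Python 3 interpreter. The variable names are only for illustration; the three calls correspond to cases (1), (2) and (3) above.

```python
# The same four-character text, interpreted two different ways.
text = "ab45"

# (1) From characters with an encoding: four bytes, one per character.
as_chars = bytes(text, "ascii")

# (2) From pairs of hex digits: two bytes, 0xab and 0x45.
as_hex = bytes.fromhex(text)

# (3) From a sequence of integers: the same two bytes as (2).
as_ints = bytes([0xAB, 0x45])

print(len(as_chars), len(as_hex))   # 4 2
print(as_ints == as_hex)            # True
```

Because the input to (1) and (2) is the same string, only the choice of constructor tells Python which interpretation is meant.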

> Problem #3:
> bytes.fromhex may receive spaces in the input string, although
> binascii.unhexlify may not.
> I see no good reason for these two functions to have different features.

There are clearly differences of opinion about how strict to be when 
accepting input strings. Personally, I can see arguments for both. Given 
that these are two different functions, there's no requirement that they 
do exactly the same thing, so I wouldn't even call it a wart. It's just a 
difference.
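
A minimal sketch of that difference, assuming a Python 3 interpreter. The helper unhexlify_lenient is hypothetical, not part of binascii; it only shows how a caller could get the lenient behaviour without changing either function.

```python
import binascii

spaced = "ab 45"

# bytes.fromhex skips the space between the two byte pairs.
lenient = bytes.fromhex(spaced)

# binascii.unhexlify is strict and rejects the same input.
try:
    strict = binascii.unhexlify(spaced)
except binascii.Error as exc:
    strict = None
    print("unhexlify rejected the input:", exc)


def unhexlify_lenient(hex_text):
    """Hypothetical helper: strip all whitespace, then call unhexlify.

    Note this removes whitespace anywhere in the string, which is an
    assumption of this sketch and not necessarily what fromhex allows.
    """
    return binascii.unhexlify("".join(hex_text.split()))


print(lenient == unhexlify_lenient(spaced))   # True
```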

> Problem #4:
> binascii.unhexlify may receive both input types: strings or bytes,
> whereas bytes.fromhex raises an exception when given a bytes parameter.
> Again there is no reason for these functions to be different.

There's no reason for them to be the same. unhexlify() is designed to 
take either strings or bytes, mostly for historical reasons: in Python 
1.x and 2.x, it is normal to use byte-strings (called 'strings') as the 
standard string type, and character-strings ('unicode') are relatively 
rare, so unhexlify needs to accept bytes. In Python 3.x, the use of bytes 
as character strings is discouraged, hence passing hex digits as bytes to 
bytes.fromhex() is illegal.
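
A minimal sketch of how a caller might cope with the difference, assuming a Python 3 release in which unhexlify accepts ASCII-only str (3.3 or later, as far as I know). The helper fromhex_any is hypothetical; it decodes bytes input itself rather than relying on what bytes.fromhex accepts, since that may vary between versions.

```python
import binascii

# unhexlify takes hex digits as either str or bytes.
from_str = binascii.unhexlify("ab45")
from_bytes = binascii.unhexlify(b"ab45")
print(from_str == from_bytes)   # True


def fromhex_any(hex_digits):
    """Hypothetical helper: accept str or bytes, always go through str."""
    if isinstance(hex_digits, (bytes, bytearray)):
        # Hex digits are plain ASCII, so decoding is unambiguous.
        hex_digits = hex_digits.decode("ascii")
    return bytes.fromhex(hex_digits)


print(fromhex_any(b"ab45") == fromhex_any("ab45"))   # True
```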

> Problem #5:
> binascii.hexlify returns a bytes type - although ideally, converting to
> hex should always return string types and converting from hex should
> always return bytes.

This is due to historical reasons -- binascii comes from Python 1.x when 
bytes were the normal string type. Modifying binascii to return strings 
in Python 3.x (but not 2.6 or 2.7) would probably be a good idea. Patches 
are welcome.
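
In the meantime, a minimal sketch of the usual workaround, assuming a Python 3 interpreter: decode the result of hexlify explicitly. The variable names are only for illustration.

```python
import binascii

raw = b"\xab\x45"

# In Python 3, hexlify returns the hex digits as bytes, not str.
hex_bytes = binascii.hexlify(raw)

# Hex digits are ASCII, so decoding gives the text form directly.
hex_text = hex_bytes.decode("ascii")

print(type(hex_bytes).__name__, type(hex_text).__name__)   # bytes str
print(bytes.fromhex(hex_text) == raw)                      # True
```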

[...]
> To fix these issues, three changes should be applied: 1. Deprecate
> bytes.fromhex. 

-1 on that. I disagree strongly: bytes are built-ins, and constructing 
bytes from a sequence of hex digits is such a natural and important 
function that needing to import a module to do it is silly and wasteful.

[...]
> 2. In order to keep the functionality that bytes.fromhex has over
> unhexlify,
>    the latter function should be able to handle spaces in its input
> (fix #3)

0 on that. I don't care either way.

> 3. binascii.hexlify should return string as its return type (fix #5)

+1 for the Python 3.x series, -1 for Python 2.6 and 2.7.

-- 
Steven