[Python-ideas] Add an option for delimiters in bytes.hex()

Erik python at lucidity.plus.com
Tue May 2 18:39:48 EDT 2017


On 02/05/17 12:31, Steven D'Aprano wrote:
> I disagree with this approach. There's nothing special about bytes.hex()
> here, perhaps we want to format the output of hex() or bin() or oct(),
> or for that matter "%x" and any of the other string templates?
>
> In fact, this is a string operation that could apply to any character
> string, including decimal digits.
>
> Rather than duplicate the API and logic everywhere, I suggest we add a
> new string method. My suggestion is str.chunk(size, delimiter=' ') and
> str.rchunk() with the same arguments:
>
> "1234ABCDEF".chunk(4)
> => returns "1234 ABCD EF"
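
For reference, the proposed methods can be approximated today with plain
functions. This is only a sketch: chunk() and rchunk() are written here as
free functions rather than str methods, and the rchunk() behaviour (grouping
from the right, so any short group ends up at the front) is my assumption,
since the quoted message only shows chunk().

```python
def chunk(s, size, delimiter=" "):
    """Group s into pieces of `size` characters, counting from the left."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return delimiter.join(s[i:i + size] for i in range(0, len(s), size))


def rchunk(s, size, delimiter=" "):
    """Group s into pieces of `size` characters, counting from the right.

    Assumed semantics: the short group, if any, is the first one.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    head = len(s) % size  # length of the short leading group, if any
    pieces = [s[:head]] if head else []
    pieces.extend(s[i:i + size] for i in range(head, len(s), size))
    return delimiter.join(pieces)


print(chunk("1234ABCDEF", 4))                       # 1234 ABCD EF
print(rchunk("1234ABCDEF", 4))                      # 12 34AB CDEF
print(chunk(b"\xde\xad\xbe\xef".hex(), 2, ":"))     # de:ad:be:ef
```

The last line shows the original bytes.hex() use case handled by the same
string operation, which is the point being made in the quoted text.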

FWIW, I implemented a version of something similar as a fixed-length 
"chunk" method in itertoolsmodule.c (it was similar to izip_longest - it 
had a "fill" keyword to pad the final chunk). It was ~100 LOC including 
the structure definitions. The chunk method was an iterator (so it 
returned a sequence of "chunks" as defined by the API).
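
A rough pure-Python equivalent of that iterator, built only from existing
itertools functions, might look like the following. This is not the C code,
just a sketch of the behaviour described above; the name iter_chunks and the
sentinel used to detect a missing "fill" argument are assumptions made for
the example.

```python
from itertools import islice

_MISSING = object()  # sentinel so that fill=None is still a valid fill value


def iter_chunks(iterable, size, fill=_MISSING):
    """Yield tuples of `size` items from `iterable`.

    The final tuple is padded with `fill` if one is given; otherwise it
    is simply shorter than the others.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(iterable)
    while True:
        group = tuple(islice(it, size))
        if not group:
            return
        if len(group) < size and fill is not _MISSING:
            group += (fill,) * (size - len(group))
        yield group


# Five bytes of sample data grouped into 2-byte frames, padded with 0.
print(list(iter_chunks(b"\x01\x02\x03\x04\x05", 2, fill=0)))
# [(1, 2), (3, 4), (5, 0)]
```

Without the fill argument the last chunk is left short, e.g.
list(iter_chunks(range(5), 2)) gives [(0, 1), (2, 3), (4,)].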

Then I read that "itertools" should consist of primitives only and that
we should defer to "more-itertools" for anything that is of a higher
level (which this is - it can be done in terms of itertools functions).
So I didn't propose it, although the processing of my WAV files (in 
which the sample data are groups of bytes - frames - of a fixed length) 
was significantly faster with it :(

I also looked at implementing itertools.chunk as a function that would 
make use of a "__chunk__" method on the source object if it existed 
(which allowed a class to support an even more efficient version of 
chunking - things like range() etc).
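
To illustrate the idea, here is a sketch of how such a dispatch could work in
pure Python. No "__chunk__" protocol exists in Python; the method name, the
Frames class and the choice to let the hook yield slices rather than tuples
are all hypothetical and only there to show the shape of it.

```python
from itertools import islice


def _generic_chunk(source, size):
    """Fallback: works for any iterable, yields tuples."""
    it = iter(source)
    while True:
        group = tuple(islice(it, size))
        if not group:
            return
        yield group


def chunk(source, size):
    """Use the object's own __chunk__ (hypothetical hook) if it has one."""
    hook = getattr(type(source), "__chunk__", None)
    if hook is not None:
        return hook(source, size)
    return _generic_chunk(source, size)


class Frames:
    """Hypothetical wrapper around raw sample data held in a bytes object."""

    def __init__(self, data):
        self.data = data

    def __iter__(self):
        return iter(self.data)

    def __chunk__(self, size):
        # Slicing a memoryview avoids pulling the bytes out one at a time.
        view = memoryview(self.data)
        for i in range(0, len(view), size):
            yield bytes(view[i:i + size])


print(list(chunk(Frames(b"abcdef"), 4)))  # [b'abcd', b'ef']
print(list(chunk([1, 2, 3], 2)))          # [(1, 2), (3,)]
```

The hook is looked up on the type rather than the instance, which is how
other special methods are found.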

> I don't see any advantage to adding this to bytes.hex(), hex(), oct(),
> bin(), and I really don't think it is helpful to be grouping the
> characters by the number of bits. It's a string formatting operation, not
> a bit operation.

Why do you want to limit it to strings? Isn't something like this
potentially useful for all sequences (where the result is a tuple of
objects of the same type as the source sequence - be that strings or
lists or lazy ranges or whatever)? Why aren't the chunks returned via
an iterator?

E.

