[Python-ideas] Add an option for delimiters in bytes.hex()
Erik
python at lucidity.plus.com
Tue May 2 18:39:48 EDT 2017
On 02/05/17 12:31, Steven D'Aprano wrote:
> I disagree with this approach. There's nothing special about bytes.hex()
> here, perhaps we want to format the output of hex() or bin() or oct(),
> or for that matter "%x" and any of the other string templates?
>
> In fact, this is a string operation that could apply to any character
> string, including decimal digits.
>
> Rather than duplicate the API and logic everywhere, I suggest we add a
> new string method. My suggestion is str.chunk(size, delimiter=' ') and
> str.rchunk() with the same arguments:
>
> "1234ABCDEF".chunk(4)
> => returns "1234 ABCD EF"
FWIW, I implemented a version of something similar as a fixed-length
"chunk" method in itertoolsmodule.c (it was similar to izip_longest - it
had a "fill" keyword to pad the final chunk). It was ~100 LOC including
the structure definitions. The chunk method was an iterator (so it
returned a sequence of "chunks" as defined by the API).
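
For illustration only, here is a rough pure-Python sketch of a fixed-length chunking iterator with a "fill" keyword. It is not the C code described above: the name chunk(), its signature and the use of tuples for each chunk are assumptions, and it simply leans on itertools.zip_longest.

```python
from itertools import zip_longest


def chunk(iterable, size, fill=None):
    """Return an iterator of `size`-item tuples, padding the last with `fill`.

    Hypothetical signature, chosen only to illustrate the idea.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    # The same iterator object repeated `size` times: zip_longest takes one
    # item from each "copy" in turn, so every tuple holds consecutive items.
    iterators = [iter(iterable)] * size
    return zip_longest(*iterators, fillvalue=fill)


# Five bytes grouped into two-byte frames; the last frame is padded with 0.
frames = list(chunk(b"\x01\x02\x03\x04\x05", 2, fill=0))
```

Iterating over a bytes object yields ints, so each frame here is a tuple of ints rather than a bytes slice.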
Then I read that "itertools" should consist of primitives only and that
we should defer to "more-itertools" for anything that is of a higher
level (which this is - it can be done in terms of itertools functions).
So I didn't propose it, although the processing of my WAV files (in
which the sample data are groups of bytes - frames - of a fixed length)
was significantly faster with it :(
I also looked at implementing itertools.chunk as a function that would
make use of a "__chunk__" method on the source object if it existed
(which allowed a class to support an even more efficient version of
chunking - things like range() etc).
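
A minimal sketch of that dispatch idea, assuming a hypothetical "__chunk__" special method (no such protocol exists in Python today). The Frames class and the generic fallback are made up for the example.

```python
from itertools import islice


def _generic_chunk(source, size):
    """Fallback: build each chunk as a tuple, one item at a time."""
    it = iter(source)
    while True:
        piece = tuple(islice(it, size))
        if not piece:
            return
        yield piece


def chunk(source, size):
    """Use the source's own __chunk__ (hypothetical) if its type defines one."""
    if size < 1:
        raise ValueError("size must be at least 1")
    # Look the hook up on the type, as Python does for other special methods.
    hook = getattr(type(source), "__chunk__", None)
    if hook is not None:
        return hook(source, size)
    return _generic_chunk(source, size)


class Frames:
    """Hypothetical container of sample data held in a single bytes object."""

    def __init__(self, data):
        self.data = data

    def __iter__(self):
        return iter(self.data)

    def __chunk__(self, size):
        # Slicing the underlying bytes avoids building a tuple per chunk.
        return (self.data[i:i + size] for i in range(0, len(self.data), size))
```

With this sketch, a plain list falls back to tuples, while Frames hands out bytes slices through its own method.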
> I don't see any advantage to adding this to bytes.hex(), hex(), oct(),
> bin(), and I really don't think it is helpful to be grouping the
> characters by the number of bits. It's a string formatting operation, not
> a bit operation.
Why do you want to limit it to strings? Isn't something like this
potentially useful for all sequences, where the result is a tuple of
objects of the same type as the source sequence (be that strings,
lists, lazy ranges or whatever)? And why aren't the chunks returned via
an iterator?
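
To make the question concrete, here is a small sketch of sequence-generic chunking done by slicing, so each chunk has the same type as its source and the chunks arrive lazily. The name chunks() is an assumption, and the source must support len() and slicing.

```python
def chunks(seq, size):
    """Lazily yield consecutive slices of `seq`, each at most `size` long.

    Slicing preserves the source type: str gives str, list gives list,
    range gives range.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


# The string case from the quoted proposal becomes a join over the chunks.
grouped = " ".join(chunks("1234ABCDEF", 4))

# The same function works unchanged on lists and lazy ranges.
list_chunks = list(chunks([1, 2, 3, 4, 5], 2))
range_chunks = list(chunks(range(10), 4))
```

Because chunks() is a generator, the size check only runs once the first chunk is requested.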
E.