[Python-ideas] Add an option for delimiters in bytes.hex()
Steven D'Aprano
steve at pearwood.info
Tue May 2 12:28:12 EDT 2017
On Tue, May 02, 2017 at 11:45:35PM +1000, Nick Coghlan wrote:
> Attempting to align the terminology with existing string methods and
> other stdlib APIs:
[...]
> 1. we don't have any current APIs or documentation that use "chunk" in
> combination with any kind of delimiter
> 2. we don't have any current APIs or documentation that use "chunk" as
> a verb - they all use it as a noun
English has a long and glorious tradition of verbing nouns, and nouning
verbs. Group can mean the action of putting things into a group, join
likewise refers to both the action of attaching two things and the seam
or joint where they have been joined. Likewise for chunking:
https://duckduckgo.com/html/?q=chunking
"Chunk" has been used as a verb since at least 1890 (albeit with a different
meaning). None of my dictionaries give a date for the use of chunking to
mean dividing something up into chunks, so that could be quite recent,
but it's well-established in education (chunking as a technique for
doing long division), psychology, linguistics and more. I remember using
"chunking" as a verb to describe Hyperscript's text handling back in the
mid 1980s, e.g. "word 2 of line 6 of text".
The nltk library handles chunk as both a noun and verb in a similar
sense:
http://www.nltk.org/howto/chunk.html
> So if we went with this approach, then Carl Smith's suggestion of
> "str.delimit()" likely makes sense.
The problem with "delimit" is that in many contexts it refers to
marking both the start and end boundaries, e.g. people often refer to
string delimiters '...' and list delimiters [...]. That doesn't apply
here, where we're adding separators between chunks/groups.
The term delimiter can be used in various ways, and some of them do not
match the behaviour we want here:
http://stackoverflow.com/questions/9118769/when-to-use-the-terms-delimiter-terminator-and-separator
In this case, we are not adding delimiters, we're adding separators.
We're chunking (or grouping) characters by counting them, then
separating the groups. The test here is: what happens if the string is
shorter than the group size?
"xyz".chunk(5, '*')
If we're delimiting the boundaries of the group, then I expect that we
should get "*xyz*", but if we're separating groups, I expect that we
should get "xyz" unchanged.
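To make the two readings concrete, here is a minimal sketch. Note that
separate() and delimit() are hypothetical helper names used only for
illustration; neither is an existing str method, and neither is the
signature under discussion.

```python
def _groups(s, size):
    """Split s into consecutive slices of at most `size` characters."""
    return [s[i:i + size] for i in range(0, len(s), size)]


def separate(s, size, sep):
    """Hypothetical: put sep *between* groups only."""
    return sep.join(_groups(s, size))


def delimit(s, size, sep):
    """Hypothetical: mark the start and end boundary of every group."""
    groups = _groups(s, size)
    if not groups:
        return s  # nothing to delimit in an empty string
    return sep + sep.join(groups) + sep


print(separate("xyz", 5, "*"))       # xyz
print(delimit("xyz", 5, "*"))        # *xyz*
print(separate("abcdefgh", 3, "*"))  # abc*def*gh
print(delimit("abcdefgh", 3, "*"))   # *abc*def*gh*
```

The short-string case is the one that distinguishes them: only the
separator reading leaves "xyz" unchanged.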
> However, the other question worth asking is whether we might want a
> "string slice splitting" operation rather than a string delimiting
> option: once you have the slices, then combining them again with
> str.join is straightforward, but extracting the slices in the first
> place is currently a little fiddly (especially for the reversed case):
Let me think about that :-)
--
Steve