On Tue, May 02, 2017 at 11:45:35PM +1000, Nick Coghlan wrote:
Attempting to align the terminology with existing string methods and other stdlib APIs:

[...]

1. we don't have any current APIs or documentation that use "chunk" in combination with any kind of delimiter
2. we don't have any current APIs or documentation that use "chunk" as a verb - they all use it as a noun
English has a long and glorious tradition of verbing nouns, and nouning verbs. "Group" can mean the action of putting things into a group; "join" likewise refers both to the action of attaching two things and to the seam or joint where they have been joined. Likewise for chunking: https://duckduckgo.com/html/?q=chunking "Chunk" has been used as a verb since at least 1890 (albeit with a different meaning). None of my dictionaries give a date for the use of "chunking" to mean dividing something up into chunks, so that could be quite recent, but it's well established in education (chunking as a technique for doing long division), psychology, linguistics and more. I remember using "chunking" as a verb to describe Hyperscript's text handling back in the mid 1980s, e.g. "word 2 of line 6 of text". The nltk library treats chunk as both a noun and a verb in a similar sense: http://www.nltk.org/howto/chunk.html
So if we went with this approach, then Carl Smith's suggestion of "str.delimit()" likely makes sense.
The problem with "delimit" is that in many contexts it refers to marking both the start and end boundaries, e.g. people often refer to string delimiters '...' and list delimiters [...]. That doesn't apply here, where we're adding separators between chunks/groups. The term delimiter can be used in various ways, and some of them do not match the behaviour we want here: http://stackoverflow.com/questions/9118769/when-to-use-the-terms-delimiter-t... In this case, we are not adding delimiters, we're adding separators. We're chunking (or grouping) characters by counting them, then separating the groups. The test here is what happens if the string is shorter than the group size? "xyz".chunk(5, '*') If we're delimiting the boundaries of the group, then I expect that we should get "*xyz*", but if we're separating groups, I expect that we should get "xyz" unchanged.
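To make the separator-versus-delimiter distinction concrete, here is a rough sketch. Both `chunk` and `delimit` are hypothetical functions written only for this comparison. Neither is an existing or proposed str method, and the argument order simply copies the "xyz".chunk(5, '*') example above. Groups are counted from the left.

```python
def chunk(s: str, size: int, sep: str) -> str:
    """Hypothetical: split s into groups of `size` characters (counted from
    the left) and put `sep` only *between* groups, never at the ends."""
    if size <= 0:
        raise ValueError("size must be positive")
    groups = [s[i:i + size] for i in range(0, len(s), size)]
    return sep.join(groups)


def delimit(s: str, size: int, delim: str) -> str:
    """Hypothetical contrast: mark *both* boundaries of every group."""
    if size <= 0:
        raise ValueError("size must be positive")
    groups = [s[i:i + size] for i in range(0, len(s), size)]
    return "".join(delim + g + delim for g in groups)


print(chunk("xyz", 5, "*"))         # xyz   (a single group, so no separator)
print(delimit("xyz", 5, "*"))       # *xyz* (boundaries are marked anyway)
print(chunk("abcdefgh", 3, "-"))    # abc-def-gh
```

With separator semantics, a string shorter than the group size comes back unchanged. That is the behaviour I'd expect from this operation.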
However, the other question worth asking is whether we might want a "string slice splitting" operation rather than a string delimiting option: once you have the slices, then combining them again with str.join is straightforward, but extracting the slices in the first place is currently a little fiddly (especially for the reversed case):
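A "slice splitting" operation along those lines could be sketched as below. `split_slices` and its `from_right` flag are hypothetical names used only for illustration. The reversed case groups from the right, so the leftmost slice holds any remainder. Once you have the slices, str.join puts them back together.

```python
def split_slices(s: str, size: int, from_right: bool = False) -> list[str]:
    """Hypothetical: return consecutive slices of `size` characters.

    With from_right=True, counting starts at the end of the string,
    so only the first slice may be shorter than `size`.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not from_right:
        return [s[i:i + size] for i in range(0, len(s), size)]
    # Length of the short leading slice (0 if len(s) divides evenly).
    first = len(s) % size
    head = [s[:first]] if first else []
    return head + [s[i:i + size] for i in range(first, len(s), size)]


print(split_slices("abcdefg", 3))                            # ['abc', 'def', 'g']
print(",".join(split_slices("1234567", 3, from_right=True)))  # 1,234,567
```

For integers specifically, format(1234567, ",") already produces "1,234,567". A general slicing helper would cover arbitrary strings and group sizes.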
Let me think about that :-) -- Steve