[Python-ideas] Add list.join() please

Thu Jan 31 18:43:53 EST 2019

On 1/31/2019 12:51 PM, Chris Barker via Python-ideas wrote:

> I do a lot of numerical programming, and used to use MATLAB and now 
> numpy a lot. So I am very used to "vectorization" -- i.e. having 
> operations that work on a whole collection of items at once.
> 
> Example:
> 
> a_numpy_array * 5
> 
> multiplies every item in the array by 5
> 
> In pure Python, you would do something like:
> 
> [ i * 5 for i in a_regular_list]
> 
> You can imagine that for more complex expressions the "vectorized" 
> approach can make for much clearer and easier to parse code. Also much 
> faster, which is what is usually talked about, but I think the 
> readability is the bigger deal.
> 
> So what does this have to do with the topic at hand?
> 
> I know that when I'm used to working with numpy and then need to do some 
> string processing or some such, I find myself missing this 
> "vectorization" -- if I want to do the same operation on a whole bunch 
> of strings, why do I need to write a loop or comprehension or map? That is:
> 
> [s.lower() for s in a_list_of_strings]
> 
> rather than:
> 
> a_list_of_strings.lower()
> 
> (NOTE: I prefer comprehension syntax to map, but map would work fine 
> here, too)
> 
> It strikes me that that is the direction some folks want to go.

To me, thinking of strings as being in lists is Python 1 thinking. 
Interactive applications work with *streams* of input strings.
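
A minimal sketch of the stream idea, assuming io.StringIO as a stand-in for a real input source (the sample text and variable names are made up for illustration). A generator expression applies string methods to each line as it is read, without building a list first:

```python
import io

# io.StringIO stands in for an interactive or file-like input source
# (an assumption for illustration only).
stream = io.StringIO("Spam\nEGGS\nHam\n")

# Lazily strip and lowercase each line; nothing is read from the
# stream until iteration starts.
lowered = (line.strip().lower() for line in stream)

first = next(lowered)   # consumes only the first line
rest = list(lowered)    # consumes the remaining lines
```

The same expression works unchanged on sys.stdin or an open file, since both yield lines one at a time.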

> If so, then I think the way to do it is not to add a bunch of stuff to 
> Python's str or sequence types, but rather to make a new library that 
> provides quick and easy manipulation of sequences of strings.  -- kind 
> of a stringpy -- analogous to numpy.
> 
> At the core of numpy is the ndarray: "a multidimensional, homogeneous
> array of fixed-size items"
> 
> a strarray could be simpler -- I don't see any reason for more than 1-D, 
> nor more than one datatype. But it could be a "vector" of strings that 
> was guaranteed to be all strings, and provide operations that acted on 
> the entire collection in one fell swoop.

I think an iterator (stream) of strings would be better.  Here is a start.

class StringIt:
     """Iterator of strings.

     A StringIt wraps an iterator of strings to provide methods that
     apply the corresponding string method to each string in the
     iterator.  StringIt methods do not enforce the positional-only
     restrictions of some string methods.  The join method reverses the
     order of the arguments.

     Except for join(joiner), which returns a single string,
     the return values are iterators of the return value of the string
     methods.  An iterator of strings is returned as a StringIt so that
     further methods can be applied.
     """

     def __init__(self, objects, nogen=False):
         """Return a wrapped iterator of strings.

         Objects must be an iterator of strings or an iterable of objects
         with good __str__ methods.  All builtin objects have a good
         __str__ method and all non-buggy user-defined objects should.

         When *objects* is an iterator of strings, passing nogen=True
         avoids a layer of wrapping by claiming that str calls are not
         needed.  StringIt methods that return a StringIt do this.  An
         iterable of strings, such as ['a', 'b', 'c'], can be turned into
         an iterator with iter(iterable).  Users who pass nogen=True do
         so at their own risk because checking the claim would exhaust
         the iterator.
         """
         if not hasattr(objects, '__iter__'):
             raise ValueError('objects is not an iterable')
         if nogen and not hasattr(objects, '__next__'):
             raise ValueError('objects is not an iterator')
         if nogen:
             self.it = objects
         else:
             self.it = (str(ob) for ob in objects)

     def __iter__(self):
         return self.it.__iter__()

     def __next__(self):
         return self.it.__next__()

     def upper(self):
         return StringIt((s.upper() for s in self.it), nogen=True)

     def count(self, sub, start=0, end=None):
         # str.count accepts end=None; "end or len(s)" would treat end=0
         # as "to the end of the string".
         return (s.count(sub, start, end) for s in self.it)

     def join(self, joiner):
         return joiner.join(self.it)

for si, out in (
     (StringIt(iter(('a', 'b', 'c')), nogen=True), ['a', 'b', 'c']),
     (StringIt((1, 2, 3)), ['1', '2', '3']),
     (StringIt((1, 2, 3)).count('1'), [1, 0, 0]),
     (StringIt(('a', 'b', 'c')).upper(), ['A', 'B', 'C']),
     ):
     assert list(si) == out
assert StringIt(('a', 'b', 'c')).upper().join('-') == 'A-B-C'
# asserts all pass
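
The hand-written upper and count methods follow one pattern, so the remaining str methods could in principle be forwarded generically. The following is only a sketch under stated assumptions: StrStream and STR_RETURNING are hypothetical names, not part of the class above, and the set of str-returning methods is deliberately incomplete.

```python
# Hypothetical, incomplete set of str methods known to return a str.
STR_RETURNING = {'upper', 'lower', 'strip', 'title', 'capitalize'}


class StrStream:
    """Hypothetical sketch: forward any str method to each item, lazily."""

    def __init__(self, strings):
        self.it = iter(strings)

    def __iter__(self):
        return self.it

    def __getattr__(self, name):
        # Called only when normal lookup fails, e.g. for 'lower'.
        # getattr raises AttributeError for names str does not have.
        method = getattr(str, name)

        def apply(*args, **kwargs):
            results = (method(s, *args, **kwargs) for s in self.it)
            # Re-wrap only when the results are known to be strings,
            # so that further string methods can be chained.
            return StrStream(results) if name in STR_RETURNING else results

        return apply


lowered = list(StrStream(['Ab', 'cD']).lower())
pieces = list(StrStream(['a-b', 'c']).split('-'))
chained = list(StrStream([' a ', 'b ']).strip().upper())
```

The trade-off is that generic forwarding loses per-method docstrings and signatures, and join would still need an explicit method because its argument order is reversed.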

-- 
Terry Jan Reedy