[Python-ideas] Add list.join() please
Terry Reedy
tjreedy at udel.edu
Thu Jan 31 18:43:53 EST 2019
On 1/31/2019 12:51 PM, Chris Barker via Python-ideas wrote:
> I do a lot of numerical programming, and used to use MATLAB and now
> numpy a lot. So I am very used to "vectorization" -- i.e. having
> operations that work on a whole collection of items at once.
>
> Example:
>
> a_numpy_array * 5
>
> multiplies every item in the array by 5
>
> In pure Python, you would do something like:
>
> [ i * 5 for i in a_regular_list]
>
> You can imagine that for more complex expressions the "vectorized"
> approach can make for much clearer and easier to parse code. Also much
> faster, which is what is usually talked about, but I think the
> readability is the bigger deal.
>
> So what does this have to do with the topic at hand?
>
> I know that when I'm used to working with numpy and then need to do some
> string processing or some such, I find myself missing this
> "vectorization" -- if I want to do the same operation on a whole bunch
> of strings, why do I need to write a loop or comprehension or map? That is:
>
> [s.lower() for s in a_list_of_strings]
>
> rather than:
>
> a_list_of_strings.lower()
>
> (NOTE: I prefer comprehension syntax to map, but map would work fine
> here, too)
>
> It strikes me that that is the direction some folks want to go.
To me, thinking of strings as being in lists is Python 1 thinking.
Interactive applications work with *streams* of input strings.
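As a rough illustration of the stream view (a sketch only; the `lowered` helper and the sample input are made up for this example), a generator can transform strings one at a time as they arrive, without building a list first:

```python
import io


def lowered(lines):
    """Yield each incoming line lower-cased, one at a time."""
    for line in lines:
        yield line.rstrip('\n').lower()


# io.StringIO stands in for an interactive source such as sys.stdin.
stream = io.StringIO('Spam\nEGGS\nHam\n')
result = list(lowered(stream))
print(result)
```

Nothing is read from the source until the consumer asks for the next item, which is the property a list-based design gives up.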
> If so, then I think the way to do it is not to add a bunch of stuff to
> Python's str or sequence types, but rather to make a new library that
> provides quick and easy manipulation of sequences of strings -- kind
> of a stringpy, analogous to numpy.
>
> At the core of numpy is the ndarray: "a multidimensional, homogeneous
> array of fixed-size items"
>
> a strarray could be simpler -- I don't see any reason for more than 1-D,
> nor more than one datatype. But it could be a "vector" of strings that
> was guaranteed to be all strings, and provide operations that acted on
> the entire collection in one fell swoop.
I think an iterator (stream) of strings would be better. Here is a start.
class StringIt:
    """Iterator of strings.

    A StringIt wraps an iterator of strings to provide methods that
    apply the corresponding string method to each string in the
    iterator.  StringIt methods do not enforce the positional-only
    restrictions of some string methods.  The join method reverses the
    order of the arguments.

    Except for join(joiner), which returns a single string,
    the return values are iterators of the return value of the string
    methods.  An iterator of strings is returned as a StringIt so that
    further methods can be applied.
    """

    def __init__(self, objects, nogen=False):
        """Return a wrapped iterator of strings.

        Objects must be an iterator of strings or an iterable of objects
        with good __str__ methods.  All builtin objects have a good
        __str__ method and all non-buggy user-defined objects should.

        When *objects* is an iterator of strings, passing nogen=True
        avoids a layer of wrapping by claiming that str calls are not
        needed.  StringIt methods that return a StringIt do this.  An
        iterable of strings, such as ['a', 'b', 'c'], can be turned into
        an iterator with iter(iterable).  Users who pass nogen=True do
        so at their own risk because checking the claim would exhaust
        the iterator.
        """
        if not hasattr(objects, '__iter__'):
            raise ValueError('objects is not an iterable')
        if nogen and not hasattr(objects, '__next__'):
            raise ValueError('objects is not an iterator')
        if nogen:
            self.it = objects
        else:
            self.it = (str(ob) for ob in objects)

    def __iter__(self):
        return self.it.__iter__()

    def __next__(self):
        return self.it.__next__()

    def upper(self):
        return StringIt((s.upper() for s in self.it), nogen=True)

    def count(self, sub, start=0, end=None):
        # str.count accepts None for end, so an explicit end=0 is
        # passed through rather than being replaced by len(s).
        return (s.count(sub, start, end) for s in self.it)

    def join(self, joiner):
        return joiner.join(self.it)
for si, out in (
        (StringIt(iter(('a', 'b', 'c')), nogen=True), ['a', 'b', 'c']),
        (StringIt((1, 2, 3)), ['1', '2', '3']),
        (StringIt((1, 2, 3)).count('1'), [1, 0, 0]),
        (StringIt(('a', 'b', 'c')).upper(), ['A', 'B', 'C']),
        ):
    assert list(si) == out
assert StringIt(('a', 'b', 'c')).upper().join('-') == 'A-B-C'
# asserts all pass
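One possible way to carry this start further, offered as a sketch and not as part of the class above: forward any public str method through __getattr__ so that lower, strip, and the rest do not each need a hand-written wrapper. The name StringIt2 is hypothetical, and for brevity this version skips the str() conversion and the nogen flag, and wraps every result stream, including non-string results such as counts.

```python
class StringIt2:
    """Hypothetical StringIt variant that forwards any public str method."""

    def __init__(self, strings):
        # Assumes *strings* is an iterable of str; no conversion is done.
        self.it = iter(strings)

    def __iter__(self):
        return self.it

    def __next__(self):
        return next(self.it)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name.startswith('_') or not hasattr(str, name):
            raise AttributeError(name)
        method = getattr(str, name)

        def apply(*args, **kwargs):
            # Lazily apply the str method to each item of the stream.
            return StringIt2(method(s, *args, **kwargs) for s in self.it)
        return apply

    def join(self, joiner):
        # Defined explicitly because the argument order is reversed.
        return joiner.join(self.it)


assert list(StringIt2(['A', 'B']).lower()) == ['a', 'b']
assert StringIt2(['a', 'b', 'c']).upper().join('-') == 'A-B-C'
assert list(StringIt2(['1', '2', '3']).count('1')) == [1, 0, 0]
```

The trade-off is that the set of methods is implicit, so help() and tab completion show less than explicit wrappers would.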
--
Terry Jan Reedy