On 1/31/2019 12:51 PM, Chris Barker via Python-ideas wrote:
I do a lot of numerical programming; I used to use MATLAB and now use numpy a lot. So I am very used to "vectorization" -- i.e. having operations that work on a whole collection of items at once.
a_numpy_array * 5
multiplies every item in the array by 5
In pure Python, you would do something like:
[ i * 5 for i in a_regular_list]
You can imagine that for more complex expressions the "vectorized" approach can make for much clearer and easier to parse code. Also much faster, which is what is usually talked about, but I think the readability is the bigger deal.
So what does this have to do with the topic at hand?
I know that when I'm used to working with numpy and then need to do some string processing or some such, I find myself missing this "vectorization" -- if I want to do the same operation on a whole bunch of strings, why do I need to write a loop or comprehension or map? That is:
[s.lower() for s in a_list_of_strings]
(NOTE: I prefer comprehension syntax to map, but map would work fine here, too)
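For concreteness, the three spellings mentioned above (loop, comprehension, map) can be put side by side. This is only a minimal sketch; the sample list is made up for illustration and is not from the original message.

```python
# Hypothetical sample data, for illustration only.
a_list_of_strings = ["Spam", "EGGS", "Ham"]

# Explicit loop.
lowered_loop = []
for s in a_list_of_strings:
    lowered_loop.append(s.lower())

# List comprehension.
lowered_comp = [s.lower() for s in a_list_of_strings]

# map with the unbound method; map is lazy, so list() materializes it.
lowered_map = list(map(str.lower, a_list_of_strings))

assert lowered_loop == lowered_comp == lowered_map == ["spam", "eggs", "ham"]
```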
It strikes me that that is the direction some folks want to go.
To me, thinking of strings as being in lists is Python 1 thinking. Interactive applications work with *streams* of input strings.
If so, then I think the way to do it is not to add a bunch of stuff to Python's str or sequence types, but rather to make a new library that provides quick and easy manipulation of sequences of strings -- kind of a stringpy, analogous to numpy.
At the core of numpy is the ndarray: "a multidimensional, homogeneous array of fixed-size items"
A strarray could be simpler -- I don't see any reason for more than 1-D, nor more than one datatype. But it could be a "vector" of strings that was guaranteed to be all strings, and provide operations that acted on the entire collection in one fell swoop.
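To make the idea a little more concrete, here is a rough sketch of what such a 1-D "vector" of strings might look like. The class name StrArray and its three methods are hypothetical, chosen only for illustration; no such library is being described.

```python
from collections.abc import Sequence


class StrArray(Sequence):
    """Hypothetical 1-D 'vector' of strings; not an existing library."""

    def __init__(self, items):
        items = list(items)
        for item in items:
            # Guarantee that every element is a str.
            if not isinstance(item, str):
                raise TypeError(f"expected str, got {type(item).__name__}")
        self._items = items

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        if isinstance(index, slice):
            return StrArray(self._items[index])
        return self._items[index]

    def __repr__(self):
        return f"StrArray({self._items!r})"

    # Element-wise operations that return strings stay a StrArray.
    def lower(self):
        return StrArray(s.lower() for s in self._items)

    def upper(self):
        return StrArray(s.upper() for s in self._items)

    # Operations with non-string results return a plain list.
    def startswith(self, prefix):
        return [s.startswith(prefix) for s in self._items]


names = StrArray(["Alice", "BOB", "carol"])
print(names.lower())           # StrArray(['alice', 'bob', 'carol'])
print(names.startswith("c"))   # [False, False, True]
```

Checking the element type once at construction is what lets every later operation assume it is dealing with strings only.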
I think an iterator (stream) of strings would be better. Here is a start.
class StringIt:
    """Iterator of strings.

    A StringIt wraps an iterator of strings to provide methods that
    apply the corresponding string method to each string in the
    iterator.  StringIt methods do not enforce the positional-only
    restrictions of some string methods.  The join method reverses
    the order of the arguments.

    Except for join(joiner), which returns a single string, the
    return values are iterators of the return value of the string
    methods.  An iterator of strings is returned as a StringIt so
    that further methods can be applied.
    """

    def __init__(self, objects, nogen=False):
        """Return a wrapped iterator of strings.

        Objects must be an iterator of strings or an iterable of
        objects with good __str__ methods.  All builtin objects have
        a good __str__ method and all non-buggy user-defined objects
        should.

        When *objects* is an iterator of strings, passing nogen=True
        avoids a layer of wrapping by claiming that str calls are
        not needed.  StringIt methods that return a StringIt do this.
        An iterable of strings, such as ['a', 'b', 'c'], can be turned
        into an iterator with iter(iterable).  Users who pass
        nogen=True do so at their own risk because checking the claim
        would empty the iterator.
        """
        if not hasattr(objects, '__iter__'):
            raise ValueError('objects is not an iterable')
        if nogen and not hasattr(objects, '__next__'):
            raise ValueError('objects is not an iterator')
        if nogen:
            self.it = objects
        else:
            self.it = (str(ob) for ob in objects)

    def __iter__(self):
        return self.it.__iter__()

    def __next__(self):
        return self.it.__next__()

    def upper(self):
        return StringIt((s.upper() for s in self.it), nogen=True)

    def count(self, sub, start=0, end=None):
        # str.count accepts end=None; passing it through also handles
        # end=0, which 'end or len(s)' would treat as "no end".
        return (s.count(sub, start, end) for s in self.it)

    def join(self, joiner):
        return joiner.join(self.it)

for si, out in (
        (StringIt(iter(('a', 'b', 'c')), nogen=True), ['a', 'b', 'c']),
        (StringIt((1, 2, 3)), ['1', '2', '3']),
        (StringIt((1, 2, 3)).count('1'), [1, 0, 0]),
        (StringIt(('a', 'b', 'c')).upper(), ['A', 'B', 'C']),
        ):
    assert list(si) == out
assert StringIt(('a', 'b', 'c')).upper().join('-') == 'A-B-C'
# asserts all pass
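One possible way to extend this start without writing each method by hand is to forward unknown method names to str. The following is only a sketch under assumptions: the class name StringStream, the use of __getattr__, and the fake_input generator are all hypothetical and not part of the code above. Unlike StringIt.count, which returns a plain generator, this variant wraps every result, so calling a further string method on non-string results (such as the ints from count) would raise TypeError when iterated.

```python
class StringStream:
    """Hypothetical StringIt variant; forwards unknown names to str."""

    def __init__(self, objects, nogen=False):
        # nogen=True claims *objects* already yields the wanted values.
        self.it = iter(objects) if nogen else (str(ob) for ob in objects)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self.it)

    def join(self, joiner):
        # As in StringIt, the argument order of str.join is reversed.
        return joiner.join(self.it)

    def __getattr__(self, name):
        # Reached only when normal lookup fails, so 'it' and 'join'
        # are never routed through here.
        method = None if name.startswith('_') else getattr(str, name, None)
        if not callable(method):
            raise AttributeError(name)

        def apply(*args, **kwargs):
            # Lazy: nothing is consumed until the result is iterated.
            return StringStream(
                (method(s, *args, **kwargs) for s in self.it), nogen=True)
        return apply


def fake_input():
    # Stand-in for a stream of interactive input lines.
    yield '  Hello '
    yield 'World\n'


words = StringStream(fake_input()).strip().upper()
assert words.join(' ') == 'HELLO WORLD'
assert list(StringStream((1, 21, 3)).count('1')) == [1, 1, 0]
```

Because each step is a generator, a chain such as strip().upper() pulls one input string at a time, which fits the stream-of-input use case better than building intermediate lists.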