[Python-ideas] Vectorization [was Re: Add list.join() please]

Sat Feb 2 18:22:12 EST 2019

On Fri, Feb 1, 2019 at 5:00 PM David Mertz <mertz at gnosis.cx> wrote:

> This is certainly doable. But why would it be better than:
>>
>
> map(str.lower, my_string_vector)
> map(compute_grad, my_student_vector)
>

or [s.lower() for s in my_string_vector]

Side note: It's really interesting to me that Python introduced
comprehension syntax some years ago, and even "hid" reduce(), and now there
seems to be a big interest in / revival of "map".

> Even numpy supports inhomogeneous data:
> py> a = np.array([1, 'spam'])
> py> a
> array(['1', 'spam'],
>       dtype='|S4')

Well, no -- it doesn't -- look carefully, that is an array of type '|S4' --
i.e. a 4-character string -- every element in that array is that same
type. Also note that numpy's support for strings is not very complete.

numpy does support an "object" type that can be inhomogeneous -- it's
still a single type, but that type is a python object (under the hood it's
an array of pointers to pyobjects):

In [3]: a = np.array([1, 'spam'], dtype=object)

In [4]: a

Out[4]: array([1, 'spam'], dtype=object)

And it does support vectorization to some extent:
In [5]: a * 5

Out[5]: array([5, 'spamspamspamspamspam'], dtype=object)

But not with any performance benefits.

I think there are good reasons to have a "string_vector" that is known to
be homogeneous:

Performance -- it could be significantly optimized (are there many use
cases for that? I don't know).

Clear API: a string_vector would have all the relevant string methods.

You could easily write a list subclass that passed on method calls to the
enclosed objects, but then you'd have a fair bit of confusion as to what
might be a vector method vs a method on the objects.
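
To make that confusion concrete, here is a minimal sketch of such a subclass.
The name ForwardingList is made up for illustration, and it assumes that
forwarding happens through __getattr__, which is only one way to do it:

```python
class ForwardingList(list):
    """Hypothetical list subclass that forwards unknown methods to each element."""

    def __getattr__(self, name):
        # __getattr__ only runs when normal lookup fails, so any name that
        # list already defines (count, index, ...) is never forwarded.
        def call_on_each(*args, **kwargs):
            return ForwardingList(
                getattr(item, name)(*args, **kwargs) for item in self
            )
        return call_on_each


words = ForwardingList(["spam", "eggs", "spam"])

upper = words.upper()          # forwarded: str.upper on each element
n_spam = words.count("spam")   # NOT forwarded: this is list.count
per_element = [w.count("spam") for w in words]  # what a reader might have expected
```

Here words.upper() acts on the elements while words.count("spam") acts on
the container, and nothing in the spelling tells the two apart.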

which I suppose leaves us with something like:

list.elements.upper()

list.elements * 5

hmm -- not sure how much I like this, but it's pretty doable.
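
A rough sketch of what that could look like. The builtin list can't grow a
new attribute, so this uses a subclass; VList and ElementProxy are
hypothetical names, and only "*" is wired up here:

```python
class ElementProxy:
    """Hypothetical proxy returned by VList.elements; works element by element."""

    def __init__(self, items):
        self._items = items

    def __getattr__(self, name):
        # Look the method up on each element and call it with the same arguments.
        def call_on_each(*args, **kwargs):
            return VList(
                getattr(item, name)(*args, **kwargs) for item in self._items
            )
        return call_on_each

    def __mul__(self, other):
        # Operators bypass __getattr__, so each one needs an explicit method.
        return VList(item * other for item in self._items)


class VList(list):
    @property
    def elements(self):
        return ElementProxy(self)


strings = VList(["spam", "eggs"])

shouted = strings.elements.upper()   # element-wise method call
repeated = strings.elements * 2      # element-wise operator
concatenated = strings * 2           # plain list repetition, unchanged
```

The explicit .elements hop keeps the list's own methods and operators
distinct from the per-element ones, at the cost of one extra name in every
expression.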

> I still haven't seen any examples that aren't already spelled 'map(fun, it)'

and I don't think you will -- I *think* I get credit for starting this part
of the thread, and I started by saying I have often longed for
essentially a more concise way to spell map() or comprehensions.
Performance aside, I use numpy because:

c = np.sqrt(a**2 + b**2)

is a heck of a lot easier to read, write, and get correct than:

c = list(map(math.sqrt, map(lambda x, y: x + y, map(lambda x: x**2, a),
                                                map(lambda x: x**2, b)
                              )))

or:

[math.sqrt(x) for x in (a + b for a, b in zip((x**2 for x in a),
                                              (x**2 for x in b)
                                              ))]

Note: it took me quite a while to get those right! (and I know I could have
used the operator module to get the map version maybe a bit cleaner, but
the point stands)
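
For reference, a sketch of the operator-module spelling, checked against the
lambda version. The sample values for a and b and the square() helper are
made up for illustration:

```python
import math
import operator

# Made-up sample data (3-4-5, 5-12-13 and 8-15-17 triangles).
a = [3.0, 5.0, 8.0]
b = [4.0, 12.0, 15.0]


def square(x):
    return x ** 2


# The lambda-based map version.
c_lambda = list(map(math.sqrt, map(lambda x, y: x + y,
                                   map(lambda x: x**2, a),
                                   map(lambda x: x**2, b))))

# Same thing with operator.add; map() with a two-argument function
# walks both iterables in lockstep.
c_operator = list(map(math.sqrt,
                      map(operator.add, map(square, a), map(square, b))))
```

Both produce the same list; the operator version drops the lambdas but keeps
the same inside-out nesting.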

Does this apply to string processing? I'm not sure, though I do a fair bit
of chaining of string operations:

my_string.strip().lower().title()

if you wanted to do that to a list of strings:

a_list_of_strings.strip().lower().title()

is a lot nicer than:

[s.title() for s in (s.lower() for s in [s.strip() for s in
a_list_of_strings])]

or

list(map(str.title, map(str.lower, map(str.strip, a_list_of_strings))))  # untested
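
A quick check, with made-up sample data, that the comprehension and map
spellings of that chain agree. It calls str.strip() with no arguments, which
removes surrounding whitespace:

```python
# Made-up sample data for illustration.
a_list_of_strings = ["  hello WORLD ", "SPAM and eggs\n", "python"]

# Nested comprehension / generator spelling.
via_nested = [s.title() for s in (s.lower() for s in
                                  [s.strip() for s in a_list_of_strings])]

# map() spelling, innermost call applied first.
via_map = list(map(str.title,
                   map(str.lower,
                       map(str.strip, a_list_of_strings))))

# Chaining the methods on each element inside a single comprehension.
via_chain = [s.strip().lower().title() for s in a_list_of_strings]
```

All three give the same list; they differ only in whether the chain reads
left to right or inside out.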

How common is that use case? Not common enough for me to go any further
with this.

-CHB


-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython