[Python-ideas] Vectorization [was Re: Add list.join() please]
Christopher Barker
pythonchb at gmail.com
Sat Feb 2 18:22:12 EST 2019
On Fri, Feb 1, 2019 at 5:00 PM David Mertz <mertz at gnosis.cx> wrote:
> It is certainly doable. But why would it be better than:
>>
>
> map(str.lower, my_string_vector)
> map(compute_grad, my_student_vector)
>
or [s.lower() for s in my_string_vector]
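One practical difference between these spellings, sketched here with made-up sample data: in Python 3, `map()` returns a lazy iterator, while the list comprehension builds the list immediately.

```python
my_string_vector = ["Spam", "EGGS", "Ham"]  # sample data for illustration only

lazy = map(str.lower, my_string_vector)        # a map iterator; nothing computed yet
eager = [s.lower() for s in my_string_vector]  # a list, built right away

print(type(lazy).__name__)  # map
print(list(lazy))           # ['spam', 'eggs', 'ham']
print(eager)                # ['spam', 'eggs', 'ham']
```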
Side note: It's really interesting to me that Python introduced
comprehension syntax some years ago, and even "hid" reduce(), and now there
seems to be a big interest in / revival of "map".
> Even numpy supports inhomogeneous data:
> py> a = np.array([1, 'spam'])
> py> a
> array(['1', 'spam'],
> dtype='|S4')
Well, no, it doesn't. Look carefully: that is an array of type '|S4',
i.e. a fixed-width string of 4 bytes, and every element in that array is that
same type. Also note that numpy's support for strings is not very complete.
numpy does support an "object" type that can be inhomogeneous. It's
still a single type, but that type is a Python object (under the hood it's
an array of pointers to PyObjects):
In [3]: a = np.array([1, 'spam'], dtype=object)
In [4]: a
Out[4]: array([1, 'spam'], dtype=object)
And it does support vectorization to some extent:
In [5]: a * 5
Out[5]: array([5, 'spamspamspamspamspam'], dtype=object)
But not with any performance benefits.
I think there are good reasons to have a "string_vector" that is known to
be homogeneous:
Performance: it could be significantly optimized (are there many use
cases for that? I don't know.)
Clear API: a string_vector would have all the relevant string methods.
You could easily write a list subclass that passed on method calls to the
enclosed objects, but then you'd have a fair bit of confusion as to what
might be a vector method vs a method on the objects.
Which I suppose leaves us with something like:
list.elements.upper()
list.elements * 5
Hmm, not sure how much I like this, but it's pretty doable.
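A minimal sketch of the `elements` idea, assuming a hypothetical `VList` list subclass and a hypothetical `_Elements` proxy (neither exists in Python). Keeping the element-wise calls behind a separate attribute is one way to avoid the ambiguity between list methods and element methods described above, e.g. `count`:

```python
class _Elements:
    """Hypothetical proxy: forwards method calls and `*` to each element."""

    def __init__(self, items):
        self._items = items

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails. Refuse dunder
        # names so code probing for special methods is not fooled.
        if name.startswith("__"):
            raise AttributeError(name)

        def method(*args, **kwargs):
            # Call the named method on every element, collect the results.
            return VList(getattr(item, name)(*args, **kwargs) for item in self._items)
        return method

    def __mul__(self, other):
        return VList(item * other for item in self._items)


class VList(list):
    """Hypothetical list subclass with an `elements` namespace."""

    @property
    def elements(self):
        return _Elements(self)


v = VList(["spam", "eggs", "spam"])
print(v.elements.upper())     # ['SPAM', 'EGGS', 'SPAM']
print(v.elements * 2)         # ['spamspam', 'eggseggs', 'spamspam']
print(v.count("s"))           # 0: list.count, looks for an element equal to "s"
print(v.elements.count("s"))  # [1, 1, 1]: str.count on each element
```

Because each element-wise call returns another `VList`, the calls can be chained, as in `v.elements.strip().elements.lower()`.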
> I still haven't seen any examples that aren't already spelled 'map(fun, it)'
And I don't think you will. I *think* I get credit for starting this part
of the thread, and I started by saying I have often longed for
essentially a more concise way to spell map() or comprehensions.
Performance aside, I use numpy because:
c = np.sqrt(a**2 + b**2)
is a heck of a lot easier to read, write, and get correct than:
c = list(map(math.sqrt, map(lambda x, y: x + y, map(lambda x: x**2, a),
map(lambda x: x**2, b)
)))
or:
[math.sqrt(x) for x in (a + b for a, b in zip((x**2 for x in a),
(x**2 for x in b)
))]
Note: it took me quite a while to get those right! (And I know I could have
used the operator module to make the map version a bit cleaner, but
the point stands.)
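For comparison, a sketch of roughly what the operator-module version of the map() chain could look like, next to a single comprehension over zip(). The values in `a` and `b` are made up for illustration; for this particular formula, `math.hypot(x, y)` would be yet another option.

```python
import math
import operator

a = [3.0, 5.0, 8.0]    # sample data for illustration only
b = [4.0, 12.0, 15.0]

# map() accepts several iterables and passes one item from each to the
# function, so operator.mul / operator.add can replace the lambdas.
c_map = list(map(math.sqrt,
                 map(operator.add,
                     map(operator.mul, a, a),
                     map(operator.mul, b, b))))

# The same computation as a single comprehension over zip().
c_comp = [math.sqrt(x**2 + y**2) for x, y in zip(a, b)]

print(c_map)   # [5.0, 13.0, 17.0]
print(c_comp)  # [5.0, 13.0, 17.0]
```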
Does this apply to string processing? I'm not sure, though I do a fair bit
of chaining of string operations:
my_string.strip().lower().title()
If you wanted to do that to a list of strings:
a_list_of_strings.strip().lower().title()
is a lot nicer than:
[s.title() for s in (s.lower() for s in [s.strip() for s in a_list_of_strings])]
or
list(map(str.title, map(str.lower, map(str.strip, a_list_of_strings))))  # untested
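A rough sketch of how the chained string operations could be applied to a whole list with today's standard library, using `operator.methodcaller` and `functools.reduce`. `apply_methods` is a hypothetical helper name, not an existing API, and it only handles methods called without arguments. A single comprehension that chains the calls on each element is shown for comparison.

```python
from functools import reduce
from operator import methodcaller


def apply_methods(strings, *method_names):
    """Hypothetical helper: call each named (no-argument) method in turn on every string."""
    callers = [methodcaller(name) for name in method_names]
    # reduce() threads each string through the callers, left to right.
    return [reduce(lambda acc, call: call(acc), callers, s) for s in strings]


a_list_of_strings = ["  hello WORLD ", "\tspam and EGGS\n"]  # sample data

print(apply_methods(a_list_of_strings, "strip", "lower", "title"))
# ['Hello World', 'Spam And Eggs']

# The same result with one comprehension that chains the calls directly:
print([s.strip().lower().title() for s in a_list_of_strings])
```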
How common is that use case? Not common enough for me to go any further
with this.
-CHB
--
Christopher Barker, PhD
Python Language Consulting
- Teaching
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython