[Python-ideas] Vectorization [was Re: Add list.join() please]

Kyle Lahnakoski klahnakoski at mozilla.com
Sun Feb 10 13:05:42 EST 2019

On 2019-02-02 18:11, Steven D'Aprano wrote:
> We can improve that comprehension a tiny bit by splitting it into
> multiple steps:
>      temp1 = [d+e for d, e in zip(vector, sequence)]
>      temp2 = [process(x) for x in temp1]
>      result = [a*b for a, b in zip(temp2, items)]
> but none of these are as elegant or readable as the vectorized syntax
>      result = process.(vector .+ sequence) .* items

The following reads a little better:

| result = [
|     process(v+s)*i
|     for v, s, i in zip(vector, sequence, items)
| ]
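
As a quick check that the single comprehension computes the same thing as the three-step version, here is a small sketch. The process() function and the sample data are stand-ins I made up for illustration; nothing in the thread defines them.

```python
# Hypothetical stand-in: the thread never says what process() does.
def process(x):
    return x * 2

# Made-up sample columns of equal length.
vector = [1, 2, 3]
sequence = [10, 20, 30]
items = [2, 3, 4]

# Three-step form from the quoted message.
temp1 = [d + e for d, e in zip(vector, sequence)]
temp2 = [process(x) for x in temp1]
stepwise = [a * b for a, b in zip(temp2, items)]

# Single comprehension: one pass, no temporaries.
result = [
    process(v + s) * i
    for v, s, i in zip(vector, sequence, items)
]

print(stepwise)  # [44, 132, 264]
print(result)    # [44, 132, 264]
```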

Vector operations will promote the use of data formats that work well
with vector operations. So, I would expect data to appear like rows in a
table, rather than in the columnar form shown above. Even if columnar
form must be dealt with, we can extend our Vector class (or whatever
abstraction you are using to enter vector space) to naturally zip() columns.

| Vector(zip(vector, sequence, items))
|     .map(lambda v, s, i: process(v+s)*i)    

If we let Vector represent a list of tuples instead of a list of values,
we can make construction simpler:

| Vector(vector, sequence, items)
|     .map(lambda v, s, i: process(v+s)*i)    

If we have zip() to extend the tuples in the Vector, then we can be
verbose to demonstrate how to use columnar data:

| Vector(vector)
|     .zip(sequence)
|     .map(operator.add)
|     .map(process)
|     .zip(items)
|     .map(operator.mul)

This looks verbose, but it is not too far from the vectorized syntax:
the Vector() brings us to vector mode, and the two zip()s convert from
columnar form. This verbose form may be *better* than the vectorized
syntax because the operations appear in the order they are applied,
rather than in the mix of infix and functional forms seen in the
vectorized syntax.
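
To make the chained form concrete, here is a minimal sketch of what such a Vector could look like. It is only one possible design, not an existing library: I assume every row is stored as a tuple, zip() appends a field to each row, and map() unpacks the row into positional arguments. The to_list() method, process() and the sample data are hypothetical additions for the demonstration. Note that real Python needs the enclosing parentheses to allow leading-dot continuation lines.

```python
import operator


class Vector:
    """Hypothetical sketch: each row is a tuple, so zip() can extend it."""

    def __init__(self, *columns):
        # Vector(a, b, c) zips the columns into rows; Vector(a) gives 1-tuples.
        self._rows = list(zip(*columns))

    def zip(self, column):
        # Append one more field to every row.
        new = Vector()
        new._rows = [row + (value,) for row, value in zip(self._rows, column)]
        return new

    def map(self, func):
        # Call func with the row's fields as positional arguments and
        # keep the result as a new single-field row.
        new = Vector()
        new._rows = [(func(*row),) for row in self._rows]
        return new

    def to_list(self):
        # Unwrap single-field rows back to plain values.
        return [row[0] if len(row) == 1 else row for row in self._rows]


# Hypothetical stand-in and made-up data, as nothing here is from the thread.
def process(x):
    return x * 2

vector = [1, 2, 3]
sequence = [10, 20, 30]
items = [2, 3, 4]

result = (
    Vector(vector)
    .zip(sequence)
    .map(operator.add)
    .map(process)
    .zip(items)
    .map(operator.mul)
    .to_list()
)

# The multi-column constructor gives the same answer in one map().
compact = (
    Vector(vector, sequence, items)
    .map(lambda v, s, i: process(v + s) * i)
    .to_list()
)

print(result)   # [44, 132, 264]
print(compact)  # [44, 132, 264]
```

Keeping rows as tuples throughout is what lets map(operator.add) and map(process) share one implementation: both receive the row's fields as arguments, whether there are two of them or one.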

I suggest this discussion include vector operations on (frozen)
dicts/objects and (frozen) lists/tuples.  Then we can have an
interesting discussion about the meaning of group_by, join, and window
functions, plus other operations we find in database query languages.
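
As a starting point for that discussion, here is one possible reading of group_by over a list of dict rows, written in plain Python. The function name, its signature and the sample rows are assumptions of mine; what group_by should actually mean for vectors is exactly the open question.

```python
from collections import defaultdict


def group_by(rows, key):
    """Hypothetical helper: bucket dict rows by the value stored under key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


# Made-up rows, shaped like records from a table.
rows = [
    {"team": "a", "score": 3},
    {"team": "b", "score": 5},
    {"team": "a", "score": 4},
]

# Roughly: SELECT team, SUM(score) FROM rows GROUP BY team
totals = {
    team: sum(r["score"] for r in members)
    for team, members in group_by(rows, "team").items()
}

print(totals)  # {'a': 7, 'b': 5}
```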

I am interested in vector operations.  I have situations where I want to
perform some conceptually simple operations on a series of
not-defined-by-me objects to make a series of conclusions.  The
calculations can be done succinctly in SQL, but Python makes them
difficult. Right now, my solution is to describe the transformations in
JSON, and have an interpreter do the processing:


[Image attachment scrubbed by the archive: jpgkifbeadmcaigm.png (image/png, 62836 bytes)]
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20190210/4dcb0bf7/attachment-0001.png>
