[Python-ideas] Vectorization [was Re: Add list.join() please]

Steven D'Aprano steve at pearwood.info
Thu Feb 7 18:42:00 EST 2019


On Thu, Feb 07, 2019 at 03:17:18PM -0500, David Mertz wrote:
> Many apologies if people got one or more encrypted versions of this.
> 
> On 2/7/19 12:13 AM, Steven D'Aprano wrote:
> 
> It wasn't a concrete proposal, just food for thought. Unfortunately the
> thinking seems to have missed the point of the Julia syntax and run off
> with the idea of a wrapper class.
> 
> I did not miss the point! I think adding new syntax à la Julia is a bad
> idea—or at very least, not something we can experiment with today (and
> wrote as much).

I'm sorry, I did not see your comment that you thought new syntax was a 
bad idea. If I had, I would have responded directly to that.

Why is it an overtly *bad* (i.e. harmful) idea? As opposed to merely 
not sufficiently useful, or unnecessary?

You're certainly right that we can't easily experiment in the 
interpreter with new syntax, but we can perform thought-experiments and 
we don't need anything but a text editor for that. As far as I'm 
concerned, the thought experiment of comparing these two snippets:

    ((seq .* 2)..name)..upper()

versus

    map(str.upper, map(operator.attrgetter('name'), map(lambda a: a*2, seq)))

demonstrates conclusively that even with the ugly double dot syntax, 
infix syntax easily and conclusively beats map.

If I recall correctly, the three maps here were originally proposed by 
you as examples of why map() alone was sufficient and there was no 
benefit to the Julia syntax. I suggested composing them together as a 
single operation instead of considering them in isolation.


> Therefore, something we CAN think about and experiment with today is a
> wrapper class.

Again, I apologise, I did not see where you said that this was intended 
as a proof-of-concept to experiment with the concept.


[...]
> One of the principles I had in mind in my demonstration is that I want
> to wrap the original collection type (or keep it an iterator if it
> started as one).  A number of other ideas here, whether for built-in
> syntax or different behaviors of a wrapper, effectively always reduce
> every sequence to a list under the hood.  This makes my approach less
> intrusive to move things in and out of "vector mode."  For example:

If the Vector class is only a proof of concept, then we surely don't 
need to care about moving things in and out of "vector mode". We can 
take it as a given that "the real thing" will work that way: the syntax 
will be duck-typed and work with any iterable, and there will not be any 
actual wrapper class involved and consequently no need to move things in 
and out of the wrapper.

I had taken note of this functionality of the class before, and that was 
one of the things which lead me to believe that you thought that a 
wrapper class was in and of itself a solution to the problem. If you had 
been proposing this Vector class as a viable working solution (or at 
least a first alpha version towards a viable solution) then worrying 
about round-tripping would be important.

But as a proof-of-concept of the functionality, then:

    set( Vector(set_of_stuff) + spam )
    list( Vector(list_of_stuff) + spam )

should be enough to play around with the concept.



[...]
> Inasmuch as I want to handle iterator here, it is impossible to do any
> type check upon creating a Vector.  For concrete
> `collections.abc.Sequence` objects we could check, in principle.  But
> I'd rather it be "we're all adults here" ... or at most provide some
> `check_type_uniformity()` function or method that had to be called
> explicitly.

Why do you care about type uniformity or type-checking the contents of 
the iterable?

Comments like this suggest to me that you haven't understood the 
idea as I have tried to explain it. I'm sorry that I have failed to 
explain it better.

Julia is (if I understand correctly) statically typed, and that allows 
it to produce efficient machine code because it knows that it is 
iterating over (let's say) an array of 32-bit ints.

While that might be important for the efficiency of the generated 
machine code, that's not important for the semantic meaning of the code. 
In Python, we duck-type and resolve operations at runtime. We don't 
typically validate types in advance:

    for x in sequence:
        if not isinstance(x, Spam):
             raise TypeError('not Spam')
    for x in sequence:
        process(x)

(except under unusual circumstances). More to the point, when we write a 
for-loop:

    result = []
    for a_string in seq:
        result.append(a_string.upper())

we don't expect that the interpreter will validate that the sequence 
contains nothing but strings in advance. So if I write this using Julia 
syntax:

    result = seq..upper()

I shouldn't expect the iterpreter to check that seq contains nothing but 
strings either.



-- 
Steven


More information about the Python-ideas mailing list