[Python-ideas] Vectorization [was Re: Add list.join() please]
David Mertz
mertz at gnosis.cx
Thu Feb 7 15:17:18 EST 2019
Many apologies if people got one or more encrypted versions of this.
On 2/7/19 12:13 AM, Steven D'Aprano wrote:
> It wasn't a concrete proposal, just food for thought. Unfortunately the
> thinking seems to have missed the point of the Julia syntax and run off
> with the idea of a wrapper class.
I did not miss the point! I think adding new syntax à la Julia is a bad
idea—or at very least, not something we can experiment with today (and
wrote as much).
Therefore, something we CAN think about and experiment with today is a
wrapper class. This approach is pretty much exactly the same thing I
tried in a discussion of PEP 505 a while back (None-aware operators).
In the same vein as that—where I happen to dislike PEP 505 pretty
strongly—one approach to simulate or avoid new syntax is precisely to
use a wrapper class.
As a footnote, I think my demonstration of PEP 505 got derailed by lots
of comments along the lines of "Your current toy library gets the
semantics of the proposed new syntax wrong in these edge cases." Those
comments were true (and I think I didn't fix all the issues since my
interest faded with the active thread)... but none of them were
impossible to fix, just small errors I had made.
With my *very toy* stringpy.Vector class, I'm just experimenting with
usage ideas. I have shown a number of uses that I think could be useful
to capture most or all of what folks want in "string vectorization."
Most of what I've put in this list is what the little module does
already, but some is just ideas for what it might do if I add the code
(or someone else makes a PR at https://github.com/DavidMertz/stringpy).
One of the principles I had in mind in my demonstration is that I want
to wrap the original collection type (or keep it an iterator if it
started as one). A number of other ideas here, whether for built-in
syntax or different behaviors of a wrapper, effectively always reduce
every sequence to a list under the hood. Keeping the original type makes
my approach less intrusive when moving things in and out of "vector
mode." For example:
v1 = Vector(set_of_strings)
set_of_strings = v1.lower().apply(my_str_fun)._it # Get a set back
v2 = Vector(list_of_strings)
list_of_strings = v2.lower().apply(my_str_fun)._it # Get a list back
v3 = Vector(deque_of_strings)
deque_of_strings = v3.lower().apply(my_str_fun)._it # Get a deque back
v4 = Vector(iter_of_strings)
iter_of_strings = v4.lower().apply(my_str_fun)._it # stays lazy!
So this is round-tripping through vector-land.
Small note: I use the attribute `._it` to store the "sequential thing."
That feels internal, so maybe some better way of spelling "get the
wrapped thing" would be desirable.
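
For readers who want to poke at the round-tripping idea without cloning the repository, here is a minimal sketch of how such a wrapper could behave. It is not the code in stringpy: the class name `SketchVector` and the helper `_rewrap` are made up for illustration, and it assumes every concrete container can be rebuilt by calling its own type on an iterable (true for list, set, deque and tuple, not for everything).

```python
from collections import deque
from collections.abc import Iterator


class SketchVector:
    """Hypothetical stand-in for stringpy.Vector (not the real code)."""

    def __init__(self, it):
        self._it = it  # the wrapped "sequential thing"

    def _rewrap(self, results):
        # `results` is a lazy generator over the transformed items.
        if isinstance(self._it, Iterator):
            return SketchVector(results)  # started lazy, stays lazy
        # Assumption: the container type accepts an iterable.
        return SketchVector(type(self._it)(results))

    def apply(self, fn, *args, **kwargs):
        # Call an arbitrary function on each item.
        return self._rewrap(fn(x, *args, **kwargs) for x in self._it)

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails, so unknown
        # public names are treated as methods to call on each item.
        if name.startswith("_"):
            raise AttributeError(name)

        def method(*args, **kwargs):
            return self._rewrap(
                getattr(x, name)(*args, **kwargs) for x in self._it
            )

        return method


words = SketchVector(deque(["Spam", "EGGS"])).lower().apply(str.strip)
print(type(words._it).__name__)  # deque
```

The only real design decision in the sketch is the `isinstance(..., Iterator)` test: iterators are passed through as generators, and everything else is rebuilt with its own constructor.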
> I've also lost track of whether anyone is proposing a "vector of strings"
> as opposed to a vector of arbitrary objects.
Nothing I wrote is actually string-specific. That is just the main use
case stated. My `stringpy.Vector` might be misnamed in that it is happy
to contain any kind of items. But we hope they are all items with the
particular methods we want to vectorize. I showed an example where a
list might contain a custom string-like object that happens to have
methods like `.lower()` as an illustration.
Inasmuch as I want to handle iterators here, it is impossible to do any
type check upon creating a Vector. For concrete
`collections.abc.Sequence` objects we could check, in principle. But
I'd rather it be "we're all adults here" ... or at most provide some
`check_type_uniformity()` function or method that had to be called
explicitly.
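
As a rough idea of what an explicit check might look like, here is a sketch only. Nothing like this exists in the module as far as this message says, and "uniform" is assumed here to mean "all items have exactly the same type", which is stricter than "all items have the methods we want". It refuses iterators rather than silently consuming them.

```python
from collections.abc import Iterator


def check_type_uniformity(items):
    """Hypothetical helper: True if all items share one exact type."""
    if isinstance(items, Iterator):
        # Checking would exhaust the iterator, so decline to do it.
        raise TypeError("cannot check an iterator without consuming it")
    return len({type(x) for x in items}) <= 1
```

A duck-typed variant could test `hasattr(x, "lower")` for each required method instead of comparing types, which is closer to the "string-like object" case mentioned above.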