[Python-ideas] Adding "Typed" collections/iterators to Python

Mon Dec 19 03:24:12 CET 2011

On Sun, Dec 18, 2011 at 6:45 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Mon, Dec 19, 2011 at 9:28 AM, Nathan Rice
> <nathan.alexander.rice at gmail.com> wrote:
>> -- Performing a series of operations using comprehensions or map
>> tends to be highly verbose in an uninformative way.  Compare the
>> current method with what would be possible using "typed" collections:
>>
>> L2 = [X(e) for e in L1]
>> L3 = [Y(e) for e in L2]
>> vs
>> L2 = X(L1) # assuming X has been updated to work in both vector/scalar
>> L3 = Y(L2) # context...
>
> This use case is why map() remains a builtin, even in Python 3:

Yes, but map(lambda x: getattr(x, "method")(), thing) is ugly, and
map(lambda x: x.method_2(param), map(lambda x: x.method(param),
thing)) is really ugly.  On top of that, it asks more of code
analysis tools to verify that code, and IDEs aren't going to be able
to tell you the methods on the ambiguous x in the lambda.  Sure, if the
only argument is self, you could pass class.method directly, but I
don't think that is the majority use case.
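
To make the contrast concrete, here is a small illustration using plain
str methods as stand-ins (the sample data and the choice of methods are
assumptions made for this sketch, not anything from the thread).  When
self is the only argument the method can be handed to map() directly;
once a parameter is involved a lambda is needed, and chaining two such
calls nests them inside out:

```python
words = ["alpha", "beta"]

# Only argument is self: the method can be passed to map() as-is.
upper = list(map(str.upper, words))

# Extra parameters force a lambda, and chaining nests the calls so the
# operation applied first (replace) appears last in the source.
chained = list(
    map(lambda x: x.ljust(6, "."),
        map(lambda x: x.replace("a", "A"), words))
)
```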

> L2 = map(X, L1)
> L3 = map(Y, L2)
>
> Short, but explicit (no under-the-hood guessing about whether or not
> something should be treated as a scalar or vector value - in the
> general case, this distinction isn't as clear as you might think, just
> look at strings).

Yes, I love that feature of strings, it is a source of lots of bugs,
but I digress.  The reason this partially solves that problem is that
instead of having to do a bunch of guesswork on an iterable to see if
you should do the vectorized version of the function, you just check
to see if it is an instance of a certain TypedCollectionContract.  If
so, vectorize.  No "isinstance(foo, Iterable) and not isinstance(foo,
basestring)" silliness here.

>> L2 = [Z(Y(X(e))) for e in L1]
>> vs
>> L2 = Z(Y(X(L1)))
>
> def XYZ(arg):
>   """Look, I can document what this means!"""
>   return Z(Y(X(arg)))
>
> L2 = map(XYZ, L1)

What about WXY, and XZ, and WYZ, and YZ, and...

>> L2 = [e.X().Y().Z() for e in L1]
>> vs
>> L2 = L1.X().Y().Z() # assuming vectorized versions of member methods
>> #are folded into the collection via the mixin.
>
> def XYZ_methods(arg):
>   """I can also document what *this* means"""
>   return arg.X().Y().Z()
>
> L2 = map(XYZ_methods, L1)
>
>> --  Because collections are type agnostic, it is not possible to place
>> methods on them that are type specific.  This leads to a lot of cases
>> where Python forces you to read inside out or the syntax gets
>> very disjoint in general.  A good example of this is:
>>
>> "\n".join(l.capitalize() for l in my_string.split("\n"))
>>
>> which could reduce to something far more readable, such as:
>>
>> my_string.split("\n").capitalize().join_items("\n")
>
> Another bad example, since that's just a really verbose way of writing
> my_string.capitalize().

The python interpreter says otherwise...

>>> foo = "line 1\nline 2\nline 3"
>>> foo.capitalize()
'Line 1\nline 2\nline 3'
>>> "\n".join(s.capitalize() for s in foo.split("\n"))
'Line 1\nLine 2\nLine 3'
>>>
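
For comparison, a rough sketch of how the left-to-right spelling could be
emulated today.  StrList is a hypothetical class, and forwarding unknown
attributes to the elements via __getattr__ is just one possible way to
fold in vectorized methods; join_items is the name used in the example
quoted above.  Since str.split() returns a plain list, the wrapping has
to be done by hand here:

```python
class StrList(list):
    """Hypothetical typed collection of str with vectorized str methods."""

    def __getattr__(self, name):
        # Only reached when normal lookup fails, so list's own methods
        # (append, count, index, ...) keep their usual meaning.
        if not hasattr(str, name):
            raise AttributeError(name)

        def vectorized_method(*args, **kwargs):
            # Call the str method on every element and keep the type.
            return StrList(getattr(item, name)(*args, **kwargs)
                           for item in self)
        return vectorized_method

    def join_items(self, separator):
        return separator.join(self)


foo = "line 1\nline 2\nline 3"
result = StrList(foo.split("\n")).capitalize().join_items("\n")
```

Here result is the same string that the join/generator expression in the
session above produces.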

> Short answer: what advantage does your proposal really offer over
> simply extracting the repetitive operation out to a use case specific
> function, and making effective use of the existing vectorisation
> utilities (i.e. map() and itertools)?

IDEs can provide context hints, interpreters can use the contract to
change how they treat the collection to improve performance, and
lint-style code analysis becomes easier.  It is also less verbose
across the board than the currently available options, and it reads
left to right with fewer non-letter characters, i.e. it is more
understandable.

Nathan