From ron3200 at gmail.com  Fri May  1 00:59:03 2015
From: ron3200 at gmail.com (Ron Adam)
Date: Thu, 30 Apr 2015 18:59:03 -0400
Subject: [Python-ideas] PEP 492 terminology - (native) coroutine objects
In-Reply-To: <CAHVvXxTO6RBXZDu8aAzNWSzaZq-Ppf=v1Q65_t6LAJxGbDF5_A@mail.gmail.com>
References: <CAHVvXxTO6RBXZDu8aAzNWSzaZq-Ppf=v1Q65_t6LAJxGbDF5_A@mail.gmail.com>
Message-ID: <mhuc3o$2fm$1@ger.gmane.org>



On 04/30/2015 04:37 PM, Oscar Benjamin wrote:
> With PEP 492 it seems that I would get something like:
>
> >>> async def af(): pass
> >>> ag = af()
> >>> ag
> <coroutine_object object af at 0x7fb81dadc828>


> It seems harder to think of a good name for ag though.

A waiter?
or awaiter?

As in a-wait-ing an awaiter.

Maybe there's a restaurant/food way of describing how it works. :-)


I'm not sure I have the usage correct, but I think we need to use "await 
af()" when calling an async function.

Cheers,
    Ron


From greg.ewing at canterbury.ac.nz  Fri May  1 02:26:53 2015
From: greg.ewing at canterbury.ac.nz (Greg)
Date: Fri, 01 May 2015 12:26:53 +1200
Subject: [Python-ideas] More general "for" loop handling
In-Reply-To: <CAP7+vJL7G_bN-KknPYbe-R-RZ6kGob1LtOXQWvb30AC8jjw=cA@mail.gmail.com>
References: <CAFpSVpJ-6NjNS6-1T0Di7WFhNgNwPRo0CFp_o+H5WmWsLftchQ@mail.gmail.com>
 <20150430113644.GC5663@ando.pearwood.info> <mhtjpb$r9$1@ger.gmane.org>
 <CACac1F9EH7apMRpwLMbnaZV0zpcpzT-b1hPk3M5QxG=Fvq9NbQ@mail.gmail.com>
 <CAFpSVpKrJWsBnXK44YaER8E8mK6E45vE08MZc81vOLGquw_TOw@mail.gmail.com>
 <CAP7+vJL7G_bN-KknPYbe-R-RZ6kGob1LtOXQWvb30AC8jjw=cA@mail.gmail.com>
Message-ID: <5542C84D.4000107@canterbury.ac.nz>

On 1/05/2015 5:31 a.m., Guido van Rossum wrote:
> Ah. But 'async for' is not meant to introduce parallelism or
> concurrency.

This kind of confusion is why I'm not all that enamoured of using
the word "async" the way PEP 492 does.

But since there seems to be prior art for it in other languages
now, I suppose there are at least some people out there who
won't be confused by it.

-- 
Greg


From steve at pearwood.info  Fri May  1 02:35:52 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 1 May 2015 10:35:52 +1000
Subject: [Python-ideas] More general "for" loop handling
In-Reply-To: <CAFpSVp+7iNGpsN8kOaO1oGVPUqWcxROJDhWKYVRtH3Ozzyo9-Q@mail.gmail.com>
References: <CAFpSVpJ-6NjNS6-1T0Di7WFhNgNwPRo0CFp_o+H5WmWsLftchQ@mail.gmail.com>
 <20150430113644.GC5663@ando.pearwood.info>
 <CAFpSVp+7iNGpsN8kOaO1oGVPUqWcxROJDhWKYVRtH3Ozzyo9-Q@mail.gmail.com>
Message-ID: <20150501003551.GG5663@ando.pearwood.info>

On Thu, Apr 30, 2015 at 07:12:11PM +0200, Todd wrote:
> On Thu, Apr 30, 2015 at 1:36 PM, Steven D'Aprano <steve at pearwood.info>
> wrote:

> > A parallel version of map makes sense, because the semantics of map are
> > well defined: given a function f and a sequence [a, b, c, ...] it
> > creates a new sequence [f(a), f(b), f(c), ...]. The assumption is that f
> > is a pure-function which is side-effect free (if it isn't, you're going
> > to have a bad time). The specific order in which a, b, c etc. are
> > processed doesn't matter. If it does matter, then map is the wrong way
> > to process it.
> >
> >
> multiprocessing.Pool.map guarantees ordering.  It is
> multiprocessing.Pool.imap_unordered that doesn't.

I don't think it guarantees ordering in the sense I'm referring to. It 
guarantees that the returned result will be [f(a), f(b), f(c), ...] in 
that order, but not that f(a) will be calculated before f(b), which is 
calculated before f(c), ... and so on. That's the point of parallelism: 
if f(a) takes a long time to complete, another worker may have completed 
f(b) in the meantime.

The point I am making is that map() doesn't have any connotations of the 
order of execution, whereas for loops have a very strong connotation of 
executing the block in a specific sequence. People don't tend to use map 
with a function with side-effects:

    map(lambda i: print(i) or i, range(100))

will return [0, 1, 2, ..., 99] but it may not print 0 1 2 3 ... in that 
order. But with a for-loop, it would be quite surprising if

   for i in range(100):
       print(i)

printed the values out of order. In my opinion, sticking "mypool" in 
front of the "for i" doesn't change the fact that adding parallelism to 
a for loop would be surprising and hard to reason about.
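To make the distinction concrete, here is a small self-contained sketch 
(concurrent.futures is used purely for illustration; the sleeps simulate 
uneven amounts of work):

```python
import concurrent.futures
import threading
import time

completion_order = []
lock = threading.Lock()

def f(x):
    # Later inputs sleep less, so they tend to *finish* first.
    time.sleep((5 - x) * 0.05)
    with lock:
        completion_order.append(x)
    return x * x

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as ex:
    results = list(ex.map(f, range(5)))

print(results)           # always [0, 1, 4, 9, 16], in input order
print(completion_order)  # often [4, 3, 2, 1, 0]: completion differs
```

The returned sequence is always in input order, but the side-effects 
happen in whatever order the workers get there.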

If you still wish to argue for this, one thing which may help your case 
is if you can identify other programming languages that have already 
done something similar.


-- 
Steve

From yselivanov.ml at gmail.com  Fri May  1 02:54:42 2015
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Thu, 30 Apr 2015 20:54:42 -0400
Subject: [Python-ideas] More general "for" loop handling
In-Reply-To: <20150501003551.GG5663@ando.pearwood.info>
References: <CAFpSVpJ-6NjNS6-1T0Di7WFhNgNwPRo0CFp_o+H5WmWsLftchQ@mail.gmail.com>
 <20150430113644.GC5663@ando.pearwood.info>
 <CAFpSVp+7iNGpsN8kOaO1oGVPUqWcxROJDhWKYVRtH3Ozzyo9-Q@mail.gmail.com>
 <20150501003551.GG5663@ando.pearwood.info>
Message-ID: <5542CED2.6080108@gmail.com>

On 2015-04-30 8:35 PM, Steven D'Aprano wrote:
>> multiprocessing.Pool.map guarantees ordering.  It is
>> multiprocessing.Pool.imap_unordered that doesn't.
> I don't think it guarantees ordering in the sense I'm referring to. It
> guarantees that the returned result will be [f(a), f(b), f(c), ...] in
> that order, but not that f(a) will be calculated before f(b), which is
> calculated before f(c), ... and so on. That's the point of parallelism:
> if f(a) takes a long time to complete, another worker may have completed
> f(b) in the meantime.

This is an *excellent* point.

Yury


From ethan at stoneleaf.us  Fri May  1 03:02:01 2015
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 30 Apr 2015 18:02:01 -0700
Subject: [Python-ideas] More general "for" loop handling
In-Reply-To: <5542CED2.6080108@gmail.com>
References: <CAFpSVpJ-6NjNS6-1T0Di7WFhNgNwPRo0CFp_o+H5WmWsLftchQ@mail.gmail.com>
 <20150430113644.GC5663@ando.pearwood.info>
 <CAFpSVp+7iNGpsN8kOaO1oGVPUqWcxROJDhWKYVRtH3Ozzyo9-Q@mail.gmail.com>
 <20150501003551.GG5663@ando.pearwood.info>
 <5542CED2.6080108@gmail.com>
Message-ID: <20150501010201.GL10248@stoneleaf.us>

On 04/30, Yury Selivanov wrote:
> On 2015-04-30 8:35 PM, Steven D'Aprano wrote:

>> I don't think it guarantees ordering in the sense I'm referring to. It
>> guarantees that the returned result will be [f(a), f(b), f(c), ...] in
>> that order, but not that f(a) will be calculated before f(b), which is
>> calculated before f(c), ... and so on. That's the point of parallelism:
>> if f(a) takes a long time to complete, another worker may have completed
>> f(b) in the meantime.
> 
> This is an *excellent* point.

So, PEP 492 async for also guarantees that the loop runs in order, one at
a time, with one loop finishing before the next one starts?

*sigh*

How disappointing.

--
~Ethan~

From yselivanov.ml at gmail.com  Fri May  1 03:07:50 2015
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Thu, 30 Apr 2015 21:07:50 -0400
Subject: [Python-ideas] More general "for" loop handling
In-Reply-To: <20150501010201.GL10248@stoneleaf.us>
References: <CAFpSVpJ-6NjNS6-1T0Di7WFhNgNwPRo0CFp_o+H5WmWsLftchQ@mail.gmail.com>
 <20150430113644.GC5663@ando.pearwood.info>
 <CAFpSVp+7iNGpsN8kOaO1oGVPUqWcxROJDhWKYVRtH3Ozzyo9-Q@mail.gmail.com>
 <20150501003551.GG5663@ando.pearwood.info> <5542CED2.6080108@gmail.com>
 <20150501010201.GL10248@stoneleaf.us>
Message-ID: <5542D1E6.80307@gmail.com>

On 2015-04-30 9:02 PM, Ethan Furman wrote:
> On 04/30, Yury Selivanov wrote:
>> On 2015-04-30 8:35 PM, Steven D'Aprano wrote:
>>> I don't think it guarantees ordering in the sense I'm referring to. It
>>> guarantees that the returned result will be [f(a), f(b), f(c), ...] in
>>> that order, but not that f(a) will be calculated before f(b), which is
>>> calculated before f(c), ... and so on. That's the point of parallelism:
>>> if f(a) takes a long time to complete, another worker may have completed
>>> f(b) in the meantime.
>> This is an *excellent* point.
> So, PEP 492 async for also guarantees that the loop runs in order, one at
> a time, with one loop finishing before the next one starts?
>
> *sigh*
>
> How disappointing.
>


No.  Nothing prevents you from scheduling asynchronous
parallel computation, or prefetching more data.  Since
__anext__ returns an awaitable, you can do that.

Steven's point is that Todd's proposal isn't that
straightforward to apply.


Yury

From guido at python.org  Fri May  1 05:29:22 2015
From: guido at python.org (Guido van Rossum)
Date: Thu, 30 Apr 2015 20:29:22 -0700
Subject: [Python-ideas] More general "for" loop handling
In-Reply-To: <5542D1E6.80307@gmail.com>
References: <CAFpSVpJ-6NjNS6-1T0Di7WFhNgNwPRo0CFp_o+H5WmWsLftchQ@mail.gmail.com>
 <20150430113644.GC5663@ando.pearwood.info>
 <CAFpSVp+7iNGpsN8kOaO1oGVPUqWcxROJDhWKYVRtH3Ozzyo9-Q@mail.gmail.com>
 <20150501003551.GG5663@ando.pearwood.info> <5542CED2.6080108@gmail.com>
 <20150501010201.GL10248@stoneleaf.us> <5542D1E6.80307@gmail.com>
Message-ID: <CAP7+vJ+vEMnPq6XXtjYyvJwU_E7uq6jZH_N+GwdK+rViWdzMjg@mail.gmail.com>

On Thu, Apr 30, 2015 at 6:07 PM, Yury Selivanov <yselivanov.ml at gmail.com>
wrote:

> On 2015-04-30 9:02 PM, Ethan Furman wrote:
>
>> On 04/30, Yury Selivanov wrote:
>>
>>> On 2015-04-30 8:35 PM, Steven D'Aprano wrote:
>>>
>>>> I don't think it guarantees ordering in the sense I'm referring to. It
>>>> guarantees that the returned result will be [f(a), f(b), f(c), ...] in
>>>> that order, but not that f(a) will be calculated before f(b), which is
>>>> calculated before f(c), ... and so on. That's the point of parallelism:
>>>> if f(a) takes a long time to complete, another worker may have completed
>>>> f(b) in the meantime.
>>>>
>>> This is an *excellent* point.
>>>
>> So, PEP 492 async for also guarantees that the loop runs in order, one at
>> a time, with one loop finishing before the next one starts?
>>
>> *sigh*
>>
>> How disappointing.
>>
>>
>
> No.  Nothing prevents you from scheduling asynchronous
> parallel computation, or prefetching more data.  Since
> __anext__ returns an awaitable, you can do that.
>

That's not Ethan's point. The 'async for' statement indeed is a sequential
loop: e.g. if you write

  async for rec in db_cursor:
      print(rec)

you are guaranteed that the records are printed in the order in which they
are produced by the database cursor. There is no implicit parallelism in
the execution of the loop bodies. Of course you can introduce parallelism,
but you have to be explicit about it, e.g. by calling some async function
for each record *without* awaiting the result, collecting the
awaitables in a separate list and then using e.g. the gather() operation
from the asyncio package:

  async def process_record(rec):
      print(rec)

  fs = []
  async for rec in db_cursor:
      fs.append(process_record(rec))
  await asyncio.gather(*fs)

This may print the records in arbitrary order. Note that unlike threads,
you don't need locks, since there is no worry about parallel access to
sys.stdout by print(). The print() function does not guarantee atomicity
when it writes to sys.stdout, and in a threaded version of the above code
you might occasionally see two records followed by two \n characters,
because threads can be arbitrarily interleaved. Task switching between
coroutines only happens at await (or yield [from] :-) and at the await
points specified by PEP 492 in the 'async for' and 'async with' statements.
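To see the difference concretely, here is a self-contained sketch of the
gather() pattern (simulated delays stand in for real database I/O; the
record numbers and the modern asyncio.run spelling are illustrative only):

```python
import asyncio

finished = []  # records in the order their bodies completed

async def process_record(rec):
    # Simulated per-record work; later records take less time,
    # so they tend to finish first.
    await asyncio.sleep((5 - rec) * 0.01)
    finished.append(rec)
    return rec

async def main():
    fs = []
    for rec in range(5):              # stand-in for the database cursor
        fs.append(process_record(rec))
    return await asyncio.gather(*fs)  # gathered in submission order

results = asyncio.run(main())
print(results)   # [0, 1, 2, 3, 4]: gather() preserves order
print(finished)  # often [4, 3, 2, 1, 0]: the bodies ran interleaved
```

gather() always hands back results in the order the awaitables were
submitted, even though their execution interleaves.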

-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150430/67a390c3/attachment.html>

From greg.ewing at canterbury.ac.nz  Fri May  1 07:24:39 2015
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 01 May 2015 17:24:39 +1200
Subject: [Python-ideas] PEP 492 terminology - (native) coroutine objects
In-Reply-To: <mhuc3o$2fm$1@ger.gmane.org>
References: <CAHVvXxTO6RBXZDu8aAzNWSzaZq-Ppf=v1Q65_t6LAJxGbDF5_A@mail.gmail.com>
 <mhuc3o$2fm$1@ger.gmane.org>
Message-ID: <55430E17.7030404@canterbury.ac.nz>

Ron Adam wrote:

> A waiter?
> or awaiter?
> 
> As in a-wait-ing an awaiter.

The waiter would be the function executing the await
operator, not the thing it's operating on.

In a restaurant, waiters wait on customers. But calling
an awaitable object a "customer" doesn't seem right
at all.

-- 
Greg

From ram at rachum.com  Fri May  1 10:12:19 2015
From: ram at rachum.com (Ram Rachum)
Date: Fri, 1 May 2015 11:12:19 +0300
Subject: [Python-ideas] Add `Executor.filter`
Message-ID: <CANXboVbmZNDUp8PCqDwh_DpWrr-zAgt2SWA15hHMovY+rRRxoQ@mail.gmail.com>

Hi,

What do you think about adding a method: `Executor.filter`?

I was using something like this:

my_things = [thing for thing in things if some_condition(thing)]


But the problem was that `some_condition` took a long time to run waiting
on I/O, which is a great candidate for parallelizing with
ThreadPoolExecutor. I made it work using `Executor.map` and some
improvising, but it would be nicer if I could do:

with concurrent.futures.ThreadPoolExecutor(100) as executor:
    my_things = executor.filter(some_condition, things)

And have the condition run in parallel on all the threads.
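For illustration, here is a rough sketch of how such a filter could be
layered on top of the existing Executor.map (the helper name is
hypothetical, not an existing API):

```python
import concurrent.futures

def executor_filter(executor, predicate, iterable):
    # Hypothetical helper: evaluate the predicate in parallel via
    # Executor.map, then keep the items whose predicate was true.
    items = list(iterable)
    flags = executor.map(predicate, items)
    return [item for item, keep in zip(items, flags) if keep]

with concurrent.futures.ThreadPoolExecutor(4) as executor:
    evens = executor_filter(executor, lambda x: x % 2 == 0, range(10))

print(evens)  # [0, 2, 4, 6, 8]
```

(With ProcessPoolExecutor the predicate would need to be a picklable
module-level function rather than a lambda.)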

What do you think?


Thanks,
Ram.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150501/f94856cb/attachment-0001.html>

From abarnert at yahoo.com  Fri May  1 13:13:41 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Fri, 1 May 2015 04:13:41 -0700
Subject: [Python-ideas] More general "for" loop handling
In-Reply-To: <CACac1F-8cqYZ3ah73Bq4DRYf40qAzJkNVksLeWrRhqrZmp_jiw@mail.gmail.com>
References: <CAFpSVpJ-6NjNS6-1T0Di7WFhNgNwPRo0CFp_o+H5WmWsLftchQ@mail.gmail.com>
 <20150430113644.GC5663@ando.pearwood.info> <mhtjpb$r9$1@ger.gmane.org>
 <CACac1F9EH7apMRpwLMbnaZV0zpcpzT-b1hPk3M5QxG=Fvq9NbQ@mail.gmail.com>
 <CAFpSVpKrJWsBnXK44YaER8E8mK6E45vE08MZc81vOLGquw_TOw@mail.gmail.com>
 <CAP7+vJL7G_bN-KknPYbe-R-RZ6kGob1LtOXQWvb30AC8jjw=cA@mail.gmail.com>
 <CACac1F-8cqYZ3ah73Bq4DRYf40qAzJkNVksLeWrRhqrZmp_jiw@mail.gmail.com>
Message-ID: <A5101A7D-35FD-489E-9B1E-36A938291DC4@yahoo.com>

On Apr 30, 2015, at 10:54, Paul Moore <p.f.moore at gmail.com> wrote:
> 
>> On 30 April 2015 at 18:31, Guido van Rossum <guido at python.org> wrote:
>> PEP 492 is only meant to make code easier to read and write that's already
>> written to use coroutines (e.g. using the asyncio library, but not limited
>> to that).
> 
> OK, that's fair. To an outsider like me it feels like a lot of new
> syntax to support a very specific use case. But that's because I don't
> really have a feel for what you mean when you note "but not limited to
> that". Are there any good examples or use cases for coroutines that
> are *not* asyncio-based?

IIRC, the original asyncio PEP has links to Greg Ewing's posts that demonstrated how you could use yield from coroutines for various purposes, including asynchronous I/O, but also things like many-actor simulations, with pretty detailed examples.

> And assuming you are saying that PEP 492
> should help for those as well, could it include a non-asyncio example?
> My immediate reaction is that the keywords "async" and "await" will
> seem a little odd in a non-asyncio context.
> 
> Paul
> 
> Paul
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From abarnert at yahoo.com  Fri May  1 13:19:16 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Fri, 1 May 2015 04:19:16 -0700
Subject: [Python-ideas] More general "for" loop handling
In-Reply-To: <20150501003551.GG5663@ando.pearwood.info>
References: <CAFpSVpJ-6NjNS6-1T0Di7WFhNgNwPRo0CFp_o+H5WmWsLftchQ@mail.gmail.com>
 <20150430113644.GC5663@ando.pearwood.info>
 <CAFpSVp+7iNGpsN8kOaO1oGVPUqWcxROJDhWKYVRtH3Ozzyo9-Q@mail.gmail.com>
 <20150501003551.GG5663@ando.pearwood.info>
Message-ID: <C4E88FB6-A1ED-4669-813A-73BD0DBC2DF2@yahoo.com>

On Apr 30, 2015, at 17:35, Steven D'Aprano <steve at pearwood.info> wrote:
> 
>> On Thu, Apr 30, 2015 at 07:12:11PM +0200, Todd wrote:
>> On Thu, Apr 30, 2015 at 1:36 PM, Steven D'Aprano <steve at pearwood.info>
>> wrote:
> 
>>> A parallel version of map makes sense, because the semantics of map are
>>> well defined: given a function f and a sequence [a, b, c, ...] it
>>> creates a new sequence [f(a), f(b), f(c), ...]. The assumption is that f
>>> is a pure-function which is side-effect free (if it isn't, you're going
>>> to have a bad time). The specific order in which a, b, c etc. are
>>> processed doesn't matter. If it does matter, then map is the wrong way
>>> to process it.
>> multiprocessing.Pool.map guarantees ordering.  It is
>> multiprocessing.Pool.imap_unordered that doesn't.
> 
> I don't think it guarantees ordering in the sense I'm referring to. It 
> guarantees that the returned result will be [f(a), f(b), f(c), ...] in 
> that order, but not that f(a) will be calculated before f(b), which is 
> calculated before f(c), ... and so on. That's the point of parallelism: 
> if f(a) takes a long time to complete, another worker may have completed 
> f(b) in the meantime.
> 
> The point I am making is that map() doesn't have any connotations of the 
> order of execution, whereas for loops have a very strong connotation of 
> executing the block in a specific sequence. People don't tend to use map 
> with a function with side-effects:
> 
>    map(lambda i: print(i) or i, range(100))
> 
> will return [0, 1, 2, ..., 99] but it may not print 0 1 2 3 ... in that 
> order. But with a for-loop, it would be quite surprising if
> 
>   for i in range(100):
>       print(i)
> 
> printed the values out of order. In my opinion, sticking "mypool" in 
> front of the "for i" doesn't change the fact that adding parallelism to 
> a for loop would be surprising and hard to reason about.
> 
> If you still wish to argue for this, one thing which may help your case 
> is if you can identify other programming languages that have already 
> done something similar.

The obvious thing to look at here seems to be OpenMP's parallel for. I haven't used it in a long time, but IIRC, in the C bindings, you use it something like:

    #pragma omp parallel for
    for (int i=0; i!=100; ++i) {
        lots_of_work(i);
    }

... and it turns it into something like:

    for (int i=0; i!=100; ++i) {
        queue_put(current_team_queue, /* task wrapping the loop body for i */);
    }
    queue_wait(current_team_queue, 100);

> 
> -- 
> Steve
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From abarnert at yahoo.com  Fri May  1 13:24:55 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Fri, 1 May 2015 04:24:55 -0700
Subject: [Python-ideas] PEP 492 terminology - (native) coroutine objects
In-Reply-To: <55430E17.7030404@canterbury.ac.nz>
References: <CAHVvXxTO6RBXZDu8aAzNWSzaZq-Ppf=v1Q65_t6LAJxGbDF5_A@mail.gmail.com>
 <mhuc3o$2fm$1@ger.gmane.org> <55430E17.7030404@canterbury.ac.nz>
Message-ID: <FF145B05-126C-4542-9348-5453B71CC9C4@yahoo.com>

On Apr 30, 2015, at 22:24, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> 
> Ron Adam wrote:
> 
>> A waiter?
>> or awaiter?
>> As in a-wait-ing an awaiter.
> 
> The waiter would be the function executing the await
> operator, not the thing it's operating on.
> 
> In a restaurant, waiters wait on customers. But calling
> an awaitable object a "customer" doesn't seem right
> at all.

Well, the only thing in the restaurant besides the waiter and the customers is the Vikings, so I guess the restaurant metaphor doesn't work...

Anyway, if I understand the problem, the main confusion is that we use "coroutine" both to mean a thing that can be suspended and resumed, and a function that returns such a thing. Why not just "coroutine" and "coroutine function", just as with "generator" and "generator function"?

If the issue is that there are other things that are coroutines besides the coroutine type... well, there are plenty of things that are iterators that are all of unrelated types, and has anyone ever been confused by that? (Of course people have been confused by iterator vs. iterable, but that's a different issue, and one that doesn't have a parallel here.)


> -- 
> Greg
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From oscar.j.benjamin at gmail.com  Fri May  1 13:49:44 2015
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Fri, 1 May 2015 12:49:44 +0100
Subject: [Python-ideas] PEP 492 terminology - (native) coroutine objects
In-Reply-To: <FF145B05-126C-4542-9348-5453B71CC9C4@yahoo.com>
References: <CAHVvXxTO6RBXZDu8aAzNWSzaZq-Ppf=v1Q65_t6LAJxGbDF5_A@mail.gmail.com>
 <mhuc3o$2fm$1@ger.gmane.org> <55430E17.7030404@canterbury.ac.nz>
 <FF145B05-126C-4542-9348-5453B71CC9C4@yahoo.com>
Message-ID: <CAHVvXxTSGtrtF6oCH9wbBkO_OUC48zHFNrY7YW=FyYcFgyayzA@mail.gmail.com>

On 1 May 2015 at 12:24, Andrew Barnert via Python-ideas
<python-ideas at python.org> wrote:
> Anyway, if I understand the problem, the main confusion is that we use "coroutine" both to mean a thing that can be suspended and resumed, and a function that returns such a thing. Why not just "coroutine" and "coroutine function", just as with "generator" and "generator function"?

That's the terminology in the asyncio docs I guess:
https://docs.python.org/3/library/asyncio-task.html#coroutine
... except that there it is referring to decorated generator functions.

That feels like a category error to me because coroutines are a
generalisation of functions, so if anything is the coroutine itself
it is the async def function rather than the object it returns. But
I guess that's what's already being used.

> If the issue is that there are other things that are coroutines besides the coroutine type... well, there are plenty of things that are iterators that are all of unrelated types, and has anyone ever been confused by that? (Of course people have been confused by iterator vs. iterable, but that's a different issue, and one that doesn't have a parallel here.)

There is no concrete "iterator" type. The use of iterator as a type is
explicitly intended to refer to a range of different types of objects
analogous to using an interface in Java.

The PEP proposes at the same time that the word coroutine should be
both a generic term for objects exposing a certain interface and also
the term for a specific language construct: the function resulting
from an async def statement.

So if I say that something is a "coroutine" it's really not clear what
that means. It could mean an asyncio.coroutine generator function,
it could mean an async def function, or it could mean both. Worse, it
could mean the object returned by either of those types of functions.


--
Oscar

From stefan_ml at behnel.de  Fri May  1 13:58:29 2015
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 01 May 2015 13:58:29 +0200
Subject: [Python-ideas] More general "for" loop handling
In-Reply-To: <CAP7+vJL7G_bN-KknPYbe-R-RZ6kGob1LtOXQWvb30AC8jjw=cA@mail.gmail.com>
References: <CAFpSVpJ-6NjNS6-1T0Di7WFhNgNwPRo0CFp_o+H5WmWsLftchQ@mail.gmail.com>
 <20150430113644.GC5663@ando.pearwood.info> <mhtjpb$r9$1@ger.gmane.org>
 <CACac1F9EH7apMRpwLMbnaZV0zpcpzT-b1hPk3M5QxG=Fvq9NbQ@mail.gmail.com>
 <CAFpSVpKrJWsBnXK44YaER8E8mK6E45vE08MZc81vOLGquw_TOw@mail.gmail.com>
 <CAP7+vJL7G_bN-KknPYbe-R-RZ6kGob1LtOXQWvb30AC8jjw=cA@mail.gmail.com>
Message-ID: <mhvpp5$lgc$1@ger.gmane.org>

Guido van Rossum schrieb am 30.04.2015 um 19:31:
> But 'async for' is not meant to introduce parallelism or concurrency.

Well, the fact that it's not *meant* for that doesn't mean you can't use it
for that. It allows an iterator (name it coroutine if you want) to suspend
and return control to the outer caller to wait for the next item. What the
caller does in order to get that item is completely up to itself. It could
be called "asyncio" and do some I/O in order to get data, but it can
equally well be a multi-threading setup that grabs data from a queue
connected to a pool of threads.

Granted, this implies an inversion of control in that it's the caller that
provides the thread-pool and not the user, but it's not like it's
unprecedented to work with a 'global' pool of pre-instantiated threads (or
processes, for that matter) in order to avoid startup overhead.
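A minimal sketch of that inversion of control, assuming a thread pool
feeding an async iterator via run_in_executor (all the names here are
made up for illustration):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

def blocking_fetch(i):
    # Stand-in for blocking work done on a worker thread.
    return i * i

class ThreadBackedAIter:
    # An async iterator whose items are produced by a thread pool;
    # the 'async for' consumer suspends while a worker computes.
    def __init__(self, pool, n):
        self._pool = pool
        self._indices = iter(range(n))

    def __aiter__(self):
        return self

    async def __anext__(self):
        try:
            i = next(self._indices)
        except StopIteration:
            raise StopAsyncIteration
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(self._pool, blocking_fetch, i)

async def main():
    out = []
    with ThreadPoolExecutor(4) as pool:
        async for value in ThreadBackedAIter(pool, 5):
            out.append(value)
    return out

results = asyncio.run(main())
print(results)  # [0, 1, 4, 9, 16]
```

The event loop never blocks: each __anext__ awaits a future that a
worker thread fulfils, which is exactly the "caller provides the pool"
arrangement described above.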

Stefan



From steve at pearwood.info  Fri May  1 14:22:22 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 1 May 2015 22:22:22 +1000
Subject: [Python-ideas] PEP 492 terminology - (native) coroutine objects
In-Reply-To: <CAHVvXxTSGtrtF6oCH9wbBkO_OUC48zHFNrY7YW=FyYcFgyayzA@mail.gmail.com>
References: <CAHVvXxTO6RBXZDu8aAzNWSzaZq-Ppf=v1Q65_t6LAJxGbDF5_A@mail.gmail.com>
 <mhuc3o$2fm$1@ger.gmane.org> <55430E17.7030404@canterbury.ac.nz>
 <FF145B05-126C-4542-9348-5453B71CC9C4@yahoo.com>
 <CAHVvXxTSGtrtF6oCH9wbBkO_OUC48zHFNrY7YW=FyYcFgyayzA@mail.gmail.com>
Message-ID: <20150501122222.GK5663@ando.pearwood.info>

On Fri, May 01, 2015 at 12:49:44PM +0100, Oscar Benjamin wrote:

> So if I say that something is a "coroutine" it's really not clear what
> that means. It could mean an an asyncio.coroutine generator function,
> it could mean an async def function or it could mean both. Worse it
> could mean the object returned by either of those types of functions.

I'm sympathetic to your concerns, and I raised a similar issue earlier.

But it's not entirely without precedent. We already use "generator" to 
mean both the generator-function and the generator-iterator returned 
from the generator-function. We use "decorator" to mean both the 
function and the @ syntax. Sometimes we distinguish between classes and 
objects (instances), sometimes we say that classes are objects, and 
sometimes we say that classes are instances of the metaclass. "Method" 
can refer to either the function object inside a class or the method 
instance after the descriptor protocol has run.

And of course, once you start comparing terms from multiple languages, 
the whole thing just gets worse (contrast what Haskell considers a 
functor with what C++ considers a functor).

It's regrettable when language is ambiguous, but sometimes a little bit 
of ambiguity is the lesser of the evils. Human beings are usually good 
at interpreting it given sufficient context and understanding.

If there is no good alternative to coroutine, we'll need some good 
documentation to disambiguate the meanings.



-- 
Steve

From guido at python.org  Fri May  1 17:08:06 2015
From: guido at python.org (Guido van Rossum)
Date: Fri, 1 May 2015 08:08:06 -0700
Subject: [Python-ideas] Add `Executor.filter`
In-Reply-To: <CANXboVbmZNDUp8PCqDwh_DpWrr-zAgt2SWA15hHMovY+rRRxoQ@mail.gmail.com>
References: <CANXboVbmZNDUp8PCqDwh_DpWrr-zAgt2SWA15hHMovY+rRRxoQ@mail.gmail.com>
Message-ID: <CAP7+vJL9aosyOkVHVc8PoJQJNbvDnfhyk-enAj2KsPAXpHOfng@mail.gmail.com>

Sounds like it should be an easy patch. Of course, it needs to work for
ProcessPoolExecutor too.

On Fri, May 1, 2015 at 1:12 AM, Ram Rachum <ram at rachum.com> wrote:

> Hi,
>
> What do you think about adding a method: `Executor.filter`?
>
> I was using something like this:
>
> my_things = [thing for thing in things if some_condition(thing)]
>
>
> But the problem was that `some_condition` took a long time to run waiting
> on I/O, which is a great candidate for parallelizing with
> ThreadPoolExecutor. I made it work using `Executor.map` and some
> improvising, but it would be nicer if I could do:
>
> with concurrent.futures.ThreadPoolExecutor(100) as executor:
>     my_things = executor.filter(some_condition, things)
>
> And have the condition run in parallel on all the threads.
>
> What do you think?
>
>
> Thanks,
> Ram.
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>



-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150501/7ab7e091/attachment.html>

From guido at python.org  Fri May  1 17:12:58 2015
From: guido at python.org (Guido van Rossum)
Date: Fri, 1 May 2015 08:12:58 -0700
Subject: [Python-ideas] More general "for" loop handling
In-Reply-To: <A5101A7D-35FD-489E-9B1E-36A938291DC4@yahoo.com>
References: <CAFpSVpJ-6NjNS6-1T0Di7WFhNgNwPRo0CFp_o+H5WmWsLftchQ@mail.gmail.com>
 <20150430113644.GC5663@ando.pearwood.info> <mhtjpb$r9$1@ger.gmane.org>
 <CACac1F9EH7apMRpwLMbnaZV0zpcpzT-b1hPk3M5QxG=Fvq9NbQ@mail.gmail.com>
 <CAFpSVpKrJWsBnXK44YaER8E8mK6E45vE08MZc81vOLGquw_TOw@mail.gmail.com>
 <CAP7+vJL7G_bN-KknPYbe-R-RZ6kGob1LtOXQWvb30AC8jjw=cA@mail.gmail.com>
 <CACac1F-8cqYZ3ah73Bq4DRYf40qAzJkNVksLeWrRhqrZmp_jiw@mail.gmail.com>
 <A5101A7D-35FD-489E-9B1E-36A938291DC4@yahoo.com>
Message-ID: <CAP7+vJ+FMc9=E9OJ8yj28OiSvrcmwwSNhnFb4iPysDaZH95e=g@mail.gmail.com>

On Fri, May 1, 2015 at 4:13 AM, Andrew Barnert <abarnert at yahoo.com> wrote:

> IIRC, the original asyncio PEP has links to Greg Ewing's posts that
> demonstrated how you could use yield from coroutines for various purposes,
> including asynchronous I/O, but also things like many-actor simulations,
> with pretty detailed examples.


http://www.cosc.canterbury.ac.nz/greg.ewing/python/yield-from/yield_from.html

It has two small examples of *generator iterators* that can be nicely
refactored using yield-from (no need to switch to async there), but the
only meaty example using a trampoline is a scheduler for multiplexed I/O.

-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150501/7663ea5b/attachment.html>

From ram at rachum.com  Fri May  1 17:13:35 2015
From: ram at rachum.com (Ram Rachum)
Date: Fri, 1 May 2015 18:13:35 +0300
Subject: [Python-ideas] Add `Executor.filter`
In-Reply-To: <CAP7+vJL9aosyOkVHVc8PoJQJNbvDnfhyk-enAj2KsPAXpHOfng@mail.gmail.com>
References: <CANXboVbmZNDUp8PCqDwh_DpWrr-zAgt2SWA15hHMovY+rRRxoQ@mail.gmail.com>
 <CAP7+vJL9aosyOkVHVc8PoJQJNbvDnfhyk-enAj2KsPAXpHOfng@mail.gmail.com>
Message-ID: <CANXboVYd64d=pmOX7q6x0pBvSSmh-jLq5hymZ5C6DY-jEi99tg@mail.gmail.com>

I envisioned it being implemented directly on `Executor`, so it'll
automatically apply to all executor types. (I'll be happy to write the
implementation if we have a general feeling that this is a desired feature.)

On Fri, May 1, 2015 at 6:08 PM, Guido van Rossum <guido at python.org> wrote:

> Sounds like it should be an easy patch. Of course, it needs to work for
> ProcessPoolExecutor too.
>
> On Fri, May 1, 2015 at 1:12 AM, Ram Rachum <ram at rachum.com> wrote:
>
>> Hi,
>>
>> What do you think about adding a method: `Executor.filter`?
>>
>> I was using something like this:
>>
>> my_things = [thing for thing in things if some_condition(thing)]
>>
>>
>> But the problem was that `some_condition` took a long time to run waiting
>> on I/O, which is a great candidate for parallelizing with
>> ThreadPoolExecutor. I made it work using `Executor.map` and some
>> improvising, but it would be nicer if I could do:
>>
>> with concurrent.futures.ThreadPoolExecutor(100) as executor:
>>     my_things = executor.filter(some_condition, things)
>>
>> And have the condition run in parallel on all the threads.
>>
>> What do you think?
>>
>>
>> Thanks,
>> Ram.
>>
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>
>
>
> --
> --Guido van Rossum (python.org/~guido)
>

From techtonik at gmail.com  Fri May  1 11:25:51 2015
From: techtonik at gmail.com (anatoly techtonik)
Date: Fri, 1 May 2015 12:25:51 +0300
Subject: [Python-ideas] Top 10 Python modules that need a redesign Was:
 Geo coordinates conversion in stdlib
In-Reply-To: <CAPTjJmqK3xhL-yi9F4e+B3zkORsp6rMFXQS=Y0byVeGXnjaG9A@mail.gmail.com>
References: <CAP7h-xY_YwX3jDNxpNi3LgH6f47EDAU1zeeUBjXzK4L2b2WN2Q@mail.gmail.com>
 <CAPkN8xLx9zZXn6P0X+PpJ8va5fK5coBcD3==iJ3esVkK+e4UyQ@mail.gmail.com>
 <CAP7h-xZihNbWvcFtH-GyNJNS2patEp-FSOZ3c5ChZxhKDazoSQ@mail.gmail.com>
 <CAPkN8xLTiaz7V_-TFXJVnJOp9SUbHAbq2kxk50+8MkbK4PD2BQ@mail.gmail.com>
 <CAP7h-xYp9tX+8xS5W9w3s5xcWjFcmmaj_q4_771ttCgrxh1zQg@mail.gmail.com>
 <CAPkN8x+KFjNE4zwy2bT+7h0GiYMFcuad+38rTd0GxfqqNniVdA@mail.gmail.com>
 <CAPTjJmqK3xhL-yi9F4e+B3zkORsp6rMFXQS=Y0byVeGXnjaG9A@mail.gmail.com>
Message-ID: <CAPkN8xLw7qG72K4-FUvSJ5sPm-menmG6+60=8ONiKmUnNUF6gA@mail.gmail.com>

On Sat, Apr 4, 2015 at 7:18 AM, Chris Angelico <rosuav at gmail.com> wrote:

> On Fri, Apr 3, 2015 at 7:33 PM, anatoly techtonik <techtonik at gmail.com>
> wrote:
> > Author is me, so you can ask directly. Why I didn't propose to redesign?
> > Because people will assume that somebody will need to write PEP and will
> > force me to write one. I don't believe in "redesign by specification"
> like
> > current PEP process assumes and people accuse me of being lazy and
> trolling
> > them, because I don't want to write the PEPs. Damn, I believe in
> iterative
> > development and evolution, and I failed to persuade coredevs that
> practices
> > digged up by people under the "agile" label is not some sort of corporate
> > bullshit. So it is not my problem now. I did all I am capable of.
>
> Why, exactly, is it that you don't want to author a PEP? Is it because
> you don't have the time to devote to chairing the discussion and all?
>

I don't have the time, and limited energy for such discussions. Switching to
a discussion requires unloading all other information, remembering the
last point, and tracking what people think. If you come back to the
discussion a few days later (because you didn't have time), it takes even
more time to refresh your picture of its state. This is highly inefficient.
Expanding on that below..


> If so, you could quite possibly persuade someone else to. I'd be
> willing to take on the job; convince me that your core idea is worth
> pursuing (and make clear to me precisely what your core idea is), and
> I could do the grunt-work of writing. But you say that you "don't
> *believe in*" the process, which suggests a more philosophical
> objection. What's the issue, here? Why are you holding back from such
> a plan? *cue the troll music*
>

I don't believe in the process, right. I need data. How many people
actually read the PEPs through to the end? How many say that they fully
support the PEP decision? How many people read the diffs after they've
read the PEP and can validate that none of their previous use cases
were broken? I assume the answer is none. That's my belief, but I'd be
happy to see data that proves me wrong.

I also don't believe in the PEP process, because I can't even validate my
own use cases against the layout of information proposed by a PEP.
A PEP is a compression and optimization of various use cases into a
verbal form that is easy to implement, but not easy to understand or to
argue design decisions from, especially ones that seem not well thought
out because of the flawed process above.

I also have problems reading specifications without diagrams, and with
drawing concepts on a virtual canvas in my head. I also find some stuff
in PEPs confusing, but there is no channel like StackOverflow on which
to ask questions about design decisions. Maybe I am just a poor reader,
but that is the reality. I'd prefer a cookbook approach to the PEP approach.


> There are many Pythons in the world. You can't just hack on CPython
> and expect everything to follow on from there. Someone has to explain
> to the Jython folks what they'll have to do to be compatible. Someone
> has to write something up so MicroPython can run the same code that
> CPython does. Someone, somewhere, has to be able to ensure that
> Brython users aren't caught out by your proposed change. PEPs provide
> that. (They also provide useful pointers for the "What's New" lists,
> eg PEP 441.)
>
> So, are you proposing a change to Python? Then propose it.
>

The concept of a "proposal" is completely fine. But the form is dated and
ineffective. And I can't deal with people who are afraid of new concepts
and can't see the rationale behind buzzwords like agile, story, roadmap,
and user experience. These are all de-facto tools of the new generation,
and if somebody prefers to ride the steam engine, I don't mind, but I
personally don't have the lifetime to move so slowly.

From techtonik at gmail.com  Fri May  1 11:44:09 2015
From: techtonik at gmail.com (anatoly techtonik)
Date: Fri, 1 May 2015 12:44:09 +0300
Subject: [Python-ideas] Top 10 Python modules that need a redesign Was:
 Geo coordinates conversion in stdlib
In-Reply-To: <mfpaam$uak$1@ger.gmane.org>
References: <CAP7h-xY_YwX3jDNxpNi3LgH6f47EDAU1zeeUBjXzK4L2b2WN2Q@mail.gmail.com>
 <CAPkN8xLx9zZXn6P0X+PpJ8va5fK5coBcD3==iJ3esVkK+e4UyQ@mail.gmail.com>
 <CAP7h-xZihNbWvcFtH-GyNJNS2patEp-FSOZ3c5ChZxhKDazoSQ@mail.gmail.com>
 <CAPkN8xLTiaz7V_-TFXJVnJOp9SUbHAbq2kxk50+8MkbK4PD2BQ@mail.gmail.com>
 <CAP7h-xYp9tX+8xS5W9w3s5xcWjFcmmaj_q4_771ttCgrxh1zQg@mail.gmail.com>
 <CAPkN8x+KFjNE4zwy2bT+7h0GiYMFcuad+38rTd0GxfqqNniVdA@mail.gmail.com>
 <CAPTjJmqK3xhL-yi9F4e+B3zkORsp6rMFXQS=Y0byVeGXnjaG9A@mail.gmail.com>
 <mfpaam$uak$1@ger.gmane.org>
Message-ID: <CAPkN8xL0aRhzvKwk8RPZQZdV=WL2TGnCxHf4hBW5dX3THAxoAw@mail.gmail.com>

On Sat, Apr 4, 2015 at 9:25 PM, Mark Lawrence <breamoreboy at yahoo.co.uk>
wrote:

> On 04/04/2015 05:18, Chris Angelico wrote:
>
>> On Fri, Apr 3, 2015 at 7:33 PM, anatoly techtonik <techtonik at gmail.com>
>> wrote:
>>
>>> Author is me, so you can ask directly. Why I didn't propose to redesign?
>>> Because people will assume that somebody will need to write PEP and will
>>> force me to write one. I don't believe in "redesign by specification"
>>> like
>>> current PEP process assumes and people accuse me of being lazy and
>>> trolling
>>> them, because I don't want to write the PEPs. Damn, I believe in
>>> iterative
>>> development and evolution, and I failed to persuade coredevs that
>>> practices
>>> digged up by people under the "agile" label is not some sort of corporate
>>> bullshit. So it is not my problem now. I did all I am capable of.
>>>
>>
>> Why, exactly, is it that you don't want to author a PEP? Is it because
>> you don't have the time to devote to chairing the discussion and all?
>> If so, you could quite possibly persuade someone else to. I'd be
>> willing to take on the job; convince me that your core idea is worth
>> pursuing (and make clear to me precisely what your core idea is), and
>> I could do the grunt-work of writing. But you say that you "don't
>> *believe in*" the process, which suggests a more philosophical
>> objection. What's the issue, here? Why are you holding back from such
>> a plan? *cue the troll music*
>>
>> There are many Pythons in the world. You can't just hack on CPython
>> and expect everything to follow on from there. Someone has to explain
>> to the Jython folks what they'll have to do to be compatible. Someone
>> has to write something up so MicroPython can run the same code that
>> CPython does. Someone, somewhere, has to be able to ensure that
>> Brython users aren't caught out by your proposed change. PEPs provide
>> that. (They also provide useful pointers for the "What's New" lists,
>> eg PEP 441.)
>>
>> So, are you proposing a change to Python? Then propose it.
>>
>> ChrisA
>>
>>
> I don't understand why people bother with this gentleman.  All talk, no
> action, but expects others to do his bidding.  I would say "Please go take
> a running jump", but that would get me into trouble with the CoC
> aficionados, so I won't.
>

What action can I take if I point out that the CLA is invalid, and nobody
answers my call? I don't agree that people should sign it without
understanding the content in detail, and I got banned for saying so. I sent
a few patches to the tracker, but what's the point if people are afraid to
apply even the doc fixes? Instead of obeying the orders of copyright
lawyers from the paper age, the role of any Internet community is to
understand and guard its own interests and protect its way of doing things.
Instead of that, the community just places a roadblock, because "lawyers
know better".

Anti-offtopic. If you want to see what I do, and want to enable some of
the big things that could come up in the future, please help resolve this
issue with Jinja2, Python 2 and setdefaultencoding utf-8 -
http://issues.roundup-tracker.org/issue2550811 - just, as a core developer,
send us a patch that we should commit to make Roundup work with Jinja2
again.

This is key to adding a "modules" field to the tracker to track patches
submitted for different modules (using modstats.py from
https://bitbucket.org/techtonik/python-stdlib) and to splitting the work
between different interested parties. It is also key to lowering the
barrier to entry, by removing the need to learn XML and TAL for designers
who want to experiment with the Python tracker to add features, like
marking modules that need a redesign.
-- 
anatoly t.

From joseph.martinot-lagarde at m4x.org  Fri May  1 17:52:32 2015
From: joseph.martinot-lagarde at m4x.org (Joseph Martinot-Lagarde)
Date: Fri, 01 May 2015 17:52:32 +0200
Subject: [Python-ideas] More general "for" loop handling
In-Reply-To: <20150501003551.GG5663@ando.pearwood.info>
References: <CAFpSVpJ-6NjNS6-1T0Di7WFhNgNwPRo0CFp_o+H5WmWsLftchQ@mail.gmail.com>
 <20150430113644.GC5663@ando.pearwood.info>
 <CAFpSVp+7iNGpsN8kOaO1oGVPUqWcxROJDhWKYVRtH3Ozzyo9-Q@mail.gmail.com>
 <20150501003551.GG5663@ando.pearwood.info>
Message-ID: <5543A140.6010406@m4x.org>

Le 01/05/2015 02:35, Steven D'Aprano a écrit :
>
> If you still wish to argue for this, one thing which may help your case
> is if you can identify other programming languages that have already
> done something similar.
>
>
Cython has prange. It replaces range() in the for loop but runs the loop 
body in parallel using openmp:

from cython.parallel import prange

cdef int func(Py_ssize_t n):
     cdef Py_ssize_t i

     for i in prange(n, nogil=True):
         if i == 8:
             with gil:
                 raise Exception()
         elif i == 4:
             break
         elif i == 2:
             return i

This is an example from the cython documentation: 
http://docs.cython.org/src/userguide/parallelism.html

Joseph


From guido at python.org  Fri May  1 18:56:04 2015
From: guido at python.org (Guido van Rossum)
Date: Fri, 1 May 2015 09:56:04 -0700
Subject: [Python-ideas] More general "for" loop handling
In-Reply-To: <5543A140.6010406@m4x.org>
References: <CAFpSVpJ-6NjNS6-1T0Di7WFhNgNwPRo0CFp_o+H5WmWsLftchQ@mail.gmail.com>
 <20150430113644.GC5663@ando.pearwood.info>
 <CAFpSVp+7iNGpsN8kOaO1oGVPUqWcxROJDhWKYVRtH3Ozzyo9-Q@mail.gmail.com>
 <20150501003551.GG5663@ando.pearwood.info> <5543A140.6010406@m4x.org>
Message-ID: <CAP7+vJJ1-3R4zy+fwifpRU0QnXnV_0Xuou-8UknTvLr_eAYORQ@mail.gmail.com>

On Fri, May 1, 2015 at 8:52 AM, Joseph Martinot-Lagarde <
joseph.martinot-lagarde at m4x.org> wrote:

> Le 01/05/2015 02:35, Steven D'Aprano a écrit :
>
>>
>> If you still wish to argue for this, one thing which may help your case
>> is if you can identify other programming languages that have already
>> done something similar.
>>
>>
>>  Cython has prange. It replaces range() in the for loop but runs the loop
> body in parallel using openmp:
>
> from cython.parallel import prange
>
> cdef int func(Py_ssize_t n):
>     cdef Py_ssize_t i
>
>     for i in prange(n, nogil=True):
>         if i == 8:
>             with gil:
>                 raise Exception()
>         elif i == 4:
>             break
>         elif i == 2:
>             return i
>
> This is an example from the cython documentation:
> http://docs.cython.org/src/userguide/parallelism.html
>

Interesting. I'm trying to imagine how this could be implemented in CPython
by turning the for-loop body into a coroutine. It would be a complicated
transformation because of the interaction with local variables in the code
surrounding the for-loop. Perhaps the compiler could mark all such
variables as implicitly nonlocal. The Cython example also shows other
interesting issues -- what should return or break do?

In any case, I don't want this idea to distract the PEP 492 discussion --
it's a much thornier problem, and maybe coroutine concurrency isn't what we
should be after here -- the use cases here seem to be true (GIL-free)
parallelism. I'm imagining that pyparallel has already solved this (if it
has solved anything :-).
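
(For the record, a rough approximation of that transformation is already
possible today without new syntax, by factoring the loop body into a
function and letting a pool's map distribute it. This is only a sketch
with made-up names; it sidesteps the implicitly-nonlocal and break/return
questions entirely.)

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical desugaring of "parallel for i in range(8): <body>":
# the loop body becomes a function, and the pool's map distributes it.
def _body(i):
    return i * i  # stand-in for the loop body

with ThreadPoolExecutor(max_workers=4) as pool:
    # map blocks until all items are done; results keep iteration order
    results = list(pool.map(_body, range(8)))
```

Reads of surrounding locals work via the closure; it's writes, break, and
return that have no obvious meaning here.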

-- 
--Guido van Rossum (python.org/~guido)

From ron3200 at gmail.com  Fri May  1 19:03:47 2015
From: ron3200 at gmail.com (Ron Adam)
Date: Fri, 01 May 2015 13:03:47 -0400
Subject: [Python-ideas] PEP 492 terminology - (native) coroutine objects
In-Reply-To: <55430E17.7030404@canterbury.ac.nz>
References: <CAHVvXxTO6RBXZDu8aAzNWSzaZq-Ppf=v1Q65_t6LAJxGbDF5_A@mail.gmail.com>
 <mhuc3o$2fm$1@ger.gmane.org> <55430E17.7030404@canterbury.ac.nz>
Message-ID: <mi0blk$22r$1@ger.gmane.org>



On 05/01/2015 01:24 AM, Greg Ewing wrote:
> Ron Adam wrote:
>
>> A waiter?
>> or awaiter?
>>
>> As in a-wait-ing an awaiter.
>
> The waiter would be the function executing the await
> operator, not the thing it's operating on.

> In a restaurant, waiters wait on customers. But calling
> an awaitable object a "customer" doesn't seem right
> at all.

Guido has been using "awaitable" over in python-dev.  Let's see how that
works...

In a restaurant, a waiter serves food.  An awaitable is a specific kind of 
waiter: one that may wait for other waiters to serve their tables' food 
before serving yours, even though your order may have been placed before 
the other table's order was taken.  Each awaitable serves only one table, 
and never takes orders from or serves food to any other table.

In a normal Python restaurant without awaitables, each waiter must take 
your order and serve your food before any other waiter can take an order 
and serve its own customer's food.

The consumer is the caller of the expression. We can think of restaurant 
tables as function frames. The "await" keyword here just makes sure a 
waiter is qualified to serve food in this async restaurant.  We don't want 
the Vikings serving food, do we. ;-)

Of course someone needs to get the tables filled.  That's where the maitre 
d' comes in.  He uses an "async for", or "async with", statement to fill 
all the tables with customers and keep them happy.

That's not perfect, but I think it gets the general concepts right and 
makes them easier to think about.  (At least for me.)
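
To pin the terminology to code, here is a minimal sketch on a PEP 492
style Python (3.5 or later; `asyncio.run` needs 3.7+), with names of my
own choosing:

```python
import asyncio
import inspect

async def af():          # af is the "coroutine function"
    return 42

coro = af()              # calling it returns the awaitable coroutine object
assert inspect.iscoroutinefunction(af)
assert inspect.iscoroutine(coro)

result = asyncio.run(coro)  # the event loop is what finally awaits it
```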


Cheers,
    Ron

From joseph.martinot-lagarde at m4x.org  Fri May  1 20:52:20 2015
From: joseph.martinot-lagarde at m4x.org (Joseph Martinot-Lagarde)
Date: Fri, 01 May 2015 20:52:20 +0200
Subject: [Python-ideas] More general "for" loop handling
In-Reply-To: <CAP7+vJJ1-3R4zy+fwifpRU0QnXnV_0Xuou-8UknTvLr_eAYORQ@mail.gmail.com>
References: <CAFpSVpJ-6NjNS6-1T0Di7WFhNgNwPRo0CFp_o+H5WmWsLftchQ@mail.gmail.com>
 <20150430113644.GC5663@ando.pearwood.info>
 <CAFpSVp+7iNGpsN8kOaO1oGVPUqWcxROJDhWKYVRtH3Ozzyo9-Q@mail.gmail.com>
 <20150501003551.GG5663@ando.pearwood.info> <5543A140.6010406@m4x.org>
 <CAP7+vJJ1-3R4zy+fwifpRU0QnXnV_0Xuou-8UknTvLr_eAYORQ@mail.gmail.com>
Message-ID: <5543CB64.8000802@m4x.org>

Le 01/05/2015 18:56, Guido van Rossum a écrit :
> On Fri, May 1, 2015 at 8:52 AM, Joseph Martinot-Lagarde
> <joseph.martinot-lagarde at m4x.org
> <mailto:joseph.martinot-lagarde at m4x.org>> wrote:
>
>     Le 01/05/2015 02:35, Steven D'Aprano a écrit :
>
>
>         If you still wish to argue for this, one thing which may help
>         your case
>         is if you can identify other programming languages that have already
>         done something similar.
>
>
>     Cython has prange. It replaces range() in the for loop but runs the
>     loop body in parallel using openmp:
>
>     from cython.parallel import prange
>
>     cdef int func(Py_ssize_t n):
>          cdef Py_ssize_t i
>
>          for i in prange(n, nogil=True):
>              if i == 8:
>                  with gil:
>                      raise Exception()
>              elif i == 4:
>                  break
>              elif i == 2:
>                  return i
>
>     This is an example from the cython documentation:
>     http://docs.cython.org/src/userguide/parallelism.html
>
>
> Interesting. I'm trying to imagine how this could be implemented in
> CPython by turning the for-loop body into a coroutine. It would be a
> complicated transformation because of the interaction with local
> variables in the code surrounding the for-loop. Perhaps the compiler
> could mark all such variables as implicitly nonlocal. The Cython example
> also shows other interesting issues -- what should return or break do?

About return and break in cython, there is a section in the documentation:

"For prange() this means that the loop body is skipped after the first 
break, return or exception for any subsequent iteration in any thread. 
It is undefined which value shall be returned if multiple different 
values may be returned, as the iterations are in no particular order."

>
> In any case, I don't want this idea to distract the PEP 492 discussion
> -- it's a much thornier problem, and maybe coroutine concurrency isn't
> what we should be after here -- the use cases here seem to be true
> (GIL-free) parallelism. I'm imagining that pyparallel has already solved
> this (if it has solved anything :-).
>
> --
> --Guido van Rossum (python.org/~guido <http://python.org/~guido>)
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>



From abarnert at yahoo.com  Sat May  2 00:39:21 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Fri, 1 May 2015 15:39:21 -0700
Subject: [Python-ideas] Add `Executor.filter`
In-Reply-To: <CANXboVYd64d=pmOX7q6x0pBvSSmh-jLq5hymZ5C6DY-jEi99tg@mail.gmail.com>
References: <CANXboVbmZNDUp8PCqDwh_DpWrr-zAgt2SWA15hHMovY+rRRxoQ@mail.gmail.com>
 <CAP7+vJL9aosyOkVHVc8PoJQJNbvDnfhyk-enAj2KsPAXpHOfng@mail.gmail.com>
 <CANXboVYd64d=pmOX7q6x0pBvSSmh-jLq5hymZ5C6DY-jEi99tg@mail.gmail.com>
Message-ID: <B48915FA-76CC-46FE-BC4A-05B56D15CF43@yahoo.com>

On May 1, 2015, at 08:13, Ram Rachum <ram at rachum.com> wrote:
> 
> I envisioned it being implemented directly on `Executor`, so it'll automatically apply to all executor types. (I'll be happy to write the implementation if we have a general feeling that this is a desired feature.)

I'd say just write it if you want it. If it turns out to be so trivial everyone decides it's unnecessary to add, you've only wasted 10 minutes. If it turns out to be tricky enough to take more time, that in itself will be a great argument that it should be added so users don't screw it up themselves. 

Plus, of course, even if it gets rejected, you'll have the code you want for your own project. :)

> 
>> On Fri, May 1, 2015 at 6:08 PM, Guido van Rossum <guido at python.org> wrote:
>> Sounds like should be an easy patch. Of course, needs to work for ProcessPoolExecutor too.
>> 
>>> On Fri, May 1, 2015 at 1:12 AM, Ram Rachum <ram at rachum.com> wrote:
>>> Hi,
>>> 
>>> What do you think about adding a method: `Executor.filter`?
>>> 
>>> I was using something like this: 
>>> 
>>> my_things = [thing for thing in things if some_condition(thing)]
>>> 
>>> But the problem was that `some_condition` took a long time to run waiting on I/O, which is a great candidate for parallelizing with ThreadPoolExecutor. I made it work using `Executor.map` and some improvizing, but it would be nicer if I could do:
>>> 
>>> with concurrent.futures.ThreadPoolExecutor(100) as executor:
>>>     my_things = executor.filter(some_condition, things)
>>> 
>>> And have the condition run in parallel on all the threads.
>>> 
>>> What do you think? 
>>> 
>>> 
>>> Thanks,
>>> Ram.
>>> 
>>> _______________________________________________
>>> Python-ideas mailing list
>>> Python-ideas at python.org
>>> https://mail.python.org/mailman/listinfo/python-ideas
>>> Code of Conduct: http://python.org/psf/codeofconduct/
>> 
>> 
>> 
>> -- 
>> --Guido van Rossum (python.org/~guido)
> 
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From abarnert at yahoo.com  Sat May  2 00:52:26 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Fri, 1 May 2015 15:52:26 -0700
Subject: [Python-ideas] PEP 492 terminology - (native) coroutine objects
In-Reply-To: <CAHVvXxTSGtrtF6oCH9wbBkO_OUC48zHFNrY7YW=FyYcFgyayzA@mail.gmail.com>
References: <CAHVvXxTO6RBXZDu8aAzNWSzaZq-Ppf=v1Q65_t6LAJxGbDF5_A@mail.gmail.com>
 <mhuc3o$2fm$1@ger.gmane.org> <55430E17.7030404@canterbury.ac.nz>
 <FF145B05-126C-4542-9348-5453B71CC9C4@yahoo.com>
 <CAHVvXxTSGtrtF6oCH9wbBkO_OUC48zHFNrY7YW=FyYcFgyayzA@mail.gmail.com>
Message-ID: <47F4D662-5A3C-4781-850A-3BACE6D65A5B@yahoo.com>

On May 1, 2015, at 04:49, Oscar Benjamin <oscar.j.benjamin at gmail.com> wrote:
> 
> On 1 May 2015 at 12:24, Andrew Barnert via Python-ideas
> <python-ideas at python.org> wrote:
>> Anyway, if I understand the problem, the main confusion is that we use "coroutine" both to mean a thing that can be suspended and resumed, and a function that returns such a thing. Why not just "coroutine" and "coroutine function", just as with "generator" and "generator function".
> 
> That's the terminology in the asyncio docs I guess:
> https://docs.python.org/3/library/asyncio-task.html#coroutine
> ... except that there it is referring to decorated generator functions.
> 
> That feels like a category error to me because coroutines are a
> generalisation of functions, so if anything is the coroutine itself then
> it is the async def function rather than the object it returns, but I
> guess that's what's already being used.

I agree with this last point, and the "cofunction" terminology handled that better...

In practice, I don't think this kind of thing usually causes much of a problem. For example, when first learning Swift, you have to learn that an iterator isn't really an iterator, it's a generalized index, but within the first day you've already forgotten the issue and you're just using iterators. It's no worse than switching back and forth between Swift and C++, which both have things that are reasonably accurately called "iterators" but nevertheless work completely differently.

But maybe the best thing to do here is look at the terminology used in the F# papers (which I think introduced the await/async idea), and then see if the same terminology is used in practice in more widespread languages like C# that borrowed the idea, and if so just go with that. Even if it's wrong, it'll be the same wrong that everyone else is learning, and if we don't have something clearly better...

>> If the issue is that there are other things that are coroutines besides the coroutine type... well, there are plenty of things that are iterators that are all of unrelated types, and has anyone ever been confused by that? (Of course people have been confused by iterator vs. iterable, but that's a different issue, and one that doesn't have a parallel here.)
> 
> There is no concrete "iterator" type. The use of iterator as a type is
> explicitly intended to refer to a range of different types of objects
> analogous to using an interface in Java.
> 
> The PEP proposes at the same time that the word coroutine should be
> both a generic term for objects exposing a certain interface and also
> the term for a specific language construct: the function resulting
> from an async def statement.
> 
> So if I say that something is a "coroutine" it's really not clear what
> that means. It could mean an an asyncio.coroutine generator function,
> it could mean an async def function or it could mean both. Worse it
> could mean the object returned by either of those types of functions.
> 
> 
> --
> Oscar
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From ram at rachum.com  Sat May  2 11:25:30 2015
From: ram at rachum.com (Ram Rachum)
Date: Sat, 2 May 2015 12:25:30 +0300
Subject: [Python-ideas] Add `Executor.filter`
In-Reply-To: <B48915FA-76CC-46FE-BC4A-05B56D15CF43@yahoo.com>
References: <CANXboVbmZNDUp8PCqDwh_DpWrr-zAgt2SWA15hHMovY+rRRxoQ@mail.gmail.com>
 <CAP7+vJL9aosyOkVHVc8PoJQJNbvDnfhyk-enAj2KsPAXpHOfng@mail.gmail.com>
 <CANXboVYd64d=pmOX7q6x0pBvSSmh-jLq5hymZ5C6DY-jEi99tg@mail.gmail.com>
 <B48915FA-76CC-46FE-BC4A-05B56D15CF43@yahoo.com>
Message-ID: <CANXboVagtQcE_NMjYMeHEO_xazBigRHPT9Uo1NNQUUybMcJFgg@mail.gmail.com>

Okay, I implemented it. Might be getting something wrong because I've never
worked with the internals of this module before. See attached file for a
demonstration, and here's the code for just the method:

    def filter(self, fn, iterable, timeout=None):

        if timeout is not None:
            end_time = timeout + time.time()

        items_and_futures = [
            (item, self.submit(fn, item)) for item in iterable
        ]

        # Yield must be hidden in closure so that the futures are submitted
        # before the first iterator value is required.
        def result_iterator():
            try:
                for item, future in items_and_futures:
                    if timeout is None:
                        result = future.result()
                    else:
                        result = future.result(end_time - time.time())
                    if result:
                        yield item
            finally:
                for _, future in items_and_futures:
                    future.cancel()
        return result_iterator()


On Sat, May 2, 2015 at 1:39 AM, Andrew Barnert <abarnert at yahoo.com> wrote:

> On May 1, 2015, at 08:13, Ram Rachum <ram at rachum.com> wrote:
>
> I envisioned it being implemented directly on `Executor`, so it'll
> automatically apply to all executor types. (I'll be happy to write the
> implementation if we have a general feeling that this is a desired feature.)
>
>
> I'd say just write it if you want it. If it turns out to be so trivial
> everyone decides it's unnecessary to add, you've only wasted 10 minutes. If
> it turns out to be tricky enough to take more time, that in itself will be
> a great argument that it should be added so users don't screw it up
> themselves.
>
> Plus, of course, even if it gets rejected, you'll have the code you want
> for your own project. :)
>
>
> On Fri, May 1, 2015 at 6:08 PM, Guido van Rossum <guido at python.org> wrote:
>
>> Sounds like should be an easy patch. Of course, needs to work for
>> ProcessPoolExecutor too.
>>
>> On Fri, May 1, 2015 at 1:12 AM, Ram Rachum <ram at rachum.com> wrote:
>>
>>> Hi,
>>>
>>> What do you think about adding a method: `Executor.filter`?
>>>
>>> I was using something like this:
>>>
>>> my_things = [thing for thing in things if some_condition(thing)]
>>>
>>>
>>> But the problem was that `some_condition` took a long time to run
>>> waiting on I/O, which is a great candidate for parallelizing with
>>> ThreadPoolExecutor. I made it work using `Executor.map` and some
>>> improvizing, but it would be nicer if I could do:
>>>
>>> with concurrent.futures.ThreadPoolExecutor(100) as executor:
>>>     my_things = executor.filter(some_condition, things)
>>>
>>> And have the condition run in parallel on all the threads.
>>>
>>> What do you think?
>>>
>>>
>>> Thanks,
>>> Ram.
>>>
>>> _______________________________________________
>>> Python-ideas mailing list
>>> Python-ideas at python.org
>>> https://mail.python.org/mailman/listinfo/python-ideas
>>> Code of Conduct: http://python.org/psf/codeofconduct/
>>>
>>
>>
>>
>> --
>> --Guido van Rossum (python.org/~guido)
>>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
>
-------------- next part --------------
import concurrent.futures
import time
import requests

class NiceExecutorMixin:
    def filter(self, fn, iterable, timeout=None):
        
        if timeout is not None:
            end_time = timeout + time.time()

        items_and_futures = [
            (item, self.submit(fn, item)) for item in iterable
        ]

        # Yield must be hidden in closure so that the futures are submitted
        # before the first iterator value is required.
        def result_iterator():
            try:
                for item, future in items_and_futures:
                    if timeout is None:
                        result = future.result()
                    else:
                        result = future.result(end_time - time.time())
                    if result:
                        yield item
            finally:
                for _, future in items_and_futures:
                    future.cancel()
        return result_iterator()


        
class MyThreadPoolExecutor(NiceExecutorMixin,
                           concurrent.futures.ThreadPoolExecutor):
    pass

def has_wikipedia_page(name):
    response = requests.get(
        'http://en.wikipedia.org/wiki/%s' % name.replace(' ', '_')
    )
    return response.status_code == 200
    

if __name__ == '__main__':
    
    people = (
        'Barack Obama', 'Shimon Peres', 'Justin Bieber',
        'Some guy I saw on the street', 'Steve Buscemi',
        'My first-grade teacher', 'Gandhi'
    )
    people_who_have_wikipedia_pages = (
        'Barack Obama', 'Shimon Peres', 'Justin Bieber', 'Steve Buscemi',
        'Gandhi'
    )
    # assert tuple(filter(has_wikipedia_page, people_who_have_wikipedia_pages)) \
                                             # == people_who_have_wikipedia_pages
    with MyThreadPoolExecutor(100) as executor:
        executor_filter_result = tuple(
            executor.filter(has_wikipedia_page, people)
        )
        print(executor_filter_result)
        assert executor_filter_result == people_who_have_wikipedia_pages
        

From stephen at xemacs.org  Sat May  2 15:30:44 2015
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 02 May 2015 22:30:44 +0900
Subject: [Python-ideas] More general "for" loop handling
In-Reply-To: <CACac1F9EH7apMRpwLMbnaZV0zpcpzT-b1hPk3M5QxG=Fvq9NbQ@mail.gmail.com>
References: <CAFpSVpJ-6NjNS6-1T0Di7WFhNgNwPRo0CFp_o+H5WmWsLftchQ@mail.gmail.com>
 <20150430113644.GC5663@ando.pearwood.info>
 <mhtjpb$r9$1@ger.gmane.org>
 <CACac1F9EH7apMRpwLMbnaZV0zpcpzT-b1hPk3M5QxG=Fvq9NbQ@mail.gmail.com>
Message-ID: <87y4l7nm2j.fsf@uwakimon.sk.tsukuba.ac.jp>

Paul Moore writes:

 >     mypool for item in items:
 >         do_something_here
 >         do_something_else
 >         do_yet_another_thing
 > 
 > I'm assuming the OP's intention (it's certainly mine) is that
 > the "mypool for" loop works something like
 > 
 >     def _work(item):
 >         do_something_here
 >         do_something_else
 >         do_yet_another_thing
 >     for _ in mypool.map(_work, items):
 >         # Wait for the subprocesses
 >         pass

I would think that given a pool of processors, the pool's .map method
itself would implement the distribution.  In fact the Pool ABC would
probably provide several variations on the map method (e.g., a mapreduce
implementation, a map-to-list implementation, and a map-is-generator
implementation) depending on the treatment of the results of the _work
computation (if any).

I don't see a need for syntax here.
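
For what it's worth, the existing stdlib pools already work this way; a
minimal sketch of the map-to-list variation (`_work` is a made-up stand-in
for the loop body, and multiprocessing.dummy gives a thread-backed Pool
with the same API as multiprocessing.Pool):

```python
from multiprocessing.dummy import Pool  # thread-backed Pool, same API

def _work(item):
    # stand-in for the loop body (do_something_here, etc.)
    return item * item

with Pool(4) as pool:
    # .map distributes the calls over the pool and collects the
    # results into a list, in input order
    results = pool.map(_work, range(5))

print(results)
```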

Aside: Doesn't the "Wait for the subprocesses" belong outside the for
suite?

From abarnert at yahoo.com  Sat May  2 18:16:33 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sat, 2 May 2015 09:16:33 -0700
Subject: [Python-ideas] More general "for" loop handling
In-Reply-To: <5543A140.6010406@m4x.org>
References: <CAFpSVpJ-6NjNS6-1T0Di7WFhNgNwPRo0CFp_o+H5WmWsLftchQ@mail.gmail.com>
 <20150430113644.GC5663@ando.pearwood.info>
 <CAFpSVp+7iNGpsN8kOaO1oGVPUqWcxROJDhWKYVRtH3Ozzyo9-Q@mail.gmail.com>
 <20150501003551.GG5663@ando.pearwood.info> <5543A140.6010406@m4x.org>
Message-ID: <D09D8B33-66B1-4870-A29E-CB07AD424DF1@yahoo.com>

On May 1, 2015, at 08:52, Joseph Martinot-Lagarde <joseph.martinot-lagarde at m4x.org> wrote:
> 
> Le 01/05/2015 02:35, Steven D'Aprano a écrit :
>> 
>> If you still wish to argue for this, one thing which may help your case
>> is if you can identify other programming languages that have already
>> done something similar.
> Cython has prange. It replaces range() in the for loop but runs the loop body in parallel using openmp:

I think that's pretty good evidence that this proposal (I meant the syntax for loop modifiers, not "some way to do loops in parallel would be nice") isn't needed. What OpenMP does with loop modifier syntax, Cython can do with just a special iterator in normal Python syntax.

Of course that doesn't guarantee that something similar to prange could be built for Python 3.5's Pool, Executor, etc. types without changes, but even if it can't, a change to the iterator protocol to make prange buildable doesn't seem as disruptive as a change to the basic syntax of the for loop. (Unless there just is no reasonable change to the protocol that could work.)

> from cython.parallel import prange
> 
> cdef int func(Py_ssize_t n):
>    cdef Py_ssize_t i
> 
>    for i in prange(n, nogil=True):
>        if i == 8:
>            with gil:
>                raise Exception()
>        elif i == 4:
>            break
>        elif i == 2:
>            return i
> 
> This is an example from the cython documentation: http://docs.cython.org/src/userguide/parallelism.html
> 
> Joseph
> 
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From ron3200 at gmail.com  Sat May  2 18:45:17 2015
From: ron3200 at gmail.com (Ron Adam)
Date: Sat, 02 May 2015 12:45:17 -0400
Subject: [Python-ideas] awaiting ... was Re: More general "for" loop handling
In-Reply-To: <5542D1E6.80307@gmail.com>
References: <CAFpSVpJ-6NjNS6-1T0Di7WFhNgNwPRo0CFp_o+H5WmWsLftchQ@mail.gmail.com>
 <20150430113644.GC5663@ando.pearwood.info>
 <CAFpSVp+7iNGpsN8kOaO1oGVPUqWcxROJDhWKYVRtH3Ozzyo9-Q@mail.gmail.com>
 <20150501003551.GG5663@ando.pearwood.info> <5542CED2.6080108@gmail.com>
 <20150501010201.GL10248@stoneleaf.us> <5542D1E6.80307@gmail.com>
Message-ID: <mi2uuu$5mo$1@ger.gmane.org>



On 04/30/2015 09:07 PM, Yury Selivanov wrote:
> On 2015-04-30 9:02 PM, Ethan Furman wrote:
>> On 04/30, Yury Selivanov wrote:
>>> On 2015-04-30 8:35 PM, Steven D'Aprano wrote:
>>>> I don't think it guarantees ordering in the sense I'm referring to. It
>>>> guarantees that the returned result will be [f(a), f(b), f(c), ...] in
>>>> that order, but not that f(a) will be calculated before f(b), which is
>>>> calculated before f(c), ... and so on. That's the point of parallelism:
>>>> if f(a) takes a long time to complete, another worker may have completed
>>>> f(b) in the meantime.
>>> This is an *excellent* point.
>> So, PEP 492 asynch for also guarantees that the loop runs in order, one at
>> a time, with one loop finishing before the next one starts?
>>
>> *sigh*
>>
>> How disappointing.

> No.  Nothing prevents you from scheduling asynchronous
> parallel computation, or prefetching more data.  Since
> __anext__ is an awaitable you can do that.
>
> Steven's point is that Todd's proposal isn't that
> straightforward to apply.

Initialising several coroutines at once still doesn't seem clear/clean to 
me.  Or maybe I'm just not getting that part yet.

Here is what I would like. :-)

values = awaiting [awaitable, awaitable, ...]
a, b, ... = awaiting (awaitable, awaitable, ...)

This doesn't have the issues of order because a list of values is returned 
in the same order as the awaitables.  But the awaitables are scheduled in 
parallel.

A regular for loop could still do these in order, but would pause when it 
gets to a value that hasn't returned/resolved yet.  That would probably 
be expected.

Awaiting sets would be different... they are unordered.  So we can use a 
set and get the items that become available as they become available...

     for x in awaiting {awaitable, awaitable, ...}:
         print(x)

x would print in an arbitrary order, but that would be what I would expect 
here. :-)

The body could have await calls in it, and so it could cooperate along with 
the awaiting set.  Of course if it's only a few statements, that probably 
wouldn't make much difference.

This seems like it's both explicit and simple to think about.  It also 
seems like it might not be that hard to do, I think most of the parts are 
already worked out.

One option is to allow await to work with iterables in this way.  But the 
awaiting keyword would make the code clearer and error messages nicer.
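
For comparison, both shapes have rough equivalents in asyncio as it exists
today (a sketch; `square` is a made-up awaitable, the ordered case maps to
gather(), and the unordered one to as_completed()):

```python
import asyncio

async def square(x):
    await asyncio.sleep(0)  # cooperate with the event loop
    return x * x

async def main():
    # roughly `values = awaiting [square(2), square(3), square(4)]`:
    # scheduled in parallel, results in the order of the awaitables
    values = await asyncio.gather(square(2), square(3), square(4))

    # roughly `for x in awaiting {square(5), square(6)}`:
    # results arrive in whatever order they complete
    unordered = set()
    for fut in asyncio.as_completed([square(5), square(6)]):
        unordered.add(await fut)
    return values, unordered

values, unordered = asyncio.run(main())
print(values, unordered)
```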

Cheers,
    Ron

From ron3200 at gmail.com  Sat May  2 20:12:51 2015
From: ron3200 at gmail.com (Ron Adam)
Date: Sat, 02 May 2015 14:12:51 -0400
Subject: [Python-ideas] awaiting iterables
Message-ID: <mi3434$jig$1@ger.gmane.org>


(I had posted this in the "more general 'for' loop" thread, but this really 
is a different idea from that.)

Initialising several coroutines at once still doesn't seem clear/clean to me.

Here is what I would like.

values = awaiting [awaitable, awaitable, ...]
a, b, ... = awaiting (awaitable, awaitable, ...)

This doesn't have the issues of order because a list of values is returned 
in the same order as the awaitables.  But the awaitables are scheduled in 
parallel.

A regular for loop could still do these in order, but would pause when it 
gets to a value that hasn't returned/resolved yet.  That would probably be 
expected.

     for x in awaiting [awaitable, awaitable, ...]:
         print(x)

X is printed in the order of the awaitables.


Awaiting sets would be different... they are unordered.  So we could use a 
set and get the items that become available as they become available...

     for x in awaiting {awaitable, awaitable, ...}:
         print(x)

x would print in an arbitrary order, but that would be what I would expect 
here.

The body could have await calls in it, and so it could cooperate along with 
the awaiting set of awaitables.  Of course if the for body is only a few 
statements, that probably wouldn't make much difference.

This seems like it's both explicit and simple to think about.  It also 
seems like it might not be that hard to do, I think most of the parts are 
already worked out.

One option is to allow await to work with iterables in this way.  But the 
awaiting keyword would make the code clearer and error messages nicer.

The last piece of the puzzle is how to specify the current coroutine 
manager/runner.

    import asyncio
    with asyncio.coroutine_loop():
        main()

That seems simple enough.  It pretty much abstracts out all the 
coroutine-specific stuff to three keywords: async, await, and awaiting.

Are async for and async with needed if we have awaiting?  Can they be 
implemented in terms of awaiting?


import asyncio

async def factorial(name, number):
     f = 1
     for i in range(2, number+1):
         print("Task %s: Compute factorial(%s)..." % (name, i))
         await yielding()
         f *= i
     print("Task %s: factorial(%s) = %s" % (name, number, f))

with asyncio.coroutine_loop():
     awaiting [
         factorial("A", 2),
         factorial("B", 3),
         factorial("C", 4)]


Compared to the example in asyncio docs...


import asyncio

@asyncio.coroutine
def factorial(name, number):
     f = 1
     for i in range(2, number+1):
         print("Task %s: Compute factorial(%s)..." % (name, i))
         yield from asyncio.sleep(1)
         f *= i
     print("Task %s: factorial(%s) = %s" % (name, number, f))

loop = asyncio.get_event_loop()
tasks = [
     asyncio.async(factorial("A", 2)),
     asyncio.async(factorial("B", 3)),
     asyncio.async(factorial("C", 4))]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()



Cheers,
    Ron


From piotr.jerzy.jurkiewicz at gmail.com  Sat May  2 23:24:58 2015
From: piotr.jerzy.jurkiewicz at gmail.com (Piotr Jurkiewicz)
Date: Sat, 02 May 2015 23:24:58 +0200
Subject: [Python-ideas] awaiting iterables
In-Reply-To: <mi3434$jig$1@ger.gmane.org>
References: <mi3434$jig$1@ger.gmane.org>
Message-ID: <554540AA.3080002@gmail.com>

There are three modes in which you can await multiple coroutines:
- iterate over results as they become ready
- await till all are done
- await till any is done

For example C# has helper functions WhenAll and WhenAny for that:

     await Task.WhenAll(tasks_list);
     await Task.WhenAny(tasks_list);

I can imagine a set of three functions being exposed to the user to 
control waiting for multiple coroutines:

asynctools.as_done()  # returns an asynchronous iterator for iterating 
over the results of coroutines as they complete

asynctools.all_done() # returns a future aggregating results from the 
given coroutine objects, which when awaited returns a list of results 
(like asyncio.gather())

asynctools.any_done() # returns a future, which when awaited returns the 
result of the first completed coroutine

Example:

     from asynctools import as_done, all_done, any_done

     corobj0 = async_sql_query("SELECT...")
     corobj1 = async_memcached_get("someid")
     corobj2 = async_http_get("http://python.org")

     # ------------------------------------------------

     # Iterate over results as coroutines complete
     # using async iterator

     await for result in as_done([corobj0, corobj1, corobj2]):
         print(result)

     # ------------------------------------------------

     # Await for results of all coroutines
     # using async iterator

     results = []
     await for result in as_done([corobj0, corobj1, corobj2]):
         results.append(result)

     # or using shorthand all_done()

     results = await all_done([corobj0, corobj1, corobj2])

     # ------------------------------------------------

     # Await for a result of first completed coroutine
     # using async iterator

     await for result in as_done([corobj0, corobj1, corobj2]):
         first_result = result
         break

     # or using shorthand any_done()

     first_result = await any_done([corobj0, corobj1, corobj2])
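
A rough sketch of how all_done() and any_done() could sit on top of
asyncio's existing primitives (the helper names are the proposal's, not a
real module, and `delayed` is a made-up coroutine used only for the demo):

```python
import asyncio

async def all_done(coros):
    # results in the order the coroutines were given (like asyncio.gather())
    return await asyncio.gather(*coros)

async def any_done(coros):
    # result of the first coroutine to finish; the others are cancelled
    tasks = [asyncio.ensure_future(c) for c in coros]
    done, pending = await asyncio.wait(
        tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    return done.pop().result()

async def demo():
    async def delayed(value, delay):
        await asyncio.sleep(delay)
        return value
    results = await all_done([delayed(1, 0), delayed(2, 0)])
    first = await any_done([delayed('fast', 0), delayed('slow', 1)])
    return results, first

results, first = asyncio.run(demo())
```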

Piotr

From guido at python.org  Sat May  2 23:29:59 2015
From: guido at python.org (Guido van Rossum)
Date: Sat, 2 May 2015 14:29:59 -0700
Subject: [Python-ideas] awaiting iterables
In-Reply-To: <554540AA.3080002@gmail.com>
References: <mi3434$jig$1@ger.gmane.org> <554540AA.3080002@gmail.com>
Message-ID: <CAP7+vJKDvR41WZxhOtWotXfac4rpihfa9j8K80XU3wkNa28j9g@mail.gmail.com>

The asyncio package already has this functionality; check out wait() (it
has various options), as_completed(), gather().

On Sat, May 2, 2015 at 2:24 PM, Piotr Jurkiewicz <
piotr.jerzy.jurkiewicz at gmail.com> wrote:

> There are three modes in which you can await multiple coroutines:
> - iterate over results as they become ready
> - await till all are done
> - await till any is done
>
> For example C# has helper functions WhenAll and WhenAny for that:
>
>     await Task.WhenAll(tasks_list);
>     await Task.WhenAny(tasks_list);
>
> I can imagine the set of three functions being exposed to user to control
> waiting for multiple coroutines:
>
> asynctools.as_done()  # returns asynchronous iterator for iterating over
> the results of coroutines as they complete
>
> asynctools.all_done() # returns a future aggregating results from the
> given coroutine objects, which awaited returns list of results (like
> asyncio.gather())
>
> asynctools.any_done() # returns a future, which awaited returns result of
> first completed coroutine
>
> Example:
>
>     from asynctools import as_done, all_done, any_done
>
>     corobj0 = async_sql_query("SELECT...")
>     corobj1 = async_memcached_get("someid")
>     corobj2 = async_http_get("http://python.org")
>
>     # ------------------------------------------------
>
>     # Iterate over results as coroutines complete
>     # using async iterator
>
>     await for result in as_done([corobj0, corobj1, corobj2]):
>         print(result)
>
>     # ------------------------------------------------
>
>     # Await for results of all coroutines
>     # using async iterator
>
>     results = []
>     await for result in as_done([corobj0, corobj1, corobj2]):
>         results.append(result)
>
>     # or using shorthand all_done()
>
>     results = await all_done([corobj0, corobj1, corobj2])
>
>     # ------------------------------------------------
>
>     # Await for a result of first completed coroutine
>     # using async iterator
>
>     await for result in as_done([corobj0, corobj1, corobj2]):
>         first_result = result
>         break
>
>     # or using shorthand any_done()
>
>     first_result = await any_done([corobj0, corobj1, corobj2])
>
> Piotr
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>



-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150502/f9f64b2f/attachment.html>

From piotr.jerzy.jurkiewicz at gmail.com  Sun May  3 00:18:05 2015
From: piotr.jerzy.jurkiewicz at gmail.com (Piotr Jurkiewicz)
Date: Sun, 03 May 2015 00:18:05 +0200
Subject: [Python-ideas] awaiting iterables
In-Reply-To: <CAP7+vJKDvR41WZxhOtWotXfac4rpihfa9j8K80XU3wkNa28j9g@mail.gmail.com>
References: <mi3434$jig$1@ger.gmane.org> <554540AA.3080002@gmail.com>
 <CAP7+vJKDvR41WZxhOtWotXfac4rpihfa9j8K80XU3wkNa28j9g@mail.gmail.com>
Message-ID: <55454D1D.3080500@gmail.com>

I know that. But the problem with wait() is that it returns Tasks, not 
their results directly. So the user has to unpack them manually.

Furthermore, after the introduction of `await`, its name will become 
problematic. It resembles `await` too much and can cause 
confusion. Its usage would result in an awkward 'await wait()'.

There is a function gather(*coros_or_futures) which returns a results list 
directly, like the function all_done() I proposed.

But there is no function gather_any(*coros_or_futures) to return just the 
result of the first completed coroutine. (One can achieve this with 
wait(return_when=FIRST_COMPLETED) but, as mentioned before, it does not 
return a result directly, so there is no symmetry with gather().)

The as_completed() function does indeed return an iterator over the futures 
as they complete, but it is not compatible with the 'async for' protocol 
proposed in PEP 492. So a new function has to be created anyway.

Therefore I deliberately placed these functions in a new asynctools 
module, not in the asyncio module: to emphasize that they are supposed 
to be used with the new-style coroutines, proposed in PEP 492.

I wanted to achieve simplicity (by returning results directly) and 
symmetry (all_done()/any_done()).

Piotr

On 2015-05-02 23:29, Guido van Rossum wrote:
> The asyncio package already has this functionality; check out wait() (it
> has various options), as_completed(), gather().

From guido at python.org  Sun May  3 02:27:51 2015
From: guido at python.org (Guido van Rossum)
Date: Sat, 2 May 2015 17:27:51 -0700
Subject: [Python-ideas] awaiting iterables
In-Reply-To: <55454D1D.3080500@gmail.com>
References: <mi3434$jig$1@ger.gmane.org> <554540AA.3080002@gmail.com>
 <CAP7+vJKDvR41WZxhOtWotXfac4rpihfa9j8K80XU3wkNa28j9g@mail.gmail.com>
 <55454D1D.3080500@gmail.com>
Message-ID: <CAP7+vJJJzhR5NK-dzQoC-Txcu32RVFrRFd0wY9ZEAYu0yF3LcQ@mail.gmail.com>

You can try to place these in a separate module, but in the end they still
depend on asyncio. You'll find out why when you try to implement any of
them. Don't dismiss the effort that went into asyncio too lightly.

On Sat, May 2, 2015 at 3:18 PM, Piotr Jurkiewicz <
piotr.jerzy.jurkiewicz at gmail.com> wrote:

> I know that. But the problem with wait() is that it returns Tasks, not
> their results directly. So user has to unpack them manually.
>
> Furthermore, after introduction of `await`, its name will become
> problematic. It will reassembles `await` too much and can cause a
> confusion. Its usage would result in an awkward 'await wait()'.
>
> There is a function gather(*coros_or_futures) which returns results list
> directly, like the function all_done() I proposed.
>
> But there is no function gather_any(*coros_or_futures), to return just a
> result of the first done coroutine. (One can achieve it with
> wait(return_when=FIRST_COMPLETED) but as mentioned before, it does not
> return a result directly, so there is no symmetry with gather())
>
> Function as_completed() returns indeed an iterator over the futures as
> they complete, but it is not compatible with the 'async for' protocol
> proposed in PEP 492. So new function has to be created anyway.
>
> Therefore I deliberately placed these functions in a new asynctools
> module, not in the asyncio module: to emphasize that they are supposed to
> be used with the new-style coroutines, proposed in PEP 492.
>
> I wanted to achieve simplicity (by returning results directly) and
> symmetry (all_done()/any_done()).
>
> Piotr
>
>
> On 2015-05-02 23:29, Guido van Rossum wrote:
>
>> The asyncio package already has this functionality; check out wait() (it
>> has various options), as_completed(), gather().
>>
>


-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150502/373be494/attachment.html>

From joseph.martinot-lagarde at m4x.org  Sun May  3 23:52:32 2015
From: joseph.martinot-lagarde at m4x.org (Joseph Martinot-Lagarde)
Date: Sun, 03 May 2015 23:52:32 +0200
Subject: [Python-ideas] More general "for" loop handling
In-Reply-To: <D09D8B33-66B1-4870-A29E-CB07AD424DF1@yahoo.com>
References: <CAFpSVpJ-6NjNS6-1T0Di7WFhNgNwPRo0CFp_o+H5WmWsLftchQ@mail.gmail.com>
 <20150430113644.GC5663@ando.pearwood.info>
 <CAFpSVp+7iNGpsN8kOaO1oGVPUqWcxROJDhWKYVRtH3Ozzyo9-Q@mail.gmail.com>
 <20150501003551.GG5663@ando.pearwood.info> <5543A140.6010406@m4x.org>
 <D09D8B33-66B1-4870-A29E-CB07AD424DF1@yahoo.com>
Message-ID: <554698A0.6030709@m4x.org>

Le 02/05/2015 18:16, Andrew Barnert via Python-ideas a écrit :
> On May 1, 2015, at 08:52, Joseph Martinot-Lagarde <joseph.martinot-lagarde at m4x.org> wrote:
>>
>> Le 01/05/2015 02:35, Steven D'Aprano a écrit :
>>>
>>> If you still wish to argue for this, one thing which may help your case
>>> is if you can identify other programming languages that have already
>>> done something similar.
>> Cython has prange. It replaces range() in the for loop but runs the loop body in parallel using openmp:
>
> I think that's pretty good evidence that this proposal (I meant the syntax for loop modifiers, not "some way to do loops in parallel would be nice") isn't needed. What OpenMP has to do with loop modifier syntax, Cython can do with just a special iterator in normal Python syntax.

Cython uses Python syntax but the behavior is different. This is 
especially obvious in how break and return are managed, where the 
difference is not only in the iterator.

>
> Of course that doesn't guarantee that something similar to prange could be built for Python 3.5's Pool, Executor, etc. types without changes, but if even if it can't, a change to the iterator protocol to make prange bulldable doesn't seem as disruptive as a change to the basic syntax of the for loop. (Unless there just is no reasonable change to the protocol that could work.)
>
>> from cython.parallel import prange
>>
>> cdef int func(Py_ssize_t n):
>>     cdef Py_ssize_t i
>>
>>     for i in prange(n, nogil=True):
>>         if i == 8:
>>             with gil:
>>                 raise Exception()
>>         elif i == 4:
>>             break
>>         elif i == 2:
>>             return i
>>
>> This is an example from the cython documentation: http://docs.cython.org/src/userguide/parallelism.html
>>
>> Joseph
>>
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>



From joseph.martinot-lagarde at m4x.org  Sun May  3 23:55:04 2015
From: joseph.martinot-lagarde at m4x.org (Joseph Martinot-Lagarde)
Date: Sun, 03 May 2015 23:55:04 +0200
Subject: [Python-ideas] More general "for" loop handling
In-Reply-To: <554698A0.6030709@m4x.org>
References: <CAFpSVpJ-6NjNS6-1T0Di7WFhNgNwPRo0CFp_o+H5WmWsLftchQ@mail.gmail.com>
 <20150430113644.GC5663@ando.pearwood.info>
 <CAFpSVp+7iNGpsN8kOaO1oGVPUqWcxROJDhWKYVRtH3Ozzyo9-Q@mail.gmail.com>
 <20150501003551.GG5663@ando.pearwood.info> <5543A140.6010406@m4x.org>
 <D09D8B33-66B1-4870-A29E-CB07AD424DF1@yahoo.com> <554698A0.6030709@m4x.org>
Message-ID: <55469938.6010501@m4x.org>

Le 03/05/2015 23:52, Joseph Martinot-Lagarde a écrit :
> Le 02/05/2015 18:16, Andrew Barnert via Python-ideas a écrit :
>> On May 1, 2015, at 08:52, Joseph Martinot-Lagarde
>> <joseph.martinot-lagarde at m4x.org> wrote:
>>>
>>> Le 01/05/2015 02:35, Steven D'Aprano a écrit :
>>>>
>>>> If you still wish to argue for this, one thing which may help your case
>>>> is if you can identify other programming languages that have already
>>>> done something similar.
>>> Cython has prange. It replaces range() in the for loop but runs the
>>> loop body in parallel using openmp:
>>
>> I think that's pretty good evidence that this proposal (I meant the
>> syntax for loop modifiers, not "some way to do loops in parallel would
>> be nice") isn't needed. What OpenMP has to do with loop modifier
>> syntax, Cython can do with just a special iterator in normal Python
>> syntax.
>
> Cython uses python syntax but the behavior is different. This is
> especially obvious seeing how break and return are managed, where the
> difference in not only in the iterator.
>
Sorry, ignore my last email. I agree that no new *syntax* is needed.
>>
>> Of course that doesn't guarantee that something similar to prange
>> could be built for Python 3.5's Pool, Executor, etc. types without
>> changes, but if even if it can't, a change to the iterator protocol to
>> make prange bulldable doesn't seem as disruptive as a change to the
>> basic syntax of the for loop. (Unless there just is no reasonable
>> change to the protocol that could work.)
>>
>>> from cython.parallel import prange
>>>
>>> cdef int func(Py_ssize_t n):
>>>     cdef Py_ssize_t i
>>>
>>>     for i in prange(n, nogil=True):
>>>         if i == 8:
>>>             with gil:
>>>                 raise Exception()
>>>         elif i == 4:
>>>             break
>>>         elif i == 2:
>>>             return i
>>>
>>> This is an example from the cython documentation:
>>> http://docs.cython.org/src/userguide/parallelism.html
>>>
>>> Joseph
>>>
>>> _______________________________________________
>>> Python-ideas mailing list
>>> Python-ideas at python.org
>>> https://mail.python.org/mailman/listinfo/python-ideas
>>> Code of Conduct: http://python.org/psf/codeofconduct/
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/



From storchaka at gmail.com  Mon May  4 10:15:47 2015
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Mon, 04 May 2015 11:15:47 +0300
Subject: [Python-ideas] Processing surrogates in
Message-ID: <mi79rj$vl8$1@ger.gmane.org>

Surrogate characters (U+D800-U+DFFF) are not allowed in Unicode, but 
Python allows them in Unicode strings for different purposes.

1) To represent UTF-8, UTF-16 or UTF-32 encoded strings that contain 
surrogate characters. This data can come from other programs, including 
Python 2.

2) To represent undecodable bytes in an ASCII-compatible encoding with the 
"surrogateescape" error handler.

So surrogate characters can be obtained from the "surrogateescape" or 
"surrogatepass" error handlers or created manually with chr() or %c. 
Some encodings (UTF-7, unicode-escape) also allow surrogate characters.

But on output the surrogate characters can cause failures.

In issue18814 I proposed several functions to work with surrogate and 
astral characters. All these functions take a string and return a string.

* rehandle_surrogatepass(string, errors)

Handles surrogate characters (U+D800-U+DFFF) with specified error 
handler. E.g.

   rehandle_surrogatepass('?\udcba', 'strict') -> error
   rehandle_surrogatepass('?\udcba', 'ignore') -> '?'
   rehandle_surrogatepass('?\udcba', 'replace') -> '?\ufffd'
   rehandle_surrogatepass('?\udcba', 'backslashreplace') -> '?\\udcba'

* rehandle_surrogateescape(string, errors)

Handles non-ASCII bytes encoded with surrogate characters in range 
U+DC80-U+DCFF with specified error handler. Surrogate characters outside 
of range U+DC80-U+DCFF cause error. E.g.

   rehandle_surrogateescape('?\udcba', 'strict') -> error
   rehandle_surrogateescape('?\udcba', 'ignore') -> '?'
   rehandle_surrogateescape('?\udcba', 'replace') -> '?\ufffd'
   rehandle_surrogateescape('?\udcba', 'backslashreplace') -> '?\\xba'

* handle_astrals(string, errors)

Handles non-BMP characters (U+10000-U+10FFFF) with specified error 
handler. E.g.

   handle_astrals('?\U00012345', 'strict') -> error
   handle_astrals('?\U00012345', 'ignore') -> '?'
   handle_astrals('?\U00012345', 'replace') -> '?\ufffd'
   handle_astrals('?\U00012345', 'backslashreplace') -> '?\\U00012345'

* decompose_astrals(string)

Converts non-BMP characters (U+10000-U+10FFFF) to surrogate pairs. E.g.

   decompose_astrals('?\U00012345') -> '?\ud808\udf45'

* compose_surrogate_pairs(string)

Converts surrogate pairs to non-BMP characters. E.g.

   compose_surrogate_pairs('?\ud808\udf45') -> '?\U00012345'

Function names are preliminary and discussable! Location (currently the 
codecs module) is discussable. Interface is discussable.

These functions revive UnicodeTranslateError, which is not currently used 
(but is handled by several error handlers).

The proposed patch provides a Python implementation in the codecs module, 
but after discussion I'll provide a much more efficient (O(1) in the best 
case) C implementation.
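
For the surrogateescape case, the proposed behavior can already be
approximated with the existing codec machinery; a pure-Python sketch of
rehandle_surrogateescape() (with the double conversion a C implementation
would avoid):

```python
def rehandle_surrogateescape(string, errors):
    # Recover the original undecodable bytes via the surrogateescape
    # handler, then re-decode them with the requested error handler.
    return string.encode('utf-8', 'surrogateescape').decode('utf-8', errors)

print(rehandle_surrogateescape('a\udcba', 'ignore'))           # drops the byte
print(rehandle_surrogateescape('a\udcba', 'backslashreplace')) # escapes the byte
```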


From python at mrabarnett.plus.com  Mon May  4 20:18:32 2015
From: python at mrabarnett.plus.com (MRAB)
Date: Mon, 04 May 2015 19:18:32 +0100
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <mi79rj$vl8$1@ger.gmane.org>
References: <mi79rj$vl8$1@ger.gmane.org>
Message-ID: <5547B7F8.8070405@mrabarnett.plus.com>

On 2015-05-04 09:15, Serhiy Storchaka wrote:
> Surrogate characters (U+D800-U+DFFF) are not allowed in Unicode, but
> Python allows them in Unicode strings for different purposes.
>
> 1) To represent UTF-8, UTF-16 or UTF-32 encoded strings that contain
> surrogate characters. This data can came from other programs, including
> Python 2.
>
> 2) To represent undecodable bytes in ASCII-compatible encoding with the
> "surrogateescape" error handlers.
>
> So surrogate characters can be obtained from "surrogateescape" or
> "surrogatepass" error handlers or created manually with chr() or %c.
> Some encodings (UTF-7, unicode-escape) also allows surrogate characters.
>
> But on output the surrogate characters can cause fail.
>
> In issue18814 proposed several functions to work with surrogate and
> astral characters. All these functions takes a string and returns a string.
>
> * rehandle_surrogatepass(string, errors)
>
> Handles surrogate characters (U+D800-U+DFFF) with specified error
> handler. E.g.
>
>     rehandle_surrogatepass('?\udcba', 'strict') -> error
>     rehandle_surrogatepass('?\udcba', 'ignore') -> '?'
>     rehandle_surrogatepass('?\udcba', 'replace') -> '?\ufffd'
>     rehandle_surrogatepass('?\udcba', 'backslashreplace') -> '?\\udcba'
>
> * rehandle_surrogateescape(string, errors)
>
> Handles non-ASCII bytes encoded with surrogate characters in range
> U+DC80-U+DCFF with specified error handler. Surrogate characters outside
> of range U+DC80-U+DCFF cause error. E.g.
>
>     rehandle_surrogateescape('?\udcba', 'strict') -> error
>     rehandle_surrogateescape('?\udcba', 'ignore') -> '?'
>     rehandle_surrogateescape('?\udcba', 'replace') -> '?\ufffd'
>     rehandle_surrogateescape('?\udcba', 'backslashreplace') -> '?\\xba'
>
It looks like the first 3 are the same as rehandle_surrogatepass, so
couldn't they be merged somehow?

     handle_surrogates('?\udcba', 'strict') -> error
     handle_surrogates('?\udcba', 'ignore') -> '?'
     handle_surrogates('?\udcba', 'replace') -> '?\ufffd'
     handle_surrogates('?\udcba', 'backslashreplace') -> '?\\udcba'
     handle_surrogates('?\udcba', 'surrogatereplace') -> '?\\xba'

> * handle_astrals(string, errors)
>
> Handles non-BMP characters (U+10000-U+10FFFF) with specified error
> handler. E.g.
>
>     handle_astrals('?\U00012345', 'strict') -> error
>     handle_astrals('?\U00012345', 'ignore') -> '?'
>     handle_astrals('?\U00012345', 'replace') -> '?\ufffd'
>     handle_astrals('?\U00012345', 'backslashreplace') -> '?\\U00012345'
>
> * decompose_astrals(string)
>
> Converts non-BMP characters (U+10000-U+10FFFF) to surrogate pairs. E.g.
>
>     decompose_astrals('?\U00012345') -> '?\ud808\udf45'
>
> * compose_surrogate_pairs(string)
>
> Converts surrogate pairs to non-BMP characters. E.g.
>
>     compose_surrogate_pairs('?\ud808\udf45') -> '?\U00012345'
>
Perhaps this should be called "compose_astrals".
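
For what it's worth, a rough pure-Python sketch of the composition half
(not the proposed patch), relying on the existing "surrogatepass" error
handler; note a *lone* surrogate makes the strict UTF-16 decode fail:

```python
def compose_surrogate_pairs(string):
    # "surrogatepass" lets the surrogate code units through on encode;
    # the strict UTF-16 decode then reassembles pairs into astral chars.
    return string.encode('utf-16-le', 'surrogatepass').decode('utf-16-le')
```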

> Function names are preliminary and discussable! Location (currently the
> codecs module) is discussable. Interface is discussable.
>
> These functions revive UnicodeTranslateError, not used currently (but
> handled with several error handlers).
>
> Proposed patch provides Python implementation in the codecs module, but
> after discussion I'll provide much more efficient (O(1) in best case) C
> implementation.
>


From storchaka at gmail.com  Mon May  4 21:12:34 2015
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Mon, 04 May 2015 22:12:34 +0300
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <5547B7F8.8070405@mrabarnett.plus.com>
References: <mi79rj$vl8$1@ger.gmane.org> <5547B7F8.8070405@mrabarnett.plus.com>
Message-ID: <mi8gb2$bkr$1@ger.gmane.org>

On 04.05.15 21:18, MRAB wrote:
> On 2015-05-04 09:15, Serhiy Storchaka wrote:
>> * rehandle_surrogatepass(string, errors)
>>
>> Handles surrogate characters (U+D800-U+DFFF) with specified error
>> handler. E.g.
>>
>>     rehandle_surrogatepass('?\udcba', 'strict') -> error
>>     rehandle_surrogatepass('?\udcba', 'ignore') -> '?'
>>     rehandle_surrogatepass('?\udcba', 'replace') -> '?\ufffd'
>>     rehandle_surrogatepass('?\udcba', 'backslashreplace') -> '?\\udcba'
>>
>> * rehandle_surrogateescape(string, errors)
>>
>> Handles non-ASCII bytes encoded with surrogate characters in range
>> U+DC80-U+DCFF with specified error handler. Surrogate characters outside
>> of range U+DC80-U+DCFF cause error. E.g.
>>
>>     rehandle_surrogateescape('?\udcba', 'strict') -> error
>>     rehandle_surrogateescape('?\udcba', 'ignore') -> '?'
>>     rehandle_surrogateescape('?\udcba', 'replace') -> '?\ufffd'
>>     rehandle_surrogateescape('?\udcba', 'backslashreplace') -> '?\\xba'
>>
> It looks like the first 3 are the same as rehandle_surrogatepass, so
> couldn't they be merged somehow?
>
>      handle_surrogates('?\udcba', 'strict') -> error
>      handle_surrogates('?\udcba', 'ignore') -> '?'
>      handle_surrogates('?\udcba', 'replace') -> '?\ufffd'
>      handle_surrogates('?\udcba', 'backslashreplace') -> '?\\udcba'
>      handle_surrogates('?\udcba', 'surrogatereplace') -> '?\\xba'

These functions work with arbitrary error handlers that support 
UnicodeTranslateError (for rehandle_surrogatepass) or UnicodeDecodeError 
(for rehandle_surrogateescape). They behave differently for surrogate 
characters outside the range U+DC80-U+DCFF.
handle_surrogates() would need a new error handler, "surrogatereplace".
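
For the U+DC80-U+DCFF case, a rough equivalent of rehandle_surrogateescape 
can be sketched as a round-trip (unlike the proposal, this sketch does not 
explicitly reject surrogates outside that range; the encode step happens 
to fail on them instead):

```python
def rehandle_surrogateescape(string, errors):
    # surrogateescape turns U+DC80-U+DCFF back into the raw bytes they
    # smuggled in; the requested error handler then re-decodes them.
    return string.encode('utf-8', 'surrogateescape').decode('utf-8', errors)
```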

>> * compose_surrogate_pairs(string)
>>
>> Converts surrogate pairs to non-BMP characters. E.g.
>>
>>     compose_surrogate_pairs('?\ud808\udf45') -> '?\U00012345'
>>
> Perhaps this should be called "compose_astrals".

Maybe. Or "compose_non_bmp". I have no preference and opened this 
topic mainly for bikeshedding the names.



From stephen at xemacs.org  Mon May  4 23:21:30 2015
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 05 May 2015 06:21:30 +0900
Subject: [Python-ideas]  Processing surrogates in
In-Reply-To: <mi79rj$vl8$1@ger.gmane.org>
References: <mi79rj$vl8$1@ger.gmane.org>
Message-ID: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp>

Serhiy Storchaka writes:

 > In issue18814 proposed several functions to work with surrogate and
 > astral characters. All these functions takes a string and returns a
 > string.

What's the use case?  As far as I can see, in recent Python 3 PEP 393
is implemented, so non-BMP characters are represented as themselves,
not as surrogate pairs.  In a PEP 393-enabled Python, the only
surrogates should be those due to surrogateescape error handling on
input, and chr().  If you don't like the former, be careful about your
use of surrogateescape, and the latter is clearly a "consenting
adults" issue.

Also, you mention that such surrogate characters can be received as
input, which is true, but the standard codecs should already be
treating those as errors.

So as far as I can see, the existing codecs and error handlers already
can deal with any case I might run into in practice.

From storchaka at gmail.com  Mon May  4 23:57:56 2015
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Tue, 05 May 2015 00:57:56 +0300
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <mi79rj$vl8$1@ger.gmane.org>
 <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <mi8q15$bbp$1@ger.gmane.org>

On 05.05.15 00:21, Stephen J. Turnbull wrote:
> Serhiy Storchaka writes:
>   > In issue18814 proposed several functions to work with surrogate and
>   > astral characters. All these functions takes a string and returns a
>   > string.
>
> What's the use case?  As far as I can see, in recent Python 3 PEP 393
> is implemented, so non-BMP characters are represented as themselves,
> not as surrogate pairs.  In a PEP 393-enabled Python, the only
> surrogates should be those due to surrogateescape error handling on
> input, and chr().  If you don't like the former, be careful about your
> use of surrogateescape, and the latter is clearly a "consenting
> adults" issue.

Use cases include programs that use tkinter (common builds of Tcl/Tk 
don't accept non-BMP characters), email, or wsgiref.

> Also, you mention that such surrogate characters can be received as
> input, which is true, but the standard codecs should already be
> treating those as errors.

Usually surrogate characters come from decoding with the "surrogatepass" 
or "surrogateescape" error handlers. That is why Nick proposed the names 
rehandle_surrogatepass and rehandle_surrogateescape.

> So as far as I can see, the existing codecs and error handlers already
> can deal with any case I might run into in practice.

See issue18814. It is not so easy to get the desirable result. Perhaps 
the simplest way is to use regular expressions, which the proposed Python 
implementation does, but a C implementation can be much more efficient.
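
To illustrate the regular-expression approach (a sketch with made-up 
helper names, not the issue18814 patch):

```python
import re

# Matches any surrogate code point; Python 3 strings may contain them.
_SURROGATES = re.compile('[\ud800-\udfff]')

def find_surrogates(string):
    """Return (position, character) pairs for every surrogate code point."""
    return [(m.start(), m.group()) for m in _SURROGATES.finditer(string)]

def replace_surrogates(string, replacement='\ufffd'):
    """One-pass analogue of the 'replace' error handler."""
    return _SURROGATES.sub(replacement, string)
```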



From techtonik at gmail.com  Sat May  2 09:48:42 2015
From: techtonik at gmail.com (anatoly techtonik)
Date: Sat, 2 May 2015 09:48:42 +0200
Subject: [Python-ideas] Support 1.x notation in version specifiers
Message-ID: <CAPkN8x+1S5ePtg2wsmCVtpc=4RQXLhuDaPH9vwEPXUS=uWChZQ@mail.gmail.com>

The pip team said they won't support setting a limit on the major
version of a package being installed with the syntax below until it is
supported by PEP 440.

    pip install patch==1.x

The current way, ==1.*, conflicts with system shell expansion,
and the other way is not well known / not intuitive.

https://github.com/pypa/pip/issues/2737#issuecomment-97621684
-- 
anatoly t.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150502/23f383d1/attachment.html>

From phd at phdru.name  Tue May  5 10:21:57 2015
From: phd at phdru.name (Oleg Broytman)
Date: Tue, 5 May 2015 10:21:57 +0200
Subject: [Python-ideas] Support 1.x notation in version specifiers
In-Reply-To: <CAPkN8x+1S5ePtg2wsmCVtpc=4RQXLhuDaPH9vwEPXUS=uWChZQ@mail.gmail.com>
References: <CAPkN8x+1S5ePtg2wsmCVtpc=4RQXLhuDaPH9vwEPXUS=uWChZQ@mail.gmail.com>
Message-ID: <20150505082157.GA15195@phdru.name>

Hi!

On Sat, May 02, 2015 at 09:48:42AM +0200, anatoly techtonik <techtonik at gmail.com> wrote:
> pip team said they won't support setting limit for major version
> of package being installed in the way below until it is supported
> by PEP 440.
> 
>     pip install patch==1.x

   This syntax (1.x) is even less intuitive to me.

> The current way ==1.* conflicts with system shell expansion

   Other comparison operators (< and >) conflict with shell redirection.
And nobody cares because one can always quote shell metacharacters.

    pip install patch==1.\*
    pip install patch=='1.*'
    pip install 'patch==1.*'
    pip install 'patch>=1,<2'
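
If you build the command line from Python, shlex.quote applies the same 
protection programmatically (pip_install_command is a made-up helper):

```python
import shlex

def pip_install_command(requirement):
    """Build a shell-safe 'pip install' command line for any specifier."""
    # shlex.quote wraps the argument in single quotes whenever it
    # contains shell metacharacters such as '*', '<' or '>'.
    return 'pip install ' + shlex.quote(requirement)
```

    pip_install_command('patch==1.*') returns "pip install 'patch==1.*'".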

> and the other way is not known / not intuitive.
> 
> https://github.com/pypa/pip/issues/2737#issuecomment-97621684
> -- 
> anatoly t.

Oleg.
-- 
     Oleg Broytman            http://phdru.name/            phd at phdru.name
           Programmers don't die, they just GOSUB without RETURN.

From stephen at xemacs.org  Tue May  5 10:23:52 2015
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 05 May 2015 17:23:52 +0900
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <mi8q15$bbp$1@ger.gmane.org>
References: <mi79rj$vl8$1@ger.gmane.org>
 <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp>
 <mi8q15$bbp$1@ger.gmane.org>
Message-ID: <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>

Serhiy Storchaka writes:

 > Use cases include programs that use tkinter (common build of Tcl/Tk 
 > don't accept non-BMP characters), email or wsgiref.

So, consider Tcl/Tk.  If you use it for input, no problem: it *can't*
produce non-BMP characters.  So you're using it for output.  If,
knowing that your design involves tkinter, you deduce that you must not
accept non-BMP characters on input, where's your problem?

And ... have you looked twice at your proposal?  You have basically
reproduced the codec error handling API for .decode and .encode in a
bunch of str2str "rehandle" functions.  In other words, you need to
know as much to use "rehandle_*" properly as you do to use .decode and
.encode.  I do not see a win for the programmer who is mostly innocent
of encoding knowledge.  What you're going to see is what Ezio points
out in issue18814:

    With Python 2 I've seen lot of people blindingly trying .decode
    when .encode failed (and the other way around) whenever they were
    getting an UnicodeError[...].

    I'm afraid that confused developers will try to (mis)use redecode
    as a workaround to attempt to fix something that shouldn't be
    broken in the first place, without actually understanding what the
    real problem is.

If we apply these rehandle_* thumbs to the holes in the I18N dike,
it's just going to spring more leaks elsewhere.

 > See issue18814. It is not so easy to get desirable result.

That's because it is damn hard to get desirable results, end of story,
nothing to see here, move along, people, move along!  The only way
available to consistently get desirable results is a Swiftian "Modest
Proposal": euthanize all those miserable folks using non-UTF-8
encodings, and start the world over again.

Seriously, I see nothing in issue18814 except frustration.  There's no
plausible account of how these new functions are going to enable naive
programmers to get better results, just complaints that the current
situation is unbearable.  I can't speak to wsgiref, but in email I
think David is overly worried about efficiency: in most mail flows,
the occasional need to mess with surrogates is going to be far
overshadowed by spam/virus filtering and authentication (DKIM
signature verification and DMARC/DKIM/SPF DNS lookups) on pretty much
all real mailflows.

So this proposal merely amounts to reintroduction of the Python 2 str
confusion into Python 3.  It is dangerous *precisely because* the
current situation is so frustrating.  These functions will not be used
by "consenting adults", in most cases.  Those with sufficient
knowledge for "informed consent" also know enough to decode encoded
text ASAP, and encode internal text ALAP, with appropriate handlers,
in the first place.

Rather, these str2str functions will be used by programmers at the
ends of their ropes desperate to suppress "those damned Unicode
errors" by any means available.  In fact, they are most likely to be
used and recommended by *library* writers, because they're the ones
who are least likely to have control over input, or to know their
clients' requirements for output.  "Just use rehandle_* to ameliorate
the errors" is going to be far too tempting for them to resist.

That Nick, of all people, supports this proposal is to me just
confirmation that it's frustration, and only frustration, speaking
here.  He used to be one of the strongest supporters of keeping
"native text" (Unicode) and "encoded text" separate by keeping the
latter in bytes.


From rosuav at gmail.com  Tue May  5 11:17:33 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 5 May 2015 19:17:33 +1000
Subject: [Python-ideas] Support 1.x notation in version specifiers
In-Reply-To: <20150505082157.GA15195@phdru.name>
References: <CAPkN8x+1S5ePtg2wsmCVtpc=4RQXLhuDaPH9vwEPXUS=uWChZQ@mail.gmail.com>
 <20150505082157.GA15195@phdru.name>
Message-ID: <CAPTjJmq9z20+ev6e3YYgbQnno+qjT_jLyXT8vEfSfTwLUuNYoA@mail.gmail.com>

On Tue, May 5, 2015 at 6:21 PM, Oleg Broytman <phd at phdru.name> wrote:
>> The current way ==1.* conflicts with system shell expansion
>
>    Other comparison operators (< and >) conflict with shell redirection.
> And nobody cares because one can always quote shell metacharacters.
>
>     pip install patch==1.\*
>     pip install patch=='1.*'
>     pip install 'patch==1.*'
>     pip install 'patch>=1,<2'

Plus, you can stick anything you like into a requirements.txt and
simply 'pip install -r requirements.txt'. That's a safe option - not
least since it lets you manage your dependencies in source control.

ChrisA

From abarnert at yahoo.com  Tue May  5 11:56:34 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 5 May 2015 02:56:34 -0700
Subject: [Python-ideas] Support 1.x notation in version specifiers
In-Reply-To: <CAPkN8x+1S5ePtg2wsmCVtpc=4RQXLhuDaPH9vwEPXUS=uWChZQ@mail.gmail.com>
References: <CAPkN8x+1S5ePtg2wsmCVtpc=4RQXLhuDaPH9vwEPXUS=uWChZQ@mail.gmail.com>
Message-ID: <0B613961-E396-4BDF-8ED8-55D1E70B5426@yahoo.com>

On May 2, 2015, at 00:48, anatoly techtonik <techtonik at gmail.com> wrote:
> 
> pip team said they won't support setting limit for major version
> of package being installed in the way below until it is supported
> by PEP 440.

I think that's misrepresenting them. They explained why it isn't needed, and threw in an "anyway, it's not up to us"; they didn't say "sounds like a good idea, but you have to fix the PEP first".

Also, if you can't use pip 6.0 or later to take advantage of the already-working syntax that they recommended you use, how would you be able to use your new syntax even if it did get added?

>     pip install patch==1.x
> 
> The current way ==1.* conflicts with system shell expansion
> and the other way is not known / not intuitive.
> 
> https://github.com/pypa/pip/issues/2737#issuecomment-97621684
> -- 
> anatoly t.

From abarnert at yahoo.com  Tue May  5 12:00:53 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 5 May 2015 03:00:53 -0700
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <mi79rj$vl8$1@ger.gmane.org>
 <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <mi8q15$bbp$1@ger.gmane.org>
 <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <381A9EDF-A2F5-43FF-9795-FC15AEC78A9A@yahoo.com>

On May 5, 2015, at 01:23, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> 
> Serhiy Storchaka writes:
> 
>> Use cases include programs that use tkinter (common build of Tcl/Tk 
>> don't accept non-BMP characters), email or wsgiref.
> 
> So, consider Tcl/Tk.  If you use it for input, no problem, it *can't*
> produce non-BMP characters.  So you're using it for output.  If
> knowing that your design involves tkinter, you deduce you must not
> accept non-BMP characters on input, where's your problem?

The real issue with tkinter (and similar cases that can't handle non-BMP characters) is that they're actually UCS-2, and we paper over that by pretending the interface is full Unicode. Maybe it would be better to wrap the low-level interfaces in `bytes` rather than `str` and put an explicit `.encode('UCS-2')` in the higher-level interfaces (or even in user code?) to make the problem obvious and debuggable, rather than just pretending the problem doesn't exist?

(I'm not sure if we actually have a UCS-2 codec, but if not, it's trivial to write--it's just UTF-16 without surrogates.)
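
There is no 'ucs-2' codec in the stdlib, but a strict sketch is short 
(encode_ucs2 is a made-up name for illustration):

```python
def encode_ucs2(string):
    # UCS-2 is UTF-16 without surrogate pairs: refuse non-BMP input
    # instead of silently splitting it into surrogates.
    for i, ch in enumerate(string):
        if ord(ch) > 0xFFFF:
            raise UnicodeEncodeError('ucs-2', string, i, i + 1,
                                     'non-BMP character not representable')
    return string.encode('utf-16-le')
```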

> And ... you looked twice at your proposal?  You have basically
> reproduced the codec error handling API for .decode and .encode in a
> bunch to str2str "rehandle" functions.  In other words, you need to
> know as much to use "rehandle_*" properly as you do to use .decode and
> .encode.  I do not see a win for the programmer who is mostly innocent
> of encoding knowledge.  What you're going to see is what Ezio points
> out in issue18814:
> 
>    With Python 2 I've seen lot of people blindingly trying .decode
>    when .encode failed (and the other way around) whenever they were
>    getting an UnicodeError[...].
> 
>    I'm afraid that confused developers will try to (mis)use redecode
>    as a workaround to attempt to fix something that shouldn't be
>    broken in the first place, without actually understanding what the
>    real problem is.
> 
> If we apply these rehandle_* thumbs to the holes in the I18N dike,
> it's just going to spring more leaks elsewhere.
> 
>> See issue18814. It is not so easy to get desirable result.
> 
> That's because it is damn hard to get desirable results, end of story,
> nothing to see here, move along, people, move along!  The only way
> available to consistently get desirable results is a Swiftian "Modest
> Proposal": euthanize all those miserable folks using non-UTF-8
> encodings, and start the world over again.
> 
> Seriously, I see nothing in issue18814 except frustration.  There's no
> plausible account of how these new functions are going to enable naive
> programmers to get better results, just complaints that the current
> situation is unbearable.  I can't speak to wsgiref, but in email I
> think David is overly worried about efficiency: in most mail flows,
> the occasional need to mess with surrogates is going to be far
> overshadowed by spam/virus filtering and authentication (DKIM
> signature verification and DMARC/DKIM/SPF DNS lookups) on pretty much
> all real mailflows.
> 
> So this proposal merely amounts to reintroduction of the Python 2 str
> confusion into Python 3.  It is dangerous *precisely because* the
> current situation is so frustrating.  These functions will not be used
> by "consenting adults", in most cases.  Those with sufficient
> knowledge for "informed consent" also know enough to decode encoded
> text ASAP, and encode internal text ALAP, with appropriate handlers,
> in the first place.
> 
> Rather, these str2str functions will be used by programmers at the
> ends of their ropes desperate to suppress "those damned Unicode
> errors" by any means available.  In fact, they are most likely to be
> used and recommended by *library* writers, because they're the ones
> who are least like to have control over input, or to know their
> clients' requirements for output.  "Just use rehandle_* to ameliorate
> the errors" is going to be far too tempting for them to resist.
> 
> That Nick, of all people, supports this proposal is to me just
> confirmation that it's frustration, and only frustration, speaking
> here.  He used to be one of the strongest supporters of keeping
> "native text" (Unicode) and "encoded text" separate by keeping the
> latter in bytes.
> 

From stephen at xemacs.org  Tue May  5 12:46:41 2015
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 05 May 2015 19:46:41 +0900
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <381A9EDF-A2F5-43FF-9795-FC15AEC78A9A@yahoo.com>
References: <mi79rj$vl8$1@ger.gmane.org>
 <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp>
 <mi8q15$bbp$1@ger.gmane.org>
 <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>
 <381A9EDF-A2F5-43FF-9795-FC15AEC78A9A@yahoo.com>
Message-ID: <87zj5j47zi.fsf@uwakimon.sk.tsukuba.ac.jp>

Andrew Barnert writes:

 > (I'm not sure if we actually have a UCS-2 codec, but if not, it's
 > trivial to write--it's just UTF-16 without surrogates.)

The PEP 393 machinery knows when astral characters are introduced
because it has to widen the representation.  That might be a more
convenient place to raise an exception on non-BMP characters.


From koos.zevenhoven at aalto.fi  Tue May  5 15:55:56 2015
From: koos.zevenhoven at aalto.fi (Koos Zevenhoven)
Date: Tue, 5 May 2015 16:55:56 +0300
Subject: [Python-ideas] async/await and synchronous code (and PEP492 ?)
Message-ID: <5548CBEC.3000303@aalto.fi>

Hi all!

I am excited about seeing what's going on with asyncio and PEP492 etc. I 
really like that Python is becoming more suitable for the increasing 
amount of async code and that the distinction between async functions 
and generators is increasing.

In addition, however, I would also like to see the async functions and 
methods come even closer to regular functions and methods. This is 
something that is keeping me from using asyncio at the moment even if I 
would like to. Below I'll try to explain what and why, and a little bit 
of how. If it is not clear, please ask :)

Motivation:

One of the best things about asyncio and coroutines/async functions is 
that you can write asynchronous code as if it were synchronous, the 
difference in many places being just the use of "await" ("yield from") 
when calling something that may end up doing IO (somewhere down the 
function call chain) and that the code is run from an event loop.

When writing a package that does IO, you have the option to make it 
either synchronous or asynchronous. Regardless of the choice, the code 
will look roughly the same. But what if you want to be able to do both? 
Should you maintain two versions, one with "async" and "await" 
everywhere and one without?

Besides the keywords "async" and "await", async code of course differs 
from synchronous code by the functions/coroutines that are used for IO 
at the end of the function call chain. By this I mean the end close to 
where the "yield" expressions are hidden in the async versions. At the 
other end of the calling chain, async code needs the event loop and 
associated framework (almost always asyncio?) which hides all the async 
scheduling fanciness etc. I'm not sure about the terminology, but I will 
use "L end" and "Y end" to refer to the two ends here. (L for event 
Loop; Y for Yield)

The Y and L ends need to be compatible with each other for the code to 
work. While asyncio and the standard library might provide both ends in 
many cases, there can also be situations where a package would want to 
work with different combinations of L and Y end, or completely without 
an event loop, i.e. synchronously.

In a very simple example, one might want to wrap different 
implementations of sleep() in a function that would pick the right one 
depending on the context. Perhaps something like this:

  async def any_sleep(seconds):
      if __async__.framework is None:
          time.sleep(seconds)
      elif __async__.framework is asyncio:
          await asyncio.sleep(seconds)
      else:
          raise RuntimeError("Was called with an unsupported "
                             "async framework.")

[You could of course replace sleep() with socket IO or whatever, but 
sleep is nice and simple. Also, a larger library would probably have a 
whole chain of async functions and methods before calling something like 
this]

But if await is only allowed inside "async def", then how can 
any_sleep() be conveniently run in non-async code? Also, there is 
nothing like __async__.framework. Below, I describe what I think a 
potential solution might look like.



Potential solution:

This is a simplified version; for instance, as "awaitables", I consider 
only async function objects here. I describe the idea in three parts:

(1) next(...):

Add a keyword argument "async_framework" (or whatever) to next(...) with 
a default value of None. When an async framework, typically asyncio, 
starts an async function object (coroutine) with a call to next(...), it 
would do something like next(coro, async_framework = asyncio). Here, 
asyncio could of course be replaced with any object that identifies the 
framework. This information would then be somehow attached to the async 
function object.


(2) __async__.framework or something similar:

Add something like __async__ that has an attribute such as .framework 
that allows the code inside the async function to access the information 
passed to next(...) by the framework (L end) using the keyword argument 
of next [see (1)].

(3) Generalized "await":

[When the world is ready:] Allow using "await" anywhere, not just within 
async functions. Inside async functions, the behavior of "await" would 
be the same as in PEP492, with the addition that it would somehow 
propagate the __async__.framework value to the awaited coroutine. 
Outside async functions, "await" would do roughly the same as this function:

  def await(async_func_obj):
      try:
          # same as next(async_func_obj, async_framework=None)
          next(async_func_obj)
      except StopIteration as si:
          return si.value
      raise RuntimeError("The function does not support "
                         "synchronous execution")

(This function would, of course, work in Python 3.4, but it would be 
mostly useless because the async functions would not know that they are 
being called in a 'synchronous program'. IIUC, this *function* would be 
valid even with PEP492, but having this as a function would be ugly in 
the long run.)


Some random thoughts:

With this addition to Python, one could write libraries that work both 
async and non-async. When await is not inside async def, one would 
expect it to potentially do blocking IO, just like an await inside async 
def would suggest that there is a yield/suspend somewhere in there.

For testing, I tried to see if there is a reasonable way to make a hack 
with __async__.framework that could be set by next(), but did not find 
an obvious way. For instance, coro.gi_frame.f_locals is read-only, I 
believe.

An alternative to this approach could be that await would implicitly 
start a temporary event loop for running the coroutine, but how would it 
know which event loop? This might also have a huge performance overhead.
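
For comparison, the temporary-event-loop alternative can already be 
written today on top of asyncio (a sketch; sync_await is a made-up name, 
and every call pays for creating and tearing down a fresh loop):

```python
import asyncio

def sync_await(coro):
    # Spin up a throwaway event loop just to drive this one coroutine.
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(coro)
    finally:
        loop.close()
```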

Relation to PEP492:

This of course still needs more thinking, but I wanted to post it here 
now in case there is desire to prepare for something like this already 
in PEP492. It is not completely clear if/how this would need to affect 
PEP492, but some things come to mind. For example, this could 
potentially remove the need for __aenter__, __aiter__, etc. or even 
"async for" and "async with". If __aenter__ is defined as "async def", 
then a with statement would do an "await" on it, and the context manager 
would have __async__.framework (or whatever it would be called) 
available, for determining what behavior is appropriate.

Was this clear enough to understand which problem(s) this would be 
solving and how? I'd be happy to hear about any thoughts on this :).


Best regards,
Koos


From gmludo at gmail.com  Tue May  5 16:57:36 2015
From: gmludo at gmail.com (Ludovic Gasc)
Date: Tue, 5 May 2015 16:57:36 +0200
Subject: [Python-ideas] async/await and synchronous code (and PEP492 ?)
In-Reply-To: <5548CBEC.3000303@aalto.fi>
References: <5548CBEC.3000303@aalto.fi>
Message-ID: <CAON-fpHD630jYDyYtA3YFFWNBNf1+3H6-k3wmk8cUW36F1VYTg@mail.gmail.com>

Hi Koos,

2015-05-05 15:55 GMT+02:00 Koos Zevenhoven <koos.zevenhoven at aalto.fi>:

> With this addition to Python, one could write libraries that work both
> async and non-async. When await is not inside async def, one would expect
> it to potentially do blocking IO, just like an await inside async def would
> suggest that there is a yield/suspend somewhere in there.



To be honest with you, I had this type of idea in the back of my mind, but
for now, I have no suggestion that avoids nightmares for either
end-developers or low-level developers.

For example, we could detect whether a call is async or not from whether
you write: result = await response.payload() or result = response.payload()
The issue I see with that, certainly already explained during the PEP492
discussions, is that it will be difficult for the developer to spot where
he has forgotten the await keyword, because he won't get errors.

Moreover, in the use cases where async is less efficient than sync, it
would be interesting to have, maybe with a context manager, a way to
define a block of code where all awaits are in fact sync (without using
the event loop). But even if a talented low-level developer finds a
solution to implement this idea (I'm not sure it's technically possible),
in practice it will be easier even for end-developers to use the sync
version of the library for this need.

FYI, I've made a tiny library for my company where I need to be sync for
some use cases and async for other use cases.
For the sync and async public APIs, where the business logic behind most
functions is identical, I've followed the same pattern as Python-LDAP:
http://www.python-ldap.org/doc/html/ldap.html#sending-ldap-requests
I've postfixed all sync functions with "_s".

For a more complex library, it may be possible to have two different
classes with explicit names.
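
Roughly, the pattern looks like this (names made up; the sync twin simply 
drives the async version on a private event loop):

```python
import asyncio

class Client:
    async def fetch(self, key):
        await asyncio.sleep(0)          # stand-in for real async I/O
        return key.upper()

    def fetch_s(self, key):
        # Synchronous twin of fetch(), same business logic behind it.
        loop = asyncio.new_event_loop()
        try:
            return loop.run_until_complete(self.fetch(key))
        finally:
            loop.close()
```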

At least to me, it's enough to work efficiently, explicit is better than
implicit ;-)
--
Ludovic Gasc (GMLudo)
http://www.gmludo.eu/

From koos.zevenhoven at aalto.fi  Tue May  5 17:49:47 2015
From: koos.zevenhoven at aalto.fi (Koos Zevenhoven)
Date: Tue, 5 May 2015 18:49:47 +0300
Subject: [Python-ideas] async/await and synchronous code (and PEP492 ?)
In-Reply-To: <CAON-fpHD630jYDyYtA3YFFWNBNf1+3H6-k3wmk8cUW36F1VYTg@mail.gmail.com>
References: <5548CBEC.3000303@aalto.fi>
 <CAON-fpHD630jYDyYtA3YFFWNBNf1+3H6-k3wmk8cUW36F1VYTg@mail.gmail.com>
Message-ID: <5548E69B.3000902@aalto.fi>

On 2015-05-05 17:57, Ludovic Gasc wrote:
>
> For example, we may detect if it's async or not if you have: result = 
> await response.payload() or result = response.payload()
> The issue I see with that and certainly already explained 
> during PEP492 discussions, is that it will be difficult for the 
> developer to spot where he is forgotten await keyword, because he 
> won't have errors.
>

Thank you for your email!

I've been following quite a bit of the PEP492 discussions, but I'm not sure 
if I have missed something. If there is something about await outside 
async def that goes further than "It is a SyntaxError to use await 
outside of an async def function (like it is a SyntaxError to use yield 
outside of def function.)", which is directly from the PEP, I've missed 
that. A link or pointer would be helpful.

In any case, I think I understand the problem you are referring to, but 
is that any different from forgetting a postfix "_s" in the approach you 
mention below?

> Moreover, in the use cases where async is less efficient that sync, it 
> should be interesting to be possible, maybe with a context manager to 
> define a block of code where all await are in fact sync (without to 
> use event loop). But, even if a talentuous low-developper find a 
> solution to implement this idea, because I'm not sure it's technically 
> possible, in fact it will more easier even for end-developers to use 
> the sync library version of this need.

Surely that is possible, although it may of course be hard to implement :). 
I think this is related to this earlier suggestion by Joshua Bartlett 
(which I do like):

https://mail.python.org/pipermail/python-ideas/2013-January/018519.html

However, I don't think it solves *this* problem. It would just become a 
more verbose version of what I suggested.

>
> FYI, I've made a yocto library for my company where I need to be sync 
> for some use cases and async for some other use cases.
> For the sync and async public API where the business logic behind most 
> functions are identical, I've followed the same pattern as in 
> Python-LDAP: 
> http://www.python-ldap.org/doc/html/ldap.html#sending-ldap-requests
> I've postfixed all sync functions by "_s".
>
> For a more complex library, it may be possible to have two different 
> classes with explicit names.
>
> At least to me, it's enough to work efficiently, explicit is better 
> than implicit ;-)
>

In my mind, this is not at all about explicit vs. implicit. It is mostly 
about letting the coroutines know what kind of context they are being 
run from. Anyway, I'm pretty sure there are plenty of people in the 
Python community who don't think efficiency is enough, but that is a 
matter of personal preference. I want everything, and that's why I'm 
using Python ;).

-- Koos

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150505/8990546d/attachment.html>

From steve at pearwood.info  Tue May  5 19:28:46 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 6 May 2015 03:28:46 +1000
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <mi79rj$vl8$1@ger.gmane.org>
References: <mi79rj$vl8$1@ger.gmane.org>
Message-ID: <20150505172845.GF5663@ando.pearwood.info>

On Mon, May 04, 2015 at 11:15:47AM +0300, Serhiy Storchaka wrote:
> Surrogate characters (U+D800-U+DFFF) are not allowed in Unicode, but 
> Python allows them in Unicode strings for different purposes.
> 
> 1) To represent UTF-8, UTF-16 or UTF-32 encoded strings that contain 
> surrogate characters. This data can came from other programs, including 
> Python 2.

Can you give a simple example of a Python 2 program that provides output 
that Python 3 will read as surrogates?



> 2) To represent undecodable bytes in ASCII-compatible encoding with the 
> "surrogateescape" error handlers.
> 
> So surrogate characters can be obtained from "surrogateescape" or 
> "surrogatepass" error handlers or created manually with chr() or %c. 
>
> Some encodings (UTF-7, unicode-escape) also allow surrogate characters.

Also UTF-16, and possibly others. 

I'm not entirely sure, but I think that this is a mistake, if not a 
bug. I think that *no* UTF encoding should allow lone surrogates to 
escape through encoding. But I'm not entirely sure, so I won't argue 
that now -- besides, it's irrelevant to the proposal.



> But on output the surrogate characters can cause fail.

What do you mean by "on output"? Do you mean when printing?


> In issue18814, several functions were proposed to work with surrogate and 
> astral characters. All these functions take a string and return a string.

I like the idea of having better surrogate and astral character 
handling, but I don't think I like your suggested API of using functions 
for this. I think this is better handled as str-to-str codecs.

Unfortunately, there is still no consensus on the much-debated return of 
str-to-str and bytes-to-bytes codecs via the str.encode and bytes.decode 
methods. At one point people were talking about adding a separate method 
(transform?) to handle them, but that seems to have been forgotten. 
Fortunately the codecs module handles them just fine:

py> codecs.encode("Hello world", "rot-13")
'Uryyb jbeyq'


I propose, instead of your function/method rehandle_surrogatepass(), we 
add a pair of str-to-str codecs:

codecs.encode(mystring, 'remove_surrogates', errors='strict')
codecs.encode(mystring, 'remove_astrals', errors='strict')

For the first one, if the string has no surrogates, it returns the 
string unchanged. If it contains any surrogates, the error handler runs 
in the usual fashion.

The second is exactly the same, except it checks for astral characters.

For the avoidance of doubt:

* surrogates are code points in the range U+D800 to U+DFFF inclusive;

* astrals are characters from the supplementary planes, 
  that is code points U+10000 and above.

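A minimal sketch of the semantics proposed for the first codec, written as a
plain function rather than a registered codec (the name 'remove_surrogates'
and its behaviour are this proposal's, not an existing API):

```python
def remove_surrogates(s, errors='strict'):
    # Sketch of the proposed 'remove_surrogates' str-to-str codec semantics.
    # Surrogates are code points in U+D800..U+DFFF, as defined above.
    out = []
    for i, ch in enumerate(s):
        if 0xD800 <= ord(ch) <= 0xDFFF:
            if errors == 'ignore':
                continue
            if errors == 'replace':
                out.append('\ufffd')
                continue
            # 'strict': fail via the usual Unicode error machinery
            raise UnicodeEncodeError('remove_surrogates', s, i, i + 1,
                                     'lone surrogate')
        out.append(ch)
    return ''.join(out)
```

A 'remove_astrals' variant would be identical except for testing
`ord(ch) >= 0x10000` instead.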

Advantage of using codecs:

- there's no arguments about where to put it (is it a str method? a 
  function? in the string module? some other module? where?)

- we can use the usual codec machinery, rather than duplicate it;

- people already understand that codecs and error handles go together;

Disadvantage:

- have to use codecs.encode instead of str.encode.


It is slightly sad that there is still no entirely obvious way to call 
str-to-str codecs from the encode method, but since this is a fairly 
advanced and unusual use-case, I don't think it is a problem that we 
have to use the codecs module.


> * decompose_astrals(string)
> * compose_surrogate_pairs(string)

I'm not sure about those. I have to think about them.



-- 
Steve

From abarnert at yahoo.com  Tue May  5 19:33:28 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 5 May 2015 10:33:28 -0700
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <87zj5j47zi.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <mi79rj$vl8$1@ger.gmane.org>
 <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <mi8q15$bbp$1@ger.gmane.org>
 <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>
 <381A9EDF-A2F5-43FF-9795-FC15AEC78A9A@yahoo.com>
 <87zj5j47zi.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <72DABA4D-EA98-46CC-824B-BA3AF1785B04@yahoo.com>

On May 5, 2015, at 03:46, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> 
> Andrew Barnert writes:
> 
>> (I'm not sure if we actually have a UCS-2 codec, but if not, it's
>> trivial to write--it's just UTF-16 without surrogates.)
> 
> The PEP 393 machinery knows when astral characters are introduced
> because it has to widen the representation.  That might be a more
> convenient place to raise an exception on non-BMP characters.
> 
But the PEP 393 machinery doesn't know when it's dealing with strings that are ultimately destined for a UCS-2 application, any more than it can know when it's dealing with strings that have to be pure ASCII or CP1252 or any other character set.

If you want to print emoji to a CP1252 console or write them to a Shift-JIS text file, you get an error from an explicit or implicit `str.encode` that you can debug. If you want to display emoji in a Tkinter GUI, it should be exactly the same. The only reason it isn't is that we pretend "narrow Unicode" is a real thing and implicitly convert to UTF-16 instead of making the code explicitly specify UCS-2 or UTF-16 as appropriate. 

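The UTF-16-vs-UCS-2 distinction above can be seen directly from the
interpreter; this sketch just illustrates the standard surrogate-pair
arithmetic for an astral character:

```python
# An astral character encodes to a surrogate pair in UTF-16; a strict
# UCS-2 consumer cannot represent it as a single 16-bit code unit.
s = '\U0001F600'                       # code point above U+FFFF
utf16 = s.encode('utf-16-be')
assert len(utf16) == 4                 # two 16-bit units: a surrogate pair
hi = int.from_bytes(utf16[:2], 'big')  # high (lead) surrogate
lo = int.from_bytes(utf16[2:], 'big')  # low (trail) surrogate
assert 0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF
# Reconstruct the original code point from the pair:
cp = 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)
assert chr(cp) == s
```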


From abarnert at yahoo.com  Tue May  5 20:00:03 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 5 May 2015 18:00:03 +0000 (UTC)
Subject: [Python-ideas] async/await and synchronous code (and PEP492 ?)
In-Reply-To: <5548CBEC.3000303@aalto.fi>
References: <5548CBEC.3000303@aalto.fi>
Message-ID: <1924070601.522541.1430848804012.JavaMail.yahoo@mail.yahoo.com>

It seems like it might be a lot easier to approach this from the other end: Is it possible to write a decorator that takes an async coroutine function, strips out all the awaits, and returns a regular sync function? If so, all you need to do is write everything as async, and then users can "from spam import sync as spam" or "from spam import async as spam" (where async just imports all the real functions, while sync imports them and calls the decorator on all of them).
That also avoids the need to have all the looking up of the event loop, switching between different code branches, etc. inside every function at runtime. (Not that it matters for the performance of sleep(1), but it might matter for the performance of other functions; and, more importantly, it might make the implementation of those functions simpler and easier to debug through.) 
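One way the suggested decorator could work, sketched under the assumption
that the sync variants never actually need to suspend (i.e. no await in the
chain ever reaches a real yield to an event loop):

```python
import functools

def make_sync(coro_func):
    # Hypothetical decorator: drive the coroutine to completion by hand.
    # This only works if nothing in the call chain actually yields to an
    # event loop (e.g. the I/O calls at the bottom are synchronous).
    @functools.wraps(coro_func)
    def wrapper(*args, **kwargs):
        coro = coro_func(*args, **kwargs)
        try:
            while True:
                coro.send(None)   # step the coroutine; should never suspend
        except StopIteration as stop:
            return stop.value     # the coroutine's return value
    return wrapper
```

The "sync" module would then just apply this to each public function, e.g.
`sleep = make_sync(async_sleep)`.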


On Tuesday, May 5, 2015 7:01 AM, Koos Zevenhoven <koos.zevenhoven at aalto.fi> wrote:

Hi all!

I am excited about seeing what's going on with asyncio and PEP492 etc. I 
really like that Python is becoming more suitable for the increasing 
amount of async code and that the distinction between async functions 
and generators is increasing.

In addition, however, I would also like to see the async functions and 
methods come even closer to regular functions and methods. This is 
something that is keeping me from using asyncio at the moment even if I 
would like to. Below I'll try to explain what and why, and a little bit 
of how. If it is not clear, please ask :)

Motivation:

One of the best things about asyncio and coroutines/async functions is 
that you can write asynchronous code as if it were synchronous, the 
difference in many places being just the use of "await" ("yield from") 
when calling something that may end up doing IO (somewhere down the 
function call chain) and that the code is run from an event loop.

When writing a package that does IO, you have the option to make it 
either synchronous or asynchronous. Regardless of the choice, the code 
will look roughly the same. But what if you want to be able to do both? 
Should you maintain two versions, one with "async" and "await" 
everywhere and one without?

Besides the keywords "async" and "await", async code of course differs 
from synchronous code by the functions/coroutines that are used for IO 
at the end of the function call chain. Here, I mean the end (close to) 
where the "yield" expressions are hidden in the async versions. At the 
other end of the calling chain, async code needs the event loop and 
associated framework (almost always asyncio?) which hides all the async 
scheduling fanciness etc. I'm not sure about the terminology, but I will 
use "L end" and "Y end" to refer to the two ends here. (L for event 
Loop; Y for Yield)

The Y and L ends need to be compatible with each other for the code to 
work. While asyncio and the standard library might provide both ends in 
many cases, there can also be situations where a package would want to 
work with different combinations of L and Y end, or completely without 
an event loop, i.e. synchronously.

In a very simple example, one might want to wrap different 
implementations of sleep() in a function that would pick the right one 
depending on the context. Perhaps something like this:

  async def any_sleep(seconds):
      if __async__.framework is None:
          time.sleep(1)
      elif __async__.framework is asyncio:
          await asyncio.sleep(1)
      else:
          raise RuntimeError(
              "Was called with an unsupported async framework.")

[You could of course replace sleep() with socket IO or whatever, but 
sleep is nice and simple. Also, a larger library would probably have a 
whole chain of async functions and methods before calling something like 
this]

But if await is only allowed inside "async def", then how can 
any_sleep() be conveniently run in non-async code? Also, there is 
nothing like __async__.framework. Below, I describe what I think a 
potential solution might look like.



Potential solution:

This is a simplified version; for instance, as "awaitables", I consider 
only async function objects here. I describe the idea in three parts:

(1) next(...):

Add a keyword argument "async_framework" (or whatever) to next(...) with 
a default value of None. When an async framework, typically asyncio, 
starts an async function object (coroutine) with a call to next(...), it 
would do something like next(coro, async_framework = asyncio). Here, 
asyncio could of course be replaced with any object that identifies the 
framework. This information would then be somehow attached to the async 
function object.


(2) __async__.framework or something similar:

Add something like __async__ that has an attribute such as .framework 
that allows the code inside the async function to access the information 
passed to next(...) by the framework (L end) using the keyword argument 
of next [see (1)].

(3) Generalized "await":

[When the world is ready:] Allow using "await" anywhere, not just within 
async functions. Inside async functions, the behavior of "await" would 
be the same as in PEP492, with the addition that it would somehow 
propagate the __async__.framework value to the awaited coroutine. 
Outside async functions, "await" would do roughly the same as this function:

  def await(async_func_obj):
      try:
          next(async_func_obj)  # same as next(async_func_obj, async_framework = None)
      except StopIteration as si:
          return si.value
      raise RuntimeError(
          "The function does not support synchronous execution")

(This function would, of course, work in Python 3.4, but it would be 
mostly useless because the async functions would not know that they are 
being called in a 'synchronous program'. IIUC, this *function* would be 
valid even with PEP492, but having this as a function would be ugly in 
the long run.)


Some random thoughts:

With this addition to Python, one could write libraries that work both 
async and non-async. When await is not inside async def, one would 
expect it to potentially do blocking IO, just like an await inside async 
def would suggest that there is a yield/suspend somewhere in there.

For testing, I tried to see if there is a reasonable way to make a hack 
with __async__.framework that could be set by next(), but did not find 
an obvious way. For instance, coro.gi_frame.f_locals is read-only, I 
believe.

An alternative to this approach could be that await would implicitly 
start a temporary event loop for running the coroutine, but how would it 
know which event loop? This might also have a huge performance overhead.

Relation to PEP492:

This of course still needs more thinking, but I wanted to post it here 
now in case there is desire to prepare for something like this already 
in PEP492. It is not completely clear if/how this would need to affect 
PEP492, but some things come to mind. For example, this could 
potentially remove the need for __aenter__, __aiter__, etc. or even 
"async for" and "async with". If __aenter__ is defined as "async def", 
then a with statement would do an "await" on it, and the context manager 
would have __async__.framework (or whatever it would be called) 
available, for determining what behavior is appropriate.

Was this clear enough to understand which problem(s) this would be 
solving and how? I'd be happy to hear about any thoughts on this :).


Best regards,
Koos

_______________________________________________
Python-ideas mailing list
Python-ideas at python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150505/ed629c51/attachment-0001.html>

From guido at python.org  Tue May  5 20:48:45 2015
From: guido at python.org (Guido van Rossum)
Date: Tue, 5 May 2015 11:48:45 -0700
Subject: [Python-ideas] async/await and synchronous code (and PEP492 ?)
In-Reply-To: <1924070601.522541.1430848804012.JavaMail.yahoo@mail.yahoo.com>
References: <5548CBEC.3000303@aalto.fi>
 <1924070601.522541.1430848804012.JavaMail.yahoo@mail.yahoo.com>
Message-ID: <CAP7+vJ+ZsdCmDf+Hns2FXzo1stEX-pfeUZpxFBoin4GbzpPZZw@mail.gmail.com>

Quick notes:
- I don't think it's really possible to write realistic async code
independently from an async framework.
- For synchronous code that wants to use some async code, the pattern is
simple:
    asyncio.get_event_loop().run_until_complete(some_async_call(args, etc))
- We can probably wrap this in a convenience helper function so you can
just write:
    asyncio.sync_wait(some_async_call(args, etc))
- Note that this will fail (and rightly so!) if called when the event loop
is already running.

On Tue, May 5, 2015 at 11:00 AM, Andrew Barnert via Python-ideas <
python-ideas at python.org> wrote:

> It seems like it might be a lot easier to approach this from the other
> end: Is it possible to write a decorator that takes an async coroutine
> function, strips out all the awaits, and returns a regular sync function?
> If so, all you need to do is write everything as async, and then users can
> "from spam import sync as spam" or "from spam import async as spam" (where
> async just imports all the real functions, while sync imports them and
> calls the decorator on all of them).
>
> That also avoids the need to have all the looking up the event loop,
> switching between different code branches, etc. inside every function at
> runtime. (Not that it matters for the performance of sleep(1), but it might
> matter for the performance of other functions; and, more importantly, it
> might make the implementation of those functions simpler and easier to
> debug through.)
>
>
>
>   On Tuesday, May 5, 2015 7:01 AM, Koos Zevenhoven <
> koos.zevenhoven at aalto.fi> wrote:
>
>
>
> Hi all!
>
> I am excited about seeing what's going on with asyncio and PEP492 etc. I
> really like that Python is becoming more suitable for the increasing
> amount of async code and that the distinction between async functions
> and generators is increasing.
>
> In addition, however, I would also like to see the async functions and
> methods come even closer to regular functions and methods. This is
> something that is keeping me from using asyncio at the moment even if I
> would like to. Below I'll try to explain what and why, and a little bit
> of how. If it is not clear, please ask :)
>
> Motivation:
>
> One of the best things about asyncio and coroutines/async functions is
> that you can write asynchronous code as if it were synchronous, the
> difference in many places being just the use of "await" ("yield from")
> when calling something that may end up doing IO (somewhere down the
> function call chain) and that the code is run from an event loop.
>
> When writing a package that does IO, you have the option to make it
> either synchronous or asynchronous. Regardless of the choice, the code
> will look roughly the same. But what if you want to be able to do both?
> Should you maintain two versions, one with "async" and "await"
> everywhere and one without?
>
> Besides the keywords "async" and "await", async code of course differs
> from synchronous code by the functions/coroutines that are used for IO
> at the end of the function call chain. Here, I mean the end (close to)
> where the "yield" expressions are hidden in the async versions. At the
> other end of the calling chain, async code needs the event loop and
> associated framework (almost always asyncio?) which hides all the async
> scheduling fanciness etc. I'm not sure about the terminology, but I will
> use "L end" and "Y end" to refer to the two ends here. (L for event
> Loop; Y for Yield)
>
> The Y and L ends need to be compatible with each other for the code to
> work. While asyncio and the standard library might provide both ends in
> many cases, there can also be situations where a package would want to
> work with different combinations of L and Y end, or completely without
> an event loop, i.e. synchronously.
>
> In a very simple example, one might want to wrap different
> implementations of sleep() in a function that would pick the right one
> depending on the context. Perhaps something like this:
>
>   async def any_sleep(seconds):
>       if __async__.framework is None:
>           time.sleep(1)
>       elif __async__.framework is asyncio:
>           await asyncio.sleep(1)
>       else:
>           raise RuntimeError("Was called with an unsupported async
> framework.")
>
> [You could of course replace sleep() with socket IO or whatever, but
> sleep is nice and simple. Also, a larger library would probably have a
> whole chain of async functions and methods before calling something like
> this]
>
> But if await is only allowed inside "async def", then how can
> any_sleep() be conveniently run in non-async code? Also, there is
> nothing like __async__.framework. Below, I describe what I think a
> potential solution might look like.
>
>
>
> Potential solution:
>
> This is a simplified version; for instance, as "awaitables", I consider
> only async function objects here. I describe the idea in three parts:
>
> (1) next(...):
>
> Add a keyword argument "async_framework" (or whatever) to next(...) with
> a default value of None. When an async framework, typically asyncio,
> starts an async function object (coroutine) with a call to next(...), it
> would do something like next(coro, async_framework = asyncio). Here,
> asyncio could of course be replaced with any object that identifies the
> framework. This information would then be somehow attached to the async
> function object.
>
>
> (2) __async__.framework or something similar:
>
> Add something like __async__ that has an attribute such as .framework
> that allows the code inside the async function to access the information
> passed to next(...) by the framework (L end) using the keyword argument
> of next [see (1)].
>
> (3) Generalized "await":
>
> [When the world is ready:] Allow using "await" anywhere, not just within
> async functions. Inside async functions, the behavior of "await" would
> be the same as in PEP492, with the addition that it would somehow
> propagate the __async__.framework value to the awaited coroutine.
> Outside async functions, "await" would do roughly the same as this
> function:
>
>   def await(async_func_obj):
>       try:
>           next(async_func_obj)  # same as next(async_func_obj,
> async_framework = None)
>       except StopIteration as si:
>           return si.value
>       raise RuntimeError("The function does not support synchronous
> execution")
>
> (This function would, of course, work in Python 3.4, but it would be
> mostly useless because the async functions would not know that they are
> being called in a 'synchronous program'. IIUC, this *function* would be
> valid even with PEP492, but having this as a function would be ugly in
> the long run.)
>
>
> Some random thoughts:
>
> With this addition to Python, one could write libraries that work both
> async and non-async. When await is not inside async def, one would
> expect it to potentially do blocking IO, just like an await inside async
> def would suggest that there is a yield/suspend somewhere in there.
>
> For testing, I tried to see if there is a reasonable way to make a hack
> with __async__.framework that could be set by next(), but did not find
> an obvious way. For instance, coro.gi_frame.f_locals is read-only, I
> believe.
>
> An alternative to this approach could be that await would implicitly
> start a temporary event loop for running the coroutine, but how would it
> know which event loop? This might also have a huge performance overhead.
>
> Relation to PEP492:
>
> This of course still needs more thinking, but I wanted to post it here
> now in case there is desire to prepare for something like this already
> in PEP492. It is not completely clear if/how this would need to affect
> PEP492, but some things come to mind. For example, this could
> potentially remove the need for __aenter__, __aiter__, etc. or even
> "async for" and "async with". If __aenter__ is defined as "async def",
> then a with statement would do an "await" on it, and the context manager
> would have __async__.framework (or whatever it would be called)
> available, for determining what behavior is appropriate.
>
> Was this clear enough to understand which problem(s) this would be
> solving and how? I'd be happy to hear about any thoughts on this :).
>
>
> Best regards,
> Koos
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>



-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150505/6e535bc8/attachment.html>

From ncoghlan at gmail.com  Tue May  5 21:21:37 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 6 May 2015 05:21:37 +1000
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <mi79rj$vl8$1@ger.gmane.org>
 <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp>
 <mi8q15$bbp$1@ger.gmane.org>
 <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <CADiSq7eYJgnFuAmbooLdKpyT3KCDfJxNweL3xBWVsRxLeHc97w@mail.gmail.com>

On 5 May 2015 at 18:23, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> So this proposal merely amounts to reintroduction of the Python 2 str
> confusion into Python 3.  It is dangerous *precisely because* the
> current situation is so frustrating.  These functions will not be used
> by "consenting adults", in most cases.  Those with sufficient
> knowledge for "informed consent" also know enough to decode encoded
> text ASAP, and encode internal text ALAP, with appropriate handlers,
> in the first place.
>
> Rather, these str2str functions will be used by programmers at the
> ends of their ropes desperate to suppress "those damned Unicode
> errors" by any means available.  In fact, they are most likely to be
> used and recommended by *library* writers, because they're the ones
> who are least likely to have control over input, or to know their
> clients' requirements for output.  "Just use rehandle_* to ameliorate
> the errors" is going to be far too tempting for them to resist.

The primary intended audience is Linux distribution developers using
Python 3 as the system Python. I agree misuse in other contexts is a
risk, but consider assisting the migration of the Linux ecosystem from
Python 2 to Python 3 sufficiently important that it's worth our while
taking that risk.

> That Nick, of all people, supports this proposal is to me just
> confirmation that it's frustration, and only frustration, speaking
> here.  He used to be one of the strongest supporters of keeping
> "native text" (Unicode) and "encoded text" separate by keeping the
> latter in bytes.

It's not frustration (at least, I don't think it is), it's a proposal
for advanced tooling to deal properly with legacy *nix systems that
either:

a. use a locale encoding other than UTF-8; or
b. don't reliably set the locale encoding for system services and cron
jobs (which anecdotally appears to amount to "aren't using systemd" in
the current crop of *nix init systems)

If a developer only cares about Windows, Mac OS X, or modern systemd
based *nix systems that use UTF-8 as the system locale, and they never
set "LANG=C" before running a Python program, then these new functions
will be completely irrelevant to them. (I've also submitted a request
to the glibc team to make C.UTF-8 universally available, reducing the
need to use "LANG=C", and they're amenable to the idea, but it
requires someone to work on preparing and submitting a patch:
https://sourceware.org/bugzilla/show_bug.cgi?id=17318)

If, however, a developer wants to handle "LANG=C", or other non-UTF-8
locales reliably across the full spectrum of *nix systems in Python 3,
they need a way to cope with system data that they *know* has been
decoded incorrectly by the interpreter, as we'll potentially do
exactly that for environment variables, command line arguments,
stdin/stdout/stderr and more if we get bad locale encoding settings
from the OS (such as when "LANG=C" is specified, or the init system
simply doesn't set a locale at all and hence CPython falls back to the
POSIX default of ASCII).

Python 2 lets users sweep a lot of that under the rug, as the data at
least round trips within the system, but you get unexpected mojibake
in some cases (especially when taking local data and pushing it out
over the network).

Since these boundary decoding issues don't arise on properly
configured modern *nix systems, we've been able to take advantage of
that by moving Python 3 towards a more pragmatic and distro-friendly
approach in coping with legacy *nix platforms and behaviours,
primarily by starting to use "surrogateescape" by default on a few
more system interfaces (e.g. on the standard streams when the OS
*claims* that the locale encoding is ASCII, which we now assume to
indicate a configuration error, which we can at least work around for
roundtripping purposes so that "os.listdir()" works reliably at the
interactive prompt).

This change in approach (heavily influenced by the parallel "Python 3
as the default system Python" efforts in Ubuntu and Fedora) *has*
moved us back towards an increased risk of introducing mojibake in
legacy environments, but the nature of that trade-off has changed
markedly from the situation back in 2009 (let alone 2006):

* most popular modern Linux systems use systemd with the UTF-8 locale,
which "just works" from a boundary encoding/decoding perspective (it's
closely akin to the situation we've had on Mac OS X from the dawn of
Python 3)
* even without systemd, most modern *nix systems at least default to
the UTF-8 locale, which works reliably for user processes in the
absence of an explicit setting like "LANG=C", even if service daemons
and cron jobs can be a bit sketchier in terms of the locale settings
they receive
* for legacy environments migrating from Python 2 without upgrading
the underlying OS, our emphasis has shifted to tolerating "bug
compatibility" at the Python level in order to ease migration, as the
most appropriate long term solution for those environments is now to
upgrade their OS such that it more reliably provides correct locale
encoding settings to the Python 3 interpreter (which wasn't a
generally available option back when Python 3 first launched)

Armin Ronacher (as ever) provides a good explanation of the system
interface problems that can arise in Python 3 with bad locale encoding
settings here: http://click.pocoo.org/4/python3/#python3-surrogates

In my view, the critical helper function for this purpose is actually
"handle_surrogateescape", as that's the one that lets us readily adapt
from the incorrectly specified ASCII locale encoding to any other
ASCII-compatible system encoding once we've bootstrapped into a full
Python environment which has more options for figuring out a suitable
encoding than just looking at the locale setting provided by the C
runtime. It's also the function that serves to provide the primary
"hook" where we can hang documentation of this platform specific
boundary encoding/decoding issue.

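The round-tripping behaviour described here is easy to demonstrate (note
that "handle_surrogateescape" is the proposal's name, not an existing
function; only the standard error handlers are used below):

```python
# Bytes that are invalid in the presumed (ASCII) locale encoding survive
# a decode/encode round trip via the surrogateescape error handler.
raw = b'caf\xe9'                       # latin-1 bytes; \xe9 is not ASCII
text = raw.decode('ascii', 'surrogateescape')
assert text == 'caf\udce9'             # the bad byte became a lone surrogate
assert text.encode('ascii', 'surrogateescape') == raw
# Once the *real* encoding is discovered, the escaped bytes can be
# re-decoded correctly -- the adaptation step described above:
fixed = text.encode('ascii', 'surrogateescape').decode('latin-1')
assert fixed == 'caf\xe9'
```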
The other suggested functions are then more about providing a "peek
behind the curtain" API for folks that want to *use Python* to explore
some of the ins and outs of Unicode surrogate handling. Surrogates and
astrals really aren't that complicated, but we've historically hidden
them away as "dark magic not to be understood by mere mortals". In
reality, they're just different ways of composing sequences of
integers to represent text, and the suggested APIs are designed to
expose that in a way we haven't done in the past. I can't actually
think of a practical purpose for them other than teaching people the
basics of how Unicode representations work, but demystifying that
seems sufficiently worthwhile to me that I'm not opposed to their
inclusion (bear in mind I'm also the current "dis" module maintainer,
and a contributor to the "inspect" module, so I'm a big fan of exposing
underlying concepts like this in a way that lets people play with them
programmatically for learning purposes).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From eric at trueblade.com  Tue May  5 23:03:33 2015
From: eric at trueblade.com (Eric V. Smith)
Date: Tue, 05 May 2015 17:03:33 -0400
Subject: [Python-ideas] async/await and synchronous code (and PEP492 ?)
In-Reply-To: <CAP7+vJ+ZsdCmDf+Hns2FXzo1stEX-pfeUZpxFBoin4GbzpPZZw@mail.gmail.com>
References: <5548CBEC.3000303@aalto.fi>
 <1924070601.522541.1430848804012.JavaMail.yahoo@mail.yahoo.com>
 <CAP7+vJ+ZsdCmDf+Hns2FXzo1stEX-pfeUZpxFBoin4GbzpPZZw@mail.gmail.com>
Message-ID: <55493025.10209@trueblade.com>

On 5/5/2015 2:48 PM, Guido van Rossum wrote:
> Quick notes:
> - I don't think it's really possible to write realistic async code
> independently from an async framework.
> - For synchronous code that wants to use some async code, the pattern is
> simple:
>     asyncio.get_event_loop().run_until_complete(some_async_call(args, etc))
> - We can probably wrap this in a convenience helper function so you can
> just write:
>     asyncio.sync_wait(some_async_call(args, etc))
> - Note that this will fail (and rightly so!) if called when the event
> loop is already running.

If we're going through all of the effort to elevate await and async def
to syntax, then can't the interpreter also be aware if it's running an
event loop? Then, if we are running an event loop, await becomes "yield
from", using the event loop. But if we're not running an event loop,
then await becomes a blocking wait, using some version of
run_until_complete, whether really from asyncio or baked into the
interpreter.

This way, I can write my library code as being async, but it's still
usable from non-async code (although it would need to be called with
await, of course).

I'll admit I haven't thought this all the way through, and I'm still
reading through PEP 492. But if I can write my async code as if it were
blocking using await, why can't it really be blocking, too?
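To illustrate the blocking half of that idea, here is roughly what such a helper could look like today (sync_wait is Guido's suggested name above, not an existing asyncio function):

```python
import asyncio

def sync_wait(coro):
    """Hypothetical helper: run a coroutine to completion from
    synchronous code by spinning up an event loop just for this call.
    Would fail if a loop were already running in this thread."""
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(coro)
    finally:
        loop.close()

async def fetch_answer():
    await asyncio.sleep(0)   # stands in for real async I/O
    return 42

print(sync_wait(fetch_answer()))  # 42
```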

Eric.


From guido at python.org  Tue May  5 23:27:29 2015
From: guido at python.org (Guido van Rossum)
Date: Tue, 5 May 2015 14:27:29 -0700
Subject: [Python-ideas] async/await and synchronous code (and PEP492 ?)
In-Reply-To: <55493025.10209@trueblade.com>
References: <5548CBEC.3000303@aalto.fi>
 <1924070601.522541.1430848804012.JavaMail.yahoo@mail.yahoo.com>
 <CAP7+vJ+ZsdCmDf+Hns2FXzo1stEX-pfeUZpxFBoin4GbzpPZZw@mail.gmail.com>
 <55493025.10209@trueblade.com>
Message-ID: <CAP7+vJJ6O8fTDz_cJLw7wtEY4-M4JjZXoD5VMOVBU1grUttRqg@mail.gmail.com>

No, we can't, because async/await are interpreted by the *compiler*,
while the presence of an event loop is a condition of the *runtime*.
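To make the distinction concrete: whether a loop is running can only be discovered at runtime, not at compile time. A sketch (using asyncio.get_running_loop(), a later addition to asyncio, for brevity):

```python
import asyncio

def in_event_loop():
    """Whether an event loop is running is a runtime fact -- the
    compiler translating 'await' cannot know it."""
    try:
        asyncio.get_running_loop()
        return True
    except RuntimeError:
        return False

print(in_event_loop())       # False: no loop in plain sync code

async def probe():
    return in_event_loop()

print(asyncio.run(probe()))  # True: same call, inside a loop
```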

On Tue, May 5, 2015 at 2:03 PM, Eric V. Smith <eric at trueblade.com> wrote:

> On 5/5/2015 2:48 PM, Guido van Rossum wrote:
> > Quick notes:
> > - I don't think it's really possible to write realistic async code
> > independently from an async framework.
> > - For synchronous code that wants to use some async code, the pattern is
> > simple:
> >     asyncio.get_event_loop().run_until_complete(some_async_call(args,
> etc))
> > - We can probably wrap this in a convenience helper function so you can
> > just write:
> >     asyncio.sync_wait(some_async_call(args, etc))
> > - Note that this will fail (and rightly so!) if called when the event
> > loop is already running.
>
> If we're going through all of the effort to elevate await and async def
> to syntax, then can't the interpreter also be aware if it's running an
> event loop? Then, if we are running an event loop, await becomes "yield
> from", using the event loop. But if we're not running an event loop,
> then await becomes a blocking wait, using some version of
> run_until_complete, whether really from asyncio or baked into the
> interpreter.
>
> This way, I can write my library code as being async, but it's still
> usable from non-async code (although it would need to be called with
> await, of course).
>
> I'll admit I haven't thought this all the way through, and I'm still
> reading through PEP 492. But if I can write my async code as if it were
> blocking using await, why can't it really be blocking, too?
>
> Eric.
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>



-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150505/1dd722b9/attachment.html>

From koos.zevenhoven at aalto.fi  Wed May  6 00:23:19 2015
From: koos.zevenhoven at aalto.fi (Koos Zevenhoven)
Date: Wed, 6 May 2015 01:23:19 +0300
Subject: [Python-ideas] async/await and synchronous code (and PEP492 ?)
In-Reply-To: <CAP7+vJ+ZsdCmDf+Hns2FXzo1stEX-pfeUZpxFBoin4GbzpPZZw@mail.gmail.com>
References: <5548CBEC.3000303@aalto.fi>
 <1924070601.522541.1430848804012.JavaMail.yahoo@mail.yahoo.com>
 <CAP7+vJ+ZsdCmDf+Hns2FXzo1stEX-pfeUZpxFBoin4GbzpPZZw@mail.gmail.com>
Message-ID: <554942D7.6080107@aalto.fi>

Hi Guido and Andrew,

Thank you for your prompt responses!

On 5.5.2015 21:48, Guido van Rossum wrote:
> Quick notes:
> - I don't think it's really possible to write realistic async code 
> independently from an async framework.

And since there is asyncio in the standard library, I would assume there 
typically is no reason to do that either(?) However, as a side effect of 
my proposal, there would still be a way to use an if statement to pick 
the right async code to match the framework, along with matching the 
non-async version :).

Speaking of side effects, I think the same "__async__" variable might 
also naturally provide this:

https://mail.python.org/pipermail/python-ideas/2015-April/033152.html

By the way, if I understand your first note, it might be the same as my 
"The Y and L ends need to be compatible with each other for the code to 
work." Sorry about the terminology. I hope the explanations of Y and L 
are somewhat understandable.

> - For synchronous code that wants to use some async code, the pattern 
> is simple:
> asyncio.get_event_loop().run_until_complete(some_async_call(args, etc))
> - We can probably wrap this in a convenience helper function so you 
> can just write:
>     asyncio.sync_wait(some_async_call(args, etc))

This is what is keeping me from using asyncio. Ignoring performance 
overhead, if in any synchronous script (or interactive prompt or ipython 
notebook) all calls to my library would look like that, I will happily 
use my 2.7 version that uses threads. Well, I admit that the part about 
"happily" is not completely true in my case.

Instead, I would be quite happy typing "await <function_call>", since 
awaiting the function call (to finish/return a value) is exactly what I 
would be doing, regardless of whether there is an event loop or not.

> - Note that this will fail (and rightly so!) if called when the event 
> loop is already running.
>

Regarding my proposal, there would still be a way for libraries to 
provide this functionality, if desired :).

Please see also the comments below.

> On Tue, May 5, 2015 at 11:00 AM, Andrew Barnert via Python-ideas 
> <python-ideas at python.org <mailto:python-ideas at python.org>> wrote:
>
>     It seems like it might be a lot easier to approach this from the
>     other end: Is it possible to write a decorator that takes an async
>     coroutine function, strips out all the awaits, and returns a
>     regular sync function? If so, all you need to do is write
>     everything as async, and then users can "from spam import sync as
>     spam" or "from spam import async as spam" (where async just
>     imports all the real functions, while sync imports them and calls
>     the decorator on all of them).
>

Interesting idea. If this is possible, it would solve part of the issue, 
but the "Y end" (sorry) of the chain may still need to be done by hand.

>
>     That also avoids the need to have all the looking up the event
>     loop, switching between different code branches, etc. inside every
>     function at runtime. (Not that it matters for the performance of
>     sleep(1), but it might matter for the performance of other
>     functions, and, more importantly, it might make the implementation
>     of those functions simpler and easier to debug through.)
>
>

This could indeed save some if statements at runtime.

Note that the if statements would not be inside every function, but only 
in the ones that do the actual IO. For instance, some 3rd-party library 
might use wrappers around socket send and socket recv to choose between 
sync and async versions, and that might be all the IO it needs to build 
several layers of async code. Even better, had someone taken the time to 
provide these if statements inside the standard library, the whole 
3rd-party async library would just magically work also in synchronous 
code :).

Best regards,
Koos

>
>     On Tuesday, May 5, 2015 7:01 AM, Koos Zevenhoven
>     <koos.zevenhoven at aalto.fi <mailto:koos.zevenhoven at aalto.fi>> wrote:
>
>
>
>         Hi all!
>
>         I am excited about seeing what's going on with asyncio and
>         PEP492 etc. I
>         really like that Python is becoming more suitable for the
>         increasing
>         amount of async code and that the distinction between async
>         functions
>         and generators is increasing.
>
>         In addition, however, I would also like to see the async
>         functions and
>         methods come even closer to regular functions and methods.
>         This is
>         something that is keeping me from using asyncio at the moment
>         even if I
>         would like to. Below I'll try to explain what and why, and a
>         little bit
>         of how. If it is not clear, please ask :)
>
>         Motivation:
>
>         One of the best things about asyncio and coroutines/async
>         functions is
>         that you can write asynchronous code as if it were
>         synchronous, the
>         difference in many places being just the use of "await"
>         ("yield from")
>         when calling something that may end up doing IO (somewhere
>         down the
>         function call chain) and that the code is run from an event loop.
>
>         When writing a package that does IO, you have the option to
>         make it
>         either synchronous or asynchronous. Regardless of the choice,
>         the code
>         will look roughly the same. But what if you want to be able to
>         do both?
>         Should you maintain two versions, one with "async" and "await"
>         everywhere and one without?
>
>         Besides the keywords "async" and "await", async code of course
>         differs
>         from synchronous code by the functions/coroutines that are
>         used for IO
>         at the end of the function call chain. Here, I mean the end
>         (close to)
>         where the "yield" expressions are hidden in the async
>         versions. At the
>         other end of the calling chain, async code needs the event
>         loop and
>         associated framework (almost always asyncio?) which hides all
>         the async
>         scheduling fanciness etc. I'm not sure about the terminology,
>         but I will
>         use "L end" and "Y end" to refer to the two ends here. (L for
>         event
>         Loop; Y for Yield)
>
>         The Y and L ends need to be compatible with each other for the
>         code to
>         work. While asyncio and the standard library might provide
>         both ends in
>         many cases, there can also be situations where a package would
>         want to
>         work with different combinations of L and Y end, or completely
>         without
>         an event loop, i.e. synchronously.
>
>         In a very simple example, one might want to wrap different
>         implementations of sleep() in a function that would pick the
>         right one
>         depending on the context. Perhaps something like this:
>
>           async def any_sleep(seconds):
>               if __async__.framework is None:
>                   time.sleep(1)
>               elif __async__.framework is asyncio:
>                   await asyncio.sleep(1)
>               else:
>                   raise RuntimeError("Was called with an unsupported
>         async
>         framework.")
>
>         [You could of course replace sleep() with socket IO or
>         whatever, but
>         sleep is nice and simple. Also, a larger library would
>         probably have a
>         whole chain of async functions and methods before calling
>         something like
>         this]
>
>         But if await is only allowed inside "async def", then how can
>         any_sleep() be conveniently run in non-async code? Also, there is
>         nothing like __async__.framework. Below, I describe what I
>         think a
>         potential solution might look like.
>
>
>
>         Potential solution:
>
>         This is simplified version; for instance, as "awaitables", I
>         consider
>         only async function objects here. I describe the idea in three
>         parts:
>
>         (1) next(...):
>
>         Add a keyword argument "async_framework" (or whatever) to
>         next(...) with
>         a default value of None. When an async framework, typically
>         asyncio,
>         starts an async function object (coroutine) with a call to
>         next(...), it
>         would do something like next(coro, async_framework = asyncio).
>         Here,
>         asyncio could of course be replaced with any object that
>         identifies the
>         framework. This information would then be somehow attached to
>         the async
>         function object.
>
>
>         (2) __async__.framework or something similar:
>
>         Add something like __async__ that has an attribute such as
>         .framework
>         that allows the code inside the async function to access the
>         information
>         passed to next(...) by the framework (L end) using the keyword
>         argument
>         of next [see (1)].
>
>         (3) Generalized "await":
>
>         [When the world is ready:] Allow using "await" anywhere, not
>         just within
>         async functions. Inside async functions, the behavior of
>         "await" would
>         be the same as in PEP492, with the addition that it would somehow
>         propagate the __async__.framework value to the awaited coroutine.
>         Outside async functions, "await" would do roughly the same as
>         this function:
>
>           def await(async_func_obj):
>               try:
>                   next(async_func_obj)  # same as next(async_func_obj,
>         async_framework = None)
>               except StopIteration as si:
>                   return si.value
>               raise RuntimeError("The function does not support
>         synchronous
>         execution")
>
>         (This function would, of course, work in Python 3.4, but it
>         would be
>         mostly useless because the async functions would not know that
>         they are
>         being called in a 'synchronous program'. IIUC, this *function*
>         would be
>         valid even with PEP492, but having this as a function would be
>         ugly in
>         the long run.)
>
>
>         Some random thoughts:
>
>         With this addition to Python, one could write libraries that
>         work both
>         async and non-async. When await is not inside async def, one
>         would
>         expect it to potentially do blocking IO, just like an await
>         inside async
>         def would suggest that there is a yield/suspend somewhere in
>         there.
>
>         For testing, I tried to see if there is a reasonable way to
>         make a hack
>         with __async__.framework that could be set by next(), but did
>         not find
>         an obvious way. For instance, coro.gi_frame.f_locals is
>         read-only, I
>         believe.
>
>         An alternative to this approach could be that await would
>         implicitly
>         start a temporary event loop for running the coroutine, but
>         how would it
>         know which event loop? This might also have a huge performance
>         overhead.
>
>         Relation to PEP492:
>
>         This of course still needs more thinking, but I wanted to post
>         it here
>         now in case there is desire to prepare for something like this
>         already
>         in PEP492. It is not completely clear if/how this would need
>         to affect
>         PEP492, but some things come to mind. For example, this could
>         potentially remove the need for __aenter__, __aiter__, etc. or
>         even
>         "async for" and "async with". If __aenter__ is defined as
>         "async def",
>         then a with statement would do an "await" on it, and the
>         context manager
>         would have __async__.framework (or whatever it would be called)
>         available, for determining what behavior is appropriate.
>
>         Was this clear enough to understand which problem(s) this
>         would be
>         solving and how? I'd be happy to hear about any thoughts on
>         this :).
>
>
>         Best regards,
>         Koos
>
>
>
>
>
> -- 
> --Guido van Rossum (python.org/~guido <http://python.org/%7Eguido>)

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150506/258a65d2/attachment-0001.html>

From koos.zevenhoven at aalto.fi  Wed May  6 01:19:04 2015
From: koos.zevenhoven at aalto.fi (Koos Zevenhoven)
Date: Wed, 6 May 2015 02:19:04 +0300
Subject: [Python-ideas] async/await and synchronous code (and PEP492 ?)
In-Reply-To: <5477_1430834452_5548CD13_5477_4819_1_5548CBEC.3000303@aalto.fi>
References: <5477_1430834452_5548CD13_5477_4819_1_5548CBEC.3000303@aalto.fi>
Message-ID: <55494FE8.8050703@aalto.fi>

Hi all,

I noticed a typo in my first email (I had written __aenter__ instead of 
__enter__). I fixed the typo below.

-- Koos


On 5.5.2015 16:55, Koos Zevenhoven wrote:
>
> Relation to PEP492:
>
> This of course still needs more thinking, but I wanted to post it here 
> now in case there is desire to prepare for something like this already 
> in PEP492. It is not completely clear if/how this would need to affect 
> PEP492, but some things come to mind. For example, this could 
> potentially remove the need for __aenter__, __aiter__, etc. or even 
> "async for" and "async with". If __enter__ is defined as "async def", 
> then a with statement would do an "await" on it, and the context 
> manager would have __async__.framework (or whatever it would be 
> called) available, for determining what behavior is appropriate.
>


From abarnert at yahoo.com  Wed May  6 06:00:29 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 5 May 2015 21:00:29 -0700
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <CADiSq7eYJgnFuAmbooLdKpyT3KCDfJxNweL3xBWVsRxLeHc97w@mail.gmail.com>
References: <mi79rj$vl8$1@ger.gmane.org>
 <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <mi8q15$bbp$1@ger.gmane.org>
 <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7eYJgnFuAmbooLdKpyT3KCDfJxNweL3xBWVsRxLeHc97w@mail.gmail.com>
Message-ID: <4D8FF17C-1D0B-42C8-A55F-0479A652321F@yahoo.com>

On May 5, 2015, at 12:21, Nick Coghlan <ncoghlan at gmail.com> wrote:
> 
>> On 5 May 2015 at 18:23, Stephen J. Turnbull <stephen at xemacs.org> wrote:
>> So this proposal merely amounts to reintroduction of the Python 2 str
>> confusion into Python 3.  It is dangerous *precisely because* the
>> current situation is so frustrating.  These functions will not be used
>> by "consenting adults", in most cases.  Those with sufficient
>> knowledge for "informed consent" also know enough to decode encoded
>> text ASAP, and encode internal text ALAP, with appropriate handlers,
>> in the first place.
>> 
>> Rather, these str2str functions will be used by programmers at the
>> ends of their ropes desperate to suppress "those damned Unicode
>> errors" by any means available.  In fact, they are most likely to be
>> used and recommended by *library* writers, because they're the ones
>> who are least like to have control over input, or to know their
>> clients' requirements for output.  "Just use rehandle_* to ameliorate
>> the errors" is going to be far too tempting for them to resist.
> 
> The primary intended audience is Linux distribution developers using
> Python 3 as the system Python. I agree misuse in other contexts is a
> risk, but consider assisting the migration of the Linux ecosystem from
> Python 2 to Python 3 sufficiently important that it's worth our while
> taking that risk.

In this case, the "unfortunate" fact that all these functions have to be "buried" in codecs instead of more discoverable sounds like a _good_ thing, not a problem. The Fedora and Ubuntu people will know where to find them, other linux distros will follow their lead, and the kind of end-user developers that Stephen is worried about who just like to throw in random encode and decode calls until their one test case on their one machine works will never even notice them and will still be encouraged to actually do the right thing.

>> That Nick, of all people, supports this proposal is to me just
>> confirmation that it's frustration, and only frustration, speaking
>> here.  He used to be one of the strongest supporters of keeping
>> "native text" (Unicode) and "encoded text" separate by keeping the
>> latter in bytes.
> 
> It's not frustration (at least, I don't think it is), it's a proposal
> for advanced tooling to deal properly with legacy *nix systems that
> either:
> 
> a. use a locale encoding other than UTF-8; or
> b. don't reliably set the locale encoding for system services and cron
> jobs (which anecdotally appears to amount to "aren't using systemd" in
> the current crop of *nix init systems)

It seems like launchd systems are as good as systemd systems here. Or are you not considering OS X a *nix?

I suppose given that the timeline for Apple to switch to Python 3 as the default Python is "maybe it'll happen, but we'll never tell you until a month before the public beta", it isn't really all that relevant...

> If a developer only cares about Windows, Mac OS X, or modern systemd
> based *nix systems that use UTF-8 as the system locale, and they never
> set "LANG=C" before running a Python program, then these new functions
> will be completely irrelevant to them. (I've also submitted a request
> to the glibc team to make C.UTF-8 universally available, reducing the
> need to use "LANG=C", and they're amenable to the idea, but it
> requires someone to work on preparing and submitting a patch:
> https://sourceware.org/bugzilla/show_bug.cgi?id=17318)
> 
> If, however, a developer wants to handle "LANG=C", or other non-UTF-8
> locales reliably across the full spectrum of *nix systems in Python 3,
> they need a way to cope with system data that they *know* has been
> decoded incorrectly by the interpreter, as we'll potentially do
> exactly that for environment variables, command line arguments,
> stdin/stdout/stderr and more if we get bad locale encoding settings
> from the OS (such as when "LANG=C" is specified, or the init system
> simply doesn't set a locale at all and hence CPython falls back to the
> POSIX default of ASCII).
> 
> Python 2 lets users sweep a lot of that under the rug, as the data at
> least round trips within the system, but you get unexpected mojibake
> in some cases (especially when taking local data and pushing it out
> over the network).
> 
> Since these boundary decoding issues don't arise on properly
> configured modern *nix systems, we've been able to take advantage of
> that by moving Python 3 towards a more pragmatic and distro-friendly
> approach in coping with legacy *nix platforms and behaviours,
> primarily by starting to use "surrogateescape" by default on a few
> more system interfaces (e.g. on the standard streams when the OS
> *claims* that the locale encoding is ASCII, which we now assume to
> indicate a configuration error, which we can at least work around for
> roundtripping purposes so that "os.listdir()" works reliably at the
> interactive prompt).
> 
> This change in approach (heavily influenced by the parallel "Python 3
> as the default system Python" efforts in Ubuntu and Fedora) *has*
> moved us back towards an increased risk of introducing mojibake in
> legacy environments, but the nature of that trade-off has changed
> markedly from the situation back in 2009 (let alone 2006):
> 
> * most popular modern Linux systems use systemd with the UTF-8 locale,
> which "just works" from a boundary encoding/decoding perspective (it's
> closely akin to the situation we've had on Mac OS X from the dawn of
> Python 3)
> * even without systemd, most modern *nix systems at least default to
> the UTF-8 locale, which works reliably for user processes in the
> absence of an explicit setting like "LANG=C", even if service daemons
> and cron jobs can be a bit sketchier in terms of the locale settings
> they receive
> * for legacy environments migrating from Python 2 without upgrading
> the underlying OS, our emphasis has shifted to tolerating "bug
> compatibility" at the Python level in order to ease migration, as the
> most appropriate long term solution for those environments is now to
> upgrade their OS such that it more reliably provides correct locale
> encoding settings to the Python 3 interpreter (which wasn't a
> generally available option back when Python 3 first launched)
> 
> Armin Ronacher (as ever) provides a good explanation of the system
> interface problems that can arise in Python 3 with bad locale encoding
> settings here: http://click.pocoo.org/4/python3/#python3-surrogates
> 
> In my view, the critical helper function for this purpose is actually
> "handle_surrogateescape", as that's the one that lets us readily adapt
> from the incorrectly specified ASCII locale encoding to any other
> ASCII-compatible system encoding once we've bootstrapped into a full
> Python environment which has more options for figuring out a suitable
> encoding than just looking at the locale setting provided by the C
> runtime. It's also the function that serves to provide the primary
> "hook" where we can hang documentation of this platform specific
> boundary encoding/decoding issue.
> 
> The other suggested functions are then more about providing a "peek
> behind the curtain" API for folks that want to *use Python* to explore
> some of the ins and outs of Unicode surrogate handling. Surrogates and
> astrals really aren't that complicated, but we've historically hidden
> them away as "dark magic not to be understood by mere mortals".

I thought most Linux 2.x system Pythons were wide builds, and there definitely aren't any UTF-16 system interfaces like there are on Windows (which misleadingly calls them "Unicode", which we abet by not making people .encode('utf-16') in some of the places where they'd have to .encode('utf-8') on Mac and Linux...).

So I'm surprised there's a problem here at all. The only issues a Linux user is likely to ever see should be with surrogate escapes, not real surrogates, right?

> In
> reality, they're just different ways of composing sequences of
> integers to represent text, and the suggested APIs are designed to
> expose that in a way we haven't done in the past. I can't actually
> think of a practical purpose for them other than teaching people the
> basics of how Unicode representations work, but demystifying that
> seems sufficiently worthwhile to me that I'm not opposed to their
> inclusion (bear in mind I'm also the current "dis" module maintainer,
> and a contributor to the "inspect", so I'm a big fan of exposing
> underlying concepts like this in a way that lets people play with them
> programmatically for learning purposes).
> 
> Cheers,
> Nick.
> 
> -- 
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From stephen at xemacs.org  Wed May  6 08:36:23 2015
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 06 May 2015 15:36:23 +0900
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <72DABA4D-EA98-46CC-824B-BA3AF1785B04@yahoo.com>
References: <mi79rj$vl8$1@ger.gmane.org>
 <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp>
 <mi8q15$bbp$1@ger.gmane.org>
 <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>
 <381A9EDF-A2F5-43FF-9795-FC15AEC78A9A@yahoo.com>
 <87zj5j47zi.fsf@uwakimon.sk.tsukuba.ac.jp>
 <72DABA4D-EA98-46CC-824B-BA3AF1785B04@yahoo.com>
Message-ID: <87twvq43h4.fsf@uwakimon.sk.tsukuba.ac.jp>

Andrew Barnert writes:

 > But the PEP 393 machinery doesn't know when it's dealing with
 > strings that are ultimately destined for a UCS-2 application,
 > any more than it can know when it's dealing with strings that have
 > to be pure ASCII or CP1252 or any other character set.

Of course[1] it doesn't, and that's why I say the whole issue is just
frustration speaking.  Whatever we do, it's going to require that the
programmers know what they're doing, or they're just throwing their
garbage in the neighbor's yard.

With respect to doing the check in the str machinery, you can provide
an option that tells PEP 393 str to raise an "OutOfRepertoireError"
(subclass of UnicodeError) on introduction of astral characters to an
instance of str, or provide an API to ask an instance if it's wide
enough to accommodate astral characters.
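The second option is already trivial to express at the Python level (a sketch, not a proposed API):

```python
def needs_astral(s):
    """Ask a str whether it contains any code point outside the
    Basic Multilingual Plane (i.e. above U+FFFF)."""
    return any(ord(ch) > 0xFFFF for ch in s)

print(needs_astral("caf\u00e9"))   # False
print(needs_astral("\U0001F600"))  # True
```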

Either way, the programmer needs to design and implement the
application to use those features, and that's hard.  "Toto!  I don't
think we're in Kansas anymore!"

 > If you want to print emoji to a CP1252 console or write them to a
 > Shift-JIS text file, you get an error from an explicit or implicit
 > `str.encode` that you can debug.

Yup, and these proposals for str2str conversions propose to sneak data
with unknown meaning into the application as if it were well-formed.
This is just like assuming the modular arithmetic that is performed in
registers is actually mathematical integer arithmetic.  You'll almost
never get burned.  Isn't that good enough?

That's not for me to say, but apparently, "small integer arithmetic"
is *not* good enough for Python.

Footnotes: 
[1]  In the current implementation.  We could provide a fontconfig-
like charset facility to describe repertoire restrictions in str, and
code to enforce it.  But this is a delicate question.  Users almost
always hate repertoire restrictions when imposed for the programmer's
convenience: they want to insert emoji, or write foreign words
correctly, or cut-and-paste from email or web pages, or whatever.  And
of course the restrictions may vary depending on the output media.


From stephen at xemacs.org  Wed May  6 09:56:36 2015
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 06 May 2015 16:56:36 +0900
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <CADiSq7eYJgnFuAmbooLdKpyT3KCDfJxNweL3xBWVsRxLeHc97w@mail.gmail.com>
References: <mi79rj$vl8$1@ger.gmane.org>
 <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp>
 <mi8q15$bbp$1@ger.gmane.org>
 <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7eYJgnFuAmbooLdKpyT3KCDfJxNweL3xBWVsRxLeHc97w@mail.gmail.com>
Message-ID: <87siba3zrf.fsf@uwakimon.sk.tsukuba.ac.jp>

Nick Coghlan writes:

 > If a developer only cares about Windows, Mac OS X, or modern systemd
 > based *nix systems that use UTF-8 as the system locale, and they never
 > set "LANG=C" before running a Python program, then these new functions
 > will be completely irrelevant to them.

"Irrelevant" is wildly optimistic.  They are a gift from heaven for
programmers who are avoiding developing Unicode skills.  Don't tell me
those skills are expensive -- I know, I sweat blood and spilt milk to
acquire them.  Nevertheless, without acquiring a modicum of those
skills, use of these proposed APIs is just what Ezio described:
applying any random thing that might work, to shut up those annoying
Unicode errors.  But these *will* *appear* to work, because they are
*designed* to smuggle the unprintable all the way to the output medium
by giving it a printable encoding.  You'll only find out that it was
done incorrectly when the user goes "achtung! mojibake!", and that
will be way too late.

 > If, however, a developer wants to handle "LANG=C", or other non-UTF-8
 > locales reliably across the full spectrum of *nix systems in Python 3,
 > they need a way to cope with system data that they *know* has been
 > decoded incorrectly by the interpreter,

But if so, why is this being discussed as a visible addition to the
Python API?  AFAICS, .decode('ascii', errors='surrogateescape') plus
some variant on

for encoding in plausible_encoding_by_likelihood_list:
    try:
        s = input.encode('ascii', errors='surrogateescape')
        s = s.decode(encoding, errors='strict')
        break
    except UnicodeError:
        continue

is all you really need inside of the Python init sequence.  That is
how I read your opinion, too.
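Concretely, that loop amounts to something like this self-contained sketch (the two candidate encodings here are purely illustrative):

```python
# Sketch: text the interpreter decoded as ASCII with surrogateescape,
# when the underlying bytes were really UTF-8.
raw = "caf\u00e9".encode("utf-8")                        # b'caf\xc3\xa9'
mangled = raw.decode("ascii", errors="surrogateescape")  # 'caf\udcc3\udca9'

# Recover the original bytes, then try candidate encodings strictly.
for encoding in ("utf-8", "latin-1"):  # stand-in for the likelihood list
    try:
        fixed = mangled.encode("ascii", errors="surrogateescape").decode(encoding)
        break
    except UnicodeError:
        continue

print(fixed)  # caf\u00e9, i.e. the original text
```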

 > The other suggested functions are then more about providing a "peek
 > behind the curtain" API for folks that want to *use Python* to explore
 > some of the ins and outs of Unicode surrogate handling.

I just don't see a need.  .encode and .decode already give you all the
tools you need for exploring, and they do so in a way that tells you
via the type whether you're looking at abstract text or at the
representation.  It doesn't get better than this!

And if the APIs merely exposed the internal representation that would
be one thing.  But they don't, and the people who are saying, "I'm not
an expert on Unicode but this looks great!" are clearly interested in
mutating str instances to be something more palatable to the requisite
modules and I/O systems they need to use, but which aren't prepared for
astral characters or proper handling of surrogateescapes.

 > I can't actually think of a practical purpose for them other than
 > teaching people the basics of how Unicode representations work,

I agree, but it seems to me that a lot of people are already scheming
to use them for practical purposes.  Serhiy mentions tkinter, email,
and wsgiref, and David lusts after them for email.


From levkivskyi at gmail.com  Wed May  6 15:15:38 2015
From: levkivskyi at gmail.com (Ivan Levkivskyi)
Date: Wed, 6 May 2015 15:15:38 +0200
Subject: [Python-ideas] (no subject)
Message-ID: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>

Dear all,

The matrix multiplication operator @ is going to be introduced in Python
3.5 and I am thinking about the following idea:

The semantics of matrix multiplication is the composition of the
corresponding linear transformations.
A linear transformation is a particular example of a more general concept -
functions.
The latter are frequently composed with ("wrap") each other. For example:

plot(real(sqrt(data)))

However, it is not very readable in the case of many wrapping layers.
Therefore, it could be useful to employ
the matrix multiplication operator @ to indicate function
composition. This could be done by such a (simplified) decorator:

class composable:

    def __init__(self, func):
        self.func = func

    def __call__(self, arg):
        return self.func(arg)

    def __matmul__(self, other):
        def composition(*args, **kwargs):
            return self.func(other(*args, **kwargs))
        return composable(composition)

I think using such a decorator with functions that are going to be
deeply wrapped could improve readability.
You could compare (note that only the outermost function should be
decorated):

plot(sorted(sqrt(real(data_array)))) vs. (plot @ sorted @ sqrt @ real)
(data_array)

I think the latter is more readable, also compare

def sunique(lst):
    return sorted(list(set(lst)))

vs.

sunique = sorted @ list @ set

Apart from readability, there are the following pros of the proposed decorator:

1. Similar semantics as for matrix multiplication.
2. Same symbol for composition as for decorators.
3. The symbol @ resembles mathematical notation for function composition: ∘

I think it could be a good idea to add such a decorator to the stdlib
functools module.
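A quick self-contained check that the decorator sketched above behaves as claimed:

```python
class composable:
    def __init__(self, func):
        self.func = func

    def __call__(self, arg):
        return self.func(arg)

    def __matmul__(self, other):
        # (f @ g)(x) == f(g(x)); `other` may be any callable.
        def composition(*args, **kwargs):
            return self.func(other(*args, **kwargs))
        return composable(composition)

# Only the outermost function needs the wrapper:
sunique = composable(sorted) @ list @ set
print(sunique([3, 1, 2, 1, 3]))  # [1, 2, 3]
```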
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150506/bab3aa81/attachment.html>

From levkivskyi at gmail.com  Wed May  6 15:20:15 2015
From: levkivskyi at gmail.com (Ivan Levkivskyi)
Date: Wed, 6 May 2015 15:20:15 +0200
Subject: [Python-ideas] Add 'composable' decorator to functools (with @
	matrix multiplication)
Message-ID: <CAOMjWk=-1u8d8ZxRmXNAiitLpHRbb7dCqbyBQu6VhaJHKUkp1w@mail.gmail.com>

Dear all,

The matrix multiplication operator @ is going to be introduced in Python
3.5 and I am thinking about the following idea:

The semantics of matrix multiplication is the composition of the
corresponding linear transformations.
A linear transformation is a particular example of a more general concept -
functions.
The latter are frequently composed with ("wrap") each other. For example:

plot(real(sqrt(data)))

However, it is not very readable in the case of many wrapping layers.
Therefore, it could be useful to employ
the matrix multiplication operator @ to indicate function
composition. This could be done by such a (simplified) decorator:

class composable:

    def __init__(self, func):
        self.func = func

    def __call__(self, arg):
        return self.func(arg)

    def __matmul__(self, other):
        def composition(*args, **kwargs):
            return self.func(other(*args, **kwargs))
        return composable(composition)

I think using such a decorator with functions that are going to be
deeply wrapped could improve readability.
You could compare (note that only the outermost function should be
decorated):

plot(sorted(sqrt(real(data_array)))) vs. (plot @ sorted @ sqrt @ real)
(data_array)

I think the latter is more readable, also compare

def sunique(lst):
    return sorted(list(set(lst)))

vs.

sunique = sorted @ list @ set

Apart from readability, there are the following pros of the proposed decorator:

1. Similar semantics as for matrix multiplication.
2. Same symbol for composition as for decorators.
3. The symbol @ resembles mathematical notation for function composition: ∘

I think it could be a good idea to add such a decorator to the stdlib
functools module.

From abarnert at yahoo.com  Wed May  6 15:59:45 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Wed, 6 May 2015 06:59:45 -0700
Subject: [Python-ideas] (no subject)
In-Reply-To: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
Message-ID: <FA7C7E2C-28ED-4397-852E-F801C9183AFF@yahoo.com>

This was discussed when the proposal to add @ for matrix multiplication came up, so you should first read that thread and make sure you have answers to all of the issues that came up before proposing it again.

Off the top of my head:

Python functions don't just take 1 parameter, they take any number of parameters, possibly including optional parameters, keyword-only, *args, **kwargs, etc. There are a dozen different compose implementations on PyPI and ActiveState that handle these differently. Which one is "right"?

The design you describe can be easily implemented as a third-party library. Why not do so, put it on PyPI, see if you get any traction and any ideas for improvement, and then suggest it for the stdlib?

The same thing is already doable today using a different operator--and, again, there are a dozen implementations. Why isn't anyone using them?

Thinking in terms of function composition requires a higher level of abstraction than thinking in terms of lambda expressions. That's one of the reasons people perceive Haskell to be a harder language to learn than Lisp or Python. Of course learning Haskell is rewarding--but being easy to learn is one of Python's major strengths.

Python doesn't have a static optimizing compiler that can avoid building 4 temporary function objects to evaluate (plot @ sorted @ sqrt @ real) (data_array), so it will make your code significantly less efficient.

Is @ for composition and () for application really sufficient to write point free code in general without auto-curried functions, operator sectioning, reverse compose, reverse apply, etc.? Most of the examples people use in describing the feature from Haskell have a (+ 1) or (== x) or take advantage of map-type functions being (a->b) -> ([a] -> [b]) instead of (a->b, [a]) -> [b].

Sent from my iPhone

> On May 6, 2015, at 06:15, Ivan Levkivskyi <levkivskyi at gmail.com> wrote:
> 
> Dear all,
> 
> The matrix multiplication operator @ is going to be introduced in Python 3.5 and I am thinking about the following idea:
> 
> The semantics of matrix multiplication is the composition of the corresponding linear transformations.
> A linear transformation is a particular example of a more general concept - functions.
> The latter are frequently composed with ("wrap") each other. For example:
> 
> plot(real(sqrt(data)))
> 
> However, it is not very readable in case of many wrapping layers. Therefore, it could be useful to employ
> the matrix multiplication operator @ for indication of function composition. This could be done by such (simplified) decorator:
> 
> class composable:
> 
>     def __init__(self, func):
>         self.func = func
> 
>     def __call__(self, arg):
>         return self.func(arg)
> 
>     def __matmul__(self, other):
>         def composition(*args, **kwargs):
>             return self.func(other(*args, **kwargs))
>         return composable(composition)
> 
> I think using such decorator with functions that are going to be deeply wrapped 
> could improve readability.
> You could compare (note that only the outermost function should be decorated):
> 
> plot(sorted(sqrt(real(data_array)))) vs. (plot @ sorted @ sqrt @ real) (data_array)
> 
> I think the latter is more readable, also compare
> 
> def sunique(lst):
>     return sorted(list(set(lst)))
> 
> vs. 
> 
> sunique = sorted @ list @ set
> 
> Apart from readability, there are following pros of the proposed decorator:
> 
> 1. Similar semantics as for matrix multiplication.
> 2. Same symbol for composition as for decorators.
> 3. The symbol @ resembles mathematical notation for function composition: ∘
> 
> I think it could be a good idea to add such a decorator to the stdlib functools module.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From guettliml at thomas-guettler.de  Wed May  6 16:05:00 2015
From: guettliml at thomas-guettler.de (Thomas Güttler)
Date: Wed, 06 May 2015 16:05:00 +0200
Subject: [Python-ideas] Policy for altering sys.path
Message-ID: <554A1F8C.1040005@thomas-guettler.de>

I am missing a policy for how sys.path should be altered.

We run a custom subclass of list as sys.path. We set it in sitecustomize.py.

This instance gets replaced by a plain list in lines like this:

sys.path = glob.glob(os.path.join(WHEEL_DIR, "*.whl")) + sys.path

The above line is from pip, and similar things happen in a lot of packages.

Before trying to solve this with code, I think the Python community should agree on a policy for altering sys.path.

What can I do to get this done?

We use Python 2.7.


Related: http://bugs.python.org/issue24135

Regards,
   Thomas Güttler


From erik.m.bray at gmail.com  Wed May  6 16:10:22 2015
From: erik.m.bray at gmail.com (Erik Bray)
Date: Wed, 6 May 2015 10:10:22 -0400
Subject: [Python-ideas] Add 'composable' decorator to functools (with @
 matrix multiplication)
In-Reply-To: <CAOMjWk=-1u8d8ZxRmXNAiitLpHRbb7dCqbyBQu6VhaJHKUkp1w@mail.gmail.com>
References: <CAOMjWk=-1u8d8ZxRmXNAiitLpHRbb7dCqbyBQu6VhaJHKUkp1w@mail.gmail.com>
Message-ID: <CAOTD34bm+H+y4v+Kzjzo++aPwt6Dn8Z9khNj4cJD+D+Aj5tnKw@mail.gmail.com>

On Wed, May 6, 2015 at 9:20 AM, Ivan Levkivskyi <levkivskyi at gmail.com> wrote:
> Dear all,
>
> The matrix multiplication operator @ is going to be introduced in Python 3.5
> and I am thinking about the following idea:
>
> The semantics of matrix multiplication is the composition of the
> corresponding linear transformations.
> A linear transformation is a particular example of a more general concept -
> functions.
> The latter are frequently composed with ("wrap") each other. For example:
>
> plot(real(sqrt(data)))
>
> However, it is not very readable in case of many wrapping layers. Therefore,
> it could be useful to employ
> the matrix multiplication operator @ for indication of function composition.
> This could be done by such (simplified) decorator:
>
> class composable:
>
>     def __init__(self, func):
>         self.func = func
>
>     def __call__(self, arg):
>         return self.func(arg)
>
>     def __matmul__(self, other):
>         def composition(*args, **kwargs):
>             return self.func(other(*args, **kwargs))
>         return composable(composition)
>
> I think using such decorator with functions that are going to be deeply
> wrapped
> could improve readability.
> You could compare (note that only the outermost function should be
> decorated):
>
> plot(sorted(sqrt(real(data_array)))) vs. (plot @ sorted @ sqrt @ real)
> (data_array)
>
> I think the latter is more readable, also compare
>
> def sunique(lst):
>     return sorted(list(set(lst)))
>
> vs.
>
> sunique = sorted @ list @ set
>
> Apart from readability, there are following pros of the proposed decorator:
>
> 1. Similar semantics as for matrix multiplication.
> 2. Same symbol for composition as for decorators.
> 3. The symbol @ resembles mathematical notation for function composition: ∘
>
> I think it could be a good idea to add such a decorator to the stdlib
> functools module.

In the astropy.modeling package, which consists largely of a collection
of fancy wrappers around analytic functions,
we used the pipe operator | (that is, __or__) to implement function
composition, as demonstrated here:

http://docs.astropy.org/en/stable/modeling/compound-models.html#model-composition

I do like the idea of using the new @ operator for this purpose--it
makes sense as a generalization of linear operators,
and it just looks a little more like the circle operator often used
for functional composition.  On the other hand
I'm also fond of the choice to use |, for the similarity to UNIX shell
pipe operations, as long as it can't be confused with
__or__.  Point being something like this could be implemented now with __or__.
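As a minimal sketch of the __or__ flavour (for illustration only; this is not astropy's actual API):

```python
class pipeable:
    """Compose left to right, like a shell pipeline: (f | g)(x) == g(f(x))."""

    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)

    def __or__(self, other):
        # Feed our result into `other`, shell-pipe style.
        return pipeable(lambda *a, **k: other(self.func(*a, **k)))

sq = pipeable(lambda x: x * x)
result = (sq | str)(4)
print(result)  # '16'
```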

I think this is simple enough that it doesn't need to be in the
stdlib, especially if there are different ways people
would like to do this.  But I do like the idea.

Erik

From steve at pearwood.info  Wed May  6 16:51:35 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 7 May 2015 00:51:35 +1000
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
Message-ID: <20150506145131.GL5663@ando.pearwood.info>

On Wed, May 06, 2015 at 03:15:38PM +0200, Ivan Levkivskyi wrote:
> Dear all,
> 
> The matrix multiplication operator @ is going to be introduced in Python
> 3.5 and I am thinking about the following idea:
> 
> The semantics of matrix multiplication is the composition of the
> corresponding linear transformations.
> A linear transformation is a particular example of a more general concept -
> functions.
> The latter are frequently composed with ("wrap") each other. For example:
> 
> plot(real(sqrt(data)))
> 
> However, it is not very readable in case of many wrapping layers.
> Therefore, it could be useful to employ
> the matrix multiplication operator @ for indication of function
> composition. This could be done by such (simplified) decorator:

I like the idea of @ as a function compose operator.

There have been many requests and attempts at support for function 
composition:

http://code.activestate.com/recipes/574458-composable-functions/

http://code.activestate.com/recipes/52902-function-composition/

http://code.activestate.com/recipes/528929-dynamic-function-composition-decorator/

http://blog.o1iver.net/2011/08/09/python-function-composition.html

https://mail.python.org/pipermail/python-dev/2009-August/091161.html

http://stackoverflow.com/questions/2281693/is-it-a-good-idea-to-have-a-syntax-sugar-to-function-composition-in-python


The last one is notable, as it floundered in part on the lack of a good 
operator. I think @ makes a good operator for function composition.


I think that there are some questions that would need to be answered. 
For instance, given some composition:

    f = math.sin @ (lambda x: x**2)

what would f.__name__ return? What about str(f)?


Do the composed functions:

    (spam @ eggs @ cheese)(x)

perform acceptably compared to the traditional syntax?

    spam(eggs(cheese(x)))
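One way to answer the performance question empirically is a small timeit sketch (the compose2 helper and the toy functions are hypothetical; absolute numbers vary by machine):

```python
import timeit

def compose2(f, g):
    """Hypothetical helper: compose two one-argument functions."""
    def composed(x):
        return f(g(x))
    return composed

spam = lambda x: x + 1
eggs = lambda x: x * 2
cheese = lambda x: x - 3

composed = compose2(spam, compose2(eggs, cheese))
assert composed(10) == spam(eggs(cheese(10)))

direct = timeit.timeit(lambda: spam(eggs(cheese(10))), number=100000)
wrapped = timeit.timeit(lambda: composed(10), number=100000)
# Expect `wrapped` to be somewhat slower: one extra call frame per layer.
print(direct, wrapped)
```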



-- 
Steve

From levkivskyi at gmail.com  Wed May  6 17:05:05 2015
From: levkivskyi at gmail.com (Ivan Levkivskyi)
Date: Wed, 6 May 2015 17:05:05 +0200
Subject: [Python-ideas] (no subject)
In-Reply-To: <FA7C7E2C-28ED-4397-852E-F801C9183AFF@yahoo.com>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <FA7C7E2C-28ED-4397-852E-F801C9183AFF@yahoo.com>
Message-ID: <CAOMjWkkSK6iAQhnCTJ4JPjFioxregNz4xFu-S3NpX00p3ZnznQ@mail.gmail.com>

Dear Andrew,

Thank you for pointing out the previous discussion, I had overlooked it.
(Btw, I found your post about the infix operators; that is a great
idea.)
Also, it turns out that astropy uses a very similar idea for function
composition.

I agree that there are indeed too many ambiguities about the "right way",
and thus it is not good for the stdlib. However, implementing only one
decorator as a third-party library is not a good idea either.
You are right that no one will install such a library. Probably, it would be
better to combine it with other functionality like @infix (via overloading
__or__ or __rshift__), @auto_curry, etc.

Thank you for the feedback!


On 6 May 2015 at 15:59, Andrew Barnert <abarnert at yahoo.com> wrote:

> This was discussed when the proposal to add @ for matrix multiplication
> came up, so you should first read that thread and make sure you have
> answers to all of the issues that came up before proposing it again.
>
> Off the top of my head:
>
> Python functions don't just take 1 parameter, they take any number of
> parameters, possibly including optional parameters, keyword-only, *args,
> **kwargs, etc. There are a dozen different compose implementations on PyPI
> and ActiveState that handle these differently. Which one is "right"?
>
> The design you describe can be easily implemented as a third-party
> library. Why not do so, put it on PyPI, see if you get any traction and any
> ideas for improvement, and then suggest it for the stdlib?
>
> The same thing is already doable today using a different operator--and,
> again, there are a dozen implementations. Why isn't anyone using them?
>
> Thinking in terms of function composition requires a higher level of
> abstraction than thinking in terms of lambda expressions. That's one of the
> reasons people perceive Haskell to be a harder language to learn than Lisp
> or Python. Of course learning Haskell is rewarding--but being easy to learn
> is one of Python's major strengths.
>
> Python doesn't have a static optimizing compiler that can avoid building 4
> temporary function objects to evaluate (plot @ sorted @ sqrt @ real)
> (data_array), so it will make your code significantly less efficient.
>
> Is @ for composition and () for application really sufficient to write
> point free code in general without auto-curried functions, operator
> sectioning, reverse compose, reverse apply, etc.? Most of the examples
> people use in describing the feature from Haskell have a (+ 1) or (== x) or
> take advantage of map-type functions being (a->b) -> ([a] -> [b]) instead
> of (a->b, [a]) -> [b].
>
> Sent from my iPhone
>
> > On May 6, 2015, at 06:15, Ivan Levkivskyi <levkivskyi at gmail.com> wrote:
> >
> > Dear all,
> >
> > The matrix multiplication operator @ is going to be introduced in Python
> 3.5 and I am thinking about the following idea:
> >
> > The semantics of matrix multiplication is the composition of the
> corresponding linear transformations.
> > A linear transformation is a particular example of a more general
> concept - functions.
> > The latter are frequently composed with ("wrap") each other. For example:
> >
> > plot(real(sqrt(data)))
> >
> > However, it is not very readable in case of many wrapping layers.
> Therefore, it could be useful to employ
> > the matrix multiplication operator @ for indication of function
> composition. This could be done by such (simplified) decorator:
> >
> > class composable:
> >
> >     def __init__(self, func):
> >         self.func = func
> >
> >     def __call__(self, arg):
> >         return self.func(arg)
> >
> >     def __matmul__(self, other):
> >         def composition(*args, **kwargs):
> >             return self.func(other(*args, **kwargs))
> >         return composable(composition)
> >
> > I think using such decorator with functions that are going to be deeply
> wrapped
> > could improve readability.
> > You could compare (note that only the outermost function should be
> decorated):
> >
> > plot(sorted(sqrt(real(data_array)))) vs. (plot @ sorted @ sqrt @ real)
> (data_array)
> >
> > I think the latter is more readable, also compare
> >
> > def sunique(lst):
> >     return sorted(list(set(lst)))
> >
> > vs.
> >
> > sunique = sorted @ list @ set
> >
> > Apart from readability, there are following pros of the proposed
> decorator:
> >
> > 1. Similar semantics as for matrix multiplication.
> > 2. Same symbol for composition as for decorators.
> > 3. The symbol @ resembles mathematical notation for function
> composition: ∘
> >
> > I think it could be a good idea to add such a decorator to the stdlib
> functools module.
> > _______________________________________________
> > Python-ideas mailing list
> > Python-ideas at python.org
> > https://mail.python.org/mailman/listinfo/python-ideas
> > Code of Conduct: http://python.org/psf/codeofconduct/
>

From p.f.moore at gmail.com  Wed May  6 17:07:52 2015
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 6 May 2015 16:07:52 +0100
Subject: [Python-ideas] Policy for altering sys.path
In-Reply-To: <554A1F8C.1040005@thomas-guettler.de>
References: <554A1F8C.1040005@thomas-guettler.de>
Message-ID: <CACac1F82g4tVxXeuYvtf1PQWidHpc+k-c7POxx8kLzmboW+jbw@mail.gmail.com>

On 6 May 2015 at 15:05, Thomas Güttler <guettliml at thomas-guettler.de> wrote:
> I am missing a policy how sys.path should be altered.

Well, the docs say that applications can modify sys.path as needed.
Generally, applications modify sys.path in place via sys.path[:] =
whatever, but that's not mandated as far as I know.
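The in-place idiom matters because it keeps the same list object alive, which is what preserves a custom subclass. A sketch (WHEEL_DIR is a hypothetical path, for illustration):

```python
import glob
import os
import sys

WHEEL_DIR = "/tmp/wheels"  # hypothetical location

# Rebinds sys.path to a brand-new plain list (discards any custom subclass):
#   sys.path = glob.glob(os.path.join(WHEEL_DIR, "*.whl")) + sys.path

# Mutates the existing object, so a subclass installed by sitecustomize survives:
before = id(sys.path)
sys.path[:0] = glob.glob(os.path.join(WHEEL_DIR, "*.whl"))
assert id(sys.path) == before  # same object, same type
```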

> We run a custom sub class of list in sys.path. We set it in sitecustomize.py

Can you explain why? It seems pretty risky to expect that no
applications will replace sys.path. I understand that you're proposing
that we say that applications shouldn't do that - but just saying so
won't change the many applications already out there.

> This instance get replace by a common list in lines like this:
>
> sys.path = glob.glob(os.path.join(WHEEL_DIR, "*.whl")) + sys.path
>
> The above line is from pip, it similar things happen in a lot of packages.

How does the fact that pip does that cause a problem? The sys.path
modification is only in effect while pip is running, and no code in
pip relies on sys.path being an instance of your custom class.

> Before trying to solve this with code, I think the python community should
> agree an a policy for altering sys.path.

I can't imagine that happening, and even if it does, it won't make any
difference because a new policy won't change existing code. It won't
even affect new code unless people know about it (which isn't certain
- I doubt many people read the documentation that closely).

> What can I do to this done?

I doubt you can.

A PR for pip that changes the above line to modify sys.path in place
would probably get accepted (I can't see any reason why it wouldn't),
and I guess you could do the same for any other code you find. But as
for persuading the Python programming community not to replace
sys.path in any code, that seems unlikely to happen.

> We use Python 2.7

If you were using 3.x, then it's (barely) conceivable that making
sys.path read-only (so people could only modify it in-place) could be
done as a new feature, but (a) it would be a major backward
compatibility break, so there would have to be a strong justification,
and (b) it would stop you from replacing sys.path with your custom
class in the first place, so it wouldn't solve your issue.

Which also raises the question, why do you believe it's OK to forbid
other people to replace sys.path when that's what you're doing in your
sitecustomize code? That seems self-contradictory...

Paul

From rosuav at gmail.com  Wed May  6 17:11:16 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Thu, 7 May 2015 01:11:16 +1000
Subject: [Python-ideas] Policy for altering sys.path
In-Reply-To: <554A1F8C.1040005@thomas-guettler.de>
References: <554A1F8C.1040005@thomas-guettler.de>
Message-ID: <CAPTjJmoRK472eokBGqsB-dmQ7boS1SoAtx7+UOYnEnt_enCQMQ@mail.gmail.com>

On Thu, May 7, 2015 at 12:05 AM, Thomas Güttler
<guettliml at thomas-guettler.de> wrote:
> We run a custom sub class of list in sys.path. We set it in sitecustomize.py
>
> This instance get replace by a common list in lines like this:
>
> sys.path = glob.glob(os.path.join(WHEEL_DIR, "*.whl")) + sys.path

Forgive the obtuse question, but wouldn't an __radd__ method resolve
this for you?
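For illustration, a sketch of the idea (PathList is a hypothetical stand-in for the custom subclass): plain lists define no __radd__, so for `plain_list + PathList(...)` Python falls through to the subclass's reflected method, and the subclass survives the rebinding.

```python
class PathList(list):
    """Hypothetical sys.path subclass meant to survive `sys.path = new + sys.path`."""

    def __radd__(self, other):
        # Invoked for `plain_list + PathList(...)`; keep our own type.
        return PathList(list(other) + list(self))

p = PathList(["/lib"])
result = ["/wheels/x.whl"] + p
print(type(result).__name__, list(result))  # PathList ['/wheels/x.whl', '/lib']
```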

ChrisA

From levkivskyi at gmail.com  Wed May  6 17:21:28 2015
From: levkivskyi at gmail.com (Ivan Levkivskyi)
Date: Wed, 6 May 2015 17:21:28 +0200
Subject: [Python-ideas] Add 'composable' decorator to functools (with @
 matrix multiplication)
In-Reply-To: <CAOTD34bm+H+y4v+Kzjzo++aPwt6Dn8Z9khNj4cJD+D+Aj5tnKw@mail.gmail.com>
References: <CAOMjWk=-1u8d8ZxRmXNAiitLpHRbb7dCqbyBQu6VhaJHKUkp1w@mail.gmail.com>
 <CAOTD34bm+H+y4v+Kzjzo++aPwt6Dn8Z9khNj4cJD+D+Aj5tnKw@mail.gmail.com>
Message-ID: <CAOMjWkmgXQQeVSqD6wSoO+UecQDGuwR7Y-qYKS2T8wR5ejpC5w@mail.gmail.com>

Dear Erik,

Thank you for the link! I agree that this idea is too raw for the stdlib
(there are problems with multi-argument functions, keyword arguments, etc.).
Concerning the shell | vs. matrix @, I think it is a good idea to have
both... but with different orders.
I mean that in shell logic f | g means g(f(x)), while for matrix
multiplication f @ g means f(g(x)).
The former is probably more natural for people with more of a "programming"
background, while the latter is more natural for people with a "scientific"
background.
We could now cater to both, since we have a new operator.
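Both orders can coexist on one wrapper; a rough sketch (single-argument chains only):

```python
class composable:
    """Sketch: @ composes maths-style (right to left), | shell-style (left to right)."""

    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)

    def __matmul__(self, other):
        # (f @ g)(x) == f(g(x))
        return composable(lambda *a, **k: self.func(other(*a, **k)))

    def __or__(self, other):
        # (f | g)(x) == g(f(x))
        return composable(lambda *a, **k: other(self.func(*a, **k)))

inc = composable(lambda x: x + 1)
dbl = composable(lambda x: x * 2)
print((inc @ dbl)(10), (inc | dbl)(10))  # 21 22
```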


On 6 May 2015 at 16:10, Erik Bray <erik.m.bray at gmail.com> wrote:

> On Wed, May 6, 2015 at 9:20 AM, Ivan Levkivskyi <levkivskyi at gmail.com>
> wrote:
> > Dear all,
> >
> > The matrix multiplication operator @ is going to be introduced in Python
> 3.5
> > and I am thinking about the following idea:
> >
> > The semantics of matrix multiplication is the composition of the
> > corresponding linear transformations.
> > A linear transformation is a particular example of a more general
> concept -
> > functions.
> > The latter are frequently composed with ("wrap") each other. For example:
> >
> > plot(real(sqrt(data)))
> >
> > However, it is not very readable in case of many wrapping layers.
> Therefore,
> > it could be useful to employ
> > the matrix multiplication operator @ for indication of function
> composition.
> > This could be done by such (simplified) decorator:
> >
> > class composable:
> >
> >     def __init__(self, func):
> >         self.func = func
> >
> >     def __call__(self, arg):
> >         return self.func(arg)
> >
> >     def __matmul__(self, other):
> >         def composition(*args, **kwargs):
> >             return self.func(other(*args, **kwargs))
> >         return composable(composition)
> >
> > I think using such decorator with functions that are going to be deeply
> > wrapped
> > could improve readability.
> > You could compare (note that only the outermost function should be
> > decorated):
> >
> > plot(sorted(sqrt(real(data_array)))) vs. (plot @ sorted @ sqrt @ real)
> > (data_array)
> >
> > I think the latter is more readable, also compare
> >
> > def sunique(lst):
> >     return sorted(list(set(lst)))
> >
> > vs.
> >
> > sunique = sorted @ list @ set
> >
> > Apart from readability, there are following pros of the proposed
> decorator:
> >
> > 1. Similar semantics as for matrix multiplication.
> > 2. Same symbol for composition as for decorators.
> > 3. The symbol @ resembles mathematical notation for function
> composition: ∘
> >
> > I think it could be a good idea to add such a decorator to the stdlib
> > functools module.
>
> In the astropy.modeling package, which consists largely of collection
> of fancy wrappers around analytic functions,
> we used the pipe operator | (that is, __or__) to implement function
> composition, as demonstrated here:
>
>
> http://docs.astropy.org/en/stable/modeling/compound-models.html#model-composition
>
> I do like the idea of using the new @ operator for this purpose--it
> makes sense as a generalization of linear operators,
> and it just looks a little more like the circle operator often used
> for functional composition.  On the other hand
> I'm also fond of the choice to use |, for the similarity to UNIX shell
> pipe operations, as long as it can't be confused with
> __or__.  Point being something like this could be implemented now with
> __or__.
>
> I think this is simple enough that it doesn't need to be in the
> stdlib, especially if there are different ways people
> would like to do this.  But I do like the idea.
>
> Erik
>

From levkivskyi at gmail.com  Wed May  6 17:30:56 2015
From: levkivskyi at gmail.com (Ivan Levkivskyi)
Date: Wed, 6 May 2015 17:30:56 +0200
Subject: [Python-ideas] Function composition
Message-ID: <CAOMjWk=g_dQKAve=dZ60j40jP=SEYR_70jbZ5C7ULb4W6kSmNw@mail.gmail.com>

Dear Steve,

Thank you for the feedback and for the links!

I think that (f @ g).__name__ and str(f @ g) should be
f.__name__ + ' @ ' + g.__name__
and
str(f) + ' @ ' + str(g), respectively.
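One possible sketch of that naming rule on the composable wrapper (hedged; callables without a __name__ get a crude repr):

```python
import math

class composable:
    def __init__(self, func):
        self.func = func
        self.__name__ = getattr(func, "__name__", repr(func))

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)

    def __matmul__(self, other):
        composed = composable(lambda *a, **k: self.func(other(*a, **k)))
        other_name = getattr(other, "__name__", repr(other))
        composed.__name__ = "{} @ {}".format(self.__name__, other_name)
        return composed

    def __str__(self):
        return self.__name__

f = composable(math.sin) @ (lambda x: x ** 2)
print(f.__name__)  # sin @ <lambda>
```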

Concerning the performance, I think that it could be poor, and I don't know
yet how to improve this.

> > Dear all,
> >
> > The matrix multiplication operator @ is going to be introduced in Python
> > 3.5 and I am thinking about the following idea:
> >
> > The semantics of matrix multiplication is the composition of the
> > corresponding linear transformations.
> > A linear transformation is a particular example of a more general
concept -
> > functions.
> > The latter are frequently composed with ("wrap") each other. For
example:
> >
> > plot(real(sqrt(data)))
> >
> > However, it is not very readable in case of many wrapping layers.
> > Therefore, it could be useful to employ
> > the matrix multiplication operator @ for indication of function
> > composition. This could be done by such (simplified) decorator:
>
> I like the idea of @ as a function compose operator.
>
> There have been many requests and attempts at support for function
> composition:
>
> http://code.activestate.com/recipes/574458-composable-functions/
>
> http://code.activestate.com/recipes/52902-function-composition/
>
>
http://code.activestate.com/recipes/528929-dynamic-function-composition-decorator/
>
> http://blog.o1iver.net/2011/08/09/python-function-composition.html
>
> https://mail.python.org/pipermail/python-dev/2009-August/091161.html
>
>
http://stackoverflow.com/questions/2281693/is-it-a-good-idea-to-have-a-syntax-sugar-to-function-composition-in-python
>
>
> The last one is notable, as it floundered in part on the lack of a good
> operator. I think @ makes a good operator for function composition.
>
>
> I think that there are some questions that would need to be answered.
> For instance, given some composition:
>
>     f = math.sin @ (lambda x: x**2)
>
> what would f.__name__ return? What about str(f)?
>
>
> Do the composed functions:
>
>     (spam @ eggs @ cheese)(x)
>
> perform acceptably compared to the traditional syntax?
>
>     spam(eggs(cheese(x)))
>
>
>
> --
> Steve

From erik.m.bray at gmail.com  Wed May  6 17:42:17 2015
From: erik.m.bray at gmail.com (Erik Bray)
Date: Wed, 6 May 2015 11:42:17 -0400
Subject: [Python-ideas] Add 'composable' decorator to functools (with @
 matrix multiplication)
In-Reply-To: <CAOMjWkmgXQQeVSqD6wSoO+UecQDGuwR7Y-qYKS2T8wR5ejpC5w@mail.gmail.com>
References: <CAOMjWk=-1u8d8ZxRmXNAiitLpHRbb7dCqbyBQu6VhaJHKUkp1w@mail.gmail.com>
 <CAOTD34bm+H+y4v+Kzjzo++aPwt6Dn8Z9khNj4cJD+D+Aj5tnKw@mail.gmail.com>
 <CAOMjWkmgXQQeVSqD6wSoO+UecQDGuwR7Y-qYKS2T8wR5ejpC5w@mail.gmail.com>
Message-ID: <CAOTD34YQJOQpp+whiX2Xp5OwJ1Y4c=H7fLm3aSgeMyvhmTD3OQ@mail.gmail.com>

On Wed, May 6, 2015 at 11:21 AM, Ivan Levkivskyi <levkivskyi at gmail.com> wrote:
> Dear Erik,
>
> Thank you for the link! I agree that this idea is too raw for stdlib (there
> are problems with many argument functions, keyword arguments, etc.)
> Concerning the shell | vs. matrix @ I think it is a good idea to have
> both... but with different order.
> I mean in shell logic f | g means g (f (x)), while for matrix multiplication
> f @ g means f(g(x)).
> The former is probably more natural for people with more "programming"
> background, while the latter is more natural for people with a "scientific"
> background.
> We could now do good for both, since we now have a new operator.

Absolutely!  I've found that it takes a little work sometimes for
scientific users to wrap
their heads around the

g | f

syntax.  Once Python 3.5 is out I might add support for "f @ g" as
well, though I'm wary
of having more than one way to do it.  Worth trying out though, so
thanks for the idea.

Erik

> On 6 May 2015 at 16:10, Erik Bray <erik.m.bray at gmail.com> wrote:
>>
>> On Wed, May 6, 2015 at 9:20 AM, Ivan Levkivskyi <levkivskyi at gmail.com>
>> wrote:
>> > Dear all,
>> >
>> > The matrix multiplication operator @ is going to be introduced in Python
>> > 3.5
>> > and I am thinking about the following idea:
>> >
>> > The semantics of matrix multiplication is the composition of the
>> > corresponding linear transformations.
>> > A linear transformation is a particular example of a more general
>> > concept -
>> > functions.
>> > The latter are frequently composed with ("wrap") each other. For
>> > example:
>> >
>> > plot(real(sqrt(data)))
>> >
>> > However, it is not very readable in case of many wrapping layers.
>> > Therefore,
>> > it could be useful to employ
>> > the matrix multiplication operator @ for indication of function
>> > composition.
>> > This could be done by such (simplified) decorator:
>> >
>> > class composable:
>> >
>> >     def __init__(self, func):
>> >         self.func = func
>> >
>> >     def __call__(self, arg):
>> >         return self.func(arg)
>> >
>> >     def __matmul__(self, other):
>> >         def composition(*args, **kwargs):
>> >             return self.func(other(*args, **kwargs))
>> >         return composable(composition)
>> >
>> > I think using such decorator with functions that are going to be deeply
>> > wrapped
>> > could improve readability.
>> > You could compare (note that only the outermost function should be
>> > decorated):
>> >
>> > plot(sorted(sqrt(real(data_array)))) vs. (plot @ sorted @ sqrt @ real)
>> > (data_array)
>> >
>> > I think the latter is more readable, also compare
>> >
>> > def sunique(lst):
>> >     return sorted(list(set(lst)))
>> >
>> > vs.
>> >
>> > sunique = sorted @ list @ set
>> >
>> > Apart from readability, there are following pros of the proposed
>> > decorator:
>> >
>> > 1. Similar semantics as for matrix multiplication.
>> > 2. Same symbol for composition as for decorators.
>> > 3. The symbol @ resembles mathematical notation for function
>> > composition: ∘
>> >
>> > I think it could be a good idea to add such a decorator to the stdlib
>> > functools module.
>>
>> In the astropy.modeling package, which consists largely of collection
>> of fancy wrappers around analytic functions,
>> we used the pipe operator | (that is, __or__) to implement function
>> composition, as demonstrated here:
>>
>>
>> http://docs.astropy.org/en/stable/modeling/compound-models.html#model-composition
>>
>> I do like the idea of using the new @ operator for this purpose--it
>> makes sense as a generalization of linear operators,
>> and it just looks a little more like the circle operator often used
>> for functional composition.  On the other hand
>> I'm also fond of the choice to use |, for the similarity to UNIX shell
>> pipe operations, as long as it can't be confused with
>> __or__.  Point being something like this could be implemented now with
>> __or__.
>>
>> I think this is simple enough that it doesn't need to be in the
>> stdlib, especially if there are different ways people
>> would like to do this.  But I do like the idea.
>>
>> Erik
>
>

From steve at pearwood.info  Wed May  6 17:48:11 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 7 May 2015 01:48:11 +1000
Subject: [Python-ideas] (no subject)
In-Reply-To: <FA7C7E2C-28ED-4397-852E-F801C9183AFF@yahoo.com>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <FA7C7E2C-28ED-4397-852E-F801C9183AFF@yahoo.com>
Message-ID: <20150506154811.GM5663@ando.pearwood.info>

On Wed, May 06, 2015 at 06:59:45AM -0700, Andrew Barnert via Python-ideas wrote:

> Python functions don't just take 1 parameter, they take any number of 
> parameters, possibly including optional parameters, keyword-only, 
> *args, **kwargs, etc.

Maybe Haskell programmers are used to functions which all take one 
argument, and f(a, b, c) is syntactic sugar for f(a)(b)(c), but I doubt 
anyone else is. When we Python programmers manually compose a function 
today, by writing an expression or a new function, we have to deal with 
the exact same problems. There's nothing new about the programmer 
needing to ensure that the function signatures are compatible:

def spam(a, b, c):
    return a+b+c

def eggs(x, y, z):
    return x*y/z

def composed(*args):
    return eggs(spam(*args))  # doesn't work

It is the programmer's responsibility to compose compatible functions. 
Why should it be a fatal flaw that the same limitation applies to a 
composition operator?

Besides, with Argument Clinic, it's possible that the @ operator could 
catch incompatible signatures ahead of time.
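
A rough sketch of that kind of check using `inspect.signature` (this is
illustrative only; the arity test and the `composable` name are assumptions,
not a concrete proposal):

```python
import inspect

class composable:
    """Single-value composition with an arity check at compose time."""

    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)

    def __matmul__(self, other):
        # The outer callable will receive exactly one positional
        # argument (the inner callable's return value), so reject
        # outer callables that could never be called that way.
        inspect.signature(self.func).bind(None)
        return composable(lambda *a, **kw: self.func(other(*a, **kw)))

sunique = composable(sorted) @ list @ set
print(sunique([3, 1, 3, 2]))        # [1, 2, 3]

def add(a, b):
    return a + b

try:
    composable(add) @ sorted        # add() needs two arguments
except TypeError as exc:
    print("rejected at compose time:", exc)
```

The check is necessarily approximate (it cannot see return types), but it
moves one class of mistakes from call time to composition time.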


> There are a dozen different compose 
> implementations on PyPI and ActiveState that handle these differently. 

That is good evidence that this is functionality that people want.

> Which one is "right"?

Perhaps all of them? Perhaps none of them? There are lots of buggy or 
badly designed functions and classes on the internet. Perhaps that 
suggests that the std lib should solve it right once and for all.


> The design you describe can be easily implemented as a third-party 
> library. Why not do so, put it on PyPI, see if you get any traction 
> and any ideas for improvement, and then suggest it for the stdlib?

I agree that this idea needs to have some real use before it can be 
added to the std lib, but see below for a counter-objection to the PyPI 
objection.


> The same thing is already doable today using a different 
> operator--and, again, there are a dozen implementations. Why isn't 
> anyone using them?

It takes a certain amount of effort for people to discover and use a 
third-party library: one has to find a library, or libraries, determine 
that it is mature, decide which competing library to use, determine if 
the licence is suitable, download and install it. This "activation 
energy" is insignificant if the library does something big, say, like 
numpy, or nltk, or even medium sized.

But for a library that provides effectively a single function, that 
activation energy is a barrier to entry. It's not that the function 
isn't useful, or that people wouldn't use it if it were already 
available. It's just that the effort to get it is too much bother. 
People will do without, or re-invent the wheel. (Re-inventing the wheel 
is at least fun. Searching PyPI and reading licences is not.)


> Thinking in terms of function composition requires a higher level of 
> abstraction than thinking in terms of lambda expressions.

Do you think it's harder than, say, the "async for" feature that's just 
been approved by Guido?

Compared to asynchronous code, I would say function composition is 
trivial. Anyone who can learn the correspondence

    (a @ b)(arg)  <=> a(b(arg))

can deal with it.
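
That correspondence is easy to demonstrate with a throwaway wrapper (the
name `C` here is purely illustrative):

```python
class C:
    """Tiny wrapper so that (C(a) @ b)(x) == a(b(x))."""

    def __init__(self, f):
        self.f = f

    def __call__(self, x):
        return self.f(x)

    def __matmul__(self, g):
        # Compose: apply g first, then self.
        return C(lambda x: self.f(g(x)))

double = C(lambda x: 2 * x)
inc = lambda x: x + 1
print((double @ inc)(10))   # 22, same as double(inc(10))
```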


> Python doesn't have a static optimizing compiler that can avoid 
> building 4 temporary function objects to evaluate (plot @ sorted @ 
> sqrt @ real) (data_array), so it will make your code significantly 
> less efficient.

Why would it necessarily have to create 4 temporary function objects? 
Besides, the rules for optimization apply here too: don't dismiss 
something as too slow until you've measured it :-)

We shouldn't care about the cost of the @ operator itself, only the cost 
of calling the composed functions. Building the Composed object 
generally happens only once, while calling it generally happens many 
times.


> Is @ for composition and () for application really sufficient to write 
> point free code in general without auto-curried functions, operator 
> sectioning, reverse compose, reverse apply, etc.? Most of the examples 
> people use in describing the feature from Haskell have a (+ 1) or (== 
> x) or take advantage of map-type functions being (a->b) -> ([a] -> 
> [b]) instead of (a->b, [a]) -> [b].

See, now *that's* why people consider Haskell to be difficult: it is 
based on areas of mathematics which even maths graduates may never have 
come across. But function composition is taught in high school. (At 
least in Australia, and I expect Europe and Japan.) It's a nice, simple 
and useful functional tool, like partial.


-- 
Steve

From guido at python.org  Wed May  6 17:48:08 2015
From: guido at python.org (Guido van Rossum)
Date: Wed, 6 May 2015 08:48:08 -0700
Subject: [Python-ideas] Add 'composable' decorator to functools (with @
 matrix multiplication)
In-Reply-To: <CAOTD34YQJOQpp+whiX2Xp5OwJ1Y4c=H7fLm3aSgeMyvhmTD3OQ@mail.gmail.com>
References: <CAOMjWk=-1u8d8ZxRmXNAiitLpHRbb7dCqbyBQu6VhaJHKUkp1w@mail.gmail.com>
 <CAOTD34bm+H+y4v+Kzjzo++aPwt6Dn8Z9khNj4cJD+D+Aj5tnKw@mail.gmail.com>
 <CAOMjWkmgXQQeVSqD6wSoO+UecQDGuwR7Y-qYKS2T8wR5ejpC5w@mail.gmail.com>
 <CAOTD34YQJOQpp+whiX2Xp5OwJ1Y4c=H7fLm3aSgeMyvhmTD3OQ@mail.gmail.com>
Message-ID: <CAP7+vJKuR5w=CGyt0QxpD8BBpJ=x+ZqTn4Mo49dYhWFTpde6JA@mail.gmail.com>

I realize this is still python-ideas, but does this really leave functions
with multiple arguments completely out of the picture (except as the first
stage in the pipeline)?

On Wed, May 6, 2015 at 8:42 AM, Erik Bray <erik.m.bray at gmail.com> wrote:

> On Wed, May 6, 2015 at 11:21 AM, Ivan Levkivskyi <levkivskyi at gmail.com>
> wrote:
> > Dear Erik,
> >
> > Thank you for the link! I agree that this idea is too raw for stdlib
> (there
> > are problems with many argument functions, keyword arguments, etc.)
> > Concerning the shell | vs. matrix @ I think it is a good idea to have
> > both... but with different order.
> > I mean in shell logic f | g means g (f (x)), while for matrix
> multiplication
> > f @ g means f(g(x)).
> > The former is probably more natural for people with more "programming"
> > background, while the latter is more natural for people with a
> "scientific"
> > background.
> > We could now do good for both, since we now have a new operator.
>
> Absolutely!  I've found that it takes a little work sometimes for
> scientific users to wrap
> their heads around the
>
> g | f
>
> syntax.  Once Python 3.5 is out I might add support for "f @ g" as
> well, though I'm wary
> of having more than one way to do it.  Worth trying out though, so
> thanks for the idea.
>
> Erik
>
> > On 6 May 2015 at 16:10, Erik Bray <erik.m.bray at gmail.com> wrote:
> >>
> >> On Wed, May 6, 2015 at 9:20 AM, Ivan Levkivskyi <levkivskyi at gmail.com>
> >> wrote:
> >> > Dear all,
> >> >
> >> > The matrix multiplication operator @ is going to be introduced in
> Python
> >> > 3.5
> >> > and I am thinking about the following idea:
> >> >
> >> > The semantics of matrix multiplication is the composition of the
> >> > corresponding linear transformations.
> >> > A linear transformation is a particular example of a more general
> >> > concept -
> >> > functions.
> >> > The latter are frequently composed with ("wrap") each other. For
> >> > example:
> >> >
> >> > plot(real(sqrt(data)))
> >> >
> >> > However, it is not very readable in case of many wrapping layers.
> >> > Therefore,
> >> > it could be useful to employ
> >> > the matrix multiplication operator @ for indication of function
> >> > composition.
> >> > This could be done by such (simplified) decorator:
> >> >
> >> > class composable:
> >> >
> >> >     def __init__(self, func):
> >> >         self.func = func
> >> >
> >> >     def __call__(self, arg):
> >> >         return self.func(arg)
> >> >
> >> >     def __matmul__(self, other):
> >> >         def composition(*args, **kwargs):
> >> >             return self.func(other(*args, **kwargs))
> >> >         return composable(composition)
> >> >
> >> > I think using such decorator with functions that are going to be
> deeply
> >> > wrapped
> >> > could improve readability.
> >> > You could compare (note that only the outermost function should be
> >> > decorated):
> >> >
> >> > plot(sorted(sqrt(real(data_array)))) vs. (plot @ sorted @ sqrt @ real)
> >> > (data_array)
> >> >
> >> > I think the latter is more readable, also compare
> >> >
> >> > def sunique(lst):
> >> >     return sorted(list(set(lst)))
> >> >
> >> > vs.
> >> >
> >> > sunique = sorted @ list @ set
> >> >
> >> > Apart from readability, there are following pros of the proposed
> >> > decorator:
> >> >
> >> > 1. Similar semantics as for matrix multiplication.
> >> > 2. Same symbol for composition as for decorators.
> >> > 3. The symbol @ resembles mathematical notation for function
> >> > composition: ∘
> >> >
> >> > I think it could be a good idea to add such a decorator to the stdlib
> >> > functools module.
> >>
> >> In the astropy.modeling package, which consists largely of collection
> >> of fancy wrappers around analytic functions,
> >> we used the pipe operator | (that is, __or__) to implement function
> >> composition, as demonstrated here:
> >>
> >>
> >>
> http://docs.astropy.org/en/stable/modeling/compound-models.html#model-composition
> >>
> >> I do like the idea of using the new @ operator for this purpose--it
> >> makes sense as a generalization of linear operators,
> >> and it just looks a little more like the circle operator often used
> >> for functional composition.  On the other hand
> >> I'm also fond of the choice to use |, for the similarity to UNIX shell
> >> pipe operations, as long as it can't be confused with
> >> __or__.  Point being something like this could be implemented now with
> >> __or__.
> >>
> >> I think this is simple enough that it doesn't need to be in the
> >> stdlib, especially if there are different ways people
> >> would like to do this.  But I do like the idea.
> >>
> >> Erik
> >
> >
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>



-- 
--Guido van Rossum (python.org/~guido)

From guido at python.org  Wed May  6 18:01:26 2015
From: guido at python.org (Guido van Rossum)
Date: Wed, 6 May 2015 09:01:26 -0700
Subject: [Python-ideas] (no subject)
In-Reply-To: <20150506154811.GM5663@ando.pearwood.info>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <FA7C7E2C-28ED-4397-852E-F801C9183AFF@yahoo.com>
 <20150506154811.GM5663@ando.pearwood.info>
Message-ID: <CAP7+vJJE2-43zUFHL8NhjLZc3VnGbJa4Yoiq9HgAT-Z-wOePeA@mail.gmail.com>

On Wed, May 6, 2015 at 8:48 AM, Steven D'Aprano <steve at pearwood.info> wrote:

> Compared to asynchronous code, I would say function composition is
> trivial. Anyone who can learn the correspondence
>
>     (a @ b)(arg)  <=> a(b(arg))
>
> can deal with it.


Personally, I can certainly "deal" with it, but it'll never come naturally
to me. As soon as I see code like this I have to mentally pick it apart and
rewrite it in the more familiar form before I understand what's going on.

Maybe if I needed this frequently I'd learn to fly with it, but I just
don't see the need that often. I see things like f().g() much more often
than f(g()).

-- 
--Guido van Rossum (python.org/~guido)

From erik.m.bray at gmail.com  Wed May  6 18:04:10 2015
From: erik.m.bray at gmail.com (Erik Bray)
Date: Wed, 6 May 2015 12:04:10 -0400
Subject: [Python-ideas] Add 'composable' decorator to functools (with @
 matrix multiplication)
In-Reply-To: <CAP7+vJKuR5w=CGyt0QxpD8BBpJ=x+ZqTn4Mo49dYhWFTpde6JA@mail.gmail.com>
References: <CAOMjWk=-1u8d8ZxRmXNAiitLpHRbb7dCqbyBQu6VhaJHKUkp1w@mail.gmail.com>
 <CAOTD34bm+H+y4v+Kzjzo++aPwt6Dn8Z9khNj4cJD+D+Aj5tnKw@mail.gmail.com>
 <CAOMjWkmgXQQeVSqD6wSoO+UecQDGuwR7Y-qYKS2T8wR5ejpC5w@mail.gmail.com>
 <CAOTD34YQJOQpp+whiX2Xp5OwJ1Y4c=H7fLm3aSgeMyvhmTD3OQ@mail.gmail.com>
 <CAP7+vJKuR5w=CGyt0QxpD8BBpJ=x+ZqTn4Mo49dYhWFTpde6JA@mail.gmail.com>
Message-ID: <CAOTD34ZskQuE7_yv7bdkfkJ2ETeEWV3f6=3R_5vY5YXDSqoN7w@mail.gmail.com>

On Wed, May 6, 2015 at 11:48 AM, Guido van Rossum <guido at python.org> wrote:
> I realize this is still python-ideas, but does this really leave functions
> with multiple arguments completely out of the picture (except as the first
> stage in the pipeline)?

I'm not sure exactly what this is in response to, but in the case of
astropy.modeling, any function
at any point in the chain can take multiple arguments, as long as the
function it is composed with
returns the right number of outputs (as a tuple).

There is also a "Mapping" object that allows remapping arguments so that,
for example, you can swap the return values of one function before
passing them as inputs to the next function.  You can
also duplicate outputs, drop outputs, inline new ones, etc.  It can
get quite arbitrarily complex, and there
aren't enough facilities in place yet to visualize complicated
function compositions as I would like.  But
this is already being put to good use.

Nothing about this is particular to astropy.modeling--the same
approach could be used in a generic function
composition operator.
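
For instance, a generic version of that tuple-unpacking pipeline might be
sketched as follows (an illustration of the approach, not astropy.modeling's
actual implementation):

```python
class stage:
    """Left-to-right pipeline where tuple results fan out as arguments."""

    def __init__(self, func):
        self.func = func

    def __call__(self, *args):
        return self.func(*args)

    def __or__(self, nxt):
        # "self | nxt" runs self first; a tuple result becomes nxt's
        # positional arguments, any other result a single argument.
        def piped(*args):
            out = self.func(*args)
            return nxt(*out) if isinstance(out, tuple) else nxt(out)
        return stage(piped)

# divmod returns two values; the next stage receives both.
quotrem = stage(divmod) | (lambda q, r: "%d r %d" % (q, r))
print(quotrem(17, 5))       # 3 r 2

# A "mapping" stage that swaps its inputs before the next stage.
swapped = stage(lambda a, b: (b, a)) | stage(divmod)
print(swapped(5, 17))       # (3, 2)
```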

Erik

> On Wed, May 6, 2015 at 8:42 AM, Erik Bray <erik.m.bray at gmail.com> wrote:
>>
>> On Wed, May 6, 2015 at 11:21 AM, Ivan Levkivskyi <levkivskyi at gmail.com>
>> wrote:
>> > Dear Erik,
>> >
>> > Thank you for the link! I agree that this idea is too raw for stdlib
>> > (there
>> > are problems with many argument functions, keyword arguments, etc.)
>> > Concerning the shell | vs. matrix @ I think it is a good idea to have
>> > both... but with different order.
>> > I mean in shell logic f | g means g (f (x)), while for matrix
>> > multiplication
>> > f @ g means f(g(x)).
>> > The former is probably more natural for people with more "programming"
>> > background, while the latter is more natural for people with a
>> > "scientific"
>> > background.
>> > We could now do good for both, since we now have a new operator.
>>
>> Absolutely!  I've found that it takes a little work sometimes for
>> scientific users to wrap
>> their heads around the
>>
>> g | f
>>
>> syntax.  Once Python 3.5 is out I might add support for "f @ g" as
>> well, though I'm wary
>> of having more than one way to do it.  Worth trying out though, so
>> thanks for the idea.
>>
>> Erik
>>
>> > On 6 May 2015 at 16:10, Erik Bray <erik.m.bray at gmail.com> wrote:
>> >>
>> >> On Wed, May 6, 2015 at 9:20 AM, Ivan Levkivskyi <levkivskyi at gmail.com>
>> >> wrote:
>> >> > Dear all,
>> >> >
>> >> > The matrix multiplication operator @ is going to be introduced in
>> >> > Python
>> >> > 3.5
>> >> > and I am thinking about the following idea:
>> >> >
>> >> > The semantics of matrix multiplication is the composition of the
>> >> > corresponding linear transformations.
>> >> > A linear transformation is a particular example of a more general
>> >> > concept -
>> >> > functions.
>> >> > The latter are frequently composed with ("wrap") each other. For
>> >> > example:
>> >> >
>> >> > plot(real(sqrt(data)))
>> >> >
>> >> > However, it is not very readable in case of many wrapping layers.
>> >> > Therefore,
>> >> > it could be useful to employ
>> >> > the matrix multiplication operator @ for indication of function
>> >> > composition.
>> >> > This could be done by such (simplified) decorator:
>> >> >
>> >> > class composable:
>> >> >
>> >> >     def __init__(self, func):
>> >> >         self.func = func
>> >> >
>> >> >     def __call__(self, arg):
>> >> >         return self.func(arg)
>> >> >
>> >> >     def __matmul__(self, other):
>> >> >         def composition(*args, **kwargs):
>> >> >             return self.func(other(*args, **kwargs))
>> >> >         return composable(composition)
>> >> >
>> >> > I think using such decorator with functions that are going to be
>> >> > deeply
>> >> > wrapped
>> >> > could improve readability.
>> >> > You could compare (note that only the outermost function should be
>> >> > decorated):
>> >> >
>> >> > plot(sorted(sqrt(real(data_array)))) vs. (plot @ sorted @ sqrt @
>> >> > real)
>> >> > (data_array)
>> >> >
>> >> > I think the latter is more readable, also compare
>> >> >
>> >> > def sunique(lst):
>> >> >     return sorted(list(set(lst)))
>> >> >
>> >> > vs.
>> >> >
>> >> > sunique = sorted @ list @ set
>> >> >
>> >> > Apart from readability, there are following pros of the proposed
>> >> > decorator:
>> >> >
>> >> > 1. Similar semantics as for matrix multiplication.
>> >> > 2. Same symbol for composition as for decorators.
>> >> > 3. The symbol @ resembles mathematical notation for function
>> >> > composition: ∘
>> >> >
>> >> > I think it could be a good idea to add such a decorator to the stdlib
>> >> > functools module.
>> >>
>> >> In the astropy.modeling package, which consists largely of collection
>> >> of fancy wrappers around analytic functions,
>> >> we used the pipe operator | (that is, __or__) to implement function
>> >> composition, as demonstrated here:
>> >>
>> >>
>> >>
>> >> http://docs.astropy.org/en/stable/modeling/compound-models.html#model-composition
>> >>
>> >> I do like the idea of using the new @ operator for this purpose--it
>> >> makes sense as a generalization of linear operators,
>> >> and it just looks a little more like the circle operator often used
>> >> for functional composition.  On the other hand
>> >> I'm also fond of the choice to use |, for the similarity to UNIX shell
>> >> pipe operations, as long as it can't be confused with
>> >> __or__.  Point being something like this could be implemented now with
>> >> __or__.
>> >>
>> >> I think this is simple enough that it doesn't need to be in the
>> >> stdlib, especially if there are different ways people
>> >> would like to do this.  But I do like the idea.
>> >>
>> >> Erik
>> >
>> >
>
>
>
>
> --
> --Guido van Rossum (python.org/~guido)

From levkivskyi at gmail.com  Wed May  6 18:10:22 2015
From: levkivskyi at gmail.com (Ivan Levkivskyi)
Date: Wed, 6 May 2015 18:10:22 +0200
Subject: [Python-ideas] Add 'composable' decorator to functools (with @
 matrix multiplication)
In-Reply-To: <CAP7+vJKuR5w=CGyt0QxpD8BBpJ=x+ZqTn4Mo49dYhWFTpde6JA@mail.gmail.com>
References: <CAOMjWk=-1u8d8ZxRmXNAiitLpHRbb7dCqbyBQu6VhaJHKUkp1w@mail.gmail.com>
 <CAOTD34bm+H+y4v+Kzjzo++aPwt6Dn8Z9khNj4cJD+D+Aj5tnKw@mail.gmail.com>
 <CAOMjWkmgXQQeVSqD6wSoO+UecQDGuwR7Y-qYKS2T8wR5ejpC5w@mail.gmail.com>
 <CAOTD34YQJOQpp+whiX2Xp5OwJ1Y4c=H7fLm3aSgeMyvhmTD3OQ@mail.gmail.com>
 <CAP7+vJKuR5w=CGyt0QxpD8BBpJ=x+ZqTn4Mo49dYhWFTpde6JA@mail.gmail.com>
Message-ID: <CAOMjWkmcuc0R6B+EDhk6r+cNTEDewFEevZ8zUYyJmfVho78vRQ@mail.gmail.com>

Dear Guido,

My original idea was to make the composable functions auto-curried (similar
to what is proposed here
http://code.activestate.com/recipes/52902-function-composition/ as pointed
out by Steve) so that

my_fun = square @ add(1)
my_fun(x)

evaluates to

square(add(1,x))
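
In sketch form (illustrative only; the `curried` wrapper and its arity
handling below are assumptions, not a concrete proposal): calling with fewer
positional arguments than the function declares returns a partial
application, which is itself composable:

```python
import functools
import inspect

class curried:
    """Composable wrapper: under-application returns a partial."""

    def __init__(self, func):
        self.func = func
        self._nargs = len(inspect.signature(func).parameters)

    def __call__(self, *args):
        if len(args) < self._nargs:
            # Too few arguments: curry instead of failing.
            return curried(functools.partial(self.func, *args))
        return self.func(*args)

    def __matmul__(self, other):
        return curried(lambda *a: self.func(other(*a)))

square = curried(lambda x: x * x)
add = curried(lambda a, b: a + b)

my_fun = square @ add(1)    # add(1) is the partial b -> 1 + b
print(my_fun(3))            # square(add(1, 3)) == 16
```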


On 6 May 2015 at 17:48, Guido van Rossum <guido at python.org> wrote:

> I realize this is still python-ideas, but does this really leave functions
> with multiple arguments completely out of the picture (except as the first
> stage in the pipeline)?
>
> On Wed, May 6, 2015 at 8:42 AM, Erik Bray <erik.m.bray at gmail.com> wrote:
>
>> On Wed, May 6, 2015 at 11:21 AM, Ivan Levkivskyi <levkivskyi at gmail.com>
>> wrote:
>> > Dear Erik,
>> >
>> > Thank you for the link! I agree that this idea is too raw for stdlib
>> (there
>> > are problems with many argument functions, keyword arguments, etc.)
>> > Concerning the shell | vs. matrix @ I think it is a good idea to have
>> > both... but with different order.
>> > I mean in shell logic f | g means g (f (x)), while for matrix
>> multiplication
>> > f @ g means f(g(x)).
>> > The former is probably more natural for people with more "programming"
>> > background, while the latter is more natural for people with a
>> "scientific"
>> > background.
>> > We could now do good for both, since we now have a new operator.
>>
>> Absolutely!  I've found that it takes a little work sometimes for
>> scientific users to wrap
>> their heads around the
>>
>> g | f
>>
>> syntax.  Once Python 3.5 is out I might add support for "f @ g" as
>> well, though I'm wary
>> of having more than one way to do it.  Worth trying out though, so
>> thanks for the idea.
>>
>> Erik
>>
>> > On 6 May 2015 at 16:10, Erik Bray <erik.m.bray at gmail.com> wrote:
>> >>
>> >> On Wed, May 6, 2015 at 9:20 AM, Ivan Levkivskyi <levkivskyi at gmail.com>
>> >> wrote:
>> >> > Dear all,
>> >> >
>> >> > The matrix multiplication operator @ is going to be introduced in
>> Python
>> >> > 3.5
>> >> > and I am thinking about the following idea:
>> >> >
>> >> > The semantics of matrix multiplication is the composition of the
>> >> > corresponding linear transformations.
>> >> > A linear transformation is a particular example of a more general
>> >> > concept -
>> >> > functions.
>> >> > The latter are frequently composed with ("wrap") each other. For
>> >> > example:
>> >> >
>> >> > plot(real(sqrt(data)))
>> >> >
>> >> > However, it is not very readable in case of many wrapping layers.
>> >> > Therefore,
>> >> > it could be useful to employ
>> >> > the matrix multiplication operator @ for indication of function
>> >> > composition.
>> >> > This could be done by such (simplified) decorator:
>> >> >
>> >> > class composable:
>> >> >
>> >> >     def __init__(self, func):
>> >> >         self.func = func
>> >> >
>> >> >     def __call__(self, arg):
>> >> >         return self.func(arg)
>> >> >
>> >> >     def __matmul__(self, other):
>> >> >         def composition(*args, **kwargs):
>> >> >             return self.func(other(*args, **kwargs))
>> >> >         return composable(composition)
>> >> >
>> >> > I think using such decorator with functions that are going to be
>> deeply
>> >> > wrapped
>> >> > could improve readability.
>> >> > You could compare (note that only the outermost function should be
>> >> > decorated):
>> >> >
>> >> > plot(sorted(sqrt(real(data_array)))) vs. (plot @ sorted @ sqrt @
>> real)
>> >> > (data_array)
>> >> >
>> >> > I think the latter is more readable, also compare
>> >> >
>> >> > def sunique(lst):
>> >> >     return sorted(list(set(lst)))
>> >> >
>> >> > vs.
>> >> >
>> >> > sunique = sorted @ list @ set
>> >> >
>> >> > Apart from readability, there are following pros of the proposed
>> >> > decorator:
>> >> >
>> >> > 1. Similar semantics as for matrix multiplication.
>> >> > 2. Same symbol for composition as for decorators.
>> >> > 3. The symbol @ resembles mathematical notation for function
>> >> > composition: ∘
>> >> >
>> >> > I think it could be a good idea to add such a decorator to the stdlib
>> >> > functools module.
>> >>
>> >> In the astropy.modeling package, which consists largely of collection
>> >> of fancy wrappers around analytic functions,
>> >> we used the pipe operator | (that is, __or__) to implement function
>> >> composition, as demonstrated here:
>> >>
>> >>
>> >>
>> http://docs.astropy.org/en/stable/modeling/compound-models.html#model-composition
>> >>
>> >> I do like the idea of using the new @ operator for this purpose--it
>> >> makes sense as a generalization of linear operators,
>> >> and it just looks a little more like the circle operator often used
>> >> for functional composition.  On the other hand
>> >> I'm also fond of the choice to use |, for the similarity to UNIX shell
>> >> pipe operations, as long as it can't be confused with
>> >> __or__.  Point being something like this could be implemented now with
>> >> __or__.
>> >>
>> >> I think this is simple enough that it doesn't need to be in the
>> >> stdlib, especially if there are different ways people
>> >> would like to do this.  But I do like the idea.
>> >>
>> >> Erik
>> >
>> >
>>
>
>
>
> --
> --Guido van Rossum (python.org/~guido)
>

From rosuav at gmail.com  Wed May  6 18:15:58 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Thu, 7 May 2015 02:15:58 +1000
Subject: [Python-ideas] Add 'composable' decorator to functools (with @
 matrix multiplication)
In-Reply-To: <CAOMjWkmcuc0R6B+EDhk6r+cNTEDewFEevZ8zUYyJmfVho78vRQ@mail.gmail.com>
References: <CAOMjWk=-1u8d8ZxRmXNAiitLpHRbb7dCqbyBQu6VhaJHKUkp1w@mail.gmail.com>
 <CAOTD34bm+H+y4v+Kzjzo++aPwt6Dn8Z9khNj4cJD+D+Aj5tnKw@mail.gmail.com>
 <CAOMjWkmgXQQeVSqD6wSoO+UecQDGuwR7Y-qYKS2T8wR5ejpC5w@mail.gmail.com>
 <CAOTD34YQJOQpp+whiX2Xp5OwJ1Y4c=H7fLm3aSgeMyvhmTD3OQ@mail.gmail.com>
 <CAP7+vJKuR5w=CGyt0QxpD8BBpJ=x+ZqTn4Mo49dYhWFTpde6JA@mail.gmail.com>
 <CAOMjWkmcuc0R6B+EDhk6r+cNTEDewFEevZ8zUYyJmfVho78vRQ@mail.gmail.com>
Message-ID: <CAPTjJmri5wHvDfUHd0N68oa0yxCE7xjsKGjS2ojaD0iCHbKxRQ@mail.gmail.com>

On Thu, May 7, 2015 at 2:10 AM, Ivan Levkivskyi <levkivskyi at gmail.com> wrote:
> Dear Guido,
>
> My original idea was to make the composable functions auto-curried (similar
> to proposed here
> http://code.activestate.com/recipes/52902-function-composition/ as pointed
> out by Steve) so that
>
> my_fun = square @ add(1)
> my_fun(x)
>
> evaluates to
>
> square(add(1,x))

Hmm. This would require that your composable functions autocurry,
which may be tricky to do in the general case. It also requires that
the right hand function be composable, unlike in your earlier example.
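One way such auto-currying could be sketched (a purely hypothetical illustration, which glosses over exactly the tricky general cases mentioned above -- keyword-only arguments, *args, defaults, and so on):

```python
import functools
import inspect

class Composable:
    """Hypothetical auto-currying, composable wrapper -- not a real proposal."""
    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        try:
            # Check whether the call would be complete.
            inspect.signature(self.func).bind(*args, **kwargs)
        except TypeError:
            # Too few arguments: curry instead of failing.
            return Composable(functools.partial(self.func, *args, **kwargs))
        return self.func(*args, **kwargs)

    def __matmul__(self, other):
        # (self @ other)(x) == self(other(x))
        return Composable(lambda *a, **k: self.func(other(*a, **k)))

@Composable
def add(a, b):
    return a + b

@Composable
def square(x):
    return x * x

my_fun = square @ add(1)   # add(1) curries to "add 1 to its argument"
assert my_fun(2) == 9      # square(add(1, 2))
```

This only works because inspect.signature can see through functools.partial; a robust implementation would need far more care.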

ChrisA

From steve at pearwood.info  Wed May  6 18:17:55 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 7 May 2015 02:17:55 +1000
Subject: [Python-ideas] (no subject)
In-Reply-To: <CAP7+vJJE2-43zUFHL8NhjLZc3VnGbJa4Yoiq9HgAT-Z-wOePeA@mail.gmail.com>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <FA7C7E2C-28ED-4397-852E-F801C9183AFF@yahoo.com>
 <20150506154811.GM5663@ando.pearwood.info>
 <CAP7+vJJE2-43zUFHL8NhjLZc3VnGbJa4Yoiq9HgAT-Z-wOePeA@mail.gmail.com>
Message-ID: <20150506161754.GN5663@ando.pearwood.info>

On Wed, May 06, 2015 at 09:01:26AM -0700, Guido van Rossum wrote:
> On Wed, May 6, 2015 at 8:48 AM, Steven D'Aprano <steve at pearwood.info> wrote:
> 
> > Compared to asynchronous code, I would say function composition is
> > trivial. Anyone who can learn the correspondence
> >
> >     (a @ b)(arg)  <=> a(b(arg))
> >
> > can deal with it.
> 
> 
> Personally, I can certainly "deal" with it, but it'll never come naturally
> to me. As soon as I see code like this I have to mentally pick it apart and
> rewrite it in the more familiar form before I understand what's going on.

Yes, I remember you dislike reduce() as well :-)


> Maybe if I needed this frequently I'd learn to fly with it, but I just
> don't see the need that often. I see things like f().g() much more often
> than f(g()).

Perhaps it has something to do with my background in mathematics, I see 
things like f(g()) all the time.

I will admit that, going back to my teens in high school, it took a 
little while for the "f o g" notation to really sink in. I remember 
writing study notes to learn it.

But I still have to look up the order of list.insert() every single time 
I use it, and I can never remember whether functools.partial applies its 
arguments from the left or the right, and let's not even mention list 
comps with more than one "for" clause. Some things just never become 
entirely comfortable to some people.
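For the record, functools.partial fills positional arguments in from the left (keyword arguments can pre-bind any parameter); a quick reminder:

```python
from functools import partial

def power(base, exp):
    return base ** exp

# Positional arguments bind leftmost-first...
two_to = partial(power, 2)        # base is fixed to 2
assert two_to(10) == 1024         # power(2, 10)

# ...while keyword arguments can pre-bind any parameter.
square = partial(power, exp=2)
assert square(3) == 9             # power(3, exp=2)
```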

I look forward to many weeks or months trying to wrap my head around 
asynchronous programming too :-)


-- 
Steve

From ericsnowcurrently at gmail.com  Wed May  6 18:23:09 2015
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Wed, 6 May 2015 10:23:09 -0600
Subject: [Python-ideas] discouraging direct use of the C-API
Message-ID: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>

A big blocker to making certain sweeping changes to CPython (e.g.
ref-counting) is compatibility with the vast body of C extension
modules out there that use the C-API.  While there are certainly
drastic long-term solutions to that problem, there is one thing we can
do in the short-term that would at least get the ball rolling.  We can
put a big red note at the top of every page of the C-API docs that
encourages folks to either use CFFI or Cython.

Thoughts?

-eric

From levkivskyi at gmail.com  Wed May  6 18:23:25 2015
From: levkivskyi at gmail.com (Ivan Levkivskyi)
Date: Wed, 6 May 2015 18:23:25 +0200
Subject: [Python-ideas] (no subject)
Message-ID: <CAOMjWk=Vbk6Kp6B4zRXzg8EY1yLGaS1yssDTDR=SdvcZqGAdeg@mail.gmail.com>

I should clarify why I would like to have the possibility to easily compose
functions.
I am a physicist (not a real programmer), and in my code I often compose
functions.

To do this I need to write something like

def new_func(x):
    return f(g(h(x)))

This means I see f(g(h())) quite often and I would prefer to see f @ g @ h
instead.
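A minimal sketch of what such an @-composition decorator could look like (illustrative only -- the thread discusses several competing designs, and this one handles only the simple single-result case):

```python
def composable(func):
    """Wrap func so that `@` means function composition (illustrative sketch)."""
    class _Composable:
        def __call__(self, *args, **kwargs):
            return func(*args, **kwargs)
        def __matmul__(self, other):
            # (f @ g)(x) == f(g(x))
            return composable(lambda *a, **k: func(other(*a, **k)))
    return _Composable()

f = composable(lambda x: x + 1)
g = composable(lambda x: 2 * x)
h = composable(lambda x: x - 3)

new_func = f @ g @ h          # f(g(h(x)))
assert new_func(5) == 5       # f(g(5 - 3)) == f(4) == 5
```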

> > Compared to asynchronous code, I would say function composition is
> > trivial. Anyone who can learn the correspondence
> >
> >     (a @ b)(arg)  <=> a(b(arg))
> >
> > can deal with it.
>
>
> Personally, I can certainly "deal" with it, but it'll never come naturally
> to me. As soon as I see code like this I have to mentally pick it apart
and
> rewrite it in the more familiar form before I understand what's going on.
>
> Maybe if I needed this frequently I'd learn to fly with it, but I just
> don't see the need that often. I see things like f().g() much more often
> than f(g()).
>
> --
> --Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150506/de49dcb2/attachment.html>

From stefan_ml at behnel.de  Wed May  6 18:36:29 2015
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 06 May 2015 18:36:29 +0200
Subject: [Python-ideas] discouraging direct use of the C-API
In-Reply-To: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
References: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
Message-ID: <midfud$pqq$1@ger.gmane.org>

Eric Snow schrieb am 06.05.2015 um 18:23:
> A big blocker to making certain sweeping changes to CPython (e.g.
> ref-counting) is compatibility with the vast body of C extension
> modules out there that use the C-API.  While there are certainly
> drastic long-term solutions to that problem, there is one thing we can
> do in the short-term that would at least get the ball rolling.  We can
> put a big red note at the top of every page of the C-API docs that
> encourages folks to either use CFFI or Cython.

I've been advocating that for years now: leave the low-level stuff to the
experts. (There's a reason why Cython code is usually faster than C-API code.)

Not sure how big, fat and red the warning needs to be, but a big +1 from me.

Stefan



From guido at python.org  Wed May  6 18:41:10 2015
From: guido at python.org (Guido van Rossum)
Date: Wed, 6 May 2015 09:41:10 -0700
Subject: [Python-ideas] discouraging direct use of the C-API
In-Reply-To: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
References: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
Message-ID: <CAP7+vJKKg0yrYH48KtFdQEhTuBWKqDGXTUgCCnDfcm2tgK2F8A@mail.gmail.com>

On Wed, May 6, 2015 at 9:23 AM, Eric Snow <ericsnowcurrently at gmail.com>
wrote:

> A big blocker to making certain sweeping changes to CPython (e.g.
> ref-counting) is compatibility with the vast body of C extension
> modules out there that use the C-API.  While there are certainly
> drastic long-term solutions to that problem, there is one thing we can
> do in the short-term that would at least get the ball rolling.  We can
> put a big red note at the top of every page of the C-API docs that
> encourages folks to either use CFFI or Cython.
>
> Thoughts?
>

I think Cython is already used by those people who benefit from it.

As for CFFI, is the ownership/maintenance issue solved yet? IIRC we have
some really outdated versions in the CPython tree and nobody wants to step
in and upgrade these to the latest CFFI, for some reason (such as that that
would actually break a lot of code because the latest version is so
different from the version we currently include?).

-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150506/b04cb4de/attachment-0001.html>

From encukou at gmail.com  Wed May  6 18:45:05 2015
From: encukou at gmail.com (Petr Viktorin)
Date: Wed, 6 May 2015 18:45:05 +0200
Subject: [Python-ideas] discouraging direct use of the C-API
In-Reply-To: <midfud$pqq$1@ger.gmane.org>
References: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
 <midfud$pqq$1@ger.gmane.org>
Message-ID: <CA+=+wqCpREJKgSKw_Dxd73cHJzUxM+CjXGgk2pdVdFkRCmUD8g@mail.gmail.com>

On Wed, May 6, 2015 at 6:36 PM, Stefan Behnel <stefan_ml at behnel.de> wrote:
> Eric Snow schrieb am 06.05.2015 um 18:23:
>> A big blocker to making certain sweeping changes to CPython (e.g.
>> ref-counting) is compatibility with the vast body of C extension
>> modules out there that use the C-API.  While there are certainly
>> drastic long-term solutions to that problem, there is one thing we can
>> do in the short-term that would at least get the ball rolling.  We can
>> put a big red note at the top of every page of the C-API docs that
>> encourages folks to either use CFFI or Cython.
>
> I've been advocating that for years now: leave the low-level stuff to the
> experts. (There's a reason why Cython code is usually faster than C-API code.)
>
> Not sure how big, fat and red the warning needs to be, but a big +1 from me.

Probably not too big. Cython and CFFI are easier to use, so people who
know about them, and can afford the extra dependency*, should use
them. I think a pointer would be enough, perhaps like in
https://docs.python.org/3/library/urllib.request.html

* Possibly build-time dependency. Or a complete rewrite, in case of
existing code.

From levkivskyi at gmail.com  Wed May  6 18:55:16 2015
From: levkivskyi at gmail.com (Ivan Levkivskyi)
Date: Wed, 6 May 2015 18:55:16 +0200
Subject: [Python-ideas] Add 'composable' decorator to functools
Message-ID: <CAOMjWk=5e8ynnuzD6EoKkcCQbDwZbYir5D2NBjj4ajxYsi_22w@mail.gmail.com>

Dear Chris,

> > My original idea was to make the composable functions auto-curried
(similar
> > to proposed here
> > http://code.activestate.com/recipes/52902-function-composition/ as
pointed
> > out by Steve) so that
> >
> > my_fun = square @ add(1)
> > my_fun(x)
> >
> > evaluates to
> >
> > square(add(1,x))

> It also requires that
> the right hand function be
> composable, unlike in your earlier
> example.

This is true. One can only use single argument "normal" functions. Multiple
argument ones should be made "composable".

>
> ChrisA
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150506/7e3b728b/attachment.html>

From solipsis at pitrou.net  Wed May  6 18:57:15 2015
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 6 May 2015 18:57:15 +0200
Subject: [Python-ideas] discouraging direct use of the C-API
References: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
Message-ID: <20150506185715.2083b063@fsol>

On Wed, 6 May 2015 10:23:09 -0600
Eric Snow <ericsnowcurrently at gmail.com>
wrote:
> A big blocker to making certain sweeping changes to CPython (e.g.
> ref-counting) is compatibility with the vast body of C extension
> modules out there that use the C-API.  While there are certainly
> drastic long-term solutions to that problem, there is one thing we can
> do in the short-term that would at least get the ball rolling.  We can
> put a big red note at the top of every page of the C-API docs that
> encourages folks to either use CFFI or Cython.

CFFI is only useful for a small subset of stuff people use the C API for
(mainly, thin wrappers around external libraries). Cython is a more
reasonable suggestion in this context.

I would advocate against red warning boxes. Warnings are for
potentially dangerous constructs, we use them mainly for security
issues. Adding a note and some pointers at the start of the C API docs
may be enough.

Regards

Antoine.



From donald at stufft.io  Wed May  6 19:13:57 2015
From: donald at stufft.io (Donald Stufft)
Date: Wed, 6 May 2015 13:13:57 -0400
Subject: [Python-ideas] discouraging direct use of the C-API
In-Reply-To: <20150506185715.2083b063@fsol>
References: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
 <20150506185715.2083b063@fsol>
Message-ID: <7CE0A9A8-8688-4F7D-A47B-074C5148883A@stufft.io>


> On May 6, 2015, at 12:57 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> 
> On Wed, 6 May 2015 10:23:09 -0600
> Eric Snow <ericsnowcurrently at gmail.com>
> wrote:
>> A big blocker to making certain sweeping changes to CPython (e.g.
>> ref-counting) is compatibility with the vast body of C extension
>> modules out there that use the C-API.  While there are certainly
>> drastic long-term solutions to that problem, there is one thing we can
>> do in the short-term that would at least get the ball rolling.  We can
>> put a big red note at the top of every page of the C-API docs that
>> encourages folks to either use CFFI or Cython.
> 
> CFFI is only useful for a small subset of stuff people use the C API for
> (mainly, thin wrappers around external libraries). Cython is a more
> reasonable suggestion in this context.

You can write stuff in C itself for cffi too, it's not just for C bindings,
an example would be the .c's and .h's for padding and constant-time compare
in the cryptography project [1].

[1] https://github.com/pyca/cryptography/tree/master/src/cryptography/hazmat/primitives/src

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 801 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150506/f50503bc/attachment.sig>

From p.f.moore at gmail.com  Wed May  6 20:44:25 2015
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 6 May 2015 19:44:25 +0100
Subject: [Python-ideas] (no subject)
In-Reply-To: <CAOMjWk=Vbk6Kp6B4zRXzg8EY1yLGaS1yssDTDR=SdvcZqGAdeg@mail.gmail.com>
References: <CAOMjWk=Vbk6Kp6B4zRXzg8EY1yLGaS1yssDTDR=SdvcZqGAdeg@mail.gmail.com>
Message-ID: <CACac1F9Y-4CvNCySpfzufuPaWfWjC0PR5s6KB4a4g47KukeRZw@mail.gmail.com>

On 6 May 2015 at 17:23, Ivan Levkivskyi <levkivskyi at gmail.com> wrote:
> I should clarify why I would like to have the possibility to easily compose
> functions.
> I am a physicist (not a real programmer), and in my code I often compose
> functions.
>
> To do this I need to write something like
>
> def new_func(x):
>     return f(g(h(x)))
>
> This means I see f(g(h())) quite often and I would prefer to see f @ g @ h
> instead.

I appreciate that it's orthogonal to the proposal, but would a utility
function like this be useful?

def compose(*fns):
    def composed(x):
        for f in reversed(fns):
            x = f(x)
        return x
    return composed

comp = compose(f, g, h)
# comp(x) = f(g(h(x)))

Paul

From p.f.moore at gmail.com  Wed May  6 20:56:50 2015
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 6 May 2015 19:56:50 +0100
Subject: [Python-ideas] discouraging direct use of the C-API
In-Reply-To: <CAP7+vJKKg0yrYH48KtFdQEhTuBWKqDGXTUgCCnDfcm2tgK2F8A@mail.gmail.com>
References: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
 <CAP7+vJKKg0yrYH48KtFdQEhTuBWKqDGXTUgCCnDfcm2tgK2F8A@mail.gmail.com>
Message-ID: <CACac1F_YPRjceF9T4OYsZwhZnZE1faonQ2UtHFE-rq9MUMj5oQ@mail.gmail.com>

On 6 May 2015 at 17:41, Guido van Rossum <guido at python.org> wrote:
> As for CFFI, is the ownership/maintenance issue solved yet? IIRC we have
> some really outdated versions in the CPython tree and nobody wants to step
> in and upgrade these to the latest CFFI, for some reason (such as that that
> would actually break a lot of code because the latest version is so
> different from the version we currently include?).

I think you are referring to libffi (used by ctypes) here rather than cffi.

Libffi isn't really relevant to the original topic, but the big issue
there is that ctypes on Windows uses a patched copy of a pretty old
version of libffi. The reason it does is to trap stack usage errors so
that it can give a ValueError if you call a C function with the wrong
number of arguments. libffi doesn't offer a way to do this, so
migrating to the latest libffi would mean hacking out that code, and a
loss of the stack checking (which is currently tested for, so although
it's not exactly an API guarantee, it would be a compatibility break).

Otherwise, it's mostly a matter of getting the build steps to work.
Zach Ware got 32-bit builds going, and his approach (use git bash to
run configure and keep a copy of the results) should in principle be
fine for 64-bit, but I stalled because I've no way of testing a libffi
build short of building the whole of Python and running the ctypes
tests, which is both heavy handed and likely to obscure the root cause
of any actual bugs found that way :-(

The big problem is that ctypes with the current embedded libffi is
"good enough" on Windows, which is where the bulk of ctypes usage
occurs. So there's no compelling reason to put in the work to upgrade
it. And (as usual) there are few people with the necessary expertise (in
this case, Windows, C, Unix-style build processes, and assembler-level
calling conventions - a pretty impressive mix!).

Paul

From guido at python.org  Wed May  6 21:28:35 2015
From: guido at python.org (Guido van Rossum)
Date: Wed, 6 May 2015 12:28:35 -0700
Subject: [Python-ideas] discouraging direct use of the C-API
In-Reply-To: <CACac1F_YPRjceF9T4OYsZwhZnZE1faonQ2UtHFE-rq9MUMj5oQ@mail.gmail.com>
References: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
 <CAP7+vJKKg0yrYH48KtFdQEhTuBWKqDGXTUgCCnDfcm2tgK2F8A@mail.gmail.com>
 <CACac1F_YPRjceF9T4OYsZwhZnZE1faonQ2UtHFE-rq9MUMj5oQ@mail.gmail.com>
Message-ID: <CAP7+vJLV2NWpvjeTFPwbJR9W50NBb5M4YnMOMnS7aPWSeMLXDQ@mail.gmail.com>

On Wed, May 6, 2015 at 11:56 AM, Paul Moore <p.f.moore at gmail.com> wrote:

> On 6 May 2015 at 17:41, Guido van Rossum <guido at python.org> wrote:
> > As for CFFI, is the ownership/maintenance issue solved yet? IIRC we have
> > some really outdated versions in the CPython tree and nobody wants to
> step
> > in and upgrade these to the latest CFFI, for some reason (such as that
> that
> > would actually break a lot of code because the latest version is so
> > different from the version we currently include?).
>
> I think you are referring to libffi (used by ctypes) here rather than cffi.
>

Oh dear. I think you're right. :-(

Forget I said anything. Naming is hard.

I'm still not sure how realistic it is to try and deprecate the C API.

-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150506/5b2ebbc0/attachment.html>

From kaiser.yann at gmail.com  Wed May  6 21:32:54 2015
From: kaiser.yann at gmail.com (Yann Kaiser)
Date: Wed, 06 May 2015 19:32:54 +0000
Subject: [Python-ideas] Add 'composable' decorator to functools (with @
 matrix multiplication)
In-Reply-To: <CAOMjWkmcuc0R6B+EDhk6r+cNTEDewFEevZ8zUYyJmfVho78vRQ@mail.gmail.com>
References: <CAOMjWk=-1u8d8ZxRmXNAiitLpHRbb7dCqbyBQu6VhaJHKUkp1w@mail.gmail.com>
 <CAOTD34bm+H+y4v+Kzjzo++aPwt6Dn8Z9khNj4cJD+D+Aj5tnKw@mail.gmail.com>
 <CAOMjWkmgXQQeVSqD6wSoO+UecQDGuwR7Y-qYKS2T8wR5ejpC5w@mail.gmail.com>
 <CAOTD34YQJOQpp+whiX2Xp5OwJ1Y4c=H7fLm3aSgeMyvhmTD3OQ@mail.gmail.com>
 <CAP7+vJKuR5w=CGyt0QxpD8BBpJ=x+ZqTn4Mo49dYhWFTpde6JA@mail.gmail.com>
 <CAOMjWkmcuc0R6B+EDhk6r+cNTEDewFEevZ8zUYyJmfVho78vRQ@mail.gmail.com>
Message-ID: <CANUJvPUfpK=-uJRKoWy2-ECAn+LGvKWJ34z8n4Rou0FkH7ZXsA@mail.gmail.com>

On Wed, 6 May 2015 at 09:10 Ivan Levkivskyi <levkivskyi at gmail.com> wrote:

> Dear Guido,
>
> My original idea was to make the composable functions auto-curried
> (similar to proposed here
> http://code.activestate.com/recipes/52902-function-composition/ as
> pointed out by Steve) so that
>
> my_fun = square @ add(1)
> my_fun(x)
>
> evaluates to
>
> square(add(1,x))
>

This breaks the (IMO) fundamental expectation that

    z = add(1)
    my_fun = square @ z

is equivalent to

    my_fun = square @ add(1)

-Yann
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150506/8dc1e45e/attachment.html>

From me at the-compiler.org  Wed May  6 21:46:25 2015
From: me at the-compiler.org (Florian Bruhin)
Date: Wed, 6 May 2015 21:46:25 +0200
Subject: [Python-ideas] discouraging direct use of the C-API
In-Reply-To: <CAP7+vJLV2NWpvjeTFPwbJR9W50NBb5M4YnMOMnS7aPWSeMLXDQ@mail.gmail.com>
References: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
 <CAP7+vJKKg0yrYH48KtFdQEhTuBWKqDGXTUgCCnDfcm2tgK2F8A@mail.gmail.com>
 <CACac1F_YPRjceF9T4OYsZwhZnZE1faonQ2UtHFE-rq9MUMj5oQ@mail.gmail.com>
 <CAP7+vJLV2NWpvjeTFPwbJR9W50NBb5M4YnMOMnS7aPWSeMLXDQ@mail.gmail.com>
Message-ID: <20150506194625.GO429@tonks>

* Guido van Rossum <guido at python.org> [2015-05-06 12:28:35 -0700]:
> On Wed, May 6, 2015 at 11:56 AM, Paul Moore <p.f.moore at gmail.com> wrote:
> 
> > On 6 May 2015 at 17:41, Guido van Rossum <guido at python.org> wrote:
> > > As for CFFI, is the ownership/maintenance issue solved yet? IIRC we have
> > > some really outdated versions in the CPython tree and nobody wants to
> > step
> > > in and upgrade these to the latest CFFI, for some reason (such as that
> > that
> > > would actually break a lot of code because the latest version is so
> > > different from the version we currently include?).
> >
> > I think you are referring to libffi (used by ctypes) here rather than cffi.
> >
> 
> Oh dear. I think you're right. :-(
> 
> Forget I said anything. Naming is hard.
> 
> I'm still not sure how realistic it is to try and deprecate the C API.

I don't think it should be *deprecated*, but I agree the documentation
should point out the (probably better) alternatives.

From time to time, in the #python IRC channel there are people who
want to use C libraries with Python and try doing so with the C API
because they aren't aware of the alternatives.

Florian

-- 
http://www.the-compiler.org | me at the-compiler.org (Mail/XMPP)
   GPG: 916E B0C8 FD55 A072 | http://the-compiler.org/pubkey.asc
         I love long mails! | http://email.is-not-s.ms/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 819 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150506/6264037f/attachment.sig>

From jsbueno at python.org.br  Wed May  6 21:49:04 2015
From: jsbueno at python.org.br (Joao S. O. Bueno)
Date: Wed, 6 May 2015 16:49:04 -0300
Subject: [Python-ideas] discouraging direct use of the C-API
In-Reply-To: <CAP7+vJLV2NWpvjeTFPwbJR9W50NBb5M4YnMOMnS7aPWSeMLXDQ@mail.gmail.com>
References: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
 <CAP7+vJKKg0yrYH48KtFdQEhTuBWKqDGXTUgCCnDfcm2tgK2F8A@mail.gmail.com>
 <CACac1F_YPRjceF9T4OYsZwhZnZE1faonQ2UtHFE-rq9MUMj5oQ@mail.gmail.com>
 <CAP7+vJLV2NWpvjeTFPwbJR9W50NBb5M4YnMOMnS7aPWSeMLXDQ@mail.gmail.com>
Message-ID: <CAH0mxTRn2SWvP+God9dqjMS51Fs6RWqZ7CQfWACTLGQCf_Ot3A@mail.gmail.com>

On 6 May 2015 at 16:28, Guido van Rossum <guido at python.org> wrote:
> On Wed, May 6, 2015 at 11:56 AM, Paul Moore <p.f.moore at gmail.com> wrote:
>>
>> On 6 May 2015 at 17:41, Guido van Rossum <guido at python.org> wrote:
>> > As for CFFI, is the ownership/maintenance issue solved yet? IIRC we have
>> > some really outdated versions in the CPython tree and nobody wants to
>> > step
>> > in and upgrade these to the latest CFFI, for some reason (such as that
>> > that
>> > would actually break a lot of code because the latest version is so
>> > different from the version we currently include?).
>>
>> I think you are referring to libffi (used by ctypes) here rather than
>> cffi.
>
>
> Oh dear. I think you're right. :-(
>
> Forget I said anything. Naming is hard.
>
> I'm still not sure how realistic it is to try and deprecate the C API.

I am also not sure, but I feel it would be a __huge__ step
forward for other implementations, and for Python as a language rather
than for the particular CPython software product.

Today, many people still see the C API as "the way" to extend Python,
which means most extensions created end up unusable on all other
implementations of the language.

(OK, I actually don't know if Cython modules could be called from, say,
PyPy or Jython, but even if not, I suppose a "jcython" and "pycython"
could be made available in the future.)
(The first Google hit says Cython already has at least partial support
for PyPy.)
   js
 -><-

From kaiser.yann at gmail.com  Wed May  6 21:50:41 2015
From: kaiser.yann at gmail.com (Yann Kaiser)
Date: Wed, 06 May 2015 19:50:41 +0000
Subject: [Python-ideas] Add 'composable' decorator to functools (with @
 matrix multiplication)
In-Reply-To: <CAP7+vJKuR5w=CGyt0QxpD8BBpJ=x+ZqTn4Mo49dYhWFTpde6JA@mail.gmail.com>
References: <CAOMjWk=-1u8d8ZxRmXNAiitLpHRbb7dCqbyBQu6VhaJHKUkp1w@mail.gmail.com>
 <CAOTD34bm+H+y4v+Kzjzo++aPwt6Dn8Z9khNj4cJD+D+Aj5tnKw@mail.gmail.com>
 <CAOMjWkmgXQQeVSqD6wSoO+UecQDGuwR7Y-qYKS2T8wR5ejpC5w@mail.gmail.com>
 <CAOTD34YQJOQpp+whiX2Xp5OwJ1Y4c=H7fLm3aSgeMyvhmTD3OQ@mail.gmail.com>
 <CAP7+vJKuR5w=CGyt0QxpD8BBpJ=x+ZqTn4Mo49dYhWFTpde6JA@mail.gmail.com>
Message-ID: <CANUJvPUgmZLi=fR5pj0vUBmD32oaGmNojvXKCZY1oKYEh+TpAQ@mail.gmail.com>

On Wed, 6 May 2015 at 08:49 Guido van Rossum <guido at python.org> wrote:

> I realize this is still python-ideas, but does this really leave functions
> with multiple arguments completely out of the picture (except as the first
> stage in the pipeline)?
>

To provide some alternative ideas, what I did in
sigtools.wrappers.Combination[1] was to replace the first argument with the
return value of the previous call while always using the same remaining
(positional and keyword) arguments. In code:

    def __call__(self, arg, *args, **kwargs):
        for function in self.functions:
            arg = function(arg, *args, **kwargs)
        return arg

With this you can even use functions that use different parameters, at the
cost of less strictness:

    def func1(arg, *, kw1, **kwargs):
        ...

    def func2(arg, *, kw2, **kwargs):
        ...

That class is more of a demo for sigtools.signatures.merge[2] rather than
something spawned out of a need however.

[1] http://sigtools.readthedocs.org/en/latest/#sigtools.wrappers.Combination
[2] http://sigtools.readthedocs.org/en/latest/#sigtools.signatures.merge
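A self-contained sketch of that "thread the first argument" idea (the class below is illustrative, not sigtools' actual implementation):

```python
class Combination:
    """Pipe the first argument through a chain of functions,
    passing the same extra arguments to every stage."""
    def __init__(self, *functions):
        self.functions = functions

    def __call__(self, arg, *args, **kwargs):
        for function in self.functions:
            arg = function(arg, *args, **kwargs)
        return arg

def add(x, inc):
    return x + inc

def scale(x, inc):
    return x * inc

pipeline = Combination(add, scale)
assert pipeline(3, 2) == 10   # scale(add(3, 2), 2) == (3 + 2) * 2
```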
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150506/48a70ac0/attachment-0001.html>

From abarnert at yahoo.com  Wed May  6 21:51:29 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Wed, 6 May 2015 12:51:29 -0700
Subject: [Python-ideas] (no subject)
In-Reply-To: <CAOMjWkkSK6iAQhnCTJ4JPjFioxregNz4xFu-S3NpX00p3ZnznQ@mail.gmail.com>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <FA7C7E2C-28ED-4397-852E-F801C9183AFF@yahoo.com>
 <CAOMjWkkSK6iAQhnCTJ4JPjFioxregNz4xFu-S3NpX00p3ZnznQ@mail.gmail.com>
Message-ID: <8C3A59B4-1C5B-4C67-A148-9ADBEE7123A7@yahoo.com>

On May 6, 2015, at 08:05, Ivan Levkivskyi <levkivskyi at gmail.com> wrote:
> 
> Dear Andrew,
> 
> Thank you for pointing out the previous discussion, I have overlooked it. (Btw, I have found your post about the infix operators, that is a great idea).

Well, nobody else seemed to like that idea, which may be a warning sign about this one. :)

> Also, It turns out that astropy uses a very similar idea for function composition.
> 
> I agree that there are indeed too many ambiguities about the "right way", and thus it is not a good fit for the stdlib. However, implementing only one decorator as a third-party library is not a good idea either.
> You are right that no one will install such a library. It would probably be better to combine it with other functionality like @infix (via overloading __or__ or __rshift__), @auto_curry, etc.

Actually, many of the implementations on PyPI are part of "miscellaneous functional tools" libraries that do combine it with such things. And they still have practically no users.

There are plenty of libraries that, despite being on PyPI and not mentioned anywhere in the standard docs, still have a lot of users. In fact, much of what's in the Python stdlib today (json, sqlite3, ElementTree, statistics, enum, multiprocessing, ...) started off that way. And there may be more people using requests or NumPy or Django than a lot of parts of the stdlib. "Nobody will use it unless it's in the stdlib" doesn't cut it anymore in the days of most Python installations including pip, the stdlib docs referencing libraries on PyPI, etc. If something isn't getting traction on PyPI, either people really don't want it--in which case there's nothing to do--or someone really needs to evangelize it--in which case you should start doing that, rather than proposing yet another implementation that will just gather dust.

Finally, I think you've ignored an important part of my message--which is probably my fault for not making it clearer. Code that deals in abstract functional terms is harder for many people to think about. Not just novices (unless you want to call Guido a novice). Languages that make it easier to write such code are harder languages to read. So, making it easier to write such code in Python may not be a win.

And the reason I brought up all those other abstract features in Haskell is that they tie together with composition very closely. Most of the best examples anyone can come up with for how compose makes code easier to read also include curried functions, operator sections, composing the apply operator itself, and so on. They're all really cool ideas that can simplify your logic--but only if you're willing to think on that more abstract plane. Adding all of that to Python would make it harder to learn. Not adding it to Python would make compose not very useful. (Which is why the various implementations are languishing without users.)

> Thank you for the feedback!
> 
> 
>> On 6 May 2015 at 15:59, Andrew Barnert <abarnert at yahoo.com> wrote:
>> This was discussed when the proposal to add @ for matrix multiplication came up, so you should first read that thread and make sure you have answers to all of the issues that came up before proposing it again.
>> 
>> Off the top of my head:
>> 
>> Python functions don't just take 1 parameter, they take any number of parameters, possibly including optional parameters, keyword-only, *args, **kwargs, etc. There are a dozen different compose implementations on PyPI and ActiveState that handle these differently. Which one is "right"?
>> 
>> The design you describe can be easily implemented as a third-party library. Why not do so, put it on PyPI, see if you get any traction and any ideas for improvement, and then suggest it for the stdlib?
>> 
>> The same thing is already doable today using a different operator--and, again, there are a dozen implementations. Why isn't anyone using them?
>> 
>> Thinking in terms of function composition requires a higher level of abstraction than thinking in terms of lambda expressions. That's one of the reasons people perceive Haskell to be a harder language to learn than Lisp or Python. Of course learning Haskell is rewarding--but being easy to learn is one of Python's major strengths.
>> 
>> Python doesn't have a static optimizing compiler that can avoid building 4 temporary function objects to evaluate (plot @ sorted @ sqrt @ real) (data_array), so it will make your code significantly less efficient.
>> 
>> Is @ for composition and () for application really sufficient to write point free code in general without auto-curried functions, operator sectioning, reverse compose, reverse apply, etc.? Most of the examples people use in describing the feature from Haskell have a (+ 1) or (== x) or take advantage of map-type functions being (a->b) -> ([a] -> [b]) instead of (a->b, [a]) -> [b].
>> 
>> Sent from my iPhone
>> 
>> > On May 6, 2015, at 06:15, Ivan Levkivskyi <levkivskyi at gmail.com> wrote:
>> >
>> > Dear all,
>> >
>> > The matrix multiplication operator @ is going to be introduced in Python 3.5 and I am thinking about the following idea:
>> >
>> > The semantics of matrix multiplication is the composition of the corresponding linear transformations.
>> > A linear transformation is a particular example of a more general concept - functions.
>> > The latter are frequently composed with ("wrap") each other. For example:
>> >
>> > plot(real(sqrt(data)))
>> >
>> > However, it is not very readable in case of many wrapping layers. Therefore, it could be useful to employ
>> > the matrix multiplication operator @ for indication of function composition. This could be done by such (simplified) decorator:
>> >
>> > class composable:
>> >
>> >     def __init__(self, func):
>> >         self.func = func
>> >
>> >     def __call__(self, arg):
>> >         return self.func(arg)
>> >
>> >     def __matmul__(self, other):
>> >         def composition(*args, **kwargs):
>> >             return self.func(other(*args, **kwargs))
>> >         return composable(composition)
>> >
>> > I think using such decorator with functions that are going to be deeply wrapped
>> > could improve readability.
>> > You could compare (note that only the outermost function should be decorated):
>> >
>> > plot(sorted(sqrt(real(data_array)))) vs. (plot @ sorted @ sqrt @ real) (data_array)
>> >
>> > I think the latter is more readable, also compare
>> >
>> > def sunique(lst):
>> >     return sorted(list(set(lst)))
>> >
>> > vs.
>> >
>> > sunique = sorted @ list @ set
>> >
>> > Apart from readability, there are following pros of the proposed decorator:
>> >
>> > 1. Similar semantics as for matrix multiplication.
>> > 2. Same symbol for composition as for decorators.
>> > 3. The symbol @ resembles mathematical notation for function composition: ∘
>> >
>> > I think it could be a good idea to add such a decorator to the stdlib functools module.
>> > _______________________________________________
>> > Python-ideas mailing list
>> > Python-ideas at python.org
>> > https://mail.python.org/mailman/listinfo/python-ideas
>> > Code of Conduct: http://python.org/psf/codeofconduct/
> 

From abarnert at yahoo.com  Wed May  6 21:57:57 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Wed, 6 May 2015 12:57:57 -0700
Subject: [Python-ideas] discouraging direct use of the C-API
In-Reply-To: <CAH0mxTRn2SWvP+God9dqjMS51Fs6RWqZ7CQfWACTLGQCf_Ot3A@mail.gmail.com>
References: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
 <CAP7+vJKKg0yrYH48KtFdQEhTuBWKqDGXTUgCCnDfcm2tgK2F8A@mail.gmail.com>
 <CACac1F_YPRjceF9T4OYsZwhZnZE1faonQ2UtHFE-rq9MUMj5oQ@mail.gmail.com>
 <CAP7+vJLV2NWpvjeTFPwbJR9W50NBb5M4YnMOMnS7aPWSeMLXDQ@mail.gmail.com>
 <CAH0mxTRn2SWvP+God9dqjMS51Fs6RWqZ7CQfWACTLGQCf_Ot3A@mail.gmail.com>
Message-ID: <2612484C-4BA0-460E-8913-2F294310107B@yahoo.com>

On May 6, 2015, at 12:49, Joao S. O. Bueno <jsbueno at python.org.br> wrote:
> 
>> On 6 May 2015 at 16:28, Guido van Rossum <guido at python.org> wrote:
>>> On Wed, May 6, 2015 at 11:56 AM, Paul Moore <p.f.moore at gmail.com> wrote:
>>> 
>>>> On 6 May 2015 at 17:41, Guido van Rossum <guido at python.org> wrote:
>>>> As for CFFI, is the ownership/maintenance issue solved yet? IIRC we have
>>>> some really outdated versions in the CPython tree and nobody wants to
>>>> step in and upgrade these to the latest CFFI, for some reason (such as
>>>> that that would actually break a lot of code because the latest version
>>>> is so different from the version we currently include?).
>>> 
>>> I think you are referring to libffi (used by ctypes) here rather than
>>> cffi.
>> 
>> 
>> Oh dear. I think you're right. :-(
>> 
>> Forget I said anything. Naming is hard.
>> 
>> I'm still not sure how realistic it is to try and deprecate the C API.
> 
> I am also not sure, but I feel like it would be a __huge__ step
> forward for other implementations
> and Python as a language instead of the particular cPython Software Product.
> 
> Today, many people still see using the C API as "the way" to extend Python,
> which means that most extensions created are unusable on all other
> implementations of the language.
> 
> (Ok, I actually don't know if cython modules
> could be called from, say, PyPy or Jython, but even if not, I suppose
> a "jcython" and "pycython" could be made available in the future)
> (the first Google result says cython has at least partial support for pypy already)

Yes, but PyPy has pretty good support for C API extensions, too.

For Jython and IronPython, I'm not sure what the answer could be. Could Cython automatically build
JNI thingies and wrap them up in Java thingies to expose them to Jython even in theory? Even if that worked, what about something like Skulpt? Compile to C code and jsctypes wrappers?

So I think there's a limit to what you can expect out of "extending Python the language" in any generic way.
>   js
> -><-

From abarnert at yahoo.com  Wed May  6 21:59:16 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Wed, 6 May 2015 12:59:16 -0700
Subject: [Python-ideas] discouraging direct use of the C-API
In-Reply-To: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
References: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
Message-ID: <E8C9B3CA-992A-4A48-BDF8-37980BB0F737@yahoo.com>

On May 6, 2015, at 09:23, Eric Snow <ericsnowcurrently at gmail.com> wrote:
> 
> A big blocker to making certain sweeping changes to CPython (e.g.
> ref-counting) is compatibility with the vast body of C extension
> modules out there that use the C-API.  While there are certainly
> drastic long-term solutions to that problem, there is one thing we can
> do in the short-term that would at least get the ball rolling.  We can
> put a big red note at the top of every page of the C-API docs that
> encourages folks to either use CFFI or Cython.

Does this mean you also want to discourage boost::python, SIP, SWIG, etc., which as far as I know come down to automatically building C API extensions, and would need to be completely rewritten if you wanted to make them work a different way?

> Thoughts?
> 
> -eric

From abarnert at yahoo.com  Wed May  6 22:10:10 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Wed, 6 May 2015 13:10:10 -0700
Subject: [Python-ideas] Add 'composable' decorator to functools (with @
	matrix multiplication)
In-Reply-To: <CANUJvPUgmZLi=fR5pj0vUBmD32oaGmNojvXKCZY1oKYEh+TpAQ@mail.gmail.com>
References: <CAOMjWk=-1u8d8ZxRmXNAiitLpHRbb7dCqbyBQu6VhaJHKUkp1w@mail.gmail.com>
 <CAOTD34bm+H+y4v+Kzjzo++aPwt6Dn8Z9khNj4cJD+D+Aj5tnKw@mail.gmail.com>
 <CAOMjWkmgXQQeVSqD6wSoO+UecQDGuwR7Y-qYKS2T8wR5ejpC5w@mail.gmail.com>
 <CAOTD34YQJOQpp+whiX2Xp5OwJ1Y4c=H7fLm3aSgeMyvhmTD3OQ@mail.gmail.com>
 <CAP7+vJKuR5w=CGyt0QxpD8BBpJ=x+ZqTn4Mo49dYhWFTpde6JA@mail.gmail.com>
 <CANUJvPUgmZLi=fR5pj0vUBmD32oaGmNojvXKCZY1oKYEh+TpAQ@mail.gmail.com>
Message-ID: <FB6AB86E-7080-44D4-AE03-B8EC8990E7D2@yahoo.com>

On May 6, 2015, at 12:50, Yann Kaiser <kaiser.yann at gmail.com> wrote:
> 
>> On Wed, 6 May 2015 at 08:49 Guido van Rossum <guido at python.org> wrote:
>> I realize this is still python-ideas, but does this really leave functions with multiple arguments completely out of the picture (except as the first stage in the pipeline)?
> 
> To provide some alternative ideas, what I did in sigtools.wrappers.Combination[1] was to replace the first argument with the return value of the previous call while always using the same remaining (positional and keyword) arguments

This is exactly my point about there not being one obvious right answer for dealing with multiple arguments. Among the choices are:

 * Don't allow them.
 * Auto-*-unpack return values into multiple arguments.
 * Compose on the first argument, pass others along the chain.
 * Auto-curry and auto-partial everywhere.
 * Auto-curry and auto-partial and _also_ auto-*-unpack (which I'd never considered, but it sounds like that's what this thread proposes).

They've all got uses. But if you're going to write _the_ compose function, it has to pick one.

Also, keep in mind that "auto-curry and auto-partial everything" can't really mean "everything"--unlike Haskell, Python can't partial operator expressions, and we've still got *args and keyword-only params and **kw, and we've got C functions that aren't argclinic'd yet, and so on. (If that all seems like obscure edge cases, consider that most proxy implementations, forwarding functions, etc. work by just taking and passing along *args, **kw, not by inspecting and binding the signature of the wrappee.)
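To make a couple of the choices above concrete, here is a rough sketch (hypothetical helper names; this is illustration code, not taken from any specific PyPI package):

```python
def compose_simple(*fns):
    """Single-argument composition: compose(f, g)(x) == f(g(x))."""
    def composed(x):
        for f in reversed(fns):
            x = f(x)
        return x
    return composed

def compose_first_arg(*fns):
    """Thread the first argument through the chain, passing the same
    remaining positional/keyword arguments to every stage."""
    def composed(arg, *args, **kwargs):
        for f in reversed(fns):
            arg = f(arg, *args, **kwargs)
        return arg
    return composed

assert compose_simple(str, abs)(-3) == '3'
# Every stage receives base=10; only the first argument is threaded:
assert compose_first_arg(lambda x, base: x * base,
                         lambda x, base: x + base)(1, 10) == 110
```

Both are reasonable, and they give different answers to the question of what composing a multi-argument function even means, which is exactly the standardization problem.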

> . In code:
> 
>     def __call__(self, arg, *args, **kwargs):
>         for function in self.functions:
>             arg = function(arg, *args, **kwargs)
>         return arg
> 
> With this you can even use functions that use different parameters, at the cost of less strictness:
> 
>     def func1(arg, *, kw1, **kwargs):
>         ...
> 
>     def func2(arg, *, kw2, **kwargs):
>         ...
> 
> That class is more of a demo for sigtools.signatures.merge[2] rather than something spawned out of a need however.
> 
> [1] http://sigtools.readthedocs.org/en/latest/#sigtools.wrappers.Combination
> [2] http://sigtools.readthedocs.org/en/latest/#sigtools.signatures.merge

From breamoreboy at yahoo.co.uk  Wed May  6 22:38:11 2015
From: breamoreboy at yahoo.co.uk (Mark Lawrence)
Date: Wed, 06 May 2015 21:38:11 +0100
Subject: [Python-ideas] discouraging direct use of the C-API
In-Reply-To: <CACac1F_YPRjceF9T4OYsZwhZnZE1faonQ2UtHFE-rq9MUMj5oQ@mail.gmail.com>
References: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
 <CAP7+vJKKg0yrYH48KtFdQEhTuBWKqDGXTUgCCnDfcm2tgK2F8A@mail.gmail.com>
 <CACac1F_YPRjceF9T4OYsZwhZnZE1faonQ2UtHFE-rq9MUMj5oQ@mail.gmail.com>
Message-ID: <midu3j$bhl$1@ger.gmane.org>

On 06/05/2015 19:56, Paul Moore wrote:
>
> Otherwise, it's mostly a matter of getting the build steps to work.
> Zach Ware got 32-bit builds going, and his approach (use git bash to
> run configure and keep a copy of the results) should in principle be
> fine for 64-bit, but I stalled because I've no way of testing a libffi
> build short of building the whole of Python and running the ctypes
> tests, which is both heavy handed and likely to obscure the root cause
> of any actual bugs found that way :-(
>
> Paul

Feel free to throw this or any other Windows issues my way.  I've all 
the time in the world to try stuff like this.

-- 
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence


From abarnert at yahoo.com  Wed May  6 22:40:18 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Wed, 6 May 2015 13:40:18 -0700
Subject: [Python-ideas] (no subject)
In-Reply-To: <20150506154811.GM5663@ando.pearwood.info>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <FA7C7E2C-28ED-4397-852E-F801C9183AFF@yahoo.com>
 <20150506154811.GM5663@ando.pearwood.info>
Message-ID: <26EE1D3C-0496-41C4-98D8-6D918FEF578A@yahoo.com>

Apologies for the split replies; is everyone else seeing this as three separate threads spawned from two copies of the original mail, or is this just Yahoo sucking again?

On May 6, 2015, at 08:48, Steven D'Aprano <steve at pearwood.info> wrote:
> 
>> On Wed, May 06, 2015 at 06:59:45AM -0700, Andrew Barnert via Python-ideas wrote:
>> 
>> Python functions don't just take 1 parameter, they take any number of 
>> parameters, possibly including optional parameters, keyword-only, 
>> *args, **kwargs, etc.
> 
> Maybe Haskell programmers are used to functions which all take one 
> argument, and f(a, b, c) is syntactic sugar for f(a)(b)(c), but I doubt 
> anyone else is.

But that's exactly the point. In Haskell, because f(a, b, c) is syntactic sugar for f(a)(b)(c), it's obvious (to a Haskell programmer, not to a human) what it means to compose f. In Python, it's not at all obvious.

Or, worse, it _is_ obvious that you just can't compose f with anything. There is no function whose return value can be passed to a function that takes 3 arguments. (Unless you add some extra rule, like auto-*-unpacking, or passing along *args[1:], **kw up the chain, or you do like Haskell and auto-curry everything.) Which makes composition far less useful in Python than in Haskell.
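As one illustration, a hedged sketch of the auto-*-unpacking rule (hypothetical helper, not a concrete proposal):

```python
def compose_unpacking(outer, inner):
    """Compose two callables, spreading a tuple result from `inner`
    across `outer`'s positional parameters."""
    def composed(*args, **kwargs):
        result = inner(*args, **kwargs)
        if isinstance(result, tuple):
            # The extra rule: auto-*-unpack tuple return values.
            return outer(*result)
        return outer(result)
    return composed

def f(a, b, c):
    return a + b + c

# Without the extra rule, f could never sit downstream in a chain,
# since no single return value satisfies its three parameters.
h = compose_unpacking(f, lambda x: (x, x + 1, x + 2))
assert h(1) == 6   # f(1, 2, 3)
```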

> When we Python programmers manually compose a function 
> today, by writing an expression or a new function, we have to deal with 
> the exact same problems.

Yes, but we can explicitly decide what to pass for the b and c arguments when writing an expression, and the obvious way to encode that decision is trivially readable. For example:

    f(g(a), b, c)
    f(*g(a, b, c))
    f(g(a)) # using default values for b and c

... etc.

I can't think of any syntax for compose that makes that true.

> There's nothing new about the programmer 
> needing to ensure that the function signatures are compatible:
> 
> def spam(a, b, c):
>    return a+b+c
> 
> def eggs(x, y, z):
>    return x*y/z
> 
> def composed(*args):
>    return eggs(spam(*args))  # doesn't work
> 
> It is the programmer's responsibility to compose compatible functions. 
> Why should it be a fatal flaw that the same limitation applies to a 
> composition operator?

Sure, just as bad for a compose function as for a compose operator. I'm not suggesting we should add composed to the stdlib instead of @, I'm suggesting we should add neither.

> Besides, with Argument Clinic, it's possible that the @ operator could 
> catch incompatible signatures ahead of time.
> 
> 
>> There are a dozen different compose 
>> implementations on PyPI and ActiveState that handle these differently.
> 
> That is good evidence that this is functionality that people want.

Not if nobody is using any of those implementations.

>> Which one is "right"?
> 
> Perhaps all of them? Perhaps none of them? There are lots of buggy or 
> badly designed functions and classes on the internet. Perhaps that 
> suggests that the std lib should solve it right once and for all.

It's not that they're buggy, it's that there are fundamental design choices that have to be made to fit compose into a language like Python, and none of the options are good enough to be standardized. One project may have good uses for a compose that passes along extra args, another for a compose that *-unpacks, and another for auto-currying. Providing one of those won't help the other two projects at all; all it'll do is collide with the name they wanted to use.

>> The design you describe can be easily implemented as a third-party 
>> library. Why not do so, put it on PyPI, see if you get any traction 
>> and any ideas for improvement, and then suggest it for the stdlib?
> 
> I agree that this idea needs to have some real use before it can be 
> added to the std lib, but see below for a counter-objection to the PyPI 
> objection.
> 
> 
>> The same thing is already doable today using a different 
>> operator--and, again, there are a dozen implementations. Why isn't 
>> anyone using them?
> 
> It takes a certain amount of effort for people to discover and use a 
> third-party library: one has to find a library, or libraries, determine 
> that it is mature, decide which competing library to use, determine if 
> the licence is suitable, download and install it. This "activation 
> energy" is insignificant if the library does something big, say, like 
> numpy, or nltk, or even medium sized.
> 
> But for a library that provides effectively a single function, that 
> activation energy is a barrier to entry. It's not that the function 
> isn't useful, or that people wouldn't use it if it were already 
> available. It's just that the effort to get it is too much bother. 
> People will do without, or re-invent the wheel. (Re-inventing the wheel 
> is at least fun. Searching PyPI and reading licences is not.)
> 
> 
>> Thinking in terms of function composition requires a higher level of 
>> abstraction than thinking in terms of lambda expressions.
> 
> Do you think its harder than, say, the "async for" feature that's just 
> been approved by Guido?

That's not a fair comparison.

Writing proper c10k network code is hard. An extra layer of abstraction that you have to get your head around, but that makes it a lot easier once you do, is a clear win.

Calling functions with the result of other functions is easy. An extra layer of abstraction that you have to get your head around, but that makes it possible to write slightly more concise or elegant code once you do, is probably not a win.

When I first come back to Python after a bit of time with another language, I have to shift gears and stop reducing compositions and instead loop over explicitly composed expressions, but that means I write more Pythonic code. I don't want Python to enable me to write code that Guido can't understand (unless it's something inherently complex in the first place).

> Compared to asynchronous code, I would say function composition is 
> trivial. Anyone who can learn the correspondence
> 
>    (a @ b)(arg)  <=> a(b(arg))
> 
> can deal with it.
> 
> 
>> Python doesn't have a static optimizing compiler that can avoid 
>> building 4 temporary function objects to evaluate (plot @ sorted @ 
>> sqrt @ real) (data_array), so it will make your code significantly 
>> less efficient.
> 
> Why would it necessarily have to create 4 temporary function objects? 
> Besides, the rules for optimization apply here too: don't dismiss 
> something as too slow until you've measured it :-)

It's not so much creating the 4 temporary objects as having to call through them every time you want to call the composed function.

And, while I obviously haven't measured the vaporware implementation of the current proposal, I have played around with different ways of writing Haskelly code in Python and how they fare without a GHC-style optimizer, and the performance impact is definitely noticeable in almost everything you do.

Of course it's possible that the very nature of "playing" vs. writing production code means that I was pushing it a lot farther than anyone would do in real life (nobody's going to invent and apply new combinators in production code, even in Haskell...), so I'll concede that maybe this wouldn't be a problem. But I suspect it will be.

> We shouldn't care about the cost of the @ operator itself, only the cost 
> of calling the composed functions. Building the Composed object 
> generally happens only once, while calling it generally happens many 
> times.
> 
> 
>> Is @ for composition and () for application really sufficient to write 
>> point free code in general without auto-curried functions, operator 
>> sectioning, reverse compose, reverse apply, etc.? Most of the examples 
>> people use in describing the feature from Haskell have a (+ 1) or (== 
>> x) or take advantage of map-type functions being (a->b) -> ([a] -> 
>> [b]) instead of (a->b, [a]) -> [b].
> 
> See, now *that's* why people consider Haskell to be difficult:

Hold on. (+ 1) meaning lambda x: x + 1 doesn't require any abstruse graduate-level math.

Understanding auto-currying map functions might... But that's kind of my point: most of the best examples for compose involve exactly these kinds of things that you don't want to even try to understand. And notice that the author of this current proposal thinks we should add the same thing to Python. Doesn't that make you worry that maybe compose belongs to the wrong universe?
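For reference, the closest stdlib spellings of those sections use functools.partial with the operator module (these are approximations of Haskell's sections, not the real thing):

```python
import functools
import operator

add_one = functools.partial(operator.add, 1)    # roughly Haskell's (+ 1)
equals_x = functools.partial(operator.eq, 'x')  # roughly (== x)

assert add_one(41) == 42
assert equals_x('x') is True
```

Even these simple cases need two imports and a wrapper object in Python, which is part of why point-free style never pays for itself here.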

> it is 
> based on areas of mathematics which even maths graduates may never have 
> come across. But function composition is taught in high school. (At 
> least in Australia, and I expect Europe and Japan.) It's a nice, simple 
> and useful functional tool, like partial.
> 
> 
> -- 
> Steve

From me at the-compiler.org  Wed May  6 22:58:12 2015
From: me at the-compiler.org (Florian Bruhin)
Date: Wed, 6 May 2015 22:58:12 +0200
Subject: [Python-ideas] (no subject)
In-Reply-To: <26EE1D3C-0496-41C4-98D8-6D918FEF578A@yahoo.com>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <FA7C7E2C-28ED-4397-852E-F801C9183AFF@yahoo.com>
 <20150506154811.GM5663@ando.pearwood.info>
 <26EE1D3C-0496-41C4-98D8-6D918FEF578A@yahoo.com>
Message-ID: <20150506205812.GP429@tonks>

* Andrew Barnert via Python-ideas <python-ideas at python.org> [2015-05-06 13:40:18 -0700]:
> Apologies for the split replies; is everyone else seeing this as three separate threads spawned from two copies of the original mail, or is this just Yahoo sucking again?

Yes - I guess the OP accidentally sent it without a subject, and then
re-sent it with a subject.

Florian

-- 
http://www.the-compiler.org | me at the-compiler.org (Mail/XMPP)
   GPG: 916E B0C8 FD55 A072 | http://the-compiler.org/pubkey.asc
         I love long mails! | http://email.is-not-s.ms/

From levkivskyi at gmail.com  Wed May  6 23:12:49 2015
From: levkivskyi at gmail.com (Ivan Levkivskyi)
Date: Wed, 6 May 2015 23:12:49 +0200
Subject: [Python-ideas] (no subject)
Message-ID: <CAOMjWkmoXEQvNwTpmcZ71w9i9hXJvDM43iS=9LyO-ECUgiuNaA@mail.gmail.com>

> Apologies for the split replies; is everyone else seeing this
> as three separate threads spawned
> from two copies of the original mail, or is this just Yahoo sucking again?

Probably that is my fault. I first sent a message via the Google Group,
but then received a notice from python-ideas at python.org that my message
had not been delivered, so I sent the second one directly. Sorry for that.
This is my first post here.

> And notice that the author of this current proposal
> thinks we should add the same thing to Python.
> Doesn't that make you worry that maybe compose
> belongs to the wrong universe?

I would like to clarify that I don't want to add all of Haskell to Python;
on the contrary, I wanted to propose a small subset of tools that could be
useful. Your position, as I understand it, is that there is no easy middle
ground: either you get many complex tools, or you get useless ones.

Still, I think one could try to find some kind of compromise.

From levkivskyi at gmail.com  Wed May  6 23:15:12 2015
From: levkivskyi at gmail.com (Ivan Levkivskyi)
Date: Wed, 6 May 2015 23:15:12 +0200
Subject: [Python-ideas] (no subject)
In-Reply-To: <CACac1F9Y-4CvNCySpfzufuPaWfWjC0PR5s6KB4a4g47KukeRZw@mail.gmail.com>
References: <CAOMjWk=Vbk6Kp6B4zRXzg8EY1yLGaS1yssDTDR=SdvcZqGAdeg@mail.gmail.com>
 <CACac1F9Y-4CvNCySpfzufuPaWfWjC0PR5s6KB4a4g47KukeRZw@mail.gmail.com>
Message-ID: <CAOMjWknffxA1Ru0_XO9CMSofni3L21Tb_kgL_0v=w+8dQKacTA@mail.gmail.com>

This is one of the options, but in my opinion an operator (the @ that I
propose) is clearer than a function call.

On 6 May 2015 at 20:44, Paul Moore <p.f.moore at gmail.com> wrote:

> On 6 May 2015 at 17:23, Ivan Levkivskyi <levkivskyi at gmail.com> wrote:
> > I should clarify why I would like to have the possibility to easily
> compose
> > functions.
> > I am a physicist (not a real programmer), and in my code I often compose
> > functions.
> >
> > To do this I need to write something like
> >
> > def new_func(x):
> >     return f(g(h(x)))
> >
> > This means I see f(g(h())) quite often and I would prefer to see
> > f @ g @ h instead.
>
> I appreciate that it's orthogonal to the proposal, but would a utility
> function like this be useful?
>
> def compose(*fns):
>     def composed(x):
>         for f in reversed(fns):
>             x = f(x)
>         return x
>     return composed
>
> comp = compose(f, g, h)
> # comp(x) = f(g(h(x)))
>
> Paul
>

From levkivskyi at gmail.com  Wed May  6 23:25:05 2015
From: levkivskyi at gmail.com (Ivan Levkivskyi)
Date: Wed, 6 May 2015 23:25:05 +0200
Subject: [Python-ideas] Add 'composable' decorator to functools (with @
 matrix multiplication)
In-Reply-To: <CANUJvPUfpK=-uJRKoWy2-ECAn+LGvKWJ34z8n4Rou0FkH7ZXsA@mail.gmail.com>
References: <CAOMjWk=-1u8d8ZxRmXNAiitLpHRbb7dCqbyBQu6VhaJHKUkp1w@mail.gmail.com>
 <CAOTD34bm+H+y4v+Kzjzo++aPwt6Dn8Z9khNj4cJD+D+Aj5tnKw@mail.gmail.com>
 <CAOMjWkmgXQQeVSqD6wSoO+UecQDGuwR7Y-qYKS2T8wR5ejpC5w@mail.gmail.com>
 <CAOTD34YQJOQpp+whiX2Xp5OwJ1Y4c=H7fLm3aSgeMyvhmTD3OQ@mail.gmail.com>
 <CAP7+vJKuR5w=CGyt0QxpD8BBpJ=x+ZqTn4Mo49dYhWFTpde6JA@mail.gmail.com>
 <CAOMjWkmcuc0R6B+EDhk6r+cNTEDewFEevZ8zUYyJmfVho78vRQ@mail.gmail.com>
 <CANUJvPUfpK=-uJRKoWy2-ECAn+LGvKWJ34z8n4Rou0FkH7ZXsA@mail.gmail.com>
Message-ID: <CAOMjWk=AZV8Q5WoN6-6hSxoYnRoKeQVjY2V1rJ9qq9Y_dJk9Ww@mail.gmail.com>

Dear Yann,

The two options that you mentioned are indeed equivalent (function
application binds much more tightly than @), but note that z would be a
partial-like object.

Of course, for this to work, not only the first function but also all
multi-argument functions must be decorated with @composable.
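A rough sketch of how such an auto-partialing variant of the simplified composable decorator might behave (this is hypothetical illustration code, not the actual proposal):

```python
import functools
import inspect

class composable:
    """Composable wrapper that returns a partial-like object when
    called with fewer arguments than the wrapped function expects."""
    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        try:
            inspect.signature(self.func).bind(*args, **kwargs)
        except TypeError:
            # Not enough arguments yet: curry instead of raising.
            return composable(functools.partial(self.func, *args, **kwargs))
        return self.func(*args, **kwargs)

    def __matmul__(self, other):
        return composable(lambda *a, **kw: self.func(other(*a, **kw)))

@composable
def add(a, b):
    return a + b

@composable
def square(x):
    return x * x

# Because add(1) evaluates before @, both spellings build the same chain:
z = add(1)                        # a partial-like object
my_fun = square @ add(1)
assert my_fun(3) == (square @ z)(3) == 16   # square(add(1, 3))
```

With this behaviour, z = add(1) and writing add(1) inline are interchangeable, which preserves the equivalence Yann expects, at the cost of silently deferring a TypeError for calls that were genuinely wrong.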


On 6 May 2015 at 21:32, Yann Kaiser <kaiser.yann at gmail.com> wrote:

> On Wed, 6 May 2015 at 09:10 Ivan Levkivskyi <levkivskyi at gmail.com> wrote:
>
>> Dear Guido,
>>
>> My original idea was to make the composable functions auto-curried
>> (similar to proposed here
>> http://code.activestate.com/recipes/52902-function-composition/ as
>> pointed out by Steve) so that
>>
>> my_fun = square @ add(1)
>> my_fun(x)
>>
>> evaluates to
>>
>> square(add(1,x))
>>
>
> This breaks the (IMO) fundamental expectation that
>
>     z = add(1)
>     my_fun = square @ z
>
> is equivalent to
>
>     my_fun = square @ add(1)
>
> -Yann
>

From levkivskyi at gmail.com  Wed May  6 23:38:02 2015
From: levkivskyi at gmail.com (Ivan Levkivskyi)
Date: Wed, 6 May 2015 23:38:02 +0200
Subject: [Python-ideas] (no subject)
In-Reply-To: <8C3A59B4-1C5B-4C67-A148-9ADBEE7123A7@yahoo.com>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <FA7C7E2C-28ED-4397-852E-F801C9183AFF@yahoo.com>
 <CAOMjWkkSK6iAQhnCTJ4JPjFioxregNz4xFu-S3NpX00p3ZnznQ@mail.gmail.com>
 <8C3A59B4-1C5B-4C67-A148-9ADBEE7123A7@yahoo.com>
Message-ID: <CAOMjWk=fAzZxnTAU0baK5d631sjpQ4hykFiXhgBMBn3VJf-OEw@mail.gmail.com>

On 6 May 2015 at 21:51, Andrew Barnert <abarnert at yahoo.com> wrote:

> On May 6, 2015, at 08:05, Ivan Levkivskyi <levkivskyi at gmail.com> wrote:
>
> Dear Andrew,
>
> Thank you for pointing out the previous discussion, I have overlooked it.
> (Btw, I have found your post about the infix operators, that is a great
> idea).
>
>
> Well, nobody else seemed to like that idea, which may be a warning sign
> about this one. :)
>
> Also, It turns out that astropy uses a very similar idea for function
> composition.
>
> I agree that there are indeed too many ambiguities about the "right way",
> and thus it is not a good fit for the stdlib. However, implementing only
> one decorator as a third-party library is not a good idea either.
> You are right that no one will install such a library. Probably, it would
> be better to combine it with other functionality like @infix (via
> overloading __or__ or __rshift__), @auto_curry, etc.
>
>
> Actually, many of the implementations on PyPI are part of "miscellaneous
> functional tools" libraries that do combine it with such things. And they
> still have practically no users.
>
> There are plenty of libraries that, despite being on PyPI and not
> mentioned anywhere in the standard docs, still have a lot of users. In
> fact, much of what's in the Python stdlib today (json, sqlite3,
> ElementTree, statistics, enum, multiprocessing, ...) started off that way.
> And there may be more people using requests or NumPy or Django than a lot
> of parts of the stdlib. "Nobody will use it unless it's in the stdlib"
> doesn't cut it anymore in the days of most Python installations including
> pip, the stdlib docs referencing libraries on PyPI, etc. If something isn't
> getting traction on PyPI, either people really don't want it--in which case
> there's nothing to do--or someone really needs to evangelize it--in which
> case you should start doing that, rather than proposing yet another
> implementation that will just gather dust.
>

Ok, I will try inspecting all existing approaches to find the one that
seems most "right" to me :) In any case, that approach could be updated by
incorporating the matrix @ as a dedicated operator for composition. At
least, it seems that Erik from astropy likes this idea, and it is quite
natural for people with a "scientific" background.


> Finally, I think you've ignored an important part of my message--which is
> probably my fault for not making it clearer. Code that deals in abstract
> functional terms is harder for many people to think about. Not just novices
> (unless you want to call Guido a novice). Languages that make it easier to
> write such code are harder languages to read. So, making it easier to write
> such code in Python may not be a win.
>
> And the reason I brought up all those other abstract features in Haskell
> is that they tie together with composition very closely. Most of the best
> examples anyone can come up with for how compose makes code easier to read
> also include curried functions, operator sections, composing the apply
> operator itself, and so on. They're all really cool ideas that can simplify
> your logic--but only if you're willing to think on that more abstract
> plane. Adding all of that to Python would make it harder to learn. Not
> adding it to Python would make compose not very useful. (Which is why the
> various implementations are languishing without users.)
>
Thank you for the feedback!
>
>
> On 6 May 2015 at 15:59, Andrew Barnert <abarnert at yahoo.com> wrote:
>
>> This was discussed when the proposal to add @ for matrix multiplication
>> came up, so you should first read that thread and make sure you have
>> answers to all of the issues that came up before proposing it again.
>>
>> Off the top of my head:
>>
>> Python functions don't just take 1 parameter, they take any number of
>> parameters, possibly including optional parameters, keyword-only, *args,
>> **kwargs, etc. There are a dozen different compose implementations on PyPI
>> and ActiveState that handle these differently. Which one is "right"?
>>
>> The design you describe can be easily implemented as a third-party
>> library. Why not do so, put it on PyPI, see if you get any traction and any
>> ideas for improvement, and then suggest it for the stdlib?
>>
>> The same thing is already doable today using a different operator--and,
>> again, there are a dozen implementations. Why isn't anyone using them?
>>
>> Thinking in terms of function composition requires a higher level of
>> abstraction than thinking in terms of lambda expressions. That's one of the
>> reasons people perceive Haskell to be a harder language to learn than Lisp
>> or Python. Of course learning Haskell is rewarding--but being easy to learn
>> is one of Python's major strengths.
>>
>> Python doesn't have a static optimizing compiler that can avoid building
>> 4 temporary function objects to evaluate (plot @ sorted @ sqrt @ real)
>> (data_array), so it will make your code significantly less efficient.
>>
>> Is @ for composition and () for application really sufficient to write
>> point free code in general without auto-curried functions, operator
>> sectioning, reverse compose, reverse apply, etc.? Most of the examples
>> people use in describing the feature from Haskell have a (+ 1) or (== x) or
>> take advantage of map-type functions being (a->b) -> ([a] -> [b]) instead
>> of (a->b, [a]) -> [b].
>>
>> Sent from my iPhone
>>
>> > On May 6, 2015, at 06:15, Ivan Levkivskyi <levkivskyi at gmail.com> wrote:
>> >
>> > Dear all,
>> >
>> > The matrix multiplication operator @ is going to be introduced in
>> Python 3.5 and I am thinking about the following idea:
>> >
>> > The semantics of matrix multiplication is the composition of the
>> corresponding linear transformations.
>> > A linear transformation is a particular example of a more general
>> concept - functions.
>> > The latter are frequently composed with ("wrap") each other. For
>> example:
>> >
>> > plot(real(sqrt(data)))
>> >
>> > However, it is not very readable in the case of many wrapping layers.
>> Therefore, it could be useful to employ
>> > the matrix multiplication operator @ to indicate function
>> composition. This could be done by such a (simplified) decorator:
>> >
>> > class composable:
>> >
>> >     def __init__(self, func):
>> >         self.func = func
>> >
>> >     def __call__(self, arg):
>> >         return self.func(arg)
>> >
>> >     def __matmul__(self, other):
>> >         def composition(*args, **kwargs):
>> >             return self.func(other(*args, **kwargs))
>> >         return composable(composition)
>> >
>> > I think using such a decorator with functions that are going to be deeply
>> wrapped
>> > could improve readability.
>> > You could compare (note that only the outermost function should be
>> decorated):
>> >
>> > plot(sorted(sqrt(real(data_array)))) vs. (plot @ sorted @ sqrt @ real)
>> (data_array)
>> >
>> > I think the latter is more readable, also compare
>> >
>> > def sunique(lst):
>> >     return sorted(list(set(lst)))
>> >
>> > vs.
>> >
>> > sunique = sorted @ list @ set
>> >
>> > Apart from readability, there are following pros of the proposed
>> decorator:
>> >
>> > 1. Similar semantics as for matrix multiplication.
>> > 2. Same symbol for composition as for decorators.
>> > 3. The symbol @ resembles mathematical notation for function
>> composition: ∘
>> >
>> > I think it could be a good idea to add such a decorator to the stdlib
>> functools module.
>> > _______________________________________________
>> > Python-ideas mailing list
>> > Python-ideas at python.org
>> > https://mail.python.org/mailman/listinfo/python-ideas
>> > Code of Conduct: http://python.org/psf/codeofconduct/
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150506/060ab978/attachment-0001.html>
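The `composable` decorator quoted above works as a runnable sketch once the quote
markers are stripped (same names as the quoted proposal; this is illustrative,
not an existing stdlib API):

```python
# Sketch of the composable decorator from the quoted proposal.
class composable:
    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)

    def __matmul__(self, other):
        # (f @ g)(x) == f(g(x)), matching mathematical composition order.
        def composition(*args, **kwargs):
            return self.func(other(*args, **kwargs))
        return composable(composition)

# As the proposal notes, only the outermost (leftmost) function needs to be
# wrapped; the rest can be plain callables.
sunique = composable(sorted) @ list @ set
assert sunique([3, 1, 2, 1]) == [1, 2, 3]
```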

From mal at egenix.com  Wed May  6 23:41:13 2015
From: mal at egenix.com (M.-A. Lemburg)
Date: Wed, 06 May 2015 23:41:13 +0200
Subject: [Python-ideas] discouraging direct use of the C-API
In-Reply-To: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
References: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
Message-ID: <554A8A79.2040306@egenix.com>

On 06.05.2015 18:23, Eric Snow wrote:
> A big blocker to making certain sweeping changes to CPython (e.g.
> ref-counting) is compatibility with the vast body of C extension
> modules out there that use the C-API.  While there are certainly
> drastic long-term solutions to that problem, there is one thing we can
> do in the short-term that would at least get the ball rolling.  We can
> put a big red note at the top of every page of the C-API docs that
> encourages folks to either use CFFI or Cython.
> 
> Thoughts?

Python without the C extensions would hardly have had the
success it has. It is widely known as the perfect language for
gluing together different systems and providing integration.

Deprecating the C API would mean that you deprecate all
those existing C extensions together with the C API.

This can hardly be in the interest of Python's quest for
world domination :-)

BTW: What can be more drastic than deprecating the Python C API?
There are certainly better ways to evolve an API than getting
rid of it.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, May 06 2015)
>>> Python Projects, Coaching and Consulting ...  http://www.egenix.com/
>>> mxODBC Plone/Zope Database Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From donald at stufft.io  Wed May  6 23:54:06 2015
From: donald at stufft.io (Donald Stufft)
Date: Wed, 6 May 2015 17:54:06 -0400
Subject: [Python-ideas] discouraging direct use of the C-API
In-Reply-To: <554A8A79.2040306@egenix.com>
References: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
 <554A8A79.2040306@egenix.com>
Message-ID: <B85B15EB-8F92-4192-A5F0-BFA19EE093AC@stufft.io>


> On May 6, 2015, at 5:41 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> 
> On 06.05.2015 18:23, Eric Snow wrote:
>> A big blocker to making certain sweeping changes to CPython (e.g.
>> ref-counting) is compatibility with the vast body of C extension
>> modules out there that use the C-API.  While there are certainly
>> drastic long-term solutions to that problem, there is one thing we can
>> do in the short-term that would at least get the ball rolling.  We can
>> put a big red note at the top of every page of the C-API docs that
>> encourages folks to either use CFFI or Cython.
>> 
>> Thoughts?
> 
> Python without the C extensions would hardly have had the
> success it has. It is widely known as the perfect language for
> gluing together different systems and providing integration.
> 
> Deprecating the C API would mean that you deprecate all
> those existing C extensions together with the C API.
> 
> This can hardly be in the interest of Python's quest for
> world domination :-)
> 
> BTW: What can be more drastic than deprecating the Python C API?
> There are certainly better ways to evolve an API than getting
> rid of it.


I think "deprecate" might be a bad word for it; it's more a matter of
telling people they should use CFFI (or Python) instead of the C-API,
similar to having the urllib.request docs direct people towards the
requests project for accessing the internet.

CFFI still makes it easy to act as glue between different systems;
it just does so in a way that isn't tied to one particular
implementation's API, and it is generally much easier to work with
on top of that. The biggest problems with CFFI currently are the
problems in distributing a CFFI module because of some early decisions,
but the CFFI 1.0 work is fixing all of that.

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 801 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150506/8ecb7f81/attachment.sig>

From ericsnowcurrently at gmail.com  Thu May  7 00:00:48 2015
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Wed, 6 May 2015 16:00:48 -0600
Subject: [Python-ideas] discouraging direct use of the C-API
In-Reply-To: <E8C9B3CA-992A-4A48-BDF8-37980BB0F737@yahoo.com>
References: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
 <E8C9B3CA-992A-4A48-BDF8-37980BB0F737@yahoo.com>
Message-ID: <CALFfu7DNYsH_9SYxuwcKvNSM38kmzdx96WnbtzS4qagTsaJqwg@mail.gmail.com>

On Wed, May 6, 2015 at 1:59 PM, Andrew Barnert <abarnert at yahoo.com> wrote:
> On May 6, 2015, at 09:23, Eric Snow <ericsnowcurrently at gmail.com> wrote:
>>
>> A big blocker to making certain sweeping changes to CPython (e.g.
>> ref-counting) is compatibility with the vast body of C extension
>> modules out there that use the C-API.  While there are certainly
>> drastic long-term solutions to that problem, there is one thing we can
>> do in the short-term that would at least get the ball rolling.  We can
>> put a big red note at the top of every page of the C-API docs that
>> encourages folks to either use CFFI or Cython.
>
> Does this mean you also want to discourage boost::python, SIP, SWIG, etc., which as far as I know come down to automatically building C API extensions, and would need to be completely rewritten if you wanted to make them work a different way?

Not really.  I mentioned CFFI and Cython specifically because they are
the two that kept coming up in previous discussions related to
discouraging use of the C-API.  If C extensions were always generated
using tools, then only tools would have to adapt to (drastic) changes
in the C-API.  That would be a much better situation than the status
quo since it drastically reduces the impact of changes.

-eric

From solipsis at pitrou.net  Thu May  7 00:16:22 2015
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 7 May 2015 00:16:22 +0200
Subject: [Python-ideas] discouraging direct use of the C-API
References: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
 <20150506185715.2083b063@fsol>
 <7CE0A9A8-8688-4F7D-A47B-074C5148883A@stufft.io>
Message-ID: <20150507001622.000808af@fsol>

On Wed, 6 May 2015 13:13:57 -0400
Donald Stufft <donald at stufft.io> wrote:
> 
> > On May 6, 2015, at 12:57 PM, Antoine Pitrou <solipsis-xNDA5Wrcr86sTnJN9+BGXg at public.gmane.org> wrote:
> > 
> > On Wed, 6 May 2015 10:23:09 -0600
> > Eric Snow <ericsnowcurrently at gmail.com>
> > wrote:
> >> A big blocker to making certain sweeping changes to CPython (e.g.
> >> ref-counting) is compatibility with the vast body of C extension
> >> modules out there that use the C-API.  While there are certainly
> >> drastic long-term solutions to that problem, there is one thing we can
> >> do in the short-term that would at least get the ball rolling.  We can
> >> put a big red note at the top of every page of the C-API docs that
> >> encourages folks to either use CFFI or Cython.
> > 
> > CFFI is only useful for a small subset of stuff people use the C API for
> > (mainly, thin wrappers around external libraries). Cython is a more
> > reasonable suggestion in this context.
> 
> You can write stuff in C itself for cffi too; it's not just for C bindings.
> An example would be the .c's and .h's for padding and constant-time compare
> in the cryptography project [1].

That really doesn't change what I said.  CFFI is not appropriate to
write e.g. actual extension classes.

Besides, we have ctypes in the standard library, it would be stupid to
recommend CFFI and not ctypes.

Regards

Antoine.



From ericsnowcurrently at gmail.com  Thu May  7 00:19:03 2015
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Wed, 6 May 2015 16:19:03 -0600
Subject: [Python-ideas] discouraging direct use of the C-API
In-Reply-To: <554A8A79.2040306@egenix.com>
References: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
 <554A8A79.2040306@egenix.com>
Message-ID: <CALFfu7CDZ9rQWxQV2WbZ6m32-BeMDeZCT9oPs48LzqvaBxDsOg@mail.gmail.com>

On Wed, May 6, 2015 at 3:41 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> Python without the C extensions would hardly have had the
> success it has. It is widely known as the perfect language for
> gluing together different systems and providing integration.
>
> Deprecating the C API would mean that you deprecate all
> those existing C extensions together with the C API.

As Donald noted, I'm not suggesting that the C-API be deprecated.  I
was careful in calling it "discouraging direct use of the C-API". :)

>
> This can hardly be in the interest of Python's quest for
> world domination :-)
>
> BTW: What can be more drastic than deprecating the Python C API?
> There are certainly better ways to evolve an API than getting
> rid of it.

I'd like to hear more on alternatives.  Lately all I've heard is how
much better off we'd be if folks used CFFI or tools like Cython to
write their extension modules.  Regardless of what it is, we should
try to find *some* solution that puts us in a position that we can
accomplish certain architectural changes, such as moving away from
ref-counting.  Larry talked about it at the language summit.

-eric

From donald at stufft.io  Thu May  7 00:27:20 2015
From: donald at stufft.io (Donald Stufft)
Date: Wed, 6 May 2015 18:27:20 -0400
Subject: [Python-ideas] discouraging direct use of the C-API
In-Reply-To: <20150507001622.000808af@fsol>
References: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
 <20150506185715.2083b063@fsol>
 <7CE0A9A8-8688-4F7D-A47B-074C5148883A@stufft.io>
 <20150507001622.000808af@fsol>
Message-ID: <CA3FDF45-CCC5-4245-831D-A7FC34ECBFD8@stufft.io>


> On May 6, 2015, at 6:16 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> 
> On Wed, 6 May 2015 13:13:57 -0400
> Donald Stufft <donald at stufft.io> wrote:
>> 
>>> On May 6, 2015, at 12:57 PM, Antoine Pitrou <solipsis-xNDA5Wrcr86sTnJN9+BGXg at public.gmane.org> wrote:
>>> 
>>> On Wed, 6 May 2015 10:23:09 -0600
>>> Eric Snow <ericsnowcurrently at gmail.com>
>>> wrote:
>>>> A big blocker to making certain sweeping changes to CPython (e.g.
>>>> ref-counting) is compatibility with the vast body of C extension
>>>> modules out there that use the C-API.  While there are certainly
>>>> drastic long-term solutions to that problem, there is one thing we can
>>>> do in the short-term that would at least get the ball rolling.  We can
>>>> put a big red note at the top of every page of the C-API docs that
>>>> encourages folks to either use CFFI or Cython.
>>> 
>>> CFFI is only useful for a small subset of stuff people use the C API for
>>> (mainly, thin wrappers around external libraries). Cython is a more
>>> reasonable suggestion in this context.
>> 
>> You can write stuff in C itself for cffi too; it's not just for C bindings.
>> An example would be the .c's and .h's for padding and constant-time compare
>> in the cryptography project [1].
> 
> That really doesn't change what I said.  CFFI is not appropriate to
> write e.g. actual extension classes.


What is an "actual extension class"?

> 
> Besides, we have ctypes in the standard library, it would be stupid to
> recommend CFFI and not ctypes.


Leaving aside the fact that ctypes can only work at the ABI level, which
flat out doesn't work for a lot of C projects: even if you're working at
the ABI level, ctypes isn't nearly as nice to use as CFFI is. With ctypes
you have to repeat the C declarations using ctypes' special snowflake API,
but with cffi you just re-use the C declarations (for the most part); in
most scenarios you can simply copy/paste from the .h files or man pages or
what have you.

Here's a decent read: http://eli.thegreenplace.net/2013/03/09/python-ffi-with-ctypes-and-cffi

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 801 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150506/08f504f7/attachment-0001.sig>
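Donald's point about ctypes requiring the C declarations to be restated can be
seen in a small sketch (assumes a POSIX system, where `CDLL(None)` exposes the
symbols already linked into the interpreter, including libc):

```python
import ctypes

# On POSIX, CDLL(None) is dlopen(NULL): it gives access to symbols already
# linked into the process, which includes libc (assumption: not Windows).
libc = ctypes.CDLL(None)

# ctypes makes you restate the C prototype in its own Python API,
# whereas cffi would accept the C declaration text directly.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

assert libc.strlen(b"hello, world") == 12
```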

From solipsis at pitrou.net  Thu May  7 00:34:18 2015
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 7 May 2015 00:34:18 +0200
Subject: [Python-ideas] discouraging direct use of the C-API
References: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
 <20150506185715.2083b063@fsol>
 <7CE0A9A8-8688-4F7D-A47B-074C5148883A@stufft.io>
 <20150507001622.000808af@fsol>
 <CA3FDF45-CCC5-4245-831D-A7FC34ECBFD8@stufft.io>
Message-ID: <20150507003418.03a78ca0@fsol>

On Wed, 6 May 2015 18:27:20 -0400
Donald Stufft <donald at stufft.io> wrote:
> 
> > On May 6, 2015, at 6:16 PM, Antoine Pitrou <solipsis-xNDA5Wrcr86sTnJN9+BGXg at public.gmane.org> wrote:
> > 
> > On Wed, 6 May 2015 13:13:57 -0400
> > Donald Stufft <donald at stufft.io> wrote:
> >> 
> >>> On May 6, 2015, at 12:57 PM, Antoine Pitrou <solipsis-xNDA5Wrcr86sTnJN9+BGXg at public.gmane.org> wrote:
> >>> 
> >>> On Wed, 6 May 2015 10:23:09 -0600
> >>> Eric Snow <ericsnowcurrently at gmail.com>
> >>> wrote:
> >>>> A big blocker to making certain sweeping changes to CPython (e.g.
> >>>> ref-counting) is compatibility with the vast body of C extension
> >>>> modules out there that use the C-API.  While there are certainly
> >>>> drastic long-term solutions to that problem, there is one thing we can
> >>>> do in the short-term that would at least get the ball rolling.  We can
> >>>> put a big red note at the top of every page of the C-API docs that
> >>>> encourages folks to either use CFFI or Cython.
> >>> 
> >>> CFFI is only useful for a small subset of stuff people use the C API for
> >>> (mainly, thin wrappers around external libraries). Cython is a more
> >>> reasonable suggestion in this context.
> >> 
> >> You can write stuff in C itself for cffi too; it's not just for C bindings.
> >> An example would be the .c's and .h's for padding and constant-time compare
> >> in the cryptography project [1].
> > 
> > That really doesn't change what I said.  CFFI is not appropriate to
> > write e.g. actual extension classes.
> 
> 
> What is an "actual extension class"?

Uh... Please take a look at the C API manual.

Regards

Antoine.



From chris.barker at noaa.gov  Wed May  6 21:24:04 2015
From: chris.barker at noaa.gov (Chris Barker)
Date: Wed, 6 May 2015 12:24:04 -0700
Subject: [Python-ideas] discouraging direct use of the C-API
In-Reply-To: <CAP7+vJKKg0yrYH48KtFdQEhTuBWKqDGXTUgCCnDfcm2tgK2F8A@mail.gmail.com>
References: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
 <CAP7+vJKKg0yrYH48KtFdQEhTuBWKqDGXTUgCCnDfcm2tgK2F8A@mail.gmail.com>
Message-ID: <CALGmxELnZ-RxcJGFtTHg9NMsA93gaVRYQWM7PR-8THxDk0oRwg@mail.gmail.com>

On Wed, May 6, 2015 at 9:41 AM, Guido van Rossum <guido at python.org> wrote:

> I think Cython is already used by those people who benefit from it.
>

I wish that were the case, but I don't think so -- there is a LOT of
weight behind the idea of something being "built-in" and/or "official". So
folks do still write extensions using the raw C API.

Some note recommending Cython in the core docs about the C API would be
great.

And we don't use Cython in the standard library, do we?

-CHB




-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150506/ad6fc873/attachment.html>

From stephen at xemacs.org  Thu May  7 03:13:11 2015
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 07 May 2015 10:13:11 +0900
Subject: [Python-ideas] (no subject)
In-Reply-To: <CAOMjWk=fAzZxnTAU0baK5d631sjpQ4hykFiXhgBMBn3VJf-OEw@mail.gmail.com>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <FA7C7E2C-28ED-4397-852E-F801C9183AFF@yahoo.com>
 <CAOMjWkkSK6iAQhnCTJ4JPjFioxregNz4xFu-S3NpX00p3ZnznQ@mail.gmail.com>
 <8C3A59B4-1C5B-4C67-A148-9ADBEE7123A7@yahoo.com>
 <CAOMjWk=fAzZxnTAU0baK5d631sjpQ4hykFiXhgBMBn3VJf-OEw@mail.gmail.com>
Message-ID: <87lhh142c8.fsf@uwakimon.sk.tsukuba.ac.jp>

Ivan Levkivskyi writes:

 > Ok, I will try inspecting all existing approaches to find the one
 > that seems more "right" to me :)

If you do inspect all the approaches you can find, I hope you'll keep
notes and publish them, perhaps as a blog article.

 > In any case that approach could be updated by incorporating matrix
 > @ as a dedicated operator for compositions.

I think rather than "dedicated" you mean "suggested".  One of Andrew's
main points is that you're unlikely to find more than a small minority
agreeing on the "right" approach, no matter which one you choose.

 > At least, it seems that Erik from astropy likes this idea and it is
 > quite natural for people with a "scientific" background.

Sure, but as he also points out, when you know that you're going to be
composing only functions of one argument, the Unix pipe symbol is also
quite natural (as is Haskell's operator-less notation).  While one of
my hobbies is category theory (basically, the mathematical theory of
composable maps for those not familiar with the term), I find the Unix
pipeline somehow easier to think about than abstract composition,
although I believe they're equivalent (at least as composition is
modeled by category theory).
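Stephen's pipeline analogy can be made concrete with a left-to-right `pipe`
helper (a hypothetical name for illustration, not an existing stdlib function),
which reads in the opposite order to `@`-style composition:

```python
from functools import reduce

def pipe(*funcs):
    """Compose left to right, reading like a Unix pipeline."""
    def piped(value):
        # Feed the value through each function in turn.
        return reduce(lambda acc, f: f(acc), funcs, value)
    return piped

# set | list | sorted, in pipeline order:
sunique = pipe(set, list, sorted)
assert sunique([3, 1, 2, 1]) == [1, 2, 3]
```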


From rob.cliffe at btinternet.com  Thu May  7 03:41:34 2015
From: rob.cliffe at btinternet.com (Rob Cliffe)
Date: Thu, 07 May 2015 02:41:34 +0100
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <CADiSq7eYJgnFuAmbooLdKpyT3KCDfJxNweL3xBWVsRxLeHc97w@mail.gmail.com>
References: <mi79rj$vl8$1@ger.gmane.org>
 <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <mi8q15$bbp$1@ger.gmane.org>
 <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7eYJgnFuAmbooLdKpyT3KCDfJxNweL3xBWVsRxLeHc97w@mail.gmail.com>
Message-ID: <554AC2CE.5040705@btinternet.com>

This is no doubt *not* the best platform to raise these thoughts (which 
are nothing to do with Python - apologies), but I'm not sure where else 
to go.
I watch discussions like this ...
I watch posts like this one [Nick's] ...
...  And I despair.  I really despair.

I am a very experienced but old (some would say "dinosaur") programmer.
I appreciate the need for Unicode.  I really do.
I don't understand Unicode and all its complications AT ALL.
And I can't help wondering:
     Why, oh why, do things have to be SO FU*****G COMPLICATED?  This 
thread, for example, is way over my head.  And it is typical of many 
discussions I have stared at, uncomprehendingly.
Surely 65536 (2-byte) encodings are enough to express all characters in 
all the languages in the world, plus all the special characters we need.
Why can't there be just *ONE* universal encoding?  (Decided upon, no 
doubt, by some international standards committee. There would surely be 
enough spare codes for any special characters etc. that might come up in 
the foreseeable future.)

*Is it just historical accident* (partly due to an awkward move from 
1-byte ASCII to 2-byte Unicode, implemented in many different places, in 
many different ways) *that we now have a patchwork of encodings that we 
strive to fit into some over-complicated scheme*?
Or is there *really* some *fundamental reason* why things *can't* be 
simpler?  (Like, REALLY, _*REALLY*_ simple?)
Imagine if we were starting to design the 21st century from scratch, 
throwing away all the history. How would we go about it?
(Maybe I'm just naive, but sometimes ... Out of the mouths of babes and 
sucklings.)
Aaaargh!  Do I really have to learn all this mumbo-jumbo?!  (Forgive me. 
:-) )
I would be grateful for any enlightenment - thanks in advance.
Rob Cliffe


On 05/05/2015 20:21, Nick Coghlan wrote:
> On 5 May 2015 at 18:23, Stephen J. Turnbull <stephen at xemacs.org> wrote:
>> So this proposal merely amounts to reintroduction of the Python 2 str
>> confusion into Python 3.  It is dangerous *precisely because* the
>> current situation is so frustrating.  These functions will not be used
>> by "consenting adults", in most cases.  Those with sufficient
>> knowledge for "informed consent" also know enough to decode encoded
>> text ASAP, and encode internal text ALAP, with appropriate handlers,
>> in the first place.
>>
>> Rather, these str2str functions will be used by programmers at the
>> ends of their ropes desperate to suppress "those damned Unicode
>> errors" by any means available.  In fact, they are most likely to be
>> used and recommended by *library* writers, because they're the ones
>> who are least like to have control over input, or to know their
>> clients' requirements for output.  "Just use rehandle_* to ameliorate
>> the errors" is going to be far too tempting for them to resist.
> The primary intended audience is Linux distribution developers using
> Python 3 as the system Python. I agree misuse in other contexts is a
> risk, but consider assisting the migration of the Linux ecosystem from
> Python 2 to Python 3 sufficiently important that it's worth our while
> taking that risk.
>
>> That Nick, of all people, supports this proposal is to me just
>> confirmation that it's frustration, and only frustration, speaking
>> here.  He used to be one of the strongest supporters of keeping
>> "native text" (Unicode) and "encoded text" separate by keeping the
>> latter in bytes.
> It's not frustration (at least, I don't think it is), it's a proposal
> for advanced tooling to deal properly with legacy *nix systems that
> either:
>
> a. use a locale encoding other than UTF-8; or
> b. don't reliably set the locale encoding for system services and cron
> jobs (which anecdotally appears to amount to "aren't using systemd" in
> the current crop of *nix init systems)
>
> If a developer only cares about Windows, Mac OS X, or modern systemd
> based *nix systems that use UTF-8 as the system locale, and they never
> set "LANG=C" before running a Python program, then these new functions
> will be completely irrelevant to them. (I've also submitted a request
> to the glibc team to make C.UTF-8 universally available, reducing the
> need to use "LANG=C", and they're amenable to the idea, but it
> requires someone to work on preparing and submitting a patch:
> https://sourceware.org/bugzilla/show_bug.cgi?id=17318)
>
> If, however, a developer wants to handle "LANG=C", or other non-UTF-8
> locales reliably across the full spectrum of *nix systems in Python 3,
> they need a way to cope with system data that they *know* has been
> decoded incorrectly by the interpreter, as we'll potentially do
> exactly that for environment variables, command line arguments,
> stdin/stdout/stderr and more if we get bad locale encoding settings
> from the OS (such as when "LANG=C" is specified, or the init system
> simply doesn't set a locale at all and hence CPython falls back to the
> POSIX default of ASCII).
>
> Python 2 lets users sweep a lot of that under the rug, as the data at
> least round trips within the system, but you get unexpected mojibake
> in some cases (especially when taking local data and pushing it out
> over the network).
>
> Since these boundary decoding issues don't arise on properly
> configured modern *nix systems, we've been able to take advantage of
> that by moving Python 3 towards a more pragmatic and distro-friendly
> approach in coping with legacy *nix platforms and behaviours,
> primarily by starting to use "surrogateescape" by default on a few
> more system interfaces (e.g. on the standard streams when the OS
> *claims* that the locale encoding is ASCII, which we now assume to
> indicate a configuration error, which we can at least work around for
> roundtripping purposes so that "os.listdir()" works reliably at the
> interactive prompt).
>
> This change in approach (heavily influenced by the parallel "Python 3
> as the default system Python" efforts in Ubuntu and Fedora) *has*
> moved us back towards an increased risk of introducing mojibake in
> legacy environments, but the nature of that trade-off has changed
> markedly from the situation back in 2009 (let alone 2006):
>
> * most popular modern Linux systems use systemd with the UTF-8 locale,
> which "just works" from a boundary encoding/decoding perspective (it's
> closely akin to the situation we've had on Mac OS X from the dawn of
> Python 3)
> * even without systemd, most modern *nix systems at least default to
> the UTF-8 locale, which works reliably for user processes in the
> absence of an explicit setting like "LANG=C", even if service daemons
> and cron jobs can be a bit sketchier in terms of the locale settings
> they receive
> * for legacy environments migrating from Python 2 without upgrading
> the underlying OS, our emphasis has shifted to tolerating "bug
> compatibility" at the Python level in order to ease migration, as the
> most appropriate long term solution for those environments is now to
> upgrade their OS such that it more reliably provides correct locale
> encoding settings to the Python 3 interpreter (which wasn't a
> generally available option back when Python 3 first launched)
>
> Armin Ronacher (as ever) provides a good explanation of the system
> interface problems that can arise in Python 3 with bad locale encoding
> settings here: http://click.pocoo.org/4/python3/#python3-surrogates
>
> In my view, the critical helper function for this purpose is actually
> "handle_surrogateescape", as that's the one that lets us readily adapt
> from the incorrectly specified ASCII locale encoding to any other
> ASCII-compatible system encoding once we've bootstrapped into a full
> Python environment which has more options for figuring out a suitable
> encoding than just looking at the locale setting provided by the C
> runtime. It's also the function that serves to provide the primary
> "hook" where we can hang documentation of this platform specific
> boundary encoding/decoding issue.
>
> The other suggested functions are then more about providing a "peek
> behind the curtain" API for folks that want to *use Python* to explore
> some of the ins and outs of Unicode surrogate handling. Surrogates and
> astrals really aren't that complicated, but we've historically hidden
> them away as "dark magic not to be understood by mere mortals". In
> reality, they're just different ways of composing sequences of
> integers to represent text, and the suggested APIs are designed to
> expose that in a way we haven't done in the past. I can't actually
> think of a practical purpose for them other than teaching people the
> basics of how Unicode representations work, but demystifying that
> seems sufficiently worthwhile to me that I'm not opposed to their
> inclusion (bear in mind I'm also the current "dis" module maintainer,
> and a contributor to the "inspect", so I'm a big fan of exposing
> underlying concepts like this in a way that lets people play with them
> programmatically for learning purposes).
>
> Cheers,
> Nick.
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150507/e4ec41f8/attachment-0001.html>

From python at mrabarnett.plus.com  Thu May  7 04:15:20 2015
From: python at mrabarnett.plus.com (MRAB)
Date: Thu, 07 May 2015 03:15:20 +0100
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <554AC2CE.5040705@btinternet.com>
References: <mi79rj$vl8$1@ger.gmane.org>
 <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <mi8q15$bbp$1@ger.gmane.org>
 <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7eYJgnFuAmbooLdKpyT3KCDfJxNweL3xBWVsRxLeHc97w@mail.gmail.com>
 <554AC2CE.5040705@btinternet.com>
Message-ID: <554ACAB8.7010006@mrabarnett.plus.com>

On 2015-05-07 02:41, Rob Cliffe wrote:
> This is no doubt *not* the best platform to raise these thoughts (which
> are nothing to do with Python - apologies), but I'm not sure where else
> to go.
> I watch discussions like this ...
> I watch posts like this one [Nick's] ...
> ...  And I despair.  I really despair.
>
> I am a very experienced but old (some would say "dinosaur") programmer.
> I appreciate the need for Unicode.  I really do.
> I don't understand Unicode and all its complications AT ALL.
> And I can't help wondering:
>      Why, oh why, do things have to be SO FU*****G COMPLICATED?  This
> thread, for example, is way over my head.  And it is typical of many
> discussions I have stared at, uncomprehendingly.
> Surely 65536 (2-byte) encodings are enough to express all characters in
> all the languages in the world, plus all the special characters we need.
> Why can't there be just *ONE* universal encoding?  (Decided upon, no
> doubt, by some international standards committee. There would surely be
> enough spare codes for any special characters etc. that might come up in
> the foreseeable future.)
>
> *Is it just historical accident* (partly due to an awkward move from
> 1-byte ASCII to 2-byte Unicode, implemented in many different places, in
> many different ways) *that we now have a patchwork of encodings that we
> strive to fit into some over-complicated scheme*?
> Or is there *really* some *fundamental reason* why things *can't* be
> simpler?  (Like, REALLY, _*REALLY*_ simple?)
> Imagine if we were starting to design the 21st century from scratch,
> throwing away all the history?  How would we go about it?
> (Maybe I'm just naive, but sometimes ... Out of the mouths of babes and
> sucklings.)
> Aaaargh!  Do I really have to learn all this mumbo-jumbo?!  (Forgive me.
> :-) )
> I would be grateful for any enlightenment - thanks in advance.
> Rob Cliffe
>
When Unicode first came out, they thought that 65536 would be enough.
When Java was released, for example, it used 16 bits per codepoint.
Simple.

But it turned out that it wasn't enough. People have been too inventive
over thousands of years!

There's the matter of accents and other diacritics. Some languages want
to add marks to the letters to indicate a different pronunciation,
stress, tone, whatever (a character might need more than one!). Having
a separate code for each combination would lead to a _lot_ of codes,
so a better solution is to add codes that can combine with the base
character when displayed.
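In Python 3, the split between precomposed characters and combining marks can be seen directly with the unicodedata module (a small self-contained illustration, not specific to anything in this thread):

```python
import unicodedata

composed = '\u00e9'      # U+00E9, LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = 'e\u0301'   # 'e' followed by U+0301, COMBINING ACUTE ACCENT

# Two different code point sequences, one displayed character:
assert composed != decomposed
assert len(composed) == 1 and len(decomposed) == 2

# Normalization converts between the two forms:
assert unicodedata.normalize('NFD', composed) == decomposed
assert unicodedata.normalize('NFC', decomposed) == composed

print(unicodedata.name('\u0301'))  # COMBINING ACUTE ACCENT
```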

And then there's the matter of writing direction. Some languages go 
left-to-right, others right-to-left.

So, you think it's complicated? Don't blame Unicode, it's just trying
to cope with a very messy problem.


From mistersheik at gmail.com  Thu May  7 04:05:15 2015
From: mistersheik at gmail.com (Neil Girdhar)
Date: Wed, 6 May 2015 19:05:15 -0700 (PDT)
Subject: [Python-ideas] Why don't CPython strings implement slicing using a
	view?
Message-ID: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com>

Since strings are constant, wouldn't it be much faster to implement string 
slices as a view of other strings?

For clarity, I'm talking about CPython.  I'm not talking about anything the 
user sees.  The string views would still look like regular str instances to 
the user.

From ncoghlan at gmail.com  Thu May  7 05:56:21 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 7 May 2015 13:56:21 +1000
Subject: [Python-ideas] Add `Executor.filter`
In-Reply-To: <CANXboVagtQcE_NMjYMeHEO_xazBigRHPT9Uo1NNQUUybMcJFgg@mail.gmail.com>
References: <CANXboVbmZNDUp8PCqDwh_DpWrr-zAgt2SWA15hHMovY+rRRxoQ@mail.gmail.com>
 <CAP7+vJL9aosyOkVHVc8PoJQJNbvDnfhyk-enAj2KsPAXpHOfng@mail.gmail.com>
 <CANXboVYd64d=pmOX7q6x0pBvSSmh-jLq5hymZ5C6DY-jEi99tg@mail.gmail.com>
 <B48915FA-76CC-46FE-BC4A-05B56D15CF43@yahoo.com>
 <CANXboVagtQcE_NMjYMeHEO_xazBigRHPT9Uo1NNQUUybMcJFgg@mail.gmail.com>
Message-ID: <CADiSq7dDJ_wrcWLO3w8b3TSS4x65aZ1fW4FGVR-7A6eHcz1uZg@mail.gmail.com>

On 2 May 2015 at 19:25, Ram Rachum <ram at rachum.com> wrote:
> Okay, I implemented it. Might be getting something wrong because I've never
> worked with the internals of this module before.

I think this is sufficiently tricky to get right that it's worth
adding filter() as a parallel to the existing map() API.

However, it did raise a separate question for me: is it currently
possible to use Executor.map() and the as_completed() module level
function together? Unless I'm missing something, it doesn't look like
it, as map() hides the futures from the caller, so you only have
something to pass to as_completed() if you invoke submit() directly.
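For reference, the submit()-based pattern that does work with as_completed() looks roughly like this (square() and the worker count are just illustrative):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def square(n):
    return n * n

with ThreadPoolExecutor(max_workers=4) as pool:
    # submit() hands back the futures that map() would hide:
    futures = {pool.submit(square, n): n for n in range(5)}
    results = {}
    for future in as_completed(futures):  # yields futures in completion order
        results[futures[future]] = future.result()

assert results == {n: n * n for n in range(5)}
```

With map() there is no equivalent, since the futures never escape the method.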

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Thu May  7 06:07:17 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 7 May 2015 14:07:17 +1000
Subject: [Python-ideas] Policy for altering sys.path
In-Reply-To: <CAPTjJmoRK472eokBGqsB-dmQ7boS1SoAtx7+UOYnEnt_enCQMQ@mail.gmail.com>
References: <554A1F8C.1040005@thomas-guettler.de>
 <CAPTjJmoRK472eokBGqsB-dmQ7boS1SoAtx7+UOYnEnt_enCQMQ@mail.gmail.com>
Message-ID: <CADiSq7cM0nmy_VQ5VkXWE7BUcdxuet+wc64R+-M5bdYyPh+Xsg@mail.gmail.com>

On 7 May 2015 at 01:11, Chris Angelico <rosuav at gmail.com> wrote:
> On Thu, May 7, 2015 at 12:05 AM, Thomas Güttler
> <guettliml at thomas-guettler.de> wrote:
>> We run a custom sub class of list in sys.path. We set it in sitecustomize.py
>>
>> This instance gets replaced by a common list in lines like this:
>>
>> sys.path = glob.glob(os.path.join(WHEEL_DIR, "*.whl")) + sys.path
>
> Forgive the obtuse question, but wouldn't an __radd__ method resolve
> this for you?

If the custom subclass is implemented in Python or otherwise
implements the C level nb_add slot, yes; if it's implemented in C and
only provides sq_concat without nb_add, no (courtesy of
http://bugs.python.org/issue11477, which gets the operand precedence
dance wrong for sequence types that only implement the sequence
methods and not the corresponding numeric ones).
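At the Python level, a sketch of a list subclass that keeps control of the result type via __radd__ (PathList is a made-up name, not the class from the original report):

```python
class PathList(list):
    """Hypothetical list subclass that survives `plain_list + path_list`."""

    def __add__(self, other):
        result = PathList(self)
        result.extend(other)
        return result

    def __radd__(self, other):
        # Tried before list.__add__ for `plain_list + path_list`, because
        # PathList is a subclass of list that overrides the reflected method.
        result = PathList(other)
        result.extend(self)
        return result

paths = PathList(['/srv/venv/site-packages'])
combined = ['bundled.whl'] + paths
assert isinstance(combined, PathList)
assert list(combined) == ['bundled.whl', '/srv/venv/site-packages']
```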

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From rosuav at gmail.com  Thu May  7 06:13:05 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Thu, 7 May 2015 14:13:05 +1000
Subject: [Python-ideas] Policy for altering sys.path
In-Reply-To: <CADiSq7cM0nmy_VQ5VkXWE7BUcdxuet+wc64R+-M5bdYyPh+Xsg@mail.gmail.com>
References: <554A1F8C.1040005@thomas-guettler.de>
 <CAPTjJmoRK472eokBGqsB-dmQ7boS1SoAtx7+UOYnEnt_enCQMQ@mail.gmail.com>
 <CADiSq7cM0nmy_VQ5VkXWE7BUcdxuet+wc64R+-M5bdYyPh+Xsg@mail.gmail.com>
Message-ID: <CAPTjJmptjaQUkk2g478vy5OFi0ciqyR-eQJXAz7cYK4iwYBo4Q@mail.gmail.com>

On Thu, May 7, 2015 at 2:07 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On 7 May 2015 at 01:11, Chris Angelico <rosuav at gmail.com> wrote:
>> On Thu, May 7, 2015 at 12:05 AM, Thomas Güttler
>> <guettliml at thomas-guettler.de> wrote:
>>> We run a custom sub class of list in sys.path. We set it in sitecustomize.py
>>>
>>> This instance gets replaced by a common list in lines like this:
>>>
>>> sys.path = glob.glob(os.path.join(WHEEL_DIR, "*.whl")) + sys.path
>>
>> Forgive the obtuse question, but wouldn't an __radd__ method resolve
>> this for you?
>
> If the custom subclass is implemented in Python or otherwise
> implements the C level nb_add slot, yes; if it's implemented in C and
> only provides sq_concat without nb_add, no (courtesy of
> http://bugs.python.org/issue11477, which gets the operand precedence
> dance wrong for sequence types that only implement the sequence
> methods and not the corresponding numeric ones).

Okay, so it mightn't be quite as simple as I thought, but it should
still be in the control of the author of the subclass, right? That
ought to be easier than trying to stop everyone else from mutating
sys.path.

ChrisA

From ncoghlan at gmail.com  Thu May  7 06:22:55 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 7 May 2015 14:22:55 +1000
Subject: [Python-ideas] discouraging direct use of the C-API
In-Reply-To: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
References: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
Message-ID: <CADiSq7e41R-Tu8e2e=AF1Qt2UaH8H2_icedksCJLtLYsrK_qSA@mail.gmail.com>

On 7 May 2015 at 02:23, Eric Snow <ericsnowcurrently at gmail.com> wrote:
> A big blocker to making certain sweeping changes to CPython (e.g.
> ref-counting) is compatibility with the vast body of C extension
> modules out there that use the C-API.  While there are certainly
> drastic long-term solutions to that problem, there is one thing we can
> do in the short-term that would at least get the ball rolling.  We can
> put a big red note at the top of every page of the C-API docs that
> encourages folks to either use CFFI or Cython.
>
> Thoughts?

Rather than embedding these recommendations directly in the version
specific CPython docs, I'd prefer to see contributions to fill in the
incomplete sections in
https://packaging.python.org/en/latest/extensions.html with links back
to the relevant parts of the C API documentation and docs for other
projects (I was able to write the current overview section on that
page in a few hours, as I didn't need to do much research for that,
but filling in the other sections properly involves significantly more
work).

That page is already linked from the landing page for the extending &
embedding documentation as part of a recommendation to consider the
use of third party tools rather than handcrafting your own extension
modules: https://docs.python.org/3/extending/index.html#recommended-third-party-tools

The landing page for the C API docs links back to the extending &
embedding guide, but the link is embedded in the header paragraph
rather than being a See Also link:
https://docs.python.org/3/c-api/index.html

Cheers,
Nick.

>
> -eric
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/



-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Thu May  7 07:27:14 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 7 May 2015 15:27:14 +1000
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <554AC2CE.5040705@btinternet.com>
References: <mi79rj$vl8$1@ger.gmane.org>
 <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp>
 <mi8q15$bbp$1@ger.gmane.org>
 <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7eYJgnFuAmbooLdKpyT3KCDfJxNweL3xBWVsRxLeHc97w@mail.gmail.com>
 <554AC2CE.5040705@btinternet.com>
Message-ID: <CADiSq7deKc6k6f3OXGOjV98AhXumicn10NjXzWzuXq8mbTQy8w@mail.gmail.com>

On 7 May 2015 at 11:41, Rob Cliffe <rob.cliffe at btinternet.com> wrote:
> Or is there really some fundamental reason why things can't be simpler?
> (Like, REALLY, REALLY simple?)

Yep, there are around 7 billion fundamental reasons currently alive,
and I have no idea how many have gone before us: humans :)

Unicode is currently messy and complicated because human written
communication is messy and complicated, and that inherent complexity
didn't go anywhere once we started networking our computers together
and digitising our historical records.

Early versions of Unicode attempted to simplify things by only
considering dictionary words in major living languages (which got them
under 65k characters), but folks in Asia and elsewhere were
understandably upset when the designers attempted to explain why it
was OK for a "universal" encoding to not be able to correctly
represent the names of people and places, while archivists and
historical researchers were similarly unimpressed when the designers
tried to explain why their "universal" encoding didn't adequately
cover texts that were more than a few decades old. Breaking down the
walls between historically silo'ed communications networks then made
things even more complicated, as historical proprietary encodings from
different telco networks needed to be mapped to the global standard
(this last process is a large part of where the assortment of emoji
characters in Unicode comes from).

However, most of the messiness and complexity in the digital realm
actually arises at the boundary between Unicode and *other encodings*.
That's why the fact that POSIX still uses ASCII as the default
encoding is such a pain, and why Apple instead unilaterally declared
that "everything shall be UTF-8" for Mac OS X, while Microsoft and
Java eventually settled on new UTF-16 APIs. We can't even assume ASCII
compatibility in general, as codecs like Shift-JIS, ISO-2022 and
various other East Asian codecs date from an era where international
network connectivity simply wasn't a problem encoding designers needed
to worry about, so solving *local* computing problems was a much
larger concern than compatibility with DARPA's then nascent internet
protocols.

I wrote an article attempting to summarise some of that history last
year: http://developerblog.redhat.com/2014/09/09/transition-to-multilingual-programming-python/

And gave a presentation about it at Australia's OSDC 2014 that
connected some of the dots even further back in history:
https://www.youtube.com/watch?v=xOadSc69Hrw (I also just noticed my
notes for the latter aren't currently online, which is an oversight
I'll aim to fix before too long).

As things stand, one suggestion I make to folks truly trying to
understand why we need Unicode (with all its complexity), is to
attempt to learn a foreign language that *doesn't use a latin based
script*. My own Japanese is atrociously bad, but it's good enough that
I can appreciate just how Anglo-centric most programming languages
(including Python) are. I'm also fully cognizant of the fact that as
bad as my written and spoken Japanese are, my ability to enter
Japanese text into a computer is entirely non-existent.

> Imagine if we were starting to design the 21st century from scratch,
> throwing away all the history?  How would we go about it?

We'd invite Japanese, Chinese, Indian, African, etc developers to get
involved in the design process much earlier than we did. Ideally back
when the Western Union telegraph was first being designed, as the
consequences of some of those original binary encoding design choices
are still felt today :)

http://utf8everywhere.org/ makes the case that the closest we have to
that today is UTF-8 + streaming compression, and it's a fairly
compelling story. However, it's premised on a world where string
processing algorithms are all written to be UTF-8 aware, when a lot of
them, including those used in the Python standard library, were in
fact written assuming fixed width encodings. Hence the Python 3.3
flexible string representation model, where string internal storage is
sized according to the largest code point, and you need to use
StringIO if you want to avoid having a single higher plane code point
significantly increase the memory consumption of your string.
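The width-sensitive storage can be observed directly with sys.getsizeof() in CPython 3.3+ (the exact sizes are implementation details, so only the ordering is asserted here):

```python
import sys

ascii_text = 'a' * 100           # stored at 1 byte per code point
bmp_text = '\u0101' * 100        # stored at 2 bytes per code point
astral_text = '\U0001F600' * 100 # stored at 4 bytes per code point

assert sys.getsizeof(bmp_text) > sys.getsizeof(ascii_text)
assert sys.getsizeof(astral_text) > sys.getsizeof(bmp_text)

# A single higher plane code point widens the whole string:
mixed = ('a' * 99) + '\U0001F600'
assert sys.getsizeof(mixed) > sys.getsizeof('a' * 100)
```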

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From benjamin at python.org  Thu May  7 07:37:33 2015
From: benjamin at python.org (Benjamin Peterson)
Date: Thu, 7 May 2015 05:37:33 +0000 (UTC)
Subject: [Python-ideas] Why don't CPython strings implement slicing using a
	view?
References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com>
Message-ID: <loom.20150507T073654-135@post.gmane.org>

Neil Girdhar <mistersheik at ...> writes:

> 
> Since strings are constant, wouldn't it be much faster to implement string
slices as a view of other strings?

Maybe for some workloads, but you can end up keeping a large string alive
and taking up memory with such an approach.


From ncoghlan at gmail.com  Thu May  7 07:55:07 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 7 May 2015 15:55:07 +1000
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <CADiSq7deKc6k6f3OXGOjV98AhXumicn10NjXzWzuXq8mbTQy8w@mail.gmail.com>
References: <mi79rj$vl8$1@ger.gmane.org>
 <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp>
 <mi8q15$bbp$1@ger.gmane.org>
 <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7eYJgnFuAmbooLdKpyT3KCDfJxNweL3xBWVsRxLeHc97w@mail.gmail.com>
 <554AC2CE.5040705@btinternet.com>
 <CADiSq7deKc6k6f3OXGOjV98AhXumicn10NjXzWzuXq8mbTQy8w@mail.gmail.com>
Message-ID: <CADiSq7etv_xO4Qm8014C4yquV8yrWOGMoCfws+toPLRiVAtM4A@mail.gmail.com>

On 7 May 2015 at 15:27, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On 7 May 2015 at 11:41, Rob Cliffe <rob.cliffe at btinternet.com> wrote:
>> Or is there really some fundamental reason why things can't be simpler?
>> (Like, REALLY, REALLY simple?)
>
> Yep, there are around 7 billion fundamental reasons currently alive,
> and I have no idea how many have gone before us: humans :)

Heh, a message from Stephen off-list made me realise that an info dump
of all the reasons the edge cases are hard probably wasn't a good way
to answer your question :)

What "we're" working towards (where "we" ~= the Unicode consortium +
operating system designers + programming language designers) is a
world where everything "just works", and computers talk to humans in
each human's preferred language (or a collection of languages,
depending on what the human is doing), and to each other in Unicode.
There are then a whole host of technical and political reasons why
it's taking decades to get from the historical point A (where
computers talk to humans in at most one language at a time, and don't
talk to each other at all) to that desired point B.

We'll know we're done with that transition when Unicode becomes almost
transparently invisible, and the vast majority of programmers are once
again able to just deal with "text" without worrying too much about
how it's represented internally (but also having their programs be
readily usable in languages other than their own).

Python 3 is already a lot closer to that ideal than Python 2 was, but
there are still some rough edges to iron out. The ones I'm personally
aware of affecting 3.4+ (including the one Serhiy started this thread
about) are listed as dependencies of http://bugs.python.org/issue22555

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From guettliml at thomas-guettler.de  Thu May  7 08:00:09 2015
From: guettliml at thomas-guettler.de (Thomas Güttler)
Date: Thu, 07 May 2015 08:00:09 +0200
Subject: [Python-ideas] Policy for altering sys.path
In-Reply-To: <CACac1F82g4tVxXeuYvtf1PQWidHpc+k-c7POxx8kLzmboW+jbw@mail.gmail.com>
References: <554A1F8C.1040005@thomas-guettler.de>
 <CACac1F82g4tVxXeuYvtf1PQWidHpc+k-c7POxx8kLzmboW+jbw@mail.gmail.com>
Message-ID: <554AFF69.9050404@thomas-guettler.de>

Am 06.05.2015 um 17:07 schrieb Paul Moore:
> On 6 May 2015 at 15:05, Thomas Güttler <guettliml at thomas-guettler.de> wrote:
>> I am missing a policy how sys.path should be altered.
> 
> Well, the docs say that applications can modify sys.path as needed.
> Generally, applications modify sys.path in place via sys.path[:] =
> whatever, but that's not mandated as far as I know.
> 
>> We run a custom sub class of list in sys.path. We set it in sitecustomize.py
> 
> Can you explain why? 

I forgot to explain why I use a custom class. Sorry, here is the background.

I want sys.path to be ordered:

 1. virtualenv
 2. /usr/local/
 3. /usr/lib

We use virtualenvs with system site-packages.

There are many places where sys.path gets altered.

The last time we had sys.path problems I tried to write a test
which checks that sys.path is the same for cron jobs and web requests.
I failed: there were too many places, and I could not find all the places
and conditions where sys.path got modified in different ways.

> It seems pretty risky to expect that no
> applications will replace sys.path. I understand that you're proposing
> that we say that applications shouldn't do that - but just saying so
> won't change the many applications already out there.

Of course I know that if we agree on a policy, it won't change existing code
overnight. But if there is an official policy, you can write bug reports
like this: "Please alter sys.path according to the docs. See http://www.python.org/...."

The next thing: if someone wants to add to sys.path, most of the
time the developer inserts the new entries at the front of the list.

This can break the ordering if you don't use a custom list class.
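A heavily simplified sketch of what such an ordering-aware list class might look like (the prefixes and the ranking rule below are invented for illustration; this is not the class from the original report):

```python
class OrderedSysPath(list):
    """Hypothetical list subclass that keeps sys.path entries grouped as
    virtualenv first, then /usr/local, then /usr/lib."""

    _PREFIX_ORDER = ('/srv/venv', '/usr/local', '/usr/lib')  # made-up prefixes

    @classmethod
    def _rank(cls, entry):
        for i, prefix in enumerate(cls._PREFIX_ORDER):
            if entry.startswith(prefix):
                return i
        return -1  # unknown entries (e.g. wheel paths) sort to the front

    def _resort(self):
        # sort() is stable, so relative order within each tier is preserved
        self.sort(key=self._rank)

    def insert(self, index, entry):
        list.insert(self, index, entry)  # explicit base call, works on 2.7 too
        self._resort()

    def append(self, entry):
        list.append(self, entry)
        self._resort()

sp = OrderedSysPath(['/srv/venv/lib', '/usr/local/lib', '/usr/lib/python2.7'])
sp.insert(0, '/usr/lib/extra')  # a naive front-insert...
# ...still ends up behind the virtualenv and /usr/local entries:
assert sp == ['/srv/venv/lib', '/usr/local/lib', '/usr/lib/extra',
              '/usr/lib/python2.7']
```

Note that this only helps callers that mutate the list in place; `sys.path = new_entries + sys.path` still replaces the whole object with a plain list, which is exactly the problem under discussion.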




>> This instance get replace by a common list in lines like this:
>>
>> sys.path = glob.glob(os.path.join(WHEEL_DIR, "*.whl")) + sys.path
>>
>> The above line is from pip; similar things happen in a lot of packages.
> 
> How does the fact that pip does that cause a problem? The sys.path
> modification is only in effect while pip is running, and no code in
> pip relies on sys.path being an instance of your custom class.

pip is a special case, since the pip authors say "we don't provide an API".
But they have handy methods which we want to use. We use "import pip",
and the class of our application's sys.path gets altered.


>> Before trying to solve this with code, I think the python community should
>> agree an a policy for altering sys.path.
> 
> I can't imagine that happening, and even if it does, it won't make any
> difference because a new policy won't change existing code. It won't
> even affect new code unless people know about it (which isn't certain
> - I doubt many people read the documentation that closely).

Code updates will happen step by step.
If someone has a problem because their custom list class in sys.path gets
altered, they will write a bug report to the maintainer. A bug report
referencing official Python docs carries more weight.

>> What can I do to this done?
> 
> I doubt you can.
> 
> A PR for pip that changes the above line to modify sys.path in place
> would probably get accepted (I can't see any reason why it wouldn't),
> and I guess you could do the same for any other code you find. But as
> for persuading the Python programming community not to replace
> sys.path in any code, that seems unlikely to happen.
> 
>> We use Python 2.7
> 
> If you were using 3.x, then it's (barely) conceivable that making
> sys.path read-only (so people could only modify it in-place) could be
> done as a new feature, but (a) it would be a major backward
> compatibility break, so there would have to be a strong justification,
> and (b) it would stop you from replacing sys.path with your custom
> class in the first place, so it wouldn't solve your issue.
> 
> Which also raises the question, why do you believe it's OK to forbid
> other people to replace sys.path when that's what you're doing in your
> sitecustomize code? That seems self-contradictory...

Yes, you are right, this looks self-contradictory.
But I am the one who is responsible for setting up the environment.

Where is the best place during interpreter initialization for
altering the class of sys.path? I guess it is sitecustomize. After
it has executed, sys.path should only be altered in place.

Regards,
  Thomas G?ttler


-- 
http://www.thomas-guettler.de/

From ram at rachum.com  Thu May  7 08:02:25 2015
From: ram at rachum.com (Ram Rachum)
Date: Thu, 7 May 2015 09:02:25 +0300
Subject: [Python-ideas] Add `Executor.filter`
In-Reply-To: <CADiSq7dDJ_wrcWLO3w8b3TSS4x65aZ1fW4FGVR-7A6eHcz1uZg@mail.gmail.com>
References: <CANXboVbmZNDUp8PCqDwh_DpWrr-zAgt2SWA15hHMovY+rRRxoQ@mail.gmail.com>
 <CAP7+vJL9aosyOkVHVc8PoJQJNbvDnfhyk-enAj2KsPAXpHOfng@mail.gmail.com>
 <CANXboVYd64d=pmOX7q6x0pBvSSmh-jLq5hymZ5C6DY-jEi99tg@mail.gmail.com>
 <B48915FA-76CC-46FE-BC4A-05B56D15CF43@yahoo.com>
 <CANXboVagtQcE_NMjYMeHEO_xazBigRHPT9Uo1NNQUUybMcJFgg@mail.gmail.com>
 <CADiSq7dDJ_wrcWLO3w8b3TSS4x65aZ1fW4FGVR-7A6eHcz1uZg@mail.gmail.com>
Message-ID: <CANXboVZ4QSSo75iYz+JVhZForjkrsw+2t93uCUd9rA2iQkfDcQ@mail.gmail.com>

Funny, I suggested these 2 in the past:
https://groups.google.com/forum/m/#!searchin/python-ideas/map_as_completed/python-ideas/VZBdUbYcQjg

https://groups.google.com/forum/m/#!searchin/python-ideas/as_completed/python-ideas/yGADxChihhk

Sent from my phone.
On 2 May 2015 at 19:25, Ram Rachum <ram at rachum.com> wrote:
> Okay, I implemented it. Might be getting something wrong because I've
never
> worked with the internals of this module before.

I think this is sufficiently tricky to get right that it's worth
adding filter() as a parallel to the existing map() API.

However, it did raise a separate question for me: is it currently
possible to use Executor.map() and the as_completed() module level
function together? Unless I'm missing something, it doesn't look like
it, as map() hides the futures from the caller, so you only have
something to pass to as_completed() if you invoke submit() directly.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From tjreedy at udel.edu  Thu May  7 08:10:46 2015
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 07 May 2015 02:10:46 -0400
Subject: [Python-ideas] Why don't CPython strings implement slicing
	using a view?
In-Reply-To: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com>
References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com>
Message-ID: <mievl9$s9e$1@ger.gmane.org>

On 5/6/2015 10:05 PM, Neil Girdhar wrote:
> Since strings are constant, wouldn't it be much faster to implement
> string slices as a view of other strings?
>
> For clarity, I'm talking about CPython.  I'm not talking about anything
> the user sees.  The string views would still look like regular str
> instances to the user.

The idea has been discussed and rejected.  See the python-dev thread 'The
"lazy strings" patch', Oct 2006, for one example.

I think the best solution is a separate Seqview class.  On the thread 
above, Josiah Carlson pointed out that he had made such a class that 
worked with multiple Python versions *and* with any sequence class.  The 
only computation involved is addition of start values to indexes when 
accessing the underlying object, and that is not specific to strings. 
There might be something on PyPI already, but PyPI cannot search for 
compounds such as "string view" (or "lazy string").

The three dict view classes are, obviously, separate classes.  They 
happen to be created with dict methods.  But that is partly for 
historical reasons -- the methods already existed but returned lists in 
2.x.  The views were only a change in the output class (and the removal 
of arbitrary order).  The API could have been dict_keys(somedict), with 
'dict_keys' a builtin name.  So there is nothing actually wrong with 
Seqview(seq, start, stop, step=1).
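A minimal sketch of such a generic Seqview class (not Josiah Carlson's implementation, just an illustration of the offset-only idea):

```python
class Seqview(object):
    """Hypothetical read-only view of a slice of any sequence.

    Only maps view indexes onto the underlying object, so no data is
    copied -- but the whole underlying sequence is kept alive."""

    def __init__(self, seq, start=0, stop=None, step=1):
        self._seq = seq
        self._range = range(start, len(seq) if stop is None else stop, step)

    def __len__(self):
        return len(self._range)

    def __getitem__(self, index):
        if isinstance(index, slice):
            r = self._range[index]  # slicing a range is cheap and exact
            return Seqview(self._seq, r.start, r.stop, r.step)
        return self._seq[self._range[index]]

    def __iter__(self):
        return (self._seq[i] for i in self._range)

view = Seqview('hello world', 6, 11)
assert len(view) == 5
assert view[0] == 'w'
assert ''.join(view) == 'world'
assert ''.join(view[1:3]) == 'or'
```

As Benjamin points out earlier in the thread, keeping the underlying object alive is the usual trade-off for this kind of view.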

-- 
Terry Jan Reedy


From ncoghlan at gmail.com  Thu May  7 08:22:15 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 7 May 2015 16:22:15 +1000
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <4D8FF17C-1D0B-42C8-A55F-0479A652321F@yahoo.com>
References: <mi79rj$vl8$1@ger.gmane.org>
 <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp>
 <mi8q15$bbp$1@ger.gmane.org>
 <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7eYJgnFuAmbooLdKpyT3KCDfJxNweL3xBWVsRxLeHc97w@mail.gmail.com>
 <4D8FF17C-1D0B-42C8-A55F-0479A652321F@yahoo.com>
Message-ID: <CADiSq7ecOUCpb1r8HRZOR5KmeuNyBCT-H+aV8kvudLeM8fYsVg@mail.gmail.com>

On 6 May 2015 at 14:00, Andrew Barnert <abarnert at yahoo.com> wrote:
> It seems like launchd systems are as good as systemd systems here. Or are you not considering OS X a *nix?
>
> I suppose given than the timeline for Apple to switch to Python 3 as the default Python is "maybe it'll happen, but we'll never tell you until a month before the public beta", it isn't really all that relevant...

We don't look at the locale encoding at all when it comes to system
interfaces on Mac OS X - CPython is hardcoded to use UTF-8 instead.
While Apple's tight control over their ecosystem alienates me as a
consumer, it certainly has its advantages as a developer :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Thu May  7 08:47:19 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 7 May 2015 16:47:19 +1000
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <87siba3zrf.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <mi79rj$vl8$1@ger.gmane.org>
 <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp>
 <mi8q15$bbp$1@ger.gmane.org>
 <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7eYJgnFuAmbooLdKpyT3KCDfJxNweL3xBWVsRxLeHc97w@mail.gmail.com>
 <87siba3zrf.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <CADiSq7c2_Eqzy0r_tokEvGdVj+7a1THKZ0w+7MPyQUji2KusvQ@mail.gmail.com>

On 6 May 2015 at 17:56, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Nick Coghlan writes:
>  > The other suggested functions are then more about providing a "peek
>  > behind the curtain" API for folks that want to *use Python* to explore
>  > some of the ins and outs of Unicode surrogate handling.
>
> I just don't see a need.  .encode and .decode already give you all the
> tools you need for exploring, and they do so in a way that tells you
> via the type whether you're looking at abstract text or at the
> representation.  It doesn't get better than this!
>
> And if the APIs merely exposed the internal representation that would
> be one thing.  But they don't, and the people who are saying, "I'm not
> an expert on Unicode but this looks great!" are clearly interested in
> mutating str instances to be something more palatable to the requisite
> modules and I/O systems they need to use, but which aren't prepared for
> astral characters or proper handling of surrogateescapes.
>
>  > I can't actually think of a practical purpose for them other than
>  > teaching people the basics of how Unicode representations work,
>
> I agree, but it seems to me that a lot of people are already scheming
> to use them for practical purposes.  Serhiy mentions tkinter, email,
> and wsgiref, and David lusts after them for email.

While I personally care about the OS boundary case, that's not the
only "the metadata cannot be fully trusted" case that comes up (and
yes, I know I'm contradicting what I posted yesterday - I hadn't
reread the issue tracker thread at that point, so I'd forgotten the
cases the others had mentioned, and hadn't even fully reloaded my own
rationale for wanting the feature back into my brain).

The key operation to be supported by the proposed APIs is to allow a
piece of code to interrogate a string object to ask: "Was this string
permissively decoded *and* did that process leave some invalid code
points in the string?".

Essentially, it's designed to cover the cases where the interpreter
(or someone else) is using the "surrogateescape" or "surrogatepass"
error handler when decoding some input data to text (I don't believe
the interpreter defaults to using surrogatepass anywhere, but we do
use surrogateescape in several places).

If your code has direct control over the decoding step, you don't need
anything new to deal with this appropriately, as you can just change
the error handling mode to "strict" and be done with it.
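
For concreteness, here is what the two choices look like at the decoding boundary (a quick illustration; the byte values are made up for the example):

```python
# Permissive decoding smuggles undecodable bytes into the str as lone
# surrogates in the U+DC80-U+DCFF range instead of raising:
raw = b'caf\xe9'                            # Latin-1 bytes, invalid as UTF-8
s = raw.decode('utf-8', 'surrogateescape')
print(ascii(s))                             # 'caf\udce9'

# With the "strict" handler (the default), the error surfaces immediately:
try:
    raw.decode('utf-8')
except UnicodeDecodeError as exc:
    print('strict decoding failed:', exc.reason)

# The smuggled code point round-trips back to the original byte:
assert s.encode('utf-8', 'surrogateescape') == raw
```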

However, if you *don't* have control over the decoding step, then a)
you can't switch the decoding step to a different error handler (as
that's not happening in your code); and b) you don't necessarily know
what the assumed encoding was, so your best guess is going to be
"hopefully something ASCII compatible", which is going to introduce
all kinds of other complexity as you have to start considering what
happens for code points outside the surrogate area if you do an
encode()/decode() dance in order to apply a different error handler to
the smuggled surrogates.

Hence the rehandle_surrogatepass() and rehandle_surrogateescape()
methods: by default, they will both *throw an exception* if there is
improperly decoded data in the input, as they apply the "strict" input
error handler instead of whichever one was actually used. This lets
you control where such errors are detected (e.g. at the point where
the string is first given to your code), rather than having it happen
implicitly later when you attempt to encode those strings to bytes.

rehandle_surrogateescape() also has the virtue of scanning the
supplied string for *other* lone surrogates (created via
surrogatepass) and *always* complaining about them (again, at a point
you choose, rather than happening unexpectedly elsewhere in the code,
often as part of an IO operation).

The "errors" argument is then designed to let you apply an arbitrary
*input* error handler to surrogates that were originally let through
by "surrogatepass" or "surrogateescape" (again, the assumption here is
that you don't control the code that did the original decoding). If
you decide to throw that improperly decoded data away entirely, you
may use "replace" or "ignore" to clean it out. Alternatively, you may
use "backslashreplace" (which is now usable on decoding as well as on
encoding) to replace the unknown bytes with their hexadecimal
representation.

Regardless of which specific approach you take, handling surrogates
explicitly when a string is passed to you from an API that uses
permissive decoding lets you avoid both unexpected UnicodeEncodeError
exceptions (if the surrogates end up being encoded with an error
handler other than surrogatepass or surrogateescape) or propagating
mojibake (if the surrogates are encoded with a suitable error handler,
but an encoding that differs from the original).
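
The proposed behaviour can be roughly approximated today with exactly the encode()/decode() dance described above (with its attendant complexity). The following is a sketch only: the function name mirrors the proposal, but no such method exists in any released Python, and it assumes the original decoding used an ASCII-compatible codec.

```python
def rehandle_surrogateescape(s, errors='strict'):
    """Rough pure-Python approximation of the *proposed* method.

    Caveat (part of the complexity noted above): smuggled byte runs
    that happen to form valid UTF-8 are silently decoded rather than
    reported, which the real proposal would avoid.
    """
    # Encoding with 'surrogateescape' turns the smuggled code points back
    # into the original undecodable bytes; any *other* lone surrogate
    # (e.g. one let through by 'surrogatepass') makes this step raise,
    # loosely mirroring the proposal's "always complain" behaviour.
    return s.encode('utf-8', 'surrogateescape').decode('utf-8', errors)
```

With errors='strict' this moves the failure to a point you choose; with 'backslashreplace' the unknown bytes come out as '\xNN' escapes instead.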

As far as "handle_astrals()" and friends go, I previously suggested on
the issue that they could potentially be considered as a separate RFE,
as their practical applicability is likely to be limited to cases
where you need to deal with a UCS-2 (note: *not* UTF-16) API for some
reason. I think they highlight an interesting aspect of what
surrogate and astral code points *are*, but they don't have the same
input validation use case that rehandle_surrogatepass and
rehandle_surrogateescape do.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From me at the-compiler.org  Thu May  7 08:48:36 2015
From: me at the-compiler.org (Florian Bruhin)
Date: Thu, 7 May 2015 08:48:36 +0200
Subject: [Python-ideas] Policy for altering sys.path
In-Reply-To: <554AFF69.9050404@thomas-guettler.de>
References: <554A1F8C.1040005@thomas-guettler.de>
 <CACac1F82g4tVxXeuYvtf1PQWidHpc+k-c7POxx8kLzmboW+jbw@mail.gmail.com>
 <554AFF69.9050404@thomas-guettler.de>
Message-ID: <20150507064836.GR429@tonks>

* Thomas Güttler <guettliml at thomas-guettler.de> [2015-05-07 08:00:09 +0200]:
> On 06.05.2015 at 17:07, Paul Moore wrote:
> > On 6 May 2015 at 15:05, Thomas Güttler <guettliml at thomas-guettler.de> wrote:
> >> I am missing a policy for how sys.path should be altered.
> > 
> > Well, the docs say that applications can modify sys.path as needed.
> > Generally, applications modify sys.path in place via sys.path[:] =
> > whatever, but that's not mandated as far as I know.
> > 
> >> We run a custom sub class of list in sys.path. We set it in sitecustomize.py
> > 
> > Can you explain why? 
> 
> I forgot to explain why I use a custom class. Sorry, here is the background.
> 
> I want sys.path to be ordered:
> 
>  1. virtualenv
>  2. /usr/local/
>  3. /usr/lib
> 
> We use virtualenvs with system site-packages.
> 
> There are many places where sys.path gets altered.
> 
> The last time we had sys.path problems I tried to write a test
> which checks that sys.path is the same for cron jobs and web requests.
> I failed. Too many places,  I could not find all the places
> and the conditions where sys.path got modified in a different way.

It looks like you explained *how* you do what you do, but not *why* -
what problem is this solving? Why can't you just invoke the
virtualenv's python and let python take care of sys.path?

$ ./venv/bin/python -c 'import sys; from pprint import pprint; pprint(sys.path)'
['',
 '/home/user/venv/lib/python2.7',
 '/home/user/venv/lib/python2.7/plat-x86_64-linux-gnu',
 '/home/user/venv/lib/python2.7/lib-tk',
 '/home/user/venv/lib/python2.7/lib-old',
 '/home/user/venv/lib/python2.7/lib-dynload',
 '/usr/lib/python2.7',
 '/usr/lib/python2.7/plat-x86_64-linux-gnu',
 '/usr/lib/python2.7/lib-tk',
 '/home/user/venv/local/lib/python2.7/site-packages',
 '/home/user/venv/lib/python2.7/site-packages']

Florian

-- 
http://www.the-compiler.org | me at the-compiler.org (Mail/XMPP)
   GPG: 916E B0C8 FD55 A072 | http://the-compiler.org/pubkey.asc
         I love long mails! | http://email.is-not-s.ms/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 819 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150507/60b82101/attachment.sig>

From robertc at robertcollins.net  Thu May  7 08:54:02 2015
From: robertc at robertcollins.net (Robert Collins)
Date: Thu, 7 May 2015 18:54:02 +1200
Subject: [Python-ideas] Policy for altering sys.path
In-Reply-To: <554AFF69.9050404@thomas-guettler.de>
References: <554A1F8C.1040005@thomas-guettler.de>
 <CACac1F82g4tVxXeuYvtf1PQWidHpc+k-c7POxx8kLzmboW+jbw@mail.gmail.com>
 <554AFF69.9050404@thomas-guettler.de>
Message-ID: <CAJ3HoZ2q3t7KvZ6kD8Ew6KjWrZS-CNxi3Paz5YL-kzeLne4xww@mail.gmail.com>

On 7 May 2015 at 18:00, Thomas Güttler <guettliml at thomas-guettler.de> wrote:
> On 06.05.2015 at 17:07, Paul Moore wrote:

> pip is a special case, since the pip authors say "we don't provide an API".
> But they have handy methods which we want to use. We use "import pip"
> and the class of our application's sys.path gets altered.

Submit a PR to move the sys.path changes into something triggered by
the CLI entrypoint rather than an import side effect. I see no
in-principle issue with that.

-Rob


-- 
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Converged Cloud

From guettliml at thomas-guettler.de  Thu May  7 08:59:10 2015
From: guettliml at thomas-guettler.de (Thomas Güttler)
Date: Thu, 07 May 2015 08:59:10 +0200
Subject: [Python-ideas] Policy for altering sys.path
In-Reply-To: <20150507064836.GR429@tonks>
References: <554A1F8C.1040005@thomas-guettler.de>
 <CACac1F82g4tVxXeuYvtf1PQWidHpc+k-c7POxx8kLzmboW+jbw@mail.gmail.com>
 <554AFF69.9050404@thomas-guettler.de> <20150507064836.GR429@tonks>
Message-ID: <554B0D3E.9020708@thomas-guettler.de>



On 07.05.2015 at 08:48, Florian Bruhin wrote:
> * Thomas Güttler <guettliml at thomas-guettler.de> [2015-05-07 08:00:09 +0200]:
>> On 06.05.2015 at 17:07, Paul Moore wrote:
>>> On 6 May 2015 at 15:05, Thomas Güttler <guettliml at thomas-guettler.de> wrote:
>>>> I am missing a policy for how sys.path should be altered.
>>>
>>> Well, the docs say that applications can modify sys.path as needed.
>>> Generally, applications modify sys.path in place via sys.path[:] =
>>> whatever, but that's not mandated as far as I know.
>>>
>>>> We run a custom sub class of list in sys.path. We set it in sitecustomize.py
>>>
>>> Can you explain why?
>>
>> I forgot to explain why I use a custom class. Sorry, here is the background.
>>
>> I want sys.path to be ordered:
>>
>>   1. virtualenv
>>   2. /usr/local/
>>   3. /usr/lib
>>
>> We use virtualenvs with system site-packages.
>>
>> There are many places where sys.path gets altered.
>>
>> The last time we had sys.path problems I tried to write a test
>> which checks that sys.path is the same for cron jobs and web requests.
>> I failed. Too many places,  I could not find all the places
>> and the conditions where sys.path got modified in a different way.
>
> It looks like you explained *how* you do what you do, but not *why* -
> what problem is this solving? Why can't you just invoke the
> virtualenv's python and let python take care of sys.path?

I want sys.path ordered like that, since I want packages of the inner
environment to be tried first.

Here "inner" means "upper" in the above sys.path order.

Example: If a package is installed in the virtualenv with version 2.2 and
in global site packages with version 1.0, then I want the interpreter to
use the version from virtualenv.

Does this explain the *why* enough? If not, please tell me what you want to know.

Regards,
   Thomas Güttler



From abarnert at yahoo.com  Thu May  7 08:58:42 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Wed, 6 May 2015 23:58:42 -0700
Subject: [Python-ideas] discouraging direct use of the C-API
In-Reply-To: <CALFfu7DNYsH_9SYxuwcKvNSM38kmzdx96WnbtzS4qagTsaJqwg@mail.gmail.com>
References: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
 <E8C9B3CA-992A-4A48-BDF8-37980BB0F737@yahoo.com>
 <CALFfu7DNYsH_9SYxuwcKvNSM38kmzdx96WnbtzS4qagTsaJqwg@mail.gmail.com>
Message-ID: <40B94090-3EA2-4970-BA48-81A18A61951B@yahoo.com>

On May 6, 2015, at 15:00, Eric Snow <ericsnowcurrently at gmail.com> wrote:
> 
>> On Wed, May 6, 2015 at 1:59 PM, Andrew Barnert <abarnert at yahoo.com> wrote:
>>> On May 6, 2015, at 09:23, Eric Snow <ericsnowcurrently at gmail.com> wrote:
>>> 
>>> A big blocker to making certain sweeping changes to CPython (e.g.
>>> ref-counting) is compatibility with the vast body of C extension
>>> modules out there that use the C-API.  While there are certainly
>>> drastic long-term solutions to that problem, there is one thing we can
>>> do in the short-term that would at least get the ball rolling.  We can
>>> put a big red note at the top of every page of the C-API docs that
>>> encourages folks to either use CFFI or Cython.
>> 
>> Does this mean you also want to discourage boost::python, SIP, SWIG, etc., which as far as I know come down to automatically building C API extensions, and would need to be completely rewritten if you wanted to make them work a different way?
> 
> Not really.  I mentioned CFFI and Cython specifically because they are
> the two that kept coming up in previous discussions related to
> discouraging use of the C-API.  If C extensions were always generated
> using tools, then only tools would have to adapt to (drastic) changes
> in the C-API.  That would be a much better situation than the status
> quo since it drastically reduces the impact of changes.

OK, that makes sense to me. Even if there are a dozen wrappers and wrapper generators (and I think it's more like 4 or 5...), and we had to get buy-in from all of them (or get buy-in from most of them and reluctantly decide to screw over the last one), that's still orders of magnitude easier than getting buy-in from (or screw over) the 69105 people who are currently maintaining or building a C API extension, so it's still a huge win.

I'm not sure it would do nearly enough, at least not for a long time (how many of the current top 100 projects on PyPI use C API extensions and would be non-trivial to rewrite?), but obviously you can make the point that if we don't do anything, we'll _never_ get there.

From me at the-compiler.org  Thu May  7 09:22:23 2015
From: me at the-compiler.org (Florian Bruhin)
Date: Thu, 7 May 2015 09:22:23 +0200
Subject: [Python-ideas] Policy for altering sys.path
In-Reply-To: <554B0D3E.9020708@thomas-guettler.de>
References: <554A1F8C.1040005@thomas-guettler.de>
 <CACac1F82g4tVxXeuYvtf1PQWidHpc+k-c7POxx8kLzmboW+jbw@mail.gmail.com>
 <554AFF69.9050404@thomas-guettler.de> <20150507064836.GR429@tonks>
 <554B0D3E.9020708@thomas-guettler.de>
Message-ID: <20150507072223.GS429@tonks>

* Thomas Güttler <guettliml at thomas-guettler.de> [2015-05-07 08:59:10 +0200]:
> 
> 
> On 07.05.2015 at 08:48, Florian Bruhin wrote:
> >* Thomas Güttler <guettliml at thomas-guettler.de> [2015-05-07 08:00:09 +0200]:
> >>On 06.05.2015 at 17:07, Paul Moore wrote:
> >>>On 6 May 2015 at 15:05, Thomas Güttler <guettliml at thomas-guettler.de> wrote:
> >>>>I am missing a policy for how sys.path should be altered.
> >>>
> >>>Well, the docs say that applications can modify sys.path as needed.
> >>>Generally, applications modify sys.path in place via sys.path[:] =
> >>>whatever, but that's not mandated as far as I know.
> >>>
> >>>>We run a custom sub class of list in sys.path. We set it in sitecustomize.py
> >>>
> >>>Can you explain why?
> >>
> >>I forgot to explain why I use a custom class. Sorry, here is the background.
> >>
> >>I want sys.path to be ordered:
> >>
> >>  1. virtualenv
> >>  2. /usr/local/
> >>  3. /usr/lib
> >>
> >>We use virtualenvs with system site-packages.
> >>
> >>There are many places where sys.path gets altered.
> >>
> >>The last time we had sys.path problems I tried to write a test
> >>which checks that sys.path is the same for cron jobs and web requests.
> >>I failed. Too many places,  I could not find all the places
> >>and the conditions where sys.path got modified in a different way.
> >
> >It looks like you explained *how* you do what you do, but not *why* -
> >what problem is this solving? Why can't you just invoke the
> >virtualenv's python and let python take care of sys.path?
> 
> I want sys.path ordered like that, since I want packages of the inner
> environment to be tried first.
> 
> Here "inner" means "upper" in the above sys.path order.
> 
> Example: If a package is installed in the virtualenv with version 2.2 and
> in global site packages with version 1.0, then I want the interpreter to
> use the version from virtualenv.

That's already the default virtualenv behaviour:

# apt-get install python-requests
[...]
Unpacking python-requests (2.4.3-6) ...
$ ./venv/bin/pip install requests
[...]
  Downloading requests-2.7.0-py2.py3-none-any.whl (470kB): 470kB downloaded

$ python -c 'import requests; print requests.__version__'
2.4.3
$ ./venv/bin/python -c 'import requests; print requests.__version__'
2.7.0

> Does this explain the *why* enough? If not, please tell me what you want to know.

I'm mainly trying to find out why you're modifying sys.path by hand
instead of using what virtualenv already provides. There might be a
good reason for that, but to me it seems like you're reinventing the
wheel ;)

Florian

-- 
http://www.the-compiler.org | me at the-compiler.org (Mail/XMPP)
   GPG: 916E B0C8 FD55 A072 | http://the-compiler.org/pubkey.asc
         I love long mails! | http://email.is-not-s.ms/

From abarnert at yahoo.com  Thu May  7 09:24:11 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 7 May 2015 00:24:11 -0700
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <554AC2CE.5040705@btinternet.com>
References: <mi79rj$vl8$1@ger.gmane.org>
 <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <mi8q15$bbp$1@ger.gmane.org>
 <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7eYJgnFuAmbooLdKpyT3KCDfJxNweL3xBWVsRxLeHc97w@mail.gmail.com>
 <554AC2CE.5040705@btinternet.com>
Message-ID: <3D6D122B-68A4-439E-A875-EBE412AAC31B@yahoo.com>

On May 6, 2015, at 18:41, Rob Cliffe <rob.cliffe at btinternet.com> wrote:
> 
> This is no doubt not the best platform to raise these thoughts (which are nothing to do with Python - apologies), but I'm not sure where else to go.
> I watch discussions like this ...
> I watch posts like this one [Nick's] ...
> ...  And I despair.  I really despair.
> 
> I am a very experienced but old (some would say "dinosaur") programmer.
> I appreciate the need for Unicode.  I really do.
> I don't understand Unicode and all its complications AT ALL.
> And I can't help wondering:
>     Why, oh why, do things have to be SO FU*****G COMPLICATED?  This thread, for example, is way over my head.  And it is typical of many discussions I have stared at, uncomprehendingly.
> Surely 65536 (2-byte) encodings are enough to express all characters in all the languages in the world, plus all the special characters we need.

Ironically, that idea is exactly why there are problems even within the "all-Unicode" world where cp1252 and Big5 and Shift-JIS don't exist.

Apple, Microsoft, Sun, and a few other vendors jumped on the Unicode bandwagon early and committed themselves to the idea that 2 bytes is enough for everything. When the world discovered that wasn't true, we were stuck with a bunch of APIs that insisted on 2 bytes. Apple was able to partly make a break with that era, but Windows and Java are completely stuck with "Unicode means 16-bit" forever, which is why the whole world is stuck dealing with UTF-16 and surrogates forever.
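
The "stuck with surrogates" part is easy to see from Python itself (a side illustration, not something from the thread):

```python
# One astral (outside the Basic Multilingual Plane) character:
ch = '\U0001D11E'                 # MUSICAL SYMBOL G CLEF, U+1D11E
utf16 = ch.encode('utf-16-be')

print(len(ch))                    # 1 code point in Python's str
print(len(utf16) // 2)            # but 2 UTF-16 code units
print(utf16.hex())                # 'd834dd1e': high surrogate + low surrogate
```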

> Why can't there be just ONE universal encoding?  

There is, UTF-8.

Except sometimes you have algorithms that require fixed width, so you need UTF-32.

And Java and Windows need UTF-16.

And a few Internet protocols need UTF-7.

And DNS needs a sort-of-UTF-5 called IDNA.

At least everything else can die, once every document stored in an old IBM code page or similar gets transliterated or goes away. Unfortunately, there are still people creating cp1252 documents every day on brand-new Windows desktops (and there are still people creating filenames on Latin-1 filesystems on older Linux and Unix boxes, but that's dying out a lot faster), so who knows when that day will come. Python can't force it. Even the Unicode committee can't force it (especially since Microsoft is one of the most active members).

> (Decided upon, no doubt, by some international standards committee. There would surely be enough spare codes for any special characters etc. that might come up in the foreseeable future.)
> 
> Is it just historical accident (partly due to an awkward move from 1-byte ASCII to 2-byte Unicode, implemented in many different places, in many different ways) that we now have a patchwork of encodings that we strive to fit into some over-complicated scheme?

UTF-16 is a historical accident, and UTF-7 and IDNA. And all of the non-Unicode encodings, even more so.

> Or is there really some fundamental reason why things can't be simpler?  (Like, REALLY, REALLY simple?)

We really do need at least UTF-8 and UTF-32. But that's it. And I think that's simple enough.

> Imageine if we were starting to design the 21st century from scratch, throwing away all the history?  How would we go about it?

If we could start over with a clean slate today, I'm pretty sure we would have just one character set, Unicode, and two encodings, UTF-8 and UTF-32, and everyone would be happy (except for a small group in Japan who insist TRON's text model is better, but we can ignore them).

In particular, this would mean that in Python, a bytes is either UTF-8, or not text. No need to specify codecs or error handlers, no surrogates (and definitely no surrogate escapes), etc.

Plus, we'd have no daylight savings time, no changing timezone boundaries, seamless PyPI failovers, sensible drug laws, cars that run forever using garbage as fuel, no war, no crime, and Netflix would never remove a season when you're on episode 11 out of 13. (Unfortunately, we would still have perl. I don't know why, but I know we would.)

> (Maybe I'm just naive, but sometimes ... Out of the mouths of babes and sucklings.)
> Aaaargh!  Do I really have to learn all this mumbo-jumbo?!  (Forgive me. :-) )
> I would be grateful for any enlightenment - thanks in advance.
> Rob Cliffe
> 
> 
>> On 05/05/2015 20:21, Nick Coghlan wrote:
>>> On 5 May 2015 at 18:23, Stephen J. Turnbull <stephen at xemacs.org> wrote:
>>> So this proposal merely amounts to reintroduction of the Python 2 str
>>> confusion into Python 3.  It is dangerous *precisely because* the
>>> current situation is so frustrating.  These functions will not be used
>>> by "consenting adults", in most cases.  Those with sufficient
>>> knowledge for "informed consent" also know enough to decode encoded
>>> text ASAP, and encode internal text ALAP, with appropriate handlers,
>>> in the first place.
>>> 
>>> Rather, these str2str functions will be used by programmers at the
>>> ends of their ropes desperate to suppress "those damned Unicode
>>> errors" by any means available.  In fact, they are most likely to be
>>> used and recommended by *library* writers, because they're the ones
>>> who are least likely to have control over input, or to know their
>>> clients' requirements for output.  "Just use rehandle_* to ameliorate
>>> the errors" is going to be far too tempting for them to resist.
>> The primary intended audience is Linux distribution developers using
>> Python 3 as the system Python. I agree misuse in other contexts is a
>> risk, but consider assisting the migration of the Linux ecosystem from
>> Python 2 to Python 3 sufficiently important that it's worth our while
>> taking that risk.
>> 
>>> That Nick, of all people, supports this proposal is to me just
>>> confirmation that it's frustration, and only frustration, speaking
>>> here.  He used to be one of the strongest supporters of keeping
>>> "native text" (Unicode) and "encoded text" separate by keeping the
>>> latter in bytes.
>> It's not frustration (at least, I don't think it is), it's a proposal
>> for advanced tooling to deal properly with legacy *nix systems that
>> either:
>> 
>> a. use a locale encoding other than UTF-8; or
>> b. don't reliably set the locale encoding for system services and cron
>> jobs (which anecdotally appears to amount to "aren't using systemd" in
>> the current crop of *nix init systems)
>> 
>> If a developer only cares about Windows, Mac OS X, or modern systemd
>> based *nix systems that use UTF-8 as the system locale, and they never
>> set "LANG=C" before running a Python program, then these new functions
>> will be completely irrelevant to them. (I've also submitted a request
>> to the glibc team to make C.UTF-8 universally available, reducing the
>> need to use "LANG=C", and they're amenable to the idea, but it
>> requires someone to work on preparing and submitting a patch:
>> https://sourceware.org/bugzilla/show_bug.cgi?id=17318)
>> 
>> If, however, a developer wants to handle "LANG=C", or other non-UTF-8
>> locales reliably across the full spectrum of *nix systems in Python 3,
>> they need a way to cope with system data that they *know* has been
>> decoded incorrectly by the interpreter, as we'll potentially do
>> exactly that for environment variables, command line arguments,
>> stdin/stdout/stderr and more if we get bad locale encoding settings
>> from the OS (such as when "LANG=C" is specified, or the init system
>> simply doesn't set a locale at all and hence CPython falls back to the
>> POSIX default of ASCII).
>> 
>> Python 2 lets users sweep a lot of that under the rug, as the data at
>> least round trips within the system, but you get unexpected mojibake
>> in some cases (especially when taking local data and pushing it out
>> over the network).
>> 
>> Since these boundary decoding issues don't arise on properly
>> configured modern *nix systems, we've been able to take advantage of
>> that by moving Python 3 towards a more pragmatic and distro-friendly
>> approach in coping with legacy *nix platforms and behaviours,
>> primarily by starting to use "surrogateescape" by default on a few
>> more system interfaces (e.g. on the standard streams when the OS
>> *claims* that the locale encoding is ASCII, which we now assume to
>> indicate a configuration error, which we can at least work around for
>> roundtripping purposes so that "os.listdir()" works reliably at the
>> interactive prompt).
>> 
>> This change in approach (heavily influenced by the parallel "Python 3
>> as the default system Python" efforts in Ubuntu and Fedora) *has*
>> moved us back towards an increased risk of introducing mojibake in
>> legacy environments, but the nature of that trade-off has changed
>> markedly from the situation back in 2009 (let alone 2006):
>> 
>> * most popular modern Linux systems use systemd with the UTF-8 locale,
>> which "just works" from a boundary encoding/decoding perspective (it's
>> closely akin to the situation we've had on Mac OS X from the dawn of
>> Python 3)
>> * even without systemd, most modern *nix systems at least default to
>> the UTF-8 locale, which works reliably for user processes in the
>> absence of an explicit setting like "LANG=C", even if service daemons
>> and cron jobs can be a bit sketchier in terms of the locale settings
>> they receive
>> * for legacy environments migrating from Python 2 without upgrading
>> the underlying OS, our emphasis has shifted to tolerating "bug
>> compatibility" at the Python level in order to ease migration, as the
>> most appropriate long term solution for those environments is now to
>> upgrade their OS such that it more reliably provides correct locale
>> encoding settings to the Python 3 interpreter (which wasn't a
>> generally available option back when Python 3 first launched)
>> 
>> Armin Ronacher (as ever) provides a good explanation of the system
>> interface problems that can arise in Python 3 with bad locale encoding
>> settings here: http://click.pocoo.org/4/python3/#python3-surrogates
>> 
>> In my view, the critical helper function for this purpose is actually
>> "handle_surrogateescape", as that's the one that lets us readily adapt
>> from the incorrectly specified ASCII locale encoding to any other
>> ASCII-compatible system encoding once we've bootstrapped into a full
>> Python environment which has more options for figuring out a suitable
>> encoding than just looking at the locale setting provided by the C
>> runtime. It's also the function that serves to provide the primary
>> "hook" where we can hang documentation of this platform specific
>> boundary encoding/decoding issue.
>> 
>> The other suggested functions are then more about providing a "peek
>> behind the curtain" API for folks that want to *use Python* to explore
>> some of the ins and outs of Unicode surrogate handling. Surrogates and
>> astrals really aren't that complicated, but we've historically hidden
>> them away as "dark magic not to be understood by mere mortals". In
>> reality, they're just different ways of composing sequences of
>> integers to represent text, and the suggested APIs are designed to
>> expose that in a way we haven't done in the past. I can't actually
>> think of a practical purpose for them other than teaching people the
>> basics of how Unicode representations work, but demystifying that
>> seems sufficiently worthwhile to me that I'm not opposed to their
>> inclusion (bear in mind I'm also the current "dis" module maintainer,
>> and a contributor to the "inspect", so I'm a big fan of exposing
>> underlying concepts like this in a way that lets people play with them
>> programmatically for learning purposes).
>> 
>> Cheers,
>> Nick.
>> 
> 
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150507/558bf6ad/attachment-0001.html>

From abarnert at yahoo.com  Thu May  7 09:27:18 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 7 May 2015 00:27:18 -0700
Subject: [Python-ideas] Policy for altering sys.path
In-Reply-To: <554AFF69.9050404@thomas-guettler.de>
References: <554A1F8C.1040005@thomas-guettler.de>
 <CACac1F82g4tVxXeuYvtf1PQWidHpc+k-c7POxx8kLzmboW+jbw@mail.gmail.com>
 <554AFF69.9050404@thomas-guettler.de>
Message-ID: <285FE1BE-21DD-4033-B536-E2A5959A3F59@yahoo.com>

On May 6, 2015, at 23:00, Thomas Güttler <guettliml at thomas-guettler.de> wrote:
> 
>> On 06.05.2015 at 17:07, Paul Moore wrote:
>>> On 6 May 2015 at 15:05, Thomas Güttler <guettliml at thomas-guettler.de> wrote:
>>> I am missing a policy for how sys.path should be altered.
>> 
>> Well, the docs say that applications can modify sys.path as needed.
>> Generally, applications modify sys.path in place via sys.path[:] =
>> whatever, but that's not mandated as far as I know.
>> 
>>> We run a custom sub class of list in sys.path. We set it in sitecustomize.py
>> 
>> Can you explain why?
> 
> I forgot to explain why I use a custom class. Sorry, here is the background.
> 
> I want sys.path to be ordered:
> 
> 1. virtualenv
> 2. /usr/local/
> 3. /usr/lib

Can you instead just leave sys.path alone, and replace the module finder with a subclass that orders the directories in sys.path the way it wants to?

That's something a lot fewer packages are likely to screw with.
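
One possible shape for that idea (a sketch only; `priority()` and its path categories are hypothetical stand-ins for whatever ordering the site actually wants):

```python
import sys
import importlib.machinery

def priority(entry):
    # Hypothetical ranking matching the desired order:
    # virtualenv first, then /usr/local, then everything else.
    if 'virtualenv' in entry or '/venv' in entry:
        return 0
    if entry.startswith('/usr/local'):
        return 1
    return 2

class OrderedPathFinder(importlib.machinery.PathFinder):
    """Searches a re-ordered *copy* of sys.path, leaving sys.path itself
    untouched, so the preferred ordering survives even when other code
    prepends entries or replaces the list wholesale."""
    @classmethod
    def find_spec(cls, name, path=None, target=None):
        if path is None:
            # sorted() is stable, so entries in the same category keep
            # their relative order.
            path = sorted(sys.path, key=priority)
        return super().find_spec(name, path, target)

# To install it ahead of the default finder:
# sys.meta_path.insert(0, OrderedPathFinder)
```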

> We use virtualenvs with system site-packages.
> 
> There are many places where sys.path gets altered.
> 
> The last time we had sys.path problems I tried to write a test
> which checks that sys.path is the same for cron jobs and web requests.
> I failed. Too many places,  I could not find all the places
> and the conditions where sys.path got modified in a different way.
> 
>> It seems pretty risky to expect that no
>> applications will replace sys.path. I understand that you're proposing
>> that we say that applications shouldn't do that - but just saying so
>> won't change the many applications already out there.
> 
> Of course I know that if we agree on a policy, it wont' change existing code
> in one second. But if there is an official policy, you are able to
> write bug reports like this "Please alter sys.path according to the docs. See http://www.python.org/...."
> 
> The next thing: If someone wants to add to sys.path, most of the
> time the developer inserts its new entries in the front of the list.
> 
> This can break the ordering if you don't use a custom list class.
> 
> 
> 
> 
>>> This instance gets replaced by a common list in lines like this:
>>> 
>>> sys.path = glob.glob(os.path.join(WHEEL_DIR, "*.whl")) + sys.path
>>> 
>>> The above line is from pip, and similar things happen in a lot of packages.
>> 
>> How does the fact that pip does that cause a problem? The sys.path
>> modification is only in effect while pip is running, and no code in
>> pip relies on sys.path being an instance of your custom class.
> 
> pip is a special case, since the pip authors say "we don't provide an API".
> But they have handy methods which we want to use. We use "import pip"
> and the class of our application's sys.path gets altered.
> 
> 
>>> Before trying to solve this with code, I think the python community should
>>> agree an a policy for altering sys.path.
>> 
>> I can't imagine that happening, and even if it does, it won't make any
>> difference because a new policy won't change existing code. It won't
>> even affect new code unless people know about it (which isn't certain
>> - I doubt many people read the documentation that closely).
> 
> Code updates will happen step by step.
> If someone has a problem, since his custom list class in sys.path gets
> altered, he will write a bug report to the maintainer. A bug report
> referencing official python docs has more weight. 
> 
>>> What can I do to get this done?
>> 
>> I doubt you can.
>> 
>> A PR for pip that changes the above line to modify sys.path in place
>> would probably get accepted (I can't see any reason why it wouldn't),
>> and I guess you could do the same for any other code you find. But as
>> for persuading the Python programming community not to replace
>> sys.path in any code, that seems unlikely to happen.
>> 
>>> We use Python 2.7
>> 
>> If you were using 3.x, then it's (barely) conceivable that making
>> sys.path read-only (so people could only modify it in-place) could be
>> done as a new feature, but (a) it would be a major backward
>> compatibility break, so there would have to be a strong justification,
>> and (b) it would stop you from replacing sys.path with your custom
>> class in the first place, so it wouldn't solve your issue.
>> 
>> Which also raises the question, why do you believe it's OK to forbid
>> other people to replace sys.path when that's what you're doing in your
>> sitecustomize code? That seems self-contradictory...
> 
> Yes, you are right this looks self-contradictory.
> I am the one who is responsible for the setup of the environment.
> 
> Where is the best place during the interpreter initialization for
> altering the class of sys.path? I guess it is sitecustomize. After
> it was executed, sys.path should be altered only in place.
> 
> Regards,
>  Thomas Güttler
> 
> 
> -- 
> http://www.thomas-guettler.de/
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From robertc at robertcollins.net  Thu May  7 09:31:09 2015
From: robertc at robertcollins.net (Robert Collins)
Date: Thu, 7 May 2015 19:31:09 +1200
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <CADiSq7etv_xO4Qm8014C4yquV8yrWOGMoCfws+toPLRiVAtM4A@mail.gmail.com>
References: <mi79rj$vl8$1@ger.gmane.org>
 <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp>
 <mi8q15$bbp$1@ger.gmane.org>
 <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7eYJgnFuAmbooLdKpyT3KCDfJxNweL3xBWVsRxLeHc97w@mail.gmail.com>
 <554AC2CE.5040705@btinternet.com>
 <CADiSq7deKc6k6f3OXGOjV98AhXumicn10NjXzWzuXq8mbTQy8w@mail.gmail.com>
 <CADiSq7etv_xO4Qm8014C4yquV8yrWOGMoCfws+toPLRiVAtM4A@mail.gmail.com>
Message-ID: <CAJ3HoZ0i+jkBMoqoP9OcyTqV8jjDCZYvn026S-o22Xc+QYgOVg@mail.gmail.com>

On 7 May 2015 at 17:55, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On 7 May 2015 at 15:27, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> On 7 May 2015 at 11:41, Rob Cliffe <rob.cliffe at btinternet.com> wrote:
>>> Or is there really some fundamental reason why things can't be simpler?
>>> (Like, REALLY, REALLY simple?)
>>
>> Yep, there are around 7 billion fundamental reasons currently alive,
>> and I have no idea how many that have gone before us: humans :)
>
> Heh, a message from Stephen off-list made me realise that an info dump
> of all the reasons the edge cases are hard probably wasn't a good way
> to answer your question :)
>
> What "we're" working towards (where "we" ~= the Unicode consortium +
> operating system designers + programming language designers) is a
> world where everything "just works", and computers talk to humans in
> each human's preferred language (or a collection of languages,
> depending on what the human is doing), and to each other in Unicode.
> There are then a whole host of technical and political reasons why
> it's taking decades to get from the historical point A (where
> computers talk to humans in at most one language at a time, and don't
> talk to each other at all) to that desired point B.
>
> We'll know we're done with that transition when Unicode becomes almost
> transparently invisible, and the vast majority of programmers are once
> again able to just deal with "text" without worrying too much about
> how it's represented internally (but also having their programs be
> readily usable in languages other than their own).
>
> Python 3 is already a lot closer to that ideal than Python 2 was, but
> there are still some rough edges to iron out. The ones I'm personally
> aware of affecting 3.4+ (including the one Serhiy started this thread
> about) are listed as dependencies of http://bugs.python.org/issue22555

So, just last week I had to teach pbr how to deal with git commit
messages that are not utf8 decodable.

Some of the lowest layers of our stacks are willfully hostile to utf8:

 - Linux itself refuses to consider paths to be anything other than
octet sequences
   [for various reasons, one of which is that it would be a backwards
compatibility break to stop handling non-unicode strings, and Linux
reallllllly doesn't want to do that, because you'd immediately make
some % of data worldwide inaccessible].
 - libc is somewhat, but not a lot better - its constrained by Linux
 - git considers commit messages to be octet sequences, and file paths likewise
   [for much the same reason as Linux: existing repositories have the
data in them, API break to reject it]

bzr refused non-unicode paths from day one, and we had a steady stream
of users reporting that they couldn't import their history into bzr.
One common reason is that they had test data in files on disk that was
deliberately non-unicode (e.g. they were testing unicode handling
boundary conditions in their software). Overall I believe we made the
right choice, because we had relatively little in the way of headaches
on Windows and MacOSX. [The most we ran into was the case insanity,
plus normalisation forms on MacOSX].

surrogate escaping is a clever hack, and while the underlying layers
are staunchly willing to give us crap data, we have a fairly simple
choice:
 - either accept that under some circumstances folk will have to do
their own interop shim at the boundary or
 - do the surrogate escaping hack to centralise the interop shims.

The big risk, as already pointed out, is that the interop shims can at
most get you mojibake rather than a crash. This isn't a win, it's not
even beneficial.
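For readers unfamiliar with the hack being discussed, this is how the standard `surrogateescape` error handler behaves (plain CPython behaviour, nothing project-specific):

```python
# A byte sequence that is not valid UTF-8, e.g. a Latin-1 commit message:
raw = b'caf\xe9'

# Strict decoding fails at the boundary:
try:
    raw.decode('utf-8')
    assert False, 'unreachable'
except UnicodeDecodeError:
    pass

# surrogateescape smuggles the undecodable byte through as a lone
# surrogate (0xE9 -> U+DCE9)...
text = raw.decode('utf-8', errors='surrogateescape')
assert text == 'caf\udce9'

# ...and encoding with the same handler round-trips the original bytes:
assert text.encode('utf-8', errors='surrogateescape') == raw
```

Print that string to a stream expecting strict UTF-8, though, and you are back to an encode error or mojibake, which is exactly the trade-off described above.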

I am not at all convinced by the distributor and packaging migration
to Python3 argument. They have 'python3 -u' available for writing
utilities that may be given mojibake input *and be expected to work
regardless*. That lets Python3 get up and started and they can choose
their own approach to handling the awful: they can just work in
bytestrings, never decoding; they can explicitly decode with
surrogateescape; they can write their own tooling.

-Rob

-- 
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Converged Cloud

From abarnert at yahoo.com  Thu May  7 09:46:20 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 7 May 2015 00:46:20 -0700
Subject: [Python-ideas] (no subject)
In-Reply-To: <87lhh142c8.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <FA7C7E2C-28ED-4397-852E-F801C9183AFF@yahoo.com>
 <CAOMjWkkSK6iAQhnCTJ4JPjFioxregNz4xFu-S3NpX00p3ZnznQ@mail.gmail.com>
 <8C3A59B4-1C5B-4C67-A148-9ADBEE7123A7@yahoo.com>
 <CAOMjWk=fAzZxnTAU0baK5d631sjpQ4hykFiXhgBMBn3VJf-OEw@mail.gmail.com>
 <87lhh142c8.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <4B4608DC-F4FF-420F-8985-39201CFECA8F@yahoo.com>

On May 6, 2015, at 18:13, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> 
> Ivan Levkivskyi writes:
> 
>> Ok, I will try inspecting all existing approaches to find the one
>> that seems more "right" to me :)
> 
> If you do inspect all the approaches you can find, I hope you'll keep
> notes and publish them, perhaps as a blog article.
> 
>> In any case that approach could be updated by incorporating matrix
>> @ as a dedicated operator for compositions.
> 
> I think rather than "dedicated" you mean "suggested".  One of Andrew's
> main points is that you're unlikely to find more than a small minority
> agreeing on the "right" approach, no matter which one you choose.

Whatever wording you use, I do think it's likely that at least some of the existing libraries would become much more readable just by using @ in place of what they currently use. Even better, it may also turn out that the @ notation just "feels right" with one solution to the argument problem and wrong with another, narrowing down the possibility space.

So, I think it's definitely worth pushing the experiments if someone has the time and inclination, so I'm glad Ivan has volunteered.

>> At least, it seems that Erik from astropy likes this idea and it is
>> quite natural for people with "scientific" background.

I forgot to say before, but: it's great to have input from people coming from the MATLAB-y scientific/numeric world like him (I think) rather than just the Haskell/ML-y mathematical/CS world like you (Stephen, I think), as we usually get in these discussions. If there's one option that's universally obviously right to everyone in the first group, maybe everyone in the second group can shut up and deal with it. If not (which I think is likely, but I'll keep an open mind), well, at least we've got broader viewpoints and more data for Ivan's summary.

> Sure, but as he also points out, when you know that you're going to be
> composing only functions of one argument, the Unix pipe symbol is also
> quite natural (as is Haskell's operator-less notation).  While one of
> my hobbies is category theory (basically, the mathematical theory of
> composable maps for those not familiar with the term), I find the Unix
> pipeline somehow easier to think about than abstract composition,
> although I believe they're equivalent (at least as composition is
> modeled by category theory).

I think you're right that they're equivalent in theory.

But I feel like they're also equivalent in usability and readability (as in: for 1/3 of simple cases they're both fine, for 1/3 compose looks better, and for 1/3 rcompose does), but I definitely can't argue for that.

What always throws me is that most languages that offer both choose different precedence (and sometimes associativity, too) for them. The consequence seems to be that when I just use compose and rcompose operators without thinking about it, I always get them right, but as soon as I ask myself "which one is like shell pipes?" or "why did I put parens here?" I get confused and have to go take a break before I can write any more code. Haskell's operatorless notation is nice because it prevents me from noticing what I'm doing and asking myself those questions. :)
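For concreteness, the two orderings being contrasted might be written like this (hypothetical helper names, not an existing API):

```python
import functools
import math

def compose(*funcs):
    """Mathematical order: compose(f, g)(x) == f(g(x))."""
    return functools.reduce(lambda f, g: lambda x: f(g(x)), funcs)

def rcompose(*funcs):
    """Pipeline order, like a shell pipe: rcompose(f, g)(x) == g(f(x))."""
    return compose(*reversed(funcs))

# The same pipeline, spelled both ways:
f = compose(math.floor, lambda x: x ** 2)   # read right to left
g = rcompose(lambda x: x ** 2, math.floor)  # read left to right
assert f(2.5) == g(2.5) == 6
```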

From mal at egenix.com  Thu May  7 09:56:15 2015
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu, 07 May 2015 09:56:15 +0200
Subject: [Python-ideas] discouraging direct use of the C-API
In-Reply-To: <CALFfu7CDZ9rQWxQV2WbZ6m32-BeMDeZCT9oPs48LzqvaBxDsOg@mail.gmail.com>
References: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>	<554A8A79.2040306@egenix.com>
 <CALFfu7CDZ9rQWxQV2WbZ6m32-BeMDeZCT9oPs48LzqvaBxDsOg@mail.gmail.com>
Message-ID: <554B1A9F.6010606@egenix.com>

On 07.05.2015 00:19, Eric Snow wrote:
> On Wed, May 6, 2015 at 3:41 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>> Python without the C extensions would hardly have had the
>> success it has. It is widely known as perfect language to
>> glue together different systems and provide integration.
>>
>> Deprecating the C API would mean that you deprecate all
>> those existing C extensions together with the C API.
> 
> As Donald noted, I'm not suggesting that the C-API be deprecated.  I
> was careful in calling it "discouraging direct use of the C-API". :)

Looks like that didn't work out when I read your suggestion :-)
I'd expect a big red warning on all C API pages to have
a similar effect on others.

>> This can hardly be in the interest of Python's quest for
>> world domination :-)
>>
>> BTW: What can be more drastic than deprecating the Python C API ?
>> There are certainly better ways to evolve an API than getting
>> rid of it.
> 
> I'd like to hear more on alternatives.  Lately all I've heard is how
> much better off we'd be if folks used CFFI or tools like Cython to
> write their extension modules.  Regardless of what it is, we should
> try to find *some* solution that puts us in a position that we can
> accomplish certain architectural changes, such as moving away from
> ref-counting.  Larry talked about it at the language summit.

C is pretty flexible when it comes to changing APIs gradually,
e.g. you can have macros adjusting signatures for you or small
wrapper functions fixing semantics, providing additional arguments,
etc.

I think it would be better to first investigate possible
changes to the C API before recommending putting a layer
between Python's C API and its C extensions.

Those layers are useful for people who don't want to dive into
the C API, but don't work well for those who know the C API
and how to use it to give them the best possible performance
or best possible integration with Python.

I haven't seen Larry's talk, just read a short summary of
things he mentioned in that talk. Those looked like a good
starting point for discussions. Perhaps we could have GSoC
students investigate some of these alternatives ?!

Removing the GIL and reference counting will break things,
but if there is a way we can reduce this breakage, I think
we should definitely go for that approach before saying
"oh, no, please don't use our C API".

Aside: The fact that we have so many nice C extensions out
there is proof that we have a good C API. Even though it is
not visible to most Python programmers, it forms a significant
part of Python's success.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, May 07 2015)
>>> Python Projects, Coaching and Consulting ...  http://www.egenix.com/
>>> mxODBC Plone/Zope Database Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From me at the-compiler.org  Thu May  7 10:18:44 2015
From: me at the-compiler.org (Florian Bruhin)
Date: Thu, 7 May 2015 10:18:44 +0200
Subject: [Python-ideas] discouraging direct use of the C-API
In-Reply-To: <554B1A9F.6010606@egenix.com>
References: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
 <554A8A79.2040306@egenix.com>
 <CALFfu7CDZ9rQWxQV2WbZ6m32-BeMDeZCT9oPs48LzqvaBxDsOg@mail.gmail.com>
 <554B1A9F.6010606@egenix.com>
Message-ID: <20150507081844.GT429@tonks>

* M.-A. Lemburg <mal at egenix.com> [2015-05-07 09:56:15 +0200]:
> Aside: The fact that we have so many nice C extensions out
> there is proof that we have a good C API. Even though it is
> not visible to most Python programmers, it forms a significant
> part of Python's success.

Are many of those using the C API directly rather than using some
bindings generator?

Most projects I'm aware of use Cython/cffi/SWIG/... and not the raw C
API, which is kind of the whole point here :)

Florian

-- 
http://www.the-compiler.org | me at the-compiler.org (Mail/XMPP)
   GPG: 916E B0C8 FD55 A072 | http://the-compiler.org/pubkey.asc
         I love long mails! | http://email.is-not-s.ms/

From p.f.moore at gmail.com  Thu May  7 10:31:13 2015
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 7 May 2015 09:31:13 +0100
Subject: [Python-ideas] Policy for altering sys.path
In-Reply-To: <554AFF69.9050404@thomas-guettler.de>
References: <554A1F8C.1040005@thomas-guettler.de>
 <CACac1F82g4tVxXeuYvtf1PQWidHpc+k-c7POxx8kLzmboW+jbw@mail.gmail.com>
 <554AFF69.9050404@thomas-guettler.de>
Message-ID: <CACac1F9ND0z2ujOiLwHFi3mEtO-HVtzXs4YKEOp-QYyqgseDvw@mail.gmail.com>

On 7 May 2015 at 07:00, Thomas Güttler <guettliml at thomas-guettler.de> wrote:
>> Can you explain why?
>
> I forgot to explain the why I use a custom class. Sorry, here is the background.
>
> I want sys.path to ordered:
>
>  1. virtualenv
>  2. /usr/local/
>  3. /usr/lib
>
> We use virtualenvs with system site-packages.
>
> There are many places where sys.path gets altered.
>
> The last time we had sys.path problems I tried to write a test
> which checks that sys.path is the same for cron jobs and web requests.
> I failed. Too many places,  I could not find all the places
> and the conditions where sys.path got modified in a different way.

You do understand that by reordering sys.path like this you could
easily break code that adds entries to sys.path, by shadowing local
modules that the code is *deliberately* trying to put at the start of
the path?

I'm going to assume you have good reasons for doing this (and for
needing to - it seems to me that this is normally the order you'd get
by default). But even assuming that, I think your requirement is
specialised enough that you shouldn't be expecting other applications
to have to cater for it.
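A reordering subclass of the kind Thomas describes might look roughly like this (a sketch with made-up tier rules, not his actual class):

```python
class TieredPath(list):
    """A sys.path stand-in that regroups entries after every mutation:
    virtualenv-ish entries first, then /usr/local, then /usr/lib."""

    @staticmethod
    def _tier(entry):
        if '/usr/local' in entry:
            return 1
        if '/usr/lib' in entry:
            return 2
        return 0

    def _resort(self):
        # Stable sort: relative order within each tier is preserved.
        list.sort(self, key=self._tier)

    def insert(self, index, entry):
        list.insert(self, index, entry)
        self._resort()

    def append(self, entry):
        list.append(self, entry)
        self._resort()

path = TieredPath(['/venv/lib', '/usr/lib/python2.7'])
path.insert(0, '/usr/local/lib/python2.7')  # silently demoted to tier 1
assert path == ['/venv/lib', '/usr/local/lib/python2.7', '/usr/lib/python2.7']
```

Which illustrates the shadowing concern above: an insert(0, ...) that the calling code expected to win has quietly lost, and `sys.path = x + sys.path` discards the subclass entirely.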

>> It seems pretty risky to expect that no
>> applications will replace sys.path. I understand that you're proposing
>> that we say that applications shouldn't do that - but just saying so
>> won't change the many applications already out there.
>
> Of course I know that if we agree on a policy, it won't change existing code
> in one second. But if there is an official policy, you are able to
> write bug reports like this "Please alter sys.path according to the docs. See http://www.python.org/...."
>
> The next thing: If someone wants to add to sys.path, most of the
> time the developer inserts the new entries at the front of the list.

Generally, I would say that applications have every right to alter
sys.path to suit their needs. Libraries (typically) shouldn't alter
sys.path - in particular on import - without that being part of the
documented API.

If a library alters sys.path in a way that is a problem, and doesn't
document that it's doing so, then I think you have a case for a bug
report to that library. At a minimum they should document what they
do.

Your problem here is that pip is an *application* and so assumes the
right to alter sys.path. You seem to be using it as a library, and
that's where your problem lies. There *is* a reason we don't support
using pip as a library (this wasn't one we'd thought of, but the risk
of issues like this certainly was). With luck, now that you've brought
this point up, we'll remember if & when we do document a supported
pip-as-a-library API, and maybe deal with sys.path differently.

Paul

PS As I said before, it wouldn't be hard to fix the specific usage you
pointed out in pip, and I don't see a problem with submitting an issue
to that effect. Your custom subclass may still break pip, even after
we make such a change, but that'd be a separate issue with your
subclass, not a pip issue ;-) For this thread, though, I'm focusing on
your request for a "global policy".
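The in-place variant of the pip line quoted earlier would be a one-line change (WHEEL_DIR here is a hypothetical stand-in for pip's real constant):

```python
import glob
import os
import sys

WHEEL_DIR = '/path/to/wheels'  # hypothetical; pip computes its own

original = sys.path

# Replacing the list rebinds sys.path to a plain new list:
#     sys.path = glob.glob(os.path.join(WHEEL_DIR, "*.whl")) + sys.path
# Mutating in place keeps whatever object (or subclass) is already bound:
sys.path[:0] = glob.glob(os.path.join(WHEEL_DIR, '*.whl'))

assert sys.path is original  # same object, custom class preserved
```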

From p.f.moore at gmail.com  Thu May  7 10:47:15 2015
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 7 May 2015 09:47:15 +0100
Subject: [Python-ideas] discouraging direct use of the C-API
In-Reply-To: <554B1A9F.6010606@egenix.com>
References: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
 <554A8A79.2040306@egenix.com>
 <CALFfu7CDZ9rQWxQV2WbZ6m32-BeMDeZCT9oPs48LzqvaBxDsOg@mail.gmail.com>
 <554B1A9F.6010606@egenix.com>
Message-ID: <CACac1F8dZuyvk47-v2e+ruJ9L0vxvUnGhgNJCen_OK-10e6JTw@mail.gmail.com>

On 7 May 2015 at 08:56, M.-A. Lemburg <mal at egenix.com> wrote:
> Aside: The fact that we have so many nice C extensions out
> there is proof that we have a good C API. Even though it is
> not visible to most Python programmers, it forms a significant
> part of Python's success.

Agreed. Maybe a useful exercise for someone thinking about this issue
would be to survey some of the major projects using the C API out
there, and working out what would be involved in switching them to use
cffi or Cython. That would give a good idea of the scale of the issue,
as well as providing some practical help to projects that would be
affected by this sort of recommendation.

Good ones to look at would be:
- lxml
- pywin32

(I refrained from adding scipy and numpy to that list, as that would
make this post seem like a troll attempt, which it isn't, but has
anyone thought of the implications of a recommendation like this on
those projects? OK, they'd probably just ignore it as they have a
genuine need for direct use of the C API, but we would be sending
pretty mixed messages).

I prefer Nick's suggestion of adding better documentation to the
packaging user guide. Maybe even to the extent of having a worked
example. The article at
https://scipy-lectures.github.io/advanced/interfacing_with_c/interfacing_with_c.html
is quite a nice overview, although it's heavily numpy-focused and
doesn't include cffi.

Paul

From stefan at bytereef.org  Thu May  7 10:54:37 2015
From: stefan at bytereef.org (Stefan Krah)
Date: Thu, 7 May 2015 08:54:37 +0000 (UTC)
Subject: [Python-ideas] discouraging direct use of the C-API
References: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
Message-ID: <loom.20150507T103957-492@post.gmane.org>

Eric Snow <ericsnowcurrently at ...> writes:
> A big blocker to making certain sweeping changes to CPython (e.g.
> ref-counting) is compatibility with the vast body of C extension
> modules out there that use the C-API.  While there are certainly
> drastic long-term solutions to that problem, there is one thing we can
> do in the short-term that would at least get the ball rolling.  We can
> put a big red note at the top of every page of the C-API docs that
> encourages folks to either use CFFI or Cython.

-1. CFFI is much slower than using the C-API directly.

Python is a great language by itself, but its excellent C-API is one
of the major selling points.

As for garbage collection vs. refcounting:  I've tried OCaml's C-API
and found it 20% slower than Python's.  Note that OCaml has a fantastic
native code compiler (and the culture is C-friendly), so it seems to
be a hard problem.



Stefan Krah


From caleb.hattingh at gmail.com  Thu May  7 11:01:59 2015
From: caleb.hattingh at gmail.com (Caleb Hattingh)
Date: Thu, 7 May 2015 19:01:59 +1000
Subject: [Python-ideas] discouraging direct use of the C-API
In-Reply-To: <loom.20150507T103957-492@post.gmane.org>
References: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
 <loom.20150507T103957-492@post.gmane.org>
Message-ID: <E9CF9DEE-F4F3-4DEE-B64F-AEF0E23312FD@gmail.com>


> On 7 May 2015, at 6:54 pm, Stefan Krah <stefan at bytereef.org> wrote:
> 
> Eric Snow <ericsnowcurrently at ...> writes:
>> A big blocker to making certain sweeping changes to CPython (e.g.
>> ref-counting) is compatibility with the vast body of C extension
>> modules out there that use the C-API.  While there are certainly
>> drastic long-term solutions to that problem, there is one thing we can
>> do in the short-term that would at least get the ball rolling.  We can
>> put a big red note at the top of every page of the C-API docs that
>> encourages folks to either use CFFI or Cython.
> 
> -1. CFFI is much slower than using the C-API directly.

I am quite interested in this; do you happen to have a link to a case study/gist/repo where this has been measured? Even if you can remember people's names involved or something similar, I could google it myself.

Kind regards
Caleb

From jmcs at jsantos.eu  Thu May  7 11:09:59 2015
From: jmcs at jsantos.eu (=?UTF-8?B?Sm/Do28gU2FudG9z?=)
Date: Thu, 07 May 2015 09:09:59 +0000
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <20150506145131.GL5663@ando.pearwood.info>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <20150506145131.GL5663@ando.pearwood.info>
Message-ID: <CAH_XWH3FnX4=+Cz1v_SrHGijZtgv50mJ2vH7QjMrGF_vo2Cm2Q@mail.gmail.com>

On Wed, 6 May 2015 at 16:51 Steven D'Aprano <steve at pearwood.info> wrote:

>
> I think that there are some questions that would need to be answered.
> For instance, given some composition:
>
>     f = math.sin @ (lambda x: x**2)
>
> what would f.__name__ return? What about str(f)?
>

Lambdas return '<lambda>' so maybe something like '<composed>'?
Then str(f) would be '<function <composed> at 0xffffffffffff>'.


> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>

From stefan at bytereef.org  Thu May  7 11:11:29 2015
From: stefan at bytereef.org (Stefan Krah)
Date: Thu, 7 May 2015 09:11:29 +0000 (UTC)
Subject: [Python-ideas] discouraging direct use of the C-API
References: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
 <loom.20150507T103957-492@post.gmane.org>
 <E9CF9DEE-F4F3-4DEE-B64F-AEF0E23312FD@gmail.com>
Message-ID: <loom.20150507T110846-798@post.gmane.org>

Caleb Hattingh <caleb.hattingh at ...> writes:
> > -1. CFFI is much slower than using the C-API directly.
> 
> I am quite interested in this; do you happen to have a link to a case
> study/gist/repo where this has been measured? Even if you can remember
> people's names involved or something similar, I could google it myself.

I've measured it here:

https://mail.python.org/pipermail/python-dev/2013-December/130772.html


CFFI is very nice (superb API), but not for high performance use cases.



Stefan Krah

From donald at stufft.io  Thu May  7 11:14:18 2015
From: donald at stufft.io (Donald Stufft)
Date: Thu, 7 May 2015 05:14:18 -0400
Subject: [Python-ideas] discouraging direct use of the C-API
In-Reply-To: <loom.20150507T110846-798@post.gmane.org>
References: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
 <loom.20150507T103957-492@post.gmane.org>
 <E9CF9DEE-F4F3-4DEE-B64F-AEF0E23312FD@gmail.com>
 <loom.20150507T110846-798@post.gmane.org>
Message-ID: <FE7386FC-319B-4E88-A65D-698DB8804724@stufft.io>


> On May 7, 2015, at 5:11 AM, Stefan Krah <stefan at bytereef.org> wrote:
> 
> Caleb Hattingh <caleb.hattingh at ...> writes:
>>> -1. CFFI is much slower than using the C-API directly.
>> 
>> I am quite interested in this; do you happen to have a link to a case
>> study/gist/repo where this has been measured? Even if you can remember
>> people's names involved or something similar, I could google it myself.
> 
> I've measured it here:
> 
> https://mail.python.org/pipermail/python-dev/2013-December/130772.html
> 
> 
> CFFI is very nice (superb API), but not for high performance use cases.
> 

Is the source code for this benchmark available anywhere?

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From rosuav at gmail.com  Thu May  7 11:41:18 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Thu, 7 May 2015 19:41:18 +1000
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <CAH_XWH3FnX4=+Cz1v_SrHGijZtgv50mJ2vH7QjMrGF_vo2Cm2Q@mail.gmail.com>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <20150506145131.GL5663@ando.pearwood.info>
 <CAH_XWH3FnX4=+Cz1v_SrHGijZtgv50mJ2vH7QjMrGF_vo2Cm2Q@mail.gmail.com>
Message-ID: <CAPTjJmrkWtegku8rOFzRo7rgp4MHdJUSgtzikHzvSpZKh3W7Hg@mail.gmail.com>

On Thu, May 7, 2015 at 7:09 PM, João Santos <jmcs at jsantos.eu> wrote:
> On Wed, 6 May 2015 at 16:51 Steven D'Aprano <steve at pearwood.info> wrote:
>>
>>
>> I think that there are some questions that would need to be answered.
>> For instance, given some composition:
>>
>>     f = math.sin @ (lambda x: x**2)
>>
>> what would f.__name__ return? What about str(f)?
>
>
> Lambdas return '<lambda>' so maybe something like '<composed>'?
> Then str(f) would be '<function <composed> at 0xffffffffffff>'.

Would be nice to use "<sin @ <lambda>>", incorporating both names, but
that could get unwieldy once you compose a bunch of functions.

ChrisA
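Since plain functions don't define `@`, the naming scheme being discussed could be prototyped today with a small wrapper (a sketch; `Composable` is a made-up name, not part of any proposal):

```python
import math

class Composable:
    """Wraps a callable so that f @ g builds the composition f(g(x))."""

    def __init__(self, func, name=None):
        self.func = func
        self.__name__ = name or getattr(func, '__name__', '<composed>')

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)

    def __matmul__(self, other):
        other_name = getattr(other, '__name__', '<lambda>')
        # Combine both names, as suggested in the thread.
        return Composable(lambda *a, **kw: self(other(*a, **kw)),
                          '<%s @ %s>' % (self.__name__, other_name))

f = Composable(math.sin) @ (lambda x: x ** 2)
assert f.__name__ == '<sin @ <lambda>>'
assert f(0) == math.sin(0) == 0.0
```

Chaining a third function shows the unwieldiness immediately: `f @ math.cos` gets the name `<<sin @ <lambda>> @ cos>`, and it keeps nesting from there.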

From stephen at xemacs.org  Thu May  7 12:04:34 2015
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 07 May 2015 19:04:34 +0900
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <CADiSq7etv_xO4Qm8014C4yquV8yrWOGMoCfws+toPLRiVAtM4A@mail.gmail.com>
References: <mi79rj$vl8$1@ger.gmane.org>
 <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp>
 <mi8q15$bbp$1@ger.gmane.org>
 <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7eYJgnFuAmbooLdKpyT3KCDfJxNweL3xBWVsRxLeHc97w@mail.gmail.com>
 <554AC2CE.5040705@btinternet.com>
 <CADiSq7deKc6k6f3OXGOjV98AhXumicn10NjXzWzuXq8mbTQy8w@mail.gmail.com>
 <CADiSq7etv_xO4Qm8014C4yquV8yrWOGMoCfws+toPLRiVAtM4A@mail.gmail.com>
Message-ID: <87bnhw4sb1.fsf@uwakimon.sk.tsukuba.ac.jp>

Nick Coghlan writes:

 > What "we're" working towards (where "we" ~= the Unicode consortium +
 > operating system designers + programming language designers) is a
 > world where everything "just works", and computers talk to humans in
 > each human's preferred language (or a collection of languages,
 > depending on what the human is doing), and to each other in Unicode.
 > There are then a whole host of technical and political reasons

And economic -- which really bites here because if it weren't for the
good ol' American greenback and that huge GDP and consumption
(especially of software) this thread would be all about why GB
18030[1] is so hard.  Think about *that* prospect the next time the
"complexity of Unicode" starts to bug you. :-)

 > We'll know we're done with that transition when Unicode becomes almost
 > transparently invisible, and the vast majority of programmers are once
 > again able to just deal with "text" without worrying too much about
 > how it's represented internally

That part after the "and" is a misstatement, isn't it?  Nobody using
Python 3 is concerned with how it's represented internally *at all*,
because for all the str class cares it *could* be GB 18030, and only
ord() (and esoteric features like memoryview) would ever tell you so.
And Python 3 programmers *can* treat str as "just text"[2] as long as
they stick to pure Python, and don't have to accept or generate
encoded text for *external* modules (such as Tcl/Tk) that don't know
about (all of) Unicode.  Even surrogateescapes only matter when you're
dealing with rather unruly input (or a mendacious OS).

So it's *still* all about I/O, viz: issue22555.  "Unicode" is just the
conventional curse word that programmers use when they're thinking
"HCI is hard and it sucks and I just wish it would go away!", even
though Unicode gets us 90% of the way to the solution.  (The other 10%
is where us humans go contributing a little peace, love, and
understanding. :-)


Footnotes: 
[1]  The Chinese standard which has exactly the same character
repertoire as Unicode (because it tracks it by design), but instead of
grandfathering ISO 8859-1 code points as the first 256 code points of
Unicode, it grandfathers GB 2312 (Chinese) as the first few thousand,
and has a rather obnoxious variable width representation as a result.

[2]  With a few exceptions such as dealing with Apple's icky NFD
filesystem encoding, and formatting bidirectional strings in
reStructuredText (which I haven't tried, but I bet doesn't work very
well in tables!)


From stefan at bytereef.org  Thu May  7 12:15:04 2015
From: stefan at bytereef.org (Stefan Krah)
Date: Thu, 7 May 2015 10:15:04 +0000 (UTC)
Subject: [Python-ideas] discouraging direct use of the C-API
References: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
 <CADiSq7e41R-Tu8e2e=AF1Qt2UaH8H2_icedksCJLtLYsrK_qSA@mail.gmail.com>
Message-ID: <loom.20150507T121330-401@post.gmane.org>

Nick Coghlan <ncoghlan at ...> writes: 
> Rather than embedding these recommendations directly in the version
> specific CPython docs, I'd prefer to see contributions to fill in the
> incomplete sections in
> https://packaging.python.org/en/latest/extensions.html with links back
> to the relevant parts of the C API documentation and docs for other
> projects (I was able to write the current overview section on that
> page in a few hours, as I didn't need to do much research for that,
> but filling in the other sections properly involves significantly more
> work).

Hmm. I'm getting a twilio.com advertisement on that page.  I miss
the old python.org...


Stefan Krah




From steve at pearwood.info  Thu May  7 13:30:46 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 7 May 2015 21:30:46 +1000
Subject: [Python-ideas] discouraging direct use of the C-API
In-Reply-To: <loom.20150507T121330-401@post.gmane.org>
References: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
 <CADiSq7e41R-Tu8e2e=AF1Qt2UaH8H2_icedksCJLtLYsrK_qSA@mail.gmail.com>
 <loom.20150507T121330-401@post.gmane.org>
Message-ID: <20150507113045.GQ5663@ando.pearwood.info>

On Thu, May 07, 2015 at 10:15:04AM +0000, Stefan Krah wrote:
> Nick Coghlan <ncoghlan at ...> writes: 
> > Rather than embedding these recommendations directly in the version
> > specific CPython docs, I'd prefer to see contributions to fill in the
> > incomplete sections in
> > https://packaging.python.org/en/latest/extensions.html with links back
> > to the relevant parts of the C API documentation and docs for other
> > projects (I was able to write the current overview section on that
> > page in a few hours, as I didn't need to do much research for that,
> > but filling in the other sections properly involves significantly more
> > work).
> 
> Hmm. I'm getting a twilio.com advertisement on that page.  I miss
> the old python.org...

I see it too. Why is python.org displaying advertisments?



-- 
Steve

From donald at stufft.io  Thu May  7 13:44:16 2015
From: donald at stufft.io (Donald Stufft)
Date: Thu, 7 May 2015 07:44:16 -0400
Subject: [Python-ideas] discouraging direct use of the C-API
In-Reply-To: <20150507113045.GQ5663@ando.pearwood.info>
References: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
 <CADiSq7e41R-Tu8e2e=AF1Qt2UaH8H2_icedksCJLtLYsrK_qSA@mail.gmail.com>
 <loom.20150507T121330-401@post.gmane.org>
 <20150507113045.GQ5663@ando.pearwood.info>
Message-ID: <D9D64F1A-CADB-48D8-921E-419CE14F133E@stufft.io>


> On May 7, 2015, at 7:30 AM, Steven D'Aprano <steve at pearwood.info> wrote:
> 
> On Thu, May 07, 2015 at 10:15:04AM +0000, Stefan Krah wrote:
>> Nick Coghlan <ncoghlan at ...> writes:
>>> Rather than embedding these recommendations directly in the version
>>> specific CPython docs, I'd prefer to see contributions to fill in the
>>> incomplete sections in
>>> https://packaging.python.org/en/latest/extensions.html with links back
>>> to the relevant parts of the C API documentation and docs for other
>>> projects (I was able to write the current overview section on that
>>> page in a few hours, as I didn't need to do much research for that,
>>> but filling in the other sections properly involves significantly more
>>> work).
>> 
>> Hmm. I'm getting a twilio.com advertisement on that page.  I miss
>> the old python.org...
> 
> I see it too. Why is python.org displaying advertisments?
> 


packaging.python.org is hosted on RTD, I guess that RTD added ads to
its free service.

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 801 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150507/5bd9591e/attachment.sig>

From p.f.moore at gmail.com  Thu May  7 13:50:09 2015
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 7 May 2015 12:50:09 +0100
Subject: [Python-ideas] discouraging direct use of the C-API
In-Reply-To: <loom.20150507T110846-798@post.gmane.org>
References: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
 <loom.20150507T103957-492@post.gmane.org>
 <E9CF9DEE-F4F3-4DEE-B64F-AEF0E23312FD@gmail.com>
 <loom.20150507T110846-798@post.gmane.org>
Message-ID: <CACac1F98N-Yekiv5rsX85N3vnNk-6vs6HyhcT0XwiXrQDHv5_A@mail.gmail.com>

On 7 May 2015 at 10:11, Stefan Krah <stefan at bytereef.org> wrote:
> Caleb Hattingh <caleb.hattingh at ...> writes:
>> > -1. CFFI is much slower than using the C-API directly.
>>
>> I am quite interested in this; do you happen have a link to a case
> study/gist/repo where this has been
> measured? Even if you can remember people's names involved or something
> similar, I could google it myself.
>
> I've measured it here:
>
> https://mail.python.org/pipermail/python-dev/2013-December/130772.html
>
>
> CFFI is very nice (superb API), but not for high performance use cases.

I'm guessing that benchmark used cffi in the "ABI level" dynamic form
that matches ctypes. Did you try the cffi "API level" form that
creates a C extension? I'd be curious as to where that falls in
performance.

Paul

From caleb.hattingh at gmail.com  Thu May  7 13:58:01 2015
From: caleb.hattingh at gmail.com (Caleb Hattingh)
Date: Thu, 7 May 2015 21:58:01 +1000
Subject: [Python-ideas] discouraging direct use of the C-API
In-Reply-To: <CACac1F98N-Yekiv5rsX85N3vnNk-6vs6HyhcT0XwiXrQDHv5_A@mail.gmail.com>
References: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
 <loom.20150507T103957-492@post.gmane.org>
 <E9CF9DEE-F4F3-4DEE-B64F-AEF0E23312FD@gmail.com>
 <loom.20150507T110846-798@post.gmane.org>
 <CACac1F98N-Yekiv5rsX85N3vnNk-6vs6HyhcT0XwiXrQDHv5_A@mail.gmail.com>
Message-ID: <86012306-FEA2-4CAE-8553-FBFFA661654D@gmail.com>


> On 7 May 2015, at 9:50 pm, Paul Moore <p.f.moore at gmail.com> wrote:
> 
> On 7 May 2015 at 10:11, Stefan Krah <stefan at bytereef.org> wrote:
>> Caleb Hattingh <caleb.hattingh at ...> writes:
>>>> -1. CFFI is much slower than using the C-API directly.
>>> 
>>> I am quite interested in this; do you happen have a link to a case
>> study/gist/repo where this has been
>>> measured? Even if you can remember people's names involved or something
>> similar, I could google it myself.
>> 
>> I've measured it here:
>> 
>> https://mail.python.org/pipermail/python-dev/2013-December/130772.html
>> 
>> CFFI is very nice (superb API), but not for high performance use cases.
> 
> I'm guessing that benchmark used cffi in the "ABI level" dynamic form
> that matches ctypes. Did you try the cffi "API level" form that
> creates a C extension? I'd be curious as to where that falls in
> performance.

I had a quick look around, @eevee made this comparison some time ago:

===
  - CPython 2.7 + Cython: 2.0s
  - CPython 2.7 + CFFI: 2.7s
  - PyPy 2.1 + CFFI: 4.3s
That's the time it takes, from a warm start, to run the test suite.
===

from http://eev.ee/blog/2013/09/13/cython-versus-cffi/

Kind regards
Caleb

From storchaka at gmail.com  Thu May  7 14:07:49 2015
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Thu, 07 May 2015 15:07:49 +0300
Subject: [Python-ideas] Why don't CPython strings implement slicing
	using a view?
In-Reply-To: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com>
References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com>
Message-ID: <mifkil$jne$1@ger.gmane.org>

On 07.05.15 05:05, Neil Girdhar wrote:
> Since strings are constant, wouldn't it be much faster to implement
> string slices as a view of other strings?
>
> For clarity, I'm talking about CPython.  I'm not talking about anything
> the user sees.  The string views would still look like regular str
> instances to the user.

Note that String in Java was implemented as a view of underlying array 
of chars. This allowed sharing character data and fast (constant time) 
slicing. But the implementation was changed in Java 7u6.

http://java-performance.info/changes-to-string-java-1-7-0_06/


From dw+python-ideas at hmmz.org  Thu May  7 14:32:39 2015
From: dw+python-ideas at hmmz.org (David Wilson)
Date: Thu, 7 May 2015 12:32:39 +0000
Subject: [Python-ideas] discouraging direct use of the C-API
In-Reply-To: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
References: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
Message-ID: <20150507123239.GA1768@k3>

On Wed, May 06, 2015 at 10:23:09AM -0600, Eric Snow wrote:

> A big blocker to making certain sweeping changes to CPython (e.g.
> ref-counting) is compatibility with the vast body of C extension
> modules out there that use the C-API.  While there are certainly
> drastic long-term solutions to that problem, there is one thing we can
> do in the short-term that would at least get the ball rolling.  We can
> put a big red note at the top of every page of the C-API docs that
> encourages folks to either use CFFI or Cython.

One of CPython's traditional strongholds is its use as an embedded
language. I've worked on a bunch of commercial projects using it in this
way, often specifically for improved performance/access to interpreter
internals, and this is not to mention the numerous free software
projects doing similar: gdb, uwsgi, mod_python, Freeswitch, and so on.

It might be better to discuss specifics of what should change in the API
besides refcounting, and hammer out concrete steps to make those changes
happen, since I doubt the C API is ever going to go away, as even if all
extension modules were rewritten today its use for embedding would still
prevent sweeping changes without upsetting a huge number of users and
mature products.


David

From guettliml at thomas-guettler.de  Thu May  7 16:51:36 2015
From: guettliml at thomas-guettler.de (=?windows-1252?Q?Thomas_G=FCttler?=)
Date: Thu, 07 May 2015 16:51:36 +0200
Subject: [Python-ideas] Policy for altering sys.path
In-Reply-To: <20150507072223.GS429@tonks>
References: <554A1F8C.1040005@thomas-guettler.de>
 <CACac1F82g4tVxXeuYvtf1PQWidHpc+k-c7POxx8kLzmboW+jbw@mail.gmail.com>
 <554AFF69.9050404@thomas-guettler.de> <20150507064836.GR429@tonks>
 <554B0D3E.9020708@thomas-guettler.de> <20150507072223.GS429@tonks>
Message-ID: <554B7BF8.8070508@thomas-guettler.de>



Am 07.05.2015 um 09:22 schrieb Florian Bruhin:
> * Thomas Güttler <guettliml at thomas-guettler.de> [2015-05-07 08:59:10 +0200]:
>>
>>
>> Am 07.05.2015 um 08:48 schrieb Florian Bruhin:
>>> * Thomas Güttler <guettliml at thomas-guettler.de> [2015-05-07 08:00:09 +0200]:
>>>> Am 06.05.2015 um 17:07 schrieb Paul Moore:
>>>>> On 6 May 2015 at 15:05, Thomas Güttler <guettliml at thomas-guettler.de> wrote:
>>>>>> I am missing a policy how sys.path should be altered.
>>>>>
>>>>> Well, the docs say that applications can modify sys.path as needed.
>>>>> Generally, applications modify sys.path in place via sys.path[:] =
>>>>> whatever, but that's not mandated as far as I know.
>>>>>
>>>>>> We run a custom sub class of list in sys.path. We set it in sitecustomize.py
>>>>>
>>>>> Can you explain why?
>>>>
>>>> I forgot to explain the why I use a custom class. Sorry, here is the background.
>>>>
>>>> I want sys.path to ordered:
>>>>
>>>>   1. virtualenv
>>>>   2. /usr/local/
>>>>   3. /usr/lib
>>>>
>>>> We use virtualenvs with system site-packages.
>>>>
>>>> There are many places where sys.path gets altered.
>>>>
>>>> The last time we had sys.path problems I tried to write a test
>>>> which checks that sys.path is the same for cron jobs and web requests.
>>>> I failed. Too many places,  I could not find all the places
>>>> and the conditions where sys.path got modified in a different way.
>>>
>>> It looks like you explained *how* you do what you do, but not *why* -
>>> what problem is this solving? Why can't you just invoke the
>>> virtualenv's python and let python take care of sys.path?
>>
>> I want the sys.path be ordered like it, since I want that packages of the inner
>> environment are tried first.
>>
>> Here "inner" means "upper" in the above sys.path order.
>>
>> Example: If a package is installed in the virtualenv with version 2.2 and
>> in global site packages with version 1.0, then I want the interpreter to
>> use the version from virtualenv.
>
> That's already the default virtualenv behaviour:

If this is the behaviour in your virtualenv, that's nice for you.

In my virtualenv it was not that way. There are a lot of modules which do
magic with sys.path. I guess none of them are installed in your virtualenv.

Andrew Barnert suggested altering the module finder. That looks interesting.

   Thomas

From steve at pearwood.info  Thu May  7 17:31:24 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 8 May 2015 01:31:24 +1000
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <554AC2CE.5040705@btinternet.com>
References: <mi79rj$vl8$1@ger.gmane.org>
 <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <mi8q15$bbp$1@ger.gmane.org>
 <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7eYJgnFuAmbooLdKpyT3KCDfJxNweL3xBWVsRxLeHc97w@mail.gmail.com>
 <554AC2CE.5040705@btinternet.com>
Message-ID: <20150507153123.GT5663@ando.pearwood.info>

On Thu, May 07, 2015 at 02:41:34AM +0100, Rob Cliffe wrote:
> This is no doubt *not* the best platform to raise these thoughts (which 
> are nothing to do with Python - apologies), but I'm not sure where else 
> to go.
> I watch discussions like this ...
> I watch posts like this one [Nick's] ...
> ...  And I despair.  I really despair.
> 
> I am a very experienced but old (some would say "dinosaur") programmer.
> I appreciate the need for Unicode.  I really do.
> I don't understand Unicode and all its complications AT ALL.
> And I can't help wondering:
>     Why, oh why, do things have to be SO FU*****G COMPLICATED?  This 
> thread, for example, is way over my head.  And it is typical of many 
> discussions I have stared at, uncomprehendingly.
> Surely 65536 (2-byte) encodings are enough to express all characters in 
> all the languages in the world, plus all the special characters we need.

Not even close.

Unicode currently encodes over 74,000 CJK (Chinese/Japanese/Korean)
ideographs, which is comfortably larger than 2**16, so no 16-bit 
encoding can handle the complete range of CJK characters. 

It will probably take many more years before the entire CJK character 
set is added to Unicode, simply because the characters left to add are 
obscure and rare. Some may never be added at all, e.g. in 2007 Taiwan 
withdrew a submission to add 6,545 characters used as personal names as 
they were deemed to no longer be in use.

That's just *one* writing system. Then we add Latin, Cyrillic (Russian), 
Greek/Coptic, Arabic, Hebrew, Korea's other writing system Hangul, Thai, 
and dozens of others. (Fortunately, unlike Chinese characters, the other 
writing systems typically need only a few dozen or hundred characters, 
not tens of thousands.) Plus dozens of punctuation marks, symbols from 
mathematics, linguistics, and much more. And the Unicode Consortium 
projects that at least another five thousand characters will be added in 
version 8, and probably more beyond that.

So no, two bytes is not enough.

Unicode actually fits into 21 bits, which is a bit less than three 
bytes, but for machine efficiency four bytes will often be used.
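
The 21-bit ceiling is easy to verify from within Python itself; a small
illustrative snippet:

```python
import sys

# Unicode code points run from U+0000 to U+10FFFF, so they fit in 21 bits.
print(hex(sys.maxunicode))        # 0x10ffff
print(sys.maxunicode < 2 ** 21)   # True
print(len(chr(sys.maxunicode)))   # 1 -- the highest code point is one character
```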


> Why can't there be just *ONE* universal encoding?  (Decided upon, no 
> doubt, by some international standards committee. There would surely be 
> enough spare codes for any special characters etc. that might come up in 
> the foreseeable future.)

The problem isn't so much with the Unicode encodings (of which there are 
only a handful, and most of the time you only use one, UTF-8) but with 
the dozens and dozens of legacy encodings invented during the dark ages 
before Unicode.


> *Is it just historical accident* (partly due to an awkward move from 
> 1-byte ASCII to 2-byte Unicode, implemented in many different places, in 
> many different ways) *that we now have a patchwork of encodings that we 
> strive to fit into some over-complicated scheme*?

Yes, it is a historical accident. In the 1960s, 70s and 80s national 
governments and companies formed a plethora of one-byte (and occasional 
two-byte) encodings to support their own languages and symbols. E.g. in 
the 1980s, Apple used their own idiosyncratic set of 256 characters, 
which didn't match the 256 characters used on DOS, which was different 
again from those on Amstrad...

Unicode was started in the 1990s to bring order to that chaos. If you 
think things are complicated with Unicode, they would be much worse 
without it.


> Or is there *really* some *fundamental reason* why things *can't* be 
> simpler?  (Like, REALLY, _*REALLY*_ simple?)

90% of the complexity is due to the history of text encodings on various 
computer platforms. If people had predicted cheap memory and the 
Internet back in the early 1960s, perhaps we wouldn't have ended up with 
ASCII and the dozens of incompatible "Extended ASCII" encodings as we 
know them today.

But the other 90% of the complexity is inherent to human languages. For 
example, you know what the lower case of "I" is, don't you? It's "i". 
But not in Turkey, which has both a dotted and dotless version:

    I ı
    İ i

(Strangely, as far as I know, nobody has a dotted J or dotless j.)
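
Python's str methods already expose part of this asymmetry, even though
they use Unicode's locale-independent default mappings rather than
Turkish-specific rules; a small illustrative snippet:

```python
# Dotless lowercase ı (U+0131) uppercases to a plain I, but plain "i"
# does NOT lowercase back to ı -- str.lower()/str.upper() apply Unicode's
# default (non-Turkish) case mappings.
print("\u0131".upper())        # 'I'
print("I".lower())             # 'i' (not 'ı')

# Dotted capital İ (U+0130) lowercases to "i" plus a combining dot above,
# so the result is two code points long:
print(len("\u0130".lower()))   # 2
```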

Consequently, Unicode has a bunch of complexity related to left-to-right 
and right-to-left writing systems, accents, joiners, variant forms, and 
other issues. But, unless you're actually writing in a language which 
needs that, or writing a word-processor application, you can usually 
ignore all of that and just treat them as "characters".


> Imageine if we were starting to design the 21st century from scratch, 
> throwing away all the history?  How would we go about it?

Well, for starters I would insist on re-introducing thorn þ and eth ð 
back into English :-)


-- 
Steve

From steve at pearwood.info  Thu May  7 17:46:21 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 8 May 2015 01:46:21 +1000
Subject: [Python-ideas] Why don't CPython strings implement slicing
	using a view?
In-Reply-To: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com>
References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com>
Message-ID: <20150507154621.GU5663@ando.pearwood.info>

On Wed, May 06, 2015 at 07:05:15PM -0700, Neil Girdhar wrote:
> Since strings are constant, wouldn't it be much faster to implement string 
> slices as a view of other strings?

String or list views would be *very* useful in situations like this:

# Create a massive string
s = "some string"*1000000
for c in s[1:]:
    process(c)


which needlessly duplicates almost the entire string just to skip the 
first char. The same applies to lists or other sequences.
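
For the iteration case specifically, the copy can already be avoided
today with itertools.islice, which walks the original string lazily (a
workaround sketch, not a change to str itself):

```python
from itertools import islice

s = "some string" * 1000000

# islice skips the first character without building a second
# million-character string the way s[1:] would.
for c in islice(s, 1, None):
    pass  # process(c)
```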

But a view would be harmful in this situation:

s = "some string"*1000000
t = s[1:2]  # a view masquerading as a new string
del s

Now we keep the entire string alive long after it is needed.

How would you solve the first problem without introducing the second?



-- 
Steve

From tjreedy at udel.edu  Thu May  7 17:58:00 2015
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 07 May 2015 11:58:00 -0400
Subject: [Python-ideas] discouraging direct use of the C-API
In-Reply-To: <D9D64F1A-CADB-48D8-921E-419CE14F133E@stufft.io>
References: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
 <CADiSq7e41R-Tu8e2e=AF1Qt2UaH8H2_icedksCJLtLYsrK_qSA@mail.gmail.com>
 <loom.20150507T121330-401@post.gmane.org>
 <20150507113045.GQ5663@ando.pearwood.info>
 <D9D64F1A-CADB-48D8-921E-419CE14F133E@stufft.io>
Message-ID: <mig22b$beu$1@ger.gmane.org>

On 5/7/2015 7:44 AM, Donald Stufft wrote:

>>>> https://packaging.python.org/en/latest/extensions.html

>>> Hmm. I'm getting a twilio.com advertisement on that page.  I miss
>>> the old python.org...

>> I see it too. Why is python.org displaying advertisments?

> packaging.python.org is hosted on RTD, I guess that RTD added ads to
> its free service.

I don't see any ad (using Verizon FIOS). I have noscript running, but 
allowing first readthedocs.com and then grokthedocs.com did not produce 
an ad.

-- 
Terry Jan Reedy



From frankwoodall at gmail.com  Thu May  7 18:06:32 2015
From: frankwoodall at gmail.com (Frank Woodall)
Date: Thu, 7 May 2015 12:06:32 -0400
Subject: [Python-ideas] Handling lack of permissions/groups with pathlib's
	rglob
Message-ID: <CAFbwLUrGRjWZ_fqfsvZRxGxS+r6ceZaM91Nb+vPwiEPFScPjZg@mail.gmail.com>

Greetings,

I am attempting to use pathlib to recursively glob and/or find files. File
permissions and groups are all over the place due to poor management of the
filesystem which is out of my control.

The problem occurs when I lack both permissions and group membership to a
directory that rglob attempts to descend into. Rglob throws a KeyError and
then a PermissionError and finally stops entirely. I see no way to recover
gracefully from this and continue globbing. Is this the expected behavior
in this case?

The behavior that I want is for rglob to skip directories that I don't have
permissions on and to generate the list of everything that it saw/had
permissions on. The all or nothing nature isn't going to get me very far in
this particular case because I'm almost guaranteed to have bad permissions
on some directory or another on every run.

More specifics: Python: 3.4.1 (and 3.4.3) compiled from source for linux

Filesystem I am globbing on: automounted nfs share

How to reproduce:

mkdir /tmp/path_test && cd /tmp/path_test && mkdir dir1 dir2 dir2/dir3
&& touch dir1/file1 dir1/file2 dir2/file1 dir2/file2 dir2/dir3/file1
su
chmod 700 dir2/dir3/
chown root:root dir2/dir3/
exit

python 3.4.1

from pathlib import Path
p = Path('/tmp/path_test')
for x in p.rglob('*') : print(x)
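
Until pathlib grows such an option, one workaround is to recurse with
os.walk, whose onerror hook lets unreadable directories be skipped
instead of aborting the whole scan. A sketch (the helper name
tolerant_rglob is made up for illustration):

```python
import fnmatch
import os

def tolerant_rglob(root, pattern="*"):
    """Yield paths under root matching pattern, silently skipping
    directories that raise (e.g. PermissionError) when listed."""
    def skip(err):
        pass  # could log err.filename here instead of ignoring it
    for dirpath, dirnames, filenames in os.walk(root, onerror=skip):
        for name in dirnames + filenames:
            if fnmatch.fnmatch(name, pattern):
                yield os.path.join(dirpath, name)
```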
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150507/86c054e0/attachment.html>

From tjreedy at udel.edu  Thu May  7 18:11:08 2015
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 07 May 2015 12:11:08 -0400
Subject: [Python-ideas] Policy for altering sys.path
In-Reply-To: <CACac1F9ND0z2ujOiLwHFi3mEtO-HVtzXs4YKEOp-QYyqgseDvw@mail.gmail.com>
References: <554A1F8C.1040005@thomas-guettler.de>
 <CACac1F82g4tVxXeuYvtf1PQWidHpc+k-c7POxx8kLzmboW+jbw@mail.gmail.com>
 <554AFF69.9050404@thomas-guettler.de>
 <CACac1F9ND0z2ujOiLwHFi3mEtO-HVtzXs4YKEOp-QYyqgseDvw@mail.gmail.com>
Message-ID: <mig2qv$oqq$1@ger.gmane.org>

On 5/7/2015 4:31 AM, Paul Moore wrote:

> Generally, I would say that applications have every right to alter
> sys.path to suit their needs. Libraries (typically) shouldn't alter
> sys.path - in particular on import - without that being part of the
> documented API.

I agree.  Altering sys.path is an instance of monkeypatching, as is 
altering sys.std*.  Libraries that automatically do either on import 
limit their usefulness and should document their behavior.

-- 
Terry Jan Reedy


From mistersheik at gmail.com  Thu May  7 18:22:40 2015
From: mistersheik at gmail.com (Neil Girdhar)
Date: Thu, 7 May 2015 12:22:40 -0400
Subject: [Python-ideas] Why don't CPython strings implement slicing
 using a view?
In-Reply-To: <20150507154621.GU5663@ando.pearwood.info>
References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com>
 <20150507154621.GU5663@ando.pearwood.info>
Message-ID: <CAA68w_mP8iZqdfj-06SYqzBZ5y05bn+ZWBJuVGB-q4F6ENMWAg@mail.gmail.com>

One way is to leave CPython as is and create a string view class as a user.
The other is to make views hold a weakref to the target string and copy on
delete.

Anyway, I'm not really bothered about this.  Just wanted to see what people
thought.  You probably shouldn't be using str for really long strings,
which is the only time this would matter anyway.
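
A user-level version of the first option can be sketched in a few lines
(purely illustrative; StrView is a made-up name, and note that, like any
view, it keeps the whole target string alive):

```python
class StrView:
    """Read-only view over a slice of a string (illustrative sketch).

    Trade-off from the quoted message applies: the view holds a
    reference to the original string, keeping all of it alive.
    """
    __slots__ = ("_s", "_start", "_stop")

    def __init__(self, s, start=0, stop=None):
        n = len(s)
        stop = n if stop is None else min(stop, n)
        self._s, self._start, self._stop = s, min(start, n), max(stop, start)

    def __len__(self):
        return self._stop - self._start

    def __iter__(self):
        for i in range(self._start, self._stop):
            yield self._s[i]

    def __str__(self):
        # Materialise a real str copy only on demand.
        return self._s[self._start:self._stop]

s = "some string" * 1000
v = StrView(s, 1)            # skip the first char without copying
assert str(v) == s[1:]
assert len(v) == len(s) - 1
```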

On Thu, May 7, 2015 at 11:46 AM, Steven D'Aprano <steve at pearwood.info>
wrote:

> On Wed, May 06, 2015 at 07:05:15PM -0700, Neil Girdhar wrote:
> > Since strings are constant, wouldn't it be much faster to implement
> string
> > slices as a view of other strings?
>
> String or list views would be *very* useful in situations like this:
>
> # Create a massive string
> s = "some string"*1000000
> for c in s[1:]:
>     process(c)
>
>
> which needlessly duplicates almost the entire string just to skip the
> first char. The same applies to lists or other sequences.
>
> But a view would be harmful in this situation:
>
> s = "some string"*1000000
> t = s[1:2]  # a view masquerading as a new string
> del s
>
> Now we keep the entire string alive long after it is needed.
>
> How would you solve the first problem without introducing the second?
>
>
>
> --
> Steve
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
> --
>
> ---
> You received this message because you are subscribed to a topic in the
> Google Groups "python-ideas" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/python-ideas/II-4QRDb8Is/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> python-ideas+unsubscribe at googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150507/8941b936/attachment.html>

From solipsis at pitrou.net  Thu May  7 18:23:24 2015
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 7 May 2015 18:23:24 +0200
Subject: [Python-ideas] Handling lack of permissions/groups with
	pathlib's rglob
References: <CAFbwLUrGRjWZ_fqfsvZRxGxS+r6ceZaM91Nb+vPwiEPFScPjZg@mail.gmail.com>
Message-ID: <20150507182324.78706543@fsol>


Hello Frank,

On Thu, 7 May 2015 12:06:32 -0400
Frank Woodall <frankwoodall at gmail.com>
wrote:
> The problem occurs when I lack both permissions and group membership to a
> directory that rglob attempts to descend into. Rglob throws a KeyError and
> then a PermissionError and finally stops entirely. I see no way to recover
> gracefully from this and continue globbing. Is this the expected behavior
> in this case?

It is not unexpected :) Actually, this case was simply not envisioned.
I agree that being more lenient could be convenient here.

If you want to provide a patch for this, you can start at
https://docs.python.org/devguide/

Regards

Antoine.


> The behavior that I want is for rglob to skip directories that I don't have
> permissions on and to generate the list of everything that it saw/had
> permissions on. The all or nothing nature isn't going to get me very far in
> this particular case because I'm almost guaranteed to have bad permissions
> on some directory or another on every run.
> 
> More specifics: Python: 3.4.1 (and 3.4.3) compiled from source for linux
> 
> Filesystem I am globbing on: automounted nfs share
> 
> How to reproduce:
> 
> mkdir /tmp/path_test && cd /tmp/path_test && mkdir dir1 dir2 dir2/dir3
> && touch dir1/file1 dir1/file2 dir2/file1 dir2/file2 dir2/dir3/file1
> su
> chmod 700 dir2/dir3/
> chown root:root dir2/dir3/
> exit
> 
> python 3.4.1
> 
> from pathlib import Path
> p = Path('/tmp/path_test')
> for x in p.rglob('*') : print(x)
> 



From stefan at bytereef.org  Thu May  7 18:54:22 2015
From: stefan at bytereef.org (Stefan Krah)
Date: Thu, 7 May 2015 16:54:22 +0000 (UTC)
Subject: [Python-ideas] discouraging direct use of the C-API
References: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
 <CADiSq7e41R-Tu8e2e=AF1Qt2UaH8H2_icedksCJLtLYsrK_qSA@mail.gmail.com>
 <loom.20150507T121330-401@post.gmane.org>
 <20150507113045.GQ5663@ando.pearwood.info>
 <D9D64F1A-CADB-48D8-921E-419CE14F133E@stufft.io> <mig22b$beu$1@ger.gmane.org>
Message-ID: <loom.20150507T185200-914@post.gmane.org>

Terry Reedy <tjreedy at ...> writes:
> >>>> https://packaging.python.org/en/latest/extensions.html
> I don't see any ad (using Verizon FIOS). I have noscript running, but 
> allowing first readthedocs.com and then grokthedocs.com did not produce 
> an ad.


It's gone now.  It was there when I posted earlier (I even clicked through).



Stefan Krah





From stefan_ml at behnel.de  Thu May  7 19:23:41 2015
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 07 May 2015 19:23:41 +0200
Subject: [Python-ideas] discouraging direct use of the C-API
In-Reply-To: <CACac1F8dZuyvk47-v2e+ruJ9L0vxvUnGhgNJCen_OK-10e6JTw@mail.gmail.com>
References: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
 <554A8A79.2040306@egenix.com>
 <CALFfu7CDZ9rQWxQV2WbZ6m32-BeMDeZCT9oPs48LzqvaBxDsOg@mail.gmail.com>
 <554B1A9F.6010606@egenix.com>
 <CACac1F8dZuyvk47-v2e+ruJ9L0vxvUnGhgNJCen_OK-10e6JTw@mail.gmail.com>
Message-ID: <mig72t$312$1@ger.gmane.org>

Paul Moore schrieb am 07.05.2015 um 10:47:
> On 7 May 2015 at 08:56, M.-A. Lemburg wrote:
>> Aside: The fact that we have so many nice C extensions out
>> there is proof that we have a good C API. Even though it is
>> not visible to most Python programmers, it forms a significant
>> part of Python's success.

Oh, totally. But that doesn't mean people have to manually write code
against it, in the same way that you can benefit from excellent processors
without writing assembly.


> Maybe a useful exercise for someone thinking about this issue
> would be to survey some of the major projects using the C API out
> there, and working out what would be involved in switching them to use
> cffi or Cython. That would give a good idea of the scale of the issue,
> as well as providing some practical help to projects that would be
> affected by this sort of recommendation.

My general answer is that "Python is way easier to write than C", and
therefore "rewriting C code in Cython" is a rather fast thing to do (P's
and C's set as intended). Often enough, the rewrite also leads to immediate
functional improvements because stuff can easily be done in a more general
way in Python syntax than in plain C(-API) code. And it's not uncommon that
several ref-counting and/or error handling bugs get fixed on the way.

When I rewrite C-API code in Cython, the bulk of the time is spent reverse
engineering the intended Python semantics from the verbose (and sometimes
cryptic) C code. After that, writing them down in Python syntax is quite
easy. Once you get used to it, the plain transformation can be done at more
than a hundred lines of C code per hour, if it's not overly complex or
dense (the usual 5%). If you have a good test suite, debugging the
rewritten code should be quite straightforward afterwards.

So, if you have a project with 10000 lines of C code, 30% of which uses the
C-API, you should be able to rip out the direct usage of the C-API in just
a couple of days by rewriting it in Cython. The code size usually drops by
a factor of 2-5 that way. That also makes it a reasonable migration path
for porting Py2.x C-API code to Py3, for example.

I can't speak for cffi, but my guess is that if you know its API well, the
fact that it's also Python should keep the rewriting speed in the same ball
park as for Cython. So, for code that isn't performance critical, it's
certainly a reasonable alternative, with the added benefit of having
excellent support in PyPy.


> Good ones to look at would be:
> - lxml

lxml has been written in Cython even before Cython existed (it used to be a
patched Pyrex at the time). In fact, writing it in C would have been
entirely impossible. Even if the necessary developer resources had been
available, writing C code is so difficult in comparison that many of the
non-trivial features would never have been implemented.


> (I refrained from adding scipy and numpy to that list, as that would
> make this post seem like a troll attempt, which it isn't, but has
> anyone thought of the implications of a recommendation like this on
> those projects? OK, they'd probably just ignore it as they have a
> genuine need for direct use of the C API, but we would be sending
> pretty mixed messages).

Much of scipy and its surrounding tools and libraries are actually written
in Cython. At least much of their parts that interact with Python, and
often a lot more than just the interface layer. New code in the scientific
computing community is commonly written in Cython these days, or uses other
tools for JIT or AOT compilation (Numba, numexpr, ...), many of which were
themselves partly written in Cython.

Stefan



From levkivskyi at gmail.com  Thu May  7 19:50:29 2015
From: levkivskyi at gmail.com (Ivan Levkivskyi)
Date: Thu, 7 May 2015 19:50:29 +0200
Subject: [Python-ideas] (no subject)
In-Reply-To: <4B4608DC-F4FF-420F-8985-39201CFECA8F@yahoo.com>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <FA7C7E2C-28ED-4397-852E-F801C9183AFF@yahoo.com>
 <CAOMjWkkSK6iAQhnCTJ4JPjFioxregNz4xFu-S3NpX00p3ZnznQ@mail.gmail.com>
 <8C3A59B4-1C5B-4C67-A148-9ADBEE7123A7@yahoo.com>
 <CAOMjWk=fAzZxnTAU0baK5d631sjpQ4hykFiXhgBMBn3VJf-OEw@mail.gmail.com>
 <87lhh142c8.fsf@uwakimon.sk.tsukuba.ac.jp>
 <4B4608DC-F4FF-420F-8985-39201CFECA8F@yahoo.com>
Message-ID: <CAOMjWkmxKba-3DUB0zCQMT0JvzVWoFvGjMWurYnKD4yidc8njw@mail.gmail.com>

On May 7, 2015 9:46 AM, "Andrew Barnert" <abarnert at yahoo.com> wrote:
>
> On May 6, 2015, at 18:13, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> >
> > Ivan Levkivskyi writes:
> >
> >> Ok, I will try inspecting all existing approaches to find the one
> >> that seems more "right" to me :)
> >
> > If you do inspect all the approaches you can find, I hope you'll keep
> > notes and publish them, perhaps as a blog article.
> >
> >> In any case that approach could be updated by incorporating matrix
> >> @ as a dedicated operator for compositions.
> >
> > I think rather than "dedicated" you mean "suggested".  One of Andrew's
> > main points is that you're unlikely to find more than a small minority
> > agreeing on the "right" approach, no matter which one you choose.
>
> Whatever wording you use, I do think it's likely that at least some of
> the existing libraries would become much more readable just by using @ in
> place of what they currently use. Even better, it may also turn out that
> the @ notation just "feels right" with one solution to the argument
> problem and wrong with another, narrowing down the possibility space.
>
> So, I think it's definitely worth pushing the experiments if someone has
> the time and inclination, so I'm glad Ivan has volunteered.
>

Thank you for encouraging me. It will be definitely an interesting
experience to do this.

> >> At least, it seems that Erik from astropy likes this idea and it is
> >> quite natural for people with "scientific" background.
>
> I forgot to say before, but: it's great to have input from people coming
> from the MATLAB-y scientific/numeric world like him (I think) rather than
> just the Haskell/ML-y mathematical/CS world like you (Stephen, I think),
> as we usually get in these discussions. If there's one option that's
> universally obviously right to everyone in the first group, maybe
> everyone in the second group can shut up and deal with it. If not (which
> I think is likely, but I'll keep an open mind), well, at least we've got
> broader viewpoints and more data for Ivan's summary.
>
> > Sure, but as he also points out, when you know that you're going to be
> > composing only functions of one argument, the Unix pipe symbol is also
> > quite natural (as is Haskell's operator-less notation).  While one of
> > my hobbies is category theory (basically, the mathematical theory of
> > composable maps for those not familiar with the term), I find the Unix
> > pipeline somehow easier to think about than abstract composition,
> > although I believe they're equivalent (at least as composition is
> > modeled by category theory).
>
> I think you're right that they're equivalent in theory.
>
> But I feel like they're also equivalent in usability and readability (as
> in for 1/3 simple cases they're both fine, for 1/3 compose looks better,
> for 1/3 rcompose), but I definitely can't argue for that.
>
> What always throws me is that most languages that offer both choose
> different precedence (and sometimes associativity, too) for them. The
> consequence seems to be that when I just use compose and rcompose
> operators without thinking about it, I always get them right, but as
> soon as I ask myself "which one is like shell pipes?" or "why did I put
> parens here?" I get confused and have to go take a break before I can
> write any more code. Haskell's operatorless notation is nice because it
> prevents me from noticing what I'm doing and asking myself those
> questions. :)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150507/d3201639/attachment.html>

From levkivskyi at gmail.com  Thu May  7 20:01:14 2015
From: levkivskyi at gmail.com (Ivan Levkivskyi)
Date: Thu, 7 May 2015 20:01:14 +0200
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <CAOMjWkk7B9=QdkGsNJMf4F6d5jbDWp8AQ+xVyUz57VzT2bNcsA@mail.gmail.com>
References: <CAOMjWkk7B9=QdkGsNJMf4F6d5jbDWp8AQ+xVyUz57VzT2bNcsA@mail.gmail.com>
Message-ID: <CAOMjWk=GwB_BveiL_4O2=H65YeMBw_xUmgNadsNoy28ZGk6H-Q@mail.gmail.com>

> On Thu, May 7, 2015 at 7:09 PM, João Santos <jmcs at jsantos.eu> wrote:
> > On Wed, 6 May 2015 at 16:51 Steven D'Aprano <steve at pearwood.info> wrote:
> >>
> >>
> >> I think that there are some questions that would need to be answered.
> >> For instance, given some composition:
> >>
> >>     f = math.sin @ (lambda x: x**2)
> >>
> >> what would f.__name__ return? What about str(f)?
> >
> >
> > Lambdas return '<lambda>' so maybe something like '<composed>'?
> > Then str(f) would be '<function <composed> at 0xffffffffffff>'.
>
> Would be nice to use "<sin @ <lambda>>", incorporating both names, but
> that could get unwieldy once you compose a bunch of functions.

Maybe it would be better to have '<function <composed> at 0xffffffffffff>'
for str(f) but 'sin @ <lambda>' for repr(f). That way one can have more
info, and it would be closer to the ideal obj == eval(repr(obj)).
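For concreteness, a minimal sketch of what such a composition object might look like -- the class name `Composed` and the exact repr format are illustrative only, not part of any proposal:

```python
import math

class Composed:
    """Toy composition wrapper: (f @ g)(x) == f(g(x))."""
    def __init__(self, *funcs):
        self.funcs = funcs  # applied right-to-left, like math composition

    def __call__(self, *args, **kwargs):
        *outer, innermost = self.funcs
        result = innermost(*args, **kwargs)
        for f in reversed(outer):
            result = f(result)
        return result

    def __matmul__(self, other):
        # Composed(...) @ g  ->  append g as the new innermost function
        return Composed(*(self.funcs + (other,)))

    def __rmatmul__(self, other):
        # f @ Composed(...)  ->  prepend f as the new outermost function
        return Composed(*((other,) + self.funcs))

    @property
    def __name__(self):
        return "<composed>"

    def __repr__(self):
        # the informative form discussed above, e.g. 'sin @ <lambda>'
        return " @ ".join(getattr(f, "__name__", repr(f))
                          for f in self.funcs)

f = Composed(math.sin) @ (lambda x: x ** 2)  # f(x) == sin(x**2)
```

Note that at least one operand has to be wrapped, since plain functions and builtins don't define `__matmul__`; `__rmatmul__` covers the case where the wrapper is on the right.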
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150507/a118efac/attachment.html>

From tjreedy at udel.edu  Thu May  7 20:26:06 2015
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 07 May 2015 14:26:06 -0400
Subject: [Python-ideas] Why don't CPython strings implement slicing
	using a view?
In-Reply-To: <20150507154621.GU5663@ando.pearwood.info>
References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com>
 <20150507154621.GU5663@ando.pearwood.info>
Message-ID: <migao1$1kt$1@ger.gmane.org>

On 5/7/2015 11:46 AM, Steven D'Aprano wrote:
> On Wed, May 06, 2015 at 07:05:15PM -0700, Neil Girdhar wrote:
>> Since strings are constant, wouldn't it be much faster to implement string
>> slices as a view of other strings?
>
> String or list views would be *very* useful in situations like this:
>
> # Create a massive string
> s = "some string"*1000000
> for c in s[1:]:
>      process(c)

Easily done without slicing, as discussed on python-list multiple times.

it = iter(s)
next(it)
for c in it: process(c)

for s[5555:399999], use explicit indexes:

for i in range(5555, 399999): process(s[i])

or use islice.

The use case for sequence views is when one needs to keep around both 
the base sequence and the slices (views).
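For concreteness, the islice variant mentioned above, as a runnable sketch (`process` is a placeholder for whatever per-character work you are doing):

```python
from itertools import islice

s = "some string" * 1000
seen = []

def process(c):          # stand-in for the real per-character work
    seen.append(c)

# iterate over s[1:] without materializing a copy of the slice
for c in islice(s, 1, None):
    process(c)

assert "".join(seen) == s[1:]
```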

-- 
Terry Jan Reedy


From nad at acm.org  Thu May  7 20:27:40 2015
From: nad at acm.org (Ned Deily)
Date: Thu, 07 May 2015 11:27:40 -0700
Subject: [Python-ideas] Handling lack of permissions/groups with
	pathlib's rglob
References: <CAFbwLUrGRjWZ_fqfsvZRxGxS+r6ceZaM91Nb+vPwiEPFScPjZg@mail.gmail.com>
 <20150507182324.78706543@fsol>
Message-ID: <nad-EBC10D.11274007052015@news.gmane.org>

In article <20150507182324.78706543 at fsol>,
 Antoine Pitrou <solipsis at pitrou.net> 
 wrote:
> On Thu, 7 May 2015 12:06:32 -0400
> Frank Woodall <frankwoodall at gmail.com>
> wrote:
> > The problem occurs when I lack both permissions and group membership to a
> > directory that rglob attempts to descend into. Rglob throws a KeyError and
> > then a PermissionError and finally stops entirely. I see no way to recover
> > gracefully from this and continue globbing. Is this the expected behavior
> > in this case?
> It is not unexpected :) Actually, this case was simply not envisioned.
> I agree that being more lax could be convenient here.
> 
> If you want to provide a patch for this, you can start at
> https://docs.python.org/devguide/

Also there is an open issue about this that can be used to attach a 
patch or further discussion:

http://bugs.python.org/issue24120
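Until that issue is fixed, one workaround is to glob on top of os.walk, whose onerror hook lets you skip (or log) directories you cannot descend into instead of aborting. A sketch -- `tolerant_rglob` is a made-up helper name, not a pathlib API:

```python
import fnmatch
import os
import tempfile

def tolerant_rglob(root, pattern):
    """Yield paths under root whose basename matches pattern,
    silently skipping directories we lack permission to enter."""
    def on_error(exc):          # os.walk calls this instead of raising
        pass                    # ...log exc here if desired, then carry on
    for dirpath, _dirnames, filenames in os.walk(root, onerror=on_error):
        for name in filenames:
            if fnmatch.fnmatch(name, pattern):
                yield os.path.join(dirpath, name)

# tiny smoke test on a throwaway tree
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt"), "c.log"):
        open(os.path.join(root, rel), "w").close()
    found = sorted(os.path.basename(p)
                   for p in tolerant_rglob(root, "*.txt"))
    assert found == ["a.txt", "b.txt"]
```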

-- 
 Ned Deily,
 nad at acm.org


From chris.barker at noaa.gov  Thu May  7 20:32:31 2015
From: chris.barker at noaa.gov (Chris Barker)
Date: Thu, 7 May 2015 11:32:31 -0700
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <3D6D122B-68A4-439E-A875-EBE412AAC31B@yahoo.com>
References: <mi79rj$vl8$1@ger.gmane.org>
 <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp>
 <mi8q15$bbp$1@ger.gmane.org> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7eYJgnFuAmbooLdKpyT3KCDfJxNweL3xBWVsRxLeHc97w@mail.gmail.com>
 <554AC2CE.5040705@btinternet.com>
 <3D6D122B-68A4-439E-A875-EBE412AAC31B@yahoo.com>
Message-ID: <CALGmxEL8GBVtCApFcKoOp9DgH4TPb0GbAqvNH7571jX9H86TKw@mail.gmail.com>

My not-an-expert thoughts on these issues:

[NOTE: nested comments, so attribution may be totally confused]

> Why, oh why, do things have to be SO FU*****G COMPLICATED?

two reasons:

1) human languages are complicated, and they all have their idiosyncrasies
-- some are inherently better suited to machine interpretation, but the
real killer is that we want to use multiple languages with one system --
that IS inherently very complicated.

2) legacy decisions and backward compatibility -- this is what makes it
impossible to "simply" come up with a single best way to do it (or a few
ways, anyway...)

> Surely 65536 (2-byte) encodings are enough to express all characters in
> all the languages in the world, plus all the special characters we need.

That was once thought true -- but it turns out it's not -- darn!

Though we do think that 4 bytes is plenty, and to some extent I'm confused
as to why there isn't more use of UCS-4 -- sure it wastes a lot of space,
but everything in computing (memory, cache, disk space, bandwidth) is
orders of magnitude larger/faster than it was when the Unicode discussion
got started. But people don't like inefficiency and, in fact, as the newer
py3 Unicode objects show, we don't need to compromise on that.

> Or is there really some fundamental reason why things can't be simpler?
> (Like, REALLY, REALLY simple?)


Well, if there were no legacy systems, it still couldn't be REALLY, REALLY
simple (though UCS-4 is close), but there could be a LOT fewer ways to do
things: programming languages would have their own internal representation
(like Python does), and we would have a small handful of encodings
optimized for various things: UCS-4 for ease of use, utf-8 for small disk
storage (at least for Euro-centered text), and that would be that. But we
do have the legacies to deal with.




Apple, Microsoft, Sun, and a few other vendors jumped on the Unicode
> bandwagon early and committed themselves to the idea that 2 bytes is enough
> for everything. When the world discovered that wasn't true, we were stuck
> with a bunch of APIs that insisted on 2 bytes. Apple was able to partly
> make a break with that era, but Windows and Java are completely stuck with
> "Unicode means 16-bit" forever, which is why the whole world is stuck
> dealing with UTF-16 and surrogates forever.
>

I've read many of the rants about UTF-16, but in fact, it's really not any
worse than UTF-8 -- it's kind of a worst of both worlds -- not a set number
of bytes per char, and a lot of wasted space (particularly for euro
languages) -- but other than a bit of wasted space, it's just like UTF-8.

The problem is not UTF-16 itself, but the fact that a really surprising
number of APIs and programmers still think that it's UCS-2, rather than
UTF-16 -- painful. And the fact that, AFAIK, there really is no C++
Unicode type -- at least not one commonly used. Again -- legacy issues.

And there are still people creating filenames on Latin-1 filesystems on
> older Linux and Unix boxes,
>

This is the odd one to me -- reading about people's struggles with py3 and
*nix filenames -- they argue that *nix is not broken -- and the world
should just use char* for filenames and all is well! In fact, maybe it
would be easier to handle filenames as char* in some circumstances, but to
argue that a system is not broken when you can't know the encoding of
filenames, and there may be differently encoded filenames ON THE SAME
filesystem, is insane! Of course that is broken! It may be reality, and
maybe Py3 needs to do a bit more to accommodate it, but it is broken.

In fact, as much as I like to bash Windows, I've had NO problems with
assuming filenames in Windows are UTF-16 (as long as we use the "wide char"
APIs, sigh), and OS-X's specification of filenames as utf-8 works fine. So
Linux really needs to catch up here!

UTF-16 is a historical accident,
>

yeah, but it's not really a killer, either -- the problems come when people
assume UTF-16 is UCS-2, just like assuming that utf-8 is ascii (or any
one-byte encoding...)

> We really do need at least UTF-8 and UTF-32. But that's it. And I think
> that's simple enough.


is UTF-32 the same as UCS-4? Always a bit confused by that.

Oh, and endian issues -- *sigh*
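As an aside on that question: UTF-32 is, in effect, UCS-4 restricted to the Unicode code range (code points up to U+10FFFF), so in practice they encode characters identically. A quick Python sketch of the surrogate-pair and endianness points:

```python
ch = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the BMP

# UTF-32: one fixed-width 4-byte unit per code point (the UCS-4 layout)
assert ch.encode("utf-32-be") == b"\x00\x01\xd1\x1e"

# UTF-16: the same character needs a surrogate pair, D834 DD1E
assert ch.encode("utf-16-be") == b"\xd8\x34\xdd\x1e"

# and byte order matters for both, hence BOMs and the -le/-be variants
assert ch.encode("utf-16-le") != ch.encode("utf-16-be")
```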

> Aaaargh!  Do I really have to learn all this mumbo-jumbo?!  (Forgive me.
> :-) )


Some of it yes, I'm afraid so -- but probably not the surrogate pair stuff,
etc. That stuff is pretty esoteric, and really needs to be understood by
people writing APIs -- but for those of us that USE APIs, not so much.

For instance, Python's handling Unicode file names almost always "just
works" (as long as you stay in Python...)


-Chris


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150507/4ef24c1c/attachment-0001.html>

From njs at pobox.com  Thu May  7 21:19:01 2015
From: njs at pobox.com (Nathaniel Smith)
Date: Thu, 7 May 2015 12:19:01 -0700
Subject: [Python-ideas] discouraging direct use of the C-API
In-Reply-To: <mig72t$312$1@ger.gmane.org>
References: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
 <554A8A79.2040306@egenix.com>
 <CALFfu7CDZ9rQWxQV2WbZ6m32-BeMDeZCT9oPs48LzqvaBxDsOg@mail.gmail.com>
 <554B1A9F.6010606@egenix.com>
 <CACac1F8dZuyvk47-v2e+ruJ9L0vxvUnGhgNJCen_OK-10e6JTw@mail.gmail.com>
 <mig72t$312$1@ger.gmane.org>
Message-ID: <CAPJVwBnhjppRX0SXq1ZVtLsASWM5A-viry3U_0m5m1M4-0hhmg@mail.gmail.com>

On May 7, 2015 10:24 AM, "Stefan Behnel" <stefan_ml at behnel.de> wrote:
>
> Paul Moore schrieb am 07.05.2015 um 10:47:
>
> > (I refrained from adding scipy and numpy to that list, as that would
> > make this post seem like a troll attempt, which it isn't, but has
> > anyone thought of the implications of a recommendation like this on
> > those projects? OK, they'd probably just ignore it as they have a
> > genuine need for direct use of the C API, but we would be sending
> > pretty mixed messages).
>
> Much of scipy and its surrounding tools and libraries are actually written
> in Cython. At least much of their parts that interact with Python, and
> often a lot more than just the interface layer. New code in the scientific
> computing community is commonly written in Cython these days, or uses other
> tools for JIT or AOT compilation (Numba, numexpr, ...), many of which were
> themselves partly written in Cython.
> themselves partly written in Cython.

Yeah, I think if anyone talks to the developers of those libraries they
will get a very *un*mixed message saying, don't do what we did :-). One of
scipy's GSoC projects this year is even porting a c extension to Cython,
and I've been actively investigating the possibility of porting numpy into
Cython as well. Mostly for the immediate benefits, but certainly it has
occurred to me that in the long run this could potentially provide an
escape hatch from CPython. (Numerical people are *very* interested in
JITs... and something like Cython provides the unique possibility that if a
project like PyPy or pyston added direct support for the language, then one
could write a single source file that was fast on cpython b/c it compiled
to C, and was even faster on other interpreters because the same source got
jitted.)

The main obstacle to porting numpy, btw, is that Cython currently assumes
that each source file will generate one python extension, and any
communication between source files will be via python-level imports. NumPy,
of course, has 100,000 lines of C across lots of files that are all built
into one extension module, and which happily communicate via direct C
function calls. So incrementally porting is impossible without teaching
Cython to handle this case a bit better. NumPy is an extreme outlier in
this regard though. In particular this is absolutely not a reason to steer
*new* projects away from Cython.

-n
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150507/fda2b995/attachment.html>

From mistersheik at gmail.com  Thu May  7 21:29:57 2015
From: mistersheik at gmail.com (Neil Girdhar)
Date: Thu, 7 May 2015 15:29:57 -0400
Subject: [Python-ideas] Why don't CPython strings implement slicing
 using a view?
In-Reply-To: <migao1$1kt$1@ger.gmane.org>
References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com>
 <20150507154621.GU5663@ando.pearwood.info> <migao1$1kt$1@ger.gmane.org>
Message-ID: <CAA68w_=wka+x+psk=Q4z7xNcfFtXyQKo2JLjZXawJT9w1EC-aw@mail.gmail.com>

The point is to have a Pythonic way of saying that.  Using islice or
iterating over a range and indexing is ugly.  It would be cleaner to
implement a string class that implements fast slicing than those unpythonic
pieces of code.

Best,

Neil

On Thu, May 7, 2015 at 2:26 PM, Terry Reedy <tjreedy at udel.edu> wrote:

> On 5/7/2015 11:46 AM, Steven D'Aprano wrote:
>
>> On Wed, May 06, 2015 at 07:05:15PM -0700, Neil Girdhar wrote:
>>
>>> Since strings are constant, wouldn't it be much faster to implement
>>> string
>>> slices as a view of other strings?
>>>
>>
>> String or list views would be *very* useful in situations like this:
>>
>> # Create a massive string
>> s = "some string"*1000000
>> for c in s[1:]:
>>      process(c)
>>
>
> Easily done without slicing, as discussed on python-list multiple times.
>
> it = iter(s)
> next(it)
> for c in it: process(c)
>
> for s[5555: 399999], use explicit indexes
>
> for i in range(5555, 399999): process(s[i])
>
> or use islice.
>
> The use case for sequence views is when one needs to keep around both the
> base sequence and the slices (views).
>
> --
> Terry Jan Reedy
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
> --
>
> --- You received this message because you are subscribed to a topic in the
> Google Groups "python-ideas" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/python-ideas/II-4QRDb8Is/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> python-ideas+unsubscribe at googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150507/d622d65b/attachment.html>

From abarnert at yahoo.com  Thu May  7 21:37:22 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 7 May 2015 12:37:22 -0700
Subject: [Python-ideas] Why don't CPython strings implement slicing
	using a view?
In-Reply-To: <migao1$1kt$1@ger.gmane.org>
References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com>
 <20150507154621.GU5663@ando.pearwood.info> <migao1$1kt$1@ger.gmane.org>
Message-ID: <81FDF3E7-82A3-4ACA-9547-0A3F426371E8@yahoo.com>

On May 7, 2015, at 11:26, Terry Reedy <tjreedy at udel.edu> wrote:
> 
>> On 5/7/2015 11:46 AM, Steven D'Aprano wrote:
>>> On Wed, May 06, 2015 at 07:05:15PM -0700, Neil Girdhar wrote:
>>> Since strings are constant, wouldn't it be much faster to implement string
>>> slices as a view of other strings?
>> 
>> String or list views would be *very* useful in situations like this:
>> 
>> # Create a massive string
>> s = "some string"*1000000
>> for c in s[1:]:
>>     process(c)
> 
> Easily done without slicing, as discussed on python-list multiple times.
> 
> it = iter(s)
> next(it)
> for c in it: process(c)
> 
> for s[5555: 399999], use explicit indexes
> 
> for i in range(5555, 399999): process(s[i])
> 
> or use islice.
> 
> The use case for sequence views is when one needs to keep around both the base sequence and the slices (views).

Or where you need to keep around multiple views at once.

Since NumPy has native view-slicing, I suspect we can find a lot of good use cases there.

One question: when you slice a view, do you get a copy, or another view? Because if it's the latter, you can write view slices with view(s)[1:] instead of view(s, 1, None), which seems like a big readability win, but on the other hand it means a view doesn't act just like a normal sequence--e.g., v[:] no longer makes a copy. (NumPy does the latter, of course.)
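To make the trade-off concrete, here is a toy view type that takes the second option -- slicing a view yields another view over the same underlying string, so `v[:]` is not a copy. Names and semantics are illustrative only, not a proposal:

```python
class StrView:
    """Toy zero-copy view of a str; slicing a view yields another view."""
    __slots__ = ("_s", "_start", "_stop")

    def __init__(self, s, start=0, stop=None):
        self._s = s
        self._start = start
        self._stop = len(s) if stop is None else stop

    def __len__(self):
        return self._stop - self._start

    def __getitem__(self, key):
        if isinstance(key, slice):
            start, stop, step = key.indices(len(self))
            if step != 1:
                raise ValueError("toy sketch: only contiguous slices")
            # another view over the same string -- no copy, which also
            # means view[:] is NOT a copy (the NumPy-style behavior)
            return StrView(self._s, self._start + start, self._start + stop)
        if key < 0:
            key += len(self)
        if not 0 <= key < len(self):
            raise IndexError(key)
        return self._s[self._start + key]

    def __str__(self):
        # materializing is the only point where a copy happens
        return self._s[self._start:self._stop]
```

With the copy-on-slice alternative, `__getitem__` would instead return `self._s[...]` directly for slices, and the `view(s)[1:]` spelling would lose its zero-copy appeal.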


> -- 
> Terry Jan Reedy
> 
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From edk141 at gmail.com  Thu May  7 21:57:08 2015
From: edk141 at gmail.com (Ed Kellett)
Date: Thu, 07 May 2015 19:57:08 +0000
Subject: [Python-ideas] Why don't CPython strings implement slicing
 using a view?
In-Reply-To: <81FDF3E7-82A3-4ACA-9547-0A3F426371E8@yahoo.com>
References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com>
 <20150507154621.GU5663@ando.pearwood.info> <migao1$1kt$1@ger.gmane.org>
 <81FDF3E7-82A3-4ACA-9547-0A3F426371E8@yahoo.com>
Message-ID: <CABmzr0gs2G5oV5zuGynUuH4oPck0Vy7K4e=PgbOKRpjivS8icQ@mail.gmail.com>

On Thu, 7 May 2015 at 20:40 Andrew Barnert via Python-ideas <
python-ideas at python.org> wrote:
>
> One question: when you slice a view, do you get a copy, or another view?
> Because if it's the latter, you can write view slices with view(s)[1:]
> instead of view(s, 1, None), which seems like a big readability win, but on
> the other hand it means a view doesn't act just like a normal
> sequence--e.g., v[:] no longer makes a copy. (NumPy does the latter, of
> course.)


Well, in the context of strings it doesn't matter. (Or, in some sense, not
copying immutable strings is a viable implementation technique for copying
them.) CPython already knows that:

>>> x = "foo"
>>> x is x[:]
True

Ed Kellett
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150507/d34d911f/attachment-0001.html>

From breamoreboy at yahoo.co.uk  Thu May  7 22:02:19 2015
From: breamoreboy at yahoo.co.uk (Mark Lawrence)
Date: Thu, 07 May 2015 21:02:19 +0100
Subject: [Python-ideas] Why don't CPython strings implement slicing
	using a view?
In-Reply-To: <CAA68w_=wka+x+psk=Q4z7xNcfFtXyQKo2JLjZXawJT9w1EC-aw@mail.gmail.com>
References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com>
 <20150507154621.GU5663@ando.pearwood.info> <migao1$1kt$1@ger.gmane.org>
 <CAA68w_=wka+x+psk=Q4z7xNcfFtXyQKo2JLjZXawJT9w1EC-aw@mail.gmail.com>
Message-ID: <miggcd$4nh$1@ger.gmane.org>

On 07/05/2015 20:29, Neil Girdhar wrote:
> The point is to have a Pythonic way of saying that.  Using islice or
> iterating over a range and indexing is ugly.  It would be cleaner to
> implement a string class that implements fast slicing than those unpythonic
> pieces of code.
>
> Best,
>
> Neil

I don't see anything unpythonic there at all, just standard Python.  If 
you want fast slicing that badly you can write the class yourself, unless
somebody beats you to it because their itch is more painful.

>
> On Thu, May 7, 2015 at 2:26 PM, Terry Reedy <tjreedy at udel.edu> wrote:
>
>> On 5/7/2015 11:46 AM, Steven D'Aprano wrote:
>>
>>> On Wed, May 06, 2015 at 07:05:15PM -0700, Neil Girdhar wrote:
>>>
>>>> Since strings are constant, wouldn't it be much faster to implement
>>>> string
>>>> slices as a view of other strings?
>>>>
>>>
>>> String or list views would be *very* useful in situations like this:
>>>
>>> # Create a massive string
>>> s = "some string"*1000000
>>> for c in s[1:]:
>>>       process(c)
>>>
>>
>> Easily done without slicing, as discussed on python-list multiple times.
>>
>> it = iter(s)
>> next(it)
>> for c in it: process(c)
>>
>> for s[5555: 399999], use explicit indexes
>>
>> for i in range(5555, 399999): process(s[i])
>>
>> or use islice.
>>
>> The use case for sequence views is when one needs to keep around both the
>> base sequence and the slices (views).
>>
>> --
>> Terry Jan Reedy
>>

-- 
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence


From skip.montanaro at gmail.com  Thu May  7 22:12:58 2015
From: skip.montanaro at gmail.com (Skip Montanaro)
Date: Thu, 7 May 2015 15:12:58 -0500
Subject: [Python-ideas] Why don't CPython strings implement slicing
 using a view?
In-Reply-To: <CABmzr0gs2G5oV5zuGynUuH4oPck0Vy7K4e=PgbOKRpjivS8icQ@mail.gmail.com>
References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com>
 <20150507154621.GU5663@ando.pearwood.info>
 <migao1$1kt$1@ger.gmane.org>
 <81FDF3E7-82A3-4ACA-9547-0A3F426371E8@yahoo.com>
 <CABmzr0gs2G5oV5zuGynUuH4oPck0Vy7K4e=PgbOKRpjivS8icQ@mail.gmail.com>
Message-ID: <CANc-5Uz_TJ9YiibN_q15t7-mQamD-0qN=UGLGV5R=pHGYBHtkw@mail.gmail.com>

I haven't seen anyone else mention it, so I will point out:
interoperability with C. In C, strings are NUL-terminated. PyStringObject
instances do (or used to) have NUL-terminated strings in them. According to
unicodeobject.h, that seems still to be the case:

typedef struct {
    /* There are 4 forms of Unicode strings:
    ...
    wchar_t *wstr;              /* wchar_t representation (null-terminated) */
} PyASCIIObject;

and:

typedef struct {
    PyASCIIObject _base;
    Py_ssize_t utf8_length;     /* Number of bytes in utf8, excluding the
                                 * terminating \0. */
    char *utf8;                 /* UTF-8 representation (null-terminated) */
    Py_ssize_t wstr_length;     /* Number of code points in wstr, possible
                                 * surrogates count as two code points. */
} PyCompactUnicodeObject;

The raw string is NUL-terminated, precisely so copying isn't required in
most cases before passing to C. Making s[1:-1] a view onto the underlying
string data in s would require you to copy the data when you want to pass
the view into C so you could tack on that NUL. That happens a lot, so it's
likely you wouldn't save much work, and it would result in a lot more churn
in Python's memory allocator. The only place you could avoid the copy is if
the view you are dealing with is a strict suffix of s.
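The effect is easy to see from Python with ctypes: a C consumer stops at the first NUL, so a mid-string view would have to be materialized (copied) before crossing into C. A sketch of the point, not of any proposed API:

```python
import ctypes

raw = b"hello world"   # CPython keeps a trailing NUL after the data

# C sees the whole string: the terminator is right where C expects it
assert ctypes.c_char_p(raw).value == b"hello world"

# a zero-copy view of the first five bytes has no NUL after "hello" in
# the underlying buffer, so handing it to C means copying it out first
view = memoryview(raw)[:5]
assert ctypes.c_char_p(bytes(view)).value == b"hello"

# only a strict suffix could reuse the original terminator copy-free
assert raw[6:] == b"world"
```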

Skip
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150507/45214e69/attachment.html>

From abarnert at yahoo.com  Thu May  7 23:04:55 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 7 May 2015 14:04:55 -0700
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <CALGmxEL8GBVtCApFcKoOp9DgH4TPb0GbAqvNH7571jX9H86TKw@mail.gmail.com>
References: <mi79rj$vl8$1@ger.gmane.org>
 <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <mi8q15$bbp$1@ger.gmane.org>
 <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7eYJgnFuAmbooLdKpyT3KCDfJxNweL3xBWVsRxLeHc97w@mail.gmail.com>
 <554AC2CE.5040705@btinternet.com>
 <3D6D122B-68A4-439E-A875-EBE412AAC31B@yahoo.com>
 <CALGmxEL8GBVtCApFcKoOp9DgH4TPb0GbAqvNH7571jX9H86TKw@mail.gmail.com>
Message-ID: <08424643-0DB2-4632-A2E5-45C67818E615@yahoo.com>

On May 7, 2015, at 11:32, Chris Barker <chris.barker at noaa.gov> wrote:
> 
> My not-an-expert thoughts on these issues:
> 
> [NOTE: nested comments, so attribution may be totally confused]
> 
>>>     Why, oh why, do things have to be SO FU*****G COMPLICATED?
> two reasons:
> 
> 1) human languages are complicated, and they all have their idiosyncrasies -- some are inherently better suited to machine interpretation, but the real killer is that we want to use multiple languages with one system -- that IS inherently very complicated.
> 
> 2) legacy decisions and backward compatibility -- this is what makes it impossible to "simply" come up with a single best way to do it (or a few ways, anyway...)
>>> Surely 65536 (2-byte) encodings are enough to express all characters in all the languages in the world, plus all the special characters we need.
> That was once thought true -- but it turns out it's not -- darn!
> 
> Though we do think that 4 bytes is plenty, and to some extent I'm confused as to why there isn't more use of UCS-4 -- sure it wastes a lot of space, but everything in computing (memory, cache, disk space, bandwidth) is orders of magnitude larger/faster than it was when the Unicode discussion got started. But people don't like inefficiency and, in fact, as the newer py3 Unicode objects show, we don't need to compromise on that.
> 
>> Or is there really some fundamental reason why things can't be simpler?  (Like, REALLY, REALLY simple?)
> 
> 
> Well, if there were no legacy systems, it still couldn't be REALLY, REALLY simple (though UCS-4 is close), but there could be a LOT fewer ways to do things: programming languages would have their own internal representation (like Python does), and we would have a small handful of encodings optimized for various things: UCS-4 for ease of use, utf-8 for small disk storage (at least for Euro-centered text), and that would be that. But we do have the legacies to deal with.
>  
> 
>> Apple, Microsoft, Sun, and a few other vendors jumped on the Unicode bandwagon early and committed themselves to the idea that 2 bytes is enough for everything. When the world discovered that wasn't true, we were stuck with a bunch of APIs that insisted on 2 bytes. Apple was able to partly make a break with that era, but Windows and Java are completely stuck with "Unicode means 16-bit" forever, which is why the whole world is stuck dealing with UTF-16 and surrogates forever.
> 
> I've read many of the rants about UTF-16, but in fact, it's really not any worse than UTF-8 -- it's kind of a worst of both worlds: not a set number of bytes per char, yet a lot of wasted space (particularly for euro languages) -- but other than a bit of wasted space, it's just like UTF-8.
> 
> The problem is not UTF-16 itself, but the fact that a really surprising number of APIs and programmers still think that it's UCS-2, rather than UTF-16 -- painful.

But this makes UTF-16 an attractive nuisance. When people use UTF-16, it's not because it happens to save 12% storage or 3% CPU over UTF-8 for some particular corpus, it's because it lets either them or some API they're dealing with pretend Unicode == UCS-2 so they can write buggy code quickly instead of proper code almost as quickly. If we'd never had UCS-2, and invented UTF-16 only now, I don't think anyone would use it; therefore, it would be better if we didn't have it.

> And the fact that, AFAIK, there really is no C++ Unicode type -- at least not one commonly used. 

I've got no problem with the fact that they defined UTF-8, UTF-16, and UTF-32 types instead of a Unicode type. In a language where strings are just pointers to arrays of characters, what would a Unicode type even mean?

> Again -- legacy issues.

>> And there are still people creating filenames on Latin-1 filesystems on older Linux and Unix boxes,
> 
> This is the odd one to me -- reading about people's struggles with py3 and *nix filenames -- they argue that *nix is not broken -- and the world should just use char* for filenames and all is well! In fact, maybe it would be easier to handle filenames as char* in some circumstances, but to argue that a system is not broken when you can't know the encoding of filenames, and there may be differently encoded filenames ON THE SAME filesystem, is insane! Of course that is broken! It may be reality, and maybe Py3 needs to do a bit more to accommodate it, but it is broken.
> 
> In fact, as much as I like to bash Windows, I've had NO problems with assuming filenames in Windows are UTF-16 (as long as we use the "wide char" APIs, sigh), and OS-X's specification of filenames as utf-8 works fine. So Linux really needs to catch up here!

I _almost_ like OS X's approach here. If you've got files on a filesystem that aren't in UTF-8 (or that a filesystem driver can't transparently represent as UTF-8 because it stores some other static, per-fs, or per-file encoding, like NTFS's static UTF-16-LE), you see those files as UTF-8 anyway. That means some are mojibake. And maybe some either aren't accessible at all, or are accessible through names the filesystem invented that mean nothing. Too bad, here are some tools to repair your broken filesystem if that's a problem for you.

The problem is, those tools are only available at way too high a level. If they just put the real bytes for an undecodable filename right in an extra DIRENTRY slot, anyone could easily write tools to help the user fix it that work at the normal filesystem level. ("rename --transcode-from=Latin-1 broken/*" would require adding 11 lines of trivial code to rename.pl, including the lines for processing the flag and dealing with post-transcoding collisions, if that information were available.)
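A rough sketch of what such a repair helper could look like in Python, assuming POSIX-style bytes filenames; `fixed_name` and `transcode_dir` are made-up names here, not any real tool's API:

```python
import os

def fixed_name(name, src='latin-1'):
    """Re-encode a non-UTF-8 filename to UTF-8; leave valid names alone."""
    try:
        name.decode('utf-8')
        return name                      # already valid UTF-8
    except UnicodeDecodeError:
        return name.decode(src).encode('utf-8')

def transcode_dir(dirpath=b'.'):
    for name in os.listdir(dirpath):     # bytes in, bytes out on POSIX
        new = fixed_name(name)
        if new != name:                  # (a real tool would check collisions)
            os.rename(os.path.join(dirpath, name),
                      os.path.join(dirpath, new))
```

The whole point of the complaint above is that a filesystem would have to expose the raw undecodable bytes for something this simple to work.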

But Apple doesn't seem to care about making those tools writable at that level. Which means there's no chance in hell of GNU or BSD following Apple's lead. So no one's ever going to solve it; we'll just close our eyes and hope that eventually it's as rare a problem as dealing with Atari or EBCDIC source code is today, so we can declare it solved-enough-I-guess.

>> UTF-16 is a historical accident,
> 
> yeah, but it's not really a killer, either -- the problems come when people assume UTF-16 is UCS-2, just like assuming that utf-8 is ascii (or any one-byte encoding...)
> 
>>  We really do need at least UTF-8 and UTF-32. But that's it. And I think that's simple enough.
> 
> Is UTF-32 the same as UCS-4? Always a bit confused by that.

Technically, UTF-32 is a subset of UCS-4. UCS-4 is an encoding of 31-bit values in 4 octets by leaving the top bit 0. UTF-32 is an encoding of 21-bit values in 32 bits by leaving the top 11 bits 0. So if you're using them to transmit Unicode code points (the only use they're defined for), they're identical.
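Concretely, in Python 3 (the emoji code point is just an arbitrary example):

```python
# UTF-32 is fixed-width: every code point becomes exactly 4 bytes,
# with the top bits always zero since code points stop at U+10FFFF.
cp = "\U0001F600"
raw = cp.encode("utf-32-be")            # big-endian, no BOM
assert raw == b"\x00\x01\xf6\x00"
assert int.from_bytes(raw, "big") <= 0x10FFFF
```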

> Oh, and endian issues -- *sigh*

Yes, big-endian-only is order #13 on my plans if I ever became supreme dictator. (Unless my advisors want to argue about big vs. little; in that case, I give them 4 hours to debate it, then drop them all in the crocodile pit and flip a coin.)

>>  Aaaargh!  Do I really have to learn all this mumbo-jumbo?!  (Forgive me. :-) )
> 
> Some of it yes, I'm afraid so -- but probably not the surrogate pair stuff, etc. That stuff is pretty esoteric, and really needs to be understood by people writing APIs -- but for those of us that USE APIs, not so much.
> 
> For instance, Python's handling Unicode file names almost always "just works" (as long as you stay in Python...)
> 
> 
> -Chris
> 
> 
> -- 
> 
> Christopher Barker, Ph.D.
> Oceanographer
> 
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
> 
> Chris.Barker at noaa.gov
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150507/f56d0b98/attachment-0001.html>

From mistersheik at gmail.com  Fri May  8 00:09:34 2015
From: mistersheik at gmail.com (Neil Girdhar)
Date: Thu, 7 May 2015 18:09:34 -0400
Subject: [Python-ideas] Why don't CPython strings implement slicing
 using a view?
In-Reply-To: <CANc-5Uz_TJ9YiibN_q15t7-mQamD-0qN=UGLGV5R=pHGYBHtkw@mail.gmail.com>
References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com>
 <20150507154621.GU5663@ando.pearwood.info> <migao1$1kt$1@ger.gmane.org>
 <81FDF3E7-82A3-4ACA-9547-0A3F426371E8@yahoo.com>
 <CABmzr0gs2G5oV5zuGynUuH4oPck0Vy7K4e=PgbOKRpjivS8icQ@mail.gmail.com>
 <CANc-5Uz_TJ9YiibN_q15t7-mQamD-0qN=UGLGV5R=pHGYBHtkw@mail.gmail.com>
Message-ID: <CAA68w_k7GxrSzpVMb4x6uv0tSA2aWNWENQaatCYN3KnQSA+Z3g@mail.gmail.com>

This was a CPython idea after all, so I was assuming a C implementation,
which means that new flags would have to be added to the string object to
denote a string slice, etc.

Like I said in another message: it's not that important to me though.  I
was just curious as to why CPython was designed so that string slicing is
linear rather than constant time given that strings are constant.

I'm getting the impression that the payoff is not worth the complexity.

On Thu, May 7, 2015 at 4:12 PM, Skip Montanaro <skip.montanaro at gmail.com>
wrote:

> I haven't seen anyone else mention it, so I will point out:
> interoperability with C. In C, strings are NUL-terminated. PyStringObject
> instances do (or used to) have NUL-terminated strings in them. According to
> unicodeobject.h, that seems still to be the case:
>
> typedef struct {
>     /* There are 4 forms of Unicode strings:
>     ...
>     wchar_t *wstr;              /* wchar_t representation (null-terminated) */
> } PyASCIIObject;
>
> and:
>
> typedef struct {
>     PyASCIIObject _base;
>     Py_ssize_t utf8_length;     /* Number of bytes in utf8, excluding the
>                                  * terminating \0. */
>     char *utf8;                 /* UTF-8 representation (null-terminated) */
>     Py_ssize_t wstr_length;     /* Number of code points in wstr, possible
>                                  * surrogates count as two code points. */
> } PyCompactUnicodeObject;
>
> The raw string is NUL-terminated, precisely so copying isn't required in
> most cases before passing to C. Making s[1:-1] a view onto the underlying
> string data in s would require you to copy the data when you want to pass
> the view into C so you could tack on that NUL. That happens a lot, so it's
> likely you wouldn't save much work, and result in a lot more churn in
> Python's memory allocator. The only place you could avoid the copy is if
> the view you are dealing with is a strict suffix of s.
>
> Skip
>
>  --
>
> ---
> You received this message because you are subscribed to a topic in the
> Google Groups "python-ideas" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/python-ideas/II-4QRDb8Is/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> python-ideas+unsubscribe at googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150507/9027ba85/attachment.html>

From stephen at xemacs.org  Fri May  8 00:30:11 2015
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 08 May 2015 07:30:11 +0900
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <CALGmxEL8GBVtCApFcKoOp9DgH4TPb0GbAqvNH7571jX9H86TKw@mail.gmail.com>
References: <mi79rj$vl8$1@ger.gmane.org>
 <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp>
 <mi8q15$bbp$1@ger.gmane.org>
 <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7eYJgnFuAmbooLdKpyT3KCDfJxNweL3xBWVsRxLeHc97w@mail.gmail.com>
 <554AC2CE.5040705@btinternet.com>
 <3D6D122B-68A4-439E-A875-EBE412AAC31B@yahoo.com>
 <CALGmxEL8GBVtCApFcKoOp9DgH4TPb0GbAqvNH7571jX9H86TKw@mail.gmail.com>
Message-ID: <87a8xg3tsc.fsf@uwakimon.sk.tsukuba.ac.jp>

Chris Barker writes:

 > I've read many of the rants about UTF-16, but in fact, it's really
 > not any worse than UTF-8

Yes, it is.  It's not ASCII compatible.  You can safely use the usual
libc string APIs on UTF-8 (except for any that might return only part
of a string), but not on UTF-16 (nulls).  This is a pretty big
advantage for UTF-8 in practice.
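The point is easy to demonstrate from Python (sample text made up):

```python
text = "Hi"
# UTF-8 leaves ASCII bytes untouched, so NUL-terminated libc string
# functions can pass it through safely:
assert text.encode("utf-8") == b"Hi"
# UTF-16 interleaves NUL bytes, which C's str* functions treat as
# terminators:
assert text.encode("utf-16-le") == b"H\x00i\x00"
```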





From rosuav at gmail.com  Fri May  8 03:40:09 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Fri, 8 May 2015 11:40:09 +1000
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <20150507153123.GT5663@ando.pearwood.info>
References: <mi79rj$vl8$1@ger.gmane.org>
 <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp>
 <mi8q15$bbp$1@ger.gmane.org>
 <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7eYJgnFuAmbooLdKpyT3KCDfJxNweL3xBWVsRxLeHc97w@mail.gmail.com>
 <554AC2CE.5040705@btinternet.com>
 <20150507153123.GT5663@ando.pearwood.info>
Message-ID: <CAPTjJmpL-OzNLUgwJY6OLpcHrCJ+U9mrNgaKuXS_5fyuYJ4zfw@mail.gmail.com>

On Fri, May 8, 2015 at 1:31 AM, Steven D'Aprano <steve at pearwood.info> wrote:
> But the other 90% of the complexity is inherent to human languages. For
> example, you know what the lower case of "I" is, don't you? It's "i".
> But not in Turkey, which has both a dotted and dotless version:
>
>     I ı
>     İ i
>
> (Strangely, as far as I know, nobody has a dotted J or dotless j.)
>
> Consequently, Unicode has a bunch of complexity related to left-to-right
> and right-to-left writing systems, accents, joiners, variant forms, and
> other issues. But, unless you're actually writing in a language which
> needs that, or writing a word-processor application, you can usually
> ignore all of that and just treat them as "characters".

Or a transliteration script. Imagine you have a whole lot of videos
with text over them, and you'd like to transcribe that text into,
well, a text file. It's pretty easy with Latin-based scripts; just
come up with a notation for keying in diacriticals and the handful of
other characters (slashed O for Norwegian, D with bar for Vietnamese,
etc), then (optionally) perform an NFC transformation, and job's done.
Cyrillic, Greek, Elder Futhark, and even IPA, can be handled fairly
readily by means of simple reversible transliterations (д becomes d, d
becomes д), with a handful of special cases (the Greek sigma has
medial (σ) and final (ς) forms, both of which translate into the Latin
letter 's'). Korean's hangul syllables are a slightly odd case,
because they can be NFC composed from individual letters, but the
decomposed forms take up more space on the page, which makes the NFC
transformation mandatory:

"hanguk" = "\u1112\u1161\u11ab\u1100\u116e\u11a8" = "\ud55c\uad6d" = "Korea"
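For the record, Python's unicodedata module shows the same composition:

```python
import unicodedata

jamo = "\u1112\u1161\u11ab\u1100\u116e\u11a8"   # decomposed hangul letters
# NFC packs the conjoining jamo into precomposed syllables...
assert unicodedata.normalize("NFC", jamo) == "\ud55c\uad6d"
# ...and NFD reverses it losslessly.
assert unicodedata.normalize("NFD", "\ud55c\uad6d") == jamo
```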

Aside from that, all the complexities are, as Steven says, inherent to
human languages. Unicode isn't the problem; Unicode is just reflecting
the fact that people write stuff differently. Python also isn't the
problem; Python is one of my top two preferred languages for any sort
of international work (the other being Pike, and for all the same
reasons).

>> Imagine if we were starting to design the 21st century from scratch,
>> throwing away all the history?  How would we go about it?
>
> Well, for starters I would insist on re-introducing thorn þ and eth ð
> back into English :-)

Sure, that'll unify us with ancient texts, and with modern Icelandic.
But what about other languages with the same sound (IPA: θ)? European
Spanish (though not Mexican Spanish) spells it as "z" - English could
do the same, given that "s" is able to make the same sound "z" does in
English. :)

But seriously, the alphabetic languages aren't much of a problem.
Unicode can cope with European languages easily. What I'd want to
change is to use some form of phonetic system for Chinese and Japanese
languages - a system in which the written form does its best to
correspond to the spoken form, rather than the massively complex
pictorial system now in use. At very least, I'd like to see an
alternative written form used for names, in which they're composed of
sounds; that way, there'd be a finite set of characters in use, and
it'd be far easier for us to cope with them. (The problem of a
collision would be no worse than already exists when names are said
aloud. Having multiple characters pronounced the same way is a benefit
only to the written form.) It's too late now, of course.

ChrisA

From steve at pearwood.info  Fri May  8 03:54:56 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 8 May 2015 11:54:56 +1000
Subject: [Python-ideas] Why don't CPython strings implement slicing
	using a view?
In-Reply-To: <CANc-5Uz_TJ9YiibN_q15t7-mQamD-0qN=UGLGV5R=pHGYBHtkw@mail.gmail.com>
References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com>
 <20150507154621.GU5663@ando.pearwood.info> <migao1$1kt$1@ger.gmane.org>
 <81FDF3E7-82A3-4ACA-9547-0A3F426371E8@yahoo.com>
 <CABmzr0gs2G5oV5zuGynUuH4oPck0Vy7K4e=PgbOKRpjivS8icQ@mail.gmail.com>
 <CANc-5Uz_TJ9YiibN_q15t7-mQamD-0qN=UGLGV5R=pHGYBHtkw@mail.gmail.com>
Message-ID: <20150508015455.GV5663@ando.pearwood.info>

On Thu, May 07, 2015 at 03:12:58PM -0500, Skip Montanaro wrote:
> I haven't seen anyone else mention it, so I will point out:
> interoperability with C. In C, strings are NUL-terminated. PyStringObject
> instances do (or used to) have NUL-terminated strings in them. According to
> unicodeobject.h, that seems still to be the case:

How does that work? Python strings can contain embedded NULs:

s = u"abc\0def"


-- 
Steve

From rosuav at gmail.com  Fri May  8 04:02:01 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Fri, 8 May 2015 12:02:01 +1000
Subject: [Python-ideas] Why don't CPython strings implement slicing
 using a view?
In-Reply-To: <20150508015455.GV5663@ando.pearwood.info>
References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com>
 <20150507154621.GU5663@ando.pearwood.info>
 <migao1$1kt$1@ger.gmane.org>
 <81FDF3E7-82A3-4ACA-9547-0A3F426371E8@yahoo.com>
 <CABmzr0gs2G5oV5zuGynUuH4oPck0Vy7K4e=PgbOKRpjivS8icQ@mail.gmail.com>
 <CANc-5Uz_TJ9YiibN_q15t7-mQamD-0qN=UGLGV5R=pHGYBHtkw@mail.gmail.com>
 <20150508015455.GV5663@ando.pearwood.info>
Message-ID: <CAPTjJmqCxgucwJwswU-R8wjreMxGnwS8c+E2LrTW65_BUGsxGQ@mail.gmail.com>

On Fri, May 8, 2015 at 11:54 AM, Steven D'Aprano <steve at pearwood.info> wrote:
> On Thu, May 07, 2015 at 03:12:58PM -0500, Skip Montanaro wrote:
>> I haven't seen anyone else mention it, so I will point out:
>> interoperability with C. In C, strings are NUL-terminated. PyStringObject
>> instances do (or used to) have NUL-terminated strings in them. According to
>> unicodeobject.h, that seems still to be the case:
>
> How does that work? Python strings can contain embedded NULs:
>
> s = u"abc\0def"

It's a pure convenience. It means that C string operations are
guaranteed to terminate; they aren't guaranteed to process the whole
string, but they won't run on into random memory. For a lot of cases,
that's pretty handy.
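A small demonstration via ctypes (this assumes a POSIX system where the C library is loadable with CDLL(None); Windows would need msvcrt instead):

```python
import ctypes

libc = ctypes.CDLL(None)                 # assumption: POSIX libc
libc.wcslen.restype = ctypes.c_size_t
libc.wcslen.argtypes = [ctypes.c_wchar_p]

s = "abc\0def"
assert len(s) == 7                       # Python tracks the length explicitly
assert libc.wcslen(s) == 3               # C stops at the embedded NUL
```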

ChrisA

From steve at pearwood.info  Fri May  8 04:11:26 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 8 May 2015 12:11:26 +1000
Subject: [Python-ideas] Why don't CPython strings implement slicing
	using a view?
In-Reply-To: <migao1$1kt$1@ger.gmane.org>
References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com>
 <20150507154621.GU5663@ando.pearwood.info> <migao1$1kt$1@ger.gmane.org>
Message-ID: <20150508021126.GW5663@ando.pearwood.info>

On Thu, May 07, 2015 at 02:26:06PM -0400, Terry Reedy wrote:
> On 5/7/2015 11:46 AM, Steven D'Aprano wrote:
> >On Wed, May 06, 2015 at 07:05:15PM -0700, Neil Girdhar wrote:
> >>Since strings are constant, wouldn't it be much faster to implement string
> >>slices as a view of other strings?
> >
> >String or list views would be *very* useful in situations like this:
> >
> ># Create a massive string
> >s = "some string"*1000000
> >for c in s[1:]:
> >     process(c)
> 
> Easily done without slicing, as discussed on python-list multiple times.

For some definition of "easy".

If all you want is to skip the first item, this is not too bad:

> it = iter(s)
> next(it)
> for c in it: process(c)

Skipping the *last* item, on the other hand?

for c in s[:-1]:
    process(c)

Yes, it can be done, but it's even messier and uglier, and a sequence view 
would make it neat and pretty:

it = iter(s)
prev = next(it)
for c in it:
    process(prev)
    prev = c


> for s[5555: 399999], use explicit indexes
> for i in range(5555, 400000): process s[i]

What, are we programming in Fortran, like some sort of Neanderthal?

*grins*


The point isn't that we cannot solve these problems without views, but 
that views would let us solve them in a clean Pythonic manner.
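For bytes, Python already ships exactly this in the form of memoryview; a sketch of the analogous behaviour being wished for on str:

```python
data = b"some string" * 1_000_000
view = memoryview(data)
head = view[:-1]                 # O(1): a new view onto `data`, no copy
assert len(head) == len(data) - 1
assert head[0] == data[0]
# note: the view keeps the whole of `data` alive for as long as it exists
```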



-- 
Steve

From ernest.moloko at gmail.com  Fri May  8 06:35:44 2015
From: ernest.moloko at gmail.com (Lesego Moloko)
Date: Fri, 8 May 2015 06:35:44 +0200
Subject: [Python-ideas] Problems with Python
Message-ID: <C6A032C5-865E-4FC0-AE88-5D374A6DB5B1@gmail.com>

Dear all 

I am a Python novice and I am experiencing some problems with one of my programs that I converted from Matlab to Python. Is this an appropriate platform to ask such questions?

I will be awaiting your answer before posting details of my problem,

Thank you 

Regards
Lesego 

Sent from Lesego's iPhone

From rosuav at gmail.com  Fri May  8 06:38:13 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Fri, 8 May 2015 14:38:13 +1000
Subject: [Python-ideas] Problems with Python
In-Reply-To: <C6A032C5-865E-4FC0-AE88-5D374A6DB5B1@gmail.com>
References: <C6A032C5-865E-4FC0-AE88-5D374A6DB5B1@gmail.com>
Message-ID: <CAPTjJmqyKfaAaBKvXNt3Qg2J0XQSuRwEw7PryKnwpsMRU39CRw@mail.gmail.com>

On Fri, May 8, 2015 at 2:35 PM, Lesego Moloko <ernest.moloko at gmail.com> wrote:
> I am python novice and I am experiencing some problems with one of my programs that I converted from Matlab to Python. Is this an appropriate platform to ask such questions?
>
> I will be awaiting your answer before posting details of my problem,
>

This list is for discussion of ideas about future development of the
Python language itself. The best place to ask would be
python-list at python.org, which is two-way gatewayed with the
comp.lang.python newsgroup; you'll find lots of people there who are
happy to help out!

ChrisA

From ben+python at benfinney.id.au  Fri May  8 07:15:43 2015
From: ben+python at benfinney.id.au (Ben Finney)
Date: Fri, 08 May 2015 15:15:43 +1000
Subject: [Python-ideas] Problems with Python
References: <C6A032C5-865E-4FC0-AE88-5D374A6DB5B1@gmail.com>
Message-ID: <857fsjd4zk.fsf@benfinney.id.au>

Lesego Moloko <ernest.moloko at gmail.com>
writes:

> I am python novice and I am experiencing some problems with one of my
> programs that I converted from Matlab to Python. Is this an
> appropriate platform to ask such questions?

If you're looking for a free-for-all discussion forum for Python
programmers, see <URL:news:comp.lang.python>.

If you're looking for a forum dedicated to teaching Python newcomers,
see <URL:https://mail.python.org/mailman/listinfo/tutor>.

These and more Python community forums are documented at
<URL:https://www.python.org/community/lists/>.

Thanks for asking!

-- 
 \            “The whole area of [treating source code as intellectual |
  `\    property] is almost assuring a customer that you are not going |
_o__)              to do any innovation in the future.” --Gary Barnett |
Ben Finney


From rustompmody at gmail.com  Fri May  8 07:19:50 2015
From: rustompmody at gmail.com (Rustom Mody)
Date: Fri, 8 May 2015 10:49:50 +0530
Subject: [Python-ideas] (no subject)
In-Reply-To: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
Message-ID: <CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>

On Wed, May 6, 2015 at 6:45 PM, Ivan Levkivskyi <levkivskyi at gmail.com>
wrote:

> Dear all,
>
> The matrix multiplication operator @ is going to be introduced in Python
> 3.5 and I am thinking about the following idea:
>
> The semantics of matrix multiplication is the composition of the
> corresponding linear transformations.
> A linear transformation is a particular example of a more general concept
> - functions.
> The latter are frequently composed with ("wrap") each other. For example:
>
> plot(real(sqrt(data)))
>
> However, it is not very readable in case of many wrapping layers.
> Therefore, it could be useful to employ
> the matrix multiplication operator @ for indication of function
> composition. This could be done by such (simplified) decorator:
>
> class composable:
>
>     def __init__(self, func):
>         self.func = func
>
>     def __call__(self, arg):
>         return self.func(arg)
>
>     def __matmul__(self, other):
>         def composition(*args, **kwargs):
>             return self.func(other(*args, **kwargs))
>         return composable(composition)
>
> I think using such decorator with functions that are going to be deeply
> wrapped
> could improve readability.
> You could compare (note that only the outermost function should be
> decorated):
>
> plot(sorted(sqrt(real(data_array)))) vs. (plot @ sorted @ sqrt @ real)
> (data_array)
>
> I think the latter is more readable, also compare
>
> def sunique(lst):
>     return sorted(list(set(lst)))
>
> vs.
>
> sunique = sorted @ list @ set
>

I would like to suggest that if composition is in fact added to Python, its
order be 'corrected',
i.e. in math there are two alternative definitions of composition:

[1] f o g = λ x → g(f(x))
[2] f o g = λ x → f(g(x))

[2] is more common but [1] is also used

And IMHO [1] is much better for left-to-right reading so your example
becomes
sunique = set @ list @ sorted
which reads as smoothly as a classic Unix pipeline:

"Unnamed parameter input to set; output inputted to list; output inputted
to sort"
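A sketch of the decorator from upthread adapted to that left-to-right order (same made-up `composable` helper, but with __matmul__ flipped to order [1]):

```python
class composable:
    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)

    def __matmul__(self, other):
        # order [1]: (f @ g)(x) == g(f(x)) -- reads left to right, like a pipeline
        def composition(*args, **kwargs):
            return other(self.func(*args, **kwargs))
        return composable(composition)

sunique = composable(set) @ list @ sorted
assert sunique([3, 1, 2, 3, 1]) == [1, 2, 3]
```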
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150508/81e445f3/attachment.html>

From koos.zevenhoven at aalto.fi  Fri May  8 09:03:28 2015
From: koos.zevenhoven at aalto.fi (Koos Zevenhoven)
Date: Fri, 8 May 2015 10:03:28 +0300
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>
Message-ID: <554C5FC0.1070106@aalto.fi>

On 8.5.2015 8:19, Rustom Mody wrote:
> On Wed, May 6, 2015 at 6:45 PM, Ivan Levkivskyi <levkivskyi at gmail.com 
> <mailto:levkivskyi at gmail.com>> wrote:
>
>
>     def sunique(lst):
>         return sorted(list(set(lst)))
>
>     vs.
>
>     sunique = sorted @ list @ set
>
>
> I would like to suggest that if composition is in fact added to python 
> its order is 'corrected'
> ie in math there are two alternative definitions of composition
>
> [1] f o g = λ x → g(f(x))
> [2] f o g = λ x → f(g(x))
>
> [2] is more common but [1] is also used
>
> And IMHO [1] is much better for left-to-right reading so your example 
> becomes
> sunique = set @ list @ sorted
> which reads as smoothly as a classic Unix pipeline:
>
> "Unnamed parameter input to set; output inputted to list; output 
> inputted to sort"
>
>

While both versions make sense, [2] is the one that resembles the 
chaining of linear operators or matrices, since column vectors are the 
convention. For the left-to-right pipeline version, some other operator 
might be more appropriate. Also, it would then be more clear to also 
feed x into the pipeline from the left, instead of putting (x) on the 
right like in a normal function call.

As a random example, (root @ mean @ square)(x) would produce the right 
order for rms when using [2].
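Spelled out with a minimal right-to-left compose helper (hypothetical names, not a proposed API):

```python
import math
from statistics import mean

def compose2(f, g):
    # order [2]: (f o g)(x) == f(g(x)), matching how matrices chain
    # onto column vectors
    return lambda x: f(g(x))

def square(xs):
    return [v * v for v in xs]

rms = compose2(compose2(math.sqrt, mean), square)   # root o mean o square
assert rms([3.0, 4.0]) == math.sqrt((9 + 16) / 2)
```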

-- Koos

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150508/8570f274/attachment-0001.html>

From mal at egenix.com  Fri May  8 09:59:01 2015
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 08 May 2015 09:59:01 +0200
Subject: [Python-ideas] Why don't CPython strings implement slicing
 using a view?
In-Reply-To: <CAA68w_k7GxrSzpVMb4x6uv0tSA2aWNWENQaatCYN3KnQSA+Z3g@mail.gmail.com>
References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com>	<20150507154621.GU5663@ando.pearwood.info>
 <migao1$1kt$1@ger.gmane.org>	<81FDF3E7-82A3-4ACA-9547-0A3F426371E8@yahoo.com>	<CABmzr0gs2G5oV5zuGynUuH4oPck0Vy7K4e=PgbOKRpjivS8icQ@mail.gmail.com>	<CANc-5Uz_TJ9YiibN_q15t7-mQamD-0qN=UGLGV5R=pHGYBHtkw@mail.gmail.com>
 <CAA68w_k7GxrSzpVMb4x6uv0tSA2aWNWENQaatCYN3KnQSA+Z3g@mail.gmail.com>
Message-ID: <554C6CC5.8010605@egenix.com>

On 08.05.2015 00:09, Neil Girdhar wrote:
> This was a CPython idea after all, so I was assuming a C implementation,
> which means that new flags would have to be added to the string object to
> denote a string slice, etc.
> 
> Like I said in another message: it's not that important to me though.  I
> was just curious as to why CPython was designed so that string slicing is
> linear rather than constant time given that strings are constant.

This was considered very early on in the Unicode type design, but
dropped, since the problem with such slices is that you have to keep
a reference to the original string around, which keeps it alive
even if you just use a slice of a few chars from it.

There are some situations where such a slicing mechanism
would be nice to have, but in most of those you can simply
work on the original string using an offset index.

Indeed, working with index tuples into the original string
is often a better strategy.

You can see this used in mxTextTools:

http://www.egenix.com/products/python/mxBase/mxTextTools/

to create high performance text parsing and manipulation
tools.
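The index-tuple idea in miniature (plain re here, not the mxTextTools API):

```python
import re

text = "some long source text with many tokens"
# keep (start, end) offsets into the one original string
# instead of making slice copies
tokens = [(m.start(), m.end()) for m in re.finditer(r"\w+", text)]
assert tokens[0] == (0, 4)
assert text[slice(*tokens[0])] == "some"
```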

> I'm getting the impression that the payoff is not worth the complexity.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, May 08 2015)
>>> Python Projects, Coaching and Consulting ...  http://www.egenix.com/
>>> mxODBC Plone/Zope Database Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From stefan at bytereef.org  Fri May  8 11:09:21 2015
From: stefan at bytereef.org (Stefan Krah)
Date: Fri, 8 May 2015 09:09:21 +0000 (UTC)
Subject: [Python-ideas] discouraging direct use of the C-API
References: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
 <loom.20150507T103957-492@post.gmane.org>
 <E9CF9DEE-F4F3-4DEE-B64F-AEF0E23312FD@gmail.com>
 <loom.20150507T110846-798@post.gmane.org>
 <CACac1F98N-Yekiv5rsX85N3vnNk-6vs6HyhcT0XwiXrQDHv5_A@mail.gmail.com>
Message-ID: <loom.20150508T110557-327@post.gmane.org>

Paul Moore <p.f.moore at ...> writes:
> > https://mail.python.org/pipermail/python-dev/2013-December/130772.html
> >
> >
> > CFFI is very nice (superb API), but not for high performance use cases.
> 
> I'm guessing that benchmark used cffi in the "ABI level" dynamic form
> that matches ctypes. Did you try the cffi "API level" form that
> creates a C extension? I'd be curious as to where that falls in
> performance.

ffi.verify() is only about 10% faster both in pypy and cpython, so it 
doesn't change much in the posted figures.



Stefan Krah


From stefan_ml at behnel.de  Fri May  8 12:17:40 2015
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 08 May 2015 12:17:40 +0200
Subject: [Python-ideas] discouraging direct use of the C-API
In-Reply-To: <20150507123239.GA1768@k3>
References: <CALFfu7A+aScF87wAZ4zDBdcJiCW5FCDAh89WJQo-L+OgEsS9tA@mail.gmail.com>
 <20150507123239.GA1768@k3>
Message-ID: <mii2g4$u2p$1@ger.gmane.org>

David Wilson schrieb am 07.05.2015 um 14:32:
> On Wed, May 06, 2015 at 10:23:09AM -0600, Eric Snow wrote:
>> put a big red note at the top of every page of the C-API docs that
>> encourages folks to either use CFFI or Cython.
> 
> One of CPython's traditional strongholds is its use as an embedded
> language. I've worked on a bunch of commercial projects using it in this
> way, often specifically for improved performance/access to interpreter
> internals, and this is not to mention the numerous free software
> projects doing similar: gdb, uwsgi, mod_python, Freeswitch, and so on.

Ah, yes, there is a big wall in the CPython docs between "extending" and
"embedding" that gives users the impression that they are really different
concepts. But that's just marketing. They are not. The only difference is
that in one case, it's the CPython interpreter that starts up and then
calls into native user code, and in the other case, it's user code that
starts up and then launches a CPython interpreter. From the moment both
the user code and the CPython interpreter are running, there is exactly
zero difference between the two, and you can use the same tools for
interfacing native code with Python code in both cases.

What this means is that even in an embedding scenario, user code will
typically only need to call a tiny set of C-API functions to start up and
shut down the interpreter, and then leave all the rest, all the interesting
stuff, to tools that do it better.

Stefan



From storchaka at gmail.com  Fri May  8 13:18:07 2015
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Fri, 08 May 2015 14:18:07 +0300
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <20150505172845.GF5663@ando.pearwood.info>
References: <mi79rj$vl8$1@ger.gmane.org>
 <20150505172845.GF5663@ando.pearwood.info>
Message-ID: <mii8mc$4f0$1@ger.gmane.org>

On 05.05.15 20:28, Steven D'Aprano wrote:
> On Mon, May 04, 2015 at 11:15:47AM +0300, Serhiy Storchaka wrote:
>> Surrogate characters (U+D800-U+DFFF) are not allowed in Unicode, but
>> Python allows them in Unicode strings for different purposes.
>>
>> 1) To represent UTF-8, UTF-16 or UTF-32 encoded strings that contain
>> surrogate characters. This data can come from other programs, including
>> Python 2.
>
> Can you give a simple example of a Python 2 program that provides output
> that Python 3 will read as surrogates?

f.write(u'?'[:1].encode('utf-8'))
json.dump(f, u'?'[:1])
pickle.dump(f, u'?'[:1])

>> 2) To represent undecodable bytes in ASCII-compatible encoding with the
>> "surrogateescape" error handlers.
>>
>> So surrogate characters can be obtained from "surrogateescape" or
>> "surrogatepass" error handlers or created manually with chr() or %c.
>>
>> Some encodings (UTF-7, unicode-escape) also allows surrogate characters.
>
> Also UTF-16, and possible others.
>
> I'm not entirely sure, but I think that this is a mistake, if not a
> bug. I think that *no* UTF encoding should allow lone surrogates to
> escape through encoding. But I not entirely sure, so I won't argue that
> now -- besides, it's irrelevant to the proposal.

UTF-7 is specified by RFC 2152 and should encode any UCS-2 character. 
unicode-escape and raw-unicode-escape should encode any Python string. 
This can't be changed.

UTF-8, UTF-16, and UTF-32 don't encode surrogates by default in current 
Python 3, but encode surrogates in Python 2. The "surrogatepass" error 
handler was added for compatibility with Python 2.

>> But on output the surrogate characters can cause fail.
>
> What do you mean by "on output"? Do you mean when printing?

Printing, writing to text file, passing to C extension, that makes 
encoding internally, etc.
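For example, on Python 3 a lone surrogate produced by the surrogateescape
error handler cannot be written back out through the default strict handler;
a minimal illustration:

```python
# Decoding an invalid UTF-8 byte with surrogateescape yields a lone
# surrogate that round-trips only with the same error handler.
s = b'abc\xff'.decode('utf-8', errors='surrogateescape')
assert s == 'abc\udcff'

# Encoding with the default strict handler fails on output.
try:
    s.encode('utf-8')
except UnicodeEncodeError:
    pass  # surrogates are rejected by the strict handler

# Round-tripping works when the handler is given explicitly.
assert s.encode('utf-8', errors='surrogateescape') == b'abc\xff'
```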

>> In issue18814 proposed several functions to work with surrogate and
>> astral characters. All these functions takes a string and returns a string.
>
> I like the idea of having better surrogate and astral character
> handling, but I don't think I like your suggested API of using functions
> for this. I think this is better handled as str-to-str codecs.
>
> Unfortunately, there is still no consensus on the much-debated return of
> str-to-str and byte-to-byte codecs via the str.encode and byte.decode
> methods. At one point people were talking about adding a separate method
> (transform?) to handle them, but that seems to have been forgotten.
> Fortunately the codecs module handles them just fine:
>
> py> codecs.encode("Hello world", "rot-13")
> 'Uryyb jbeyq'
>
>
> I propose, instead of your function/method rehandle_surrogatepass(), we
> add a pair of str-to-str codecs:
>
> codecs.encode(mystring, 'remove_surrogates', errors='strict')
> codecs.encode(mystring, 'remove_astrals', errors='strict')
>
> For the first one, if the string has no surrogates, it returns the
> string unchanged. If it contains any surrogates, the error handler runs
> in the usual fashion.
>
> The second is exactly the same, except it checks for astral characters.
>
> For the avoidance of doubt:
>
> * surrogates are code points in the range U+D800 to U+DFFF inclusive;
>
> * astrals are characters from the Supplementary Multilingual Planes,
>    that is code points U+10000 and above.
>
>
> Advantage of using codecs:
>
> - there's no arguments about where to put it (is it a str method? a
>    function? in the string module? some other module? where?)
>
> - we can use the usual codec machinery, rather than duplicate it;
>
> - people already understand that codecs and error handles go together;
>
> Disadvantage:
>
> - have to use codec.encode instead of str.encode.
>
>
> It is slightly sad that there is still no entirely obvious way to call
> str-to-str codecs from the encode method, but since this is a fairly
> advanced and unusual use-case, I don't think it is a problem that we
> have to use the codecs module.

A disadvantage of using codecs is that the "decoding" operation doesn't make 
sense. If we use one global registry for named transformations, it should be 
a separate registry with a separate method (str.transform) for one-way 
str-to-str transformations. In addition to the above transformations of 
surrogates, it could contain transformations like "upper", "lower", and 
"title". But this is a separate issue.
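For concreteness, here is a plain-function sketch of the strict/replace
behaviour described in the quoted proposal (the name remove_surrogates and
its semantics come from that proposal; this is not an existing API, and only
two error handlers are sketched):

```python
def remove_surrogates(s, errors='strict'):
    """Sketch only: 'strict' raises on any lone surrogate,
    'replace' substitutes U+FFFD; other handlers are omitted."""
    if errors == 'strict':
        for i, ch in enumerate(s):
            if '\ud800' <= ch <= '\udfff':
                raise UnicodeEncodeError(
                    'remove_surrogates', s, i, i + 1, 'lone surrogate')
        return s
    if errors == 'replace':
        return ''.join(
            '\ufffd' if '\ud800' <= c <= '\udfff' else c for c in s)
    raise LookupError('unknown error handler %r' % errors)
```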



From storchaka at gmail.com  Fri May  8 13:54:37 2015
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Fri, 08 May 2015 14:54:37 +0300
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <mi79rj$vl8$1@ger.gmane.org>
 <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <mi8q15$bbp$1@ger.gmane.org>
 <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <mii8mr$4qt$1@ger.gmane.org>

On 05.05.15 11:23, Stephen J. Turnbull wrote:
> Serhiy Storchaka writes:
>
>   > Use cases include programs that use tkinter (common build of Tcl/Tk
>   > don't accept non-BMP characters), email or wsgiref.
>
> So, consider Tcl/Tk.  If you use it for input, no problem, it *can't*
> produce non-BMP characters.  So you're using it for output.  If
> knowing that your design involves tkinter, you deduce you must not
> accept non-BMP characters on input, where's your problem?

With Tcl/Tk not everything is so easy. The main issue is with translating from 
Tcl to Python. Tcl uses at least two representations for strings (UCS-2 
and modified UTF-8, and Latin-1 in some cases), both can contain invalid 
codes, and implicit conversion from one to the other is lossy. Currently 
there is a way to crash IDLE (and maybe other Tkinter applications) by 
just pasting malformed data from the clipboard. I don't think that my 
proposal will help Tkinter a lot, but there are requests for such 
features, and perhaps these functions could help to solve or work around 
at least some of the Tkinter issues.

> And ... you looked twice at your proposal?  You have basically
> reproduced the codec error handling API for .decode and .encode in a
> bunch to str2str "rehandle" functions.

Yes, this is the main advantage of the proposed functions. They reuse 
existing error handlers and are extensible by writing new error handlers.

>  In other words, you need to
> know as much to use "rehandle_*" properly as you do to use .decode and
> .encode.  I do not see a win for the programmer who is mostly innocent
> of encoding knowledge.

Is that a problem? These functions are for experienced users, perhaps 
mostly authors of libraries and frameworks.

> If we apply these rehandle_* thumbs to the holes in the I18N dike,
> it's just going to spring more leaks elsewhere.

There are a lot of batteries included in Python. They can explode if you 
use them incorrectly.

Sorry, I don't understand your frustration.



From jonathan at slenders.be  Fri May  8 14:16:41 2015
From: jonathan at slenders.be (Jonathan Slenders)
Date: Fri, 8 May 2015 14:16:41 +0200
Subject: [Python-ideas] What is happening with array.array('u') in Python 4?
Message-ID: <CAKfyG3xTePExzAHZH0ragGoH1HUsc13ssG4MgWqCFKYq9tvGkg@mail.gmail.com>

Hi all,

What will happen to array.array('u') in Python 4? It is deprecated right
now.
I remember reading about mutable strings somewhere, but I forgot, and I
can't find the discussion.

In any case, I need to have a mutable character array, for efficient
manipulations. (Not a byte array.)
And I need to be able to use the "re" module to search through it.
array.array('u') works great in Python 3.

Will we still have something like this in Python 4?

Jonathan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150508/1e9ea41c/attachment.html>

From rosuav at gmail.com  Fri May  8 14:28:33 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Fri, 8 May 2015 22:28:33 +1000
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <mii8mc$4f0$1@ger.gmane.org>
References: <mi79rj$vl8$1@ger.gmane.org>
 <20150505172845.GF5663@ando.pearwood.info>
 <mii8mc$4f0$1@ger.gmane.org>
Message-ID: <CAPTjJmo+aK2hH9WuEWyT4AdoGbHNA8KhUMCS6LEvAYX3Y=+35w@mail.gmail.com>

On Fri, May 8, 2015 at 9:18 PM, Serhiy Storchaka <storchaka at gmail.com> wrote:
>> Can you give a simple example of a Python 2 program that provides output
>> that Python 3 will read as surrogates?
>
>
> f.write(u'?'[:1].encode('utf-8'))
> json.dump(f, u'?'[:1])
> pickle.dump(f, u'?'[:1])

Not for me. In my Python 2, u'?'[:1] == u'?'. I suppose you're
talking only about the (buggy) narrow builds, in which case you don't
need to use string slicing at all. But in that case, all you're doing
is using a single "\uNNNN" escape code to create an unmatched
surrogate.

ChrisA

From storchaka at gmail.com  Fri May  8 14:32:50 2015
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Fri, 08 May 2015 15:32:50 +0300
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <CAPTjJmo+aK2hH9WuEWyT4AdoGbHNA8KhUMCS6LEvAYX3Y=+35w@mail.gmail.com>
References: <mi79rj$vl8$1@ger.gmane.org>
 <20150505172845.GF5663@ando.pearwood.info> <mii8mc$4f0$1@ger.gmane.org>
 <CAPTjJmo+aK2hH9WuEWyT4AdoGbHNA8KhUMCS6LEvAYX3Y=+35w@mail.gmail.com>
Message-ID: <miiadi$1nn$1@ger.gmane.org>

On 08.05.15 15:28, Chris Angelico wrote:
> On Fri, May 8, 2015 at 9:18 PM, Serhiy Storchaka <storchaka at gmail.com> wrote:
>>> Can you give a simple example of a Python 2 program that provides output
>>> that Python 3 will read as surrogates?
>>
>>
>> f.write(u'?'[:1].encode('utf-8'))
>> json.dump(f, u'?'[:1])
>> pickle.dump(f, u'?'[:1])
>
> Not for me. In my Python 2, u'?'[:1] == u'?'. I suppose you're
> talking only about the (buggy) narrow builds, in which case you don't
> need to use string slicing at all. But in that case, all you're doing
> is using a single "\uNNNN" escape code to create an unmatched
> surrogate.

I want to say that it is easy to unintentionally get data with an 
encoded lone surrogate in Python 2.



From rosuav at gmail.com  Fri May  8 14:41:01 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Fri, 8 May 2015 22:41:01 +1000
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <miiadi$1nn$1@ger.gmane.org>
References: <mi79rj$vl8$1@ger.gmane.org>
 <20150505172845.GF5663@ando.pearwood.info>
 <mii8mc$4f0$1@ger.gmane.org>
 <CAPTjJmo+aK2hH9WuEWyT4AdoGbHNA8KhUMCS6LEvAYX3Y=+35w@mail.gmail.com>
 <miiadi$1nn$1@ger.gmane.org>
Message-ID: <CAPTjJmruDTRBe69vYkKm1u3BZqOeFec2W_UHFTe4ycutMpu02g@mail.gmail.com>

On Fri, May 8, 2015 at 10:32 PM, Serhiy Storchaka <storchaka at gmail.com> wrote:
> On 08.05.15 15:28, Chris Angelico wrote:
>>
>> On Fri, May 8, 2015 at 9:18 PM, Serhiy Storchaka <storchaka at gmail.com>
>> wrote:
>>>>
>>>> Can you give a simple example of a Python 2 program that provides output
>>>> that Python 3 will read as surrogates?
>>>
>>>
>>>
>>> f.write(u'?'[:1].encode('utf-8'))
>>> json.dump(f, u'?'[:1])
>>> pickle.dump(f, u'?'[:1])
>>
>>
>> Not for me. In my Python 2, u'?'[:1] == u'?'. I suppose you're
>> talking only about the (buggy) narrow builds, in which case you don't
>> need to use string slicing at all. But in that case, all you're doing
>> is using a single "\uNNNN" escape code to create an unmatched
>> surrogate.
>
>
> I want to say that it is easy to unintentionally get data with an
> encoded lone surrogate in Python 2.

Only on Windows, where the standard builds are narrow ones. (Also, how
hard and how bad would it be to change that, and have all python.org
installers produce wide builds?)

ChrisA

From stefan_ml at behnel.de  Fri May  8 14:50:36 2015
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 08 May 2015 14:50:36 +0200
Subject: [Python-ideas] What is happening with array.array('u') in
	Python 4?
In-Reply-To: <CAKfyG3xTePExzAHZH0ragGoH1HUsc13ssG4MgWqCFKYq9tvGkg@mail.gmail.com>
References: <CAKfyG3xTePExzAHZH0ragGoH1HUsc13ssG4MgWqCFKYq9tvGkg@mail.gmail.com>
Message-ID: <miiber$g87$1@ger.gmane.org>

Jonathan Slenders schrieb am 08.05.2015 um 14:16:
> What will happen to array.array('u') in Python 4? It is deprecated right
> now.
> I remember reading about mutable strings somewhere, but I forgot, and I
> can't find the discussion.
> 
> In any case, I need to have a mutable character array, for efficient
> manipulations. (Not a byte array.)
> And I need to be able to use the "re" module to search through it.
> array.array('u') works great in Python 3.

Well, for some value of "great" and "works". The problems are that 1) 'u'
has a platform-dependent size of 16 or 32 bits and 2) it does not match the
internal representation of unicode strings. It will thus use surrogate
pairs on some platforms and not on others, and converting between Unicode
strings and arrays requires an encoding/decoding step. And it also does not
seem like the "re" module currently supports searching in unicode arrays
(anything else would have been very surprising).

ISTM that your best bet is currently to look for a suitable module on PyPI
that implements mutable character arrays. I'm sure you're not the only one
who needs something like that. The usual suspect would be NumPy, but there
may be smaller and simpler tools available.
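For reference, the in-place mutation itself does work today, a small sketch
(recent Pythons emit a DeprecationWarning for the 'u' typecode, and the
itemsize is platform-dependent as noted above):

```python
import array

buf = array.array('u', 'hello')   # 'u' is deprecated but still works
buf[0] = 'H'                      # in-place mutation, no string copy
assert buf.tounicode() == 'Hello'

# itemsize follows the platform's wchar_t: 2 or 4 bytes
assert buf.itemsize in (2, 4)
```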

Stefan



From mal at egenix.com  Fri May  8 15:00:02 2015
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 08 May 2015 15:00:02 +0200
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <CAPTjJmruDTRBe69vYkKm1u3BZqOeFec2W_UHFTe4ycutMpu02g@mail.gmail.com>
References: <mi79rj$vl8$1@ger.gmane.org>	<20150505172845.GF5663@ando.pearwood.info>	<mii8mc$4f0$1@ger.gmane.org>	<CAPTjJmo+aK2hH9WuEWyT4AdoGbHNA8KhUMCS6LEvAYX3Y=+35w@mail.gmail.com>	<miiadi$1nn$1@ger.gmane.org>
 <CAPTjJmruDTRBe69vYkKm1u3BZqOeFec2W_UHFTe4ycutMpu02g@mail.gmail.com>
Message-ID: <554CB352.9090208@egenix.com>

On 08.05.2015 14:41, Chris Angelico wrote:
> On Fri, May 8, 2015 at 10:32 PM, Serhiy Storchaka <storchaka at gmail.com> wrote:
>> On 08.05.15 15:28, Chris Angelico wrote:
>>>
>>> On Fri, May 8, 2015 at 9:18 PM, Serhiy Storchaka <storchaka at gmail.com>
>>> wrote:
>>>>>
>>>>> Can you give a simple example of a Python 2 program that provides output
>>>>> that Python 3 will read as surrogates?
>>>>
>>>>
>>>>
>>>> f.write(u'?'[:1].encode('utf-8'))
>>>> json.dump(f, u'?'[:1])
>>>> pickle.dump(f, u'?'[:1])
>>>
>>>
>>> Not for me. In my Python 2, u'?'[:1] == u'?'. I suppose you're
>>> talking only about the (buggy) narrow builds, in which case you don't
>>> need to use string slicing at all. But in that case, all you're doing
>>> is using a single "\uNNNN" escape code to create an unmatched
>>> surrogate.
>>
>>
>> I want to say that it is easy to unintentionally get data with an
>> encoded lone surrogate in Python 2.
> 
> Only on Windows, where the standard builds are narrow ones. (Also, how
> hard and how bad would it be to change that, and have all python.org
> installers produce wide builds?)

Not only on Windows. The default Python 2 build is a narrow build.

Most Unix distributions explicitly switch on the UCS4 support,
so you usually get UCS4 versions on Unix, but the default still
is UCS2.

In Python 3.3+ this doesn't matter anymore, since Python selects
the storage type based on the string content, so you get
UCS2/UCS4 as needed on all platforms.

All that said, it's still possible to work with lone surrogates
in Python, so Serhiy's example still applies in concept.

And slicing surrogates is only one way to break Unicode strings.
The many combining characters and annotations offer plenty
more :-)
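A quick way to check the build width, as a sketch (on Python 3.3+ the
answer is always the same; on Python 2 this is what distinguished narrow
from wide builds):

```python
import sys

# 0x10FFFF on wide builds and on every Python 3.3+ interpreter;
# narrow Python 2 builds reported 0xFFFF instead.
assert sys.maxunicode == 0x10FFFF

# Lone surrogates can still be created manually, as discussed above.
s = '\ud800'
assert len(s) == 1
```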

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, May 08 2015)
>>> Python Projects, Coaching and Consulting ...  http://www.egenix.com/
>>> mxODBC Plone/Zope Database Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From ron3200 at gmail.com  Fri May  8 18:05:54 2015
From: ron3200 at gmail.com (Ron Adam)
Date: Fri, 08 May 2015 12:05:54 -0400
Subject: [Python-ideas] (no subject)
In-Reply-To: <CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>
Message-ID: <miimt3$qhv$1@ger.gmane.org>



On 05/08/2015 01:19 AM, Rustom Mody wrote:
> On Wed, May 6, 2015 at 6:45 PM, Ivan Levkivskyi
> <levkivskyi at gmail.com
> <mailto:levkivskyi at gmail.com>> wrote:
>
>     Dear all,
>
>     The matrix multiplication operator @ is going to be introduced in
>     Python 3.5 and I am thinking about the following idea:
>
>     The semantics of matrix multiplication is the composition of the
>     corresponding linear transformations.
>     A linear transformation is a particular example of a more general
>     concept - functions.
>     The latter are frequently composed with ("wrap") each other. For example:
>
>     plot(real(sqrt(data)))
>
>     However, it is not very readable in case of many wrapping layers.
>     Therefore, it could be useful to employ
>     the matrix multiplication operator @ for indication of function
>     composition. This could be done by such (simplified) decorator:
>
>     class composable:
>
>          def __init__(self, func):
>              self.func = func
>
>          def __call__(self, arg):
>              return self.func(arg)
>
>          def __matmul__(self, other):
>              def composition(*args, **kwargs):
>                  return self.func(other(*args, **kwargs))
>              return composable(composition)
>
>     I think using such decorator with functions that are going to be deeply
>     wrapped
>     could improve readability.
>     You could compare (note that only the outermost function should be
>     decorated):
>
>     plot(sorted(sqrt(real(data_array)))) vs. (plot @ sorted @ sqrt @ real)
>     (data_array)
>
>     I think the latter is more readable, also compare
>
>     def sunique(lst):
>          return sorted(list(set(lst)))
>
>     vs.
>
>     sunique = sorted @ list @ set
>
>
> I would like to suggest that if composition is in fact added to python its
> order is 'corrected'
> ie in math there are two alternative definitions of composition
>
> [1] f ∘ g = λx. g(f(x))
> [2] f ∘ g = λx. f(g(x))
>
> [2] is more common but [1] is also used
>
> And IMHO [1] is much better for left-to-right reading so your example becomes
> sunique = set @ list @ sorted
> which reads as smoothly as a classic Unix pipeline:
>
> "Unnamed parameter input to set; output inputted to list; output inputted
> to sort"

Here's how I would do it as a function.

 >>> def apply(data, *fns):
...     for f in fns:
...         data = f(data)
...     return data
...
 >>> apply((8, 9, 8, 4, 5), set, list, sorted)
[4, 5, 8, 9]

This is a variation of reduce, except it's applying many functions to a 
single data item rather than applying a single function to many data items.

       result = apply(data, f, g, e)

Which would be the same as...

       result = e(g(f(data)))

Having the order align with method-call chaining is a consistency 
which can help with learning how to use it.

I don't think special syntax has an advantage over a function for this.  It 
may even be a disadvantage. The problem with special syntax is it can't be 
represented as data easily.  That would be counter to a functional style of 
programming, which seems at odds with the desired feature. (IMO)

      fns = (set, list, sorted)
      result = apply(data, *fns)

Also having this right next to reduce in functools would work nicely, both 
for usability and documentation.
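The same helper can also be written directly with functools.reduce, which
makes the kinship explicit (a sketch of the idea above):

```python
from functools import reduce

def apply(data, *fns):
    # Thread the data left to right through each function in turn.
    return reduce(lambda acc, f: f(acc), fns, data)

assert apply((8, 9, 8, 4, 5), set, list, sorted) == [4, 5, 8, 9]
```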

Cheers,
    Ron





From rosuav at gmail.com  Fri May  8 18:13:47 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Sat, 9 May 2015 02:13:47 +1000
Subject: [Python-ideas] (no subject)
In-Reply-To: <miimt3$qhv$1@ger.gmane.org>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>
 <miimt3$qhv$1@ger.gmane.org>
Message-ID: <CAPTjJmoVR8vw3SRAEhy7h-h+V-c8KsCSMtuWTRC-w=tybwT88g@mail.gmail.com>

On Sat, May 9, 2015 at 2:05 AM, Ron Adam <ron3200 at gmail.com> wrote:
> I don't think special syntax has an advantage over a function for this.  It
> may even be a disadvantage. The problem with special syntax is it can't be
> represented as data easily.  That would be counter to a functional style of
> programming, which seems at odds with the desired feature. (IMO)
>
>      fns = (set, list, sorted)
>      result = apply(data, *fns)

There's no problem with representing it as data; just like elsewhere
in Python, you can break out a subexpression and give it a name.

(a @ b @ c)(x)
# <=>
f = (a @ b @ c)
f(x)

It's no different from method calls:

sys.stdout.write("Hello, world!\n")
# <=>
write = sys.stdout.write
write("Hello, world!\n")

ChrisA

From ron3200 at gmail.com  Fri May  8 19:10:49 2015
From: ron3200 at gmail.com (Ron Adam)
Date: Fri, 08 May 2015 13:10:49 -0400
Subject: [Python-ideas] Function Composition       was:Re: (no subject)
In-Reply-To: <CAPTjJmoVR8vw3SRAEhy7h-h+V-c8KsCSMtuWTRC-w=tybwT88g@mail.gmail.com>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>
 <miimt3$qhv$1@ger.gmane.org>
 <CAPTjJmoVR8vw3SRAEhy7h-h+V-c8KsCSMtuWTRC-w=tybwT88g@mail.gmail.com>
Message-ID: <miiqmq$rha$1@ger.gmane.org>



On 05/08/2015 12:13 PM, Chris Angelico wrote:
> On Sat, May 9, 2015 at 2:05 AM, Ron Adam<ron3200 at gmail.com>  wrote:
>> >I don't think special syntax has an advantage over a function for this.  It
>> >may even be a disadvantage. The problem with special syntax is it can't be
>> >represented as data easily.  That would be counter to a functional style of
>> >programming, which seems at odds with the desired feature. (IMO)
>> >
>> >      fns = (set, list, sorted)
>> >      result = apply(data, *fns)
> There's no problem with representing it as data; just like elsewhere
> in Python, you can break out a subexpression and give it a name.
>
> (a @ b @ c)(x)
> # <=>
> f = (a @ b @ c)
> f(x)
>
> It's no different from method calls:
>
> sys.stdout.write("Hello, world!\n")
> # <=>
> write = sys.stdout.write
> write("Hello, world!\n")

That's good.

What advantage over using a function does the syntax have?  So far it looks 
fairly equal.

I think a function would be much simpler to implement, document, and 
maintain.  (Unless the syntax has clear advantages that a function doesn't.)

Cheers,
    Ron


From abarnert at yahoo.com  Fri May  8 21:45:10 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Fri, 8 May 2015 12:45:10 -0700
Subject: [Python-ideas] (no subject)
In-Reply-To: <CAPTjJmoVR8vw3SRAEhy7h-h+V-c8KsCSMtuWTRC-w=tybwT88g@mail.gmail.com>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>
 <miimt3$qhv$1@ger.gmane.org>
 <CAPTjJmoVR8vw3SRAEhy7h-h+V-c8KsCSMtuWTRC-w=tybwT88g@mail.gmail.com>
Message-ID: <C21DF270-1853-4BEF-A9B4-1C3C3D380047@yahoo.com>

On May 8, 2015, at 09:13, Chris Angelico <rosuav at gmail.com> wrote:
> 
>> On Sat, May 9, 2015 at 2:05 AM, Ron Adam <ron3200 at gmail.com> wrote:
>> I don't think special syntax has an advantage over a function for this.  It
>> may even be a disadvantage. The problem with special syntax is it can't be
>> represented as data easily.  That would be counter to a functional style of
>> programming, which seems at odds with the desired feature. (IMO)
>> 
>>     fns = (set, list, sorted)
>>     result = apply(data, *fns)
> 
> There's no problem with representing it as data; just like elsewhere
> in Python, you can break out a subexpression and give it a name.
> 
> (a @ b @ c)(x)
> # <=>
> f = (a @ b @ c)
> f(x)

Except that a@ and @b aren't subexpressions in Python. With a function, you can write them with partial; with an operator... Well, you can write the first with the bound dunder method and the second with partial and the unbound dunder method, but I don't think anyone finds type(b).__matmul__ as readable as @ (not to mention that the former is monomorphic, while the latter, like almost everything in Python, is duck typed, as it should be).

> It's no different from method calls:
> 
> sys.stdout.write("Hello, world!\n")
> # <=>
> write = sys.stdout.write
> write("Hello, world!\n")

There are other things that are a lot easier to do with a function than an operator, which aren't a problem for methods.

For example, you can unpack arguments. How would you write apply(value, *funcs) or compose(*funcs) with an operator without calling reduce on type(funcs[0]).__matmul__ or similar?

Of course you can always get around any of these problems by wrapping any operator expression up in a function with lambda--but if that really were good enough in practice, we wouldn't have comprehensions (after all, you can do the same thing with map just by wrapping up the expression with lambda--but nobody ever does that except people trying to use Python as Lisp, and nobody else wants to read their code).

This is (a part of) what I meant when I said that just posting Haskell's compose operator without all of the other language features and style idioms that make it so useful won't necessarily give us a useful Python feature. In Haskell, a@ and @b are subexpressions that can be given names and passed around, you can turn @ into a function just by wrapping it in parens rather than having to pull the dunder method off a type, etc. That solves most of these problems automatically, but we don't want to import all of those features into Python. (On the other hand, Haskell only "solves" the *args unpacking issue by just not allowing variable or even optional parameters--I'm not arguing that Haskell is "better" here, just that a compose operator fits into Haskell better than into Python.)
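To make the contrast concrete, here is a sketch of how compose(*funcs)
could be written around a wrapper like the composable class proposed
earlier in the thread; the reduce over the operator is exactly the
machinery a plain function hides (names here are illustrative, not an
existing API):

```python
from functools import reduce

class Composable:
    # Minimal version of the wrapper proposed earlier in the thread.
    def __init__(self, func):
        self.func = func
    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)
    def __matmul__(self, other):
        # (self @ other)(x) == self(other(x))
        return Composable(lambda *a, **kw: self.func(other(*a, **kw)))

def compose(*funcs):
    # Argument unpacking is natural with a function; with the bare
    # operator you would spell this reduce yourself.
    return reduce(lambda f, g: Composable(f) @ g, funcs)

sunique = compose(sorted, list, set)
assert sunique([3, 1, 3, 2]) == [1, 2, 3]
```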

From chris.barker at noaa.gov  Fri May  8 22:46:20 2015
From: chris.barker at noaa.gov (Chris Barker)
Date: Fri, 8 May 2015 13:46:20 -0700
Subject: [Python-ideas] What is happening with array.array('u') in
	Python 4?
In-Reply-To: <miiber$g87$1@ger.gmane.org>
References: <CAKfyG3xTePExzAHZH0ragGoH1HUsc13ssG4MgWqCFKYq9tvGkg@mail.gmail.com>
 <miiber$g87$1@ger.gmane.org>
Message-ID: <CALGmxEK7h8OwLU4ruQTWVznAGj-AVoDQ+Bdv4nNvK-20sVeeDg@mail.gmail.com>

On Fri, May 8, 2015 at 5:50 AM, Stefan Behnel <stefan_ml at behnel.de> wrote:

> ISTM that your best bet is currently to look for a suitable module on PyPI
> that implements mutable character arrays. I'm sure you're not the only one
> who needs something like that. The usual suspect would be NumPy, but there
> may be smaller and simpler tools available.


Numpy does have mutable character arrays -- and the Unicode version uses
4 bytes per char, regardless of platform (and so should array.array!)

But I don't think you get much of any of the features of strings, and I
doubt that the re module would work with it.

A "real" mutable string type might be pretty nice to have, but I think it
would be pretty hard to get it to do everything a string can do. (Or
maybe not -- I suppose you could cut and paste the regular string code, and
simply add the mutable part....)

-Chris







-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150508/6ff551fd/attachment.html>

From random832 at fastmail.us  Fri May  8 23:45:35 2015
From: random832 at fastmail.us (random832 at fastmail.us)
Date: Fri, 08 May 2015 17:45:35 -0400
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <554CB352.9090208@egenix.com>
References: <mi79rj$vl8$1@ger.gmane.org>
 <20150505172845.GF5663@ando.pearwood.info>
 <mii8mc$4f0$1@ger.gmane.org>
 <CAPTjJmo+aK2hH9WuEWyT4AdoGbHNA8KhUMCS6LEvAYX3Y=+35w@mail.gmail.com>
 <miiadi$1nn$1@ger.gmane.org>
 <CAPTjJmruDTRBe69vYkKm1u3BZqOeFec2W_UHFTe4ycutMpu02g@mail.gmail.com>
 <554CB352.9090208@egenix.com>
Message-ID: <1431121535.657430.264648673.5D917DDF@webmail.messagingengine.com>

On Fri, May 8, 2015, at 09:00, M.-A. Lemburg wrote:
> Not only on Windows. The default Python 2 build is a narrow build.
> 
> Most Unix distributions explicitly switch on the UCS4 support,
> so you usually get UCS4 versions on Unix, but the default still
> is UCS2.

I had always assumed that the build system selects the default build
based on the size of wchar_t (2 on windows, 4 on most unix systems).

From chris.barker at noaa.gov  Fri May  8 22:39:57 2015
From: chris.barker at noaa.gov (Chris Barker)
Date: Fri, 8 May 2015 13:39:57 -0700
Subject: [Python-ideas] Problems with Python
In-Reply-To: <857fsjd4zk.fsf@benfinney.id.au>
References: <C6A032C5-865E-4FC0-AE88-5D374A6DB5B1@gmail.com>
 <857fsjd4zk.fsf@benfinney.id.au>
Message-ID: <CALGmxELVD667gfQYSEomzmeO08O9FkxUjyvGtkxHCcrRm8N0Hg@mail.gmail.com>

and if you are converting from MATLAB, then you probably are, and certainly
should, be using numpy, so the numpy list is a good start:

http://www.numpy.org/

http://mail.scipy.org/mailman/listinfo/numpy-discussion

-Chris


On Thu, May 7, 2015 at 10:15 PM, Ben Finney <ben+python at benfinney.id.au>
wrote:

> Lesego Moloko <ernest.moloko at gmail.com>
> writes:
>
> > I am python novice and I am experiencing some problems with one of my
> > programs that I converted from Matlab to Python. Is this an
> > appropriate platform to ask such questions?
>
> If you're looking for a free-for-all discussion forum for Python
> programmers, see <URL:news:comp.lang.python>.
>
> If you're looking for a forum dedicated to teaching Python newcomers,
> see <URL:https://mail.python.org/mailman/listinfo/tutor>.
>
> These and more Python community forums are documented at
> <URL:https://www.python.org/community/lists/>.
>
> Thanks for asking!
>
> --
>  \            "The whole area of [treating source code as intellectual |
>   `\    property] is almost assuring a customer that you are not going |
> _o__)               to do any innovation in the future." --Gary Barnett |
> Ben Finney
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150508/b559b83d/attachment-0001.html>

From stephen at xemacs.org  Sat May  9 04:58:35 2015
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 09 May 2015 11:58:35 +0900
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <554C5FC0.1070106@aalto.fi>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>
 <554C5FC0.1070106@aalto.fi>
Message-ID: <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>

Koos Zevenhoven writes:

 > As a random example, (root @ mean @ square)(x) would produce the right 
 > order for rms when using [2].

Hardly interesting. :-)  The result is an exception, as root and square
are conceptually scalar-to-scalar, while mean is sequence-to-scalar.

I suppose you could write (root @ mean @ (map square)) (xs), which
seems to support your argument.  But will all such issues and
solutions give the same support?  This kind of thing is a conceptual
problem that has to be discussed pretty thoroughly (presumably based
on experience with implementations) before discussion of order can be
conclusive.


From koos.zevenhoven at aalto.fi  Sat May  9 06:00:52 2015
From: koos.zevenhoven at aalto.fi (Koos Zevenhoven)
Date: Sat, 9 May 2015 07:00:52 +0300
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>	<17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>	<554C5FC0.1070106@aalto.fi>
 <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <554D8674.4000004@aalto.fi>

On 9.5.2015 5:58, Stephen J. Turnbull wrote:
> Koos Zevenhoven writes:
>
>   > As a random example, (root @ mean @ square)(x) would produce the right
>   > order for rms when using [2].
>
> Hardly interesting. :-)  The result is an exception, as root and square
> are conceptually scalar-to-scalar, while mean is sequence-to-scalar.
>
> I suppose you could write (root @ mean @ (map square)) (xs), which
> seems to support your argument.  But will all such issues and
> solutions give the same support?  This kind of thing is a conceptual
> problem that has to be discussed pretty thoroughly (presumably based
> on experience with implementations) before discussion of order can be
> conclusive.
>

Well, you're wrong :-)

Working code (given an array x):

from numpy import sqrt, mean, square

rms = sqrt(mean(square(x)))

The point is that people have previously described sqrt(mean(square(x))) 
as root-mean-squared x, not squared-mean-root x. But yes, as I said, 
it's just one example.

-- Koos

From Nikolaus at rath.org  Sat May  9 06:04:26 2015
From: Nikolaus at rath.org (Nikolaus Rath)
Date: Fri, 08 May 2015 21:04:26 -0700
Subject: [Python-ideas] Why don't CPython strings implement slicing
	using a view?
In-Reply-To: <20150507154621.GU5663@ando.pearwood.info> (Steven D'Aprano's
 message of "Fri, 8 May 2015 01:46:21 +1000")
References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com>
 <20150507154621.GU5663@ando.pearwood.info>
Message-ID: <87mw1es8fp.fsf@vostro.rath.org>

On May 07 2015, Steven D'Aprano <steve-iDnA/YwAAsAk+I/owrrOrA at public.gmane.org> wrote:
> But a view would be harmful in this situation:
>
> s = "some string"*1000000
> t = s[1:2]  # a view masquerading as a new string
> del s
>
> Now we keep the entire string alive long after it is needed.
>
> How would you solve the first problem without introducing the second?

Keep track of the reference count of the underlying string, and if it
goes down to one, turn the view into a copy and remove the sliced
original?

Best,
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             "Time flies like an arrow, fruit flies like a Banana."

From rosuav at gmail.com  Sat May  9 07:01:42 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Sat, 9 May 2015 15:01:42 +1000
Subject: [Python-ideas] Why don't CPython strings implement slicing
 using a view?
In-Reply-To: <87mw1es8fp.fsf@vostro.rath.org>
References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com>
 <20150507154621.GU5663@ando.pearwood.info>
 <87mw1es8fp.fsf@vostro.rath.org>
Message-ID: <CAPTjJmpxpjvqCJ4TLNqaUqRn+hPQZxCErKVmQ+6-L-WVbbj3Og@mail.gmail.com>

On Sat, May 9, 2015 at 2:04 PM, Nikolaus Rath <Nikolaus at rath.org> wrote:
> On May 07 2015, Steven D'Aprano <steve-iDnA/YwAAsAk+I/owrrOrA at public.gmane.org> wrote:
>> But a view would be harmful in this situation:
>>
>> s = "some string"*1000000
>> t = s[1:2]  # a view masquerading as a new string
>> del s
>>
>> Now we keep the entire string alive long after it is needed.
>>
>> How would you solve the first problem without introducing the second?
>
> Keep track of the reference count of the underlying string, and if it
> goes down to one, turn the view into a copy and remove the sliced
> original?
>

T

From rosuav at gmail.com  Sat May  9 07:06:07 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Sat, 9 May 2015 15:06:07 +1000
Subject: [Python-ideas] Why don't CPython strings implement slicing
 using a view?
In-Reply-To: <87mw1es8fp.fsf@vostro.rath.org>
References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com>
 <20150507154621.GU5663@ando.pearwood.info>
 <87mw1es8fp.fsf@vostro.rath.org>
Message-ID: <CAPTjJmqQDN0_Qq-aVkWY9k_jKsdDtOXGC33vyNwG8XoHXx=spg@mail.gmail.com>

On Sat, May 9, 2015 at 2:04 PM, Nikolaus Rath <Nikolaus at rath.org> wrote:
> On May 07 2015, Steven D'Aprano <steve-iDnA/YwAAsAk+I/owrrOrA at public.gmane.org> wrote:
>> But a view would be harmful in this situation:
>>
>> s = "some string"*1000000
>> t = s[1:2]  # a view masquerading as a new string
>> del s
>>
>> Now we keep the entire string alive long after it is needed.
>>
>> How would you solve the first problem without introducing the second?
>
> Keep track of the reference count of the underlying string, and if it
> goes down to one, turn the view into a copy and remove the sliced
> original?

Oops, mis-sent (stupid touchpad on this new laptop). Trying again.

There might be multiple views, so a hard-coded refcount-of-one check
wouldn't work. The view would need to keep a weak reference to its
underlying string - but not in the sense of the Python weakref module,
which doesn't seem to have any notion of "about to be garbage
collected", but only "has now been garbage collected". Notably, by the
time a callback gets called, it's too late to retrieve information
from the callback itself. A modified form of weakref could do it,
though; with the understanding that the referents are immutable, and
premature transform from view to coalesced slice has no consequence
beyond performance, this could be done.

Ideally, it'd be an entirely invisible optimization.
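A rough sketch of that view-to-copy transition (purely illustrative:
CPython's str cannot be weakly referenced from Python code at all, so a
real version would need interpreter support; this just makes the
coalescing step explicit):

```python
class StrView:
    """Hypothetical lazy view of a string slice.

    Holds a strong reference to the base string until materialize()
    is called, which copies the slice and drops the base -- the
    "coalescing" step discussed above.
    """

    def __init__(self, base, start, stop):
        self._base = base
        self._start = start
        self._stop = stop
        self._copy = None

    def materialize(self):
        # Copy the slice and release the (possibly huge) base string.
        if self._base is not None:
            self._copy = self._base[self._start:self._stop]
            self._base = None
        return self._copy

    def __str__(self):
        if self._base is not None:
            return self._base[self._start:self._stop]
        return self._copy
```

In the real proposal, materialize() would be triggered automatically
when the base string is about to be collected, which is exactly the hook
the stdlib weakref module does not provide.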

ChrisA

From mistersheik at gmail.com  Sat May  9 07:23:15 2015
From: mistersheik at gmail.com (Neil Girdhar)
Date: Sat, 9 May 2015 01:23:15 -0400
Subject: [Python-ideas] Why don't CPython strings implement slicing
 using a view?
In-Reply-To: <CAPTjJmqQDN0_Qq-aVkWY9k_jKsdDtOXGC33vyNwG8XoHXx=spg@mail.gmail.com>
References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com>
 <20150507154621.GU5663@ando.pearwood.info> <87mw1es8fp.fsf@vostro.rath.org>
 <CAPTjJmqQDN0_Qq-aVkWY9k_jKsdDtOXGC33vyNwG8XoHXx=spg@mail.gmail.com>
Message-ID: <CAA68w_nmS9PaymNN6pNy3efkqAiRor2CZbQJmktp=WTOCu07Tg@mail.gmail.com>

Exactly.

You know, it might be nice to have a recipe that creates a view to any
abc.Sequence for when you know that the underlying sequence won't change
(or don't care).  Something like:

class View: ...

some_view = View("some string", slice(2, 5))

some_view[0: 2]
"me"

etc.

Also a MutableView class could be used for abc.MutableSequences.
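A minimal runnable version of that recipe might look like this (a
sketch: it maps view indices through a range object, so slicing a view
never copies the underlying data):

```python
from collections.abc import Sequence


class View(Sequence):
    """Read-only view of any Sequence (illustrative sketch).

    Stores a range of indices into the underlying sequence, so
    slicing a view just slices the range -- no data is copied.
    """

    def __init__(self, seq, slc=slice(None)):
        self._seq = seq
        self._range = range(len(seq))[slc]

    def __len__(self):
        return len(self._range)

    def __getitem__(self, item):
        if isinstance(item, slice):
            view = View(self._seq)
            view._range = self._range[item]  # slicing a range is cheap
            return view
        return self._seq[self._range[item]]
```

Then View("some string", slice(2, 5)) behaves like some_view above:
''.join(some_view[0:2]) gives "me".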

Best,

Neil

On Sat, May 9, 2015 at 1:06 AM, Chris Angelico <rosuav at gmail.com> wrote:

> On Sat, May 9, 2015 at 2:04 PM, Nikolaus Rath <Nikolaus at rath.org> wrote:
> > On May 07 2015, Steven D'Aprano <steve-iDnA/YwAAsAk+I/
> owrrOrA at public.gmane.org> wrote:
> >> But a view would be harmful in this situation:
> >>
> >> s = "some string"*1000000
> >> t = s[1:2]  # a view masquerading as a new string
> >> del s
> >>
> >> Now we keep the entire string alive long after it is needed.
> >>
> >> How would you solve the first problem without introducing the second?
> >
> > Keep track of the reference count of the underlying string, and if it
> > goes down to one, turn the view into a copy and remove the sliced
> > original?
>
> Oops, mis-sent (stupid touchpad on this new laptop). Trying again.
>
> There might be multiple views, so a hard-coded refcount-of-one check
> wouldn't work. The view would need to keep a weak reference to its
> underlying string - but not in the sense of the Python weakref module,
> which doesn't seem to have any notion of "about to be garbage
> collected", but only "has now been garbage collected". Notably, by the
> time a callback gets called, it's too late to retrieve information
> from the callback itself. A modified form of weakref could do it,
> though; with the understanding that the referents are immutable, and
> premature transform from view to coalesced slice has no consequence
> beyond performance, this could be done.
>
> Ideally, it'd be an entirely invisible optimization.
>
> ChrisA
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150509/184e8a8e/attachment-0001.html>

From stephen at xemacs.org  Sat May  9 08:40:48 2015
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 09 May 2015 15:40:48 +0900
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <mii8mr$4qt$1@ger.gmane.org>
References: <mi79rj$vl8$1@ger.gmane.org>
 <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp>
 <mi8q15$bbp$1@ger.gmane.org>
 <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>
 <mii8mr$4qt$1@ger.gmane.org>
Message-ID: <87383645jj.fsf@uwakimon.sk.tsukuba.ac.jp>

Serhiy Storchaka writes:
 > On 05.05.15 11:23, Stephen J. Turnbull wrote:
 > > Serhiy Storchaka writes:
 > >
 > >   > Use cases include programs that use tkinter (common build of Tcl/Tk
 > >   > don't accept non-BMP characters), email or wsgiref.
 > >
 > > So, consider Tcl/Tk.  If you use it for input, no problem, it *can't*
 > > produce non-BMP characters.  So you're using it for output.  If
 > > knowing that your design involves tkinter, you deduce you must not
 > > accept non-BMP characters on input, where's your problem?
 > 
 > With Tcl/Tk all is not so easy.

I didn't claim *all* was easy; IME Tcl is just easy to break, and not
only in its Unicode handling.  But dealing with the problem you
mentioned at the interface between Python and Tcl/Tk can be done this
way.

 > The main issue is with translating from Tcl to Python. Tcl uses at
 > least two representations for strings (UCS-2 and modified UTF-8,
 > and Latin1 in some cases),

These are not represented *in Tcl* as Python str, are they?  If not,
they need to be converted with a regular byte-oriented codec, no?
Once again, a regular codec with appropriate error handler can deal
with it early, and better.  So fix Tkinter; it's probably not much
harder than documenting the correct use of these functions in dealing
with Tkinter.

 > > And ... you looked twice at your proposal?  You have basically
 > > reproduced the codec error handling API for .decode and .encode in a
 > > bunch to str2str "rehandle" functions.
 > 
 > Yes, this is the main advantage of proposed functions. They reuse
 > existing error handlers and are extensible by writing new error
 > handlers.

They also violate TOOWTDI.  In fact, that's their whole purpose.<wink/>

 > > In other words, you need to know as much to use "rehandle_*"
 > > properly as you do to use .decode and .encode.  I do not see a
 > > win for the programmer who is mostly innocent of encoding
 > > knowledge.
 > 
 > Is it a problem? These functions are for experienced users. Perhaps 
 > mostly for authors of libraries and frameworks.

Yes, it's a problem.  You say they're "for" experienced users, but
that's a null concept. You intend to make them *available* to all
users.  Very few users have experience in I18N technology, and those
are generally able to chain .encode().decode() correctly, which is
conceptually what you're doing anyway (in fact, that's the
*implementation* *you* published in issue18814!)
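For instance (an example constructed here, not taken from the issue), a
str-to-str fix-up is already expressible as one encode/decode chain:

```python
# A str containing a lone surrogate, e.g. produced by decoding bad
# bytes with errors="surrogateescape".
s = "abc\udcff"

# "Rehandle" it by round-tripping through bytes: re-encode the
# surrogate back to its original byte, then decode again with a
# different error handler.
fixed = s.encode("utf-8", "surrogateescape").decode("utf-8", "replace")
assert fixed == "abc\ufffd"  # the stray byte becomes U+FFFD
```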

OTOH, *most* experienced users have experienced I18N headaches.  "To a
man with a hammer, every problem looks like a nail" but with this
hammer, mostly it's actually a thumb.  These functions should only
ever be used on input, but in practice programmers under time pressure
(and who isn't?) tend to apply bandaids at the point where the problem
is detected -- which is output, since Python itself has no problems
with lone surrogates or astral characters.

As for authors of libraries and frameworks, *they* *really* should
be handling these problems at the external bytes -> internal
Unicode interface when the original data, and often metadata or even a
human user, is available for interrogation.  Not later, when all you
have is the resulting radioactive garbage, which you'll end up passing
on to the framework users.

 > > If we apply these rehandle_* thumbs to the holes in the I18N dike,
 > > it's just going to spring more leaks elsewhere.
 > 
 > There are a lot of batteries included in Python. They can explode
 > if you use them incorrectly.

I think a better analogy is explosive, which can be useful if used
safely. :-)

If you have to add these functions, *please* do not put them anywhere
near the codecs.  They are not codecs, they do not transform the
representation of data.  They change the semantics of the data.  Put
them in a "validation" submodule of the unicodedata package, or create
a new unicodetools package or something like that to hold them.

And they should be documented as dangerous because the transformations
they perform cannot be inverted to get the original input once the
strings produced are passed to other code (unless you also pass the
history of transformations as metadata).  This matters in applications
where the input bytes may have been digitally signed, for example.

(I've posted the last two paragraphs in somewhat more precise form to
issue18814.)

From abarnert at yahoo.com  Sat May  9 09:21:53 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sat, 9 May 2015 00:21:53 -0700
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>
 <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com>

On May 8, 2015, at 19:58, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> 
> Koos Zevenhoven writes:
> 
>> As a random example, (root @ mean @ square)(x) would produce the right 
>> order for rms when using [2].
> 
> Hardly interesting. :-)  The result is an exception, as root and square
> are conceptually scalar-to-scalar, while mean is sequence-to-scalar.

Unless you're using an elementwise square and an array-to-scalar mean, like the ones in NumPy, in which case it works perfectly well...

> I suppose you could write (root @ mean @ (map square)) (xs),

Actually, you can't. You could write (root @ mean @ partial(map, square))(xs), but that's pretty clearly less readable than root(mean(map(square, xs))) or root(mean(x*x for x in xs)). And that's been my main argument: Without a full suite of higher-level operators and related syntax, compose alone doesn't do you any good except for toy examples.

But Koos's example, even if it was possibly inadvertent, shows that I may be wrong about that. Maybe compose together with element-wise operators actually _is_ sufficient for something beyond toy examples.

Of course the fact that we have two groups of people each arguing that obviously the only possible reading of @ is compose/rcompose respectively points out a whole other problem with the idea. If people just were going to have to look up which way it went and learn it through experience, that would be one thing; if everyone already knows intuitively and half of them are wrong, that's a different story...

> which
> seems to support your argument.  But will all such issues and
> solutions give the same support?  This kind of thing is a conceptual
> problem that has to be discussed pretty thoroughly (presumably based
> on experience with implementations) before discussion of order can be
> conclusive.
> 
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From abarnert at yahoo.com  Sat May  9 09:31:54 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sat, 9 May 2015 00:31:54 -0700
Subject: [Python-ideas] Why don't CPython strings implement slicing
	using a view?
In-Reply-To: <87mw1es8fp.fsf@vostro.rath.org>
References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com>
 <20150507154621.GU5663@ando.pearwood.info> <87mw1es8fp.fsf@vostro.rath.org>
Message-ID: <F647F924-D007-4551-8B06-B46426E2EA4D@yahoo.com>

On May 8, 2015, at 21:04, Nikolaus Rath <Nikolaus at rath.org> wrote:
> 
>> On May 07 2015, Steven D'Aprano <steve-iDnA/YwAAsAk+I/owrrOrA at public.gmane.org> wrote:
>> But a view would be harmful in this situation:
>> 
>> s = "some string"*1000000
>> t = s[1:2]  # a view masquerading as a new string
>> del s
>> 
>> Now we keep the entire string alive long after it is needed.
>> 
>> How would you solve the first problem without introducing the second?
> 
> Keep track of the reference count of the underlying string, and if it
> goes down to one, turn the view into a copy and remove the sliced
> original?

It sounds like we're talking about an optimization that, although it could have a big benefit in some not too rare cases, could also have a non-negligible cost in incredibly common cases people use every day.

For example, today, "line = line.rstrip()" makes a copy of most of the original string, then discards the original string. With this change, the same line of code builds a view referencing most of line, then gets to some not-quite-a-weakref-destructor, which makes the copy and discards the original string and the view we just built. If line were huge, the small extra alloc and dealloc and refcheck might be unnoticeable noise, but if line is about 70 chars, as it usually will be, I'd expect a much more noticeable difference. And this is exactly the kind of thing you do in a loop 5 million times in a row in Python.

Of course I could be wrong; we won't really know until someone actually builds at least an implementation and tests it.

From jonathan at slenders.be  Sat May  9 09:56:46 2015
From: jonathan at slenders.be (Jonathan Slenders)
Date: Sat, 9 May 2015 09:56:46 +0200
Subject: [Python-ideas] What is happening with array.array('u') in
	Python 4?
In-Reply-To: <CALGmxEK7h8OwLU4ruQTWVznAGj-AVoDQ+Bdv4nNvK-20sVeeDg@mail.gmail.com>
References: <CAKfyG3xTePExzAHZH0ragGoH1HUsc13ssG4MgWqCFKYq9tvGkg@mail.gmail.com>
 <miiber$g87$1@ger.gmane.org>
 <CALGmxEK7h8OwLU4ruQTWVznAGj-AVoDQ+Bdv4nNvK-20sVeeDg@mail.gmail.com>
Message-ID: <CAKfyG3y7V6b-Cr70SKE+ayVyxErWBprAS9YzW-Jrr0HjK5AcgQ@mail.gmail.com>

Thanks a lot,

So, apparently it is possible to use a re bytes pattern to search through
array.array('u') and it works as well for numpy.chararray.

However, I suppose that for doing this you need to have knowledge of the
internal encoding, because re.search will actually compare bytes (from the
pattern) to unicode chars (from the array). So, the bytes have to be
UTF-32-encoded strings, I suppose.

Currently I have not enough knowledge of how Python strings are
implemented. I'm convinced that it's a good thing to have mutable strings,
but I guess it could indeed be hard to implement.

Cheers,
Jonathan





2015-05-08 22:46 GMT+02:00 Chris Barker <chris.barker at noaa.gov>:

> On Fri, May 8, 2015 at 5:50 AM, Stefan Behnel <stefan_ml at behnel.de> wrote:
>
>> ISTM that your best bet is currently to look for a suitable module on PyPI
>> that implements mutable character arrays. I'm sure you're not the only one
>> who needs something like that. The usual suspect would be NumPy, but there
>> may be smaller and simpler tools available.
>
>
> Numpy does have mutable character arrays -- and the Unicode version uses
> 4bytes per char, regardless of platform (and so should array.array!)
>
> But I don't think you get much of any of the features of strings, and I
> doubt that the re module would work with it.
>
> A "real" mutable string type might be pretty nice to have, but I think it
> would be pretty hard to get it to do everything a string can do. (or
> maybe not -- I suppose you could cut and paste the regular string code, and
> simply add the mutable part....)
>
> -Chris
>
>
>
>
>
>
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> Chris.Barker at noaa.gov
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150509/de5163f4/attachment-0001.html>

From stephen at xemacs.org  Sat May  9 10:36:03 2015
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 09 May 2015 17:36:03 +0900
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>
 <554C5FC0.1070106@aalto.fi>
 <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com>
Message-ID: <871tiq407g.fsf@uwakimon.sk.tsukuba.ac.jp>

Andrew Barnert writes:
 > On May 8, 2015, at 19:58, Stephen J. Turnbull <stephen at xemacs.org> wrote:
 > > 
 > > Koos Zevenhoven writes:
 > > 
 > >> As a random example, (root @ mean @ square)(x) would produce the right 
 > >> order for rms when using [2].
 > > 
 > > Hardly interesting. :-)  The result is an exception, as root and square
 > > are conceptually scalar-to-scalar, while mean is sequence-to-scalar.
 > 
 > Unless you're using an elementwise square and an array-to-scalar
 > mean, like the ones in NumPy,

Erm, why would square be elementwise and root not?  I would suppose
that everything is element-wise in Numpy (not a user yet).

 > in which case it works perfectly well...

But that's an aspect of my point (evidently, obscure).  Conceptually,
as taught in junior high school or so, root and square are scalar-to-
scalar.  If you are working in a context such as Numpy where it makes
sense to assume they are element-wise and thus composable, the context
should provide the compose operator(s).  Without that context, Koos's
example looks like a TypeError.

 > But Koos's example, even if it was possibly inadvertent, shows that
 > I may be wrong about that. Maybe compose together with element-wise
 > operators actually _is_ sufficient for something beyond toy
 > examples.

Of course it is!<wink />  I didn't really think there was any doubt
about that.  I thought the question was whether there's enough
commonality among such examples to come up with a Pythonic generic
definition of compose, or perhaps a sufficiently compelling example to
enshrine its definition as the "usual" interpretation in Python (and
let other interpretations overload some operator to get that effect in
their contexts).

 > Of course the fact that we have two groups of people each arguing
 > that obviously the only possible reading of @ is compose/rcompose
 > respectively points out a whole other problem with the idea.

I prefer fgh = f(g(h(-))), but I hardly think it's obvious.  Unless
you're *not* Dutch.  (If it were obvious to a Dutchman, we'd have it
already. <wink />)


From breamoreboy at yahoo.co.uk  Sat May  9 10:51:18 2015
From: breamoreboy at yahoo.co.uk (Mark Lawrence)
Date: Sat, 09 May 2015 09:51:18 +0100
Subject: [Python-ideas] Why don't CPython strings implement slicing
	using a view?
In-Reply-To: <F647F924-D007-4551-8B06-B46426E2EA4D@yahoo.com>
References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com>
 <20150507154621.GU5663@ando.pearwood.info> <87mw1es8fp.fsf@vostro.rath.org>
 <F647F924-D007-4551-8B06-B46426E2EA4D@yahoo.com>
Message-ID: <mikhq8$kf1$1@ger.gmane.org>

On 09/05/2015 08:31, Andrew Barnert via Python-ideas wrote:
> On May 8, 2015, at 21:04, Nikolaus Rath <Nikolaus at rath.org> wrote:
>>
>>> On May 07 2015, Steven D'Aprano <steve-iDnA/YwAAsAk+I/owrrOrA at public.gmane.org> wrote:
>>> But a view would be harmful in this situation:
>>>
>>> s = "some string"*1000000
>>> t = s[1:2]  # a view masquerading as a new string
>>> del s
>>>
>>> Now we keep the entire string alive long after it is needed.
>>>
>>> How would you solve the first problem without introducing the second?
>>
>> Keep track of the reference count of the underlying string, and if it
>> goes down to one, turn the view into a copy and remove the sliced
>> original?
>
> Of course I could be wrong; we won't really know until someone actually builds at least an implementation and tests it.

Well they can, but I found that a major problem with views is that you 
can't compare them and so can't sort them, which renders them useless for 
a lot of applications.

-- 
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence


From mistersheik at gmail.com  Sat May  9 10:53:47 2015
From: mistersheik at gmail.com (Neil Girdhar)
Date: Sat, 9 May 2015 04:53:47 -0400
Subject: [Python-ideas] Why don't CPython strings implement slicing
 using a view?
In-Reply-To: <mikhq8$kf1$1@ger.gmane.org>
References: <3535c298-c113-458b-afc8-b2265b8aca94@googlegroups.com>
 <20150507154621.GU5663@ando.pearwood.info> <87mw1es8fp.fsf@vostro.rath.org>
 <F647F924-D007-4551-8B06-B46426E2EA4D@yahoo.com> <mikhq8$kf1$1@ger.gmane.org>
Message-ID: <CAA68w_nvO5Zoyq7PkGuCb0fdRzjGwPS24vSk4H09q94H4m6e8w@mail.gmail.com>

Why not?  You can compare numpy array views, can't you?

In [4]: a = np.array([1,2])

In [5]: a[1:] < a[:1]
Out[5]: array([False], dtype=bool)

On Sat, May 9, 2015 at 4:51 AM, 'Mark Lawrence' via python-ideas <
python-ideas at googlegroups.com> wrote:

> On 09/05/2015 08:31, Andrew Barnert via Python-ideas wrote:
>
>> On May 8, 2015, at 21:04, Nikolaus Rath <Nikolaus at rath.org> wrote:
>>
>>>
>>>  On May 07 2015, Steven D'Aprano <steve-iDnA/YwAAsAk+I/
>>>> owrrOrA at public.gmane.org> wrote:
>>>> But a view would be harmful in this situation:
>>>>
>>>> s = "some string"*1000000
>>>> t = s[1:2]  # a view masquerading as a new string
>>>> del s
>>>>
>>>> Now we keep the entire string alive long after it is needed.
>>>>
>>>> How would you solve the first problem without introducing the second?
>>>>
>>>
>>> Keep track of the reference count of the underlying string, and if it
>>> goes down to one, turn the view into a copy and remove the sliced
>>> original?
>>>
>>
>> Of course I could be wrong; we won't really know until someone actually
>> builds at least an implementation and tests it.
>>
>
> Well they can, but I found a major problem with views is that you can't
> compare them and so can't sort them, thus rendering them useless for a lot
> of applications.
>
> --
> My fellow Pythonistas, ask not what our language can do for you, ask
> what you can do for our language.
>
> Mark Lawrence
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150509/9cf34e53/attachment.html>

From abarnert at yahoo.com  Sat May  9 12:19:37 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sat, 9 May 2015 03:19:37 -0700
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <871tiq407g.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>
 <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com>
 <871tiq407g.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <E1B0ADEA-5A75-4258-9010-43D9EE71BD99@yahoo.com>

On May 9, 2015, at 01:36, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> 
> Andrew Barnert writes:
>>> On May 8, 2015, at 19:58, Stephen J. Turnbull <stephen at xemacs.org> wrote:
>>> 
>>> Koos Zevenhoven writes:
>>> 
>>>> As a random example, (root @ mean @ square)(x) would produce the right 
>>>> order for rms when using [2].
>>> 
>>> Hardly interesting. :-)  The result is an exception, as root and square
>>> are conceptually scalar-to-scalar, while mean is sequence-to-scalar.
>> 
>> Unless you're using an elementwise square and an array-to-scalar
>> mean, like the ones in NumPy,
> 
> Erm, why would square be elementwise and root not?  I would suppose
> that everything is element-wise in Numpy (not a user yet).

Most functions in NumPy are elementwise when applied to arrays, but can also be applied to scalars. So, square is elementwise because it's called on an array, root is scalar because it's called on a scalar. (In fact, root could also be elementwise--aggregating functions like mean can be applied across just one axis of a 2D or higher array, reducing it by one dimension, if you want.)

Before you try it, this sounds like a complicated nightmare that can't possibly work in practice. But play with it for just a few minutes and it's completely natural. (Except for a few cases where you want some array-wide but not element-wise operation, most famously matrix multiplication, which is why we now have the @ operator to play with.)

>> in which case it works perfectly well...
> 
> But that's an aspect of my point (evidently, obscure).  Conceptually,
> as taught in junior high school or so, root and square are scalar-to-
> scalar.  If you are working in a context such as Numpy where it makes
> sense to assume they are element-wise and thus composable, the context
> should provide the compose operator(s).  

I was actually thinking on these lines: what if @ didn't work on types.FunctionType, but did work on numpy.ufunc (the name for the "universal function" type that knows how to broadcast across arrays but also work on scalars)? That's something NumPy could implement without any help from the core language. (Methods are a minor problem here, but it's obvious how to solve them, so I won't get into it.) And if it turned out to be useful all over the place in NumPy, that might turn up some great uses for the idiomatic non-NumPy Python, or it might show that, like elementwise addition, it's really more a part of NumPy than of Python.

But of course that's more of a proposal for NumPy than for Python.
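For what it's worth, a composable function type with @ can be prototyped today without any help from the language. The class name and design here are mine, and unlike a real numpy.ufunc it only handles plain calls, no broadcasting:

```python
class Composable:
    """Minimal function wrapper that overloads @ as composition:
    (f @ g)(x) == f(g(x)).  A real numpy.ufunc version would also
    broadcast over arrays; this sketch only forwards plain calls."""
    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)

    def __matmul__(self, other):
        # (self @ other)(x) applies other first, then self.
        return Composable(lambda *a, **kw: self.func(other(*a, **kw)))

root = Composable(lambda x: x ** 0.5)
double = Composable(lambda x: 2 * x)
print((root @ double)(8))  # root(double(8)) == 16 ** 0.5 == 4.0
```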

> Without that context, Koos's
> example looks like a TypeError.

>> But Koos's example, even if it was possibly inadvertent, shows that
>> I may be wrong about that. Maybe compose together with element-wise
>> operators actually _is_ sufficient for something beyond toy
>> examples.
> 
> Of course it is!<wink />  I didn't really think there was any doubt
> about that.  

I think there was, and still is. People keep coming up with abstract toy examples, but as soon as someone tries to give a good real example, it only makes sense with NumPy (Koos's) or with some syntax that Python doesn't have (yours), because to write them with actual Python functions would actually be ugly and verbose (my version of yours). 

I don't think that's a coincidence. You didn't write "map square" because you don't know how to think in Python, but because using compose profitably inherently implies not thinking in Python. (Except, maybe, in the case of NumPy... which is a different idiom.) Maybe someone has a bunch of obvious good use cases for compose that don't also require other functions, operators, or syntax we don't have, but so far, nobody's mentioned one.

From tjreedy at udel.edu  Sat May  9 17:20:53 2015
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 09 May 2015 11:20:53 -0400
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <E1B0ADEA-5A75-4258-9010-43D9EE71BD99@yahoo.com>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>
 <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com>
 <871tiq407g.fsf@uwakimon.sk.tsukuba.ac.jp>
 <E1B0ADEA-5A75-4258-9010-43D9EE71BD99@yahoo.com>
Message-ID: <mil8kq$26i$1@ger.gmane.org>

On 5/9/2015 6:19 AM, Andrew Barnert via Python-ideas wrote:

> I think there was, and still is. People keep coming up with abstract toy examples, but as soon as someone tries to give a good real example, it only makes sense with NumPy (Koos's) or with some syntax that Python doesn't have (yours), because to write them with actual Python functions would actually be ugly and verbose (my version of yours).
>
> I don't think that's a coincidence. You didn't write "map square" because you don't know how to think in Python, but because using compose profitably inherently implies not thinking in Python. (Except, maybe, in the case of NumPy... which is a different idiom.) Maybe someone has a bunch of obvious good use cases for compose that don't also require other functions, operators, or syntax we don't have, but so far, nobody's mentioned one.

I agree that @ is most likely to be useful in NumPy's restricted context.

A composition operator is usually defined by application: (f @ g)(x) is 
defined as f(g(x)).  (I'm sure there are also axiomatic treatments.)  It 
is an optional syntactic abbreviation. It is most useful in a context 
where there is one set of data objects, such as the real numbers, or one 
set plus arrays (vectors) defined on that set; where all functions are 
univariate (or possibly multivariate, but those can be transformed to 
univariate on vectors); *and* where parameter names are dummies like 
'x', 'y', 'z', or '_'.

The last point is important. Abbreviating h(x) = f(g(x)) with h = f @ g 
does not lose any information as 'x' is basically a placeholder (so get 
rid of it).  But parameter names are important in most practical 
contexts, both for understanding a composition and for using it.

def npv(transfers, discount):
     '''Return the net present value of discounted transfers.

     transfers: finite iterable of amounts at constant intervals
     discount: fraction per interval
     '''
     divisor = 1 + discount
     return sum(transfer/divisor**time
                 for time, transfer in enumerate(transfers))

Even if one could replace the def statement with
npv = <some combination of @, sum, map, add, div, power, enumerate, ...>
with parameter names omitted, it would be harder to understand.  Using 
it would require the ability to infer argument types and order from the 
composed expression.

I intentionally added a statement to calculate the common subexpression 
prior to the return. I believe it would have to be put back into the 
return expression before converting.

-- 
Terry Jan Reedy


From ron3200 at gmail.com  Sat May  9 17:38:38 2015
From: ron3200 at gmail.com (Ron Adam)
Date: Sat, 09 May 2015 11:38:38 -0400
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>
 <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com>
Message-ID: <mil9lv$hi2$1@ger.gmane.org>



On 05/09/2015 03:21 AM, Andrew Barnert via Python-ideas wrote:
>> >I suppose you could write (root @ mean @ (map square)) (xs),

> Actually, you can't. You could write (root @ mean @ partial(map,
> square))(xs), but that's pretty clearly less readable than
> root(mean(map(square, xs))) or root(mean(x*x for x in xs)). And that's
> been my main argument: Without a full suite of higher-level operators
> and related syntax, compose alone doesn't do you any good except for toy
> examples.

How about an operator for partial?

           root @ mean @ map $ square(xs)


Actually I'd rather reuse the binary operators.  (I'd be happy if they were 
just methods on bytes objects BTW.)

           compose(root, mean, map(square, xs))

           root ^ mean ^ map & square (xs)

           root ^ mean ^ map & square ^ xs ()

Read this as...

          compose root, of mean, of map with square, of xs

Or...

           apply(map(square, xs), mean, root)

           map & square | mean | root (xs)

           xs | map & square | mean | root ()


Read this as...

           apply xs, to map with square, to mean, to root


These are kind of cool, but does it make python code easier to read?  That 
seems like it may be subjective depending on the amount of programming 
experience someone has.

Cheers,
    Ron





From apieum at gmail.com  Sat May  9 18:08:12 2015
From: apieum at gmail.com (Gregory Salvan)
Date: Sat, 9 May 2015 18:08:12 +0200
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <mil9lv$hi2$1@ger.gmane.org>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>
 <554C5FC0.1070106@aalto.fi>
 <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com>
 <mil9lv$hi2$1@ger.gmane.org>
Message-ID: <CAAZsQLB3RDK9=4j9Rn6AZ6eBUu-PWTJVqof0J02nm6bB2TU10Q@mail.gmail.com>

Hi,
I had to answer some of these questions when I wrote Lawvere:
https://pypi.python.org/pypi/lawvere

First, there are two kinds of composition: pipe and circle, so I think a
single operator like @ is a bit restrictive.
I like "->" and "<-".

Then, for function naming and string representation I had to introduce a
function signature (a tuple).
It provides a good tool for decomposition, introspection and comparison
consistent with the mathematical definition.

Finally, for me composition makes sense when you have typed functions;
otherwise it can easily become a mess, and this ties composition to
multiple dispatch.

I really hope composition will be introduced in Python, but I can't see how
it can be done without rethinking a good part of how functions are defined.



2015-05-09 17:38 GMT+02:00 Ron Adam <ron3200 at gmail.com>:

>
>
> On 05/09/2015 03:21 AM, Andrew Barnert via Python-ideas wrote:
>
>> >I suppose you could write (root @ mean @ (map square)) (xs),
>>>
>>
>  Actually, you can't. You could write (root @ mean @ partial(map,
>> square))(xs), but that's pretty clearly less readable than
>> root(mean(map(square, xs))) or root(mean(x*x for x in xs)). And that's
>> been my main argument: Without a full suite of higher-level operators
>> and related syntax, compose alone doesn't do you any good except for toy
>> examples.
>>
>
> How about an operator for partial?
>
>           root @ mean @ map $ square(xs)
>
>
> Actually I'd rather reuse the binary operators.  (I'd be happy if they
> were just methods on bytes objects BTW.)
>
>           compose(root, mean, map(square, xs))
>
>           root ^ mean ^ map & square (xs)
>
>           root ^ mean ^ map & square ^ xs ()
>
> Read this as...
>
>          compose root, of mean, of map with square, of xs
>
> Or...
>
>           apply(map(square, xs), mean, root)
>
>           map & square | mean | root (xs)
>
>           xs | map & square | mean | root ()
>
>
> Read this as...
>
>           apply xs, to map with square, to mean, to root
>
>
> These are kind of cool, but does it make python code easier to read?  That
> seems like it may be subjective depending on the amount of programming
> experience someone has.
>
> Cheers,
>    Ron
>
>
>
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>

From steve at pearwood.info  Sat May  9 20:16:43 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 10 May 2015 04:16:43 +1000
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <mil9lv$hi2$1@ger.gmane.org>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>
 <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com> <mil9lv$hi2$1@ger.gmane.org>
Message-ID: <20150509181642.GB5663@ando.pearwood.info>

On Sat, May 09, 2015 at 11:38:38AM -0400, Ron Adam wrote:

> How about an operator for partial?
> 
>           root @ mean @ map $ square(xs)

Apart from the little matter that Guido has said that $ will never be 
used as an operator in Python, what is the association between $ and 
partial?

Most other operators have either been used for centuries e.g. + and - or 
at least decades e.g. * for multiplication because ASCII doesn't have 
the × symbol. The barrier to using a completely arbitrary symbol with no 
association to the function it plays should be considered very high.

I would only support an operator for function composition if it was at 
least close to the standard operators used for function composition in 
other areas. @ at least suggests the ∘ used in mathematics, e.g. 
sin∘cos, but | is used in pipelining languages and shells and could be 
considered, e.g. ls | wc.

My own preference would be to look at @ as the closest available ASCII 
symbol to ∘ and use it for left-to-right composition, and | for 
left-to-right function application. E.g.

(spam @ eggs @ cheese)(arg) is equivalent to spam(eggs(cheese(arg)))

(spam | eggs | cheese)(arg) is equivalent to cheese(eggs(spam(arg)))

also known as compose() and rcompose().

We can read "@" as "of", "spam of eggs of cheese of arg", and | as 
a pipe, "spam(arg) piped to eggs piped to cheese".
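Both spellings can be prototyped today with a small wrapper class. The class name and design here are mine, and this sketch assumes the functions are wrapped explicitly rather than @ and | working on plain function objects:

```python
class F:
    """Wrap a one-argument function so that @ composes right-to-left
    and | chains left-to-right:
        (spam @ eggs @ cheese)(x) == spam(eggs(cheese(x)))
        (spam | eggs | cheese)(x) == cheese(eggs(spam(x)))"""
    def __init__(self, func):
        self.func = func

    def __call__(self, x):
        return self.func(x)

    def __matmul__(self, other):
        # "spam of eggs": apply other first, then self.
        return F(lambda x: self(other(x)))

    def __or__(self, other):
        # "spam piped to eggs": apply self first, then other.
        return F(lambda x: other(self(x)))

spam = F(lambda x: x + 1)
eggs = F(lambda x: x * 10)
cheese = F(lambda x: x - 3)

print((spam @ eggs @ cheese)(5))  # spam(eggs(cheese(5))) == (5-3)*10+1 == 21
print((spam | eggs | cheese)(5))  # cheese(eggs(spam(5))) == (5+1)*10-3 == 57
```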

It's a pity we can't match the shell syntax and write:

spam(args)|eggs|cheese

but that would have a completely different meaning.


David Beazley has a tutorial on using coroutines in pipelines:

http://www.dabeaz.com/coroutines/

where he ends up writing this:

    f = open("access-log")
    follow(f,
           grep('python',
           printer()))


Coroutines grep() and printer() make up the pipeline. I cannot help but 
feel that the | syntax would be especially powerful for this sort of 
data processing purpose:

    # could this work using some form of function composition?
    follow(f, grep('python')|printer)



-- 
Steve

From mertz at gnosis.cx  Sat May  9 20:30:17 2015
From: mertz at gnosis.cx (David Mertz)
Date: Sat, 9 May 2015 13:30:17 -0500
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <20150509181642.GB5663@ando.pearwood.info>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>
 <554C5FC0.1070106@aalto.fi>
 <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com>
 <mil9lv$hi2$1@ger.gmane.org>
 <20150509181642.GB5663@ando.pearwood.info>
Message-ID: <CAEbHw4awMVzdWBbZt2HoZozwoTEWtCPPfoCPTRzUqWTEigbkFQ@mail.gmail.com>

On Sat, May 9, 2015 at 1:16 PM, Steven D'Aprano <steve at pearwood.info> wrote:

> On Sat, May 09, 2015 at 11:38:38AM -0400, Ron Adam wrote:
>
> > How about an operator for partial?
> >
> >           root @ mean @ map $ square(xs)
>

I have trouble seeing the advantage of a special function composition
operator when it is easy to write a general 'compose()' function that can
produce such things easily enough.

E.g. in a white paper I just did for O'Reilly on _Functional Programming in
Python_ I propose this little example implementation:

def compose(*funcs):
    "Return a new function s.t. compose(f,g,...)(x) == f(g(...(x)))"
    def inner(data, funcs=funcs):
        result = data
        for f in reversed(funcs):
            result = f(result)
        return result
    return inner

Which we might use as:

  RMS = compose(root, mean, square)
  result = RMS(my_array)
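With stand-in definitions for root, mean and square (mine, not from the white paper), the sketch runs end to end:

```python
import math

def compose(*funcs):
    "Return a new function s.t. compose(f,g,...)(x) == f(g(...(x)))"
    def inner(data, funcs=funcs):
        result = data
        for f in reversed(funcs):
            result = f(result)
        return result
    return inner

# Stand-in helpers; any scalar-returning equivalents would do.
root = math.sqrt
mean = lambda xs: sum(xs) / len(xs)
square = lambda xs: [x * x for x in xs]

RMS = compose(root, mean, square)
print(RMS([3, 4]))  # sqrt((9 + 16) / 2) == sqrt(12.5)
```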

-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.

From donald at stufft.io  Sat May  9 20:33:21 2015
From: donald at stufft.io (Donald Stufft)
Date: Sat, 9 May 2015 14:33:21 -0400
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <CAEbHw4awMVzdWBbZt2HoZozwoTEWtCPPfoCPTRzUqWTEigbkFQ@mail.gmail.com>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>
 <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com> <mil9lv$hi2$1@ger.gmane.org>
 <20150509181642.GB5663@ando.pearwood.info>
 <CAEbHw4awMVzdWBbZt2HoZozwoTEWtCPPfoCPTRzUqWTEigbkFQ@mail.gmail.com>
Message-ID: <B9E2361B-73A4-4F1A-81D1-21FB281DC0A7@stufft.io>


> On May 9, 2015, at 2:30 PM, David Mertz <mertz at gnosis.cx> wrote:
> 
> On Sat, May 9, 2015 at 1:16 PM, Steven D'Aprano <steve at pearwood.info <mailto:steve at pearwood.info>> wrote:
> On Sat, May 09, 2015 at 11:38:38AM -0400, Ron Adam wrote:
> 
> > How about an operator for partial?
> >
> >           root @ mean @ map $ square(xs)
> 
> I have trouble seeing the advantage of a special function composition operator when it is easy to write a general 'compose()' function that can produce such things easily enough.
> 
> E.g. in a white paper I just did for O'Reilly on _Functional Programming in Python_ I propose this little example implementation:
> 
> def compose(*funcs):
>     "Return a new function s.t. compose(f,g,...)(x) == f(g(...(x)))"
>     def inner(data, funcs=funcs):
>         result = data
>         for f in reversed(funcs):
>             result = f(result)
>         return result
>     return inner
> 
> Which we might use as:
> 
>   RMS = compose(root, mean, square)
>   result = RMS(my_array)


Maybe functools.compose?

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From koos.zevenhoven at aalto.fi  Sat May  9 21:15:21 2015
From: koos.zevenhoven at aalto.fi (Koos Zevenhoven)
Date: Sat, 9 May 2015 22:15:21 +0300
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>
 <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com> <mil9lv$hi2$1@ger.gmane.org>
 <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info>
Message-ID: <554E5CC9.3010406@aalto.fi>


On 2015-05-09 21:16, Steven D'Aprano wrote:
> On Sat, May 09, 2015 at 11:38:38AM -0400, Ron Adam wrote:
>
>> How about an operator for partial?
>>
>>            root @ mean @ map $ square(xs)
> Apart from the little matter that Guido has said that $ will never be
> used as an operator in Python, what is the association between $ and
> partial?
>
> Most other operators have either been used for centuries e.g. + and - or
> at least decades e.g. * for multiplication because ASCII doesn't have
> the × symbol. The barrier to using a completely arbitrary symbol with no
> association to the function it plays should be considered very high.
>
> I would only support an operator for function composition if it was at
> least close to the standard operators used for function composition in
> other areas. @ at least suggests the ∘ used in mathematics, e.g.
> sin∘cos, but | is used in pipelining languages and shells and could be
> considered, e.g. ls | wc.
>
> My own preference would be to look at @ as the closest available ASCII
> symbol to ∘ and use it for left-to-right composition, and | for
> left-to-right function application. E.g.
>
> (spam @ eggs @ cheese)(arg) is equivalent to spam(eggs(cheese(arg)))
>
> (spam | eggs | cheese)(arg) is equivalent to cheese(eggs(spam(arg)))
>
> also known as compose() and rcompose().
> We can read "@" as "of", "spam of eggs of cheese of arg", and | as
> a pipe, "spam(arg) piped to eggs piped to cheese".

For me these are by far the most logical ones too, for exactly the same 
reasons (and because of the connection of @ with matrix multiplication 
and operators that operate from the left).

> It's a pity we can't match the shell syntax and write:
>
> spam(args)|eggs|cheese
>
> but that would have a completely different meaning.
>


But it does not need to have a different meaning. You could in addition 
have:

spam @ eggs @ cheese @ arg   #  equivalent to spam(eggs(cheese(arg)))

arg | spam | eggs | cheese    # equivalent to cheese(eggs(spam(arg)))

Here, arg would thus be recognized as not a function.

In this version, your example of spam(args)|eggs|cheese would do exactly 
the same operation as (spam | eggs | cheese)(args) :-).


> David Beazley has a tutorial on using coroutines in pipelines:
>
> http://www.dabeaz.com/coroutines/
>
> where he ends up writing this:
>
>      f = open("access-log")
>      follow(f,
>             grep('python',
>             printer()))
>
>
> Coroutines grep() and printer() make up the pipeline. I cannot help but
> feel that the | syntax would be especially powerful for this sort of
> data processing purpose:
>
>      # could this work using some form of function composition?
>      follow(f, grep('python')|printer)
>
>
>

This seems promising!


-- Koos


From apieum at gmail.com  Sat May  9 22:41:24 2015
From: apieum at gmail.com (Gregory Salvan)
Date: Sat, 9 May 2015 22:41:24 +0200
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <554E5CC9.3010406@aalto.fi>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>
 <554C5FC0.1070106@aalto.fi>
 <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com>
 <mil9lv$hi2$1@ger.gmane.org>
 <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info>
 <554E5CC9.3010406@aalto.fi>
Message-ID: <CAAZsQLDiVJ_d6Pp1N9iiDpXL=z3cSWx-eBdWzOK4r9tDJ2zRCg@mail.gmail.com>

The pipeline operator may be confused with the bitwise operator.
In this case:
eggs = arg | spam | cheese

Is eggs a composed function or a string of bits?


2015-05-09 21:15 GMT+02:00 Koos Zevenhoven <koos.zevenhoven at aalto.fi>:

>
> On 2015-05-09 21:16, Steven D'Aprano wrote:
>
>> On Sat, May 09, 2015 at 11:38:38AM -0400, Ron Adam wrote:
>>
>>  How about an operator for partial?
>>>
>>>            root @ mean @ map $ square(xs)
>>>
>> Apart from the little matter that Guido has said that $ will never be
>> used as an operator in Python, what is the association between $ and
>> partial?
>>
>> Most other operators have either been used for centuries e.g. + and - or
>> at least decades e.g. * for multiplication because ASCII doesn't have
>> the × symbol. The barrier to using a completely arbitrary symbol with no
>> association to the function it plays should be considered very high.
>>
>> I would only support an operator for function composition if it was at
>> least close to the standard operators used for function composition in
>> other areas. @ at least suggests the ∘ used in mathematics, e.g.
>> sin∘cos, but | is used in pipelining languages and shells and could be
>> considered, e.g. ls | wc.
>>
>> My own preference would be to look at @ as the closest available ASCII
>> symbol to ∘ and use it for left-to-right composition, and | for
>> left-to-right function application. E.g.
>>
>> (spam @ eggs @ cheese)(arg) is equivalent to spam(eggs(cheese(arg)))
>>
>> (spam | eggs | cheese)(arg) is equivalent to cheese(eggs(spam(arg)))
>>
>> also known as compose() and rcompose().
>> We can read "@" as "of", "spam of eggs of cheese of arg", and | as
>> a pipe, "spam(arg) piped to eggs piped to cheese".
>>
>
> For me these are by far the most logical ones too, for exactly the same
> reasons (and because of the connection of @ with matrix multiplication and
> operators that operate from the left).
>
>  It's a pity we can't match the shell syntax and write:
>>
>> spam(args)|eggs|cheese
>>
>> but that would have a completely different meaning.
>>
>>
>
> But it does not need to have a different meaning. You could in addition
> have:
>
> spam @ eggs @ cheese @ arg   #  equivalent to spam(eggs(cheese(arg)))
>
> arg | spam | eggs | cheese    # equivalent to cheese(eggs(spam(arg)))
>
> Here, arg would thus be recognized as not a function.
>
> In this version, your example of spam(args)|eggs|cheese would do exactly
> the same operation as (spam | eggs | cheese)(args) :-).
>
>
>  David Beazley has a tutorial on using coroutines in pipelines:
>>
>> http://www.dabeaz.com/coroutines/
>>
>> where he ends up writing this:
>>
>>      f = open("access-log")
>>      follow(f,
>>             grep('python',
>>             printer()))
>>
>>
>> Coroutines grep() and printer() make up the pipeline. I cannot help but
>> feel that the | syntax would be especially powerful for this sort of
>> data processing purpose:
>>
>>      # could this work using some form of function composition?
>>      follow(f, grep('python')|printer)
>>
>>
>>
>>
> This seems promising!
>
>
> -- Koos
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>

From apieum at gmail.com  Sun May 10 00:03:24 2015
From: apieum at gmail.com (Gregory Salvan)
Date: Sun, 10 May 2015 00:03:24 +0200
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <CAAZsQLDiVJ_d6Pp1N9iiDpXL=z3cSWx-eBdWzOK4r9tDJ2zRCg@mail.gmail.com>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>
 <554C5FC0.1070106@aalto.fi>
 <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com>
 <mil9lv$hi2$1@ger.gmane.org>
 <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info>
 <554E5CC9.3010406@aalto.fi>
 <CAAZsQLDiVJ_d6Pp1N9iiDpXL=z3cSWx-eBdWzOK4r9tDJ2zRCg@mail.gmail.com>
Message-ID: <CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com>

Nobody is convinced by an arrow operator?

like: arg -> spam -> eggs -> cheese
or: cheese <- eggs <- spam <- arg

This also makes sense with annotations:

def func(x:type1, y:type2) -> type3:
    pass

we expect func to return a value of type3.



2015-05-09 22:41 GMT+02:00 Gregory Salvan <apieum at gmail.com>:

> pipeline operator may be confusing with bitwise operator.
> In this case :
> eggs = arg | spam | cheese
>
> Is eggs a composed function or string of bits ?
>
>
> 2015-05-09 21:15 GMT+02:00 Koos Zevenhoven <koos.zevenhoven at aalto.fi>:
>
>>
>> On 2015-05-09 21:16, Steven D'Aprano wrote:
>>
>>> On Sat, May 09, 2015 at 11:38:38AM -0400, Ron Adam wrote:
>>>
>>>  How about an operator for partial?
>>>>
>>>>            root @ mean @ map $ square(xs)
>>>>
>>> Apart from the little matter that Guido has said that $ will never be
>>> used as an operator in Python, what is the association between $ and
>>> partial?
>>>
>>> Most other operators have either been used for centuries e.g. + and - or
>>> at least decades e.g. * for multiplication because ASCII doesn't have
>>> the × symbol. The barrier to using a completely arbitrary symbol with no
>>> association to the function it plays should be considered very high.
>>>
>>> I would only support an operator for function composition if it was at
>>> least close to the standard operators used for function composition in
>>> other areas. @ at least suggests the ∘ used in mathematics, e.g.
>>> sin∘cos, but | is used in pipelining languages and shells and could be
>>> considered, e.g. ls | wc.
>>>
>>> My own preference would be to look at @ as the closest available ASCII
>>> symbol to ∘ and use it for left-to-right composition, and | for
>>> left-to-right function application. E.g.
>>>
>>> (spam @ eggs @ cheese)(arg) is equivalent to spam(eggs(cheese(arg)))
>>>
>>> (spam | eggs | cheese)(arg) is equivalent to cheese(eggs(spam(arg)))
>>>
>>> also known as compose() and rcompose().
>>> We can read "@" as "of", "spam of eggs of cheese of arg", and | as
>>> a pipe, "spam(arg) piped to eggs piped to cheese".
>>>
>>
>> For me these are by far the most logical ones too, for exactly the same
>> reasons (and because of the connection of @ with matrix multiplication and
>> operators that operate from the left).
>>
>>  It's a pity we can't match the shell syntax and write:
>>>
>>> spam(args)|eggs|cheese
>>>
>>> but that would have a completely different meaning.
>>>
>>>
>>
>> But it does not need to have a different meaning. You could in addition
>> have:
>>
>> spam @ eggs @ cheese @ arg   #  equivalent to spam(eggs(cheese(arg)))
>>
>> arg | spam | eggs | cheese    # equivalent to cheese(eggs(spam(arg)))
>>
>> Here, arg would thus be recognized as not a function.
>>
>> In this version, your example of spam(args)|eggs|cheese would do exactly
>> the same operation as (spam | eggs | cheese)(args) :-).
>>
>>
>>  David Beazley has a tutorial on using coroutines in pipelines:
>>>
>>> http://www.dabeaz.com/coroutines/
>>>
>>> where he ends up writing this:
>>>
>>>      f = open("access-log")
>>>      follow(f,
>>>             grep('python',
>>>             printer()))
>>>
>>>
>>> Coroutines grep() and printer() make up the pipeline. I cannot help but
>>> feel that the | syntax would be especially powerful for this sort of
>>> data processing purpose:
>>>
>>>      # could this work using some form of function composition?
>>>      follow(f, grep('python')|printer)
>>>
>>>
>>>
>>>
>> This seems promising!
>>
>>
>> -- Koos
>>
>>
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>
>

From abarnert at yahoo.com  Sun May 10 00:45:26 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sat, 9 May 2015 15:45:26 -0700
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <mil9lv$hi2$1@ger.gmane.org>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>
 <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com> <mil9lv$hi2$1@ger.gmane.org>
Message-ID: <91A6985C-A94B-4132-99B1-0305933950B5@yahoo.com>

On May 9, 2015, at 08:38, Ron Adam <ron3200 at gmail.com> wrote:
> 
> 
> 
> On 05/09/2015 03:21 AM, Andrew Barnert via Python-ideas wrote:
>>> >I suppose you could write (root @ mean @ (map square)) (xs),
> 
>> Actually, you can't. You could write (root @ mean @ partial(map,
>> square))(xs), but that's pretty clearly less readable than
>> root(mean(map(square, xs))) or root(mean(x*x for x in xs)). And that's
>> been my main argument: Without a full suite of higher-level operators
>> and related syntax, compose alone doesn't do you any good except for toy
>> examples.
> 
> How about an operator for partial?
> 
>          root @ mean @ map $ square(xs)

I'm pretty sure that anyone who sees that and doesn't interpret it as meaningless nonsense is going to interpret it as a variation on Haskell and get the wrong intuition.

But, more importantly, this doesn't work. Your square(xs) isn't going to evaluate to a function, but to whatever calling square on xs returns (which is presumably a TypeError, or you wouldn't be looking to map in the first place). And, even if that did work, you're not actually composing a function here anyway; your @ is just a call operator, which we already have in Python, spelled with parens.

> Actually I'd rather reuse the binary operators.  (I'd be happy if they were just methods on bytes objects BTW.)
> 
>          compose(root, mean, map(square, xs))

Now you're not calling square(xs), but you are calling map(square, xs), which is going to return an iterable of squares, not a function; again, you're not composing a function object at all.

And think about how you'd actually write this correctly. You need to either use lambda (which defeats the entire purpose of compose), or partial (which works, but is clumsy and ugly enough that, without an operator or syntactic sugar, people rarely use it).
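
For the record, here's roughly what the two alternatives look like (a
minimal sketch; this compose is hypothetical and composes right to left,
like nested calls):

```python
from functools import partial, reduce

def compose(*funcs):
    # Hypothetical helper: compose(f, g)(x) == f(g(x)).
    return reduce(lambda f, g: lambda x: f(g(x)), funcs)

def square(x):
    return x * x

# With partial -- works, but clumsy enough that people rarely bother:
f1 = compose(sum, partial(map, square))

# With lambda -- works too, but defeats the point of compose:
f2 = compose(sum, lambda xs: map(square, xs))

assert f1([1, 2, 3]) == f2([1, 2, 3]) == 14
```

Neither spelling is an improvement over sum(x*x for x in xs), which is
the point.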
> 
>          root ^ mean ^ map & square (xs)
> 
>          root ^ mean ^ map & square ^ xs ()
> 
> Read this as...
> 
>         compose root, of mean, of map with square, of xs

But that's not composing. The whole point of compose is that you can compose root of mean of mapping square over some argument to be passed in later, and the result is itself a function of that argument.

What you're doing doesn't add any new abstraction, it just obfuscates normal function application.

> Or...
> 
>          apply(map(square, xs), mean, root)
> 
>          map & square | mean | root (xs)
> 
>          xs | map & square | mean | root ()
> 
> 
> Read this as...
> 
>          apply xs, to map with square, to mean, to root
> 
> 
> These are kind of cool, but does it make python code easier to read?  That seems like it may be subjective depending on the amount of programming experience someone has.
> 
> Cheers,
>   Ron
> 
> 
> 
> 
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From abarnert at yahoo.com  Sun May 10 00:49:22 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sat, 9 May 2015 15:49:22 -0700
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <B9E2361B-73A4-4F1A-81D1-21FB281DC0A7@stufft.io>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>
 <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com> <mil9lv$hi2$1@ger.gmane.org>
 <20150509181642.GB5663@ando.pearwood.info>
 <CAEbHw4awMVzdWBbZt2HoZozwoTEWtCPPfoCPTRzUqWTEigbkFQ@mail.gmail.com>
 <B9E2361B-73A4-4F1A-81D1-21FB281DC0A7@stufft.io>
Message-ID: <3A90E8AC-ED40-4BD6-A895-F523246B173E@yahoo.com>

On May 9, 2015, at 11:33, Donald Stufft <donald at stufft.io> wrote:
> 
> 
>> On May 9, 2015, at 2:30 PM, David Mertz <mertz at gnosis.cx> wrote:
>> 
>>> On Sat, May 9, 2015 at 1:16 PM, Steven D'Aprano <steve at pearwood.info> wrote:
>>> On Sat, May 09, 2015 at 11:38:38AM -0400, Ron Adam wrote:
>>> 
>>> > How about an operator for partial?
>>> >
>>> >           root @ mean @ map $ square(xs)
>> 
>> I have trouble seeing the advantage of a special function composition operator when it is easy to write a general 'compose()' function that can produce such things easily enough.
>> 
>> E.g. in a white paper I just did for O'Reilly on _Functional Programming in Python_ I propose this little example implementation:
>> 
>> def compose(*funcs):
>>     "Return a new function s.t. compose(f,g,...)(x) == f(g(...(x)))"
>>     def inner(data, funcs=funcs):
>>         result = data
>>         for f in reversed(funcs):
>>             result = f(result)
>>         return result
>>     return inner
>> 
>> Which we might use as:
>> 
>>   RMS = compose(root, mean, square)
>>   result = RMS(my_array)
> 
> 
> Maybe functools.compose?

But why?

This is trivial to write.

The nontrivial part is thinking through whether you want left or right compose, what you want to do about multiple arguments, etc. So, unless we can solve _that_ problem by showing that there is one and only one obvious answer, we don't gain anything by implementing one of the many trivial-to-implement possibilities in the stdlib.

Maybe as a recipe in the docs, it would be worth showing two different compose functions to demonstrate how easy it is to write whichever one you want (and how important it is to figure out which one you want).
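
Something like this, say (a sketch, not a proposal; the names are made
up, and which one deserves the short name is exactly the open question):

```python
def compose(*funcs):
    """Right-to-left: compose(f, g)(x) == f(g(x)), matching math notation."""
    def composed(x):
        for f in reversed(funcs):
            x = f(x)
        return x
    return composed

def rcompose(*funcs):
    """Left-to-right: rcompose(f, g)(x) == g(f(x)), matching a shell pipe."""
    def composed(x):
        for f in funcs:
            x = f(x)
        return x
    return composed

# The same RMS either way -- the only question is which order reads better:
square_each = lambda xs: [x * x for x in xs]
mean = lambda xs: sum(xs) / len(xs)
root = lambda x: x ** 0.5

rms1 = compose(root, mean, square_each)
rms2 = rcompose(square_each, mean, root)
assert rms1([3, 4]) == rms2([3, 4]) == 12.5 ** 0.5
```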
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150509/044c541b/attachment.html>

From koos.zevenhoven at aalto.fi  Sun May 10 01:07:19 2015
From: koos.zevenhoven at aalto.fi (Koos Zevenhoven)
Date: Sun, 10 May 2015 02:07:19 +0300
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>
 <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com> <mil9lv$hi2$1@ger.gmane.org>
 <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info>
 <554E5CC9.3010406@aalto.fi>
 <CAAZsQLDiVJ_d6Pp1N9iiDpXL=z3cSWx-eBdWzOK4r9tDJ2zRCg@mail.gmail.com>
 <10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com>
Message-ID: <554E9327.9030706@aalto.fi>

On 10.5.2015 1:03, Gregory Salvan wrote:
> Nobody convinced by the arrow operator?
>
> like: arg -> spam -> eggs -> cheese
> or cheese <- eggs <- spam <- arg
>
>

I like | a lot because of the pipe analogy. However, having a new 
operator for this could solve some issues about operator precedence.

Today, I sketched one possible version that would use a new .. operator. 
I'll explain what it would do (but with your -> instead of my ..)

Here, the operator (.. or ->) would have a higher precedence than 
function calls () but a lower precedence than attribute access (obj.attr).

First, with single-argument functions spam, eggs and cheese, and a 
non-function arg:

arg->eggs->spam->cheese()   # equivalent to cheese(spam(eggs(arg)))
eggs->spam->cheese  # equivalent to lambda arg: cheese(spam(eggs(arg)))

Then, if spam and eggs both took two arguments: eggs(arg1, arg2), 
spam(arg1, arg2)

arg->eggs   # equivalent to partial(eggs, arg)
eggs->spam(a, b, c)   # equivalent to spam(eggs(a, b), c)
arg->eggs->spam(b,c)  # equivalent to spam(eggs(arg, b), c)

So you could think of -> as an extended partial operator. And this would 
naturally generalize to functions with even more arguments. The 
arguments would always be fed in the same order as in the equivalent 
function call, which makes for a nice rule of thumb. However, I suppose 
one would usually avoid combinations that are difficult to understand.

Some examples that this would enable:

  # Example 1
  from numpy import square, mean, sqrt
  rms = square->mean->sqrt  # I think this order is fine because it is not @

  # Example 2 (both are equivalent)
  spam(args)->eggs->cheese() # the shell-syntax analogy that Steven mentioned.

  # Example 3
  # Last but not least, we would finally have this :)
  some_sequence->len()
  some_object->isinstance(MyType)
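
You can't spell -> in today's Python, but the single-argument part of
this is easy to try out with a toy wrapper class (Chain is a made-up
name, >> stands in for ->, and it applies eagerly instead of building a
function):

```python
class Chain:
    # Toy stand-in for the proposed '->', single-argument case only.
    # '>>' applies the callable on the right to the wrapped value, so
    # Chain(arg) >> eggs >> spam >> cheese  ~  cheese(spam(eggs(arg))).
    def __init__(self, value):
        self.value = value

    def __rshift__(self, func):
        return Chain(func(self.value))

# Example 3 above, approximately: some_sequence->len()
result = (Chain([3, 4]) >> len >> str).value
assert result == '2'
```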

-- Koos

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150510/55a35085/attachment.html>

From levkivskyi at gmail.com  Sun May 10 01:28:29 2015
From: levkivskyi at gmail.com (Ivan Levkivskyi)
Date: Sun, 10 May 2015 01:28:29 +0200
Subject: [Python-ideas] Function composition (was no subject)
Message-ID: <CAOMjWkmVXoMr07tOUKL7iEavN8b3sVer7nPPMrRH55RTtN60dw@mail.gmail.com>

I was thinking about the recent ideas discussed here, and I also went back
to the origins of my initial idea. The point is that it came from Numpy: I
use Numpy arrays every day, and typically I do exactly something like
root(mean(square(data))).

Now I am thinking: what actually is a matrix? It is something that takes a
vector and returns a vector. But on the other hand, elementwise functions
do the same. It does not really matter what we do with a vector: transform
it by a product of matrices or by a composition of functions. In other
words, I agree with Andrew that "elementwise" is a good match with compose,
and what we really need is to "pipe" things that take a vector (or just an
iterable) and return a vector (iterable).

So probably a good place (in a potential future) for compose would be not
functools but itertools. But indeed, a good place to test this would be
Numpy.

An additional comment: it is indeed good to have both @ and | for compose
and rcompose.
Side note, one can actually overload __rmatmul__ on arrays as well so that
you can write

root @ mean @ square @ data

Moreover, one can overload __or__ on arrays, so that one can write

data | square | mean | root

even with ordinary functions (not Numpy's ufuncs or composable). These
examples are actually "flat is better than nested" in the extreme form.

Anyway, they (Numpy) are going to implement the @ operator for arrays;
maybe it would be a good idea to check that if something to the left of an
array is not an array but a callable, it is applied elementwise.

Concerning multi-argument functions, I don't like the $ symbol, I don't
know why; it seems really unintuitive that it means partial application.
One can autocurry composable functions and apply the same rules that Numpy
uses for ufuncs.
More precisely, if I write

add(data1, data2)

with arrays it applies add pairwise. But if I write

add(data1, 42)

it is also fine, it simply adds 42 to every element. With autocurrying one
could write

root @ mean @ add(data) @ square @ data2

or

root @ mean @ square @ add(42) @ data

However, as I see it now, it is not very readable, so maybe the best
choice is to reserve @ and | for "piping" iterables through transformers
that take one argument. In other words, it should be left to the user to
make add(42) of an appropriate type. It is the same logic as for
decorators: if I write

@modify(arg)
def func(x):
    return None

I must care that modify(arg) evaluates to something that takes one callable
and returns a callable.
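
To be concrete, the __or__ overloading I mean could look roughly like
this on a wrapper type (just a sketch; Piped is a made-up name, and real
arrays would of course keep elementwise | for integer dtypes):

```python
import math

class Piped:
    # Made-up wrapper: 'wrapped | f' applies f and re-wraps the result,
    # so plain functions chain left to right without being ufuncs.
    def __init__(self, value):
        self.value = value

    def __or__(self, func):
        return Piped(func(self.value))

square = lambda xs: [x * x for x in xs]
mean = lambda xs: sum(xs) / len(xs)

# data | square | mean | root, with ordinary callables:
rms = (Piped([3, 4]) | square | mean | math.sqrt).value
```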


On May 9, 2015, at 01:36, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> >
> > Andrew Barnert writes:
> >>> On May 8, 2015, at 19:58, Stephen J. Turnbull <stephen at xemacs.org>
> wrote:
> >>>
> >>> Koos Zevenhoven writes:
> >>>
> >>>> As a random example, (root @ mean @ square)(x) would produce the right
> >>>> order for rms when using [2].
> >>>
> >>> Hardly interesting. :-)  The result is an exception, as root and square
> >>> are conceptually scalar-to-scalar, while mean is sequence-to-scalar.
> >>
> >> Unless you're using an elementwise square and an array-to-scalar
> >> mean, like the ones in NumPy,
> >
> > Erm, why would square be elementwise and root not?  I would suppose
> > that everything is element-wise in Numpy (not a user yet).
>
> Most functions in NumPy are elementwise when applied to arrays, but can
> also be applied to scalars. So, square is elementwise because it's called
> on an array, root is scalar because it's called on a scalar. (In fact, root
> could also be elementwise--aggregating functions like mean can be applied
> across just one axis of a 2D or higher array, reducing it by one dimension,
> if you want.)
>
> Before you try it, this sounds like a complicated nightmare that can't
> possibly work in practice. But play with it for just a few minutes and it's
> completely natural. (Except for a few cases where you want some array-wide
> but not element-wise operation, most famously matrix multiplication, which
> is why we now have the @ operator to play with.)
>
> >> in which case it works perfectly well...
> >
> > But that's an aspect of my point (evidently, obscure).  Conceptually,
> > as taught in junior high school or so, root and square are scalar-to-
> > scalar.  If you are working in a context such as Numpy where it makes
> > sense to assume they are element-wise and thus composable, the context
> > should provide the compose operator(s).
>
> I was actually thinking on these lines: what if @ didn't work on
> types.FunctionType, but did work on numpy.ufunc (the name for the
> "universal function" type that knows how to broadcast across arrays but
> also work on scalars)? That's something NumPy could implement without any
> help from the core language. (Methods are a minor problem here, but it's
> obvious how to solve them, so I won't get into it.) And if it turned out to
> be useful all over the place in NumPy, that might turn up some great uses
> for the idiomatic non-NumPy Python, or it might show that, like elementwise
> addition, it's really more a part of NumPy than of Python.
>
> But of course that's more of a proposal for NumPy than for Python.
>
> > Without that context, Koos's
> > example looks like a TypeError.
>
> >> But Koos's example, even if it was possibly inadvertent, shows that
> >> I may be wrong about that. Maybe compose together with element-wise
> >> operators actually _is_ sufficient for something beyond toy
> >> examples.
> >
> > Of course it is!<wink />  I didn't really think there was any doubt
> > about that.
>
> I think there was, and still is. People keep coming up with abstract toy
> examples, but as soon as someone tries to give a good real example, it only
> makes sense with NumPy (Koos's) or with some syntax that Python doesn't
> have (yours), because to write them with actual Python functions would
> actually be ugly and verbose (my version of yours).
>
> I don't think that's a coincidence. You didn't write "map square" because
> you don't know how to think in Python, but because using compose profitably
> inherently implies not thinking in Python. (Except, maybe, in the case of
> NumPy... which is a different idiom.) Maybe someone has a bunch of obvious
> good use cases for compose that don't also require other functions,
> operators, or syntax we don't have, but so far, nobody's mentioned one.
>
> ------------------------------
>
> On 5/9/2015 6:19 AM, Andrew Barnert via Python-ideas wrote:
>
> > I think there was, and still is. People keep coming up with abstract toy
> examples, but as soon as someone tries to give a good real example, it only
> makes sense with NumPy (Koos's) or with some syntax that Python doesn't
> have (yours), because to write them with actual Python functions would
> actually be ugly and verbose (my version of yours).
> >
> > I don't think that's a coincidence. You didn't write "map square"
> because you don't know how to think in Python, but because using compose
> profitably inherently implies not thinking in Python. (Except, maybe, in
> the case of NumPy... which is a different idiom.) Maybe someone has a bunch
> of obvious good use cases for compose that don't also require other
> functions, operators, or syntax we don't have, but so far, nobody's
> mentioned one.
>
> I agree that @ is most likely to be usefull in numpy's restricted context.
>
> A composition operator is usually defined by application: f@g(x) is
> defined as f(g(x)).  (I'm sure there are also axiomatic treatments.)  It
> is an optional syntactic abbreviation. It is most useful in a context
> where there is one set of data objects, such as the real numbers, or one
> set + arrays (vectors) defined on the one set; where all functions are
> univariate (or possibly multivariate, but those can be transformed to
> univariate on vectors); *and* where parameter names are dummies like
> 'x', 'y', 'z', or '_'.
>
> The last point is important. Abbreviating h(x) = f(g(x)) with h = f @ g
> does not lose any information as 'x' is basically a placeholder (so get
> rid of it).  But parameter names are important in most practical
> contexts, both for understanding a composition and for using it.
>
> def npv(transfers, discount):
>      '''Return the net present value of discounted transfers.
>
>      transfers: finite iterable of amounts at constant intervals
>      discount: fraction per interval
>      '''
>      divisor = 1 + discount
>      return sum(transfer/divisor**time
>                  for time, transfer in enumerate(transfers))
>
> Even if one could replace the def statement with
> npv = <some combination of @, sum, map, add, div, power, enumerate, ...>
> with parameter names omitted, it would be harder to understand.  Using
> it would require the ability to infer argument types and order from the
> composed expression.
>
> I intentionally added a statement to calculate the common subexpression
> prior to the return. I believe it would have to put back in the return
> expression before converting.
>
> --
> Terry Jan Reedy
>
>
>
> ------------------------------
>
> On 05/09/2015 03:21 AM, Andrew Barnert via Python-ideas wrote:
> >> >I suppose you could write (root @ mean @ (map square)) (xs),
>
> > Actually, you can't. You could write (root @ mean @ partial(map,
> > square))(xs), but that's pretty clearly less readable than
> > root(mean(map(square, xs))) or root(mean(x*x for x in xs). And that's
> > been my main argument: Without a full suite of higher-level operators
> > and related syntax, compose alone doesn't do you any good except for toy
> > examples.
>
> How about an operator for partial?
>
>            root @ mean @ map $ square(xs)
>
>
> Actually I'd rather reuse the binary operators.  (I'd be happy if they were
> just methods on bytes objects BTW.)
>
>            compose(root, mean, map(square, xs))
>
>            root ^ mean ^ map & square (xs)
>
>            root ^ mean ^ map & square ^ xs ()
>
> Read this as...
>
>           compose root, of mean, of map with square, of xs
>
> Or...
>
>            apply(map(square, xs), mean, root)
>
>            map & square | mean | root (xs)
>
>            xs | map & square | mean | root ()
>
>
> Read this as...
>
>            apply xs, to map with square, to mean, to root
>
>
> These are kind of cool, but does it make python code easier to read?  That
> seems like it may be subjective depending on the amount of programming
> experience someone has.
>
> Cheers,
>     Ron
>
>
>
> ------------------------------
>
> Hi,
> I had to answer some of these questions when I wrote Lawvere:
> https://pypi.python.org/pypi/lawvere
>
> First, there are two kinds of composition, pipe and circle, so I think a
> single operator like @ is a bit restrictive.
> I like "->" and "<-"
>
> Then, for function name and function to string I had to introduce function
> signature (a tuple).
> It provides a good tool for decomposition, introspection and comparison in
> respect with mathematic definition.
>
> Finally, for me composition makes sense when you have typed functions;
> otherwise it can easily become a mess, and this makes composition tied to
> multiple dispatch.
>
> I really hope composition will be introduced in python but I can't see how
> it can be done without rethinking a good part of function definition.
>
>
>
> 2015-05-09 17:38 GMT+02:00 Ron Adam <ron3200 at gmail.com>:
>
> >
> >
> > On 05/09/2015 03:21 AM, Andrew Barnert via Python-ideas wrote:
> >
> >> >I suppose you could write (root @ mean @ (map square)) (xs),
> >>>
> >>
> >  Actually, you can't. You could write (root @ mean @ partial(map,
> >> square))(xs), but that's pretty clearly less readable than
> >> root(mean(map(square, xs))) or root(mean(x*x for x in xs). And that's
> >> been my main argument: Without a full suite of higher-level operators
> >> and related syntax, compose alone doesn't do you any good except for toy
> >> examples.
> >>
> >
> > How about an operator for partial?
> >
> >           root @ mean @ map $ square(xs)
> >
> >
> > Actually I'd rather reuse the binary operators.  (I'd be happy if they
> > were just methods on bytes objects BTW.)
> >
> >           compose(root, mean, map(square, xs))
> >
> >           root ^ mean ^ map & square (xs)
> >
> >           root ^ mean ^ map & square ^ xs ()
> >
> > Read this as...
> >
> >          compose root, of mean, of map with square, of xs
> >
> > Or...
> >
> >           apply(map(square, xs), mean, root)
> >
> >           map & square | mean | root (xs)
> >
> >           xs | map & square | mean | root ()
> >
> >
> > Read this as...
> >
> >           apply xs, to map with square, to mean, to root
> >
> >
> > These are kind of cool, but does it make python code easier to read?
> That
> > seems like it may be subjective depending on the amount of programming
> > experience someone has.
> >
> > Cheers,
> >    Ron
> >
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150510/fdc7df3d/attachment-0001.html>

From abarnert at yahoo.com  Sun May 10 02:05:06 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sat, 9 May 2015 17:05:06 -0700
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <CAOMjWkmVXoMr07tOUKL7iEavN8b3sVer7nPPMrRH55RTtN60dw@mail.gmail.com>
References: <CAOMjWkmVXoMr07tOUKL7iEavN8b3sVer7nPPMrRH55RTtN60dw@mail.gmail.com>
Message-ID: <84C6058F-8015-4703-979D-59CD9780F93A@yahoo.com>

On May 9, 2015, at 16:28, Ivan Levkivskyi <levkivskyi at gmail.com> wrote:
> 
> I was thinking about the recent ideas discussed here, and I also went back to the origins of my initial idea. The point is that it came from Numpy: I use Numpy arrays every day, and typically I do exactly something like root(mean(square(data))).
> 
> Now I am thinking: what actually is a matrix? It is something that takes a vector and returns a vector. But on the other hand, elementwise functions do the same. It does not really matter what we do with a vector: transform it by a product of matrices or by a composition of functions. In other words, I agree with Andrew that "elementwise" is a good match with compose, and what we really need is to "pipe" things that take a vector (or just an iterable) and return a vector (iterable).
> 
> So probably a good place (in a potential future) for compose would be not functools but itertools. But indeed, a good place to test this would be Numpy.

Itertools is an interesting idea.

Anyway, assuming NumPy isn't going to add this in the near future (has anyone even brought it up on the NumPy list, or only here?), it wouldn't be that hard to write a (maybe inefficient but working) @composable wrapper and wrap all the relevant callables from NumPy or from itertools, upload it to PyPI, and let people start coming up with good examples. If it's later worth direct support in NumPy and/or Python (for simplicity or performance), the module will still be useful for backward compatibility.
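
The wrapper itself really is only a few lines (a sketch under my own
naming; composable is made up, and plain lambdas stand in here for the
NumPy callables you'd actually wrap):

```python
class composable:
    # Made-up wrapper class: '@' composes (the right side applies first,
    # so f @ g behaves like f(g(x))), and '|' pipes left to right.
    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)

    def __matmul__(self, other):
        return composable(lambda *a, **k: self.func(other(*a, **k)))

    def __ror__(self, data):
        # Lets plain, unwrapped data appear on the left of '|'.
        return self.func(data)

root = composable(lambda x: x ** 0.5)
mean = composable(lambda xs: sum(xs) / len(xs))
square = composable(lambda xs: [x * x for x in xs])

rms = root @ mean @ square        # a function, usable later
piped = [3, 4] | square | mean | root   # immediate left-to-right pipe
```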

> An additional comment: it is indeed good to have both @ and | for compose and rcompose.
> Side note, one can actually overload __rmatmul__ on arrays as well so that you can write
> 
> root @ mean @ square @ data

But this doesn't need to overload it on arrays, only on the ufuncs, right?

Unless you're suggesting that one of these operations could be a matrix as easily as a function, and NumPy users often won't have to care which it is?

> 
> Moreover, one can overload __or__ on arrays, so that one can write
> 
> data | square | mean | root
> 
> even with ordinary functions (not Numpy's ufuncs or composable).

That's an interesting point. But I think this will be a bit confusing, because now it _does_ matter whether square is a matrix or a function--you'll get elementwise bitwise or instead of application. (And really, this is the whole reason for @ in the first place--we needed an operator that never means elementwise.)

Also, this doesn't let you actually compose functions--if you want square | mean | root to be a function, square has to have an __or__ method.

> These examples are actually "flat is better than nested" in the extreme form. 
> 
> Anyway, they (Numpy) are going to implement the @ operator for arrays; maybe it would be a good idea to check that if something to the left of an array is not an array but a callable, it is applied elementwise.
> 
> Concerning multi-argument functions, I don't like the $ symbol, I don't know why; it seems really unintuitive that it means partial application.
> One can autocurry composable functions and apply the same rules that Numpy uses for ufuncs.
> More precisely, if I write 
> 
> add(data1, data2) 
> 
> with arrays it applies add pairwise. But if I write 
> 
> add(data1, 42) 
> 
> it is also fine, it simply adds 42 to every element. With autocurrying one could write 
> 
> root @ mean @ add(data) @ square @ data2
> 
> or
> 
> root @ mean @ square @ add(42) @ data  
> 
> However, as I see it now, it is not very readable, so maybe the best choice is to reserve @ and | for "piping" iterables through transformers that take one argument. In other words, it should be left to the user to make add(42) of an appropriate type. It is the same logic as for decorators: if I write
> 
> @modify(arg)
> def func(x):
>     return None
> 
> I must care that modify(arg) evaluates to something that takes one callable and returns a callable.
> 
> 
>> On May 9, 2015, at 01:36, Stephen J. Turnbull <stephen at xemacs.org> wrote:
>> >
>> > Andrew Barnert writes:
>> >>> On May 8, 2015, at 19:58, Stephen J. Turnbull <stephen at xemacs.org> wrote:
>> >>>
>> >>> Koos Zevenhoven writes:
>> >>>
>> >>>> As a random example, (root @ mean @ square)(x) would produce the right
>> >>>> order for rms when using [2].
>> >>>
>> >>> Hardly interesting. :-)  The result is an exception, as root and square
>> >>> are conceptually scalar-to-scalar, while mean is sequence-to-scalar.
>> >>
>> >> Unless you're using an elementwise square and an array-to-scalar
>> >> mean, like the ones in NumPy,
>> >
>> > Erm, why would square be elementwise and root not?  I would suppose
>> > that everything is element-wise in Numpy (not a user yet).
>> 
>> Most functions in NumPy are elementwise when applied to arrays, but can also be applied to scalars. So, square is elementwise because it's called on an array, root is scalar because it's called on a scalar. (In fact, root could also be elementwise--aggregating functions like mean can be applied across just one axis of a 2D or higher array, reducing it by one dimension, if you want.)
>> 
>> Before you try it, this sounds like a complicated nightmare that can't possibly work in practice. But play with it for just a few minutes and it's completely natural. (Except for a few cases where you want some array-wide but not element-wise operation, most famously matrix multiplication, which is why we now have the @ operator to play with.)
>> 
>> >> in which case it works perfectly well...
>> >
>> > But that's an aspect of my point (evidently, obscure).  Conceptually,
>> > as taught in junior high school or so, root and square are scalar-to-
>> > scalar.  If you are working in a context such as Numpy where it makes
>> > sense to assume they are element-wise and thus composable, the context
>> > should provide the compose operator(s).
>> 
>> I was actually thinking on these lines: what if @ didn't work on types.FunctionType, but did work on numpy.ufunc (the name for the "universal function" type that knows how to broadcast across arrays but also work on scalars)? That's something NumPy could implement without any help from the core language. (Methods are a minor problem here, but it's obvious how to solve them, so I won't get into it.) And if it turned out to be useful all over the place in NumPy, that might turn up some great uses for the idiomatic non-NumPy Python, or it might show that, like elementwise addition, it's really more a part of NumPy than of Python.
>> 
>> But of course that's more of a proposal for NumPy than for Python.
>> 
>> > Without that context, Koos's
>> > example looks like a TypeError.
>> 
>> >> But Koos's example, even if it was possibly inadvertent, shows that
>> >> I may be wrong about that. Maybe compose together with element-wise
>> >> operators actually _is_ sufficient for something beyond toy
>> >> examples.
>> >
>> > Of course it is!<wink />  I didn't really think there was any doubt
>> > about that.
>> 
>> I think there was, and still is. People keep coming up with abstract toy examples, but as soon as someone tries to give a good real example, it only makes sense with NumPy (Koos's) or with some syntax that Python doesn't have (yours), because to write them with actual Python functions would actually be ugly and verbose (my version of yours).
>> 
>> I don't think that's a coincidence. You didn't write "map square" because you don't know how to think in Python, but because using compose profitably inherently implies not thinking in Python. (Except, maybe, in the case of NumPy... which is a different idiom.) Maybe someone has a bunch of obvious good use cases for compose that don't also require other functions, operators, or syntax we don't have, but so far, nobody's mentioned one.
>> 
>> ------------------------------
>> 
>> On 5/9/2015 6:19 AM, Andrew Barnert via Python-ideas wrote:
>> 
>> > I think there was, and still is. People keep coming up with abstract toy examples, but as soon as someone tries to give a good real example, it only makes sense with NumPy (Koos's) or with some syntax that Python doesn't have (yours), because to write them with actual Python functions would actually be ugly and verbose (my version of yours).
>> >
>> > I don't think that's a coincidence. You didn't write "map square" because you don't know how to think in Python, but because using compose profitably inherently implies not thinking in Python. (Except, maybe, in the case of NumPy... which is a different idiom.) Maybe someone has a bunch of obvious good use cases for compose that don't also require other functions, operators, or syntax we don't have, but so far, nobody's mentioned one.
>> 
>> I agree that @ is most likely to be usefull in numpy's restricted context.
>> 
>> A composition operator is usually defined by application: f@g(x) is
>> defined as f(g(x)).  (I'm sure there are also axiomatic treatments.)  It
>> is an optional syntactic abbreviation. It is most useful in a context
>> where there is one set of data objects, such as the real numbers, or one
>> set + arrays (vectors) defined on the one set; where all functions are
>> univariate (or possibly multivariate, but those can be transformed to
>> univariate on vectors); *and* where parameter names are dummies like
>> 'x', 'y', 'z', or '_'.
>> 
>> The last point is important. Abbreviating h(x) = f(g(x)) with h = f @ g
>> does not lose any information as 'x' is basically a placeholder (so get
>> rid of it).  But parameter names are important in most practical
>> contexts, both for understanding a composition and for using it.
>> 
>> def npv(transfers, discount):
>>      '''Return the net present value of discounted transfers.
>> 
>>      transfers: finite iterable of amounts at constant intervals
>>      discount: fraction per interval
>>      '''
>>      divisor = 1 + discount
>>      return sum(transfer/divisor**time
>>                  for time, transfer in enumerate(transfers))
>> 
>> Even if one could replace the def statement with
>> npv = <some combination of @, sum, map, add, div, power, enumerate, ...>
>> with parameter names omitted, it would be harder to understand.  Using
>> it would require the ability to infer argument types and order from the
>> composed expression.
>> 
>> I intentionally added a statement to calculate the common subexpression
>> prior to the return. I believe it would have to put back in the return
>> expression before converting.
>> 
>> --
>> Terry Jan Reedy
>> 
>> 
>> 
>> ------------------------------
>> 
>> On 05/09/2015 03:21 AM, Andrew Barnert via Python-ideas wrote:
>> >> I suppose you could write (root @ mean @ (map square)) (xs),
>> 
>> > Actually, you can't. You could write (root @ mean @ partial(map,
>> > square))(xs), but that's pretty clearly less readable than
>> > root(mean(map(square, xs))) or root(mean(x*x for x in xs)). And that's
>> > been my main argument: Without a full suite of higher-level operators
>> > and related syntax, compose alone doesn't do you any good except for toy
>> > examples.
>> 
>> How about an operator for partial?
>> 
>>            root @ mean @ map $ square(xs)
>> 
>> 
>> Actually I'd rather reuse the binary operators.  (I'd be happy if they were
>> just methods on bytes objects BTW.)
>> 
>>            compose(root, mean, map(square, xs))
>> 
>>            root ^ mean ^ map & square (xs)
>> 
>>            root ^ mean ^ map & square ^ xs ()
>> 
>> Read this as...
>> 
>>           compose root, of mean, of map with square, of xs
>> 
>> Or...
>> 
>>            apply(map(square, xs), mean, root)
>> 
>>            map & square | mean | root (xs)
>> 
>>            xs | map & square | mean | root ()
>> 
>> 
>> Read this as...
>> 
>>            apply xs, to map with square, to mean, to root
>> 
>> 
>> These are kind of cool, but does it make python code easier to read?  That
>> seems like it may be subjective depending on the amount of programming
>> experience someone has.
>> 
>> Cheers,
>>     Ron
>> 
>> 
>> 
>> ------------------------------
>> 
>> Hi,
>> I had to answer some of these questions when I wrote Lawvere:
>> https://pypi.python.org/pypi/lawvere
>> 
>> First, there is two kind of composition: pipe and circle so I think a
>> single operator like @ is a bit restrictive.
>> I like "->" and "<-"
>> 
>> Then, for function name and function to string I had to introduce function
>> signature (a tuple).
>> It provides a good tool for decomposition, introspection and comparison in
>> respect with mathematic definition.
>> 
>> Finally, for me composition make sense when you have typed functions
>> otherwise it can easily become a mess and this make composition tied to
>> multiple dispatch.
>> 
>> I really hope composition will be introduced in python but I can't see how
>> it be made without rethinking a good part of function definition.
>> 
>> 
>> 
>> 2015-05-09 17:38 GMT+02:00 Ron Adam <ron3200 at gmail.com>:
>> 
>> >
>> >
>> > On 05/09/2015 03:21 AM, Andrew Barnert via Python-ideas wrote:
>> >
>> >>> I suppose you could write (root @ mean @ (map square)) (xs),
>> >>
>> >> Actually, you can't. You could write (root @ mean @ partial(map,
>> >> square))(xs), but that's pretty clearly less readable than
>> >> root(mean(map(square, xs))) or root(mean(x*x for x in xs)). And that's
>> >> been my main argument: Without a full suite of higher-level operators
>> >> and related syntax, compose alone doesn't do you any good except for toy
>> >> examples.
>> >>
>> >
>> > How about an operator for partial?
>> >
>> >           root @ mean @ map $ square(xs)
>> >
>> >
>> > Actually I'd rather reuse the binary operators.  (I'd be happy if they
>> > were just methods on bytes objects BTW.)
>> >
>> >           compose(root, mean, map(square, xs))
>> >
>> >           root ^ mean ^ map & square (xs)
>> >
>> >           root ^ mean ^ map & square ^ xs ()
>> >
>> > Read this as...
>> >
>> >          compose root, of mean, of map with square, of xs
>> >
>> > Or...
>> >
>> >           apply(map(square, xs), mean, root)
>> >
>> >           map & square | mean | root (xs)
>> >
>> >           xs | map & square | mean | root ()
>> >
>> >
>> > Read this as...
>> >
>> >           apply xs, to map with square, to mean, to root
>> >
>> >
>> > These are kind of cool, but does it make python code easier to read?  That
>> > seems like it may be subjective depending on the amount of programming
>> > experience someone has.
>> >
>> > Cheers,
>> >    Ron
>> >
>> >
> 
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From koos.zevenhoven at aalto.fi  Sun May 10 02:51:38 2015
From: koos.zevenhoven at aalto.fi (Koos Zevenhoven)
Date: Sun, 10 May 2015 03:51:38 +0300
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <16174_1431214114_554E9821_16174_32_1_CAOMjWkmVXoMr07tOUKL7iEavN8b3sVer7nPPMrRH55RTtN60dw@mail.gmail.com>
References: <16174_1431214114_554E9821_16174_32_1_CAOMjWkmVXoMr07tOUKL7iEavN8b3sVer7nPPMrRH55RTtN60dw@mail.gmail.com>
Message-ID: <554EAB9A.2090501@aalto.fi>

On 10.5.2015 2:28, Ivan Levkivskyi wrote:
> functions. In other words I agree with Andrew that "elementwise" is a 
> good match with compose, and what we really need is to "pipe" things 
> that take a vector (or just an iterable) and return a vector (iterable).
>
> So that probably a good place (in a potential future) for compose 
> would be not functools but itertools. But indeed a good place to test 
> this would be Numpy.
>

Another way to deal with elementwise operations on iterables would be to 
make a small, mostly backwards compatible change in map:

When map is called with just one argument, for instance map(square), it 
would return a function that takes iterables and maps them element-wise.

Now it would be easier to use map in pipelines, for example:

rms = sqrt @ mean @ map(square)

or

values->map(square)->mean->sqrt()

Or if the change in map is not popular, there could be something like 
functools.mapper(func) that does that. Or even something more crazy, 
like square.map(seq), so that square.map could be used in pipelines.
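
As a rough sketch, a hypothetical `map1` helper (rather than an actual change to the builtin) shows what the proposed behaviour would look like:

```python
# Hypothetical helper illustrating the proposal: with only a function
# given, return an element-wise mapper instead of mapping immediately.
def map1(func, *iterables):
    if iterables:
        return map(func, *iterables)        # normal map behaviour
    return lambda *its: map(func, *its)     # curried: wait for the data

square = lambda x: x * x
mapper = map1(square)               # no iterable yet: get a mapper back
print(list(mapper([1, 2, 3])))      # [1, 4, 9]
```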

-- Koos


From steve at pearwood.info  Sun May 10 04:56:30 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 10 May 2015 12:56:30 +1000
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <554E5CC9.3010406@aalto.fi>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>
 <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com> <mil9lv$hi2$1@ger.gmane.org>
 <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info>
 <554E5CC9.3010406@aalto.fi>
Message-ID: <20150510025630.GC5663@ando.pearwood.info>

On Sat, May 09, 2015 at 10:15:21PM +0300, Koos Zevenhoven wrote:
> 
> On 2015-05-09 21:16, Steven D'Aprano wrote:
[...]

> >It's a pity we can't match the shell syntax and write:
> >
> >spam(args)|eggs|cheese
> >
> >but that would have a completely different meaning.
> 
> But it does not need to have a different meaning.

It *should* have a different meaning. I want it to have a different 
meaning. Python is not the shell and spam(args) could be a factory 
function which itself returns a callable, e.g. partial, or a decorator. 
We cannot match the shell syntax because Python can do so much more than 
the shell.


> You could in addition have:
> 
> spam @ eggs @ cheese @ arg   #  equivalent to spam(eggs(cheese(arg)))
> 
> arg | spam | eggs | cheese    # equivalent to cheese(eggs(spam(arg)))
> 
> Here, arg would thus be recognized as not a function.

No. I think it is absolutely vital to distinguish by syntax the 
difference between composition and function application, and not try to 
"do what I mean". DWIM software has a bad history of doing the wrong 
thing.

Every other kind of callable uses obj(arg) to call it: types, functions, 
methods, partial objects, etc. We shouldn't make function composition 
try to be different. If I write sqrt@100 I should get a runtime error, 
not 10.

I don't mind if the error is delayed until I actually try to call the 
composed object, but at some point I should get a TypeError that 100 is 
not callable.
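
A minimal sketch of what I mean, using a hypothetical Compose class (not a 
concrete proposal): composing with a non-callable is allowed to build, but 
calling the result raises TypeError.

```python
import math

# Hypothetical Compose wrapper: '@' chains, and the non-callable check
# is deferred until the composed object is actually called.
class Compose:
    def __init__(self, *fns):
        self.fns = fns

    def __matmul__(self, other):
        return Compose(*self.fns, other)

    def __call__(self, *args):
        *outer, innermost = self.fns
        if not callable(innermost):
            raise TypeError(f"{innermost!r} is not callable")
        value = innermost(*args)
        for fn in reversed(outer):  # apply right-to-left
            value = fn(value)
        return value

root = Compose(math.sqrt)
print((root @ float)("100"))  # float("100"), then sqrt -> 10.0
bad = root @ 100              # building succeeds...
# bad() raises TypeError: 100 is not callable
```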


-- 
Steve

From steve at pearwood.info  Sun May 10 05:01:46 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 10 May 2015 13:01:46 +1000
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <CAAZsQLDiVJ_d6Pp1N9iiDpXL=z3cSWx-eBdWzOK4r9tDJ2zRCg@mail.gmail.com>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>
 <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com> <mil9lv$hi2$1@ger.gmane.org>
 <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info>
 <554E5CC9.3010406@aalto.fi>
 <CAAZsQLDiVJ_d6Pp1N9iiDpXL=z3cSWx-eBdWzOK4r9tDJ2zRCg@mail.gmail.com>
Message-ID: <20150510030145.GD5663@ando.pearwood.info>

On Sat, May 09, 2015 at 10:41:24PM +0200, Gregory Salvan wrote:
> pipeline operator may be confusing with bitwise operator.
> In this case :
> eggs = arg | spam | cheese
> 
> Is eggs a composed function or string of bits ?

Or a set?

I think it is okay to overload operators and give them different 
meanings:

z = x + y

Is z a number, a string, a list, a tuple? Something else?

In practice, we rely on sensible names or context to understand 
overloaded operators, if you see

foo = search | grep | log | process
it = (foo(x) for x in data)
run(it)

it should be fairly obvious from context that foo is not a set or string 
of bits :-)
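
For what it's worth, a pipeline object like that is easy to sketch today 
with a small wrapper class (all names here are purely illustrative):

```python
# Illustrative wrapper: '|' composes left to right, '()' applies.
class Step:
    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        other_fn = other.fn if isinstance(other, Step) else other
        return Step(lambda x: other_fn(self.fn(x)))

    def __call__(self, x):
        return self.fn(x)

double = Step(lambda x: x * 2)
inc = Step(lambda x: x + 1)
foo = double | inc      # read left to right: double, then add one
print(foo(10))          # 21
```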



-- 
Steve

From ron3200 at gmail.com  Sun May 10 05:08:32 2015
From: ron3200 at gmail.com (Ron Adam)
Date: Sat, 09 May 2015 23:08:32 -0400
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <91A6985C-A94B-4132-99B1-0305933950B5@yahoo.com>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>
 <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com> <mil9lv$hi2$1@ger.gmane.org>
 <91A6985C-A94B-4132-99B1-0305933950B5@yahoo.com>
Message-ID: <mimi3h$mlj$1@ger.gmane.org>



On 05/09/2015 06:45 PM, Andrew Barnert via Python-ideas wrote:
> On May 9, 2015, at 08:38, Ron Adam<ron3200 at gmail.com>  wrote:
>> >
>> > On 05/09/2015 03:21 AM, Andrew Barnert via Python-ideas wrote:
>>>> > I suppose you could write (root @ mean @ (map square)) (xs),
>> >
>>> >> Actually, you can't. You could write (root @ mean @ partial(map,
>>> >> square))(xs), but that's pretty clearly less readable than
>>> >> root(mean(map(square, xs))) or root(mean(x*x for x in xs)). And that's
>>> >> been my main argument: Without a full suite of higher-level operators
>>> >> and related syntax, compose alone doesn't do you any good except for toy
>>> >> examples.
>> >
>> > How about an operator for partial?
>> >
>> >          root @ mean @ map $ square(xs)

> I'm pretty sure that anyone who sees that and doesn't interpret it as
> meaningless nonsense is going to interpret it as a variation on Haskell and
> get the wrong intuition.

Yes, I agree that is the problem with it.

> But, more importantly, this doesn't work. Your square(xs) isn't going
> to evaluate to a function, but to whatever calling square on xs returns.
> (Which is presumably a TypeError, or you wouldn't be looking to map in the
> first place). And, even if that did work, you're not actually composing a
> function here anyway; your @ is just a call operator, which we already have
> in Python, spelled with parens.

This is following the patterns being discussed in the thread.  (or at least 
an attempt to do so.)

The @ and $ above would bind more tightly than the ().  Like the dot "." 
does for method calls.  But the evaluation is from left to right at call 
time.  The calling part does not need to be done at the same time the rest 
is done.  Or at least that is what I got from the conversation.

      f = root @ mean @ map & square
      result = f(xs)

The other examples would work the same.


>> >Actually I'd rather reuse the binary operators.  (I'd be happy if they were just methods on bytes objects BTW.)
>> >
>> >          compose(root, mean, map(square, xs))

> Now you're not calling square(xs), but you are calling map(square, xs),
> which is going to return an iterable of squares, not a function; again,
> you're not composing a function object at all.

Yes, this is what directly calling the functions to do the same thing would 
look like.  Except without returning a composed function.


> And think about how you'd actually write this correctly. You need to
> either use lambda (which defeats the entire purpose of compose), or partial
> (which works, but is clumsy and ugly enough without an operator or
> syntactic sugar that people rarely use it).

The advantage of the syntax is that it is a "potentially" (a matter of 
opinion) alternative to using lambda.  And apparently there are a few here 
who think doing it with lambda's or other means is less than ideal.


Personally I'm not convinced yet either.

Cheers,
   Ron


From steve at pearwood.info  Sun May 10 05:14:56 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 10 May 2015 13:14:56 +1000
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>
 <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com> <mil9lv$hi2$1@ger.gmane.org>
 <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info>
 <554E5CC9.3010406@aalto.fi>
 <CAAZsQLDiVJ_d6Pp1N9iiDpXL=z3cSWx-eBdWzOK4r9tDJ2zRCg@mail.gmail.com>
 <CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com>
Message-ID: <20150510031455.GE5663@ando.pearwood.info>

On Sun, May 10, 2015 at 12:03:24AM +0200, Gregory Salvan wrote:
> Nobody convinced by arrow operator ?
> 
> like: arg -> spam -> eggs -> cheese
> or cheese <- eggs <- spam <- arg

Absolutely not!

If we were designing a new language from scratch, I might consider arrow 
operators. I think that they are cute.

But this proposal is going to be hard enough to get approval using 
*existing* operators, | __or__ and @ __matmul__ (if I remember the 
dunder methods correctly).

To convince people that we should support function composition as a 
built-in feature, using NEW operators that will need the parser changed 
to recognise, and new dunder methods, well, that will be virtually 
impossible. numpy is one of the biggest and most important user bases 
for Python, and it took them something like ten years and multiple 
failed attempts to get enough support for adding the @ operator.

You *might* just have a chance for a -> right arrow operator, just 
barely, but the left arrow <- operator is, I'm pretty sure, doomed to 
failure. The problem is that the parser would need to distinguish these 
two cases:

f<-x   # f left-arrow x

f<-x  # f less than minus x

and I don't think that is possible with Python's parser.
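
You can check in the interpreter that the token stream already has a 
meaning today:

```python
# 'f<-x' already parses as the comparison f < (-x), which is why a
# '<-' arrow operator could not be told apart by the parser.
f, x = 5, -10
result = f<-x      # same as f < (-x), i.e. 5 < 10
print(result)      # True
```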



-- 
Steve

From steve at pearwood.info  Sun May 10 05:20:16 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 10 May 2015 13:20:16 +1000
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <554EAB9A.2090501@aalto.fi>
References: <16174_1431214114_554E9821_16174_32_1_CAOMjWkmVXoMr07tOUKL7iEavN8b3sVer7nPPMrRH55RTtN60dw@mail.gmail.com>
 <554EAB9A.2090501@aalto.fi>
Message-ID: <20150510032016.GF5663@ando.pearwood.info>

On Sun, May 10, 2015 at 03:51:38AM +0300, Koos Zevenhoven wrote:

> Another way to deal with elementwise operations on iterables would be to 
> make a small, mostly backwards compatible change in map:
> 
> When map is called with just one argument, for instance map(square), it 
> would return a function that takes iterables and maps them element-wise.
> 
> Now it would be easier to use map in pipelines, for example:
> 
> rms = sqrt @ mean @ map(square)

Or just use a tiny helper function:

from functools import partial

def vectorise(func):
    return partial(map, func)

rms = sqrt @ mean @ vectorise(square)
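
Spelled out with today's syntax (plain nesting standing in for the 
proposed '@' composition, plus hypothetical square/mean helpers), the 
helper behaves like this:

```python
import math
from functools import partial

def vectorise(func):
    # Bind the function; the result maps it element-wise over iterables.
    return partial(map, func)

square = lambda x: x * x

def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)

# Plain nesting stands in for '@', which functions don't support today.
rms = lambda xs: math.sqrt(mean(vectorise(square)(xs)))
print(rms([3, 4]))  # sqrt((9 + 16) / 2)
```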


-- 
Steve

From larocca at abiresearch.com  Sun May 10 06:58:29 2015
From: larocca at abiresearch.com (Douglas La Rocca)
Date: Sun, 10 May 2015 04:58:29 +0000
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <20150510032016.GF5663@ando.pearwood.info>
References: <16174_1431214114_554E9821_16174_32_1_CAOMjWkmVXoMr07tOUKL7iEavN8b3sVer7nPPMrRH55RTtN60dw@mail.gmail.com>
 <554EAB9A.2090501@aalto.fi>,<20150510032016.GF5663@ando.pearwood.info>
Message-ID: <84d2cc44a7004fbba33391ae41c19e67@swordfish.abiresearch.com>

(Newcomer here.)

I use function composition pretty extensively. I've found it to be incredibly powerful, but can lead to bad practices. Certain other drawbacks are there as well, like unreadable tracebacks. But in many cases there are real benefits. And for data pipelines where you want to avoid state and mutation it works well.

The fn and pymonad modules implement infix composition functions through overloading but I've found this to be unworkable.

For me, the ideal infix operator would simply be a space, with the composition wrapped in parentheses. So e.g.

>>> (list str sorted)(range(10))
[' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ',', ',', ',', ',', ',', ',', ',', ',', ',', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '[', ']']
 
I might be overlooking something, but it seems to me this would work with existing syntax and semantics and wouldn't conflict with anything else like operator overloading would. The only other place non-indentation level spaces are significant is with keywords which can't be re-assigned. So e.g. (yield from gen()) wouldn't be parsed as 3 functions, and (def func) would raise SyntaxError.

Here's the composition function I'm working with, stripped of the little debugging helpers:

```
def compose(*fns):
    def compose_(*x):
        # Unpack into a new name; rebinding 'fns' here would make it a
        # local variable and raise UnboundLocalError on the first call.
        fn, *rest = fns
        value = fn(*x)
        if rest:
            return compose(*rest)(value)
        else:
            return value
    return compose_

O = compose
```

I haven't had any issues with the recursion. The `O` alias rubs me the wrong way but seemed to make sense at the time. The thought was that it should look like an operator because it acts like one.

So the use looks like

>>> O(fn1, fn2, fn3, ...)('string to be piped')

The problem for composition is essentially argument passing and has to do with the convenience of *args, **kwargs.

The way to make composition work predictably is to curry the functions yourself, wrapping the arguments you expect to get with nested closures, then repairing the __name__ etc with functools.wraps or update_wrapper in the usual way. This looks much nicer and almost natural when you write it with lambdas, e.g.

>>> getitem = lambda item: lambda container: container[item]

(Apologies for having named that lambda there...)

The other way to manage passing values from one function to the next is to define a function like

def star(x):
    return lambda fn: fn(*x)

Then if you get a list at one point in the pipeline and your function takes *args, you can decorate the function and call it like 

>>> star((getattr, '__name__'))(getattr)
'getattr'

I've run into problems using the @curried decorators from the fn and pymonad modules because they don't know how to handle *args, i.e. when to stop collecting arguments and finally make the function call.

If you want to have the composition order reversed you could decorate the definition with

```
def flip(f):
    def flip_(*x):
        return f(*reversed(x))
    return flip_
```

Once we have composition we can write partials for `map`, `filter`, and `reduce`, but with a small twist: make them variadic in the first argument and pass the arguments to compose:

def fmap(*fn):
    def fmap_(x):
        return list(map(compose(*fn),x))
    return fmap_

def ffilter(fn):
    def ffilter_(xs):
        return list(filter(fn, xs))
    return ffilter_

from functools import reduce  # reduce is no longer a builtin in Python 3

def freduce(fn):
    def _freduce(xs):
        return reduce(fn, xs)
    return _freduce

def Fmap(*fns):
    def Fmap_(x):
        return list(map(lambda fn:fn(x), fns))
    return Fmap_

The `Fmap` function seemed like some sort of "conjugate" to `fmap` so I tried to give it name suggesting this (again, at the expense of abusing naming conventions).

Instead of mapping a function over an iterable like `fmap`, `Fmap` applies each given function to a value. So

>>> Fmap(add(1), sub(1))(1)
[2, 0]
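
(The example assumes curried arithmetic helpers; a self-contained version, 
with hypothetical `add`/`sub` lambdas rather than the operator module, runs 
like so:)

```python
# Self-contained version of the Fmap example; add/sub are hypothetical
# curried helpers.
add = lambda a: lambda b: b + a
sub = lambda a: lambda b: b - a

def Fmap(*fns):
    def Fmap_(x):
        # apply every function to the same value
        return list(map(lambda fn: fn(x), fns))
    return Fmap_

print(Fmap(add(1), sub(1))(1))  # [2, 0]
```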

I've called them `fmap`, `ffilter`, and `freduce` but don't much like these names as they imply they might be the same as Haskell's `fmap`, and they're not. And there's no way to make them anything like Haskell as far as I can tell and they shouldn't be. If these implement a "paradigm" it's not purely functional but tacit/concatenative.

It made sense to compose the passed arguments because there's no reason to pass anything else to `fmap` in the first call. So sequential calls to (the return value of) `fmap` inside a pipeline, like

>>> O(mul(10),
...   fmap(add(1)), 
...   fmap(mul(2))
...  )([1])
[4, 4, 4, 4, 4, 4, 4, 4, 4, 4]                                          
                                                                        
can instead be written like

>>> O(mul(10),
...   fmap(add(1), 
...        mul(2))
...  )([1])
[4, 4, 4, 4, 4, 4, 4, 4, 4, 4]
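
(Again with hypothetical curried helpers, the pipeline can be checked end 
to end; note that mul(10) applied to the list [1] is sequence repetition, 
which is what produces the ten elements:)

```python
# End-to-end check of the pipeline above. mul/add are hypothetical
# curried helpers; mul(10) on a list repeats it ten times.
def compose(*fns):
    def composed(*args):
        fn, *rest = fns
        value = fn(*args)
        for f in rest:          # apply left to right, pipe-style
            value = f(value)
        return value
    return composed

O = compose
mul = lambda a: lambda b: b * a
add = lambda a: lambda b: b + a

def fmap(*fns):
    return lambda xs: list(map(compose(*fns), xs))

result = O(mul(10), fmap(add(1)), fmap(mul(2)))([1])
print(result)  # [4, 4, 4, 4, 4, 4, 4, 4, 4, 4]
```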
                                                                        
It also makes it easier to work at different levels inside nested structures. In these heavily nested cases the composition pipeline even begins to resemble the data structure passing through, which makes sense.

As another example, following is part of a pipeline that takes strings of bullet-separated strings of "key:value" pairs and converts each one to a dictionary, then folds the result together:

>>> d = [' foo00 : bar00 ? foo01 : bar01 ', 
...      ' foo10 : bar10 ? foo11 : bar11 ', 
...      ' foo20 : bar10 ? foo21 : bar21 ',]

>>> dict_foldl = freduce(lambda d1, d2: dict(d1, **d2))
>>> strip = lambda x: lambda s: s.strip(x)
>>> split = lambda x: lambda s: s.split(x)

>>> f = O(fmap(strip(' '),
...            split('?'), 
...            fmap(split(':'),
...                 strip(' '), 
...                 tuple), 
...            tuple,
...            dict),
...       dict_foldl)

>>> f(d)
{'foo00': 'bar00',
 'foo01': 'bar01',
 'foo10': 'bar10',
 'foo11': 'bar11',
 'foo20': 'bar10',
 'foo21': 'bar21'}

The combination of `compose`, `fmap`, and `Fmap` can be amazingly powerful for doing lots of work in a neat way while keeping the focus on the pipeline itself and not the individual values passing through.

The other thing is that this opens the door to a full "algebra" of maps which is kind of insane:

def mapeach(*fns):
    def mapeach_(*xs): 
        return list(map(lambda fn, *x: fn(*x), fns, *xs))
    return mapeach_

def product_map(fns):
    return lambda xs: list(map(lambda x: map(lambda fn: fn(x), fns), xs))

def smap(*fns):
    "star map"
    return lambda xs: list(map(O(*fns),*xs))

def pmap(*fns):
    return lambda *xs: list(map(lambda *x:list(map(lambda fn:fn(*x),fns)),*xs))

def matrix_map(*_fns):
    def matrix_map_(*_xs):
        return list(map(lambda fns, xs: list(map(lambda fn, x: fmap(fn)(x), fns, xs)), _fns, _xs))
    return matrix_map_

def mapcat(*fn):
    "clojure-inspired?"
    return compose(fmap(*fn), freduce(list.__add__))

def filtercat(*fn):
    return compose(ffilter(*fn), freduce(list.__add__))

I rarely use any of these.  They grew out of an attempt to tease out some hidden structure behind the combination of `map` and star packing/unpacking.

I do think there's something there but the names get in the way--it would be better to find a way to define a function that takes a specification of the structures of functions and values and knows what to do, e.g. something like

>>> from types import FunctionType
>>> fn = FunctionType
>>> # then the desired/imaginary version of map...
>>> _map(fn, [int])(add(1))(range(5)) # sort of like `fmap`
[1,2,3,4,5]
>>> _map([fn], [int])((add(x) for x in range(5)))(range(5)) # sort of like `mapeach`
[0,2,4,6,8]
>>> _map([[fn]], [[int]])(((add(x) for x in range(5))*10))((list(range(5)))*10) # sort of like `matrix_map`
[[[0, 1, 2, 3, 4],
  [1, 2, 3, 4, 5],
  [2, 3, 4, 5, 6],
  [3, 4, 5, 6, 7],
  [4, 5, 6, 7, 8],
  [0, 1, 2, 3, 4],
  [1, 2, 3, 4, 5],
  [2, 3, 4, 5, 6],
  [3, 4, 5, 6, 7],
  [4, 5, 6, 7, 8]]]

In most cases the first argument would just be `fn`, but it would be *really* nice to be able to do something like

>>> map(fn, [[int], [[int],[[str],[str]]]])

where all you need to do is give the schema and indicate which values to apply the function to. Giving the type would be an added measure, but passing `type` in the schema for unknowns should work just as well.
________________________________________
From: Python-ideas <python-ideas-bounces+larocca=abiresearch.com at python.org> on behalf of Steven D'Aprano <steve at pearwood.info>
Sent: Saturday, May 09, 2015 11:20 PM
To: python-ideas at python.org
Subject: Re: [Python-ideas] Function composition (was no subject)

On Sun, May 10, 2015 at 03:51:38AM +0300, Koos Zevenhoven wrote:

> Another way to deal with elementwise operations on iterables would be to
> make a small, mostly backwards compatible change in map:
>
> When map is called with just one argument, for instance map(square), it
> would return a function that takes iterables and maps them element-wise.
>
> Now it would be easier to use map in pipelines, for example:
>
> rms = sqrt @ mean @ map(square)

Or just use a tiny helper function:

def vectorise(func):
    return partial(map, func)

rms = sqrt @ mean @ vectorise(square)


--
Steve

From abarnert at yahoo.com  Sun May 10 07:24:00 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sat, 9 May 2015 22:24:00 -0700
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <mimi3h$mlj$1@ger.gmane.org>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>
 <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com> <mil9lv$hi2$1@ger.gmane.org>
 <91A6985C-A94B-4132-99B1-0305933950B5@yahoo.com> <mimi3h$mlj$1@ger.gmane.org>
Message-ID: <A278F1E4-67FF-44AE-B5B0-EDCE53072E03@yahoo.com>

On May 9, 2015, at 20:08, Ron Adam <ron3200 at gmail.com> wrote:
> 
>> On 05/09/2015 06:45 PM, Andrew Barnert via Python-ideas wrote:
>>> On May 9, 2015, at 08:38, Ron Adam<ron3200 at gmail.com>  wrote:
>>> >
>>> > On 05/09/2015 03:21 AM, Andrew Barnert via Python-ideas wrote:
>>>>> > I suppose you could write (root @ mean @ (map square)) (xs),
>>> >
>>>> >> Actually, you can't. You could write (root @ mean @ partial(map,
>>>> >> square))(xs), but that's pretty clearly less readable than
>>>> >> root(mean(map(square, xs))) or root(mean(x*x for x in xs)). And that's
>>>> >> been my main argument: Without a full suite of higher-level operators
>>>> >> and related syntax, compose alone doesn't do you any good except for toy
>>>> >> examples.
>>> >
>>> > How about an operator for partial?
>>> >
>>> >          root @ mean @ map $ square(xs)
> 
>> I'm pretty sure that anyone who sees that and doesn't interpret it as
>> meaningless nonsense is going to interpret it as a variation on Haskell and
>> get the wrong intuition.
> 
> Yes, I agree that is the problems with it.
> 
>> But, more importantly, this doesn't work. Your square(xs) isn't going
>> to evaluate to a function, but to whatever calling square on xs returns.
>> (Which is presumably a TypeError, or you wouldn't be looking to map in the
>> first place). And, even if that did work, you're not actually composing a
>> function here anyway; your @ is just a call operator, which we already have
>> in Python, spelled with parens.
> 
> This is following the patterns being discussed in the thread.  (or at least an attempt to do so.)
> 
> The @ and $ above would bind more tightly than the ().  Like the dot "." does for method calls.  

@ can't bind more tightly than (). The operator already exists (that's the whole reason people are suggesting it for compose), and it has the same precedence as *.

And even if you could change that, you wouldn't want to. Just as 2 * f(a) calls f on a and then multiplies by 2, b @ f(a) will call f on a and then matrix-multiply it by b; it would be very confusing if it matrix-multiplied b and f and then called the result on a.

I think I know what you're going for here. Half the reason Haskell has an apply operator even though adjacency already means apply is so it can have different precedence from adjacency. And if you don't like that, you can define your own infix operator with a different string of symbols and a different precedence or even associativity but the same body. That allows you to play all kinds of neat tricks like what you're trying to, where you can write almost anything without parentheses and it means exactly what it looks like. Of course you can just as easily write something that means something completely different from what it looks like...

But you have to actually work the operators through carefully, not just wave your hands and say "something like this"; when "this" actually doesn't mean what you want it to, you need to define a new operator that does. And, while allowing users to define enough operators to eliminate all the parens and all the lambdas works great for Haskell, I don't think it's a road that Python should follow.

> But the evaluation is from left to right at call time.  The calling part does not need to be done at the same times the rest is done.  Or at least that is what I got from the conversation.
> 
>     f = root @ mean @ map & square
>     result = f(xs)

But that means (root @ mean @ map) & square. Assuming you intended function.__and__ to mean partial, you have to write root @ mean @ (map & square), or create a new operator that has the precedence you want.

> The other examples would work the same.

Exactly: they don't work, either because you've got the precedence wrong, or because you've got an explicit function call rather than something that defines or references a function, and it doesn't make sense to compose that (well, except when the explicit call is to a higher-order function that returns a function, but that wasn't true of any of the examples).

>>> >Actually I'd rather reuse the binary operators.  (I'd be happy if they were just methods on bytes objects BTW.)
>>> >
>>> >          compose(root, mean, map(square, xs))
> 
>> Now you're not calling square(xs), but you are calling map(square, xs),
>> which is going to return an iterable of squares, not a function; again,
>> you're not composing a function object at all.
> 
> Yes, this is what directly calling the functions to do the same thing would look like.  Except without returning a composed function.

I don't understand what you mean. The same thing as what? Neither directly calling the functions, nor your proposed thing, returns a composed function (because, again, the last argument is not a function, it's an iterator returned by a function that you called directly).

>> And think about how you'd actually write this correctly. You need to
>> either use lambda (which defeats the entire purpose of compose), or partial
>> (which works, but is clumsy and ugly enough without an operator or
>> syntactic sugar that people rarely use it).
> 
> The advantage of the syntax is that it is a "potentially" (a matter of opinion) alternative to using lambda.  

Not your syntax. All of your examples that do anything just call a function immediately, rather than defining a function to be called later, so they can't replace uses of lambda. For example, your compose(root, mean, map(square, xs)) doesn't define a new function anywhere, so no part of it can replace a lambda.

The earlier examples actually do attempt to replace uses of lambda. Stephen's compose(root, mean, map square) returns a function. The problem with his suggestion is that map square isn't valid Python syntax--and if it were, that new syntax would be the thing that replaces the need for lambda, not the compose function. Which is obvious if you look at how you'd write that in valid Python syntax: compose(root, mean, lambda xs: map(square, xs)).

I've used the compose(...) form instead of the @ operator form, but the result is exactly the same either way.

> And apparently there are a few here who think doing it with lambda's or other means is less than ideal.

I agree with them--but I don't think adding compose to Python, either as a stdlib function or as an operator, actually solves that problem. If we had auto-curried functions and adjacency as apply and a suite of HOFs like flip and custom infix operators and operator sectioning and so on, then the lack of compose would be a problem that forced people to write unnecessary lambda expressions (although still not a huge problem, since it's so trivial to write). But with none of those things, adding compose doesn't actually help you avoid lambdas, except in a few contrived cases. (And maybe in NumPy-like array processing.)
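To make that concrete, here's a minimal sketch (the `compose` helper is hypothetical--Python has no built-in one--and it composes right-to-left, i.e. compose(f, g)(x) == f(g(x))):

```python
from functools import partial, reduce

def compose(*fns):
    """Hypothetical right-to-left compose: compose(f, g)(x) == f(g(x))."""
    return reduce(lambda f, g: lambda *a, **k: f(g(*a, **k)), fns)

def root(x):
    return x ** 0.5

def square(x):
    return x * x

def mean(xs):
    xs = list(xs)  # accept any iterable, including a map object
    return sum(xs) / len(xs)

# Without partial, you still need a lambda to lift map, defeating the point:
rms = compose(root, mean, lambda xs: map(square, xs))

# With partial, the lambda disappears, but the result is hardly more
# readable than the plain nested call root(mean(map(square, xs))):
rms2 = compose(root, mean, partial(map, square))

print(rms([3, 4]), rms2([3, 4]))  # both 3.5355339059327378
```

Either way, compose by itself only moves the lambda or partial around; it doesn't remove it.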

> Personally I'm not convinced yet either.
> 
> Cheers,
>  Ron
> 
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From steve at pearwood.info  Sun May 10 08:01:11 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 10 May 2015 16:01:11 +1000
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <84d2cc44a7004fbba33391ae41c19e67@swordfish.abiresearch.com>
References: <16174_1431214114_554E9821_16174_32_1_CAOMjWkmVXoMr07tOUKL7iEavN8b3sVer7nPPMrRH55RTtN60dw@mail.gmail.com>
 <84d2cc44a7004fbba33391ae41c19e67@swordfish.abiresearch.com>
Message-ID: <20150510060111.GH5663@ando.pearwood.info>

On Sun, May 10, 2015 at 04:58:29AM +0000, Douglas La Rocca wrote:
> (Newcomer here.)
> 
> I use function composition pretty extensively. I've found it to be 
> incredibly powerful, but can lead to bad practices. Certain other 
> drawbacks are there as well, like unreadable tracebacks. But in many 
> cases there are real benefits. And for data pipelines where you want 
> to avoid state and mutation it works well.

Thanks for the well-thought out and very detailed post!

The concrete experience you bring to this discussion is a welcome change 
from all the theoretical "wouldn't it be nice (or awful) if ..." from 
many of us, and I include myself. The fact that you have extensive 
experience with using function composition in practice, and can point 
out the benefits and disadvantages, is great.


-- 
Steve

From larocca at abiresearch.com  Sun May 10 08:06:37 2015
From: larocca at abiresearch.com (Douglas La Rocca)
Date: Sun, 10 May 2015 06:06:37 +0000
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <20150510060111.GH5663@ando.pearwood.info>
References: <16174_1431214114_554E9821_16174_32_1_CAOMjWkmVXoMr07tOUKL7iEavN8b3sVer7nPPMrRH55RTtN60dw@mail.gmail.com>
 <84d2cc44a7004fbba33391ae41c19e67@swordfish.abiresearch.com>,
 <20150510060111.GH5663@ando.pearwood.info>
Message-ID: <0af83e9ead994e639aa41a2f5e678a61@swordfish.abiresearch.com>

Thanks! Not sure what took me so long to get on the python lists, but I finally did and to my excitement you were talking about my favorite topic!

---

For replacing the need to write `lambda x: x...` inside compositions *in a limited set of cases*, you could use a sort of "doppelganger" type/metaclass:

class ThisType(type):
    def __getattr__(cls, attr):
        def attribute(*args, **kwargs):
            def method(this):
                this_attr = getattr(this, attr)
                if callable(this_attr):
                    return this_attr(*args, **kwargs)
                else:
                    return this_attr
            return method
        return attribute
    def __call__(cls, *args, **kwargs):
        def decorator(fn):
            return fn(*args, **kwargs)
        return decorator
    def __getitem__(cls, item):
        return lambda x: x[item]

class this(metaclass=ThisType): pass


Basically, it records whatever is done to it, then returns a function that takes a value and does those things to the value. Any call, attribute access, or item lookup you'd want to perform on a value mid-pipe is staged by performing it on `this` instead.

So rather than writing 
>>> compose(lambda s: s.strip('<>'), lambda s: s.lower())('<HTML>')

you can write

>>> compose(this.strip('<>'), this.lower())('<HTML>')
'html'

or

>>> compose(float, this.__str__)('1')
'1.0'

But there are two caveats:

Property attributes would need to be *called*, which feels weird when you already know an API well, so e.g.

>>> from lxml import html
>>> html.fromstring('<b>bold text</b>').text
'bold text'
>>> compose(html.fromstring, this.text())('<b>bold text</b>')
'bold text'

It's also a bit weird because attributes that return functions/methods/callables *aren't* called (like above with `this.__str__`: `__str__` is a method of `float`).

The second caveat is that nothing chained after the __getitem__ or __getattr__ will work, so e.g.

>>> from pandas import DataFrame
>>> df = DataFrame([1]*2, columns=['A','B'])
   A  B
0  1  1
1  1  1
>>> compose(this.applymap(str), this['A'])(df)
0    1
1    1
Name: A, dtype: object
>>> compose(this.applymap(str), this['A'], this.shape())(df)
(2,)

...but...

>>> compose(this.applymap(str), this['A'].shape)(df)
AttributeError: 'function' object has no attribute 'shape'
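For completeness, the `compose` these examples assume is the left-to-right variant (first function applied first); a minimal sketch of it:

```python
from functools import reduce

def compose(*fns):
    """Left-to-right pipeline: compose(f, g)(x) == g(f(x))."""
    return reduce(lambda f, g: lambda x: g(f(x)), fns)

# The same pipelines as above, written with plain functions:
print(compose(lambda s: s.strip('<>'), lambda s: s.lower())('<HTML>'))  # html
print(compose(float, str)('1'))  # 1.0
```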

_______________________________________________
Python-ideas mailing list
Python-ideas at python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

From levkivskyi at gmail.com  Sun May 10 09:13:51 2015
From: levkivskyi at gmail.com (Ivan Levkivskyi)
Date: Sun, 10 May 2015 09:13:51 +0200
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <84C6058F-8015-4703-979D-59CD9780F93A@yahoo.com>
References: <CAOMjWkmVXoMr07tOUKL7iEavN8b3sVer7nPPMrRH55RTtN60dw@mail.gmail.com>
 <84C6058F-8015-4703-979D-59CD9780F93A@yahoo.com>
Message-ID: <CAOMjWkkWc3cUgrke-1UDpuXuVMknXSqZNat_xS3gtPB864Npgg@mail.gmail.com>

On 10 May 2015 at 02:05, Andrew Barnert <abarnert at yahoo.com> wrote:

> On May 9, 2015, at 16:28, Ivan Levkivskyi <levkivskyi at gmail.com> wrote:
>
> I was thinking about recent ideas discussed here. I also returned back to
> origins of my initial idea. The point is that it came from Numpy, I use
> Numpy arrays everyday, and typically I do exactly something like
> root(mean(square(data))).
>
> Now I am thinking: what is actually a matrix? It is something that takes a
> vector and returns a vector. But on the other hand the same actually do
> elementwise functions. It does not really matter, what we do with a vector:
> transform by a product of matrices or by composition of functions. In other
> words I agree with Andrew that "elementwise" is a good match with compose,
> and what we really need is to "pipe" things that take a vector (or just an
> iterable) and return a vector (iterable).
>
> So that probably a good place (in a potential future) for compose would be
> not functools but itertools. But indeed a good place to test this would be
> Numpy.
>
>
> Itertools is an interesting idea.
>
> Anyway, assuming NumPy isn't going to add this in the near future (has
> anyone even brought it up on the NumPy list, or only here?), it wouldn't be
> that hard to write a (maybe inefficient but working) @composable wrapper
> and wrap all the relevant callables from NumPy or from itertools, upload it
> to PyPI, and let people start coming up with good examples. If it's later
> worth direct support in NumPy and/or Python (for simplicity or
> performance), the module will still be useful for backward compatibility.
>
>
This is a good step-by-step approach. This is what I would try.


> An additional comment: it is indeed good to have both @ and | for compose
> and rcompose.
> Side note, one can actually overload __rmatmul__ on arrays as well so that
> you can write
>
> root @ mean @ square @ data
>
>
> But this doesn't need to overload it on arrays, only on the ufuncs, right?
>
> Unless you're suggesting that one of these operations could be a matrix as
> easily as a function, and NumPy users often won't have to care which it is?
>
>
Exactly, this is what I want. Note that in such approach you have no
parentheses at all.
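A rough sketch of what such a wrapper could look like (a hypothetical `Composable` class, not NumPy's actual behavior): `@` between two callables composes, while `@` against a non-callable right operand applies the whole pipeline, so the chain needs no parentheses:

```python
import math

class Composable:
    """Hypothetical wrapper: f @ g composes; f @ data applies the pipeline."""
    def __init__(self, fn):
        self.fn = fn

    def __call__(self, *args, **kwargs):
        return self.fn(*args, **kwargs)

    def __matmul__(self, other):
        if callable(other):
            # Compose: (self @ other)(x) == self(other(x))
            return Composable(lambda *a, **k: self.fn(other(*a, **k)))
        # Right operand is data, not a callable: run the pipeline on it.
        return self.fn(other)

root = Composable(math.sqrt)
mean = Composable(lambda xs: sum(xs) / len(xs))
square = Composable(lambda xs: [x * x for x in xs])

data = [3, 4]
print(root @ mean @ square @ data)  # 3.5355339059327378
```

Since `@` is left-associative, `root @ mean @ square` builds up the composed function first, and the final `@ data` triggers the application.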


>
> Moreover, one can overload __or__ on arrays, so that one can write
>
> data | square | mean | root
>
> even with ordinary functions (not Numpy's ufuncs or composable) .
>
>
> That's an interesting point. But I think this will be a bit confusing,
> because now it _does_ matter whether square is a matrix or a
> function--you'll get elementwise bitwise or instead of application. (And
> really, this is the whole reason for @ in the first place--we needed an
> operator that never means elementwise.)
>
> Also, this doesn't let you actually compose functions--if you want square
> | mean | root to be a function, square has to have a __or__ operator.
>
>
This is true. The | is more limited because of its current semantics. The
fact that | operator already has a widely used semantics is also why I
would choose @ if I would need to choose only one: @ or |


> These examples are actually "flat is better than nested" in the extreme
> form.
>
> Anyway, they (Numpy) are going to implement the @ operator for arrays; maybe
> it would be a good idea to check that if something on the left of me
> (array) is not an array but a callable then apply it elementwise.
>
> Concerning the multi-argument functions, I don't like $ symbol, don't know
> why. It seems really unintuitive why it means partial application.
> One can autocurry composable functions and apply same rules that Numpy
> uses for ufuncs.
> More precisely, if I write
>
> add(data1, data2)
>
> with arrays it applies add pairwise. But if I write
>
> add(data1, 42)
>
> it is also fine, it simply adds 42 to every element. With autocurrying one
> could write
>
> root @ mean @ add(data) @ square @ data2
>
> or
>
> root @ mean @ square @ add(42) @ data
>
> However, as I see it now it is not very readable, so maybe the best
> choice is to reserve @ and | for "piping" iterables through transformers
> that take one argument. In other words it should be left to the user to make
> add(42) of an appropriate type. It is the same logic as for decorators, if
> I write
>
> @modify(arg)
> def func(x):
>     return None
>
> I must care that modify(arg) evaluates to something that takes one
> callable and returns a callable.
>
>
> On May 9, 2015, at 01:36, Stephen J. Turnbull <stephen at xemacs.org> wrote:
>> >
>> > Andrew Barnert writes:
>> >>> On May 8, 2015, at 19:58, Stephen J. Turnbull <stephen at xemacs.org>
>> wrote:
>> >>>
>> >>> Koos Zevenhoven writes:
>> >>>
>> >>>> As a random example, (root @ mean @ square)(x) would produce the
>> right
>> >>>> order for rms when using [2].
>> >>>
>> >>> Hardly interesting. :-)  The result is an exception, as root and
>> square
>> >>> are conceptually scalar-to-scalar, while mean is sequence-to-scalar.
>> >>
>> >> Unless you're using an elementwise square and an array-to-scalar
>> >> mean, like the ones in NumPy,
>> >
>> > Erm, why would square be elementwise and root not?  I would suppose
>> > that everything is element-wise in Numpy (not a user yet).
>>
>> Most functions in NumPy are elementwise when applied to arrays, but can
>> also be applied to scalars. So, square is elementwise because it's called
>> on an array, root is scalar because it's called on a scalar. (In fact, root
>> could also be elementwise--aggregating functions like mean can be applied
>> across just one axis of a 2D or higher array, reducing it by one dimension,
>> if you want.)
>>
>> Before you try it, this sounds like a complicated nightmare that can't
>> possibly work in practice. But play with it for just a few minutes and it's
>> completely natural. (Except for a few cases where you want some array-wide
>> but not element-wise operation, most famously matrix multiplication, which
>> is why we now have the @ operator to play with.)
>>
>> >> in which case it works perfectly well...
>> >
>> > But that's an aspect of my point (evidently, obscure).  Conceptually,
>> > as taught in junior high school or so, root and square are scalar-to-
>> > scalar.  If you are working in a context such as Numpy where it makes
>> > sense to assume they are element-wise and thus composable, the context
>> > should provide the compose operator(s).
>>
>> I was actually thinking on these lines: what if @ didn't work on
>> types.FunctionType, but did work on numpy.ufunc (the name for the
>> "universal function" type that knows how to broadcast across arrays but
>> also work on scalars)? That's something NumPy could implement without any
>> help from the core language. (Methods are a minor problem here, but it's
>> obvious how to solve them, so I won't get into it.) And if it turned out to
>> be useful all over the place in NumPy, that might turn up some great uses
>> for the idiomatic non-NumPy Python, or it might show that, like elementwise
>> addition, it's really more a part of NumPy than of Python.
>>
>> But of course that's more of a proposal for NumPy than for Python.
>>
>> > Without that context, Koos's
>> > example looks like a TypeError.
>>
>> >> But Koos's example, even if it was possibly inadvertent, shows that
>> >> I may be wrong about that. Maybe compose together with element-wise
>> >> operators actually _is_ sufficient for something beyond toy
>> >> examples.
>> >
>> > Of course it is!<wink />  I didn't really think there was any doubt
>> > about that.
>>
>> I think there was, and still is. People keep coming up with abstract toy
>> examples, but as soon as someone tries to give a good real example, it only
>> makes sense with NumPy (Koos's) or with some syntax that Python doesn't
>> have (yours), because to write them with actual Python functions would
>> actually be ugly and verbose (my version of yours).
>>
>> I don't think that's a coincidence. You didn't write "map square" because
>> you don't know how to think in Python, but because using compose profitably
>> inherently implies not thinking in Python. (Except, maybe, in the case of
>> NumPy... which is a different idiom.) Maybe someone has a bunch of obvious
>> good use cases for compose that don't also require other functions,
>> operators, or syntax we don't have, but so far, nobody's mentioned one.
>>
>> ------------------------------
>>
>> On 5/9/2015 6:19 AM, Andrew Barnert via Python-ideas wrote:
>>
>> > I think there was, and still is. People keep coming up with abstract
>> toy examples, but as soon as someone tries to give a good real example, it
>> only makes sense with NumPy (Koos's) or with some syntax that Python
>> doesn't have (yours), because to write them with actual Python functions
>> would actually be ugly and verbose (my version of yours).
>> >
>> > I don't think that's a coincidence. You didn't write "map square"
>> because you don't know how to think in Python, but because using compose
>> profitably inherently implies not thinking in Python. (Except, maybe, in
>> the case of NumPy... which is a different idiom.) Maybe someone has a bunch
>> of obvious good use cases for compose that don't also require other
>> functions, operators, or syntax we don't have, but so far, nobody's
>> mentioned one.
>>
>> I agree that @ is most likely to be useful in numpy's restricted context.
>>
>> A composition operator is usually defined by application: f@g(x) is
>> defined as f(g(x)).  (I'm sure there are also axiomatic treatments.)  It
>> is an optional syntactic abbreviation. It is most useful in a context
>> where there is one set of data objects, such as the real numbers, or one
>> set + arrays (vectors) defined on the one set; where all function are
>> univariate (or possibly multivariate, but that can be transformed to
>> univariate on vectors); *and* where parameter names are dummies like
>> 'x', 'y', 'z', or '_'.
>>
>> The last point is important. Abbreviating h(x) = f(g(x)) with h = f @ g
>> does not lose any information as 'x' is basically a placeholder (so get
>> rid of it).  But parameter names are important in most practical
>> contexts, both for understanding a composition and for using it.
>>
>> def npv(transfers, discount):
>>      '''Return the net present value of discounted transfers.
>>
>>      transfers: finite iterable of amounts at constant intervals
>>      discount: fraction per interval
>>      '''
>>      divisor = 1 + discount
>>      return sum(transfer/divisor**time
>>                  for time, transfer in enumerate(transfers))
>>
>> Even if one could replace the def statement with
>> npv = <some combination of @, sum, map, add, div, power, enumerate, ...>
>> with parameter names omitted, it would be harder to understand.  Using
>> it would require the ability to infer argument types and order from the
>> composed expression.
>>
>> I intentionally added a statement to calculate the common subexpression
>> prior to the return. I believe it would have to put back in the return
>> expression before converting.
>>
>> --
>> Terry Jan Reedy
>>
>>
>>
>> ------------------------------
>>
>> On 05/09/2015 03:21 AM, Andrew Barnert via Python-ideas wrote:
>> >> >I suppose you could write (root @ mean @ (map square)) (xs),
>>
>> > Actually, you can't. You could write (root @ mean @ partial(map,
>> > square))(xs), but that's pretty clearly less readable than
>> > root(mean(map(square, xs))) or root(mean(x*x for x in xs)). And that's
>> > been my main argument: Without a full suite of higher-level operators
>> > and related syntax, compose alone doesn't do you any good except for toy
>> > examples.
>>
>> How about an operator for partial?
>>
>>            root @ mean @ map $ square(xs)
>>
>>
>> Actually I'd rather reuse the binary operators.  (I'd be happy if they
>> were
>> just methods on bytes objects BTW.)
>>
>>            compose(root, mean, map(square, xs))
>>
>>            root ^ mean ^ map & square (xs)
>>
>>            root ^ mean ^ map & square ^ xs ()
>>
>> Read this as...
>>
>>           compose root, of mean, of map with square, of xs
>>
>> Or...
>>
>>            apply(map(square, xs), mean, root)
>>
>>            map & square | mean | root (xs)
>>
>>            xs | map & square | mean | root ()
>>
>>
>> Read this as...
>>
>>            apply xs, to map with square, to mean, to root
>>
>>
>> These are kind of cool, but does it make python code easier to read?  That
>> seems like it may be subjective depending on the amount of programming
>> experience someone has.
>>
>> Cheers,
>>     Ron
>>
>>
>>
>> ------------------------------
>>
>> Hi,
>> I had to answer some of these questions when I wrote Lawvere:
>> https://pypi.python.org/pypi/lawvere
>>
>> First, there are two kinds of composition: pipe and circle, so I think a
>> single operator like @ is a bit restrictive.
>> I like "->" and "<-"
>>
>> Then, for function naming and function-to-string conversion I had to
>> introduce a function signature (a tuple).
>> It provides a good tool for decomposition, introspection and comparison
>> with respect to the mathematical definition.
>>
>> Finally, for me composition makes sense when you have typed functions;
>> otherwise it can easily become a mess, and this makes composition tied to
>> multiple dispatch.
>>
>> I really hope composition will be introduced in Python but I can't see how
>> it could be done without rethinking a good part of function definition.
>>
>>
>>
>> 2015-05-09 17:38 GMT+02:00 Ron Adam <ron3200 at gmail.com>:
>>
>> >
>> >
>> > On 05/09/2015 03:21 AM, Andrew Barnert via Python-ideas wrote:
>> >
>> >> >I suppose you could write (root @ mean @ (map square)) (xs),
>> >>>
>> >>
>> >  Actually, you can't. You could write (root @ mean @ partial(map,
>> >> square))(xs), but that's pretty clearly less readable than
>> >> root(mean(map(square, xs))) or root(mean(x*x for x in xs)). And that's
>> >> been my main argument: Without a full suite of higher-level operators
>> >> and related syntax, compose alone doesn't do you any good except for
>> toy
>> >> examples.
>> >>
>> >
>> > How about an operator for partial?
>> >
>> >           root @ mean @ map $ square(xs)
>> >
>> >
>> > Actually I'd rather reuse the binary operators.  (I'd be happy if they
>> > were just methods on bytes objects BTW.)
>> >
>> >           compose(root, mean, map(square, xs))
>> >
>> >           root ^ mean ^ map & square (xs)
>> >
>> >           root ^ mean ^ map & square ^ xs ()
>> >
>> > Read this as...
>> >
>> >          compose root, of mean, of map with square, of xs
>> >
>> > Or...
>> >
>> >           apply(map(square, xs), mean, root)
>> >
>> >           map & square | mean | root (xs)
>> >
>> >           xs | map & square | mean | root ()
>> >
>> >
>> > Read this as...
>> >
>> >           apply xs, to map with square, to mean, to root
>> >
>> >
>> > These are kind of cool, but does it make python code easier to read?
>> That
>> > seems like it may be subjective depending on the amount of programming
>> > experience someone has.
>> >
>> > Cheers,
>> >    Ron
>> >
>> >
>>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150510/e5b934c6/attachment-0001.html>

From abarnert at yahoo.com  Sun May 10 10:18:02 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sun, 10 May 2015 01:18:02 -0700
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <CAOMjWkkWc3cUgrke-1UDpuXuVMknXSqZNat_xS3gtPB864Npgg@mail.gmail.com>
References: <CAOMjWkmVXoMr07tOUKL7iEavN8b3sVer7nPPMrRH55RTtN60dw@mail.gmail.com>
 <84C6058F-8015-4703-979D-59CD9780F93A@yahoo.com>
 <CAOMjWkkWc3cUgrke-1UDpuXuVMknXSqZNat_xS3gtPB864Npgg@mail.gmail.com>
Message-ID: <1DD4C041-7C97-4F4B-8240-9C31A88F55BD@yahoo.com>

On May 10, 2015, at 00:13, Ivan Levkivskyi <levkivskyi at gmail.com> wrote:
> 
>> On 10 May 2015 at 02:05, Andrew Barnert <abarnert at yahoo.com> wrote:
>>> On May 9, 2015, at 16:28, Ivan Levkivskyi <levkivskyi at gmail.com> wrote:
>>> 
>>> I was thinking about recent ideas discussed here. I also returned back to origins of my initial idea. The point is that it came from Numpy, I use Numpy arrays everyday, and typically I do exactly something like root(mean(square(data))).
>>> 
>>> Now I am thinking: what is actually a matrix? It is something that takes a vector and returns a vector. But on the other hand the same actually do elementwise functions. It does not really matter, what we do with a vector: transform by a product of matrices or by composition of functions. In other words I agree with Andrew that "elementwise" is a good match with compose, and what we really need is to "pipe" things that take a vector (or just an iterable) and return a vector (iterable).
>>> 
>>> So that probably a good place (in a potential future) for compose would be not functools but itertools. But indeed a good place to test this would be Numpy.
>> 
>> Itertools is an interesting idea.
>> 
>> Anyway, assuming NumPy isn't going to add this in the near future (has anyone even brought it up on the NumPy list, or only here?), it wouldn't be that hard to write a (maybe inefficient but working) @composable wrapper and wrap all the relevant callables from NumPy or from itertools, upload it to PyPI, and let people start coming up with good examples. If it's later worth direct support in NumPy and/or Python (for simplicity or performance), the module will still be useful for backward compatibility.
> 
> This is a good step-by-step approach. This is what I would try.
>  
>>> An additional comment: it is indeed good to have both @ and | for compose and rcompose.
>>> Side note, one can actually overload __rmatmul__ on arrays as well so that you can write
>>> 
>>> root @ mean @ square @ data
>> 
>> But this doesn't need to overload it on arrays, only on the ufuncs, right?
>> 
>> Unless you're suggesting that one of these operations could be a matrix as easily as a function, and NumPy users often won't have to care which it is?
> 
> Exactly, this is what I want. Note that in such approach you have no parentheses at all.

It's worth working up some practical examples here.

Annoyingly, I actually had a perfect example a few years ago, but I can't find it. I'm sure you can imagine what it was. We built-in vector transforms implemented as functions, and a way for a user to input new transforms as matrices, and a way for the user to chain built-in and user-defined transforms. Under the covers, we had to wrap each user transform in a function just so they'd all be callables, which led to a couple of annoying debugging sessions and probably a performance hit. If we could compose them interchangeably, that might have avoided those problems. But if I can't find the code, it's hard to say for sure, so now I'm offering the same vague, untestable use cases that I was complaining about. :)
>  
>>> 
>>> Moreover, one can overload __or__ on arrays, so that one can write
>>> 
>>> data | square | mean | root
>>> 
>>> even with ordinary functions (not Numpy's ufuncs or composable) .
>> 
>> That's an interesting point. But I think this will be a bit confusing, because now it _does_ matter whether square is a matrix or a function--you'll get elementwise bitwise or instead of application. (And really, this is the whole reason for @ in the first place--we needed an operator that never means elementwise.)
>> 
>> Also, this doesn't let you actually compose functions--if you want square | mean | root to be a function, square has to have a __or__ operator.
> 
> This is true. The | is more limited because of its current semantics. The fact that | operator already has a widely used semantics is also why I would choose @ if I would need to choose only one: @ or |
>  
>>> These examples are actually "flat is better than nested" in the extreme form. 
>>> 
>>> Anyway, they (Numpy) are going to implement the @ operator for arrays; maybe it would be a good idea to check that if something on the left of me (array) is not an array but a callable then apply it elementwise.
>>> 
>>> Concerning the multi-argument functions, I don't like $ symbol, don't know why. It seems really unintuitive why it means partial application.
>>> One can autocurry composable functions and apply same rules that Numpy uses for ufuncs.
>>> More precisely, if I write 
>>> 
>>> add(data1, data2) 
>>> 
>>> with arrays it applies add pairwise. But if I write 
>>> 
>>> add(data1, 42) 
>>> 
>>> it is also fine, it simply adds 42 to every element. With autocurrying one could write 
>>> 
>>> root @ mean @ add(data) @ square @ data2
>>> 
>>> or
>>> 
>>> root @ mean @ square @ add(42) @ data  
>>> 
>>> However, as I see it now it is not very readable, so maybe the best choice is to reserve @ and | for "piping" iterables through transformers that take one argument. In other words it should be left to the user to make add(42) of an appropriate type. It is the same logic as for decorators, if I write
>>> 
>>> @modify(arg)
>>> def func(x):
>>>     return None
>>> 
>>> I must care that modify(arg) evaluates to something that takes one callable and returns a callable.
>>> 
>>> 
>>>> On May 9, 2015, at 01:36, Stephen J. Turnbull <stephen at xemacs.org> wrote:
>>>> >
>>>> > Andrew Barnert writes:
>>>> >>> On May 8, 2015, at 19:58, Stephen J. Turnbull <stephen at xemacs.org> wrote:
>>>> >>>
>>>> >>> Koos Zevenhoven writes:
>>>> >>>
>>>> >>>> As a random example, (root @ mean @ square)(x) would produce the right
>>>> >>>> order for rms when using [2].
>>>> >>>
>>>> >>> Hardly interesting. :-)  The result is an exception, as root and square
>>>> >>> are conceptually scalar-to-scalar, while mean is sequence-to-scalar.
>>>> >>
>>>> >> Unless you're using an elementwise square and an array-to-scalar
>>>> >> mean, like the ones in NumPy,
>>>> >
>>>> > Erm, why would square be elementwise and root not?  I would suppose
>>>> > that everything is element-wise in Numpy (not a user yet).
>>>> 
>>>> Most functions in NumPy are elementwise when applied to arrays, but can also be applied to scalars. So, square is elementwise because it's called on an array, root is scalar because it's called on a scalar. (In fact, root could also be elementwise--aggregating functions like mean can be applied across just one axis of a 2D or higher array, reducing it by one dimension, if you want.)
>>>> 
>>>> Before you try it, this sounds like a complicated nightmare that can't possibly work in practice. But play with it for just a few minutes and it's completely natural. (Except for a few cases where you want some array-wide but not element-wise operation, most famously matrix multiplication, which is why we now have the @ operator to play with.)
>>>> 
>>>> >> in which case it works perfectly well...
>>>> >
>>>> > But that's an aspect of my point (evidently, obscure).  Conceptually,
>>>> > as taught in junior high school or so, root and square are scalar-to-
>>>> > scalar.  If you are working in a context such as Numpy where it makes
>>>> > sense to assume they are element-wise and thus composable, the context
>>>> > should provide the compose operator(s).
>>>> 
>>>> I was actually thinking on these lines: what if @ didn't work on types.FunctionType, but did work on numpy.ufunc (the name for the "universal function" type that knows how to broadcast across arrays but also work on scalars)? That's something NumPy could implement without any help from the core language. (Methods are a minor problem here, but it's obvious how to solve them, so I won't get into it.) And if it turned out to be useful all over the place in NumPy, that might turn up some great uses for the idiomatic non-NumPy Python, or it might show that, like elementwise addition, it's really more a part of NumPy than of Python.
>>>> 
>>>> But of course that's more of a proposal for NumPy than for Python.
>>>> 
>>>> > Without that context, Koos's
>>>> > example looks like a TypeError.
>>>> 
>>>> >> But Koos's example, even if it was possibly inadvertent, shows that
>>>> >> I may be wrong about that. Maybe compose together with element-wise
>>>> >> operators actually _is_ sufficient for something beyond toy
>>>> >> examples.
>>>> >
>>>> > Of course it is!<wink />  I didn't really think there was any doubt
>>>> > about that.
>>>> 
>>>> I think there was, and still is. People keep coming up with abstract toy examples, but as soon as someone tries to give a good real example, it only makes sense with NumPy (Koos's) or with some syntax that Python doesn't have (yours), because to write them with actual Python functions would actually be ugly and verbose (my version of yours).
>>>> 
>>>> I don't think that's a coincidence. You didn't write "map square" because you don't know how to think in Python, but because using compose profitably inherently implies not thinking in Python. (Except, maybe, in the case of NumPy... which is a different idiom.) Maybe someone has a bunch of obvious good use cases for compose that don't also require other functions, operators, or syntax we don't have, but so far, nobody's mentioned one.
>>>> 
>>>> ------------------------------
>>>> 
>>>> On 5/9/2015 6:19 AM, Andrew Barnert via Python-ideas wrote:
>>>> 
>>>> > I think there was, and still is. People keep coming up with abstract toy examples, but as soon as someone tries to give a good real example, it only makes sense with NumPy (Koos's) or with some syntax that Python doesn't have (yours), because to write them with actual Python functions would actually be ugly and verbose (my version of yours).
>>>> >
>>>> > I don't think that's a coincidence. You didn't write "map square" because you don't know how to think in Python, but because using compose profitably inherently implies not thinking in Python. (Except, maybe, in the case of NumPy... which is a different idiom.) Maybe someone has a bunch of obvious good use cases for compose that don't also require other functions, operators, or syntax we don't have, but so far, nobody's mentioned one.
>>>> 
>>>> I agree that @ is most likely to be useful in numpy's restricted context.
>>>> 
>>>> A composition operator is usually defined by application: f@g(x) is
>>>> defined as f(g(x)).  (I'm sure there are also axiomatic treatments.)  It
>>>> is an optional syntactic abbreviation. It is most useful in a context
>>>> where there is one set of data objects, such as the real numbers, or one
>>>> set + arrays (vectors) defined on the one set; where all functions are
>>>> univariate (or possibly multivariate, but that can be transformed to
>>>> univariate on vectors); *and* where parameter names are dummies like
>>>> 'x', 'y', 'z', or '_'.
>>>> 
>>>> The last point is important. Abbreviating h(x) = f(g(x)) with h = f @ g
>>>> does not lose any information as 'x' is basically a placeholder (so get
>>>> rid of it).  But parameter names are important in most practical
>>>> contexts, both for understanding a composition and for using it.
>>>> 
>>>> def npv(transfers, discount):
>>>>      '''Return the net present value of discounted transfers.
>>>> 
>>>>      transfers: finite iterable of amounts at constant intervals
>>>>      discount: fraction per interval
>>>>      '''
>>>>      divisor = 1 + discount
>>>>      return sum(transfer/divisor**time
>>>>                  for time, transfer in enumerate(transfers))
>>>> 
>>>> Even if one could replace the def statement with
>>>> npv = <some combination of @, sum, map, add, div, power, enumerate, ...>
>>>> with parameter names omitted, it would be harder to understand.  Using
>>>> it would require the ability to infer argument types and order from the
>>>> composed expression.
>>>> 
>>>> I intentionally added a statement to calculate the common subexpression
>>>> prior to the return. I believe it would have to be put back into the
>>>> return expression before converting.
>>>> 
>>>> --
>>>> Terry Jan Reedy
>>>> 
>>>> 
>>>> 
>>>> ------------------------------
>>>> 
>>>> On 05/09/2015 03:21 AM, Andrew Barnert via Python-ideas wrote:
>>>> >> >I suppose you could write (root @ mean @ (map square)) (xs),
>>>> 
>>>> > Actually, you can't. You could write (root @ mean @ partial(map,
>>>> > square))(xs), but that's pretty clearly less readable than
>>>> > root(mean(map(square, xs))) or root(mean(x*x for x in xs)). And that's
>>>> > been my main argument: Without a full suite of higher-level operators
>>>> > and related syntax, compose alone doesn't do you any good except for toy
>>>> > examples.
>>>> 
>>>> How about an operator for partial?
>>>> 
>>>>            root @ mean @ map $ square(xs)
>>>> 
>>>> 
>>>> Actually I'd rather reuse the binary operators.  (I'd be happy if they were
>>>> just methods on bytes objects BTW.)
>>>> 
>>>>            compose(root, mean, map(square, xs))
>>>> 
>>>>            root ^ mean ^ map & square (xs)
>>>> 
>>>>            root ^ mean ^ map & square ^ xs ()
>>>> 
>>>> Read this as...
>>>> 
>>>>           compose root, of mean, of map with square, of xs
>>>> 
>>>> Or...
>>>> 
>>>>            apply(map(square, xs), mean, root)
>>>> 
>>>>            map & square | mean | root (xs)
>>>> 
>>>>            xs | map & square | mean | root ()
>>>> 
>>>> 
>>>> Read this as...
>>>> 
>>>>            apply xs, to map with square, to mean, to root
>>>> 
>>>> 
>>>> These are kind of cool, but does it make python code easier to read?  That
>>>> seems like it may be subjective depending on the amount of programming
>>>> experience someone has.
>>>> 
>>>> Cheers,
>>>>     Ron
>>>> 
>>>> 
>>>> 
>>>> ------------------------------
>>>> 
>>>> Hi,
>>>> I had to answer some of these questions when I wrote Lawvere:
>>>> https://pypi.python.org/pypi/lawvere
>>>> 
>>>> First, there are two kinds of composition, pipe and circle, so I think a
>>>> single operator like @ is a bit restrictive.
>>>> I like "->" and "<-"
>>>> 
>>>> Then, for function naming and function-to-string conversion I had to
>>>> introduce function signatures (a tuple).
>>>> They provide a good tool for decomposition, introspection and comparison,
>>>> in keeping with the mathematical definition.
>>>> 
>>>> Finally, for me composition makes sense when you have typed functions;
>>>> otherwise it can easily become a mess, and this ties composition to
>>>> multiple dispatch.
>>>> 
>>>> I really hope composition will be introduced in Python, but I can't see
>>>> how it could be done without rethinking a good part of function definition.
>>>> 
>>>> 
>>>> 
>>>> 2015-05-09 17:38 GMT+02:00 Ron Adam <ron3200 at gmail.com>:
>>>> 
>>>> > [...]
>>> 
>>> _______________________________________________
>>> Python-ideas mailing list
>>> Python-ideas at python.org
>>> https://mail.python.org/mailman/listinfo/python-ideas
>>> Code of Conduct: http://python.org/psf/codeofconduct/
> 

From abarnert at yahoo.com  Sun May 10 10:54:48 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sun, 10 May 2015 01:54:48 -0700
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <84d2cc44a7004fbba33391ae41c19e67@swordfish.abiresearch.com>
References: <16174_1431214114_554E9821_16174_32_1_CAOMjWkmVXoMr07tOUKL7iEavN8b3sVer7nPPMrRH55RTtN60dw@mail.gmail.com>
 <554EAB9A.2090501@aalto.fi> <20150510032016.GF5663@ando.pearwood.info>
 <84d2cc44a7004fbba33391ae41c19e67@swordfish.abiresearch.com>
Message-ID: <AC873140-2A9C-49C6-826E-0425F96692A7@yahoo.com>

On May 9, 2015, at 21:58, Douglas La Rocca <larocca at abiresearch.com> wrote:
> 
> (Newcomer here.)
> 
> I use function composition pretty extensively. I've found it to be incredibly powerful, but can lead to bad practices. Certain other drawbacks are there as well, like unreadable tracebacks. But in many cases there are real benefits. And for data pipelines where you want to avoid state and mutation it works well.
> 
> The fn and pymonad modules implement infix composition functions through overloading but I've found this to be unworkable.
> 
> For me, the ideal infix operator would simply be a space, with the composition wrapped in parentheses. So e.g.
> 
>>>> (list str sorted)(range(10))
> [' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ',', ',', ',', ',', ',', ',', ',', ',', ',', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '[', ']']
> 
> I might be overlooking something, but it seems to me this would work with existing syntax and semantics and wouldn't conflict with anything else like operator overloading would. The only other place non-indentation level spaces are significant is with keywords which can't be re-assigned. So e.g. (yield from gen()) wouldn't be parsed as 3 functions, and (def func) would raise SyntaxError.
> 
> Here's the composition function I'm working with, stripped of the little debugging helpers:
> 
> ```
> def compose(*fns):
>    def compose_(*x):
>        fn, *fns = fns
>        value = fn(*x)
>        if fns:
>            return compose(*fns)(value)
>        else:
>            return value
>    return compose_
> 
> O=compose
> ```
> 
> I haven't had any issues with the recursion. The `O` alias rubs me the wrong way but seemed to make sense at the time. The thought was that it should look like an operator because it acts like one.
> 
> So the use looks like
> 
>>>> O(fn1, fn2, fn3, ...)('string to be piped')
> 
> The problem for composition is essentially argument passing and has to do with the convenience of *args, **kwargs.
> 
> The way to make composition work predictably is to curry the functions yourself, wrapping the arguments you expect to get with nested closures, then repairing the __name__ etc with functools.wraps or update_wrapper in the usual way. This looks much nicer and almost natural when you write it with lambdas, e.g.
> 
>>>> getitem = lambda item: lambda container: container[item]
> 
> (Apologies for having named that lambda there...)

I understand why you named it; I don't understand why you didn't just use def if you were going to name it (and declare it in a statement instead of the middle of an expression). Anyway, this is already in operator, as itemgetter, and it's definitely useful to functional code, especially itertools-style generator-driven functional code. And it feels like the pattern ought to be generalizable... but other than attrgetter, it's hard to think of another example where you want the same thing. After all, Python only has a couple of syntactic forms that you'd want to wrap up as functions at all, so it only has a couple of syntactic forms that you'd want to wrap up as curried functions.
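For reference, the two stdlib spellings side by side, as a runnable sketch (`operator` really does provide both of these):

```python
from operator import attrgetter, itemgetter

# itemgetter(key) is the pre-curried form of
#     lambda container: container[key]
# and attrgetter(name) is the analogous form for attribute access.
get_first = itemgetter(0)
get_name = attrgetter('__name__')

print(get_first([10, 20, 30]))  # 10
print(get_name(len))            # len
```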

> The other way to manage passing values from one function to the next is to define a function like
> 
> def star(x):
>    return lambda fn: fn(*x)
> 
> Then if you get a list at one point in the pipeline and your function takes *args, you can decorate the function and call it like 
> 
>>>> star(getattr)((getattr, '__name__'))
> 'getattr'
> 
> I've run into problems using the @curried decorators from the fn and pymonad modules because they don't know how to handle *args, i.e. when to stop collecting arguments and finally make the function call.
> 
> If you want to have the composition order reversed you could decorate the definition with
> 
> ```
> def flip(f):
>    def flip_(*x):
>        return f(*reversed(x))
>    return flip_
> ```
> 
> Once we have composition we can write partials for `map`, `filter`, and `reduce`, but with a small twist: make them variadic in the first argument and pass the arguments to compose:
> 
> def fmap(*fn):
>    def fmap_(x):
>        return list(map(compose(*fn),x))
>    return fmap_

I don't understand why this is called fmap. I see below that you're not implying anything like Haskell's fmap (which confused me...), but then what _does_ the f mean? It seems like this is just a manually curried map, that returns a list instead of an iterator, and only takes one iterable instead of one or more. None of those things say "f" to me, but maybe I'm still hung up on expecting it to mean "functor" and I'll feel like an idiot once you clear it up. :)

Also, why _is_ it calling list? Do your notions of composition and currying not play well with iterators? If so, that seems like a pretty major thing to give up. And why isn't it variadic in the iterables? You can trivially change that by just having the wrapped function take and pass *x, but I assume there's some reason you didn't?
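To make the question concrete, here's roughly what I'd expect the lazy, variadic version to look like (a sketch of my own, not the quoted code; `compose` here is the obvious left-to-right pipeline):

```python
from functools import reduce

def compose(*fns):
    """Left-to-right pipeline: compose(f, g)(x) == g(f(x))."""
    def composed(*args):
        first, *rest = fns
        return reduce(lambda value, fn: fn(value), rest, first(*args))
    return composed

def fmap(*fns):
    """Curried map: variadic in the iterables, returns a lazy iterator."""
    fn = compose(*fns)
    return lambda *iterables: map(fn, *iterables)

print(list(fmap(lambda x: x * 2, lambda x: x + 1)(range(3))))  # [1, 3, 5]
print(list(fmap(pow)(range(3), [2, 2, 2])))                    # [0, 1, 4]
```

The caller decides when (and whether) to reify with `list`, instead of the combinator deciding for them.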

> def ffilter(fn):
>    def ffilter_(xs):
>        return list(filter(fn, xs))
>    return ffilter_
> 
> def freduce(fn):
>    def _freduce(xs):
>        return reduce(fn, xs)
>    return _freduce

These two aren't variadic in fn like fmap was. Is that just a typo, or is there a reason not to be?

> def Fmap(*fns):
>    def Fmap_(x):
>        return list(map(lambda fn:fn(x), fns))
>    return Fmap_
> 
> The `Fmap` function seemed like some sort of "conjugate" to `fmap` so I tried to give it name suggesting this (again, at the expense of abusing naming conventions).
> 
> Instead of mapping a function over an iterable like `fmap`, `Fmap` applies each given function to a value. So
> 
>>>> Fmap(add(1), sub(1))(1)
> [2, 0]
> 
> I've called them `fmap`, `ffilter`, and `freduce` but don't much like these names as they imply they might be the same as Haskell's `fmap`, and they're not. And there's no way to make them anything like Haskell as far as I can tell and they shouldn't be. If these implement a "paradigm" it's not purely functional but tacit/concatenative.
> 
> It made sense to compose the passed arguments because there's no reason to pass anything else to `fmap` in the first call. So sequential calls to (the return value of) `fmap` inside a pipeline, like
> 
>>>> O(mul(10),
> ...   fmap(add(1)), 
> ...   fmap(mul(2))
> ...  )([1])
> [4, 4, 4, 4, 4, 4, 4, 4, 4, 4]                                          
> 
> can instead be written like
> 
>>>> O(mul(10),
> ...   fmap(add(1), 
> ...        mul(2))
> ...  )([1])
> [4, 4, 4, 4, 4, 4, 4, 4, 4, 4]
> 
> It also makes it easier to work at different levels inside nested structures. In these heavily nested cases the composition pipeline even begins to resemble the data structure passing through, which makes sense.
> 
> As another example, following is part of a pipeline that takes strings of bullet-separated strings of "key:value" pairs and converts each one to a dictionary, then folds the result together:
> 
>>>> d = [' foo00 : bar00 ? foo01 : bar01 ',
> ...      ' foo10 : bar10 ? foo11 : bar11 ', 
> ...      ' foo20 : bar10 ? foo21 : bar21 ',]
> 
>>>> dict_foldl = freduce(lambda d1, d2: dict(d1, **d2))
>>>> strip = lambda x: lambda s: s.strip(x)
>>>> split = lambda x: lambda s: s.split(x)
> 
>>>> f = O(fmap(strip(' '),
> ...            split('?'), 
> ...            fmap(split(':'),
> ...                 strip(' '), 
> ...                 tuple), 
> ...            tuple,
> ...            dict),
> ...       dict_foldl)

Now that we have a concrete example... This looks like a nifty translation of what you might write in Haskell, but it doesn't look at all like Python to me. 

And compare:

    def f(d):
        pairs = (pair.strip(' ').split(':') for pair in d.split('?'))
        strippedpairs = ((part.strip(' ') for part in pair) for pair in pairs)
        return dict(strippedpairs)

Or, even better:

    def f(d):
        pairs = (pair.strip(' ').split(':') for pair in d.split('?'))
        return {k.strip(' '): v.strip(' ') for k, v in pairs}

Of course I skipped a lot of steps--turning the inner iterables into tuples, then into dicts, then turning the outer iterable into a list, then merging all the dicts, and of course wrapping various subsets of the process up into functions and calling them--but that's because those steps are unnecessary. We have comprehensions, we have iterators, why try to write for Python 2.2?

And notice that any chain of iterator transformations like this _could_ be written as a single expression. But the fact that it doesn't _have_ to be--that you can take any step you want and name the intermediate iterable without having to change anything (and with negligible performance cost), and you can make your code vertical and play into Python indentation instead of writing it horizontally and faking indentation with paren-continuation--is what makes generator expressions and map and filter so nice.

Well, that, and the fact that in a comprehension I can just write an expression and it means that expression. I don't have to wrap the expression in a function, or try to come up with a higher-order expression that will effect that first-order expression when evaluated.
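To make that point concrete with toy data of my own (not the quoted example): the pipeline reads the same whether or not you name the intermediate step, and naming it costs nothing.

```python
data = ' foo : bar ? baz : qux '

# As one expression:
d1 = {k.strip(): v.strip()
      for k, v in (pair.split(':') for pair in data.split('?'))}

# The same pipeline with the intermediate iterable named; nothing else changes:
pairs = (pair.split(':') for pair in data.split('?'))
d2 = {k.strip(): v.strip() for k, v in pairs}

print(d1)  # {'foo': 'bar', 'baz': 'qux'}
assert d1 == d2
```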

>>>> f(d)
> {'foo00': 'bar00',
> 'foo01': 'bar01',
> 'foo10': 'bar10',
> 'foo11': 'bar11',
> 'foo20': 'bar10',
> 'foo21': 'bar21'}
> 
> The combination of `compose`, `fmap`, and `Fmap` can be amazingly powerful for doing lots of work in a neat way while keeping the focus on the pipeline itself and not the individual values passing through.

But often, the individual values have useful names that make it easier to keep track of them. Like calling the keys and values k and v instead of having them be elements 0 and 1 of an implicit *args.

> The other thing is that this opens the door to a full "algebra" of maps which is kind of insane:
> 
> def mapeach(*fns):
>    def mapeach_(*xs): 
>        return list(map(lambda fn, *x: fn(*x), fns, *xs))
>    return mapeach_
> 
> def product_map(fns):
>    return lambda xs: list(map(lambda x: map(lambda fn: fn(x), fns), xs))
> 
> def smap(*fns):
>    "star map"
>    return lambda xs: list(map(O(*fns),*xs))
> 
> def pmap(*fns):
>    return lambda *xs: list(map(lambda *x:list(map(lambda fn:fn(*x),fns)),*xs))
> 
> def matrix_map(*_fns):
>    def matrix_map_(*_xs):
>        return list(map(lambda fns, xs: list(map(lambda fn, x: fmap(fn)(x), fns, xs)), _fns, _xs))
>    return matrix_map_
> 
> def mapcat(*fn):
>    "clojure-inspired?"
>    return compose(fmap(*fn), freduce(list.__add__))
> 
> def filtercat(*fn):
>    return compose(ffilter(*fn), freduce(list.__add__))
> 
> I rarely use any of these of these. They grew out of an attempt to tease out some hidden structure behind the combination of `map` and star packing/unpacking.
> 
> I do think there's something there but the names get in the way--it would be better to find a way to define a function that takes a specification of the structures of functions and values and knows what to do, e.g. something like
> 
>>>> from types import FunctionType
>>>> fn = FunctionType
>>>> # then the desired/imaginary version of map...
>>>> _map(fn, [int])(add(1))(range(5)) # sort of like `fmap`
> [1,2,3,4,5]
>>>> _map([fn], [int])((add(x) for x in range(5)))(range(5)) # sort of like `mapeach`
> [0,2,4,6,8]
>>>> _map([[fn]], [[int]])(((add(x) for x in range(5))*10))((list(range(5)))*10) # sort of like `matrix_map`
> [[[0, 1, 2, 3, 4],
>  [1, 2, 3, 4, 5],
>  [2, 3, 4, 5, 6],
>  [3, 4, 5, 6, 7],
>  [4, 5, 6, 7, 8],
>  [0, 1, 2, 3, 4],
>  [1, 2, 3, 4, 5],
>  [2, 3, 4, 5, 6],
>  [3, 4, 5, 6, 7],
>  [4, 5, 6, 7, 8]]]
> 
> In most cases the first argument would just be `fn`, but it would be *really* nice to be able to do something like
> 
>>>> map(fn, [[int], [[int],[[str],[str]]]])
> 
> where all you need to do is give the schema and indicate which values to apply the function to. Giving the type would be an added measure, but passing `type` in the schema for unknowns should work just as well.
> ________________________________________
> From: Python-ideas <python-ideas-bounces+larocca=abiresearch.com at python.org> on behalf of Steven D'Aprano <steve at pearwood.info>
> Sent: Saturday, May 09, 2015 11:20 PM
> To: python-ideas at python.org
> Subject: Re: [Python-ideas] Function composition (was no subject)
> 
>> On Sun, May 10, 2015 at 03:51:38AM +0300, Koos Zevenhoven wrote:
>> 
>> Another way to deal with elementwise operations on iterables would be to
>> make a small, mostly backwards compatible change in map:
>> 
>> When map is called with just one argument, for instance map(square), it
>> would return a function that takes iterables and maps them element-wise.
>> 
>> Now it would be easier to use map in pipelines, for example:
>> 
>> rms = sqrt @ mean @ map(square)
> 
> Or just use a tiny helper function:
> 
> def vectorise(func):
>    return partial(map, func)
> 
> rms = sqrt @ mean @ vectorise(square)
> 
> 
> --
> Steve
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From steve at pearwood.info  Sun May 10 11:04:21 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 10 May 2015 19:04:21 +1000
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <1DD4C041-7C97-4F4B-8240-9C31A88F55BD@yahoo.com>
References: <CAOMjWkmVXoMr07tOUKL7iEavN8b3sVer7nPPMrRH55RTtN60dw@mail.gmail.com>
 <84C6058F-8015-4703-979D-59CD9780F93A@yahoo.com>
 <CAOMjWkkWc3cUgrke-1UDpuXuVMknXSqZNat_xS3gtPB864Npgg@mail.gmail.com>
 <1DD4C041-7C97-4F4B-8240-9C31A88F55BD@yahoo.com>
Message-ID: <20150510090421.GI5663@ando.pearwood.info>

On Sun, May 10, 2015 at 01:18:02AM -0700, Andrew Barnert via Python-ideas wrote:
[...]

Not picking on Andrew specifically, but could folks please trim their 
replies occasionally to keep the amount of quoted text manageable?

Andrew's post is about 10 pages of mostly-quoted text (depending on how 
you count pages, mutt claims it's 14 but I think it means screenfuls, not 
pages), and I'm seeing up to nine levels of quoting:

> >>>> >>>> As a random example, (root @ mean @ square)(x) would produce the right
> >>>> >>>> order for rms when using [2].

Thanks in advance.


-- 
Steve

From rosuav at gmail.com  Sun May 10 11:25:31 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Sun, 10 May 2015 19:25:31 +1000
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <84d2cc44a7004fbba33391ae41c19e67@swordfish.abiresearch.com>
References: <16174_1431214114_554E9821_16174_32_1_CAOMjWkmVXoMr07tOUKL7iEavN8b3sVer7nPPMrRH55RTtN60dw@mail.gmail.com>
 <554EAB9A.2090501@aalto.fi>
 <20150510032016.GF5663@ando.pearwood.info>
 <84d2cc44a7004fbba33391ae41c19e67@swordfish.abiresearch.com>
Message-ID: <CAPTjJmp-936NZ5SZeSq2PgtLc2KLTC9Afrp8X6jiYGLeN96UvQ@mail.gmail.com>

On Sun, May 10, 2015 at 2:58 PM, Douglas La Rocca
<larocca at abiresearch.com> wrote:
> (Newcomer here.)

Welcome to Bikeshed Central! Here, we take a plausible idea and fiddle
around with all the little detaily bits :)

> For me, the ideal infix operator would simply be a space, with the composition wrapped in parentheses. So e.g.
>
>>>> (list str sorted)(range(10))
> [' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ',', ',', ',', ',', ',', ',', ',', ',', ',', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '[', ']']
>
> I might be overlooking something, but it seems to me this would work with existing syntax and semantics and wouldn't conflict with anything else like operator overloading would.
>

One of the problems with using mere whitespace is that it's very easy
to do accidentally. There's already places where this can happen, for
instance:

strings = [
    "String one",
    "String two is a bit longer"
    "String three",
    "String four"
]

How many strings are there in my list? Clearly the programmer's
intention is to have four, but that's not what ends up happening. (Now
imagine there are actually hundreds of strings, and one gets selected
at random every time you do something. Have fun figuring out why, just
occasionally, it prints out two messages instead of one. For bonus
points, figure that out when there are two or three such bugs in the
list, so it's not always the exact same pair that come out together.)

At the moment, we can safely build up a list of functions like this:

funcs = [
    list,
    str,
    sorted,
]

because omitting a comma will produce an instant SyntaxError.

Python currently is pretty good at detecting problems in source code.
(Not all languages are, as you'll know as soon as you run into one of
those "oops I left out a semicolon and my JavaScript function does
something slightly different" bugs.) Part of that comes from having a
fairly simple set of rules governing syntax, such that any deviation
results in a simple and quick error *at or very near to* the place
where the error occurs. You won't, for instance, get an error at the
bottom of a file saying "Unmatched '{' or missing '}'", leaving you to
dig through your code to figure out exactly where the problem was. At
worst, you get an error on the immediately-following line of code:

def func1():
    value = x * (y + z # oops, forgot the close parens
    print(value) # boom, SyntaxError on this line

But if "function function" meant composition, this would actually be
legal, and you'd get an error rather further down. If you're lucky,
this is the end of this function, and the "def" keyword trips the
error; but otherwise, this would be validly parsed as "compose z and
print into a function, then call that with value", and we're still
looking for a close parens.

So I would strongly suggest having some sort of operator in between.
Okay. Can I just say something crazy? (Hans: I love crazy!) How about
using a comma?

>>> (fn1, fn2, fn3, ...)('string to be piped')

Currently, this produces a runtime TypeError: 'tuple' object is not
callable, but I could easily define my own callable subclass of tuple.

>>> class functuple(tuple):
...     def __call__(self, arg):
...         for func in self: arg = func(arg)
...         return arg
...
>>> f = functuple((fn1,fn2))
>>> f("this is a test")

(Use whatever semantics you like for handling multiple arguments. I'm
not getting into that part of the debate, as I have no idea how
function composition ought to work in the face of *args and **kwargs.)

The syntax is reasonably clean, and it actually doesn't require many
changes - just making tuples callable in some logical fashion. No new
syntax needed, and it's an already-known light-weight way to pack up a
bunch of things into one object. Does it make sense to do this?
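Self-contained sketch, with the loop spelled as a fold via functools.reduce
(same single-argument caveat as above):

```python
from functools import reduce

class functuple(tuple):
    """A tuple whose call pipes a single argument through each element,
    left to right. Otherwise it behaves like any other tuple."""
    def __call__(self, arg):
        return reduce(lambda value, func: func(value), self, arg)

f = functuple((str.split, len))
print(f("this is a test"))  # 4
print(f[0])                 # still indexable and iterable as a tuple
```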

ChrisA

From steve at pearwood.info  Sun May 10 11:31:25 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 10 May 2015 19:31:25 +1000
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <CAEbHw4awMVzdWBbZt2HoZozwoTEWtCPPfoCPTRzUqWTEigbkFQ@mail.gmail.com>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>
 <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com> <mil9lv$hi2$1@ger.gmane.org>
 <20150509181642.GB5663@ando.pearwood.info>
 <CAEbHw4awMVzdWBbZt2HoZozwoTEWtCPPfoCPTRzUqWTEigbkFQ@mail.gmail.com>
Message-ID: <20150510093125.GJ5663@ando.pearwood.info>

On Sat, May 09, 2015 at 01:30:17PM -0500, David Mertz wrote:
> On Sat, May 9, 2015 at 1:16 PM, Steven D'Aprano <steve at pearwood.info> wrote:
> 
> > On Sat, May 09, 2015 at 11:38:38AM -0400, Ron Adam wrote:
> >
> > > How about an operator for partial?
> > >
> > >           root @ mean @ map $ square(xs)
> >
> 
> I have trouble seeing the advantage of a special function composition
> operator when it is easy to write a general 'compose()' function that can
> produce such things easily enough.

Do you have trouble seeing the advantage of a special value addition 
operator when it is easy enough to write a general "add()" function?

*wink*

I think that, mentally, operators "feel" lightweight. If I write:

getattr(obj, 'method')(arg)

it puts too much emphasis on the attribute access. But using an 
operator:

obj.method(arg)

puts the emphasis on calling the method, not looking it up, which is just 
right. Even though both forms do about the same amount of work, mentally, 
the dot pseudo-operator feels much more lightweight.

The same with

compose(grep, filter)(data)

versus 

(grep @ filter)(data)

The first sends my attention to the wrong place, the composition. The 
second does not.

I don't expect everyone to agree with me, but I think this explains why 
people keep suggesting syntax or an operator to do function composition 
instead of a function. Not everyone thinks this way, but for those who 
do, a compose() function is like eating a great big bowl of gruel that 
contains all the nutrients you need for the day and tastes of cardboard 
and smells of wet dog. It might do everything that you want 
functionally, but it feels wrong and looks wrong and it is not in the 
least bit pleasurable to use.



-- 
Steve

From steve at pearwood.info  Sun May 10 11:57:30 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 10 May 2015 19:57:30 +1000
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <CAPTjJmp-936NZ5SZeSq2PgtLc2KLTC9Afrp8X6jiYGLeN96UvQ@mail.gmail.com>
References: <16174_1431214114_554E9821_16174_32_1_CAOMjWkmVXoMr07tOUKL7iEavN8b3sVer7nPPMrRH55RTtN60dw@mail.gmail.com>
 <554EAB9A.2090501@aalto.fi> <20150510032016.GF5663@ando.pearwood.info>
 <84d2cc44a7004fbba33391ae41c19e67@swordfish.abiresearch.com>
 <CAPTjJmp-936NZ5SZeSq2PgtLc2KLTC9Afrp8X6jiYGLeN96UvQ@mail.gmail.com>
Message-ID: <20150510095729.GK5663@ando.pearwood.info>

On Sun, May 10, 2015 at 07:25:31PM +1000, Chris Angelico wrote:

> strings = [
>     "String one",
>     "String two is a bit longer"
>     "String three",
>     "String four"
> ]
> 
> How many strings are there in my list? Clearly the programmer's
> intention is to have four, but that's not what ends up happening. (Now
> imagine there are actually hundreds of strings, and one gets selected
> at random every time you do something.

If you are embedding hundreds of strings in the source, instead of 
reading them from a file, you deserve whatever horribleness you get :-)


> Have fun figuring out why, just
> occasionally, it prints out two messages instead of one.

That would actually be pretty easy to solve. When you get the unexpected 
"String two is a bit longerString three" message, just grep through the 
file for the first few words, and lo and behold, you are missing a 
comma.

But your point about syntactically meaningful whitespace is otherwise a 
good one. Python doesn't give whitespace in expressions any particular 
meaning, except as a separator. I'd be very dubious about making 
function composition an exception. 


> So I would strongly suggest having some sort of operator in between.
> Okay. Can I just say something crazy? (Hans: I love crazy!) How about
> using a comma?
> 
> >>> (fn1, fn2, fn3, ...)('string to be piped')
> 
> Currently, this produces a runtime TypeError: 'tuple' object is not
> callable, but I could easily define my own callable subclass of tuple.

There's lots of code that assumes that a tuple of functions is a 
sequence:

for f in (len, str, ord, chr, repr):
    test(f)

so we would need to keep that. But we don't want a composed function to 
be a sequence, any more than we want a partial or a regular function to 
be sequences. If I pass you a Composed object, and you try slicing it, 
that should be an error.
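
A Composed object along those lines is easy to sketch (a hypothetical
class, not an existing one): it is callable but deliberately not a
sequence, so indexing or slicing it fails just as it would on a partial:

```python
class Composed:
    def __init__(self, *funcs):
        self.funcs = funcs
    def __call__(self, arg):
        # Apply right to left: Composed(f, g)(x) == f(g(x))
        for f in reversed(self.funcs):
            arg = f(arg)
        return arg

c = Composed(str, len)
c([1, 2, 3])   # '3'
# c[0] and c[1:] raise TypeError, like any other non-sequence callable
```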


-- 
Steve

From rosuav at gmail.com  Sun May 10 12:17:09 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Sun, 10 May 2015 20:17:09 +1000
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <20150510095729.GK5663@ando.pearwood.info>
References: <16174_1431214114_554E9821_16174_32_1_CAOMjWkmVXoMr07tOUKL7iEavN8b3sVer7nPPMrRH55RTtN60dw@mail.gmail.com>
 <554EAB9A.2090501@aalto.fi>
 <20150510032016.GF5663@ando.pearwood.info>
 <84d2cc44a7004fbba33391ae41c19e67@swordfish.abiresearch.com>
 <CAPTjJmp-936NZ5SZeSq2PgtLc2KLTC9Afrp8X6jiYGLeN96UvQ@mail.gmail.com>
 <20150510095729.GK5663@ando.pearwood.info>
Message-ID: <CAPTjJmpF7Pjppp9w81G+7ij2x9M=QuVegFT0p9rh9eAPTda3yg@mail.gmail.com>

On Sun, May 10, 2015 at 7:57 PM, Steven D'Aprano <steve at pearwood.info> wrote:
>> So I would strongly suggest having some sort of operator in between.
>> Okay. Can I just say something crazy? (Hans: I love crazy!) How about
>> using a comma?
>>
>> >>> (fn1, fn2, fn3, ...)('string to be piped')
>>
>> Currently, this produces a runtime TypeError: 'tuple' object is not
>> callable, but I could easily define my own callable subclass of tuple.
>
> There's lots of code that assumes that a tuple of functions is a
> sequence:
>
> for f in (len, str, ord, chr, repr):
>     test(f)
>
> so we would need to keep that. But we don't want a composed function to
> be a sequence, any more than we want a partial or a regular function to
> be sequences. If I pass you a Composed object, and you try slicing it,
> that should be an error.

Well, I told you it was crazy :) But the significance here is that
there would be no Composed object, just a tuple. You could slice it,
iterate over it, etc; and if you call it, it calls each of its
arguments. I'm not sure that it's a fundamental problem for a composed
function to be sliceable, any more than it's a problem for any other
available operation that you aren't using. Tuples already have several
related uses (they can be used as "record" types, or as frozen lists
for hashability, etc), and this would simply mean that a tuple of
callables is callable.
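
That tuple subclass is only a few lines (a sketch; the left-to-right
piping order is an assumption based on the 'string to be piped' example):

```python
class CallableTuple(tuple):
    def __call__(self, arg):
        # Pipe the argument through each element, left to right.
        for f in self:
            arg = f(arg)
        return arg

pipe = CallableTuple((str.strip, str.upper))
pipe('  spam  ')   # 'SPAM'
pipe[0]            # still indexes, slices, and iterates like a tuple
```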

ChrisA

From larocca at abiresearch.com  Sun May 10 12:36:59 2015
From: larocca at abiresearch.com (Douglas La Rocca)
Date: Sun, 10 May 2015 10:36:59 +0000
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <AC873140-2A9C-49C6-826E-0425F96692A7@yahoo.com>
References: <16174_1431214114_554E9821_16174_32_1_CAOMjWkmVXoMr07tOUKL7iEavN8b3sVer7nPPMrRH55RTtN60dw@mail.gmail.com>
 <554EAB9A.2090501@aalto.fi> <20150510032016.GF5663@ando.pearwood.info>
 <84d2cc44a7004fbba33391ae41c19e67@swordfish.abiresearch.com>
 <AC873140-2A9C-49C6-826E-0425F96692A7@yahoo.com>
Message-ID: <eed235d3acea4514bb8ad5b4c885e5c0@swordfish.abiresearch.com>

> I understand why you named it; I don't understand why you didn't just use
> def if you were going to name it (and declare it in a statement instead of the
> middle of an expression). Anyway, this is already in operator, as itemgetter,
> and it's definitely useful to functional code, especially itertools-style
> generator-driven functional code. And it feels like the pattern ought to be
> generalizable... but other than attrgetter, it's hard to think of another
> example where you want the same thing. After all, Python only has a couple
> of syntactic forms that you'd want to wrap up as functions at all, so it only has
> a couple of syntactic forms that you'd want to wrap up as curried functions.

Sorry for the confusion here--I was trying to say that it's correct to use def in order to properly set __name__, give space for doc strings, etc. 
The downside is that the nesting can strain readability. I was only showing the "formal" equivalent in lambda-style to point out how currying arguments isn't really confusing at all, considering

    lambda x: lambda y: lambda z: <some expression of x, y, z>

begins to resemble syntactically

    def anon(x, y, z):
        <some expression of x, y, z>

(obviously semantically these are different). 

Regarding the `getitem` example, this wasn't intended as a use-case. It's true Python has few syntactic forms you'd want to wrap (isinstance, hasattr, etc.). I mostly had external module apis in mind here.

> I don't understand why this is called fmap. I see below that you're not
> implying anything like Haskell's fmap (which confused me...), but then what
> _does_ the f mean? It seems like this is just a manually curried map, that
> returns a list instead of an iterator, and only takes one iterable instead of one
> or more. None of those things say "f" to me, but maybe I'm still hung up on
> expecting it to mean "functor" and I'll feel like an idiot once you clear it up. :)

> Also, why _is_ it calling list? Do your notions of composition and currying not
> play well with iterators? If so, that seems like a pretty major thing to give up.
> And why isn't it variadic in the iterables? You can trivially change that by just
> having the wrapped function take and pass *x, but I assume there's some
> reason you didn't?

It was only called fmap to leave the builtin map in the namespace, the 'f' just meant 'function'.

Taking a single iterable as the first item rather than varargs avoids the use of the `star` shim in the composition. I do use a wrapper `s` for this but find it ugly to use. It's basically a conventional decision that's forced by the difference between passing a single value to a "monadic" (in the APL not Haskell sense) function and a variadic function. In my own util library this also shows up as two versions of the identity function:

    def identity(x):
        return x

    def identity_star(*x):
        return x

These may seem useless, but their purpose becomes clear when you're in the middle of a composition.

For data structures where you want to map over lists of lists of lists etc., you can either define a higher map or do something like

    fmap(fmap(fmap(function_to_apply)))(iterable)

which would incidentally be the same as the uglier

    compose(*(fmap,)*3)(function_to_apply)(iterable)

though the latter makes it possible to parametrize the iteration depth.

As for wrapping in `list`--in some cases (I can't immediately recall them all) the list actually needed to be built in order for the composition to work. A simple case would be

    compose(mul(10), fmap(len), len)([[1]*10]*10)

which would raise a TypeError. I should look again to see if there's a better way to fix it. But I reverted to the 2.x default because I made full use of generators before moving to 3.x and decided I didn't need map to be lazy. To be honest, the preference for everything to be lazy seems somewhat fashionable at the moment... you can get along just as well knowing when things shouldn't be fully loaded into memory (i.e. when to use a generator).
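
A sketch of an fmap along these lines (the original definition isn't
shown in the thread, so the details here are assumptions): a curried map
over a single iterable that eagerly builds a list:

```python
def fmap(fn):
    def mapper(iterable):
        # Eager, like 2.x map: returns a list rather than an iterator,
        # so e.g. len() can be applied to the result downstream.
        return [fn(x) for x in iterable]
    return mapper

fmap(len)([[1], [1, 2]])            # [1, 2]
fmap(fmap(len))([[[1]], [[1, 2]]])  # [[1], [2]] -- two levels deep
```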

> These two aren't variadic in fn like fmap was. Is that just a typo, or is there a
> reason not to be?

Yes just a typo!

> Now that we have a concrete example... This looks like a nifty translation of
> what you might write in Haskell, but it doesn't look at all like Python to me.
> 
> And compare:
> 
>     def f(d):
>         pairs = (pair.strip(' ').split(':') for pair in d.split('?'))
>         strippedpairs = ((part.strip(' ') for part in pair) for pair in pairs)
>         return dict(strippedpairs)
> 
> Or, even better:
> 
>     def f(d):
>         pairs = (pair.strip(' ').split(':') for pair in d.split('?'))
>         return {k.strip(' '): v.strip(' ') for k, v in pairs}
> 
> Of course I skipped a lot of steps--turning the inner iterables into tuples,
> then into dicts, then turning the outer iterable into a list, then merging all the
> dicts, and of course wrapping various subsets of the process up into
> functions and calling them--but that's because those steps are unnecessary.
> We have comprehensions, we have iterators, why try to write for Python
> 2.2?

I agree these work just as well.

> And notice that any chain of iterator transformations like this _could_ be
> written as a single expression. But the fact that it doesn't _have_ to be--that
> you can take any step you want and name the intermediate iterable without
> having to change anything (and with negligible performance cost), and you
> can make your code vertical and play into Python indentation instead of
> writing it horizontally and faking indentation with paren-continuation--is
> what makes generator expressions and map and filter so nice.

> Well, that, and the fact that in a comprehension I can just write an expression
> and it means that expression. I don't have to wrap the expression in a
> function, or try to come up with a higher-order expression that will effect
> that first-order expression when evaluated.

> But often, the individual values have useful names that make it easier to
> keep track of them. Like calling the keys and values k and v instead of having
> them be elements 0 and 1 of an implicit *args.

I agree for the most part, but there are cases where you're really deep into some structure, manipulating the values in a generic way, and the names *do* get in the way. The temptation for me in those cases is to use x, y, z, s, t, etc. At this point the readability really suffers. The alternative is to modularize more, breaking the functions apart, but this only helps so much... In a certain way I find `(pair.strip(' ').split(':') for pair in d.split('?'))` to be less readable than the first steps in the composition--with the generator I'm reading back and forth in order to find out what's happening whereas the composition + map outlines the steps in a tree-like structure.

From ron3200 at gmail.com  Sun May 10 16:45:49 2015
From: ron3200 at gmail.com (Ron Adam)
Date: Sun, 10 May 2015 10:45:49 -0400
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <A278F1E4-67FF-44AE-B5B0-EDCE53072E03@yahoo.com>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>
 <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com> <mil9lv$hi2$1@ger.gmane.org>
 <91A6985C-A94B-4132-99B1-0305933950B5@yahoo.com> <mimi3h$mlj$1@ger.gmane.org>
 <A278F1E4-67FF-44AE-B5B0-EDCE53072E03@yahoo.com>
Message-ID: <minquv$jgm$1@ger.gmane.org>



On 05/10/2015 01:24 AM, Andrew Barnert via Python-ideas wrote:
> On May 9, 2015, at 20:08, Ron Adam <ron3200 at gmail.com> wrote:
>>
>>> On 05/09/2015 06:45 PM, Andrew Barnert via Python-ideas wrote:
>>>> On May 9, 2015, at 08:38, Ron Adam<ron3200 at gmail.com>  wrote:

>>> But, more importantly, this doesn't work. Your square(xs) isn't going
>>> to evaluate to a function, but to whatever calling square on xs returns.
>>> (Which is presumably a TypeError, or you wouldn't be looking to map in the
>>> first place). And, even if that did work, you're not actually composing a
>>> function here anyway; your @ is just a call operator, which we already have
>>> in Python, spelled with parens.
>>
>> This is following the patterns being discussed in the thread.  (or at least an attempt to do so.)
>>
>> The @ and $ above would bind more tightly than the ().  Like the doc "." does for method calls.
>
> @ can't bind more tightly than (). The operator already exists (that's
> the whole reason people are suggesting it for compose), and it has the same
> precedence as *.

Yes, and so it may need different symbols to work, but there are not many 
easy-to-type, readable symbols left.  So some doubled symbols of some sort 
may work.

Picking what those should be is a topic all its own, and it's not even an 
issue until the concept works.

I should not even have given examples earlier.  The point I was trying to 
make was that an operator indicating the next argument is not complete may 
be useful.  And I think the initial (or another) example implementation does 
do that, but uses a tuple to package the function with the partial 
arguments instead.

Cheers,
    Ron


From koos.zevenhoven at aalto.fi  Sun May 10 17:15:58 2015
From: koos.zevenhoven at aalto.fi (Koos Zevenhoven)
Date: Sun, 10 May 2015 18:15:58 +0300
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <26588_1431226604_554EC8EC_26588_715_1_20150510025630.GC5663@ando.pearwood.info>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>
 <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com> <mil9lv$hi2$1@ger.gmane.org>
 <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info>
 <554E5CC9.3010406@aalto.fi>
 <26588_1431226604_554EC8EC_26588_715_1_20150510025630.GC5663@ando.pearwood.info>
Message-ID: <554F762E.3030009@aalto.fi>

On 10.5.2015 5:56, Steven D'Aprano wrote:

[...]
>> You could in addition have:
>>
>> spam @ eggs @ cheese @ arg   #  equivalent to spam(eggs(cheese(arg)))
>>
>> arg | spam | eggs | cheese    # equivalent to cheese(eggs(spam(arg)))
>>
>> Here, arg would thus be recognized as not a function.
> No. I think it is absolutely vital to distinguish by syntax the
> difference between composition and function application, and not try to
> "do what I mean". DWIM software has a bad history of doing the wrong
> thing.
>
> Every other kind of callable uses obj(arg) to call it: types, functions,
> methods, partial objects, etc. We shouldn't make function composition
try to be different. If I write sqrt@100 I should get a runtime error,
> not 10.
>
> I don't mind if the error is delayed until I actually try to call the
> composed object, but at some point I should get a TypeError that 100 is
> not callable.
>

That is in fact a part of why I added a function call () to the sketch 
in my recent post (extended partial operator, there using ->). This way, 
the composition operator would never do the actual call by itself, but 
instead make a partial. But I admit that (sqrt@100)() still would give 
10, not the runtime error you want (which may indeed cause problems with 
callable arguments). It only solves half the problem.

Another way to feed the left-to-right | composition from the left would 
of course be

(feed(x) | spam | eggs | cheese)()   # feed would be just  def feed(x): 
return x

But I'm not sure I like it. Luckily, (cheese @ eggs @ spam)(x) does not 
have this problem. However, if cheese, eggs and spam were matrix 
transformations, one would write

cheese @ eggs @ spam @ x

But perhaps numpy would want to bridge this gap with extended behavior 
(allow calling numpy functions with @ or "calling a matrix 
transformation" with () ). Or perhaps not :).

-- Koos



From koos.zevenhoven at aalto.fi  Sun May 10 17:30:50 2015
From: koos.zevenhoven at aalto.fi (Koos Zevenhoven)
Date: Sun, 10 May 2015 18:30:50 +0300
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <26588_1431270974_554F763D_26588_7409_1_554F762E.3030009@aalto.fi>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>
 <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com> <mil9lv$hi2$1@ger.gmane.org>
 <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info>
 <554E5CC9.3010406@aalto.fi>
 <26588_1431226604_554EC8EC_26588_715_1_20150510025630.GC5663@ando.pearwood.info>
 <26588_1431270974_554F763D_26588_7409_1_554F762E.3030009@aalto.fi>
Message-ID: <554F79AA.60100@aalto.fi>

Just a small correction to my below email (the definition of "feed"):

On 10.5.2015 18:15, Koos Zevenhoven wrote:
> On 10.5.2015 5:56, Steven D'Aprano wrote:
>
> [...]
>>> You could in addition have:
>>>
>>> spam @ eggs @ cheese @ arg   #  equivalent to spam(eggs(cheese(arg)))
>>>
>>> arg | spam | eggs | cheese    # equivalent to cheese(eggs(spam(arg)))
>>>
>>> Here, arg would thus be recognized as not a function.
>> No. I think it is absolutely vital to distinguish by syntax the
>> difference between composition and function application, and not try to
>> "do what I mean". DWIM software has a bad history of doing the wrong
>> thing.
>>
>> Every other kind of callable uses obj(arg) to call it: types, functions,
>> methods, partial objects, etc. We shouldn't make function composition
>> try to be different. If I write sqrt@100 I should get a runtime error,
>> not 10.
>>
>> I don't mind if the error is delayed until I actually try to call the
>> composed object, but at some point I should get a TypeError that 100 is
>> not callable.
>>
>
> That is in fact a part of why I added a function call () to the sketch 
> in my recent post (extended partial operator, there using ->). This 
> way, the composition operator would never do the actual call by 
> itself, but instead make a partial. But I admit that (sqrt@100)() 
> still would give 10, not the runtime error you want (which may indeed 
> cause problems with callable arguments). It only solves half the problem.
>
> Another way to feed the left-to-right | composition from the left 
> would of course be
>
> (feed(x) | spam | eggs | cheese)()   # feed would be just  def 
> feed(x): return x
>

Sorry, I messed that up. "feed" would of course be:

  def feed(x):
      def feeder():
          return x
      return feeder


> But I'm not sure I like it. Luckily, (cheese @ eggs @ spam)(x) does 
> not have this problem. However, if cheese, eggs and spam were matrix 
> transformations, one would write
>
> cheese @ eggs @ spam @ x
>
> But perhaps numpy would want to bridge this gap with extended behavior 
> (allow calling numpy functions with @ or "calling a matrix 
> transformation" with () ). Or perhaps not :).
>
> -- Koos
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/


From stephen at xemacs.org  Sun May 10 19:52:37 2015
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Mon, 11 May 2015 02:52:37 +0900
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>
 <554C5FC0.1070106@aalto.fi>
 <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com>
 <mil9lv$hi2$1@ger.gmane.org>
 <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info>
 <554E5CC9.3010406@aalto.fi>
 <CAAZsQLDiVJ_d6Pp1N9iiDpXL=z3cSWx-eBdWzOK4r9tDJ2zRCg@mail.gmail.com>
 <CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com>
Message-ID: <87k2wg2uca.fsf@uwakimon.sk.tsukuba.ac.jp>

Gregory Salvan writes:

 > Nobody convinced by arrow operator ?
 > 
 > like: arg -> spam -> eggs -> cheese
 > or cheese <- eggs <- spam <- arg

Yuck.  There are living languages (R) that use an arrow as an
assignment operator, and others (or perhaps you consider C a zombie
language <wink/>) that use one as a member operator.  I would prefer
the C++ pipe operators, ie, << and >>.

But that's just bikeshedding a moot point; I doubt most people would
be favorable to introducing more operator symbols for this purpose,
and I personally would be opposed.  If functools was more popular and
its users were screaming for operators the way the numerical folk
screamed for a matrix multiplication operator, I'd be more sympathetic.
But they're not screaming that I can hear.

To give an idea of how difficult it is to get an operator added, it
took at least a decade to get the matrix multiplication operator added
after it was first proposed, and two of the key steps were first the
introduction of unary "@" for decorator application (another case that
screamed for a new operator), and then the proponents dropping the
"@@" operator from their proposal.

From larocca at abiresearch.com  Sun May 10 20:40:25 2015
From: larocca at abiresearch.com (Douglas La Rocca)
Date: Sun, 10 May 2015 18:40:25 +0000
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <87k2wg2uca.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>
 <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com> <mil9lv$hi2$1@ger.gmane.org>
 <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info>
 <554E5CC9.3010406@aalto.fi>
 <CAAZsQLDiVJ_d6Pp1N9iiDpXL=z3cSWx-eBdWzOK4r9tDJ2zRCg@mail.gmail.com>
 <CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com>,
 <87k2wg2uca.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <87F614BB-795F-4B77-9D87-B544E75786AF@abiresearch.com>

I agree an operator is really unnecessary, especially because parens would be needed anyway for subexpressions. A LISP-like syntax would work better than something trying to imitate Haskell.

    make_breakfast = (make_spam make_eggs(2, 'overeasy') make_cheese)
    make_breakfast()

It also has the sense of collecting and reifying a series of functions rather than declaring a tuple or list or some other data structure. Why bother with the left/right associative issues of infix operators?

Suppose you wanted a simple quick composition of a few functions where the expression would be called right away. With infix it might look like 

    (list @ sorted @ ','.join)('a string of chars to be sorted, then joined on commas')

But you already have the parens so why not just

    (list sorted ','.join)('a string ...')

On May 10, 2015, at 1:53 PM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> 
> Gregory Salvan writes:
> 
>> Nobody convinced by arrow operator ?
>> 
>> like: arg -> spam -> eggs -> cheese
>> or cheese <- eggs <- spam <- arg
> 
> Yuck.  There are living languages (R) that use an arrow as an
> assignment operator, and others (or perhaps you consider C a zombie
> language <wink/>) that use one as a member operator.  I would prefer
> the C++ pipe operators, ie, << and >>.
> 
> But that's just bikeshedding a moot point; I doubt most people would
> be favorable to introducing more operator symbols for this purpose,
> and I personally would be opposed.  If functools was more popular and
> its users were screaming for operators the way the numerical folk
> screamed for a matrix multiplication operator, I'd be more sympathetic.
> But they're not screaming that I can hear.
> 
> To give an idea of how difficult it is to get an operator added, it
> took at least a decade to get the matrix multiplication operator added
> after it was first proposed, and two of the key steps were first the
> introduction of unary "@" for decorator application (another case that
> screamed for a new operator), and then the proponents dropping the
> "@@" operator from their proposal.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From ron3200 at gmail.com  Sun May 10 21:55:03 2015
From: ron3200 at gmail.com (Ron Adam)
Date: Sun, 10 May 2015 15:55:03 -0400
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <20150510095729.GK5663@ando.pearwood.info>
References: <16174_1431214114_554E9821_16174_32_1_CAOMjWkmVXoMr07tOUKL7iEavN8b3sVer7nPPMrRH55RTtN60dw@mail.gmail.com>
 <554EAB9A.2090501@aalto.fi> <20150510032016.GF5663@ando.pearwood.info>
 <84d2cc44a7004fbba33391ae41c19e67@swordfish.abiresearch.com>
 <CAPTjJmp-936NZ5SZeSq2PgtLc2KLTC9Afrp8X6jiYGLeN96UvQ@mail.gmail.com>
 <20150510095729.GK5663@ando.pearwood.info>
Message-ID: <miod2o$cu3$1@ger.gmane.org>



On 05/10/2015 05:57 AM, Steven D'Aprano wrote:
> There's lots of code that assumes that a tuple of functions is a
> sequence:
>
> for f in (len, str, ord, chr, repr):
>      test(f)
>
> so we would need to keep that. But we don't want a composed function to
> be a sequence, any more than we want a partial or a regular function to
> be sequences. If I pass you a Composed object, and you try slicing it,
> that should be an error.


It seems to me a linked list of composed objects works (rather than a 
sequence).  It's easier to understand what is going on in it.


from functools import partial
from operator import *
from statistics import mean

def root(x):
     return x ** .5

def square(x):
     return x ** 2

class CF:
     def __init__(self, f, *rest):
         if isinstance(f, tuple):
             self.f = partial(*f)
         else:
             self.f = f
         if rest:
             self.child = CF(*rest)
         else:
             self.child = None
     def __call__(self, data):
         if self.child is None:
             return self.f(data)
         return self.f(self.child(data))
     def __repr__(self):
         if self.child is not None:
             s = repr(self.child)
         else:
             s = "CS()"
         return s[:3] + ("%s, " % repr(self.f)) + s[3:]


CF(print, root, mean, (map, square)) ([4, 9, 16])


Prints:
10.847426730181986



From koos.zevenhoven at aalto.fi  Sun May 10 22:06:21 2015
From: koos.zevenhoven at aalto.fi (Koos Zevenhoven)
Date: Sun, 10 May 2015 23:06:21 +0300
Subject: [Python-ideas] Partial operator (and 'third-party methods' and
 'piping') [was Re: Function composition (was no subject)]
In-Reply-To: <14232_1431212854_554E9336_14232_216_1_554E9327.9030706@aalto.fi>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>
 <554C5FC0.1070106@aalto.fi> <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com> <mil9lv$hi2$1@ger.gmane.org>
 <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info>
 <554E5CC9.3010406@aalto.fi>
 <CAAZsQLDiVJ_d6Pp1N9iiDpXL=z3cSWx-eBdWzOK4r9tDJ2zRCg@mail.gmail.com>
 <10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com>
 <14232_1431212854_554E9336_14232_216_1_554E9327.9030706@aalto.fi>
Message-ID: <554FBA3D.30907@aalto.fi>

Reading the recent emails in the function composition thread started by 
Ivan, I realized that my below sketch for a composition operator would 
be better if it did not actually do function composition ;). Instead, -> 
would be quite powerful as 'just' a partial operator -- perhaps even 
more powerful, as I demonstrate below. However, this is not an argument 
against @ composition, which might in fact play together with this quite 
nicely.

This allows some nice things with multi-argument functions too.

I realize that it may be unlikely that a new operator would be added, 
but here it is anyway, as food for thought.  (With an existing operator, 
I suspect it would be even less likely, because of precedence rules : )

So, -> would be an operator with a precedence similar to .attribute 
access (but lower than .attribute):

  # The simple definition of what it does:
  arg->func   # equivalent to functools.partial(func, arg)

This would allow for instance:
  arg -> spam() -> cheese(kind = 'gouda') -> eggs()

which would be equivalent to eggs(cheese(spam(arg), kind = 'gouda'))

Or even together with the proposed @ composition:
  rms = root @ mean @ square->map     # for an iterable non-numpy argument

And here's something I find quite interesting. Together with 
@singledispatch from 3.4 (or possibly an enhanced version using type 
annotations in the future?), one could add 'third-party methods' to 
classes in other libraries without monkey patching. A dummy example:

from numpy import array
my_list = [1,2,3]
my_array = array(my_list)
my_mean = my_array.mean()  # This currently works in numpy

from rmslib import rms
my_rms = my_array->rms()  # efficient rms for numpy arrays
my_other_rms = my_list->rms()  # rms that works on any iterable

One would be able to distinguish between calls to methods and 
'third-party methods' based on whether . or -> is used for accessing 
them, which I think is a good thing. Also, third-party methods would be 
less likely to mutate the object, just like func(obj) is less likely to 
mutate obj than obj.method().
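
A minimal version of this idea already works today with
functools.singledispatch, minus the -> syntax (a sketch; the numpy
registration is left as a comment since numpy may not be available):

```python
import math
from functools import singledispatch

@singledispatch
def rms(values):
    # Generic fallback: works on any iterable of numbers.
    vals = list(values)
    return math.sqrt(sum(v * v for v in vals) / len(vals))

# With numpy installed, an efficient overload could be registered:
# @rms.register(numpy.ndarray)
# def _(arr):
#     return numpy.sqrt((arr ** 2).mean())

rms([3, 4])   # ~3.536
```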

See more examples below. I converted my examples from last night to this 
IMO better version, because at least some of them would still be relevant.

On 10.5.2015 2:07, Koos Zevenhoven wrote:
> On 10.5.2015 1:03, Gregory Salvan wrote:
>> Nobody convinced by arrow operator ?
>>
>> like: arg -> spam -> eggs -> cheese
>> or cheese <- eggs <- spam <- arg
>>
>>
>
> I like | a lot because of the pipe analogy. However, having a new 
> operator for this could solve some issues about operator precedence.
>
> Today, I sketched one possible version that would use a new .. 
> operator. I'll explain what it would do (but with your -> instead of 
> my ..)
>
> Here, the operator (.. or ->) would have a higher precedence than 
> function calls () but a lower precedence than attribute access (obj.attr).
>
> First, with single-argument functions spam, eggs and cheese, and a 
> non-function arg:
>
> arg->eggs->spam->cheese()   # equivalent to cheese(spam(eggs(arg)))

With -> as a partial operator, this would instead be:

arg->eggs()->spam()->cheese()     # equivalent to cheese(spam(eggs(arg)))

> eggs->spam->cheese  # equivalent to lambda arg: cheese(spam(eggs(arg)))
>

With -> as a partial operator this could be:

lambda arg: arg->eggs()->spam()->cheese()


> Then, if spam and eggs both took two arguments; eggs(arg1, arg2), 
> spam(arg1, arg2)
>
> arg->eggs   # equivalent to partial(eggs, arg)
> eggs->spam(a, b, c)   # equivalent to spam(eggs(a, b), c)

With -> as a partial operator, the first one would work, and the second 
would become:

eggs(a,b)->spam(c)     # equivalent to spam(eggs(a, b), c)

> arg->eggs->spam(b,c)    # equivalent to spam(eggs(arg, b), c)
>

This would become:

arg->eggs(b)->spam(c)     # equivalent to spam(eggs(arg, b), c)

Note that this would be quite flexible in partial 'piping' of 
multi-argument functions.

> So you could think of -> as an extended partial operator. And this 
> would naturally generalize to functions with even more arguments. The 
> arguments would always be fed in the same order as in the equivalent 
> function call, which makes for a nice rule of thumb. However, I 
> suppose one would usually avoid combinations that are difficult to 
> understand.
>
> Some examples that this would enable:
>
>  # Example 1
>  from numpy import square, mean, sqrt
>  rms = square->mean->sqrt  # I think this order is fine because it is 
> not @
>

This would become:

def rms(arr):
     return arr->square()->mean()->sqrt()

>  # Example 2 (both are equivalent)
>  spam(args)->eggs->cheese() # the shell-syntax analogy that Steven 
> mentioned.
>

This would be:

spam(args)->eggs()->cheese()

Of course, the shell piping analogy would be quite a stretch, because it 
looks so different.

>  # Example 3
>  # Last but not least, we would finally have this :)
>  some_sequence->len()
>  some_object->isinstance(MyType)
>

And:

  func->map(seq)
  func->reduce(seq)
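Since -> does not exist yet, the intended semantics can be sketched with 
today's functools (a hedged approximation; `arrow`, `eggs` and `spam` are 
hypothetical stand-ins):

```python
from functools import partial

def arrow(arg, func):
    # Sketch of the proposed `arg->func`: bind arg as the first argument.
    return partial(func, arg)

def eggs(a, b):
    return ("eggs", a, b)

def spam(a, b):
    return ("spam", a, b)

# arg->eggs(b)->spam(c)  would mean  spam(eggs(arg, b), c):
arg, b, c = 1, 2, 3
step1 = arrow(arg, eggs)(b)    # eggs(arg, b)
step2 = arrow(step1, spam)(c)  # spam(eggs(arg, b), c)
assert step2 == ("spam", ("eggs", 1, 2), 3)
```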

-- Koos






From apieum at gmail.com  Sun May 10 23:11:51 2015
From: apieum at gmail.com (Gregory Salvan)
Date: Sun, 10 May 2015 23:11:51 +0200
Subject: [Python-ideas] Function composition (was no subject)
In-Reply-To: <87k2wg2uca.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>
 <554C5FC0.1070106@aalto.fi>
 <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com>
 <mil9lv$hi2$1@ger.gmane.org>
 <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info>
 <554E5CC9.3010406@aalto.fi>
 <CAAZsQLDiVJ_d6Pp1N9iiDpXL=z3cSWx-eBdWzOK4r9tDJ2zRCg@mail.gmail.com>
 <CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com>
 <87k2wg2uca.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <CAAZsQLAU=39xF7UVUz8TFSCTYE5YFchkf4vbGPNFgbLSiTfGxQ@mail.gmail.com>

Stephen J. Turnbull, OK, I was wrong about community expectations; I 
thought functools would be more popular with new symbols.
Personally, I make wide use of the functional paradigm, but except when I 
need heavy use of partial and reduce, the simple fact of importing 
functools and using the "partial" function has a higher cost than doing it 
differently.
That's also because Python syntax is really convenient, and lambda, 
decorators, iterators... allow a lot of things.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150510/3d49d969/attachment.html>

From apieum at gmail.com  Sun May 10 23:23:40 2015
From: apieum at gmail.com (Gregory Salvan)
Date: Sun, 10 May 2015 23:23:40 +0200
Subject: [Python-ideas] Partial operator (and 'third-party methods' and
 'piping') [was Re: Function composition (was no subject)]
In-Reply-To: <554FBA3D.30907@aalto.fi>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>
 <554C5FC0.1070106@aalto.fi>
 <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com>
 <mil9lv$hi2$1@ger.gmane.org>
 <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info>
 <554E5CC9.3010406@aalto.fi>
 <CAAZsQLDiVJ_d6Pp1N9iiDpXL=z3cSWx-eBdWzOK4r9tDJ2zRCg@mail.gmail.com>
 <10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com>
 <14232_1431212854_554E9336_14232_216_1_554E9327.9030706@aalto.fi>
 <554FBA3D.30907@aalto.fi>
Message-ID: <CAAZsQLDhmmk5tCiUzJsUWyhLzNofujwLB6sxMu0H3QLVqnCSTg@mail.gmail.com>

In my opinion, this syntax causes problems when your arguments are
functions/callables.
And if you code in a functional paradigm, it is quite common to pass
functions as arguments; otherwise, how would you do polymorphism?

The only way I see to distinguish the cases is to use tuples, but the
syntax is quite strange.

Instead of: arg->eggs(b)->spam(c)
my_partial = (arg, b)->eggs->(c, )->spam

Then how would you call my_partial?
For example, if you have:
def eggs(a, b, c)...
def spam(d, e)...

my_partial(c, e) or my_partial(c)(e)?



2015-05-10 22:06 GMT+02:00 Koos Zevenhoven <koos.zevenhoven at aalto.fi>:

> Reading the recent emails in the function composition thread started by
> Ivan, I realized that my below sketch for a composition operator would be
> better if it did not actually do function composition ;). Instead, -> would
> be quite powerful as 'just' a partial operator -- perhaps even more
> powerful, as I demonstrate below. However, this is not an argument against
> @ composition, which might in fact play together with this quite nicely.
>
> This allows some nice things with multi-argument functions too.
>
> I realize that it may be unlikely that a new operator would be added, but
> here it is anyway, as food for thought.  (With an existing operator, I
> suspect it would be even less likely, because of precedence rules : )
>
> So, -> would be an operator with a precedence similar to .attribute access
> (but lower than .attribute):
>
>  # The simple definition of what it does:
>  arg->func   # equivalent to functools.partial(func, arg)
>
> This would allow for instance:
>  arg -> spam() -> cheese(kind = 'gouda') -> eggs()
>
> which would be equivalent to eggs(cheese(spam(arg), kind = 'gouda'))
>
> Or even together together with the proposed @ composition:
>  rms = root @ mean @ square->map     # for an iterable non-numpy argument
>
> And here's something I find quite interesting. Together with
> @singledispatch from 3.4 (or possibly an enhanced version using type
> annotations in the future?), one could add 'third-party methods' to classes
> in other libraries without monkey patching. A dummy example:
>
> from numpy import array
> my_list = [1,2,3]
> my_array = array(my_list)
> my_mean = my_array.mean()  # This currently works in numpy
>
> from rmslib import rms
> my_rms = my_array->rms()  # efficient rms for numpy arrays
> my_other_rms = my_list->rms()  # rms that works on any iterable
>
> One would be able to distinguish between calls to methods and 'third-party
> methods' based on whether . or -> is used for accessing them, which I think
> is a good thing. Also, third-party methods would be less likely to mutate
> the object, just like func(obj) is less likely to mutate obj than
> obj.method().
>
> See more examples below. I converted my examples from last night to this
> IMO better version, because at least some of them would still be relevant.
>
> On 10.5.2015 2:07, Koos Zevenhoven wrote:
>
>> On 10.5.2015 1:03, Gregory Salvan wrote:
>>
>>> Nobody convinced by arrow operator ?
>>>
>>> like: arg -> spam -> eggs -> cheese
>>> or cheese <- eggs <- spam <- arg
>>>
>>>
>>>
>> I like | a lot because of the pipe analogy. However, having a new
>> operator for this could solve some issues about operator precedence.
>>
>> Today, I sketched one possible version that would use a new .. operator.
>> I'll explain what it would do (but with your -> instead of my ..)
>>
>> Here, the operator (.. or ->) would have a higher precedence than
>> function calls () but a lower precedence than attribute access (obj.attr).
>>
>> First, with single-argument functions spam, eggs and cheese, and a
>> non-function arg:
>>
>> arg->eggs->spam->cheese()   # equivalent to cheese(spam(eggs(arg)))
>>
>
> With -> as a partial operator, this would instead be:
>
> arg->eggs()->spam()->cheese()     # equivalent to cheese(spam(eggs(arg)))
>
>  eggs->spam->cheese  # equivalent to lambda arg: cheese(spam(eggs(arg)))
>>
>>
> With -> as a partial operator this could be:
>
> lambda arg: arg->eggs()->spam()->cheese()
>
>
>  Then, if spam and eggs both took two arguments; eggs(arg1, arg2),
>> spam(arg1, arg2)
>>
>> arg->eggs   # equivalent to partial(eggs, arg)
>> eggs->spam(a, b, c)   # equivalent to spam(eggs(a, b), c)
>>
>
> With -> as a partial operator, the first one would work, and the second
> would become:
>
> eggs(a,b)->spam(c)     # equivalent to spam(eggs(a, b), c)
>
>  arg->eggs->spam(b,c)    # equivalent to spam(eggs(arg, b), c)
>>
>>
> This would become:
>
> arg->eggs(b)->spam(c)     # equivalent to spam(eggs(arg, b), c)
>
> Note that this would be quite flexible in partial 'piping' of
> multi-argument functions.
>
>  So you could think of -> as an extended partial operator. And this would
>> naturally generalize to functions with even more arguments. The arguments
>> would always be fed in the same order as in the equivalent function call,
>> which makes for a nice rule of thumb. However, I suppose one would usually
>> avoid combinations that are difficult to understand.
>>
>> Some examples that this would enable:
>>
>>  # Example 1
>>  from numpy import square, mean, sqrt
>>  rms = square->mean->sqrt  # I think this order is fine because it is not
>> @
>>
>>
> This would become:
>
> def rms(arr):
>     return arr->square()->mean()->sqrt()
>
>   # Example 2 (both are equivalent)
>>  spam(args)->eggs->cheese() # the shell-syntax analogy that Steven
>> mentioned.
>>
>>
> This would be:
>
> spam(args)->eggs()->cheese()
>
> Of course the shell piping analogy would be quite far, because it looks so
> different.
>
>   # Example 3
>>  # Last but not least, we would finally have this :)
>>  some_sequence->len()
>>  some_object->isinstance(MyType)
>>
>>
> And:
>
>  func->map(seq)
>  func->reduce(seq)
>
> -- Koos
>
>
>
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150510/2da3664a/attachment.html>

From koos.zevenhoven at aalto.fi  Sun May 10 23:41:59 2015
From: koos.zevenhoven at aalto.fi (Koos Zevenhoven)
Date: Mon, 11 May 2015 00:41:59 +0300
Subject: [Python-ideas] Partial operator (and 'third-party methods' and
 'piping') [was Re: Function composition (was no subject)]
In-Reply-To: <CAAZsQLDhmmk5tCiUzJsUWyhLzNofujwLB6sxMu0H3QLVqnCSTg@mail.gmail.com>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>	<17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>	<554C5FC0.1070106@aalto.fi>	<874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>	<EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com>	<mil9lv$hi2$1@ger.gmane.org>	<27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info>	<554E5CC9.3010406@aalto.fi>	<CAAZsQLDiVJ_d6Pp1N9iiDpXL=z3cSWx-eBdWzOK4r9tDJ2zRCg@mail.gmail.com>	<10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com>	<14232_1431212854_554E9336_14232_216_1_554E9327.9030706@aalto.fi>	<554FBA3D.30907@aalto.fi>
 <CAAZsQLDhmmk5tCiUzJsUWyhLzNofujwLB6sxMu0H3QLVqnCSTg@mail.gmail.com>
Message-ID: <554FD0A7.8010606@aalto.fi>

Hi Gregory,

Did you look at the new version carefully? If I understand the problem 
you are describing (mentioned also by Steven), my previous version had 
that issue, but the new one does not. That is why I added examples with 
callable arguments :).

-- Koos


On 11.5.2015 0:23, Gregory Salvan wrote:
> In my opinion, this syntax make problems when your arguments are 
> functions/callables.
> And if you code in a functionnal paradigm it is quite common to inject 
> functions in arguments otherwise how would you do polymorphism ?
>
> The only way I see to distinguish cases is to have tuples, but syntax 
> is quite strange.
>
> instead of : arg->eggs(b)->spam(c)
> my_partial = (arg, b)->eggs->(c, )->spam
>
> Then how would you call my_partial ?
> For example, if you have:
> def eggs(a, b, c)...
> def spam(d, e)...
>
> my_partial(c, e) or my_partial(c)(e) ?
>
>
>
> 2015-05-10 22:06 GMT+02:00 Koos Zevenhoven <koos.zevenhoven at aalto.fi 
> <mailto:koos.zevenhoven at aalto.fi>>:
>
>     Reading the recent emails in the function composition thread
>     started by Ivan, I realized that my below sketch for a composition
>     operator would be better if it did not actually do function
>     composition ;). Instead, -> would be quite powerful as 'just' a
>     partial operator -- perhaps even more powerful, as I demonstrate
>     below. However, this is not an argument against @ composition,
>     which might in fact play together with this quite nicely.
>
>     This allows some nice things with multi-argument functions too.
>
>     I realize that it may be unlikely that a new operator would be
>     added, but here it is anyway, as food for thought.  (With an
>     existing operator, I suspect it would be even less likely, because
>     of precedence rules : )
>
>     So, -> would be an operator with a precedence similar to
>     .attribute access (but lower than .attribute):
>
>      # The simple definition of what it does:
>      arg->func   # equivalent to functools.partial(func, arg)
>
>     This would allow for instance:
>      arg -> spam() -> cheese(kind = 'gouda') -> eggs()
>
>     which would be equivalent to eggs(cheese(spam(arg), kind = 'gouda'))
>
>     Or even together together with the proposed @ composition:
>      rms = root @ mean @ square->map     # for an iterable non-numpy
>     argument
>
>     And here's something I find quite interesting. Together with
>     @singledispatch from 3.4 (or possibly an enhanced version using
>     type annotations in the future?), one could add 'third-party
>     methods' to classes in other libraries without monkey patching. A
>     dummy example:
>
>     from numpy import array
>     my_list = [1,2,3]
>     my_array = array(my_list)
>     my_mean = my_array.mean()  # This currently works in numpy
>
>     from rmslib import rms
>     my_rms = my_array->rms()  # efficient rms for numpy arrays
>     my_other_rms = my_list->rms()  # rms that works on any iterable
>
>     One would be able to distinguish between calls to methods and
>     'third-party methods' based on whether . or -> is used for
>     accessing them, which I think is a good thing. Also, third-party
>     methods would be less likely to mutate the object, just like
>     func(obj) is less likely to mutate obj than obj.method().
>
>     See more examples below. I converted my examples from last night
>     to this IMO better version, because at least some of them would
>     still be relevant.
>
>     On 10.5.2015 2:07, Koos Zevenhoven wrote:
>
>         On 10.5.2015 1:03, Gregory Salvan wrote:
>
>             Nobody convinced by arrow operator ?
>
>             like: arg -> spam -> eggs -> cheese
>             or cheese <- eggs <- spam <- arg
>
>
>
>         I like | a lot because of the pipe analogy. However, having a
>         new operator for this could solve some issues about operator
>         precedence.
>
>         Today, I sketched one possible version that would use a new ..
>         operator. I'll explain what it would do (but with your ->
>         instead of my ..)
>
>         Here, the operator (.. or ->) would have a higher precedence
>         than function calls () but a lower precedence than attribute
>         access (obj.attr).
>
>         First, with single-argument functions spam, eggs and cheese,
>         and a non-function arg:
>
>         arg->eggs->spam->cheese()   # equivalent to
>         cheese(spam(eggs(arg)))
>
>
>     With -> as a partial operator, this would instead be:
>
>     arg->eggs()->spam()->cheese()     # equivalent to
>     cheese(spam(eggs(arg)))
>
>         eggs->spam->cheese  # equivalent to lambda arg:
>         cheese(spam(eggs(arg)))
>
>
>     With -> as a partial operator this could be:
>
>     lambda arg: arg->eggs()->spam()->cheese()
>
>
>         Then, if spam and eggs both took two arguments; eggs(arg1,
>         arg2), spam(arg1, arg2)
>
>         arg->eggs   # equivalent to partial(eggs, arg)
>         eggs->spam(a, b, c)   # equivalent to spam(eggs(a, b), c)
>
>
>     With -> as a partial operator, the first one would work, and the
>     second would become:
>
>     eggs(a,b)->spam(c)     # equivalent to spam(eggs(a, b), c)
>
>         arg->eggs->spam(b,c)    # equivalent to spam(eggs(arg, b), c)
>
>
>     This would become:
>
>     arg->eggs(b)->spam(c)     # equivalent to spam(eggs(arg, b), c)
>
>     Note that this would be quite flexible in partial 'piping' of
>     multi-argument functions.
>
>         So you could think of -> as an extended partial operator. And
>         this would naturally generalize to functions with even more
>         arguments. The arguments would always be fed in the same order
>         as in the equivalent function call, which makes for a nice
>         rule of thumb. However, I suppose one would usually avoid
>         combinations that are difficult to understand.
>
>         Some examples that this would enable:
>
>          # Example 1
>          from numpy import square, mean, sqrt
>          rms = square->mean->sqrt  # I think this order is fine
>         because it is not @
>
>
>     This would become:
>
>     def rms(arr):
>         return arr->square()->mean()->sqrt()
>
>          # Example 2 (both are equivalent)
>          spam(args)->eggs->cheese() # the shell-syntax analogy that
>         Steven mentioned.
>
>
>     This would be:
>
>     spam(args)->eggs()->cheese()
>
>     Of course the shell piping analogy would be quite far, because it
>     looks so different.
>
>          # Example 3
>          # Last but not least, we would finally have this :)
>          some_sequence->len()
>          some_object->isinstance(MyType)
>
>
>     And:
>
>      func->map(seq)
>      func->reduce(seq)
>
>     -- Koos
>
>
>
>
>
>     _______________________________________________
>     Python-ideas mailing list
>     Python-ideas at python.org <mailto:Python-ideas at python.org>
>     https://mail.python.org/mailman/listinfo/python-ideas
>     Code of Conduct: http://python.org/psf/codeofconduct/
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150511/c22f2ac5/attachment-0001.html>

From koos.zevenhoven at aalto.fi  Mon May 11 00:42:27 2015
From: koos.zevenhoven at aalto.fi (Koos Zevenhoven)
Date: Mon, 11 May 2015 01:42:27 +0300
Subject: [Python-ideas] Partial operator (and 'third-party methods' and
	'piping')
Message-ID: <554FDED3.8030200@aalto.fi>

Hi everyone!

(Sorry about double posting, but I wanted to start a new thread, which I 
tried but apparently failed to do last time. Although inspired by and 
related to the function composition discussion, this is now something 
different and should not be cluttering the composition thread.)

Reading the recent emails in the function composition thread started by 
Ivan, I realized that my sketch for a composition operator (from 
yesterday, quoted below) would be much better if it did not actually do 
function composition . Instead, -> would be quite powerful as 'just' a 
partial operator -- perhaps even more powerful, as I demonstrate below. 
However, this is not an argument against @ composition, which might in 
fact play together with this quite nicely.

This allows some nice things with multi-argument functions too.

I realize that it may be unlikely that a new operator would be added, 
but here it is anyway, as food for thought.  (With an existing operator, 
I suspect it would be even less likely, because of precedence rules : )

So, -> would be an operator with a precedence similar to .attribute 
access (but lower than .attribute):

  # The simple definition of what it does:
  arg->func   # equivalent to functools.partial(func, arg)

This would allow for instance:
  arg -> spam() -> cheese(kind = 'gouda') -> eggs()

which would be equivalent to eggs(cheese(spam(arg), kind = 'gouda'))

Or even together with the proposed @ composition:
  rms = root @ mean @ square->map     # for an iterable non-numpy argument

And here's something I find quite interesting. Together with 
@singledispatch from 3.4 (or possibly an enhanced version using type 
annotations in the future?), one could add 'third-party methods' to 
classes in other libraries without monkey patching. A dummy example:

from numpy import array
my_list = [1,2,3]
my_array = array(my_list)
my_mean = my_array.mean()  # This currently works in numpy

from rmslib import rms
my_rms = my_array->rms()  # efficient rms for numpy arrays
my_other_rms = my_list->rms()  # rms that works on any iterable
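Setting the -> syntax aside, the dispatch half of this idea already works 
in plain Python 3.4+ (a sketch; rmslib is hypothetical, so a tuple 
registration stands in for a numpy-specific fast path):

```python
from functools import singledispatch
import math

@singledispatch
def rms(values):
    # Generic fallback: works on any iterable of numbers.
    data = list(values)
    return math.sqrt(sum(x * x for x in data) / len(data))

@rms.register(tuple)  # stand-in for e.g. a numpy.ndarray fast path
def _(values):
    return math.sqrt(sum(x * x for x in values) / len(values))

assert rms([3, 4]) == rms((3, 4)) == math.sqrt(12.5)
```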

One would be able to distinguish between calls to methods and 
'third-party methods' based on whether . or -> is used for accessing 
them, which I think is a good thing. Also, third-party methods would be 
less likely to mutate the object, just like func(obj) is less likely to 
mutate obj than obj.method().

See more examples below. I converted my examples from last night to this 
IMO better version, because at least some of them would still be relevant.

On 10.5.2015 2:07, Koos Zevenhoven wrote:
> On 10.5.2015 1:03, Gregory Salvan wrote:
>> Nobody convinced by arrow operator ?
>>
>> like: arg -> spam -> eggs -> cheese
>> or cheese <- eggs <- spam <- arg
>>
>>
>
> I like | a lot because of the pipe analogy. However, having a new 
> operator for this could solve some issues about operator precedence.
>
> Today, I sketched one possible version that would use a new .. 
> operator. I'll explain what it would do (but with your -> instead of 
> my ..)
>
> Here, the operator (.. or ->) would have a higher precedence than 
> function calls () but a lower precedence than attribute access 
> (obj.attr).
>
> First, with single-argument functions spam, eggs and cheese, and a 
> non-function arg:
>
> arg->eggs->spam->cheese()   # equivalent to cheese(spam(eggs(arg)))

With -> as a partial operator, this would instead be:

arg->eggs()->spam()->cheese()     # equivalent to cheese(spam(eggs(arg)))

> eggs->spam->cheese # equivalent to lambda arg: cheese(spam(eggs(arg)))
>

With -> as a partial operator this could be:

lambda arg: arg->eggs()->spam()->cheese()


> Then, if spam and eggs both took two arguments; eggs(arg1, arg2), 
> spam(arg1, arg2)
>
> arg->eggs   # equivalent to partial(eggs, arg)
> eggs->spam(a, b, c)   # equivalent to spam(eggs(a, b), c)

With -> as a partial operator, the first one would work, and the second 
would become:

eggs(a,b)->spam(c)     # equivalent to spam(eggs(a, b), c)

> arg->eggs->spam(b,c) # equivalent to spam(eggs(arg, b), c)
>

This would become:

arg->eggs(b)->spam(c)     # equivalent to spam(eggs(arg, b), c)

Note that this would be quite flexible in partial 'piping' of 
multi-argument functions.

> So you could think of -> as an extended partial operator. And this 
> would naturally generalize to functions with even more arguments. The 
> arguments would always be fed in the same order as in the equivalent 
> function call, which makes for a nice rule of thumb. However, I 
> suppose one would usually avoid combinations that are difficult to 
> understand.
>
> Some examples that this would enable:
>
>  # Example 1
>  from numpy import square, mean, sqrt
>  rms = square->mean->sqrt  # I think this order is fine because it is 
> not @
>

This would become:

def rms(arr):
     return arr->square()->mean()->sqrt()

>  # Example 2 (both are equivalent)
>  spam(args)->eggs->cheese() # the shell-syntax analogy that Steven 
> mentioned.
>

This would be:

spam(args)->eggs()->cheese()

Of course, the shell piping analogy would be quite a stretch, because it 
looks so different.

>  # Example 3
>  # Last but not least, we would finally have this
>  some_sequence->len()
>  some_object->isinstance(MyType)
>

And:

  func->map(seq)
  func->reduce(seq)

-- Koos

From apieum at gmail.com  Mon May 11 01:40:23 2015
From: apieum at gmail.com (Gregory Salvan)
Date: Mon, 11 May 2015 01:40:23 +0200
Subject: [Python-ideas] Partial operator (and 'third-party methods' and
 'piping') [was Re: Function composition (was no subject)]
In-Reply-To: <554FD0A7.8010606@aalto.fi>
References: <CAOMjWkknQ4RDM13pgKagmJ_WOO5s7uzi3bnd87pXfr6E-yofKg@mail.gmail.com>
 <17583_1431062421_554C4795_17583_208_1_CAJ+Teoe1EBBKShc86TkhpCA3HPyKTTfrmh+0mgzEP7_vE5Bi2A@mail.gmail.com>
 <554C5FC0.1070106@aalto.fi>
 <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com>
 <mil9lv$hi2$1@ger.gmane.org>
 <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info>
 <554E5CC9.3010406@aalto.fi>
 <CAAZsQLDiVJ_d6Pp1N9iiDpXL=z3cSWx-eBdWzOK4r9tDJ2zRCg@mail.gmail.com>
 <10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com>
 <14232_1431212854_554E9336_14232_216_1_554E9327.9030706@aalto.fi>
 <554FBA3D.30907@aalto.fi>
 <CAAZsQLDhmmk5tCiUzJsUWyhLzNofujwLB6sxMu0H3QLVqnCSTg@mail.gmail.com>
 <554FD0A7.8010606@aalto.fi>
Message-ID: <CAAZsQLAS7ioooe8Fehv-2Vbk+5GdjE==+3QoRZqo1=G8ZVfkvQ@mail.gmail.com>

Nope, sorry, I misread your code, but it changes nothing.

For example, with spam(args)->eggs()->cheese(), if instead you have:

args = something
spam = lambda: args

should spam()->eggs()->cheese() be treated as cheese(eggs(spam())), as
cheese(eggs(args)), or as partial(cheese) ∘ partial(eggs) ∘ partial(spam)?

I don't find this syntax convenient, sorry.
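For concreteness, the competing readings can be written out in plain
Python (a hedged sketch; the lambdas are hypothetical stand-ins):

```python
args = "something"
spam = lambda: args
cheese = lambda x: ("cheese", x)
eggs = lambda x: ("eggs", x)

# Value reading: evaluate left to right, feeding each result onward.
# On this reading, spam() runs first, so the first two interpretations
# coincide:
value = cheese(eggs(spam()))
assert value == cheese(eggs(args)) == ("cheese", ("eggs", "something"))

# Composition reading: build a pipeline without calling anything yet,
# and invoke it later.
pipeline = lambda: cheese(eggs(spam()))
assert pipeline() == value
```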




2015-05-10 23:41 GMT+02:00 Koos Zevenhoven <koos.zevenhoven at aalto.fi>:

>  Hi Gregory,
>
> Did you look at the new version carefully? If I understand the problem you
> are describing (mentioned also by Steven), my previous version had that
> issue, but the new one does not. That is why I added examples with callable
> arguments :).
>
> -- Koos
>
>
>
> On 11.5.2015 0:23, Gregory Salvan wrote:
>
>     In my opinion, this syntax make problems when your arguments are
> functions/callables.
> And if you code in a functionnal paradigm it is quite common to inject
> functions in arguments otherwise how would you do polymorphism ?
>
>  The only way I see to distinguish cases is to have tuples, but syntax is
> quite strange.
>
> instead of : arg->eggs(b)->spam(c)
>  my_partial = (arg, b)->eggs->(c, )->spam
>
>  Then how would you call my_partial ?
>  For example, if you have:
>  def eggs(a, b, c)...
>  def spam(d, e)...
>
>  my_partial(c, e) or my_partial(c)(e) ?
>
>
>
> 2015-05-10 22:06 GMT+02:00 Koos Zevenhoven <koos.zevenhoven at aalto.fi>:
>
>> Reading the recent emails in the function composition thread started by
>> Ivan, I realized that my below sketch for a composition operator would be
>> better if it did not actually do function composition ;). Instead, -> would
>> be quite powerful as 'just' a partial operator -- perhaps even more
>> powerful, as I demonstrate below. However, this is not an argument against
>> @ composition, which might in fact play together with this quite nicely.
>>
>> This allows some nice things with multi-argument functions too.
>>
>> I realize that it may be unlikely that a new operator would be added, but
>> here it is anyway, as food for thought.  (With an existing operator, I
>> suspect it would be even less likely, because of precedence rules : )
>>
>> So, -> would be an operator with a precedence similar to .attribute
>> access (but lower than .attribute):
>>
>>  # The simple definition of what it does:
>>  arg->func   # equivalent to functools.partial(func, arg)
>>
>> This would allow for instance:
>>  arg -> spam() -> cheese(kind = 'gouda') -> eggs()
>>
>> which would be equivalent to eggs(cheese(spam(arg), kind = 'gouda'))
>>
>> Or even together together with the proposed @ composition:
>>  rms = root @ mean @ square->map     # for an iterable non-numpy argument
>>
>> And here's something I find quite interesting. Together with
>> @singledispatch from 3.4 (or possibly an enhanced version using type
>> annotations in the future?), one could add 'third-party methods' to classes
>> in other libraries without monkey patching. A dummy example:
>>
>> from numpy import array
>> my_list = [1,2,3]
>> my_array = array(my_list)
>> my_mean = my_array.mean()  # This currently works in numpy
>>
>> from rmslib import rms
>> my_rms = my_array->rms()  # efficient rms for numpy arrays
>> my_other_rms = my_list->rms()  # rms that works on any iterable
>>
>> One would be able to distinguish between calls to methods and
>> 'third-party methods' based on whether . or -> is used for accessing them,
>> which I think is a good thing. Also, third-party methods would be less
>> likely to mutate the object, just like func(obj) is less likely to mutate
>> obj than obj.method().
>>
>> See more examples below. I converted my examples from last night to this
>> IMO better version, because at least some of them would still be relevant.
>>
>> On 10.5.2015 2:07, Koos Zevenhoven wrote:
>>
>>> On 10.5.2015 1:03, Gregory Salvan wrote:
>>>
>>>> Nobody convinced by arrow operator ?
>>>>
>>>> like: arg -> spam -> eggs -> cheese
>>>> or cheese <- eggs <- spam <- arg
>>>>
>>>>
>>>>
>>> I like | a lot because of the pipe analogy. However, having a new
>>> operator for this could solve some issues about operator precedence.
>>>
>>> Today, I sketched one possible version that would use a new .. operator.
>>> I'll explain what it would do (but with your -> instead of my ..)
>>>
>>> Here, the operator (.. or ->) would have a higher precedence than
>>> function calls () but a lower precedence than attribute access (obj.attr).
>>>
>>> First, with single-argument functions spam, eggs and cheese, and a
>>> non-function arg:
>>>
>>> arg->eggs->spam->cheese()   # equivalent to cheese(spam(eggs(arg)))
>>>
>>
>> With -> as a partial operator, this would instead be:
>>
>> arg->eggs()->spam()->cheese()     # equivalent to cheese(spam(eggs(arg)))
>>
>>  eggs->spam->cheese  # equivalent to lambda arg: cheese(spam(eggs(arg)))
>>>
>>>
>> With -> as a partial operator this could be:
>>
>> lambda arg: arg->eggs()->spam()->cheese()
>>
>>
>>  Then, if spam and eggs both took two arguments; eggs(arg1, arg2),
>>> spam(arg1, arg2)
>>>
>>> arg->eggs   # equivalent to partial(eggs, arg)
>>> eggs->spam(a, b, c)   # equivalent to spam(eggs(a, b), c)
>>>
>>
>> With -> as a partial operator, the first one would work, and the second
>> would become:
>>
>> eggs(a,b)->spam(c)     # equivalent to spam(eggs(a, b), c)
>>
>>  arg->eggs->spam(b,c)    # equivalent to spam(eggs(arg, b), c)
>>>
>>>
>> This would become:
>>
>> arg->eggs(b)->spam(c)     # equivalent to spam(eggs(arg, b), c)
>>
>> Note that this would be quite flexible in partial 'piping' of
>> multi-argument functions.
>>
>>  So you could think of -> as an extended partial operator. And this would
>>> naturally generalize to functions with even more arguments. The arguments
>>> would always be fed in the same order as in the equivalent function call,
>>> which makes for a nice rule of thumb. However, I suppose one would usually
>>> avoid combinations that are difficult to understand.
>>>
>>> Some examples that this would enable:
>>>
>>>  # Example 1
>>>  from numpy import square, mean, sqrt
>>>  rms = square->mean->sqrt  # I think this order is fine because it is
>>> not @
>>>
>>>
>> This would become:
>>
>> def rms(arr):
>>     return arr->square()->mean()->sqrt()
>>
>>   # Example 2 (both are equivalent)
>>>  spam(args)->eggs->cheese() # the shell-syntax analogy that Steven
>>> mentioned.
>>>
>>>
>> This would be:
>>
>> spam(args)->eggs()->cheese()
>>
>> Of course the shell piping analogy would be quite far, because it looks
>> so different.
>>
>>   # Example 3
>>>  # Last but not least, we would finally have this :)
>>>  some_sequence->len()
>>>  some_object->isinstance(MyType)
>>>
>>>
>> And:
>>
>>  func->map(seq)
>>  func->reduce(seq)
>>
>> -- Koos
>>
>>
>>
>>
>>
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>
>
>

From steve at pearwood.info  Mon May 11 03:44:12 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Mon, 11 May 2015 11:44:12 +1000
Subject: [Python-ideas] Partial operator (and 'third-party methods' and
	'piping') [was Re: Function composition (was no subject)]
In-Reply-To: <554FBA3D.30907@aalto.fi>
References: <554C5FC0.1070106@aalto.fi>
 <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com> <mil9lv$hi2$1@ger.gmane.org>
 <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info>
 <554E5CC9.3010406@aalto.fi>
 <CAAZsQLDiVJ_d6Pp1N9iiDpXL=z3cSWx-eBdWzOK4r9tDJ2zRCg@mail.gmail.com>
 <10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com>
 <14232_1431212854_554E9336_14232_216_1_554E9327.9030706@aalto.fi>
 <554FBA3D.30907@aalto.fi>
Message-ID: <20150511014412.GL5663@ando.pearwood.info>

On Sun, May 10, 2015 at 11:06:21PM +0300, Koos Zevenhoven wrote:

> So, -> would be an operator with a precedence similar to .attribute 
> access (but lower than .attribute):

Dot . is not an operator. If I remember correctly, the docs describe it 
as a delimiter.

>  # The simple definition of what it does:
>  arg->func   # equivalent to functools.partial(func, arg)

I believe you require that -> is applied before function application, so 

arg->func  # returns partial(func, arg)
arg->func(x)  # returns partial(func, arg)(x)
arg->(func(x))  # returns partial(func(x), arg)
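These desugarings can be spot-checked with functools.partial (a sketch with a dummy func; the arrow forms are hypothetical syntax, shown only in comments):

```python
from functools import partial

def func(a, b):
    return (a, b)

arg, x = 'arg', 'x'
p = partial(func, arg)            # arg->func
assert p(x) == func(arg, x)       # arg->func(x) calls the partial with x
assert p(x) == ('arg', 'x')
```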

> This would allow for instance:
>  arg -> spam() -> cheese(kind = 'gouda') -> eggs()

I am having a lot of difficulty seeing that as anything other than "call 
spam with no arguments, then apply arg to the result". But, teasing it 
apart with the precedence I established above:

arg->spam()  # returns partial(spam, arg)() == spam(arg)
""" -> cheese  # returns partial(cheese, spam(arg))

""" (kind='gouda')  # returns partial(cheese, spam(arg))(kind='gouda')
                    # == cheese(spam(arg), kind='gouda')

""" -> eggs  # returns partial(eggs, cheese(spam(arg), kind='gouda'))

""" ()  # calls the previous partial, with no arguments, giving:
        # partial(eggs, cheese(spam(arg), kind='gouda'))()
        # == eggs(cheese(spam(arg), kind='gouda'))
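The whole chain can be checked mechanically with dummy spam/cheese/eggs (a sketch; the arrow forms exist only in comments):

```python
from functools import partial

def spam(x):
    return ('spam', x)

def cheese(x, kind='cheddar'):
    return ('cheese', x, kind)

def eggs(x):
    return ('eggs', x)

arg = 42
step1 = partial(spam, arg)()                  # arg->spam()
step2 = partial(cheese, step1)(kind='gouda')  # ...->cheese(kind='gouda')
step3 = partial(eggs, step2)()                # ...->eggs()
assert step3 == eggs(cheese(spam(arg), kind='gouda'))
```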


> which would be equivalent to eggs(cheese(spam(arg), kind = 'gouda'))

Amazingly, you are correct! :-)

I think this demonstrates an abuse of partial and the sort of thing that 
gives functional idioms a bad name. To tease this apart and understand 
what it does was very difficult for me. And I don't understand the point 
of creating partial applications that you are then immediately going to 
call, that just adds an extra layer of indirection to slow the code 
down. If you write partial(len, 'foo')() instead of just len('foo'), 
something has gone drastically wrong.

So instead of

arg->spam()->cheese(kind='gouda')->eggs()

which includes *three* partial objects which are immediately called, 
wouldn't it be easier to just call the functions in the first place?

eggs(cheese(spam(arg), kind='gouda'))

It will certainly be more efficient!


Let's run through a simple chain with no parens:

a -> b  # partial(b, a)
a -> b -> c  # partial(c, partial(b, a))
a -> b -> c -> d  # partial(d, partial(c, partial(b, a)))

I'm not seeing why I would want to write something like that.


Let's apply multiple arguments:

a -> func  # partial(func, a)
b -> (a -> func)  # partial(partial(func, a), b)
c -> (b -> (a -> func))  # partial(partial(partial(func, a), b), c)

Perhaps a sufficiently clever implementation of partial could optimize 
partial(partial(func, a), b) to just a single layer of indirection 
partial(func, a, b), so it's not *necessarily* as awful as it looks. (I 
would expect a function composition operator to do the same.)
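As it happens, CPython's functools.partial already performs this flattening when it wraps another plain partial (an implementation detail, but observable):

```python
from functools import partial

def func(a, b):
    return a + b

p = partial(partial(func, 1), 2)
# CPython flattens the nested partial at construction time,
# leaving a single layer of indirection over func.
assert p() == 3
assert p.func is func and p.args == (1, 2)
```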

Note that we have to write the second argument first, and bracket the 
second arrow clause. Writing it the "obvious" way is wrong:

a -> b -> func  # partial(func, partial(b, a))


I think this is imaginative but hard to read, hard to understand, 
hard to use correctly, inefficient, and even if used correctly, there 
are not very many times that you would need it.


> Or even together together with the proposed @ composition:
>  rms = root @ mean @ square->map     # for an iterable non-numpy argument

I think that a single arrow may be reasonable as syntactic sugar for 
partial, but once you start chaining them, it all falls apart into a 
mess. That, in my mind, is a sign that the idea doesn't scale. We can 
chain dots with no problem:

fe.fi.fo.fum

and function calls in numerous ways:

foo(bar(baz()))
foo(bar)(baz)

and although they can get hard to read just because of the sheer number 
of components, they are not conceptually difficult. But chaining arrows 
is conceptually difficult even with as few as two arrows.

I think the problem here is that partial application is an N-ary 
operation. This is not Haskell where single-argument currying is 
enforced everywhere! You're trying to perform something which 
conceptually takes N arguments partial(func, 1, 2, 3, ..., N) using only 
an operator which can only take two arguments a->b. Things are going to 
get messy.


> And here's something I find quite interesting. Together with 
> @singledispatch from 3.4 (or possibly an enhanced version using type 
> annotations in the future?), one could add 'third-party methods' to 
> classes in other libraries without monkey patching. A dummy example:
> 
> from numpy import array
> my_list = [1,2,3]
> my_array = array(my_list)
> my_mean = my_array.mean()  # This currently works in numpy
> 
> from rmslib import rms
> my_rms = my_array->rms()  # efficient rms for numpy arrays
> my_other_rms = my_list->rms()  # rms that works on any iterable

That looks cute, but isn't very interesting. Effectively, you've 
invented a new (and less efficient) syntax for calling a function:

spam->eggs(cheese)  # eggs(spam, cheese)

It's less efficient because it builds a partial object first, so instead 
of one call you end up with two, and a temporary object that gets thrown 
away immediately after it is used. Yes, you could keep the partial 
object around, but as your example shows, you don't. And because it is 
cute, people will write:

a->func(), b->func(), c->func()

and not realise that it creates three partial functions before calling 
them. Writing:

func(a), func(b), func(c)

will avoid that needless overhead.



-- 
Steve

From larocca at abiresearch.com  Mon May 11 04:53:29 2015
From: larocca at abiresearch.com (Douglas La Rocca)
Date: Mon, 11 May 2015 02:53:29 +0000
Subject: [Python-ideas] Partial operator (and 'third-party methods'
	and	'piping') [was Re: Function composition (was no subject)]
In-Reply-To: <20150511014412.GL5663@ando.pearwood.info>
References: <554C5FC0.1070106@aalto.fi>
 <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com> <mil9lv$hi2$1@ger.gmane.org>
 <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info>
 <554E5CC9.3010406@aalto.fi>
 <CAAZsQLDiVJ_d6Pp1N9iiDpXL=z3cSWx-eBdWzOK4r9tDJ2zRCg@mail.gmail.com>
 <10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com>
 <14232_1431212854_554E9336_14232_216_1_554E9327.9030706@aalto.fi>
 <554FBA3D.30907@aalto.fi>,<20150511014412.GL5663@ando.pearwood.info>
Message-ID: <512626e077e2445f887cfbf638c4d7ca@swordfish.abiresearch.com>

I agree here--I don't think a special operator for functools.partial is desirable. The proposal seems to suggest something between an ordinary lambda expression and Haskell's (>>=) bind.

I expect the arrow to work as it does in Haskell and julia for anonymous functions

    x, *xs -> <some expr, x and xs in locals()>

Or in (very-)pseudo notation

    (->) argspec expr

Then bind (>>=) also comes to mind because it takes a value on the left and a function on the right. But doesn't have the nice things you get with monads.

These become non-issues if functions either explicitly accept and bind one argument at a time (currying/incremental binding), or if a @curried decorator is used. Or Gregory's @arrow decorator (which I've just now discovered!).

So

    arg -> spam() -> cheese(kind = 'gouda') -> eggs()

would be (with composition) written as

    compose(spam, cheese(kind='gouda'), eggs)(arg)

If you want to wrap `cheese` to avoid the awkwardness, you can do

    >>> cheese_kind = lambda kind: lambda *args, kind=kind, **kwargs: cheese(kind=kind)(*args, **kwargs)
    >>> compose(spam, cheese_kind('gouda'), eggs)(arg)

Then if you don't like two explicit sequential function calls, i.e. f(x)(y), there are ways to sugar it up, like

    def single_value_pipeline(fn):
        def wrapper(x, **kwargs):
            return compose(lambda *_: x, *fn(**kwargs))()
        return wrapper

Which would hide `compose` altogether (very anti-PEP8!) and allow binding keyword names across the pipeline:

    @single_value_pipeline
    def breakfast(cheese_kind='gouda'):
        return (spam, 
                cheese(kind=cheese_kind), 
                eggs)

    breakfast(arg, kind='something other than gouda') # what is gouda anyway?!

(`single_value_pipeline` is perhaps a bad name though...)
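The `compose` used above isn't defined in the snippet; a minimal left-to-right version consistent with how it is called (my sketch, not a standard library function):

```python
def compose(*fns):
    """Left-to-right composition: compose(f, g, h)(x) == h(g(f(x)))."""
    def composed(*args, **kwargs):
        # The first function may take any arguments; the rest are unary.
        result = fns[0](*args, **kwargs)
        for fn in fns[1:]:
            result = fn(result)
        return result
    return composed

spam = lambda x: x + 1
eggs = lambda x: x * 2
assert compose(spam, eggs)(3) == eggs(spam(3)) == 8
```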


From guettliml at thomas-guettler.de  Mon May 11 10:42:16 2015
From: guettliml at thomas-guettler.de (Thomas Güttler)
Date: Mon, 11 May 2015 10:42:16 +0200
Subject: [Python-ideas] Policy for altering sys.path
In-Reply-To: <554A1F8C.1040005@thomas-guettler.de>
References: <554A1F8C.1040005@thomas-guettler.de>
Message-ID: <55506B68.90504@thomas-guettler.de>

Hi,

for this case, the sys.path modification was solved like this:


-sys.path = glob.glob(os.path.join(WHEEL_DIR, "*.whl")) + sys.path
+sys.path[:] = glob.glob(os.path.join(WHEEL_DIR, "*.whl")) + sys.path

See https://github.com/pypa/pip/issues/2759
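The fix works because slice assignment mutates the existing list object in place, so a custom subclass installed in sys.path survives, whereas plain assignment rebinds the name to a new plain list. A sketch with a hypothetical TrackedPath subclass standing in for the custom sys.path class:

```python
class TrackedPath(list):
    """Hypothetical stand-in for a custom sys.path subclass."""

path = TrackedPath(['/usr/lib'])
original = path

path[:] = ['/wheels/x.whl'] + path   # in place: same object, same type
assert path is original
assert isinstance(path, TrackedPath)

path = ['/wheels/y.whl'] + path      # rebinding: a new plain list
assert not isinstance(path, TrackedPath)
assert isinstance(original, TrackedPath)  # the old object is unchanged in type
```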



On 06.05.2015 16:05, Thomas Güttler wrote:
> I am missing a policy how sys.path should be altered.
>
> We run a custom sub class of list in sys.path. We set it in sitecustomize.py
>
> This instance get replace by a common list in lines like this:
>
> sys.path = glob.glob(os.path.join(WHEEL_DIR, "*.whl")) + sys.path
>
> The above line is from pip; similar things happen in a lot of packages.
>
> Before trying to solve this with code, I think the Python community should agree on a policy for altering sys.path.
>
> What can I do to get this done?
>
> We use Python 2.7.
>
>
> Related: http://bugs.python.org/issue24135
>
> Regards,
>    Thomas Güttler
>

From guido at python.org  Mon May 11 16:41:01 2015
From: guido at python.org (Guido van Rossum)
Date: Mon, 11 May 2015 07:41:01 -0700
Subject: [Python-ideas] Partial operator (and 'third-party methods' and
 'piping') [was Re: Function composition (was no subject)]
In-Reply-To: <512626e077e2445f887cfbf638c4d7ca@swordfish.abiresearch.com>
References: <554C5FC0.1070106@aalto.fi>
 <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com> <mil9lv$hi2$1@ger.gmane.org>
 <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info>
 <554E5CC9.3010406@aalto.fi>
 <CAAZsQLDiVJ_d6Pp1N9iiDpXL=z3cSWx-eBdWzOK4r9tDJ2zRCg@mail.gmail.com>
 <10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com>
 <14232_1431212854_554E9336_14232_216_1_554E9327.9030706@aalto.fi>
 <554FBA3D.30907@aalto.fi> <20150511014412.GL5663@ando.pearwood.info>
 <512626e077e2445f887cfbf638c4d7ca@swordfish.abiresearch.com>
Message-ID: <CAP7+vJJsRUD2_nA_NCQuB2smteYGiEbUbOyt=z-XyLCosTpp1g@mail.gmail.com>

As long as I'm "in charge" the chances of this (or anything like it) being
accepted into Python are zero. I get a headache when I try to understand
code that uses function composition, and I end up having to laboriously
rewrite it using more traditional call notation before I move on to
understanding what it actually does. Python is not Haskell, and perhaps
more importantly, Python users are not like Haskell users. Either way, what
may work out beautifully in Haskell will be like a fish out of water in
Python.

I understand that it's fun to try to solve this puzzle, but evolving Python
is more than solving puzzles. Enjoy debating the puzzle, but in the end
Python will survive without the solution.

-- 
--Guido van Rossum (python.org/~guido)

From apieum at gmail.com  Mon May 11 18:13:28 2015
From: apieum at gmail.com (Gregory Salvan)
Date: Mon, 11 May 2015 18:13:28 +0200
Subject: [Python-ideas] Partial operator (and 'third-party methods' and
 'piping') [was Re: Function composition (was no subject)]
In-Reply-To: <CAP7+vJJsRUD2_nA_NCQuB2smteYGiEbUbOyt=z-XyLCosTpp1g@mail.gmail.com>
References: <554C5FC0.1070106@aalto.fi>
 <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com>
 <mil9lv$hi2$1@ger.gmane.org>
 <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info>
 <554E5CC9.3010406@aalto.fi>
 <CAAZsQLDiVJ_d6Pp1N9iiDpXL=z3cSWx-eBdWzOK4r9tDJ2zRCg@mail.gmail.com>
 <10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com>
 <14232_1431212854_554E9336_14232_216_1_554E9327.9030706@aalto.fi>
 <554FBA3D.30907@aalto.fi>
 <20150511014412.GL5663@ando.pearwood.info>
 <512626e077e2445f887cfbf638c4d7ca@swordfish.abiresearch.com>
 <CAP7+vJJsRUD2_nA_NCQuB2smteYGiEbUbOyt=z-XyLCosTpp1g@mail.gmail.com>
Message-ID: <CAAZsQLAX-uwr9W0SLg62xmmR_ypfM=JOK44BymFjcd5fueyJwA@mail.gmail.com>

I don't want to insist and I respect your point of view, I just want to
give a simplified real life example to show that function composition can
be less painful than another syntax.

When validating a lot of data you may want to reuse parts of already written
validators. It can also be a mess to test complex data validation.
You can reduce this mess and reuse parts of your code by writing atomic
validators and compose them.

# sorry for using my own lib, but if I make no mistakes this code works, so...

import re
from lawvere import curry  # curry is an arrow without type checking; inherits composition, multiple dispatch

user_match = re.compile(r"^[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*$").match
domain_match = re.compile(r"^(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$").match
strict_user_match = re.compile(r"^[a-z0-9][a-z0-9_-]+(?:\.[a-z0-9_-]+)*$").match

@curry
def is_string(value):
    assert isinstance(value, str), '%s is not a string' % value
    return value

@curry
def apply_until_char(func, char, value):
    func(value[:value.index(char)])
    return value

@curry
def apply_from_char(func, char, value):
    func(value[value.index(char) + 1:])
    return value

@curry
def has_char(char, value):
    assert value.count(char) == 1
    return value

@curry
def assert_ends_with(text, value):
    assert value.endswith(text), '%s does not end with %s' % (value, text)
    return value

@curry
def assert_user(user):
    assert user_match(user) is not None, '%s is not a valid user name' % user
    return user

@curry
def assert_strict_user(user):
    assert strict_user_match(user) is not None, '%s is not a valid strict user' % user
    return user

@curry
def assert_domain(domain):
    assert domain_match(domain) is not None, '%s is not a valid domain name' % domain
    return domain

# currying (could be done with partial)
has_user = apply_until_char(assert_user, '@')
has_strict_user = apply_until_char(assert_strict_user, '@')
has_domain = apply_from_char(assert_domain, '@')

# composition:
is_email_address = is_string >> has_char('@') >> has_user >> has_domain
is_strict_email_address = is_string >> has_char('@') >> has_strict_user >> has_domain

# we just want org addresses?
is_org_address = is_email_address >> assert_ends_with('.org')
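For readers without the library, the `>>` chaining used above can be sketched with a small wrapper class (my own sketch, not the lawvere API, and without its currying or dispatch features):

```python
class Composable:
    """Wraps a function so that f >> g builds g-after-f."""
    def __init__(self, fn):
        self.fn = fn

    def __call__(self, *args, **kwargs):
        return self.fn(*args, **kwargs)

    def __rshift__(self, other):
        # Feed this function's result into the next one.
        return Composable(lambda *a, **k: other(self.fn(*a, **k)))

@Composable
def add_one(x):
    return x + 1

@Composable
def double(x):
    return x * 2

pipeline = add_one >> double
assert pipeline(3) == 8  # double(add_one(3))
```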


I found a lot of interest in this syntax, mainly for testing purposes,
readability and maintainability of code.
No matter if I'm a fish out of Python waters. :)





From larocca at abiresearch.com  Mon May 11 19:08:54 2015
From: larocca at abiresearch.com (Douglas La Rocca)
Date: Mon, 11 May 2015 17:08:54 +0000
Subject: [Python-ideas] Partial operator (and 'third-party methods' and
 'piping') [was Re: Function composition (was no subject)]
In-Reply-To: <CAAZsQLAX-uwr9W0SLg62xmmR_ypfM=JOK44BymFjcd5fueyJwA@mail.gmail.com>
References: <554C5FC0.1070106@aalto.fi>
 <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com>
 <mil9lv$hi2$1@ger.gmane.org>
 <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info>
 <554E5CC9.3010406@aalto.fi>
 <CAAZsQLDiVJ_d6Pp1N9iiDpXL=z3cSWx-eBdWzOK4r9tDJ2zRCg@mail.gmail.com>
 <10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com>
 <14232_1431212854_554E9336_14232_216_1_554E9327.9030706@aalto.fi>
 <554FBA3D.30907@aalto.fi> <20150511014412.GL5663@ando.pearwood.info>
 <512626e077e2445f887cfbf638c4d7ca@swordfish.abiresearch.com>
 <CAP7+vJJsRUD2_nA_NCQuB2smteYGiEbUbOyt=z-XyLCosTpp1g@mail.gmail.com>,
 <CAAZsQLAX-uwr9W0SLg62xmmR_ypfM=JOK44BymFjcd5fueyJwA@mail.gmail.com>
Message-ID: <d08922ec2c5e47efb2237afb4aceddfb@swordfish.abiresearch.com>

Operator overloading (>>) has intuitive readability, but in my experience it's better to have functions remain "ordinary" functions, not class instances, so you know what to expect regarding the type and so on. The other downside is that with (>>) only the functions you wrap can play together.

Leaving aside the readability concern, the really major problem is that your tracebacks are so badly mangled. And if your implementation of the composition function uses recursion it gets even worse.


You also lose the benefits of reflection/inspection--for example, with the code below, what happens if I call help or ?? in IPython on `is_email_address`?
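Concretely, a composed wrapper erases the wrapped function's name and docstring unless you copy them over, e.g. with functools.wraps (a sketch with dummy names, not the code above):

```python
import functools

def is_string(value):
    """Assert that value is a string."""
    assert isinstance(value, str)
    return value

def compose2(f, g):
    @functools.wraps(g)  # copies g's __name__/__doc__ onto the composite
    def composed(x):
        return g(f(x))
    return composed

checker = compose2(str.strip, is_string)
assert checker.__name__ == 'is_string'                    # help() stays useful
assert checker.__doc__ == 'Assert that value is a string.'
assert checker('  hi  ') == 'hi'
```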



From tjreedy at udel.edu  Mon May 11 19:45:35 2015
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 11 May 2015 13:45:35 -0400
Subject: [Python-ideas] Partial operator (and 'third-party methods' and
 'piping') [was Re: Function composition (was no subject)]
In-Reply-To: <CAP7+vJJsRUD2_nA_NCQuB2smteYGiEbUbOyt=z-XyLCosTpp1g@mail.gmail.com>
References: <554C5FC0.1070106@aalto.fi>
 <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com> <mil9lv$hi2$1@ger.gmane.org>
 <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info>
 <554E5CC9.3010406@aalto.fi>
 <CAAZsQLDiVJ_d6Pp1N9iiDpXL=z3cSWx-eBdWzOK4r9tDJ2zRCg@mail.gmail.com>
 <10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com>
 <14232_1431212854_554E9336_14232_216_1_554E9327.9030706@aalto.fi>
 <554FBA3D.30907@aalto.fi> <20150511014412.GL5663@ando.pearwood.info>
 <512626e077e2445f887cfbf638c4d7ca@swordfish.abiresearch.com>
 <CAP7+vJJsRUD2_nA_NCQuB2smteYGiEbUbOyt=z-XyLCosTpp1g@mail.gmail.com>
Message-ID: <miqps6$rin$1@ger.gmane.org>

On 5/11/2015 10:41 AM, Guido van Rossum wrote:
> As long as I'm "in charge" the chances of this (or anything like it)
> being accepted into Python are zero.

I have been waiting for this response (which I agree with).
By 'this', I presume you mean either more new syntax other than '@', or 
official support of '@' other than for matrix or array multiplication.

 > I get a headache when I try to
> understand code that uses function composition,

Function composition is the *process* of using the output of one 
function (broadly speaking) as the input (or one of the inputs) of 
another function.  All Python code does this.  The discussion is about 
adding a composition operator or function or notation (and 
accoutrements) as a duplicate *syntax* for expressing composition.  As I 
posted before, mathematicians usually define the operator in terms of 
call syntax, which can also express composition.

 > and I end up having to
> laboriously rewrite it using more traditional call notation before I
> move on to understanding what it actually does.

Mathematicians do rewrites also ;-).
The proof of (f @ g) @ h = f @ (g @ h) (associativity) is that
((f @ g) @ h)(x) and (f @ (g @ h))(x) can both be rewritten as
f(g(h(x))).
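As a sketch (not code from the thread), the associativity argument can be checked pointwise with a plain compose() helper; the name `compose` is hypothetical, standing in for the proposed `@`:

```python
# Minimal sketch: compose(f, g) is the function x -> f(g(x)),
# playing the role of the proposed f @ g.
def compose(f, g):
    return lambda x: f(g(x))

f = lambda x: x + 1
g = lambda x: x * 2
h = lambda x: x - 3

left = compose(compose(f, g), h)   # (f @ g) @ h
right = compose(f, compose(g, h))  # f @ (g @ h)

# Both rewrite to f(g(h(x))) for any x:
assert all(left(x) == right(x) == f(g(h(x))) for x in range(10))
```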

> I understand that it's fun to try to solve this puzzle, but evolving
> Python is more than solving puzzles.

Leaving aside the problem of stack overflow, one can rewrite "for x in 
iterable: process x" to perform the same computational process with 
recursive syntax (using iter and next and catching StopIteration).  But 
one would have to be really stuck on the recursive syntax, as opposed to 
the inductive process, to use it in practice.
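A sketch of the mechanical rewrite described above: the for loop and the recursive iter/next version (catching StopIteration) perform the same computational process, but the recursive syntax is clumsier and, as noted, prone to stack overflow on long iterables:

```python
# The ordinary inductive form:
def process_loop(iterable, process):
    for x in iterable:
        process(x)

# The same process re-expressed with recursion, iter() and next():
def process_recursive(iterable, process):
    def step(it):
        try:
            x = next(it)
        except StopIteration:
            return
        process(x)
        step(it)  # recurse on the rest of the iterator
    step(iter(iterable))

a, b = [], []
process_loop([1, 2, 3], a.append)
process_recursive([1, 2, 3], b.append)
assert a == b == [1, 2, 3]
```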

-- 
Terry Jan Reedy


From apieum at gmail.com  Mon May 11 19:46:00 2015
From: apieum at gmail.com (Gregory Salvan)
Date: Mon, 11 May 2015 19:46:00 +0200
Subject: [Python-ideas] Partial operator (and 'third-party methods' and
 'piping') [was Re: Function composition (was no subject)]
In-Reply-To: <d08922ec2c5e47efb2237afb4aceddfb@swordfish.abiresearch.com>
References: <554C5FC0.1070106@aalto.fi>
 <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com>
 <mil9lv$hi2$1@ger.gmane.org>
 <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info>
 <554E5CC9.3010406@aalto.fi>
 <CAAZsQLDiVJ_d6Pp1N9iiDpXL=z3cSWx-eBdWzOK4r9tDJ2zRCg@mail.gmail.com>
 <10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com>
 <14232_1431212854_554E9336_14232_216_1_554E9327.9030706@aalto.fi>
 <554FBA3D.30907@aalto.fi>
 <20150511014412.GL5663@ando.pearwood.info>
 <512626e077e2445f887cfbf638c4d7ca@swordfish.abiresearch.com>
 <CAP7+vJJsRUD2_nA_NCQuB2smteYGiEbUbOyt=z-XyLCosTpp1g@mail.gmail.com>
 <CAAZsQLAX-uwr9W0SLg62xmmR_ypfM=JOK44BymFjcd5fueyJwA@mail.gmail.com>
 <d08922ec2c5e47efb2237afb4aceddfb@swordfish.abiresearch.com>
Message-ID: <CAAZsQLBVWXhL0U+oo04H-h_=81wnGAzxCHRjLGEjvREbC6NQcQ@mail.gmail.com>

'is_email_address' is a special tuple which contains functions.
(is_email_address[0] returns is_string)
For help, it's a feature I've not implemented, but it's easy to return the
help of each function, plus details, as each function has an object
representing its signature.

For traceback mangling, I don't see what the problem is.
When you call is_email_address(something), it is pretty much as if you had called:
def is_email_address(value):
    is_string(value)
    has_char('@', value)
    has_user(value)
    has_domain(value)
    return value


2015-05-11 19:08 GMT+02:00 Douglas La Rocca <larocca at abiresearch.com>:

>  Operator overloading (>>) has intuitive readability, but in my experience
> it's better to have functions remain "ordinary" functions, not class
> instances, so you know what to expect regarding the type and so on. The
> other downside is that with (>>) only the functions you wrap can play
> together.
>
>
> Leaving aside the readability concern, the really major problem is that
> your tracebacks are so badly mangled. And if your implementation of
> the composition function uses recursion it gets even worse.
>
>
>  You also lose the benefits of reflection/inspection--for example, with
> the code below, what happens if I call help ?? in ipython on `
> is_email_address`?
>
>
>  ------------------------------
> *From:* Python-ideas <python-ideas-bounces+larocca=
> abiresearch.com at python.org> on behalf of Gregory Salvan <apieum at gmail.com>
> *Sent:* Monday, May 11, 2015 12:13 PM
> *To:* Guido van Rossum
> *Cc:* python-ideas at python.org
> *Subject:* Re: [Python-ideas] Partial operator (and 'third-party methods'
> and 'piping') [was Re: Function composition (was no subject)]
>
>    I don't want to insist and I respect your point of view, I just want
> to give a simplified real life example to show that function composition
> can be less painful than another syntax.
>
>  When validating a lot of data you may want to reuse parts of already
> writen validators. It can also be a mess to test complex data validation.
>  You can reduce this mess and reuse parts of your code by writing atomic
> validators and compose them.
>
>  # sorry for using my own lib, but if I make no mistakes this code
> functions, so...
>
> import re
>  from lawvere import curry # curry is an arrow without type checking,
> inherits composition, mutiple dispatch
>
> user_match =
> re.compile("^[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*$").match
> domain_match =
> re.compile("^(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$").match
> strict_user_match =
> re.compile("^[a-z0-9][a-z0-9_-]+(?:\.[a-z0-9_-]+)*$").match
>
>  @curry
>  def is_string(value):
>     assert isinstance(value, str), '%s is not a string' %value
>      return value
>
>  @curry
> def apply_until_char(func, char, value):
>      func(value[:value.index(char)])
>     return value
>
>  @curry
>  def apply_from_char(func, char, value):
>      func(value[value.index(char) + 1:])
>      return value
>
> @curry
>  def has_char(char, value):
>     assert value.count(char) == 1
>      return value
>
> @curry
>  def assert_ends_with(text, value):
>      assert value.endswith(text), '%s do not ends with %s' % (value, text)
>      return value
>
> @curry
> def assert_user(user):
>     assert user_match(user) is not None, '%s is not a valid user name' %
> value
>      return user
>
> @curry
> def assert_strict_user(user):
>     assert strict_user_match(user) is not None, '%s is not a valid strict
> user' % value
>      return user
>
> @curry
> def assert_domain(domain):
>     assert domain_match(domain) is not None, '%s is not a valid domain
> name' % value
>      return domain
>
>  # currying (be made with partial)
>  has_user = apply_until_char(assert_user, '@')
>  has_strict_user = apply_until_char(assert_strict_user, '@')
>  has_domain = apply_from_char(assert_domain, '@')
>
>  # composition:
>  is_email_address = is_string >> has_char('@') >> has_user >> has_domain
> is_strict_email_address = is_string >> has_char('@') >> has_strict_user >>
> has_domain
>
>  # we just want org adresses ?
>  is_org_addess = is_email_address >> assert_ends_with('.org')
>
>
>  I found a lot of interest in this syntax, mainly for testing purpose,
> readability and maintenability of code.
>  No matters if I'm a fish out of python waters. :)
>
>
>
>
> 2015-05-11 16:41 GMT+02:00 Guido van Rossum <guido at python.org>:
>
>>  As long as I'm "in charge" the chances of this (or anything like it)
>> being accepted into Python are zero. I get a headache when I try to
>> understand code that uses function composition, and I end up having to
>> laboriously rewrite it using more traditional call notation before I move
>> on to understanding what it actually does. Python is not Haskell, and
>> perhaps more importantly, Python users are not like Haskell users. Either
>> way, what may work out beautifully in Haskell will be like a fish out of
>> water in Python.
>>
>>  I understand that it's fun to try to solve this puzzle, but evolving
>> Python is more than solving puzzles. Enjoy debating the puzzle, but in the
>> end Python will survive without the solution.
>>
>>  --
>> --Guido van Rossum (python.org/~guido)
>>
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>
>

From guido at python.org  Mon May 11 19:49:10 2015
From: guido at python.org (Guido van Rossum)
Date: Mon, 11 May 2015 10:49:10 -0700
Subject: [Python-ideas] Partial operator (and 'third-party methods' and
 'piping') [was Re: Function composition (was no subject)]
In-Reply-To: <miqps6$rin$1@ger.gmane.org>
References: <554C5FC0.1070106@aalto.fi>
 <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com> <mil9lv$hi2$1@ger.gmane.org>
 <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info>
 <554E5CC9.3010406@aalto.fi>
 <CAAZsQLDiVJ_d6Pp1N9iiDpXL=z3cSWx-eBdWzOK4r9tDJ2zRCg@mail.gmail.com>
 <10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com>
 <14232_1431212854_554E9336_14232_216_1_554E9327.9030706@aalto.fi>
 <554FBA3D.30907@aalto.fi> <20150511014412.GL5663@ando.pearwood.info>
 <512626e077e2445f887cfbf638c4d7ca@swordfish.abiresearch.com>
 <CAP7+vJJsRUD2_nA_NCQuB2smteYGiEbUbOyt=z-XyLCosTpp1g@mail.gmail.com>
 <miqps6$rin$1@ger.gmane.org>
Message-ID: <CAP7+vJJdmKBFvFFvHA2y-HfxEKVFeX6J01MWo73W1cxDQcdPJw@mail.gmail.com>

On Mon, May 11, 2015 at 10:45 AM, Terry Reedy <tjreedy at udel.edu> wrote:

> On 5/11/2015 10:41 AM, Guido van Rossum wrote:
>
>> As long as I'm "in charge" the chances of this (or anything like it)
>> being accepted into Python are zero.
>>
>
> I have been waiting for this response (which I agree with).
> By 'this', I presume you mean either more new syntax other than '@', or
> official support of '@' other than for matrix or array multiplication.
>

Or even adding a compose() function (or similar) to the stdlib.

I'm sorry, I don't have time to argue about this.

-- 
--Guido van Rossum (python.org/~guido)

From tjreedy at udel.edu  Mon May 11 19:54:50 2015
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 11 May 2015 13:54:50 -0400
Subject: [Python-ideas] Partial operator (and 'third-party methods' and
 'piping') [was Re: Function composition (was no subject)]
In-Reply-To: <CAAZsQLAX-uwr9W0SLg62xmmR_ypfM=JOK44BymFjcd5fueyJwA@mail.gmail.com>
References: <554C5FC0.1070106@aalto.fi>
 <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com> <mil9lv$hi2$1@ger.gmane.org>
 <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info>
 <554E5CC9.3010406@aalto.fi>
 <CAAZsQLDiVJ_d6Pp1N9iiDpXL=z3cSWx-eBdWzOK4r9tDJ2zRCg@mail.gmail.com>
 <10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com>
 <14232_1431212854_554E9336_14232_216_1_554E9327.9030706@aalto.fi>
 <554FBA3D.30907@aalto.fi> <20150511014412.GL5663@ando.pearwood.info>
 <512626e077e2445f887cfbf638c4d7ca@swordfish.abiresearch.com>
 <CAP7+vJJsRUD2_nA_NCQuB2smteYGiEbUbOyt=z-XyLCosTpp1g@mail.gmail.com>
 <CAAZsQLAX-uwr9W0SLg62xmmR_ypfM=JOK44BymFjcd5fueyJwA@mail.gmail.com>
Message-ID: <miqqdg$6u9$1@ger.gmane.org>

On 5/11/2015 12:13 PM, Gregory Salvan wrote:
> I don't want to insist and I respect your point of view, I just want to
> give a simplified real life example to show that function composition
> can be less painful than another syntax.
>
> When validating a lot of data you may want to reuse parts of already
> writen validators. It can also be a mess to test complex data validation.
> You can reduce this mess and reuse parts of your code by writing atomic
> validators and compose them.
>
> # sorry for using my own lib, but if I make no mistakes this code
> functions, so...
>
> import re
> from lawvere import curry # curry is an arrow without type checking,
> inherits composition, mutiple dispatch
>
> user_match =
> re.compile("^[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*$").match
> domain_match =
> re.compile("^(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$").match
> strict_user_match =
> re.compile("^[a-z0-9][a-z0-9_-]+(?:\.[a-z0-9_-]+)*$").match
>
> @curry
> def is_string(value):
>      assert isinstance(value, str), '%s is not a string' %value
>      return value
>
> @curry
> def apply_until_char(func, char, value):
>      func(value[:value.index(char)])
>      return value
>
> @curry
> def apply_from_char(func, char, value):
>      func(value[value.index(char) + 1:])
>      return value
>
> @curry
> def has_char(char, value):
>      assert value.count(char) == 1
>      return value
>
> @curry
> def assert_ends_with(text, value):
>      assert value.endswith(text), '%s do not ends with %s' % (value, text)
>      return value
>
> @curry
> def assert_user(user):
>      assert user_match(user) is not None, '%s is not a valid user name'
> % value
>      return user
>
> @curry
> def assert_strict_user(user):
>      assert strict_user_match(user) is not None, '%s is not a valid
> strict user' % value
>      return user
>
> @curry
> def assert_domain(domain):
>      assert domain_match(domain) is not None, '%s is not a valid domain
> name' % value
>      return domain
>
> # currying (be made with partial)
> has_user = apply_until_char(assert_user, '@')
> has_strict_user = apply_until_char(assert_strict_user, '@')
> has_domain = apply_from_char(assert_domain, '@')
>
> # composition:
> is_email_address = is_string >> has_char('@') >> has_user >> has_domain
> is_strict_email_address = is_string >> has_char('@') >> has_strict_user
>  >> has_domain
>
> # we just want org adresses ?
> is_org_addess = is_email_address >> assert_ends_with('.org')
>
>
> I found a lot of interest in this syntax, mainly for testing purpose,
> readability and maintenability of code.
> No matters if I'm a fish out of python waters. :)

You could do much the same with standard syntax by writing a str 
subclass with multiple methods that return self, and then chaining 
the method calls together.

class VString(str):  # verifiable string
     def has_char_once(self, char):
         assert self.count(char) == 1
         return self
...
     def is_email_address(self):  # or make standalone
         return self.has_char_once('@').has_user().has_domain()

data = VString(input())

data.is_email_address()

-- 
Terry Jan Reedy


From tjreedy at udel.edu  Mon May 11 20:21:28 2015
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 11 May 2015 14:21:28 -0400
Subject: [Python-ideas] Partial operator (and 'third-party methods' and
 'piping') [was Re: Function composition (was no subject)]
In-Reply-To: <CAP7+vJJdmKBFvFFvHA2y-HfxEKVFeX6J01MWo73W1cxDQcdPJw@mail.gmail.com>
References: <554C5FC0.1070106@aalto.fi>
 <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com> <mil9lv$hi2$1@ger.gmane.org>
 <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info>
 <554E5CC9.3010406@aalto.fi>
 <CAAZsQLDiVJ_d6Pp1N9iiDpXL=z3cSWx-eBdWzOK4r9tDJ2zRCg@mail.gmail.com>
 <10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com>
 <14232_1431212854_554E9336_14232_216_1_554E9327.9030706@aalto.fi>
 <554FBA3D.30907@aalto.fi> <20150511014412.GL5663@ando.pearwood.info>
 <512626e077e2445f887cfbf638c4d7ca@swordfish.abiresearch.com>
 <CAP7+vJJsRUD2_nA_NCQuB2smteYGiEbUbOyt=z-XyLCosTpp1g@mail.gmail.com>
 <miqps6$rin$1@ger.gmane.org>
 <CAP7+vJJdmKBFvFFvHA2y-HfxEKVFeX6J01MWo73W1cxDQcdPJw@mail.gmail.com>
Message-ID: <miqrve$2v3$1@ger.gmane.org>

On 5/11/2015 1:49 PM, Guido van Rossum wrote:
> On Mon, May 11, 2015 at 10:45 AM, Terry Reedy
> <tjreedy at udel.edu
> <mailto:tjreedy at udel.edu>> wrote:
>
>     On 5/11/2015 10:41 AM, Guido van Rossum wrote:
>
>         As long as I'm "in charge" the chances of this (or anything like it)
>         being accepted into Python are zero.
>
>     I have been waiting for this response (which I agree with).
>     By 'this', I presume you mean either more new syntax other than '@',
>     or official support of '@' other than for matrix or array
>     multiplication.
>
> Or even adding a compose() function (or similar) to the stdlib.


> I'm sorry, I don't have time to argue about this.


-- 
Terry Jan Reedy


From abarnert at yahoo.com  Mon May 11 20:25:24 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Mon, 11 May 2015 18:25:24 +0000 (UTC)
Subject: [Python-ideas] Partial operator (and 'third-party methods' and
 'piping') [was Re: Function composition (was no subject)]
In-Reply-To: <CAAZsQLAX-uwr9W0SLg62xmmR_ypfM=JOK44BymFjcd5fueyJwA@mail.gmail.com>
References: <CAAZsQLAX-uwr9W0SLg62xmmR_ypfM=JOK44BymFjcd5fueyJwA@mail.gmail.com>
Message-ID: <1413695394.4340925.1431368724949.JavaMail.yahoo@mail.yahoo.com>

On Monday, May 11, 2015 9:15 AM, Gregory Salvan <apieum at gmail.com> wrote:


>I don't want to insist and I respect your point of view, I just want to give a simplified real life example to show that function composition can be less painful than another syntax.

OK, let's compare your example to a Pythonic implementation of the same thing.

import re

ruser = re.compile("^[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*$")
rdomain = re.compile("^(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$")
rstrict_user = re.compile("^[a-z0-9][a-z0-9_-]+(?:\.[a-z0-9_-]+)*$")


def is_email_address(addr):
    user, domain = addr.split('@', 1)
    return ruser.match(user) and rdomain.match(domain)

def is_strict_email_address(addr):
    user, domain = addr.split('@', 1)
    return rstrict_user.match(user) and rdomain.match(domain)


def is_org_address(addr):
    return is_email_address(addr) and addr.endswith('.org')

(An even better solution, given that you're already using regexps, might be to just use a single regexp with named groups for the user or strict-user, full domain, and TLD, but I've left yours alone.)

Far from being more painful, the Pythonic version is easier to write, easier to read, easier to debug, shorter, and understandable to even a novice, without having to rewrite anything in your head. It also handles invalid input by returning failure values and/or raising appropriate exceptions rather than asserting and exiting. And it's almost certainly going to be significantly more efficient. And it works with any string-like type (that is, any type that has a .split method and works with re.match). And if you have to debug something, you will have, e.g., values named user and domain, rather than both being named value at different levels on the call stack.

If you really want to come up with a convincing example for your idea, I'd take an example out of Learn You a Haskell or another book or tutorial and translate that to Python with your library. I suspect it would still have some of the same problems, but this example wouldn't even really be good in Haskell, so it's just making it harder to see why anyone would want anything like it. And by offering this as the response to Guido's "You're never going to convince me," well, if he _was_ still reading this thread with an open mind, he probably isn't anymore (although, to be honest, he probably wasn't reading it anyway).

>import re
>
>from lawvere import curry # curry is an arrow without type checking, inherits composition, mutiple dispatch
>
>user_match = re.compile("^[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*$").match
>domain_match = re.compile("^(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$").match
>strict_user_match = re.compile("^[a-z0-9][a-z0-9_-]+(?:\.[a-z0-9_-]+)*$").match
>
>@curry
>def is_string(value):
>    assert isinstance(value, str), '%s is not a string' %value
>    return value
>
>@curry
>def apply_until_char(func, char, value):
>    func(value[:value.index(char)])
>    return value
>
>@curry
>def apply_from_char(func, char, value):
>    func(value[value.index(char) + 1:])
>    return value
>
>@curry
>
>def has_char(char, value):
>    assert value.count(char) == 1
>    return value
>
>@curry
>def assert_ends_with(text, value):
>    assert value.endswith(text), '%s do not ends with %s' % (value, text)
>    return value
>
>@curry
>def assert_user(user):
>    assert user_match(user) is not None, '%s is not a valid user name' % value
>    return user
>
>@curry
>def assert_strict_user(user):
>    assert strict_user_match(user) is not None, '%s is not a valid strict user' % value
>    return user
>
>@curry
>def assert_domain(domain):
>    assert domain_match(domain) is not None, '%s is not a valid domain name' % value
>    return domain
>
># currying (be made with partial)
>
>has_user = apply_until_char(assert_user, '@')
>
>has_strict_user = apply_until_char(assert_strict_user, '@')
>
>has_domain = apply_from_char(assert_domain, '@')
>
>
># composition:
>
>is_email_address = is_string >> has_char('@') >> has_user >> has_domain
>
>is_strict_email_address = is_string >> has_char('@') >> has_strict_user >> has_domain
>
>
># we just want org adresses ?
>
>is_org_addess = is_email_address >> assert_ends_with('.org')
>
>
>
>
>I found a lot of interest in this syntax, mainly for testing purpose, readability and maintenability of code.
>
>No matters if I'm a fish out of python waters. :)
>
>
>
>
>
>
>
>
>2015-05-11 16:41 GMT+02:00 Guido van Rossum <guido at python.org>:
>
>As long as I'm "in charge" the chances of this (or anything like it) being accepted into Python are zero. I get a headache when I try to understand code that uses function composition, and I end up having to laboriously rewrite it using more traditional call notation before I move on to understanding what it actually does. Python is not Haskell, and perhaps more importantly, Python users are not like Haskell users. Either way, what may work out beautifully in Haskell will be like a fish out of water in Python.
>>
>>I understand that it's fun to try to solve this puzzle, but evolving Python is more than solving puzzles. Enjoy debating the puzzle, but in the end Python will survive without the solution.
>>
>>
>>
>>-- 
>>
>>--Guido van Rossum (python.org/~guido)
>>_______________________________________________
>>Python-ideas mailing list
>>Python-ideas at python.org
>>https://mail.python.org/mailman/listinfo/python-ideas
>>Code of Conduct: http://python.org/psf/codeofconduct/
>>
>
>
>_______________________________________________
>Python-ideas mailing list
>Python-ideas at python.org
>https://mail.python.org/mailman/listinfo/python-ideas
>Code of Conduct: http://python.org/psf/codeofconduct/
>
>

From abarnert at yahoo.com  Mon May 11 20:43:17 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Mon, 11 May 2015 18:43:17 +0000 (UTC)
Subject: [Python-ideas] Partial operator (and 'third-party methods' and
 'piping') [was Re: Function composition (was no subject)]
In-Reply-To: <miqps6$rin$1@ger.gmane.org>
References: <miqps6$rin$1@ger.gmane.org>
Message-ID: <1134540450.4344373.1431369797892.JavaMail.yahoo@mail.yahoo.com>

On Monday, May 11, 2015 10:46 AM, Terry Reedy <tjreedy at udel.edu> wrote:

> On 5/11/2015 10:41 AM, Guido van Rossum wrote:
>>  As long as I'm "in charge" the chances of this (or anything like it)
>>  being accepted into Python are zero.
> 
> I have been waiting for this response (which I agree with).
> By 'this', I presume you mean either more new syntax other than '@', or
> official support of '@' other than for matrix or array multiplication.

I don't think it's worth trying to push for this directly in Python, even with the @ operator or a functools.compose function, even if someone thinks they've solved all the problems. If anyone really wants this feature, the obvious thing to do at this point is to prepare a NumPy-wrapper library that adds __matmul__ and __rmatmul__ to ufuncs, and some examples, convince the NumPy team to accept it, and then, once it becomes idiomatic in NumPy code, come back to python-ideas. Maybe there is nothing about function composition which inherently requires broadcast-style operations to make it useful, but the only decent examples anyone's come up with in this thread (root-mean-square) all do, which has to mean something. And the NumPy core devs haven't explicitly announced that they don't want to be convinced.


>>  I get a headache when I try to

>>  understand code that uses function composition,
> 
> Function composition is the *process* of using the output of one 
> function (broadly speaking) as the input (or one of the inputs) of 
> another function.  All python code does this.  The discussion is about 
> adding a composition operator or function or notation (and 
> accoutrements) as a duplicate *syntax* for expressing composition.  As I 
> posted before, mathematician's usually define the operator in terms of 
> call syntax, which can also express composition.
> 
>>  and I end up having to
>>  laboriously rewrite it using more traditional call notation before I
>>  move on to understanding what it actually does.
> 
> Mathematicians do rewrites also ;-).
> The proof of (f @ g) @ h = f @ (g @ h) (associativity) is that
> ((f @ g) @ h)(x) and (f @ (g @ h))(x) can both be rewritten as
> f(g(h(x))).
> 
>>  I understand that it's fun to try to solve this puzzle, but evolving
>>  Python is more than solving puzzles.
> 
> Leaving aside the problem of stack overflow, one can rewrite "for x in 
> iterable: process x" to perform the same computational process with 
> recursive syntax (using iter and next and catching StopIteration).  But 
> one would have to be really stuck on the recursive syntax, as opposed to 
> the inductive process, to use it in practice.
> 
> -- 
> Terry Jan Reedy
> 
> 
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
> 

From levkivskyi at gmail.com  Mon May 11 21:00:30 2015
From: levkivskyi at gmail.com (Ivan Levkivskyi)
Date: Mon, 11 May 2015 21:00:30 +0200
Subject: [Python-ideas] Partial operator (and 'third-party methods' and
 'piping') [was Re: Function composition (was no subject)]
Message-ID: <CAOMjWk=CbAup1VSCgS+sjSi6FaA05-ioXiVbebYG=6h5KO6o2w@mail.gmail.com>

Dear Guido,

1. The longest program I have ever written in Haskell was 4 lines.

2. You don't need to accept anything, everything is already accepted.
Namely,

@one
@two
@three
def fun(x):
    ...

already means fun = one(two(three(fun)))

Also now we have @ operator.
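The point above can be sketched directly: stacking decorators is composition applied at definition time, and the decorated function behaves exactly like the nested call. (The decorator bodies below are made up purely for illustration.)

```python
# Three toy decorators; each takes one function and returns one function.
def one(f):
    return lambda x: f(x) + 1

def two(f):
    return lambda x: f(x) * 2

def three(f):
    return lambda x: f(x) - 3

@one
@two
@three
def fun(x):
    return x

# An undecorated copy, to show the equivalence fun = one(two(three(fun))):
def plain(x):
    return x

assert fun(10) == one(two(three(plain)))(10)  # innermost decorator applies first
```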

3. My idea in its current state is to overload @ to allow piping of
arbitrary transformers of iterables, not only multiplication of matrices.
The semantics are the same: a matrix is something that takes a vector and
returns a vector, and multiplication of matrices is exactly "piping" the
corresponding transformations.

I now think one does not need partial application or anything
similar. The rules should be the same as for decorators. If I write:

@deco(arg)
def fun(x):
    ...

it is my duty to be sure that deco(arg) evaluates to something that takes
one function and returns one function. The same should hold for
vector transformers: each should be "one vector in, one vector out".
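A hedged sketch of what overloading @ for such piping could look like (this is not Ivan's actual implementation; the `Step` wrapper class is invented here): each wrapped function is "one iterable in, one iterable out", and `__matmul__` chains them the way matrix multiplication chains linear maps:

```python
class Step:
    """Wrap a one-iterable-in, one-iterable-out transformer."""
    def __init__(self, func):
        self.func = func

    def __call__(self, iterable):
        return self.func(iterable)

    def __matmul__(self, other):
        # (self @ other)(v) applies `other` first, then `self`,
        # mirroring how matrix multiplication acts on a vector.
        return Step(lambda iterable: self.func(other(iterable)))

double = Step(lambda v: [2 * x for x in v])
shift = Step(lambda v: [x + 1 for x in v])

pipeline = double @ shift  # shift first, then double
assert pipeline([1, 2, 3]) == [4, 6, 8]
```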

4. Since you don't want this in stdlib, let's move this discussion to Numpy
lists.

5. I never thought that evolving Python is solving puzzles. My intention
was to help people that might have the same problems as me. If this is not
the best place to do so, sorry for disturbing.

Date: Mon, 11 May 2015 07:41:01 -0700

> From: Guido van Rossum <guido at python.org>
> To: "python-ideas at python.org" <python-ideas at python.org>
> Subject: Re: [Python-ideas] Partial operator (and 'third-party
>         methods' and 'piping') [was Re: Function composition (was no
> subject)]
> Message-ID:
>         <CAP7+vJJsRUD2_nA_NCQuB2smteYGiEbUbOyt=
> z-XyLCosTpp1g at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> As long as I'm "in charge" the chances of this (or anything like it) being
> accepted into Python are zero. I get a headache when I try to understand
> code that uses function composition, and I end up having to laboriously
> rewrite it using more traditional call notation before I move on to
> understanding what it actually does. Python is not Haskell, and perhaps
> more importantly, Python users are not like Haskell users. Either way, what
> may work out beautifully in Haskell will be like a fish out of water in
> Python.
>
> I understand that it's fun to try to solve this puzzle, but evolving Python
> is more than solving puzzles. Enjoy debating the puzzle, but in the end
> Python will survive without the solution.
>
> --
> --Guido van Rossum (python.org/~guido)
>

From apieum at gmail.com  Mon May 11 21:44:19 2015
From: apieum at gmail.com (Gregory Salvan)
Date: Mon, 11 May 2015 21:44:19 +0200
Subject: [Python-ideas] Partial operator (and 'third-party methods' and
 'piping') [was Re: Function composition (was no subject)]
In-Reply-To: <1413695394.4340925.1431368724949.JavaMail.yahoo@mail.yahoo.com>
References: <CAAZsQLAX-uwr9W0SLg62xmmR_ypfM=JOK44BymFjcd5fueyJwA@mail.gmail.com>
 <1413695394.4340925.1431368724949.JavaMail.yahoo@mail.yahoo.com>
Message-ID: <CAAZsQLDjbb6JCiMHJUcHs-9qeeiyp3AbkvHvOy5gKwKKwLSR+Q@mail.gmail.com>

Andrew Barnert, we disagree.
In your example you have no information about whether the error comes from
the domain, the user name or the domain extension...
Writing a big regexp with groups... really? Is it easy to maintain, test
and reuse? And for a novice? Multiply this by thousands of validators and
their respective tests.
I call that a mess, and inside a project I lead I would not accept it.

Even in Haskell people rarely use arrows. I don't criticize this choice, as
arrows come from category theory and we are used to thinking inside ZF set
theory.
Some prefer one syntax over another; there is no single good answer, but this
also means there is no irrelevant answer.
In fact both exist, and choosing between them case by case is never easy.
Thinking the same way about every problem is also wrong, so I will never
pretend to solve every problem with a single lib.

Now I understand this idea is not a priority. I've seen more and more
threads about functional tools, and I regret we can't find a solution, but
this absence of a solution can't convince me to stop digging down
other paths. No disrespect intended.



2015-05-11 20:25 GMT+02:00 Andrew Barnert <abarnert at yahoo.com>:

> On Monday, May 11, 2015 9:15 AM, Gregory Salvan <apieum at gmail.com> wrote:
>
>
> >I don't want to insist and I respect your point of view, I just want to
> give a simplified real life example to show that function composition can
> be less painful than another syntax.
>
> OK, let's compare your example to a Pythonic implementation of the same
> thing.
>
> import re
>
> ruser =
> re.compile("^[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*$")
> rdomain =
> re.compile("^(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$")
> rstrict_user = re.compile("^[a-z0-9][a-z0-9_-]+(?:\.[a-z0-9_-]+)*$")
>
>
> def is_email_address(addr):
>     user, domain = addr.split('@', 1)
>     return ruser.match(user) and rdomain.match(domain)
>
> def is_strict_email_address(addr):
>     user, domain = addr.split('@', 1)
>     return rstrictuser.match(user) and rdomain.match(domain)
>
>
> def is_org_address(addr):
>     return is_email_address(addr) and addr.ends_with('.org')
>
> (An even better solution, given that you're already using regexps, might
> be to just use a single regexp with named groups for the user or
> strict-user, full domain, and TLD? but I've left yours alone.)
>
> Far from being more painful, the Pythonic version is easier to write,
> easier to read, easier to debug, shorter, and understandable to even a
> novice, without having to rewrite anything in your head. It also handles
> invalid input by returning failure values and/or raising appropriate
> exceptions rather than asserting and exiting. And it's almost certainly
> going to be significantly more efficient. And it works with any string-like
> type (that is, any type that has a .split method and works with re.match).
> And if you have to debug something, you will have, e.g., values named user
> and domain, rather than both being named value at different levels on the
> call stack.
>
> If you really want to come up with a convincing example for your idea, I'd
> take an example out of Learn You a Haskell or another book or tutorial and
> translate that to Python with your library. I suspect it would still have
> some of the same problems, but this example wouldn't even really be good in
> Haskell, so it's just making it harder to see why anyone would want
> anything like it. And by offering this as the response to Guido's "You're
> never going to convince me," well, if he _was_ still reading this thread
> with an open mind, he probably isn't anymore (although, to be honest, he
> probably wasn't reading it anyway).
>
> >import re
> >
> >from lawvere import curry # curry is an arrow without type checking,
> inherits composition, multiple dispatch
> >
> >user_match =
> re.compile("^[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*$").match
> >domain_match =
> re.compile("^(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$").match
> >strict_user_match =
> re.compile("^[a-z0-9][a-z0-9_-]+(?:\.[a-z0-9_-]+)*$").match
> >
> >@curry
> >def is_string(value):
> >    assert isinstance(value, str), '%s is not a string' %value
> >    return value
> >
> >@curry
> >def apply_until_char(func, char, value):
> >    func(value[:value.index(char)])
> >    return value
> >
> >@curry
> >def apply_from_char(func, char, value):
> >    func(value[value.index(char) + 1:])
> >    return value
> >
> >@curry
> >
> >def has_char(char, value):
> >    assert value.count(char) == 1
> >    return value
> >
> >@curry
> >def assert_ends_with(text, value):
> >    assert value.endswith(text), '%s does not end with %s' % (value, text)
> >    return value
> >
> >@curry
> >def assert_user(user):
> >    assert user_match(user) is not None, '%s is not a valid user name' %
> value
> >    return user
> >
> >@curry
> >def assert_strict_user(user):
> >    assert strict_user_match(user) is not None, '%s is not a valid strict
> user' % value
> >    return user
> >
> >@curry
> >def assert_domain(domain):
> >    assert domain_match(domain) is not None, '%s is not a valid domain
> name' % value
> >    return domain
> >
> ># currying (be made with partial)
> >
> >has_user = apply_until_char(assert_user, '@')
> >
> >has_strict_user = apply_until_char(assert_strict_user, '@')
> >
> >has_domain = apply_from_char(assert_domain, '@')
> >
> >
> ># composition:
> >
> >is_email_address = is_string >> has_char('@') >> has_user >> has_domain
> >
> >is_strict_email_address = is_string >> has_char('@') >> has_strict_user
> >> has_domain
> >
> >
> ># we just want org adresses ?
> >
> >is_org_addess = is_email_address >> assert_ends_with('.org')
> >
> >
> >
> >
> >I found a lot of interest in this syntax, mainly for testing purpose,
> readability and maintenability of code.
> >
> >No matters if I'm a fish out of python waters. :)
> >
> >
> >
> >
> >
> >
> >
> >
> >2015-05-11 16:41 GMT+02:00 Guido van Rossum <guido at python.org>:
> >
> >As long as I'm "in charge" the chances of this (or anything like it)
> being accepted into Python are zero. I get a headache when I try to
> understand code that uses function composition, and I end up having to
> laboriously rewrite it using more traditional call notation before I move
> on to understanding what it actually does. Python is not Haskell, and
> perhaps more importantly, Python users are not like Haskell users. Either
> way, what may work out beautifully in Haskell will be like a fish out of
> water in Python.
> >>
> >>I understand that it's fun to try to solve this puzzle, but evolving
> Python is more than solving puzzles. Enjoy debating the puzzle, but in the
> end Python will survive without the solution.
> >>
> >>
> >>
> >>--
> >>
> >>--Guido van Rossum (python.org/~guido)
> >>_______________________________________________
> >>Python-ideas mailing list
> >>Python-ideas at python.org
> >>https://mail.python.org/mailman/listinfo/python-ideas
> >>Code of Conduct: http://python.org/psf/codeofconduct/
> >>
> >
> >
> >_______________________________________________
> >Python-ideas mailing list
> >Python-ideas at python.org
> >https://mail.python.org/mailman/listinfo/python-ideas
> >Code of Conduct: http://python.org/psf/codeofconduct/
> >
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150511/4e0bb57a/attachment-0001.html>

From apieum at gmail.com  Mon May 11 23:59:56 2015
From: apieum at gmail.com (Gregory Salvan)
Date: Mon, 11 May 2015 23:59:56 +0200
Subject: [Python-ideas] Partial operator (and 'third-party methods' and
 'piping') [was Re: Function composition (was no subject)]
In-Reply-To: <CAAZsQLDjbb6JCiMHJUcHs-9qeeiyp3AbkvHvOy5gKwKKwLSR+Q@mail.gmail.com>
References: <CAAZsQLAX-uwr9W0SLg62xmmR_ypfM=JOK44BymFjcd5fueyJwA@mail.gmail.com>
 <1413695394.4340925.1431368724949.JavaMail.yahoo@mail.yahoo.com>
 <CAAZsQLDjbb6JCiMHJUcHs-9qeeiyp3AbkvHvOy5gKwKKwLSR+Q@mail.gmail.com>
Message-ID: <CAAZsQLAKY8_B4vFAz8AHKf_wjAUM2bzpTrqZ4EN3zmdbBWfmRQ@mail.gmail.com>

In case you haven't seen how it cuts down the volume of code you need to
write, here are the tests for "is_email_address":

# What's an email address ?
def test_it_is_a_string(self):
    assert is_string in is_email_address

def test_it_has_a_user_name(self):
    assert has_user in is_email_address

def test_it_contains_at(self):
    assert has_char('@') in is_email_address

def test_it_has_a_domain_name(self):
    assert has_domain in is_email_address

# answer: an email address is a string with a user name, the char '@' and
# a domain name.

@Terry Reedy: with a class you'd have to write more tests and abuse
inheritance.


2015-05-11 21:44 GMT+02:00 Gregory Salvan <apieum at gmail.com>:

> Andrew Barnert, we disagree.
> In your example you have no information about whether the error comes
> from the domain, the user name or the domain extension...
> Writing a big regexp with groups... really? Is that easy to maintain,
> test and reuse? And for a novice? Multiply this by thousands of
> validators and their respective tests.
> I call that a mess, and inside a project I lead I would not accept it.
>
> Even in Haskell people rarely use arrows. I don't criticize this choice,
> as arrows come from category theory and we are used to thinking inside
> ZF set theory.
> Some prefer one syntax over another; there is no single good answer, but
> this also means there is no irrelevant answer.
> In fact both exist, and choosing between them is never easy. Thinking
> the same way about every problem is also wrong, so I will never pretend
> to resolve every problem with a single lib.
>
> Now I understand this idea is not a priority. I've seen more and more
> threads about functional tools, and I regret we can't find a solution,
> but the absence of one won't convince me to stop exploring other paths.
> No disrespect intended.
>
>
>
> 2015-05-11 20:25 GMT+02:00 Andrew Barnert <abarnert at yahoo.com>:
>
>> [snip: Andrew Barnert's message, quoted in full above]
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150511/896d7d30/attachment.html>

From apieum at gmail.com  Tue May 12 00:12:21 2015
From: apieum at gmail.com (Gregory Salvan)
Date: Tue, 12 May 2015 00:12:21 +0200
Subject: [Python-ideas] Partial operator (and 'third-party methods' and
 'piping') [was Re: Function composition (was no subject)]
In-Reply-To: <CAAZsQLAKY8_B4vFAz8AHKf_wjAUM2bzpTrqZ4EN3zmdbBWfmRQ@mail.gmail.com>
References: <CAAZsQLAX-uwr9W0SLg62xmmR_ypfM=JOK44BymFjcd5fueyJwA@mail.gmail.com>
 <1413695394.4340925.1431368724949.JavaMail.yahoo@mail.yahoo.com>
 <CAAZsQLDjbb6JCiMHJUcHs-9qeeiyp3AbkvHvOy5gKwKKwLSR+Q@mail.gmail.com>
 <CAAZsQLAKY8_B4vFAz8AHKf_wjAUM2bzpTrqZ4EN3zmdbBWfmRQ@mail.gmail.com>
Message-ID: <CAAZsQLBEk=rzcz87NfLe4kEviNuQBmE9rJ_2_QhsgL+9772eWA@mail.gmail.com>

Sorry, I forgot the fun part: the more code you compose, the fewer tests
you have to write.

# what's a strict email address:
def test_it_is_an_email_address_with_a_strict_user_name(self):
    assert (is_email_address.replace(has_user, has_strict_user)
            == is_strict_email_address)
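For readers who haven't seen the library, here is a minimal, self-contained
sketch of how a composition object could support the operations used in the
tests in this thread (">>", "in", "==" and ".replace"). This is a toy
illustration only, not the actual lawvere API; the Composable class and the
validator functions below are invented for the example:

```python
class Composable:
    """A pipeline of functions applied left to right."""

    def __init__(self, *steps):
        self.steps = steps

    def __call__(self, value):
        for step in self.steps:
            value = step(value)
        return value

    def __rshift__(self, other):
        # f >> g builds a new pipeline; flatten so steps stay comparable.
        other = other if isinstance(other, Composable) else Composable(other)
        return Composable(*(self.steps + other.steps))

    def __contains__(self, item):
        # "step in pipeline" checks for a consecutive run of steps.
        items = item.steps if isinstance(item, Composable) else (item,)
        n = len(items)
        return any(self.steps[i:i + n] == items
                   for i in range(len(self.steps) - n + 1))

    def __eq__(self, other):
        return isinstance(other, Composable) and self.steps == other.steps

    def replace(self, old, new):
        # Return a new pipeline with the `old` step(s) swapped for `new`.
        olds = old.steps if isinstance(old, Composable) else (old,)
        news = new.steps if isinstance(new, Composable) else (new,)
        steps, i = [], 0
        while i < len(self.steps):
            if self.steps[i:i + len(olds)] == olds:
                steps.extend(news)
                i += len(olds)
            else:
                steps.append(self.steps[i])
                i += 1
        return Composable(*steps)


# Toy validators standing in for the ones discussed in the thread.
@Composable
def is_string(value):
    assert isinstance(value, str), '%s is not a string' % value
    return value

@Composable
def has_at(value):
    assert value.count('@') == 1
    return value

@Composable
def has_user(value):
    assert value.split('@')[0]
    return value

@Composable
def has_strict_user(value):
    assert value.split('@')[0].islower()
    return value

@Composable
def has_domain(value):
    assert '.' in value.split('@')[1]
    return value


is_email_address = is_string >> has_at >> has_user >> has_domain
is_strict_email_address = is_string >> has_at >> has_strict_user >> has_domain

assert has_user in is_email_address
assert is_email_address('bob@example.org') == 'bob@example.org'
assert (is_email_address.replace(has_user, has_strict_user)
        == is_strict_email_address)
```

Because ">>" flattens both sides into one tuple of steps, "in", "==" and
".replace" all reduce to plain tuple operations, which is what makes the
membership-style tests above possible.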

2015-05-11 23:59 GMT+02:00 Gregory Salvan <apieum at gmail.com>:

> [snip: earlier messages in this thread, quoted in full above]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150512/b00c70da/attachment-0001.html>

From ncoghlan at gmail.com  Tue May 12 04:15:44 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 12 May 2015 12:15:44 +1000
Subject: [Python-ideas] Partial operator (and 'third-party methods' and
 'piping') [was Re: Function composition (was no subject)]
In-Reply-To: <CAAZsQLBEk=rzcz87NfLe4kEviNuQBmE9rJ_2_QhsgL+9772eWA@mail.gmail.com>
References: <CAAZsQLAX-uwr9W0SLg62xmmR_ypfM=JOK44BymFjcd5fueyJwA@mail.gmail.com>
 <1413695394.4340925.1431368724949.JavaMail.yahoo@mail.yahoo.com>
 <CAAZsQLDjbb6JCiMHJUcHs-9qeeiyp3AbkvHvOy5gKwKKwLSR+Q@mail.gmail.com>
 <CAAZsQLAKY8_B4vFAz8AHKf_wjAUM2bzpTrqZ4EN3zmdbBWfmRQ@mail.gmail.com>
 <CAAZsQLBEk=rzcz87NfLe4kEviNuQBmE9rJ_2_QhsgL+9772eWA@mail.gmail.com>
Message-ID: <CADiSq7c3O9Dy5obT_ec-5JjJjyPUNfH5An-m=791GPDZWwyJHw@mail.gmail.com>

On 12 May 2015 at 08:12, Gregory Salvan <apieum at gmail.com> wrote:
> Sorry the fun part: the more you write code the less you have to write
> tests.

I think this is the key for the folks hoping to make the case for
increased support for function composition in the future (it's
definitely too late in the cycle for 3.5): focus on the *pragmatic*
benefits in testability, and argue that this makes up for the *loss*
of readability. "It's easier to read" is *not* a true statement for
anyone that hasn't already learned to think functionally, and "It is
worth your while to learn to think functionally, even if it takes you
years" is a very *different* statement.

The human brain tends to think procedurally by default (presumably
because our stream of consciousness is typically experienced as a
linear series of events), while object oriented programming can
benefit from analogies with physical objects (especially when taught
via robotics or other embodied systems), and message passing based
concurrent systems can benefit from analogies with human
communications. By contrast, there aren't any easy "interaction with
the physical world" analogies to draw on for functional programming,
so it takes extensive training and practice to teach people to think
in functional terms. Folks with a strong mathematical background
(especially in formal mathematical proofs) often already have that
training (even if they're only novice programmers), while the vast
majority of software developers (even professional ones), don't.

As a result, I think the more useful perspective to take is the one
taken for the PEP 484 type hinting PEP: positioning function
composition as an advanced tool for providing increased correctness
guarantees for critical components by building them up from
independently tested composable parts, rather than relying on ad hoc
procedural logic that may itself be a source of bugs. Aside from more
accurately reflecting the appropriate role of function composition in
Pythonic development (i.e. as a high barrier to entry technique that
is nevertheless sometimes worth the additional conceptual complexity,
akin to deciding to use metaclasses to solve a problem), it's also
likely to prove beneficial that Guido's recently been on the other
side of this kind of argument when it comes to both type hinting in
PEP 484 and async/await in PEP 492. I assume he'll still remain
skeptical of the value of the trade-off when it comes to further
improvements to Python's functional programming support, but at least
he'll be familiar with the form of the argument :)

On the "pragmatic benefits in testability" front, I believe one key
tool to focus on is the QuickCheck test case generator
(https://wiki.haskell.org/Introduction_to_QuickCheck1) which lets the
test generator take care of determining appropriate boundary
conditions to check based on a specification of the desired externally
visible behaviour of a function, rather than relying on the developer
to manually specify those boundary conditions as particular test
cases.
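The core idea can be sketched in a few lines of plain Python (a toy
illustration only, with an invented `for_all` helper; real tools like
QuickCheck, pyqcy or Hypothesis add input shrinking, richer data
generators and test-runner integration):

```python
import random


def for_all(gen, prop, trials=200, seed=0):
    """Check that prop(x) holds for `trials` randomly generated inputs.

    Returns the first counterexample found, or None if the property held
    on every trial. A fixed seed keeps the check deterministic.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x = gen(rng)
        if not prop(x):
            return x  # hand back a concrete counterexample
    return None


# Generator: random pairs of smallish integers.
def pairs(rng):
    return (rng.randint(-1000, 1000), rng.randint(-1000, 1000))


# A property that holds: addition commutes.
assert for_all(pairs, lambda p: p[0] + p[1] == p[1] + p[0]) is None

# A property that fails: subtraction does not commute, and instead of a
# bare "test failed" the checker reports the input that broke it.
counter = for_all(pairs, lambda p: p[0] - p[1] == p[1] - p[0])
assert counter is not None
```

The developer states only the desired behaviour; the checker takes care
of choosing inputs, which is the shift in testing style being described.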

I personally learned about that approach earlier this year through a
talk that Fraser Tweedale gave at LCA in January:
https://speakerdeck.com/frasertweedale/the-best-test-data-is-random-test-data
& https://www.youtube.com/watch?v=p7oRMB5V2kE

For Python, Fraser pointed out http://xion.io/pyqcy/ and Google tells
me there's also https://pypi.python.org/pypi/pytest-quickcheck

Gary Bernhardt's work is also worth exploring, including the
"Functional Core, Imperative Shell" model discussed in his
"Boundaries" presentation
(https://www.youtube.com/watch?v=yTkzNHF6rMs) a few years back (an
implementation of this approach is available for Python at
https://pypi.python.org/pypi/nonobvious/). His closing keynote
presentation at PyCon this year was also relevant (relating to the
differences between the assurances that testing can provide vs those
offered by powerful type systems like Idris), but unfortunately not
available online.

Andrew's recommendation to "approach via NumPy" is also a good one.
Scientific programmers tend to be much better mathematicians than
other programmers (and hence more likely to appreciate the value of
development techniques based on function composition), and the rapid
acceptance of the matrix multiplication PEP shows the scientific
Python community have also become quite skilled at making the case to
python-dev for new language level features of interest to them :)

Regards,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From nicholas.chammas at gmail.com  Tue May 12 04:23:43 2015
From: nicholas.chammas at gmail.com (Nicholas Chammas)
Date: Tue, 12 May 2015 02:23:43 +0000
Subject: [Python-ideas] Partial operator (and 'third-party methods' and
 'piping') [was Re: Function composition (was no subject)]
In-Reply-To: <CADiSq7c3O9Dy5obT_ec-5JjJjyPUNfH5An-m=791GPDZWwyJHw@mail.gmail.com>
References: <CAAZsQLAX-uwr9W0SLg62xmmR_ypfM=JOK44BymFjcd5fueyJwA@mail.gmail.com>
 <1413695394.4340925.1431368724949.JavaMail.yahoo@mail.yahoo.com>
 <CAAZsQLDjbb6JCiMHJUcHs-9qeeiyp3AbkvHvOy5gKwKKwLSR+Q@mail.gmail.com>
 <CAAZsQLAKY8_B4vFAz8AHKf_wjAUM2bzpTrqZ4EN3zmdbBWfmRQ@mail.gmail.com>
 <CAAZsQLBEk=rzcz87NfLe4kEviNuQBmE9rJ_2_QhsgL+9772eWA@mail.gmail.com>
 <CADiSq7c3O9Dy5obT_ec-5JjJjyPUNfH5An-m=791GPDZWwyJHw@mail.gmail.com>
Message-ID: <CAOhmDzfojQJwuVzLytiL-oabCGSAa1e9ucyaK3VvtF4mJP4Bwg@mail.gmail.com>

> For Python, Fraser pointed out http://xion.io/pyqcy/ and Google tells me
there's also https://pypi.python.org/pypi/pytest-quickcheck

Don't forget about the latest and greatest Python library for
property-based testing, Hypothesis
<http://hypothesis.readthedocs.org/en/latest/>!

On Mon, May 11, 2015 at 10:16 PM Nick Coghlan <ncoghlan at gmail.com> wrote:

> On 12 May 2015 at 08:12, Gregory Salvan <apieum at gmail.com> wrote:
> > Sorry the fun part: the more you write code the less you have to write
> > tests.
>
> I think this is the key for the folks hoping to make the case for
> increased support for function composition in the future (it's
> definitely too late in the cycle for 3.5): focus on the *pragmatic*
> benefits in testability, and argue that this makes up for the *loss*
> of readability. "It's easier to read" is *not* a true statement for
> anyone that hasn't already learned to think functionally, and "It is
> worth your while to learn to think functionally, even if it takes you
> years" is a very *different* statement.
>
> The human brain tends to think procedurally by default (presumably
> because our stream of consciousness is typically experienced as a
> linear series of events), while object oriented programming can
> benefit from analogies with physical objects (especially when taught
> via robotics or other embodied systems), and message passing based
> concurrent systems can benefit from analogies with human
> communications. By contrast, there aren't any easy "interaction with
> the physical world" analogies to draw on for functional programming,
> so it takes extensive training and practice to teach people to think
> in functional terms. Folks with a strong mathematical background
> (especially in formal mathematical proofs) often already have that
> training (even if they're only novice programmers), while the vast
> majority of software developers (even professional ones), don't.
>
> As a result, I think the more useful perspective to take is the one
> taken for the PEP 484 type hinting PEP: positioning function
> composition as an advanced tool for providing increased correctness
> guarantees for critical components by building them up from
> independently tested composable parts, rather than relying on ad hoc
> procedural logic that may itself be a source of bugs. Aside from more
> accurately reflecting the appropriate role of function composition in
> Pythonic development (i.e. as a high barrier to entry technique that
> is nevertheless sometimes worth the additional conceptual complexity,
> akin to deciding to use metaclasses to solve a problem), it's also
> likely to prove beneficial that Guido's recently been on the other
> side of this kind of argument when it comes to both type hinting in
> PEP 484 and async/await in PEP 492. I assume he'll still remain
> skeptical of the value of the trade-off when it comes to further
> improvements to Python's functional programming support, but at least
> he'll be familiar with the form of the argument :)
>
> On the "pragmatic benefits in testability" front, I believe one key
> tool to focus on is the Quick Check test case generator
> (https://wiki.haskell.org/Introduction_to_QuickCheck1) which lets the
> test generator take care of determining appropriate boundary
> conditions to check based on a specification of the desired externally
> visible behaviour of a function, rather than relying on the developer
> to manually specify those boundary conditions as particular test
> cases.
>
> I personally learned about that approach earlier this year through a
> talk that Fraser Tweedale gave at LCA in January:
>
> https://speakerdeck.com/frasertweedale/the-best-test-data-is-random-test-data
> https://www.youtube.com/watch?v=p7oRMB5V2kE
>
> For Python, Fraser pointed out http://xion.io/pyqcy/ and Google tells
> me there's also https://pypi.python.org/pypi/pytest-quickcheck
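The core QuickCheck idea fits in a few lines of plain Python: generate many random inputs and check a stated property against each. A minimal sketch (not the API of any of the libraries above):

```python
import random

def for_all(gen, prop, trials=200):
    """Check prop(x) over many generated inputs; return a counterexample or None."""
    for _ in range(trials):
        x = gen()
        if not prop(x):
            return x  # the property fails here
    return None

# Property: reversing a list twice gives back the original list.
gen_list = lambda: [random.randint(-100, 100)
                    for _ in range(random.randint(0, 10))]
prop_reverse = lambda xs: list(reversed(list(reversed(xs)))) == xs

assert for_all(gen_list, prop_reverse) is None
```

The real libraries add the important extra step of "shrinking" a found counterexample down to a minimal failing case.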
>
> Gary Bernhardt's work is also worth exploring, including the
> "Functional Core, Imperative Shell" model discussed in his
> "Boundaries" presentation
> (https://www.youtube.com/watch?v=yTkzNHF6rMs) a few years back (an
> implementation of this approach is available for Python at
> https://pypi.python.org/pypi/nonobvious/). His closing keynote
> presentation at PyCon this year was also relevant (relating to the
> differences between the assurances that testing can provide vs those
> offered by powerful type systems like Idris), but unfortunately not
> available online.
>
> Andrew's recommendation to "approach via NumPy" is also a good one.
> Scientific programmers tend to be much better mathematicians than
> other programmers (and hence more likely to appreciate the value of
> development techniques based on function composition), and the rapid
> acceptance of the matrix multiplication PEP shows the scientific
> Python community have also become quite skilled at making the case to
> python-dev for new language level features of interest to them :)
>
> Regards,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150512/3eb08be8/attachment.html>

From rustompmody at gmail.com  Tue May 12 07:06:31 2015
From: rustompmody at gmail.com (Rustom Mody)
Date: Tue, 12 May 2015 10:36:31 +0530
Subject: [Python-ideas] Partial operator (and 'third-party methods' and
 'piping') [was Re: Function composition (was no subject)]
In-Reply-To: <CAP7+vJJsRUD2_nA_NCQuB2smteYGiEbUbOyt=z-XyLCosTpp1g@mail.gmail.com>
References: <554C5FC0.1070106@aalto.fi>
 <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com> <mil9lv$hi2$1@ger.gmane.org>
 <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info>
 <554E5CC9.3010406@aalto.fi>
 <CAAZsQLDiVJ_d6Pp1N9iiDpXL=z3cSWx-eBdWzOK4r9tDJ2zRCg@mail.gmail.com>
 <10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com>
 <14232_1431212854_554E9336_14232_216_1_554E9327.9030706@aalto.fi>
 <554FBA3D.30907@aalto.fi> <20150511014412.GL5663@ando.pearwood.info>
 <512626e077e2445f887cfbf638c4d7ca@swordfish.abiresearch.com>
 <CAP7+vJJsRUD2_nA_NCQuB2smteYGiEbUbOyt=z-XyLCosTpp1g@mail.gmail.com>
Message-ID: <CAJ+TeofChUowoPz=FA9oVQdJ0ry_tJ8HzX2qAJ-=ZgEeKgb=+w@mail.gmail.com>

On Mon, May 11, 2015 at 8:11 PM, Guido van Rossum <guido at python.org> wrote:

>
>
> As long as I'm "in charge" the chances of this (or anything like it) being
> accepted into Python are zero. I get a headache when I try to understand
> code that uses function composition,
>

I find it piquant to see this comment from the creator of a language that
traces its lineage to Lambert Meertens :-)
[Was reading one of the classics just yesterday
http://www.kestrel.edu/home/people/meertens/publications/papers/Algorithmics.pdf
]
Personally, yeah, I don't think Python blindly morphing into Haskell is a
neat idea.
In the specific case of composition, my position is...

sqrt(mean(square(x)))
is ugly in a lispy way

(sqrt @ mean @ square)(x)
is backward in one way

(square @ mean @ sqrt)(x)
is backward in another way

sqrt @ mean @ square
is neat for being point-free and reads easy like a Unix '|' but the '@' is
more strikingly ugly

sqrt o mean o square
is a parsing nightmare

square ∘ mean ∘ root
Just right!  [Assuming the unicode gods favor its transmission!]

...hopefully not too frivolous to say this but the ugliness of @ overrides
the succinctness of the math for me
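For what it's worth, nothing stops anyone experimenting with this today via a small wrapper. A sketch (not a proposal) that overloads `__matmul__` for right-to-left composition:

```python
class Composable:
    """Wrap a callable so that (f @ g)(x) == f(g(x)) -- a sketch only."""
    def __init__(self, fn):
        self.fn = fn

    def __call__(self, *args, **kwargs):
        return self.fn(*args, **kwargs)

    def __matmul__(self, other):
        # f @ g composes right-to-left: apply `other` first, then self.
        return Composable(lambda *a, **k: self.fn(other(*a, **k)))

square = Composable(lambda xs: [x * x for x in xs])
mean = Composable(lambda xs: sum(xs) / len(xs))
sqrt = Composable(lambda x: x ** 0.5)

rms = sqrt @ mean @ square  # point-free root-mean-square
```

`@` is left-associative, so `sqrt @ mean` composes first; the result is the same either way because composition is associative.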

From flying-sheep at web.de  Tue May 12 10:36:22 2015
From: flying-sheep at web.de (Philipp A.)
Date: Tue, 12 May 2015 08:36:22 +0000
Subject: [Python-ideas] Partial operator (and 'third-party methods' and
 'piping') [was Re: Function composition (was no subject)]
In-Reply-To: <CAJ+TeofChUowoPz=FA9oVQdJ0ry_tJ8HzX2qAJ-=ZgEeKgb=+w@mail.gmail.com>
References: <554C5FC0.1070106@aalto.fi>
 <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com> <mil9lv$hi2$1@ger.gmane.org>
 <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info>
 <554E5CC9.3010406@aalto.fi>
 <CAAZsQLDiVJ_d6Pp1N9iiDpXL=z3cSWx-eBdWzOK4r9tDJ2zRCg@mail.gmail.com>
 <10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com>
 <14232_1431212854_554E9336_14232_216_1_554E9327.9030706@aalto.fi>
 <554FBA3D.30907@aalto.fi> <20150511014412.GL5663@ando.pearwood.info>
 <512626e077e2445f887cfbf638c4d7ca@swordfish.abiresearch.com>
 <CAP7+vJJsRUD2_nA_NCQuB2smteYGiEbUbOyt=z-XyLCosTpp1g@mail.gmail.com>
 <CAJ+TeofChUowoPz=FA9oVQdJ0ry_tJ8HzX2qAJ-=ZgEeKgb=+w@mail.gmail.com>
Message-ID: <CAN8d9gkWfskTaD-eNrMktfyT=jVyVNWYsqqOERsH7vcj+3w4Ug@mail.gmail.com>

ha, i love unicode operators (e.g. in scala), but i think guido said python
will stay ASCII.

i hope we one day gain the ability to *optionally* use unicode
alternatives, even if that would put an end to our __matmul__ → function
combination aspirations:

* → ⋅
@ → × (not ⋅)
/ → ÷
... → …
lambda → λ

– phil

Rustom Mody <rustompmody at gmail.com> schrieb am Di., 12. Mai 2015 um
07:07 Uhr:

> On Mon, May 11, 2015 at 8:11 PM, Guido van Rossum <guido at python.org>
> wrote:
>
>>
>>
>> As long as I'm "in charge" the chances of this (or anything like it)
>> being accepted into Python are zero. I get a headache when I try to
>> understand code that uses function composition,
>>
>
> I find it piquant to see this comment from the creator of a language that
> traces its lineage to Lambert Meertens :-)
> [Was reading one of the classics just yesterday
>
> http://www.kestrel.edu/home/people/meertens/publications/papers/Algorithmics.pdf
> ]
> Personally, yeah I dont think python blindly morphing into haskell is a
> neat idea
> In the specific case of composition my position is...
>
> sqrt(mean(square(x)))
> is ugly in a lispy way
>
> (sqrt @ mean @ square)(x)
> is backward in one way
>
> (square @ mean @ sqrt)(x)
> is backward in another way
>
> sqrt @ mean @ square
> is neat for being point-free and reads easy like a Unix '|' but the '@' is
> more strikingly ugly
>
> sqrt o mean o square
> is a parsing nightmare
>
> square ∘ mean ∘ root
> Just right!  [Assuming the unicode gods favor its transmission!]
>
> ...hopefully not too frivolous to say this but the ugliness of @ overrides
> the succinctness of the math for me
>
>
>  _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From jmcs at jsantos.eu  Tue May 12 11:01:01 2015
From: jmcs at jsantos.eu (=?UTF-8?B?Sm/Do28gU2FudG9z?=)
Date: Tue, 12 May 2015 09:01:01 +0000
Subject: [Python-ideas] Partial operator (and 'third-party methods' and
 'piping') [was Re: Function composition (was no subject)]
In-Reply-To: <CAN8d9gkWfskTaD-eNrMktfyT=jVyVNWYsqqOERsH7vcj+3w4Ug@mail.gmail.com>
References: <554C5FC0.1070106@aalto.fi>
 <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com> <mil9lv$hi2$1@ger.gmane.org>
 <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info>
 <554E5CC9.3010406@aalto.fi>
 <CAAZsQLDiVJ_d6Pp1N9iiDpXL=z3cSWx-eBdWzOK4r9tDJ2zRCg@mail.gmail.com>
 <10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com>
 <14232_1431212854_554E9336_14232_216_1_554E9327.9030706@aalto.fi>
 <554FBA3D.30907@aalto.fi> <20150511014412.GL5663@ando.pearwood.info>
 <512626e077e2445f887cfbf638c4d7ca@swordfish.abiresearch.com>
 <CAP7+vJJsRUD2_nA_NCQuB2smteYGiEbUbOyt=z-XyLCosTpp1g@mail.gmail.com>
 <CAJ+TeofChUowoPz=FA9oVQdJ0ry_tJ8HzX2qAJ-=ZgEeKgb=+w@mail.gmail.com>
 <CAN8d9gkWfskTaD-eNrMktfyT=jVyVNWYsqqOERsH7vcj+3w4Ug@mail.gmail.com>
Message-ID: <CAH_XWH0u6xyVTZx0TUkcnJzoNxu6_kcwE-9VnG9VhzCW+RU62A@mail.gmail.com>

Python already supports unicode operators (kind of). You just have to use a
custom codec that translates the unicode characters to proper python.

On Tue, 12 May 2015 at 10:42 Philipp A. <flying-sheep at web.de> wrote:

> ha, i love unicode operators (e.g. in scala), but i think guido said
> python will stay ASCII.
>
> i hope we one day gain the ability to *optionally* use unicode
> alternatives, even if that would put an end to our __matmul__ → function
> combination aspirations:
>
> * → ⋅
> @ → × (not ⋅)
> / → ÷
> ... → …
> lambda → λ
>
> – phil
>
> Rustom Mody <rustompmody at gmail.com> schrieb am Di., 12. Mai 2015 um
> 07:07 Uhr:
>
>> On Mon, May 11, 2015 at 8:11 PM, Guido van Rossum <guido at python.org>
>> wrote:
>>
>>>
>>>
>>> As long as I'm "in charge" the chances of this (or anything like it)
>>> being accepted into Python are zero. I get a headache when I try to
>>> understand code that uses function composition,
>>>
>>
>> I find it piquant to see this comment from the creator of a language that
>> traces its lineage to Lambert Meertens :-)
>> [Was reading one of the classics just yesterday
>>
>> http://www.kestrel.edu/home/people/meertens/publications/papers/Algorithmics.pdf
>> ]
>> Personally, yeah I dont think python blindly morphing into haskell is a
>> neat idea
>> In the specific case of composition my position is...
>>
>> sqrt(mean(square(x)))
>> is ugly in a lispy way
>>
>> (sqrt @ mean @ square)(x)
>> is backward in one way
>>
>> (square @ mean @ sqrt)(x)
>> is backward in another way
>>
>> sqrt @ mean @ square
>> is neat for being point-free and reads easy like a Unix '|' but the '@'
>> is more strikingly ugly
>>
>> sqrt o mean o square
>> is a parsing nightmare
>>
>> square ∘ mean ∘ root
>> Just right!  [Assuming the unicode gods favor its transmission!]
>>
>> ...hopefully not too frivolous to say this but the ugliness of @
>> overrides the succinctness of the math for me
>>
>>
>>  _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From spencerb21 at live.com  Tue May 12 11:30:55 2015
From: spencerb21 at live.com (Spencer Brown)
Date: Tue, 12 May 2015 19:30:55 +1000
Subject: [Python-ideas] Partial operator (and 'third-party methods' and
	'piping') [was Re: Function composition (was no subject)]
In-Reply-To: <CAH_XWH0u6xyVTZx0TUkcnJzoNxu6_kcwE-9VnG9VhzCW+RU62A@mail.gmail.com>
References: <554C5FC0.1070106@aalto.fi>
 <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com> <mil9lv$hi2$1@ger.gmane.org>
 <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info>
 <554E5CC9.3010406@aalto.fi>
 <CAAZsQLDiVJ_d6Pp1N9iiDpXL=z3cSWx-eBdWzOK4r9tDJ2zRCg@mail.gmail.com>
 <10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com>
 <14232_1431212854_554E9336_14232_216_1_554E9327.9030706@aalto.fi>
 <554FBA3D.30907@aalto.fi> <20150511014412.GL5663@ando.pearwood.info>
 <512626e077e2445f887cfbf638c4d7ca@swordfish.abiresearch.com>
 <CAP7+vJJsRUD2_nA_NCQuB2smteYGiEbUbOyt=z-XyLCosTpp1g@mail.gmail.com>
 <CAJ+TeofChUowoPz=FA9oVQdJ0ry_tJ8HzX2qAJ-=ZgEeKgb=+w@mail.gmail.com>
 <CAN8d9gkWfskTaD-eNrMktfyT=jVyVNWYsqqOERsH7vcj+3w4Ug@mail.gmail.com>
 <CAH_XWH0u6xyVTZx0TUkcnJzoNxu6_kcwE-9VnG9VhzCW+RU62A@mail.gmail.com>
Message-ID: <SNT405-EAS2220BA0CAAA916F815CDCF0BEDA0@phx.gbl>

It might be neat to be able to use the superscript and subscript number glyphs for exponentiation and indexing, so 'x₀, x₁, x₂' == 'x[0], x[1], x[2]' and 'some_var³⁵·¹' == 'some_var ** 35.1'. (That probably shouldn't support anything other than numbers and '.' to keep things simple). There's also the comparison operators (≠, ≤, ≥, ≈, ≡), '∈' for in and perhaps even additional overloads for sets (∪, ∩, ⊂, ⊃, ⊆, ⊇, ∆).
Maybe the math module could have a math.π alias as well for people who wish to import it.

- Spencer

> On 12 May 2015, at 7:01 pm, João Santos <jmcs at jsantos.eu> wrote:
> 
> Python already supports unicode operators (kind of). You just have to use a custom codec that translates the unicode characters to proper python.
> 
>> On Tue, 12 May 2015 at 10:42 Philipp A. <flying-sheep at web.de> wrote:
>> ha, i love unicode operators (e.g. in scala), but i think guido said python will stay ASCII.
>> 
>> i hope we one day gain the ability to optionally use unicode alternatives, even if that would put an end to our __matmul__ → function combination aspirations:
>> 
>> * → ⋅
>> @ → × (not ⋅)
>> / → ÷
>> ... → …
>> lambda → λ
>> 
>> – phil
-------------- next part --------------
_______________________________________________
Python-ideas mailing list
Python-ideas at python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

From rustompmody at gmail.com  Tue May 12 13:27:41 2015
From: rustompmody at gmail.com (Rustom Mody)
Date: Tue, 12 May 2015 16:57:41 +0530
Subject: [Python-ideas] Partial operator (and 'third-party methods' and
 'piping') [was Re: Function composition (was no subject)]
In-Reply-To: <CAN8d9gkWfskTaD-eNrMktfyT=jVyVNWYsqqOERsH7vcj+3w4Ug@mail.gmail.com>
References: <554C5FC0.1070106@aalto.fi>
 <874mnm4ftw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <EE278C3F-6E64-4465-952C-56245055F9DE@yahoo.com> <mil9lv$hi2$1@ger.gmane.org>
 <27573_1431195411_554E4F12_27573_2470_1_20150509181642.GB5663@ando.pearwood.info>
 <554E5CC9.3010406@aalto.fi>
 <CAAZsQLDiVJ_d6Pp1N9iiDpXL=z3cSWx-eBdWzOK4r9tDJ2zRCg@mail.gmail.com>
 <10001_1431209016_554E8437_10001_426_1_CAAZsQLCX=9d3n9h0TZ+K2pfaUFiNVCtCahbjMkeEJ6L2WXLZTg@mail.gmail.com>
 <14232_1431212854_554E9336_14232_216_1_554E9327.9030706@aalto.fi>
 <554FBA3D.30907@aalto.fi> <20150511014412.GL5663@ando.pearwood.info>
 <512626e077e2445f887cfbf638c4d7ca@swordfish.abiresearch.com>
 <CAP7+vJJsRUD2_nA_NCQuB2smteYGiEbUbOyt=z-XyLCosTpp1g@mail.gmail.com>
 <CAJ+TeofChUowoPz=FA9oVQdJ0ry_tJ8HzX2qAJ-=ZgEeKgb=+w@mail.gmail.com>
 <CAN8d9gkWfskTaD-eNrMktfyT=jVyVNWYsqqOERsH7vcj+3w4Ug@mail.gmail.com>
Message-ID: <CAJ+Teoc0X7-m94jzegZUwr6hTdJchQ4D-Q4ocOPw54hQ2ZwzcQ@mail.gmail.com>

On Tue, May 12, 2015 at 2:06 PM, Philipp A. <flying-sheep at web.de> wrote:

> ha, i love unicode operators (e.g. in scala), but i think guido said
> python will stay ASCII.
>

Or Julia
http://iaindunning.com/blog/julia-unicode.html

Also Fortress, Agda and the classic APL

Interestingly Haskell is one step ahead of Python in some areas and behind
in others
---------
GHCi, version 7.6.3: http://www.haskell.org/ghc/  :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Prelude> let (x₁, x₂) = (1, 2)
Prelude> (x₁, x₂)
(1,2)
Prelude>

---------

However wrt getting ligatures right python is ahead:
[Haskell]

Prelude> let ﬂag = True
Prelude> flag

<interactive>:5:1: Not in scope: `flag'

[Equivalent of NameError]
-------------
[Python3]

>>> ﬂag = True
>>> flag
True
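That is because Python applies NFKC normalization to identifiers, which folds the 'ﬂ' ligature into plain 'fl' at parse time:

```python
import unicodedata

# Identifier normalization per the language reference: NFKC.
# '\ufb02' is the 'fl' ligature character.
assert unicodedata.normalize('NFKC', '\ufb02ag') == 'flag'
```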

From mistersheik at gmail.com  Wed May 13 08:24:04 2015
From: mistersheik at gmail.com (Neil Girdhar)
Date: Tue, 12 May 2015 23:24:04 -0700 (PDT)
Subject: [Python-ideas] Add math.iszero() and math.isequal()?
In-Reply-To: <4537a315-a08c-4838-8d55-1483ac9656bc@googlegroups.com>
References: <4537a315-a08c-4838-8d55-1483ac9656bc@googlegroups.com>
Message-ID: <85fe56bf-84a1-45b9-84bf-26b2ff389486@googlegroups.com>

See PEP 485, which appears to be still a draft: 
https://www.python.org/dev/peps/pep-0485/

Best,

Neil


On Tuesday, May 12, 2015 at 3:18:47 AM UTC-4, Mark Summerfield wrote:
>
> From Python 3.2 it is easy to compare floats, e.g.,
>
> iszero = lambda x: hash(x) == hash(0)
> isequal = lambda a, b: hash(a) == hash(b)
>
> Clearly these are trivial functions (but perhaps math experts could
> provide better implementations; I'm not proposing the implementations 
> shown, just the functions however they are implemented).
>
> It seems that not everyone is aware of the issues regarding comparing 
> floats for equality and so I still see code that compares floats using == 
> or !=.
>
> If these functions were in the math module it would be convenient (since I 
> find I need them in most non-trivial programs), but also provide a place to 
> document that they should be used rather than == or != for floats. (I guess 
> a similar argument might apply to the cmath module?)
>
>
>
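For comparison, the check PEP 485 proposes (math.isclose() in the draft) is a symmetric relative/absolute tolerance test rather than anything hash-based. Roughly (a sketch of the draft's formula, not the stdlib):

```python
def isclose(a, b, rel_tol=1e-9, abs_tol=0.0):
    """The symmetric test from the PEP 485 draft (sketch, not the stdlib)."""
    return abs(a - b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)

def iszero(x, abs_tol=1e-12):
    # Near zero a relative tolerance is useless, so use an absolute one.
    return isclose(x, 0.0, rel_tol=0.0, abs_tol=abs_tol)
```

With this, `isclose(0.1 + 0.2, 0.3)` is True even though `0.1 + 0.2 == 0.3` is False.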

From rob.cliffe at btinternet.com  Wed May 13 11:53:32 2015
From: rob.cliffe at btinternet.com (Rob Cliffe)
Date: Wed, 13 May 2015 10:53:32 +0100
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <20150507153123.GT5663@ando.pearwood.info>
References: <mi79rj$vl8$1@ger.gmane.org>
 <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <mi8q15$bbp$1@ger.gmane.org>
 <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7eYJgnFuAmbooLdKpyT3KCDfJxNweL3xBWVsRxLeHc97w@mail.gmail.com>
 <554AC2CE.5040705@btinternet.com> <20150507153123.GT5663@ando.pearwood.info>
Message-ID: <55531F1C.2090509@btinternet.com>



On 07/05/2015 16:31, Steven D'Aprano wrote:
>
>> Imagine if we were starting to design the 21st century from scratch,
>> throwing away all the history?  How would we go about it?
> Well, for starters I would insist on re-introducing thorn þ and eth ð
> back into English :-)
>
I'd second that. :-)

Seriously, thanks to everyone who took the trouble to reply to my rant, 
instead of just dismissing it as the ravings of an idiot.
I found your replies quite enlightening.
Rob Cliffe

From random832 at fastmail.us  Wed May 13 16:33:28 2015
From: random832 at fastmail.us (random832 at fastmail.us)
Date: Wed, 13 May 2015 10:33:28 -0400
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <87a8xg3tsc.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <mi79rj$vl8$1@ger.gmane.org>
 <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp>
 <mi8q15$bbp$1@ger.gmane.org> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7eYJgnFuAmbooLdKpyT3KCDfJxNweL3xBWVsRxLeHc97w@mail.gmail.com>
 <554AC2CE.5040705@btinternet.com>
 <3D6D122B-68A4-439E-A875-EBE412AAC31B@yahoo.com>
 <CALGmxEL8GBVtCApFcKoOp9DgH4TPb0GbAqvNH7571jX9H86TKw@mail.gmail.com>
 <87a8xg3tsc.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <1431527608.2421991.267775809.4755A9EE@webmail.messagingengine.com>

On Thu, May 7, 2015, at 18:30, Stephen J. Turnbull wrote:
> Chris Barker writes:
> 
>  > I've read many of the rants about UTF-16, but in fact, it's really
>  > not any worse than UTF-8
> 
> Yes, it is.  It's not ASCII compatible.  You can safely use the usual
> libc string APIs on UTF-8 (except for any that might return only part
> of a string), but not on UTF-16 (nulls).  This is a pretty big
> advantage for UTF-8 in practice.

If you're using libc, why shouldn't you be using the native wide
character types (whether that is UTF-16 or UCS-4) and using the wide
string APIs?

From ncoghlan at gmail.com  Wed May 13 18:22:41 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 14 May 2015 02:22:41 +1000
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <CADiSq7c2_Eqzy0r_tokEvGdVj+7a1THKZ0w+7MPyQUji2KusvQ@mail.gmail.com>
References: <mi79rj$vl8$1@ger.gmane.org>
 <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp>
 <mi8q15$bbp$1@ger.gmane.org>
 <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7eYJgnFuAmbooLdKpyT3KCDfJxNweL3xBWVsRxLeHc97w@mail.gmail.com>
 <87siba3zrf.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7c2_Eqzy0r_tokEvGdVj+7a1THKZ0w+7MPyQUji2KusvQ@mail.gmail.com>
Message-ID: <CADiSq7diFc8CMQPBQeu6ugaZdsbNFYwkHHWgX11dw+9vO+5aFA@mail.gmail.com>

(Note: I've posted to the issue suggesting we defer further
consideration to 3.6, as well as suggesting a new "string.internals"
submodule as a possible home for them, but I'm following up here to
capture my current thinking on the topic)

On 7 May 2015 4:47 pm, "Nick Coghlan" <ncoghlan at gmail.com> wrote:
> Regardless of which specific approach you take, handling surrogates
> explicitly when a string is passed to you from an API that uses
> permissive decoding lets you avoid both unexpected UnicodeEncodeError
> exceptions (if the surrogates end up being encoded with an error
> handler other than surrogatepass or surrogateescape) or propagating
> mojibake (if the surrogates are encoded with a suitable error handler,
> but an encoding that differs from the original).

Considering this rationale further, the key purpose of the proposed
new surrogate handling functions is to take an input string that may
contain surrogate code points, and produce one that is guaranteed
*not* to contain such surrogates (either because they've been removed
or replaced, or because an exception will be thrown if there are any
present in the input). They're designed to let a developer either make
a program eagerly detect improperly decoded data, or else to convert
the surrogates to an encodable form (potentially losing data in the
process).

Three potential expected sources of surrogates have been identified:

* escaped surrogates smuggling arbitrary bytes passed through decoding
by the "surrogateescape" error handler
* surrogates passed through the decoding process by the
"surrogatepass" error handler
* decomposed surrogate pairs for astral characters

The various reasonable "data scrubbing" techniques that have been proposed are:

1. compose surrogate pairs to the corresponding astral code point
2. throw an error for any surrogates found
3. delete any surrogates found
4. replace any surrogates found with the Unicode replacement character
5. replace any surrogates found with their corresponding backslash
escaped sequence
6. as with the preceding, but only for surrogate escaped data, not
arbitrary surrogates

The first of those is handled by the suggested
"compose_surrogate_pairs()", which will convert valid pairs to their
corresponding astral code points.

2-5 are handled by rehandle_surrogatepass(), with the corresponding
decoding error handler (strict, ignore, replace, backslashreplace)
6 is handled by rehandle_surrogateescape(), again with the
corresponding error handlers

A potential downside of this approach of exposing the error handlers
directly as part of the data scrubbing API is that passing in
"surrogateescape" or "surrogatepass" as the error handler may break
the assurance that the output doesn't contain any surrogates (this
could be avoided if those two error handlers don't support str->str
conversions).
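Concretely, the str->str scrubbing can already be approximated today with an encode/decode round trip, which is roughly what the proposed rehandle_* functions would wrap (the function names above are from the proposal; the snippet below is just the existing round-trip idiom):

```python
# A byte that permissive decoding smuggled through as a surrogate:
s = b'caf\xe9'.decode('utf-8', 'surrogateescape')   # 'caf\udce9'

# Technique 4: replace the smuggled byte with U+FFFD.
scrubbed = s.encode('utf-8', 'surrogateescape').decode('utf-8', 'replace')

# Technique 1: compose a surrogate pair back into its astral code point.
pair = '\ud83d\ude00'
astral = pair.encode('utf-16-le', 'surrogatepass').decode('utf-16-le')
```

Passing 'strict' instead of 'replace' in the second step gives technique 2 (an eager UnicodeDecodeError), and 'ignore' or 'backslashreplace' give techniques 3 and 5.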

Anyway, I think we can readily put this question aside for now, and
revisit it again for 3.6 after folks have a chance to get more
experience with some of the other bytes/text handling changes in 3.5.
I created a tracking issue (http://bugs.python.org/issue22555) for
those a while back, and just did a pass through them the other day to
see if there were any I particularly wanted to see make it into 3.5
(all the still open ones ended up in the "wait for other developments
before pursuing further" category).

Cheers,
Nick.

From stephen at xemacs.org  Wed May 13 19:45:15 2015
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 14 May 2015 02:45:15 +0900
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <1431527608.2421991.267775809.4755A9EE@webmail.messagingengine.com>
References: <mi79rj$vl8$1@ger.gmane.org>
 <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp>
 <mi8q15$bbp$1@ger.gmane.org>
 <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7eYJgnFuAmbooLdKpyT3KCDfJxNweL3xBWVsRxLeHc97w@mail.gmail.com>
 <554AC2CE.5040705@btinternet.com>
 <3D6D122B-68A4-439E-A875-EBE412AAC31B@yahoo.com>
 <CALGmxEL8GBVtCApFcKoOp9DgH4TPb0GbAqvNH7571jX9H86TKw@mail.gmail.com>
 <87a8xg3tsc.fsf@uwakimon.sk.tsukuba.ac.jp>
 <1431527608.2421991.267775809.4755A9EE@webmail.messagingengine.com>
Message-ID: <87y4ks1idw.fsf@uwakimon.sk.tsukuba.ac.jp>

random832 at fastmail.us writes:

 > If you're using libc, why shouldn't you be using the native wide
 > character types (whether that it UTF-16 or UCS-4) and using the wide
 > string APIs?

Who says you are using libc?  You might be writing an operating system
or a shell script.  And if you do use the native wide character type,
you're guaranteed not to be portable because some systems' wide
characters are actually variable width and others aren't, as you just
pointed out.  Or you might have an ancient byte-oriented program you
want to use.

I'm not saying that UTF-8 is a panacea; just that every problem that
UTF-8 has, UTF-16 also has -- but UTF-16 does have problems that UTF-8
doesn't.  Specifically, surrogates and ASCII incompatibility.

From abarnert at yahoo.com  Wed May 13 20:18:44 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Wed, 13 May 2015 11:18:44 -0700
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <1431527608.2421991.267775809.4755A9EE@webmail.messagingengine.com>
References: <mi79rj$vl8$1@ger.gmane.org>
 <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <mi8q15$bbp$1@ger.gmane.org>
 <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7eYJgnFuAmbooLdKpyT3KCDfJxNweL3xBWVsRxLeHc97w@mail.gmail.com>
 <554AC2CE.5040705@btinternet.com>
 <3D6D122B-68A4-439E-A875-EBE412AAC31B@yahoo.com>
 <CALGmxEL8GBVtCApFcKoOp9DgH4TPb0GbAqvNH7571jX9H86TKw@mail.gmail.com>
 <87a8xg3tsc.fsf@uwakimon.sk.tsukuba.ac.jp>
 <1431527608.2421991.267775809.4755A9EE@webmail.messagingengine.com>
Message-ID: <A8A8D1BA-B6CC-44D0-95B7-D24E1629A31A@yahoo.com>

On May 13, 2015, at 07:33, random832 at fastmail.us wrote:
> 
>> On Thu, May 7, 2015, at 18:30, Stephen J. Turnbull wrote:
>> Chris Barker writes:
>> 
>>> I've read many of the rants about UTF-16, but in fact, it's really
>>> not any worse than UTF-8
>> 
>> Yes, it is.  It's not ASCII compatible.  You can safely use the usual
>> libc string APIs on UTF-8 (except for any that might return only part
>> of a string), but not on UTF-16 (nulls).  This is a pretty big
>> advantage for UTF-8 in practice.
> 
> If you're using libc, why shouldn't you be using the native wide
> character types (whether that it UTF-16 or UCS-4) and using the wide
> string APIs?

That's exactly how you create the problems this thread is trying to solve.

If you treat wchar_t as a "native wide char type" and call any of the wcs functions on UTF-16 strings, you will count astral characters as two characters, illegally split strings in the middle of surrogates, etc. And you'll count BOMs as two characters and split them. These are basically all the same problems you have using char with UTF-8, and more, and harder to notice in testing (not just because you may not think to test for astral characters, but because even if you do, you may not think to test both byte orders).

And that's not even taking into account the fact that C explicitly allows wchar_t to be as small as 8 bits.

The Unicode and C standards both explicitly say that you should never use wchar_t for Unicode characters in portable code, only use it for storing the native characters of any wider-than-char locale encodings that a specific compiler supports.

Later versions of C and POSIX (as in later than what Python requires) provide explicit __CHAR16_TYPE__ and __CHAR32_TYPE__, but they don't provide APIs for analogs of strlen, strchr, strtok, etc. for those types, so you have to be explicit about whether you're counting code points or characters (and, if characters, how you're dealing with endianness).
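The code-unit pitfall is easy to demonstrate from Python, which can produce the UTF-16 code units directly: an astral character is one code point but two UTF-16 code units, which is exactly what naive wcs-style counting and indexing would get wrong:

```python
s = '\U0001F600'               # one astral character
units = s.encode('utf-16-le')  # its UTF-16 representation, little-endian

assert len(s) == 1             # one code point
assert len(units) // 2 == 2    # but two 16-bit code units: a surrogate pair
```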

From stephen at xemacs.org  Thu May 14 06:52:32 2015
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 14 May 2015 13:52:32 +0900
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <CADiSq7diFc8CMQPBQeu6ugaZdsbNFYwkHHWgX11dw+9vO+5aFA@mail.gmail.com>
References: <mi79rj$vl8$1@ger.gmane.org>
 <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp>
 <mi8q15$bbp$1@ger.gmane.org>
 <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7eYJgnFuAmbooLdKpyT3KCDfJxNweL3xBWVsRxLeHc97w@mail.gmail.com>
 <87siba3zrf.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7c2_Eqzy0r_tokEvGdVj+7a1THKZ0w+7MPyQUji2KusvQ@mail.gmail.com>
 <CADiSq7diFc8CMQPBQeu6ugaZdsbNFYwkHHWgX11dw+9vO+5aFA@mail.gmail.com>
Message-ID: <87r3qj2227.fsf@uwakimon.sk.tsukuba.ac.jp>

Nick Coghlan writes:

 > Three potential expected sources of surrogates have been identified:

[omitted]
 > * decomposed surrogate pairs for astral characters

I wouldn't call that "expected", though, as it requires wilful malice
on the part of a programmer (not users or other external sources of
input).  No standard codec should produce such in a PEP 393 Python.


From storchaka at gmail.com  Thu May 14 07:31:13 2015
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Thu, 14 May 2015 08:31:13 +0300
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <CADiSq7diFc8CMQPBQeu6ugaZdsbNFYwkHHWgX11dw+9vO+5aFA@mail.gmail.com>
References: <mi79rj$vl8$1@ger.gmane.org>
 <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <mi8q15$bbp$1@ger.gmane.org>
 <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7eYJgnFuAmbooLdKpyT3KCDfJxNweL3xBWVsRxLeHc97w@mail.gmail.com>
 <87siba3zrf.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7c2_Eqzy0r_tokEvGdVj+7a1THKZ0w+7MPyQUji2KusvQ@mail.gmail.com>
 <CADiSq7diFc8CMQPBQeu6ugaZdsbNFYwkHHWgX11dw+9vO+5aFA@mail.gmail.com>
Message-ID: <mj1bv1$u93$1@ger.gmane.org>

On 13.05.15 19:22, Nick Coghlan wrote:
> Three potential expected sources of surrogates have been identified:
>
> * escaped surrogates smuggling arbitrary bytes passed through decoding
> by the "surrogateescape" error handler
> * surrogates passed through the decoding process by the
> "surrogatepass" error handler
> * decomposed surrogate pairs for astral characters

* json
* pickle
* email
* nntplib
* SimpleHTTPRequestHandler
* wsgiref
* cgi
* tarfile
* filesystem names (os.decode) and other os calls
* platform and sysconfig
* other serializers



From ncoghlan at gmail.com  Thu May 14 10:20:28 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 14 May 2015 18:20:28 +1000
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <mj1bv1$u93$1@ger.gmane.org>
References: <mi79rj$vl8$1@ger.gmane.org>
 <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp>
 <mi8q15$bbp$1@ger.gmane.org>
 <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7eYJgnFuAmbooLdKpyT3KCDfJxNweL3xBWVsRxLeHc97w@mail.gmail.com>
 <87siba3zrf.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7c2_Eqzy0r_tokEvGdVj+7a1THKZ0w+7MPyQUji2KusvQ@mail.gmail.com>
 <CADiSq7diFc8CMQPBQeu6ugaZdsbNFYwkHHWgX11dw+9vO+5aFA@mail.gmail.com>
 <mj1bv1$u93$1@ger.gmane.org>
Message-ID: <CADiSq7cXqiOBZ-Yry47jRg+r8NvubNMJ5LpJPOuGyY3CLy961g@mail.gmail.com>

On 14 May 2015 at 15:31, Serhiy Storchaka <storchaka at gmail.com> wrote:
> On 13.05.15 19:22, Nick Coghlan wrote:
>>
>> Three potential expected sources of surrogates have been identified:
>>
>> * escaped surrogates smuggling arbitrary bytes passed through decoding
>> by the "surrogateescape" error handler
>> * surrogates passed through the decoding process by the
>> "surrogatepass" error handler
>> * decomposed surrogate pairs for astral characters
>
>
> * json
> * pickle
> * email
> * nntplib
> * SimpleHTTPRequestHandler
> * wsgiref
> * cgi
> * tarfile
> * filesystem names (os.decode) and other os calls
> * platform and sysconfig
> * other serializers

Right, those are the kinds of boundary APIs that drove the
introduction of Python 3's arbitrary bytes smuggling capabilities in
the first place.

The key changes I realised it's potentially worth waiting and seeing
the impact of are:

* the restoration of printf-style formatting for binary data
* the introduction of bytes.hex()
* the rise of systemd as the preferred init system for Linux (while
that doesn't solve the "bad locale settings" problem for *nix systems,
it tackles a reasonable chunk of them)

The first two should make it easier to just stay in the binary domain
when working with arbitrary binary data, while the last will hopefully
eliminate one of the common sources of declared-vs-actual encoding
mismatches.

I *expect* we'll still want these proposed APIs (or a comparable
alternative) by the time 3.6 rolls around, but I also see value in
continuing to be cautious about adding them (since we'll be stuck with
them once we do, although I guess we could also go down the path of
declaring "string.internals" to be a provisional API in PEP 411
terms).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From abarnert at yahoo.com  Thu May 14 10:48:42 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 14 May 2015 08:48:42 +0000 (UTC)
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <mj1bv1$u93$1@ger.gmane.org>
References: <mj1bv1$u93$1@ger.gmane.org>
Message-ID: <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com>

On Wednesday, May 13, 2015 10:31 PM, Serhiy Storchaka <storchaka at gmail.com> wrote:

> On 13.05.15 19:22, Nick Coghlan wrote:
>>  Three potential expected sources of surrogates have been identified:
>> 
>>  * escaped surrogates smuggling arbitrary bytes passed through decoding
>>  by the "surrogateescape" error handler
>>  * surrogates passed through the decoding process by the
>>  "surrogatepass" error handler
>>  * decomposed surrogate pairs for astral characters
> 
> * json
> * pickle
> * email
> * nntplib
> * SimpleHTTPRequestHandler
> * wsgiref
> * cgi
> * tarfile
> * filesystem names (os.decode) and other os calls
> * platform and sysconfig
> * other serializers

As far as I can tell, all of your extra cases are just examples of the surrogateescape error handler, which Nick already mentioned.


Beyond that, some of these modules may need to understand surrogates internally, but I can't see how they could get anywhere near the module boundaries. For example, to build and parse JSON's 12-character escape sequences, like "\uD834\uDD1E" for U+1D11E, you obviously need to be able to decompose and compose astrals internally, but that shouldn't even generate unicode strings with surrogate pairs in 3.3+, much less expose them to user code.

From storchaka at gmail.com  Thu May 14 12:15:18 2015
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Thu, 14 May 2015 13:15:18 +0300
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com>
References: <mj1bv1$u93$1@ger.gmane.org>
 <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com>
Message-ID: <mj1sjm$ukh$1@ger.gmane.org>

On 14.05.15 11:48, Andrew Barnert via Python-ideas wrote:
> On Wednesday, May 13, 2015 10:31 PM, Serhiy Storchaka <storchaka at gmail.com> wrote:
>> On 13.05.15 19:22, Nick Coghlan wrote:
>>>   Three potential expected sources of surrogates have been identified:
>>>
>>>   * escaped surrogates smuggling arbitrary bytes passed through decoding
>>>   by the "surrogateescape" error handler
>>>   * surrogates passed through the decoding process by the
>>>   "surrogatepass" error handler
>>>   * decomposed surrogate pairs for astral characters
>>
>> * json
>> * pickle
>> * email
>> * nntplib
>> * SimpleHTTPRequestHandler
>> * wsgiref
>> * cgi
>> * tarfile
>> * filesystem names (os.decode) and other os calls
>> * platform and sysconfig
>> * other serializers
>
> As far as I can tell, all of your extra cases are just examples of the surrogateescape error handler, which Nick already mentioned.

Not all. JSON allows injecting surrogates as \uXXXX escapes. Pickle with 
protocol 0 uses the raw-unicode-escape encoding, which allows surrogates.

There is also the UTF-7 encoding, which allows surrogates. And yet another 
source of surrogates is Python source code itself: eval(), etc.

Tkinter can produce surrogates. The XML parser unfortunately can't 
(unfortunately, because that makes it impossible to process with Python 
some files generated by third-party programs). I'm not sure about sqlite3. 
Any extension module, any wrapper around a third-party library, could 
potentially produce surrogates.



From koos.zevenhoven at aalto.fi  Thu May 14 13:03:39 2015
From: koos.zevenhoven at aalto.fi (Koos Zevenhoven)
Date: Thu, 14 May 2015 14:03:39 +0300
Subject: [Python-ideas] Units in type hints
Message-ID: <5554810B.7050409@aalto.fi>

Hi all,

How about extending the type annotations for int, float and complex to 
optionally include a unit as well?

For instance,

     def sleep(duration : Float['s']):
         ...

Now the type checker could catch the error of trying to pass the sleep 
duration in milliseconds, Float['ms']. This would also be useful for 
documentation, avoiding the 'need' for having names like duration_s. At 
least the notation with square brackets would resemble the way units are 
often written in science.

Another example:

     def calculate_travel_time(distance: Float['km']) -> Float['h']:
         speed = get_current_speed()  # type: Float['km/h']
         return distance / speed

Now, if you try to pass the distance in miles, or Float['mi'], the type 
checker would catch the error. Note that the type checker would also 
understand that 'km' divided by 'km/h' becomes 'h'. Or should these be 
something like units.km / units.h?

But if you do have your distance in miles, you do

     calculate_travel_time(units.convert(distance_mi, 'mi', 'km'))

and the type checker and programmer get what they want.
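
For anyone who wants to experiment, a rough approximation of this spelling
is possible today with typing.Annotated (Python 3.9+), although no stock
type checker performs the unit algebra ('km' divided by 'km/h' giving 'h')
proposed above; the alias names below are hypothetical:

```python
from typing import Annotated

# Hypothetical unit-tagged aliases; the string metadata is opaque to
# standard type checkers, so it serves as documentation only.
Km = Annotated[float, "km"]
KmPerH = Annotated[float, "km/h"]
Hours = Annotated[float, "h"]

def calculate_travel_time(distance: Km, speed: KmPerH) -> Hours:
    # A unit-aware checker could verify that km / (km/h) yields h.
    return distance / speed

print(calculate_travel_time(90.0, 60.0))  # 1.5
```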

Anyone interested?


-- Koos






From steve at pearwood.info  Thu May 14 13:59:58 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 14 May 2015 21:59:58 +1000
Subject: [Python-ideas] Units in type hints
In-Reply-To: <5554810B.7050409@aalto.fi>
References: <5554810B.7050409@aalto.fi>
Message-ID: <20150514115956.GW5663@ando.pearwood.info>

On Thu, May 14, 2015 at 02:03:39PM +0300, Koos Zevenhoven wrote:
> Hi all,
> 
> How about extending the type annotations for int, float and complex to 
> optionally include a unit as well?

I really, really like the idea of having unit-aware calculations.

But this is not the way to do it. See below:


> For instance,
> 
>     def sleep(duration : Float['s']):
>         ...
> 
> Now the type checker could catch the error of trying to pass the sleep 
> duration in milliseconds, Float['ms']. 

But that's not an error. Calling sleep(weight_in_kilograms) is an error. 
But calling sleep(milliseconds(1000)) should be the same as calling 
sleep(seconds(1)). If the user has to do the conversion themselves, 
that's a source of error:

sleep(time_in_milliseconds / 1000)  # convert to seconds

If you think that's too obvious an error for anyone to make, (1) you're 
wrong, I've made that error, yes even that simple, and (2) you should 
try it with more complex sets of units. How many pound-foot per minute 
squared in a newton?
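
(For the record, that conversion works out to roughly 26039, which is easy
to get wrong by hand:)

```python
# How many pound-foot per minute squared are in one newton?
LB = 0.45359237   # kilograms per pound (exact by definition)
FT = 0.3048       # metres per foot (exact by definition)
MIN = 60.0        # seconds per minute

# One lb.ft/min^2 expressed in newtons (kg.m/s^2):
one_unit_in_newtons = LB * FT / MIN ** 2

print(1 / one_unit_in_newtons)  # roughly 26039
```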

Having the language support unit calculations is not just to catch the 
wrong dimensions (passing a weight where a time is needed), but to 
manage unit conversions automatically without the user being responsible 
for getting the conversion right. A type checker is the wrong tool for 
the job.
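
A minimal sketch of the difference (hypothetical seconds/milliseconds
helpers, not any real library's API): the caller states the unit once and
the conversion happens in exactly one place.

```python
# Sketch: a unit-aware duration type with seconds as the canonical unit.
class Duration:
    def __init__(self, seconds):
        self.seconds = seconds

def seconds(x):
    return Duration(x)

def milliseconds(x):
    return Duration(x / 1000)  # the one and only conversion

def sleep(duration):
    # Stand-in for time.sleep(duration.seconds); returns the value
    # so the equivalence below is easy to check.
    return duration.seconds

assert sleep(milliseconds(1000)) == sleep(seconds(1))
```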

If you want to see what a good unit-aware language should be capable of,  
check out:

- Frink: https://futureboy.us/frinkdocs/

- the HP-28 and HP-48 series of calculators;

- the Unix/Linux "units" utility.

There are also some existing Python libraries which do unit 
calculations. You should look into them.



-- 
Steve

From abarnert at yahoo.com  Thu May 14 14:21:10 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 14 May 2015 05:21:10 -0700
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <mj1sjm$ukh$1@ger.gmane.org>
References: <mj1bv1$u93$1@ger.gmane.org>
 <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com>
 <mj1sjm$ukh$1@ger.gmane.org>
Message-ID: <115169E2-1271-42F8-9B72-E863EE61DBEA@yahoo.com>

On May 14, 2015, at 03:15, Serhiy Storchaka <storchaka at gmail.com> wrote:
> 
>> On 14.05.15 11:48, Andrew Barnert via Python-ideas wrote:
>>> On Wednesday, May 13, 2015 10:31 PM, Serhiy Storchaka <storchaka at gmail.com> wrote:
>>>> On 13.05.15 19:22, Nick Coghlan wrote:
>>>>  Three potential expected sources of surrogates have been identified:
>>>> 
>>>>  * escaped surrogates smuggling arbitrary bytes passed through decoding
>>>>  by the "surrogateescape" error handler
>>>>  * surrogates passed through the decoding process by the
>>>>  "surrogatepass" error handler
>>>>  * decomposed surrogate pairs for astral characters
>>> 
>>> * json
>>> * pickle
>>> * email
>>> * nntplib
>>> * SimpleHTTPRequestHandler
>>> * wsgiref
>>> * cgi
>>> * tarfile
>>> * filesystem names (os.decode) and other os calls
>>> * platform and sysconfig
>>> * other serializers
>> 
>> As far as I can tell, all of your extra cases are just examples of the surrogateescape error handler, which Nick already mentioned.
> 
> Not all. JSON allows injecting surrogates as \uXXXX escapes.

JSON specifically requires treating \uXXXX\uYYYY as a "12-character escape sequence" for a single character if XXXX and YYYY are a surrogate pair. If Python is handling that wrong, then it needs to be fixed (but I don't think it is; I'll test tomorrow).
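
A quick check suggests CPython already does the right thing here
(behaviour as observed on 3.3+; worth re-verifying):

```python
import json

# A JSON surrogate pair decodes to one astral character...
s = json.loads('"\\ud834\\udd1e"')
assert s == '\U0001d11e'
assert len(s) == 1

# ...while a lone \uXXXX surrogate escape is passed through as-is.
assert json.loads('"\\ud834"') == '\ud834'
```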

> Pickle with protocol 0 uses the raw-unicode-escape encoding that allows surrogates.

Sure, if you pickle a unicode object in a narrow 2.x, it gets pickled as surrogates. But when you unpickle it in 3.4, surely those surrogates are converted to astrals? If not, then every time you, e.g., pickle a Windows filename for use with win32api with astrals in 2.x, and unpickle it in 3.4 and try to use it with win32api it wouldn't work. Unless we actually are breaking those filenames, but win32api (and everything else) is working around the problem? Even if that's true, it seems like the obvious answer would be to fix the problem rather than provide tools for workarounds to libraries that must already have those workarounds anyway.

> There is also the UTF-7 encoding that allows surrogates.

Encoding to UTF-7 requires first encoding to UTF-16 and then doing the modified-base-64 thing. And decoding from UTF-7 requires reversing both those steps. There's no way surrogates can escape into Unicode from that. I suppose you could, instead of decoding from UTF-7, just do the base 64 decode and then skip the UTF-16 decode and instead just widen the code units, but that's not a valid thing to do, and I can't see why anyone would do it.

> And yet one source of surrogates -- Python sources. eval(), etc.

If I type '\uD834\uDD1E' in Python 3.4 source, am I actually going to get an illegal Unicode string made of 2 surrogate code points instead of either an error or the single-character string '\U0001D11E'?

If so, again, I think that's a bug that needs to be fixed, not worked around. There's no legitimate reason for any source code to expect that to be an illegal length-2 string.

> Tkinter can produce surrogates. XML parser unfortunately can't (unfortunately - because it makes impossible to handle with Python some files generated by third-party programs). I'm not sure about sqlite3. Any extension module, any wrapper around third-party library could potentially produce surrogates.

What C API function are they calling to make a PyUnicode out of a UTF-16 char* or wchar_t* or whatever without decoding it as UTF-16? And why do we have such a function?

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150514/a83a78cf/attachment-0001.html>

From steve at pearwood.info  Thu May 14 15:18:33 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 14 May 2015 23:18:33 +1000
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <115169E2-1271-42F8-9B72-E863EE61DBEA@yahoo.com>
References: <mj1bv1$u93$1@ger.gmane.org>
 <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com>
 <mj1sjm$ukh$1@ger.gmane.org> <115169E2-1271-42F8-9B72-E863EE61DBEA@yahoo.com>
Message-ID: <20150514131832.GX5663@ando.pearwood.info>

On Thu, May 14, 2015 at 05:21:10AM -0700, Andrew Barnert via Python-ideas wrote:
> On May 14, 2015, at 03:15, Serhiy Storchaka <storchaka at gmail.com> wrote:

[...]
> > There is also the UTF-7 encoding that allows surrogates.
> 
> Encoding to UTF-7 requires first encoding to UTF-16 and then doing the 
> modified-base-64 thing. And decoding from UTF-7 requires reversing 
> both those steps. There's no way surrogates can escape into Unicode 
> from that. I suppose you could, instead of decoding from UTF-7, just 
> do the base 64 decode and then skip the UTF-16 decode and instead just 
> widen the code units, but that's not a valid thing to do, and I can't 
> see why anyone would do it.

I don't see how UTF-7 could include surrogates. It's a 7-bit encoding, 
which means it can only include bytes \x00 through \x7F, i.e. ASCII 
compatible. 

http://unicode.org/glossary/#UTF_7

For example, this passes:

for i in range(0x110000):
    c = chr(i)
    b = c.encode('utf-7')
    m = max(b)
    assert m <= 127

so where are the surrogates coming from?



> > And yet one source of surrogates -- Python sources. eval(), etc.
> 
> If I type '\uD834\uDD1E' in Python 3.4 source, am I actually going to 
> get an illegal Unicode string made of 2 surrogate code points instead 
> of either an error or the single-character string '\U0001D11E'?

I certainly hope so :-)

I think that we should understand Unicode strings as sequences of code 
points from U+0000 to U+10FFFF inclusive. I don't think we should try to 
enforce a rule that all Python strings are surrogate-free. That would make it awfully inconvenient to 
process the whole Unicode character set at once, like I did above. I'd 
need to write:

for i in list(range(0xD800)) + list(range(0xE000, 0x110000)):
    ...

instead, or catch the exception in chr(i), or something equally 
annoying.
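
(For what it's worth, itertools.chain at least avoids materializing the
two lists, though it is no less annoying to have to remember:)

```python
from itertools import chain

# Iterate over every code point except the surrogate range
# U+D800..U+DFFF, without building two big lists first.
count = sum(1 for i in chain(range(0xD800), range(0xE000, 0x110000)))
assert count == 0x110000 - 0x800  # 1112064 non-surrogate code points
```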

The cost of that simplicity is that when you go to encode to bytes, you 
might get an exception. I think so long as we have tools for dealing 
with that (e.g. str->str transformations to remove or replace 
surrogates) that's a fair trade-off.

Another possibility would be to introduce a separate type, 
strict_unicode, which does enforce the rule that there are no surrogates 
in [strict unicode] strings. But having two unicode string types might 
be overkill/confusing. I think it might be better to have a is_strict() 
or is_surrogate() method that reports if the string contains surrogates, 
and let the user remove or replace them as needed.
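
The proposed predicate is trivial to sketch in pure Python (hypothetical
name, shown here as a function rather than a method):

```python
def is_surrogate(s):
    # True if any code point falls in the surrogate range U+D800..U+DFFF.
    return any('\ud800' <= c <= '\udfff' for c in s)

assert is_surrogate('\ud800')
assert is_surrogate('ab\udc00cd')
assert not is_surrogate('\U0001d11e')  # astral, but a single code point
```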

> If so, again, I think that's a bug that needs to be fixed, not worked 
> around. There's no legitimate reason for any source code to expect 
> that to be an illegal length-2 string.

Well, there's backwards compatibility.

There's also testing:

assert unicodedata.category('\uD800') == 'Cs'

I'm sure there are others.



-- 
Steve

From skip.montanaro at gmail.com  Thu May 14 16:05:07 2015
From: skip.montanaro at gmail.com (Skip Montanaro)
Date: Thu, 14 May 2015 09:05:07 -0500
Subject: [Python-ideas] Units in type hints
In-Reply-To: <5554810B.7050409@aalto.fi>
References: <5554810B.7050409@aalto.fi>
Message-ID: <CANc-5UyZ7DJqjRpLc9T0XCW1y2ZXVBG4FTd+=Cz63fK4cs9UQQ@mail.gmail.com>

On Thu, May 14, 2015 at 6:03 AM, Koos Zevenhoven
<koos.zevenhoven at aalto.fi> wrote:
> How about extending the type annotations for int, float and complex to
> optionally include a unit as well?

Not sure that's going to fly, but you might want to check out the
magnitude package:

https://pypi.python.org/pypi/magnitude/0.9.1

I've used it in situations where I want to specify units scaled to a
more natural (to me) size. For example, the gobject.timeout_add
function takes a delay in milliseconds. Given that most of the time I
want delays in seconds or minutes, it's much more natural for me to
let magnitude do the work silently.

Skip

From stephen at xemacs.org  Thu May 14 16:38:57 2015
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 14 May 2015 23:38:57 +0900
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <115169E2-1271-42F8-9B72-E863EE61DBEA@yahoo.com>
References: <mj1bv1$u93$1@ger.gmane.org>
 <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com>
 <mj1sjm$ukh$1@ger.gmane.org>
 <115169E2-1271-42F8-9B72-E863EE61DBEA@yahoo.com>
Message-ID: <87fv6z1awu.fsf@uwakimon.sk.tsukuba.ac.jp>

Andrew Barnert via Python-ideas writes:

 > > And yet one source of surrogates -- Python sources. eval(), etc.

Yep:

$ python3.4
Python 3.4.3 (default, Mar 10 2015, 14:53:35) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.56)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> chr((16*13+8)*256)
'\ud800'
>>> '\ud800'
'\ud800'
>>> '\ud834\udd1e'
'\ud834\udd1e'
>>> 

 > If I type '\uD834\uDD1E' in Python 3.4 source, am I actually going
 > to get an illegal Unicode string made of 2 surrogate code points
 > instead of either an error or the single-character string
 > '\U0001D11E'?

Yes.  How else do you propose to test the surrogateescape error
handler?  Now, are you sitting down?  If not, you should before
looking at the next example. ;-)

>>> '\U0000d834\U0000dd1e'
'\ud834\udd1e'
>>> 

Isn't that disgusting?  But in Python, str is a sequence of code points,
and the surrogate range is not excluded.  Literals and chr(), as well as
codec error handling, can all be used to produce a str containing
surrogates.



From random832 at fastmail.us  Thu May 14 16:45:50 2015
From: random832 at fastmail.us (random832 at fastmail.us)
Date: Thu, 14 May 2015 10:45:50 -0400
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <87y4ks1idw.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <mi79rj$vl8$1@ger.gmane.org>
 <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp>
 <mi8q15$bbp$1@ger.gmane.org> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7eYJgnFuAmbooLdKpyT3KCDfJxNweL3xBWVsRxLeHc97w@mail.gmail.com>
 <554AC2CE.5040705@btinternet.com>
 <3D6D122B-68A4-439E-A875-EBE412AAC31B@yahoo.com>
 <CALGmxEL8GBVtCApFcKoOp9DgH4TPb0GbAqvNH7571jX9H86TKw@mail.gmail.com>
 <87a8xg3tsc.fsf@uwakimon.sk.tsukuba.ac.jp>
 <1431527608.2421991.267775809.4755A9EE@webmail.messagingengine.com>
 <87y4ks1idw.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <1431614750.2824980.268749449.2F8C06E3@webmail.messagingengine.com>

On Wed, May 13, 2015, at 13:45, Stephen J. Turnbull wrote:
> random832 at fastmail.us writes:
> 
>  > If you're using libc, why shouldn't you be using the native wide
>  > character types (whether that it UTF-16 or UCS-4) and using the wide
>  > string APIs?
> 
> Who says you are using libc?

If you're not using libc, then "You can safely use the usual libc string
APIs" is not a benefit.

> You might be writing an operating system
> or a shell script.  And if you do use the native wide character type,
> you're guaranteed not to be portable because some systems have wide
> characters are actually variable width and others aren't, as you just
> pointed out.  Or you might have an ancient byte-oriented program you
> want to use.

Using UTF-8 *without* ensuring that the native multibyte character set
is UTF-8 [by setting the locale appropriately] and that it is supported
end-to-end (by your program, by the curses library if applicable, by the
terminal if applicable) just turns obvious problems into subtle ones -
not exactly an improvement.

> I'm not saying that UTF-8 is a panacea; just that every problem that
> UTF-8 has, UTF-16 also has -- but UTF-16 does have problems that UTF-8
> doesn't.  Specifically, surrogates and ASCII incompatibility.

ASCII incompatibility is a feature, not a bug - it prevents you from
doing stupid things that cause subtle bugs.

On Wed, May 13, 2015, at 14:18, Andrew Barnert wrote:
> That's exactly how you create the problems this thread is trying to
> solve.

The point I was getting at was more "you can't benefit from libc
functions at all, therefore your argument for UTF-8 is bad" than "you
should be using the native wchar_t type". Libc only has functions to
deal with native char strings [but these do not generally count
characters or respect character boundaries in multibyte character sets
even if UTF-8 *is* the native multibyte character set] and native
wchar_t strings, not any other kind of string.

> 
> If you treat wchar_t as a "native wide char type" and call any of the wcs
> functions on UTF-16 strings, you will count astral characters as two
> characters, illegally split strings in the middle of surrogates, etc.

No worse than UTF-8. If you can solve these problems for UTF-8 you can
solve them for UTF-16.

> And
> you'll count BOMs as two characters and split them.

Wait, what? The BOM is a single code unit in UTF-16. There is *no*
encoding in which a BOM is two code units (it's three in UTF-8). Anyway,
BOM shouldn't be used for in-memory strings, only text files.

> These are basically
> all the same problems you have using char with UTF-8, and more, and
> harder to notice in testing (not just because you may not think to test
> for astral characters, but because even if you do, you may not think to
> test both byte orders).

Byte orders are not an issue for anything other than file I/O, and I'm
not proposing using any type other than UTF-8 for *text files*, anyway,
only in-memory strings.

> Later versions of C and POSIX (as in later than what Python requires)
> provide explicit __CHAR16_TYPE__ and __CHAR_32_TYPE__, but they don't
> provide APIs for analogs of strlen, strchr, strtok, etc. for those types,
> so you have to be explicit about whether you're counting code points or
> characters (and, if characters, how you're dealing with endianness).

There are no analogs of these for UTF-8 either. And endianness is not an
issue for in-memory strings stored using any of these types.

From random832 at fastmail.us  Thu May 14 16:49:07 2015
From: random832 at fastmail.us (random832 at fastmail.us)
Date: Thu, 14 May 2015 10:49:07 -0400
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com>
References: <mj1bv1$u93$1@ger.gmane.org>
 <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com>
Message-ID: <1431614947.2825480.268771377.2227A960@webmail.messagingengine.com>

On Thu, May 14, 2015, at 04:48, Andrew Barnert via Python-ideas wrote:
> As far as I can tell, all of your extra cases are just examples of the
> surrogateescape error handler, which Nick already mentioned.

Technically filesystem names (and other similar boundary APIs like
environ, anything ctypes, etc) on Windows can contain arbitrary
surrogates and have nothing to do with surrogateescape.

From alexander at tutorfair.com  Thu May 14 16:52:55 2015
From: alexander at tutorfair.com (Alexander Atkins)
Date: Thu, 14 May 2015 15:52:55 +0100
Subject: [Python-ideas] lazy list
Message-ID: <CAH+CqNhDopTi8qZWgd7AJJkO58CvO7ny7c=UAEw1zWQy_WvLnQ@mail.gmail.com>

Hi, I'm new to this mailing list.

I needed a lazy list implementation for something, so I created one.  I was
a little bit surprised to find that there wasn't one in the *itertools*
module and it seemed like quite a basic thing to me, as someone who has
used Haskell before, so I thought probably I should share it.  I'm
wondering whether something like this should be part of the standard
library?

A fuller explanation is in the README, which is here:
https://github.com/jadatkins/python-lazylist
The gist of it is that it allows you to index into a generator.  Previously
evaluated elements are remembered, so foo[5] returns the same thing each
time, and you can later call foo[4] and get the previous element.  There
are many uses for such a thing, but if you're not expecting it in the
language, you might not necessarily think of them.
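
The core of such a wrapper is small; a minimal sketch (not the linked
implementation, and without slicing support):

```python
class LazyList:
    """Memoizing random access into a generator (sketch, no slicing)."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._cache = []

    def __getitem__(self, index):
        # Consume the underlying iterator until the cache covers
        # the requested index; previously seen values are reused.
        while len(self._cache) <= index:
            try:
                self._cache.append(next(self._it))
            except StopIteration:
                raise IndexError(index) from None
        return self._cache[index]

def squares():
    n = 0
    while True:
        yield n * n
        n += 1

foo = LazyList(squares())
assert foo[5] == 25
assert foo[4] == 16  # earlier element is remembered, not recomputed
```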

Warning: it may contain bugs, especially the stuff to do with slicing,
which is not really what it's for.

--

*J Alexander D Atkins*
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150514/30bfd04d/attachment-0001.html>

From rosuav at gmail.com  Thu May 14 17:01:43 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Fri, 15 May 2015 01:01:43 +1000
Subject: [Python-ideas] lazy list
In-Reply-To: <CAH+CqNhDopTi8qZWgd7AJJkO58CvO7ny7c=UAEw1zWQy_WvLnQ@mail.gmail.com>
References: <CAH+CqNhDopTi8qZWgd7AJJkO58CvO7ny7c=UAEw1zWQy_WvLnQ@mail.gmail.com>
Message-ID: <CAPTjJmoAcVJjoh6vu0fvpVhgi3vGtT0WJQ8VvrsroXv5ByU6zg@mail.gmail.com>

On Fri, May 15, 2015 at 12:52 AM, Alexander Atkins
<alexander at tutorfair.com> wrote:
> I needed a lazy list implementation for something, so I created one.  I was
> a little bit surprised to find that there wasn't one in the itertools module
> and it seemed like quite a basic thing to me, as someone who has used
> Haskell before, so I thought probably I should share it.  I'm wondering
> whether something like this should be part of the standard library?
>

It may well already exist on PyPI. There are a few things with "lazy"
in their names; you'd have to poke around and see if one of them is of
use to you.

Another thing you might want to search for is "indexable map()", which
is a related concept (imagine calling map() with a function and a
list; the result is theoretically subscriptable, but not with Py3's
basic map() implementation) that I'm fairly sure I've seen around at
times.

https://pypi.python.org/pypi

Have fun searching. There's a huge lot out there, most of which isn't
what you want... but you never know what you'll find!

ChrisA

From steve at pearwood.info  Thu May 14 17:24:33 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 15 May 2015 01:24:33 +1000
Subject: [Python-ideas] lazy list
In-Reply-To: <CAH+CqNhDopTi8qZWgd7AJJkO58CvO7ny7c=UAEw1zWQy_WvLnQ@mail.gmail.com>
References: <CAH+CqNhDopTi8qZWgd7AJJkO58CvO7ny7c=UAEw1zWQy_WvLnQ@mail.gmail.com>
Message-ID: <20150514152433.GY5663@ando.pearwood.info>

On Thu, May 14, 2015 at 03:52:55PM +0100, Alexander Atkins wrote:
> Hi, I'm new to this mailing list.
> 
> I needed a lazy list implementation for something, so I created one.  I was
> a little bit surprised to find that there wasn't one in the *itertools*
> module 

Why? It's not really an iterator tool. The things in itertools are tools 
for processing streams, not containers.


> and it seemed like quite a basic thing to me, as someone who has
> used Haskell before, so I thought probably I should share it.  I'm
> wondering whether something like this should be part of the standard
> library?
> 
> A fuller explanation is in the README, which is here:
> https://github.com/jadatkins/python-lazylist
> The gist of it is that it allows you to index into a generator.  Previously
> evaluated elements are remembered, so foo[5] returns the same thing each
> time, and you can later call foo[4] and get the previous element.  There
> are many uses for such a thing, but if you're not expecting it in the
> language, you might not necessarily think of them.

What sort of uses? Can you give some examples?

I'm having trouble thinking of a situation where I might use something 
like that. If I want random access, I'd use a list, or a computed 
sequence like (x)range. I don't think I would want something which acts 
like a generator but quietly holds onto all the items it has seen 
before, whether I need them or not.


> Warning: it may contain bugs, especially the stuff to do with slicing,
> which is not really what it's for.

A slice is just a subsequence of indexed values. If you can index it, 
you should be able to slice it.

assert spam[start:end:step] == [spam[i] for i in range(start, end, step)]



-- 
Steve

From alexander at tutorfair.com  Thu May 14 17:44:19 2015
From: alexander at tutorfair.com (Alexander Atkins)
Date: Thu, 14 May 2015 16:44:19 +0100
Subject: [Python-ideas] lazy list
In-Reply-To: <20150514152433.GY5663@ando.pearwood.info>
References: <CAH+CqNhDopTi8qZWgd7AJJkO58CvO7ny7c=UAEw1zWQy_WvLnQ@mail.gmail.com>
 <20150514152433.GY5663@ando.pearwood.info>
Message-ID: <CAH+CqNhRae=tDxiO7e7-61CAqyeONRYq=EQr_KM7+SBngkXgeg@mail.gmail.com>

On 14 May 2015 at 16:24, Steven D'Aprano <steve at pearwood.info> wrote:

> What sort of uses? Can you give some examples?
>
> I'm having trouble thinking of a situation where I might use something
> like that. If I want random access, I'd use a list, or a computed
> sequence like (x)range. I don't think I would want something which acts
> like a generator but quietly holds onto all the items it has seen
> before, whether I need them or not.
>

Yes: you might want random access into an infinite sequence where you
can't tell up front how many values you'll need, but you will want to
reuse or refer back to earlier values later, without knowing in advance
which ones.  If you know you only need each value once, then you should
stick with a generator.  For example, you might read from a network
stream or stdin, where the point at which you stop reading depends on
the content, but you need to refer back to earlier items whose indices
are only determined at run time.

In my particular case, I was writing a program that reads from a large, but
finite sequence, where I usually only need the first few items, but the
maximum index that I need is determined at runtime,* and it was taking too
long to process the whole list before starting.  So I wrote a generator for
the sequence instead, and used my LazyList wrapper to get random access on
it.

* Actually, it's not determined at runtime, but it is determined by another
part of the program outside of the function I was writing.



> A slice is just a subsequence of indexed values. If you can index it,
> you should be able to slice it.
>
> assert spam[start:end:step] == [spam[i] for i in range(start, end, step)]
>

What I was trying to do was to create a slice without losing the laziness.
For example, in my implementation you can take a slice like foo[start:]
from an infinite sequence without causing problems.  I haven't quite done
it right, because I've returned an iterator instead of another LazyList
object, but I could fix it up.  I discuss this a bit more in the example
program given in the repository.
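For illustration only (not the LazyList code itself), itertools.islice already gives a lazy slice over an infinite source:

```python
from itertools import count, islice

# A lazy slice of an infinite sequence: islice skips the first ten
# values on demand and never tries to materialize the whole source.
naturals = count(0)                  # 0, 1, 2, 3, ...
tail = islice(naturals, 10, None)    # conceptually naturals[10:]
assert next(tail) == 10
assert next(tail) == 11
```

The missing piece is that islice returns an iterator, not another indexable LazyList, which is exactly the fix-up described above.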


Python draws a lot from Haskell, especially the itertools module.  Almost
everything that's cool about lazy evaluation in Haskell is in Python
somewhere.  But Haskell has neither the list type that Python has, nor the
generator type: it only has lazy linked lists, which have to serve for
both.  So it seemed to me to be an obvious omission for those few cases
where that's really what you want.  But I can totally believe that if
nobody's thought of this so far then it's probably not commonly useful.

--
*J Alexander D Atkins*

Personal:  <https://plus.google.com/105921567919327579489/about> | 07963
237265

Work:  <https://www.tutorfair.com/> | 020 3322 4748
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150514/e104c669/attachment.html>

From alexander at tutorfair.com  Thu May 14 17:55:07 2015
From: alexander at tutorfair.com (Alexander Atkins)
Date: Thu, 14 May 2015 16:55:07 +0100
Subject: [Python-ideas] lazy list
In-Reply-To: <CAPTjJmoAcVJjoh6vu0fvpVhgi3vGtT0WJQ8VvrsroXv5ByU6zg@mail.gmail.com>
References: <CAH+CqNhDopTi8qZWgd7AJJkO58CvO7ny7c=UAEw1zWQy_WvLnQ@mail.gmail.com>
 <CAPTjJmoAcVJjoh6vu0fvpVhgi3vGtT0WJQ8VvrsroXv5ByU6zg@mail.gmail.com>
Message-ID: <CAH+CqNjyjpL6AbpMNfA-LesC4uE09k+URshkgD9dd15q-eDR_g@mail.gmail.com>

Whoops, at some point I hit 'Reply' instead of 'Reply All', so some of
these messages didn't end up in the public group.

On 14 May 2015 at 16:22, Alexander Atkins <alexander at tutorfair.com> wrote:
>
> On 14 May 2015 at 16:01, Chris Angelico <rosuav at gmail.com> wrote:
> >
> > It may well already exist on PyPI. There are a few things with "lazy"
> > in their names; you'd have to poke around and see if one of them is of
> > use to you.
>
> Ah, yes.  The package zc.lazylist looks quite similar.  In some ways it's
better than mine, in some ways it's not so ambitious.  It's quite difficult
to work out who the author is, though.  It just says "Copyright Zope 2006",
which isn't very helpful.
>
> I should perhaps reiterate that I have already written a lazy-list
implementation, and therefore I don't need another one.  What I was
wondering is whether I should make any effort to share it with the
community.  I'm quite happy to shut up and go back to my paid work if
that's not useful to anyone.

(I'm leaving out Chris' intervening message in case he didn't intend it to
be public -- not that there's anything saucy in there.)

On 14 May 2015 at 16:46, Alexander Atkins <alexander at tutorfair.com> wrote:
>
> On 14 May 2015 at 16:29, Chris Angelico <rosuav at gmail.com> wrote:
> >
> > Fair enough. What I'd recommend is putting it up on PyPI yourself; if
> > something like this ever does make it into the standard library, it'll
> > most likely be by incorporation of a PyPI package.
>
> Yes, that seems to be the thing to do.  Some day when I've got a minute,
I'll incorporate the improvements from Zope's implementation (the ideas, I
mean, not the code) and fix up my slice implementation, then put it on PyPI.

--
J Alexander D Atkins
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150514/c8e56d94/attachment-0001.html>

From tjreedy at udel.edu  Thu May 14 18:51:40 2015
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 14 May 2015 12:51:40 -0400
Subject: [Python-ideas] lazy list
In-Reply-To: <CAPTjJmoAcVJjoh6vu0fvpVhgi3vGtT0WJQ8VvrsroXv5ByU6zg@mail.gmail.com>
References: <CAH+CqNhDopTi8qZWgd7AJJkO58CvO7ny7c=UAEw1zWQy_WvLnQ@mail.gmail.com>
 <CAPTjJmoAcVJjoh6vu0fvpVhgi3vGtT0WJQ8VvrsroXv5ByU6zg@mail.gmail.com>
Message-ID: <mj2jr5$afk$1@ger.gmane.org>

On 5/14/2015 11:01 AM, Chris Angelico wrote:
> On Fri, May 15, 2015 at 12:52 AM, Alexander Atkins
> <alexander at tutorfair.com> wrote:
>> I needed a lazy list implementation for something, so I created one.  I was
>> a little bit surprised to find that there wasn't one in the itertools module
>> and it seemed like quite a basic thing to me, as someone who has used
>> Haskell before, so I thought probably I should share it.  I'm wondering
>> whether something like this should be part of the standard library?

This is a memoizer using a list rather than a dict. This is appropriate
for f(count) = g(count-1).
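A minimal illustration of the list-as-memo idea (a hypothetical helper, not code from the thread):

```python
def fib(n, _cache=[0, 1]):
    # List-based memoization: valid because fib(n) depends only on
    # fib(n-1) and fib(n-2), so the cache fills in index order, no gaps.
    while len(_cache) <= n:
        _cache.append(_cache[-1] + _cache[-2])
    return _cache[n]
```

so fib(10) computes each value once and replays earlier indices straight from the list.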

> It may well already exist on PyPI. There are a few things with "lazy"
> in their names; you'd have to poke around and see if one of them is of
> use to you.

I would also try 'memo' and 'memoize'.

> Another thing you might want to search for is "indexable map()", which
> is a related concept (imagine calling map() with a function and a
> list; the result is theoretically subscriptable, but not with Py3's
> basic map() implementation) that I'm fairly sure I've seen around at
> times.
>
> https://pypi.python.org/pypi
>
> Have fun searching. There's a huge lot out there, most of which isn't
> what you want... but you never know what you'll find!

The problem with putting any one thing in stdlib is that there are so 
many little variations.

-- 
Terry Jan Reedy


From abarnert at yahoo.com  Thu May 14 21:38:19 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 14 May 2015 12:38:19 -0700
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <87fv6z1awu.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <mj1bv1$u93$1@ger.gmane.org>
 <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com>
 <mj1sjm$ukh$1@ger.gmane.org> <115169E2-1271-42F8-9B72-E863EE61DBEA@yahoo.com>
 <87fv6z1awu.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <A50A63C8-425D-4CD3-9D93-FDC10850E263@yahoo.com>

On May 14, 2015, at 07:38, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> 
> Andrew Barnert via Python-ideas writes:
> 
>>> And yet one source of surrogates -- Python sources. eval(), etc.
> 
> Yep:
> 
> $ python3.4
> Python 3.4.3 (default, Mar 10 2015, 14:53:35) 
> [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.56)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
>>>> chr((16*13+8)*256)
> '\ud800'
>>>> '\ud800'
> '\ud800'
>>>> '\ud834\udd1e'
> '\ud834\udd1e'
> 
>> If I type '\uD834\uDD1E' in Python 3.4 source, am I actually going
>> to get an illegal Unicode string made of 2 surrogate code points
>> instead of either an error or the single-character string
>> '\U0001D11E'?
> 
> Yes.  How else do you propose to test the surrogateescape error
> handler?  Now, are you sitting down?  If not, you should before
> looking at the next example. ;-)
> 
>>>> '\U0000d834\U0000dd1e'
> '\ud834\udd1e'
> 
> Isn't that disgusting?  

No; if the former gave you surrogates, the latter pretty much has to. Otherwise, that would essentially mean you can create illegal strings by accident, but it's hard to create them in the obvious explicitly intentional way. (The other way around might be reasonable, however.)

At any rate, I can see that allowing people to go out of their way to create invalid strings is potentially useful (for testing invalid string handling, if nothing else) and possibly a "consenting adults" issue even if it weren't. So maybe that's the one case from the list that isn't just an example of Nick's three general cases.

But meanwhile: if you're intentionally writing literals for invalid strings to test for invalid string handling, is that an argument for this proposal? For example, I might want to test that some fast JSON library does the same thing as the stdlib one in all cases; if there's a text-to-text codec in front of it, that makes the test a lot harder to write.

So I think it still comes down to what Nick said: if you've got surrogates in your unicode, either you have a bug at your boundaries, you're dealing with surrogate escapes, or I forget the third... Or you're doing it intentionally and don't want to fix it. (And, although you didn't re-raise it, Serhiy mentioned eval, so let me just say that something like "I called eval on some arbitrary string that happened to be JSON not Python" sounds like a bug at the boundaries case, not a separate problem.)
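A sketch of repairing such a pair explicitly, using the surrogatepass error handler (illustration only, not something the thread proposes):

```python
# A pair of surrogate code points typed as two escapes does not combine:
s = "\ud834\udd1e"
assert len(s) == 2                       # two code points, not one character

# One way to recombine them: encode with surrogatepass, decode as UTF-16.
fixed = s.encode("utf-16-le", "surrogatepass").decode("utf-16-le")
assert fixed == "\U0001D11E" and len(fixed) == 1
```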


From abarnert at yahoo.com  Thu May 14 21:48:16 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 14 May 2015 12:48:16 -0700
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <1431614947.2825480.268771377.2227A960@webmail.messagingengine.com>
References: <mj1bv1$u93$1@ger.gmane.org>
 <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com>
 <1431614947.2825480.268771377.2227A960@webmail.messagingengine.com>
Message-ID: <BE014335-D461-4DCA-A123-F71752699518@yahoo.com>

On May 14, 2015, at 07:49, random832 at fastmail.us wrote:
> 
>> On Thu, May 14, 2015, at 04:48, Andrew Barnert via Python-ideas wrote:
>> As far as I can tell, all of your extra cases are just examples of the
>> surrogateescape error handler, which Nick already mentioned.
> 
> Technically filesystem names (and other similar boundary APIs like
> environ, anything ctypes, etc) on Windows can contain arbitrary
> surrogates

Are you sure? I thought that, unless you're using Win95 or NT 3.1 or something, Win32 *W APIs are explicitly for Unicode characters (not code units), minus nulls and any relevant reserved characters (e.g. no slashes in filenames, no control characters in filenames except for substream names, etc.). That's what the Naming Files doc seems to imply. (Then again, there are other areas that seem confusing or misleading--e.g., where it tells you not to worry about normalization because once the string gets through Win32 and to the filesystem it's just a string of WCHARs, which sounds to me like that's exactly why you _should_ worry about normalization...)

> and have nothing to do with surrogateescape.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From abarnert at yahoo.com  Thu May 14 22:17:15 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 14 May 2015 13:17:15 -0700
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <1431614750.2824980.268749449.2F8C06E3@webmail.messagingengine.com>
References: <mi79rj$vl8$1@ger.gmane.org>
 <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <mi8q15$bbp$1@ger.gmane.org>
 <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7eYJgnFuAmbooLdKpyT3KCDfJxNweL3xBWVsRxLeHc97w@mail.gmail.com>
 <554AC2CE.5040705@btinternet.com>
 <3D6D122B-68A4-439E-A875-EBE412AAC31B@yahoo.com>
 <CALGmxEL8GBVtCApFcKoOp9DgH4TPb0GbAqvNH7571jX9H86TKw@mail.gmail.com>
 <87a8xg3tsc.fsf@uwakimon.sk.tsukuba.ac.jp>
 <1431527608.2421991.267775809.4755A9EE@webmail.messagingengine.com>
 <87y4ks1idw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <1431614750.2824980.268749449.2F8C06E3@webmail.messagingengine.com>
Message-ID: <70D2C600-9CEC-4837-9F17-25FAE9A84026@yahoo.com>

On May 14, 2015, at 07:45, random832 at fastmail.us wrote:

[snipping reply to Stephen J. Turnbull]

>> On Wed, May 13, 2015, at 14:18, Andrew Barnert wrote:
>> That's exactly how you create the problems this thread is trying to
>> solve.
> 
> The point I was getting at was more "you can't benefit from libc
> functions at all, therefore your argument for UTF-8 is bad" than "you
> should be using the native wchar_t type".

I'm not sure if this was Stephen's point, but _my_ point is not that it's easier to use UTF-16 incorrectly, but rather that it's just as easy to do, and much more likely to get through unit testing and lead to a later debugging nightmare when you do. The only bug that's easier to catch with UTF-16 is the incredibly obvious "why am I only seeing the first character of my filename" bug.
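The astral-counting problem is easy to demonstrate (a sketch; Python's str counts code points, so we encode to see the code units):

```python
s = "\U0001D11E"                      # MUSICAL SYMBOL G CLEF, an astral character
assert len(s) == 1                    # one code point in Python's str
units = len(s.encode("utf-16-le")) // 2
assert units == 2                     # but two UTF-16 code units: a surrogate pair
```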

> Libc only has functions to
> deal with native char strings [but these do not generally count
> characters or respect character boundaries in multibyte character sets
> even if UTF-8 *is* the native multibyte character set] and native
> wchar_t strings, not any other kind of string.
> 
>> 
>> If you treat wchar_t as a "native wide char type" and call any of the wcs
>> functions on UTF-16 strings, you will count astral characters as two
>> characters, illegally split strings in the middle of surrogates, etc.
> 
> No worse than UTF-8. If you can solve these problems for UTF-8 you can
> solve them for UTF-16.
> 
>> And
>> you'll count BOMs as two characters and split them.
> 
> Wait, what? The BOM is a single code unit in UTF-16.

Sorry, that "two" was a stupid typo (or braino) for "one", which then changes the meaning of the rest of the paragraph badly.

The point is that you can miscount lengths by counting the BOM, and you can split a BOM stream into a BOM stream and an "I hope it's in native order or we're screwed" stream.

> There is *no*
> encoding in which a BOM is two code units (it's three in UTF-8). Anyway,
> BOM shouldn't be used for in-memory strings, only text files.

In a language with StringIO and socket.makefile and FTP and HTTP requests as transparent file-like objects and a slew of libraries that can take an open binary or text file or a bytes or str, that last point doesn't work as well.

For example, if I pass a binary file to your library's spam.parse function, I can expect that to be the same as reading the binary file and passing it to your spam.fromstring function. So, I may expect to be able to, say, re.split the document into smaller documents and pass them to spam.fromstring as well. Which is wrong, but it works when I test it, because most UTF-16 files are little-endian, and so is my machine. And then someone runs my app on a big-endian machine and they get a hard-to-debug exception (or, if we're really unlucky, silent mojibake, but that's pretty rare).

>> These are basically
>> all the same problems you have using char with UTF-8, and more, and
>> harder to notice in testing (not just because you may not think to test
>> for astral characters, but because even if you do, you may not think to
>> test both byte orders).
> 
> Byte orders are not an issue for anything other than file I/O, and I'm
> not proposing using any type other than UTF-8 for *text files*, anyway,
> only in-memory strings.

Why do you want to use UTF-16 for in-memory strings? If you need to avoid the problems of UTF-8 (and can't use a higher-level Unicode API like Python's str type), you can use UTF-32, which solves all of the problems, or you can use UTF-16, which solves almost none of them, but makes them less likely to be caught in testing.

There's a reason very few frameworks force you to use UTF-16 APIs and string types: only the ones that were originally written for UCS-2, where it's too late to change (Win32, Cocoa, Java, and a couple of others).

>> Later versions of C and POSIX (as in later than what Python requires)
>> provide explicit __CHAR16_TYPE__ and __CHAR_32_TYPE__, but they don't
>> provide APIs for analogs of strlen, strchr, strtok, etc. for those types,
>> so you have to be explicit about whether you're counting code points or
>> characters (and, if characters, how you're dealing with endianness).
> 
> There are no analogs of these for UTF-8 either. And endianness is not an
> issue for in-memory strings stored using any of these types.

Sure, if you've, say, explicitly encoded text to UTF-16-LE and want to treat it as UTF-16-LE, you don't need to worry about endianness; a WCHAR or char16_t is a WCHAR or char16_t. But why would you do that in the first place? 

Usually, when you have WCHARs, it's because you opened a file and read from it, or received UTF-16 over the network or from a Windows FooW API, in which case you have the same endianness issues as any other binary I/O on non-char-sized types. And yes, of course the right answer is to decode at input, but if you're doing that, why wouldn't you just decode to Unicode instead of byte-swapping the WCHARs?


From abarnert at yahoo.com  Thu May 14 22:29:58 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 14 May 2015 13:29:58 -0700
Subject: [Python-ideas] lazy list
In-Reply-To: <CAH+CqNhRae=tDxiO7e7-61CAqyeONRYq=EQr_KM7+SBngkXgeg@mail.gmail.com>
References: <CAH+CqNhDopTi8qZWgd7AJJkO58CvO7ny7c=UAEw1zWQy_WvLnQ@mail.gmail.com>
 <20150514152433.GY5663@ando.pearwood.info>
 <CAH+CqNhRae=tDxiO7e7-61CAqyeONRYq=EQr_KM7+SBngkXgeg@mail.gmail.com>
Message-ID: <38002E1A-14E0-4251-A89C-28C70C186EF8@yahoo.com>

On May 14, 2015, at 08:44, Alexander Atkins <alexander at tutorfair.com> wrote:
> 
>> A slice is just a subsequence of indexed values. If you can index it,
>> you should be able to slice it.
>> 
>> assert spam[start:end:step] == [spam[i] for i in range(start, end, step)]
> 
> What I was trying to do was to create a slice without losing the laziness.  For example, in my implementation you can take a slice like foo[start:] from an infinite sequence without causing problems.  I haven't quite done it right, because I've returned an iterator instead of another LazyList object, but I could fix it up.  I discuss this a bit more in the example program given in the repository.

Having gone through this whole idea before (and then never finding a good use for it...), that's the only hard part--and the easiest way to solve that hard part is to create a generic sequence view library, which turns out to be more useful than the lazy list library anyway.

(Plus, once you build the slice view type of the sequence view abstract type, it's pretty easy to build a deque- or rope-like sequence of discontiguous, or even different-source, slices, at which point tail-sharing becomes trivial, which makes lazy lists a lot more useful.)

One more thing: a lot of the problems you (at least if you're thinking the same way I was) think you want lazy lists for, you only need tee--or you only need tee with its cache exposed so you can explicitly access it. Being able to directly index or slice or even delete from it as a sequence is a neat problem to solve, but it's hard to find a case where explicitly working on the cache is significantly less readable, and it's a lot simpler.
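For example, tee's cached replay (a sketch):

```python
from itertools import tee

gen = (i * i for i in range(10))
a, b = tee(gen)                        # b will replay what a has consumed
assert [next(a) for _ in range(3)] == [0, 1, 4]
assert [next(b) for _ in range(3)] == [0, 1, 4]  # served from tee's cache
```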

But anyway, if you think it could be useful to someone else, you don't need to ask python-ideas whether to upload it to PyPI; just do it.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150514/091eeadc/attachment.html>

From chris.barker at noaa.gov  Thu May 14 22:37:00 2015
From: chris.barker at noaa.gov (Chris Barker)
Date: Thu, 14 May 2015 13:37:00 -0700
Subject: [Python-ideas] Fwd:  Add math.iszero() and math.isequal()?
In-Reply-To: <CALGmxELNkHvrUDmBu4_ZR5nXFCn=Vj_THJZG5GPCW80sc=uh9w@mail.gmail.com>
References: <4537a315-a08c-4838-8d55-1483ac9656bc@googlegroups.com>
 <85fe56bf-84a1-45b9-84bf-26b2ff389486@googlegroups.com>
 <CALGmxELNkHvrUDmBu4_ZR5nXFCn=Vj_THJZG5GPCW80sc=uh9w@mail.gmail.com>
Message-ID: <CALGmxE+2Vm6dh=46HVh6ZRAgLQ_d_W4OZvHqODXQU7jJyORx1Q@mail.gmail.com>

something went weird with the google groups mirror of this list -- sorry if
this lands twice.

-Chris

---------- Forwarded message ----------
From: Chris Barker <chris.barker at noaa.gov>
Date: Thu, May 14, 2015 at 1:34 PM
Subject: Re: [Python-ideas] Add math.iszero() and math.isequal()?
To: Neil Girdhar <mistersheik at gmail.com>
Cc: "python-ideas at googlegroups.com" <python-ideas at googlegroups.com>




On Tue, May 12, 2015 at 11:24 PM, Neil Girdhar <mistersheik at gmail.com>
wrote:

> See PEP 485, which appears to be still a draft:
> https://www.python.org/dev/peps/pep-0485/
>

It's been approved, and it's "just" waiting for me to implement the code
and get it reviewed, etc.

I've been much sidetracked, but hoping to get to in the next couple days....

>> iszero = lambda x: hash(x) == hash(0)
>> isequal = lambda a, b: hash(a) == hash(b)
>>
>> Clearly these are trivial functions (but perhaps math experts could
>> provide better implementations; I'm not proposing the implementations
>> shown, just the functions however they are implemented).
>>
>
I'm not familiar with how hashing works for floats, but I can't imagine this
would even work -- == and != do work for floats, they just don't test what
people most often want :-)

Anyway, see the PEP, and the quite long and drawn out discussion on this
list a couple months back.
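For reference, the heart of the PEP 485 approach is a relative/absolute tolerance test, roughly like this (a sketch, not the final stdlib implementation):

```python
def isclose(a, b, rel_tol=1e-9, abs_tol=0.0):
    # Relative tolerance scaled by the larger magnitude, with an
    # absolute floor so comparisons against zero can still succeed.
    return abs(a - b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)

assert isclose(0.1 + 0.2, 0.3)         # == says False for these
assert not isclose(1.0, 1.001)
```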

-CHB






>
>> It seems that not everyone is aware of the issues regarding comparing
>> floats for equality and so I still see code that compares floats using ==
>> or !=.
>>
>> If these functions were in the math module it would be convenient (since
>> I find I need them in most non-trivial programs), but also provide a place
>> to document that they should be used rather than == or != for floats. (I
>> guess a similar argument might apply to the cmath module?)
>>
>>
>>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov



-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150514/07018675/attachment.html>

From ram at rachum.com  Thu May 14 22:17:58 2015
From: ram at rachum.com (Ram Rachum)
Date: Thu, 14 May 2015 23:17:58 +0300
Subject: [Python-ideas] Add `Executor.filter`
In-Reply-To: <CADiSq7dDJ_wrcWLO3w8b3TSS4x65aZ1fW4FGVR-7A6eHcz1uZg@mail.gmail.com>
References: <CANXboVbmZNDUp8PCqDwh_DpWrr-zAgt2SWA15hHMovY+rRRxoQ@mail.gmail.com>
 <CAP7+vJL9aosyOkVHVc8PoJQJNbvDnfhyk-enAj2KsPAXpHOfng@mail.gmail.com>
 <CANXboVYd64d=pmOX7q6x0pBvSSmh-jLq5hymZ5C6DY-jEi99tg@mail.gmail.com>
 <B48915FA-76CC-46FE-BC4A-05B56D15CF43@yahoo.com>
 <CANXboVagtQcE_NMjYMeHEO_xazBigRHPT9Uo1NNQUUybMcJFgg@mail.gmail.com>
 <CADiSq7dDJ_wrcWLO3w8b3TSS4x65aZ1fW4FGVR-7A6eHcz1uZg@mail.gmail.com>
Message-ID: <CANXboVZdrMqKDZgUZyMrHqG1VnA8qGux1GDG28re+hN=P5mMoQ@mail.gmail.com>

I'd like to move `Executor.filter` forward, if that's possible. Can we get
more people on the list to express their opinion about whether
`Executor.filter` should be added to the stdlib? (See my implementation in
a previous message on this thread.)
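In outline, the idea looks something like this (a simplified sketch of the behaviour, not the actual patch):

```python
from concurrent.futures import ThreadPoolExecutor

def executor_filter(executor, predicate, iterable):
    # Run the predicate calls concurrently; Executor.map preserves
    # input order, so results zip back against the original items.
    items = list(iterable)
    keep = executor.map(predicate, items)
    return (item for item, k in zip(items, keep) if k)

with ThreadPoolExecutor(max_workers=4) as ex:
    evens = list(executor_filter(ex, lambda x: x % 2 == 0, range(10)))
assert evens == [0, 2, 4, 6, 8]
```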

On Thu, May 7, 2015 at 6:56 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> On 2 May 2015 at 19:25, Ram Rachum <ram at rachum.com> wrote:
> > Okay, I implemented it. Might be getting something wrong because I've
> never
> > worked with the internals of this module before.
>
> I think this is sufficiently tricky to get right that it's worth
> adding filter() as a parallel to the existing map() API.
>
> However, it did raise a separate question for me: is it currently
> possible to use Executor.map() and the as_completed() module level
> function together? Unless I'm missing something, it doesn't look like
> it, as map() hides the futures from the caller, so you only have
> something to pass to as_completed() if you invoke submit() directly.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150514/b68b3e50/attachment-0001.html>

From guido at python.org  Thu May 14 23:03:02 2015
From: guido at python.org (Guido van Rossum)
Date: Thu, 14 May 2015 14:03:02 -0700
Subject: [Python-ideas] Add `Executor.filter`
In-Reply-To: <CANXboVZdrMqKDZgUZyMrHqG1VnA8qGux1GDG28re+hN=P5mMoQ@mail.gmail.com>
References: <CANXboVbmZNDUp8PCqDwh_DpWrr-zAgt2SWA15hHMovY+rRRxoQ@mail.gmail.com>
 <CAP7+vJL9aosyOkVHVc8PoJQJNbvDnfhyk-enAj2KsPAXpHOfng@mail.gmail.com>
 <CANXboVYd64d=pmOX7q6x0pBvSSmh-jLq5hymZ5C6DY-jEi99tg@mail.gmail.com>
 <B48915FA-76CC-46FE-BC4A-05B56D15CF43@yahoo.com>
 <CANXboVagtQcE_NMjYMeHEO_xazBigRHPT9Uo1NNQUUybMcJFgg@mail.gmail.com>
 <CADiSq7dDJ_wrcWLO3w8b3TSS4x65aZ1fW4FGVR-7A6eHcz1uZg@mail.gmail.com>
 <CANXboVZdrMqKDZgUZyMrHqG1VnA8qGux1GDG28re+hN=P5mMoQ@mail.gmail.com>
Message-ID: <CAP7+vJ+EWFv_N+-HVFB4mTAEBFFJLsGd4RMp+Z-gJWE45-DBHA@mail.gmail.com>

If there's a working patch and you can get a core developer as a reviewer
I'm fine with that. No PEP needed.

On Thu, May 14, 2015 at 1:17 PM, Ram Rachum <ram at rachum.com> wrote:

> I'd like to move `Executor.filter` forward, if that's possible. Can we get
> more people on the list to express their opinion about whether
> `Executor.filter` should be added to the stdlib? (See my implementation in
> a previous message on this thread.)
>
> On Thu, May 7, 2015 at 6:56 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>
>> On 2 May 2015 at 19:25, Ram Rachum <ram at rachum.com> wrote:
>> > Okay, I implemented it. Might be getting something wrong because I've
>> never
>> > worked with the internals of this module before.
>>
>> I think this is sufficiently tricky to get right that it's worth
>> adding filter() as a parallel to the existing map() API.
>>
>> However, it did raise a separate question for me: is it currently
>> possible to use Executor.map() and the as_completed() module level
>> function together? Unless I'm missing something, it doesn't look like
>> it, as map() hides the futures from the caller, so you only have
>> something to pass to as_completed() if you invoke submit() directly.
>>
>> Cheers,
>> Nick.
>>
>> --
>> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
>>
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>



-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150514/88a28262/attachment.html>

From ethan at stoneleaf.us  Thu May 14 23:10:25 2015
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 14 May 2015 14:10:25 -0700
Subject: [Python-ideas] Add `Executor.filter`
In-Reply-To: <CANXboVZdrMqKDZgUZyMrHqG1VnA8qGux1GDG28re+hN=P5mMoQ@mail.gmail.com>
References: <CANXboVbmZNDUp8PCqDwh_DpWrr-zAgt2SWA15hHMovY+rRRxoQ@mail.gmail.com>
 <CAP7+vJL9aosyOkVHVc8PoJQJNbvDnfhyk-enAj2KsPAXpHOfng@mail.gmail.com>
 <CANXboVYd64d=pmOX7q6x0pBvSSmh-jLq5hymZ5C6DY-jEi99tg@mail.gmail.com>
 <B48915FA-76CC-46FE-BC4A-05B56D15CF43@yahoo.com>
 <CANXboVagtQcE_NMjYMeHEO_xazBigRHPT9Uo1NNQUUybMcJFgg@mail.gmail.com>
 <CADiSq7dDJ_wrcWLO3w8b3TSS4x65aZ1fW4FGVR-7A6eHcz1uZg@mail.gmail.com>
 <CANXboVZdrMqKDZgUZyMrHqG1VnA8qGux1GDG28re+hN=P5mMoQ@mail.gmail.com>
Message-ID: <55550F41.2090805@stoneleaf.us>

On 05/14/2015 01:17 PM, Ram Rachum wrote:
> I'd like to move `Executor.filter` forward, if that's possible. Can we
> get more people on the list to express their opinion about whether
> `Executor.filter` should be added to the stdlib? (See my implementation
> in a previous message on this thread.)

Open up an issue on the tracker and attach your patch.

--
~Ethan~


From solipsis at pitrou.net  Thu May 14 23:24:41 2015
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 14 May 2015 23:24:41 +0200
Subject: [Python-ideas] Add `Executor.filter`
References: <CANXboVbmZNDUp8PCqDwh_DpWrr-zAgt2SWA15hHMovY+rRRxoQ@mail.gmail.com>
 <CAP7+vJL9aosyOkVHVc8PoJQJNbvDnfhyk-enAj2KsPAXpHOfng@mail.gmail.com>
 <CANXboVYd64d=pmOX7q6x0pBvSSmh-jLq5hymZ5C6DY-jEi99tg@mail.gmail.com>
 <B48915FA-76CC-46FE-BC4A-05B56D15CF43@yahoo.com>
 <CANXboVagtQcE_NMjYMeHEO_xazBigRHPT9Uo1NNQUUybMcJFgg@mail.gmail.com>
 <CADiSq7dDJ_wrcWLO3w8b3TSS4x65aZ1fW4FGVR-7A6eHcz1uZg@mail.gmail.com>
 <CANXboVZdrMqKDZgUZyMrHqG1VnA8qGux1GDG28re+hN=P5mMoQ@mail.gmail.com>
Message-ID: <20150514232441.2aa648cf@fsol>

On Thu, 14 May 2015 23:17:58 +0300
Ram Rachum <ram at rachum.com> wrote:
> I'd like to move `Executor.filter` forward, if that's possible. Can we get
> more people on the list to express their opinion about whether
> `Executor.filter` should be added to the stdlib? (See my implementation in
> a previous message on this thread.)

I don't think there's a common use case for Executor.filter().  Builtin
filter() and map() are a bad analogy, because they are meant to be
easily composable in order to define more complex processing chains.
But I don't see a reason to compose Executor operations.

Regards

Antoine.



From mertz at gnosis.cx  Fri May 15 02:11:28 2015
From: mertz at gnosis.cx (David Mertz)
Date: Thu, 14 May 2015 17:11:28 -0700
Subject: [Python-ideas] lazy list
In-Reply-To: <CAH+CqNhDopTi8qZWgd7AJJkO58CvO7ny7c=UAEw1zWQy_WvLnQ@mail.gmail.com>
References: <CAH+CqNhDopTi8qZWgd7AJJkO58CvO7ny7c=UAEw1zWQy_WvLnQ@mail.gmail.com>
Message-ID: <CAEbHw4YZSvFVRHvdEgw7fwaWEktnEHhgHkr-eFr5UX+gdsxf8Q@mail.gmail.com>

I actually taught almost exactly this two days ago as an example of a class
in the context of laziness, and included it in a white paper I wrote for
O'Reilly on _Functional Programming in Python_ that will be given out
starting at OSCon.  I'm sure I'm also not the first or the 50th person to
think of it.  My basic implementation--made to exhibit a concept not to be
complete--was rather short:

from collections.abc import Sequence
class ExpandingSequence(Sequence):
    def __init__(self, it):
        self.it = it          # the underlying (possibly infinite) iterator
        self._cache = []      # every value consumed from it so far
    def __getitem__(self, index):
        # Pull from the iterator until the cache covers the requested index.
        while len(self._cache) <= index:
            self._cache.append(next(self.it))
        return self._cache[index]
    def __len__(self):
        # The length realized so far, not the length of the iterator.
        return len(self._cache)

I think it's kinda cute.  Especially when passed in something like an
infinite iterator of all the primes or all the Fibonacci numbers.  But I
can't really recommend it (nor the fleshed out version the OP wrote) for
the standard library.  There's no size limit to the object, and so we don't
*really* save space over just appending more elements to a list.

I can certainly see that it could be OK for particular people with
particular use cases, but it doesn't feel general enough for stdlib.

On Thu, May 14, 2015 at 7:52 AM, Alexander Atkins <alexander at tutorfair.com>
wrote:

> Hi, I'm new to this mailing list.
>
> I needed a lazy list implementation for something, so I created one.  I
> was a little bit surprised to find that there wasn't one in the
> *itertools* module and it seemed like quite a basic thing to me, as
> someone who has used Haskell before, so I thought probably I should share
> it.  I'm wondering whether something like this should be part of the
> standard library?
>
> A fuller explanation is in the README, which is here:
> https://github.com/jadatkins/python-lazylist
> The gist of it is that it allows you to index into a generator.
> Previously evaluated elements are remembered, so foo[5] returns the same
> thing each time, and you can later call foo[4] and get the previous
> element.  There are many uses for such a thing, but if you're not expecting
> it in the language, you might not necessarily think of them.
>
> Warning: it may contain bugs, especially the stuff to do with slicing,
> which is not really what it's for.
>
> --
>
> *J Alexander D Atkins*
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>



-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.

From stephen at xemacs.org  Fri May 15 03:02:26 2015
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 15 May 2015 10:02:26 +0900
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <A50A63C8-425D-4CD3-9D93-FDC10850E263@yahoo.com>
References: <mj1bv1$u93$1@ger.gmane.org>
 <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com>
 <mj1sjm$ukh$1@ger.gmane.org>
 <115169E2-1271-42F8-9B72-E863EE61DBEA@yahoo.com>
 <87fv6z1awu.fsf@uwakimon.sk.tsukuba.ac.jp>
 <A50A63C8-425D-4CD3-9D93-FDC10850E263@yahoo.com>
Message-ID: <87egmi1wm5.fsf@uwakimon.sk.tsukuba.ac.jp>

Andrew Barnert writes:

 > >>>> '\U0000d834\U0000dd1e'
 > > '\ud834\udd1e'
 > > 
 > > Isn't that disgusting?  
 > 
 > No; if the former gave you surrogates, the latter pretty much has to.

That, of course.  What I was referring to as "disgusting" was using
32-bit syntax for Unicode literals to create surrogates.

 > But meanwhile: if you're intentionally writing literals for invalid
 > strings to test for invalid string handling, is that an argument
 > for this proposal?

No.  I see three cases:

(1) Problem: You created a Python string which is invalid Unicode
    using literals or chr().
    Solution: You know why you did that, we don't.  You deal with it.
    (aka, "consenting adults")

(2) Problem: You used surrogateescape or surrogatepass because you want
    the invalid Unicode to get to the other side some times.
    Solution: That's not a problem, that's a solution.
    Advice:  Handle with care, like radioactives.  Use strict error
    handling everywhere except the "out" door for invalid Unicode.  If
    you can't afford a UnicodeError if such a string inadvertently
    gets mixed with other stuff, use "try".
    (aka, "consenting adults")

(3) Problem: Code you can't or won't fix buggily passes you Unicode
    that might have surrogates in it.
    Solution: text-to-text codecs (but I don't see why they can't be
    written as encode-decode chains).
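
As a concrete illustration of case (2), here is how surrogateescape
round-trips an undecodable byte (a quick sketch):

```python
raw = b'abc\xff'                       # 0xff is invalid as UTF-8
text = raw.decode('utf-8', errors='surrogateescape')
assert text == 'abc\udcff'             # the bad byte became a lone surrogate

# strict handling refuses to let the radioactive string out...
try:
    text.encode('utf-8')
    raise AssertionError('strict encode should have failed')
except UnicodeEncodeError:
    pass

# ...but the designated "out" door recovers the original bytes exactly
assert text.encode('utf-8', errors='surrogateescape') == raw
```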

As I've written before, I think text-to-text codecs are an attractive
nuisance.  The temptation to use them in most cases should be refused,
because it's a better solution to deal with the problem at the
incoming boundary or the outgoing boundary (using str<->bytes codecs).
Dealing with them elsewhere and reintroducing the corrupted str into
the data flow is likely to cause issues with correctness (if altered
data is actually OK, why didn't you use a replace error handler in the
first place?)  And most likely unless you do a complete analysis of
all the ways str can get into or out of your module, you've just
started a game of whack-a-mole.

I could very easily be wrong about my assessment of where the majority
of these Unicode handling defects get injected: it's possible the
great majority comes from assorted legacy modules, and whack-a-mole is
the most cost-effective way to deal with them for most programs.  I
hope not, though. :-/

From p.f.moore at gmail.com  Fri May 15 14:21:29 2015
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 15 May 2015 13:21:29 +0100
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <87egmi1wm5.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <mj1bv1$u93$1@ger.gmane.org>
 <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com>
 <mj1sjm$ukh$1@ger.gmane.org>
 <115169E2-1271-42F8-9B72-E863EE61DBEA@yahoo.com>
 <87fv6z1awu.fsf@uwakimon.sk.tsukuba.ac.jp>
 <A50A63C8-425D-4CD3-9D93-FDC10850E263@yahoo.com>
 <87egmi1wm5.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <CACac1F-2Dt+cEQfPASRzTPEzqK7Nz-RYDH=fg6UkaT_XSCbO1w@mail.gmail.com>

On 15 May 2015 at 02:02, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> (3) Problem: Code you can't or won't fix buggily passes you Unicode
>     that might have surrogates in it.
>     Solution: text-to-text codecs (but I don't see why they can't be
>     written as encode-decode chains).
>
> As I've written before, I think text-to-text codecs are an attractive
> nuisance.  The temptation to use them in most cases should be refused,
> because it's a better solution to deal with the problem at the
> incoming boundary or the outgoing boundary (using str<->bytes codecs).

One case I'd found a need for text->text handling (although not
related to surrogates) was taking arbitrary Unicode and applying an
error handler to it before writing it to a stream with "strict"
encoding. (So something like "arbitrary text".encode('latin1',
errors='backslashreplace').decode('latin1').)

The encode/decode pair seemed ugly, although it was the only way I
could find. I could easily imagine using a "rehandle" type of function
for this (although I wouldn't use the actual proposed functions here,
as the use of "surrogate" and "astral" in the names would lead me to
assume they were inappropriate).
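
For illustration, such a helper might look like this (the name and
defaults are made up; it is just the encode/decode chain wrapped up):

```python
def apply_error_handler(text, encoding='latin-1', errors='backslashreplace'):
    # Hypothetical helper: run an encode-time error handler over a str,
    # returning a str guaranteed to encode cleanly in `encoding`.
    return text.encode(encoding, errors=errors).decode(encoding)

safe = apply_error_handler('snowman: \u2603')
# `safe` now encodes to latin-1 with strict error handling
safe.encode('latin-1')
```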

Whether that's an argument for or against the idea that they are an
attractive nuisance, I'm not sure :-)

Paul

From koos.zevenhoven at aalto.fi  Fri May 15 16:00:26 2015
From: koos.zevenhoven at aalto.fi (Koos Zevenhoven)
Date: Fri, 15 May 2015 17:00:26 +0300
Subject: [Python-ideas] Units in type hints
In-Reply-To: <16724_1431604817_55548E51_16724_7557_1_20150514115956.GW5663@ando.pearwood.info>
References: <5554810B.7050409@aalto.fi>
 <16724_1431604817_55548E51_16724_7557_1_20150514115956.GW5663@ando.pearwood.info>
Message-ID: <5555FBFA.8090805@aalto.fi>

On 14.5.2015 14:59, Steven D'Aprano wrote:
> On Thu, May 14, 2015 at 02:03:39PM +0300, Koos Zevenhoven wrote:
>> Hi all,
>>
>> How about extending the type annotations for int, float and complex to
>> optionally include also a unit?
> I really, really like the idea of having unit-aware calculations.
>
> But this is not the way to do it. See below:
>
>

Getting something even better would of course be great. Needless to say, 
I would not be in favor of adding my first rough sketch to Python. I do 
believe that, whatever the solution, it would need to be some kind of a 
standard for it to really work. See comments below.

>> For instance,
>>
>>      def sleep(duration : Float['s']):
>>          ...
>>
>> Now the type checker could catch the error of trying to pass the sleep
>> duration in milliseconds, Float['ms'].
> But that's not an error. Calling sleep(weight_in_kilograms) is an error.

In the example I gave, it is clearly an error. And it would be an error 
with time.sleep. But you are obviously right, sleeping for kilograms is 
also an error, although a very bizarre one.

> But calling sleep(milliseconds(1000)) should be the same as calling
> sleep(seconds(1)).

Yes, something like that would be nice. What would sleep(1) do?

> If the user has to do the conversion themselves,
> that's a source of error:
>
> sleep(time_in_milliseconds / 1000)  # convert to seconds
>
> If you think that's too obvious an error for anyone to make,

You lost me now. There does not seem to be an error in the line of code 
you provided, especially not when using Python 3, which has true 
division by default. However, in what I proposed, the type checker would 
complain because you made a manual conversion without changing the unit 
hint (which is also a potential source of error, and you seem to agree). 
According to my preliminary sketch, the correct way (which you did not 
quote) would be

     sleep(convert(time_in_milliseconds, 'ms', 's'))

I do think this might be unnecessarily verbose. Anyway, I was not 
proposing the 'user' should do the actual conversion calculation by hand.

> (1) you're
> wrong, I've made that error, yes even that simple, and (2) you should
> try it with more complex sets of units. How many pound-foot per minute
> squared in a newton?

There's no error so we will never find out whether I would have been 
wrong :(. But I can assure you, I have made errors in unit conversions 
too. Anyway, you did not quote the part of my email which addresses 
conversions and derived units (km/h). Regarding your example, it might 
work like this (not that I think this is optimal, though):

     convert(value, 'lb * ft / min**2', 'N')

> Having the language support unit calculations is not just to catch the
> wrong dimensions (passing a weight where a time is needed), but to
> manage unit conversions automatically without the user being responsible
> for getting the conversion right.

That would be ideal, I agree. Would that not be a really hard thing to 
introduce into the language, taking into account backwards compatibility 
and all? I intentionally proposed something less than that.

> A type checker is the wrong tool for
> the job.

At least not ideal. I do think the error of using the wrong unit is 
conceptually similar to many cases of accidentally passing something 
with the wrong type. Also, the type hints have other uses besides type 
checkers. Of course, having everything just work, without the 
user/programmer having to care, would be even better.

> If you want to see what a good unit-aware language should be capable of,
> check out:
>
> - Frink:https://futureboy.us/frinkdocs/
>
> - the HP-28 and HP-48 series of calculators;
>
> - the Unix/Linux "units" utility.
>
> There are also some existing Python libraries which do unit
> calculations. You should look into them.
>

There was also a talk at PyCon about existing libraries, but I can't 
seem to find it now. I assume some of you have seen it.

-- Koos


From koos.zevenhoven at aalto.fi  Fri May 15 16:21:58 2015
From: koos.zevenhoven at aalto.fi (Koos Zevenhoven)
Date: Fri, 15 May 2015 17:21:58 +0300
Subject: [Python-ideas] Units in type hints
In-Reply-To: <CANc-5UyZ7DJqjRpLc9T0XCW1y2ZXVBG4FTd+=Cz63fK4cs9UQQ@mail.gmail.com>
References: <5554810B.7050409@aalto.fi>
 <CANc-5UyZ7DJqjRpLc9T0XCW1y2ZXVBG4FTd+=Cz63fK4cs9UQQ@mail.gmail.com>
Message-ID: <55560106.4080909@aalto.fi>

Thanks for the email and tip!

For my own code, I tend to always use SI units or those derived from 
them. If I want 3 milliseconds, I do 3e-3. Although seconds are pretty 
universal, not everyone has the privilege of being born and raised in SI 
units :P. Well, I guess m/s is rarely the everyday unit for speed, anywhere.

For me, the problems arise when there are third-party non-SI functions 
or things like functions that take a duration in terms of samples of 
discretized signals (potentially Int['samples'] or in some cases 
Float['samples']).

-- Koos


On 2015-05-14 17:05, Skip Montanaro wrote:
> On Thu, May 14, 2015 at 6:03 AM, Koos Zevenhoven
> <koos.zevenhoven at aalto.fi> wrote:
>> How about extending the type annotations for int, float and complex to
>> optionally include also a unit?
> Not sure that's going to fly, but you might want to check out the
> magnitude package:
>
> https://pypi.python.org/pypi/magnitude/0.9.1
>
> I've used it in situations where I want to specify units scaled to a
> more natural (to me) size. For example, the gobject.timeout_add
> function takes a delay in milliseconds. Given that most of the time I
> want delays in seconds or minutes, it's much more natural for me to
> let magnitude do the work silently.
>
> Skip


From rosuav at gmail.com  Fri May 15 17:28:36 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Sat, 16 May 2015 01:28:36 +1000
Subject: [Python-ideas] Units in type hints
In-Reply-To: <5555FBFA.8090805@aalto.fi>
References: <5554810B.7050409@aalto.fi>
 <16724_1431604817_55548E51_16724_7557_1_20150514115956.GW5663@ando.pearwood.info>
 <5555FBFA.8090805@aalto.fi>
Message-ID: <CAPTjJmrRUNxDuPWuk-xpLh-GhfV8=hs3ePdrkiPNkt1wCU+2fg@mail.gmail.com>

On Sat, May 16, 2015 at 12:00 AM, Koos Zevenhoven
<koos.zevenhoven at aalto.fi> wrote:
> On 14.5.2015 14:59, Steven D'Aprano wrote:
>> But that's not an error. Calling sleep(weight_in_kilograms) is an error.
>
> In the example I gave, it is clearly an error. And it would be an error with
> time.sleep. But you are obviously right, sleeping for kilograms is also an
> error, although a very bizarre one.

I dunno, maybe you're a heavy sleeper? :)

>> If the user has to do the conversion themselves,
>> that's a source of error:
>>
>> sleep(time_in_milliseconds / 1000)  # convert to seconds
>>
>> If you think that's too obvious an error for anyone to make,
>
> You lost me now. There does not seem to be an error in the line of code you
> provided, especially not when using Python 3, which has true division by
> default. However, in what I proposed, the type checker would complain
> because you made a manual conversion without changing the unit hint (which
> is also potential source of error, and you seem to agree).

Dividing a unit-aware value by a scalar shouldn't be an error. "I have
an A4 sheet of paper. If I fold it in half seven times, how big will
it be?" => 210mm*297mm/(2**7) == 487.265625 mm^2.

The unit would simply stay the same after the division; what you'd
have is the thousandth part of the time, still in milliseconds. If you
have a typing system that's unit-aware, this would still be an error,
but it would be an error because you're still giving milliseconds to a
function that wants seconds.

It'd possibly be best to have actual real types for your unit-aware
values. Something like:

class UnitAware:
    def __init__(self, value: float, unit: str):
        self.value = value
        self.unit = unit
    def __mul__(self, other):
        if isinstance(other, UnitAware):
            ...  # perform compatibility/conversion checks
        else:
            return UnitAware(self.value * other, self.unit)
    # etc
    def to(self, unit):  # "as" is a keyword, so it can't be a method name
        ...  # attempt to convert this value into the other unit

Then you could have hinting types that stipulate specific units:

class Unit(str):
    # Note: isinstance() looks up __instancecheck__ on the *type*, so in
    # practice this hook would have to live on a metaclass.
    def __instancecheck__(self, val):
        return isinstance(val, UnitAware) and val.unit == self
ms = Unit("ms")
sec = Unit("sec")
m = Unit("m")

This would allow you to go a lot further than just type hints. But
maybe this would defeat the purpose, in that it'd have to have every
caller and callee aware that they're looking for a unit-aware value
rather than a raw number - so it wouldn't be easy to deploy
backward-compatibly.
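
To make the idea concrete, here is a minimal runnable variant of the
sketch above (the conversion table and method names are invented for
illustration):

```python
_TO_SECONDS = {'ms': 0.001, 'sec': 1.0, 'min': 60.0}

class UnitAware:
    def __init__(self, value, unit):
        if unit not in _TO_SECONDS:
            raise ValueError('unknown unit: %s' % unit)
        self.value = value
        self.unit = unit
    def to(self, unit):
        # convert through the base unit (seconds)
        factor = _TO_SECONDS[self.unit] / _TO_SECONDS[unit]
        return UnitAware(self.value * factor, unit)
    def __truediv__(self, scalar):
        # dividing by a plain number keeps the unit, as argued above
        return UnitAware(self.value / scalar, self.unit)

t = UnitAware(1500, 'ms')
t.to('sec').value     # 1.5 seconds
(t / 3).unit          # still 'ms'
```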

ChrisA

From skip.montanaro at gmail.com  Fri May 15 18:07:58 2015
From: skip.montanaro at gmail.com (Skip Montanaro)
Date: Fri, 15 May 2015 11:07:58 -0500
Subject: [Python-ideas] Units in type hints
In-Reply-To: <CAPTjJmrRUNxDuPWuk-xpLh-GhfV8=hs3ePdrkiPNkt1wCU+2fg@mail.gmail.com>
References: <5554810B.7050409@aalto.fi>
 <16724_1431604817_55548E51_16724_7557_1_20150514115956.GW5663@ando.pearwood.info>
 <5555FBFA.8090805@aalto.fi>
 <CAPTjJmrRUNxDuPWuk-xpLh-GhfV8=hs3ePdrkiPNkt1wCU+2fg@mail.gmail.com>
Message-ID: <CANc-5Uyud0xbNmjya3t-U1h5F+F4mnpo1nCkVnCi9PTtJDeV8g@mail.gmail.com>

On Fri, May 15, 2015 at 10:28 AM, Chris Angelico <rosuav at gmail.com> wrote:
> Dividing a unit-aware value by a scalar shouldn't be an error. "I
> have an A4 sheet of paper. If I fold it in half seven times, how big
> will it be?" => 210mm*297mm/(2**7) == 487.265625 mm^2.

Here's this example using the magnitude module:

>>> from magnitude import mg
>>> a4 = mg(210, "mm") * mg(297, "mm")
>>> a4
<magnitude.Magnitude instance at 0x15de9e0>
>>> # Invalid - units are actually mm^2
>>> a4.ounit("mm")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/lib/python2.7/site-packages/magnitude.py", line 397, in ounit
    (self.out_factor.unit, self.unit))
MagnitudeError: Inconsistent Magnitude units: [1, 0, 0, 0, 0, 0, 0, 0,
0], [2, 0, 0, 0, 0, 0, 0, 0, 0]
>>> a4.ounit("mm2")
<magnitude.Magnitude instance at 0x15de9e0>
>>> a4.ounit("mm2").toval()
62370.0
>>> a4.toval()
62370.0
>>> 210 * 297
62370
>>> # Not sure why dimensionless / isn't supported
>>> folded7x = a4 * (1/2**7)
>>> folded7x
<magnitude.Magnitude instance at 0x15f00e0>
>>> folded7x.ounit("mm2").toval()
487.265625

Skip

From apieum at gmail.com  Fri May 15 18:13:26 2015
From: apieum at gmail.com (Gregory Salvan)
Date: Fri, 15 May 2015 18:13:26 +0200
Subject: [Python-ideas] Units in type hints
In-Reply-To: <55560106.4080909@aalto.fi>
References: <5554810B.7050409@aalto.fi>
 <CANc-5UyZ7DJqjRpLc9T0XCW1y2ZXVBG4FTd+=Cz63fK4cs9UQQ@mail.gmail.com>
 <55560106.4080909@aalto.fi>
Message-ID: <CAAZsQLBV0wEg_3eBEKiQ88yGFMiQ59SLbU19Jg3za-Kd=6Lh4A@mail.gmail.com>

Hi,
why don't you try a tuple of types ?
(Float, Samples)

Period:
(int, Time)
and if you want to force seconds eventually:
(int, Time[seconds])


2015-05-15 16:21 GMT+02:00 Koos Zevenhoven <koos.zevenhoven at aalto.fi>:

> Thanks for the email and tip!
>
> For my own code, I tend to always use SI units or those derived from them.
> If I want 3 milliseconds, I do 3e-3. Although seconds are pretty universal,
> not everyone has the privilege of being born and raised in SI units :P.
> Well, I guess m/s is rarely the everyday unit for speed, anywhere.
>
> For me, the problems arise when there are third-party non-SI functions or
> things like functions that take a duration in terms of samples of
> discretized signals (potentially Int['samples'] or in some cases
> Float['samples']).
>
> -- Koos
>
>
>
> On 2015-05-14 17:05, Skip Montanaro wrote:
>
>> On Thu, May 14, 2015 at 6:03 AM, Koos Zevenhoven
>> <koos.zevenhoven at aalto.fi> wrote:
>>
>>> How about extending the type annotations for int, float and complex to
>>> optionally include also a unit?
>>>
>> Not sure that's going to fly, but you might want to check out the
>> magnitude package:
>>
>> https://pypi.python.org/pypi/magnitude/0.9.1
>>
>> I've used it in situations where I want to specify units scaled to a
>> more natural (to me) size. For example, the gobject.timeout_add
>> function takes a delay in milliseconds. Given that most of the time I
>> want delays in seconds or minutes, it's much more natural for me to
>> let magnitude do the work silently.
>>
>> Skip
>>
>
>

From steve at pearwood.info  Fri May 15 19:14:07 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 16 May 2015 03:14:07 +1000
Subject: [Python-ideas] Units in type hints
In-Reply-To: <5555FBFA.8090805@aalto.fi>
References: <5554810B.7050409@aalto.fi>
 <16724_1431604817_55548E51_16724_7557_1_20150514115956.GW5663@ando.pearwood.info>
 <5555FBFA.8090805@aalto.fi>
Message-ID: <20150515171407.GZ5663@ando.pearwood.info>

On Fri, May 15, 2015 at 05:00:26PM +0300, Koos Zevenhoven wrote:
> On 14.5.2015 14:59, Steven D'Aprano wrote:
[...]
> >>For instance,
> >>
> >>     def sleep(duration : Float['s']):
> >>         ...
> >>
> >>Now the type checker could catch the error of trying to pass the sleep
> >>duration in milliseconds, Float['ms'].
> >
> >But that's not an error. Calling sleep(weight_in_kilograms) is an error.
> 
> In the example I gave, it is clearly an error. And it would be an error 
> with time.sleep. But you are obviously right, sleeping for kilograms is 
> also an error, although a very bizarre one.

Calling sleep(x) where x is a millisecond unit should not be an error, 
because millisecond is just a constant times second. To be precise, 
milliseconds and seconds both have the same dimension, T (time) and so 
differ only by a fixed conversion constant. Hence:

sleep( 1000 millisecond )
sleep( 1 second )
sleep( 0.016666667 minute )
sleep( 8.2671958e-07 fortnight )
sleep( 0.91134442 feet/kph )

etc. should all have exactly the same result, namely, to sleep for one 
second.

(Obviously "1 second" is not valid Python syntax. I'm just using it as 
shorthand for whatever syntax is used, possibly a function call.)
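
The factors above do check out numerically, expressing everything in
seconds:

```python
second = 1.0
millisecond = second / 1000
minute = 60 * second
fortnight = 14 * 24 * 60 * minute          # 1,209,600 seconds
# a foot divided by (kilometres per hour) has the dimension of time:
feet_per_kph = 0.3048 / (1000 / 3600)      # about 1.09728 seconds

for quantity in (1000 * millisecond,
                 0.016666667 * minute,
                 8.2671958e-07 * fortnight,
                 0.91134442 * feet_per_kph):
    assert abs(quantity - second) < 1e-6   # all equal one second
```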


> >But calling sleep(milliseconds(1000)) should be the same as calling
> >sleep(seconds(1)).
> 
> Yes, something like that would be nice. What would sleep(1) do?

That depends on the sleep function. If we're talking about the actual 
time.sleep function that exists today, it will sleep for one second. But 
that's because it's not aware of units. A unit-aware function could:

- assume you know what you are doing and assign a default unit 
  to scalar quantities, e.g. treat 1 as "1 second";

- treat 1 as a dimensionless quantity and raise an exception
  ("no dimension" is not compatible with "time dimension").

Of the two, the Zen suggests the second is the right thing to do. ("In 
the face of ambiguity, refuse the temptation to guess.") But perhaps 
backwards-compatibility requires the first.

You could, I suppose, use a static type checker to get a really poor 
unit checker:

class Second(float):
    pass

def sleep(t: Second):
    time.sleep(t)


sleep(1.0)  # type checker flags this as wrong
sleep(Second(1.0))  # type checker allows this

But it's a really poor one, because it utterly fails to enforce 
conversion factors, not even a little bit:

class Minute(float):
    pass

one_hour = Minute(60.0)
sleep(one_hour)  # flagged as wrong
sleep(Second(one_hour))  # allowed

but of course that will sleep for 60 seconds, not one hour. The problem 
here is that we've satisfied the type checker with meaningless types 
that don't do any conversions, and left all the conversions up to the 
user.
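
One way to plug that hole (a sketch, not a proposal) is to make the unit
types themselves perform the conversion at construction time:

```python
class Second(float):
    def __new__(cls, value):
        if isinstance(value, Minute):
            # converting constructor: minutes -> seconds
            return super().__new__(cls, float(value) * 60.0)
        return super().__new__(cls, value)

class Minute(float):
    pass

one_hour = Minute(60.0)
Second(one_hour)   # 3600.0 -- now the conversion actually happens
```

Of course a real design would generate such classes from a dimension
table rather than hand-writing every pair, which is roughly where the
dedicated unit libraries come in.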



> >If the user has to do the conversion themselves,
> >that's a source of error:
> >
> >sleep(time_in_milliseconds / 1000)  # convert to seconds
> >
> >If you think that's too obvious an error for anyone to make,
> 
> You lost me now. There does not seem to be an error in the line of code 
> you provided, especially not when using Python 3, which has true 
> division by default.

D'oh!

Well, I demonstrated my point that unit conversions are prone to human 
error, only not the way I intended to. 

I *intended* to write the conversion the wrong way around, except I got 
it wrong myself. I *wrongly* convinced myself that the conversion factor 
was milliseconds * 1000 -> seconds, hence /1000 would get it wrong. Only 
it isn't.

Believe me, I didn't intend to make my point in such a convoluted way. 
This was a genuine screw-up on my part.


> However, in what I proposed, the type checker would 
> complain because you made a manual conversion without changing the unit 
> hint (which is also potential source of error, and you seem to agree). 
> According to my preliminary sketch, the correct way (which you did not 
> quote) would be
> 
>     sleep(convert(time_in_milliseconds, 'ms', 's'))

That can't work, because how does the static type checker know that 

    convert(time_in_milliseconds, 'ms', 's')

returns seconds rather than milliseconds or minutes or days? What sort 
of annotations can you give convert() that will be known at compile 
time? Maybe you can see something I haven't thought of, but I cannot 
think of any possible static declaration which would allow a type 
checker to correctly reason that

    convert(x, 'ms', 's')

returns Second or Float['s'], and so does this:

    new_unit = get_unit_from_config()
    convert(y, 'minute', new_unit)

but not this:

    convert(z, 's', 'hour')

Let alone more complex cases involving units multiplied, divided, and 
raised to powers.


-- 
Steve

From random832 at fastmail.us  Fri May 15 20:14:52 2015
From: random832 at fastmail.us (random832 at fastmail.us)
Date: Fri, 15 May 2015 14:14:52 -0400
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <BE014335-D461-4DCA-A123-F71752699518@yahoo.com>
References: <mj1bv1$u93$1@ger.gmane.org>
 <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com>
 <1431614947.2825480.268771377.2227A960@webmail.messagingengine.com>
 <BE014335-D461-4DCA-A123-F71752699518@yahoo.com>
Message-ID: <1431713692.3280482.269840281.21784468@webmail.messagingengine.com>

On Thu, May 14, 2015, at 15:48, Andrew Barnert wrote:
> > Technically filesystem names (and other similar boundary APIs like
> > environ, anything ctypes, etc) on Windows can contain arbitrary
> > surrogates
> 
> Are you sure? I thought that, unless you're using Win95 or NT 3.1 or
> something, Win32 *W APIs are explicitly for Unicode characters (not code
> units),

Windows documentation often uses "unicode" to mean UTF-16 and
"character" to mean WCHAR. The real point is that the APIs perform no
validation, and existing filenames on the disk, user input into edit
controls, etc, can contain invalid surrogates. There's basically nothing
at any point to reject invalid surrogates. I can create a file now whose
filename consists of a single surrogate code unit. I can copy that
filename to the clipboard, paste it anywhere, create more files with it
in the filename or contents, etc. (Notepad, incidentally, will save a
UTF-16 file containing an invalid surrogate, but saving it as UTF-8 will
replace it with U+FFFD, the one and only place I could find where
invalid surrogates are rejected by Windows).
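
Python shows the same permissiveness on its side of the boundary: a lone
surrogate is a perfectly legal str, and surrogatepass will happily
serialize it (a quick sketch):

```python
name = 'file\ud800name'            # lone high surrogate, legal in a str
try:
    name.encode('utf-16-le')       # strict encoding rejects it...
    raise AssertionError('strict encode should have failed')
except UnicodeEncodeError:
    pass
# ...but surrogatepass round-trips it, much as the Win32 APIs do
data = name.encode('utf-16-le', errors='surrogatepass')
assert data.decode('utf-16-le', errors='surrogatepass') == name
```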

> minus nulls and any relevant reserved characters (e.g.. no
> slashes in filenames, no control characters in filenames except for
> substream names, etc.). That's what the Naming Files doc seems to imply.
> (Then again, there are other areas that seem confusing or
> misleading--e.g., where it tells you not to worry about normalization
> because once the string gets through Win32 and to the filesystem it's
> just a string of WCHARs, which sounds to me like that's exactly why you
> _should_ worry about normalization...)

Well, it depends on why you're worried about it. No normalization is
great for being able to expect that your filename you just saved will
come back unchanged in a directory listing.

From random832 at fastmail.us  Fri May 15 20:19:35 2015
From: random832 at fastmail.us (random832 at fastmail.us)
Date: Fri, 15 May 2015 14:19:35 -0400
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <70D2C600-9CEC-4837-9F17-25FAE9A84026@yahoo.com>
References: <mi79rj$vl8$1@ger.gmane.org>
 <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp>
 <mi8q15$bbp$1@ger.gmane.org> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7eYJgnFuAmbooLdKpyT3KCDfJxNweL3xBWVsRxLeHc97w@mail.gmail.com>
 <554AC2CE.5040705@btinternet.com>
 <3D6D122B-68A4-439E-A875-EBE412AAC31B@yahoo.com>
 <CALGmxEL8GBVtCApFcKoOp9DgH4TPb0GbAqvNH7571jX9H86TKw@mail.gmail.com>
 <87a8xg3tsc.fsf@uwakimon.sk.tsukuba.ac.jp>
 <1431527608.2421991.267775809.4755A9EE@webmail.messagingengine.com>
 <87y4ks1idw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <1431614750.2824980.268749449.2F8C06E3@webmail.messagingengine.com>
 <70D2C600-9CEC-4837-9F17-25FAE9A84026@yahoo.com>
Message-ID: <1431713975.3281476.269867321.225D681F@webmail.messagingengine.com>

On Thu, May 14, 2015, at 16:17, Andrew Barnert wrote:
> The point is that you can miscount lengths by counting the BOM, and you
> can split a BOM stream into a BOM steam and an "I hope it's in native
> order or we're screwed" stream.

Python provides no operations for splitting streams. You mention
re.split further on, but that only works on in-memory strings, which
should have already had the BOM stripped and been put in native order.
In-memory wide strings should _never_ be in an endianness other than the
machine's native one and should _never_ have a BOM. That should be taken
care of when reading it off the disk/wire. If you haven't done that, you
still have a byte array, which it's not so easy to accidentally assume
you'll be able to split up and pass to your fromstring function.

> Which is wrong, but it works when I test it,
> because most UTF-16 files are little-endian, and so is my machine. And
> then someone runs my app on a big-endian machine and they get a
> hard-to-debug exception (or, if we're really unlucky, silent mojibake,
> but that's pretty rare).

The proper equivalent of a UTF-16 file with a byte-order-mark would be a
_binary_ StringIO on a _byte_ array containing a BOM and UTF-16. You can
layer a TextIOWrapper on top of either of them. And it never makes sense
to expect to be able to arbitrarily split up encoded byte arrays,
whether those are in UTF-16 or not.
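
That layering is a one-liner in Python; a quick sketch:

```python
import io

payload = 'BOM demo: \u03c0 = 3.14159'
# encode('utf-16') prepends a native-order BOM, giving the byte array
raw = io.BytesIO(payload.encode('utf-16'))
# TextIOWrapper consumes the BOM and picks the right byte order
text_stream = io.TextIOWrapper(raw, encoding='utf-16')
assert text_stream.read() == payload
```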

> Usually, when you have WCHARs, it's because you opened a file and wread
> from it, or received UTF-16 over the network of from a Windows FooW API,
> in which case you have the same endianness issues as any other binary I/O
> on non-char-sized types. And yes, of course the right answer is to decode
> at input, but if you're doing that, why wouldn't you just decide to
> Unicode instead of byte-swapping the WCHARs?

You shouldn't have WCHARS (of any kind) in the first place until you've
decoded. If you're receiving UTF-16 of unknown endianness over the
network you should be receiving it as bytes. If you're directly calling
a FooW API, you are obviously on a win32 system and you've already got
native WCHARs in native endianness. But, once again, that wasn't really
my point.

My point that there are no native libc functions for working with utf-8
strings - even if you're willing to presume that the native multibyte
character set is UTF-8, there are very few standard functions for
working with multibyte characters. "ascii compatibility" means you're
going to write something using strchr or strtok that works for ascii
characters and does something terrible when given non-ascii multibyte
characters to search for.

The benefits of using libc only work if you play by libc's rules, which
we've established are inadequate. If you're _not_ going to use libc
string functions, then there's no reason not to prefer UTF-32 (when
you're not using the FSR, which is essentially a fancy immutable
container for UTF-32 code points) over UTF-8.

From abarnert at yahoo.com  Fri May 15 21:37:57 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Fri, 15 May 2015 12:37:57 -0700
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <1431713975.3281476.269867321.225D681F@webmail.messagingengine.com>
References: <mi79rj$vl8$1@ger.gmane.org>
 <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <mi8q15$bbp$1@ger.gmane.org>
 <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7eYJgnFuAmbooLdKpyT3KCDfJxNweL3xBWVsRxLeHc97w@mail.gmail.com>
 <554AC2CE.5040705@btinternet.com>
 <3D6D122B-68A4-439E-A875-EBE412AAC31B@yahoo.com>
 <CALGmxEL8GBVtCApFcKoOp9DgH4TPb0GbAqvNH7571jX9H86TKw@mail.gmail.com>
 <87a8xg3tsc.fsf@uwakimon.sk.tsukuba.ac.jp>
 <1431527608.2421991.267775809.4755A9EE@webmail.messagingengine.com>
 <87y4ks1idw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <1431614750.2824980.268749449.2F8C06E3@webmail.messagingengine.com>
 <70D2C600-9CEC-4837-9F17-25FAE9A84026@yahoo.com>
 <1431713975.3281476.269867321.225D681F@webmail.messagingengine.com>
Message-ID: <503A25DF-4B83-4DBA-A5AA-3F3B0224B596@yahoo.com>

On May 15, 2015, at 11:19, random832 at fastmail.us wrote:
> 
>> On Thu, May 14, 2015, at 16:17, Andrew Barnert wrote:
>> The point is that you can miscount lengths by counting the BOM, and you
>> can split a BOM stream into a BOM stream and an "I hope it's in native
>> order or we're screwed" stream.
> 
> Python provides no operations for splitting streams. You mention
> re.split further on, but that only works on in-memory strings, which
> should have already had the BOM stripped and been put in native order.

If you're decoding to text, you don't have UTF-16 anymore (or, if you do under the covers, you neither know nor care that you do), you have Unicode text. 

Conversely, if you have UTF-16--even in native order and with the BOM stripped--you don't have text, you still have bytes (or WCHARs, if you prefer, but not in Python).

Why would you want to transcode from one encoding to another in memory just to still have to work on encoded bytes? There's no more reason for you to be passing byteswapped, BOM-stripped UTF-16 to re.split than there is for you to be passing any other encoded bytes to re.split.

> In-memory wide strings should _never_ be in an endianness other than the
> machine's native one and should _never_ have a BOM. That should be taken
> care of when reading it off the disk/wire. If you haven't done that, you
> still have a byte array, which it's not so easy to accidentally assume
> you'll be able to split up and pass to your fromstring function.

I explicitly mentioned opening the file in binary mode, reading it in, and passing it to some fromstring function that takes bytes, so yes, of course you have a byte array.

And again, if you have UTF-16, even in native endianness and without a BOM, that's still a byte array, so how is that any different?

And of course you can have in-memory byte arrays with a BOM, or in non-native endianness; that's what the UTF-16 and UTF-16-BE (or -LE) codecs produce and consume.

And it _is_ easy to use those byte arrays, exactly as easy as to use UTF-8 byte arrays or native-endian BOM-less UTF-16 byte arrays or anything else. All you need is a library that's willing to do the decoding for you in its loads/fromstring/etc. function, which includes most libraries on PyPI (because otherwise they wouldn't work with str in 2.x). See simplejson, for an example.
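As a quick sketch of that (using the stdlib json module, which gained bytes support in 3.6; simplejson behaves the same way):

```python
import json

# json.loads accepts encoded bytes and detects UTF-8/16/32 from the BOM
# or from the zero-byte pattern, so the caller never strips a BOM or
# byte-swaps by hand.
doc = '{"name": "caf\u00e9"}'
as_utf8 = json.loads(doc.encode('utf-8'))
as_utf16 = json.loads(doc.encode('utf-16'))        # BOM prepended by the codec
as_utf16_be = json.loads(doc.encode('utf-16-be'))  # no BOM, non-native order
```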

>> Which is wrong, but it works when I test it,
>> because most UTF-16 files are little-endian, and so is my machine. And
>> then someone runs my app on a big-endian machine and they get a
>> hard-to-debug exception (or, if we're really unlucky, silent mojibake,
>> but that's pretty rare).
> 
> The proper equivalent of a UTF-16 file with a byte-order-mark would be a
> _binary_ StringIO on a _byte_ array containing a BOM and UTF-16.

I mentioned BytesIO; that's what a binary StringIO is called.

> You can
> layer a TextIOWrapper on top of either of them. And it never makes sense
> to expect to be able to arbitrarily split up encoded byte arrays,
> whether those are in UTF-16 or not.

There are countless protocols and file formats that _require_ being able to split byte arrays before decoding them. That's how you split the header and body of an RFC822 message like an email or an HTTP response, and how you parse OLE substreams out of a binary-format Office file.

>> Usually, when you have WCHARs, it's because you opened a file and wread
>> from it, or received UTF-16 over the network or from a Windows FooW API,
>> in which case you have the same endianness issues as any other binary I/O
>> on non-char-sized types. And yes, of course the right answer is to decode
>> at input, but if you're doing that, why wouldn't you just decode to
>> Unicode instead of byte-swapping the WCHARs?
> 
> You shouldn't have WCHARS (of any kind) in the first place until you've
> decoded.

And yet Microsoft's APIs, both Win32 and MSVCRT, are full of wread and similar functions.

But anyway, I'll grant that you usually shouldn't have WCHARs before you've decoded.

But you definitely should not have WCHARs _after_ you've decoded. In fact, you _can't_ have them after you've decoded, because a WCHAR isn't big enough to hold a Unicode code point. If you have WCHARs, either you're still encoded (or just transcoded to UTF-16), or your code will break as soon as you get a Chinese user with a moderately uncommon last name.

So, you should never have WCHARs. Which was my point in the first place.

If you need to deal with UTF-16 streams, treat them as streams of bytes and decode them the same way you would UTF-8 or Big5 or anything else, don't treat them as streams of WCHARs that are often but not always complete Unicode characters.
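In other words, something like this sketch, where the 'utf-16' codec handles the BOM and byte order at the boundary:

```python
# Decoding UTF-16 input as bytes at the boundary: the 'utf-16' codec
# consumes the BOM, picks the byte order, and combines surrogate pairs,
# so astral characters arrive as single code points rather than WCHARs.
le = b'\xff\xfeG\x00'                  # BOM + 'G', little-endian
be = b'\xfe\xff\x00G'                  # BOM + 'G', big-endian
astral = b'\xff\xfe\x34\xd8\x1e\xdd'   # BOM + U+1D11E as a surrogate pair

g_from_le = le.decode('utf-16')
g_from_be = be.decode('utf-16')
clef = astral.decode('utf-16')
```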

> If you're receiving UTF-16 of unknown endianness over the
> network you should be receiving it as bytes. If you're directly calling
> a FooW API, you are obviously on a win32 system and you've already got
> native WCHARs in native endianness.

Only if you got those characters from another win32 FooW API, as opposed to, say, from user input from a cross-platform GUI framework that may have different rules from Windows.

> But, once again, that wasn't really
> my point.
> 
> My point was that there are no native libc functions for working with utf-8
> strings - even if you're willing to presume that the native multibyte
> character set is UTF-8, there are very few standard functions for
> working with multibyte characters. "ascii compatibility" means you're
> going to write something using strchr or strtok that works for ascii
> characters and does something terrible when given non-ascii multibyte
> characters to search for.

But many specific static patterns _do_ work with ASCII compatible encodings. Again, think of HTTP responses. Even though the headers and body are both text, they're defined as being separated by b"\r\n\r\n".
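A minimal sketch of that kind of byte-level split (the header and body contents here are made up for illustration):

```python
# Splitting an HTTP response at the byte level before decoding anything:
# the b"\r\n\r\n" separator is defined on bytes, and each half is then
# decoded by its own rules.
raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/plain; charset=utf-8\r\n"
       b"\r\n"
       b"caf\xc3\xa9")
header_bytes, body_bytes = raw.split(b"\r\n\r\n", 1)
status_line = header_bytes.split(b"\r\n")[0].decode('ascii')
body = body_bytes.decode('utf-8')  # per the declared charset
```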

If this were never useful--or if it often seemed useful but was really just an attractive nuisance--Python 3 wouldn't have bytes.split and bytes.find and be adding bytes.__mod__. Or do you think that proposal is a mistake?

> The benefits of using libc only work if you play by libc's rules, which
> we've established are inadequate. If you're _not_ going to use libc
> string functions, then there's no reason not to prefer UTF-32 (when
> you're not using the FSR, which is essentially a fancy immutable
> container for UTF-32 code points) over UTF-8.

Preferring UTF-32 over UTF-8 makes perfect sense. But that's not what you started out arguing. Nick mentioned off-hand that UTF-16 has the worst of both worlds of UTF-8 and UTF-32, Stephen explained that further to someone else, and you challenged his explanation, arguing that UTF-16 doesn't introduce any problems over UTF-8. But it does. It introduces all the same problems as UTF-32, but without any of the benefits.


From wes.turner at gmail.com  Fri May 15 23:23:10 2015
From: wes.turner at gmail.com (Wes Turner)
Date: Fri, 15 May 2015 16:23:10 -0500
Subject: [Python-ideas] Units in type hints
In-Reply-To: <5554810B.7050409@aalto.fi>
References: <5554810B.7050409@aalto.fi>
Message-ID: <CACfEFw9nz8bnx6rJ03nGjjxEb5aBspe2i+47-YcFu4gV5sxGiQ@mail.gmail.com>

* https://pint.readthedocs.org/en/latest/ (supports NumPy)
* QUDT maintains SI units, non-SI units, conversion factors, labels, etc.
as RDF classes and instances with properties:
  * https://wrdrd.com/docs/consulting/knowledge-engineering#qudt

On Thu, May 14, 2015 at 6:03 AM, Koos Zevenhoven <koos.zevenhoven at aalto.fi>
wrote:

> Hi all,
>
> How about extending the type annotations for int, float and complex to
> optionally include also a unit?
>
> For instance,
>
>     def sleep(duration : Float['s']):
>         ...
>
> Now the type checker could catch the error of trying to pass the sleep
> duration in milliseconds, Float['ms']. This would also be useful for
> documentation, avoiding the 'need' for having names like duration_s. At
> least the notation with square brackets would resemble the way units are
> often written in science.
>
> Another example:
>
>     def calculate_travel_time(distance: Float['km']) -> Float['h']:
>         speed = get_current_speed()  # type: Float['km/h']
>         return distance / speed
>
> Now, if you try to pass the distance in miles, or Float['mi'], the type
> checker would catch the error. Note that the type checker would also
> understand that 'km' divided by 'km/h' becomes 'h'. Or should these be
> something like units.km / units.h?
>
> But if you do have your distance in miles, you do
>
>     calculate_travel_time(units.convert(distance_mi, 'mi', 'km'))
>
> and the type checker and programmer get what they want.
>
> Anyone interested?
>
>
> -- Koos
>
>
>
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150515/f0f93beb/attachment.html>

From random832 at fastmail.us  Fri May 15 23:52:18 2015
From: random832 at fastmail.us (random832 at fastmail.us)
Date: Fri, 15 May 2015 17:52:18 -0400
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <503A25DF-4B83-4DBA-A5AA-3F3B0224B596@yahoo.com>
References: <mi79rj$vl8$1@ger.gmane.org>
 <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp>
 <mi8q15$bbp$1@ger.gmane.org> <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7eYJgnFuAmbooLdKpyT3KCDfJxNweL3xBWVsRxLeHc97w@mail.gmail.com>
 <554AC2CE.5040705@btinternet.com>
 <3D6D122B-68A4-439E-A875-EBE412AAC31B@yahoo.com>
 <CALGmxEL8GBVtCApFcKoOp9DgH4TPb0GbAqvNH7571jX9H86TKw@mail.gmail.com>
 <87a8xg3tsc.fsf@uwakimon.sk.tsukuba.ac.jp>
 <1431527608.2421991.267775809.4755A9EE@webmail.messagingengine.com>
 <87y4ks1idw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <1431614750.2824980.268749449.2F8C06E3@webmail.messagingengine.com>
 <70D2C600-9CEC-4837-9F17-25FAE9A84026@yahoo.com>
 <1431713975.3281476.269867321.225D681F@webmail.messagingengine.com>
 <503A25DF-4B83-4DBA-A5AA-3F3B0224B596@yahoo.com>
Message-ID: <1431726738.3335993.270006353.65CC50BF@webmail.messagingengine.com>

On Fri, May 15, 2015, at 15:37, Andrew Barnert wrote:
> Conversely, if you have UTF-16--even in native order and with the BOM
> stripped--you don't have text, you still have bytes (or WCHARs, if you
> prefer, but not in Python).

This line of discussion began with someone asserting the [dubious]
merits of using the native libc functions, which on Windows does mean
UTF-16 WCHARs as well as (ASCII, but certainly not properly-handled
UTF-8) bytes.

> I explicitly mentioned opening the file in binary mode, reading it in,
> and passing it to some fromstring function that takes bytes, so yes, of
> course you have a byte array.

Why would a fromstring function take bytes? How would you use re.split
on it?

> > You shouldn't have WCHARS (of any kind) in the first place until you've
> > decoded.
> 
> And yet Microsoft's APIs, both Win32 and MSVCRT, are full of wread and
> similar functions.

No such thing as "wread". And given the appropriate flags to _open,
_read can perform decoding.

> But anyway, I'll grant that you usually shouldn't have WCHARs before
> you've decoded.
> 
> But you definitely should not have WCHARs _after_ you've decoded. In
> fact, you _can't_ have them after you've decoded, because a WCHAR isn't
> big enough to hold a Unicode code point.

You're nitpicking on word choice. Going from bytes to UTF-16 words
[whether as WCHAR or unsigned short] is a form of decoding. Or don't you
think python narrow builds' decode function was properly named?

> But many specific static patterns _do_ work with ASCII compatible
> encodings. Again, think of HTTP responses. Even though the headers and
> body are both text, they're defined as being separated by b"\r\n\r\n".

Right, but those aren't UTF-8. Working with ASCII is fine, but don't
pretend you've actually found a way to work with UTF-8.

> Preferring UTF-32 over UTF-8 makes perfect sense. But that's not what you
> started out arguing. Nick mentioned off-hand that UTF-16 has the worst of
> both worlds of UTF-8 and UTF-32, Stephen explained that further to
> someone else, and you challenged his explanation, arguing that UTF-16
> doesn't introduce any problems over UTF-8.
> But it does. It introduces all
> the same problems as UTF-32, but without any of the benefits.

No, because UTF-32 has the additional problem, shared with UTF-8, that
(Windows) libc doesn't support it.

My point was that if you want the benefits of using libc you have to pay
the costs of using libc, and that means using libc's native encodings.
Which, on Windows, are UTF-16 and (e.g.) Codepage 1252. If you don't
want the benefits of using libc, then there's no benefit to using UTF-8.

From abarnert at yahoo.com  Sat May 16 01:44:23 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Fri, 15 May 2015 16:44:23 -0700
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <1431726738.3335993.270006353.65CC50BF@webmail.messagingengine.com>
References: <mi79rj$vl8$1@ger.gmane.org>
 <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp> <mi8q15$bbp$1@ger.gmane.org>
 <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7eYJgnFuAmbooLdKpyT3KCDfJxNweL3xBWVsRxLeHc97w@mail.gmail.com>
 <554AC2CE.5040705@btinternet.com>
 <3D6D122B-68A4-439E-A875-EBE412AAC31B@yahoo.com>
 <CALGmxEL8GBVtCApFcKoOp9DgH4TPb0GbAqvNH7571jX9H86TKw@mail.gmail.com>
 <87a8xg3tsc.fsf@uwakimon.sk.tsukuba.ac.jp>
 <1431527608.2421991.267775809.4755A9EE@webmail.messagingengine.com>
 <87y4ks1idw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <1431614750.2824980.268749449.2F8C06E3@webmail.messagingengine.com>
 <70D2C600-9CEC-4837-9F17-25FAE9A84026@yahoo.com>
 <1431713975.3281476.269867321.225D681F@webmail.messagingengine.com>
 <503A25DF-4B83-4DBA-A5AA-3F3B0224B596@yahoo.com>
 <1431726738.3335993.270006353.65CC50BF@webmail.messagingengine.com>
Message-ID: <5F666AAB-E680-4EF5-973F-AC33A03F64F2@yahoo.com>

On May 15, 2015, at 14:52, random832 at fastmail.us wrote:
> 
>> On Fri, May 15, 2015, at 15:37, Andrew Barnert wrote:

>> I explicitly mentioned opening the file in binary mode, reading it in,
>> and passing it to some fromstring function that takes bytes, so yes, of
>> course you have a byte array.
> 
> Why would a fromstring function take bytes?

I just gave you a specific example of this (simplejson.loads), and explained why they do it (because the same code is how they work with str in 2.x), in the very next paragraph, which you snipped out. And I'd already explained it in the previous email. I'm not sure how many other ways there are to explain it. I'd bet that the vast majority of modules on PyPI that have a fromstring/loads/parsetxt/readcsv/etc.-style function can take bytes; how is this surprising to you?

> How would you use re.split
> on it?

On a bytes? This is explained in the second line of the re docs: re works with byte patterns and strings just as it works with Unicode patterns and strings.
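For instance:

```python
import re

# re accepts bytes patterns against bytes (and str patterns against
# str); mixing the two raises TypeError, so byte-level splitting is
# always explicit.
parts = re.split(b"\r\n", b"alpha\r\nbeta\r\ngamma")
```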

>> But anyway, I'll grant that you usually shouldn't have WCHARs before
>> you've decoded.
>> 
>> But you definitely should not have WCHARs _after_ you've decoded. In
>> fact, you _can't_ have them after you've decoded, because a WCHAR isn't
>> big enough to hold a Unicode code point.
> 
> You're nitpicking on word choice.

No, I'm not. Pretending 16-bit wide chars are "Unicode" is not just a trivial matter of bad word choice, it's wrong, and it's exactly how the world created the problems that this thread is trying to help solve.

Win32, Cocoa, and Java have the good excuse that they were created back when Unicode only had 64K code points and, as far as anyone believed, always would. So they were based on UCS2, and later going from there to UTF-16 broke less code than going from there to UCS4 would have. But that isn't a good reason for any new framework, library, or app to use UTF-16.

> Going from bytes to UTF-16 words
> [whether as WCHAR or unsigned short] is a form of decoding.

Only in the same sense that going from Shift-JIS to UTF-8 is a form of decoding. Or, for that matter, going from UTF-16 to Baudot 6-bit units, if that's what your code wants to work on.

If your code treats UTF-8 or UTF-16 or Shift-JIS strings as sequences of unicode characters, it makes sense to call that decoding. If your code treats them as sequences of bytes or words, then your strings are still encoded bytes or words, not strings, and it's misleading to call that decoding.

> Or don't you
> think python narrow builds' decode function was properly named?

The real problem was that Python narrow builds shouldn't exist in the first place. Which was fixed in 3.3, so I don't think I need to argue that it should be fixed.

>> But many specific static patterns _do_ work with ASCII compatible
>> encodings. Again, think of HTTP responses. Even though the headers and
>> body are both text, they're defined as being separated by b"\r\n\r\n".
> 
> Right, but those aren't UTF-8. Working with ASCII is fine, but don't
> pretend you've actually found a way to work with UTF-8.

But the same functions _do_ work for UTF-8. That's one of the whole points of UTF-8: every byte is unambiguously either a single character, a leading byte, or a continuation byte. This means you can search any UTF-8 encoded string for any UTF-8-encoded substring (or any regex pattern) and it will never have false positives (or negatives), whether that substring or pattern is b'\r\n\r\n' or the UTF-8 encoding of an astral character.

And that's the only reason that searching UTF-16 works: every word is unambiguously either a single character, a leading surrogate, or a continuation surrogate. So UTF-16 is exactly the same as UTF-8 here, for exactly the same reason; it's not better.
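That property is easy to check directly (the musical-note emoji below is just a stand-in astral character):

```python
# Self-synchronization in both encodings: an encoded needle can only
# match at true character boundaries, because lead and continuation
# bytes (UTF-8) or lead and trail surrogates (UTF-16) occupy disjoint
# ranges.
text = 'na\u00efve caf\u00e9 \U0001F3B5 end'
needle8 = '\U0001F3B5'.encode('utf-8')       # 4 bytes
pos8 = text.encode('utf-8').find(needle8)
needle16 = '\U0001F3B5'.encode('utf-16-le')  # 2 code units (4 bytes)
pos16 = text.encode('utf-16-le').find(needle16)
```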

>> Preferring UTF-32 over UTF-8 makes perfect sense. But that's not what you
>> started out arguing. Nick mentioned off-hand that UTF-16 has the worst of
>> both worlds of UTF-8 and UTF-32, Stephen explained that further to
>> someone else, and you challenged his explanation, arguing that UTF-16
>> doesn't introduce any problems over UTF-8.
>> But it does. It introduces all
>> the same problems as UTF-32, but without any of the benefits.
> 
> No, because UTF-32 has the additional problem, shared with UTF-8, that
> (Windows) libc doesn't support it.

But Windows libc doesn't support UTF-16. When you call wcslen on a string holding an astral-plane emoji, that emoji counts as 2 characters, not 1. It returns "the count of characters" in "wide (two-byte) characters", which aren't actually characters.
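The same count is easy to reproduce from Python, since a UTF-16 code unit is exactly what wcslen counts:

```python
# A Python analogue of the wcslen pitfall: counting UTF-16 code units
# ("wide characters") overcounts astral characters, while len() on str
# counts real code points.
s = 'G\U0001F3B5'                                  # 1 BMP char + 1 astral emoji
wide_char_count = len(s.encode('utf-16-le')) // 2  # what wcslen would report
code_point_count = len(s)
```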

> My point was that if you want the benefits of using libc you have to pay
> the costs of using libc, and that means using libc's native encodings.
> Which, on Windows, are UTF-16 and (e.g.) Codepage 1252. If you don't
> want the benefits of using libc, then there's no benefit to using UTF-8.


The traditional libc functions like strlen and strstr don't care what your native encoding or actual encoding are. Some of them will produce the right result with UTF-8 even if it isn't your encoding (strstr), some will produce the wrong result even if it is (strlen). There are also some newer functions that do care (mbslen), which are only right if UTF-8 is your locale encoding (which it probably isn't, and you're probably not going to set LC_CTYPE yourself).

The ones that are always right with UTF-8 have corresponding wide functions that are right with UTF-16, the ones that are always wrong with UTF-8 have corresponding wide functions that are always wrong with UTF-16, and the ones that are locale-dependent don't have corresponding wide functions at all, forcing you to use functions that are always wrong.

Microsoft's libc documentation is seriously misleading, and refers to functions like wcslen as returning "the count in characters", but is generally not misleading for strlen. Catching UTF-8 strlen-style bugs before release requires testing some non-English text; catching UTF-16 wcslen-style bugs requires testing very specific kinds of text (you pretty much have to know what an astral is to even guess what kind of text you need--although emoji are making the problem more noticeable).

Which part of that is an advantage for UTF-16 with libc? In every case, it's either the same as UTF-8 (strstr) or worse (both mbslen and strlen, for different reasons).

From stephen at xemacs.org  Sat May 16 05:56:07 2015
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 16 May 2015 12:56:07 +0900
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <CACac1F-2Dt+cEQfPASRzTPEzqK7Nz-RYDH=fg6UkaT_XSCbO1w@mail.gmail.com>
References: <mj1bv1$u93$1@ger.gmane.org>
 <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com>
 <mj1sjm$ukh$1@ger.gmane.org>
 <115169E2-1271-42F8-9B72-E863EE61DBEA@yahoo.com>
 <87fv6z1awu.fsf@uwakimon.sk.tsukuba.ac.jp>
 <A50A63C8-425D-4CD3-9D93-FDC10850E263@yahoo.com>
 <87egmi1wm5.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CACac1F-2Dt+cEQfPASRzTPEzqK7Nz-RYDH=fg6UkaT_XSCbO1w@mail.gmail.com>
Message-ID: <87bnhl18h4.fsf@uwakimon.sk.tsukuba.ac.jp>

Paul Moore writes:

 > One case I'd found a need for text->text handling (although not
 > related to surrogates) was taking arbitrary Unicode and applying an
 > error handler to it before writing it to a stream with "strict"
 > encoding. (So something like "arbitrary text".encode('latin1',
 > errors='backslashreplace').decode('latin1')).

That's not the use case envisioned for these functions, though.  You
want to change the textual content of the stream (by restricting the
repertoire), not change the representation of non-textual content.

 > The encode/decode pair seemed ugly, although it was the only way I
 > could find.

I find the fact that there's an output stream with an inappropriate
error handler far uglier!

Note that the encode/decode pair is quite efficient, although the
"rehandle" function could be about twice as fast.  Still, if you're
output-bound by the speed of a disk or the like, encode/decode will
have no trouble keeping up.
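For reference, the pair in question (with the stdlib's spelling of the error handler, 'backslashreplace') is just:

```python
# The encode/decode pair discussed above, using the stdlib
# 'backslashreplace' error handler: anything Latin-1 can't represent
# becomes a \xNN / \uNNNN / \UNNNNNNNN escape, and the result is a str
# that a strict Latin-1 stream can always write.
def restrict_to_latin1(text):
    return text.encode('latin-1', errors='backslashreplace').decode('latin-1')

out = restrict_to_latin1('caf\u00e9 \u2013 \U0001F3B5')
```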

 > I could easily imagine using a "rehandle" type of function for this
 > (although I wouldn't use the actual proposed functions here, as the
 > use of "surrogate" and "astral" in the names would lead me to
 > assume they were inappropriate).

AFAICT, you'd be right -- they don't (as proposed) handle your use
case of restricting to a Unicode subset.  Your kind of use case is why
I think general repertoire filtering functions in unicodedata (or a
new unicodetools package) would be a much better home for this
functionality.

 > Whether that's an argument for or against the idea that they are an
 > attractive nuisance, I'm not sure :-)

I think your use case is quite independent of that issue.


From stephen at xemacs.org  Sat May 16 06:26:19 2015
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 16 May 2015 13:26:19 +0900
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <1431726738.3335993.270006353.65CC50BF@webmail.messagingengine.com>
References: <mi79rj$vl8$1@ger.gmane.org>
 <878ud4599h.fsf@uwakimon.sk.tsukuba.ac.jp>
 <mi8q15$bbp$1@ger.gmane.org>
 <871tiv5t5z.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7eYJgnFuAmbooLdKpyT3KCDfJxNweL3xBWVsRxLeHc97w@mail.gmail.com>
 <554AC2CE.5040705@btinternet.com>
 <3D6D122B-68A4-439E-A875-EBE412AAC31B@yahoo.com>
 <CALGmxEL8GBVtCApFcKoOp9DgH4TPb0GbAqvNH7571jX9H86TKw@mail.gmail.com>
 <87a8xg3tsc.fsf@uwakimon.sk.tsukuba.ac.jp>
 <1431527608.2421991.267775809.4755A9EE@webmail.messagingengine.com>
 <87y4ks1idw.fsf@uwakimon.sk.tsukuba.ac.jp>
 <1431614750.2824980.268749449.2F8C06E3@webmail.messagingengine.com>
 <70D2C600-9CEC-4837-9F17-25FAE9A84026@yahoo.com>
 <1431713975.3281476.269867321.225D681F@webmail.messagingengine.com>
 <503A25DF-4B83-4DBA-A5AA-3F3B0224B596@yahoo.com>
 <1431726738.3335993.270006353.65CC50BF@webmail.messagingengine.com>
Message-ID: <87a8x5172s.fsf@uwakimon.sk.tsukuba.ac.jp>

random832 at fastmail.us writes:

 > My point was that if you want the benefits of using libc you have
 > to pay the costs of using libc, and that means using libc's native
 > encodings.

Of course it doesn't mean any such thing.  My point was that there are
many utility functions in libc and out that don't care at all that the
array of bytes is encoded text, only that its content not contain
NULs, and that it be NUL-terminated.

Sure, nowadays there are better alternatives for handling text as text
(for example, Python 3 str! -- whose design *nobody* is proposing to
change here, although in the past some have asked that it be turned
into something Unicode compatible), but at least on POSIX systems
the traditional utilities still assume those classic characteristics,
which UTF-8 satisfies and UTF-16 does not.  Incompatibility with those
utilities is an issue for UTF-16, but not for UTF-8.  That's all.
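Those classic characteristics are easy to demonstrate:

```python
# Why NUL-terminated, NUL-free utilities cope with UTF-8 but not
# UTF-16: UTF-8 only ever uses a 0x00 byte for U+0000 itself, while
# UTF-16 embeds a NUL byte in nearly every ASCII character.
utf8_bytes = 'grep'.encode('utf-8')
utf16_bytes = 'grep'.encode('utf-16-le')
```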

From ncoghlan at gmail.com  Sat May 16 09:50:41 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 16 May 2015 17:50:41 +1000
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <CACac1F-2Dt+cEQfPASRzTPEzqK7Nz-RYDH=fg6UkaT_XSCbO1w@mail.gmail.com>
References: <mj1bv1$u93$1@ger.gmane.org>
 <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com>
 <mj1sjm$ukh$1@ger.gmane.org>
 <115169E2-1271-42F8-9B72-E863EE61DBEA@yahoo.com>
 <87fv6z1awu.fsf@uwakimon.sk.tsukuba.ac.jp>
 <A50A63C8-425D-4CD3-9D93-FDC10850E263@yahoo.com>
 <87egmi1wm5.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CACac1F-2Dt+cEQfPASRzTPEzqK7Nz-RYDH=fg6UkaT_XSCbO1w@mail.gmail.com>
Message-ID: <CADiSq7eq=CJ7zuJEUstesW-wn+ysz3z8kExAZeaOC=OjciKVaw@mail.gmail.com>

On 15 May 2015 at 22:21, Paul Moore <p.f.moore at gmail.com> wrote:
> On 15 May 2015 at 02:02, Stephen J. Turnbull <stephen at xemacs.org> wrote:
>> (3) Problem: Code you can't or won't fix buggily passes you Unicode
>>     that might have surrogates in it.
>>     Solution: text-to-text codecs (but I don't see why they can't be
>>     written as encode-decode chains).
>>
>> As I've written before, I think text-to-text codecs are an attractive
>> nuisance.  The temptation to use them in most cases should be refused,
>> because it's a better solution to deal with the problem at the
>> incoming boundary or the outgoing boundary (using str<->bytes codecs).
>
> One case I'd found a need for text->text handling (although not
> related to surrogates) was taking arbitrary Unicode and applying an
> error handler to it before writing it to a stream with "strict"
> encoding. (So something like "arbitrary text".encode('latin1',
> errors='backslashreplace').decode('latin1')).
>
> The encode/decode pair seemed ugly, although it was the only way I
> could find. I could easily imagine using a "rehandle" type of function
> for this (although I wouldn't use the actual proposed functions here,
> as the use of "surrogate" and "astral" in the names would lead me to
> assume they were inappropriate).

That's a different case, as you need to know the encoding of the
target stream in order to know which code points that codec can't
handle. Even when you do know the target encoding, Python itself has
no idea which code points a given text encoding can and can't handle,
so the only way to find out is to try it and see what happens.

The unique thing about the surrogate case is that *no* codec is
supposed to encode them, not even the universal ones:

>>> '\ud834\udd1e'.encode("utf-8")
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud834' in
position 0: surrogates not allowed

>>> '\ud834\udd1e'.encode("utf-16-le")
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-16-le' codec can't encode character '\ud834'
in position 0: surrogates not allowed

>>> '\ud834\udd1e'.encode("utf-32")
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-32' codec can't encode character '\ud834' in
position 0: surrogates not allowed

The fact that it's purely a code point level manipulation of the
entire surrogate range (rehandle_surrogatepass), or a particular usage
pattern of that range (rehandle_surrogateescape) is the difference
that makes it possible to define text->text APIs for surrogate
manipulation without caring about the eventual text encoding used (if
any).
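As a sketch (not the proposed API, just an encode/decode chain showing the purely code-point-level nature of the operation), round-tripping through a fixed-width codec with the 'surrogatepass' handler joins a valid surrogate pair back into the astral character it encodes:

```python
# Approximation of rehandle_surrogatepass for valid pairs only: the
# 'surrogatepass' handler lets the surrogates through on encode, and
# UTF-16 decoding then combines the pair into one code point.  (Lone
# surrogates would still raise on the decode side.)
def join_surrogate_pairs(s):
    return s.encode('utf-16-le', 'surrogatepass').decode('utf-16-le')

fixed = join_surrogate_pairs('\ud834\udd1e')  # the pair for U+1D11E
```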

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From abarnert at yahoo.com  Sat May 16 11:47:02 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sat, 16 May 2015 02:47:02 -0700
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <CADiSq7eq=CJ7zuJEUstesW-wn+ysz3z8kExAZeaOC=OjciKVaw@mail.gmail.com>
References: <mj1bv1$u93$1@ger.gmane.org>
 <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com>
 <mj1sjm$ukh$1@ger.gmane.org> <115169E2-1271-42F8-9B72-E863EE61DBEA@yahoo.com>
 <87fv6z1awu.fsf@uwakimon.sk.tsukuba.ac.jp>
 <A50A63C8-425D-4CD3-9D93-FDC10850E263@yahoo.com>
 <87egmi1wm5.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CACac1F-2Dt+cEQfPASRzTPEzqK7Nz-RYDH=fg6UkaT_XSCbO1w@mail.gmail.com>
 <CADiSq7eq=CJ7zuJEUstesW-wn+ysz3z8kExAZeaOC=OjciKVaw@mail.gmail.com>
Message-ID: <7F341A38-69A4-4EA8-858D-C08D50E9D3C1@yahoo.com>

On May 16, 2015, at 00:50, Nick Coghlan <ncoghlan at gmail.com> wrote:
> 
>> On 15 May 2015 at 22:21, Paul Moore <p.f.moore at gmail.com> wrote:
>>> On 15 May 2015 at 02:02, Stephen J. Turnbull <stephen at xemacs.org> wrote:
>>> (3) Problem: Code you can't or won't fix buggily passes you Unicode
>>>    that might have surrogates in it.
>>>    Solution: text-to-text codecs (but I don't see why they can't be
>>>    written as encode-decode chains).
>>> 
>>> As I've written before, I think text-to-text codecs are an attractive
>>> nuisance.  The temptation to use them in most cases should be refused,
>>> because it's a better solution to deal with the problem at the
>>> incoming boundary or the outgoing boundary (using str<->bytes codecs).
>> 
>> One case I'd found a need for text->text handling (although not
>> related to surrogates) was taking arbitrary Unicode and applying an
>> error handler to it before writing it to a stream with "strict"
>> encoding. (So something like "arbitrary text".encode('latin1',
>> errors='backslashreplace').decode('latin1')).
>> 
>> The encode/decode pair seemed ugly, although it was the only way I
>> could find. I could easily imagine using a "rehandle" type of function
>> for this (although I wouldn't use the actual proposed functions here,
>> as the use of "surrogate" and "astral" in the names would lead me to
>> assume they were inappropriate).
> 
> That's a different case, as you need to know the encoding of the
> target stream in order to know which code points that codec can't
> handle. Even when you do know the target encoding, Python itself has
> no idea which code points a given text encoding can and can't handle,
> so the only way to find out is to try it and see what happens.
> 
> The unique thing about the surrogate case is that *no* codec is
> supposed to encode them, not even the universal ones:

Python doesn't have a CESU-8 codec (or "JNI UTF-8" or any of the other near-equivalent abominations), right? Because IIRC, CESU-8 says that (in Python terms) '\U00010400' and '\uD801\uDC00' should both encode to b'\xED\xA0\x81\xED\xB0\x80', instead of the former encoding to b'\xF0\x90\x90\x80' and the latter not being encodable because it's not a string.

Anyway, I don't know if that counts as a Unicode encoding, since it's only described in a TR, not the standard itself. And Python is probably right to ignore it (assuming I'm remembering right and Python does ignore it...), even if that makes problems for Jython or Oracle DB-API libs or whatever.
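For the record, that rule is small enough to sketch (this is an illustration of TR #26's scheme, not a stdlib codec):

```python
# CESU-8 encoding rule from TR #26: break supplementary characters into
# a UTF-16 surrogate pair first, then encode each 16-bit unit as if it
# were a 3-byte UTF-8 character.
def cesu8_encode(s):
    utf16 = s.encode('utf-16-be', 'surrogatepass')
    units = [int.from_bytes(utf16[i:i + 2], 'big')
             for i in range(0, len(utf16), 2)]
    return b''.join(chr(u).encode('utf-8', 'surrogatepass') for u in units)

cesu = cesu8_encode('\U00010400')
utf8 = '\U00010400'.encode('utf-8')
```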



From steve at pearwood.info  Sat May 16 12:02:41 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 16 May 2015 20:02:41 +1000
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <7F341A38-69A4-4EA8-858D-C08D50E9D3C1@yahoo.com>
References: <mj1bv1$u93$1@ger.gmane.org>
 <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com>
 <mj1sjm$ukh$1@ger.gmane.org> <115169E2-1271-42F8-9B72-E863EE61DBEA@yahoo.com>
 <87fv6z1awu.fsf@uwakimon.sk.tsukuba.ac.jp>
 <A50A63C8-425D-4CD3-9D93-FDC10850E263@yahoo.com>
 <87egmi1wm5.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CACac1F-2Dt+cEQfPASRzTPEzqK7Nz-RYDH=fg6UkaT_XSCbO1w@mail.gmail.com>
 <CADiSq7eq=CJ7zuJEUstesW-wn+ysz3z8kExAZeaOC=OjciKVaw@mail.gmail.com>
 <7F341A38-69A4-4EA8-858D-C08D50E9D3C1@yahoo.com>
Message-ID: <20150516100240.GE5663@ando.pearwood.info>

On Sat, May 16, 2015 at 02:47:02AM -0700, Andrew Barnert via Python-ideas wrote:

> > The unique thing about the surrogate case is that *no* codec is
> > supposed to encode them, not even the universal ones:
> 
> Python doesn't have a CESU-8 codec (or "JNI UTF-8" or any of the other 
> near-equivalent abominations), right? 

*shrug* Even if it doesn't, it's just a codec, not new syntax. Anyone 
can create their own codecs. There probably are people who need CESU-8 
for compatibility with other apps, and if the std lib can include 
UTF-8-sig, it can probably include CESU-8. Or it can be left for those 
who need it to implement it themselves.


> Because IIRC, CESU-8 says that 
> (in Python terms) '\U00010400' and '\uD801\uDC00' should both encode 
> to b'\xED\xA0\x81\xED\xB0\x80', instead of the former encoding to 
> b'\xF0\x90\x90\x80' and the latter not being encodable because it's 
> not a string.

Sounds about right as far as the first half goes:

http://unicode.org/reports/tr26/

As far as the second half goes, the TR doesn't say anything about 
processing surrogate pairs in the source Unicode string. Since (strict) 
Unicode strings cannot contain surrogates, I think that CESU-8 should 
treat it as an error just like UTF-8. The TR does say:

    CESU-8 defines an encoding scheme for Unicode identical to
    UTF-8 except for its representation of supplementary characters. 


That seems pretty clear to me: if '\uDC00'.encode('utf-8') raises an 
error, then so should '\uDC00'.encode('cesu-8').
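For concreteness, here is roughly what the TR's transformation looks like as a toy function (a sketch only, not a real codec; the function names are made up):

```python
def _encode_3byte(cp):
    # Encode a BMP code point as a 3-byte UTF-8-style sequence.
    return bytes([0xE0 | (cp >> 12),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

def cesu8_encode_char(cp):
    # Toy CESU-8 encoder for a single code point.
    if 0xD800 <= cp <= 0xDFFF:
        # Treat lone surrogates as an error, just like UTF-8 does.
        raise UnicodeEncodeError('cesu-8', chr(cp), 0, 1,
                                 'surrogates not allowed')
    if cp < 0x10000:
        # Identical to UTF-8 below the supplementary planes.
        return chr(cp).encode('utf-8')
    # Supplementary characters: split into a UTF-16 surrogate pair,
    # then encode each surrogate as if it were a BMP code point.
    cp -= 0x10000
    high = 0xD800 + (cp >> 10)
    low = 0xDC00 + (cp & 0x3FF)
    return _encode_3byte(high) + _encode_3byte(low)
```

So cesu8_encode_char(0x10400) gives the six bytes b'\xED\xA0\x81\xED\xB0\x80' quoted above, while 0xD801 on its own raises, matching the reading that surrogates are errors.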



-- 
Steve

From p.f.moore at gmail.com  Sat May 16 12:19:26 2015
From: p.f.moore at gmail.com (Paul Moore)
Date: Sat, 16 May 2015 11:19:26 +0100
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <87bnhl18h4.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <mj1bv1$u93$1@ger.gmane.org>
 <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com>
 <mj1sjm$ukh$1@ger.gmane.org>
 <115169E2-1271-42F8-9B72-E863EE61DBEA@yahoo.com>
 <87fv6z1awu.fsf@uwakimon.sk.tsukuba.ac.jp>
 <A50A63C8-425D-4CD3-9D93-FDC10850E263@yahoo.com>
 <87egmi1wm5.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CACac1F-2Dt+cEQfPASRzTPEzqK7Nz-RYDH=fg6UkaT_XSCbO1w@mail.gmail.com>
 <87bnhl18h4.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <CACac1F9iGEgyBzJ-oS9eyek79YPR6qmN-ouMVaHK+XkvhpYFbw@mail.gmail.com>

On 16 May 2015 at 04:56, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> That's not the use case envisioned for these functions, though.  You
> want to change the textual content of the stream (by restricting the
> repertoire), not change the representation of non-textual content.

Thanks. I see the difference now. (Plus Nick's point about needing to
know the encoding in my use case).

>  > The encode/decode pair seemed ugly, although it was the only way I
>  > could find.
>
> I find the fact that there's an output stream with an inappropriate
> error handler far uglier!

The stream in this case was sys.stdout, which you can't blame me for, though :-)

The use case in question was specifically wanting to avoid encoding
errors when printing arbitrary text. (On Windows, where
sys.stdout.encoding is not UTF-8). This is a pretty common issue that
I see raised a lot, and it is frustrating to have to deal with it in
application code. I don't know enough about the issues to make a good
case that errors='strict' is the wrong error handling policy for
sys.stdout, though. And you can't change the policy on an existing
stream, so the application is stuck with strict unless it wants to
re-wrap sys.stdout.buffer (which I'm always a little reluctant to do,
as it seems like it may cause other issues, although I don't know why
I think that :-)).
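For concreteness, the encode/decode pattern in question looks something like this (a sketch; cp1252 stands in for whatever sys.stdout.encoding happens to be on the console, a BytesIO simulates the strict stream, and the Greek Delta is just an arbitrary character cp1252 can't represent):

```python
import io

text = "arbitrary text \u0394 snake"

# Simulate a Windows-style console stream: non-UTF-8 encoding,
# errors='strict', like sys.stdout there.
buf = io.BytesIO()
strict_out = io.TextIOWrapper(buf, encoding='cp1252', errors='strict')

# Writing `text` directly would raise UnicodeEncodeError on the Delta;
# squeezing it through backslashreplace first keeps the write within
# the target repertoire.
safe = text.encode('cp1252', errors='backslashreplace').decode('cp1252')
strict_out.write(safe)
strict_out.flush()
```

After this, buf holds the escaped bytes instead of an exception having been raised, which is exactly the "works but looks clumsy" outcome.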

> Note that the encode/decode pair is quite efficient, although the
> "rehandle" function could be about twice as fast.  Still, if you're
> output-bound by the speed of a disk or the like, encode/decode will
> have no trouble keeping up.

Yeah, it's not a performance issue, just a mild feeling of "this looks clumsy".

Paul

From stephen at xemacs.org  Sat May 16 15:50:49 2015
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 16 May 2015 22:50:49 +0900
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <7F341A38-69A4-4EA8-858D-C08D50E9D3C1@yahoo.com>
References: <mj1bv1$u93$1@ger.gmane.org>
 <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com>
 <mj1sjm$ukh$1@ger.gmane.org>
 <115169E2-1271-42F8-9B72-E863EE61DBEA@yahoo.com>
 <87fv6z1awu.fsf@uwakimon.sk.tsukuba.ac.jp>
 <A50A63C8-425D-4CD3-9D93-FDC10850E263@yahoo.com>
 <87egmi1wm5.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CACac1F-2Dt+cEQfPASRzTPEzqK7Nz-RYDH=fg6UkaT_XSCbO1w@mail.gmail.com>
 <CADiSq7eq=CJ7zuJEUstesW-wn+ysz3z8kExAZeaOC=OjciKVaw@mail.gmail.com>
 <7F341A38-69A4-4EA8-858D-C08D50E9D3C1@yahoo.com>
Message-ID: <87617s1vie.fsf@uwakimon.sk.tsukuba.ac.jp>

Andrew Barnert via Python-ideas writes:

 > Python doesn't have a CESU-8 codec (or "JNI UTF-8" or any of the
 > other near-equivalent abominations), right? Because IIRC, CESU-8
 > says that (in Python terms) '\U00010400' and '\uD801\uDC00' should
 > both encode to b'\xED\xA0\x81\xED\xB0\x80', instead of the former
 > encoding to b'\xF0\x90\x90\x80' and the latter not being encodable
 > because it's not a string.

It's ambiguous what the TR intends.  It does say it encodes code
points, which would argue that '\uD801\uDC00' is encodable.  However,
it also defines itself as a representation of UTF-16, and the
definition of the encoding itself states "Prior to transforming data
into CESU-8, supplementary characters must first be converted to their
surrogate pair UTF-16 representation."  UTF-16's normative definition
defines it as a Unicode transformation format, and therefore a UTF-16
stream cannot contain surrogates representing themselves, and there's
nothing in the document that refers to the possible interpretation of
surrogate code points as themselves.

So I agree with Steven that a str-to-bytes CESU-8 encoder should error
on any surrogates, and the decoder should error on surrogates not
encountered as a valid surrogate pair.  Possibly you'd want special
error handlers that allow handling of the UTF-8 encoding of surrogates.

 > Anyway, I don't know if that counts as a Unicode encoding, since
 > it's only described in a TR, not the standard itself.

The TR specifically excludes it from the standard.

 > And Python is probably right to ignore it (assuming I'm remembering
 > right and Python does ignore it...), even if that makes problems
 > for Jython or Oracle DB-API libs or whatever.

Why would it cause trouble for them?  They're not going to use
byte-oriented functions to manipulate Unicode after going to all that
trouble to implement UTF-16 handling internally.

We're getting kinda far afield here, aren't we?


From stephen at xemacs.org  Sat May 16 16:15:10 2015
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 16 May 2015 23:15:10 +0900
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <CACac1F9iGEgyBzJ-oS9eyek79YPR6qmN-ouMVaHK+XkvhpYFbw@mail.gmail.com>
References: <mj1bv1$u93$1@ger.gmane.org>
 <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com>
 <mj1sjm$ukh$1@ger.gmane.org>
 <115169E2-1271-42F8-9B72-E863EE61DBEA@yahoo.com>
 <87fv6z1awu.fsf@uwakimon.sk.tsukuba.ac.jp>
 <A50A63C8-425D-4CD3-9D93-FDC10850E263@yahoo.com>
 <87egmi1wm5.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CACac1F-2Dt+cEQfPASRzTPEzqK7Nz-RYDH=fg6UkaT_XSCbO1w@mail.gmail.com>
 <87bnhl18h4.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CACac1F9iGEgyBzJ-oS9eyek79YPR6qmN-ouMVaHK+XkvhpYFbw@mail.gmail.com>
Message-ID: <874mnc1udt.fsf@uwakimon.sk.tsukuba.ac.jp>

Paul Moore writes:

 > The stream in this case was sys.stdout, which you can't blame me
 > for, though :-)

Yeah, I think there's an issue or two on that.
 
 > I don't know enough about the issues to make a good case that
 > errors='strict' is the wrong error handling policy for sys.stdout,
 > though.

No, errors='strict' is always the right default policy, especially for
UTF-encoded output, but for other encodings as well.

 > And you can't change the policy on an existing stream,

Hm.  I would not want the job of rewriting the codec machinery to
guarantee that users would get what they deserve from changing
encodings on a stream -- I suspect that would be hard, or even
impossible for a stateful encoding (eg, a 7-bit ISO-2022 encoding).
But I can't really see where the harm would be in allowing changes of
the error handler.  (Of course that goes in the categories of "for
consenting adults" and "you can keep any bullets that lodge in your
foot".)  I'll have to think hard about it.

 > so the application is stuck with strict unless it wants to re-wrap
 > sys.stdout.buffer (which I'm always a little reluctant to do, as it
 > seems like it may cause other issues, although I don't know why I
 > think that :-)).

In your case, I don't see why it would cause a problem unless there's
other output potentially incompatible with the sys.stdout encoding
that *you* *do* want errors on.  I can imagine there exist cases where
you have something like log output where you *know* that the logger
produces 30 columns of ASCII and then up to 45 columns copied from its
input, and only the first 30 "really need" to be accurate and valid in
the output encoding.  (I don't actually have such a case to hand,
though -- I've never seen a logger that randomly inserted Japanese in
timestamps or something like that.)


From ncoghlan at gmail.com  Sat May 16 16:44:52 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 17 May 2015 00:44:52 +1000
Subject: [Python-ideas] Processing surrogates in
In-Reply-To: <CACac1F9iGEgyBzJ-oS9eyek79YPR6qmN-ouMVaHK+XkvhpYFbw@mail.gmail.com>
References: <mj1bv1$u93$1@ger.gmane.org>
 <721512777.175235.1431593322139.JavaMail.yahoo@mail.yahoo.com>
 <mj1sjm$ukh$1@ger.gmane.org>
 <115169E2-1271-42F8-9B72-E863EE61DBEA@yahoo.com>
 <87fv6z1awu.fsf@uwakimon.sk.tsukuba.ac.jp>
 <A50A63C8-425D-4CD3-9D93-FDC10850E263@yahoo.com>
 <87egmi1wm5.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CACac1F-2Dt+cEQfPASRzTPEzqK7Nz-RYDH=fg6UkaT_XSCbO1w@mail.gmail.com>
 <87bnhl18h4.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CACac1F9iGEgyBzJ-oS9eyek79YPR6qmN-ouMVaHK+XkvhpYFbw@mail.gmail.com>
Message-ID: <CADiSq7ftFaVkdWC3LKqf_mPYHpXFOJvCY+_EXiCiA=LhCH4zBw@mail.gmail.com>

On 16 May 2015 at 20:19, Paul Moore <p.f.moore at gmail.com> wrote:
> On 16 May 2015 at 04:56, Stephen J. Turnbull <stephen at xemacs.org> wrote:
>> That's not the use case envisioned for these functions, though.  You
>> want to change the textual content of the stream (by restricting the
>> repertoire), not change the representation of non-textual content.
>
> Thanks. I see the difference now. (Plus Nick's point about needing to
> know the encoding in my use case).
>
>>  > The encode/decode pair seemed ugly, although it was the only way I
>>  > could find.
>>
>> I find the fact that there's an output stream with an inappropriate
>> error handler far uglier!
>
> The stream in this case was sys.stdout, which you can't blame me for, though :-)
>
> The use case in question was specifically wanting to avoid encoding
> errors when printing arbitrary text. (On Windows, where
> sys.stdout.encoding is not UTF-8). This is a pretty common issue that
> I see raised a lot, and it is frustrating to have to deal with it in
> application code. I don't know enough about the issues to make a good
> case that errors='strict' is the wrong error handling policy for
> sys.stdout, though. And you can't change the policy on an existing
> stream, so the application is stuck with strict unless it wants to
> re-wrap sys.stdout.buffer (which I'm always a little reluctant to do,
> as it seems like it may cause other issues, although I don't know why
> I think that :-)).

It has the potential to cause problems if anything still has a
reference to the old stream (such as, say, sys.__stdout__, or an
eagerly bound reference in a default argument value). If you call
detach(), the old references will be entirely broken, if you don't
then you have two different text wrappers sharing the same underlying
buffered stream. Creating a completely new IO stream that only shares
the operating system level file descriptor has similar data
interleaving problems to the latter approach.

There's an open issue to support changing the encoding and error
handling of an existing stream in place, which I'd suggested deferring
to 3.6 based on the fact we're switching the *nix streams to use
surrogateescape if the system claims the locale encoding is ASCII:
http://bugs.python.org/issue15216#msg242942

However, if the lack of that capability is causing problems on Windows
as well, then it may be worth updating Nikolaus Rath's patch and
applying it for 3.5 and dealing with the consequences. The main reason
I've personally been wary of the change is because I expect there to
be various edge cases encountered with different codecs, so I suspect
that adding this feature will be setting the stage for an
"interesting" collection of future bug reports. On the other hand,
there's certain kinds of programs (like an iconv equivalent) that
could most readily be implemented by being able to change the encoding
of the standard streams based on application level configuration
settings, which means having a way to override the default settings
chosen by the interpreter.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From charleshixsn at earthlink.net  Sun May 17 20:07:42 2015
From: charleshixsn at earthlink.net (Charles Hixson)
Date: Sun, 17 May 2015 11:07:42 -0700
Subject: [Python-ideas] an unless statement would occasionally be useful
Message-ID: <5558D8EE.8010105@earthlink.net>

I'm envisioning "unless" as a synonym for "if not(...):"  currently I use

if .... :
     pass
else:
   ...

which works.

N.B.:  This isn't extremely important as there are already two ways to 
accomplish the same purpose, but it would be useful, seems easy to 
implement, and is already used by many other languages.  The advantage 
is that when the condition is long it simplifies understanding.

From mertz at gnosis.cx  Sun May 17 21:02:00 2015
From: mertz at gnosis.cx (David Mertz)
Date: Sun, 17 May 2015 12:02:00 -0700
Subject: [Python-ideas] an unless statement would occasionally be useful
In-Reply-To: <5558D8EE.8010105@earthlink.net>
References: <5558D8EE.8010105@earthlink.net>
Message-ID: <CAEbHw4aQXOq0DZ-CLhHQEQwMr5Na7fd74B+0AKaMGkyi+CwnBg@mail.gmail.com>

This exists and is spelled 'not' in Python :-)
On May 17, 2015 11:16 AM, "Charles Hixson" <charleshixsn at earthlink.net>
wrote:

> I'm envisioning "unless" as a synonym for "if not(...):"  currently I use
>
> if .... :
>     pass
> else:
>   ...
>
> which works.
>
> N.B.:  This isn't extremely important as there are already two ways to
> accomplish the same purpose, but it would be useful, seems easy to
> implement, and is already used by many other languages.  The advantage is
> that when the condition is long it simplifies understanding.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150517/0c9f073a/attachment.html>

From cory at lukasa.co.uk  Sun May 17 21:02:34 2015
From: cory at lukasa.co.uk (Cory Benfield)
Date: Sun, 17 May 2015 20:02:34 +0100
Subject: [Python-ideas] an unless statement would occasionally be useful
In-Reply-To: <5558D8EE.8010105@earthlink.net>
References: <5558D8EE.8010105@earthlink.net>
Message-ID: <CFE79C55-9F55-469E-9385-AFC99F797B8F@lukasa.co.uk>


> On 17 May 2015, at 19:07, Charles Hixson <charleshixsn at earthlink.net> wrote:
> 
> I'm envisioning "unless" as a synonym for "if not(...):"  currently I use
> 
> if .... :
>    pass
> else:
>   ...

That's interesting. Personally, I think I'd invert that conditional (or, if the rest of the body is long, do an early return).

Playing the role of opposition for a moment, I'd argue that we don't need "unless" because we already have a spelling for that: "if not". Is it not said: "There should be one-- and preferably only one --obvious way to do it."?

From tjreedy at udel.edu  Sun May 17 21:04:51 2015
From: tjreedy at udel.edu (Terry Reedy)
Date: Sun, 17 May 2015 15:04:51 -0400
Subject: [Python-ideas] an unless statement would occasionally be useful
In-Reply-To: <5558D8EE.8010105@earthlink.net>
References: <5558D8EE.8010105@earthlink.net>
Message-ID: <mjaook$jh0$1@ger.gmane.org>

On 5/17/2015 2:07 PM, Charles Hixson wrote:
> I'm envisioning "unless" as a synonym for "if not(...):"  currently I use
>
> if .... :
>      pass
> else:
>    ...
>
> which works.
>
> N.B.:  This isn't extremely important as there are already two ways to
> accomplish the same purpose, but it would be useful, seems easy to
> implement, and is already used by many other languages.  The advantage
> is that when the condition is long it simplifies understanding.

We try not to bloat Python with minor synonyms. They make it harder to 
learn and remember the language and to choose which synonym to use.


-- 
Terry Jan Reedy


From breamoreboy at yahoo.co.uk  Sun May 17 22:28:01 2015
From: breamoreboy at yahoo.co.uk (Mark Lawrence)
Date: Sun, 17 May 2015 21:28:01 +0100
Subject: [Python-ideas] an unless statement would occasionally be useful
In-Reply-To: <5558D8EE.8010105@earthlink.net>
References: <5558D8EE.8010105@earthlink.net>
Message-ID: <mjatkm$s4u$1@ger.gmane.org>

On 17/05/2015 19:07, Charles Hixson wrote:
> I'm envisioning "unless" as a synonym for "if not(...):"  currently I use
>
> if .... :
>      pass
> else:
>    ...
>
> which works.
>
> N.B.:  This isn't extremely important as there are already two ways to
> accomplish the same purpose, but it would be useful, seems easy to
> implement, and is already used by many other languages.  The advantage
> is that when the condition is long it simplifies understanding.

IMHO if a statement is only "occasionally useful" it should not be in 
the Python language.

-- 
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence


From larocca at abiresearch.com  Sun May 17 22:31:29 2015
From: larocca at abiresearch.com (Douglas La Rocca)
Date: Sun, 17 May 2015 20:31:29 +0000
Subject: [Python-ideas] an unless statement would occasionally be useful
In-Reply-To: <mjatkm$s4u$1@ger.gmane.org>
References: <5558D8EE.8010105@earthlink.net>,<mjatkm$s4u$1@ger.gmane.org>
Message-ID: <4F01CDCF-8527-472A-9B47-DD41246608D7@abiresearch.com>

it could also be confused as a synonym for

    while not condition:
        ...




> On May 17, 2015, at 4:28 PM, Mark Lawrence <breamoreboy at yahoo.co.uk> wrote:
> 
>> On 17/05/2015 19:07, Charles Hixson wrote:
>> I'm envisioning "unless" as a synonym for "if not(...):"  currently I use
>> 
>> if .... :
>>     pass
>> else:
>>   ...
>> 
>> which works.
>> 
>> N.B.:  This isn't extremely important as there are already two ways to
>> accomplish the same purpose, but it would be useful, seems easy to
>> implement, and is already used by many other languages.  The advantage
>> is that when the condition is long it simplifies understanding.
> 
> IMHO if a statement is only "occasionally useful" it should not be in the Python language.
> 
> -- 
> My fellow Pythonistas, ask not what our language can do for you, ask
> what you can do for our language.
> 
> Mark Lawrence
> 
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From rymg19 at gmail.com  Sun May 17 22:57:11 2015
From: rymg19 at gmail.com (Ryan Gonzalez)
Date: Sun, 17 May 2015 15:57:11 -0500
Subject: [Python-ideas] an unless statement would occasionally be useful
In-Reply-To: <4F01CDCF-8527-472A-9B47-DD41246608D7@abiresearch.com>
References: <5558D8EE.8010105@earthlink.net>, <mjatkm$s4u$1@ger.gmane.org>
 <4F01CDCF-8527-472A-9B47-DD41246608D7@abiresearch.com>
Message-ID: <174563E1-4BA9-47C7-9B47-6FD7B3070E5C@gmail.com>

Has anyone done that?????

I mean, I like Python without `unless`, but I've never seen it used to mean that. Usually, `until` is used.

On May 17, 2015 3:31:29 PM CDT, Douglas La Rocca <larocca at abiresearch.com> wrote:
>it could also be confused as a synonym for
>
>    while not condition:
>        ...
>
>
>
>
>> On May 17, 2015, at 4:28 PM, Mark Lawrence <breamoreboy at yahoo.co.uk>
>wrote:
>> 
>>> On 17/05/2015 19:07, Charles Hixson wrote:
>>> I'm envisioning "unless" as a synonym for "if not(...):"  currently
>I use
>>> 
>>> if .... :
>>>     pass
>>> else:
>>>   ...
>>> 
>>> which works.
>>> 
>>> N.B.:  This isn't extremely important as there are already two ways
>to
>>> accomplish the same purpose, but it would be useful, seems easy to
>>> implement, and is already used by many other languages.  The
>advantage
>>> is that when the condition is long it simplifies understanding.
>> 
>> IMHO if a statement is only "occasionally useful" it should not be in
>the Python language.
>> 
>> -- 
>> My fellow Pythonistas, ask not what our language can do for you, ask
>> what you can do for our language.
>> 
>> Mark Lawrence
>> 
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>_______________________________________________
>Python-ideas mailing list
>Python-ideas at python.org
>https://mail.python.org/mailman/listinfo/python-ideas
>Code of Conduct: http://python.org/psf/codeofconduct/

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150517/08513ae1/attachment-0001.html>

From python at mrabarnett.plus.com  Sun May 17 23:07:02 2015
From: python at mrabarnett.plus.com (MRAB)
Date: Sun, 17 May 2015 22:07:02 +0100
Subject: [Python-ideas] an unless statement would occasionally be useful
In-Reply-To: <4F01CDCF-8527-472A-9B47-DD41246608D7@abiresearch.com>
References: <5558D8EE.8010105@earthlink.net>, <mjatkm$s4u$1@ger.gmane.org>
 <4F01CDCF-8527-472A-9B47-DD41246608D7@abiresearch.com>
Message-ID: <555902F6.60300@mrabarnett.plus.com>

On 2015-05-17 21:31, Douglas La Rocca wrote:
> it could also be confused as a synonym for
>
>      while not condition:
>          ...
>
No, that would be:

     until condition:
         ...

which we don't want either. :-)

>> On May 17, 2015, at 4:28 PM, Mark Lawrence <breamoreboy at yahoo.co.uk> wrote:
>>
>>> On 17/05/2015 19:07, Charles Hixson wrote:
>>> I'm envisioning "unless" as a synonym for "if not(...):"  currently I use
>>>
>>> if .... :
>>>     pass
>>> else:
>>>   ...
>>>
>>> which works.
>>>
>>> N.B.:  This isn't extremely important as there are already two ways to
>>> accomplish the same purpose, but it would be useful, seems easy to
>>> implement, and is already used by many other languages.  The advantage
>>> is that when the condition is long it simplifies understanding.
>>
>> IMHO if a statement is only "occasionally useful" it should not be in the Python language.
>>


From abarnert at yahoo.com  Sun May 17 23:35:01 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sun, 17 May 2015 14:35:01 -0700
Subject: [Python-ideas] an unless statement would occasionally be useful
In-Reply-To: <5558D8EE.8010105@earthlink.net>
References: <5558D8EE.8010105@earthlink.net>
Message-ID: <032DB30B-2DFB-447D-B558-7B8C247113EC@yahoo.com>

On May 17, 2015, at 11:07, Charles Hixson <charleshixsn at earthlink.net> wrote:
> 
> I'm envisioning "unless" as a synonym for "if not(...):"  currently I use
> 
> if .... :
>    pass
> else:
>  ...
> 
> which works.
> 
> N.B.:  This isn't extremely important as there are already two ways to accomplish the same purpose, but it would be useful, seems easy to implement, and is already used by many other languages.  The advantage is that when the condition is long it simplifies understanding.

But if you just use not instead of else, it simplifies understanding just as much--and without making the language larger (which makes it harder to learn/remember when switching languages, makes the parser bigger, etc.):

    if not ...:
        ...

It seems like every year someone proposes either "unless" or "until" or the whole suite of Perl variants (inherently-negated keywords, postfix, do...while-type syntax), but nobody ever asks for anything clever. Think of what you could do with a "lest" statement, which will speculatively execute the body and then test the condition before deciding whether to actually have executed the body. Or a "without" that closes a context before the body instead of after. Or a "butfor" that iterates over every extant object that isn't contained in the Iterable. Or a "because" that raises instead of skipping the body if the condition isn't truthy. Or a "before" that remembers the body for later and executes it asynchronously when the condition becomes true.



> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From stephen at xemacs.org  Mon May 18 04:53:21 2015
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Mon, 18 May 2015 11:53:21 +0900
Subject: [Python-ideas] an unless statement would occasionally be useful
In-Reply-To: <032DB30B-2DFB-447D-B558-7B8C247113EC@yahoo.com>
References: <5558D8EE.8010105@earthlink.net>
 <032DB30B-2DFB-447D-B558-7B8C247113EC@yahoo.com>
Message-ID: <87r3qezjdq.fsf@uwakimon.sk.tsukuba.ac.jp>

Andrew Barnert via Python-ideas writes:

 > Or a "before" that remembers the body for later and executes it
 > asynchronously when the condition becomes true.

That's not Perl, that's Make.<action class="duck"/>


From ram at rachum.com  Mon May 18 10:14:31 2015
From: ram at rachum.com (Ram Rachum)
Date: Mon, 18 May 2015 11:14:31 +0300
Subject: [Python-ideas] Making it easy to prepare for PEP479
Message-ID: <CANXboVZv-oyw=FemDtNXJb04mM_7jQTwZkQZ8OLV2BEafgn6aQ@mail.gmail.com>

Hi everybody,

I just heard about PEP479, and I want to prepare my open-source projects
for it.

I have no problem changing the code so it won't depend on StopIteration to
stop generators, but I'd also like to test it in my test suite. In Python
3.5 I could use `from __future__ import generator_stop` so the test would
be real (i.e. would fail wherever I rely on StopIteration to stop a
generator). But I can't really put this snippet in my code because then it
would fail on all Python versions below 3.5.

This makes me think of two ideas:

1. Maybe we should allow `from __future__ import whatever` in code, even if
`whatever` wasn't invented yet, and simply make it a no-op? This wouldn't
help now but it could prevent these problems in the future.
2. Maybe introduce a way to do `from __future__ import generator_stop`
without including it in code? Maybe a flag to the `python` command? (If
something like this exists please let me know.)


Thanks,
Ram.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150518/a3eb1386/attachment.html>

From rosuav at gmail.com  Mon May 18 10:38:32 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Mon, 18 May 2015 18:38:32 +1000
Subject: [Python-ideas] Making it easy to prepare for PEP479
In-Reply-To: <CANXboVZv-oyw=FemDtNXJb04mM_7jQTwZkQZ8OLV2BEafgn6aQ@mail.gmail.com>
References: <CANXboVZv-oyw=FemDtNXJb04mM_7jQTwZkQZ8OLV2BEafgn6aQ@mail.gmail.com>
Message-ID: <CAPTjJmrae41T9Fqb11koED0uz-fe0CogjkG+iGErFcj6NbLApw@mail.gmail.com>

On Mon, May 18, 2015 at 6:14 PM, Ram Rachum <ram at rachum.com> wrote:
> Hi everybody,
>
> I just heard about PEP479, and I want to prepare my open-source projects for
> it.
>
> I have no problem changing the code so it won't depend on StopIteration to
> stop generators, but I'd also like to test it in my test suite. In Python
> 3.5 I could use `from __future__ import generator_stop` so the test would be
> real (i.e. would fail wherever I rely on StopIteration to stop a generator).
> But I can't really put this snippet in my code because then it would fail on
> all Python versions below 3.5.
>
> This makes me think of two ideas:
>
> 1. Maybe we should allow `from __future__ import whatever` in code, even if
> `whatever` wasn't invented yet, and simply make it a no-op? This wouldn't
> help now but it could prevent these problems in the future.

Downside: A typo would silently stop a future directive from working.
If "from __future__ import generator_stop" doesn't cause an error in
<3.5, then "from __future__ import genarator_stop" would cause no
error in any version, and that's a problem.

> 2. Maybe introduce a way to do `from __future__ import generator_stop`
> without including it in code? Maybe a flag to the `python` command? (If
> something like this exists please let me know.)

The problem is that it's hard to try-except special directives like
this. You can try-except a regular import, catch the run-time error,
and do something else; but short of exec'ing your code, you can't
catch SyntaxError. I'm not sure how best to deal with this.

However, it ought to be possible to simply run your tests with
generator_stop active, even if that means using exec instead of
regular imports. Something like this:

# utils.py
# In the presence of generator_stop, this will bomb

def f():
    raise StopIteration

def g():
    yield f()


# test_utils.py

# Instead of:
# import utils
# Try this:
with open("utils.py") as f:
    code = "from __future__ import generator_stop\n" + f.read()
import sys # Any module at all
utils = type(sys)("utils")
exec(code, vars(utils))

# At this point, you can write regular tests involving the
# 'utils' module, which has been executed in the presence
# of the generator_stop directive.
list(utils.g())

It's ugly, and it depends on the module being in the current directory
(though you could probably use importlib to deal with that part), but
it's just for your tests. I don't know of any way to simplify this
out, but it may well be possible (using some mechanism similar to what
the interactive interpreter does); in any case, all the ugliness
should be in a single block up the top of your test runner - and you
could turn it into a function, as you'll probably want to do this for
lots of modules.
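For what it's worth, the loading step above can be packaged up with importlib instead of a bare exec into a hand-made module (still a sketch, not a full import hook; note that the prepended line shifts traceback line numbers by one):

```python
import importlib.util

def import_with_generator_stop(name, path):
    # Read the source and prepend the future directive, then execute
    # it into a fresh, properly-constructed module object.
    with open(path) as f:
        source = "from __future__ import generator_stop\n" + f.read()
    spec = importlib.util.spec_from_loader(name, loader=None)
    module = importlib.util.module_from_spec(spec)
    exec(compile(source, path, "exec"), module.__dict__)
    return module
```

Then `utils = import_with_generator_stop("utils", "utils.py")` replaces the inline block above.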

Experts of python-ideas, is there a way to use an import hook to do this?

ChrisA

From steve at pearwood.info  Mon May 18 14:00:55 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Mon, 18 May 2015 22:00:55 +1000
Subject: [Python-ideas] Making it easy to prepare for PEP479
In-Reply-To: <CANXboVZv-oyw=FemDtNXJb04mM_7jQTwZkQZ8OLV2BEafgn6aQ@mail.gmail.com>
References: <CANXboVZv-oyw=FemDtNXJb04mM_7jQTwZkQZ8OLV2BEafgn6aQ@mail.gmail.com>
Message-ID: <20150518120054.GK5663@ando.pearwood.info>

On Mon, May 18, 2015 at 11:14:31AM +0300, Ram Rachum wrote:
> Hi everybody,
> 
> I just heard about PEP479, and I want to prepare my open-source projects
> for it.
> 
> I have no problem changing the code so it won't depend on StopIteration to
> stop generators, but I'd also like to test it in my test suite. In Python
> 3.5 I could use `from __future__ import generator_stop` so the test would
> be real (i.e. would fail wherever I rely on StopIteration to stop a
> generator). But I can't really put this snippet in my code because then it
> would fail on all Python versions below 3.5.

Sometimes you have to do things the old fashioned way:

if sys.version_info[:2] < (3, 5):
    # write test one way
else:
    # write test another way

At least it's not a change of syntax :-)
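For example, in a unittest-style suite the version gate often ends up as a skip decorator; a sketch (the test name and skip message are illustrative only):

```python
import sys
import unittest

class GeneratorStopTests(unittest.TestCase):
    # Skipped on interpreters that don't know the directive at all.
    @unittest.skipIf(sys.version_info[:2] < (3, 5),
                     "generator_stop directive needs Python 3.5+")
    def test_no_stopiteration_leaks(self):
        # Here the module under test would have been compiled with
        # generator_stop, so a leaked StopIteration would surface
        # as a RuntimeError and fail the test.
        def gen():
            yield 1
        self.assertEqual(list(gen()), [1])
```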

You can also move tests into a separate file that is version specific. 
That's a bit of a nuisance with small projects where you would have a single 
test file, but for larger projects there's nothing wrong with splitting 
tests across multiple files.


> This makes me think of two ideas:
> 
> 1. Maybe we should allow `from __future__ import whatever` in code, even if
> `whatever` wasn't invented yet, and simply make it a no-op? This wouldn't
> help now but it could prevent these problems in the future.

from __future__ import spelling_mistaek
# code that depends on spelling_mistake feature will now behave weirdly


> 2. Maybe introduce a way to do `from __future__ import generator_stop`
> without including it in code? Maybe a flag to the `python` command? (If
> something like this exists please let me know.)

I don't think that is important enough to require either an environment 
variable or a command line switch.



-- 
Steve

From greg.ewing at canterbury.ac.nz  Mon May 18 14:07:55 2015
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 19 May 2015 00:07:55 +1200
Subject: [Python-ideas] Making it easy to prepare for PEP479
In-Reply-To: <CAPTjJmrae41T9Fqb11koED0uz-fe0CogjkG+iGErFcj6NbLApw@mail.gmail.com>
References: <CANXboVZv-oyw=FemDtNXJb04mM_7jQTwZkQZ8OLV2BEafgn6aQ@mail.gmail.com>
 <CAPTjJmrae41T9Fqb11koED0uz-fe0CogjkG+iGErFcj6NbLApw@mail.gmail.com>
Message-ID: <5559D61B.3060302@canterbury.ac.nz>

Chris Angelico wrote:
> However, it ought to be possible to simply run your tests with
> generator_stop active, even if that means using exec instead of
> regular imports.

Would it be possible for site.py to monkey-patch
something into the __future__ module, to make
importing it a no-op?

-- 
Greg

From rosuav at gmail.com  Mon May 18 14:51:02 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Mon, 18 May 2015 22:51:02 +1000
Subject: [Python-ideas] Making it easy to prepare for PEP479
In-Reply-To: <5559D61B.3060302@canterbury.ac.nz>
References: <CANXboVZv-oyw=FemDtNXJb04mM_7jQTwZkQZ8OLV2BEafgn6aQ@mail.gmail.com>
 <CAPTjJmrae41T9Fqb11koED0uz-fe0CogjkG+iGErFcj6NbLApw@mail.gmail.com>
 <5559D61B.3060302@canterbury.ac.nz>
Message-ID: <CAPTjJmrqmbMNcQ+gbuTFqnWmn=uc0hjat1j9Cx3vVE2c3NfiOw@mail.gmail.com>

On Mon, May 18, 2015 at 10:07 PM, Greg Ewing
<greg.ewing at canterbury.ac.nz> wrote:
> Chris Angelico wrote:
>>
>> However, it ought to be possible to simply run your tests with
>> generator_stop active, even if that means using exec instead of
>> regular imports.
>
>
> Would it be possible for site.py to monkey-patch
> something into the __future__ module, to make
> importing it a no-op?

I doubt it; __future__ imports are special compiler magic.

>>> import __future__
>>> __future__.all_feature_names.append("asdf")
>>> __future__.asdf = __future__.with_statement
>>> from __future__ import asdf
  File "<stdin>", line 1
SyntaxError: future feature asdf is not defined

ChrisA

From python at mrabarnett.plus.com  Mon May 18 15:16:20 2015
From: python at mrabarnett.plus.com (MRAB)
Date: Mon, 18 May 2015 14:16:20 +0100
Subject: [Python-ideas] Making it easy to prepare for PEP479
In-Reply-To: <20150518120054.GK5663@ando.pearwood.info>
References: <CANXboVZv-oyw=FemDtNXJb04mM_7jQTwZkQZ8OLV2BEafgn6aQ@mail.gmail.com>
 <20150518120054.GK5663@ando.pearwood.info>
Message-ID: <5559E624.7030708@mrabarnett.plus.com>

On 2015-05-18 13:00, Steven D'Aprano wrote:
> On Mon, May 18, 2015 at 11:14:31AM +0300, Ram Rachum wrote:
>> Hi everybody,
>>
>> I just heard about PEP479, and I want to prepare my open-source projects
>> for it.
>>
>> I have no problem changing the code so it won't depend on StopIteration to
>> stop generators, but I'd also like to test it in my test suite. In Python
>> 3.5 I could use `from __future__ import generator_stop` so the test would
>> be real (i.e. would fail wherever I rely on StopIteration to stop a
>> generator). But I can't really put this snippet in my code because then it
>> would fail on all Python versions below 3.5.
>
> Sometimes you have to do things the old fashioned way:
>
> if sys.version_info[:2] < (3, 5):
>      # write test one way
> else:
>      # write test another way
>
> At least it's not a change of syntax :-)
>
> You can also move tests into a separate file that is version specific.
> That's a bit of a nuisance with small projects where you would have a single
> test file, but for larger projects there's nothing wrong with splitting
> tests across multiple files.
>
>
>> This makes me think of two ideas:
>>
>> 1. Maybe we should allow `from __future__ import whatever` in code, even if
>> `whatever` wasn't invented yet, and simply make it a no-op? This wouldn't
>> help now but it could prevent these problems in the future.
>
> from __future__ import spelling_mistaek
> # code that depends on spelling_mistake feature will now behave weirdly
>
Suppose I used:

     from __future__ import unicode_literals

in Python 2.5 and it didn't complain.

I'd then be puzzled why my plain string literals weren't Unicode.

>
>> 2. Maybe introduce a way to do `from __future__ import generator_stop`
>> without including it in code? Maybe a flag to the `python` command? (If
>> something like this exists please let me know.)
>
> I don't think that is important enough to require either an environment
> variable or a command line switch.
>


From jsbueno at python.org.br  Mon May 18 15:32:35 2015
From: jsbueno at python.org.br (Joao S. O. Bueno)
Date: Mon, 18 May 2015 10:32:35 -0300
Subject: [Python-ideas] Making it easy to prepare for PEP479
In-Reply-To: <5559E624.7030708@mrabarnett.plus.com>
References: <CANXboVZv-oyw=FemDtNXJb04mM_7jQTwZkQZ8OLV2BEafgn6aQ@mail.gmail.com>
 <20150518120054.GK5663@ando.pearwood.info>
 <5559E624.7030708@mrabarnett.plus.com>
Message-ID: <CAH0mxTRN=z+7zk9sf9jTM5xJbGraLYUG1fmFVvBCK7FvDHq+cA@mail.gmail.com>

Indeed - importing as a no-op would surely be broken -

The nice fix would be to be able to do
from __future__ import jaberwock

and have a plain "ImportError" that could be caught.

But, as Chris Angelico put it, it might be complicated.
Manually testing sys.version seems to be the way to go,
because even if making __future__ imports raise
ImportError, that would also only be available from
Py 3.5/3.6 onwards.

(Otherwise
from __future__ import from__future__import_ImportError
seems fun enough to actually be created)



On 18 May 2015 at 10:16, MRAB <python at mrabarnett.plus.com> wrote:
> On 2015-05-18 13:00, Steven D'Aprano wrote:
>>
>> On Mon, May 18, 2015 at 11:14:31AM +0300, Ram Rachum wrote:
>>>
>>> Hi everybody,
>>>
>>> I just heard about PEP479, and I want to prepare my open-source projects
>>> for it.
>>>
>>> I have no problem changing the code so it won't depend on StopIteration
>>> to
>>> stop generators, but I'd also like to test it in my test suite. In Python
>>> 3.5 I could use `from __future__ import generator_stop` so the test would
>>> be real (i.e. would fail wherever I rely on StopIteration to stop a
>>> generator). But I can't really put this snippet in my code because then
>>> it
>>> would fail on all Python versions below 3.5.
>>
>>
>> Sometimes you have to do things the old fashioned way:
>>
>> if sys.version_info[:2] < (3, 5):
>>      # write test one way
>> else:
>>      # write test another way
>>
>> At least it's not a change of syntax :-)
>>
>> You can also move tests into a separate file that is version specific.
>> That's a bit of a nuisance with small projects where you would have a single
>> test file, but for larger projects there's nothing wrong with splitting
>> tests across multiple files.
>>
>>
>>> This makes me think of two ideas:
>>>
>>> 1. Maybe we should allow `from __future__ import whatever` in code, even
>>> if
>>> `whatever` wasn't invented yet, and simply make it a no-op? This wouldn't
>>> help now but it could prevent these problems in the future.
>>
>>
>> from __future__ import spelling_mistaek
>> # code that depends on spelling_mistake feature will now behave weirdly
>>
> Suppose I used:
>
>     from __future__ import unicode_literals
>
> in Python 2.5 and it didn't complain.
>
> I'd then be puzzled why my plain string literals weren't Unicode.
>
>>
>>> 2. Maybe introduce a way to do `from __future__ import generator_stop`
>>> without including it in code? Maybe a flag to the `python` command? (If
>>> something like this exists please let me know.)
>>
>>
>> I don't think that is important enough to require either an environment
>> variable or a command line switch.
>>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From rosuav at gmail.com  Mon May 18 16:13:21 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 19 May 2015 00:13:21 +1000
Subject: [Python-ideas] Making it easy to prepare for PEP479
In-Reply-To: <CAH0mxTRN=z+7zk9sf9jTM5xJbGraLYUG1fmFVvBCK7FvDHq+cA@mail.gmail.com>
References: <CANXboVZv-oyw=FemDtNXJb04mM_7jQTwZkQZ8OLV2BEafgn6aQ@mail.gmail.com>
 <20150518120054.GK5663@ando.pearwood.info>
 <5559E624.7030708@mrabarnett.plus.com>
 <CAH0mxTRN=z+7zk9sf9jTM5xJbGraLYUG1fmFVvBCK7FvDHq+cA@mail.gmail.com>
Message-ID: <CAPTjJmpNSS2yn2NHJfm1rzYUyCN9QH0tvoV57n_623hGVFbiWQ@mail.gmail.com>

On Mon, May 18, 2015 at 11:32 PM, Joao S. O. Bueno
<jsbueno at python.org.br> wrote:
> Indeed - importing as a no-op would surely be broken -
>
> The nice fix would be to be able to do
> from __future__ import jaberwock
>
> and have a plain "ImportError" that could be caught.

Indeed. Though I'm not sure what a correctly-spelled "from __future__
import jabberwock" would do; exceptions already "burble" up the call
stack until they meet "the clause that catch[es]" them. :)

> But, as Chris Angelico put it, it might be complicated.
> Manually testing sys.version seems to be the way to go,
> because even if making __future__ imports raise
> ImportError, that would also only be available from
> Py 3.5/3.6 onwards.
>
> (Otherwise
> from __future__ import from__future__import_ImportError
> seems fun enough to actually be created)

Heh. Though there's no particular reason to guard this with a future
directive; if the behaviour were to be changed, it could just be done
immediately - you wouldn't need a couple of minor versions' notice
that something's going to stop raising errors.

The way to make this work would be two-fold. Firstly, an incorrect
__future__ directive would have to no longer be a SyntaxError; and
secondly, __future__ directives would have to be permitted after a try
statement (currently, they're not allowed to follow anything, so the
'try' would have to be special-cased to be allowed in). With those two
changes, though, the failing of a __future__ directive would now
become a failure at the (usually-ignored) run-time import - the
regular action of "from module import name" would fail when it tries
to import something that isn't present in the module. As a side
effect, some specific directives would become legal no-ops:

from __future__ import CO_FUTURE_PRINT_FUNCTION
from __future__ import __builtins__
# etc

I don't see this as a problem, given that the point of the SyntaxError
is to catch either outright spelling errors or version issues (eg
trying to use "from __future__ import print_function" in Python 2.5),
both of which will still raise ImportError.

The question is, how often is it actually useful to import a module
and ignore a __future__ directive? Going through all_feature_names:

nested_scopes: No idea; I think code is legal with or without it.
generators: Using "yield" as a keyword will fail
division: Yes, this one would work
absolute_import: This would work
with_statement: Any actual use of 'with' will bomb out
print_function: Might work if you restrict yourself
unicode_literals: Possibly would work, but ow, big confusion
barry_as_FLUFL: No idea, give it a try!
generator_stop: Yes, would work.

So three of them would definitely work (in the sense that code is
syntactically correct in both forms), and you could cope in some way
with an except block; print_function would work as long as you build
your code with that in mind (but if you're doing that anyway, just
drop the future directive); and unicode_literals *might* work, maybe.
The rest? If you're using the future directive, it's because you want
the new keyword, which means you're going to be using it. If the
future directive isn't recognized, you're getting syntax errors
elsewhere, so there's no opportunity to try/except the problem away.
What will the future of Python future directives be like? Most likely
a similarly mixed bag, so this is a feature that could potentially
have very little value.

Is it worth downgrading an instant SyntaxError to a run-time
ImportError to allow a narrow use-case?

ChrisA

From tjreedy at udel.edu  Mon May 18 16:17:06 2015
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 18 May 2015 10:17:06 -0400
Subject: [Python-ideas] Making it easy to prepare for PEP479
In-Reply-To: <CANXboVZv-oyw=FemDtNXJb04mM_7jQTwZkQZ8OLV2BEafgn6aQ@mail.gmail.com>
References: <CANXboVZv-oyw=FemDtNXJb04mM_7jQTwZkQZ8OLV2BEafgn6aQ@mail.gmail.com>
Message-ID: <mjcs94$20s$1@ger.gmane.org>

On 5/18/2015 4:14 AM, Ram Rachum wrote:

> I just heard about PEP479, and I want to prepare my open-source projects
> for it.
>
> I have no problem changing the code so it won't depend on StopIteration
> to stop generators, but I'd also like to test it in my test suite. In
> Python 3.5 I could use `from __future__ import generator_stop` so the
> test would be real (i.e. would fail wherever I rely on StopIteration to
> stop a generator). But I can't really put this snippet in my code
> because then it would fail on all Python versions below 3.5.

The purpose of future imports is to allow one to use a future feature, 
at the cost of either not supporting older Python versions, or of 
branching your code and making separate releases.

This future is an anomaly in that it imports a future disablement of a 
current feature.  So you just want to make sure your one, no-branch code 
base is ready for that feature removal by not using it now. You do not 
want to have a separate branch and release for 3.5 with the future imports.

Try the following: add the future statement to the top of modules with 
generators, compile with 3.5, and when successful, comment-out the 
statement.  For continued testing, especially with multiple authors, 
write functions to un-comment and re-comment a file.  In the test file:

if <3.5>: uncomment('xyz')  # triggers re-compile on import
import xyz
if <3.5>: recomment('xyz')  # ditto,

If this works, put pep479_helper on pypi.
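A rough sketch of what those helpers might look like (taking a file path rather than a module name, and assuming the directive sits, commented out, on the first line of the module - both assumptions of mine, not a real pep479_helper API):

```python
# Toggle a commented-out future statement on the first line of a
# module's source, so a 3.5+ test run can trigger a recompile with
# the directive active, then restore the file afterwards.
FUTURE = "from __future__ import generator_stop"

def _swap_first_line(path, old, new):
    with open(path) as f:
        lines = f.readlines()
    if lines and lines[0].rstrip("\n") == old:
        lines[0] = new + "\n"
        with open(path, "w") as f:
            f.writelines(lines)

def uncomment(path):
    _swap_first_line(path, "# " + FUTURE, FUTURE)

def recomment(path):
    _swap_first_line(path, FUTURE, "# " + FUTURE)
```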

-- 
Terry Jan Reedy


From steve at pearwood.info  Mon May 18 16:52:25 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 19 May 2015 00:52:25 +1000
Subject: [Python-ideas] Making it easy to prepare for PEP479
In-Reply-To: <mjcs94$20s$1@ger.gmane.org>
References: <CANXboVZv-oyw=FemDtNXJb04mM_7jQTwZkQZ8OLV2BEafgn6aQ@mail.gmail.com>
 <mjcs94$20s$1@ger.gmane.org>
Message-ID: <20150518145225.GM5663@ando.pearwood.info>

On Mon, May 18, 2015 at 10:17:06AM -0400, Terry Reedy wrote:

> Try the following: add the future statement to the top of modules with 
> generators, compile with 3.5, and when successful, comment-out the 
> statement.  For continued testing, especially with multiple authors, 
> write functions to un-comment and re-comment a file.  In the test file:
> 
> if <3.5>: uncomment('xyz')  # triggers re-compile on import
> import xyz
> if <3.5>: recomment('xyz')  # ditto,
> 
> If this works, put pep479_helper on pypi.

o_O

I'm not entirely sure what you are trying to do, but I *think* what you 
are trying is to have the byte code in the .pyc file be different from 
what the source code in the .py file says.

Fortunately Python does not make that easy to do. You would have to 
change the datestamp on the files so that the .pyc file appears newer 
than the source code.

I once worked on a system where it was easy to get the source and byte 
code out of sync. The original programmer was a frustrated C developer, 
and so he had built this intricate system where you edited the source 
code in one place, then ran the Unix "make" utility which compiled it, 
and moved the byte code to a completely different place on the 
PYTHONPATH. Oh, and you couldn't just run the Python modules as scripts, 
you had to run bash wrapper scripts which set up a bunch of environment 
variables. And of course there was no documentation other than rants 
about how stupid Python was but at least it was better than Perl. 
Believe me, debugging code where the byte code being imported is 
different from the source code you are reading is fun, if your idea of 
fun is horrible pain.


-- 
Steve

From tjreedy at udel.edu  Mon May 18 17:24:06 2015
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 18 May 2015 11:24:06 -0400
Subject: [Python-ideas] Making it easy to prepare for PEP479
In-Reply-To: <20150518145225.GM5663@ando.pearwood.info>
References: <CANXboVZv-oyw=FemDtNXJb04mM_7jQTwZkQZ8OLV2BEafgn6aQ@mail.gmail.com>
 <mjcs94$20s$1@ger.gmane.org> <20150518145225.GM5663@ando.pearwood.info>
Message-ID: <mjd06p$70k$1@ger.gmane.org>

On 5/18/2015 10:52 AM, Steven D'Aprano wrote:
> On Mon, May 18, 2015 at 10:17:06AM -0400, Terry Reedy wrote:
>
>> Try the following: add the future statement to the top of modules with
>> generators, compile with 3.5, and when successful, comment-out the
>> statement.  For continued testing, especially with multiple authors,
>> write functions to un-comment and re-comment a file.  In the test file:
>>
>> if <3.5>: uncomment('xyz')  # triggers re-compile on import
>> import xyz
>> if <3.5>: recomment('xyz')  # ditto,
>>
>> If this works, put pep479_helper on pypi.

> I'm not entirely sure what you are trying to do,

Solve the OP's problem.  What are *you* trying to do?  If you do not 
think that the offered solution will work, please explain why, instead 
of diverting attention to some insane projection of yours.

 > but I *think* what you
> are trying is to have the byte code in the .pyc file be different from
> what the source code in the .py file says.

Do you really think I meant the opposite of what I said?
Standard Python behavior that you are completely familiar with: edit 
x.py and import it; Python assumes that x.pyc is obsolete, recompiles 
x.py and rewrites x.pyc.

-- 
Terry Jan Reedy


From steve at pearwood.info  Mon May 18 18:18:27 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 19 May 2015 02:18:27 +1000
Subject: [Python-ideas] Making it easy to prepare for PEP479
In-Reply-To: <CAPTjJmpNSS2yn2NHJfm1rzYUyCN9QH0tvoV57n_623hGVFbiWQ@mail.gmail.com>
References: <CANXboVZv-oyw=FemDtNXJb04mM_7jQTwZkQZ8OLV2BEafgn6aQ@mail.gmail.com>
 <20150518120054.GK5663@ando.pearwood.info>
 <5559E624.7030708@mrabarnett.plus.com>
 <CAH0mxTRN=z+7zk9sf9jTM5xJbGraLYUG1fmFVvBCK7FvDHq+cA@mail.gmail.com>
 <CAPTjJmpNSS2yn2NHJfm1rzYUyCN9QH0tvoV57n_623hGVFbiWQ@mail.gmail.com>
Message-ID: <20150518161827.GN5663@ando.pearwood.info>

On Tue, May 19, 2015 at 12:13:21AM +1000, Chris Angelico wrote:
> On Mon, May 18, 2015 at 11:32 PM, Joao S. O. Bueno
> <jsbueno at python.org.br> wrote:
> > Indeed - importing as NOP would surely be broken -
> >
> > The nice fix would be to be able to do
> > from __future__ import jaberwock
> >
> > and have a plain "ImportError" that could be catched.
> 
> Indeed. Though I'm not sure what a correctly-spelled "from __future__
> import jabberwock" would do; exceptions already "burble" up the call
> stack until they meet "the clause that catch[es]" them. :)

You cannot catch errors in "from __future__ import" lines, because they 
are compile-time errors, not runtime errors. Any __future__ lines must 
be the first lines of executable code in the module. Only comments, 
blank lines, the module docstring, and other __future__ lines can 
precede them, so this cannot work:

try:
    from __future__ import feature
except:
    pass


for the same reason that this cannot work:

try:
    thing = }{
except SyntaxError:
    thing = {}


It is best to think of the __future__ imports as directives to the 
compiler. They tell the compiler to produce different code, change 
syntax, or similar. Except in the interactive interpreter, you cannot 
change the compiler settings part way through compiling the module.

There is a real __future__ module, but it exists only for introspection 
purposes.
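You can see the compile-time nature directly by handing such source to the compile() built-in - nothing is ever executed, yet the error still fires:

```python
# The directive after 'try:' is rejected while the module is being
# compiled, before a single statement runs, so the except clause
# never gets a chance to catch anything.
src = (
    "try:\n"
    "    from __future__ import generator_stop\n"
    "except SyntaxError:\n"
    "    pass\n"
)
try:
    compile(src, "<demo>", "exec")
    caught = False
except SyntaxError:
    caught = True
```

After this runs, caught is True: the SyntaxError came from compile(), not from executing the try statement.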


[...]
> The way to make this work would be two-fold. Firstly, an incorrect
> __future__ directive would have to no longer be a SyntaxError; and
> secondly, __future__ directives would have to be permitted after a try
> statement (currently, they're not allowed to follow anything, so the
> 'try' would have to be special-cased to be allowed in). 

It's not enough to merely change the wording of the error from 
SyntaxError to something else. You have to change when it occurs: it can 
no longer be raised at compile time, but has to happen at run time. That 
means that __future__ imports have to compile to something which 
runs at run time, instead of just being a directive to the compiler.

As for the changes necessary to the compiler, I have no idea how 
extensive they would be, but my guess is "extremely".

Also, consider that once you are allowing __future__ directives to occur 
after a try statement, expect there to be a lot more pressure to allow 
it after any arbitrary code. After all, I might want to write:

if sys.version != '3.7' and read_config('config.ini')['allow_jabberwocky']:
    from __future__ import jabberwocky

so you're opening the doors to a LOT more complexity.

Which, as far as I am concerned, is a good thing, because it makes the 
chances of this actually happening to be somewhere between Buckley's and 
none *wink*



-- 
Steve

From rosuav at gmail.com  Mon May 18 18:45:19 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 19 May 2015 02:45:19 +1000
Subject: [Python-ideas] Making it easy to prepare for PEP479
In-Reply-To: <20150518161827.GN5663@ando.pearwood.info>
References: <CANXboVZv-oyw=FemDtNXJb04mM_7jQTwZkQZ8OLV2BEafgn6aQ@mail.gmail.com>
 <20150518120054.GK5663@ando.pearwood.info>
 <5559E624.7030708@mrabarnett.plus.com>
 <CAH0mxTRN=z+7zk9sf9jTM5xJbGraLYUG1fmFVvBCK7FvDHq+cA@mail.gmail.com>
 <CAPTjJmpNSS2yn2NHJfm1rzYUyCN9QH0tvoV57n_623hGVFbiWQ@mail.gmail.com>
 <20150518161827.GN5663@ando.pearwood.info>
Message-ID: <CAPTjJmqCjx3YozFA7paQjhBE8anbC8pv+0ycTXsP_15R7062NQ@mail.gmail.com>

On Tue, May 19, 2015 at 2:18 AM, Steven D'Aprano <steve at pearwood.info> wrote:
>> The way to make this work would be two-fold. Firstly, an incorrect
>> __future__ directive would have to no longer be a SyntaxError; and
>> secondly, __future__ directives would have to be permitted after a try
>> statement (currently, they're not allowed to follow anything, so the
>> 'try' would have to be special-cased to be allowed in).
>
> It's not enough to merely change the wording of the error from
> SyntaxError to something else. You have to change when it occurs: it can
> no longer be raised at compile time, but has to happen at run time. That
> means that __future__ imports have to compile to something which
> runs at run time, instead of just being a directive to the compiler.

Precisely. I'm not saying that the incorrect future directive would be
some other sort of error instead of SyntaxError - for this to be
possible, it would have to *not be an error at all* at compile time,
leaving the faulty directive unannounced until it gets to the second
step (actual run-time importing of the __future__ module) to catch
errors. (Hence the side effect that "from __future__ import
all_feature_names" would actually not be an error; to the compiler,
it's an unknown future directive and thus ignored, and to the
run-time, it's a perfectly valid way to grab the list of features.)

> As for the changes necessary to the compiler, I have no idea how
> extensive they would be, but my guess is "extremely".

Actually, not much. Since it's just the nerfing of one error, it can
be done fairly easily - as proof of concept, I just commented out
lines 50 through 54 of future.c (the "else" block that raises an
error) and compiled:

rosuav at sikorsky:~/cpython$ cat futuredemo.py
from __future__ import generator_stop
from __future__ import all_feature_names
from __future__ import oops
rosuav at sikorsky:~/cpython$ ./python futuredemo.py
Traceback (most recent call last):
  File "futuredemo.py", line 3, in <module>
    from __future__ import oops
ImportError: cannot import name 'oops'

> Also, consider that once you are allowing __future__ directives to occur
> after a try statement, expect there to be a lot more pressure to allow
> it after any arbitrary code. After all, I might want to write:
>
> if sys.version != '3.7' and read_config('config.ini')['allow_jabberwocky']:
>     from __future__ import jabberwocky
>
> so you're opening the doors to a LOT more complexity.

Yes, now that is a much bigger concern. I did say that the "try:" part
of a try block would have to be deemed not-code, as a special case.
Simply nerfing that error (in compile.c and future.c) does make for a
viable proof-of-concept, though, so it's still nothing that requires
extensive changes to the compiler.  However...

> Which, as far as I am concerned, is a good thing, because it makes the
> chances of this actually happening to be somewhere between Buckley's and
> none *wink*

... this I agree with. I don't think the feature is all that useful,
and while it might well not be all that hard to implement, it would
complicate things somewhat, and that's not good. (It also may end up
being quite hard, and more so depending on the complexity of the
definition of what's allowed prior to a __future__ directive.) I can
imagine, for instance, a special case given to this precise structure:

try: from __future__ import feature
except: pass

which would then be an "optional future import"; but again, how often
is it even useful, much less necessary?

ChrisA

From steve at pearwood.info  Mon May 18 19:13:20 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 19 May 2015 03:13:20 +1000
Subject: [Python-ideas] Making it easy to prepare for PEP479
In-Reply-To: <mjd06p$70k$1@ger.gmane.org>
References: <CANXboVZv-oyw=FemDtNXJb04mM_7jQTwZkQZ8OLV2BEafgn6aQ@mail.gmail.com>
 <mjcs94$20s$1@ger.gmane.org> <20150518145225.GM5663@ando.pearwood.info>
 <mjd06p$70k$1@ger.gmane.org>
Message-ID: <20150518171319.GO5663@ando.pearwood.info>

On Mon, May 18, 2015 at 11:24:06AM -0400, Terry Reedy wrote:
> On 5/18/2015 10:52 AM, Steven D'Aprano wrote:
> >On Mon, May 18, 2015 at 10:17:06AM -0400, Terry Reedy wrote:
> >
> >>Try the following: add the future statement to the top of modules with
> >>generators, compile with 3.5, and when successful, comment-out the
> >>statement.  For continued testing, especially with multiple authors,
> >>write functions to un-comment and re-comment a file.  In the test file:
> >>
> >>if <3.5>: uncomment('xyz')  # triggers re-compile on import
> >>import xyz
> >>if <3.5>: recomment('xyz')  # ditto,
> >>
> >>If this works, put pep479_helper on pypi.
> 
> >I'm not entirely sure what you are trying to do,
> 
> Solve the OP's problem.  What are *you* trying to do?  If you do not 
> think that the offered solution will work, please explain why, instead 
> of diverting attention to some insane projection of yours.

You call my comments an "insane projection", but it's your code snippet 
which does exactly what I warned against: first you modify the source 
code, compile and import using the new, modified source, then change the 
source back to the way it was before the import so that what's inside 
the byte code no longer matches what's in the source. Here it is again:

    In the test file:

    if <3.5>: uncomment('xyz')  # triggers re-compile on import
    import xyz
    if <3.5>: recomment('xyz')  # ditto,

In other words: edit source, compile, revert source, use compiled 
version. See the problem now?

You might say, "But it's only a single line that is different." I say, 
*any* difference is too much. I've been burnt too badly by people using 
"clever hacks" that lead to the .pyc file being imported and the .py 
source being out of sync to trust even a single line difference.

If I interpret your words as you wrote them, the solution seems to risk 
becoming as convoluted and messy as the code I had to work with in real 
life. If I try to interpret your words more sensibly ("surely Terry 
cannot possibly mean what he said...?") the suggestion is *still* 
convoluted. If Ram is permitted multiple test files, then the simplest 
solution is to split off the code that relies on the future directive 
into its own file:

try:
    import xyz  # requires the future directive
except ImportError:
    xyz = None

if xyz:
    # tests with directive
else:
    # tests without


Instead of going through your process of editing the source code, 
compiling, importing, re-editing, just have two source files.



-- 
Steve

From rosuav at gmail.com  Mon May 18 19:36:12 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 19 May 2015 03:36:12 +1000
Subject: [Python-ideas] Making it easy to prepare for PEP479
In-Reply-To: <20150518171319.GO5663@ando.pearwood.info>
References: <CANXboVZv-oyw=FemDtNXJb04mM_7jQTwZkQZ8OLV2BEafgn6aQ@mail.gmail.com>
 <mjcs94$20s$1@ger.gmane.org>
 <20150518145225.GM5663@ando.pearwood.info>
 <mjd06p$70k$1@ger.gmane.org>
 <20150518171319.GO5663@ando.pearwood.info>
Message-ID: <CAPTjJmqxve-h382Vhwp=9sMOj6TAmj9y2aaDzjzJ5H66SmxSuA@mail.gmail.com>

On Tue, May 19, 2015 at 3:13 AM, Steven D'Aprano <steve at pearwood.info> wrote:
> If I interpret your words as you wrote them, the solution seems to risk
> becoming as convoluted and messy as the code I had to work with in real
> life. If I try to interpret your words more sensibly ("surely Terry
> cannot possibly mean what he said...?") the suggestion is *still*
> convoluted. If Ram is permitted multiple test files, then the simplest
> solution is to split off the code that relies on the future directive
> into its own file:
>
> try:
>     import xyz  # requires the future directive
> except ImportError:
>     xyz = None
>
> if xyz:
>     # tests with directive
> else:
>     # tests without
>
>
> Instead of going through your process of editing the source code,
> compiling, importing, re-editing, just have two source files.

My understanding of the OP's problem is this:

# Utility file
def some_generator():
    yield stuff

# Tests file
import utils
assert next(utils.some_generator()) == stuff


Now, PEP 479 says that his code should never raise StopIteration in a
generator, or in anything called by a generator. He has no problem
with this, philosophically, but to prove that the change has indeed
happened, it would be good to run the test suite with generator_stop
active - equivalent to running Python 3.7 on the test suite. However,
simply adding a future directive to the tests file will have no effect
(obviously), and adding a future directive to the utility module
itself will break it on Python <3.5, even though it would work just
fine. So the options are:

1) Omit the directive, and trust that it's all working - no benefit
from PEP 479 until Python 3.7.
2) Include the directive, and require Python 3.5+ for no reason other
than this check.
3) Hack something so that the tests are run with the directive active,
but normal running doesn't use it.
4) Hack something so Python 3.5 and 3.6 use the directive, and others don't.

The first two are easy, but have nasty consequences. The third is what
I provided a hack to accomplish (exec the code with a line prepended),
and for which Terry suggested the "adorn, import, unadorn" scheme, which
probably also counts as a hack. The fourth is the notion of try/except
around future directives, which I think won't fly.
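Concretely, the third option looks something like this (a sketch only:
the module source is assumed to be available as a string, and the
prepended directive needs Python 3.5+ to compile):

```python
# Option 3 sketched: the on-disk source stays free of the directive,
# but the test harness compiles a copy with generator_stop forced on
# by prepending the future import (which must be the first statement).
utils_source = '''
def broken_gen():
    yield 1
    raise StopIteration  # should have been "return"
'''

strict_ns = {}
exec("from __future__ import generator_stop\n" + utils_source, strict_ns)

gen = strict_ns["broken_gen"]()
next(gen)  # yields 1
try:
    next(gen)
except RuntimeError:
    print("the buggy early exit is caught under generator_stop")
```

One caveat: the prepended line shifts every line number by one, so
tracebacks from the exec'd copy won't match the original file.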

Terry's proposal doesn't actually require that the .pyc file on disk
differ from the source code; it will simply mean that the bytecode
being executed in memory differs from the source. In the case of
future directives like unicode_literals, yes, that would be a
nightmare to debug; but for generator_stop, I doubt it'll cause
problems.

The trouble here is that it's not so much "some code needs the future
directive, some doesn't" as "some use-cases want strict checking, but
we still want compatibility". Ideally, it should be possible to prove
that your test suite now passes in a post-PEP-479 world, without
breaking anything on 3.4 or 2.7. The only question is, how much
hackery are we prepared to accept in order to do this?

Maybe the simplest hackery of all is just to build a tweaked Python
that just always uses generator_stop. It's not hard to do - either
hard-code the bitflag into the default value for ff_features
(future.c:135), or remove part of the condition that actually does the
work (genobject.c:137) - and then you have a 3.7-like Python that
assumes generator_stop semantics. Run your tests with that, and don't
use the future directive at all.
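Whichever route is taken, the behaviour being verified is the same,
and it is easy to see on any interpreter where generator_stop is
active (shown here assuming the 3.7 defaults, where it is always on):

```python
# Under generator_stop, a StopIteration raised inside a generator body
# no longer silently ends iteration; it surfaces as a RuntimeError.
def stops_early(items):
    for item in items:
        if item is None:
            raise StopIteration  # pre-PEP-479 style "early exit"
        yield item

try:
    list(stops_early([1, 2, None, 3]))
except RuntimeError as exc:
    print("PEP 479 in effect:", exc)
```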

ChrisA

From abarnert at yahoo.com  Mon May 18 20:45:30 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Mon, 18 May 2015 11:45:30 -0700
Subject: [Python-ideas] Making it easy to prepare for PEP479
In-Reply-To: <20150518171319.GO5663@ando.pearwood.info>
References: <CANXboVZv-oyw=FemDtNXJb04mM_7jQTwZkQZ8OLV2BEafgn6aQ@mail.gmail.com>
 <mjcs94$20s$1@ger.gmane.org> <20150518145225.GM5663@ando.pearwood.info>
 <mjd06p$70k$1@ger.gmane.org> <20150518171319.GO5663@ando.pearwood.info>
Message-ID: <0D0794EF-60D4-4103-BA13-434F0A1F65D5@yahoo.com>

On May 18, 2015, at 10:13, Steven D'Aprano <steve at pearwood.info> wrote:
> 
> If Ram is permitted multiple test files, then the simplest 
> solution is to split off the code that relies on the future directive 
> into its own file:
> 
> try:
>    import xyz  # requires the future directive
> except ImportError:
>    xyz = None
> 
> if xyz:
>    # tests with directive
> else:
>    # tests without

Would this break unittest's automated discovery, setuptools' automatic test command, etc., unless you moved xyz out of the tests directory and added a sys.path entry before trying to import it? If so, that might be something worth explaining in a howto or a section of the packaging developer guide.


From greg.ewing at canterbury.ac.nz  Tue May 19 01:06:03 2015
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 19 May 2015 11:06:03 +1200
Subject: [Python-ideas] Making it easy to prepare for PEP479
In-Reply-To: <20150518161827.GN5663@ando.pearwood.info>
References: <CANXboVZv-oyw=FemDtNXJb04mM_7jQTwZkQZ8OLV2BEafgn6aQ@mail.gmail.com>
 <20150518120054.GK5663@ando.pearwood.info>
 <5559E624.7030708@mrabarnett.plus.com>
 <CAH0mxTRN=z+7zk9sf9jTM5xJbGraLYUG1fmFVvBCK7FvDHq+cA@mail.gmail.com>
 <CAPTjJmpNSS2yn2NHJfm1rzYUyCN9QH0tvoV57n_623hGVFbiWQ@mail.gmail.com>
 <20150518161827.GN5663@ando.pearwood.info>
Message-ID: <555A705B.4080100@canterbury.ac.nz>

Steven D'Aprano wrote:
> After all, I might want to write:
> 
> if sys.version != '3.7' and read_config('config.ini')['allow_jabberwocky']:
>     from __future__ import jabberwocky

You might want to, but I would have no qualms about
firmly telling you that you can't. Putting try:
in front of a future import still doesn't introduce
any executable code before it, whereas the above does.

-- 
Greg

From rosuav at gmail.com  Tue May 19 02:22:07 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 19 May 2015 10:22:07 +1000
Subject: [Python-ideas] Making it easy to prepare for PEP479
In-Reply-To: <555A705B.4080100@canterbury.ac.nz>
References: <CANXboVZv-oyw=FemDtNXJb04mM_7jQTwZkQZ8OLV2BEafgn6aQ@mail.gmail.com>
 <20150518120054.GK5663@ando.pearwood.info>
 <5559E624.7030708@mrabarnett.plus.com>
 <CAH0mxTRN=z+7zk9sf9jTM5xJbGraLYUG1fmFVvBCK7FvDHq+cA@mail.gmail.com>
 <CAPTjJmpNSS2yn2NHJfm1rzYUyCN9QH0tvoV57n_623hGVFbiWQ@mail.gmail.com>
 <20150518161827.GN5663@ando.pearwood.info>
 <555A705B.4080100@canterbury.ac.nz>
Message-ID: <CAPTjJmpAO0qmXcjhW-Xwv2ZKW5zO8bfFB0t_Ty5BO=ETvnAbiw@mail.gmail.com>

On Tue, May 19, 2015 at 9:06 AM, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Steven D'Aprano wrote:
>>
>> After all, I might want to write:
>>
>> if sys.version != '3.7' and
>> read_config('config.ini')['allow_jabberwocky']:
>>     from __future__ import jabberwocky
>
>
> You might want to, but I would have no qualms about
> firmly telling you that you can't. Putting try:
> in front of a future import still doesn't introduce
> any executable code before it, whereas the above does.

Yes, but imagine what happens if you want to have _two_ future imports
guarded by try/except. Either something gets completely special-cased
("try: from __future__ import foo except: pass", and no other
except/finally permitted), or you're allowed a maximum of one guarded
future import (though it might have more than one keyword in it), or
there's arbitrary code permitted in the "except" clause prior to a
future import, which would be a major problem.

ChrisA

From python at mrabarnett.plus.com  Tue May 19 02:32:33 2015
From: python at mrabarnett.plus.com (MRAB)
Date: Tue, 19 May 2015 01:32:33 +0100
Subject: [Python-ideas] Making it easy to prepare for PEP479
In-Reply-To: <CAPTjJmpAO0qmXcjhW-Xwv2ZKW5zO8bfFB0t_Ty5BO=ETvnAbiw@mail.gmail.com>
References: <CANXboVZv-oyw=FemDtNXJb04mM_7jQTwZkQZ8OLV2BEafgn6aQ@mail.gmail.com>
 <20150518120054.GK5663@ando.pearwood.info>
 <5559E624.7030708@mrabarnett.plus.com>
 <CAH0mxTRN=z+7zk9sf9jTM5xJbGraLYUG1fmFVvBCK7FvDHq+cA@mail.gmail.com>
 <CAPTjJmpNSS2yn2NHJfm1rzYUyCN9QH0tvoV57n_623hGVFbiWQ@mail.gmail.com>
 <20150518161827.GN5663@ando.pearwood.info>
 <555A705B.4080100@canterbury.ac.nz>
 <CAPTjJmpAO0qmXcjhW-Xwv2ZKW5zO8bfFB0t_Ty5BO=ETvnAbiw@mail.gmail.com>
Message-ID: <555A84A1.60608@mrabarnett.plus.com>

On 2015-05-19 01:22, Chris Angelico wrote:
> On Tue, May 19, 2015 at 9:06 AM, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
>> Steven D'Aprano wrote:
>>>
>>> After all, I might want to write:
>>>
>>> if sys.version != '3.7' and
>>> read_config('config.ini')['allow_jabberwocky']:
>>>     from __future__ import jabberwocky
>>
>>
>> You might want to, but I would have no qualms about
>> firmly telling you that you can't. Putting try:
>> in front of a future import still doesn't introduce
>> any executable code before it, whereas the above does.
>
> Yes, but imagine what happens if you want to have _two_ future imports
> guarded by try/except. Either something gets completely special-cased
> ("try: from __future__ import foo except: pass", and no other
> except/finally permitted), or you're allowed a maximum of one guarded
> future import (though it might have more than one keyword in it), or
> there's arbitrary code permitted in the "except" clause prior to a
> future import, which would be a major problem.
>
I think that part of the problem is that it looks like an import
statement, but it's really a compiler directive in disguise...


From greg.ewing at canterbury.ac.nz  Tue May 19 00:48:41 2015
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 19 May 2015 10:48:41 +1200
Subject: [Python-ideas] Making it easy to prepare for PEP479
In-Reply-To: <CAH0mxTRN=z+7zk9sf9jTM5xJbGraLYUG1fmFVvBCK7FvDHq+cA@mail.gmail.com>
References: <CANXboVZv-oyw=FemDtNXJb04mM_7jQTwZkQZ8OLV2BEafgn6aQ@mail.gmail.com>
 <20150518120054.GK5663@ando.pearwood.info>
 <5559E624.7030708@mrabarnett.plus.com>
 <CAH0mxTRN=z+7zk9sf9jTM5xJbGraLYUG1fmFVvBCK7FvDHq+cA@mail.gmail.com>
Message-ID: <555A6C49.50505@canterbury.ac.nz>

Joao S. O. Bueno wrote:
> (Otherwise
> from __future__ import from__future__import_ImportError
> seens fun enough to actually be created)

I don't think it would even be all that hard to implement.

As I understand things, a __future__ import already
results in a run-time import in addition to its magical
effects. So all the compiler needs to do is ignore
undefined future features, and an ImportError will
result at run time.

(The rules would need to be relaxed slightly to
allow a try-except around future imports, but that
doesn't seem like a big problem.)

A benefit of this arrangement is that it would permit
monkey-patching of __future__ at run time to get
no-ops.

-- 
Greg

From steve at pearwood.info  Tue May 19 03:17:36 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 19 May 2015 11:17:36 +1000
Subject: [Python-ideas] Making it easy to prepare for PEP479
In-Reply-To: <555A705B.4080100@canterbury.ac.nz>
References: <CANXboVZv-oyw=FemDtNXJb04mM_7jQTwZkQZ8OLV2BEafgn6aQ@mail.gmail.com>
 <20150518120054.GK5663@ando.pearwood.info>
 <5559E624.7030708@mrabarnett.plus.com>
 <CAH0mxTRN=z+7zk9sf9jTM5xJbGraLYUG1fmFVvBCK7FvDHq+cA@mail.gmail.com>
 <CAPTjJmpNSS2yn2NHJfm1rzYUyCN9QH0tvoV57n_623hGVFbiWQ@mail.gmail.com>
 <20150518161827.GN5663@ando.pearwood.info>
 <555A705B.4080100@canterbury.ac.nz>
Message-ID: <20150519011736.GP5663@ando.pearwood.info>

On Tue, May 19, 2015 at 11:06:03AM +1200, Greg Ewing wrote:
> Steven D'Aprano wrote:
> >After all, I might want to write:
> >
> >if sys.version != '3.7' and read_config('config.ini')['allow_jabberwocky']:
> >    from __future__ import jabberwocky
> 
> You might want to, but I would have no qualms about
> firmly telling you that you can't. Putting try:
> in front of a future import still doesn't introduce
> any executable code before it, whereas the above does.

"Set up a try...except block" is not executable? Then how does it, um, 
you know, set up the try...except block? :-)

"try" compiles to executable code. If you don't believe me:

def a():
    spam

def b():
    try:  spam
    except:  pass

from dis import dis
dis(a)
dis(b)

and take note of the SETUP_EXCEPT byte-code.

In any case, I think that neither of us wants to change the rules about 
what can precede a __future__ import, so hopefully the point is moot.


-- 
Steve

From steve at pearwood.info  Tue May 19 03:20:38 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 19 May 2015 11:20:38 +1000
Subject: [Python-ideas] Making it easy to prepare for PEP479
In-Reply-To: <555A84A1.60608@mrabarnett.plus.com>
References: <CANXboVZv-oyw=FemDtNXJb04mM_7jQTwZkQZ8OLV2BEafgn6aQ@mail.gmail.com>
 <20150518120054.GK5663@ando.pearwood.info>
 <5559E624.7030708@mrabarnett.plus.com>
 <CAH0mxTRN=z+7zk9sf9jTM5xJbGraLYUG1fmFVvBCK7FvDHq+cA@mail.gmail.com>
 <CAPTjJmpNSS2yn2NHJfm1rzYUyCN9QH0tvoV57n_623hGVFbiWQ@mail.gmail.com>
 <20150518161827.GN5663@ando.pearwood.info>
 <555A705B.4080100@canterbury.ac.nz>
 <CAPTjJmpAO0qmXcjhW-Xwv2ZKW5zO8bfFB0t_Ty5BO=ETvnAbiw@mail.gmail.com>
 <555A84A1.60608@mrabarnett.plus.com>
Message-ID: <20150519012038.GQ5663@ando.pearwood.info>

On Tue, May 19, 2015 at 01:32:33AM +0100, MRAB wrote:

> I think that part of the problem is that it looks like an import
> statement, but it's really a compiler directive in disguise...

+1

-- 
Steve

From steve at pearwood.info  Tue May 19 05:15:46 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 19 May 2015 13:15:46 +1000
Subject: [Python-ideas] Making it easy to prepare for PEP479
In-Reply-To: <CAPTjJmqxve-h382Vhwp=9sMOj6TAmj9y2aaDzjzJ5H66SmxSuA@mail.gmail.com>
References: <CANXboVZv-oyw=FemDtNXJb04mM_7jQTwZkQZ8OLV2BEafgn6aQ@mail.gmail.com>
 <mjcs94$20s$1@ger.gmane.org> <20150518145225.GM5663@ando.pearwood.info>
 <mjd06p$70k$1@ger.gmane.org> <20150518171319.GO5663@ando.pearwood.info>
 <CAPTjJmqxve-h382Vhwp=9sMOj6TAmj9y2aaDzjzJ5H66SmxSuA@mail.gmail.com>
Message-ID: <20150519031545.GR5663@ando.pearwood.info>

On Tue, May 19, 2015 at 03:36:12AM +1000, Chris Angelico wrote:

> My understanding of the OP's problem is this:
> 
> # Utility file
> def some_generator():
>     yield stuff
> 
> # Tests file
> import utils
> assert next(utils.some_generator()) == stuff

That doesn't test that some_generator never raises StopIteration 
directly. All it does is test that it yields correctly. See below for a 
meaningful test.


> Now, PEP 479 says that his code should never raise StopIteration in a
> generator, or in anything called by a generator. He has no problem
> with this, philosophically, but to prove that the change has indeed
> happened, it would be good to run the test suite with generator_stop
> active - equivalent to running Python 3.7 on the test suite.

It's not clear to me whether you're talking about Ram testing that PEP 
479 is working as claimed ("to prove that the change has indeed 
happened"), or testing *his own generators* to check that he doesn't 
accidentally call "raise StopIteration" inside them, regardless of 
version.

If Ram is merely testing PEP 479, then he needs tests like this:

def gen():
    raise StopIteration
    yield  # unreachable, but makes this a generator function

assertRaises(RuntimeError, next, gen())


These tests are only meaningful for 3.5 or better, since in 3.4 the PEP 
isn't implemented and his tests will fail. They belong in the Python 3.5 
test suite, not Ram's library test suite, but if he insists on having 
them, he can stick them in a separate file as already discussed.

More likely, Ram is testing his own generators, not the interpreter. He 
wants to ensure that none of his generators raise StopIteration but 
always use return instead. Whatever test he writes, he has to run it on 
a generator which is passed in, not on a test generator he writes 
specifically for the test.

It's hard to test arbitrary generators for compliance with the rule 
"don't raise StopIteration directly", since you cannot distinguish a 
return from a raise from the outside unless PEP 479 is in effect. 
Ideally, the test should still fail even if PEP 479 is not implemented. 
Otherwise it's a useless test under 3.4 and older, and you might as well 
not even bother running it.

Before PEP 479 is in effect, I can't think of any practical way to 
distinguish the cases:

(1) generator exits by raising (fail);
(2) generator exits by returning (pass);

since both cases end up raising StopIteration. Perhaps Ram is cleverer 
than me and can come up with a definite test, but I'd like to see it 
before commenting.

The only solution I can come up with is to use the inspect module to 
fetch the generator's source code and scan it for "raise StopIteration". 
Parsing the AST will also work, or even the byte-code at a pinch, but 
the source code is easiest:

assertFalse("raise StopIteration" in source)

That will fail if the generator raises StopIteration directly regardless 
of version. It doesn't catch *all possible* violations, e.g.:

    exc = eval(codecs.encode('FgbcVgrengvba', 'rot-13'))
    raise exc

but I assume that Ram trusts himself not to be actively trying to 
subvert his own tests. (If not, then he has bigger problems.)
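A slightly fuller version of the sketch (the module name and
generators are invented; note that inspect.getsource needs the code to
live in a real file, hence the temporary file):

```python
import importlib.util
import inspect
import os
import tempfile
import textwrap

module_src = textwrap.dedent("""
    def compliant():
        yield 1
        return  # the PEP 479 way to finish early

    def violator():
        yield 1
        raise StopIteration  # fails the check
""")

# inspect.getsource needs a real file behind the function.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(module_src)
    path = f.name

spec = importlib.util.spec_from_file_location("genmod", path)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)

# True means the generator passes the "no raise StopIteration" check.
results = {}
for name in ("compliant", "violator"):
    source = inspect.getsource(getattr(mod, name))
    results[name] = "raise StopIteration" not in source

print(results)
os.unlink(path)
```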

So, I believe that the whole __future__ directive is a red herring, and 
doesn't actually help Ram do what he wants, which is to write tests 
which will fail if his generators call raise StopIteration regardless of 
what version of Python he runs the test under.



-- 
Steve

From rosuav at gmail.com  Tue May 19 07:46:49 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 19 May 2015 15:46:49 +1000
Subject: [Python-ideas] Making it easy to prepare for PEP479
In-Reply-To: <20150519031545.GR5663@ando.pearwood.info>
References: <CANXboVZv-oyw=FemDtNXJb04mM_7jQTwZkQZ8OLV2BEafgn6aQ@mail.gmail.com>
 <mjcs94$20s$1@ger.gmane.org>
 <20150518145225.GM5663@ando.pearwood.info>
 <mjd06p$70k$1@ger.gmane.org>
 <20150518171319.GO5663@ando.pearwood.info>
 <CAPTjJmqxve-h382Vhwp=9sMOj6TAmj9y2aaDzjzJ5H66SmxSuA@mail.gmail.com>
 <20150519031545.GR5663@ando.pearwood.info>
Message-ID: <CAPTjJmoB5TV-WRvEXfu=R0G8TUNNPt7_8ayGNzNDj3e=gcn7iQ@mail.gmail.com>

On Tue, May 19, 2015 at 1:15 PM, Steven D'Aprano <steve at pearwood.info> wrote:
> On Tue, May 19, 2015 at 03:36:12AM +1000, Chris Angelico wrote:
>
>> My understanding of the OP's problem is this:
>>
>> # Utility file
>> def some_generator():
>>     yield stuff
>>
>> # Tests file
>> import utils
>> assert next(utils.some_generator()) == stuff
>
> That doesn't test that some_generator never raises StopIteration
> directly. All it does is test that it yields correctly. See below for a
> meaningful test.

Right; this is an existing codebase which (presumably) already has
tests. These tests will continue to pass post-479, but they are
inadequate as proof that the transformation to "never raise
StopIteration" has been completed.

> It's not clear to me whether you're talking about Ram testing that PEP
> 479 is working as claimed ("to prove that the change has indeed
> happened"), or testing *his own generators* to check that he doesn't
> accidentally call "raise StopIteration" inside them, regardless of
> version.
>
> More likely, Ram is testing his own generators, not the interpreter. He
> wants to ensure that none of his generators raise StopIteration but
> always use return instead. Whatever test he writes, he has to run it on
> a generator which is passed in, not on a test generator he writes
> specifically for the test.

Correct.

> It's hard to test arbitrary generators for compliance with the rule
> "don't raise StopIteration directly", since you cannot distinguish a
> return from a raise from the outside unless PEP 479 is in effect.
> Ideally, the test should still fail even if PEP 479 is not implemented.
> Otherwise it's a useless test under 3.4 and older, and you might as well
> not even bother running it.

Indeed. You have summed up the problem.

> Before PEP 479 is in effect, I can't think of any practical way to
> distinguish the cases:
>
> (1) generator exits by raising (fail);
> (2) generator exits by returning (pass);
>
> since both cases end up raising StopIteration. Perhaps Ram is cleverer
> than me and can come up with a definite test, but I'd like to see it
> before commenting.
>
> The only solution I can come up with is to use the inspect module to
> fetch the generator's source code and scan it for "raise StopIteration".
> Parsing the AST will also work, or even the byte-code at a pinch, but
> the source code is easiest:
>
> assertFalse("raise StopIteration" in source)
>
> That will fail if the generator raises StopIteration directly regardless
> of version. It doesn't catch *all possible* violations, e.g.:

More significant example: It doesn't catch a codebase that has some
functions which are used in generators and others which are used in
class-based iterators.

> So, I believe that the whole __future__ directive is a red herring, and
> doesn't actually help Ram do what he wants, which is to write tests
> which will fail if his generators call raise StopIteration regardless of
> what version of Python he runs the test under.

Okay. So how do you ensure that Python 3.7 and Python 3.4 can both run
your code?

ChrisA

From greg.ewing at canterbury.ac.nz  Tue May 19 08:16:17 2015
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 19 May 2015 18:16:17 +1200
Subject: [Python-ideas] Making it easy to prepare for PEP479
In-Reply-To: <CAPTjJmoB5TV-WRvEXfu=R0G8TUNNPt7_8ayGNzNDj3e=gcn7iQ@mail.gmail.com>
References: <CANXboVZv-oyw=FemDtNXJb04mM_7jQTwZkQZ8OLV2BEafgn6aQ@mail.gmail.com>
 <mjcs94$20s$1@ger.gmane.org> <20150518145225.GM5663@ando.pearwood.info>
 <mjd06p$70k$1@ger.gmane.org> <20150518171319.GO5663@ando.pearwood.info>
 <CAPTjJmqxve-h382Vhwp=9sMOj6TAmj9y2aaDzjzJ5H66SmxSuA@mail.gmail.com>
 <20150519031545.GR5663@ando.pearwood.info>
 <CAPTjJmoB5TV-WRvEXfu=R0G8TUNNPt7_8ayGNzNDj3e=gcn7iQ@mail.gmail.com>
Message-ID: <555AD531.9060102@canterbury.ac.nz>

Maybe what's needed is a command-line switch
that turns on a future feature for all code?

Then you can run the Python 3 tests with it,
and the Python 2 tests without it, and not
have to modify any code.

-- 
Greg

From steve at pearwood.info  Tue May 19 11:25:47 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 19 May 2015 19:25:47 +1000
Subject: [Python-ideas] Making it easy to prepare for PEP479
In-Reply-To: <CAPTjJmoB5TV-WRvEXfu=R0G8TUNNPt7_8ayGNzNDj3e=gcn7iQ@mail.gmail.com>
References: <CANXboVZv-oyw=FemDtNXJb04mM_7jQTwZkQZ8OLV2BEafgn6aQ@mail.gmail.com>
 <mjcs94$20s$1@ger.gmane.org> <20150518145225.GM5663@ando.pearwood.info>
 <mjd06p$70k$1@ger.gmane.org> <20150518171319.GO5663@ando.pearwood.info>
 <CAPTjJmqxve-h382Vhwp=9sMOj6TAmj9y2aaDzjzJ5H66SmxSuA@mail.gmail.com>
 <20150519031545.GR5663@ando.pearwood.info>
 <CAPTjJmoB5TV-WRvEXfu=R0G8TUNNPt7_8ayGNzNDj3e=gcn7iQ@mail.gmail.com>
Message-ID: <20150519092547.GA28058@ando.pearwood.info>

On Tue, May 19, 2015 at 03:46:49PM +1000, Chris Angelico wrote:

> > The only solution I can come up with is to use the inspect module to
> > fetch the generator's source code and scan it for "raise StopIteration".
> > Parsing the AST will also work, or even the byte-code at a pinch, but
> > the source code is easiest:
> >
> > assertFalse("raise StopIteration" in source)
> >
> > That will fail if the generator raises StopIteration directly regardless
> > of version. It doesn't catch *all possible* violations, e.g.:
> 
> More significant example: It doesn't catch a codebase that has some
> functions which are used in generators and others which are used in
> class-based iterators.

Obviously I only sketched a solution. The person writing the tests has 
to distinguish between functions or methods which must not call "raise 
StopIteration", and test them, while avoiding testing those which may 
use raise. They may want to test more than just the generator function 
themselves, e.g. any functions they call.

In principle, if you're reading the code or the AST, you can do a static 
analysis to automatically detect what functions it calls, and scan them 
as well, but that's a lot of effort for mere unit tests, and the chances 
are that your test code will be buggier than your non-test code. Easier 
to just add the called functions to a list of functions to be checked.

The person writing the tests must decide how much he cares about this. 
"Do the simplest thing that can possibly work" applies to tests as well 
as code (tests *are* code).

(In my opinion, just by *reading* this thread, Ram has already exceeded 
the amount of time and energy that these tests are worth.)


> > So, I believe that the whole __future__ directive is a red herring, and
> > doesn't actually help Ram do what he wants, which is to write tests
> > which will fail if his generators call raise StopIteration regardless of
> > what version of Python he runs the test under.
> 
> Okay. So how do you ensure that Python 3.7 and Python 3.4 can both run
> your code?

If I am right that the future directive is irrelevant, then you simply 
*don't include the future directive*.

Or you split the code into parts that don't require the directive, and 
parts that do, and put them in different files, then conditionally 
import the second set, either in a try...except or if version... block.

Or you write one file: test.py, and run your tests with a wrapper 
script which duplicates that file and inserts the future directive:

# untested
cp test.py test479.py
sed -i '1i from __future__ import feature' test479.py
python -m unittest test.py
python -m unittest test479.py


Combine and adjust as needed.


-- 
Steve

From ncoghlan at gmail.com  Tue May 19 13:22:36 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 19 May 2015 21:22:36 +1000
Subject: [Python-ideas] Making it easy to prepare for PEP479
In-Reply-To: <20150519031545.GR5663@ando.pearwood.info>
References: <CANXboVZv-oyw=FemDtNXJb04mM_7jQTwZkQZ8OLV2BEafgn6aQ@mail.gmail.com>
 <mjcs94$20s$1@ger.gmane.org>
 <20150518145225.GM5663@ando.pearwood.info>
 <mjd06p$70k$1@ger.gmane.org>
 <20150518171319.GO5663@ando.pearwood.info>
 <CAPTjJmqxve-h382Vhwp=9sMOj6TAmj9y2aaDzjzJ5H66SmxSuA@mail.gmail.com>
 <20150519031545.GR5663@ando.pearwood.info>
Message-ID: <CADiSq7e3JfqLbWxctirJf4g6SSJA3BAtOiLPzaBxW_2U8=Cafg@mail.gmail.com>

On 19 May 2015 at 13:15, Steven D'Aprano <steve at pearwood.info> wrote:
> So, I believe that the whole __future__ directive is a red herring, and
> doesn't actually help Ram do what he wants, which is to write tests
> which will fail if his generators call raise StopIteration regardless of
> what version of Python he runs the test under.

The essential impossibility of writing such tests is one of the
underlying reasons *why* PEP 479 was accepted - you can't sensibly
test for inadvertently escaping StopIteration values.

However, I interpreted Ram's request slightly differently: if I'm
understanding the request correctly, he'd like a way to write
single-source modules such that *on Python 3.5+* they effectively run
with "from __future__ import generator_stop", while on older Python
versions, they run unmodified. That way, running the test suite under
Python 3.5 will show that at least the regression tests aren't relying
on "escaping StopIteration" in order to pass.

The intended answer to Ram's request is "configure the warnings module
to turn the otherwise silent deprecation warning into an error". From
https://www.python.org/dev/peps/pep-0479/#transition-plan:

* Python 3.5: Enable new semantics under __future__ import; silent
deprecation warning if StopIteration bubbles out of a generator not
under __future__ import.

However, we missed the second half of that in the initial PEP
implementation, so it doesn't currently emit the deprecation warning
at all, which means there's no way to turn it into an error instead:
http://bugs.python.org/issue24237

Once that issue has been fixed, then "-Wall" will cause any tests
relying on the deprecated behaviour to fail, *without* needing to
modify the code under test to use the future import.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From mistersheik at gmail.com  Tue May 19 17:42:48 2015
From: mistersheik at gmail.com (Neil Girdhar)
Date: Tue, 19 May 2015 08:42:48 -0700 (PDT)
Subject: [Python-ideas] an unless statement would occasionally be useful
In-Reply-To: <032DB30B-2DFB-447D-B558-7B8C247113EC@yahoo.com>
References: <5558D8EE.8010105@earthlink.net>
 <032DB30B-2DFB-447D-B558-7B8C247113EC@yahoo.com>
Message-ID: <c290947c-85ea-433c-a8c4-9b7d9fd81ffe@googlegroups.com>

This is hilarious.

Although to be fair, "lest" might be useful if, for example, you test types in 
one thread and run code optimized for that type in another?

On Sunday, May 17, 2015 at 5:40:59 PM UTC-4, Andrew Barnert via 
Python-ideas wrote:
>
> On May 17, 2015, at 11:07, Charles Hixson <charle... at earthlink.net> wrote: 
> > 
> > I'm envisioning "unless" as a synonym for "if not(...):"  currently I 
> use 
> > 
> > if .... : 
> >    pass 
> > else: 
> >  ... 
> > 
> > which works. 
> > 
> > N.B.:  This isn't extremely important as there are already two ways to 
> accomplish the same purpose, but it would be useful, seems easy to 
> implement, and is already used by many other languages.  The advantage is 
> that when the condition is long it simplifies understanding. 
>
> But if you just use not instead of else, it simplifies understanding just 
> as much--and without making the language larger (which makes it harder to 
> learn/remember when switching languages, makes the parser bigger, etc.): 
>
>     if not ...: 
>         ... 
>
> It seems like every year someone proposes either "unless" or "until" or 
> the whole suite of Perl variants (inherently-negated keywords, postfix, 
> do...while-type syntax), but nobody ever asks for anything clever. Think of 
> what you could do with a "lest" statement, which will speculatively execute 
> the body and then test the condition before deciding whether to actually 
> have executed the body. Or a "without" that closes a context before the 
> body instead of after. Or a "butfor" that iterates over every extant object 
> that isn't contained in the Iterable. Or a "because" that raises instead of 
> skipping the body if the condition isn't truthy. Or a "before" that 
> remembers the body for later and executes it asynchronously when the 
> condition becomes true. 
>
>
>
> > _______________________________________________ 
> > Python-ideas mailing list 
> > Python... at python.org 
> > https://mail.python.org/mailman/listinfo/python-ideas 
> > Code of Conduct: http://python.org/psf/codeofconduct/ 
> _______________________________________________ 
> Python-ideas mailing list 
> Python... at python.org 
> https://mail.python.org/mailman/listinfo/python-ideas 
> Code of Conduct: http://python.org/psf/codeofconduct/ 
>

From brett at python.org  Wed May 20 17:57:42 2015
From: brett at python.org (Brett Cannon)
Date: Wed, 20 May 2015 15:57:42 +0000
Subject: [Python-ideas] Making it easy to prepare for PEP479
In-Reply-To: <CADiSq7e3JfqLbWxctirJf4g6SSJA3BAtOiLPzaBxW_2U8=Cafg@mail.gmail.com>
References: <CANXboVZv-oyw=FemDtNXJb04mM_7jQTwZkQZ8OLV2BEafgn6aQ@mail.gmail.com>
 <mjcs94$20s$1@ger.gmane.org> <20150518145225.GM5663@ando.pearwood.info>
 <mjd06p$70k$1@ger.gmane.org> <20150518171319.GO5663@ando.pearwood.info>
 <CAPTjJmqxve-h382Vhwp=9sMOj6TAmj9y2aaDzjzJ5H66SmxSuA@mail.gmail.com>
 <20150519031545.GR5663@ando.pearwood.info>
 <CADiSq7e3JfqLbWxctirJf4g6SSJA3BAtOiLPzaBxW_2U8=Cafg@mail.gmail.com>
Message-ID: <CAP1=2W4cYs2tqz9EhTuGi3u6SKTHLucTt-j7NuS5mF58YWiuBw@mail.gmail.com>

On Tue, May 19, 2015 at 7:30 AM Nick Coghlan <ncoghlan at gmail.com> wrote:

> On 19 May 2015 at 13:15, Steven D'Aprano <steve at pearwood.info> wrote:
> > So, I believe that the whole __future__ directive is a red herring, and
> > doesn't actually help Ram do what he wants, which is to write tests
> > which will fail if his generators call raise StopIteration regardless of
> > what version of Python he runs the test under.
>
> The essential impossibility of writing such tests is one of the
> underlying reasons *why* PEP 479 was accepted - you can't sensibly
> test for inadvertently escaping StopIteration values.
>
> However, I interpreted Ram's request slightly differently: if I'm
> understanding the request correctly, he'd like a way to write
> single-source modules such that *on Python 3.5+* they effectively run
> with "from __future__ import generator_stop", while on older Python
> versions, they run unmodified. That way, running the test suite under
> Python 3.5 will show that at least the regression tests aren't relying
> on "escaping StopIteration" in order to pass.
>
> The intended answer to Ram's request is "configure the warnings module
> to turn the otherwise silent deprecation warning into an error". From
> https://www.python.org/dev/peps/pep-0479/#transition-plan:
>
> * Python 3.5: Enable new semantics under __future__ import; silent
> deprecation warning if StopIteration bubbles out of a generator not
> under __future__ import.
>
> However, we missed the second half of that in the initial PEP
> implementation, so it doesn't currently emit the deprecation warning
> at all, which means there's no way to turn it into an error instead:
> http://bugs.python.org/issue24237
>
> Once that issue has been fixed, then "-Wall" will cause any tests
> relying on the deprecated behaviour to fail, *without* needing to
> modify the code under test to use the future import.
>

Another option is to use a custom import loader that sets the __future__
flag passed to compile() based on which version of Python the code is
running under; overriding
https://docs.python.org/3.5/library/importlib.html#importlib.abc.InspectLoader.source_to_code
is all that would be needed to make that happen.
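
A minimal sketch of such a loader (the class name is hypothetical; it reuses `SourceFileLoader` and only overrides `source_to_code` to inject the compiler flag, which exists only on Python 3.5+):

```python
import __future__
from importlib.machinery import SourceFileLoader

class GeneratorStopLoader(SourceFileLoader):
    """Sketch: compile modules as if they contained
    'from __future__ import generator_stop'."""

    def source_to_code(self, data, path="<string>"):
        return compile(data, path, "exec",
                       flags=__future__.generator_stop.compiler_flag,
                       dont_inherit=True)

# Direct use of the override, without wiring it into sys.path_hooks:
loader = GeneratorStopLoader("mymod", "mymod.py")
code = loader.source_to_code(b"x = 1\n", "mymod.py")
ns = {}
exec(code, ns)
print(ns["x"])  # 1
```

Hooking the loader into the import system (via a path hook or meta path finder) is left out here for brevity.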

From demianbrecht at gmail.com  Thu May 21 07:29:15 2015
From: demianbrecht at gmail.com (Demian Brecht)
Date: Wed, 20 May 2015 22:29:15 -0700
Subject: [Python-ideas] Adding jsonschema to the standard library
Message-ID: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>

Disclaimer: I'm not the author of jsonschema (https://github.com/Julian/jsonschema), but as a user I think that users of the standard library (and potentially areas of the standard library itself) could benefit from its addition into the standard library.

I've been using jsonschema for the better part of a couple of years now and have found it not only invaluable, but also flexible across the variety of applications it has. Personally, I generally use it for HTTP response validation when dealing with RESTful APIs and for system configuration input validation. For those not familiar with the package:

RFC draft: https://tools.ietf.org/html/draft-zyp-json-schema-04
Home: http://json-schema.org/
Proposed addition implementation: https://github.com/Julian/jsonschema
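
For readers who haven't used it, the core idea — declaring the expected shape of JSON data and checking instances against it — can be sketched with the stdlib alone (a toy check covering only the `type` and `required` keywords; the real package implements the full draft-04 vocabulary and raises `ValidationError` instead of returning messages):

```python
import json

TYPES = {"object": dict, "array": list, "string": str, "integer": int}

def check(instance, schema):
    """Toy JSON Schema check: only the 'type' and 'required' keywords."""
    errors = []
    expected = schema.get("type")
    if expected and not isinstance(instance, TYPES[expected]):
        errors.append("expected a JSON %s" % expected)
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append("%r is a required property" % key)
    return errors

schema = json.loads('{"type": "object", "required": ["name", "price"]}')
print(check({"name": "widget"}, schema))              # ["'price' is a required property"]
print(check({"name": "widget", "price": 5}, schema))  # []
```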

Coles Notes stats:

Has been publicly available for over a year: v0.1 released Jan 1, 2012, currently at 2.4.0 (released Sept 22, 2014)
Heavily used by the community: Currently sees ~585k downloads per month according to PyPI

I've reached out to the author to express my interest in authoring a PEP to have the module included, and to gauge his interest in assisting with maintenance as needed during the integration period (or following). I'd also be personally interested in supporting it as part of the stdlib as well.

My question is: Is there any reason up front anyone can see that this addition wouldn't fly, or are others interested in the addition as well?

Thanks,
Demian

From gmludo at gmail.com  Thu May 21 07:46:27 2015
From: gmludo at gmail.com (Ludovic Gasc)
Date: Thu, 21 May 2015 07:46:27 +0200
Subject: [Python-ideas] Adding jsonschema to the standard library
In-Reply-To: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
Message-ID: <CAON-fpFtOK3p_fcL-UrtoZN3tOKxzVfoV3nVLpVSeX9-WS-19g@mail.gmail.com>

As an end developer who has been using your library for a short time, I find it a useful tool.

We're migrating an Erlang application to Python more quickly with your
library because the legacy application already uses JSON Schema.

From my point of view, validating I/O data is a common problem for most
developers; however, it also means that a lot of developers have strong
opinions about how to validate data ;-)

At least to me, it's a good idea to include this library in Python. Even
though there are plenty of libraries that do this, with several approaches,
so far I haven't found a simpler approach than JSON schemas.

A bonus is that you can reuse your JSON schemas for migrations
and also in your JavaScript source code.

It isn't a silver bullet that resolves all validation corner cases, but it
is powerful enough to handle the most tedious use cases.

Ludovic Gasc (GMLudo)
http://www.gmludo.eu/
On 21 May 2015 07:29, "Demian Brecht" <demianbrecht at gmail.com> wrote:

> Disclaimer: I'm not the author of jsonschema (
> https://github.com/Julian/jsonschema), but as a user think that users of
> the standard library (and potentially areas of the standard library itself)
> could benefit from its addition into the standard library.
>
> I've been using jsonschema for the better part of a couple years now and
> have found it not only invaluable, but flexible around the variety of
> applications it has. Personally, I generally use it for HTTP response
> validation when dealing with RESTful APIs and system configuration input
> validation. For those not familiar with the package:
>
> RFC draft: https://tools.ietf.org/html/draft-zyp-json-schema-04
> Home: http://json-schema.org/
> Proposed addition implementation: https://github.com/Julian/jsonschema
>
> Coles notes stats:
>
> Has been publicly available for over a year: v0.1 released Jan 1, 2012,
> currently at 2.4.0 (released Sept 22, 2014)
> Heavily used by the community: Currently sees ~585k downloads per month
> according to PyPI
>
> I've reached out to the author to express my interest in authoring a PEP
> to have the module included to gauge his interest in assisting with
> maintenance as needed during the integration period (or following). I'd
> also be personally interested in supporting it as part of the stdlib as
> well.
>
> My question is: Is there any reason up front anyone can see that this
> addition wouldn't fly, or are others interested in the addition as well?
>
> Thanks,
> Demian
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>

From demianbrecht at gmail.com  Thu May 21 07:53:33 2015
From: demianbrecht at gmail.com (Demian Brecht)
Date: Wed, 20 May 2015 22:53:33 -0700
Subject: [Python-ideas] Adding jsonschema to the standard library
In-Reply-To: <CAON-fpFtOK3p_fcL-UrtoZN3tOKxzVfoV3nVLpVSeX9-WS-19g@mail.gmail.com>
References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
 <CAON-fpFtOK3p_fcL-UrtoZN3tOKxzVfoV3nVLpVSeX9-WS-19g@mail.gmail.com>
Message-ID: <B298E65B-C874-4461-8749-7BE7A52CFD55@gmail.com>


> On May 20, 2015, at 10:46 PM, Ludovic Gasc <gmludo at gmail.com> wrote:
> As a end-dev that uses your library for a small time, it's an useful tool.

> Disclaimer: I'm not the author of jsonschema

Emphasis on /not/. I'm just another user of the library like you :) But cheers for the feedback!

From yselivanov.ml at gmail.com  Thu May 21 07:59:43 2015
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Thu, 21 May 2015 01:59:43 -0400
Subject: [Python-ideas] Adding jsonschema to the standard library
In-Reply-To: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
Message-ID: <555D744F.4000307@gmail.com>



On 2015-05-21 1:29 AM, Demian Brecht wrote:
[..]
> My question is: Is there any reason up front anyone can see that this addition wouldn't fly, or are others interested in the addition as well?
>

I think we should wait at least until json-schema.org releases a final 
version of the spec.

Thanks,
Yury

From demianbrecht at gmail.com  Thu May 21 08:18:08 2015
From: demianbrecht at gmail.com (Demian Brecht)
Date: Wed, 20 May 2015 23:18:08 -0700
Subject: [Python-ideas] Adding jsonschema to the standard library
In-Reply-To: <555D744F.4000307@gmail.com>
References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
 <555D744F.4000307@gmail.com>
Message-ID: <9E094141-A7EF-44B1-B713-301F9D9524E9@gmail.com>


> On May 20, 2015, at 10:59 PM, Yury Selivanov <yselivanov.ml at gmail.com> wrote:
> I think we should wait at least until json-schema.org releases a final version of the spec.

I'd thought about that as well, but here are the arguments I could think of that led me to proposing this in the first place:

The latest draft of the RFC expired Jan 31, 2013. I'd have to reach out to the author(s) to confirm, but I'd venture to say there likely isn't much more effort being put into it.

The library is in heavy use and is useful in practice in its current state. I think that in situations like this the practicality of a module should come first and a finalized spec second.

There are numerous places in the standard library that deviate from specs in the name of practical use. I'm not advocating that such deviations be anything other than the exception rather than the rule; I'm just saying that there are multiple things to consider before simply squashing an inclusion because of RFC draft state.

From stephen at xemacs.org  Thu May 21 09:39:56 2015
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 21 May 2015 16:39:56 +0900
Subject: [Python-ideas]  Adding jsonschema to the standard library
In-Reply-To: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
Message-ID: <87y4kixttf.fsf@uwakimon.sk.tsukuba.ac.jp>

Demian Brecht writes:

 > RFC draft: https://tools.ietf.org/html/draft-zyp-json-schema-04

I note that this draft, apparently written in Nov. 2011, expired
almost two years ago with no update.  OTOH, 4 other RFCs related to
JSON (6901, 6902, 7386, 7396) have been published recently.  (This
kind of thing is common with RFCs; people get fed up with the process
and just go off and do something that's "good enough" for them.  But
it does show they've given up on the process of getting a global
standard at least for now.)  Then in Oct 2012, Andy Newton wrote[1]:

    Schemas. There is no one standardized schema language for JSON,
    although several are presently in the works (including one by this
    author). The need for a JSON schema language is controversial: JSON
    is regarded by most as simple enough on its own. Indeed, there is
    no shortage of JSON-based interchange specification making due
    without schema formalism.

and his independent proposal[2] (confusingly called "content rules")
is current, expiring on June 5.  (Note that there is no proposal
currently being discussed by the IETF APPSAWG.  Newton's proposal is
independent, pending formation of a new charter for a JSON schema WG.)

 > My question is: Is there any reason up front anyone can see that
 > this addition wouldn't fly?

I would say that the evident controversy over which schema language
will be standardized is a barrier, unless you can say that Newton's
proposals have no support from the community or something like that.
It's not a terribly high barrier in one sense (Python doesn't demand
that modules be perfect in all ways), but you do have to address the
perception of controversy, I think (at least to deny there really is
any).

A more substantive issue is that Appendix A of Newton's I-D certainly
makes json-schema look "over the top" in verbosity of notation -- XML
would be proud.<wink />  If that assessment is correct, the module
could be considered un-Pythonic (see Zen #4, and although JSON content
rules are not themselves JSON while JSON schema is valid JSON, see Zen
#9).

N.B. I'm not against this proposal, just answering your question.

I did see that somebody named James Newton-King (aka newtonsoft.com)
has an implementation of json-schema for .NET, and json-schema.org
seems to be in active development, which are arguments in favor of
your proposal.

Footnotes: 
[1]  http://www.internetsociety.org/articles/using-json-ietf-protocols

[2]  https://tools.ietf.org/html/draft-newton-json-content-rules-04



From stephen at xemacs.org  Thu May 21 09:52:07 2015
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 21 May 2015 16:52:07 +0900
Subject: [Python-ideas] Adding jsonschema to the standard library
In-Reply-To: <555D744F.4000307@gmail.com>
References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
 <555D744F.4000307@gmail.com>
Message-ID: <87wq02xt94.fsf@uwakimon.sk.tsukuba.ac.jp>

Yury Selivanov writes:

 > I think we should wait at least until json-schema.org releases a
 > final version of the spec.

If you mean an RFC, there are all kinds of reasons, some important,
some just tedious, why a perfectly good spec never gets released as an
RFC.  I agree that the fact that none of the IETF, W3C, or ECMA has
released a formal spec yet needs discussion.


From p.f.moore at gmail.com  Thu May 21 09:57:27 2015
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 21 May 2015 08:57:27 +0100
Subject: [Python-ideas] Adding jsonschema to the standard library
In-Reply-To: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
Message-ID: <CACac1F9yQEdH4MpOuLS-m9oe8ojcBYh_0JH8KbvtQQimJkwhKw@mail.gmail.com>

On 21 May 2015 at 06:29, Demian Brecht <demianbrecht at gmail.com> wrote:
> Has been publicly available for over a year: v0.1 released Jan 1, 2012, currently at 2.4.0 (released Sept 22, 2014)
> Heavily used by the community: Currently sees ~585k downloads per month according to PyPI

One key question should be addressed as part of any proposal for
inclusion into the stdlib: would switching to having feature releases
only when a new major Python version is released (with bugfixes at
minor releases) be acceptable to the project? From the figures you
quote, it sounds like there has been some rapid development, although
things seem to have slowed down now, so maybe things are stable
enough.

Paul

From stephen at xemacs.org  Thu May 21 10:04:56 2015
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 21 May 2015 17:04:56 +0900
Subject: [Python-ideas] Adding jsonschema to the standard library
In-Reply-To: <9E094141-A7EF-44B1-B713-301F9D9524E9@gmail.com>
References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
 <555D744F.4000307@gmail.com>
 <9E094141-A7EF-44B1-B713-301F9D9524E9@gmail.com>
Message-ID: <87vbfmxsnr.fsf@uwakimon.sk.tsukuba.ac.jp>

Demian Brecht writes:

 > The latest draft of the RFC expired Jan 31, 2013.

Actually, expiration is more than half a year fresher: August 4,
2013.  But AFAICT none of the schema proposals were RFC track at all,
let alone normative.  They're just in support of various other
JSON-related IETF work.

Steve


From ncoghlan at gmail.com  Thu May 21 11:15:20 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 21 May 2015 19:15:20 +1000
Subject: [Python-ideas] Adding jsonschema to the standard library
In-Reply-To: <CACac1F9yQEdH4MpOuLS-m9oe8ojcBYh_0JH8KbvtQQimJkwhKw@mail.gmail.com>
References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
 <CACac1F9yQEdH4MpOuLS-m9oe8ojcBYh_0JH8KbvtQQimJkwhKw@mail.gmail.com>
Message-ID: <CADiSq7cmRPQdpC8wv3xyt20dV=Pf9uPfB1k-Q3a6kQH=khvnsQ@mail.gmail.com>

On 21 May 2015 at 17:57, Paul Moore <p.f.moore at gmail.com> wrote:
> On 21 May 2015 at 06:29, Demian Brecht <demianbrecht at gmail.com> wrote:
>> Has been publicly available for over a year: v0.1 released Jan 1, 2012, currently at 2.4.0 (released Sept 22, 2014)
>> Heavily used by the community: Currently sees ~585k downloads per month according to PyPI
>
> One key question that should be addressed as part of any proposal for
> inclusion into the stdlib. Would switching to having feature releases
> only when a new major Python version is released (with bugfixes at
> minor releases) be acceptable to the project? From the figures you
> quote, it sounds like there has been some rapid development, although
> things seem to have slowed down now, so maybe things are stable
> enough.

The other question to be answered these days is the value bundling
offers over "pip install jsonschema" (or a platform specific
equivalent). While it's still possible to meet that condition, it's
harder now that we offer pip as a standard feature, especially since
getting added to the standard library almost universally makes life
more difficult for module maintainers if they're not already core
developers.

I'm not necessarily opposed to including JSON schema validation in
general or jsonschema in particular (I've used it myself in the past
and think it's a decent option if you want a bit more rigor in your
data validation), but I'm also not sure how large an overlap there
will be between "could benefit from using jsonschema", "has a
spectacularly onerous package review process", and "can't already get
jsonschema from an approved source".

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From julian at grayvines.com  Thu May 21 23:10:42 2015
From: julian at grayvines.com (Julian Berman)
Date: Thu, 21 May 2015 14:10:42 -0700
Subject: [Python-ideas] Adding jsonschema to the standard library
Message-ID: <CABJQSkm2HY5foBH-eYB2BnvCvZrhpdt3wTnEvHGMOyhi+t6RXw@mail.gmail.com>

Hey, author here, thanks a lot Demian for even suggesting such a thing :).

I'm really glad that people have found jsonschema useful.

I actually tend these days to think similarly to what Nick mentioned, that
the standard library really has decreased in importance as pip has shaped
up and now been bundled -- so overall my personal opinion is that I
wouldn't personally be pushing to get jsonschema in -- but! If you felt
strongly, just some brief answers -- I think jsonschema would be able to
cope with more restricted release cycles.

And there are a few areas that I don't like about jsonschema (some APIs)
which eventually I'd like to fix (RefResolver in particular), but for the
most part I think it has stabilized more or less.

I can provide some more details if there's any interest.

Thanks again for even proposing such a thing :)

-Julian


On Thu, May 21, 2015 at 2:15 AM, <python-ideas-request at python.org> wrote:
>
> ------------------------------
>
> Message: 7
> Date: Thu, 21 May 2015 19:15:20 +1000
> From: Nick Coghlan <ncoghlan at gmail.com>
> To: Paul Moore <p.f.moore at gmail.com>
> Cc: Demian Brecht <demianbrecht at gmail.com>, Python-Ideas
>         <python-ideas at python.org>
> Subject: Re: [Python-ideas] Adding jsonschema to the standard library
> Message-ID:
>         <CADiSq7cmRPQdpC8wv3xyt20dV=Pf9uPfB1k-Q3a6kQH=
> khvnsQ at mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> On 21 May 2015 at 17:57, Paul Moore <p.f.moore at gmail.com> wrote:
> > On 21 May 2015 at 06:29, Demian Brecht <demianbrecht at gmail.com> wrote:
> >> Has been publicly available for over a year: v0.1 released Jan 1, 2012,
> currently at 2.4.0 (released Sept 22, 2014)
> >> Heavily used by the community: Currently sees ~585k downloads per month
> according to PyPI
> >
> > One key question that should be addressed as part of any proposal for
> > inclusion into the stdlib. Would switching to having feature releases
> > only when a new major Python version is released (with bugfixes at
> > minor releases) be acceptable to the project? From the figures you
> > quote, it sounds like there has been some rapid development, although
> > things seem to have slowed down now, so maybe things are stable
> > enough.
>
> The other question to be answered these days is the value bundling
> offers over "pip install jsonschema" (or a platform specific
> equivalent). While it's still possible to meet that condition, it's
> harder now that we offer pip as a standard feature, especially since
> getting added to the standard library almost universally makes life
> more difficult for module maintainers if they're not already core
> developers.
>
> I'm not necessarily opposed to including JSON schema validation in
> general or jsonschema in particular (I've used it myself in the past
> and think it's a decent option if you want a bit more rigor in your
> data validation), but I'm also not sure how large an overlap there
> will be between "could benefit from using jsonschema", "has a
> spectacularly onerous package review process", and "can't already get
> jsonschema from an approved source".
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
>
>

From tjreedy at udel.edu  Fri May 22 00:37:37 2015
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 21 May 2015 18:37:37 -0400
Subject: [Python-ideas] Adding jsonschema to the standard library
In-Reply-To: <CABJQSkm2HY5foBH-eYB2BnvCvZrhpdt3wTnEvHGMOyhi+t6RXw@mail.gmail.com>
References: <CABJQSkm2HY5foBH-eYB2BnvCvZrhpdt3wTnEvHGMOyhi+t6RXw@mail.gmail.com>
Message-ID: <mjlmnq$f14$1@ger.gmane.org>

On 5/21/2015 5:10 PM, Julian Berman wrote:
> Hey, author here, thanks a lot Demian for even suggesting such a thing :).

Welcome to python-ideas.

> I'm really glad that people have found jsonschema useful.

In response to Demian, the module initially strikes me, a non-json user, 
as too specialized for the stdlib, even if extremely useful to people 
within the specialty. The high pypi download rate could be interpreted 
as meaning that the module does not need to be in the stdlib to be 
discovered and used.

> I actually tend these days to think similarly to what Nick mentioned,
> that the standard library really has decreased in importance as pip has
> shaped up and now been bundled -- so overall my personal opinion is that
> I wouldn't personally be pushing to get jsonschema in -- but! If you
> felt strongly, just some brief answers -- I think jsonschema would be
> able to cope with more restricted release cycles.

As a core developer, I can see a downside for you, so I would advise you 
to decline the invitation unless you see a stronger upside than is 
immediately obvious.

> And there are a few areas that I don't like about jsonschema (some APIs)
> which eventually I'd like to fix (RefResolver in particular), but for
> the most part I think it has stabilized more or less.


-- 
Terry Jan Reedy


From benhoyt at gmail.com  Fri May 22 03:18:24 2015
From: benhoyt at gmail.com (Ben Hoyt)
Date: Thu, 21 May 2015 21:18:24 -0400
Subject: [Python-ideas] Enabling access to the AST for Python code
Message-ID: <CAL9jXCEMg7NCpk3925CmVUtY6n=c9vL_7eqO_di8-GXkYy7ARQ@mail.gmail.com>

Hi Python Ideas folks,

(I previously posted a similar message on Python-Dev, but it's a
better fit for this list. See that thread here:
https://mail.python.org/pipermail/python-dev/2015-May/140063.html)

Enabling access to the AST for compiled code would make some cool
things possible (C# LINQ-style ORMs, for example), and not knowing too
much about this part of Python internals, I'm wondering how possible
and practical this would be.

Context: PonyORM (http://ponyorm.com/) allows you to write regular
Python generator expressions like this:

    select(c for c in Customer if sum(c.orders.price) > 1000)

which compile into and run SQL like this:

    SELECT "c"."id"
    FROM "Customer" "c"
    LEFT JOIN "Order" "order-1" ON "c"."id" = "order-1"."customer"
    GROUP BY "c"."id"
    HAVING coalesce(SUM("order-1"."total_price"), 0) > 1000

I think the Pythonic syntax here is beautiful. But the tricks PonyORM
has to go through to get it are ... not quite so beautiful. Because the AST is
not available, PonyORM decompiles Python bytecode into an AST first,
and then converts that to SQL. (More details on all that from author's
EuroPython talk at http://pyvideo.org/video/2968)

PonyORM needs the AST just for generator expressions and
lambda functions, but obviously if this kind of AST access feature
were in Python it'd probably be more general.

I believe C#'s LINQ provides something similar, where if you're
developing a LINQ converter library (say LINQ to SQL), you essentially
get the AST of the code ("expression tree") and the library can do
what it wants with that.

(I know that there's the "ast" module and ast.parse(), which can give
you an AST given a *source string*, but that's not very convenient
here.)
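
For reference, here is what that source-string route produces for the query above; the generator expression is right there as a `GeneratorExp` node, but at runtime `select()` only receives the already-compiled generator, not its text:

```python
import ast

src = "select(c for c in Customer if sum(c.orders.price) > 1000)"
tree = ast.parse(src, mode="eval")  # works on the source *string* only
genexp = tree.body.args[0]          # the argument of the select() call
print(type(genexp).__name__)        # GeneratorExp
```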

What would it take to enable this kind of AST access in Python? Is it
possible? Is it a good idea?

-Ben

From njs at pobox.com  Fri May 22 03:40:25 2015
From: njs at pobox.com (Nathaniel Smith)
Date: Thu, 21 May 2015 18:40:25 -0700
Subject: [Python-ideas] Enabling access to the AST for Python code
In-Reply-To: <CAL9jXCEMg7NCpk3925CmVUtY6n=c9vL_7eqO_di8-GXkYy7ARQ@mail.gmail.com>
References: <CAL9jXCEMg7NCpk3925CmVUtY6n=c9vL_7eqO_di8-GXkYy7ARQ@mail.gmail.com>
Message-ID: <CAPJVwBnDSqvvURnQ7Mt+Oz1bAmQms7tt2qG-R5XufXXjFjSVqA@mail.gmail.com>

On Thu, May 21, 2015 at 6:18 PM, Ben Hoyt <benhoyt at gmail.com> wrote:
> Hi Python Ideas folks,
>
> (I previously posted a similar message on Python-Dev, but it's a
> better fit for this list. See that thread here:
> https://mail.python.org/pipermail/python-dev/2015-May/140063.html)
>
> Enabling access to the AST for compiled code would make some cool
> things possible (C# LINQ-style ORMs, for example), and not knowing too
> much about this part of Python internals, I'm wondering how possible
> and practical this would be.

What concretely are you imagining? I can imagine lots of possibilities
with pretty different properties... e.g., one could have an '.ast'
attribute attached to every code object, which always tracks the
source that the code was compiled from. Or one could add a new
(quasi)quoting syntax, like 'select(! c for c in Customer if
sum(c.orders.price) > 1000)' where ! is a low-priority operator that
simply returns the AST of whatever is written to the right of it.
Or... lots of things, probably.

-n

-- 
Nathaniel J. Smith -- http://vorpus.org

From abarnert at yahoo.com  Fri May 22 03:51:34 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 21 May 2015 18:51:34 -0700
Subject: [Python-ideas] Enabling access to the AST for Python code
In-Reply-To: <CAL9jXCEMg7NCpk3925CmVUtY6n=c9vL_7eqO_di8-GXkYy7ARQ@mail.gmail.com>
References: <CAL9jXCEMg7NCpk3925CmVUtY6n=c9vL_7eqO_di8-GXkYy7ARQ@mail.gmail.com>
Message-ID: <8AEF5EC5-1DC1-435D-BC32-EF776FA2CC8A@yahoo.com>

On May 21, 2015, at 18:18, Ben Hoyt <benhoyt at gmail.com> wrote:
> 
> (I know that there's the "ast" module and ast.parse(), which can give
> you an AST given a *source string*, but that's not very convenient
> here.)

Why not? Python modules are distributed as source. You can pretty easily write an import hook to intercept module loading at the AST level and transform it however you want. Or just use MacroPy, which wraps up all the hard stuff (especially 2.x compatibility) and provides a huge framework of useful tools. What do you want to do that can't be done that way?

For many uses, you don't even have to go that far--code objects remember their source file and line number, which you can usually use to retrieve the text and regenerate the AST.
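
A quick illustration of the attributes involved (a sketch; it re-derives the AST from the source text rather than from the code object itself):

```python
import ast

source = "lambda c: c.price > 1000"
code = compile(source, "query.py", "eval")

# The code object remembers where it came from...
print(code.co_filename, code.co_firstlineno)   # query.py 1

# ...so when the original text is retrievable, the AST can be rebuilt:
tree = ast.parse(source, mode="eval")
print(type(tree.body).__name__)                # Lambda
```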

From benhoyt at gmail.com  Fri May 22 03:57:50 2015
From: benhoyt at gmail.com (Ben Hoyt)
Date: Thu, 21 May 2015 21:57:50 -0400
Subject: [Python-ideas] Enabling access to the AST for Python code
In-Reply-To: <CAPJVwBnDSqvvURnQ7Mt+Oz1bAmQms7tt2qG-R5XufXXjFjSVqA@mail.gmail.com>
References: <CAL9jXCEMg7NCpk3925CmVUtY6n=c9vL_7eqO_di8-GXkYy7ARQ@mail.gmail.com>
 <CAPJVwBnDSqvvURnQ7Mt+Oz1bAmQms7tt2qG-R5XufXXjFjSVqA@mail.gmail.com>
Message-ID: <CAL9jXCGwhpfocgVOk4UM1QUdKNgDSf3PpH3tB_GVTuiiGEM0-g@mail.gmail.com>

Not knowing too much about interpreter internals, I guess I was
fishing somewhat for the range of possibilities. :-)

But I was definitely thinking more along the lines of a "co_ast"
attribute on code objects. The new syntax approach might be fun, but
I'd think it's a lot more challenging and problematic to add new
syntax.

-Ben

On Thu, May 21, 2015 at 9:40 PM, Nathaniel Smith <njs at pobox.com> wrote:
> On Thu, May 21, 2015 at 6:18 PM, Ben Hoyt <benhoyt at gmail.com> wrote:
>> Hi Python Ideas folks,
>>
>> (I previously posted a similar message on Python-Dev, but it's a
>> better fit for this list. See that thread here:
>> https://mail.python.org/pipermail/python-dev/2015-May/140063.html)
>>
>> Enabling access to the AST for compiled code would make some cool
>> things possible (C# LINQ-style ORMs, for example), and not knowing too
>> much about this part of Python internals, I'm wondering how possible
>> and practical this would be.
>
> What concretely are you imagining? I can imagine lots of possibilities
> with pretty different properties... e.g., one could have an '.ast'
> attribute attached to every code object, which always tracks the
> source that the code was compiled from. Or one could add a new
> (quasi)quoting syntax, like 'select(! c for c in Customer if
> sum(c.orders.price) > 1000)' where ! is a low-priority operator that
> simply returns the AST of whatever is written to the right of it.
> Or... lots of things, probably.
>
> -n
>
> --
> Nathaniel J. Smith -- http://vorpus.org

From greg.ewing at canterbury.ac.nz  Fri May 22 04:08:45 2015
From: greg.ewing at canterbury.ac.nz (Greg)
Date: Fri, 22 May 2015 14:08:45 +1200
Subject: [Python-ideas] Enabling access to the AST for Python code
In-Reply-To: <8AEF5EC5-1DC1-435D-BC32-EF776FA2CC8A@yahoo.com>
References: <CAL9jXCEMg7NCpk3925CmVUtY6n=c9vL_7eqO_di8-GXkYy7ARQ@mail.gmail.com>
 <8AEF5EC5-1DC1-435D-BC32-EF776FA2CC8A@yahoo.com>
Message-ID: <555E8FAD.1060100@canterbury.ac.nz>

On 22/05/2015 1:51 p.m., Andrew Barnert via Python-ideas wrote:
>  Or just use MacroPy, which
> wraps up all the hard stuff (especially 2.x compatibility) and
> provides a huge framework of useful tools. What do you want to do
> that can't be done that way?

You might not want to drag in a huge framework just to
do one thing.

-- 
Greg

From benhoyt at gmail.com  Fri May 22 04:10:15 2015
From: benhoyt at gmail.com (Ben Hoyt)
Date: Thu, 21 May 2015 22:10:15 -0400
Subject: [Python-ideas] Enabling access to the AST for Python code
In-Reply-To: <8AEF5EC5-1DC1-435D-BC32-EF776FA2CC8A@yahoo.com>
References: <CAL9jXCEMg7NCpk3925CmVUtY6n=c9vL_7eqO_di8-GXkYy7ARQ@mail.gmail.com>
 <8AEF5EC5-1DC1-435D-BC32-EF776FA2CC8A@yahoo.com>
Message-ID: <CAL9jXCGkouc_0WRwEMRpwt-z1mKEk7hwnhiivqyZ9PbnuUsADQ@mail.gmail.com>

Huh, interesting idea. I've never used import hooks. Looks like the
relevant macropy source code is here:

https://github.com/lihaoyi/macropy/blob/master/macropy/core/import_hooks.py

So basically you would do the following:

1) intercept the import
2) find the source code file yourself and read it
3) call ast.parse() on the source string
4) do anything you want to the AST, for example turn the "select(c for
c in Customer if sum(c.orders.price) > 1000)" into whatever SQL or
other function calls
5) pass the massaged AST to compile(), execute it and return the module
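
Steps 3-5 in isolation (skipping the import-hook plumbing) might look like this: a toy transform that doubles integer constants stands in for the genexp-to-SQL rewrite, using the modern `ast.Constant` node (Python 3.8+):

```python
import ast

source = "answer = 20 + 1\n"

class DoubleInts(ast.NodeTransformer):
    """Stand-in for step 4: rewrite the AST before compiling it."""
    def visit_Constant(self, node):
        if isinstance(node.value, int):
            return ast.copy_location(ast.Constant(node.value * 2), node)
        return node

tree = DoubleInts().visit(ast.parse(source))          # steps 3 and 4
ast.fix_missing_locations(tree)
namespace = {}
exec(compile(tree, "<mymodule>", "exec"), namespace)  # step 5
print(namespace["answer"])  # 42
```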

Hmmm, yeah, I think you're basically suggesting macro-like processing
of the AST. Pretty cool, but not quite what I was thinking of ... I
was thinking select() would get an AST object at runtime and do stuff
with it.

-Ben

On Thu, May 21, 2015 at 9:51 PM, Andrew Barnert <abarnert at yahoo.com> wrote:
> On May 21, 2015, at 18:18, Ben Hoyt <benhoyt at gmail.com> wrote:
>>
>> (I know that there's the "ast" module and ast.parse(), which can give
>> you an AST given a *source string*, but that's not very convenient
>> here.)
>
> Why not? Python modules are distributed as source. You can pretty easily write an import hook to intercept module loading at the AST level and transform it however you want. Or just use MacroPy, which wraps up all the hard stuff (especially 2.x compatibility) and provides a huge framework of useful tools. What do you want to do that can't be done that way?
>
> For many uses, you don't even have to go that far--code objects remember their source file and line number, which you can usually use to retrieve the text and regenerate the AST.

From greg.ewing at canterbury.ac.nz  Fri May 22 04:13:12 2015
From: greg.ewing at canterbury.ac.nz (Greg)
Date: Fri, 22 May 2015 14:13:12 +1200
Subject: [Python-ideas] Enabling access to the AST for Python code
In-Reply-To: <CAL9jXCGwhpfocgVOk4UM1QUdKNgDSf3PpH3tB_GVTuiiGEM0-g@mail.gmail.com>
References: <CAL9jXCEMg7NCpk3925CmVUtY6n=c9vL_7eqO_di8-GXkYy7ARQ@mail.gmail.com>
 <CAPJVwBnDSqvvURnQ7Mt+Oz1bAmQms7tt2qG-R5XufXXjFjSVqA@mail.gmail.com>
 <CAL9jXCGwhpfocgVOk4UM1QUdKNgDSf3PpH3tB_GVTuiiGEM0-g@mail.gmail.com>
Message-ID: <555E90B8.7070600@canterbury.ac.nz>

On 22/05/2015 1:57 p.m., Ben Hoyt wrote:
> But I was definitely thinking more along the lines of a "co_ast"
> attribute on code objects. The new syntax approach might be fun, but
> I'd think it's a lot more challenging and problematic to add new
> syntax.

Advantages of new syntax:

* More flexible: Any expression can be made into an AST,
   not just lambdas or genexps.

* More efficient: No need to carry an AST around with
   every code object, the vast majority of which will
   never be used.

Disadvantages of new syntax:

* All the disadvantages of new syntax.

-- 
Greg


From yselivanov.ml at gmail.com  Fri May 22 04:13:37 2015
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Thu, 21 May 2015 22:13:37 -0400
Subject: [Python-ideas] Enabling access to the AST for Python code
In-Reply-To: <CAL9jXCGkouc_0WRwEMRpwt-z1mKEk7hwnhiivqyZ9PbnuUsADQ@mail.gmail.com>
References: <CAL9jXCEMg7NCpk3925CmVUtY6n=c9vL_7eqO_di8-GXkYy7ARQ@mail.gmail.com>
 <8AEF5EC5-1DC1-435D-BC32-EF776FA2CC8A@yahoo.com>
 <CAL9jXCGkouc_0WRwEMRpwt-z1mKEk7hwnhiivqyZ9PbnuUsADQ@mail.gmail.com>
Message-ID: <555E90D1.7060404@gmail.com>

Hi Ben,

On 2015-05-21 10:10 PM, Ben Hoyt wrote:
> Hmmm, yeah, I think you're basically suggesting macro-like processing
> of the AST. Pretty cool, but not quite what I was thinking of ... I
> was thinking select() would get an AST object at runtime and do stuff
> with it.


Unfortunately, it's not that easy.  Storing the AST would require
a lot of extra memory at runtime.  You have to somehow mark
the places where you need it syntactically.

I like how it's done in Rust: select!( ... )

Yury

From benhoyt at gmail.com  Fri May 22 04:15:23 2015
From: benhoyt at gmail.com (Ben Hoyt)
Date: Thu, 21 May 2015 22:15:23 -0400
Subject: [Python-ideas] Enabling access to the AST for Python code
In-Reply-To: <CAL9jXCGkouc_0WRwEMRpwt-z1mKEk7hwnhiivqyZ9PbnuUsADQ@mail.gmail.com>
References: <CAL9jXCEMg7NCpk3925CmVUtY6n=c9vL_7eqO_di8-GXkYy7ARQ@mail.gmail.com>
 <8AEF5EC5-1DC1-435D-BC32-EF776FA2CC8A@yahoo.com>
 <CAL9jXCGkouc_0WRwEMRpwt-z1mKEk7hwnhiivqyZ9PbnuUsADQ@mail.gmail.com>
Message-ID: <CAL9jXCFWhXBU3FP9dWLNf-dqWwn0KeJUXg9rnPgugfqgmvny0Q@mail.gmail.com>

Oh wait, macropy already has this exact thing. They call it PINQ
(kinda Python LINQ), and they're macro-compiling it to SQLAlchemy
calls.

https://github.com/lihaoyi/macropy#pinq-to-sqlalchemy

Wow.

-Ben

On Thu, May 21, 2015 at 10:10 PM, Ben Hoyt <benhoyt at gmail.com> wrote:
> Huh, interesting idea. I've never used import hooks. Looks like the
> relevant macropy source code is here:
>
> https://github.com/lihaoyi/macropy/blob/master/macropy/core/import_hooks.py
>
> So basically you would do the following:
>
> 1) intercept the import
> 2) find the source code file yourself and read it
> 3) call ast.parse() on the source string
> 4) do anything you want to the AST, for example turn the "select(c for
> c in Customer if sum(c.orders.price) > 1000" into whatever SQL or
> other function calls
> 5) pass the massaged AST to compile(), execute it and return the module
>
> Hmmm, yeah, I think you're basically suggesting macro-like processing
> of the AST. Pretty cool, but not quite what I was thinking of ... I
> was thinking select() would get an AST object at runtime and do stuff
> with it.
>
> -Ben
>
> On Thu, May 21, 2015 at 9:51 PM, Andrew Barnert <abarnert at yahoo.com> wrote:
>> On May 21, 2015, at 18:18, Ben Hoyt <benhoyt at gmail.com> wrote:
>>>
>>> (I know that there's the "ast" module and ast.parse(), which can give
>>> you an AST given a *source string*, but that's not very convenient
>>> here.)
>>
>> Why not? Python modules are distributed as source. You can pretty easily write an import hook to intercept module loading at the AST level and transform it however you want. Or just use MacroPy, which wraps up all the hard stuff (especially 2.x compatibility) and provides a huge framework of useful tools. What do you want to do that can't be done that way?
>>
>> For many uses, you don't even have to go that far--code objects remember their source file and line number, which you can usually use to retrieve the text and regenerate the AST.

From ethan at stoneleaf.us  Fri May 22 04:22:46 2015
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 21 May 2015 19:22:46 -0700
Subject: [Python-ideas] Enabling access to the AST for Python code
In-Reply-To: <CAL9jXCEMg7NCpk3925CmVUtY6n=c9vL_7eqO_di8-GXkYy7ARQ@mail.gmail.com>
References: <CAL9jXCEMg7NCpk3925CmVUtY6n=c9vL_7eqO_di8-GXkYy7ARQ@mail.gmail.com>
Message-ID: <555E92F6.80803@stoneleaf.us>

redirecting py-dev thread here

On 05/21/2015 07:06 PM, Greg wrote:
> On 22/05/2015 1:33 p.m., Ethan Furman wrote:

>> Going back to the OP:
>>
>>>     select(c for c in Customer if sum(c.orders.price) > 1000)
>>>
>>> which compile into and run SQL like this:
>>>
>>>     SELECT "c"."id"
>>>     FROM "Customer" "c"
>>>     LEFT JOIN "Order" "order-1" ON "c"."id" = "order-1"."customer"
>>>     GROUP BY "c"."id"
>>>     HAVING coalesce(SUM("order-1"."total_price"), 0) > 1000
>>
>> That last code is /not/ Python.  ;)
>
> More importantly, it's not Python *semantics*. You can't view
> it as simply a translation of the Python expression into a
> different language.

Ah, I think I see -- that 'select' isn't really doing anything, is it?  The 'if' clause is acting as the 'select' in the gen-exp.

But then `sum(c.orders.price)` isn't really Python semantics either, is it... although it could be if it was souped up -- `c.orders` would have to return a customer-based object that was smart enough 
to return a list of whatever attribute was asked for.  That'd be cool.

--
~Ethan~

From abarnert at yahoo.com  Fri May 22 04:22:25 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 21 May 2015 19:22:25 -0700
Subject: [Python-ideas] Enabling access to the AST for Python code
In-Reply-To: <555E8FAD.1060100@canterbury.ac.nz>
References: <CAL9jXCEMg7NCpk3925CmVUtY6n=c9vL_7eqO_di8-GXkYy7ARQ@mail.gmail.com>
 <8AEF5EC5-1DC1-435D-BC32-EF776FA2CC8A@yahoo.com>
 <555E8FAD.1060100@canterbury.ac.nz>
Message-ID: <042AA2E2-6FC8-480A-8C2E-A42AE941C5BA@yahoo.com>


> On May 21, 2015, at 19:08, Greg <greg.ewing at canterbury.ac.nz> wrote:
> 
>> On 22/05/2015 1:51 p.m., Andrew Barnert via Python-ideas wrote:
>> Or just use MacroPy, which
>> wraps up all the hard stuff (especially 2.x compatibility) and
>> provides a huge framework of useful tools. What do you want to do
>> that can't be done that way?
> 
> You might not want to drag in a huge framework just to
> do one thing.

But "all kinds of LINQ-style things, like ORMs" isn't just one thing. If you're going to build a huge framework, why not build it on top of another framework that does the hard part of the work for you?


From abarnert at yahoo.com  Fri May 22 04:37:29 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 21 May 2015 19:37:29 -0700
Subject: [Python-ideas] Enabling access to the AST for Python code
In-Reply-To: <CAL9jXCFWhXBU3FP9dWLNf-dqWwn0KeJUXg9rnPgugfqgmvny0Q@mail.gmail.com>
References: <CAL9jXCEMg7NCpk3925CmVUtY6n=c9vL_7eqO_di8-GXkYy7ARQ@mail.gmail.com>
 <8AEF5EC5-1DC1-435D-BC32-EF776FA2CC8A@yahoo.com>
 <CAL9jXCGkouc_0WRwEMRpwt-z1mKEk7hwnhiivqyZ9PbnuUsADQ@mail.gmail.com>
 <CAL9jXCFWhXBU3FP9dWLNf-dqWwn0KeJUXg9rnPgugfqgmvny0Q@mail.gmail.com>
Message-ID: <94EFFB86-0672-4F80-944B-0B73C5107ED3@yahoo.com>

On May 21, 2015, at 19:15, Ben Hoyt <benhoyt at gmail.com> wrote:
> 
> Oh wait, macropy already has this exact thing. They call it PINQ
> (kinda Python LINQ), and they're macro-compiling it to SQLAlchemy
> calls.

I didn't even realize he'd included this when suggesting MacroPy. :)

Anyway, most of his macros are pretty easy to read as sample code, so even if what he's done isn't exactly what you wanted, it should be a good foundation.

> https://github.com/lihaoyi/macropy#pinq-to-sqlalchemy
> 
> Wow.
> 
> -Ben
> 
>> On Thu, May 21, 2015 at 10:10 PM, Ben Hoyt <benhoyt at gmail.com> wrote:
>> Huh, interesting idea. I've never used import hooks. Looks like the
>> relevant macropy source code is here:
>> 
>> https://github.com/lihaoyi/macropy/blob/master/macropy/core/import_hooks.py

If you wanted to do this yourself, and only need to support 3.4+, it's a lot easier than the way MacroPy does it.

But of course it's even easier to just use MacroPy.

>> So basically you would do the following:
>> 
>> 1) intercept the import
>> 2) find the source code file yourself and read it
>> 3) call ast.parse() on the source string
>> 4) do anything you want to the AST, for example turn the "select(c for
>> c in Customer if sum(c.orders.price) > 1000" into whatever SQL or
>> other function calls
>> 5) pass the massaged AST to compile(), execute it and return the module
>> 
>> Hmmm, yeah, I think you're basically suggesting macro-like processing
>> of the AST. Pretty cool, but not quite what I was thinking of ... I
>> was thinking select() would get an AST object at runtime and do stuff
>> with it.

If you really want to, you can build a trivial import hook that just attaches the ASTs (to everything, or only to specific code) and then ignore the code and process the AST at runtime. If you actually need to use runtime information in the processing, that might be worth it, but otherwise it seems like you're just wasting time transforming and compiling the AST on every request. Of course you could build in a cache if the information isn't really dynamic, but in that case, using the code object and .pyc as a cache is a lot simpler and probably more efficient.

>> 
>> -Ben
>> 
>>> On Thu, May 21, 2015 at 9:51 PM, Andrew Barnert <abarnert at yahoo.com> wrote:
>>>> On May 21, 2015, at 18:18, Ben Hoyt <benhoyt at gmail.com> wrote:
>>>> 
>>>> (I know that there's the "ast" module and ast.parse(), which can give
>>>> you an AST given a *source string*, but that's not very convenient
>>>> here.)
>>> 
>>> Why not? Python modules are distributed as source. You can pretty easily write an import hook to intercept module loading at the AST level and transform it however you want. Or just use MacroPy, which wraps up all the hard stuff (especially 2.x compatibility) and provides a huge framework of useful tools. What do you want to do that can't be done that way?
>>> 
>>> For many uses, you don't even have to go that far--code objects remember their source file and line number, which you can usually use to retrieve the text and regenerate the AST.

From steve at pearwood.info  Fri May 22 04:44:37 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 22 May 2015 12:44:37 +1000
Subject: [Python-ideas] Enabling access to the AST for Python code
In-Reply-To: <8AEF5EC5-1DC1-435D-BC32-EF776FA2CC8A@yahoo.com>
References: <CAL9jXCEMg7NCpk3925CmVUtY6n=c9vL_7eqO_di8-GXkYy7ARQ@mail.gmail.com>
 <8AEF5EC5-1DC1-435D-BC32-EF776FA2CC8A@yahoo.com>
Message-ID: <20150522024437.GY5663@ando.pearwood.info>

On Thu, May 21, 2015 at 06:51:34PM -0700, Andrew Barnert via Python-ideas wrote:
> On May 21, 2015, at 18:18, Ben Hoyt <benhoyt at gmail.com> wrote:
> > 
> > (I know that there's the "ast" module and ast.parse(), which can give
> > you an AST given a *source string*, but that's not very convenient
> > here.)
> 
> Why not? Python modules are distributed as source.

*Some* Python modules are distributed as source. Don't forget that 
byte-code only modules are officially supported.

Functions may also be constructed dynamically, at runtime. Closures may 
have source code available for them, but functions and methods 
constructed with exec (such as those in namedtuples) do not.

Also, the interactive interpreter is a very powerful tool, but it 
doesn't record the source code of functions you type into it.

So there are at least three examples where the source is not available 
at all. Ben also talks about *convenience*: `func.ast` will always be 
more convenient than:

import ast
import inspect
ast.parse(inspect.getsource(func))

not to mention the wastefulness of parsing something which has already 
been parsed before. On the other hand, keeping the ast around even when 
it's not used wastes memory, so this is a classic time/space trade off.
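One of the unavailable-source cases above can be demonstrated concretely: a function created with exec() has no file behind it, so the getsource-based recipe fails. A small sketch:

```python
import ast
import inspect

# A function built with exec() gets co_filename "<string>", so
# inspect.getsource() has nothing to retrieve and ast.parse() is
# never reached.
ns = {}
exec("def f():\n    return 1\n", ns)

try:
    src = inspect.getsource(ns["f"])
    ast.parse(src)
except OSError as e:
    src = None
    print("no source:", e)
```

The same failure applies to byte-code-only modules and to functions typed into the interactive interpreter.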


> You can pretty 
> easily write an import hook to intercept module loading at the AST 
> level and transform it however you want. 

Let's have a look at yours then, that ought to only take a minute or 
three :-) 

(That's my definition of "pretty easily".)

I think that the majority of Python programmers have no idea that you 
can even write an import hook at all, let alone how to do it.


-- 
Steve

From abarnert at yahoo.com  Fri May 22 05:00:15 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 21 May 2015 20:00:15 -0700
Subject: [Python-ideas] Enabling access to the AST for Python code
In-Reply-To: <20150522024437.GY5663@ando.pearwood.info>
References: <CAL9jXCEMg7NCpk3925CmVUtY6n=c9vL_7eqO_di8-GXkYy7ARQ@mail.gmail.com>
 <8AEF5EC5-1DC1-435D-BC32-EF776FA2CC8A@yahoo.com>
 <20150522024437.GY5663@ando.pearwood.info>
Message-ID: <509DA1BB-AA08-4945-91FE-17E9546D3FDB@yahoo.com>

On May 21, 2015, at 19:44, Steven D'Aprano <steve at pearwood.info> wrote:
> 
>> On Thu, May 21, 2015 at 06:51:34PM -0700, Andrew Barnert via Python-ideas wrote:
>>> On May 21, 2015, at 18:18, Ben Hoyt <benhoyt at gmail.com> wrote:
>>> 
>>> (I know that there's the "ast" module and ast.parse(), which can give
>>> you an AST given a *source string*, but that's not very convenient
>>> here.)
>> 
>> Why not? Python modules are distributed as source.
> 
> *Some* Python modules are distributed as source. Don't forget that 
> byte-code only modules are officially supported.
> 
> Functions may also be constructed dynamically, at runtime. Closures may 
> have source code available for them, but functions and methods 
> constructed with exec (such as those in namedtuples) do not.
> 
> Also, the interactive interpreter is a very powerful tool, but it 
> doesn't record the source code of functions you type into it.
> 
> So there are at least three examples where the source is not available 
> at all.

By comparison, code objects that don't carry around their AST include everything running in any version of Python except maybe a future version that'll be out in a year and a half, if this idea gets accepted, and probably only in CPython.

Plus, I'm pretty sure people would demand the ability to not waste memory and disk space on ASTs when they don't need them, so they still wouldn't be always available.

> Ben also talks about *convenience*: `func.ast` will always be 
> more convenient than:
> 
> import ast
> import inspect
> ast.parse(inspect.getsource(func))
> 
> not to mention the wastefulness of parsing something which has already 
> been parsed before.

> On the other hand, keeping the ast around even when 
> it's not used wastes memory, so this is a classic time/space trade off.
> 
> 
>> You can pretty 
>> easily write an import hook to intercept module loading at the AST 
>> level and transform it however you want.
> 
> Let's have a look at yours then, that ought to only take a minute or 
> three :-) 

Does "import macropy" count? That only took me a second or three. :)

Certainly a _lot_ easier than hacking the CPython source, even for something as trivial as adding a new member to the code object and finding all the places to attach the AST.

> (That's my definition of "pretty easily".)
> 
> I think that the majority of Python programmers have no idea that you 
> can even write an import hook at all, let alone how to do it.

Sure, because they have no need to do so. But it's very easy to learn. Especially after the changes in 3.3 and again in 3.4.

During the discussion on Unicode operators that turned into a discussion on a Unicode empty set literal, I suggested an import hook, someone (possibly you?) challenged me to write one if it was so easy, and it took me under half an hour to learn the 3.4 system and implement one. (I'm sure it would be a lot faster this time. But probably not on my phone...) All that work to improve the import system really did pay off. 

By comparison, hacking in new syntax to CPython to play with operator sectioning yesterday took me about four hours.

And of course anyone can download and use my import hook to get the empty set literal in any standard Python 3.4 or later, but anyone who wants to use my operator sectioning hacks has to clone my fork and build and install a new interpreter.

From dw+python-ideas at hmmz.org  Fri May 22 05:02:10 2015
From: dw+python-ideas at hmmz.org (David Wilson)
Date: Fri, 22 May 2015 03:02:10 +0000
Subject: [Python-ideas] Enabling access to the AST for Python code
In-Reply-To: <CAL9jXCEMg7NCpk3925CmVUtY6n=c9vL_7eqO_di8-GXkYy7ARQ@mail.gmail.com>
References: <CAL9jXCEMg7NCpk3925CmVUtY6n=c9vL_7eqO_di8-GXkYy7ARQ@mail.gmail.com>
Message-ID: <20150522030210.GD515@k3>

This sounds like a cool feature, though I'm not sure if exposing the AST
directly on the code object is the best approach.

Attaching the AST to the code object implies serializing (and
deserializing into nicely sparse heap allocations) it via .pyc
files, since code objects are marshalled there.

What about improving the parser so that exact start/end positions are
recorded for function bodies? This might be represented as 2 cheap
integers in RAM, allowing for a helper function in the compiler or
inspect modules (inspect.ast()?) to handle the grunt work.
Implementations like Micropython could just stub out those fields with
-1 or whatever else if desired.

One upside to direct attachment would be that a function returned by
e.g. eval() with no underlying source file would still have its AST
attached, without the caller having to keep hold of the unparsed string,
but the downside of RAM/disk/potentially hefty deserialization
performance seems to outweigh that.

I also wish there was a nicer way of introducing an expression that was
to be represented as an AST, but I think that would involve adding
another language keyword, and simply overloading the meaning of
generators slightly seems preferable to that. :)


David

On Thu, May 21, 2015 at 09:18:24PM -0400, Ben Hoyt wrote:
> Hi Python Ideas folks,
> 
> (I previously posted a similar message on Python-Dev, but it's a
> better fit for this list. See that thread here:
> https://mail.python.org/pipermail/python-dev/2015-May/140063.html)
> 
> Enabling access to the AST for compiled code would make some cool
> things possible (C# LINQ-style ORMs, for example), and not knowing too
> much about this part of Python internals, I'm wondering how possible
> and practical this would be.
> 
> Context: PonyORM (http://ponyorm.com/) allows you to write regular
> Python generator expressions like this:
> 
>     select(c for c in Customer if sum(c.orders.price) > 1000)
> 
> which compile into and run SQL like this:
> 
>     SELECT "c"."id"
>     FROM "Customer" "c"
>     LEFT JOIN "Order" "order-1" ON "c"."id" = "order-1"."customer"
>     GROUP BY "c"."id"
>     HAVING coalesce(SUM("order-1"."total_price"), 0) > 1000
> 
> I think the Pythonic syntax here is beautiful. But the tricks PonyORM
> has to go to get it are ... not quite so beautiful. Because the AST is
> not available, PonyORM decompiles Python bytecode into an AST first,
> and then converts that to SQL. (More details on all that from author's
> EuroPython talk at http://pyvideo.org/video/2968)
> 
> PonyORM needs the AST just for generator expressions and
> lambda functions, but obviously if this kind of AST access feature
> were in Python it'd probably be more general.
> 
> I believe C#'s LINQ provides something similar, where if you're
> developing a LINQ converter library (say LINQ to SQL), you essentially
> get the AST of the code ("expression tree") and the library can do
> what it wants with that.
> 
> (I know that there's the "ast" module and ast.parse(), which can give
> you an AST given a *source string*, but that's not very convenient
> here.)
> 
> What would it take to enable this kind of AST access in Python? Is it
> possible? Is it a good idea?
> 
> -Ben
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From techtonik at gmail.com  Fri May 22 11:59:30 2015
From: techtonik at gmail.com (anatoly techtonik)
Date: Fri, 22 May 2015 12:59:30 +0300
Subject: [Python-ideas] Timer that starts as soon as it is imported
Message-ID: <CAPkN8xJRkP+3266FTjT3tKJPYwgzg=3Q04jjn8ZRxkYDoFP7iA@mail.gmail.com>

Is the idea of having a timer that starts on import a good one?

From phd at phdru.name  Fri May 22 12:58:47 2015
From: phd at phdru.name (Oleg Broytman)
Date: Fri, 22 May 2015 12:58:47 +0200
Subject: [Python-ideas] Timer that starts as soon as it is imported
In-Reply-To: <CAPkN8xJRkP+3266FTjT3tKJPYwgzg=3Q04jjn8ZRxkYDoFP7iA@mail.gmail.com>
References: <CAPkN8xJRkP+3266FTjT3tKJPYwgzg=3Q04jjn8ZRxkYDoFP7iA@mail.gmail.com>
Message-ID: <20150522105847.GA9624@phdru.name>

On Fri, May 22, 2015 at 12:59:30PM +0300, anatoly techtonik <techtonik at gmail.com> wrote:
> Is the idea to have timer that starts on import is good?

   No, because:

-- it could be imported at the wrong time;
-- it couldn't be "reimported"; what is the use of a one-time timer?
-- if it could be reset and restarted when needed -- why not start it
   manually in the first place?

Oleg.
-- 
     Oleg Broytman            http://phdru.name/            phd at phdru.name
           Programmers don't die, they just GOSUB without RETURN.

From brett at python.org  Fri May 22 16:40:04 2015
From: brett at python.org (Brett Cannon)
Date: Fri, 22 May 2015 14:40:04 +0000
Subject: [Python-ideas] Enabling access to the AST for Python code
In-Reply-To: <CAL9jXCGkouc_0WRwEMRpwt-z1mKEk7hwnhiivqyZ9PbnuUsADQ@mail.gmail.com>
References: <CAL9jXCEMg7NCpk3925CmVUtY6n=c9vL_7eqO_di8-GXkYy7ARQ@mail.gmail.com>
 <8AEF5EC5-1DC1-435D-BC32-EF776FA2CC8A@yahoo.com>
 <CAL9jXCGkouc_0WRwEMRpwt-z1mKEk7hwnhiivqyZ9PbnuUsADQ@mail.gmail.com>
Message-ID: <CAP1=2W4h93gRjxTjfVkL2waCiPRqxVz+AzVmFZfszR=Zcp4O_A@mail.gmail.com>

On Thu, May 21, 2015 at 10:10 PM Ben Hoyt <benhoyt at gmail.com> wrote:

> Huh, interesting idea. I've never used import hooks. Looks like the
> relevant macropy source code is here:
>
> https://github.com/lihaoyi/macropy/blob/master/macropy/core/import_hooks.py
>
> So basically you would do the following:
>
> 1) intercept the import
> 2) find the source code file yourself and read it
> 3) call ast.parse() on the source string
> 4) do anything you want to the AST, for example turn the "select(c for
> c in Customer if sum(c.orders.price) > 1000" into whatever SQL or
> other function calls
> 5) pass the massaged AST to compile(), execute it and return the module
>
> Hmmm, yeah, I think you're basically suggesting macro-like processing
> of the AST. Pretty cool, but not quite what I was thinking of ... I
> was thinking select() would get an AST object at runtime and do stuff
> with it.
>

Depending on what version of Python you are targeting, it's actually
simpler than that even to get it into the import system:

   1. Subclass importlib.machinery.SourceFileLoader
   <https://docs.python.org/3/library/importlib.html#importlib.machinery.SourceFileLoader>
   and override source_to_code()
   <https://docs.python.org/3/library/importlib.html#importlib.abc.InspectLoader.source_to_code>
   to do your AST transformation and return your changed code object
   (basically your steps 3-5 above)
   2. Set a path hook that uses an instance of
   importlib.machinery.FileFinder
   <https://docs.python.org/3/library/importlib.html#importlib.machinery.FileFinder>
   which utilizes your custom loader
   3. There is no step 3
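A rough sketch of steps 1 and 2, assuming Python 3.4+ (the AST transform itself is left as a placeholder, and the hook is shown but not installed):

```python
import ast
import importlib.machinery

class ASTTransformLoader(importlib.machinery.SourceFileLoader):
    """Step 1: override source_to_code() to rewrite the AST."""
    def source_to_code(self, data, path, *, _optimize=-1):
        tree = ast.parse(data)
        # ... run any ast.NodeTransformer over `tree` here ...
        ast.fix_missing_locations(tree)
        return compile(tree, path, "exec", optimize=_optimize)

# Step 2: pair FileFinder with the custom loader for .py files.
# Activate with: sys.path_hooks.insert(0, path_hook) followed by
# sys.path_importer_cache.clear() (not done here).
loader_details = (ASTTransformLoader, importlib.machinery.SOURCE_SUFFIXES)
path_hook = importlib.machinery.FileFinder.path_hook(loader_details)
```

Once the hook is installed, every source module imported from a matching path entry goes through the transformed compile.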

I know this isn't what you're after, but I just wanted to let you know
importlib has made this sort of thing fairly trivial to implement.

-Brett


>
> -Ben
>
> On Thu, May 21, 2015 at 9:51 PM, Andrew Barnert <abarnert at yahoo.com>
> wrote:
> > On May 21, 2015, at 18:18, Ben Hoyt <benhoyt at gmail.com> wrote:
> >>
> >> (I know that there's the "ast" module and ast.parse(), which can give
> >> you an AST given a *source string*, but that's not very convenient
> >> here.)
> >
> > Why not? Python modules are distributed as source. You can pretty easily
> write an import hook to intercept module loading at the AST level and
> transform it however you want. Or just use MacroPy, which wraps up all the
> hard stuff (especially 2.x compatibility) and provides a huge framework of
> useful tools. What do you want to do that can't be done that way?
> >
> > For many uses, you don't even have to go that far--code objects remember
> their source file and line number, which you can usually use to retrieve
> the text and regenerate the AST.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150522/dd242a44/attachment.html>

From benhoyt at gmail.com  Fri May 22 16:52:45 2015
From: benhoyt at gmail.com (Ben Hoyt)
Date: Fri, 22 May 2015 10:52:45 -0400
Subject: [Python-ideas] Enabling access to the AST for Python code
In-Reply-To: <CAP1=2W4h93gRjxTjfVkL2waCiPRqxVz+AzVmFZfszR=Zcp4O_A@mail.gmail.com>
References: <CAL9jXCEMg7NCpk3925CmVUtY6n=c9vL_7eqO_di8-GXkYy7ARQ@mail.gmail.com>
 <8AEF5EC5-1DC1-435D-BC32-EF776FA2CC8A@yahoo.com>
 <CAL9jXCGkouc_0WRwEMRpwt-z1mKEk7hwnhiivqyZ9PbnuUsADQ@mail.gmail.com>
 <CAP1=2W4h93gRjxTjfVkL2waCiPRqxVz+AzVmFZfszR=Zcp4O_A@mail.gmail.com>
Message-ID: <CAL9jXCFK-CuUdui=BYR51MkF3BFFQaesfwmJvcpwryVqLkTDRQ@mail.gmail.com>

Good to know -- thanks! -Ben

On Fri, May 22, 2015 at 10:40 AM, Brett Cannon <brett at python.org> wrote:
>
>
> On Thu, May 21, 2015 at 10:10 PM Ben Hoyt <benhoyt at gmail.com> wrote:
>>
>> Huh, interesting idea. I've never used import hooks. Looks like the
>> relevant macropy source code is here:
>>
>>
>> https://github.com/lihaoyi/macropy/blob/master/macropy/core/import_hooks.py
>>
>> So basically you would do the following:
>>
>> 1) intercept the import
>> 2) find the source code file yourself and read it
>> 3) call ast.parse() on the source string
>> 4) do anything you want to the AST, for example turn the "select(c for
>> c in Customer if sum(c.orders.price) > 1000" into whatever SQL or
>> other function calls
>> 5) pass the massaged AST to compile(), execute it and return the module
>>
>> Hmmm, yeah, I think you're basically suggesting macro-like processing
>> of the AST. Pretty cool, but not quite what I was thinking of ... I
>> was thinking select() would get an AST object at runtime and do stuff
>> with it.
>
>
> Depending on what version of Python you are targeting, it's actually simpler
> than that even to get it into the import system:
>
> Subclass importlib.machinery.SourceFileLoader and override source_to_code()
> to do your AST transformation and return your changed code object (basically
> your steps 3-5 above)
> Set a path hook that uses an instance of importlib.machinery.FileFinder
> which utilizes your custom loader
> There is no step 3
>
> I know this isn't what you're after, but I just wanted to let you know
> importlib has made this sort of thing fairly trivial to implement.
>
> -Brett
>
>>
>>
>> -Ben
>>
>> On Thu, May 21, 2015 at 9:51 PM, Andrew Barnert <abarnert at yahoo.com>
>> wrote:
>> > On May 21, 2015, at 18:18, Ben Hoyt <benhoyt at gmail.com> wrote:
>> >>
>> >> (I know that there's the "ast" module and ast.parse(), which can give
>> >> you an AST given a *source string*, but that's not very convenient
>> >> here.)
>> >
>> > Why not? Python modules are distributed as source. You can pretty easily
>> > write an import hook to intercept module loading at the AST level and
>> > transform it however you want. Or just use MacroPy, which wraps up all the
>> > hard stuff (especially 2.x compatibility) and provides a huge framework of
>> > useful tools. What do you want to do that can't be done that way?
>> >
>> > For many uses, you don't even have to go that far--code objects remember
>> > their source file and line number, which you can usually use to retrieve the
>> > text and regenerate the AST.
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/

From demianbrecht at gmail.com  Fri May 22 18:39:34 2015
From: demianbrecht at gmail.com (Demian Brecht)
Date: Fri, 22 May 2015 09:39:34 -0700
Subject: [Python-ideas] Adding jsonschema to the standard library
In-Reply-To: <CADiSq7cmRPQdpC8wv3xyt20dV=Pf9uPfB1k-Q3a6kQH=khvnsQ@mail.gmail.com>
References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
 <CACac1F9yQEdH4MpOuLS-m9oe8ojcBYh_0JH8KbvtQQimJkwhKw@mail.gmail.com>
 <CADiSq7cmRPQdpC8wv3xyt20dV=Pf9uPfB1k-Q3a6kQH=khvnsQ@mail.gmail.com>
Message-ID: <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com>

First off, thanks all for the well thought out responses! Will try to touch on each point when I get a few spare cycles throughout the day.

> On May 21, 2015, at 2:15 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> 
> The other question to be answered these days is the value bundling
> offers over "pip install jsonschema" (or a platform specific
> equivalent). While it's still possible to meet that condition, it's
> harder now that we offer pip as a standard feature, especially since
> getting added to the standard library almost universally makes life
> more difficult for module maintainers if they're not already core
> developers.

This is an interesting problem and a question that I've had at the back of my mind as well. With the addition of pip, there is really no additional value /to those who already know about the package and what problem it solves/. In my mind, the value of bundling anything nowadays really boils down to "this is the suggested de facto standard of solving problem [X] using Python". I see two problems with relying on pip and PyPI as an alternative to bundling:

1. PyPI is filled with multiple solutions to the same problem. This can be difficult to wade through for the experienced developer, never mind the novice.
2. You generally won't know about packages that don't solve problems you've solved or are solving. Early on in my adoption of Python, there were a number of times where I just spent time digging through the standard library and was surprised by the offerings that I didn't even know were a thing. Likewise with jsonschema, I wouldn't have known it was a thing had a co-worker not introduced me to it a couple years ago.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 842 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150522/c61a7a0f/attachment-0001.sig>

From graffatcolmingov at gmail.com  Fri May 22 21:08:47 2015
From: graffatcolmingov at gmail.com (Ian Cordasco)
Date: Fri, 22 May 2015 14:08:47 -0500
Subject: [Python-ideas] Adding jsonschema to the standard library
In-Reply-To: <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com>
References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
 <CACac1F9yQEdH4MpOuLS-m9oe8ojcBYh_0JH8KbvtQQimJkwhKw@mail.gmail.com>
 <CADiSq7cmRPQdpC8wv3xyt20dV=Pf9uPfB1k-Q3a6kQH=khvnsQ@mail.gmail.com>
 <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com>
Message-ID: <CAN-Kwu1Z1eO0X_2nx7Neg7UjXOZTCQ4z+ns3ZBnGyWo0-x6gJg@mail.gmail.com>

On Fri, May 22, 2015 at 11:39 AM, Demian Brecht <demianbrecht at gmail.com> wrote:
> First off, thanks all for the well thought out responses! Will try to touch on each point when I get a few spare cycles throughout the day.
>
>> On May 21, 2015, at 2:15 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>
>> The other question to be answered these days is the value bundling
>> offers over "pip install jsonschema" (or a platform specific
>> equivalent). While it's still possible to meet that condition, it's
>> harder now that we offer pip as a standard feature, especially since
>> getting added to the standard library almost universally makes life
>> more difficult for module maintainers if they're not already core
>> developers.
>
> This is an interesting problem and a question that I've had at the back of my mind as well. With the addition of pip, there is really no additional value /to those who already know about the package and what problem it solves/. In my mind, the value of bundling anything nowadays really boils down to "this is the suggested de facto standard for solving problem [X] using Python". I see two problems with relying on pip and PyPI as an alternative to bundling:

Counter-point: What library is the de facto standard of doing HTTP in
Python? Requests is, of course. Discussion of its inclusion has
happened several times and each time the decision is to not include
it. The most recent such discussion was at the Language Summit at
PyCon 2015 in Montreal. If you want to go by download count, then
Requests should still be in the standard library but it just will not
happen.

> 1. PyPI is filled with multiple solutions to the same problem. This can be difficult to wade through for the experienced developer, never mind the novice.

That's not exactly true in every case. The only library that parses
and emits YAML is PyYAML. It's unmaintained, incomplete, and full
of bugs. That said, it's the de facto standard, and it's the only one
of its kind that I know of on PyPI. I would vehemently argue against
its inclusion were it ever proposed.

> 2. You generally won't know about packages that don't solve problems you've solved or are solving. Early on in my adoption of Python, there were a number of times when I just spent time digging through the standard library and was surprised by the offerings that I didn't even know were a thing. Likewise with jsonschema, I wouldn't have known it was a thing had a co-worker not introduced me to it a couple of years ago.

Counter-point: once you know you want to use JSON Schema, searching for
implementations in Python yields Julian's implementation first.
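For readers who haven't met it: a JSON Schema describes the expected shape of a JSON document, and the jsonschema library validates instances against such a schema. As a rough stdlib-only sketch of the core idea (the real library implements the full draft specification, with nested schemas, formats, and rich error reporting; this toy only handles the "type" and "required" keywords):

```python
# Toy illustration of what JSON Schema validation does. The real
# `jsonschema` library covers the full draft spec; this sketch checks
# only the "type" and "required" keywords of a top-level schema.

TYPES = {"object": dict, "array": list, "string": str,
         "number": (int, float), "boolean": bool}

def validate(instance, schema):
    """Raise ValueError if `instance` doesn't match `schema`."""
    expected = schema.get("type")
    if expected and not isinstance(instance, TYPES[expected]):
        raise ValueError(
            "expected %s, got %s" % (expected, type(instance).__name__))
    for key in schema.get("required", []):
        if key not in instance:
            raise ValueError("missing required property: %s" % key)

schema = {"type": "object", "required": ["name"]}
validate({"name": "jsonschema"}, schema)  # passes silently
# validate({}, schema) would raise ValueError (missing "name")
```

With the actual library, the equivalent call is `jsonschema.validate(instance, schema)`, which raises `ValidationError` on mismatch.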

You said (paraphrasing) in your first email that jsonschema should
only be excluded from the stdlib if people could bring up reasons
against it. The standard library has grown in the past few releases
but that doesn't mean it needs to grow every time. It also means it
doesn't need to grow to include an implementation of every possible
/thing/ that exists. Further, leaving it up to others to prove why it
shouldn't be included isn't sufficient. You have to prove to the
community why it MUST be included. Saying "Ah, let's throw this thing
in there anyway because why not" isn't valid. By that logic, I could
nominate several libraries that I find useful in day-to-day work, and
the barrier to entry would be exactly as much energy as people who
care about the standard library are willing to expend to keep the
less-than-stellar candidates out.

In this case, that /thing/ is JSON Schema. Last I checked, JSON Schema
was an IETF draft that was never accepted and a specification that has
since expired. That means that in a couple of years, ostensibly after
this was added to the stdlib, it could be made completely irrelevant,
and the cost of fixing that would be considerable. That would be far
less of an issue if jsonschema were not included at all.

Overall, I'm strongly against its inclusion. Not because the library
isn't excellent. It is. I use it. I'm strongly against it for the
reasons listed above.

From donald at stufft.io  Fri May 22 21:23:14 2015
From: donald at stufft.io (Donald Stufft)
Date: Fri, 22 May 2015 15:23:14 -0400
Subject: [Python-ideas] Adding jsonschema to the standard library
In-Reply-To: <CAN-Kwu1Z1eO0X_2nx7Neg7UjXOZTCQ4z+ns3ZBnGyWo0-x6gJg@mail.gmail.com>
References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
 <CACac1F9yQEdH4MpOuLS-m9oe8ojcBYh_0JH8KbvtQQimJkwhKw@mail.gmail.com>
 <CADiSq7cmRPQdpC8wv3xyt20dV=Pf9uPfB1k-Q3a6kQH=khvnsQ@mail.gmail.com>
 <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com>
 <CAN-Kwu1Z1eO0X_2nx7Neg7UjXOZTCQ4z+ns3ZBnGyWo0-x6gJg@mail.gmail.com>
Message-ID: <E203A6E7-158B-47F1-8B47-F2E3C5529282@stufft.io>


> On May 22, 2015, at 3:08 PM, Ian Cordasco <graffatcolmingov at gmail.com> wrote:
> 
>> 
>> 1. PyPI is filled with multiple solutions to the same problem. This can be difficult to wade through for the experienced developer, never mind the novice.
> 
> That's not exactly true in every case. The only library that parses
> and emits YAML is PyYAML. It's unmaintained, incomplete, and full
> of bugs. That said, it's the de facto standard, and it's the only one
> of its kind that I know of on PyPI. I would vehemently argue against
> its inclusion were it ever proposed.
> 
>> 2. You generally won't know about packages that don't solve problems you've solved or are solving. Early on in my adoption of Python, there were a number of times when I just spent time digging through the standard library and was surprised by the offerings that I didn't even know were a thing. Likewise with jsonschema, I wouldn't have known it was a thing had a co-worker not introduced me to it a couple of years ago.
> 
> Counter-point: once you know you want to use JSON Schema, searching for
> implementations in Python yields Julian's implementation first.


I think a future area of work is going to be on improving the ability for
people who don't know what they want to find out that they want something and
which thing they want on PyPI. I'm not entirely sure what this is going to look
like but I think it's an important problem. It's being solved for very specific
cases by starting to have the standard documentation explicitly call out these
de facto standards of the Python ecosystem where it makes sense. This of course
does not scale to every single problem domain or module on PyPI so we still
need a more general solution.

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From abarnert at yahoo.com  Fri May 22 21:24:26 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Fri, 22 May 2015 12:24:26 -0700
Subject: [Python-ideas] Adding jsonschema to the standard library
In-Reply-To: <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com>
References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
 <CACac1F9yQEdH4MpOuLS-m9oe8ojcBYh_0JH8KbvtQQimJkwhKw@mail.gmail.com>
 <CADiSq7cmRPQdpC8wv3xyt20dV=Pf9uPfB1k-Q3a6kQH=khvnsQ@mail.gmail.com>
 <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com>
Message-ID: <16BABB0C-6CB3-447D-A6B2-223A8D985674@yahoo.com>

On May 22, 2015, at 09:39, Demian Brecht <demianbrecht at gmail.com> wrote:

> In my mind, the value of bundling anything nowadays really boils down to "this is the suggested de facto standard for solving problem [X] using Python".

The other way of saying that is to say it explicitly in the stdlib docs, usage docs, and/or tutorial and link to the package. While that used to be pretty rare, that's changed recently. Off the top of my head, there are links to setuptools, requests, nose, py.test, Pillow, PyObjC, py2app, PyWin32, WConio, Console, UniCurses, Urwid, the major alternative GUI frameworks, Twisted, and pexpect.

So, if you wrote something to put in the json module docs, the input/output section of the tutorial, or a howto explaining that if you want structured and validated JSON the usual standard is JSON Schema and the jsonschema library can do it for you in Python, that would get most of the same benefits as adding jsonschema to the stdlib without most of the costs.

> I see two problems with relying on pip and PyPI as an alternative to bundling:

In general, there's a potentially much bigger reason: some projects can't use arbitrary third-party projects without a costly vetting process, or need to work on machines that don't have Internet access or don't have a way to install user site-packages or virtualenvs, etc. Fortunately, those kinds of problems aren't likely to come up for the kinds of projects that need JSON Schema (e.g., Internet servers, client frameworks that are themselves installed via pip, client apps that are distributed by bundling with cx_Freeze/py2app/etc.).

> 1. PyPI is filled with multiple solutions to the same problem. This can be difficult to wade through for the experienced developer, never mind the novice.

Usually this is a strength, not a weakness. Until one project really is good enough to become the de facto standard, you wouldn't want to limit the competition, right? The problem traditionally has been that once something _does_ reach that point, there's no way to make that clear--but now that the stdlib docs link to outside projects, there's a solution.

> 2. You generally won't know about packages that don't solve problems you've solved or are solving. Early on in my adoption of Python, there were a number of times when I just spent time digging through the standard library and was surprised by the offerings that I didn't even know were a thing. Likewise with jsonschema, I wouldn't have known it was a thing had a co-worker not introduced me to it a couple of years ago.


From p.andrefreitas at gmail.com  Sat May 23 01:08:40 2015
From: p.andrefreitas at gmail.com (=?UTF-8?Q?Andr=C3=A9_Freitas?=)
Date: Fri, 22 May 2015 23:08:40 +0000
Subject: [Python-ideas] Cmake as build system
Message-ID: <CAMkX=YUGgPXvj08GhME53-6VuDATg0N7asYEpQMaaGy2Prc43w@mail.gmail.com>

Hi,
What do you think about using CMake as a build system?

I see advantages such as:
- Cross-platform;
- Supported in the CLion IDE (an amazing C/C++ IDE: breakpoints, etc.);
- Simple and easy to use (Zen of Python :)
https://www.python.org/dev/peps/pep-0020/ );

I was actually watching a discussion on python-committers about Windows 7
buildbots failing. I found that someone already had the same idea but don't
know if it was shared here: http://www.vtk.org/Wiki/BuildingPythonWithCMake

Please share your thoughts.

Regards,
André Freitas

From rymg19 at gmail.com  Sat May 23 01:48:52 2015
From: rymg19 at gmail.com (Ryan Gonzalez)
Date: Fri, 22 May 2015 18:48:52 -0500
Subject: [Python-ideas] Cmake as build system
In-Reply-To: <CAMkX=YUGgPXvj08GhME53-6VuDATg0N7asYEpQMaaGy2Prc43w@mail.gmail.com>
References: <CAMkX=YUGgPXvj08GhME53-6VuDATg0N7asYEpQMaaGy2Prc43w@mail.gmail.com>
Message-ID: <CAO41-mPo_CVTRWGDzU23MMZFj11R_9FAYeKgihTy+vXVi+t81w@mail.gmail.com>

HAHAHA!!

Good luck! I've raised this issue before. Twice. Autotools sucks. And makes
cross-compiling a pain in the neck. Bottom line was:

- C++ is a big dependency
- The autotools build system has been tested already on lots and lots and
lots of platforms
- Nobody has even implemented an alternative build system for Python 3 yet
(python-cmake is only for Python 2)
- No one can agree on a best build system (for instance, I hate CMake!)


On Fri, May 22, 2015 at 6:08 PM, André Freitas <p.andrefreitas at gmail.com>
wrote:

> Hi,
> What do you think about using CMake as a build system?
>
> I see advantages such as:
> - Cross-platform;
> - Supported in the CLion IDE (an amazing C/C++ IDE: breakpoints, etc.);
> - Simple and easy to use (Zen of Python :)
> https://www.python.org/dev/peps/pep-0020/ );
>
> I was actually watching a discussion on python-committers about Windows 7
> buildbots failing. I found that someone already had the same idea but don't
> know if it was shared here:
> http://www.vtk.org/Wiki/BuildingPythonWithCMake
>
> Please share your thoughts.
>
> Regards,
> André Freitas
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>



-- 
Ryan
[ERROR]: Your autotools build scripts are 200 lines longer than your
program. Something's wrong.
http://kirbyfan64.github.io/

From p.andrefreitas at gmail.com  Sat May 23 02:08:55 2015
From: p.andrefreitas at gmail.com (=?UTF-8?Q?Andr=C3=A9_Freitas?=)
Date: Sat, 23 May 2015 01:08:55 +0100
Subject: [Python-ideas] Cmake as build system
In-Reply-To: <CAO41-mPo_CVTRWGDzU23MMZFj11R_9FAYeKgihTy+vXVi+t81w@mail.gmail.com>
References: <CAMkX=YUGgPXvj08GhME53-6VuDATg0N7asYEpQMaaGy2Prc43w@mail.gmail.com>
 <CAO41-mPo_CVTRWGDzU23MMZFj11R_9FAYeKgihTy+vXVi+t81w@mail.gmail.com>
Message-ID: <CAMkX=YWL+TYzs2AS5Qe1+kqoD4myTuqVbT5o-TG_sq4HX7485w@mail.gmail.com>

Hi,
Thanks for sharing, Ryan Gonzalez :)

It could just be another alternative rather than a replacement for autotools.
It's not only about the cross-platform feature of CMake but also the
integration with modern IDEs. I really see an improvement in productivity
using an IDE debugger (e.g. CLion) instead of using prints everywhere (
http://programmers.stackexchange.com/questions/78152/real-programmers-use-debuggers
).





2015-05-23 0:48 GMT+01:00 Ryan Gonzalez <rymg19 at gmail.com>:

> HAHAHA!!
>
> Good luck! I've raised this issue before. Twice. Autotools sucks. And
> makes cross-compiling a pain in the neck. Bottom line was:
>
> - C++ is a big dependency
> - The autotools build system has been tested already on lots and lots and
> lots of platforms
> - Nobody has even implemented an alternative build system for Python 3 yet
> (python-cmake is only for Python 2)
> - No one can agree on a best build system (for instance, I hate CMake!)
>
>
> On Fri, May 22, 2015 at 6:08 PM, André Freitas <p.andrefreitas at gmail.com>
> wrote:
>
>> Hi,
>> What do you think about using CMake as a build system?
>>
>> I see advantages such as:
>> - Cross-platform;
>> - Supported in the CLion IDE (an amazing C/C++ IDE: breakpoints, etc.);
>> - Simple and easy to use (Zen of Python :)
>> https://www.python.org/dev/peps/pep-0020/ );
>>
>> I was actually watching a discussion on python-committers about Windows 7
>> buildbots failing. I found that someone already had the same idea but don't
>> know if it was shared here:
>> http://www.vtk.org/Wiki/BuildingPythonWithCMake
>>
>> Please share your thoughts.
>>
>> Regards,
>> André Freitas
>>
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>
>
>
> --
> Ryan
> [ERROR]: Your autotools build scripts are 200 lines longer than your
> program. Something's wrong.
> http://kirbyfan64.github.io/
>
>



-- 
André Freitas
p.andrefreitas at gmail.com
"Imagination is more important than knowledge" - Albert Einstein
*google+* AndréFreitas92 <https://plus.google.com/+AndréFreitas92>
*linkedin* pandrefreitas <http://pt.linkedin.com/in/pandrefreitas/>
*github* andrefreitas <https://github.com/andrefreitas>
*website* www.andrefreitas.pt <http://andrefreitas.pt>
This message may contain confidential or privileged information whose
secrecy is protected by law. If you are not the addressee or a person
authorized to receive this message, you may not use, copy, or disclose
the information it contains, or take any action based on it. If you have
received this message in error, please notify the sender immediately by
replying to this e-mail and then delete it. Thank you for your cooperation.

From abarnert at yahoo.com  Sat May 23 03:45:22 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Fri, 22 May 2015 18:45:22 -0700
Subject: [Python-ideas] Cmake as build system
In-Reply-To: <CAMkX=YWL+TYzs2AS5Qe1+kqoD4myTuqVbT5o-TG_sq4HX7485w@mail.gmail.com>
References: <CAMkX=YUGgPXvj08GhME53-6VuDATg0N7asYEpQMaaGy2Prc43w@mail.gmail.com>
 <CAO41-mPo_CVTRWGDzU23MMZFj11R_9FAYeKgihTy+vXVi+t81w@mail.gmail.com>
 <CAMkX=YWL+TYzs2AS5Qe1+kqoD4myTuqVbT5o-TG_sq4HX7485w@mail.gmail.com>
Message-ID: <9A12460B-8A5E-44CC-BF8B-D1EBF23EF9B4@yahoo.com>

On May 22, 2015, at 17:08, André Freitas <p.andrefreitas at gmail.com> wrote:
> 
> Hi,
> Thanks for sharing Ryan Gonzalez :)
> 
> It could just be another alternative rather than a replacement for autotools. It's not only about the cross-platform feature of CMake but also the integration with modern IDEs. I really see an improvement in productivity using an IDE debugger (e.g. CLion) instead of using prints everywhere (http://programmers.stackexchange.com/questions/78152/real-programmers-use-debuggers).

What's stopping you from using an IDE debugger? I've run CPython itself or other similarly complex projects under Xcode, Eclipse, Visual Studio, WinDbg, ggdb, and other graphical debuggers without them having to understand how the code got built. If CLion can't do the same, that sounds like a problem with CLion.

(Although personally, I usually find it easier to debug interpreters or other complex CLI programs just running gdb/lldb/whatever on the terminal.)

> 2015-05-23 0:48 GMT+01:00 Ryan Gonzalez <rymg19 at gmail.com>:
>> HAHAHA!!
>> 
>> Good luck! I've raised this issue before. Twice. Autotools sucks. And makes cross-compiling a pain in the neck. Bottom line was:
>> 
>> - C++ is a big dependency
>> - The autotools build system has been tested already on lots and lots and lots of platforms
>> - Nobody has even implemented an alternative build system for Python 3 yet (python-cmake is only for Python 2)
>> - No one can agree on a best build system (for instance, I hate CMake!)
>> 
>> 
>>> On Fri, May 22, 2015 at 6:08 PM, André Freitas <p.andrefreitas at gmail.com> wrote:
>>> Hi,
>>> What do you think about using CMake as a build system? 
>>> 
>>> I see advantages such as:
>>> - Cross-platform;
>>> - Supported in the CLion IDE (an amazing C/C++ IDE: breakpoints, etc.);
>>> - Simple and easy to use (Zen of Python :) https://www.python.org/dev/peps/pep-0020/ );
>>> 
>>> I was actually watching a discussion on python-committers about Windows 7 buildbots failing. I found that someone already had the same idea but don't know if it was shared here: http://www.vtk.org/Wiki/BuildingPythonWithCMake
>>> 
>>> Please share your thoughts.
>>> 
>>> Regards,
>>> André Freitas
>>> 
>>> _______________________________________________
>>> Python-ideas mailing list
>>> Python-ideas at python.org
>>> https://mail.python.org/mailman/listinfo/python-ideas
>>> Code of Conduct: http://python.org/psf/codeofconduct/
>> 
>> 
>> 
>> -- 
>> Ryan
>> [ERROR]: Your autotools build scripts are 200 lines longer than your program. Something?s wrong.
>> http://kirbyfan64.github.io/
> 
> 
> 
> -- 
> André Freitas
> p.andrefreitas at gmail.com
> "Imagination is more important than knowledge" - Albert Einstein
> google+ AndréFreitas92 
> linkedin pandrefreitas 
> github andrefreitas
> website www.andrefreitas.pt 
> This message may contain confidential or privileged information whose secrecy is protected by law. If you are not the addressee or a person authorized to receive this message, you may not use, copy, or disclose the information it contains, or take any action based on it. If you have received this message in error, please notify the sender immediately by replying to this e-mail and then delete it. Thank you for your cooperation.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From abarnert at yahoo.com  Sat May 23 04:08:31 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Fri, 22 May 2015 19:08:31 -0700
Subject: [Python-ideas] Cmake as build system
In-Reply-To: <CAMkX=YWL+TYzs2AS5Qe1+kqoD4myTuqVbT5o-TG_sq4HX7485w@mail.gmail.com>
References: <CAMkX=YUGgPXvj08GhME53-6VuDATg0N7asYEpQMaaGy2Prc43w@mail.gmail.com>
 <CAO41-mPo_CVTRWGDzU23MMZFj11R_9FAYeKgihTy+vXVi+t81w@mail.gmail.com>
 <CAMkX=YWL+TYzs2AS5Qe1+kqoD4myTuqVbT5o-TG_sq4HX7485w@mail.gmail.com>
Message-ID: <67F8BEBA-DEDD-4983-9F0D-3A3D81BA13CD@yahoo.com>

Sorry, meant to include this in my previous reply, but I accidentally cut and didn't paste...

Sent from my iPhone

> On May 22, 2015, at 17:08, André Freitas <p.andrefreitas at gmail.com> wrote:
> 
> Hi,
> Thanks for sharing Ryan Gonzalez :)
> 
> It just could be another alternative and not a replacement of autotools.

If the problem is that the autotools build system is a nightmare to maintain, how is having two completely different complex build systems that have to be kept perfectly in sync not going to be an even bigger nightmare?

> Not only about the cross-platform feature of CMake but the integration with modern IDEs. I really see an improvement in productivity using the IDE debugger (e.g. CLion) instead of using prints everywhere (http://programmers.stackexchange.com/questions/78152/real-programmers-use-debuggers).

Why did you link to a question that was migrated and then closed as not constructive, and that was written to argue that debuggers are useless, and whose contrary answers only talk about command-line debugging rather than whether a GUI wrapper can help debugging? That seems to argue against your case, not for it...

> 2015-05-23 0:48 GMT+01:00 Ryan Gonzalez <rymg19 at gmail.com>:
>> HAHAHA!!
>> 
>> Good luck! I've raised this issue before. Twice. Autotools sucks. And makes cross-compiling a pain in the neck. Bottom line was:
>> 
>> - C++ is a big dependency
>> - The autotools build system has been tested already on lots and lots and lots of platforms
>> - Nobody has even implemented an alternative build system for Python 3 yet (python-cmake is only for Python 2)
>> - No one can agree on a best build system (for instance, I hate CMake!)
>> 
>> 
>>> On Fri, May 22, 2015 at 6:08 PM, André Freitas <p.andrefreitas at gmail.com> wrote:
>>> Hi,
>>> What do you think about using CMake as a build system? 
>>> 
>>> I see advantages such as:
>>> - Cross-platform;
>>> - Supported in the CLion IDE (an amazing C/C++ IDE: breakpoints, etc.);
>>> - Simple and easy to use (Zen of Python :) https://www.python.org/dev/peps/pep-0020/ );
>>> 
>>> I was actually watching a discussion on python-committers about Windows 7 buildbots failing. I found that someone already had the same idea but don't know if it was shared here: http://www.vtk.org/Wiki/BuildingPythonWithCMake
>>> 
>>> Please share your thoughts.
>>> 
>>> Regards,
>>> André Freitas
>>> 
>>> _______________________________________________
>>> Python-ideas mailing list
>>> Python-ideas at python.org
>>> https://mail.python.org/mailman/listinfo/python-ideas
>>> Code of Conduct: http://python.org/psf/codeofconduct/
>> 
>> 
>> 
>> -- 
>> Ryan
>> [ERROR]: Your autotools build scripts are 200 lines longer than your program. Something?s wrong.
>> http://kirbyfan64.github.io/
> 
> 
> 
> -- 
> André Freitas
> p.andrefreitas at gmail.com
> "Imagination is more important than knowledge" - Albert Einstein
> google+ AndréFreitas92 
> linkedin pandrefreitas 
> github andrefreitas
> website www.andrefreitas.pt 
> This message may contain confidential or privileged information whose secrecy is protected by law. If you are not the addressee or a person authorized to receive this message, you may not use, copy, or disclose the information it contains, or take any action based on it. If you have received this message in error, please notify the sender immediately by replying to this e-mail and then delete it. Thank you for your cooperation.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From stephen at xemacs.org  Sat May 23 04:59:21 2015
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 23 May 2015 11:59:21 +0900
Subject: [Python-ideas] Adding jsonschema to the standard library
In-Reply-To: <E203A6E7-158B-47F1-8B47-F2E3C5529282@stufft.io>
References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
 <CACac1F9yQEdH4MpOuLS-m9oe8ojcBYh_0JH8KbvtQQimJkwhKw@mail.gmail.com>
 <CADiSq7cmRPQdpC8wv3xyt20dV=Pf9uPfB1k-Q3a6kQH=khvnsQ@mail.gmail.com>
 <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com>
 <CAN-Kwu1Z1eO0X_2nx7Neg7UjXOZTCQ4z+ns3ZBnGyWo0-x6gJg@mail.gmail.com>
 <E203A6E7-158B-47F1-8B47-F2E3C5529282@stufft.io>
Message-ID: <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp>

Donald Stufft writes:

 > I think a future area of work is going to be on improving the
 > ability for people who don't know what they want to find out that
 > they want something and which thing they want on PyPI. I'm not
 > entirely sure what this is going to look like

+1

 > but I think it's an important problem.

+1

 > It's being solved for very specific cases by starting to have the
 > standard documentation explicitly call out these de facto standards
 > of the Python ecosystem where it makes sense.

Because that's necessarily centralized, it's a solution to a different
problem.  We need a decentralized approach to deal with the "people
who use package X often would benefit from Y too, but don't know where
to find Y or which implementation to use."  IOW, there needs to be a
way for X to recommend implementation Z (or implementations Z1 or Z2)
of Y.

 > This of course does not scale to every single problem domain or
 > module on PyPI so we still need a more general solution.

The only way we know to scale a web is to embed the solution in the
nodes.  Currently many packages know what they use internally (the
install_requires field), but as far as I can see there's no way for a
package X to recommend "related" packages Z to implement function Y in
applications using X.  E.g., the plethora of ORMs available, some of
which work better with particular packages than others do.

We could also recommend that package maintainers document such
recommendations, preferably in a fairly standard place, in their
package documentation.  Even something like "I've successfully used Z
to do Y in combination with this package" would often help a lot.

If a maintainer (obvious extension: 3rd party recommendations and
voting) wants to recommend other packages that work and play well with
her package but aren't essential to its function, how about a
dictionary mapping Trove classifiers to lists of recommended packages
for that classifier?
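Concretely, the idea might look something like the following sketch (every name here is invented for illustration; no such "recommends" field exists in today's packaging metadata):

```python
# Hypothetical per-package recommendation metadata: a mapping from
# Trove classifiers (the problem domain Y) to recommended packages Z
# that work well with this package. Purely illustrative -- nothing
# like this exists in current packaging metadata.

recommends = {
    "Topic :: Database :: Front-Ends": ["sqlalchemy", "peewee"],
    "Topic :: Internet :: WWW/HTTP": ["requests", "urllib3"],
}

def recommendations_for(classifier):
    """Look up packages recommended for a given problem domain."""
    return recommends.get(classifier, [])

print(recommendations_for("Topic :: Internet :: WWW/HTTP"))
```

A tool could then aggregate these mappings across installed packages (and, with the third-party voting extension, across external recommenders) to answer "I use X and need something for Y; what do people suggest?"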


From abarnert at yahoo.com  Sat May 23 07:07:34 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Fri, 22 May 2015 22:07:34 -0700
Subject: [Python-ideas] Adding jsonschema to the standard library
In-Reply-To: <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
 <CACac1F9yQEdH4MpOuLS-m9oe8ojcBYh_0JH8KbvtQQimJkwhKw@mail.gmail.com>
 <CADiSq7cmRPQdpC8wv3xyt20dV=Pf9uPfB1k-Q3a6kQH=khvnsQ@mail.gmail.com>
 <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com>
 <CAN-Kwu1Z1eO0X_2nx7Neg7UjXOZTCQ4z+ns3ZBnGyWo0-x6gJg@mail.gmail.com>
 <E203A6E7-158B-47F1-8B47-F2E3C5529282@stufft.io>
 <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <8DF1D599-2DF9-4F36-8235-BAF23B3E0076@yahoo.com>

On May 22, 2015, at 19:59, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> 
> Donald Stufft writes:
> 
>> I think a future area of work is going to be on improving the
>> ability for people who don't know what they want to find out that
>> they want something and which thing they want on PyPI. I'm not
>> entirely sure what this is going to look like
> 
> +1
> 
>> but I think it's an important problem.
> 
> +1
> 
>> It's being solved for very specific cases by starting to have the
>> standard documentation explicitly call out these defacto standards
>> of the Python ecosystem where it makes sense.
> 
> Because that's necessarily centralized, it's a solution to a different
> problem.  We need a decentralized approach to deal with the "people
> who use package X often would benefit from Y too, but don't know where
> to find Y or which implementation to use."  IOW, there needs to be a
> way for X to recommend implementation Z (or implementations Z1 or Z2)
> of Y.
> 
>> This of course does not scale to every single problem domain or
>> module on PyPI so we still need a more general solution.
> 
> The only way we know to scale a web is to embed the solution in the
> nodes.  Currently many packages know what they use internally (the
> install_requires field), but as far as I can see there's no way for a
> package X to recommend "related" packages Z to implement function Y in
> applications using X.  Eg, the plethora of ORMs available, some of
> which work better with particular packages than others do.
> 
> We could also recommend that package maintainers document such
> recommendations, preferably in a fairly standard place, in their
> package documentation.  Even something like "I've successfully used Z
> to do Y in combination with this package" would often help a lot.
> 
> If a maintainer (obvious extension: 3rd party recommendations and
> voting) wants to recommend other packages that work and play well with
> her package but aren't essential to its function, how about a
> dictionary mapping Trove classifiers to lists of recommended packages
> for that classifier?

This is a really cool idea, but it would help to have some specific examples.

For example, BeautifulSoup can only use html5lib or lxml as optional HTML parsers, and lxml as an optional XML parser; nothing else will do any good. But it works well with any HTTP request engine, so any "global" recommendation is a good idea, and it should get the same list (say, requests, urllib3, grequests, pycurl) as any other project that wants to suggest an HTTP request engine. And as for scraper frameworks, that should look at the global recommendations, but restricted to the ones that use, or can use, BeautifulSoup. I'm not sure how to reasonably represent all three of those things in a node.

Of course it's quite possible that I jumped right to a particularly hard example with unique problems that don't need to be solved in general, and really only the first one is necessary, in which case this is a much simpler problem...
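To make the first two cases concrete, a node's record might distinguish hard constraints (explicit package lists) from open-ended roles that defer to a shared "global" list. All names below are invented for illustration, and the hard third case (global lists filtered by compatibility) is deliberately left out:

```python
# Hypothetical recommendation record for BeautifulSoup, separating
# (1) hard constraints: only the listed packages will work, from
# (2) open-ended roles: any engine works, so defer to a shared list.
# Purely a sketch; nothing like this exists in packaging metadata.

GLOBAL = {"http-client": ["requests", "urllib3", "grequests", "pycurl"]}

node = {
    "package": "beautifulsoup4",
    "html-parser": ["html5lib", "lxml"],  # constraint: nothing else works
    "xml-parser": ["lxml"],               # constraint
    "http-client": "@global",             # any engine is fine: defer
}

def resolve(node, role):
    """Return the recommended packages for a role in this node."""
    value = node[role]
    return GLOBAL[role] if value == "@global" else value

print(resolve(node, "html-parser"))   # constrained list
print(resolve(node, "http-client"))   # shared global list
```

The sentinel-versus-list split is one simple encoding; the "scraper frameworks restricted to BeautifulSoup users" case would need something relational on top of it.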


From stephen at xemacs.org  Sat May 23 08:55:16 2015
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 23 May 2015 15:55:16 +0900
Subject: [Python-ideas] Adding jsonschema to the standard library
In-Reply-To: <8DF1D599-2DF9-4F36-8235-BAF23B3E0076@yahoo.com>
References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
 <CACac1F9yQEdH4MpOuLS-m9oe8ojcBYh_0JH8KbvtQQimJkwhKw@mail.gmail.com>
 <CADiSq7cmRPQdpC8wv3xyt20dV=Pf9uPfB1k-Q3a6kQH=khvnsQ@mail.gmail.com>
 <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com>
 <CAN-Kwu1Z1eO0X_2nx7Neg7UjXOZTCQ4z+ns3ZBnGyWo0-x6gJg@mail.gmail.com>
 <E203A6E7-158B-47F1-8B47-F2E3C5529282@stufft.io>
 <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp>
 <8DF1D599-2DF9-4F36-8235-BAF23B3E0076@yahoo.com>
Message-ID: <87fv6nhjfv.fsf@uwakimon.sk.tsukuba.ac.jp>

Andrew Barnert writes:

 > > If a maintainer (obvious extension: 3rd party recommendations and
 > > voting) wants to recommend other packages that work and play well with
 > > her package but aren't essential to its function, how about a
 > > dictionary mapping Trove classifiers to lists of recommended packages
 > > for that implementation?
 > 
 > This is a really cool idea, but it would help to have some specific examples.
 > 
 > For example, BeautifulSoup can only use html5lib or lxml as
 > optional HTML parsers, and lxml as an optional XML parser; nothing
 > else will do any good. But it works well with any HTTP request
 > engine, so any "global" recommendation is a good idea, so it should
 > get the same list (say, requests, urllib3, grequests, pycurl) as
 > any other project that wants to suggest an HTTP request engine. And
 > as for scraper frameworks, that should look at the global
 > recommendations, but restricted to the ones that use, or can use,
 > BeautifulSoup. I'm not sure how to reasonably represent all three
 > of those things in a node.

Well, #2 is easy.  You just have a special "global" node that has the
same kind of classifier->package map, and link to that.  I don't think
#3 can be handled so easily, and probably it's not really worth it
complexifying things that far at first -- I think you probably need
most of SQL to express such constraints.  I suspect that I would
handle #3 with a special sort of "group" package, that just requires
certain classifiers and then recommends implementations of them that
work well together.  It would be easy for the database to
automatically update a group's recommended implementations to point to
the group (which would be yet another new attribute for the package).
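A minimal sketch of how such a classifier -> package map and "global" node might fit together (all names below are illustrative; nothing here is an actual PyPI or packaging feature):

```python
# Hypothetical recommendation graph; the classifier strings and package
# names are only for illustration, not a real schema.
GLOBAL = "__global__"

RECOMMENDATIONS = {
    # The special "global" node: ecosystem-wide suggestions per classifier.
    GLOBAL: {
        "Topic :: Internet :: WWW/HTTP": [
            "requests", "urllib3", "grequests", "pycurl",
        ],
    },
    # Per-package nodes may list their own suggestions, or link to GLOBAL.
    "beautifulsoup4": {
        "Topic :: Text Processing :: Markup :: HTML": ["html5lib", "lxml"],
        "Topic :: Text Processing :: Markup :: XML": ["lxml"],
        "Topic :: Internet :: WWW/HTTP": GLOBAL,
    },
}

def recommended(package, classifier):
    """Resolve a package's recommendations, following links to the global node."""
    entry = RECOMMENDATIONS.get(package, {}).get(classifier, [])
    if entry == GLOBAL:
        entry = RECOMMENDATIONS[GLOBAL].get(classifier, [])
    return entry

print(recommended("beautifulsoup4", "Topic :: Internet :: WWW/HTTP"))
# ['requests', 'urllib3', 'grequests', 'pycurl']
```

Case #3 above (restricting the global list to packages compatible with a given consumer) would need an extra compatibility relation on top of this, which is exactly the part that starts to look like a real query language.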

I'll take a look at the whole shebang and see if I can come up with
something a bit more elegant than the crockery of adhoc-ery above, but
it will be at least next week before I have anything to say.

Steve

From p.andrefreitas at gmail.com  Sat May 23 13:00:22 2015
From: p.andrefreitas at gmail.com (=?UTF-8?Q?Andr=C3=A9_Freitas?=)
Date: Sat, 23 May 2015 11:00:22 +0000
Subject: [Python-ideas] Cmake as build system
In-Reply-To: <67F8BEBA-DEDD-4983-9F0D-3A3D81BA13CD@yahoo.com>
References: <CAMkX=YUGgPXvj08GhME53-6VuDATg0N7asYEpQMaaGy2Prc43w@mail.gmail.com>
 <CAO41-mPo_CVTRWGDzU23MMZFj11R_9FAYeKgihTy+vXVi+t81w@mail.gmail.com>
 <CAMkX=YWL+TYzs2AS5Qe1+kqoD4myTuqVbT5o-TG_sq4HX7485w@mail.gmail.com>
 <67F8BEBA-DEDD-4983-9F0D-3A3D81BA13CD@yahoo.com>
Message-ID: <CAMkX=YUYVtpSNjNfMcMTzpRmT=w8M6n+Mhd-wu+CyyXJwsuFqQ@mail.gmail.com>

Andrew,
Thanks for sharing your thoughts.

I am trying to write a CMake file for CPython for those who want to
contribute using the CLion IDE, and I will put it in a public repository. If
it can be useful to some developers, it's worth sharing.

Best regards,
André Freitas



On Sat, 23 May 2015 at 03:08, Andrew Barnert <abarnert at yahoo.com>
wrote:

> Sorry, meant to include this in my previous reply, but I accidentally cut
> and didn't paste...
>
> Sent from my iPhone
>
> On May 22, 2015, at 17:08, André Freitas <p.andrefreitas at gmail.com> wrote:
>
> Hi,
>
> Thanks for sharing Ryan Gonzalez :)
>
> It just could be another alternative and not a replacement of autotools.
>
>
> If the problem is that the autotools build system is a nightmare to
> maintain, how is having two completely different complex build systems that
> have to be kept perfectly in sync not going to be an even bigger nightmare?
>
> Not only about the cross-platform feature of Cmake but the integration
> with modern IDEs. I really see an improvement in productivity using the IDE
> debugger (e.g Clion) instead of using prints everywhere (
> http://programmers.stackexchange.com/questions/78152/real-programmers-use-debuggers
> ).
>
>
> Why did you link to a question that was migrated and then closed as not
> constructive, and that was written to argue that debuggers are useless, and
> whose contrary answers only talk about command-line debugging rather than
> whether a GUI wrapper can help debugging? That seems to argue against your
> case, not for it...
>
> 2015-05-23 0:48 GMT+01:00 Ryan Gonzalez <rymg19 at gmail.com>:
>
>> HAHAHA!!
>>
>> Good luck! I've raised this issue before. Twice. Autotools sucks. And
>> makes cross-compiling a pain in the neck. Bottom line was:
>>
>> - C++ is a big dependency
>> - The autotools build system has been tested already on lots and lots and
>> lots of platforms
>> - Nobody has even implemented an alternative build system for Python 3
>> yet (python-cmake is only for Python 2)
>> - No one can agree on a best build system (for instance, I hate CMake!)
>>
>>
>> On Fri, May 22, 2015 at 6:08 PM, André Freitas <p.andrefreitas at gmail.com>
>> wrote:
>>
>>> Hi,
>>> What do you think about using the CMake build system?
>>>
>>> I see advantages such as:
>>> - Cross-platform;
>>> - Supported in the CLion IDE (amazing C/C++ IDE, breakpoints, etc);
>>> - Simple and easy to use (Zen of Python :)
>>> https://www.python.org/dev/peps/pep-0020/ );
>>>
>>> I was actually following a discussion in python-committers about Windows 7
>>> buildbots failing. Found that someone already had the same idea, but I
>>> don't know if it was shared here:
>>> http://www.vtk.org/Wiki/BuildingPythonWithCMake
>>>
>>> Please share your thoughts.
>>>
>>> Regards,
>>> André Freitas
>>>
>>> _______________________________________________
>>> Python-ideas mailing list
>>> Python-ideas at python.org
>>> https://mail.python.org/mailman/listinfo/python-ideas
>>> Code of Conduct: http://python.org/psf/codeofconduct/
>>>
>>
>>
>>
>> --
>> Ryan
>> [ERROR]: Your autotools build scripts are 200 lines longer than your
>> program. Something's wrong.
>> http://kirbyfan64.github.io/
>>
>>
>
>
>
> --
> André Freitas
> p.andrefreitas at gmail.com
> "Imagination is more important than knowledge" - Albert Einstein
> *google+* AndréFreitas92 <https://plus.google.com/+AndréFreitas92>
> *linkedin* pandrefreitas <http://pt.linkedin.com/in/pandrefreitas/>
> *github* andrefreitas <https://github.com/andrefreitas>
> *website* www.andrefreitas.pt <http://andrefreitas.pt>
> This message may contain confidential or privileged information, its
> secrecy being protected by law. If you are not the addressee or a person
> authorized to receive this message, you may not use, copy or disclose the
> information contained in it, or take any action based on that information.
> If you received this message by mistake, please notify the sender
> immediately by replying to this e-mail and then delete it. We appreciate
> your cooperation.
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150523/ab03c648/attachment.html>

From ncoghlan at gmail.com  Sat May 23 16:21:48 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 24 May 2015 00:21:48 +1000
Subject: [Python-ideas] Adding jsonschema to the standard library
In-Reply-To: <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
 <CACac1F9yQEdH4MpOuLS-m9oe8ojcBYh_0JH8KbvtQQimJkwhKw@mail.gmail.com>
 <CADiSq7cmRPQdpC8wv3xyt20dV=Pf9uPfB1k-Q3a6kQH=khvnsQ@mail.gmail.com>
 <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com>
 <CAN-Kwu1Z1eO0X_2nx7Neg7UjXOZTCQ4z+ns3ZBnGyWo0-x6gJg@mail.gmail.com>
 <E203A6E7-158B-47F1-8B47-F2E3C5529282@stufft.io>
 <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <CADiSq7dVrqjU9bQRufXPQAO+y3oYxQBUg9kEySTqKh6yeVpXSw@mail.gmail.com>

On 23 May 2015 at 12:59, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Donald Stufft writes:
>  > It's being solved for very specific cases by starting to have the
>  > standard documentation explicitly call out these defacto standards
>  > of the Python ecosystem where it makes sense.
>
> Because that's necessarily centralized, it's a solution to a different
> problem.  We need a decentralized approach to deal with the "people
> who use package X often would benefit from Y too, but don't know where
> to find Y or which implementation to use."  IOW, there needs to be a
> way for X to recommend implementation Z (or implementations Z1 or Z2)
> of Y.

https://www.djangopackages.com/ covers this well for the Django
ecosystem (I actually consider it to be one of Django's killer
features, and I'm pretty sure I'm not alone in that - like
ReadTheDocs, it was a product of DjangoDash 2010).

There was an effort a few years back to set up an instance of that for
PyPI in general, as well as similar comparison sites for Pyramid and
Plone, but none of them ever hit the same kind of critical mass of
useful input as the Django one.

The situation has changed substantially since then, though, as we've
been more actively promoting pip, PyPI and third party libraries as
part of the recommended Python developer experience, and the main
standard library documentation now delegates to packaging.python.org
for the details after very brief introductions to installing and
publishing packages.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Sat May 23 16:41:08 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 24 May 2015 00:41:08 +1000
Subject: [Python-ideas] Cmake as build system
In-Reply-To: <67F8BEBA-DEDD-4983-9F0D-3A3D81BA13CD@yahoo.com>
References: <CAMkX=YUGgPXvj08GhME53-6VuDATg0N7asYEpQMaaGy2Prc43w@mail.gmail.com>
 <CAO41-mPo_CVTRWGDzU23MMZFj11R_9FAYeKgihTy+vXVi+t81w@mail.gmail.com>
 <CAMkX=YWL+TYzs2AS5Qe1+kqoD4myTuqVbT5o-TG_sq4HX7485w@mail.gmail.com>
 <67F8BEBA-DEDD-4983-9F0D-3A3D81BA13CD@yahoo.com>
Message-ID: <CADiSq7cx6gXCa=QWBUMzA+0i+r41Zsb5VjRHXyTv-4XYetY9oA@mail.gmail.com>

On 23 May 2015 at 12:08, Andrew Barnert via Python-ideas <
python-ideas at python.org> wrote:

> Sorry, meant to include this in my previous reply, but I accidentally cut
> and didn't paste...
>
> Sent from my iPhone
>
> On May 22, 2015, at 17:08, André Freitas <p.andrefreitas at gmail.com> wrote:
>
> Hi,
> Thanks for sharing Ryan Gonzalez :)
>
> It just could be another alternative and not a replacement of autotools.
>
>
> If the problem is that the autotools build system is a nightmare to
> maintain, how is having two completely different complex build systems that
> have to be kept perfectly in sync not going to be an even bigger nightmare?
>

Three - we already have to keep autotools and the MSVS solution in sync
(except where they're deliberately different, such as always bundling
OpenSSL on Windows).

I don't think there's actually any active *opposition* to replacing
autotools, there just aren't currently any sufficiently compelling
alternatives out there to motivate someone to do all the work involved in
proposing a change, working through all the build requirements across all
the different redistributor channels (including the nascent iOS and Android
support being pursued on mobile-sig), and figuring out how to get from
point A to point B without breaking the world at any point in the process.

That said, I'll admit that to someone interested in the alternatives,
listing some of the problems that autotools is currently solving for us may
*sound* like opposition, rather than accurately scoping out the problem
requirements and the transition to be managed :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150524/3fe355e3/attachment-0001.html>

From p.andrefreitas at gmail.com  Sun May 24 01:28:58 2015
From: p.andrefreitas at gmail.com (=?UTF-8?Q?Andr=C3=A9_Freitas?=)
Date: Sat, 23 May 2015 23:28:58 +0000
Subject: [Python-ideas] Cmake as build system
In-Reply-To: <CADiSq7cx6gXCa=QWBUMzA+0i+r41Zsb5VjRHXyTv-4XYetY9oA@mail.gmail.com>
References: <CAMkX=YUGgPXvj08GhME53-6VuDATg0N7asYEpQMaaGy2Prc43w@mail.gmail.com>
 <CAO41-mPo_CVTRWGDzU23MMZFj11R_9FAYeKgihTy+vXVi+t81w@mail.gmail.com>
 <CAMkX=YWL+TYzs2AS5Qe1+kqoD4myTuqVbT5o-TG_sq4HX7485w@mail.gmail.com>
 <67F8BEBA-DEDD-4983-9F0D-3A3D81BA13CD@yahoo.com>
 <CADiSq7cx6gXCa=QWBUMzA+0i+r41Zsb5VjRHXyTv-4XYetY9oA@mail.gmail.com>
Message-ID: <CAMkX=YVVjA5kK9sr4gd5n22z9pNJ+uO3F+igpiVP=4_OnM7kEg@mail.gmail.com>

Hi Nick,
I agree with you. You are completely right :)

I am new to the Python mailing lists and to contributing. I hope to suggest
more effective ideas in the future.

Best regards,
André Freitas

On Sat, 23/05/2015, 3:41 PM, Nick Coghlan <ncoghlan at gmail.com>
wrote:

> On 23 May 2015 at 12:08, Andrew Barnert via Python-ideas <
> python-ideas at python.org> wrote:
>
>> Sorry, meant to include this in my previous reply, but I accidentally cut
>> and didn't paste...
>>
>> Sent from my iPhone
>>
>> On May 22, 2015, at 17:08, André Freitas <p.andrefreitas at gmail.com>
>> wrote:
>>
>> Hi,
>> Thanks for sharing Ryan Gonzalez :)
>>
>> It just could be another alternative and not a replacement of autotools.
>>
>>
>> If the problem is that the autotools build system is a nightmare to
>> maintain, how is having two completely different complex build systems that
>> have to be kept perfectly in sync not going to be an even bigger nightmare?
>>
>
> Three - we already have to keep autotools and the MSVS solution in sync
> (except where they're deliberately different, such as always bundling
> OpenSSL on Windows).
>
> I don't think there's actually any active *opposition* to replacing
> autotools, there just aren't currently any sufficiently compelling
> alternatives out there to motivate someone to do all the work involved in
> proposing a change, working through all the build requirements across all
> the different redistributor channels (including the nascent iOS and Android
> support being pursued on mobile-sig), and figuring out how to get from
> point A to point B without breaking the world at any point in the process.
>
> That said, I'll admit that to someone interested in the alternatives,
> listing some of the problems that autotools is currently solving for us may
> *sound* like opposition, rather than accurately scoping out the problem
> requirements and the transition to be managed :)
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150523/1475dca5/attachment.html>

From rymg19 at gmail.com  Sun May 24 01:44:31 2015
From: rymg19 at gmail.com (Ryan Gonzalez)
Date: Sat, 23 May 2015 18:44:31 -0500
Subject: [Python-ideas] Cmake as build system
In-Reply-To: <CAMkX=YVVjA5kK9sr4gd5n22z9pNJ+uO3F+igpiVP=4_OnM7kEg@mail.gmail.com>
References: <CAMkX=YUGgPXvj08GhME53-6VuDATg0N7asYEpQMaaGy2Prc43w@mail.gmail.com>
 <CAO41-mPo_CVTRWGDzU23MMZFj11R_9FAYeKgihTy+vXVi+t81w@mail.gmail.com>
 <CAMkX=YWL+TYzs2AS5Qe1+kqoD4myTuqVbT5o-TG_sq4HX7485w@mail.gmail.com>
 <67F8BEBA-DEDD-4983-9F0D-3A3D81BA13CD@yahoo.com>
 <CADiSq7cx6gXCa=QWBUMzA+0i+r41Zsb5VjRHXyTv-4XYetY9oA@mail.gmail.com>
 <CAMkX=YVVjA5kK9sr4gd5n22z9pNJ+uO3F+igpiVP=4_OnM7kEg@mail.gmail.com>
Message-ID: <56E6005F-B559-4181-8512-42AA303F9C8F@gmail.com>

No worries. Most of my ideas still get vetoed in 10 minutes. :)

On May 23, 2015 6:28:58 PM CDT, "André Freitas" <p.andrefreitas at gmail.com> wrote:
>Hi Nick,
>I agree with you. You are completely right :)
>
>I am new to python mailing list and contributors. Hope to suggest more
>effective ideas in the future.
>
>Best regards,
>André Freitas
>
>A s?b, 23/05/2015, 3:41 da tarde, Nick Coghlan <ncoghlan at gmail.com>
>escreveu:
>
>> On 23 May 2015 at 12:08, Andrew Barnert via Python-ideas <
>> python-ideas at python.org> wrote:
>>
>>> Sorry, meant to include this in my previous reply, but I
>accidentally cut
>>> and didn't paste...
>>>
>>> Sent from my iPhone
>>>
>>> On May 22, 2015, at 17:08, André Freitas <p.andrefreitas at gmail.com>
>>> wrote:
>>>
>>> Hi,
>>> Thanks for sharing Ryan Gonzalez :)
>>>
>>> It just could be another alternative and not a replacement of
>autotools.
>>>
>>>
>>> If the problem is that the autotools build system is a nightmare to
>>> maintain, how is having two completely different complex build
>systems that
>>> have to be kept perfectly in sync not going to be an even bigger
>nightmare?
>>>
>>
>> Three - we already have to keep autotools and the MSVS solution in
>sync
>> (except where they're deliberately different, such as always bundling
>> OpenSSL on Windows).
>>
>> I don't think there's actually any active *opposition* to replacing
>> autotools, there just aren't currently any sufficiently compelling
>> alternatives out there to motivate someone to do all the work
>involved in
>> proposing a change, working through all the build requirements across
>all
>> the different redistributor channels (including the nascent iOS and
>Android
>> support being pursued on mobile-sig), and figuring out how to get
>from
>> point A to point B without breaking the world at any point in the
>process.
>>
>> That said, I'll admit that to someone interested in the alternatives,
>> listing some of the problems that autotools is currently solving for
>us may
>> *sound* like opposition, rather than accurately scoping out the
>problem
>> requirements and the transition to be managed :)
>>
>> Cheers,
>> Nick.
>>
>> --
>> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
>>
>
>
>------------------------------------------------------------------------
>
>_______________________________________________
>Python-ideas mailing list
>Python-ideas at python.org
>https://mail.python.org/mailman/listinfo/python-ideas
>Code of Conduct: http://python.org/psf/codeofconduct/

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150523/03fcd0a8/attachment.html>

From ncoghlan at gmail.com  Sun May 24 02:32:18 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 24 May 2015 10:32:18 +1000
Subject: [Python-ideas] Cmake as build system
In-Reply-To: <56E6005F-B559-4181-8512-42AA303F9C8F@gmail.com>
References: <CAMkX=YUGgPXvj08GhME53-6VuDATg0N7asYEpQMaaGy2Prc43w@mail.gmail.com>
 <CAO41-mPo_CVTRWGDzU23MMZFj11R_9FAYeKgihTy+vXVi+t81w@mail.gmail.com>
 <CAMkX=YWL+TYzs2AS5Qe1+kqoD4myTuqVbT5o-TG_sq4HX7485w@mail.gmail.com>
 <67F8BEBA-DEDD-4983-9F0D-3A3D81BA13CD@yahoo.com>
 <CADiSq7cx6gXCa=QWBUMzA+0i+r41Zsb5VjRHXyTv-4XYetY9oA@mail.gmail.com>
 <CAMkX=YVVjA5kK9sr4gd5n22z9pNJ+uO3F+igpiVP=4_OnM7kEg@mail.gmail.com>
 <56E6005F-B559-4181-8512-42AA303F9C8F@gmail.com>
Message-ID: <CADiSq7dU8d8RbBhFfmRx_F3eKzw-A4M0xOCYMi5Smc1z7+bxWQ@mail.gmail.com>

On 24 May 2015 09:44, "Ryan Gonzalez" <rymg19 at gmail.com> wrote:
>
> No worries. Most of my ideas still get vetoed in 10 minutes. :)

Having a place to publish those is a big part of the reason this list
exists, though.

Even when we ultimately decide an idea *isn't* worth pursuing, the pay-off
is having both a pool of contributors that appreciate the problems with the
idea, as well as a permanent public record of the related discussion.

And that's before we even get to the fact that the first step in having
good ideas is simply having lots of ideas to consider for refinement. Folks
that would prefer to focus their limited time on the
at-least-potentially-plausible suggestions have the option of just
following python-dev and skipping python-ideas entirely. (Respecting that
is why we try to be fairly strict in redirecting more speculative
discussions back here rather than letting them continue indefinitely on
python-dev)

Cheers,
Nick.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150524/891224fa/attachment.html>

From gmludo at gmail.com  Sun May 24 13:56:43 2015
From: gmludo at gmail.com (Ludovic Gasc)
Date: Sun, 24 May 2015 13:56:43 +0200
Subject: [Python-ideas] Adding jsonschema to the standard library
In-Reply-To: <CADiSq7dVrqjU9bQRufXPQAO+y3oYxQBUg9kEySTqKh6yeVpXSw@mail.gmail.com>
References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
 <CACac1F9yQEdH4MpOuLS-m9oe8ojcBYh_0JH8KbvtQQimJkwhKw@mail.gmail.com>
 <CADiSq7cmRPQdpC8wv3xyt20dV=Pf9uPfB1k-Q3a6kQH=khvnsQ@mail.gmail.com>
 <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com>
 <CAN-Kwu1Z1eO0X_2nx7Neg7UjXOZTCQ4z+ns3ZBnGyWo0-x6gJg@mail.gmail.com>
 <E203A6E7-158B-47F1-8B47-F2E3C5529282@stufft.io>
 <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7dVrqjU9bQRufXPQAO+y3oYxQBUg9kEySTqKh6yeVpXSw@mail.gmail.com>
Message-ID: <CAON-fpGUqLJp=6CGvEGMJCvMQamVic3Gf2WER18jxk4_B92+ow@mail.gmail.com>

Hi all,

After reading all the responses, I've changed my mind:
At first glance, the advantage of pushing jsonschema into the Python
standard library is to standardize and promote an existing good practice.
But yes, you're right, it's too early to include it, because the standard
could still change and/or be abandoned for a new good practice, as happened
with SOAP and REST.

It's more future-proof to promote PyPI and pip to Python developers.

Regards.


--
Ludovic Gasc (GMLudo)
http://www.gmludo.eu/

2015-05-23 16:21 GMT+02:00 Nick Coghlan <ncoghlan at gmail.com>:

> On 23 May 2015 at 12:59, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> > Donald Stufft writes:
> >  > It's being solved for very specific cases by starting to have the
> >  > standard documentation explicitly call out these defacto standards
> >  > of the Python ecosystem where it makes sense.
> >
> > Because that's necessarily centralized, it's a solution to a different
> > problem.  We need a decentralized approach to deal with the "people
> > who use package X often would benefit from Y too, but don't know where
> > to find Y or which implementation to use."  IOW, there needs to be a
> > way for X to recommend implementation Z (or implementations Z1 or Z2)
> > of Y.
>
> https://www.djangopackages.com/ covers this well for the Django
> ecosystem (I actually consider it to be one of Django's killer
> features, and I'm pretty sure I'm not alone in that - like
> ReadTheDocs, it was a product of DjangoDash 2010).
>
> There was an effort a few years back to set up an instance of that for
> PyPI in general, as well as similar comparison sites for Pyramid and
> Plone, but none of them ever hit the same kind of critical mass of
> useful input as the Django one.
>
> The situation has changed substantially since then, though, as we've
> been more actively promoting pip, PyPI and third party libraries as
> part of the recommended Python developer experience, and the main
> standard library documentation now delegates to packaging.python.org
> for the details after very brief introductions to installing and
> publishing packages.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150524/c9cedf77/attachment.html>

From gmludo at gmail.com  Mon May 25 00:26:33 2015
From: gmludo at gmail.com (Ludovic Gasc)
Date: Mon, 25 May 2015 00:26:33 +0200
Subject: [Python-ideas] Fwd: [Python-Dev] An yocto change proposal in
 logging module to simplify structured logs support
In-Reply-To: <CAP7+vJJ5Z__GJxvueuj_Dg98Uu7krWxNw34MUR7GZSquC5iNPw@mail.gmail.com>
References: <CAON-fpEHScFMFyjL_uYWY4XtUGskYuYqDi5zgno4aYYFe2GJsw@mail.gmail.com>
 <CAP7+vJJ5Z__GJxvueuj_Dg98Uu7krWxNw34MUR7GZSquC5iNPw@mail.gmail.com>
Message-ID: <CAON-fpEeZ1tscPBUdiEhcxC4MDx5nTkE5-hxWxDfSEquovtSZA@mail.gmail.com>

Hi Python-Ideas ML,

To summarize the idea quickly: I wish to add an "extra" attribute to
LogRecord, to facilitate structured log generation.
For more details, with a use case and example, you can read the message below.

Before pushing the patch to bugs.python.org, I'm interested in your
opinions: the patch seems too simple to be honest.

Regards.
--
Ludovic Gasc (GMLudo)
http://www.gmludo.eu/

---------- Forwarded message ----------
From: Guido van Rossum <guido at python.org>
Date: 2015-05-24 23:44 GMT+02:00
Subject: Re: [Python-Dev] An yocto change proposal in logging module to
simplify structured logs support
To: Ludovic Gasc <gmludo at gmail.com>


Ehh, python-ideas?

On Sun, May 24, 2015 at 10:22 AM, Ludovic Gasc <gmludo at gmail.com> wrote:

> Hi,
>
> 1. The problem
>
> For now, when you want to write a log message, you concatenate the data
> from your context to generate a string: in effect, you convert your
> structured data to a string.
> When a sysadmin needs to debug your logs when something is wrong, he must
> write regular expressions to extract the interesting data.
>
> Often, he must find the beginning of the interesting log entry and follow
> the trail. Sometimes several requests are interleaved in the log at the
> same time, which makes the interesting entries harder to find.
> In effect, with regular expressions, the sysadmin tries to convert the log
> line strings back into structured data.
>
> 2. A possible solution
>
> You could provide a set of regular expressions to your sysadmins to help
> them find the right logs; however, another approach is possible:
> structured logs.
> Instead of breaking your data structure down to push it into the log
> message, the idea is to keep the data structure and attach it as metadata
> of the log message.
> For now, I know of at least Logstash and journald that can handle
> structured logs and provide a query tool to extract logs easily.
>
> 3. A concrete example with structured logs
>
> Like most Web developers, we build HTTP daemons used by several different
> human clients at the same time.
> In the Python source code, supporting structured logs doesn't require a
> big change; you can use the "extra" parameter for that, for example:
>
>     [handle HTTP request]
>     LOG.debug('Receive a create_or_update request',
>               extra={'request_id': request.request_id,
>                      'account_id': account_id,
>                      'aiohttp_request': request,
>                      'payload': str(payload)})
>     [create data in database]
>     LOG.debug('Callflow created',
>               extra={'account_id': account_id,
>                      'request_id': request.request_id,
>                      'aiopg_cursor': cur,
>                      'results': row})
>
> Now, if you want, you can enhance the structured log with a custom logging
> Handler, because the standard journald handler doesn't know how to handle
> aiohttp_request or aiopg_cursor.
> My example is based on journald, but you can write an equivalent version
> with python-logstash:
> ####
> from systemdream.journal.handler import JournalHandler
>
> class Handler(JournalHandler):
>     # Tip: on a system without journald, use socat to test:
>     # socat UNIX-RECV:/run/systemd/journal/socket STDIN
>     def emit(self, record):
>         if record.extra:
>             if 'aiohttp_request' in record.extra:
>                 record.extra['http_method'] = record.extra['aiohttp_request'].method
>                 record.extra['http_path'] = record.extra['aiohttp_request'].path
>                 record.extra['http_headers'] = str(record.extra['aiohttp_request'].headers)
>                 del record.extra['aiohttp_request']
>             if 'aiopg_cursor' in record.extra:
>                 record.extra['pg_query'] = record.extra['aiopg_cursor'].query.decode('utf-8')
>                 record.extra['pg_status_message'] = record.extra['aiopg_cursor'].statusmessage
>                 record.extra['pg_rows_count'] = record.extra['aiopg_cursor'].rowcount
>                 del record.extra['aiopg_cursor']
>         super().emit(record)
> ####
>
> And you can enable this custom handler in your logging config file like
> this:
> [handler_journald]
> class=XXXXXXXXXX.utils.logs.Handler
> args=()
> formatter=detailed
>
> And now, with journalctl, you can easily extract logs, some examples:
> Log messages from the 'lg' account:
>     journalctl ACCOUNT_ID=lg
> All HTTP requests that modify the 'lg' account (PUT, POST and DELETE):
>     journalctl ACCOUNT_ID=lg HTTP_METHOD=PUT HTTP_METHOD=POST HTTP_METHOD=DELETE
> Retrieve all logs from one specific HTTP request:
>     journalctl REQUEST_ID=130b8fa0-6576-43b6-a624-4a4265a2fbdd
> All HTTP requests with a specific path:
>     journalctl HTTP_PATH=/v1/accounts/lg/callflows
> All logs of the "create" function in the file "example.py":
>     journalctl CODE_FUNC=create CODE_FILE=/path/example.py
>
> If you have ever done troubleshooting on a production system, you should
> see the value of this:
> in fact, it's like having SQL query capabilities, but oriented towards
> logging.
> We have used this for a short time on one of our critical daemons that
> handles a lot of requests across several servers, and it has already been
> adopted by our support team.
>
> 4. The yocto issue with the Python logging module
>
> I'm not describing a small part of my professional life here for my own
> pleasure, but to help you understand the context and the usage, because my
> patch for logging is very small.
> If you're an expert in Python logging, you already know that the Handler
> class example I provided above can't run on the standard Python logging,
> because LogRecord doesn't have an "extra" attribute.
>
> The extra parameter exists in the Logger, but in the LogRecord its
> contents are merged in as attributes of the LogRecord:
> https://github.com/python/cpython/blob/master/Lib/logging/__init__.py#L1386
>
> This means that when the LogRecord is sent to the Handler, you can't
> retrieve the dict passed as the logger's extra parameter.
> The only way to do that without patching Python logging is to rebuild the
> dict yourself from a list of the official attributes of LogRecord, as is
> done in python-logstash:
>
> https://github.com/vklochan/python-logstash/blob/master/logstash/formatter.py#L23
> At least to me, that's a little bit dirty.
>
> The quick'n'dirty patch I use for now on our CPython in production:
>
> diff --git a/Lib/logging/__init__.py b/Lib/logging/__init__.py
> index 104b0be..30fa6ef 100644
> --- a/Lib/logging/__init__.py
> +++ b/Lib/logging/__init__.py
> @@ -1382,6 +1382,7 @@ class Logger(Filterer):
>          """
>          rv = _logRecordFactory(name, level, fn, lno, msg, args, exc_info, func,
>                               sinfo)
> +        rv.extra = extra
>          if extra is not None:
>              for key in extra:
>                  if (key in ["message", "asctime"]) or (key in rv.__dict__):
>
> At least to me, it should be cleaner to add "extra" as parameter
> of _logRecordFactory, but I've no idea of side effects, I understand that
> logging module is critical, because it's used everywhere.
> However, except with python-logstash, to my knowledge, extra parameter
> isn't massively used.
> The only backward incompatibility I see with a new extra attribute of
> LogRecord is that if you have a log call like this:
>     LOG.debug('message', extra={'extra': 'example'})
> it will raise a KeyError("Attempt to overwrite 'extra' in LogRecord")
> exception; but, at least to me, the probability of this use case is near
> 0.
>
> Rather than "maintaining" this yocto patch, small as it is, I would
> prefer a clean solution in Python directly.
>
> Thanks for your remarks.
>
> Regards.
> --
> Ludovic Gasc (GMLudo)
> http://www.gmludo.eu/
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/guido%40python.org
>
>


-- 
--Guido van Rossum (python.org/~guido)

From steve at pearwood.info  Mon May 25 04:19:07 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Mon, 25 May 2015 12:19:07 +1000
Subject: [Python-ideas] Fwd: [Python-Dev] An yocto change proposal in
	logging module to simplify structured logs support
In-Reply-To: <CAON-fpEeZ1tscPBUdiEhcxC4MDx5nTkE5-hxWxDfSEquovtSZA@mail.gmail.com>
References: <CAON-fpEHScFMFyjL_uYWY4XtUGskYuYqDi5zgno4aYYFe2GJsw@mail.gmail.com>
 <CAP7+vJJ5Z__GJxvueuj_Dg98Uu7krWxNw34MUR7GZSquC5iNPw@mail.gmail.com>
 <CAON-fpEeZ1tscPBUdiEhcxC4MDx5nTkE5-hxWxDfSEquovtSZA@mail.gmail.com>
Message-ID: <20150525021907.GD5663@ando.pearwood.info>

On Mon, May 25, 2015 at 12:26:33AM +0200, Ludovic Gasc wrote:
> Hi Python-Ideas ML,
> 
> To resume quickly the idea: I wish to add "extra" attribute to LogMessage,
> to facilitate structured logs generation.

The documentation for the logging module already includes a recipe for 
simple structured logging:

https://docs.python.org/2/howto/logging-cookbook.html#implementing-structured-logging

At the other extreme, there is the structlog module:

https://structlog.readthedocs.org/en/stable/

How does your change compare to those?


-- 
Steve

From rustompmody at gmail.com  Mon May 25 07:06:00 2015
From: rustompmody at gmail.com (Rustom Mody)
Date: Mon, 25 May 2015 10:36:00 +0530
Subject: [Python-ideas] Framework for Python for CS101
Message-ID: <CAJ+TeoeaQog7s8bo=CuhTH-Y=CVR9J8B0-vX=1AUKxsiQimz3Q@mail.gmail.com>

Context:  A bunch of my students will be working with me (if all goes
according to plan!!) to hack on/in CPython sources.

One of the things we would like to try is a framework for CS101 [Intro to
programming]

So for example beginners get knocked out by None 'disappearing' from the
prompt
Correctable by

>>> import sys
>>> sys.displayhook = print

Now of course one can say: "If you want that behavior, set it as you choose."
However, at the stage where beginners are tripped up by such things, setting up a
pythonstartup file is a little premature.
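Such a 'noob ring' tweak can at least be sketched as a tiny startup file (the file name is hypothetical; `sys.displayhook` and `builtins._` are the real interpreter hooks involved). Unlike a bare `sys.displayhook = print`, this keeps repr() formatting:

```python
# teach_noob.py -- hypothetical startup snippet for the innermost 'ring'
import builtins
import sys

def verbose_displayhook(value):
    """Show every REPL result, including None (which the default hook hides)."""
    print(repr(value))
    builtins._ = value  # keep the result reachable as _, as in the normal REPL

sys.displayhook = verbose_displayhook
```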

So the idea (inspired by Scheme's racket) is to have a sequence of
'teachpacks'.
They are like concentric rings, the innermost one being the noob ring, the
outermost one being standard python.

Now note that while the larger changes would in general be restrictions, ie
subsetting standard python, they may not be easily settable in
PYTHONSTARTUP.
eg sorted function and sort method confusion
extend/append/etc mutable methods vs immutable '+'

Now different teachers may like to navigate the world of python differently.
So for example I prefer to start with the immutable (functional) subset and
go on to the stateful/imperative.  The point (here) is not so much which is
preferable as that a given teacher should have the freedom to
chart out a course through python in which (s)he can cross out certain
features at certain points for students.  So a teacher preferring to
emphasise OO/imperative over functional may prefer the opposite choice.

[Aside:  ACM curriculum 2013 juxtaposes OO and FP as absolute basic in core
CS
https://www.acm.org/education/CS2013-final-report.pdf
pgs 157,158
]

So the idea is to make a framework for teachers to easily configure and
select teachpacks to their taste.

How does that sound?

Rusi

From abarnert at yahoo.com  Mon May 25 10:01:10 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Mon, 25 May 2015 01:01:10 -0700
Subject: [Python-ideas] Framework for Python for CS101
In-Reply-To: <CAJ+TeoeaQog7s8bo=CuhTH-Y=CVR9J8B0-vX=1AUKxsiQimz3Q@mail.gmail.com>
References: <CAJ+TeoeaQog7s8bo=CuhTH-Y=CVR9J8B0-vX=1AUKxsiQimz3Q@mail.gmail.com>
Message-ID: <C61507D3-B5AA-445B-839A-1A9F1D5A9C7B@yahoo.com>

On May 24, 2015, at 22:06, Rustom Mody <rustompmody at gmail.com> wrote:
> 
> Context:  A bunch of my students will be working with me (if all goes according to plan!!)to hack on/in CPython sources. 
> 
> One of the things we would like to try is a framework for CS101 [Intro to programming]
> 
> So for example beginners get knocked out by None 'disappearing' from the prompt
> Correctable by
> 
> >>> import sys 
> >>> sys.displayhook = print 
> 
> Now of course one can say: "If you want that behavior, set it as you choose"
> However at the stage that beginners are knocked down by such, setting up a pythonstartup file is a little premature.
> 
> So the idea (inspired by Scheme's racket) is to have a sequence of 'teachpacks'.
> They are like concentric rings, the innermost one being the noob ring, the outermost one being standard python.

How exactly does this work? Is it basically just a custom pythonstartup file that teachers can give to their students? Maybe with some menu- or wizard-based configuration to help create the file? Or is this some different mechanism? If so, what does setting it up, and distributing it to students, look like?

I realize that below you talk about doing things that are currently not easy to do in a pythonstartup, like hiding all mutating sequence methods, but presumably the patches to the interpreter core would be something like adding hide_mutating_sequence_methods() and similar functions that teachers could then choose to include in the pythonstartup file or whatever they give out.

> Now note that while the larger changes would in general be restrictions, ie subsetting standard python, they may not be easily settable in PYTHONSTARTUP.
> eg sorted function and sort method confusion
> extend/append/etc mutable methods vs immutable '+'
> 
> Now different teachers may like to navigate the world of python differently.
> So for example I prefer to start with the immutable (functional) subset and go on to the stateful/imperative.  The point (here) is not so much which is preferable so much as this that a given teacher should have the freedom to chart out a course through python in which (s)he can cross out certain features at certain points for students.  So a teacher preferring to emphasise OO/imperative over functional may prefer the opposite choice.
> 
> [Aside:  ACM curriculum 2013 juxtaposes OO and FP as absolute basic in core CS 
> https://www.acm.org/education/CS2013-final-report.pdf
> pgs 157,158
> ]
> 
> So the idea is to make a framework for teachers to easily configure and select teachpacks to their taste.
> 
> How does that sound?
> 
> Rusi
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From gmludo at gmail.com  Mon May 25 15:56:40 2015
From: gmludo at gmail.com (Ludovic Gasc)
Date: Mon, 25 May 2015 15:56:40 +0200
Subject: [Python-ideas] Fwd: [Python-Dev] An yocto change proposal in
 logging module to simplify structured logs support
In-Reply-To: <20150525021907.GD5663@ando.pearwood.info>
References: <CAON-fpEHScFMFyjL_uYWY4XtUGskYuYqDi5zgno4aYYFe2GJsw@mail.gmail.com>
 <CAP7+vJJ5Z__GJxvueuj_Dg98Uu7krWxNw34MUR7GZSquC5iNPw@mail.gmail.com>
 <CAON-fpEeZ1tscPBUdiEhcxC4MDx5nTkE5-hxWxDfSEquovtSZA@mail.gmail.com>
 <20150525021907.GD5663@ando.pearwood.info>
Message-ID: <CAON-fpFiNMRSbMKQq+5MGuOf6rjzHBrrrgSYKGmKWKoPVWG9yw@mail.gmail.com>

2015-05-25 4:19 GMT+02:00 Steven D'Aprano <steve at pearwood.info>:

> On Mon, May 25, 2015 at 12:26:33AM +0200, Ludovic Gasc wrote:
> > Hi Python-Ideas ML,
> >
> > To resume quickly the idea: I wish to add "extra" attribute to
> LogMessage,
> > to facilitate structured logs generation.
>
> The documentation for the logging module already includes a recipe for
> simple structured logging:
>
>
> https://docs.python.org/2/howto/logging-cookbook.html#implementing-structured-logging


If I understand this recipe correctly, it "only" standardizes the log
message content => not really sysadmin-friendly to read, but most
importantly, you must still parse the messages and build a database of
structured logs to query.
When you have more than 400+ log messages each second on a single server,
rebuilding the data structure isn't a negligible cost, compared to pushing
structured data directly onto the wire, directly understandable by
your structured-log daemon.


>
>
> At the other extreme, there is the structlog module:
>
> https://structlog.readthedocs.org/en/stable/


Thank you for the link; it's an interesting project, like the "logging"
module on steroids, with some good logging ideas inside.
However, if I understand correctly, it's the same approach as
the previous recipe: generate a log file with JSON content,
use logstash-forwarder to reparse the JSON content, and finally send the
structure to logstash for the query part:
https://structlog.readthedocs.org/en/stable/standard-library.html#suggested-configuration


> How does your change compare to those?
>

Compared with the structlog use case, my change drops the
logstash-forwarder step and connects the Python daemon directly to the
structured-log daemon.
Even if logstash-forwarder is efficient, why have an additional step to
rebuild a structure you already had at the beginning?

It's certainly possible to monkey-patch or override the logging module to
get this behaviour; nevertheless, it would be cleaner for it to be directly
integrated in Python.
Moreover, with the "extra" parameter, 99% of the work is
already done in Python; my addition only keeps the list of
metadata explicit in the LogRecord.
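The override route can be sketched with the documented `Logger.makeRecord` hook; the class and handler names below are illustrative, not part of the proposal:

```python
import logging

class StructuredLogger(logging.Logger):
    """Keep the original extra dict on the record, next to the merged attributes."""
    def makeRecord(self, name, level, fn, lno, msg, args, exc_info,
                   func=None, extra=None, sinfo=None):
        record = super().makeRecord(name, level, fn, lno, msg, args,
                                    exc_info, func, extra, sinfo)
        record.extra = extra  # same effect as the one-line patch quoted earlier
        return record

logging.setLoggerClass(StructuredLogger)

class CaptureHandler(logging.Handler):
    """Toy stand-in for a handler that would ship record.extra over the wire."""
    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)

log = logging.getLogger('demo')
log.propagate = False  # keep the toy example from also hitting the root logger
handler = CaptureHandler()
log.addHandler(handler)

log.warning('user logged in', extra={'user_id': 42})
# handler.records[0].extra   -> {'user_id': 42}
# handler.records[0].user_id -> 42 (the existing attribute merge still happens)
```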

The nice-to-have, at least to me, is that the extra dict should also be
usable to format the message string, to avoid passing the same information twice.

If this discussion doesn't raise blocking remarks, I'll send a patch on
bugs.python.org.

Regards.


>
>
> --
> Steve
>

From chris.barker at noaa.gov  Mon May 25 18:43:36 2015
From: chris.barker at noaa.gov (Chris Barker)
Date: Mon, 25 May 2015 09:43:36 -0700
Subject: [Python-ideas] Framework for Python for CS101
In-Reply-To: <C61507D3-B5AA-445B-839A-1A9F1D5A9C7B@yahoo.com>
References: <CAJ+TeoeaQog7s8bo=CuhTH-Y=CVR9J8B0-vX=1AUKxsiQimz3Q@mail.gmail.com>
 <C61507D3-B5AA-445B-839A-1A9F1D5A9C7B@yahoo.com>
Message-ID: <CALGmxEJMB6gweSro0Yhx8gSmX1gjtSXgmjpV+3bbvjCmY4c4WA@mail.gmail.com>

Just a note here, that (as an intro to python teacher), I think this is a
pedagogically bad idea.

At least if the goal is to teach Python -- while you don't need to
introduce all the complexity up front, hiding it just sends students down
the wrong track.

On the other hand, if you want a kind-of-like-python-but-simpler language
to teach particular computer science concepts, this kind of hacking may be
of value.

But I don't think it would be a good idea to build that capability into
Python itself. And I think you can hack it in with monkey patching anyway
-- so that's probably the way to go.

for example:

"""So for example I prefer to start with the immutable (functional)
subset"""

you can certainly do that by simply using tuples and the functional tools.

(OK, maybe not -- after all most (all?) of the functional stuff returns
lists, not tuples, and that may be beyond monkey-patching)

But that's going to be a lot of hacking to change.

Is it so bad to have them work with lists in a purely functional way?

-Chris



On Mon, May 25, 2015 at 1:01 AM, Andrew Barnert via Python-ideas <
python-ideas at python.org> wrote:

> On May 24, 2015, at 22:06, Rustom Mody <rustompmody at gmail.com> wrote:
>
> Context:  A bunch of my students will be working with me (if all goes
> according to plan!!)to hack on/in CPython sources.
>
> One of the things we would like to try is a framework for CS101 [Intro to
> programming]
>
> So for example beginners get knocked out by None 'disappearing' from the
> prompt
> Correctable by
>
> >>> import sys
> >>> sys.displayhook = print
>
> Now of course one can say: "If you want that behavior, set it as you
> choose"
> However at the stage that beginners are knocked down by such, setting up a
> pythonstartup file is a little premature.
>
> So the idea (inspired by Scheme's racket) is to have a sequence of
> 'teachpacks'.
> They are like concentric rings, the innermost one being the noob ring, the
> outermost one being standard python.
>
>
> How exactly does this work? Is it basically just a custom pythonstartup
> file that teachers can give to their students? Maybe with some menu- or
> wizard-based configuration to help create the file? Or is this some
> different mechanism? If so, what does setting it up, and distributing it to
> students, look like?
>
> I realize that below you talk about doing things that are currently not
> easy to do in a pythonstartup, like hiding all mutating sequence methods,
> but presumably the patches to the interpreter core would be something like
> adding hide_mutating_sequence_methods() and similar functions that teachers
> could then choose to include in the pythonstartup file or whatever they
> give out.
>
> Now note that while the larger changes would in general be restrictions,
> ie subsetting standard python, they may not be easily settable in
> PYTHONSTARTUP.
> eg sorted function and sort method confusion
> extend/append/etc mutable methods vs immutable '+'
>
> Now different teachers may like to navigate the world of python
> differently.
> So for example I prefer to start with the immutable (functional) subset
> and go on to the stateful/imperative.  The point (here) is not so much
> which is preferable so much as this that a given teacher should have the
> freedom to chart out a course through python in which (s)he can cross out
> certain features at certain points for students.  So a teacher preferring
> to emphasise OO/imperative over functional may prefer the opposite choice.
>
> [Aside:  ACM curriculum 2013 juxtaposes OO and FP as absolute basic in
> core CS
> https://www.acm.org/education/CS2013-final-report.pdf
> pgs 157,158
> ]
>
> So the idea is to make a framework for teachers to easily configure and
> select teachpacks to their taste.
>
> How does that sound?
>
> Rusi
>
>



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From rustompmody at gmail.com  Mon May 25 14:11:19 2015
From: rustompmody at gmail.com (Rustom Mody)
Date: Mon, 25 May 2015 05:11:19 -0700 (PDT)
Subject: [Python-ideas] Framework for Python for CS101
In-Reply-To: <C61507D3-B5AA-445B-839A-1A9F1D5A9C7B@yahoo.com>
References: <CAJ+TeoeaQog7s8bo=CuhTH-Y=CVR9J8B0-vX=1AUKxsiQimz3Q@mail.gmail.com>
 <C61507D3-B5AA-445B-839A-1A9F1D5A9C7B@yahoo.com>
Message-ID: <2bbfe6bb-40a8-4095-923a-a5d86c76ccb7@googlegroups.com>


On Monday, May 25, 2015 at 1:31:58 PM UTC+5:30, Andrew Barnert via 
Python-ideas wrote:
>
> On May 24, 2015, at 22:06, Rustom Mody <rusto... at gmail.com> 
> wrote:
>
> Context:  A bunch of my students will be working with me (if all goes 
> according to plan!!)to hack on/in CPython sources. 
>
> One of the things we would like to try is a framework for CS101 [Intro to 
> programming]
>
> So for example beginners get knocked out by None 'disappearing' from the 
> prompt
> Correctable by
>
> >>> import sys 
> >>> sys.displayhook = print 
>
> Now of course one can say: "If you want that behavior, set it as you 
> choose"
> However at the stage that beginners are knocked down by such, setting up a 
> pythonstartup file is a little premature.
>
> So the idea (inspired by Scheme's racket) is to have a sequence of 
> 'teachpacks'.
> They are like concentric rings, the innermost one being the noob ring, the 
> outermost one being standard python.
>
>
> How exactly does this work? Is it basically just a custom pythonstartup 
> file that teachers can give to their students? Maybe with some menu- or 
> wizard-based configuration to help create the file? Or is this some 
> different mechanism? If so, what does setting it up, and distributing it to 
> students, look like?
>

Frankly I've not thought through these details in detail(!) 
 

> I realize that below you talk about doing things that are currently not 
> easy to do in a pythonstartup, like hiding all mutating sequence methods, 
> but presumably the patches to the interpreter core would be something like 
> adding hide_mutating_sequence_methods() and similar functions that teachers 
> could then choose to include in the pythonstartup file or whatever they 
> give out.
>
>
I personally would wish for other minor surgeries, eg a different keyword 
from 'def' for generators.
From the pov of an experienced programmer, the mental load of one keyword 
for two disparate purposes is easy enough to handle, and the language 
clutter from an extra keyword is probably just not worth it.
However, from having taught python for 10+ years, I can say this 
'overloading' causes endless grief and slowdown for beginners.

Then there are even more wishful-thinking changes -- distinguishing 
procedures from functions.
After 30 years of Lisp and ML and ... and Haskell, and 
square-peg-into-round-holing these into python, I've come to the conclusion 
that Pascal got this distinction more right than all of them.  However I 
expect this surgery to be more invasive and pervasive than I can handle 
with my (current) resources.
etc
etc
In short I am talking of a language that is morally equivalent to python 
but cosmetically different, designed to be conducive to learning 
programming.


From wes.turner at gmail.com  Mon May 25 20:17:32 2015
From: wes.turner at gmail.com (Wes Turner)
Date: Mon, 25 May 2015 13:17:32 -0500
Subject: [Python-ideas] Framework for Python for CS101
In-Reply-To: <CAJ+TeoeaQog7s8bo=CuhTH-Y=CVR9J8B0-vX=1AUKxsiQimz3Q@mail.gmail.com>
References: <CAJ+TeoeaQog7s8bo=CuhTH-Y=CVR9J8B0-vX=1AUKxsiQimz3Q@mail.gmail.com>
Message-ID: <CACfEFw931iA8Bmrqa1opgLm-cYBQpEMf2dXQY8nT2OdStNpGzQ@mail.gmail.com>

On Mon, May 25, 2015 at 12:06 AM, Rustom Mody <rustompmody at gmail.com> wrote:

> Context:  A bunch of my students will be working with me (if all goes
> according to plan!!)to hack on/in CPython sources.
>
> One of the things we would like to try is a framework for CS101 [Intro to
> programming]
>

You said framework, and I thought 'web framework' and 'testing':

* Bottle is great; simple; single file; and WSGI-compatible
* TDD! from the start!
https://westurner.org/wiki/awesome-python-testing#web-frameworks


>
> So for example beginners get knocked out by None 'disappearing' from the
> prompt
> Correctable by
>
> >>> import sys
> >>> sys.displayhook = print
>
> Now of course one can say: "If you want that behavior, set it as you
> choose"
> However at the stage that beginners are knocked down by such, setting up a
> pythonstartup file is a little premature.
>
> So the idea (inspired by Scheme's racket) is to have a sequence of
> 'teachpacks'.
> They are like concentric rings, the innermost one being the noob ring, the
> outermost one being standard python.
>

In terms of a curriculum graph, are they flat or nested dependencies?


>
> Now note that while the larger changes would in general be restrictions,
> ie subsetting standard python, they may not be easily settable in
> PYTHONSTARTUP.
> eg sorted function and sort method confusion
> extend/append/etc mutable methods vs immutable '+'
>

I add tab-completion in ~/.pythonrc (and little more; so that my scripts
work without trying to remember imports etc) (see:
gh:westurner/dotfiles/etc/.pythonrc; symlinked in by
gh:westurner/dotfiles/scripts/bootstrap_dotfiles.sh).

dotfiles.venv.ipython_config.py (and conda) make navigating VIRTUAL_ENVs
(that are/can be isolated from concurrent changes in system packages)
easier for me.

Depending on your setup, managing that many envs is probably easier with
conda and/or Docker.

* https://github.com/ipython/ipython/wiki/Install:-Docker



> Now different teachers may like to navigate the world of python
> differently.
> So for example I prefer to start with the immutable (functional) subset
> and go on to the stateful/imperative.  The point (here) is not so much
> which is preferable so much as this that a given teacher should have the
> freedom to chart out a course through python in which (s)he can cross out
> certain features at certain points for students.  So a teacher preferring
> to emphasise OO/imperative over functional may prefer the opposite choice.
>

* https://www.reddit.com/r/learnpython/wiki
* https://www.reddit.com/r/learnpython/wiki/books
* https://github.com/scipy-lectures/scipy-lecture-notes (Sphinx)
*
https://github.com/ipython/ipython/wiki/A-gallery-of-interesting-IPython-Notebooks
  * https://github.com/jrjohansson/scientific-python-lectures/ (IPython)

There was also talk of generating EdX courses from IPython notebooks over
on ipython-dev:
http://mail.scipy.org/pipermail/ipython-dev/2015-February/015911.html


> [Aside:  ACM curriculum 2013 juxtaposes OO and FP as absolute basic in
> core CS
> https://www.acm.org/education/CS2013-final-report.pdf
> pgs 157,158
> ]
>
> So the idea is to make a framework for teachers to easily configure and
> select teachpacks to their taste.
>
> How does that sound?
>
> Rusi
>
>

From wes.turner at gmail.com  Mon May 25 20:33:41 2015
From: wes.turner at gmail.com (Wes Turner)
Date: Mon, 25 May 2015 13:33:41 -0500
Subject: [Python-ideas] Framework for Python for CS101
In-Reply-To: <2bbfe6bb-40a8-4095-923a-a5d86c76ccb7@googlegroups.com>
References: <CAJ+TeoeaQog7s8bo=CuhTH-Y=CVR9J8B0-vX=1AUKxsiQimz3Q@mail.gmail.com>
 <C61507D3-B5AA-445B-839A-1A9F1D5A9C7B@yahoo.com>
 <2bbfe6bb-40a8-4095-923a-a5d86c76ccb7@googlegroups.com>
Message-ID: <CACfEFw9c9BV2cDZ00noGwS3Ja0rLCEZjnrQH9dKDPfDUtM76pg@mail.gmail.com>

On Mon, May 25, 2015 at 7:11 AM, Rustom Mody <rustompmody at gmail.com> wrote:

>
> On Monday, May 25, 2015 at 1:31:58 PM UTC+5:30, Andrew Barnert via
> Python-ideas wrote:
>>
>> On May 24, 2015, at 22:06, Rustom Mody <rusto... at gmail.com> wrote:
>>
>> Context:  A bunch of my students will be working with me (if all goes
>> according to plan!!)to hack on/in CPython sources.
>>
>> One of the things we would like to try is a framework for CS101 [Intro to
>> programming]
>>
>> So for example beginners get knocked out by None 'disappearing' from the
>> prompt
>> Correctable by
>>
>> >>> import sys
>> >>> sys.displayhook = print
>>
>> Now of course one can say: "If you want that behavior, set it as you
>> choose"
>> However at the stage that beginners are knocked down by such, setting up
>> a pythonstartup file is a little premature.
>>
>> So the idea (inspired by Scheme's racket) is to have a sequence of
>> 'teachpacks'.
>> They are like concentric rings, the innermost one being the noob ring,
>> the outermost one being standard python.
>>
>>
>> How exactly does this work? Is it basically just a custom pythonstartup
>> file that teachers can give to their students? Maybe with some menu- or
>> wizard-based configuration to help create the file? Or is this some
>> different mechanism? If so, what does setting it up, and distributing it to
>> students, look like?
>>
>
> Frankly Ive not thought through these details in detail(!)
>
>
>> I realize that below you talk about doing things that are currently not
>> easy to do in a pythonstartup, like hiding all mutating sequence methods,
>> but presumably the patches to the interpreter core would be something like
>> adding hide_mutating_sequence_methods() and similar functions that teachers
>> could then choose to include in the pythonstartup file or whatever they
>> give out.
>>
>>
> I personally would wish for other minor surgeries eg a different keyword
> from 'def' for generators.
> From the pov of an experienced programmer the mental load of one keyword
> for two disparate purposes is easy enough to handle and the language
> clutter from an extra keyword is probably just not worth it.
> However from having taught python for 10+ years I can say this
> 'overloading' causes endless grief and slowdown of beginners.
>

* https://docs.python.org/2/library/tokenize.html
* https://hg.python.org/cpython/file/2.7/Lib/tokenize.py
* https://hg.python.org/cpython/file/tip/Grammar/Grammar
*
https://www.youtube.com/watch?v=R31NRWgoIWM&index=9&list=PLt_DvKGJ_QLZd6Gpug-6x4eYoHPy4q_kb

* https://docs.python.org/devguide/compiler.html
* https://docs.python.org/2/library/compiler.html

I identify functions that are generators by the 'yield' (and 'yield from')
tokens.

I document functions that yield:

def generating_function(n):
    """Generate a sequence
    # sphinx style
    :returns: (0, 1, ..., n-1)
    :rtype: generator (int)

    # google-style
    Yields:
        int: (0, 1, ..., n-1)

    """
    return (x for x in range(n))
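For what it's worth, the stdlib can already make this distinction at runtime; a small sketch (note that a `def` which merely returns a generator expression, like the one just above, is not itself a generator function):

```python
import inspect

def gen_with_yield(n):
    """A true generator function: its body contains `yield`."""
    for x in range(n):
        yield x

def gen_by_return(n):
    """Returns a generator object but is itself an ordinary function."""
    return (x for x in range(n))

print(inspect.isgeneratorfunction(gen_with_yield))  # True
print(inspect.isgeneratorfunction(gen_by_return))   # False
print(inspect.isgenerator(gen_by_return(3)))        # True
```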




> Then there is even more wishful thinking changes -- distinguishing
> procedure from function.
> After 30 years of Lisp and ML and ... and Haskell and
> square-peg-into-round-holing these into python, Ive come to the conclusion
> that Pascal got this distinction more right than all these.  However I
> expect this surgery to be more invasive and pervasive than I can handle
> with my (current) resources.
>

@staticmethod, @classmethod, @property decorators


> etc
> etc
> In short I am talking of a language that is morally equivalent to python
> but cosmetically different and is designed to be conducive to learning
> programming
>

I suppose you could fork to teach; but [...].

You might check out http://pythontutor.com/ and/or http://www.brython.info/
(Python, JS, and compilation).


>
>
>

From gokoproject at gmail.com  Mon May 25 21:48:18 2015
From: gokoproject at gmail.com (John Wong)
Date: Mon, 25 May 2015 15:48:18 -0400
Subject: [Python-ideas] Framework for Python for CS101
In-Reply-To: <CACfEFw9c9BV2cDZ00noGwS3Ja0rLCEZjnrQH9dKDPfDUtM76pg@mail.gmail.com>
References: <CAJ+TeoeaQog7s8bo=CuhTH-Y=CVR9J8B0-vX=1AUKxsiQimz3Q@mail.gmail.com>
 <C61507D3-B5AA-445B-839A-1A9F1D5A9C7B@yahoo.com>
 <2bbfe6bb-40a8-4095-923a-a5d86c76ccb7@googlegroups.com>
 <CACfEFw9c9BV2cDZ00noGwS3Ja0rLCEZjnrQH9dKDPfDUtM76pg@mail.gmail.com>
Message-ID: <CACCLA562mciF4TmsbAuyjB7cyoqaZwQUxhUfqc=gyhZwrq3HdQ@mail.gmail.com>

The title is very catchy :-) I am genuinely interested in improving CS
education, both from the perspective of a recent grad and a learner in
general (who isn't?). But I think we all do appreciate every effort this
community puts together as a whole.

> One of the things we would like to try is a framework for CS101 [Intro to
programming]
I am still unsure what kind of framework we are proposing. What is the
exact goal? Is it just teaching more FP? What's the benefit? What are the user
stories? Maybe this is not something people would integrate into CPython,
but rather something you'd fork from CPython.

From my experience, the hard thing about learning a new programming
language is not always undefined behaviors, under-documented APIs or
confusing syntax; sometimes it's the lack of documentation demonstrating how
to effectively use the language and leverage community tools that frustrates
people. The Python community is extremely supportive, and there is so much
info out there that a search will yield a useful answer right away. For some
other languages that is not the case. But Python can do better, and I feel
this is the real issue with making language learning more effective.
Teaching syntax, or understanding how to model your problem before you crack
a solution, is secondary IMO. As a learner I am far more interested in "how
do I accomplish XYZ", and then I will get curious about how one came to a
particular solution. Similarly, when I see a beautiful Vim setup I go
looking for "how to set up my Vim like this."

I say all the above because, IMO, OO vs FP, Scheme vs Python vs Java vs C
is really not an interesting debate. I am biased because the first language
I was taught officially in my undergraduate career was Python (but I knew
basic programming well before that). I appreciate the expressiveness of FP,
and how FP "supposedly" helps you reason about your solution, much closer to
how you would write a proof. But with all due respect, I don't think FP vs
OO is really the problem in CS 101, or for just about anyone learning a new
language.

Teachers (including TAs) have to spend time getting a Python setup
working on each student's workstation, or troubleshooting environment issues
(to be fair, this is part of learning to deal with different platforms and
different toolsets). Hence I love projects which aim to reduce
administrative tasks in the classroom (ipython notebook, online
compilers/interpreters etc).

Or maybe more verbose warnings for beginners? For example, there was a PEP
(???) to add a warning when someone types (print "hello world") in Python 3
instead of just showing invalid syntax. That can be extremely helpful for
beginners and is worth thinking about. For example, certain errors /
deprecations could show "refer to this awesome PEP, or refer to this
interesting discussion, or refer to this really well-written blogpost -
something python core-dev agrees with."

Thanks.

John

On Mon, May 25, 2015 at 2:33 PM, Wes Turner <wes.turner at gmail.com> wrote:

>
>
> On Mon, May 25, 2015 at 7:11 AM, Rustom Mody <rustompmody at gmail.com>
> wrote:
>
>>
>> On Monday, May 25, 2015 at 1:31:58 PM UTC+5:30, Andrew Barnert via
>> Python-ideas wrote:
>>>
>>> On May 24, 2015, at 22:06, Rustom Mody <rusto... at gmail.com> wrote:
>>>
>>> Context:  A bunch of my students will be working with me (if all goes
>>> according to plan!!)to hack on/in CPython sources.
>>>
>>> One of the things we would like to try is a framework for CS101 [Intro
>>> to programming]
>>>
>>> So for example beginners get knocked out by None 'disappearing' from the
>>> prompt
>>> Correctable by
>>>
>>> >>> import sys
>>> >>> sys.displayhook = print
>>>
>>> Now of course one can say: "If you want that behavior, set it as you
>>> choose"
>>> However at the stage that beginners are knocked down by such, setting up
>>> a pythonstartup file is a little premature.
>>>
>>> So the idea (inspired by Scheme's racket) is to have a sequence of
>>> 'teachpacks'.
>>> They are like concentric rings, the innermost one being the noob ring,
>>> the outermost one being standard python.
>>>
>>>
>>> How exactly does this work? Is it basically just a custom pythonstartup
>>> file that teachers can give to their students? Maybe with some menu- or
>>> wizard-based configuration to help create the file? Or is this some
>>> different mechanism? If so, what does setting it up, and distributing it to
>>> students, look like?
>>>
>>
>> Frankly Ive not thought through these details in detail(!)
>>
>>
>>> I realize that below you talk about doing things that are currently not
>>> easy to do in a pythonstartup, like hiding all mutating sequence methods,
>>> but presumably the patches to the interpreter core would be something like
>>> adding hide_mutating_sequence_methods() and similar functions that teachers
>>> could then choose to include in the pythonstartup file or whatever they
>>> give out.
>>>
>>>
>> I personally would wish for other minor surgeries eg a different keyword
>> from 'def' for generators.
>> From the pov of an experienced programmer the mental load of one keyword
>> for two disparate purposes is easy enough to handle and the language
>> clutter from an extra keyword is probably just not worth it.
>> However from having taught python for 10+ years I can say this
>> 'overloading' causes endless grief and slowdown of beginners.
>>
>
> * https://docs.python.org/2/library/tokenize.html
> * https://hg.python.org/cpython/file/2.7/Lib/tokenize.py
> * https://hg.python.org/cpython/file/tip/Grammar/Grammar
> *
> https://www.youtube.com/watch?v=R31NRWgoIWM&index=9&list=PLt_DvKGJ_QLZd6Gpug-6x4eYoHPy4q_kb
>
> * https://docs.python.org/devguide/compiler.html
> * https://docs.python.org/2/library/compiler.html
>
> I identify functions that are generators by the 'yield' (and 'yield from')
> tokens.
>
> I document functions that yield:
>
> def generating_function(n):
>     """Generate a sequence
>     # numpy style
>     :returns: (1,2,..,n)
>     :rtype: generator (int)
>
>     # google-style
>     Yields:
>         int: (1,2,n)
>
>     """
>     return (x for x in range(n))
>
>
>
>
>> Then there is even more wishful thinking changes -- distinguishing
>> procedure from function.
>> After 30 years of Lisp and ML and ... and Haskell and
>> square-peg-into-round-holing these into python, Ive come to the conclusion
>> that Pascal got this distinction more right than all these.  However I
>> expect this surgery to be more invasive and pervasive than I can handle
>> with my (current) resources.
>>
>
> @staticmethod, @classmethod, @property decorators
>
>
>> etc
>> etc
>> In short I am talking of a language that is morally equivalent to python
>> but cosmetically different and is designed to be conducive to learning
>> programming
>>
>
> I suppose you could fork to teach; but [...].
>
> You might check out http://pythontutor.com/ and/or
> http://www.brython.info/ (Python, JS, and compilation).
>
>
>>
>>
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>

From abarnert at yahoo.com  Mon May 25 22:08:46 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Mon, 25 May 2015 20:08:46 +0000 (UTC)
Subject: [Python-ideas] Fwd: [Python-Dev] An yocto change proposal in
 logging module to simplify structured logs support
In-Reply-To: <CAON-fpFiNMRSbMKQq+5MGuOf6rjzHBrrrgSYKGmKWKoPVWG9yw@mail.gmail.com>
References: <CAON-fpFiNMRSbMKQq+5MGuOf6rjzHBrrrgSYKGmKWKoPVWG9yw@mail.gmail.com>
Message-ID: <1110534133.1837705.1432584526401.JavaMail.yahoo@mail.yahoo.com>

On Monday, May 25, 2015 6:57 AM, Ludovic Gasc <gmludo at gmail.com> wrote:

>2015-05-25 4:19 GMT+02:00 Steven D'Aprano <steve at pearwood.info>:

>>At the other extreme, there is the structlog module:
>>
>>https://structlog.readthedocs.org/en/stable/
>
>Thank you for the link, it's an interesting project - it's like the "logging" module on steroids, with some good logging ideas inside.


>However, if I understand correctly, it's the same approach as the previous recipe: generate a log file with JSON content, use logstash-forwarder to reparse the JSON content, and finally send the structure to logstash for the query part: https://structlog.readthedocs.org/en/stable/standard-library.html#suggested-configuration

>>How does your change compare to those?
>>
>
>
>In the use case of structlog, the idea is to drop the logstash-forwarder step and connect the Python daemon directly to the structured log daemon.

>Even if logstash-forwarder is efficient, why have an additional step to rebuild a structure you already had at the beginning?

You can't send a Python dictionary over the wire, or store a Python dictionary in a database. You need to encode it to some transmission and/or storage format; there's no way around that. And what's wrong with using JSON as that format?

More importantly, when you drop logstash-forwarder, how are you intending to get the messages to the upstream server? You don't want to make your log calls synchronously wait for acknowledgement before returning. So you need some kind of buffering. And just buffering in memory doesn't work: if your service shuts down unexpectedly, you've lost the last batch of log messages which would tell you why it went down (plus, if the network goes down temporarily, your memory use becomes unbounded). You can of course buffer to disk, but then you've just reintroduced the same need for some kind of intermediate storage format you were trying to eliminate, and it doesn't really solve the problem, because if your service shuts down, the last messages won't get sent until it starts up again. So you could write a separate simple store-and-forward daemon that either reads those file buffers or listens on localhost UDP, but then you've just recreated logstash-forwarder.
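For the in-process buffering piece, the stdlib already has building blocks; here is a minimal sketch using `logging.handlers.QueueHandler` and `QueueListener` (the logger name and the stand-in target handler are illustrative; a real setup would forward to a socket or HTTP collector):

```python
import logging
import logging.handlers
import queue

# Log records go into an in-process queue; a background listener thread
# forwards them to the real handler, so the logging call never blocks on I/O.
log_queue = queue.Queue(-1)  # unbounded; bound it to cap memory use
queue_handler = logging.handlers.QueueHandler(log_queue)

# The "real" handler could be a SocketHandler or HTTPHandler to an upstream
# collector; a StreamHandler stands in for it here.
target_handler = logging.StreamHandler()
listener = logging.handlers.QueueListener(log_queue, target_handler)

logger = logging.getLogger("app")
logger.addHandler(queue_handler)
logger.setLevel(logging.INFO)

listener.start()
logger.info("structured message", extra={"user": "alice"})
listener.stop()  # drains the queue before shutdown
```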

And even if you wanted to do all that, I don't see why you couldn't do it all with structlog. They recommend using an already-working workflow instead of designing a different one from scratch, but it's just a recommendation.

From abarnert at yahoo.com  Mon May 25 23:01:23 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Mon, 25 May 2015 21:01:23 +0000 (UTC)
Subject: [Python-ideas] Framework for Python for CS101
In-Reply-To: <2bbfe6bb-40a8-4095-923a-a5d86c76ccb7@googlegroups.com>
References: <2bbfe6bb-40a8-4095-923a-a5d86c76ccb7@googlegroups.com>
Message-ID: <556623157.1837196.1432587683128.JavaMail.yahoo@mail.yahoo.com>

On Monday, May 25, 2015 5:11 AM, Rustom Mody <rustompmody at gmail.com> wrote:

>On Monday, May 25, 2015 at 1:31:58 PM UTC+5:30, Andrew Barnert via Python-ideas wrote:
>On May 24, 2015, at 22:06, Rustom Mody <rusto... at gmail.com> wrote:
>>
>>
>>Context:  A bunch of my students will be working with me (if all goes according to plan!!)to hack on/in CPython sources. 
>>>
>>>One of the things we would like to try is a framework for CS101 [Intro to programming]
>>>
>>>So for example beginners get knocked out by None 'disappearing' from the prompt
>>>Correctable by
>>>
>>>>>> import sys 
>>>>>> sys.displayhook = print 
>>>
>>>Now of course one can say: "If you want that behavior, set it as you choose"
>>>However at the stage that beginners are knocked down by such, setting up a pythonstartup file is a little premature.
>>>
>>>So the idea (inspired by Scheme's racket) is to have a sequence of 'teachpacks'.
>>>They are like concentric rings, the innermost one being the noob ring, the outermost one being standard python.
>>>
>>
>>How exactly does this work? Is it basically just a custom pythonstartup file that teachers can give to their students? Maybe with some menu- or wizard-based configuration to help create the file? Or is this some different mechanism? If so, what does setting it up, and distributing it to students, look like?
>

>Frankly Ive not thought through these details in detail(!) 

OK, but have you thought through them at all? Or, if not, are you willing to? Without some idea of what the intended interface for teachers and students is, it's going to be very hard to think through how anything else works.

For example, if you have a set of special functions (maybe in a "teachpack" stdlib module) that disable and enable different things in the current Python session, then building a teachpack is just a matter of writing (or GUI-generating) a pythonstartup file with a few function calls, and distributing it to students is just a matter of telling them how to download it and set up the PYTHONSTARTUP environment variable, which seems reasonable. But of course that limits what you can do in these teachpacks to the kinds of things you could change dynamically at runtime.
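Concretely, the startup-file route might look something like this (the `teachpack` module and its helper names are hypothetical, standing in for the proposed stdlib functions):

```python
# Hypothetical PYTHONSTARTUP file a teacher would hand out.
import sys

# Ring 0: show every expression result at the prompt, including None,
# so beginners see that a call without print still "returned" something.
sys.displayhook = print

# The proposed (hypothetical) teachpack helpers would be called here:
# import teachpack
# teachpack.hide_mutating_sequence_methods()
# teachpack.disable_print()
```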

If, on the other hand, each "surgery" is a patch to CPython, there's no limit to what you can change, but assembling a teachpack is a matter of assembling and applying patches (and hoping they don't conflict), and building CPython for every platform any of the students might use, and then distributing it requires telling them to download an installer and explaining how to make sure they never accidentally run the system Python instead of your build, which doesn't seem reasonable.
>>I realize that below you talk about doing things that are currently not easy to do in a pythonstartup, like hiding all mutating sequence methods, but presumably the patches to the interpreter core would be something like adding hide_mutating_sequence_methods() and similar functions that teachers could then choose to include in the pythonstartup file or whatever they give out.
>
>I personally would wish for other minor surgeries eg a different keyword from 'def' for generators.
>From the pov of an experienced programmer the mental load of one keyword for two disparate purposes is easy enough to handle and the language clutter from an extra keyword is probably just not worth it.

>However from having taught python for 10+ years I can say this 'overloading' causes endless grief and slowdown of beginners.

>Then there is even more wishful thinking changes -- distinguishing procedure from function.
>After 30 years of Lisp and ML and ... and Haskell and square-peg-into-round-holing these into python, Ive come to the conclusion that Pascal got this distinction more right than all these.  However I expect this surgery to be more invasive and pervasive than I can handle with my (current) resources.


That depends. If you want them to actually be different under the covers, maybe. But if you only care what they look like at the language level, that should be almost as easy as the previous idea. A "defproc" introduces a function definition with an extra flag set; it's an error to use a return statement with a value in any definition being compiled with that flag; there's your definition side. If you want a separate procedure call statement, it's compiled to, in essence, a check that the procedure flag is set on the function's code object, a normal function call, and popping the useless None off the stack, while the function call expression just needs to add a check that the procedure flag is clear.
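Without compiler surgery, the language-level rule can be approximated at runtime with a decorator (a rough sketch; the real "defproc" idea above would set a flag on the code object and enforce this at compile time):

```python
import functools

def procedure(func):
    """Mark func as a procedure: calling it for its effect only.

    Returning a non-None value from a procedure is treated as an error,
    mimicking the Pascal-style procedure/function split.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        if result is not None:
            raise TypeError(
                f"procedure {func.__name__!r} returned a value")
        # Procedures yield no value; the call expression is always None.
    return wrapper

@procedure
def greet(name):
    print(f"hello, {name}")

greet("world")  # fine: side effect only

@procedure
def bad():
    return 42  # calling bad() raises TypeError
```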
>etc
>etc
>In short I am talking of a language that is morally equivalent to python but cosmetically different and is designed to be conducive to learning programming


I'm not sure a language that doesn't have any mutating methods, distinguishes procedures from functions, etc. is actually morally equivalent to Python. And this is also straying very far from the original idea of a restricted subset of Python. Adding new syntax and semantics to define explicit generators, or to define and call procedures, is not a subset.

And it's also very different from your original analogy to Racket teachpacks. The idea behind Racket is that Scheme is a language whose implementation is dead-simple, and designed from the start to be extensible at every level, from the syntax up, so almost anything can be done in a library. So, you can start with almost no core stdlib, and then a teachpack is just a handful of stdlib-style functions to do the things the students haven't yet learned how to do (or, sometimes, to standardize what the students did in a previous exercise).

If you really wanted to do this, I think the first step would have to be transforming Python into an extensible language (or transforming CPython or another implementation into a more general implementation) in the same sense as Scheme. Maybe Python plus macros and read-macros would be sufficient for that, but I'm not sure it would be, and, even if it were, it sounds like a much bigger project than you're envisioning.

And honestly, I think it would be less work to design a new language that's effectively Python-esque m-expressions on top of a Scheme core. Since it's "only a toy language" for teaching, you don't need to worry about all kinds of issues that a real language like Python needs to deal with, like making iteration over a zillion items efficient, or having a nice C API, or exposing even the most advanced functionality in an easy-to-use (and easy-to-hook) way.

From abarnert at yahoo.com  Mon May 25 23:13:53 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Mon, 25 May 2015 21:13:53 +0000 (UTC)
Subject: [Python-ideas] Framework for Python for CS101
In-Reply-To: <eb760ade-6154-47a6-8d70-38822ca1949d@googlegroups.com>
References: <eb760ade-6154-47a6-8d70-38822ca1949d@googlegroups.com>
Message-ID: <974479162.1867210.1432588433071.JavaMail.yahoo@mail.yahoo.com>

On Monday, May 25, 2015 10:50 AM, Rustom Mody <rustompmody at gmail.com> wrote:

>About programming pedagogy:
>
>| Rob Hagan at Monash had shown that you could teach students more COBOL with one semester of Scheme and one semester of COBOL than you

>| could with three semesters of COBOL

OK, fine. But what can you take away from that?

It may just be that COBOL is hard to teach. Is the same thing true of Python? If not, this is irrelevant.

Or it may be that teaching two very different languages is a useful thing to do. In that case, this is relevant, but it doesn't seem likely that two similar dialects of the same language would be sufficient to get the same benefit. Maybe with a language that can be radically reconfigured like Oz (where you can switch from Python-style variables to C-style variables to Prolog-style variables) it would work, but even that's little more than a guess.

>from https://groups.google.com/d/msg/erlang-programming/5X1irAmLMD8/qCQJ11Y5jEAJ
>
>No this is not about 'pro-scheme' but about 'pro-learning-curve'
>I dont believe we should be teaching python (or C++ or Java or Haskell or...) but programming.
>[I started my last programming paradigms with python course with the koan:
>You cannot do programming without syntax
>Syntax is irrelevant to programming
>So what is relevant?
>]


I don't think syntax _is_ irrelevant to programming. I think that's a large part of the reason for using Python: it makes the flow of the program visually graspable, it has constructs that read like English, it avoids many ambiguities or near-ambiguities that you'd otherwise have to stop and think through, it has strongly-reinforced idioms for complex patterns that people can recognize at a glance, etc. And syntax (at a slightly higher level than the actual s-expression syntax) is also a large part of the reason for using Lisp, in a very different way: half of writing an application in Lisp is in essence programming the language constructs to make your application easier to write.

Besides, if syntax were irrelevant, why would you care about the same keyword for defining regular functions and generator functions, the same expressions for calling functions and procedures, etc.? That's just syntax.

From tjreedy at udel.edu  Mon May 25 23:52:45 2015
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 25 May 2015 17:52:45 -0400
Subject: [Python-ideas] Framework for Python for CS101
In-Reply-To: <2bbfe6bb-40a8-4095-923a-a5d86c76ccb7@googlegroups.com>
References: <CAJ+TeoeaQog7s8bo=CuhTH-Y=CVR9J8B0-vX=1AUKxsiQimz3Q@mail.gmail.com>
 <C61507D3-B5AA-445B-839A-1A9F1D5A9C7B@yahoo.com>
 <2bbfe6bb-40a8-4095-923a-a5d86c76ccb7@googlegroups.com>
Message-ID: <mk05jt$15e$1@ger.gmane.org>

On 5/25/2015 8:11 AM, Rustom Mody wrote:

> I personally would wish for other minor surgeries eg a different keyword
> from 'def' for generators.

'Def' is for generator *functions*.  Guido notwithstanding, overloading 
'generator' to mean both a subcategory of function and the non-function 
iterators they produce leads to confusion.  The only structural 
difference between a normal function and generator function is a flag 
bit in the associated code object.
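That flag is visible from Python without any surgery; for example:

```python
import inspect

def normal():
    return 1

def gen():
    yield 1

# The only structural difference between the two is the CO_GENERATOR
# flag set on the generator function's code object.
print(normal.__code__.co_flags & inspect.CO_GENERATOR)        # 0
print(bool(gen.__code__.co_flags & inspect.CO_GENERATOR))     # True
print(inspect.isgeneratorfunction(gen))                       # True
```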

In 3.5, non-buggy generator functions must exit with explicit or 
implicit 'return', just as with other normal functions other than 
.__next__ methods.  Allowing generator functions to exit with 
StopIteration slightly confused them with iterator .__next__ methods.

>  From the pov of an experienced programmer the mental load of one
> keyword for two disparate purposes

The single purpose is to define a function object, with a defined set of 
attributes, that one may call.

> is easy enough to handle and the
> language clutter from an extra keyword is probably just not worth it.

An extra keyword 'async' is being added for coroutine functions, 
resulting I believe in 'async def'.  But I also believe this is not the 
only usage of 'async', while it would be the only usage of a 'gen' prefix.

> However from having taught python for 10+ years I can say this
> 'overloading' causes endless grief and slowdown of beginners.

I think part of the grief is overloading 'generator' to mean both a 
non-iterable function and an iterable non-function.

To really understand generators and generator functions, I think one 
needs to understand what an iterator class looks like, with a 
combination of boilerplate and custom code in .__init__, .__iter__, and 
.__next__.  The generator as iterator has the boilerplate code, while 
the generator function has the needed custom code in the .__init__ and 
.__next__ methods combined in one function body.  For this purpose, 
assignments between local and self attribute namespaces, which one might 
call 'custom boilerplate' are not needed and disappear.  One may think 
of a generator function as defining a subclass of the generator class.
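The correspondence described above can be made concrete; both definitions below produce the same sequence:

```python
class CountDown:
    """Hand-written iterator: boilerplate plus custom code."""
    def __init__(self, n):
        self.n = n              # custom __init__ code
    def __iter__(self):
        return self             # boilerplate
    def __next__(self):
        if self.n <= 0:         # custom __next__ code
            raise StopIteration
        self.n -= 1
        return self.n + 1

def countdown(n):
    """Generator function: the same custom code, boilerplate implied."""
    while n > 0:
        yield n
        n -= 1

assert list(CountDown(3)) == list(countdown(3)) == [3, 2, 1]
```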

-- 
Terry Jan Reedy


From shoyer at gmail.com  Tue May 26 01:38:20 2015
From: shoyer at gmail.com (Stephan Hoyer)
Date: Mon, 25 May 2015 16:38:20 -0700
Subject: [Python-ideas] The pipe protocol,
	a convention for extensible method chaining
Message-ID: <CAEQ_Tve+0MwkJY+MDOb3dUSVg8=q99WuE-Qps_ik6EXjOWtohQ@mail.gmail.com>

In the PyData community, we really like method chaining for data analysis
pipelines:

(iris.query('SepalLength > 5')
 .assign(SepalRatio = lambda x: x.SepalWidth / x.SepalLength,
         PetalRatio = lambda x: x.PetalWidth / x.PetalLength)
 .plot(kind='scatter', x='SepalRatio', y='PetalRatio'))


Unfortunately, method chaining isn't very extensible -- short of monkey
patching, every method we want to use has to exist on the original object.
If a user wants to supply their own plotting function, they can't use
method chaining anymore.

You may recall that we brought this up a few months ago on python-ideas as
an example of why we would like macros.

To get around this issue, we are contemplating adding a pipe method to
pandas DataFrames. It looks like this:

def pipe(self, func, *args, **kwargs):
    pipe_func = getattr(func, '__pipe_func__', func)
    return pipe_func(self, *args, **kwargs)


We would encourage third party libraries with objects on which method
chaining is useful to define a pipe method in the same way.

The main idea here is to create an easy way for users to do method chaining
with their own functions and with functions from third party libraries.
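Stripped of pandas, the mechanism is small enough to demonstrate with a toy class (all names here are illustrative, not pandas API):

```python
class Frame:
    """Toy stand-in for a DataFrame supporting the proposed pipe method."""
    def __init__(self, data):
        self.data = data

    def double(self):
        # An ordinary chainable method: returns a new Frame.
        return Frame([x * 2 for x in self.data])

    def pipe(self, func, *args, **kwargs):
        # The proposed protocol: defer to __pipe_func__ if defined.
        pipe_func = getattr(func, '__pipe_func__', func)
        return pipe_func(self, *args, **kwargs)

def add(frame, n):
    """User-supplied function: takes the frame first, stays chainable."""
    return Frame([x + n for x in frame.data])

result = Frame([1, 2, 3]).double().pipe(add, 10).double()
print(result.data)  # [24, 28, 32]
```

Note that chaining continues past the `pipe` call only because `add` itself returns a `Frame`; a function returning something else ends the chain.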

The business with __pipe_func__ is more magical, and frankly we aren't sure
it's worth the complexity. The idea is to create a "pipe protocol" that
allows functions to decide how they are called when piped. This is useful
in some cases, because it doesn't always make sense for functions that act
on piped data to accept that data as their first argument.

For more motivation and examples, please read the opening post in this
GitHub issue: https://github.com/pydata/pandas/issues/10129

Obviously, this sort of protocol would not be an official part of the
Python language. But because we are considering creating a de-facto
standard, we would love to get feedback from other Python communities that
use method chaining:
1. Have you encountered or addressed the problem of extensible method
chaining?
2. Would this pipe protocol be useful to you?
3. Is it worth allowing piped functions to override how they are called by
defining something like __pipe_func__?
Note that I'm not particularly interested in feedback about how we
shouldn't be defining double underscore methods. There are other ways we
could spell __pipe_func__, but double underscores seems to be pretty
standard for ad-hoc protocols.
Thanks for your attention.
Best,
Stephan

From chris.barker at noaa.gov  Tue May 26 04:21:43 2015
From: chris.barker at noaa.gov (Chris Barker)
Date: Mon, 25 May 2015 19:21:43 -0700
Subject: [Python-ideas] Framework for Python for CS101
In-Reply-To: <eb760ade-6154-47a6-8d70-38822ca1949d@googlegroups.com>
References: <CAJ+TeoeaQog7s8bo=CuhTH-Y=CVR9J8B0-vX=1AUKxsiQimz3Q@mail.gmail.com>
 <C61507D3-B5AA-445B-839A-1A9F1D5A9C7B@yahoo.com>
 <CALGmxEJMB6gweSro0Yhx8gSmX1gjtSXgmjpV+3bbvjCmY4c4WA@mail.gmail.com>
 <eb760ade-6154-47a6-8d70-38822ca1949d@googlegroups.com>
Message-ID: <CALGmxEL5-DLiYELK76Qqx3vhRSSu=cVZf8obT4tmYX9s+hgcwg@mail.gmail.com>

On Mon, May 25, 2015 at 10:50 AM, Rustom Mody <rustompmody at gmail.com> wrote:

> About programming pedagogy:
>
> | Rob Hagan at Monash had shown that you could teach students more COBOL
> with one semester of Scheme and one semester of COBOL than you
> | could with three semesters of COBOL
>

I've seen similar claims with Java and Python in place of COBOL and Scheme.

My thoughts on that are that Python already has little of the cruft that
isn't really about programming.

But it sounds to me like you aren't so much simplifying the language as
hiding parts of it, which I'm not sure buys you much.

> You cannot do programming without syntax
> Syntax is irrelevant to programming
> So what is relevant?
>

:-)

-Chris



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From steve at pearwood.info  Tue May 26 04:54:55 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 26 May 2015 12:54:55 +1000
Subject: [Python-ideas] The pipe protocol,
	a convention for extensible method chaining
In-Reply-To: <CAEQ_Tve+0MwkJY+MDOb3dUSVg8=q99WuE-Qps_ik6EXjOWtohQ@mail.gmail.com>
References: <CAEQ_Tve+0MwkJY+MDOb3dUSVg8=q99WuE-Qps_ik6EXjOWtohQ@mail.gmail.com>
Message-ID: <20150526025455.GG5663@ando.pearwood.info>

On Mon, May 25, 2015 at 04:38:20PM -0700, Stephan Hoyer wrote:
> In the PyData community, we really like method chaining for data analysis
> pipelines:
> 
> (iris.query('SepalLength > 5')
>  .assign(SepalRatio = lambda x: x.SepalWidth / x.SepalLength,
>          PetalRatio = lambda x: x.PetalWidth / x.PetalLength)
>  .plot(kind='scatter', x='SepalRatio', y='PetalRatio'))
> 
> 
> Unfortunately, method chaining isn't very extensible -- short of monkey
> patching, every method we want to use has to exist on the original object.
> If a user wants to supply their own plotting function, they can't use
> method chaining anymore.

It's not really *method* chaining any more if they do that :-)


> You may recall that we brought this up a few months ago on python-ideas as
> an example of why we would like macros.
> 
> To get around this issue, we are contemplating adding a pipe method to
> pandas DataFrames. It looks like this:
> 
> def pipe(self, func, *args, **kwargs):
>     pipe_func = getattr(func, '__pipe_func__', func)
>     return pipe_func(self, *args, **kwargs)

Are you sure this actually works in practice?

Since pipe() returns the result of calling the passed in function, not 
the dataframe, it seems to me that you can't actually chain this unless 
it's the last call in the chain. This should work:

(iris.query('SepalLength > 5')
    .assign(SepalRatio = lambda x: x.SepalWidth / x.SepalLength,
            PetalRatio = lambda x: x.PetalWidth / x.PetalLength)
   .pipe(myplot, kind='scatter', x='SepalRatio', y='PetalRatio')
   )


but I don't think this will work:

(iris.query('SepalLength > 5')
    .assign(SepalRatio = lambda x: x.SepalWidth / x.SepalLength,
            PetalRatio = lambda x: x.PetalWidth / x.PetalLength)
   .pipe(myexport, spam=True, eggs=False)
   .plot(kind='scatter', x='SepalRatio', y='PetalRatio')
   )


That makes it somewhat less of a general purpose pipelining method and 
more of a special case "replace the plotter with a different plotter" 
helper method. And for that special case, I'd prefer to give the plot 
method an extra argument, which if not None, is a function to delegate 
to:

    .plot(kind='scatter', x='SepalRatio', y='PetalRatio', plotter=myplot)


What's the point of the redirection to __pipe_func__? Under what 
circumstances would somebody use __pipe_func__ instead of just passing a 
callable (a function or other object with __call__ method)? If you don't 
have a good use case for it, then "You Ain't Gonna Need It" applies.

I think that is completely unnecessary. (It also abuses a reserved 
namespace, but you've already said you don't care about that.) Instead 
of passing:

    .pipe(myobject, args)  # myobject has a __pipe_func__ method

just make it explicit and write:

    .pipe(myobject.some_method, args)


And for what it's worth, apart from the dunder issue, I think it's silly 
to have a *method* called "*_func__".


> The business with __pipe_func__ is more magical, and frankly we aren't sure
> it's worth the complexity. The idea is to create a "pipe protocol" that
> allows functions to decide how they are called when piped. This is useful
> in some cases, because it doesn't always make sense for functions that act
> on piped data to accept that data as their first argument.

Just use a wrapper function that reorders the arguments. If the 
reordering is simple enough, you can do it in place with a lambda:

    .pipe(lambda *args, **kwargs: myplot(args[1], args[0], *args[2:]))


> Obviously, this sort of protocol would not be an official part of the
> Python language. But because we are considering creating a de-facto
> standard, we would love to get feedback from other Python communities that
> use method chaining:

Because you are considering creating a de-facto standard, I think it is 
especially rude to trespass on the reserved dunder namespace. (Unless, 
of course, the core developers decide that they don't mind.)


> 1. Have you encountered or addressed the problem of extensible method
> chaining?

Yes. I love chaining in, say, bash, and it works well in Ruby, but it's 
less useful in Python. My attempt to help bring chaining to Python is 
here 

http://code.activestate.com/recipes/578770-method-chaining/

but it relies on methods operating by side-effect, not returning a new 
result. But generally speaking, I don't like methods that operate by 
side-effect, so I don't use chaining much in practice. I'm always on the 
look-out for opportunities where it makes sense though.


> 2. Would this pipe protocol be useful to you?

I don't think so.


> 3. Is it worth allowing piped functions to override how they are called by
> defining something like __pipe_func__?

No, I think it is completely unnecessary.


-- 
Steve

From rustompmody at gmail.com  Tue May 26 04:56:36 2015
From: rustompmody at gmail.com (Rustom Mody)
Date: Mon, 25 May 2015 19:56:36 -0700 (PDT)
Subject: [Python-ideas] Framework for Python for CS101
In-Reply-To: <CALGmxEL5-DLiYELK76Qqx3vhRSSu=cVZf8obT4tmYX9s+hgcwg@mail.gmail.com>
References: <CAJ+TeoeaQog7s8bo=CuhTH-Y=CVR9J8B0-vX=1AUKxsiQimz3Q@mail.gmail.com>
 <C61507D3-B5AA-445B-839A-1A9F1D5A9C7B@yahoo.com>
 <CALGmxEJMB6gweSro0Yhx8gSmX1gjtSXgmjpV+3bbvjCmY4c4WA@mail.gmail.com>
 <eb760ade-6154-47a6-8d70-38822ca1949d@googlegroups.com>
 <CALGmxEL5-DLiYELK76Qqx3vhRSSu=cVZf8obT4tmYX9s+hgcwg@mail.gmail.com>
Message-ID: <c08a5b9f-acf3-41a8-94b7-a35f60b93e09@googlegroups.com>


On Tuesday, May 26, 2015 at 7:58:31 AM UTC+5:30, Chris Barker wrote:
>
>
> But it sounds to me like you aren't so much simplifying the language as 
> hiding parts of it, which I'm not sure buys you much.
>

Learners learn sequentially -- probably a more fundamental law than 
'computers compute sequentially'.
If there are significant tracts of the language that are outside one's 
(current) understanding and never arise to confuse the noob, that's OK.
But when they do arise and the learner does not have the intellectual 
equipment to deal with them, it just slows down learning.

Take the print statement/function.

It would be rather ridiculous to remove print from a realistic language.
However, if you've taught enough beginners you know how hard it is to get 
them to write
... return <something>
as against
... print (<something>)

And so in an early teachpack, I'd disable the print statement.
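The stumbling block being described is that the two spellings look identical at the prompt but behave differently under composition:

```python
def square_print(x):
    print(x * x)        # beginner version: displays a value, returns None

def square(x):
    return x * x        # correct version: produces a value

# Both *look* the same for a single call at the prompt,
# but only the return version composes:
print(square(3) + 1)    # 10
print(square_print(3))  # prints 9, then None -- the value was lost
```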

This of course means that at that level the student is limited to trying 
out Python at the interactive interpreter.
Some people think that renders the language ridiculously impotent.
My experience suggests that if this guide-rail were available, beginners 
would pick up key beginner skills, e.g.
- writing structured code
- defining, passing, using, and visualizing suitable data structures
much faster.

So...

FOR THE BEGINNER: "cutting out" == "simplifying"
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150525/c7823992/attachment.html>

From liik.joonas at gmail.com  Tue May 26 05:22:49 2015
From: liik.joonas at gmail.com (Joonas Liik)
Date: Tue, 26 May 2015 06:22:49 +0300
Subject: [Python-ideas] Framework for Python for CS101
In-Reply-To: <c08a5b9f-acf3-41a8-94b7-a35f60b93e09@googlegroups.com>
References: <CAJ+TeoeaQog7s8bo=CuhTH-Y=CVR9J8B0-vX=1AUKxsiQimz3Q@mail.gmail.com>
 <C61507D3-B5AA-445B-839A-1A9F1D5A9C7B@yahoo.com>
 <CALGmxEJMB6gweSro0Yhx8gSmX1gjtSXgmjpV+3bbvjCmY4c4WA@mail.gmail.com>
 <eb760ade-6154-47a6-8d70-38822ca1949d@googlegroups.com>
 <CALGmxEL5-DLiYELK76Qqx3vhRSSu=cVZf8obT4tmYX9s+hgcwg@mail.gmail.com>
 <c08a5b9f-acf3-41a8-94b7-a35f60b93e09@googlegroups.com>
Message-ID: <CAB1GNpRkTSLe5a0LeSpyWT+jeKhmb7PVWxnog5H-=s5Kw-q=wg@mail.gmail.com>

I'm not sure how good the analogy is.. but i've just taken a certain course
with such a "simplified" language.

(Speaking about the SQL here; of course you can get around that if you do
things in VBA, though IMO that's not really an improvement.)
MS Access felt really impotent, and since you often stumble on SQL Server
docs, MS Access often feels broken when you try to use some of those
features and they don't work.
...and that happened lots of times (double digits).

If you omit basic features that people will come to expect based on readily
available documentation you will only breed resentment.

I'm afraid that all you will achieve with your good intentions is scare
newcomers away from python :(

From steve at pearwood.info  Tue May 26 05:50:48 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 26 May 2015 13:50:48 +1000
Subject: [Python-ideas] Framework for Python for CS101
In-Reply-To: <CAJ+TeoeaQog7s8bo=CuhTH-Y=CVR9J8B0-vX=1AUKxsiQimz3Q@mail.gmail.com>
References: <CAJ+TeoeaQog7s8bo=CuhTH-Y=CVR9J8B0-vX=1AUKxsiQimz3Q@mail.gmail.com>
Message-ID: <20150526035048.GJ5663@ando.pearwood.info>

On Mon, May 25, 2015 at 10:36:00AM +0530, Rustom Mody wrote:
> Context:  A bunch of my students will be working with me (if all goes
> according to plan!!)to hack on/in CPython sources.

I'm sorry, I see a serious disconnect between what you are trying to do 
(hack on CPython sources) and your students (beginners so early in the 
learning process that they are confused by the fact that None doesn't 
print in the interactive interpreter).

How on earth do you expect that students who cannot even cope with 
"disappearing None" will deal with hacking the CPython source code?

I assume you don't mean the C code of the interpreter itself, but the 
standard library. Even so, the standard library has code written in a 
multitude of styles, some of it is 20 years old, some of it is more or 
less a direct port of Java code, much of it involves the use of advanced 
concepts.

I don't see this as being even remotely viable.

[...]
> Now different teachers may like to navigate the world of python differently.
> So for example I prefer to start with the immutable (functional) subset and
> go on to the stateful/imperative.  The point (here) is not so much which is
> preferable so much as this that a given teacher should have the freedom to
> chart out a course through python in which (s)he can cross out certain
> features at certain points for students.  So a teacher preferring to
> emphasise OO/imperative over functional may prefer the opposite choice.

And of course you can do so. But you cannot expect to chart out a *pure* 
OO or *pure* functional course, since Python is not purely either. As a 
deliberate design choice, Python uses both functional and OO concepts 
all the way through the builtins and standard library.

If you insist on a pure approach, Python is the wrong language for you. 
Python uses a hybrid paradigm of functional and procedural and OO and 
imperative approaches.

Why not make that a teaching feature rather than a problem? You can 
compare the different approaches: functional sorted() versus OO .sort(), 
for example. Or have the students write their own OO version of map().


-- 
Steve

From rustompmody at gmail.com  Tue May 26 06:01:58 2015
From: rustompmody at gmail.com (Rustom Mody)
Date: Mon, 25 May 2015 21:01:58 -0700 (PDT)
Subject: [Python-ideas] Framework for Python for CS101
In-Reply-To: <20150526035048.GJ5663@ando.pearwood.info>
References: <CAJ+TeoeaQog7s8bo=CuhTH-Y=CVR9J8B0-vX=1AUKxsiQimz3Q@mail.gmail.com>
 <20150526035048.GJ5663@ando.pearwood.info>
Message-ID: <787080c9-7320-443f-a63e-09b975c8cd88@googlegroups.com>



On Tuesday, May 26, 2015 at 9:21:43 AM UTC+5:30, Steven D'Aprano wrote:
>
> On Mon, May 25, 2015 at 10:36:00AM +0530, Rustom Mody wrote: 
> > Context:  A bunch of my students will be working with me (if all goes 
> > according to plan!!)to hack on/in CPython sources. 
>
> I'm sorry, I see a serious disconnect between what you are trying to do 
> (hack on CPython sources) and your students (beginners so early in the 
> learning process that they are confused by the fact that None doesn't 
> print in the interactive interpreter). 
>
> How on earth do you expect that students who cannot even cope with 
> "disappearing None" will deal with hacking the CPython source code? 
>

Heh! They are not the same students!!


From ncoghlan at gmail.com  Tue May 26 06:13:09 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 26 May 2015 14:13:09 +1000
Subject: [Python-ideas] Framework for Python for CS101
In-Reply-To: <20150526035048.GJ5663@ando.pearwood.info>
References: <CAJ+TeoeaQog7s8bo=CuhTH-Y=CVR9J8B0-vX=1AUKxsiQimz3Q@mail.gmail.com>
 <20150526035048.GJ5663@ando.pearwood.info>
Message-ID: <CADiSq7cWS2d6+f_h191V1xCpy91ufi-PaCGQ0c9uH+ZMt_Y=jA@mail.gmail.com>

On 26 May 2015 at 13:50, Steven D'Aprano <steve at pearwood.info> wrote:
> If you insist on a pure approach, Python is the wrong language for you.
> Python uses a hybrid paradigm of functional and procedural and OO and
> imperative approaches.

Not only that, but Python *deliberately* makes stateful procedural
code the default, as that's the only style that comes to humans
intuitively enough for it to be the standard way of *giving
instructions to other humans*. It's the way checklists are written,
it's the way cookbooks are written, it's the way work instructions and
procedure manuals are written. If you allow for the use of
illustrations in place of words, it's even the way IKEA and LEGO
assembly instructions are written.

More advanced conceptual modelling techniques like functional
programming and object-oriented programming are then *optional*
aspects of the language to help people cope with the fact that
imperative programming doesn't scale very well when it comes to
handling more complex problems.

Regards,
Nick.

P.S. Gary Bernhardt coined a nice phrase for the functional
programming focused variant of this: Imperative Shell, Functional
Core. The notion works similarly well for an object-oriented core. The
key though is that you can't skip over teaching the side effect laden
procedural layer, or you're going to inadvertently persuade vast
swathes of people that they can't program at all, when there's
actually a lot of software development tasks that are well within
their reach.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From rustompmody at gmail.com  Tue May 26 07:36:47 2015
From: rustompmody at gmail.com (Rustom Mody)
Date: Mon, 25 May 2015 22:36:47 -0700 (PDT)
Subject: [Python-ideas] Framework for Python for CS101
In-Reply-To: <CADiSq7cWS2d6+f_h191V1xCpy91ufi-PaCGQ0c9uH+ZMt_Y=jA@mail.gmail.com>
References: <CAJ+TeoeaQog7s8bo=CuhTH-Y=CVR9J8B0-vX=1AUKxsiQimz3Q@mail.gmail.com>
 <20150526035048.GJ5663@ando.pearwood.info>
 <CADiSq7cWS2d6+f_h191V1xCpy91ufi-PaCGQ0c9uH+ZMt_Y=jA@mail.gmail.com>
Message-ID: <5118b7c3-12d8-4c2d-81e0-430a7522a391@googlegroups.com>


On Tuesday, May 26, 2015 at 9:50:09 AM UTC+5:30, Nick Coghlan wrote:
 

> Gary Bernhardt coined a nice phrase for the functional 
> programming focused variant of this: Imperative Shell, Functional 
> Core. The notion works similarly well for an object-oriented core. The 
> key though is that you can't skip over teaching the side effect laden 
> procedural layer, or you're going to inadvertently persuade vast 
> swathes of people that they can't program at all, when there's 
> actually a lot of software development tasks that are well within 
> their reach. 
>


Why does the question arise of not teaching side-effects/procedural 
programming?
It's only (if at all) a question of sequencing, not of 'not teaching'. In 
fact that is why Python is
preferable to, say, Haskell.
Take the example of arithmetic and algebra.
Lets say we agree that both need to be taught/learnt.
Can that be done simultaneously?
From my pov:
arithmetic <-> functional
algebra <-> imperative

You want to see it the other way? That's ok, and one can make a case for 
that viewpoint also.
You want to say that algebra is more basic than arithmetic? That's also ok 
[I guess many professional mathematicians would tend to that view: a group 
is more basic than a ring, which is more basic than a field. School 
arithmetic is one very specific and not too interesting field].

The viewpoint that will not stand up to scrutiny is to say that 
arithmetic/algebra are the same and can be approached simultaneously.

Easiest seen in the most simple and basic building block of imperative 
programming: the assignment statement. When you have:

x = y+1

One understands the "y+1" functionally
Whereas we understand the x = <rhs> imperatively

If you think the separation of these two worlds is unnecessary, then you 
have the mess of C's 'expressions' like ++.
And you will have students puzzling over wonders of nature like i = i++, 
whereas the most useful answer would be "Syntax Error".
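
The contrast can be checked directly: Python's grammar makes assignment
a statement, so the C-style puzzle really is a syntax error. A small
sketch using the stdlib `ast` module (not part of the original mail):

```python
import ast

def is_valid(src):
    """Return True if src parses as Python source."""
    try:
        ast.parse(src)
        return True
    except SyntaxError:
        return False

# "y + 1" on the right is a pure expression; the binding is a statement.
assert is_valid("x = y + 1")
# C's assignment-as-expression puzzles cannot even be written:
assert not is_valid("i = i++")
assert not is_valid("i = (i = i + 1)")
```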


 

> More advanced conceptual modelling techniques like functional 
> programming and object-oriented programming are then *optional* 
> aspects of the language to help people cope with the fact that 
> imperative programming doesn't scale very well when it comes to 
> handling more complex problems. 


That's certainly true historically.
However, as I tried to say above, I don't believe it's true logically.
And pedagogically the case remains very much open.

ACM's most recent curriculum[1] juxtaposes FP and OOP (pp. 157-158) and says 
that 3 hours FP + 4 hours OOP is an absolute basic requirement for a CS 
major. I regard this as an epochal shift in our perception of what 
programming is about. The fact that this has happened 50 years after Lisp 
should indicate the actual speed with which our field adapts.  Dijkstra 
said that it takes 100 years for an idea to go from inception to general 
acceptance. Think of when Cantor invented set theory and when modern math 
entered primary schools. Other inversions of historical | logical | 
pedagogical order here.[2]

And finally all this is rather OT.  I am talking of a framework for a 
teacher to chart a course through python, not any changes per se to python 
itself.
A teacher wanting to chart a different course through python should be free 
(and encouraged) to do that as well.

[1] https://www.acm.org/education/CS2013-final-report.pdf
[2] http://blog.languager.org/2011/02/cs-education-is-fat-and-weak-1.html and 
sequel http://blog.languager.org/2011/02/cs-education-is-fat-and-weak-2.html


From tjreedy at udel.edu  Tue May 26 07:54:36 2015
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 26 May 2015 01:54:36 -0400
Subject: [Python-ideas] Framework for Python for CS101
In-Reply-To: <c08a5b9f-acf3-41a8-94b7-a35f60b93e09@googlegroups.com>
References: <CAJ+TeoeaQog7s8bo=CuhTH-Y=CVR9J8B0-vX=1AUKxsiQimz3Q@mail.gmail.com>
 <C61507D3-B5AA-445B-839A-1A9F1D5A9C7B@yahoo.com>
 <CALGmxEJMB6gweSro0Yhx8gSmX1gjtSXgmjpV+3bbvjCmY4c4WA@mail.gmail.com>
 <eb760ade-6154-47a6-8d70-38822ca1949d@googlegroups.com>
 <CALGmxEL5-DLiYELK76Qqx3vhRSSu=cVZf8obT4tmYX9s+hgcwg@mail.gmail.com>
 <c08a5b9f-acf3-41a8-94b7-a35f60b93e09@googlegroups.com>
Message-ID: <mk11rd$vbj$1@ger.gmane.org>

On 5/25/2015 10:56 PM, Rustom Mody wrote:

> It would be rather ridiculous to remove the print from a realistic language.
> However if you've taught enough beginners you'd know how hard it is to
> get beginners to write
> ... return <something>
> as against
> ... print (<something>)

Programming is composition, the connections of outputs to inputs (as 
with circuit design).  'Print' is the enemy of composition. We agree so far.

If submitting code with 'print' instead of 'return' gets a grade of 0 or 
'fail', don't people learn fairly quickly?  If assignments are partially 
test-defined and automatically test-graded, 'print' rather than 'return' 
will fail.  Example: 'write a function that returns a tuple of the 
number of positive and negative values in a finite iterable of signed 
numbers', followed by examples with the caveat that the grading test 
will have other inputs and expected outputs.
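
That example assignment might be graded against a reference solution like
the following (one possible implementation, not from the original mail):

```python
def sign_counts(numbers):
    """Return a tuple (positives, negatives) counted from a finite
    iterable of signed numbers; zeros count as neither."""
    pos = neg = 0
    for n in numbers:
        if n > 0:
            pos += 1
        elif n < 0:
            neg += 1
    return pos, neg

assert sign_counts([3, -1, 0, 7, -5]) == (2, 2)
assert sign_counts([]) == (0, 0)
```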

> And so in an early teachpack, I'd disable the print statement.

Print is essential for debugging.  You should only want to disallow 
print in the final submission of function code, as suggested above.

-- 
Terry Jan Reedy


From rosuav at gmail.com  Tue May 26 08:07:40 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 26 May 2015 16:07:40 +1000
Subject: [Python-ideas] Framework for Python for CS101
In-Reply-To: <mk11rd$vbj$1@ger.gmane.org>
References: <CAJ+TeoeaQog7s8bo=CuhTH-Y=CVR9J8B0-vX=1AUKxsiQimz3Q@mail.gmail.com>
 <C61507D3-B5AA-445B-839A-1A9F1D5A9C7B@yahoo.com>
 <CALGmxEJMB6gweSro0Yhx8gSmX1gjtSXgmjpV+3bbvjCmY4c4WA@mail.gmail.com>
 <eb760ade-6154-47a6-8d70-38822ca1949d@googlegroups.com>
 <CALGmxEL5-DLiYELK76Qqx3vhRSSu=cVZf8obT4tmYX9s+hgcwg@mail.gmail.com>
 <c08a5b9f-acf3-41a8-94b7-a35f60b93e09@googlegroups.com>
 <mk11rd$vbj$1@ger.gmane.org>
Message-ID: <CAPTjJmrZpAiwn6OFO22ZyeFnZzwBWwxH=VeEAEkF-F+5LmZV9Q@mail.gmail.com>

On Tue, May 26, 2015 at 3:54 PM, Terry Reedy <tjreedy at udel.edu> wrote:
> On 5/25/2015 10:56 PM, Rustom Mody wrote:
>
>> It would be rather ridiculous to remove the print from a realistic
>> language.
>> However if you've taught enough beginners you'd know how hard it is to
>> get beginners to write
>> ... return <something>
>> as against
>> ... print (<something>)
>
>
> Programming is composition, the connections of outputs to inputs (as with
> circuit design).  'Print' is the enemy of composition. We agree so far.

That's fine as long as it's okay to produce no results whatsoever
until all processing is complete. In a pure sense, yes, a program's
goal is to produce output, and it doesn't make a lot of difference how
that output is produced. You can build a web framework in which the
only way to send a result is to return it from a function. But there
are innumerable times when it's more useful to produce intermediate
output; whether that output goes to a file, a socket, the console, or
something else, it's as much a part of real-world programming as
returned values are.

Doesn't the Zen of Python say something about practicality and purity? Hmmm.

ChrisA

From stephen at xemacs.org  Tue May 26 08:31:20 2015
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 26 May 2015 15:31:20 +0900
Subject: [Python-ideas] Framework for Python for CS101
In-Reply-To: <5118b7c3-12d8-4c2d-81e0-430a7522a391@googlegroups.com>
References: <CAJ+TeoeaQog7s8bo=CuhTH-Y=CVR9J8B0-vX=1AUKxsiQimz3Q@mail.gmail.com>
 <20150526035048.GJ5663@ando.pearwood.info>
 <CADiSq7cWS2d6+f_h191V1xCpy91ufi-PaCGQ0c9uH+ZMt_Y=jA@mail.gmail.com>
 <5118b7c3-12d8-4c2d-81e0-430a7522a391@googlegroups.com>
Message-ID: <87y4kbg893.fsf@uwakimon.sk.tsukuba.ac.jp>

Rustom Mody writes:

 > And finally all this is rather OT.  I am talking of a framework for
 > a teacher to chart a course through python, not any changes per se
 > to python itself.

Then why is this conversation, interesting as it is, on python-ideas
instead of python-list?



From abarnert at yahoo.com  Tue May 26 08:56:38 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Mon, 25 May 2015 23:56:38 -0700
Subject: [Python-ideas] Framework for Python for CS101
In-Reply-To: <5118b7c3-12d8-4c2d-81e0-430a7522a391@googlegroups.com>
References: <CAJ+TeoeaQog7s8bo=CuhTH-Y=CVR9J8B0-vX=1AUKxsiQimz3Q@mail.gmail.com>
 <20150526035048.GJ5663@ando.pearwood.info>
 <CADiSq7cWS2d6+f_h191V1xCpy91ufi-PaCGQ0c9uH+ZMt_Y=jA@mail.gmail.com>
 <5118b7c3-12d8-4c2d-81e0-430a7522a391@googlegroups.com>
Message-ID: <8716F442-0C8A-48C9-95BC-AC8E63696DF2@yahoo.com>

On May 25, 2015, at 22:36, Rustom Mody <rustompmody at gmail.com> wrote:
> 
> I am talking of a framework for a teacher to chart a course through python, not any changes per se to python itself.

How exactly can you allow a teacher to "chart a course through python" that includes separate function and generator function definition statements, procedures as distinct from functions, etc. without changing Python? Python doesn't have the configurability to switch those features on and off, and also doesn't have the features to switch on in the first place.

> A teacher wanting to chart a different course through python should be free (and encouraged) to do that as well.


I would like a framework for a teacher to chart a course through driving the Nissan 370Z that would allow me to start off teaching hoverpads instead of wheels, but a teacher wanting to chart a different course should be free to start with sails instead. And I want to do this without changing anything about the 370Z.



From ncoghlan at gmail.com  Tue May 26 09:52:14 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 26 May 2015 17:52:14 +1000
Subject: [Python-ideas] Framework for Python for CS101
In-Reply-To: <87y4kbg893.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <CAJ+TeoeaQog7s8bo=CuhTH-Y=CVR9J8B0-vX=1AUKxsiQimz3Q@mail.gmail.com>
 <20150526035048.GJ5663@ando.pearwood.info>
 <CADiSq7cWS2d6+f_h191V1xCpy91ufi-PaCGQ0c9uH+ZMt_Y=jA@mail.gmail.com>
 <5118b7c3-12d8-4c2d-81e0-430a7522a391@googlegroups.com>
 <87y4kbg893.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <CADiSq7exjevzM0roZbhDDpfFcVqj_1xQ7aPkN+2mbJJM3cixJQ@mail.gmail.com>

On 26 May 2015 16:31, "Stephen J. Turnbull" <stephen at xemacs.org> wrote:
>
> Rustom Mody writes:
>
>  > And finally all this is rather OT.  I am talking of a framework for
>  > a teacher to chart a course through python, not any changes per se
>  > to python itself.
>
> Then why is this conversation, interesting as it is, on python-ideas
> instead of python-list?

Or edu-sig: https://mail.python.org/mailman/listinfo/edu-sig

Cheers,
Nick.

>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From wes.turner at gmail.com  Tue May 26 11:46:36 2015
From: wes.turner at gmail.com (Wes Turner)
Date: Tue, 26 May 2015 04:46:36 -0500
Subject: [Python-ideas] The pipe protocol,
 a convention for extensible method chaining
In-Reply-To: <CAEQ_Tve+0MwkJY+MDOb3dUSVg8=q99WuE-Qps_ik6EXjOWtohQ@mail.gmail.com>
References: <CAEQ_Tve+0MwkJY+MDOb3dUSVg8=q99WuE-Qps_ik6EXjOWtohQ@mail.gmail.com>
Message-ID: <CACfEFw9ucWa_EaY_dXtBN8bJHbd5rzWpbNV0tqHPe7464fj-Cw@mail.gmail.com>

On May 25, 2015 6:45 PM, "Stephan Hoyer" <shoyer at gmail.com> wrote:
>
> In the PyData community, we really like method chaining for data analysis
pipelines:
>
> (iris.query('SepalLength > 5')
>  .assign(SepalRatio = lambda x: x.SepalWidth / x.SepalLength,
>          PetalRatio = lambda x: x.PetalWidth / x.PetalLength)
>  .plot(kind='scatter', x='SepalRatio', y='PetalRatio'))
>
>
> Unfortunately, method chaining isn't very extensible -- short of monkey
patching, every method we want to use has to exist on the original object. If
a user wants to supply their own plotting function, they can't use method
chaining anymore.

>
> You may recall that we brought this up a few months ago on python-ideas
as an example of why we would like macros.
>
> To get around this issue, we are contemplating adding a pipe method to
pandas DataFrames. It looks like this:
>
> def pipe(self, func, *args, **kwargs):
>     pipe_func = getattr(func, '__pipe_func__', func)
>     return pipe_func(self, *args, **kwargs)
>
>
> We would encourage third party libraries with objects on which method
chaining is useful to define a pipe method in the same way.
>
> The main idea here is to create an easy way for users to do method
chaining with their own functions and with functions from third party
libraries.
>
> The business with __pipe_func__ is more magical, and frankly we aren't
sure it's worth the complexity. The idea is to create a "pipe protocol"
that allows functions to decide how they are called when piped. This is
useful in some cases, because it doesn't always make sense for functions
that act on piped data to accept that data as their first argument.
>
> For more motivation and examples, please read the opening post in this
GitHub issue: https://github.com/pydata/pandas/issues/10129
>
> Obviously, this sort of protocol would not be an official part of the
Python language. But because we are considering creating a de-facto
standard, we would love to get feedback from other Python communities that
use method chaining:
> 1. Have you encountered or addressed the problem of extensible method
chaining?

* https://pythonhosted.org/pyquery/api.html
* SQLAlchemy

> 2. Would this pipe protocol be useful to you?

What are the advantages over just returning 'self'? (Which use cases are
not possible with current syntax?)

In terms of documenting functional composition, I find it easier to test
and add comment strings to multiple statements.

Months ago, when I looked at creating pandasrdf (pandas #3402), there was
need for a (...).meta.columns w/ columnar URIs, units, (metadata: who,
what, when, how). Said metadata is not storable with e.g. CSV; but is with
JSON-LD, RDF, RDFa, CSVW.

It would be neat to be able to track provenance metadata through [chained]
transformations.

> 3. Is it worth allowing piped functions to override how they are called
by defining something like __pipe_func__?

"There should be one-- and preferably only one --obvious way to do it."

> Note that I'm not particularly interested in feedback about how we
shouldn't be defining double underscore methods. There are other ways we
could spell __pipe_func__, but double underscores seems to be pretty
standard for ad-hoc protocols.
> Thanks for your attention.
> Best,
> Stephan
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From rustompmody at gmail.com  Mon May 25 19:50:43 2015
From: rustompmody at gmail.com (Rustom Mody)
Date: Mon, 25 May 2015 10:50:43 -0700 (PDT)
Subject: [Python-ideas] Framework for Python for CS101
In-Reply-To: <CALGmxEJMB6gweSro0Yhx8gSmX1gjtSXgmjpV+3bbvjCmY4c4WA@mail.gmail.com>
References: <CAJ+TeoeaQog7s8bo=CuhTH-Y=CVR9J8B0-vX=1AUKxsiQimz3Q@mail.gmail.com>
 <C61507D3-B5AA-445B-839A-1A9F1D5A9C7B@yahoo.com>
 <CALGmxEJMB6gweSro0Yhx8gSmX1gjtSXgmjpV+3bbvjCmY4c4WA@mail.gmail.com>
Message-ID: <eb760ade-6154-47a6-8d70-38822ca1949d@googlegroups.com>



On Monday, May 25, 2015 at 10:14:50 PM UTC+5:30, Chris Barker wrote:
>
> Just a note here, that (as an intro to python teacher), I think this is a 
> pedagogically bad idea.
>
> At least if the goal is to teach Python -- while you don't need to 
> introduce all the complexity up front, hiding it just sends students down 
> the wrong track.
>
> On the other hand, if you want a kind-of-like-python-but-simpler language 
> to teach particular computer science concepts, this kind of hacking may be 
> of value.
>
> But I don't think it would be a good idea to build that capability inot 
> Python itself. And I think you can hack in in with monkey patching anyway 
> -- so that's probably the way to go.
>
> for example:
>
> """So for example I prefer to start with the immutable (functional) 
> subset"""
>
> you can certainly do that by simply using tuples and the functional tools.
>
> (OK, maybe not -- after all most (all?) of the functional stuff returns 
> lists, not tuples, and that may be beyond monkey-patchable)
>
> But that's going to be a lot of hacking to change.
>
> Is it so bad to have them work with lists in a purely functional way?
>
> -Chris
>
I guess there are 2 questions here, one about teaching, one about 
python-ideas, both having somewhat OT answers...
Anyway, here goes.
About ideas for python:

This is really about some kids and me mucking around inside the Python sources.
That it will become something used by other teachers -- far away.
That it will be suitable for patches to Python -- even further.

About programming pedagogy:

| Rob Hagan at Monash had shown that you could teach students more COBOL 
with one semester of Scheme and one semester of COBOL than you
| could with three semesters of COBOL

from 
https://groups.google.com/d/msg/erlang-programming/5X1irAmLMD8/qCQJ11Y5jEAJ

No, this is not about 'pro-scheme' but about 'pro-learning-curve'.
I don't believe we should be teaching Python (or C++ or Java or Haskell 
or...) but programming.
[I started my last programming paradigms with python course with the koan:
You cannot do programming without syntax
Syntax is irrelevant to programming
So what is relevant?
]

From jsbueno at python.org.br  Tue May 26 15:43:58 2015
From: jsbueno at python.org.br (Joao S. O. Bueno)
Date: Tue, 26 May 2015 10:43:58 -0300
Subject: [Python-ideas] The pipe protocol,
 a convention for extensible method chaining
In-Reply-To: <CACfEFw9ucWa_EaY_dXtBN8bJHbd5rzWpbNV0tqHPe7464fj-Cw@mail.gmail.com>
References: <CAEQ_Tve+0MwkJY+MDOb3dUSVg8=q99WuE-Qps_ik6EXjOWtohQ@mail.gmail.com>
 <CACfEFw9ucWa_EaY_dXtBN8bJHbd5rzWpbNV0tqHPe7464fj-Cw@mail.gmail.com>
Message-ID: <CAH0mxTTQ3i9S=EU_A4wD1BXPQ6==aipVRzvMCos2-LbWTBi7xA@mail.gmail.com>

> Unfortunately, method chaining isn't very extensible -- short of monkey patching
> every method we want to use has to exist on the original object.

(Link for repo on which the examples here are implemented:
https://github.com/jsbueno/chillicurry )

Actually, the last time this subject showed up (and it was not that
long ago) -
I could think of something "short of monkey patching everything" --

It is possible to fashion a special object with a custom
`__getattr__` - say that you call it
"curry" - that then proceeds to retrieve references to functions (and
methods) with the same names as the attributes you try to get from it,
and wraps those function calls in order to create your pipeline.

Say:

>>> curry.len.list.range(5,10)
5

The trick is to pick the names "len", "list" and "range" from the
calling stack frame.
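
A minimal sketch of such an object (resolving names against `builtins`
rather than the caller's frame, to keep it self-contained; not the actual
chillicurry implementation):

```python
import builtins

class Curry:
    """Chain attribute names into a right-to-left function pipeline."""
    def __init__(self, funcs=()):
        self.funcs = funcs

    def __getattr__(self, name):
        # Resolve the attribute name to a builtin and extend the chain.
        return Curry(self.funcs + (getattr(builtins, name),))

    def __call__(self, *args):
        funcs = list(self.funcs)
        result = funcs.pop()(*args)    # rightmost name is called first
        for func in reversed(funcs):   # then traverse back to the left
            result = func(result)
        return result

curry = Curry()
assert curry.len.list.range(5, 10) == 5
```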
You can then evolve this idea, and pass a special sentinel
parameter to calls on the chain, so that the function call gets
delayed and the sentinel is replaced by the piped object when it is
actually executed - say:

>>> curry.mul(DELAY, 2).mul(DELAY, 3).complex.int(5)
(30+0j)

So I did put this together - but lacking a concrete use case myself,
it is somewhat
"amorphous" - lacking a specification of what it should do -
it can, for example, retrieve names from the piped object's attributes
instead of the calling namespace:

>>> curry.split.upper.str("good morning Vietnam")
['GOOD', 'MORNING', 'VIETNAM']

And the "|" operator is overridden as well, so that with some
parentheses, lambdas and other things can be added to the chain -

Just throwing in what could give you more ideas for the approach you
have in mind. This one works by applying the calls on the right side first
and traversing the object to the left - but it should be easy to do
the opposite - starting with a call with the "seed" object on the
left, and chaining calls on the right.

If you find the idea interesting enough to be of use, I'd be happy to
evolve what is already in place there so it could be useful.

regards,

   js
  -><-


On 26 May 2015 at 06:46, Wes Turner <wes.turner at gmail.com> wrote:
>
> On May 25, 2015 6:45 PM, "Stephan Hoyer" <shoyer at gmail.com> wrote:
>>
>> In the PyData community, we really like method chaining for data analysis
>> pipelines:
>>
>> (iris.query('SepalLength > 5')
>>  .assign(SepalRatio = lambda x: x.SepalWidth / x.SepalLength,
>>          PetalRatio = lambda x: x.PetalWidth / x.PetalLength)
>>  .plot(kind='scatter', x='SepalRatio', y='PetalRatio'))
>>
>>
>> Unfortunately, method chaining isn't very extensible -- short of monkey
> patching, every method we want to use has to exist on the original object. If a
>> user wants to supply their own plotting function, they can't use method
>> chaining anymore.
>
>>
>> You may recall that we brought this up a few months ago on python-ideas as
>> an example of why we would like macros.
>>
>> To get around this issue, we are contemplating adding a pipe method to
>> pandas DataFrames. It looks like this:
>>
>> def pipe(self, func, *args, **kwargs):
>>     pipe_func = getattr(func, '__pipe_func__', func)
>>     return pipe_func(self, *args, **kwargs)
>>
>>
>> We would encourage third party libraries with objects on which method
>> chaining is useful to define a pipe method in the same way.
>>
>> The main idea here is to create an easy way for users to do method
>> chaining with their own functions and with functions from third party
>> libraries.
>>
>> The business with __pipe_func__ is more magical, and frankly we aren't
>> sure it's worth the complexity. The idea is to create a "pipe protocol" that
>> allows functions to decide how they are called when piped. This is useful in
>> some cases, because it doesn't always make sense for functions that act on
>> piped data to accept that data as their first argument.
>>
>> For more motivation and examples, please read the opening post in this
>> GitHub issue: https://github.com/pydata/pandas/issues/10129
>>
>> Obviously, this sort of protocol would not be an official part of the
>> Python language. But because we are considering creating a de-facto
>> standard, we would love to get feedback from other Python communities that
>> use method chaining:
>> 1. Have you encountered or addressed the problem of extensible method
>> chaining?
>
> * https://pythonhosted.org/pyquery/api.html
> * SQLAlchemy
>
>> 2. Would this pipe protocol be useful to you?
>
> What are the advantages over just returning 'self'? (Which use cases are not
> possible with current syntax?)
>
> In terms of documenting functional composition, I find it easier to test and
> add comment strings to multiple statements.
>
> Months ago, when I looked at creating pandasrdf (pandas #3402), there was
> a need for a (...).meta.columns w/ columnar URIs, units, (metadata: who, what,
> when, how). Said metadata is not storable with e.g. CSV, but is with
> JSON-LD, RDF, RDFa, CSVW.
>
> It would be neat to be able to track provenance metadata through [chained]
> transformations.
>
>> 3. Is it worth allowing piped functions to override how they are called by
>> defining something like __pipe_func__?
>
> "There should be one-- and preferably only one --obvious way to do it."
>
>> Note that I'm not particularly interested in feedback about how we
>> shouldn't be defining double underscore methods. There are other ways we
>> could spell __pipe_func__, but double underscores seems to be pretty
>> standard for ad-hoc protocols.
>> Thanks for your attention.
>> Best,
>> Stephan
>>
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From tjreedy at udel.edu  Tue May 26 15:45:32 2015
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 26 May 2015 09:45:32 -0400
Subject: [Python-ideas] Framework for Python for CS101
In-Reply-To: <CAPTjJmrZpAiwn6OFO22ZyeFnZzwBWwxH=VeEAEkF-F+5LmZV9Q@mail.gmail.com>
References: <CAJ+TeoeaQog7s8bo=CuhTH-Y=CVR9J8B0-vX=1AUKxsiQimz3Q@mail.gmail.com>
 <C61507D3-B5AA-445B-839A-1A9F1D5A9C7B@yahoo.com>
 <CALGmxEJMB6gweSro0Yhx8gSmX1gjtSXgmjpV+3bbvjCmY4c4WA@mail.gmail.com>
 <eb760ade-6154-47a6-8d70-38822ca1949d@googlegroups.com>
 <CALGmxEL5-DLiYELK76Qqx3vhRSSu=cVZf8obT4tmYX9s+hgcwg@mail.gmail.com>
 <c08a5b9f-acf3-41a8-94b7-a35f60b93e09@googlegroups.com>
 <mk11rd$vbj$1@ger.gmane.org>
 <CAPTjJmrZpAiwn6OFO22ZyeFnZzwBWwxH=VeEAEkF-F+5LmZV9Q@mail.gmail.com>
Message-ID: <mk1ted$cq5$1@ger.gmane.org>

On 5/26/2015 2:07 AM, Chris Angelico wrote:
> On Tue, May 26, 2015 at 3:54 PM, Terry Reedy <tjreedy at udel.edu> wrote:

>> Programming is composition, the connections of outputs to inputs (as with
>> circuit design).  'Print' is the enemy of composition. We agree so far.
>
> That's fine as long as it's okay to produce no results whatsoever
> until all processing is complete.
> In a pure sense, yes, a program's
> goal is to produce output, and it doesn't make a lot of difference how
> that output is produced. You can build a web framework in which the
> only way to send a result is to return it from a function. But there
> are innumerable times when it's more useful to produce intermediate
> output; whether that output goes to a file, a socket, the console, or
> something else, it's as much a part of real-world programming as
> returned values are.

The context is a beginning programming course where the goal is to teach 
people to write

def f(a, b, c): return a*b + c
print(f(2, 3, 4))

instead of

def f(a, b, c): print(a*b + c)
f(2, 3, 4)

In other words, to teach beginners to relegate output to top-level code, 
separate from the calculation code. (Or perhaps to output functions, but 
that is a more advanced topic.)  The first function is easily testable; 
the second is not.

For printing intermediate results, yield lines to top-level code that 
can do whatever with them, including printing.

def text_generator(args):
    ...
        yield line

for line in text_generator(args): print(line)

is top-level code that prints intermediate results produced by a 
testable generator.
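A runnable version of the same pattern (the `squares` generator is just an example):

```python
def squares(n):
    # Testable generator: yields intermediate results instead of printing them.
    for i in range(n):
        yield i * i

# Easy to test, because the results are values rather than side effects:
assert list(squares(4)) == [0, 1, 4, 9]

# Top-level code decides what to do with the intermediate results:
for value in squares(4):
    print(value)
```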

People want to see results, which is half of why I said not to delete 
print.  But proper assignments and grading can enforce separation of 
calculations from use of results.  The idea of separation of concerns 
did not start with OOP.

-- 
Terry Jan Reedy


From julien at palard.fr  Tue May 26 16:33:19 2015
From: julien at palard.fr (Julien Palard)
Date: Tue, 26 May 2015 16:33:19 +0200
Subject: [Python-ideas] The pipe protocol,
 a convention for extensible method chaining
In-Reply-To: <CAEQ_Tve+0MwkJY+MDOb3dUSVg8=q99WuE-Qps_ik6EXjOWtohQ@mail.gmail.com>
References: <CAEQ_Tve+0MwkJY+MDOb3dUSVg8=q99WuE-Qps_ik6EXjOWtohQ@mail.gmail.com>
Message-ID: <5564842F.5000501@palard.fr>

o/

On 05/26/2015 01:38 AM, Stephan Hoyer wrote:
> In the PyData community, we  really like method chaining for data
 > analysis pipelines:

A few months ago, I started a similar thread here:

  https://mail.python.org/pipermail//python-ideas/2014-October/029839.html

about a package of mine, https://pypi.python.org/pypi/pipe, that I'm not 
especially proud of.

As the replies in that thread state, using pipelines is not the Pythonic 
way to do things. It may look like it makes the code more readable, but 
that's not true: there's [always] a Pythonic way to write the same code 
just as readably, for example by assigning intermediate computations to 
variables. A side gain: the variables are named, so the code is 
self-documenting.

My view on pipes is that pipelines are hard to modify because, for the 
reader, what is passed between each step is opaque.

My view on my own library is that I should not have exposed operator 
overloading, as it may confuse people who actually need `|` to apply a 
bitwise or to the result of a chained expression.

--
Julien Palard


From rustompmody at gmail.com  Tue May 26 16:55:36 2015
From: rustompmody at gmail.com (Rustom Mody)
Date: Tue, 26 May 2015 07:55:36 -0700 (PDT)
Subject: [Python-ideas] Framework for Python for CS101
In-Reply-To: <mk1ted$cq5$1@ger.gmane.org>
References: <CAJ+TeoeaQog7s8bo=CuhTH-Y=CVR9J8B0-vX=1AUKxsiQimz3Q@mail.gmail.com>
 <C61507D3-B5AA-445B-839A-1A9F1D5A9C7B@yahoo.com>
 <CALGmxEJMB6gweSro0Yhx8gSmX1gjtSXgmjpV+3bbvjCmY4c4WA@mail.gmail.com>
 <eb760ade-6154-47a6-8d70-38822ca1949d@googlegroups.com>
 <CALGmxEL5-DLiYELK76Qqx3vhRSSu=cVZf8obT4tmYX9s+hgcwg@mail.gmail.com>
 <c08a5b9f-acf3-41a8-94b7-a35f60b93e09@googlegroups.com>
 <mk11rd$vbj$1@ger.gmane.org>
 <CAPTjJmrZpAiwn6OFO22ZyeFnZzwBWwxH=VeEAEkF-F+5LmZV9Q@mail.gmail.com>
 <mk1ted$cq5$1@ger.gmane.org>
Message-ID: <dde3732c-5b2b-4bde-bd08-b0177a62a63a@googlegroups.com>



On Tuesday, May 26, 2015 at 7:16:18 PM UTC+5:30, Terry Reedy wrote:
>
> On 5/26/2015 2:07 AM, Chris Angelico wrote: 
> > On Tue, May 26, 2015 at 3:54 PM, Terry Reedy <tjr... at udel.edu> wrote: 
>
> >> Programming is composition, the connections of outputs to inputs (as 
> with 
> >> circuit design).  'Print' is the enemy of composition. We agree so far. 
> > 
> > That's fine as long as it's okay to produce no results whatsoever 
> > until all processing is complete. 
> > In a pure sense, yes, a program's 
> > goal is to produce output, and it doesn't make a lot of difference how 
> > that output is produced. You can build a web framework in which the 
> > only way to send a result is to return it from a function. But there 
> > are innumerable times when it's more useful to produce intermediate 
> > output; whether that output goes to a file, a socket, the console, or 
> > something else, it's as much a part of real-world programming as 
> > returned values are. 
>
> The context is a beginning programming course where the goal is to teach 
> people to write 
>
> def f(a, b, c): return a*b + c 
> print(f(2, 3, 4)) 
>
> instead of 
>
> def f(a, b, c): print(a*b + c) 
> f(2, 3, 4) 
>
> In other words, to teach beginners to relegate output to top-level code, 
> separate from the calculation code. (Or perhaps to output functions, but 
> that is a more advanced topic.)  The first function is easily testable; 
> the second is not. 
>

Thanks Terry for the elucidation
 

>
> For printing intermediate results, yield lines to top-level code that 
> can do whatever with them, including printing. 
>
> def text_generator(args): 
>     ... 
>         yield line 
>
> for line in text_generator(args): print(line) 
>
> is top-level code that prints intermediate results produced by a 
> testable generator. 
>
>
And thanks-squared for that.
Generators are a really wonderful feature of Python and not showcased 
enough.
Think of lazy lists in Haskell and how much fanfare and trumpeting goes 
on around those.
And by contrast how little of that for generators in the Python world. Are 
the two all that different?

You just have to think of all the data-structure/AI/etc. books explaining 
depth-first search and more arcane algorithms with a 'print' in the innards 
of them.
And how far generators as a fundamental tool would go towards 
clarifying/modularizing those explanations.

So yes, generators are an important component towards the goal of 
'print-less' programming.

> People want to see results, which is half of why I said not to delete 
> print.  But proper assignments and grading can enforce separation of 
> calculations from use of results.  The idea of separation of concerns 
> did not start with OOP. 
>
> -- 
> Terry Jan Reedy 
>
> _______________________________________________ 
> Python-ideas mailing list 
> Python... at python.org 
> https://mail.python.org/mailman/listinfo/python-ideas 
> Code of Conduct: http://python.org/psf/codeofconduct/ 
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150526/1267cb3d/attachment.html>

From chris.barker at noaa.gov  Tue May 26 19:13:08 2015
From: chris.barker at noaa.gov (Chris Barker)
Date: Tue, 26 May 2015 10:13:08 -0700
Subject: [Python-ideas] Framework for Python for CS101
In-Reply-To: <5118b7c3-12d8-4c2d-81e0-430a7522a391@googlegroups.com>
References: <CAJ+TeoeaQog7s8bo=CuhTH-Y=CVR9J8B0-vX=1AUKxsiQimz3Q@mail.gmail.com>
 <20150526035048.GJ5663@ando.pearwood.info>
 <CADiSq7cWS2d6+f_h191V1xCpy91ufi-PaCGQ0c9uH+ZMt_Y=jA@mail.gmail.com>
 <5118b7c3-12d8-4c2d-81e0-430a7522a391@googlegroups.com>
Message-ID: <CALGmxE+8VnESoe7EmaK9AfG+amVk463UgUrH9N22=txjz+YJZw@mail.gmail.com>

On Mon, May 25, 2015 at 10:36 PM, Rustom Mody <rustompmody at gmail.com> wrote:

> If you think the separation of these two worlds is unnecessary then you
> have the mess of C's 'expressions' like ++
> And you will have students puzzling over the wonders of nature like i =
> i++ whereas the most useful answer would be "Syntax Error"
>

A good reason NOT to teach C as a first language ;-)


> And finally all this is rather OT.  I am talking of a framework for a
> teacher to chart a course through python, not any changes per se to python
> itself.
>

I would argue that you are actually not talking about teaching Python, per
se, but about using a (subset of) Python to teach programming in the more
general sense.

If you want to teach Python, then I think it is a mistake to teach a
truncated version first -- it will just lead to confusion later.

But having a "functional" version of Python for teaching functional
programming concepts makes some sense.

Though I think one could create that as a monkey-patched version of Python
without hacking into the core Python implementation:

i.e. replace map(), etc., with versions that return tuples rather than
lists, that kind of thing.

Though maybe replacing list comprehensions with tuple comprehensions would
be a bit tricky...
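A sketch of that kind of monkey-patching (shadowing builtins at module level; this is illustrative only, not a proposal for CPython itself):

```python
import builtins

def map(func, *iterables):
    # Tuple-returning replacement for the builtin map.
    return tuple(builtins.map(func, *iterables))

def filter(func, iterable):
    # Tuple-returning replacement for the builtin filter.
    return tuple(builtins.filter(func, iterable))

# An immutable result, unlike a list:
print(map(lambda x: x * 2, [1, 2, 3]))  # prints (2, 4, 6)
```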

Though I'm still not sure you'd need to -- sure you CAN mutate a list, but
if you use functional approaches, lists won't get mutated -- so where is
the source of the confusion?

-Chris



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150526/a5f6a892/attachment.html>

From techtonik at gmail.com  Tue May 26 20:05:06 2015
From: techtonik at gmail.com (anatoly techtonik)
Date: Tue, 26 May 2015 21:05:06 +0300
Subject: [Python-ideas] Timer that starts as soon as it is imported
In-Reply-To: <20150522105847.GA9624@phdru.name>
References: <CAPkN8xJRkP+3266FTjT3tKJPYwgzg=3Q04jjn8ZRxkYDoFP7iA@mail.gmail.com>
 <20150522105847.GA9624@phdru.name>
Message-ID: <CAPkN8x+MCUmT7O_Yze38uBankr6nfu8q8qdxA=Lsx4mq7AjK4Q@mail.gmail.com>

On Fri, May 22, 2015 at 1:58 PM, Oleg Broytman <phd at phdru.name> wrote:
> On Fri, May 22, 2015 at 12:59:30PM +0300, anatoly techtonik <techtonik at gmail.com> wrote:
>> Is the idea to have timer that starts on import is good?
>
>    No, because:
>
> -- it could be imported at the wrong time;

Any time is right.

> -- it couldn't be "reimported"; what is the usage of one-time timer?

The idea is to have a convenient default timer to measure
script run-time.

> -- if it could be reset and restarted at need -- why not start it
>    manually in the first place?

Current ways of measuring script run-time are either not cross-platform or
not easy to memorize. I have had to reinvent timer code a couple of times,
and that's not convenient for code that is only relevant while debugging.
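For what it's worth, such a helper is only a few lines with the stdlib (a sketch; the module name `autotimer` is invented):

```python
# autotimer.py -- the clock starts when this module is first imported,
# and the elapsed wall time is reported when the interpreter exits.
import atexit
import time

_start = time.perf_counter()  # cross-platform, monotonic (Python 3.3+)

def elapsed():
    """Seconds since this module was imported."""
    return time.perf_counter() - _start

# Report the total run time at interpreter shutdown.
atexit.register(lambda: print('run time: %.3fs' % elapsed()))
```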

From wes.turner at gmail.com  Tue May 26 20:19:59 2015
From: wes.turner at gmail.com (Wes Turner)
Date: Tue, 26 May 2015 13:19:59 -0500
Subject: [Python-ideas] Framework for Python for CS101
In-Reply-To: <8716F442-0C8A-48C9-95BC-AC8E63696DF2@yahoo.com>
References: <CAJ+TeoeaQog7s8bo=CuhTH-Y=CVR9J8B0-vX=1AUKxsiQimz3Q@mail.gmail.com>
 <20150526035048.GJ5663@ando.pearwood.info>
 <CADiSq7cWS2d6+f_h191V1xCpy91ufi-PaCGQ0c9uH+ZMt_Y=jA@mail.gmail.com>
 <5118b7c3-12d8-4c2d-81e0-430a7522a391@googlegroups.com>
 <8716F442-0C8A-48C9-95BC-AC8E63696DF2@yahoo.com>
Message-ID: <CACfEFw9-PVa7o4twdiAM9WfUOHYdWq9Zizotj0JR39cLzMSQXg@mail.gmail.com>

Ways to teach Python from first principles:

* Restrict the syntactical token list ("switch features on and off")
  * Fork Python
  * RPython -- https://rpython.readthedocs.org/en/latest/
  * https://pypi.python.org/pypi/RestrictedPython
  * http://pyvideo.org/video/2585/building-and-breaking-a-python-sandbox
  * OR: execute code in container (e.g. LXC, LXD, Docker (JupyterHub);
virtualization)

* Add a preprocessor with a cost function to limit valid tokens for a given
code submission
  (see the links to the Python grammar, tokenizer, and compiler above)

* Modify nbgrader to evaluate submissions with such a cost function:
  https://github.com/jupyter/nbgrader

* Receive feedback about code syntax and tests from a CI system with
repository commit (web)hooks
  * BuildBot, Jenkins, Travis CI, xUnit XML

https://westurner.org/wiki/awesome-python-testing#continuous-integration-ci-and-continuous-delivery-cd



On Tue, May 26, 2015 at 1:56 AM, Andrew Barnert via Python-ideas <
python-ideas at python.org> wrote:

> On May 25, 2015, at 22:36, Rustom Mody <rustompmody at gmail.com> wrote:
> >
> > I am talking of a framework for a teacher to chart a course through
> python, not any changes per se to python itself.
>
> How exactly can you allow a teacher to "chart a course through python"
> that includes separate function and generator function definition
> statements, procedures as distinct from functions, etc. without changing
> Python? Python doesn't have the configurability to switch those features on
> and off, and also doesn't have the features to switch on in the first place.
>
> > A teacher wanting to chart a different course through python should be
> free (and encouraged) to do that as well.
>
>
> I would like a framework for a teacher to chart a course through driving
> the Nissan 370Z that would allow me to start off teaching hoverpads instead
> of wheels, but a teacher wanting to chart a different course should be free
> to start with sails instead. And I want to do this without changing
> anything about the 370Z.
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150526/8e69fd0a/attachment-0001.html>

From rosuav at gmail.com  Tue May 26 20:21:01 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 27 May 2015 04:21:01 +1000
Subject: [Python-ideas] Timer that starts as soon as it is imported
In-Reply-To: <CAPkN8x+MCUmT7O_Yze38uBankr6nfu8q8qdxA=Lsx4mq7AjK4Q@mail.gmail.com>
References: <CAPkN8xJRkP+3266FTjT3tKJPYwgzg=3Q04jjn8ZRxkYDoFP7iA@mail.gmail.com>
 <20150522105847.GA9624@phdru.name>
 <CAPkN8x+MCUmT7O_Yze38uBankr6nfu8q8qdxA=Lsx4mq7AjK4Q@mail.gmail.com>
Message-ID: <CAPTjJmqzY5tt1DQH4b=XPK48QfU+ozFP4JS8Km10JgTeEXmLGQ@mail.gmail.com>

On Wed, May 27, 2015 at 4:05 AM, anatoly techtonik <techtonik at gmail.com> wrote:
>> -- if it could be reset and restarted at need -- why not start it
>>    manually in the first place?
>
> Current ways of measuring script run-time are either not cross-platform or
> not easy to memorize. I have had to reinvent timer code a couple of times,
> and that's not convenient for code that is only relevant while debugging.

Sounds to me like something that doesn't belong in the stdlib, but
makes a great utility module for private use.

ChrisA

From wes.turner at gmail.com  Tue May 26 20:25:27 2015
From: wes.turner at gmail.com (Wes Turner)
Date: Tue, 26 May 2015 13:25:27 -0500
Subject: [Python-ideas] Framework for Python for CS101
In-Reply-To: <CACfEFw9-PVa7o4twdiAM9WfUOHYdWq9Zizotj0JR39cLzMSQXg@mail.gmail.com>
References: <CAJ+TeoeaQog7s8bo=CuhTH-Y=CVR9J8B0-vX=1AUKxsiQimz3Q@mail.gmail.com>
 <20150526035048.GJ5663@ando.pearwood.info>
 <CADiSq7cWS2d6+f_h191V1xCpy91ufi-PaCGQ0c9uH+ZMt_Y=jA@mail.gmail.com>
 <5118b7c3-12d8-4c2d-81e0-430a7522a391@googlegroups.com>
 <8716F442-0C8A-48C9-95BC-AC8E63696DF2@yahoo.com>
 <CACfEFw9-PVa7o4twdiAM9WfUOHYdWq9Zizotj0JR39cLzMSQXg@mail.gmail.com>
Message-ID: <CACfEFw9VeRbVOcBgR9kvM+HWpvh3Tpnqbsnb=iO4TrK022LiLw@mail.gmail.com>

On Tue, May 26, 2015 at 1:19 PM, Wes Turner <wes.turner at gmail.com> wrote:

> Ways to teach Python from first principles:
>
> * Restrict the syntactical token list ("switch features on and off")
>   * Fork Python
>   * RPython -- https://rpython.readthedocs.org/en/latest/
>

RPython -> PyPy: https://bitbucket.org/pypy/pypy

PyPy is both an implementation of the Python programming language, and an
> extensive compiler framework for dynamic language implementations. You can
> build self-contained Python implementations which execute independently
> from CPython.



>   * https://pypi.python.org/pypi/RestrictedPython
>   * http://pyvideo.org/video/2585/building-and-breaking-a-python-sandbox
>   * OR: execute code in container (e.g. LXC, LXD, Docker (JupyterHub);
> virtualization)
>
> * Add a preprocessor with a cost function to limit valid tokens for a
> given code submission
>   (see the links to the Python grammar, tokenizer, compiler linked above)
>
> * Modify nbgrader to evaluate submissions with such a cost function:
>   https://github.com/jupyter/nbgrader
>
> * Receive feedback about code syntax and tests from a CI system with
> repository commit (web)hooks
>   * BuildBot, Jenkins, Travis CI, xUnit XML
>
> https://westurner.org/wiki/awesome-python-testing#continuous-integration-ci-and-continuous-delivery-cd
>
>
>
> On Tue, May 26, 2015 at 1:56 AM, Andrew Barnert via Python-ideas <
> python-ideas at python.org> wrote:
>
>> On May 25, 2015, at 22:36, Rustom Mody <rustompmody at gmail.com> wrote:
>> >
>> > I am talking of a framework for a teacher to chart a course through
>> python, not any changes per se to python itself.
>>
>> How exactly can you allow a teacher to "chart a course through python"
>> that includes separate function and generator function definition
>> statements, procedures as distinct from functions, etc. without changing
>> Python? Python doesn't have the configurability to switch those features on
>> and off, and also doesn't have the features to switch on in the first place.
>>
>> > A teacher wanting to chart a different course through python should be
>> free (and encouraged) to do that as well.
>>
>>
>> I would like a framework for a teacher to chart a course through driving
>> the Nissan 370Z that would allow me to start off teaching hoverpads instead
>> of wheels, but a teacher wanting to chart a different course should be free
>> to start with sails instead. And I want to do this without changing
>> anything about the 370Z.
>>
>>
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150526/2b15a0a8/attachment.html>

From rosuav at gmail.com  Tue May 26 20:28:39 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 27 May 2015 04:28:39 +1000
Subject: [Python-ideas] Timer that starts as soon as it is imported
In-Reply-To: <CAPkN8xJTba+pAnAPb3OVOi==CCgdP9gBU9Qr5xHqmO8XMCZLTA@mail.gmail.com>
References: <CAPkN8xJRkP+3266FTjT3tKJPYwgzg=3Q04jjn8ZRxkYDoFP7iA@mail.gmail.com>
 <20150522105847.GA9624@phdru.name>
 <CAPkN8x+MCUmT7O_Yze38uBankr6nfu8q8qdxA=Lsx4mq7AjK4Q@mail.gmail.com>
 <CAPTjJmqzY5tt1DQH4b=XPK48QfU+ozFP4JS8Km10JgTeEXmLGQ@mail.gmail.com>
 <CAPkN8xJTba+pAnAPb3OVOi==CCgdP9gBU9Qr5xHqmO8XMCZLTA@mail.gmail.com>
Message-ID: <CAPTjJmrzQfXqgAjrH+C0D=+VwpCVMErpPkh15q9UzM-D9Ax=3w@mail.gmail.com>

On Wed, May 27, 2015 at 4:24 AM, anatoly techtonik <techtonik at gmail.com> wrote:
> There are a lot of helpers like this that might be useful. Installing them
> separately is a lot of hassle - it is easy to forget some.

Package 'em all up into a single repository and clone that repo on
every system you use. For me, that's called "shed", and I keep it on
github:

https://github.com/Rosuav/shed

But whether it's public or private, git or hg, pure Python or a mix of
languages, it's an easy way to pick up all those convenient little
scripts. You'll never "forget some", because they're all in one place.

ChrisA

From techtonik at gmail.com  Tue May 26 20:24:37 2015
From: techtonik at gmail.com (anatoly techtonik)
Date: Tue, 26 May 2015 21:24:37 +0300
Subject: [Python-ideas] Timer that starts as soon as it is imported
In-Reply-To: <CAPTjJmqzY5tt1DQH4b=XPK48QfU+ozFP4JS8Km10JgTeEXmLGQ@mail.gmail.com>
References: <CAPkN8xJRkP+3266FTjT3tKJPYwgzg=3Q04jjn8ZRxkYDoFP7iA@mail.gmail.com>
 <20150522105847.GA9624@phdru.name>
 <CAPkN8x+MCUmT7O_Yze38uBankr6nfu8q8qdxA=Lsx4mq7AjK4Q@mail.gmail.com>
 <CAPTjJmqzY5tt1DQH4b=XPK48QfU+ozFP4JS8Km10JgTeEXmLGQ@mail.gmail.com>
Message-ID: <CAPkN8xJTba+pAnAPb3OVOi==CCgdP9gBU9Qr5xHqmO8XMCZLTA@mail.gmail.com>

On Tue, May 26, 2015 at 9:21 PM, Chris Angelico <rosuav at gmail.com> wrote:
> On Wed, May 27, 2015 at 4:05 AM, anatoly techtonik <techtonik at gmail.com> wrote:
>>> -- if it could be reset and restarted at need -- why not start it
>>>    manually in the first place?
>>
>> Current ways of measuring script run-time are either not cross-platform or
>> not easy to memorize. I have had to reinvent timer code a couple of times,
>> and that's not convenient for code that is only relevant while debugging.
>
> Sounds to me like something that doesn't belong in the stdlib, but
> makes a great utility module for private use.

There are a lot of helpers like this that might be useful. Installing them
separately is a lot of hassle - it is easy to forget some.

From techtonik at gmail.com  Tue May 26 20:30:54 2015
From: techtonik at gmail.com (anatoly techtonik)
Date: Tue, 26 May 2015 21:30:54 +0300
Subject: [Python-ideas] Lossless bulletproof conversion to unicode
	(backslashing)
Message-ID: <CAPkN8xKTXJu2nhvocG8KuyO1XkJVfK_WsmY6dM=hWsVyg+BVyA@mail.gmail.com>

https://docs.python.org/2.7/library/functions.html?highlight=unicode#unicode

There is no lossless way to convert arbitrary byte data
to unicode. The argument that you know the encoding
the data is coming from is a fallacy. The argument that
the data is always correct is a fallacy as well. So:

1. external data encoding is unknown or varies
2. external data has binary chunks that are invalid for
conversion to unicode

In the real world you have to deal with broken and invalid
output, and crashing with UnicodeDecodeError is not an option.
The unicode() constructor offers two options for dealing
with invalid input:

1. ignore  - meaning skip, and so corrupt, the data
2. replace  - just corrupt the data

The solution is to have a filter preprocess the binary
string to escape all non-decodable bytes so that the
following lossless round-trip becomes possible:

   binary -> escaped utf-8 string -> unicode -> binary

How to accomplish that with Python 2.x?

This stuff is critical to port SCons to Python 3.x and I
expect for other such tools too.
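[Editorial note: Python 3 already provides exactly this kind of lossless round-trip via the surrogateescape error handler from PEP 383; a minimal sketch:]

```python
# Arbitrary bytes, including sequences that are invalid UTF-8:
raw = b'valid \xff\xfe invalid'

# Decoding never fails: invalid bytes become lone surrogates U+DC80..U+DCFF.
text = raw.decode('utf-8', errors='surrogateescape')

# Encoding with the same handler restores the original bytes exactly.
assert text.encode('utf-8', errors='surrogateescape') == raw
```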

-- 
anatoly t.

From ethan at stoneleaf.us  Tue May 26 20:47:47 2015
From: ethan at stoneleaf.us (Ethan Furman)
Date: Tue, 26 May 2015 11:47:47 -0700
Subject: [Python-ideas] Lossless bulletproof conversion to unicode
	(backslashing)
In-Reply-To: <CAPkN8xKTXJu2nhvocG8KuyO1XkJVfK_WsmY6dM=hWsVyg+BVyA@mail.gmail.com>
References: <CAPkN8xKTXJu2nhvocG8KuyO1XkJVfK_WsmY6dM=hWsVyg+BVyA@mail.gmail.com>
Message-ID: <5564BFD3.7000101@stoneleaf.us>

On 05/26/2015 11:30 AM, anatoly techtonik wrote:

[...]

> How to accomplish that with Python 2.x?

This should be on Python List, not on Ideas.

--
~Ethan~

From phd at phdru.name  Tue May 26 21:06:46 2015
From: phd at phdru.name (Oleg Broytman)
Date: Tue, 26 May 2015 21:06:46 +0200
Subject: [Python-ideas] Timer that starts as soon as it is imported
In-Reply-To: <CAPkN8x+MCUmT7O_Yze38uBankr6nfu8q8qdxA=Lsx4mq7AjK4Q@mail.gmail.com>
References: <CAPkN8xJRkP+3266FTjT3tKJPYwgzg=3Q04jjn8ZRxkYDoFP7iA@mail.gmail.com>
 <20150522105847.GA9624@phdru.name>
 <CAPkN8x+MCUmT7O_Yze38uBankr6nfu8q8qdxA=Lsx4mq7AjK4Q@mail.gmail.com>
Message-ID: <20150526190646.GA12204@phdru.name>

Hi!

On Tue, May 26, 2015 at 09:05:06PM +0300, anatoly techtonik <techtonik at gmail.com> wrote:
> On Fri, May 22, 2015 at 1:58 PM, Oleg Broytman <phd at phdru.name> wrote:
> > On Fri, May 22, 2015 at 12:59:30PM +0300, anatoly techtonik <techtonik at gmail.com> wrote:
> >> Is the idea to have timer that starts on import is good?
> >
> >    No, because:
> >
> > -- it could be imported at the wrong time;
> 
> Any time is right.

   Very much application-dependent. What if you wanna measure import
time?

> > -- it couldn't be "reimported"; what is the usage of one-time timer?
> 
> The idea is to have convenient default timer to measure
> script run-time.

   Good idea for a small separate project. Bad for the stdlib. Not every
small simple useful module must be in the stdlib.

Oleg.
-- 
     Oleg Broytman            http://phdru.name/            phd at phdru.name
           Programmers don't die, they just GOSUB without RETURN.

From phd at phdru.name  Tue May 26 21:08:01 2015
From: phd at phdru.name (Oleg Broytman)
Date: Tue, 26 May 2015 21:08:01 +0200
Subject: [Python-ideas] Timer that starts as soon as it is imported
In-Reply-To: <CAPkN8xJTba+pAnAPb3OVOi==CCgdP9gBU9Qr5xHqmO8XMCZLTA@mail.gmail.com>
References: <CAPkN8xJRkP+3266FTjT3tKJPYwgzg=3Q04jjn8ZRxkYDoFP7iA@mail.gmail.com>
 <20150522105847.GA9624@phdru.name>
 <CAPkN8x+MCUmT7O_Yze38uBankr6nfu8q8qdxA=Lsx4mq7AjK4Q@mail.gmail.com>
 <CAPTjJmqzY5tt1DQH4b=XPK48QfU+ozFP4JS8Km10JgTeEXmLGQ@mail.gmail.com>
 <CAPkN8xJTba+pAnAPb3OVOi==CCgdP9gBU9Qr5xHqmO8XMCZLTA@mail.gmail.com>
Message-ID: <20150526190801.GB12204@phdru.name>

On Tue, May 26, 2015 at 09:24:37PM +0300, anatoly techtonik <techtonik at gmail.com> wrote:
> On Tue, May 26, 2015 at 9:21 PM, Chris Angelico <rosuav at gmail.com> wrote:
> > On Wed, May 27, 2015 at 4:05 AM, anatoly techtonik <techtonik at gmail.com> wrote:
> >>> -- if it could be reset and restarted at need -- why not start it
> >>>    manually in the first place?
> >>
> >> Current ways of measuring script run-time are either not cross-platform or
> >> not easy to memorize. I have had to reinvent timer code a couple of times,
> >> and that's not convenient for code that is only relevant while debugging.
> >
> > Sounds to me like something that doesn't belong in the stdlib, but
> > makes a great utility module for private use.
> 
> There are a lot of helpers like this that might be useful. Installing them
> separately is a lot of hassle - it is easy to forget some.

   Incorporate them into your main repository as submodules.

Oleg.
-- 
     Oleg Broytman            http://phdru.name/            phd at phdru.name
           Programmers don't die, they just GOSUB without RETURN.

From tjreedy at udel.edu  Tue May 26 21:20:47 2015
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 26 May 2015 15:20:47 -0400
Subject: [Python-ideas] Framework for Python for CS101
In-Reply-To: <dde3732c-5b2b-4bde-bd08-b0177a62a63a@googlegroups.com>
References: <CAJ+TeoeaQog7s8bo=CuhTH-Y=CVR9J8B0-vX=1AUKxsiQimz3Q@mail.gmail.com>
 <C61507D3-B5AA-445B-839A-1A9F1D5A9C7B@yahoo.com>
 <CALGmxEJMB6gweSro0Yhx8gSmX1gjtSXgmjpV+3bbvjCmY4c4WA@mail.gmail.com>
 <eb760ade-6154-47a6-8d70-38822ca1949d@googlegroups.com>
 <CALGmxEL5-DLiYELK76Qqx3vhRSSu=cVZf8obT4tmYX9s+hgcwg@mail.gmail.com>
 <c08a5b9f-acf3-41a8-94b7-a35f60b93e09@googlegroups.com>
 <mk11rd$vbj$1@ger.gmane.org>
 <CAPTjJmrZpAiwn6OFO22ZyeFnZzwBWwxH=VeEAEkF-F+5LmZV9Q@mail.gmail.com>
 <mk1ted$cq5$1@ger.gmane.org>
 <dde3732c-5b2b-4bde-bd08-b0177a62a63a@googlegroups.com>
Message-ID: <mk2h30$pt0$1@ger.gmane.org>

On 5/26/2015 10:55 AM, Rustom Mody wrote:

> On Tuesday, May 26, 2015 at 7:16:18 PM UTC+5:30, Terry Reedy wrote:

>     The context is a beginning programming course where the goal is to
>     teach
>     people to write
>
>     def f(a, b, c): return a*b + c
>     print(f(2, 3, 4))
>
>     instead of
>
>     def f(a, b, c): print(a*b + c)
>     f(2, 3, 4)
>
>     In other words, to teach beginners to relegate output to top-level
>     code,
>     separate from the calculation code. (Or perhaps to output functions, but
>     that is a more advanced topic.)  The first function is easily testable;
>     the second is not.

> Thanks Terry for the elucidation

>     For printing intermediate results, yield lines to top-level code that
>     can do whatever with them, including printing.
>
>     def text_generator(args):
>          ...
>              yield line
>
>     for line in text_generator(args): print(line)
>
>     is top-level code that prints intermediate results produced by a
>     testable generator.

> And thanks-squared for that.
> Generators are a really wonderful feature of Python and not showcased
> enough.
> Think of lazy lists in Haskell and how much fanfare and trumpeting
> goes on around those.
> And by contrast how little of that for generators in the Python world.
> Are the two all that different?
>
> You just have to think of all the data-structure/AI/etc. books explaining
> depth-first search and more arcane algorithms with a 'print' in the
> innards of them.
> And how far generators as a fundamental tool would go towards
> clarifying/modularizing those explanations.
>
> So yes, generators are an important component towards the goal of
> 'print-less' programming.

I will just note that the stdlib is not immune from overly embedded 
prints.  Until 3.4, one could print a disassembly to stdout with 
dis.dis.  Period.  Hard for us to test; hard for others to use.  In 3.4, 
a file arg was added to dis, and the Bytecode and Instruction classes 
added, so one could a) iterate over unformatted named tuples, b) get the 
output as a string, or c) redirect the output to any 'file' (with a 
write method).  The result: easy for us to test; easy for others to use 
the data.
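For example, with the 3.4 API the disassembly becomes data one can work with (a small sketch):

```python
import dis
import io

def f(a, b, c):
    return a * b + c

# (a) iterate over Instruction named tuples instead of parsing printed text:
opnames = [instr.opname for instr in dis.Bytecode(f)]
assert 'RETURN_VALUE' in opnames

# (c) redirect the formatted disassembly to any object with a write method:
buf = io.StringIO()
dis.dis(f, file=buf)
assert 'RETURN_VALUE' in buf.getvalue()
```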

-- 
Terry Jan Reedy


From abarnert at yahoo.com  Tue May 26 22:32:08 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 26 May 2015 13:32:08 -0700
Subject: [Python-ideas] Framework for Python for CS101
In-Reply-To: <CACfEFw9-PVa7o4twdiAM9WfUOHYdWq9Zizotj0JR39cLzMSQXg@mail.gmail.com>
References: <CAJ+TeoeaQog7s8bo=CuhTH-Y=CVR9J8B0-vX=1AUKxsiQimz3Q@mail.gmail.com>
 <20150526035048.GJ5663@ando.pearwood.info>
 <CADiSq7cWS2d6+f_h191V1xCpy91ufi-PaCGQ0c9uH+ZMt_Y=jA@mail.gmail.com>
 <5118b7c3-12d8-4c2d-81e0-430a7522a391@googlegroups.com>
 <8716F442-0C8A-48C9-95BC-AC8E63696DF2@yahoo.com>
 <CACfEFw9-PVa7o4twdiAM9WfUOHYdWq9Zizotj0JR39cLzMSQXg@mail.gmail.com>
Message-ID: <D31B4234-75A1-489F-AB58-2C4CB23D4F60@yahoo.com>

On May 26, 2015, at 11:19, Wes Turner <wes.turner at gmail.com> wrote:
> 
> Ways to teach Python from first principles:

What you're suggesting may be a reasonable way to restrict Python for teaching (although, as others have argued, I don't think it's necessary)--but it isn't a reasonable way to get what Rustom Mody says he wants.

While his first paragraph started out talking about restricting Python to a subset, only one of the four examples I've seen actually is a restriction. He wants procedures and functions to be fundamentally distinct things, defined differently and called differently. You can't do that by restricting the token list, or by using RPython instead of Python, or by executing code inside a container. And the one that actually _is_ a restriction isn't at the grammar level, it's just hiding a bunch of methods (list.append, presumably list.__setitem__, etc.).

Of course you _could_ do everything he wants by forking one of the Python installations and heavily modifying it (I even suggested how that particular change could be implemented in a CPython fork), or by writing a new Python-like language with a compiler that compiles to Python (which, at its simplest, might be reducible to a set of MacroPy macros or a source preprocessor), because you can do _anything_ that way. But then you're not really talking about Python in the first place, you're talking about designing and implementing a new teaching language that just borrows a lot of ideas from Python and is implemented with Python's help. And none of the rest of your suggestions are relevant once that's what you're doing.

> * Restrict the syntactical token list ("switch features on and off")
>   * Fork Python
>   * RPython -- https://rpython.readthedocs.org/en/latest/ 
>   * https://pypi.python.org/pypi/RestrictedPython
>   * http://pyvideo.org/video/2585/building-and-breaking-a-python-sandbox
>   * OR: execute code in container (e.g. LXC, LXD, Docker (JupyterHub); virtualization)
> 
> * Add a preprocessor with a cost function to limit valid tokens for a given code submission
>   (see the links to the Python grammar, tokenizer, compiler linked above)
> 
> * Modify nbgrader to evaluate submissions with such a cost function:
>   https://github.com/jupyter/nbgrader
> 
> * Receive feedback about code syntax and tests from a CI system with repository commit (web)hooks
>   * BuildBot, Jenkins, Travis CI, xUnit XML
>     https://westurner.org/wiki/awesome-python-testing#continuous-integration-ci-and-continuous-delivery-cd
> 
> 
> 
>> On Tue, May 26, 2015 at 1:56 AM, Andrew Barnert via Python-ideas <python-ideas at python.org> wrote:
>> On May 25, 2015, at 22:36, Rustom Mody <rustompmody at gmail.com> wrote:
>> >
>> > I am talking of a framework for a teacher to chart a course through python, not any changes per se to python itself.
>> 
>> How exactly can you allow a teacher to "chart a course through python" that includes separate function and generator function definition statements, procedures as distinct from functions, etc. without changing Python? Python doesn't have the configurability to switch those features on and off, and also doesn't have the features to switch on in the first place.
>> 
>> > A teacher wanting to chart a different course through python should be free (and encouraged) to do that as well.
>> 
>> 
>> I would like a framework for a teacher to chart a course through driving the Nissan 370Z that would allow me to start off teaching hoverpads instead of wheels, but a teacher wanting to chart a different course should be free to start with sails instead. And I want to do this without changing anything about the 370Z.
>> 
>> 
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150526/58a584ab/attachment-0001.html>

From chris.barker at noaa.gov  Tue May 26 23:00:18 2015
From: chris.barker at noaa.gov (Chris Barker)
Date: Tue, 26 May 2015 14:00:18 -0700
Subject: [Python-ideas] Framework for Python for CS101
In-Reply-To: <D31B4234-75A1-489F-AB58-2C4CB23D4F60@yahoo.com>
References: <CAJ+TeoeaQog7s8bo=CuhTH-Y=CVR9J8B0-vX=1AUKxsiQimz3Q@mail.gmail.com>
 <20150526035048.GJ5663@ando.pearwood.info>
 <CADiSq7cWS2d6+f_h191V1xCpy91ufi-PaCGQ0c9uH+ZMt_Y=jA@mail.gmail.com>
 <5118b7c3-12d8-4c2d-81e0-430a7522a391@googlegroups.com>
 <8716F442-0C8A-48C9-95BC-AC8E63696DF2@yahoo.com>
 <CACfEFw9-PVa7o4twdiAM9WfUOHYdWq9Zizotj0JR39cLzMSQXg@mail.gmail.com>
 <D31B4234-75A1-489F-AB58-2C4CB23D4F60@yahoo.com>
Message-ID: <CALGmxELfW9OXUdgaZwNOfDMPby=EReKUNshNrEeCbifbig_m+w@mail.gmail.com>

On Tue, May 26, 2015 at 1:32 PM, Andrew Barnert via Python-ideas <
python-ideas at python.org> wrote:

> And the one that actually _is_ a restriction isn't at the grammar level,
> it's just hiding a bunch of methods (list.append, presumably
> list.__setitem__, etc.).
>

which is odd, because Python already has an immutable sequence -- it's
called a tuple.

-Chris



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From wes.turner at gmail.com  Tue May 26 23:33:38 2015
From: wes.turner at gmail.com (Wes Turner)
Date: Tue, 26 May 2015 16:33:38 -0500
Subject: [Python-ideas] Framework for Python for CS101
In-Reply-To: <D31B4234-75A1-489F-AB58-2C4CB23D4F60@yahoo.com>
References: <CAJ+TeoeaQog7s8bo=CuhTH-Y=CVR9J8B0-vX=1AUKxsiQimz3Q@mail.gmail.com>
 <20150526035048.GJ5663@ando.pearwood.info>
 <CADiSq7cWS2d6+f_h191V1xCpy91ufi-PaCGQ0c9uH+ZMt_Y=jA@mail.gmail.com>
 <5118b7c3-12d8-4c2d-81e0-430a7522a391@googlegroups.com>
 <8716F442-0C8A-48C9-95BC-AC8E63696DF2@yahoo.com>
 <CACfEFw9-PVa7o4twdiAM9WfUOHYdWq9Zizotj0JR39cLzMSQXg@mail.gmail.com>
 <D31B4234-75A1-489F-AB58-2C4CB23D4F60@yahoo.com>
Message-ID: <CACfEFw_TeOMbVxQaK5J-ZuJg0DKwJUs1WH_FTDw8HA3cD-a2pw@mail.gmail.com>

On Tue, May 26, 2015 at 3:32 PM, Andrew Barnert <abarnert at yahoo.com> wrote:

> On May 26, 2015, at 11:19, Wes Turner <wes.turner at gmail.com> wrote:
>
> Ways to teach Python from first principles:
>
>
> What you're suggesting may be a reasonable way to restrict Python for
> teaching (although, as others have argued, I don't think it's
> necessary)--but it isn't a reasonable way to get what Rustom Mody says he
> wants.
>
> While his first paragraph started out talking about restricting Python to
> a subset, only one of the four examples I've seen actually is a
> restriction. He wants procedures and functions to be fundamentally distinct
> things, defined differently and called differently. You can't do that by
> restricting the token list, or by using RPython instead of Python, or by
> executing code inside a container. And the one that actually _is_ a
> restriction isn't at the grammar level, it's just hiding a bunch of methods
> (list.append, presumably list.__setitem__, etc.).
>
> Of course you _could_ do everything he wants by forking one of the Python
> installations and heavily modifying it (I even suggested how that
> particular change could be implemented in a CPython fork), or by writing a
> new Python-like language with a compiler that compiles to Python (which, at
> its simplest, might be reducible to a set of MacroPy macros or a source
> preprocessor), because you can do _anything_ that way. But then you're not
> really talking about Python in the first place, you're talking about
> designing and implementing a new teaching language that just borrows a lot
> of ideas from Python and is implemented with Python's help. And none of the
> rest of your suggestions are relevant once that's what you're doing.
>

I must have misunderstood the objectives.

All of these suggestions are relevant to teaching [core] python in an
academic environment.


>
> * Restrict the syntactical token list ("switch features on and off")
>   * Fork Python
>   * RPython -- https://rpython.readthedocs.org/en/latest/
>   * https://pypi.python.org/pypi/RestrictedPython
>   * http://pyvideo.org/video/2585/building-and-breaking-a-python-sandbox
>   * OR: execute code in container (e.g. LXC, LXD, Docker (JupyterHub);
> virtualization)
>
> * Add a preprocessor with a cost function to limit valid tokens for a
> given code submission
>   (see the links to the Python grammar, tokenizer, compiler linked above)
>
> * Modify nbgrader to evaluate submissions with such a cost function:
>   https://github.com/jupyter/nbgrader
>
> * Receive feedback about code syntax and tests from a CI system with
> repository commit (web)hooks
>   * BuildBot, Jenkins, Travis CI, xUnit XML
>
> https://westurner.org/wiki/awesome-python-testing#continuous-integration-ci-and-continuous-delivery-cd
>
>
>
> On Tue, May 26, 2015 at 1:56 AM, Andrew Barnert via Python-ideas <
> python-ideas at python.org> wrote:
>
>> On May 25, 2015, at 22:36, Rustom Mody <rustompmody at gmail.com> wrote:
>> >
>> > I am talking of a framework for a teacher to chart a course through
>> python, not any changes per se to python itself.
>>
>> How exactly can you allow a teacher to "chart a course through python"
>> that includes separate function and generator function definition
>> statements, procedures as distinct from functions, etc. without changing
>> Python? Python doesn't have the configurability to switch those features on
>> and off, and also doesn't have the features to switch on in the first place.
>>
>> > A teacher wanting to chart a different course through python should be
>> free (and encouraged) to do that as well.
>>
>>
>> I would like a framework for a teacher to chart a course through driving
>> the Nissan 370Z that would allow me to start off teaching hoverpads instead
>> of wheels, but a teacher wanting to chart a different course should be free
>> to start with sails instead. And I want to do this without changing
>> anything about the 370Z.
>>
>>
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>
>

From steve at pearwood.info  Wed May 27 00:41:46 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 27 May 2015 08:41:46 +1000
Subject: [Python-ideas] Framework for Python for CS101
In-Reply-To: <CACfEFw9-PVa7o4twdiAM9WfUOHYdWq9Zizotj0JR39cLzMSQXg@mail.gmail.com>
References: <CAJ+TeoeaQog7s8bo=CuhTH-Y=CVR9J8B0-vX=1AUKxsiQimz3Q@mail.gmail.com>
 <20150526035048.GJ5663@ando.pearwood.info>
 <CADiSq7cWS2d6+f_h191V1xCpy91ufi-PaCGQ0c9uH+ZMt_Y=jA@mail.gmail.com>
 <5118b7c3-12d8-4c2d-81e0-430a7522a391@googlegroups.com>
 <8716F442-0C8A-48C9-95BC-AC8E63696DF2@yahoo.com>
 <CACfEFw9-PVa7o4twdiAM9WfUOHYdWq9Zizotj0JR39cLzMSQXg@mail.gmail.com>
Message-ID: <20150526224146.GB932@ando.pearwood.info>

On Tue, May 26, 2015 at 01:19:59PM -0500, Wes Turner wrote:

> Ways to teach Python from first principles:

Most of these methods fail to teach *Python*. They teach something 
similar, but different to, Python: almost-Python.

If Rustom wishes to fork Python to create his own version of almost- 
Python, he doesn't need to discuss it here. I'd rather he didn't discuss 
it here -- this is PYTHON-ideas, not Cobra-ideas, or Lua-ideas, or 
Rustom's-purely-functional-almost-python-ideas.

There is, or at least was, a strong tradition of creating specialist 
teaching languages, starting with Pascal which developed as a more 
restricted and more pure form of Algol. But this is not the place to 
discuss it.


> * Restrict the syntactical token list ("switch features on and off")
>   * Fork Python
>   * RPython -- https://rpython.readthedocs.org/en/latest/

I'm pretty sure that RPython is not designed as a teaching language. The 
PyPy guys are fairly insistent that RPython is not a general purpose 
language, but exists for one reason and one reason only: building 
compilers.


>   * https://pypi.python.org/pypi/RestrictedPython
>   * http://pyvideo.org/video/2585/building-and-breaking-a-python-sandbox
>   * OR: execute code in container (e.g. LXC, LXD, Docker (JupyterHub);
> virtualization)

Sandboxing Python and restricting the functionality of almost-Python are 
unrelated issues. Purely functional almost-Python would want to replace 
things like dict.update which modifies the dict in place with a built-in 
function which returns a new, updated, dict. Running regular Python in a 
container doesn't make it almost-Python, it is still regular Python.


-- 
Steve

From wes.turner at gmail.com  Wed May 27 00:58:57 2015
From: wes.turner at gmail.com (Wes Turner)
Date: Tue, 26 May 2015 17:58:57 -0500
Subject: [Python-ideas] Framework for Python for CS101
In-Reply-To: <20150526224146.GB932@ando.pearwood.info>
References: <CAJ+TeoeaQog7s8bo=CuhTH-Y=CVR9J8B0-vX=1AUKxsiQimz3Q@mail.gmail.com>
 <20150526035048.GJ5663@ando.pearwood.info>
 <CADiSq7cWS2d6+f_h191V1xCpy91ufi-PaCGQ0c9uH+ZMt_Y=jA@mail.gmail.com>
 <5118b7c3-12d8-4c2d-81e0-430a7522a391@googlegroups.com>
 <8716F442-0C8A-48C9-95BC-AC8E63696DF2@yahoo.com>
 <CACfEFw9-PVa7o4twdiAM9WfUOHYdWq9Zizotj0JR39cLzMSQXg@mail.gmail.com>
 <20150526224146.GB932@ando.pearwood.info>
Message-ID: <CACfEFw8eK96OOYrKBxN4vjppZpvr+7f4ZOOh9ek9N_Usg1z-HQ@mail.gmail.com>

On Tue, May 26, 2015 at 5:41 PM, Steven D'Aprano <steve at pearwood.info>
wrote:

> On Tue, May 26, 2015 at 01:19:59PM -0500, Wes Turner wrote:
>
> > Ways to teach Python from first principles:
>
> Most of these methods fail to teach *Python*. They teach something
> similar, but different to, Python: almost-Python.
>
> If Rustom wishes to fork Python to create his own version of almost-
> Python, he doesn't need to discuss it here. I'd rather he didn't discuss
> it here -- this is PYTHON-ideas, not Cobra-ideas, or Lua-ideas, or
> Rustom's-purely-functional-almost-python-ideas.
>

I agree.

* Language syntax propositions -> python-ideas at python.org
  * Or, if not feasible for the general community,
     RPython and Sandboxing research do identify
     methods for (more than) syntactical restriction
* Teaching -> edu-sig at python.org
   * IPython Notebook, JupyterHub
     * A custom interpreter with RPython and a custom Jupyter kernel
       may be of use.


>
> There is, or at least was, a strong tradition of creating specialist
> teaching languages, starting with Pascal which developed as a more
> restricted and more pure form of Algol. But this is not the place to
> discuss it.
>

https://en.wikipedia.org/wiki/History_of_Python


>
>
> > * Restrict the syntactical token list ("switch features on and off")
> >   * Fork Python
> >   * RPython -- https://rpython.readthedocs.org/en/latest/
>
> I'm pretty sure that RPython is not designed as a teaching language. The
> PyPy guys are fairly insistent that RPython is not a general purpose
> language, but exists for one reason and one reason only: building
> compilers.
>

Rather than forking, writing an interpreter may be more maintainable
(and relatively consistent with a widely-deployed language with versioned
semantics):

https://rpython.readthedocs.org/en/latest/#writing-your-own-interpreter-in-rpython


>
>
> >   * https://pypi.python.org/pypi/RestrictedPython
> >   * http://pyvideo.org/video/2585/building-and-breaking-a-python-sandbox
> >   * OR: execute code in container (e.g. LXC, LXD, Docker (JupyterHub);
> > virtualization)
>
> Sandboxing Python and restricting the functionality of almost-Python are
> unrelated issues. Purely functional almost-Python would want to replace
> things like dict.update which modifies the dict in place with a built-in
> function which returns a new, updated, dict. Running regular Python in a
> container doesn't make it almost-Python, it is still regular Python.
>

If you're hosting (or trying to maintain n shells), sandboxing and
containers are directly relevant.

* IPython notebooks can be converted to edX courses (link above)
* There are reproducible Dockerfiles for development and education
* A custom interpreter with RPython and a custom Jupyter kernel may be of
use.

Thanks!

From shoyer at gmail.com  Wed May 27 07:56:14 2015
From: shoyer at gmail.com (Stephan Hoyer)
Date: Tue, 26 May 2015 22:56:14 -0700
Subject: [Python-ideas] The pipe protocol,
 a convention for extensible method chaining
In-Reply-To: <20150526025455.GG5663@ando.pearwood.info>
References: <CAEQ_Tve+0MwkJY+MDOb3dUSVg8=q99WuE-Qps_ik6EXjOWtohQ@mail.gmail.com>
 <20150526025455.GG5663@ando.pearwood.info>
Message-ID: <CAEQ_TvcFYBhcqev2ac6jO5d_8tdM9wrw3fFWXO47MKkyeChwWQ@mail.gmail.com>

Hi Steve,

On Mon, May 25, 2015 at 7:54 PM, Steven D'Aprano <steve at pearwood.info>
wrote:

> Are you sure this actually works in practice?
>
> Since pipe() returns the result of calling the passed in function, not
> the dataframe, it seems to me that you can't actually chain this unless
> it's the last call in the chain.


This is a good point. We're pretty sure it will work in practice, because
many functions that take dataframes return other dataframes -- or other
objects that will implement a .pipe() method. The prototypical use case is
actually closer to:

df.pipe(reformat_my_data)

Plotting and saving data with method chaining is convenient, but usually as
the terminal step in a data analysis flow. None of the existing pandas
methods for plotting or exporting return a dataframe, and it doesn't seem
to be much of an impediment to method chaining.

That said, we've also thought about adding a .tee() method for exactly this
use case -- it's like pipe, but returns the original object instead of the
function's result.

> What's the point of the redirection to __pipe_func__? Under what
> circumstances would somebody use __pipe_func__ instead of just passing a
> callable (a function or other object with __call__ method)? If you don't
> have a good use case for it, then "You Ain't Gonna Need It" applies.
>

Our main use case was for APIs that can't accept a DataFrame as their first
argument, but that naturally can be understood as modifying dataframes.

Here's an example based on the Seaborn plotting library:

def scatterplot(x, y, data=None):
    ...  # make a 2D plot of x vs y

If `x` or `y` are strings, Seaborn looks them up as columns in the provided
dataframe `data`. But `x` and `y` can also be directly provided as columns.
This API is in unfortunate conflict with passing in `data` as the first,
required argument.
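
One way a pipe method could cope with such APIs is to accept a
(callable, keyword) tuple naming the argument the piped object should be
bound to. A sketch of the idea as a free function (the pretend
`scatterplot` below is an illustration, not Seaborn's):

```python
def pipe(obj, func, *args, **kwargs):
    """Apply `func` to `obj`.

    If `func` is a (callable, keyword) tuple, pass `obj` as that
    keyword argument instead of as the first positional argument.
    """
    if isinstance(func, tuple):
        func, target = func
        if target in kwargs:
            raise ValueError('%s is both the pipe target and a keyword'
                             % target)
        kwargs[target] = obj
        return func(*args, **kwargs)
    return func(obj, *args, **kwargs)

def scatterplot(x, y, data=None):
    # pretend plotting function: just report what it was given
    return (x, y, data)

print(pipe({'col': [1, 2]}, (scatterplot, 'data'), 'x', 'y'))
# ('x', 'y', {'col': [1, 2]})
```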


> I think that is completely unnecessary. (It also abuses a reserved
> namespace, but you've already said you don't care about that.) Instead
> of passing:
>
>     .pipe(myobject, args)  # myobject has a __pipe_func__ method
>
> just make it explicit and write:
>
>     .pipe(myobject.some_method, args)
>

This is a fair point. Writing something like:

.pipe(seaborn.scatterplot.df, 'x', 'y')

is not so much worse than omitting the .df.


> Yes. I love chaining in, say, bash, and it works well in Ruby, but it's
> less useful in Python. My attempt to help bring chaining to Python is
> here
>
> http://code.activestate.com/recipes/578770-method-chaining/
>
> but it relies on methods operating by side-effect, not returning a new
> result. But generally speaking, I don't like methods that operate by
> side-effect, so I don't use chaining much in practice. I'm always on the
> look-out for opportunities where it makes sense though.
>

I think this is where we have an advantage in the PyData world. We tend to
work less with built-in data structures and prefer to make our methods pure
functions, which together make chaining much more feasible.

Cheers,
Stephan

From mal at egenix.com  Wed May 27 10:45:39 2015
From: mal at egenix.com (M.-A. Lemburg)
Date: Wed, 27 May 2015 10:45:39 +0200
Subject: [Python-ideas] The pipe protocol,
 a convention for extensible method chaining
In-Reply-To: <CAEQ_Tve+0MwkJY+MDOb3dUSVg8=q99WuE-Qps_ik6EXjOWtohQ@mail.gmail.com>
References: <CAEQ_Tve+0MwkJY+MDOb3dUSVg8=q99WuE-Qps_ik6EXjOWtohQ@mail.gmail.com>
Message-ID: <55658433.7090009@egenix.com>

On 26.05.2015 01:38, Stephan Hoyer wrote:
> In the PyData community, we really like method chaining for data analysis
> pipelines:
> 
> (iris.query('SepalLength > 5')
>  .assign(SepalRatio = lambda x: x.SepalWidth / x.SepalLength,
>          PetalRatio = lambda x: x.PetalWidth / x.PetalLength)
>  .plot(kind='scatter', x='SepalRatio', y='PetalRatio'))

FWIW: I don't think this is a programming style we should encourage
in Python in general, so I'm -1 on this.

It doesn't read well: you cannot easily tell what the intermediate
objects are on which you run the methods, debugging the above becomes
hard, and it only gives you a minor typing advantage over using
variables and calling methods on those, with no performance advantage.

If you need a pipe pattern, it would be better to make that
explicit through some special helper function or perhaps a
piping object on which you register the various steps to run.
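
Such an explicit helper might look like this (a hypothetical sketch of
the suggestion, not an existing library):

```python
from functools import reduce

def pipeline(value, *steps):
    """Thread `value` through each step in turn.

    Each step is a callable taking the current value; intermediate
    results are easy to inspect by splitting the steps list.
    """
    return reduce(lambda acc, step: step(acc), steps, value)

result = pipeline(
    range(10),
    lambda xs: [x for x in xs if x % 2 == 0],  # keep evens
    lambda xs: [x * x for x in xs],            # square them
    sum,
)
print(result)  # 0 + 4 + 16 + 36 + 64 = 120
```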

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, May 27 2015)
>>> Python Projects, Coaching and Consulting ...  http://www.egenix.com/
>>> mxODBC Plone/Zope Database Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From rosuav at gmail.com  Wed May 27 11:00:11 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 27 May 2015 19:00:11 +1000
Subject: [Python-ideas] Timer that starts as soon as it is imported
In-Reply-To: <CAPkN8x+qnjBQE+RLFtqmS6Nc5aiNQ1f7FD5ROLnvr3XGGWmTbw@mail.gmail.com>
References: <CAPkN8xJRkP+3266FTjT3tKJPYwgzg=3Q04jjn8ZRxkYDoFP7iA@mail.gmail.com>
 <20150522105847.GA9624@phdru.name>
 <CAPkN8x+MCUmT7O_Yze38uBankr6nfu8q8qdxA=Lsx4mq7AjK4Q@mail.gmail.com>
 <CAPTjJmqzY5tt1DQH4b=XPK48QfU+ozFP4JS8Km10JgTeEXmLGQ@mail.gmail.com>
 <CAPkN8xJTba+pAnAPb3OVOi==CCgdP9gBU9Qr5xHqmO8XMCZLTA@mail.gmail.com>
 <CAPTjJmrzQfXqgAjrH+C0D=+VwpCVMErpPkh15q9UzM-D9Ax=3w@mail.gmail.com>
 <CAPkN8x+qnjBQE+RLFtqmS6Nc5aiNQ1f7FD5ROLnvr3XGGWmTbw@mail.gmail.com>
Message-ID: <CAPTjJmqJqBLDfkFm44jw-i67yLPEO6F=sTV1vF9mqGLQJwKMKg@mail.gmail.com>

On Wed, May 27, 2015 at 6:50 PM, anatoly techtonik <techtonik at gmail.com> wrote:
> How do you make these importable? Do you git clone it from site-packages?
> Like:
>
> cd site-packages/
> git clone .../shed .
>
> ??? What if you have two shed repositories with different tools?

I prefer to operate out of ~ so I'd symlink, but otherwise, yes. And
there won't be two sheds, because there is only one me.

ChrisA

From abarnert at yahoo.com  Wed May 27 11:43:55 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Wed, 27 May 2015 02:43:55 -0700
Subject: [Python-ideas] Timer that starts as soon as it is imported
In-Reply-To: <CAPTjJmqJqBLDfkFm44jw-i67yLPEO6F=sTV1vF9mqGLQJwKMKg@mail.gmail.com>
References: <CAPkN8xJRkP+3266FTjT3tKJPYwgzg=3Q04jjn8ZRxkYDoFP7iA@mail.gmail.com>
 <20150522105847.GA9624@phdru.name>
 <CAPkN8x+MCUmT7O_Yze38uBankr6nfu8q8qdxA=Lsx4mq7AjK4Q@mail.gmail.com>
 <CAPTjJmqzY5tt1DQH4b=XPK48QfU+ozFP4JS8Km10JgTeEXmLGQ@mail.gmail.com>
 <CAPkN8xJTba+pAnAPb3OVOi==CCgdP9gBU9Qr5xHqmO8XMCZLTA@mail.gmail.com>
 <CAPTjJmrzQfXqgAjrH+C0D=+VwpCVMErpPkh15q9UzM-D9Ax=3w@mail.gmail.com>
 <CAPkN8x+qnjBQE+RLFtqmS6Nc5aiNQ1f7FD5ROLnvr3XGGWmTbw@mail.gmail.com>
 <CAPTjJmqJqBLDfkFm44jw-i67yLPEO6F=sTV1vF9mqGLQJwKMKg@mail.gmail.com>
Message-ID: <D5125CFC-4DBE-4809-814D-CEF657C89F67@yahoo.com>

On May 27, 2015, at 02:00, Chris Angelico <rosuav at gmail.com> wrote:
> 
>> On Wed, May 27, 2015 at 6:50 PM, anatoly techtonik <techtonik at gmail.com> wrote:
>> How do you make these importable? Do you git clone it from site-packages?
>> Like:
>> 
>> cd site-packages/
>> git clone .../shed .

Or just build a trivial distribution out of it and then you can just "pip install git+https://github.com/you/repo".

>> ??? What if you have two shed repositories with different tools?

... which solves that problem.

> I prefer to operate out of ~ so I'd symlink, but otherwise, yes. And
> there won't be two sheds, because there is only one me.

... even if someone figures out how to fork and clone Chris or Anatoly.

Anyway, it works for me a lot better than the floppy I used to carry around with abutils.py, .emacs, and half a dozen other files I couldn't live without on a new/borrowed computer.

From abarnert at yahoo.com  Wed May 27 14:11:02 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Wed, 27 May 2015 05:11:02 -0700
Subject: [Python-ideas] Timer that starts as soon as it is imported
In-Reply-To: <CAPkN8xLdn_6ESGTxcVBbQBUX4KYAECcZ4DcPqjFQ8b2pGgSAEQ@mail.gmail.com>
References: <CAPkN8xJRkP+3266FTjT3tKJPYwgzg=3Q04jjn8ZRxkYDoFP7iA@mail.gmail.com>
 <20150522105847.GA9624@phdru.name>
 <CAPkN8x+MCUmT7O_Yze38uBankr6nfu8q8qdxA=Lsx4mq7AjK4Q@mail.gmail.com>
 <CAPTjJmqzY5tt1DQH4b=XPK48QfU+ozFP4JS8Km10JgTeEXmLGQ@mail.gmail.com>
 <CAPkN8xJTba+pAnAPb3OVOi==CCgdP9gBU9Qr5xHqmO8XMCZLTA@mail.gmail.com>
 <CAPTjJmrzQfXqgAjrH+C0D=+VwpCVMErpPkh15q9UzM-D9Ax=3w@mail.gmail.com>
 <CAPkN8x+qnjBQE+RLFtqmS6Nc5aiNQ1f7FD5ROLnvr3XGGWmTbw@mail.gmail.com>
 <CAPTjJmqJqBLDfkFm44jw-i67yLPEO6F=sTV1vF9mqGLQJwKMKg@mail.gmail.com>
 <D5125CFC-4DBE-4809-814D-CEF657C89F67@yahoo.com>
 <CAPkN8xLdn_6ESGTxcVBbQBUX4KYAECcZ4DcPqjFQ8b2pGgSAEQ@mail.gmail.com>
Message-ID: <31856E7F-437A-4241-8961-3B196EFA380A@yahoo.com>

On May 27, 2015, at 04:30, anatoly techtonik <techtonik at gmail.com> wrote:
> 
> On Wed, May 27, 2015 at 12:43 PM, Andrew Barnert via Python-ideas
> <python-ideas at python.org> wrote:
>> On May 27, 2015, at 02:00, Chris Angelico <rosuav at gmail.com> wrote:
>>> 
>>>> On Wed, May 27, 2015 at 6:50 PM, anatoly techtonik <techtonik at gmail.com> wrote:
>>>> How do you make these importable? Do you git clone it from site-packages?
>>>> Like:
>>>> 
>>>> cd site-packages/
>>>> git clone .../shed .
>> 
>> Or just build a trivial distribution out of it and then you can just "pip install git+https://github.com/you/repo".
> 
> But that would make it nested under the "repo" package namespace, no?

That depends on how you write your setup.py. It can install a module, a package, three separate packages, whatever you want. I just install one flat module full of helpers (plus a whole bunch of dependencies).

> If not, then how pip detects conflicts when the same file is provided
> by different sheds?

You can't do that. But I don't have a bunch of different sheds in the same environment, and I don't see why you'd want to either. I can imagine having different sheds for different environments (different stuff for base Mac, Linux, and Windows systems, or for venvs targeted to Gtk+ vs. PyObjC vs. Flask web services, or whatever), but I can't imagine wanting to install 7 other people's sheds all at once or something. If someone else's shed were useful enough to me, I'd either merge it into mine, or suggest that they clean it up and put it on PyPI as a real distribution instead of a personal shed (or fork it and do it myself, if they didn't want to maintain it).

It's really not much different from using .emacs files (except for the added bonus of being able to pull in dependencies from PyPI and GitHub automatically). I used to look around dotfiles for ideas to borrow from other people's configs, but I never wanted to install 3 .emacs files at the same time.

> I don't want it to just overwrite my scripts when
> somebody updates their repository.
> 
>>>> ??? What if you have two shed repositories with different tools?
>> 
>> ... which solves that problem.
>> 
>>> I prefer to operate out of ~ so I'd symlink, but otherwise, yes. And
>>> there won't be two sheds, because there is only one me.
>> 
>> ... even if someone figures out how to fork and clone Chris or Anatoly.
>> 
>> Anyway, it works for me a lot better than the floppy I used to carry around with abutils.py, .emacs, and half a dozen other files I couldn't live without on a new/borrowed computer.
> 
> There was no Python in my floppy universe. It probably appeared 10 years later
> when internet became more accessible and Google said they are hiring. =)
> 
> I am now more inclined that there needs to be a shed convention to gather
> statistical data on custom root level importable that may be handy for some
> kind of "altlib" distribution.

I doubt you'd find much of use. There are specialized communities that have a broad set of things usable to most of the community, but those communities already have distributions like Python(x,y) that take care of that. I can't imagine too many things that would be useful to almost everyone. Even obvious things like lxml (and even if you could solve release schedule and similar problems), there are plenty of people with absolutely no need for it, and it has external dependencies like libxml2 that you wouldn't want to force on everyone.

Also, I think this thread has shown that, even though the basic shed idea is pretty common among experienced Python devs, different people prefer different variations--whether to pip install or just clone into your venv site-packages, how extensively to make use of git features like submodules or branches, etc.

But maybe promoting the idea as a suggestion somewhere in the Python or PyPA docs would get everyone closer to a convention that would make it easier to track. I'm not sure where you'd put it or what you'd say; any ideas?

From steve at pearwood.info  Wed May 27 14:18:32 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 27 May 2015 22:18:32 +1000
Subject: [Python-ideas] Timer that starts as soon as it is imported
In-Reply-To: <CAPTjJmqzY5tt1DQH4b=XPK48QfU+ozFP4JS8Km10JgTeEXmLGQ@mail.gmail.com>
References: <CAPkN8xJRkP+3266FTjT3tKJPYwgzg=3Q04jjn8ZRxkYDoFP7iA@mail.gmail.com>
 <20150522105847.GA9624@phdru.name>
 <CAPkN8x+MCUmT7O_Yze38uBankr6nfu8q8qdxA=Lsx4mq7AjK4Q@mail.gmail.com>
 <CAPTjJmqzY5tt1DQH4b=XPK48QfU+ozFP4JS8Km10JgTeEXmLGQ@mail.gmail.com>
Message-ID: <20150527121832.GD932@ando.pearwood.info>

On Wed, May 27, 2015 at 04:21:01AM +1000, Chris Angelico wrote:
> On Wed, May 27, 2015 at 4:05 AM, anatoly techtonik <techtonik at gmail.com> wrote:
> >> -- if it could be reset and restarted at need -- why not start it
> >>    manually in the first place?
> >
> > Current ways of measuring script run-time are not cross-platform or
> > not memorizable. I have to reinvent timer code a couple of times, and
> > that's not convenient for the code that is only relevant while debugging.
> 
> Sounds to me like something that doesn't belong in the stdlib, but
> makes a great utility module for private use.

I disagree. I don't think it makes a good utility. I think it is a 
terrible design, for a number of reasons.

(1) Module top level code runs only the first time you import it, after 
that the module is loaded from cache and the code doesn't run again. So 

import timer  # starts a timer

will only start the timer the first time you import it. To make it work 
the second time, you have to do:

del sys.modules['timer']
del timer
import timer
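For what it's worth, the supported spelling of that hack is importlib.reload(). A self-contained sketch (the throwaway "timer" module and the temp-directory scaffolding are mine, purely to make it runnable):

```python
import importlib
import os
import sys
import tempfile

# Write a tiny "timer" module whose top-level code records a start time.
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "timer.py"), "w") as f:
    f.write("import time\nstart = time.perf_counter()\n")
sys.path.insert(0, tmp)

import timer
first = timer.start

import timer             # cached: the top-level code does NOT run again
assert timer.start == first

importlib.reload(timer)  # forces the top-level code to run again
assert timer.start > first
```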

(2) Suppose you find some hack that fixes that problem. Now you have 
another problem: it's too hard to control when the timer starts. You 
only have one choice: immediately after the import. So we *have* to 
write our code like this:

import a, b, c  # do our regular imports
setup = x + y + z  # setup everything in advance
import timer
main()

If you move the import timer where the other imports are, as PEP 8 
suggests, you'll time too much: all the setup code as well.

(3) You can only have one timer at a time. You can't run the timer in 
two different threads. (At least not with the simplistic UI of "import 
starts the timer, timer.stop() stops the timer".)

Contrast that to how timeit works: timeit is an ordinary module that 
requires no magic to work. Importing it is not the same as running it. 
You can import it at the top of your code, follow it by setup code, and 
run the timeit.Timer whenever you like. You can have as many, or as few, 
timers as you want. The only downside to timeit is that you normally 
have to provide the timed code as a string.
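To make the contrast concrete, a quick sketch of both timeit forms; the second form passes a callable instead of a string, which sidesteps that downside:

```python
import timeit

# Classic form: the timed code is given as a string.
elapsed = timeit.Timer("len('spam')").timeit(number=10000)
assert elapsed >= 0.0

# timeit.Timer also accepts a zero-argument callable, so ordinary
# functions and lambdas can be timed without string quoting.
elapsed2 = timeit.Timer(lambda: len("spam")).timeit(number=10000)
assert elapsed2 >= 0.0
```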

I have a timer context manager which is designed for timing long-running 
code. You write the code in a "with" block:

with Stopwatch():
    do_this()
    do_that()


The context manager starts the timer when you enter, and stops it when 
you leave. By default it prints the time used, but you can easily 
suppress printing and capture the result instead. I've been using this 
for a few years now, and it works well. The only downside is that it 
works too well, so I'm tempted to use it for micro code snippets, so I 
have it print a warning if the time taken is too small:

py> with Stopwatch():
...     n = len("spam")
...
elapsed time is very small; consider using timeit.Timer for 
micro-timings of small code snippets
time taken: 0.000010 seconds
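A runnable sketch of a Stopwatch like the one described above (the actual implementation is not shown in the thread; the verbose/threshold parameter names and the 0.01 s cutoff are my assumptions):

```python
import time

class Stopwatch:
    # Sketch only: starts the clock on entering the "with" block,
    # stops on exit, prints by default, and warns when the elapsed
    # time is too small for reliable wall-clock measurement.
    def __init__(self, verbose=True, threshold=0.01):
        self.verbose = verbose
        self.threshold = threshold

    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, *exc_info):
        self.elapsed = time.perf_counter() - self.start
        if self.verbose:
            if self.elapsed < self.threshold:
                print("elapsed time is very small; consider using "
                      "timeit.Timer for micro-timings of small code snippets")
            print("time taken: %f seconds" % self.elapsed)
        return False  # never suppress exceptions

# Suppress printing and capture the result instead:
with Stopwatch(verbose=False) as sw:
    time.sleep(0.05)
assert sw.elapsed >= 0.04
```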



-- 
Steve

From abarnert at yahoo.com  Wed May 27 14:29:37 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Wed, 27 May 2015 05:29:37 -0700
Subject: [Python-ideas] Timer that starts as soon as it is imported
In-Reply-To: <20150527121832.GD932@ando.pearwood.info>
References: <CAPkN8xJRkP+3266FTjT3tKJPYwgzg=3Q04jjn8ZRxkYDoFP7iA@mail.gmail.com>
 <20150522105847.GA9624@phdru.name>
 <CAPkN8x+MCUmT7O_Yze38uBankr6nfu8q8qdxA=Lsx4mq7AjK4Q@mail.gmail.com>
 <CAPTjJmqzY5tt1DQH4b=XPK48QfU+ozFP4JS8Km10JgTeEXmLGQ@mail.gmail.com>
 <20150527121832.GD932@ando.pearwood.info>
Message-ID: <6ED56DD5-2260-4207-B242-E29E71D0B3A4@yahoo.com>

On May 27, 2015, at 05:18, Steven D'Aprano <steve at pearwood.info> wrote:
> 
>> On Wed, May 27, 2015 at 04:21:01AM +1000, Chris Angelico wrote:
>> On Wed, May 27, 2015 at 4:05 AM, anatoly techtonik <techtonik at gmail.com> wrote:
>>>> -- if it could be reset and restarted at need -- why not start it
>>>>   manually in the first place?
>>> 
>>> Current ways of measuring script run-time are not cross-platform or
>>> not memorizable. I have to reinvent timer code a couple of times, and
>>> that's not convenient for the code that is only relevant while debugging.
>> 
>> Sounds to me like something that doesn't belong in the stdlib, but
>> makes a great utility module for private use.
> 
> I disagree. I don't think it makes a good utility. I think it is a 
> terrible design, for a number of reasons.
> 
> (1) Module top level code runs only the first time you import it, after 
> that the module is loaded from cache and the code doesn't run again. So 
> 
> import timer  # starts a timer
> 
> will only start the timer the first time you import it. To make it work 
> the second time, you have to do:
> 
> del sys.modules['timer']
> del timer
> import timer
> 
> (2) Suppose you find some hack that fixes that problem. Now you have 
> another problem: it's too hard to control when the timer starts. You 
> only have one choice: immediately after the import. So we *have* to 
> write our code like this:
> 
> import a, b, c  # do our regular imports
> setup = x + y + z  # setup everything in advance
> import timer
> main()
> 
> If you move the import timer where the other imports are, as PEP 8 
> suggests, you'll time too much: all the setup code as well.
> 
> (3) You can only have one timer at a time. You can't run the timer in 
> two different threads. (At least not with the simplistic UI of "import 
> starts the timer, timer.stop() stops the timer".)

Presumably you could add a "timer.restart()" to the UI.

(But in that case, how much does it really cost to use that at the start of the module instead of the magic import anyway? It's like your system uptime; it's hard to find any use for that besides actually reporting system uptime...)

> Contrast that to how timeit works: timeit is an ordinary module that 
> requires no magic to work. Importing it is not the same as running it. 
> You can import it at the top of your code, follow it by setup code, and 
> run the timeit.Timer whenever you like. You can have as many, or as few, 
> timers as you want. The only downside to timeit is that you normally 
> have to provide the timed code as a string.

For many uses, providing it as a function call works just fine, in which case there are no downsides at all.

> I have a timer context manager which is designed for timing long-running 
> code. You write the code in a "with" block:
> 
> with Stopwatch():
>    do_this()
>    do_that()
> 
> 
> The context manager starts the timer when you enter, and stops it when 
> you leave. By default it prints the time used, but you can easily 
> suppress printing and capture the result instead. I've been using this 
> for a few years now, and it works well. The only downside is that it 
> works too well, so I'm tempted to use it for micro code snippets, so I 
> have it print a warning if the time taken is too small:
> 
> py> with Stopwatch():
> ...     n = len("spam")
> ...
> elapsed time is very small; consider using timeit.Timer for 
> micro-timings of small code snippets
> time taken: 0.000010 seconds

That's a clever idea. I have something very similar, and I sometimes find myself abusing it that way...


From jeanpierreda at gmail.com  Wed May 27 14:52:17 2015
From: jeanpierreda at gmail.com (Devin Jeanpierre)
Date: Wed, 27 May 2015 05:52:17 -0700
Subject: [Python-ideas] Timer that starts as soon as it is imported
In-Reply-To: <6ED56DD5-2260-4207-B242-E29E71D0B3A4@yahoo.com>
References: <CAPkN8xJRkP+3266FTjT3tKJPYwgzg=3Q04jjn8ZRxkYDoFP7iA@mail.gmail.com>
 <20150522105847.GA9624@phdru.name>
 <CAPkN8x+MCUmT7O_Yze38uBankr6nfu8q8qdxA=Lsx4mq7AjK4Q@mail.gmail.com>
 <CAPTjJmqzY5tt1DQH4b=XPK48QfU+ozFP4JS8Km10JgTeEXmLGQ@mail.gmail.com>
 <20150527121832.GD932@ando.pearwood.info>
 <6ED56DD5-2260-4207-B242-E29E71D0B3A4@yahoo.com>
Message-ID: <CABicbJKVLjB-WXE6xM5O9asTi5e8AWYGYUKOoEpZ3PRoigY2_w@mail.gmail.com>

On Wed, May 27, 2015 at 5:29 AM, Andrew Barnert via Python-ideas
<python-ideas at python.org> wrote:
> On May 27, 2015, at 05:18, Steven D'Aprano <steve at pearwood.info> wrote:
>>
>> I have a timer context manager which is designed for timing long-running
>> code. You write the code in a "with" block:
>>
>> with Stopwatch():
>>    do_this()
>>    do_that()
>>
>>
>> The context manager starts the timer when you enter, and stops it when
>> you leave. By default it prints the time used, but you can easily
>> suppress printing and capture the result instead. I've been using this
>> for a few years now, and it works well. The only downside is that it
>> works too well, so I'm tempted to use it for micro code snippets, so I
>> have it print a warning if the time taken is too small:
>>
>> py> with Stopwatch():
>> ...     n = len("spam")
>> ...
>> elapsed time is very small; consider using timeit.Timer for
>> micro-timings of small code snippets
>> time taken: 0.000010 seconds
>
> That's a clever idea. I have something very similar, and I sometimes find myself abusing it that way...

Why not use an iterable stopwatch that measures time between calls to
__next__/next?

for _ in Stopwatch():
    ....
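One hedged sketch of how that could work (the class name, the lap-count parameter, and all other details are my invention, not Devin's):

```python
import time

class IterStopwatch:
    # Each trip through the loop is timed: the generator records the
    # elapsed time since the previous __next__ call into self.times.
    def __init__(self, laps):
        self.laps = laps
        self.times = []

    def __iter__(self):
        last = time.perf_counter()
        for _ in range(self.laps):
            yield None                       # loop body runs here
            now = time.perf_counter()
            self.times.append(now - last)    # one lap, body included
            last = now

sw = IterStopwatch(3)
for _ in sw:
    time.sleep(0.01)
assert len(sw.times) == 3
assert all(t >= 0.005 for t in sw.times)
```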

-- Devin

From techtonik at gmail.com  Wed May 27 10:21:07 2015
From: techtonik at gmail.com (anatoly techtonik)
Date: Wed, 27 May 2015 11:21:07 +0300
Subject: [Python-ideas] Lossless bulletproof conversion to unicode
	(backslashing)
In-Reply-To: <5564BFD3.7000101@stoneleaf.us>
References: <CAPkN8xKTXJu2nhvocG8KuyO1XkJVfK_WsmY6dM=hWsVyg+BVyA@mail.gmail.com>
 <5564BFD3.7000101@stoneleaf.us>
Message-ID: <CAPkN8xLvDABdisuD3MZrDvnpyL=fRUgG38OLnYQE-NiEuOUL3A@mail.gmail.com>

On Tue, May 26, 2015 at 9:47 PM, Ethan Furman <ethan at stoneleaf.us> wrote:
> On 05/26/2015 11:30 AM, anatoly techtonik wrote:
>
> [...]
>
>> How to accomplish that with Python 2.x?
>
>
> This should be on Python List, not on Ideas.

The question of how to do it probably does, but the idea of building
this into the unicode() constructor belongs here. So, if you're replying
to this thread and have read the letter: are you for or against the idea?

-- 
anatoly t.

From techtonik at gmail.com  Wed May 27 10:47:29 2015
From: techtonik at gmail.com (anatoly techtonik)
Date: Wed, 27 May 2015 11:47:29 +0300
Subject: [Python-ideas] Timer that starts as soon as it is imported
In-Reply-To: <20150526190646.GA12204@phdru.name>
References: <CAPkN8xJRkP+3266FTjT3tKJPYwgzg=3Q04jjn8ZRxkYDoFP7iA@mail.gmail.com>
 <20150522105847.GA9624@phdru.name>
 <CAPkN8x+MCUmT7O_Yze38uBankr6nfu8q8qdxA=Lsx4mq7AjK4Q@mail.gmail.com>
 <20150526190646.GA12204@phdru.name>
Message-ID: <CAPkN8xJ03QyJAAKXRP-jRPTBEiTWOH3ngaZ+YXYfjfOB=LT1jw@mail.gmail.com>

On Tue, May 26, 2015 at 10:06 PM, Oleg Broytman <phd at phdru.name> wrote:
> On Tue, May 26, 2015 at 09:05:06PM +0300, anatoly techtonik <techtonik at gmail.com> wrote:
>> On Fri, May 22, 2015 at 1:58 PM, Oleg Broytman <phd at phdru.name> wrote:
>> > On Fri, May 22, 2015 at 12:59:30PM +0300, anatoly techtonik <techtonik at gmail.com> wrote:
>> >> Is the idea to have timer that starts on import is good?
>> >
>> >    No, because:
>> >
>> > -- it could be imported at the wrong time;
>>
>> Any time is right.
>
>    Very much application-dependent. What if you wanna measure import
> time?

The design principle is that the default behaviour should match:

1. the most simple/intuitive expectation
2. the most often needed operation

Every "what if" means you need non-default customization, such as taking
care to place starttimer into your bootstrap script. If you want to trace
when exactly the module is imported, it could record the caller's full
name, sys.path:module.class.method (provided that Python supports this),
and the lines executed since Python started.

>> > -- it couldn't be "reimported"; what is the usage of one-time timer?
>>
>> The idea is to have convenient default timer to measure
>> script run-time.
>
>    Good idea for a small separate project. Bad for the stdlib. Not every
> small simple useful module must be in the stdlib.

Yes. That's not the criterion. The criterion is that modules which save
time during development should come bundled.

Or another idea: the stdlib could provide a standard layout that people
can replicate in their "shed" repositories on GitHub. Then, by crawling
these repositories, the names and contents could be aggregated into
stats to see which imports are most popular.

That way it would be quick to identify useful stuff that people coming from
other languages find missing in Python. It would also allow people to
document the behavior differences between modules with the same name.
-- 
anatoly t.

From techtonik at gmail.com  Wed May 27 10:50:09 2015
From: techtonik at gmail.com (anatoly techtonik)
Date: Wed, 27 May 2015 11:50:09 +0300
Subject: [Python-ideas] Timer that starts as soon as it is imported
In-Reply-To: <CAPTjJmrzQfXqgAjrH+C0D=+VwpCVMErpPkh15q9UzM-D9Ax=3w@mail.gmail.com>
References: <CAPkN8xJRkP+3266FTjT3tKJPYwgzg=3Q04jjn8ZRxkYDoFP7iA@mail.gmail.com>
 <20150522105847.GA9624@phdru.name>
 <CAPkN8x+MCUmT7O_Yze38uBankr6nfu8q8qdxA=Lsx4mq7AjK4Q@mail.gmail.com>
 <CAPTjJmqzY5tt1DQH4b=XPK48QfU+ozFP4JS8Km10JgTeEXmLGQ@mail.gmail.com>
 <CAPkN8xJTba+pAnAPb3OVOi==CCgdP9gBU9Qr5xHqmO8XMCZLTA@mail.gmail.com>
 <CAPTjJmrzQfXqgAjrH+C0D=+VwpCVMErpPkh15q9UzM-D9Ax=3w@mail.gmail.com>
Message-ID: <CAPkN8x+qnjBQE+RLFtqmS6Nc5aiNQ1f7FD5ROLnvr3XGGWmTbw@mail.gmail.com>

On Tue, May 26, 2015 at 9:28 PM, Chris Angelico <rosuav at gmail.com> wrote:
> On Wed, May 27, 2015 at 4:24 AM, anatoly techtonik <techtonik at gmail.com> wrote:
>> There are a lot of helpers like this that might be useful. Installing them
>> separately is a lot of hassle - it is easy to forget some.
>
> Package 'em all up into a single repository and clone that repo on
> every system you use. For me, that's called "shed", and I keep it on
> github:
>
> https://github.com/Rosuav/shed
>
> But whether it's public or private, git or hg, pure Python or a mix of
> languages, it's an easy way to pick up all those convenient little
> scripts. You'll never "forget some", because they're all in one place.

How do you make these importable? Do you git clone it from site-packages?
Like:

cd site-packages/
git clone .../shed .

??? What if you have two shed repositories with different tools?

From techtonik at gmail.com  Wed May 27 11:20:10 2015
From: techtonik at gmail.com (anatoly techtonik)
Date: Wed, 27 May 2015 12:20:10 +0300
Subject: [Python-ideas] Cmake as build system
In-Reply-To: <CAO41-mPo_CVTRWGDzU23MMZFj11R_9FAYeKgihTy+vXVi+t81w@mail.gmail.com>
References: <CAMkX=YUGgPXvj08GhME53-6VuDATg0N7asYEpQMaaGy2Prc43w@mail.gmail.com>
 <CAO41-mPo_CVTRWGDzU23MMZFj11R_9FAYeKgihTy+vXVi+t81w@mail.gmail.com>
Message-ID: <CAPkN8x+H8ho1k9d2mrMuhGn97FjedOs8VMj3EaFsTxuCh1nGhQ@mail.gmail.com>

On Sat, May 23, 2015 at 2:48 AM, Ryan Gonzalez <rymg19 at gmail.com> wrote:
> HAHAHA!!
>
> Good luck! I've raised this issue before. Twice. Autotools sucks.

Yes. Even people from the Linux world say that autotools sucks
( http://esr.ibiblio.org/?p=1877 ). Imagine the frustration of someone with
a Windows background. Perl, macros, what the hell is all this about?

> And makes
> cross-compiling a pain in the neck. Bottom line was:
>
> - C++ is a big dependency
> - The autotools build system has been tested already on lots and lots and
> lots of platforms

Orly? The ticket to build Python with MinGW on Windows was filed many
years ago, and I am not sure if it works. Maybe it was tested, but platforms
evolve and break things, so those Autotools scripts probably contain more
kludges than it is realistic to maintain.

> - Nobody has even implemented an alternative build system for Python 3 yet
> (python-cmake is only for Python 2)

Because Python development is concentrated around patching Python itself,
there is no practice of eating your own dogfood when making decisions. Take
SCons, for example, and try to port it to Python 3. You will see the key
points that need to be solved (see the bulletproof unicode thread on this list).
If Python developers had those tools at hand, Python 3 would be a more
practical language, but it looks like a task for a university or a
full-time paid job, because it is not fun for anybody here to do actual
development *in Python*, and discussing Python usage issues on development
lists is discouraged even though the issues raised there are important for
language usability.

> - No one can agree on a best build system (for instance, I hate CMake!)

There is no best build system, because there is no build book with a
reference of "best" or even "good enough" criteria. Even Google failed to
give a good rationale when releasing their Bazel; it sounded like "that
worked better for us". Also, most build packages deal with the fairly
complex subject of tracking dependencies, caching and traversing graphs,
and their documentation often doesn't have any graphics at all! Knowing
how long it takes a free-time coder to draw a picture, only a mighty
company can afford that, but even they don't allow their "valuable
resources" to spend time on it.

So, the problem is not to use a fancy build system, but to use one that
most people with a Python background can use and enhance. CMake needs C++
skills and a separate install; SCons can be stuffed into the repository.
I am sure there are plenty of other Python build systems to choose from
that work like this. Also, if you look at the SCons codebase, you'll
notice huge wrapping layers over subprocess management and other things
that should come shipped with Python itself just to make it a good
platform for system tools. I believe that the sole reason why Python
loses to Go in systems programming is the complexity of the wrappings
that you need to build over Python to get core cross-platform concepts
right.

-- 
anatoly t.

From techtonik at gmail.com  Wed May 27 11:32:00 2015
From: techtonik at gmail.com (anatoly techtonik)
Date: Wed, 27 May 2015 12:32:00 +0300
Subject: [Python-ideas] Timer that starts as soon as it is imported
In-Reply-To: <CAPTjJmqJqBLDfkFm44jw-i67yLPEO6F=sTV1vF9mqGLQJwKMKg@mail.gmail.com>
References: <CAPkN8xJRkP+3266FTjT3tKJPYwgzg=3Q04jjn8ZRxkYDoFP7iA@mail.gmail.com>
 <20150522105847.GA9624@phdru.name>
 <CAPkN8x+MCUmT7O_Yze38uBankr6nfu8q8qdxA=Lsx4mq7AjK4Q@mail.gmail.com>
 <CAPTjJmqzY5tt1DQH4b=XPK48QfU+ozFP4JS8Km10JgTeEXmLGQ@mail.gmail.com>
 <CAPkN8xJTba+pAnAPb3OVOi==CCgdP9gBU9Qr5xHqmO8XMCZLTA@mail.gmail.com>
 <CAPTjJmrzQfXqgAjrH+C0D=+VwpCVMErpPkh15q9UzM-D9Ax=3w@mail.gmail.com>
 <CAPkN8x+qnjBQE+RLFtqmS6Nc5aiNQ1f7FD5ROLnvr3XGGWmTbw@mail.gmail.com>
 <CAPTjJmqJqBLDfkFm44jw-i67yLPEO6F=sTV1vF9mqGLQJwKMKg@mail.gmail.com>
Message-ID: <CAPkN8xKmYuNShXHhuMNvZsd=GDpckjOBYjM=9YR0QbBjR97QXA@mail.gmail.com>

On Wed, May 27, 2015 at 12:00 PM, Chris Angelico <rosuav at gmail.com> wrote:
> On Wed, May 27, 2015 at 6:50 PM, anatoly techtonik <techtonik at gmail.com> wrote:
>> How do you make these importable? Do you git clone it from site-packages?
>> Like:
>>
>> cd site-packages/
>> git clone .../shed .
>>
>> ??? What if you have two shed repositories with different tools?
>
> I prefer to operate out of ~ so I'd symlink, but otherwise, yes. And
> there won't be two sheds, because there is only one me.

symlink every module?

From techtonik at gmail.com  Wed May 27 13:30:53 2015
From: techtonik at gmail.com (anatoly techtonik)
Date: Wed, 27 May 2015 14:30:53 +0300
Subject: [Python-ideas] Timer that starts as soon as it is imported
In-Reply-To: <D5125CFC-4DBE-4809-814D-CEF657C89F67@yahoo.com>
References: <CAPkN8xJRkP+3266FTjT3tKJPYwgzg=3Q04jjn8ZRxkYDoFP7iA@mail.gmail.com>
 <20150522105847.GA9624@phdru.name>
 <CAPkN8x+MCUmT7O_Yze38uBankr6nfu8q8qdxA=Lsx4mq7AjK4Q@mail.gmail.com>
 <CAPTjJmqzY5tt1DQH4b=XPK48QfU+ozFP4JS8Km10JgTeEXmLGQ@mail.gmail.com>
 <CAPkN8xJTba+pAnAPb3OVOi==CCgdP9gBU9Qr5xHqmO8XMCZLTA@mail.gmail.com>
 <CAPTjJmrzQfXqgAjrH+C0D=+VwpCVMErpPkh15q9UzM-D9Ax=3w@mail.gmail.com>
 <CAPkN8x+qnjBQE+RLFtqmS6Nc5aiNQ1f7FD5ROLnvr3XGGWmTbw@mail.gmail.com>
 <CAPTjJmqJqBLDfkFm44jw-i67yLPEO6F=sTV1vF9mqGLQJwKMKg@mail.gmail.com>
 <D5125CFC-4DBE-4809-814D-CEF657C89F67@yahoo.com>
Message-ID: <CAPkN8xLdn_6ESGTxcVBbQBUX4KYAECcZ4DcPqjFQ8b2pGgSAEQ@mail.gmail.com>

On Wed, May 27, 2015 at 12:43 PM, Andrew Barnert via Python-ideas
<python-ideas at python.org> wrote:
> On May 27, 2015, at 02:00, Chris Angelico <rosuav at gmail.com> wrote:
>>
>>> On Wed, May 27, 2015 at 6:50 PM, anatoly techtonik <techtonik at gmail.com> wrote:
>>> How do you make these importable? Do you git clone it from site-packages?
>>> Like:
>>>
>>> cd site-packages/
>>> git clone .../shed .
>
> Or just build a trivial distribution out of it and then you can just "pip install git+https://github.com/you/repo".

But that would make it nested under the "repo" package namespace, no?
If not, then how does pip detect conflicts when the same file is provided
by different sheds? I don't want it to just overwrite my scripts when
somebody updates their repository.
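For what it's worth, a minimal setup.py sketch of the "trivial distribution" approach mentioned above (all names here are hypothetical): listing each script in py_modules installs them as top-level modules, so they import as plain "import stopwatch" rather than under a "repo" package namespace.

```python
# setup.py -- minimal config sketch for a personal "shed" distribution.
# "myshed" and the module names are hypothetical placeholders.
from setuptools import setup

setup(
    name="myshed",
    version="0.1",
    py_modules=["stopwatch", "pathhelpers"],  # your utility modules
)
```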

>>> ??? What if you have two shed repositories with different tools?
>
> ... which solves that problem.
>
>> I prefer to operate out of ~ so I'd symlink, but otherwise, yes. And
>> there won't be two sheds, because there is only one me.
>
> ... even if someone figures out how to fork and clone Chris or Anatoly.
>
> Anyway, it works for me a lot better than the floppy I used to carry around with abutils.py, .emacs, and half a dozen other files I couldn't live without on a new/borrowed computer.

There was no Python in my floppy universe. It probably appeared 10 years later,
when the internet became more accessible and Google said they were hiring. =)

I am now more inclined to think that there needs to be a shed convention, to
gather statistical data on custom root-level importables that may be handy for
some kind of "altlib" distribution.
-- 
anatoly t.

From p.f.moore at gmail.com  Wed May 27 17:28:30 2015
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 27 May 2015 16:28:30 +0100
Subject: [Python-ideas] Lossless bulletproof conversion to unicode
	(backslashing)
In-Reply-To: <CAPkN8xKTXJu2nhvocG8KuyO1XkJVfK_WsmY6dM=hWsVyg+BVyA@mail.gmail.com>
References: <CAPkN8xKTXJu2nhvocG8KuyO1XkJVfK_WsmY6dM=hWsVyg+BVyA@mail.gmail.com>
Message-ID: <CACac1F9whLHJDxpCZGXpf6UXMYu-BHEd5zvT0HwQjf3qXBSvrA@mail.gmail.com>

On 26 May 2015 at 19:30, anatoly techtonik <techtonik at gmail.com> wrote:
> In real world you have to deal with broken and invalid
> output and UnicodeDecode crashes is not an option.
> The unicode() constructor proposes two options to
> deal with invalid output:
>
> 1. ignore  - meaning skip and corrupt the data
> 2. replace  - just corrupt the data

There are other error handlers, specifically surrogateescape is
designed for this use. Only in Python 3.x admittedly, but this list is
about future versions of Python, so that's what matters here.
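For reference, a minimal sketch of the surrogateescape round trip (the byte values are chosen arbitrarily for illustration):

```python
# surrogateescape maps undecodable bytes to lone surrogates on decode
# and restores them verbatim on encode, so nothing is corrupted or lost.
raw = b"valid utf-8: \xe2\x9c\x93, invalid bytes: \xff\xfe"
text = raw.decode("utf-8", errors="surrogateescape")
back = text.encode("utf-8", errors="surrogateescape")
assert back == raw  # lossless round trip: binary -> str -> binary
```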

> The solution is to have filter preprocess the binary
> string to escape all non-unicode symbols so that the
> following lossless transformation becomes possible:
>
>    binary -> escaped utf-8 string -> unicode -> binary
>
> How to accomplish that with Python 2.x?

That question is for python-list. Language changes will only be made
to 3.x - python-ideas isn't appropriate for questions about how to
achieve something in 2.x.

Paul

From rymg19 at gmail.com  Wed May 27 18:47:09 2015
From: rymg19 at gmail.com (Ryan Gonzalez)
Date: Wed, 27 May 2015 11:47:09 -0500
Subject: [Python-ideas] Cmake as build system
In-Reply-To: <CAPkN8x+H8ho1k9d2mrMuhGn97FjedOs8VMj3EaFsTxuCh1nGhQ@mail.gmail.com>
References: <CAMkX=YUGgPXvj08GhME53-6VuDATg0N7asYEpQMaaGy2Prc43w@mail.gmail.com>
 <CAO41-mPo_CVTRWGDzU23MMZFj11R_9FAYeKgihTy+vXVi+t81w@mail.gmail.com>
 <CAPkN8x+H8ho1k9d2mrMuhGn97FjedOs8VMj3EaFsTxuCh1nGhQ@mail.gmail.com>
Message-ID: <409899E1-14F7-46E9-B8A3-0D152C2C44C6@gmail.com>



On May 27, 2015 4:20:10 AM CDT, anatoly techtonik <techtonik at gmail.com> wrote:
>On Sat, May 23, 2015 at 2:48 AM, Ryan Gonzalez <rymg19 at gmail.com> wrote:
>> HAHAHA!!
>>
>> Good luck! I've raised this issue before. Twice. Autotools sucks.
>
>Yes. If even people from Linux world say that autotools suxx
>( http://esr.ibiblio.org/?p=1877 ). Imagine frustration of someone with
>Windows background. Perl, macros, what the hell is all this about?
>
>> And makes
>> cross-compiling a pain in the neck. Bottom line was:
>>
>> - C++ is a big dependency
>> - The autotools build system has been tested already on lots and lots
>> and lots of platforms
>
>Orly? The ticket to build Python with MinGW on Windows was filled many
>years ago, and I am not sure if it works. Maybe it was tested, but
>platforms evolve and break things, so those Autotools probably contain
>more kludges that it is realistic for maintenance.
>
>> - Nobody has even implemented an alternative build system for Python 3
>> yet (python-cmake is only for Python 2)
>
>Because Python development is concentrated around patching Python itself,
>there is no practice in eating your own dogfood when making decisions.
>Take a SCons, for example, and try to port that to Python 3. You will see
>the key points that need to be solved (see the bulletproof unicode thread
>in this list). If Python developers had those toys at hand, the Python 3
>would be more practical language, but looks like it is a task for a
>university or a full time paid job, because it is not fun for anybody
>here to do actual development *in Python*, and discussing Python usage
>issues in development lists is discouraged even though the issues raised
>there are important for language usability.
>
>> - No one can agree on a best build system (for instance, I hate CMake!)
>
>There is no best build system, because there is no build book with a
>reference of "best" or even "good enough" criteria. Even Google failed to
>give good rationale while releasing their Bazel. It sounded like "that
>worked for us better". Also, most build packages are about a fairly
>complex subject of tracking dependencies, caching and traversing graphs,
>and their documentation often doesn't have any graphics at all! Knowing
>how long it takes for a free time coder to draw a picture, only the
>mighty company can allow that, but even they don't allow their "valuable
>resources" to spend time on that.
>
>So, the problem is not to use fancy build system, but to use one that
>most people with Python background can use and enhance. CMake needs C++
>skills and a separate install, SCons can be stuffed into repository. I am
>sure there are plenty of other Python build systems to choose from that
>work like this. Also, if you look at why SCons codebase, you'll notice a
>huge wrapping layers over subprocess management and other things that
>should came shipped with the Python itself just to make it a good
>platform for system tools. I believe that the sole reason why Python
>loses to Go in systems programming is that the complexity of those
>wrappings that you need to do over Python to make core cross-platforms
>concepts right.

The main thing is that no API is perfectly cross-platform, and no API is bulletproof. Go can get away with that because Go is very opinionated. Python, on the other hand, has a huge user base that they don't want to tick off.

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

From demianbrecht at gmail.com  Wed May 27 20:28:49 2015
From: demianbrecht at gmail.com (Demian Brecht)
Date: Wed, 27 May 2015 11:28:49 -0700
Subject: [Python-ideas] Increasing public package discoverability (was:
	Adding jsonschema to the standard library)
In-Reply-To: <CADiSq7dVrqjU9bQRufXPQAO+y3oYxQBUg9kEySTqKh6yeVpXSw@mail.gmail.com>
References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
 <CACac1F9yQEdH4MpOuLS-m9oe8ojcBYh_0JH8KbvtQQimJkwhKw@mail.gmail.com>
 <CADiSq7cmRPQdpC8wv3xyt20dV=Pf9uPfB1k-Q3a6kQH=khvnsQ@mail.gmail.com>
 <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com>
 <CAN-Kwu1Z1eO0X_2nx7Neg7UjXOZTCQ4z+ns3ZBnGyWo0-x6gJg@mail.gmail.com>
 <E203A6E7-158B-47F1-8B47-F2E3C5529282@stufft.io>
 <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7dVrqjU9bQRufXPQAO+y3oYxQBUg9kEySTqKh6yeVpXSw@mail.gmail.com>
Message-ID: <285F0766-B023-4523-9794-819DC9CDD1CB@gmail.com>


> On May 23, 2015, at 7:21 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> 
> https://www.djangopackages.com/ covers this well for the Django
> ecosystem (I actually consider it to be one of Django's killer
> features, and I'm pretty sure I'm not alone in that - like
> ReadTheDocs, it was a product of DjangoDash 2010).

Thanks again all for the great discussion here. It seems to have taken quite a turn toward a couple of other points that I've had in the back of my mind for a while:

With the integration of pip and the focus on non-standard-library packages, how do we increase discoverability? If the standard library isn't going to be a mechanism for that (and I'm not putting forward the argument that it should), adopting something like Django Packages might be tremendously beneficial. Perhaps on top of what Django Packages already has, there could be "recommended packages". Recommended packages could go through nearly as rigorous a review process as standard library adoption before being flagged, although a number of barriers would be reduced.

"Essentially, the standard library is where a library goes to die. It is appropriate for a module to be included when active development is no longer necessary." (https://github.com/kennethreitz/requests/blob/master/docs/dev/philosophy.rst#standard-library)

This is probably a silly idea, but given the above quote and the new(er) focus on pip and distributed packages, has there been any discussion around perhaps deprecating (and entirely removing from a Python 4 release) non-builtin packages and modules? I would think that if there were a system similar to Django Packages that made discoverability/importing of packages as easy as using those in the standard library, a distributed package model where bug fixes and releases could be done out of band with CPython releases would likely be more beneficial to the end users. If there were a "recommended packages" framework, perhaps there could also be buildbots dedicated to testing interoperability of the recommended package set.



Also, to put the original question in this thread to rest: while I personally think that adding jsonschema to the standard library (whether as a top-level package, or perhaps by splitting the json module into a package and introducing it there) would be beneficial, I think that solving distributed package discoverability is a much more interesting problem and would serve many more packages and users. Aside from that, solving that problem would have the same intended effect as integrating jsonschema into the standard library.


From p.f.moore at gmail.com  Wed May 27 20:46:18 2015
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 27 May 2015 19:46:18 +0100
Subject: [Python-ideas] Increasing public package discoverability (was:
 Adding jsonschema to the standard library)
In-Reply-To: <285F0766-B023-4523-9794-819DC9CDD1CB@gmail.com>
References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
 <CACac1F9yQEdH4MpOuLS-m9oe8ojcBYh_0JH8KbvtQQimJkwhKw@mail.gmail.com>
 <CADiSq7cmRPQdpC8wv3xyt20dV=Pf9uPfB1k-Q3a6kQH=khvnsQ@mail.gmail.com>
 <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com>
 <CAN-Kwu1Z1eO0X_2nx7Neg7UjXOZTCQ4z+ns3ZBnGyWo0-x6gJg@mail.gmail.com>
 <E203A6E7-158B-47F1-8B47-F2E3C5529282@stufft.io>
 <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7dVrqjU9bQRufXPQAO+y3oYxQBUg9kEySTqKh6yeVpXSw@mail.gmail.com>
 <285F0766-B023-4523-9794-819DC9CDD1CB@gmail.com>
Message-ID: <CACac1F-PLO9qv1aaCshRmbqgmuDv557LODmgKCT_n3qC9S06Bg@mail.gmail.com>

On 27 May 2015 at 19:28, Demian Brecht <demianbrecht at gmail.com> wrote:
> This is probably a silly idea, but given the above quote and the new(er) focus on pip and distributed packages, has there been any discussion around perhaps deprecating (and entirely removing from a Python 4 release) non-builtin packages and modules?

It has been discussed on a number of occasions. The major issue with
the idea is that a lot of people use Python in closed corporate
environments, where access to the internet from tools such as pip can
be restricted. Also, many companies have legal approval processes for
software - getting approval for "Python" includes the standard
library, but each external package required would need a separate,
probably lengthy and possibly prohibitive, approval process before it
could be used.

So it's unlikely to ever happen, because it would cripple Python for a
non-trivial group of its users.

Paul

From graffatcolmingov at gmail.com  Wed May 27 20:55:35 2015
From: graffatcolmingov at gmail.com (Ian Cordasco)
Date: Wed, 27 May 2015 13:55:35 -0500
Subject: [Python-ideas] Increasing public package discoverability (was:
 Adding jsonschema to the standard library)
In-Reply-To: <285F0766-B023-4523-9794-819DC9CDD1CB@gmail.com>
References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
 <CACac1F9yQEdH4MpOuLS-m9oe8ojcBYh_0JH8KbvtQQimJkwhKw@mail.gmail.com>
 <CADiSq7cmRPQdpC8wv3xyt20dV=Pf9uPfB1k-Q3a6kQH=khvnsQ@mail.gmail.com>
 <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com>
 <CAN-Kwu1Z1eO0X_2nx7Neg7UjXOZTCQ4z+ns3ZBnGyWo0-x6gJg@mail.gmail.com>
 <E203A6E7-158B-47F1-8B47-F2E3C5529282@stufft.io>
 <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7dVrqjU9bQRufXPQAO+y3oYxQBUg9kEySTqKh6yeVpXSw@mail.gmail.com>
 <285F0766-B023-4523-9794-819DC9CDD1CB@gmail.com>
Message-ID: <CAN-Kwu2tOi8dqNNtm21rUCKB-oWVum_VK4OoXEU2JKmSHcr5dw@mail.gmail.com>

On Wed, May 27, 2015 at 1:28 PM, Demian Brecht <demianbrecht at gmail.com> wrote:
>
>> On May 23, 2015, at 7:21 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>
>> https://www.djangopackages.com/ covers this well for the Django
>> ecosystem (I actually consider it to be one of Django's killer
>> features, and I'm pretty sure I'm not alone in that - like
>> ReadTheDocs, it was a product of DjangoDash 2010).
>
> Thanks again all for the great discussion here. It seems to have taken quite a turn to a couple other points that I've had in the back of my mind for a while:
>
> With the integration of pip and the focus on non-standard library packages, how do we increase discoverability? If the standard library isn't going to be a mechanism for that (and I'm not putting forward the argument that it should), adopting something like Django Packages might be tremendously beneficial. Perhaps on top of what Django Packages already has, there could be "recommended packages". Recommended packages could go through nearly as rigorous a review process as standard library adoption before being flagged, although a number of barriers would be reduced.
>
> "Essentially, the standard library is where a library goes to die. It is appropriate for a module to be included when active development is no longer necessary." (https://github.com/kennethreitz/requests/blob/master/docs/dev/philosophy.rst#standard-library)
>
> This is probably a silly idea, but given the above quote and the new(er) focus on pip and distributed packages, has there been any discussion around perhaps deprecating (and entirely removing from a Python 4 release) non-builtin packages and modules? I would think that if there was a system similar to Django Packages that made discoverability/importing of packages as easy as using those in the standard library, having a distributed package model where bug fixes and releases could be done out of band with CPython releases would likely be more beneficial to the end users. If there was a "recommended packages" framework, perhaps there could also be buildbots put to testing interoperability of the recommended package set.

The mirror of this would be asking if Django should rip out its base
classes for models, views, etc.  I think Python 4 could move towards
perhaps deprecating any duplicated modules, but I see no point to rip
the entire standard library out... except maybe for
httplib/urllib/etc. (for various reasons beyond my obvious conflict of
interest).

> Also, to put the original question in this thread to rest, while I personally think that the addition of jsonschema to the standard library, whether as a top level package or perhaps splitting the json module into a package and introducing it there would be beneficial, I think that solving the distributed package discoverability is a much more interesting problem and would serve many more packages and users. Aside from that, solving that problem would have the same intended effect as integrating jsonschema into the standard library.
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From demianbrecht at gmail.com  Wed May 27 20:57:18 2015
From: demianbrecht at gmail.com (Demian Brecht)
Date: Wed, 27 May 2015 11:57:18 -0700
Subject: [Python-ideas] Increasing public package discoverability (was:
	Adding jsonschema to the standard library)
In-Reply-To: <CACac1F-PLO9qv1aaCshRmbqgmuDv557LODmgKCT_n3qC9S06Bg@mail.gmail.com>
References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
 <CACac1F9yQEdH4MpOuLS-m9oe8ojcBYh_0JH8KbvtQQimJkwhKw@mail.gmail.com>
 <CADiSq7cmRPQdpC8wv3xyt20dV=Pf9uPfB1k-Q3a6kQH=khvnsQ@mail.gmail.com>
 <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com>
 <CAN-Kwu1Z1eO0X_2nx7Neg7UjXOZTCQ4z+ns3ZBnGyWo0-x6gJg@mail.gmail.com>
 <E203A6E7-158B-47F1-8B47-F2E3C5529282@stufft.io>
 <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7dVrqjU9bQRufXPQAO+y3oYxQBUg9kEySTqKh6yeVpXSw@mail.gmail.com>
 <285F0766-B023-4523-9794-819DC9CDD1CB@gmail.com>
 <CACac1F-PLO9qv1aaCshRmbqgmuDv557LODmgKCT_n3qC9S06Bg@mail.gmail.com>
Message-ID: <007971A6-7D77-44FB-8B28-381673D981D5@gmail.com>


> On May 27, 2015, at 11:46 AM, Paul Moore <p.f.moore at gmail.com> wrote:
> 
> So it's unlikely to ever happen, because it would cripple Python for a
> non-trivial group of its users.

I'm just throwing ideas at the wall here, but would it not be possible to release two versions, one for those who choose to use decentralized packages with out-of-band releases and one with all "recommended" packages bundled (obvious potential for version conflicts and such aside)? If one of the prerequisites of a "recommended" package was that it's released under PSFL, I'm assuming there wouldn't be any legal issues with going down such a path? That way, you still get the ability to decentralize the library, but don't alienate the user base that can't rely on pip?
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 842 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150527/273d8c6c/attachment.sig>

From donald at stufft.io  Wed May 27 21:03:52 2015
From: donald at stufft.io (Donald Stufft)
Date: Wed, 27 May 2015 15:03:52 -0400
Subject: [Python-ideas] Increasing public package discoverability (was:
 Adding jsonschema to the standard library)
In-Reply-To: <007971A6-7D77-44FB-8B28-381673D981D5@gmail.com>
References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
 <CACac1F9yQEdH4MpOuLS-m9oe8ojcBYh_0JH8KbvtQQimJkwhKw@mail.gmail.com>
 <CADiSq7cmRPQdpC8wv3xyt20dV=Pf9uPfB1k-Q3a6kQH=khvnsQ@mail.gmail.com>
 <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com>
 <CAN-Kwu1Z1eO0X_2nx7Neg7UjXOZTCQ4z+ns3ZBnGyWo0-x6gJg@mail.gmail.com>
 <E203A6E7-158B-47F1-8B47-F2E3C5529282@stufft.io>
 <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7dVrqjU9bQRufXPQAO+y3oYxQBUg9kEySTqKh6yeVpXSw@mail.gmail.com>
 <285F0766-B023-4523-9794-819DC9CDD1CB@gmail.com>
 <CACac1F-PLO9qv1aaCshRmbqgmuDv557LODmgKCT_n3qC9S06Bg@mail.gmail.com>
 <007971A6-7D77-44FB-8B28-381673D981D5@gmail.com>
Message-ID: <etPan.55661518.5c3d4cdd.12a4d@Draupnir.home>



On May 27, 2015 at 2:57:54 PM, Demian Brecht (demianbrecht at gmail.com) wrote:
>  
> > On May 27, 2015, at 11:46 AM, Paul Moore wrote:
> >
> > So it's unlikely to ever happen, because it would cripple Python for a
> > non-trivial group of its users.
>  
> I'm just throwing ideas at the wall here, but would it not be possible to release two versions,
> one for those who choose to use decentralized packages with out-of-band releases and
> one with all "recommended" packages bundled (obvious potential for version conflicts
> and such aside)? If one of the prerequisites of a "recommended" package was that it's
> released under PSFL, I'm assuming there wouldn't be any legal issues with going down
> such a path? That way, you still get the ability to decentralize the library, but don't
> alienate the user base that can't rely on pip?


I'm of the opinion that, given a brand new language, it makes more sense to have really good packaging tools built in, but not to have a standard library. This you call "FooLang Core" or something of the sort. Then you take the most popular or the best examples or whatever criteria you want from the ecosystem around that and you bundle them all together so that the third party packages essentially get preinstalled and you call that "FooLang Platform" or something.

This means that people who want/need a comprehensive standard library can get the Platform edition of the runtime, which will function similarly to the standard library of a language. However, if they run into some critical feature they need or a bug fix, they can selectively choose to step outside of those preset package versions and install a newer version of one of the bundled packages. Of course they can install non-bundled software as well.

As far as Python is concerned, while I think the above model is better in the general sense, I think that it's probably too late to switch to that; the history of having a big standard library goes back pretty far and a lot of people and processes depend on it. We're also still trying to heal the rift that 3.x created, and creating a new rift is probably not the most effective use of time. It's also the case (though we're working to make it less true) that our packaging tools still can routinely run into problems that would make me uncomfortable using them for this approach.

---  
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA



From demianbrecht at gmail.com  Wed May 27 21:13:09 2015
From: demianbrecht at gmail.com (Demian Brecht)
Date: Wed, 27 May 2015 12:13:09 -0700
Subject: [Python-ideas] Increasing public package discoverability (was:
	Adding jsonschema to the standard library)
In-Reply-To: <CAN-Kwu2tOi8dqNNtm21rUCKB-oWVum_VK4OoXEU2JKmSHcr5dw@mail.gmail.com>
References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
 <CACac1F9yQEdH4MpOuLS-m9oe8ojcBYh_0JH8KbvtQQimJkwhKw@mail.gmail.com>
 <CADiSq7cmRPQdpC8wv3xyt20dV=Pf9uPfB1k-Q3a6kQH=khvnsQ@mail.gmail.com>
 <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com>
 <CAN-Kwu1Z1eO0X_2nx7Neg7UjXOZTCQ4z+ns3ZBnGyWo0-x6gJg@mail.gmail.com>
 <E203A6E7-158B-47F1-8B47-F2E3C5529282@stufft.io>
 <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7dVrqjU9bQRufXPQAO+y3oYxQBUg9kEySTqKh6yeVpXSw@mail.gmail.com>
 <285F0766-B023-4523-9794-819DC9CDD1CB@gmail.com>
 <CAN-Kwu2tOi8dqNNtm21rUCKB-oWVum_VK4OoXEU2JKmSHcr5dw@mail.gmail.com>
Message-ID: <01071ECC-1949-40EF-8D7A-3073E643344B@gmail.com>


> On May 27, 2015, at 11:55 AM, Ian Cordasco <graffatcolmingov at gmail.com> wrote:
> 
> The mirror of this would be asking if Django should rip out it's base
> classes for models, views, etc.  I think Python 4 could move towards
> perhaps deprecating any duplicated modules, but I see no point to rip
> the entire standard library out... except maybe for
> httplib/urllib/etc. (for various reasons beyond my obvious conflict of
> interest).

I can somewhat see the comparison, but not entirely, because Django itself is a package and not the core interpreter and set of builtins. There are also other frameworks that split out modules from the core (I'm not overly familiar with either, but I believe both zope and wheezy follow such models).

The major advantage of going with a fully distributed model would be the out-of-band releases. While nice to have for feature development, it can be crucial for bug fixes, but even more so for security patches. Other than that, I could see it opening the door to adoption of packages as "recommended" without worrying too much about the state of development. requests is a perfect example of that. Note that my personal focus on standard library development is the http package, so I'm somewhat cutting my legs out from under me, but I'm starting to think that adopting such a distribution mechanism might solve a number of problems (but is probably just as likely to introduce new ones ;)).

I'm also aware of the politics of such a change. What does it mean then for core devs who concentrate on the current standard library and don't contribute to the interpreter core or builtins?
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 842 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150527/050c4413/attachment.sig>

From demianbrecht at gmail.com  Wed May 27 21:16:37 2015
From: demianbrecht at gmail.com (Demian Brecht)
Date: Wed, 27 May 2015 12:16:37 -0700
Subject: [Python-ideas] Increasing public package discoverability (was:
	Adding jsonschema to the standard library)
In-Reply-To: <01071ECC-1949-40EF-8D7A-3073E643344B@gmail.com>
References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
 <CACac1F9yQEdH4MpOuLS-m9oe8ojcBYh_0JH8KbvtQQimJkwhKw@mail.gmail.com>
 <CADiSq7cmRPQdpC8wv3xyt20dV=Pf9uPfB1k-Q3a6kQH=khvnsQ@mail.gmail.com>
 <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com>
 <CAN-Kwu1Z1eO0X_2nx7Neg7UjXOZTCQ4z+ns3ZBnGyWo0-x6gJg@mail.gmail.com>
 <E203A6E7-158B-47F1-8B47-F2E3C5529282@stufft.io>
 <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7dVrqjU9bQRufXPQAO+y3oYxQBUg9kEySTqKh6yeVpXSw@mail.gmail.com>
 <285F0766-B023-4523-9794-819DC9CDD1CB@gmail.com>
 <CAN-Kwu2tOi8dqNNtm21rUCKB-oWVum_VK4OoXEU2JKmSHcr5dw@mail.gmail.com>
 <01071ECC-1949-40EF-8D7A-3073E643344B@gmail.com>
Message-ID: <123C893A-CDA1-42E1-ACE6-C70A29A103C0@gmail.com>


> On May 27, 2015, at 12:13 PM, Demian Brecht <demianbrecht at gmail.com> wrote:
> 
> without worrying too much about state of development

I should have elaborated on this more: What I mean is more around feature development, such as introducing HTTP/2.0 to requests. The core feature set would still have to be well proven and have minimal to no changes.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 842 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150527/d3ca3354/attachment.sig>

From wes.turner at gmail.com  Wed May 27 21:23:23 2015
From: wes.turner at gmail.com (Wes Turner)
Date: Wed, 27 May 2015 14:23:23 -0500
Subject: [Python-ideas] Increasing public package discoverability (was:
 Adding jsonschema to the standard library)
In-Reply-To: <285F0766-B023-4523-9794-819DC9CDD1CB@gmail.com>
References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
 <CACac1F9yQEdH4MpOuLS-m9oe8ojcBYh_0JH8KbvtQQimJkwhKw@mail.gmail.com>
 <CADiSq7cmRPQdpC8wv3xyt20dV=Pf9uPfB1k-Q3a6kQH=khvnsQ@mail.gmail.com>
 <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com>
 <CAN-Kwu1Z1eO0X_2nx7Neg7UjXOZTCQ4z+ns3ZBnGyWo0-x6gJg@mail.gmail.com>
 <E203A6E7-158B-47F1-8B47-F2E3C5529282@stufft.io>
 <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7dVrqjU9bQRufXPQAO+y3oYxQBUg9kEySTqKh6yeVpXSw@mail.gmail.com>
 <285F0766-B023-4523-9794-819DC9CDD1CB@gmail.com>
Message-ID: <CACfEFw9z=coG5mvPRzCBStGu1oPV6--YSnRsK=UopdsdnUAe8A@mail.gmail.com>

On Wed, May 27, 2015 at 1:28 PM, Demian Brecht <demianbrecht at gmail.com>
wrote:

>
> > On May 23, 2015, at 7:21 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> >
> > https://www.djangopackages.com/ covers this well for the Django
> > ecosystem (I actually consider it to be one of Django's killer
> > features, and I'm pretty sure I'm not alone in that - like
> > ReadTheDocs, it was a product of DjangoDash 2010).
>
> Thanks again all for the great discussion here. It seems to have taken
> quite a turn to a couple other points that I've had in the back of my mind
> for a while:
>
> With the integration of pip and the focus on non-standard library
> packages, how do we increase discoverability? If the standard library isn't
> going to be a mechanism for that (and I'm not putting forward the argument
> that it should), adopting something like Django Packages might be
> tremendously beneficial. Perhaps on top of what Django Packages already
> has, there could be "recommended packages". Recommended packages could go
> through nearly as rigorous a review process as standard library adoption
> before being flagged, although a number of barriers would be reduced.
>

So there is a schema.org/SoftwareApplication (or doap:Project, or seon:)
Resource,
which has

* a unique URI (e.g. http://python.org/pypi/readme)
* JSON metadata extracted from setup.py into pydist.json (setuptools, wheel)
  - [ ] create JSON-LD @context
  - [ ] create mappings to standard schema
    * [ ] http://schema.org/SoftwareApplication
    * [ ] http://schema.org/SoftwareSourceCode
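
A minimal sketch of what such a JSON-LD @context mapping might look like, using only the stdlib. The field names and term mappings below are illustrative assumptions, not the actual pydist.json schema:

```python
import json

# Hypothetical @context mapping pydist.json-style fields onto
# schema.org terms; field names here are assumptions for illustration.
context = {
    "@context": {
        "@vocab": "http://schema.org/",
        "name": "name",
        "summary": "description",
        "home_page": "url",
        "version": "softwareVersion",
    }
}

metadata = {
    "name": "readme",
    "summary": "a library for rendering readme descriptions",
    "home_page": "http://python.org/pypi/readme",
    "version": "0.5.1",
}

# Merging the two, with the package URL as @id, yields a
# self-describing JSON-LD document with a unique URI.
doc = {**context, "@id": metadata["home_page"], **metadata}
print(json.dumps(doc, indent=2))
```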

In terms of schema.org, a Django Packages resource has:

* [ ] a unique URI
* [ ] typed features (predicates with ranges)
* [ ] http://schema.org/review
* [ ] http://schema.org/VoteAction
* [ ] http://schema.org/LikeAction


>
> "Essentially, the standard library is where a library goes to die. It is
> appropriate for a module to be included when active development is no
> longer necessary.? (
> https://github.com/kennethreitz/requests/blob/master/docs/dev/philosophy.rst#standard-library
> )
>
> This is probably a silly idea, but given the above quote and the new(er)
> focus on pip and distributed packages, has there been any discussion around
> perhaps deprecating (and entirely removing from a Python 4 release)
> non-builtin packages and modules? I would think that if there was a system
> similar to Django Packages that made discoverability/importing of packages
> as easy as using those in the standard library, having a distributed
> package model where bug fixes and releases could be done out of band with
> CPython releases would likely be more beneficial to the end users. If there
> was a "recommended packages" framework, perhaps there could also be
> buildbots put to testing interoperability of the recommended package set.
>
>
Tox is great for this (in conjunction with whichever build system:
BuildBot, TravisCI)
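
For instance, interoperability testing of a hypothetical "recommended" package set could be driven by a tox configuration along these lines (the package set and pinned versions are illustrative assumptions, not an actual recommendation):

```ini
# Hypothetical tox.ini: run a project's test suite against a pinned
# "recommended" package set on multiple interpreters.
[tox]
envlist = py27, py34

[testenv]
deps =
    requests==2.7.0
    jsonschema==2.5.1
    pytest
commands = py.test
```

A CI system (BuildBot, Travis) would then just invoke `tox` per commit.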



>
>
> Also, to put the original question in this thread to rest, while I
> personally think that the addition of jsonschema to the standard library,
> whether as a top level package or perhaps splitting the json module into a
> package and introducing it there would be beneficial, I think that solving
> the distributed package discoverability is a much more interesting problem
> and would serve many more packages and users. Aside from that, solving that
> problem would have the same intended effect as integrating jsonschema into
> the standard library.
>

jsonschema // JSON-LD (RDF)
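
For context, the core idea of JSON Schema validation can be sketched in a few lines of stdlib-only Python. This is only an illustration of the technique, handling just "type", "required", and nested "properties"; the actual third-party jsonschema library implements the full specification:

```python
# Minimal JSON Schema-style validator (illustrative sketch only).
TYPES = {"object": dict, "string": str, "number": (int, float)}

def validate(instance, schema):
    """Raise ValueError if instance does not match schema."""
    expected = TYPES.get(schema.get("type"))
    if expected is not None and not isinstance(instance, expected):
        raise ValueError("expected %s" % schema["type"])
    for key in schema.get("required", []):
        if key not in instance:
            raise ValueError("missing required property: %r" % key)
    for key, subschema in schema.get("properties", {}).items():
        if key in instance:
            validate(instance[key], subschema)

schema = {
    "type": "object",
    "required": ["name"],
    "properties": {"name": {"type": "string"}},
}
validate({"name": "jsonschema"}, schema)  # passes silently
```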
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150527/1e70574e/attachment-0001.html>

From breamoreboy at yahoo.co.uk  Wed May 27 22:28:07 2015
From: breamoreboy at yahoo.co.uk (Mark Lawrence)
Date: Wed, 27 May 2015 21:28:07 +0100
Subject: [Python-ideas] Increasing public package discoverability
In-Reply-To: <etPan.55661518.5c3d4cdd.12a4d@Draupnir.home>
References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
 <CACac1F9yQEdH4MpOuLS-m9oe8ojcBYh_0JH8KbvtQQimJkwhKw@mail.gmail.com>
 <CADiSq7cmRPQdpC8wv3xyt20dV=Pf9uPfB1k-Q3a6kQH=khvnsQ@mail.gmail.com>
 <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com>
 <CAN-Kwu1Z1eO0X_2nx7Neg7UjXOZTCQ4z+ns3ZBnGyWo0-x6gJg@mail.gmail.com>
 <E203A6E7-158B-47F1-8B47-F2E3C5529282@stufft.io>
 <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7dVrqjU9bQRufXPQAO+y3oYxQBUg9kEySTqKh6yeVpXSw@mail.gmail.com>
 <285F0766-B023-4523-9794-819DC9CDD1CB@gmail.com>
 <CACac1F-PLO9qv1aaCshRmbqgmuDv557LODmgKCT_n3qC9S06Bg@mail.gmail.com>
 <007971A6-7D77-44FB-8B28-381673D981D5@gmail.com>
 <etPan.55661518.5c3d4cdd.12a4d@Draupnir.home>
Message-ID: <mk59cp$ebd$1@ger.gmane.org>

On 27/05/2015 20:03, Donald Stufft wrote:
>
>
> On May 27, 2015 at 2:57:54 PM, Demian Brecht (demianbrecht at gmail.com) wrote:
>>
>>> On May 27, 2015, at 11:46 AM, Paul Moore wrote:
>>>
>>> So it's unlikely to ever happen, because it would cripple Python for a
>>> non-trivial group of its users.
>>
>> I'm just throwing ideas at the wall here, but would it not be possible to release two versions,
>> one for those who choose to use decentralized packages with out-of-band releases and
>> one with all "recommended" packages bundled (obvious potential for version conflicts
>> and such aside)? If one of the prerequisites of a "recommended" package was that it's
>> released under PSFL, I'm assuming there wouldn't be any legal issues with going down
>> such a path? That way, you still get the ability to decentralize the library, but don't
>> alienate the user base that can't rely on pip?
>
>
> I'm of the opinion that, given a brand new language, it makes more sense to have really good packaging tools built in, but not to have a standard library. This you call "FooLang Core" or something of the sort. Then you take the most popular or the best examples or whatever criteria you want from the ecosystem around that and you bundle them all together so that the third party packages essentially get preinstalled and you call that "FooLang Platform" or something.
>
> This means that people who want/need a comprehensive standard library can get the Platform edition of the runtime, which will function similarly to the standard library of a language. However, if they run into some critical feature they need or a bug fix, they can selectively choose to step outside of those preset package versions and install a newer version of one of the bundled packages. Of course they can install non-bundled software as well.
>
> As far as Python is concerned, while I think the above model is better in the general sense, I think that it's probably too late to switch to that; the history of having a big standard library goes back pretty far and a lot of people and processes depend on it. We're also still trying to heal the rift that 3.x created, and creating a new rift is probably not the most effective use of time. It's also the case (though we're working to make it less true) that our packaging tools still can routinely run into problems that would make me uncomfortable using them for this approach.
>
> ---
> Donald Stufft
> PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
>

Could Python 4 tear out the stdlib completely and go to PyPI, to what I 
believe Nick Coghlan called stdlib+, or would this be A PEP Too Far, 
given the one or two minor issues over the move from Python 2 to Python 3?

Yes this is my very dry sense of humour working, but at the same time if 
it gets somebody thinking, which in turn gets somebody else thinking, 
then hopefully ideas come up which are practical and everybody benefits.

Just my £0.02p worth.

-- 
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence


From abarnert at yahoo.com  Wed May 27 23:50:52 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Wed, 27 May 2015 14:50:52 -0700
Subject: [Python-ideas] Increasing public package discoverability (was:
	Adding jsonschema to the standard library)
In-Reply-To: <etPan.55661518.5c3d4cdd.12a4d@Draupnir.home>
References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
 <CACac1F9yQEdH4MpOuLS-m9oe8ojcBYh_0JH8KbvtQQimJkwhKw@mail.gmail.com>
 <CADiSq7cmRPQdpC8wv3xyt20dV=Pf9uPfB1k-Q3a6kQH=khvnsQ@mail.gmail.com>
 <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com>
 <CAN-Kwu1Z1eO0X_2nx7Neg7UjXOZTCQ4z+ns3ZBnGyWo0-x6gJg@mail.gmail.com>
 <E203A6E7-158B-47F1-8B47-F2E3C5529282@stufft.io>
 <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7dVrqjU9bQRufXPQAO+y3oYxQBUg9kEySTqKh6yeVpXSw@mail.gmail.com>
 <285F0766-B023-4523-9794-819DC9CDD1CB@gmail.com>
 <CACac1F-PLO9qv1aaCshRmbqgmuDv557LODmgKCT_n3qC9S06Bg@mail.gmail.com>
 <007971A6-7D77-44FB-8B28-381673D981D5@gmail.com>
 <etPan.55661518.5c3d4cdd.12a4d@Draupnir.home>
Message-ID: <59DB2A78-F6A7-45C0-A7EF-4152EE73504C@yahoo.com>

On May 27, 2015, at 12:03, Donald Stufft <donald at stufft.io> wrote:
> 
>> On May 27, 2015 at 2:57:54 PM, Demian Brecht (demianbrecht at gmail.com) wrote:
>> 
>>> On May 27, 2015, at 11:46 AM, Paul Moore wrote:
>>> 
>>> So it's unlikely to ever happen, because it would cripple Python for a
>>> non-trivial group of its users.
>> 
>> I'm just throwing ideas at the wall here, but would it not be possible to release two versions,
>> one for those who choose to use decentralized packages with out-of-band releases and
>> one with all "recommended" packages bundled (obvious potential for version conflicts
>> and such aside)? If one of the prerequisites of a "recommended" package was that it's
>> released under PSFL, I'm assuming there wouldn't be any legal issues with going down
>> such a path? That way, you still get the ability to decentralize the library, but don't
>> alienate the user base that can't rely on pip?
> 
> 
> I'm of the opinion that, given a brand new language, it makes more sense to have really good packaging tools built in, but not to have a standard library. This you call "FooLang Core" or something of the sort. Then you take the most popular or the best examples or whatever criteria you want from the ecosystem around that and you bundle them all together so that the third party packages essentially get preinstalled and you call that "FooLang Platform" or something.

Dependencies are always going to be a problem. The best way to parse XML is lxml (and the best way to parse HTML is BeautifulSoup plus lxml); does that mean that the Python Platform requires libxml2? The best way to do numerical computing is with NumPy, and the best way to build NumPy is with MKL on platforms where it exists, ATLAS on others; does that mean the Python Platform requires MKL and/or ATLAS? The best way to build cross-platform GUIs with desktop integration is PySide; does that mean the Python Platform requires Qt? (One of the biggest portability problems for Python in practice has always been Tcl/Tk; Qt would be much worse.)

You could look at it as something like the core plus distributions model used in OS's. FreeBSD has a core and ports; there's a simple rule for what's in core (a complete POSIX system plus enough to build ports, nothing else), and the practicality-vs.-purity decisions for how to apply that to real-life problems isn't that hard. But Linux took a different approach: it's just a kernel, and everything else--libc, the ports system, etc.--can be swapped out. There is no official distribution; at any given time in history, there are 3-6 competing "major distributions", dozens of others based on them, and some "special-case" distros like ucLinux or Android. And that means different distros can make different decisions on what dependencies are acceptable--include packages that only run on x86, or accept some corporate quasi-open-source license or closed-source blob.

Python seems to have fallen into a place halfway between the two. The stdlib is closer to FreeBSD core than to Linux. On the other hand, while many people start with the official stdlib and use pip to expand on it, there are third-party distributions competing to provide more useful or better-organized batteries than the official version, plus custom distributions that come with some OS distros (e.g., Apple includes PyObjC with theirs), and special things like Kivy.

That doesn't seem to have caused any harm, and may have caused a lot of benefit. While Python may not have found the perfect sweet spot, what it found isn't that bad. And the way it continues to evolve isn't that bad. If you could go back in time to 2010 and come up with a grand five-year plan for how the stdlib, core distribution, and third-party ecosystem should be better, how much different would Python be today?


From donald at stufft.io  Wed May 27 23:54:19 2015
From: donald at stufft.io (Donald Stufft)
Date: Wed, 27 May 2015 17:54:19 -0400
Subject: [Python-ideas] Increasing public package discoverability (was:
 Adding jsonschema to the standard library)
In-Reply-To: <59DB2A78-F6A7-45C0-A7EF-4152EE73504C@yahoo.com>
References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
 <CACac1F9yQEdH4MpOuLS-m9oe8ojcBYh_0JH8KbvtQQimJkwhKw@mail.gmail.com>
 <CADiSq7cmRPQdpC8wv3xyt20dV=Pf9uPfB1k-Q3a6kQH=khvnsQ@mail.gmail.com>
 <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com>
 <CAN-Kwu1Z1eO0X_2nx7Neg7UjXOZTCQ4z+ns3ZBnGyWo0-x6gJg@mail.gmail.com>
 <E203A6E7-158B-47F1-8B47-F2E3C5529282@stufft.io>
 <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7dVrqjU9bQRufXPQAO+y3oYxQBUg9kEySTqKh6yeVpXSw@mail.gmail.com>
 <285F0766-B023-4523-9794-819DC9CDD1CB@gmail.com>
 <CACac1F-PLO9qv1aaCshRmbqgmuDv557LODmgKCT_n3qC9S06Bg@mail.gmail.com>
 <007971A6-7D77-44FB-8B28-381673D981D5@gmail.com>
 <etPan.55661518.5c3d4cdd.12a4d@Draupnir.home>
 <59DB2A78-F6A7-45C0-A7EF-4152EE73504C@yahoo.com>
Message-ID: <etPan.55663d0b.3aff8464.12a4d@Draupnir.home>



On May 27, 2015 at 5:50:55 PM, Andrew Barnert (abarnert at yahoo.com) wrote:
> On May 27, 2015, at 12:03, Donald Stufft wrote:
> >
> >> On May 27, 2015 at 2:57:54 PM, Demian Brecht (demianbrecht at gmail.com) wrote:
> >>
> >>> On May 27, 2015, at 11:46 AM, Paul Moore wrote:
> >>>
> >>> So it's unlikely to ever happen, because it would cripple Python for a
> >>> non-trivial group of its users.
> >>
> >> I'm just throwing ideas at the wall here, but would it not be possible to release two
> >> versions, one for those who choose to use decentralized packages with out-of-band
> >> releases and one with all "recommended" packages bundled (obvious potential for version
> >> conflicts and such aside)? If one of the prerequisites of a "recommended" package was
> >> that it's released under PSFL, I'm assuming there wouldn't be any legal issues with
> >> going down such a path? That way, you still get the ability to decentralize the library,
> >> but don't alienate the user base that can't rely on pip?
> >
> >
> > I'm of the opinion that, given a brand new language, it makes more sense to have really
> > good packaging tools built in, but not to have a standard library. This you call "FooLang
> > Core" or something of the sort. Then you take the most popular or the best examples or
> > whatever criteria you want from the ecosystem around that and you bundle them all together
> > so that the third party packages essentially get preinstalled and you call that "FooLang
> > Platform" or something.
>  
> Dependencies are always going to be a problem. The best way to parse XML is lxml (and the  
> best way to parse HTML is BeautifulSoup plus lxml); does that mean that the Python Platform  
> requires libxml2? The best way to do numerical computing is with NumPy, and the best way  
> to build NumPy is with MKL on platforms where it exists, ATLAS on others; does that mean  
> the Python Platform requires MKL and/or ATLAS? The best way to build cross-platform  
> GUIs with desktop integration is PySide; does that mean the Python Platform requires  
> Qt? (One of the biggest portability problems for Python in practice has always been Tcl/Tk;  
> Qt would be much worse.)
>  
> You could look at it as something like the core plus distributions model used in OS's.  
> FreeBSD has a core and ports; there's a simple rule for what's in core (a complete POSIX  
> system plus enough to build ports, nothing else), and the practicality-vs.-purity  
> decisions for how to apply that to real-life problems isn't that hard. But Linux took  
> a different approach: it's just a kernel, and everything else--libc, the ports system,  
> etc.--can be swapped out. There is no official distribution; at any given time in history,  
> there are 3-6 competing "major distributions", dozens of others based on them, and some  
> "special-case" distros like ucLinux or Android. And that means different distros can  
> make different decisions on what dependencies are acceptable--include packages that  
> only run on x86, or accept some corporate quasi-open-source license or closed-source  
> blob.
>  
> Python seems to have fallen into a place halfway between the two. The stdlib is closer  
> to FreeBSD core than to Linux. On the other hand, while many people start with the official  
> stdlib and use pip to expand on it, there are third-party distributions competing to  
> provide more useful or better-organized batteries than the official version, plus  
> custom distributions that come with some OS distros (e.g., Apple includes PyObjC with  
> theirs), and special things like Kivy.
>  
> That doesn't seem to have caused any harm, and may have done a lot of good. While Python  
> may not have found the perfect sweet spot, what it found isn't that bad. And the way it continues  
> to evolve isn't that bad. If you could go back in time to 2010 and come up with a grand five-year  
> plan for how the stdlib, core distribution, and third-party ecosystem should be better,  
> how much different would Python be today?
>  
>  

It certainly doesn't require you to add something to the "Platform" for every topic either. You can still be conservative in what you include in the "Platform" based on how many people are likely to need/want it and what sort of dependency or building impact it has on actually building out the full Platform.

---  
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA



From abarnert at yahoo.com  Thu May 28 00:05:51 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Wed, 27 May 2015 15:05:51 -0700
Subject: [Python-ideas] Increasing public package discoverability (was:
	Adding jsonschema to the standard library)
In-Reply-To: <01071ECC-1949-40EF-8D7A-3073E643344B@gmail.com>
References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
 <CACac1F9yQEdH4MpOuLS-m9oe8ojcBYh_0JH8KbvtQQimJkwhKw@mail.gmail.com>
 <CADiSq7cmRPQdpC8wv3xyt20dV=Pf9uPfB1k-Q3a6kQH=khvnsQ@mail.gmail.com>
 <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com>
 <CAN-Kwu1Z1eO0X_2nx7Neg7UjXOZTCQ4z+ns3ZBnGyWo0-x6gJg@mail.gmail.com>
 <E203A6E7-158B-47F1-8B47-F2E3C5529282@stufft.io>
 <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7dVrqjU9bQRufXPQAO+y3oYxQBUg9kEySTqKh6yeVpXSw@mail.gmail.com>
 <285F0766-B023-4523-9794-819DC9CDD1CB@gmail.com>
 <CAN-Kwu2tOi8dqNNtm21rUCKB-oWVum_VK4OoXEU2JKmSHcr5dw@mail.gmail.com>
 <01071ECC-1949-40EF-8D7A-3073E643344B@gmail.com>
Message-ID: <40169569-4709-463B-A4BB-687B1A7BE1ED@yahoo.com>

On May 27, 2015, at 12:13, Demian Brecht <demianbrecht at gmail.com> wrote:
> 
> The major advantage of going with a fully distributed model would be the out-of-band releases. While nice to have for feature development, it can be crucial for bug fixes, but even more so for security patches. Other than that, I could see it opening the door to adoption of packages as "recommended" without worrying too much about state of development. requests is a perfect example of that. Note that my personal focus on standard library development is the http package so I'm somewhat cutting my legs out from under me, but I'm starting to think that adopting such a distribution mechanism might solve a number of problems (but is probably just as likely to introduce new ones ;)).

One way to do that might be to focus the stdlib on picking the abstract interfaces (whether in the actual code, like dbm allows bsddb to plug in, or just in documentation, like DB-API 2) and providing a bare-bones implementation or none at all. It would be nice if things like lxml.etree didn't take so much work and it weren't so hard to quantify how perfect of a replacement it is. Or if we had a SortedMapping ABC so the half-dozen popular implementations could share a consistent API, so they could compete more cleanly on things that matter like performance or the need for a C extension.
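
As an illustration only, a "SortedMapping" ABC along these lines might look like the sketch below. All class and method names here are invented for the example (the real sorted-container libraries each have their own APIs); it just shows how a shared abstract interface plus a bare-bones implementation could coexist:

```python
import bisect
from abc import abstractmethod
from collections.abc import Mapping


class SortedMapping(Mapping):
    """Hypothetical ABC: a read-only mapping whose keys iterate in
    sorted order, plus bounded iteration. Names invented for the sketch."""

    @abstractmethod
    def irange(self, lo=None, hi=None):
        """Yield keys in sorted order, optionally bounded to [lo, hi]."""


class SimpleSortedDict(SortedMapping):
    """Toy bare-bones implementation backed by a sorted key list."""

    def __init__(self, items=()):
        self._data = dict(items)
        self._keys = sorted(self._data)

    def __getitem__(self, key):
        return self._data[key]

    def __len__(self):
        return len(self._data)

    def __iter__(self):
        # Keys come back in sorted order, as the ABC promises.
        return iter(self._keys)

    def irange(self, lo=None, hi=None):
        start = 0 if lo is None else bisect.bisect_left(self._keys, lo)
        end = (len(self._keys) if hi is None
               else bisect.bisect_right(self._keys, hi))
        return iter(self._keys[start:end])


d = SimpleSortedDict({"b": 2, "a": 1, "c": 3})
print(list(d))                   # ['a', 'b', 'c']
print(list(d.irange("a", "b")))  # ['a', 'b']
```

Competing implementations (C-accelerated, tree-based, etc.) could then register against the same ABC and compete purely on performance.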

But the example of requests shows how hard, and possibly undesirable, that is. Most people use requests not because of the advanced features it has that urllib doesn't, but because the intermediate-level features that both include have a nicer interface in requests. And, while people have talked about how nice it would be to restructure urllib so that it matches requests' interface wherever possible (while still retaining the existing interface for backward compat), it doesn't seem that likely anyone will actually ever do it. And, even if someone did, and requests became a drop-in replacement for urllib's new-style API and urllib was eventually deprecated, what are the odds competitors like PyCurl would be reworked into a "URL-API 2.0" module?




From scott+python-ideas at scottdial.com  Thu May 28 00:39:29 2015
From: scott+python-ideas at scottdial.com (Scott Dial)
Date: Wed, 27 May 2015 18:39:29 -0400
Subject: [Python-ideas] Framework for Python for CS101
In-Reply-To: <eb760ade-6154-47a6-8d70-38822ca1949d@googlegroups.com>
References: <CAJ+TeoeaQog7s8bo=CuhTH-Y=CVR9J8B0-vX=1AUKxsiQimz3Q@mail.gmail.com>
 <C61507D3-B5AA-445B-839A-1A9F1D5A9C7B@yahoo.com>
 <CALGmxEJMB6gweSro0Yhx8gSmX1gjtSXgmjpV+3bbvjCmY4c4WA@mail.gmail.com>
 <eb760ade-6154-47a6-8d70-38822ca1949d@googlegroups.com>
Message-ID: <556647A1.8010703@scottdial.com>

On 2015-05-25 1:50 PM, Rustom Mody wrote:
> from
> https://groups.google.com/d/msg/erlang-programming/5X1irAmLMD8/qCQJ11Y5jEAJ

From the same post:
"""
One problem is that Computer Science departments simply do not have
the time to teach everything they need to teach.  Students want to
leave in 3 years with qualifications an employer will like, and
employers want 'practical' languages in CVs.  I have a colleague
who cannot spell because he was taught to read using the Initial
Teaching Alphabet, so I'm less convinced about the educational
benefits of neat languages than I used to be.
"""

Would that not be the same problem with a Python-like teaching language?

-- 
Scott Dial
scott at scottdial.com

From ncoghlan at gmail.com  Thu May 28 01:16:04 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 28 May 2015 09:16:04 +1000
Subject: [Python-ideas] Increasing public package discoverability (was:
 Adding jsonschema to the standard library)
In-Reply-To: <CACac1F-PLO9qv1aaCshRmbqgmuDv557LODmgKCT_n3qC9S06Bg@mail.gmail.com>
References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
 <CACac1F9yQEdH4MpOuLS-m9oe8ojcBYh_0JH8KbvtQQimJkwhKw@mail.gmail.com>
 <CADiSq7cmRPQdpC8wv3xyt20dV=Pf9uPfB1k-Q3a6kQH=khvnsQ@mail.gmail.com>
 <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com>
 <CAN-Kwu1Z1eO0X_2nx7Neg7UjXOZTCQ4z+ns3ZBnGyWo0-x6gJg@mail.gmail.com>
 <E203A6E7-158B-47F1-8B47-F2E3C5529282@stufft.io>
 <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7dVrqjU9bQRufXPQAO+y3oYxQBUg9kEySTqKh6yeVpXSw@mail.gmail.com>
 <285F0766-B023-4523-9794-819DC9CDD1CB@gmail.com>
 <CACac1F-PLO9qv1aaCshRmbqgmuDv557LODmgKCT_n3qC9S06Bg@mail.gmail.com>
Message-ID: <CADiSq7d9icqunOxnEokXqnG-Dutc6beXUaaUdd9vWUXAsht0Aw@mail.gmail.com>

On 28 May 2015 04:46, "Paul Moore" <p.f.moore at gmail.com> wrote:
>
> On 27 May 2015 at 19:28, Demian Brecht <demianbrecht at gmail.com> wrote:
> > This is probably a silly idea, but given the above quote and the
new(er) focus on pip and distributed packages, has there been any
discussion around perhaps deprecating (and entirely removing from a Python
4 release) non-builtin packages and modules?
>
> It has been discussed on a number of occasions. The major issue with
> the idea is that a lot of people use Python in closed corporate
> environments, where access to the internet from tools such as pip can
> be restricted. Also, many companies have legal approval processes for
> software - getting approval for "Python" includes the standard
> library, but each external package required would need a separate,
> probably lengthy and possibly prohibitive, approval process before it
> could be used.
>
> So it's unlikely to ever happen, because it would cripple Python for a
> non-trivial group of its users.

I expect splitting the standard library into a minimal core and a suite of
default independently updatable add-ons will happen eventually, we just
need to help fix the broken way a lot of organisations currently work as we
go:
http://community.redhat.com/blog/2015/02/the-quid-pro-quo-of-open-infrastructure/

Organisations that don't suitably adapt to the rise of open collaborative
models for infrastructure development are going to have a very rough time
of it in the coming years.

Cheers,
Nick.

P.S. For a less verbally dense presentation of some of the concepts in that
article: http://www.redhat.com/en/explore/infrastructure/na

P.P.S. And for a book length exposition of these kinds of concepts:
http://www.redhat.com/en/explore/the-open-organization-book

>
> Paul
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150528/5af0e395/attachment.html>

From njs at pobox.com  Thu May 28 02:29:03 2015
From: njs at pobox.com (Nathaniel Smith)
Date: Wed, 27 May 2015 17:29:03 -0700
Subject: [Python-ideas] Displaying DeprecationWarnings in the interactive
	interpreter, second try
Message-ID: <CAPJVwBm6Ww1ssh1hfK7XYaKiW2bk89qNJynghz8zk6AtLny1sg@mail.gmail.com>

Hi all,

I'm tired of getting bug reports like this one:

  https://github.com/numpy/numpy/issues/5919

where the issue is just that the user didn't see deprecation warnings,
so I just filed a bug report requesting that the interactive Python
REPL start printing DeprecationWarnings when users use deprecated
functionality:

  https://bugs.python.org/issue24294

In the bug report it was pointed out that this was discussed on
python-ideas a few months ago, and the discussion petered out without
any consensus:

  http://thread.gmane.org/gmane.comp.python.ideas/32191

As far as I can tell, though, there were only two real objections
raised in that previous thread, and IMO neither is really convincing.
So let me pre-empt those now:

Objection 1: This will cause the display of lots of unrelated warnings.

Response: You misunderstand the proposal. I'm not suggesting that we
display *all* DeprecationWarnings whenever the interactive interpreter
is running; I'm only suggesting that we display the deprecation
warnings that are warning about *code that was actually typed at the
interpreter*.

# not this
warnings.filterwarnings("default", category=DeprecationWarning)

# this
warnings.filterwarnings("default", category=DeprecationWarning,
module="__main__")

So for example, if we have

# module1.py
def deprecated_function():
    warnings.warn("stop it!", DeprecationWarning, stacklevel=2)

# module2.py
import module1
def foo():
    module1.deprecated_function()

>>> import module1, module2
# This doesn't print a warning, because 'foo' is not deprecated
# it merely uses deprecated functionality, which is not my problem,
# because I am merely a user of module1, not the author.
>>> module2.foo()
# This *does* print a warning, because now I am using the
# deprecated functionality directly.
>>> module1.deprecated_function()
__main__:1: DeprecationWarning: stop it!
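
For the record, the filter behaviour this relies on can be demonstrated with a small self-contained script (using catch_warnings so it runs outside a REPL; the sketch filters on this module's own __name__ rather than the literal "__main__" so it behaves the same however the file is executed):

```python
import re
import warnings


def deprecated_function():
    # stacklevel=2 attributes the warning to the *caller's* frame, so
    # the warning filters match against the caller's module name.
    warnings.warn("stop it!", DeprecationWarning, stacklevel=2)


with warnings.catch_warnings(record=True) as caught:
    # Start from "everything hidden", as in normal script execution...
    warnings.simplefilter("ignore", DeprecationWarning)
    # ...then re-enable DeprecationWarning only for warnings attributed
    # to this module (the REPL case would use module="__main__").
    warnings.filterwarnings("default", category=DeprecationWarning,
                            module=re.escape(__name__))
    deprecated_function()  # call site is in this module, so it shows

messages = [str(w.message) for w in caught]
print(messages)  # ['stop it!']
```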


Objection 2: There are lots of places that code is run interactively
besides the standard REPL -- there's IDLE and IPython and etc.

Response: Well, this isn't really an objection :-). Basically I'm
looking for consensus from the CPython team that this is what should
happen in the interactive interpreters that they distribute. Other
interfaces can then follow that lead or not. (For some value of
"follow". By the time you read this IPython may have already made the
change: https://github.com/ipython/ipython/pull/8480 ;-).)

So, totally awesome idea, let's do it, yes/yes?

-n

-- 
Nathaniel J. Smith -- http://vorpus.org

From stephen at xemacs.org  Thu May 28 03:31:13 2015
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 28 May 2015 10:31:13 +0900
Subject: [Python-ideas] Increasing public package discoverability (was:
 Adding jsonschema to the standard library)
In-Reply-To: <285F0766-B023-4523-9794-819DC9CDD1CB@gmail.com>
References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
 <CACac1F9yQEdH4MpOuLS-m9oe8ojcBYh_0JH8KbvtQQimJkwhKw@mail.gmail.com>
 <CADiSq7cmRPQdpC8wv3xyt20dV=Pf9uPfB1k-Q3a6kQH=khvnsQ@mail.gmail.com>
 <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com>
 <CAN-Kwu1Z1eO0X_2nx7Neg7UjXOZTCQ4z+ns3ZBnGyWo0-x6gJg@mail.gmail.com>
 <E203A6E7-158B-47F1-8B47-F2E3C5529282@stufft.io>
 <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7dVrqjU9bQRufXPQAO+y3oYxQBUg9kEySTqKh6yeVpXSw@mail.gmail.com>
 <285F0766-B023-4523-9794-819DC9CDD1CB@gmail.com>
Message-ID: <87mw0pfpy6.fsf@uwakimon.sk.tsukuba.ac.jp>

Demian Brecht writes:

 > This is probably a silly idea, but given the above quote and the
 > new(er) focus on pip and distributed packages, has there been any
 > discussion around perhaps deprecating (and entirely removing from a
 > Python 4 release) non-builtin packages and modules?

Of course there has, including in parallel to your post.  It's a dead
obvious idea.  I'd point to threads, but none of the ones I remember
would be of great use; the same ideas and suggestions that were
advanced before have been reproduced here.

The problems are that the devil is in the details which are rarely
specified, and it would have a huge impact on relationships in the
community.  For example, in the context of a relatively short timed
release cycle, I do recall the debates mentioned by Nick over
corporate environments where "Python" (the CPython distribution) is
approved as a single package, so stdlib facilities are automatically
available to "Python" users, but other packages would need to be
approved on a package-by-package basis.  There's significant overhead
to each such application, so it is efficiency-increasing to have a
big stdlib in those environments.

OK, you say, so we automatically bundle the separate stdlib current at
a given point in time with the less frequently released Python core
distribution.  Now, in the Department of Devilish Details, do those
"same core + new stdlib" bundles get the core version number, the
stdlib version number (which now must be different!) or a separate
bundle version number?  In the Bureau of Relationship Impacts, if I
were a fascist QA/security person, I would surely view that bundle as
a new release requiring a new iteration of the security vetting
process (relationship impact).  Maybe the departments doing such
vetting are not as fascist as I would be, but we'd have to find out,
wouldn't we?  If we just went ahead with this process and discovered
later that 80% of the people who were depending on the "Python"
package now cannot benefit from the bundling because the tarball
labelled "Python-X.Y" is no longer eternal, that would be sad.

And although that is the drag on a core/stdlib release cycle split
most often cited, I'm sure there are plenty of others.  Is it worth
the effort to try to discover and address all/most/some of those?
Which ones to address (and we don't know what problems might exist
yet!)?

 > I would think that if there was a system similar to Django Packages
 > that made discoverability/importing of packages as easy as using
 > those in the standard library, having a distributed package model
 > where bug fixes and releases could be done out of band with CPython
 > releases would likely more beneficial to the end users. If there
 > was a "recommended packages" framework, perhaps there could also be
 > buildbots put to testing interoperability of the recommended
 > package set.

I don't think either "recommended packages" or buildbots scales much
beyond Django (and I wonder whether buildbots would even scale to the
Django packages ecosystem).  But the Python ecosystem includes all of
Django already, plus NumPy, SciPy, Pandas, Twisted, Egenix's mx*
stuff, a dozen more or less popular ORMs, a similar number of web
frameworks more or less directly competing with Django itself, and all
the rest of the cast of thousands on PyPI.

At the present time, I think we need to accept that integration of a
system, even one that implements a single application, has a shallow
learning curve.  It takes quite a bit of time to become aware of needs
(my initial reaction was "json-schema in the stdlib? YAGNI!!"), and
some time and a bit of Google-foo to translate needs to search
keywords.  After that, the Googling goes rapidly -- that's a solved
problem, thank you very much DEC AltaVista.  Then you hit the multiple
implementations wall, and after recovering consciousness, you start
moving forward again slowly, evaluating alternatives and choosing one.

And that doesn't mean you're done, because those integration decisions
will not be set in stone.  Eg, for Mailman's 3.0 release, Barry
decided to swap out two mission-critical modules, the ORM and the REST
generator -- after the first beta was released!  (Granted, Mailman 3.0
has had an extremely long release process, but the example remains
relevant -- such reevaluations occur in .2 or .9 releases all the
time.)  Except for Googling, none of these tasks are solved problems:
the system integrator has to go through the process over again each
time with a new system, or in an existing system when the relative
strengths of the chosen modules vs. alternatives change dramatically.
In this last case, it's true that choosing keywords is probably
trivial, and the alternative pruning goes faster, but retrofitting the
whole system to the new! improved! alternative!! module may be pretty
painful -- and there's not necessarily a guarantee it will succeed.

IMO, fiddling with the Python release and distribution is unlikely to
solve any of the above problems, and is likely to be a step backward
for some users.  Of course at some point we decide the benefits to
other users, the developers, and the release engineers outweigh the
costs to the users who don't like the change, but it's never a
no-brainer.


From graffatcolmingov at gmail.com  Thu May 28 04:39:09 2015
From: graffatcolmingov at gmail.com (Ian Cordasco)
Date: Wed, 27 May 2015 21:39:09 -0500
Subject: [Python-ideas] Displaying DeprecationWarnings in the
 interactive interpreter, second try
In-Reply-To: <CAPJVwBm6Ww1ssh1hfK7XYaKiW2bk89qNJynghz8zk6AtLny1sg@mail.gmail.com>
References: <CAPJVwBm6Ww1ssh1hfK7XYaKiW2bk89qNJynghz8zk6AtLny1sg@mail.gmail.com>
Message-ID: <CAN-Kwu3nOG9FR6zoeGNm0jJm7Vki4nDpkZ51Qo4-0yD0mEonUQ@mail.gmail.com>

On Wed, May 27, 2015 at 7:29 PM, Nathaniel Smith <njs at pobox.com> wrote:
> Hi all,
>
> I'm tired of getting bug reports like this one:
>
>   https://github.com/numpy/numpy/issues/5919
>
> where the issue is just that the user didn't see deprecation warnings,
> so I just filed a bug report requesting that the interactive Python
> REPL start printing DeprecationWarnings when users use deprecated
> functionality:
>
>   https://bugs.python.org/issue24294
>
> In the bug report it was pointed out that this was discussed on
> python-ideas a few months ago, and the discussion petered out without
> any consensus:
>
>   http://thread.gmane.org/gmane.comp.python.ideas/32191
>
> As far as I can tell, though, there were only two real objections
> raised in that previous thread, and IMO neither is really convincing.
> So let me pre-empt those now:
>
> Objection 1: This will cause the display of lots of unrelated warnings.
>
> Response: You misunderstand the proposal. I'm not suggesting that we
> display *all* DeprecationWarnings whenever the interactive interpreter
> is running; I'm only suggesting that we display the deprecation
> warnings that are warning about *code that was actually typed at the
> interpreter*.
>
> # not this
> warnings.filterwarnings("default", category=DeprecationWarning)
>
> # this
> warnings.filterwarnings("default", category=DeprecationWarning,
> module="__main__")
>
> So for example, if we have
>
> # module1.py
> def deprecated_function():
>     warnings.warn("stop it!", DeprecationWarning, stacklevel=2)
>
> # module2.py
> import module1
> def foo():
>     module1.deprecated_function()
>
>>> import module1, module2
> # This doesn't print a warning, because 'foo' is not deprecated
> # it merely uses deprecated functionality, which is not my problem,
> # because I am merely a user of module1, not the author.
>>> module2.foo()
> # This *does* print a warning, because now I am using the
> # deprecated functionality directly.
>>> module1.deprecated_function()
> __main__:1: DeprecationWarning: stop it!
>
>
> Objection 2: There are lots of places that code is run interactively
> besides the standard REPL -- there's IDLE and IPython and etc.
>
> Response: Well, this isn't really an objection :-). Basically I'm
> looking for consensus from the CPython team that this is what should
> happen in the interactive interpreters that they distribute. Other
> interfaces can then follow that lead or not. (For some value of
> "follow". By the time you read this IPython may have already made the
> change: https://github.com/ipython/ipython/pull/8480 ;-).)
>
> So, totally awesome idea, let's do it, yes/yes?
>
> -n
>
> --
> Nathaniel J. Smith -- http://vorpus.org
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

I'm in favor of this. It's especially convincing to me that IPython is
considering a similar change.

From ncoghlan at gmail.com  Thu May 28 04:53:23 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 28 May 2015 12:53:23 +1000
Subject: [Python-ideas] Displaying DeprecationWarnings in the
 interactive interpreter, second try
In-Reply-To: <CAPJVwBm6Ww1ssh1hfK7XYaKiW2bk89qNJynghz8zk6AtLny1sg@mail.gmail.com>
References: <CAPJVwBm6Ww1ssh1hfK7XYaKiW2bk89qNJynghz8zk6AtLny1sg@mail.gmail.com>
Message-ID: <CADiSq7efuyuMBg55JQdeBsHQ-SjO1wSJWxPCtRDxDMj2RRGKcw@mail.gmail.com>

On 28 May 2015 at 10:29, Nathaniel Smith <njs at pobox.com> wrote:
> Hi all,
>
> I'm tired of getting bug reports like this one:
>
>   https://github.com/numpy/numpy/issues/5919
>
> where the issue is just that the user didn't see deprecation warnings,
> so I just filed a bug report requesting that the interactive Python
> REPL start printing DeprecationWarnings when users use deprecated
> functionality:
>
>   https://bugs.python.org/issue24294

+1 from me. For folks that aren't aware of the history, prior to
Python 2.7, the situation was like this (DW = DeprecationWarning, PDW
= PendingDeprecationWarning):

Test frameworks: DW visible by default, PDW hidden by default
Interactive REPL: DW visible by default, PDW hidden by default
Non-interactive execution: DW visible by default, PDW hidden by default

In Python 2.7, this behaviour was changed to be as follows:

Test frameworks: both visible by default
Interactive REPL: both hidden by default
Non-interactive execution: both hidden by default

This eliminated deprecation warnings from the experience of end users
running scripts and applications that merely happened to be written in
Python, but also eliminated any real behavioural difference between DW
and PDW, making it very unclear as to whether or not retaining PDW
still had any practical purpose beyond backwards compatibility.

In addition to better alerting end users to genuinely imminent
deprecations that they should adapt to ASAP, splitting them again in
the interactive REPL case would restore a meaningful behavioural
difference that can help pragmatically guide decisions as to which is
more appropriate to use for a given deprecation:

Test frameworks: both visible by default
Interactive REPL: DW visible by default, PDW hidden by default
Non-interactive execution: both hidden by default
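
(The split is only workable because PendingDeprecationWarning is a sibling of DeprecationWarning under Warning, not a subclass, so a REPL filter on DW alone leaves PDW hidden. A minimal sketch:

```python
import warnings

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("ignore")  # everything hidden by default
    # Re-enable only DeprecationWarning, as proposed for the REPL:
    warnings.filterwarnings("default", category=DeprecationWarning)
    warnings.warn("imminent removal", DeprecationWarning)
    warnings.warn("removal someday", PendingDeprecationWarning)

shown = [w.category.__name__ for w in caught]
print(shown)  # ['DeprecationWarning'] -- the PDW stays hidden
```

so which category a library author picks directly controls whether interactive users would see the warning.)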

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From berker.peksag at gmail.com  Thu May 28 04:59:04 2015
From: berker.peksag at gmail.com (=?UTF-8?Q?Berker_Peksa=C4=9F?=)
Date: Thu, 28 May 2015 05:59:04 +0300
Subject: [Python-ideas] Displaying DeprecationWarnings in the
 interactive interpreter, second try
In-Reply-To: <CADiSq7efuyuMBg55JQdeBsHQ-SjO1wSJWxPCtRDxDMj2RRGKcw@mail.gmail.com>
References: <CAPJVwBm6Ww1ssh1hfK7XYaKiW2bk89qNJynghz8zk6AtLny1sg@mail.gmail.com>
 <CADiSq7efuyuMBg55JQdeBsHQ-SjO1wSJWxPCtRDxDMj2RRGKcw@mail.gmail.com>
Message-ID: <CAF4280KQ4VPeDuEDyBh5JGP4MnBFi+J8yn6ozRhqWEcMOGJ=dw@mail.gmail.com>

On Thu, May 28, 2015 at 5:53 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On 28 May 2015 at 10:29, Nathaniel Smith <njs at pobox.com> wrote:
>> Hi all,
>>
>> I'm tired of getting bug reports like this one:
>>
>>   https://github.com/numpy/numpy/issues/5919
>>
>> where the issue is just that the user didn't see deprecation warnings,
>> so I just filed a bug report requesting that the interactive Python
>> REPL start printing DeprecationWarnings when users use deprecated
>> functionality:
>>
>>   https://bugs.python.org/issue24294
>
> +1 from me.

+1 from me, too.

--Berker

From ben+python at benfinney.id.au  Thu May 28 07:04:54 2015
From: ben+python at benfinney.id.au (Ben Finney)
Date: Thu, 28 May 2015 15:04:54 +1000
Subject: [Python-ideas] Displaying DeprecationWarnings in the
	interactive interpreter, second try
References: <CAPJVwBm6Ww1ssh1hfK7XYaKiW2bk89qNJynghz8zk6AtLny1sg@mail.gmail.com>
 <CADiSq7efuyuMBg55JQdeBsHQ-SjO1wSJWxPCtRDxDMj2RRGKcw@mail.gmail.com>
Message-ID: <85a8wpwavd.fsf@benfinney.id.au>

Nick Coghlan <ncoghlan at gmail.com> writes:

> In addition to better alerting end users to genuinely imminent
> deprecations that they should adapt to ASAP, splitting them again in
> the interactive REPL case would restore a meaningful behavioural
> difference that can help pragmatically guide decisions as to which is
> more appropriate to use for a given deprecation:
>
> Test frameworks: both visible by default
> Interactive REPL: DW visible by default, PDW hidden by default
> Non-interactive execution: both hidden by default

Is there already a clear API for a "test framework" or "interactive
REPL" to declare itself as such?

Do all the test frameworks and interactive REPL implementations already
follow that API?

I ask this to know whether your proposal entails that each
implementation of a test framework or REPL will likely behave
differently from other implementations in how it fits into the above
categories.

-- 
 \           "We have clumsy, sputtering, inefficient brains.... It is a |
  `\     *struggle* to be rational and objective, and failures are not |
_o__) evidence for an alternative reality." --Paul Z. Myers, 2010-10-14 |
Ben Finney


From tjreedy at udel.edu  Thu May 28 08:22:43 2015
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 28 May 2015 02:22:43 -0400
Subject: [Python-ideas] Displaying DeprecationWarnings in the
 interactive interpreter, second try
In-Reply-To: <CAPJVwBm6Ww1ssh1hfK7XYaKiW2bk89qNJynghz8zk6AtLny1sg@mail.gmail.com>
References: <CAPJVwBm6Ww1ssh1hfK7XYaKiW2bk89qNJynghz8zk6AtLny1sg@mail.gmail.com>
Message-ID: <mk6c87$q66$1@ger.gmane.org>

On 5/27/2015 8:29 PM, Nathaniel Smith wrote:

>    https://bugs.python.org/issue24294

I had already planned to add this to Idle, at least as an option. I had 
not seen the issue yet.  I am pretty ignorant about the warnings system 
so I posted some questions there.  I was thinking to make this an 
option, but I do not know how to convey options set in the Idle process 
to the user execution process, as the rpc protocol seems undocumented. 
I might just turn DeprecationWarnings on the way they used to be and 
will be in the console interpreter, but I am slightly worried about 
warnings being intermixed with user output.  This is not a problem with 
tracebacks as they end user output.

-- 
Terry Jan Reedy


From ncoghlan at gmail.com  Thu May 28 08:46:12 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 28 May 2015 16:46:12 +1000
Subject: [Python-ideas] Displaying DeprecationWarnings in the
 interactive interpreter, second try
In-Reply-To: <85a8wpwavd.fsf@benfinney.id.au>
References: <CAPJVwBm6Ww1ssh1hfK7XYaKiW2bk89qNJynghz8zk6AtLny1sg@mail.gmail.com>
 <CADiSq7efuyuMBg55JQdeBsHQ-SjO1wSJWxPCtRDxDMj2RRGKcw@mail.gmail.com>
 <85a8wpwavd.fsf@benfinney.id.au>
Message-ID: <CADiSq7eY6Heuv2jVHnXsZHCSKKaqhi9C5ZKyjXcA0_CVRd-gFw@mail.gmail.com>

On 28 May 2015 at 15:04, Ben Finney <ben+python at benfinney.id.au> wrote:
> Nick Coghlan <ncoghlan at gmail.com> writes:
>
>> In addition to better alerting end users to genuinely imminent
>> deprecations that they should adapt to ASAP, splitting them again in
>> the interactive REPL case would restore a meaningful behavioural
>> difference that can help pragmatically guide decisions as to which is
>> more appropriate to use for a given deprecation:
>>
>> Test frameworks: both visible by default
>> Interactive REPL: DW visible by default, PDW hidden by default
>> Non-interactive execution: both hidden by default
>
> Is there already a clear API for a "test framework" or "interactive
> REPL" to declare itself as such?
>
> Do all the test frameworks and interactive REPL implementations already
> follow that API?

It's a convention. unittest sets the convention for test frameworks
(and, as far as I am aware, other popular test runners like nose and
py.test abide by it), while the default REPL, the code module, IDLE
and IPython will set the convention for REPLs (assuming we change it
away from matching the non-interactive default behaviour).

> I ask this to know whether your proposal entails that each
> implementation of a test framework or REPL will likely behave
> differently from other implementations in how it fits into the above
> categories.

Test frameworks and REPLs that don't adjust the warning filters on
startup will continue to default to the non-interactive behaviour.
Nobody is proposing to change that.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From mal at egenix.com  Thu May 28 10:26:04 2015
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu, 28 May 2015 10:26:04 +0200
Subject: [Python-ideas] Displaying DeprecationWarnings in the
 interactive interpreter, second try
In-Reply-To: <CAPJVwBm6Ww1ssh1hfK7XYaKiW2bk89qNJynghz8zk6AtLny1sg@mail.gmail.com>
References: <CAPJVwBm6Ww1ssh1hfK7XYaKiW2bk89qNJynghz8zk6AtLny1sg@mail.gmail.com>
Message-ID: <5566D11C.6050002@egenix.com>

On 28.05.2015 02:29, Nathaniel Smith wrote:
> Hi all,
> 
> I'm tired of getting bug reports like this one:
> 
>   https://github.com/numpy/numpy/issues/5919

Well, in that particular case, I think numpy should raise a TypeError
instead of a DeprecationWarning :-)

> where the issue is just that the user didn't see deprecation warnings,
> so I just filed a bug report requesting that the interactive Python
> REPL start printing DeprecationWarnings when users use deprecated
> functionality:
> 
>   https://bugs.python.org/issue24294

+1 on the general idea, but I think this needs some more thought
on the topic of how you detect an interactive session that's being
used by a user.

You wouldn't want these warnings to show up when piping in commands
to a Python interpreter.

In eGenix PyRun we use sys.stdin.isatty() to check whether we
want an interactive prompt or not. I guess the same could be done
here.
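
A sketch of what that check could look like (the helper name is invented here for illustration; PyRun's actual code may differ):

```python
import sys
import warnings


def is_interactive_session():
    # Hypothetical helper: treat the session as interactive only when
    # stdin is a real terminal, so piped input ("echo ... | python")
    # keeps the quiet, non-interactive warning defaults.
    stdin = getattr(sys, "stdin", None)
    return (stdin is not None
            and hasattr(stdin, "isatty")
            and stdin.isatty())


# A REPL could then decide whether to loosen the warning filters:
if is_interactive_session():
    warnings.filterwarnings("default", category=DeprecationWarning,
                            module="__main__")
```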

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, May 28 2015)
>>> Python Projects, Coaching and Consulting ...  http://www.egenix.com/
>>> mxODBC Plone/Zope Database Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From jonathan at slenders.be  Thu May 28 10:53:13 2015
From: jonathan at slenders.be (Jonathan Slenders)
Date: Thu, 28 May 2015 10:53:13 +0200
Subject: [Python-ideas] Displaying DeprecationWarnings in the
 interactive interpreter, second try
In-Reply-To: <5566D11C.6050002@egenix.com>
References: <CAPJVwBm6Ww1ssh1hfK7XYaKiW2bk89qNJynghz8zk6AtLny1sg@mail.gmail.com>
 <5566D11C.6050002@egenix.com>
Message-ID: <CAKfyG3x04SAaLFfTexfSyb0oopWBGgHGgzx12k452jDgZC+rCA@mail.gmail.com>

+1 on this too.

I'm author of the "ptpython" REPL.

Nathaniel Smith: could you tell me what I should do?

Is it enough if I make sure that all code runs in __main__ and run
this command at the start?
warnings.filterwarnings("default", category=DeprecationWarning,
module="__main__")

Jonathan




2015-05-28 10:26 GMT+02:00 M.-A. Lemburg <mal at egenix.com>:

> On 28.05.2015 02:29, Nathaniel Smith wrote:
> > Hi all,
> >
> > I'm tired of getting bug reports like this one:
> >
> >   https://github.com/numpy/numpy/issues/5919
>
> Well, in that particular case, I think numpy should raise a TypeError
> instead of a DeprecationWarning :-)
>
> > where the issue is just that the user didn't see deprecation warnings,
> > so I just filed a bug report requesting that the interactive Python
> > REPL start printing DeprecationWarnings when users use deprecated
> > functionality:
> >
> >   https://bugs.python.org/issue24294
>
> +1 on the general idea, but I think this needs some more thought
> on the topic of how you detect an interactive session that's being
> used by a user.
>
> You wouldn't want these warning to show up when piping in commands
> to a Python interpreter.
>
> In eGenix PyRun we use sys.stdin.isatty() to check whether we
> want an interactive prompt or not. I guess the same could be done
> here.
>
> --
> Marc-Andre Lemburg
> eGenix.com
>
> Professional Python Services directly from the Source  (#1, May 28 2015)
> >>> Python Projects, Coaching and Consulting ...  http://www.egenix.com/
> >>> mxODBC Plone/Zope Database Adapter ...       http://zope.egenix.com/
> >>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
> ________________________________________________________________________
>
> ::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::
>
>    eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
>     D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
>            Registered at Amtsgericht Duesseldorf: HRB 46611
>                http://www.egenix.com/company/contact/
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150528/55014d7d/attachment.html>

From njs at pobox.com  Thu May 28 11:04:05 2015
From: njs at pobox.com (Nathaniel Smith)
Date: Thu, 28 May 2015 02:04:05 -0700
Subject: [Python-ideas] Displaying DeprecationWarnings in the
 interactive interpreter, second try
In-Reply-To: <CAKfyG3x04SAaLFfTexfSyb0oopWBGgHGgzx12k452jDgZC+rCA@mail.gmail.com>
References: <CAPJVwBm6Ww1ssh1hfK7XYaKiW2bk89qNJynghz8zk6AtLny1sg@mail.gmail.com>
 <5566D11C.6050002@egenix.com>
 <CAKfyG3x04SAaLFfTexfSyb0oopWBGgHGgzx12k452jDgZC+rCA@mail.gmail.com>
Message-ID: <CAPJVwBmyujmBwrSnsA6vkeqZ-7maxGCxQ2tS747dLHqtRYnrRg@mail.gmail.com>

On Thu, May 28, 2015 at 1:53 AM, Jonathan Slenders <jonathan at slenders.be> wrote:
> +1 on this too.
>
> I'm author of the "ptpython" REPL.
>
> Nathaniel Smith: could you tell me what I should do?
>
> Is it enough when I make sure that all code runs in __main__ and running
> this command at the start?
> warnings.filterwarnings("default", category=DeprecationWarning,
> module="__main__")

That should do it, yes.
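
For anyone following along, a self-contained sketch of what that one
filter line does; old_api() is a made-up stand-in for a deprecated
library function:

```python
import warnings

# The fix from the thread: show DeprecationWarning for code that runs
# in the __main__ namespace (i.e. what the user types at the prompt).
warnings.filterwarnings("default", category=DeprecationWarning,
                        module="__main__")

def old_api():
    """Hypothetical deprecated function, for demonstration only."""
    # stacklevel=2 attributes the warning to the caller's line, so the
    # module="__main__" filter above matches the user's own code.
    warnings.warn("old_api() is deprecated", DeprecationWarning,
                  stacklevel=2)
    return 42

result = old_api()
```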

-- 
Nathaniel J. Smith -- http://vorpus.org

From greg at krypto.org  Thu May 28 16:27:38 2015
From: greg at krypto.org (Gregory P. Smith)
Date: Thu, 28 May 2015 14:27:38 +0000
Subject: [Python-ideas] Displaying DeprecationWarnings in the
 interactive interpreter, second try
In-Reply-To: <5566D11C.6050002@egenix.com>
References: <CAPJVwBm6Ww1ssh1hfK7XYaKiW2bk89qNJynghz8zk6AtLny1sg@mail.gmail.com>
 <5566D11C.6050002@egenix.com>
Message-ID: <CAGE7PN+gUow-XoK7MGCNhqtCb-0-yMRqPHV9iEc0Ep01-povNA@mail.gmail.com>

On Thu, May 28, 2015, 1:26 AM M.-A. Lemburg <mal at egenix.com> wrote:

> On 28.05.2015 02:29, Nathaniel Smith wrote:
> > Hi all,
> >
> > I'm tired of getting bug reports like this one:
> >
> >   https://github.com/numpy/numpy/issues/5919
>
> Well, in that particular case, I think numpy should raise a TypeError
> instead of a DeprecationWarning :-)
>
> > where the issue is just that the user didn't see deprecation warnings,
> > so I just filed a bug report requesting that the interactive Python
> > REPL start printing DeprecationWarnings when users use deprecated
> > functionality:
> >
> >   https://bugs.python.org/issue24294
>
> +1 on the general idea, but I think this needs some more thought
> on the topic of how you detect an interactive session that's being
> used by a user.
>
> You wouldn't want these warning to show up when piping in commands
> to a Python interpreter.
>
> In eGenix PyRun we use sys.stdin.isatty() to check whether we
> want an interactive prompt or not. I guess the same could be done
> here.
>
> --
> Marc-Andre Lemburg
> eGenix.com
>
> Professional Python Services directly from the Source  (#1, May 28 2015)
> >>> Python Projects, Coaching and Consulting ...  http://www.egenix.com/
> >>> mxODBC Plone/Zope Database Adapter ...       http://zope.egenix.com/
> >>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
> ________________________________________________________________________
>
> ::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::
>
>    eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
>     D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
>            Registered at Amtsgericht Duesseldorf: HRB 46611
>                http://www.egenix.com/company/contact/
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
+1
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150528/6d139928/attachment.html>

From skip.montanaro at gmail.com  Thu May 28 16:34:23 2015
From: skip.montanaro at gmail.com (Skip Montanaro)
Date: Thu, 28 May 2015 09:34:23 -0500
Subject: [Python-ideas] Increasing public package discoverability (was:
 Adding jsonschema to the standard library)
In-Reply-To: <etPan.55661518.5c3d4cdd.12a4d@Draupnir.home>
References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
 <CACac1F9yQEdH4MpOuLS-m9oe8ojcBYh_0JH8KbvtQQimJkwhKw@mail.gmail.com>
 <CADiSq7cmRPQdpC8wv3xyt20dV=Pf9uPfB1k-Q3a6kQH=khvnsQ@mail.gmail.com>
 <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com>
 <CAN-Kwu1Z1eO0X_2nx7Neg7UjXOZTCQ4z+ns3ZBnGyWo0-x6gJg@mail.gmail.com>
 <E203A6E7-158B-47F1-8B47-F2E3C5529282@stufft.io>
 <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7dVrqjU9bQRufXPQAO+y3oYxQBUg9kEySTqKh6yeVpXSw@mail.gmail.com>
 <285F0766-B023-4523-9794-819DC9CDD1CB@gmail.com>
 <CACac1F-PLO9qv1aaCshRmbqgmuDv557LODmgKCT_n3qC9S06Bg@mail.gmail.com>
 <007971A6-7D77-44FB-8B28-381673D981D5@gmail.com>
 <etPan.55661518.5c3d4cdd.12a4d@Draupnir.home>
Message-ID: <CANc-5Ux5sdXcqyV7b3exzgp0LSk847gbYLTosZKrmRTtx8eESQ@mail.gmail.com>

On Wed, May 27, 2015 at 2:03 PM, Donald Stufft <donald at stufft.io> wrote:
> I'm of the opinion that, given a brand new language, it makes more sense to have really good packaging tools built in, but not to have a standard library.

While perhaps nice in theory, the process of getting a package into
the standard library provides a number of filters (hurdles, if you
will) through which a package must pass (or surmount) before it is
deemed suitable for broad availability by default to users, and for
support by the core development team. Today, that includes
documentation, unit tests, broad acceptance by the user community (in
many cases), and a commitment by the core development team to maintain
the package for the foreseeable future. To the best of my knowledge,
none of those filters apply to PyPI-cataloged packages. That is not to
say that the current process doesn't have its problems. Some really
useful stuff is surely not available in the core. If the core
development team was stacked with people who program numeric
applications for a living, perhaps numpy or something similar would be
in the core today.

The other end of the spectrum is Perl. It has been more than a decade
since I did any Perl programming, and even then, not much, but I still
remember how confused I was trying to choose a package to manipulate
dates and times from CPAN with no guidance. I know PyPI has a weight
field. I just went back and reread the footnote describing it, but I
really have no idea how it operates. I'm sure someone nefarious could
game that system so their security compromising package drifts toward
the top of the list. Try searching for "xml." 2208 packages are
returned, with weights ranging from 1 to 9. 107 packages have weights of
8 or 9. If the standard library is to dwindle down to next-to-nothing,
a better scheme for package selection/recommendation will have to be
developed.

Skip

From brett at python.org  Thu May 28 16:45:01 2015
From: brett at python.org (Brett Cannon)
Date: Thu, 28 May 2015 14:45:01 +0000
Subject: [Python-ideas] Displaying DeprecationWarnings in the
 interactive interpreter, second try
In-Reply-To: <CAGE7PN+gUow-XoK7MGCNhqtCb-0-yMRqPHV9iEc0Ep01-povNA@mail.gmail.com>
References: <CAPJVwBm6Ww1ssh1hfK7XYaKiW2bk89qNJynghz8zk6AtLny1sg@mail.gmail.com>
 <5566D11C.6050002@egenix.com>
 <CAGE7PN+gUow-XoK7MGCNhqtCb-0-yMRqPHV9iEc0Ep01-povNA@mail.gmail.com>
Message-ID: <CAP1=2W6GjOS+i1rrACERzp5udO5mR=yeYkd=EO8Q5xKkosvmGQ@mail.gmail.com>

On Thu, May 28, 2015 at 10:28 AM Gregory P. Smith <greg at krypto.org> wrote:

>
>
> On Thu, May 28, 2015, 1:26 AM M.-A. Lemburg <mal at egenix.com> wrote:
>
>> On 28.05.2015 02:29, Nathaniel Smith wrote:
>> > Hi all,
>> >
>> > I'm tired of getting bug reports like this one:
>> >
>> >   https://github.com/numpy/numpy/issues/5919
>>
>> Well, in that particular case, I think numpy should raise a TypeError
>> instead of a DeprecationWarning :-)
>>
>> > where the issue is just that the user didn't see deprecation warnings,
>> > so I just filed a bug report requesting that the interactive Python
>> > REPL start printing DeprecationWarnings when users use deprecated
>> > functionality:
>> >
>> >   https://bugs.python.org/issue24294
>>
>> +1 on the general idea, but I think this needs some more thought
>> on the topic of how you detect an interactive session that's being
>> used by a user.
>>
>> You wouldn't want these warning to show up when piping in commands
>> to a Python interpreter.
>>
>> In eGenix PyRun we use sys.stdin.isatty() to check whether we
>> want an interactive prompt or not. I guess the same could be done
>> here.
>>
>> --
>> Marc-Andre Lemburg
>> eGenix.com
>>
>> Professional Python Services directly from the Source  (#1, May 28 2015)
>> >>> Python Projects, Coaching and Consulting ...  http://www.egenix.com/
>> >>> mxODBC Plone/Zope Database Adapter ...       http://zope.egenix.com/
>> >>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
>> ________________________________________________________________________
>>
>> ::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::
>>
>>    eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
>>     D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
>>            Registered at Amtsgericht Duesseldorf: HRB 46611
>>                http://www.egenix.com/company/contact/
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
> +1
>

+1
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150528/8be5b29a/attachment.html>

From chris.barker at noaa.gov  Thu May 28 18:06:44 2015
From: chris.barker at noaa.gov (Chris Barker)
Date: Thu, 28 May 2015 09:06:44 -0700
Subject: [Python-ideas] Cmake as build system
In-Reply-To: <CAPkN8x+H8ho1k9d2mrMuhGn97FjedOs8VMj3EaFsTxuCh1nGhQ@mail.gmail.com>
References: <CAMkX=YUGgPXvj08GhME53-6VuDATg0N7asYEpQMaaGy2Prc43w@mail.gmail.com>
 <CAO41-mPo_CVTRWGDzU23MMZFj11R_9FAYeKgihTy+vXVi+t81w@mail.gmail.com>
 <CAPkN8x+H8ho1k9d2mrMuhGn97FjedOs8VMj3EaFsTxuCh1nGhQ@mail.gmail.com>
Message-ID: <CALGmxELuSpFo1sS2Yf1JcqD1cOicHSNvD00eJJCyrX0Ct4Z2Bg@mail.gmail.com>

There was a big thread about this recently -- and many more before that,
I'm sure. Please read them before posting more...

But:

cPython is an open source project -- while it would be NICE to get the
core developers' support before going out and doing something new, if
anyone is convinced that they can set up a better build system for
Python -- go ahead and do it. If it turns out all the skeptics are
wrong, and the issues they raise can be overcome easily enough -- then
you will have proved that.

But going on and on on this list about how other people should do something
different isn't going to get you anywhere.

One small note:

> Take
> a SCons, for example, and try to port that to Python 3. You will see the
> key
> points that need to be solved (see the bulletproof unicode thread in this
> list).
>

uhm, in that thread, you ask for a Python 2 solution (so apparently nothing
to do with porting to py3) -- whereas in Python 3, there is surrogateescape
support.

So while yes, python3's consistent, robust approach to Unicode has made
processing ill-defined text harder than python2, this particular problem
HAS been addressed, and in fact is easier to do in py3 than py2.
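
For reference, a short sketch of what surrogateescape does for
ill-defined text: undecodable bytes survive a decode/encode round trip
losslessly instead of raising or being corrupted.

```python
# Bytes that are not valid UTF-8: a stray Latin-1 e-acute (0xE9).
raw = b"caf\xe9 menu"

# surrogateescape maps each undecodable byte to a lone surrogate code
# point (here U+DCE9) instead of raising UnicodeDecodeError.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
restored = text.encode("utf-8", errors="surrogateescape")
```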


discussing Python usage issues in development
> lists is discouraged even though the issues raised there are important for
> language usability.
>

you can only put so much on one list -- if you want to discuss how to do
something with the existing implementation of Python (2 or 3...) then an
"ideas" list or "devel" list isn't the right place. What is the problem
with that?

-Chris


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150528/d18c3c9b/attachment-0001.html>

From wes.turner at gmail.com  Thu May 28 19:07:13 2015
From: wes.turner at gmail.com (Wes Turner)
Date: Thu, 28 May 2015 12:07:13 -0500
Subject: [Python-ideas] Increasing public package discoverability (was:
 Adding jsonschema to the standard library)
In-Reply-To: <CANc-5Ux5sdXcqyV7b3exzgp0LSk847gbYLTosZKrmRTtx8eESQ@mail.gmail.com>
References: <0E73E517-C718-44EC-9C42-711C43009793@gmail.com>
 <CACac1F9yQEdH4MpOuLS-m9oe8ojcBYh_0JH8KbvtQQimJkwhKw@mail.gmail.com>
 <CADiSq7cmRPQdpC8wv3xyt20dV=Pf9uPfB1k-Q3a6kQH=khvnsQ@mail.gmail.com>
 <733B5538-B921-42E1-BC37-C6F1E6990091@gmail.com>
 <CAN-Kwu1Z1eO0X_2nx7Neg7UjXOZTCQ4z+ns3ZBnGyWo0-x6gJg@mail.gmail.com>
 <E203A6E7-158B-47F1-8B47-F2E3C5529282@stufft.io>
 <87oalcgfsm.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7dVrqjU9bQRufXPQAO+y3oYxQBUg9kEySTqKh6yeVpXSw@mail.gmail.com>
 <285F0766-B023-4523-9794-819DC9CDD1CB@gmail.com>
 <CACac1F-PLO9qv1aaCshRmbqgmuDv557LODmgKCT_n3qC9S06Bg@mail.gmail.com>
 <007971A6-7D77-44FB-8B28-381673D981D5@gmail.com>
 <etPan.55661518.5c3d4cdd.12a4d@Draupnir.home>
 <CANc-5Ux5sdXcqyV7b3exzgp0LSk847gbYLTosZKrmRTtx8eESQ@mail.gmail.com>
Message-ID: <CACfEFw8wZEEewTBBgc73SrEuGhst7DzMDGbeOWnJtJ+Vh6_4sQ@mail.gmail.com>

On Thu, May 28, 2015 at 9:34 AM, Skip Montanaro <skip.montanaro at gmail.com>
wrote:

> On Wed, May 27, 2015 at 2:03 PM, Donald Stufft <donald at stufft.io> wrote:
> > I?m of the opinion that, given a brand new language, it makes more sense
> to have really good packaging tools built in, but not to have a standard
> library.
>
> While perhaps nice in theory, the process of getting a package into
> the standard library provides a number of filters (hurdles, if you
> will) through which a package much pass (or surmount) before it is
> deemed suitable for broad availability by default to users, and for
> support by the core development team. Today, that includes
> documentation, unit tests, broad acceptance by the user community (in
> many cases), and a commitment by the core development team to maintain
> the package for the foreseeable future. To the best of my knowledge,
> none of those filters apply to PyPI-cataloged packages. That is not to
> say that the current process doesn't have its problems. Some really
> useful stuff is surely not available in the core. If the core
> development team was stacked with people who program numeric
> applications for a living, perhaps numpy or something similar would be
> in the core today.
>
> The other end of the spectrum is Perl. It has been more than a decade
> since I did any Perl programming, and even then, not much, but I still
> remember how confused I was trying to choose a package to manipulate
> dates and times from CPAN with no guidance. I know PyPI has a weight
> field. I just went back and reread the footnote describing it, but I
> really have no idea how it operates. I'm sure someone nefarious could
> game that system so their security compromising package drifts toward
> the top of the list. Try searching for "xml." 2208 packages are
> return, with weights ranging from 1 to 9. 107 packages have weights of
> 8 or 9. If the standard library is to dwindle down to next-to-nothing,
> a better scheme for package selection/recommendation will have to be
> developed.
>

A workflow for building CI-able, vendorable packages with coverage
and fuzzing?

* xUnit XML test results
* http://schema.org/AssessAction
  * Quality 1 (Use Cases n, m)
  * Quality 2 (Use cases x, y)
  * SecurityAssessAction
* http://schema.org/ChooseAction
  * Why am I downloading duplicate functionality?

* http://schema.org/LikeAction
  * Community feedback is always helpful.

Or, a workflow for maintaining a *distribution of* **versions of** (C and)
Python packages?


> Skip
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150528/f40ca00b/attachment.html>

From techtonik at gmail.com  Fri May 29 10:10:53 2015
From: techtonik at gmail.com (anatoly techtonik)
Date: Fri, 29 May 2015 11:10:53 +0300
Subject: [Python-ideas] Lossless bulletproof conversion to unicode
	(backslashing)
In-Reply-To: <CACac1F9whLHJDxpCZGXpf6UXMYu-BHEd5zvT0HwQjf3qXBSvrA@mail.gmail.com>
References: <CAPkN8xKTXJu2nhvocG8KuyO1XkJVfK_WsmY6dM=hWsVyg+BVyA@mail.gmail.com>
 <CACac1F9whLHJDxpCZGXpf6UXMYu-BHEd5zvT0HwQjf3qXBSvrA@mail.gmail.com>
Message-ID: <CAPkN8xKE1Q9kKj4kxBJV41ghUZB5G4skXcOzn2+-3HdQYgDqeg@mail.gmail.com>

On Wed, May 27, 2015 at 6:28 PM, Paul Moore <p.f.moore at gmail.com> wrote:
> On 26 May 2015 at 19:30, anatoly techtonik <techtonik at gmail.com> wrote:
>> In real world you have to deal with broken and invalid
>> output and UnicodeDecode crashes is not an option.
>> The unicode() constructor proposes two options to
>> deal with invalid output:
>>
>> 1. ignore  - meaning skip and corrupt the data
>> 2. replace  - just corrupt the data
>
> There are other error handlers, specifically surrogateescape is
> designed for this use. Only in Python 3.x admittedly, but this list is
> about future versions of Python, so that's what matters here.

Forwarded the message to python-list and now I have thread
schizophrenia. I read it as if python-list were also about Python 3 and
got really mad about that. I was a click away from sending myself
into the ban list again. =)


Ok. Closing the thread in python-ideas. This needs to be reopened
when the thread is about Python 4 (which should be all about
improving user experience and assessment of the results).

-- 
anatoly t.

From techtonik at gmail.com  Fri May 29 10:56:44 2015
From: techtonik at gmail.com (anatoly techtonik)
Date: Fri, 29 May 2015 11:56:44 +0300
Subject: [Python-ideas] Why decode()/encode() name is harmful
Message-ID: <CAPkN8x+YTevWBYhiA0Vb0hkD_vBOEeCz2QY6_dN0vr0xEkbF3w@mail.gmail.com>

First, let me start with The Curse of Knowledge
https://en.wikipedia.org/wiki/Curse_of_knowledge
which can be summarized as:

"Once you get something, it becomes hard
to think how it was to be without it".

I assume that all of you know the difference between
decode() and encode(), so you're cursed and
therefore think that getting it right is just a
matter of reading documentation, experience and
time. But quite a lot of time has passed and Python 2
is still there, while Python 3, which is all unicode
at the core (and which is great for people who
finally get it), is not as popular. So, remember that
you are biased towards (or against) the
decode/unicode perception.


Now imagine a person who has a text file. The
person needs to process it with Python. That
person is probably a journalist and doesn't know
anything that "any developer should know about
unicode". In Python 2 he just copy-pastes regular
expressions to match the letters and is happy. In
Python 3 he needs to *convert* that text to unicode.

Then he tries to read the documentation, and it
already starts to bring conflict to his mind. It tells
him to "decode" the text. I don't know about you,
but when I'm told to decode a text, I
assume that it is encrypted, because I watched a
few spy movies, including ones with Sherlock
Holmes and Stierlitz. But the text looks legit to me,
I can clearly see and read it, and now you say that
I need to decode it. You're basically ruining my
world right here. No wonder I will resist. I'm
probably stressed, have a lot of stuff to do, and you
are trying to load me with all those abstract
concepts that conflict with what I know. No way!
Unless I have a really strong motivation (or a
scientific background) there is no chance I will get
this stuff right on that day. I will probably
repeat the exercise and after a few tries will get
the output right, but there is no chance I will
remember this thing the next day. Because
rewiring neural paths in my brain is much harder
than paving them from scratch.
-- 
anatoly t.

From rosuav at gmail.com  Fri May 29 17:57:53 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Sat, 30 May 2015 01:57:53 +1000
Subject: [Python-ideas] Why decode()/encode() name is harmful
In-Reply-To: <CAPkN8x+YTevWBYhiA0Vb0hkD_vBOEeCz2QY6_dN0vr0xEkbF3w@mail.gmail.com>
References: <CAPkN8x+YTevWBYhiA0Vb0hkD_vBOEeCz2QY6_dN0vr0xEkbF3w@mail.gmail.com>
Message-ID: <CAPTjJmr0EyaJYjgto+TK2YEYEGAU-yKtSUZ9S=D2OLdjjK_t5A@mail.gmail.com>

On Fri, May 29, 2015 at 6:56 PM, anatoly techtonik <techtonik at gmail.com> wrote:
> Then he tries to read the documentation, it
> already starts to bring conflict to his mind. It says
> to him to "decode" the text. I don't know about you,
> but when I'm being told to decode the text, I
> assume that it is crypted, because I watched a
> few spy movies including ones with Sherlock
> Holmes and Stierlitz. But the text looks legit to me,
> I can clearly see and read it and now you say that
> I need to decode it.

This is because you fundamentally do not understand the difference
between bytes and text. Consequently, you are trying to shoehorn new
knowledge into your preconceived idea that the file *already contains
text*, which is not true.

Go read:
http://www.joelonsoftware.com/articles/Unicode.html
http://nedbatchelder.com/text/unipain.html

Also, why is this on python-ideas? Talk about this sort of thing on python-list.

ChrisA

From random832 at fastmail.us  Fri May 29 21:32:04 2015
From: random832 at fastmail.us (random832 at fastmail.us)
Date: Fri, 29 May 2015 15:32:04 -0400
Subject: [Python-ideas] Why decode()/encode() name is harmful
In-Reply-To: <CAPkN8x+YTevWBYhiA0Vb0hkD_vBOEeCz2QY6_dN0vr0xEkbF3w@mail.gmail.com>
References: <CAPkN8x+YTevWBYhiA0Vb0hkD_vBOEeCz2QY6_dN0vr0xEkbF3w@mail.gmail.com>
Message-ID: <1432927924.2536251.281727161.2C137F01@webmail.messagingengine.com>

On Fri, May 29, 2015, at 04:56, anatoly techtonik wrote:
> First, let me start with The Curse of Knowledge
> https://en.wikipedia.org/wiki/Curse_of_knowledge
> which can be summarized as:
> 
> "Once you get something, it becomes hard
> to think how it was to be without it".

Let's think about how it is to be without _the idea that text is a byte
stream in the first place_ - which some people here learned from Python
2, some learned from C, some may have learned from some other language.
It was the way things always were, after all, before Unicode came along.

The language I was using the most immediately before I started using
Python was C#. And C# uses Unicode (well, UTF-16, but the important
thing is that it's not an ASCII-compatible sequence of bytes) for
strings. One could argue that this paradigm - and the attendant "encode"
and "decode" concepts, and the stream wrappers that take care of it in
the common cases - is _the future_, and that one day nobody will learn
that text's natural form is a sequence of ASCII-compatible bytes... even
if text files continue to be encoded that way on disk.

> Now imaging a person who has a text file. The
> person need to process that with Python. That
> person is probably a journalist and doesn't know
> anything that "any developer should know about
> unicode". In Python 2 he just copy pastes regular
> expressions to match the letter and is happy. In
> Python 3 he needs to *convert* that text to unicode.

You don't have to do so explicitly, if the text file's encoding matches
your locale. You can just open the file and read it, and it will open as
a text-mode stream that takes care of this for you and returns unicode
strings. It's a text file, so you open it in text mode.

Even if it doesn't match your locale, the proper way is to pass an
"encoding" argument to the open function; not to go so deep as to open
it in binary mode and decode the bytes yourself.
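
A tiny illustration of that: with an explicit encoding argument, text
mode hands the program decoded str objects and no manual decode step
ever appears (the file name here is just a throwaway temp file):

```python
import os
import tempfile

# Write a small text file whose encoding may not match the locale.
path = os.path.join(tempfile.mkdtemp(), "notes.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("r\u00e9sum\u00e9\n")

# Text mode with an explicit encoding: the stream decodes for you,
# so the program only ever sees str objects, never raw bytes.
with open(path, encoding="utf-8") as f:
    line = f.readline()
```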

From graffatcolmingov at gmail.com  Fri May 29 21:47:25 2015
From: graffatcolmingov at gmail.com (Ian Cordasco)
Date: Fri, 29 May 2015 14:47:25 -0500
Subject: [Python-ideas] Why decode()/encode() name is harmful
In-Reply-To: <CAPkN8x+YTevWBYhiA0Vb0hkD_vBOEeCz2QY6_dN0vr0xEkbF3w@mail.gmail.com>
References: <CAPkN8x+YTevWBYhiA0Vb0hkD_vBOEeCz2QY6_dN0vr0xEkbF3w@mail.gmail.com>
Message-ID: <CAN-Kwu2n6brQDsaMLXvAkHCVPEidc5yg2SeUsubTqQ6go6U8mw@mail.gmail.com>

On Fri, May 29, 2015 at 3:56 AM, anatoly techtonik <techtonik at gmail.com> wrote:
> First, let me start with The Curse of Knowledge
> https://en.wikipedia.org/wiki/Curse_of_knowledge
> which can be summarized as:
>
> "Once you get something, it becomes hard
> to think how it was to be without it".
>
> I assume that all of you know difference between
> decode() and encode(), so you're cursed and
> therefore think that getting that right it is just a
> matter of reading documentation, experience and
> time. But quite a lot of had passed and Python 2
> is still there, and Python 3, which is all unicode
> at the core (and which is great for people who
> finally get it) is not as popular. So, remember that
> you are biased towards (or against)
> decode/unicode perception.
>
>
> Now imaging a person who has a text file. The
> person need to process that with Python. That
> person is probably a journalist and doesn't know
> anything that "any developer should know about
> unicode". In Python 2 he just copy pastes regular
> expressions to match the letter and is happy. In
> Python 3 he needs to *convert* that text to unicode.
>
> Then he tries to read the documentation, it
> already starts to bring conflict to his mind. It says
> to him to "decode" the text. I don't know about you,
> but when I'm being told to decode the text, I
> assume that it is crypted, because I watched a
> few spy movies including ones with Sherlock
> Holmes and Stierlitz. But the text looks legit to me,
> I can clearly see and read it and now you say that
> I need to decode it. You're basically ruining my
> world right here. No wonder that I will resist. I
> probably stressed, has a lot of stuff to do, and you
> are trying to load me with all those abstract
> concepts that conflict with what I know. No way!
> Unless I have a really strong motivation (or
> scientific background) there is no chance to get
> this stuff for me right on this day. I will probably
> repeat the exercise and after a few tries will get
> the output right, but there is no chance I will
> remember this thing on that day. Because
> rewiring neural paths in my brain is much harder
> that paving them from scratch.
> --
> anatoly t.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

So, ignoring your lack of suggestions for different names, would you
also argue that the codecs module (which is how people should be
handling this when dealing with files on disk) should also be renamed?
codecs is a portmanteau of coder-decoder and deals with converting
code points to bytes and back. codecs, "encoding", and "decoding" are
also used for non-text formats (e.g., files containing video or
audio). In all of the related contexts they have the same
meaning. I'm failing to understand your problem with the terminology.
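
To make that concrete: the same two verbs cover bytes-to-bytes
transforms in the stdlib codec machinery, not just text. A quick
sketch:

```python
import codecs

# A bytes-to-bytes codec registered alongside the text encodings:
# "encode" and "decode" are generic codec verbs, nothing text-specific.
encoded = codecs.encode(b"abc", "hex")
decoded = codecs.decode(encoded, "hex")

# For text, the same pair of verbs maps code points to bytes and back.
round_trip = "caf\u00e9".encode("utf-8").decode("utf-8")
```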

From abarnert at yahoo.com  Fri May 29 21:57:16 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Fri, 29 May 2015 12:57:16 -0700
Subject: [Python-ideas] Why decode()/encode() name is harmful
In-Reply-To: <CAPkN8x+YTevWBYhiA0Vb0hkD_vBOEeCz2QY6_dN0vr0xEkbF3w@mail.gmail.com>
References: <CAPkN8x+YTevWBYhiA0Vb0hkD_vBOEeCz2QY6_dN0vr0xEkbF3w@mail.gmail.com>
Message-ID: <659FCF6A-91F0-4D7D-A88E-28CD1D18EC38@yahoo.com>

On May 29, 2015, at 01:56, anatoly techtonik <techtonik at gmail.com> wrote:
> 
> First, let me start with The Curse of Knowledge
> https://en.wikipedia.org/wiki/Curse_of_knowledge
> which can be summarized as:
> 
> "Once you get something, it becomes hard
> to think how it was to be without it".
> 
> I assume that all of you know difference between
> decode() and encode(), so you're cursed and
> therefore think that getting that right it is just a
> matter of reading documentation, experience and
> time. But quite a lot of had passed and Python 2
> is still there, and Python 3, which is all unicode
> at the core (and which is great for people who
> finally get it) is not as popular. So, remember that
> you are biased towards (or against)
> decode/unicode perception.
> 
> 
> Now imaging a person who has a text file. The
> person need to process that with Python. That
> person is probably a journalist and doesn't know
> anything that "any developer should know about
> unicode". In Python 2 he just copy pastes regular
> expressions to match the letter and is happy. In
> Python 3 he needs to *convert* that text to unicode.

No he doesn't. In Python 3, unless he goes out of his way to open the file in binary mode, or use binary string literals for his regexps, that text is unicode from the moment his code sees it. So he doesn't have to read the docs. 

Python 3 was deliberately designed to make it easier to never have to use bytes internally, so 80% of the users never even have to think about bytes (even at the cost of sometimes making things harder for the more advanced coders who need to write the low-level stuff like network protocol handlers and can't avoid bytes).

Now, all those things _are_ still problems for people who use Python 2. But the only way to fix that is to get those people--and, even more importantly, new people--using Python 3. Which means not introducing any new radical inconsistencies between Python 2 and 3 (or 4) for no good reason--or, of course, between Python 3.5 and 3.6 (or 4.0).

> Then he tries to read the documentation, it
> already starts to bring conflict to his mind. It says
> to him to "decode" the text.

Where in the documentation does it ever tell you to decode text? If you're inventing fictitious documentation that would confuse people if it existed, you can just as well claim that the int method is confusing because it tells him he needs to truncate his integers even though integers are already truncated. Yes, that would be confusing--which is why the docs don't say that.

> I don't know about you,
> but when I'm being told to decode the text, I
> assume that it is crypted, because I watched a
> few spy movies including ones with Sherlock
> Holmes and Stierlitz.

If you open Shift-JIS text as if it were Latin-1 and see a mess of mojibake, it doesn't seem that surprising to be told that you need to decode it properly.

If you open UTF-8 text as if it were UTF-8, and Python has already decoded it for you under the covers, you never have to think about it, so there's no opportunity to be surprised.
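
A small sketch of both situations (the koi8_r/latin-1 pair here is just an
arbitrary example of a wrong-codec guess):

```python
import os
import tempfile

# Decoding with the wrong codec "works" but yields mojibake:
raw = "привет".encode("koi8_r")             # legacy Russian encoding
assert raw.decode("latin-1") != "привет"    # gibberish, but no exception
assert raw.decode("koi8_r") == "привет"     # the right codec recovers it

# In text mode, Python 3 decodes under the covers; your code only sees str:
with tempfile.NamedTemporaryFile("w", encoding="utf-8", delete=False) as f:
    f.write("привет")
    path = f.name
with open(path, encoding="utf-8") as f:
    assert f.read() == "привет"             # already str; no .decode() call
os.unlink(path)
```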

> But the text looks legit to me,
> I can clearly see and read it and now you say that
> I need to decode it. You're basically ruining my
> world right here. No wonder that I will resist. I
> probably stressed, has a lot of stuff to do, and you
> are trying to load me with all those abstract
> concepts that conflict with what I know. No way!
> Unless I have a really strong motivation (or
> scientific background) there is no chance to get
> this stuff for me right on this day. I will probably
> repeat the exercise and after a few tries will get
> the output right, but there is no chance I will
> remember this thing on that day.

That's a good point. That's exactly why you see people add random calls to str, unicode, encode, and decode to their Python 2 code until it seems to do the right thing on their one test input, and then freak out when it doesn't work on their second test input and go post a confused mess on StackOverflow or Python-list asking someone to solve it for them.

What's the solution? Make it as unlikely as possible that you'll run into the problem in the first place by nearly forcing you to deal in Unicode all the way through your script, and, when you do need to deal with manual encoding and decoding, make the almost-certainly-wrong nonsensical code impossible to write by not having bytes.encode or str.decode or automatic conversions between the two types. Of course that's a backward-incompatible change, and maybe a radical-enough one that it'll take half a decade for the ecosystem to catch up to the point where most users can benefit from it. Which makes it a good thing that Python started that process half a decade ago. So now, to anyone who runs into that confusion, there's an answer: just upgrade from 2.7 to 3.4, undo all the changes you introduced trying to solve this problem incorrectly, and your original code just works.
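
The type asymmetry is easy to demonstrate:

```python
# In Python 3 the conversion only goes one way from each type:
s = "text"
b = s.encode("utf-8")
assert not hasattr(b, "encode")   # bytes can only be decoded
assert not hasattr(s, "decode")   # str can only be encoded
assert b.decode("utf-8") == s     # and the round trip is explicit
```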

Even if you had a better solution than Python 3's (which I doubt, but let's assume you do), what good would that do? That would make the answer: wait 18 months for Python 3.6, then another 12 months for the last of the packages you depend on to finally adjust to the breaking incompatibility that 3.6 introduced, then undo all the changes you introduced trying to solve this problem incorrectly, then make different, more sensible, changes. That's clearly not a better answer.

So, unless you have a better solution than Python 3's and also have a time machine to go back to 2007, what could you possibly have to propose?

> Because
> rewiring neural paths in my brain is much harder
> that paving them from scratch.
> -- 
> anatoly t.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From steve at pearwood.info  Sat May 30 02:18:12 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 30 May 2015 10:18:12 +1000
Subject: [Python-ideas] Why decode()/encode() name is harmful
In-Reply-To: <659FCF6A-91F0-4D7D-A88E-28CD1D18EC38@yahoo.com>
References: <CAPkN8x+YTevWBYhiA0Vb0hkD_vBOEeCz2QY6_dN0vr0xEkbF3w@mail.gmail.com>
 <659FCF6A-91F0-4D7D-A88E-28CD1D18EC38@yahoo.com>
Message-ID: <20150530001811.GS932@ando.pearwood.info>

On Fri, May 29, 2015 at 12:57:16PM -0700, Andrew Barnert via Python-ideas wrote:

Before anyone else engages too deeply in this off-topic discussion, some 
background: Anatoly wrote to python-list asking for help dealing with a 
problem where he has a bunch of bytes (file names) which probably 
represent Russian text but in an unknown legacy encoding, and he wants 
to round-trip it from bytes to Unicode and back again losslessly.

(Russian is a particularly nasty example, because there are multiple 
mutually-incompatible Russian encodings in widespread use.)

As far as I can see, he has been given the solution, or at least a 
potential solution, on python-list, but as far as I can tell he either 
hasn't read it, or doesn't like the solutions offered and so is 
ignoring them.

So there's a real problem hidden here, buried beneath the dramatic 
presentation of imaginary journalists processing text, but I don't think 
it's a problem that needs discussing *here* (at least not unless 
somebody comes up with a concrete proposal or idea to be discussed).

A couple more comments follow:


> On May 29, 2015, at 01:56, anatoly techtonik <techtonik at gmail.com> wrote:

> > In Python 2 he just copy pastes regular
> > expressions to match the letter and is happy. In
> > Python 3 he needs to *convert* that text to unicode.
> 
> No he doesn't. In Python 3, unless he goes out of his way to open the 
> file in binary mode, or use binary string literals for his regexps, 
> that text is unicode from the moment his code sees it. So he doesn't 
> have to read the docs.

This is not the case when you have to deal with unknown encodings. And 
from the perspective of people who only have ASCII (or at worst, 
Latin-1) text, or who don't care about moji-bake, Python 2 appears 
easier to work with. To quote Chris Smith:

"I find it amusing when novice programmers believe their main job is
preventing programs from crashing. More experienced programmers realize
that correct code is great, code that crashes could use improvement, but
incorrect code that doesn't crash is a horrible nightmare."

Python 2's string handling is designed to minimize the chance of getting 
an exception when dealing with text in an unknown encoding, but the 
consequence is that it also minimizes the chance of it doing the right 
thing except by accident. In Python 2, you can give me a bunch of 
arbitrary bytes as a string, and I can read them as text, in a sort of 
ASCII-ish pseudo-encoding, regardless of how inappropriate it is or how 
much moji-bake it generates. But it won't raise an exception, which for 
some people is all that matters.

Moving to Unicode (in Python 2 or 3) can come as a shock to users who 
have never had to think about this before. Moji-bake is ubiquitous on 
the Internet, so there is a real problem to be solved. Python 2's string 
model is not the way to solve it. I don't think there is any 
"no-brainer" solution which doesn't involve thinking about bytes and 
encodings, but if Anatoly or anyone else wants to suggest one, we can 
discuss it.
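
For reference, the stdlib's lossless round-trip mechanism for bytes in an 
unknown encoding is the surrogateescape error handler (PEP 383); a minimal 
sketch:

```python
# Bytes in an unknown legacy encoding (these happen to be cp1251 "Привет"):
raw = b"\xcf\xf0\xe8\xe2\xe5\xf2"

# surrogateescape smuggles undecodable bytes through as lone surrogates...
text = raw.decode("ascii", errors="surrogateescape")

# ...so encoding with the same handler restores the original bytes exactly.
assert text.encode("ascii", errors="surrogateescape") == raw
```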

 
[...]
> Now, all those things _are_ still problems for people who use Python 
> 2. But the only way to fix that is to get those people--and, even more 
> importantly, new people--using Python 3. Which means not introducing 
> any new radical inconsistencies in between Python 2 and 3 (or 4) for 
> no good reason--or, of course, between Python 3.5 and 3.6 (or 4.0).

These same issues occur in Python 2 if you exclusively use unicode 
strings u"" instead of the default string type.

[...]
> So, unless you have a better solution than Python 3's and also have a 
> time machine to go back to 2007, what could you possibly have to 
> propose?

Surely you would have to go back to 1963 when the ASCII encoding first 
appeared, so we can skip over the whole mess of dozens of mutually 
incompatible "extended ASCII" code pages?



-- 
Steve

From tjreedy at udel.edu  Sat May 30 02:38:52 2015
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 29 May 2015 20:38:52 -0400
Subject: [Python-ideas] Why decode()/encode() name is harmful
In-Reply-To: <CAPkN8x+YTevWBYhiA0Vb0hkD_vBOEeCz2QY6_dN0vr0xEkbF3w@mail.gmail.com>
References: <CAPkN8x+YTevWBYhiA0Vb0hkD_vBOEeCz2QY6_dN0vr0xEkbF3w@mail.gmail.com>
Message-ID: <mkb0ri$us6$1@ger.gmane.org>

On 5/29/2015 4:56 AM, anatoly techtonik wrote:

This essay, which is mostly about the clash between python2 thinking and 
python3 thinking, is off topic for this list.  Please use python-list, 
which is open to any python-related topic.

-- 
Terry Jan Reedy


From wes.turner at gmail.com  Sat May 30 14:54:35 2015
From: wes.turner at gmail.com (Wes Turner)
Date: Sat, 30 May 2015 07:54:35 -0500
Subject: [Python-ideas] import features; if "print_function" in features.data
Message-ID: <CACfEFw8H_HLbA4xVPBaCU7P=3bZsBOQzKV0n4SLxQ3s06ayT4Q@mail.gmail.com>

Would it be useful to have one Python source file with an OrderedDict of
(API_feat_lbl, [(start, None)]) mappings
and a lookup?

* [ ] feat/version segments/rays map
* [ ] .lookup("print[_function]")

Syntax ideas:

* has("print[_function]")

Advantages

* More pythonic to check for features than capabilities
* Forward maintainability

Disadvantages:

*

Alternatives:

* six, nine, future
* try/import ENOENT
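
If I'm reading the proposal right, the mapping might look something like 
this sketch (the names, the `has` helper, and the version-tuple structure 
are all my guesses, not part of the proposal):

```python
from collections import OrderedDict

# label -> list of (version_added, version_removed_or_None) segments
FEATURES = OrderedDict([
    ("print_function", [((2, 6), None)]),
    ("unicode_literals", [((2, 6), None)]),
])

def has(label, version):
    """Return True if `label` is available in interpreter `version`."""
    return any(start <= version and (end is None or version < end)
               for start, end in FEATURES.get(label, []))

assert has("print_function", (3, 4))
assert not has("print_function", (2, 5))
```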
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150530/a9e09f58/attachment.html>

From wes.turner at gmail.com  Sat May 30 15:15:31 2015
From: wes.turner at gmail.com (Wes Turner)
Date: Sat, 30 May 2015 08:15:31 -0500
Subject: [Python-ideas] a segment tree of available features for the
	current/a given Python interpreter
Message-ID: <CACfEFw9StE=SZL5GuVJ89K97t+=vmDa6ki86QWshdhhERHe+rg@mail.gmail.com>

To reframe the problem (set the subject line), a segment tree of available
features for the current/a given Python interpreter would be useful.

* [ ] this could be e.g. 'features.py' and
* [ ] requested of (new) implementations (with historical data)
* [ ] very simple Python package (python.features ?)
On May 30, 2015 7:54 AM, "Wes Turner" <wes.turner at gmail.com> wrote:

> Would it be useful to have one Python source file with an OrderedDict of
> (API_feat_lbl, [(start, None)]) mappings
> and a lookup?
>
> * [ ] feat/version segments/rays map
> * [ ] .lookup("print[_function]")
>
> Syntax ideas:
>
> * has("print[_function]")
>
> Advantages
>
> * More pythonic to check for features than capabilities
> * Forward maintainability
>
> Disadvantages:
>
> *
>
> Alternatives:
>
> * six, nine, future
> * try/import ENOENT
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150530/9a58527c/attachment.html>

From ncoghlan at gmail.com  Sat May 30 15:25:45 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 30 May 2015 23:25:45 +1000
Subject: [Python-ideas] import features;
	if "print_function" in features.data
In-Reply-To: <CACfEFw8H_HLbA4xVPBaCU7P=3bZsBOQzKV0n4SLxQ3s06ayT4Q@mail.gmail.com>
References: <CACfEFw8H_HLbA4xVPBaCU7P=3bZsBOQzKV0n4SLxQ3s06ayT4Q@mail.gmail.com>
Message-ID: <CADiSq7dzSSuJSKOGt9too5_1RNm4NOFXks5MSuFcyTmmYNTAWA@mail.gmail.com>

On 30 May 2015 at 22:54, Wes Turner <wes.turner at gmail.com> wrote:
> Would it be useful to have one Python source file with an OrderedDict of
> (API_feat_lbl, [(start, None)]) mappings
> and a lookup?

Your choice of example means I'm not sure what additional capabilities
you're seeking.

The __future__ module already aims to cover this for compiler directives:

>>> import __future__
>>> __future__.all_feature_names
['nested_scopes', 'generators', 'division', 'absolute_import',
'with_statement', 'print_function', 'unicode_literals']
>>> __future__.print_function
_Feature((2, 6, 0, 'alpha', 2), (3, 0, 0, 'alpha', 0), 65536)

If you're looking for particular builtins, importing builtins (Python
3) or __builtin__ (Python 2) and checking attributes lets you see what
is available via hasattr().

hasattr() will also cover most feature check needs for other modules
(file descriptor support in the os module is an exception, hence the
related dedicated query APIs for that).
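
A sketch of those existing checks (`os.scandir` is just an arbitrary 
attribute to probe; it appeared in 3.5):

```python
import builtins
import os

# Builtins can be feature-checked by attribute:
assert hasattr(builtins, "print")

# The same hasattr() pattern covers most module-level features:
has_scandir = hasattr(os, "scandir")   # True on 3.5+

# fd support in os has dedicated query sets instead of hasattr():
assert isinstance(os.supports_dir_fd, set)
```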

Regards,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Sat May 30 15:39:16 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 30 May 2015 23:39:16 +1000
Subject: [Python-ideas] a segment tree of available features for the
 current/a given Python interpreter
In-Reply-To: <CACfEFw9StE=SZL5GuVJ89K97t+=vmDa6ki86QWshdhhERHe+rg@mail.gmail.com>
References: <CACfEFw9StE=SZL5GuVJ89K97t+=vmDa6ki86QWshdhhERHe+rg@mail.gmail.com>
Message-ID: <CADiSq7fG846LYJgvXFu4hjTxrqR5=E9w4gZbydn0damqHrfQeQ@mail.gmail.com>

On 30 May 2015 at 23:15, Wes Turner <wes.turner at gmail.com> wrote:
> To reframe the problem (set the subject line), a segment tree of available
> features for the current/a given Python interpreter would be useful.
>
> * [ ] this could be e.g. 'features.py' and
> * [ ] requested of (new) implementations (with historical data)
> * [ ] very simple Python package (python.features ?)

Now I'm even more convinced I'm not following you properly :)

Is it perhaps a request for a programmatically queryable version of
Ned Batchelder's "What's in which Python?" articles?

http://nedbatchelder.com/blog/201109/whats_in_which_python.html
http://nedbatchelder.com/blog/201310/whats_in_which_python_3.html

If yes, that seems like a reasonable idea, but would likely work
better as a community maintained PyPI module, rather than as a
standard library module. My rationale for that:

* older versions would need support for new feature checks to avoid
failing on the feature checker
* other implementations could contribute as needed to adjust feature
checks that were overly specific to CPython
* the community could collectively determine what "features" were
sufficiently interesting to be worth tracking through the relevant
projects issue tracker, rather than the core development team needing
to decide a priori which new features in each release end users are
going to want to conditionally adopt

If that interpretation of the question is incorrect, then you're going
to need to expand more on the problem you're hoping to solve with this
suggestion.

Regards,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ron3200 at gmail.com  Sat May 30 17:45:54 2015
From: ron3200 at gmail.com (Ron Adam)
Date: Sat, 30 May 2015 11:45:54 -0400
Subject: [Python-ideas] Explicitly shared objects with sub modules vs import
Message-ID: <mkclvj$390$1@ger.gmane.org>


While trying to debug a problem and thinking that it may be an issue with 
circular imports, I came up with an interesting idea that might be of 
value.  It wasn't a circular import problem in this case, but I may have 
found the actual bug sooner if I hadn't needed to be concerned about that 
possibility.

I have had some difficulty splitting larger modules into smaller modules in 
the past where if I split the code by functionality, it doesn't correspond 
with how the code is organized by dependency.  The result is an imported 
module needs to import the module it's imported into.  Which just doesn't 
feel right to me.

The solution I found was to call a function to explicitly set the shared 
items in the imported module.

(The example is from a language I'm experimenting with, written in Python. 
So don't be concerned about the shared object names in this case.)

In the main module...

     import parse
     parse.set_main(List=List,
                    Keyword=Keyword,
                    Name=Name,
                    String=String,
                    Express=Express,
                    keywords=keywords,
                    raise_with=raise_with,
                    nil=nil)

And in parse...

# Sets shared objects from main module.
from collections import namedtuple

def set_main(**d):
    global main
    main = namedtuple(__name__, d.keys())
    for k, v in d.items():
        setattr(main, k, v)



After this, the sub module accesses the parent module's objects with...

     main.Keyword

Just the same as if the parent module was imported as main, but it only 
shares what is intended to be shared within this specific imported module. 
  I think that is better than using "from ... import" in the sub module.  And 
an improvement over importing the whole module, which can possibly expose 
too much.

The benefits:

     * The shared items are explicitly set by the parent module.
     * If an item is missing, it results in a nice error message.
     * Accessing objects works the same as if import was used.
     * It avoids (most) circular import problems.
     * It's easier to think about once you understand what it does.


The problem is the submodule needs a function to make it work.  I think it 
would be nice if it could be made a builtin, but doing that may be tricky.

Where I've used "main", a builtin version could set the name of the shared 
parent module(s) automatically.

The name of the function probably should be "shared" or "sharing".  (Or 
some other thing that makes sense.)

I would like to hear what other here think, and of course if there are any 
obvious improvements that can be made.

Would this be a good candidate for a new builtin?


Cheers,
    Ron

From storchaka at gmail.com  Sat May 30 18:13:44 2015
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Sat, 30 May 2015 19:13:44 +0300
Subject: [Python-ideas] Explicitly shared objects with sub modules vs
	import
In-Reply-To: <mkclvj$390$1@ger.gmane.org>
References: <mkclvj$390$1@ger.gmane.org>
Message-ID: <mkcnjo$sif$1@ger.gmane.org>

On 30.05.15 18:45, Ron Adam wrote:
>
> While trying to debug a problem and thinking that it may be an issue
> with circular imports,  I come up with an interesting idea that might be
> of value.  It wasn't a circular import problem in this case, but I may
> have found the actual bug sooner if I didn't need to be concerned about
> that possibility.
>
> I have had some difficulty splitting larger modules into smaller modules
> in the past where if I split the code by functionality, it doesn't
> correspond with how the code is organized by dependency.  The result is
> an imported module needs to import the module it's imported into.  Which
> just doesn't feel right to me.
>
> The solution I found was to call a function to explicitly set the shared
> items in the imported module.

Why not move all shared objects in common module? Then in both main and 
parse module you can write

from common import *



From ron3200 at gmail.com  Sat May 30 18:51:59 2015
From: ron3200 at gmail.com (Ron Adam)
Date: Sat, 30 May 2015 12:51:59 -0400
Subject: [Python-ideas] Explicitly shared objects with sub modules vs
	import
In-Reply-To: <mkcnjo$sif$1@ger.gmane.org>
References: <mkclvj$390$1@ger.gmane.org> <mkcnjo$sif$1@ger.gmane.org>
Message-ID: <mkcprg$vim$1@ger.gmane.org>



On 05/30/2015 12:13 PM, Serhiy Storchaka wrote:
> On 30.05.15 18:45, Ron Adam wrote:
>>
>> While trying to debug a problem and thinking that it may be an issue
>> with circular imports,  I come up with an interesting idea that might be
>> of value.  It wasn't a circular import problem in this case, but I may
>> have found the actual bug sooner if I didn't need to be concerned about
>> that possibility.
>>
>> I have had some difficulty splitting larger modules into smaller modules
>> in the past where if I split the code by functionality, it doesn't
>> correspond with how the code is organized by dependency.  The result is
>> an imported module needs to import the module it's imported into.  Which
>> just doesn't feel right to me.
>>
>> The solution I found was to call a function to explicitly set the shared
>> items in the imported module.
>
> Why not move all shared objects in common module? Then in both main and
> parse module you can write
>
> from common import *


As I said, sometimes I prefer to organise things by function rather than 
dependency.

The point is this fits a somewhat different pattern than when you have 
independent common objects.  These can be inter-dependent shared objects 
that would require circular imports.  So common may need an "import 
__main__ as main" in order for the items that are imported with "import *" 
to work.

One argument might be that the organisation of the code is wrong if that is 
needed, or there may be a better way to organise it.  While that is a valid 
point, it may not be the only factor involved in deciding how to organise 
the code.

I also like to avoid "import *" except when importing very general and 
common utility functions.  ie.. "from math import *".

Cheers,
    Ron




From steve at pearwood.info  Sun May 31 01:32:26 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 31 May 2015 09:32:26 +1000
Subject: [Python-ideas] import features;
	if "print_function" in features.data
In-Reply-To: <CACfEFw8H_HLbA4xVPBaCU7P=3bZsBOQzKV0n4SLxQ3s06ayT4Q@mail.gmail.com>
References: <CACfEFw8H_HLbA4xVPBaCU7P=3bZsBOQzKV0n4SLxQ3s06ayT4Q@mail.gmail.com>
Message-ID: <20150530233220.GU932@ando.pearwood.info>

On Sat, May 30, 2015 at 07:54:35AM -0500, Wes Turner wrote:
> Would it be useful to have one Python source file with an OrderedDict of
> (API_feat_lbl, [(start, None)]) mappings
> and a lookup?

Why an OrderedDict?

This already exists for __future__ features:

py> import __future__
py> __future__.all_feature_names
['nested_scopes', 'generators', 'division', 'absolute_import', 
'with_statement', 'print_function', 'unicode_literals', 
'barry_as_FLUFL']



> * [ ] feat/version segments/rays map
> * [ ] .lookup("print[_function]")

I don't know what this means.



> Syntax ideas:
> 
> * has("print[_function]")

Why does it need a new function instead of just this?

    "print" in featureset


> Advantages
> 
> * More pythonic to check for features than capabilities

I think that is wrong. I think that Look Before You Leap is generally 
considered *less* Pythonic.



> * Forward maintainability

Not when it comes to syntax changes. You can't write:

if has("print"):
    print "Hello world"
else:
    print("Hello world")

because *it won't compile* if print_function is in effect.

For non-syntax changes, it's not backwards compatible:

if not has("enumerate takes a start argument"):
    import builtins
    def enumerate(values, start):
        for i, x in builtins.enumerate(values):
            yield i+start, x

doesn't work for anything older than 3.6 (at the earliest). It's better 
to check for the feature directly, which always works:

try:
    enumerate([], 1)
except TypeError:
    ...

> Disadvantages:
> 
> *

* It's ugly, especially for small changes to features, such as when a 
function started to accept an optional argument.

* It requires more work: you have to duplicate the information 
about every feature in at least three places, not just two (the code 
itself, the documentation, plus the "features" database).

* It's hard to use.

* Bootstrapping problem: how do you check for the "has" feature 
itself?

     if has("has"): ... # obviously cannot work

* Doesn't help with writing hybrid 2+3 code, as it doesn't exist in 2.



> Alternatives:
> 
> * six, nine, future
> * try/import ENOENT

I don't understand this.




-- 
Steve

From aquavitae69 at gmail.com  Sun May 31 09:16:57 2015
From: aquavitae69 at gmail.com (David Townshend)
Date: Sun, 31 May 2015 09:16:57 +0200
Subject: [Python-ideas] npm-style venv-aware launcher
Message-ID: <CAEgL-fePfTqFs6StDJZFbPVmOS=hUa_XSu_CFOPLr_uE9keGZQ@mail.gmail.com>

Pip and venv have done a lot to improve the accessibility and ease of
installing python packages, but I believe there is still a lot of room for
improvement.  I only realised how cumbersome I find working with python
packages when I recently spent a lot of time on a javascript project using
npm.  A bit of googling and I found several articles discussing pip, venv
and npm, and all of them seemed to say the same thing, i.e. pip/venv could
learn a lot from npm.

My proposal revolves around two issues:

   1. Setting up and working with virtual environments can be onerous.
   Creating one is easy enough, but using them means remembering to run
   `source activate` every time, which also means remembering which venv is
   used for which project.  Not a major issue, but still an annoyance.
   2. Managing lists of required packages is not nearly as easy as in npm
   since there is no equivalent to `npm install --save ...`.  The best that
   pip offers is `pip freeze`.  However, using that is a) an extra step to
   remember and b) includes all implied dependencies, which is not ideal.

My proposal is to use a similar model to npm, where each project has a
`venvrc` file which lets python-related tools know which environment to
use.  In order to showcase the sort of functionality I'm proposing, I've
created a basic example on github (https://github.com/aquavitae/pyle).
This is currently py3.4 on linux only and very pre-alpha.  Once I've added
a few more features that I have in mind (e.g. multiple venvs) I'll add it
to pypi and if there is sufficient interest I'd be happy to write up a PEP
for getting it into the stdlib.
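
For the sake of discussion, a `venvrc` might look something like this (the 
format here is purely hypothetical; the pyle prototype linked above defines 
the actual one):

```ini
# venvrc (hypothetical sketch)
[env]
venv = ~/.venvs/myproject
python = 3.4
```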

Does this seem like the sort of tool that would be useful in the stdlib?

Regards

David
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150531/bbfe50b0/attachment.html>

From abarnert at yahoo.com  Sun May 31 09:35:56 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sun, 31 May 2015 00:35:56 -0700
Subject: [Python-ideas] npm-style venv-aware launcher
In-Reply-To: <CAEgL-fePfTqFs6StDJZFbPVmOS=hUa_XSu_CFOPLr_uE9keGZQ@mail.gmail.com>
References: <CAEgL-fePfTqFs6StDJZFbPVmOS=hUa_XSu_CFOPLr_uE9keGZQ@mail.gmail.com>
Message-ID: <1136321E-6C0F-4B7A-B6E5-8E60917EEDAC@yahoo.com>

On May 31, 2015, at 00:16, David Townshend <aquavitae69 at gmail.com> wrote:
> 
> Pip and venv have done a lot to improve the accessibility and ease of installing python packages, but I believe there is still a lot of room for improvement.  I only realised how cumbersome I find working with python packages when I recently spent a lot of time on a javascript project using npm.  A bit of googling and I found several articles discussing pip, venv and npm, and all of them seemed to say the same thing, i.e. pip/venv could learn a lot from npm.
> 
> My proposal revolves around two issues:
> Setting up and working with virtual environments can be onerous.  Creating one is easy enough, but using them means remembering to run `source activate` every time, which also means remembering which venv is used for which project.  Not a major issue, but still and annoyance.
If you're not using virtualenvwrapper.

You do have to get used to using workon instead of cd to switch between environments--although if you want to, there's a hook you can alias cd to (virtualenvwrapperhelper). And I haven't tried either the native Windows cmd or PowerShell ports (it works great with MSYS bash, but I realize not everyone on Windows wants to pretend they're not on Windows). And managing multiple environments with different Python versions (at least different versions of 2.x or different versions of 3.x) could be nicer.

But I think it does 90% of what you're looking for, and I think it might be easier to add the other 10% to virtualenvwrapper than to start from scratch. And it works with 2.6-3.3 as well as 3.4+ (with virtualenv instead of venv, of course), on most platforms, with multiple environments, with tab completion (at least in bash and zsh), etc.
> Managing lists of required packages is not nearly as easy as in npm since these is no equivalent to `npm install --save ...`.  The best that pip offers is `pip freeze`.  Howevere, using that is a) an extra step to remember and b) includes all implied dependencies which is not ideal.
> My proposal is to use a similar model to npm, where each project has a `venvrc` file which lets python-related tools know which environment to use.  In order to showcase the sort of funcionality I'm proposing, I've created a basic example on github (https://github.com/aquavitae/pyle).  This is currently py3.4 on linux only and very pre-alpha.  Once I've added a few more features that I have in mind (e.g. multiple venvs) I'll add it to pypi and if there is sufficient interest I'd be happy to write up a PEP for getting it into the stdlib.
> 
> Does this seem like the sort of tool that would be useful in the stdlib?
> 
> Regards
> 
> David
> 
> 
> 
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150531/1c6ab8f9/attachment-0001.html>

From aquavitae69 at gmail.com  Sun May 31 10:01:21 2015
From: aquavitae69 at gmail.com (David Townshend)
Date: Sun, 31 May 2015 10:01:21 +0200
Subject: [Python-ideas] npm-style venv-aware launcher
In-Reply-To: <1136321E-6C0F-4B7A-B6E5-8E60917EEDAC@yahoo.com>
References: <CAEgL-fePfTqFs6StDJZFbPVmOS=hUa_XSu_CFOPLr_uE9keGZQ@mail.gmail.com>
 <1136321E-6C0F-4B7A-B6E5-8E60917EEDAC@yahoo.com>
Message-ID: <CAEgL-feaq3f6kGb_J_U2U36=d3dME4jyKYUv9i1FwxqQieCbng@mail.gmail.com>

On Sun, May 31, 2015 at 9:35 AM, Andrew Barnert <abarnert at yahoo.com> wrote:

> On May 31, 2015, at 00:16, David Townshend <aquavitae69 at gmail.com> wrote:
>
> Pip and venv have done a lot to improve the accessibility and ease of
> installing python packages, but I believe there is still a lot of room for
> improvement.  I only realised how cumbersome I find working with python
> packages when I recently spent a lot of time on a javascript project using
> npm.  A bit of googling and I found several articles discussing pip, venv
> and npm, and all of them seemed to say the same thing, i.e. pip/venv could
> learn a lot from npm.
>
> My proposal revolves around two issues:
>
>    1. Setting up and working with virtual environments can be onerous.
>    Creating one is easy enough, but using them means remembering to run
>    `source activate` every time, which also means remembering which venv is
>    used for which project.  Not a major issue, but still and annoyance.
>
> If you're not using virtualenvwrapper.
>
> You do have to get used to using workon instead of cd to switch between
> environments--although if you want to, there's a hook you can alias cd to
> (virtualenvwrapperhelper). And I haven't tried either the native Windows
> cmd or PowerShell ports (it works great with MSYS
> bash, but I realize not everyone on Windows wants to pretend they're not on
> Windows). And managing multiple environments with different Python versions
> (at least different versions of 2.x or different versions of 3.x) could be
> nicer.
>
> But I think it does 90% of what you're looking for, and I think it might
> be easier to add the other 10% to virtualenvwrapper than to start from
> scratch. And it works with 2.6-3.3 as well as 3.4+ (with virtualenv instead
> of venv, of course), on most platforms, with multiple environments, with
> tab completion (at least in bash and zsh), etc.
>

Virtualenvwrapper does help a bit, but nowhere near 90%.  It doesn't touch
any of the issues with pip, it still requires configuration and manually
ensuring that the venv is activated.  But the biggest issue with extending
it is that it has a totally different workflow philosophy in that it
enforces a separation between the venv and the project, whereas my proposal
involves more integration of the two.  I have used virtualenvwrapper quite
a bit in the past, but in the end I've always found it easier to just work
with venv because of the lack of flexibility in where and how I store the
venvs.

>
>    1. Managing lists of required packages is not nearly as easy as in npm
>    since there is no equivalent to `npm install --save ...`.  The best that
>    pip offers is `pip freeze`.  However, using that is a) an extra step to
>    remember and b) includes all implied dependencies which is not ideal.
>
> My proposal is to use a similar model to npm, where each project has a
> `venvrc` file which lets python-related tools know which environment to
> use.  In order to showcase the sort of functionality I'm proposing, I've
> created a basic example on github (https://github.com/aquavitae/pyle).
> This is currently py3.4 on linux only and very pre-alpha.  Once I've added
> a few more features that I have in mind (e.g. multiple venvs) I'll add it
> to pypi and if there is sufficient interest I'd be happy to write up a PEP
> for getting it into the stdlib.
>
> Does this seem like the sort of tool that would be useful in the stdlib?
>
> Regards
>
> David
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150531/a1278d1c/attachment.html>

From abarnert at yahoo.com  Sun May 31 10:41:42 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sun, 31 May 2015 01:41:42 -0700
Subject: [Python-ideas] npm-style venv-aware launcher
In-Reply-To: <CAEgL-feaq3f6kGb_J_U2U36=d3dME4jyKYUv9i1FwxqQieCbng@mail.gmail.com>
References: <CAEgL-fePfTqFs6StDJZFbPVmOS=hUa_XSu_CFOPLr_uE9keGZQ@mail.gmail.com>
 <1136321E-6C0F-4B7A-B6E5-8E60917EEDAC@yahoo.com>
 <CAEgL-feaq3f6kGb_J_U2U36=d3dME4jyKYUv9i1FwxqQieCbng@mail.gmail.com>
Message-ID: <06BB7C80-33A5-4339-A908-522918C1F1B5@yahoo.com>

On May 31, 2015, at 01:01, David Townshend <aquavitae69 at gmail.com> wrote:
> 
> 
>> On Sun, May 31, 2015 at 9:35 AM, Andrew Barnert <abarnert at yahoo.com> wrote:
>>> On May 31, 2015, at 00:16, David Townshend <aquavitae69 at gmail.com> wrote:
>>> 
>>> Pip and venv have done a lot to improve the accessibility and ease of installing python packages, but I believe there is still a lot of room for improvement.  I only realised how cumbersome I find working with python packages when I recently spent a lot of time on a javascript project using npm.  A bit of googling and I found several articles discussing pip, venv and npm, and all of them seemed to say the same thing, i.e. pip/venv could learn a lot from npm.
>>> 
>>> My proposal revolves around two issues:
>>> Setting up and working with virtual environments can be onerous.  Creating one is easy enough, but using them means remembering to run `source activate` every time, which also means remembering which venv is used for which project.  Not a major issue, but still an annoyance.
>> 
>> If you're not using virtualenvwrapper.
>> 
>> You do have to get used to using workon instead of cd to switch between environments--although if you want to, there's a hook you can alias cd to (virtualenvwrapperhelper). And I haven't tried either the native Windows cmd or PowerShell ports (it works great with MSYS bash, but I realize not everyone on Windows wants to pretend they're not on Windows). And managing multiple environments with different Python versions (at least different versions of 2.x or different versions of 3.x) could be nicer.
>> 
>> But I think it does 90% of what you're looking for, and I think it might be easier to add the other 10% to virtualenvwrapper than to start from scratch. And it works with 2.6-3.3 as well as 3.4+ (with virtualenv instead of venv, of course), on most platforms, with multiple environments, with tab completion (at least in bash and zsh), etc.
> 
> Virtualenvwrapper does help a bit, but nowhere near 90%.  It doesn't touch any of the issues with pip, it still requires configuration and manually ensuring that the venv is activated. 

As I already mentioned, if you use virtualenvwrapperhelper or autoenv, you don't need to manually ensure that the venv is activated. I personally use it by having workon cd into the directory for me instead of vice-versa, but if you like vice-versa, you can do it that way, so every time you cd into a directory with a venv in, it activates.

> But the biggest issue with extending it is that it has a totally different workflow philosophy in that it enforces a separation between the venv and the project,

I don't understand what you mean. I have a one-to-one mapping between venvs and projects (although you _can_ have multiple projects using the same venv, that isn't the simplest way to use it), and I have everything checked into git together, and I didn't have to do anything complicated to get there.

> whereas my proposal involves more integration of the two.  I have used virtualenvwrapper quite a bit in the past, but in the end I've always found it easier to just work with venv because of the lack of flexibility in where and how I store the venvs.  

The default for npm is that your package dir is attached directly to the project. You can get more flexibility by setting an environment variable or creating a symlink, but normally you don't. It has about the same flexibility as virtualenvwrapper, with about the same amount of effort. So if virtualenvwrapper isn't flexible enough for you, my guess is that your take on npm won't be flexible enough either, it'll just come preconfigured for your own idiosyncratic use and everyone else will have to adjust...
>>> Managing lists of required packages is not nearly as easy as in npm since there is no equivalent to `npm install --save ...`.  The best that pip offers is `pip freeze`.  However, using that is a) an extra step to remember and b) includes all implied dependencies which is not ideal.
>>> My proposal is to use a similar model to npm, where each project has a `venvrc` file which lets python-related tools know which environment to use.  In order to showcase the sort of functionality I'm proposing, I've created a basic example on github (https://github.com/aquavitae/pyle).  This is currently py3.4 on linux only and very pre-alpha.  Once I've added a few more features that I have in mind (e.g. multiple venvs) I'll add it to pypi and if there is sufficient interest I'd be happy to write up a PEP for getting it into the stdlib.
>>> 
>>> Does this seem like the sort of tool that would be useful in the stdlib?
>>> 
>>> Regards
>>> 
>>> David
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> Python-ideas mailing list
>>> Python-ideas at python.org
>>> https://mail.python.org/mailman/listinfo/python-ideas
>>> Code of Conduct: http://python.org/psf/codeofconduct/
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150531/3617513e/attachment-0001.html>

From p.andrefreitas at gmail.com  Sun May 31 12:36:33 2015
From: p.andrefreitas at gmail.com (=?UTF-8?Q?Andr=C3=A9_Freitas?=)
Date: Sun, 31 May 2015 10:36:33 +0000
Subject: [Python-ideas] npm-style venv-aware launcher
In-Reply-To: <06BB7C80-33A5-4339-A908-522918C1F1B5@yahoo.com>
References: <CAEgL-fePfTqFs6StDJZFbPVmOS=hUa_XSu_CFOPLr_uE9keGZQ@mail.gmail.com>
 <1136321E-6C0F-4B7A-B6E5-8E60917EEDAC@yahoo.com>
 <CAEgL-feaq3f6kGb_J_U2U36=d3dME4jyKYUv9i1FwxqQieCbng@mail.gmail.com>
 <06BB7C80-33A5-4339-A908-522918C1F1B5@yahoo.com>
Message-ID: <CAMkX=YUNQS7zMuV=1A4J==zeNvJ==AFdybAa5Ev7Z_=FpwdJSw@mail.gmail.com>

+1 for this idea David.

I am using requirements.txt for managing dependencies, but the NPM approach
is simpler than doing pip freeze, inspecting which requirements we really
use, and setting up a virtualenv. If you need help with the PEP writing I
can help you.



On Sun, 31 May 2015 at 09:45, Andrew Barnert via Python-ideas <
python-ideas at python.org> wrote:

> On May 31, 2015, at 01:01, David Townshend <aquavitae69 at gmail.com> wrote:
>
>
> On Sun, May 31, 2015 at 9:35 AM, Andrew Barnert <abarnert at yahoo.com>
> wrote:
>
>> On May 31, 2015, at 00:16, David Townshend <aquavitae69 at gmail.com> wrote:
>>
>> Pip and venv have done a lot to improve the accessibility and ease of
>> installing python packages, but I believe there is still a lot of room for
>> improvement.  I only realised how cumbersome I find working with python
>> packages when I recently spent a lot of time on a javascript project using
>> npm.  A bit of googling and I found several articles discussing pip, venv
>> and npm, and all of them seemed to say the same thing, i.e. pip/venv could
>> learn a lot from npm.
>>
>> My proposal revolves around two issues:
>>
>>    1. Setting up and working with virtual environments can be onerous.
>>    Creating one is easy enough, but using them means remembering to run
>>    `source activate` every time, which also means remembering which venv is
>>    used for which project.  Not a major issue, but still an annoyance.
>>
>> If you're not using virtualenvwrapper.
>>
>> You do have to get used to using workon instead of cd to switch between
>> environments--although if you want to, there's a hook you can alias cd to
>> (virtualenvwrapperhelper). And I haven't tried either the native Windows
>> cmd or PowerShell ports (it works great with MSYS
>> bash, but I realize not everyone on Windows wants to pretend they're not on
>> Windows). And managing multiple environments with different Python versions
>> (at least different versions of 2.x or different versions of 3.x) could be
>> nicer.
>>
>> But I think it does 90% of what you're looking for, and I think it might
>> be easier to add the other 10% to virtualenvwrapper than to start from
>> scratch. And it works with 2.6-3.3 as well as 3.4+ (with virtualenv instead
>> of venv, of course), on most platforms, with multiple environments, with
>> tab completion (at least in bash and zsh), etc.
>>
>
> Virtualenvwrapper does help a bit, but nowhere near 90%.  It doesn't touch
> any of the issues with pip, it still requires configuration and manually
> ensuring that the venv is activated.
>
>
> As I already mentioned, if you use virtualenvwrapperhelper or autoenv, you
> don't need to manually ensure that the venv is activated. I personally use
> it by having workon cd into the directory for me instead of vice-versa, but
> if you like vice-versa, you can do it that way, so every time you cd into a
> directory with a venv in, it activates.
>
> But the biggest issue with extending it is that it has a totally different
> workflow philosophy in that it enforces a separation between the venv and
> the project,
>
>
> I don't understand what you mean. I have a one-to-one mapping between
> venvs and projects (although you _can_ have multiple projects using the
> same venv, that isn't the simplest way to use it), and I have everything
> checked into git together, and I didn't have to do anything complicated to
> get there.
>
> whereas my proposal involves more integration of the two.  I have used
> virtualenvwrapper quite a bit in the past, but in the end I've always found
> it easier to just work with venv because of the lack of flexibility in
> where and how I store the venvs.
>
>
> The default for npm is that your package dir is attached directly to the
> project. You can get more flexibility by setting an environment variable or
> creating a symlink, but normally you don't. It has about the same
> flexibility as virtualenvwrapper, with about the same amount of effort. So
> if virtualenvwrapper isn't flexible enough for you, my guess is that your
> take on npm won't be flexible enough either, it'll just come preconfigured
> for your own idiosyncratic use and everyone else will have to adjust...
>
>
>>    1. Managing lists of required packages is not nearly as easy as in
>>    npm since there is no equivalent to `npm install --save ...`.  The best
>>    that pip offers is `pip freeze`.  However, using that is a) an extra step
>>    to remember and b) includes all implied dependencies which is not ideal.
>>
>> My proposal is to use a similar model to npm, where each project has a
>> `venvrc` file which lets python-related tools know which environment to
>> use.  In order to showcase the sort of functionality I'm proposing, I've
>> created a basic example on github (https://github.com/aquavitae/pyle).
>> This is currently py3.4 on linux only and very pre-alpha.  Once I've added
>> a few more features that I have in mind (e.g. multiple venvs) I'll add it
>> to pypi and if there is sufficient interest I'd be happy to write up a PEP
>> for getting it into the stdlib.
>>
>> Does this seem like the sort of tool that would be useful in the stdlib?
>>
>> Regards
>>
>> David
>>
>>
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150531/b07df06b/attachment.html>

From steve at pearwood.info  Sun May 31 13:32:25 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 31 May 2015 21:32:25 +1000
Subject: [Python-ideas] npm-style venv-aware launcher
In-Reply-To: <CAEgL-fePfTqFs6StDJZFbPVmOS=hUa_XSu_CFOPLr_uE9keGZQ@mail.gmail.com>
References: <CAEgL-fePfTqFs6StDJZFbPVmOS=hUa_XSu_CFOPLr_uE9keGZQ@mail.gmail.com>
Message-ID: <20150531113225.GW932@ando.pearwood.info>

On Sun, May 31, 2015 at 09:16:57AM +0200, David Townshend wrote:
> Pip and venv have done a lot to improve the accessibility and ease of
> installing python packages, but I believe there is still a lot of room for
> improvement.  I only realised how cumbersome I find working with python
> packages when I recently spent a lot of time on a javascript project using
> npm.  A bit of googling and I found several articles discussing pip, venv
> and npm, and all of them seemed to say the same thing, i.e. pip/venv could
> learn a lot from npm.
> 
> My proposal revolves around two issues:
[...]


I don't think this is the right place to discuss either of those ideas. 
pip is not part of either the Python language or the standard library 
(apart from the very narrow sense that the most recent versions of 
Python include a tool to bootstrap pip). I think you should submit them 
on whatever forum pip uses to discuss feature suggestions.


-- 
Steve

From tritium-list at sdamon.com  Sun May 31 13:34:14 2015
From: tritium-list at sdamon.com (Alexander Walters)
Date: Sun, 31 May 2015 07:34:14 -0400
Subject: [Python-ideas] npm-style venv-aware launcher
In-Reply-To: <CAEgL-fePfTqFs6StDJZFbPVmOS=hUa_XSu_CFOPLr_uE9keGZQ@mail.gmail.com>
References: <CAEgL-fePfTqFs6StDJZFbPVmOS=hUa_XSu_CFOPLr_uE9keGZQ@mail.gmail.com>
Message-ID: <556AF1B6.6000808@sdamon.com>

You might want to shoot this over to the distutils-sig mailing list.

On 5/31/2015 03:16, David Townshend wrote:
> Pip and venv have done a lot to improve the accessibility and ease of 
> installing python packages, but I believe there is still a lot of room 
> for improvement.  I only realised how cumbersome I find working with 
> python packages when I recently spent a lot of time on a javascript 
> project using npm.  A bit of googling and I found several articles 
> discussing pip, venv and npm, and all of them seemed to say the same 
> thing, i.e. pip/venv could learn a lot from npm.
>
> My proposal revolves around two issues:
>
>  1. Setting up and working with virtual environments can be onerous. 
>     Creating one is easy enough, but using them means remembering to
>     run `source activate` every time, which also means remembering
>     which venv is used for which project.  Not a major issue, but
>     still an annoyance.
>  2. Managing lists of required packages is not nearly as easy as in
>     npm since there is no equivalent to `npm install --save ...`.  The
>     best that pip offers is `pip freeze`. However, using that is a)
>     an extra step to remember and b) includes all implied dependencies
>     which is not ideal.
>
> My proposal is to use a similar model to npm, where each project has a 
> `venvrc` file which lets python-related tools know which environment 
> to use.  In order to showcase the sort of functionality I'm proposing, 
> I've created a basic example on github 
> (https://github.com/aquavitae/pyle). This is currently py3.4 on linux 
> only and very pre-alpha. Once I've added a few more features that I 
> have in mind (e.g. multiple venvs) I'll add it to pypi and if there is 
> sufficient interest I'd be happy to write up a PEP for getting it into 
> the stdlib.
>
> Does this seem like the sort of tool that would be useful in the stdlib?
>
> Regards
>
> David
>
>
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150531/21325b7f/attachment-0001.html>

From stephen at xemacs.org  Sun May 31 15:10:54 2015
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sun, 31 May 2015 22:10:54 +0900
Subject: [Python-ideas] npm-style venv-aware launcher
In-Reply-To: <20150531113225.GW932@ando.pearwood.info>
References: <CAEgL-fePfTqFs6StDJZFbPVmOS=hUa_XSu_CFOPLr_uE9keGZQ@mail.gmail.com>
 <20150531113225.GW932@ando.pearwood.info>
Message-ID: <87pp5guc2p.fsf@uwakimon.sk.tsukuba.ac.jp>

Steven D'Aprano writes:

 > I don't think this is the right place to discuss either of those ideas. 

I think you're missing the point -- this is part of the larger
discussion on packaging, as Alexander recognized ("shoot this over to
distutils-sig", he said).  While technically it may belong elsewhere
(distutils, for example), the amount of attention it's attracting from
core committers right now suggests that it's a real pain point, and
should get discussion from the wider community while requirements are
still unclear.

While I'm not one for suggesting that TOOWTDI is obvious in advance
(and not even if you're Dutch), surely it's worth narrowing down the
field by looking at a lot of ideas.


From ncoghlan at gmail.com  Sun May 31 17:04:07 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 1 Jun 2015 01:04:07 +1000
Subject: [Python-ideas] npm-style venv-aware launcher
In-Reply-To: <87pp5guc2p.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <CAEgL-fePfTqFs6StDJZFbPVmOS=hUa_XSu_CFOPLr_uE9keGZQ@mail.gmail.com>
 <20150531113225.GW932@ando.pearwood.info>
 <87pp5guc2p.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <CADiSq7dQA-XikhwhJ+D_ff6L2QARw+bxOqOUS_Vvr=iXJTFpbw@mail.gmail.com>

On 31 May 2015 at 23:10, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Steven D'Aprano writes:
>
>  > I don't think this is the right place to discuss either of those ideas.
>
> I think you're missing the point -- this is part of the larger
> discussion on packaging, as Alexander recognized ("shoot this over to
> distutils-sig", he said).  While technically it may belong elsewhere
> (distutils, for example), the amount of attention it's attracting from
> core committers right now suggests that it's a real pain point, and
> should get discussion from the wider community while requirements are
> still unclear.
>
> While I'm not one for suggesting that TOOWTDI is obvious in advance
> (and not even if you're Dutch), surely it's worth narrowing down the
> field by looking at a lot of ideas.

There are a plethora of environment management options out there, and
https://github.com/pypa/python-packaging-user-guide/issues/118
discusses some of them (focusing specifically on the ad hoc
environment management side of things rather than VCS linked
environment management, though).

The npm model in particular unfortunately gets a lot of its
"simplicity" by isolating all the dependencies from each other during
component development (including freely permitting duplicates and even
different versions of the same component), so you get the excitement
of live integration at runtime instead of rationalising your
dependency set as part of your design and development process (see
https://speakerdeck.com/nzpug/francois-marier-external-dependencies-in-web-apps-system-libs-are-not-that-scary?slide=9
). As developers, we can make our lives *very* easy if we're happy to
discount the interests of other folks that are actually tasked with
deploying and maintaining our code (either an operations team if we
have one, or at the very least future maintainers if we don't).

So while there are still useful user experience lessons to be learned
from npm, they require careful filtering to ensure they actually *are*
a simplification of the overall user experience, rather than cases
where the designers of the system have made things easier for
developers working on the project itself at the expense of making them
harder for operators and end users that just want to install it
(potentially as part of a larger integrated system).

Cheers,
Nick.

P.S. I've unfortunately never found the time to write up my own
packaging system research properly, but
https://bitbucket.org/ncoghlan/misc/src/default/talks/2013-07-pyconau/packaging/brispy-talk.md
has some rough notes from a couple of years ago, while
https://fedoraproject.org/wiki/Env_and_Stacks/Projects/UserLevelPackageManagement
looks at the general problem space from an operating system developer
experience design perspective.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From donald at stufft.io  Sun May 31 17:17:46 2015
From: donald at stufft.io (Donald Stufft)
Date: Sun, 31 May 2015 11:17:46 -0400
Subject: [Python-ideas] npm-style venv-aware launcher
In-Reply-To: <CADiSq7dQA-XikhwhJ+D_ff6L2QARw+bxOqOUS_Vvr=iXJTFpbw@mail.gmail.com>
References: <CAEgL-fePfTqFs6StDJZFbPVmOS=hUa_XSu_CFOPLr_uE9keGZQ@mail.gmail.com>
 <20150531113225.GW932@ando.pearwood.info>
 <87pp5guc2p.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CADiSq7dQA-XikhwhJ+D_ff6L2QARw+bxOqOUS_Vvr=iXJTFpbw@mail.gmail.com>
Message-ID: <etPan.556b261a.4fc85550.18516@Draupnir.home>



On May 31, 2015 at 11:05:24 AM, Nick Coghlan (ncoghlan at gmail.com) wrote:
> On 31 May 2015 at 23:10, Stephen J. Turnbull wrote:
> > Steven D'Aprano writes:
> >
> > > I don't think this is the right place to discuss either of those ideas.
> >
> > I think you're missing the point -- this is part of the larger
> > discussion on packaging, as Alexander recognized ("shoot this over to
> > distutils-sig", he said). While technically it may belong elsewhere
> > (distutils, for example), the amount of attention it's attracting from
> > core committers right now suggests that it's a real pain point, and
> > should get discussion from the wider community while requirements are
> > still unclear.
> >
> > While I'm not one for suggesting that TOOWTDI is obvious in advance
> > (and not even if you're Dutch), surely it's worth narrowing down the
> > field by looking at a lot of ideas.
>  
> There are a plethora of environment management options out there, and
> https://github.com/pypa/python-packaging-user-guide/issues/118
> discusses some of them (focusing specifically on the ad hoc
> environment management side of things rather than VCS linked
> environment management, though).
>  
> The npm model in particular unfortunately gets a lot of its
> "simplicity" by isolating all the dependencies from each other during
> component development (including freely permitting duplicates and even
> different versions of the same component), so you get the excitement
> of live integration at runtime instead of rationalising your
> dependency set as part of your design and development process (see
> https://speakerdeck.com/nzpug/francois-marier-external-dependencies-in-web-apps-system-libs-are-not-that-scary?slide=9  
> ). As developers, we can make our lives *very* easy if we're happy to
> discount the interests of other folks that are actually tasked with
> deploying and maintaining our code (either an operations team if we
> have one, or at the very least future maintainers if we don't).
>  
> So while there are still useful user experience lessons to be learned
> from npm, they require careful filtering to ensure they actually *are*
> a simplification of the overall user experience, rather than cases
> where the designers of the system have made things easier for
> developers working on the project itself at the expense of making them
> harder for operators and end users that just want to install it
> (potentially as part of a larger integrated system).
>  
> Cheers,
> Nick.
>  
> P.S. I've unfortunately never found the time to write up my own
> packaging system research properly, but
> https://bitbucket.org/ncoghlan/misc/src/default/talks/2013-07-pyconau/packaging/brispy-talk.md  
> has some rough notes from a couple of years ago, while
> https://fedoraproject.org/wiki/Env_and_Stacks/Projects/UserLevelPackageManagement  
> looks at the general problem space from an operating system developer
> experience design perspective.
>  
> --
> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>  


One of the things that makes npm a lot simpler is that their "virtualenv"
is implicit and the default, and you have to go out of your way to get
a "global" install. It would be possible to add this to Python by doing
something like ``sys.path.append("./.python-modules/")`` (but it also
needs to recurse upwards) at Python startup (and possibly some file
you can put in that folder so that it doesn't add the typical site-packages
or user-packages to the sys.path).

This makes it easier to have isolation be the default, however it
comes with its own problems. It becomes a lot harder to determine
what's going to happen when you type ``python``, since you have to inspect
the entire directory hierarchy above you looking for a .python-modules
directory. There's also the problem that binary scripts tend to get installed
into something like .python-modules/bin/ or so in that layout, but that's
rarely what people want. The npm community "solved" this by having the
actual CLI command be installable on its own, calling into the
main program that you have installed per project.
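[A rough sketch of the upward-recursing lookup described above; the
``.python-modules`` directory name is illustrative, and nothing like this
hook exists in Python's actual startup today:]

```python
import os
import sys

def find_local_modules(start=None):
    # Walk upwards from `start` (default: the current directory) looking
    # for an npm-style ".python-modules" directory, stopping at the
    # filesystem root if none is found.
    path = os.path.abspath(start or os.getcwd())
    while True:
        candidate = os.path.join(path, ".python-modules")
        if os.path.isdir(candidate):
            return candidate
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            return None
        path = parent

# A hypothetical startup hook would then shadow the usual site-packages:
local = find_local_modules()
if local is not None:
    sys.path.insert(0, local)
```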

---  
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA



From aquavitae69 at gmail.com  Sun May 31 18:19:09 2015
From: aquavitae69 at gmail.com (David Townshend)
Date: Sun, 31 May 2015 18:19:09 +0200
Subject: [Python-ideas] npm-style venv-aware launcher
In-Reply-To: <06BB7C80-33A5-4339-A908-522918C1F1B5@yahoo.com>
References: <CAEgL-fePfTqFs6StDJZFbPVmOS=hUa_XSu_CFOPLr_uE9keGZQ@mail.gmail.com>
 <1136321E-6C0F-4B7A-B6E5-8E60917EEDAC@yahoo.com>
 <CAEgL-feaq3f6kGb_J_U2U36=d3dME4jyKYUv9i1FwxqQieCbng@mail.gmail.com>
 <06BB7C80-33A5-4339-A908-522918C1F1B5@yahoo.com>
Message-ID: <CAEgL-fcvYC-iEGrt8Qy=yMg4W2jxS69QMjRSV2kjt70pbQ=sFw@mail.gmail.com>

>
> The default for npm is that your package dir is attached directly to the
> project. You can get more flexibility by setting an environment variable or
> creating a symlink, but normally you don't. It has about the same
> flexibility as virtualenvwrapper, with about the same amount of effort. So
> if virtualenvwrapper isn't flexible enough for you, my guess is that your
> take on npm won't be flexible enough either, it'll just come preconfigured
> for your own idiosyncratic use and everyone else will have to adjust...
>

You have a point.  Maybe lack of flexibility is not actually the issue -
it's too much flexibility.  The problem that I have with virtualenv is that
it requires quite a bit of configuration and a great deal of awareness by
the user of what is going on and how things are configured, as stated on
its home page. While there is nothing specifically wrong with this, I
usually just want a way to do something in a venv without thinking too much
about where it is or when or how to activate it.  If you've had a look at
the details of the sort of tool I'm proposing, it is completely
transparent.  Perhaps the preconfiguration is just to my own
idiosyncrasies, but if it serves its use 90% of the time then maybe that is
good enough.

Some of what I'm proposing could be incorporated into pip (i.e. better
requirements) and some could possibly be incorporated into
virtualenvwrapper (although I still think that my proposal for handling
venvs is just too different from that of virtualenvwrapper to be worth
pursuing that course), but one of the main aims is to merge it all into one
tool that manages both the venv and the requirements.
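[As a sketch of what such a venv-aware launcher might do: the `venvrc`
file name comes from the proposal, but its single-line format and the
helper names here are guesses for illustration, not the actual tool:]

```python
import os
import subprocess
import sys

def find_venvrc(start=None):
    # Walk upwards looking for a venvrc file; assume it holds the venv
    # path relative to the project root (format is an assumption).
    path = os.path.abspath(start or os.getcwd())
    while True:
        rc = os.path.join(path, "venvrc")
        if os.path.isfile(rc):
            with open(rc) as f:
                return os.path.join(path, f.read().strip())
        parent = os.path.dirname(path)
        if parent == path:  # filesystem root reached, no venvrc found
            return None
        path = parent

def run(args):
    # Run a command with the project's venv python if a venvrc is found,
    # falling back to the current interpreter otherwise -- so the user
    # never has to `source activate` by hand.
    venv = find_venvrc()
    python = os.path.join(venv, "bin", "python") if venv else sys.executable
    return subprocess.call([python] + list(args))
```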

I'm quite sure that this proposal is not going to be accepted without a trial
period on pypi, so maybe that will be the test of whether this is useful.

Is this the right place for this, or would distutils-sig be better?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150531/79d33c2b/attachment-0001.html>

From wes.turner at gmail.com  Sun May 31 19:07:41 2015
From: wes.turner at gmail.com (Wes Turner)
Date: Sun, 31 May 2015 12:07:41 -0500
Subject: [Python-ideas] npm-style venv-aware launcher
In-Reply-To: <CAEgL-fcvYC-iEGrt8Qy=yMg4W2jxS69QMjRSV2kjt70pbQ=sFw@mail.gmail.com>
References: <CAEgL-fePfTqFs6StDJZFbPVmOS=hUa_XSu_CFOPLr_uE9keGZQ@mail.gmail.com>
 <1136321E-6C0F-4B7A-B6E5-8E60917EEDAC@yahoo.com>
 <CAEgL-feaq3f6kGb_J_U2U36=d3dME4jyKYUv9i1FwxqQieCbng@mail.gmail.com>
 <06BB7C80-33A5-4339-A908-522918C1F1B5@yahoo.com>
 <CAEgL-fcvYC-iEGrt8Qy=yMg4W2jxS69QMjRSV2kjt70pbQ=sFw@mail.gmail.com>
Message-ID: <CACfEFw9-jXoZtyyrJYO=7hEywtT2PB87SZr0GpT+WKzjWDNo+Q@mail.gmail.com>

On May 31, 2015 11:20 AM, "David Townshend" <aquavitae69 at gmail.com> wrote:
>
>
>>
>> The default for npm is that your package dir is attached directly to the
project. You can get more flexibility by setting an environment variable or
creating a symlink, but normally you don't.

I set variables in $VIRTUAL_ENV/bin/postactivate (for Python, Go, NPM, ...)
[Virtualenvwrapper].
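
(For readers unfamiliar with the hook: virtualenvwrapper sources
$VIRTUAL_ENV/bin/postactivate each time the env is activated, so
per-project variables for other toolchains can live there too. A minimal
sketch; the GOPATH/NODE_PATH values are illustrative, not required:)

```shell
# Hypothetical $VIRTUAL_ENV/bin/postactivate hook.
# virtualenvwrapper sources this file on every `workon`, so it is a
# handy place for per-project variables beyond Python itself.
VIRTUAL_ENV="${VIRTUAL_ENV:-$HOME/.virtualenvs/demo}"  # set by activate in practice
export GOPATH="$VIRTUAL_ENV/go"
export NODE_PATH="$VIRTUAL_ENV/lib/node_modules"
```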

> It has about the same flexibility as virtualenvwrapper, with about the
same amount of effort. So if virtualenvwrapper isn't flexible enough for
you, my guess is that your take on npm won't be flexible enough either,
it'll just come preconfigured for your own idiosyncratic use and everyone
else will have to adjust...
>
>
> You have a point.  Maybe lack of flexibility is not actually the issue -
it's too much flexibility.  The problem that I have with virtualenv is that
it requires quite a bit of configuration and a great deal of awareness by
the user of what is going on and how things are configured.

You must set WORKON_HOME and PROJECT_HOME.
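
(Those two variables are essentially the whole required setup; something
like the following in ~/.bashrc does it. The paths are conventional
defaults and the virtualenvwrapper.sh location varies by install, hence
the guard:)

```shell
# Minimal virtualenvwrapper bootstrap for ~/.bashrc (paths illustrative).
export WORKON_HOME="${WORKON_HOME:-$HOME/.virtualenvs}"
export PROJECT_HOME="${PROJECT_HOME:-$HOME/projects}"
# Install location differs per distro/install method:
[ -f /usr/local/bin/virtualenvwrapper.sh ] && . /usr/local/bin/virtualenvwrapper.sh
```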

>  As stated on its home page While there is nothing specifically wrong
with this, I usually just want a way to do something in a venv without
thinking too much about where it is or when or how to activate it.  If
you've had a look at the details of the sort of tool I'm proposing, it is
completely transparent.  Perhaps the preconfiguration is just to my own
idiosyncrasies, but if it serves its use 90% of the time then maybe that is
good enough.
>
> Some of what I'm proposing could be incorporated into pip (i.e. better
requirements) and some could possibly be incorporated into
virtualenvwrapper (although I still think that my proposal for handling
venvs is just too different from that of virtualenvwrapper to be worth
pursuing that course), but one of the main aims is to merge it all into one
tool that manages both the venv and the requirements.

* you can install an initial set of packages with just virtualenv (a
minimal covering / only explicitly installed packages would be useful (for
pruning deprecated dependencies))
* conda-env manages requirements for conda envs (conda env export)

  * http://conda.pydata.org/docs/test-drive.html#managing-environments
  * http://conda.pydata.org/docs/env-commands.html

* I've a similar script for working with virtualenv (now venv) and/or conda
envs in gh:westurner/dotfiles/dotfiles/venv/ipython_config.py that sets FSH
paths and more commands and aliases (like cdv for cdvirtualenv). IDK
whether this would be useful for these use cases.

So:

* [ ] ENH: pip freeze --minimum-covering
* [ ] ENH: pip freeze --explicit-only
* [ ] DOC: virtualenv for NPM'ers
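
(A rough sketch of what a hypothetical `pip freeze --explicit-only` might
compute: installed distributions that no other installed distribution
requires. The flag and function names are made up; the code uses only
stdlib package metadata, available as importlib.metadata on Python 3.8+:)

```python
import re
from importlib import metadata

def requirement_name(req):
    """Bare project name from a requires string like 'requests (>=2.0); extra == "x"'."""
    return re.split(r"[\s;\[\](<>=!~]", req, maxsplit=1)[0].lower()

def explicit_only():
    """Installed distributions that nothing else installed depends on --
    roughly the set a hypothetical `pip freeze --explicit-only` would print."""
    dists = list(metadata.distributions())
    names = {(d.metadata["Name"] or "").lower() for d in dists}
    required = {requirement_name(r) for d in dists for r in (d.requires or [])}
    return sorted(n for n in names - required if n)
```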

>
> I'm quite sure that this proposal is not going to be accepted without a
trial period on pypi, so maybe that will be the test of whether this is
useful.
>
> Is this the right place for this, or would distutils-sig be better?

PyPA: https://github.com/mitsuhiko/pipsi/issues/44#issuecomment-105961957

>

From abarnert at yahoo.com  Sun May 31 21:00:57 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sun, 31 May 2015 12:00:57 -0700
Subject: [Python-ideas] npm-style venv-aware launcher
In-Reply-To: <CAEgL-fcvYC-iEGrt8Qy=yMg4W2jxS69QMjRSV2kjt70pbQ=sFw@mail.gmail.com>
References: <CAEgL-fePfTqFs6StDJZFbPVmOS=hUa_XSu_CFOPLr_uE9keGZQ@mail.gmail.com>
 <1136321E-6C0F-4B7A-B6E5-8E60917EEDAC@yahoo.com>
 <CAEgL-feaq3f6kGb_J_U2U36=d3dME4jyKYUv9i1FwxqQieCbng@mail.gmail.com>
 <06BB7C80-33A5-4339-A908-522918C1F1B5@yahoo.com>
 <CAEgL-fcvYC-iEGrt8Qy=yMg4W2jxS69QMjRSV2kjt70pbQ=sFw@mail.gmail.com>
Message-ID: <FB426EFA-666C-48A6-A8B8-8046BE906BA3@yahoo.com>

On May 31, 2015, at 09:19, David Townshend <aquavitae69 at gmail.com> wrote:
>> 
>> The default for npm is that your package dir is attached directly to the project. You can get more flexibility by setting an environment variable or creating a symlink, but normally you don't. It has about the same flexibility as virtualenvwrapper, with about the same amount of effort. So if virtualenvwrapper isn't flexible enough for you, my guess is that your take on npm won't be flexible enough either, it'll just come preconfigured for your own idiosyncratic use and everyone else will have to adjust...
> 
> You have a point.  Maybe lack of flexibility is not actually the issue - it's too much flexibility.

I think Python needs that kind of flexibility, because it's used in a much wider range of use cases, from binary end-user applications to OS components to "just run this script against your system environment" to conda packages, not just web apps managed by a deployment team and other things that fall into the same model. And it needs to be backward compatible with the different ways people have come up with for handling all those models.

While it's possible to rebuild all of those models around the npm model, and the node community is gradually coming up with ways of doing so (although notice that much of the node community is instead relying on docker or VMs...), you'd have to be able to transparently replace all of the current Python use cases today if you wanted to change Python today.

Also, as Nick pointed out, making things easier for the developer comes at the cost of making things harder for the user--which is acceptable when the user is the developer himself or a deployment team that sits at the next set of cubicles, but may not be acceptable when the user is someone who just wants to run a script he found online. Again, the Node community is coming to terms with this, but they haven't got to the same level as the Python community, and, even if they had, it still wouldn't work as a drop-in replacement without a lot of work.

What someone _could_ do is make it easier to set up a dev-friendly environment based on virtualenvwrapper and virtualenvwrapperhelper. Currently, you have to know what you're looking for and find a blog page somewhere that tells you how to install and configure all the tools and follow three or four steps. That's obviously less than ideal. It would be nice if there were a single "pip install envstuff" that got you ready out of the box (including working for Windows cmd and PowerShell), and if links to that were included in the basic Python docs. It would also be nice if there were a way to transfer your own custom setup to a new machine. But I don't see why that can't all be built as improvements on the existing tools (and a new package that just included requirements and configuration and no new tools).

> The problem that I have with virtualenv is that it requires quite a bit of configuration and a great deal of awareness by the user of what is going on and how things are configured. As stated on its home page While there is nothing specifically wrong with this, I usually just want a way to do something in a venv without thinking too much about where it is or when or how to activate it.

But again, if that's what you want, that's what you have with virtualenvwrapper or autoenv. You just cd into the directory (whether a new one you just created with the wrapper or an old one you just pulled from git) and it's set up for you. And setting up a new environment or cloning an existing one is just a single command, too. Sure, you can make your configuration more complicated than that, but if you don't want to, you don't have to.
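
(Concretely, the autoenv flavor of that is nothing more than a `.env`
file at the project root that gets sourced automatically on `cd`. A
sketch, with a made-up project name:)

```shell
# Hypothetical myproject/.env for autoenv: sourced automatically when
# you `cd` into the directory, so the right venv is always active.
WORKON_HOME="${WORKON_HOME:-$HOME/.virtualenvs}"
ACTIVATE="$WORKON_HOME/myproject/bin/activate"
[ -f "$ACTIVATE" ] && . "$ACTIVATE"
```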

> If you've had a look at the details of the sort of tool I'm proposing, it is completely transparent.  Perhaps the preconfiguration is just to my own idiosyncrasies, but if it serves its use 90% of the time then maybe that is good enough.
> 
> Some of what I'm proposing could be incorporated into pip (i.e. better requirements) and some could possibly be incorporated into virtualenvwrapper (although I still think that my proposal for handling venvs is just too different from that of virtualenvwrapper to be worth pursuing that course), but one of the main aims is to merge it all into one tool that manages both the venv and the requirements.

There are major advantages in not splitting the Python community between two different sets of tools. We've only recently gotten past easy_install vs. pip and distribute vs. setuptools, which has finally enabled a clean story for everyone who wants to distribute packages to get it right, which has finally started to happen (although there are people still finding and following blog posts that tell them to install distribute or not to use virtualenv because it doesn't play nice with py2app or whatever).

> I'm quite sure that this proposal is not going to be accepted without a trial period on pypi, so maybe that will be the test of whether this is useful.
> 
> Is this the right place for this, or would distutils-sig be better?

Other people have made the case for both sides of that earlier in the thread and I'm not sure which one is more compelling...

Also, the pure pip enhancement of coming up with something better than freeze/-r may belong on distutils-sig while the environment-aware launcher and/or environment-managing tools may belong here. (Notice that Python includes venv and the py launcher, but doesn't include setuptools or pip...)

From aquavitae69 at gmail.com  Sun May 31 21:50:36 2015
From: aquavitae69 at gmail.com (David Townshend)
Date: Sun, 31 May 2015 21:50:36 +0200
Subject: [Python-ideas] npm-style venv-aware launcher
In-Reply-To: <FB426EFA-666C-48A6-A8B8-8046BE906BA3@yahoo.com>
References: <CAEgL-fePfTqFs6StDJZFbPVmOS=hUa_XSu_CFOPLr_uE9keGZQ@mail.gmail.com>
 <1136321E-6C0F-4B7A-B6E5-8E60917EEDAC@yahoo.com>
 <CAEgL-feaq3f6kGb_J_U2U36=d3dME4jyKYUv9i1FwxqQieCbng@mail.gmail.com>
 <06BB7C80-33A5-4339-A908-522918C1F1B5@yahoo.com>
 <CAEgL-fcvYC-iEGrt8Qy=yMg4W2jxS69QMjRSV2kjt70pbQ=sFw@mail.gmail.com>
 <FB426EFA-666C-48A6-A8B8-8046BE906BA3@yahoo.com>
Message-ID: <CAEgL-ffVOfDkAJYO3cABAtw_HBHLP38jkTD+uBd8aApB6ECDOw@mail.gmail.com>

On Sun, May 31, 2015 at 9:00 PM, Andrew Barnert <abarnert at yahoo.com> wrote:

> On May 31, 2015, at 09:19, David Townshend <aquavitae69 at gmail.com> wrote:
>
>
>> The default for npm is that your package dir is attached directly to the
>> project. You can get more flexibility by setting an environment variable or
>> creating a symlink, but normally you don't. It has about the same
>> flexibility as virtualenvwrapper, with about the same amount of effort. So
>> if virtualenvwrapper isn't flexible enough for you, my guess is that your
>> take on npm won't be flexible enough either, it'll just come preconfigured
>> for your own idiosyncratic use and everyone else will have to adjust...
>>
>
> You have a point.  Maybe lack of flexibility is not actually the issue -
> it's too much flexibility.
>
>
> I think Python needs that kind of flexibility, because it's used in a much
> wider range of use cases, from binary end-user applications to OS
> components to "just run this script against your system environment" to
> conda packages, not just web apps managed by a deployment team and other
> things that fall into the same model. And it needs to be backward
> compatible with the different ways people have come up with for handling
> all those models.
>
> While it's possible to rebuild all of those models around the npm model,
> and the node community is gradually coming up with ways of doing so
> (although notice that much of the node community is instead relying on
> docker or VMs...), you'd have to be able to transparently replace all of
> the current Python use cases today if you wanted to change Python today.
>
> Also, as Nick pointed out, making things easier for the developer comes at
> the cost of making things harder for the user--which is acceptable when the
> user is the developer himself or a deployment team that sits at the next
> set of cubicles, but may not be acceptable when the user is someone who
> just wants to run a script he found online. Again, the Node community is
> coming to terms with this, but they haven't got to the same level as the
> Python community, and, even if they had, it still wouldn't work as a
> drop-in replacement without a lot of work.
>
> What someone _could_ do is make it easier to set up a dev-friendly
> environment based on virtualenvwrapper and virtualenvwrapperhelper.
> Currently, you have to know what you're looking for and find a blog page
> somewhere that tells you how to install and configure all the tools and
> follow three or four steps. That's obviously less than ideal. It would be
> nice if there were a single "pip install envstuff" that got you ready out
> of the box (including working for Windows cmd and PowerShell), and if links
> to that were included in the basic Python docs. It would also be nice if
> there were a way to transfer your own custom setup to a new machine. But I
> don't see why that can't all be built as improvements on the existing tools
> (and a new package that just included requirements and configuration and no
> new tools).
>
> The problem that I have with virtualenv is that it requires quite a bit of
> configuration and a great deal of awareness by the user of what is going on
> and how things are configured. As stated on its home page While there is
> nothing specifically wrong with this, I usually just want a way to do
> something in a venv without thinking too much about where it is or when or
> how to activate it.
>
>
> But again, if that's what you want, that's what you have with
> virtualenvwrapper or autoenv. You just cd into the directory (whether a new
> one you just created with the wrapper or an old one you just pulled from
> git) and it's set up for you. And setting up a new environment or cloning
> an existing one is just a single command, too. Sure, you can make your
> configuration more complicated than that, but if you don't want to, you
> don't have to.
>
> If you've had a look at the details of the sort of tool I'm proposing, it
> is completely transparent.  Perhaps the preconfiguration is just to my own
> idiosyncrasies, but if it serves its use 90% of the time then maybe that is
> good enough.
>
>
> Some of what I'm proposing could be incorporated into pip (i.e. better
> requirements) and some could possibly be incorporated into
> virtualenvwrapper (although I still think that my proposal for handling
> venvs is just too different from that of virtualenvwrapper to be worth
> pursuing that course), but one of the main aims is to merge it all into one
> tool that manages both the venv and the requirements.
>
>
> There are major advantages in not splitting the Python community between
> two different sets of tools. We've only recently gotten past easy_install
> vs. pip and distribute vs. setuptools, which has finally enabled a clean
> story for everyone who wants to distribute packages to get it right, which
> has finally started to happen (although there are people still finding and
> following blog posts that tell them to install distribute or not to use
> virtualenv because it doesn't play nice with py2app or whatever).
>
> I'm quite sure that this proposal is not going to be accepted without a trial
> period on pypi, so maybe that will be the test of whether this is useful.
>
> Is this the right place for this, or would distutils-sig be better?
>
>
> Other people have made the case for both sides of that earlier in the
> thread and I'm not sure which one is more compelling...
>
> Also, the pure pip enhancement of coming up with something better than
> freeze/-r may belong on distutils-sig while the environment-aware launcher
> and/or environment-managing tools may belong here. (Notice that Python
> includes venv and the py launcher, but doesn't include setuptools or pip...)
>

Just to be clear, I'm not suggesting changing the python executable itself,
or any of the other tools already in existence.  My proposal is a separate
wrapper around existing python, pip and venv which would not change
anything about the way it works currently.  A dev environment set up using
it could still be deployed in the same way it would be now, and there would
still be the option of using virtualenvwrapper, or something else for those
that want to.  It is obviously way too early to try to get it included in
the next python release (apart form anything else, pip would need to be
added first), so really this proposal is meant more to gauge interest in
the concept so that if it is popular I can carry on developing it and
preparing it for inclusion in the stdlib, or at least a serious discussion
about including it, once it is mature.

That said, Andrew's arguments have convinced me that much could be done to
improve existing tools before creating a new one, although I still don't
believe virtualenvwrapper can be squashed into the shape I'm aiming for
without fundamental changes.  Also, from the other responses so far it
seems that the general feeling is that handling of requirements could
definitely be improved, but that anything too prescriptive with venvs would
be problematic.  Unfortunately for my proposal, if something like what I'm
suggesting were officially supported via inclusion in the stdlib it would
quickly become, at best, the "strongly recommended" way of working and at
worst the One Obvious Way.  With all this in mind, I'll withdraw my
proposal, but continue development on my version and see if it goes
anywhere.  I'll also see how much of its functionality I can put into
other tools (specifically pip's requirements handling) instead.