From tomerfiliba at gmail.com  Tue Aug  1 19:27:58 2006
From: tomerfiliba at gmail.com (tomer filiba)
Date: Tue, 1 Aug 2006 19:27:58 +0200
Subject: [Python-3000] gettype
In-Reply-To: <44CC6A3E.8000003@v.loewis.de>
References: <1d85506f0607061119w1c3cab60o6f762a8e3849e45c@mail.gmail.com>
	<44CC6A3E.8000003@v.loewis.de>
Message-ID: <1d85506f0608011027v4402f905ge6bc18e25ef0aa9e@mail.gmail.com>

that's surely an anachronism :)

o.__class__ is a little more typing and will surely scare newbies.
moreover, type(x) and x.__class__ can return different things
(you can fool __class__, but not type()).

for my part, i'm fine with any form that makes a distinction between
the metaclass "type" and the inquire-type "type".
call it o.__class__, gettype() or typeof(), just don't mix that with
the metaclass
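to make the __class__-can-lie point concrete, a minimal sketch (modern syntax):

```python
# __class__ can be shadowed by a property, but type() reads the
# object's real type slot directly
class Impostor:
    @property
    def __class__(self):
        return int

x = Impostor()
assert x.__class__ is int    # the lie
assert type(x) is Impostor   # the truth
```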


-tomer

On 7/30/06, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> tomer filiba schrieb:
> > so why not choose the "get%s()" notation?
>
> Why not o.__class__?
>
> Regards,
> Martin
>

From talin at acm.org  Wed Aug  2 04:29:51 2006
From: talin at acm.org (Talin)
Date: Tue, 01 Aug 2006 19:29:51 -0700
Subject: [Python-3000] gettype
In-Reply-To: <1d85506f0608011027v4402f905ge6bc18e25ef0aa9e@mail.gmail.com>
References: <1d85506f0607061119w1c3cab60o6f762a8e3849e45c@mail.gmail.com>	<44CC6A3E.8000003@v.loewis.de>
	<1d85506f0608011027v4402f905ge6bc18e25ef0aa9e@mail.gmail.com>
Message-ID: <44D00E1F.2040209@acm.org>

tomer filiba wrote:
> that's surely an anachronism :)
> 
> o.__class__ is a little more typing and will surely scare newbies.
> moreover, type(x) and x.__class__ can return different things
> (you can fool __class__, but not type()).
> 
> for my part, i'm fine with any form that makes a distinction between
> the metaclass "type" and the inquire-type "type".
> call it o.__class__, gettype() or typeof(), just don't mix that with
> the metaclass

 From a code style perspective, I've always felt that the magical 
__underscore__ names should not be referred to outside of the class 
implementing those names. The double underscores are an indication that 
this method or property is in most normal use cases referred to 
implicitly by use rather than explicitly by name; Thus str() invokes 
__str__ and so on.
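A minimal illustration of that implicit dispatch:

```python
# str() invokes the type's __str__ implicitly, so callers never need
# to spell the underscore name themselves
class Point:
    def __str__(self):
        return "Point(1, 2)"

assert str(Point()) == "Point(1, 2)"
```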

-- Talin

From jack at psynchronous.com  Wed Aug  2 05:14:37 2006
From: jack at psynchronous.com (Jack Diederich)
Date: Tue, 1 Aug 2006 23:14:37 -0400
Subject: [Python-3000] gettype
In-Reply-To: <44D00E1F.2040209@acm.org>
References: <1d85506f0607061119w1c3cab60o6f762a8e3849e45c@mail.gmail.com>
	<44CC6A3E.8000003@v.loewis.de>
	<1d85506f0608011027v4402f905ge6bc18e25ef0aa9e@mail.gmail.com>
	<44D00E1F.2040209@acm.org>
Message-ID: <20060802031437.GJ25353@performancedrivers.com>

On Tue, Aug 01, 2006 at 07:29:51PM -0700, Talin wrote:
> tomer filiba wrote:
> > that's surely an anachronism :)
> > 
> > o.__class__ is a little more typing and will surely scare newbies.
> > moreover, type(x) and x.__class__ can return different things
> > (you can fool __class__, but not type()).
> > 
> > for my part, i'm fine with any form that makes a distinction between
> > the metaclass "type" and the inquire-type "type".
> > call it o.__class__, gettype() or typeof(), just don't mix that with
> > the metaclass
> 
>  From a code style perspective, I've always felt that the magical 
> __underscore__ names should not be referred to outside of the class 
> implementing those names. The double underscores are an indication that 
> this method or property is in most normal use cases referred to 
> implicitly by use rather than explicitly by name; Thus str() invokes 
> __str__ and so on.

The paired double underscores indicate that the function is special to 
the instance's class.  C++ converts understand this just fine until you
mention that classes are themselves instances, at which point the grey
matter takes a while to settle again [guilty].  After that reshuffling
you are again assaulted because the stack stops.  The class of a class
is a type, but the class of a class of a class is still a type.  Turtles
all the way down.

See the recent thread on python-checkins for some discussion on why
"isinstance(ob, type(type))" isn't just legal -- it's backwards compatible!
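A sketch of that stack bottoming out:

```python
# instance -> class -> metaclass, and then the turtles stop
class C:
    pass

assert type(C()) is C             # the instance's class
assert type(C) is type            # the class's class is type
assert type(type(C)) is type      # ...and it stays type from here on
assert isinstance(C, type(type))  # the spelling mentioned above
```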

-Jack


From ark-mlist at att.net  Wed Aug  2 06:56:18 2006
From: ark-mlist at att.net (Andrew Koenig)
Date: Wed, 2 Aug 2006 00:56:18 -0400
Subject: [Python-3000] gettype
In-Reply-To: <44D00E1F.2040209@acm.org>
Message-ID: <001001c6b5ef$fdd258f0$6402a8c0@arkdesktop>

>  From a code style perspective, I've always felt that the magical
> __underscore__ names should not be referred to outside of the class
> implementing those names. The double underscores are an indication that
> this method or property is in most normal use cases referred to
> implicitly by use rather than explicitly by name; Thus str() invokes
> __str__ and so on.

Haven't we seen this argument somewhere before?  :-)

(needless to say, I'm in agreement with it in this context too)



From ncoghlan at iinet.net.au  Thu Aug  3 14:58:44 2006
From: ncoghlan at iinet.net.au (Nick Coghlan)
Date: Thu, 03 Aug 2006 22:58:44 +1000
Subject: [Python-3000] Rounding in Py3k
Message-ID: <44D1F304.4020700@iinet.net.au>

Some musings inspired by the rounding discussion on python-dev.

The Decimal module provides all of the rounding modes from the general decimal 
arithmetic specification [1].

Both Decimal rounding methods (quantize() and to_integral()) return Decimal 
instances - a subsequent explicit conversion to int() is needed if you want a 
real integer (just like the builtin round()).

Normal floats, OTOH, only have easy access to truncate (through int()) and 
round-half-up (through round()).

Additionally, the Decimal 'quantize' method signature is fine if you have 
decimal literals, but not so good for Python where you have to write 
"n.quantize(d('1e-2'))" to round to two decimal places.
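For instance, with today's API (writing d for decimal.Decimal, a local alias as above):

```python
from decimal import Decimal as d

n = d("1.2345")
# rounding to two decimal places means constructing a whole Decimal
# just to carry the target exponent
assert str(n.quantize(d("1e-2"))) == "1.23"
```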

The implicit Decimal->float conversion also allows Decimals to be rounded with 
the round() builtin, but that can lead to errors in rounding near the limits 
of floating point precision due to the use of an imprecise conversion in 
Decimal.__float__():

 >>> n = (1 + d("5e-16"))
 >>> n
Decimal("1.0000000000000005")
 >>> float(n.quantize(d('1e-15')))
1.0
 >>> round(n, 15)
1.0000000000000011

Would it be worthwhile to design a common rounding mechanism that can be used 
to cleanly round values to the built in floating point type, as well as being 
able to access the different rounding modes for decimal instances?

For example, replace the builtin function round() with a non-instantiable 
class like the following:

   _TEN = decimal.Decimal(10)
   class round(object):

     @staticmethod
     def half_up(num, ndigits=0):
         if isinstance(num, decimal.Decimal):
             return float(num.quantize(_TEN**(-ndigits),
                          rounding=decimal.ROUND_HALF_UP))
         return float(num)._round_half_up()


     __call__ = half_up

     @staticmethod
     def down(num, ndigits=0):
         if isinstance(num, decimal.Decimal):
             return float(num.quantize(_TEN**(-ndigits),
                          rounding=decimal.ROUND_DOWN))
         return float(num)._round_down()

     # etc for the other 5 rounding modes

Cheers,
Nick.

[1] The 7 decimal rounding modes:

round-down (truncate; round towards 0)
round-half-up (school rounding)
round-half-even (bankers' rounding)
round-ceiling (round towards positive infinity)
round-floor (round towards negative infinity)
round-half-down (WTF rounding :)
round-up (round away from zero)


-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From greg.ewing at canterbury.ac.nz  Fri Aug  4 03:51:19 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 04 Aug 2006 13:51:19 +1200
Subject: [Python-3000] Rounding in Py3k
In-Reply-To: <44D1F304.4020700@iinet.net.au>
References: <44D1F304.4020700@iinet.net.au>
Message-ID: <44D2A817.8040303@canterbury.ac.nz>

Nick Coghlan wrote:

> The implicit Decimal->float conversion

Hang on, I thought there weren't supposed to be any
implicit conversions between Decimal and float.

--
Greg

From greg.ewing at canterbury.ac.nz  Fri Aug  4 03:51:25 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 04 Aug 2006 13:51:25 +1200
Subject: [Python-3000] Rounding in Py3k
In-Reply-To: <44D1F304.4020700@iinet.net.au>
References: <44D1F304.4020700@iinet.net.au>
Message-ID: <44D2A81D.2050204@canterbury.ac.nz>

Nick Coghlan wrote:

> Would it be worthwhile to design a common rounding mechanism that can be used 
> to cleanly round values to the built in floating point type, as well as being 
> able to access the different rounding modes for decimal instances?

Sounds like a job for a new protocol, such as __round__(self, mode, places).
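A rough sketch of what such a protocol could look like (all names hypothetical, and only one mode implemented for illustration):

```python
# a round() builtin that simply defers to the argument's __round__
# protocol method; named round_ here to avoid shadowing the builtin
def round_(obj, places=0, mode="half-even"):
    return obj.__round__(mode, places)

class MyNum:
    def __init__(self, v):
        self.v = v
    def __round__(self, mode, places):
        assert mode == "half-even"  # sketch: single mode only
        factor = 10 ** places
        return MyNum(round(self.v * factor) / factor)

assert round_(MyNum(2.5)).v == 2.0  # half-even: 2.5 rounds to 2
```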

--
Greg

From rrr at ronadam.com  Fri Aug  4 07:33:01 2006
From: rrr at ronadam.com (Ron Adam)
Date: Fri, 04 Aug 2006 00:33:01 -0500
Subject: [Python-3000] Rounding in Py3k
In-Reply-To: <44D2A81D.2050204@canterbury.ac.nz>
References: <44D1F304.4020700@iinet.net.au> <44D2A81D.2050204@canterbury.ac.nz>
Message-ID: <eaumbe$b3m$1@sea.gmane.org>

Greg Ewing wrote:
> Nick Coghlan wrote:
> 
>> Would it be worthwhile to design a common rounding mechanism that can be used 
>> to cleanly round values to the built in floating point type, as well as being 
>> able to access the different rounding modes for decimal instances?
> 
> Sounds like a job for a new protocol, such as __round__(self, mode, places).
> 
> --
> Greg


Yes I agree.  And viewing this in the larger sense of how it works with 
all numeric types is better, I think, than just sticking a function into 
the math module.  (Although that might end up being the way to do it.)

Nick's proposal adds a private method to each of the types for each mode, 
which I think clutters things up a bit, but his method does create a 
single interface to them, which is nice.

I'm still not sure why "__round__" should be preferred in place of 
"round" as a method name.  There isn't an operator associated with 
rounding, so wouldn't the method name not have underscores?


I think rounding any type should return that same type.  For example:


     def round(n, places=0, mode='half-down'):
        return n.round(places, mode)


     round(i, 2)    -> integer, unchanged value
     round(i)       -> integer, precision == 0
     round(i, -2)   -> integer

     round(f, 2)    -> float
     round(f)       -> float,   precision == 0
     round(f, -2)   -> float

     round(d, 2)    -> decimal
     round(d)       -> decimal,  precision == max (*)
     round(d, -2)   -> decimal


(*) The default decimal rounding behavior is not the same as the default 
builtin round behavior.  Should one be changed to match the other?


Calling the desired type's method directly could be a good way to handle 
getting an integer when a float is given.  It's explicit.

     int.round(f, 2)  ->  integer
     int.round(f)     ->  integer
     int.round(f, -2)  ->  integer

Or if you prefer...

     int.__round__(f)

Having modes seems to me the best way to avoid cluttering the namespace, 
although sometimes that seems like it's not an issue, and sometimes it 
seems like it is.

Here's the list of Java rounding modes for comparison. It's nearly 
identical to the ones in Decimal.

     http://java.sun.com/j2se/1.5.0/docs/api/java/math/RoundingMode.html



Cheers,
   Ron


From greg.ewing at canterbury.ac.nz  Fri Aug  4 11:24:26 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 04 Aug 2006 21:24:26 +1200
Subject: [Python-3000] Rounding in Py3k
In-Reply-To: <eaumbe$b3m$1@sea.gmane.org>
References: <44D1F304.4020700@iinet.net.au> <44D2A81D.2050204@canterbury.ac.nz>
	<eaumbe$b3m$1@sea.gmane.org>
Message-ID: <44D3124A.6010300@canterbury.ac.nz>

Ron Adam wrote:

> I'm still not sure why "__round__" should be preferred in place of 
> "round" as a method name.  There isn't an operator associated with 
> rounding, so wouldn't the method name not have underscores?

I was thinking there would be functions such as round(),
trunc(), etc. that use __round__ to do their work. That's
why I called it a protocol and not just a method.

--
Greg

From rrr at ronadam.com  Fri Aug  4 12:46:42 2006
From: rrr at ronadam.com (Ron Adam)
Date: Fri, 04 Aug 2006 05:46:42 -0500
Subject: [Python-3000] Rounding in Py3k
In-Reply-To: <44D3124A.6010300@canterbury.ac.nz>
References: <44D1F304.4020700@iinet.net.au>
	<44D2A81D.2050204@canterbury.ac.nz>	<eaumbe$b3m$1@sea.gmane.org>
	<44D3124A.6010300@canterbury.ac.nz>
Message-ID: <eav8nk$vji$1@sea.gmane.org>

Greg Ewing wrote:
> Ron Adam wrote:
> 
>> I'm still not sure why "__round__" should be preferred in place of 
>> "round" as a method name.  There isn't an operator associated with 
>> rounding, so wouldn't the method name not have underscores?
> 
> I was thinking there would be functions such as round(),
> trunc(), etc. that use __round__ to do their work. That's
> why I called it a protocol and not just a method.
> 
> --
> Greg

I understood your point. :-)

If you look at the methods in int, long, and float, there are no methods 
that do not have double underscores, while there are many that don't in 
unicode and string.  There are also many methods in Decimal that do not 
use the double underscore naming convention.

I am just curious why not in general for the builtin numeric types.

The style guide says...

> - __double_leading_and_trailing_underscore__: "magic" objects or
>       attributes that live in user-controlled namespaces.  E.g. __init__,
>       __import__ or __file__.  Never invent such names; only use them
>       as documented.

So would __round__ interact with the interpreter in some "magic" way?  I 
take "magic" to mean the interpreter calls the method directly at times 
without having Python-coded instructions to do so, such as when we 
create an object from a class and __init__ gets called by the 
interpreter directly.  The same goes for methods like __add__ and 
__repr__, etc...

But that doesn't explain why int, long, and float, don't have other 
non-magic methods.

I'm not attempting to take sides for or against either way; I just want 
to understand the reasons, since knowing that would make the correct way 
to do it clear, instead of trying to wag the dog by the tail, if you 
know what I mean.

Cheers,
    Ron


From tomerfiliba at gmail.com  Fri Aug  4 17:36:40 2006
From: tomerfiliba at gmail.com (tomer filiba)
Date: Fri, 4 Aug 2006 17:36:40 +0200
Subject: [Python-3000] improved threading in py3k
Message-ID: <1d85506f0608040836g1ccd894ck6b4a7b0607e7cd36@mail.gmail.com>

python's threading model seems too weak imo. i'm not talking about
the GIL and the fact threads run one at a time -- i'm talking about the
incompleteness of the API of thread module.

once a thread is created, there is no way to kill it *externally* -- which
is a pity, since the thread must be "willing" to die. for example:

def threadfunc():
   while i_am_alive:
      ....
i_am_alive = True
thread.start_new_thread(threadfunc, ())
i_am_alive = False

but of course you can't trust that all threads work this way. moreover,
if the thread calls an internal function that blocks but doesn't check
i_am_alive, it will never exit. not to mention messing around with
globals, etc.

the proposed solution is introducing thread.kill, for example:
>>> import time
>>> import thread
>>> thread.start_new_thread(time.sleep, (10,))
476
>>> thread.kill(476)

thread.kill() would raise the ThreadExit exception in the context of the
given thread, which, unless caught, causes the thread to exit silently.
if it is the last thread of the process, ThreadExit is equivalent to
SystemExit.

another issue is sys.exit()/SystemExit -- suppose a thread wants to
cause the interpreter to exit. calling sys.exit in any thread but the main
one will simply kill the *calling* thread. the only way around it is calling
os.abort or os._exit(*)... but these functions do not perform cleanups.

i would suggest that when SystemExit is raised in any thread and not
caught there, it should be re-raised in the context of the main thread,
where it can be re-caught or the interpreter would exit.

and of course, once the functionality of the thread module is extended,
the threading module must be extended to support it as well.

- - - -

(*) about os._exit -- how about introducing os.exit, which would serve
as the "nicer" version of os._exit? os.exit would kill the process in
the same way SystemExit kills it (performing cleanups and all).
in fact, the interpreter would just call os.exit() when catching SystemExit.

it would also allow you to ensure the interpreter is killed, as SystemExit
can be caught by external code against your will.


-tomer

From jcarlson at uci.edu  Fri Aug  4 20:17:49 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Fri, 04 Aug 2006 11:17:49 -0700
Subject: [Python-3000] improved threading in py3k
In-Reply-To: <1d85506f0608040836g1ccd894ck6b4a7b0607e7cd36@mail.gmail.com>
References: <1d85506f0608040836g1ccd894ck6b4a7b0607e7cd36@mail.gmail.com>
Message-ID: <20060804105349.E6C3.JCARLSON@uci.edu>


"tomer filiba" <tomerfiliba at gmail.com> wrote:
> python's threading model seems too weak imo. i'm not talking about
> the GIL and the fact threads run one at a time -- i'm talking about the
> incompleteness of the API of thread module.

I could have sworn that it could be implemented as a debugging trace
function [1], but my tests [2] seem to imply that non-mainthread code
doesn't actually have the trace function called.

 - Josiah

[1]

>>> import sys
>>> import threading
>>>
>>> kill_these = {}
>>>
>>> def killthread(thread):
...     kill_these[thread] = None
...
>>> def trace(*args):
...     del args
...     if threading.currentThread() in kill_these:
...         #pick some exception unlikely/impossible to catch
...         raise MemoryError
...     return trace
...
>>> sys.settrace(trace)
>>> def waster():
...     while 1:
...             a = 1
...             b = 2
...             c = 3
...
>>> x = threading.Thread(target=waster)
>>> x.start()
>>> killthread(x)
>>> kill_these
{<Thread(Thread-1, started)>: None}
>>> x in kill_these
True
>>> x in threading.enumerate()
True
>>> threading.enumerate()
[<Thread(Thread-1, started)>, <_MainThread(MainThread, started)>]
>>> 


[2]
>>> import threading
>>> import sys
>>> seen = {}
>>> def trace(*args):
...     x = threading.currentThread()
...     if x not in seen:
...             print x
...     seen[x] = None
...     return trace
...
>>> sys.settrace(trace)
>>> def waster():
<_MainThread(MainThread, started)>
...     while 1:
...             a = 1
...             b = 2
...             c = 3
...
>>> x = threading.Thread(target=waster)
>>> x.start()
>>>

This is in Python 2.4.3 on Windows.
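For what it's worth, the threading module exposes its own hook for this: threading.settrace() installs the trace function into threads started through threading.Thread, before their run() is invoked -- a sketch:

```python
import threading

seen = set()

def trace(frame, event, arg):
    # record which threads the trace function actually fires in
    seen.add(threading.current_thread().name)
    return trace

# unlike sys.settrace(), this propagates to new threading.Thread threads
threading.settrace(trace)

def waster():
    a = 1
    b = 2

t = threading.Thread(target=waster)
t.start()
t.join()
threading.settrace(None)  # clean up the hook afterwards

# the worker thread shows up in `seen` now
assert any(name.startswith("Thread") for name in seen)
```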

> - - - -
> 
> (*) about os._exit -- how about introducing os.exit, which would serve
> as the "nicer" version of os._exit? os.exit would kill the process in
> the same way SystemExit kills it (performing cleanups and all).
> in fact, the interpreter would just call os.exit() when catching SystemExit.

Already exists as sys.exit()

 - Josiah


From tomerfiliba at gmail.com  Fri Aug  4 20:55:55 2006
From: tomerfiliba at gmail.com (tomer filiba)
Date: Fri, 4 Aug 2006 20:55:55 +0200
Subject: [Python-3000] improved threading in py3k
In-Reply-To: <20060804105349.E6C3.JCARLSON@uci.edu>
References: <1d85506f0608040836g1ccd894ck6b4a7b0607e7cd36@mail.gmail.com>
	<20060804105349.E6C3.JCARLSON@uci.edu>
Message-ID: <1d85506f0608041155rbf7b38egbae39f521a6f8a2a@mail.gmail.com>

> [...] it could be implemented as a debugging trace function

even if it could be, *why*? you can't really suggest that from now on,
every multithreaded app must run in trace mode, right? it's a performance
penalty for no good reason -- it's a question of API.

just as the API lets you *create* threads, it should allow you to *kill* them,
once you decide so. your code shouldn't rely on the "cooperativeness" of
other functions (i.e., the thread does blocking IO using some external
library, but you wish to stop it after some timeout, etc.).

all i was talking about was adding a new function to the thread module,
as well as a new builtin exception to complement it. it's not such a big
change that you should work extra hours inventing creative workarounds
for.

- - - -

you said:
> Already exists as sys.exit()

but i said:
>> it would also allow you to ensure the interpreter is killed, as SystemExit
>> can be caught by external code against your will.

please take the time to read my post before you reply.
here is what i mean by "against your will":

>>> import sys
>>>
>>> try:
...     sys.exit()
... except:
...     print "fooled you"
...
fooled you
>>>

if my library raises SystemExit, but the user is not aware of that, he/she
can block it [un]intentionally, causing undefined behavior in my library.
os.exit() would really just perform cleanup and exit (not by means
of exceptions)... just like os._exit(), but not as crude.


-tomer

On 8/4/06, Josiah Carlson <jcarlson at uci.edu> wrote:
>
> "tomer filiba" <tomerfiliba at gmail.com> wrote:
> > python's threading model seems too weak imo. i'm not talking about
> > the GIL and the fact threads run one at a time -- i'm talking about the
> > incompleteness of the API of thread module.
>
> I could have sworn that it could be implemented as a debugging trace
> function [1], but my tests [2] seem to imply that non-mainthread code
> doesn't actually have the trace function called.
>
>  - Josiah
>
> [1]
>
> >>> import sys
> >>> import threading
> >>>
> >>> kill_these = {}
> >>>
> >>> def killthread(thread):
> ...     kill_these[thread] = None
> ...
> >>> def trace(*args):
> ...     del args
> ...     if threading.currentThread() in kill_these:
> ...         #pick some exception unlikely/impossible to catch
> ...         raise MemoryError
> ...     return trace
> ...
> >>> sys.settrace(trace)
> >>> def waster():
> ...     while 1:
> ...             a = 1
> ...             b = 2
> ...             c = 3
> ...
> >>> x = threading.Thread(target=waster)
> >>> x.start()
> >>> killthread(x)
> >>> kill_these
> {<Thread(Thread-1, started)>: None}
> >>> x in kill_these
> True
> >>> x in threading.enumerate()
> True
> >>> threading.enumerate()
> [<Thread(Thread-1, started)>, <_MainThread(MainThread, started)>]
> >>>
>
>
> [2]
> >>> import threading
> >>> import sys
> >>> seen = {}
> >>> def trace(*args):
> ...     x = threading.currentThread()
> ...     if x not in seen:
> ...             print x
> ...     seen[x] = None
> ...     return trace
> ...
> >>> sys.settrace(trace)
> >>> def waster():
> <_MainThread(MainThread, started)>
> ...     while 1:
> ...             a = 1
> ...             b = 2
> ...             c = 3
> ...
> >>> x = threading.Thread(target=waster)
> >>> x.start()
> >>>
>
> This is in Python 2.4.3 on Windows.
>
> > - - - -
> >
> > (*) about os._exit -- how about introducing os.exit, which would serve
> > as the "nicer" version of os._exit? os.exit would kill the process in
> > the same way SystemExit kills it (performing cleanups and all).
> > in fact, the interpreter would just call os.exit() when catching SystemExit.
>
> Already exists as sys.exit()
>
>  - Josiah
>
>

From jcarlson at uci.edu  Fri Aug  4 21:29:09 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Fri, 04 Aug 2006 12:29:09 -0700
Subject: [Python-3000] improved threading in py3k
In-Reply-To: <1d85506f0608041155rbf7b38egbae39f521a6f8a2a@mail.gmail.com>
References: <20060804105349.E6C3.JCARLSON@uci.edu>
	<1d85506f0608041155rbf7b38egbae39f521a6f8a2a@mail.gmail.com>
Message-ID: <20060804121614.E6D4.JCARLSON@uci.edu>


"tomer filiba" <tomerfiliba at gmail.com> wrote:
> 
> > [...] it could be implemented as a debugging trace function
> 
> even if it could be, *why*? you can't really suggest that from now on,
> every multithreaded app must run in trace mode, right? it's a performance
> penalty for no good reason -- it's a question of API.

You can remove the performance penalty by resetting the trace function
to None.


> just as the API lets you *create* threads, it should allow you to *kill* them,
> once you decide so. your code shouldn't rely on the "cooperativeness" of
> other functions (i.e., the thread does blocking IO using some external
> library, but you wish to stop it after some timeout, etc.).

According to recent unrelated research with regard to the Win32 API,
most thread-killing methods (if not all?) leave the thread state broken
in such a way that the only way to fix it is to close down the process. 
Then again, I could be misremembering; the Win32 API is huge.


> all i was talking about was adding a new function to the thread module,
> as well as a new builtin exception to complement it. it's not such a big
> change that you should work extra hours inventing creative workarounds
> for.

It took me 5 minutes to generate that possible solution and a test for
it.  I wasn't saying that the functionality was generally undesirable,
just that I believed it should be possible in pure Python today (rather
than waiting for Py3k, as your posting to the Py3k mailing list
implies), and I was showing why it couldn't be done today.  It also
brings up the implied question as to whether non-mainthreads should
actually execute trace functions.


> you said:
> > Already exists as sys.exit()
> 
> but i said:
> >> it would also allow you to ensure the interpreter is killed, as SystemExit
> >> can be caught by external code against your will.
> 
> please take the time to read my post before you reply.
> here is what i mean by "against your will":

I wasn't aware that sys.exit() raised SystemExit, as I tend to not use
bare excepts or sys.exit() in my code (I prefer os._exit(), because when
I want to quit, cleanup is the least of my worries).  You could have
said "sys.exit() raises SystemExit" and I would have understood my
mistake.


I'm curious as to what I have done to deserve the rudeness of your reply.
 - Josiah


From tomerfiliba at gmail.com  Fri Aug  4 22:21:54 2006
From: tomerfiliba at gmail.com (tomer filiba)
Date: Fri, 4 Aug 2006 22:21:54 +0200
Subject: [Python-3000] improved threading in py3k
In-Reply-To: <20060804121614.E6D4.JCARLSON@uci.edu>
References: <20060804105349.E6C3.JCARLSON@uci.edu>
	<1d85506f0608041155rbf7b38egbae39f521a6f8a2a@mail.gmail.com>
	<20060804121614.E6D4.JCARLSON@uci.edu>
Message-ID: <1d85506f0608041321h5a3b1d76gfae5bca45c37ff7e@mail.gmail.com>

> I'm curious as to what I have done to deserve the rudeness of your reply.
well, i'm kinda pissed off by rockets flying over my house, svn giving me a
hard life, and what not. but what you have done was dismissing my post on
shaky grounds.

if all you meant was adding this support for the 2.x branch as a *workaround*,
i truly apologize.

> According to recent unrelated research with regard to the Win32 API,
> most thread-killing methods (if not all?) leave the thread state broken
> in such a way that the only way to fix it is to close down the process.
> Then again, I could be misremembering; the Win32 API is huge.

that may be so, but my suggestion wasn't *killing* the thread directly -
i'm sure one can use win32api to forcefully kill threads.
my idea, which is loosely based on dotNET (perhaps also applicable in java),
was raising a ThreadExit exception in the context of the given thread.
that way, the exception propagates up normally, and will eventually cause
the thread's main function to exit silently, unless caught (just as it works
today).

the issue here is raising the exception in *another* thread (externally);
this could only be done from a builtin function (AFAIK); the rest of the
mechanisms are already in place.
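CPython does in fact expose such a hook at the C level, PyThreadState_SetAsyncExc, reachable from Python via ctypes. A sketch of the mechanism, with ThreadExit and async_raise as locally defined stand-ins for the proposed names:

```python
import ctypes
import threading
import time

class ThreadExit(Exception):
    """local stand-in for the proposed builtin ThreadExit"""

def async_raise(tid, exctype):
    # schedule `exctype` to be raised in the thread with id `tid`;
    # it is delivered the next time that thread executes bytecode
    res = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_ulong(tid), ctypes.py_object(exctype))
    if res == 0:
        raise ValueError("invalid thread id")
    elif res > 1:
        # more than one thread state was affected: roll it back
        ctypes.pythonapi.PyThreadState_SetAsyncExc(ctypes.c_ulong(tid), None)

result = {}

def worker():
    try:
        while True:
            time.sleep(0.01)
    except ThreadExit:
        result["exited"] = True  # the exception propagated normally

t = threading.Thread(target=worker)
t.start()
time.sleep(0.05)
async_raise(t.ident, ThreadExit)
t.join(5)
assert result.get("exited")
```

Note this is CPython-specific and the exception cannot interrupt a thread blocked inside a C call, which is the hard part of the problem being discussed.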

- - -

sorry for bursting out.


-tomer

On 8/4/06, Josiah Carlson <jcarlson at uci.edu> wrote:
>
> "tomer filiba" <tomerfiliba at gmail.com> wrote:
> >
> > > [...] it could be implemented as a debugging trace function
> >
> > even if it could be, *why*? you can't really suggest that from now on,
> > every multithreaded app must run in trace mode, right? it's a performance
> > penalty for no good reason -- it's a question of API.
>
> You can remove the performance penalty by resetting the trace function
> to None.
>
>
> > just as the API lets you *create* threads, it should allow you to *kill* them,
> > once you decide so. your code shouldn't rely on the "cooperativeness" of
> > other functions (i.e., the thread does blocking IO using some external
> > library, but you wish to stop it after some timeout, etc.).
>
> According to recent unrelated research with regard to the Win32 API,
> most thread-killing methods (if not all?) leave the thread state broken
> in such a way that the only way to fix it is to close down the process.
> Then again, I could be misremembering; the Win32 API is huge.
>
>
> > all i was talking about was adding a new function to the thread module,
> > as well as a new builtin exception to complement it. it's not such a big
> > change that you should work extra hours inventing creative workarounds
> > for.
>
> It took me 5 minutes to generate that possible solution and a test for
> it.  I wasn't saying that the functionality was generally undesirable,
> just that I believed it should be possible in pure Python today (rather
> than waiting for Py3k, as your posting to the Py3k mailing list
> implies), and I was showing why it couldn't be done today.  It also
> brings up the implied question as to whether non-mainthreads should
> actually execute trace functions.
>
>
> > you said:
> > > Already exists as sys.exit()
> >
> > but i said:
> > >> it would also allow you to ensure the interpreter is killed, as SystemExit
> > >> can be caught by external code against your will.
> >
> > please take the time to read my post before you reply.
> > here is what i mean by "against your will":
>
> I wasn't aware that sys.exit() raised SystemExit, as I tend to not use
> bare excepts or sys.exit() in my code (I prefer os._exit(), because when
> I want to quit, cleanup is the least of my worries).  You could have
> said "sys.exit() raises SystemExit" and I would have understood my
> mistake.
>
>
> I'm curious as to what I have done to deserve the rudeness of your reply.
>  - Josiah
>
>

From jcarlson at uci.edu  Fri Aug  4 23:02:28 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Fri, 04 Aug 2006 14:02:28 -0700
Subject: [Python-3000] improved threading in py3k
In-Reply-To: <1d85506f0608041321h5a3b1d76gfae5bca45c37ff7e@mail.gmail.com>
References: <20060804121614.E6D4.JCARLSON@uci.edu>
	<1d85506f0608041321h5a3b1d76gfae5bca45c37ff7e@mail.gmail.com>
Message-ID: <20060804134148.E6D7.JCARLSON@uci.edu>


"tomer filiba" <tomerfiliba at gmail.com> wrote:
> 
> > I'm curious as to what I have done to deserve the rudeness of your reply.
> well, i'm kinda pissed off by rockets flying over my house, svn giving me a
> hard life, and what not. but what you have done was dismissing my post on
> shaky grounds.

Ick.  I can understand how you are frustrated.

> > According to recent unrelated research with regard to the Win32 API,
> > most thread-killing methods (if not all?) leave the thread state broken
> > in such a way that the only way to fix it is to close down the process.
> > Then again, I could be misremembering; the Win32 API is huge.
> 
> that may be so, but my suggestion wasn't *killing* the thread directly -
> i'm sure one can use win32api to forcefully kill threads.
> my idea, which is loosely based on dotNET (perhaps also applicable in java),
> was raising a ThreadExit exception in the context of the given thread.
> that way, the exception propagates up normally, and will eventually cause
> the thread's main function to exit silently, unless caught (just as it works
> today).
> 
> the issue here is raising the exception in *another* thread (externally);
> this could only be done from a builtin-function (AFAIK); the rest of the
> mechanisms are already in place.

One of the use-cases you specified was that C calls could perhaps be
aborted (an artificial timeout).

Does there exist a mechanism that is able to abort the execution of C
code from another C thread without killing the process?  If so, then
given that the C could be aborted at literally any point of execution,
how could any cleanup be done?


 - Josiah


From qrczak at knm.org.pl  Fri Aug  4 23:42:07 2006
From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk)
Date: Fri, 04 Aug 2006 23:42:07 +0200
Subject: [Python-3000] improved threading in py3k
In-Reply-To: <1d85506f0608040836g1ccd894ck6b4a7b0607e7cd36@mail.gmail.com>
	(tomer filiba's message of "Fri, 4 Aug 2006 17:36:40 +0200")
References: <1d85506f0608040836g1ccd894ck6b4a7b0607e7cd36@mail.gmail.com>
Message-ID: <87r6zwp49c.fsf@qrnik.zagroda>

"tomer filiba" <tomerfiliba at gmail.com> writes:

> once a thread is created, there is no way to kill it *externally*.
> which is a pity, since the thread must be "willing" to die,

Doing that unconditionally is impractical: the thread has no way
to protect itself from being killed at moments when invariants of
shared data are temporarily violated.

I agree that it should not require continuous checking for a
thread-local "ask to terminate" flag spread into all potentially
long-running loops, i.e. it requires a language mechanism. But it
must be temporarily blockable and catchable.

Here is how I think the design should look:
http://www.cs.ioc.ee/tfp-icfp-gpce05/tfp-proc/06num.pdf

This is the same issue as with other asynchronous exceptions like ^C.
What has happened to Freund's & Mitchell's "Safe Asynchronous Exceptions
For Python" <http://www.cs.williams.edu/~freund/papers/02-lwl2.ps>?
My design is an extension of that.

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/

From jcarlson at uci.edu  Sat Aug  5 00:16:33 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Fri, 04 Aug 2006 15:16:33 -0700
Subject: [Python-3000] improved threading in py3k
In-Reply-To: <87r6zwp49c.fsf@qrnik.zagroda>
References: <1d85506f0608040836g1ccd894ck6b4a7b0607e7cd36@mail.gmail.com>
	<87r6zwp49c.fsf@qrnik.zagroda>
Message-ID: <20060804150338.E6DA.JCARLSON@uci.edu>


"Marcin 'Qrczak' Kowalczyk" <qrczak at knm.org.pl> wrote:
> "tomer filiba" <tomerfiliba at gmail.com> writes:
> 
> > once a thread is created, there is no way to kill it *externally*.
> > which is a pity, since the thread must be "willing" to die,
> 
> Doing that unconditionally is impractical: the thread has no way
> to protect itself from being killed at moments it has invariants of
> shared data temporarily violated.
> 
> I agree that it should not require continuous checking for a
> thread-local "ask to terminate" flag spread into all potentially
> long-running loops, i.e. it requires a language mechanism. But it
> must be temporarily blockable and catchable.
> 
> Here is how I think the design should look like:
> http://www.cs.ioc.ee/tfp-icfp-gpce05/tfp-proc/06num.pdf

I did not read all of that paper, but it seems to rely on the
(un)masking of signals in threads, as well as the sending of signals to
'kill' a thread.  One problem is that Windows doesn't really allow the
sending/receiving of any non-process-killing signals, so it would be a
platform-specific feature.

If you want a sample implementation of that kind of thing, SAGE (http://modular.math.washington.edu/sage/) performs signal
masking/unmasking to stop the execution of underlying computation
threads.

 - Josiah


From qrczak at knm.org.pl  Sat Aug  5 12:29:59 2006
From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk)
Date: Sat, 05 Aug 2006 12:29:59 +0200
Subject: [Python-3000] improved threading in py3k
In-Reply-To: <20060804150338.E6DA.JCARLSON@uci.edu> (Josiah Carlson's
	message of "Fri, 04 Aug 2006 15:16:33 -0700")
References: <1d85506f0608040836g1ccd894ck6b4a7b0607e7cd36@mail.gmail.com>
	<87r6zwp49c.fsf@qrnik.zagroda> <20060804150338.E6DA.JCARLSON@uci.edu>
Message-ID: <87ac6j326w.fsf@qrnik.zagroda>

Josiah Carlson <jcarlson at uci.edu> writes:

> I did not read all of that paper, but it seems to rely on the
> (un)masking of signals in threads, as well as the sending of signals
> to 'kill' a thread.

They are not OS signals: it's entirely the matter of the language's
runtime system.

(But Unix signals can be nicely exposed as these signals for the
programmer.)

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/

From robinbryce at gmail.com  Mon Aug  7 17:11:22 2006
From: robinbryce at gmail.com (Robin Bryce)
Date: Mon, 7 Aug 2006 16:11:22 +0100
Subject: [Python-3000] improved threading in py3k
In-Reply-To: <87ac6j326w.fsf@qrnik.zagroda>
References: <1d85506f0608040836g1ccd894ck6b4a7b0607e7cd36@mail.gmail.com>
	<87r6zwp49c.fsf@qrnik.zagroda> <20060804150338.E6DA.JCARLSON@uci.edu>
	<87ac6j326w.fsf@qrnik.zagroda>
Message-ID: <bcf87d920608070811h37e37047r2f0fe49710956303@mail.gmail.com>

On 05/08/06, Marcin 'Qrczak' Kowalczyk <qrczak at knm.org.pl> wrote:
> Josiah Carlson <jcarlson at uci.edu> writes:
>
> > I did not read all of that paper, but it seems to rely on the
> > (un)masking of signals in threads, as well as the sending of signals
> > to 'kill' a thread.
>
> They are not OS signals: it's entirely the matter of the language's
> runtime system.
>

Have you come across the Pi-Calculus ? Every time I see this topic
come up (GIL, threads, concurrency) it seems to founder on the fact[1]
that this can not be solved without language support. This is not
unique to python[2].

The thing that caught my attention with the Pi-Calculus is that it
does not draw artificial lines between os processes, threads,
functional program units or data parameters and it starts out by
demonstrating very clearly why language equivalence (deterministic
automata a == DAb) does not prevent *very* annoying behavioural
differences.

A result of the work (as far as I understood it) is that all can be
treated as equivalent and strong formal tools are given for both
modeling the interactions and proving things like behavioral
equivalence. The book[4] references work done to show this is viable
in interpreted/objecty languages as well as functional ones. Coming
back a little way towards planet earth I remember the last time this
sort of thing came up, someone half-heartedly suggested "active objects
with messaging"[3] and things died off. Python has always struck me as
a language for pragmatists, rather than a place to play about with
esoteric academic curiosities. Maybe someone on this list can pick
something useful to py3k out of Pi-calculus ?

<bait-mode>quoting:http://www.python.org/dev/summary/2005-09-16_2005-09-30.html#concurrency-in-python

Guido threw down the gauntlet: rather than the endless discussion
about this topic, someone should come up with a GIL-free Python (not
necessarily CPython) and demonstrate its worth.


[1]  err, ok I can't locate the paper that shows this but I *swear*
someone far better qualified than me has written one to this effect.
[2] http://www.decadentplace.org.uk/pipermail/cpp-threads/2005-October/000715.html

[3] http://www.python.org/dev/summary/2005-09-16_2005-09-30.html#concurrency-in-python

also, http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/365292

[4] http://www.amazon.com/gp/product/0521658691/ref=si3_rdr_err_product/002-5641420-6196034?ie=UTF8

Cheers,

Robin

From talin at acm.org  Tue Aug  8 18:49:08 2006
From: talin at acm.org (Talin)
Date: Tue, 08 Aug 2006 09:49:08 -0700
Subject: [Python-3000] Set literals - another try
Message-ID: <44D8C084.8090503@acm.org>

Part 1: The concrete proposal part.

I noticed that a lot of folks seemed to like the idea of making the 
empty set resemble the greek letter Phi, using a combination of 
parentheses and the vertical bar or forward slash character.

So let's expand on this: slice Phi in half and say that (| and |) are 
delimiters for a set literal, as follows:

    (|)     # Empty set

    (|a|)   # Set with 1 item

    (|a,b|) # Set with 2 items

The advantage of this proposal is that it maintains visual consistency 
between the 0, 1, and N element cases.
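For reference, here is how the proposed forms line up with the spellings
available in then-current Python (the (|...|) syntax itself is hypothetical
and does not parse):

```python
# Hypothetical proposed form  vs.  today's spelling
empty = set()              # (|)
single = set(['a'])        # (|a|)
pair = set(['a', 'b'])     # (|a,b|)

print(sorted(pair))   # -> ['a', 'b']
```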


Part 2: The idle speculation part, not to be considered as an actual 
proposal.

I've often said that "whenever a programmer has the urge to invent a new 
programming language, that they should lie down on the couch until the 
feeling passes".

One of the reasons for this is that many times, a programmer's 
motivation in creating a new language is not that they actually need a 
new language, but rather as a means of *criticising* an existing 
language. Inventing their own language gives them the opportunity to 
show how they would have done it.

I think that kind of criticism can be valid, and that languages invented 
for this purpose can be useful, as long as you don't actually sit down 
and try to implement the thing.

As a thought experiment, I decided to apply this idea to the Python set 
literal case - i.e. if we were going to do a massive "do over" of 
Python, how would we approach the problem of set literals?

The syntax that comes to mind is something like this:

    a = b|c

Where the vertical bar character means "forms a set with". Larger sets 
could be made using the same syntax:

    a = b|c|c|d

You can also wrap parens around the set if you want:

    a = (b|c)

Like tuples, a set with a single member still requires at least one 
delimiter:

    a = (b|)

And the for the empty set, we're back to phi again:

    a = (|)

However, the parens aren't generally required - the rules are pretty 
much the same as for tuples and the comma operator. Thus, passing sets 
as arguments:

    index = s.find_first_of( 'a'|'b'|'c'|'d' )

Of course, by doing this, we're re-assigning the meaning of the '|' 
operator from 'bitwise or' to 'set construction'. This only makes sense 
if you assume that either (a) set construction is more common than 
bitwise-or operations or (b) you provide some reasonable alternative way 
to express bitwise-or operations. Let's assume that we create some 
reasonable replacement and move on.

Another thing to note is that the set construction operator resembles in 
some ways the "alternative" operator of BNF notation. In the previous 
example, 'find_first_of' looks for the first of the given alternatives.

Since dictionaries are similar to sets, we can represent a dictionary as 
a set of keys and associated values. Dictionary literals already use the 
':' operator to indicate a key - we can continue that with:

    a = ('Monday':1 | 'Tuesday':2 | 'Wednesday':3)

Unlike the current language, however, you can omit the parens:

    a = 'Monday':1 | 'Tuesday':2 | 'Wednesday':3

(This creates a syntax ambiguity with colon, but let's move on :)

One of the fun things about this line of speculation is watching how 
such a tiny change ripples outward, affecting the entire language 
definition. In this case, the change to set construction has much 
farther-reaching effects than what I have described here, assuming that 
you take each effect to its logical conclusion. I find it an enjoyable 
mental exercise :)

-- Talin

From talin at acm.org  Tue Aug  8 18:52:36 2006
From: talin at acm.org (Talin)
Date: Tue, 08 Aug 2006 09:52:36 -0700
Subject: [Python-3000] Range literals
Message-ID: <44D8C154.9020406@acm.org>

I've seen some languages that use a double-dot (..) to mean a range of 
items. This could be syntactic sugar for range(), like so:


    for x in 1..10:
       ...

-- Talin

From jcarlson at uci.edu  Tue Aug  8 19:36:40 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Tue, 08 Aug 2006 10:36:40 -0700
Subject: [Python-3000] Set literals - another try
In-Reply-To: <44D8C084.8090503@acm.org>
References: <44D8C084.8090503@acm.org>
Message-ID: <20060808100536.E706.JCARLSON@uci.edu>


Talin <talin at acm.org> wrote:
> 
> Part 1: The concrete proposal part.
> 
> I noticed that a lot of folks seemed to like the idea of making the 
> empty set resemble the greek letter Phi, using a combination of 
> parentheses and the vertical bar or forward slash character.
> 
> So lets expand on this: slice Phi in half and say that (| and |) are 
> delimiters for a set literal, as follows:
> 
>     (|)     # Empty set
> 
>     (|a|)   # Set with 1 item
> 
>     (|a,b|) # Set with 2 items
> 
> The advantage of this proposal is that it maintains visual consistency 
> between the 0, 1, and N element cases.

That's quite a bit of punctuation to define a set literal.  In fact, for
1+ element sets, it's only 1 character shy of the set() punctuation,
while also being more difficult to type on at least US keyboards.

And if I remember my set math correctly, phi wasn't the character
generally used; it was usually a zero with a diagonal cross through it,
making (/) a better empty set literal.  But from there, the notation
devolves into a place I don't want to go.


> Part 2: The idle speculation part, not to be considered as a actual 
> proposal.
> 
> I've often said that "whenever a programmer has the urge to invent a new 
> programming language, that they should lie down on the couch until the 
> feeling passes".

Presumably you again don't remember the source of this quote, but it is
still applicable.


> As a thought experiment, I decided to apply this idea to the Python set 
> literal case - i.e. if we were going to do a massive "do over" of 
> Python, how would we approach the problem of set literals?
> 
> The syntax that comes to mind is something like this:
> 
>     a = b|c

The pipe character/bitwise or operator doesn't say to me "make a set".

Knowing what I do about set math, the only literal that really makes
sense to me is:
    {a,b,c,...}

With the empty set being:
    {/}

Interestingly enough, the non-empty set case has already been proposed,
and if I remember correctly, was generally liked, except for some
ambiguity with regard to dictionary literals.

I personally don't see much of a use for set literals, considering that
there is a non-ambiguous spelling of it currently; set(...), whose only
cost above and beyond that of a set literal is a global name lookup.  It
is 'different' from some other first-class objects (tuple, list,
dictionary, string, unicode, ...), but other first-class objects also
require such spelling: bool, enumerate, iter, len, property, reduce.
Each of those may be used often enough to justify syntax of its own,
though perhaps only len has an obvious candidate: |obj| -> len(obj).
Then again, |obj| could also mean abs(obj), though presumably an object
would only ever define __len__ or __abs__, not both.  I digress.


-.5 for a set literal syntax at all, -1 for offering your particular set
literal variant, -2 for your change propagating to dictionaries and
beyond.

 - Josiah


From jcarlson at uci.edu  Tue Aug  8 19:44:17 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Tue, 08 Aug 2006 10:44:17 -0700
Subject: [Python-3000] Range literals
In-Reply-To: <44D8C154.9020406@acm.org>
References: <44D8C154.9020406@acm.org>
Message-ID: <20060808104049.E709.JCARLSON@uci.edu>


Talin <talin at acm.org> wrote:
> 
> I've seen some languages that use a double-dot (..) to mean a range of 
> items. This could be syntactic sugar for range(), like so:
> 
> 
>     for x in 1..10:
>        ...

In the pronouncement on PEP 284: http://www.python.org/dev/peps/pep-0284/

    Guido did not buy the premise that the range() format needed fixing,
    "The whole point (15 years ago) of range() was to *avoid* needing syntax
    to specify a loop over numbers. I think it's worked out well and there's
    nothing that needs to be fixed (except range() needs to become an
    iterator, which it will in Python 3.0)."

Unless Guido has decided that range/xrange are the wrong way to do
things, I don't think there is much discussion here.

 - Josiah


From tomerfiliba at gmail.com  Tue Aug  8 20:22:24 2006
From: tomerfiliba at gmail.com (tomer filiba)
Date: Tue, 8 Aug 2006 20:22:24 +0200
Subject: [Python-3000] threading, part 2
Message-ID: <1d85506f0608081122r30f89973paf450514b00dcc92@mail.gmail.com>

let me bring this up anew, as the previous discussion has gone quite off
track.
i know there are many theories/paradigms concerning parallel execution,
some requiring language-level constructs, others being external, and let's
not ever start talking about the GIL.

(on a side note, if i may add my opinion on the subject matter, stackless
python has the best approach to concurrency -- don't lock, yield!)

my previous suggestion asked for a means to raise exceptions in the
context of *other* threads. all it calls for is a new builtin function
that would raise a given exception in the context of a given thread.

there are some points to address:
* native calls -- well, calls into builtin functions can't be interrupted
that way, which is problematic, but not directly related to this proposal.
that's a problem of machine code.

* breaking the thread's state -- that's not really an issue. i'm not talking
about *forcefully* killing the thread, without cleanup.

after all, exceptions can occur anywhere in the code, and at any time...
your code should always be aware of that, with no regard to being
thread-safe.

for example:
def f(a, b):
    return a + b

an innocent function, but now suppose i pass two huge strings... bad input
can cause MemoryError, although unlikely. you can't take care of
*everything*,
you must learn to live with the occasional unexpected exception.

so it may seem brutal to suggest a mechanism that raises exceptions
at arbitrary points in your code-flow, but:
* cleanup will be performed (objects will be reclaimed)
* you can handle it anywhere in the call chain (just as any other exception)
* most of the time, i'd use that to *kill* threads (the ThreadExit exception),
so i don't expect the thread to recover. it should just die silently.
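for what it's worth, CPython already has a C-level function with exactly
this behavior, PyThreadState_SetAsyncExc; a sketch of exposing it through
ctypes (ThreadExit and async_raise are illustrative names, not an existing
API, and this is CPython-specific):

```python
import ctypes
import threading
import time

class ThreadExit(Exception):
    """Illustrative exception asking a thread to unwind and exit."""

def async_raise(tid, exc_type):
    # CPython-only: schedule exc_type to be raised in the thread whose
    # ident is tid, the next time that thread executes bytecode.
    n = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_ulong(tid), ctypes.py_object(exc_type))
    if n == 0:
        raise ValueError("invalid thread id")
    if n > 1:
        # more than one thread state matched; clear and complain
        ctypes.pythonapi.PyThreadState_SetAsyncExc(ctypes.c_ulong(tid), None)
        raise SystemError("set async exc affected %d threads" % n)

log = []

def worker():
    try:
        while True:               # pure-Python loop, so the exception can land
            time.sleep(0.001)
    except ThreadExit:
        log.append("cleanup ran")  # handlers and finally clauses do run

t = threading.Thread(target=worker)
t.start()
time.sleep(0.1)
async_raise(t.ident, ThreadExit)
t.join(5)
print(log)   # -> ['cleanup ran']
```

note that the exception is only delivered between bytecodes, so a thread
blocked inside a native call won't see it until the call returns -- the
same "native calls" caveat as above.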


sounds better now?


-tomer
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060808/ee3aa259/attachment.htm 

From qrczak at knm.org.pl  Tue Aug  8 21:05:24 2006
From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk)
Date: Tue, 08 Aug 2006 21:05:24 +0200
Subject: [Python-3000] threading, part 2
In-Reply-To: <1d85506f0608081122r30f89973paf450514b00dcc92@mail.gmail.com>
	(tomer filiba's message of "Tue, 8 Aug 2006 20:22:24 +0200")
References: <1d85506f0608081122r30f89973paf450514b00dcc92@mail.gmail.com>
Message-ID: <877j1jf3pn.fsf@qrnik.zagroda>

"tomer filiba" <tomerfiliba at gmail.com> writes:

> after all, exceptions can occur anywhere in the code, and at any time...

It's impossible to write safe code when exceptions can occur at any
time, except when you already happen have the needed atomic primitives
available.

Let's say we have a mutable doubly linked list (the list has first
and last pointers, each node has next and prev pointers). Please show
how to append a first node if exceptions can occur at any time. Not
adding the element at all if an asynchronous exception is coming is
acceptable, but corrupting the list structure is not.
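The challenge can be made concrete. A minimal doubly linked list
(illustrative code, not from any message) has exactly the windows Marcin
describes: between the two pointer updates, an asynchronous exception
would leave the structure corrupt, and no try/finally in the appending
code helps, because the exception can arrive between any two bytecodes.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None

class DList:
    def __init__(self):
        self.first = None
        self.last = None

    def append(self, value):
        node = Node(value)
        if self.last is None:
            self.first = node
            # an async exception HERE leaves first set but last None
            self.last = node
        else:
            node.prev = self.last
            self.last.next = node
            # an async exception HERE leaves last pointing at the old tail
            self.last = node

    def values(self):
        out, node = [], self.first
        while node is not None:
            out.append(node.value)
            node = node.next
        return out

d = DList()
d.append(1)
d.append(2)
print(d.values())   # -> [1, 2]
```

Making append safe against an exception arriving between any two lines
requires either masking asynchronous exceptions around the update or an
atomic primitive -- which is exactly the point of the challenge.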

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/

From pje at telecommunity.com  Tue Aug  8 21:30:30 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue, 08 Aug 2006 15:30:30 -0400
Subject: [Python-3000] Cross-interpreter FFI for Python 3000?
In-Reply-To: <A1C50561B5543D45A91A0E5505A79B98058FE731@luke.radius.ad>
Message-ID: <5.1.1.6.0.20060808151352.02604b30@sparrow.telecommunity.com>

[Note: Discussion should move to the python-3000 list]

At 11:28 AM 8/8/2006 -0700, Paul Prescod wrote:
>I'll use up a little bit of my post-conference goodwill to push a
>long-term obsession of mine...using a Python variant as the "standard"
>extension/FFI model for Python (3000). I've heard variants of this idea
>from many people I respect, some of who are cc:ed.
>
>I want to guage interest before doing any next steps. If it's
>preemptively -1 then I won't bother. Therefore I would like to poll the
>assembled brains about the feasibility of using something like
>RPython/Pyrex as an abstraction layer to be compiled to Py2.5 PyObjects,
>Py3000 PyObjects, JNI, .NET, ...
>
>Rationale:
>
>Each Python implementation needs an FFI. Any Python without a C-oriented
>FFI lacks compatibility with C modules like Numeric and PIL. For this
>reason, PyPy re-invented something like Pyrex as RPython.

Just FYI, but if I understand correctly, PyPy is now using the ctypes API 
for its FFI.  Also, RPython is entirely unrelated to Pyrex.  RPython is 
Python with restrictions on how it's used, and doesn't include an FFI of 
its own.

I would suggest that PyPy's use of ctypes, coupled with the inclusion of 
ctypes in the Python 2.5 stdlib, means that ctypes could reasonably be 
considered a de facto standard for a C FFI in Python at this point.  While I 
*like* Pyrex a lot and use it for most extension modules I write, it is 
currently heavily tied to the CPython API, lacks many Python features that 
even RPython allows, invents its own object model for C inheritance and 
imports, and has a lot of quirks due to being "not quite Python" in syntax 
or semantics.  These characteristics are undesirable for a 
cross-interpreter FFI, IMO.

A major advantage of using ctypes as the FFI, however, is that ctypes is a 
library, and thus does not require language or interpreter changes.  This 
means, for example, that a third party could implement a ctypes clone for 
Jython or IronPython without burdening the core developers of those 
interpreters.
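For readers unfamiliar with ctypes' style of FFI, here is a minimal
example of the library-not-language approach described above, calling the
C runtime's strlen (library lookup is platform-dependent; this assumes a
Unix-like system where libc can be located):

```python
import ctypes
import ctypes.util

# Locate and load the C runtime.  If find_library returns None, CDLL(None)
# falls back to the process's global namespace, which also works on Unix.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the foreign function's signature before calling it.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"hello"))   # -> 5
```

No compiler, wrapper module, or interpreter change is involved, which is
the "dynamic advantage" at issue in this thread.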


>  The two are
>obviously not identical but I'm looking at the core idea of a language
>that merges Python and C concepts to achieve a usable extension
>mechanism. I overheard Jim musing about something similar for
>IronPython.
>
>But most important: Python 3000 needs something like Pyrex. Python 3000
>and Python 2.6, 2.7, 2.8 may be arbitrarily different internally. If the
>goal is for it to be "just a bit" incompatible then Guido's design space
>is quite constrained. If it is allowed to be massively incompatible then
>extension authors will scream. The Python 2.x line will co-exist with
>the Python 3000 line for a while, and both with co-exist with
>IronPython, Jython, PyPy and others.

It would probably be best if you catch up on the current work by the PyPy 
team in this area, since my understanding is that PyPy is now able to 
compile "RPython+ctypes" code to create CPython extensions in C.  This 
suggests that it should be possible to build backends for C# and Java, because 
(again, if I understand correctly) the ctypes handling is done at a 
relatively high level of the translation tool chain, such that the backend 
code generators don't need to know anything about ctypes.  Hopefully Armin 
or somebody else will jump in on this point if I'm getting something wrong 
about all that.


>  * it would be simpler to write competitive Python interpreters to test
>out different design ideas...one wouldn't have to worry that such an
>interpreter would be inherently a toy because of the unavailability of
>third-party software

Note that this is also a goal of the PyPy project, and they have many such 
options now, such as "pure" GC and refcounted variants, even if you 
entirely ignore the part where backends can generate code for a variety of 
languages.


From jimjjewett at gmail.com  Tue Aug  8 21:31:37 2006
From: jimjjewett at gmail.com (Jim Jewett)
Date: Tue, 8 Aug 2006 15:31:37 -0400
Subject: [Python-3000] threading, part 2
In-Reply-To: <1d85506f0608081122r30f89973paf450514b00dcc92@mail.gmail.com>
References: <1d85506f0608081122r30f89973paf450514b00dcc92@mail.gmail.com>
Message-ID: <fb6fbf560608081231x179bbbd1y9fd06af48cc64e84@mail.gmail.com>

On 8/8/06, tomer filiba <tomerfiliba at gmail.com> wrote:
> my previous suggestion asked for is a means to raise exceptions in the
> context of *other* threads.

...

> * breaking the thread's state -- that's not really an issue. i'm not talking
> about *forcefully* killing the thread, without cleanup.

This has the same inherent problem as Java's Thread.stop -- that data
shared beyond the thread may be left in an inconsistent state because
the cleanup wasn't done, perhaps because a lock was held.

https://java.sun.com/j2se/1.4.2/docs/guide/misc/threadPrimitiveDeprecation.html

> so it's may seem brute to suggest a mechanism that raises exceptions
> at arbitrary points in your code-flow, but:

If you're willing to forget about native code (and you suggested that
you were), then you could just check[*] every N bytecodes, the way the
interpreter already checks to decide whether it should switch
threads.  Whether the performance overhead is worthwhile is a
different question.

It might be better to just add an example thread to threading.py (or
Queue) that does its processing in a loop, and checks its own stop
variable every time through the loop.
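That cooperative pattern can be sketched with the threading module
(written in present-day spelling; the class and names are illustrative,
not an actual stdlib addition):

```python
import queue
import threading
import time

class StoppableWorker(threading.Thread):
    """Loops over a work queue, re-checking a stop flag each iteration."""

    def __init__(self, work_queue):
        super().__init__()
        self.work = work_queue
        self.results = []
        self._stop_requested = threading.Event()

    def stop(self):
        self._stop_requested.set()

    def run(self):
        while not self._stop_requested.is_set():
            try:
                # Bounded wait, so the stop flag is re-checked even
                # when the queue stays empty.
                item = self.work.get(timeout=0.1)
            except queue.Empty:
                continue
            self.results.append(item * 2)

q = queue.Queue()
w = StoppableWorker(q)
w.start()
for i in range(3):
    q.put(i)
time.sleep(0.5)
w.stop()
w.join()
print(w.results)   # -> [0, 2, 4]
```

The thread only dies at iteration boundaries, so shared data is never
left half-updated -- at the cost of the stop being best-effort rather
than immediate.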

[*] What to do in case of a raise is a bit trickier, of course --
basically, replace the next bytecode with a RAISE_VARARGS bytecode,
but that might violate some current try-except assumptions.

-jJ

From collinw at gmail.com  Tue Aug  8 21:50:20 2006
From: collinw at gmail.com (Collin Winter)
Date: Tue, 8 Aug 2006 15:50:20 -0400
Subject: [Python-3000] Set literals - another try
In-Reply-To: <20060808100536.E706.JCARLSON@uci.edu>
References: <44D8C084.8090503@acm.org> <20060808100536.E706.JCARLSON@uci.edu>
Message-ID: <43aa6ff70608081250m5e2f9b6fm547fe0b0fd265a48@mail.gmail.com>

On 8/8/06, Josiah Carlson <jcarlson at uci.edu> wrote:
> I personally don't see much of a use for set literals, considering that
> there is a non-ambiguous spelling of it currently; set(...), whose only
> cost above and beyond that of a set literal is a global name lookup.

I thought one of the main arguments in favor of set literals is that a
literal form would allow the compiler to perform optimisations that
the set(...) spelling doesn't allow.

Collin Winter

From jcarlson at uci.edu  Tue Aug  8 22:21:39 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Tue, 08 Aug 2006 13:21:39 -0700
Subject: [Python-3000] Set literals - another try
In-Reply-To: <43aa6ff70608081250m5e2f9b6fm547fe0b0fd265a48@mail.gmail.com>
References: <20060808100536.E706.JCARLSON@uci.edu>
	<43aa6ff70608081250m5e2f9b6fm547fe0b0fd265a48@mail.gmail.com>
Message-ID: <20060808131458.E70C.JCARLSON@uci.edu>


"Collin Winter" <collinw at gmail.com> wrote:
> 
> On 8/8/06, Josiah Carlson <jcarlson at uci.edu> wrote:
> > I personally don't see much of a use for set literals, considering that
> > there is a non-ambiguous spelling of it currently; set(...), whose only
> > cost above and beyond that of a set literal is a global name lookup.
> 
> I thought one of the main arguments in favor of set literals is that a
> literal form would allow the compiler to perform optimisations that
> the set(...) spelling doesn't allow.

The optimization argument used to define language syntax seems a bit
like the "tail wagging the dog" cliche.  For immutable literals that are
used a huge number of times (int, tuple, and other immutables), a
literal syntax for compiler optimization makes sense.  But for mutables
(list, dict, etc.), literal syntax is more a convenience than an
optimization: the compiler hasn't historically created the object once and
copied it for re-use, but pushed values onto the stack and called the
relevant build-list bytecode. [1]

 - Josiah


[1]
Python 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import dis
>>> def foo():
...     return [1,2,3]
...
>>> def goo():
...     return (1,2,3)
...
>>> dis.dis(foo)
  2           0 LOAD_CONST               1 (1)
              3 LOAD_CONST               2 (2)
              6 LOAD_CONST               3 (3)
              9 BUILD_LIST               3
             12 RETURN_VALUE
>>> dis.dis(goo)
  2           0 LOAD_CONST               4 ((1, 2, 3))
              3 RETURN_VALUE
>>>


From tjreedy at udel.edu  Tue Aug  8 23:12:25 2006
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 8 Aug 2006 17:12:25 -0400
Subject: [Python-3000] Cross-interpreter FFI for Python 3000?
References: <A1C50561B5543D45A91A0E5505A79B98058FE731@luke.radius.ad>
	<5.1.1.6.0.20060808151352.02604b30@sparrow.telecommunity.com>
Message-ID: <ebaunp$k54$1@sea.gmane.org>

For those as ignorant as I was, FFI does not here mean
Friendly File Interface
Fauna and Flora International
Family Firm Institute
Forsvarets forskningsinstitutt
Film Finances, Inc.
Financial Freedom Institute
Focus on the Family Institute
...
(all but the first from Google)

but Foreign Function Interface
(from the PHP FFI package).

> I would suggest that PyPy's use of ctypes, coupled with the inclusion of
> ctypes in the Python 2.5 stdlib, means that ctypes could reasonably be
> considered a defacto standard for a C FFI in Python at this point.

Intriguing idea.  I know that the Pygame folks, for example, are 
experimenting with rewrapping the SDL (Simple Directmedia Library, the core 
of Pygame) in ctypes.

Terry Jan Reedy




From guido at python.org  Tue Aug  8 23:31:59 2006
From: guido at python.org (Guido van Rossum)
Date: Tue, 8 Aug 2006 14:31:59 -0700
Subject: [Python-3000] Cross-interpreter FFI for Python 3000?
In-Reply-To: <ebaunp$k54$1@sea.gmane.org>
References: <A1C50561B5543D45A91A0E5505A79B98058FE731@luke.radius.ad>
	<5.1.1.6.0.20060808151352.02604b30@sparrow.telecommunity.com>
	<ebaunp$k54$1@sea.gmane.org>
Message-ID: <ca471dc20608081431p22afb2d4td15121228db9c1a@mail.gmail.com>

On 8/8/06, Terry Reedy <tjreedy at udel.edu> wrote:
> > I would suggest that PyPy's use of ctypes, coupled with the inclusion of
> > ctypes in the Python 2.5 stdlib, means that ctypes could reasonably be
> > considered a defacto standard for a C FFI in Python at this point.
>
> Intriguing idea.  I know that the Pygame folks, for example, are
> experimenting with rewrapping the SDL (Simple Directmedia Library, the core
> of Pygame) in ctypes.

Isn't a problem with ctypes that such extensions can no longer
guarantee "no segfaults"? This pretty much completely rules them out
for use in sandboxes such as what Brett Cannon is currently working
on. With hand-written extensions at least you can audit them to decide
whether they are safe enough.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From paul.prescod at xmetal.com  Tue Aug  8 23:45:18 2006
From: paul.prescod at xmetal.com (Paul Prescod)
Date: Tue, 8 Aug 2006 14:45:18 -0700
Subject: [Python-3000] Cross-interpreter FFI for Python 3000?
Message-ID: <A1C50561B5543D45A91A0E5505A79B98058FE9D6@luke.radius.ad>

>...
> 
> Just FYI, but if I understand correctly, PyPy is now using 
> the ctypes API for its FFI.  Also, RPython is entirely 
> unrelated to Pyrex.  RPython is Python with restrictions on 
> how it's used, and doesn't include an FFI of its own.

As you said elsewhere, PyPy can compile an Rpython+rctypes program to a
C file, just as Pyrex does. So I don't understand why you see them as
"entirely unrelated". There are different syntaxes, but the goals are
very similar. Pyrex uses optional type declarations (which are planned
for Python 3000). RPython infers types from rctypes API calls (which
will also be available in Python 3000). Perhaps it would be better if I
dropped the reference to Rpython and merely talked about "extcompiler"
their tool, which is very parallel to the Pyrex compiler?

You make some good points about Pyrex and ctypes. I'd rather explore the
design space after I've heard whether this design direction has the
potential to be fruitful. I infer that you think "yes".

 Paul Prescod

From pje at telecommunity.com  Wed Aug  9 00:40:15 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue, 08 Aug 2006 18:40:15 -0400
Subject: [Python-3000] Cross-interpreter FFI for Python 3000?
In-Reply-To: <A1C50561B5543D45A91A0E5505A79B98058FE9D6@luke.radius.ad>
Message-ID: <5.1.1.6.0.20060808180036.03b61bd8@sparrow.telecommunity.com>

At 02:45 PM 8/8/2006 -0700, Paul Prescod wrote:
>As you said elsewhere, PyPy can compile an Rpython+rctypes program to a
>C file, just as Pyrex does. So I don't understand why you see them as
>"entirely unrelated".

Disclaimer again: I like and use Pyrex; I even built additional support for 
it into setuptools.  Conversely, I've only used ctypes once and am not sure 
I care for its API.  But as a practical matter, these preferences are 
irrelevant; I will end up learning to use ctypes and liking it, and so will 
everybody else, because ctypes' *dynamic* advantage will clean Pyrex's 
clock at the very moment that extcompiler is as easy to use as Pyrex is now.

To summarize the differences, Pyrex is:

* A *Python-like* language, rather than Python
* Invents new inheritance/import facilities
* Imports various bits of syntax from C, including operators, pointers, etc.
* Inherently tied to the CPython API in its implementation
* Has its own system of "header" files for compile-time import/include
* Generates C code directly from Pyrex
* Cannot be executed by standard Python

PyPy's RPython+rctypes is:

* 100% Python, with certain dynamicity constraints
* Is not tied to any particular back end -- it can be translated to C, LLVM 
code, or even JavaScript if you like, as the type inference, annotation, 
and optimization machinery is backend-independent
* Code can be run in a normal Python interpreter if a ctypes library is 
available

The only relationship I see between the two are some overlap in use cases, 
and the letters "R", "P" and "Y" in the names.  :)  In particular, Pyrex 
cannot be used in the interpreter, and I can't see Guido allowing Pyrex's C 
syntax to infect Python-the-language, so this is likely to be a stable 
barrier keeping Pyrex from having this feature, unless Greg or somebody 
else decides to create a Pyrex interpreter, or perhaps an import hook that
translates Pyrex source code to Python bytecode that invokes the ctypes
API.  :)

(Note, by the way, that such an import hook/translator would be equally 
usable in PyPy, instantly making it possible to compile Pyrex to any 
backend supported by PyPy!  I suggest you let that idea sink in for a 
little bit, as it helps to illustrate why making ctypes the standard FFI is 
the One Obvious Way To Do It.)


>There are different syntaxes, but the goals are very similar.

Well, you could say that about Python and Ruby, to name just two.  Syntax 
is important.  :)

But that's also entirely ignoring the wide range of practical issues 
alluded to above, and some more I'll dig into below.


>Pyrex uses optional type declarations (which are planned
>for Python 3000). RPython infers types from rctypes API calls (which
>will also be available in Python 3000).

They're available in Python 2.5, which means code can be written for them 
today.  The dynamic usability of ctypes from interpreted Python means that 
Pyrex will become a historical footnote as soon as the 
RPython+rctypes->CPython translator is practically usable; i.e., when it 
can compete with Pyrex for code generation speed (and speed of generated 
code), installability, documentation, and user community.

At that point, the advantage of being able to debug your C interface using 
the interpreter's ctypes library, and then to compile the code only if/when 
you need to, will be a killer advantage.

IMO, it doesn't make sense to fight that now-inevitable future, either on 
behalf of Pyrex or some imagined "better" alternative; instead, we might as 
well hasten that future's arrival.  We can always provide better syntax for 
ctypes at a later date, the way 'classmethod' and friends arrived in Python 
2.2 but didn't get syntax until 2.4.  If you can't wait that long, write 
that import hook to turn Pyrex source into Python bytecode.  :)


>  Perhaps it would be better if I
>dropped the reference to Rpython and merely talked about "extcompiler"
>their tool, which is very parallel to the Pyrex compiler?

I'm at a bit of a loss as to how to explain how very not-useful that 
comparison is.  I would suggest reading up on PyPy architecture and Pyrex 
architecture a bit.  From an end-user perspective you can compare them as 
things that take Python-looking stuff in and spit C code out, but the devil 
is definitely in the details.  See also the lists I gave above.


>You make some good points about Pyrex and ctypes. I'd rather explore the
>design space after I've heard whether this design direction has the
>potential to be fruitful. I infer that you think "yes".

See http://dirtsimple.org/2005/10/children-of-lesser-python.html for what I 
think.  :)

In that article, I highlighted the absence of a standard Python FFI as 
being a stumbling block to the future evolution of the language, but noted 
that PyPy would likely end up with a solution.  The subsequent emergence of 
ctypes as an FFI shared by CPython and PyPy has already solved this 
problem; it is merely a question of recognizing the fact.

As of Python 2.5, anything else is going to have a serious uphill battle to 
fight -- even if it's something like Pyrex, that at least already *exists* 
and has at least *one* part-time maintainer.  A brand-new FFI invented by 
committee and with nobody yet stepping up to implement or maintain it, 
really has no chance at all.

(This is all IMO, of course, but I find it hard to imagine how anything 
else could succeed.)


From 2006a at usenet.alexanderweb.de  Wed Aug  9 01:08:50 2006
From: 2006a at usenet.alexanderweb.de (Alexander Schremmer)
Date: Wed, 9 Aug 2006 01:08:50 +0200
Subject: [Python-3000] Cross-interpreter FFI for Python 3000?
References: <A1C50561B5543D45A91A0E5505A79B98058FE731@luke.radius.ad>
	<5.1.1.6.0.20060808151352.02604b30@sparrow.telecommunity.com>
	<ebaunp$k54$1@sea.gmane.org>
	<ca471dc20608081431p22afb2d4td15121228db9c1a@mail.gmail.com>
Message-ID: <1ci9un3z1n806.dlg@usenet.alexanderweb.de>

On Tue, 8 Aug 2006 14:31:59 -0700, Guido van Rossum wrote:

> Isn't a problem with ctypes that such extensions can no longer
> guarantee "no segfaults"? 

How would you guarantee the "no segfaults" policy for any other bindings
involved?  In either case, auditing an extension written using ctypes or
rctypes is potentially simpler than looking at Pyrex or C code. (Think of
memory management, ref counting etc.)

> This pretty much completely rules them out for use in sandboxes such 
> as what Brett Cannon is currently working on.

Of course you will have severe problems if you allow somebody to do
unprotected calls to dynamic libraries.  But I am not sure that this is
a problem of using ctypes ... it should be possible to e.g. flag the code
using ctypes classes as being in a different security class than the
user-sandboxed code. Building the barrier on the C level might be too
restrictive in real world applications.

> With hand-written extensions at least you can audit them to decide
> whether they are safe enough.

Please elaborate on that point - why isn't a ctypes extension
"hand-written"?

Kind regards,
Alexander


From pedronis at strakt.com  Wed Aug  9 01:18:06 2006
From: pedronis at strakt.com (Samuele Pedroni)
Date: Wed, 09 Aug 2006 01:18:06 +0200
Subject: [Python-3000] Cross-interpreter FFI for Python 3000?
In-Reply-To: <ca471dc20608081431p22afb2d4td15121228db9c1a@mail.gmail.com>
References: <A1C50561B5543D45A91A0E5505A79B98058FE731@luke.radius.ad>	<5.1.1.6.0.20060808151352.02604b30@sparrow.telecommunity.com>	<ebaunp$k54$1@sea.gmane.org>
	<ca471dc20608081431p22afb2d4td15121228db9c1a@mail.gmail.com>
Message-ID: <44D91BAE.5040507@strakt.com>

Guido van Rossum wrote:
> On 8/8/06, Terry Reedy <tjreedy at udel.edu> wrote:
> 
>>>I would suggest that PyPy's use of ctypes, coupled with the inclusion of
>>>ctypes in the Python 2.5 stdlib, means that ctypes could reasonably be
>>>considered a defacto standard for a C FFI in Python at this point.
>>
>>Intriguing idea.  I know that the Pygame folks, for example, are
>>experimenting with rewrapping the SDL (Simple Directmedia Library, the core
>>of Pygame) in ctypes.
> 
> 
> Isn't a problem with ctypes that such extensions can no longer
> guarantee "no segfaults"? This pretty much completely rules them out
> for use in sandboxes such as what Brett Cannon is currently working
> on. With hand-written extensions at least you can audit them to decide
> whether they are safe enough.

In PyPy's rctypes approach the extensions still get compiled to C code,
and ctypes-like calls get resolved to normal C calls. Although at some
point a ctypes module is going to be exposed by PyPy, such an exposed
ctypes is not a requirement of the rctypes approach at all.
Rctypes gives ctypes-like C gluing to RPython, a different level
from normal application-level full Python. And indeed (although with
rough edges and some missing features at the moment) the PyPy tool-chain can
produce CPython extensions from such RPython+rctypes extension code.


From ncoghlan at gmail.com  Wed Aug  9 12:17:25 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 09 Aug 2006 20:17:25 +1000
Subject: [Python-3000] Set literals - another try
In-Reply-To: <43aa6ff70608081250m5e2f9b6fm547fe0b0fd265a48@mail.gmail.com>
References: <44D8C084.8090503@acm.org> <20060808100536.E706.JCARLSON@uci.edu>
	<43aa6ff70608081250m5e2f9b6fm547fe0b0fd265a48@mail.gmail.com>
Message-ID: <44D9B635.9010200@gmail.com>

Collin Winter wrote:
> On 8/8/06, Josiah Carlson <jcarlson at uci.edu> wrote:
>> I personally don't see much of a use for set literals, considering that
>> there is a non-ambiguous spelling of it currently; set(...), whose only
>> cost above and beyond that of a set literal is a global name lookup.

> I thought one of the main arguments in favor of set literals is that a
> literal form would allow the compiler to perform optimisations that
> the set(...) spelling doesn't allow.

A different way to enable that would be to include a set of non-keyword names 
(a subset of the default builtin namespace) in the language definition that 
the compiler is explicitly permitted to treat as constants if they are not 
otherwise defined in the current lexical scope.

Then constant-folding could turn "len('abcde')" into 5, and "str(3+2)" into 
'5' and "set((1, 2, 3))" into the corresponding set object.
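[Illustrative sketch, not part of the original message: with today's ast module, this kind of builtin folding can be prototyped as a source-to-source pass. The name BuiltinFolder and the FOLDABLE whitelist are mine, purely for illustration.]

```python
import ast

# Hypothetical whitelist of builtin names the compiler may treat as constants.
FOLDABLE = {"len": len, "str": str}

class BuiltinFolder(ast.NodeTransformer):
    """Fold calls like len('abcde') whose arguments are all literals."""
    def visit_Call(self, node):
        self.generic_visit(node)
        if (isinstance(node.func, ast.Name)
                and node.func.id in FOLDABLE
                and not node.keywords
                and all(isinstance(a, ast.Constant) for a in node.args)):
            value = FOLDABLE[node.func.id](*[a.value for a in node.args])
            return ast.copy_location(ast.Constant(value), node)
        return node

def fold(expr):
    tree = BuiltinFolder().visit(ast.parse(expr, mode="eval"))
    return ast.unparse(ast.fix_missing_locations(tree))

print(fold("len('abcde')"))  # 5
print(fold("len(unknown)"))  # len(unknown) -- a non-literal can't be folded
```

A real compiler would additionally have to verify that the name is not shadowed in the current lexical scope, which is exactly the language-definition change proposed above.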

The only thing that would break is hacks like poking an alternate 
implementation of str or set or len into the global namespace from somewhere 
outside the module.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From ncoghlan at gmail.com  Wed Aug  9 12:45:08 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 09 Aug 2006 20:45:08 +1000
Subject: [Python-3000] threading, part 2
In-Reply-To: <fb6fbf560608081231x179bbbd1y9fd06af48cc64e84@mail.gmail.com>
References: <1d85506f0608081122r30f89973paf450514b00dcc92@mail.gmail.com>
	<fb6fbf560608081231x179bbbd1y9fd06af48cc64e84@mail.gmail.com>
Message-ID: <44D9BCB4.5010404@gmail.com>

Jim Jewett wrote:
> On 8/8/06, tomer filiba <tomerfiliba at gmail.com> wrote:
>> my previous suggestion asked for is a means to raise exceptions in the
>> context of *other* threads.
> 
> ...
> 
>> * breaking the thread's state -- that's not really an issue. i'm not talking
>> about *forcefully* killing the thread, without cleanup.
> 
> This has the same inherent problem as Java's Thread.stop -- that data
> shared beyond the thread may be left in an inconsistent state because
> the cleanup wasn't done, perhaps because a lock was held.
> 
> https://java.sun.com/j2se/1.4.2/docs/guide/misc/threadPrimitiveDeprecation.html
> 
>> so it's may seem brute to suggest a mechanism that raises exceptions
>> at arbitrary points in your code-flow, but:
> 
> If you're willing to forget about native code (and you suggested that
> you were), then you could just check[*] every N bytecodes, the way the
> interpreters already checks to decide whether it should switch
> threads.  Whether the performance overhead is worthwhile is a
> different question.

That check is already there:

int PyThreadState_SetAsyncExc(	long id, PyObject *exc)
     Asynchronously raise an exception in a thread. The id argument is the 
thread id of the target thread; exc is the exception object to be raised. This 
function does not steal any references to exc. To prevent naive misuse, you 
must write your own C extension to call this. Must be called with the GIL 
held. Returns the number of thread states modified; if it returns a number 
greater than one, you're in trouble, and you should call it again with exc set 
to NULL to revert the effect. This raises no exceptions. New in version 2.3.

In Python 2.5, you can use ctypes to get at the whole C API from Python code, 
and calling thread.get_ident() in the run() method will allow you to find out 
the thread id of your thread (you'll need to save that value somewhere so 
other code can get at it).

All tomer is really asking for is a method on threading.Thread objects that
uses this existing API to set a builtin ThreadExit exception. The thread 
module would consider a thread finishing with ThreadExit to be 
non-exceptional, so you could easily do:

   th.terminate() # Raise ThreadExit in th's thread of control
   th.join() # Should finish up pretty quickly

Proper resource cleanup would be reliant on correct use of try/finally or with 
statements, but that's the case regardless of whether or not asynchronous 
exceptions are allowed.
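[Illustrative sketch, not part of the original message: the C API described above really is reachable from pure Python via ctypes on CPython. ThreadExit and async_raise are my names, not anything in the stdlib; note that on recent CPython versions the id argument is an unsigned long, and ctypes.pythonapi exists on CPython only.]

```python
import ctypes
import threading
import time

class ThreadExit(Exception):
    """Hypothetical exception a terminate() method might raise."""

def async_raise(tid, exc_type):
    # Wraps PyThreadState_SetAsyncExc as documented above (CPython only).
    n = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_ulong(tid), ctypes.py_object(exc_type))
    if n > 1:
        # Per the docs: more than one thread state modified -- revert.
        ctypes.pythonapi.PyThreadState_SetAsyncExc(ctypes.c_ulong(tid), None)
        raise SystemError("PyThreadState_SetAsyncExc affected %d states" % n)

def worker(state):
    try:
        state["tid"] = threading.get_ident()  # save the id for other code
        while True:
            time.sleep(0.01)  # the exception arrives between bytecodes
    except ThreadExit:
        state["exited"] = True  # real cleanup belongs in finally: blocks

state = {}
th = threading.Thread(target=worker, args=(state,))
th.start()
while "tid" not in state:
    time.sleep(0.01)
async_raise(state["tid"], ThreadExit)  # th.terminate(), in effect
th.join(timeout=5)                     # should finish up pretty quickly
print(state.get("exited"))             # True
```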

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From ncoghlan at gmail.com  Wed Aug  9 12:57:12 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 09 Aug 2006 20:57:12 +1000
Subject: [Python-3000] Cross-interpreter FFI for Python 3000?
In-Reply-To: <5.1.1.6.0.20060808180036.03b61bd8@sparrow.telecommunity.com>
References: <5.1.1.6.0.20060808180036.03b61bd8@sparrow.telecommunity.com>
Message-ID: <44D9BF88.6080705@gmail.com>

Phillip J. Eby wrote:
> (This is all IMO, of course, but I find it hard to imagine how anything 
> else could succeed.)

Having just made the point in another thread that it is possible to use ctypes 
to access the CPython API functions like PyThreadState_SetAsyncExc that have 
been designated "extension module only", I'm one who agrees with you - adding 
ctypes to the standard library effectively adopted it as Python's foreign 
function interface.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From jimjjewett at gmail.com  Wed Aug  9 18:36:24 2006
From: jimjjewett at gmail.com (Jim Jewett)
Date: Wed, 9 Aug 2006 12:36:24 -0400
Subject: [Python-3000] Set literals - another try
In-Reply-To: <44D9B635.9010200@gmail.com>
References: <44D8C084.8090503@acm.org> <20060808100536.E706.JCARLSON@uci.edu>
	<43aa6ff70608081250m5e2f9b6fm547fe0b0fd265a48@mail.gmail.com>
	<44D9B635.9010200@gmail.com>
Message-ID: <fb6fbf560608090936q2df9d04dje5ea25ba835a0dbe@mail.gmail.com>

On 8/9/06, Nick Coghlan <ncoghlan at gmail.com> wrote:
> A different way to enable that would be to include a set of non-keyword names
> (a subset of the default builtin namespace) in the language definition that
> the compiler is explicitly permitted to treat as constants if they are not
> otherwise defined in the current lexical scope.

Realistically, I want my own functions and class definitions to be
treated that way (inlinable) most of the time.  I don't want to start
marking them with "stable".

> The only thing that would break is hacks like poking an alternate
> implementation of str or set or len into the global namespace from somewhere
> outside the module.

So what we need is a module that either rejects changes (after it is
sealed) or at least provides notification (so things can be
recompiled).  In theory, this could even go into python 2.x (though
not as the default), though it is a bit difficult in practice.  (By
the time you can specify an alternative dict factory, it is too late.)
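[Illustrative sketch, not part of the original message: with today's CPython, a "sealed" module can be prototyped without touching the dict factory, by giving the module a subclass of types.ModuleType that rejects rebinding. SealedModule is my name, purely for illustration.]

```python
import types

class SealedModule(types.ModuleType):
    """A module that rejects rebinding of public names once sealed."""
    _sealed = False

    def __setattr__(self, name, value):
        if self._sealed and not name.startswith("_"):
            raise AttributeError(
                "module %r is sealed; cannot rebind %r" % (self.__name__, name))
        super().__setattr__(name, value)

mod = SealedModule("example")
mod.len = len        # fine: not sealed yet
mod._sealed = True   # underscore names stay writable, so we can seal
try:
    mod.len = str    # poking an alternate implementation in from outside
except AttributeError:
    print("rejected")
```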

-jJ

From guido at python.org  Wed Aug  9 20:36:32 2006
From: guido at python.org (Guido van Rossum)
Date: Wed, 9 Aug 2006 11:36:32 -0700
Subject: [Python-3000] Set literals - another try
In-Reply-To: <44D8C084.8090503@acm.org>
References: <44D8C084.8090503@acm.org>
Message-ID: <ca471dc20608091136p3d6ad23fi1b7fab8cf4bdcfde@mail.gmail.com>

On 8/8/06, Talin <talin at acm.org> wrote:
> Part 1: The concrete proposal part.
>
> I noticed that a lot of folks seemed to like the idea of making the
> empty set resemble the greek letter Phi, using a combination of
> parentheses and the vertical bar or forward slash character.
>
> So lets expand on this: slice Phi in half and say that (| and |) are
> delimiters for a set literal, as follows:
>
>     (|)     # Empty set
>
>     (|a|)   # Set with 1 item
>
>     (|a,b|) # Set with 2 items
>
> The advantage of this proposal is that it maintains visual consistency
> between the 0, 1, and N element cases.

-1.

This attempts to solve the lack of an empty set literal in the current
best proposal, which is set(), {1}, {1, 2}, {1, 2, 3} etc. But it does
so at the tremendous cost of inventing new unfamiliar brackets.

> Part 2: The idle speculation part, not to be considered as a actual
> proposal.
[...]
> The syntax that comes to mind is something like this:
>
>     a = b|c

This would be ambiguous since b|c also means set union.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Wed Aug  9 20:43:50 2006
From: guido at python.org (Guido van Rossum)
Date: Wed, 9 Aug 2006 11:43:50 -0700
Subject: [Python-3000] Set literals - another try
In-Reply-To: <fb6fbf560608090936q2df9d04dje5ea25ba835a0dbe@mail.gmail.com>
References: <44D8C084.8090503@acm.org> <20060808100536.E706.JCARLSON@uci.edu>
	<43aa6ff70608081250m5e2f9b6fm547fe0b0fd265a48@mail.gmail.com>
	<44D9B635.9010200@gmail.com>
	<fb6fbf560608090936q2df9d04dje5ea25ba835a0dbe@mail.gmail.com>
Message-ID: <ca471dc20608091143jb4fa170pe6262da44c8165be@mail.gmail.com>

> On 8/9/06, Nick Coghlan <ncoghlan at gmail.com> wrote:
> > A different way to enable that would be to include a set of non-keyword names
> > (a subset of the default builtin namespace) in the language definition that
> > the compiler is explicitly permitted to treat as constants if they are not
> > otherwise defined in the current lexical scope.

Right. This has been considered many times. I would love it if someone
wrote up a PEP for this.

On 8/9/06, Jim Jewett <jimjjewett at gmail.com> wrote:
> Realistically, I want my own functions and class definitions to be
> treated that way (inlinable) most of the time.  I don't want to start
> marking them with "stable".

I'm not sure what you mean here. Inlining user code really isn't on
the table; it's unrealistic to expect this to happen any time soon
(especially since you're likely to want to inline things imported from
other modules too, and methods, etc.).

> > The only thing that would break is hacks like poking an alternate
> > implementation of str or set or len into the global namespace from somewhere
> > outside the module.

The PEP should consider this use case and propose a solution. I'm fine
with requiring a module to write

  len = len

near the top to declare that it wants len patchable.

OTOH for open I think the compiler should *not* inline this as it is
fairly common to monkey-patch it.

> So what we need is a module that either rejects changes (after it is
> sealed) or at least provides notification (so things can be
> recompiled).  In theory, this could even go into python 2.x (though
> not as the default), though it is a bit difficult in practice.  (By
> the time you can specify an alternative dict factory, it is too late.)

Recompilation upon notification seems way over the top; it's not like
anything we currently do or are even considering.

I'd much rather pick one of the following:

(a) if the module doesn't have a global named 'len' and you add one
(e.g. by "m.len = ...") the behavior is undefined

(b) module objects actively reject attempts to inject new globals that
would shadow built-ins in the list that Nick proposes. (BTW having
such a list is a good idea. Requiring the compiler to know about *all*
built-ins is not realistic since some frameworks patch the __builtin__
module.)

PS. Nick, how's the book coming along?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Wed Aug  9 20:45:34 2006
From: guido at python.org (Guido van Rossum)
Date: Wed, 9 Aug 2006 11:45:34 -0700
Subject: [Python-3000] Set literals - another try
In-Reply-To: <43aa6ff70608081250m5e2f9b6fm547fe0b0fd265a48@mail.gmail.com>
References: <44D8C084.8090503@acm.org> <20060808100536.E706.JCARLSON@uci.edu>
	<43aa6ff70608081250m5e2f9b6fm547fe0b0fd265a48@mail.gmail.com>
Message-ID: <ca471dc20608091145v58a57d02y6e483db4b35ed029@mail.gmail.com>

On 8/8/06, Collin Winter <collinw at gmail.com> wrote:
> I thought one of the main arguments in favor of set literals is that a
> literal form would allow the compiler to perform optimisations that
> the set(...) spelling doesn't allow.

Let me clear up this misunderstanding. Optimizations have nothing to
do with it (they would be invalid anyway since sets are mutable). It's
a matter of writing more readable code.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Wed Aug  9 20:53:31 2006
From: guido at python.org (Guido van Rossum)
Date: Wed, 9 Aug 2006 11:53:31 -0700
Subject: [Python-3000] threading, part 2
In-Reply-To: <44D9BCB4.5010404@gmail.com>
References: <1d85506f0608081122r30f89973paf450514b00dcc92@mail.gmail.com>
	<fb6fbf560608081231x179bbbd1y9fd06af48cc64e84@mail.gmail.com>
	<44D9BCB4.5010404@gmail.com>
Message-ID: <ca471dc20608091153m7cc60a68yfc3f002519e93795@mail.gmail.com>

On 8/9/06, Nick Coghlan <ncoghlan at gmail.com> wrote:
> That check is already there:
>
> int PyThreadState_SetAsyncExc(  long id, PyObject *exc)
>      Asynchronously raise an exception in a thread. The id argument is the
> thread id of the target thread; exc is the exception object to be raised. This
> function does not steal any references to exc. To prevent naive misuse, you
> must write your own C extension to call this. Must be called with the GIL
> held. Returns the number of thread states modified; if it returns a number
> greater than one, you're in trouble, and you should call it again with exc set
> to NULL to revert the effect. This raises no exceptions. New in version 2.3.

Note that it is intentionally not directly accessible from Python --
but this can be revised.

> In Python 2.5, you can use ctypes to get at the whole C API from Python code,
> and calling thread.get_ident() in the run() method will allow you to find out
> the thread id of your thread (you'll need to save that value somewhere so
> other code can get at it).
>
> All Tober is really asking for is a method on threading.Thread objects that
> uses this existing API to set a builtin ThreadExit exception. The thread
> module would consider a thread finishing with ThreadExit to be
> non-exceptional, so you could easily do:
>
>    th.terminate() # Raise ThreadExit in th's thread of control
>    th.join() # Should finish up pretty quickly
>
> Proper resource cleanup would be reliant on correct use of try/finally or with
> statements, but that's the case regardless of whether or not asynchronous
> exceptions are allowed.

I'm +0 on this.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim.peters at gmail.com  Wed Aug  9 21:48:58 2006
From: tim.peters at gmail.com (Tim Peters)
Date: Wed, 9 Aug 2006 15:48:58 -0400
Subject: [Python-3000] threading, part 2
In-Reply-To: <ca471dc20608091153m7cc60a68yfc3f002519e93795@mail.gmail.com>
References: <1d85506f0608081122r30f89973paf450514b00dcc92@mail.gmail.com>
	<fb6fbf560608081231x179bbbd1y9fd06af48cc64e84@mail.gmail.com>
	<44D9BCB4.5010404@gmail.com>
	<ca471dc20608091153m7cc60a68yfc3f002519e93795@mail.gmail.com>
Message-ID: <1f7befae0608091248q7f328875x7c2d03723acbf8d2@mail.gmail.com>

[Nick Coghlan]
>> That check is already there:
>>
>> int PyThreadState_SetAsyncExc(  long id, PyObject *exc)
>>      Asynchronously raise an exception in a thread. The id argument is the
>> thread id of the target thread; exc is the exception object to be
raised. This
>> function does not steal any references to exc. To prevent naive misuse, you
>> must write your own C extension to call this. Must be called with the GIL
>> held. Returns the number of thread states modified; if it returns a number
>> greater than one, you're in trouble, and you should call it again
with exc set
>> to NULL to revert the effect. This raises no exceptions. New in version 2.3.

Guido, do you have any idea now what the "number greater than one"
business is about?  That would happen if and only if we found more
than one thread state with the given thread id in the interpreter's
list of thread states, but we're counting those with both the GIL and
the global head_mutex lock held.  My impression has been that it would
be an internal logic error if we ever saw this count exceed 1.

While I'm at it, I expect:

		Py_CLEAR(p->async_exc);
		Py_XINCREF(exc);
		p->async_exc = exc;

is better written:

		Py_XINCREF(exc);
		Py_CLEAR(p->async_exc);
		p->async_exc = exc;

for the same reason one should always incref B before decrefing A in

    A = B

...

>> All Tober is really asking for is a method on threading.Thread objects that
>> uses this existing API to set a builtin ThreadExit exception. The thread
>> module would consider a thread finishing with ThreadExit to be
>> non-exceptional, so you could easily do:
>>
>>    th.terminate() # Raise ThreadExit in th's thread of control
>>    th.join() # Should finish up pretty quickly
>>
>> Proper resource cleanup would be reliant on correct use of
try/finally or with
>> statements, but that's the case regardless of whether or not asynchronous
>> exceptions are allowed.

[Guido]
> I'm +0 on this.

Me too, although it won't stay that simple, and I'm clear as mud on
how implementations other than CPython could implement this.

From guido at python.org  Wed Aug  9 22:39:25 2006
From: guido at python.org (Guido van Rossum)
Date: Wed, 9 Aug 2006 13:39:25 -0700
Subject: [Python-3000] threading, part 2
In-Reply-To: <1f7befae0608091248q7f328875x7c2d03723acbf8d2@mail.gmail.com>
References: <1d85506f0608081122r30f89973paf450514b00dcc92@mail.gmail.com>
	<fb6fbf560608081231x179bbbd1y9fd06af48cc64e84@mail.gmail.com>
	<44D9BCB4.5010404@gmail.com>
	<ca471dc20608091153m7cc60a68yfc3f002519e93795@mail.gmail.com>
	<1f7befae0608091248q7f328875x7c2d03723acbf8d2@mail.gmail.com>
Message-ID: <ca471dc20608091339g46ebced6y18c7a218678fb6d0@mail.gmail.com>

On 8/9/06, Tim Peters <tim.peters at gmail.com> wrote:
> [Nick Coghlan]
> >> That check is already there:
> >>
> >> int PyThreadState_SetAsyncExc(  long id, PyObject *exc)
> >>      Asynchronously raise an exception in a thread. The id argument is the
> >> thread id of the target thread; exc is the exception object to be raised. This
> >> function does not steal any references to exc. To prevent naive misuse, you
> >> must write your own C extension to call this. Must be called with the GIL
> >> held. Returns the number of thread states modified; if it returns a number
> >> greater than one, you're in trouble, and you should call it again with exc set
> >> to NULL to revert the effect. This raises no exceptions. New in version 2.3.
>
> Guido, do you have any idea now what the "number greater than one"
> business is about?  That would happen if and only if we found more
> than one thread state with the given thread id in the interpreter's
> list of thread states, but we're counting those with both the GIL and
> the global head_mutex lock held.  My impression has been that it would
> be an internal logic error if we ever saw this count exceed 1.

Right, I think that's it. I guess I was in a grumpy mood when I wrote
this (and Just & Alex never ended up using it!).

> While I'm at it, I expect:
>
>                 Py_CLEAR(p->async_exc);
>                 Py_XINCREF(exc);
>                 p->async_exc = exc;
>
> is better written:
>
>                 Py_XINCREF(exc);
>                 Py_CLEAR(p->async_exc);
>                 p->async_exc = exc;
>
> for the same reason one should always incref B before decrefing A in
>
>     A = B
>
> ...

The reason being that A and B might already be the same object, right?

> >> All Tober is really asking for is a method on threading.Thread objects that
> >> uses this existing API to set a builtin ThreadExit exception. The thread
> >> module would consider a thread finishing with ThreadExit to be
> >> non-exceptional, so you could easily do:
> >>
> >>    th.terminate() # Raise ThreadExit in th's thread of control
> >>    th.join() # Should finish up pretty quickly
> >>
> >> Proper resource cleanup would be reliant on correct use of try/finally or with
> >> statements, but that's the case regardless of whether or not asynchronous
> >> exceptions are allowed.
>
> [Guido]
> > I'm +0 on this.
>
> Me too, although it won't stay that simple, and I'm clear as mud on
> how implementations other than CPython could implement this.

Another good reason to keep it accessible from the C API only. Now I'm
-0 on adding it. I suggest that if someone really wants this
accessible from Python, they should research how Jython, IronPython,
PyPy and Stackless could handle this, and report their research in a
PEP.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From qrczak at knm.org.pl  Thu Aug 10 00:27:16 2006
From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk)
Date: Thu, 10 Aug 2006 00:27:16 +0200
Subject: [Python-3000] threading, part 2
In-Reply-To: <ca471dc20608091339g46ebced6y18c7a218678fb6d0@mail.gmail.com>
	(Guido van Rossum's message of "Wed, 9 Aug 2006 13:39:25 -0700")
References: <1d85506f0608081122r30f89973paf450514b00dcc92@mail.gmail.com>
	<fb6fbf560608081231x179bbbd1y9fd06af48cc64e84@mail.gmail.com>
	<44D9BCB4.5010404@gmail.com>
	<ca471dc20608091153m7cc60a68yfc3f002519e93795@mail.gmail.com>
	<1f7befae0608091248q7f328875x7c2d03723acbf8d2@mail.gmail.com>
	<ca471dc20608091339g46ebced6y18c7a218678fb6d0@mail.gmail.com>
Message-ID: <871wrp8rzv.fsf@qrnik.zagroda>

"Guido van Rossum" <guido at python.org> writes:

>> for the same reason one should always incref B before decrefing A in
>>
>>     A = B
>>
>> ...
>
> That reason that A and B might already be the same object, right?

Or B might be a subobject of A, not referenced elsewhere.

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/

From talin at acm.org  Thu Aug 10 01:13:05 2006
From: talin at acm.org (Talin)
Date: Wed, 09 Aug 2006 16:13:05 -0700
Subject: [Python-3000] Python/C++ question
Message-ID: <44DA6C01.2040904@acm.org>

A while back someone proposed switching to C++ as the implementation 
language for CPython, and the response was that this would make ABI 
compatibility too difficult, since the different C++ compilers don't 
have a common way to represent things like vtables and such.

However, I was thinking - if you remove all of the ABI-breaking features 
of C++, such as virtual functions, name mangling, RTTI, exceptions, and 
so on, it's still a pretty nice language compared to C - you still have
things like namespaces, constructors/destructors (especially nice for 
stack-local objects), overloadable type conversion, automatic 
upcasting/downcasting, references, plus you don't have to keep repeating 
the word 'struct' everywhere.

Think how much cleaner the Python source would be if just one C++ 
feature - namespaces - could be used. Imagine being able to put all of 
your enumeration values in their own namespace, instead of mixing them 
in with all the other global symbols.

Think of the gazillions of cast operators you could get rid of if you 
could assign from PyString* to PyObject*, without having to explicitly 
cast between pointer types.

My question is, however - would this even work? That is, if you wrapped 
all the source files in 'extern "C"', turned off the exception and RTTI 
compiler switches, suppressed the use of the C++ runtime libs and 
forbade use of the word 'virtual', would that effectively avoid the ABI 
compatibility issues? Would you be able to produce, on all supported 
platforms, a binary executable that was interoperable with ones produced 
by straight C?

I actually have a personal motivation in asking this - it has been so 
many years since I've written in C, that I've actually *forgotten how*. 
Despite the fact that my very first C program, written in 1982, was a C 
compiler, today I find writing C programs a considerable challenge, 
because I don't remember exactly where the dividing line between C and 
C++ is - and I will either end up accidentally using a C++-specific 
language feature, or worse, I'll unconsciously avoid a valid C language 
feature because I don't remember whether it's C++ specific or not. (For 
example, I don't remember whether it's valid to define an enumeration
within a struct, which is something that I do all the time in C++.)

-- Talin

From guido at python.org  Thu Aug 10 01:18:02 2006
From: guido at python.org (Guido van Rossum)
Date: Wed, 9 Aug 2006 16:18:02 -0700
Subject: [Python-3000] Python/C++ question
In-Reply-To: <44DA6C01.2040904@acm.org>
References: <44DA6C01.2040904@acm.org>
Message-ID: <ca471dc20608091618k3cffb3ewa3029794f0f02761@mail.gmail.com>

On 8/9/06, Talin <talin at acm.org> wrote:
> A while back someone proposed switching to C++ as the implementation
> language for CPython, and the response was that this would make ABI
> compatibility too difficult, since the different C++ compilers don't
> have a common way to represent things like vtables and such.
>
> However, I was thinking - if you remove all of the ABI-breaking features
> of C++, such as virtual functions, name mangling, RTTI, exceptions, and
> so on, it's still a pretty nice language compared to C - you still have
> things like namespaces, constructors/destructors (especially nice for
> stack-local objects), overloadable type conversion, automatic
> upcasting/downcasting, references, plus you don't have to keep repeating
> the word 'struct' everywhere.
>
> Think how much cleaner the Python source would be if just one C++
> feature - namespaces - could be used. Imagine being able to put all of
> your enumeration values in their own namespace, instead of mixing them
> in with all the other global symbols.
>
> Think of the gazillions of cast operators you could get rid of if you
> could assign from PyString* to PyObject*, without having to explicitly
> cast between pointer types.
>
> My question is, however - would this even work? That is, if you wrapped
> all the source files in 'extern "C"', turned off the exception and RTTI
> compiler switches, suppressed the use of the C++ runtime libs and
> forbade use of the word 'virtual', would that effectively avoid the ABI
> compatibility issues? Would you be able to produce, on all supported
> platforms, a binary executable that was interoperable with ones produced
> by straight C?
>
> I actually have a personal motivation in asking this - it has been so
> many years since I've written in C, that I've actually *forgotten how*.
> Despite the fact that my very first C program, written in 1982, was a C
> compiler, today I find writing C programs a considerable challenge,
> because I don't remember exactly where the dividing line between C and
> C++ is - and I will either end up accidentally using a C++-specific
> language feature, or worse, I'll unconsciously avoid a valid C language
> feature because I don't remember whether it's C++ specific or not. (For
> example, I don't remember whether it's valid to define an enumeration
> within a struct, which is something that I do all the time in C++.)

For the majority of Python developers it's probably the other way
around. It's been 15 years since I wrote C++, and unlike C, that
language has changed a lot since then...

It would be a complete rewrite; I prefer doing a gradual
transmogrification of the current codebase into Py3k rather than
starting from scratch (read Joel Spolsky on why).

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From collinw at gmail.com  Thu Aug 10 02:32:19 2006
From: collinw at gmail.com (Collin Winter)
Date: Wed, 9 Aug 2006 20:32:19 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
Message-ID: <43aa6ff70608091732o150a6674t4416f4b76d8bc40b@mail.gmail.com>

After letting the discussions from the Spring stew in my head for a
few months, here's my first draft of the proto-PEP for function
annotations. This is intended to lay out in a single document the
basic ideas for function annotations, to get community feedback on the
fundamentals before proceeding to the nitty-gritty. As such, the
implementation section isn't filled out; that's still in progress.
Also, the list of references is incomplete. Both of these will be
completed before the initial submission to the PEP editors.

Without further ado...


PEP: 3XXX
Title: Function Annotations
Version: $Revision: 43251 $
Last-Modified: $Date: 2006-03-23 09:28:55 -0500 (Thu, 23 Mar 2006) $
Author: Collin Winter <collinw at gmail period com>
Discussions-To: python-3000 at python.org
Status: Draft
Type: Standards Track
Requires: 3XXX (Brett Cannon's __signature__ PEP)
Content-Type: text/x-rst
Created: 03-Aug-2006
Python-Version: 3.0
Post-History:


Abstract
========

This PEP introduces a syntax for adding annotations to Python
functions [#func-term#]_.  In addition to annotations for function
parameters, the syntax includes support for annotating a function's
return value(s).

In section one, I outline the "philosophy" and fundamentals needed
to understand function annotations before launching into an
in-depth discussion.

In section two, the syntax for function annotations is presented,
including a full explanation of the changes needed in Python's
grammar.

In section three, I discuss how user code will be able to access
the annotation information.

Section four describes a possible implementation of function
annotations for Python 3.0.

In section five, a C-language API for use by extension modules is
discussed.

Lastly, section six lists a number of ideas that were considered for
inclusion but were ultimately rejected.


Rationale
=========

Because Python's 2.x series lacks a standard way of annotating a
function's parameters and return values (e.g., with information about
what type a function's return value should be), a variety
of tools and libraries have appeared to fill this gap [#tail-examp#]_.
Some utilise the decorators introduced in "PEP 318", while others
parse a function's docstrings, looking for annotations
there.
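The decorator-based approach enabled by PEP 318 can be sketched in a few
lines.  This toy ``expects`` decorator is purely illustrative (all names
here are invented) and merely stands in for the real 2.x-era libraries
mentioned above:

```python
# Toy sketch of a PEP 318-style typechecking decorator, of the kind
# the 2.x-era libraries mentioned above provide. Names are invented.
def expects(*types):
    def decorate(func):
        def wrapper(*args):
            for arg, typ in zip(args, types):
                if not isinstance(arg, typ):
                    raise TypeError("%r is not a %s" % (arg, typ.__name__))
            return func(*args)
        return wrapper
    return decorate

@expects(int, int)
def add(a, b):
    return a + b

print(add(2, 3))  # -> 5
```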

This PEP aims to provide a single, standard way of specifying this
information, reducing the confusion caused by the wide variation in
mechanism and syntax that has existed until this point.


Fundamentals of Function Annotations
====================================

Before launching into a discussion of the precise ins and outs of
Python 3.0's function annotations, let's first talk broadly about
what annotations are and are not:


1. Function annotations, both for parameters and return values, are
   completely optional.


2. Function annotations are nothing more than a way of associating
   arbitrary Python expressions with various parts of a function at
   compile-time.

   Re-read that. Once more.

   By itself, Python does not attach any particular meaning or
   significance to annotations.  Left to its own, Python simply
   takes these expressions and uses them as the values in some
   theoretical parameter-name-to-annotation-expression mapping.

   The only way that annotations take on meaning is when they
   are interpreted by third-party libraries.  These third-party,
   annotation-interpreting libraries (TAILs, for short) can do
   anything they want with a function's annotations.  For
   example, one library might use string-based annotations to provide
   improved help messages, like so:

   ::
        def compile(source: "something compilable",
                    filename: "where the compilable thing comes from",
                    mode: "is this a single statement or a suite?"):
            ...

   Another library might be used to provide typechecking for Python
   functions and methods.  This library could use annotations to
   indicate the function's expected input and return types, possibly
   something like

   ::
        def sum(*vargs: Number) -> Number:
            ...

   where ``Number`` is some description of the protocol for numeric
   types.

   However, neither the strings in the first example nor the
   type information in the second example have any meaning on their
   own;  meaning comes from third-party libraries alone.
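   As a concrete illustration of this "mapping of parameter names to
   evaluated expressions" view, here is a toy TAIL (every name below is
   invented for illustration) that turns the string annotations from
   the ``compile()`` example into a help message:

```python
# Toy TAIL: the annotations are just a name -> expression-value mapping,
# as for the compile() example above; the "meaning" lives entirely here.
annotations = {
    "source": "something compilable",
    "filename": "where the compilable thing comes from",
    "mode": "is this a single statement or a suite?",
}

def help_message(funcname, annotations):
    # Render each (parameter, annotation) pair as one help line.
    lines = ["%s:" % funcname]
    for name, note in sorted(annotations.items()):
        lines.append("  %s -- %s" % (name, note))
    return "\n".join(lines)

print(help_message("compile", annotations))
```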


3. Following from point 2, this PEP makes no attempt to introduce
   any kind of standard semantics, even for the built-in types.
   This work will be left to third-party libraries.

   There is no worry that these libraries will assign semantics at
   random, or that a variety of libraries will appear, each with varying
   semantics and interpretations of what, say, a tuple of strings
   means. The difficulty inherent in writing annotation-interpreting
   libraries will keep their number low and their authorship in the
   hands of people who, frankly, know what they're doing.


Syntax
======

Parameters
----------

Annotations for parameters take the form of optional expressions
that follow the parameter name.  This example indicates that
parameters 'a' and 'c' should both be a ``Number``, while parameter
'b' should be a ``Mapping``:

::
    def foo(a: Number, b: Mapping, c: Number = 5):
        ...

In pseudo-grammar, parameters now look like
``identifier [: expression] [= expression]``.  That is, type
annotations always precede a parameter's default value and both type
annotations and default values are optional.  Just as equal
signs are used to indicate a default value, colons are used to mark
annotations.  All annotation expressions are evaluated at the time
the function is compiled.

Annotations for excess parameters (i.e., *vargs and **kwargs)
are indicated similarly.  In the following function definition,
``*vargs`` is flagged as a list of ``Number``s, and ``**kwargs`` is
marked as a dict whose keys are strings and whose values are
``Sequence``s.

::
    def foo(*vargs: Number, **kwargs: Sequence):
        ...

Note that, depending on what annotation-interpreting library you're
using, the following might also be a valid spelling of the above:

::
    def foo(*vargs: [Number], **kwargs: {str: Sequence}):
        ...

Only the first, however, has the BDFL's blessing [#blessed-excess#]_
as the One Obvious Way.
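Under the blessed spelling, it falls to the annotation-interpreting
library to apply the single annotation element-wise.  A minimal sketch
(using the stdlib's ``numbers.Number`` as a stand-in for the PEP's
hypothetical ``Number`` protocol):

```python
import numbers

def check_vargs(annotation, vargs):
    # This toy TAIL reads a single annotation on *vargs as applying
    # to each excess positional argument individually.
    return all(isinstance(v, annotation) for v in vargs)

print(check_vargs(numbers.Number, (1, 2.5, 3)))  # -> True
print(check_vargs(numbers.Number, (1, "two")))   # -> False
```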


Return Values
-------------

The examples thus far have omitted examples of how to annotate the
type of a function's return value. This is done like so:

::
    def sum(*vargs: Number) -> Number:
        ...


The parameter list can now be followed by a literal ``->`` and
a Python expression.  Like the annotations for parameters, this
expression will be evaluated when the function is compiled.

The pseudo-grammar for function definition is now something like

::
    vargs     = '*'  identifier [':' expression]
    kwargs    = '**' identifier [':' expression]
    parameter = identifier [':' expression] ['=' expression]

    funcdef = 'def' identifier '(' [parameter ',']*
                                   [vargs ',']
                                   [kwargs]
                               ')' ['->' expression] ':' suite


For a complete discussion of the changes to Python's grammar, see
the section `Grammar Changes`_.


Accessing Function Annotations
==============================

Once compiled, a function's annotations are available via the
function's ``__signature__`` attribute, introduced by PEP 3XXX.
Signature objects include an attribute just for annotations,
appropriately called ``annotations``.  This attribute is a
dictionary, mapping parameter names to an object representing
the evaluated annotation expression.

There is a special key in the ``annotations`` mapping, ``"return"``.
This key is present only if an annotation was supplied for the
function's return value.

For example, the following annotation:

::
    def foo(a: Number, b: 5 + 6, c: list) -> String:
        ...

would result in a ``__signature__.annotations`` mapping of

::
    {'a': Number,
     'b': 11,
     'c': list,
     'return': String}


The ``return`` key was chosen because it cannot conflict with
the name of a parameter;  any attempt to use ``return`` as a
parameter name would result in a ``SyntaxError``.
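As a historical note, the ``__signature__`` mechanism described here was
never implemented as specified; in Python 3 as eventually released, the
same mapping is exposed directly as the function's ``__annotations__``
attribute, which lets the example above be checked directly (string
placeholders stand in for the undefined ``Number`` and ``String`` names):

```python
# In released Python 3 the mapping lives on func.__annotations__ rather
# than on a __signature__ object; the contents match the example above.
# Note 'b': the expression 5 + 6 is evaluated, yielding 11.
def foo(a: "Number", b: 5 + 6, c: list) -> "String":
    ...

print(foo.__annotations__)
```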


Implementation
==============

XXX This is all very much TODO.  Beyond the obvious changes to Python's
grammar, the eventual implementation will probably involve a change to
the MAKE_FUNCTION opcode, though the details haven't been fully worked
out yet.

I'm still working on a sample implementation that works separately from
the __signature__ mechanism.


API for Annotations in C-language Extension Modules
===================================================

XXX TODO

This will probably involve macros around CPython API calls to set
and fetch the annotation expression for a given parameter.


Rejected Proposals
==================

+ The BDFL rejected the author's idea for a special syntax for adding
  annotations to generators as being "too ugly" [#reject-gen-syn#]_.

+ Though discussed early on ([#thread-gen#]_, [#thread-hof#]_),
  including special objects in the stdlib for annotating generator
  functions and higher-order functions was ultimately rejected as
  being more appropriate for third-party libraries:  including them
  in the standard library raised too many thorny issues.

+ Despite considerable discussion about a standard type parameterisation
  syntax, it was decided that this should also be left to third-party
  libraries. ([#thread-imm-list#]_, [#thread-mixing#]_,
  [#emphasis-tpls#]_)


Footnotes
=========

.. _[#func-term#] - Unless specifically stated, "function" is
   generally used as a synonym for "callable" throughout this
   document.

.. _[#tail-examp#] - The author's typecheck_ library makes use of
   decorators, while `Maxime Bourget's own typechecker`_ utilises parsed
   docstrings.


References
##########

.. _[#blessed-excess#] -
        http://mail.python.org/pipermail/python-3000/2006-May/002173.html

.. _[#reject-gen-syn#] -
        http://mail.python.org/pipermail/python-3000/2006-May/002103.html

.. _typecheck -
        http://oakwinter.com/code/typecheck/

.. _Maxime Bourget's own typechecker -
        http://maxrepo.info/taxonomy/term/3,6/all

.. _[#thread-gen#] -
        http://mail.python.org/pipermail/python-3000/2006-May/002091.html

.. _[#thread-hof#] -
        http://mail.python.org/pipermail/python-3000/2006-May/001972.html

.. _[#thread-imm-list#] -
        http://mail.python.org/pipermail/python-3000/2006-May/002105.html

.. _[#thread-mixing#] -
        http://mail.python.org/pipermail/python-3000/2006-May/002209.html

.. _[#emphasis-tpls#] -
        http://mail.python.org/pipermail/python-3000/2006-June/002438.html

From talin at acm.org  Thu Aug 10 02:51:02 2006
From: talin at acm.org (Talin)
Date: Wed, 09 Aug 2006 17:51:02 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <43aa6ff70608091732o150a6674t4416f4b76d8bc40b@mail.gmail.com>
References: <43aa6ff70608091732o150a6674t4416f4b76d8bc40b@mail.gmail.com>
Message-ID: <44DA82F6.5030907@acm.org>

Collin Winter wrote:
>    There is no worry that these libraries will assign semantics at
>    random, or that a variety of libraries will appear, each with varying
>    semantics and interpretations of what, say, a tuple of strings
>    means. The difficulty inherent in writing annotation interpreting
>    libraries will keep their number low and their authorship in the
>    hands of people who, frankly, know what they're doing.

I find this assumption extremely dubious.

> In pseudo-grammar, parameters now look like
> ``identifier [: expression] [= expression]``.  That is, type
> annotations always precede a parameter's default value and both type
> annotations and default values are optional.  Just like how equal
> signs are used to indicate a default value, colons are used to mark
> annotations.  All annotation expressions are evaluated at the time
> the function is compiled.

Only one annotation per parameter? What if I want to specify both a 
docstring *and* a type constraint?

-- Talin

From collinw at gmail.com  Thu Aug 10 03:02:08 2006
From: collinw at gmail.com (Collin Winter)
Date: Wed, 9 Aug 2006 21:02:08 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <44DA82F6.5030907@acm.org>
References: <43aa6ff70608091732o150a6674t4416f4b76d8bc40b@mail.gmail.com>
	<44DA82F6.5030907@acm.org>
Message-ID: <43aa6ff70608091802sc2cd03bg9c43a237bcf13d8@mail.gmail.com>

On 8/9/06, Talin <talin at acm.org> wrote:
> Collin Winter wrote:
> >    There is no worry that these libraries will assign semantics at
> >    random, or that a variety of libraries will appear, each with varying
> >    semantics and interpretations of what, say, a tuple of strings
> >    means. The difficulty inherent in writing annotation interpreting
> >    libraries will keep their number low and their authorship in the
> >    hands of people who, frankly, know what they're doing.
>
> I find this assumption extremely dubious.

Why? This is something Guido and I have discussed and agreed on.

What's your reasoning?

> > In pseudo-grammar, parameters now look like
> > ``identifier [: expression] [= expression]``.  That is, type
> > annotations always precede a parameter's default value and both type
> > annotations and default values are optional.  Just like how equal
> > signs are used to indicate a default value, colons are used to mark
> > annotations.  All annotation expressions are evaluated at the time
> > the function is compiled.
>
> Only one annotation per parameter? What if I want to specify both a
> docstring *and* a type constraint?

If the grammar were something like ``identifier [: expression]* [=
expression]`` instead, it would be possible to add multiple
annotations to parameters. But what of the return value? Would you
want to write

def foo() -> Number -> "total number of frobnications":
...

I wouldn't.

The way to make this explicit, if you need it, would be something like this:

def bar(a: ("number of whatzits", Number)) -> ("frobnication count", Number):

then use a decorator to determine which annotation-interpreting
libraries are assigned which annotations, something like this,
perhaps:

@chain(annotation_as_docstring, annotation_as_type)
def bar(a: ("number of whatzits", Number)) -> ("frobnication count", Number):
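The ``chain()`` helper here is hypothetical; a runnable sketch of the
idea (every name below is invented for illustration) might hand each
interpreter its slot of every tuple annotation:

```python
# Hypothetical sketch of chain(): slot i of each tuple annotation goes
# to annotation interpreter i. All names are invented for illustration.
def chain(*interpreters):
    def decorate(func):
        for i, interp in enumerate(interpreters):
            # Give each interpreter its slot of every tuple annotation.
            sliced = {name: ann[i]
                      for name, ann in func.__annotations__.items()}
            func = interp(func, sliced)
        return func
    return decorate

def annotation_as_docstring(func, notes):
    func.__doc__ = "; ".join("%s: %s" % kv for kv in sorted(notes.items()))
    return func

def annotation_as_type(func, types):
    func.type_notes = types  # a real TAIL might wrap func with checks
    return func

@chain(annotation_as_docstring, annotation_as_type)
def bar(a: ("number of whatzits", int)) -> ("frobnication count", int):
    return a + 1

print(bar.__doc__)
```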


Collin Winter

From tim.peters at gmail.com  Thu Aug 10 03:38:20 2006
From: tim.peters at gmail.com (Tim Peters)
Date: Wed, 9 Aug 2006 21:38:20 -0400
Subject: [Python-3000] threading, part 2
In-Reply-To: <ca471dc20608091339g46ebced6y18c7a218678fb6d0@mail.gmail.com>
References: <1d85506f0608081122r30f89973paf450514b00dcc92@mail.gmail.com>
	<fb6fbf560608081231x179bbbd1y9fd06af48cc64e84@mail.gmail.com>
	<44D9BCB4.5010404@gmail.com>
	<ca471dc20608091153m7cc60a68yfc3f002519e93795@mail.gmail.com>
	<1f7befae0608091248q7f328875x7c2d03723acbf8d2@mail.gmail.com>
	<ca471dc20608091339g46ebced6y18c7a218678fb6d0@mail.gmail.com>
Message-ID: <1f7befae0608091838u594de27ctb83dd0845ccaa0@mail.gmail.com>

[back and forth on PyThreadState_SetAsyncExc()]

[Tim]
>> Guido, do you have any idea now what the "number greater than one"
>> business is about?
>> ...
>> My impression has been that it would be an internal logic error if we
>> ever saw this count exceed 1.

[Guido]
> Right, I think that's it. I guess I was in a grumpy mood when I wrote
> this

I forgot that we talked about this close to two years ago:

    http://www.python.org/sf/1069160

As comments there say, it's still the case that it's clearly possible
to provoke this into deadlocking (but unlikely if you're not
deliberately trying to).

> (and Just & Alex never ended up using it!).

They spoke for themselves on this matter in that bug report ;-)

>> While I'm at it, I expect:
>>
>>                 Py_CLEAR(p->async_exc);
>>                 Py_XINCREF(exc);
>>                 p->async_exc = exc;
>>
>> is better written:
>>
>>                 Py_XINCREF(exc);
>>                 Py_CLEAR(p->async_exc);
>>                 p->async_exc = exc;
>>
>> for the same reason one should always incref B before decrefing A in
>>
>>     A = B
>>
>> ...

> The reason is that A and B might already be the same object, right?

Right, or that B's only owned reference is on a chain only reachable
from A, and in either case A's incoming refcount is 1.  The suggested
deadlock-avoiding rewrite in the patch comment addresses that too.

...

>>> I'm +0 on [exposing] this [from Python].

>> Me too, although it won't stay that simple, and I'm clear as mud on
>> how implementations other than CPython could implement this.

> Another good reason to keep it accessible from the C API only. Now I'm
> -0 on adding it. I suggest that if someone really wants this
> accessible from Python, they should research how Jython, IronPython,
> PyPy and Stackless could handle this, and report their research in a
> PEP.

As a full-blown language feature, I'm -1 unless that work is done
first.  I'm still +0 on adding it to CPython if it's given a
leading-underscore name and docs to make clear that it's a
CPython-specific hack that may never work under any other
implementation.

From greg.ewing at canterbury.ac.nz  Thu Aug 10 04:47:48 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 10 Aug 2006 14:47:48 +1200
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <43aa6ff70608091732o150a6674t4416f4b76d8bc40b@mail.gmail.com>
References: <43aa6ff70608091732o150a6674t4416f4b76d8bc40b@mail.gmail.com>
Message-ID: <44DA9E54.5020105@canterbury.ac.nz>

Collin Winter wrote:
>    one library might use string-based annotations to provide
>    improved help messages, like so:
>
>         def compile(source: "something compilable",
>                     filename: "where the compilable thing comes from",
>                     mode: "is this a single statement or a suite?"):
> 
>    Another library might be used to provide typechecking for Python
>    functions and methods.
>
>         def sum(*vargs: Number) -> Number:
>             ...

And what are you supposed to do if you want to write
a function that has improved help messages *and*
type checking?

>    The difficulty inherent in writing annotation interpreting
>    libraries will keep their number low and their authorship in the
>    hands of people who, frankly, know what they're doing.

Even if there are only two of them, they can still
conflict.

I think the idea of having totally undefined
annotations is fundamentally flawed.

--
Greg

From greg.ewing at canterbury.ac.nz  Thu Aug 10 04:49:55 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 10 Aug 2006 14:49:55 +1200
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <43aa6ff70608091802sc2cd03bg9c43a237bcf13d8@mail.gmail.com>
References: <43aa6ff70608091732o150a6674t4416f4b76d8bc40b@mail.gmail.com>
	<44DA82F6.5030907@acm.org>
	<43aa6ff70608091802sc2cd03bg9c43a237bcf13d8@mail.gmail.com>
Message-ID: <44DA9ED3.3040304@canterbury.ac.nz>

Collin Winter wrote:
> On 8/9/06, Talin <talin at acm.org> wrote:
> 
>>Collin Winter wrote:
>>
>>>   The difficulty inherent in writing annotation interpreting
>>>   libraries will keep their number low and their authorship in the
>>>   hands of people who, frankly, know what they're doing.
>>
>>I find this assumption extremely dubious.
> 
> Why? This is something Guido and I have discussed and agreed on.

It smells like something akin to security by
obscurity to me.

--
Greg

From collinw at gmail.com  Thu Aug 10 04:58:55 2006
From: collinw at gmail.com (Collin Winter)
Date: Wed, 9 Aug 2006 22:58:55 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <44DA9E54.5020105@canterbury.ac.nz>
References: <43aa6ff70608091732o150a6674t4416f4b76d8bc40b@mail.gmail.com>
	<44DA9E54.5020105@canterbury.ac.nz>
Message-ID: <43aa6ff70608091958u2d00db76s48260853942bed32@mail.gmail.com>

On 8/9/06, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Collin Winter wrote:
> >    one library might use string-based annotations to provide
> >    improved help messages, like so:
> >
> >         def compile(source: "something compilable",
> >                     filename: "where the compilable thing comes from",
> >                     mode: "is this a single statement or a suite?"):
> >
> >    Another library might be used to provide typechecking for Python
> >    functions and methods.
> >
> >         def sum(*vargs: Number) -> Number:
> >             ...
>
> And what are you supposed to do if you want to write
> a function that has improved help messages *and*
> type checking?

I already answered this in my response to Talin.

The next draft will address this directly.

> >    The difficulty inherent in writing annotation interpreting
> >    libraries will keep their number low and their authorship in the
> >    hands of people who, frankly, know what they're doing.
>
> Even if there are only two of them, they can still
> conflict.

No-one is arguing that there won't be conflicting ideas about how to
spell different annotations; just look at the number of
interface/role/typeclass/whatever implementations.

The idea is that each developer can pick the notation/semantics that's
most natural to them. I'll go even further: say one library offers a
semantics you find handy for task A, while another library's ideas
about type annotations are best suited for task B. Without a single
standard, you're free to mix and match these libraries to give you a
combination that allows you to best express the ideas you're going
for.

Collin Winter

From guido at python.org  Thu Aug 10 06:14:03 2006
From: guido at python.org (Guido van Rossum)
Date: Wed, 9 Aug 2006 21:14:03 -0700
Subject: [Python-3000] threading, part 2
In-Reply-To: <1f7befae0608091838u594de27ctb83dd0845ccaa0@mail.gmail.com>
References: <1d85506f0608081122r30f89973paf450514b00dcc92@mail.gmail.com>
	<fb6fbf560608081231x179bbbd1y9fd06af48cc64e84@mail.gmail.com>
	<44D9BCB4.5010404@gmail.com>
	<ca471dc20608091153m7cc60a68yfc3f002519e93795@mail.gmail.com>
	<1f7befae0608091248q7f328875x7c2d03723acbf8d2@mail.gmail.com>
	<ca471dc20608091339g46ebced6y18c7a218678fb6d0@mail.gmail.com>
	<1f7befae0608091838u594de27ctb83dd0845ccaa0@mail.gmail.com>
Message-ID: <ca471dc20608092114j43071728nbd660d182d065316@mail.gmail.com>

On 8/9/06, Tim Peters <tim.peters at gmail.com> wrote:
> [back and forth on PyThreadState_SetAsyncExc()]
>
> [Tim]
> >> Guido, do you have any idea now what the "number greater than one"
> >> business is about?
> >> ...
> >> My impression has been that it would be an internal logic error if we
> >> ever saw this count exceed 1.
>
> [Guido]
> > Right, I think that's it. I guess I was in a grumpy mood when I wrote
> > this
>
> I forgot that we talked about this close to two years ago:
>
>     http://www.python.org/sf/1069160
>
> As comments there say, it's still the case that it's clearly possible
> to provoke this into deadlocking (but unlikely if you're not
> deliberately trying to).
>
> > (and Just & Alex never ended up using it!).
>
> They spoke for themselves on this matter in that bug report ;-)
>
> >> While I'm at it, I expect:
> >>
> >>                 Py_CLEAR(p->async_exc);
> >>                 Py_XINCREF(exc);
> >>                 p->async_exc = exc;
> >>
> >> is better written:
> >>
> >>                 Py_XINCREF(exc);
> >>                 Py_CLEAR(p->async_exc);
> >>                 p->async_exc = exc;
> >>
> >> for the same reason one should always incref B before decrefing A in
> >>
> >>     A = B
> >>
> >> ...
>
> > The reason is that A and B might already be the same object, right?
>
> Right, or that B's only owned reference is on a chain only reachable
> from A, and in either case A's incoming refcount is 1.  The suggested
> deadlock-avoiding rewrite in the patch comment addresses that too.

So why didn't we check that in?

> ...
>
> >>> I'm +0 on [exposing] this [from Python].
>
> >> Me too, although it won't stay that simple, and I'm clear as mud on
> >> how implementations other than CPython could implement this.
>
> > Another good reason to keep it accessible from the C API only. Now I'm
> > -0 on adding it. I suggest that if someone really wants this
> > accessible from Python, they should research how Jython, IronPython,
> > PyPy and Stackless could handle this, and report their research in a
> > PEP.
>
> As a full-blown language feature, I'm -1 unless that work is done
> first.  I'm still +0 on adding it to CPython if it's given a
> leading-underscore name and docs to make clear that it's a
> CPython-specific hack that may never work under any other
> implementation.

Fine with me then. In 2.5? 2.6? Or py3k? (This is the py3k list.)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From paul at prescod.net  Thu Aug 10 07:19:03 2006
From: paul at prescod.net (Paul Prescod)
Date: Wed, 9 Aug 2006 22:19:03 -0700
Subject: [Python-3000] Ctypes as cross-interpreter C calling interface
Message-ID: <1cb725390608092219v695b7f24t92534d3aa444ca8c@mail.gmail.com>

Thanks to everyone who contributed. It seems that the emerging consensus
(bar a security question from Guido) is that ctypes is the way forward for
calling C code in Python 3000. I'd like to clarify what this might mean:

1. Is ctypes and pure python fast enough for most real-world extension
modules like PyOpenGL, PyExpat, Tkinter, and socket programming? I know that
experimentation is ongoing. Are any results in?

 2. If not, will Python 3000's build or runtime system use some kind of
optimization technique such as static compilation (e.g. extcompiler[1]) or
JIT compilation to allow parts of its library (especially new parts) to be
written using ctypes instead of C?

 3. Presuming that the performance issue can be worked out one way or
another, are there arguments in favour of interpreter-specific C-coded
extensions other than those doing explicitly interpreter-specific stuff (e.g.
tweaking the GC)?

 4. Will the Python 3000 standard library start to migrate towards ctypes
(for new extensions)?

 Paul Prescod

[1] http://codespeak.net/pypy/dist/pypy/doc/extcompiler.html
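For reference, the flavour of binding under discussion looks like this
with ctypes (calling libc's ``strlen``; this assumes a Unix-like
platform where ``find_library("c")`` can locate the C library):

```python
import ctypes
import ctypes.util

# Minimal ctypes sketch: bind and call libc's strlen from pure Python.
# Assumes a Unix-like platform where find_library("c") succeeds.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"hello"))  # -> 5
```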
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060809/3b2d1dd1/attachment.html 

From krstic at solarsail.hcs.harvard.edu  Thu Aug 10 07:32:38 2006
From: krstic at solarsail.hcs.harvard.edu (Ivan Krstic)
Date: Thu, 10 Aug 2006 01:32:38 -0400
Subject: [Python-3000] threading, part 2
In-Reply-To: <ca471dc20608092114j43071728nbd660d182d065316@mail.gmail.com>
References: <1d85506f0608081122r30f89973paf450514b00dcc92@mail.gmail.com>	<fb6fbf560608081231x179bbbd1y9fd06af48cc64e84@mail.gmail.com>	<44D9BCB4.5010404@gmail.com>	<ca471dc20608091153m7cc60a68yfc3f002519e93795@mail.gmail.com>	<1f7befae0608091248q7f328875x7c2d03723acbf8d2@mail.gmail.com>	<ca471dc20608091339g46ebced6y18c7a218678fb6d0@mail.gmail.com>	<1f7befae0608091838u594de27ctb83dd0845ccaa0@mail.gmail.com>
	<ca471dc20608092114j43071728nbd660d182d065316@mail.gmail.com>
Message-ID: <44DAC4F6.3010002@solarsail.hcs.harvard.edu>

Guido van Rossum wrote:
> Fine with me then. In 2.5? 2.6? Or py3k? (This is the py3k list.)

FWIW, we'll ship 2.5 on the OLPC (laptop.org) machines, and it looks
like we'll need this. It'd be useful to have it directly in CPython, so
people running our software outside the laptops don't have to fuss with
an extension.

-- 
Ivan Krstic <krstic at solarsail.hcs.harvard.edu> | GPG: 0x147C722D

From pje at telecommunity.com  Thu Aug 10 08:28:23 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu, 10 Aug 2006 02:28:23 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <mailman.33865.1155187962.27774.python-3000@python.org>
Message-ID: <5.1.1.6.0.20060810021302.0262fcd0@sparrow.telecommunity.com>

At 14:47 8/10/2006 +1200, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
>And what are you supposed to do if you want to write
>a function that has improved help messages *and*
>type checking?

Create a type annotation object that wraps multiple objects -- or better 
yet, use a list or tuple of annotations.  (See below.)


> >    The difficulty inherent in writing annotation interpreting
> >    libraries will keep their number low and their authorship in the
> >    hands of people who, frankly, know what they're doing.
>
>Even if there are only two of them, they can still
>conflict.
>
>I think the idea of having totally undefined
>annotations is fundamentally flawed.

No, your assumption is fundamentally flawed.  ;-)  This is a trivial 
application of overloaded functions.

In PEAK, there is a similar concept called "attribute metadata" that can be 
applied to the attributes of a class.  A single overloaded function called 
"declareAttribute" is used to "declare" the metadata.  These metadata 
annotations can be anything you want.  Certain PEAK frameworks use them for 
security declarations.  Others use them to mark an attribute as providing a 
certain interface for child components, to describe the attribute's syntax 
for parsing or formatting, and so on.

There is no predefined semantics for these metadata objects -- none 
whatsoever.  Each framework that needs a new kind of metadata object simply 
defines a class that holds whatever metadata is desired, and adds a method 
to the "declareAttribute" function to handle objects of that type.  The 
added method can do anything: modify the class or descriptor in some way, 
register something in a registry, or whatever else you want it to do.

In addition, the declareAttribute function comes with predefined methods 
for processing tuples and lists by iterating over them and calling 
declareAttribute recursively.  This makes it easy to combine groups of 
metadata objects and reuse them.

So I see no problems with this concept that overloaded functions don't 
trivially solve.  Any operation you want to perform on function annotations 
need only be implemented as an overloaded function, and there is then no 
conflict to worry about.

For example, if you are writing a documentation tool that needs to generate 
a short HTML string for an annotation, you just create an overloaded 
function for that.  Then somebody using the documentation tool with 
arbitrary type annotation frameworks (e.g. their own) can just add methods 
to the documentation tool's overloaded functions to support that.

Indeed, many a time I've wished that epydoc was written using overloaded 
functions, as it then would've been easy to extend it to gracefully handle 
PEAK's more esoteric descriptors and metaclasses.
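
A rough sketch of the dispatch pattern described above, using
functools.singledispatch from today's standard library as a stand-in for
PEAK's overloaded functions (SecurityTag and the registry argument are
invented for illustration, not PEAK APIs):

```python
from functools import singledispatch

class SecurityTag:
    """Hypothetical metadata object, analogous to a PEAK security declaration."""
    def __init__(self, level):
        self.level = level

@singledispatch
def declare_attribute(metadata, registry):
    # No predefined semantics: unknown metadata types are an error.
    raise TypeError("no handler registered for %r" % type(metadata))

@declare_attribute.register(SecurityTag)
def _(metadata, registry):
    registry.append(("security", metadata.level))

# Predefined handling for tuples/lists: recurse, so metadata can be grouped.
@declare_attribute.register(tuple)
@declare_attribute.register(list)
def _(metadata, registry):
    for item in metadata:
        declare_attribute(item, registry)

registry = []
declare_attribute([SecurityTag("admin"), (SecurityTag("user"),)], registry)
print(registry)  # [('security', 'admin'), ('security', 'user')]
```

Each new framework adds a method for its own metadata class; no two
frameworks' annotations can conflict, because each is handled by its own
registered implementation.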


From behnel_ml at gkec.informatik.tu-darmstadt.de  Thu Aug 10 09:28:12 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Thu, 10 Aug 2006 09:28:12 +0200
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <43aa6ff70608091732o150a6674t4416f4b76d8bc40b@mail.gmail.com>
References: <43aa6ff70608091732o150a6674t4416f4b76d8bc40b@mail.gmail.com>
Message-ID: <44DAE00C.400@gkec.informatik.tu-darmstadt.de>

Hi,

Collin Winter wrote:
>         def compile(source: "something compilable",
>                     filename: "where the compilable thing comes from",
>                     mode: "is this a single statement or a suite?"):
>             ...
> 
>         def sum(*vargs: Number) -> Number:
>             ...

Admittedly, I'm not so much in the "Spring stew" discussion, but I'm not a big
fan of cluttering up my function signature with "make them short to make them
fit" comments.

What would be wrong with adding a standard decorator for this purpose? Something
like:

   @type_annotate("This is a filename passed as string", filename = str)
   @type_annotate(source = str)
   def compile(source, filename, mode):
      ...

or, more explicitly:

   @arg_docstring(filename = "This is a filename passed as string")
   @arg_type(filename = str)
   @arg_type(source = str)
   def compile(source, filename, mode):
      ...
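
A minimal sketch of what such an arg_type decorator could look like (the
_arg_types attribute name is invented here purely for illustration; nothing
below is an existing API):

```python
def arg_type(**types):
    """Attach per-argument type annotations to a function."""
    def decorate(func):
        anns = getattr(func, "_arg_types", {})
        anns.update(types)
        func._arg_types = anns   # invented attribute, purely illustrative
        return func
    return decorate

@arg_type(filename=str)
@arg_type(source=str)
def compile_(source, filename, mode):
    ...

print(compile_._arg_types)  # {'source': <class 'str'>, 'filename': <class 'str'>}
```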

Stefan

From behnel_ml at gkec.informatik.tu-darmstadt.de  Thu Aug 10 09:31:24 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Thu, 10 Aug 2006 09:31:24 +0200
Subject: [Python-3000] Ctypes as cross-interpreter C calling interface
In-Reply-To: <1cb725390608092219v695b7f24t92534d3aa444ca8c@mail.gmail.com>
References: <1cb725390608092219v695b7f24t92534d3aa444ca8c@mail.gmail.com>
Message-ID: <44DAE0CC.8040909@gkec.informatik.tu-darmstadt.de>


Paul Prescod wrote:
>  2. If not, will Python 3000's build or runtime system use some kind of
> optimization technique such as static compilation ( e.g. extcompiler[1])
> or JIT compilation to allow parts of its library (especially new parts)
> to be written using ctypes instead of C?

What's the problem? Just take PyPy and brand it as Python 3000.

Stefan


From l.oluyede at gmail.com  Thu Aug 10 10:15:10 2006
From: l.oluyede at gmail.com (Lawrence Oluyede)
Date: Thu, 10 Aug 2006 10:15:10 +0200
Subject: [Python-3000] Changing behavior of sequence multiplication by
	negative integer
Message-ID: <9eebf5740608100115g1fa7a861rd0b9a84a7b64d4be@mail.gmail.com>

I've never seen bugs caused by operations such as:

"foobar" * -1

and to be honest I've never seen code like that, because the semantics
seems somewhat senseless to me. Still, I think the behavior of evaluating
"sequence * negative integer" should be changed from:

>>> "foobar" * -1
''
>>> ["foobar"] * -1
[]
>>> ("foobar") * -1
''

to something throwing an exception, like when you try to multiply
the sequence by a floating point number:

>>> "foobar" * 1.0
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: can't multiply sequence by non-int

It's not a big deal to me, but maybe this can be addressed in the
python3000 branch.

-- 
Lawrence
http://www.oluyede.org/blog

From ncoghlan at gmail.com  Thu Aug 10 13:19:55 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 10 Aug 2006 21:19:55 +1000
Subject: [Python-3000] threading, part 2
In-Reply-To: <44DAC4F6.3010002@solarsail.hcs.harvard.edu>
References: <1d85506f0608081122r30f89973paf450514b00dcc92@mail.gmail.com>	<fb6fbf560608081231x179bbbd1y9fd06af48cc64e84@mail.gmail.com>	<44D9BCB4.5010404@gmail.com>	<ca471dc20608091153m7cc60a68yfc3f002519e93795@mail.gmail.com>	<1f7befae0608091248q7f328875x7c2d03723acbf8d2@mail.gmail.com>	<ca471dc20608091339g46ebced6y18c7a218678fb6d0@mail.gmail.com>	<1f7befae0608091838u594de27ctb83dd0845ccaa0@mail.gmail.com>	<ca471dc20608092114j43071728nbd660d182d065316@mail.gmail.com>
	<44DAC4F6.3010002@solarsail.hcs.harvard.edu>
Message-ID: <44DB165B.2040901@gmail.com>

Ivan Krstic wrote:
> Guido van Rossum wrote:
>> Fine with me then. In 2.5? 2.6? Or py3k? (This is the py3k list.)
> 
> FWIW, we'll ship 2.5 on the OLPC (laptop.org) machines, and it looks
> like we'll need this. It'd be useful to have it directly in CPython, so
> people running our software outside the laptops don't have to fuss with
> an extension.

Given the time frame, I think you might be stuck with using ctypes to get at 
the functionality for Python 2.5. Now that Guido & Tim have mentioned it, I 
also vaguely recall portability to GIL-free implementations being one of the 
problems with the idea back when the C API function was added, so exposing 
this officially to Python code should probably wait until 2.6.

Peter Hansen worked out the necessary incantations to invoke it through ctypes 
back in 2004 [1]. The difference now is that "import ctypes" will work on a 
vanilla 2.5 installation.

Cheers,
Nick.

[1]
http://groups.google.com/group/comp.lang.python/msg/d310502f7c7133a9

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From ncoghlan at gmail.com  Thu Aug 10 13:40:32 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 10 Aug 2006 21:40:32 +1000
Subject: [Python-3000] Changing behavior of sequence multiplication by
 negative integer
In-Reply-To: <9eebf5740608100115g1fa7a861rd0b9a84a7b64d4be@mail.gmail.com>
References: <9eebf5740608100115g1fa7a861rd0b9a84a7b64d4be@mail.gmail.com>
Message-ID: <44DB1B30.1030200@gmail.com>

Lawrence Oluyede wrote:
> I've never seen bugs determined by operations such as:
> 
> "foobar" * -1
> 
> and to be honest I've never seen code like that because the semantics
> is somewhat senseless to me but I think the behavior of the expression
> evaluation of "Sequence * negative integer" should be changed from:
> 
>>>> "foobar" * -1
> ''
>>>> ["foobar"] * -1
> []
>>>> ("foobar") * -1
> ''
> 
> to something throwing an exception like when you try to multiplicate
> the sequence by a floating point number:
> 
>>>> "foobar" * 1.0
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> TypeError: can't multiply sequence by non-int
> 
> It's not a big deal to me but maybe this can be addressed in the
> python3000 branch
> 

The "negative coerced to 0" behaviour is to make it easy to do things like 
padding a sequence to a minimum length:

   seq = seq + pad * (min_length - len(seq))

Without the current behaviour, all such operations would need to be rewritten as:

   seq = seq + pad * max((min_length - len(seq)), 0)

Gratuitous breakage that leads to a more verbose result gets a solid -1 from me :)
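
To make the clipping behaviour concrete:

```python
def pad_to_min(seq, pad, min_length):
    # A negative repeat count clips to the empty sequence, so sequences
    # that are already long enough pass through untouched.
    return seq + pad * (min_length - len(seq))

print(pad_to_min("ab", " ", 5))      # 'ab   '
print(pad_to_min("abcdef", " ", 5))  # 'abcdef' -- already long enough
```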

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From l.oluyede at gmail.com  Thu Aug 10 14:03:28 2006
From: l.oluyede at gmail.com (Lawrence Oluyede)
Date: Thu, 10 Aug 2006 14:03:28 +0200
Subject: [Python-3000] Changing behavior of sequence multiplication by
	negative integer
In-Reply-To: <44DB1B30.1030200@gmail.com>
References: <9eebf5740608100115g1fa7a861rd0b9a84a7b64d4be@mail.gmail.com>
	<44DB1B30.1030200@gmail.com>
Message-ID: <9eebf5740608100503l16238585yf1f2c38b1a4a4142@mail.gmail.com>

> The "negative coerced to 0" behaviour is to make it easy to do things like
> padding a sequence to a minimum length:
>
>    seq = seq + pad * (min_length- len(seq))
>
> Without the current behaviour, all such operations would need to be rewritten as:
>
>    seq = seq + pad * max((min_length- len(seq)), 0)
>
> Gratuitous breakage that leads to a more verbose result gets a solid -1 from me :)

That sounds like a -1 to me too. Thanks for the explanation. I was sure
there was a reason for that kind of behavior.


-- 
Lawrence
http://www.oluyede.org/blog

From behnel_ml at gkec.informatik.tu-darmstadt.de  Thu Aug 10 15:00:30 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Thu, 10 Aug 2006 15:00:30 +0200
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <44DAE00C.400@gkec.informatik.tu-darmstadt.de>
References: <43aa6ff70608091732o150a6674t4416f4b76d8bc40b@mail.gmail.com>
	<44DAE00C.400@gkec.informatik.tu-darmstadt.de>
Message-ID: <44DB2DEE.7020601@gkec.informatik.tu-darmstadt.de>



Stefan Behnel wrote:
> Collin Winter wrote:
>>         def compile(source: "something compilable",
>>                     filename: "where the compilable thing comes from",
>>                     mode: "is this a single statement or a suite?"):
>>             ...
>>
>>         def sum(*vargs: Number) -> Number:
>>             ...
> 
> Admittedly, I'm not so much in the "Spring stew" discussion, but I'm not a big
> fan of cluttering up my function signature with "make them short to make them
> fit" comments.
> 
> What would be wrong in adding a standard decorator for this purpose? Something
> like:
> 
>    @type_annotate("This is a filename passed as string", filename = str)
>    @type_annotate(source = str)
>    def compile(source, filename, mode):
>       ...
> 
> or, more explicitly:
> 
>    @arg_docstring(filename = "This is a filename passed as string")
>    @arg_type(filename = str)
>    @arg_type(source = str)
>    def compile(source, filename, mode):
>       ...

Ah, never mind, that only applies to docstrings. The type annotation would not
be available to the compiler...

So, it would be a good idea to split the two: docstrings and types. While a
decorator provides a readable (and extensible) solution for the former, type
annotations should be part of the signature IMHO.

Stefan


From jimjjewett at gmail.com  Thu Aug 10 16:13:14 2006
From: jimjjewett at gmail.com (Jim Jewett)
Date: Thu, 10 Aug 2006 10:13:14 -0400
Subject: [Python-3000] Changing behavior of sequence multiplication by
	negative integer
In-Reply-To: <44DB1B30.1030200@gmail.com>
References: <9eebf5740608100115g1fa7a861rd0b9a84a7b64d4be@mail.gmail.com>
	<44DB1B30.1030200@gmail.com>
Message-ID: <fb6fbf560608100713i55839644mcf6806ca4b6a6e98@mail.gmail.com>

Lawrence Oluyede wrote:

> seq * -5
> and to be honest I've never seen code like that because the semantics
> is somewhat senseless to me

To be honest, I would almost expect the negative to mean "count from
the end", so that it also reversed the sequence.  It doesn't, but ...
it does make for a hard-to-explain case.

> ... evaluation of "Sequence * negative integer" should be changed from:

> >>> "foobar" * -1
>  ''

> > ... to something throwing an exception like when you try to multiplicate
> > the sequence by a floating point number:

Agreed.

On 8/10/06, Nick Coghlan <ncoghlan at gmail.com> wrote:
> The "negative coerced to 0" behaviour is to make it easy to do things like
> padding a sequence to a minimum length:

>    seq = seq + pad * (min_length- len(seq))

Typically, if I need to pad a sequence to a minimum length, I really
need it to be a specific length.  Having it already be too long is
likely to cause problems later.  So I really do prefer the explicit
version.

Also compare this to the recent decision that __index__ should *not*
silently clip to a C long.

> Without the current behaviour, all such operations would need to be rewritten as:

>    seq = seq + pad * max((min_length- len(seq)), 0)

I would write it as

# Create a record-size pad outside the loop
pad = " "*length
...
    seq = (seq+pad)[:length]

-jJ

From ncoghlan at gmail.com  Thu Aug 10 16:33:27 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 11 Aug 2006 00:33:27 +1000
Subject: [Python-3000] Changing behavior of sequence multiplication by
 negative integer
In-Reply-To: <fb6fbf560608100713i55839644mcf6806ca4b6a6e98@mail.gmail.com>
References: <9eebf5740608100115g1fa7a861rd0b9a84a7b64d4be@mail.gmail.com>	
	<44DB1B30.1030200@gmail.com>
	<fb6fbf560608100713i55839644mcf6806ca4b6a6e98@mail.gmail.com>
Message-ID: <44DB43B7.2060608@gmail.com>

Jim Jewett wrote:
> I would write it as
> 
> # Create a record-size pad outside the loop
> pad = " "*length
> ...
>    seq = (seq+pad)[:length]

I'd generally do padding to a fixed length that way as well, but any code 
relying on the current 'clip to 0' behaviour would break if this changed. 
Without a really compelling reason to change it, it's hard to justify any 
breakage at all (even if there may be better ways of doing things).

While I take your point about the comparison to __index__, the difference is 
that clipping sequence repetition to 0 has been the expected behaviour for 
many releases, whereas in the __index__ overflow case the expected behaviour 
was for the code to raise an exception.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From slawek at cs.lth.se  Thu Aug 10 16:49:07 2006
From: slawek at cs.lth.se (Slawomir Nowaczyk)
Date: Thu, 10 Aug 2006 16:49:07 +0200
Subject: [Python-3000] Changing behavior of sequence multiplication by
	negative integer
In-Reply-To: <fb6fbf560608100713i55839644mcf6806ca4b6a6e98@mail.gmail.com>
References: <44DB1B30.1030200@gmail.com>
	<fb6fbf560608100713i55839644mcf6806ca4b6a6e98@mail.gmail.com>
Message-ID: <20060810164518.EF5E.SLAWEK@cs.lth.se>

On Thu, 10 Aug 2006 10:13:14 -0400
Jim Jewett <jimjjewett at gmail.com> wrote:

#> >    seq = seq + pad * (min_length- len(seq))
#> 
#> Typically, if I need to pad a sequence to a minimum length, I really
#> need it to be a specific length.  Having it already be too long is
#> likely to cause problems later.  So I really do prefer the explicit
#> version.

Well, for whatever it is worth, if I pad the data to present it in a
readable form I *most definitely* do not want values to become
truncated just because they turn out to be bigger than I originally
expected.

An ugly result is worse than a nice result, but still better than a
wrong result.

-- 
 Best wishes,
   Slawomir Nowaczyk
     ( Slawomir.Nowaczyk at cs.lth.se )

All I want is a warm bed, and a kind word, and unlimited power.


From guido at python.org  Thu Aug 10 19:50:24 2006
From: guido at python.org (Guido van Rossum)
Date: Thu, 10 Aug 2006 10:50:24 -0700
Subject: [Python-3000] Ctypes as cross-interpreter C calling interface
In-Reply-To: <1cb725390608092219v695b7f24t92534d3aa444ca8c@mail.gmail.com>
References: <1cb725390608092219v695b7f24t92534d3aa444ca8c@mail.gmail.com>
Message-ID: <ca471dc20608101050m231b618asc5c70181ed4078dc@mail.gmail.com>

I worry that this may be too ambitious to add to the already
significant load for the Py3k project. You've seen my timeline --
alpha in early 07, final a year later.

Don't get me wrong! I think that completely changing the FFI paradigm
(as opposed to evolutionary changes to the existing C API, which py3k
is doing) is a very worthy project, but I'd rather conceive it as
something orthogonal to the py3k transition. It doesn't have to wait
for py3k, nor should py3k have to wait for it. Tying too many projects
together in terms of mutual dependencies is a great way to cause total
paralysis.

--Guido

On 8/9/06, Paul Prescod <paul at prescod.net> wrote:
> Thanks for everyone who contributed. It seems that the emerging consensus
> (bar a security question from Guido) is that ctypes it the way forward for
> calling C code in Python 3000. I'd like to clarify what this might mean:
>
> 1. Is ctypes and pure python fast enough for most real-world extension
> modules like PyOpenGL, PyExpat, Tkinter, and socket programming? I know that
> experimentation is ongoing. Are any results in?
>
>  2. If not, will Python 3000's build or runtime system use some kind of
> optimization technique such as static compilation ( e.g. extcompiler[1]) or
> JIT compilation to allow parts of its library (especially new parts) to be
> written using ctypes instead of C?
>
>  3. Presuming that the performance issue can be worked out one way or
> another, are there arguments in favour of interpreter-specific C-coded
> extensions other than those doing explicitly interpreter-specific stuff (
> e.g. tweaking the GC).
>
>  4. Will the Python 3000 standard library start to migrate towards ctypes
> (for new extensions)?
>
>  Paul Prescod
>
> [1]
> http://codespeak.net/pypy/dist/pypy/doc/extcompiler.html
>
>
>
>
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe:
> http://mail.python.org/mailman/options/python-3000/guido%40python.org
>
>
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Thu Aug 10 20:05:51 2006
From: guido at python.org (Guido van Rossum)
Date: Thu, 10 Aug 2006 11:05:51 -0700
Subject: [Python-3000] Range literals
In-Reply-To: <20060808104049.E709.JCARLSON@uci.edu>
References: <44D8C154.9020406@acm.org> <20060808104049.E709.JCARLSON@uci.edu>
Message-ID: <ca471dc20608101105o5f4c7f00w3144e1a1d23b33fe@mail.gmail.com>

I haven't changed my mind. Do you really want to add atrocities such
as having both .. and ... in the language where one includes the end
point and the other excludes it? How would a casual user remember
which is which?

--Guido

On 8/8/06, Josiah Carlson <jcarlson at uci.edu> wrote:
>
> Talin <talin at acm.org> wrote:
> >
> > I've seen some languages that use a double-dot (..) to mean a range of
> > items. This could be syntactic sugar for range(), like so:
> >
> >
> >     for x in 1..10:
> >        ...
>
> In the pronouncement on PEP 284: http://www.python.org/dev/peps/pep-0284/
>
>     Guido did not buy the premise that the range() format needed fixing,
>     "The whole point (15 years ago) of range() was to *avoid* needing syntax
>     to specify a loop over numbers. I think it's worked out well and there's
>     nothing that needs to be fixed (except range() needs to become an
>     iterator, which it will in Python 3.0)."
>
> Unless Guido has decided that range/xrange are the wrong way to do
> things, I don't think there is much discussion here.
>
>  - Josiah
>
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Thu Aug 10 20:40:46 2006
From: guido at python.org (Guido van Rossum)
Date: Thu, 10 Aug 2006 11:40:46 -0700
Subject: [Python-3000] Rounding in Py3k
In-Reply-To: <eav8nk$vji$1@sea.gmane.org>
References: <44D1F304.4020700@iinet.net.au> <44D2A81D.2050204@canterbury.ac.nz>
	<eaumbe$b3m$1@sea.gmane.org> <44D3124A.6010300@canterbury.ac.nz>
	<eav8nk$vji$1@sea.gmane.org>
Message-ID: <ca471dc20608101140s55c6de25vcf538a669a9d3232@mail.gmail.com>

On 8/4/06, Ron Adam <rrr at ronadam.com> wrote:
> But that doesn't explain why int, long, and float, don't have other
> non-magic methods.
>
> I'm not attempting taking sides for or against either way, I just want
> to understand the reasons as it seems like by knowing that, the correct
> way to do it would be clear, instead of trying to wag the dog by the
> tail if you know what I mean.

I'm probably the source of this convention. For numbers, I find foo(x)
more readable than x.foo(), mostly because of the longstanding
tradition in mathematics to write things like f(x) and sin(x).

Originally I had extended the same convention to strings; but over
time it became clear that there was a common set of operations on
strings that were so fundamental that having to import a module to use
them was a mistake, and there were too many to make them all
built-ins. (I didn't insist on not using methods/attributes for
complex, since I was already used to seeing z.re and z.im in
Algol-68).

I'm not convinced that there are enough common operations on the
standard numbers to change my mind now. I'd rather see the built-in
round() use a new protocol __round__() than switching to a round()
method on various numbers; this should hopefully make it possible to
use round() on Decimal instances.

A question is what the API for __round__() should be. It seems Decimal
uses a different API than round(). Can someone think about this more
and propose a unified and backwards compatible solution?
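
One possible shape: a round() built on a __round__ slot, sketched below.
(The dispatching round_ here is hypothetical, not an existing API at the
time of this discussion; it relies on Decimal implementing __round__, which
modern Python does provide.)

```python
from decimal import Decimal

def round_(x, ndigits=None):
    """A round() that dispatches through a hypothetical __round__ protocol."""
    rounder = getattr(type(x), "__round__", None)
    if rounder is None:
        raise TypeError("type %s does not define __round__" % type(x).__name__)
    return rounder(x) if ndigits is None else rounder(x, ndigits)

# Decimal rounds decimally (half-even); floats keep their binary caveats.
print(round_(Decimal("2.675"), 2))  # 2.68
print(round_(2.675, 2))             # 2.67 -- 2.675 isn't exactly representable
```

With such a protocol, round() works on Decimal without the built-in having
to know anything about the decimal module.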

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From tomerfiliba at gmail.com  Thu Aug 10 21:14:27 2006
From: tomerfiliba at gmail.com (tomer filiba)
Date: Thu, 10 Aug 2006 21:14:27 +0200
Subject: [Python-3000] threading, part 2
Message-ID: <1d85506f0608101214g594d2dal282ab2ae60f29f11@mail.gmail.com>

[Tim]
> Me too, although it won't stay that simple, and I'm clear as mud on
> how implementations other than CPython could implement this.

[Guido]
> Another good reason to keep it accessible from the C API only. Now I'm
> -0 on adding it. I suggest that if someone really wants this
> accessible from Python, they should research how Jython, IronPython,
> PyPy and Stackless could handle this, and report their research in a
> PEP.

then how does interrupt_main work? is it implementation-agnostic?

-----
>>> import thread
>>> help(thread.interrupt_main)
Help on built-in function interrupt_main in module thread:

interrupt_main(...)
    interrupt_main()

    Raise a KeyboardInterrupt in the main thread.
    A subthread can use this function to interrupt the main thread.
-----

just let me raise arbitrary exceptions (don't limit it to KeyboardInterrupt)
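
for reference, interrupt_main() is easy to exercise in CPython (the module
is spelled _thread in Python 3; the interrupt is only delivered when the
main thread is between bytecodes):

```python
import _thread
import threading
import time

def interrupter():
    time.sleep(0.2)
    _thread.interrupt_main()   # asks CPython to raise KeyboardInterrupt in main

threading.Thread(target=interrupter).start()

interrupted = False
deadline = time.time() + 5     # bound the wait so the demo can't hang
try:
    while time.time() < deadline:
        time.sleep(0.05)       # the interrupt lands between bytecodes
except KeyboardInterrupt:
    interrupted = True
print(interrupted)  # True
```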



-tomer

From tim.peters at gmail.com  Thu Aug 10 23:40:59 2006
From: tim.peters at gmail.com (Tim Peters)
Date: Thu, 10 Aug 2006 17:40:59 -0400
Subject: [Python-3000] threading, part 2
In-Reply-To: <ca471dc20608092114j43071728nbd660d182d065316@mail.gmail.com>
References: <1d85506f0608081122r30f89973paf450514b00dcc92@mail.gmail.com>
	<fb6fbf560608081231x179bbbd1y9fd06af48cc64e84@mail.gmail.com>
	<44D9BCB4.5010404@gmail.com>
	<ca471dc20608091153m7cc60a68yfc3f002519e93795@mail.gmail.com>
	<1f7befae0608091248q7f328875x7c2d03723acbf8d2@mail.gmail.com>
	<ca471dc20608091339g46ebced6y18c7a218678fb6d0@mail.gmail.com>
	<1f7befae0608091838u594de27ctb83dd0845ccaa0@mail.gmail.com>
	<ca471dc20608092114j43071728nbd660d182d065316@mail.gmail.com>
Message-ID: <1f7befae0608101440i30590f4dv2740e584f801881c@mail.gmail.com>

[back and forth on PyThreadState_SetAsyncExc(), and
 the 2-year old discussion in

 http://www.python.org/sf/1069160
]

[Tim]
>> [still-current deadlock & refcount issues not fixed at the time]

[Guido]
> So why didn't we check that in?

The shallow answer is that you closed the report without checking it
in, so ask a mirror ;-)  The real answer seems to be that nobody
(including me) really cared about this function, since it's both
unused and untested in the core, and there were no known uses from
anyone's C extensions either.

[on adding it to the language]

>>>>> +0

>>>> Me too, although ... I'm clear as mud on how implementations other
>>>> than CPython could implement this.

>>> Now I'm -0 on adding it. I suggest that if someone really wants this
>>> accessible from Python, they should research how Jython, IronPython,
>>> PyPy and Stackless could handle this, and report their research in a
>>> PEP.

>> As a full-blown language feature, I'm -1 unless that work is done
>> first.  I'm still +0 on adding it to CPython if it's given a
>> leading-underscore name and docs to make clear that it's a
>> CPython-specific hack that may never work under any other
>> implementation.

> Fine with me then. In 2.5? 2.6? Or py3k? (This is the py3k list.)

Since the 2.5 beta series is supposedly done with, I strongly doubt
Anthony wants to see a new feature snuck into 2.5c1.  Someone who
wants it enough could target 2.6.  I'm only +0, so I'd do that only if
someone wants it enough to pay for it.  For 2.5, I'll check in the
anal correctness changes, add a ctypes-based test case, and reword the
docs to stop warning about a return value > 1 (all those are just
fixing what's going to be in 2.5 anyway).

From guido at python.org  Fri Aug 11 01:17:52 2006
From: guido at python.org (Guido van Rossum)
Date: Thu, 10 Aug 2006 16:17:52 -0700
Subject: [Python-3000] Ctypes as cross-interpreter C calling interface
In-Reply-To: <1cb725390608101319j19731f91vfc472d9113a03ccf@mail.gmail.com>
References: <1cb725390608092219v695b7f24t92534d3aa444ca8c@mail.gmail.com>
	<ca471dc20608101050m231b618asc5c70181ed4078dc@mail.gmail.com>
	<1cb725390608101319j19731f91vfc472d9113a03ccf@mail.gmail.com>
Message-ID: <ca471dc20608101617p2e6d13e7k2b1e96c7f23746e2@mail.gmail.com>

(Adding python-3000 back to the CC: list.)

On 8/10/06, Paul Prescod <paul at prescod.net> wrote:
> The only reason to tie it to Py3K is because Py3K is breaking APIs anyhow.
> It will be in the overlap period between Py3K and Py2x that the need for an
> abstraction will be most acute. Otherwise extensions will probably end up
> with a lot of #ifdef py3k #else etc.
>
> It isn't clear how ambitious or not this is until we drill in. For example,
> if pure "ctypes" is sufficiently efficient for 90% of all extensions, then
> moving in this direction for Py3K might require nothing more than a
> declaration from you that new extensions should be written using ctypes
> instead of the PyObject APIs unless there is a very good reason. After all,
> people will take their cue from you as to what sort of coding convention is
> appropriate for the standard library. Is this first step doable? Just a
> declaration that (with a few exceptions) ctypes is preferable to C code for
> new extensions?
>
> But if that's totally unreasonable because ctypes is seldom performant
> enough then the project gets more ambitious because it would have to pull in
> extcompiler...

I don't know enough about ctypes, but assuming I have a reason to
write an extension in C (e.g. Tkinter, which uses the Tcl/Tk API), how
do I use ctypes to call things like PyDict_GetItem() or
PyErr_SetString()?
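
For what it's worth, ctypes can already call into the hosting CPython's C
API via ctypes.pythonapi. A minimal illustration (PyLong_FromLong is the
Python 3 spelling of PyInt_FromLong; declaring restype as py_object makes
ctypes hand the new reference back as an ordinary Python object):

```python
import ctypes

api = ctypes.pythonapi   # the C API of the running CPython interpreter

# PyLong_FromLong(long) -> new reference.
api.PyLong_FromLong.restype = ctypes.py_object
api.PyLong_FromLong.argtypes = [ctypes.c_long]

print(api.PyLong_FromLong(42))  # 42
```

Borrowed-reference functions like PyDict_GetItem() need more care, since
ctypes assumes a py_object result is a new reference.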

--Guido

> On 8/10/06, Guido van Rossum <guido at python.org> wrote:
> > I worry that this may be too ambitious to add to the already
> > significant load for the Py3k project. You've seen my timeline --
> > alpha in early 07, final a year later.
> >
> > Don't get me wrong! I think that completely changing the FFI paradigm
> > (as opposed to evolutionary changes to the existing C API, which py3k
> > is doing) is a very worthy project, but I'd rather conceive it as
> > something orthogonal to the py3k transition. It doesn't have to wait
> > for py3k, nor should py3k have to wait for it. Tying too many projects
> > together in terms of mutual dependencies is a great way to cause total
> > paralysis.
> >
> > --Guido
> >
> > On 8/9/06, Paul Prescod <paul at prescod.net> wrote:
> > > Thanks for everyone who contributed. It seems that the emerging
> consensus
> > > (bar a security question from Guido) is that ctypes it the way forward
> for
> > > calling C code in Python 3000. I'd like to clarify what this might mean:
> > >
> > > 1. Is ctypes and pure python fast enough for most real-world extension
> > > modules like PyOpenGL, PyExpat, Tkinter, and socket programming? I know
> that
> > > experimentation is ongoing. Are any results in?
> > >
> > >  2. If not, will Python 3000's build or runtime system use some kind of
> > > optimization technique such as static compilation ( e.g. extcompiler[1])
> or
> > > JIT compilation to allow parts of its library (especially new parts) to
> be
> > > written using ctypes instead of C?
> > >
> > >  3. Presuming that the performance issue can be worked out one way or
> > > another, are there arguments in favour of interpreter-specific C-coded
> > > extensions other than those doing explicitly interpreter-specific stuff
> (
> > > e.g. tweaking the GC).
> > >
> > >  4. Will the Python 3000 standard library start to migrate towards
> ctypes
> > > (for new extensions)?
> > >
> > >  Paul Prescod
> > >
> > > [1]
> > >
> http://codespeak.net/pypy/dist/pypy/doc/extcompiler.html
> > >
> > >
> > >
> > >
> > > _______________________________________________
> > > Python-3000 mailing list
> > > Python-3000 at python.org
> > > http://mail.python.org/mailman/listinfo/python-3000
> > > Unsubscribe:
> > >
> http://mail.python.org/mailman/options/python-3000/guido%40python.org
> > >
> > >
> > >
> >
> >
> > --
> > --Guido van Rossum (home page: http://www.python.org/~guido/)
> >
>
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Fri Aug 11 01:21:02 2006
From: guido at python.org (Guido van Rossum)
Date: Thu, 10 Aug 2006 16:21:02 -0700
Subject: [Python-3000] threading, part 2
In-Reply-To: <1d85506f0608101214g594d2dal282ab2ae60f29f11@mail.gmail.com>
References: <1d85506f0608101214g594d2dal282ab2ae60f29f11@mail.gmail.com>
Message-ID: <ca471dc20608101621j557f735cs10f4f491eb3b2ee5@mail.gmail.com>

On 8/10/06, tomer filiba <tomerfiliba at gmail.com> wrote:
> [Tim]
> > Me too, although it won't stay that simple, and I'm clear as mud on
> > how implementations other than CPython could implement this.
>
> [Guido]
> > Another good reason to keep it accessible from the C API only. Now I'm
> > -0 on adding it. I suggest that if someone really wants this
> > accessible from Python, they should research how Jython, IronPython,
> > PyPy and Stackless could handle this, and report their research in a
> > PEP.
>
> then how does interrupt_main work? is it implementation-agnostic?

I expect that Jython doesn't implement this; it doesn't handle ^C either AFAIK.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From paul at prescod.net  Fri Aug 11 01:45:00 2006
From: paul at prescod.net (Paul Prescod)
Date: Thu, 10 Aug 2006 16:45:00 -0700
Subject: [Python-3000] Ctypes as cross-interpreter C calling interface
In-Reply-To: <ca471dc20608101617p2e6d13e7k2b1e96c7f23746e2@mail.gmail.com>
References: <1cb725390608092219v695b7f24t92534d3aa444ca8c@mail.gmail.com>
	<ca471dc20608101050m231b618asc5c70181ed4078dc@mail.gmail.com>
	<1cb725390608101319j19731f91vfc472d9113a03ccf@mail.gmail.com>
	<ca471dc20608101617p2e6d13e7k2b1e96c7f23746e2@mail.gmail.com>
Message-ID: <1cb725390608101645g3a9db04dhcf76cfd03e3a15fc@mail.gmail.com>

Sorry for the cc mistake.


I don't know enough about ctypes, but assuming I have a reason to
> write an extension in C (e.g. Tkinter, which uses the Tcl/Tk API), how
> to I use ctypes to call things like PyDict_GetItem() or
> PyErr_SetString()?


There are two answers to your question. The simplest is that if you have a
dict object called "foo", you just write 'foo["abc"]'. It's just Python. Same
for the other one: you'd just use 'raise'.

Ctypes inverts the model of the standard extension API. You're writing
in Python, so the Python stuff is straightforward (it's just Python) and the
C stuff is a bit weird. So if you had to populate a Python dictionary from a
C struct, it is reading from the C struct that takes a bit of doing;
writing the Python dictionary is straightforward.

If there were a reason to call PyDict_GetItem directly (performance,
maybe?), that's possible. You need to set up the function prototype (which
you would probably do in a helper library) and then you just call
PyDict_GetItem. Ctypes would coerce the types; py_object is a native ctypes
data type.

So I think it ends up looking like

from PythonConvenienceFunctions import PyDict_GetItem

obj = {}
key = "Guido"

rc = PyDict_GetItem(obj, key)

I'm sure an expert will correct me if I'm wrong...
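
As a concrete sketch (guesswork on my part, so the usual caveats apply):
ctypes exposes the interpreter's own C API as ctypes.pythonapi, so the
convenience module might boil down to something like:

```python
import ctypes
from ctypes import py_object

# Declare the prototype once; a PythonConvenienceFunctions module
# would hide this part.
PyDict_GetItem = ctypes.pythonapi.PyDict_GetItem
PyDict_GetItem.restype = py_object   # NB: the C API returns a *borrowed* reference
PyDict_GetItem.argtypes = [py_object, py_object]

obj = {"Guido": "BDFL"}
rc = PyDict_GetItem(obj, "Guido")
print(rc)
```

Note the borrowed-reference caveat: ctypes knows nothing about the C API's
reference-counting conventions, so anything beyond a quick lookup needs care.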

 Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060810/d4b50a3e/attachment.html 

From paul at prescod.net  Fri Aug 11 01:57:59 2006
From: paul at prescod.net (Paul Prescod)
Date: Thu, 10 Aug 2006 16:57:59 -0700
Subject: [Python-3000] Ctypes as cross-interpreter C calling interface
In-Reply-To: <1cb725390608101645g3a9db04dhcf76cfd03e3a15fc@mail.gmail.com>
References: <1cb725390608092219v695b7f24t92534d3aa444ca8c@mail.gmail.com>
	<ca471dc20608101050m231b618asc5c70181ed4078dc@mail.gmail.com>
	<1cb725390608101319j19731f91vfc472d9113a03ccf@mail.gmail.com>
	<ca471dc20608101617p2e6d13e7k2b1e96c7f23746e2@mail.gmail.com>
	<1cb725390608101645g3a9db04dhcf76cfd03e3a15fc@mail.gmail.com>
Message-ID: <1cb725390608101657x447df09cm888228b31e424a87@mail.gmail.com>

And if you're curious about how to use ctypes without all of the helper
functions set up for you, then I guess it is easiest to poke around the
documentation for code samples.


>>> from ctypes import cdll, c_char_p, c_int, c_double
>>> printf = cdll.msvcrt.printf  # the tutorial's example uses msvcrt's printf
>>> printf.argtypes = [c_char_p, c_char_p, c_int, c_double]
>>> printf("String '%s', Int %d, Double %f\n", "Hi", 10, 2.2)
String 'Hi', Int 10, Double 2.200000
37
>>>


>>> from ctypes import c_int, WINFUNCTYPE, windll
>>> from ctypes.wintypes import HWND, LPCSTR, UINT
>>> prototype = WINFUNCTYPE(c_int, HWND, LPCSTR, LPCSTR, UINT)
>>> paramflags = ((1, "hwnd", 0), (1, "text", "Hi"),
...               (1, "caption", None), (1, "flags", 0))
>>> MessageBox = prototype(("MessageBoxA", windll.user32), paramflags)

It's ugly but in the typical case you would hide all of the declarations in
a module (maybe an auto-generated module) and just focus on your logic:

>>> MessageBox()
>>> MessageBox(text="Spam, spam, spam")
>>> MessageBox(flags=2, text="foo bar")


 Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060810/07616e7c/attachment.htm 

From guido at python.org  Fri Aug 11 02:56:47 2006
From: guido at python.org (Guido van Rossum)
Date: Thu, 10 Aug 2006 17:56:47 -0700
Subject: [Python-3000] Ctypes as cross-interpreter C calling interface
In-Reply-To: <1cb725390608101645g3a9db04dhcf76cfd03e3a15fc@mail.gmail.com>
References: <1cb725390608092219v695b7f24t92534d3aa444ca8c@mail.gmail.com>
	<ca471dc20608101050m231b618asc5c70181ed4078dc@mail.gmail.com>
	<1cb725390608101319j19731f91vfc472d9113a03ccf@mail.gmail.com>
	<ca471dc20608101617p2e6d13e7k2b1e96c7f23746e2@mail.gmail.com>
	<1cb725390608101645g3a9db04dhcf76cfd03e3a15fc@mail.gmail.com>
Message-ID: <ca471dc20608101756y65048ee8v501ee8955d1f70dc@mail.gmail.com>

On 8/10/06, Paul Prescod <paul at prescod.net> wrote:
> > I don't know enough about ctypes, but assuming I have a reason to
> > write an extension in C (e.g. Tkinter, which uses the Tcl/Tk API), how
> > do I use ctypes to call things like PyDict_GetItem() or
> > PyErr_SetString()?
>
> There are two answers to your question. The simplest is that if you have a
> dict object called "foo" you just call 'foo["abc"]'. It's just Python. Same
> for the other one: you'd just call 'raise'.

That doesn't make sense if you want to write your extension in C.
Surely you don't propose to rewrite all of tkinter.c in Python? That
would be insane. Or Numeric? That would kill performance.

> Ctypes is the opposite model of the standard extension stuff. You're writing
> in Python so Python stuff is straightforward (just Python) and C stuff is a
> bit weird. So if you had to populate a Python dictionary from a C struct
> then it is the reading from the C struct that takes a bit of doing. The
> writing the Python dictionary is straightforward.
>
> If there was a reason to call PyDict_GetItem directly (performance maybe???)
> then that's possible. You need to set up the function prototype (which you
> would probably do in a helper library) and then you just call
> PyDict_GetItem. CTypes would coerce the types. py_object is a native data
> type.
>
> So I think it ends up looking like
>
> from PythonConvenienceFunctions import PyDict_GetItem
>
> obj = {}
> key = "Guido"
>
> rc = PyDict_GetItem(obj, key)
>
> I'm sure an expert will correct me if I'm wrong...

I guess I object against the idea that we have to write all extensions
in Python using ctypes for all C calls. This is okay if there's
relatively little interaction with C code. It's insane if you're doing
serious C code. And what about C++ extensions?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg.ewing at canterbury.ac.nz  Fri Aug 11 03:47:38 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 11 Aug 2006 13:47:38 +1200
Subject: [Python-3000] Ctypes as cross-interpreter C calling interface
In-Reply-To: <1cb725390608092219v695b7f24t92534d3aa444ca8c@mail.gmail.com>
References: <1cb725390608092219v695b7f24t92534d3aa444ca8c@mail.gmail.com>
Message-ID: <44DBE1BA.6000204@canterbury.ac.nz>

Paul Prescod wrote:
> It seems that the emerging 
> consensus (bar a security question from Guido) is that ctypes is the way 
> forward for calling C code in Python 3000. I'd like to clarify what this 
> might mean:

What's the state of play concerning ctypes support
on non-x86 platforms?

Until ctypes is uniformly supported on all platforms,
it can't be considered a complete replacement for
C-coded extensions (whether handwritten or generated
by something else).

--
Greg

From lcaamano at gmail.com  Fri Aug 11 05:01:45 2006
From: lcaamano at gmail.com (Luis P Caamano)
Date: Thu, 10 Aug 2006 23:01:45 -0400
Subject: [Python-3000] threading, part 2
Message-ID: <c56e219d0608102001i44b1267dqb581c2171ced33ce@mail.gmail.com>

Yes, I also wonder how non-CPython implementations would handle this,
but I'd just like to say that this feature, making a thread raise a
specific exception in another thread asynchronously, is a very useful
one.

We have a subsystem that schedules requests, each dispatched in its own
thread.  The only way to cancel one of those requests right now is via
a cooperative checking method in which we explicitly make calls
throughout the code to see if the request has been canceled, in which
case the check raises an exception that triggers cleanup and
cancellation.

Problem is, we have to spread check calls all over the place.  All this
would be a lot easier if we could do thread.terminate() as proposed,
especially for new code.


On 8/9/06, "Guido van Rossum" wrote:
> On 8/9/06, Tim Peters <tim.peters at gmail.com> wrote:
> > [Nick Coghlan]
> > >> That check is already there:
> > >>
> > >> int PyThreadState_SetAsyncExc(  long id, PyObject *exc)
> > >>      Asynchronously raise an exception in a thread. The id argument is the
> > >> thread id of the target thread; exc is the exception object to be raised. This
> > >> function does not steal any references to exc. To prevent naive misuse, you
> > >> must write your own C extension to call this. Must be called with the GIL
> > >> held. Returns the number of thread states modified; if it returns a number
> > >> greater than one, you're in trouble, and you should call it again with exc set
> > >> to NULL to revert the effect. This raises no exceptions. New in version 2.3.
> >
> > Guido, do you have any idea now what the "number greater than one"
> > business is about?  That would happen if and only if we found more
> > than one thread state with the given thread id in the interpreter's
> > list of thread states, but we're counting those with both the GIL and
> > the global head_mutex lock held.  My impression has been that it would
> > be an internal logic error if we ever saw this count exceed 1.
>
> Right, I think that's it. I guess I was in a grumpy mood when I wrote
> this (and Just & Alex never ended up using it!).
>
> > While I'm at it, I expect:
> >
> >                 Py_CLEAR(p->async_exc);
> >                 Py_XINCREF(exc);
> >                 p->async_exc = exc;
> >
> > is better written:
> >
> >                 Py_XINCREF(exc);
> >                 Py_CLEAR(p->async_exc);
> >                 p->async_exc = exc;
> >
> > for the same reason one should always incref B before decrefing A in
> >
> >     A = B
> >
> > ...
>
> That reason that A and B might already be the same object, right?
>
> > >> All Tober is really asking for is a method on threading.Thread objects that
> > >> uses this existing API to set a builtin ThreadExit exception. The thread
> > >> module would consider a thread finishing with ThreadExit to be
> > >> non-exceptional, so you could easily do:
> > >>
> > >>    th.terminate() # Raise ThreadExit in th's thread of control
> > >>    th.join() # Should finish up pretty quickly
> > >>
> > >> Proper resource cleanup would be reliant on correct use of try/finally or with
> > >> statements, but that's the case regardless of whether or not asynchronous
> > >> exceptions are allowed.
> >
> > [Guido]
> > > I'm +0 on this.
> >
> > Me too, although it won't stay that simple, and I'm clear as mud on
> > how implementations other than CPython could implement this.
>
> Another good reason to keep it accessible from the C API only. Now I'm
> -0 on adding it. I suggest that if someone really wants this
> accessible from Python, they should research how Jython, IronPython,
> PyPy and Stackless could handle this, and report their research in a
> PEP.
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>
>


-- 
Luis P Caamano
Atlanta, GA USA

From greg.ewing at canterbury.ac.nz  Fri Aug 11 05:48:48 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 11 Aug 2006 15:48:48 +1200
Subject: [Python-3000] Ctypes as cross-interpreter C calling interface
In-Reply-To: <1cb725390608101645g3a9db04dhcf76cfd03e3a15fc@mail.gmail.com>
References: <1cb725390608092219v695b7f24t92534d3aa444ca8c@mail.gmail.com>
	<ca471dc20608101050m231b618asc5c70181ed4078dc@mail.gmail.com>
	<1cb725390608101319j19731f91vfc472d9113a03ccf@mail.gmail.com>
	<ca471dc20608101617p2e6d13e7k2b1e96c7f23746e2@mail.gmail.com>
	<1cb725390608101645g3a9db04dhcf76cfd03e3a15fc@mail.gmail.com>
Message-ID: <44DBFE20.7040900@canterbury.ac.nz>

Another thought about ctypes: What if you want to pass
a Python function into C as a callback? Does ctypes
have a way of handling that?

-- 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiem!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at canterbury.ac.nz	   +--------------------------------------+

From talin at acm.org  Fri Aug 11 15:10:31 2006
From: talin at acm.org (Talin)
Date: Fri, 11 Aug 2006 06:10:31 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <43aa6ff70608091958u2d00db76s48260853942bed32@mail.gmail.com>
References: <43aa6ff70608091732o150a6674t4416f4b76d8bc40b@mail.gmail.com>	<44DA9E54.5020105@canterbury.ac.nz>
	<43aa6ff70608091958u2d00db76s48260853942bed32@mail.gmail.com>
Message-ID: <44DC81C7.1070905@acm.org>

Collin Winter wrote:
> The idea is that each developer can pick the notation/semantics that's
> most natural to them. I'll go even further: say one library offers a
> semantics you find handy for task A, while another library's ideas
> about type annotations are best suited for task B. Without a single
> standard, you're free to mix and match these libraries to give you a
> combination that allows you to best express the ideas you're going
> for.

Let me tell you a story.

Once upon a time, there was a little standard called Midi (Musical
Instrument Digital Interface). The Midi standard was small and
lightweight, containing less than a dozen commands of 2-3 bytes each.
However, they realized that they needed a way to allow hardware vendors
to add their own custom message types, so they created a special message
type called "System Exclusive Message" or SysEx for short. The idea is
that you would send a 3-byte manufacturer ID, and then any subsequent
bytes would be considered to be in a vendor-specific format. The MMA
(Midi Manufacturers Association) did not provide any guidelines or
suggestions as to what the format of those bytes should be - it would be
completely up to the vendors to decide what the format of their system
exclusive message would be.

Since the Midi standard did not define a way to save and load the
instrument's memory, vendors typically would use the SysEx message to
allow a "bulk dump" of patch information - essentially it was a way to
access the instrument's internal state of sounds, programs, sequences,
and so on.

This would have worked fine, except for the fact that the vendors and
the MMA were not the only stakeholders. Just about this time (mid-80s)
there began to rise a new type of music company: companies like Mark of
the Unicorn, Steinberg Audio and Blue Ribbon Soundworks that created
professional music software for personal computers. Some companies made
sequencer programs that would allow you to enter musical scores on the
computer screen and play them back through your Midi instrument. Other
companies worked on a different type of product - a "Universal
Librarian", essentially a computer program which would store all of your
patches and sound programs for all your different instruments.

In 1987 I created a program for the Amiga called Music-X, which was a
combination of sequencer and Universal Librarian. In order to create the
librarian module, I needed to get information about all of the various
vendor-specific protocols

    Interrupt - as I was typing this last sentence, I knocked over my
    glass of ice water onto my Powerbook G4, completely toasting the
    motherboard and damaging the display. 24 hours, and $2700 later, I
    have completed my "forced upgrade" and can now continue this posting.
    Lesson to be learned: Internet rants and prescription pain meds do
    not mix! Be warned!

...which was not that difficult, since most of the vendors would include 
an appendix in the back of the user's manual (generally written in very 
bad English) describing the SysEx protocol for that device. I was also 
able to get my hands on "The Big Midi Book of SysEx protocols", which 
was essentially a xerox of all of these various appendices, bound up 
in book form and sold commercially.

At the time there were approximately 150 registered vendor IDs, but my 
idea was that I wouldn't have to implement every protocol - I figured, 
since all I wanted to do was load and store the resulting information, I 
didn't really need to *interpret* the data, I just needed to store it. 
Of course, I would need to interpret any transport-layer instructions 
(commands, block headers, checksums and so on), since a lot of 
instruments sent their "data dumps" as multiple SysEx messages which 
would need to be stored together.

But I figured, since I was only supporting two vendor-specific commands 
for each vendor - bulk dump and bulk load - how different can they all 
be? Sure, there were likely to be individual variations on how things 
were done, but I could solve that by creating a per-instrument 
"personality file" - essentially a set of parameters which would tweak 
the behavior of my transport module. So for example, one parameter would 
indicate the type of checksum algorithm to be used, the second would 
indicate the number of checksum bytes, and so on.

For instruments that I couldn't borrow to test, I would rely on my users 
to fill in the holes (Ah, the heady optimism of the early days of the 
computer revolution!) and I would then add the user-contributed 
parameters to each update of the product.

I think by now you can start to see where this all goes wrong.

I started with a small set of 3 instruments, each from a different 
manufacturer. I analyzed their bulk data protocols, and came up with an 
abstract model that encompassed all of them as a superset. Then I added 
a 4th synth, only to discover that its bulk dump protocol was completely 
different than the previous three, and so my model had to be rebuilt 
from scratch. No problem, I thought, 3 is too small a sample size 
anyway. Then I added a 5th synth, and the same thing happened. And a 
6th. And so on.

For example, every vendor I investigated used a *completely different* 
algorithm for computing checksums. Some used CRCs, some did simple 
addition, others used XOR - and some had odd ideas of *which* bytes 
should be checksummed. Some of the algorithms were really bad too.

Different vendors also used different byte encodings. Because Midi is 
designed to work in an environment where cables can be unplugged at any 
moment, and because all other Midi messages (other than SysEx) were at 
most 3 bytes long, the Midi standard required that only 7 bits of each 
byte could be used to carry data, the 8th bit was reserved for a "start 
of new message" flag.

Different vendors adapted to this challenge with surprising creativity. 
Some would simply slice the whole dump into units of 7 bits each, 
crossing the normal byte boundaries. Some would only send 4 bits per 
Midi Byte. Some did things like: For each 7 bytes of input data, send 
the bottom 7 bits of each input byte as the first 7 bytes, and then send 
an 8th byte containing the missing top-bits from the first seven. And 
then there were those clever manufacturers who simply decided to design 
their instruments so that no control parameter could have a magnitude 
greater than 127.
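
(That last 7-bytes-plus-top-bits scheme, at least, is easy enough to state 
in code; here's a from-memory sketch, so take the details with a grain of 
salt:)

```python
def pack_7in8(data):
    # For each group of seven input bytes, send their low 7 bits,
    # followed by one byte collecting the seven stripped high bits.
    out = []
    for i in range(0, len(data), 7):
        chunk = data[i:i + 7]
        tops = 0
        for j, b in enumerate(chunk):
            out.append(b & 0x7F)         # keep the data in 7-bit Midi bytes
            tops |= ((b >> 7) & 1) << j  # remember the stripped high bit
        out.append(tops)
    return out

print(pack_7in8([0x80, 0x01]))  # [0, 1, 1]
```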

Another example of variation was in timing. Roland machines (of certain 
models) were notorious for rejecting messages if they were sent too fast 
- you had to wait at least 20 ms from the time you received a message to 
the time you sent the response. Others would "time out" if you waited 
too long.

There were half-duplex and full-duplex, stateless and stateful 
protocols, and I could go on. The point is, that there was no way for me 
to come up with some sort of algorithmic way to describe all of these 
protocols - the only way to do was in code, with a separate 
implementation for each and every protocol. Nowadays, I'd simply embed 
Python into the program and make each personality file a Python script, 
but I didn't have that option back then. I toyed around with the idea of 
inventing a custom scripting language specifically for representing dump 
protocols, but the idea was infeasible at the time.

So, if you have had the patience to read through this long-winded 
anecdote and are wondering how in the hell this relates to Collin's 
question, I can sum it up in a very short motto (and potential QOTW):

   "Never question the creative power of an infinite number of monkeys."

Or to put it another way: If you create a tool, and you assume that tool 
will only be used in certain specific ways, but you fail to enforce that 
limitation, then your assumption will be dead wrong. The idea that there 
will only be a few type annotation providers who will all nicely 
cooperate with one another is just as naive as I was in the SysEx debacle.

I'll have more focused things to say about this later, but I need to 
rest. (Had to get that out before all the rant energy dissipated.)

-- Talin

From krstic at solarsail.hcs.harvard.edu  Fri Aug 11 08:44:56 2006
From: krstic at solarsail.hcs.harvard.edu (Ivan Krstic)
Date: Fri, 11 Aug 2006 02:44:56 -0400
Subject: [Python-3000] threading, part 2
In-Reply-To: <44DB165B.2040901@gmail.com>
References: <1d85506f0608081122r30f89973paf450514b00dcc92@mail.gmail.com>	<fb6fbf560608081231x179bbbd1y9fd06af48cc64e84@mail.gmail.com>	<44D9BCB4.5010404@gmail.com>	<ca471dc20608091153m7cc60a68yfc3f002519e93795@mail.gmail.com>	<1f7befae0608091248q7f328875x7c2d03723acbf8d2@mail.gmail.com>	<ca471dc20608091339g46ebced6y18c7a218678fb6d0@mail.gmail.com>	<1f7befae0608091838u594de27ctb83dd0845ccaa0@mail.gmail.com>	<ca471dc20608092114j43071728nbd660d182d065316@mail.gmail.com>
	<44DAC4F6.3010002@solarsail.hcs.harvard.edu>
	<44DB165B.2040901@gmail.com>
Message-ID: <44DC2768.7060009@solarsail.hcs.harvard.edu>

Nick Coghlan wrote:
> Given the time frame, I think you might be stuck with using ctypes to
> get at the functionality for Python 2.5. 

That's probably no worse a way to do it than calling an underscored
CPython function; I keep forgetting we're getting out-of-the-box ctypes
goodness in 2.5.
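
Something along these lines, presumably (a sketch only; the name and the
error handling are mine, following the C API docs quoted earlier in the
thread):

```python
import ctypes
import threading

def async_raise(tid, exc_type):
    # Sketch: asynchronously raise exc_type in the thread with id tid.
    res = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_ulong(tid), ctypes.py_object(exc_type))
    if res == 0:
        raise ValueError("invalid thread id")
    elif res > 1:
        # More than one thread state modified: revert, per the docs.
        ctypes.pythonapi.PyThreadState_SetAsyncExc(ctypes.c_ulong(tid), None)
        raise SystemError("PyThreadState_SetAsyncExc affected %d threads" % res)
```

The exception only shows up the next time the target thread executes
bytecode, so a thread blocked in a long C call won't notice it until the
call returns.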

-- 
Ivan Krstic <krstic at solarsail.hcs.harvard.edu> | GPG: 0x147C722D

From theller at python.net  Fri Aug 11 08:58:51 2006
From: theller at python.net (Thomas Heller)
Date: Fri, 11 Aug 2006 08:58:51 +0200
Subject: [Python-3000] Ctypes as cross-interpreter C calling interface
In-Reply-To: <44DBFE20.7040900@canterbury.ac.nz>
References: <1cb725390608092219v695b7f24t92534d3aa444ca8c@mail.gmail.com>	<ca471dc20608101050m231b618asc5c70181ed4078dc@mail.gmail.com>	<1cb725390608101319j19731f91vfc472d9113a03ccf@mail.gmail.com>	<ca471dc20608101617p2e6d13e7k2b1e96c7f23746e2@mail.gmail.com>	<1cb725390608101645g3a9db04dhcf76cfd03e3a15fc@mail.gmail.com>
	<44DBFE20.7040900@canterbury.ac.nz>
Message-ID: <ebh9rb$sr$1@sea.gmane.org>

Greg Ewing schrieb:
> Another thought about ctypes: What if you want to pass
> a Python function into C as a callback? Does ctypes
> have a way of handling that?
> 
Sure.  The tutorial has an example that calls qsort with a Python
comparison function:

http://docs.python.org/dev/lib/ctypes-callback-functions.html
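
In short, roughly like this (a sketch from memory, assuming a POSIX libc;
check the tutorial for the authoritative version):

```python
import ctypes

libc = ctypes.CDLL(None)  # POSIX; the tutorial uses cdll.msvcrt on Windows

# qsort wants int (*compar)(const void *, const void *);
# here the elements being compared are C ints.
CMPFUNC = ctypes.CFUNCTYPE(ctypes.c_int,
                           ctypes.POINTER(ctypes.c_int),
                           ctypes.POINTER(ctypes.c_int))

def py_cmp(a, b):
    return a[0] - b[0]    # dereference the element pointers

values = (ctypes.c_int * 5)(5, 1, 7, 33, 99)
libc.qsort(values, len(values), ctypes.sizeof(ctypes.c_int), CMPFUNC(py_cmp))
print(list(values))   # [1, 5, 7, 33, 99]
```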

Thomas


From theller at python.net  Fri Aug 11 09:10:01 2006
From: theller at python.net (Thomas Heller)
Date: Fri, 11 Aug 2006 09:10:01 +0200
Subject: [Python-3000] Ctypes as cross-interpreter C calling interface
In-Reply-To: <44DBE1BA.6000204@canterbury.ac.nz>
References: <1cb725390608092219v695b7f24t92534d3aa444ca8c@mail.gmail.com>
	<44DBE1BA.6000204@canterbury.ac.nz>
Message-ID: <ebhag8$4tu$1@sea.gmane.org>

Greg Ewing schrieb:
> Paul Prescod wrote:
>> It seems that the emerging 
>> consensus (bar a security question from Guido) is that ctypes is the way 
>> forward for calling C code in Python 3000. I'd like to clarify what this 
>> might mean:
> 
> What's the state of play concerning ctypes support
> on non-x86 platforms?

Pretty good, I would say.  Look, for example, at the buildbots.
Major architectures that are currently *not* supported:

- Linux/BSD/arm (because the libffi/arm doesn't support closures,
  although ctypes on WindowsCE/arm works)
- Windows/AMD64 (This is probably currently not a major platform.
  Sometimes I'm working on a port for this)
- I know that there are some problems on Solaris, although the solaris10/sparc
  buildbot does not report problems.

> Until ctypes is uniformly supported on all platforms,
> it can't be considered a complete replacement for
> C-coded extensions (whether handwritten or generated
> by something else).
> 
> --
> Greg

Thomas


From tomerfiliba at gmail.com  Fri Aug 11 09:33:00 2006
From: tomerfiliba at gmail.com (tomer filiba)
Date: Fri, 11 Aug 2006 09:33:00 +0200
Subject: [Python-3000] threading, part 2
In-Reply-To: <ca471dc20608101621j557f735cs10f4f491eb3b2ee5@mail.gmail.com>
References: <1d85506f0608101214g594d2dal282ab2ae60f29f11@mail.gmail.com>
	<ca471dc20608101621j557f735cs10f4f491eb3b2ee5@mail.gmail.com>
Message-ID: <1d85506f0608110033k2eac1f9h10908ddbef5db8c3@mail.gmail.com>

[Guido]
> I expect that Jython doesn't implement this; it doesn't handle ^C either AFAIK.

thread support is platform-dependent to begin with (old unices, embedded
systems, etc. are not likely to have threads at all)

so keeping this in mind, and given that interrupt_main is part of the
standard thread API, which as you say may not be implementation-agnostic,
why is thread.raise_exc(id, excobj) a bad API?

and as i recall, dotNET's Thread.Abort (or whatever it's called)
works that way (raising an exception in the other thread), so IronPython,
for one, should be happy with it.

by the way, is the GIL part of the python standard? i.e., does IronPython
implement it, although it shouldn't be necessary in dotNET?


-tomer

From slawomir.nowaczyk.847 at student.lu.se  Fri Aug 11 12:48:32 2006
From: slawomir.nowaczyk.847 at student.lu.se (Slawomir Nowaczyk)
Date: Fri, 11 Aug 2006 12:48:32 +0200
Subject: [Python-3000] threading, part 2
In-Reply-To: <c56e219d0608102001i44b1267dqb581c2171ced33ce@mail.gmail.com>
References: <c56e219d0608102001i44b1267dqb581c2171ced33ce@mail.gmail.com>
Message-ID: <20060811102346.EFC4.SLAWOMIR.NOWACZYK.847@student.lu.se>

On Thu, 10 Aug 2006 23:01:45 -0400
Luis P Caamano <lcaamano at gmail.com> wrote:

#> Yes, I also wonder about how non-CPython implementations would handle
#> this but I'd just like to say that this feature, making a thread raise
#> a specific exception from another thread asynchronously is a very
#> useful feature.
#> 
#> We have a subsystem that schedules requests that are dispatched in a
#> thread each.  The only way to cancel one of those requests right now
#> is via a cooperative checking method in which we explicitly make calls
#> through out the code to see if the request has been canceled, and in
#> such case, the check raises an exception that triggers clean up and
#> cancellation.
#> 
#> Problem is we have to spread check calls all over the place.  All this
#> would be a lot easier if we could do thread.terminate() as proposed,
#> especially for new code.

"All over the place"? Literally? In other words, how likely is it that
your code would still be correct if you had this check after *every*
single statement? Or even more often -- every N bytecodes?

I believe that if asynchronous exception raising ever gets officially
approved, there absolutely *needs* to be a way to block it for a piece
of code that should execute atomically.

It is (more or less) OK to have an unofficial way to terminate the
thread, with "use at your own risk", because there are situations
where it is useful and (in a cooperative environment) a reasonably safe
thing to do. 

But it should not be done lightly and never when the code is not
specifically expecting it.

-- 
 Best wishes,
   Slawomir Nowaczyk
     ( Slawomir.Nowaczyk at cs.lth.se )

Live in the past and future only.


From pje at telecommunity.com  Fri Aug 11 17:32:55 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri, 11 Aug 2006 11:32:55 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <mailman.34014.1155280218.27774.python-3000@python.org>
Message-ID: <5.1.1.6.0.20060811112118.023af398@sparrow.telecommunity.com>

At 06:10 AM 8/11/2006 -0700, Talin <talin at acm.org> wrote:
>Or to put it another way: If you create a tool, and you assume that tool
>will only be used in certain specific ways, but you fail to enforce that
>limitation, then your assumption will be dead wrong. The idea that there
>will only be a few type annotation providers who will all nicely
>cooperate with one another is just as naive as I was in the SysEx debacle.

Are you saying that function annotations are a bad idea because we won't be 
able to pickle them?

If not, your entire argument seems specious.  Actually, even if that *is* 
your argument, it's specious, since all that's needed to support pickling 
is to support pickling.  All that's needed to support printing is to 
support printing (via __str__), and so on.

Thus, by a similar process of analogy, all that's needed to support any 
operation is to have an extensible mechanism by which the operation is 
defined, so that the operation can be extended to include new types -- 
i.e., an overloadable function, like pickle.dump.

Conversely, using your analogy, one could say that the iteration protocol 
is a bad idea because lots of people might then have to implement their own 
__iter__ methods.  We should thus only have a fixed set of sequence types!

In short, your argument is based on a false analogy and is nonsensical when 
moved out of the realm of on-the-wire protocols and into the realm of a 
programming language.


From jcarlson at uci.edu  Fri Aug 11 17:45:54 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Fri, 11 Aug 2006 08:45:54 -0700
Subject: [Python-3000] threading, part 2
In-Reply-To: <20060811102346.EFC4.SLAWOMIR.NOWACZYK.847@student.lu.se>
References: <c56e219d0608102001i44b1267dqb581c2171ced33ce@mail.gmail.com>
	<20060811102346.EFC4.SLAWOMIR.NOWACZYK.847@student.lu.se>
Message-ID: <20060811082620.192E.JCARLSON@uci.edu>


Slawomir Nowaczyk <slawomir.nowaczyk.847 at student.lu.se> wrote:
> I believe that if asynchronous exception raising ever gets officially
> approved, there absolutely *needs* to be a way to block it for a piece
> of code that should execute atomically.

There is already a way of making Python source execution atomic with
respect to other Python code [1].


> But it should not be done lightly and never when the code is not
> specifically expecting it.

If you don't want random exceptions being raised in your threads, then
don't use this method that is capable of raising exceptions somewhat
randomly.

 - Josiah

[1]
Remove the two sys.setcheckinterval calls to verify this works.
"proper" use should probably use try/finally wrapping.

>>> import sys
>>> import threading
>>> import time
>>>
>>> x = 0
>>>
>>>
>>> def thr(n):
...     global x
...     while not x:
...         time.sleep(.01)
...     for i in xrange(n):
...         sys.setcheckinterval(sys.maxint)
...         _x = x + 1
...         x, _x = _x, x
...         sys.setcheckinterval(100)
...
>>>
>>> for i in xrange(10):
...     threading.Thread(target=thr, args=(1000000,)).start()
...
>>> x += 1
>>> while threading.activeCount() > 1:
...     time.sleep(.1)
...
>>> print x
10000001
>>>


From jason.orendorff at gmail.com  Fri Aug 11 17:47:39 2006
From: jason.orendorff at gmail.com (Jason Orendorff)
Date: Fri, 11 Aug 2006 11:47:39 -0400
Subject: [Python-3000] threading, part 2
In-Reply-To: <1d85506f0608110033k2eac1f9h10908ddbef5db8c3@mail.gmail.com>
References: <1d85506f0608101214g594d2dal282ab2ae60f29f11@mail.gmail.com>
	<ca471dc20608101621j557f735cs10f4f491eb3b2ee5@mail.gmail.com>
	<1d85506f0608110033k2eac1f9h10908ddbef5db8c3@mail.gmail.com>
Message-ID: <bb8868b90608110847wb5465eekd13cdb454eeac4cb@mail.gmail.com>

On 8/11/06, tomer filiba <tomerfiliba at gmail.com> wrote:
> why is thread.raise_exc(id, excobj) a bad API?

It breaks seemingly innocent code in subtle ways.  Worse, the breakage
will always be a race condition, so it'll be especially hard to
reproduce and debug.

class Foo:
    ...
    def close(self):
        self.f.close()
        self.closed = True

Any code that uses the "closed" attribute obviously depends on it
being properly set, right?  This close() method gets this right.  It
sets "closed" if and only if the self.f.close() call succeeds.  There
are circumstances where this will fail:  MemoryError,
KeyboardInterrupt, a broken trace function, a broken __setattr__(),
del __builtins__.True... but all are extreme cases.  I think
thread.raise_exc() should be considered extreme too.  Otherwise, its
existence must be considered to degrade the reliability of the above
code.

I'm not saying "don't add this".  Maybe it's useful, particularly as a
fallback mechanism for killing a runaway thread.  But it should be
documented as an extreme measure.

-j

From jcarlson at uci.edu  Fri Aug 11 18:04:54 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Fri, 11 Aug 2006 09:04:54 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <5.1.1.6.0.20060811112118.023af398@sparrow.telecommunity.com>
References: <mailman.34014.1155280218.27774.python-3000@python.org>
	<5.1.1.6.0.20060811112118.023af398@sparrow.telecommunity.com>
Message-ID: <20060811084623.1931.JCARLSON@uci.edu>


"Phillip J. Eby" <pje at telecommunity.com> wrote:
> 
> At 06:10 AM 8/11/2006 -0700, Talin <talin at acm.org> wrote:
> >Or to put it another way: If you create a tool, and you assume that tool
> >will only be used in certain specific ways, but you fail to enforce that
> >limitation, then your assumption will be dead wrong. The idea that there
> >will only be a few type annotation providers who will all nicely
> >cooperate with one another is just as naive as I was in the SysEx debacle.
> 
> Are you saying that function annotations are a bad idea because we won't be 
> able to pickle them?

That is not what I got out of the message at all.

> If not, your entire argument seems specious.  Actually, even if that *is* 
> your argument, it's specious, since all that's needed to support pickling 
> is to support pickling.  All that's needed to support printing is to 
> support printing (via __str__), and so on.

I think you misunderstood Talin.  While it was a pain for him to work
his way through implementing all of the loading/etc. protocols, I
believe his point was that if we allow any and all arbitrary metadata to
be placed on arguments to and from functions, then invariably there will
be multiple methods of doing as much.  That isn't a problem unto itself,
but when there ends up being multiple metadata formats, with multiple
interpretations of them, and a user decides that they want to combine
the functionality of two metadata formats, they may be stuck due to
incompatibilities, etc.

I think that it can be fixed by defining a standard mechanism for
'metadata chaining', one involving tuples and/or dictionaries.

Say, for example, we have the following function definition:
    def foo(argn:meta=dflt):
        ...

Since meta can take on the value of a Python expression (evaluated when
the function definition is executed), a tuple-based chaining would work like so:

    @chainmetadatatuple(meta_fcn1, meta_fcn2)
    def foo(argn:(meta1, meta2)=dflt):
        ...

And a dictionary-based chaining would work like so:
    @chainmetadatadict(m1=meta_fcn1, m2=meta_fcn2)
    def foo(argn:{'m1': meta1, 'm2': meta2}=dflt):
        ...

The reason to include the dict-based option is to allow for annotations
to be optional.


This method may or may not be good.  But, if we don't define a standard
method for metadata to be combined from multiple protocols, etc., then
we could end up with incompatibilities.  However, if we do define a
standard chaining mechanism, then it can be used, and presumably
we shouldn't run into problems relating to incompatible annotations, etc.
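A sketch of how the hypothetical chainmetadatatuple decorator proposed above could be implemented against Python 3 annotations; the interpreter functions here are invented for the example and just record what they are handed:

```python
# Hypothetical implementation of the chainmetadatatuple decorator from the
# message above: the i-th metadata interpreter is applied to the i-th
# element of each argument's annotation tuple.
def chainmetadatatuple(*interpreters):
    def decorate(func):
        for name, ann in func.__annotations__.items():
            for interpret, meta in zip(interpreters, ann):
                interpret(func, name, meta)
        return func
    return decorate

seen = []

def meta_fcn1(func, name, meta):       # e.g. a type-checking interpreter
    seen.append(("type", name, meta))

def meta_fcn2(func, name, meta):       # e.g. a documentation interpreter
    seen.append(("doc", name, meta))

@chainmetadatatuple(meta_fcn1, meta_fcn2)
def foo(argn: (int, "number of retries") = 0):
    return argn
```

Each interpreter sees only the tuple slot assigned to it, so the two metadata consumers never need to know about one another.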


 - Josiah


From jason.orendorff at gmail.com  Fri Aug 11 18:04:09 2006
From: jason.orendorff at gmail.com (Jason Orendorff)
Date: Fri, 11 Aug 2006 12:04:09 -0400
Subject: [Python-3000] threading, part 2
In-Reply-To: <20060811082620.192E.JCARLSON@uci.edu>
References: <c56e219d0608102001i44b1267dqb581c2171ced33ce@mail.gmail.com>
	<20060811102346.EFC4.SLAWOMIR.NOWACZYK.847@student.lu.se>
	<20060811082620.192E.JCARLSON@uci.edu>
Message-ID: <bb8868b90608110904s289236d7i9b60f14969966625@mail.gmail.com>

On 8/11/06, Josiah Carlson <jcarlson at uci.edu> wrote:
> Slawomir Nowaczyk <slawomir.nowaczyk.847 at student.lu.se> wrote:
> > But it should not be done lightly and never when the code is not
> > specifically expecting it.
>
> If you don't want random exceptions being raised in your threads, then
> don't use this method that is capable of raising exceptions somewhat
> randomly.

I agree.  The only question is how dire the warnings should be.

I'll answer that question with another question:  Are we going to make
the standard library robust against asynchronous exceptions?  For
example, class Thread has an attribute __stopped that is set using
code similar to the example code I posted.  An exception at just the
wrong time would kill the thread while leaving __stopped == False.

Maybe that particular case is worth fixing, but to find and fix them
all?  Better to put strong warnings on this one method: may cause
unpredictable brokenness.

-j

From jcarlson at uci.edu  Fri Aug 11 18:15:32 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Fri, 11 Aug 2006 09:15:32 -0700
Subject: [Python-3000] threading, part 2
In-Reply-To: <bb8868b90608110904s289236d7i9b60f14969966625@mail.gmail.com>
References: <20060811082620.192E.JCARLSON@uci.edu>
	<bb8868b90608110904s289236d7i9b60f14969966625@mail.gmail.com>
Message-ID: <20060811091309.1934.JCARLSON@uci.edu>


"Jason Orendorff" <jason.orendorff at gmail.com> wrote:
> 
> On 8/11/06, Josiah Carlson <jcarlson at uci.edu> wrote:
> > Slawomir Nowaczyk <slawomir.nowaczyk.847 at student.lu.se> wrote:
> > > But it should not be done lightly and never when the code is not
> > > specifically expecting it.
> >
> > If you don't want random exceptions being raised in your threads, then
> > don't use this method that is capable of raising exceptions somewhat
> > randomly.
> 
> I agree.  The only question is how dire the warnings should be.
> 
> I'll answer that question with another question:  Are we going to make
> the standard library robust against asynchronous exceptions?  For
> example, class Thread has an attribute __stopped that is set using
> code similar to the example code I posted.  An exception at just the
> wrong time would kill the thread while leaving __stopped == False.
> 
> Maybe that particular case is worth fixing, but to find and fix them
> all?  Better to put strong warnings on this one method: may cause
> unpredictable brokenness.

Considering that it will not be accessible via standard Python, only
through a few ctypes hoops, I believe that is a fairly ready indication
that one should be wary of its use.  I also think it would make sense to
fix that particular instance (to not do so seems to be a bit foolish).
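For reference, the "ctypes hoops" amount to calling CPython's PyThreadState_SetAsyncExc; a minimal sketch follows (the raise_exc helper name is illustrative, echoing the API proposed earlier in this thread):

```python
import ctypes
import threading
import time

def raise_exc(tid, exc_type):
    """Ask CPython to raise exc_type asynchronously in thread tid.

    Returns the number of thread states modified: 0 means the thread id
    was not found; more than 1 is an error and must be undone."""
    res = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_ulong(tid), ctypes.py_object(exc_type))
    if res > 1:
        # Undo the request by passing NULL for the exception.
        ctypes.pythonapi.PyThreadState_SetAsyncExc(ctypes.c_ulong(tid), None)
        raise SystemError("PyThreadState_SetAsyncExc failed")
    return res

result = []

def worker():
    try:
        while True:             # spin until interrupted asynchronously
            time.sleep(0.01)
    except KeyboardInterrupt:
        result.append("interrupted")

t = threading.Thread(target=worker)
t.start()
time.sleep(0.1)
raise_exc(t.ident, KeyboardInterrupt)
t.join()
```

Note the exception is only delivered the next time the target thread executes Python bytecode, so a thread blocked in C code will not see it until it returns.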


 - Josiah


From qrczak at knm.org.pl  Fri Aug 11 19:00:07 2006
From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk)
Date: Fri, 11 Aug 2006 19:00:07 +0200
Subject: [Python-3000] threading, part 2
In-Reply-To: <20060811082620.192E.JCARLSON@uci.edu> (Josiah Carlson's
	message of "Fri, 11 Aug 2006 08:45:54 -0700")
References: <c56e219d0608102001i44b1267dqb581c2171ced33ce@mail.gmail.com>
	<20060811102346.EFC4.SLAWOMIR.NOWACZYK.847@student.lu.se>
	<20060811082620.192E.JCARLSON@uci.edu>
Message-ID: <87fyg32oo8.fsf@qrnik.zagroda>

Josiah Carlson <jcarlson at uci.edu> writes:

> There is already a way of making Python source execution atomic with
> respect to other Python code [1].

It's not realistic to expect sys.setcheckinterval to be implementable on
other runtimes.

Also, it doesn't provide a way to unblock asynchronous exceptions until
a particular blocking operation completes.

> If you don't want random exceptions being raised in your threads, then
> don't use this method that is capable of raising exceptions somewhat
> randomly.

It's like saying "if you don't want integer addition overflow, then
don't do addition".

I do want asynchronous exceptions, but not anywhere, only in selected
regions (or excluding selected regions). This can be designed well.

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/

From jcarlson at uci.edu  Fri Aug 11 20:18:56 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Fri, 11 Aug 2006 11:18:56 -0700
Subject: [Python-3000] threading, part 2
In-Reply-To: <87fyg32oo8.fsf@qrnik.zagroda>
References: <20060811082620.192E.JCARLSON@uci.edu>
	<87fyg32oo8.fsf@qrnik.zagroda>
Message-ID: <20060811105742.193A.JCARLSON@uci.edu>


"Marcin 'Qrczak' Kowalczyk" <qrczak at knm.org.pl> wrote:
> 
> Josiah Carlson <jcarlson at uci.edu> writes:
> 
> > There is already a way of making Python source execution atomic with
> > respect to other Python code [1].
> 
> It's not realistic to expect sys.setcheckinterval be implementable on
> other runtimes.

The 'raise an exception in an alternate thread' functionality is
CPython-specific.  If you believe that it could be
implemented in all other runtimes, then you missed the discussion that
stated that it would be impossible to implement in Jython.  As such,
because both are CPython specific features, I don't see a problem with
using both if you are going to be using one of them.


> Also, it doesn't provide a way to unblock asynchronous exceptions until
> a particular blocking operation completes.

I thought the point of this 'block asynchronous exceptions' business was
to block asynchronous exceptions during a particular bit of code.  Now
you are saying that there needs to be a method of bypassing such
blocking from other threads?


> > If you don't want random exceptions being raised in your threads, then
> > don't use this method that is capable of raising exceptions somewhat
> > randomly.
> 
> It's like saying "if you don't want integer addition overflow, then
> don't do addition".

No.  Integer addition is a defined feature of the language.  Raising
exceptions in an alternate thread is a generally unsupported feature
available in CPython, very likely not implementable in most other
runtimes.

It has previously been available via ctypes, but its non-use is
a function of its lack of documentation, lack of ctypes shipping with
base Python, etc.

> I do want asynchronous exceptions, but not anywhere, only in selected
> regions (or excluding selected regions). This can be designed well.

Yes, it can be.  You can add a lock to each thread (each thread gets its
own lock).  When a thread doesn't want to be interrupted, it .acquire()s
its lock.  When it is OK to interrupt it, it .release()s its lock.  When
you want to kill a thread, .acquire() its lock, and kill it.

In effect, the above would be what is necessary to give you what you
want.  It can easily be defined as a set of 3 functions, whose
implementation should be left out of the standard library.  Including it
in the standard library offers the illusion of support (in the 'this
language feature is supported' sense) for raising an exception in an
alternate thread, which is not the case (it is available, but not
supported).
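A sketch of those three functions, combined with the ctypes-based asynchronous exception raising discussed in this thread; the names (init_interruptible, no_interrupt, allow_interrupt, kill_thread) are illustrative, not a proposed API:

```python
import ctypes
import threading
import time

# Each thread owns a lock; holding it marks a region that must not be
# interrupted.  Killers take the lock before raising, so they wait.
_interrupt_locks = {}

def init_interruptible():
    """Register the calling thread as killable."""
    _interrupt_locks[threading.current_thread().ident] = threading.Lock()

def no_interrupt():
    """Enter a region in which asynchronous exceptions must not arrive."""
    _interrupt_locks[threading.current_thread().ident].acquire()

def allow_interrupt():
    """Leave the protected region."""
    _interrupt_locks[threading.current_thread().ident].release()

def kill_thread(tid, exc=KeyboardInterrupt):
    """Raise exc in thread tid, waiting until it permits interruption."""
    with _interrupt_locks[tid]:
        ctypes.pythonapi.PyThreadState_SetAsyncExc(
            ctypes.c_ulong(tid), ctypes.py_object(exc))

events = []
in_critical = threading.Event()

def worker():
    init_interruptible()
    try:
        no_interrupt()
        try:
            in_critical.set()
            time.sleep(0.2)            # work that must complete atomically
            events.append("critical done")
        finally:
            allow_interrupt()
        while True:                    # interruptible again
            time.sleep(0.01)
    except KeyboardInterrupt:
        events.append("killed")

t = threading.Thread(target=worker)
t.start()
in_critical.wait()                     # worker now holds its interrupt lock
kill_thread(t.ident)                   # blocks until allow_interrupt()
t.join()
```

The killer blocks until the protected region is left, so the critical section always completes before the exception lands.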

 - Josiah


From qrczak at knm.org.pl  Fri Aug 11 21:33:10 2006
From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk)
Date: Fri, 11 Aug 2006 21:33:10 +0200
Subject: [Python-3000] threading, part 2
In-Reply-To: <20060811105742.193A.JCARLSON@uci.edu> (Josiah Carlson's
	message of "Fri, 11 Aug 2006 11:18:56 -0700")
References: <20060811082620.192E.JCARLSON@uci.edu>
	<87fyg32oo8.fsf@qrnik.zagroda> <20060811105742.193A.JCARLSON@uci.edu>
Message-ID: <87veozoyo9.fsf@qrnik.zagroda>

Josiah Carlson <jcarlson at uci.edu> writes:

>> It's not realistic to expect sys.setcheckinterval be implementable on
>> other runtimes.
>
> The 'raise an exception in an alternate thread' functionality is a
> CPython specific functionality.  If you believe that it could be
> implemented in all other runtimes, then you missed the discussion that
> stated that it would be impossible to implement in Jython.

Indeed both are hard to implement on some runtimes.

I believe there are runtimes where asynchronous exceptions are
practical while blocking context switching is not (e.g. POSIX threads
combined with Unix signals and C++ exceptions).

In any case, blocking context switches to any other thread is
overkill. It's hard to say how sys.setcheckinterval should behave on
truly parallel runtimes, while the semantics of blockable asynchronous
exceptions don't depend on threads being dispatched sequentially.

>> Also, it doesn't provide a way to unblock asynchronous exceptions until
>> a particular blocking operation completes.
>
> I thought the point of this 'block asynchronous exceptions' business
> was to block asynchronous exceptions during a particular bit of code.
> Now you are saying that there needs to be a method of bypassing such
> blocking from other threads?

No, I'm talking about specifying the blocking behavior by the thread
to be interrupted. It makes sense to wait for e.g. accept() such that
asynchronous exceptions are processed during the wait, but that they
are atomically blocked as soon as a connection is accepted.

Unfortunately it's yet another obstacle to some runtimes.

Yet another issue is asynchronous "signals" which don't necessarily
throw an exception but cause the computation to react and possibly
continue (e.g. suspend a thread until it's resumed).

> Yes, it can be.  You can add a lock to each thread (each thread gets its
> own lock).  When a thread doesn't want to be interrupted, it .acquire()s
> its lock.  When it is OK to interrupt it, it .release()s its lock.  When
> you want to kill a thread, .acquire() its lock, and kill it.

This works almost well. The thread sending an exception is unnecessarily
blocked; this could be solved by starting another thread to send an
exception. And it doesn't support the mentioned unblocking only while
waiting.

The problem is that there is no universally recognized convention:
I can't expect third-party libraries to protect their sensitive
regions by my mutex. Without an agreed convention they can't even
if they want to.

My design includes implicit blocking of asynchronous exceptions by
certain language constructs, e.g. by taking *any* mutex. Most cases
of taking a mutex also want to block asynchronous signals.

I'm surprised that various runtimes that I would expect to be well
designed provide mostly either unsafe or too restricted means of
asynchronous interruption.
http://java.sun.com/j2se/1.5.0/docs/guide/misc/threadPrimitiveDeprecation.html
http://www.interact-sw.co.uk/iangblog/2004/11/12/cancellation

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/

From pje at telecommunity.com  Fri Aug 11 21:34:01 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri, 11 Aug 2006 15:34:01 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <20060811084623.1931.JCARLSON@uci.edu>
References: <5.1.1.6.0.20060811112118.023af398@sparrow.telecommunity.com>
	<mailman.34014.1155280218.27774.python-3000@python.org>
	<5.1.1.6.0.20060811112118.023af398@sparrow.telecommunity.com>
Message-ID: <5.1.1.6.0.20060811152032.023a8fc0@sparrow.telecommunity.com>

At 09:04 AM 8/11/2006 -0700, Josiah Carlson wrote:
>I think you misunderstood Talin.  While it was a pain for him to work
>his way through implementing all of the loading/etc. protocols, I
>believe his point was that if we allow any and all arbitrary metadata to
>be placed on arguments to and from functions, then invariably there will
>be multiple methods of doing as much.  That isn't a problem unto itself,
>but when there ends up being multiple metadata formats, with multiple
>interpretations of them, and a user decides that they want to combine
>the functionality of two metadata formats, they may be stuck due to
>incompatibilities, etc.

I was giving him the benefit of the doubt by assuming he was bringing up a 
*new* objection that I hadn't already answered.  This "incompatibility" 
argument has already been addressed; it is trivially solved by overloaded 
functions (e.g. pickle.dump(), str(), iter(), etc.).


>This method may or may not be good.  But, if we don't define a standard
>method for metadata to be combined from multiple protocols, etc., then
>we could end up with incompatibilities.

Not if you use overloaded functions to define the operations you're going 
to perform.  You and Talin are proposing a problem here that is not only 
hypothetical, it's non-existent.

Remember, PEAK already does this kind of openly-extensible metadata for 
attributes, using a single-dispatch overloaded function (analogous to 
pickle.dump).  If you want to show that it's really possible to create 
"incompatible" annotations, try creating some for attributes in PEAK.

But, you'll quickly find that the only "meaning" that metadata has is 
*operational*.  That is, either some behavior is influenced by the 
metadata, or no behavior is.  If no behavior is involved, then there can be 
no incompatibility.  If there is behavior, there is an operation to be 
performed, and that operation can be based on the type of the metadata.

Ergo, using an overloadable function for the operation to be performed 
allows a meaning to be defined for the specific combination of operation 
and type.  Therefore, there is no problem - every piece of metadata may be 
assigned a meaning that is relevant for each operation that needs to be 
performed.

Now, it is of course possible that two pieces of metadata may be 
contradictory, redundant, overlapping, etc.  However, this has nothing to 
do with whether the semantics of metadata are predefined.  Any 
sufficiently-useful annotation scheme will include these possibilities, and 
the operations to be performed are going to have to have some defined 
semantics for them.  This is entirely independent of whether there is more 
than one metadata framework in existence.


From jcarlson at uci.edu  Fri Aug 11 22:12:15 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Fri, 11 Aug 2006 13:12:15 -0700
Subject: [Python-3000] threading, part 2
In-Reply-To: <87veozoyo9.fsf@qrnik.zagroda>
References: <20060811105742.193A.JCARLSON@uci.edu>
	<87veozoyo9.fsf@qrnik.zagroda>
Message-ID: <20060811125449.1940.JCARLSON@uci.edu>


Threading is already difficult enough to do 'right' (see the dozens of
threads discussing why this is really the case), and designing software
that can survive the raising of an exception at any point makes
threading even more difficult.

I believe that you are attempting to design an interface to make this
particular feature foolproof.  I think that such is a mistake; killing a
thread should be fraught with gotchas and should be documented as "may
crash the runtime".  Offering users anything more is tantamount to
encouraging its use, which is counter to the reasons why it is not
available via a standard threading.function call: because it shouldn't
be used at all, except by people who know what the heck they are doing.

I believe that if a user cannot design and implement their own system to
handle when a thread can be killed or not to their own satisfaction,
then they have no business killing threads.


 - Josiah

"Marcin 'Qrczak' Kowalczyk" <qrczak at knm.org.pl> wrote:
> Josiah Carlson <jcarlson at uci.edu> writes:
> 
> >> It's not realistic to expect sys.setcheckinterval be implementable on
> >> other runtimes.
> >
> > The 'raise an exception in an alternate thread' functionality is a
> > CPython specific functionality.  If you believe that it could be
> > implemented in all other runtimes, then you missed the discussion that
> > stated that it would be impossible to implement in Jython.
> 
> Indeed both are hard to implement on some runtimes.
> 
> I believe there are runtimes where asynchronous exceptions are
> practical while blocking context switching is not (e.g. POSIX threads
> combined with Unix signals and C++ exceptions).
> 
> In any case, blocking switching the context to any other thread is an
> overkill. It's hard to say how sys.setcheckinterval should behave on
> truly parallel runtimes, while the semantics of blockable asynchronous
> exceptions doesn't depend on threads being dispatched sequentially.
> 
> >> Also, it doesn't provide a way to unblock asynchronous exceptions until
> >> a particular blocking operation completes.
> >
> > I thought the point of this 'block asynchronous exceptions' business
> > was to block asynchronous exceptions during a particular bit of code.
> > Now you are saying that there needs to be a method of bypassing such
> > blocking from other threads?
> 
> No, I'm talking about specifying the blocking behavior by the thread
> to be interrupted. It makes sense to wait for e.g. accept() such that
> asynchronous exceptions are processed during the wait, but that they
> are atomically blocked as soon as a connection is accepted.
> 
> Unfortunately it's yet another obstacle to some runtimes.
> 
> Yet another issue is asynchronous "signals" which don't necessarily
> throw an exception but cause the computation to react and possibly
> continue (e.g. suspend a thread until it's resumed).
> 
> > Yes, it can be.  You can add a lock to each thread (each thread gets its
> > own lock).  When a thread doesn't want to be interrupted, it .acquire()s
> > its lock.  When it is OK to interrupt it, it .release()s its lock.  When
> > you want to kill a thread, .acquire() its lock, and kill it.
> 
> This works almost well. The thread sending an exception is unnecessarily
> blocked; this could be solved by starting another thread to send an
> exception. And it doesn't support the mentioned unblocking only while
> waiting.
> 
> The problem is that there is no universally recognized convention:
> I can't expect third-party libraries to protect their sensitive
> regions by my mutex. Without an agreed convention they can't even
> if they want to.
> 
> My design includes implicit blocking of asynchronous exception by
> certain language constructs, e.g. by taking *any* mutex. Most cases
> of taking a mutex also want to block asynchronous signals.
> 
> I'm surprised that various runtimes that I would expect to be well
> designed provide mostly either unsafe or too restricted means of
> asynchronous interruption.
> http://java.sun.com/j2se/1.5.0/docs/guide/misc/threadPrimitiveDeprecation.html
> http://www.interact-sw.co.uk/iangblog/2004/11/12/cancellation
> 
> -- 
>    __("<         Marcin Kowalczyk
>    \__/       qrczak at knm.org.pl
>     ^^     http://qrnik.knm.org.pl/~qrczak/


From jcarlson at uci.edu  Fri Aug 11 22:46:42 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Fri, 11 Aug 2006 13:46:42 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <5.1.1.6.0.20060811152032.023a8fc0@sparrow.telecommunity.com>
References: <20060811084623.1931.JCARLSON@uci.edu>
	<5.1.1.6.0.20060811152032.023a8fc0@sparrow.telecommunity.com>
Message-ID: <20060811131616.1943.JCARLSON@uci.edu>


"Phillip J. Eby" <pje at telecommunity.com> wrote:
> At 09:04 AM 8/11/2006 -0700, Josiah Carlson wrote:
> >I think you misunderstood Talin.  While it was a pain for him to work
> >his way through implementing all of the loading/etc. protocols, I
> >believe his point was that if we allow any and all arbitrary metadata to
> >be placed on arguments to and from functions, then invariably there will
> >be multiple methods of doing as much.  That isn't a problem unto itself,
> >but when there ends up being multiple metadata formats, with multiple
> >interpretations of them, and a user decides that they want to combine
> >the functionality of two metadata formats, they may be stuck due to
> >incompatibilities, etc.
> 
> I was giving him the benefit of the doubt by assuming he was bringing up a 
> *new* objection that I hadn't already answered.  This "incompatibility" 
> argument has already been addressed; it is trivially solved by overloaded 
> functions (e.g. pickle.dump(), str(), iter(), etc.).

In effect, you seem to be saying "when user X wants to add their own
metadata with interpretation, they need to overload the previously
existing metadata interpreter".  However, as has already been stated,
because there is no standard metadata interpreter, nor a standard method
for chaining metadata, how is user X supposed to overload the previously
existing metadata interpreter?

Since you brought up pickle.dump(), str(), iter(), etc., I'll point out
that str(), iter(), etc., call special methods on the defined object
(__str__, __iter__, etc.), and while pickle can have picklers be
registered, it also has a special method interface.  Because all of the
metadata defined is (according to the pre-PEP) attached to a single
__signature__ attribute of the function, interpretation of the metadata
isn't as easy as calling str(obj), as you claim.

Let us say that I have two metadata interpreters.  One that believes that
the metadata is types and wants to verify type on function call.  The
other believes that the metadata is documentation.  Both were written
without regards to the other.  Please describe to me (in code preferably)
how I would be able to use both of them without having a defined
metadata interpretation chaining semantic.


> >This method may or may not be good.  But, if we don't define a standard
> >method for metadata to be combined from multiple protocols, etc., then
> >we could end up with incompatibilities.
> 
> Not if you use overloaded functions to define the operations you're going 
> to perform.  You and Talin are proposing a problem here that is not only 
> hypothetical, it's non-existent.
> 
> Remember, PEAK already does this kind of openly-extensible metadata for 
> attributes, using a single-dispatch overloaded function (analogous to 
> pickle.dump).  If you want to show that it's really possible to create 
> "incompatible" annotations, try creating some for attributes in PEAK.

Could you at least provide a link to where it is documented how to
create metadata attributes in PEAK?  My attempts to delve into PEAK
documentation have thus far failed horribly.

 - Josiah


From pje at telecommunity.com  Fri Aug 11 23:11:00 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri, 11 Aug 2006 17:11:00 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <20060811131616.1943.JCARLSON@uci.edu>
References: <5.1.1.6.0.20060811152032.023a8fc0@sparrow.telecommunity.com>
	<20060811084623.1931.JCARLSON@uci.edu>
	<5.1.1.6.0.20060811152032.023a8fc0@sparrow.telecommunity.com>
Message-ID: <5.1.1.6.0.20060811165113.03cabe60@sparrow.telecommunity.com>

At 01:46 PM 8/11/2006 -0700, Josiah Carlson wrote:

>"Phillip J. Eby" <pje at telecommunity.com> wrote:
> > At 09:04 AM 8/11/2006 -0700, Josiah Carlson wrote:
> > >I think you misunderstood Talin.  While it was a pain for him to work
> > >his way through implementing all of the loading/etc. protocols, I
> > >believe his point was that if we allow any and all arbitrary metadata to
> > >be placed on arguments to and from functions, then invariably there will
> > >be multiple methods of doing as much.  That isn't a problem unto itself,
> > >but when there ends up being multiple metadata formats, with multiple
> > >interpretations of them, and a user decides that they want to combine
> > >the functionality of two metadata formats, they may be stuck due to
> > >incompatibilities, etc.
> >
> > I was giving him the benefit of the doubt by assuming he was bringing up a
> > *new* objection that I hadn't already answered.  This "incompatibility"
> > argument has already been addressed; it is trivially solved by overloaded
> > functions (e.g. pickle.dump(), str(), iter(), etc.).
>
>In effect, you seem to be saying "when user X wants to add their own
>metadata with interpretation, they need to overload the previously
>existing metadata interpreter".

No, they need to overload whatever *operation* is being performed *on* the 
metadata.

For example, if I am using a decorator that adds type checking to the 
function, then that decorator is an example of an operation that should be 
overloadable.

More precisely, that decorator would probably have an operation that 
generates type checking code for an individual type annotation -- and 
*that* is the operation that would need overloading.  The 
"generate_typecheck_code()" operation would be an overloadable function.

Another possible operation: printing help for a function.  You would need a 
"format_type_annotation()" overloadable operation, and so on.

There is no *single* "metadata interpreter", in other words.  There are 
just operations you perform on metadata.

If multiple people define different variants of the same operation, let's 
say "generate_typecheck_code()" and "generate_code_for_typecheck()", and 
you have some code that defines methods for one overloadable function, but 
you have code that wants to call the other, you just write some methods for 
one that call the other, or make one be the default implementation for the 
other.

There is no need for a *single* canonical operation *or* type.  This is the 
whole point of generic functions, really.  They eliminate the need for One 
Framework To Rule Them All, and tend to dissolve the "framework"ness right 
out of frameworks.  What you end up with are extensible libraries instead 
of frameworks.
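For illustration, here is roughly what this looks like with a modern single-dispatch generic function (functools.singledispatch, which postdates this discussion); the annotation types and operation names are invented for the example:

```python
from functools import singledispatch

# Invented annotation types, written independently of each other.
class TypeAnnotation:
    def __init__(self, typ):
        self.typ = typ

class DocAnnotation:
    def __init__(self, text):
        self.text = text

# Operation 1: build a checker from one piece of metadata.  The default
# method ignores metadata it doesn't understand, so no chaining protocol
# is needed.
@singledispatch
def make_check(ann):
    return lambda value: True

@make_check.register(TypeAnnotation)
def _(ann):
    return lambda value: isinstance(value, ann.typ)

# Operation 2: format help text from one piece of metadata.
@singledispatch
def format_help(ann):
    return repr(ann)

@format_help.register(DocAnnotation)
def _(ann):
    return ann.text

# A user combines both kinds of metadata; each operation simply handles
# the annotation types it has methods for and falls back otherwise.
annotations = (TypeAnnotation(int), DocAnnotation("the number of retries"))
checks = [make_check(a) for a in annotations]
help_text = [format_help(a) for a in annotations]
```

Each operation dispatches on the metadata's type, so independently written annotation kinds coexist without a shared "metadata interpreter".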


>Since you brought up pickle.dump(), str(), iter(), etc., I'll point out
>that str(), iter(), etc., call special methods on the defined object
>(__str__, __iter__, etc.), and while pickle can have picklers be
>registered, it also has a special method interface.  Because all of the
>metadata defined is (according to the pre-PEP) attached to a single
>__signature__ attribute of the function, interpretation of the metadata
>isn't as easy as calling str(obj), as you claim.

Actually, with overloadable functions, it is, since overloadable functions 
can be extended by anybody, without needing to monkey with the 
classes.  Note that if Guido had originally created Python with 
overloadable functions, it's rather unlikely that __special__ methods would 
have arisen.  Instead, it's much more likely that there would be syntax 
sugar for easily defining overloads, like "defop str(self): ...".
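A rough sketch of what such an overload-based str() might look like, again 
with singledispatch standing in and a made-up name, render(), so as not to 
shadow the builtin:

```python
from functools import singledispatch

# If str() were an overloadable function rather than a __str__ hook,
# anyone could extend it without touching the class being printed.
@singledispatch
def render(obj):
    return repr(obj)           # fallback, much like object.__str__

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

# "defop str(self): ..." would be sugar for an overload like this one:
@render.register(Point)
def _(p):
    return "Point(%d, %d)" % (p.x, p.y)

print(render(Point(1, 2)))     # → Point(1, 2)
print(render(42))              # → 42
```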


>Let us say that I have two metadata interpters.  One that believes that
>the metadata is types and wants to verify type on function call.  The
>other believes that the metadata is documentation.  Both were written
>without regards to the other.  Please describe to me (in code preferably)
>how I would be able to use both of them without having a defined
>metadata interpretation chaining semantic.

See explanation above.


> > Remember, PEAK already does this kind of openly-extensible metadata for
> > attributes, using a single-dispatch overloaded function (analagous to
> > pickle.dump).  If you want to show that it's really possible to create
> > "incompatible" annotations, try creating some for attributes in PEAK.
>
>Could you at least provide a link to where it is documented how to
>create metadata attributes in PEAK?  My attempts to delve into PEAK
>documentation has thus far failed horribly.

Here's the tutorial for defining new metadata (among other things):

http://svn.eby-sarna.com/PEAK/src/peak/binding/attributes.txt?view=markup

The example defines a "Message()" metadata type whose sole purpose is to 
print a message when the attribute is declared.

What's not really explained there is that all the 'addMethod' stuff is 
basically adding methods to an overloaded function.

Anyway, PEAK uses this simple metadata declaration system to implement both
security permission declarations:

http://peak.telecommunity.com/DevCenter/SecurityRules#linking-actions-to-permissions

and command-line options:

http://peak.telecommunity.com/DevCenter/OptionsHowTo#declaring-options

In PEAK's case, a single overloaded operation is invoked when the metadata 
is defined, and then that overloaded operation performs whatever actions 
are relevant for the metadata.  For function metadata, however, it's 
sufficient to use distinct overloaded functions for distinct operations and 
not actually "do" anything unless it's needed.

However, if we wanted things to be able to happen just by declaring 
metadata (without using any decorators or performing any other operations), 
then yes, the language would need some equivalent to PEAK's 
"declareAttribute()" overloaded function.  However, my understanding of the 
proposal was that annotations were intended to be inert and purely 
informational *unless* processed by a decorator or some other mechanism.


From talin at acm.org  Sat Aug 12 00:16:11 2006
From: talin at acm.org (Talin)
Date: Fri, 11 Aug 2006 15:16:11 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <5.1.1.6.0.20060811112118.023af398@sparrow.telecommunity.com>
References: <5.1.1.6.0.20060811112118.023af398@sparrow.telecommunity.com>
Message-ID: <44DD01AB.20809@acm.org>

Phillip J. Eby wrote:
> At 06:10 AM 8/11/2006 -0700, Talin <talin at acm.org> wrote:
>> Or to put it another way: If you create a tool, and you assume that tool
>> will only be used in certain specific ways, but you fail to enforce that
>> limitation, then your assumption will be dead wrong. The idea that there
>> will only be a few type annotation providers who will all nicely
>> cooperate with one another is just as naive as I was in the SysEx 
>> debacle.
> 
> Are you saying that function annotations are a bad idea because we won't 
> be able to pickle them?

Huh? What does pickling have to do with anything I said?

-- Talin


From talin at acm.org  Sat Aug 12 00:39:56 2006
From: talin at acm.org (Talin)
Date: Fri, 11 Aug 2006 15:39:56 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <20060811084623.1931.JCARLSON@uci.edu>
References: <mailman.34014.1155280218.27774.python-3000@python.org>	<5.1.1.6.0.20060811112118.023af398@sparrow.telecommunity.com>
	<20060811084623.1931.JCARLSON@uci.edu>
Message-ID: <44DD073C.7030305@acm.org>

Josiah Carlson wrote:
> "Phillip J. Eby" <pje at telecommunity.com> wrote:
>> At 06:10 AM 8/11/2006 -0700, Talin <talin at acm.org> wrote:
>>> Or to put it another way: If you create a tool, and you assume that tool
>>> will only be used in certain specific ways, but you fail to enforce that
>>> limitation, then your assumption will be dead wrong. The idea that there
>>> will only be a few type annotation providers who will all nicely
>>> cooperate with one another is just as naive as I was in the SysEx debacle.
>> Are you saying that function annotations are a bad idea because we won't be 
>> able to pickle them?
> 
> That is not what I got out of the message at all.
> 
>> If not, your entire argument seems specious.  Actually, even if that *is* 
>> your argument, it's specious, since all that's needed to support pickling 
>> is to support pickling.  All that's needed to support printing is to 
>> support printing (via __str__), and so on.
> 
> I think you misunderstood Talin.  While it was a pain for him to work
> his way through implementing all of the loading/etc. protocols, I
> believe his point was that if we allow any and all arbitrary metadata to
> be placed on arguments to and from functions, then invariably there will
> be multiple methods of doing as much.  That isn't a problem unto itself,
> but when there ends up being multiple metadata formats, with multiple
> interpretations of them, and a user decides that they want to combine
> the functionality of two metadata formats, they may be stuck due to
> incompatibilities, etc.
> 
> I think that it can be fixed by defining a standard mechanism for
> 'metadata chaining', one involving tuples and/or dictionaries.
> 
> Say, for example, we have the following function definition:
>     def foo(argn:meta=dflt):
>         ...
> 
> Since meta can take on the value of a Python expression (executed during
> compile-time), a tuple-based chaining would work like so:
> 
>     @chainmetadatatuple(meta_fcn1, meta_fcn2)
>     def foo(argn:(meta1, meta2)=dflt):
>         ...
> 
> And a dictionary-based chaining would work like so:
>     @chainmetadatadict(m1=meta_fcn1, m2=meta_fcn2)
>     def foo(argn:{'m1'=meta1, 'm2'=meta2}=dflt):
>         ...
> 
> The reason to include the dict-based option is to allow for annotations
> to be optional.
> 
> 
> This method may or may not be good.  But, if we don't define a standard
> method for metadata to be combined from multiple protocols, etc., then
> we could end up with incompatabilities.  However, if we do define a
> standard chaining mechanism, then it can be used, and presumably
> we shouldn't run into problems relating to incompatible annotation, etc.
> 
> 
>  - Josiah

Josiah is essentially correct in his interpretation of my views. I 
really don't understand what Phillip is talking about here.

Say I want to annotate a specific argument with two pieces of 
information, a type and a docstring. I have two metadata interpreters, 
one which uses the type information to restrict the kinds of arguments 
that can be passed in, and another which uses the docstring to enhance 
the generated documentation.

Now, let's say that these two metadata interpreters were written by two 
people, who are not in communication with each other. Each one decides 
that they would like to "play nice" with other competing metadata.

So Author A, who wrote the annotation decorator that looks for 
docstrings, decides that not only will he accept docstring annotations, 
but if the annotation is a tuple, then he will search that tuple for any 
docstrings, skipping over any annotations that he doesn't understand. 
(Although how he is supposed to manage that is unclear - since there 
could also be other annotations that are simple text strings as well.)

Author B, who wrote the type-enforcement module, also wants to play nice 
with others, but since he doesn't know A, comes up with a different 
solution. His idea is to create a system in which annotations 
automatically chain each other - so that each annotation has a "next" 
attribute referring to the next annotation.

So programmer C, who wants to incorporate both A and B's work into his 
program, has a dilemma - each has a sharing mechanism, but the sharing 
mechanisms are different and incompatible. So he is unable to apply both 
A-type and B-type metadata to any given signature.

What happens next is that C complains to both A and B (in the process 
introducing them to each other). A and B exchange emails, and reach the 
conclusion that B will modify his library to conform to the sharing 
mechanism of A.

What this means is that A and B have created a de facto standard. Anyone 
who wants to interoperate with A and B has to write their interpreter 
to conform to the sharing mechanism defined by A and B.

But it also means that anyone outside of the ABC clique will not know 
about A&B's sharing convention, which means that their metadata 
interpreter will not be able to interoperate with A&B-style metadata. So 
in essence, A&B have now "captured" the space of annotations - that is, 
anyone who conforms to the A&B protocol can combine their annotations 
together; Anyone outside that group is excluded from interoperating.

Finally, let's say that A&B eventually become well-known enough that 
their sharing convention becomes the de facto standard. Any metadata that 
wants to interoperate with other metadata-interpretation libraries will 
have to follow the A&B convention. Any metadata library that chooses a 
different convention will be at a severe disadvantage, since it can't 
be used together with other metadata interpreters.

What this means is that, despite the statements that annotations have no 
defined format or meaning, the fact is that they now do: the de facto A&B 
sharing convention. The sharing convention tells metadata interpreters 
how to distinguish the metadata they can interpret and how to skip over 
the rest.

In other words, because the original author of the annotation system 
failed to provide a convention for multiple annotations, the community 
is forced to fill in the parts of the standard that were left out.

-- Talin


From seojiwon at gmail.com  Sat Aug 12 01:20:20 2006
From: seojiwon at gmail.com (Jiwon Seo)
Date: Fri, 11 Aug 2006 16:20:20 -0700
Subject: [Python-3000] PEP3102 Keyword-Only Arguments
Message-ID: <b008462b0608111620q709e691fqdb6283b194e1a893@mail.gmail.com>

When we have keyword-only arguments, do we allow 'keyword dictionary'
argument? If that's the case, where would we want to place
keyword-only arguments?

Are we going to allow any of followings?

1. def foo(a, b,  *, key1=None, key2=None, **map)
2. def foo(a, b, *,  **map, key1=None, key2=None)
3. def foo(a, b, *, **map)
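For what it's worth, under the rule that **map must come last, only the 
first spelling is expressible; that is the form Python 3 (PEP 3102) 
ultimately allowed, and the other two are syntax errors:

```python
# Form 1: keyword-only arguments go after the bare *, **map comes last.
def foo(a, b, *, key1=None, key2=None, **map):
    return (a, b, key1, key2, map)

print(foo(1, 2, key1="x", extra=3))  # → (1, 2, 'x', None, {'extra': 3})

# key1/key2 cannot be supplied positionally:
try:
    foo(1, 2, "x")
except TypeError:
    print("keyword-only enforced")
```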

-Jiwon

From collinw at gmail.com  Sat Aug 12 01:49:32 2006
From: collinw at gmail.com (Collin Winter)
Date: Fri, 11 Aug 2006 19:49:32 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <44DD073C.7030305@acm.org>
References: <mailman.34014.1155280218.27774.python-3000@python.org>
	<5.1.1.6.0.20060811112118.023af398@sparrow.telecommunity.com>
	<20060811084623.1931.JCARLSON@uci.edu> <44DD073C.7030305@acm.org>
Message-ID: <43aa6ff70608111649g54e82dd6kef19862f0c281254@mail.gmail.com>

I'll combine my replies to Josiah and Talin:

On 8/11/06, Josiah Carlson <jcarlson at uci.edu> wrote:
> Let us say that I have two metadata interpters.  One that believes that
> the metadata is types and wants to verify type on function call.  The
> other believes that the metadata is documentation.  Both were written
> without regards to the other.  Please describe to me (in code preferably)
> how I would be able to use both of them without having a defined
> metadata interpretation chaining semantic.

On 8/11/06, Talin <talin at acm.org> wrote:
> Say I want to annotate a specific argument with two pieces of
> information, a type and a docstring. I have two metadata interpreters,
> one which uses the type information to restrict the kinds of arguments
> that can be passed in, and another which uses the docstring to enhance
> the generated documentation.

[snipped: the rise of a defacto annotation-sharing standard]

> What this means is that, despite the statements that annotations have no
> defined format or meaning, the fact is that they now do: The defacto A&B
> sharing convention. The sharing convention tells metadata interpreters
> how to distinguish between metadata that they can interpret, and how to
> skip over other metadata.

What Josiah is hinting at -- and what Talin describes more explicitly
-- is the problem of how exactly "chaining" annotation interpreters
will work.

The case I've thought out the most completely is that of using
decorators to analyse/utilise the annotations:

1) Each decorator should be written with the assumption that it is the
only decorator that will be applied to a given function (with respect
to annotations).

2) Chaining will be accomplished by maintaining this illusion for each
decorator. For example, if our annotation-sharing convention is that
annotations will be n-tuples (n == number of annotation-interpreting
decorators), where t[i] is the annotation the i-th decorator should
care about, the following chain() function will do the trick (a full
demo script is attached):

    def chain(*decorators):
        assert len(decorators) >= 2

        def decorate(function):
            sig = function.__signature__
            original = sig.annotations

            for i, dec in enumerate(decorators):
                fake = dict((p, original[p][i]) for p in original)

                function.__signature__.annotations = fake
                function = dec(function)

            function.__signature__.annotations = original
            return function
        return decorate

A similar function can be worked out for using dictionaries to specify
multiple annotations.
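For concreteness, such a dict-based variant might look like the following 
sketch. It works against the draft PEP's model, not any shipped API: 
__signature__ is simulated here by a tiny stand-in class carrying an 
'annotations' dict, and the toy decorators just record what they see.

```python
# Stand-in for the draft PEP's __signature__ object.
class Signature:
    def __init__(self, annotations):
        self.annotations = annotations

def chain_dict(**decorators):
    # Maps a key ('m1', 'm2', ...) to the decorator that should see
    # only the annotations filed under that key.
    def decorate(function):
        original = function.__signature__.annotations
        for key, dec in decorators.items():
            # Show each decorator only its own slice; parameters whose
            # annotation dict lacks this key are simply omitted, which
            # is what makes annotations optional per-decorator.
            fake = {p: ann[key] for p, ann in original.items() if key in ann}
            function.__signature__.annotations = fake
            function = dec(function)
        function.__signature__.annotations = original
        return function
    return decorate

# Toy decorators that record the annotations they were shown:
seen = {}
def record(name):
    def dec(fn):
        seen[name] = dict(fn.__signature__.annotations)
        return fn
    return dec

def foo(x):
    return x

foo.__signature__ = Signature({'x': {'m1': int, 'm2': "doc for x"}})
foo = chain_dict(m1=record('types'), m2=record('docs'))(foo)
```

Each decorator sees only its own key's annotations, and the full dict is 
restored afterwards.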

I'll update the PEP draft to include a section on guidelines for
writing such decorators.

Collin Winter
-------------- next part --------------
A non-text attachment was scrubbed...
Name: chaining_decorators.py
Type: text/x-python-script
Size: 1497 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20060811/065a0df9/attachment.bin 

From tomerfiliba at gmail.com  Sat Aug 12 02:13:24 2006
From: tomerfiliba at gmail.com (tomer filiba)
Date: Sat, 12 Aug 2006 02:13:24 +0200
Subject: [Python-3000] threading, part 2
Message-ID: <1d85506f0608111713m15cf2e67v8b94f06c928e9125@mail.gmail.com>

i mailed this to several people separately, but then i thought it could
benefit the entire group:

http://sebulba.wikispaces.com/recipe+thread2

it's an implementation of the proposed "thread.raise_exc", through an
extension to the threading.Thread class. you can test it for yourself;
if it proves useful, it should be exposed as thread.raise_exc in the
stdlib (instead of the ctypes hack)... and of course it should be
reflected in threading.Thread as well.
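For readers without the recipe handy, the ctypes hack in question boils 
down to a call into CPython's PyThreadState_SetAsyncExc; a stripped-down 
sketch (CPython-only, and best-effort by nature, since the exception is 
only delivered between bytecodes):

```python
import ctypes
import threading
import time

def raise_exc(thread, exctype):
    """Asynchronously raise exctype in another thread via the C API."""
    if not thread.is_alive():
        raise ValueError("thread is not running")
    res = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_ulong(thread.ident), ctypes.py_object(exctype))
    if res > 1:
        # We affected more than one thread state: revert with NULL.
        ctypes.pythonapi.PyThreadState_SetAsyncExc(
            ctypes.c_ulong(thread.ident), None)
        raise SystemError("PyThreadState_SetAsyncExc failed")

stopped = []

def worker():
    try:
        while True:
            time.sleep(0.01)
    except KeyboardInterrupt:
        stopped.append(True)

t = threading.Thread(target=worker)
t.start()
time.sleep(0.05)
raise_exc(t, KeyboardInterrupt)
t.join()
print(stopped)  # → [True]
```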



-tomer
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060812/487eb7e6/attachment.htm 

From greg.ewing at canterbury.ac.nz  Sat Aug 12 03:06:40 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 12 Aug 2006 13:06:40 +1200
Subject: [Python-3000] threading, part 2
In-Reply-To: <20060811102346.EFC4.SLAWOMIR.NOWACZYK.847@student.lu.se>
References: <c56e219d0608102001i44b1267dqb581c2171ced33ce@mail.gmail.com>
	<20060811102346.EFC4.SLAWOMIR.NOWACZYK.847@student.lu.se>
Message-ID: <44DD29A0.4000902@canterbury.ac.nz>

Slawomir Nowaczyk wrote:

> But it should not be done lightly and never when the code is not
> specifically expecting it.

What if, together with a way of blocking asynchronous
exceptions, threads started out by default with them
blocked? Then a thread would have to explicitly consent
to being interrupted.
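Greg's opt-in model can be approximated today with purely cooperative 
machinery; a sketch (all names hypothetical) in which an interrupt request 
is only honoured once the thread has explicitly consented:

```python
import threading

class Interruptible:
    """Interruption is blocked by default; a thread consents via 'with'."""
    def __init__(self):
        self._interrupted = threading.Event()
        self._allowed = False

    def interrupt(self):            # called from another thread
        self._interrupted.set()

    def checkpoint(self):           # raises only if consent was given
        if self._allowed and self._interrupted.is_set():
            self._interrupted.clear()
            raise KeyboardInterrupt

    def __enter__(self):
        self._allowed = True        # consent begins...
        try:
            self.checkpoint()       # ...delivering any pending request
        except BaseException:
            self._allowed = False
            raise
        return self

    def __exit__(self, *exc):
        self._allowed = False       # consent ends
        return False

ctl = Interruptible()
ctl.interrupt()     # a request arrives while exceptions are blocked...
ctl.checkpoint()    # ...and is ignored: no consent yet
try:
    with ctl:       # consenting delivers the pending interrupt
        pass
except KeyboardInterrupt:
    print("interrupted at a safe point")
```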

--
Greg

From pje at telecommunity.com  Sat Aug 12 03:32:49 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri, 11 Aug 2006 21:32:49 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <mailman.34095.1155334580.27774.python-3000@python.org>
Message-ID: <5.1.1.6.0.20060811211801.02287420@sparrow.telecommunity.com>

At 3:16 PM 8/12/2006 -0700, Talin <talin at acm.org> wrote:
>Phillip J. Eby wrote:
> > At 06:10 AM 8/11/2006 -0700, Talin <talin at acm.org> wrote:
> >> Or to put it another way: If you create a tool, and you assume that tool
> >> will only be used in certain specific ways, but you fail to enforce that
> >> limitation, then your assumption will be dead wrong. The idea that there
> >> will only be a few type annotation providers who will all nicely
> >> cooperate with one another is just as naive as I was in the SysEx
> >> debacle.
> >
> > Are you saying that function annotations are a bad idea because we won't
> > be able to pickle them?
>
>Huh? What does pickling have to do with anything I said?

I'll happily answer that question as soon as you explain what *function 
annotations* have to do with anything you said.  Bonus points if you can 
explain what MIDI has to do with overloaded functions.  :)

To put it another way, the only reason I asked about pickling was to try to 
find *some* meaning in your post.  If pickling doesn't relate, then your 
post has nothing to do with function annotations, because pickling is the 
most similar thing to the programming problem you actually described.

However, if pickling *does* relate, then the mere existence of Python's 
ability to do pickling proves that the MIDI issue, transferred to the 
Python sphere, doesn't actually exist.

Thus, either way, the MIDI problems you described are moot with respect to 
function annotations in Python.

Is that clearer?  (See also my replies to Greg and Josiah on this subject.)


From lcaamano at gmail.com  Sat Aug 12 03:51:25 2006
From: lcaamano at gmail.com (Luis P Caamano)
Date: Fri, 11 Aug 2006 21:51:25 -0400
Subject: [Python-3000] threading, part 2
Message-ID: <c56e219d0608111851i6f053407q96e5989fdff77848@mail.gmail.com>

That's how I feel too Josiah.  In some ways, it's the same as writing
device drivers in a pre-emptable kernel.  You can get interrupted and
pre-empted by the hardware at any freaking time in any piece of code
and your memory might go away so you better pin it and deal with the
interrupts.  Forget about that and you end up with a nice kernel
panic.  Still, we have all kinds of device drivers on SMP,
pre-emptable kernels.  It can be done.

[ sarcastic mode on ]

Yes, if it gets exposed to the language it should come with a big
warning ... now, how condescending should that warning be?  "You can't
use this unless you're a good programmer!"  or "You better know what
you're doing"  or how about "A guy once pulled out all his pubic hair
trying to figure out what happened when he started using this
feature!"?

[ sarcastic mode off]

It's a gun, here's a bullet, it's a tool, go get food but try not to
shoot yourself.

I'm also -0 on this, not that I think my opinion counts though.  I'm
-0 because Tomer pointed me to a nice recipe that uses ctypes to get
to the C interface.  I'm happy with that and we can start using it
right now.  Perhaps that's as exposed as it should get, making it an
automatic skill test: if you can find it, you probably know how to use
it and the kind of problems you might run into.


On 8/11/06, Josiah Carlson <jcarlson at uci.edu> wrote:
>
>
> I believe that if a user cannot design and implement their own system to
> handle when a thread can be killed or not to their own satisfaction,
> then they have no business killing threads.
>
>
>  - Josiah
>


-- 
Luis P Caamano
Atlanta, GA USA

From talin at acm.org  Sat Aug 12 04:17:37 2006
From: talin at acm.org (Talin)
Date: Fri, 11 Aug 2006 19:17:37 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <5.1.1.6.0.20060811211801.02287420@sparrow.telecommunity.com>
References: <5.1.1.6.0.20060811211801.02287420@sparrow.telecommunity.com>
Message-ID: <44DD3A41.10507@acm.org>

Phillip J. Eby wrote:
> At 3:16 PM 8/12/2006 -0700, Talin <talin at acm.org> wrote:
>> Phillip J. Eby wrote:
>> > At 06:10 AM 8/11/2006 -0700, Talin <talin at acm.org> wrote:
>> >> Or to put it another way: If you create a tool, and you assume that 
>> tool
>> >> will only be used in certain specific ways, but you fail to enforce 
>> that
>> >> limitation, then your assumption will be dead wrong. The idea that 
>> there
>> >> will only be a few type annotation providers who will all nicely
>> >> cooperate with one another is just as naive as I was in the SysEx
>> >> debacle.
>> >
>> > Are you saying that function annotations are a bad idea because we 
>> won't
>> > be able to pickle them?
>>
>> Huh? What does pickling have to do with anything I said?
> 
> I'll happily answer that question as soon as you explain what *function 
> annotations* have to do with anything you said.  Bonus points if you can 
> explain what MIDI has to do with overloaded functions.  :)

All right. I realize that not everyone made the connection between my 
parable and the current debate, and I need to spell it out more explicitly.

The parable is essentially about standards writers who fail to do their 
job by underspecifying certain aspects of the standard, leaving the 
solution to individual implementers; and it's also about how the 
implementers who try to fill in the missing pieces of the standard do so 
in ways that are unique and incompatible with what every other 
implementer is doing.

The story also has to do with people who assume things about the 
behavior of other software developers - specifically, my assumption that 
other people, working in isolation from one another, would come up with 
the same or similar solutions to a given problem, vs. Colin's assumption 
that other creators of annotation interpreters would coordinate their 
efforts in a sensible way.

What the annotation PEP and the SysEx have in common is that they are 
both dealing with an open-ended specification - one which allows any 
provider to extend the protocol in any way they wish, without any 
knowledge or coordination from any other provider. Both specs describe a 
'container' for information, but deliberately avoid saying what's in the 
container. Both specs fail to provide any means for an external entity 
to discover what the objects in the container mean - instead, external 
entities must have a priori knowledge of the contained data.

My criticism of Colin's PEP was that it hand-waved over some fairly 
major problems, and the logic behind the hand-wave was that, well, 
developers won't do that - there's only going to be a small number of 
such developers, and they will all deal with each other. I wanted to 
illustrate how disastrous such an assumption could be.

Another lesson of the story has to do with the failure of the MMA 
committee to specify any guidelines or hints as to how their open-ended 
protocol should be used. If the MMA had simply put a paragraph in the 
original standard saying "You are free to create any protocol format you 
want, but here's an example of how a bulk dump protocol might work" 
(followed by a description of such), then what would have happened is 
that most of the instrument makers would simply have used the example as 
a starting point. This would have saved millions of man-hours of 
confusion and chaos over the last 20 years. Dozens of companies created 
Universal Librarian products, and all of them had to deal with the 
astounding diversity of protocols, which could have been avoided by one 
little non-binding paragraph in the standard.

In other words, I criticize both the MMA's spec and Colin's for the sin 
of underspecification - that is, allowing critical decisions that 
*should* have been made by the standard writer to instead be made by the 
standard implementers, with the result that each implementer comes up 
with their own unique solution to a problem which should have been 
solved in the original standard doc.

-- Talin


From collinw at gmail.com  Sat Aug 12 04:43:43 2006
From: collinw at gmail.com (Collin Winter)
Date: Fri, 11 Aug 2006 22:43:43 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <44DD3A41.10507@acm.org>
References: <5.1.1.6.0.20060811211801.02287420@sparrow.telecommunity.com>
	<44DD3A41.10507@acm.org>
Message-ID: <43aa6ff70608111943o1fb05d1eq753157bc4fc53ccb@mail.gmail.com>

On 8/11/06, Talin <talin at acm.org> wrote:
> The story also has to do with people who assume things about the
> behavior of other software developers - specifically, my assumption that
> other people, working in isolation from one another, would come up with
> the same or similar solutions to a given problem, vs. Colin's assumption
> that other creators of annotation interpreters would coordinate their
> efforts in a sensible way.

I make no assumptions that people writing annotation interpreters will
coordinate their efforts. My assertion that "[t]here is no worry that
these libraries will assign semantics at random, or that a variety of
libraries will appear, each with varying semantics and interpretations
of what, say, a tuple of strings means." is not based on coordination
but rather the marketplace. If someone starts assigning semantics that
aren't "pythonic", that don't fit in with how the majority of Python
programmers think, no-one will use their library and it will die.

The drive to write, release and maintain open-source software is
predicated on a desire to have people use your product, to find it
useful. To that end, I expect that the creators of annotation
interpreters will take care to maximise the utility (and hence the
audience) for their library.

> What the annotation PEP and the SysEx have in common is that they are
> both dealing with an open-ended specification - one which allows any
> provider to extend the protocol in any way they wish, without any
> knowledge or coordination from any other provider.

In your long parable, you've ignored the key difference between the
open-ended-ness of my PEP and that of SysEx: there are much greater
environmental constraints on people writing interpreters for function
annotations. The only constraint on developers using SysEx is
"anything you can turn into bytes".

> Another lesson of the story has to do with the failure of the MMA
> committee to specify any guidelines or hints as to how their open-ended
> protocol should be used.

I agree that the PEP needs to include some guidance for those writing
annotation interpreters (such as how to anticipate being used in
conjunction with other interpreters), but I see no merit in setting in
stone a list of officially endorsed uses for function annotations.

> In other words, I criticize both the MMA's spec and Colin's for the sin
> of underspecification - that is, allowing critical decisions that
> *should* have been made by the standard writer to instead be made by the
> standard implementers, with the result that each implementer comes up
> with their own unique solution to a problem which should have been
> solved in the original standard doc.

Are you referring to the fact that the PEP doesn't dictate how lists,
tuples, etc are to be interpreted, or still to the fact that I didn't
include a paragraph talking about interpreter chaining?

> -- Talin

Collin Winter

PS: My name has 2 L's in it.

From pje at telecommunity.com  Sat Aug 12 04:52:57 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri, 11 Aug 2006 22:52:57 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <mailman.34113.1155347488.27774.python-3000@python.org>
Message-ID: <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com>

At 03:39 PM 8/12/2006 -0700, Talin <talin at acm.org> wrote:
>Say I want to annotate a specific argument with two pieces of
>information, a type and a docstring. I have two metadata interpreters,
>one which uses the type information to restrict the kinds of arguments
>that can be passed in, and another which uses the docstring to enhance
>the generated documentation.
>
>Now, lets say that these two metadata interpreters were written by two
>people, who are not in communication with each other. Each one decides
>that they would like to "play nice" with other competing metadata.
>
>So Author A, who wrote the annotation decorator that looks for
>docstrings, decides that not only will he accept docstring annotations,
>but if the annotation is a tuple, then he will search that tuple for any
>docstrings, skipping over any annotations that he doesn't understand.
>(Although how he is supposed to manage that is unclear - since there
>could also be other annotations that are simple text strings as well.)
>
>Author B, who wrote the type-enforcement module, also wants to play nice
>with others, but since he doesn't know A, comes up with a different
>solution. His idea is to create a system in which annotations
>automatically chain each other - so that each annotation has a "next"
>attribute referring to the next annotation.
>
>So programmer C, who wants to incorporate both A and B's work into his
>program, has a dilemma - each has a sharing mechanism, but the sharing
>mechanisms are different and incompatible. So he is unable to apply both
>A-type and B-type metadata to any given signature.

Not at all.  A and B need only use overloadable functions, and the problem 
is trivially resolved by adding overloads.  The author of C can add an 
overload to "A" that will handle objects with 'next' attributes, or add one 
to "B" that handles tuples, or both.

I've not bothered to reply to the rest of your email, since it depends on 
assumptions that I've already shown to be invalid.


From pje at telecommunity.com  Sat Aug 12 05:01:38 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri, 11 Aug 2006 23:01:38 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <mailman.34113.1155347488.27774.python-3000@python.org>
Message-ID: <5.1.1.6.0.20060811225402.0228c178@sparrow.telecommunity.com>

At 07:49 PM 8/12/2006 -0400, "Collin Winter" <collinw at gmail.com> wrote:
>What Josiah is hinting at -- and what Talin describes more explicitly
>-- is the problem of how exactly "chaining" annotation interpreters
>will work.

I'd prefer we not use the word "interpreters" to describe operations that 
use annotations.  It carries a lot of excess baggage.


>The case I've thought out the most completely is that of using
>decorators to analyse/utilise the annotations:
>
>1) Each decorator should be written with the assumption that it is the
>only decorator that will be applied to a given function (with respect
>to annotations).
>
>2) Chaining will be accomplished by maintaining this illusion for each
>decorator. For example, if our annotation-sharing convention is that
>annotations will be n-tuples (n == number of annotation-interpreting
>decorators), where t[i] is the annotation the i-th decorator should
>care about, the following chain() function will do the trick (a full
>demo script is attached):

I don't see the point of this.  A decorator should be responsible for 
manipulating the signature of its return value.  Meanwhile, the semantics 
for combining annotations should be defined by an overloaded function like 
"combineAnnotations(a1,a2)" that returns a new annotation.  There is no 
need to have a special chaining decorator.
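A sketch of such a combining hook, with functools.singledispatch standing 
in for the overloaded-function machinery (the combineAnnotations name 
comes from the message above; everything else is invented):

```python
from functools import singledispatch

@singledispatch
def combine_annotations(a1, a2):
    # Default policy: collect the two annotations into a flat tuple.
    return (a1, a2)

@combine_annotations.register(tuple)
def _(a1, a2):
    # An existing tuple of annotations absorbs the new one.
    return a1 + (a2,)

anns = combine_annotations(int, "doc")
anns = combine_annotations(anns, "more")
print(anns)  # → (<class 'int'>, 'doc', 'more')
```

Anyone with a custom annotation type can register their own combining 
rule, which is the point: no chaining convention has to be fixed up front.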

May I suggest that you try using Guido's Py3K overloaded function 
prototype?  I expect you'll find that if you play around with it a bit, it 
will considerably simplify your view of what's required to do this.  It 
truly isn't necessary to predefine what an annotation is, or even any 
structural constraints on how they will be combined, since the user is able 
to define for any given type how such things will be handled.
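To make the shape of this concrete: a minimal sketch of what such an overloadable combiner could look like. This is a plain dict-dispatch stand-in, not Guido's actual overloading prototype; all names here are hypothetical:

```python
# Hypothetical sketch of an overloadable combineAnnotations, in the
# spirit of the suggestion above.  Dispatch is a plain dict keyed on
# the pair of argument types -- a stand-in for a real generic function.

_combiners = {}

def overload(t1, t2):
    """Register a combiner for a pair of annotation types."""
    def register(func):
        _combiners[t1, t2] = func
        return func
    return register

def combineAnnotations(a1, a2):
    """Combine two annotations using whichever overload was registered."""
    try:
        return _combiners[type(a1), type(a2)](a1, a2)
    except KeyError:
        raise TypeError("no combiner registered for %r + %r"
                        % (type(a1), type(a2)))

# Author C can teach A's string annotations to coexist with B's tuples
# without either library knowing about the other:
@overload(str, tuple)
def _combine_str_tuple(a1, a2):
    return (a1,) + a2
```

The point being that neither library A nor library B has to agree on a shared annotation structure in advance; the user registers the glue.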


From qrczak at knm.org.pl  Sat Aug 12 06:06:53 2006
From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk)
Date: Sat, 12 Aug 2006 06:06:53 +0200
Subject: [Python-3000] threading, part 2
In-Reply-To: <20060811125449.1940.JCARLSON@uci.edu> (Josiah Carlson's
	message of "Fri, 11 Aug 2006 13:12:15 -0700")
References: <20060811105742.193A.JCARLSON@uci.edu>
	<87veozoyo9.fsf@qrnik.zagroda> <20060811125449.1940.JCARLSON@uci.edu>
Message-ID: <877j1emwbm.fsf@qrnik.zagroda>

Josiah Carlson <jcarlson at uci.edu> writes:

> Threading is already difficult enough to do 'right' (see the dozens
> of threads discussing why this is really the case), and designing
> software that can survive the raising of an exception at any point
> makes threading even more difficult.

That's why I'm proposing to provide ways to limit those "any points".

> I believe that you are attempting to design an interface to make
> this particular feature foolproof.

No, I'm merely attempting to make it usable.

> I think that such is a mistake; killing a thread should be frought
> with gotchas and should be documented as "may crash the runtime".

You are proposing to make it unusable?

> Offering users anything more is tantamount to encouraging its use,
> which is counter to the reasons why it is not available via a
> standard threading.function call: because it shouldn't be used at
> all, except by people who know what the heck they are doing.

Indeed, you are proposing to make it unusable.

> I believe that if a user cannot design and implement their own
> system to handle when a thread can be killed or not to their own
> satisfaction, then they have no business killing threads.

I have already implemented it. In my own language, where I have
full control over the runtime.

Some Haskell people made the first design a few years ago,
and implemented it in Glasgow Haskell Compiler.
http://citeseer.ist.psu.edu/415348.html

Some people saw that it was good, that the existing handling of
KeyboardInterrupt in Python is unsafe, and they adapted the design
for Python (without actually implementing it as far as I know).
http://www.cs.williams.edu/~freund/papers/02-lwl2.ps

I built on their experience, extended the design, and implemented it
in my language Kogut, so I can play with it and see how it works in
practice.
http://www.cs.ioc.ee/tfp-icfp-gpce05/tfp-proc/06num.pdf

I'm quite confident that something like this is the right design,
even if some details could be changed.

Now it would be nice if Python had usable asynchronous exceptions too.

If we are not brave enough, we can implement at least an equivalent
of POSIX thread cancellation. It would be better than nothing, though
not as useful, because the default mode allows interruption only at
certain blocking primitives. In this scenario Unix signals need a
different policy so a pure computation not performing I/O nor thread
synchronization can be interrupted; Unix signals usually cause the
whole process to abort so data integrity was less of a concern.

A language with GC and exceptions can do better, with a unified policy
for thread cancellation and Unix signals and other asynchronous events.
It can be done such that well-written libraries are safely interruptible
even if exceptions may occur almost anywhere. Protection should be
built into certain operations (e.g. try...finally extended with an
"initially" clause, or taking a mutex), so that there is less work
needed to make code safe to be interrupted; then quite often it's
already safe.
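A rough pure-Python sketch of the masked-region idea described above (hypothetical API; a real implementation would need runtime support to actually intercept delivery of the exception):

```python
import threading

_state = threading.local()

class masked(object):
    """Context manager sketch: defer an asynchronous exception delivered
    inside the block until the block exits -- the kind of protection an
    'initially' clause or mutex acquisition would build in."""
    def __enter__(self):
        _state.masked = True
        _state.pending = None
    def __exit__(self, *exc_info):
        _state.masked = False
        pending, _state.pending = _state.pending, None
        # Re-raise the deferred exception only if the block exited cleanly.
        if pending is not None and exc_info == (None, None, None):
            raise pending
        return False

def deliver_async(exc):
    """What the runtime would do on an async exception: raise it now,
    unless the target thread is inside a masked region."""
    if getattr(_state, 'masked', False):
        _state.pending = exc
    else:
        raise exc
```

With this, a critical section runs to completion and the interruption lands at its boundary instead of at an arbitrary bytecode.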

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/

From collinw at gmail.com  Sat Aug 12 06:33:28 2006
From: collinw at gmail.com (Collin Winter)
Date: Sat, 12 Aug 2006 00:33:28 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <5.1.1.6.0.20060811225402.0228c178@sparrow.telecommunity.com>
References: <mailman.34113.1155347488.27774.python-3000@python.org>
	<5.1.1.6.0.20060811225402.0228c178@sparrow.telecommunity.com>
Message-ID: <43aa6ff70608112133w7eb2d0c6x287c021b108974b@mail.gmail.com>

> I don't see the point of this.  A decorator should be responsible for
> manipulating the signature of its return value.  Meanwhile, the semantics
> for combining annotations should be defined by an overloaded function like
> "combineAnnotations(a1,a2)" that returns a new annotation.  There is no
> need to have a special chaining decorator.
>
> May I suggest that you try using Guido's Py3K overloaded function
> prototype?  I expect you'll find that if you play around with it a bit, it
> will considerably simplify your view of what's required to do this.  It
> truly isn't necessary to predefine what an annotation is, or even any
> structural constraints on how they will be combined, since the user is able
> to define for any given type how such things will be handled.

I've looked at Guido's overloaded function prototype, and while I
think I'm heading in the direction of understanding, I'm not quite there 100%.

Could you illustrate (in code) what you've got in mind for how to
apply overloaded functions to this problem space?

Collin Winter

From talin at acm.org  Sat Aug 12 06:49:52 2006
From: talin at acm.org (Talin)
Date: Fri, 11 Aug 2006 21:49:52 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com>
References: <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com>
Message-ID: <44DD5DF0.40405@acm.org>

Phillip J. Eby wrote:
> Not at all.  A and B need only use overloadable functions, and the 
> problem is trivially resolved by adding overloads.  The author of C can 
> add an overload to "A" that will handle objects with 'next' attributes, 
> or add one to "B" that handles tuples, or both.


I'm still not sure what you are talking about - what is being overloaded 
here?

Let me give you a better example. Suppose I have a 'docstring' 
annotation and a 'getopt' annotation. The docstring annotation 
associates a string with each argument, which can be inspected by an 
external documentation scanner to produce documentation for that argument.

Thus:

    def myfunc( x : "The x coordinate", y : "The y coordinate" )
       ...

The 'getopt' annotation is used in conjunction with the 'getopt' 
decorator, which converts from command-line arguments to python method 
arguments. The idea is that you have a class that is acting as a back 
end to a command-line shell. Each method in the class corresponds to a 
single command. The annotations allow you to associate specific flags or 
switches with particular arguments. So:

class MyHandler( CommandLineHandler ):

    @getopt
    def list( infile:"i" = sys.stdin, outfile:"o" = sys.stdout ):
       ...

With the getopt handler in place, I can type the following shell command:

    list -i <infile> -o <outfile>

If either the -i or -o switch is omitted, then the corresponding 
argument is either stdin or stdout.

Additionally, the getopt module can generate 'usage' information for the 
function in question:

    Usage: list [-i infile] [-o outfile]

Now, what happens if I want to use both docstrings and the getopt 
decorator on the same function? They both expect to see annotations that 
are strings! How do the doc extractor and the getopt decorator know 
which strings belong to them, and which strings they should ignore?

-- Talin

From slawomir.nowaczyk.847 at student.lu.se  Sat Aug 12 08:22:17 2006
From: slawomir.nowaczyk.847 at student.lu.se (Slawomir Nowaczyk)
Date: Sat, 12 Aug 2006 08:22:17 +0200
Subject: [Python-3000] threading, part 2
In-Reply-To: <c56e219d0608111851i6f053407q96e5989fdff77848@mail.gmail.com>
References: <c56e219d0608111851i6f053407q96e5989fdff77848@mail.gmail.com>
Message-ID: <20060812082034.EFEC.SLAWOMIR.NOWACZYK.847@student.lu.se>

On Fri, 11 Aug 2006 21:51:25 -0400
Luis P Caamano <lcaamano at gmail.com> wrote:

#> That's how I feel too Josiah.  In some ways, it's the same as writing
#> device drivers in a pre-emptable kernel.  You can get interrupted and
#> pre-empted by the hardware at any freaking time in any piece of code
#> and your memory might go away so you better pin it and deal with the
#> interrupts.  Forget about that and you end up with a nice kernel
#> panic.  Still, we have all kinds of device drivers on SMP,
#> pre-emptable kernels.  It can be done.

Of course it can... but do we *really* want programming in Python3k to
be comparable in difficulty to writing device drivers?

-- 
 Best wishes,
   Slawomir Nowaczyk
     ( Slawomir.Nowaczyk at cs.lth.se )

Numeric stability is probably not all that important when you're guessing.


From ncoghlan at gmail.com  Sat Aug 12 08:58:47 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 12 Aug 2006 16:58:47 +1000
Subject: [Python-3000] threading, part 2
In-Reply-To: <20060812082034.EFEC.SLAWOMIR.NOWACZYK.847@student.lu.se>
References: <c56e219d0608111851i6f053407q96e5989fdff77848@mail.gmail.com>
	<20060812082034.EFEC.SLAWOMIR.NOWACZYK.847@student.lu.se>
Message-ID: <44DD7C27.9000006@gmail.com>

Slawomir Nowaczyk wrote:
> On Fri, 11 Aug 2006 21:51:25 -0400
> Luis P Caamano <lcaamano at gmail.com> wrote:
> 
> #> That's how I feel too Josiah.  In some ways, it's the same as writing
> #> device drivers in a pre-emptable kernel.  You can get interrupted and
> #> pre-empted by the hardware at any freaking time in any piece of code
> #> and your memory might go away so you better pin it and deal with the
> #> interrupts.  Forget about that and you end up with a nice kernel
> #> panic.  Still, we have all kinds of device drivers on SMP,
> #> pre-emptable kernels.  It can be done.
> 
> Of course it can... but do we *really* want programming in Python3k to
> be comparable in difficulty to writing device drivers?
> 

No, but "programming in Py3k" and "trying to asynchronously terminate an 
active thread in Py3k without active cooperation from that thread" are not 
really the same thing. Making easy things easy and difficult things possible 
is a good goal - making difficult things appear to be deceptively easy is a 
good way to cause problems down the road :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From ncoghlan at gmail.com  Sat Aug 12 09:58:08 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 12 Aug 2006 17:58:08 +1000
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com>
References: <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com>
Message-ID: <44DD8A10.1040808@gmail.com>

Phillip J. Eby wrote:
> At 03:39 PM 8/12/2006 -0700, Talin <talin at acm.org> wrote:
>> So programmer C, who wants to incorporate both A and B's work into his
>> program, has a dilemma - each has a sharing mechanism, but the sharing
>> mechanisms are different and incompatible. So he is unable to apply both
>> A-type and B-type metadata to any given signature.
> 
> Not at all.  A and B need only use overloadable functions,

Stop right there. "A and B need only use overloadable functions"? That sounds 
an awful lot like placing a constraint on the way annotation libraries are 
implemented in order to facilitate a single program using multiple annotation 
libraries - which is exactly what Talin is saying is needed!

Talin is saying "the annotation PEP needs to recommend a mechanism that allows 
a single program to use multiple annotation libraries". And you're saying "a 
good mechanism for allowing a program to use multiple annotation libraries is 
for every annotation library to expose an overloaded 'interpret_annotation' 
function that the application can hook in order to handle new annotation types".

I think you're right that overloaded functions are a possible solution to this 
problem, but that doesn't obviate the need for the PEP to address the question 
explicitly (and using overloaded functions for this strikes me as hitting a 
very small nail with a very large hammer).

With the function overloading solution, you would need to do three things in 
order to get two frameworks to cooperate:
   1. Define your own Annotation type and register it with the frameworks you 
are using
   2. Define a decorator to wrap the annotations in a function __signature__ 
into your custom annotation type
   3. Apply your decorator to functions before the decorators for the 
annotation libraries are invoked

Overloading a standard type (like tuple) wouldn't work, as you might have two 
different modules, both using the same annotation library, that want it to 
interpret tuples in two different ways (e.g. in module A, the library's info 
is at index 0, while in module B it is at index 1).

So, for example:

   @library_A_type_processor
   @library_B_docstring_processor
   @handle_annotations
   def func(a: (int, "an int"),
            b: (str, "a string"))
             -> (str, "returns a string, too!"):
     # do something

   def handle_annotations(f):
      note_dict = f.__signature__.annotations
      for param, note in note_dict.items():
              note_dict[param] = MyAnnotation(note)
      return f

However, what we're really talking about here is a scenario where you're 
defining your *own* custom annotation processor: you want the first part of 
the tuple in the expression handled by the type processing library, and the 
second part handled by the docstring processing library.

Which says to me that the right solution is for the annotation to be split up 
into its constituent parts before the libraries ever see it.

This could be done as Collin suggests by tampering with 
__signature__.annotations before calling each decorator, but I think it is 
cleaner to do it by defining a particular signature for decorators that are 
intended to process annotations.

Specifically, such decorators should accept a separate dictionary to use in 
preference to the annotations on the function itself:

   def process_function_annotations(f, annotations=None):
     # Process the function f
     # If annotations is not None, use it
     # otherwise, get the annotations from f.__signature__

Then our function declaration and decorator would look like:

   @handle_annotations
   def func(a: (int, "an int"), b: (str, "a string")) -> (str, "returns!"):
     # do something


   def handle_annotations(f):
      decorators = library_A_type_processor, library_B_docstring_processor
      note_dicts = {}, {}
      for param, note in f.__signature__.annotations.iteritems():
          for note_dict, subnote in zip(note_dicts, note):
              note_dict[param] = subnote
      for decorator, note_dict in zip(decorators, note_dicts):
          f = decorator(f, note_dict)
      return f

Writing a factory function to handle chaining of an arbitrary number of 
annotation-interpreting libraries would be trivial, with the set of decorators 
provided as positional arguments if your notes are in a tuple, and as 
keyword arguments if the notes are in a dictionary.
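That factory might be sketched as follows (hypothetical helper name; it assumes tuple-shaped annotations and decorators that accept an explicit note dict, as in the example above):

```python
def make_annotation_chainer(*decorators):
    """Build a decorator that splits each parameter's annotation tuple
    positionally across the given annotation-processing decorators."""
    def handle_annotations(f):
        # One note dict per cooperating decorator.
        note_dicts = tuple({} for _ in decorators)
        for param, notes in f.__signature__.annotations.items():
            for note_dict, subnote in zip(note_dicts, notes):
                note_dict[param] = subnote
        # Hand each decorator only the notes meant for it.
        for decorator, note_dict in zip(decorators, note_dicts):
            f = decorator(f, note_dict)
        return f
    return handle_annotations
```

A keyword-argument variant for dict-shaped notes would look the same, except each parameter's note dict would be indexed by decorator name instead of by position.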

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From ncoghlan at gmail.com  Sat Aug 12 10:13:44 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 12 Aug 2006 18:13:44 +1000
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <43aa6ff70608091732o150a6674t4416f4b76d8bc40b@mail.gmail.com>
References: <43aa6ff70608091732o150a6674t4416f4b76d8bc40b@mail.gmail.com>
Message-ID: <44DD8DB8.3050102@gmail.com>

Collin Winter wrote:
> Return Values
> -------------
> 
> The examples thus far have omitted examples of how to annotate the
> type of a function's return value. This is done like so:
> 
> ::
>     def sum(*vargs: Number) -> Number:
>         ...
> 
> 
> The parameter list can now be followed by a literal ``->`` and
> a Python expression.  Like the annotations for parameters, this
> expression will be evaluated when the function is compiled.

I'd like to request that the annotation for the return type be *inside* the 
parentheses for the parameter list. Why, you ask?

Because, as soon as the annotations are at all verbose, you're going to want 
to split the function definition up so that each parameter gets its own line.

For the parameters, this works beautifully because parenthesis matching keeps 
the compiler from getting upset:

   def sum(seq: "the sequence of values to be added",
           init=0: "the initial value of the total"):
       # do it

But now try to document the return type on its own line:

   def sum(seq: "the sequence of values to be added",
           init=0: "the initial value of the total")
           -> "the summation of the sequence":
       # do it

Kaboom - SyntaxError on the second line because of the missing colon. However, 
if the return type annotation is *inside* the parentheses and separated by a 
comma, there's no problem:

   def sum(seq: "the sequence of values to be added",
           init=0: "the initial value of the total",
           -> "the summation of the sequence"):
       # do it

Having to use a line continuation just to be able to annotate the return type 
on a separate line would be an annoyance.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From jcarlson at uci.edu  Sat Aug 12 10:35:02 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Sat, 12 Aug 2006 01:35:02 -0700
Subject: [Python-3000] threading, part 2
In-Reply-To: <c56e219d0608111851i6f053407q96e5989fdff77848@mail.gmail.com>
References: <c56e219d0608111851i6f053407q96e5989fdff77848@mail.gmail.com>
Message-ID: <20060812012526.195B.JCARLSON@uci.edu>


"Luis P Caamano" <lcaamano at gmail.com> wrote:
> It's a gun, here's a bullet, it's a tool, go get food but try not to
> shoot yourself.
> 
> I'm also -0 on this, not that I think my opinion counts though.  I'm
> -0 because Tomer pointed me to a nice recipe that uses ctypes to get
> to the C interface.  I'm happy with that and we can start using it
> right now.  Perhaps that should be as high as it gets expose so that
> it would be an automatic skill test?  If you can find it, you probably
> know how to use it and the kind of problems you might run into.

Remember that the meat of Tomer's recipe, the ctypes call, is the only
thing that is going to be documented in Python 2.5.  The functionality
of being able to kill threads with exceptions has existed since Python
2.3 (if I understood previous postings correctly), but has been
generally undocumented.  Because it is literally just a documentation
change, and not actually additional functionality, it *can* go into
Python 2.5.  All other feature additions are too late in the beta cycle
(Beta 3 is next week) to be added, unless someone manages to convince
the release manager that it should be allowed (I would put money on
that not happening).


 - Josiah

> On 8/11/06, Josiah Carlson <jcarlson at uci.edu> wrote:
> >
> >
> > I believe that if a user cannot design and implement their own system to
> > handle when a thread can be killed or not to their own satisfaction,
> > then they have no business killing threads.
> >
> >
> >  - Josiah
> >
> 
> 
> -- 
> Luis P Caamano
> Atlanta, GA USA
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/jcarlson%40uci.edu


From jcarlson at uci.edu  Sat Aug 12 11:07:29 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Sat, 12 Aug 2006 02:07:29 -0700
Subject: [Python-3000] threading, part 2
In-Reply-To: <877j1emwbm.fsf@qrnik.zagroda>
References: <20060811125449.1940.JCARLSON@uci.edu>
	<877j1emwbm.fsf@qrnik.zagroda>
Message-ID: <20060812013530.195E.JCARLSON@uci.edu>


"Marcin 'Qrczak' Kowalczyk" <qrczak at knm.org.pl> wrote:
> Josiah Carlson <jcarlson at uci.edu> writes:
> > Threading is already difficult enough to do 'right' (see the dozens
> > of threads discussing why this is really the case), and designing
> > software that can survive the raising of an exception at any point
> > makes threading even more difficult.
> 
> That's why I'm proposing to provide ways to limit those "any points".
> 
> > I believe that you are attempting to design an interface to make
> > this particular feature foolproof.
> 
> No, I'm merely attempting to make it usable.
> 
> You are proposing to make it unusable?
> 
> Indeed, you are proposing to make it unusable.

Because you or anyone else can define a standard mechanism for handling
these points where threads are allowed to be killed, and can publish
it on the internet via the Python cookbook, etc., having nothing in the
standard library specifically supporting the operation doesn't make
anything unusable.

I'm not proposing to make it unusable, merely that it should not be made
any easier to use.  See Nick Coghlan's comment with regards to '...easy
things easy...'.


> > I believe that if a user cannot design and implement their own
> > system to handle when a thread can be killed or not to their own
> > satisfaction, then they have no business killing threads.
> 
> I have already implemented it. In my own language, where I have
> full control over the runtime.

I'm glad that you have managed to implement it in your programming
language.  But this discussion isn't about Kogut, Haskell, etc., this is
about Python.  Specifically what should and should not be available in
the Python standard library.

I've said it before, but apparently the following point is ignored, so
I'll say it again.  The 'kill thread' mechanism isn't available via some
threading.kill_thread(thr) function because Guido and other core
developers *of* Python do not want it to be generally acceptable for
users to kill arbitrary threads.  The introduction of methods of
controlling where a thread could be killed into the standard library
would be encouraging the 'kill thread' usage.

It would be far safer (and much less work for the developers of Python)
for users to just learn how to handle thread quitting using any of the
standard methods of doing so (check the value of a variable, wait for a
signal, etc.).

Never mind that any feature is going to have to wait 18+ months before
Python 2.6 comes out in order to get your proposed changes in.


> Now it would be nice if Python had usable asynchronous exceptions too.

Python has had usable asynchronous exceptions since Python 2.3 [1].


> If we are not brave enough, we can implement at least an equivalent
> of POSIX thread cancellation. It would be better than nothing, though
> not as useful, because the default mode allows interruption only at
> certain blocking primitives. In this scenario Unix signals need a
> different policy so a pure computation not performing I/O nor thread
> synchronization can be interrupted; Unix signals usually cause the
> whole process to abort so data integrity was less of a concern.
> 
> A language with GC and exceptions can do better, with a unified policy
> for thread cancellation and Unix signals and other asynchronous events.
> It can be done such that well-written libraries are safely interruptible
> even if exceptions may occur almost anywhere. Protection should be
> built into certain operations (e.g. try...finally extended with an
> "initially" clause, or taking a mutex), so that there is less work
> needed to make code safe to be interrupted; then quite often it's
> already safe.

I don't have much of a comment with regards to attempted unification of
signals, etc., as Windows signal handling is effectively useless (and my
primary development platform tends to be Windows).


 - Josiah

[1]
Python 2.3.5 (#62, Feb  8 2005, 16:23:02) [MSC v.1200 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import ctypes
>>> import threading
>>> import time
>>> def foo():
...     try:
...             while 1:
...                     time.sleep(.01)
...     finally:
...             print "I quit!"
...
>>> x = threading.Thread(target=foo)
>>> x.start()
>>> for i,j in threading._active.items():
...     if j is x:
...             break
...
>>> ctypes.pythonapi.PyThreadState_SetAsyncExc(i, ctypes.py_object(Exception))
1
>>> I quit!
Exception in thread Thread-2:Traceback (most recent call last):
  File "C:\python23\lib\threading.py", line 442, in __bootstrap
    self.run()
  File "C:\python23\lib\threading.py", line 422, in run
    self.__target(*self.__args, **self.__kwargs)
  File "<stdin>", line 4, in foo
Exception



From tim.peters at gmail.com  Sat Aug 12 12:29:07 2006
From: tim.peters at gmail.com (Tim Peters)
Date: Sat, 12 Aug 2006 06:29:07 -0400
Subject: [Python-3000] threading, part 2 --- + a bit of ctypes FFI worry
Message-ID: <1f7befae0608120329wc646164w25ca4875da4cc5c0@mail.gmail.com>

[Josiah Carlson]
> ...
> Python 2.3.5 (#62, Feb  8 2005, 16:23:02) [MSC v.1200 32 bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import ctypes
> >>> import threading
> >>> import time
> >>> def foo():
> ...     try:
> ...             while 1:
> ...                     time.sleep(.01)
> ...     finally:
> ...             print "I quit!"
> ...
> >>> x = threading.Thread(target=foo)
> >>> x.start()
> >>> for i,j in threading._active.items():
> ...     if j is x:
> ...             break
> ...
> >>> ctypes.pythonapi.PyThreadState_SetAsyncExc(i, ctypes.py_object(Exception))

As I discovered to my chagrin when I added a similar test to the test
suite a few days ago, that's got a subtle error on most 64-bit boxes.
When the ctypes docs talk about passing and returning integers, they
never explain what "integers" /means/, but it seems the docs
implicitly have a 32-bit-only view of the world here.  In reality
"integer" seems to mean the native C `int` type.  But a Python thread
id is a native C `long` (== a Python short integer), and the code
above fails in a baffling way on most 64-bit boxes:  the call returns
0 instead; i.e. the thread id isn't found, and no exception gets set.
So I believe that needs to be:

    ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_long(i),
        ctypes.py_object(Exception))

to make it portable.

It's unclear to me how to write portable ctypes code in the presence
of a gazillion integer typedefs and #defines, such as for Py_ssize_t.
That doesn't map to a fixed C integral type cross-platform, so what
can you do?  You're not required to answer that ;-)
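One partial answer, sketched below (an assumption-laden workaround, not a complete solution): probe sizes at runtime with ctypes.sizeof and pick the integer type that matches. Py_ssize_t, at least, is defined to have the same size as size_t:

```python
import ctypes

def int_type_of_size(nbytes, signed=True):
    """Return a ctypes integer type with the given size in bytes."""
    if signed:
        candidates = [ctypes.c_byte, ctypes.c_short, ctypes.c_int,
                      ctypes.c_long, ctypes.c_longlong]
    else:
        candidates = [ctypes.c_ubyte, ctypes.c_ushort, ctypes.c_uint,
                      ctypes.c_ulong, ctypes.c_ulonglong]
    for t in candidates:
        if ctypes.sizeof(t) == nbytes:
            return t
    raise ValueError("no %d-byte integer type found" % nbytes)

# Py_ssize_t is the signed counterpart of size_t, so match its size:
Py_ssize_t = int_type_of_size(ctypes.sizeof(ctypes.c_size_t), signed=True)
```

This only helps for typedefs whose size you can probe or already know; arbitrary #defines remain out of reach without header parsing.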

Thread ids may bite us someday too.  Python casts the platform's
notion of a thread id to C `long`, but there's no guarantee this won't
lose information (or is even legal) on all platforms.  We'd probably
be safer casting to, e.g., Py_uintptr_t (some thread implementations
return an index into a kernel or library thread-info table, but at
least some in my lifetime returned a pointer to a thread-info struct,
and that's definitely fatter than C `long` on some boxes).

> 1
> >>> I quit!
> Exception in thread Thread-2:Traceback (most recent call last):
>   File "C:\python23\lib\threading.py", line 442, in __bootstrap
>     self.run()
>   File "C:\python23\lib\threading.py", line 422, in run
>     self.__target(*self.__args, **self.__kwargs)
>   File "<stdin>", line 4, in foo
> Exception

It's really cool that you can do this from ctypes, eh?  That's exactly
the right level of abstraction for this attractive nuisance too ;-)

From greg.ewing at canterbury.ac.nz  Sat Aug 12 13:05:08 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 12 Aug 2006 23:05:08 +1200
Subject: [Python-3000] threading, part 2 --- + a bit of ctypes FFI worry
In-Reply-To: <1f7befae0608120329wc646164w25ca4875da4cc5c0@mail.gmail.com>
References: <1f7befae0608120329wc646164w25ca4875da4cc5c0@mail.gmail.com>
Message-ID: <44DDB5E4.9010903@canterbury.ac.nz>

Tim Peters wrote:

> It's unclear to me how to write portable ctypes code in the presence
> of a gazillion integer typedefs and #defines, such as for Py_ssize_t.

A start would be to have constants in the ctypes module
for Py_ssize_t and other such Python-defined API types.

--
Greg

From l.oluyede at gmail.com  Sat Aug 12 13:11:47 2006
From: l.oluyede at gmail.com (Lawrence Oluyede)
Date: Sat, 12 Aug 2006 13:11:47 +0200
Subject: [Python-3000] threading, part 2 --- + a bit of ctypes FFI worry
In-Reply-To: <44DDB5E4.9010903@canterbury.ac.nz>
References: <1f7befae0608120329wc646164w25ca4875da4cc5c0@mail.gmail.com>
	<44DDB5E4.9010903@canterbury.ac.nz>
Message-ID: <9eebf5740608120411m40da5724r11700fdbe509914@mail.gmail.com>

On 8/12/06, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Tim Peters wrote:
>
> > It's unclear to me how to write portable ctypes code in the presence
> > of a gazillion integer typedefs and #defines, such as for Py_ssize_t.
>
> A start would be to have constants in the ctypes module
> for Py_ssize_t and other such Python-defined API types.

rctypes and pypy tools are somewhat one step further than ctypes
machinery. In rctypes you can easily do something like:

size_t = ctypes_platform.SimpleType("size_t", c_ulong)

In this way you have a platform-safe data type to use in your code. The
second argument of SimpleType() is a hint for the tool.

You can also use ConstantInteger() and DefinedConstantInteger() to get
values of constants in header files like this:

BUFSIZ = ctypes_platform.ConstantInteger("BUFSIZ")

Maybe one day this can be ported to CPython ctypes from the RPython one.

-- 
Lawrence
http://www.oluyede.org/blog

From aahz at pythoncraft.com  Sat Aug 12 15:42:44 2006
From: aahz at pythoncraft.com (Aahz)
Date: Sat, 12 Aug 2006 06:42:44 -0700
Subject: [Python-3000]  Python 2.5 release schedule (was: threading, part 2)
In-Reply-To: <20060812012526.195B.JCARLSON@uci.edu>
References: <c56e219d0608111851i6f053407q96e5989fdff77848@mail.gmail.com>
	<20060812012526.195B.JCARLSON@uci.edu>
Message-ID: <20060812134244.GA29374@panix.com>

[added python-dev to make sure everyone sees this]

On Sat, Aug 12, 2006, Josiah Carlson wrote:
>
> All other feature additions are too late in the Beta cycle (Beta 3 is
> next week)

For some reason, this is the second time I've seen this claim.  Beta 3
was released August 3 and next week is rc1.  We are right now in
complete feature lockdown; even documenting an existing API IMO requires
approval from the Release Manager.
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are, by
definition, not smart enough to debug it."  --Brian W. Kernighan

From aahz at pythoncraft.com  Sat Aug 12 15:44:28 2006
From: aahz at pythoncraft.com (Aahz)
Date: Sat, 12 Aug 2006 06:44:28 -0700
Subject: [Python-3000] threading, part 2
In-Reply-To: <c56e219d0608111851i6f053407q96e5989fdff77848@mail.gmail.com>
References: <c56e219d0608111851i6f053407q96e5989fdff77848@mail.gmail.com>
Message-ID: <20060812134428.GB29374@panix.com>

On Fri, Aug 11, 2006, Luis P Caamano wrote:
>
> That's how I feel too Josiah.  In some ways, it's the same as writing
> device drivers in a pre-emptable kernel.  You can get interrupted and
> pre-empted by the hardware at any freaking time in any piece of code
> and your memory might go away so you better pin it and deal with the
> interrupts.  Forget about that and you end up with a nice kernel
> panic.  Still, we have all kinds of device drivers on SMP,
> pre-emptable kernels.  It can be done.

But Python is not the language/platform to do it.

(Yeah, someone else said that already, but I think it needs emphasis.)
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are, by
definition, not smart enough to debug it."  --Brian W. Kernighan

From pje at telecommunity.com  Sat Aug 12 17:36:51 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sat, 12 Aug 2006 11:36:51 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <44DD5DF0.40405@acm.org>
References: <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com>
	<5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com>
Message-ID: <5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com>

At 09:49 PM 8/11/2006 -0700, Talin wrote:
>Phillip J. Eby wrote:
>>Not at all.  A and B need only use overloadable functions, and the 
>>problem is trivially resolved by adding overloads.  The author of C can 
>>add an overload to "A" that will handle objects with 'next' attributes, 
>>or add one to "B" that handles tuples, or both.
>
>
>I'm still not sure what you are talking about - what is being overloaded here?
>
>Let me give you a better example. Suppose I have a 'docstring' annotation 
>and a 'getopt' annotation. The docstring annotation associates a string 
>with each argument, which can be inspected by an external documentation 
>scanner to produce documentation for that argument.
>
>Thus:
>
>    def myfunc( x : "The x coordinate", y : "The y coordinate" )
>       ...
>
>The 'getopt' annotation is used in conjunction with the 'getopt' 
>decorator, which converts from command-line arguments to python method 
>arguments. The idea is that you have a class that is acting as a back end 
>to a command-line shell. Each method in the class corresponds to a single 
>command. The annotations allow you to associate specific flags or switches 
>with particular arguments. So:
>
>class MyHandler( CommandLineHandler ):
>
>    @getopt
>    def list( infile:"i" = sys.stdin, outfile:"o" = sys.stdout ):
>       ...
>
>With the getopt handler in place, I can type the following shell command:
>
>    list -i <infile> -o <outfile>
>
>If either the -i or -o switch is omitted, then the corresponding argument 
>is either stdin or stdout.
>
>Additionally, the getopt module can generate 'usage' information for the 
>function in question:
>
>    Usage: list [-i infile] [-o outfile]
>
>Now, what happens if I want to use both docstrings and the getopt 
>decorator on the same function? They both expect to see annotations that 
>are strings! How do the doc extractor and the getopt decorator know 
>which strings belong to them, and which strings they should ignore?

Each one defines an overloaded function that performs the 
operation.  E.g.  "getArgumentOption(annotation)" and 
"getArgumentDoc(annotation)".

If somebody wants to use both decorators on the same function, they add 
methods to one or both of those functions to define how to handle their own 
type.  For example, I could create a "documented option" class that has 
attributes for the docstring and option character, and register methods 
with both getArgumentOption and getArgumentDoc to extract the right 
attributes from it.
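To make this concrete, here is a minimal runnable sketch of the scheme described above. The `Overloadable` class is a hand-rolled stand-in for Guido's overloaded-function prototype (its `.when()` registration style is modeled on that prototype), and `DocumentedOption` is the hypothetical "documented option" class from the paragraph above; none of these names come from an actual library.

```python
# Minimal stand-in for an overloaded-function library; '.when()' mimics
# the registration style of the Py3K prototype discussed in this thread.
class Overloadable:
    def __init__(self, default):
        self.default = default      # called for types nobody registered
        self.registry = {}

    def when(self, typ):
        def register(func):
            self.registry[typ] = func
            return func
        return register

    def __call__(self, arg):
        # Dispatch on the most specific registered type in the MRO.
        for typ in type(arg).__mro__:
            if typ in self.registry:
                return self.registry[typ](arg)
        return self.default(arg)

# Two independent operations; each ignores types it doesn't understand.
getArgumentDoc = Overloadable(lambda ann: None)
getArgumentOption = Overloadable(lambda ann: None)

class DocumentedOption:
    """Carries both a docstring and an option character."""
    def __init__(self, doc, option):
        self.doc = doc
        self.option = option

@getArgumentDoc.when(DocumentedOption)
def get_doc(ann):
    return ann.doc

@getArgumentOption.when(DocumentedOption)
def get_option(ann):
    return ann.option

ann = DocumentedOption("The input stream", "i")
print(getArgumentDoc(ann))     # -> The input stream
print(getArgumentOption(ann))  # -> i
```

Because each operation returns None for unregistered types, an annotation meant for one consumer is simply ignored by the other rather than misinterpreted.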


From pje at telecommunity.com  Sat Aug 12 18:12:26 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sat, 12 Aug 2006 12:12:26 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <44DD8A10.1040808@gmail.com>
References: <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com>
	<5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com>
Message-ID: <5.1.1.6.0.20060812113701.02343408@sparrow.telecommunity.com>

At 05:58 PM 8/12/2006 +1000, Nick Coghlan wrote:
>Phillip J. Eby wrote:
>>At 03:39 PM 8/12/2006 -0700, Talin <talin at acm.org> wrote:
>>>So programmer C, who wants to incorporate both A and B's work into his
>>>program, has a dilemma - each has a sharing mechanism, but the sharing
>>>mechanisms are different and incompatible. So he is unable to apply both
>>>A-type and B-type metadata to any given signature.
>>Not at all.  A and B need only use overloadable functions,
>
>Stop right there. "A and B need only use overloadable functions"? That 
>sounds an awful lot like placing a constraint on the way annotation 
>libraries are implemented in order to facilitate a single program using 
>multiple annotation libraries - which is exactly what Talin is saying is 
>needed!

You could perhaps look at it that way.  However, I'm simply using 
overloadable functions as a trivial example of how easy this is to handle 
without specifying a single mechanism.  There are numerous overloaded 
function implementations available, including ad-hoc registry-based ones 
(like those used by pickle), as well as other mechanisms besides 
overloaded functions that do the same thing - PEP 246 adaptation, for 
example, as used by Twisted and Zope.

My point is that:

1. trivial standard extension mechanisms (that are already in use in 
today's Python) allow libraries to offer compatibility between approaches, 
without choosing any blessed implementation or even approach to combination

2. there is no need to define a fixed semantic framework for 
annotations.  Guidelines for combinability (e.g. a standard interpretation 
for tuples or lists) might be a good idea, but it isn't *necessary* to 
mandate a single interpretation.


>(and using overloaded functions for this strikes me as hitting a very 
>small nail with a very large hammer).

Remember: Python is built from the ground up on overloaded 
functions.  len(), iter(), str(), repr(), hash(), int(), ...  You name it 
in builtins or operator, it's pretty much an overloaded function.

These functions differ from "full" overloaded functions in only these respects:

1. There is no framework to let you define new ones

2. They are single-dispatch only (except for the binary arithmetic 
operators, which have a crude double-dispatching protocol)

3. They do not allow third-party registration; classes must define 
__special__ methods to register implementations

(Some other overloaded functions in Python, such as pickle.dump and 
copy.copy, *do* allow third-party registrations, but they have ad-hoc 
implementations rather than using a common base implementation.)
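Point 3 can be illustrated in a few lines: for the built-in "overloads", defining the `__special__` method *is* the registration step.

```python
# Defining __len__ "registers" Ruler with the builtin overloaded
# function len(); no external registry is involved.
class Ruler:
    def __init__(self, marks):
        self.marks = marks

    def __len__(self):
        return len(self.marks)

print(len(Ruler([1, 2, 3])))   # -> 3
```

The limitation Phillip notes is that only the class author can do this registration, which is exactly what third-party registration mechanisms lift.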

So, saying that overloaded functions are a large hammer may or may not be 
meaningful, but it's certainly true that they are in *enormous* use in 
today's Python, even for very small nails like determining the length of an 
object.  :)

Indeed, the *default* way of doing almost anything in Python that involves 
multiple possible implementations is to define an overloaded function -- 
regardless of how small the nail might be.


>However, what we're really talking about here is a scenario where you're 
>defining your *own* custom annotation processor: you want the first part 
>of the tuple in the expression handled by the type processing library, and 
>the second part handled by the docstring processing library.
>
>Which says to me that the right solution is for the annotation to be split 
>up into its constituent parts before the libraries ever see it.
>
>This could be done as Collin suggests by tampering with 
>__signature__.annotations before calling each decorator, but I think it is 
>cleaner to do it by defining a particular signature for decorators that 
>are intended to process annotations.

Now you're embedding a particular implementation again.  The way to do this 
that imposes the least constraints on users, is to just have an 
'iter_annotations()' overloadable function, and let it iterate over lists 
and tuples, and yield anything else, e.g.:

     @iter_annotations.when(tuple)
     @iter_annotations.when(list)
     def iter_annotation_sequence(annotation):
         for a in annotation:
             for aa in iter_annotations(a):
                 yield aa

Now, if you have some custom annotation type that contains other 
annotations, you need only add a method to iter_annotations, and everything 
works.
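For readers without the prototype at hand, the sketch above can be rendered runnable with `functools.singledispatch` standing in for the prototype's `.when()` registration (the behavior is the same: sequences are flattened recursively, everything else is yielded as-is):

```python
from functools import singledispatch

@singledispatch
def iter_annotations(annotation):
    yield annotation          # default: any unknown object is one annotation

@iter_annotations.register(tuple)
@iter_annotations.register(list)
def _(annotation):
    for a in annotation:
        for aa in iter_annotations(a):
            yield aa

print(list(iter_annotations(["doc", ("opt", 3), 42])))
# -> ['doc', 'opt', 3, 42]
```

A custom container type joins the protocol by registering one more method, without changes to any existing code.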

In contrast, your approach is too limiting because you're *creating a 
framework* that then everyone has to conform to.  I want annotations to be 
framework-free.  I don't even think that the stdlib needs to provide an 
iter_annotations function, because there's no reason not to just define a 
method similar to the above for the specific operations you're doing.

In fact the general rule of overloadable functions is that the closer to 
the domain semantics the function is, the better.  For example, a 
'generateCodeFor(annotation)' overloaded function that can walk annotation 
sequences itself is a better idea than writing a non-overloaded function 
that uses iter_annotations() and then generates code for individual 
annotations, because it allows for better overloads.

For example, you might have a type that contains something that would 
ordinarily be considered separate annotation objects, but which the code 
generator could combine in some way to produce more optimal code.  Walking 
the annotations and then generating code would rob you of the opportunity 
to define an optimization overload in this case.

And *that* is why I don't think the stdlib should impose any semantics on 
annotations -- semantic imposition doesn't *fix* incompatibility, it 
*creates* it.

How?  Because if somebody needs to do something that doesn't fit within the 
imposed semantics, they are forced to create their own, and they now must 
reinvent everything so it works with their own!

This is the history of Python frameworks in a nutshell, and it's entirely 
avoidable.  We should leave the semantics open, precisely so that it will 
force people to make their code *extensible*.  As a side benefit, it 
provides a nice example of when and how to use overloaded functions 
effectively.


From pje at telecommunity.com  Sat Aug 12 18:39:15 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sat, 12 Aug 2006 12:39:15 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <43aa6ff70608112133w7eb2d0c6x287c021b108974b@mail.gmail.com>
References: <5.1.1.6.0.20060811225402.0228c178@sparrow.telecommunity.com>
	<mailman.34113.1155347488.27774.python-3000@python.org>
	<5.1.1.6.0.20060811225402.0228c178@sparrow.telecommunity.com>
Message-ID: <5.1.1.6.0.20060812121239.03c1da60@sparrow.telecommunity.com>

At 12:33 AM 8/12/2006 -0400, Collin Winter wrote:
>>I don't see the point of this.  A decorator should be responsible for
>>manipulating the signature of its return value.  Meanwhile, the semantics
>>for combining annotations should be defined by an overloaded function like
>>"combineAnnotations(a1,a2)" that returns a new annotation.  There is no
>>need to have a special chaining decorator.
>>
>>May I suggest that you try using Guido's Py3K overloaded function
>>prototype?  I expect you'll find that if you play around with it a bit, it
>>will considerably simplify your view of what's required to do this.  It
>>truly isn't necessary to predefine what an annotation is, or even any
>>structural constraints on how they will be combined, since the user is able
>>to define for any given type how such things will be handled.
>
>I've looked at Guido's overloaded function prototype, and while I
>think I'm in the direction of understanding, I'm not quite there 100%.
>
>Could you illustrate (in code) what you've got in mind for how to
>apply overloaded functions to this problem space?

You just define an overloadable function for whatever operation you want to 
perform on annotations.  Then you define methods that implement the 
operation for known types, and a default method that ignores unknown 
types.  Then you're done.

If somebody wants to do more than one thing with the annotations on their 
functions, then everything "just works", since there is only one annotation 
per argument (per the PEP), and each operation is ignoring types it doesn't 
understand.

This leaves only one problem: the possibility of incompatible 
interpretations for a given type of annotation -- and it is easily solved 
by using some container or wrapper type, for which methods can be added to 
the respective operations.

So, let's say I'm using two decorators that have a common (and 
incompatible) interpretation for type "str".  I need only create a type 
that is unique to my program, and then define methods for the overloaded 
functions those decorators expose.

QED: any incompatibility can be trivially solved by introducing a new 
type.  However, the most likely source of conflict is the need to specify 
multiple, unrelated annotations for a given argument.  So, it's likely that 
most operations will want to interpret a list of annotations as just that: 
a list of annotations.

But there is no *requirement* that they do so.  Someone writing a library 
of their own that has a special use for lists is under no obligation to 
adhere to that pattern.  Remember: any conflict can be trivially solved by 
introducing a new type.

If you'd like me to sketch this out in code, fine, but you define the 
specific example you'd like to see.  To me, this all seems as obvious and 
straightforward as 2+2=4 implying that 4-2=2.  And it doesn't even have 
anything specifically to do with overloaded functions!

If you replace overloaded functions with functions that expect to call 
certain method names on the objects, *the exact same principles apply*.  As 
long as each operation gets a unique method name, any conflict can be 
trivially solved by introducing a new type that implements both methods.

The key here is that introspection and explicit dispatching are bad.  Code 
like this:

      def decorate(func):
          ...
          if isinstance(annotation, str):
              ...  # do something with the string

is wrong, wrong, *wrong*.  It should simply be doing the equivalent of:

          annotation.doWhatIWant()

Except in the overloaded function case, it's 
'doWhatIWant(annotation)'.  The latter spelling has the advantage that you 
don't have to be able to modify the 'str' class to add a 'doWhatIWant()' 
method.
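A compact illustration of the contrast (using `functools.singledispatch` as the overloading mechanism; `do_what_i_want` is the thread's illustrative name, not a real API):

```python
from functools import singledispatch

# The introspection style criticized above: the consumer itself
# type-checks and hard-codes what each type means.
def decorate_by_inspection(annotation):
    if isinstance(annotation, str):
        return "doc: " + annotation
    return None

# The "tell, don't ask" style: an overloadable function that third
# parties can extend for new annotation types without touching this code.
@singledispatch
def do_what_i_want(annotation):
    return None                # default: ignore types we don't understand

@do_what_i_want.register(str)
def _(annotation):
    return "doc: " + annotation
```

The two behave identically on strings, but only the second can be taught about a new annotation type from outside, which is the restriction overloaded functions remove.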

Is this clearer now?  This is known, by the way, as the "tell, don't ask" 
pattern.  In Python, we use the variant terms "duck typing" and "EAFP" 
(easier to ask forgiveness than permission), but "tell, don't ask" refers 
specifically to the idea that you should never dig around in an object's 
guts to perform an operation, and instead always delegate the operation to it.

Of course, delegation is impossible in the case of a "third-party" object 
being used -- i.e., one that can't be modified to add the necessary 
method.  Overloaded functions remove that restriction.

(This, by the way, is why I think Python should ultimately add an 
overloading syntax -- so that we could ultimately replace things like 'def 
__str__(self)' with something like 'defop str(self)'.  But that's not 
relevant to the immediate discussion.)


From paul at prescod.net  Sat Aug 12 21:38:06 2006
From: paul at prescod.net (Paul Prescod)
Date: Sat, 12 Aug 2006 12:38:06 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com>
References: <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com>
	<44DD5DF0.40405@acm.org>
	<5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com>
Message-ID: <1cb725390608121238v427fe287s303e2acdda97bab5@mail.gmail.com>

Phillip. I'm having trouble following the discussion. I briefly caught up
when Talin got very concrete with syntax and I would appreciate if you could
offer some correspondingly remedial training.

Talin's example is that metadata inventor A documents that his/her users
should use this syntax for parameter docstrings:

def myfunc( x : "The x coordinate", y : "The y coordinate" )
      ...

Then metadata inventor B documents that his/her users should use this syntax
for getopt strings:

class MyHandler( CommandLineHandler ):

   @getopt
   def list( infile:"i" = sys.stdin, outfile:"o" = sys.stdout ):

Now the user is faced with the challenge of making these two work together
in order to get the best of both worlds. What does the user type?

The mechanism of overloading, function dispatching etc. is uninteresting to
me until I understand what goes in the user's Python file. Syntax is
important.

 Paul Prescod

From pje at telecommunity.com  Sat Aug 12 23:10:17 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sat, 12 Aug 2006 17:10:17 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <1cb725390608121238v427fe287s303e2acdda97bab5@mail.gmail.com>
References: <5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com>
	<5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com>
	<44DD5DF0.40405@acm.org>
	<5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com>
Message-ID: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>

At 12:38 PM 8/12/2006 -0700, Paul Prescod wrote:
>Phillip. I'm having trouble following the discussion. I briefly caught up 
>when Talin got very concrete with syntax and I would appreciate if you 
>could offer some correspondingly remedial training.
>
>Talin's example is that metadata inventor A documents that his/her users 
>should use this syntax for parameter docstrings:
>
>def myfunc( x : "The x coordinate", y : "The y coordinate" )
>       ...
>
>Then metadata inventor B documents that his/her users should use this 
>syntax for getopt strings:
>
>class MyHandler( CommandLineHandler ):
>
>    @getopt
>    def list( infile:"i" = sys.stdin, outfile:"o" = sys.stdout ):
>
>Now the user is faced with the challenge of making these two work together 
>in order to get the best of both worlds. What does the user type?

As long as both inventors used overloadable functions, the user can type 
almost *anything they want to*, as long as:

1. It's consistent,
2. It's unambiguous, and
3. They've defined the appropriate overloads.

For example, they might use a 'docopt' class that allows both to be 
specified, or a pair of 'doc' and 'opt' objects in a list.


>The mechanism of overloading, function dispatching etc. is uninteresting 
>to me until I understand what goes in the user's Python file. Syntax is 
>important.

Indeed it is.  Hence the importance of not forcing some particular 
semantics, so as to allow the user to use the types and semantics of their 
choosing.

By the way, it should be understood that when I say "overloadable 
function", I simply mean some type-extensible dispatching mechanism.  If 
you exclude built-in types from consideration, and simply have special 
attribute or method names, then duck typing works just as well.  You can 
have decorators that use hasattr() and such to do their dirty work.

It's only if you want to have sensible meaning for built-in types that 
there even begins to be an illusion that conflicts are an issue.  However, 
the only built-in types likely to even be used in such a way are lists, 
dictionaries, tuples, and strings.  If there's more than one way to 
interpret them, depending on the operation, their use is inherently 
ambiguous, and it's up to the person combining them to supply the 
differentiation.

However, if you have:

    def myfunc( x : doc("The x coordinate"), y : doc("The y coordinate") )

There is no ambiguity.  Likewise:

    def cat( infile:opt("i") = sys.stdin, outfile:opt("o") = sys.stdout ):

is unambiguous.  And the interpretation of:

    def cat(infile: [doc("input stream"), opt("i")] = sys.stdin,
            outfile: [doc("output stream"), opt("o")] = sys.stdout
    ):

is likewise unambiguous, unless the creator of the documentation or option 
features has defined some other interpretation for a list than "recursively 
apply to contained items".  In which case, you need only do something like:

    def cat(infile: docopt("input stream", "i") = sys.stdin,
            outfile: docopt("output stream", "o") = sys.stdout
    ):

with an appropriate definition of methods for the 'docopt' type.

Since many people seem to be unfamiliar with overloaded functions, I would 
just like to take this opportunity to remind you that the actual overload 
mechanism is irrelevant.  If you gave 'doc' objects a 'printDocString()' 
method and 'opt' objects a 'setOptionName()' method, the exact same logic 
regarding extensibility applies.  The 'docopt' type would simply implement 
both methods.

This is normal, simple standard Python stuff; nothing at all fancy.  The 
only thing that overloaded functions add to this is that they allow you to 
(in effect) add methods to existing types without monkeypatching.  Thus, 
you can define overloads for built-in types, and types you didn't implement 
yourself.  Even if overloaded functions didn't exist, it wouldn't be 
necessary to invent them just to allow arbitrary annotation semantics!  It 
simply requires that operations that *use* annotations always follow the 
"tell, don't ask" pattern, whether it's done by duck typing, EAFP, or 
overloaded functions.
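A sketch of the duck-typed variant just described ('doc', 'opt', and 'docopt' are the thread's illustrative names, not a real library; for testability the "methods" here return the data a consumer would use rather than performing I/O):

```python
class doc:
    def __init__(self, text):
        self.text = text
    def printDocString(self):
        return self.text

class opt:
    def __init__(self, flag):
        self.flag = flag
    def setOptionName(self):
        return self.flag

class docopt(doc, opt):
    """One annotation object implementing both consumers' protocols."""
    def __init__(self, text, flag):
        doc.__init__(self, text)
        opt.__init__(self, flag)

def cat(infile: docopt("input stream", "i") = None,
        outfile: docopt("output stream", "o") = None):
    pass

# Each consumer asks only for the method it cares about ("tell, don't
# ask"), skipping annotations that lack it:
docs = {name: ann.printDocString()
        for name, ann in cat.__annotations__.items()
        if hasattr(ann, "printDocString")}
print(docs)   # -> {'infile': 'input stream', 'outfile': 'output stream'}
```

The same `docopt` object satisfies both the documentation consumer and the option consumer, which is the conflict-resolution-by-new-type pattern described earlier in the thread.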


From talin at acm.org  Sat Aug 12 23:07:18 2006
From: talin at acm.org (Talin)
Date: Sat, 12 Aug 2006 14:07:18 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <1cb725390608121238v427fe287s303e2acdda97bab5@mail.gmail.com>
References: <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com>	
	<44DD5DF0.40405@acm.org>	
	<5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com>
	<1cb725390608121238v427fe287s303e2acdda97bab5@mail.gmail.com>
Message-ID: <44DE4306.4070304@acm.org>

Paul Prescod wrote:
> Phillip. I'm having trouble following the discussion. I briefly caught up
> when Talin got very concrete with syntax and I would appreciate if you 
> could
> offer some correspondingly remedial training.
> 
> Talin's example is that metadata inventor A documents that his/her users
> should use this syntax for parameter docstrings:
> 
> def myfunc( x : "The x coordinate", y : "The y coordinate" )
>      ...

One important point I want to mention: I deliberately did *not* show a 
decorator for the example above. The reason for this is that the 
docstring annotations are not intended for consumption by a decorator 
function - they are intended for consumption by an external program that 
extracts documentation.

More specifically, this external doc extractor program would be part of 
a standard package of documentation tools, written by an entirely 
different author than the person actually writing 'myfunc'. This doc 
extractor knows nothing about decorators, and is unconcerned with their 
presence.

So I'd like Phillip to incorporate that into his explanation of how that 
is all supposed to work.

-- Talin



From talin at acm.org  Sun Aug 13 00:00:45 2006
From: talin at acm.org (Talin)
Date: Sat, 12 Aug 2006 15:00:45 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
References: <5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com>
	<5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com>
	<44DD5DF0.40405@acm.org>
	<5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com>
	<5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
Message-ID: <44DE4F8D.6050503@acm.org>

Phillip J. Eby wrote:
> At 12:38 PM 8/12/2006 -0700, Paul Prescod wrote:

> However, if you have:
> 
>    def myfunc( x : doc("The x coordinate"), y : doc("The y coordinate") )
> 
> There is no ambiguity.  Likewise:
> 
>    def cat( infile:opt("i") = sys.stdin, outfile:opt("o") = sys.stdout ):
> 
> is unambiguous.  And the interpretation of:
> 
>    def cat(infile: [doc("input stream"), opt("i")] = sys.stdin,
>            outfile: [doc("output stream"), opt("o")] = sys.stdout
>    ):

By doing this, you've already introduced an implicit requirement for 
annotations: Rather than saying that annotations can be "any format you 
want", the actual restriction is "any format you want that is 
distinguishable from other formats." More specifically, the rule is that 
annotations intended for different consumers must be distinguishable 
from each other via rule. This is in direct contradiction with the 
statement in the PEP that says that annotations have no predefined 
syntax or semantics -- they are required to have, at minimum, semantics 
sufficient to allow rule-based discrimination.

(BTW, I propose the term "Annotation Consumer" to mean a body of code 
that is intended to process annotations. You can have decorator-based 
consumers, as well as external consumers that are not part of the 
decorator stack and which inspect the function signature directly, 
without invoking the decorators.)

Let's use the term 'discriminator' to indicate any means, using function 
overloading or whatever, of determining which consumers should process 
which annotations. Let's also define the term 'discriminator protocol' to 
mean any input specifications to the discriminator - so in the above 
example, 'doc()' and 'opt()' are part of the discriminator protocol.

Now, you are trying very hard not to specify a standard discriminator 
protocol, but the fact is that if you don't do it, someone else will. 
Nobody wants to have to write their own discriminator for each 
application. And you can't mix discriminator protocols unless those 
protocols are a priori compatible.

Thus, there is very strong pressure to create a single, standard 
discriminator, or at least a standard discriminator protocol. The 
pressure is based on the fact that most users would rather deal with a 
protocol that someone else has written rather than writing their own. 
And because mixing protocols has the potential for discrimination 
errors, a heterogeneous environment with multiple protocols will 
inevitably degenerate into one where a single protocol has a monopoly.

So why don't you save us all the trouble and pain and just define the 
standard discrimination mechanism up front? As I have shown, it's going 
to happen anyway - it's inevitable - and delaying the decision simply 
means a lot of heartache for a lot of folk until the one true 
discriminator takes over. (Which is another thing that I was trying to 
illustrate with my SysEx story.)

As a footnote, I'd like to make a philosophical point about designing 
protocols. A 'protocol' (not in the technical sense, but in the sense of 
human relations) is simply an agreement to curtail the range of one's 
behavior to a restricted subset of what one is capable of, in order to 
facilitate cooperation between individuals. Language is a protocol - as 
I am typing this message, I implicitly agree to use words of English, 
rather than random made-up syllables, in order to facilitate 
understanding of my meaning.

Now, the curious and paradoxical thing about protocols is that in order 
to give the most freedom, you have to take some freedom away. Taking 
away certain freedoms can give you *more* freedom, because it allows you 
to predict and rely on the behaviors of the other participants in the 
protocol, enabling you to accomplish things that you wouldn't be able to 
do otherwise. For a given situation, there will be some "sweet spot", 
some balance between openness and restriction, that will give the 
largest amount of "effective" freedom and capability to the participants.

Here's an example: Cultures which have a strong mercantile ethic for 
fair dealing and enforcement of contracts tend to have vastly more 
efficient national economies. In countries where the mercantile ethic is 
poor, transaction costs are much higher - each individual has to spend 
effort vetting and enforcing each potential transaction, instead of 
being able to simply trust the other person. So by voluntarily 
restricting one's behavior to not unfairly take advantage of others and 
thus gain a temporary local advantage, one gains a huge advantage on the 
aggregate level.

For this reason, I am skeptical of the benefit of completely open-ended 
protocols. The value of the protocol is in the agreement between 
individuals - if the individuals don't agree on much, then there's not 
much value to be had.

-- Talin

From paul at prescod.net  Sun Aug 13 02:05:56 2006
From: paul at prescod.net (Paul Prescod)
Date: Sat, 12 Aug 2006 17:05:56 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
References: <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com>
	<44DD5DF0.40405@acm.org>
	<5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com>
	<5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
Message-ID: <1cb725390608121705s6e43b02fo28b4e83865c914ab@mail.gmail.com>

It seems to me that there are two very reasonable positions being expressed.
Is the following (non-normative) text a compromise?

"In order for processors of function annotations to work interoperably, they
must use a common interpretation of objects used as annotations on a
particular function. For example, one might interpret string annotations as
docstrings. Another might interpret them as path segments for a web
framework. For this reason, function annotation processors SHOULD avoid
assigning processor-specific meanings to types defined outside of the
processor's framework. For example, a Django processor could process
annotations of a type defined in a Zope package, but Zope's creators should
be considered the authorities on the type's meaning for the same reasons
that they would be considered authorities on the semantics of classes or
methods in their packages. This implies that the interpretation of built-in
types would be controlled by Python's developers and documented in Python's
documentation. This is just a best practice. Nothing in the language can or
should enforce this practice and there may be a few domains where there is a
strong argument for violating it (e.g. an education environment where saving
keystrokes may be more important than easing interoperability)."

"In Python 3000, semantics will be attached to the following types:
basestring and its subtypes are to be used for documentation (though they
are not necessarily the exclusive source of documentation about the type).
List and its subtypes are to be used for attaching multiple independent
annotations."

(does chaining make sense in this context?)

 Paul Prescod

From greg.ewing at canterbury.ac.nz  Sun Aug 13 03:26:26 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 13 Aug 2006 13:26:26 +1200
Subject: [Python-3000] threading, part 2 --- + a bit of ctypes FFI worry
In-Reply-To: <9eebf5740608120411m40da5724r11700fdbe509914@mail.gmail.com>
References: <1f7befae0608120329wc646164w25ca4875da4cc5c0@mail.gmail.com>
	<44DDB5E4.9010903@canterbury.ac.nz>
	<9eebf5740608120411m40da5724r11700fdbe509914@mail.gmail.com>
Message-ID: <44DE7FC2.4030501@canterbury.ac.nz>

Lawrence Oluyede wrote:

> rctypes and pypy tools are somewhat one step further than ctypes
> machinery. In rctypes you can easily do something like:
> 
> size_t = ctypes_platform.SimpleType("size_t", c_ulong)

Does this work dynamically, or does it rely on
C code being generated and the C compiler working
out the details?

--
Greg


From l.oluyede at gmail.com  Sun Aug 13 03:42:44 2006
From: l.oluyede at gmail.com (Lawrence Oluyede)
Date: Sun, 13 Aug 2006 03:42:44 +0200
Subject: [Python-3000] threading, part 2 --- + a bit of ctypes FFI worry
In-Reply-To: <44DE7FC2.4030501@canterbury.ac.nz>
References: <1f7befae0608120329wc646164w25ca4875da4cc5c0@mail.gmail.com>
	<44DDB5E4.9010903@canterbury.ac.nz>
	<9eebf5740608120411m40da5724r11700fdbe509914@mail.gmail.com>
	<44DE7FC2.4030501@canterbury.ac.nz>
Message-ID: <9eebf5740608121842x4c1492baq9e049302905c2837@mail.gmail.com>

> Does this work dynamically, or does it rely on
> C code being generated and the C compiler working
> out the details?

It relies on C, which somewhat hinders the usefulness of the process.
There's also the code-generator option, but that again means a
compilation step.
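For comparison, plain ctypes can resolve a platform type like size_t at
runtime, with no C compiler involved -- a minimal sketch (ctypes.sizeof
and ctypes.c_size_t are real ctypes APIs; the candidate list is just
illustrative):

```python
import ctypes

# Pick the unsigned integer type whose size matches the platform's
# size_t, entirely at runtime -- no generated C code required.
candidates = (ctypes.c_ubyte, ctypes.c_ushort, ctypes.c_uint,
              ctypes.c_ulong, ctypes.c_ulonglong)
size_t = next(t for t in candidates
              if ctypes.sizeof(t) == ctypes.sizeof(ctypes.c_size_t))

print(size_t, ctypes.sizeof(size_t))
```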

-- 
Lawrence
http://www.oluyede.org/blog

From pje at telecommunity.com  Sun Aug 13 04:21:47 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sat, 12 Aug 2006 22:21:47 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <1cb725390608121705s6e43b02fo28b4e83865c914ab@mail.gmail.com>
References: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
	<5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com>
	<44DD5DF0.40405@acm.org>
	<5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com>
	<5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
Message-ID: <5.1.1.6.0.20060812221550.0258ce68@sparrow.telecommunity.com>

At 05:05 PM 8/12/2006 -0700, Paul Prescod wrote:
>It seems to me that there are two very reasonable positions being 
>expressed. Is the following (non-normative) text a compromise?
>
>"In order for processors of function annotations to work interoperably, 
>they must use a common interpretation of objects used as annotations on a 
>particular function. For example, one might interpret string annotations 
as docstrings. Another might interpret them as path segments for a web 
>framework. For this reason, function annotation processors SHOULD avoid 
>assigning processor-specific meanings to types defined outside of the 
>processor's framework. For example, a Django processor could process 
>annotations of a type defined in a Zope package, but Zope's creators 
>should be considered the authorities on the type's meaning for the same 
>reasons that they would be considered authorities on the semantics of 
>classes or methods in their packages. This implies that the interpretation 
>of built-in types would be controlled by Python's developers and 
>documented in Python's documentation. This is just a best practice. 
>Nothing in the language can or should enforce this practice and there may 
>be a few domains where there is a strong argument for violating it (e.g. 
>an education environment where saving keystrokes may be more important 
>than easing interoperability)."

I mostly like this; the main issue I see is that as long as we're 
recommending best practices, we should recommend using tell-don't-ask (via 
duck typing protocols, adaptation, or overloaded functions) so that their 
libraries can be enhanced and extended by other developers.


>"In Python 3000, semantics will be attached to the following types: 
>basestring and its subtypes are to be used for documentation (though they 
>are not necessarily the exclusive source of documentation about the type). 
>List and its subtypes are to be used for attaching multiple independent 
>annotations."

I'm not sure why we would use strings for documentation, but I'm not 
opposed since it eliminates the question of multiple interpretations for 
strings.


>(does chaining make sense in this context?)

I don't know if I know what you mean by "chaining".  Good use of 
tell-don't-ask means that any interpretation of annotations nested in other 
annotations would be defined by the enclosing annotation (or in an overload 
for it).



From pje at telecommunity.com  Sun Aug 13 04:23:00 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sat, 12 Aug 2006 22:23:00 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <mailman.34193.1155432421.27774.python-3000@python.org>
Message-ID: <5.1.1.6.0.20060812215907.0226e808@sparrow.telecommunity.com>

At 03:00 PM 8/12/2006 -0700, Talin <talin at acm.org> wrote:
>Phillip J. Eby wrote:
> > At 12:38 PM 8/12/2006 -0700, Paul Prescod wrote:
>
> > However, if you have:
> >
> >    def myfunc( x : doc("The x coordinate"), y : doc("The y coordinate") )
> >
> > There is no ambiguity.  Likewise:
> >
> >    def cat( infile:opt("i") = sys.stdin, outfile:opt("o") = sys.stdout ):
> >
> > is unambiguous.  And the interpetation of:
> >
> >    def cat(infile: [doc("input stream"), opt("i")] = sys.stdin,
> >            outfile: [doc("output stream"), opt("o")] = sys.stdout
> >    ):
>
>By doing this, you've already introduced an implicit requirement for
>annotations: Rather than saying that annotations can be "any format you
>want", the actual restriction is "any format you want that is
>distinguishable from other formats."

And your point is what?


>  More specifically, the rule is that
>annotations intended for different consumers must be distinguishable
>from each other via rule. This is in direct contradiction with the
>statement in the PEP that says that annotations have no predefined
>syntax or semantics -- they are required to have, at minimum, semantics
>sufficient to allow rule-based discrimination.

You've lost me here entirely.  If we didn't want unambiguous semantics, 
we'd write programs in English, not Python.  :)


>(BTW, I propose the term "Annotation Consumer" to mean a body of code
>that is intended to process annotations. You can have decorator-based
>consumers, as well as external consumers that are not part of the
>decorator stack and which inspect the function signature directly,
>without invoking the decorators.)

Um, okay.  I'm not sure what benefit this new term adds over "operation 
that uses annotations", which is what I've been using, but whatever.


>Let's use the term 'discriminator' to indicate any means, using function
>overloading or whatever, of determining which consumers should process
>which annotations. Let's also define the term 'discriminator protocol' to
>   mean any input specifications to the discriminator - so in the above
>example, 'doc()' and 'opt()' are part of the discriminator protocol.

Um, what?  Why are you adding all this complication to a simple idea?

Duck typing is normal, simple, standard Python programming practice.  We 
use objects with methods all the time, and check for the existence of 
attributes all the time.

I don't understand why you insist on making that more complicated than it 
is.  It's really simple.  Annotations are objects.  Objects can be 
inspected, or selected by type.  You can do what you want to with them.

How complex is that?

(Meanwhile, I'm going to ignore all the red herrings about freedom and 
commerce and other rigamarole that has absolutely nothing to do with 
argument annotations.)

Going forward, may I suggest you take a look at Java and C# argument 
annotations before continuing to pursue this spurious line of 
reasoning?  I'm curious to see what your explanation will be for why these 
other languages don't have the problems that you claim will inevitably occur.

Meanwhile, if library authors write bad code because they don't understand 
basic OO concepts like duck typing and "tell, don't ask", then their users 
will educate them when they complain about not being able to use multiple 
annotation types.

Providing good examples and recommending best practices is one thing, but 
mandating a particular semantics is another.


From exarkun at divmod.com  Sun Aug 13 05:21:49 2006
From: exarkun at divmod.com (Jean-Paul Calderone)
Date: Sat, 12 Aug 2006 23:21:49 -0400
Subject: [Python-3000] [Python-Dev] What is the status of file.readinto?
In-Reply-To: <ca471dc20608121928r2695f1b9s19d9159927be936@mail.gmail.com>
Message-ID: <20060813032149.1717.1953938655.divmod.quotient.21274@ohm>

On Sat, 12 Aug 2006 19:28:44 -0700, Guido van Rossum <guido at python.org> wrote:
>On 8/12/06, "Martin v. L?wis" <martin at v.loewis.de> wrote:
>> I can only guess why it may go away; my guess it will go away when
>> the buffer interface is removed from Python (then it becomes
>> unimplementable).
>
>In Py3k, the I/O APIs will be redesigned, especially the binary ones.
>My current idea is to have read() on a binary file return a bytes
>object. If readinto() continues to be necessary, please make sure the
>Py3k list (python-3000 at python.org) knows about your use case. We
>aren't quite writing up the I/O APIs in PEP-form, but when we do, that
>would be the right time to speak up.
>

The ability to read into pre-allocated memory is fairly important
for high-performance applications.  This should be preserved somehow
(and preferably given a real, supported API).
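The pattern in question -- reading into pre-allocated memory -- can be
sketched with readinto() and a reusable bytearray (io.BytesIO stands in
for a real file here):

```python
import io

# Reading into a caller-supplied buffer avoids allocating a fresh bytes
# object on every read; the buffer is allocated once and reused.
buf = bytearray(8192)          # preallocated once
view = memoryview(buf)
src = io.BytesIO(b"x" * 20000)

total = 0
while True:
    n = src.readinto(buf)      # fills buf in place, returns bytes read
    if not n:
        break
    total += n
    chunk = view[:n]           # zero-copy view of the valid region
print(total)                   # 20000
```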

Jean-Paul

From ironfroggy at gmail.com  Sun Aug 13 05:50:26 2006
From: ironfroggy at gmail.com (Calvin Spealman)
Date: Sat, 12 Aug 2006 23:50:26 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <1cb725390608121705s6e43b02fo28b4e83865c914ab@mail.gmail.com>
References: <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com>
	<44DD5DF0.40405@acm.org>
	<5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com>
	<5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
	<1cb725390608121705s6e43b02fo28b4e83865c914ab@mail.gmail.com>
Message-ID: <76fd5acf0608122050v75aa6dbbs32bf05f85222fa7e@mail.gmail.com>

I am getting very tired of gmail's ignoring of the mailing-list
headers in context of replying! Anyway, here is what I accidentally
sent as personal messages related to this thread. Replying to Talin's
long story about MIDI devices:

WOW

I won't even pretend to reply with anything near a similar sized body
of text. Condolences go out to you for the water and laptop, by the
way. Anyways...

Although this is a humourous story (post it somewhere readily with
some more fleshiness, maybe!) and I enjoyed reading it quite a bit, I
saw where it was going very early on and disagreed immediately with the
point I see you trying to get across. The thing is, the situations are
too different to compare so bluntly. The era from which this story
comes was a different world, one far more brutal toward any attempt at
loose cooperation than what we have today, what with the internet and
this being largely open source software, not a hundred
and fifty competing MIDI vendors who think compatibility would just
make it easier to lose customers. The simplicity of the matter is
that there won't be that many annotation libraries, and mixing them
will be possible. When someone writes a good type-annotation
handling library, other people (even those writing other annotation
libraries) will use it, until it reaches the point that it will get
put into the standard library. And let no one pretend that will not
happen. De facto and even just mildly common libraries almost always
get pushed into the standard library eventually, but having some time
in the wild is good for evolution to take its course.

And to what Paul Said here:
On 8/12/06, Paul Prescod <paul at prescod.net> wrote:
> It seems to me that there are two very reasonable positions being expressed.
> Is the following (non-normative) text a compromise?
>
> "In order for processors of function annotations to work interoperably, they
> must use a common interpretation of objects used as annotations on a
> particular function. For example, one might interpret string annotations as
> docstrings. Another might interpret them as path segments for a web
> framework. For this reason, function annotation processors SHOULD avoid
> assigning processor-specific meanings to types defined outside of the
> processor's framework. For example, a Django processor could process
> annotations of a type defined in a Zope package, but Zope's creators should
> be considered the authorities on the type's meaning for the same reasons
> that they would be considered authorities on the semantics of classes or
> methods in their packages. This implies that the interpretation of built-in
> types would be controlled by Python's developers and documented in Python's
> documentation. This is just a best practice. Nothing in the language can or
> should enforce this practice and there may be a few domains where there is a
> strong argument for violating it (e.g. an education environment where
> saving keystrokes may be more important than easing interoperability)."
>
> "In Python 3000, semantics will be attached to the following types:
> basestring and its subtypes are to be used for documentation (though they
> are not necessarily the exclusive source of documentation about the type).
> List and its subtypes are to be used for attaching multiple independent
> annotations."
>
> (does chaining make sense in this context?)
>
>  Paul Prescod

I've been looking for a good place to pipe in with the suggestion of
defining that a dictionary as an annotation is taken as a mapping of
annotation type names to the annotation itself, such as using {'doc':
"The single character argument for the command line.", 'type': int} as
an annotation for some parameter in a function.
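Calvin's dict convention can be sketched in (later-standardized)
Python 3 annotation syntax; the consumer names 'doc' and 'opt' and the
helper get_annotation are purely illustrative, not part of any PEP:

```python
import sys

# A dict annotation maps consumer names to per-consumer data, so each
# tool indexes only the keys it understands and ignores the rest.
def cat(infile: {'doc': "input stream", 'opt': 'i'} = sys.stdin):
    ...

def get_annotation(func, arg, consumer):
    """Return the piece of arg's annotation intended for consumer."""
    note = func.__annotations__.get(arg)
    if isinstance(note, dict):
        return note.get(consumer)   # keys for other tools are skipped
    return None

print(get_annotation(cat, 'infile', 'doc'))   # input stream
```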

However, having read through all the posts I missed while recuperating
from a long trip I just returned from, I think this, coupled with taking
_any iterable_ (not just list and subtypes) and the whole "your type,
your annotation" guideline, is definitely sufficient for all uses.

From jimjjewett at gmail.com  Sun Aug 13 05:56:15 2006
From: jimjjewett at gmail.com (Jim Jewett)
Date: Sat, 12 Aug 2006 23:56:15 -0400
Subject: [Python-3000] PEP3102 Keyword-Only Arguments
In-Reply-To: <b008462b0608111620q709e691fqdb6283b194e1a893@mail.gmail.com>
References: <b008462b0608111620q709e691fqdb6283b194e1a893@mail.gmail.com>
Message-ID: <fb6fbf560608122056w5a9af394ga358614c0d8d10d7@mail.gmail.com>

On 8/11/06, Jiwon Seo <seojiwon at gmail.com> wrote:
> When we have keyword-only arguments, do we allow 'keyword dictionary'
> argument? If that's the case, where would we want to place
> keyword-only arguments?

> Are we going to allow any of followings?

> 1. def foo(a, b,  *, key1=None, key2=None, **map)

Seems perfectly reasonable.
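Form 1 from the question, in the syntax PEP 3102 ultimately gave
Python 3: positional parameters, a bare *, keyword-only parameters, and
then **map collecting the remaining keyword arguments (the parameter
name "map" simply follows the question's spelling):

```python
def foo(a, b, *, key1=None, key2=None, **map):
    # key1/key2 can only be passed by keyword; extras land in map
    return a, b, key1, key2, map

print(foo(1, 2, key1='x', extra=3))
# (1, 2, 'x', None, {'extra': 3})
```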

I think the controversy was over whether or not to allow keyword-only
without a default.

> 2. def foo(a, b, *,  **map, key1=None, key2=None)

Seems backward, though I suppose we could adjust if we needed to.

> 3. def foo(a, b, *, **map)

What would the * even mean, since there aren't any named keywords to separate?

-jJ

From talin at acm.org  Sun Aug 13 06:05:27 2006
From: talin at acm.org (Talin)
Date: Sat, 12 Aug 2006 21:05:27 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <5.1.1.6.0.20060812215907.0226e808@sparrow.telecommunity.com>
References: <5.1.1.6.0.20060812215907.0226e808@sparrow.telecommunity.com>
Message-ID: <44DEA507.9040900@acm.org>

Phillip J. Eby wrote:
> At 03:00 PM 8/12/2006 -0700, Talin <talin at acm.org> wrote:
>> Phillip J. Eby wrote:
>> > At 12:38 PM 8/12/2006 -0700, Paul Prescod wrote:
>>
>> > However, if you have:
>> >
>> >    def myfunc( x : doc("The x coordinate"), y : doc("The y 
>> coordinate") )
>> >
>> > There is no ambiguity.  Likewise:
>> >
>> >    def cat( infile:opt("i") = sys.stdin, outfile:opt("o") = 
>> sys.stdout ):
>> >
>> > is unambiguous.  And the interpetation of:
>> >
>> >    def cat(infile: [doc("input stream"), opt("i")] = sys.stdin,
>> >            outfile: [doc("output stream"), opt("o")] = sys.stdout
>> >    ):
>>
>> By doing this, you've already introduced an implicit requirement for
>> annotations: Rather than saying that annotations can be "any format you
>> want", the actual restriction is "any format you want that is
>> distinguishable from other formats."
> 
> And your point is what?
> 

My point is that this statement in Collin's PEP is wrong:

 > There is no worry that these libraries will assign semantics at
 > random, or that a variety of libraries will appear, each with varying
 > semantics and interpretations of what, say, a tuple of strings
 > means. The difficulty inherent in writing annotation interpreting
 > libraries will keep their number low and their authorship in the
 > hands of people who, frankly, know what they're doing.

The way I read this is "there is no need for annotations to be designed 
so as not to interfere with one another, nor does there need to be any 
mechanism defined in this PEP for resolving such interference". I and 
others have provided extensive use cases to show that unless care is 
taken, different annotations *will* step on each other's toes.

>>  More specifically, the rule is that
>> annotations intended for different consumers must be distinguishable
>> from each other via rule. This is in direct contradiction with the
>> statement in the PEP that says that annotations have no predefined
>> syntax or semantics -- they are required to have, at minimum, semantics
>> sufficient to allow rule-based discrimination.
> 
> You've lost me here entirely.  If we didn't want unambiguous semantics, 
> we'd write programs in English, not Python.  :)

Again, look at the language of the PEP.

>> (BTW, I propose the term "Annotation Consumer" to mean a body of code
>> that is intended to process annotations. You can have decorator-based
>> consumers, as well as external consumers that are not part of the
>> decorator stack and which inspect the function signature directly,
>> without invoking the decorators.)
> 
> Um, okay.  I'm not sure what benefit this new term adds over "operation 
> that uses annotations", which is what I've been using, but whatever.
> 
I'm just trying to get a handle on this stuff so that we can *talk* 
about it.

>> Let's use the term 'discriminator' to indicate any means, using function
>> overloading or whatever, of determining which consumers should process
>> which annotations. Let's also define the term 'discriminator protocol' to
>>   mean any input specifications to the discriminator - so in the above
>> example, 'doc()' and 'opt()' are part of the discriminator protocol.
> 
> Um, what?  Why are you adding all this complication to a simple idea?

I'm not adding anything to the concept, I am trying to come up with a 
way to *talk* about the concept. So far the whole conversation has 
gotten very confused because we're dealing with some highly abstract 
stuff here.

> Duck typing is normal, simple, standard Python programming practice.  We 
> use objects with methods all the time, and check for the existence of 
> attributes all the time.
> 
> I don't understand why you insist on making that more complicated than 
> it is.  It's really simple.  Annotations are objects.  Objects can be 
> inspected, or selected by type.  You can do what you want to with them.
> 
> How complex is that?

It gets complex when you have more than one inspector or selector. What 
we are arguing about is how much the various inspectors/selectors need 
to know about each other. And while the answer is hopefully "not much", 
I hope that I have shown that it cannot be "nothing at all". There has 
to be some ground rules for cooperation, or cooperation is impossible, 
that's basic logic.

> (Meanwhile, I'm going to ignore all the red herrings about freedom and 
> commerce and other rigamarole that has absolutely nothing to do with 
> argument annotations.)

Don't think of it as red herrings. Think of it as, um, "highly 
non-linear train of thought". :)

> Going forward, may I suggest you take a look at Java and C# argument 
> annotations before continuing to pursue this spurious line of 
> reasoning?  I'm curious to see what your explanation will be for why 
> these other languages don't have the problems that you claim will 
> inevitably occur.

Dude, you don't want to know how many man-years of C# programming I've 
done :)

Let's take C# attributes as an example. C# attributes have the following 
syntactical/semantic structure:

   1) They must be derived from the base class "Attribute". (This by 
itself is not really significant.)
   2) Attributes are distinguished by type, or in some cases by value.
   3) The types do not overlap.
   4) A given consumer of attributes can always distinguish attributes 
which are relevant to their purposes from attributes which are not, even 
against hypothetical future annotations which have not yet been established.

As a user, when I add an attribute to a method, I know that (a) there is 
a known consumer of that attribute, (b) that it is impossible for an 
attribute which is not intended for that consumer to be confused for one 
that is. If I set [Browseable(false)] on a property, I know exactly how 
that attribute is going to be interpreted, and by what component. If 
someone comes along later and adds a new annotation called 
"SortOfBrowseable", which has many of the same attributes as Browseable, 
there will never be the possibility that their annotation and mine can 
get confused with each other. (As opposed to Python, where it's 
relatively easy to have classes that masquerade as one another.)

The Annotation PEP, on the other hand, makes none of these guarantees, 
because it tries hard not to guarantee anything. It doesn't specify the 
mechanism by which one annotation is distinguished from another; unlike 
the C# attributes which are organized into a tree of types, the 
annotations have no organization and no categorization defined. Because 
there is no prohibition against category overlap, that means that the 
annotations that I write today might one day in the future match against 
a newly-created category, with results that I can't predict.

I also want to point out that C# attributes are very different from 
Python decorators, so you can't use analogies between them. Decorators 
are active agents - that is, they hook into the process of defining a 
method. Because of this, decorators have the option of having all of 
their semantic meaning buried within the decorator itself. In essence, 
the rule by which decorators "play nice" with each other is already 
defined - each gets a shot at modifying the function object, and each 
receives the result of the previous decorator.

C# attributes and function annotations, on the other hand, are purely 
passive - they have no knowledge of what they are attached to, and their 
only meaning is derived from external use. They themselves don't have to 
play nice with each other, but the interpreters / inspectors / consumers do.

> Meanwhile, if library authors write bad code because they don't 
> understand basic OO concepts like duck typing and "tell, don't ask", 
> then their users will educate them when they complain about not being 
> able to use multiple annotation types.
> 
> Providing good examples and recommending best practices is one thing, 
> but mandating a particular semantics is another.



From jcarlson at uci.edu  Sun Aug 13 06:16:18 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Sat, 12 Aug 2006 21:16:18 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
References: <1cb725390608121238v427fe287s303e2acdda97bab5@mail.gmail.com>
	<5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
Message-ID: <20060812205512.197A.JCARLSON@uci.edu>


"Phillip J. Eby" <pje at telecommunity.com> wrote:
> However, if you have:
> 
>     def myfunc( x : doc("The x coordinate"), y : doc("The y coordinate") )
> 
> There is no ambiguity.  Likewise:
> 
>     def cat( infile:opt("i") = sys.stdin, outfile:opt("o") = sys.stdout ):
> 
> is unambiguous.  And the interpetation of:
> 
>     def cat(infile: [doc("input stream"), opt("i")] = sys.stdin,
>             outfile: [doc("output stream"), opt("o")] = sys.stdout
>     ):
> 
> is likewise unambiguous, unless the creator of the documentation or option 
> features has defined some other interpretation for a list than "recursively 
> apply to contained items".  In which case, you need only do something like:
> 
>     def cat(infile: docopt("input stream", "i") = sys.stdin,
>             outfile: docopt("output stream", "o") = sys.stdout
>     ):

I now understand where you were coming from with regards to this being
equivalent to pickle (at least pickle + copy_reg).  I think that if you
would have posted this particular sample a couple days ago, there
wouldn't have been the discussion (argument?) about incompatible
mechanisms for annotation processing.

With that said, the above is a protocol.  Just like __len__, __str__,
copy_reg, __reduce__, __setstate__, etc., are protocols.  It may not be
fully specified (when annotations are to be processed, if at all, by
whom, where the annotation registry is, etc.), but it is still a
protocol.
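The duck-typed "tell, don't ask" pattern under discussion can be
sketched concretely; the names doc, opt, apply_doc, and collect_docs
are illustrative (nothing here is specified by the PEP), and lists are
applied recursively to their items, as in the quoted examples:

```python
import sys

# Each annotation object may expose a prearranged method; a consumer
# calls it and silently skips annotations that lack it.
class doc:
    def __init__(self, text):
        self.text = text
    def apply_doc(self, name, out):
        out[name] = self.text

class opt:
    def __init__(self, flag):
        self.flag = flag
    # no apply_doc: the documentation consumer ignores opt() entirely

def collect_docs(func):
    out = {}
    for name, note in func.__annotations__.items():
        for item in (note if isinstance(note, list) else [note]):
            if hasattr(item, 'apply_doc'):   # tell, don't ask
                item.apply_doc(name, out)
    return out

def cat(infile: [doc("input stream"), opt("i")] = sys.stdin): ...

print(collect_docs(cat))   # {'infile': 'input stream'}
```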

Do we need any more specification for the PEP and 2.6/3k?  I don't know,
maybe. You claim no, with the history of PEAK and other languages as
proof that doing anything more is unnecessary.  And I can understand why
you would resist any further specification: PEAK has been doing
annotations for quite a while, and additional specifications could make
transitioning to these annotations a pain in the ass for you and your
users.

I'm personally not convinced that no further specification is desired or
necessary (provided we include a variant of the above example
annotations), but I also cannot convince myself that specifying anything
further would be flexible enough to not be a mistake.

 - Josiah


From pje at telecommunity.com  Sun Aug 13 07:05:13 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sun, 13 Aug 2006 01:05:13 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <44DEA507.9040900@acm.org>
References: <5.1.1.6.0.20060812215907.0226e808@sparrow.telecommunity.com>
	<5.1.1.6.0.20060812215907.0226e808@sparrow.telecommunity.com>
Message-ID: <5.1.1.6.0.20060813005228.022737c8@sparrow.telecommunity.com>

At 09:05 PM 8/12/2006 -0700, Talin wrote:
>What we are arguing about is how much the various inspectors/selectors 
>need to know about each other. And while the answer is hopefully "not 
>much", I hope that I have shown that it cannot be "nothing at all".

As I've previously stated, they need to know enough to ignore what they 
don't understand.  And, to be useful, they should allow user extension via 
duck typing or overloading.


>  There has to be some ground rules for cooperation, or cooperation is 
> impossible, that's basic logic.

See the ground rules provided above.


>>Going forward, may I suggest you take a look at Java and C# argument 
>>annotations before continuing to pursue this spurious line of 
>>reasoning?  I'm curious to see what your explanation will be for why 
>>these other languages don't have the problems that you claim will 
>>inevitably occur.
>
>Dude, you don't want to know how many man-years of C# programming I've done :)
>
>Let's take C# attributes as an example. C# attributes have the following 
>syntactical/semantic structure:
>
>   1) They must be derived from the base class "Attribute". (This by 
> itself is not really significant.)
>   2) Attributes are distinguished by type, or in some cases by value.
>   3) The types do not overlap.
>   4) A given consumer of attributes can always distinguish attributes 
> which are relevant to their purposes from attributes which are not, even 
> against hypothetical future annotations which have not yet been established.

I fail to see how this is different from what I've already said.


>As a user, when I add an attribute to a method, I know that (a) there is a 
>known consumer of that attribute, (b) That it is impossible for an 
>attribute which is not intended for that consumer to be confused for one 
>that is. If I set [Browseable(false)] on a property, I know exactly how 
>that attribute is going to be interpreted, and by what component. If 
>someone comes along later and adds a new annotation called 
>"SortOfBrowseable", which has many of the same attributes as Browseable, 
>there will never be the possibility that their annotation and mine can get 
>confused with each other.

Again, so far it sounds just like the existing proposal.


>  (As opposed to Python, where it's relatively easy to have classes that 
> masquerade as one another.)

That's a feature, not a bug.  :)


>The Annotation PEP, on the other hand, makes none of these guarantees, 
>because it tries hard not to guarantee anything. It doesn't specify the 
>mechanism by which one annotation is distinguished from another; Unlike 
>the C# attributes which are organized into a tree of types, the 
>annotations have no organization and no categorization defined. Because 
>there is no prohibition against category overlap, that means that the 
>annotations that I write today might one day in the future match against a 
>newly-created category, with results that I can't predict.

Not if the annotation consumers simply use a tell-don't-ask pattern -- a 
pattern which I've repeatedly explained, and which can be trivially 
implemented with either duck typing or overloading.


>I also want to point out that C# attributes are very different from Python 
>decorators, so you can't use analogies between them.

That statement makes me think that the reason we're not communicating is 
that you are talking about something else than I am.   I never compared 
Python decorators and C# attributes.  In fact, I've rarely mentioned 
decorators at all and have tried as much as possible to push decorators 
*out* of the conversation, because they are irrelevant.  Documentation 
tools, for example, are unlikely to use decorators.  Metaclasses also 
aren't decorators, but both documentation tools and metaclasses are likely 
candidates for consuming annotation data.

Thus, I prefer to talk about "operations using annotations" since 
decorators are only a kind of "delivery vector" for such 
annotation-consuming operations.


>C# attributes and function annotations, on the other hand, are purely 
>passive - they have no knowledge of what they are attached to, and their 
>only meaning is derived from external use. They themselves don't have to 
>play nice with each other, but the interpreters / inspectors / consumers do.

And precisely the same things are true of Python function annotations.  I'm 
still lost as to why you think there's something different going on 
here.  Python decorators simply provide a vector for immediate annotation 
processing -- one that is entirely orthogonal to the notion of annotations 
themselves.


From pje at telecommunity.com  Sun Aug 13 07:21:01 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sun, 13 Aug 2006 01:21:01 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <20060812205512.197A.JCARLSON@uci.edu>
References: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
	<1cb725390608121238v427fe287s303e2acdda97bab5@mail.gmail.com>
	<5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
Message-ID: <5.1.1.6.0.20060813010634.0228ee30@sparrow.telecommunity.com>

At 09:16 PM 8/12/2006 -0700, Josiah Carlson wrote:
>"Phillip J. Eby" <pje at telecommunity.com> wrote:
> > However, if you have:
> >
> >     def myfunc( x : doc("The x coordinate"), y : doc("The y coordinate") )
> >
> > There is no ambiguity.  Likewise:
> >
> >     def cat( infile:opt("i") = sys.stdin, outfile:opt("o") = sys.stdout ):
> >
> > is unambiguous.  And the interpetation of:
> >
> >     def cat(infile: [doc("input stream"), opt("i")] = sys.stdin,
> >             outfile: [doc("output stream"), opt("o")] = sys.stdout
> >     ):
> >
> > is likewise unambiguous, unless the creator of the documentation or option
> > features has defined some other interpretation for a list than 
> "recursively
> > apply to contained items".  In which case, you need only do something like:
> >
> >     def cat(infile: docopt("input stream", "i") = sys.stdin,
> >             outfile: docopt("output stream", "o") = sys.stdout
> >     ):
>
>I now understand where you were coming from with regards to this being
>equivalent to pickle (at least pickle + copy_reg).  I think that if you
>would have posted this particular sample a couple days ago, there
>wouldn't have been the discussion (argument?) about incompatible
>mechanisms for annotation processing.

Well, it just seemed to me that that was the One Obvious Way To Do It; more 
specifically, I couldn't conceive of any *other* way to do it!


>With that said, the above is a protocol.  Just like __len__, __str__,
>copy_reg, __reduce__, __setstate__, etc., are protocols.  It may not be
>fully specified (when annotations are to be processed, if at all, by
>whom, where the annotation registry is, etc.), but it is still a
>protocol.

Actually, it's a family of *patterns* for creating protocols.  It's not a 
protocol, incompletely specified or otherwise.  Note that the actual 
implementation of the tell-don't-ask pattern can be via:

1. duck typing (i.e., prearranged method names)
2. adaptation
3. overloaded functions (any of several implementations)
4. ad hoc type-based registries

So it isn't even a *meta*-protocol, just a pattern family.
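[Editorial sketch, not part of the original message:] Option 4 above, an ad hoc type-based registry, might look like the following; all names (`handlers`, `register`, `process_annotation`, `doc`) are invented for illustration.

```python
# Tell-don't-ask via an ad hoc type-based registry: consumers
# dispatch on the annotation's own type and skip unknown types
# rather than guessing at their meaning.
handlers = {}

def register(annotation_type):
    """Register a handler function for one annotation type."""
    def decorate(fn):
        handlers[annotation_type] = fn
        return fn
    return decorate

class doc(str):
    """A hypothetical documentation annotation."""

@register(doc)
def handle_doc(ann):
    return "doc: %s" % ann

def process_annotation(ann):
    handler = handlers.get(type(ann))
    return handler(ann) if handler is not None else None
```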


>Do we need any more specification for the PEP and 2.6/3k?  I don't know,
>maybe. You claim no, with the history of PEAK and other languages as
>proof that doing anything more is unnecessary.  And I can understand why
>you would resist any further specification: PEAK has been doing
>annotations for quite a while, and additional specifications could make
>transitioning to these annotations a pain in the ass for you and your
>users.

Not really; PEAK's annotations are currently only on *attributes* and 
*classes*, not functions, arguments, or return values.  I was merely using 
it as an example of how overloaded functions allow heterogeneous 
annotations to coexist without needing any prearranged common semantics.

But I don't believe we know enough *today* to be able to safely define a 
rigid specification without ruling out possibly-valid uses.  By making a 
less-rigid specification, we force annotation consumers to code 
defensively...  which is really the right thing to do in a heterogeneous 
environment anyway.


>I'm personally not convinced that no further specification is desired or
>necessary (provided we include a variant of the above example
>annotations),

As I said, I'd prefer to see the tell-don't-ask pattern specifically cited 
and recommended, perhaps with examples.

I'll note, however, that the only consequence of *not* following that 
pattern is that you create a non-extensible, non-interoperable framework -- 
of which Python has huge numbers already.  This is not so damaging an 
outcome as to be worrisome, any more than we worry about people creating 
incompatible metaclasses today!


>but I also cannot convince myself that specifying anything
>further would be flexible enough to not be a mistake.

Right - that's the bit I'm concerned about.  Python also usually doesn't 
impose such policy constraints on mechanism.  For example, function 
attributes can be or contain anything, and nobody has argued that there 
need to be prespecified combination semantics, despite the fact that 
multiple tools can be consumers of the attributes.
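[Editorial sketch, not part of the original message:] The function-attribute parallel can be shown directly; both attribute names below are hypothetical tool markers, not real framework APIs.

```python
# Function attributes already work this way: any tool can attach its
# own metadata, with no prescribed combination semantics between tools.
def handler(request):
    return "ok"

handler.route = "/status"          # a web framework's marker (hypothetical)
handler.doc_category = "internal"  # a doc generator's marker (hypothetical)
```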


From jimjjewett at gmail.com  Sun Aug 13 07:29:52 2006
From: jimjjewett at gmail.com (Jim Jewett)
Date: Sun, 13 Aug 2006 01:29:52 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <20060812205512.197A.JCARLSON@uci.edu>
References: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
	<20060812205512.197A.JCARLSON@uci.edu>
Message-ID: <fb6fbf560608122229q56ccff29s780cbef862dc741c@mail.gmail.com>

On 8/13/06, Josiah Carlson <jcarlson at uci.edu> wrote:

> "Phillip J. Eby" <pje at telecommunity.com> wrote:
> > However, if you have:

> >     def myfunc( x : doc("The x coordinate"), y : doc("The y coordinate") )

> > There is no ambiguity.

Sure there is.  There will probably be several frameworks using the
magic name "doc".

This isn't a problem for the person writing myfunc, and therefore
isn't a problem for immediate decorators.  It is a problem for
inspection code that wants to present information about arbitrary
3rd-party libraries.

And once you get into multiple annotations, there will be some
frameworks that say "the doc annotation is mine, I'll ignore the opt
annotation" and others that say "oh, a dictionary of annotations, I
need to do this with name doc and that with name opt"

And of course, people won't really write doc("The x coordinate")
unless they're already thinking of other uses for a string; they'll
just write "The x coordinate" and someone later (perhaps from a
different package) will have to untangle what they meant -- short
expressions will end up being ambiguous almost from the start.
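[Editorial sketch, not part of the original message:] The ambiguity is easy to demonstrate; the function below is invented for this example.

```python
# A bare string annotation carries no marker saying which tool it
# belongs to; every consumer sees exactly the same value.
def move(x: "The x coordinate", y: "y offset in pixels"):
    return x + y

ann = move.__annotations__["x"]
looks_like_docs = isinstance(ann, str)          # True for a doc tool...
looks_like_path_segment = isinstance(ann, str)  # ...and for a web framework.
```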

Eventually, ways will be found to sort things out.  But there will be
less pain and backwards incompatibility if these issues are considered
from the start.


> Do we need any more specification for the PEP and 2.6/3k?  I don't know,
> maybe. You claim no, with the history of PEAK and other languages as
> proof that doing anything more is unnecessary.

The history of complaints about PEAK being hard to understand and
inadequately documented suggests that a fair number of people would
prefer additional guidance and handholding.  If annotations could only
be used safely by people who can understand PEAK, then offering
syntactic sugar to everyone would be asking for trouble.

-jJ

From paul at prescod.net  Sun Aug 13 08:00:36 2006
From: paul at prescod.net (Paul Prescod)
Date: Sat, 12 Aug 2006 23:00:36 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <44DEA507.9040900@acm.org>
References: <5.1.1.6.0.20060812215907.0226e808@sparrow.telecommunity.com>
	<44DEA507.9040900@acm.org>
Message-ID: <1cb725390608122300q3b20db1apc707e537c36fd0ee@mail.gmail.com>

I made a proposal that Phillip was mostly okay with. What do other
participants in the thread think? Would it move towards resolving this
thread?

"In order for processors of function annotations to work interoperably, they
must use a common interpretation of objects used as annotations on a
particular function. For example, one might interpret string annotations as
docstrings. Another might interpret them as path segments for a web
framework. For this reason, function annotation processors SHOULD avoid
assigning processor-specific meanings to types defined outside of the
processor's framework. For example, a Django processor could process
annotations of a type defined in a Zope package, but Zope's creators should
be considered the authorities on the type's meaning for the same reasons
that they would be considered authorities on the semantics of classes or
methods in their packages. This implies that the interpretation of built-in
types would be controlled by Python's developers and documented in Python's
documentation. This is just a best practice. Nothing in the language can or
should enforce this practice and there may be a few domains where there is a
strong argument for violating it (e.g. an education environment where
saving keystrokes may be more important than easing interoperability)."

"In Python 3000, semantics will be attached to the following types:
basestring and its subtypes are to be used for documentation (though they
are not necessarily the exclusive source of documentation about the type).
List and its subtypes are to be used for attaching multiple independent
annotations."
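[Editorial sketch, not part of the original message:] A consumer following the proposed convention (strings are documentation, lists carry multiple independent annotations) might look like this; `extract_docs` and the sample function are invented for illustration.

```python
def extract_docs(ann):
    """Collect documentation strings per the proposed convention."""
    if isinstance(ann, str):
        return [ann]
    if isinstance(ann, list):
        docs = []
        for item in ann:
            docs.extend(extract_docs(item))
        return docs
    return []  # some other tool's annotation: not ours to interpret

def cat(infile: ["input stream", 42] = None):
    pass
```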

 Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060812/d1576e7a/attachment.htm 

From pje at telecommunity.com  Sun Aug 13 08:06:50 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sun, 13 Aug 2006 02:06:50 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <fb6fbf560608122229q56ccff29s780cbef862dc741c@mail.gmail.com>
References: <20060812205512.197A.JCARLSON@uci.edu>
	<5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
	<20060812205512.197A.JCARLSON@uci.edu>
Message-ID: <5.1.1.6.0.20060813013329.0226d240@sparrow.telecommunity.com>

At 01:29 AM 8/13/2006 -0400, Jim Jewett wrote:
>On 8/13/06, Josiah Carlson <jcarlson at uci.edu> wrote:
>
>>"Phillip J. Eby" <pje at telecommunity.com> wrote:
>> > However, if you have:
>
>> >     def myfunc( x : doc("The x coordinate"), y : doc("The y coordinate") )
>
>> > There is no ambiguity.
>
>Sure there is.  There will probably be several frameworks using the
>magic name "doc".
>
>This isn't a problem for the person writing myfunc, and therefore
>isn't a problem for immediate decorators.  It is a problem for
>inspection code that wants to present information about arbitrary
>3rd-party libraries.

By this argument, we shouldn't have metaclasses or function attributes, 
because they have the same "problem".

However, it's only a problem if you insist on writing brain-damaged 
code.  If you want interoperability here, you must write tell-don't-ask 
code.  This is true for *any* use case where frameworks might share 
objects; there is absolutely *nothing* special about annotations in this 
regard!

I'm really baffled by the controversy over this; is it really the case that 
so many people don't know what tell-don't-ask code is or why you want 
it?  I guess maybe it's something that's only grasped by people who have 
experience writing code intended for interoperability.

After you run into the issue a few times, you look for a solution, and end 
up with either duck typing, interfaces/adaptation, overloaded functions, or 
ad hoc registries.  ALL of these solutions are *more* than adequate to 
handle a simple thing like argument annotations.  That's why I keep 
describing this as a trivial thing: even *pickling* is more complicated 
than this is.  This is no more complex than len() or iter() or filter()!

However, it appears that mine is a minority opinion.  Unfortunately, I'm at 
a bit of a communication disadvantage, because if somebody wants to believe 
something is complicated, there is nothing that anybody can do to change 
their mind.  If you don't consider the possibility that it is way simpler 
than you think, you will never be able to see it.

The other possibility, of course, is that all of you have some horrendously 
complex use case in mind that I just don't "get".  But so far all the 
examples that anybody else has put forth have been practically whimsical in 
their triviality -- while I've been explaining how the same principles will 
even work for complex things like type-checking code generation, let alone 
the trivial examples.  So I don't think that's it.  And at least Paul and 
Josiah have shown that they "get" what I'm saying, so I don't think that 
the answer is simply that I'm crazy, either.

[Meanwhile, I'm not going to respond to the rest of your message, since it 
contained some things that appeared to me to be a mixture of ad hominem 
attack and straw man argument.  I hope that was not actually your intent.]


From ironfroggy at gmail.com  Sun Aug 13 08:07:19 2006
From: ironfroggy at gmail.com (Calvin Spealman)
Date: Sun, 13 Aug 2006 02:07:19 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <1cb725390608122300q3b20db1apc707e537c36fd0ee@mail.gmail.com>
References: <5.1.1.6.0.20060812215907.0226e808@sparrow.telecommunity.com>
	<44DEA507.9040900@acm.org>
	<1cb725390608122300q3b20db1apc707e537c36fd0ee@mail.gmail.com>
Message-ID: <76fd5acf0608122307m11d3128ah3791ded3b3df2cd@mail.gmail.com>

On 8/13/06, Paul Prescod <paul at prescod.net> wrote:
> I made a proposal that Phillip was mostly okay with. What do other
> participants in the thread think? Would it move towards resolving this
> thread?
>
> "In order for processors of function annotations to work interoperably, they
> must use a common interpretation of objects used as annotations on a
> particular function. For example, one might interpret string annotations as
> docstrings. Another might interpret them as path segments for a web
> framework. For this reason, function annotation processors SHOULD avoid
> assigning processor-specific meanings to types defined outside of the
> processor's framework. For example, a Django processor could process
> annotations of a type defined in a Zope package, but Zope's creators should
> be considered the authorities on the type's meaning for the same reasons
> that they would be considered authorities on the semantics of classes or
> methods in their packages. This implies that the interpretation of built-in
> types would be controlled by Python's developers and documented in Python's
> documentation. This is just a best practice. Nothing in the language can or
> should enforce this practice and there may be a few domains where there is a
> strong argument for violating it (e.g. an education environment where
> saving keystrokes may be more important than easing interoperability)."
>
>
> "In Python 3000, semantics will be attached to the following types:
> basestring and its subtypes are to be used for documentation (though they
> are not necessarily the exclusive source of documentation about the type).
> List and its subtypes are to be used for attaching multiple independent
> annotations."
>
>  Paul Prescod

+1

This needs to be resolved; willy-nilly use of built-in types, or of
someone else's types, doesn't seem like something anyone could support.

From paul at prescod.net  Sun Aug 13 08:39:32 2006
From: paul at prescod.net (Paul Prescod)
Date: Sat, 12 Aug 2006 23:39:32 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <5.1.1.6.0.20060812221550.0258ce68@sparrow.telecommunity.com>
References: <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com>
	<44DD5DF0.40405@acm.org>
	<5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com>
	<5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
	<5.1.1.6.0.20060812221550.0258ce68@sparrow.telecommunity.com>
Message-ID: <1cb725390608122339m6087c604l85faeb89d6061524@mail.gmail.com>

On 8/12/06, Phillip J. Eby <pje at telecommunity.com> wrote:
>
>
> I mostly like this; the main issue I see is that as long as we're
> recommending best practices, we should recommend using tell-don't-ask (via
> duck typing protocols, adaptation, or overloaded functions) so that their
> libraries can be enhanced and extended by other developers.


Would you mind suggesting text for the PEP as an addendum to what I
proposed? And an example of both bad and good practice?

>"In Python 3000, semantics will be attached to the following types:
> >basestring and its subtypes are to be used for documentation (though they
> >are not necessarily the exclusive source of documentation about the
> type).
> >List and its subtypes are to be used for attaching multiple independent
> >annotations."
>
> I'm not sure why we would use strings for documentation, but I'm not
> opposed since it eliminates the question of multiple interpretations for
> strings.


I don't understand your point. Is there a better use for strings? Or a
better type to associate with documentation? Or you just don't see a need
for inline parameter documentation? The PEP itself used string docstrings as
an example.

>(does chaining make sense in this context?)
>
> I don't know if I know what you mean by "chaining".  Good use of
> tell-don't-ask means that any interpretation of annotations nested in
> other
> annotations would be defined by the enclosing annotation (or in an
> overload
> for it).


Yes, it's clear what nesting means. I'm not asking about nesting.

The question was whether there should be any relationship implied by the
fact that an annotation appears to the left or right of another annotation
in a list of annotations.

def a(b: [doc('x'), type('y')]): pass

Is there any sense in which the function 'x' should be passed context
information that would help it wrap or communicate with 'y'?

The most likely answer is "no" but function decorators do chain so I just
wanted to raise the issue in case anyone wanted to make the case that
parameter and return code annotations should as well.

 Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060812/31cb1cab/attachment.html 

From paul at prescod.net  Sun Aug 13 08:47:29 2006
From: paul at prescod.net (Paul Prescod)
Date: Sat, 12 Aug 2006 23:47:29 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <43aa6ff70608111649g54e82dd6kef19862f0c281254@mail.gmail.com>
References: <mailman.34014.1155280218.27774.python-3000@python.org>
	<5.1.1.6.0.20060811112118.023af398@sparrow.telecommunity.com>
	<20060811084623.1931.JCARLSON@uci.edu> <44DD073C.7030305@acm.org>
	<43aa6ff70608111649g54e82dd6kef19862f0c281254@mail.gmail.com>
Message-ID: <1cb725390608122347q2527151fiadf1a8fc7bcd4af5@mail.gmail.com>

On 8/11/06, Collin Winter <collinw at gmail.com> wrote:
>
> >>> def chain(*decorators):
> >>>     assert len(decorators) >= 2
> >>>
> >>>     def decorate(function):
> >>>         sig = function.__signature__
> >>>         original = sig.annotations
> >>>
> >>>         for i, dec in enumerate(decorators):
> >>>             fake = dict((p, original[p][i]) for p in original)
> >>>
> >>>             function.__signature__.annotations = fake
> >>>             function = dec(function)
> >>>
> >>>         function.__signature__.annotations = original
> >>>         return function
> >>>     return decorate


I must be confused. This is a function returning a function. Does that mean
that the thing showing up in the __signatures__ dictionary is a function? Or
does the caller need to use two sets of parentheses to call the factory
function and then the inner function?

 Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060812/3263fd44/attachment.htm 

From paul at prescod.net  Sun Aug 13 09:02:05 2006
From: paul at prescod.net (Paul Prescod)
Date: Sun, 13 Aug 2006 00:02:05 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <76fd5acf0608122011w442afac8o6bfaa7f42ec9cbcd@mail.gmail.com>
References: <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com>
	<44DD5DF0.40405@acm.org>
	<5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com>
	<5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
	<1cb725390608121705s6e43b02fo28b4e83865c914ab@mail.gmail.com>
	<76fd5acf0608122011w442afac8o6bfaa7f42ec9cbcd@mail.gmail.com>
Message-ID: <1cb725390608130002gbe3cb88j301b451386c51328@mail.gmail.com>

On 8/12/06, Calvin Spealman <ironfroggy at gmail.com> wrote:
>
> I've been looking for a good place to pipe in with the suggestion of
> defining that a dictionary as an annotation is taken as a mapping of
> annotation type names to the annotation itself, such as using {'doc':
> "The single character argument for the command line.", 'type': int} as
> an annotation for some parameter in a function.


I think we need to decide whether metadata type identifiers are just strings
or whether they will typically be objects. I think that the arguments in
favour of objects are strong.

> However, reading through all the posts I missed while recuperating from a
> long trip I just returned from, I think this coupled with taking _any
> iterable_ (not just list and subtypes) and the whole "your type, your
> annotation" guideline, is definitely sufficient for all uses.
>

One reason not to treat any iterable as a list of decorators is that a
string is an iterable. Maybe strings won't be the only annotation that
people want to attach that happens to be iterable for unrelated reasons.

A second reason that I restricted it to lists in particular is to encourage
consistent syntax (rather than one person using a list, another a tuple, a
third a generator, etc.).

And overall it is just overgeneralization. YAGNI. Lists work fine.

def myProtocolChainer(*args):
    return list(doSomething(args))

It is easy to loosen the protocol in future versions if I turn out to be
wrong.

 Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060813/a1656667/attachment.html 

From paul at prescod.net  Sun Aug 13 09:42:06 2006
From: paul at prescod.net (Paul Prescod)
Date: Sun, 13 Aug 2006 00:42:06 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
References: <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com>
	<44DD5DF0.40405@acm.org>
	<5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com>
	<5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
Message-ID: <1cb725390608130042h50c7d7f9oc4068f30f2b04bbb@mail.gmail.com>

> And the interpretation of:
>
>     def cat(infile: [doc("input stream"), opt("i")] = sys.stdin,
>             outfile: [doc("output stream"), opt("o")] = sys.stdout
>     ):
>
> is likewise unambiguous, unless the creator of the documentation or option
> features has defined some other interpretation for a list than
> "recursively
> apply to contained items".


If the meaning is "unambiguous unless...", then it is ambiguous. So, as per
my previous proposal, I think you and I agree that we should disallow the
stupid interpretation by encoding the obvious one in the PEP.

In which case, you need only do something like:
>
>     def cat(infile: docopt("input stream", "i") = sys.stdin,
>             outfile: docopt("output stream", "o") = sys.stdout
>     ):
>
> with an appropriate definition of methods for the 'docopt' type.


Given that there are an infinite number of tools in the universe that could
be processing "doc" and "opt" annotations, how would the user KNOW that
there is one out there with a stupid interpretation of lists? They might
annotate thousands of classes before finding out that some hot tool that
they were planning to use next year is incompatible. So let's please define
a STANDARD way of attaching multiple annotations to a parameter. Lists seem
like a no-brainer choice for that.
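[Editorial sketch, not part of the original message:] The "recursively apply to contained items" rule for lists could be implemented like this; the `doc`, `opt`, and `collect` names are illustrative, not from any real framework.

```python
# Each processor walks lists recursively and keeps only the
# annotation types it owns, ignoring everyone else's.
class doc(str):
    pass

class opt(str):
    pass

def collect(annotation, wanted):
    if isinstance(annotation, list):
        found = []
        for item in annotation:
            found.extend(collect(item, wanted))
        return found
    return [annotation] if type(annotation) is wanted else []

annotation = [doc("input stream"), opt("i")]
```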

Since many people seem to be unfamiliar with overloaded functions, I would
> just like to take this opportunity to remind you that the actual overload
> mechanism is irrelevant.  If you gave 'doc' objects a 'printDocString()'
> method and 'opt' objects a 'setOptionName()' method, the exact same logic
> regarding extensibility applies.  The 'docopt' type would simply implement
> both methods.
>
> This is normal, simple standard Python stuff; nothing at all fancy.


The context is a little bit different than standard duck typing.

Let's say I define a function like this:

def car(b):
    "b is a list-like object"
    return b[0]

Then someone comes along and does something I never expected. They invent a
type representing a list of bits in a bitfield. They pass it to my function
and everything works trivially. But something important happened: the
programmer ASSERTED, by passing the bit list to the function 'car', that it
is a list-like object. My code wouldn't have tried to treat it as a list if
the user hadn't passed it as one explicitly.

Now look at it from the point of view of function annotations. As we said
before, the annotations are inert. They are just attached. There is some
code like a type checker or documentation generator that comes along after
the fact and scoops them up to do something with them. The user did not
assert (at the language level!) that any particular annotation applies to
any particular annotation processor. The annotation processor is just
looking for stuff that it recognizes. But what if it thinks it recognizes
something but does not?

Consider this potential case:

BobsDocumentationGenerator.py:

class BobsDocumentationGeneratorAnnotation:
    def __init__(self, doc):
        self.doc = doc
    def printDocument(self):
        print self.doc
    def sideEffect(self):
        deleteHardDrive()

def BobsDocumentationGenerator(annotation):
    if hasattr(annotation, "printDocument"):
        annotation.printDocument()

SamsDocumentationGenerator.py:

class SamsDocumentationGeneratorAnnotation:
    def __init__(self, doc):
        self.doc = doc
    def printDocument(self):
        return self.doc
    def sideEffect(self):
        email(self.doc, "python-dev at pytho...")

def SamsDocumentationGenerator(annotation):
    if hasattr(annotation, "printDocument"):
        print annotation.printDocument()
        annotation.sideEffect()

These objects, _by accident_ have the same method signature but different
side effects and return values. Nobody anywhere in the system made an
incorrect assertion. They just happened to be unlucky in the naming of their
methods. (unbelievably unlucky but you get the drift)

One simple way to make it unambiguous would be to do a test more like:

   if hasattr(annotation, SamsDocumentationGenerator.uniqueObject): ...

The association of the unique object with an annotator object would be an
explicit assertion of compatibility.
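[Editorial sketch, not part of the original message:] The unique-marker test described above could be implemented as follows; all names are hypothetical, and a distinctive string attribute name stands in for the unique object.

```python
# A processor trusts only annotations that explicitly carry its own
# marker, so an accidental method-name collision cannot be mistaken
# for an assertion of compatibility.
SAMS_MARKER = "_sams_doc_generator_marker"  # illustrative unique name

class SamsAnnotation:
    def __init__(self, doc):
        self.doc = doc
    def printDocument(self):
        return self.doc

setattr(SamsAnnotation, SAMS_MARKER, True)  # explicit opt-in

class Lookalike:
    def printDocument(self):  # same method name purely by accident
        return "not really Sam's"

def sams_generator(annotation):
    if getattr(annotation, SAMS_MARKER, False):
        return annotation.printDocument()
    return None  # no marker: ignore, never guess
```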

Can we agree that the PEP should describe strategies that people should use
to make their annotation recognition strategies unambiguous and
failure-proof?

I think that merely documenting appropriately defensive techniques might be
enough to make Talin happy. Note that it isn't the processing code that
needs to be defensive (in the sense of try/catch blocks). It is the whole
recognition strategy that the processing code uses. Whatever recognition
strategy it uses must be unambiguous. It seems like it would hurt nobody to
document this and suggest some unambiguous techniques.

 Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060813/8fd0e73f/attachment-0001.htm 

From jcarlson at uci.edu  Sun Aug 13 09:59:06 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Sun, 13 Aug 2006 00:59:06 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <5.1.1.6.0.20060813010634.0228ee30@sparrow.telecommunity.com>
References: <20060812205512.197A.JCARLSON@uci.edu>
	<5.1.1.6.0.20060813010634.0228ee30@sparrow.telecommunity.com>
Message-ID: <20060812233132.197F.JCARLSON@uci.edu>


"Phillip J. Eby" <pje at telecommunity.com> wrote:
> At 09:16 PM 8/12/2006 -0700, Josiah Carlson wrote:
> >"Phillip J. Eby" <pje at telecommunity.com> wrote:
> > > However, if you have:
> > >
> > >     def myfunc( x : doc("The x coordinate"), y : doc("The y coordinate") )
> > >
> > > There is no ambiguity.  Likewise:
> > >
> > >     def cat( infile:opt("i") = sys.stdin, outfile:opt("o") = sys.stdout ):
> > >
> > > is unambiguous.  And the interpretation of:
> > >
> > >     def cat(infile: [doc("input stream"), opt("i")] = sys.stdin,
> > >             outfile: [doc("output stream"), opt("o")] = sys.stdout
> > >     ):
> > >
> > > is likewise unambiguous, unless the creator of the documentation or option
> > > features has defined some other interpretation for a list than 
> > "recursively
> > > apply to contained items".  In which case, you need only do something like:
> > >
> > >     def cat(infile: docopt("input stream", "i") = sys.stdin,
> > >             outfile: docopt("output stream", "o") = sys.stdout
> > >     ):
> >
> >I now understand where you were coming from with regards to this being
> >equivalent to pickle (at least pickle + copy_reg).  I think that if you
> >would have posted this particular sample a couple days ago, there
> >wouldn't have been the discussion (argument?) about incompatible
> >mechanisms for annotation processing.
> 
> Well, it just seemed to me that that was the One Obvious Way To Do It; more 
> specifically, I couldn't conceive of any *other* way to do it!

Perhaps, but it was also obvious that very few people knew what the heck
you were talking about (hence the "how" and "what do you mean" queries).

Try to remember that while you may be old-hat at annotations, perhaps
not everyone discussing them at the moment has your particular
experience and assumptions.  Also, when you hand-wave with "it's trivial",
it's more than a little frustrating, because while it may be "trivial"
to you, it's certainly not trivial to the asker (why would they be
asking otherwise?)


> >With that said, the above is a protocol.  Just like __len__, __str__,
> >copy_reg, __reduce__, __setstate__, etc., are protocols.  It may not be
> >fully specified (when annotations are to be processed, if at all, by
> >whom, where the annotation registry is, etc.), but it is still a
> >protocol.
> 
> Actually, it's a family of *patterns* for creating protocols.  It's not a 
> protocol, incompletely specified or otherwise.  Note that the actual 
> implementation of the tell-don't-ask pattern can be via:

Here's my take: Protocol in this context is a set of rules for the
definition of the annotations and their interaction with the handler for
the annotations.  For what we seem to have agreed upon, the definition
is via a base class or instance, and the annotation handling is left up
to the user to define (via the four methods you offered, or even others).

If you want to call it a 'pattern', 'protocol', 'meta-protocol', or
whatever, they are all effectively the same thing in this context; a way
of writing annotations that can later be seen as having a (hopefully
unambiguous) meaning.


> But I don't believe we know enough *today* to be able to safely define a 
> rigid specification without ruling out possibly-valid uses.  By making a 
> less-rigid specification, we force annotation consumers to code 
> defensively...  which is really the right thing to do in a heterogeneous 
> environment anyway.

Right.  I'm in no way suggesting that a 'rigid' specification be
developed, and I'm generally on the fence about whether *any*
specification should be done.  But really, the more I think about it,
the more I believe that *something* should be offered as a starting
point. Whether it is in the Python cookbook, a 3rd party module or
package, etc. As long as it includes a link from the standard Python
documentation where annotations are discussed, I think that would be
satisfactory.


> >but I also cannot convince myself that specifying anything
> >further would be flexible enough to not be a mistake.
> 
> Right - that's the bit I'm concerned about.  Python also usually doesn't 
> impose such policy constraints on mechanism.  For example, function 
> attributes can be or contain anything, and nobody has argued that there 
> need to be prespecified combination semantics, despite the fact that 
> multiple tools can be consumers of the attributes.

Ahh, but function decorators *do* have a specified combination semantic;
specifically an order of application and chaining (the return from the
first decorator will be passed to the second decorator, etc.).

If we were to specify anything, I would suggest we define an order of
annotation calling, which would also define a chaining order if
applicable.  Maybe it is completely obvious, but one should never
underestimate what kinds of silly things users will do.
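[Editorial sketch, not part of the original message:] The decorator ordering referred to above can be demonstrated directly.

```python
# Decorators apply bottom-up: the one nearest the function runs
# first, and each receives the previous decorator's return value.
applied = []

def first(fn):
    applied.append("first")
    return fn

def second(fn):
    applied.append("second")
    return fn

@first
@second
def f():
    return "done"
```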


You responded to Jim Jewett
> [Meanwhile, I'm not going to respond to the rest of your message, since it 
> contained some things that appeared to me to be a mixture of ad hominem 
> attack and straw man argument.  I hope that was not actually your intent.]

As a point of reference, even after you linked the documentation about
PEAK, I still had *no idea* what the heck you meant about PEAK
annotations or their implications to function argument annotations. I
like to believe that I'm not stupid, but maybe I'm wrong, or maybe the
documentation could be better (this isn't an insult, I'm quite
experienced at writing poor documentation)?

 - Josiah


From paul at prescod.net  Sun Aug 13 10:06:01 2006
From: paul at prescod.net (Paul Prescod)
Date: Sun, 13 Aug 2006 01:06:01 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <5.1.1.6.0.20060813013329.0226d240@sparrow.telecommunity.com>
References: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
	<20060812205512.197A.JCARLSON@uci.edu>
	<5.1.1.6.0.20060813013329.0226d240@sparrow.telecommunity.com>
Message-ID: <1cb725390608130106y3cf29002q6c63dd6ac1ce04d4@mail.gmail.com>

Sorry to write so many emails, but I want to get in one last point tonight
(I'm sure I'll regret posting late at night).

Jim's email seems not to have gotten through to the whole list. There's a
lot of that going around.

On 8/12/06, Phillip J. Eby <pje at telecommunity.com> wrote:
>
> >Sure there is.  There will probably be several frameworks using the
> >magic name "doc".
> >
> >This isn't a problem for the person writing myfunc, and therefore
> >isn't a problem for immediate decorators.  It is a problem for
> >inspection code that wants to present information about arbitrary
> >3rd-party libraries.
>
> By this argument, we shouldn't have metaclasses or function attributes,
> because they have the same "problem".


I don't think Jim's issue is a real one (according to the snippet I see in
your email) because doc is an object defined in one and only one place in
Python. It has a unique id(). If two people use the name "doc" then they
will be addressable as module1.doc() and module2.doc(). No problem.
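A small sketch of the point above, with hypothetical stand-ins for the two modules' doc types: recognition keyed on the actual object or type, rather than the bare name "doc", cannot collide:

```python
class Module1Doc:                     # stands in for module1.doc
    def __init__(self, text):
        self.text = text

class Module2Doc:                     # stands in for module2.doc
    def __init__(self, text):
        self.text = text

def module1_processor(ann):
    # Type identity, not the spelling of the name "doc", drives
    # recognition, so module2's doc is simply ignored here.
    if isinstance(ann, Module1Doc):
        return ann.text
    return None

assert module1_processor(Module1Doc("input stream")) == "input stream"
assert module1_processor(Module2Doc("input stream")) is None
```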

However, it's only a problem if you insist on writing brain-damaged
> code.  If you want interoperability here, you must write tell-don't-ask
> code.  This is true for *any* use case where frameworks might share
> objects; there is absolutely *nothing* special about annotations in this
> regard!


There is something different about annotations than everything else in
Python so far. Annotations are the first feature other than docstrings
(which are proto-annotations) in core Python where third party tools are
supposed to go trolling through your objects FINDING STUFF that they may
decide is interesting or not to them. When you attach a metaclass or a
decorator, you INVOKE CODE that you have installed on your hard drive and if
it crashes then you load up your debugger and see what happened.

When you attach an annotation, you are just adding information that code
OUTSIDE OF YOUR CONTROL will poke around and interpret (the metadata
processor, like a type checker or documentation generator). What you do when
you attach an annotation is make an assertion. You always want to be
confident that you and the person writing the processor code have the same
understanding of the assertion you are making. You do not want to attach a
list because you are asserting that the list is a container for a bunch of
other assertions about the contents of the list whereas the person writing
the processing code thinks that you are asserting that the variable will be
of TYPE list.

Now I'm sure that with all of your framework programming you've run into
this many times and have many techniques for making these assertions
unambiguous. All we need to do is document them so that people who are not
as knowledgable will not get themselves into trouble. It isn't sufficient to
say: "Only smart people will use this stuff so we need not worry" which is
what the original PEP said. Even if it is true, I don't understand why we
would bother taking the risk when the alternative is so low-cost. Define the
behaviour for interpreting a few built-in types and define guidelines and
best practices for other types.

After you run into the issue a few times, you look for a solution, and end
> up with either duck typing, interfaces/adaptation, overloaded functions,
> or
> ad hoc registries.  ALL of these solutions are *more* than adequate to
> handle a simple thing like argument annotations.  That's why I keep
> describing this as a trivial thing: even *pickling* is more complicated
> than this is.  This is no more complex than len() or iter() or filter()!


Pickling works because of the underscores and magic like "
__safe_for_unpickling__". len() works because of __len__, etc. There are
reasons there are underscores there. You understand them, I understand them,
Talin understands them. That doesn't mean that they are self-evident. A
lesser inventor might have used a method just called "safe_for_pickling" and
some unlucky programmer at Bick's might have accidentally triggered
unexpected aspects of the protocol while documenting the properties of
cucumbers.

These are not universally understood techniques. Let's just document them in
the PEP.

However, it appears that mine is a minority opinion.  Unfortunately, I'm at
> a bit of a communication disadvantage, because if somebody wants to
> believe
> something is complicated, there is nothing that anybody can do to change
> their mind.  If you don't consider the possibility that it is way simpler
> than you think, you will never be able to see it.


If it wasn't at least a bit complicated then there would be no underscores.
The underscores are there to prevent SOMETHING bad from happening, right?

 Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060813/f99f4ac2/attachment.htm 

From paul at prescod.net  Sun Aug 13 10:17:26 2006
From: paul at prescod.net (Paul Prescod)
Date: Sun, 13 Aug 2006 01:17:26 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <43aa6ff70608111649g54e82dd6kef19862f0c281254@mail.gmail.com>
References: <mailman.34014.1155280218.27774.python-3000@python.org>
	<5.1.1.6.0.20060811112118.023af398@sparrow.telecommunity.com>
	<20060811084623.1931.JCARLSON@uci.edu> <44DD073C.7030305@acm.org>
	<43aa6ff70608111649g54e82dd6kef19862f0c281254@mail.gmail.com>
Message-ID: <1cb725390608130117p7f393441ld43f4f901728b316@mail.gmail.com>

On 8/11/06, Collin Winter <collinw at gmail.com> wrote:
...

What Josiah is hinting at -- and what Talin describes more explicitly
> -- is the problem of how exactly "chaining" annotation interpreters
> will work.


I don't think the question is really how to chain them. The question is how
to avoid them stepping on top of each other accidentally.

The case I've thought out the most completely is that of using
> decorators to analyse/utilise the annotations:


This is not as interesting a case as the following:

annotation scheme 1 is invented by person 1
annotation scheme 2 is invented by person 2
person 3 must use them together on a single function
persons 4 through 1000 write programs that hunt for annotation scheme 1
objects on functions in modules.
persons 2000 through 4000 write programs that hunt for annotation scheme 2
objects.

How can persons 4 through 4000 be confident when they see an annotation on
an object that they are interpreting it as person 3 intended? How can they
be confident that they are not accidentally processing an object (a list, a
string, a file, a customer object, whatever) that was intended to be an
assertion in annotation scheme 1 according to the rules of annotation scheme
2?

 Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060813/7929f61c/attachment.html 

From talin at acm.org  Sun Aug 13 10:18:18 2006
From: talin at acm.org (Talin)
Date: Sun, 13 Aug 2006 01:18:18 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <1cb725390608130042h50c7d7f9oc4068f30f2b04bbb@mail.gmail.com>
References: <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com>	
	<44DD5DF0.40405@acm.org>	
	<5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com>	
	<5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
	<1cb725390608130042h50c7d7f9oc4068f30f2b04bbb@mail.gmail.com>
Message-ID: <44DEE04A.4090708@acm.org>

Paul Prescod wrote:
>> And the interpretation of:
>>
>>     def cat(infile: [doc("input stream"), opt("i")] = sys.stdin,
>>             outfile: [doc("output stream"), opt("o")] = sys.stdout
>>     ):
>>
>> is likewise unambiguous, unless the creator of the documentation or 
>> option
>> features has defined some other interpretation for a list than
>> "recursively
>> apply to contained items".
> 
> 
> If the meaning is "unambiguous unless..." then it is ambiguous. So as per my
> previous proposal I think that you and I agree that we should disallow the
> stupid interpretation by encoding the obvious one in the PEP.
> 
> In which case, you need only do something like:
>>
>>     def cat(infile: docopt("input stream", "i") = sys.stdin,
>>             outfile: docopt("output stream", "o") = sys.stdout
>>     ):
>>
>> with an appropriate definition of methods for the 'docopt' type.
> 
> 
> Given that there are an infinite number of tools in the universe that could
> be processing "doc" and "opt" annotations, how would the user KNOW that
> there is one out there with a stupid interpretation of lists? They might
> annotate thousands of classes before finding out that some hot tool that
> they were planning to use next year is incompatible. So let's please define
> a STANDARD way of attaching multiple annotations to a parameter. Lists seem
> like a no-brainer choice for that.
> 
> Since many people seem to be unfamiliar with overloaded functions, I would
>> just like to take this opportunity to remind you that the actual overload
>> mechanism is irrelevant.  If you gave 'doc' objects a 'printDocString()'
>> method and 'opt' objects a 'setOptionName()' method, the exact same logic
>> regarding extensibility applies.  The 'docopt' type would simply 
>> implement
>> both methods.
>>
>> This is normal, simple standard Python stuff; nothing at all fancy.
> 
> 
> The context is a little bit different than standard duck typing.
> 
> Let's say I define a function like this:
> 
> def car(b):
> "b is a list-like object"
> return b[0]
> 
> Then someone comes along and does something I never expected. They invent a
> type representing a list of bits in a bitfield. They pass it to my function
> and everything works trivially. But there's something important that
> happened. The programmer ASSERTED, by passing the bit list to the function,
> that it is a list-like object. My code wouldn't have tried to treat it
> as a list if the user hadn't passed it as one explicitly.
> 
> Now look at it from the point of view of function annotations. As we said
> before, the annotations are inert. They are just attached. There is some
> code like a type checker or documentation generator that comes along after
> the fact and scoops them up to do something with them. The user did not
> assert (at the language level!) that any particular annotation applies to
> any particular annotation processor. The annotation processor is just
> looking for stuff that it recognizes. But what if it thinks it recognizes
> something but does not?
> 
> Consider this potential case:
> 
> BobsDocumentationGenerator.py:
> 
> class BobsDocumentationGeneratorAnnotation:
>    def __init__...
>    def printDocument(self):
>        print self.doc
>    def sideEffect(self):
>        deleteHardDrive()
> 
> def BobsDocumentationGenerator(annotation):
>   if hasattr(annotation, "printDocument"):
>       annotation.printDocument()
> 
> SamsDocumentationGenerator.py:
> 
> class SamsDocumentationGeneratorAnnotation:
>    def __init__...
>    def printDocument(self):
>        return self.doc
>    def sideEffect(self):
>        email(self.doc, "python-dev at pytho...")
> 
> def SamsDocumentationGenerator(annotation):
>   if hasattr(annotation, "printDocument"):
>       print annotation.printDocument()
>       annotation.sideEffect()
> 
> These objects, _by accident_ have the same method signature but different
> side effects and return values. Nobody anywhere in the system made an
> incorrect assertion. They just happened to be unlucky in the naming of 
> their
> methods. (unbelievably unlucky but you get the drift)
> 
> One simple way to make it unambiguous would be to do a test more like:
> 
>   if hasattr(annotation, SamsDocumentationGenerator.uniqueObject): ...
> 
> The association of the unique object with an annotator object would be an
> explicit assertion of compatibility.
> 
> Can we agree that the PEP should describe strategies that people should use
> to make their annotation recognition strategies unambiguous and
> failure-proof?
> 
> I think that merely documenting appropriately defensive techniques might be
> enough to make Talin happy. Note that it isn't the processing code that
> needs to be defensive (in the sense of try/catch blocks). It is the whole
> recognition strategy that the processing code uses. Whatever recognition
> strategy it uses must be unambiguous. It seems like it would hurt nobody to
> document this and suggest some unambiguous techniques.
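A minimal sketch of the unique-marker recognition strategy quoted above (all names hypothetical): the marker's identity, not a method name, is the explicit assertion of compatibility:

```python
# Each processor owns a unique marker object; annotations opt in by
# carrying it as an attribute, which is an explicit compatibility claim.
MARKER = object()   # stand-in for SamsDocumentationGenerator.uniqueObject

class SamsAnnotation:
    sams_marker = MARKER
    def __init__(self, doc):
        self.doc = doc
    def printDocument(self):
        return self.doc

def sams_generator(ann):
    # Only annotations explicitly carrying Sam's marker are processed.
    if getattr(ann, "sams_marker", None) is MARKER:
        return ann.printDocument()
    return None

class LookalikeAnnotation:          # same method name, but no marker
    def printDocument(self):
        return "not for Sam"

assert sams_generator(SamsAnnotation("hello")) == "hello"
assert sams_generator(LookalikeAnnotation()) is None
```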

This says pretty much what I was trying to say, only better :)

I think I am going to chill out on this topic for a bit - it seems that 
there are folks who have a better understanding of the issue than I do, 
and mainly the only reason I was commenting on the PEP was because that 
was what was asked for. I don't really have a big stake in the whole 
annotation effort, there are other issues that I am really more 
interested in.

-- Talin


From paul at prescod.net  Sun Aug 13 10:24:00 2006
From: paul at prescod.net (Paul Prescod)
Date: Sun, 13 Aug 2006 01:24:00 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <20060812233132.197F.JCARLSON@uci.edu>
References: <20060812205512.197A.JCARLSON@uci.edu>
	<5.1.1.6.0.20060813010634.0228ee30@sparrow.telecommunity.com>
	<20060812233132.197F.JCARLSON@uci.edu>
Message-ID: <1cb725390608130124m2e3a3254v40058e23c2b6b737@mail.gmail.com>

On 8/13/06, Josiah Carlson <jcarlson at uci.edu> wrote:
>
> ...
> If we were to specify anything, I would suggest we define an order of
> annotation calling, which would also define a chaining order if
> applicable.  Maybe it is completely obvious, but one should never
> underestimate what kinds of silly things users will do.
>

Annotations are not called. They are not like decorators. Decorators
typically "wrap" a function. Annotations are just attached to it. A
decorator must be a callable. An annotation could be just the number "5".
Decorators build on each other, perhaps changing the function's behaviour.
Annotations (should!) just accumulate and typically do not change the
parameter's behaviour. The PEP does not say how you would define annotations
that just accumulate but it seems common sense to me that it would be
through a list syntax. I think that the PEP should just say that.
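A sketch of that reading, runnable under the annotation syntax Python 3 eventually adopted (the doc/opt constructors are hypothetical): the list merely accumulates inert values, and nothing is called by the language:

```python
def doc(text):            # hypothetical annotation constructor;
    return ("doc", text)  # the language never calls these itself
def opt(flag):
    return ("opt", flag)

# Both annotations are attached, inert, as items of a plain list.
def cat(infile: [doc("input stream"), opt("i")] = None):
    pass

# Only a later processor inspects them, via the function object.
assert cat.__annotations__["infile"] == [("doc", "input stream"), ("opt", "i")]
```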

 Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060813/b232372c/attachment.htm 

From jcarlson at uci.edu  Sun Aug 13 10:53:23 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Sun, 13 Aug 2006 01:53:23 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <1cb725390608130124m2e3a3254v40058e23c2b6b737@mail.gmail.com>
References: <20060812233132.197F.JCARLSON@uci.edu>
	<1cb725390608130124m2e3a3254v40058e23c2b6b737@mail.gmail.com>
Message-ID: <20060813013709.1982.JCARLSON@uci.edu>


"Paul Prescod" <paul at prescod.net> wrote:
> On 8/13/06, Josiah Carlson <jcarlson at uci.edu> wrote:
> >
> > ...
> > If we were to specify anything, I would suggest we define an order of
> > annotation calling, which would also define a chaining order if
> > applicable.  Maybe it is completely obvious, but one should never
> > underestimate what kinds of silly things users will do.
> 
> Annotations are not called. They are not like decorators.

Right.  What I meant (which perhaps wasn't what I said) was that we
should define the order in which functions that operate on these
annotations execute, regardless of the mechanism.  Say, for example, I
have the following function definition:

    def foo(arg1:[bar(1), baz(2)]):
        ...

However the (unspecified user defined machinery that handles the)
annotation processing gets to foo(), if it knows about how to handle the
'bar' and 'baz' annotations, a properly written annotation processor
will handle the 'bar' annotation before the 'baz' annotation.
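A sketch (hypothetical bar/baz classes, Python 3 annotation syntax) of a processor respecting that left-to-right order:

```python
class bar:
    def __init__(self, n):
        self.n = n

class baz:
    def __init__(self, n):
        self.n = n

def foo(arg1: [bar(1), baz(2)]):
    pass

def process(func):
    # Walk each parameter's annotation list left to right, so the
    # 'bar' annotation is always handled before the 'baz' annotation.
    handled = []
    for name, anns in func.__annotations__.items():
        for ann in anns:
            handled.append(type(ann).__name__)
    return handled

assert process(foo) == ["bar", "baz"]
```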


 - Josiah


From talin at acm.org  Sun Aug 13 13:07:44 2006
From: talin at acm.org (Talin)
Date: Sun, 13 Aug 2006 04:07:44 -0700
Subject: [Python-3000] Python/C++ question
In-Reply-To: <ca471dc20608091618k3cffb3ewa3029794f0f02761@mail.gmail.com>
References: <44DA6C01.2040904@acm.org>
	<ca471dc20608091618k3cffb3ewa3029794f0f02761@mail.gmail.com>
Message-ID: <44DF0800.4060204@acm.org>

Guido van Rossum wrote:
> On 8/9/06, Talin <talin at acm.org> wrote:
> For the majority of Python developers it's probably the other way
> around. It's been 15 years since I wrote C++, and unlike C, that
> language has changed a lot since then...
> 
> It would be a complete rewrite; I prefer doing a gradual
> transmogrification of the current codebase into Py3k rather than
> starting from scratch (read Joel Spolsky on why).

BTW, Should this be added to PEP 3099?

(Although I do think that a gradual transition is certainly possible, I 
am not going to push for it.)

-- Talin

From talin at acm.org  Sun Aug 13 13:30:00 2006
From: talin at acm.org (Talin)
Date: Sun, 13 Aug 2006 04:30:00 -0700
Subject: [Python-3000] Bound and unbound methods
Message-ID: <44DF0D38.6070507@acm.org>

One of the items in PEP 3100 is getting rid of unbound methods. I want 
to explore a heretical notion, which is getting rid of bound methods as 
well.

Now, to be honest, I rather like bound methods. I like being able to 
capture a method call, store it in a variable, and call it later.

However, I also realize that requiring every access to a class variable 
to instantiate a new method object is expensive, to say the least.

Calling a callable would not require a bound method - the 'self' 
parameter would be just another argument. User-defined functions would 
then be no different from native built-in functions or other callables.

You would still need some way to explicitly bind a method if you wanted 
to store it in a variable, perhaps using something like the various 
wrappers in module 'functional'. It would be extra typing, but for me at 
least it's not something I do very often, and it would at least have the 
virtue that the intent of the code would be more visually obvious. 
(Also, I tend to find, in my code at least, that I more often use 
closures to accomplish the same thing, which are both clearer to read 
and more powerful.)
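The explicit binding Talin describes could be spelled with a partial-application wrapper; a sketch using functools.partial (the module went by the name 'functional' in earlier drafts):

```python
import functools

class Greeter(object):
    def greet(self, name):
        return "hi " + name

g = Greeter()

# Explicitly bind 'self' up front, instead of relying on an
# implicitly created bound-method object.
bound = functools.partial(Greeter.greet, g)
assert bound("world") == "hi world"
```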

Now, one remaining problem to be solved is whether or not to pass 'self' 
as an argument to the resulting callable. I suppose that could be 
handled by inspecting the attributes of the callable and adding the 
extra 'self' argument at the last minute if it's not a static method. I 
suspect such tests would be relatively fast, much less than the time 
needed to instantiate and initialize a new method object.

Anyway, I just wanted to throw that out there. Feel free to -1 away... :)

-- Talin

From g.brandl at gmx.net  Sun Aug 13 14:24:59 2006
From: g.brandl at gmx.net (Georg Brandl)
Date: Sun, 13 Aug 2006 14:24:59 +0200
Subject: [Python-3000] Python/C++ question
In-Reply-To: <44DF0800.4060204@acm.org>
References: <44DA6C01.2040904@acm.org>	<ca471dc20608091618k3cffb3ewa3029794f0f02761@mail.gmail.com>
	<44DF0800.4060204@acm.org>
Message-ID: <ebn5ms$mne$1@sea.gmane.org>

Talin wrote:
> Guido van Rossum wrote:
>> On 8/9/06, Talin <talin at acm.org> wrote:
>> For the majority of Python developers it's probably the other way
>> around. It's been 15 years since I wrote C++, and unlike C, that
>> language has changed a lot since then...
>> 
>> It would be a complete rewrite; I prefer doing a gradual
>> transmogrification of the current codebase into Py3k rather than
>> starting from scratch (read Joel Spolsky on why).
> 
> BTW, Should this be added to PEP 3099?

Yes, why not.

Georg


From pje at telecommunity.com  Sun Aug 13 19:28:42 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sun, 13 Aug 2006 13:28:42 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <1cb725390608130106y3cf29002q6c63dd6ac1ce04d4@mail.gmail.com>
References: <5.1.1.6.0.20060813013329.0226d240@sparrow.telecommunity.com>
	<5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
	<20060812205512.197A.JCARLSON@uci.edu>
	<5.1.1.6.0.20060813013329.0226d240@sparrow.telecommunity.com>
Message-ID: <5.1.1.6.0.20060813125944.056a3f40@sparrow.telecommunity.com>

At 01:06 AM 8/13/2006 -0700, Paul Prescod wrote:
>There is something different about annotations than everything else in 
>Python so far. Annotations are the first feature other than docstrings 
>(which are proto-annotations) in core Python where third party tools are 
>supposed to go trolling through your objects FINDING STUFF that they may 
>decide is interesting or not to them.

You make it sound like we've never had documentation tools before, or web 
servers.  Zope has been trolling through Python objects "finding stuff" 
since *1996*.  It's not at all a coincidence that the first 
interface/adaptation systems for Python (AFAIK) were built for Zope.

So some people in the Python community have had an entire *decade* of 
experience with this kind of thing.  It's just a guess, but some of them 
might actually know a thing or two about the subject by now.  ;-)


>Now I'm sure that with all of your framework programming you've run into 
>this many times and have many techniques for making these assertions 
>unambiguous. All we need to do is document them so that people who are not 
>as knowledgable will not get themselves into trouble.

Sure.  Here are two nice articles that people can read to understand the 
basic ideas of "tell, don't ask".  One by the "Pragmatic Programmers":

     http://www.pragmaticprogrammer.com/articles/jan_03_enbug.pdf


And another by Allen Holub on the evils of getters and setters, that 
touches on the same principles:

     http://www.javaworld.com/javaworld/jw-09-2003/jw-0905-toolbox.html



>It isn't sufficient to say: "Only smart people will use this stuff so we 
>need not worry" which is what the original PEP said. Even if it is true, I 
>don't understand why we would bother taking the risk when the alternative 
>is so low-cost.

There are so many other pitfalls to writing extensible and interoperable 
code in Python, why focus so much effort on such an incredibly minor 
one?  The truth is that hardly anybody cares about writing extensible or 
interoperable code except framework developers -- and they've already *got* 
solutions.  Twisted or Zope developers would see this as a trivial use case 
for adaptation, and PEAK developers would use either adaptation or generic 
functions, and keep on moving with nary a speedbump.

Nonetheless, I don't object to documenting best practices; I just don't 
want to mandate a *particular* solution -- with one exception.

If Py3K is going to include overloaded functions, then that should be 
considered the One Obvious Way to work with annotations, since it's an 
"included battery" (and none of the existing 
interface/adaptation/overloading toolkits are likely to work as-is in Py3K 
without some porting effort).  But if Py3K doesn't include overloading or 
adaptation, then the One Obvious Way will be "whatever a knowledgeable 
framework programmer wants to do."


>Pickling works because of the underscores and magic like " 
>__safe_for_unpickling__". len() works because of __len__, etc. There are 
>reasons there are underscores there. You understand them, I understand 
>them, Talin understands them. That doesn't mean that they are 
>self-evident. A lesser inventor might have used a method just called 
>"safe_for_pickling" and some unlucky programmer at Bick's might have 
>accidentally triggered unexpected aspects of the protocol while 
>documenting the properties of cucumbers.

Note that you're pointing out a problem that already exists today in 
Python, and has for some time.  It's why the Zope folks use interfaces and 
adaptation, and why I use overloaded functions.  The problem has nothing to 
do with annotations as such, so if you want to solve that problem, you 
should be pushing for overloaded functions in the stdlib, and using 
annotations as an example of why they're good to have.


>Can we agree that the PEP should describe strategies that people should 
>use to make their annotation recognition strategies unambiguous and 
>failure-proof?

Absolutely - and I recommended that we recommend "tell, don't ask" 
processing using one of the following techniques:

1. duck typing
2. adaptation
3. overloaded functions
4. type registries

You seem to be arguing that duck typing is inadequate because it is 
name-based and names can conflict.  I agree, which is why I believe #2-4 
are better: they don't rely on mere name matching.  However, duck typing is 
still *adequate* as long as names are sufficiently descriptive or at least 
lengthy enough to prevent collision.  Including a package-specific 
namespace prefix like "foo_printDocumentation" is sufficient best practice 
to avoid duck typing name collisions in virtually all cases.
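A sketch of that prefixed duck-typing convention (hypothetical names, borrowing the foo_printDocumentation example above):

```python
class DocAnnotation:
    # The package-specific "foo_" prefix makes an accidental name
    # collision with an unrelated object's method very unlikely.
    def foo_printDocumentation(self):
        return "input stream"

def process(ann):
    # "Tell, don't ask": dispatch on the prefixed method and
    # silently skip anything that doesn't provide it.
    method = getattr(ann, "foo_printDocumentation", None)
    if method is not None:
        return method()
    return None

assert process(DocAnnotation()) == "input stream"
assert process([1, 2, 3]) is None  # unrecognized annotation is skipped
```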

I'm just baffled why all this focus on the issue on such a minor thing, 
when Python has far more pitfalls to interoperability than this.  But I 
guess if you see this as the first time that objects might be implicitly 
used by something, I suppose it makes sense.  But it's really not the first 
time, and these are well-understood problems among developers of major 
Python frameworks, especially Zope.


>I think that merely documenting appropriately defensive techniques might 
>be enough to make Talin happy. Note that it isn't the processing code that 
>needs to be defensive (in the sense of try/catch blocks). It is the whole 
>recognition strategy that the processing code uses. Whatever recognition 
>strategy it uses must be unambiguous. It seems like it would hurt nobody 
>to document this and suggest some unambiguous techniques.

I already recommended that we do this, and have repeated my recommendation 
above for your convenience.


From steven.bethard at gmail.com  Sun Aug 13 19:29:20 2006
From: steven.bethard at gmail.com (Steven Bethard)
Date: Sun, 13 Aug 2006 11:29:20 -0600
Subject: [Python-3000] Bound and unbound methods
In-Reply-To: <44DF0D38.6070507@acm.org>
References: <44DF0D38.6070507@acm.org>
Message-ID: <d11dcfba0608131029t32a82113yf851e1c6cfce23c2@mail.gmail.com>

On 8/13/06, Talin <talin at acm.org> wrote:
> One of the items in PEP 3100 is getting rid of unbound methods. I want
> to explore a heretical notion, which is getting rid of bound methods as
> well.

I believe you're suggesting that the code that I just wrote moments
ago would stop working::

    get_features = self._get_document_features
    return [get_features(i, document_graph, comparable_graphs)
            for i, document_graph in enumerate(document_graphs)]

The line ``get_features = ...`` expects the function stored to be
bound to ``self``.  I write code like this *all the time*,
particularly when I have a long method name that needs to be used in a
complex expression and I want to keep my lines within the suggested 79
characters.

If I understand the proposal right and my code above would be
invalidated, I'm a strong -1 to this. It would break an enormous
amount of my code.

STeVe
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From jcarlson at uci.edu  Sun Aug 13 19:58:33 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Sun, 13 Aug 2006 10:58:33 -0700
Subject: [Python-3000] Bound and unbound methods
In-Reply-To: <44DF0D38.6070507@acm.org>
References: <44DF0D38.6070507@acm.org>
Message-ID: <20060813102036.1985.JCARLSON@uci.edu>


Talin <talin at acm.org> wrote:
> 
> One of the items in PEP 3100 is getting rid of unbound methods. I want 
> to explore a heretical notion, which is getting rid of bound methods as 
> well.
> 
> Now, to be honest, I rather like bound methods. I like being able to 
> capture a method call, store it in a variable, and call it later.
> 
> However, I also realize that requiring every access to a class variable 
> to instantiate a new method object is expensive, to say the least.

Well, it's up-front vs. at-access.  For instances whose methods are
generally used rarely, the up-front cost of instantiating every method
is high in comparison (unless there are a relatively large number of
method accesses), and technically infinite if applied to all objects.
Why?

I have a class foo, I instantiate foo, now all of foo's methods get
instantiated.  Ahh, but foo's methods are also instances of function. It
doesn't really have any new methods on foo's methods, but they do have
attributes that are instances, so we will need to instantiate all of the
methods' attributes' methods, and recursively, to infinity.  The
non-creation of instantiated methods for objects is a lazy-evaluation
technique to prevent infinite recursion, in general.
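The lazy creation Josiah describes is easy to observe directly (modern syntax; the same holds for new-style classes in Python 2):

```python
class Foo(object):
    def bar(self):
        pass

f = Foo()

# Each attribute access manufactures a fresh bound-method object...
assert f.bar is not f.bar

# ...while "instantiating" the method caches a single object on the
# instance, so repeated accesses return the very same object.
f.bar = f.bar
assert f.bar is f.bar
```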

On the other hand, it may make sense to offer a metaclass and/or
decorator that signals that a single method instance should be created
for particular methods up-front, rather than at-access to those methods.
But what kind of difference could we expect?  42%/28% improvement for
class methods/object methods in 2.4 respectively, and 45%/26%
improvement in 2.5 beta.  This does not include actually calling the
methods.


> Now, one remaining problem to be solved is whether or not to pass 'self' 
> as an argument to the resulting callable. I suppose that could be 
> handled by inspecting the attributes of the callable and adding the 
> extra 'self' argument at the last minute if its not a static method. I 
> suspect such tests would be relatively fast, much less than the time 
> needed to instantiate and initialize a new method object.

I think that a change requiring calls of the form
obj.instancemethod(obj, ...) is a non-starter.


I'm -1 for instantiating all methods (for the infinite cost reasons),
and -1 for int, long, list, tuple, dict, float (method access is
generally limited for these objects).  I'm +0 for offering a suitable
metaclass and/or decorator, but believe it would be better suited for
the Python cookbook, as the performance improvement, once function calls
are taken into consideration, is significantly smaller.

 - Josiah

[1]

Timings for accessing instance methods

Python 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>>
>>> def test(n):
...     _time = time
...
...     class foo:
...         def bar(self):
...             pass
...     xr = xrange(n)
...     x = foo()
...     t = time.time()
...     for i in xr:
...         x.bar
...     print 'class method', time.time()-t
...
...     x.bar = x.bar
...     t = time.time()
...     for i in xr:
...         x.bar
...     print 'instantiated class method', time.time()-t
...
...     class foo(object):
...         def bar(self):
...             pass
...
...     x = foo()
...     t = time.time()
...     for i in xr:
...         x.bar
...     print 'object method', time.time()-t
...
...     x.bar = x.bar
...     t = time.time()
...     for i in xr:
...         x.bar
...     print 'instantiated object method', time.time()-t
...
...     class foo(object):
...         __slots__ = 'bar'
...         def __init__(self):
...             self.bar = self._bar
...         def _bar(self):
...             pass
...
...     x = foo()
...     t = time.time()
...     for i in xr:
...         x.bar
...     print 'instantiated object __slot__ method', time.time()-t
...
>>> test(5000000)
class method 1.96799993515
instantiated class method 1.14100003242
object method 1.71900010109
instantiated object method 1.23399996758
instantiated object __slot__ method 1.26600003242
>>>

Python 2.5b2 (r25b2:50512, Jul 11 2006, 10:16:14) [MSC v.1310 32 bit (Intel)] on
 win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>>
>>> def test(n):
...     _time = time
...
...     class foo:
...         def bar(self):
...             pass
...     xr = xrange(n)
...     x = foo()
...     t = time.time()
...     for i in xr:
...         x.bar
...     print 'class method', time.time()-t
...
...     x.bar = x.bar
...     t = time.time()
...     for i in xr:
...         x.bar
...     print 'instantiated class method', time.time()-t
...
...     class foo(object):
...         def bar(self):
...             pass
...
...     x = foo()
...     t = time.time()
...     for i in xr:
...         x.bar
...     print 'object method', time.time()-t
...
...     x.bar = x.bar
...     t = time.time()
...     for i in xr:
...         x.bar
...     print 'instantiated object method', time.time()-t
...
...     class foo(object):
...         __slots__ = 'bar'
...         def __init__(self):
...             self.bar = self._bar
...         def _bar(self):
...             pass
...
...     x = foo()
...     t = time.time()
...     for i in xr:
...         x.bar
...     print 'instantiated object __slot__ method', time.time()-t
...
>>> test(5000000)
class method 1.98500013351
instantiated class method 1.09299993515
object method 1.67199993134
instantiated object method 1.23500013351
instantiated object __slot__ method 1.23399996758
>>>


From paul at prescod.net  Sun Aug 13 19:57:20 2006
From: paul at prescod.net (Paul Prescod)
Date: Sun, 13 Aug 2006 10:57:20 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <5.1.1.6.0.20060813125944.056a3f40@sparrow.telecommunity.com>
References: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
	<20060812205512.197A.JCARLSON@uci.edu>
	<5.1.1.6.0.20060813013329.0226d240@sparrow.telecommunity.com>
	<5.1.1.6.0.20060813125944.056a3f40@sparrow.telecommunity.com>
Message-ID: <1cb725390608131057y122b0c0wf81611e136659793@mail.gmail.com>

If we get past the meta-discussion, I don't really see any disagreement
left. I'll grit my teeth and avoid commenting on the meta-discussion. ;)

My proposed text for the PEP is as follows:

"In order for processors of function annotations to work interoperably, they
must use a common interpretation of objects used as annotations on a
particular function. For example, one might interpret string annotations as
docstrings. Another might interpret them as path segments for a web
framework. For this reason, function annotation processors SHOULD avoid
assigning processor-specific meanings to types defined outside of the
processor's framework. For example, a Django processor could process
annotations of a type defined in a Zope package, but Zope's creators should
be considered the authorities on the type's meaning for the same reasons
that they would be considered authorities on the semantics of classes or
methods in their packages."

"This implies that the interpretation of built-in types would be controlled
by Python's developers and documented in Python's documentation. This is
just a best practice. Nothing in the language can or should enforce this
practice and there may be a few domains where there is a strong argument for
violating it ( e.g. an education environment where saving keystrokes may be
more important than easing interoperability)."

"In Python 3000, semantics will be attached to the following types: objects
of type string (or subtype of string) are to be used for documentation
(though they are not necessarily the exclusive source of documentation about
the type). Objects of type list (or subtype of list) are to be used for
attaching multiple independent annotations."

"Developers who define new metadata frameworks SHOULD choose explicit and
unambiguous mechanisms for associating objects with their frameworks.
Furthermore, they SHOULD consider that some users may wish to extend their
frameworks and should support that. For example, they could use Python 3000
overloaded functions, some form of registry, some kind of interface or some
unambiguously recognizable method signature protocol (e.g.
_pytypelib_type_check())."
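A minimal sketch of the conventions proposed above, written with the parameter-annotation syntax Python 3 eventually adopted (the function and its annotation values are invented purely for illustration):

```python
# Hypothetical function illustrating the proposed conventions:
# a string annotation is documentation; a list attaches several
# independent annotations to the same parameter.
def parse(data: "raw bytes from the wire",
          strict: ["reject unknown fields", bool] = True):
    pass

anns = parse.__annotations__
assert anns["data"] == "raw bytes from the wire"
assert anns["strict"] == ["reject unknown fields", bool]
```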

 Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060813/9edf62a9/attachment.htm 

From talin at acm.org  Sun Aug 13 22:08:10 2006
From: talin at acm.org (Talin)
Date: Sun, 13 Aug 2006 13:08:10 -0700
Subject: [Python-3000] Bound and unbound methods
In-Reply-To: <20060813102036.1985.JCARLSON@uci.edu>
References: <44DF0D38.6070507@acm.org> <20060813102036.1985.JCARLSON@uci.edu>
Message-ID: <44DF86AA.7050207@acm.org>

Josiah Carlson wrote:
> Talin <talin at acm.org> wrote:
>> One of the items in PEP 3100 is getting rid of unbound methods. I want 
>> to explore a heretical notion, which is getting rid of bound methods as 
>> well.
>>
>> Now, to be honest, I rather like bound methods. I like being able to 
>> capture a method call, store it in a variable, and call it later.
>>
>> However, I also realize that requiring every access to a class variable 
>> to instantiate a new method object is expensive, to say the least.
> 
> Well, it's up-front vs. at-access.  For instances whose methods are
> generally used rarely, the up-front cost of instantiating every method
> is high in comparison (unless there are a relatively large number of
> method accesses), and technically infinite if applied to all objects.
> Why?
> 
> I have a class foo, I instantiate foo, now all of foo's methods get
> instantiated.  Ahh, but foo's methods are also instances of function. It
> doesn't really have any new methods on foo's methods, but they do have
> attributes that are instances, so we will need to instantiate all of the
> methods' attributes' methods, and recursively, to infinity.  The
> non-creation of instantiated methods for objects is a lazy-evaluation
> technique to prevent infinite recursion, in general.
> 
> On the other hand, it may make sense to offer a metaclass and/or
> decorator that signals that a single method instance should be created
> for particular methods up-front, rather than at-access to those methods.
> But what kind of difference could we expect?  42%/28% improvement for
> class methods/object methods in 2.4 respectively, and 45%/26%
improvement in 2.5 beta.  This does not include actually calling the
> methods.

No, I wasn't proposing that methods be bound up front...read on.

>> Now, one remaining problem to be solved is whether or not to pass 'self' 
>> as an argument to the resulting callable. I suppose that could be 
>> handled by inspecting the attributes of the callable and adding the 
>> extra 'self' argument at the last minute if its not a static method. I 
>> suspect such tests would be relatively fast, much less than the time 
>> needed to instantiate and initialize a new method object.
> 
> I think that a change that required calls of the form
> obj.instancemethod(obj, ...) would be a non-starter.

Yes, that's a non-starter, but that's not what I was proposing either.

I see that I left an important piece out of my proposal, which I'll need 
to explain.

Right now, when you say: 'obj.instancemethod()', there are in fact two 
distinct operations going on. The first is the lookup of the attribute 
'instancemethod', and the second is the invoking of the resulting callable.

In order to get rid of the creation of method objects, the compiler 
would have to recognize these two operations and combine them into a 
single "call method" opcode - one which looks up the attribute, but 
leaves the original object pointer on the stack, and then invokes the 
resulting callable, along with the object pointer.

So essentially the 'bind' operation is moved into the method invocation 
code - which eliminates the need to create a holding object to remember 
the binding information.

Hmmmm....I wonder if it could be made to work in a 
backwards-compatible way. In other words, suppose the existing logic of 
creating a method object were left in place, however the 
'obj.instancemethod()' pattern would bypass all of that. In other words, 
the compiler would note the combination of the attribute access and the 
call, and combine them into an opcode that skips the whole method 
creation step. (Maybe it already does this and I'm just being stupid.)
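The two distinct operations are easy to observe in Python itself; a fused call-method operation would skip the intermediate bound-method object entirely. A sketch of the idea, not of any actual implementation:

```python
class Foo:
    def bar(self):
        return 42

obj = Foo()

# Step 1: attribute lookup.  The function's __get__ creates a fresh
# bound-method object on every access.
m1 = obj.bar
m2 = obj.bar
assert m1 is not m2   # two distinct method objects...
assert m1 == m2       # ...wrapping the same function and instance

# Step 2: the call itself.
assert m1() == 42

# A fused "call method" operation amounts to looking the function up
# on the class and passing the instance explicitly, with no
# intermediate method object created:
assert Foo.__dict__['bar'](obj) == 42
```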

> I'm -1 for instantiating all methods (for the infinite cost reasons),
> and -1 for int, long, list, tuple, dict, float (method access is
> generally limited for these objects).  I'm +0 for offering a suitable
> metaclass and/or decorator, but believe it would be better suited for
> the Python cookbook, as the performance improvement is significantly
> smaller once the cost of actually calling the methods is taken into account.

Thanks for the timing information by the way.

From thomas at python.org  Sun Aug 13 23:22:32 2006
From: thomas at python.org (Thomas Wouters)
Date: Sun, 13 Aug 2006 23:22:32 +0200
Subject: [Python-3000] Bound and unbound methods
In-Reply-To: <44DF86AA.7050207@acm.org>
References: <44DF0D38.6070507@acm.org> <20060813102036.1985.JCARLSON@uci.edu>
	<44DF86AA.7050207@acm.org>
Message-ID: <9e804ac0608131422w3bd95d57gb1c195e16dc1f9bd@mail.gmail.com>

On 8/13/06, Talin <talin at acm.org> wrote:

> Hmmmm....I wonder if it could be made to work in a
> backwards-compatible way. In other words, suppose the existing logic of
> creating a method object were left in place, however the
> 'obj.instancemethod()' pattern would bypass all of that. In other words,
> the compiler would note the combination of the attribute access and the
> call, and combine them into an opcode that skips the whole method
> creation step. (Maybe it already does this and I'm just being stupid.)


Been there, done that, bought the T-shirt (well, it was just a PyCon-1
T-shirt):

http://sourceforge.net/tracker/index.php?func=detail&aid=709744&group_id=5470&atid=305470

Back then, the end result of that particular change was very tiny, and it
wasn't even taking new-style classes into account (which would have made it
more complex). It may be worth re-trying anyway, especially for python-3000:
no classic classes to worry about. And quite a lot has changed in the
compiler and opcode dispatcher in the mean time. I am completely -1 on
getting rid of bound methods, though.

-- 
Thomas Wouters <thomas at python.org>

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060813/5d70b8e9/attachment.html 

From ark-mlist at att.net  Mon Aug 14 00:47:31 2006
From: ark-mlist at att.net (Andrew Koenig)
Date: Sun, 13 Aug 2006 18:47:31 -0400
Subject: [Python-3000] Bound and unbound methods
In-Reply-To: <44DF0D38.6070507@acm.org>
Message-ID: <000901c6bf2a$76781270$6402a8c0@arkdesktop>

> However, I also realize that requiring every access to a class variable
> to instantiate a new method object is expensive, to say the least.

Why does every access to a class variable have to instantiate a new method
object?



From tomerfiliba at gmail.com  Mon Aug 14 01:03:06 2006
From: tomerfiliba at gmail.com (tomer filiba)
Date: Mon, 14 Aug 2006 01:03:06 +0200
Subject: [Python-3000]  Bound and unbound methods
Message-ID: <1d85506f0608131603u39be2727ie0b2f15db3dee69f@mail.gmail.com>

[Josiah]
> I'm -1 for instantiating all methods (for the infinite cost reasons),
> and -1 for int, long, list, tuple, dict, float (method access is
> generally limited for these objects).  I'm +0 for offering a suitable
> metaclass and/or decorator, but believe it would be better suited for
> the Python cookbook, as the performance improvement is significantly
> smaller once the cost of actually calling the methods is taken into account.

http://sebulba.wikispaces.com/receip+prebound

i'm sorry, i just love descriptors too much. it kept me out of bed,
until i wrote it down :)



-tomer

From ncoghlan at gmail.com  Mon Aug 14 03:40:43 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 14 Aug 2006 11:40:43 +1000
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <1cb725390608131057y122b0c0wf81611e136659793@mail.gmail.com>
References: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>	<20060812205512.197A.JCARLSON@uci.edu>	<5.1.1.6.0.20060813013329.0226d240@sparrow.telecommunity.com>	<5.1.1.6.0.20060813125944.056a3f40@sparrow.telecommunity.com>
	<1cb725390608131057y122b0c0wf81611e136659793@mail.gmail.com>
Message-ID: <44DFD49B.4030308@gmail.com>

Paul Prescod wrote:
> If we get past the meta-discussion, I don't really see any disagreement 
> left. I'll grit my teeth and avoid commenting on the meta-discussion. ;)

Ah, so I'm not the only one doing that then };>

> My proposed text for the PEP is as follows:

Generally +1, except for this bit:

> "In Python 3000, semantics will be attached to the following types: 
> objects of type string (or subtype of string) are to be used for 
> documentation (though they are not necessarily the exclusive source of 
> documentation about the type). Objects of type list (or subtype of list) 
> are to be used for attaching multiple independent annotations."

Interpretations of string & list subtypes should be up to whoever creates 
those subtypes - it's only the builtins themselves that python-dev should be 
the authority for.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From ncoghlan at gmail.com  Mon Aug 14 04:27:57 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 14 Aug 2006 12:27:57 +1000
Subject: [Python-3000] Bound and unbound methods
In-Reply-To: <44DF0D38.6070507@acm.org>
References: <44DF0D38.6070507@acm.org>
Message-ID: <44DFDFAD.3010408@gmail.com>

Talin wrote:
> Anyway, I just wanted to throw that out there. Feel free to -1 away... :)

Based on the later discussion, I see two interesting possibilities:

1. A special CALL_METHOD opcode that the compiler emits when it spots the 
".NAME(ARGS)" pattern. This could simply be an optimisation performed by the 
bytecode emitter when processing an AST Call node with an Attribute node as 
the "func" subnode (it would need to poke around inside the Attribute node, 
rather than generating the Attribute node's code normally, though). For 
functions, this opcode could bypass __get__ and invoke __call__ directly with 
the right arguments. Put the actual optimisation into PyObject_CallMethod and 
call that from the new opcode, and more than just the eval loop would benefit.

This could also be done by the addition of a MethodCall AST node, and an 
AST->AST optimizing pass that took the Call+Attribute node and merged them 
into a single MethodCall node (The concrete parser can't look far enough ahead 
to figure out that a given attribute access is part of a method call).

Option 1 is focused on the speedup Talin mentioned. Aside from the downside of 
additional complexity in the code generation phase, I don't see any real 
downside - __get__ will only be bypassed when the interpreter *knows* what the 
descriptor would do.

2. Rewrite the __get__ methods on functions, classmethod and staticmethod to 
cache the resulting method object in the class dictionary or instance 
dictionary. This would entail making method objects descriptors that returned 
a bound copy of themselves when retrieved through an instance. That way, for 
methods that are never called, the method objects are never created, but for 
methods that are used, the method object is created only once. Something would 
need to be done to make this work for objects without an instance dictionary 
- those could either continue to not cache their instance methods, they could 
have a lazily initialized __dict__ pointer that is instantiated the first time 
it is needed instead of no dict at all (yay, attributes on object() 
instances!), or else there could be an id() keyed cache internal to the 
interpreter.

I personally would favour the option of making __dict__ available by default 
(i.e. put that behaviour in object), with no caching occurring if the object 
had no __dict__ attribute at all. Tuples and the numeric types could continue 
not to support attributes (as allocating space for an extra pointer would be a 
big size increase for them in their general usage pattern, and they don't 
generally have methods that are called from Python), while the other builtin 
types would acquire a usable __dict__ attribute (which may not be instantiated 
until the first time it is needed, although if instance methods get cached, it 
would be needed most of the time, so the extra complexity of lazy 
initialization may not be worth it).
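The core mechanism of option 2 can be sketched as a decorator-descriptor for ordinary classes that have a __dict__ (the name `cached_method` is made up here; this is an illustration of the caching idea, not the proposed interpreter change itself):

```python
class cached_method:
    """Non-data descriptor: on first access through an instance,
    store the bound method in the instance dict.  Instance
    attributes shadow non-data descriptors, so later lookups
    skip __get__ entirely and reuse the one method object."""
    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self.func
        bound = self.func.__get__(obj, objtype)
        obj.__dict__[self.func.__name__] = bound
        return bound

class Widget:
    @cached_method
    def ping(self):
        return "pong"

w = Widget()
assert w.ping is w.ping    # the same method object on every access
assert w.ping() == "pong"
assert 'ping' in w.__dict__  # cached on first use, never before
```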

The interesting benefit of option 2 is that "assert list.append is 
list.append" would now succeed, as would "s = []; assert s.append is 
s.append". "assert [].index is [].index" would still fail though, as different 
instances would get their own bound methods.

The downside of option 2 is that it is slightly more likely to break stuff due 
to the changes in semantics, and that it is a case of a genuine space-speed 
tradeoff - this approach *will* use more memory than the current approach, 
because bound method objects are always allocated permanently instead of being 
ephemeral things.

OTOH, if you did both option 1 and option 2, the caching would occur only if 
you retrieved a method without calling it immediately, and be bypassed most of 
the time.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From greg.ewing at canterbury.ac.nz  Mon Aug 14 04:31:46 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 14 Aug 2006 14:31:46 +1200
Subject: [Python-3000] Bound and unbound methods
In-Reply-To: <44DF86AA.7050207@acm.org>
References: <44DF0D38.6070507@acm.org> <20060813102036.1985.JCARLSON@uci.edu>
	<44DF86AA.7050207@acm.org>
Message-ID: <44DFE092.8030604@canterbury.ac.nz>

Talin wrote:
> the compiler would note the combination of the attribute access and the 
> call, and combine them into an opcode that skips the whole method 
> creation step.

Something like that could probably be made to work. You'd
want to be careful to do the optimisation only when the
attribute in question is an ordinary attribute, not
a property or other descriptor.
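A tiny illustration of why the optimisation has to be conservative: if a fused call-method operation naively skipped the descriptor protocol, a property returning a callable would behave differently (both classes here are invented for illustration):

```python
class Plain:
    def greet(self):
        return "hi"

class Tricky:
    @property
    def greet(self):
        # The attribute access itself does work and returns a fresh
        # callable; a fused opcode that bypassed this property's
        # __get__ would change the program's behaviour.
        return lambda: "hi from property"

assert Plain().greet() == "hi"
assert Tricky().greet() == "hi from property"
```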

I'm also -1 on eliminating bound methods entirely.
I worked through that idea in considerable depth during my
discussions with the author of Prothon, which was also to
have been without any notion of bound methods. The
consequences are further-reaching than you might think at
first. The bottom line is that without bound methods,
Python wouldn't really be Python any more.

--
Greg

From greg.ewing at canterbury.ac.nz  Mon Aug 14 05:22:10 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 14 Aug 2006 15:22:10 +1200
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
References: <5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com>
	<5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com>
	<44DD5DF0.40405@acm.org>
	<5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com>
	<5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
Message-ID: <44DFEC62.6000904@canterbury.ac.nz>

Phillip J. Eby wrote:

> Since many people seem to be unfamiliar with overloaded functions, I would 
> just like to take this opportunity to remind you that the actual overload 
> mechanism is irrelevant.

I don't think it's the concept of overloadable functions
that people are having trouble with here, but that you
haven't clearly explained *how* they would be applied
to solving this particular problem.

You seem to think the answer to that is so obvious
that it doesn't need mentioning, but we're not all
up to the same mental speed as you on this.

Perhaps you could provide a complete worked-out
example for people to look at?

--
Greg

From greg.ewing at canterbury.ac.nz  Mon Aug 14 05:22:22 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 14 Aug 2006 15:22:22 +1200
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com>
References: <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com>
Message-ID: <44DFEC6E.8020603@canterbury.ac.nz>

Phillip J. Eby wrote:

> Not at all.  A and B need only use overloadable functions, and the problem 
> is trivially resolved by adding overloads.  The author of C can add an 
> overload to "A" that will handle objects with 'next' attributes, or add one 
> to "B" that handles tuples, or both.

Phillip, you still haven't explained what to do if
the code processing the annotations is in a separate
program altogether, to which the user has no access
in order to overload methods or perform other such
modifications.

--
Greg

From pje at telecommunity.com  Mon Aug 14 06:21:27 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon, 14 Aug 2006 00:21:27 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <44DFEC6E.8020603@canterbury.ac.nz>
References: <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com>
	<5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com>
Message-ID: <5.1.1.6.0.20060814002014.02dbe9d0@sparrow.telecommunity.com>

At 03:22 PM 8/14/2006 +1200, Greg Ewing wrote:
>Phillip J. Eby wrote:
>
>>Not at all.  A and B need only use overloadable functions, and the 
>>problem is trivially resolved by adding overloads.  The author of C can 
>>add an overload to "A" that will handle objects with 'next' attributes, 
>>or add one to "B" that handles tuples, or both.
>
>Phillip, you still haven't explained what to do if
>the code processing the annotations is in a separate
>program altogether, to which the user has no access
>in order to overload methods or perform other such
>modifications.

It can't be a "separate program altogether", since to get at the 
annotations, the program must import the module that contains them.  Thus, 
the registration need only occur in some module imported by the module that 
uses the annotations.


From pje at telecommunity.com  Mon Aug 14 06:52:41 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon, 14 Aug 2006 00:52:41 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <44DFEC62.6000904@canterbury.ac.nz>
References: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
	<5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com>
	<5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com>
	<44DD5DF0.40405@acm.org>
	<5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com>
	<5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
Message-ID: <5.1.1.6.0.20060814002138.02909ad0@sparrow.telecommunity.com>

At 03:22 PM 8/14/2006 +1200, Greg Ewing wrote:
>Phillip J. Eby wrote:
>>Since many people seem to be unfamiliar with overloaded functions, I 
>>would just like to take this opportunity to remind you that the actual 
>>overload mechanism is irrelevant.
>
>I don't think it's the concept of overloadable functions
>that people are having trouble with here, but that you
>haven't clearly explained *how* they would be applied
>to solving this particular problem.

In the same way that plain old standard Python duck typing would be 
used.  The only differences between overloaded functions and duck typing 
are that:

1. Overloaded functions can't accidentally collide, the way names chosen 
for duck typing can.

2. Third parties can declare overloaded methods without monkeypatching, but 
duck typing requires that you be the author of the object in question or 
that you be able to monkeypatch the type to add methods.

3. You can usually define some default behavior for an unrecognized type - 
as though you could add methods to the 'object' type.

4. Overloaded functions can dispatch on more than one type at the same 
time, or do other things, depending on their implementation.

Aside from these extra features of overloaded functions, there isn't much 
difference between overloading and duck typing; it's merely the difference 
between:

       someOb.quack()

and:

       quack(someOb)

So, if you can imagine handling annotations using duck typing and 
hasattr(), then you can imagine doing it with overloaded functions.  If you 
can't imagine using duck typing or hasattr() to process some annotations 
and ignore the ones you don't understand, then I don't really know how I 
would explain it.
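For readers who want something concrete: functools.singledispatch, which Python later grew, behaves like a basic overloaded function in exactly this sense. A sketch of annotation processing in that style (every name below is invented for illustration):

```python
from functools import singledispatch

@singledispatch
def process_annotation(ann):
    # Default for unrecognised types: ignore them, as a
    # well-behaved annotation consumer should.
    return None

@process_annotation.register(str)
def _(ann):
    return ("doc", ann)

@process_annotation.register(list)
def _(ann):
    # A list bundles several independent annotations.
    return [process_annotation(a) for a in ann]

# Third parties can register handlers for their own types without
# monkeypatching anything -- the point being made above.
assert process_annotation("hello") == ("doc", "hello")
assert process_annotation(3.14) is None
assert process_annotation(["a", 1]) == [("doc", "a"), None]
```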


>You seem to think the answer to that is so obvious
>that it doesn't need mentioning, but we're not all
>up to the same mental speed as you on this.
>
>Perhaps you could provide a complete worked-out
>example for people to look at?

I did - the PEAK documentation links I gave previously included a doctest 
that walked through the definition of a 'Message()' attribute annotation 
that prints a message at class definition (or other metadata definition) 
time.  The other two links showed examples of using attribute annotations 
for declaring security permissions and command-line options.

Some people said they didn't "get" anything from those links, but I'm at 
somewhat of a loss to understand why.  The examples there are very short 
and simple; in fact the complete Message implementation, including imports 
and overload declarations is only *6 lines long*.

So, my only guess is that the people who looked at that skimmed right past 
it, looking for something more complicated!  They probably then proceeded 
to the rest of the documentation and got bogged down in other aspects of 
the framework that aren't related to this discussion.

Therefore, if anybody would like to provide an example of how *they* would 
write code for some function attribute scenario, I'll happily modify it to 
demonstrate tell-don't-ask with either duck typing, adaptation, 
overloading, or whatever you like.  But from a communication POV, it 
doesn't make sense to me to try and write an example, since it's going to 
come from *my* worldview (in which this is a trivial problem) and not the 
worldview of the people who don't understand it.

It seems to me that the right way to proceed is to have somebody provide an 
example in *their* worldview, so that when I alter it they will have a 
reference point for what I'm talking about.  (Notice that this seemed to 
work well for Josiah and Paul when I reworked Paul's example.)


From theller at python.net  Mon Aug 14 16:55:24 2006
From: theller at python.net (Thomas Heller)
Date: Mon, 14 Aug 2006 16:55:24 +0200
Subject: [Python-3000] threading, part 2 --- + a bit of ctypes FFI worry
In-Reply-To: <1f7befae0608120329wc646164w25ca4875da4cc5c0@mail.gmail.com>
References: <1f7befae0608120329wc646164w25ca4875da4cc5c0@mail.gmail.com>
Message-ID: <ebq2ss$qhf$1@sea.gmane.org>

Tim Peters schrieb:
> [Josiah Carlson]
>> ...
>> Python 2.3.5 (#62, Feb  8 2005, 16:23:02) [MSC v.1200 32 bit (Intel)] on win32
>> Type "help", "copyright", "credits" or "license" for more information.
>> >>> import ctypes
>> >>> import threading
>> >>> import time
>> >>> def foo():
>> ...     try:
>> ...             while 1:
>> ...                     time.sleep(.01)
>> ...     finally:
>> ...             print "I quit!"
>> ...
>> >>> x = threading.Thread(target=foo)
>> >>> x.start()
>> >>> for i,j in threading._active.items():
>> ...     if j is x:
>> ...             break
>> ...
>> >>> ctypes.pythonapi.PyThreadState_SetAsyncExc(i, ctypes.py_object(Exception))
> 
> As I discovered to my chagrin when I added a similar test to the test
> suite a few days ago, that's got a subtle error on most 64-bit boxes.
> When the ctypes docs talk about passing and returning integers, they
> never explain what "integers" /means/, but it seems the docs
> implicitly have a 32-bit-only view of the world here.  In reality
> "integer" seems to mean the native C `int` type.

'ctypes.c_int' and 'ctypes.c_long' correspond to the C 'int' and 'long' types.
If you think that the docs could be clearer, please suggest changes.

> But a Python thread
> id is a native C `long` (== a Python short integer), and the code
> above fails in a baffling way on most 64-bit boxes:  the call returns
> 0 instead; i.e. the thread id isn't found, and no exception gets set.
> So I believe that needs to be:
> 
>     ctypes.pythonapi.PyThreadState_SetAsyncExc(
>         ctypes.c_long(i),
>         ctypes.py_object(Exception))
> 
> to make it portable.

Right.  A little bit more safety might be gained by setting the argtypes attribute
of the PyThreadState_SetAsyncExc function in this way:

ctypes.pythonapi.PyThreadState_SetAsyncExc.argtypes = ctypes.c_long, ctypes.py_object

This way the wrapping of arguments is automatic.
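Putting Tim's portability fix and the argtypes suggestion together, a complete sketch that declares the signature once and then raises an exception in a worker thread. This is adapted to modern Python, where thread ids are unsigned, so the exact argtype is an assumption that depends on the interpreter version:

```python
import ctypes
import threading
import time

set_async_exc = ctypes.pythonapi.PyThreadState_SetAsyncExc
# Declaring argtypes makes ctypes convert the Python thread id
# correctly even on 64-bit platforms -- the bug discussed above.
# Current CPython takes an unsigned long for the thread id.
set_async_exc.argtypes = (ctypes.c_ulong, ctypes.py_object)
set_async_exc.restype = ctypes.c_int

finished = []

def worker():
    try:
        while True:
            time.sleep(0.01)
    except KeyboardInterrupt:
        # The asynchronously raised exception lands here.
        finished.append(True)

t = threading.Thread(target=worker)
t.start()
time.sleep(0.05)
n = set_async_exc(t.ident, ctypes.py_object(KeyboardInterrupt))
t.join()
assert n == 1               # exactly one thread state matched the id
assert finished == [True]   # the worker saw the exception and exited
```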

> It's unclear to me how to write portable ctypes code in the presence
> of a gazillion integer typedefs and #defines, such as for Py_ssize_t.
> That doesn't map to a fixed C integral type cross-platform, so what
> can you do?  You're not required to answer that ;-)

This would probably have to be exported from the C code.  Currently ctypes has
the basic (integer) types c_byte, c_short, c_int, c_long, c_longlong, plus
their unsigned variants.  On 32-bit platforms, c_int is an alias to c_long.

Sized ints are defined: c_int8, c_int16, c_int32, c_int64, (plus the unsigned
variants again), also as aliases to the 10 basic integer types.

It *should* be possible, with some checks, to find out the size
of Py_ssize_t at runtime (unless it is a configurable option)...
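One such runtime check, under the assumption that on CPython Py_ssize_t has pointer width (later ctypes versions added ctypes.c_ssize_t directly, making this unnecessary):

```python
import ctypes
import struct

# struct.calcsize("P") is the size of a pointer; on CPython,
# Py_ssize_t is the signed integer type of the same width, so it
# selects the matching sized alias at runtime.
c_py_ssize_t = {4: ctypes.c_int32, 8: ctypes.c_int64}[struct.calcsize("P")]

assert ctypes.sizeof(c_py_ssize_t) == struct.calcsize("P")
```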

> Thread ids may bite us someday too.  Python casts the platform's
> notion of a thread id to C `long`, but there's no guarantee this won't
> lose information (or is even legal) on all platforms.  We'd probably
> be safer casting to, e.g., Py_uintptr_t (some thread implementions
> return an index into a kernel or library thread-info table, but at
> least some in my lifetime returned a pointer to a thread-info struct,
> and that's definitely fatter than C `long` on some boxes).
> 
>> 1
>> >>> I quit!
>> Exception in thread Thread-2:Traceback (most recent call last):
>>   File "C:\python23\lib\threading.py", line 442, in __bootstrap
>>     self.run()
>>   File "C:\python23\lib\threading.py", line 422, in run
>>     self.__target(*self.__args, **self.__kwargs)
>>   File "<stdin>", line 4, in foo
>> Exception
> 
> It's really cool that you can do this from ctypes, eh?  That's exactly
> the right level of abstraction for this attractive nuisance too ;-)

;-)

Thomas


From guido at python.org  Mon Aug 14 17:31:31 2006
From: guido at python.org (Guido van Rossum)
Date: Mon, 14 Aug 2006 08:31:31 -0700
Subject: [Python-3000] Ctypes as cross-interpreter C calling interface
In-Reply-To: <ca471dc20608101050m231b618asc5c70181ed4078dc@mail.gmail.com>
References: <1cb725390608092219v695b7f24t92534d3aa444ca8c@mail.gmail.com>
	<ca471dc20608101050m231b618asc5c70181ed4078dc@mail.gmail.com>
Message-ID: <ca471dc20608140831g453e1b7emf6fee0f2c14c71b3@mail.gmail.com>

After thinking about it some more, IMO for most purposes ctypes is
really quite sub-optimal. I think it would make more sense to work on
Parrot support for Python. Sure, in the short term  ctypes is more
practical than Parrot -- in its most recent incarnation, the latter
doesn't even list Python as a supported language -- a regression from
last year when Python support was among the best. But in the long
term, Parrot (like .NET or Jython do in other contexts) offers
cross-language interoperability, and perhaps even (like .NET and
Jython) automatic generation of wrappers.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From exarkun at divmod.com  Mon Aug 14 17:33:48 2006
From: exarkun at divmod.com (Jean-Paul Calderone)
Date: Mon, 14 Aug 2006 11:33:48 -0400
Subject: [Python-3000] Ctypes as cross-interpreter C calling interface
In-Reply-To: <ca471dc20608140831g453e1b7emf6fee0f2c14c71b3@mail.gmail.com>
Message-ID: <20060814153348.1717.1313126828.divmod.quotient.22734@ohm>

On Mon, 14 Aug 2006 08:31:31 -0700, Guido van Rossum <guido at python.org> wrote:
>After thinking about it some more, IMO for most purposes ctypes is
>really quite sub-optimal. I think it would make more sense to work on
>Parrot support for Python. Sure, in the short term  ctypes is more
>practical than Parrot -- in its most recent incarnation, the latter
>doesn't even list Python as a supported language -- a regression from
>last year when Python support was among the best. But in the long
>term, Parrot (like .NET or Jython do in other contexts) offers
>cross-language interoperability, and perhaps even (like .NET and
>Jython) automatic generation of wrappers.
>

This is a joke, right?

Jean-Paul

From guido at python.org  Mon Aug 14 18:09:49 2006
From: guido at python.org (Guido van Rossum)
Date: Mon, 14 Aug 2006 09:09:49 -0700
Subject: [Python-3000] Ctypes as cross-interpreter C calling interface
In-Reply-To: <20060814153348.1717.1313126828.divmod.quotient.22734@ohm>
References: <ca471dc20608140831g453e1b7emf6fee0f2c14c71b3@mail.gmail.com>
	<20060814153348.1717.1313126828.divmod.quotient.22734@ohm>
Message-ID: <ca471dc20608140909o730ab1e0i86c6d562cfa90abd@mail.gmail.com>

No. Why would it be a joke? Because it's a Perl thing? Because it
doesn't acknowledge Python's obvious supremacy in the universe of
languages? Because it admits that other projects sometimes have good
ideas? Because it's a good idea to have to write separate wrappers
around every useful library for each dynamic language separately?
Because Parrot isn't real? IMO it's pretty real already -- the 0.4.6
release supports Ruby, Javascript, Tcl, and a bunch more (possibly
even Perl 6 :-). I wouldn't be surprised if Parrot reached maturity
around the same time as Py3k.

--Guido

On 8/14/06, Jean-Paul Calderone <exarkun at divmod.com> wrote:
> On Mon, 14 Aug 2006 08:31:31 -0700, Guido van Rossum <guido at python.org> wrote:
> >After thinking about it some more, IMO for most purposes ctypes is
> >really quite sub-optimal. I think it would make more sense to work on
> >Parrot support for Python. Sure, in the short term  ctypes is more
> >practical than Parrot -- in its most recent incarnation, the latter
> >doesn't even list Python as a supported language -- a regression from
> >last year when Python support was among the best. But in the long
> >term, Parrot (like .NET or Jython do in other contexts) offers
> >cross-language interoperability, and perhaps even (like .NET and
> >Jython) automatic generation of wrappers.
> >
>
> This is a joke, right?
>
> Jean-Paul
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From exarkun at divmod.com  Mon Aug 14 19:20:00 2006
From: exarkun at divmod.com (Jean-Paul Calderone)
Date: Mon, 14 Aug 2006 13:20:00 -0400
Subject: [Python-3000] Ctypes as cross-interpreter C calling interface
In-Reply-To: <ca471dc20608140909o730ab1e0i86c6d562cfa90abd@mail.gmail.com>
Message-ID: <20060814172000.1717.863905740.divmod.quotient.22821@ohm>

On Mon, 14 Aug 2006 09:09:49 -0700, Guido van Rossum <guido at python.org> wrote:
>On 8/14/06, Jean-Paul Calderone <exarkun at divmod.com> wrote:
>>On Mon, 14 Aug 2006 08:31:31 -0700, Guido van Rossum <guido at python.org> 
>>wrote:
>> >After thinking about it some more, IMO for most purposes ctypes is
>> >really quite sub-optimal. I think it would make more sense to work on
>> >Parrot support for Python. Sure, in the short term  ctypes is more
>> >practical than Parrot -- in its most recent incarnation, the latter
>> >doesn't even list Python as a supported language -- a regression from
>> >last year when Python support was among the best. But in the long
>> >term, Parrot (like .NET or Jython do in other contexts) offers
>> >cross-language interoperability, and perhaps even (like .NET and
>> >Jython) automatic generation of wrappers.
>> >
>>
>>This is a joke, right?
>>
>No. Why would it be a joke? Because it's a Perl thing? Because it
>doesn't acknowledge Python's obvious supremacy in the universe of
>languages? Because it admits that other projects sometimes have good
>ideas?

Heh.  Strawmen, all.  I assure you, none of these objections ever entered
my mind.

>Because it's a good idea to have to write separate wrappers
>around every useful library for each dynamic language separately?

If a project has done this successfully, I don't think I've seen it.  Can
you point out some examples where this has been accomplished in a useful
form?  The nearest thing I can think of is SWIG, which is basically a
failure.

This is not to say that it is not a noble goal, but I think it remains to
be shown that Parrot is actually a solution here.

>Because Parrot isn't real? IMO it's pretty real already -- the 0.4.6
>release supports Ruby, Javascript, Tcl, and a bunch more (possibly
>even Perl 6 :-). I wouldn't be surprised if Parrot reached maturity
>around the same time as Py3k.
>

Parrot has been around for quite a while now without accomplishing anything
much of practical value.  Does anyone _use_ it for Ruby, JavaScript, or Tcl?
(I know no one uses it for Perl 6 ;)

For five years of development by a pretty large community, that's not showing
a lot.  The reason I suspected a joke is that you seem to want to discard a
fairly good existing widely used solution in favor of one that's just vapor
right now.  Granted Py3k is a ways off, but it's not /that/ far off.  We're
talking about a year or two here.  Is Parrot going to be as solid in a year
as ctypes already is?  I doubt it.

If you /really/ want to look outside of the Python community for solutions
here, the lisp community has thought about this for a long time.  Instead of
looking at Parrot, you should look at the ffi provided by almost any lisp
runtime.

Jean-Paul

From guido at python.org  Mon Aug 14 19:38:25 2006
From: guido at python.org (Guido van Rossum)
Date: Mon, 14 Aug 2006 10:38:25 -0700
Subject: [Python-3000] PEP3102 Keyword-Only Arguments
In-Reply-To: <fb6fbf560608122056w5a9af394ga358614c0d8d10d7@mail.gmail.com>
References: <b008462b0608111620q709e691fqdb6283b194e1a893@mail.gmail.com>
	<fb6fbf560608122056w5a9af394ga358614c0d8d10d7@mail.gmail.com>
Message-ID: <ca471dc20608141038w55d67754s9407f52eaa5ce64b@mail.gmail.com>

Not remembering the PEP in detail, I agree with Jim's resolution of all these.

I guess the right rule is that all positional arguments come first
(first the regular ones, then * or *args). Then come the keyword
arguments, again, first the regular ones (name=value), then **kwds.

I believe the PEP doesn't address the opposite use case: positional
arguments that should *not* be specified as keyword arguments. For
example, I might want to write

  def foo(a, b): ...

but I don't want callers to be able to call it as foo(b=1, a=2) or
even foo(a=2, b=1).

A realistic example is the write() method of file objects. We really
don't want people starting to say f.write(s="abc") because even if
that works for the current file type you're using, it won't work if an
instance of some other class implementing write() is substituted --
write() is always documented as an API taking a positional argument,
so different "compatible" classes are likely to have different
argument names. Currently this is enforced because the default file
type is implemented in C and it doesn't have keyword arguments; but in
Py3k it may well be implemented in Python and then we currently have
no decent way to say "this should really be a positional argument".
(There's an analogy to forcing keyword arguments using **, using *args
for all arguments and parsing that explicitly -- but that's tedious
for a fairly common use case.)

Perhaps we can use ** without following identifier to signal this?
It's not entirely analogous to * without following identifier, but at
least somewhat similar.
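The tedious workaround alluded to above can be spelled out concretely (the names here are illustrative only, not from any PEP):

```python
# Forcing a positional-only argument today: accept *args and unpack by
# hand.  Tedious for such a common case, which is the point.
def write(*args):
    if len(args) != 1:
        raise TypeError("write() takes exactly one positional argument")
    (s,) = args
    return len(s)
```

With this, write(s="abc") fails with a TypeError instead of binding the keyword, because there is no named parameter to bind to.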

--Guido

On 8/12/06, Jim Jewett <jimjjewett at gmail.com> wrote:
> On 8/11/06, Jiwon Seo <seojiwon at gmail.com> wrote:
> > When we have keyword-only arguments, do we allow 'keyword dictionary'
> > argument? If that's the case, where would we want to place
> > keyword-only arguments?
>
> > Are we going to allow any of followings?
>
> > 1. def foo(a, b,  *, key1=None, key2=None, **map)
>
> Seems perfectly reasonable.
>
> I think the controversy was over whether or not to allow keyword-only
> without a default.
>
> > 2. def foo(a, b, *,  **map, key1=None, key2=None)
>
> Seems backward, though I suppose we could adjust if we needed to.
>
> > 3. def foo(a, b, *, **map)
>
> What would the * even mean, since there aren't any named keywords to separate?
>
> -jJ
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From steven.bethard at gmail.com  Mon Aug 14 19:49:36 2006
From: steven.bethard at gmail.com (Steven Bethard)
Date: Mon, 14 Aug 2006 11:49:36 -0600
Subject: [Python-3000] PEP3102 Keyword-Only Arguments
In-Reply-To: <ca471dc20608141038w55d67754s9407f52eaa5ce64b@mail.gmail.com>
References: <b008462b0608111620q709e691fqdb6283b194e1a893@mail.gmail.com>
	<fb6fbf560608122056w5a9af394ga358614c0d8d10d7@mail.gmail.com>
	<ca471dc20608141038w55d67754s9407f52eaa5ce64b@mail.gmail.com>
Message-ID: <d11dcfba0608141049p24c03471k3c0252bd188ee5e7@mail.gmail.com>

On 8/14/06, Guido van Rossum <guido at python.org> wrote:
> I believe the PEP doesn't address the opposite use case: positional
> arguments that should *not* be specified as keyword arguments. For
> example, I might want to write
>
>   def foo(a, b): ...
>
> but I don't want callers to be able to call it as foo(b=1, a=2) or
> even foo(a=2, b=1).

Another use case is when you want to accept the arguments of another
callable, but you have your own positional arguments::

    >>> class Wrapper(object):
    ...     def __init__(self, func):
    ...         self.func = func
    ...     def __call__(self, *args, **kwargs):
    ...         print 'calling wrapped function'
    ...         return self.func(*args, **kwargs)
    ...
    >>> @Wrapper
    ... def func(self, other):
    ...     return self, other
    ...
    >>> func(other=1, self=2)
    Traceback (most recent call last):
      File "<interactive input>", line 1, in ?
    TypeError: __call__() got multiple values for keyword argument 'self'

It would be really nice in the example above to mark ``self`` in
``__call__`` as a positional only argument.
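One workaround that works today (a sketch, not something the PEP proposes) is to give ``__call__`` no named ``self`` at all and peel the instance off by hand:

```python
# Because __call__ declares no parameter named 'self', a caller's
# self= keyword lands harmlessly in **kwargs instead of colliding.
class Wrapper(object):
    def __init__(self, func):
        self.func = func
    def __call__(*args, **kwargs):
        wrapper, args = args[0], args[1:]   # peel off the instance by hand
        return wrapper.func(*args, **kwargs)

@Wrapper
def func(self, other):
    return self, other
```

Now ``func(other=1, self=2)`` returns ``(2, 1)`` rather than raising TypeError -- but having to contort ``__call__`` like this is exactly why a real positional-only marker would be nicer.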

> Perhaps we can use ** without following identifier to signal this?
> It's not entirely analogous to * without following identifier, but at
> least somewhat similar.

I'm certainly not opposed to going this way, but I don't think it
would solve the problem above since you still need to take keyword
arguments.

STeVe
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From guido at python.org  Mon Aug 14 20:04:19 2006
From: guido at python.org (Guido van Rossum)
Date: Mon, 14 Aug 2006 11:04:19 -0700
Subject: [Python-3000] PEP3102 Keyword-Only Arguments
In-Reply-To: <d11dcfba0608141049p24c03471k3c0252bd188ee5e7@mail.gmail.com>
References: <b008462b0608111620q709e691fqdb6283b194e1a893@mail.gmail.com>
	<fb6fbf560608122056w5a9af394ga358614c0d8d10d7@mail.gmail.com>
	<ca471dc20608141038w55d67754s9407f52eaa5ce64b@mail.gmail.com>
	<d11dcfba0608141049p24c03471k3c0252bd188ee5e7@mail.gmail.com>
Message-ID: <ca471dc20608141104i154efbfehf88e8f10f7877ea8@mail.gmail.com>

On 8/14/06, Steven Bethard <steven.bethard at gmail.com> wrote:
> On 8/14/06, Guido van Rossum <guido at python.org> wrote:
> > I believe the PEP doesn't address the opposite use case: positional
> > arguments that should *not* be specified as keyword arguments. For
> > example, I might want to write
> >
> >   def foo(a, b): ...
> >
> > but I don't want callers to be able to call it as foo(b=1, a=2) or
> > even foo(a=2, b=1).
>
> Another use case is when you want to accept the arguments of another
> callable, but you have your own positional arguments::
>
>     >>> class Wrapper(object):
>     ...     def __init__(self, func):
>     ...         self.func = func
>     ...     def __call__(self, *args, **kwargs):
>     ...         print 'calling wrapped function'
>     ...         return self.func(*args, **kwargs)
>     ...
>     >>> @Wrapper
>     ... def func(self, other):
>     ...     return self, other
>     ...
>     >>> func(other=1, self=2)
>     Traceback (most recent call last):
>       File "<interactive input>", line 1, in ?
>     TypeError: __call__() got multiple values for keyword argument 'self'
>
> It would be really nice in the example above to mark ``self`` in
> ``__call__`` as a positional only argument.

But this is a rather unusual use case, isn't it? It's due to the bound
methods machinery. Do you have other use cases? I would assume that
normally such wrappers take their own control arguments in the form of
keyword-only arguments (that are unlikely to conflict with arguments
of the wrapped method).

> > Perhaps we can use ** without following identifier to signal this?
> > It's not entirely analogous to * without following identifier, but at
> > least somewhat similar.
>
> I'm certainly not opposed to going this way, but I don't think it
> would solve the problem above since you still need to take keyword
> arguments.

Can you elaborate?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Mon Aug 14 20:08:56 2006
From: guido at python.org (Guido van Rossum)
Date: Mon, 14 Aug 2006 11:08:56 -0700
Subject: [Python-3000] Ctypes as cross-interpreter C calling interface
In-Reply-To: <20060814172000.1717.863905740.divmod.quotient.22821@ohm>
References: <ca471dc20608140909o730ab1e0i86c6d562cfa90abd@mail.gmail.com>
	<20060814172000.1717.863905740.divmod.quotient.22821@ohm>
Message-ID: <ca471dc20608141108u118487ccw16cc8527c6f24744@mail.gmail.com>

On 8/14/06, Jean-Paul Calderone <exarkun at divmod.com> wrote:
> On Mon, 14 Aug 2006 09:09:49 -0700, Guido van Rossum <guido at python.org> wrote:
> >On 8/14/06, Jean-Paul Calderone <exarkun at divmod.com> wrote:
> >>This is a joke, right?
> >Because it's a good idea to have to write separate wrappers
> >around every useful library for each dynamic language separately?
>
> If a project has done this successfully, I don't think I've seen it.  Can
> you point out some examples where this has been accomplished in a useful
> form?  The nearest thing I can think of is SWIG, which is basically a
> failure.

SWIG is not my favorite (mostly because I don't like C++ much) but
it's used very effectively here at Google (for example); I wouldn't
dream of calling it a failure.

I also consider .NET's CLR a success, based on the testimony of Jim
Hugunin (who must be Microsoft's most reluctant employee :).

And I see the JVM as a successful case too -- Jython can link to
anything written in Java or compiled to JVM bytecode, and so can other
languages that use JVM introspection the same way as Jython (I hear
there's a Ruby analogue).

The major difference between all these examples and ctypes is that
ctypes has no way of introspecting the wrapped library; you have to
repeat everything you know about the API in your calls to ctypes (and
as was just shown in another thread about 64-bit issues, that's not
always easy).
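To make the contrast concrete, here is roughly what that repetition looks like (a POSIX-only sketch; ``CDLL(None)`` hands back the already-loaded C library):

```python
import ctypes

# The C signature must be restated by hand -- ctypes cannot introspect
# it from the shared library.  Guess c_long instead of c_size_t for the
# return type and a 64-bit platform can silently corrupt the result.
libc = ctypes.CDLL(None)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t
n = libc.strlen(b"hello")
```

Every client language doing its own version of these three declaration lines, for every function in every library, is the duplication at issue.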

> This is not to say that it is not a noble goal, but I think it remains to
> be shown that Parrot is actually a solution here.

Parrot definitely has to show itself still. But a year ago Sam Ruby
reported on his efforts of making Python work on Parrot, and he
sounded like it was a very feasible proposition.

> Parrot has been around for quite a while now without accomplishing anything
> much of practical value.  Does anyone _use_ it for Ruby, JavaScript, or Tcl?
> (I know no one uses it for Perl 6 ;)
>
> For five years of development by a pretty large community, that's not showing
> a lot.  The reason I suspected a joke is that you seem to want to discard a
> fairly good existing widely used solution in favor of one that's just vapor
> right now.  Granted Py3k is a ways off, but it's not /that/ far off.  We're
> talking about a year or two here.  Is Parrot going to be as solid in a year
> as ctypes already is?  I doubt it.

That's not exactly the point I am making. I find Parrot's approach,
assuming the project won't fail due to internal friction, much more
long-term viable than ctypes. The big difference being (I hope)
introspective generation of APIs rather than having to repeat the
linkage information in each client language.

> If you /really/ want to look outside of the Python community for solutions
> here, the lisp community has thought about this for a long time.  Instead of
> looking at Parrot, you should look at the ffi provided by almost any lisp
> runtime.

This seems a mostly theoretical viewpoint to me. Can you point me to
an example of a Python-like language that is successful in reusing a
Lisp runtime? (And I don't consider Lisp or Scheme Python-like in this
context. ;-)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Mon Aug 14 20:13:56 2006
From: guido at python.org (Guido van Rossum)
Date: Mon, 14 Aug 2006 11:13:56 -0700
Subject: [Python-3000] Python/C++ question
In-Reply-To: <ebn5ms$mne$1@sea.gmane.org>
References: <44DA6C01.2040904@acm.org>
	<ca471dc20608091618k3cffb3ewa3029794f0f02761@mail.gmail.com>
	<44DF0800.4060204@acm.org> <ebn5ms$mne$1@sea.gmane.org>
Message-ID: <ca471dc20608141113y15e6ba9u3ea405905a0ca0ad@mail.gmail.com>

On 8/13/06, Georg Brandl <g.brandl at gmx.net> wrote:
> Talin wrote:
> > Guido van Rossum wrote:
> >> On 8/9/06, Talin <talin at acm.org> wrote:
> >> For the majority of Python developers it's probably the other way
> >> around. It's been 15 years since I wrote C++, and unlike C, that
> >> language has changed a lot since then...
> >>
> >> It would be a complete rewrite; I prefer doing a gradual
> >> transmogrification of the current codebase into Py3k rather than
> >> starting from scratch (read Joel Spolsky on why).
> >
> > BTW, Should this be added to PEP 3099?
>
> Yes, why not.

Although perhaps it makes more sense to add something positive to PEP 3000, e.g.

Implementation Language
=======================

Python 3000 will be implemented in C, and the implementation will be
derived as an evolution of the Python 2 code base. This reflects my
views (which I share with Joel Spolsky) on the dangers of complete
rewrites. Since Python 3000 as a language is a relatively mild
improvement on Python 2, we can gain a lot by not attempting to
reimplement the language from scratch. I am not against parallel
from-scratch implementation efforts, but my own efforts will be
directed at the language and implementation that I know best.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Mon Aug 14 20:17:14 2006
From: guido at python.org (Guido van Rossum)
Date: Mon, 14 Aug 2006 11:17:14 -0700
Subject: [Python-3000] threading, part 2
In-Reply-To: <1d85506f0608111713m15cf2e67v8b94f06c928e9125@mail.gmail.com>
References: <1d85506f0608111713m15cf2e67v8b94f06c928e9125@mail.gmail.com>
Message-ID: <ca471dc20608141117l1c61247fy39bae2b00d45675d@mail.gmail.com>

On 8/11/06, tomer filiba <tomerfiliba at gmail.com> wrote:
> i mailed this to several people separately, but then i thought it could
> benefit the entire group:
>
> http://sebulba.wikispaces.com/recipe+thread2
>
> it's an implementation of the proposed " thread.raise_exc", through an extension
> to the threading.Thread class. you can test it for yourself; if it proves useful,
> it should be exposed as thread.raise_exc in the stdlib (instead of the ctypes
> hack)... and of course it should be reflected in threading.Thread as well.

Cool. Question: what's the problem with raising exception instances?
Especially in the light of my proposal to use

  raise SomeException(42)

in preference over (and perhaps exclusively instead of)

  raise SomeException, 42

in Py3k. The latter IMO is a relic from the days of string exceptions
which are as numbered as they come. :-)
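For illustration, the instance form carries its value along with the exception, where the handler can get at it (sketched here with the Py3k ``except ... as`` spelling):

```python
# Raising an instance: the payload (42) travels with the exception
# and is available on .args in the handler.
try:
    raise ValueError(42)
except ValueError as e:
    caught = e.args
```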

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From paul at prescod.net  Mon Aug 14 20:40:04 2006
From: paul at prescod.net (Paul Prescod)
Date: Mon, 14 Aug 2006 11:40:04 -0700
Subject: [Python-3000] Ctypes as cross-interpreter C calling interface
In-Reply-To: <ca471dc20608140831g453e1b7emf6fee0f2c14c71b3@mail.gmail.com>
References: <1cb725390608092219v695b7f24t92534d3aa444ca8c@mail.gmail.com>
	<ca471dc20608101050m231b618asc5c70181ed4078dc@mail.gmail.com>
	<ca471dc20608140831g453e1b7emf6fee0f2c14c71b3@mail.gmail.com>
Message-ID: <1cb725390608141140g480e0c66q6f1e74f32ad1e540@mail.gmail.com>

I guess I don't see ctypes and Parrot solving the same problem at all. My
idea with ctypes was the opposite of choosing a new runtime. It was to help
various runtimes (PyPy, Jython, IronPython, CPython 2.5, CPython 3.0,
Parrot, ...) to compete on their own merits (primarily performance and
interoperability) and not on the basis that they don't support some Python
library whether it be "crypt" or "pyopengl". It would also be nice to move
beyond the situation where everyone in the world must re-release their C
modules (no matter how trivial) every time Python goes through a minor
upgrade. Does Parrot solve these problems or exacerbate them?

Also, Parrot seems like a bit of a random choice considering the fact that
there are many candidates for a next-generation Python runtime: PyPy,
IronPython/mono, etc. They have both come much further, much quicker, than
Parrot. I'm a bit skeptical of the Parrot story after the Guile mess. It was
supposed to be a multi-language dynamic runtime as well. But that's a
digression. I don't think you're betting on any particular strategy, just
saying that we should watch Parrot and see how it turns out.

But anyhow, my original suggestion did not start with ctypes at all. From my
point of view, the goal is to express Pythonic constructs in Python (whether
using Ctypes, Pyrex, rctypes, or whaver) where possible rather than
expressing Pythonic constructs in C (PyErr_SetString, PyDict_SetItem, etc.).
Then each runtime can map the Pythonic constructs to their internal model
and use their native FFI strategy (JNI, P/Invoke, libffi) to handle the C
stuff. The actual details of the syntax do not matter to me (though they do
matter!). I also do not care whether it uses a compiler strategy like Pyrex
or a runtime model like ctypes, or a dual-mode strategy like
PyPy/extcompiler.

I accept the current limitations of this technique when it comes to
(especially) C++, and therefore don't promote it as a panacea.

Let me ask a question about our current status. If there were a requirement
to do a simple wrapper library like "crypt" or "getpasswd"...is there any
high level wrapping strategy that you would allow into the standard library?
A ctypes-based module? The C output of a Pyrex compiler? The output of SWIG?


Or is hand-written C code the only thing you want for extensions in the
Python library? Even if the answer is "hand-written C code" it might be nice
to have an explicit statement so that people know in advance. I propose that
if the developer can make the case that a ctypes-based library is more
maintainable than the C code would be, and performance is acceptable for the
problem domain, that the ctypes-based library be acceptable. Would you agree
that that small step is reasonable?

 Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060814/04bca409/attachment.htm 

From jimjjewett at gmail.com  Mon Aug 14 21:11:00 2006
From: jimjjewett at gmail.com (Jim Jewett)
Date: Mon, 14 Aug 2006 15:11:00 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <1cb725390608131057y122b0c0wf81611e136659793@mail.gmail.com>
References: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
	<20060812205512.197A.JCARLSON@uci.edu>
	<5.1.1.6.0.20060813013329.0226d240@sparrow.telecommunity.com>
	<5.1.1.6.0.20060813125944.056a3f40@sparrow.telecommunity.com>
	<1cb725390608131057y122b0c0wf81611e136659793@mail.gmail.com>
Message-ID: <fb6fbf560608141211k528e754eva03e90612dff9ecd@mail.gmail.com>

On 8/13/06, Paul Prescod <paul at prescod.net> wrote:
> My proposed text for the PEP is as follows:

Mostly good.  A few remaining comments...

Should annotation objects with defined semantics have some standard
way to indicate this?  (By analogy, new exceptions *should* inherit
from Exception; should annotation objects inherit from an Annotation
class, at least as a mixin?)

> "This implies that the interpretation of built-in types would be controlled
> by Python's developers and documented in Python's documentation.

It also implies that the interpretation of annotations made with a
built-in type should be safe -- they shouldn't trigger any
irreversible actions.

>  "In Python 3000, semantics will be attached to the following types: objects
> of type string (or subtype of string) are to be used for documentation
> (though they are not necessarily the exclusive source of documentation about
> the type). Objects of type list (or subtype of list) are to be used for
> attaching multiple independent annotations."

subtypes should be available for other frameworks.

This implies that something other than lists should be used if the
annotations are not independent.  The obvious candidates are tuples
and dicts, but this should be explicit (or explicitly not defined).

The definition of a type as an annotation should probably be either
defined or explicitly undefined.  Earlier discussions talked about
things like


    def f(a: int, b: (float | Decimal), c: [int, str, X]) -> str

This implied that a type object would represent the type of the
argument (but would it be safe to call as an adapter?), that special
syntactic support might be added for type unions, and that the
"independent" part of the list specification should probably be
repeated at least in an example.  I'm not sure if these implications
*should* be true, but they're obvious enough to some people (and not
to others) that the decision should be explicit.
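A sketch of how such a signature might look and be inspected (the ``__annotations__`` attribute, the string annotation, and the list form are all assumptions here, not settled semantics):

```python
# Hypothetical annotated signature: a type object, a documentation
# string, a list of independent annotations, and a return annotation,
# all collected into a dict on the function object.
def f(a: int, b: "a float or Decimal", c: [int, str]) -> str:
    return str(a)

notes = f.__annotations__   # e.g. {'a': int, ..., 'return': str}
```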

-jJ

From jcarlson at uci.edu  Mon Aug 14 21:15:18 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Mon, 14 Aug 2006 12:15:18 -0700
Subject: [Python-3000] threading, part 2
In-Reply-To: <ca471dc20608141117l1c61247fy39bae2b00d45675d@mail.gmail.com>
References: <1d85506f0608111713m15cf2e67v8b94f06c928e9125@mail.gmail.com>
	<ca471dc20608141117l1c61247fy39bae2b00d45675d@mail.gmail.com>
Message-ID: <20060814121235.19A8.JCARLSON@uci.edu>


"Guido van Rossum" <guido at python.org> wrote:
> 
> On 8/11/06, tomer filiba <tomerfiliba at gmail.com> wrote:
> > i mailed this to several people separately, but then i thought it could
> > benefit the entire group:
> >
> > http://sebulba.wikispaces.com/recipe+thread2
> >
> > it's an implementation of the proposed " thread.raise_exc", through an extension
> > to the threading.Thread class. you can test it for yourself; if it proves useful,
> > it should be exposed as thread.raise_exc in the stdlib (instead of the ctypes
> > hack)... and of course it should be reflected in threading.Thread as well.
> 
> Cool. Question: what's the problem with raising exception instances?
> Especially in the light of my proposal to use
> 
>   raise SomeException(42)
> 
> in preference over (and perhaps exclusively instead of)

The problem is that it is not implemented in the underlying CPython API
PyThreadState_SetAsyncExc function.

 - Josiah


From g.brandl at gmx.net  Mon Aug 14 21:12:50 2006
From: g.brandl at gmx.net (Georg Brandl)
Date: Mon, 14 Aug 2006 21:12:50 +0200
Subject: [Python-3000] threading, part 2
In-Reply-To: <ca471dc20608141117l1c61247fy39bae2b00d45675d@mail.gmail.com>
References: <1d85506f0608111713m15cf2e67v8b94f06c928e9125@mail.gmail.com>
	<ca471dc20608141117l1c61247fy39bae2b00d45675d@mail.gmail.com>
Message-ID: <ebqhvi$80m$1@sea.gmane.org>

Guido van Rossum wrote:
> On 8/11/06, tomer filiba <tomerfiliba at gmail.com> wrote:
>> i mailed this to several people separately, but then i thought it could
>> benefit the entire group:
>>
>> http://sebulba.wikispaces.com/recipe+thread2
>>
>> it's an implementation of the proposed " thread.raise_exc", through an extension
>> to the threading.Thread class. you can test it for yourself; if it proves useful,
>> it should be exposed as thread.raise_exc in the stdlib (instead of the ctypes
>> hack)... and of course it should be reflected in threading.Thread as well.
> 
> Cool. Question: what's the problem with raising exception instances?
> Especially in the light of my proposal to use
> 
>   raise SomeException(42)
> 
> in preference over (and perhaps exclusively instead of)
> 
>   raise SomeException, 42
> 
> in Py3k. The latter IMO is a relic from the days of string exceptions
> which are as numbered as they come. :-)

I think this is the answer:

http://mail.python.org/pipermail/python-dev/2006-August/068165.html

Georg


From g.brandl at gmx.net  Mon Aug 14 21:13:50 2006
From: g.brandl at gmx.net (Georg Brandl)
Date: Mon, 14 Aug 2006 21:13:50 +0200
Subject: [Python-3000] Python/C++ question
In-Reply-To: <ca471dc20608141113y15e6ba9u3ea405905a0ca0ad@mail.gmail.com>
References: <44DA6C01.2040904@acm.org>	<ca471dc20608091618k3cffb3ewa3029794f0f02761@mail.gmail.com>	<44DF0800.4060204@acm.org>
	<ebn5ms$mne$1@sea.gmane.org>
	<ca471dc20608141113y15e6ba9u3ea405905a0ca0ad@mail.gmail.com>
Message-ID: <ebqi1f$80m$2@sea.gmane.org>

Guido van Rossum wrote:
> On 8/13/06, Georg Brandl <g.brandl at gmx.net> wrote:
>> Talin wrote:
>> > Guido van Rossum wrote:
>> >> On 8/9/06, Talin <talin at acm.org> wrote:
>> >> For the majority of Python developers it's probably the other way
>> >> around. It's been 15 years since I wrote C++, and unlike C, that
>> >> language has changed a lot since then...
>> >>
>> >> It would be a complete rewrite; I prefer doing a gradual
>> >> transmogrification of the current codebase into Py3k rather than
>> >> starting from scratch (read Joel Spolsky on why).
>> >
>> > BTW, Should this be added to PEP 3099?
>>
>> Yes, why not.
> 
> Although perhaps it makes more sense to add something positive to PEP 3000, e.g.
> 
> Implementation Language
> =======================
> 
> Python 3000 will be implemented in C, and the implementation will be
> derived as an evolution of the Python 2 code base. This reflects my
> views (which I share with Joel Spolsky) on the dangers of complete
> rewrites. Since Python 3000 as a language is a relatively mild
> improvement on Python 2, we can gain a lot by not attempting to
> reimplement the language from scratch. I am not against parallel
> from-scratch implementation efforts, but my own efforts will be
> directed at the language and implementation that I know best.

I had already added something to PEP 3099, but if you like that approach
better, I'll add that to PEP 3000.

Georg


From tim.peters at gmail.com  Mon Aug 14 21:15:30 2006
From: tim.peters at gmail.com (Tim Peters)
Date: Mon, 14 Aug 2006 15:15:30 -0400
Subject: [Python-3000] threading, part 2
In-Reply-To: <ca471dc20608141117l1c61247fy39bae2b00d45675d@mail.gmail.com>
References: <1d85506f0608111713m15cf2e67v8b94f06c928e9125@mail.gmail.com>
	<ca471dc20608141117l1c61247fy39bae2b00d45675d@mail.gmail.com>
Message-ID: <1f7befae0608141215y72e827cfo4f541b7e5fe927a8@mail.gmail.com>

[tomer filiba]
>> i mailed this to several people separately, but then i thought it could
>> benefit the entire group:
>>
>> http://sebulba.wikispaces.com/recipe+thread2
>>
>> it's an implementation of the proposed "thread.raise_exc",
>> ...

[Guido]
> Cool. Question: what's the problem with raising exception instances?

See

    http://mail.python.org/pipermail/python-dev/2006-August/068165.html

Short course:  in ceval.c,

    x = tstate->async_exc;
    ...
    PyErr_SetNone(x);

That is, with the current code it's only possible to set the exception
type via PyThreadState_SetAsyncExc(); the exception value is forced to
None/NULL.  What was the intent ;-)?

Example:

"""
from time import sleep
import ctypes, thread, sys, threading

setexc = ctypes.pythonapi.PyThreadState_SetAsyncExc

f_done = threading.Event()
def f():
    try:
        while 1:
            sleep(1)
    finally:
        f_done.set()

tid = thread.start_new_thread(f, ())
exc = ValueError("13")

setexc(ctypes.c_long(tid),
       ctypes.py_object(exc))

f_done.wait()
"""

Output:

Unhandled exception in thread started by <function f at 0x009E8370>
Traceback (most recent call last):
  File "setexc.py", line 12, in f
    f_done.set()
  File "C:\Code\python\lib\threading.py", line 351, in set
    self.__cond.release()
SystemError: 'finally' pops bad exception


Change `exc` to, e.g.,

exc = ValueError

and then it's fine:

Unhandled exception in thread started by <function f at 0x009E8370>
Traceback (most recent call last):
  File "setexc.py", line 12, in f
    f_done.set()
  File "C:\Code\python\lib\threading.py", line 349, in set
    self.__cond.notifyAll()
  File "C:\Code\python\lib\threading.py", line 265, in notifyAll
    self.notify(len(self.__waiters))
  File "C:\Code\python\lib\threading.py", line 258, in notify
    waiter.release()
ValueError

From jimjjewett at gmail.com  Mon Aug 14 21:26:18 2006
From: jimjjewett at gmail.com (Jim Jewett)
Date: Mon, 14 Aug 2006 15:26:18 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <5.1.1.6.0.20060813013329.0226d240@sparrow.telecommunity.com>
References: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
	<20060812205512.197A.JCARLSON@uci.edu>
	<5.1.1.6.0.20060813013329.0226d240@sparrow.telecommunity.com>
Message-ID: <fb6fbf560608141226j7dd377e5pd361d9a340f70265@mail.gmail.com>

On 8/13/06, Phillip J. Eby <pje at telecommunity.com> wrote:

> However, it's only a problem if you insist on writing brain-damaged
> code.  If you want interoperability here, you must write tell-don't-ask
> code.  ... is it really the case that
> so many people don't know what tell-don't-ask code is or why you want
> it?  I guess maybe it's something that's only grasped by people who have
> experience writing code intended for interoperability.

> [Meanwhile, I'm not going to respond to the rest of your message, since it
> contained some things that appeared to me to be a mixture of ad hominem
> attack and straw man argument.  I hope that was not actually your intent.]

I did not intend to insult you.

My point is simply that what is obvious to you -- and even what is
obvious to almost anyone experienced enough to be reading this message
-- won't be obvious to everyone first starting out.

I want to be able to use a new programmer's first contribution.

I absolutely don't want to tell them "Great, but you really should
have used XYZ.  We didn't really make that explicit because
experienced folks tend to do it naturally."

-jJ

From exarkun at divmod.com  Mon Aug 14 21:34:25 2006
From: exarkun at divmod.com (Jean-Paul Calderone)
Date: Mon, 14 Aug 2006 15:34:25 -0400
Subject: [Python-3000] Ctypes as cross-interpreter C calling interface
In-Reply-To: <ca471dc20608141108u118487ccw16cc8527c6f24744@mail.gmail.com>
Message-ID: <20060814193425.1717.135462452.divmod.quotient.22922@ohm>

On Mon, 14 Aug 2006 11:08:56 -0700, Guido van Rossum <guido at python.org> wrote:
>On 8/14/06, Jean-Paul Calderone <exarkun at divmod.com> wrote:
>>On Mon, 14 Aug 2006 09:09:49 -0700, Guido van Rossum <guido at python.org> 
>>wrote:
>I also consider .NET's CLR a success, based on the testimony of Jim
>Hugunin (who must be Microsoft's most reluctant employee :).
>
>And I see the JVM as a successful case too -- Jython can link to
>anything written in Java or compiled to JVM bytecode, and so can other
>languages that use JVM introspection the same way as Jython (I hear
>there's a Ruby analogue).

These successes are necessarily limited in scope.  Jython can use any
Java library, and that's great, as far as it goes.  Clearly, though,
it isn't a complete solution.  Relying on Parrot to have a rich library
of wrapper modules seems ill-advised.  If it /already/ had a rich library,
then maybe it would seem more reasonable.

>
>The major difference between all these examples and ctypes is that
>ctypes has no way of introspecting the wrapped library; you have to
>repeat everything you know about the API in your calls to ctypes (and
>as was just shown in another thread about 64-bit issues, that's not
>always easy).

The codegenerator package which is closely related to ctypes is capable of
this as well.  PyPy has a complete ctypes-based OpenSSL wrapper which is
automatically generated.

>That's not exactly the point I am making. I find Parrot's approach,
>assuming the project won't fail due to internal friction, much more
>long-term viable than ctypes. The big difference being (I hope)
>introspective generation of APIs rather than having to repeat the
>linkage information in each client language.

Given the existence of codegenerator, do you still find Parrot's approach
more viable?  It seems to me that it easily levels the playing field, and
makes ctypes still more attractive than Parrot, since it side-steps the
not insignificant internal political issues with the Parrot team.

>This seems a mostly theoretical viewpoint to me. Can you point me to
>an example of a Python-like language that is successful in reusing a
>Lisp runtime? (And I don't consider Lisp or Scheme Python-like in this
>context. ;-)

PyPy has a Common Lisp backend.  It's not the primary target, but it's not
inconceivable that it could someday provide an ffi from a Common Lisp
runtime to Python programs.

There has also been work done on an IL backend for PyPy.  This could be
used to make any CLR library available to Python programs.

Of course, with those two examples in hand, we see a fundamental drawback
to the Parrot-style solution (of which these are both essentially examples).
What if I want to use the CL FFI at the same time as a library exposed via
.NET?  I'm out of luck.  Had the libraries I wanted both been wrapped with
ctypes, I could have used them both from either runtime.

In general, what are alternate runtimes like PyPy to do if Parrot becomes
the de facto standard for extension modules?  Link against Parrot?  Suffer
without those modules until someone does a custom binding for that runtime?

Jean-Paul

From collinw at gmail.com  Mon Aug 14 21:41:11 2006
From: collinw at gmail.com (Collin Winter)
Date: Mon, 14 Aug 2006 15:41:11 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <1cb725390608131057y122b0c0wf81611e136659793@mail.gmail.com>
References: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
	<20060812205512.197A.JCARLSON@uci.edu>
	<5.1.1.6.0.20060813013329.0226d240@sparrow.telecommunity.com>
	<5.1.1.6.0.20060813125944.056a3f40@sparrow.telecommunity.com>
	<1cb725390608131057y122b0c0wf81611e136659793@mail.gmail.com>
Message-ID: <43aa6ff70608141241w43b7b694k77e63ba6766a1f55@mail.gmail.com>

On 8/13/06, Paul Prescod <paul at prescod.net> wrote:
> "In order for processors of function annotations to work interoperably, they
> must use a common interpretation of objects used as annotations on a
> particular function. For example, one might interpret string annotations as
> docstrings. Another might interpret them as path segments for a web
> framework. For this reason, function annotation processors SHOULD avoid
> assigning processor-specific meanings to types defined outside of the
> processor's framework. For example, a Django processor could process
> annotations of a type defined in a Zope package, but Zope's creators should
> be considered the authorities on the type's meaning for the same reasons
> that they would be considered authorities on the semantics of classes or
> methods in their packages."

The way I read this, it forces (more or less) each
annotation-consuming library to invent new ways to spell Python's
built-in types.

I read all this as saying that annotation processors should avoid
using Python's lists, tuples and dicts in annotations (since whatever
semantics the Python developers come up with will inevitably be
incompatible with what some library writer needs/wants). Each
processor library will then define my_processor.List,
my_processor.Tuple, my_processor.Dict, etc as alternate spellings for
[x, y, z], (x, y, z), {x: y} and so on.
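A quick sketch of what those processor-specific spellings might look like (the class names are hypothetical, echoing the `my_processor.List` idea above):

```python
# Hypothetical spellings an annotation consumer might invent so that
# the plain built-in containers stay free of processor-specific meaning.
class List:
    def __init__(self, *items):
        self.items = items

class Tuple:
    def __init__(self, *items):
        self.items = items

class Dict:
    def __init__(self, key, value):
        self.key, self.value = key, value

# Where plain Python would write [int, str], this processor's users write:
spec = List(int, str)
```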

> "This implies that the interpretation of built-in types would be controlled
> by Python's developers and documented in Python's documentation.

The inherent difficulty in defining a standard interpretation for
these types is what motivated me to leave this up to the authors of
annotation consumers. I don't mean "it was hard so I gave up"; I can
easily come up with a standard, but it will probably be of limited or
no utility to some section of the possible userbase.

If you have an idea, though, feel free to propose something concrete.

>  "In Python 3000, semantics will be attached to the following types: objects
> of type string (or subtype of string) are to be used for documentation
> (though they are not necessarily the exclusive source of documentation about
> the type). Objects of type list (or subtype of list) are to be used for
> attaching multiple independent annotations."

Does this mean all lists "are to be used for attaching multiple
independent annotations", or just top-level lists (ie, "def foo(a: [x,
y])" indicates two independent annotations)? What does "def foo(a: [x,
[y, z]])" indicate?

Collin Winter

From paul at prescod.net  Mon Aug 14 22:00:59 2006
From: paul at prescod.net (Paul Prescod)
Date: Mon, 14 Aug 2006 13:00:59 -0700
Subject: [Python-3000] Ctypes as cross-interpreter C calling interface
In-Reply-To: <ca471dc20608141108u118487ccw16cc8527c6f24744@mail.gmail.com>
References: <ca471dc20608140909o730ab1e0i86c6d562cfa90abd@mail.gmail.com>
	<20060814172000.1717.863905740.divmod.quotient.22821@ohm>
	<ca471dc20608141108u118487ccw16cc8527c6f24744@mail.gmail.com>
Message-ID: <1cb725390608141300o7b6e6503x23e6c7b9cf31b92f@mail.gmail.com>

On 8/14/06, Guido van Rossum <guido at python.org> wrote:
>
>
> The major difference between all these examples and ctypes is that
> ctypes has no way of introspecting the wrapped library; you have to
> repeat everything you know about the API in your calls to ctypes (and
> as was just shown in another thread about 64-bit issues, that's not
> always easy).


An excellent point and very clarifying (though I still don't totally
understand the relationship with Parrot).

What do you think about techniques like these:

 * http://starship.python.net/crew/theller/ctypes/old/codegen.html

 * http://lists.copyleft.no/pipermail/pyrex/2006-June/001885.html

I agree that this is an issue.

On the other hand, given N methods and objects that you need wrapped,
you will in general need to make N individual mapping statements no
matter what technology you use. The question is how many lines each
mapping takes. Ctypes currently requires you to re-declare what you
know about the C library. Hand-written C extensions require you to
jump through other hoops.

For example, looking at Pygame ctypes, consider the following method:

    def __copy__(self):
        return Rect(self.x, self.y, self.w, self.h)

That's the ctypes version. Here's the C version:

/* for copy module */
static PyObject* rect_copy(PyObject* oself, PyObject* args)
{
    PyRectObject* self = (PyRectObject*)oself;
    return PyRect_New4(self->r.x, self->r.y, self->r.w, self->r.h);
}

static struct PyMethodDef rect_methods[] =
{
...        {"__copy__",            (PyCFunction)rect_copy, 0, NULL},...
};

So there is some repetition there as well (casts, function name
duplications, etc.).

 Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060814/a281094f/attachment.htm 

From paul at prescod.net  Mon Aug 14 22:20:54 2006
From: paul at prescod.net (Paul Prescod)
Date: Mon, 14 Aug 2006 13:20:54 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <43aa6ff70608141241w43b7b694k77e63ba6766a1f55@mail.gmail.com>
References: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
	<20060812205512.197A.JCARLSON@uci.edu>
	<5.1.1.6.0.20060813013329.0226d240@sparrow.telecommunity.com>
	<5.1.1.6.0.20060813125944.056a3f40@sparrow.telecommunity.com>
	<1cb725390608131057y122b0c0wf81611e136659793@mail.gmail.com>
	<43aa6ff70608141241w43b7b694k77e63ba6766a1f55@mail.gmail.com>
Message-ID: <1cb725390608141320n11683af8q27a75309011a512c@mail.gmail.com>

On 8/14/06, Collin Winter <collinw at gmail.com> wrote:
>
> The way I read this, it forces (more or less) each
> annotation-consuming library to invent new ways to spell Python's
> built-in types.


I think that this is related to your other question: what if an
annotation-consuming library wanted to use Python's built-in types
nested within its own top-level structures?

def foo(a: xxx([x, y, z])): ...

I would say that the innermost list has its semantics (as metadata) defined
by "xxx", not raw Python. That's the only reasonable thing.
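A minimal sketch of the wrapper idea (`xxx` is Paul's placeholder name; the class body here is purely illustrative):

```python
class xxx:
    """Hypothetical annotation wrapper: once a list is wrapped in it,
    the list's interpretation belongs to this framework, not to
    'raw' Python."""
    def __init__(self, payload):
        self.payload = payload

def foo(a: xxx([int, str, float])):
    return a

# The consumer retrieves the wrapper and interprets the inner list itself:
ann = foo.__annotations__['a']
```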

> "This implies that the interpretation of built-in types would be
> controlled
> > by Python's developers and documented in Python's documentation.
>
> The inherent difficulty in defining a standard interpretation for
> these types is what motivated me to leave this up to the authors of
> annotation consumers.


There are three issues: first, we need to RESERVE the types for
standardization by Guido and crew. Second, we can decide to do the
standardization at any point. Third, we absolutely need a standard for
multiple independent annotations on a parameter. Using lists is a
no-brainer. So let's do that.

> If you have an idea, though, feel free to propose something concrete.


Yes, my proposal is here:

>  "In Python 3000, semantics will be attached to the following types:
> objects
> > of type string (or subtype of string) are to be used for documentation
> > (though they are not necessarily the exclusive source of documentation
> about
> > the type). Objects of type list (or subtype of list) are to be used for
> > attaching multiple independent annotations."
>
> Does this mean all lists "are to be used for attaching multiple
> independent annotations", or just top-level lists (ie, "def foo(a: [x,
> y])" indicates two independent annotations)? What does "def foo(a: [x,
> [y, z]])" indicate?


I meant only top-level lists. I hadn't thought through nesting.

def foo(a: [x, y, [a, b, c]]): ...

This should probably be just handled recursively or disallowed. I don't feel
strongly either way.

 Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060814/ff1b6d61/attachment.html 

From jimjjewett at gmail.com  Mon Aug 14 22:24:15 2006
From: jimjjewett at gmail.com (Jim Jewett)
Date: Mon, 14 Aug 2006 16:24:15 -0400
Subject: [Python-3000] PEP3102 Keyword-Only Arguments; Signature
Message-ID: <fb6fbf560608141324ie44f75anc33876a18ae202e0@mail.gmail.com>

On 8/14/06, Steven Bethard <steven.bethard at gmail.com> wrote:
> On 8/14/06, Guido van Rossum <guido at python.org> wrote:
> > I believe the PEP doesn't address the opposite use case: positional
> > arguments that should *not* be specified as keyword arguments.

...
> It would be really nice in the example above to mark ``self`` in
> ``__call__`` as a positional only argument.

Would this have to be in the standard function prologue, or would it
be acceptable to modify a function's Signature object?

As I see it, each argument can be any combination of the following:

    positional
    keyword
    named
    defaulted
    annotated

I can see some value in supporting all 32 possibilities, but doing it
directly as part of the def syntax might get awkward.

Most arguments are both positional and keyword.  The bare * will
support keyword-only, and you're asking for positional-only.  (An
argument which is neither positional nor keyword doesn't make sense.)

Today (except in extension code), an argument that isn't named only
appears courtesy of *args or **kwargs.

Today, named + keyword <==> defaulted

Today, arguments are not annotated.

Would it be acceptable if functions contained a (possibly implicit)
Signature object, and the way to get the odd combinations were through
modifying that?

For example:

    def unnamedargs(func):
        for arg in func.Signature:
            arg.name=None
        return func
...
        @unnamedargs
        def write(self, s):


-jJ

From steven.bethard at gmail.com  Mon Aug 14 22:34:54 2006
From: steven.bethard at gmail.com (Steven Bethard)
Date: Mon, 14 Aug 2006 14:34:54 -0600
Subject: [Python-3000] PEP3102 Keyword-Only Arguments
In-Reply-To: <ca471dc20608141104i154efbfehf88e8f10f7877ea8@mail.gmail.com>
References: <b008462b0608111620q709e691fqdb6283b194e1a893@mail.gmail.com>
	<fb6fbf560608122056w5a9af394ga358614c0d8d10d7@mail.gmail.com>
	<ca471dc20608141038w55d67754s9407f52eaa5ce64b@mail.gmail.com>
	<d11dcfba0608141049p24c03471k3c0252bd188ee5e7@mail.gmail.com>
	<ca471dc20608141104i154efbfehf88e8f10f7877ea8@mail.gmail.com>
Message-ID: <d11dcfba0608141334g8f55c82h44616c0f02397f2c@mail.gmail.com>

On 8/14/06, Guido van Rossum <guido at python.org> wrote:
> On 8/14/06, Steven Bethard <steven.bethard at gmail.com> wrote:
> > On 8/14/06, Guido van Rossum <guido at python.org> wrote:
> > > I believe the PEP doesn't address the opposite use case: positional
> > > arguments that should *not* be specified as keyword arguments. For
> > > example, I might want to write
> > >
> > >   def foo(a, b): ...
> > >
> > > but I don't want callers to be able to call it as foo(b=1, a=2) or
> > > even foo(a=2, b=1).
> >
> > Another use case is when you want to accept the arguments of another
> > callable, but you have your own positional arguments::
> >
> >     >>> class Wrapper(object):
> >     ...     def __init__(self, func):
> >     ...         self.func = func
> >     ...     def __call__(self, *args, **kwargs):
> >     ...         print 'calling wrapped function'
> >     ...         return self.func(*args, **kwargs)
> >     ...
> >     >>> @Wrapper
> >     ... def func(self, other):
> >     ...     return self, other
> >     ...
> >     >>> func(other=1, self=2)
> >     Traceback (most recent call last):
> >       File "<interactive input>", line 1, in ?
> >     TypeError: __call__() got multiple values for keyword argument 'self'
> >
> > It would be really nice in the example above to mark ``self`` in
> > ``__call__`` as a positional only argument.
>
> But this is a rather unusual use case isn't it? It's due to the bound
> methods machinery. Do you have other use cases?

Well, for example, unitest.TestCase.failUnlessRaises works this way.
Here's the method signature::

    def failUnlessRaises(self, excClass, callableObj, *args, **kwargs):

Which means that if you write::

    self.failUnlessRaises(TypeError, my_func, callableObj=foo)

you'll get an error since there's a name clash between the callableObj
taken by failUnlessRaises and the one taken by the my_func object.

OTOH, I haven't run into this error because I don't use camelCase
names.  Perhaps the right answer is to always use camelCase for any
arguments whose names you don't want conflicting, and then any
PEP 8 compliant code will never have problems. ;-)
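The clash can be reproduced with a plain function (a simplified, hypothetical stand-in for `unittest.TestCase.failUnlessRaises`, without the `self` machinery):

```python
def fail_unless_raises(excClass, callableObj, *args, **kwargs):
    # simplified stand-in for unittest.TestCase.failUnlessRaises
    try:
        callableObj(*args, **kwargs)
    except excClass:
        return
    raise AssertionError("%s not raised" % excClass.__name__)

def my_func(callableObj):
    raise TypeError("boom")

# The keyword is intended for my_func, but it collides with the
# parameter of the same name in fail_unless_raises:
try:
    fail_unless_raises(TypeError, my_func, callableObj=print)
except TypeError as e:
    print(e)  # "got multiple values for argument 'callableObj'"
```

Passing `callableObj` positionally instead works fine, which is exactly the asymmetry being discussed.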

> > > Perhaps we can use ** without following identifier to signal this?
> > > It's not entirely analogous to * without following identifier, but at
> > > least somewhat similar.
> >
> > I'm certainly not opposed to going this way, but I don't think it
> > would solve the problem above since you still need to take keyword
> > arguments.
>
> Can you elaborate?

Well, taking the failUnlessRaises signature above, if you wanted to
specify that ``self``, ``excClass`` and ``callableObj`` were
positional only arguments, I believe you'd have to write::

    def failUnlessRaises(self, excClass, callableObj, *args, **):

I believe that means that you can't use failUnlessRaises to call a
method that expects keyword arguments, e.g.::

    self.assertRaises(OptionError, parser.add_option, type='foo')


STeVe
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From paul at prescod.net  Mon Aug 14 22:51:10 2006
From: paul at prescod.net (Paul Prescod)
Date: Mon, 14 Aug 2006 13:51:10 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <fb6fbf560608141211k528e754eva03e90612dff9ecd@mail.gmail.com>
References: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
	<20060812205512.197A.JCARLSON@uci.edu>
	<5.1.1.6.0.20060813013329.0226d240@sparrow.telecommunity.com>
	<5.1.1.6.0.20060813125944.056a3f40@sparrow.telecommunity.com>
	<1cb725390608131057y122b0c0wf81611e136659793@mail.gmail.com>
	<fb6fbf560608141211k528e754eva03e90612dff9ecd@mail.gmail.com>
Message-ID: <1cb725390608141351n78099df6s6bf4359758d18b10@mail.gmail.com>

On 8/14/06, Jim Jewett <jimjjewett at gmail.com> wrote:
>
> Should annotation objects with defined semantics have some standard
> way to indicate this?  (By analogy, new exceptions *should* inherit
> from Exception; should annotation objects inherit from an Annotation
> class, at least as a mixin?)


All annotation objects have defined semantics (somewhere) or else they are
useless. I don't see any benefit in making them inherit from anything in
particular. Python has a very specific reason for requiring that in the
exception case. I'd rather not complicate the design without a good reason.

> "This implies that the interpretation of built-in types would be
> controlled
> > by Python's developers and documented in Python's documentation.
>
> It also implies that the interpretation of annotations made with a
> built-in type should be safe -- they shouldn't trigger any
> irreversible actions.


I disagree and don't think you can come up with a clear definition of
"irreversible" in any case. Is spitting out text to a stream "irreversible"?
I'd rather not complicate stuff.

>  "In Python 3000, semantics will be attached to the following types:
> objects
> > of type string (or subtype of string) are to be used for documentation
> > (though they are not necessarily the exclusive source of documentation
> about
> > the type). Objects of type list (or subtype of list) are to be used for
> > attaching multiple independent annotations."
>
> subtypes should be available for other frameworks.


I'd be happy to remove the whole subtype clause. I don't care much either
way. But anyhow I (now) disagree that there is a problem as stated. If a
framework wants to use a subtype of list they just need to wrap it in a
top-level wrapper that makes the association.

def foo(a: xxx(mylist_subtype(a, b, c))):

This is clear thanks to Collin Winter's recent post.

> This implies that something other than lists should be used if the
> annotations are not independent.  The obvious candidates are tuples
> and dicts, but this should be explicit (or explicitly not defined).


The "dependence" between annotations is totally up to the framework. To repeat
the example:

def foo(a: xxx(mylist_subtype(a, b, c))):

xxx might say that a is passed as a ".next" attribute to b which is passed
as a ".next" attribute to "c". Or xxx might say that "a" is passed to "b"'s
constructor which is passed to "c"'s constructor. Remember that "xxx" is
executable so it could do whatever it wants. It should just document what it
did so that various libraries know how to navigate the object structure it
creates.

> The definition of a type as an annotation should probably be either
> defined or explicitly undefined.  Earlier discussions talked about
> things like
>
>     def f (a:int, b:(float | Decimal), c:[int, str, X]) ->str)


I think that's a separate (large!) PEP. This PEP should disallow frameworks
from inventing their own meaning for this syntax (requiring them to at least
wrap). Then Guido and crew can dig into this issue on their own schedule.

 Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060814/1629696c/attachment.html 

From collinw at gmail.com  Mon Aug 14 23:03:56 2006
From: collinw at gmail.com (Collin Winter)
Date: Mon, 14 Aug 2006 16:03:56 -0500
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <1cb725390608141320n11683af8q27a75309011a512c@mail.gmail.com>
References: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
	<20060812205512.197A.JCARLSON@uci.edu>
	<5.1.1.6.0.20060813013329.0226d240@sparrow.telecommunity.com>
	<5.1.1.6.0.20060813125944.056a3f40@sparrow.telecommunity.com>
	<1cb725390608131057y122b0c0wf81611e136659793@mail.gmail.com>
	<43aa6ff70608141241w43b7b694k77e63ba6766a1f55@mail.gmail.com>
	<1cb725390608141320n11683af8q27a75309011a512c@mail.gmail.com>
Message-ID: <43aa6ff70608141403i36dfeefcn2cb1aa7f803b5579@mail.gmail.com>

On 8/14/06, Paul Prescod <paul at prescod.net> wrote:
> There are three issues: first, we need to RESERVE the types for
> standardization by Guido and crew.

You're just pushing the decision off to someone else. Regardless of
who makes it, decisions involving the built-in types are going to make
some group unhappy. This list saw several discussions related to
standard interpretations for the built-in types back in May and June;
here's a selection for you to catch up on:

http://mail.python.org/pipermail/python-3000/2006-May/002134.html
http://mail.python.org/pipermail/python-3000/2006-May/002216.html
http://mail.python.org/pipermail/python-3000/2006-June/002438.html

One particularly divisive issue is whether tuples should be treated as
fixed- or arbitrary-length containers. Concretely, does
"tuple(Number)" match only 1-tuples with a single Number element, or
does it match all tuples that have only Number elements?
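The two readings can be spelled out as checker functions (hypothetical code, just to make the ambiguity concrete):

```python
from numbers import Number

def matches_fixed(value, *elem_types):
    # "tuple(Number)" as a fixed-length spec: only 1-tuples match
    return (isinstance(value, tuple)
            and len(value) == len(elem_types)
            and all(isinstance(v, t) for v, t in zip(value, elem_types)))

def matches_homogeneous(value, elem_type):
    # "tuple(Number)" as an arbitrary-length spec: any all-Number tuple
    return (isinstance(value, tuple)
            and all(isinstance(v, elem_type) for v in value))

print(matches_fixed((1,), Number))          # True
print(matches_fixed((1, 2), Number))        # False
print(matches_homogeneous((1, 2), Number))  # True
```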

Regardless of which you pick, somebody's going to be pissed.

> Second, we can decide to do the standardization at any point.

Um, "at any point"? You mean it's conceivable that this
standardisation could come *after* Python ships with function
annotations?

Collin Winter

From paul at prescod.net  Mon Aug 14 23:18:07 2006
From: paul at prescod.net (Paul Prescod)
Date: Mon, 14 Aug 2006 14:18:07 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <43aa6ff70608141403i36dfeefcn2cb1aa7f803b5579@mail.gmail.com>
References: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
	<20060812205512.197A.JCARLSON@uci.edu>
	<5.1.1.6.0.20060813013329.0226d240@sparrow.telecommunity.com>
	<5.1.1.6.0.20060813125944.056a3f40@sparrow.telecommunity.com>
	<1cb725390608131057y122b0c0wf81611e136659793@mail.gmail.com>
	<43aa6ff70608141241w43b7b694k77e63ba6766a1f55@mail.gmail.com>
	<1cb725390608141320n11683af8q27a75309011a512c@mail.gmail.com>
	<43aa6ff70608141403i36dfeefcn2cb1aa7f803b5579@mail.gmail.com>
Message-ID: <1cb725390608141418y4c111070l73554a2a959e5d72@mail.gmail.com>

On 8/14/06, Collin Winter <collinw at gmail.com> wrote:
>
> On 8/14/06, Paul Prescod <paul at prescod.net> wrote:
> > There are three issues: first, we need to RESERVE the types for
> > standardization by Guido and crew.
>
> You're just pushing the decision off to someone else. Regardless of
> who makes it, decisions involving the built-in types are going to make
> some group unhappy.


Yes, I know. I spent about a month of my life going through the same process
back around 2003.

> Second, we can decide to do the standardization at any point.
>
> Um, "at any point"? You mean it's conceivable that this
> standardisation could come *after* Python ships with function
> annotations?


Sure. Why not?

All I'm saying is that the "function annotations" PEP should not depend on
the "function annotations for static type declarations" PEP. That was
implicit in your original pre-PEP!

If the "static type declarations PEP" misses the Python 3000 deadline then
the function annotations feature is still valuable. The former could be used
as a testbed for the latter:

def myfunc(NumTuples: [typepackage1(tuple(Number)),
                       typepackage2("tuple(Number+)")]): ...

 Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060814/2ac569eb/attachment.htm 

From collinw at gmail.com  Mon Aug 14 23:23:48 2006
From: collinw at gmail.com (Collin Winter)
Date: Mon, 14 Aug 2006 16:23:48 -0500
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <1cb725390608141418y4c111070l73554a2a959e5d72@mail.gmail.com>
References: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
	<20060812205512.197A.JCARLSON@uci.edu>
	<5.1.1.6.0.20060813013329.0226d240@sparrow.telecommunity.com>
	<5.1.1.6.0.20060813125944.056a3f40@sparrow.telecommunity.com>
	<1cb725390608131057y122b0c0wf81611e136659793@mail.gmail.com>
	<43aa6ff70608141241w43b7b694k77e63ba6766a1f55@mail.gmail.com>
	<1cb725390608141320n11683af8q27a75309011a512c@mail.gmail.com>
	<43aa6ff70608141403i36dfeefcn2cb1aa7f803b5579@mail.gmail.com>
	<1cb725390608141418y4c111070l73554a2a959e5d72@mail.gmail.com>
Message-ID: <43aa6ff70608141423w64afca33uc284417cec4a62fe@mail.gmail.com>

On 8/14/06, Paul Prescod <paul at prescod.net> wrote:
> On 8/14/06, Collin Winter <collinw at gmail.com> wrote:
> > On 8/14/06, Paul Prescod <paul at prescod.net> wrote:
> > > Second, we can decide to do the standardization at any point.
> >
> > Um, "at any point"? You mean it's conceivable that this
> > standardisation could come *after* Python ships with function
> > annotations?
>
> Sure. Why not?

Because not having standardised meanings at the same time as the
feature becomes available says to developers, "don't use the built-in
types in your annotations because we might give them a meaning
later...or maybe we won't...but in the meantime, you're going to need
to invent new spellings for lists, tuples, dicts, sets, strings, just
in case". As someone writing an annotation consumer, that comes across
as an incredibly arbitrary decision that forces me to do a lot of
extra work.

Collin Winter

From collinw at gmail.com  Mon Aug 14 23:59:38 2006
From: collinw at gmail.com (Collin Winter)
Date: Mon, 14 Aug 2006 16:59:38 -0500
Subject: [Python-3000] Conventions for annotation consumers (was: Re: Draft
	pre-PEP: function annotations)
Message-ID: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com>

On 8/14/06, Paul Prescod <paul at prescod.net> wrote:
> Third, we absolutely need a standard for
> multiple independent annotations on a parameter. Using lists is a
> no-brainer. So let's do that.

The problem with using lists is that it's impossible for non-decorator
annotation consumers to know which element "belongs" to them.

Way back in http://mail.python.org/pipermail/python-3000/2006-August/002865.html,
Nick Coghlan said:
> However, what we're really talking about here is a scenario where you're
> defining your *own* custom annotation processor: you want the first part of
> the tuple in the expression handled by the type processing library, and the
> second part handled by the docstring processing library.
>
> Which says to me that the right solution is for the annotation to be split up
> into its constituent parts before the libraries ever see it.
>
> This could be done as Collin suggests by tampering with
> __signature__.annotations before calling each decorator, but I think it is
> cleaner to do it by defining a particular signature for decorators that are
> intended to process annotations.
>
> Specifically, such decorators should accept a separate dictionary to use in
> preference to the annotations on the function itself:
>
>    process_function_annotations(f, annotations=None):
>      # Process the function f
>      # If annotations is not None, use it
>      # otherwise, get the annotations from f.__signature__

I've come to like this idea more and more. Here's my stab at a
dict-based convention for specifying annotations for decorator-style
consumers:

There are several annotation consumers, docstring, typecheck and
constrain_values. Respectively, these treat annotations as
documentation; as restrictions on the type of an argument; as
restrictions on the values of an argument.

Each of these is defined something like

def consumer(annotated_function, annotations=sentinel):
    ...

If the consumer isn't given an `annotations` parameter, it is free to
assume it is the only consumer for the annotations on that function
and is free to treat the annotation expressions however it sees fit.
However, if it is given an `annotations` argument, it should observe
those annotations and only those annotations.

The more complete example:

@multiple_annotations(docstring, typecheck, constrain_values)
def foo(a: {'docstring': "Frobnication count",
            'typecheck': Number,
            'constrain_values': range(3, 9)},
        b: {'typecheck': Number,
            # This can be only 4, 8 or 12
            'constrain_values': [4, 8, 12]}) -> {'typecheck': Number}:
    ...


Here, multiple_annotations assumes that the annotation dicts are keyed
on consumer.__name__; the test "if consumer.__name__ in
per_parameter_annotations" should do nicely for figuring out whether a
given consumer should be provided an `annotations` argument. (It is up
to multiple_annotations() to decide whether "consumer.__name__ in
per_parameter_annotations == False" should raise an exception.)

Collin Winter

From jimjjewett at gmail.com  Tue Aug 15 00:03:17 2006
From: jimjjewett at gmail.com (Jim Jewett)
Date: Mon, 14 Aug 2006 18:03:17 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <1cb725390608141351n78099df6s6bf4359758d18b10@mail.gmail.com>
References: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
	<20060812205512.197A.JCARLSON@uci.edu>
	<5.1.1.6.0.20060813013329.0226d240@sparrow.telecommunity.com>
	<5.1.1.6.0.20060813125944.056a3f40@sparrow.telecommunity.com>
	<1cb725390608131057y122b0c0wf81611e136659793@mail.gmail.com>
	<fb6fbf560608141211k528e754eva03e90612dff9ecd@mail.gmail.com>
	<1cb725390608141351n78099df6s6bf4359758d18b10@mail.gmail.com>
Message-ID: <fb6fbf560608141503s53f0beafx729321f74bda917c@mail.gmail.com>

On 8/14/06, Paul Prescod <paul at prescod.net> wrote:

> > > "This implies that the interpretation of built-in types would be
> controlled
> > > by Python's developers and documented in Python's documentation.

> > It also implies that the interpretation of annotations made with a
> > built-in type should be safe -- they shouldn't trigger any
> > irreversible actions.

> I disagree and don't think you can come up with a clear definition of
> "irreversible" in any case. Is spitting out text to a stream "irreversible"?
> I'd rather not complicate stuff.

That part is admittedly a guideline for development of python, rather
than with python.  The question is what happens with something like

    def f(a:int): ...

If the thing starts compiling (like Pyrex) to code which assumes an
int and doesn't verify, that would be a disaster waiting to happen --
unless int were explicitly reserved to the python core more strongly
than the proposed wording implies.

> I'd be happy to remove the whole subtype clause. I don't care much either
> way. But anyhow I (now) disagree that there is a problem as stated. If a
> framework wants to use a subtype of list they just need to wrap it in a
> top-level wrapper that makes the association.

> def foo(a: xxx(mylist_subtype(a, b, c))):

mylist_subtype is as unique as an object (but not as a name); if xxx
is sufficient disambiguation, then so is mylist_subtype on its own.

> > This implies that something other than lists should be used if the
> > annotations are not independent.  The obvious candidates are tuples
> > and dicts, but this should be explicit (or explicitly not defined).

> The "dependence" between notations is totally up to the framework. To repeat
> the example:

For builtin lists, the meaning should be reserved to the python core.
What does the following mean?

    def f(a:[int, str])

I assume it doesn't mean a list of int and str (because lists are used
for independent annotations).  I assume it also doesn't mean "int _or_
str" because the annotations are independent.  If the two are
supposed to be used together, then they should be chained with
something other than list.

> > The definition of a type as an annotation should probably be either
> > defined or explicitly undefined.  Earlier discussions talked about
> > things like

> >     def f (a:int, b:(float | Decimal), c:[int, str, X]) ->str)

> I think that's a separate (large!) PEP.

Agreed.  But I think the PEP should explicitly reserve the
(annotational) meaning of

(1)  builtin and standard library types, such as int and Decimal
(2)  The results of combining types with operators (such as |, +=, etc)
(3)  lists, tuples, and dictionaries of the above

It doesn't have to say what they mean, but it has to warn that a
standard meaning is contemplated, and that 3rd parties should consider
them reserved.

-jJ

From jimjjewett at gmail.com  Tue Aug 15 00:22:45 2006
From: jimjjewett at gmail.com (Jim Jewett)
Date: Mon, 14 Aug 2006 18:22:45 -0400
Subject: [Python-3000] Conventions for annotation consumers (was: Re:
	Draft pre-PEP: function annotations)
In-Reply-To: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com>
References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com>
Message-ID: <fb6fbf560608141522j64f611e9ndecd696214b2088c@mail.gmail.com>

On 8/14/06, Collin Winter <collinw at gmail.com> wrote:
> On 8/14/06, Paul Prescod <paul at prescod.net> wrote:

> The problem with using lists is that it's impossible for non-decorator
> annotation consumers to know which element "belongs" to them.

The ones whose type they own -- which is why I see at least some
parallel to exceptions and their inheritance-based semantics.

    def f(a:[mytype("asdfljasdf"),
             zope.mypackage.something(b,d,e),
             "a string",
             mytype([47]),
             15]):
        """Example of long compound annotations

Maybe annotations this size should just be restricted to Signature modification
instead of allowing them in the actual declaration?  At least by style guides?
"""

By the defined meaning of list, these are 5 independent annotations.

Whoever defined mytype controls the meaning of the mytype annotations;
anyone not familiar with that package should ignore them (and hope
there were no side effects in the expressions that generated them).

zope.mypackage controls that annotation; anyone not familiar with that
product should ignore it (and hope there were no side effects ...)

"a string" and 15 are builtin types -- so their semantics are defined
by core python, which says that they are documentation only --
stripping them off or changing them wouldn't break a properly written
program.

> Here, multiple_annotations assumes that the annotation dicts are keyed
> on consumer.__name__;

Too many consumers will call themselves "wrapper" or some such.  You
should key on the actual type object -- in which case you probably
want isinstance to support subtypes.
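A sketch of keying on the annotation's type rather than its name, so that `isinstance` also catches subtypes (the `mytype` marker class and `pick_annotations` helper are hypothetical):

```python
class mytype(str):
    """Marker type owned by one particular annotation consumer."""

def pick_annotations(annotation, owned_type):
    """From a list-style compound annotation, return only the
    elements whose type this consumer owns (subtypes included)."""
    items = annotation if isinstance(annotation, list) else [annotation]
    return [item for item in items if isinstance(item, owned_type)]

# Five independent annotations; only the mytype ones are "ours".
ann = [mytype("asdfljasdf"), "a string", mytype("other"), 15]
mine = pick_annotations(ann, mytype)
```

A bare `str` like `"a string"` is not picked up, because `mytype` subclasses `str` and not the other way around; ownership flows down the inheritance tree only.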

-jJ

From paul at prescod.net  Tue Aug 15 00:48:33 2006
From: paul at prescod.net (Paul Prescod)
Date: Mon, 14 Aug 2006 15:48:33 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <43aa6ff70608141423w64afca33uc284417cec4a62fe@mail.gmail.com>
References: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
	<20060812205512.197A.JCARLSON@uci.edu>
	<5.1.1.6.0.20060813013329.0226d240@sparrow.telecommunity.com>
	<5.1.1.6.0.20060813125944.056a3f40@sparrow.telecommunity.com>
	<1cb725390608131057y122b0c0wf81611e136659793@mail.gmail.com>
	<43aa6ff70608141241w43b7b694k77e63ba6766a1f55@mail.gmail.com>
	<1cb725390608141320n11683af8q27a75309011a512c@mail.gmail.com>
	<43aa6ff70608141403i36dfeefcn2cb1aa7f803b5579@mail.gmail.com>
	<1cb725390608141418y4c111070l73554a2a959e5d72@mail.gmail.com>
	<43aa6ff70608141423w64afca33uc284417cec4a62fe@mail.gmail.com>
Message-ID: <1cb725390608141548l2cf6f484rd6cf909cdb3637e7@mail.gmail.com>

On 8/14/06, Collin Winter <collinw at gmail.com> wrote:
>
> Because not having standardised meanings at the same time as the
> feature becomes available says to developers, "don't use the built-in
> types in your annotations because we might give them a meaning
> later...or maybe we won't...but in the meantime, you're going to need
> to invent new spellings for lists, tuples, dicts, sets, strings, just
> in case". As someone writing an annotation consumer, that comes across
> as an incredibly arbitrary decision that forces me to do a lot of
> extra work.


No, you aren't going to have to invent new spellings.  As per my previous
email, this should be allowed:

def myfunc( NumTuples: [typepackage1(tuple(int)),
                        typepackage2("tuple(Number+)")]): ...

All you need to do is declare the fact that you are using the built-in types
in a non-standard way by wrapping them in your own annotation constructor.
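Concretely, the wrapping described above might look like this (the `typepackage` class is a hypothetical stand-in for a consumer's own annotation constructor):

```python
class typepackage:
    """Wrapper claiming a built-in value for one consumer's use.

    The wrapped built-in (here a tuple of types) carries whatever
    meaning the consumer assigns; the bare built-in types keep any
    standard meaning Python later defines.
    """
    def __init__(self, spec):
        self.spec = spec

    def __repr__(self):
        return 'typepackage(%r)' % (self.spec,)

# The enclosing list keeps its proposed standard meaning (a list of
# independent annotations); the wrapped tuple is unambiguously owned
# by typepackage's consumer.
def myfunc(num_tuples: [typepackage((int, int))]):
    return num_tuples
```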

 Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060814/206991d8/attachment.html 

From collinw at gmail.com  Tue Aug 15 00:51:40 2006
From: collinw at gmail.com (Collin Winter)
Date: Mon, 14 Aug 2006 17:51:40 -0500
Subject: [Python-3000] Conventions for annotation consumers (was: Re:
	Draft pre-PEP: function annotations)
In-Reply-To: <fb6fbf560608141522j64f611e9ndecd696214b2088c@mail.gmail.com>
References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com>
	<fb6fbf560608141522j64f611e9ndecd696214b2088c@mail.gmail.com>
Message-ID: <43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com>

On 8/14/06, Jim Jewett <jimjjewett at gmail.com> wrote:
> On 8/14/06, Collin Winter <collinw at gmail.com> wrote:
> > The problem with using lists is that its impossible for non-decorator
> > annotation consumers to know which element "belongs" to them.
>
> The ones whose type they own -- which is why I see at least some
> parallel to exceptions, and its inheritance based semantics.
>
>     def f(a:[mytype("asdfljasdf"),
>              zope.mypackage.something(b,d,e),
>              "a string",
>              mytype([47]),
>              15]):
>
> Whoever defined mytype controls the meaning of the mytype annotations;
> anyone not familiar with that package should ignore them (and hope
> there were no side effects in the expressions that generated them).
>
> zope.mypackage controls that annotation; anyone not familiar with that
> product should ignore it (and hope there were no side effects ...)

As hideous as I think this is from an aesthetics/visual noise
standpoint, it's probably the only reliable way to let both decorator-
and non-decorator-based consumers work.

What would the rule be about top-level types? Would you have to use a
list, or could a set or dict be used?

Collin Winter

From paul at prescod.net  Tue Aug 15 01:18:14 2006
From: paul at prescod.net (Paul Prescod)
Date: Mon, 14 Aug 2006 16:18:14 -0700
Subject: [Python-3000] Conventions for annotation consumers (was: Re:
	Draft pre-PEP: function annotations)
In-Reply-To: <43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com>
References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com>
	<fb6fbf560608141522j64f611e9ndecd696214b2088c@mail.gmail.com>
	<43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com>
Message-ID: <1cb725390608141618r11e61720y7ad3c1ab410dccc5@mail.gmail.com>

On 8/14/06, Collin Winter <collinw at gmail.com> wrote:
>
> What would the rule be about top-level types? Would you have to use a
> list, or could a set or dict be used?


I argue for restricting to a list for the following reasons:

1. Better to just pick something for visual consistency (someone said they
liked tuples but I find all of the rounded parens confusing)

 2.  May want to use other types for other meanings in the future.

 3. What do you do with the keys of the dictionary? Is this back to
connecting decorators to annotations by name or something? The string
namespace is not very manageable.

 Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060814/4deed1fd/attachment.htm 

From guido at python.org  Tue Aug 15 02:26:42 2006
From: guido at python.org (Guido van Rossum)
Date: Mon, 14 Aug 2006 17:26:42 -0700
Subject: [Python-3000] PEP3102 Keyword-Only Arguments; Signature
In-Reply-To: <fb6fbf560608141324ie44f75anc33876a18ae202e0@mail.gmail.com>
References: <fb6fbf560608141324ie44f75anc33876a18ae202e0@mail.gmail.com>
Message-ID: <ca471dc20608141726m3ffc6f88o4013370def76b809@mail.gmail.com>

On 8/14/06, Jim Jewett <jimjjewett at gmail.com> wrote:
> On 8/14/06, Steven Bethard <steven.bethard at gmail.com> wrote:
> > On 8/14/06, Guido van Rossum <guido at python.org> wrote:
> > > I believe the PEP doesn't address the opposite use case: positional
> > > arguments that should *not* be specified as keyword arguments.
>
> ...
> > It would be really nice in the example above to mark ``self`` in
> > ``__call__`` as a positional only argument.
>
> Would this have to be in the standard function prologue, or would it
> be acceptable to modify a function's Signature object?
>
> As I see it, each argument can be any combination of the following:
>
>     positional
>     keyword
>     named
>     defaulted
>     annotated
>
> I can see some value in supporting all 32 possibilities, but doing it
> directly as part of the def syntax might get awkward.

Perhaps. Though you're making it seem worse than it is by adding
annotated (which should be considered completely orthogonal to the
rest, and may not combine with everything else).

> Most arguments are both positional and keyword.  The bare * will
> support keyword-only, and you're asking for positional-only.  (An
> argument which is neither positional nor keyword doesn't make sense.)
>
> Today (except in extension code), an argument that isn't named only
> appears courtesy of *args or **kwargs.
>
> Today, named + keyword <==> defaulted

I'm not sure I follow. You seem to be perpetuating the eternal
misunderstanding that from the caller's POV this is not a keyword
argument:

  def foo(a): pass

In fact, calling foo(a=1) is totally legal.

> Today, arguments are not annotated.
>
> Would it be acceptable if functions contained a (possibly implicit)
> Signature object, and the way to get the odd combinations were through
> modifying that?
>
> For example:
>
>     def unnamedargs(func):
>         for arg in func.Signature:
>             arg.name=None
>         return func
> ...
>         @unnamedargs
>         def write(self, s):

This seems a last-resort approach; I'd rather do something less
drastic. Unfortunately the more I think about it the less I like using
'**' without a following name for this feature.

PS whenever you respond to something it becomes a new thread in Gmail.
Is your mail app perhaps not properly inserting In-reply-to headers?
Or do you forge a reply by creating a new message with the same
subject and "Re:" prepended?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From pje at telecommunity.com  Tue Aug 15 03:08:25 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon, 14 Aug 2006 21:08:25 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <mailman.34357.1155590629.27774.python-3000@python.org>
Message-ID: <5.1.1.6.0.20060814174448.025d9018@sparrow.telecommunity.com>

At 1:51 PM 8/14/2006 -0700, "Paul Prescod" <paul at prescod.net> wrote:
>On 8/14/06, Jim Jewett <jimjjewett at gmail.com> wrote:
> > The definition of a type as an annotation should probably be either
> > defined or explicitly undefined.  Earlier discussions talked about
> > things like
> >
> >     def f (a:int, b:(float | Decimal), c:[int, str, X]) ->str)
>
>
>I think that's a separate (large!) PEP. This PEP should disallow frameworks
>from inventing their own meaning for this syntax (requiring them to at least
>wrap). Then Guido and crew can dig into this issue on their own schedule.

I see we haven't made nearly as much progress on the concept of "no 
predefined semantics" as I thought we had.  :(

i.e., -1 on constraining what types mean.


From ironfroggy at gmail.com  Tue Aug 15 03:10:13 2006
From: ironfroggy at gmail.com (Calvin Spealman)
Date: Mon, 14 Aug 2006 21:10:13 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <43aa6ff70608141241w43b7b694k77e63ba6766a1f55@mail.gmail.com>
References: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
	<20060812205512.197A.JCARLSON@uci.edu>
	<5.1.1.6.0.20060813013329.0226d240@sparrow.telecommunity.com>
	<5.1.1.6.0.20060813125944.056a3f40@sparrow.telecommunity.com>
	<1cb725390608131057y122b0c0wf81611e136659793@mail.gmail.com>
	<43aa6ff70608141241w43b7b694k77e63ba6766a1f55@mail.gmail.com>
Message-ID: <76fd5acf0608141810gf062eabh76b0ca92d61372b1@mail.gmail.com>

On 8/14/06, Collin Winter <collinw at gmail.com> wrote:
> On 8/13/06, Paul Prescod <paul at prescod.net> wrote:
> > "In order for processors of function annotations to work interoperably, they
> > must use a common interpretation of objects used as annotations on a
> > particular function. For example, one might interpret string annotations as
> > docstrings. Another might interpet them as path segments for a web
> > framework. For this reason, function annotation processors SHOULD avoid
> > assigning processor-specific meanings to types defined outside of the
> > processor's framework. For example, a Django processor could process
> > annotations of a type defined in a Zope package, but Zope's creators should
> > be considered the authorities on the type's meaning for the same reasons
> > that they would be considered authorities on the semantics of classes or
> > methods in their packages."
>
> The way I read this, it forces (more or less) each
> annotation-consuming library to invent new ways to spell Python's
> built-in types.
>
> I read all this as saying that annotation processors should avoid
> using Python's lists, tuples and dicts in annotations (since whatever
> semantics the Python developers come up with will inevitably be
> incompatible with what some library writer needs/wants). Each
> processor library will then define my_processor.List,
> my_processor.Tuple, my_processor.Dict, etc as alternate spellings for
> [x, y, z], (x, y, z), {x: y} and so on.

I'm sorry but I don't see the logic here. Why will all the annotation
libraries need to invent stand-ins for the built-in types? They just
shouldn't define any meaning to standard types as annotations, leaving
the interpretation of int in 'def foo(a: int)' up to the python
developers. The only thing I can figure is that you see this need in
order for other annotation libraries to handle associating types with
arguments, but there is evidence that this shouldn't be done directly
with built-in type objects (unless defined by python itself). Using
the types directly doesn't cover important use-cases like adapting;
even though we can expect it to be safe with builtin types, we cannot
be sure of this with all types, so there is a good chance the type
annotations will take the form of

   def foo(a: argtype(int))
   def bar(b: argtype(Baz, adapter=Baz.adaptFrom))

which defines that foo takes an int object and bar takes a Baz
instance, which can be adapted to with the classmethod Baz.adaptFrom.
Maybe Baz' constructor takes a database connection and object ID, and
would break just being passed a random object. In this case, we don't
need to use my_anno.Integer or something like that, because we aren't
(and shouldn't) use the built-in type objects directly as our
annotation objects.
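A sketch of the `argtype` wrapper described above (the name and the `adapter` keyword are from the proposal; the checking/adaptation logic is an assumption about how a consumer might use it):

```python
class argtype:
    """Annotation wrapper: a required type plus an optional adapter."""
    def __init__(self, type_, adapter=None):
        self.type = type_
        self.adapter = adapter

    def coerce(self, value):
        """Return value if it matches, adapt it if possible,
        else raise TypeError."""
        if isinstance(value, self.type):
            return value
        if self.adapter is not None:
            return self.adapter(value)
        raise TypeError("expected %s, got %r"
                        % (self.type.__name__, value))

def foo(a: argtype(int)):
    return a * 2
```

A consumer would pull `foo.__annotations__['a']` and call its `coerce` method on each incoming argument, rather than guessing at the meaning of a bare `int`.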

I'll propose this as a new rule the PEP should define that annotation
handling libraries should not only avoid expecting instances of
built-in types as annotations (lists and strings, etc.) but also those
types themselves (using the int object itself as an annotation). It
may seem terribly convenient to use types directly, but its becoming
more and more apparent that all annotations should be wrapped in
something by which the meaning of the annotation can be reliably and
safely determined by its type, and no built-in type really does that
in an agreeable way.

Also, Collin Winter said:
> One particularly divisive issue is whether tuples should be treated as
> fixed- or arbitrary-length containers. Concretely, does
> "tuple(Number)" match only 1-tuples with a single Number element, or
> does it match all tuples that have only Number elements?

I would personally be completely averse to the use of any containers
as a way of meaning "this argument is a list/tuple of some specific
types". On one hand, this is the realm of the individual annotation
libraries, so it isn't even relevant to this conversation. However,
when it is done, a specific type to represent the concept would be
more prudent. For example, I would like to annotate with listOf(str,
int) or tupleOf(multiple(bool)) to mean "a list of a str and an int"
and "a tuple of multiple bool objects", respectively.
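A sketch of what such container annotations might look like (the `listOf` and `multiple` names follow the suggestion above; the checking semantics are assumptions):

```python
class multiple:
    """Marks 'zero or more of this type' inside a container spec."""
    def __init__(self, type_):
        self.type = type_

class listOf:
    """Annotation meaning: a list whose elements match the given
    specs positionally, or a single multiple(...) spec for
    arbitrary-length homogeneous lists."""
    def __init__(self, *specs):
        self.specs = specs

    def check(self, value):
        if not isinstance(value, list):
            return False
        if len(self.specs) == 1 and isinstance(self.specs[0], multiple):
            elem = self.specs[0].type
            return all(isinstance(v, elem) for v in value)
        return (len(value) == len(self.specs)
                and all(isinstance(v, s)
                        for v, s in zip(value, self.specs)))
```

Using a dedicated `listOf`/`multiple` type sidesteps the fixed- vs. arbitrary-length ambiguity entirely: the annotation says which it means.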

From greg.ewing at canterbury.ac.nz  Tue Aug 15 03:13:53 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 15 Aug 2006 13:13:53 +1200
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <5.1.1.6.0.20060814002014.02dbe9d0@sparrow.telecommunity.com>
References: <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com>
	<5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com>
	<5.1.1.6.0.20060814002014.02dbe9d0@sparrow.telecommunity.com>
Message-ID: <44E11FD1.1020201@canterbury.ac.nz>

Phillip J. Eby wrote:

> It can't be a "separate program altogether", since to get at the 
> annotations, the  program must import the module that contains them.

Why? I can imagine something like a documentation
generator or static type checker that just parses the
source, being careful not to execute anything.

Also, even if it does work by importing the module,
how is the module being imported supposed to know
which annotation processor is going to be processing
its annotations, and therefore what generic methods
need to be overridden, and how to go about doing
that -- assuming there is no standardisation of any
sort?

--
Greg

From greg.ewing at canterbury.ac.nz  Tue Aug 15 03:19:26 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 15 Aug 2006 13:19:26 +1200
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <5.1.1.6.0.20060814002138.02909ad0@sparrow.telecommunity.com>
References: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
	<5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com>
	<5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com>
	<44DD5DF0.40405@acm.org>
	<5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com>
	<5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
	<5.1.1.6.0.20060814002138.02909ad0@sparrow.telecommunity.com>
Message-ID: <44E1211E.5040308@canterbury.ac.nz>

Phillip J. Eby wrote:

 > The examples there are very short
> and simple; in fact the complete Message implementation, including 
> imports and overload declarations is only *6 lines long*.
> 
> So, my only guess is that the people who looked at that skimmed right 
> past it, looking for something more complicated!

If it really is that short and simple, why not just post
the whole thing? Then there's no danger of anyone getting
lost in parts of the documentation they're not supposed
to be looking at.

--
Greg

From ncoghlan at gmail.com  Tue Aug 15 03:25:57 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 15 Aug 2006 11:25:57 +1000
Subject: [Python-3000] Ctypes as cross-interpreter C calling interface
In-Reply-To: <ca471dc20608141108u118487ccw16cc8527c6f24744@mail.gmail.com>
References: <ca471dc20608140909o730ab1e0i86c6d562cfa90abd@mail.gmail.com>	<20060814172000.1717.863905740.divmod.quotient.22821@ohm>
	<ca471dc20608141108u118487ccw16cc8527c6f24744@mail.gmail.com>
Message-ID: <44E122A5.6090203@gmail.com>

Guido van Rossum wrote:
> On 8/14/06, Jean-Paul Calderone <exarkun at divmod.com> wrote:
>> On Mon, 14 Aug 2006 09:09:49 -0700, Guido van Rossum <guido at python.org> wrote:
>>> On 8/14/06, Jean-Paul Calderone <exarkun at divmod.com> wrote:
>>>> This is a joke, right?
>>> Because it's a good idea to have to write separate wrappers
>>> around every useful library for each dynamic languague separately?
>> If a project has done this successfully, I don't think I've seen it.  Can
>> you point out some examples where this has been accomplished in a useful
>> form?  The nearest thing I can think of is SWIG, which is basically a
>> failure.
> 
SWIG is not my favorite (mostly because I don't like C++ much) but
> it's used very effectively here at Google (for example); I wouldn't
> dream of calling it a failure.

I've found SWIG to be especially effective when using it to wrap a library I 
have control over, so I can tweak the interface to avoid stressing the code 
generator too much. Running it over arbitrary C libraries requires a fair bit 
of work defining the necessary typemaps (although you still have the benefit 
of writing the typemap for a given style of interface *once* instead of for 
every function that uses it).

However, in the context of this discussion, a SWIG-like tool that produced 
pure Python ctypes-based code would be a vast improvement. Taking the SWIG 
typemaps for the Python C API as a starting point, you could even do it with 
SWIG itself (rather than reinventing the wheel, as codegen's components for 
parsing C header files appear to do).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From pje at telecommunity.com  Tue Aug 15 03:33:03 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon, 14 Aug 2006 21:33:03 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <44E11FD1.1020201@canterbury.ac.nz>
References: <5.1.1.6.0.20060814002014.02dbe9d0@sparrow.telecommunity.com>
	<5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com>
	<5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com>
	<5.1.1.6.0.20060814002014.02dbe9d0@sparrow.telecommunity.com>
Message-ID: <5.1.1.6.0.20060814212620.025d8f70@sparrow.telecommunity.com>

At 01:13 PM 8/15/2006 +1200, Greg Ewing wrote:
>Phillip J. Eby wrote:
>
>>It can't be a "separate program altogether", since to get at the 
>>annotations, the  program must import the module that contains them.
>
>Why? I can imagine something like a documentation
>generator or static type checker that just parses the
>source, being careful not to execute anything.

How is such a thing going to know what doc("foo") means at the time the 
code is run?  What about closures, dynamic imports, etc.?


>Also, even if it does work by importing the module,
>how is the module being imported supposed to know
>which annotation processor is going to be processing
>its annotations, and therefore what generic methods
>need to be overridden, and how to go about doing
>that -- assuming there is no standardisation of any
>sort?

Weak imports are a good solution for the case where interop is 
optional.  You do something like:

     @whenImported('some.doc.processor')
     def registerDocHandler(processor):
         @processor.someOverloadedFunction.when(SomeType)
         def handleTypeDefinedByThisModule(...):
             ...

The idea here being that the registration occurs if and only if the 
some.doc.processor module is imported during the lifetime of the 
program.  See http://cheeseshop.python.org/pypi/Importing for a package 
that contains a non-decorator version of this functionality.

Anyway, the idea here is that if you create a library with a bunch of 
annotation types in it, you use weak importing to optionally register 
handlers for whatever processors are out there that you want to 
support.  Also, other people can of course define their own third-party 
glue modules that provide this kind of support for some given combination 
of annotation types and processors.


From pje at telecommunity.com  Tue Aug 15 03:37:47 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon, 14 Aug 2006 21:37:47 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <44E1211E.5040308@canterbury.ac.nz>
References: <5.1.1.6.0.20060814002138.02909ad0@sparrow.telecommunity.com>
	<5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
	<5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com>
	<5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com>
	<44DD5DF0.40405@acm.org>
	<5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com>
	<5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
	<5.1.1.6.0.20060814002138.02909ad0@sparrow.telecommunity.com>
Message-ID: <5.1.1.6.0.20060814213434.027739f0@sparrow.telecommunity.com>

At 01:19 PM 8/15/2006 +1200, Greg Ewing wrote:
>Phillip J. Eby wrote:
>
> > The examples there are very short
>>and simple; in fact the complete Message implementation, including 
>>imports and overload declarations is only *6 lines long*.
>>So, my only guess is that the people who looked at that skimmed right 
>>past it, looking for something more complicated!
>
>If it really is that short and simple, why not just post
>the whole thing? Then there's no danger of anyone getting
>lost in parts of the documentation they're not supposed
>to be looking at.

Here are the most relevant bits excerpted from the text:

To create a new kind of metadata, we need to create a class that represents
the metadata, and then add a method to  the ``binding.declareAttribute()``
generic function.  For our example, we'll create a ``Message`` metadata type
that just prints a message when the metadata is registered::

     >>> class Message(str):
     ...     pass

     >>> def print_message(classobj, attrname, metadata):
     ...     print metadata, "(%s.%s)" % (classobj.__name__, attrname)

     >>> binding.declareAttribute.addMethod(Message,print_message)

Now, we'll see if it works::

     >>> class Foo: pass
     >>> binding.declareAttribute(Foo, 'bar', Message("testing"))
     testing (Foo.bar)

In addition to defining your own metadata types, ``declareAttribute()`` has
built-in semantics for ``None`` and sequence types.  The former is a no-op, and
the latter re-invokes ``declareAttribute()`` on the sequence contents::

     >>> binding.declareAttribute(Foo, 'baz',
     ...     [Message('test1'), Message('test2')]
     ... )
     test1 (Foo.baz)
     test2 (Foo.baz)

     >>> binding.declareAttribute(Foo, 'spam', None)     # no-op


From tim.peters at gmail.com  Tue Aug 15 03:39:42 2006
From: tim.peters at gmail.com (Tim Peters)
Date: Mon, 14 Aug 2006 21:39:42 -0400
Subject: [Python-3000] threading, part 2 --- + a bit of ctypes FFI worry
In-Reply-To: <ebq2ss$qhf$1@sea.gmane.org>
References: <1f7befae0608120329wc646164w25ca4875da4cc5c0@mail.gmail.com>
	<ebq2ss$qhf$1@sea.gmane.org>
Message-ID: <1f7befae0608141839vf45bddaw7a19701d766cb4af@mail.gmail.com>

[Tim Peters]
>> ...
>> When the ctypes docs talk about passing and returning integers, they
>> never explain what "integers" /means/, but it seems the docs
>> implicitly have a 32-bit-only view of the world here.  In reality
>> "integer" seems to mean the native C `int` type.

[Thomas Heller]
> 'ctypes.c_int' and 'ctypes.c_long' correspond to the C 'int' and 'long' types.

Sure, that's clear.  It's where the docs talk about (the unqualified)
"integers", and the quotes there aren't just to scare you ;-).  Like
in:

    http://starship.python.net/crew/theller/ctypes/tutorial.html

near the end of section "Calling functions":

    Python integers, strings and unicode strings are the only objects that can
    directly be used as parameters in these function calls.

What does the word "integers" /mean/ there?

> If you think that the docs could be clearer, please suggest changes.

I can't, because I don't know what was intended.  Python integers come
in two flavors, `int` and `long`, so I assumed at first that the
"Python integers" in the above probably meant "a Python (short) int"
(which is a C `long`).  But writing the thread test using that
assumption failed on some 64-bit buildbots.  After staring at the
specific ways it failed, my next guess was that by "Python integers"
the docs don't really mean Python integers at all, but C's `int`.
That's what convinced me to /try/ wrapping the thread id in
ctypes.c_long(), and the test problems went away then, so I did too
:-)

I searched all the docs for the word "integers" and never found out
what was intended.  So you could search the docs for the same thing.
Like, still in the tutorial, at the start of section "Return types":

    By default functions are assumed to return integers.

Or in the reference docs:

    Note that all these functions are assumed to return integers,
which is of course
    not always the truth, so you have to assign the correct restype
attribute to use
    these functions.

and the description of memmove():

    memmove(dst, src, count)

    Same as the standard C memmove library function: copies count bytes from
    src to dst. dst and src must be integers or ...

Python has at least three meanings for the word "integer" (short,
long, & "either"), and C has at least 10 (signed & unsigned char,
short, int, long, & long long), so the unqualified "integer" is highly
ambiguous.  While in many contexts that doesn't much matter, in ctypes
it does.
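
(The ambiguity is easy to make concrete with ctypes itself; exact sizes
are platform-dependent, so only the orderings below are guaranteed.)

```python
import ctypes

# c_int and c_long mirror the platform's C `int` and `long`.  On LP64
# systems (64-bit Linux/macOS) sizeof(long) == 8 while sizeof(int) == 4,
# which is exactly where an unqualified "integer" becomes ambiguous.
int_size = ctypes.sizeof(ctypes.c_int)
long_size = ctypes.sizeof(ctypes.c_long)

# C guarantees only the ordering of the widths, not concrete values:
assert ctypes.sizeof(ctypes.c_short) <= int_size <= long_size
assert long_size <= ctypes.sizeof(ctypes.c_longlong)

# Wrapping a value explicitly removes the ambiguity Tim describes:
tid = ctypes.c_long(123456789)
assert tid.value == 123456789
```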

From ncoghlan at gmail.com  Tue Aug 15 03:44:10 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 15 Aug 2006 11:44:10 +1000
Subject: [Python-3000] PEP3102 Keyword-Only Arguments
In-Reply-To: <ca471dc20608141104i154efbfehf88e8f10f7877ea8@mail.gmail.com>
References: <b008462b0608111620q709e691fqdb6283b194e1a893@mail.gmail.com>	<fb6fbf560608122056w5a9af394ga358614c0d8d10d7@mail.gmail.com>	<ca471dc20608141038w55d67754s9407f52eaa5ce64b@mail.gmail.com>	<d11dcfba0608141049p24c03471k3c0252bd188ee5e7@mail.gmail.com>
	<ca471dc20608141104i154efbfehf88e8f10f7877ea8@mail.gmail.com>
Message-ID: <44E126EA.5000200@gmail.com>

Guido van Rossum wrote:
>> It would be really nice in the example above to mark ``self`` in
>> ``__call__`` as a positional only argument.
> 
> But this is a rather unusual use case isn't it? It's due to the bound
> methods machinery. Do you have other use cases? I would assume that
> normally such wrappers take their own control arguments in the form of
> keyword-only arguments (that are unlikely to conflict with arguments
> of the wrapped method).
> 

I'd like a syntax or convention for it so I can document the signature of 
functions written in C that accept positional-only arguments using Python's 
own function definition notation ;)

I'd also like to be able to use it to say "I'm not sure about this parameter 
name yet, so don't rely on it staying the same!" while developing an API.

However, I'm also wondering if we need an actual syntax, or if a simple 
convention would do the trick: start the names of positional-only arguments 
with an underscore.

Then Steven's examples would become:

     >>> class Wrapper(object):
     ...     def __init__(self, func):
     ...         self.func = func
     ...     def __call__(_self, *args, **kwargs):
     ...         print 'calling wrapped function'
     ...         return self.func(*args, **kwargs)
     ...

     def failUnlessRaises(_self, _excClass, _callableObj, *args, **kwargs):

With the 'best practice' being that any function that accepts arbitrary kwargs 
should use an underscore on its named parameters.

The only way to screw the latter example up would be for a caller to do:

   self.failUnlessRaises(TypeError, my_func, _callableObj=foo)

And if the 'underscore indicates positional only' convention were adopted 
officially, it would be trivial for PyLint/PyChecker to flag any call that 
specifies a name starting with an underscore as a keyword argument.
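
(The collision the convention avoids can be sketched directly; ``Broken``,
``Fixed`` and ``show`` are illustrative names.)

```python
# With a plain `self` parameter, a caller's self= keyword argument
# collides with the method's own (already bound) first parameter:
class Broken:
    def call(self, func, **kwargs):
        return func(**kwargs)

class Fixed:
    def call(_self, _func, **kwargs):  # positional-only by convention
        return _func(**kwargs)

def show(**kwargs):
    return sorted(kwargs)

try:
    Broken().call(show, self=1)  # 'self' given twice -> TypeError
    collided = False
except TypeError:
    collided = True

assert collided
assert Fixed().call(show, self=1) == ["self"]  # passes through cleanly
```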

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From ncoghlan at gmail.com  Tue Aug 15 03:49:05 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 15 Aug 2006 11:49:05 +1000
Subject: [Python-3000] Python/C++ question
In-Reply-To: <ebqi1f$80m$2@sea.gmane.org>
References: <44DA6C01.2040904@acm.org>	<ca471dc20608091618k3cffb3ewa3029794f0f02761@mail.gmail.com>	<44DF0800.4060204@acm.org>	<ebn5ms$mne$1@sea.gmane.org>	<ca471dc20608141113y15e6ba9u3ea405905a0ca0ad@mail.gmail.com>
	<ebqi1f$80m$2@sea.gmane.org>
Message-ID: <44E12811.5090709@gmail.com>

Georg Brandl wrote:
> Guido van Rossum wrote:
>> Implementation Language
>> ==================
>>
>> Python 3000 will be implemented in C, and the implementation will be
>> derived as an evolution of the Python 2 code base. This reflects my
>> views (which I share with Joel Spolsky) on the dangers of complete
>> rewrites. Since Python 3000 as a language is a relatively mild
>> improvement on Python 2, we can gain a lot by not attempting to
>> reimplement the language from scratch. I am not against parallel
>> from-scratch implementation efforts, but my own efforts will be
>> directed at the language and implementation that I know best.
> 
> I had already added something to PEP 3099, but if you like that approach
> better, I'll add that to PEP 3000.

You can always keep both :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From tjreedy at udel.edu  Tue Aug 15 03:53:27 2006
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 14 Aug 2006 21:53:27 -0400
Subject: [Python-3000] Python/C++ question
References: <44DA6C01.2040904@acm.org>	<ca471dc20608091618k3cffb3ewa3029794f0f02761@mail.gmail.com>	<44DF0800.4060204@acm.org><ebn5ms$mne$1@sea.gmane.org><ca471dc20608141113y15e6ba9u3ea405905a0ca0ad@mail.gmail.com>
	<ebqi1f$80m$2@sea.gmane.org>
Message-ID: <ebr9eq$cfc$1@sea.gmane.org>


"Georg Brandl" <g.brandl at gmx.net> wrote in message 
news:ebqi1f$80m$2 at sea.gmane.org...
> Guido van Rossum wrote:
>> Implementation Language
>> ==================
>>
>> Python 3000 will be implemented in C, and the implementation will be
>> derived as an evolution of the Python 2 code base. This reflects my
>> views (which I share with Joel Spolsky) on the dangers of complete
>> rewrites. Since Python 3000 as a language is a relatively mild
>> improvement on Python 2, we can gain a lot by not attempting to
>> reimplement the language from scratch. I am not against parallel
>> from-scratch implementation efforts, but my own efforts will be
>> directed at the language and implementation that I know best.
>
> I had already added something to PEP 3099, but if you like that approach
> better, I'll add that to PEP 3000.

Please add this.  It clearly says what and why, and will answer questions 
that are sure to come.  I would leave a comment in 3099 also.

tjr




From alexander.belopolsky at gmail.com  Tue Aug 15 04:15:36 2006
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Mon, 14 Aug 2006 22:15:36 -0400
Subject: [Python-3000] [Python-Dev] Type of range object members
In-Reply-To: <44E12B0A.9020907@gmail.com>
References: <d38f5330608141608o7c748a20ka1daa2504896b213@mail.gmail.com>
	<ca471dc20608141632r173e8d60o20af3787b7efbc9d@mail.gmail.com>
	<44E12B0A.9020907@gmail.com>
Message-ID: <2DA248BC-5534-4CE5-A9C8-84259E8A71B2@local>


On Aug 14, 2006, at 10:01 PM, Nick Coghlan wrote:

> Guido van Rossum wrote:
>> Methinks that as long as PyIntObject uses long (see intobject.h)
>> there's no point in changing this to long.
>
> Those fields are going to have to change to PyObject* eventually  
> if xrange() is going to become the range() replacement in Py3k. . .
>

In this case it will become indistinguishable from

typedef struct {
     PyObject_HEAD
     PyObject *start, *stop, *step;      /* not NULL */
} PySliceObject;

See sliceobject.h.  Would it make sense to unify rangeobject with  
PySliceObject?


From ncoghlan at gmail.com  Tue Aug 15 04:34:18 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 15 Aug 2006 12:34:18 +1000
Subject: [Python-3000] [Python-Dev] Type of range object members
In-Reply-To: <2DA248BC-5534-4CE5-A9C8-84259E8A71B2@local>
References: <d38f5330608141608o7c748a20ka1daa2504896b213@mail.gmail.com>
	<ca471dc20608141632r173e8d60o20af3787b7efbc9d@mail.gmail.com>
	<44E12B0A.9020907@gmail.com>
	<2DA248BC-5534-4CE5-A9C8-84259E8A71B2@local>
Message-ID: <44E132AA.1020900@gmail.com>

Alexander Belopolsky wrote:
> 
> On Aug 14, 2006, at 10:01 PM, Nick Coghlan wrote:
> 
>> Guido van Rossum wrote:
>>> Methinks that as long as PyIntObject uses long (see intobject.h)
>>> there's no point in changing this to long.
>>
>> Those fields are going to have to change to PyObject* eventually if 
>> xrange() is going to become the range() replacement in Py3k. . .
>>
> 
> In this case it will become indistinguishable from
> 
> typedef struct {
>     PyObject_HEAD
>     PyObject *start, *stop, *step;      /* not NULL */
> } PySliceObject;
> 
> See sliceobject.h .  Would it make sense to unify rangeobject with 
> PySliceObject?
> 

Not really. The memory layouts may end up being the same in Py3k, but they're 
still different types. The major differences between the two types just happen 
to lie in the methods they support (as defined by the value of the type 
pointer in PyObject_HEAD), rather than the data they contain.

Besides, the range object may actually keep the current optimised behaviour 
for dealing with PyInt values, only falling back to PyObject* if one of start, 
stop or step was too large to fit into a PyInt.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From greg.ewing at canterbury.ac.nz  Tue Aug 15 04:49:42 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 15 Aug 2006 14:49:42 +1200
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <5.1.1.6.0.20060814212620.025d8f70@sparrow.telecommunity.com>
References: <5.1.1.6.0.20060814002014.02dbe9d0@sparrow.telecommunity.com>
	<5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com>
	<5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com>
	<5.1.1.6.0.20060814002014.02dbe9d0@sparrow.telecommunity.com>
	<5.1.1.6.0.20060814212620.025d8f70@sparrow.telecommunity.com>
Message-ID: <44E13646.9020709@canterbury.ac.nz>

Phillip J. Eby wrote:

> How is such a thing going to know what doc("foo") means at the time the 
> code is run?  What about closures, dynamic imports, etc.?

Annotations intended for such external processors would
have to be designed not to rely on anything dynamic,
i.e. be purely declarative.

Maybe this is why we're having trouble communicating.
You seem to be thinking of annotations purely as
dynamic things that affect the execution of the
program. I'm thinking of them as something that will
just as likely be used in a declarative way, possibly
by tools that don't execute the code at all, but do
something entirely different with it.

> Weak imports are a good solution for the case where interop is 
> optional.  You do something like:
> 
>     @whenImported('some.doc.processor')
>     def registerDocHandler(processor):
>         @processor.someOverloadedFunction.when(SomeType)
>         def handleTypeDefinedByThisModule(...):
>             ...

But this requires the module using the annotations to
anticipate all the processors that will potentially
process its annotations, and teach each one of them
about itself.

> Also, other people can of course define their own third-party
> glue modules that provide this kind of support for some given 
> combination of annotation types and processors.

I don't see how a third party can do this, because only the
module containing the annotations can know what idiosynchratic
scheme it's chosen for combining them.

--
Greg




From guido at python.org  Tue Aug 15 04:58:12 2006
From: guido at python.org (Guido van Rossum)
Date: Mon, 14 Aug 2006 19:58:12 -0700
Subject: [Python-3000] Python/C++ question
In-Reply-To: <ebr9eq$cfc$1@sea.gmane.org>
References: <44DA6C01.2040904@acm.org>
	<ca471dc20608091618k3cffb3ewa3029794f0f02761@mail.gmail.com>
	<44DF0800.4060204@acm.org> <ebn5ms$mne$1@sea.gmane.org>
	<ca471dc20608141113y15e6ba9u3ea405905a0ca0ad@mail.gmail.com>
	<ebqi1f$80m$2@sea.gmane.org> <ebr9eq$cfc$1@sea.gmane.org>
Message-ID: <ca471dc20608141958k2e88485r85a5c80cce5b561@mail.gmail.com>

+1

On 8/14/06, Terry Reedy <tjreedy at udel.edu> wrote:
>
> "Georg Brandl" <g.brandl at gmx.net> wrote in message
> news:ebqi1f$80m$2 at sea.gmane.org...
> > Guido van Rossum wrote:
> >> Implementation Language
> >> ==================
> >>
> >> Python 3000 will be implemented in C, and the implementation will be
> >> derived as an evolution of the Python 2 code base. This reflects my
> >> views (which I share with Joel Spolsky) on the dangers of complete
> >> rewrites. Since Python 3000 as a language is a relatively mild
> >> improvement on Python 2, we can gain a lot by not attempting to
> >> reimplement the language from scratch. I am not against parallel
> >> from-scratch implementation efforts, but my own efforts will be
> >> directed at the language and implementation that I know best.
> >
> > I had already added something to PEP 3099, but if you like that approach
> > better, I'll add that to PEP 3000.
>
> Please add this.  It clearly says what and why and  will answer questions
> that are sure to come.  I would leave a comment in 3099 also.
>
> tjr
>
>
>
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Aug 15 05:00:32 2006
From: guido at python.org (Guido van Rossum)
Date: Mon, 14 Aug 2006 20:00:32 -0700
Subject: [Python-3000] PEP3102 Keyword-Only Arguments
In-Reply-To: <44E126EA.5000200@gmail.com>
References: <b008462b0608111620q709e691fqdb6283b194e1a893@mail.gmail.com>
	<fb6fbf560608122056w5a9af394ga358614c0d8d10d7@mail.gmail.com>
	<ca471dc20608141038w55d67754s9407f52eaa5ce64b@mail.gmail.com>
	<d11dcfba0608141049p24c03471k3c0252bd188ee5e7@mail.gmail.com>
	<ca471dc20608141104i154efbfehf88e8f10f7877ea8@mail.gmail.com>
	<44E126EA.5000200@gmail.com>
Message-ID: <ca471dc20608142000i6879c970u9de31abe178434a9@mail.gmail.com>

On 8/14/06, Nick Coghlan <ncoghlan at gmail.com> wrote:
> However, I'm also wondering if we need an actual syntax, or if a simple
> convention would do the trick: start the names of positional-only arguments
> with an underscore.

Hm... and perhaps we could forbid keyword arguments starting with an
underscore in the call syntax?
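
(A hypothetical runtime sketch of that restriction; a real version would
live in the call syntax or a linter, and all names below are illustrative.)

```python
import functools

# Reject any keyword argument whose name starts with an underscore,
# i.e. treat underscore-named parameters as positional-only.
def no_underscore_keywords(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        for name in kwargs:
            if name.startswith("_"):
                raise TypeError(
                    "%s: %r is positional-only" % (func.__name__, name))
        return func(*args, **kwargs)
    return wrapper

@no_underscore_keywords
def fail_unless_raises(_excClass, _callableObj, *args, **kwargs):
    try:
        _callableObj(*args, **kwargs)
    except _excClass:
        return True
    raise AssertionError("did not raise")

assert fail_unless_raises(ZeroDivisionError, lambda: 1 / 0)

try:
    fail_unless_raises(ValueError, int, _callableObj=int)
    rejected = False
except TypeError:
    rejected = True
assert rejected
```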

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Aug 15 05:01:57 2006
From: guido at python.org (Guido van Rossum)
Date: Mon, 14 Aug 2006 20:01:57 -0700
Subject: [Python-3000] threading, part 2 --- + a bit of ctypes FFI worry
In-Reply-To: <1f7befae0608141839vf45bddaw7a19701d766cb4af@mail.gmail.com>
References: <1f7befae0608120329wc646164w25ca4875da4cc5c0@mail.gmail.com>
	<ebq2ss$qhf$1@sea.gmane.org>
	<1f7befae0608141839vf45bddaw7a19701d766cb4af@mail.gmail.com>
Message-ID: <ca471dc20608142001n68a9b560p5d788391629640a4@mail.gmail.com>

Perhaps this thread can be moved back to python-dev? I'm not sure how
relevant a discussion of the ambiguities in ctypes' docs are for Py3k.

On 8/14/06, Tim Peters <tim.peters at gmail.com> wrote:
> [Tim Peters]
> >> ...
> >> When the ctypes docs talk about passing and returning integers, they
> >> never explain what "integers" /means/, but it seems the docs
> >> implicitly have a 32-bit-only view of the world here.  In reality
> >> "integer" seems to mean the native C `int` type.
>
> [Thomas Heller]
> > 'ctypes.c_int' and 'ctypes.c_long' correspond to the C 'int' and 'long' types.
>
> Sure, that's clear.  It's where the docs talk about (the unqualified)
> "integers", and the quotes there aren't just to scare you ;-).  Like
> in:
>
>     http://starship.python.net/crew/theller/ctypes/tutorial.html
>
> near the end of section "Calling functions":
>
>     Python integers, strings and unicode strings are the only objects that can
>     directly be used as parameters in these function calls.
>
> What does the word "integers" /mean/ there?
>
> > If you think that the docs could be clearer, please suggest changes.
>
> I can't, because I don't know what was intended.  Python integers come
> in two flavors, `int` and `long`, so I assumed at first that the
> "Python integers" in the above probably meant "a Python (short) int"
> (which is a C `long`).  But writing the thread test using that
> assumption failed on some 64-bit buildbots.  After staring at the
> specific ways it failed, my next guess was that by "Python integers"
> the docs don't really mean Python integers at all, but C's `int`.
> That's what convinced me to /try/ wrapping the thread id in
> ctypes.c_long(), and the test problems went away then, so I did too
> :-)
>
> I searched all the docs for the word "integers" and never found out
> what was intended.  So you could search the docs for the same thing.
> Like, still in the tutorial, at the start of section "Return types":
>
>     By default functions are assumed to return integers.
>
> Or in the reference docs:
>
>     Note that all these functions are assumed to return integers,
> which is of course
>     not always the truth, so you have to assign the correct restype
> attribute to use
>     these functions.
>
> and the description of memmove():
>
>     memmove(dst, src, count)
>
>     Same as the standard C memmove library function: copies count bytes from
>     src to dst. dst and src must be integers or ...
>
> Python has at least three meanings for the word "integer" (short,
> long, & "either"), and C has at least 10 (signed & unsigned char,
> short, int, long, & long long), so the unqualified "integer" is highly
> ambiguous.  While in many contexts that doesn't much matter, in ctypes
> it does.


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Aug 15 05:04:27 2006
From: guido at python.org (Guido van Rossum)
Date: Mon, 14 Aug 2006 20:04:27 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <5.1.1.6.0.20060814174448.025d9018@sparrow.telecommunity.com>
References: <mailman.34357.1155590629.27774.python-3000@python.org>
	<5.1.1.6.0.20060814174448.025d9018@sparrow.telecommunity.com>
Message-ID: <ca471dc20608142004t7f0eb56cj6efa40947504dc01@mail.gmail.com>

On 8/14/06, Phillip J. Eby <pje at telecommunity.com> wrote:
> At 1:51 PM 8/14/2006 -0700, "Paul Prescod" <paul at prescod.net> wrote:
> >On 8/14/06, Jim Jewett <jimjjewett at gmail.com> wrote:
> > > The definition of a type as an annotation should probably be either
> > > defined or explicitly undefined.  Earlier discussions talked about
> > > things like
> > >
> > >     def f (a:int, b:(float | Decimal), c:[int, str, X]) ->str)
> >
> >
> >I think that's a separate (large!) PEP. This PEP should disallow frameworks
> >from inventing their own meaning for this syntax (requiring them to at least
> >wrap). Then Guido and crew can dig into this issue on their own schedule.
>
> I see we haven't made nearly as much progress on the concept of "no
> predefined semantics" as I thought we had.  :(
>
> i.e., -1 on constraining what types mean.

Haven't I said that the whole time? I *thought* that Collin's PEP
steered clear from the topic too. At the same time, does this preclude
having some kind of "default" type notation in the standard library?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From steven.bethard at gmail.com  Tue Aug 15 05:11:40 2006
From: steven.bethard at gmail.com (Steven Bethard)
Date: Mon, 14 Aug 2006 21:11:40 -0600
Subject: [Python-3000] PEP3102 Keyword-Only Arguments
In-Reply-To: <ca471dc20608142000i6879c970u9de31abe178434a9@mail.gmail.com>
References: <b008462b0608111620q709e691fqdb6283b194e1a893@mail.gmail.com>
	<fb6fbf560608122056w5a9af394ga358614c0d8d10d7@mail.gmail.com>
	<ca471dc20608141038w55d67754s9407f52eaa5ce64b@mail.gmail.com>
	<d11dcfba0608141049p24c03471k3c0252bd188ee5e7@mail.gmail.com>
	<ca471dc20608141104i154efbfehf88e8f10f7877ea8@mail.gmail.com>
	<44E126EA.5000200@gmail.com>
	<ca471dc20608142000i6879c970u9de31abe178434a9@mail.gmail.com>
Message-ID: <d11dcfba0608142011t5bdce999r2f547c3d9fe53fcb@mail.gmail.com>

[Steven Bethard]
> It would be really nice in the example above to mark ``self`` in
> ``__call__`` as a positional only argument.

[Nick Coghlan]
> However, I'm also wondering if we need an actual syntax, or if a simple
> convention would do the trick: start the names of positional-only arguments
> with an underscore.

That would certainly be good enough for me.  As long as it's
documented and there's somewhere to point to when someone does it
wrong, it solves my problem.

[Guido van Rossum]
> Hm... and perhaps we could forbid keyword arguments starting with an
> underscore in the call syntax?

-0.  As long as the convention exists somewhere, I don't think this
buys us too much. I think supplying a keyword argument when you should
be using a positional is about the same level of
willing-to-shoot-yourself-in-the-foot as using attributes that are
supposed to be private (the other place where leading underscores are
suggested).

Steve
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From aahz at pythoncraft.com  Tue Aug 15 05:37:59 2006
From: aahz at pythoncraft.com (Aahz)
Date: Mon, 14 Aug 2006 20:37:59 -0700
Subject: [Python-3000] PEP3102 Keyword-Only Arguments
In-Reply-To: <ca471dc20608142000i6879c970u9de31abe178434a9@mail.gmail.com>
References: <b008462b0608111620q709e691fqdb6283b194e1a893@mail.gmail.com>
	<fb6fbf560608122056w5a9af394ga358614c0d8d10d7@mail.gmail.com>
	<ca471dc20608141038w55d67754s9407f52eaa5ce64b@mail.gmail.com>
	<d11dcfba0608141049p24c03471k3c0252bd188ee5e7@mail.gmail.com>
	<ca471dc20608141104i154efbfehf88e8f10f7877ea8@mail.gmail.com>
	<44E126EA.5000200@gmail.com>
	<ca471dc20608142000i6879c970u9de31abe178434a9@mail.gmail.com>
Message-ID: <20060815033759.GA4078@panix.com>

On Mon, Aug 14, 2006, Guido van Rossum wrote:
> On 8/14/06, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>
>> However, I'm also wondering if we need an actual syntax, or if a simple
>> convention would do the trick: start the names of positional-only arguments
>> with an underscore.
> 
> Hm... and perhaps we could forbid keyword arguments starting with an
> underscore in the call syntax?

Do you mean forbid by convention or syntactically?  I'm -1 on the latter;
that would be far too much gratuitous code breakage.
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are, by
definition, not smart enough to debug it."  --Brian W. Kernighan

From alexander.belopolsky at gmail.com  Tue Aug 15 06:15:54 2006
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Tue, 15 Aug 2006 00:15:54 -0400
Subject: [Python-3000] [Python-Dev] Type of range object members
In-Reply-To: <44E132AA.1020900@gmail.com>
References: <d38f5330608141608o7c748a20ka1daa2504896b213@mail.gmail.com>
	<ca471dc20608141632r173e8d60o20af3787b7efbc9d@mail.gmail.com>
	<44E12B0A.9020907@gmail.com>
	<2DA248BC-5534-4CE5-A9C8-84259E8A71B2@local>
	<44E132AA.1020900@gmail.com>
Message-ID: <F01A6726-B65D-42CF-9449-59FD851DC0A1@local>


On Aug 14, 2006, at 10:34 PM, Nick Coghlan wrote:

> Alexander Belopolsky wrote:
[snip]
>> Would it make sense to unify rangeobject with PySliceObject?
>
> Not really. The memory layouts may end up being the same in Py3k,  
> but they're still different types. The major differences between  
> the two types just happen to lie in the methods they support (as  
> defined by the value of the type pointer in PyObject_HEAD), rather  
> than the data they contain.

The slice objects support a single method, "indices", which I have to  
admit I had not seen until a minute ago.  (I grepped through the  
standard library and did not see it used anywhere.)  The slice  
attributes start/stop/step are probably more useful, but I don't see  
why those cannot be added to the range object.
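
(For reference, the slice API under discussion; this is standard Python
behaviour.)

```python
# slice objects expose start/stop/step plus the indices() method:
# indices(length) clips the slice to a sequence of the given length
# and normalises negative/None values.
s = slice(1, 10, 2)
assert (s.start, s.stop, s.step) == (1, 10, 2)
assert s.indices(5) == (1, 5, 2)   # stop clipped to the length

r = slice(None, None, -1)          # the [::-1] "reverse" slice
assert r.indices(5) == (4, -1, -1)

# range(*s.indices(n)) yields exactly the indices the slice selects:
assert list(range(*s.indices(5))) == [1, 3]
```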

>
> Besides, the range object may actually keep the current optimised  
> behaviour for dealing with PyInt values, only falling back to  
> PyObject* if one of start, stop or step was too large to fit into a  
> PyInt.


How would that hurt reusing them for slicing?

From jcarlson at uci.edu  Tue Aug 15 07:43:17 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Mon, 14 Aug 2006 22:43:17 -0700
Subject: [Python-3000] PEP3102 Keyword-Only Arguments
In-Reply-To: <20060815033759.GA4078@panix.com>
References: <ca471dc20608142000i6879c970u9de31abe178434a9@mail.gmail.com>
	<20060815033759.GA4078@panix.com>
Message-ID: <20060814223931.19BD.JCARLSON@uci.edu>


Aahz <aahz at pythoncraft.com> wrote:
> 
> On Mon, Aug 14, 2006, Guido van Rossum wrote:
> > On 8/14/06, Nick Coghlan <ncoghlan at gmail.com> wrote:
> >>
> >> However, I'm also wondering if we need an actual syntax, or if a simple
> >> convention would do the trick: start the names of positional-only arguments
> >> with an underscore.
> > 
> > Hm... and perhaps we could forbid keyword arguments starting with an
> > underscore in the call syntax?
> 
> Do you mean forbid by convention or syntactically?  I'm -1 on the latter;
> that would be far too much gratuitous code breakage.

At least 40 examples of it being used in a keyword argument in the 2.5b2
standard library (so sayeth my regular expression of '\((.*?\s)?_\w*=' ).

 - Josiah


From ncoghlan at gmail.com  Tue Aug 15 08:44:25 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 15 Aug 2006 16:44:25 +1000
Subject: [Python-3000] PEP3102 Keyword-Only Arguments
In-Reply-To: <d11dcfba0608142011t5bdce999r2f547c3d9fe53fcb@mail.gmail.com>
References: <b008462b0608111620q709e691fqdb6283b194e1a893@mail.gmail.com>	
	<fb6fbf560608122056w5a9af394ga358614c0d8d10d7@mail.gmail.com>	
	<ca471dc20608141038w55d67754s9407f52eaa5ce64b@mail.gmail.com>	
	<d11dcfba0608141049p24c03471k3c0252bd188ee5e7@mail.gmail.com>	
	<ca471dc20608141104i154efbfehf88e8f10f7877ea8@mail.gmail.com>	
	<44E126EA.5000200@gmail.com>	
	<ca471dc20608142000i6879c970u9de31abe178434a9@mail.gmail.com>
	<d11dcfba0608142011t5bdce999r2f547c3d9fe53fcb@mail.gmail.com>
Message-ID: <44E16D49.2010601@gmail.com>

Steven Bethard wrote:
> [Steven Bethard]
>> It would be really nice in the example above to mark ``self`` in
>> ``__call__`` as a positional only argument.
> 
> [Nick Coghlan]
>> However, I'm also wondering if we need an actual syntax, or if a simple
>> convention would do the trick: start the names of positional-only 
>> arguments
>> with an underscore.
> 
> That would certainly be good enough for me.  As long as it's
> documented and there's somewhere to point to when someone does it
> wrong, it solves my problem.

Putting something in PEP 8's section on naming conventions should do the trick 
(along with updating the standard library so that things like UserDict that 
accept arbitrary **kwargs use it for their positional arguments).

That would also serve as a reminder that the support for keyword arguments 
means that the parameter *names* are part of the public interface of a Python 
function along with their positions and types.

> 
> [Guido van Rossum]
>> Hm... and perhaps we could forbid keyword arguments starting with an
>> underscore in the call syntax?
> 
> -0.  As long as the convention exists somewhere, I don't think this
> buys us too much. I think supplying a keyword argument when you should
> be using a positional is about the same level of
> willing-to-shoot-yourself-in-the-foot as using attributes that are
> supposed to be private (the other place where leading underscores are
> suggested).

That's exactly the comparison I was aiming for - you *can* if you really have 
to, but you also *shouldn't*. And if you do, you'd better include a comment 
explaining why you have to if you don't want any reviewers complaining about it ;)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From paul at prescod.net  Tue Aug 15 15:56:18 2006
From: paul at prescod.net (Paul Prescod)
Date: Tue, 15 Aug 2006 06:56:18 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <5.1.1.6.0.20060814174448.025d9018@sparrow.telecommunity.com>
References: <mailman.34357.1155590629.27774.python-3000@python.org>
	<5.1.1.6.0.20060814174448.025d9018@sparrow.telecommunity.com>
Message-ID: <1cb725390608150656o2587c0ddx2af8e7df80f8e7b8@mail.gmail.com>

On 8/14/06, Phillip J. Eby <pje at telecommunity.com> wrote:
>
> At 1:51 PM 8/14/2006 -0700, "Paul Prescod" <paul at prescod.net> wrote:
> >On 8/14/06, Jim Jewett <jimjjewett at gmail.com> wrote:
> > > The definition of a type as an annotation should probably be either
> > > defined or explicitly undefined.  Earlier discussions talked about
> > > things like
> > >
> > >     def f (a:int, b:(float | Decimal), c:[int, str, X]) ->str)
> >
> >
> >I think that's a separate (large!) PEP. This PEP should disallow
> frameworks
> >from inventing their own meaning for this syntax (requiring them to at
> least
> >wrap). Then Guido and crew can dig into this issue on their own schedule.
>
> I see we haven't made nearly as much progress on the concept of "no
> predefined semantics" as I thought we had.  :(
>


i.e., -1 on constraining what types mean.
>
>
I don't understand what you're saying.

1. Do you (still?) agree that the meaning of the list type should be defined
as a semantically neutral container for other annotations?

2. Do you (still?) agree that the meanings of ALL built-in types at the
top-level should be reserved for the Python language designers and should
not be randomly used by framework developers. In other words: the function
type declaration syntax above should not be used by one third party type
checker while another third-party type checker uses the same structure to
mean something totally different. Note that I don't mind if they have
conflicting semantics for the same expression as long as the end-user is
forced to declare which semantic model they are using:

tc = typechecker.typecheck
tl = typelinter.check_types

def f (a:tc(int),
        b:tc(float | Decimal),
        c:tc([int, str, X])) -> tc(str)

def g (a:tl(int),
         b:tl(float | Decimal),
         c:tl([int, str, X])) -> tl(str)

3. Do you agree that 1. and 2. together promote the experimentation and
variety that we need?

def f (a: [tc(int), tl("Integer")],
        b: [tc(float | Decimal), tl(Or("float", "Decimal"))],
        c: [tc([int, str, X]), tl(listOf("Integer", "string", "X"))]) ->
                        [tc(str), tl(str)]

 Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060815/31a4e803/attachment.htm 

From paul at prescod.net  Tue Aug 15 16:04:19 2006
From: paul at prescod.net (Paul Prescod)
Date: Tue, 15 Aug 2006 07:04:19 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <ca471dc20608142004t7f0eb56cj6efa40947504dc01@mail.gmail.com>
References: <mailman.34357.1155590629.27774.python-3000@python.org>
	<5.1.1.6.0.20060814174448.025d9018@sparrow.telecommunity.com>
	<ca471dc20608142004t7f0eb56cj6efa40947504dc01@mail.gmail.com>
Message-ID: <1cb725390608150704x4ef5a9abm532cd7ebaae511d@mail.gmail.com>

On 8/14/06, Guido van Rossum <guido at python.org> wrote:
>
>
> Haven't I said that the whole time? I *thought* that Collin's PEP
> steered clear from the topic too. At the same time, does this preclude
> having some kind of "default" type notation in the standard library?


The PEP steered TOO far clear of this topic. If it is a total free-for-all,
then when and if we do come up with a standard syntax (whether in 2006 or
2010) it will conflict with deployed code that used the same syntax to mean
something different than the standard. And even if there is never, ever,
going to be a standard, it must be possible for tools reading the
annotations to know whether the user intended their markup to conform to
metadata syntax 1, where "int" means "32-bit int", or metadata syntax 2,
where it means "arbitrary-sized int". Similarly, they must know whether the
annotator intended to use metadata syntax 1, where "tuple" means "fixed
size, heterogeneous", or syntax 2, where it means "immutable list".

Finally, there must be a standard way for attaching more than one annotation
to a single parameter. The PEP did not define a syntax for that.

I think that there must be enough standardized infrastructure that
annotation processors can recognize the annotations that are applicable to
them and act on them, even if the user has chosen to use more than one
annotation scheme.
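The "recognize the annotations applicable to them" idea can be sketched
concretely. The wrapper class names below (TypeCheckAnno, DocAnno,
annotations_for) are hypothetical illustrations, not part of any proposal
in this thread; the convention assumed is the one discussed here, where a
bare list is a semantically neutral container for other annotations:

```python
# Hypothetical sketch: each annotation scheme ships one wrapper class,
# and a consumer picks out only the instances it understands.

class TypeCheckAnno:          # hypothetical wrapper for a type checker
    def __init__(self, spec):
        self.spec = spec

class DocAnno:                # hypothetical wrapper for a doc tool
    def __init__(self, text):
        self.text = text

def annotations_for(annotation, wrapper_cls):
    """Return the sub-annotations belonging to one consumer.

    A bare list is treated as a neutral container of annotations,
    per the convention discussed in this thread."""
    if isinstance(annotation, list):
        return [a for a in annotation if isinstance(a, wrapper_cls)]
    if isinstance(annotation, wrapper_cls):
        return [annotation]
    return []

# One parameter carries two schemes' annotations; each tool sees only its own.
anno = [TypeCheckAnno(int), DocAnno("frobnication count")]
print(annotations_for(anno, TypeCheckAnno))
print(annotations_for(anno, DocAnno))
```

A tool that does not know a wrapper class simply gets an empty list back,
which matches the "ignore annotations you don't recognize" convention.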

 Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060815/b8793f1c/attachment.html 

From collinw at gmail.com  Tue Aug 15 16:06:28 2006
From: collinw at gmail.com (Collin Winter)
Date: Tue, 15 Aug 2006 09:06:28 -0500
Subject: [Python-3000] Conventions for annotation consumers (was: Re:
	Draft pre-PEP: function annotations)
In-Reply-To: <43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com>
References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com>
	<fb6fbf560608141522j64f611e9ndecd696214b2088c@mail.gmail.com>
	<43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com>
Message-ID: <43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com>

On 8/14/06, Collin Winter <collinw at gmail.com> wrote:
> On 8/14/06, Jim Jewett <jimjjewett at gmail.com> wrote:
> > On 8/14/06, Collin Winter <collinw at gmail.com> wrote:
> > > The problem with using lists is that it's impossible for non-decorator
> > > annotation consumers to know which element "belongs" to them.
> >
> > The ones whose type they own -- which is why I see at least some
> > parallel to exceptions, and its inheritance based semantics.
> >
> >     def f(a:[mytype("asdfljasdf"),
> >              zope.mypackage.something(b,d,e),
> >              "a string",
> >              mytype([47]),
> >              15):
> >
> > Whoever defined mytype controls the meaning of the mytype annotations;
> > anyone not familiar with that package should ignore them (and hope
> > there were no side effects in the expressions that generated them).
> >
> > zope.mypackage controls that annotation; anyone not familiar with that
> > product should ignore it (and hope there were no side effects ...)
>
> As hideous as I think this is from an aesthetics/visual noise
> standpoint, it's probably the only reliable way to let both decorator-
> and non-decorator-based consumers work.

I've changed my mind. This idea isn't going to work at all.

The sticking point is that while this might allow decorator and
non-decorator-based consumers to operate side-by-side *within a single
program*, it makes it impossible for things like pychecker or an
optimising compiler to take advantage of the annotations.

Here's another stab at my earlier idea:

Here's the modified example:

@docstring
@typechecker
@constrain_values
def foo(a: {'doc': "Frobnication count",
           'type': Number,
           'constrain_values': range(3, 9)},
       b: {'type': Number,
            # This can be only 4, 8 or 12
           'constrain_values': [4, 8, 12]}) -> {'type': Number}


We're still using dicts to hold the annotations, but instead of having
the dict keyed on the name (function.__name__) of the annotation
consumer, the keys are arbitrary (for certain values of "arbitrary").
To enable both in-program and static analysis, the most prominent keys
will be specified by the PEP. In this example, "type" and "doc" are
reserved keys; anything that needs the intended type of an annotation
will look at the "type" key, anything that's looking for special doc
strings will look at the "doc" key. Any other consumers are free to
define whatever keys they want (e.g., "constrain_values", above), so
long as they stay away from the reserved strings.

The dict form will be required, even if there's only one type of
annotation. To modify the example above to only use typechecker(),

@typechecker
def foo(a: {'type': Number},
       b: {'type': Number}) -> {'type': Number}
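Since the proposed syntax doesn't exist yet, the dict-keyed convention can
be simulated today with a decorator. Everything below (the `annotate`
helper, the `__annotations_sim__` attribute) is a hypothetical stand-in for
illustration only; the point shown is that a consumer reads just its own
reserved key ('type' here) and ignores the rest:

```python
# Illustrative sketch of the dict-keyed convention: each parameter's
# annotation is a dict, and each consumer reads only its reserved key.

def annotate(**arg_annos):
    """Simulate `def foo(a: {...})` by stashing annotation dicts."""
    def deco(func):
        func.__annotations_sim__ = arg_annos   # hypothetical attribute
        return func
    return deco

def typechecker(func):
    """Consumer: enforce the 'type' key at call time, ignore other keys."""
    annos = getattr(func, '__annotations_sim__', {})
    def wrapper(**kwargs):
        for name, value in kwargs.items():
            expected = annos.get(name, {}).get('type')
            if expected is not None and not isinstance(value, expected):
                raise TypeError("%s must be %s" % (name, expected.__name__))
        return func(**kwargs)
    return wrapper

@typechecker
@annotate(a={'type': int, 'doc': "Frobnication count"},
          b={'type': int})
def foo(a, b):
    return a + b

print(foo(a=1, b=2))   # passes the check
```

A docstring consumer would read only the 'doc' key of the same dicts and
never touch 'type', which is how the scheme lets consumers coexist.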


I'm going to raise the bar for future ideas on this subject: any
proposals must be able to address the following use cases:

1) Static analysis tools (pychecker, optimising compilers, etc) must
be able to use the annotations
2) Decorator-based annotation consumers must be able to use the annotations
3) Non-decorator-based annotation consumers (pydoc, etc) must be able
to use the annotations

Proposals that do not address all of these will not be considered.

Collin Winter

From p.f.moore at gmail.com  Tue Aug 15 16:38:34 2006
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 15 Aug 2006 15:38:34 +0100
Subject: [Python-3000] Conventions for annotation consumers (was: Re:
	Draft pre-PEP: function annotations)
In-Reply-To: <43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com>
References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com>
	<fb6fbf560608141522j64f611e9ndecd696214b2088c@mail.gmail.com>
	<43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com>
	<43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com>
Message-ID: <79990c6b0608150738p37debcf4qfa97400d9c17ba52@mail.gmail.com>

On 8/15/06, Collin Winter <collinw at gmail.com> wrote:
> Here's the modified example
>
> @docstring
> @typechecker
> @constrain_values
> def foo(a: {'doc': "Frobnication count",
>            'type': Number,
>            'constrain_values': range(3, 9)},
>        b: {'type': Number,
>             # This can be only 4, 8 or 12
>            'constrain_values': [4, 8, 12]}) -> {'type': Number}
>

I've been keeping out of this - I haven't followed the discussions,
and I am certainly not up to speed on the various subtleties, but
*surely* there's no intention that a monstrosity like this would count
as a "normal" function definition in Py3K???!!!!

> I'm going to raise the bar for future ideas on this subject: any
> proposals must be able to address the following use cases:
[...]
> Proposals that do not address all of these will not be considered.

Can I suggest a further constraint - anything that results in the
definition of a simple 2-argument function not fitting on a single
source line is probably unworkable in practice?

Paul.

From paul at prescod.net  Tue Aug 15 17:18:31 2006
From: paul at prescod.net (Paul Prescod)
Date: Tue, 15 Aug 2006 08:18:31 -0700
Subject: [Python-3000] Conventions for annotation consumers (was: Re:
	Draft pre-PEP: function annotations)
In-Reply-To: <43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com>
References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com>
	<fb6fbf560608141522j64f611e9ndecd696214b2088c@mail.gmail.com>
	<43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com>
	<43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com>
Message-ID: <1cb725390608150818r5e95b3fdw31b6998bc32051bb@mail.gmail.com>

I totally do not understand the requirement for the dictionary and its extra
overhead.

On 8/15/06, Collin Winter <collinw at gmail.com> wrote:
>
>
> @typechecker
> def foo(a: {'type': Number},
>        b: {'type': Number}) -> {'type': Number}
>
>
> I'm going to raise the bar for future ideas on this subject: any
> proposals must be able to address the following use cases:
>
> 1) Static analysis tools (pychecker, optimising compilers, etc) must
> be able to use the annotations
> 2) Decorator-based annotation consumers must be able to use the
> annotations
> 3) Non-decorator-based annotation consumers (pydoc, etc) must be able
> to use the annotations


Consider the following syntax:

class MyType:
   def __init__(self, name):
        self.name = name

Number = MyType("Number")
Tuple = MyType("Tuple")

def foo(a: tc(Number)) -> Tuple(Number, Number)

1. Static analysis tools can deal with this as much as with ANY truly
Pythonic syntax. Their ability to deal will depend (as in any syntax) on
their ability to do module or whole-program analysis. In your syntax, or
mine, "Number" could be defined dynamically. In either case, someone could
say "Number = None" and confuse everything.

2. A decorator-based analysis could look at __signature__ and do what it
needs.

3. Similarly for non-decorator analyzers.

In fact, given that decorators are just syntactic sugar for function calls,
I don't see why they should require special consideration at all. If the
syntax works well for non-decorator consumers then decorators will be just a
special case. As far as static analysis tools: Python has never made major
concessions to them. Minor concessions, yes.

I'd ask that you add the following requirement:

 * must define how multiple annotation syntaxes can assign potentially
differing meanings to built-in types and objects, on the same parameter,
without actually conflicting

My full program (meeting all requirements) follows.

 Paul Prescod

====

def simulate_signature(sig):
    "simulates the signature feature of Python 3000"
    def _(func):
        func.__signature__ = sig
        return func
    return _

def my_print_signature(func):
    "a demo decorator that prints signatures."
    if hasattr(func, "__signature__"):
        sig = func.__signature__
        [my_print_arg(name, value) for name, value in sig.items()]
    return func

def my_print_arg(name, annotation):
    """print a single argument's
        declaration, skipping unknown anno types."""
    if isinstance(annotation, list):
        [my_print_arg(name, anno)
                for anno in annotation]
    elif conformsToInterface(annotation, MyType):
        print name
        annotation.print_arg()

def conformsToInterface(object, interface):
   "naive implementation of interfaces"
   return isinstance(object, interface)

class MyType:
   def __init__(self, *children):
        self.children = children
   def print_arg(self):
        print self.children

#defined in your module. I have no knowledge of it
class YourType:
   def __init__(self, *stuff):
        pass

# a simple signature

# real syntax should be:
# def foo(bar: MyType(int))
@simulate_signature({"bar": MyType(int)})
def foo(bar):
        return (bar, bar)

# use print signature decorator

# real syntax should be:
# def foo2(bar: [MyType(int)...]) -> [MyType(...)]
@my_print_signature
@simulate_signature({"bar": [MyType(int), YourType("int")],
                        "return": [MyType(tuple([int, int])),
                                   YourType("tuple of int,int")]})
def foo2(bar):
        return (bar, bar)

# can also be used as non-decorator
for name, val in vars().items():
    my_print_signature(val)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060815/6cee76da/attachment.html 

From collinw at gmail.com  Tue Aug 15 17:36:25 2006
From: collinw at gmail.com (Collin Winter)
Date: Tue, 15 Aug 2006 10:36:25 -0500
Subject: [Python-3000] Conventions for annotation consumers (was: Re:
	Draft pre-PEP: function annotations)
In-Reply-To: <1cb725390608150818r5e95b3fdw31b6998bc32051bb@mail.gmail.com>
References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com>
	<fb6fbf560608141522j64f611e9ndecd696214b2088c@mail.gmail.com>
	<43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com>
	<43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com>
	<1cb725390608150818r5e95b3fdw31b6998bc32051bb@mail.gmail.com>
Message-ID: <43aa6ff70608150836o8f9970dr3974935edefb9f3d@mail.gmail.com>

On 8/15/06, Paul Prescod <paul at prescod.net> wrote:
> I totally do not understand the requirement for the dictionary and its extra
> overhead.

Under your proposal, annotation consumer libraries have to provide
wrappers for Python's built-in types, since the only way a library has
of knowing whether it should process a given object is by applying a
subclass test.

Extending this same idea to static analysis tools, tools like
pychecker or an optimising compiler would have to supply their own
such wrapper classes. This would be a huge burden, not just on the
authors of such tools, but also on those wishing to use these tools.

I want people to be able to use Python's built-in types without ugly
wrapper classes or any other similar impediments to their pre-existing
Python workflow/thought patterns.

Collin Winter

From pje at telecommunity.com  Tue Aug 15 18:05:22 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue, 15 Aug 2006 12:05:22 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <1cb725390608150704x4ef5a9abm532cd7ebaae511d@mail.gmail.com
 >
References: <ca471dc20608142004t7f0eb56cj6efa40947504dc01@mail.gmail.com>
	<mailman.34357.1155590629.27774.python-3000@python.org>
	<5.1.1.6.0.20060814174448.025d9018@sparrow.telecommunity.com>
	<ca471dc20608142004t7f0eb56cj6efa40947504dc01@mail.gmail.com>
Message-ID: <5.1.1.6.0.20060815120206.026052e8@sparrow.telecommunity.com>

At 07:04 AM 8/15/2006 -0700, Paul Prescod wrote:

>On 8/14/06, Guido van Rossum <<mailto:guido at python.org>guido at python.org> 
>wrote:
>>
>>Haven't I said that the whole time? I *thought* that Collin's PEP
>>steered clear from the topic too. At the same time, does this preclude
>>having some kind of "default" type notation in the standard library?
>
>The PEP steered TOO far clear of this topic. If it is a total free-for-all
>then when and if we do come up with a standard syntax (whether in 2006 or
>2010) it will conflict with deployed code that used the same syntax to mean
>something different than the standard. And even if there is never, ever,
>going to be a standard, it must be possible for tools reading the
>annotations to know whether the user intended their markup to conform to
>metadata syntax 1, where "int" means "32-bit int", or metadata syntax 2
>where it means "arbitrary-sized int". Similarly, they must know whether
>the annotator intended to use metadata syntax 1, where "tuple" means "fixed
>size, heterogeneous", or syntax 2, where it means "immutable list".

On the contrary - it is precisely this looseness that the PEP meant to 
specify, and that I support.  The alternative is too restrictive.

Meanwhile, the absence of predefined semantics does *not* preclude a 
default type notation existing in the standard library, any more than the 
absence of a predefined semantics for docstrings or function attributes 
prevents the stdlib from containing docstring processors or tools that 
operate on function attributes.


From pje at telecommunity.com  Tue Aug 15 18:09:48 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue, 15 Aug 2006 12:09:48 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <1cb725390608150656o2587c0ddx2af8e7df80f8e7b8@mail.gmail.co
 m>
References: <5.1.1.6.0.20060814174448.025d9018@sparrow.telecommunity.com>
	<mailman.34357.1155590629.27774.python-3000@python.org>
	<5.1.1.6.0.20060814174448.025d9018@sparrow.telecommunity.com>
Message-ID: <5.1.1.6.0.20060815120531.02601dd8@sparrow.telecommunity.com>

At 06:56 AM 8/15/2006 -0700, Paul Prescod wrote:
>On 8/14/06, Phillip J. Eby 
><<mailto:pje at telecommunity.com>pje at telecommunity.com> wrote:
>>At 1:51 PM 8/14/2006 -0700, "Paul Prescod" 
>><<mailto:paul at prescod.net>paul at prescod.net> wrote:
>> >On 8/14/06, Jim Jewett 
>> <<mailto:jimjjewett at gmail.com>jimjjewett at gmail.com> wrote:
>> > > The definition of a type as an annotation should probably be either
>> > > defined or explicitly undefined.  Earlier discussions talked about
>> > > things like
>> > >
>> > >     def f (a:int, b:(float | Decimal), c:[int, str, X]) -> str
>> >
>> >
>> >I think that's a separate (large!) PEP. This PEP should disallow frameworks
>> >from inventing their own meaning for this syntax (requiring them to at
>> >least wrap). Then Guido and crew can dig into this issue on their own
>> >schedule.
>>
>>I see we haven't made nearly as much progress on the concept of "no
>>predefined semantics" as I thought we had.  :(
>
>
>>i.e., -1 on constraining what types mean.
>
>I don't understand what you're saying.

I'm saying that we don't need a predefined semantics for annotation objects 
of type 'type'; i.e. the PEP need not define what "a:int" means.  I'm 
roughly +0 on having predefined semantics for annotation objects of type 
'list' and 'str'.


>1. Do you (still?) agree that the meaning of the list type should be 
>defined as a semantically neutral container for other annotations?

I believe it should be a recommended best practice -- "defined" is too 
strong a word.


>2. Do you (still?) agree that the meanings of ALL built-in types at the 
>top-level should be reserved for the Python language designers and should 
>not be randomly used by framework developers. In other words: the function 
>type declaration syntax above should not be used by one third party type 
>checker while another third-party type checker uses the same structure to 
>mean something totally different. Note that I don't mind if they have 
>conflicting semantics for the same expression as long as the end-user is 
>forced to declare which semantic model they are using:

I don't see a reason to require an explicit wrapper except as a 
disambiguator.  That is, until you *actually* need them, 
discriminator-wrappers are a YAGNI.


From paul at prescod.net  Tue Aug 15 20:09:17 2006
From: paul at prescod.net (Paul Prescod)
Date: Tue, 15 Aug 2006 11:09:17 -0700
Subject: [Python-3000] Conventions for annotation consumers (was: Re:
	Draft pre-PEP: function annotations)
In-Reply-To: <43aa6ff70608150836o8f9970dr3974935edefb9f3d@mail.gmail.com>
References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com>
	<fb6fbf560608141522j64f611e9ndecd696214b2088c@mail.gmail.com>
	<43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com>
	<43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com>
	<1cb725390608150818r5e95b3fdw31b6998bc32051bb@mail.gmail.com>
	<43aa6ff70608150836o8f9970dr3974935edefb9f3d@mail.gmail.com>
Message-ID: <1cb725390608151109t8c58f89p43ba13472031201a@mail.gmail.com>

On 8/15/06, Collin Winter <collinw at gmail.com> wrote:
>
> On 8/15/06, Paul Prescod <paul at prescod.net> wrote:
> > I totally do not understand the requirement for the dictionary and its
> extra
> > overhead.
>
> Under your proposal, annotation consumer libraries have to provide
> wrappers for Python's built-in types, since the only way a library has
> of knowing whether it should process a given object is by applying a
> subclass test.
>
> Extending this same idea to static analysis tools, tools like
> pychecker or an optimising compiler would have to supply their own
> such wrapper classes. This would be a huge burden, not just on the
> authors of such tools, but also on those wishing to use these tools.


No, this is incorrect. Metadata is just metadata. Libraries act on
metadata.  There is a many-to-many relationship. You could go and define
Collin's type metadata syntax. You create a library of wrappers (really you
need only ONE wrapper). Then you could convince the writers of PyPy to use
the same syntax. So there would be one set of annotations used by two
libraries.

Here's what the definition of the one wrapper could look like:

class my_type:
   def __init__(self, data):
       self.data = data

That's it. That's all you need to implement.

> I want people to be able to use Python's built-in types without ugly
> wrapper classes or any other similar impediments to their pre-existing
> Python workflow/thought patterns.


The wrapper class doesn't need to be ugly. Just:

from typecheck import my_type as t

def foo(a: t(int, int), b: t("abc")): ...

 Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060815/6a95fa1a/attachment.html 

From collinw at gmail.com  Tue Aug 15 20:28:24 2006
From: collinw at gmail.com (Collin Winter)
Date: Tue, 15 Aug 2006 13:28:24 -0500
Subject: [Python-3000] Conventions for annotation consumers (was: Re:
	Draft pre-PEP: function annotations)
In-Reply-To: <1cb725390608151109t8c58f89p43ba13472031201a@mail.gmail.com>
References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com>
	<fb6fbf560608141522j64f611e9ndecd696214b2088c@mail.gmail.com>
	<43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com>
	<43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com>
	<1cb725390608150818r5e95b3fdw31b6998bc32051bb@mail.gmail.com>
	<43aa6ff70608150836o8f9970dr3974935edefb9f3d@mail.gmail.com>
	<1cb725390608151109t8c58f89p43ba13472031201a@mail.gmail.com>
Message-ID: <43aa6ff70608151128u720b59ecq7d6831177452ebea@mail.gmail.com>

On 8/15/06, Paul Prescod <paul at prescod.net> wrote:
> On 8/15/06, Collin Winter <collinw at gmail.com> wrote:
> > Extending this same idea to static analysis tools, tools like
> > pychecker or an optimising compiler would have to supply their own
> > such wrapper classes. This would be a huge burden, not just on the
> > authors of such tools, but also on those wishing to use these tools.
>
> No, this is incorrect. Metadata is just metadata. Libraries act on metadata.
>  There is a many to many relationship. You could go and define Collin's type
> metadata syntax. You create a library of wrappers (really you need only ONE
> wrapper). Then you could convince the writers of PyPy to use the same
> syntax. So there would be one set of annotations used by two libraries.

If multiple libraries use the same wrappers, then I can't use more
than one of these libraries at the same time. If a typechecking
consumer, a docstring consumer and PyPy all use the same wrapper (or
"syntax" -- you switch terms between sentences), then I can't have
typechecking and docstrings on the same functions, and I can't do
either if I'm running my program with PyPy.

Collin Winter

From paul at prescod.net  Tue Aug 15 20:54:48 2006
From: paul at prescod.net (Paul Prescod)
Date: Tue, 15 Aug 2006 11:54:48 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <5.1.1.6.0.20060815120531.02601dd8@sparrow.telecommunity.com>
References: <mailman.34357.1155590629.27774.python-3000@python.org>
	<5.1.1.6.0.20060814174448.025d9018@sparrow.telecommunity.com>
	<5.1.1.6.0.20060815120531.02601dd8@sparrow.telecommunity.com>
Message-ID: <1cb725390608151154y6ef8138dy1b029f8b84339fa9@mail.gmail.com>

On 8/15/06, Phillip J. Eby <pje at telecommunity.com> wrote:
>
>
> I don't see a reason to require an explicit wrapper except as a
> disambiguator.  That is, until you *actually* need them,
> discriminator-wrappers are a YAGNI.



How will you know you "actually" need them until you run a tool on your code
and it crashes or gives the wrong result? And what will you do then, go and
clean up your code? And what if the libraries have defined no disambiguation
syntax? Then what?

Function attributes are at least disambiguated by name. You can't put a
function attribute on a function without giving it a name. We need at least
this level of disambiguation for metadata.

Docstrings have become somewhat of a mess of various meanings. Back in the
late 90s I attached XPaths to them and the Spark guy attached parser grammar
instructions. Pydoc, pointed at one of my xpath-embedding classes, would
produce useless gibberish. So in that sense there was a clash. Given that
pydoc is seldom a mission-critical part of any system, this is a minor
issue. Confused type declarations could cause bigger problems, from crashed
compilers to segmentation faults in interpreters.

 Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060815/92902a10/attachment.htm 

From paul at prescod.net  Tue Aug 15 21:07:57 2006
From: paul at prescod.net (Paul Prescod)
Date: Tue, 15 Aug 2006 12:07:57 -0700
Subject: [Python-3000] Conventions for annotation consumers (was: Re:
	Draft pre-PEP: function annotations)
In-Reply-To: <43aa6ff70608151128u720b59ecq7d6831177452ebea@mail.gmail.com>
References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com>
	<fb6fbf560608141522j64f611e9ndecd696214b2088c@mail.gmail.com>
	<43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com>
	<43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com>
	<1cb725390608150818r5e95b3fdw31b6998bc32051bb@mail.gmail.com>
	<43aa6ff70608150836o8f9970dr3974935edefb9f3d@mail.gmail.com>
	<1cb725390608151109t8c58f89p43ba13472031201a@mail.gmail.com>
	<43aa6ff70608151128u720b59ecq7d6831177452ebea@mail.gmail.com>
Message-ID: <1cb725390608151207g36a4c692v2c2d49c1a3e821fa@mail.gmail.com>

On 8/15/06, Collin Winter <collinw at gmail.com> wrote:
>
> If multiple libraries use the same wrappers, then I can't use more
> than one of these libraries at the same time. If a typechecking
> consumer, a docstring consumer and PyPy all use the same wrapper (or
> "syntax" -- you switch terms between sentences), then I can't have
> typechecking and docstrings on the same functions, and I can't do
> either if I'm running my program with PyPy.


There is a MANY TO MANY relationship between syntaxes (as denoted by
wrappers) and tools that work on those syntaxes.

Think of it by analogy: there are programming languages and there are
interpreters. Some programming languages run on multiple interpreters
(e.g. Python on .NET, JVM, PyPy, CPython). Some interpreters run multiple
languages (e.g. .NET, JVM). Some interpreters run a single language
(CPython).

Or another analogy from my domain: there are a variety of XML syntaxes. Some
are designed for a single program. Others, like Atom, are designed for many,
many programs. Also, some programs can handle a single input format. Others
(like RSS/Atom readers) can consume many.

A Typechecking consumer and a PyPy compiler consumer might work on the same
annotations because they are both interested in TYPES (but doing different
things with them). These type consumers might also choose to implement more
than one type checking syntax, if there were a good reason that more than
one arose (perhaps Unix types versus .NET types).

A docstring consumer and a typechecking consumer would *by definition* use
different syntaxes/frameworks/wrappers because the information that they are
looking for is different! But there could be hundreds of docstring consumers
(as there are today!). Docstrings are a special case because the syntax for
them is fairly obvious (an unadorned string).

 Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060815/9fa6ca8b/attachment.html 

From collinw at gmail.com  Tue Aug 15 21:13:16 2006
From: collinw at gmail.com (Collin Winter)
Date: Tue, 15 Aug 2006 14:13:16 -0500
Subject: [Python-3000] Conventions for annotation consumers (was: Re:
	Draft pre-PEP: function annotations)
In-Reply-To: <1cb725390608151207g36a4c692v2c2d49c1a3e821fa@mail.gmail.com>
References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com>
	<fb6fbf560608141522j64f611e9ndecd696214b2088c@mail.gmail.com>
	<43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com>
	<43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com>
	<1cb725390608150818r5e95b3fdw31b6998bc32051bb@mail.gmail.com>
	<43aa6ff70608150836o8f9970dr3974935edefb9f3d@mail.gmail.com>
	<1cb725390608151109t8c58f89p43ba13472031201a@mail.gmail.com>
	<43aa6ff70608151128u720b59ecq7d6831177452ebea@mail.gmail.com>
	<1cb725390608151207g36a4c692v2c2d49c1a3e821fa@mail.gmail.com>
Message-ID: <43aa6ff70608151213v1ced455btdaadb51e1761738d@mail.gmail.com>

On 8/15/06, Paul Prescod <paul at prescod.net> wrote:
> A Typechecking consumer and a PyPy compiler consumer might work on the same
> annotations because they are both interested in TYPES (but doing different
> things with them). These type consumers might also choose to implement more
> than one type checking syntax, if there were a good reason that more than
> one arose (perhaps Unix types versus .NET types).
>
> A docstring consumer and a typechecking consumer would *by definition* use
> different syntaxes/frameworks/wrappers because the information that they are
> looking for is different! But there could be hundreds of docstring consumers
> (as there are today!). Docstrings are a special case because the syntax for
> them is fairly obvious (an unadorned string).

So basically what you're saying is that there would be a more-or-less
standard wrapper for each application of function annotations. How is
this significantly better than my dict-based approach, which uses
standardised dict keys to indicate the kind of metadata?

Collin Winter

From paul at prescod.net  Tue Aug 15 21:30:36 2006
From: paul at prescod.net (Paul Prescod)
Date: Tue, 15 Aug 2006 12:30:36 -0700
Subject: [Python-3000] Conventions for annotation consumers (was: Re:
	Draft pre-PEP: function annotations)
In-Reply-To: <43aa6ff70608151213v1ced455btdaadb51e1761738d@mail.gmail.com>
References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com>
	<fb6fbf560608141522j64f611e9ndecd696214b2088c@mail.gmail.com>
	<43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com>
	<43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com>
	<1cb725390608150818r5e95b3fdw31b6998bc32051bb@mail.gmail.com>
	<43aa6ff70608150836o8f9970dr3974935edefb9f3d@mail.gmail.com>
	<1cb725390608151109t8c58f89p43ba13472031201a@mail.gmail.com>
	<43aa6ff70608151128u720b59ecq7d6831177452ebea@mail.gmail.com>
	<1cb725390608151207g36a4c692v2c2d49c1a3e821fa@mail.gmail.com>
	<43aa6ff70608151213v1ced455btdaadb51e1761738d@mail.gmail.com>
Message-ID: <1cb725390608151230i298c1889gd93233db2b6f980a@mail.gmail.com>

On 8/15/06, Collin Winter <collinw at gmail.com> wrote:
>
> So basically what you're saying is that there would be a more-or-less
> standard wrapper for each application of function annotations.


No, I explicitly said that there may or may not arise standards based upon
the existence or non-existence of community consensus and convergence of
requirements. Just as there may or may not arise a standard Python web
application framework depending on whether the community converges or does
not.

How is
> this significantly better than my dict-based approach, which uses
> standardised dict keys to indicate the kind of metadata?


The dict-based approach introduces an extra namespace to manage. What if two
different groups start fighting over the keyword "type" or "doc" or "lock"?
Python already has a module system that allows you to use the word "type"
and me to use the word "type" without conflict (though I can't guarantee
that it won't be confusing!). Python's module system allows renaming and
abbreviating: both valuable features.

Also, the dict-based approach is just more punctuation to type. What is the
dict equivalent for:

def foo(a: type(int)) -> type(int):
  ...

versus

def foo(a: {"type":int}) -> {"type": int}:

In my approach you could do this:

Int = type(int)

def foo(a: Int) -> Int:
  ...

 Paul Prescod

From collinw at gmail.com  Tue Aug 15 22:13:19 2006
From: collinw at gmail.com (Collin Winter)
Date: Tue, 15 Aug 2006 15:13:19 -0500
Subject: [Python-3000] Conventions for annotation consumers (was: Re:
	Draft pre-PEP: function annotations)
In-Reply-To: <1cb725390608151230i298c1889gd93233db2b6f980a@mail.gmail.com>
References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com>
	<43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com>
	<43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com>
	<1cb725390608150818r5e95b3fdw31b6998bc32051bb@mail.gmail.com>
	<43aa6ff70608150836o8f9970dr3974935edefb9f3d@mail.gmail.com>
	<1cb725390608151109t8c58f89p43ba13472031201a@mail.gmail.com>
	<43aa6ff70608151128u720b59ecq7d6831177452ebea@mail.gmail.com>
	<1cb725390608151207g36a4c692v2c2d49c1a3e821fa@mail.gmail.com>
	<43aa6ff70608151213v1ced455btdaadb51e1761738d@mail.gmail.com>
	<1cb725390608151230i298c1889gd93233db2b6f980a@mail.gmail.com>
Message-ID: <43aa6ff70608151313h2b945032wa556903f3f9a44c3@mail.gmail.com>

On 8/15/06, Paul Prescod <paul at prescod.net> wrote:
> On 8/15/06, Collin Winter <collinw at gmail.com> wrote:
> > How is
> > this significantly better than my dict-based approach, which uses
> > standardised dict keys to indicate the kind of metadata?
>
> The dict-based approach introduces an extra namespace to manage. What if two
> different groups start fighting over the keyword "type" or "doc" or "lock"?

How do you foresee this arising? Do you think users will start wanting
to apply several different typechecking systems to the same function?

The idea behind these standard keys is to a) keep them limited in
number, and b) keep them limited in scope. At the moment, I can only
foresee two of these: "type" and "doc". My justification for "type" is
that users won't be using multiple type systems on the same parameter
(and if they are, that's their own problem); for "doc", it's that a
docstring is just a Python string, and there's really only one way to
look at it within the scope of documentation strings.

Beyond these applications, the annotation consumers are on their own.
Consumers that operate in the same domain may well coordinate their
keys, and popular keys might make it into the list of standard keys
(like the process for getting a module into the stdlib).
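Collin's scheme is concrete enough to sketch. The reserved keys and the `get_metadata` helper below are illustrative only, not part of any accepted PEP:

```python
# Hypothetical sketch of the dict-based convention: each annotation is
# a dict whose reserved keys ("type", "doc") are shared by all
# consumers.  The key names and helper are illustrative.
def foo(a: {"type": int, "doc": "frobnication count"}) -> {"type": int}:
    return a * 2

def get_metadata(func, param, key):
    """Return one kind of metadata for one parameter, if present."""
    ann = func.__annotations__.get(param, {})
    return ann.get(key)

print(get_metadata(foo, "a", "type"))  # <class 'int'>
print(get_metadata(foo, "a", "doc"))   # frobnication count
```

A typechecker would only ever look at the "type" key; a doc tool would only look at "doc".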

I hope to have a second draft of the pre-PEP within a few days that
includes this idea.

Collin Winter

From ironfroggy at gmail.com  Wed Aug 16 00:20:14 2006
From: ironfroggy at gmail.com (Calvin Spealman)
Date: Tue, 15 Aug 2006 18:20:14 -0400
Subject: [Python-3000] Fwd: Conventions for annotation consumers (was: Re:
	Draft pre-PEP: function annotations)
In-Reply-To: <76fd5acf0608151519k564723f9q2d409e3285a7918f@mail.gmail.com>
References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com>
	<1cb725390608150818r5e95b3fdw31b6998bc32051bb@mail.gmail.com>
	<43aa6ff70608150836o8f9970dr3974935edefb9f3d@mail.gmail.com>
	<1cb725390608151109t8c58f89p43ba13472031201a@mail.gmail.com>
	<43aa6ff70608151128u720b59ecq7d6831177452ebea@mail.gmail.com>
	<1cb725390608151207g36a4c692v2c2d49c1a3e821fa@mail.gmail.com>
	<43aa6ff70608151213v1ced455btdaadb51e1761738d@mail.gmail.com>
	<1cb725390608151230i298c1889gd93233db2b6f980a@mail.gmail.com>
	<43aa6ff70608151313h2b945032wa556903f3f9a44c3@mail.gmail.com>
	<76fd5acf0608151519k564723f9q2d409e3285a7918f@mail.gmail.com>
Message-ID: <76fd5acf0608151520n7692824asf06a849ac3114e5b@mail.gmail.com>

On 8/15/06, Collin Winter <collinw at gmail.com> wrote:
> On 8/15/06, Paul Prescod <paul at prescod.net> wrote:
> > On 8/15/06, Collin Winter <collinw at gmail.com> wrote:
> > > How is
> > > this significantly better than my dict-based approach, which uses
> > > standardised dict keys to indicate the kind of metadata?
> >
> > The dict-based approach introduces an extra namespace to manage. What if two
> > different groups start fighting over the keyword "type" or "doc" or "lock"?
>
> How do you foresee this arising? Do you think users will start wanting
> to apply several different typechecking systems to the same function?
>
> The idea behind these standard keys is to a) keep them limited in
> number, and b) keep them limited in scope. At the moment, I can only
> foresee two of these: "type" and "doc". My justification for "type" is
> that users won't be using multiple type systems on the same parameter
> (and if they are, that their own problem); for "doc" is that a
> docstring is just a Python string, and there's really only own way to
> look at that within the scope of documentation strings.
>
> Beyond these applications, the annotation consumers are on their own.
> Consumers that operate in the same domain may well coordinate their
> keys, and popular keys might make it into the list of standard keys
> (like the process for getting a module into the stdlib).
>
> I hope to have a second draft of the pre-PEP within a few days that
> includes this idea.
>
> Collin Winter

The dictionary approach, although it is what I was originally planning
to support, is just too ugly and too limited. String keys can be
ambiguous, but objects cannot. The arguments against the better
approaches, which you keep repeating, just don't hold up. The
non-dictionary, multiple-annotation proposals stand up to your
requirements perfectly, and fulfill them even better than the
dictionary approach does.

1) Static analysis tools (pychecker, optimising compilers, etc) must
be able to use the annotations

As in every example given so far, the annotations would be instantiated
within the function definition itself, so the form 'def
foo(a: Bar(baz))' is to be expected. This form could even be
documented as the preferred way, as opposed to instantiating the
annotation object beforehand and simply using its name in the
function definition. This leads to simple parsing by external tools,
which would be able to deduce what Bar is (because earlier in the file
there was a 'from bar import Bar').

Dictionary string keys are just too limited and offer too much chance
for conflicts. Better to avoid them now than after there are
established and conflicting libraries expecting different things.

2) Decorator-based annotation consumers must be able to use the annotations
3) Non-decorator-based annotation consumers (pydoc, etc) must be able
to use the annotations

A simple filter on the type of the annotations (perhaps via a helper
function in some basic annotation utility library) will let any
consumer get the kinds of annotations it needs.
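Such a filter can be sketched in a few lines. The TypeAnnotation/DocAnnotation marker classes and the helper name below are hypothetical, not an established library:

```python
# A rough sketch of object-based filtering: each consumer asks for
# annotations of its own marker type and ignores the rest.
class TypeAnnotation:
    def __init__(self, tp):
        self.tp = tp

class DocAnnotation:
    def __init__(self, text):
        self.text = text

def annotations_of(func, kind):
    """Yield (param, annotation) pairs whose annotation is a `kind`."""
    for name, ann in func.__annotations__.items():
        if isinstance(ann, kind):
            yield name, ann

def foo(a: TypeAnnotation(int), b: DocAnnotation("a flag")):
    pass

# A type checker sees only TypeAnnotation instances; a doc tool would
# ask for DocAnnotation instead.
print([name for name, _ in annotations_of(foo, TypeAnnotation)])  # ['a']
```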

In the end, the biggest argument against the dictionary approach is
that it is simply too ugly, and that ugliness would be almost
impossible to avoid even for a single annotation on a parameter.

From collinw at gmail.com  Wed Aug 16 00:29:48 2006
From: collinw at gmail.com (Collin Winter)
Date: Tue, 15 Aug 2006 17:29:48 -0500
Subject: [Python-3000] Fwd: Conventions for annotation consumers (was:
	Re: Draft pre-PEP: function annotations)
In-Reply-To: <76fd5acf0608151520n7692824asf06a849ac3114e5b@mail.gmail.com>
References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com>
	<43aa6ff70608150836o8f9970dr3974935edefb9f3d@mail.gmail.com>
	<1cb725390608151109t8c58f89p43ba13472031201a@mail.gmail.com>
	<43aa6ff70608151128u720b59ecq7d6831177452ebea@mail.gmail.com>
	<1cb725390608151207g36a4c692v2c2d49c1a3e821fa@mail.gmail.com>
	<43aa6ff70608151213v1ced455btdaadb51e1761738d@mail.gmail.com>
	<1cb725390608151230i298c1889gd93233db2b6f980a@mail.gmail.com>
	<43aa6ff70608151313h2b945032wa556903f3f9a44c3@mail.gmail.com>
	<76fd5acf0608151519k564723f9q2d409e3285a7918f@mail.gmail.com>
	<76fd5acf0608151520n7692824asf06a849ac3114e5b@mail.gmail.com>
Message-ID: <43aa6ff70608151529q18748348g3dce7c193450a0fb@mail.gmail.com>

On 8/15/06, Calvin Spealman <ironfroggy at gmail.com> wrote:
> On 8/15/06, Collin Winter <collinw at gmail.com> wrote:
>> 1) Static analysis tools (pychecker, optimising compilers, etc) must
>> be able to use the annotations
>
> As in any example given so far, the annotations would be instansiated
> within the function definition itself, which means the form 'def
> foo(a: Bar(baz))' is to be expected. This form could even be
> documented as the prefered way, as opposed to instansiating the
> annotation object before hand and simply using its name in the
> function definition. This leads to simple parsing by external tools,
> which would be able to deduce what bar is (because before that line
> there was an 'from bar import Bar'.

How exactly do they "deduce" what Bar is, just from the "from bar
import Bar" line? pychecker would have to import and compile the Bar
module first. What if being able to import bar depends on some import
hooks that some other module (imported before bar) installed? I guess
you'd have to follow the entire import graph just to make sure. Oh,
and you'd have to end up running the module being analysed in case
*it* installs some import hooks -- or maybe it defines Bar itself.

Your proposal isn't workable.

Collin Winter

From jimjjewett at gmail.com  Wed Aug 16 01:08:41 2006
From: jimjjewett at gmail.com (Jim Jewett)
Date: Tue, 15 Aug 2006 19:08:41 -0400
Subject: [Python-3000] Conventions for annotation consumers (was: Re:
	Draft pre-PEP: function annotations)
In-Reply-To: <43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com>
References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com>
	<fb6fbf560608141522j64f611e9ndecd696214b2088c@mail.gmail.com>
	<43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com>
	<43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com>
Message-ID: <fb6fbf560608151608g57320277v9caa4ade2aef4462@mail.gmail.com>

On 8/15/06, Collin Winter <collinw at gmail.com> wrote:

> Here's another stab at my earlier idea: ...

> We're still using dicts to hold the annotations, but instead of having
> the dict keyed on the name (function.__name__) of the annotation
> consumer, the keys are arbitrary (for certain values of "arbitrary").
> To enable both in-program and static analysis, the most prominent keys
> will be specified by the PEP. In this example, "type" and "doc" are
> reserved keys; anything that needs the intended type of an annotation
> will look at the "type" key, anything that's looking for special doc
> strings will look at the "doc" key. Any other consumers are free to
> define whatever keys they want (e.g., "constrain_values", above), so
> long as they stay away from the reserved strings.

That seems to get the worst of both worlds.

Static tools will now know that something is intended to express type
information, but still won't know whether it describes typical usage,
an invariant, or an adapter that will make any argument work.

Meanwhile, two different systems can still clash on "constrain_values"
(as well as "type"), without the benefit of an actual type object (or
a name associated with an import) to disambiguate.

> 1) Static analysis tools (pychecker, optimising compilers, etc) must
> be able to use the annotations

If the ownership is by object type, then static tools can get at least
a pretty good idea by looking at the name used to construct those
types.  Realistically, if

    >>> from zope.mypackage import something as anno1
    ...
    >>> def f(a: anno1("asfd")): ...

does not provide enough information, then nothing static ever will.
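With the benefit of hindsight, the purely static pairing Jim describes can be sketched with the `ast` module (which post-dates this thread); no imports are executed, and the zope.mypackage names are Jim's hypothetical example:

```python
# Pair each annotation call with the import that introduced its name,
# without importing or running the analysed module at all.
import ast

source = '''
from zope.mypackage import something as anno1

def f(a: anno1("asfd")):
    pass
'''

tree = ast.parse(source)

origins = {}  # local name -> "module.original_name"
for node in ast.walk(tree):
    if isinstance(node, ast.ImportFrom):
        for alias in node.names:
            origins[alias.asname or alias.name] = "%s.%s" % (node.module, alias.name)

results = []
for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef):
        for arg in node.args.args:
            ann = arg.annotation
            if isinstance(ann, ast.Call) and isinstance(ann.func, ast.Name):
                results.append((arg.arg, origins.get(ann.func.id, "<unknown>")))

print(results)  # [('a', 'zope.mypackage.something')]
```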

-jJ

From jimjjewett at gmail.com  Wed Aug 16 01:22:24 2006
From: jimjjewett at gmail.com (Jim Jewett)
Date: Tue, 15 Aug 2006 19:22:24 -0400
Subject: [Python-3000] Conventions for annotation consumers (was: Re:
	Draft pre-PEP: function annotations)
In-Reply-To: <43aa6ff70608151313h2b945032wa556903f3f9a44c3@mail.gmail.com>
References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com>
	<43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com>
	<1cb725390608150818r5e95b3fdw31b6998bc32051bb@mail.gmail.com>
	<43aa6ff70608150836o8f9970dr3974935edefb9f3d@mail.gmail.com>
	<1cb725390608151109t8c58f89p43ba13472031201a@mail.gmail.com>
	<43aa6ff70608151128u720b59ecq7d6831177452ebea@mail.gmail.com>
	<1cb725390608151207g36a4c692v2c2d49c1a3e821fa@mail.gmail.com>
	<43aa6ff70608151213v1ced455btdaadb51e1761738d@mail.gmail.com>
	<1cb725390608151230i298c1889gd93233db2b6f980a@mail.gmail.com>
	<43aa6ff70608151313h2b945032wa556903f3f9a44c3@mail.gmail.com>
Message-ID: <fb6fbf560608151622v7672fd5arb2d3dbac00a6e680@mail.gmail.com>

On 8/15/06, Collin Winter <collinw at gmail.com> wrote:
> ... that users won't be using multiple type systems on the same parameter
> (and if they are, that their own problem); for "doc" is that a
> docstring is just a Python string, and there's really only own way to
> look at that within the scope of documentation strings.

oh ye of little cynicism.

(1)  I might well restrict *myself* to a single type system.  But that
doesn't mean I don't ever want to use someone else's modules, or that
I don't want a doc tool to handle them.

(2)  doc strings already exist, and have already grown inconsistent
microstructure.

"""one line summary -- may or may not include the call signature

Longer documentation, which may or may not also include doctests or
ReST or html or sample calls in a non-doctest format or magic tokens
used by various frameworks, such as Design By Contract wrappers.

Oh, and that first blank line?  Some tools rely on it.  Some functions
don't use it.

Of course, some functions don't use docstrings at all, because the
writers are already afraid that a framework like unittest will
misinterpret them."""

-jJ

From tim.hochberg at ieee.org  Wed Aug 16 01:26:30 2006
From: tim.hochberg at ieee.org (Tim Hochberg)
Date: Tue, 15 Aug 2006 16:26:30 -0700
Subject: [Python-3000] Fwd: Conventions for annotation consumers (was:
 Re: Draft pre-PEP: function annotations)
In-Reply-To: <43aa6ff70608151529q18748348g3dce7c193450a0fb@mail.gmail.com>
References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com>	<43aa6ff70608150836o8f9970dr3974935edefb9f3d@mail.gmail.com>	<1cb725390608151109t8c58f89p43ba13472031201a@mail.gmail.com>	<43aa6ff70608151128u720b59ecq7d6831177452ebea@mail.gmail.com>	<1cb725390608151207g36a4c692v2c2d49c1a3e821fa@mail.gmail.com>	<43aa6ff70608151213v1ced455btdaadb51e1761738d@mail.gmail.com>	<1cb725390608151230i298c1889gd93233db2b6f980a@mail.gmail.com>	<43aa6ff70608151313h2b945032wa556903f3f9a44c3@mail.gmail.com>	<76fd5acf0608151519k564723f9q2d409e3285a7918f@mail.gmail.com>	<76fd5acf0608151520n7692824asf06a849ac3114e5b@mail.gmail.com>
	<43aa6ff70608151529q18748348g3dce7c193450a0fb@mail.gmail.com>
Message-ID: <ebtl7o$jd7$1@sea.gmane.org>

Collin Winter wrote:
> On 8/15/06, Calvin Spealman <ironfroggy at gmail.com> wrote:
>> On 8/15/06, Collin Winter <collinw at gmail.com> wrote:
>>> 1) Static analysis tools (pychecker, optimising compilers, etc) must
>>> be able to use the annotations
>> As in any example given so far, the annotations would be instansiated
>> within the function definition itself, which means the form 'def
>> foo(a: Bar(baz))' is to be expected. This form could even be
>> documented as the prefered way, as opposed to instansiating the
>> annotation object before hand and simply using its name in the
>> function definition. This leads to simple parsing by external tools,
>> which would be able to deduce what bar is (because before that line
>> there was an 'from bar import Bar'.
> 
> How exactly do they "deduce" what Bar is, just from the "from bar
> import Bar" line? pychecker would have to import and compile the Bar
> module first. What if being able to import bar depends on some import
> hooks that some other module (imported before bar) installed? I guess
> you'd have to follow the entire import graph just to make sure. Oh,
> and you'd have to end up running the module being analysed in case
> *it* installs some import hooks -- or maybe it defines Bar itself.

Why does PyChecker need to "deduce" what Bar is at all? Either bar.Bar
is something that PyChecker knows about, because it indicates
something that it knows how to check, or it's something it doesn't
know about, in which case it can safely ignore it. I fail to see any
significant difference in


def foo(a: Bar(baz)): ...

versus

def foo(a: {'Bar' : baz}): ...

except that the latter is harder to read and more prone to name collisions.

> 
> Your proposal isn't workable.

I, at least, fail to see why at this point.

-tim



From pje at telecommunity.com  Wed Aug 16 02:17:29 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue, 15 Aug 2006 20:17:29 -0400
Subject: [Python-3000] Fwd: Conventions for annotation consumers
In-Reply-To: <mailman.34618.1155684147.27774.python-3000@python.org>
Message-ID: <5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com>

At 05:29 PM 8/16/2006 -0500, "Collin Winter" <collinw at gmail.com> wrote:
>On 8/15/06, Calvin Spealman <ironfroggy at gmail.com> wrote:
> > On 8/15/06, Collin Winter <collinw at gmail.com> wrote:
> >> 1) Static analysis tools (pychecker, optimising compilers, etc) must
> >> be able to use the annotations
> >
> > As in any example given so far, the annotations would be instansiated
> > within the function definition itself, which means the form 'def
> > foo(a: Bar(baz))' is to be expected. This form could even be
> > documented as the prefered way, as opposed to instansiating the
> > annotation object before hand and simply using its name in the
> > function definition. This leads to simple parsing by external tools,
> > which would be able to deduce what bar is (because before that line
> > there was an 'from bar import Bar'.
>
>How exactly do they "deduce" what Bar is, just from the "from bar
>import Bar" line? pychecker would have to import and compile the Bar
>module first. What if being able to import bar depends on some import
>hooks that some other module (imported before bar) installed? I guess
>you'd have to follow the entire import graph just to make sure. Oh,
>and you'd have to end up running the module being analysed in case
>*it* installs some import hooks -- or maybe it defines Bar itself.
>
>Your proposal isn't workable.

By that logic, neither is Python.  :)

I think you mean the reverse; the proposal instead shows that requirement 
#1 is what's not workable here.

I'm frankly baffled by the amount of "protect users from incompatibility" 
ranting that this issue has generated.  If I wanted to use Java, I'd know 
where to find it.  Guido has said time and again that Python's balance 
favors the individual developer at the expense of the group where 
"consenting adults" is concerned, and Py3K isn't intended to change that 
balance.

Personally, I thought Guido's original proposal for function annotations, 
which included a __typecheck__ operator that was replaceable on a 
per-module basis (and defaulted to a no-op), was the perfect thing -- 
neither too much semantics nor too little.  I'd like to have it back, 
please.  :)


From guido at python.org  Wed Aug 16 02:21:27 2006
From: guido at python.org (Guido van Rossum)
Date: Tue, 15 Aug 2006 17:21:27 -0700
Subject: [Python-3000] threading, part 2
In-Reply-To: <1d85506f0608110033k2eac1f9h10908ddbef5db8c3@mail.gmail.com>
References: <1d85506f0608101214g594d2dal282ab2ae60f29f11@mail.gmail.com>
	<ca471dc20608101621j557f735cs10f4f491eb3b2ee5@mail.gmail.com>
	<1d85506f0608110033k2eac1f9h10908ddbef5db8c3@mail.gmail.com>
Message-ID: <ca471dc20608151721g1f426a8i8fd80eecdb18ddfd@mail.gmail.com>

On 8/11/06, tomer filiba <tomerfiliba at gmail.com> wrote:
> [Guido]
> > I expect that Jython doesn't implement this; it doesn't handle ^C either AFAIK.
>
> threads are at most platform agnostic (old unices, embedded systems, etc.
> are not likely to have thread support)

I'm not sure what "platform agnostic" means to you. I think you mean
"a platform dependent optional feature"?

> so keeping this in mind, and having interrupt_main part of the standard
> thread API, which as you say, may not be implementation agnostic,
> why is thread.raise_exc(id, excobj) a bad API?

Because it is more general than interrupt_main(). I'm happy to declare
the latter a CPython exclusive feature that not all other platforms
may support even if they have threads. raise_exc() would have at best
the same status; I imagine the set of platforms where it can be
implemented is smaller than the set of platforms that can support
interrupt_main().
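For reference, the CPython-only mechanism behind the proposed raise_exc() (the "ctypes hack" referred to later in this thread) can be sketched like this; it is a fragile, implementation-specific recipe, not a portable API:

```python
# CPython's PyThreadState_SetAsyncExc schedules an exception *class*
# to be raised in another thread.  CPython-only, and inherently racy
# -- exactly the concern discussed above.
import ctypes

def raise_exc(thread_id, exctype):
    """Ask CPython to raise exctype in the thread with the given id."""
    res = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_long(thread_id), ctypes.py_object(exctype))
    if res == 0:
        raise ValueError("invalid thread id")
    elif res > 1:
        # More than one thread state was affected: undo and bail out.
        ctypes.pythonapi.PyThreadState_SetAsyncExc(
            ctypes.c_long(thread_id), None)
        raise SystemError("PyThreadState_SetAsyncExc failed")
```

Note that the exception is only delivered the next time the target thread executes bytecode, which is why a thread blocked in a C call cannot be interrupted this way.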

> and as i recall, dotNET's Thread.AbortThread or whatever it's called
> works that way (raising an exception in the other thread), so IronPython
> for once, should be happy with it.

But Jython?

> by the way, is the GIL part of the python standard? i.e., does IronPython
> implement it, although it shouldn't be necessary in dotNET?

No. Neither Jython nor IronPython have it. But since the presence of
the GIL is never directly detectable from Python code, I'm not sure
how it *could* be part of the language standard.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Wed Aug 16 02:23:42 2006
From: guido at python.org (Guido van Rossum)
Date: Tue, 15 Aug 2006 17:23:42 -0700
Subject: [Python-3000] threading, part 2
In-Reply-To: <bb8868b90608110847wb5465eekd13cdb454eeac4cb@mail.gmail.com>
References: <1d85506f0608101214g594d2dal282ab2ae60f29f11@mail.gmail.com>
	<ca471dc20608101621j557f735cs10f4f491eb3b2ee5@mail.gmail.com>
	<1d85506f0608110033k2eac1f9h10908ddbef5db8c3@mail.gmail.com>
	<bb8868b90608110847wb5465eekd13cdb454eeac4cb@mail.gmail.com>
Message-ID: <ca471dc20608151723g7e04325dvd1de4bdb75296d30@mail.gmail.com>

On 8/11/06, Jason Orendorff <jason.orendorff at gmail.com> wrote:
> On 8/11/06, tomer filiba <tomerfiliba at gmail.com> wrote:
> > why is thread.raise_exc(id, excobj) a bad API?
>
> It breaks seemingly innocent code in subtle ways.  Worse, the breakage
> will always be a race condition, so it'll be especially hard to
> reproduce and debug.

So is KeyboardInterrupt. But at least that can't be raised in threads.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Wed Aug 16 02:28:21 2006
From: guido at python.org (Guido van Rossum)
Date: Tue, 15 Aug 2006 17:28:21 -0700
Subject: [Python-3000] threading, part 2
In-Reply-To: <bb8868b90608110904s289236d7i9b60f14969966625@mail.gmail.com>
References: <c56e219d0608102001i44b1267dqb581c2171ced33ce@mail.gmail.com>
	<20060811102346.EFC4.SLAWOMIR.NOWACZYK.847@student.lu.se>
	<20060811082620.192E.JCARLSON@uci.edu>
	<bb8868b90608110904s289236d7i9b60f14969966625@mail.gmail.com>
Message-ID: <ca471dc20608151728s16b8806fk56cbfba639d9233a@mail.gmail.com>

On 8/11/06, Jason Orendorff <jason.orendorff at gmail.com> wrote:
> On 8/11/06, Josiah Carlson <jcarlson at uci.edu> wrote:
> > Slawomir Nowaczyk <slawomir.nowaczyk.847 at student.lu.se> wrote:
> > > But it should not be done lightly and never when the code is not
> > > specifically expecting it.
> >
> > If you don't want random exceptions being raised in your threads, then
> > don't use this method that is capable of raising exceptions somewhat
> > randomly.
>
> I agree.  The only question is how dire the warnings should be.
>
> I'll answer that question with another question:  Are we going to make
> the standard library robust against asynchronous exceptions?  For
> example, class Thread has an attribute __stopped that is set using
> code similar to the example code I posted.  An exception at just the
> wrong time would kill the thread while leaving __stopped == False.
>
> Maybe that particular case is worth fixing, but to find and fix them
> all?  Better to put strong warnings on this one method: may cause
> unpredictable brokenness.

That is a rather special case since this code (unlike most stdlib
code) can assume it won't get asynchronous exceptions like
KeyboardInterrupt, since that can't be raised in threads.

I expect that the unpredictable brokenness is even bigger in most user
code -- *most* people can't write threadsafe code if their life
depended on it. I believe the only exception I know is Tim Peters.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Wed Aug 16 02:29:32 2006
From: guido at python.org (Guido van Rossum)
Date: Tue, 15 Aug 2006 17:29:32 -0700
Subject: [Python-3000] threading, part 2
In-Reply-To: <87fyg32oo8.fsf@qrnik.zagroda>
References: <c56e219d0608102001i44b1267dqb581c2171ced33ce@mail.gmail.com>
	<20060811102346.EFC4.SLAWOMIR.NOWACZYK.847@student.lu.se>
	<20060811082620.192E.JCARLSON@uci.edu> <87fyg32oo8.fsf@qrnik.zagroda>
Message-ID: <ca471dc20608151729v2392d140q46ce96f71341896@mail.gmail.com>

On 8/11/06, Marcin 'Qrczak' Kowalczyk <qrczak at knm.org.pl> wrote:
> I do want asynchronous exceptions, but not anywhere, only in selected
> regions (or excluding selected regions). This can be designed well.

Please write a proto-PEP.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Wed Aug 16 02:40:29 2006
From: guido at python.org (Guido van Rossum)
Date: Tue, 15 Aug 2006 17:40:29 -0700
Subject: [Python-3000] threading, part 2
In-Reply-To: <ebqhvi$80m$1@sea.gmane.org>
References: <1d85506f0608111713m15cf2e67v8b94f06c928e9125@mail.gmail.com>
	<ca471dc20608141117l1c61247fy39bae2b00d45675d@mail.gmail.com>
	<ebqhvi$80m$1@sea.gmane.org>
Message-ID: <ca471dc20608151740k3ce102a3oe17be8e542e10b1b@mail.gmail.com>

On 8/14/06, Georg Brandl <g.brandl at gmx.net> wrote:
> Guido van Rossum wrote:
> > On 8/11/06, tomer filiba <tomerfiliba at gmail.com> wrote:
> >> i mailed this to several people separately, but then i thought it could
> >> benefit the entire group:
> >>
> >> http://sebulba.wikispaces.com/recipe+thread2
> >>
> >> it's an implementation of the proposed " thread.raise_exc", through an extension
> >> to the threading.Thread class. you can test it for yourself; if it proves useful,
> >> it should be exposed as thread.raise_exc in the stdlib (instead of the ctypes
> >> hack)... and of course it should be reflected in threading.Thread as welll.
> >
> > Cool. Question: what's the problem with raising exception instances?
> > Especially in the light of my proposal to use
> >
> >   raise SomeException(42)
> >
> > in preference over (and perhaps exclusively instead of)
> >
> >   raise SomeException, 42
> >
> > in Py3k. The latter IMO is a relic from the days of string exceptions
> > which are as numbered as they come. :-)
>
> I think this is the answer:
>
> http://mail.python.org/pipermail/python-dev/2006-August/068165.html

Hopefully we can fix this in 2.6 or 3.0.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg.ewing at canterbury.ac.nz  Wed Aug 16 03:09:54 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 16 Aug 2006 13:09:54 +1200
Subject: [Python-3000] Function annotations considered obfuscatory (Re:
 Conventions for annotation consumers)
In-Reply-To: <43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com>
References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com>
	<fb6fbf560608141522j64f611e9ndecd696214b2088c@mail.gmail.com>
	<43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com>
	<43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com>
Message-ID: <44E27062.2040406@canterbury.ac.nz>

Collin Winter wrote:

> @docstring
> @typechecker
> @constrain_values
> def foo(a: {'doc': "Frobnication count",
>            'type': Number,
>            'constrain_values': range(3, 9)},
>        b: {'type': Number,
>             # This can be only 4, 8 or 12
>            'constrain_values': [4, 8, 12]}) -> {'type': Number}

There's another thing that's bothering me about all this.
The main reason Guido rejected the originally suggested
syntax for function decorators was that it put too much
stuff into the function header and obscured the signature.

Now we seem to be about to open ourselves up to the
same problem on an even bigger scale. Who can honestly
say that the above function declaration is easy to read?
To me it looks downright ugly.

--
Greg

From ironfroggy at gmail.com  Wed Aug 16 03:21:22 2006
From: ironfroggy at gmail.com (Calvin Spealman)
Date: Tue, 15 Aug 2006 21:21:22 -0400
Subject: [Python-3000] Fwd: Conventions for annotation consumers (was:
	Re: Draft pre-PEP: function annotations)
In-Reply-To: <43aa6ff70608151529q18748348g3dce7c193450a0fb@mail.gmail.com>
References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com>
	<1cb725390608151109t8c58f89p43ba13472031201a@mail.gmail.com>
	<43aa6ff70608151128u720b59ecq7d6831177452ebea@mail.gmail.com>
	<1cb725390608151207g36a4c692v2c2d49c1a3e821fa@mail.gmail.com>
	<43aa6ff70608151213v1ced455btdaadb51e1761738d@mail.gmail.com>
	<1cb725390608151230i298c1889gd93233db2b6f980a@mail.gmail.com>
	<43aa6ff70608151313h2b945032wa556903f3f9a44c3@mail.gmail.com>
	<76fd5acf0608151519k564723f9q2d409e3285a7918f@mail.gmail.com>
	<76fd5acf0608151520n7692824asf06a849ac3114e5b@mail.gmail.com>
	<43aa6ff70608151529q18748348g3dce7c193450a0fb@mail.gmail.com>
Message-ID: <76fd5acf0608151821m25ab8048ia6f3d6f288d59338@mail.gmail.com>

On 8/15/06, Collin Winter <collinw at gmail.com> wrote:
> On 8/15/06, Calvin Spealman <ironfroggy at gmail.com> wrote:
> > On 8/15/06, Collin Winter <collinw at gmail.com> wrote:
> >> 1) Static analysis tools (pychecker, optimising compilers, etc) must
> >> be able to use the annotations
> >
> > As in any example given so far, the annotations would be instansiated
> > within the function definition itself, which means the form 'def
> > foo(a: Bar(baz))' is to be expected. This form could even be
> > documented as the prefered way, as opposed to instansiating the
> > annotation object before hand and simply using its name in the
> > function definition. This leads to simple parsing by external tools,
> > which would be able to deduce what bar is (because before that line
> > there was an 'from bar import Bar'.
>
> How exactly do they "deduce" what Bar is, just from the "from bar
> import Bar" line? pychecker would have to import and compile the Bar
> module first. What if being able to import bar depends on some import
> hooks that some other module (imported before bar) installed? I guess
> you'd have to follow the entire import graph just to make sure. Oh,
> and you'd have to end up running the module being analysed in case
> *it* installs some import hooks -- or maybe it defines Bar itself.
>
> Your proposal isn't workable.
>
> Collin Winter

Any external tool that needs to analyze the annotations statically
would either already know what the module bar and the object bar.Bar
are, or it would ignore them. It therefore has no need to import or
statically parse the modules imported for annotation objects at all.
For example, you might write 'from annodoc import doc' and then 'def
foo(a: doc("the only argument"))'; a documentation generator would
already know what the annodoc module is and wouldn't need to introspect
it in order to understand the annotations.
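
That pattern can be sketched concretely in the eventual Python 3
annotation syntax. This is only an illustrative sketch: the "annodoc"
module and its "doc" wrapper are hypothetical names from the example
above, not a real library.

```python
# Hypothetical "doc" annotation object, as in the annodoc example above.
class doc:
    """Wraps a human-readable description of a parameter."""
    def __init__(self, text):
        self.text = text

def foo(a: doc("the only argument")):
    return a

def generate_docs(func):
    # A documentation generator keeps only the annotations it
    # recognizes (instances of its own doc type) and ignores the rest.
    return {name: ann.text
            for name, ann in func.__annotations__.items()
            if isinstance(ann, doc)}

print(generate_docs(foo))  # {'a': 'the only argument'}
```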

Your outright refusal to accept the arguments against these points of
your proposal is dragging this discussion out to an unneeded length.
The majority consensus is pointing away from the dictionary
multi-annotations you propose and from the leave-and-let-be standpoint
you originally tried to keep, while type-based annotations seem much
better agreed upon and have more support. This continually stretching
debate needs to reach a consensus, and the best-received idea might
not be yours.

We really need to see the PEP draft updated to reflect something of a
solution to these issues. There is much less disagreement than the
volume of discussion would suggest, so the answers are clear enough to
move forward with.

From greg.ewing at canterbury.ac.nz  Wed Aug 16 03:32:51 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 16 Aug 2006 13:32:51 +1200
Subject: [Python-3000] Conventions for annotation consumers
In-Reply-To: <1cb725390608151230i298c1889gd93233db2b6f980a@mail.gmail.com>
References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com>
	<fb6fbf560608141522j64f611e9ndecd696214b2088c@mail.gmail.com>
	<43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com>
	<43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com>
	<1cb725390608150818r5e95b3fdw31b6998bc32051bb@mail.gmail.com>
	<43aa6ff70608150836o8f9970dr3974935edefb9f3d@mail.gmail.com>
	<1cb725390608151109t8c58f89p43ba13472031201a@mail.gmail.com>
	<43aa6ff70608151128u720b59ecq7d6831177452ebea@mail.gmail.com>
	<1cb725390608151207g36a4c692v2c2d49c1a3e821fa@mail.gmail.com>
	<43aa6ff70608151213v1ced455btdaadb51e1761738d@mail.gmail.com>
	<1cb725390608151230i298c1889gd93233db2b6f980a@mail.gmail.com>
Message-ID: <44E275C3.2070508@canterbury.ac.nz>

Paul Prescod wrote:
> What if 
> two different groups start fighting over the keyword "type" or "doc" or 
> "lock"? Python already has a module system that allows you to use the 
> word "type" and me to use the word "type" without conflict

But, in general, performing this disambiguation requires
executing the module that is making the annotations. For
a processor that only wants to deal with the source, this
is undesirable.

--
Greg

From ironfroggy at gmail.com  Wed Aug 16 04:18:15 2006
From: ironfroggy at gmail.com (Calvin Spealman)
Date: Tue, 15 Aug 2006 22:18:15 -0400
Subject: [Python-3000] Conventions for annotation consumers
In-Reply-To: <44E275C3.2070508@canterbury.ac.nz>
References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com>
	<43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com>
	<1cb725390608150818r5e95b3fdw31b6998bc32051bb@mail.gmail.com>
	<43aa6ff70608150836o8f9970dr3974935edefb9f3d@mail.gmail.com>
	<1cb725390608151109t8c58f89p43ba13472031201a@mail.gmail.com>
	<43aa6ff70608151128u720b59ecq7d6831177452ebea@mail.gmail.com>
	<1cb725390608151207g36a4c692v2c2d49c1a3e821fa@mail.gmail.com>
	<43aa6ff70608151213v1ced455btdaadb51e1761738d@mail.gmail.com>
	<1cb725390608151230i298c1889gd93233db2b6f980a@mail.gmail.com>
	<44E275C3.2070508@canterbury.ac.nz>
Message-ID: <76fd5acf0608151918j3d572b7cq9d61b5170ce966a3@mail.gmail.com>

On 8/15/06, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Paul Prescod wrote:
> > What if
> > two different groups start fighting over the keyword "type" or "doc" or
> > "lock"? Python already has a module system that allows you to use the
> > word "type" and me to use the word "type" without conflict
>
> But, in general, performing this disambiguation requires
> executing the module that is making the annotations. For
> a processor that only wants to deal with the source, this
> is undesirable.

The path to the module should be considered more like a namespace
identifier. When I see that the annotation Number comes from
annolib.types, 'annolib.types' can be taken as a unique namespace
identifier giving the context of the name 'Number'. This doesn't need
any processing of the annolib.types module itself, because the contents
of that module are not important, only the name.
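
That namespace-identifier idea can be sketched with a purely static
scan (using the modern ast module, which did not exist at the time):
the tool resolves the annotation name to a dotted path from the import
statements alone and never imports the hypothetical annolib package.

```python
import ast

SOURCE = """
from annolib.types import Number

def f(x: Number):
    return x
"""

tree = ast.parse(SOURCE)

# Map local names to fully qualified dotted paths, from imports only.
imported = {}
for node in ast.walk(tree):
    if isinstance(node, ast.ImportFrom):
        for alias in node.names:
            imported[alias.asname or alias.name] = (
                node.module + "." + alias.name)

# Resolve each parameter annotation to its namespace identifier.
for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef):
        for arg in node.args.args:
            if isinstance(arg.annotation, ast.Name):
                print(arg.arg, "->", imported.get(arg.annotation.id))
# x -> annolib.types.Number
```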

From guido at python.org  Wed Aug 16 06:04:41 2006
From: guido at python.org (Guido van Rossum)
Date: Tue, 15 Aug 2006 21:04:41 -0700
Subject: [Python-3000] Function annotations considered obfuscatory (Re:
	Conventions for annotation consumers)
In-Reply-To: <44E27062.2040406@canterbury.ac.nz>
References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com>
	<fb6fbf560608141522j64f611e9ndecd696214b2088c@mail.gmail.com>
	<43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com>
	<43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com>
	<44E27062.2040406@canterbury.ac.nz>
Message-ID: <ca471dc20608152104m49517746yc6ef3340f6fc53f4@mail.gmail.com>

On 8/15/06, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Collin Winter wrote:
>
> > @docstring
> > @typechecker
> > @constrain_values
> > def foo(a: {'doc': "Frobnication count",
> >            'type': Number,
> >            'constrain_values': range(3, 9)},
> >        b: {'type': Number,
> >             # This can be only 4, 8 or 12
> >            'constrain_values': [4, 8, 12]}) -> {'type': Number}
>
> There's another thing that's bothering me about all this.
> The main reason Guido rejected the originally suggested
> syntax for function decorators was that it put too much
> stuff into the function header and obscured the signature.
>
> Now we seem to be about to open ourselves up to the
> same problem on an even bigger scale. Who can honestly
> say that the above function declaration is easy to read?
> To me it looks downright ugly.

It's a worst-case scenario suggesting how one could solve a very hairy
problem. I don't expect that something this extreme will be at all
common (otherwise I'd be against it too).

PS. http://meyerweb.com/eric/comment/chech.html

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Wed Aug 16 06:13:11 2006
From: guido at python.org (Guido van Rossum)
Date: Tue, 15 Aug 2006 21:13:11 -0700
Subject: [Python-3000] Bound and unbound methods
In-Reply-To: <44DFE092.8030604@canterbury.ac.nz>
References: <44DF0D38.6070507@acm.org> <20060813102036.1985.JCARLSON@uci.edu>
	<44DF86AA.7050207@acm.org> <44DFE092.8030604@canterbury.ac.nz>
Message-ID: <ca471dc20608152113n66471411qf72144022e88f04d@mail.gmail.com>

On 8/13/06, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Talin wrote:
> > the compiler would note the combination of the attribute access and the
> > call, and combine them into an opcode that skips the whole method
> > creation step.
>
> Something like that could probably be made to work. You'd
> want to be careful to do the optimisation only when the
> attribute in question is an ordinary attribute, not
> a property or other descriptor.
>
> I'm also -1 on eliminating bound methods entirely.
> I worked through that idea in considerable depth during my
> discussions with the author of Prothon, which was also to
> have been without any notion of bound methods. The
> consequences are further-reaching than you might think at
> first. The bottom line is that without bound methods,
> Python wouldn't really be Python any more.


Right. I'm against anything that changes the current semantics. I'm
all for a compiler optimization that turns "<expr> . <name> ( <args>
)" into a single opcode that somehow manages to avoid creating the
bound object. As long as it also does the right thing in case the name
refers to something that's not quite a standard method -- be it a
class method or static method, or a class, or anything else callable
(or even not callable :-). And it would be fine if that optimization
wasn't used if there are keyword arguments, or *args or **kwds, or
more than N arguments for some N > 3 or so.
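
The cost being discussed can be observed directly: in CPython, each
attribute access on an instance builds a fresh bound-method object
before the call, which is exactly what such an opcode would avoid. A
minimal demonstration:

```python
class C:
    def m(self):
        return 42

c = C()

# Two attribute accesses yield two distinct bound-method objects
# wrapping the same underlying function.
assert c.m is not c.m
assert c.m.__func__ is C.m
assert c.m() == 42
```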

But, as Thomas says, it was tried before and didn't quite work. Maybe
we can borrow some ideas from IronPython, which boasts a 7x faster
method call (or was it function call? it was a call anyway); and
according to Jim Hugunin only half of that speed-up (on a linear or
logarithmic scale? he didn't say) can be explained through the .NET
JIT.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From paul at prescod.net  Wed Aug 16 07:22:34 2006
From: paul at prescod.net (Paul Prescod)
Date: Tue, 15 Aug 2006 22:22:34 -0700
Subject: [Python-3000] Fwd: Conventions for annotation consumers (was:
	Re: Draft pre-PEP: function annotations)
In-Reply-To: <43aa6ff70608151529q18748348g3dce7c193450a0fb@mail.gmail.com>
References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com>
	<1cb725390608151109t8c58f89p43ba13472031201a@mail.gmail.com>
	<43aa6ff70608151128u720b59ecq7d6831177452ebea@mail.gmail.com>
	<1cb725390608151207g36a4c692v2c2d49c1a3e821fa@mail.gmail.com>
	<43aa6ff70608151213v1ced455btdaadb51e1761738d@mail.gmail.com>
	<1cb725390608151230i298c1889gd93233db2b6f980a@mail.gmail.com>
	<43aa6ff70608151313h2b945032wa556903f3f9a44c3@mail.gmail.com>
	<76fd5acf0608151519k564723f9q2d409e3285a7918f@mail.gmail.com>
	<76fd5acf0608151520n7692824asf06a849ac3114e5b@mail.gmail.com>
	<43aa6ff70608151529q18748348g3dce7c193450a0fb@mail.gmail.com>
Message-ID: <1cb725390608152222j32727946ob3c07e43fd004299@mail.gmail.com>

On 8/15/06, Collin Winter <collinw at gmail.com> wrote:
>
>
> How exactly do they "deduce" what Bar is, just from the "from bar
> import Bar" line? pychecker would have to import and compile the Bar
> module first. What if being able to import bar depends on some import
> hooks that some other module (imported before bar) installed? I guess
> you'd have to follow the entire import graph just to make sure. Oh,
> and you'd have to end up running the module being analysed in case
> *it* installs some import hooks -- or maybe it defines Bar itself.


The end-user and the type checker creator can negotiate the boundary between
convenience and easy-to-parse syntax. At first the type checker creator
might say that things must be in a very predictable form with no variants
and no renames. Then they might do a bit more analysis and be able to handle
renames. Then they might evolve towards whole-program analysis and be able
to handle very complicated imports.

Surely you know that decorators can also be renamed, imported, etc. Same
with base classes (which are considered key to type checking). This is just
how Python works. Where people need to use static subsets of Python (like
RPython, or the "freeze" program or the compilable subset used by Jython)
they just define the subset and move on. The language's core behaviour
is defined dynamically.

 Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060815/fa53a041/attachment.html 

From paul at prescod.net  Wed Aug 16 07:34:46 2006
From: paul at prescod.net (Paul Prescod)
Date: Tue, 15 Aug 2006 22:34:46 -0700
Subject: [Python-3000] Conventions for annotation consumers
In-Reply-To: <44E275C3.2070508@canterbury.ac.nz>
References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com>
	<43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com>
	<1cb725390608150818r5e95b3fdw31b6998bc32051bb@mail.gmail.com>
	<43aa6ff70608150836o8f9970dr3974935edefb9f3d@mail.gmail.com>
	<1cb725390608151109t8c58f89p43ba13472031201a@mail.gmail.com>
	<43aa6ff70608151128u720b59ecq7d6831177452ebea@mail.gmail.com>
	<1cb725390608151207g36a4c692v2c2d49c1a3e821fa@mail.gmail.com>
	<43aa6ff70608151213v1ced455btdaadb51e1761738d@mail.gmail.com>
	<1cb725390608151230i298c1889gd93233db2b6f980a@mail.gmail.com>
	<44E275C3.2070508@canterbury.ac.nz>
Message-ID: <1cb725390608152234g4010aedbs9cb92c2c361b390f@mail.gmail.com>

On 8/15/06, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
>
> Paul Prescod wrote:
> > What if
> > two different groups start fighting over the keyword "type" or "doc" or
> > "lock"? Python already has a module system that allows you to use the
> > word "type" and me to use the word "type" without conflict
>
> But, in general, performing this disambiguation requires
> executing the module that is making the annotations. For
> a processor that only wants to deal with the source, this
> is undesirable.


This is true for every proposal we've described. Proposal 1 is:

Foo(int)
Bar(module.type1)

Proposal 2 is:

{"Foo": int,
"Bar": module.type1}

In either case, "int" and "module.type1" can be rebound. To say otherwise is
to change Python's evaluation model drastically.

>>> int = None
>>> float = file
>>>

Once you accept Python's dynamism, it makes sense to accept it both for the
annotation "key" as for the "value". If you can convince Guido and the rest
of the Python-dev team to reject it, then you can reject it for both
equally. So the issue is a red herring.
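
The point can be sketched in the eventual annotation syntax (Foo here
is a hypothetical wrapper standing in for Proposal 1, not a real
class): both proposal styles evaluate their annotation expressions at
definition time, so a rebound name affects them equally.

```python
class Foo:
    """Hypothetical annotation wrapper from Proposal 1."""
    def __init__(self, tp):
        self.tp = tp

# Proposal 1: wrapper objects.
def f(x: Foo(int)):
    return x

# Proposal 2: dicts keyed by consumer name.
def g(x: {"Foo": int}):
    return x

# Either way, the annotation captured whatever "int" was bound to
# at the moment the def statement executed.
assert f.__annotations__["x"].tp is int
assert g.__annotations__["x"]["Foo"] is int
```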

 Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060815/02465dc1/attachment.htm 

From ironfroggy at gmail.com  Wed Aug 16 07:48:00 2006
From: ironfroggy at gmail.com (Calvin Spealman)
Date: Wed, 16 Aug 2006 01:48:00 -0400
Subject: [Python-3000] Bound and unbound methods
In-Reply-To: <ca471dc20608152113n66471411qf72144022e88f04d@mail.gmail.com>
References: <44DF0D38.6070507@acm.org> <20060813102036.1985.JCARLSON@uci.edu>
	<44DF86AA.7050207@acm.org> <44DFE092.8030604@canterbury.ac.nz>
	<ca471dc20608152113n66471411qf72144022e88f04d@mail.gmail.com>
Message-ID: <76fd5acf0608152248j76f38d2x88ba241a8c66c835@mail.gmail.com>

On 8/16/06, Guido van Rossum <guido at python.org> wrote:
> On 8/13/06, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> > Talin wrote:
> > > the compiler would note the combination of the attribute access and the
> > > call, and combine them into an opcode that skips the whole method
> > > creation step.
> >
> > Something like that could probably be made to work. You'd
> > want to be careful to do the optimisation only when the
> > attribute in question is an ordinary attribute, not
> > a property or other descriptor.
> >
> > I'm also -1 on eliminating bound methods entirely.
> > I worked through that idea in considerable depth during my
> > discussions with the author of Prothon, which was also to
> > have been without any notion of bound methods. The
> > consequences are further-reaching than you might think at
> > first. The bottom line is that without bound methods,
> > Python wouldn't really be Python any more.
>
>
> Right. I'm against anything that changes the current semantics. I'm
> all for a compiler optimization that turns "<expr> . <name> ( <args>
> )" into a single opcode that somehow manages to avoid creating the
> bound object. As long as it also does the right thing in case the name
> refers to something that's not quite a standard method -- be it a
> class method or static method, or a class, or anything else callable
> (or even not callable :-). And it would be fine if that optimization
> wasn't used if there are keyword arguments, or *args or **kwds, or
> more than N arguments for some N > 3 or so.
>
> But, as Thomas says, it was tried before and didn't quite work. Maybe
> we can borrow some ideas from IronPython, which boasts a 7x faster
> method call (or was it function call? it was a call anyway); and
> according to Jim Hugunin only half of that speed-up (on a linear or
> logarithmic scale? he didn't say) can be explained through the .NET
> JIT.
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)

Would a possible special method name __methodcall__ be accepted, where
if it exists on a callable, you can expect to use it as __call__ but
with the understanding that it accepts <expr> as self when called in
an optimizable form? This would reduce a method call to two attribute
lookups before the call, instead of an instantiation and all the heavy
lifting currently done. For normal functions, 'f.__methodcall__ is
f.__call__' may be true, but the existence of the __methodcall__ name
itself gives you an extra contract.

From jimjjewett at gmail.com  Wed Aug 16 16:26:44 2006
From: jimjjewett at gmail.com (Jim Jewett)
Date: Wed, 16 Aug 2006 10:26:44 -0400
Subject: [Python-3000] Function annotations considered obfuscatory (Re:
	Conventions for annotation consumers)
In-Reply-To: <ca471dc20608152104m49517746yc6ef3340f6fc53f4@mail.gmail.com>
References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com>
	<fb6fbf560608141522j64f611e9ndecd696214b2088c@mail.gmail.com>
	<43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com>
	<43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com>
	<44E27062.2040406@canterbury.ac.nz>
	<ca471dc20608152104m49517746yc6ef3340f6fc53f4@mail.gmail.com>
Message-ID: <fb6fbf560608160726j3abf6237m8250b9483ecff011@mail.gmail.com>

On 8/16/06, Guido van Rossum <guido at python.org> wrote:
> On 8/15/06, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:

    [9 lines for a two-argument def]

> > There's another thing that's bothering me about all this.
> > The main reason Guido rejected the originally suggested
> > syntax for function decorators was that it put too much
> > stuff into the function header and obscured the signature.

> It's a worst-case scenario suggesting how one could solve a very hairy
> problem. I don't expect that something this extreme will be at all
> common (otherwise I'd be against it too).

Yes and no; I don't think it will be that uncommon to have multiple
annotations, somewhat similar to "public static final int".  Also note
that needing to disambiguate the annotations will tend to increase
their length.

I hope that needing more than one line per argument will be unusual,
but needing more than one line for a definition may not be.

That is one reason I wonder whether all annotations/modifications have
to actually be part of the prologue, or whether they could be applied
to the Signature afterwards.

-jJ

From guido at python.org  Wed Aug 16 16:45:46 2006
From: guido at python.org (Guido van Rossum)
Date: Wed, 16 Aug 2006 07:45:46 -0700
Subject: [Python-3000] Function annotations considered obfuscatory (Re:
	Conventions for annotation consumers)
In-Reply-To: <fb6fbf560608160726j3abf6237m8250b9483ecff011@mail.gmail.com>
References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com>
	<fb6fbf560608141522j64f611e9ndecd696214b2088c@mail.gmail.com>
	<43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com>
	<43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com>
	<44E27062.2040406@canterbury.ac.nz>
	<ca471dc20608152104m49517746yc6ef3340f6fc53f4@mail.gmail.com>
	<fb6fbf560608160726j3abf6237m8250b9483ecff011@mail.gmail.com>
Message-ID: <ca471dc20608160745t4c662158pbcd05ecfed0c5ef@mail.gmail.com>

On 8/16/06, Jim Jewett <jimjjewett at gmail.com> wrote:
> Yes and no; I don't think it will be that uncommon to have multiple
> annotations, somewhat similar to "public static final int".  Also note
> that needing to disambiguate the annotations will tend to increase
> their length.

God save us from public static final int.

> I hope that needing more than one line per argument will be unusual,
> but needing more than one line for a definition may not be.

I expect the latter will be too, as it would only matter for code that
somehow straddles two or more frameworks.

> That is one reason I wonder whether all annotations/modifications have
> to actually be part of the prologue, or whether they could be applied
> to the Signature afterwards.

And how would that reduce the clutter? The information still has to be
entered by the user, presumably with the same disambiguating tags, and
some punctuation.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From collinw at gmail.com  Wed Aug 16 17:09:39 2006
From: collinw at gmail.com (Collin Winter)
Date: Wed, 16 Aug 2006 10:09:39 -0500
Subject: [Python-3000] Fwd: Conventions for annotation consumers
In-Reply-To: <5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com>
References: <mailman.34618.1155684147.27774.python-3000@python.org>
	<5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com>
Message-ID: <43aa6ff70608160809qb8882e1m6b471fda3eee8d10@mail.gmail.com>

On 8/15/06, Phillip J. Eby <pje at telecommunity.com> wrote:
> Personally, I thought Guido's original proposal for function annotations,
> which included a __typecheck__ operator that was replaceable on a
> per-module basis (and defaulted to a no-op), was the perfect thing --
> neither too much semantics nor too-little.  I'd like to have it back,
> please.  :)

I'd be perfectly happy to go back to talking about "type annotations",
rather than the more general "function annotations", especially since
most of the discussion thus far has been about how to do multiple things
with annotations at the same time. Restricting annotations to type
information would be fine by me.

Collin Winter

From guido at python.org  Wed Aug 16 17:45:12 2006
From: guido at python.org (Guido van Rossum)
Date: Wed, 16 Aug 2006 08:45:12 -0700
Subject: [Python-3000] Conventions for annotation consumers
In-Reply-To: <44E275C3.2070508@canterbury.ac.nz>
References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com>
	<43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com>
	<1cb725390608150818r5e95b3fdw31b6998bc32051bb@mail.gmail.com>
	<43aa6ff70608150836o8f9970dr3974935edefb9f3d@mail.gmail.com>
	<1cb725390608151109t8c58f89p43ba13472031201a@mail.gmail.com>
	<43aa6ff70608151128u720b59ecq7d6831177452ebea@mail.gmail.com>
	<1cb725390608151207g36a4c692v2c2d49c1a3e821fa@mail.gmail.com>
	<43aa6ff70608151213v1ced455btdaadb51e1761738d@mail.gmail.com>
	<1cb725390608151230i298c1889gd93233db2b6f980a@mail.gmail.com>
	<44E275C3.2070508@canterbury.ac.nz>
Message-ID: <ca471dc20608160845j3f78fa21r645c193cfb7fb41a@mail.gmail.com>

On 8/15/06, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> But, in general, performing this disambiguation requires
> executing the module that is making the annotations. For
> a processor that only wants to deal with the source, this
> is undesirable.

Um, when did we start off in the direction of source-level processing
of function annotations? Are we still talking about Python? I'm
confused (especially since this thread seems to start in the middle).

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Wed Aug 16 17:48:46 2006
From: guido at python.org (Guido van Rossum)
Date: Wed, 16 Aug 2006 08:48:46 -0700
Subject: [Python-3000] Bound and unbound methods
In-Reply-To: <76fd5acf0608152248j76f38d2x88ba241a8c66c835@mail.gmail.com>
References: <44DF0D38.6070507@acm.org> <20060813102036.1985.JCARLSON@uci.edu>
	<44DF86AA.7050207@acm.org> <44DFE092.8030604@canterbury.ac.nz>
	<ca471dc20608152113n66471411qf72144022e88f04d@mail.gmail.com>
	<76fd5acf0608152248j76f38d2x88ba241a8c66c835@mail.gmail.com>
Message-ID: <ca471dc20608160848o9812ed1jb83f94e4ec013c09@mail.gmail.com>

On 8/15/06, Calvin Spealman <ironfroggy at gmail.com> wrote:
> On 8/16/06, Guido van Rossum <guido at python.org> wrote:
> > Right. I'm against anything that changes the current semantics. I'm
> > all for a compiler optimization that turns "<expr> . <name> ( <args>
> > )" into a single opcode that somehow manages to avoid creating the
> > bound object. As long as it also does the right thing in case the name
> > refers to something that's not quite a standard method -- be it a
> > class method or static method, or a class, or anything else callable
> > (or even not callable :-). And it would be fine if that optimization
> > wasn't used if there are keyword arguments, or *args or **kwds, or
> > more than N arguments for some N > 3 or so.
> >
> > But, as Thomas says, it was tried before and didn't quite work. Maybe
> > we can borrow some ideas from IronPython, which boasts a 7x faster
> > method call (or was it function call? it was a call anyway); and
> > according to Jim Hugunin only half of that speed-up (on a linear or
> > logarithmic scale? he didn't say) can be explained through the .NET
> > JIT.

> Would a possible special method name __methodcall__ be accepted, where
> if it exists on a callable, you can expect to use it as __call__ but
> with the understanding that it accepts <expr> as self when called in
> an optimizable form? This would reduce the method call to two
> attribute lookups before the call instead of an instansiation and all
> the heavy lifting currently done. For normal functions,
> 'f.__methodcall__ is f.__call__' may be true, but the existance of
> that __methodcall__ name just gives you an extra contract.

I'd like to answer "no" (since I think this whole idea is not a very
fruitful avenue) but frankly, I have no idea what you are trying to
describe. Are you even aware of the descriptor protocol (__get__) and
how it's used to create a bound method (or something else)?

No reply is needed.
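
For readers following along: the descriptor protocol referred to here
is what creates bound methods today. Attribute access on an instance
invokes the function's __get__, so both forms below produce equivalent
bound methods.

```python
class C:
    def m(self):
        return self

c = C()

bound = c.m                                  # normal attribute access
also_bound = C.__dict__["m"].__get__(c, C)   # explicit descriptor call

# Both are bound to c and wrap the same underlying function.
assert bound() is c
assert also_bound() is c
assert bound.__func__ is also_bound.__func__
```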

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From pje at telecommunity.com  Wed Aug 16 18:35:00 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed, 16 Aug 2006 12:35:00 -0400
Subject: [Python-3000] Fwd: Conventions for annotation consumers
In-Reply-To: <43aa6ff70608160809qb8882e1m6b471fda3eee8d10@mail.gmail.com>
References: <5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com>
	<mailman.34618.1155684147.27774.python-3000@python.org>
	<5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com>
Message-ID: <5.1.1.6.0.20060816123318.02406018@sparrow.telecommunity.com>

At 10:09 AM 8/16/2006 -0500, Collin Winter wrote:
>On 8/15/06, Phillip J. Eby <pje at telecommunity.com> wrote:
>>Personally, I thought Guido's original proposal for function annotations,
>>which included a __typecheck__ operator that was replaceable on a
>>per-module basis (and defaulted to a no-op), was the perfect thing --
>>neither too much semantics nor too-little.  I'd like to have it back,
>>please.  :)
>
>I'd be perfectly happy to go back to talking about "type annotations",
>rather than the more general "function annotations", especially since
>most of the discussion thus far has been about how to do multiple things
>with annotations at the same time. Restricting annotations to type
>information would be fine by me.

Who said anything about restricting annotations to type information?  I 
just said I liked Guido's original proposal better -- because it doesn't 
restrict a darned thing, and makes it clear that the semantics are up to you.

The annotations of course should still be exposed as a function attribute.


From paul at prescod.net  Wed Aug 16 18:38:21 2006
From: paul at prescod.net (Paul Prescod)
Date: Wed, 16 Aug 2006 09:38:21 -0700
Subject: [Python-3000] Fwd: Conventions for annotation consumers
In-Reply-To: <43aa6ff70608160809qb8882e1m6b471fda3eee8d10@mail.gmail.com>
References: <mailman.34618.1155684147.27774.python-3000@python.org>
	<5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com>
	<43aa6ff70608160809qb8882e1m6b471fda3eee8d10@mail.gmail.com>
Message-ID: <1cb725390608160938h7ddcd317o39e21aac0416a432@mail.gmail.com>

On 8/16/06, Collin Winter <collinw at gmail.com> wrote:
>
> I'd be perfectly happy to go back to talking about "type annotations",
> rather than the more general "function annotations", especially since
> most of the discussion thus far has been about how to do multiple things
> with annotations at the same time. Restricting annotations to type
> information would be fine by me.


I don't understand why we would want to go backwards. You wrote a PEP. We
haven't suggested any major technical changes to it, rather just a few
guidelines. How would restricting the domain of the PEP solve any issues
about dynamicity?

By the way, I think it may be naive to presume that there is only one
relevant type system. People may well want to establish mappings from their
types to programming language types. For example, to COM types, .NET types
and Java types. 80% of these may be inferable from platform-independent
declarations but the other 20% may require a second layer of
platform-specific type declarations.

Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060816/bafe8aa0/attachment.html 

From collinw at gmail.com  Wed Aug 16 18:41:38 2006
From: collinw at gmail.com (Collin Winter)
Date: Wed, 16 Aug 2006 11:41:38 -0500
Subject: [Python-3000] Fwd: Conventions for annotation consumers
In-Reply-To: <5.1.1.6.0.20060816123318.02406018@sparrow.telecommunity.com>
References: <mailman.34618.1155684147.27774.python-3000@python.org>
	<5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com>
	<5.1.1.6.0.20060816123318.02406018@sparrow.telecommunity.com>
Message-ID: <43aa6ff70608160941j724324b2kd8653df2374778be@mail.gmail.com>

On 8/16/06, Phillip J. Eby <pje at telecommunity.com> wrote:
> At 10:09 AM 8/16/2006 -0500, Collin Winter wrote:
> >On 8/15/06, Phillip J. Eby <pje at telecommunity.com> wrote:
> >>Personally, I thought Guido's original proposal for function annotations,
> >>which included a __typecheck__ operator that was replaceable on a
> >>per-module basis (and defaulted to a no-op), was the perfect thing --
> >>neither too much semantics nor too-little.  I'd like to have it back,
> >>please.  :)
> >
> >I'd be perfectly happy to go back to talking about "type annotations",
> >rather than the more general "function annotations", especially since
> >most of the discussion thus far has been about how to do multiple things
> >with annotations at the same time. Restricting annotations to type
> >information would be fine by me.
>
> Who said anything about restricting annotations to type information?  I
> just said I liked Guido's original proposal better -- because it doesn't
> restrict a darned thing, and makes it clear that the semantics are up to you.
>
> The annotations of course should still be exposed as a function attribute.

Sorry, I meant "restrict" as in having it stated that the annotations
are for typechecking, rather than attempting to support a dozen
different uses simultaneously. The annotations would still be
free-form, with the semantics up to whoever's implementing the
__typecheck__ function, and Python itself wouldn't take any steps to
enforce what can or can't go in the annotations.
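
The shape of that might be sketched as follows; __typecheck__ and the
typechecked decorator are hypothetical names illustrating the proposal,
not anything that exists in Python.

```python
# Default, replaceable hook: a no-op, so annotations cost nothing
# unless a module opts in to enforcement.
def __typecheck__(func, args_dict):
    """Default hook: do nothing."""

def typechecked(func):
    def wrapper(*args):
        names = func.__code__.co_varnames[:func.__code__.co_argcount]
        # The hook is looked up globally at call time, so rebinding
        # the module-level __typecheck__ changes behavior.
        __typecheck__(func, dict(zip(names, args)))
        return func(*args)
    return wrapper

@typechecked
def add(a: int, b: int):
    return a + b

assert add(1, 2) == 3  # default hook: nothing is enforced

# A module could rebind the hook to enforce its annotations:
def strict(func, args_dict):
    for name, value in args_dict.items():
        expected = func.__annotations__.get(name)
        if expected is not None and not isinstance(value, expected):
            raise TypeError(name)

__typecheck__ = strict
assert add(1, 2) == 3  # still fine; add("a", "b") would now raise
```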

Is this more along the lines of what you meant?

Collin Winter

From pje at telecommunity.com  Wed Aug 16 18:54:02 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed, 16 Aug 2006 12:54:02 -0400
Subject: [Python-3000] Fwd: Conventions for annotation consumers
In-Reply-To: <43aa6ff70608160941j724324b2kd8653df2374778be@mail.gmail.com>
References: <5.1.1.6.0.20060816123318.02406018@sparrow.telecommunity.com>
	<mailman.34618.1155684147.27774.python-3000@python.org>
	<5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com>
	<5.1.1.6.0.20060816123318.02406018@sparrow.telecommunity.com>
Message-ID: <5.1.1.6.0.20060816124756.023cc448@sparrow.telecommunity.com>

At 11:41 AM 8/16/2006 -0500, Collin Winter wrote:
>On 8/16/06, Phillip J. Eby <pje at telecommunity.com> wrote:
>>At 10:09 AM 8/16/2006 -0500, Collin Winter wrote:
>> >On 8/15/06, Phillip J. Eby <pje at telecommunity.com> wrote:
>> >>Personally, I thought Guido's original proposal for function annotations,
>> >>which included a __typecheck__ operator that was replaceable on a
>> >>per-module basis (and defaulted to a no-op), was the perfect thing --
>> >>neither too much semantics nor too-little.  I'd like to have it back,
>> >>please.  :)
>> >
>> >I'd be perfectly happy to go back to talking about "type annotations",
>> >rather than the more general "function annotations", especially since
>> >most of the discussion thus far has been about how to do multiple things
>> >with annotations at the same time. Restricting annotations to type
>> >information would be fine by me.
>>
>>Who said anything about restricting annotations to type information?  I
>>just said I liked Guido's original proposal better -- because it doesn't
>>restrict a darned thing, and makes it clear that the semantics are up to you.
>>
>>The annotations of course should still be exposed as a function attribute.
>
>Sorry, I meant "restrict" as in having it stated that the annotations
>are for typechecking, rather than attempting to support a dozen
>different uses simultaneously. The annotations would still be
>free-form, with the semantics up to whoever's implementing the
>__typecheck__ function, and Python itself wouldn't take any steps to
>enforce what can or can't go in the annotations.
>
>Is this more along the lines of what you meant?

Yes, but that doesn't mean the notion of "type" can't be fairly 
expansive.  For example, I can foresee wanting to use this "type" 
information to manage marshalling from web forms or XML-RPC 
requests...  defining command-line options and help...  GUI field/widget 
information for command objects, and so on.

In other words, I want open-ended annotation semantics to allow all sorts 
of metadata-driven behavior.

I think the notion that there's a problem with "attempting to support a 
dozen different uses simultaneously" is a red herring.  Docstrings and 
function attributes do just that, and civilization as we know it has not 
collapsed.
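
Eby's parallel can be made concrete. A minimal sketch (the consumer names
are hypothetical; it assumes only that annotations end up exposed as a
function attribute, as proposed earlier in the thread):

```python
# Hypothetical: one free-form annotation dict read by several unrelated
# consumers, much as docstrings already serve help(), doctest, and doc tools.
def set_width(w: {"type": int,
                  "doc": "widget width in pixels",
                  "cli_flag": "--width"}) -> None:
    pass

ann = set_width.__annotations__["w"]  # annotations are plain data
assert ann["type"] is int             # a typechecker reads one key
assert ann["cli_flag"] == "--width"   # a CLI generator reads another
```

Each consumer reads the keys it cares about and ignores the rest, which is
exactly how docstring tools already coexist.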


From paul at prescod.net  Wed Aug 16 18:55:31 2006
From: paul at prescod.net (Paul Prescod)
Date: Wed, 16 Aug 2006 09:55:31 -0700
Subject: [Python-3000] Fwd: Conventions for annotation consumers
In-Reply-To: <43aa6ff70608160941j724324b2kd8653df2374778be@mail.gmail.com>
References: <mailman.34618.1155684147.27774.python-3000@python.org>
	<5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com>
	<5.1.1.6.0.20060816123318.02406018@sparrow.telecommunity.com>
	<43aa6ff70608160941j724324b2kd8653df2374778be@mail.gmail.com>
Message-ID: <1cb725390608160955y6a9776c8x4db1cab893a24875@mail.gmail.com>

On 8/16/06, Collin Winter <collinw at gmail.com> wrote:
>
> Sorry, I meant "restrict" as in having it stated that the annotations
> are for typechecking, rather than attempting to support a dozen
> different uses simultaneously. The annotations would still be
> free-form, with the semantics up to whoever's implementing the
> __typecheck__ function, and Python itself wouldn't take any steps to
> enforce what can or can't go in the annotations.


Nobody ever suggested that Python should take any steps to enforce what can
or can't go in the annotations! It seems that we're inventing disagreement
where there is none. All I ever suggested is a) that we put some guidelines
in the spec *discouraging* people from using built-in Python types for their
own private meanings without some kind of discriminator clarifying that they
are doing so and b) that we define the shared meanings of a couple of useful
types: lists and tuples. This leaves the Python development team the maximum
latitude to specify the meanings for the other types (especially type type)
later.

 Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060816/2505b89f/attachment.htm 

From guido at python.org  Wed Aug 16 18:57:20 2006
From: guido at python.org (Guido van Rossum)
Date: Wed, 16 Aug 2006 09:57:20 -0700
Subject: [Python-3000] Fwd: Conventions for annotation consumers (was:
	Re: Draft pre-PEP: function annotations)
In-Reply-To: <1cb725390608152222j32727946ob3c07e43fd004299@mail.gmail.com>
References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com>
	<43aa6ff70608151128u720b59ecq7d6831177452ebea@mail.gmail.com>
	<1cb725390608151207g36a4c692v2c2d49c1a3e821fa@mail.gmail.com>
	<43aa6ff70608151213v1ced455btdaadb51e1761738d@mail.gmail.com>
	<1cb725390608151230i298c1889gd93233db2b6f980a@mail.gmail.com>
	<43aa6ff70608151313h2b945032wa556903f3f9a44c3@mail.gmail.com>
	<76fd5acf0608151519k564723f9q2d409e3285a7918f@mail.gmail.com>
	<76fd5acf0608151520n7692824asf06a849ac3114e5b@mail.gmail.com>
	<43aa6ff70608151529q18748348g3dce7c193450a0fb@mail.gmail.com>
	<1cb725390608152222j32727946ob3c07e43fd004299@mail.gmail.com>
Message-ID: <ca471dc20608160957v1934bd7bi9b0ed3ecfb2bcef7@mail.gmail.com>

There's much in this thread that I haven't followed, for lack of time.

But it seems clear to me that you've wandered off the path now that
you're discussing what should go into the annotations and how to make
it so that multiple frameworks can coexist.

I don't see how any of that can be analyzed up front -- you have to
build an implementation and try to use it and *then* perhaps you can
think about the problems that occur.

Collin wrote a great PEP that doesn't commit to any kind of semantics
for annotations. (I still have to read it more closely, but from
skimming, it looks fine.) Let's focus some efforts on implementing
that first, and see how we can use it, before we consider the use case
of a framework for frameworks.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From paul at prescod.net  Wed Aug 16 18:58:29 2006
From: paul at prescod.net (Paul Prescod)
Date: Wed, 16 Aug 2006 09:58:29 -0700
Subject: [Python-3000] Fwd: Conventions for annotation consumers
In-Reply-To: <1cb725390608160955y6a9776c8x4db1cab893a24875@mail.gmail.com>
References: <mailman.34618.1155684147.27774.python-3000@python.org>
	<5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com>
	<5.1.1.6.0.20060816123318.02406018@sparrow.telecommunity.com>
	<43aa6ff70608160941j724324b2kd8653df2374778be@mail.gmail.com>
	<1cb725390608160955y6a9776c8x4db1cab893a24875@mail.gmail.com>
Message-ID: <1cb725390608160958s1c8985f3i432ac41cf30d570a@mail.gmail.com>

I said "lists and tuples" where I meant "lists and strings".

On 8/16/06, Paul Prescod <paul at prescod.net> wrote:
>
> On 8/16/06, Collin Winter <collinw at gmail.com> wrote:
>
> > Sorry, I meant "restrict" as in having it stated that the annotations
> > are for typechecking, rather than attempting to support a dozen
> > different uses simultaneously. The annotations would still be
> > free-form, with the semantics up to whoever's implementing the
> > __typecheck__ function, and Python itself wouldn't take any steps to
> > enforce what can or can't go in the annotations.
>
>
> Nobody ever suggested that Python should take any steps to enforce what
> can or can't go in the annotations! It seems that we're inventing
> disagreement where there is none. All I ever suggested is a) that we put
> some guidelines in the spec *discouraging* people from using built-in Python
> types for their own private meanings without some kind of discriminator
> clarifying that they are doing so and b) that we define the shared meanings
> of a couple of useful types: lists and tuples. This leaves the Python
> development team the maximum latitude to specify the meanings for the other
> types (especially type type) later.
>
>  Paul Prescod
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060816/8810da78/attachment.html 

From jcarlson at uci.edu  Wed Aug 16 19:03:05 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Wed, 16 Aug 2006 10:03:05 -0700
Subject: [Python-3000] Function annotations considered obfuscatory (Re:
	Conventions for annotation consumers)
In-Reply-To: <ca471dc20608160745t4c662158pbcd05ecfed0c5ef@mail.gmail.com>
References: <fb6fbf560608160726j3abf6237m8250b9483ecff011@mail.gmail.com>
	<ca471dc20608160745t4c662158pbcd05ecfed0c5ef@mail.gmail.com>
Message-ID: <20060816090147.19DA.JCARLSON@uci.edu>


"Guido van Rossum" <guido at python.org> wrote:
> On 8/16/06, Jim Jewett <jimjjewett at gmail.com> wrote:
> > That is one reason I wonder whether all annotations/modifications have
> > to actually be part of the prologue, or whether they could be applied
> > to the Signature afterwards.
> 
> And how would that reduce the clutter? The information still has to be
> entered by the user, presumably with the same disambiguating tags, and
> some punctuation.

I'd say that pulling out annotations from the function signature, which
was argued to be the most important piece of information about a
function during the decorator discussion, could do at least as much to
reduce clutter and increase readability and understandability, as
anything else discussed with regards to the PEP so far.

To pull back out that 9 line function...

> @docstring
> @typechecker
> @constrain_values
> def foo(a: {'doc': "Frobnication count",
>            'type': Number,
>            'constrain_values': range(3, 9)},
>        b: {'type': Number,
>             # This can be only 4, 8 or 12
> >            'constrain_values': [4, 8, 12]}) -> {'type': Number}:

First cleaning up the annotations to not use a dictionary:


@docstring
@typechecker
@constrain_values
def foo(a: [doc("frobination count"),
            type(Number),
            constrain_values(range(3,9))],
        b: [type(Number),
            # This can be only 4, 8 or 12
            constrain_values([4,8,12])]) -> type(Number):

Now let's pull those annotations out of the signature...

@docstring
@typechecker
@constrain_values
@__signature__([doc("frobination count"),
                type(Number),
                constrain_values(range(3,9))],
               [type(Number),
                # This can be only 4, 8 or 12
                constrain_values((4,8,12))], returns=type(Number))
def foo(a, b):


Ultimately the full function definition (including decorators) is just
as cluttered, but now we can see that we have a function that takes two
arguments, without having to scan for 'name:'.  If it is necessary for
someone to know what kinds of values, types, docs, etc., then they can
use the documentation-producing tool that will hopefully come with their
annotation consumer(s).


 - Josiah

P.S.
Then there is the blasphemous:

@docstring(a="frobination count")
@typechecker(a=type(Number), b=type(Number))
@constrain_values(a=range(3,9), b=(4,8,12), returns=type(Number))
def foo(a, b):
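
The "blasphemous" style needs no new language support at all. A minimal
sketch of one such decorator (a hypothetical helper, not an existing
library API):

```python
# Store per-parameter metadata as a plain function attribute,
# keyed by parameter name, via an ordinary decorator factory.
def docstring(**param_docs):
    def decorate(func):
        func.param_docs = param_docs
        return func
    return decorate

@docstring(a="frobination count")
def foo(a, b):
    return a * b

assert foo.param_docs == {"a": "frobination count"}
```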


From jimjjewett at gmail.com  Wed Aug 16 19:03:12 2006
From: jimjjewett at gmail.com (Jim Jewett)
Date: Wed, 16 Aug 2006 13:03:12 -0400
Subject: [Python-3000] Function annotations considered obfuscatory (Re:
	Conventions for annotation consumers)
In-Reply-To: <ca471dc20608160745t4c662158pbcd05ecfed0c5ef@mail.gmail.com>
References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com>
	<fb6fbf560608141522j64f611e9ndecd696214b2088c@mail.gmail.com>
	<43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com>
	<43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com>
	<44E27062.2040406@canterbury.ac.nz>
	<ca471dc20608152104m49517746yc6ef3340f6fc53f4@mail.gmail.com>
	<fb6fbf560608160726j3abf6237m8250b9483ecff011@mail.gmail.com>
	<ca471dc20608160745t4c662158pbcd05ecfed0c5ef@mail.gmail.com>
Message-ID: <fb6fbf560608161003m4d66371v4b44311b0f6e7c5e@mail.gmail.com>

On 8/16/06, Guido van Rossum <guido at python.org> wrote:
> On 8/16/06, Jim Jewett <jimjjewett at gmail.com> wrote:

> > I hope that needing more than one line per argument will be unusual,
> > but needing more than one line for a definition may not be.

> I expect the latter will be too, as it would only matter for code that
> somehow straddles two or more frameworks.

    >>> def f(position:[int, "negative possible"]): ...

"int" and the comment are both documentation which doesn't really need
any framework.  They are both things I would like to see when
introspecting that particular function, though perhaps not when just
scanning function defs.  Together, they're already long enough that I
would prefer to see any second argument on its own line.

> > That is one reason I wonder whether all annotations/modifications have
> > to actually be part of the prologue, or whether they could be applied
> > to the Signature afterwards.

> And how would that reduce the clutter? The information still has to be
> entered by the user, presumably with the same disambiguating tags, and
> some punctuation.

The summary of a function shows up in its prologue, but the details
span the next several lines (the full docstring and body suite).

My feeling is that when annotations start to get long, they're details
that should no longer be in the summary.

-jJ

From guido at python.org  Wed Aug 16 19:13:42 2006
From: guido at python.org (Guido van Rossum)
Date: Wed, 16 Aug 2006 10:13:42 -0700
Subject: [Python-3000] Function annotations considered obfuscatory (Re:
	Conventions for annotation consumers)
In-Reply-To: <20060816090147.19DA.JCARLSON@uci.edu>
References: <fb6fbf560608160726j3abf6237m8250b9483ecff011@mail.gmail.com>
	<ca471dc20608160745t4c662158pbcd05ecfed0c5ef@mail.gmail.com>
	<20060816090147.19DA.JCARLSON@uci.edu>
Message-ID: <ca471dc20608161013h18be025dr2f413f3226b70819@mail.gmail.com>

On 8/16/06, Josiah Carlson <jcarlson at uci.edu> wrote:
>
> "Guido van Rossum" <guido at python.org> wrote:
> > On 8/16/06, Jim Jewett <jimjjewett at gmail.com> wrote:
> > > That is one reason I wonder whether all annotations/modifications have
> > > to actually be part of the prologue, or whether they could be applied
> > > to the Signature afterwards.
> >
> > And how would that reduce the clutter? The information still has to be
> > entered by the user, presumably with the same disambiguating tags, and
> > some punctuation.
>
> I'd say that pulling out annotations from the function signature, which
> was argued to be the most important piece of information about a
> function during the decorator discussion, could do at least as much to
> reduce clutter and increase readability and understandability, as
> anything else discussed with regards to the PEP so far.
>
> To pull back out that 9 line function...
>
> > @docstring
> > @typechecker
> > @constrain_values
> > def foo(a: {'doc': "Frobnication count",
> >            'type': Number,
> >            'constrain_values': range(3, 9)},
> >        b: {'type': Number,
> >             # This can be only 4, 8 or 12
> >            'constrain_values': [4, 8, 12]}) -> {'type': Number}:
>
> First cleaning up the annotations to not use a dictionary:
>
>
> @docstring
> @typechecker
> @constrain_values
> def foo(a: [doc("frobination count"),
>             type(Number),
>             constrain_values(range(3,9))],
>         b: [type(Number),
>             # This can be only 4, 8 or 12
>             constrain_values([4,8,12])]) -> type(Number):
>
> Now let's pull those annotations out of the signature...
>
> @docstring
> @typechecker
> @constrain_values
> @__signature__([doc("frobination count"),
>                 type(Number),
>                 constrain_values(range(3,9))],
>                [type(Number),
>                 # This can be only 4, 8 or 12
>                 constrain_values((4,8,12))], returns=type(Number))
> def foo(a, b):

I think you have just disproved your point. Apart from losing a few
string quotes this is just as unreadable as the example you started
with, and those string quotes were due to a different convention for
multiple annotations, not due to moving the information into a
descriptor.

> Ultimately the full function definition (including decorators) is just
> as cluttered, but now we can see that we have a function that takes two
> arguments, without having to scan for 'name:' .  If it is necessary for
> someone to know what kinds of values, types, docs, etc., then they can
> use the documentation-producing tool that will hopefully come with their
> annotation consumer(s).

The whole point of putting decorators up front was so that they share
prime real estate ("above the fold" if you will :-) with the function
signature. Claiming that what's in the decorators doesn't distract
from the def itself doesn't make it true.

But, as I said 15 minutes ago, please stop worrying about this so
much. Try to implement Collin's PEP (which doesn't have any
constraints on the semantics or use of annotations). There's a Py3k
sprint at Google (MV and NY) next week -- perhaps we can work on it
there!

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Wed Aug 16 19:18:47 2006
From: guido at python.org (Guido van Rossum)
Date: Wed, 16 Aug 2006 10:18:47 -0700
Subject: [Python-3000] Fwd: Conventions for annotation consumers
In-Reply-To: <43aa6ff70608160941j724324b2kd8653df2374778be@mail.gmail.com>
References: <mailman.34618.1155684147.27774.python-3000@python.org>
	<5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com>
	<5.1.1.6.0.20060816123318.02406018@sparrow.telecommunity.com>
	<43aa6ff70608160941j724324b2kd8653df2374778be@mail.gmail.com>
Message-ID: <ca471dc20608161018k2fa6a2a2x37389588578ee79a@mail.gmail.com>

On 8/16/06, Collin Winter <collinw at gmail.com> wrote:
> Sorry, I meant "restrict" as in having it stated that the annotations
> are for typechecking, rather than attempting to support a dozen
> different uses simultaneously. The annotations would still be
> free-form, with the semantics up to whoever's implementing the
> __typecheck__ function, and Python itself wouldn't take any steps to
> enforce what can or can't go in the annotations.

-1. The annotations should be available for whatever the user wants to
use them for. Remember, lots of folks do *not* use shared frameworks
-- the only framework they care about is the one they write for
themselves, and they should not feel guilty about using annotations
for whatever metadata they need. To take up an old rule from the X11
world, the language should provide mechanism without policy.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
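
Mechanism without policy in practice: the language merely stores the
annotation expressions, and a private one-off "framework" is just whatever
code reads them back. A sketch assuming annotations are exposed via an
`__annotations__` attribute (which is how Python 3 ultimately did expose
them):

```python
# The annotations here carry ad-hoc, user-defined metadata; Python
# evaluates and stores the expressions but attaches no meaning to them.
def frob(count: "how many times to frob", dry_run: bool = False) -> "a status string":
    return "ok"

# Reading them back is the whole "framework":
assert frob.__annotations__["count"] == "how many times to frob"
assert frob.__annotations__["dry_run"] is bool
assert frob.__annotations__["return"] == "a status string"
```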

From jcarlson at uci.edu  Wed Aug 16 20:12:20 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Wed, 16 Aug 2006 11:12:20 -0700
Subject: [Python-3000] Function annotations considered obfuscatory (Re:
	Conventions for annotation consumers)
In-Reply-To: <ca471dc20608161013h18be025dr2f413f3226b70819@mail.gmail.com>
References: <20060816090147.19DA.JCARLSON@uci.edu>
	<ca471dc20608161013h18be025dr2f413f3226b70819@mail.gmail.com>
Message-ID: <20060816102652.19E3.JCARLSON@uci.edu>


"Guido van Rossum" <guido at python.org> wrote:
> On 8/16/06, Josiah Carlson <jcarlson at uci.edu> wrote:
> > @docstring
> > @typechecker
> > @constrain_values
> > @__signature__([doc("frobination count"),
> >                 type(Number),
> >                 constrain_values(range(3,9))],
> >                [type(Number),
> >                 # This can be only 4, 8 or 12
> >                 constrain_values((4,8,12))], returns=type(Number))
> > def foo(a, b):
> 
> I think you have just disproved your point. Apart from losing a few
> string quotes this is just as unreadable as the example you started
> with, and those string quotes were due to a different convention for
> multiple annotations, not due to moving the information into a
> descriptor.
> 
> > Ultimately the full function definition (including decorators) is just
> > as cluttered, but now we can see that we have a function that takes two
> > arguments, without having to scan for 'name:' .  If it is necessary for
> > someone to know what kinds of values, types, docs, etc., then they can
> > use the documentation-producing tool that will hopefully come with their
> > annotation consumer(s).
> 
> The whole point of putting decorators up front was so that they share
> prime real estate ("above the fold" if you will :-) with the function
> signature. Claiming that what's in the decorators doesn't distract
> from the def itself doesn't make it true.

From using Python, my brain has become trained to look for new indent
levels, so when I'm looking for function definitions, this is what I see...

@CRAPCRAPCRAPCRAPCRAPCRAPCRAPCRAP
@CRAPCRAPCRAPCRAPCRAPCRAPCRAP
@CRAPCRAPCRAPCRAPCRAPCRAPCRAPCRAPCRAPCRAPCRAPCRAPCRAP
def foo(...):
    #stuff
    a = ...
    b = ...
    for ...:
        ...
    ...

In my opinion, decorators that don't include their own indentation for
readability do not distract from the def.  I would imagine that many
people (not just me) have trained themselves to look for new indent
levels, and would agree at some level with this.

Indents within decorators generally induce false positives during visual
scanning, but aside from including a line in the Python style guide
about not using multi-line decorators (and people being kind to readers
of their code), there's not much we can do.


> But, as I said 15 minutes ago, please stop worrying about this so
> much. Try to implement Collin's PEP (which doesn't have any
> constraints on the semantics or use of annotations). There's a Py3k
> sprint at Google (MV and NY) next week -- perhaps we can work on it
> there!

I'm trying to keep function *signatures* readable. Including one *small*
annotation per argument isn't a big deal, but when simple function
signatures (from the def to the suite colon) start spanning multiple
lines, they are getting both ungreppable and unreadable.  My primary
concern is users grepping, reading, and understanding.  If annotations
detract from any of those three, then the annotation is a waste of time
(in my opinion).

This was one of the concerns brought up in the decorator discussion, and
why none of the decorator proposals that sat between the def and the
closing paren even have typed-out examples listed as contenders on the
PythonDecorators wiki (they each get a bullet list as to why they suck).

But maybe I'm misremembering the discussion, maybe decorators make it
very difficult to visually scan for function definitions, and maybe
people want all that garbage in their function signature.


 - Josiah


From guido at python.org  Wed Aug 16 20:17:33 2006
From: guido at python.org (Guido van Rossum)
Date: Wed, 16 Aug 2006 11:17:33 -0700
Subject: [Python-3000] Function annotations considered obfuscatory (Re:
	Conventions for annotation consumers)
In-Reply-To: <20060816102652.19E3.JCARLSON@uci.edu>
References: <20060816090147.19DA.JCARLSON@uci.edu>
	<ca471dc20608161013h18be025dr2f413f3226b70819@mail.gmail.com>
	<20060816102652.19E3.JCARLSON@uci.edu>
Message-ID: <ca471dc20608161117h62bdaa96ra59eb2dc9bef5f78@mail.gmail.com>

On 8/16/06, Josiah Carlson <jcarlson at uci.edu> wrote:
>
> "Guido van Rossum" <guido at python.org> wrote:
> > On 8/16/06, Josiah Carlson <jcarlson at uci.edu> wrote:
> > > @docstring
> > > @typechecker
> > > @constrain_values
> > > @__signature__([doc("frobination count"),
> > >                 type(Number),
> > >                 constrain_values(range(3,9))],
> > >                [type(Number),
> > >                 # This can be only 4, 8 or 12
> > >                 constrain_values((4,8,12))], returns=type(Number))
> > > def foo(a, b):
> >
> > I think you have just disproved your point. Apart from losing a few
> > string quotes this is just as unreadable as the example you started
> > with, and those string quotes were due to a different convention for
> > multiple annotations, not due to moving the information into a
> > descriptor.
> >
> > > Ultimately the full function definition (including decorators) is just
> > > as cluttered, but now we can see that we have a function that takes two
> > > arguments, without having to scan for 'name:' .  If it is necessary for
> > > someone to know what kinds of values, types, docs, etc., then they can
> > > use the documentation-producing tool that will hopefully come with their
> > > annotation consumer(s).
> >
> > The whole point of putting decorators up front was so that they share
> > prime real estate ("above the fold" if you will :-) with the function
> > signature. Claiming that what's in the decorators doesn't distract
> > from the def itself doesn't make it true.
>
> From using Python, my brain has become trained to look for new indent
> levels, so when I'm looking for function definitions, this is what I see...
>
> @CRAPCRAPCRAPCRAPCRAPCRAPCRAPCRAP
> @CRAPCRAPCRAPCRAPCRAPCRAPCRAP
> @CRAPCRAPCRAPCRAPCRAPCRAPCRAPCRAPCRAPCRAPCRAPCRAPCRAP
> def foo(...):
>     #stuff
>     a = ...
>     b = ...
>     for ...:
>         ...
>     ...

Well, then the problem becomes finding the tiny 'def' between all that CRAP.

> In my opinion, decorators that don't include their own indentation for
> readability do not distract from the def.  I would imagine that many
> people (not just me) have trained themselves to look for new indent
> levels, and would agree at some level with this.

But notice that the example *did* include multi-line decorators with
indented continuation lines.

> Indents within decorators generally induce false positives during visual
> scanning, but aside from including a line in the Python style guide
> about not using multi-line decorators (and people being kind to readers
> of their code), there's not much we can do.

There's another style:

type_a = {"foo": some_type_for_framework_foo, "bar": some_other_type, etc.}
type_b = {...similar...}

def my_fun(a: type_a, b: type_b) -> type_c:
    ...

This works just as well for the list style of having multiple annotations.

If you write a lot of code that uses multiple annotations, I'd be very
surprised if there weren't a bunch of common combinations that could
be shared like this.

> > But, as I said 15 minutes ago, please stop worrying about this so
> > much. Try to implement Collin's PEP (which doesn't have any
> > constraints on the semantics or use of annotations). There's a Py3k
> > sprint at Google (MV and NY) next week -- perhaps we can work on it
> > there!
>
> I'm trying to keep function *signatures* readable. Including one *small*
> annotation per argument isn't a big deal, but when simple function
> signatures (from the def to the suite colon) start spanning multiple
> lines, they are getting both ungreppable and unreadable.  My primary
> concern is users grepping, reading, and understanding.  If annotations
> detract from any of those three, then the annotation is a waste of time
> (in my opinion).

What exactly are you grepping for where a multi-line arglist would get
in the way? The most complicated pattern for which I grep is probably
something along the lines of '^def \w+\('.

> This was one of the concerns brought up in the decorator discussion, and
> why none of the decorator proposals that sat between the def and the
> closing paren even have typed-out examples listed as contenders on the
> PythonDecorators wiki (they each get a bullet list as to why they suck).
>
> But maybe I'm misremembering the discussion, maybe decorators make it
> very difficult to visually scan for function definitions, and maybe
> people want all that garbage in their function signature.

They don't want it, but if they're forced to have it occasionally
they'll cope. I still think you're way overestimating the importance
of this use case.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
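
The factored-out style sketched above is just ordinary module-level data
shared between signatures (the names are hypothetical):

```python
# A common annotation combination, defined once so that the def line
# itself stays short and greppable.
number_3_to_9 = {"type": "Number", "constrain_values": range(3, 9)}

def my_fun(a: number_3_to_9, b: number_3_to_9) -> "Number":
    return a + b

# Both parameters refer to the same shared annotation object:
assert my_fun.__annotations__["a"] is number_3_to_9
assert my_fun.__annotations__["a"] is my_fun.__annotations__["b"]
```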

From paul at prescod.net  Thu Aug 17 02:11:34 2006
From: paul at prescod.net (Paul Prescod)
Date: Wed, 16 Aug 2006 17:11:34 -0700
Subject: [Python-3000] Fwd: Conventions for annotation consumers (was:
	Re: Draft pre-PEP: function annotations)
In-Reply-To: <ca471dc20608160957v1934bd7bi9b0ed3ecfb2bcef7@mail.gmail.com>
References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com>
	<1cb725390608151207g36a4c692v2c2d49c1a3e821fa@mail.gmail.com>
	<43aa6ff70608151213v1ced455btdaadb51e1761738d@mail.gmail.com>
	<1cb725390608151230i298c1889gd93233db2b6f980a@mail.gmail.com>
	<43aa6ff70608151313h2b945032wa556903f3f9a44c3@mail.gmail.com>
	<76fd5acf0608151519k564723f9q2d409e3285a7918f@mail.gmail.com>
	<76fd5acf0608151520n7692824asf06a849ac3114e5b@mail.gmail.com>
	<43aa6ff70608151529q18748348g3dce7c193450a0fb@mail.gmail.com>
	<1cb725390608152222j32727946ob3c07e43fd004299@mail.gmail.com>
	<ca471dc20608160957v1934bd7bi9b0ed3ecfb2bcef7@mail.gmail.com>
Message-ID: <1cb725390608161711v7c0a93b9i3f9e2032da9254af@mail.gmail.com>

Okay, you're the boss. The conversation did go pretty far afield but the
main thing I wanted was just that if a user wanted to have annotations from
framework 1 and framework 2 they could reliably express that as

def foo(a: [Anno1, Anno2]):

All that that requires is a statement in the spec saying: "If you're
processing annotations and you see an annotation you don't understand, skip
it. And if you see a list, look inside it rather than processing it in some
proprietary fashion."

It kind of seemed obvious to me, but I guess everyone's ideas seem obvious
to them. There were other secondary things I would have liked but this
seemed like the minimum required to protect programmers from "greedy
frameworks" that don't play nice in the face of unfamiliar annotations.
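
The convention is easy to state in code. A sketch with two hypothetical
frameworks' marker classes and one well-behaved consumer that looks inside
lists and skips annotations it doesn't recognize:

```python
class TypeAnno:                      # framework 1's marker (hypothetical)
    def __init__(self, t):
        self.t = t

class DocAnno:                       # framework 2's marker (hypothetical)
    def __init__(self, text):
        self.text = text

def collect_types(func):
    """Typechecking consumer: walks list annotations, ignores foreign items."""
    types = {}
    for name, ann in func.__annotations__.items():
        items = ann if isinstance(ann, list) else [ann]
        for item in items:
            if isinstance(item, TypeAnno):
                types[name] = item.t
    return types

def foo(a: [TypeAnno(int), DocAnno("first operand")]) -> None:
    pass

assert collect_types(foo) == {"a": int}   # the DocAnno is silently skipped
```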

On 8/16/06, Guido van Rossum <guido at python.org> wrote:
>
> There's much in this thread that I haven't followed, for lack of time.
>
> But it seems clear to me that you've wandered off the path now that
> you're discussing what should go into the annotations and how to make
> it so that multiple frameworks can coexist.
>
> I don't see how any of that can be analyzed up front -- you have to
> build an implementation and try to use it and *then* perhaps you can
> think about the problems that occur.
>
> Collin wrote a great PEP that doesn't commit to any kind of semantics
> for annotations. (I still have to read it more closely, but from
> skimming, it looks fine.) Let's focus some efforts on implementing
> that first, and see how we can use it, before we consider the use case
> of a framework for frameworks.
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060816/b8481a6c/attachment.html 

From greg.ewing at canterbury.ac.nz  Thu Aug 17 03:37:04 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 17 Aug 2006 13:37:04 +1200
Subject: [Python-3000] Function annotations considered obfuscatory (Re:
 Conventions for annotation consumers)
In-Reply-To: <ca471dc20608160745t4c662158pbcd05ecfed0c5ef@mail.gmail.com>
References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com>
	<fb6fbf560608141522j64f611e9ndecd696214b2088c@mail.gmail.com>
	<43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com>
	<43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com>
	<44E27062.2040406@canterbury.ac.nz>
	<ca471dc20608152104m49517746yc6ef3340f6fc53f4@mail.gmail.com>
	<fb6fbf560608160726j3abf6237m8250b9483ecff011@mail.gmail.com>
	<ca471dc20608160745t4c662158pbcd05ecfed0c5ef@mail.gmail.com>
Message-ID: <44E3C840.1000602@canterbury.ac.nz>

Guido van Rossum wrote:

> And how would that reduce the clutter? The information still has to be
> entered by the user, presumably with the same disambiguating tags, and
> some punctuation.

But at least the function header itself would retain
its wysiwyt[1] character of being mostly just a list of
parameter names.

--
[1] What You See Is What You Type

From greg.ewing at canterbury.ac.nz  Thu Aug 17 03:46:43 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 17 Aug 2006 13:46:43 +1200
Subject: [Python-3000] Fwd: Conventions for annotation consumers
In-Reply-To: <ca471dc20608160957v1934bd7bi9b0ed3ecfb2bcef7@mail.gmail.com>
References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com>
	<43aa6ff70608151128u720b59ecq7d6831177452ebea@mail.gmail.com>
	<1cb725390608151207g36a4c692v2c2d49c1a3e821fa@mail.gmail.com>
	<43aa6ff70608151213v1ced455btdaadb51e1761738d@mail.gmail.com>
	<1cb725390608151230i298c1889gd93233db2b6f980a@mail.gmail.com>
	<43aa6ff70608151313h2b945032wa556903f3f9a44c3@mail.gmail.com>
	<76fd5acf0608151519k564723f9q2d409e3285a7918f@mail.gmail.com>
	<76fd5acf0608151520n7692824asf06a849ac3114e5b@mail.gmail.com>
	<43aa6ff70608151529q18748348g3dce7c193450a0fb@mail.gmail.com>
	<1cb725390608152222j32727946ob3c07e43fd004299@mail.gmail.com>
	<ca471dc20608160957v1934bd7bi9b0ed3ecfb2bcef7@mail.gmail.com>
Message-ID: <44E3CA83.6030801@canterbury.ac.nz>

Guido van Rossum wrote:

> Collin wrote a great PEP that doesn't commit to any kind of semantics
> for annotations.

I think the argument started because Collin's PEP actually
went further than that, and asserted that there wouldn't
be any problems created by this lack of specification,
for reasons which are highly debatable. Not surprisingly,
a great deal of debate on that point ensued.

If the PEP simply said something like "We'll look at
this again after we've had some experience", it might
not have been so controversial.

--
Greg

From talin at acm.org  Thu Aug 17 08:21:44 2006
From: talin at acm.org (Talin)
Date: Wed, 16 Aug 2006 23:21:44 -0700
Subject: [Python-3000] Fwd: Conventions for annotation consumers
In-Reply-To: <43aa6ff70608160809qb8882e1m6b471fda3eee8d10@mail.gmail.com>
References: <mailman.34618.1155684147.27774.python-3000@python.org>	<5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com>
	<43aa6ff70608160809qb8882e1m6b471fda3eee8d10@mail.gmail.com>
Message-ID: <44E40AF8.8060400@acm.org>

Collin Winter wrote:
> On 8/15/06, Phillip J. Eby <pje at telecommunity.com> wrote:
>> Personally, I thought Guido's original proposal for function annotations,
>> which included a __typecheck__ operator that was replaceable on a
>> per-module basis (and defaulted to a no-op), was the perfect thing --
>> neither too much semantics nor too-little.  I'd like to have it back,
>> please.  :)
> 
> I'd be perfectly happy to go back to talking about "type annotations",
> rather than the more general "function annotations", especially since
> most of the discussion thus far has been about how to do multiple things
> with annotations at the same time. Restricting annotations to type
> information would be fine by me.

I'd be happy to do that as well :)

So far, there has been a great deal of confusion and disagreement about 
this proposal. Some people might be surprised by that - however, my 
point from the beginning has been that this confusion and disagreement is 
*inherent* in the concept of function annotations as currently envisioned.

What the current discussion demonstrates is that the number of different 
ways in which function annotations can be used is far larger and more 
diverse than anticipated ("Never underestimate the creative power of an 
infinite number of monkeys".) Normally, this wouldn't be seen as a 
problem, but rather a strength of the design. Whenever you have a broad 
and diverse set of use cases for a given feature, that's usually an 
indication that the feature has been designed well.

However, having a vast set of use cases only works if those use cases 
can have some degree of isolation from one another. If I write a 
decorator, I'm not too concerned about what other decorator classes may 
exist; I may not even be too concerned about what other decorators are 
applied to the same function as mine are.

However, function annotations are a little different from the usual case. 
Specifically, they need to be fairly concise, otherwise they are 
obfuscatory (as someone pointed out). One of the ways of achieving this 
conciseness is to remove the requirement to explicitly identify each 
annotation, and instead allow the meanings of the annotations to be 
implicit. (i.e. the use of built-in types rather than a dictionary of 
key/value pairs.)

The problem with implicit identification is that the category boundaries 
for each annotation are no longer clearly defined. This wouldn't be a 
problem if the number of use cases were small and widely separated. 
However, as the recent discussion has shown, the number of use cases is 
vast and diverse. This means that the implicitly defined categories are 
inevitably going to collide.

What I and others are worried about is that it appears that we are 
heading in a direction in which different users of function annotations 
will be forced to jostle elbows with each other - where each consumer of 
annotations, instead of being able to develop their annotation system in 
private, will be forced to consider the other annotation systems that 
exist already. For someone who is developing an annotation library that 
is intended for widespread use, the *mere existence* of other annotation 
libraries impacts their design and must be taken into account. I feel 
that this is an intolerable burden on the designers of such systems.

Some have proposed resolving this by going back to explicit 
identification of annotations, either by keyword or by unique types. 
However, this destroys some of the conciseness and simplicity of the 
annotations, something which others have objected to.

Personally, I think that the function annotation concept suffers from 
being too ambitious, attempting to be all things to all people. I don't 
think we really need docstring annotations - there are other ways to 
achieve the same effect. The same goes for type checkers and lint 
checkers and most of the other ideas for using annotations. All those 
things are nice, but if they never get done I'm not going to worry about 
it -- and none of these things are worth the level of madness and 
confusion generated by an N-way collision of incompatible frameworks.

I'm going to take a somewhat hard line here, and say that if it were up 
to me, I would ask Phillip Eby exactly what annotation features he needs 
to make his overload dispatching mechanism work, and then I would 
restrict the PEP to just that. In other words, rather than saying 
"annotations can be anything the programmer wants", I would instead say 
"This set of annotations is used for dispatching, any other use of 
annotations is undefined." Which is not to say that a programmer can't 
make up their own -- but that programmer should have no expectations 
that their code is going to be able to interoperate with anyone else's.

-- Talin


From collinw at gmail.com  Thu Aug 17 15:01:13 2006
From: collinw at gmail.com (Collin Winter)
Date: Thu, 17 Aug 2006 08:01:13 -0500
Subject: [Python-3000] Fwd: Conventions for annotation consumers (was:
	Re: Draft pre-PEP: function annotations)
In-Reply-To: <1cb725390608161711v7c0a93b9i3f9e2032da9254af@mail.gmail.com>
References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com>
	<43aa6ff70608151213v1ced455btdaadb51e1761738d@mail.gmail.com>
	<1cb725390608151230i298c1889gd93233db2b6f980a@mail.gmail.com>
	<43aa6ff70608151313h2b945032wa556903f3f9a44c3@mail.gmail.com>
	<76fd5acf0608151519k564723f9q2d409e3285a7918f@mail.gmail.com>
	<76fd5acf0608151520n7692824asf06a849ac3114e5b@mail.gmail.com>
	<43aa6ff70608151529q18748348g3dce7c193450a0fb@mail.gmail.com>
	<1cb725390608152222j32727946ob3c07e43fd004299@mail.gmail.com>
	<ca471dc20608160957v1934bd7bi9b0ed3ecfb2bcef7@mail.gmail.com>
	<1cb725390608161711v7c0a93b9i3f9e2032da9254af@mail.gmail.com>
Message-ID: <43aa6ff70608170601v4ef2435eq7824f35867767c7d@mail.gmail.com>

On 8/16/06, Paul Prescod <paul at prescod.net> wrote:
> Okay, you're the boss. The conversation did go pretty far afield but the
> main thing I wanted was just that if a user wanted to have annotations from
> framework 1 and framework 2 they could reliably express that as
>
> def foo(a: [Anno1, Anno2]):
>
> All that that requires is a statement in the spec saying: "If you're
> processing annotations and you see an annotation you don't understand, skip
> it. And if you see a list, look inside it rather than processing it in some
> proprietary fashion."

So, time for an embarrassing confession: I had a bit of a eureka
moment this morning, and I think I finally understand where you were
coming from with this idea. I honestly don't know what I thought you
were proposing, but now that I get it, my old conception seems like
rubbish. Consider the dict-based proposal withdrawn.
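The convention quoted above can be sketched in a few lines. This is a hypothetical illustration, not anything from the PEP: the `Anno1` marker class, `consume_annotations` helper, and `foo` are made up for the example.

```python
# A sketch of the skip-unknown convention: a consumer handles only the
# annotation types it recognises, looks inside lists rather than
# treating them as opaque, and silently skips everything else.

class Anno1:
    """Hypothetical marker class belonging to one framework."""
    def __init__(self, note):
        self.note = note

def consume_annotations(func, handled_type):
    """Collect annotations of handled_type, skipping all others."""
    found = {}
    for name, anno in getattr(func, "__annotations__", {}).items():
        # A list means "annotations from several frameworks": look inside.
        items = anno if isinstance(anno, list) else [anno]
        for item in items:
            if isinstance(item, handled_type):
                found[name] = item
            # else: some other framework's annotation -- skip it
    return found

def foo(a: [Anno1("count"), "some other framework's marker"],
        b: "not ours"):
    return a
```

Each consumer sees only its own annotations; the other entries in the list pass through untouched.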

Apologies for my part in dragging this discussion into a triple-digit
comment count : )

Collin Winter

From bmx007 at gmail.com  Fri Aug 18 11:59:49 2006
From: bmx007 at gmail.com (bmx007)
Date: Fri, 18 Aug 2006 10:59:49 +0100
Subject: [Python-3000] Fwd: Conventions for annotation consumers
Message-ID: <3f2f9e8c0608180259ybfdf102r48eda9daafdebedf@mail.gmail.com>

Hi,
I haven't read the whole thread because it's pretty long, but if I have
understood Paul correctly, my opinion (and the reason I use docstrings
in my own typechecker module) is that it's a good idea not to mix a
function's definition with its type.

I think the difference between languages is not what they allow you to do,
but how easy it is to write something and how easy it is to READ it (the
read factor is why I switched from p... to python).
So the separation of semantics and types is a good thing, because as
readers we don't usually need to know both at the same time; we can read
only what we need.

As an example, consider a function

def find(token, line):
    ...

with strings as parameters and a boolean as return value.

In almost all cases what we need to know is only the order of the
parameters: do I call find(token, line) or find(line, token)?

In this case I should be able to find the semantics of the parameters
easily, and I don't care about their types (because I work in a context
where I expect them). That is obviously easiest when there is no extra
information, so the old way, def find(token, line), is best.

And it's the same for types: when I care about the type it is usually
a question of consistency, and I don't care about the semantics. This
occurs for example in "template" functions such as

def max(x, y):
   ...
where max could be
int, int -> int
float, float ->float
string, string -> string

and something like (or any equivalent)
def max(x, y):
   :: int, int -> int
   :: float, float -> float
   :: string, string -> string

is (I think) easy to read and write, and we can easily skip the
information we don't care about.
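Since no `::` syntax exists today, the separate-line idea can already be approximated by putting the signature lines in the docstring and parsing them out. The parser below is a hypothetical sketch, not part of any proposal:

```python
def max2(x, y):
    """Return the larger of x and y.

    :: int, int -> int
    :: float, float -> float
    :: string, string -> string
    """
    return x if x >= y else y

def signatures(func):
    """Pull the '::' signature lines out of a function's docstring."""
    doc = func.__doc__ or ""
    return [line.strip()[2:].strip()
            for line in doc.splitlines()
            if line.strip().startswith("::")]
```

A reader who only cares about the parameter order reads the `def` line; a type checker reads the `::` lines; neither gets in the other's way.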

Maxime

From ncoghlan at gmail.com  Fri Aug 18 17:14:16 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 19 Aug 2006 01:14:16 +1000
Subject: [Python-3000] Bound and unbound methods
In-Reply-To: <ca471dc20608160848o9812ed1jb83f94e4ec013c09@mail.gmail.com>
References: <44DF0D38.6070507@acm.org>
	<20060813102036.1985.JCARLSON@uci.edu>	<44DF86AA.7050207@acm.org>
	<44DFE092.8030604@canterbury.ac.nz>	<ca471dc20608152113n66471411qf72144022e88f04d@mail.gmail.com>	<76fd5acf0608152248j76f38d2x88ba241a8c66c835@mail.gmail.com>
	<ca471dc20608160848o9812ed1jb83f94e4ec013c09@mail.gmail.com>
Message-ID: <44E5D948.7070503@gmail.com>

Guido van Rossum wrote:
>> Would a possible special method name __methodcall__ be accepted, where
>> if it exists on a callable, you can expect to use it as __call__ but
>> with the understanding that it accepts <expr> as self when called in
>> an optimizable form? This would reduce the method call to two
>> attribute lookups before the call instead of an instansiation and all
>> the heavy lifting currently done. For normal functions,
>> 'f.__methodcall__ is f.__call__' may be true, but the existance of
>> that __methodcall__ name just gives you an extra contract.
> 
> I'd like to answer "no" (since I think this whole idea is not a very
> fruitful avenue) but frankly, I have no idea what you are trying to
> describe. Are you even aware of the descriptor protocol (__get__) and
> how it's used to create a bound method (or something else)?
> 
> No reply is needed.

If I understand Calvin right, the best speed up we could get for the status 
quo is for the "METHOD_CALL" opcode to:
   1. Do a lookup that bypasses the descriptor machinery (i.e. any __get__ 
methods are not called at this point)
   2. If the object is a function object, invoke __call__ directly, supplying 
the instance as the first argument
   3. If the object is a classmethod object, invoke __call__ directly, 
supplying the class as the first argument
   4. If the object is a staticmethod object, invoke __call__ directly, 
without supplying any extra arguments
   5. If the object has a __get__ method, call it and invoke __call__ on the 
result
   6. Otherwise, invoke __call__ on the object

(Caveat: this omits details of the lookup process regarding how descriptors 
are handled that an actual implementation would need to deal with).

I think what Calvin is suggesting is, instead of embedding all those special 
cases in the op code, allow a descriptor to define __methodcall__ as an 
optimised combination of calling __get__ and then invoking __call__ on the 
result. Then the sequence of events in the op code would be to:

   1. Do a lookup that bypasses the descriptor machinery
   2. If the object defines it, invoke __methodcall__ directly, supplying the 
instance as the first argument and the class as the second argument (similar 
to __get__), followed by the args tuple as the 3rd argument and the keyword 
dictionary as the 4th argument.
   3. If the object doesn't define __methodcall__, but has a __get__ method, 
then call it and invoke __call__ on the result
   4. Otherwise, invoke __call__ on the object

For example, on a function object, __methodcall__ would look like:

   def __methodcall__(self, obj, cls, args, kwds):
       if obj is None:
           raise TypeError("Cannot call unbound method")
       return self(obj, *args, **kwds)

On a class method descriptor:

   def __methodcall__(self, obj, cls, args, kwds):
       return self._function(cls, *args, **kwds)

On a static method descriptor:

   def __methodcall__(self, obj, cls, args, kwds):
       return self._function(*args, **kwds)
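Putting the pieces together, the whole lookup-and-dispatch sequence can be sketched in pure Python. This is only an illustration of the proposal, not the actual opcode: `vars()` stands in for "a lookup that bypasses the descriptor machinery", and the `__methodcall__` lookup on the type is the hypothetical fast path.

```python
def method_call(obj, name, args, kwds):
    # 1. Look the attribute up on the class, bypassing descriptors.
    for klass in type(obj).__mro__:
        if name in vars(klass):
            attr = vars(klass)[name]
            break
    else:
        raise AttributeError(name)
    # 2. Prefer the proposed __methodcall__ fast path, if defined.
    methodcall = getattr(type(attr), "__methodcall__", None)
    if methodcall is not None:
        return methodcall(attr, obj, type(obj), args, kwds)
    # 3/4. Otherwise fall back to the usual __get__ then __call__ dance.
    get = getattr(type(attr), "__get__", None)
    if get is not None:
        return get(attr, obj, type(obj))(*args, **kwds)
    return attr(*args, **kwds)
```

With today's types nothing defines `__methodcall__`, so plain functions, classmethods and staticmethods all go through the fallback branch.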

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From ncoghlan at gmail.com  Fri Aug 18 18:18:39 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 19 Aug 2006 02:18:39 +1000
Subject: [Python-3000] Fwd: Conventions for annotation consumers
In-Reply-To: <5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com>
References: <5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com>
Message-ID: <44E5E85F.6080508@gmail.com>

Phillip J. Eby wrote:
> I'm frankly baffled by the amount of "protect users from incompatibility" 
> ranting that this issue has generated.  If I wanted to use Java, I'd know 
> where to find it.  Guido has said time and again that Python's balance 
> favors the individual developer at the expense of the group where 
> "consenting adults" is concerned, and Py3K isn't intended to change that 
> balance.

I actually thought Collin's approach in the PEP was reasonable (deferring the 
details of combining annotations until we had some more experience with how 
they could be made useful in practice). Some of the wording was a little 
strong (suggesting that the conventions would *never* be developed), but the 
idea was sound.

To try and put this in perspective:

1. I believe argument annotations have the most potential to be beneficial 
when used in conjunction with a single decorator chosen or written by the 
developer to support things like Foreign Function Interface type mapping 
(PyObjC, ctypes, XML-RPC, etc), or function overloading (RuleDispatch, etc).

2. If a developer wishes to use multiple annotations together, they can define 
their own annotation processing decorator that invokes the necessary 
operations using non-annotation based APIs provided by the appropriate 
framework, many of which already exist, and will continue to exist in Py3k due 
to the need to be able to process functions which have not been annotated at 
all (such as functions written in C).

3. The question has been raised as to whether or not there is a practical way 
for a developer to use annotations that make sense to a *static* analysis tool 
that doesn't actually execute the Python code.

If someone figures out a way to handle the last point *without* compromising 
the ease of use for annotations designed to handle point 1, all well and good. 
Otherwise, I'd call YAGNI. OK, annotations wouldn't be useful for tools like 
pychecker in that case. So be it - to be really useful for a tool like 
pychecker they'd have to be ubiquitous, and that's really not Python any more.

All that said, I'm still not entirely convinced that function annotations are 
a good idea in the first place - I'm inclined to believe that signature 
objects providing a "bind" method that returns a dictionary mapping the method 
call's arguments to the function's named parameters will prove far more 
useful. With this approach, the 'annotations' would continue to be supplied as 
arguments to decorator factories instead of as expressions directly in the 
function header. IOW, I've yet to see any use case that is significantly 
easier to write with function annotations instead of decorator arguments, and 
several cases where function annotations are significantly worse.

For one thing, function annotations are useless for decorating a function that 
was defined elsewhere, whereas it doesn't matter where the function came from 
when using decorator arguments. The latter also has a major benefit in 
unambiguously associating each annotation with the decorator that is the 
intended consumer.

Consider an extreme example Josiah used elsewhere in this discussion:

 > @docstring
 > @typechecker
 > @constrain_values
 > def foo(a: [doc("frobination count"),
 >             type(Number),
 >             constrain_values(range(3,9))],
 >         b: [type(Number),
 >             # This can be only 4, 8 or 12
 >             constrain_values([4,8,12])]) -> type(Number):

Here's how it looks with decorator factories instead:

# Using keyword arguments
@docstring(a="frobination count")
@typechecker(a=Number, b=Number, _return=Number)
@constrain_values(a=range(3,9), b=[4,8,12])
def foo(a, b):
     # the code

# Using positional arguments
@docstring("frobination count")
@typechecker(Number, Number, _return=Number)
@constrain_values(range(3,9), [4,8,12])
def foo(a, b):
     # the code

All the disambiguation cruft is gone, the association between the decorators 
and the values they are processing is clear, the expressions are split 
naturally across the different decorator lines, and the basic signature is 
found easily by scanning for the last line before the indented section. The 
_return=Number is a bit ugly, but that could be handled by syntactic sugar 
that processed a "->expr" in a function call as equivalent to "return=expr" 
(i.e. adding the result of the expression to the keywords dictionary under the 
key "return").

Another advantage of the decorator-with-arguments approach is that you can 
call the decorator factory once, store the result in a variable, and then 
reuse that throughout your module, which is harder with annotations directly 
in the function header (which means that you can only share single 
annotations, not combinations of annotations). For example:

floats2_to_float2tuple = typechecker(float, float, _return=(float, float))

@floats2_to_float2tuple
def cartesian_to_polar(x, y):
     return math.sqrt(x*x + y*y), math.atan2(y, x)

@floats2_to_float2tuple
def polar_to_cartesian(r, theta):
     return r*math.cos(theta), r*math.sin(theta)

Cheers,
Nick.


-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From rrr at ronadam.com  Sat Aug 19 12:29:31 2006
From: rrr at ronadam.com (Ron Adam)
Date: Sat, 19 Aug 2006 05:29:31 -0500
Subject: [Python-3000] Fwd: Conventions for annotation consumers
In-Reply-To: <44E5E85F.6080508@gmail.com>
References: <5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com>
	<44E5E85F.6080508@gmail.com>
Message-ID: <ec6pbl$7iv$1@sea.gmane.org>

Nick Coghlan wrote:

[Clipped other good points.]

> 3. The question has been raised as to whether or not there is a practical way 
> for a developer to use annotations that make sense to a *static* analysis tool 
> that doesn't actually execute the Python code
> 
> If someone figures out a way to handle the last point *without* compromising 
> the ease of use for annotations designed to handle point 1, all well and good. 
> Otherwise, I'd call YAGNI. OK, annotations wouldn't be useful for tools like 
> pychecker in that case. So be it - to be really useful for a tool like 
> pychecker they'd have to be ubiquitous, and that's really not Python any more.

Something I've been looking for is an alternate way to generate function 
signatures that are closer to those used in the documentation.

Where help(str.find) gives:

     find(...)
         S.find(sub [,start [,end]]) -> int

         Return the lowest index in S where substring sub is found,
         such that sub is contained within s[start,end].  Optional
         arguments start and end are interpreted as in slice notation.

         Return -1 on failure.

But I am wondering if the annotations could help with both pydoc and 
pychecker.  Then maybe function specifications could be generated and 
look more like ...

    str.find(sub:IsString [,start:IsInt [,end:IsInt]]) -> IsInt

instead of just...

    find(...)


[See below where I'm going with this.]



> All that said, I'm still not entirely convinced that function annotations are 
> a good idea in the first place - I'm inclined to believe that signature 
> objects providing a "bind" method that returns a dictionary mapping the method 
> call's arguments to the function's named parameters will prove far more 
> useful. With this approach, the 'annotations' would continue to be supplied as 
> arguments to decorator factories instead of as expressions directly in the 
> function header. IOW, I've yet to see any use case that is significantly 
> easier to write with function annotations instead of decorator arguments, and 
> several cases where function annotations are significantly worse.
 >
> For one thing, function annotations are useless for decorating a function that 
> was defined elsewhere, whereas it doesn't matter where the function came from 
> when using decorator arguments. The latter also has a major benefit in 
> unambiguously associating each annotation with the decorator that is the 
> intended consumer.


I've been thinking about this also.  It seems maybe there is an effort 
to separate the "meta-data" and the "use of meta-data" a bit too finely. 
What you then get is a lock-and-key effect where the decorators that 
use the meta-data and the meta-data itself are separate, but at the same 
time strongly associated by location (module) and developer.  This may 
be a bit overstated in order to describe it, but I do think it's a 
real concern. But it is also probably more a style-of-use issue 
than an issue with annotations themselves.

The meta-data can also *be* the validator.  So instead of just using 
Float, Int, Long, etc... and writing a smart validator to read and check 
each of those, you can just call the meta-data directly with each 
related argument to validate/modify/or do whatever to it.

So this ...

  > @docstring
  > @typechecker
  > @constrain_values
  > def foo(a: [doc("frobination count"),
  >             type(Number),
  >             constrain_values(range(3,9))],
  >         b: [type(Number),
  >             # This can be only 4, 8 or 12
  >             constrain_values([4,8,12])]) -> type(Number):


could be reduced to ...    (removing redundant checks as well)

     from metalib import *

     @callmeta
     def foo( a: [ SetDoc("frobination count"), InRange(3,9) ],
              b: InSet([4,8,12]) ) -> IsNumber:
         # code


Which isn't too bad. Or even as positional decorator arguments...

     from metalib import *

     @callmeta( [SetDoc("frobination count"), InRange(3,9)],
                InSet([4,8,12]),
                IsNumber )
     def foo(a, b):
         # code


Both of these are very similar.  The callmeta decorator would be 
implemented differently in each case, but by using the validators as the 
meta-data, it makes both versions easier to read and use. IMHO of course.


The metalib routines could be something (roughly) like...

     def IsNumber(arg):
         return type(arg) in (float, int, long)

     def IsString(arg):
         return type(arg) in (str, unicode)

     def InSet(list_):
         def inset(arg):
             return arg in list_
         return inset

     def InRange(start, stop):
         def inrange(arg):
             return start <= arg <= stop
         return inrange

     etc...


(Or it might be better for them to be objects.)
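A matching callmeta decorator might look roughly like this. Everything here is a hypothetical sketch under Ron's assumptions: validators are plain callables stored as (Py3k-style) annotations, and a False return means the check failed.

```python
import functools
import inspect

def callmeta(func):
    """Call each annotation as a validator on the matching argument."""
    params = list(inspect.signature(func).parameters)

    @functools.wraps(func)
    def wrapper(*args, **kwds):
        bound = dict(zip(params, args), **kwds)
        for name, value in bound.items():
            checks = func.__annotations__.get(name)
            if checks is None:
                continue
            if not isinstance(checks, list):
                checks = [checks]
            for check in checks:
                # A validator returning False rejects the argument;
                # any other return value (e.g. from SetDoc) is ignored.
                if check(value) is False:
                    raise TypeError("%s=%r failed %s"
                                    % (name, value, check))
        return func(*args, **kwds)
    return wrapper

@callmeta
def foo(a: lambda v: 3 <= v < 9,
        b: lambda v: v in (4, 8, 12)):
    return a + b
```

Validators that only attach meta-data (like SetDoc) simply return None and are ignored by the check.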


Anyway, it's very late and I'm probably overlooking something, and I 
haven't actually tried any of these, so your mileage may vary.  ;-)

Cheers,
   Ron






From pedronis at strakt.com  Sat Aug 19 12:54:19 2006
From: pedronis at strakt.com (Samuele Pedroni)
Date: Sat, 19 Aug 2006 12:54:19 +0200
Subject: [Python-3000] signature annotation in the function signature or a
	separate line
In-Reply-To: <ca471dc20608161117h62bdaa96ra59eb2dc9bef5f78@mail.gmail.com>
References: <20060816090147.19DA.JCARLSON@uci.edu>	<ca471dc20608161013h18be025dr2f413f3226b70819@mail.gmail.com>	<20060816102652.19E3.JCARLSON@uci.edu>
	<ca471dc20608161117h62bdaa96ra59eb2dc9bef5f78@mail.gmail.com>
Message-ID: <44E6EDDB.9070604@strakt.com>

Guido van Rossum wrote:
>>But maybe I'm misremembering the discussion, maybe decorators make it
>>very difficult to visually scan for function definitions, and maybe
>>people want all that garbage in their function signature.
> 
> 
> They don't want it, but if they're forced to have it occasionally
> they'll cope. I still think you're way overestimating the importance
> of this use case.
> 

Given that the meaning of annotations is deliberately not predefined,
given that people are coming up with arbitrarily verbose examples
thereof, and given the precedent of type-inferred languages
that use a separate line for optional type information, I think
devising a way to have the annotation on a different line,
with a decorator-like introduction instead of mixed into
the function head, would be saner:

One possibility would be to have a syntax for signature expressions
and then allow them as decorators with the obvious effect of attaching
themselves:

@sig int,int -> int
def f(a,b):
     return a+b

or with optional argument names:

@sig a: int,b: int -> int
def f(a,b):
     return a+b

sig expressions (possibly with parens) would be first class
and be able to appear anywhere an expression is allowed;
they would produce an object embedding the signature information.

So both of these would be possible:

@typecheck
@sig int,int -> int
def f(a,b):
     return a+b

@typecheck(sig int,int -> int)
def f(a,b):
     return a+b

For example having first-class signatures would help express nicely 
reflective queries on overloaded/generic functions, etc...

regards.






From guido at python.org  Sat Aug 19 17:09:53 2006
From: guido at python.org (Guido van Rossum)
Date: Sat, 19 Aug 2006 08:09:53 -0700
Subject: [Python-3000] int-long unification
Message-ID: <ca471dc20608190809x76320b6ctff62cc44f30574ec@mail.gmail.com>

Martin,

I've thought about it more, and I think it's fine to use a single
type. It will surely simplify many things, and that alone might help
us win back some of the inefficiency this introduces. And it is best
for Python-level users.

Are you interested in doing this at the Google sprint next week?

Here's how I would approach it:

0. Benchmark. (Py3k is slower than 2.5 at the moment, I don't know
why.) I would pick the benchmark that showed the biggest sensitivity
in your recent comparisons.

1. Completely gut intobject.[ch], making all PyInt APIs equivalent to
the corresponding PyLong APIs (through macros if possible). The PyInt
macros become functions. I'm not sure whether it would be better for
PyInt_Check() to always return False or to always return True. In
bltinmodule, export "int" as an alias for "long".

2. Bang on the rest of the code until it compiles and passes all unit
tests (except the 5 that I haven't managed to fix yet -- test_class,
test_descr, test_minidom, and the two etree tests). (Right now many
more are broken due to the elimination of has_key; I'll fix these over
the weekend.)

3. Go over much of the C code where it special-cases PyInt and PyLong
separately, and change this to only use the PyLong calls. Keep the
unittests working.

4. Benchmark.

5. Introduce some optimizations into longobject.c, e.g. a cache for
small ints (like we had in intobject.c), and perhaps a special
representation for values less than maxint (or for anything that fits
in a long long). Or anything else you can think of.

6. Benchmark.

7. Repeat from 5 until satisfied.

At this point I wouldn't rip out the PyInt APIs; leaving them in
aliased to PyLong APIs for a while will let us put off the work on
some of the more obscure extension modules.
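Step 5's small-int cache can be illustrated with a toy model. CPython would do this in C inside longobject.c; the Python below is only a sketch of the idea (the `Long` class and cache bounds are illustrative, not the real implementation):

```python
class Long:
    """Toy stand-in for the unified int/long type."""
    def __init__(self, value):
        self.value = value

# Preallocate the commonly used small values once, up front.
_CACHE_MIN, _CACHE_MAX = -5, 257
_cache = [Long(i) for i in range(_CACHE_MIN, _CACHE_MAX)]

def long_from_value(value):
    """Return a shared object for small values, a fresh one otherwise."""
    if _CACHE_MIN <= value < _CACHE_MAX:
        return _cache[value - _CACHE_MIN]
    return Long(value)
```

The win is that the small values that dominate real programs (loop counters, indices) never trigger an allocation.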

What do you think?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From ncoghlan at gmail.com  Sat Aug 19 19:54:00 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 20 Aug 2006 03:54:00 +1000
Subject: [Python-3000] signature annotation in the function signature or
 a	separate line
In-Reply-To: <44E6EDDB.9070604@strakt.com>
References: <20060816090147.19DA.JCARLSON@uci.edu>	<ca471dc20608161013h18be025dr2f413f3226b70819@mail.gmail.com>	<20060816102652.19E3.JCARLSON@uci.edu>	<ca471dc20608161117h62bdaa96ra59eb2dc9bef5f78@mail.gmail.com>
	<44E6EDDB.9070604@strakt.com>
Message-ID: <44E75038.1090007@gmail.com>

Samuele Pedroni wrote:
> Guido van Rossum wrote:
>>> But maybe I'm misremembering the discussion, maybe decorators make it
>>> very difficult to visually scan for function definitions, and maybe
>>> people want all that garbage in their function signature.
>>
>> They don't want it, but if they're forced to have it occasionally
>> they'll cope. I still think you're way overestimating the importance
>> of this use case.
>>
> 
> Given that the meaning of annotations is deliberately not predefined,
> given that people are coming up with arbitrarily verbose examples
> thereof, and given the precedent of type-inferred languages
> that use a separate line for optional type information, I think
> devising a way to have the annotation on a different line,
> with a decorator-like introduction instead of mixed into
> the function head, would be saner:
> 
> One possibility would be to have a syntax for signature expressions
> and then allow them as decorators with the obvious effect of attaching
> themself:
> 
> @sig int,int -> int
> def f(a,b):
>      return a+b
> 
> or with optional argument names:
> 
> @sig a: int,b: int -> int
> def f(a,b):
>      return a+b
> 
> sig expressions (possibly with parens) would be first class
> and be able to appear anywhere an expression is allowed,
> they would produce an object embedding the signature information.

What would a separate sig expression buy you over defining "->expr" as a 
special form of keyword argument that binds to the keyword name "return" in 
the dictionary for storing extra keyword arguments?

With the argument based approach, the two above examples would look like:

@sig(int, int, ->int)
def f(a,b):
      return a+b

@sig(a=int, b=int, ->int)
def f(a,b):
      return a+b

The implementation of sig might look something like:
   def sig(*args, **kwds):
       def annotator(f):
           # Assume bind() is defined to pass through any
           # 'return' binding into the returned mapping
           # Otherwise, it uses normal parameter binding
           notes = f.__signature__.bind(*args, **kwds)
           f.__signature__.annotations = notes
           return f
       return annotator

The longer this discussion goes on, the more convinced I become that making it 
easier to write decorator factories that produce decorators that map the 
factory's arguments to the decorated function's parameters is a better idea 
than adding function annotations directly to the function signature.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From pedronis at strakt.com  Sat Aug 19 19:59:23 2006
From: pedronis at strakt.com (Samuele Pedroni)
Date: Sat, 19 Aug 2006 19:59:23 +0200
Subject: [Python-3000] signature annotation in the function signature or
 a	separate line
In-Reply-To: <44E75038.1090007@gmail.com>
References: <20060816090147.19DA.JCARLSON@uci.edu>	<ca471dc20608161013h18be025dr2f413f3226b70819@mail.gmail.com>	<20060816102652.19E3.JCARLSON@uci.edu>	<ca471dc20608161117h62bdaa96ra59eb2dc9bef5f78@mail.gmail.com>
	<44E6EDDB.9070604@strakt.com> <44E75038.1090007@gmail.com>
Message-ID: <44E7517B.9020300@strakt.com>

Nick Coghlan wrote:
> 
> What would a separate sig expression buy you over defining "->expr" as a 
> special form of keyword argument that binds to the keyword name "return" 
> in the dictionary for storing extra keyword arguments?

It seems to me a quirky addition of sugar, and it could not stay limited; I 
prefer going the full length and supporting argument-name introduction 
with ':' etc., as shown in the example.

But it seems we agree that interspersing the annotations in the main
head of the function is not such a great idea after all.

From pedronis at strakt.com  Sat Aug 19 20:08:08 2006
From: pedronis at strakt.com (Samuele Pedroni)
Date: Sat, 19 Aug 2006 20:08:08 +0200
Subject: [Python-3000] signature annotation in the function signature or
 a	separate line
In-Reply-To: <44E7517B.9020300@strakt.com>
References: <20060816090147.19DA.JCARLSON@uci.edu>	<ca471dc20608161013h18be025dr2f413f3226b70819@mail.gmail.com>	<20060816102652.19E3.JCARLSON@uci.edu>	<ca471dc20608161117h62bdaa96ra59eb2dc9bef5f78@mail.gmail.com>	<44E6EDDB.9070604@strakt.com>
	<44E75038.1090007@gmail.com> <44E7517B.9020300@strakt.com>
Message-ID: <44E75388.2080907@strakt.com>

Samuele Pedroni wrote:
> Nick Coghlan wrote:
> 
>>What would a separate sig expression buy you over defining "->expr" as a 
>>special form of keyword argument that binds to the keyword name "return" 
>>in the dictionary for storing extra keyword arguments?
> 
> 
> It seems to me a quirky addition of sugar, and it could not stay limited; I 
> prefer going the full length and supporting argument-name introduction 
> with ':' etc., as shown in the example.
> 

to be more precise, I find:

@sig a: int, b: int -> int

more readable and to the point than: @sig(a=int,b=int,->int).
First-class sig expressions could have rules for leaving out the parens,
as genexps do, etc.

Also it can be extended to support attaching annotations to * and ** 
args. It would be hard to devise separate sugar for those.

> But it seems we agree that interspersing the annotations in the main
> head of the function is not such a great idea after all.
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/pedronis%40strakt.com


From brett at python.org  Sat Aug 19 21:06:02 2006
From: brett at python.org (Brett Cannon)
Date: Sat, 19 Aug 2006 12:06:02 -0700
Subject: [Python-3000] int-long unification
In-Reply-To: <ca471dc20608190809x76320b6ctff62cc44f30574ec@mail.gmail.com>
References: <ca471dc20608190809x76320b6ctff62cc44f30574ec@mail.gmail.com>
Message-ID: <bbaeab100608191206q7b60048cg6c9cf46fe71b309a@mail.gmail.com>

On 8/19/06, Guido van Rossum <guido at python.org> wrote:
>
> Martin,
>
> I've thought about it more, and I think it's fine to use a single
> type. It will surely simplify many things, and that alone might help
> us win back some of the inefficiency this introduces. And it is best
> for Python-level users.


Woohoo!  I totally support this idea (along with anything else that comes up
to simplify the C API; I almost feel like we need a dumbed-down API along
with the full-powered API behind it).  I also support Martin doing the work
=)  (but that's mostly because I know he is in a good position to do it
well).


-Brett
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060819/c6bd844a/attachment.htm 

From paul at prescod.net  Sat Aug 19 21:19:54 2006
From: paul at prescod.net (Paul Prescod)
Date: Sat, 19 Aug 2006 12:19:54 -0700
Subject: [Python-3000] Fwd: Conventions for annotation consumers
In-Reply-To: <ec6pbl$7iv$1@sea.gmane.org>
References: <5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com>
	<44E5E85F.6080508@gmail.com> <ec6pbl$7iv$1@sea.gmane.org>
Message-ID: <1cb725390608191219q71cf34dfi7a6a892c1fd9eddf@mail.gmail.com>

On 8/19/06, Ron Adam <rrr at ronadam.com> wrote:
>
>      @callmeta
>      def foo( a: [ SetDoc("frobination count"), InRange(3,9) ],
>               b: InSet([4,8,12]) ) -> IsNumber:
>         # code


What extra information or value does the callmeta decorator provide? For the
sake of argument, I'll presume it has some useful function. Even so, it
doesn't make sense to explicitly attach it to every function.

Imagine a hundred such functions in a module. Would it be better to do this:

@callmeta
def func1(..): ...

@callmeta
def func2(..): ...

@callmeta
def func3(..): ...

@callmeta
def func4(..): ...

@callmeta
def func5(..): ...

Or to do this:

func1(...):...

func2(...):...

func3(...):...

func4(...):...

func5(...):...

callmeta()
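A minimal sketch of what such a module-level callmeta() might look like, assuming (this is an assumption, not anything defined in the thread) that the metadata lives in a `meta` attribute holding callable checks:

```python
import functools
import types

def callmeta(namespace):
    """Wrap every function in `namespace` that carries callable metadata
    in a (hypothetical) `meta` attribute, so the checks run before each call."""
    def activate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwds):
            for check in func.meta:   # each item is assumed to be callable
                check(*args, **kwds)
            return func(*args, **kwds)
        return wrapper
    for name, obj in list(namespace.items()):
        if isinstance(obj, types.FunctionType) and hasattr(obj, 'meta'):
            namespace[name] = activate(obj)
```

Calling `callmeta(globals())` at the bottom of a module would then activate every annotated function in one step, skipping those without metadata.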

 Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060819/c0ac90cc/attachment.html 

From rrr at ronadam.com  Sun Aug 20 00:27:06 2006
From: rrr at ronadam.com (Ron Adam)
Date: Sat, 19 Aug 2006 17:27:06 -0500
Subject: [Python-3000] Fwd: Conventions for annotation consumers
In-Reply-To: <1cb725390608191219q71cf34dfi7a6a892c1fd9eddf@mail.gmail.com>
References: <5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com>	<44E5E85F.6080508@gmail.com>
	<ec6pbl$7iv$1@sea.gmane.org>
	<1cb725390608191219q71cf34dfi7a6a892c1fd9eddf@mail.gmail.com>
Message-ID: <ec83d2$o69$1@sea.gmane.org>

Paul Prescod wrote:
> On 8/19/06, *Ron Adam* <rrr at ronadam.com <mailto:rrr at ronadam.com>> wrote:
> 
>          @callmeta
>          def foo( a: [ SetDoc("frobination count"), InRange(3,9) ],
>                   b: InSet([4,8,12]) ) -> IsNumber:
>             # code
> 
> 
> What extra information or value does the callmeta decorator provide? For 
> the sake of argument, I'll presume it has some useful function. Even so, 
> it doesn't make sense to explicitly attach it to every function.

The callmeta decorator wouldn't provide any extra information itself; 
all it does is decorate (wrap) the functions so that the metadata gets 
called.  It activates the metadata calls.



> Imagine a hundred such functions in a module. Would it be better to do this:
> 
> @callmeta
> def func1(..): ...
> 
> @callmeta
> def func2(..): ...
> 
> @callmeta
> def func3(..): ...
> 
> @callmeta
> def func4(..): ...
> 
> @callmeta
> def func5(..): ...

Isn't this the same?


> Or to do this:
> 
> func1(...):...
> 
> func2(...):...
> 
> func3(...):...
> 
> func4(...):...
> 
> func5(...):...
> 
> callmeta()

So here callmeta() wraps all the functions to activate the metadata? 
That should also work if you want to activate all the functions, or a 
large list of functions with metadata.  It could just skip those 
without callable metadata.



>  Paul Prescod




From bob at redivi.com  Sun Aug 20 01:57:47 2006
From: bob at redivi.com (Bob Ippolito)
Date: Sat, 19 Aug 2006 16:57:47 -0700
Subject: [Python-3000] int-long unification
In-Reply-To: <bbaeab100608191206q7b60048cg6c9cf46fe71b309a@mail.gmail.com>
References: <ca471dc20608190809x76320b6ctff62cc44f30574ec@mail.gmail.com>
	<bbaeab100608191206q7b60048cg6c9cf46fe71b309a@mail.gmail.com>
Message-ID: <6a36e7290608191657h36645421u3b0859dc504c40b3@mail.gmail.com>

On 8/19/06, Brett Cannon <brett at python.org> wrote:
>
> On 8/19/06, Guido van Rossum <guido at python.org> wrote:
> > Martin,
> >
> > I've thought about it more, and I think it's fine to use a single
> > type. It will surely simplify many things, and that alone might help
> > us win back some of the inefficiency this introduces. And it is best
> > for Python-level users.
>
>
> Woohoo!  I totally support this idea (along with anything else that comes up
> to simplify the C API; I almost feel like we need a dumbed-down API along
> with the full-powered API behind it).  I also support Martin doing the work
> =)  (but that's mostly because I know he is in a good position to do it
> well).

The easiest thing we could do to simplify extension writing would be
to supply a script that generates extension source and a setup.py from
a generic template. The template would demonstrate the current best
practices for defining a function, a constant, an Exception subclass,
and a class that wraps a C struct with a method or two.

-bob

From paul at prescod.net  Sun Aug 20 02:06:23 2006
From: paul at prescod.net (Paul Prescod)
Date: Sat, 19 Aug 2006 17:06:23 -0700
Subject: [Python-3000] Fwd: Conventions for annotation consumers
In-Reply-To: <ec83d2$o69$1@sea.gmane.org>
References: <5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com>
	<44E5E85F.6080508@gmail.com> <ec6pbl$7iv$1@sea.gmane.org>
	<1cb725390608191219q71cf34dfi7a6a892c1fd9eddf@mail.gmail.com>
	<ec83d2$o69$1@sea.gmane.org>
Message-ID: <1cb725390608191706l113558f7h98e6810ddc422d2a@mail.gmail.com>

On 8/19/06, Ron Adam <rrr at ronadam.com> wrote:
>
>
> The callmeta decorator wouldn't provide any extra information itself,
> all it does is decorate(wrap) the functions so that the meta data gets
> called.  It activates the meta data calls.


I think we're using the word "metadata" differently. In my universe,
metadata is a form of data and you don't "call" data. You just assert it. I
think that what you are trying to do is USE metadata as a form of runtime
precondition. That's totally fine as long as we are clear that there are
many uses for metadata that do not require anything to "happen" during the
function's instantiation. A documentation annotation or annotation to map to
a foreign type system are examples. So the decorator is allowed but
optional. Given that that's the case, I guess I don't understand the virtue
of bringing decorators into the picture. Yes, they are one consumer of
metadata. Module-scoped functions are another. Application scoped functions
are another. Third party data extraction programs are another. Decorators
working with metadata are just special cases of runtime processors of it.

 Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060819/e291e9b7/attachment.html 

From rrr at ronadam.com  Sun Aug 20 05:17:28 2006
From: rrr at ronadam.com (Ron Adam)
Date: Sat, 19 Aug 2006 22:17:28 -0500
Subject: [Python-3000] Fwd: Conventions for annotation consumers
In-Reply-To: <1cb725390608191706l113558f7h98e6810ddc422d2a@mail.gmail.com>
References: <5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com>	<44E5E85F.6080508@gmail.com>
	<ec6pbl$7iv$1@sea.gmane.org>	<1cb725390608191219q71cf34dfi7a6a892c1fd9eddf@mail.gmail.com>	<ec83d2$o69$1@sea.gmane.org>
	<1cb725390608191706l113558f7h98e6810ddc422d2a@mail.gmail.com>
Message-ID: <ec8kdh$o86$1@sea.gmane.org>

Paul Prescod wrote:
> On 8/19/06, *Ron Adam* <rrr at ronadam.com <mailto:rrr at ronadam.com>> wrote:
> 
> 
>     The callmeta decorator wouldn't provide any extra information itself,
>     all it does is decorate(wrap) the functions so that the meta data gets
>     called.  It activates the meta data calls.
> 
> 
> I think we're using the word "metadata" differently. In my universe, 
> metadata is a form of data and you don't "call" data. You just assert 
> it. I think that what you are trying to do is USE metadata as a form of 
> runtime precondition. 

Yes, I am extending the term in this case to include the details of 
implementing the metadata.  If you describe something in enough detail, 
it might as well be Python code.  And if it is Python code, well, why not 
make use of that?


Each of these describes "some info" in greater detail.

(1) "some info"


(2) some_info = "Brief description of some_info."


(3) some_info =
        """
           Detailed description of what
           some info is, and how to use it.

           Pseudo code to how to implement some_info property.
              (pseudo code ...)
        """

(4) some_info =
        """
           Detailed description of what
           some info is, and how to use it.

           # example python code to implement the
           # some_info property of x.
           def some_info_foo(x):
               ...
        """

(5) def some_info(x):
        """
           some info - description of what some_info is.
        """
        <Python code as the exact description of some_info>



This last one describes it so precisely that it can actually be called, 
if desired.  So why not use it?


> That's totally fine as long as we are clear that 
> there are many uses for metadata that do not require anything to 
> "happen" during the function's instantiation. A documentation annotation 
> or annotation to map to a foreign type system are examples. So the 
> decorator is allowed but optional. 

Yes, it is optional in this case too.  Just because it's callable, 
doesn't mean it has to be called to be used.  I could just as well use 
the doc attribute of the meta function, or the function name in any way 
I want and ignore the code completely.


> Given that that's the case, I guess I 
> don't understand the virtue of bringing decorators into the picture. 
> Yes, they are one consumer of metadata. Module-scoped functions are 
> another. Application scoped functions are another. Third party data 
> extraction programs are another. Decorators working with metadata are 
> just special cases of runtime processors of it.
> 
>  Paul Prescod

Decorators reduce repetition and put their labels before a function 
instead of after it.  They aren't strictly required at *any* time, because 
you can do the same thing without them, but they can help make the code 
more readable.

    @decorator
    def foo(x):
       # code

is the same as...

    def foo(x):
       # code
    foo = decorator(foo)

The name is repeated three times in the non-decorator version and, 
because it is located after the function, it might not be noticed.  Other 
than that, they are never required.  (Unless I'm unaware of special cases.)

(Not meaning to start a decorator discussion, just clarifying.)

Cheers,
    Ron






From ncoghlan at gmail.com  Sun Aug 20 06:45:43 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 20 Aug 2006 14:45:43 +1000
Subject: [Python-3000] Fwd: Conventions for annotation consumers
In-Reply-To: <1cb725390608191706l113558f7h98e6810ddc422d2a@mail.gmail.com>
References: <5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com>	<44E5E85F.6080508@gmail.com>
	<ec6pbl$7iv$1@sea.gmane.org>	<1cb725390608191219q71cf34dfi7a6a892c1fd9eddf@mail.gmail.com>	<ec83d2$o69$1@sea.gmane.org>
	<1cb725390608191706l113558f7h98e6810ddc422d2a@mail.gmail.com>
Message-ID: <44E7E8F7.30101@gmail.com>

Paul Prescod wrote:
> Given that that's the case, I guess I 
> don't understand the virtue of bringing decorators into the picture. 
> Yes, they are one consumer of metadata. Module-scoped functions are 
> another. Application scoped functions are another. Third party data 
> extraction programs are another. Decorators working with metadata are 
> just special cases of runtime processors of it.

The reason I believe decorators are relevant is because the question that has 
caused this discussion to go on for so long is one of *disambiguation*. That 
is, there are *lots* of different reasons for annotating a function signature, 
so how does a programmer indicate which particular interpretation is the one 
they mean? Obviously, you can say, "I'm using the signature annotations in my 
module for purpose X". However, a later maintainer of your module may go "but 
I wanted to use those annotations for purpose Y!".

Without function signature annotations in the syntax, *this is not a problem*. 
The One Obvious Way to implement both purpose X and purpose Y is as decorator 
factories that accept as arguments the information corresponding to each of 
the function parameters. Multiple decorators can already be stacked on a 
single function, and the names of the different decorators allow the different 
uses to be easily distinguished using the full power of Python's variable 
namespaces.
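The stacked-decorator style described above can be sketched concretely (the decorator names, the `docinfo`/`typeinfo` attributes, and purposes X and Y are all invented for illustration):

```python
def docinfo(**notes):
    """Hypothetical 'purpose X': attach documentation notes per parameter."""
    def deco(f):
        f.docinfo = notes
        return f
    return deco

def typeinfo(**notes):
    """Hypothetical 'purpose Y': attach type notes per parameter."""
    def deco(f):
        f.typeinfo = notes
        return f
    return deco

# Both purposes annotate the same function without competing for a
# single per-parameter annotation slot; the decorator names disambiguate.
@docinfo(a="first operand", b="second operand")
@typeinfo(a=int, b=int)
def f(a, b):
    return a + b
```

Each consumer then reads only its own attribute, so no conflict-resolution mechanism is needed.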

If signature annotations are added to the language, however, you have a new 
way of doing things: put the information in the signature annotations and 
write a decorator that consumes the signature information. And if two 
different utilities do that, then you have a conflict, and have to invent a 
mechanism for resolving it. And this disambiguation has to happen for each 
individual signature annotation instead of being done once for the whole 
function as would be the case with using separate decorators.

So, as far as I can see, adding signature annotations doesn't let us do 
anything that can't already be done with less ambiguity using decorator 
factories that accept the appropriate arguments.

Samuele's idea of "signature expressions" (i.e. a literal or builtin function 
for producing objects that describe a function's signature) seems like a 
*much* more fruitful avenue for exploration, as it would provide a genuine 
increase in expressiveness (decorator factories would be able to accept a 
single signature argument instead of separate arguments that then need to be 
mapped to the relevant function parameter).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From guido at python.org  Sun Aug 20 17:04:33 2006
From: guido at python.org (Guido van Rossum)
Date: Sun, 20 Aug 2006 08:04:33 -0700
Subject: [Python-3000] signature annotation in the function signature or
	a separate line
In-Reply-To: <44E6EDDB.9070604@strakt.com>
References: <20060816090147.19DA.JCARLSON@uci.edu>
	<ca471dc20608161013h18be025dr2f413f3226b70819@mail.gmail.com>
	<20060816102652.19E3.JCARLSON@uci.edu>
	<ca471dc20608161117h62bdaa96ra59eb2dc9bef5f78@mail.gmail.com>
	<44E6EDDB.9070604@strakt.com>
Message-ID: <ca471dc20608200804w5ffe2bf2t4c15f2c715a861a8@mail.gmail.com>

On 8/19/06, Samuele Pedroni <pedronis at strakt.com> wrote:
> Given that the meaning of annotations is meant not to be predefined,

Not sure what that has to do with it.

> given that people are coming up with arbitrarily verbose examples
> thereof,

Which I believe are worst-case scenarios and not what we'll see in practice.

> given the precedent of type-inferred languages
> that use a separate line for optional type information

Can you show us an example or two?

> I think
> devising a way to have the annotation on a different line
> with a decorator-like introduction instead of mixed with
> the function head would be saner:
>
> One possibility would be to have a syntax for signature expressions
> and then allow them as decorators with the obvious effect of attaching
> themself:
>
> @sig int,int -> int
> def f(a,b):
>      return a+b

One problem with this is that for larger argument lists it's hard for the
(human) reader to match types up with arguments. In general I don't like
having two parallel lists of things that must be matched up; I'd much rather
have a single list containing all the info packed together.

> or with optional argument names:
>
> @sig a: int,b: int -> int
> def f(a,b):
>      return a+b

This seems like it would merely move the problem to the previous line; it
doesn't solve the problem that the signature becomes unreadable when the
type expressions are long lists or dicts.

My own recommended solution for long signatures is to generously use the
Python equivalent of 'typedef'; instead of writing

def f(a: [PEAK("some peakish expression here"),
          Zope("some zopeish expression here")],
      b: [...more of the same...]) -> [PEAK("..."), Zope("...")]:
    return a+b

I think most cases can be made a lot more readable by saying

type_a = [PEAK("some peakish expression here"),
          Zope("some zopeish expression here")]
type_b = [...more of the same...]
type_f = [PEAK("..."), Zope("...")]

def f(a: type_a, b: type_b) -> type_f:
    return a+b

especially since I expect that in many cases there will be typedefs that can
be shared between multiple signatures.
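For the record, this typedef style is exactly what the annotation syntax eventually adopted in Python 3 (PEP 3107) supports; a runnable sketch with plain lists of strings standing in for the PEAK/Zope annotation objects:

```python
# Shared "typedefs": name the verbose annotation objects once, reuse them.
# (Plain strings stand in for PEAK(...)/Zope(...) objects here.)
type_a = ["some peakish expression here", "some zopeish expression here"]
type_b = ["...more of the same..."]
type_f = ["..."]

def f(a: type_a, b: type_b) -> type_f:
    return a + b
```

The annotations land in `f.__annotations__`, keyed by parameter name, with the return annotation stored under the key `'return'`.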

> sig expressions (possibly with parens) would be first class
> and be able to appear anywhere an expression is allowed;
> they would produce an object embedding the signature information.

I think it's a good idea to have a way to produce a signature object without
tying it to a function definition; but I'd rather not introduce any new
syntax for just this purpose. For purely positional signatures, this could
be done using a built-in function, e.g.

  s = sig(int, int, returns=int)

I'm not sure what to do to create signatures that include the variable
names, the best I can come up with is

  s = sig(('a', int), ('b', int), returns=int)

(Note that you can't use keyword parameters because that would lose the
ordering of the parameters. Possibly signatures could be limited to
describing parameters that are purely positional and parameters that are
purely keyword but no mixed-mode parameters? Nah, too restrictive.)

But I still don't want to introduce new syntax just for this. In extreme
cases you can always define a dummy function and extract its __signature__
object.
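The `inspect.Signature` API that Python later grew provides exactly this kind of first-class signature object without new syntax (a sketch; this API did not exist in 2006, and the discussed `__signature__` attribute is its ancestor):

```python
import inspect

# Build a signature object for "(a: int, b: int) -> int" directly,
# preserving parameter order and names, without defining a dummy function.
P = inspect.Parameter
s = inspect.Signature(
    parameters=[P('a', P.POSITIONAL_OR_KEYWORD, annotation=int),
                P('b', P.POSITIONAL_OR_KEYWORD, annotation=int)],
    return_annotation=int)
```

Because parameters are passed as an ordered list rather than keyword arguments, the ordering problem Guido mentions does not arise.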

> So both of these would be possible:
>
> @typecheck
> @sig int,int -> int
> def f(a,b):
>      return a+b
>
> @typecheck(sig int,int -> int)
> def f(a,b):
>      return a+b

I'm not sure we need more ways to express the same thing. :-)

> For example having first-class signatures would help express nicely
> reflective queries on overloaded/generic functions, etc...

Agreed. But I think there's a way without forcing the annotations out of the
'def' line.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060820/84237306/attachment.html 

From g.brandl at gmx.net  Sun Aug 20 17:21:35 2006
From: g.brandl at gmx.net (Georg Brandl)
Date: Sun, 20 Aug 2006 17:21:35 +0200
Subject: [Python-3000] raise with traceback?
Message-ID: <ec9um0$pqs$1@sea.gmane.org>

Hi,

as

   raise ValueError, "something went wrong"

is going to go away, how will one raise with a custom traceback?
The obvious

   raise ValueError("something went wrong"), traceback

or something more esoteric like

   raise ValueError("something went wrong") with traceback

?

Georg


From guido at python.org  Sun Aug 20 17:53:49 2006
From: guido at python.org (Guido van Rossum)
Date: Sun, 20 Aug 2006 08:53:49 -0700
Subject: [Python-3000] raise with traceback?
In-Reply-To: <ec9um0$pqs$1@sea.gmane.org>
References: <ec9um0$pqs$1@sea.gmane.org>
Message-ID: <ca471dc20608200853i318d1051kc8cc8cfff1b7eb0a@mail.gmail.com>

The 'with' syntax is attractive because it will flag all unconverted
code as a syntax error.
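For reference, a traceback can also be attached via a method on the exception object rather than new statement syntax, which is the route Python 3 ultimately took (a sketch of that approach):

```python
import sys

def reraise():
    try:
        1 / 0
    except ZeroDivisionError:
        tb = sys.exc_info()[2]
        # Re-raise a different exception carrying the captured traceback.
        raise ValueError("something went wrong").with_traceback(tb)
```

The re-raised ValueError then carries the original traceback in its `__traceback__` attribute.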

I wonder if "raise ValueError" should still be allowed (as equivalent
to "raise ValueError()") or that it should be disallowed.

--Guido

On 8/20/06, Georg Brandl <g.brandl at gmx.net> wrote:
> Hi,
>
> as
>
>    raise ValueError, "something went wrong"
>
> is going to go away, how will one raise with a custom traceback?
> The obvious
>
>    raise ValueError("something went wrong"), traceback
>
> or something more esoteric like
>
>    raise ValueError("something went wrong") with traceback
>
> ?
>
> Georg
>
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From tjreedy at udel.edu  Sun Aug 20 18:08:55 2006
From: tjreedy at udel.edu (Terry Reedy)
Date: Sun, 20 Aug 2006 12:08:55 -0400
Subject: [Python-3000] raise with traceback?
References: <ec9um0$pqs$1@sea.gmane.org>
	<ca471dc20608200853i318d1051kc8cc8cfff1b7eb0a@mail.gmail.com>
Message-ID: <eca1em$16m$1@sea.gmane.org>


"Guido van Rossum" <guido at python.org> wrote in message 
news:ca471dc20608200853i318d1051kc8cc8cfff1b7eb0a at mail.gmail.com...
> I wonder if "raise ValueError" should still be allowed (as equivalent
> to "raise ValueError()") or that it should be disallowed.

+1 for disallow.

raise <exception class instance> is a simple rule to remember.

Having VE == VE() in certain contexts is/would be like having s.len == 
s.len() or func == func() (a moderately frequent newbie request).

Plus, why encourage less-helpful, no-message exceptions ;-)

Terry Jan Reedy




From guido at python.org  Sun Aug 20 18:10:55 2006
From: guido at python.org (Guido van Rossum)
Date: Sun, 20 Aug 2006 09:10:55 -0700
Subject: [Python-3000] Google Sprint Ideas
Message-ID: <ca471dc20608200910s34ef36f6ueb694af487bfbfa4@mail.gmail.com>

I've created a wiki page with some ideas for Python 3000 things we
could do at the Google sprint (starting Monday). See:

  http://wiki.python.org/moin/GoogleSprintPy3k

For general info about this sprint -- it's not too late to come! -- see:

  http://wiki.python.org/moin/GoogleSprint

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From barry at python.org  Sun Aug 20 18:11:32 2006
From: barry at python.org (Barry Warsaw)
Date: Sun, 20 Aug 2006 12:11:32 -0400
Subject: [Python-3000] raise with traceback?
In-Reply-To: <ca471dc20608200853i318d1051kc8cc8cfff1b7eb0a@mail.gmail.com>
References: <ec9um0$pqs$1@sea.gmane.org>
	<ca471dc20608200853i318d1051kc8cc8cfff1b7eb0a@mail.gmail.com>
Message-ID: <FFAD60C5-079B-4F34-8D7B-A91D1EAF59E9@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 20, 2006, at 11:53 AM, Guido van Rossum wrote:

> The 'with' syntax is attractive because it will flag all unconverted
> code as a syntax error.
>
> I wonder if "raise ValueError" should still be allowed (as equivalent
> to "raise ValueError()") or that it should be disallowed.

I say keep it.  I don't see much value in requiring empty  
parentheses, except maybe to keep my left pinkie limber.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Darwin)

iQCVAwUBROiJtXEjvBPtnXfVAQKkQwP/WTYvfFYYlA5ukmDmvTg3G5BVCYEyC8hQ
8jZXfnzm0j8PdCGJp2ym16ux0+MIRsMx1taU0VGRpULF4hPfRPHG92EQm/YDRGBm
1X5fXNmQ2sbMAb84GqO6HiQxbUkP70Zu5DbgQj3pCqCO3oJLuqXie1gj5neezBoR
lj2yQHiUnP8=
=JFG+
-----END PGP SIGNATURE-----

From g.brandl at gmx.net  Sun Aug 20 18:12:48 2006
From: g.brandl at gmx.net (Georg Brandl)
Date: Sun, 20 Aug 2006 18:12:48 +0200
Subject: [Python-3000] raise with traceback?
In-Reply-To: <eca1em$16m$1@sea.gmane.org>
References: <ec9um0$pqs$1@sea.gmane.org>	<ca471dc20608200853i318d1051kc8cc8cfff1b7eb0a@mail.gmail.com>
	<eca1em$16m$1@sea.gmane.org>
Message-ID: <eca1m1$1ec$1@sea.gmane.org>

Terry Reedy wrote:
> "Guido van Rossum" <guido at python.org> wrote in message 
> news:ca471dc20608200853i318d1051kc8cc8cfff1b7eb0a at mail.gmail.com...
>> I wonder if "raise ValueError" should still be allowed (as equivalent
>> to "raise ValueError()") or that it should be disallowed.
>
> +1 for disallow.
> 
> raise <exception class instance> is a simple rule to remember.
> 
> Having VE == VE() in certain contexts is/would be like having s.len == 
> s.len() or func == func() (a moderately frequent newbie request).
> 
> Plus, why encourage less-helpful, no message exceptions ;-)

Some exceptions don't need a message, such as StopIteration, and other
possibly user-defined ones meant to be caught immediately in surrounding
code.

Though I agree that it makes explanations (and probably some bits of code)
easier to only allow instances after raise.

Georg


From martin at v.loewis.de  Sun Aug 20 18:30:23 2006
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 20 Aug 2006 18:30:23 +0200
Subject: [Python-3000] int-long unification
In-Reply-To: <ca471dc20608190809x76320b6ctff62cc44f30574ec@mail.gmail.com>
References: <ca471dc20608190809x76320b6ctff62cc44f30574ec@mail.gmail.com>
Message-ID: <44E88E1F.6010607@v.loewis.de>

Guido van Rossum schrieb:
> Are you interested in doing this at the Google sprint next week?

Sure; I hadn't any special plans so far.

> What do you think?

Sounds good. There are two problems I see:

- how to benchmark?

- there are subtle details in the API that require changes
  to extension code. In particular, PyInt_AsLong currently
  cannot fail, but can fail with a range error after the
  unification.

However, to evaluate the performance, it is possible to work
around that.

For this specific problem, I would propose to introduce
another API, say

int PyLong_ToLong(PyObject* val, long* result);

which will return true(1) for success, and set an exception
in case of a failure. Then, we get

long PyLong_AsLong(PyObject *val)
{
  long result;
  if(!PyLong_ToLong(val, &result))return -1;
  return result;
}

and perhaps

long PyInt_AsLong(PyObject *val)
{
  long result;
  if(!PyLong_ToLong(val, &result))
    Py_FatalError("old-style integer conversion failed");
  return result;
}

Regards,
Martin

From guido at python.org  Sun Aug 20 18:43:05 2006
From: guido at python.org (Guido van Rossum)
Date: Sun, 20 Aug 2006 09:43:05 -0700
Subject: [Python-3000] int-long unification
In-Reply-To: <44E88E1F.6010607@v.loewis.de>
References: <ca471dc20608190809x76320b6ctff62cc44f30574ec@mail.gmail.com>
	<44E88E1F.6010607@v.loewis.de>
Message-ID: <ca471dc20608200943w9118b6el8fec43d2e539352c@mail.gmail.com>

On 8/20/06, "Martin v. L?wis" <martin at v.loewis.de> wrote:
> Guido van Rossum schrieb:
> > Are you interested in doing this at the Google sprint next week?
>
> Sure; I hadn't any special plans so far.
>
> > What do you think?
>
> Sounds good. There are two problems I see:
>
> - how to benchmark?

We could possibly do a lot of int allocations and deallocations in a
temporary extension module.

> - there are subtle details in the API that require changes
>   to extension code. In particular, PyInt_AsLong currently
>   cannot fail, but can fail with a range error after the
>   unification.
>
> However, to evaluate the performance, it is possible to work
> around that.
>
> For this specific problem, I would propose to introduce
> another API, say
>
> int PyLong_ToLong(PyObject* val, long* result);
>
> which will return true(1) for success, and set an exception
> in case of a failure. Then, we get
>
> long PyLong_AsLong(PyObject *val)
> {
>   long result;
>   if(!PyLong_ToLong(val, &result))return -1;
>   return result;
> }
>
> and perhaps
>
> long PyInt_AsLong(PyObject *val)
> {
>   long result;
>   if(!PyLong_ToLong(val, &result))
>     Py_FatalError("old-style integer conversion failed");
>   return result;
> }

The fatal error strikes me as unpleasant. Perhaps PyInt_Check[Exact]
should return false if the value won't fit in a C long? Or perhaps we
could just return -sys.maxint-1?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jimjjewett at gmail.com  Sun Aug 20 20:12:29 2006
From: jimjjewett at gmail.com (Jim Jewett)
Date: Sun, 20 Aug 2006 14:12:29 -0400
Subject: [Python-3000] Fwd: Conventions for annotation consumers
In-Reply-To: <1cb725390608191706l113558f7h98e6810ddc422d2a@mail.gmail.com>
References: <5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com>
	<44E5E85F.6080508@gmail.com> <ec6pbl$7iv$1@sea.gmane.org>
	<1cb725390608191219q71cf34dfi7a6a892c1fd9eddf@mail.gmail.com>
	<ec83d2$o69$1@sea.gmane.org>
	<1cb725390608191706l113558f7h98e6810ddc422d2a@mail.gmail.com>
Message-ID: <fb6fbf560608201112j4c0b2b97k1f3acaa73a7c90d9@mail.gmail.com>

On 8/19/06, Paul Prescod <paul at prescod.net> wrote:
> On 8/19/06, Ron Adam <rrr at ronadam.com> wrote:

> ... don't understand the virtue of bringing
> decorators into the picture. Yes, they are
> one consumer of metadata.

They aren't being brought in as sample *consumers*; they are being
suggested as *producers* of metadata.

The following works to assert the data

>>> def f(a, b):
...
>>> f.a=int

We're discussing the alternative of

>>> def f(a:int, b):

which is better for some things -- but much worse for others; if the
metadata is any longer than int, it is almost certainly worse.  So (I
believe) he is suggesting that we just reuse decorator syntax

>>> @sig(a=int)
... def f(a, b):

This keeps the single function declaration line short and sweet,
reflecting (modulo "self" and a colon) how it is actually called.  It
gets the annotations (including type information) up where they should
be, but they don't overwhelm the variable names.

Whether to also add signature expressions (to make @sig decorators
easier to write) is a separate question; the key point is not to mess
with the one-line function summary.
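A minimal sketch of such a @sig decorator (the stored attribute name
'__signature__' is an assumption, not a settled spec):

```python
def sig(**annotations):
    # Attach per-parameter metadata without touching the def line.
    # The '__signature__' attribute name is illustrative only.
    def decorate(func):
        func.__signature__ = {'annotations': annotations}
        return func
    return decorate

@sig(a=int)
def f(a, b):
    return a + b
```

The def line stays short; consumers read f.__signature__ instead of parsing
the parameter list.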

-jJ

From osantana at gmail.com  Sun Aug 20 21:03:36 2006
From: osantana at gmail.com (Osvaldo Santana)
Date: Sun, 20 Aug 2006 16:03:36 -0300
Subject: [Python-3000] Google Sprint Ideas
In-Reply-To: <ca471dc20608200910s34ef36f6ueb694af487bfbfa4@mail.gmail.com>
References: <ca471dc20608200910s34ef36f6ueb694af487bfbfa4@mail.gmail.com>
Message-ID: <b674ca220608201203p66bc1f8ch9d3c02bb1186856d@mail.gmail.com>

Hi Guido,

On 8/20/06, Guido van Rossum <guido at python.org> wrote:
> I've created a wiki page with some ideas for Python 3000 things we
> could do at the Google sprint (starting Monday). See:
>
>   http://wiki.python.org/moin/GoogleSprintPy3k

I'm interested in contributing to the task "Rewrite import in Python
(Brett Cannon)".

I've started to study the Python import mechanism at interpreter
startup to understand how it works
(http://pythonologia.org/python_import/) and I have some ideas for
this rewrite too.

I'll have full time at my job to work on this.

Thanks,
Osvaldo

-- 
Osvaldo Santana Neto
Python for Maemo developer
icq, url = (11287184, "http://www.pythonbrasil.com.br")

From guido at python.org  Sun Aug 20 21:59:01 2006
From: guido at python.org (Guido van Rossum)
Date: Sun, 20 Aug 2006 12:59:01 -0700
Subject: [Python-3000] Google Sprint Ideas
In-Reply-To: <b674ca220608201203p66bc1f8ch9d3c02bb1186856d@mail.gmail.com>
References: <ca471dc20608200910s34ef36f6ueb694af487bfbfa4@mail.gmail.com>
	<b674ca220608201203p66bc1f8ch9d3c02bb1186856d@mail.gmail.com>
Message-ID: <ca471dc20608201259m35adc454i129d23a043812b5a@mail.gmail.com>

Excellent! I'm adding Brett to the CC's. Can you update the wiki page
adding your name to that task? Are you coming to the sprint in person
or are you just going to be sprinting at your own place?

--Guido

On 8/20/06, Osvaldo Santana <osantana at gmail.com> wrote:
> Hi Guido,
>
> On 8/20/06, Guido van Rossum <guido at python.org> wrote:
> > I've created a wiki page with some ideas for Python 3000 things we
> > could do at the Google sprint (starting Monday). See:
> >
> >   http://wiki.python.org/moin/GoogleSprintPy3k
>
> I'm interested in contributing to the task "Rewrite import in Python
> (Brett Cannon)".
>
> I've started to study the Python import mechanism at interpreter
> startup to understand how it works
> (http://pythonologia.org/python_import/) and I have some ideas for
> this rewrite too.
>
> I'll have full time at my job to work on this.
>
> Thanks,
> Osvaldo
>
> --
> Osvaldo Santana Neto
> Python for Maemo developer
> icq, url = (11287184, "http://www.pythonbrasil.com.br")
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From paul at prescod.net  Sun Aug 20 22:07:18 2006
From: paul at prescod.net (Paul Prescod)
Date: Sun, 20 Aug 2006 13:07:18 -0700
Subject: [Python-3000] Fwd: Conventions for annotation consumers
In-Reply-To: <fb6fbf560608201112j4c0b2b97k1f3acaa73a7c90d9@mail.gmail.com>
References: <5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com>
	<44E5E85F.6080508@gmail.com> <ec6pbl$7iv$1@sea.gmane.org>
	<1cb725390608191219q71cf34dfi7a6a892c1fd9eddf@mail.gmail.com>
	<ec83d2$o69$1@sea.gmane.org>
	<1cb725390608191706l113558f7h98e6810ddc422d2a@mail.gmail.com>
	<fb6fbf560608201112j4c0b2b97k1f3acaa73a7c90d9@mail.gmail.com>
Message-ID: <1cb725390608201307i2b4a2711y7679279b8b2fc871@mail.gmail.com>

On 8/20/06, Jim Jewett <jimjjewett at gmail.com> wrote:
>
> We're discussing the alternative of
>
> >>> def f(a:int, b):
>
> which is better for some things -- but much worse for others; if the
> metadata is any longer than int, it is almost certainly worse.  So (I
> believe) he is suggesting that we just reuse decorator syntax
>
> >>> @sig(a=int)
> ... def f(a, b):


I don't believe that's true, because this is the syntax he showed:

>          @callmeta
>          def foo( a: [ SetDoc("frobination count"), InRange(3,9) ],
>                   b: InSet([4,8,12]) )
>                   -> IsNumber:

I guess I still don't really understand what he's getting at or what the
value of @callmeta is in that example. It just seems like extra noise with
no value to me...

Ron: what *precisely* does the @callmeta decorator do? If you can express it
in code, so much the better.

 Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060820/e8257c01/attachment.htm 

From osantana at gmail.com  Sun Aug 20 22:27:31 2006
From: osantana at gmail.com (Osvaldo Santana)
Date: Sun, 20 Aug 2006 17:27:31 -0300
Subject: [Python-3000] Google Sprint Ideas
In-Reply-To: <ca471dc20608201259m35adc454i129d23a043812b5a@mail.gmail.com>
References: <ca471dc20608200910s34ef36f6ueb694af487bfbfa4@mail.gmail.com>
	<b674ca220608201203p66bc1f8ch9d3c02bb1186856d@mail.gmail.com>
	<ca471dc20608201259m35adc454i129d23a043812b5a@mail.gmail.com>
Message-ID: <b674ca220608201327r31202893w25e29d7cc1834ee8@mail.gmail.com>

On 8/20/06, Guido van Rossum <guido at python.org> wrote:
> Excellent! I'm adding Brett to the CC's.

Cool. Has Brett planned anything for this rewrite?

> Can you update the wiki page adding your name to that task?

Done.

> Are you coming to the sprint in person
> or are you just going to be sprinting at your own place?

I'll sprint at my job. I can access IRC from there.

Thanks,
Osvaldo

-- 
Osvaldo Santana Neto (aCiDBaSe)
icq, url = (11287184, "http://www.pythonbrasil.com.br")

From nnorwitz at gmail.com  Sun Aug 20 22:51:40 2006
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Sun, 20 Aug 2006 16:51:40 -0400
Subject: [Python-3000] Google Sprint Ideas
In-Reply-To: <b674ca220608201327r31202893w25e29d7cc1834ee8@mail.gmail.com>
References: <ca471dc20608200910s34ef36f6ueb694af487bfbfa4@mail.gmail.com>
	<b674ca220608201203p66bc1f8ch9d3c02bb1186856d@mail.gmail.com>
	<ca471dc20608201259m35adc454i129d23a043812b5a@mail.gmail.com>
	<b674ca220608201327r31202893w25e29d7cc1834ee8@mail.gmail.com>
Message-ID: <ee2a432c0608201351h6dec136cie7ef8d49481c031a@mail.gmail.com>

On 8/20/06, Osvaldo Santana <osantana at gmail.com> wrote:
> On 8/20/06, Guido van Rossum <guido at python.org> wrote:
> > Excellent! I'm adding Brett to the CC's.
>
> Cool. Has Brett planned something to this rewrite?

I'm not sure exactly what you are asking.  It's mostly planned to be a
re-implementation of the current behaviour in Python.  Hopefully
various corner cases will be cleaned up/documented, and some of the
differences between importing a file from the file system and from a
zip package will be smoothed over.   I don't think he's started any of
this yet, beyond looking at the PyPy implementation.

It helps him in his work to sandbox Python.  Also, various
optimizations or experiments with different semantics become much
easier if import is implemented in Python.

n

From p.f.moore at gmail.com  Sun Aug 20 22:52:56 2006
From: p.f.moore at gmail.com (Paul Moore)
Date: Sun, 20 Aug 2006 21:52:56 +0100
Subject: [Python-3000] Google Sprint Ideas
In-Reply-To: <ca471dc20608200910s34ef36f6ueb694af487bfbfa4@mail.gmail.com>
References: <ca471dc20608200910s34ef36f6ueb694af487bfbfa4@mail.gmail.com>
Message-ID: <79990c6b0608201352i74e5def4t16e944db7de22768@mail.gmail.com>

On 8/20/06, Guido van Rossum <guido at python.org> wrote:
> I've created a wiki page with some ideas for Python 3000 things we
> could do at the Google sprint (starting Monday). See:
>
>   http://wiki.python.org/moin/GoogleSprintPy3k

I notice that one of the items on there is "Work on the new I/O
library (I have much interest in this but need help -- Guido)". I also
have an interest in this, although I won't be at the sprint (and in
general have very little time for coding these days, unfortunately).

Is there any description of the plans for the new I/O library
anywhere? I assume that ultimately there will be a PEP, but in the
meantime, I recall very little in the way of details having been
discussed.

Paul.

From rrr at ronadam.com  Sun Aug 20 23:01:46 2006
From: rrr at ronadam.com (Ron Adam)
Date: Sun, 20 Aug 2006 16:01:46 -0500
Subject: [Python-3000] Fwd: Conventions for annotation consumers
In-Reply-To: <1cb725390608201307i2b4a2711y7679279b8b2fc871@mail.gmail.com>
References: <5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com>	<44E5E85F.6080508@gmail.com>
	<ec6pbl$7iv$1@sea.gmane.org>	<1cb725390608191219q71cf34dfi7a6a892c1fd9eddf@mail.gmail.com>	<ec83d2$o69$1@sea.gmane.org>	<1cb725390608191706l113558f7h98e6810ddc422d2a@mail.gmail.com>	<fb6fbf560608201112j4c0b2b97k1f3acaa73a7c90d9@mail.gmail.com>
	<1cb725390608201307i2b4a2711y7679279b8b2fc871@mail.gmail.com>
Message-ID: <ecaip3$fmq$1@sea.gmane.org>

Paul Prescod wrote:

> I guess I still don't really understand what he's getting at or what the 
> value of @callmeta is in that example. It just seems like extra noise 
> with no value to me...
> 
> Ron: what *precisely* does the @callmeta decorator do? If you can 
> express it in code, so much the better.
> 
>  Paul Prescod
> 


Here's a working example.  @callmeta could be named something else like 
@asserter, @checker, or whatever.  And it should do more checks to avoid 
non-callable annotations and to keep from writing over pre-existing 
annotations, etc...

As I said, this could all be put in a module, and it's easy to create new 
assert tests without having to know about decorators or any special classes.

    Ron



# ----- Some assert test functions.

def IsAny(arg): pass

def IsNumber(arg):
     assert type(arg) in (int, long, float), \
            "%r is not a number" % arg

def IsInt(arg):
     assert type(arg) in (int, long), \
            "%r is not an Int" % arg

def IsFloat(arg):
     assert isinstance(arg, float), \
            "%r is not a float" % arg

def InRange(start, stop):
     def inrange(arg):
         assert start <= arg <= stop, \
                "%r is not in range %r through %r" % (arg, start, stop)
     return inrange

def InSet(list_):
     s = set(list_)
     def inset(arg):
         assert arg in s, \
                "%r is not in %r" % (arg, s)
     return inset


# ------- The add-annotation decorator.

def annotate(**kwds):
     def setter(func):
         # Plain attribute assignment is the idiomatic spelling.
         func.__signature__ = {'annotations': kwds}
         return func
     return setter


# ------ The do-asserts decorator.

def callmeta(f):
     def new_f(*args, **kwds):
         d = dict(zip(f.func_code.co_varnames, args))
         d.update(kwds)
         tests = f.__signature__['annotations']
         for key in d:
             if key != 'returns':
                 tests[key](d[key])
         result = f(*args, **kwds)
         if 'returns' in tests:
             tests['returns'](result)
         return result
     new_f.func_name = f.func_name
     return new_f


# --------- Examples of using callable annotations.

@callmeta
@annotate(a=IsAny, b=IsInt, returns=IsInt)
def add(a, b):
     return a + b

print add(1, 4)

@callmeta
@annotate(a=IsInt, b=IsInt, returns=IsInt)
def add(a, b):
     return a + b

print add(1, 4.1)    # assertion error here.



# which could also be...

"""
@callmeta
def add(a: IsInt, b: IsInt) -> IsInt:
     return a + b
"""



From qrczak at knm.org.pl  Sun Aug 20 23:06:28 2006
From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk)
Date: Sun, 20 Aug 2006 23:06:28 +0200
Subject: [Python-3000] int-long unification
In-Reply-To: <ca471dc20608200943w9118b6el8fec43d2e539352c@mail.gmail.com>
	(Guido van Rossum's message of "Sun, 20 Aug 2006 09:43:05 -0700")
References: <ca471dc20608190809x76320b6ctff62cc44f30574ec@mail.gmail.com>
	<44E88E1F.6010607@v.loewis.de>
	<ca471dc20608200943w9118b6el8fec43d2e539352c@mail.gmail.com>
Message-ID: <871wrbyv57.fsf@qrnik.zagroda>

"Guido van Rossum" <guido at python.org> writes:

> The fatal error strikes me as unpleasant. Perhaps PyInt_Check[Exact]
> should return false if the value won't fit in a C long?

Maybe.

> Or perhaps we could just return -sys.maxint-1?

This would be a bad idea: some errors in user programs would yield
nonsensical results or be masked instead of being signalled with
exceptions.

I made C macros for the following patterns of extracting C integers
from my language:

1. If the object is an integer with its value in the given range,
   put the value into a C integer variable. Otherwise fail with an
   exception which tells that the value is out of range (includes
   the value, the range, and a string explaining what this
   value represents), or that it is not an integer.

2. As above, but the range is the full range of the C type.

3. As above, but the low end is 0 or given explicitly and the high end
   is the range of the C type.

Only in rare cases did I need to separate checking whether the number
is in the given range from extracting the value under the assumption
that it has been checked earlier. Sometimes the action performed when
out of range is something other than throwing an exception, but this
is rare too.

The C type can be smaller or larger than the threshold which separates
the representations of small integers and big integers in my runtime
(which in my case is 1 bit smaller than some C type, so it never
matches exactly). This is handled transparently by these C macros.

I always try to find out the maximum sensible range of the given
parameter. For example:

- bzip2, compression parameters (verbosity 0..4, compression level 1..9,
  work factor 1..250), gzip similarly - case 1
- Python's unichr(): character code 0..0x10FFFF - case 1
- conversions int<->str, base 2..36 - case 1
- seeking into files - cases 2 and 3
- curses, color pair number 0..PAIR_NUMBER(A_COLOR) - case 1
- curses, screen coordinates and character counts - case 3
- curses, KEY_F(n) 0..63 - case 1
- sockets, address family code 0..AF_MAX or 0..255 - case 1
- sockets, port number 0..65535 - case 1
- sockets, socket type code and protocol number - case 3
- readline, function code in keymap 0..255 (or 0..KEYMAP_SIZE-2,
  but KEYMAP_SIZE is always 257) - case 1
- readline, repetition count of commands - case 2
- readline, rl_display_match_list, screen width 0..INT_MAX-2 - case 1
- readline, history entry positions - case 3
- readline, terminal width & height - case 3
- kill() and waitpid(), pid - case 3 (starting from 1 for an
  individual process or 2 for process group)
- kill(), signal number 0..NSIG-1 or 0.._NSIG-1 or 0..32 - case 1

The effect when writing a C extension is that the same C code works
no matter what the relation is between the ranges of the target C type
and int / size_t. Python had to code extraction of the seek offset
specially because off_t may be larger, and it silently assumes that the
sensible ranges of pid_t, uid_t etc. are the same as those of C int.

The visible effect is that Python has inconsistent exceptions:
>>> unichr(0x123456)
ValueError: unichr() arg not in range(0x110000) (wide Python build)
>>> unichr(0x1234567890)
OverflowError: long int too large to convert to int

Kogut is consistent here:
> Char 0x123456
Value out of range: character code must be between 0 and 1114111, but 1193046 was given
> Char 0x1234567890
Value out of range: character code must be between 0 and 1114111, but 78187493520 was given

Python:
>>> posix.kill(0, 128)
OSError: [Errno 22] Invalid argument
>>> posix.kill(0, 2**32)
OverflowError: long int too large to convert to int

Kogut:
> SignalProcess #group (SystemSignal 128)
Value out of range: signal number must be between 0 and 64, but 128 was given
> SignalProcess #group (SystemSignal (2 %Power 32))
Value out of range: signal number must be between 0 and 64, but 4294967296 was given

The same applies in the other direction, converting from C.

C in Python:
#ifdef HAVE_LARGEFILE_SUPPORT
        PyStructSequence_SET_ITEM(v, 1,
                                  PyLong_FromLongLong((PY_LONG_LONG)st.st_ino));
#else
        PyStructSequence_SET_ITEM(v, 1, PyInt_FromLong((long)st.st_ino));
#endif

C in Kogut:
   KO_INT(ko_value_of_file_status(this)->st_ino)
This is a C expression returning the equivalent of PyObject *,
taking sizeof the argument into account.

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/

From martin at v.loewis.de  Sun Aug 20 23:10:41 2006
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sun, 20 Aug 2006 23:10:41 +0200
Subject: [Python-3000] Ctypes as cross-interpreter C calling interface
In-Reply-To: <1cb725390608092219v695b7f24t92534d3aa444ca8c@mail.gmail.com>
References: <1cb725390608092219v695b7f24t92534d3aa444ca8c@mail.gmail.com>
Message-ID: <44E8CFD1.9090403@v.loewis.de>

Paul Prescod schrieb:
> Thanks for everyone who contributed. It seems that the emerging
> consensus (bar a security question from Guido) is that ctypes it the way
> forward for calling C code in Python 3000. 

I don't think that can ever work (so I don't participate in that
consensus). There are too many issues with C that make ctypes not
general enough.
a) it requires code to be packaged in a DLL; static libraries
   are not supported (conceptually)
b) it requires you to know the layout of data structures, or
   at least to duplicate declarations in Python. As the layout
   of the same structure may change over time or across
   implementations (e.g. FILE in stdio), you can never get good
   platform coverage.
c) A good deal of C API is through macros, for various usages
   (symbolic constants, function inlining,
    customization/configuration/conditional compilation)
d) No real support for C++ (where there are even more ABI
   issues: (multiple) inheritance, vtables, constructors,
   operator overload, templates, ...)

To access a C API, the only "right" way is to use a C compiler.
ctypes is for people who want to avoid using a C compiler at
all costs.

Regards,
Martin

From seojiwon at gmail.com  Sun Aug 20 23:52:32 2006
From: seojiwon at gmail.com (Jiwon Seo)
Date: Sun, 20 Aug 2006 14:52:32 -0700
Subject: [Python-3000] Keyword Only Argument
Message-ID: <b008462b0608201452y4b8e37b6gfcf098832ea840ad@mail.gmail.com>

For the implementation of PEP 3102 - Keyword-Only Arguments, it would
be nice to have an (abstract) data structure representing the
signature of a function. Currently, the code object only has the # of
arguments and the # of default values, so if we want to allow
something like,

def foo(a,b=10,*,c,d):
    ...

or,

def foo(a,b=10,*,c,d=20):
    ...

a signature data structure will be very helpful.

Signature data structure is roughly described in
http://mail.python.org/pipermail/python-3000/2006-April/001249.html ,
but has anyone got a detailed idea or implemented it (it doesn't
matter how naive the implementation is)? Brett, is that document the
most recent one describing the signature data structure?
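A bare-bones sketch of what such a structure might hold (the class and field
names are guesses, not the proposed spec):

```python
class Signature(object):
    # Minimal signature record for keyword-only arguments: what the
    # code object's co_argcount/defaults cannot express today.
    def __init__(self, args, defaults, kwonlyargs, kwonlydefaults):
        self.args = list(args)                    # positional names
        self.defaults = dict(defaults)            # their defaults
        self.kwonlyargs = list(kwonlyargs)        # names after bare '*'
        self.kwonlydefaults = dict(kwonlydefaults)

# def foo(a, b=10, *, c, d=20) could then be described as:
foo_sig = Signature(['a', 'b'], {'b': 10}, ['c', 'd'], {'d': 20})
```

With something like this attached to functions, argument binding for both
forms of foo above becomes a plain lookup rather than counting trickery.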

-Jiwon

From jcarlson at uci.edu  Mon Aug 21 00:47:52 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Sun, 20 Aug 2006 15:47:52 -0700
Subject: [Python-3000] signature annotation in the function signature or
	a separate line
In-Reply-To: <ca471dc20608200804w5ffe2bf2t4c15f2c715a861a8@mail.gmail.com>
References: <44E6EDDB.9070604@strakt.com>
	<ca471dc20608200804w5ffe2bf2t4c15f2c715a861a8@mail.gmail.com>
Message-ID: <20060820152716.1A09.JCARLSON@uci.edu>


"Guido van Rossum" <guido at python.org> wrote:
> > given the precedent of type inferenced languages
> > that use a separate line for optional type information
> 
> Can you show us an example or two?

C/C++ probably doesn't count, being that type information is required,
but one can relocate type information to other lines...

void
cross(inp1, inp2, inpl1, inpl2, outp)
double* inp1;
double* inp2;
long    inpl1;
long    inpl2;
double* outp;
{
    /* body goes here */
}


 - Josiah


From free.condiments at gmail.com  Mon Aug 21 01:26:57 2006
From: free.condiments at gmail.com (Sam Pointon)
Date: Mon, 21 Aug 2006 00:26:57 +0100
Subject: [Python-3000] signature annotation in the function signature or
	a separate line
In-Reply-To: <ca471dc20608200804w5ffe2bf2t4c15f2c715a861a8@mail.gmail.com>
References: <20060816090147.19DA.JCARLSON@uci.edu>
	<ca471dc20608161013h18be025dr2f413f3226b70819@mail.gmail.com>
	<20060816102652.19E3.JCARLSON@uci.edu>
	<ca471dc20608161117h62bdaa96ra59eb2dc9bef5f78@mail.gmail.com>
	<44E6EDDB.9070604@strakt.com>
	<ca471dc20608200804w5ffe2bf2t4c15f2c715a861a8@mail.gmail.com>
Message-ID: <b1c02c610608201626l2c17f16cnd0f2be590874e1db@mail.gmail.com>

On 20/08/06, Guido van Rossum <guido at python.org> wrote:
> On 8/19/06, Samuele Pedroni <pedronis at strakt.com> wrote:
> > given the precedent of type inferenced languages
> > that use a separate line for optional type information
>
> Can you show us an example or two?

Haskell:

map :: (a -> b) -> [a] -> [b]
map f xs = ...

Note that type information can also be contained in an expression (and
by extension on the same line), though the convention for defined
functions is to have it on a separate line. This type information is
not quite 100% optional - there are some corner-cases where the
typechecker needs a shove in the correct direction, or the inferred
type could be too general.

--Sam

From guido at python.org  Mon Aug 21 01:27:15 2006
From: guido at python.org (Guido van Rossum)
Date: Sun, 20 Aug 2006 16:27:15 -0700
Subject: [Python-3000] Google Sprint Ideas
In-Reply-To: <79990c6b0608201352i74e5def4t16e944db7de22768@mail.gmail.com>
References: <ca471dc20608200910s34ef36f6ueb694af487bfbfa4@mail.gmail.com>
	<79990c6b0608201352i74e5def4t16e944db7de22768@mail.gmail.com>
Message-ID: <ca471dc20608201627n23d4b98m59dbca1c561121e4@mail.gmail.com>

On 8/20/06, Paul Moore <p.f.moore at gmail.com> wrote:
> On 8/20/06, Guido van Rossum <guido at python.org> wrote:
> > I've created a wiki page with some ideas for Python 3000 things we
> > could do at the Google sprint (starting Monday). See:
> >
> >   http://wiki.python.org/moin/GoogleSprintPy3k
>
> I notice that one of the items on there is "Work on the new I/O
> library (I have much interest in this but need help -- Guido)". I also
> have an interest in this, although I won't be at the sprint (and in
> general have very little time for coding these days, unfortunately).
>
> Is there any description of the plans for the new  I/O library
> anywhere? I assume that ultimately there will be a PEP, but in the
> meantime, I recall very little in the way of details having been
> discussed.

Without endorsing every detail of his design, tomer filiba has written
several blog (?) entries about this, the latest being
http://sebulba.wikispaces.com/project+iostack+v2 . You can also look
at sandbox/sio/sio.py in svn.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From talin at acm.org  Mon Aug 21 01:42:08 2006
From: talin at acm.org (Talin)
Date: Sun, 20 Aug 2006 16:42:08 -0700
Subject: [Python-3000] Google Sprint Ideas
In-Reply-To: <ca471dc20608201627n23d4b98m59dbca1c561121e4@mail.gmail.com>
References: <ca471dc20608200910s34ef36f6ueb694af487bfbfa4@mail.gmail.com>	<79990c6b0608201352i74e5def4t16e944db7de22768@mail.gmail.com>
	<ca471dc20608201627n23d4b98m59dbca1c561121e4@mail.gmail.com>
Message-ID: <44E8F350.8070509@acm.org>

Guido van Rossum wrote:
> On 8/20/06, Paul Moore <p.f.moore at gmail.com> wrote:

> Without endorsing every detail of his design, tomer filiba has written
> several blog (?) entries about this, the latest being
> http://sebulba.wikispaces.com/project+iostack+v2 . You can also look
> at sandbox/sio/sio.py in svn.

One comment after reading this: If we're going to re-invent the Java/C# 
i/o library, could we at least use the same terminology? In particular, 
the term "Layer" has connotations which may be confusing in this context 
- I would prefer something like "Adapter" or "Filter".

Also, I notice that this proposal removes what I consider to be a nice 
feature of Python, which is that you can take a plain file object and 
iterate over the lines of the file -- it would require a separate line 
buffering adapter to be created. I think I understand the reasoning 
behind this - in a world with multiple text encodings, the definition of 
"line" may not be so simple. However, I would assume that the "built-in" 
streams would support the most basic, least-common-denominator encodings 
for convenience.

-- Talin

From greg.ewing at canterbury.ac.nz  Mon Aug 21 03:03:27 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 21 Aug 2006 13:03:27 +1200
Subject: [Python-3000] raise with traceback?
In-Reply-To: <eca1em$16m$1@sea.gmane.org>
References: <ec9um0$pqs$1@sea.gmane.org>
	<ca471dc20608200853i318d1051kc8cc8cfff1b7eb0a@mail.gmail.com>
	<eca1em$16m$1@sea.gmane.org>
Message-ID: <44E9065F.7030802@canterbury.ac.nz>

Terry Reedy wrote:
> "Guido van Rossum" <guido at python.org> wrote in message 
> news:ca471dc20608200853i318d1051kc8cc8cfff1b7eb0a at mail.gmail.com...
> 
>>I wonder if "raise ValueError" should still be allowed (as equivalent
>>to "raise ValueError()") or that it should be disallowed.
> 
> +1 for disallow.

Seems like that would break a lot of code with no
obvious way of flagging things which need to be
changed.

Also it would preclude the possibility of any
future optimisation to avoid instantiating the
exception when its value isn't needed.

--
Greg

From guido at python.org  Mon Aug 21 03:06:02 2006
From: guido at python.org (Guido van Rossum)
Date: Sun, 20 Aug 2006 18:06:02 -0700
Subject: [Python-3000] Google Sprint Ideas
In-Reply-To: <44E8F350.8070509@acm.org>
References: <ca471dc20608200910s34ef36f6ueb694af487bfbfa4@mail.gmail.com>
	<79990c6b0608201352i74e5def4t16e944db7de22768@mail.gmail.com>
	<ca471dc20608201627n23d4b98m59dbca1c561121e4@mail.gmail.com>
	<44E8F350.8070509@acm.org>
Message-ID: <ca471dc20608201806x2356cd42i75112ca3850bab01@mail.gmail.com>

On 8/20/06, Talin <talin at acm.org> wrote:
> Guido van Rossum wrote:
> > On 8/20/06, Paul Moore <p.f.moore at gmail.com> wrote:
>
> > Without endorsing every detail of his design, tomer filiba has written
> > several blog (?) entries about this, the latest being
> > http://sebulba.wikispaces.com/project+iostack+v2 . You can also look
> > at sandbox/sio/sio.py in svn.
>
> One comment after reading this: If we're going to re-invent the Java/C#
> i/o library, could we at least use the same terminology? In particular,
> the term "Layer" has connotations which may be confusing in this context
> - I would prefer something like "Adapter" or "Filter".

That's an example of what I meant when I said "without endorsing every detail".

I don't know which terminology C++ uses beyond streams. I think Java
uses Streams for the lower-level stuff and Reader/Writer for the
higher-level stuff -- or is it the other way around?

> Also, I notice that this proposal removes what I consider to be a nice
> feature of Python, which is that you can take a plain file object and
> iterate over the lines of the file -- it would require a separate line
> buffering adapter to be created. I think I understand the reasoning
> behind this - in a world with multiple text encodings, the definition of
> "line" may not be so simple. However, I would assume that the "built-in"
> streams would support the most basic, least-common-denominator encodings
> for convenience.

First time I noticed that. But perhaps it's the concept of "plain file
object" that changed? My own hierarchy (which I arrived at without
reading tomer's proposal) is something like this:

(1) Basic level (implemented in C) -- open, close, read, write, seek,
tell. Completely unbuffered, maps directly to system calls. Does
binary I/O only.

(2) Buffering. Implements the same API as (1) but adds buffering. This
is what one normally uses for binary file I/O. It builds on (1), but
can also be built on raw sockets instead. It adds an API to inquire
about the amount of buffered data, a flush() method, and ways to
change the buffer size.

(3) Encoding and line endings. Implements a somewhat different API,
for reading/writing text files; the API resembles Python 2's I/O
library more. This is where readline() and next() giving the next line
are implemented. It also does newline translation to/from the
platform's native convention (CRLF or LF, or perhaps CR if anyone
still cares about Mac OS <= 9) and Python's convention (always \n). I
think I want to put these two features (encoding and line endings) in
the same layer because they are both text related. Of course you can
specify ASCII or Latin-1 to effectively disable the encoding part.
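The three layers above might be sketched roughly like this (all class and
method names here are illustrative only, not a proposed API):

```python
import os

class RawIO(object):
    # Layer 1: unbuffered binary I/O mapping straight to system calls.
    def __init__(self, fd):
        self.fd = fd
    def read(self, n):
        return os.read(self.fd, n)
    def write(self, data):
        return os.write(self.fd, data)

class BufferedIO(object):
    # Layer 2: same binary API, plus an internal read buffer.
    def __init__(self, raw, bufsize=8192):
        self.raw, self.bufsize = raw, bufsize
        self._buf = b""
    def read(self, n):
        while len(self._buf) < n:
            chunk = self.raw.read(self.bufsize)
            if not chunk:  # EOF
                break
            self._buf += chunk
        data, self._buf = self._buf[:n], self._buf[n:]
        return data

class TextIO(object):
    # Layer 3: decoding plus newline translation and readline().
    def __init__(self, buffered, encoding="ascii"):
        self.buffered, self.encoding = buffered, encoding
    def readline(self):
        out = b""
        while True:
            ch = self.buffered.read(1)
            if not ch:
                break
            out += ch
            if ch == b"\n":
                break
        return out.decode(self.encoding).replace("\r\n", "\n")
```

A text file would then be TextIO(BufferedIO(RawIO(fd))), while a socket
could plug its own raw object into the same buffering layer.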

Does this make more sense?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From talin at acm.org  Mon Aug 21 03:34:28 2006
From: talin at acm.org (Talin)
Date: Sun, 20 Aug 2006 18:34:28 -0700
Subject: [Python-3000] Google Sprint Ideas
In-Reply-To: <ca471dc20608201806x2356cd42i75112ca3850bab01@mail.gmail.com>
References: <ca471dc20608200910s34ef36f6ueb694af487bfbfa4@mail.gmail.com>	
	<79990c6b0608201352i74e5def4t16e944db7de22768@mail.gmail.com>	
	<ca471dc20608201627n23d4b98m59dbca1c561121e4@mail.gmail.com>	
	<44E8F350.8070509@acm.org>
	<ca471dc20608201806x2356cd42i75112ca3850bab01@mail.gmail.com>
Message-ID: <44E90DA4.1040203@acm.org>

Guido van Rossum wrote:
> On 8/20/06, Talin <talin at acm.org> wrote:
>> Guido van Rossum wrote:
>> > On 8/20/06, Paul Moore <p.f.moore at gmail.com> wrote:
>>
>> > Without endorsing every detail of his design, tomer filiba has written
>> > several blog (?) entries about this, the latest being
>> > http://sebulba.wikispaces.com/project+iostack+v2 . You can also look
>> > at sandbox/sio/sio.py in svn.
>>
>> One comment after reading this: If we're going to re-invent the Java/C#
>> i/o library, could we at least use the same terminology? In particular,
>> the term "Layer" has connotations which may be confusing in this context
>> - I would prefer something like "Adapter" or "Filter".
> 
> That's an example of what I meant when I said "without endorsing every 
> detail".
> 
> I don't know which terminology C++ uses beyond streams. I think Java
> uses Streams for the lower-level stuff and Reader/Writer for the
> higher-level stuff -- or is it the other way around?

Well, the situation with Java is kind of complex. There are two sets of 
stream classes, but rather than classifying them as "low-level" and 
"high-level", a better classification is "old" and "new". The old 
classes (InputStream/OutputStream) are byte-oriented, whereas the newer 
ones (Reader/Writer) are character-oriented. It is not the case, 
however, that the character-oriented interface sits on top of the 
byte-oriented interface - rather, both interfaces are implemented by a 
number of different back ends.

For purposes of Python, it probably makes more sense to look at the .Net 
System.IO.Stream. (As a general rule, the .Net classes are refactored 
versions of the Java classes, which is both good and bad. It's best to 
study both if one is looking for inspiration.)

Hmmm, apparently the .Net documentation *does* use the term 'layer' to 
describe one stream wrapping another - which I still find strange. To my 
mind, the term 'layer' can either describe a particular design stratum 
within an architecture - such as the 'device layer' of an operating 
system - or it can describe a portion of a document, such as a drawing 
layer in a CAD program. I don't normally think of a single instance of a 
class wrapping another instance as constituting a "layer" - I usually 
use the term "adapter" or "proxy" to describe that case.

(OK, so I'm pedantic about naming. Now you know why one of my side 
projects is writing an online programmer's thesaurus -- using 
Python/TurboGears of course!)

>> Also, I notice that this proposal removes what I consider to be a nice
>> feature of Python, which is that you can take a plain file object and
>> iterate over the lines of the file -- it would require a separate line
>> buffering adapter to be created. I think I understand the reasoning
>> behind this - in a world with multiple text encodings, the definition of
>> "line" may not be so simple. However, I would assume that the "built-in"
>> streams would support the most basic, least-common-denominator encodings
>> for convenience.
> 
> First time I noticed that. But perhaps it's the concept of "plain file
> object" that changed? My own hierarchy (which I arrived at without
> reading tomer's proposal) is something like this:
> 
> (1) Basic level (implemented in C) -- open, close, read, write, seek,
> tell. Completely unbuffered, maps directly to system calls. Does
> binary I/O only.
> 
> (2) Buffering. Implements the same API as (1) but adds buffering. This
> is what one normally uses for binary file I/O. It builds on (1), but
> can also be built on raw sockets instead. It adds an API to inquire
> about the amount of buffered data, a flush() method, and ways to
> change the buffer size.
> 
> (3) Encoding and line endings. Implements a somewhat different API,
> for reading/writing text files; the API resembles Python 2's I/O
> library more. This is where readline() and next() giving the next line
> are implemented. It also does newline translation to/from the
> platform's native convention (CRLF or LF, or perhaps CR if anyone
> still cares about Mac OS <= 9) and Python's convention (always \n). I
> think I want to put these two features (encoding and line endings) in
> the same layer because they are both text related. Of course you can
> specify ASCII or Latin-1 to effectively disable the encoding part.
> 
> Does this make more sense?

I understood that much -- this is pretty much the way everyone does 
things these days (our own custom stream library at work looks pretty 
much like this too.)

The question I was wondering is, will the built-in 'file' function 
return an object of level 3?

-- Talin

From alexander.belopolsky at gmail.com  Mon Aug 21 05:36:09 2006
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Mon, 21 Aug 2006 03:36:09 +0000 (UTC)
Subject: [Python-3000] Google Sprint Ideas
References: <ca471dc20608200910s34ef36f6ueb694af487bfbfa4@mail.gmail.com>
	<79990c6b0608201352i74e5def4t16e944db7de22768@mail.gmail.com>
	<ca471dc20608201627n23d4b98m59dbca1c561121e4@mail.gmail.com>
	<44E8F350.8070509@acm.org>
	<ca471dc20608201806x2356cd42i75112ca3850bab01@mail.gmail.com>
Message-ID: <loom.20060821T051839-686@post.gmane.org>

Guido van Rossum <guido <at> python.org> writes:

[snip]
>>> Without endorsing every detail of his design, tomer filiba has written
>>> several blog (?) entries about this, the latest being
>>> http://sebulba.wikispaces.com/project+iostack+v2 . You can also look
>>> at sandbox/sio/sio.py in svn.
[snip]
> 
> That's an example of what I meant when I said "without endorsing every
>  detail".

Here is another detail that I would like to see addressed. 
The new API does not seem to provide for a way to read
data directly into an existing object without creating
an intermediate bytes object.

Python 2.x has an undocumented readinto method that
allows reading data directly into an object that supports
the buffer protocol.

For Py3k, I would like to suggest a buffer protocol modelled
after iovec structure that is used by the readv system call.
On many systems readv is more efficient than repeated calls
to read and I think Py3k will benefit from a direct access to
that feature.
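[A sketch, in modern terms, of what readinto() buys: the file's bytes land
directly in a preallocated mutable buffer instead of an intermediate
string object. The filename here is made up for illustration.]

```python
import array

with open('data.bin', 'wb') as f:        # create a small sample file
    f.write(b'\x01\x02\x03\x04')

buf = array.array('b', [0] * 4096)       # preallocated mutable byte buffer
with open('data.bin', 'rb') as f:
    n = f.readinto(buf)                  # fills buf in place, returns byte count
```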


From martin at v.loewis.de  Mon Aug 21 06:01:00 2006
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 21 Aug 2006 06:01:00 +0200
Subject: [Python-3000] Google Sprint Ideas
In-Reply-To: <loom.20060821T051839-686@post.gmane.org>
References: <ca471dc20608200910s34ef36f6ueb694af487bfbfa4@mail.gmail.com>	<79990c6b0608201352i74e5def4t16e944db7de22768@mail.gmail.com>	<ca471dc20608201627n23d4b98m59dbca1c561121e4@mail.gmail.com>	<44E8F350.8070509@acm.org>	<ca471dc20608201806x2356cd42i75112ca3850bab01@mail.gmail.com>
	<loom.20060821T051839-686@post.gmane.org>
Message-ID: <44E92FFC.9080407@v.loewis.de>

Alexander Belopolsky schrieb:
> For Py3k, I would like to suggest a buffer protocol modelled
> after iovec structure that is used by the readv system call.
> On many systems readv is more efficient than repeated calls
> to read and I think Py3k will benefit from a direct access to
> that feature.

-1. It's difficult to use, and I question that there is any
benefit. I believe readv is there primarily for symmetry with
writev and hasn't any sensible uses on its own. writev is
there so you can add additional headers/trailers around data
blocks you received from higher layers. I even doubt that
exposing writev in Python would make a measurable performance
difference.

Regards,
Martin

From guido at python.org  Mon Aug 21 06:32:18 2006
From: guido at python.org (Guido van Rossum)
Date: Sun, 20 Aug 2006 21:32:18 -0700
Subject: [Python-3000] Google Sprint Ideas
In-Reply-To: <44E90DA4.1040203@acm.org>
References: <ca471dc20608200910s34ef36f6ueb694af487bfbfa4@mail.gmail.com>
	<79990c6b0608201352i74e5def4t16e944db7de22768@mail.gmail.com>
	<ca471dc20608201627n23d4b98m59dbca1c561121e4@mail.gmail.com>
	<44E8F350.8070509@acm.org>
	<ca471dc20608201806x2356cd42i75112ca3850bab01@mail.gmail.com>
	<44E90DA4.1040203@acm.org>
Message-ID: <ca471dc20608202132y52ecd4fbs24d7212689f2df03@mail.gmail.com>

On 8/20/06, Talin <talin at acm.org> wrote:
> Guido van Rossum wrote:
> > On 8/20/06, Talin <talin at acm.org> wrote:
> >> Guido van Rossum wrote:
> >> > On 8/20/06, Paul Moore <p.f.moore at gmail.com> wrote:
> >>
> >> > Without endorsing every detail of his design, tomer filiba has written
> >> > several blog (?) entries about this, the latest being
> >> > http://sebulba.wikispaces.com/project+iostack+v2 . You can also look
> >> > at sandbox/sio/sio.py in svn.
> >>
> >> One comment after reading this: If we're going to re-invent the Java/C#
> >> i/o library, could we at least use the same terminology? In particular,
> >> the term "Layer" has connotations which may be confusing in this context
> >> - I would prefer something like "Adapter" or "Filter".
> >
> > That's an example of what I meant when I said "without endorsing every
> > detail".
> >
> > I don't know which terminology C++ uses beyond streams. I think Java
> > uses Streams for the lower-level stuff and Reader/Writer for the
> > higher-level stuff -- or is it the other way around?
>
> Well, the situation with Java is kind of complex. There are two sets of
> stream classes, but rather than classifying them as "low-level" and
> "high-level", a better classification is "old" and "new". The old
> classes (InputStream/OutputStream) are byte-oriented, whereas the newer
> >> ones (Reader/Writer) are character-oriented. It is not the case,
> however, that the character-oriented interface sits on top of the
> byte-oriented interface - rather, both interfaces are implemented by a
> number of different back ends.

How sure are you of all that? I always thought that these have about
the same age, and that the main distinction is byte vs. char
orientation. Also, the InputStreamReader class clearly sits on top of
the InputStream class (but surprisingly recommends that for efficiency
you do buffering on the reader side instead of on the stream side --
should we consider this for Python too?). And FileReader is a subclass
of InputStreamReader. (OK, further investigation does show that
FileInputStream exists since JDK 1.0 while InputStreamReader exists
since JDK 1.1. But there's much newer Java I/O in the "nio" package,
and there's work going on for "nio2", JSR 203.)

> For purposes of Python, it probably makes more sense to look at the .Net
> System.IO.Stream. (As a general rule, the .Net classes are refactored
> versions of the Java classes, which is both good and bad. It's best to
> study both if one is looking for inspiration.)

Perhaps you can tell us more about that? I've used the Java I/O system
sufficiently to have a feel for how it is actually used, which helps
me find my way in the docs; but for .NET I fear that I would have to
go on a sabbatical to make sense of it. And I don't have time for
that.

> Hmmm, apparently the .Net documentation *does* use the term 'layer' to
> describe one stream wrapping another - which I still find strange. To my
> mind, the term 'layer' can either describe a particular design stratum
> within an architecture - such as the 'device layer' of an operating
> system - or it can describe a portion of a document, such as a drawing
> layer in a CAD program.

It's used whenever you could draw a diagram of several layers of
software sitting on top of each other. Perhaps usually layers are
bigger (like device layers) but I see nothing wrong with declaring
that Python I/O consists of three layers.

> I don't normally think of a single instance of a
> class wrapping another instance as constituting a "layer" - I usually
> use the term "adapter" or "proxy" to describe that case.
>
> (OK, so I'm pedantic about naming. Now you know why one of my side
> projects is writing an online programmer's thesaurus -- using
> Python/TurboGears of course!)

Wouldn't it make more sense to contribute to wikipedia at this point?

> >> Also, I notice that this proposal removes what I consider to be a nice
> >> feature of Python, which is that you can take a plain file object and
> >> iterate over the lines of the file -- it would require a separate line
> >> buffering adapter to be created. I think I understand the reasoning
> >> behind this - in a world with multiple text encodings, the definition of
> >> "line" may not be so simple. However, I would assume that the "built-in"
> >> streams would support the most basic, least-common-denominator encodings
> >> for convenience.
> >
> > First time I noticed that. But perhaps it's the concept of "plain file
> > object" that changed? My own hierarchy (which I arrived at without
> > reading tomer's proposal) is something like this:
> >
> > (1) Basic level (implemented in C) -- open, close, read, write, seek,
> > tell. Completely unbuffered, maps directly to system calls. Does
> > binary I/O only.
> >
> > (2) Buffering. Implements the same API as (1) but adds buffering. This
> > is what one normally uses for binary file I/O. It builds on (1), but
> > can also be built on raw sockets instead. It adds an API to inquire
> > about the amount of buffered data, a flush() method, and ways to
> > change the buffer size.
> >
> > (3) Encoding and line endings. Implements a somewhat different API,
> > for reading/writing text files; the API resembles Python 2's I/O
> > library more. This is where readline() and next() giving the next line
> > are implemented. It also does newline translation to/from the
> > platform's native convention (CRLF or LF, or perhaps CR if anyone
> > still cares about Mac OS <= 9) and Python's convention (always \n). I
> > think I want to put these two features (encoding and line endings) in
> > the same layer because they are both text related. Of course you can
> > specify ASCII or Latin-1 to effectively disable the encoding part.
> >
> > Does this make more sense?
>
> I understood that much -- this is pretty much the way everyone does
> things these days (our own custom stream library at work looks pretty
> much like this too.)

So you have the buffering between the binary I/O and the text I/O too?

> The question I was wondering is, will the built-in 'file' function
> return an object of level 3?

I am hoping to get rid of 'file' altogether. Instead, I want to go
back to 'open'. Calling open() with a binary mode argument would
return a layer 2 or layer 1 (if unbuffered) object; calling it with a
text mode would return a layer 3 object. open() would grow additional
keyword parameters to specify the encoding, the desired newline
translation, and perhaps other aspects of the layering that might need
control.
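[A hypothetical sketch of what the proposed open() behaviour could look
like; the keyword parameter names here are guesses, not a settled API.]

```python
with open('data.bin', 'wb') as f:            # sample file for the demo
    f.write(b'abc\n')

raw = open('data.bin', 'rb', buffering=0)    # layer 1: unbuffered binary
buffered = open('data.bin', 'rb')            # layer 2: buffered binary
text = open('data.bin', 'r', encoding='ascii', newline=None)  # layer 3: text
line = text.readline()                       # decoded, newline-translated
raw.close(); buffered.close(); text.close()
```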

BTW in response to Alexander Belopolsky: yes, I would like to continue
support for something like readinto() by layer 1 and maybe 2 (perhaps
even more flexible, e.g. specifying a buffer and optional start and
end indices). I don't think it makes sense for layer 3 since strings
are immutable. I agree with Martin von Loewis that a readv() style API
would be impractical (and I note that Alexander doesn't provide any
use case beyond "it's more efficient").

A use case that I do think is important is reading encoded text data
asynchronously from a socket. This might mean that layers 2 and 3 may
have to be aware of the asynchronous (non-blocking or timeout-driven)
nature of the I/O; reading from layer 3 should give as many characters
as possible without blocking for I/O more than the specified timeout.
We should also decide how asynchronous I/O calls report "no more data"
-- exceptions are inefficient and cause clumsy code, but if we return
"", how can we tell that apart from EOF? Perhaps we can use None to
indicate "no more data available without blocking", continuing "" to
indicate EOF. (The other way around makes just as much sense but would
be a bigger break with Python's past than this particular issue is
worth to me.)
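[A sketch of the convention proposed above: read() returns None when no
data is available without blocking, and "" only at EOF. FakeStream is a
stand-in for a hypothetical non-blocking layer-3 object.]

```python
def drain(stream):
    """Collect whatever is available right now without blocking."""
    chunks = []
    while True:
        data = stream.read()
        if data is None:          # would block -- come back later
            break
        if data == "":            # EOF
            break
        chunks.append(data)
    return "".join(chunks)

class FakeStream(object):
    """Replays a scripted sequence of read() results."""
    def __init__(self, script):
        self.script = list(script)
    def read(self):
        return self.script.pop(0)
```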

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From alexander.belopolsky at gmail.com  Mon Aug 21 06:43:53 2006
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Mon, 21 Aug 2006 00:43:53 -0400
Subject: [Python-3000] Google Sprint Ideas
In-Reply-To: <44E92FFC.9080407@v.loewis.de>
References: <ca471dc20608200910s34ef36f6ueb694af487bfbfa4@mail.gmail.com>	<79990c6b0608201352i74e5def4t16e944db7de22768@mail.gmail.com>	<ca471dc20608201627n23d4b98m59dbca1c561121e4@mail.gmail.com>	<44E8F350.8070509@acm.org>	<ca471dc20608201806x2356cd42i75112ca3850bab01@mail.gmail.com>
	<loom.20060821T051839-686@post.gmane.org>
	<44E92FFC.9080407@v.loewis.de>
Message-ID: <C41AE1E8-F6F6-492E-A1DD-AEA6ED3A0E86@local>


On Aug 21, 2006, at 12:01 AM, Martin v. Löwis wrote:

> Alexander Belopolsky schrieb:
>> For Py3k, I would like to suggest a buffer protocol modelled
>> after iovec structure that is used by the readv system call.
>> On many systems readv is more efficient than repeated calls
>> to read and I think Py3k will benefit from a direct access to
>> that feature.
>
> -1

What is this -1 for:

a) buffer protocol in Py3k?
b) multisegment buffer protocol?
c) readinto that supports multisegment buffers?

Note that in 2.x the buffer protocol is multisegment, but readinto
only supports single-segment buffers.

> It's difficult to use, and I question that there is any
> benefit.

I often deal with the system (kx.com) that represents matrices as
nested lists (1d lists of floats are contiguous).  My matrices are
stored on disk as C-style 2d arrays.  If readinto supported
multisegment buffers, I would be able to update the in-memory data
from files on disk with a single call.  Currently I have to do it
in a loop.
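[A sketch of the per-row workaround described above: a matrix stored
row-major on disk is read into per-row array buffers one readinto()
call at a time. A multisegment readinto could collapse this into a
single call. The filename is made up for illustration.]

```python
import array
import struct

rows, cols = 2, 3
with open('matrix.bin', 'wb') as f:          # sample C-style 2d array
    for i in range(rows * cols):
        f.write(struct.pack('d', float(i)))  # native-order doubles

matrix = [array.array('d', [0.0] * cols) for _ in range(rows)]
with open('matrix.bin', 'rb') as f:
    for row in matrix:
        f.readinto(row)                      # one call (and syscall) per row
```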

> I believe readv is there primarily for symmetry with
> writev and hasn't any sensible uses on its own. writev is
> there so you can add additional headers/trailers around data
> blocks you received from higher layers. I even doubt that
> exposing writev in Python would make a measurable performance
> difference.

I did not suggest exposing anything in Python.  AFAIK, the buffer
protocol is a C API only.

From talin at acm.org  Mon Aug 21 07:41:11 2006
From: talin at acm.org (Talin)
Date: Sun, 20 Aug 2006 22:41:11 -0700
Subject: [Python-3000] Google Sprint Ideas
In-Reply-To: <ca471dc20608202132y52ecd4fbs24d7212689f2df03@mail.gmail.com>
References: <ca471dc20608200910s34ef36f6ueb694af487bfbfa4@mail.gmail.com>	
	<79990c6b0608201352i74e5def4t16e944db7de22768@mail.gmail.com>	
	<ca471dc20608201627n23d4b98m59dbca1c561121e4@mail.gmail.com>	
	<44E8F350.8070509@acm.org>	
	<ca471dc20608201806x2356cd42i75112ca3850bab01@mail.gmail.com>	
	<44E90DA4.1040203@acm.org>
	<ca471dc20608202132y52ecd4fbs24d7212689f2df03@mail.gmail.com>
Message-ID: <44E94777.9010601@acm.org>

Guido van Rossum wrote:
> On 8/20/06, Talin <talin at acm.org> wrote:
>> Guido van Rossum wrote:
> How sure are you of all that? I always thought that these have about
> the same age, and that the main distinction is byte vs. char
> orientation. Also, the InputStreamReader class clearly sits on top of
> the InputStream class (but surprisingly recommends that for efficiency
> you do buffering on the reader side instead of on the stream side --
> should we consider this for Python too?). And FileReader is a subclass
> of InputStreamReader. (OK, further investigation does show that
> FileInputStream exists since JDK 1.0 while InputStreamReader exists
> since JDK 1.1. But there's much newer Java I/O in the "nio" package,
> and there's work going on for "nio2", JSR 203.)

Admittedly my Java knowledge is somewhat old - I spent 2 years 
programming Java in the ".com era" (2000 - 2001). I remember when the 
new reader classes came out in JDK 1.1. So "old" and "new" are somewhat 
relative here. From the point of view of JDK1.5 they are probably 
indistinguishable as to age :)

>> For purposes of Python, it probably makes more sense to look at the .Net
>> System.IO.Stream. (As a general rule, the .Net classes are refactored
>> versions of the Java classes, which is both good and bad. It's best to
>> study both if one is looking for inspiration.)
> 
> Perhaps you can tell us more about that? I've used the Java I/O system
> sufficiently to have a feel for how it is actually used, which helps
> me find my way in the docs; but for .NET I fear that I would have to
> go on a sabbatical to make sense of it. And I don't have time for
> that.

Try this page. This will at least give you a start:

http://msdn2.microsoft.com/en-us/library/system.io.streamreader_members.aspx

Here's an excerpt from the "Read" method (reformatted by me):

StreamReader.Read () -- Reads the next character from the input stream 
and advances the character position by one character.

StreamReader.Read( Char[], Int32, Int32 ) -- Reads a maximum of count 
characters from the current stream into buffer, beginning at index.
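[A rough Python analogue, in modern terms, of the second Read overload:
filling part of an existing buffer starting at a given offset, rather
than allocating a new object per call.]

```python
import io

buf = bytearray(16)                          # preallocated destination
stream = io.BytesIO(b'hello world')
n = stream.readinto(memoryview(buf)[4:9])   # at most 5 bytes, at offset 4
```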

>> Hmmm, apparently the .Net documentation *does* use the term 'layer' to
>> describe one stream wrapping another - which I still find strange. To my
>> mind, the term 'layer' can either describe a particular design stratum
>> within an architecture - such as the 'device layer' of an operating
>> system - or it can describe a portion of a document, such as a drawing
>> layer in a CAD program.
> 
> It's used whenever you could draw a diagram of several layers of
> software sitting on top of each other. Perhaps usually layers are
> bigger (like device layers) but I see nothing wrong with declaring
> that Python I/O consists of three layers.
> 
>> I don't normally think of a single instance of a
>> class wrapping another instance as constituting a "layer" - I usually
>> use the term "adapter" or "proxy" to describe that case.
>>
>> (OK, so I'm pedantic about naming. Now you know why one of my side
>> projects is writing an online programmer's thesaurus -- using
>> Python/TurboGears of course!)
> 
> Wouldn't it make more sense to contribute to wikipedia at this point?

Off topic :)

Seriously, though, what I am doing is very different from Wikipedia, and 
much more like WordNet - that is, I have a database that represents 
semantic relations between words, and an AJAX GUI that allows editing of 
those relationships. Mostly it works, but I still need a way for people 
to create accounts.

(Source browsable at http://www.viridia.org/hg/ if interested.)

>> >> Also, I notice that this proposal removes what I consider to be a nice
>> >> feature of Python, which is that you can take a plain file object and
>> >> iterate over the lines of the file -- it would require a separate line
>> >> buffering adapter to be created. I think I understand the reasoning
>> >> behind this - in a world with multiple text encodings, the 
>> definition of
>> >> "line" may not be so simple. However, I would assume that the 
>> "built-in"
>> >> streams would support the most basic, least-common-denominator 
>> encodings
>> >> for convenience.
>> >
>> > First time I noticed that. But perhaps it's the concept of "plain file
>> > object" that changed? My own hierarchy (which I arrived at without
>> > reading tomer's proposal) is something like this:
>> >
>> > (1) Basic level (implemented in C) -- open, close, read, write, seek,
>> > tell. Completely unbuffered, maps directly to system calls. Does
>> > binary I/O only.
>> >
>> > (2) Buffering. Implements the same API as (1) but adds buffering. This
>> > is what one normally uses for binary file I/O. It builds on (1), but
>> > can also be built on raw sockets instead. It adds an API to inquire
>> > about the amount of buffered data, a flush() method, and ways to
>> > change the buffer size.
>> >
>> > (3) Encoding and line endings. Implements a somewhat different API,
>> > for reading/writing text files; the API resembles Python 2's I/O
>> > library more. This is where readline() and next() giving the next line
>> > are implemented. It also does newline translation to/from the
>> > platform's native convention (CRLF or LF, or perhaps CR if anyone
>> > still cares about Mac OS <= 9) and Python's convention (always \n). I
>> > think I want to put these two features (encoding and line endings) in
>> > the same layer because they are both text related. Of course you can
>> > specify ASCII or Latin-1 to effectively disable the encoding part.
>> >
>> > Does this make more sense?
>>
>> I understood that much -- this is pretty much the way everyone does
>> things these days (our own custom stream library at work looks pretty
>> much like this too.)
> 
> So you have the buffering between the binary I/O and the text I/O too?

Theoretically, yes - you can plug in a buffer in-between them if you 
want. It doesn't do this by default however (our needs are somewhat 
specialized.)

>> The question I was wondering is, will the built-in 'file' function
>> return an object of level 3?
> 
> I am hoping to get rid of 'file' altogether. Instead, I want to go
> back to 'open'. Calling open() with a binary mode argument would
> return a layer 2 or layer 1 (if unbuffered) object; calling it with a
> text mode would return a layer 3 object. open() would grow additional
> keyword parameters to specify the encoding, the desired newline
> translation, and perhaps other aspects of the layering that might need
> control.
> 
> BTW in response to Alexander Belopolsky: yes, I would like to continue
> support for something like readinto() by layer 1 and maybe 2 (perhaps
> even more flexible, e.g. specifying a buffer and optional start and
> end indices). I don't think it makes sense for layer 3 since strings
> are immutable. I agree with Martin von Loewis that a readv() style API
> would be impractical (and I note that Alexander doesn't provide any
> use case beyond "it's more efficient").

Note that the .Net API in the example above supports this.

> A use case that I do think is important is reading encoded text data
> asynchronously from a socket. This might mean that layers 2 and 3 may
> have to be aware of the asynchronous (non-blocking or timeout-driven)
> nature of the I/O; reading from layer 3 should give as many characters
> as possible without blocking for I/O more than the specified timeout.
> We should also decide how asynchronous I/O calls report "no more data"
> -- exceptions are inefficient and cause clumsy code, but if we return
> "", how can we tell that apart from EOF? Perhaps we can use None to
> indicate "no more data available without blocking", continuing "" to
> indicate EOF. (The other way around makes just as much sense but would
> be a bigger break with Python's past than this particular issue is
> worth to me.)
> 

From ncoghlan at gmail.com  Mon Aug 21 12:03:46 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 21 Aug 2006 20:03:46 +1000
Subject: [Python-3000] int-long unification
In-Reply-To: <44E88E1F.6010607@v.loewis.de>
References: <ca471dc20608190809x76320b6ctff62cc44f30574ec@mail.gmail.com>
	<44E88E1F.6010607@v.loewis.de>
Message-ID: <44E98502.5000203@gmail.com>

Martin v. Löwis wrote:
> Guido van Rossum schrieb:
>> Are you interested in doing this at the Google sprint next week?
> 
> Sure; I hadn't any special plans so far.
> 
>> What do you think?
> 
> Sounds good. There are two problems I see:
> 
> - how to benchmark?
> 
> - there are subtle details in the API that require changes
>   to extension code. In particular, PyInt_AsLong currently
>   cannot fail, but can fail with a range error after the
>   unification.

PyInt_AsLong can already fail with OverflowError - pass it a PyLong object and 
it will try to convert it using the nb_int slot and PyLong_AsLong.

PyInt_AsLong is actually somewhat misnamed - it is really PyNumber_AsLong, 
since it accepts arbitrary objects and coerces them to integers via __int__, 
instead of just accepting PyInt instances.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From qrczak at knm.org.pl  Mon Aug 21 13:11:12 2006
From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk)
Date: Mon, 21 Aug 2006 13:11:12 +0200
Subject: [Python-3000] int-long unification
In-Reply-To: <44E98502.5000203@gmail.com> (Nick Coghlan's message of "Mon,
	21 Aug 2006 20:03:46 +1000")
References: <ca471dc20608190809x76320b6ctff62cc44f30574ec@mail.gmail.com>
	<44E88E1F.6010607@v.loewis.de> <44E98502.5000203@gmail.com>
Message-ID: <8764gmpcmn.fsf@qrnik.zagroda>

Nick Coghlan <ncoghlan at gmail.com> writes:

> PyInt_AsLong can already fail with OverflowError

> it accepts arbitrary objects and coerces them to integers via
> __int__, instead of just accepting PyInt instances.

If it calls __int__, it can fail with any exception resulting from
user code.

Grepping sources (2.4.2) reveals that usages are split into 4 groups:

1. Calling PyInt_AsLong only after PyInt_Check succeeds.

2. Handling the case when PyInt_AsLong returns -1 and PyErr_Occurred(),
   or just when PyErr_Occurred().

3. Doing both (e.g. Modules/mmapmodule.c). The test is superfluous
   but harmless.

4. Doing neither (e.g. Modules/parsermodule.c, Modules/posixmodule.c,
   Modules/selectmodule.c and possibly more). This is potentially buggy.

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/

From krstic at solarsail.hcs.harvard.edu  Mon Aug 21 13:16:03 2006
From: krstic at solarsail.hcs.harvard.edu (=?UTF-8?B?SXZhbiBLcnN0acSH?=)
Date: Mon, 21 Aug 2006 07:16:03 -0400
Subject: [Python-3000] Google Sprint Ideas
In-Reply-To: <loom.20060821T051839-686@post.gmane.org>
References: <ca471dc20608200910s34ef36f6ueb694af487bfbfa4@mail.gmail.com>	<79990c6b0608201352i74e5def4t16e944db7de22768@mail.gmail.com>	<ca471dc20608201627n23d4b98m59dbca1c561121e4@mail.gmail.com>	<44E8F350.8070509@acm.org>	<ca471dc20608201806x2356cd42i75112ca3850bab01@mail.gmail.com>
	<loom.20060821T051839-686@post.gmane.org>
Message-ID: <44E995F3.3090208@solarsail.hcs.harvard.edu>

Alexander Belopolsky wrote:
> The new API does not seem to provide for a way to read
> data directly into an existing object without creating
> an intermediate bytes object.

This is among the several things that Itamar Shtull-Trauring mentioned
during his PyCon 2005 talk on 'Fast Networking with Python':

 http://ln-s.net/D+u

While they do not affect the new I/O stack design directly, the other
ways Itamar lists for improving Python's network efficiency (deep
support for buffers, non-copying split(), array.array extensions,
etc.) are things we should probably discuss here.

-- 
Ivan Krstić <krstic at solarsail.hcs.harvard.edu> | GPG: 0x147C722D

From g.brandl at gmx.net  Mon Aug 21 21:07:44 2006
From: g.brandl at gmx.net (Georg Brandl)
Date: Mon, 21 Aug 2006 21:07:44 +0200
Subject: [Python-3000] [PythonInfo Wiki] Update of "GoogleSprintPy3k" by
	65.57.245.11
In-Reply-To: <20060821191023.31522.47467@ximinez.python.org>
References: <20060821191023.31522.47467@ximinez.python.org>
Message-ID: <ecd0a0$eli$1@sea.gmane.org>

python.org Webmaster wrote:
> Dear Wiki user,
> 
> You have subscribed to a wiki page or wiki category on "PythonInfo Wiki" for change notification.
> 
> The following page has been changed by 65.57.245.11:
> http://wiki.python.org/moin/GoogleSprintPy3k
> 
> ------------------------------------------------------------------------------
>   
>    * See PEP PEP:3100 for more ideas
>   
> -  * Make zip() an iterator (like itertools.zip())
> +  * Make zip() an iterator (like itertools.izip())
> + 
> +  * Make map() and filter() iterators and make them stop at the end of the shortest input (like zip()) instead of at the end of the longest input

May I suggest an additional keyword(-only?) argument to get the old behavior,
stopping at the end of the longest input?

Georg


From collinw at gmail.com  Mon Aug 21 21:12:02 2006
From: collinw at gmail.com (Collin Winter)
Date: Mon, 21 Aug 2006 14:12:02 -0500
Subject: [Python-3000] [PythonInfo Wiki] Update of "GoogleSprintPy3k" by
	65.57.245.11
In-Reply-To: <ecd0a0$eli$1@sea.gmane.org>
References: <20060821191023.31522.47467@ximinez.python.org>
	<ecd0a0$eli$1@sea.gmane.org>
Message-ID: <43aa6ff70608211212o271f4c7bxca5108107931e077@mail.gmail.com>

On 8/21/06, Georg Brandl <g.brandl at gmx.net> wrote:
> python.org Webmaster wrote:
> > -  * Make zip() an iterator (like itertools.zip())
> > +  * Make zip() an iterator (like itertools.izip())
> > +
> > +  * Make map() and filter() iterators and make them stop at the end of the shortest input (like zip()) instead of at the end of the longest input
>
> May I suggest an additional keyword(-only?) argument to get the old behavior,
> stopping at the end of the longest input?

I thought map() and filter() were going away in Py3k? Did that change?

Collin Winter

From guido at python.org  Mon Aug 21 21:14:54 2006
From: guido at python.org (Guido van Rossum)
Date: Mon, 21 Aug 2006 12:14:54 -0700
Subject: [Python-3000] [PythonInfo Wiki] Update of "GoogleSprintPy3k" by
	65.57.245.11
In-Reply-To: <ecd0a0$eli$1@sea.gmane.org>
References: <20060821191023.31522.47467@ximinez.python.org>
	<ecd0a0$eli$1@sea.gmane.org>
Message-ID: <ca471dc20608211214j49fdb7b5ta4aaa845785c7a77@mail.gmail.com>

On 8/21/06, Georg Brandl <g.brandl at gmx.net> wrote:
> > +  * Make map() and filter() iterators and make them stop at the end of the shortest input (like zip()) instead of at the end of the longest input
>
> May I suggest an additional keyword(-only?) argument to get the old behavior,
> stopping at the end of the longest input?

I'd rather not. Why, apart from backwards compatibility?

I'd like map(f, a, b) to be the same as (f(*x) for x in zip(a, b))
so we have to explain less. (And I think even map(f, *args) === (f(*x)
for x in zip(*args)).)

The right way to write code that works in 2.6 and 3.0 is to only use
inputs of the same length.

Perhaps there could be (or is there already?) a helper in itertools
that iterates over multiple iterables padding the shorter inputs with
None to the length of the longest one.
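Both the equivalence and the padding helper Guido asks about can be sketched concretely. `pad_zip` is a hypothetical name used purely for illustration here; itertools shipped no such helper at the time (the idea is what later grew into a longest-input zip):

```python
def pad_zip(*iterables, fillvalue=None):
    """Hypothetical helper: like zip(), but pad shorter inputs with
    fillvalue until the longest input is exhausted."""
    iterators = [iter(it) for it in iterables]
    sentinel = object()
    while True:
        exhausted = 0
        row = []
        for it in iterators:
            value = next(it, sentinel)
            if value is sentinel:
                exhausted += 1
                value = fillvalue
            row.append(value)
        if exhausted == len(iterators):
            return
        yield tuple(row)

# The identity Guido wants map() to satisfy, in modern spelling:
a, b = [2, 3, 4], [1, 2, 3]
assert list(map(pow, a, b)) == [pow(*x) for x in zip(a, b)]

# Padding the shorter input instead of truncating at it:
assert list(pad_zip([1, 2, 3], "x")) == [(1, "x"), (2, None), (3, None)]
```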

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Mon Aug 21 21:16:21 2006
From: guido at python.org (Guido van Rossum)
Date: Mon, 21 Aug 2006 12:16:21 -0700
Subject: [Python-3000] [PythonInfo Wiki] Update of "GoogleSprintPy3k" by
	65.57.245.11
In-Reply-To: <43aa6ff70608211212o271f4c7bxca5108107931e077@mail.gmail.com>
References: <20060821191023.31522.47467@ximinez.python.org>
	<ecd0a0$eli$1@sea.gmane.org>
	<43aa6ff70608211212o271f4c7bxca5108107931e077@mail.gmail.com>
Message-ID: <ca471dc20608211216r6090aa2ei2b60d0188f5d71c2@mail.gmail.com>

On 8/21/06, Collin Winter <collinw at gmail.com> wrote:
> I thought map() and filter() were going away in Py3k? Did that change?

I still find them useful when using a built-in function, and unlike
reduce(), I have no trouble reading and understanding such code.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From collinw at gmail.com  Mon Aug 21 21:20:15 2006
From: collinw at gmail.com (Collin Winter)
Date: Mon, 21 Aug 2006 14:20:15 -0500
Subject: [Python-3000] [PythonInfo Wiki] Update of "GoogleSprintPy3k" by
	65.57.245.11
In-Reply-To: <ca471dc20608211216r6090aa2ei2b60d0188f5d71c2@mail.gmail.com>
References: <20060821191023.31522.47467@ximinez.python.org>
	<ecd0a0$eli$1@sea.gmane.org>
	<43aa6ff70608211212o271f4c7bxca5108107931e077@mail.gmail.com>
	<ca471dc20608211216r6090aa2ei2b60d0188f5d71c2@mail.gmail.com>
Message-ID: <43aa6ff70608211220i28bc20a5r4d5fe3b66740873d@mail.gmail.com>

On 8/21/06, Guido van Rossum <guido at python.org> wrote:
> On 8/21/06, Collin Winter <collinw at gmail.com> wrote:
> > I thought map() and filter() were going away in Py3k? Did that change?
>
> I still find them useful when using a built-in function, and unlike
> reduce(), I have no trouble reading and understanding such code.

You might want to remove them from PEP 3100, as it still lists them
under "To be removed".

From guido at python.org  Mon Aug 21 21:21:20 2006
From: guido at python.org (Guido van Rossum)
Date: Mon, 21 Aug 2006 12:21:20 -0700
Subject: [Python-3000] [PythonInfo Wiki] Update of "GoogleSprintPy3k" by
	65.57.245.11
In-Reply-To: <43aa6ff70608211220i28bc20a5r4d5fe3b66740873d@mail.gmail.com>
References: <20060821191023.31522.47467@ximinez.python.org>
	<ecd0a0$eli$1@sea.gmane.org>
	<43aa6ff70608211212o271f4c7bxca5108107931e077@mail.gmail.com>
	<ca471dc20608211216r6090aa2ei2b60d0188f5d71c2@mail.gmail.com>
	<43aa6ff70608211220i28bc20a5r4d5fe3b66740873d@mail.gmail.com>
Message-ID: <ca471dc20608211221m241dbe5k6f7661da024001ff@mail.gmail.com>

On 8/21/06, Collin Winter <collinw at gmail.com> wrote:
> On 8/21/06, Guido van Rossum <guido at python.org> wrote:
> > On 8/21/06, Collin Winter <collinw at gmail.com> wrote:
> > > I thought map() and filter() were going away in Py3k? Did that change?
> >
> > I still find them useful when using a built-in function, and unlike
> > reduce(), I have no trouble reading and understanding such code.
>
> You might want to remove them from PEP 3100, as it still lists them
> under "To be removed".

With three question marks. :-)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From fredrik.johansson at gmail.com  Mon Aug 21 21:28:30 2006
From: fredrik.johansson at gmail.com (Fredrik Johansson)
Date: Mon, 21 Aug 2006 21:28:30 +0200
Subject: [Python-3000] [PythonInfo Wiki] Update of "GoogleSprintPy3k" by
	65.57.245.11
In-Reply-To: <ca471dc20608211214j49fdb7b5ta4aaa845785c7a77@mail.gmail.com>
References: <20060821191023.31522.47467@ximinez.python.org>
	<ecd0a0$eli$1@sea.gmane.org>
	<ca471dc20608211214j49fdb7b5ta4aaa845785c7a77@mail.gmail.com>
Message-ID: <3d0cebfb0608211228i2369dc8dq431d8c94216b8d60@mail.gmail.com>

On 8/21/06, Guido van Rossum <guido at python.org> wrote:
> Perhaps there could be (or is there already?) a helper in itertools
> that iterates over multiple iterables padding the shorter inputs with
> None to the length of the longest one.

I think the most convenient solution would be to handle this with a
keyword argument to zip(), i.e., zip(a, b, pad=True).

Fredrik Johansson

From guido at python.org  Mon Aug 21 21:53:23 2006
From: guido at python.org (Guido van Rossum)
Date: Mon, 21 Aug 2006 12:53:23 -0700
Subject: [Python-3000] [PythonInfo Wiki] Update of "GoogleSprintPy3k" by
	65.57.245.11
In-Reply-To: <3d0cebfb0608211228i2369dc8dq431d8c94216b8d60@mail.gmail.com>
References: <20060821191023.31522.47467@ximinez.python.org>
	<ecd0a0$eli$1@sea.gmane.org>
	<ca471dc20608211214j49fdb7b5ta4aaa845785c7a77@mail.gmail.com>
	<3d0cebfb0608211228i2369dc8dq431d8c94216b8d60@mail.gmail.com>
Message-ID: <ca471dc20608211253tb131a78v7cce01180be88d11@mail.gmail.com>

On 8/21/06, Fredrik Johansson <fredrik.johansson at gmail.com> wrote:
> On 8/21/06, Guido van Rossum <guido at python.org> wrote:
> > Perhaps there could be (or is there already?) a helper in itertools
> > that iterates over multiple iterables padding the shorter inputs with
> > None to the length of the longest one.
>
> I think the most convenient solution would be to handle this with a
> keyword argument to zip(), i.e., zip(a, b, pad=True).

First you'll have to show me a real use case where this behavior is
actually needed.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at v.loewis.de  Mon Aug 21 21:57:26 2006
From: martin at v.loewis.de (martin at v.loewis.de)
Date: Mon, 21 Aug 2006 21:57:26 +0200
Subject: [Python-3000] Google Sprint Ideas
In-Reply-To: <C41AE1E8-F6F6-492E-A1DD-AEA6ED3A0E86@local>
References: <ca471dc20608200910s34ef36f6ueb694af487bfbfa4@mail.gmail.com>
	<79990c6b0608201352i74e5def4t16e944db7de22768@mail.gmail.com>
	<ca471dc20608201627n23d4b98m59dbca1c561121e4@mail.gmail.com>
	<44E8F350.8070509@acm.org>
	<ca471dc20608201806x2356cd42i75112ca3850bab01@mail.gmail.com>
	<loom.20060821T051839-686@post.gmane.org>
	<44E92FFC.9080407@v.loewis.de>
	<C41AE1E8-F6F6-492E-A1DD-AEA6ED3A0E86@local>
Message-ID: <1156190246.44ea1026ae2a3@www.domainfactory-webmail.de>

Zitat von Alexander Belopolsky <alexander.belopolsky at gmail.com>:

> > Alexander Belopolsky schrieb:
> >> For Py3k, I would like to suggest a buffer protocol modelled
> >> after iovec structure that is used by the readv system call.
> >
> > -1
>
> What is this -1 for:
>
> a) buffer protocol in Py3k?
> b) multisegment buffer protocol?
> c) readinto that supports multisegment buffers?

b and c; I don't have an opinion on a.

> I did not suggest to expose anything in Python.  AFAIK, the buffer
> protocol is a C API only.

Ah; now that the IO library will likely be 100% pure Python,
this needs thought.

Regards,
Martin





From fredrik.johansson at gmail.com  Mon Aug 21 22:35:37 2006
From: fredrik.johansson at gmail.com (Fredrik Johansson)
Date: Mon, 21 Aug 2006 22:35:37 +0200
Subject: [Python-3000] [PythonInfo Wiki] Update of "GoogleSprintPy3k" by
	65.57.245.11
In-Reply-To: <ca471dc20608211253tb131a78v7cce01180be88d11@mail.gmail.com>
References: <20060821191023.31522.47467@ximinez.python.org>
	<ecd0a0$eli$1@sea.gmane.org>
	<ca471dc20608211214j49fdb7b5ta4aaa845785c7a77@mail.gmail.com>
	<3d0cebfb0608211228i2369dc8dq431d8c94216b8d60@mail.gmail.com>
	<ca471dc20608211253tb131a78v7cce01180be88d11@mail.gmail.com>
Message-ID: <3d0cebfb0608211335h38ddfc87hc582e086e3b03f93@mail.gmail.com>

On 8/21/06, Guido van Rossum <guido at python.org> wrote:
> On 8/21/06, Fredrik Johansson <fredrik.johansson at gmail.com> wrote:
> > On 8/21/06, Guido van Rossum <guido at python.org> wrote:
> > > Perhaps there could be (or is there already?) a helper in itertools
> > > that iterates over multiple iterables padding the shorter inputs with
> > > None to the length of the longest one.
> >
> > I think the most convenient solution would be to handle this with a
> > keyword argument to zip(), i.e., zip(a, b, pad=True).
>
> First you'll have to show me a real use case where this behavior is
> actually needed.

I didn't suggest that this feature is needed. But if it is, extending
zip() to handle both cases hardly seems to add more cruft to the
language than adding a whole new function (stuffed away in a library
where not even the language's creator remembers whether it exists :-).

Fredrik Johansson

From guido at python.org  Mon Aug 21 22:40:47 2006
From: guido at python.org (Guido van Rossum)
Date: Mon, 21 Aug 2006 13:40:47 -0700
Subject: [Python-3000] [PythonInfo Wiki] Update of "GoogleSprintPy3k" by
	65.57.245.11
In-Reply-To: <3d0cebfb0608211335h38ddfc87hc582e086e3b03f93@mail.gmail.com>
References: <20060821191023.31522.47467@ximinez.python.org>
	<ecd0a0$eli$1@sea.gmane.org>
	<ca471dc20608211214j49fdb7b5ta4aaa845785c7a77@mail.gmail.com>
	<3d0cebfb0608211228i2369dc8dq431d8c94216b8d60@mail.gmail.com>
	<ca471dc20608211253tb131a78v7cce01180be88d11@mail.gmail.com>
	<3d0cebfb0608211335h38ddfc87hc582e086e3b03f93@mail.gmail.com>
Message-ID: <ca471dc20608211340w5ca9597bs6b8308a8a0a74695@mail.gmail.com>

On 8/21/06, Fredrik Johansson <fredrik.johansson at gmail.com> wrote:
> On 8/21/06, Guido van Rossum <guido at python.org> wrote:
> > On 8/21/06, Fredrik Johansson <fredrik.johansson at gmail.com> wrote:
> > > On 8/21/06, Guido van Rossum <guido at python.org> wrote:
> > > > Perhaps there could be (or is there already?) a helper in itertools
> > > > that iterates over multiple iterables padding the shorter inputs with
> > > > None to the length of the longest one.
> > >
> > > I think the most convenient solution would be to handle this with a
> > > keyword argument to zip(), i.e., zip(a, b, pad=True).
> >
> > First you'll have to show me a real use case where this behavior is
> > actually needed.
>
> I didn't suggest that this feature is needed. But if it is, extending
> zip() to handle both cases hardly seems to add more cruft to the
> language than adding a whole new function (stuffed away in a library
> where not even the language's creator remembers whether it exists :-).

I beg to disagree. In general I don't like flag arguments that modify
the behavior of a call, when in practice the flag value passed will
nearly always be a constant. That's why we have e.g. find() and
rfind(), not find(..., fromright=False).

Also, I'd like to call YAGNI (and stop wasting everybody's time)
unless a good use case is brought up.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jcarlson at uci.edu  Mon Aug 21 23:21:30 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Mon, 21 Aug 2006 14:21:30 -0700
Subject: [Python-3000] [Python-Dev] What should the focus for 2.6 be?
In-Reply-To: <44E950B2.4060305@acm.org>
References: <ca471dc20608200824l2129308ne26e5332ce7585ce@mail.gmail.com>
	<44E950B2.4060305@acm.org>
Message-ID: <20060821081944.1A0F.JCARLSON@uci.edu>


Talin <talin at acm.org> wrote:
[snip]
> I've been thinking about the transition to unicode strings, and I want 
> to put forward a notion that might allow the transition to be done 
> gradually instead of all at once.
> 
> The idea would be to temporarily introduce a new name for 8-bit strings 
> - let's call it "ascii". An "ascii" object would be exactly the same as 
> today's 8-bit strings.

There are two parts to the unicode conversion; all literals are unicode,
and we don't have strings anymore, we have bytes.  Without offering the
bytes object, then people can't really convert their code.  String
literals can be handled with the -U command line option (and perhaps
having the interpreter do the str=unicode assignment during startup).


In any case, as I look at Py3k and the future of Python, in each release,
I ask "what are the compelling features that make me want to upgrade?"
In each of the 1.5-2.5 releases that I've looked at, there has been some
compelling feature or another that has basically required that I upgrade,
or seriously consider upgrading (bugfixes for stuff that has bitten me,
new syntax that I use, significant increases in speed, etc.).

As we approach Py3k, I again ask, "what are the compelling features?"
Wholesale breakage of anything that uses ascii strings as text or binary
data? A completely changed IO stack (requiring re-learning of everything
known about Python IO)?  Dictionary .keys(), .values(), and .items()
being their .iter*() equivalents (making it just about impossible to
optimize for Py3k dictionary behavior now)?

I understand getting rid of the cruft, really I do (you should see some
cruft I've been replacing lately). But some of that cruft is useful, or
really, some of that cruft has no alternative currently, which will
require significant rewrites of user code when Py3k is released.  When
everyone has to rewrite their code, they are going to ask, "Why don't I
just stick with the maintenance 2.x? It's going to be maintained for a
few more years yet, and I don't need to rewrite all of my disk IO,
strings in dictionary code, etc."  I will be right along with them (no
offense intended to those currently working towards py3k).

I can code defensively against buffer-saturating DOS attacks with my
socket code, but I can't code defensively to handle some (never mind all)
of the changes and incompatibilities that Py3k will bring.

Here's my suggestion: every feature, syntax, etc., that is slated for
Py3k, let us release bit by bit in the 2.x series.  That lets the 2.x
series evolve into the 3.x series in a somewhat more natural way than
the currently proposed *everything breaks*.  If it takes 1, 2, 3, or 10
more releases in the 2.x series to get to all of the 3.x features, great.
At least people will have a chance to convert, or at least write correct
code for the future.

Say 2.6 gets bytes and special factories (or a special encoding argument)
for file/socket to return bytes instead of strings, and only accept
bytes objects to .write() methods (unless an encoding on the file, etc.,
was previously given). Given these bytes objects, it may even make sense
to offer the .readinto() method that Alex B has been asking for (which
would make 3 built-in objects that could reasonably support readinto:
bytes, array, mmap).

If the IO library is available for 2.6, toss that in there, or offer it
in PyPI as an evolving library.

I would suggest pushing off the dict changes until 2.7 or later, as
there are 340+ examples of dict.keys() in the Python 2.5b2 standard
library, at least half of which are going to need to be changed to
list(dict.keys()) or otherwise.  The breakage in user code will likely
be at least as substantial.
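The rewrite in question is mechanical but pervasive. Under the proposed semantics keys() returns an iterator/view rather than a list, so any code that sorts or indexes the result needs an explicit list() call; that spelling also happens to work unchanged in 2.x:

```python
d = {"b": 2, "a": 1}

# 2.x idiom: d.keys() is a list, so keys = d.keys(); keys.sort() works.
# The portable spelling wraps it in list() first:
keys = list(d.keys())
keys.sort()
assert keys == ["a", "b"]

# values() and items() need the same treatment wherever a real
# list is required:
assert sorted(d.items()) == [("a", 1), ("b", 2)]
```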


Those are just examples that come to mind now, but I'm sure there are
others changes with similar issues.

 - Josiah


From exarkun at divmod.com  Mon Aug 21 23:38:17 2006
From: exarkun at divmod.com (Jean-Paul Calderone)
Date: Mon, 21 Aug 2006 17:38:17 -0400
Subject: [Python-3000] [Python-Dev] What should the focus for 2.6 be?
In-Reply-To: <20060821081944.1A0F.JCARLSON@uci.edu>
Message-ID: <20060821213817.1717.1725966885.divmod.quotient.28023@ohm>

On Mon, 21 Aug 2006 14:21:30 -0700, Josiah Carlson <jcarlson at uci.edu> wrote:
>
>Talin <talin at acm.org> wrote:
>[snip]
>> I've been thinking about the transition to unicode strings, and I want
>> to put forward a notion that might allow the transition to be done
>> gradually instead of all at once.
>>
>> The idea would be to temporarily introduce a new name for 8-bit strings
>> - let's call it "ascii". An "ascii" object would be exactly the same as
>> today's 8-bit strings.
>
>There are two parts to the unicode conversion; all literals are unicode,
>and we don't have strings anymore, we have bytes.  Without offering the
>bytes object, then people can't really convert their code.  String
>literals can be handled with the -U command line option (and perhaps
>having the interpreter do the str=unicode assignment during startup).
>

A third step would ease this transition significantly: a unicode_literals __future__ import.

>
>Here's my suggestion: every feature, syntax, etc., that is slated for
>Py3k, let us release bit by bit in the 2.x series.  That lets the 2.x
>series evolve into the 3.x series in a somewhat more natural way than
>the currently proposed *everything breaks*.  If it takes 1, 2, 3, or 10
>more releases in the 2.x series to get to all of the 3.x features, great.
>At least people will have a chance to convert, or at least write correct
>code for the future.

This really seems like the right idea.  "Shoot the moon" upgrades are
almost always worse than incremental upgrades.

The incremental path is better for everyone involved.  For developers of
Python, it gets more people using and providing feedback on the new
features being developed.  For developers with Python, it keeps the scope
of a particular upgrade more manageable, letting them focus on a
much smaller set of changes to be made to their application.

Jean-Paul

From guido at python.org  Tue Aug 22 02:36:41 2006
From: guido at python.org (Guido van Rossum)
Date: Mon, 21 Aug 2006 17:36:41 -0700
Subject: [Python-3000] [Python-Dev] What should the focus for 2.6 be?
In-Reply-To: <20060821081944.1A0F.JCARLSON@uci.edu>
References: <ca471dc20608200824l2129308ne26e5332ce7585ce@mail.gmail.com>
	<44E950B2.4060305@acm.org> <20060821081944.1A0F.JCARLSON@uci.edu>
Message-ID: <ca471dc20608211736h5f8903cctc92c60c5bd6e538e@mail.gmail.com>

On 8/21/06, Josiah Carlson <jcarlson at uci.edu> wrote:
> As we approach Py3k, I again ask, "what are the compelling features?"
> Wholesale breakage of anything that uses ascii strings as text or binary
> data? A completely changed IO stack (requiring re-learning of everything
> known about Python IO)?  Dictionary .keys(), .values(), and .items()
> being their .iter*() equivalents (making it just about impossible to
> optimize for Py3k dictionary behavior now)?

I guess py3k is not for you yet. That's a totally defensible point of
view, and that's why there will be Python 2.6, 2.7, 2.8 and 2.9
(probably) which will gradually close the gap, after which you will
have the choice of maintaining 2.9 yourself or making the switch. :-)

> I understand getting rid of the cruft, really I do (you should see some
> cruft I've been replacing lately). But some of that cruft is useful, or
> really, some of that cruft has no alternative currently, which will
> require significant rewrites of user code when Py3k is released.  When
> everyone has to rewrite their code, they are going to ask, "Why don't I
> just stick with the maintenance 2.x? It's going to be maintained for a
> few more years yet, and I don't need to rewrite all of my disk IO,
> strings in dictionary code, etc."  I will be right along with them (no
> offense intended to those currently working towards py3k).

And yet offense is taken. Have you watched the video of my Py3k talk?
Search for it on Google Video.

> I can code defensively against buffer-saturating DOS attacks with my
> socket code, but I can't code defensively to handle some (never mind all)
> of the changes and incompatibilities that Py3k will bring.

And that's why there will be conversion tools and aids.

> Here's my suggestion: every feature, syntax, etc., that is slated for
> Py3k, let us release bit by bit in the 2.x series.  That lets the 2.x
> series evolve into the 3.x series in a somewhat more natural way than
> the currently proposed *everything breaks*.  If it takes 1, 2, 3, or 10
> more releases in the 2.x series to get to all of the 3.x features, great.
> At least people will have a chance to convert, or at least write correct
> code for the future.

That will happen, whenever possible. For other features it is infeasible.

> Say 2.6 gets bytes and special factories (or a special encoding argument)
> for file/socket to return bytes instead of strings, and only accept
> bytes objects to .write() methods (unless an encoding on the file, etc.,
> was previously given). Given these bytes objects, it may even make sense
> to offer the .readinto() method that Alex B has been asking for (which
> would make 3 built-in objects that could reasonably support readinto:
> bytes, array, mmap).
>
> If the IO library is available for 2.6, toss that in there, or offer it
> in PyPI as an evolving library.

Could do.

> I would suggest pushing off the dict changes until 2.7 or later, as
> there are 340+ examples of dict.keys() in the Python 2.5b2 standard
> library, at least half of which are going to need to be changed to
> list(dict.keys()) or otherwise.  The breakage in user code will likely
> be at least as substantial.

Perhaps you want to help write the transition PEP?

> Those are just examples that come to mind now, but I'm sure there are
> others changes with similar issues.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From murman at gmail.com  Tue Aug 22 05:07:04 2006
From: murman at gmail.com (Michael Urman)
Date: Mon, 21 Aug 2006 22:07:04 -0500
Subject: [Python-3000] [PythonInfo Wiki] Update of "GoogleSprintPy3k" by
	65.57.245.11
In-Reply-To: <ca471dc20608211214j49fdb7b5ta4aaa845785c7a77@mail.gmail.com>
References: <20060821191023.31522.47467@ximinez.python.org>
	<ecd0a0$eli$1@sea.gmane.org>
	<ca471dc20608211214j49fdb7b5ta4aaa845785c7a77@mail.gmail.com>
Message-ID: <dcbbbb410608212007l7710c284wc98e91b903b4051e@mail.gmail.com>

On 8/21/06, Guido van Rossum <guido at python.org> wrote:
> I'd like map(f, a, b) to be the same as (f(*x) for x in zip(a, b))
> so we have to explain less. (And I think even map(f, *args) === (f(*x)
> for x in zip(*args)).)

Should map(None, a, b) == zip(a, b), leaving python with multiple ways
to do one thing? Or should the surprising but useful map(None, ...)
behavior disappear or become even more surprising by padding? Is there
any reason at all for map to take multiple sequences now that we have
starmap and (i)zip?
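The overlap Michael points at can be shown directly. In modern spelling (in 2006 the iterator forms lived in itertools as imap/izip), map() over multiple sequences is exactly starmap() over zip():

```python
from itertools import starmap

a, b = [1, 2, 3], [10, 20, 30]

# map() with multiple sequences...
multi = list(map(pow, a, b))

# ...is the same as starmap() applied to the zipped pairs:
via_starmap = list(starmap(pow, zip(a, b)))

assert multi == via_starmap == [1, 1048576, 205891132094649]
```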
-- 
Michael Urman  http://www.tortall.net/mu/blog

From collinw at gmail.com  Tue Aug 22 05:16:58 2006
From: collinw at gmail.com (Collin Winter)
Date: Mon, 21 Aug 2006 22:16:58 -0500
Subject: [Python-3000] [PythonInfo Wiki] Update of "GoogleSprintPy3k" by
	65.57.245.11
In-Reply-To: <dcbbbb410608212007l7710c284wc98e91b903b4051e@mail.gmail.com>
References: <20060821191023.31522.47467@ximinez.python.org>
	<ecd0a0$eli$1@sea.gmane.org>
	<ca471dc20608211214j49fdb7b5ta4aaa845785c7a77@mail.gmail.com>
	<dcbbbb410608212007l7710c284wc98e91b903b4051e@mail.gmail.com>
Message-ID: <43aa6ff70608212016m683c3b8ci9803c31858c937e7@mail.gmail.com>

On 8/21/06, Michael Urman <murman at gmail.com> wrote:
> On 8/21/06, Guido van Rossum <guido at python.org> wrote:
> > I'd like map(f, a, b) to be the same as (f(*x) for x in zip(a, b))
> > so we have to explain less. (And I think even map(f, *args) === (f(*x)
> > for x in zip(*args)).)
>
> Should map(None, a, b) == zip(a, b), leaving python with multiple ways
> to do one thing? Or should the surprising but useful map(None, ...)
> behavior disappear or become even more surprising by padding? Is there
> any reason at all for map to take multiple sequences now that we have
> starmap and (i)zip?

FWIW, I'm ambivalent as to whether map() accepts multiple sequences,
but I'm strongly in favor of map(None, ....) disappearing. Similarly,
I'd want to see filter(None, ...) go away, too; fastpathing the case
of filter(bool, ....) will achieve the same performance benefit.
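The equivalence Collin relies on is that filter(None, seq) and filter(bool, seq) both keep exactly the truthy items, so dropping the None spelling loses no expressiveness:

```python
data = [0, 1, "", "x", None, [], [3], False, True]

# Both predicates keep exactly the truthy elements:
assert list(filter(None, data)) == list(filter(bool, data)) == [1, "x", [3], True]
```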

Collin Winter

From tjreedy at udel.edu  Tue Aug 22 05:19:30 2006
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 21 Aug 2006 23:19:30 -0400
Subject: [Python-3000] [Python-Dev] What should the focus for 2.6 be?
References: <ca471dc20608200824l2129308ne26e5332ce7585ce@mail.gmail.com><44E950B2.4060305@acm.org>
	<20060821081944.1A0F.JCARLSON@uci.edu>
	<ca471dc20608211736h5f8903cctc92c60c5bd6e538e@mail.gmail.com>
Message-ID: <ecdt42$j5$1@sea.gmane.org>


"Guido van Rossum" <guido at python.org> wrote in message 
news:ca471dc20608211736h5f8903cctc92c60c5bd6e538e at mail.gmail.com...

> On 8/21/06, Josiah Carlson <jcarlson at uci.edu> wrote:
>>  When
>> everyone has to rewrite their code, they are going to ask, "Why don't I
>> just stick with the maintenance 2.x? It's going to be maintained for a
>> few more years yet, and I don't need to rewrite all of my disk IO,
>> strings in dictionary code, etc."  I will be right along with them

Many apps never will be converted, just as there are still things running 
under 1.5 and all versions since.  The changeover to writing new stuff in 
3.x will be at least somewhat gradual, as such things always are, and that 
is a good thing, lest the issue tracker be flooded with more items than can 
be dealt with.

> Have you watched the video of my Py3k talk?
> Search for it on Google Video.

Searching for "Guido Python" returns
http://video.google.com/videoplay?docid=-6459339159268485356
It pretty well summarizes the results of discussion here up to a month ago.

Terry Jan Reedy




From guido at python.org  Tue Aug 22 05:55:49 2006
From: guido at python.org (Guido van Rossum)
Date: Mon, 21 Aug 2006 20:55:49 -0700
Subject: [Python-3000] [PythonInfo Wiki] Update of "GoogleSprintPy3k" by
	65.57.245.11
In-Reply-To: <43aa6ff70608212016m683c3b8ci9803c31858c937e7@mail.gmail.com>
References: <20060821191023.31522.47467@ximinez.python.org>
	<ecd0a0$eli$1@sea.gmane.org>
	<ca471dc20608211214j49fdb7b5ta4aaa845785c7a77@mail.gmail.com>
	<dcbbbb410608212007l7710c284wc98e91b903b4051e@mail.gmail.com>
	<43aa6ff70608212016m683c3b8ci9803c31858c937e7@mail.gmail.com>
Message-ID: <ca471dc20608212055k342f917vfa78dc900abbb557@mail.gmail.com>

On 8/21/06, Collin Winter <collinw at gmail.com> wrote:
> On 8/21/06, Michael Urman <murman at gmail.com> wrote:
> > On 8/21/06, Guido van Rossum <guido at python.org> wrote:
> > > I'd like map(f, a, b) to be the same as (f(*x) for x in zip(a, b))
> > > so we have to explain less. (And I think even map(f, *args) === (f(*x)
> > > for x in zip(*args)).)
> >
> > Should map(None, a, b) == zip(a, b), leaving python with multiple ways
> > to do one thing? Or should the surprising but useful map(None, ...)
> > behavior disappear or become even more surprising by padding? Is there
> > any reason at all for map to take multiple sequences now that we have
> > starmap and (i)zip?
>
> FWIW, I'm ambivalent as to whether map() accepts multiple sequences,
> but I'm strongly in favor of map(None, ....) disappearing. Similarly,
> I'd want to see filter(None, ...) go away, too; fastpathing the case
> of filter(bool, ....) will achieve the same performance benefit.

I think map(f, a, b, ...) and filter(p, a, b, ...) should stay, but
the None cases should be gotten rid of. I don't want to move starmap()
out of itertools into builtins.

I expect that filter(bool, a) is fast enough without greasing the
tracks, but if you don't, feel free to benchmark it.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Wed Aug 23 03:32:39 2006
From: guido at python.org (Guido van Rossum)
Date: Tue, 22 Aug 2006 18:32:39 -0700
Subject: [Python-3000] Droping find/rfind?
Message-ID: <ca471dc20608221832s313d02a2gf01a5532eceedebf@mail.gmail.com>

At today's sprint, one of the volunteers completed a patch to rip out
find() and rfind(), replacing all calls with index()/rindex(). But now
I'm getting cold feet -- is this really a good idea? (It's been listed
in PEP 3100 for a long time, but I haven't thought about it much,
really.)

What do people think?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim.peters at gmail.com  Wed Aug 23 03:47:18 2006
From: tim.peters at gmail.com (Tim Peters)
Date: Tue, 22 Aug 2006 21:47:18 -0400
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <ca471dc20608221832s313d02a2gf01a5532eceedebf@mail.gmail.com>
References: <ca471dc20608221832s313d02a2gf01a5532eceedebf@mail.gmail.com>
Message-ID: <1f7befae0608221847t55c09c57r64cd65511b51f6d4@mail.gmail.com>

[Guido van Rossum]
> At today's sprint, one of the volunteers completed a patch to rip out
> find() and rfind(), replacing all calls with index()/rindex(). But now
> I'm getting cold feet -- is this really a good idea? (It's been listed
> in PEP 3100 for a long time, but I haven't thought about it much,
> really.)
>
> What do people think?

I'd rather toss index/rindex myself, although I understand that
[r]find's -1 return value for "not found" can trip up newbies.  Like I
care ;-)

If you decide to toss [r]find anyway, I'd rather see "not found" be
spelled with an exception more specific than ValueError (who knows
what all "except ValueError:" is going to catch?  /Just/ that the
substring wasn't found?  Ya, that's something to bet your life on
;-)).
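The newbie trap Tim alludes to is that -1 is itself a valid index, so a failed find() can feed a slice without raising anything:

```python
s = "hello world"

# find() signals failure with -1, which slices happily accept:
head = s[:s.find(",")]          # no comma, so find() returns -1
assert head == "hello worl"     # last character silently dropped

# index() raises instead, so the failure cannot pass unnoticed:
try:
    head = s[:s.index(",")]
except ValueError:
    head = s                    # explicit handling of "not found"
assert head == "hello world"
```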

From jcarlson at uci.edu  Wed Aug 23 04:38:47 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Tue, 22 Aug 2006 19:38:47 -0700
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <ca471dc20608221832s313d02a2gf01a5532eceedebf@mail.gmail.com>
References: <ca471dc20608221832s313d02a2gf01a5532eceedebf@mail.gmail.com>
Message-ID: <20060822191712.1A39.JCARLSON@uci.edu>


"Guido van Rossum" <guido at python.org> wrote:
> At today's sprint, one of the volunteers completed a patch to rip out
> find() and rfind(), replacing all calls with index()/rindex(). But now
> I'm getting cold feet -- is this really a good idea? (It's been listed
> in PEP 3100 for a long time, but I haven't thought about it much,
> really.)
> 
> What do people think?

I have code for Python 2.x that uses [r]find, but have been
transitioning some of it to use [r]partition instead (writing
implementations based on [r]find, but it could have just as easily used
[r]split). Ultimately I think that an unambiguous 'find without slicing'
is useful.

One of the issues with the -1 return on find failure is that it is
ambiguous, one must really check for a -1 return. Here's an API that is
non-ambiguous:
    x.search(y, start=0, stop=sys.maxint, count=sys.maxint)

Which will return a list of up to count non-overlapping examples of y in
x from start to stop.  On failure, it returns an empty list.  This
particular API is at least as powerful as the currently existing [r]find
one, is unambiguous, etc.  It also has a not accidental similarity to
x.split(y, count=sys.maxint), which has served Python for quite a while,
though this would differ in that rather than always returning a list of
at least 1, it could return an empty list.

Its functionality is somewhat mirrored by re.finditer, but the above
search function can be easily turned into rsearch, whereas re is
forward-only.
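A minimal sketch of the API Josiah describes, built on find() purely for illustration (the name and signature are his proposal, nothing that shipped; sys.maxsize stands in for 2.x's sys.maxint):

```python
import sys

def search(haystack, needle, start=0, stop=sys.maxsize, count=sys.maxsize):
    """Return up to `count` start offsets of non-overlapping occurrences
    of needle in haystack[start:stop]; an empty list means not found."""
    hits = []
    pos = start
    stop = min(stop, len(haystack))
    while len(hits) < count:
        i = haystack.find(needle, pos, stop)
        if i == -1:
            break
        hits.append(i)
        pos = i + max(len(needle), 1)   # non-overlapping; never stall
    return hits

assert search("abcabcabc", "abc") == [0, 3, 6]
assert search("aaaa", "aa") == [0, 2]            # non-overlapping
assert search("abc", "x") == []                  # unambiguous failure
assert search("abcabcabc", "abc", count=2) == [0, 3]
```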

If I were in a position to suggest a change, I would agree with Tim's
feeling that [r]index should go before [r]find, but I also think that 
[r]find could be made unambiguous; the above being an example of such,
but one that I'm not going to push for except as an example unambiguous
implementation.

 - Josiah


From jack at psynchronous.com  Wed Aug 23 06:41:48 2006
From: jack at psynchronous.com (Jack Diederich)
Date: Wed, 23 Aug 2006 00:41:48 -0400
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <ca471dc20608221832s313d02a2gf01a5532eceedebf@mail.gmail.com>
References: <ca471dc20608221832s313d02a2gf01a5532eceedebf@mail.gmail.com>
Message-ID: <20060823044148.GR5772@performancedrivers.com>

On Tue, Aug 22, 2006 at 06:32:39PM -0700, Guido van Rossum wrote:
> At today's sprint, one of the volunteers completed a patch to rip out
> find() and rfind(), replacing all calls with index()/rindex(). But now
> I'm getting cold feet -- is this really a good idea? (It's been listed
> in PEP 3100 for a long time, but I haven't thought about it much,
> really.)
> 
> What do people think?

Looking at my own code I use find() in two cases

1) in an "if" clause where "in" or startswith() would be appropriate
   This code was written when I started with python and is closer to
   C++ or perl or was a literal translation of a snippet of C++ or perl

2) where try/except around index() would work just fine and partition
   would be even better.  eg/
   try:
     parts.append(text[text.index('himom')])
   except ValueError: pass

This is 50 uses of find/rfind in 70 KLOCs of Python.  Considering I would
be better off not using find() in the places I do use it, I would be happy
to see it go.

-Jack

From g.brandl at gmx.net  Wed Aug 23 08:45:00 2006
From: g.brandl at gmx.net (Georg Brandl)
Date: Wed, 23 Aug 2006 08:45:00 +0200
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <1f7befae0608221847t55c09c57r64cd65511b51f6d4@mail.gmail.com>
References: <ca471dc20608221832s313d02a2gf01a5532eceedebf@mail.gmail.com>
	<1f7befae0608221847t55c09c57r64cd65511b51f6d4@mail.gmail.com>
Message-ID: <ecgthe$454$1@sea.gmane.org>

Tim Peters wrote:
> [Guido van Rossum]
>> At today's sprint, one of the volunteers completed a patch to rip out
>> find() and rfind(), replacing all calls with index()/rindex(). But now
>> I'm getting cold feet -- is this really a good idea? (It's been listed
>> in PEP 3100 for a long time, but I haven't thought about it much,
>> really.)
>>
>> What do people think?
> 
> I'd rather toss index/rindex myself, although I understand that
> [r]find's -1 return value for "not found" can trip up newbies.  Like I
> care ;-)

Perhaps a search() method, like Josiah proposed, makes sense.

> If you decide to toss [r]find anyway, I'd rather see "not found" be
> spelled with an exception more specific than ValueError (who knows
> what all "except ValueError:" is going to catch?  /Just/ that the
> substring wasn't found?  Ya, that's something to bet your life on
> ;-)).

Seriously, this is something I have thought of from time to time:
an exception's "source", so that you could say

try:
     x = int(some expression)
except ValueError from int:
     do something

Obviously, it's too much work to add such a thing though.
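No `except ... from ...` source-matching exists; the closest approximation in
today's Python is to re-raise with a distinguishable subclass (the
`IntParseError` / `parse_int` / `doubled_or_none` names here are invented for
illustration):

```python
class IntParseError(ValueError):
    """A ValueError known to have come from int() parsing."""

def parse_int(s):
    try:
        return int(s)
    except ValueError as e:
        raise IntParseError(str(e)) from e

def doubled_or_none(s):
    try:
        return parse_int(s) * 2
    except IntParseError:
        # Only int() parse failures land here; any other
        # ValueError raised in the try block propagates.
        return None
```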

Georg


From holmesbj.dev at gmail.com  Wed Aug 23 08:46:22 2006
From: holmesbj.dev at gmail.com (Brian Holmes)
Date: Tue, 22 Aug 2006 23:46:22 -0700
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <20060823044148.GR5772@performancedrivers.com>
References: <ca471dc20608221832s313d02a2gf01a5532eceedebf@mail.gmail.com>
	<20060823044148.GR5772@performancedrivers.com>
Message-ID: <e3c648160608222346g4587d55eiff521787ca4d915f@mail.gmail.com>

On 8/22/06, Jack Diederich <jack at psynchronous.com> wrote:
>
> On Tue, Aug 22, 2006 at 06:32:39PM -0700, Guido van Rossum wrote:
> > At today's sprint, one of the volunteers completed a patch to rip out
> > find() and rfind(), replacing all calls with index()/rindex(). But now
> > I'm getting cold feet -- is this really a good idea? (It's been listed
> > in PEP 3100 for a long time, but I haven't thought about it much,
> > really.)
> >
> > What do people think?
>
> Looking at my own code I use find() in two cases
>
> 1) in an "if" clause where "in" or startswith() would be appropriate
>    This code was written when I started with python and is closer to
>    C++ or perl or was a literal translation of a snippet of C++ or perl
>
> 2) where try/except around index() would work just fine and partition
>    would be even better.  eg/
>    try:
>      parts.append(text[text.index('himom')])
>    except ValueError: pass
>
> This is 50 uses of find/rfind in 70 KLOCs of python.  Considering I would
> be better off not using find() in the places I do use it I would be happy
> to see it go.
>
> -Jack

Even after reading Terry Reedy's arguments, I don't see why we need to
remove this option.  Let both exist.  I'd prefer grandfathering something
like this and leaving it in, even if it wouldn't be there had we known
everything from the start.

I just don't think it's worth causing people grief in porting to Py3k for
something so trivial.  I support fixing things in Py3k that are real
improvements, but this doesn't really seem like it's worth the trade-off.

- Brian
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060822/4ba3633a/attachment.html 

From holmesbj.dev at gmail.com  Wed Aug 23 08:50:53 2006
From: holmesbj.dev at gmail.com (Brian Holmes)
Date: Tue, 22 Aug 2006 23:50:53 -0700
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <20060822191712.1A39.JCARLSON@uci.edu>
References: <ca471dc20608221832s313d02a2gf01a5532eceedebf@mail.gmail.com>
	<20060822191712.1A39.JCARLSON@uci.edu>
Message-ID: <e3c648160608222350w5f01caebv368ff9adb10a8690@mail.gmail.com>

On 8/22/06, Josiah Carlson <jcarlson at uci.edu> wrote:
>
>
> "Guido van Rossum" <guido at python.org> wrote:
> > At today's sprint, one of the volunteers completed a patch to rip out
> > find() and rfind(), replacing all calls with index()/rindex(). But now
> > I'm getting cold feet -- is this really a good idea? (It's been listed
> > in PEP 3100 for a long time, but I haven't thought about it much,
> > really.)
> >
> > What do people think?


[snip]

One of the issues with the -1 return on find failure is that it is
> ambiguous, one must really check for a -1 return. Here's an API that is
> non-ambiguous:
>     x.search(y, start=0, stop=sys.maxint, count=sys.maxint)
>
> Which will return a list of up to count non-overlapping examples of y in
> x from start to stop.  On failure, it returns an empty list.  This
> particular API is at least as powerful as the currently existing [r]find
> one, is unambiguous, etc.  It also has a not accidental similarity to
> x.split(y, count=sys.maxint), which has served Python for quite a while,
> though this would differ in that rather than always returning a list of
> at least 1, it could return an empty list.
>
> Its functionality is somewhat mirrored by re.finditer, but the above
> search function can be easily turned into rsearch, whereas re is
> forward-only.


[snip]

- Josiah

+1

I think that would make a great addition to Py3k, or even 2.6.

- Brian
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060822/051f8950/attachment.htm 

From greg.ewing at canterbury.ac.nz  Wed Aug 23 09:35:00 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 23 Aug 2006 19:35:00 +1200
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <20060822191712.1A39.JCARLSON@uci.edu>
References: <ca471dc20608221832s313d02a2gf01a5532eceedebf@mail.gmail.com>
	<20060822191712.1A39.JCARLSON@uci.edu>
Message-ID: <44EC0524.2060206@canterbury.ac.nz>

Josiah Carlson wrote:

> One of the issues with the -1 return on find failure is that it is
> ambiguous, one must really check for a -1 return. Here's an API that is
> non-ambiguous:

An alternative would be to return None for not found.
It wouldn't solve the problem of people using the
return value as a boolean, but at least you'd get
an exception if you tried to use the not-found value
as an index.

Or maybe it could return index values as a special
int subclass that always tests true even when it's
zero...
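That int subclass is easy to sketch (the `FoundIndex` / `find_true` names are
invented; `__bool__` is the Python 3 spelling of 2.x's `__nonzero__`):

```python
class FoundIndex(int):
    """An int that is always true, so a match at index 0 isn't
    mistaken for 'not found' in a boolean test."""
    def __bool__(self):
        return True

def find_true(text, sub):
    """find() variant: FoundIndex on success, None on failure."""
    i = text.find(sub)
    return FoundIndex(i) if i != -1 else None
```

With this, `if find_true(s, sub):` behaves correctly even when the match is at
index 0, and using the None "not found" value as an index raises immediately.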

--
Greg

From jjl at pobox.com  Wed Aug 23 13:04:56 2006
From: jjl at pobox.com (John J Lee)
Date: Wed, 23 Aug 2006 12:04:56 +0100 (GMT Standard Time)
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <44EC0524.2060206@canterbury.ac.nz>
References: <ca471dc20608221832s313d02a2gf01a5532eceedebf@mail.gmail.com>
	<20060822191712.1A39.JCARLSON@uci.edu>
	<44EC0524.2060206@canterbury.ac.nz>
Message-ID: <Pine.WNT.4.64.0608231204010.1204@shaolin>

On Wed, 23 Aug 2006, Greg Ewing wrote:

> Josiah Carlson wrote:
>
>> One of the issues with the -1 return on find failure is that it is
>> ambiguous, one must really check for a -1 return. Here's an API that is
>> non-ambiguous:
>
> An alternative would be to return None for not found.
> It wouldn't solve the problem of people using the
> return value as a boolean, but at least you'd get
> an exception if you tried to use the not-found value
> as an index.
>
> Or maybe it could return index values as a special
> int subclass that always tests true even when it's
> zero...

How about returning a str.NotFound object?


John

From ncoghlan at gmail.com  Wed Aug 23 13:43:12 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 23 Aug 2006 21:43:12 +1000
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <ca471dc20608221832s313d02a2gf01a5532eceedebf@mail.gmail.com>
References: <ca471dc20608221832s313d02a2gf01a5532eceedebf@mail.gmail.com>
Message-ID: <44EC3F50.3040609@gmail.com>

Guido van Rossum wrote:
> At today's sprint, one of the volunteers completed a patch to rip out
> find() and rfind(), replacing all calls with index()/rindex(). But now
> I'm getting cold feet -- is this really a good idea? (It's been listed
> in PEP 3100 for a long time, but I haven't thought about it much,
> really.)
> 
> What do people think?
> 
I'd be more interested in a patch that replaced standard library uses of
find()/rfind() with either "if sub in string" or partition()/rpartition(). 
Replacing usage of find() for slicing purposes is one of the big reasons the 
latter methods were added, after all.

I also like Josiah's idea of replacing find() with a search() method that 
returned an iterator of indices, so that you can do:

for idx in string.search(sub):
    # Process the indices (if any)

Then you have 5 substring searching mechanisms for different use cases:

   sub in s          (simple containment test)
   s.index(sub)      (first index, exception if not found)
   s.search(sub)     (iterator of indices, empty if not found)
   s.partition(sep)  (split on first occurrence of substring)
   s.split(sep)      (split on all occurrences of substring)
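Four of those five already exist (search() is the proposal); with re.finditer
standing in for it, the toolbox looks like this:

```python
import re

s = "spam, eggs, spam"

assert "spam" in s                            # 1: containment test
assert s.index("spam") == 0                   # 2: first index (ValueError if absent)
# 3: stand-in for the proposed s.search(sub) -- indices of all matches
starts = [m.start() for m in re.finditer(re.escape("spam"), s)]
assert starts == [0, 12]
# 4: split on the first occurrence of the separator
assert s.partition(", ") == ("spam", ", ", "eggs, spam")
# 5: split on every occurrence
assert s.split(", ") == ["spam", "eggs", "spam"]
```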

Cheers,
Nick.


-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From barry at python.org  Wed Aug 23 14:37:08 2006
From: barry at python.org (Barry Warsaw)
Date: Wed, 23 Aug 2006 08:37:08 -0400
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <20060823044148.GR5772@performancedrivers.com>
References: <ca471dc20608221832s313d02a2gf01a5532eceedebf@mail.gmail.com>
	<20060823044148.GR5772@performancedrivers.com>
Message-ID: <0C9E2042-0040-4123-BA6D-2780FD49F194@python.org>

I agree with Tim -- if we have to get rid of one of them, let's get  
rid of index/rindex and keep find/rfind.  Catching the exception is  
much less convenient than testing for -1.

-Barry


From guido at python.org  Wed Aug 23 16:20:54 2006
From: guido at python.org (Guido van Rossum)
Date: Wed, 23 Aug 2006 07:20:54 -0700
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <0C9E2042-0040-4123-BA6D-2780FD49F194@python.org>
References: <ca471dc20608221832s313d02a2gf01a5532eceedebf@mail.gmail.com>
	<20060823044148.GR5772@performancedrivers.com>
	<0C9E2042-0040-4123-BA6D-2780FD49F194@python.org>
Message-ID: <ca471dc20608230720w20503b76n55c2e9ae7c95695b@mail.gmail.com>

On 8/23/06, Barry Warsaw <barry at python.org> wrote:
> I agree with Tim -- if we have to get rid of one of them, let's get
> rid of index/rindex and keep find/rfind.  Catching the exception is
> much less convenient than testing for -1.

But the -1 is very error-prone, as many have experienced. Also, many
uses of find() should be replaced by 'in' (long ago, 'in' only
accepted one-character strings on the left and find() was the best
alternative) or partition().
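A minimal before/after illustration of those two replacements:

```python
line = "key: value"

# Before: find() as a containment test -- error-prone, since -1 is truthy.
has_colon_old = line.find(":") != -1
# After: 'in' says the same thing directly.
assert (":" in line) == has_colon_old is True

# Before: find() used to locate a slice point.
i = line.find(":")
before, after = line[:i], line[i + 1:]
# After: partition() does the test and the slicing in one step.
head, sep, tail = line.partition(":")
assert (head, tail) == (before, after) == ("key", " value")
```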

To the folks asking for it to stay because it's harmless: in py3k I
want to rip out lots of "harmless" features to make the language smaller. A
smaller language is also a feature, and a very important one -- a
frequent complaint I hear is that over time the language has lost some
of its original smallness, which reduces some of the reasons why
people were attracted to it in the first place. (Also, removing
features makes room for new ones -- Bertrand Meyer, Eiffel's creator,
often asks users demanding a new feature to point out which feature
they are willing to drop to make room.)

I don't want Python to become like Emacs, which I still use, but
generally don't recommend to new developers any more... If you haven't
grown up with it, its current state is hard to understand and hard to
defend.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From steven.bethard at gmail.com  Wed Aug 23 16:31:55 2006
From: steven.bethard at gmail.com (Steven Bethard)
Date: Wed, 23 Aug 2006 08:31:55 -0600
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <ca471dc20608230720w20503b76n55c2e9ae7c95695b@mail.gmail.com>
References: <ca471dc20608221832s313d02a2gf01a5532eceedebf@mail.gmail.com>
	<20060823044148.GR5772@performancedrivers.com>
	<0C9E2042-0040-4123-BA6D-2780FD49F194@python.org>
	<ca471dc20608230720w20503b76n55c2e9ae7c95695b@mail.gmail.com>
Message-ID: <d11dcfba0608230731x8843da9j179d12cec70ffc76@mail.gmail.com>

On 8/23/06, Guido van Rossum <guido at python.org> wrote:
> On 8/23/06, Barry Warsaw <barry at python.org> wrote:
> > I agree with Tim -- if we have to get rid of one of them, let's get
> > rid of index/rindex and keep find/rfind.  Catching the exception is
> > much less convenient than testing for -1.
>
> But the -1 is very error-prone, as many have experienced. Also, many
> uses of find() should be replaced by 'in' (long ago, 'in' only
> accepted one-character strings on the left and find() was the best
> alternative) or partition().

FWLIW, I only started using Python at the tail end of 2.2, so the 'in'
started working with substrings pretty early for me.  I do a fair bit
of work with text (my research is in natural language processing) and
yet I have exactly zero instances of [r]find() in my code. So I at
least wouldn't miss them if they were gone.

STeVe
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From phd at mail2.phd.pp.ru  Wed Aug 23 16:44:33 2006
From: phd at mail2.phd.pp.ru (Oleg Broytmann)
Date: Wed, 23 Aug 2006 18:44:33 +0400
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <ca471dc20608230720w20503b76n55c2e9ae7c95695b@mail.gmail.com>
References: <ca471dc20608221832s313d02a2gf01a5532eceedebf@mail.gmail.com>
	<20060823044148.GR5772@performancedrivers.com>
	<0C9E2042-0040-4123-BA6D-2780FD49F194@python.org>
	<ca471dc20608230720w20503b76n55c2e9ae7c95695b@mail.gmail.com>
Message-ID: <20060823144432.GA10709@phd.pp.ru>

On Wed, Aug 23, 2006 at 07:20:54AM -0700, Guido van Rossum wrote:
> in py3k I
> want to rip out lots of "harmless" to make the language smaller. A
> smaller language is also a feature, and a very important one -- a
> frequent complaint I hear is that over time the language has lost some
> of its original smallness, which reduces some of the reasons why
> people were attracted to it in the first place.

   IMHO find() is not a part of the language - it is a part of the standard
library. When people complain about the *language* they AFAIU mean "print >>",
[list comprehension], iterators, generators and (generator expressions),
@decorators, "with", "case"...

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From guido at python.org  Wed Aug 23 17:18:03 2006
From: guido at python.org (Guido van Rossum)
Date: Wed, 23 Aug 2006 08:18:03 -0700
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <20060823144432.GA10709@phd.pp.ru>
References: <ca471dc20608221832s313d02a2gf01a5532eceedebf@mail.gmail.com>
	<20060823044148.GR5772@performancedrivers.com>
	<0C9E2042-0040-4123-BA6D-2780FD49F194@python.org>
	<ca471dc20608230720w20503b76n55c2e9ae7c95695b@mail.gmail.com>
	<20060823144432.GA10709@phd.pp.ru>
Message-ID: <ca471dc20608230818x63624cd0uaf7a356f1e883593@mail.gmail.com>

That's too narrow a view on the language. Surely the built-in types
(especially those with direct compiler support, like literal
notations) are part of the language. The people who complain most
frequently about Python getting too big aren't language designers,
they are users (e.g. scientists) and to them it doesn't matter what
technically is or isn't in the language -- it's the complete set of
tools they have to deal with. That doesn't include all of the standard
library, but it surely includes the built-in types and their behavior!
Otherwise the int/long and str/unicode unifications wouldn't be
language changes either...

-Guido

On 8/23/06, Oleg Broytmann <phd at oper.phd.pp.ru> wrote:
> On Wed, Aug 23, 2006 at 07:20:54AM -0700, Guido van Rossum wrote:
> > in py3k I
> > want to rip out lots of "harmless" to make the language smaller. A
> > smaller language is also a feature, and a very important one -- a
> > frequent complaint I hear is that over time the language has lost some
> > of its original smallness, which reduces some of the reasons why
> > people were attracted to it in the first place.
>
>    IMHO find() is not a part of the language - it is a part of the standard
> library. When people complain about the *language* they AFAIU mean "print >>",
> [list comprehension], iterators, generators and (generator expressions),
> @decorators, "with", "case"...
>
> Oleg.
> --
>      Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
>            Programmers don't die, they just GOSUB without RETURN.


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From phd at mail2.phd.pp.ru  Wed Aug 23 17:28:15 2006
From: phd at mail2.phd.pp.ru (Oleg Broytmann)
Date: Wed, 23 Aug 2006 19:28:15 +0400
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <ca471dc20608230818x63624cd0uaf7a356f1e883593@mail.gmail.com>
References: <ca471dc20608221832s313d02a2gf01a5532eceedebf@mail.gmail.com>
	<20060823044148.GR5772@performancedrivers.com>
	<0C9E2042-0040-4123-BA6D-2780FD49F194@python.org>
	<ca471dc20608230720w20503b76n55c2e9ae7c95695b@mail.gmail.com>
	<20060823144432.GA10709@phd.pp.ru>
	<ca471dc20608230818x63624cd0uaf7a356f1e883593@mail.gmail.com>
Message-ID: <20060823152815.GA17442@phd.pp.ru>

On Wed, Aug 23, 2006 at 08:18:03AM -0700, Guido van Rossum wrote:
> That's too narrow a view on the language.

   I narrowed it on purpose for this discussion.

> Surely the built-in types
> (especially those with direct compiler support, like literal
> notations) are part of the language.

   And still I believe they are two different markets, and you cannot trade
features between them. I am sure it would be hard to buy space for new
language (in that narrow sense) features by removing methods from the
standard types.

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From barry at python.org  Wed Aug 23 17:52:35 2006
From: barry at python.org (Barry Warsaw)
Date: Wed, 23 Aug 2006 11:52:35 -0400
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <ca471dc20608230818x63624cd0uaf7a356f1e883593@mail.gmail.com>
References: <ca471dc20608221832s313d02a2gf01a5532eceedebf@mail.gmail.com>
	<20060823044148.GR5772@performancedrivers.com>
	<0C9E2042-0040-4123-BA6D-2780FD49F194@python.org>
	<ca471dc20608230720w20503b76n55c2e9ae7c95695b@mail.gmail.com>
	<20060823144432.GA10709@phd.pp.ru>
	<ca471dc20608230818x63624cd0uaf7a356f1e883593@mail.gmail.com>
Message-ID: <13DEBA81-AE71-4E2C-BD5C-AC152747BFF2@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wed, 23 Aug 2006 08:18:03 -0700
"Guido van Rossum" <guido at python.org> wrote:

> That's too narrow a view on the language. Surely the built-in types
> (especially those with direct compiler support, like literal
> notations) are part of the language. The people who complain most
> frequently about Python getting too big aren't language designers,
> they are users (e.g. scientists) and to them it doesn't matter what
> technically is or isn't in the language -- it's the complete set of
> tools they have to deal with. That doesn't include all of the standard
> library, but it surely includes the built-in types and their behavior!
> Otherwise the int/long and str/unicode unifications wouldn't be
> language changes either...

Oleg has a point though.  Speaking generally, the perception of
"bigness" comes down to how much you can -- and /have/ to -- keep in
your head at one time while programming or reading code.  Python's
traditionally made excellent choices here.  The language is small
enough to keep in your head but the library is huge.  I don't know
about anybody else, but my aging brain can't keep much of the library
in its RAM so I'm highly dependent on help() and the library reference
manual to find things when I need them.

But I almost never have to look up a particular language feature, and
this was one of the primary reasons I switch from Perl to Python over a
decade ago.  To me, Python's growth with the last few releases is felt
more deeply with language features than with library improvements.
Features like list comprehensions, generators and generator
expressions, and decorators have all become ingrained; while they
originally felt "big", they are now common tools I reach for and
intuitively understand.  Some of the 2.5 features such as 'with',
relative imports, and conditional expressions haven't reached that
level of comfort yet and make Python feel "big" to me again.

There are some counterexamples: built-in sets, while making a library
feature a built-in type, makes Python feel a bit smaller because sets
are such a natural concept and code using them looks cleaner.  For
Python 3000, integrating ints and longs will definitely do this, as
will (I suspect) making all strings unicode with a (probably rarely
used) byte type.

So the question is where string methods like index and find fall.  To
me, they don't feel like language features. Built-in types fall
somewhere in between language features and library.  Their /presence/
is a language feature but what you can do with them seems more
library-ish to me.  For me, the reason is that I can easily keep in my
head that I have strings to represent text, ints, longs, etc. to
represent numbers, sets, dicts, lists, and tuples to represent
collections, etc.  But I may not remember exactly how to use str.find()
or dict.setdefault() because I use them more rarely (which doesn't
mean they're unimportant!). I know they're there and I vaguely remember
how to use them, so when I need them, it's off to the library reference
or help() for a quick refresher.

This suggests to me that a guiding principle ought to be reducing
language features without losing important functionality, just as the
int/long, str/unicode, all-newstyle classes work is doing.  Here you're
trying to polish the conceptual edges off the language, compound-W'ing
the language warts, and generally streamlining the language so it can
more easily fit in your head.  Where it comes to the library, I think we
ought to concentrate on reducing duplication.  TOOWTDI.  Get rid of the
User* modules.  If I need to do web-stuff, do I need urllib, urllib2,
urlparse, or what? etc.

As for the built-in types, let's reduce duplication here too, so if
there's a better way of e.g. doing what find, rfind, index, and rindex
do, then let's remove them and encourage the other uses.
dict.has_key() is a perfect example here.  'in' replaces many
of the use cases for str.find and friends, but not all.  Maybe
str.partition completes the picture, though I don't have enough
experience with them to know.

Anyway, enough blathering.  Those are my thoughts.  For this
specific case, maybe we really don't need any of [r]find() and [r]index(),
but if the choice comes down to one or the other, I still find catching
the exception less convenient than checking a return value.

- -Barry
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Darwin)

iQCVAwUBROx5w3EjvBPtnXfVAQI03QP/X9KyJabidsid1Vu01PWQZ0Op2ZvoMWyg
b9VQrS94auA/AQD9zg6SoBQaPIIGLAWg6Oh4FjkiuuCwhsb96YHjGdiSE510VfjW
R6qXg9beWTaafJVtzkjCLn0Gu+H5R9EdWnLGvwdVvF2ASPwfrZ2N0G6k/daQlCNk
3G5ucal/Jug=
=vwWM
-----END PGP SIGNATURE-----

From steven.bethard at gmail.com  Wed Aug 23 18:05:11 2006
From: steven.bethard at gmail.com (Steven Bethard)
Date: Wed, 23 Aug 2006 10:05:11 -0600
Subject: [Python-3000] DictMixin (WAS: Droping find/rfind?)
Message-ID: <d11dcfba0608230905k72822c05w239175ad319a811b@mail.gmail.com>

On 8/23/06, Barry Warsaw <barry at python.org> wrote:
> Where it comes to the library, I think we ought to concentrate on
> reducing duplication.  TOOWTDI.  Get rid of the User* modules.

Generally a good idea, but we still need somewhere to put DictMixin.
It's too bad you can't just use the unbound methods like::

    dict.update(dict-like-object, *args, **kwargs)

or we could drop DictMixin entirely.

Steve
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From jimjjewett at gmail.com  Wed Aug 23 19:08:57 2006
From: jimjjewett at gmail.com (Jim Jewett)
Date: Wed, 23 Aug 2006 13:08:57 -0400
Subject: [Python-3000] DictMixin (WAS: Droping find/rfind?)
In-Reply-To: <d11dcfba0608230905k72822c05w239175ad319a811b@mail.gmail.com>
References: <d11dcfba0608230905k72822c05w239175ad319a811b@mail.gmail.com>
Message-ID: <fb6fbf560608231008s795ee95ax5f1d128e1653098e@mail.gmail.com>

On 8/23/06, Barry Warsaw <barry at python.org> wrote:
> Where it comes to the library, I think we ought to concentrate on
> reducing duplication.  TOOWTDI.  Get rid of the User* modules.

Until it is possible to inherit from multiple extension types, there
will be a need to mimic inheritance with delegation; User* provides a
useful pattern.

-jJ

From jjl at pobox.com  Wed Aug 23 19:47:14 2006
From: jjl at pobox.com (John J Lee)
Date: Wed, 23 Aug 2006 18:47:14 +0100 (GMT Standard Time)
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <Pine.WNT.4.64.0608231204010.1204@shaolin>
References: <ca471dc20608221832s313d02a2gf01a5532eceedebf@mail.gmail.com>
	<20060822191712.1A39.JCARLSON@uci.edu>
	<44EC0524.2060206@canterbury.ac.nz>
	<Pine.WNT.4.64.0608231204010.1204@shaolin>
Message-ID: <Pine.WNT.4.64.0608231845570.2916@shaolin>

On Wed, 23 Aug 2006, John J Lee wrote:
[...]
>> An alternative would be to return None for not found.
>> It wouldn't solve the problem of people using the
>> return value as a boolean, but at least you'd get
>> an exception if you tried to use the not-found value
>> as an index.
>>
>> Or maybe it could return index values as a special
>> int subclass that always tests true even when it's
>> zero...
>
> How about returning a str.NotFound object?

Whoops, scratch that, doesn't solve anything more than returning None.


John

From steven.bethard at gmail.com  Wed Aug 23 20:29:26 2006
From: steven.bethard at gmail.com (Steven Bethard)
Date: Wed, 23 Aug 2006 12:29:26 -0600
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <0C9E2042-0040-4123-BA6D-2780FD49F194@python.org>
References: <ca471dc20608221832s313d02a2gf01a5532eceedebf@mail.gmail.com>
	<20060823044148.GR5772@performancedrivers.com>
	<0C9E2042-0040-4123-BA6D-2780FD49F194@python.org>
Message-ID: <d11dcfba0608231129m7320051cs8bb0f4c5715c7068@mail.gmail.com>

On 8/23/06, Barry Warsaw <barry at python.org> wrote:
> I agree with Tim -- if we have to get rid of one of them, let's get
> rid of index/rindex and keep find/rfind.  Catching the exception is
> much less convenient than testing for -1.

Could you post a simple example or two?  I keep imagining things like::

    index = text.index(...)
    if 0 <= index:
        ... do something with index ...
    else:
        ...

which looks about the same as::

    try:
        index = text.index(...)
        ... do something with index ...
    except ValueError:
        ...

Is it just that a lot of the else clauses are empty?

STeVe
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From jcarlson at uci.edu  Wed Aug 23 20:52:54 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Wed, 23 Aug 2006 11:52:54 -0700
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <d11dcfba0608231129m7320051cs8bb0f4c5715c7068@mail.gmail.com>
References: <0C9E2042-0040-4123-BA6D-2780FD49F194@python.org>
	<d11dcfba0608231129m7320051cs8bb0f4c5715c7068@mail.gmail.com>
Message-ID: <20060823114629.1A57.JCARLSON@uci.edu>


"Steven Bethard" <steven.bethard at gmail.com> wrote:
> 
> On 8/23/06, Barry Warsaw <barry at python.org> wrote:
> > I agree with Tim -- if we have to get rid of one of them, let's get
> > rid of index/rindex and keep find/rfind.  Catching the exception is
> > much less convenient than testing for -1.
> 
> Could you post a simple example or two?  I keep imagining things like::
> 
>     index = text.index(...)
>     if 0 <= index:
>         ... do something with index ...
>     else:
>         ...

A more-often-used style is...

    index = text.find(...)
    if index >= 0:
        ...

Compare this with the use of index:

    try:
        index = text.index(...)
    except ValueError:
        pass
    else:
        ...


or even

    index = 0
    while 1:
        index = text.find(..., index)
        if index == -1:
            break
        ...


compared with

    index = 0
    while 1:
        try:
            index = text.index(..., index)
        except ValueError:
            break
        ...

>     try:
>         index = text.index(...)
>         ... do something with index ...
>     except ValueError:
>         ...

In these not uncommon cases, the use of str.index and having to catch
ValueError is cumbersome (in terms of typing, indentation, etc.), and is
about as susceptible to bugs as str.find, which you have shown by
putting "... do something with index ..." in the try clause, rather than
the else clause.
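Filled in as concrete, runnable loops (the collect-every-index body and the
one-character step to allow overlaps are my assumptions; the original
snippets elide them with "..."), the comparison above becomes:

```python
def all_indices_find(text, sub):
    """Collect every match index with find(), as in the first loop."""
    indices = []
    index = 0
    while 1:
        index = text.find(sub, index)
        if index == -1:
            break
        indices.append(index)
        index += 1  # advance one char, allowing overlapping matches
    return indices

def all_indices_index(text, sub):
    """The same loop written with index() and exception handling."""
    indices = []
    index = 0
    while 1:
        try:
            index = text.index(sub, index)
        except ValueError:
            break
        indices.append(index)
        index += 1
    return indices
```

Both return the same result; the index() version just spends four lines and
an extra indentation level on detecting the end of the matches.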

 - Josiah


From steven.bethard at gmail.com  Wed Aug 23 21:07:49 2006
From: steven.bethard at gmail.com (Steven Bethard)
Date: Wed, 23 Aug 2006 13:07:49 -0600
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <20060823114629.1A57.JCARLSON@uci.edu>
References: <0C9E2042-0040-4123-BA6D-2780FD49F194@python.org>
	<d11dcfba0608231129m7320051cs8bb0f4c5715c7068@mail.gmail.com>
	<20060823114629.1A57.JCARLSON@uci.edu>
Message-ID: <d11dcfba0608231207o43fd1237i19520b57c85673e1@mail.gmail.com>

Steven Bethard wrote:
> Could you post a simple example or two?

Josiah Carlson wrote:
>     index = text.find(...)
>     if index >= 0:
>         ...
>
[snip]
>     index = 0
>     while 1:
>         index = text.find(..., index)
>         if index == -1:
>             break
>         ...
>

Thanks.  So with your search() function, these would be something like:

    indices = text.search(pattern, count=1)
    if indices:
        index, = indices
        ...

and

    for index in text.search(pattern):
        ...

if I understood the proposal right.

Steve
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From paul at prescod.net  Wed Aug 23 21:12:31 2006
From: paul at prescod.net (Paul Prescod)
Date: Wed, 23 Aug 2006 12:12:31 -0700
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <d11dcfba0608231129m7320051cs8bb0f4c5715c7068@mail.gmail.com>
References: <ca471dc20608221832s313d02a2gf01a5532eceedebf@mail.gmail.com>
	<20060823044148.GR5772@performancedrivers.com>
	<0C9E2042-0040-4123-BA6D-2780FD49F194@python.org>
	<d11dcfba0608231129m7320051cs8bb0f4c5715c7068@mail.gmail.com>
Message-ID: <1cb725390608231212x1fbd3492jb4e9e3f0fcccee77@mail.gmail.com>

Just throwing it out but what about something like:

found, index = text.index("abc")

if found:
   doSomething(index)

If you were confident that the substring was in there, you would do
something more like this:

something = text[text.index("abc")[1]:]

(although there are clearer ways to do that)
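As a sketch, the tuple-returning behaviour can be prototyped today with an ordinary helper (find_pair is an invented name; no such str method exists):

```python
def find_pair(text, sub):
    """Hypothetical helper mimicking the proposed tuple-returning
    index(): (found, index) instead of an exception or a bare -1."""
    i = text.find(sub)
    return (i >= 0, i)

found, index = find_pair("spam abc eggs", "abc")
if found:
    print(index)  # position of "abc"
```

A miss yields (False, -1), so the found flag, rather than the index value, carries the error signal.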

On 8/23/06, Steven Bethard <steven.bethard at gmail.com> wrote:
>
> On 8/23/06, Barry Warsaw <barry at python.org> wrote:
> > I agree with Tim -- if we have to get rid of one of them, let's get
> > rid of index/rindex and keep find/rfind.  Catching the exception is
> > much less convenient than testing for -1.
>
> Could you post a simple example or two?  I keep imagining things like::
>
>     index = text.index(...)
>     if 0 <= index:
>         ... do something with index ...
>     else:
>         ...
>
> which looks about the same as::
>
>     try:
>         index = text.index(...)
>         ... do something with index ...
>     except ValueError:
>         ...
>
> Is it just that a lot of the else clauses are empty?
>
> STeVe
> --
> I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
> tiny blip on the distant coast of sanity.
>         --- Bucky Katt, Get Fuzzy
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe:
> http://mail.python.org/mailman/options/python-3000/paul%40prescod.net
>

From g.brandl at gmx.net  Wed Aug 23 21:36:12 2006
From: g.brandl at gmx.net (Georg Brandl)
Date: Wed, 23 Aug 2006 21:36:12 +0200
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <d11dcfba0608231207o43fd1237i19520b57c85673e1@mail.gmail.com>
References: <0C9E2042-0040-4123-BA6D-2780FD49F194@python.org>	<d11dcfba0608231129m7320051cs8bb0f4c5715c7068@mail.gmail.com>	<20060823114629.1A57.JCARLSON@uci.edu>
	<d11dcfba0608231207o43fd1237i19520b57c85673e1@mail.gmail.com>
Message-ID: <ecianc$526$1@sea.gmane.org>

Steven Bethard wrote:
> Steven Bethard wrote:
>> Could you post a simple example or two?
> 
> Josiah Carlson wrote:
>>     index = text.find(...)
>>     if index >= 0:
>>         ...
>>
> [snip]
>>     index = 0
>>     while 1:
>>         index = text.find(..., index)
>>         if index == -1:
>>             break
>>         ...
>>
> 
> Thanks.  So with your search() function, these would be something like:
> 
>     indices = text.search(pattern, count=1)
>     if indices:
>         index, = indices
>         ...

Or even

indices = text.search(pattern, count=1)
for index in indices:
     ...

Georg


From g.brandl at gmx.net  Wed Aug 23 21:36:50 2006
From: g.brandl at gmx.net (Georg Brandl)
Date: Wed, 23 Aug 2006 21:36:50 +0200
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <1cb725390608231212x1fbd3492jb4e9e3f0fcccee77@mail.gmail.com>
References: <ca471dc20608221832s313d02a2gf01a5532eceedebf@mail.gmail.com>	<20060823044148.GR5772@performancedrivers.com>	<0C9E2042-0040-4123-BA6D-2780FD49F194@python.org>	<d11dcfba0608231129m7320051cs8bb0f4c5715c7068@mail.gmail.com>
	<1cb725390608231212x1fbd3492jb4e9e3f0fcccee77@mail.gmail.com>
Message-ID: <eciaoj$526$2@sea.gmane.org>

Paul Prescod wrote:
> Just throwing it out but what about something like:
> 
> found, index = text.index("abc")
> 
> if found:
>    doSomething(index)

-1. str.index()'s semantics should not be different from list.index().

Georg


From jcarlson at uci.edu  Wed Aug 23 21:56:21 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Wed, 23 Aug 2006 12:56:21 -0700
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <d11dcfba0608231207o43fd1237i19520b57c85673e1@mail.gmail.com>
References: <20060823114629.1A57.JCARLSON@uci.edu>
	<d11dcfba0608231207o43fd1237i19520b57c85673e1@mail.gmail.com>
Message-ID: <20060823123719.1A5D.JCARLSON@uci.edu>


"Steven Bethard" <steven.bethard at gmail.com> wrote:
> Steven Bethard wrote:
> > Could you post a simple example or two?
> 
> Josiah Carlson wrote:
> >     index = text.find(...)
> >     if index >= 0:
> >         ...
> >
> [snip]
> >     index = 0
> >     while 1:
> >         index = text.find(..., index)
> >         if index == -1:
> >             break
> >         ...
> 
> Thanks.  So with your search() function, these would be something like:
> 
>     indices = text.search(pattern, count=1)
>     if indices:
>         index, = indices
>         ...
> 
> and
> 
>     for index in text.search(pattern):
>         ...
> 
> if I understood the proposal right.

Yes, you understood my (strawman) proposal correctly.  The former could
even be shortened to:

    for index in text.search(pattern, count=1):
        ...

... if there wasn't an else clause in the original search.  Note that my
point in the proposing of search was to say:
1. [r]index is cumbersome
2. [r]find can be error-prone for newbies due to the -1 return
3. the functionality seems to be useful (otherwise neither would exist)
4. let us unambiguate [r]find if possible, because it is the better of
the two (in my opinion)
5. or instead of 4, replace both of them with search

People seem to like the #5 option, even though it was not my intent by
posting search originally.  Given that some people like it, I'm now of
the opinion that if [r]find is going, then certainly [r]index should go
because it suffers from being more cumbersome to use and has a similar
class of bugs, and if both go, then we should have something to replace
them.  As a replacement, search lacks the exception annoyance of index,
has an unambiguous return value, and naturally supports iterative find
calls.

Given search as a potential replacement, about the only question is
whether count should default to sys.maxint or 1.  The original
description included count=sys.maxint, but if we want to use it as a
somewhat drop-in replacement for find and index, then it would make more
sense for it to have count=1 as a default, with some easy to access
count argument to make it find all of them.
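A minimal sketch of such a search(), assuming it is a generator built on find (the signature is a guess at the strawman proposal; sys.maxsize stands in for the sys.maxint mentioned above):

```python
import sys

def search(text, pattern, start=0, count=sys.maxsize):
    """Sketch of the proposed str.search(): yield up to `count`
    indices where `pattern` occurs in `text`, resuming one position
    past each hit."""
    yielded = 0
    while yielded < count:
        i = text.find(pattern, start)
        if i == -1:
            return
        yield i
        start = i + 1
        yielded += 1
```

With count=1 it covers the single-hit idioms above, and with the default it is exactly the iterative-find loop.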


 - Josiah


From jcarlson at uci.edu  Wed Aug 23 22:22:42 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Wed, 23 Aug 2006 13:22:42 -0700
Subject: [Python-3000] [Python-Dev] What should the focus for 2.6 be?
In-Reply-To: <ca471dc20608211736h5f8903cctc92c60c5bd6e538e@mail.gmail.com>
References: <20060821081944.1A0F.JCARLSON@uci.edu>
	<ca471dc20608211736h5f8903cctc92c60c5bd6e538e@mail.gmail.com>
Message-ID: <20060823125951.1A60.JCARLSON@uci.edu>


"Guido van Rossum" <guido at python.org> wrote:

> And yet offense is taken. Have you watched the video of my Py3k talk?
> Search for it on Google Video.

I spent some time yesterday and watched it.  All I was proposing is
that, as with Perl 5 and 6, users of Python 2.x may not feel an
overwhelming desire to move to Python 3.x, because there will be so many
incompatibilities.  I understand that the point of Python 3.x is to
allow for a one-time (at least for now) breakage of the backwards
compatibility of the language to get rid of the crap; "Backwards
incompatible changes are allowed in Python 3000, but not to excess."
While each individual change to the language is relatively minor by
itself, putting them all together is effectively one big backwards
incompatible change.

Take the standard library reorganization, for example.  I am 100% in
favor of reorganizing it, but if it is all moved at once, then people
can't write code for the future until it arrives.  But if we were to
create a mapping of new names -> old names, then an import hook could be
written, and people could start using the new package names in 2.6.
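As a rough sketch of the idea (the new name textutils is invented purely for illustration, and a real hook would resolve names lazily via sys.meta_path rather than eagerly like this):

```python
import importlib
import sys

# Invented new-name -> current-name mapping; a real table would come
# from the actual reorganization proposal.
_RENAMED = {"textutils": "string"}

def install_aliases():
    """Register each old module in sys.modules under its new name,
    so `import newname` resolves to the existing module."""
    for new_name, old_name in _RENAMED.items():
        sys.modules[new_name] = importlib.import_module(old_name)

install_aliases()
import textutils  # actually the stdlib `string` module
```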

The intent of my post was to say that all of us want Py3k to succeed,
but I believe that in order for it to succeed, breakage from the 2.x
series should be gradual, in a similar way to how 2.x -> 2.x+1 breakage
has been gradual.  I believe we agree on this basic point except for one
thing: according to your talk and your posts here, you want a Py3k alpha
in the next year or two, while I'm thinking that the Py3k alpha should
come somewhere after 2.6 and probably 2.7, maybe even after 2.8 or 2.9,
depending on how quickly the 2.x series is transitioned.  Having Py3k
in development really just makes maintenance (bug fixing, etc.) more of
a burden.


> Perhaps you want to help write the transition PEP?

I'll see what I can hack up next week (I have an advancement talk
tomorrow that I really should be preparing for).

 - Josiah


From bjourne at gmail.com  Wed Aug 23 23:01:23 2006
From: bjourne at gmail.com (=?ISO-8859-1?Q?BJ=F6rn_Lindqvist?=)
Date: Wed, 23 Aug 2006 23:01:23 +0200
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <20060823114629.1A57.JCARLSON@uci.edu>
References: <0C9E2042-0040-4123-BA6D-2780FD49F194@python.org>
	<d11dcfba0608231129m7320051cs8bb0f4c5715c7068@mail.gmail.com>
	<20060823114629.1A57.JCARLSON@uci.edu>
Message-ID: <740c3aec0608231401q18ca271o72157213855e7e17@mail.gmail.com>

On 8/23/06, Josiah Carlson <jcarlson at uci.edu> wrote:

> or even
>
>     index = 0
>     while 1:
>         index = text.find(..., index)
>         if index == -1:
>             break
>         ...
> compared with
>
>     index = 0
>     while 1:
>         try:
>             index = text.index(..., index)
>         except ValueError:
>             break
>         ...

You are supposed to use the in operator:

index = 0
while 1:
    if not "something" in text[index:]:
        break

IMHO, removing find() is good because index() does the same job
without violating the Samurai Principle
(http://c2.com/cgi/wiki?SamuraiPrinciple). It would be interesting to
see the patch that replaced find() with index(), did it really make
the code more cumbersome?

-- 
mvh Björn

From guido at python.org  Wed Aug 23 23:18:59 2006
From: guido at python.org (Guido van Rossum)
Date: Wed, 23 Aug 2006 14:18:59 -0700
Subject: [Python-3000] find -> index patch
Message-ID: <ca471dc20608231418v21e93634o44139017227d1a2b@mail.gmail.com>

Here's the patch (by Hasan Diwan, BTW) for people's perusal. It just
gets rid of all *uses* of find/rfind from Lib; it doesn't actually
modify stringobject.c or unicodeobject.c. It doesn't use
[r]partition()'; someone could look for opportunities to use that
separately.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rfind2rindex_find2index.pat
Type: application/octet-stream
Size: 80147 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20060823/a8f58010/attachment-0001.obj 

From jack at psynchronous.com  Wed Aug 23 23:39:25 2006
From: jack at psynchronous.com (Jack Diederich)
Date: Wed, 23 Aug 2006 17:39:25 -0400
Subject: [Python-3000] find -> index patch
In-Reply-To: <ca471dc20608231418v21e93634o44139017227d1a2b@mail.gmail.com>
References: <ca471dc20608231418v21e93634o44139017227d1a2b@mail.gmail.com>
Message-ID: <20060823213924.GS5772@performancedrivers.com>

On Wed, Aug 23, 2006 at 02:18:59PM -0700, Guido van Rossum wrote:
> Here's the patch (by Hasan Diwan, BTW) for people's perusal. It just
> gets rid of all *uses* of find/rfind from Lib; it doesn't actually
> modify stringobject.c or unicodeobject.c. It doesn't use
> [r]partition()'; someone could look for opportunities to use that
> separately.
> 

Is this a machine-generated patch?  Replacing all calls to find with
  try: i = text.index(sep)
  except: i = -1
has a Yuck factor of -1000.  Some of the excepts specify ValueError,
but still.

-Jack

From jcarlson at uci.edu  Wed Aug 23 23:48:40 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Wed, 23 Aug 2006 14:48:40 -0700
Subject: [Python-3000] find -> index patch
In-Reply-To: <ca471dc20608231418v21e93634o44139017227d1a2b@mail.gmail.com>
References: <ca471dc20608231418v21e93634o44139017227d1a2b@mail.gmail.com>
Message-ID: <20060823143606.1A66.JCARLSON@uci.edu>


"Guido van Rossum" <guido at python.org> wrote:
> Here's the patch (by Hasan Diwan, BTW) for people's perusal. It just
> gets rid of all *uses* of find/rfind from Lib; it doesn't actually
> modify stringobject.c or unicodeobject.c. It doesn't use
> [r]partition()'; someone could look for opportunities to use that
> separately.
> 
> -- 
> --Guido van Rossum (home page: http://www.python.org/~guido/)

There's a bug in the Lib/idlelib/configHandler.py patch, likely 6
unintended bugs exposed in Lib/idlelib/PyParse.py (which are made worse
by the patch), Lib/idlelib/CallTips.py is broken, 4 examples in
Lib/ihooks.py don't require the try/except clause (each is prefixed with
a containment test), Lib/cookielib.py has two new bugs, ...

I stopped at Lib/string.py

Also, there are inconsistent uses of bare except and except ValueError
clauses.

The patch shouldn't be applied, for many reasons, not the least of which
is that it breaks currently working code; it offers poorly-styled code
of the form:
    try:... = str.index(...)
    except:...=-1

...that looks to have been done by a script; it has inconsistent style
compared to the code it replaces; etc.

 - Josiah


From jcarlson at uci.edu  Wed Aug 23 23:53:05 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Wed, 23 Aug 2006 14:53:05 -0700
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <740c3aec0608231401q18ca271o72157213855e7e17@mail.gmail.com>
References: <20060823114629.1A57.JCARLSON@uci.edu>
	<740c3aec0608231401q18ca271o72157213855e7e17@mail.gmail.com>
Message-ID: <20060823143116.1A63.JCARLSON@uci.edu>


"Björn Lindqvist" <bjourne at gmail.com> wrote:
> 
> On 8/23/06, Josiah Carlson <jcarlson at uci.edu> wrote:
> 
> > or even
> >
> >     index = 0
> >     while 1:
> >         index = text.find(..., index)
> >         if index == -1:
> >             break
> >         ...
> > compared with
> >
> >     index = 0
> >     while 1:
> >         try:
> >             index = text.index(..., index)
> >         except ValueError:
> >             break
> >         ...
> 
> You are supposed to use the in operator:
> 
> index = 0
> while 1:
>     if not "something" in text[index:]:
>         break

This can also lead to O(n^2) running time, causes unnecessary string
allocation, memory copies, etc.  If I saw that in real code, I'd
probably lose respect for the author of that module and/or package.
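The quadratic cost above comes from text[index:] copying the remaining tail on every pass; find's start argument scans the same string with no copies at all. A sketch (the helper name is invented):

```python
def count_occurrences(text, sub):
    """Iterative scan using find's start offset: no text[index:]
    slices, hence no per-iteration tail copies."""
    count = 0
    start = 0
    while True:
        i = text.find(sub, start)
        if i == -1:
            return count
        count += 1
        start = i + len(sub)  # resume past the match: no overlap
```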


> IMHO, removing find() is good because index() does the same job
> without violating the Samurai Principle
> (http://c2.com/cgi/wiki?SamuraiPrinciple). It would be interesting to
> see the patch that replaced find() with index(), did it really make
> the code more cumbersome?

Every test of the result of str.find(...) needs to be replaced with a
try/except clause.  That's a cumbersome translation if there ever was
one.

 - Josiah


From hasan.diwan at gmail.com  Thu Aug 24 00:09:35 2006
From: hasan.diwan at gmail.com (Hasan Diwan)
Date: Wed, 23 Aug 2006 15:09:35 -0700
Subject: [Python-3000] find -> index patch
In-Reply-To: <20060823143606.1A66.JCARLSON@uci.edu>
References: <ca471dc20608231418v21e93634o44139017227d1a2b@mail.gmail.com>
	<20060823143606.1A66.JCARLSON@uci.edu>
Message-ID: <2cda2fc90608231509n7dc5a47bg5adfd2b790e29681@mail.gmail.com>

On 23/08/06, Josiah Carlson <jcarlson at uci.edu> wrote:
>
> ...that looks to have been done by a script, it has inconsistent style
> compared to the code it replaces, etc.
>

I made the minimal change that implements the suggested functionality;
in terms of find/rfind, they return -1. The least painful way to replace
that with index is:

try:
    i=str.index(foo)
except ValueError:
    i = -1

As for the plain except clauses, that was just laziness on my part. It's
not meant to be stylistically consistent or beautiful; rather, it is
meant to be functional and a starting point. Feel free to change/rewrite
the patch. The GENERAL CASE, i.e. the one applicable throughout the
code, is the try/except clause shown above.
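That general case could equally be factored into one small shim instead of repeating the try/except at every call site; a sketch (the helper name is invented):

```python
def index_or_minus_one(text, sub):
    """Hypothetical shim giving str.index the old str.find
    contract: -1 on a miss instead of ValueError."""
    try:
        return text.index(sub)
    except ValueError:
        return -1
```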
-- 
Cheers,
Hasan Diwan <hasan.diwan at gmail.com>

From g.brandl at gmx.net  Thu Aug 24 00:52:16 2006
From: g.brandl at gmx.net (Georg Brandl)
Date: Thu, 24 Aug 2006 00:52:16 +0200
Subject: [Python-3000] find -> index patch
In-Reply-To: <20060823143606.1A66.JCARLSON@uci.edu>
References: <ca471dc20608231418v21e93634o44139017227d1a2b@mail.gmail.com>
	<20060823143606.1A66.JCARLSON@uci.edu>
Message-ID: <ecim70$aia$1@sea.gmane.org>

Josiah Carlson wrote:
> "Guido van Rossum" <guido at python.org> wrote:
>> Here's the patch (by Hasan Diwan, BTW) for people's perusal. It just
>> gets rid of all *uses* of find/rfind from Lib; it doesn't actually
>> modify stringobject.c or unicodeobject.c. It doesn't use
>> [r]partition()'; someone could look for opportunities to use that
>> separately.
>> 
>> -- 
>> --Guido van Rossum (home page: http://www.python.org/~guido/)
> 
> There's a bug in the Lib/idlelib/configHandler.py patch, likely 6
> unintend bugs exposed in Lib/idlelib/PyParse.py (which are made worse by
> the patch),

Are the bugs there in the current code too? If so, you should report them.

> Lib/idlelib/CallTips.py is broken, 4 examples in
> Lib/ihooks.py don't require the try/except clause (it is prefixed with a
> containment test), Lib/cookielib.py has two new bugs, ...
> 
> I stopped at Lib/string.py
> 
> > Also, there are inconsistent uses of bare except and except ValueError
> clauses.

Not speaking of the inconsistent use of spaces vs. tabs ;)

Another newly-introduced bug:

-                p = str.rfind('\n', 0, p-1) + 1
+                try:p = str.rindex('\n', 0, p-1) + 1
+		except:p=-1

Georg


From jcarlson at uci.edu  Thu Aug 24 01:30:37 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Wed, 23 Aug 2006 16:30:37 -0700
Subject: [Python-3000] find -> index patch
In-Reply-To: <ecim70$aia$1@sea.gmane.org>
References: <20060823143606.1A66.JCARLSON@uci.edu> <ecim70$aia$1@sea.gmane.org>
Message-ID: <20060823162756.1A6C.JCARLSON@uci.edu>


Georg Brandl <g.brandl at gmx.net> wrote:
> 
> Josiah Carlson wrote:
> > "Guido van Rossum" <guido at python.org> wrote:
> >> Here's the patch (by Hasan Diwan, BTW) for people's perusal. It just
> >> gets rid of all *uses* of find/rfind from Lib; it doesn't actually
> >> modify stringobject.c or unicodeobject.c. It doesn't use
> >> [r]partition()'; someone could look for opportunities to use that
> >> separately.
> >> 
> >> -- 
> >> --Guido van Rossum (home page: http://www.python.org/~guido/)
> > 
> > There's a bug in the Lib/idlelib/configHandler.py patch, likely 6
> > unintend bugs exposed in Lib/idlelib/PyParse.py (which are made worse by
> > the patch),
> 
> Are the bugs there in current code too? You should then report them.

Maybe, maybe not.  I'll have to look (but not today).

> > Lib/idlelib/CallTips.py is broken, 4 examples in
> > Lib/ihooks.py don't require the try/except clause (it is prefixed with a
> > containment test), Lib/cookielib.py has two new bugs, ...
> > 
> > I stopped at Lib/string.py
> > 
> > > Also, there are inconsistent uses of bare except and except ValueError
> > clauses.
> 
> Not speaking of the inconsistent use of spaces vs. tabs ;)
> 
> Another newly-introduced bug:
> 
> -                p = str.rfind('\n', 0, p-1) + 1
> +                try:p = str.rindex('\n', 0, p-1) + 1
> +		except:p=-1

That was the "likely 6 unintended bugs in Lib/idlelib/PyParse.py".

 - Josiah


From jcarlson at uci.edu  Thu Aug 24 01:39:03 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Wed, 23 Aug 2006 16:39:03 -0700
Subject: [Python-3000] find -> index patch
In-Reply-To: <2cda2fc90608231509n7dc5a47bg5adfd2b790e29681@mail.gmail.com>
References: <20060823143606.1A66.JCARLSON@uci.edu>
	<2cda2fc90608231509n7dc5a47bg5adfd2b790e29681@mail.gmail.com>
Message-ID: <20060823163043.1A6F.JCARLSON@uci.edu>


"Hasan Diwan" <hasan.diwan at gmail.com> wrote:
> On 23/08/06, Josiah Carlson <jcarlson at uci.edu> wrote:
> >
> > > ...that looks to have been done by a script, it has inconsistent style
> > compared to the code it replaces, etc.
> >
> 
> I made the minimal change that implements the functionality suggested, in
> terms of find/rfind, they return -1. The least painful way to replace it
> with index is:
> 
> try:
>     i=str.index(foo)
> except ValueError:
>     i = -1
> 
> As for the plain except clauses, that was just laziness on my part. It's not
> meant to be stylistically consistent or beautiful, rather it is meant to be
> functional and as a starting point. Feel free to change/rewrite the patch.
> The GENERAL CASE, i.e. one that is applicable throughout the code is the
> try/except clauses shown above.

If find is to be replaced, it should be replaced with something that
isn't as cumbersome to use as index, and it shouldn't be done in a bulk
replacement attempt; as you have shown, doing so can lead to unintended
new bugs and the perpetuation of old ones.

When Raymond Hettinger did the same thing, replacing some examples of
find with partition, a similar first-pass skim of his proposed patch
also turned up a handful of new and perpetuated bugs.

I'm also not going to fix the patch because I don't believe that
replacing find with index is the correct course of action, for the few
reasons I've laid out in the current and previous messages on the topic.

 - Josiah


From tjreedy at udel.edu  Thu Aug 24 02:03:38 2006
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 23 Aug 2006 20:03:38 -0400
Subject: [Python-3000] [Python-Dev] What should the focus for 2.6 be?
References: <20060821081944.1A0F.JCARLSON@uci.edu><ca471dc20608211736h5f8903cctc92c60c5bd6e538e@mail.gmail.com>
	<20060823125951.1A60.JCARLSON@uci.edu>
Message-ID: <eciqcp$lra$1@sea.gmane.org>


"Josiah Carlson" <jcarlson at uci.edu> wrote in message 
news:20060823125951.1A60.JCARLSON at uci.edu...
> The intent of my post was to say that all of us want Py3k to succeed,

I should hope that we all do.

> but I believe that in order for it to succeed that breakage from the 2.x
> series should be gradual, in a similar way to how 2.x -> 2.x+1 breakage
> has been gradual.

Given that the rate of intentional breakage in the core language (including 
builtins) has been very minimal, this would take a couple of decades, which 
to my mind would be a failure.

> I believe we agree on this basic point

To the contrary, you seem to have a basic disagreement with the plan to
make all the core language changes at once and to clear the decks of old
baggage so we can move forward with a leaner language that is a bit
easier to learn and remember.

> according to your talk and your posts here, you want Py3k alpha
> in the next year or two, while I'm thinking that Py3k alpha should come
> somewhere after 2.6 and probably 2.7, maybe even after 2.8 or 2.9,

Whereas I wish it were already out and would be delighted to see it early 
next year.  Some of the changes have already been put off for at least five 
years and, to me, are overdue.

Terry Jan Reedy






From tjreedy at udel.edu  Thu Aug 24 02:27:26 2006
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 23 Aug 2006 20:27:26 -0400
Subject: [Python-3000] Droping find/rfind?
References: <ca471dc20608221832s313d02a2gf01a5532eceedebf@mail.gmail.com><20060823044148.GR5772@performancedrivers.com>
	<e3c648160608222346g4587d55eiff521787ca4d915f@mail.gmail.com>
Message-ID: <ecirpd$p3t$1@sea.gmane.org>


"Brian Holmes" <holmesbj.dev at gmail.com> wrote in message 
news:e3c648160608222346g4587d55eiff521787ca4d915f at mail.gmail.com...

> Even after reading Terry Reedy's arguments, I don't see why we need to
> remove this option.

Since this is my first post in this current thread, you either meant 
someone else or are remembering my posts about in- and out-of-band error 
signaling from the last time we discussed this.

> Let both exist.  I'd prefer grandfathering something like this and
> leaving it in, even if it wouldn't be there had we known everything
> from the start.

One point of the 3.0 cleanup is to remove or change things that we 
definitely would not do today.  When I learned Python, both the 
find/index duplication and the in-band same-type Unix/C-ism -1 return 
stuck out to me like sore thumbs.  So I would either

1. just remove find() and leave index(); or
2. change find()'s error return to None, and remove index();

 or possibly consider Josiah's idea of
3. remove both in favor of an index generator.

I am strongly -1 on leaving both as are.

Terry Jan Reedy




From jack at psynchronous.com  Thu Aug 24 02:39:48 2006
From: jack at psynchronous.com (Jack Diederich)
Date: Wed, 23 Aug 2006 20:39:48 -0400
Subject: [Python-3000] find -> index patch
In-Reply-To: <ca471dc20608231418v21e93634o44139017227d1a2b@mail.gmail.com>
References: <ca471dc20608231418v21e93634o44139017227d1a2b@mail.gmail.com>
Message-ID: <20060824003948.GT5772@performancedrivers.com>

On Wed, Aug 23, 2006 at 02:18:59PM -0700, Guido van Rossum wrote:
> Here's the patch (by Hasan Diwan, BTW) for people's perusal. It just
> gets rid of all *uses* of find/rfind from Lib; it doesn't actually
> modify stringobject.c or unicodeobject.c. It doesn't use
> [r]partition()'; someone could look for opportunities to use that
> separately.

I made a go at an idiomatic conversion of the first few modules
tagged by 'grep find( *.py' in Lib; patch attached.

WOW, I love partition.  In all the instances that weren't a simple "in"
test I ended up using [r]partition.  In some cases one of the returned
strings gets thrown away but in those cases it is guaranteed to be small.
The new code is usually smaller than the old and generally clearer.
ex/ cgi.py
-        i = p.find('=')
-        if i >= 0:
-            name = p[:i].strip().lower()
-            value = p[i+1:].strip()
+        (name, sep_found, value) = p.partition('=')
+        if (sep_found):
+            name = name.strip().lower()
+            value = value.strip()

If folks like the way this partial set looks I'll convert the rest.

-Jack
-------------- next part --------------
Index: Lib/CGIHTTPServer.py
===================================================================
--- Lib/CGIHTTPServer.py	(revision 51530)
+++ Lib/CGIHTTPServer.py	(working copy)
@@ -106,16 +106,11 @@
     def run_cgi(self):
         """Execute a CGI script."""
         dir, rest = self.cgi_info
-        i = rest.rfind('?')
-        if i >= 0:
-            rest, query = rest[:i], rest[i+1:]
-        else:
-            query = ''
-        i = rest.find('/')
-        if i >= 0:
-            script, rest = rest[:i], rest[i:]
-        else:
-            script, rest = rest, ''
+        (rest, found, query) = rest.rpartition('?')
+        if not found:
+            rest, query = query, ''
+        (script, sep, rest) = rest.partition('/')
+        rest = sep + rest # keep the slash
         scriptname = dir + '/' + script
         scriptfile = self.translate_path(scriptname)
         if not os.path.exists(scriptfile):
Index: Lib/asynchat.py
===================================================================
--- Lib/asynchat.py	(revision 51530)
+++ Lib/asynchat.py	(working copy)
@@ -125,14 +125,13 @@
                 #    collect data to the prefix
                 # 3) end of buffer does not match any prefix:
                 #    collect data
-                terminator_len = len(terminator)
-                index = self.ac_in_buffer.find(terminator)
-                if index != -1:
+                (data, term_found, more_data) = self.ac_in_buffer.partition(terminator)
+                if term_found:
                     # we found the terminator
-                    if index > 0:
+                    if data:
                         # don't bother reporting the empty string (source of subtle bugs)
-                        self.collect_incoming_data (self.ac_in_buffer[:index])
-                    self.ac_in_buffer = self.ac_in_buffer[index+terminator_len:]
+                        self.collect_incoming_data(data)
+                    self.ac_in_buffer = more_data
                     # This does the Right Thing if the terminator is changed here.
                     self.found_terminator()
                 else:
Index: Lib/cookielib.py
===================================================================
--- Lib/cookielib.py	(revision 51530)
+++ Lib/cookielib.py	(working copy)
@@ -531,8 +531,10 @@
         return True
     if not is_HDN(A):
         return False
-    i = A.rfind(B)
-    if i == -1 or i == 0:
+    if (not B):
+        return False
+    (before_B, sep, after_B) = A.rpartition(B)
+    if not sep or not before_B:
         # A does not have form NB, or N is the empty string
         return False
     if not B.startswith("."):
@@ -595,7 +597,7 @@
 
     """
     erhn = req_host = request_host(request)
-    if req_host.find(".") == -1 and not IPV4_RE.search(req_host):
+    if "." not in req_host and not IPV4_RE.search(req_host):
         erhn = req_host + ".local"
     return req_host, erhn
 
@@ -616,16 +618,12 @@
 
 def request_port(request):
     host = request.get_host()
-    i = host.find(':')
-    if i >= 0:
-        port = host[i+1:]
-        try:
-            int(port)
-        except ValueError:
-            _debug("nonnumeric port: '%s'", port)
-            return None
-    else:
-        port = DEFAULT_HTTP_PORT
+    port = host.partition(':')[-1] or DEFAULT_HTTP_PORT
+    try:
+        int(port)
+    except ValueError:
+        _debug("nonnumeric port: '%s'", port)
+        return None
     return port
 
 # Characters in addition to A-Z, a-z, 0-9, '_', '.', and '-' that don't
@@ -676,13 +674,9 @@
     '.local'
 
     """
-    i = h.find(".")
-    if i >= 0:
-        #a = h[:i]  # this line is only here to show what a is
-        b = h[i+1:]
-        i = b.find(".")
-        if is_HDN(h) and (i >= 0 or b == "local"):
-            return "."+b
+    (a, sep, b) = h.partition(".")
+    if sep and is_HDN(h) and ("." in b or b == "local"):
+        return "."+b
     return h
 
 def is_third_party(request):
@@ -986,11 +980,9 @@
                 # XXX This should probably be compared with the Konqueror
                 # (kcookiejar.cpp) and Mozilla implementations, but it's a
                 # losing battle.
-                i = domain.rfind(".")
-                j = domain.rfind(".", 0, i)
-                if j == 0:  # domain like .foo.bar
-                    tld = domain[i+1:]
-                    sld = domain[j+1:i]
+                (extra, dot, tld) = domain.rpartition(".")
+                (extra, dot, sld) = extra.rpartition(".")
+                if dot and not extra:  # domain like .foo.bar
                     if sld.lower() in ("co", "ac", "com", "edu", "org", "net",
                        "gov", "mil", "int", "aero", "biz", "cat", "coop",
                        "info", "jobs", "mobi", "museum", "name", "pro",
@@ -1002,7 +994,7 @@
                 undotted_domain = domain[1:]
             else:
                 undotted_domain = domain
-            embedded_dots = (undotted_domain.find(".") >= 0)
+            embedded_dots = ("." in undotted_domain)
             if not embedded_dots and domain != ".local":
                 _debug("   non-local domain %s contains no embedded dot",
                        domain)
@@ -1024,8 +1016,7 @@
             if (cookie.version > 0 or
                 (self.strict_ns_domain & self.DomainStrictNoDots)):
                 host_prefix = req_host[:-len(domain)]
-                if (host_prefix.find(".") >= 0 and
-                    not IPV4_RE.search(req_host)):
+                if ("." in host_prefix and not IPV4_RE.search(req_host)):
                     _debug("   host prefix %s for domain %s contains a dot",
                            host_prefix, domain)
                     return False
@@ -1462,13 +1453,13 @@
         else:
             path_specified = False
             path = request_path(request)
-            i = path.rfind("/")
-            if i != -1:
+            (path, sep, dummy) = path.rpartition("/")
+            if sep:
                 if version == 0:
                     # Netscape spec parts company from reality here
-                    path = path[:i]
+                    pass
                 else:
-                    path = path[:i+1]
+                    path = path + sep
             if len(path) == 0: path = "/"
 
         # set default domain
Index: Lib/cgi.py
===================================================================
--- Lib/cgi.py	(revision 51530)
+++ Lib/cgi.py	(working copy)
@@ -340,10 +340,10 @@
     key = plist.pop(0).lower()
     pdict = {}
     for p in plist:
-        i = p.find('=')
-        if i >= 0:
-            name = p[:i].strip().lower()
-            value = p[i+1:].strip()
+        (name, sep_found, value) = p.partition('=')
+        if (sep_found):
+            name = name.strip().lower()
+            value = value.strip()
             if len(value) >= 2 and value[0] == value[-1] == '"':
                 value = value[1:-1]
                 value = value.replace('\\\\', '\\').replace('\\"', '"')
Index: Lib/ConfigParser.py
===================================================================
--- Lib/ConfigParser.py	(revision 51530)
+++ Lib/ConfigParser.py	(working copy)
@@ -468,9 +468,9 @@
                         if vi in ('=', ':') and ';' in optval:
                             # ';' is a comment delimiter only if it follows
                             # a spacing character
-                            pos = optval.find(';')
-                            if pos != -1 and optval[pos-1].isspace():
-                                optval = optval[:pos]
+                            (new_optval, sep, comment) = optval.partition(';')
+                            if (sep and new_optval[-1:].isspace()):
+                                optval = new_optval
                         optval = optval.strip()
                         # allow empty values
                         if optval == '""':
@@ -599,14 +599,13 @@
         if depth > MAX_INTERPOLATION_DEPTH:
             raise InterpolationDepthError(option, section, rest)
         while rest:
-            p = rest.find("%")
-            if p < 0:
+            (before, sep, after) = rest.partition('%')
+            if (not sep):
                 accum.append(rest)
                 return
-            if p > 0:
-                accum.append(rest[:p])
-                rest = rest[p:]
-            # p is no longer used
+            elif (after):
+                accum.append(before)
+                rest = sep + after
             c = rest[1:2]
             if c == "%":
                 accum.append("%")

From jimjjewett at gmail.com  Thu Aug 24 03:10:40 2006
From: jimjjewett at gmail.com (Jim Jewett)
Date: Wed, 23 Aug 2006 21:10:40 -0400
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <ecirpd$p3t$1@sea.gmane.org>
References: <ca471dc20608221832s313d02a2gf01a5532eceedebf@mail.gmail.com>
	<20060823044148.GR5772@performancedrivers.com>
	<e3c648160608222346g4587d55eiff521787ca4d915f@mail.gmail.com>
	<ecirpd$p3t$1@sea.gmane.org>
Message-ID: <fb6fbf560608231810k6ad48da4t7f880d7718d1bb2d@mail.gmail.com>

On 8/23/06, Terry Reedy <tjreedy at udel.edu> wrote:
> 2. change find()'s error return to None, and remove index();

+1

It is particularly unfortunate that the error code of -1 is a valid index.

    >>> substring = string[string.find(marker):]

will silently produce garbage.
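
Jim's point fits in a couple of lines (the marker text here is purely illustrative):

```python
text = "no marker here"
# find() returns -1 when the substring is absent, and since -1 is a
# valid index, the slice quietly keeps only the last character
# instead of raising an error.
substring = text[text.find(">>"):]
print(substring)  # -> "e", silent garbage rather than an exception
```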

>  or possibly consider Josiah's idea of
> 3. remove both in favor of an index generator.

The strawman seemed clumsy, but maybe it will grow on me.

-jJ

From greg.ewing at canterbury.ac.nz  Thu Aug 24 03:36:25 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 24 Aug 2006 13:36:25 +1200
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <20060823123719.1A5D.JCARLSON@uci.edu>
References: <20060823114629.1A57.JCARLSON@uci.edu>
	<d11dcfba0608231207o43fd1237i19520b57c85673e1@mail.gmail.com>
	<20060823123719.1A5D.JCARLSON@uci.edu>
Message-ID: <44ED0299.7040204@canterbury.ac.nz>

Josiah Carlson wrote:

> Given search as a potential replacement, about the only question is
> whether count should default to sys.maxint or 1.

Do you think that there will be many use cases for
count values *other* than 1 or sys.maxint? If not,
it might be more sensible to have two functions,
search() and searchall().

And while we're on this, what about list.index?
Should it also be replaced with list.search or
whatever as well?

--
Greg

From tdelaney at avaya.com  Thu Aug 24 04:05:08 2006
From: tdelaney at avaya.com (Delaney, Timothy (Tim))
Date: Thu, 24 Aug 2006 12:05:08 +1000
Subject: [Python-3000] Droping find/rfind?
Message-ID: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com>

Nick Coghlan wrote:

> I also like Josiah's idea of replacing find() with a search() method
> that returned an iterator of indices, so that you can do:
> 
> for idx in string.search(sub):
>     # Process the indices (if any)

Need to be careful with this - the original search proposal returned a
list, which could be tested for a boolean value - hence:

    if not string.search(sub):
        pass

but if an iterator were returned, I think we would want to be able to
perform the same test i.e. search would have to return an iterator that
had already performed the initial search, with __nonzero__ reflecting
the result of that search. I do think that returning an iterator is
better due to the fact that most uses of search() would only care about
the first returned index.
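
One way to get both behaviours — an eager first lookup so truth-testing works, lazy iteration afterwards — is sketched below; `SearchIter` and its semantics are purely hypothetical, not an agreed API:

```python
import sys

class SearchIter:
    """Hypothetical str.search() result: the first match is computed
    eagerly so bool() reflects it; remaining matches are found lazily."""
    def __init__(self, s, sub, count=sys.maxsize):
        self._s, self._sub, self._count = s, sub, count
        self._first = s.find(sub)  # eager initial search

    def __bool__(self):  # __nonzero__ in the Python 2.x of this thread
        return self._first >= 0

    def __iter__(self):
        pos, emitted = self._first, 0
        while pos >= 0 and emitted < self._count:
            yield pos
            emitted += 1
            pos = self._s.find(self._sub, pos + 1)

hits = SearchIter("banana", "an")
print(bool(hits), list(hits))  # -> True [1, 3]
```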

Tim Delaney

From jcarlson at uci.edu  Thu Aug 24 04:14:08 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Wed, 23 Aug 2006 19:14:08 -0700
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com>
References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com>
Message-ID: <20060823191222.1A76.JCARLSON@uci.edu>


"Delaney, Timothy (Tim)" <tdelaney at avaya.com> wrote:
> 
> Nick Coghlan wrote:
> 
> > I also like Josiah's idea of replacing find() with a search() method
> > that returned an iterator of indices, so that you can do:
> > 
> > for idx in string.search(sub):
> >     # Process the indices (if any)
> 
> Need to be careful with this - the original search proposal returned a
> list, which could be tested for a boolean value - hence:
> 
>     if not string.search(sub):
>         pass
> 
> but if an iterator were returned, I think we would want to be able to
> perform the same test i.e. search would have to return an iterator that
> had already performed the initial search, with __nonzero__ reflecting
> the result of that search. I do think that returning an iterator is
> better due to the fact that most uses of search() would only care about
> the first returned index.

... which is why there is a count argument, which I have recently
suggested should default to 1.


 - Josiah


From jcarlson at uci.edu  Thu Aug 24 04:21:22 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Wed, 23 Aug 2006 19:21:22 -0700
Subject: [Python-3000] [Python-Dev] What should the focus for 2.6 be?
In-Reply-To: <eciqcp$lra$1@sea.gmane.org>
References: <20060823125951.1A60.JCARLSON@uci.edu> <eciqcp$lra$1@sea.gmane.org>
Message-ID: <20060823185143.1A73.JCARLSON@uci.edu>


"Terry Reedy" <tjreedy at udel.edu> wrote:
> "Josiah Carlson" <jcarlson at uci.edu> wrote in message 
> news:20060823125951.1A60.JCARLSON at uci.edu...
> > The intent of my post was to say that all of us want Py3k to succeed,
> 
> I should hope that we all do.
> 
> > but I believe that in order for it to succeed that breakage from the 2.x
> > series should be gradual, in a similar way to how 2.x -> 2.x+1 breakage
> > has been gradual.
> 
> Given that the rate of intentional breakage in the core language (including 
> builtins) has been very minimal, this would take a couple of decades, which 
> to my mind would be a failure.

If we could stick with a 12-18 month release schedule, using deprecation
and removal in subsequent releases, every removal could happen in 2-3
years. 2.6 could offer every feature of 3.0 (except for
backwards-incompatible syntax), warning of removal or relocation (in the
case of stdlib reorganization), 3.0 could handle all of the actual
syntax changes.


> > I believe we agree on this basic point
> 
> To the contrary, you seem to have a basic disagreement with the plan to 
> make all the core language changes at once and to clear the decks of old 
> baggage so we can move forward with a learner language that is a bit easier 
> to learn and remember.

I disagree with the "all the changes at once", but if Guido didn't agree
with a gradual upgrade path, then the 2.6-2.9 series wouldn't even be
considered as options, and we'd be looking at 3.0 coming out after 2.5,
and there not being a 2.6 .  Since 2.6 is planned, and other 2.x
releases are at least possible (if not expected), then I must agree with
someone, as my desires haven't previously been sufficient to change
Python release expectations.


> > according to your talk and your posts here, you want Py3k alpha
> > in the next year or two, while I'm thinking that Py3k alpha should come
> > somewhere after 2.6 and probably 2.7, maybe even after 2.8 or 2.9,
> 
> Whereas I wish it were already out and would be delighted to see it early 
> next year.  Some of the changes have already been put off for at least five 
> years and, to me, are overdue.

As a daily abuser of Python, I've not found the language to be lacking
in any area significant enough, or even having too many overlapping
features sufficient to warrant such widespread language breakage.  We
disagree on this point, and that's fine, as long as Guido agrees that
2.6+ make sense, which he does, and states as much in his talk and all
relevant postings I've seen, then I don't need to drug him.

He also agrees that 3.0 should come out sooner rather than later, but
that's not going to stop me from attempting to make the case that 3.0 is
going to be generally unused until later gradual 2.6+ releases close the
gap and make the transition more natural.

But hey, I'm just a guy who writes software who is going to have to
transition and maintain it. Obviously there can't be too many of us, go
ahead and break the language, I'm sure everyone will be happy to
upgrade to 3.0, you won't even need to maintain the 2.x series, really.

 - Josiah


From martin at v.loewis.de  Thu Aug 24 04:24:17 2006
From: martin at v.loewis.de (martin at v.loewis.de)
Date: Thu, 24 Aug 2006 04:24:17 +0200
Subject: [Python-3000] [Python-Dev] What should the focus for 2.6 be?
In-Reply-To: <20060823185143.1A73.JCARLSON@uci.edu>
References: <20060823125951.1A60.JCARLSON@uci.edu> <eciqcp$lra$1@sea.gmane.org>
	<20060823185143.1A73.JCARLSON@uci.edu>
Message-ID: <1156386257.44ed0dd1cf737@www.domainfactory-webmail.de>

Zitat von Josiah Carlson <jcarlson at uci.edu>:

> > To the contrary, you seem to have a basic disagreement with the plan to
> > make all the core language changes at once and to clear the decks of old
> > baggage so we can move forward with a learner language that is a bit easier
> > to learn and remember.
>
> I disagree with the "all the changes at once", but if Guido didn't agree
> with a gradual upgrade path, then the 2.6-2.9 series wouldn't even be
> considered as options, and we'd be looking at 3.0 coming out after 2.5,
> and there not being a 2.6 .  Since 2.6 is planned, and other 2.x
> releases are at least possible (if not expected), then I must agree with
> someone, as my desires haven't previously been sufficient to change
> Python release expectations.

That conclusion is invalid. 2.6, 2.7, ... are not made to gradually
move towards 3.0, but because it is anticipated that 3.0 will not
be adopted immediately, but, say, 3.2 might be. To provide new
features for 2.x users, new 2.x releases need to be made
(of course, the features added to, say, 2.7 will likely also
be added to, say, 3.3).

Regards,
Martin


From guido at python.org  Thu Aug 24 04:39:29 2006
From: guido at python.org (Guido van Rossum)
Date: Wed, 23 Aug 2006 19:39:29 -0700
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <20060823191222.1A76.JCARLSON@uci.edu>
References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com>
	<20060823191222.1A76.JCARLSON@uci.edu>
Message-ID: <ca471dc20608231939j4205a75dxe0072efc5065cea9@mail.gmail.com>

I don't find the current attempts to come up with a better substring
search API useful.

We did a lot of thinking about this not too long ago, and the result
was the addition of [r]partition() to 2.5 and the intent to drop
[r]find() from py3k as both redundant with [r]index() and error-prone
(I think I just found another bug in logging.__init__.py:

    def _fixupChildren(self, ph, alogger):
        """
        Ensure that children of the placeholder ph are connected to the
        specified logger.
        """
        #for c in ph.loggers:
        for c in ph.loggerMap.keys():
            if string.find(c.parent.name, alogger.name) <> 0:
                alogger.parent = c.parent
                c.parent = alogger

This is either a really weird way of writing "if not
c.parent.name.startswith(alogger.name):", or a bug which was intending
to write "if alogger.name in c.parent.name:" .
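
For what it's worth, the two spellings do agree: `find()` returns 0 exactly when the argument is a prefix, so `find(...) != 0` and `not startswith(...)` give the same answer in all three cases (prefix, later match, no match):

```python
cases = [("a.b.c", "a.b"),   # prefix      -> find() == 0
         ("x.a.b", "a.b"),   # later match -> find() > 0
         ("xyz",   "a.b")]   # no match    -> find() == -1
for name, prefix in cases:
    assert (name.find(prefix) != 0) == (not name.startswith(prefix))
```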

I appreciate the criticism on the patch -- clearly it's not ready to
go in, and more work needs to be put in to actually *improve* the
code, using [r]partition() where necessary, etc. But I'm strengthened
in the conclusion that find() is way overused and we don't need yet
another search primitive. TOOWTDI.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From holmesbj.dev at gmail.com  Thu Aug 24 05:38:08 2006
From: holmesbj.dev at gmail.com (Brian Holmes)
Date: Wed, 23 Aug 2006 20:38:08 -0700
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <ecirpd$p3t$1@sea.gmane.org>
References: <ca471dc20608221832s313d02a2gf01a5532eceedebf@mail.gmail.com>
	<20060823044148.GR5772@performancedrivers.com>
	<e3c648160608222346g4587d55eiff521787ca4d915f@mail.gmail.com>
	<ecirpd$p3t$1@sea.gmane.org>
Message-ID: <e3c648160608232038w7107263fx80938c0908298b82@mail.gmail.com>

On 8/23/06, Terry Reedy <tjreedy at udel.edu> wrote:
>
>
> "Brian Holmes" <holmesbj.dev at gmail.com> wrote in message
> news:e3c648160608222346g4587d55eiff521787ca4d915f at mail.gmail.com...
>
> >Even after reading Terry Reedy's arguments, I don't see why we need to
> > >remove this option.
>
> Since this is my first post in this current thread, you either meant
> someone else or are remembering my posts about in- and out-of-band error
> signaling from the last time we discussed this.
>

My reference was to this post:

http://mail.python.org/pipermail/python-dev/2005-August/055717.html

- Brian
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060823/7848eb65/attachment.html 

From talin at acm.org  Thu Aug 24 05:38:06 2006
From: talin at acm.org (Talin)
Date: Wed, 23 Aug 2006 20:38:06 -0700
Subject: [Python-3000] [Python-Dev] What should the focus for 2.6 be?
In-Reply-To: <20060823185143.1A73.JCARLSON@uci.edu>
References: <20060823125951.1A60.JCARLSON@uci.edu> <eciqcp$lra$1@sea.gmane.org>
	<20060823185143.1A73.JCARLSON@uci.edu>
Message-ID: <44ED1F1E.1080307@acm.org>

Josiah Carlson wrote:
> "Terry Reedy" <tjreedy at udel.edu> wrote:
>> "Josiah Carlson" <jcarlson at uci.edu> wrote in message 
>> news:20060823125951.1A60.JCARLSON at uci.edu...
>>> The intent of my post was to say that all of us want Py3k to succeed,
>> I should hope that we all do.
>>
>>> but I believe that in order for it to succeed that breakage from the 2.x
>>> series should be gradual, in a similar way to how 2.x -> 2.x+1 breakage
>>> has been gradual.
>> Given that the rate of intentional breakage in the core language (including 
>> builtins) has been very minimal, this would take a couple of decades, which 
>> to my mind would be a failure.
> 
> If we could stick with a 12-18 month release schedule, using deprecation
> and removal in subsequent releases, every removal could happen in 2-3
> years. 2.6 could offer every feature of 3.0 (except for
> backwards-incompatible syntax), warning of removal or relocation (in the
> case of stdlib reorganization), 3.0 could handle all of the actual
> syntax changes.

2.6 should also include a powerful 'lint' option that detects use of 
features not compatible with 3.0. Something like "from __future__ import 
pedantic" or something along those lines.

-- Talin



From jcarlson at uci.edu  Thu Aug 24 07:07:29 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Wed, 23 Aug 2006 22:07:29 -0700
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <44ED0299.7040204@canterbury.ac.nz>
References: <20060823123719.1A5D.JCARLSON@uci.edu>
	<44ED0299.7040204@canterbury.ac.nz>
Message-ID: <20060823220213.1A7C.JCARLSON@uci.edu>


Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Josiah Carlson wrote:
> > Given search as a potential replacement, about the only question is
> > whether count should default to sys.maxint or 1.
> 
> Do you think that there will be many use cases for
> count values *other* than 1 or sys.maxint? If not,
> it might be more sensible to have two functions,
> search() and searchall().

I have used str.split with counts != 1 or sys.maxint, and I would guess
that there would be similar use-cases.

> And while we're on this, what about list.index?
> Should it also be replaced with list.search or
> whatever as well?

To be consistent from a sequence operation perspective, I would say yes,
though I have so rarely used list.index(), I'm hard-pressed to have much
of an opinion.

 - Josiah


From jcarlson at uci.edu  Thu Aug 24 07:20:43 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Wed, 23 Aug 2006 22:20:43 -0700
Subject: [Python-3000] [Python-Dev] What should the focus for 2.6 be?
In-Reply-To: <1156386257.44ed0dd1cf737@www.domainfactory-webmail.de>
References: <20060823185143.1A73.JCARLSON@uci.edu>
	<1156386257.44ed0dd1cf737@www.domainfactory-webmail.de>
Message-ID: <20060823203502.1A79.JCARLSON@uci.edu>


martin at v.loewis.de wrote:
> 
> Zitat von Josiah Carlson <jcarlson at uci.edu>:
> 
> > > To the contrary, you seem to have a basic disagreement with the plan to
> > > make all the core language changes at once and to clear the decks of old
> > > baggage so we can move forward with a learner language that is a bit easier
> > > to learn and remember.
> >
> > I disagree with the "all the changes at once", but if Guido didn't agree
> > with a gradual upgrade path, then the 2.6-2.9 series wouldn't even be
> > considered as options, and we'd be looking at 3.0 coming out after 2.5,
> > and there not being a 2.6 .  Since 2.6 is planned, and other 2.x
> > releases are at least possible (if not expected), then I must agree with
> > someone, as my desires haven't previously been sufficient to change
> > Python release expectations.
> 
> That conclusion is invalid. 2.6, 2.7, ... are not made to gradually
> move towards 3.0, but because it is anticipated that 3.0 will not
> be adopted immediately, but, say, 3.2 might be. To provide new
> features for 2.x users, new 2.x releases need to be made
> (of course, the features added to, say, 2.7 will likely also
> be added to, say, 3.3).

See Guido's reply here:
http://mail.python.org/pipermail/python-3000/2006-August/003105.html

Specifically his response to the "Here's my suggestion:" paragraph. 
Unless I completely misunderstood his response, and his later asking
whether I want to help author the transition PEP (presumably for at
least dict.keys(), but more likely from 2.x to 3.x), I can't help but
believe that he also wants at least an attempt at some gradual change
for users with cold feet about breaking everything in one go.

Also, in the talk he gave at Google on July 21, somewhere around the
7:45-11 minute mark, he talks about how 3.x features are to be
backported to 2.7 or so, specifically so that there is a larger subset
of Python that will run in both 2.x and 3.x .  Smells like an attempt at
gradual migration to me.


 - Josiah


From steven.bethard at gmail.com  Thu Aug 24 08:31:54 2006
From: steven.bethard at gmail.com (Steven Bethard)
Date: Thu, 24 Aug 2006 00:31:54 -0600
Subject: [Python-3000] find -> index patch
In-Reply-To: <20060824003948.GT5772@performancedrivers.com>
References: <ca471dc20608231418v21e93634o44139017227d1a2b@mail.gmail.com>
	<20060824003948.GT5772@performancedrivers.com>
Message-ID: <d11dcfba0608232331k2fe9819t659bc219c0685590@mail.gmail.com>

On 8/23/06, Jack Diederich <jack at psynchronous.com> wrote:
> On Wed, Aug 23, 2006 at 02:18:59PM -0700, Guido van Rossum wrote:
> > Here's the patch (by Hasan Diwan, BTW) for people's perusal. It just
> > gets rid of all *uses* of find/rfind from Lib; it doesn't actually
> > modify stringobject.c or unicodeobject.c. It doesn't use
> > [r]partition()'; someone could look for opportunities to use that
> > separately.
>
> I made a go at doing an idiomatic conversion of the first few modules
> tagged by 'grep find( *.py' in Lib, patch attached.
>
> WOW, I love partition.

After looking at your patch, I have to agree.  The new code is *way*
more readable.

Nice work!

STeVe
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From ncoghlan at gmail.com  Thu Aug 24 11:38:44 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 24 Aug 2006 19:38:44 +1000
Subject: [Python-3000] find -> index patch
In-Reply-To: <20060824003948.GT5772@performancedrivers.com>
References: <ca471dc20608231418v21e93634o44139017227d1a2b@mail.gmail.com>
	<20060824003948.GT5772@performancedrivers.com>
Message-ID: <44ED73A4.40208@gmail.com>

Jack Diederich wrote:
> If folks like the way this partial set looks I'll convert the rest.

+1 from here (beautifying the standard lib was one of the justifications for 
partition, after all).

> ------------------------------------------------------------------------
> 
> Index: Lib/CGIHTTPServer.py
> ===================================================================
> --- Lib/CGIHTTPServer.py	(revision 51530)
> +++ Lib/CGIHTTPServer.py	(working copy)
> @@ -106,16 +106,9 @@
>      def run_cgi(self):
>          """Execute a CGI script."""
>          dir, rest = self.cgi_info
> -        i = rest.rfind('?')
> -        if i >= 0:
> -            rest, query = rest[:i], rest[i+1:]
> -        else:
> -            query = ''
> -        i = rest.find('/')
> -        if i >= 0:
> -            script, rest = rest[:i], rest[i:]
> -        else:
> -            script, rest = rest, ''
> +        (rest, sep, query) = rest.rpartition('?')
> +        (rest, sep, script) = rest.partition('/')
> +        rest = sep + rest # keep the slash

rest & script are back to front on the second line of the new bit.
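
With the names swapped back, the new hunk would behave as intended (a sketch; the sample path is made up for illustration):

```python
rest = "script.py/pathinfo?key=val"     # hypothetical cgi_info remainder
rest, sep, query = rest.rpartition('?')
script, sep, rest = rest.partition('/')  # script before rest this time
rest = sep + rest                        # keep the slash
print(script, rest, query)  # -> script.py /pathinfo key=val
```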

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From ncoghlan at gmail.com  Thu Aug 24 11:45:58 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 24 Aug 2006 19:45:58 +1000
Subject: [Python-3000] [Python-Dev] What should the focus for 2.6 be?
In-Reply-To: <20060823203502.1A79.JCARLSON@uci.edu>
References: <20060823185143.1A73.JCARLSON@uci.edu>	<1156386257.44ed0dd1cf737@www.domainfactory-webmail.de>
	<20060823203502.1A79.JCARLSON@uci.edu>
Message-ID: <44ED7556.7010306@gmail.com>

Josiah Carlson wrote:
> Also, in the talk he gave at Google on July 21, somewhere around the
> 7:45-11 minute mark, he talks about how 3.x features are to be
> backported to 2.7 or so, specifically so that there is a larger subset
> of Python that will run in both 2.x and 3.x .  Smells like an attempt at
> gradual migration to me.

He also said that he doesn't expect Python 3.0 to see widespread usage, with a 
relatively rapid evolution to 3.1 (and possibly even 3.2).

I don't think there's really that much disagreement here - the difference is 
that Guido wants to get 3.0 out early so that we *know* what the eventual 
target is for later 2.x releases.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From fredrik at pythonware.com  Thu Aug 24 12:35:54 2006
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Thu, 24 Aug 2006 12:35:54 +0200
Subject: [Python-3000] find -> index patch
References: <ca471dc20608231418v21e93634o44139017227d1a2b@mail.gmail.com>
Message-ID: <ecjvea$jk9$1@sea.gmane.org>

Guido van Rossum wrote:

> Here's the patch (by Hasan Diwan, BTW) for people's perusal. It just
> gets rid of all *uses* of find/rfind from Lib; it doesn't actually
> modify stringobject.c or unicodeobject.c. It doesn't use
> [r]partition()'; someone could look for opportunities to use that
> separately.

since most of the changes appear to be variations of the pattern

    - index = foo.find(bar)
    + try:
    +     index = foo.index(bar)
    + except:
    +     index = -1

it sure looks like the "get rid of find; it's the same thing as index" idea might
be somewhat misguided.  I think I'm "idea".find("good") on this one.  better
use this energy on partitionifying the 2.6 standard library instead.

</F> 




From fredrik at pythonware.com  Thu Aug 24 12:51:20 2006
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Thu, 24 Aug 2006 12:51:20 +0200
Subject: [Python-3000] Droping find/rfind?
References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com><20060823191222.1A76.JCARLSON@uci.edu>
	<ca471dc20608231939j4205a75dxe0072efc5065cea9@mail.gmail.com>
Message-ID: <eck0b8$mno$1@sea.gmane.org>

Guido van Rossum wrote:

>        for c in ph.loggerMap.keys():
>            if string.find(c.parent.name, alogger.name) <> 0:
>                alogger.parent = c.parent
>                c.parent = alogger
>
> This is either a really weird way of writing "if not
> c.parent.name.startswith(alogger.name):"

weird, indeed, but it could be a premature attempt to optimize away the slicing
for platforms that don't have "startswith" (it doesn't look like a bug, afaict).

(on the other hand, "s[:len(t)] == t" is usually faster than "s.startswith(t)" for short
prefixes, so maybe someone should have done a bit more benchmarking...)
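
The claim is easy to check with `timeit` (results vary by interpreter and prefix length, so no numbers are asserted here):

```python
import timeit

setup = "s = 'logging.handlers'; t = 'logging'"
for expr in ("s[:len(t)] == t", "s.startswith(t)"):
    seconds = timeit.timeit(expr, setup=setup, number=1_000_000)
    print(expr, seconds)
```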

(which reminds me that speeding up handling of optional arguments to C functions
would be an even better use of this energy)

</F> 




From walter at livinglogic.de  Thu Aug 24 12:56:37 2006
From: walter at livinglogic.de (Walter Dörwald)
Date: Thu, 24 Aug 2006 12:56:37 +0200
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <ca471dc20608231939j4205a75dxe0072efc5065cea9@mail.gmail.com>
References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com>	<20060823191222.1A76.JCARLSON@uci.edu>
	<ca471dc20608231939j4205a75dxe0072efc5065cea9@mail.gmail.com>
Message-ID: <44ED85E5.1000005@livinglogic.de>

Guido van Rossum wrote:

> I don't find the current attempts to come up with a better substring
> search API useful.
> 
> [...]
>
> I appreciate the criticism on the patch -- clearly it's not ready to
> go in, and more work needs to be put in to actually *improve* the
> code, using [r]partition() where necessary, etc. But I'm strengthened
> in the conclusion that find() is way overused and we don't need yet
> another search primitive. TOOWTDI.

I don't see what's wrong with find() per se. IMHO in the following use
case find() is the best option: Find the occurrences of "{foo bar}"
patterns in the string and return both parts as a tuple. Return (None,
"text") for the parts between the patterns, i.e. for
   'foo{spam eggs}bar{foo bar}'
return
   [(None, 'foo'), ('spam', 'eggs'), (None, 'bar'), ('foo', 'bar')]

Using find(), the code looks like this:

def splitfind(s):
    pos = 0
    while True:
        posstart = s.find("{", pos)
        if posstart < 0:
            break
        posarg = s.find(" ", posstart)
        if posarg < 0:
            break
        posend = s.find("}", posarg)
        if posend < 0:
            break
        prefix = s[pos:posstart]
        if prefix:
            yield (None, prefix)
        yield (s[posstart+1:posarg], s[posarg+1:posend])
        pos = posend+1
    rest = s[pos:]
    if rest:
        yield (None, rest)

Using index() looks worse to me. The code is buried under the exception
handling:

def splitindex(s):
    pos = 0
    while True:
        try:
            posstart = s.index("{", pos)
        except ValueError:
            break
        try:
            posarg = s.index(" ", posstart)
        except ValueError:
            break
        try:
            posend = s.index("}", posarg)
        except ValueError:
            break
        prefix = s[pos:posstart]
        if prefix:
            yield (None, prefix)
        yield (s[posstart+1:posarg], s[posarg+1:posend])
        pos = posend+1
    rest = s[pos:]
    if rest:
        yield (None, rest)

Using partition() might have a performance problem if the input string
is long.
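
For comparison, a partition()-based version of the same generator (a sketch: each partition() call copies the remainder, which is the cost mentioned above, and it silently drops a malformed tail instead of re-emitting it like splitfind() does):

```python
def splitpartition(s):
    rest = s
    while True:
        prefix, brace, rest = rest.partition("{")
        if prefix:
            yield (None, prefix)
        if not brace:
            return
        first, space, rest = rest.partition(" ")
        if not space:
            return
        second, close, rest = rest.partition("}")
        if not close:
            return
        yield (first, second)
```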

Servus,
   Walter


From ncoghlan at gmail.com  Thu Aug 24 13:48:22 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 24 Aug 2006 21:48:22 +1000
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <44ED85E5.1000005@livinglogic.de>
References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com>	<20060823191222.1A76.JCARLSON@uci.edu>	<ca471dc20608231939j4205a75dxe0072efc5065cea9@mail.gmail.com>
	<44ED85E5.1000005@livinglogic.de>
Message-ID: <44ED9206.1080306@gmail.com>

Walter Dörwald wrote:
> Guido van Rossum wrote:
> 
>> I don't find the current attempts to come up with a better substring
>> search API useful.
>>
>> [...]
>>
>> I appreciate the criticism on the patch -- clearly it's not ready to
>> go in, and more work needs to be put in to actually *improve* the
>> code, using [r]partition() where necessary, etc. But I'm strengthened
>> in the conclusion that find() is way overused and we don't need yet
>> another search primitive. TOOWTDI.
> 
> I don't see what's wrong with find() per se. IMHO in the following use
> case find() is the best option: Find the occurrences of "{foo bar}"
> patterns in the string and return both parts as a tuple. Return (None,
> "text") for the parts between the patterns, i.e. for
>    'foo{spam eggs}bar{foo bar}'
> return
>    [(None, 'foo'), ('spam', 'eggs'), (None, 'bar'), ('foo', 'bar')]

With a variety of "view types", that work like the corresponding builtin type, 
but reference the original data structure instead of creating copies, then you 
could use partition without having to worry about poor performance on large 
strings:

def splitview(s):
    rest = strview(s)
    while True:
        prefix, found, rest = rest.partition("{")
        if prefix:
            yield (None, str(prefix))
        if not found:
            break
        first, found, rest = rest.partition(" ")
        if not found:
            break
        second, found, rest = rest.partition("}")
        if not found:
            break
        yield (str(first), str(second))
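The same three-way partition loop can be run today with plain str (dropping the hypothetical strview, at the cost of copying the tail on each call); splitting on "{", " " and "}" in turn reproduces Walter's expected output:

```python
def splitview_str(s):
    # Plain-str sketch of the partition-based loop; each partition()
    # call copies the remaining tail of the string.
    rest = s
    while True:
        prefix, found, rest = rest.partition("{")
        if prefix:
            yield (None, prefix)
        if not found:
            break
        first, found, rest = rest.partition(" ")
        if not found:
            break
        second, found, rest = rest.partition("}")
        if not found:
            break
        yield (first, second)

print(list(splitview_str('foo{spam eggs}bar{foo bar}')))
# [(None, 'foo'), ('spam', 'eggs'), (None, 'bar'), ('foo', 'bar')]
```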

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From fredrik at pythonware.com  Thu Aug 24 14:33:02 2006
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Thu, 24 Aug 2006 14:33:02 +0200
Subject: [Python-3000] Droping find/rfind?
References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com>	<20060823191222.1A76.JCARLSON@uci.edu>	<ca471dc20608231939j4205a75dxe0072efc5065cea9@mail.gmail.com><44ED85E5.1000005@livinglogic.de>
	<44ED9206.1080306@gmail.com>
Message-ID: <eck69v$b6n$1@sea.gmane.org>

Nick Coghlan wrote:

> With a variety of "view types", that work like the corresponding builtin type,
> but reference the original data structure instead of creating copies

support for string views would require some serious interpreter surgery, though,
and probably break quite a few extensions...

</F> 




From mcherm at mcherm.com  Thu Aug 24 14:44:50 2006
From: mcherm at mcherm.com (Michael Chermside)
Date: Thu, 24 Aug 2006 05:44:50 -0700
Subject: [Python-3000] find -> index patch
Message-ID: <20060824054450.x8w46l05kz488004@login.werra.lunarpages.com>

Jack Diederich writes:
> I make a go at doing an idiomatic conversion [...] patch attached.
>
> WOW, I love partition.  In all the instances that weren't a simple "in"
> test I ended up using [r]partition.  In some cases one of the returned
> strings gets thrown away but in those cases it is guaranteed to be small.
> The new code is usually smaller than the old and generally clearer.

Wow. That's just beautiful. This has now convinced me that dumping
[r]find() (at least!) and pushing people toward using partition will
result in pain in the short term (of course), and beautiful, readable
code in the long term.

> If folks like the way this partial set looks I'll convert the rest.

Please do! Even if we *retain* [r]find(), this is still better code.
And I'm personally going to stop using [r]find() in my own code
starting today.

-- Michael Chermside


From fredrik at pythonware.com  Thu Aug 24 15:48:57 2006
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Thu, 24 Aug 2006 15:48:57 +0200
Subject: [Python-3000] find -> index patch
References: <20060824054450.x8w46l05kz488004@login.werra.lunarpages.com>
Message-ID: <eckao9$rq9$1@sea.gmane.org>

Michael Chermside wrote:

>> WOW, I love partition.  In all the instances that weren't a simple "in"
>> test I ended up using [r]partition.  In some cases one of the returned
>> strings gets thrown away but in those cases it is guaranteed to be small.
>> The new code is usually smaller than the old and generally clearer.
>
> Wow. That's just beautiful. This has now convinced me that dumping
> [r]find() (at least!) and pushing people toward using partition will
> result in pain in the short term (of course), and beautiful, readable
> code in the long term.

note that partition provides an elegant solution to an important *subset* of all
problems addressed by find/index.

just like lexical scoping vs. default arguments and map vs. list comprehensions,
it doesn't address all problems right out of the box, and shouldn't be advertised
as doing that.

</F> 




From gmccaughan at synaptics-uk.com  Thu Aug 24 16:21:11 2006
From: gmccaughan at synaptics-uk.com (Gareth McCaughan)
Date: Thu, 24 Aug 2006 15:21:11 +0100
Subject: [Python-3000] find -> index patch
In-Reply-To: <eckao9$rq9$1@sea.gmane.org>
References: <20060824054450.x8w46l05kz488004@login.werra.lunarpages.com>
	<eckao9$rq9$1@sea.gmane.org>
Message-ID: <200608241521.13007.gmccaughan@synaptics-uk.com>

Fredrik Lundh wrote:

> note that partition provides an elegant solution to an important *subset* of all
> problems addressed by find/index.
> 
> just like lexical scoping vs. default arguments and map vs. list comprehensions,
> it doesn't address all problems right out of the box, and shouldn't be advertised
> as doing that.

Sure, but partition + "in" (now that it works as an arbitrary substring test)
seem to cover a very large subset of the things you'd want to do with find:
enough that having only index available for the remaining cases is unlikely
to hurt much (apart from the important issue of backward compatibility, but
this *is* py3k). I'm having trouble thinking of any plausible counterexamples,
though I'm sure there must be some.
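As an illustration of the subset this covers, here is a common find() idiom next to its partition() equivalent (a hypothetical example, not from the patch):

```python
s = "key=value"

# find()-based idiom: index arithmetic plus a -1 sentinel check.
i = s.find("=")
if i >= 0:
    key, value = s[:i], s[i + 1:]
else:
    key, value = s, ""

# partition()-based equivalent: one call, no sentinel, no slicing.
key2, sep, value2 = s.partition("=")

assert (key, value) == (key2, value2)
```

When the separator is absent, partition() returns (s, '', ''), which matches the fallback branch of the find() version.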

-- 
g



From nnorwitz at gmail.com  Thu Aug 24 16:25:04 2006
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Thu, 24 Aug 2006 10:25:04 -0400
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <eck0b8$mno$1@sea.gmane.org>
References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com>
	<20060823191222.1A76.JCARLSON@uci.edu>
	<ca471dc20608231939j4205a75dxe0072efc5065cea9@mail.gmail.com>
	<eck0b8$mno$1@sea.gmane.org>
Message-ID: <ee2a432c0608240725g2d23d45bm958f8135be20029d@mail.gmail.com>

On 8/24/06, Fredrik Lundh <fredrik at pythonware.com> wrote:
>
> (which reminds me that speeding up handling of optional arguments to C functions
> would be an even better use of this energy)

If this patch:  http://python.org/sf/1107887 is integrated with some
of my current work, it should do the job nicely.  IIRC the patch uses
a big switch which sped things up, but Raymond didn't like it (I think
more on a conceptual basis).  I don't think it slowed things down
measurably.

My new approach has been to add a C function pointer to PyCFunction
and some other 'function' objects that can dispatch to an appropriate
function in ceval.c that does the right thing.  I define a bunch of
little methods that are determined when the function is created and
only do what's necessary depending on the ml_flags.  It could be
expanded to look at other things.  The current work hasn't produced
any measurable changes in perf, but I've only gotten rid of a few
comparisons and/or a possible function call (if it isn't inlined).  If
I merge these two approaches, I should be able to speed up
cases like you describe.

n

From guido at python.org  Thu Aug 24 16:27:11 2006
From: guido at python.org (Guido van Rossum)
Date: Thu, 24 Aug 2006 07:27:11 -0700
Subject: [Python-3000] [Python-Dev] What should the focus for 2.6 be?
In-Reply-To: <20060823203502.1A79.JCARLSON@uci.edu>
References: <20060823185143.1A73.JCARLSON@uci.edu>
	<1156386257.44ed0dd1cf737@www.domainfactory-webmail.de>
	<20060823203502.1A79.JCARLSON@uci.edu>
Message-ID: <ca471dc20608240727h896db39j79a636e5e5c81dff@mail.gmail.com>

On 8/23/06, Josiah Carlson <jcarlson at uci.edu> wrote:
> Specifically his response to the "Here's my suggestion:" paragraph.
> Unless I completely misunderstood his response, and his later asking
> whether I want to help author the transition PEP (presumably for at
> least dict.keys(), but more likely from 2.x to 3.x), I can't help but
> believe that he also wants at least an attempt at some gradual change
> for users with cold feet about breaking everything in one go.
>
> Also, in the talk he gave at Google on July 21, somewhere around the
> 7:45-11 minute mark, he talks about how 3.x features are to be
> backported to 2.7 or so, specifically so that there is a larger subset
> of Python that will run in both 2.x and 3.x .  Smells like an attempt at
> gradual migration to me.

Since you're trying to channel me, and I'm right here listening to you
(and annoyed that you are wasting my time), I need to clarify. What I
*don't* want to happen is that Python 2.6, 2.7, and so on keep changing
the language from under users' feet, requiring constant code changes
to keep up, so that by the time the 2.9 -> 3.0 transition comes it
will feel pretty much the same as 2.4 -> 2.5. That would be bad
because it would mean that for every transition users would have to
make a lot of changes. (Pretty much the only changes like that planned
are increasing deprecation warnings for string exceptions, and making
'with' and 'as' unconditional keywords in 2.6.) 3.0 (or 3.2) will feel
like a big change and will require a combination of automatic and
manual explicit conversion, sometimes guided by warnings produced by
Python 2.x in "future-proof-lint" mode (see (a) below).

What I *do* want to do is:

(a) Add an option to Python 2.6 or 2.7 that starts spewing out
warnings about certain things that will change semantics in 3.0 and
are hard to detect by source code inspection alone, just like the
current -Q option. This could detect uses of range(), zip() or
dict.keys() result values incompatible with the iterators or views
that these will return in 3.0. But there will be no pressure to change
such code before the 3.0 transition, and those warnings will be off by
default.

(b) Provide access to the new syntax, without dropping the old syntax,
whenever it can be done without introducing new keywords, or through
__future__ syntax.

But these approaches alone cannot cover all cases. While we can
probably backport the new I/O library, there won't be a way to test it
in a world where str and unicode are the same (unless your app runs on
Jython or IronPython). The str/unicode unification and the int/long
unification, taking just two examples, just can't be backported to
Python 2.x, since they require pervasive and deep changes to the
implementation everywhere.

Another change that is unlikely to be available in 2.x is the
rationalization of comparisons. In 3.0, "1 < 'abc'" will raise a
TypeError; there's just no way to backport this behavior, since again
it requires pervasive changes to the implementation.

I know that you are dreaming of a world where all transitions are
easy. But it's just a dream. 3.0 will require hard work and for many
large apps it will take years to migrate -- the best approach is
probably to make it coincide with a planned major rewrite of the app.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at v.loewis.de  Thu Aug 24 16:33:12 2006
From: martin at v.loewis.de (martin at v.loewis.de)
Date: Thu, 24 Aug 2006 16:33:12 +0200
Subject: [Python-3000] find -> index patch
In-Reply-To: <20060824003948.GT5772@performancedrivers.com>
References: <ca471dc20608231418v21e93634o44139017227d1a2b@mail.gmail.com>
	<20060824003948.GT5772@performancedrivers.com>
Message-ID: <1156429992.44edb8a8e5794@www.domainfactory-webmail.de>

Zitat von Jack Diederich <jack at psynchronous.com>:


> +        if (sep_found):

This should be

           if sep_found:

> If folks like the way this partial set looks I'll convert the rest.

Otherwise, it looks fine.

Martin





From thomas at python.org  Thu Aug 24 16:55:51 2006
From: thomas at python.org (Thomas Wouters)
Date: Thu, 24 Aug 2006 16:55:51 +0200
Subject: [Python-3000] [Python-Dev] What should the focus for 2.6 be?
In-Reply-To: <ca471dc20608240727h896db39j79a636e5e5c81dff@mail.gmail.com>
References: <20060823185143.1A73.JCARLSON@uci.edu>
	<1156386257.44ed0dd1cf737@www.domainfactory-webmail.de>
	<20060823203502.1A79.JCARLSON@uci.edu>
	<ca471dc20608240727h896db39j79a636e5e5c81dff@mail.gmail.com>
Message-ID: <9e804ac0608240755x1d5b4406r902d3154157f9fd9@mail.gmail.com>

On 8/24/06, Guido van Rossum <guido at python.org> wrote:

> I know that you are dreaming of a world where all transitions are
> easy. But it's just a dream. 3.0 will require hard work and for many
> large apps it will take years to migrate -- the best approach is
> probably to make it coincide with a planned major rewrite of the app.


I agree with everything you said, except this. Yes, Python 2.x -> 3.x will
always be a large step, no matter which 'x' you take. That shouldn't (and
doesn't, so far) mean you can't write code that works fine in both 2.x and
3.x, and transitioning applications from 2.x-only code to 2.x-and-3.x code
could then be done incrementally. It would probably need support from future
2.x releases in order to make that possible, but it shouldn't affect 3.x. It
will still be a rather big effort from applications, but not any bigger than
porting to 3.x in the first place.

-- 
Thomas Wouters <thomas at python.org>

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060824/00c0f085/attachment.htm 

From jcarlson at uci.edu  Thu Aug 24 18:01:27 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Thu, 24 Aug 2006 09:01:27 -0700
Subject: [Python-3000] [Python-Dev] What should the focus for 2.6 be?
In-Reply-To: <ca471dc20608240727h896db39j79a636e5e5c81dff@mail.gmail.com>
References: <20060823203502.1A79.JCARLSON@uci.edu>
	<ca471dc20608240727h896db39j79a636e5e5c81dff@mail.gmail.com>
Message-ID: <20060824084759.1A82.JCARLSON@uci.edu>


"Guido van Rossum" <guido at python.org> wrote:
> What I *do* want to do is:
> 
> (a) Add an option to Python 2.6 or 2.7 that starts spewing out
> warnings about certain things that will change semantics in 3.0 and
> are hard to detect by source code inspection alone, just like the
> current -Q option. This could detect uses of range(), zip() or
> dict.keys() result values incompatible with the iterators or views
> that these will return in 3.0. But there will be no pressure to change
> such code before the 3.0 transition, and those warnings will be off by
> default.
> 
> (b) Provide access to the new syntax, without dropping the old syntax,
> whenever it can be done without introducing new keywords, or through
> __future__ syntax.

Both of these things are also what I want.


> But these approaches alone cannot cover all cases. While we can
> probably backport the new I/O library, there won't be a way to test it
> in a world where str and unicode are the same (unless your app runs on
> Jython or IronPython). The str/unicode unification and the int/long
> unification, taking just two examples, just can't be backported to
> Python 2.x, since they require pervasive and deep changes to the
> implementation everywhere.
> 
> Another change that is unlikely to be available in 2.x is the
> rationalization of comparisons. In 3.0, "1 < 'abc'" will raise a
> TypeError; there's just no way to backport this behavior, since again
> it requires pervasive changes to the implementation.
> 
> I know that you are dreaming of a world where all transitions are
> easy. But it's just a dream. 3.0 will require hard work and for many
> large apps it will take years to migrate -- the best approach is
> probably to make it coincide with a planned major rewrite of the app.
> 

Easy change would be nice, but working towards everyone having an easy
transition would take quite a bit of time and effort, more time and
effort than I think *anyone* is really willing to put forward.

What I want is for the transition not to be hard.  Backporting new
modules is one way of doing this; offering an import hook to gain access
to a new standard library organization (wxPython uses a method of
renaming objects that has worked quite well in their wx namespace
transition, which might be usable here), deprecation warnings,
__future__, etc., are others.  All of these are mechanisms I see as
steps towards making the 2.x -> 3.x transition not quite so hard.

Ultimately the features/syntax/semantics that cannot be backported will
make the last transition hill a bit tougher to climb than the previous
2.x->2.x+1 ones, but people should have had ample warning for the most
part, and I hope they won't have terrible difficulties with the final set
of changes necessary to go from 2.x to 3.x.

 - Josiah


From jimjjewett at gmail.com  Thu Aug 24 18:37:35 2006
From: jimjjewett at gmail.com (Jim Jewett)
Date: Thu, 24 Aug 2006 12:37:35 -0400
Subject: [Python-3000] sort vs order (was: What should the focus for 2.6 be?)
Message-ID: <fb6fbf560608240937p4532197cycf65d99862b13a75@mail.gmail.com>

On 8/24/06, Guido van Rossum <guido at python.org> wrote:
> Another change that is unlikely to be available in 2.x is the
> rationalization of comparisons. In 3.0, "1 < 'abc'" will raise a
> TypeError; there's just no way to backport this behavior, since again
> it requires pervasive changes to the implementation.

I still believe that this breaks an important current use case for
sorting, but maybe the right answer is a different (but similar) API.

Given an arbitrary collection of objects, I want to be able to order
them in a consistent manner, at least within a single interpreter
session.  (Consistency across sessions/machines/persistence/etc would
be even better, but isn't essential.)

The current sort method works pretty well; the new one wouldn't.  It
would be enough (and arguably an improvement, because of broken
objects) if there were a consistent_order equivalent that just caught
the TypeError and then tried a fallback for you until it found an
answer.

-jJ

From guido at python.org  Thu Aug 24 18:44:48 2006
From: guido at python.org (Guido van Rossum)
Date: Thu, 24 Aug 2006 09:44:48 -0700
Subject: [Python-3000] sort vs order (was: What should the focus for 2.6
	be?)
In-Reply-To: <fb6fbf560608240937p4532197cycf65d99862b13a75@mail.gmail.com>
References: <fb6fbf560608240937p4532197cycf65d99862b13a75@mail.gmail.com>
Message-ID: <ca471dc20608240944u3f171882r641d76c1682b618@mail.gmail.com>

For doctests etc., it's easy to create a consistent order:

  sorted(X, key=lambda x: (str(type(x)), x))

This sorts by the name of the type first, then by value within each
type. This is assuming the type itself is sortable -- in 3.0, many
types won't be sortable, e.g. dicts. (Even in 2.x, sets implement < so
differently that a list of sets is likely to cause problems when
sorting.)
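Running the key function above on a small mixed list (in a modern Python) shows the grouping; note that ints and floats land in separate groups because their type names differ:

```python
# Guido's consistent-order key: sort by type name first, value second.
X = [3, 'b', 1, 'a', 2.5]
result = sorted(X, key=lambda x: (str(type(x)), x))
print(result)  # [2.5, 1, 3, 'a', 'b'] -- 'float' < 'int' < 'str'
```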

--Guido

On 8/24/06, Jim Jewett <jimjjewett at gmail.com> wrote:
> On 8/24/06, Guido van Rossum <guido at python.org> wrote:
> > Another change that is unlikely to be available in 2.x is the
> > rationalization of comparisons. In 3.0, "1 < 'abc'" will raise a
> > TypeError; there's just no way to backport this behavior, since again
> > it requires pervasive changes to the implementation.
>
> I still believe that this breaks an important current use case for
> sorting, but maybe the right answer is a different (but similar) API.
>
> Given an arbitrary collection of objects, I want to be able to order
> them in a consistent manner, at least within a single interpreter
> session.  (Consistency across sessions/machines/persistence/etc would
> be even better, but isn't essential.)
>
> The current sort method works pretty well; the new one wouldn't.  It
> would be enough (and arguably an improvement, because of broken
> objects) if there were a consistent_order equivalent that just caught
> the TypeError and then tried a fallback for you until it found an
> answer.
>
> -jJ
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From david.nospam.hopwood at blueyonder.co.uk  Thu Aug 24 22:41:43 2006
From: david.nospam.hopwood at blueyonder.co.uk (David Hopwood)
Date: Thu, 24 Aug 2006 21:41:43 +0100
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <44ED85E5.1000005@livinglogic.de>
References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com>	<20060823191222.1A76.JCARLSON@uci.edu>	<ca471dc20608231939j4205a75dxe0072efc5065cea9@mail.gmail.com>
	<44ED85E5.1000005@livinglogic.de>
Message-ID: <44EE0F07.5030005@blueyonder.co.uk>

Walter D?rwald wrote:
[...]
> Using find(), the code looks like this:
> 
> def splitfind(s):
>     pos = 0
>     while True:
>         posstart = s.find("{", pos)
>         if posstart < 0:
>             break
>         posarg = s.find(" ", posstart)
>         if posarg < 0:
>             break
>         posend = s.find("}", posarg)
>         if posend < 0:
>             break
>         prefix = s[pos:posstart]
>         if prefix:
>             yield (None, prefix)
>         yield (s[posstart+1:posarg], s[posarg+1:posend])
>         pos = posend+1
>     rest = s[pos:]
>     if rest:
>         yield (None, rest)
> 
> Using index() looks worse to me. The code is buried under the exception
> handling:
> 
> def splitindex(s):
>     pos = 0
>     while True:
>         try:
>             posstart = s.index("{", pos)
>         except ValueError:
>             break
>         try:
>             posarg = s.index(" ", posstart)
>         except ValueError:
>             break
>         try:
>             posend = s.index("}", posarg)
>         except ValueError:
>             break

          try:
              posstart = s.index("{", pos)
              posarg = s.index(" ", posstart)
              posend = s.index("}", posarg)
          except ValueError:
              break

is shorter and clearer than the version using 'find'.
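Folding the loop together this way, the complete index()-based generator becomes (a sketch, using index() throughout):

```python
def splitindex(s):
    # Walter's generator with one try block per iteration: any missing
    # delimiter ends the scan of "{arg value}" patterns.
    pos = 0
    while True:
        try:
            posstart = s.index("{", pos)
            posarg = s.index(" ", posstart)
            posend = s.index("}", posarg)
        except ValueError:
            break
        prefix = s[pos:posstart]
        if prefix:
            yield (None, prefix)
        yield (s[posstart+1:posarg], s[posarg+1:posend])
        pos = posend + 1
    rest = s[pos:]
    if rest:
        yield (None, rest)

print(list(splitindex('foo{spam eggs}bar{foo bar}')))
# [(None, 'foo'), ('spam', 'eggs'), (None, 'bar'), ('foo', 'bar')]
```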

-- 
David Hopwood <david.nospam.hopwood at blueyonder.co.uk>




From mcherm at mcherm.com  Thu Aug 24 23:45:24 2006
From: mcherm at mcherm.com (Michael Chermside)
Date: Thu, 24 Aug 2006 14:45:24 -0700
Subject: [Python-3000] sort vs order (was: What should the focus for	2.6
	be?)
Message-ID: <20060824144524.cz3o2mv4iv40w40k@login.werra.lunarpages.com>

Jim Jewett writes:
> Given an arbitrary collection of objects, I want to be able to order
> them in a consistent manner, at least within a single interpreter
> session.

I think this meets your specifications:

>>> myList = [2.5, 17, object(), 3+4j, 'abc']
>>> myList.sort(key=id)

I prefer Guido's suggestion (key=lambda x: (str(type(x)), x)), but it
doesn't handle types that are not comparable (like the complex
number I included to be perverse). Frankly, I don't know why
you have an "arbitrary collection of objects" -- the only things
I have ever dealt with that handled truly _arbitrary_ collections
of objects were garbage collectors and generic caching mechanisms.
In either case you really *wouldn't* care how things sorted so
long as it was consistent, and then sorting by id works nicely.

Of course, I doubt this is what you're doing because if you
REALLY had arbitrary objects (including uncomparable things like
complex numbers) then you would already need to be doing this
today and your code wouldn't even need to be modified when you
upgraded to 3.0.

-- Michael Chermside


From greg.ewing at canterbury.ac.nz  Fri Aug 25 02:24:06 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 25 Aug 2006 12:24:06 +1200
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <eck0b8$mno$1@sea.gmane.org>
References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com>
	<20060823191222.1A76.JCARLSON@uci.edu>
	<ca471dc20608231939j4205a75dxe0072efc5065cea9@mail.gmail.com>
	<eck0b8$mno$1@sea.gmane.org>
Message-ID: <44EE4326.1070604@canterbury.ac.nz>

Fredrik Lundh wrote:

> (on the other hand, "s[:len(t)] == t" is usually faster than "s.startswith(t)" for short
> prefixes,

That's surprising. Any idea why this might be?
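One way to check on a given build is timeit; absolute numbers are machine- and version-dependent, so this is only a measurement sketch:

```python
import timeit

# Compare slicing+comparison against the startswith() method call for a
# short prefix; the gap (if any) depends on argument-handling overhead.
setup = "s = 'hello world'; t = 'he'"
slice_time = timeit.timeit("s[:len(t)] == t", setup=setup, number=100000)
method_time = timeit.timeit("s.startswith(t)", setup=setup, number=100000)
print("slice+compare: %.4fs  startswith: %.4fs" % (slice_time, method_time))
```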

--
Greg

From thomas at python.org  Fri Aug 25 02:46:08 2006
From: thomas at python.org (Thomas Wouters)
Date: Thu, 24 Aug 2006 20:46:08 -0400
Subject: [Python-3000] Removing 'old-style' ('simple') slices from Py3K.
Message-ID: <9e804ac0608241746n7de7c161yd40f6bb4c3061ab6@mail.gmail.com>

I spent my time at the Google sprint working on removing simple slices from
Py3k, in the p3yk-noslice branch. The work is pretty much done, except for
some minor details and finishing touches. There are a few items that should
probably be discussed, though.

The state of the tree:
 - The SLICE, STORE_SLICE and DELETE_SLICE opcodes (all 4 versions of each)
are eradicated. This even freed up a local (register) variable in
PyEval_EvalFrameEx(), and probably resulted in a speedup of the bytecode
loop. I didn't measure it, though.
 - Various types that didn't support extended slicing had such support
added:
    - UserList, UserString, MutableUserString
    - structseq (what os.stat and time.localtime and such return)
    - sre_parse.SubPattern (well, more or less)
    - buffer
    - bytes
    - mmap.mmap
 - Various types that supported extended slicing now specialcase simple
slicing, for extra speed (list, string, unicode, array, tuple)
 - the ctypes 'Array' and 'Pointer' types support slicing with
slice-objects, but only with step = 1
 - The __getslice__, __setslice__ and __delslice__ slots aren't created
anymore, for C types.
 - The PySequence_GetSlice, PySequence_SetSlice and PySequence_DelSlice
functions no longer try to access the sq_slice and sq_ass_slice PySequenceMethods
members. They did already fall back to the mp_subscript and mp_ass_subscript
PyMappingMethods members.
 - All tests pass, with only the expected changes to any tests.
 - The PySequenceMethods struct's 'sq_slice' and 'sq_ass_slice' members are
unused and have been renamed
 - PyMapping_Check() now returns true for any type with a
PyMappingMethods.mp_subscript filled, not just those without a
PySequence.sq_slice. One test had to be adjusted for that -- execfile("",
{}, ()) now raises a different error, so it now tests execfile("", {}, 42)
 - There's no way to figure out the size of a Py_ssize_t from Python code,
now. test_support was using a simple-slice to figure it out. I'm not sure if
there's really a reason to do it -- I don't quite understand the use of it.
 - It's still lacking tests for the extended-slicing abilities of buffer,
mmap.mmap, structseq, UserList and UserString.

I think the extended-slicing support as well as the simpleslice
specialcasing should be ported to 2.6. Are there any objections to that? It
means, in some cases, a bit of code duplication, but it would make 's[::]'
almost as fast as 's[:]' for those types.

I also think it may be worthwhile to switch to always using slice objects in
Python 2.6 or 2.7. It would mean we can remove the 12 bytecodes for slicing,
plus the associated code in the main bytecode loop. We can still call
sq_slice/sq_ass_slice if step is None. The main issue is that it might be a
net slowdown for slicing (but speedup for all other operations), and that it
is no longer possible to see the difference between obj[:] and obj[::]. I
personally think code that treats those two (significantly) differently is
insane.
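Under the unified behaviour described above (and in today's Python 3), the two spellings are already indistinguishable to a type that implements only __getitem__:

```python
class ShowSlice:
    # With only __getitem__ (no __getslice__), every slicing expression
    # arrives as a slice object rather than as separate indices.
    def __getitem__(self, item):
        return item

s = ShowSlice()
print(s[:])   # slice(None, None, None)
print(s[::])  # slice(None, None, None) -- identical to s[:]
```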

Now that all those types have mp_subscript defined, we could remove sq_item
and sq_ass_item as well. I'm not entirely sure I see all the implications of
that, though. The C code does quite a lot of indexing of tuples and lists,
and those are indexed using Py_ssize_t's directly. Going through a PyObject
for that may be too cumbersome.

-- 
Thomas Wouters <thomas at python.org>

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060824/c60c127a/attachment.htm 

From tim.peters at gmail.com  Fri Aug 25 03:01:20 2006
From: tim.peters at gmail.com (Tim Peters)
Date: Thu, 24 Aug 2006 21:01:20 -0400
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <44EE4326.1070604@canterbury.ac.nz>
References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com>
	<20060823191222.1A76.JCARLSON@uci.edu>
	<ca471dc20608231939j4205a75dxe0072efc5065cea9@mail.gmail.com>
	<eck0b8$mno$1@sea.gmane.org> <44EE4326.1070604@canterbury.ac.nz>
Message-ID: <1f7befae0608241801y3b285a12wc27cda5d25949fe0@mail.gmail.com>

[Fredrik Lundh]
>> (on the other hand, "s[:len(t)] == t" is usually faster than
"s.startswith(t)" for short
>> prefixes,

[Greg Ewing]
> That's surprising. Any idea why this might be?

Perhaps it has to do with the rest of his message ;-):

>> (which reminds me that speeding up handling of optional arguments
>> to C functions would be an even better use of this energy)

From greg.ewing at canterbury.ac.nz  Fri Aug 25 03:15:01 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 25 Aug 2006 13:15:01 +1200
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <1f7befae0608241801y3b285a12wc27cda5d25949fe0@mail.gmail.com>
References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com>
	<20060823191222.1A76.JCARLSON@uci.edu>
	<ca471dc20608231939j4205a75dxe0072efc5065cea9@mail.gmail.com>
	<eck0b8$mno$1@sea.gmane.org> <44EE4326.1070604@canterbury.ac.nz>
	<1f7befae0608241801y3b285a12wc27cda5d25949fe0@mail.gmail.com>
Message-ID: <44EE4F15.2070301@canterbury.ac.nz>

Tim Peters wrote:

> Perhaps it has to do with the rest of his message ;-):
> 
>>>(which reminds me that speeding up handling of optional arguments
>>>to C functions would be an even better use of this energy)

Until a few moments ago, I didn't know that str.startswith()
had any optional arguments, so I missed the significance of
that.

In any case, I still find it surprising that this would
make enough difference to outweigh a Python-level indexing
and comparison...

--
Greg

From martin at v.loewis.de  Fri Aug 25 03:49:55 2006
From: martin at v.loewis.de (martin at v.loewis.de)
Date: Fri, 25 Aug 2006 03:49:55 +0200
Subject: [Python-3000] long/int unification
Message-ID: <1156470595.44ee57436b03d@www.domainfactory-webmail.de>

Here is a quick status of the int_unification branch,
summarizing what I did at the Google sprint in NYC.

- the int type has been dropped; the builtins int and long
  now both refer to the long type
- all PyInt_* API is forwarded to the PyLong_* API. Little
  changes to the C code are necessary; the most common offender
  is PyInt_AS_LONG((PyIntObject*)v) since I completely removed
  PyIntObject.
- Much of the test suite passes, although it still has a number
  of bugs.
- There are timing tests for allocation and for addition.
  On allocation, the current implementation is about a factor
  of 2 slower; the integer addition is about 1.5 times slower;
  the initial slowdowns was by a factor of 3. The pystones
  dropped about 10% (pybench fails to run on p3yk).

A couple of interesting observations:
- bool was a subtype of int, and is now a subtype of long. In
  order to avoid knowing the internal representation of long,
  the bool type compares addresses against Py_True and Py_False,
  instead of looking at ob_ival.
- to add the small ints cache, an array of statically allocated
  longs is used, rather than heap-allocating them.
- after adding the small ints cache, a lot of things broke, e.g.
  for code like
  py> x = 4
  py> x = -4
  py> x
  -4
  py> 4
  -4
  This happened because long methods just toggle the sign
  of the object they got, messing up the small ints cache.
- to further speedup the implementation, I added special
  casing for one-digit numbers. As they are always in
  range(-32767,32768), the arithmetic operations don't
  need overflow checking anymore (even multiplication
  won't overflow 32-bit int).
- I found that in 2.x, long objects overallocate 2 byte
  on a 32-bit machine, and 6 bytes on a 64-bit machine,
  because sizeof(PyLongObject) rounds up.
- pickle and marshal have been changed to deal with
  the loss of int; pickle generates INT codes even
  for longs now provided the value is in the range
  for the code.
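The cache-corruption bug in the third bullet (long methods toggling the sign of a shared object) can be modelled with a small sketch; all names here are illustrative, not from the branch:

```python
# Toy model of an interning cache plus an operation that wrongly
# mutates its argument in place.
_cache = {}

class Num:
    def __init__(self, value):
        self.value = value

def make(value):
    # Small values share a single cached object, like CPython's small ints.
    if -5 <= value <= 256:
        if value not in _cache:
            _cache[value] = Num(value)
        return _cache[value]
    return Num(value)

def buggy_neg(n):
    n.value = -n.value  # in-place sign flip: wrong for a shared object
    return n

four = make(4)
buggy_neg(make(4))
print(four.value)  # -4: every holder of the cached "4" now sees -4
```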

I'm not sure whether this performance change is
acceptable; at this point, I'm running out of ideas
how to further improve the performance. Using a plain
32-bit int as the representation could be another
try, but I somewhat doubt it helps given that
the supposedly-simpler single-digit case is so
slow.

Regards,
Martin






From fredrik at pythonware.com  Fri Aug 25 07:50:58 2006
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Fri, 25 Aug 2006 07:50:58 +0200
Subject: [Python-3000] long/int unification
In-Reply-To: <1156470595.44ee57436b03d@www.domainfactory-webmail.de>
References: <1156470595.44ee57436b03d@www.domainfactory-webmail.de>
Message-ID: <ecm342$ab0$1@sea.gmane.org>

martin at v.loewis.de wrote:

> I'm not sure whether this performance change is
> acceptable; at this point, I'm running out of ideas
> how to further improve the performance.

without really digging into the patch, is it perhaps time to switch to 
unboxed integers for the CPython interpreter ?

(support for implementation subtypes could also be nice; I agree that
it would be nice if we had only one visible integer type, but I don't 
really see why the implementation has to be restricted to one type only. 
this applies to strings too, of course).

</F>


From fredrik at pythonware.com  Fri Aug 25 07:54:23 2006
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Fri, 25 Aug 2006 07:54:23 +0200
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <1f7befae0608241801y3b285a12wc27cda5d25949fe0@mail.gmail.com>
References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com>	<20060823191222.1A76.JCARLSON@uci.edu>	<ca471dc20608231939j4205a75dxe0072efc5065cea9@mail.gmail.com>	<eck0b8$mno$1@sea.gmane.org>
	<44EE4326.1070604@canterbury.ac.nz>
	<1f7befae0608241801y3b285a12wc27cda5d25949fe0@mail.gmail.com>
Message-ID: <ecm3af$aph$1@sea.gmane.org>

Tim Peters wrote:

> [Greg Ewing]
>> That's surprising. Any idea why this might be?
> 
> Perhaps it has to do with the rest of his message ;-):
> 
>>> (which reminds me that speeding up handling of optional arguments
>>> to C functions would be an even better use of this energy)

in my experience, the object allocator tends to be surprisingly fast, 
and the calling mechanism tends to be surprisingly slow.  and this is 
true even if you take this into account.

</F>


From jcarlson at uci.edu  Fri Aug 25 08:39:22 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Thu, 24 Aug 2006 23:39:22 -0700
Subject: [Python-3000] long/int unification
In-Reply-To: <ecm342$ab0$1@sea.gmane.org>
References: <1156470595.44ee57436b03d@www.domainfactory-webmail.de>
	<ecm342$ab0$1@sea.gmane.org>
Message-ID: <20060824232848.1A9F.JCARLSON@uci.edu>


Fredrik Lundh <fredrik at pythonware.com> wrote:
> 
> martin at v.loewis.de wrote:
> 
> > I'm not sure whether this performance change is
> > acceptable; at this point, I'm running out of ideas
> > how to further improve the performance.
> 
> without really digging into the patch, is it perhaps time to switch to 
> unboxed integers for the CPython interpreter ?
> 
> (support for implementation subtypes could also be nice; I agree that
> it would be nice if we had only one visible integer type, but I don't 
> really see why the implementation has to be restricted to one type only. 
> this applies to strings too, of course).

In the integer case, it reminds me of James Knight's tagged integer
patch to 2.3 [1].  If using long exclusively is 50% slower, why not try
the improved speed approach?  Also, depending on the objects, one may
consider a few other tagged objects, like perhaps None, True, and False
(they could all be special values with a single tag), or even just use
31/63 bits for the tagged integer value, with a 1 in the lowest bit
signifying it as a tagged integer.
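The low-bit tagging idea can be sketched like this (illustrative only: a
real implementation does this in C on machine words, relying on heap
pointers being aligned and therefore having a zero low bit; Python ints
are unbounded, so the 31/63-bit range limit is not modelled):

```python
# Tagged integers: store the value shifted left one bit with the low
# bit set; real object pointers are even, so the low bit disambiguates.
def tag_int(v):
    return (v << 1) | 1

def is_tagged(word):
    return word & 1 == 1

def untag_int(word):
    return word >> 1   # arithmetic shift restores the signed value

assert untag_int(tag_int(42)) == 42
assert untag_int(tag_int(-7)) == -7
```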


 - Josiah

[1] http://mail.python.org/pipermail/python-dev/2004-July/046139.html



From fredrik at pythonware.com  Fri Aug 25 11:15:37 2006
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Fri, 25 Aug 2006 11:15:37 +0200
Subject: [Python-3000] long/int unification
References: <1156470595.44ee57436b03d@www.domainfactory-webmail.de><ecm342$ab0$1@sea.gmane.org>
	<20060824232848.1A9F.JCARLSON@uci.edu>
Message-ID: <ecmf3p$d1i$1@sea.gmane.org>

Josiah Carlson wrote:

> In the integer case, it reminds me of James Knight's tagged integer
> patch to 2.3 [1].  If using long exclusively is 50% slower, why not try
> the improved speed approach?

looks like GvR was -1000 on this idea at the time, though...

> Also, depending on the objects, one may consider a few other tagged
> objects, like perhaps None, True, and False (they could all be special
> values with a single tag), or even just use 31/63 bits for the tagged
> integer value, with a 1 in the lowest bit signifying it as a tagged integer.

iirc, my pytte1 experiment used tagged objects for integers and single-
character strings, which resulted in considerable speedups for the (small
set of) benchmarks I used.

(on the other hand, the dominating speedups in pytte1 were "true" GC,
and call-site caching combined with streamlined method lookup.  if we
really want to speed things up, we should probably start with call-site
caching and (explicit?) method inlining).

</F> 




From ncoghlan at gmail.com  Fri Aug 25 11:50:03 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 25 Aug 2006 19:50:03 +1000
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <eck69v$b6n$1@sea.gmane.org>
References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com>	<20060823191222.1A76.JCARLSON@uci.edu>	<ca471dc20608231939j4205a75dxe0072efc5065cea9@mail.gmail.com><44ED85E5.1000005@livinglogic.de>	<44ED9206.1080306@gmail.com>
	<eck69v$b6n$1@sea.gmane.org>
Message-ID: <44EEC7CB.2090908@gmail.com>

Fredrik Lundh wrote:
> Nick Coghlan wrote:
> 
>> With a variety of "view types", that work like the corresponding builtin type,
>> but reference the original data structure instead of creating copies
> 
> support for string views would require some serious interpreter surgery, though,
> and probably break quite a few extensions...

Why do you say that? I'm thinking about a type written in Python, intended to 
be used exactly the way I did in my strawman example - you accept a normal 
string, make a view of it, do your manipulations, then make sure that anything 
you return or yield is a normal string so other code doesn't get any nasty 
surprises.

It would be strictly an optimisation technique to allow the normal string 
operations to be used without the performance penalties associated with 
slicing large strings. Otherwise you have to choose between "readable" and 
"scalable" which is an annoying choice to be forced to make.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From fredrik at pythonware.com  Fri Aug 25 12:06:43 2006
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Fri, 25 Aug 2006 12:06:43 +0200
Subject: [Python-3000] Droping find/rfind?
References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com>	<20060823191222.1A76.JCARLSON@uci.edu>	<ca471dc20608231939j4205a75dxe0072efc5065cea9@mail.gmail.com><44ED85E5.1000005@livinglogic.de>	<44ED9206.1080306@gmail.com><eck69v$b6n$1@sea.gmane.org>
	<44EEC7CB.2090908@gmail.com>
Message-ID: <ecmi3j$m4m$1@sea.gmane.org>

Nick Coghlan wrote:

>> Nick Coghlan wrote:
>>
>>> With a variety of "view types", that work like the corresponding builtin type,
>>> but reference the original data structure instead of creating copies
>>
>> support for string views would require some serious interpreter surgery, though,
>> and probably break quite a few extensions...
>
> Why do you say that?

because I happen to know a lot about how Python's string types are
implemented ?

> make a view of it

so to make a view of a string, you make a view of it ?

</F> 




From ncoghlan at gmail.com  Fri Aug 25 12:20:02 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 25 Aug 2006 20:20:02 +1000
Subject: [Python-3000] Removing 'old-style' ('simple') slices from Py3K.
In-Reply-To: <9e804ac0608241746n7de7c161yd40f6bb4c3061ab6@mail.gmail.com>
References: <9e804ac0608241746n7de7c161yd40f6bb4c3061ab6@mail.gmail.com>
Message-ID: <44EECED2.2020206@gmail.com>

Thomas Wouters wrote:
>  - There's no way to figure out the size of a Py_ssize_t from Python 
> code, now. test_support was using a simple-slice to figure it out. I'm 
> not sure if there's really a reason to do it -- I don't quite understand 
> the use of it.

This isn't quite true, but I will admit that the only way I know how to do it 
is somewhat on the arcane side ;)

    try:
        double_width = 2*(sys.maxint + 1)**2 - 1
        slice(None).indices(double_width)
        pyssize_t_max = double_width   # ssize_t twice as wide as long
    except OverflowError:
        pyssize_t_max = sys.maxint     # ssize_t same width as long

It might make more sense to just include a "sys.maxindex" to parallel 
sys.maxint (even though both are technically misnomers, leaving out the 
'native' bit).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From ncoghlan at gmail.com  Fri Aug 25 14:33:46 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 25 Aug 2006 22:33:46 +1000
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <ecmi3j$m4m$1@sea.gmane.org>
References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com>	<20060823191222.1A76.JCARLSON@uci.edu>	<ca471dc20608231939j4205a75dxe0072efc5065cea9@mail.gmail.com><44ED85E5.1000005@livinglogic.de>	<44ED9206.1080306@gmail.com><eck69v$b6n$1@sea.gmane.org>	<44EEC7CB.2090908@gmail.com>
	<ecmi3j$m4m$1@sea.gmane.org>
Message-ID: <44EEEE2A.9080509@gmail.com>

Fredrik Lundh wrote:
> Nick Coghlan wrote:
> 
>>> Nick Coghlan wrote:
>>>
>>>> With a variety of "view types", that work like the corresponding builtin type,
>>>> but reference the original data structure instead of creating copies
>>> support for string views would require some serious interpreter surgery, though,
>>> and probably break quite a few extensions...
>> Why do you say that?
> 
> because I happen to know a lot about how Python's string types are
> implemented ?

I believe you're thinking about something far more sophisticated than what I'm 
suggesting. I'm just talking about a Python data type in a standard library 
module that trades off slower performance with smaller strings (due to extra 
method call overhead) against improved scalability (due to avoidance of 
copying strings around).

>> make a view of it
> 
> so to make a view of a string, you make a view of it ?

Yep - by using all those "start" and "stop" optional arguments to builtin 
string methods to implement the methods of a string view in pure Python. By 
creating the string view all you would really be doing is a partial 
application of start and stop arguments on all of the relevant string methods.

I've included an example below that just supports __len__, __str__ and 
partition(). The source object survives for as long as the view does - the 
idea is that the view should only last while you manipulate the string, with 
only real strings released outside the function via return statements or yield 
expressions.

All that said, I think David Hopwood nailed the simplest answer to Walter's 
particular use case with:

def splitindex(s):
    pos = 0
    while True:
        try:
            posstart = s.index("{", pos)
            posarg = s.index(" ", posstart)
            posend = s.index("}", posarg)
        except ValueError:
            break
        prefix = s[pos:posstart]
        if prefix:
            yield (None, prefix)
        yield (s[posstart+1:posarg], s[posarg+1:posend])
        pos = posend + 1
    rest = s[pos:]
    if rest:
        yield (None, rest)

>>> list(splitindex('foo{spam eggs}bar{foo bar}'))
[(None, 'foo'), ('spam', 'eggs'), (None, 'bar'), ('foo', 'bar')]

Cheers,
Nick.

# Simple string view example
class strview(object):
    def __new__(cls, source, start=None, stop=None):
        self = object.__new__(cls)
        self.source = "%s" % source
        self.start = start if start is not None else 0
        self.stop = stop if stop is not None else len(source)
        return self
    def __str__(self):
        return self.source[self.start:self.stop]
    def __len__(self):
        return self.stop - self.start
    def partition(self, sep):
        _src = self.source
        try:
            startsep = _src.index(sep, self.start, self.stop)
        except ValueError:
            # Separator wasn't found!
            return self, _NULL_STR, _NULL_STR
        # Return new views of the three string parts
        endsep = startsep + len(sep)
        return (strview(_src, self.start, startsep),
                strview(_src, startsep, endsep),
                strview(_src, endsep, self.stop))

_NULL_STR = strview('')

def splitview(s):
    rest = strview(s)
    while 1:
        prefix, found, rest = rest.partition("{")
        if prefix:
            yield (None, str(prefix))
        if not found:
            break
        first, found, rest = rest.partition(" ")
        if not found:
            break
        second, found, rest = rest.partition("}")
        if not found:
            break
        yield (str(first), str(second))

>>> list(splitview('foo{spam eggs}bar{foo bar}'))
[(None, 'foo'), ('spam', 'eggs'), (None, 'bar'), ('foo', 'bar')]


-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From fredrik at pythonware.com  Fri Aug 25 15:06:13 2006
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Fri, 25 Aug 2006 15:06:13 +0200
Subject: [Python-3000] Droping find/rfind?
References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com>	<20060823191222.1A76.JCARLSON@uci.edu>	<ca471dc20608231939j4205a75dxe0072efc5065cea9@mail.gmail.com><44ED85E5.1000005@livinglogic.de>	<44ED9206.1080306@gmail.com><eck69v$b6n$1@sea.gmane.org>	<44EEC7CB.2090908@gmail.com><ecmi3j$m4m$1@sea.gmane.org>
	<44EEEE2A.9080509@gmail.com>
Message-ID: <ecmsk5$orm$1@sea.gmane.org>

Nick Coghlan wrote:

> I believe you're thinking about something far more sophisticated than what I'm
> suggesting. I'm just talking about a Python data type in a standard library
> module that trades off slower performance with smaller strings (due to extra
> method call overhead) against improved scalability (due to avoidance of
> copying strings around).

have you done any benchmarking on this ?

</F> 




From exarkun at divmod.com  Fri Aug 25 15:14:51 2006
From: exarkun at divmod.com (Jean-Paul Calderone)
Date: Fri, 25 Aug 2006 09:14:51 -0400
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <ecmsk5$orm$1@sea.gmane.org>
Message-ID: <20060825131452.1717.999901437.divmod.quotient.30940@ohm>

On Fri, 25 Aug 2006 15:06:13 +0200, Fredrik Lundh <fredrik at pythonware.com> wrote:
>Nick Coghlan wrote:
>
>> I believe you're thinking about something far more sophisticated than what I'm
>> suggesting. I'm just talking about a Python data type in a standard library
>> module that trades off slower performance with smaller strings (due to extra
>> method call overhead) against improved scalability (due to avoidance of
>> copying strings around).
>
>have you done any benchmarking on this ?
>

I've benchmarked string copying via slicing against views implemented using
buffer().  For certain use patterns, views are absolutely significantly
faster.
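For readers without 2.x at hand, memoryview is the nearest modern analogue
of buffer(), and it shows the same no-copy slicing these benchmarks
exercise (a sketch of the mechanism, not the benchmark itself):

```python
# A memoryview slice is O(1) and shares the underlying bytes; a plain
# slice of the bytes object copies them.
data = b"0123456789" * 100000         # ~1 MB
view = memoryview(data)[100:1000]     # no byte copying happens here
assert view.obj is data               # still backed by the original
assert bytes(view) == data[100:1000]  # identical contents on demand
```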

Jean-Paul

From fredrik at pythonware.com  Fri Aug 25 15:31:49 2006
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Fri, 25 Aug 2006 15:31:49 +0200
Subject: [Python-3000] Droping find/rfind?
References: <ecmsk5$orm$1@sea.gmane.org>
	<20060825131452.1717.999901437.divmod.quotient.30940@ohm>
Message-ID: <ecmu46$u4e$1@sea.gmane.org>

Jean-Paul Calderone wrote:

>>> I believe you're thinking about something far more sophisticated than what I'm
>>> suggesting. I'm just talking about a Python data type in a standard library
>>> module that trades off slower performance with smaller strings (due to extra
>>> method call overhead) against improved scalability (due to avoidance of
>>> copying strings around).
>>
>>have you done any benchmarking on this ?
>
> I've benchmarked string copying via slicing against views implemented using
> buffer().  For certain use patterns, views are absolutely significantly
> faster.

of course, but buffers don't support many string methods, so I'm not sure how
that's applicable to this case.

(and before anyone says "let's fix that, then", please read earlier messages).

</F> 




From jimjjewett at gmail.com  Fri Aug 25 16:22:36 2006
From: jimjjewett at gmail.com (Jim Jewett)
Date: Fri, 25 Aug 2006 10:22:36 -0400
Subject: [Python-3000] sort vs order (was: What should the focus for 2.6
	be?)
In-Reply-To: <20060824144524.cz3o2mv4iv40w40k@login.werra.lunarpages.com>
References: <20060824144524.cz3o2mv4iv40w40k@login.werra.lunarpages.com>
Message-ID: <fb6fbf560608250722g41acf025n9a76fd174de68171@mail.gmail.com>

On 8/24/06, Michael Chermside <mcherm at mcherm.com> wrote:
> Jim Jewett writes:
> > Given an arbitrary collection of objects, I want to be able to order
> > them in a consistent manner, at least within a single interpreter
> > session.

> I think this meets your specifications:

> >>> myList = [2.5, 17, object(), 3+4j, 'abc']
> >>> myList.sort(key=id)

Yes; not nicely, but it does.  I would prefer that it be the fallback
after first trying a regular sort.  Now I'm wondering if the right
recipe is to try comparing the objects, then the types, then the id,
or whether that would sometimes be inconsistent even for sane objects
if only some classes know about each other.
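The compare-then-type-then-id recipe can be sketched as follows (a
hypothetical helper, not an existing API; under 3.x comparison rules,
which is the point of this thread, mixed comparisons raise TypeError):

```python
# Sort normally when the elements are mutually comparable; otherwise
# fall back to an arbitrary-but-consistent order by type name, then id.
def consistent_sort(items):
    try:
        return sorted(items)
    except TypeError:
        return sorted(items, key=lambda o: (type(o).__name__, id(o)))

mixed = [2.5, 17, object(), 3 + 4j, 'abc']
result = consistent_sort(mixed)
# Consistent within a session: input order doesn't change the result.
assert consistent_sort(mixed[::-1]) == result
```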

The end result is that even if I find a solution that works, I think
it will be common (and bug-prone) enough that it really ought to be in
the language, or at least the standard library -- as it is today for
objects that don't go out of their way to prevent it.

> Frankly, I don't know why you have an "arbitrary collection of objects"

mostly for debugging and tests.

> Of course, I doubt this is what you're doing because if you
> REALLY had arbitrary objects (including uncomparable things like
> complex numbers)

More precisely, my code is buggy when faced with complex numbers or
Numeric arrays -- but in practice, it isn't faced with those.  It *is*
faced with tuples, lists, strings, ints, floats, and instances of
arbitrary program-specific classes.  These all work fine today,
because sort either special cases or falls back to using id *without
throwing an exception*.

-jJ

From paul at prescod.net  Fri Aug 25 17:39:47 2006
From: paul at prescod.net (Paul Prescod)
Date: Fri, 25 Aug 2006 08:39:47 -0700
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <44EE4F15.2070301@canterbury.ac.nz>
References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com>
	<20060823191222.1A76.JCARLSON@uci.edu>
	<ca471dc20608231939j4205a75dxe0072efc5065cea9@mail.gmail.com>
	<eck0b8$mno$1@sea.gmane.org> <44EE4326.1070604@canterbury.ac.nz>
	<1f7befae0608241801y3b285a12wc27cda5d25949fe0@mail.gmail.com>
	<44EE4F15.2070301@canterbury.ac.nz>
Message-ID: <1cb725390608250839s78cb4c46s378bb56313c1932a@mail.gmail.com>

On 8/24/06, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
>
> Tim Peters wrote:
>
> > Perhaps it has to do with the rest of his message ;-):
> >
> >>>(which reminds me that speeding up handling of optional arguments
> >>>to C functions would be an even better use of this energy)
>
> Until a few moments ago, I didn't know that str.startswith()
> had any optional arguments, so I missed the significance of
> that.


I also didn't know about the optional arguments to startswith and wonder if
they are much used or just cruft.

 Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060825/026a0124/attachment.html 

From jcarlson at uci.edu  Fri Aug 25 17:47:25 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Fri, 25 Aug 2006 08:47:25 -0700
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <ecmu46$u4e$1@sea.gmane.org>
References: <20060825131452.1717.999901437.divmod.quotient.30940@ohm>
	<ecmu46$u4e$1@sea.gmane.org>
Message-ID: <20060825080148.1AA8.JCARLSON@uci.edu>


"Fredrik Lundh" <fredrik at pythonware.com> wrote:
> Jean-Paul Calderone wrote:
> 
> >>> I believe you're thinking about something far more sophisticated than what I'm
> >>> suggesting. I'm just talking about a Python data type in a standard library
> >>> module that trades off slower performance with smaller strings (due to extra
> >>> method call overhead) against improved scalability (due to avoidance of
> >>> copying strings around).
> >>
> >>have you done any benchmarking on this ?
> >
> > I've benchmarked string copying via slicing against views implemented using
> > buffer().  For certain use patterns, views are absolutely significantly
> > faster.
> 
> of course, but buffers don't support many string methods, so I'm not sure how
> that's applicable to this case.
> 
> (and before anyone says "let's fix that, then", please read earlier messages).

Aside from the scheduled removal of buffer in 3.x, I see no particular
issue with offering a bytes view and str view in 3.x via two specific
bytes and str subtypes.  With care, very few changes if any would be
necessary in the str (unicode) implementation, and the bytesview
consistency updating is already being done with current buffer objects.

From there, the only question is when an operation on a bytes or str
object should return such a view, and the answer would be never.  Return
views from view objects, the non-views from non-view objects.  If you
want views, wrap your original object with a view, and call its methods. 
If you need a non-view, call the standard bytes/str constructor.

 - Josiah


From guido at python.org  Fri Aug 25 17:48:34 2006
From: guido at python.org (Guido van Rossum)
Date: Fri, 25 Aug 2006 08:48:34 -0700
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <20060825080148.1AA8.JCARLSON@uci.edu>
References: <20060825131452.1717.999901437.divmod.quotient.30940@ohm>
	<ecmu46$u4e$1@sea.gmane.org> <20060825080148.1AA8.JCARLSON@uci.edu>
Message-ID: <ca471dc20608250848l476d99f5w1dbb65c9518b1568@mail.gmail.com>

On 8/25/06, Josiah Carlson <jcarlson at uci.edu> wrote:
> Aside from the scheduled removal of buffer in 3.x, I see no particular
> issue with offering a bytes view and str view in 3.x via two specific
> bytes and str subtypes.  With care, very few changes if any would be
> necessary in the str (unicode) implementation, and the bytesview
> consistency updating is already being done with current buffer objects.
>
> From there, the only question is when an operation on a bytes or str
> object should return such a view, and the answer would be never.  Return
> views from view objects, the non-views from non-view objects.  If you
> want views, wrap your original object with a view, and call its methods.
> If you need a non-view, call the standard bytes/str constructor.

For the record, I think this is a major case of YAGNI. You appear way
too obsessed with performance of some microscopic aspect of the
language. Please stop firing random proposals until you actually have
working code and proof that it matters. Speeding up microbenchmarks is
irrelevant.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From exarkun at divmod.com  Fri Aug 25 18:29:50 2006
From: exarkun at divmod.com (Jean-Paul Calderone)
Date: Fri, 25 Aug 2006 12:29:50 -0400
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <ca471dc20608250848l476d99f5w1dbb65c9518b1568@mail.gmail.com>
Message-ID: <20060825162950.1717.331562078.divmod.quotient.31042@ohm>

On Fri, 25 Aug 2006 08:48:34 -0700, Guido van Rossum <guido at python.org> wrote:
>On 8/25/06, Josiah Carlson <jcarlson at uci.edu> wrote:
>> Aside from the scheduled removal of buffer in 3.x, I see no particular
>> issue with offering a bytes view and str view in 3.x via two specific
>> bytes and str subtypes.  With care, very few changes if any would be
>> necessary in the str (unicode) implementation, and the bytesview
>> consistency updating is already being done with current buffer objects.
>>
>> From there, the only question is when an operation on a bytes or str
>> object should return such a view, and the answer would be never.  Return
>> views from view objects, the non-views from non-view objects.  If you
>> want views, wrap your original object with a view, and call its methods.
>> If you need a non-view, call the standard bytes/str constructor.
>
>For the record, I think this is a major case of YAGNI. You appear way
>too obsessed with performance of some microscopic aspect of the
>language. Please stop firing random proposals until you actually have
>working code and proof that it matters. Speeding up microbenchmarks is
>irrelevant.

Twisted's core loop uses string views to avoid unnecessary copying.  This
has proven to be a real-world speedup.  This isn't a synthetic benchmark
or a micro-optimization.

I don't understand the resistance.  Is it really so earth-shatteringly
surprising that not copying memory unnecessarily is faster than copying
memory unnecessarily?

If the goal is to avoid speeding up Python programs because views are too
complex or unpythonic or whatever, fine.  But there isn't really any
question as to whether or not this is a real optimization.

Jean-Paul

From jimjjewett at gmail.com  Fri Aug 25 18:41:27 2006
From: jimjjewett at gmail.com (Jim Jewett)
Date: Fri, 25 Aug 2006 12:41:27 -0400
Subject: [Python-3000] simplifying methods (was: Re:  Droping find/rfind?)
Message-ID: <fb6fbf560608250941q350b562ak4b325da78f1bd72@mail.gmail.com>

On 8/24/06, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Until a few moments ago, I didn't know that str.startswith()
> had any optional arguments

I just looked them up, and they turn out to just be syntactic sugar
for a slice.  (Even to the extent of handling omitted arguments as
None.)  The stop argument in particular is (almost) silly.

  s.startswith(prefix, start, stop) === s[start:stop].startswith(prefix)

Ignoring efficiency concerns, would dropping the optional arguments
and requiring an explicit slice be a valid Py3K simplification?
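A quick check of the claimed equivalence (assuming current CPython
semantics, including None standing in for an omitted argument):

```python
# s.startswith(prefix, start, stop) behaves like slicing first.
s = "abcdef"
for start, stop in [(None, None), (0, None), (2, 5), (1, 3), (-4, -1)]:
    assert (s.startswith("cd", start, stop)
            == s[start:stop].startswith("cd"))
```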

-jJ

From jimjjewett at gmail.com  Fri Aug 25 18:55:33 2006
From: jimjjewett at gmail.com (Jim Jewett)
Date: Fri, 25 Aug 2006 12:55:33 -0400
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <20060825080148.1AA8.JCARLSON@uci.edu>
References: <20060825131452.1717.999901437.divmod.quotient.30940@ohm>
	<ecmu46$u4e$1@sea.gmane.org> <20060825080148.1AA8.JCARLSON@uci.edu>
Message-ID: <fb6fbf560608250955k4fae25a8u72691b51a458fcfc@mail.gmail.com>

On 8/25/06, Josiah Carlson <jcarlson at uci.edu> wrote:
> From there, the only question is when an operation on a bytes or str
> object should return such a view, and the answer would be never.  Return
> views from view objects, the non-views from non-view objects.  If you
> want views, wrap your original object with a view, and call its methods.
> If you need a non-view, call the standard bytes/str constructor.

I do like the idea of permitting multiple string *implementations*,
some of which might store their characters elsewhere, as lists and
large tables do.

But this needs to be an automatic implementation detail, like the
distinction between int and long.  If the choice must be explicit, then
people who worry too much about speed will start wrapping all string
references in view().  This is worse (and more tempting) than the
default-argument len=len hack.

-jJ

From guido at python.org  Fri Aug 25 19:37:44 2006
From: guido at python.org (Guido van Rossum)
Date: Fri, 25 Aug 2006 10:37:44 -0700
Subject: [Python-3000] simplifying methods (was: Re: Droping find/rfind?)
In-Reply-To: <fb6fbf560608250941q350b562ak4b325da78f1bd72@mail.gmail.com>
References: <fb6fbf560608250941q350b562ak4b325da78f1bd72@mail.gmail.com>
Message-ID: <ca471dc20608251037l29fb3368yc8416d64a8fdd04b@mail.gmail.com>

Then you would have to drop the same style of optional arguments from
all string methods.

There is a method to this madness: the slice arguments let you search
through the string without actually making the slice copy. This
matters rarely, but when it does, it can matter a lot -- imagine s
being 100 MB long, and the specified slice being a large portion of
that.
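Concretely, the slice arguments let a search run over a region of a big
string without materializing the slice first:

```python
# find() with a start offset scans in place; slicing first copies ~1 MB
# of data just to search it.
big = "x" * 10**6 + "needle" + "y" * 10**6
in_slice = big[10**6:].find("needle")   # copies the tail, then searches
in_place = big.find("needle", 10**6)    # searches without copying
assert in_place == 10**6
assert in_slice + 10**6 == in_place
```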

(Yes, the string "views" that some folks would like to add could solve
this in a different way. But IMO the views make everybody pay because
basic usage of the string data type will be slower, and there are
horrible worst-case scenarios (such as keeping one word from many
10-MB strings). We've gone over this many times without anybody ever
showing a realistic bullet-proof implementation or performance figures
other than micro-benchmarks. Perhaps someone should write a PEP so I
can reject it. :-))

--Guido

On 8/25/06, Jim Jewett <jimjjewett at gmail.com> wrote:
> On 8/24/06, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> > Until a few moments ago, I didn't know that str.startswith()
> > had any optional arguments
>
> I just looked them up, and they turn out to just be syntactic sugar
> for a slice.  (Even to the extent of handling omitted arguments as
> None.)  The stop argument in particular is (almost) silly.
>
>   s.startswith(prefix, start, stop) === s[start:stop].startswith(prefix)
>
> Ignoring efficiency concerns, would dropping the optional arguments
> and requiring an explicit slice be a valid Py3K simplification?
>
> -jJ
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Fri Aug 25 19:53:15 2006
From: guido at python.org (Guido van Rossum)
Date: Fri, 25 Aug 2006 10:53:15 -0700
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <20060825162950.1717.331562078.divmod.quotient.31042@ohm>
References: <ca471dc20608250848l476d99f5w1dbb65c9518b1568@mail.gmail.com>
	<20060825162950.1717.331562078.divmod.quotient.31042@ohm>
Message-ID: <ca471dc20608251053w75dd6bf5h5e0290524424e6dd@mail.gmail.com>

On 8/25/06, Jean-Paul Calderone <exarkun at divmod.com> wrote:
> >For the record, I think this is a major case of YAGNI. You appear way
> >too obsessed with performance of some microscopic aspect of the
> >language. Please stop firing random proposals until you actually have
> >working code and proof that it matters. Speeding up microbenchmarks is
> >irrelevant.
>
> Twisted's core loop uses string views to avoid unnecessary copying.  This
> has proven to be a real-world speedup.  This isn't a synthetic benchmark
> or a micro-optimization.

OK, that's the kind of data I was hoping for; if this was mentioned
before I apologize. Did they implement this in C or in Python? Can you
point us to the docs for their API?

> I don't understand the resistance.  Is it really so earth-shatteringly
> surprising that not copying memory unnecessarily is faster than copying
> memory unnecessarily?

It depends on how much bookkeeping is needed to properly free the
underlying buffer when it is no longer referenced, and whether the
application repeatedly takes short long-lived slices of long otherwise
short-lived buffers. Unless you have a heuristic for deciding to copy
at some point, you may waste a lot of space.

> If the goal is to avoid speeding up Python programs because views are too
> complex or unpythonic or whatever, fine.  But there isn't really any
> question as to whether or not this is a real optimization.

There are many ways to implement views. It has often been proposed to
make views an automatic feature of the basic string object. There the
optimization in one case has to be weighed against the pessimization
in another case (like the bookkeeping overhead everywhere and the
worst-case scenario I mentioned above). If views have to be explicitly
requested, that may not be a problem because the app author will
(hopefully) understand the issues. But even if it was just a standard
library module, I would worry that many inexperienced programmers
would complicate their code by using the string views module without
real benefits. Sort of the way some folks have knee-jerk habits to
write

  def foo(x, None=None):

if they use None anywhere in the body of the function. This should be
done only as a last resort when real-life measurements have shown that
foo() is a performance show-stopper.
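
The idiom being criticized pre-binds a global to a keyword default so the
function body can use a faster local lookup. A minimal sketch (using len
rather than None, since binding None this way is a syntax error in modern
Python; the function names here are invented for illustration):

```python
# The knee-jerk micro-optimization: pre-bind a builtin to a default
# argument so the body does a local lookup instead of a global one.
def total_length(items, _len=len):
    return sum(_len(s) for s in items)

# The plain version is almost always fast enough, and has an honest
# signature.
def total_length_plain(items):
    return sum(len(s) for s in items)
```

As the paragraph above says, the first form is only justified after
real-life measurements show the lookup actually matters.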

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From exarkun at divmod.com  Fri Aug 25 20:49:02 2006
From: exarkun at divmod.com (Jean-Paul Calderone)
Date: Fri, 25 Aug 2006 14:49:02 -0400
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <ca471dc20608251053w75dd6bf5h5e0290524424e6dd@mail.gmail.com>
Message-ID: <20060825184902.1717.697934511.divmod.quotient.31126@ohm>

On Fri, 25 Aug 2006 10:53:15 -0700, Guido van Rossum <guido at python.org> wrote:
>On 8/25/06, Jean-Paul Calderone <exarkun at divmod.com> wrote:
>> >For the record, I think this is a major case of YAGNI. You appear way
> >too obsessed with performance of some microscopic aspect of the
>> >language. Please stop firing random proposals until you actually have
>> >working code and proof that it matters. Speeding up microbenchmarks is
>> >irrelevant.
>>
>>Twisted's core loop uses string views to avoid unnecessary copying.  This
>>has proven to be a real-world speedup.  This isn't a synthetic benchmark
>>or a micro-optimization.
>
>OK, that's the kind of data I was hoping for; if this was mentioned
>before I apologize. Did they implement this in C or in Python? Can you
>point us to the docs for their API?

One instance of this is an implementation detail which doesn't impact any application-level APIs:

http://twistedmatrix.com/trac/browser/trunk/twisted/internet/abstract.py?r=17451#L88

Another instance of this is implemented in C++:

http://twistedmatrix.com/trac/browser/sandbox/itamar/cppreactor/fusion

but doesn't interact a lot with Python code.  The C++ API uses char* with a length (a natural way to implement string views in C/C++).  The Python API just uses strings, because Twisted has always used str here, and passing in a buffer would break everything expecting something with str methods.

>>I don't understand the resistance.  Is it really so earth-shatteringly
>>surprising that not copying memory unnecessarily is faster than copying
>>memory unnecessarily?
>
>It depends on how much bookkeeping is needed to properly free the
>underlying buffer when it is no longer referenced, and whether the
>application repeatedly takes short long-lived slices of long otherwise
>short-lived buffers. Unless you have a heuristic for deciding to copy
>at some point, you may waste a lot of space.

Certainly.  The first link above includes an example of such a heuristic.
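
A hypothetical sketch of that style of heuristic (the class, methods and
threshold below are invented for illustration, not Twisted's actual code):
track an offset into the buffer instead of slicing off the sent prefix, and
copy only once the dead prefix dominates.

```python
class SendBuffer:
    """Outgoing byte buffer that delays copying until it pays off."""

    COMPACT_THRESHOLD = 4096  # assumed tuning constant

    def __init__(self):
        self.data = b""
        self.offset = 0  # bytes already sent but still physically present

    def append(self, chunk):
        self.data += chunk

    def sent(self, nbytes):
        # Record nbytes as sent; compact only when the dead prefix is
        # both large and a majority of the buffer.
        self.offset += nbytes
        if (self.offset > self.COMPACT_THRESHOLD
                and self.offset * 2 > len(self.data)):
            self.data = self.data[self.offset:]  # the one deliberate copy
            self.offset = 0

    def pending(self):
        # memoryview stands in here for the buffer() object of 2006:
        # a zero-copy view of the unsent tail.
        return memoryview(self.data)[self.offset:]
```

Short-lived slices never trigger a copy; only a large, mostly-dead buffer
pays the compaction cost, which bounds the wasted space Guido worries about.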

>>If the goal is to avoid speeding up Python programs because views are too
>>complex or unpythonic or whatever, fine.  But there isn't really any
>>question as to whether or not this is a real optimization.
>
>There are many ways to implement views. It has often been proposed to
>make views an automatic feature of the basic string object. There the
>optimization in one case has to be weighed against the pessimization
>in another case (like the bookkeeping overhead everywhere and the
>worst-case scenario I mentioned above).

I'm happy to see things progress one step at a time.  Having them _at
all_ (buffer) was a good place to start.  A view which has string methods
is a nice incremental improvement.  Maybe somewhere down the line there
can be a single type which magically knows how to behave optimally for all
programs, but I'm not asking for that yet. ;)

>If views have to be explicitly
>requested that may not be a problem because the app author will
>(hopefully) understand the issues. But even if it was just a standard
>library module, I would worry that many inexperienced programmers
>would complicate their code by using the string views module without
>real benefits. Sort of the way some folks have knee-jerk habits to
>write
>
>  def foo(x, None=None):
>
>if they use None anywhere in the body of the function. This should be
>done only as a last resort when real-life measurements have shown that
>foo() is a performance show-stopper.
>

I don't think we see people overusing buffer() in ways which damage
readability now, and buffer is even a builtin.  Tossing something off
into a module somewhere shouldn't really be a problem.  To most people
who don't actually know what they're doing, the idea to optimize code
by reducing memory copying usually just doesn't come up.

Jean-Paul

From rrr at ronadam.com  Fri Aug 25 20:59:46 2006
From: rrr at ronadam.com (Ron Adam)
Date: Fri, 25 Aug 2006 13:59:46 -0500
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <44EEEE2A.9080509@gmail.com>
References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com>	<20060823191222.1A76.JCARLSON@uci.edu>	<ca471dc20608231939j4205a75dxe0072efc5065cea9@mail.gmail.com><44ED85E5.1000005@livinglogic.de>	<44ED9206.1080306@gmail.com><eck69v$b6n$1@sea.gmane.org>	<44EEC7CB.2090908@gmail.com>	<ecmi3j$m4m$1@sea.gmane.org>
	<44EEEE2A.9080509@gmail.com>
Message-ID: <ecnhgg$3p3$1@sea.gmane.org>

Nick Coghlan wrote:
> Fredrik Lundh wrote:
>> Nick Coghlan wrote:
>>
>>>> Nick Coghlan wrote:
>>>>
>>>>> With a variety of "view types", that work like the corresponding builtin type,
>>>>> but reference the original data structure instead of creating copies
>>>> support for string views would require some serious interpreter surgery, though,
>>>> and probably break quite a few extensions...
>>> Why do you say that?
>> because I happen to know a lot about how Python's string types are
>> implemented ?
> 
> I believe you're thinking about something far more sophisticated than what I'm 
> suggesting. I'm just talking about a Python data type in a standard library 
> module that trades off slower performance with smaller strings (due to extra 
> method call overhead) against improved scalability (due to avoidance of 
> copying strings around).
> 
>>> make a view of it
>> so to make a view of a string, you make a view of it ?
> 
> Yep - by using all those "start" and "stop" optional arguments to builtin 
> string methods to implement the methods of a string view in pure Python. By 
> creating the string view all you would really be doing is a partial 
> application of start and stop arguments on all of the relevant string methods.
> 
> I've included an example below that just supports __len__, __str__ and 
> partition(). The source object survives for as long as the view does - the 
> idea is that the view should only last while you manipulate the string, with 
> only real strings released outside the function via return statements or yield 
> expressions.


   >>>  self.source = "%s" % source

I think this should be:

    self.source = source

Otherwise you are making a copy of the source, which is what you
are trying to avoid.  I'm not sure whether Python would reuse the self.source 
string, but I wouldn't count on it.


It might be nice if slice objects could be used in more ways in Python. 
That might work in most cases where you would want a string view.

An example of a slice version of partition would be:  (not tested)

   def slice_partition(s, sep, sub_slice=None):
     if sub_slice is None:
        sub_slice = slice(len(s))
     found_slice = find_slice(s, sep, sub_slice)
     prefix_slice = slice(sub_slice.start, found_slice.start)
     rest_slice = slice(found_slice.stop, sub_slice.stop)
     return ( prefix_slice,
              found_slice,
              rest_slice )

   # implementation of find_slice left to readers.
   def find_slice(s, sub, sub_slice=None):
      ...
      return found_slice

Of course this isn't needed for short strings, but might be worth while 
when used with very long strings.
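
One hedged way to fill in the find_slice() left to readers above, mirroring
str.partition's behaviour of returning an empty middle part when the
separator is missing:

```python
def find_slice(s, sub, sub_slice=None):
    """Return the slice covering the first occurrence of sub within
    s[sub_slice], or an empty slice at its end if sub is absent."""
    if sub_slice is None:
        sub_slice = slice(0, len(s))
    pos = s.find(sub, sub_slice.start, sub_slice.stop)
    if pos < 0:
        return slice(sub_slice.stop, sub_slice.stop)
    return slice(pos, pos + len(sub))

def slice_partition(s, sep, sub_slice=None):
    """Like str.partition, but returns (prefix, sep, rest) as slice
    objects into s instead of three new strings."""
    if sub_slice is None:
        sub_slice = slice(0, len(s))
    found_slice = find_slice(s, sep, sub_slice)
    return (slice(sub_slice.start, found_slice.start),
            found_slice,
            slice(found_slice.stop, sub_slice.stop))
```

Indexing with the results (s[p], s[f], s[r]) recovers partition's three
strings, copying only when and if the caller chooses to.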



> # Simple string view example
> class strview(object):
>      def __new__(cls, source, start=None, stop=None):
>          self = object.__new__(cls)
>          self.source = "%s" % source
>          self.start = start if start is not None else 0
>          self.stop = stop if stop is not None else len(source)
>          return self
>      def __str__(self):
>          return self.source[self.start:self.stop]
>      def __len__(self):
>          return self.stop - self.start
>      def partition(self, sep):
>          _src = self.source
>          try:
>              startsep = _src.index(sep, self.start, self.stop)
>          except ValueError:
>              # Separator wasn't found!
>              return self, _NULL_STR, _NULL_STR
>          # Return new views of the three string parts
>          endsep = startsep + len(sep)
>          return (strview(_src, self.start, startsep),
>                  strview(_src, startsep, endsep),
>                  strview(_src, endsep, self.stop))
> 
> _NULL_STR = strview('')
> 
> def splitview(s):
>       rest = strview(s)
>       while 1:
>           prefix, found, rest = rest.partition("{")
>           if prefix:
>               yield (None, str(prefix))
>           if not found:
>               break
>           first, found, rest = rest.partition(" ")
>           if not found:
>               break
>           second, found, rest = rest.partition("}")
>           if not found:
>               break
>           yield (str(first), str(second))
> 
>  >>> list(splitview('foo{spam eggs}bar{foo bar}'))
> [(None, 'foo'), ('spam', 'eggs'), (None, 'bar'), ('foo', 'bar')]



From rrr at ronadam.com  Fri Aug 25 21:08:50 2006
From: rrr at ronadam.com (Ron Adam)
Date: Fri, 25 Aug 2006 14:08:50 -0500
Subject: [Python-3000] sort vs order (was: What should the focus for 2.6
	be?)
In-Reply-To: <fb6fbf560608250722g41acf025n9a76fd174de68171@mail.gmail.com>
References: <20060824144524.cz3o2mv4iv40w40k@login.werra.lunarpages.com>
	<fb6fbf560608250722g41acf025n9a76fd174de68171@mail.gmail.com>
Message-ID: <ecni1g$5i8$1@sea.gmane.org>

Jim Jewett wrote:

> The end result is that even if I find a solution that works, I think
> it will be common (and bug-prone) enough that it really ought to be in
> the language, or at least the standard library -- as it is today for
> objects that don't go out of their way to prevent it.

The usual way to handle this in databases is to generate a unique 
id_key when the data is entered.  That also allows for duplicate entries 
such as people with the same name, or multiple items with the same part 
number.




From guido at python.org  Fri Aug 25 21:13:31 2006
From: guido at python.org (Guido van Rossum)
Date: Fri, 25 Aug 2006 12:13:31 -0700
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <20060825184902.1717.697934511.divmod.quotient.31126@ohm>
References: <ca471dc20608251053w75dd6bf5h5e0290524424e6dd@mail.gmail.com>
	<20060825184902.1717.697934511.divmod.quotient.31126@ohm>
Message-ID: <ca471dc20608251213v70f3a1b1y29df7affbf3f9522@mail.gmail.com>

On 8/25/06, Jean-Paul Calderone <exarkun at divmod.com> wrote:
> >>Twisted's core loop uses string views to avoid unnecessary copying.  This
> >>has proven to be a real-world speedup.  This isn't a synthetic benchmark
> >>or a micro-optimization.
> >
> >OK, that's the kind of data I was hoping for; if this was mentioned
> >before I apologize. Did they implement this in C or in Python? Can you
> >point us to the docs for their API?
>
> One instance of this is an implementation detail which doesn't impact any application-level APIs:
>
> http://twistedmatrix.com/trac/browser/trunk/twisted/internet/abstract.py?r=17451#L88

You are referring to the two calls to buffer(), right? It seems a
pretty rare use case (though an important one). I wonder how often
offset != 0 in practice. I'd like the new 3.0 I/O library to provide
better support for writing part of a buffer, e.g. by adding an
optional offset parameter to write().
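
The copy such an offset parameter would avoid can be sketched with
memoryview (the later replacement for buffer()) standing in:

```python
import io

data = b"x" * 100000
offset = 40000

# Writing the tail via a slice first materializes a 60000-byte copy...
a = io.BytesIO()
a.write(data[offset:])

# ...while a memoryview slice hands the same bytes to write() without
# copying them out of the original buffer first.
b = io.BytesIO()
b.write(memoryview(data)[offset:])

assert a.getvalue() == b.getvalue() == data[offset:]
```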

> Another instance of this is implemented in C++:
>
> http://twistedmatrix.com/trac/browser/sandbox/itamar/cppreactor/fusion
>
> but doesn't interact a lot with Python code.  The C++ API uses char* with a length (a natural way to implement string views in C/C++).  The Python API just uses strings, because Twisted has always used str here, and passing in a buffer would break everything expecting something with str methods.

This doesn't seem a particularly strong use case (but I can't say I
understand the code or how it's used).

> >>I don't understand the resistance.  Is it really so earth-shatteringly
> >>surprising that not copying memory unnecessarily is faster than copying
> >>memory unnecessarily?
> >
> >It depends on how much bookkeeping is needed to properly free the
> >underlying buffer when it is no longer referenced, and whether the
> >application repeatedly takes short long-lived slices of long otherwise
> >short-lived buffers. Unless you have a heuristic for deciding to copy
> >at some point, you may waste a lot of space.
>
> Certainly.  The first link above includes an example of such a heuristic.

Because the app is in control, it is easy to avoid the worst-case
behavior of the heuristic.

> >>If the goal is to avoid speeding up Python programs because views are too
> >>complex or unpythonic or whatever, fine.  But there isn't really any
> >>question as to whether or not this is a real optimization.
> >
> >There are many ways to implement views. It has often been proposed to
> >make views an automatic feature of the basic string object. There the
> >optimization in one case has to be weighed against the pessimization
> >in another case (like the bookkeeping overhead everywhere and the
> >worst-case scenario I mentioned above).
>
> I'm happy to see things progress one step at a time.  Having them _at
> all_ (buffer) was a good place to start.

But buffer() is on the kick-list for Py3k right now. Perhaps the new
bytes object will make it possible to write the first example above
differently; bytes will be mutable which changes everything.

> A view which has string methods
> is a nice incremental improvement.  Maybe somewhere down the line there
> can be a single type which magically knows how to behave optimally for all
> programs, but I'm not asking for that yet. ;)

I still expect that a view with string methods will find more abuse
than legitimate use.

> >If views have to be explicitly
> >requested that may not be a problem because the app author will
> >(hopefully) understand the issues. But even if it was just a standard
> >library module, I would worry that many inexperienced programmers
> >would complicate their code by using the string views module without
> >real benefits. Sort of the way some folks have knee-jerk habits to
> >write
> >
> >  def foo(x, None=None):
> >
> >if they use None anywhere in the body of the function. This should be
> >done only as a last resort when real-life measurements have shown that
> >foo() is a performance show-stopper.
>
> I don't think we see people overusing buffer() in ways which damage
> readability now, and buffer is even a builtin.

But it has been riddled by problems in the past so most people know to
steer clear of it.

> Tossing something off
> into a module somewhere shouldn't really be a problem.  To most people
> who don't actually know what they're doing, the idea to optimize code
> by reducing memory copying usually just doesn't come up.

That final remark is a matter of opinion. I've seen too much code that
mindlessly copied idioms that were supposed to magically speed up
certain things to believe it. Often, people who don't know what they
are doing are more worried about speed than people who do, and they
copy all the wrong examples... :-(

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From fredrik at pythonware.com  Fri Aug 25 22:23:18 2006
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Fri, 25 Aug 2006 22:23:18 +0200
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <20060825080148.1AA8.JCARLSON@uci.edu>
References: <20060825131452.1717.999901437.divmod.quotient.30940@ohm>	<ecmu46$u4e$1@sea.gmane.org>
	<20060825080148.1AA8.JCARLSON@uci.edu>
Message-ID: <ecnm7m$irf$1@sea.gmane.org>

Josiah Carlson wrote:

> Aside from the scheduled removal of buffer in 3.x, I see no particular
> issue with offering a bytes view and str view in 3.x via two specific
> bytes and str subtypes.

the fact that it's *impossible* to offer a view subtype that's
compatible with the current PyString C API might be an issue, though.

</F>


From fredrik at pythonware.com  Fri Aug 25 22:27:09 2006
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Fri, 25 Aug 2006 22:27:09 +0200
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <ca471dc20608251213v70f3a1b1y29df7affbf3f9522@mail.gmail.com>
References: <ca471dc20608251053w75dd6bf5h5e0290524424e6dd@mail.gmail.com>	<20060825184902.1717.697934511.divmod.quotient.31126@ohm>
	<ca471dc20608251213v70f3a1b1y29df7affbf3f9522@mail.gmail.com>
Message-ID: <ecnmer$irf$2@sea.gmane.org>

Guido van Rossum wrote:

> That final remark is a matter of opinion. I've seen too much code that
> mindlessly copied idioms that were supposed to magically speed up
> certain things to believe it. Often, people who don't know what they
> are doing are more worried about speed than people who do, and they
> copy all the wrong examples... :-(

+1.

</F>


From krstic at solarsail.hcs.harvard.edu  Fri Aug 25 22:47:29 2006
From: krstic at solarsail.hcs.harvard.edu (=?UTF-8?B?SXZhbiBLcnN0acSH?=)
Date: Fri, 25 Aug 2006 16:47:29 -0400
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <20060825184902.1717.697934511.divmod.quotient.31126@ohm>
References: <20060825184902.1717.697934511.divmod.quotient.31126@ohm>
Message-ID: <44EF61E1.5050001@solarsail.hcs.harvard.edu>

Jean-Paul Calderone wrote:
> http://twistedmatrix.com/trac/browser/sandbox/itamar/cppreactor/fusion

This is the same Itamar who, in the talk I linked a few days ago
(http://ln-s.net/D+u) extolled buffer as a very real performance
improvement in fast python networking, and asked for broader and more
complete support for buffers, rather than their removal.

A bunch of people, myself included, want to use Python as a persistent
network server. Proper support for reading into already-allocated
memory, and non-copying strings are pretty indispensable for serious
production use.

-- 
Ivan Krstić <krstic at solarsail.hcs.harvard.edu> | GPG: 0x147C722D

From tjreedy at udel.edu  Fri Aug 25 23:00:47 2006
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 25 Aug 2006 17:00:47 -0400
Subject: [Python-3000] Droping find/rfind?
References: <ca471dc20608251053w75dd6bf5h5e0290524424e6dd@mail.gmail.com><20060825184902.1717.697934511.divmod.quotient.31126@ohm>
	<ca471dc20608251213v70f3a1b1y29df7affbf3f9522@mail.gmail.com>
Message-ID: <ecnodv$pov$1@sea.gmane.org>


"Guido van Rossum" <guido at python.org> wrote in message 
news:ca471dc20608251213v70f3a1b1y29df7affbf3f9522 at mail.gmail.com...
> But buffer() is on the kick-list for Py3k right now. Perhaps the new
> bytes object will make it possible to write the first example above
> differently; bytes will be mutable which changes everything.

I never learned about buffers and buffer() because in various ways they 
have been underdocumented, labeled problematical, and subject to revision 
or removal.

> I still expect that a view with string methods will find more abuse
> than legitimate use.

Perhaps views should first be written and released by advocates as 
3rd-party modules (in C or Python), possibly in more than one competing 
version, to be tested by interested members of the community and subject to 
the usual criteria for inclusion in the standard library or even the core. 
Then we would have some performance and usage data to argue with ;-).

Terry Jan Reedy
 




From tjreedy at udel.edu  Fri Aug 25 23:08:40 2006
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 25 Aug 2006 17:08:40 -0400
Subject: [Python-3000] sort vs order (was: What should the focus for
	2.6be?)
References: <20060824144524.cz3o2mv4iv40w40k@login.werra.lunarpages.com><fb6fbf560608250722g41acf025n9a76fd174de68171@mail.gmail.com>
	<ecni1g$5i8$1@sea.gmane.org>
Message-ID: <ecnoso$r5c$1@sea.gmane.org>


"Ron Adam" <rrr at ronadam.com> wrote in message 
news:ecni1g$5i8$1 at sea.gmane.org...
> Jim Jewett wrote:
>
>> The end result is that even if I find a solution that works, I think
>> it will be common (and bug-prone) enough that it really ought to be in
>> the language, or at least the standard library -- as it is today for
>> objects that don't go out of their way to prevent it.

id() *is* in builtins.  Now that sort has a key parameter, I think an 
explicit 'key=id' qualifies enough as 'in the language' for something 
used not too often.
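
The suggestion in miniature: id() gives otherwise-unorderable objects a
sort key that is stable within one run.

```python
class Thing:
    pass

things = [Thing() for _ in range(5)]

# Plain Thing instances aren't orderable, but their identities are.
ordered = sorted(things, key=id)

assert [id(t) for t in ordered] == sorted(id(t) for t in things)
assert set(ordered) == set(things)  # a permutation: nothing added or lost
```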

> The usual way to handle this in databases is to generate an unique
> id_key when the data is entered.

Which is what Python does when objects are created.

>  That also allows for duplicate entries
> such as people with the same name, or multiple items with the same part
> number.

Or multiple objects with the same value.

Terry Jan Reedy




From fredrik at pythonware.com  Sat Aug 26 01:16:04 2006
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Sat, 26 Aug 2006 01:16:04 +0200
Subject: [Python-3000] PyString C API
Message-ID: <eco0bn$f7b$1@sea.gmane.org>

> the fact that it's *impossible* to offer a view subtype that's
> compatible with the current PyString C API might be an issue, though.

what's the current thinking wrt. the PyString C API, btw.  has any of the
various bytes/wide string design proposals looked at the C API level ?

</F>




From guido at python.org  Sat Aug 26 01:32:48 2006
From: guido at python.org (Guido van Rossum)
Date: Fri, 25 Aug 2006 16:32:48 -0700
Subject: [Python-3000] PyString C API
In-Reply-To: <eco0bn$f7b$1@sea.gmane.org>
References: <eco0bn$f7b$1@sea.gmane.org>
Message-ID: <ca471dc20608251632q77a7a264y5f37439370b5aa4@mail.gmail.com>

On 8/25/06, Fredrik Lundh <fredrik at pythonware.com> wrote:
> > the fact that it's *impossible* to offer a view subtype that's
> > compatible with the current PyString C API might be an issue, though.
>
> what's the current thinking wrt. the PyString C API, btw.  has any of the
> various bytes/wide string design proposals looked at the C API level ?

No... I was hoping to get to that but ended up spending unanticipated
time on fixing comparisons. Maybe the first step ought to be similar
to what was done for int/long unification -- keep both the PyString_
and PyUnicode_ APIs around but make the PyString_ APIs do whatever
they do on Unicode objects instead. Each use of certain macros will
still have to be patched, obviously; e.g. a common way to create a
string is to call PyString_FromStringAndSize(NULL, nbytes) and then to
call something like memcpy(PyString_AS_STRING(obj), source, nbytes) --
this won't work of course.

There are a bunch of PyBytes_ APIs that can be used in those places
where 8-bit strings are really used to hold binary data, not
characters. These have been modeled on the PyString APIs (even with
AS_STRING and GET_SIZE macros). See Include/bytesobject.h.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jackdied at jackdied.com  Sat Aug 26 02:19:23 2006
From: jackdied at jackdied.com (Jack Diederich)
Date: Fri, 25 Aug 2006 20:19:23 -0400
Subject: [Python-3000] cleaning up *path.py code duplication
Message-ID: <20060826001923.GD24154@performancedrivers.com>

While checking find() uses in the stdlib I noticed that the various
path modules have duplicate code and docstrings for some generic path
manipulations.  Delightfully they even have different implementations
and docstrings for identical functions. splitext() is a great bad
example - os2emxpath.splitext() builds up strings by doing char-by-char
concatenations where everyone else uses find() + slice.

If there are no objections I'll move these into a module named 
genericpath.py and change the others to do

from genericpath import func1, func2, funcN

where applicable.

So, any objections? Should it be a 2.6 backport too?

-Jack

From guido at python.org  Sat Aug 26 02:35:33 2006
From: guido at python.org (Guido van Rossum)
Date: Fri, 25 Aug 2006 17:35:33 -0700
Subject: [Python-3000] cleaning up *path.py code duplication
In-Reply-To: <20060826001923.GD24154@performancedrivers.com>
References: <20060826001923.GD24154@performancedrivers.com>
Message-ID: <ca471dc20608251735qb19ad7bk1a163f77ed41dfe1@mail.gmail.com>

Sounds like a great 2.6 project. Beware of things that are
intentionally different between platforms of course!

--Guido

On 8/25/06, Jack Diederich <jackdied at jackdied.com> wrote:
> While checking find() uses in the stdlib I noticed that the various
> path modules have duplicate code and docstrings for some generic path
> manipulations.  Delightfully they even have different implementations
> and docstrings for identical functions. splitext() is a great bad
> example - os2emxpath.splitext() builds up strings by doing char-by-char
> concatenations where everyone else uses find() + slice.
>
> If there are no objections I'll move these into a module named
> genericpath.py and change the others to do
>
> from genericpath import func1, func2, funcN
>
> where applicable.
>
> So, any objections? Should it be a 2.6 backport too?
>
> -Jack
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg.ewing at canterbury.ac.nz  Sat Aug 26 03:32:10 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 26 Aug 2006 13:32:10 +1200
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <1cb725390608250839s78cb4c46s378bb56313c1932a@mail.gmail.com>
References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com>
	<20060823191222.1A76.JCARLSON@uci.edu>
	<ca471dc20608231939j4205a75dxe0072efc5065cea9@mail.gmail.com>
	<eck0b8$mno$1@sea.gmane.org> <44EE4326.1070604@canterbury.ac.nz>
	<1f7befae0608241801y3b285a12wc27cda5d25949fe0@mail.gmail.com>
	<44EE4F15.2070301@canterbury.ac.nz>
	<1cb725390608250839s78cb4c46s378bb56313c1932a@mail.gmail.com>
Message-ID: <44EFA49A.1060201@canterbury.ac.nz>

Paul Prescod wrote:

> I also didn't know about the optional arguments to startswith and wonder 
> if they are much used or just cruft.

Looking through the string methods, it appears that only
a few of them, seemingly chosen arbitrarily, have start
and stop arguments.

Seems to me a string-view object supporting all of the
string methods would be a much better idea than this
haphazard mixture, and would fit in nicely with the
Py3k views philosophy.

--
Greg

From jackdied at jackdied.com  Sat Aug 26 03:52:43 2006
From: jackdied at jackdied.com (Jack Diederich)
Date: Fri, 25 Aug 2006 21:52:43 -0400
Subject: [Python-3000] cleaning up *path.py code duplication
In-Reply-To: <ca471dc20608251735qb19ad7bk1a163f77ed41dfe1@mail.gmail.com>
References: <20060826001923.GD24154@performancedrivers.com>
	<ca471dc20608251735qb19ad7bk1a163f77ed41dfe1@mail.gmail.com>
Message-ID: <20060826015243.GE24154@performancedrivers.com>

Ooph, there is some dissonance in the comments and the code.  Cut-n-paste
errors I suppose.

-- ntpath.py --
def islink(path):
    """Test for symbolic link.  On WindowsNT/95 always returns false"""
    return False

<snip 10 lines>

# This follows symbolic links, so both islink() and isdir() can be true
# for the same path.

def isfile(path):
    """Test whether a path is a regular file"""
-- end excerpt --

I'll try and keep a list so those in the know can do a post mortem on the
comments.  I'm only useful for vetting the *nix versions.

-Jack

On Fri, Aug 25, 2006 at 05:35:33PM -0700, Guido van Rossum wrote:
> Sounds like a great 2.6 project. Beware of things that are
> intentionally different between platforms of course!
> 
> --Guido
> 
> On 8/25/06, Jack Diederich <jackdied at jackdied.com> wrote:
> > While checking find() uses in the stdlib I noticed that the various
> > path modules have duplicate code and docstrings for some generic path
> > manipulations.  Delightfully they even have different implementations
> > and docstrings for identical functions. splitext() is a great bad
> > example - os2emxpath.splitext() builds up strings by doing char-by-char
> > concatenations where everyone else uses find() + slice.
> >
> > If there are no objections I'll move these into a module named
> > genericpath.py and change the others to do
> >
> > from genericpath import func1, func2, funcN
> >
> > where applicable.
> >
> > So, any objections? Should it be a 2.6 backport too?
> >
> > -Jack
> > _______________________________________________
> > Python-3000 mailing list
> > Python-3000 at python.org
> > http://mail.python.org/mailman/listinfo/python-3000
> > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
> >
> 
> 
> -- 
> --Guido van Rossum (home page: http://www.python.org/~guido/)
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/jack%40performancedrivers.com
> 

From ncoghlan at gmail.com  Sat Aug 26 09:27:46 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 26 Aug 2006 17:27:46 +1000
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <44EF61E1.5050001@solarsail.hcs.harvard.edu>
References: <20060825184902.1717.697934511.divmod.quotient.31126@ohm>
	<44EF61E1.5050001@solarsail.hcs.harvard.edu>
Message-ID: <44EFF7F2.7070407@gmail.com>

Ivan Krstić wrote:
> Jean-Paul Calderone wrote:
>> http://twistedmatrix.com/trac/browser/sandbox/itamar/cppreactor/fusion
> 
> This is the same Itamar who, in the talk I linked a few days ago
> (http://ln-s.net/D+u) extolled buffer as a very real performance
> improvement in fast python networking, and asked for broader and more
> complete support for buffers, rather than their removal.
> 
> A bunch of people, myself included, want to use Python as a persistent
> network server. Proper support for reading into already-allocated
> memory, and non-copying strings are pretty indispensable for serious
> production use.

A mutable bytes type with deque-like performance characteristics (i.e. O(1) 
insert/pop at index 0 as well as at the end), as well as the appropriate 
mutating methods (like read_into()) should go a long way toward meeting those 
needs.
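
The "reading into already-allocated memory" part is sketched below with the
tools later Python versions grew for exactly this: a mutable bytearray plus
readinto() reuses one buffer across reads instead of allocating a fresh
string per read.

```python
import io

stream = io.BytesIO(b"abcdefghij")
buf = bytearray(4)            # allocated once, reused for every read

n = stream.readinto(buf)      # fills buf in place, returns bytes read
assert n == 4 and bytes(buf) == b"abcd"

n = stream.readinto(buf)      # same buffer again: no new allocation
assert n == 4 and bytes(buf) == b"efgh"
```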

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From ncoghlan at gmail.com  Sat Aug 26 10:02:15 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 26 Aug 2006 18:02:15 +1000
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <ecnhgg$3p3$1@sea.gmane.org>
References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com>	<20060823191222.1A76.JCARLSON@uci.edu>	<ca471dc20608231939j4205a75dxe0072efc5065cea9@mail.gmail.com><44ED85E5.1000005@livinglogic.de>	<44ED9206.1080306@gmail.com><eck69v$b6n$1@sea.gmane.org>	<44EEC7CB.2090908@gmail.com>	<ecmi3j$m4m$1@sea.gmane.org>	<44EEEE2A.9080509@gmail.com>
	<ecnhgg$3p3$1@sea.gmane.org>
Message-ID: <44F00007.3050107@gmail.com>

Ron Adam wrote:
> Nick Coghlan wrote:
>> Fredrik Lundh wrote:
>>> Nick Coghlan wrote:
>>>
>>>>> Nick Coghlan wrote:
>>>>>
>>>>>> With a variety of "view types", that work like the corresponding builtin type,
>>>>>> but reference the original data structure instead of creating copies
>>>>> support for string views would require some serious interpreter surgery, though,
>>>>> and probably break quite a few extensions...
>>>> Why do you say that?
>>> because I happen to know a lot about how Python's string types are
>>> implemented ?
>> I believe you're thinking about something far more sophisticated than what I'm 
>> suggesting. I'm just talking about a Python data type in a standard library 
>> module that trades off slower performance with smaller strings (due to extra 
>> method call overhead) against improved scalability (due to avoidance of 
>> copying strings around).
>>
>>>> make a view of it
>>> so to make a view of a string, you make a view of it ?
>> Yep - by using all those "start" and "stop" optional arguments to builtin 
>> string methods to implement the methods of a string view in pure Python. By 
>> creating the string view all you would really be doing is a partial 
>> application of start and stop arguments on all of the relevant string methods.
>>
>> I've included an example below that just supports __len__, __str__ and 
>> partition(). The source object survives for as long as the view does - the 
>> idea is that the view should only last while you manipulate the string, with 
>> only real strings released outside the function via return statements or yield 
>> expressions.
> 
> 
>    >>>  self.source = "%s" % source
> 
> I think this should be:
> 
>     self.source = source
> 
> Otherwise you are making copies of the source, which is what you
> are trying to avoid.  I'm not sure if Python would reuse the self.source 
> string, but I wouldn't count on it.

CPython 2.5 certainly doesn't reuse the existing string object. Given that 
what I wrote is the way to ensure you have a builtin string type (str or 
unicode) without coercing actual unicode objects to str objects or vice-versa, 
it should probably be subjected to the same optimisation as the str() and 
unicode() constructors (i.e., simply increfing and returning the original 
builtin string).

> It might be nice if slice objects could be used in more ways in python. 
> That may work in most cases where you would want a string view.

That's quite an interesting idea. With that approach, rather than having to 
duplicate 'concrete sequence with copying semantics' and 'sequence view with 
non-copying semantics' everywhere, you could just provide methods on objects 
that returned the appropriate slice objects representing the location of 
relevant sections, rather than copies of the sections themselves.

To make that work effectively, you'd need to implement __nonzero__ on slice 
objects as "((self.stop - self.start) // self.step) > 0" (Either that or 
implement __len__, which would contribute to making slice() look more and more 
like xrange(), as someone else noted recently).
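Since CPython's slice type can't be subclassed, the suggested truthiness test can be sketched today as a standalone helper (hypothetical name, modern Python where range() plays xrange()'s role):

```python
def slice_is_nonempty(s, length):
    """Return True if slice s selects at least one index of a sequence
    of the given length.  Sketch of the suggested __nonzero__ logic."""
    # indices() resolves None and negative fields against a concrete
    # length; range() then gives the element count in O(1).
    start, stop, step = s.indices(length)
    return len(range(start, stop, step)) > 0
```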

Using the same signature as partition:

    def partition_indices(self, sep, start=None, stop=None):
        if start is None: start = 0
        if stop is None: stop = len(self)
        try:
            idxsep = self.index(sep, start, stop)
        except ValueError:
            return slice(start, stop), slice(0), slice(0)
        endsep = idxsep + len(sep)
        return slice(start, idxsep), slice(idxsep, endsep), slice(endsep, stop)

Then partition() itself would be equivalent to:

    def partition(self, sep, start=None, stop=None):
        before, sep, after = self.partition_indices(sep, start, stop)
        return self[before], self[sep], self[after]
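The pair above can be tried out today as module-level functions over plain strings (a sketch; the proposal would of course put them on str itself):

```python
def partition_indices(s, sep, start=None, stop=None):
    # Return three slice objects locating before/sep/after within s,
    # instead of three new string copies.
    if start is None:
        start = 0
    if stop is None:
        stop = len(s)
    try:
        idxsep = s.index(sep, start, stop)
    except ValueError:
        # Separator absent: everything is "before", the rest is empty.
        return slice(start, stop), slice(0), slice(0)
    endsep = idxsep + len(sep)
    return slice(start, idxsep), slice(idxsep, endsep), slice(endsep, stop)

def partition(s, sep, start=None, stop=None):
    # Copying variant built on top of the index-returning one.
    before, found, after = partition_indices(s, sep, start, stop)
    return s[before], s[found], s[after]
```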

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From jcarlson at uci.edu  Sat Aug 26 10:29:01 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Sat, 26 Aug 2006 01:29:01 -0700
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <44EFF7F2.7070407@gmail.com>
References: <44EF61E1.5050001@solarsail.hcs.harvard.edu>
	<44EFF7F2.7070407@gmail.com>
Message-ID: <20060826012418.1ABA.JCARLSON@uci.edu>


Nick Coghlan <ncoghlan at gmail.com> wrote:
> Ivan Krstić wrote:
> > Jean-Paul Calderone wrote:
> >> http://twistedmatrix.com/trac/browser/sandbox/itamar/cppreactor/fusion
> > 
> > This is the same Itamar who, in the talk I linked a few days ago
> > (http://ln-s.net/D+u) extolled buffer as a very real performance
> > improvement in fast python networking, and asked for broader and more
> > complete support for buffers, rather than their removal.
> > 
> > A bunch of people, myself included, want to use Python as a persistent
> > network server. Proper support for reading into already-allocated
> > memory, and non-copying strings are pretty indispensable for serious
> > production use.
> 
> A mutable bytes type with deque-like performance characteristics (i.e O(1) 
> insert/pop at index 0 as well as at the end), as well as the appropriate 
> mutating methods (like read_into()) should go a long way to meeting those needs.

The implementation of deque and the idea behind bytes are not compatible. 
Everything I've heard about the proposal of bytes is that it is
effectively a C unsigned char[] with some convenience methods, very
similar to a Python array.array("B"), with different methods.  There is
also an implementation in the Py3k branch.

Also, while I would have a use for bytes as currently implemented (with
readinto() ), I would have approximately zero use for a deque-like bytes
object (never mind that due to Python not allowing multi-segment buffers,
etc., it would be functionally impossible to get equivalent time bounds).
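For reference, reading into already-allocated memory along the lines described here looks like this with a preallocated buffer (a modern sketch; io.BytesIO stands in for a socket or file):

```python
import io

# Preallocate a writable buffer once, then read directly into it;
# no intermediate string object is created per read.
buf = bytearray(8)
stream = io.BytesIO(b"spam and eggs")
n = stream.readinto(buf)
chunk = bytes(buf[:n])
```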

 - Josiah


From ncoghlan at iinet.net.au  Sat Aug 26 11:12:27 2006
From: ncoghlan at iinet.net.au (Nick Coghlan)
Date: Sat, 26 Aug 2006 19:12:27 +1000
Subject: [Python-3000] Making more effective use of slice objects in Py3k
Message-ID: <44F0107B.20205@iinet.net.au>

This idea is inspired by the find/rfind string discussion (particularly a 
couple of comments from Jim and Ron), but I think the applicability may prove 
to be wider than just string methods (e.g. I suspect it may prove useful for 
the bytes() type as well).

Copy-on-slice semantics are by far the easiest semantics to deal with in most 
cases, as they result in the fewest nasty surprises. However, they have one 
obvious drawback: performance can suffer badly when dealing with large 
datasets (copying 10 MB chunks of memory around can take a while!).

There are a couple of existing workarounds for this: buffer() objects, and the 
start/stop arguments to a variety of string methods. Neither of these is 
particularly convenient to work with, and buffer() is slated to go away in Py3k.

I think an enriched slicing model that allows sequence views to be expressed 
easily as "this slice of this sequence" would allow this to be dealt with 
cleanly, without requiring every sequence to provide a corresponding "sequence 
view" with non-copying semantics. I think Guido's concern that people will 
reach for string views when they don't need them is also valid (as I believe 
that it is most often inexperience that leads to premature optimization that 
then leads to needless code complexity).

The specific changes I suggest based on the find/rfind discussion are:

   1. make range() (what used to be xrange()) a subclass of slice(), so that 
range objects can be used to index sequences. The only differences between 
range() and slice() would then be that start/stop/step will never be None for 
range instances, and range instances act like an immutable sequence while 
slice instances do not (i.e. range objects would grow an indices() method).

   2. change range() and slice() to accept slice() instances as arguments so 
that range(range(0)) is equivalent to range(0). (range(x) may throw ValueError 
if x.stop is None).

   3. change API's that currently accept start/stop arguments (like string 
methods) to accept a single slice() instance instead (possibly raising 
ValueError if step != 1).

   4. provide an additional string method partition_indices() that returns 3 
range() objects instead of 3 new strings

The new method would have semantics like:

   def partition_indices(self, sep, limits=None):
       if limits is None:
           limits = range(0, len(self))
       else:
           limits = limits.indices(len(self))
       try:
           idxsep = self.index(sep, limits)
       except ValueError:
           return limits, range(0), range(0)
       endsep = idxsep + len(sep)
       return (range(limits.start, idxsep),
               range(idxsep, endsep),
               range(endsep, limits.stop))

With partition() itself being equivalent to:

     def partition(self, sep, subseq=None):
         before, sep, after = self.partition_indices(sep, subseq)
         return self[before], self[sep], self[after]

Finally, an efficient partition based implementation of the example from 
Walter that started the whole discussion about views and the problem with 
excessive copying would look like:

def splitpartition_indices(s):
      rest = range(len(s))
      while 1:
          prefix, lbrace, rest = s.partition_indices("{", rest)
          first, space, rest = s.partition_indices(" ", rest)
          second, rbrace, rest = s.partition_indices("}", rest)
          if prefix:
              yield (None, s[prefix])
          if not (lbrace and space and rbrace):
              break
          yield (s[first], s[second])

(I know the above misses a micro-optimization, in that it calls partition 
again on an empty subsequence, even if space or lbrace are False. I believe 
doing the three partition calls together makes it much easier to read, and 
searching an empty string is pretty quick).

For comparison, here's the normal copying version that has problems scaling to 
large strings:

def splitpartition(s):
      rest = s
      while 1:
          prefix, lbrace, rest = rest.partition_indices("{")
          first, space, rest = rest.partition_indices(" ")
          second, rbrace, rest = rest.partition_indices("}")
          if prefix:
              yield (None, prefix)
          if not (lbrace and space and rbrace):
              break
          yield (first, second)

Should I make a Py3k PEP for this?

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From ncoghlan at gmail.com  Sat Aug 26 11:40:19 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 26 Aug 2006 19:40:19 +1000
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <44F0107B.20205@iinet.net.au>
References: <44F0107B.20205@iinet.net.au>
Message-ID: <44F01703.6070200@gmail.com>


Nick Coghlan wrote:

A couple of errors in the sample code.

> The new method would have semantics like:
> 
>    def partition_indices(self, sep, limits=None):
>        if limits is None:
>            limits = range(0, len(self))
>        else:
>            limits = limits.indices(len(self))

Either that line should be:
            limits = range(*limits.indices(len(self)))

Or the definition of indices() would need to be changed to return a range() 
object instead of a 3-tuple.

> For comparison, here's the normal copying version that has problems scaling to 
> large strings:
> 
> def splitpartition(s):
>       rest = s
>       while 1:
>           prefix, lbrace, rest = rest.partition_indices("{")
>           first, space, rest = rest.partition_indices(" ")
>           second, rbrace, rest = rest.partition_indices("}")

Those 3 lines should be:
           prefix, lbrace, rest = rest.partition("{")
           first, space, rest = rest.partition(" ")
           second, rbrace, rest = rest.partition("}")

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From rrr at ronadam.com  Sat Aug 26 13:46:14 2006
From: rrr at ronadam.com (Ron Adam)
Date: Sat, 26 Aug 2006 06:46:14 -0500
Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <44F00007.3050107@gmail.com>
References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com>	<20060823191222.1A76.JCARLSON@uci.edu>	<ca471dc20608231939j4205a75dxe0072efc5065cea9@mail.gmail.com><44ED85E5.1000005@livinglogic.de>	<44ED9206.1080306@gmail.com><eck69v$b6n$1@sea.gmane.org>	<44EEC7CB.2090908@gmail.com>	<ecmi3j$m4m$1@sea.gmane.org>	<44EEEE2A.9080509@gmail.com>	<ecnhgg$3p3$1@sea.gmane.org>
	<44F00007.3050107@gmail.com>
Message-ID: <ecpcfk$7dq$1@sea.gmane.org>

Nick Coghlan wrote:
> Ron Adam wrote:
>> Nick Coghlan wrote:

[clipped]

>> It might be nice if slice objects could be used in more ways in python. 
>> That may work in most cases where you would want a string view.
> 
> That's quite an interesting idea. With that approach, rather than having to 
> duplicate 'concrete sequence with copying semantics' and 'sequence view with 
> non-copying semantics' everywhere, you could just provide methods on objects 
> that returned the appropriate slice objects representing the location of 
> relevant sections, rather than copies of the sections themselves.

Yes, and possibly having more methods that accept slice objects could 
make that idea work in a way that would seem more natural.


> To make that work effectively, you'd need to implement __nonzero__ on slice 
> objects as "((self.stop - self.start) // self.step) > 0" (Either that or 
> implement __len__, which would contribute to making slice() look more and more 
> like xrange(), as someone else noted recently).

Since xrange() has the same signature, it might be nice to be able to
use a slice object directly in xrange to get indices to a substring or list.

For that to work, slice.indices would need to not return None, and/or
xrange would need to accept None.  They differ in how they handle
negative indices as well.  So I expect it may be too big of a change.
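The normalisation difference is visible in slice.indices(), which resolves None and negative fields against a concrete length, something [x]range never has to do (modern Python shown):

```python
# A slice may carry None and negative values...
s = slice(None, -1, None)
start, stop, step = s.indices(10)   # clamps against length 10

# ...whereas range/xrange demands concrete, resolved integers:
remaining = list(range(start, stop, step))
```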


> Using the same signature as partition:
> 
>     def partition_indices(self, sep, start=None, stop=None):
>         if start is None: start = 0
>         if stop is None: stop = len(self)
>         try:
>             idxsep = self.index(sep, start, stop)
>         except ValueError:
>             return slice(start, stop), slice(0), slice(0)
>         endsep = idxsep + len(sep)
>         return slice(start, idxsep), slice(idxsep, endsep), slice(endsep, stop)
> 
> Then partition() itself would be equivalent to:
> 
>     def partition(self, sep, start=None, stop=None):
>         before, sep, after = self.partition_indices(sep, start, stop)
>         return self[before], self[sep], self[after]
> 
> Cheers,
> Nick.


Just a little timing for the fun of it. ;-)


2.5c1 (r25c1:51305, Aug 17 2006, 10:41:11) [MSC v.1310 32 bit (Intel)]
splitindex      : 0.02866
splitview       : 0.28021
splitpartition  : 0.34991
splitslice      : 0.07892


This may not be the best use case (if you can call it that).  It does 
show that the slice-as-a-view idea may have some potential.  But 
underneath it's just using index, so a well-written function using index 
will probably always be faster.

Cheers,
    Ron


"""
     Compare different index, string view, and partition methods.
"""

# -------- Split by str.index.
def splitindex(s):
      pos = 0
      while True:
        try:
            posstart = s.index("{", pos)
            posarg = s.index(" ", posstart)
            posend = s.index("}", posarg)
        except ValueError:
            break
        yield None, s[pos:posstart]
        yield s[posstart+1:posarg], s[posarg+1:posend]
        pos = posend+1
      rest = s[pos:]
      if rest:
          yield None, rest


# --------- Simple string view.
class strview(object):
      def __new__(cls, source, start=None, stop=None):
          self = object.__new__(cls)
          self.source = source
          #self.start = start if start is not None else 0
          self.start = start != None and start or 0
          #self.stop = stop if stop is not None else len(source)
          self.stop = stop != None and stop or len(source)
          return self
      def __str__(self):
          return self.source[self.start:self.stop]
      def __len__(self):
          return self.stop - self.start
      def partition(self, sep):
          _src = self.source
          try:
              startsep = _src.index(sep, self.start, self.stop)
          except ValueError:
              # Separator wasn't found!
              return self, _NULL_STR, _NULL_STR
          # Return new views of the three string parts
          endsep = startsep + len(sep)
          return (strview(_src, self.start, startsep),
                  strview(_src, startsep, endsep),
                  strview(_src, endsep, self.stop))

_NULL_STR = strview('')

def splitview(s):
       rest = strview(s)
       while 1:
           prefix, found, rest = rest.partition("{")
           if prefix:
               yield (None, str(prefix))
           if not found:
               break
           first, found, rest = rest.partition(" ")
           if not found:
               break
           second, found, rest = rest.partition("}")
           if not found:
               break
           yield (str(first), str(second))


# -------- Split by str.partition.
def splitpartition(s):
     rest = s
     while 1:
         prefix, found, temp = rest.partition("{")
         first, found, temp = temp.partition(" ")
         second, found, temp = temp.partition("}")
         if not found: break
         yield None, prefix
         yield first, second
         rest = temp
     if rest != '':
         yield None, rest


# -------- Split by partition slices.
import sys

def partslice(s, sep, sub_slice=slice(0, sys.maxint)):
     start, stop = sub_slice.start, sub_slice.stop
     try:
         found = s.index(sep, start, stop)
     except ValueError:
         return sub_slice, slice(stop,stop), slice(stop,stop)
     foundend = found + len(sep)
     return ( slice(start, found),
              slice(found, foundend),
              slice(foundend, stop) )

def splitslice(s):
     rest = slice(0, sys.maxint)
     while 1:
         prefix, found, temp = partslice(s, "{", rest)
         first, found, temp = partslice(s, " ", temp)
         second, found, temp = partslice(s, "}", temp)
         if found.start == found.stop:
             break
         yield None, s[prefix]
         yield s[first], s[second]
         rest = temp
     if rest.start != rest.stop:
         yield None, s[rest]

# -------- Tests.
import time
print sys.version

s = 'foo{spam eggs}bar{ham eggs}fob{beacon eggs}' * 2000 + 'xyz'
r = list(splitindex(s))
functions = [splitindex, splitview, splitpartition, splitslice]
for f in functions:
     start = time.clock()
     result = list(f(s))
     print '%-16s: %7.5f' % (f.__name__, time.clock()-start)
     assert result == r





From qrczak at knm.org.pl  Sat Aug 26 14:41:57 2006
From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk)
Date: Sat, 26 Aug 2006 14:41:57 +0200
Subject: [Python-3000] long/int unification
In-Reply-To: <20060824232848.1A9F.JCARLSON@uci.edu> (Josiah Carlson's
	message of "Thu, 24 Aug 2006 23:39:22 -0700")
References: <1156470595.44ee57436b03d@www.domainfactory-webmail.de>
	<ecm342$ab0$1@sea.gmane.org> <20060824232848.1A9F.JCARLSON@uci.edu>
Message-ID: <87u03z1xey.fsf@qrnik.zagroda>

Josiah Carlson <jcarlson at uci.edu> writes:

> Also, depending on the objects, one may consider a few other tagged
> objects, like perhaps None, True, and False

I doubt that it's worth it: they are not dynamically computed anyway,
so there is little gain (only avoiding manipulating their refcounts),
and the loss is a greater number of special cases when accessing
contents of every object.

> or even just use 31/63 bits for the tagged integer value, with a 1
> in the lowest bit signifying it as a tagged integer.

This is exactly what my compiler of my language does.

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/

From tjreedy at udel.edu  Sat Aug 26 15:26:04 2006
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 26 Aug 2006 09:26:04 -0400
Subject: [Python-3000] Making more effective use of slice objects in Py3k
References: <44F0107B.20205@iinet.net.au>
Message-ID: <ecpi5c$l5q$1@sea.gmane.org>


"Nick Coghlan" <ncoghlan at iinet.net.au> wrote in message 
news:44F0107B.20205 at iinet.net.au...

> I think an enriched slicing model that allows sequence views to be 
> expressed
> easily as "this slice of this sequence" would allow this to be dealt with
> cleanly, without requiring every sequence to provide a corresponding 
> "sequence
> view" with non-copying semantics.

I think this is promising.  I like the potential unification.

> Should I make a Py3k PEP for this?

I think so ;-)

tjr




From guido at python.org  Sat Aug 26 18:26:48 2006
From: guido at python.org (Guido van Rossum)
Date: Sat, 26 Aug 2006 09:26:48 -0700
Subject: [Python-3000] long/int unification
In-Reply-To: <ecmf3p$d1i$1@sea.gmane.org>
References: <1156470595.44ee57436b03d@www.domainfactory-webmail.de>
	<ecm342$ab0$1@sea.gmane.org> <20060824232848.1A9F.JCARLSON@uci.edu>
	<ecmf3p$d1i$1@sea.gmane.org>
Message-ID: <ca471dc20608260926g52e6fa8bh9e9c81598ddba74@mail.gmail.com>

On 8/25/06, Fredrik Lundh <fredrik at pythonware.com> wrote:
> Josiah Carlson wrote:
>
> > In the integer case, it reminds me of James Knight's tagged integer
> > patch to 2.3 [1].  If using long exclusively is 50% slower, why not try
> > the improved speed approach?
>
> looks like GvR was -1000 on this idea at the time, though...

I still am, because it requires extra tests for every incref and
decref and also for every use of an object's type pointer. I worry
about the cost of these tests, but I worry much more about the bugs it
will add when people don't tests first. ABC used this approach and we
kept finding bugs due to this problem.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Sat Aug 26 18:30:57 2006
From: guido at python.org (Guido van Rossum)
Date: Sat, 26 Aug 2006 09:30:57 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <44F0107B.20205@iinet.net.au>
References: <44F0107B.20205@iinet.net.au>
Message-ID: <ca471dc20608260930h528a7f60rc254eb2f75398a57@mail.gmail.com>

Can you explain in a sentence or two how these changes would be
*used*? Your code examples don't speak for themselves (maybe because
It's Saturday morning :-). Short examples of something clumsy and/or
slow that we'd have to write today compared to something fast and
elegant that we could write after the change would be quite helpful.
The exact inheritance relationship between slice and [x]range seems a
fairly uninteresting detail in comparison.

--Guido

On 8/26/06, Nick Coghlan <ncoghlan at iinet.net.au> wrote:
> This idea is inspired by the find/rfind string discussion (particularly a
> couple of comments from Jim and Ron), but I think the applicability may prove
> to be wider than just string methods (e.g. I suspect it may prove useful for
> the bytes() type as well).
>
> Copy-on-slice semantics are by far the easiest semantics to deal with in most
> cases, as they result in the fewest nasty surprises. However, they have one
> obvious drawback: performance can suffer badly when dealing with large
> datasets (copying 10 MB chunks of memory around can take a while!).
>
> There are a couple of existing workarounds for this: buffer() objects, and the
> start/stop arguments to a variety of string methods. Neither of these is
> particularly convenient to work with, and buffer() is slated to go away in Py3k.
>
> I think an enriched slicing model that allows sequence views to be expressed
> easily as "this slice of this sequence" would allow this to be dealt with
> cleanly, without requiring every sequence to provide a corresponding "sequence
> view" with non-copying semantics. I think Guido's concern that people will
> reach for string views when they don't need them is also valid (as I believe
> that it is most often inexperience that leads to premature optimization that
> then leads to needless code complexity).
>
> The specific changes I suggest based on the find/rfind discussion are:
>
>    1. make range() (what used to be xrange()) a subclass of slice(), so that
> range objects can be used to index sequences. The only differences between
> range() and slice() would then be that start/stop/step will never be None for
> range instances, and range instances act like an immutable sequence while
> slice instances do not (i.e. range objects would grow an indices() method).
>
>    2. change range() and slice() to accept slice() instances as arguments so
> that range(range(0)) is equivalent to range(0). (range(x) may throw ValueError
> if x.stop is None).
>
>    3. change API's that currently accept start/stop arguments (like string
> methods) to accept a single slice() instance instead (possibly raising
> ValueError if step != 1).
>
>    4. provide an additional string method partition_indices() that returns 3
> range() objects instead of 3 new strings
>
> The new method would have semantics like:
>
>    def partition_indices(self, sep, limits=None):
>        if limits is None:
>            limits = range(0, len(self))
>        else:
>            limits = limits.indices(len(self))
>        try:
>            idxsep = self.index(sep, limits)
>        except ValueError:
>            return limits, range(0), range(0)
>        endsep = idxsep + len(sep)
>        return (range(limits.start, idxsep),
>                range(idxsep, endsep),
>                range(endsep, limits.stop))
>
> With partition() itself being equivalent to:
>
>      def partition(self, sep, subseq=None):
>          before, sep, after = self.partition_indices(sep, subseq)
>          return self[before], self[sep], self[after]
>
> Finally, an efficient partition based implementation of the example from
> Walter that started the whole discussion about views and the problem with
> excessive copying would look like:
>
> def splitpartition_indices(s):
>       rest = range(len(s))
>       while 1:
>           prefix, lbrace, rest = s.partition_indices("{", rest)
>           first, space, rest = s.partition_indices(" ", rest)
>           second, rbrace, rest = s.partition_indices("}", rest)
>           if prefix:
>               yield (None, s[prefix])
>           if not (lbrace and space and rbrace):
>               break
>           yield (s[first], s[second])
>
> (I know the above misses a micro-optimization, in that it calls partition
> again on an empty subsequence, even if space or lbrace are False. I believe
> doing the three partition calls together makes it much easier to read, and
> searching an empty string is pretty quick).
>
> For comparison, here's the normal copying version that has problems scaling to
> large strings:
>
> def splitpartition(s):
>       rest = s
>       while 1:
>           prefix, lbrace, rest = rest.partition_indices("{")
>           first, space, rest = rest.partition_indices(" ")
>           second, rbrace, rest = rest.partition_indices("}")
>           if prefix:
>               yield (None, prefix)
>           if not (lbrace and space and rbrace):
>               break
>           yield (first, second)
>
> Should I make a Py3k PEP for this?
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
> ---------------------------------------------------------------
>              http://www.boredomandlaziness.org
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jcarlson at uci.edu  Sat Aug 26 19:00:41 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Sat, 26 Aug 2006 10:00:41 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <44F0107B.20205@iinet.net.au>
References: <44F0107B.20205@iinet.net.au>
Message-ID: <20060826084138.1AC0.JCARLSON@uci.edu>


Nick Coghlan <ncoghlan at iinet.net.au> wrote:
> 
> This idea is inspired by the find/rfind string discussion (particularly a 
> couple of comments from Jim and Ron), but I think the applicability may prove 
> to be wider than just string methods (e.g. I suspect it may prove useful for 
> the bytes() type as well).

A couple comments...

I don't particularly like the idea of using lists (or really iter(list) ),
range, or slice objects as defining what indices remain for a particular
string operation.  It just doesn't seem like the *right* thing to do.

> There are a couple of existing workarounds for this: buffer() objects, and the 
> start/stop arguments to a variety of string methods. Neither of these is 
> particular convenient to work with, and buffer() is slated to go away in Py3k.

Ahh, but string views offer a significantly more reasonable mechanism.

string = stringview(string)

Now, you can do things like parition(), slicing (with step=1), etc., and
all can return further string views.  Users don't need to learn a new
semantic (pass the sequence of indices).  We can toss all of the
optional start, stop arguments to all string functions, and replace them
with either of the following:
    result = stringview(string, start=None, stop=None).method(args)
    
    string = stringview(string)
    result = string[start:stop].method(args)
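A minimal sketch (illustrative API only, modern Python) of a stringview whose step-1 slicing returns further non-copying views, as in the second form above; negative indices and step != 1 are deliberately unsupported:

```python
class stringview(object):
    """Slicing returns another view over the same source string;
    str() materialises a real string.  Hypothetical sketch only."""
    def __init__(self, source, start=0, stop=None):
        if isinstance(source, stringview):
            # Views of views share a single underlying source string.
            base, offset, limit = source.source, source.start, source.stop
        else:
            base, offset, limit = source, 0, len(source)
        if stop is None:
            stop = limit - offset
        self.source = base
        self.start = offset + start
        self.stop = min(offset + stop, limit)

    def __len__(self):
        return max(0, self.stop - self.start)

    def __str__(self):
        return self.source[self.start:self.stop]

    def __getitem__(self, item):
        if isinstance(item, slice):
            start = item.start or 0
            stop = len(self) if item.stop is None else item.stop
            return stringview(self, start, stop)   # no copying
        return self.source[self.start + item]

    def startswith(self, prefix):
        # Forwarded method: internally reuses the start/stop arguments
        # that already exist on the builtin string methods.
        return self.source.startswith(prefix, self.start, self.stop)
```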


Perhaps one of the reasons why I prefer string views over this indices
mechanism is because I'm familiar with buffers, the idea of just having
a pointer into another structure, etc.  It just feels more natural from
my 8 years of C and 6 years of Python.


 - Josiah


From jackdied at jackdied.com  Sun Aug 27 02:24:04 2006
From: jackdied at jackdied.com (Jack Diederich)
Date: Sat, 26 Aug 2006 20:24:04 -0400
Subject: [Python-3000] find -> index patch
In-Reply-To: <eckao9$rq9$1@sea.gmane.org>
References: <20060824054450.x8w46l05kz488004@login.werra.lunarpages.com>
	<eckao9$rq9$1@sea.gmane.org>
Message-ID: <20060827002404.GG24154@performancedrivers.com>

On Thu, Aug 24, 2006 at 03:48:57PM +0200, Fredrik Lundh wrote:
> Michael Chermside wrote:
> 
> >> WOW, I love partition.  In all the instances that weren't a simple "in"
> >> test I ended up using [r]partition.  In some cases one of the returned
> >> strings gets thrown away but in those cases it is guaranteed to be small.
> >> The new code is usually smaller than the old and generally clearer.
> >
> > Wow. That's just beautiful. This has now convinced me that dumping
> > [r]find() (at least!) and pushing people toward using partition will
> > result in pain in the short term (of course), and beautiful, readable
> > code in the long term.
> 
> note that partition provides an elegant solution to an important *subset* of all
> problems addressed by find/index.
> 
> just like lexical scoping vs. default arguments and map vs. list comprehensions,
> it doesn't address all problems right out of the box, and shouldn't be advertised
> as doing that.
> 

After some benchmarking, find() can't go away without really hurting readline() 
performance.  partition performs as well as find for small lines, but for large 
lines the extra copy to concat the newline separator is a killer (twice as slow 
for 50k char lines).  index has the opposite problem: the overhead of setting up
a try block makes 50 char lines twice as slow, even when the except clause is never 
triggered.
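That trade-off can be reproduced with a rough timeit sketch (the line lengths, repeat counts, and helper names here are arbitrary assumptions, not the original benchmark):

```python
import timeit

line = "x" * 50000 + "\n" + "tail"

def with_find(s):
    # No exception machinery, no extra copy of the head.
    pos = s.find("\n")
    return s[:pos + 1] if pos >= 0 else s

def with_partition(s):
    head, sep, _tail = s.partition("\n")
    return head + sep          # the concat re-copies the large head

def with_index(s):
    # try/except setup costs on every call, even when never triggered.
    try:
        pos = s.index("\n")
    except ValueError:
        return s
    return s[:pos + 1]

for fn in (with_find, with_partition, with_index):
    t = timeit.timeit(lambda: fn(line), number=2000)
    print(fn.__name__, round(t, 4))
```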

A version of partition that returned two arguments instead of three would solve
the problem, but that would just be adding more functions to remove the two finds
or adding behavior flags to partition.  Ick.

Most uses of find are better off using partition but if this one case can't
be beat there must be others too.

-Jack

From jimjjewett at gmail.com  Sun Aug 27 03:59:25 2006
From: jimjjewett at gmail.com (Jim Jewett)
Date: Sat, 26 Aug 2006 21:59:25 -0400
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <20060826084138.1AC0.JCARLSON@uci.edu>
References: <44F0107B.20205@iinet.net.au>
	<20060826084138.1AC0.JCARLSON@uci.edu>
Message-ID: <fb6fbf560608261859r1ecac1a8ye23008534b952c05@mail.gmail.com>

On 8/26/06, Josiah Carlson <jcarlson at uci.edu> wrote:
> Nick Coghlan <ncoghlan at iinet.net.au> wrote:

> > There are a couple of existing workarounds for
> > this: buffer() objects, and the start/stop arguments
> > to a variety of string methods. Neither of these is
> > particular convenient to work with, and buffer() is
> > slated to go away in Py3k.

> Ahh, but string views offer a significantly more
> reasonable mechanism.

As I understand it, Nick is suggesting that slice objects be used as a
sequence (not just string) view.


> string = stringview(string)
> ...  We can toss all of the optional start, stop
> arguments to all string functions, and replace them
> with either of the following:
>     result = stringview(string, start=None, stop=None).method(args)

>     string = stringview(string)
>     result = string[start:stop].method(args)

Under Nick's proposal, I believe we could replace it with just the final line.

    result = string[start:stop].method(args)

though there is a chance that (when you want to avoid copying) he is
suggesting explicit slice objects such as

    view=slice(start, stop)
    result = view(string).method(args)

-jJ

From jimjjewett at gmail.com  Sun Aug 27 04:42:02 2006
From: jimjjewett at gmail.com (Jim Jewett)
Date: Sat, 26 Aug 2006 22:42:02 -0400
Subject: [Python-3000] path in py3K Re: [Python-checkins] r51624 - in
	python/trunk/Lib: genericpath.py macpath.py ntpath.py
	os2emxpath.py posixpath.py test/test_genericpath.py
Message-ID: <fb6fbf560608261942x37d79be3i197324e35a6849cb@mail.gmail.com>

In Py3K, is it still safe to assume that a list of paths will be
(enough like) ordinary strings?

I ask because of the various Path object discussions; it wasn't clear
that a Path object should be a sequence of (normalized unicode?)
characters (rather than path components), that the path would always
be normalized or absolute, or even that it would implement the LE (or
LT?) comparison operator.

-jJ

On 8/26/06, jack.diederich <python-checkins at python.org> wrote:
> Author: jack.diederich
> Date: Sat Aug 26 20:42:06 2006
> New Revision: 51624

> Added: python/trunk/Lib/genericpath.py

> +# Return the longest prefix of all list elements.
> +def commonprefix(m):
> +    "Given a list of pathnames, returns the longest common leading component"
> +    if not m: return ''
> +    s1 = min(m)
> +    s2 = max(m)
> +    n = min(len(s1), len(s2))
> +    for i in xrange(n):
> +        if s1[i] != s2[i]:
> +            return s1[:i]
> +    return s1[:n]
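
As a usage note on the helper quoted above: it compares character by character,
not path component by path component, which is one reason the "paths as plain
strings" question matters. A quick illustration (same algorithm, with xrange
spelled as range for a modern interpreter):

```python
def commonprefix(m):
    "Given a list of pathnames, returns the longest common leading component"
    # same algorithm as the quoted checkin, with xrange spelled as range
    if not m:
        return ''
    s1 = min(m)  # lexicographically smallest
    s2 = max(m)  # lexicographically largest
    n = min(len(s1), len(s2))
    for i in range(n):
        if s1[i] != s2[i]:
            return s1[:i]
    return s1[:n]

# The result need not be a valid directory prefix:
print(commonprefix(['/usr/lib', '/usr/local']))  # -> '/usr/l'
```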

From guido at python.org  Sun Aug 27 04:51:03 2006
From: guido at python.org (Guido van Rossum)
Date: Sat, 26 Aug 2006 19:51:03 -0700
Subject: [Python-3000] find -> index patch
In-Reply-To: <20060827002404.GG24154@performancedrivers.com>
References: <20060824054450.x8w46l05kz488004@login.werra.lunarpages.com>
	<eckao9$rq9$1@sea.gmane.org>
	<20060827002404.GG24154@performancedrivers.com>
Message-ID: <ca471dc20608261951g2f2c7fe3p2b2a63eae7b563df@mail.gmail.com>

On 8/26/06, Jack Diederich <jackdied at jackdied.com> wrote:
> After some benchmarking find() can't go away without really hurting readline()
> performance.

Can you elaborate? readline() is typically implemented in C so I'm not
sure I follow.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Sun Aug 27 05:00:05 2006
From: guido at python.org (Guido van Rossum)
Date: Sat, 26 Aug 2006 20:00:05 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <fb6fbf560608261859r1ecac1a8ye23008534b952c05@mail.gmail.com>
References: <44F0107B.20205@iinet.net.au>
	<20060826084138.1AC0.JCARLSON@uci.edu>
	<fb6fbf560608261859r1ecac1a8ye23008534b952c05@mail.gmail.com>
Message-ID: <ca471dc20608262000j6d41d26dwd765e88feee5dacd@mail.gmail.com>

On 8/26/06, Jim Jewett <jimjjewett at gmail.com> wrote:
> On 8/26/06, Josiah Carlson <jcarlson at uci.edu> wrote:
> > Nick Coghlan <ncoghlan at iinet.net.au> wrote:
>
> > > There are a couple of existing workarounds for
> > > this: buffer() objects, and the start/stop arguments
> > > to a variety of string methods. Neither of these is
> > > particular convenient to work with, and buffer() is
> > > slated to go away in Py3k.
>
> > Ahh, but string views offer a significantly more
> > reasonable mechanism.
>
> As I understand it, Nick is suggesting that slice objects be used as a
> sequence (not just string) view.

I have a hard time parsing this sentence. A slice is an object with
three immutable attributes -- start, stop, step. How does this double
as a string view?

> > string = stringview(string)
> > ...  We can toss all of the optional start, stop
> > arguments to all string functions, and replace them
> > with either of the following:
> >     result = stringview(string, start=None, stop=None).method(args)
>
> >     string = stringview(string)
> >     result = string[start:stop].method(args)
>
> Under Nick's proposal, I believe we could replace it with just the final line.

I still don't see the transformation of clumsy to elegant. Please give
me a complete, specific example instead of a generic code snippet.
(Also, please don't use 'string' as a variable name. There's a module
by that name that I can't get out of my head.)

Maybe the idea is that instead of

  pos = s.find(t, pos)

we would write

  pos += stringview(s)[pos:].find(t)

???

And how is that easier on the eyes? (And note the need to use +=
because the sliced view renumbers the positions in the original
string.)

>     result = string[start:stop].method(args)
>
> though there is a chance that (when you want to avoid copying) he is
> suggesting explicit slice objects such as
>
>     view=slice(start, stop)
>     result = view(string).method(args)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Sun Aug 27 05:01:27 2006
From: guido at python.org (Guido van Rossum)
Date: Sat, 26 Aug 2006 20:01:27 -0700
Subject: [Python-3000] path in py3K Re: [Python-checkins] r51624 - in
	python/trunk/Lib: genericpath.py macpath.py ntpath.py
	os2emxpath.py posixpath.py test/test_genericpath.py
In-Reply-To: <fb6fbf560608261942x37d79be3i197324e35a6849cb@mail.gmail.com>
References: <fb6fbf560608261942x37d79be3i197324e35a6849cb@mail.gmail.com>
Message-ID: <ca471dc20608262001s3f1d0b62id94a0e4660839dd1@mail.gmail.com>

It is not my intention to adopt the Path module in Py3k.

On 8/26/06, Jim Jewett <jimjjewett at gmail.com> wrote:
> In Py3K, is it still safe to assume that a list of paths will be
> (enough like) ordinary strings?
>
> I ask because of the various Path object discussions; it wasn't clear
> that a Path object should be a sequence of (normalized unicode?)
> characters (rather than path components), that the path would always
> be normalized or absolute, or even that it would implement the LE (or
> LT?) comparison operator.
>
> -jJ
>
> On 8/26/06, jack.diederich <python-checkins at python.org> wrote:
> > Author: jack.diederich
> > Date: Sat Aug 26 20:42:06 2006
> > New Revision: 51624
>
> > Added: python/trunk/Lib/genericpath.py
>
> > +# Return the longest prefix of all list elements.
> > +def commonprefix(m):
> > +    "Given a list of pathnames, returns the longest common leading component"
> > +    if not m: return ''
> > +    s1 = min(m)
> > +    s2 = max(m)
> > +    n = min(len(s1), len(s2))
> > +    for i in xrange(n):
> > +        if s1[i] != s2[i]:
> > +            return s1[:i]
> > +    return s1[:n]
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jimjjewett at gmail.com  Sun Aug 27 05:30:30 2006
From: jimjjewett at gmail.com (Jim Jewett)
Date: Sat, 26 Aug 2006 23:30:30 -0400
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <ca471dc20608262000j6d41d26dwd765e88feee5dacd@mail.gmail.com>
References: <44F0107B.20205@iinet.net.au>
	<20060826084138.1AC0.JCARLSON@uci.edu>
	<fb6fbf560608261859r1ecac1a8ye23008534b952c05@mail.gmail.com>
	<ca471dc20608262000j6d41d26dwd765e88feee5dacd@mail.gmail.com>
Message-ID: <fb6fbf560608262030m2286a273nbd126a98b63103d3@mail.gmail.com>

On 8/26/06, Guido van Rossum <guido at python.org> wrote:
> On 8/26/06, Jim Jewett <jimjjewett at gmail.com> wrote:
> > On 8/26/06, Josiah Carlson <jcarlson at uci.edu> wrote:
> > > Nick Coghlan <ncoghlan at iinet.net.au> wrote:

> > > > There are a couple of existing workarounds for
> > > > this: buffer() objects, and the start/stop
> > > > arguments to a variety of string methods.

> > > Ahh, but string views offer a significantly more
> > > reasonable mechanism.

> > As I understand it, Nick is suggesting that slice
> > objects be used as a sequence (not just string)
> > view.

> I have a hard time parsing this sentence. A slice is
> an object with three immutable attributes -- start,
> stop, step. How does this double as a string view?

Poor wording on my part; it is (the application of a slice to a
specific sequence) that could act as a copyless view.

For example, you wanted to keep the rarely used optional arguments to
find because of efficiency.

    s.find(prefix, start, stop)

does not copy.  If slices were less eager at copying, this could be
rewritten as

    view=slice(start, stop, 1)
    view(s).find(prefix)

or perhaps even as

    s[start:stop].find(prefix)

I'm not sure these look better, but they are less surprising, because
they don't depend on optional arguments that most people have
forgotten about.


> Maybe the idea is that instead of

>   pos = s.find(t, pos)

> we would write

>   pos += stringview(s)[pos:].find(t)

> ???

With stringviews, you wouldn't need to be reindexing from the start of
the original string.  The idiom would instead be a generalization of
"for line in file:"

    while data:
        chunk, sep, data = data.partition()

but the partition call would not need to copy the entire string; it
could simply return three views.

Yes, this does risk keeping all of data alive because one chunk was
saved.  This might be a reasonable tradeoff to avoid the copying.  If
not, perhaps the gc system could be augmented to shrink bloated views
during idle moments.
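
A minimal sketch of that idiom, assuming a hypothetical stringview type (no
such class exists; this is only meant to show how partition() could return
three windows onto one buffer):

```python
class stringview:
    """Non-copying window onto a string (illustrative sketch only)."""

    def __init__(self, s, start=0, stop=None):
        self.s = s
        self.start = start
        self.stop = len(s) if stop is None else stop

    def __len__(self):
        return max(0, self.stop - self.start)

    def __bool__(self):
        return len(self) > 0

    def __str__(self):
        # the one place a copy happens: converting back to a real string
        return self.s[self.start:self.stop]

    def partition(self, sep):
        # like str.partition, but the three results are views, not copies
        i = self.s.find(sep, self.start, self.stop)
        if i == -1:
            empty = stringview(self.s, self.stop, self.stop)
            return self, empty, empty
        return (stringview(self.s, self.start, i),
                stringview(self.s, i, i + len(sep)),
                stringview(self.s, i + len(sep), self.stop))

data = stringview("a\nbb\nccc")
chunks = []
while data:
    chunk, sep, data = data.partition("\n")
    chunks.append(str(chunk))
print(chunks)  # -> ['a', 'bb', 'ccc']
```

As noted above, every chunk here keeps the whole original string alive, which
is exactly the tradeoff being discussed.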

-jJ

From jackdied at jackdied.com  Sun Aug 27 06:12:27 2006
From: jackdied at jackdied.com (Jack Diederich)
Date: Sun, 27 Aug 2006 00:12:27 -0400
Subject: [Python-3000] find -> index patch
In-Reply-To: <ca471dc20608261951g2f2c7fe3p2b2a63eae7b563df@mail.gmail.com>
References: <20060824054450.x8w46l05kz488004@login.werra.lunarpages.com>
	<eckao9$rq9$1@sea.gmane.org>
	<20060827002404.GG24154@performancedrivers.com>
	<ca471dc20608261951g2f2c7fe3p2b2a63eae7b563df@mail.gmail.com>
Message-ID: <20060827041227.GJ24154@performancedrivers.com>

On Sat, Aug 26, 2006 at 07:51:03PM -0700, Guido van Rossum wrote:
> On 8/26/06, Jack Diederich <jackdied at jackdied.com> wrote:
> > After some benchmarking find() can't go away without really hurting readline()
> > performance.
> 
> Can you elaborate? readline() is typically implemented in C so I'm not
> sure I follow.
> 

A number of modules in Lib have readline() methods that currently use find():
StringIO, httplib, tarfile, and others.

sprat:~/src/python-head/Lib# grep 'def readline' *.py | wc -l
30

Mainly I wanted to point out that find() solves a class of problems that
can't be solved equally well with partition() (bad for large strings that
want to preserve the separator) or index() (bad for large numbers of small
strings and for frequent misses).  I wanted to reach the conclusion that
find() could be yanked out, but as Fredrik opined, it is still useful for a
subset of problems.

-Jack

From jcarlson at uci.edu  Sun Aug 27 08:08:14 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Sat, 26 Aug 2006 23:08:14 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <fb6fbf560608262030m2286a273nbd126a98b63103d3@mail.gmail.com>
References: <ca471dc20608262000j6d41d26dwd765e88feee5dacd@mail.gmail.com>
	<fb6fbf560608262030m2286a273nbd126a98b63103d3@mail.gmail.com>
Message-ID: <20060826230223.1AD6.JCARLSON@uci.edu>


"Jim Jewett" <jimjjewett at gmail.com> wrote:
> With stringviews, you wouldn't need to be reindexing from the start of
> the original string.  The idiom would instead be a generalization of
> "for line in file:"
> 
>     while data:
>         chunk, sep, data = data.partition()
> 
> but the partition call would not need to copy the entire string; it
> could simply return three views.

Also, with a little work, having string views be smart about
concatenation (if two views are adjacent to each other, like chunk,sep
or sep,data above, view1+view2 -> view3 on the original string), copies
could further be minimized, and the earlier problem with readline, etc.,
can be avoided.
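
That merging rule might be sketched roughly like this (hypothetical View
class, kept deliberately tiny):

```python
class View:
    """Non-copying (string, start, stop) triple -- illustrative only."""

    def __init__(self, s, start, stop):
        self.s, self.start, self.stop = s, start, stop

    def __add__(self, other):
        # adjacent views on the same string merge into one wider view
        if isinstance(other, View) and self.s is other.s \
                and self.stop == other.start:
            return View(self.s, self.start, other.stop)
        # otherwise fall back to a real, copied string
        return str(self) + str(other)

    def __str__(self):
        return self.s[self.start:self.stop]

s = "hello\nworld"
chunk, sep = View(s, 0, 5), View(s, 5, 6)
line = chunk + sep  # adjacent, so still a View: no copy made
print(type(line).__name__, repr(str(line)))  # -> View 'hello\n'
```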

 - Josiah


From jcarlson at uci.edu  Sun Aug 27 08:23:38 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Sat, 26 Aug 2006 23:23:38 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <fb6fbf560608261859r1ecac1a8ye23008534b952c05@mail.gmail.com>
References: <20060826084138.1AC0.JCARLSON@uci.edu>
	<fb6fbf560608261859r1ecac1a8ye23008534b952c05@mail.gmail.com>
Message-ID: <20060826230846.1AD9.JCARLSON@uci.edu>


"Jim Jewett" <jimjjewett at gmail.com> wrote:
> 
> On 8/26/06, Josiah Carlson <jcarlson at uci.edu> wrote:
> > Nick Coghlan <ncoghlan at iinet.net.au> wrote:
> 
> > > There are a couple of existing workarounds for
> > > this: buffer() objects, and the start/stop arguments
> > > to a variety of string methods. Neither of these is
> > > particular convenient to work with, and buffer() is
> > > slated to go away in Py3k.
> 
> > Ahh, but string views offer a significantly more
> > reasonable mechanism.
> 
> As I understand it, Nick is suggesting that slice objects be used as a
> sequence (not just string) view.

I'm not sure there is a compelling use-case for offering views on
general ordered sequences (lists).  Unicode and bytes strings, sure, but
I don't think I've ever really been hurting for faster/more memory
efficient list slicing...  Maybe I'm strange.

 - Josiah


From ncoghlan at iinet.net.au  Sun Aug 27 16:59:24 2006
From: ncoghlan at iinet.net.au (Nick Coghlan)
Date: Mon, 28 Aug 2006 00:59:24 +1000
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <fb6fbf560608262030m2286a273nbd126a98b63103d3@mail.gmail.com>
References: <44F0107B.20205@iinet.net.au>	
	<20060826084138.1AC0.JCARLSON@uci.edu>	
	<fb6fbf560608261859r1ecac1a8ye23008534b952c05@mail.gmail.com>	
	<ca471dc20608262000j6d41d26dwd765e88feee5dacd@mail.gmail.com>
	<fb6fbf560608262030m2286a273nbd126a98b63103d3@mail.gmail.com>
Message-ID: <44F1B34C.4020601@iinet.net.au>

Jim Jewett wrote:
> On 8/26/06, Guido van Rossum <guido at python.org> wrote:
>> On 8/26/06, Jim Jewett <jimjjewett at gmail.com> wrote:
>> > As I understand it, Nick is suggesting that slice
>> > objects be used as a sequence (not just string)
>> > view.
> 
>> I have a hard time parsing this sentence. A slice is
>> an object with three immutable attributes -- start,
>> stop, step. How does this double as a string view?
> 
> Poor wording on my part; it is (the application of a slice to a
> specific sequence) that could act as copyless view.
> 
> For example, you wanted to keep the rarely used optional arguments to
> find because of efficiency.
> 
>    s.find(prefix, start, stop)
> 
> does not copy.  If slices were less eager at copying, this could be
> rewritten as
> 
>    view=slice(start, stop, 1)
>    view(s).find(prefix)
> 
> or perhaps even as
> 
>    s[start:stop].find(prefix)
> 
> I'm not sure these look better, but they are less surprising, because
> they don't depend on optional arguments that most people have
> forgotten about.

Actually, string views have nothing to do with what I'm suggesting (although 
my comments about them in the find/rfind thread were one of the things that 
fed into this message). I'm actually proposing an *alternative* to string 
views, because they have a nasty problem with non-local effects. It is easy to 
pass or return a string view instead of an actual string, and you get 
something that runs with subtly different semantics from what you expect, but 
that isn't likely to trigger an obvious error. It also breaks the persistent 
idiom that "seq[:]" makes a copy (which is true throughout the standard 
library, even if it isn't true for external number-crunching libraries like 
NumPy).

You also potentially end up with *every* sequence type growing an
"x-view" counterpart, which is horrible. OTOH, if we make the standard library
more consistent in always using a slice or range object anytime it wants to 
pass or return (start, stop, step) information, it provides a foundation for 
someone to do their own non-copying versions.

So with my musings, the non-copying index operation in a subsection would 
still use an optional second argument:

    s.find(prefix, slice(start, stop))

Now, the ultimate extension of this idea would be to permit slice literals in 
places other than sequence indexing (similar to how Py3k is likely to permit 
Ellipsis literals outside of subscript expressions). Naturally, parentheses 
may be needed in order to disambiguate colons:

    s.find(prefix, (start:stop))

Contrast this with the copying version:

    s[start:stop].find(prefix)

If (start:stop:step) is equivalent to slice(start, stop, step), then slice 
notation can be used to create ranges: range(start:stop:step)

The idea of making slice objects callable, with the result being a view of the 
original sequence is Jim's, not mine, and I'm not that keen on it (my 
reservations about string views apply to the more general idea of sequence 
views, too).

Cheers,
Nick.

P.S. I *will* be doing a PEP to bring this discussion together, but be warned 
that it may be a week or two before I get to it.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From ncoghlan at gmail.com  Sun Aug 27 17:28:14 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 28 Aug 2006 01:28:14 +1000
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <ca471dc20608260930h528a7f60rc254eb2f75398a57@mail.gmail.com>
References: <44F0107B.20205@iinet.net.au>
	<ca471dc20608260930h528a7f60rc254eb2f75398a57@mail.gmail.com>
Message-ID: <44F1BA0E.3040203@gmail.com>

Guido van Rossum wrote:
> Can you explain in a sentence or two how these changes would be
> *used*? Your code examples don't speak for themselves (maybe because
> It's Saturday morning :-). Short examples of something clumsy and/or
> slow that we'd have to write today compared to something fast and
> elegant that we could write after the change woulde be quite helpful.
> The exact inheritance relationship between slice and [x]range seems a
> fairly uninteresting details in comparison.

A more unified model for representing sequence slices makes it practical to 
offer a non-copying string partitioning method like the version of 
partition_indices() in my initial message. With the current mixed model 
(sometimes using xrange(), sometimes using slice(), sometimes using a 3-tuple, 
sometimes using separate start & stop values), there is no point in offering 
such a method, as it would be terribly inconvenient to work with regardless of 
what kind of objects it returned to indicate the 3 portions of the original 
string:

  - 3-tuples and xrange() objects can't be used to slice a sequence
  - 3-tuples and slice() objects can't be usefully tested for truth
  - none of them can be passed as optional string method arguments

I believe the current mixed model is actually an artifact of the transition 
from simple slicing to extended slicing, albeit one that is significantly less 
obvious than the deprecated __*slice__ family of special methods. Old style 
slicing and string methods use separate start and stop values. Extended 
slicing uses slice objects with start,stop,step attributes (which can be 
anything, including None). The indices() method of slice objects uses a 
start,stop,step 3-tuple. Iteration uses either a list of indices (from 
range()) or xrange objects with start,stop,step attributes (which must be 
integers).
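
The mixed model is easy to demonstrate with current spellings (transcribed
into modern Python 3 syntax, where range() plays xrange()'s role and exposes
the integer start/stop/step attributes described):

```python
s = "hello world"

# 1. separate start/stop arguments to string methods
assert s.find("o", 5, 8) == 7

# 2. slice objects, whose attributes may be anything (including None)
sl = slice(4, 8)
assert s[sl] == "o wo"
assert slice(None).start is None

# 3. slice.indices() hands back a bare (start, stop, step) 3-tuple
assert sl.indices(len(s)) == (4, 8, 1)

# 4. range objects with integer start/stop/step for iteration
r = range(4, 8)
assert (r.start, r.stop, r.step) == (4, 8, 1)
assert list(r) == [4, 5, 6, 7]
```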

The basic proposal I am making is to reduce this to exactly two concepts:
   - slice objects, which have arbitrary start, stop, step attributes
   - range objects, which have indices as start, stop, step attributes, behave 
like an immutable sequence, and are a subclass of slice

All other instances in the core and standard library which use a different 
representation of a sequence slice (like the optional arguments to string 
methods, or the result of the indices() method) would change to use one of 
those two types. The methods of the types would be driven by the needs of the 
standard library.

In addition to reducing the number of concepts to be dealt with from 4 to 2, I
believe this would make it much easier to write memory efficient code without 
having to duplicate entire objects with non-copying versions.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From ncoghlan at gmail.com  Sun Aug 27 17:37:59 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 28 Aug 2006 01:37:59 +1000
Subject: [Python-3000] find -> index patch
In-Reply-To: <20060827041227.GJ24154@performancedrivers.com>
References: <20060824054450.x8w46l05kz488004@login.werra.lunarpages.com>	<eckao9$rq9$1@sea.gmane.org>	<20060827002404.GG24154@performancedrivers.com>	<ca471dc20608261951g2f2c7fe3p2b2a63eae7b563df@mail.gmail.com>
	<20060827041227.GJ24154@performancedrivers.com>
Message-ID: <44F1BC57.7090004@gmail.com>

Jack Diederich wrote:
> On Sat, Aug 26, 2006 at 07:51:03PM -0700, Guido van Rossum wrote:
>> On 8/26/06, Jack Diederich <jackdied at jackdied.com> wrote:
>>> After some benchmarking find() can't go away without really hurting readline()
>>> performance.
>> Can you elaborate? readline() is typically implemented in C so I'm not
>> sure I follow.
>>
> 
> A number of modules in Lib have readline() methods that currently use find().
> StringIO, httplib, tarfile, and others
> 
> sprat:~/src/python-head/Lib# grep 'def readline' *.py | wc -l
> 30
> 
> Mainly I wanted to point out that find() solves a class of problems that
> can't be solved equally well with partition() (bad for large strings that
> want to preserve the seperator) or index() (bad for large numbers of small 
> strings and for frequent misses).  I wanted to reach the conclusion that 
> find() could be yanked out but as Fredrik opined it is still useful for a 
> subset of problems.

What about a version of partition that returned a 3-tuple of xrange objects 
indicating the indices of the partitions, instead of copies of the partitions? 
That would allow you to use the cleaner idiom without having to suffer the 
copying performance penalty.

Something like:

    line, newline, rest = s.partition_indices('\n', rest.start, rest.stop)
    if newline:
        yield s[line.start:newline.stop]
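
One hypothetical shape for such a method, written here as a free function over
range objects (the sketch relies on range exposing .start and .stop; the name
partition_indices follows the example above and is not a real API):

```python
def partition_indices(s, sep, start=0, stop=None):
    # like str.partition, but returns three range objects over s
    # instead of three copied substrings
    if stop is None:
        stop = len(s)
    i = s.find(sep, start, stop)
    if i == -1:
        return range(start, stop), range(stop, stop), range(stop, stop)
    return (range(start, i),
            range(i, i + len(sep)),
            range(i + len(sep), stop))

def readlines(s):
    # the readline-style loop from the snippet above, minus the copying
    rest = range(0, len(s))
    while rest:  # empty ranges are false
        line, newline, rest = partition_indices(s, "\n", rest.start, rest.stop)
        yield s[line.start:newline.stop]  # one copy per yielded line only

print(list(readlines("a\nbb\nccc")))  # -> ['a\n', 'bb\n', 'ccc']
```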

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From jcarlson at uci.edu  Sun Aug 27 17:45:30 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Sun, 27 Aug 2006 08:45:30 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <44F1B34C.4020601@iinet.net.au>
References: <fb6fbf560608262030m2286a273nbd126a98b63103d3@mail.gmail.com>
	<44F1B34C.4020601@iinet.net.au>
Message-ID: <20060827081644.1ADC.JCARLSON@uci.edu>


Nick Coghlan <ncoghlan at iinet.net.au> wrote:
[snip]
> that isn't likely to trigger an obvious error. It also breaks the persistent 
> idiom that "seq[:]" makes a copy (which is true throughout the standard 
> library, even if it isn't true for external number-crunching libraries like 
> NumPy).

The copying is easily fixed.  I'm also not terribly concerned with the
persistence of views, as I expect that most people who bother to use
them (and/or care about the efficiency of str.partition, etc.) will know
what they are getting themselves into.  If they don't, then they will
post on python-[list|dev], and we can give them a link to the string
view documentation, which will explain what views are and how they can
release the references to the original object: ref = str(ref) .


> You also potentially end up with *every* sequence type ending up with a 
> "x-view" counterpart, which is horrible. OTOH, if we make the standard library 
> more consistent in always using a slice or range object anytime it wants to 
> pass or return (start, stop, step) information, it provides a foundation for 
> someone to do their own non-copying versions.

I'm not sure your slippery-slope argument holds.  So far there are only
a few objects for which views have been proposed with any substance:
dictionaries, text and byte strings.

The removal of buffer from 3.0 does leave an opening for other
structures for which views (or even the original buffers) would make
sense, like array and mmap, but those each have implementations that
could effectively mirror the (mutable) byte string view.

As for using slices to define a mechanism for returning view-like
objects (it is effectively a different spelling), I don't particularly
care for passing around slice/xrange objects.

I would also like to mention that there exist external libraries that
offer non-copying "views" to their underlying structures, the 'array
interface' that was proposed in the last few months being a primary
example of a desired standardization of such.


> So with my musings, the non-copying index operation in a subsection would 
> still use an optional second argument:
> 
>     s.find(prefix, slice(start, stop))

This reduces the number of optional arguments by 1, and requires the
somewhat explicit spelling out of the slice creation (which you attempt
to remove via various syntax changes). I'm not sure these are actual
improvements to either the string (or otherwise) API, or to the general
sequence API.


> If (start:stop:step) is equivalent to slice(start, stop, step), then slice 
> notation can be used to create ranges: range(start:stop:step)

That looks like the integer slicing PEP that was rejected.  Also, no one
has been severely restricted by syntax; one could easily write a
specialized object so that "for i in range[start:stop:step]" 'does the
right thing'.

 - Josiah


From guido at python.org  Sun Aug 27 17:50:39 2006
From: guido at python.org (Guido van Rossum)
Date: Sun, 27 Aug 2006 08:50:39 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <fb6fbf560608262030m2286a273nbd126a98b63103d3@mail.gmail.com>
References: <44F0107B.20205@iinet.net.au>
	<20060826084138.1AC0.JCARLSON@uci.edu>
	<fb6fbf560608261859r1ecac1a8ye23008534b952c05@mail.gmail.com>
	<ca471dc20608262000j6d41d26dwd765e88feee5dacd@mail.gmail.com>
	<fb6fbf560608262030m2286a273nbd126a98b63103d3@mail.gmail.com>
Message-ID: <ca471dc20608270850l70279c2bw30a41d82a721f00e@mail.gmail.com>

On 8/26/06, Jim Jewett <jimjjewett at gmail.com> wrote:
> > > As I understand it, Nick is suggesting that slice
> > > objects be used as a sequence (not just string)
> > > view.
>
> > I have a hard time parsing this sentence. A slice is
> > an object with three immutable attributes -- start,
> > stop, step. How does this double as a string view?
>
> Poor wording on my part; it is (the application of a slice to a
> specific sequence) that could act as copyless view.
>
> For example, you wanted to keep the rarely used optional arguments to
> find because of efficiency.

I don't believe they are rarely used. They are (currently) essential
for code that searches a long string for a short substring repeatedly.
If you believe that is a rare use case, why bother coming up with a
whole new language feature to support it?

>     s.find(prefix, start, stop)
>
> does not copy.

That's still really poor wording. If you want to make your case you
should take more time explaining it right.

> If slices were less eager at copying, this could be
> rewritten as
>
>     view=slice(start, stop, 1)
>     view(s).find(prefix)

Now you're postulating that calling a slice will take a slice of an
object? Any object? And how is that supposed to work for arbitrary
objects? I would think that it ought to be a method on the string
object -- surely a view on a string will have to be a different type
of object than a view on a list, and that ought to be different again
from a view on a unicode string. Also you're postulating that the
slice object somehow has the same methods as the thing it slices? How
are you expecting to implement that? (Don't tell me that you haven't
thought about implementation yet. Without an implementation plan there
is no feature.)

> or perhaps even as
>
>     s[start:stop].find(prefix)

That will never fly. NumPy may get away with non-copying slices, but
for built-in objects this would be too big of a departure of current
practice. (If you don't stop about this I'll have to add it to PEP
3099. :-)

> I'm not sure these look better, but they are less surprising, because
> they don't depend on optional arguments that most people have
> forgotten about.

Because they're not that important except to the few people who really
need the optimization. Also they're easily looked up.

> > Maybe the idea is that instead of
>
> >   pos = s.find(t, pos)
>
> > we would write
>
> >   pos += stringview(s)[pos:].find(t)
>
> > ???
>
> With stringviews, you wouldn't need to be reindexing from the start of
> the original string.  The idiom would instead be a generalization of
> "for line in file:"
>
>     while data:
>         chunk, sep, data = data.partition()
>
> but the partition call would not need to copy the entire string; it
> could simply return three views.

That depends. I can imagine situations where the indices are needed
regardless of how you code it.

> Yes, this does risk keeping all of data alive because one chunk was
> saved.  This might be a reasonable tradeoff to avoid the copying.  If
> not, perhaps the gc system could be augmented to shrink bloated views
> during idle moments.

Keep dreaming on. It really seems you have no clue about
implementation issues; you just keep postulating random solutions
whenever you're faced with an objection.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Sun Aug 27 17:55:05 2006
From: guido at python.org (Guido van Rossum)
Date: Sun, 27 Aug 2006 08:55:05 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <20060826230223.1AD6.JCARLSON@uci.edu>
References: <ca471dc20608262000j6d41d26dwd765e88feee5dacd@mail.gmail.com>
	<fb6fbf560608262030m2286a273nbd126a98b63103d3@mail.gmail.com>
	<20060826230223.1AD6.JCARLSON@uci.edu>
Message-ID: <ca471dc20608270855p1c74839nc7d2430ae7cb6479@mail.gmail.com>

On 8/26/06, Josiah Carlson <jcarlson at uci.edu> wrote:
>
> "Jim Jewett" <jimjjewett at gmail.com> wrote:
> > With stringviews, you wouldn't need to be reindexing from the start of
> > the original string.  The idiom would instead be a generalization of
> > "for line in file:"
> >
> >     while data:
> >         chunk, sep, data = data.partition()
> >
> > but the partition call would not need to copy the entire string; it
> > could simply return three views.
>
> Also, with a little work, having string views be smart about
> concatenation (if two views are adjacent to each other, like chunk,sep
> or sep,data above, view1+view2 -> view3 on the original string), copies
> could further be minimized, and the earlier problem with readline, etc.,
> can be avoided.

But this assumes that string views are 99.999% indiscernible from
regular strings -- if operations can return a copy or a view depending
on how things happen to be laid out in memory, it should be trivial to
write code that doesn't care whether it gets a string or a view.

This works for strings (which are immutable) but these semantics are
unacceptable for mutable objects -- another reason to doubt that it
makes sense to generalize the idea of views to all sequences, or to
involve a change to the slice object in the design.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Sun Aug 27 18:08:09 2006
From: guido at python.org (Guido van Rossum)
Date: Sun, 27 Aug 2006 09:08:09 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <44F1BA0E.3040203@gmail.com>
References: <44F0107B.20205@iinet.net.au>
	<ca471dc20608260930h528a7f60rc254eb2f75398a57@mail.gmail.com>
	<44F1BA0E.3040203@gmail.com>
Message-ID: <ca471dc20608270908h130c9f29jce193dda6430e507@mail.gmail.com>

On 8/27/06, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Guido van Rossum wrote:
> > Can you explain in a sentence or two how these changes would be
> > *used*? Your code examples don't speak for themselves (maybe because
> > it's Saturday morning :-). Short examples of something clumsy and/or
> > slow that we'd have to write today compared to something fast and
> > elegant that we could write after the change would be quite helpful.
> > The exact inheritance relationship between slice and [x]range seems a
> > fairly uninteresting detail in comparison.
>
> A more unified model for representing sequence slices makes it practical to
> offer a non-copying string partitioning method like the version of
> partition_indices() in my initial message.

Which I still don't understand. (Because you give code but no
docstring or rationale, and are assuming some unspecified changes to
other things as well.)

> With the current mixed model
> (sometimes using xrange(), sometimes using slice(), sometimes using a 3-tuple,
> sometimes using separate start & stop values),

I don't recall xrange() being used anywhere except in for-loops. I
don't know of any use of 3-tuples, though the re API uses 2-tuples
consistently.

> there is no point in offering
> such a method, as it would be terribly inconvenient to work with regardless of
> what kind of objects it returned to indicate the 3 portions of the original
> string:
>
>   - 3-tuples and xrange() objects can't be used to slice a sequence
>   - 3-tuples and slice() objects can't be usefully tested for truth
>   - none of them can be passed as optional string method arguments
>
> I believe the current mixed model is actually an artifact of the transition
> from simple slicing to extended slicing,

Really? Extended slicing mostly meant adding a third "step" option to
the slice syntax, which is useful for NumPy but completely pointless
for string searches as we're discussing here. The slice() object was
invented as an API hack so that we didn't have to add new special
methods.

> albeit one that is significantly less
> obvious than the deprecated __*slice__ family of special methods. Old style
> slicing and string methods use separate start and stop values. Extended
> slicing uses slice objects with start,stop,step attributes (which can be
> anything, including None). The indices() method of slice objects uses a
> start,stop,step 3-tuple. Iteration uses either a list of indices (from
> range()) or xrange objects with start,stop,step attributes (which must be
> integers).

It was always my intention to keep slice objects limited to NumPy apps
and the rare application of extended slicing in regular Python.

> The basic proposal I am making is to reduce this to exactly two concepts:
>    - slice objects, which have arbitrary start, stop, step attributes
>    - range objects, which have indices as start, stop, step attributes, behave
> like an immutable sequence, and are a subclass of slice

And you still haven't explained how this is going to make life easier.
I keep asking for concrete examples and you keep answering in
generalities. This is an annoying disconnect.

> All other instances in the core and standard library which use a different
> representation of a sequence slice (like the optional arguments to string
> methods, or the result of the indices() method) would change to use one of
> those two types. The methods of the types would be driven by the needs of the
> standard library.

What's the indices() method?

In many cases it doesn't seem to make a lot of sense to return a slice
object, since it doesn't convey more information than a single index
(given that the string being searched for is known -- we're not
searching regular expressions here but literal substrings).

> In addition to reducing the number of concepts to be dealt with from 4 to 2, I
> believe this would make it much easier to write memory efficient code without
> having to duplicate entire objects with non-copying versions.

Write the PEP and make sure it is plentiful of examples of old and new
ways of doing common string operations.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jcarlson at uci.edu  Sun Aug 27 18:52:50 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Sun, 27 Aug 2006 09:52:50 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <ca471dc20608270855p1c74839nc7d2430ae7cb6479@mail.gmail.com>
References: <20060826230223.1AD6.JCARLSON@uci.edu>
	<ca471dc20608270855p1c74839nc7d2430ae7cb6479@mail.gmail.com>
Message-ID: <20060827091000.1ADF.JCARLSON@uci.edu>


"Guido van Rossum" <guido at python.org> wrote:
> 
> On 8/26/06, Josiah Carlson <jcarlson at uci.edu> wrote:
> >
> > "Jim Jewett" <jimjjewett at gmail.com> wrote:
> > > With stringviews, you wouldn't need to be reindexing from the start of
> > > the original string.  The idiom would instead be a generalization of
> > > "for line in file:"
> > >
> > >     while data:
> > >         chunk, sep, data = data.partition()
> > >
> > > but the partition call would not need to copy the entire string; it
> > > could simply return three views.
> >
> > Also, with a little work, having string views be smart about
> > concatenation (if two views are adjacent to each other, like chunk,sep
> > or sep,data above, view1+view2 -> view3 on the original string), copies
> > could further be minimized, and the earlier problem with readline, etc.,
> > can be avoided.
> 
> But this assumes that string views are 99.999% indiscernible from
> regular strings -- if operations can return a copy or a view depending
> on how things happen to be laid out in memory, it should be trivial to
> write code that doesn't care whether it gets a string or a view.

That's what I'm working towards.  Let us say for a moment that the only
view that was on the table was the string view:
    view = stringview(st[, start[, stop]])

If st is a string, it produces a view on that string.  If st is a
stringview already, it references the original string (removing tree
persistence[1]).

After a view is created, it can be treated like a string for
(effectively) everything because it has a Py_UNICODE* that has already
been adjusted to handle the offset argument.  Its implementation would
require copying the PyUnicodeObject struct, adding one more field:
    PyUnicodeObject* orig_object;
This would point to the original object for the later Py_DECREF (when
the view is destroyed), view creation (again, we don't want tree
persistence), etc.

We can easily discover the 'start' offset again by comparing the
view->str and the orig_object->str pointers.
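
As a rough Python sketch of the behavior I'm describing (the name
stringview and its attributes are purely illustrative; the real
implementation would be the C struct above):

```python
class stringview:
    # Illustrative only: a view keeps a reference to the original
    # string plus start/stop offsets, and copies no character data.
    def __init__(self, st, start=0, stop=None):
        if isinstance(st, stringview):
            # Re-point at the original string, avoiding view-of-view
            # chains (the "tree persistence" problem).
            start += st.start
            stop = st.stop if stop is None else st.start + stop
            st = st.orig
        self.orig = st
        self.start = start
        self.stop = len(st) if stop is None else stop

    def __len__(self):
        return self.stop - self.start

    def __str__(self):
        # Only here is the underlying data actually copied.
        return self.orig[self.start:self.stop]

    def find(self, sub):
        # find()'s start/stop arguments do the real work; the result
        # is re-based so it looks like an index into the view.
        i = self.orig.find(sub, self.start, self.stop)
        return i if i < 0 else i - self.start
```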


Optimizations like 'adding properly ordered adjacent string views
returns a new view', 'views over fewer than X bytes are string copies',
etc., could be added later with (hopefully) little trouble.


> This works for strings (which are immutable) but these semantics are
> unacceptable for mutable objects -- another reason to doubt that it
> makes sense to generalize the idea of views to all sequences, or to
> involve a change to the slice object in the design.

I think the whole slice object thing is complete nonsense.

On the other hand, I think that just like buffers are verifying the
object that they are buffering every time they are accessed, mutable
bytes string, array, and mmap views could do the same.  After they are
verified, they can generally be used the same, but it may take some
discussion as to whether certain operations are allowed, and/or what
their semantics are. Things like:
    view = arrayview(arr, 1, -1)
    del view[1:-1]
A convenient semantic (from the Python side of things) is to do as
buffer does now and only allow them to be read-only.


I'm also not terribly convinced about general sequence views, but for
objects in which buffer(obj) returns something useful, I can see
specialized views for them making at least some sense.  I am cautious
about pushing for all of them because implementing views for all would
be a pain. Choosing one (like bytes) would take some effort, but could
easily be pushed back to 3.1 or 3.2 and be done by someone who really
wants them.

 - Josiah


[1] When I say "tree persistence", I mean those cases like a -> b -> c,
where view b persists because view a persists, even though b doesn't have
a reference otherwise.  Making both views a and b reference c directly
allows for b to be freed when it is no longer used.


From jack at psynchronous.com  Sun Aug 27 19:05:50 2006
From: jack at psynchronous.com (Jack Diederich)
Date: Sun, 27 Aug 2006 13:05:50 -0400
Subject: [Python-3000] find -> index patch
In-Reply-To: <44F1BC57.7090004@gmail.com>
References: <20060824054450.x8w46l05kz488004@login.werra.lunarpages.com>
	<eckao9$rq9$1@sea.gmane.org>
	<20060827002404.GG24154@performancedrivers.com>
	<ca471dc20608261951g2f2c7fe3p2b2a63eae7b563df@mail.gmail.com>
	<20060827041227.GJ24154@performancedrivers.com>
	<44F1BC57.7090004@gmail.com>
Message-ID: <20060827170550.GK24154@performancedrivers.com>

On Mon, Aug 28, 2006 at 01:37:59AM +1000, Nick Coghlan wrote:
> Jack Diederich wrote:
> > On Sat, Aug 26, 2006 at 07:51:03PM -0700, Guido van Rossum wrote:
> >> On 8/26/06, Jack Diederich <jackdied at jackdied.com> wrote:
> >>> After some benchmarking find() can't go away without really hurting readline()
> >>> performance.
> >> Can you elaborate? readline() is typically implemented in C so I'm not
> >> sure I follow.
> >>
> > 
> > A number of modules in Lib have readline() methods that currently use find().
> > StringIO, httplib, tarfile, and others
> > 
> > sprat:~/src/python-head/Lib# grep 'def readline' *.py | wc -l
> > 30
> > 
> > Mainly I wanted to point out that find() solves a class of problems that
> > can't be solved equally well with partition() (bad for large strings that
> > want to preserve the separator) or index() (bad for large numbers of small
> > strings and for frequent misses).  I wanted to reach the conclusion that 
> > find() could be yanked out but as Fredrik opined it is still useful for a 
> > subset of problems.
> 
> What about a version of partition that returned a 3-tuple of xrange objects 
> indicating the indices of the partitions, instead of copies of the partitions? 
> That would allow you to use the cleaner idiom without having to suffer the 
> copying performance penalty.
> 
> Something like:
> 
>     line, newline, rest = s.partition_indices('\n', rest.start, rest.stop)
>     if newline:
>         yield s[line.start:newline.stop]
> 

What is with the sudden rush to solve all problems by using slice objects?
I've never used a slice object and I don't care to start now.  The above code
reads just fine as

i = s.find('\n', start, stop)
if i >= 0:
  yield s[start:i]
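
Written out as a complete readline-style generator (just a sketch of the
idiom, not code from any particular module), that is:

```python
def iterlines(s):
    # Yield the lines of s without copying the remainder of the
    # string on every step; find()'s start argument skips ahead.
    start = 0
    while start < len(s):
        i = s.find('\n', start)
        if i < 0:
            yield s[start:]      # last line, no trailing newline
            return
        yield s[start:i]         # line with its newline stripped
        start = i + 1
```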

-Jack

From guido at python.org  Sun Aug 27 23:17:12 2006
From: guido at python.org (Guido van Rossum)
Date: Sun, 27 Aug 2006 14:17:12 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <20060827091000.1ADF.JCARLSON@uci.edu>
References: <20060826230223.1AD6.JCARLSON@uci.edu>
	<ca471dc20608270855p1c74839nc7d2430ae7cb6479@mail.gmail.com>
	<20060827091000.1ADF.JCARLSON@uci.edu>
Message-ID: <ca471dc20608271417w480c90aeg6b39c766a8f94750@mail.gmail.com>

On 8/27/06, Josiah Carlson <jcarlson at uci.edu> wrote:
> [1] When I say "tree persistence", I mean those cases like a -> b -> c,
> where view b persists because view a persists, even though b doesn't have
> a reference otherwise.  Making both views a and b reference c directly
> allows for b to be freed when it is no longer used.

Yeah, but you're still keeping c alive, which is the real memory waste.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Sun Aug 27 23:18:13 2006
From: guido at python.org (Guido van Rossum)
Date: Sun, 27 Aug 2006 14:18:13 -0700
Subject: [Python-3000] find -> index patch
In-Reply-To: <20060827170550.GK24154@performancedrivers.com>
References: <20060824054450.x8w46l05kz488004@login.werra.lunarpages.com>
	<eckao9$rq9$1@sea.gmane.org>
	<20060827002404.GG24154@performancedrivers.com>
	<ca471dc20608261951g2f2c7fe3p2b2a63eae7b563df@mail.gmail.com>
	<20060827041227.GJ24154@performancedrivers.com>
	<44F1BC57.7090004@gmail.com>
	<20060827170550.GK24154@performancedrivers.com>
Message-ID: <ca471dc20608271418q24f3262bma07b222813366da6@mail.gmail.com>

On 8/27/06, Jack Diederich <jack at psynchronous.com> wrote:
> What is with the sudden rush to solve all problems by using slice objects?
> I've never used a slice object and I don't care to start now.  The above code
> reads just fine as
>
> i = s.find('\n', start, stop)
> if i >= 0:
>   yield s[start:i]

Hear, hear.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jimjjewett at gmail.com  Mon Aug 28 01:38:08 2006
From: jimjjewett at gmail.com (Jim Jewett)
Date: Sun, 27 Aug 2006 19:38:08 -0400
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <ca471dc20608270850l70279c2bw30a41d82a721f00e@mail.gmail.com>
References: <44F0107B.20205@iinet.net.au>
	<20060826084138.1AC0.JCARLSON@uci.edu>
	<fb6fbf560608261859r1ecac1a8ye23008534b952c05@mail.gmail.com>
	<ca471dc20608262000j6d41d26dwd765e88feee5dacd@mail.gmail.com>
	<fb6fbf560608262030m2286a273nbd126a98b63103d3@mail.gmail.com>
	<ca471dc20608270850l70279c2bw30a41d82a721f00e@mail.gmail.com>
Message-ID: <fb6fbf560608271638r1ca2d114yc98a9c4f28036791@mail.gmail.com>

On 8/27/06, Guido van Rossum <guido at python.org> wrote:
> On 8/26/06, Jim Jewett <jimjjewett at gmail.com> wrote:

> > For example, you wanted to keep the rarely used optional arguments to
> > find because of efficiency.

> I don't believe they are rarely used. They are (currently) essential
> for code that searches a long string for a short substring repeatedly.
> If you believe that is a rare use case, why bother coming up with a
> whole new language feature to support it?

I believe that a fair amount of code already does the copying inline;
suppporting it in the runtime means that copying code becomes more
efficient, and shortcutting code becomes less unusual.

> > If slices were less eager at copying, this could be
> > rewritten as

> >     view=slice(start, stop, 1)
> >     view(s).find(prefix)

> Now you're postulating that calling a slice will take a slice of an
> object?

Yes.

> Any object? And how is that supposed to work for arbitrary
> objects?

For non-iterables, it will raise a TypeError.

> I would think that it ought to be a method on the string
> object

Restricting it to a few types including string might make sense.

> Also you're postulating that the slice object somehow has the
> same methods as the thing it slices?

Rather, the value returned by calling the slice on a specific string.
(I tend to think of this as a "slice of" the string, but as you've
pointed out, "slice object" technically refers to the object
specifying how/where to cut.)

> How are you expecting to implement that?

I had expected to implement it as a (string) view, which is why I
don't quite understand the distinction Nick and Josiah are making.

> But this assumes that string views are 99.999% indiscernible from
> regular strings

Yes; instead of assuming that a string's data starts n bytes after the
object's own pointer, it will instead be located at a (possibly zero)
offset.  No visible difference to python code; the difference between
-> and . for C code.  (And this indirection is already used by unicode
objects.)

> That will never fly. NumPy may get away with non-copying slices, but
> for built-in objects this would be too big of a departure of current
> practice. (If you don't stop about this I'll have to add it to PEP
> 3099. :-)

That's unfortunate, but if you're sure, maybe it should go in PEP 3099.

> > Yes, this does risk keeping all of data alive because one chunk was
> > saved.  This might be a reasonable tradeoff to avoid the copying.  If
> > not, perhaps the gc system could be augmented to shrink bloated views
> > during idle moments.

> Keep dreaming on. It really seems you have no clue about
> implementation issues; you just keep postulating random solutions
> whenever you're faced with an objection.

I had thought the problem was more about whether or not it was a good
idea; the tradeoff might be OK, or at least less bad than the
complication of fixing it.

As one implementation of fixing it: in today's garbage collector
(function collect in
http://svn.python.org/view/python/trunk/Modules/gcmodule.c?rev=46244&view=markup),
surviving objects are moved to the next generation with
gc_list_merge(young, old); before merging, the young list could
be traversed, and any object whose type has a __condense__ method
would get it called.  The strview type's __condense__ method would be
the C equivalent of

    if len(self.src) <= 200:
        return  # Src object too small to be worth recovering
    if (len(self) * refcounts(self.src)) >= len(self.src):
        return  # Src object used enough to be worth keeping
    self.src = str(self)  # Create a new data buffer, with no extra chars.

(Sent in Python because the commented C was several times as long,
even before checking it with a compiler.)  As to whether a __condense__
method is a good idea, whether it should really be tied that closely
to garbage collection, whether it should be limited to C
implementations ... that I'm not so sure of.

-jJ

From tdelaney at avaya.com  Mon Aug 28 01:52:08 2006
From: tdelaney at avaya.com (Delaney, Timothy (Tim))
Date: Mon, 28 Aug 2006 09:52:08 +1000
Subject: [Python-3000] Making more effective use of slice objects in Py3k
Message-ID: <2773CAC687FD5F4689F526998C7E4E5F0743D0@au3010avexu1.global.avaya.com>

Jim Jewett wrote:

>     s[start:stop].find(prefix)

No matter what, I really think the obj[start:stop:step] syntax needs to
be consistent in its behaviour - either returning a copy or a view - and
that that behaviour be to return a copy. I'm not at all in favour of
sometimes getting a copy, and sometimes getting a view.

As a bit of an out-there and very premature suggestion ... <wink>

For when/*if* views ever become considered to be a good thing for
builtin classes, etc, may I suggest that the following syntax be
reserved for view creation:

    obj{start:stop:step} 

mapping to something like:

    def __view__(self, slice)

So if you really want a string view, use:

    s{1:2}

instead of:

    s[1:2]

I don't *think* the syntax is currently legal, and I don't think it
could ever be ambiguous - anyone think of a case where it could be?

Tim Delaney

From jimjjewett at gmail.com  Mon Aug 28 02:00:08 2006
From: jimjjewett at gmail.com (Jim Jewett)
Date: Sun, 27 Aug 2006 20:00:08 -0400
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <2773CAC687FD5F4689F526998C7E4E5F0743D0@au3010avexu1.global.avaya.com>
References: <2773CAC687FD5F4689F526998C7E4E5F0743D0@au3010avexu1.global.avaya.com>
Message-ID: <fb6fbf560608271700h1a7fcff0g1f8d4be98b2efa29@mail.gmail.com>

On 8/27/06, Delaney, Timothy (Tim) <tdelaney at avaya.com> wrote:
> Jim Jewett wrote:

> >     s[start:stop].find(prefix)

> No matter what, I really think the obj[start:stop:step]
> syntax needs to be consistent in its behaviour - either
> returning a copy or a view -

Does it still matter if we're looking only at immutable sequences, such as text?

-jJ

From tdelaney at avaya.com  Mon Aug 28 02:24:41 2006
From: tdelaney at avaya.com (Delaney, Timothy (Tim))
Date: Mon, 28 Aug 2006 10:24:41 +1000
Subject: [Python-3000] Making more effective use of slice objects in Py3k
Message-ID: <2773CAC687FD5F4689F526998C7E4E5F0743D1@au3010avexu1.global.avaya.com>

Jim Jewett wrote:

> On 8/27/06, Delaney, Timothy (Tim) <tdelaney at avaya.com> wrote:
>> Jim Jewett wrote:
> 
>>>     s[start:stop].find(prefix)
> 
>> No matter what, I really think the obj[start:stop:step]
>> syntax needs to be consistent in its behaviour - either
>> returning a copy or a view -
> 
> Does it still matter if we're looking only at immutable sequences,
> such as text? 

Actually, yes. I think it should be an explicit operation to say "I'm
taking a small view of this large string, which will result in the large
string existing until the view goes away".

Currently the way to do that is to have a method. I'm simply proposing
that we reserve syntax that is currently not used to prevent it from
being used for another, less appropriate usage. It may never be used at
all.

Tim Delaney

From guido at python.org  Mon Aug 28 03:58:52 2006
From: guido at python.org (Guido van Rossum)
Date: Sun, 27 Aug 2006 18:58:52 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <fb6fbf560608271638r1ca2d114yc98a9c4f28036791@mail.gmail.com>
References: <44F0107B.20205@iinet.net.au>
	<20060826084138.1AC0.JCARLSON@uci.edu>
	<fb6fbf560608261859r1ecac1a8ye23008534b952c05@mail.gmail.com>
	<ca471dc20608262000j6d41d26dwd765e88feee5dacd@mail.gmail.com>
	<fb6fbf560608262030m2286a273nbd126a98b63103d3@mail.gmail.com>
	<ca471dc20608270850l70279c2bw30a41d82a721f00e@mail.gmail.com>
	<fb6fbf560608271638r1ca2d114yc98a9c4f28036791@mail.gmail.com>
Message-ID: <ca471dc20608271858t505089bci99f48ef21d99291b@mail.gmail.com>

On 8/27/06, Jim Jewett <jimjjewett at gmail.com> wrote:
> On 8/27/06, Guido van Rossum <guido at python.org> wrote:
> > On 8/26/06, Jim Jewett <jimjjewett at gmail.com> wrote:
>
> > > For example, you wanted to keep the rarely used optional arguments to
> > > find because of efficiency.
>
> > I don't believe they are rarely used. They are (currently) essential
> > for code that searches a long string for a short substring repeatedly.
> > If you believe that is a rare use case, why bother coming up with a
> > whole new language feature to support it?
>
> I believe that a fair amount of code already does the copying inline;
> suppporting it in the runtime means that copying code becomes more
> efficient, and shortcutting code becomes less unusual.

We're not making progress here. Your beliefs against my beliefs isn't
helpful. Do you have proof that there is code out there that's
inefficient and for which it would *matter* if it became faster?

> > > If slices were less eager at copying, this could be
> > > rewritten as
>
> > >     view=slice(start, stop, 1)
> > >     view(s).find(prefix)
>
> > Now you're postulating that calling a slice will take a slice of an
> > object?
>
> Yes.

I'd rather see an explicit method call. Using "call" as an operation
means no other operation can use the same syntax (on the same objects,
of course); you have to be very sure that there won't be another use
of "call" that would be more useful.

> > Any object? And how is that supposed to work for arbitrary
> > objects?
>
> For non-iterables, it will raise a TypeError.

Duh. I meant for other iterables, like tuples and lists. I'm asking if
you expect that asking for a view on a previously unknown sequence
should return a view on that sequence that behaves just like the
underlying object, and how you are thinking of pulling off that feat.
My claim is that you can't. You need full cooperation of the
underlying object to support views. You could attempt to automatically
provide wrappers for all methods, but since you don't know which of
the parameters or return values represent indices and which don't, you
can't do anything useful. Suppose I have a list [1, 2, 3, 1, 2, 3].
Suppose you don't have built-in knowledge of a list (otherwise I'll
substitute some other object that you don't have built-in knowledge
of). Now suppose you have a view v on the last half of that list, and
you ask for v.count(1). This of course should return 1. But how to do
this unless you know how the count() method is implemented on the
underlying object type?
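
To make the failure mode concrete, here is a deliberately naive generic
view (a sketch, not a proposal) that blindly forwards method calls to
the underlying sequence:

```python
class naiveview:
    # Forwards every unknown attribute to the underlying sequence,
    # with no knowledge of which arguments or results are indices.
    def __init__(self, seq, start, stop):
        self.seq, self.start, self.stop = seq, start, stop

    def __getattr__(self, name):
        return getattr(self.seq, name)   # blind forwarding

data = [1, 2, 3, 1, 2, 3]
v = naiveview(data, 3, 6)   # intended as a view on the last half
print(v.count(1))           # prints 2; the right answer for the view is 1
```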

> > I would think that it ought to be a method on the string
> > object
>
> Restricting it to a few types including string might make sense.

Yes please. Without that your proposal is dead in the water.

(With it likely too, but for different reasons.)

> > Also you're postulating that the slice object somehow has the
> > same methods as the thing it slices?
>
> Rather, the value returned by calling the slice on a specific string.
> (I tend to think of this as a "slice of" the string, but as you've
> pointed out, "slice object" technically refers to the object
> specifying how/where to cut.)

And remember, calling buffer() on a unicode object is not a useful
operation unless you're interested in the underlying bytes.

> > How are you expecting to implement that?
>
> I had expected to implement it as a (string) view, which is why I
> don't quite understand the distinction Nick and Josiah are making.

Well maybe you don't quite understand your own proposal either. :-)

> > But this assumes that string views are 99.999% indiscernible from
> > regular strings
>
> Yes; instead of assuming that a string's data starts n bytes after the
> object's own pointer, it will instead be located at a (possibly zero)
> offset.  No visible difference to python code; the difference between
> -> and . for C code.  (And this indirection is already used by unicode
> objects.)

Only because their original draft design had a kind of view. I expect
they had good reasons to rip out that part...

> > That will never fly. NumPy may get away with non-copying slices, but
> > for built-in objects this would be too big of a departure of current
> > practice. (If you don't stop about this I'll have to add it to PEP
> > 3099. :-)
>
> That's unfortunate, but if you're sure, maybe it should go in PEP 3099.

Ask any Python developer. Slices of mutable objects make copies except in NumPy.

> > > Yes, this does risk keeping all of data alive because one chunk was
> > > saved.  This might be a reasonable tradeoff to avoid the copying.  If
> > > not, perhaps the gc system could be augmented to shrink bloated views
> > > during idle moments.
>
> > Keep dreaming on. it really seems you have no clue about
> > implementation issues; you just keep postulating random solutions
> > whenever you're faced with an objection.
>
> I had thought the problem was more about whether or not it was a good
> idea; the tradeoff might be OK, or at least less bad than the
> complication of fixing it.

It's only a good idea if it works. Details matter.

> As one implementation of fixing it: in today's garbage collector
> (function collect in
> http://svn.python.org/view/python/trunk/Modules/gcmodule.c?rev=46244&view=markup),
> surviving objects are moved to the next generation with
> gc_list_merge(young, old); before merging, the young list could
> be traversed, and any object whose type has a __condense__ method
> would get it called.  The strview type's __condense__ method would be
> the C equivalent of
>
>     if len(self.src) <= 200:
>         return  # Src object too small to be worth recovering
>     if (len(self) * refcounts(self.src)) >= len(self.src):
>         return  # Src object used enough to be worth keeping
>     self.src = str(self)  # Create a new data buffer, with no extra chars.
>
> (Sent in Python because the commented C was several times as long,
> even before checking it with a compiler.)  As to whether a __condense__
> method is a good idea, whether it should really be tied that closely
> to garbage collection, whether it should be limited to C
> implementations ... that I'm not so sure of.

It's up to you to show that this doesn't completely kill performance.
It would take a lot of measurements.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jcarlson at uci.edu  Mon Aug 28 04:20:42 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Sun, 27 Aug 2006 19:20:42 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <2773CAC687FD5F4689F526998C7E4E5F0743D1@au3010avexu1.global.avaya.com>
References: <2773CAC687FD5F4689F526998C7E4E5F0743D1@au3010avexu1.global.avaya.com>
Message-ID: <20060827191547.1AEB.JCARLSON@uci.edu>


"Delaney, Timothy (Tim)" <tdelaney at avaya.com> wrote:
> 
> Jim Jewett wrote:
> 
> > On 8/27/06, Delaney, Timothy (Tim) <tdelaney at avaya.com> wrote:
> >> Jim Jewett wrote:
> > 
> >>>     s[start:stop].find(prefix)
> > 
> >> No matter what, I really think the obj[start:stop:step]
> >> syntax needs to be consistent in its behaviour - either
> >> returning a copy or a view -
> > 
> > Does it still matter if we're looking only at immutable sequences,
> > such as text? 
> 
> Actually, yes. I think it should be an explicit operation to say "I'm
> taking a small view of this large string, which will result in the large
> string existing until the view goes away".
> 
> Currently the way to do that is to have a method. I'm simply proposing
> that we reserve syntax that is currently not used to prevent it from
> being used for another, less appropriate usage. It may never be used at
> all.

In what I have been attempting to propose, no text methods would ever
return a view.  If one wants a view of text, one needs to manually
construct the view via 'view = textview(st, start, stop)' or some
equivalent spelling.  After that, any operation on a view returns a view
(with a few exceptions, like steps != 1).

The seemingly proposed textobj(start:stop) returning a view is not
terribly intuitive, as () and [] aren't different enough from each
other to avoid confusing someone initially.  Never mind that it would be a
syntax addition for the equivalent of a small subset of operations on
currently existing objects.


 - Josiah


From greg.ewing at canterbury.ac.nz  Mon Aug 28 04:20:18 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 28 Aug 2006 14:20:18 +1200
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <fb6fbf560608271638r1ca2d114yc98a9c4f28036791@mail.gmail.com>
References: <44F0107B.20205@iinet.net.au>
	<20060826084138.1AC0.JCARLSON@uci.edu>
	<fb6fbf560608261859r1ecac1a8ye23008534b952c05@mail.gmail.com>
	<ca471dc20608262000j6d41d26dwd765e88feee5dacd@mail.gmail.com>
	<fb6fbf560608262030m2286a273nbd126a98b63103d3@mail.gmail.com>
	<ca471dc20608270850l70279c2bw30a41d82a721f00e@mail.gmail.com>
	<fb6fbf560608271638r1ca2d114yc98a9c4f28036791@mail.gmail.com>
Message-ID: <44F252E2.4080700@canterbury.ac.nz>

Jim Jewett wrote:
> On 8/27/06, Guido van Rossum <guido at python.org> wrote:

> > Any object? And how is that supposed to work for arbitrary
> > objects?
> 
> For non-iterables, it will raise a TypeError.

I think the question was what benefit would there be
in a general slice-view object which knew nothing about
the internal structure of the thing it's viewing. The
benefits of the string views we're talking about hinge
on the fact that they're special-purpose and know how
to get directly at the bytes of the underlying string.

--
Greg

From tdelaney at avaya.com  Mon Aug 28 04:26:40 2006
From: tdelaney at avaya.com (Delaney, Timothy (Tim))
Date: Mon, 28 Aug 2006 12:26:40 +1000
Subject: [Python-3000] Making more effective use of slice objects in Py3k
Message-ID: <2773CAC687FD5F4689F526998C7E4E5FF1E921@au3010avexu1.global.avaya.com>

Josiah Carlson wrote:

> The seemingly proposed textobj(start:stop) returning a view is not
> terribly intuitive, as () and [] aren't so terribly different from
> each other to not confuse someone initially.

Nor {} as I proposed for that matter ;)

Tim Delaney

From greg.ewing at canterbury.ac.nz  Mon Aug 28 04:32:54 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 28 Aug 2006 14:32:54 +1200
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <20060827191547.1AEB.JCARLSON@uci.edu>
References: <2773CAC687FD5F4689F526998C7E4E5F0743D1@au3010avexu1.global.avaya.com>
	<20060827191547.1AEB.JCARLSON@uci.edu>
Message-ID: <44F255D6.2060002@canterbury.ac.nz>

Josiah Carlson wrote:
> If one wants a view of text, one needs to manually
> construct the view via 'view = textview(st, start, stop)' or some
> equivalent spelling.  After that, any operations on a view returns views

Given Guido's sensitivity about potential misuses of
views, it might be better if operations on views
*didn't* return views, so that you would have to be
explicit about creating views at all stages.

--
Greg

From jcarlson at uci.edu  Mon Aug 28 04:43:36 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Sun, 27 Aug 2006 19:43:36 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <ca471dc20608271417w480c90aeg6b39c766a8f94750@mail.gmail.com>
References: <20060827091000.1ADF.JCARLSON@uci.edu>
	<ca471dc20608271417w480c90aeg6b39c766a8f94750@mail.gmail.com>
Message-ID: <20060827184941.1AE8.JCARLSON@uci.edu>


"Guido van Rossum" <guido at python.org> wrote:
> 
> On 8/27/06, Josiah Carlson <jcarlson at uci.edu> wrote:
> > [1] When I say "tree persistance", I mean those cases like a -> b -> c,
> > where view b persist because view a persists, even though b doesn't have
> > a reference otherwise.  Making both views a and b reference c directly
> > allows for b to be freed when it is no longer used.
> 
> Yeah, but you're still keeping c alive, which is the real memory waste.

It depends on the application.


1. Let us say I was parsing XML.  Rather than allocating a bunch of small
strings for the various tags, attributes, and data, I could instead
allocate a bunch of string views with pointers into the one larger XML
string.

Because all of the views are the same size, we can use a free list and
optimize allocation, deallocation, etc.  Small strings, on the other
hand, can't have such optimizations, and we would end up fragmenting
memory over a long series of XML parsings (possibly leading to an
eventual MemoryError).

Even better, if the underlying parsing mechanism expects to receive a
string, and we pass it a string view instead, then with the proper
string+view implementation, it wouldn't ever need to know that it is
working on views; it would just work, and we would receive the parsing
with views instead of sliced strings.
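As a rough illustration of what such a view could look like (purely a sketch; no stringview type exists today, and all names below are invented), a fixed-size object recording (source, start, stop) instead of copying:

```python
# Hypothetical sketch of a string view: three fields instead of a copy.
class StringView(object):
    __slots__ = ('source', 'start', 'stop')  # fixed size, free-list friendly

    def __init__(self, source, start=0, stop=None):
        self.source = source
        self.start = start
        self.stop = len(source) if stop is None else stop

    def __len__(self):
        return self.stop - self.start

    def __str__(self):
        # Copying happens only when a real string is explicitly requested.
        return self.source[self.start:self.stop]

    def find(self, sub):
        # Delegates to str.find with the view's bounds; no slicing needed.
        i = self.source.find(sub, self.start, self.stop)
        return i if i == -1 else i - self.start

xml = '<a href="x">text</a>'
tag = StringView(xml, 1, 2)   # views the tag name without slicing the XML
```

Every view is the same size regardless of how much text it covers, which is what makes the free-list argument above work.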


2. Another example is the parsing of email or any other [header, blank
line, body] structured data (and even mime-like headers).  Say you have
read in a single email: you can have a view (or views) of the various
headers, the multipart body, etc., and wouldn't need to copy
anything. Never mind that one could easily handle the insertion of
headers, body portions, etc., all without slicing the original (possibly
large) email, allowing for the easy manipulation of data with little
memory overhead.

Heck, one could even read in an entire mbox-formatted file, pull out all
of the original emails, rearrange them (re-sort the folder by sent
date/received time), and write them back to disk, again without ever
slicing up the original mailbox file, resulting in roughly 1/2 the
memory overhead of an equivalent operation using string slicing.


3. In the 2.x byte string case (str, not unicode), we have seen with the
various str.find() to str.partition() conversions that chopping up data
isn't uncommon, and that generally most pieces are used, meaning that
memory equivalent to the original string is going to persist anyway.

Also, I would just like to state that I am not advocating the automatic
creation of views depending on string operations; one should always
construct the views explicitly, with something like view = stringview(st).
Then the operations on the view should return further views and perhaps
occasionally strings, but operations on strings should never return
views.


---
Speaking of the 2.x byte strings and using str.partition() in 3.x: if
2.x strings are going away in 3.x, shouldn't we be transitioning
everything to using either bytes or unicode?  Initial translation of the
standard library to use partition/index seems like a huge time
investment, unless it is planned on being backported to the trunk for
2.6.

Which reminds me, on August 28, 2005, Raymond sent me an initial patch
for a find -> partition patch for the full 2.5 standard library at the
time.  I can provide everyone with that patch along with my comments,
which may or may not be enough to transition most of the standard
library today.


 - Josiah


From jcarlson at uci.edu  Mon Aug 28 04:45:25 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Sun, 27 Aug 2006 19:45:25 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <2773CAC687FD5F4689F526998C7E4E5FF1E921@au3010avexu1.global.avaya.com>
References: <2773CAC687FD5F4689F526998C7E4E5FF1E921@au3010avexu1.global.avaya.com>
Message-ID: <20060827194428.1AEE.JCARLSON@uci.edu>


"Delaney, Timothy (Tim)" <tdelaney at avaya.com> wrote:
> 
> Josiah Carlson wrote:
> 
> > The seemingly proposed textobj(start:stop) returning a view is not
> > terribly intuitive, as () and [] aren't so terribly different from
> > each other to not confuse someone initially.
> 
> Nor {} as I proposed for that matter ;)

I can't really see the difference between () and {} when they are on
their own with the font I'm using for email.  Yeah, that's not good
either.

 - Josiah


From jcarlson at uci.edu  Mon Aug 28 10:00:55 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Mon, 28 Aug 2006 01:00:55 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <44F255D6.2060002@canterbury.ac.nz>
References: <20060827191547.1AEB.JCARLSON@uci.edu>
	<44F255D6.2060002@canterbury.ac.nz>
Message-ID: <20060827214348.1AF4.JCARLSON@uci.edu>


Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Josiah Carlson wrote:
> > If one wants a view of text, one needs to manually
> > construct the view via 'view = textview(st, start, stop)' or some
> > equivalent spelling.  After that, any operations on a view returns views
> 
> Given Guido's sensitivity about potential misuses of
> views, it might be better if operations on views
> *didn't* return views, so that you would have to be
> explicit about creating views at all stages.

If every operation on a view returned a string copy, then what would be
the point of the view in the first place?  An alias for Python 2.x
buffer()?  No, that would be silly.

As I see it, the point of string/text views is:
1. Remove all start, stop optional arguments from all string methods,
replacing them with view slicing, resulting in generally improved call
performance by the second or third operation on the original string.
2. Reduce memory use and fragmentation of common operations (like...
while rest: prev, found, rest = rest.partition(sep) ) by performing
those operations on views.
3. Reduce execution time of slicing or slicing-like operations by
performing them on views (prev, found, rest = rest.partition(sep)).

Note that with 2 and 3, it doesn't matter how much or little you 'slice'
from the view, the slicing and/or creation of new views referencing the
original string is a constant time operation every time.

By making view.oper() always return strings instead of views, it makes
#1 the only reason for views, even though #2 and #3 are also important
and valid motivators.

I would also like to point out that it would make the oft-cited
partition example "while rest: first, found, rest = rest.partition(sep)"
run in linear rather than quadratic time, where users will be pleasantly
surprised by the improvement in speed (or the lack of a reduction in
speed).
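A toy version of that partition loop (hypothetical View type, invented for illustration): because partition() returns further views, each iteration does O(1) bookkeeping against the original string rather than O(n) slicing:

```python
class View(object):
    """Toy string view whose partition() returns views, not copies."""
    def __init__(self, s, start=0, stop=None):
        self.s = s
        self.start = start
        self.stop = len(s) if stop is None else stop

    def __len__(self):
        return self.stop - self.start

    def __str__(self):
        return self.s[self.start:self.stop]  # copy only on demand

    def partition(self, sep):
        # str.find honours the view's bounds, so no intermediate copies.
        i = self.s.find(sep, self.start, self.stop)
        if i == -1:
            return self, '', View(self.s, self.stop, self.stop)
        return (View(self.s, self.start, i), sep,
                View(self.s, i + len(sep), self.stop))

rest = View('a,b,c')
parts = []
while len(rest):
    first, found, rest = rest.partition(',')
    parts.append(str(first))
```

Here parts ends up as ['a', 'b', 'c'], with every intermediate rest being just a pair of integers into the one original string.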


 - Josiah


From p.f.moore at gmail.com  Mon Aug 28 11:08:31 2006
From: p.f.moore at gmail.com (Paul Moore)
Date: Mon, 28 Aug 2006 10:08:31 +0100
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <2773CAC687FD5F4689F526998C7E4E5F0743D0@au3010avexu1.global.avaya.com>
References: <2773CAC687FD5F4689F526998C7E4E5F0743D0@au3010avexu1.global.avaya.com>
Message-ID: <79990c6b0608280208l41c2ae9bm1c76ee3bf06c99a7@mail.gmail.com>

On 8/28/06, Delaney, Timothy (Tim) <tdelaney at avaya.com> wrote:
> For when/*if* views ever become considered to be a good thing for
> builtin classes, etc, may I suggest that the following syntax be
> reserved for view creation:
>
>     obj{start:stop:step}
>
> mapping to something like:
>
>     def __view__(self, slice)
>
> So if you really want a string view, use:
>
>     s{1:2}
>
> instead of:
>
>     s[1:2]
>
> I don't *think* the syntax is currently legal, and I don't think it
> could ever be ambiguous - anyone think of a case where it could be?

OTOH, it is very subtle. I had to lean closer to the monitor before I
could even see the distinction you were making! (OK, some of that is
due to less-than-ideal fonts plus failing eyesight, but the point
remains...)

Paul.

From brian at sweetapp.com  Mon Aug 28 11:35:39 2006
From: brian at sweetapp.com (Brian Quinlan)
Date: Mon, 28 Aug 2006 11:35:39 +0200
Subject: [Python-3000] Warning about future-unsafe usage patterns in Python
 2.x e.g. dict.keys().sort()
In-Reply-To: <20060827214348.1AF4.JCARLSON@uci.edu>
References: <20060827191547.1AEB.JCARLSON@uci.edu>	<44F255D6.2060002@canterbury.ac.nz>
	<20060827214348.1AF4.JCARLSON@uci.edu>
Message-ID: <44F2B8EB.6040704@sweetapp.com>

It is my understanding that, in Python 3000, certain functions and 
methods that currently return lists will return some sort of view type 
(e.g. dict.values()) or an iterator (e.g. zip). So certain usage 
patterns will no longer be supported e.g. d.keys().sort().

The attached patch, which is a diff against the subversion "trunk" of 
Python 2.x, tries to warn the user about these kinds of future-unsafe
usage patterns. It works by storing the type that the list will become 
in the future, at creation time, and checking to see if called list 
functions will be supported by that type in the future.

Currently the patch is very incomplete and the idea itself may be
flawed. But I thought it was interesting to run it against my own code to
see what potential problems the code has. Example:

...
Type "help", "copyright", "credits" or "license" for more information.
 >>> d = {"apple" : "sweet", "orange" : "tangy"}
 >>> "juicy" in d.values()
False
 >>> d.keys().sort()
__main__:1: DeprecationWarning: dictionary view will not support sort
 >>> "a" in zip([1,2,3,4], "abcd")
__main__:1: DeprecationWarning: iterator will not support contains
False
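The patch does this in C inside listobject.c; a rough pure-Python analogue of the same idea (class name and helper invented here for illustration) might look like:

```python
import warnings

class FutureViewList(list):
    """List that warns about methods a Py3k dict view won't support."""
    _unsupported = ('sort', 'reverse', 'append', 'insert')

    def __getattribute__(self, name):
        if name in FutureViewList._unsupported:
            warnings.warn('dictionary view will not support %s' % name,
                          DeprecationWarning, stacklevel=2)
        return list.__getattribute__(self, name)

def keys_2to3(d):
    # Stand-in for what dict.keys() would return under the patch.
    return FutureViewList(d)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    keys_2to3({'apple': 'sweet', 'orange': 'tangy'}).sort()
```

caught then holds a single DeprecationWarning mentioning sort, mirroring the interpreter session above.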

Cheers,
Brian
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: warn_list_usage.diff
Url: http://mail.python.org/pipermail/python-3000/attachments/20060828/8f08a2a7/attachment-0001.diff 

From g.brandl at gmx.net  Mon Aug 28 12:22:11 2006
From: g.brandl at gmx.net (Georg Brandl)
Date: Mon, 28 Aug 2006 12:22:11 +0200
Subject: [Python-3000] Set literals
Message-ID: <ecug4k$cg8$1@sea.gmane.org>

At python.org/sf/1547796, there is a preliminary patch for Py3k set literals
as specified in PEP 3100.

Set comprehensions are not implemented.

have fun,
Georg


From rrr at ronadam.com  Mon Aug 28 13:14:14 2006
From: rrr at ronadam.com (Ron Adam)
Date: Mon, 28 Aug 2006 06:14:14 -0500
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <44F0107B.20205@iinet.net.au>
References: <44F0107B.20205@iinet.net.au>
Message-ID: <ecujbm$n90$1@sea.gmane.org>

Nick Coghlan wrote:
> This idea is inspired by the find/rfind string discussion (particularly a 
> couple of comments from Jim and Ron), but I think the applicability may prove 
> to be wider than just string methods (e.g. I suspect it may prove useful for 
> the bytes() type as well).

If I'm following the ideas here (which were based, only in part, on my
suggestion), it's not a major feature request, but rather a combination
of various small changes, each of which may have some benefits of its
own. The proposal is more in line with cleaning things up so they can
(if one desires) be made to work together more easily.  But that needn't
be the main reason for doing it.

I also recognize that python has many very specific functions and 
modules, many of which are highly optimized.  Most of the major problems 
have already been solved in that way, so it is really hard to find 
things that make a big difference.  But I don't think that means we 
shouldn't work on making small improvements to things where they are 
possible, even if it's only to make it a bit easier to remember and/or 
learn.


> I think an enriched slicing model that allows sequence views to be expressed 
> easily as "this slice of this sequence" would allow this to be dealt with 
> cleanly, without requiring every sequence to provide a corresponding "sequence 
> view" with non-copying semantics. I think Guido's concern that people will 
> reach for string views when they don't need them is also valid (as I believe 
> that it is most often inexperience that leads to premature optimization that 
> then leads to needless code complexity).

I agree with both of these, but maybe we should concentrate on the
individual changes and not a big picture to justify a group of changes.
The individual changes or enhancements need to stand on their own.

So in that light, the following individual *separate* items are what I
would focus on for now. (Not string views or slice partition functions;
let those come later if they prove useful.)


> The specific changes I suggest based on the find/rfind discussion are:
> 
>    1. make range() (what used to be xrange()) a subclass of slice(), so that 
> range objects can be used to index sequences. The only differences between 
> range() and slice() would then be that start/stop/step will never be None for 
> range instances, and range instances act like an immutable sequence while 
> slice instances do not (i.e. range objects would grow an indices() method).


1. Remove None stored as indices in slice objects. Depending on the step
value, any Nones can be converted to 0 or -1 immediately; the step
should never be None or zero.

Once the slice is created the Nones are not needed, and valid index values
can be determined. This moves the checks forward from slice object use
time to slice object creation time.

If a slice object is reused, then there might be some (micro) 
performance benefits if it is defined outside a loop and then used 
multiple times inside a loop.

Also, the indices can be read and used directly via slice.start, etc.,
without having to check for None or invalid indices, if someone wants to
do that.


>    2. change range() and slice() to accept slice() instances as arguments so 
> that range(range(0)) is equivalent to range(0). (range(x) may throw ValueError 
> if x.stop is None).

2. Enable slices and ranges to be converted back and forth.

This works now.

 >>> xrange(*slice(1,-1,1).indices(10))
xrange(1, 9)


There is no way to get the indices from an xrange object. They are not
available via attributes or methods (that I know of), but they can be
gotten by parsing the __repr__ string.

So this doesn't work.

     slice(*xrange(1,10,1).indices())   # no indices method

While I don't have any real specific use case for this item, it may have
some educational or introspective value, i.e. something to teach the
relationships between the two.  An xrange() object can also be defined
outside a loop and then used multiple times in an inner loop.
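For illustration, here is what that repr-parsing workaround could look like (shown with 3.x-style range objects so it runs today; in 2.x one would feed it an xrange the same way — a deliberately fragile sketch, not a recommendation):

```python
import re

def range_to_slice(r):
    # xrange in 2.x exposes no start/stop/step attributes, so the only
    # way back to a slice is to pick the numbers out of the repr.
    nums = [int(n) for n in re.findall(r'-?\d+', repr(r))]
    if len(nums) == 1:          # repr hid the default start and step
        nums = [0, nums[0], 1]
    elif len(nums) == 2:        # repr hid the default step
        nums.append(1)
    return slice(*nums)
```

range_to_slice(range(1, 10, 2)) gives slice(1, 10, 2); the very existence of this hack is an argument for exposing the indices as attributes.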



3. Continue to make xrange() and slice() a bit more alike in how they
work and the values they return, but keep them separate and don't
subclass range from slice.  Each has a definitely different purpose;
although they are related in some ways, they shouldn't try to 'be' the
other, I think.

The following examples show some inconsistencies in how they work, or
where they could be more alike.  For example, viewing an xrange vs. a
slice object returns differing representations depending on the values
of the indices.  These are just minor (barely) annoyances, and there
isn't anything actually wrong, but they could be improved a bit, I think.


# slice always shows all three values if viewed. (This is ok)
 >>> slice(10)
slice(None, 10, None)    # None stored as indices.
 >>> slice(0, 10, 1)
slice(0, 10, 1)

# - xrange only shows values different from the defaults.
 >>> xrange(10)
xrange(10)
 >>> xrange(1, 10)
xrange(1, 10)
 >>> xrange(0, 10, 1)
xrange(10)              # hides 0 and 1

# - The xrange stop value is always rounded up to start plus an
# even multiple of the step.
 >>> xrange(1, 10, 2)
xrange(1, 11, 2)        # 11! why not 10 here?
 >>> xrange(0, 10, 3)
xrange(0, 12, 3)        # and 12 instead of 10 here?


# slice accepts anything!
 >>> slice(1, 10, 0)         # zero for step
slice(1, 10, 0)
 >>> slice(list, int, dict)
slice(<type 'list'>, <type 'int'>, <type 'dict'>)

# xrange rejects any invalid indices.
 >>> xrange(None, 10, None)           # None not an integer.
Traceback (most recent call last):
   File "<stdin>", line 1, in ?
TypeError: an integer is required

 >>> xrange(1, 10, 0)
Traceback (most recent call last):
   File "<stdin>", line 1, in ?
ValueError: xrange() arg 3 must not be zero




4. Allow slice objects to be subclassed. That would allow for
experimentation, and for programmers to modify slice in ways they may
find useful for their own applications.  Most likely it would be a way
to group together methods that all use the same start, stop and/or step
indices.  Could it then be possible to apply those via the slice
operation at once?


5. Find a way to avoid slice wrap-arounds.  These happen when iterating
past zero in either direction.  It usually requires a different approach
and/or check to avoid going past the zero/-1 boundary.

One thought I've had on this is to allow only positive integers, along
with a symbol to indicate that an index is to be counted from the far end.
Then an exception could be raised if a negative index is used.

Possibly something like:
    [i:\j]     # '\' indicates j is to be counted from the far end.

The line-continuation backslash could be special-cased for use with
slices, I think.  But some other symbol might be better.
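Without touching the syntax, the same idea can be sketched with an explicit marker object (both names below are invented): negative indices raise, and counting from the far end must be asked for explicitly:

```python
class FromEnd(object):
    """Marker: an index counted from the far end of the sequence."""
    def __init__(self, n):
        self.n = n

def strict_slice(seq, start, stop):
    # Resolve markers, and refuse negative indices instead of wrapping.
    def resolve(i):
        if isinstance(i, FromEnd):
            return len(seq) - i.n
        if i < 0:
            raise IndexError('negative index not allowed: %d' % i)
        return i
    return seq[resolve(start):resolve(stop)]
```

strict_slice('abcdef', 1, FromEnd(1)) returns 'bcde', while strict_slice('abcdef', -1, 3) raises instead of silently wrapping past zero.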



I think this group of separate items taken together will do what the 
title in this thread suggests.  But each of these is a separate item in 
itself as well and has its own reasons why it could be helpful.



Regarding the other items...

The above changes possibly make some (or most) of the other suggestions
possible and/or easier to implement.  So then a programmer can roll
their own string views or slice partition functions in a clean way if
they want to.  That's the point of "making more effective use of
slice objects": it's not a specific idea, but a generality that may come
about by doing these other smaller things first.  And doing them as a
group is probably a good way to address these things.

I hope this clarifies at least my viewpoint, if not Nick's. But I'll keep
an open mind and see what he has to offer in his PEP.

Cheers,
    Ron


From ncoghlan at gmail.com  Mon Aug 28 13:40:41 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 28 Aug 2006 21:40:41 +1000
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <ca471dc20608270908h130c9f29jce193dda6430e507@mail.gmail.com>
References: <44F0107B.20205@iinet.net.au>	
	<ca471dc20608260930h528a7f60rc254eb2f75398a57@mail.gmail.com>	
	<44F1BA0E.3040203@gmail.com>
	<ca471dc20608270908h130c9f29jce193dda6430e507@mail.gmail.com>
Message-ID: <44F2D639.1080808@gmail.com>

Guido van Rossum wrote:
> On 8/27/06, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> I believe the current mixed model is actually an artifact of the 
>> transition
>> from simple slicing to extended slicing,
> 
> Really? Extended slicing mostly meant adding a third "step" option to
> the slice syntax, which is useful for NumPy but completely pointless
> for string searches as we're discussing here. The slice() object was
> invented as an API hack so that we didn't have to add new special
> methods.

This is exactly what I'm talking about - I believe the reason you don't see it
as an oddity is that you were used to the "start+stop" idiom from before
slice() was added. For me, having only started to seriously use Python after
the __*slice__ family of methods had already been deprecated, slice() objects
are the basic idiom, with any occurrences of "start+stop" being artifacts of
the old slicing model.

For someone picking up the language after slice() has been added, it's like 
"we've gone to all the effort of defining a type just for sequence slices, but 
we're only going to use it in this one little corner of the language".

>> All other instances in the core and standard library which use a 
>> different
>> representation of a sequence slice (like the optional arguments to string
>> methods, or the result of the indices() method) would change to use 
>> one of
>> those two types. The methods of the types would be driven by the needs 
>> of the
>> standard library.
> 
> What's the indices() method?

An existing method on slice objects that accepts a sequence length and returns 
the appropriate (start, stop, step) 3-tuple. Very handy for implementing 
__getitem__ methods properly.
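For example, a minimal __getitem__ built on slice.indices() (the Evens class is invented purely to demonstrate the pattern):

```python
class Evens(object):
    """Acts like the sequence of even numbers below a limit."""
    def __init__(self, limit):
        self._len = (limit + 1) // 2  # how many evens in [0, limit)

    def __len__(self):
        return self._len

    def __getitem__(self, index):
        if isinstance(index, slice):
            # indices() clamps the bounds and fills in the defaults
            # against our length, so no None-checking is needed here.
            start, stop, step = index.indices(self._len)
            return [2 * i for i in range(start, stop, step)]
        if index < 0:
            index += self._len
        if not 0 <= index < self._len:
            raise IndexError(index)
        return 2 * index
```

Evens(10)[1:4] yields [2, 4, 6], and oversized bounds like Evens(10)[:100] are clamped for free by indices().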

> Write the PEP and make sure it is plentiful of examples of old and new
> ways of doing common string operations.

Indeed!

Cheers,
Nick.


-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From edcjones at comcast.net  Mon Aug 28 16:21:10 2006
From: edcjones at comcast.net (Edward C. Jones)
Date: Mon, 28 Aug 2006 10:21:10 -0400
Subject: [Python-3000] Warning about future-unsafe usage patterns in
 Python 2.x e.g. dict.keys().sort()
In-Reply-To: <mailman.36113.1156757744.27774.python-3000@python.org>
References: <mailman.36113.1156757744.27774.python-3000@python.org>
Message-ID: <44F2FBD6.6040205@comcast.net>


Brian Quinlan said:
> It is my understanding that, in Python 3000, certain functions and 
> methods that currently return lists will return some sort of view type 
> (e.g. dict.values()) or an iterator (e.g. zip). So certain usage 
> patterns will no longer be supported e.g. d.keys().sort().

I use this idiom fairly often:

d = dict()
...
thekeys = d.keys()
thekeys.sort()
for key in thekeys:
     ...

What should I use in Python 3.0?

From fdrake at acm.org  Mon Aug 28 16:45:23 2006
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Mon, 28 Aug 2006 10:45:23 -0400
Subject: [Python-3000] Warning about future-unsafe usage patterns in
	Python 2.x e.g. dict.keys().sort()
In-Reply-To: <44F2FBD6.6040205@comcast.net>
References: <mailman.36113.1156757744.27774.python-3000@python.org>
	<44F2FBD6.6040205@comcast.net>
Message-ID: <200608281045.24215.fdrake@acm.org>

On Monday 28 August 2006 10:21, Edward C. Jones wrote:
 > d = dict()
 > ...
 > thekeys = d.keys()
 > thekeys.sort()
 > for key in thekeys:
 >      ...
 >
 > What should I use in Python 3.0?

d = dict()
...
for key in sorted(d.keys()):
    ...


  -Fred

-- 
Fred L. Drake, Jr.   <fdrake at acm.org>

From ronaldoussoren at mac.com  Mon Aug 28 16:46:53 2006
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Mon, 28 Aug 2006 16:46:53 +0200
Subject: [Python-3000] Warning about future-unsafe usage patterns in
	Python 2.x e.g. dict.keys().sort()
In-Reply-To: <44F2FBD6.6040205@comcast.net>
References: <mailman.36113.1156757744.27774.python-3000@python.org>
	<44F2FBD6.6040205@comcast.net>
Message-ID: <6667A80E-E767-4408-8B24-AF9AF3F2DAB0@mac.com>


On 28-aug-2006, at 16:21, Edward C. Jones wrote:

>
> Brian Quinlan said:
>> It is my understanding that, in Python 3000, certain functions and
>> methods that currently return lists will return some sort of view  
>> type
>> (e.g. dict.values()) or an iterator (e.g. zip). So certain usage
>> patterns will no longer be supported e.g. d.keys().sort().
>
> I use this idiom fairly often:
>
> d = dict()
> ...
> thekeys = d.keys()
> thekeys.sort()
> for key in thekeys:
>      ...
>
> What should I use in Python 3.0?

for key in sorted(d.keys()):
     ...

This works in python 2.4 as well.

Ronald


From david.nospam.hopwood at blueyonder.co.uk  Mon Aug 28 17:33:31 2006
From: david.nospam.hopwood at blueyonder.co.uk (David Hopwood)
Date: Mon, 28 Aug 2006 16:33:31 +0100
Subject: [Python-3000] Warning about future-unsafe usage patterns in
 Python 2.x e.g. dict.keys().sort()
In-Reply-To: <44F2B8EB.6040704@sweetapp.com>
References: <20060827191547.1AEB.JCARLSON@uci.edu>	<44F255D6.2060002@canterbury.ac.nz>	<20060827214348.1AF4.JCARLSON@uci.edu>
	<44F2B8EB.6040704@sweetapp.com>
Message-ID: <44F30CCB.8080705@blueyonder.co.uk>

Brian Quinlan wrote:
> It is my understanding that, in Python 3000, certain functions and
> methods that currently return lists will return some sort of view type
> (e.g. dict.values()) or an iterator (e.g. zip). So certain usage
> patterns will no longer be supported e.g. d.keys().sort().
> 
> The attached patch, which is a diff against the subversion "trunk" of
> Python 2.x, tries to warn the user about these kind of future-unsafe
> usage patterns. It works by storing the type that the list will become
> in the future, at creation time, and checking to see if called list
> functions will be supported by that type in the future.

+1 on the idea of the patch.

Some nitpicking:

> +#define PY_REMAIN_LIST     0x01  /* List will remain a list in Py2K */

"in Py3K".

> +		/* XXX This should be PyExc_PendingDeprecationWarning */
> +		if (PyErr_WarnEx(PyExc_DeprecationWarning, message, 1) < 0)
> +			return -1;

Why isn't it PyExc_PendingDeprecationWarning?

> +#define WARN_LIST_USAGE(self, supported_types, operation) \
> +	if (warn_future_usage((PyListObject *) self, \
> +                           supported_types, operation) < 0) \
> +		return NULL;
> +
> +#define WARN_LIST_USAGE_INT(self, supported_types, operation) \
> +	if (warn_future_usage((PyListObject *) self, \
> +						   supported_types, operation) < 0) \
> +		return -1;

These are macros that hide control flow. In this case I don't think that the
difference in verbosity between, say,

    if (warn_future_usage(a, PY_REMAIN_LIST | PY_BECOME_DICTVIEW, "len") < 0)
        return -1;

and

    WARN_LIST_USAGE_INT(a, PY_REMAIN_LIST | PY_BECOME_DICTVIEW, "len");

is sufficient to justify hiding the return in a macro.

(The cast to PyListObject * is not needed: you have the same cast within
warn_future_usage, so its 'self' argument could just as well be declared
as PyObject *.)


The 'operation' string is sometimes a gerund ("slicing", etc.) and sometimes
the name of a method. This should be more consistent.

> +	WARN_LIST_USAGE(a, PY_REMAIN_LIST, "repitition");

"repetition"

-- 
David Hopwood <david.nospam.hopwood at blueyonder.co.uk>



From guido at python.org  Mon Aug 28 18:22:52 2006
From: guido at python.org (Guido van Rossum)
Date: Mon, 28 Aug 2006 09:22:52 -0700
Subject: [Python-3000] Warning about future-unsafe usage patterns in
	Python 2.x e.g. dict.keys().sort()
In-Reply-To: <44F2B8EB.6040704@sweetapp.com>
References: <20060827191547.1AEB.JCARLSON@uci.edu>
	<44F255D6.2060002@canterbury.ac.nz>
	<20060827214348.1AF4.JCARLSON@uci.edu> <44F2B8EB.6040704@sweetapp.com>
Message-ID: <ca471dc20608280922k182b75d5xf271fc96f8c96c10@mail.gmail.com>

Not much time to review the patch, but +1 on this -- I've described
this a few times in my Py3k talk, glad that some code is forthcoming
now!

--Guido

On 8/28/06, Brian Quinlan <brian at sweetapp.com> wrote:
> It is my understanding that, in Python 3000, certain functions and
> methods that currently return lists will return some sort of view type
> (e.g. dict.values()) or an iterator (e.g. zip). So certain usage
> patterns will no longer be supported e.g. d.keys().sort().
>
> The attached patch, which is a diff against the subversion "trunk" of
> Python 2.x, tries to warn the user about these kind of future-unsafe
> usage patterns. It works by storing the type that the list will become
> in the future, at creation time, and checking to see if called list
> functions will be supported by that type in the future.
>
> Currently the patch if very incomplete and the idea itself may be
> flawed. But I thought it was interesting to run against my own code to
> see what potential problems it has. Example:
>
> ...
> Type "help", "copyright", "credits" or "license" for more information.
>  >>> d = {"apple" : "sweet", "orange" : "tangy"}
>  >>> "juicy" in d.values()
> False
>  >>> d.keys().sort()
> __main__:1: DeprecationWarning: dictionary view will not support sort
>  >>> "a" in zip([1,2,3,4], "abcd")
> __main__:1: DeprecationWarning: iterator will not support contains
> False
>
> Cheers,
> Brian
>
>
> Index: Python/bltinmodule.c
> ===================================================================
> --- Python/bltinmodule.c        (revision 51629)
> +++ Python/bltinmodule.c        (working copy)
> @@ -1570,7 +1570,7 @@
>                 goto Fail;
>         }
>
> -       v = PyList_New(n);
> +       v = PyList_NewFutureType(n, PY_BECOME_ITER);
>         if (v == NULL)
>                 goto Fail;
>
> @@ -1678,7 +1678,7 @@
>                                 "range() result has too many items");
>                 return NULL;
>         }
> -       v = PyList_New(n);
> +       v = PyList_NewFutureType(n, PY_BECOME_ITER);
>         if (v == NULL)
>                 return NULL;
>         for (i = 0; i < n; i++) {
> @@ -2120,7 +2120,7 @@
>         Py_ssize_t len;    /* guess at result length */
>
>         if (itemsize == 0)
> -               return PyList_New(0);
> +               return PyList_NewFutureType(0, PY_BECOME_ITER);
>
>         /* args must be a tuple */
>         assert(PyTuple_Check(args));
> @@ -2148,7 +2148,7 @@
>         /* allocate result list */
>         if (len < 0)
>                 len = 10;       /* arbitrary */
> -       if ((ret = PyList_New(len)) == NULL)
> +       if ((ret = PyList_NewFutureType(len, PY_BECOME_ITER)) == NULL)
>                 return NULL;
>
>         /* obtain iterators */
> Index: Include/listobject.h
> ===================================================================
> --- Include/listobject.h        (revision 51629)
> +++ Include/listobject.h        (working copy)
> @@ -19,6 +19,12 @@
>  extern "C" {
>  #endif
>
> +/* Constants representing the types that may be used instead of a list
> +   in Python 3000 */
> +#define PY_REMAIN_LIST     0x01  /* List will remain a list in Py2K */
> +#define PY_BECOME_DICTVIEW 0x02  /* List will become a "view" on a dict */
> +#define PY_BECOME_ITER     0x04  /* List will become an iterator */
> +
>  typedef struct {
>      PyObject_VAR_HEAD
>      /* Vector of pointers to list elements.  list[0] is ob_item[0], etc. */
> @@ -36,6 +42,7 @@
>       * the list is not yet visible outside the function that builds it.
>       */
>      Py_ssize_t allocated;
> +    int future_type; /* The type the object will have in Py3K */
>  } PyListObject;
>
>  PyAPI_DATA(PyTypeObject) PyList_Type;
> @@ -44,6 +51,7 @@
>  #define PyList_CheckExact(op) ((op)->ob_type == &PyList_Type)
>
>  PyAPI_FUNC(PyObject *) PyList_New(Py_ssize_t size);
> +PyAPI_FUNC(PyObject *) PyList_NewFutureType(Py_ssize_t size, int future_type);
>  PyAPI_FUNC(Py_ssize_t) PyList_Size(PyObject *);
>  PyAPI_FUNC(PyObject *) PyList_GetItem(PyObject *, Py_ssize_t);
>  PyAPI_FUNC(int) PyList_SetItem(PyObject *, Py_ssize_t, PyObject *);
> @@ -57,6 +65,9 @@
>  PyAPI_FUNC(PyObject *) _PyList_Extend(PyListObject *, PyObject *);
>
>  /* Macro, trading safety for speed */
> +/* XXX These functions do not (yet) trigger future usage warnings.
> +   So e.g. range(100)[0] will slip through
> +*/
>  #define PyList_GET_ITEM(op, i) (((PyListObject *)(op))->ob_item[i])
>  #define PyList_SET_ITEM(op, i, v) (((PyListObject *)(op))->ob_item[i] = (v))
>  #define PyList_GET_SIZE(op)    (((PyListObject *)(op))->ob_size)
> Index: Objects/dictobject.c
> ===================================================================
> --- Objects/dictobject.c        (revision 51629)
> +++ Objects/dictobject.c        (working copy)
> @@ -1003,7 +1003,7 @@
>
>    again:
>         n = mp->ma_used;
> -       v = PyList_New(n);
> +       v = PyList_NewFutureType(n, PY_BECOME_DICTVIEW);
>         if (v == NULL)
>                 return NULL;
>         if (n != mp->ma_used) {
> @@ -1037,7 +1037,7 @@
>
>    again:
>         n = mp->ma_used;
> -       v = PyList_New(n);
> +       v = PyList_NewFutureType(n, PY_BECOME_DICTVIEW);
>         if (v == NULL)
>                 return NULL;
>         if (n != mp->ma_used) {
> @@ -1076,7 +1076,7 @@
>          */
>    again:
>         n = mp->ma_used;
> -       v = PyList_New(n);
> +       v = PyList_NewFutureType(n, PY_BECOME_DICTVIEW);
>         if (v == NULL)
>                 return NULL;
>         for (i = 0; i < n; i++) {
> Index: Objects/listobject.c
> ===================================================================
> --- Objects/listobject.c        (revision 51629)
> +++ Objects/listobject.c        (working copy)
> @@ -8,6 +8,49 @@
>  #include <sys/types.h>         /* For size_t */
>  #endif
>
> +static int warn_future_usage(PyListObject *self,
> +                                                int supported_types, char *operation)
> +{
> +       char message[256];
> +
> +       if ((((PyListObject *) self)->future_type & supported_types) == 0)
> +       {
> +               switch (self->future_type) {
> +                       case PY_BECOME_DICTVIEW:
> +                               PyOS_snprintf(message, sizeof(message),
> +                                       "dictionary view will not support %s",
> +                                       operation);
> +                               break;
> +                       case PY_BECOME_ITER:
> +                               PyOS_snprintf(message, sizeof(message),
> +                                       "iterator will not support %s",
> +                                       operation);
> +                               break;
> +                       default: /* This shouldn't happen */
> +                               PyErr_BadInternalCall();
> +                               return -1;
> +               }
> +
> +               /* XXX This should be PyExc_PendingDeprecationWarning */
> +               if (PyErr_WarnEx(PyExc_DeprecationWarning, message, 1) < 0)
> +                       return -1;
> +       }
> +
> +       return 0;
> +}
> +
> +#define WARN_LIST_USAGE(self, supported_types, operation) \
> +       if (warn_future_usage((PyListObject *) self, \
> +                           supported_types, operation) < 0) \
> +               return NULL;
> +
> +#define WARN_LIST_USAGE_INT(self, supported_types, operation) \
> +       if (warn_future_usage((PyListObject *) self, \
> +                                                  supported_types, operation) < 0) \
> +               return -1;
> +
> +#define PyList_Check(op) PyObject_TypeCheck(op, &PyList_Type)
> +
>  /* Ensure ob_item has room for at least newsize elements, and set
>   * ob_size to newsize.  If newsize > ob_size on entry, the content
>   * of the new slots at exit is undefined heap trash; it's the caller's
> @@ -116,10 +159,29 @@
>         }
>         op->ob_size = size;
>         op->allocated = size;
> +       op->future_type = PY_REMAIN_LIST;
>         _PyObject_GC_TRACK(op);
>         return (PyObject *) op;
>  }
>
> +PyObject *
> +PyList_NewFutureType(Py_ssize_t size, int future_type)
> +{
> +       PyListObject *op = (PyListObject *) PyList_New(size);
> +       if (op == NULL)
> +               return NULL;
> +       else {
> +               if (future_type == 0)
> +               {
> +                       Py_DECREF(op);
> +                       PyErr_BadInternalCall();
> +                       return NULL;
> +               }
> +               op->future_type = future_type;
> +               return (PyObject *) op;
> +       }
> +}
> +
>  Py_ssize_t
>  PyList_Size(PyObject *op)
>  {
> @@ -369,6 +431,7 @@
>  static Py_ssize_t
>  list_length(PyListObject *a)
>  {
> +       WARN_LIST_USAGE_INT(a, PY_REMAIN_LIST | PY_BECOME_DICTVIEW, "len");
>         return a->ob_size;
>  }
>
> @@ -378,6 +441,7 @@
>         Py_ssize_t i;
>         int cmp;
>
> +       WARN_LIST_USAGE_INT(a, PY_REMAIN_LIST | PY_BECOME_DICTVIEW, "contains");
>         for (i = 0, cmp = 0 ; cmp == 0 && i < a->ob_size; ++i)
>                 cmp = PyObject_RichCompareBool(el, PyList_GET_ITEM(a, i),
>                                                    Py_EQ);
> @@ -387,6 +451,7 @@
>  static PyObject *
>  list_item(PyListObject *a, Py_ssize_t i)
>  {
> +       WARN_LIST_USAGE(a, PY_REMAIN_LIST, "item indexing");
>         if (i < 0 || i >= a->ob_size) {
>                 if (indexerr == NULL)
>                         indexerr = PyString_FromString(
> @@ -404,6 +469,8 @@
>         PyListObject *np;
>         PyObject **src, **dest;
>         Py_ssize_t i, len;
> +
> +       WARN_LIST_USAGE(a, PY_REMAIN_LIST, "slicing");
>         if (ilow < 0)
>                 ilow = 0;
>         else if (ilow > a->ob_size)
> @@ -444,6 +511,9 @@
>         Py_ssize_t i;
>         PyObject **src, **dest;
>         PyListObject *np;
> +
> +       WARN_LIST_USAGE(a, PY_REMAIN_LIST, "concatenation");
> +
>         if (!PyList_Check(bb)) {
>                 PyErr_Format(PyExc_TypeError,
>                           "can only concatenate list (not \"%.200s\") to list",
> @@ -484,6 +554,8 @@
>         PyListObject *np;
>         PyObject **p, **items;
>         PyObject *elem;
> +
> +       WARN_LIST_USAGE(a, PY_REMAIN_LIST, "repetition");
>         if (n < 0)
>                 n = 0;
>         size = a->ob_size * n;
> @@ -521,6 +593,8 @@
>  {
>         Py_ssize_t i;
>         PyObject **item = a->ob_item;
> +
> +       WARN_LIST_USAGE_INT(a, PY_REMAIN_LIST, "clear");
>         if (item != NULL) {
>                 /* Because XDECREF can recursively invoke operations on
>                    this list, we make it empty first. */
> @@ -565,6 +639,9 @@
>         Py_ssize_t k;
>         size_t s;
>         int result = -1;        /* guilty until proved innocent */
> +
> +       WARN_LIST_USAGE_INT(a, PY_REMAIN_LIST, "slicing");
> +
>  #define b ((PyListObject *)v)
>         if (v == NULL)
>                 n = 0;
> @@ -658,9 +735,9 @@
>  {
>         PyObject **items;
>         Py_ssize_t size, i, j, p;
> +       size = PyList_GET_SIZE(self);
>
> -
> -       size = PyList_GET_SIZE(self);
> +       WARN_LIST_USAGE(self, PY_REMAIN_LIST, "repeat");
>         if (size == 0) {
>                 Py_INCREF(self);
>                 return (PyObject *)self;
> @@ -692,6 +769,8 @@
>  list_ass_item(PyListObject *a, Py_ssize_t i, PyObject *v)
>  {
>         PyObject *old_value;
> +
> +       WARN_LIST_USAGE_INT(a, PY_REMAIN_LIST, "item assignment");
>         if (i < 0 || i >= a->ob_size) {
>                 PyErr_SetString(PyExc_IndexError,
>                                 "list assignment index out of range");
> @@ -711,6 +790,8 @@
>  {
>         Py_ssize_t i;
>         PyObject *v;
> +
> +       WARN_LIST_USAGE(self, PY_REMAIN_LIST, "insert");
>         if (!PyArg_ParseTuple(args, "nO:insert", &i, &v))
>                 return NULL;
>         if (ins1(self, i, v) == 0)
> @@ -721,6 +802,7 @@
>  static PyObject *
>  listappend(PyListObject *self, PyObject *v)
>  {
> +       WARN_LIST_USAGE(self, PY_REMAIN_LIST, "append");
>         if (app1(self, v) == 0)
>                 Py_RETURN_NONE;
>         return NULL;
> @@ -736,6 +818,7 @@
>         Py_ssize_t i;
>         PyObject *(*iternext)(PyObject *);
>
> +       WARN_LIST_USAGE(self, PY_REMAIN_LIST, "extend");
>         /* Special cases:
>            1) lists and tuples which can use PySequence_Fast ops
>            2) extending self to self requires making a copy first
> @@ -851,6 +934,7 @@
>  {
>         PyObject *result;
>
> +       WARN_LIST_USAGE(self, PY_REMAIN_LIST, "concatenation");
>         result = listextend(self, other);
>         if (result == NULL)
>                 return result;
> @@ -866,6 +950,7 @@
>         PyObject *v, *arg = NULL;
>         int status;
>
> +       WARN_LIST_USAGE(self, PY_REMAIN_LIST, "pop");
>         if (!PyArg_UnpackTuple(args, "pop", 0, 1, &arg))
>                 return NULL;
>         if (arg != NULL) {
> @@ -1995,6 +2080,8 @@
>         PyObject *key, *value, *kvpair;
>         static char *kwlist[] = {"cmp", "key", "reverse", 0};
>
> +       WARN_LIST_USAGE(self, PY_REMAIN_LIST, "sort");
> +
>         assert(self != NULL);
>         assert (PyList_Check(self));
>         if (args != NULL) {
> @@ -2163,6 +2250,7 @@
>  static PyObject *
>  listreverse(PyListObject *self)
>  {
> +       WARN_LIST_USAGE(self, PY_REMAIN_LIST, "reverse");
>         if (self->ob_size > 1)
>                 reverse_slice(self->ob_item, self->ob_item + self->ob_size);
>         Py_RETURN_NONE;
> @@ -2213,6 +2301,7 @@
>         Py_ssize_t i, start=0, stop=self->ob_size;
>         PyObject *v;
>
> +       WARN_LIST_USAGE(self, PY_REMAIN_LIST, "index");
>         if (!PyArg_ParseTuple(args, "O|O&O&:index", &v,
>                                     _PyEval_SliceIndex, &start,
>                                     _PyEval_SliceIndex, &stop))
> @@ -2244,6 +2333,7 @@
>         Py_ssize_t count = 0;
>         Py_ssize_t i;
>
> +       WARN_LIST_USAGE(self, PY_REMAIN_LIST, "count");
>         for (i = 0; i < self->ob_size; i++) {
>                 int cmp = PyObject_RichCompareBool(self->ob_item[i], v, Py_EQ);
>                 if (cmp > 0)
> @@ -2259,6 +2349,7 @@
>  {
>         Py_ssize_t i;
>
> +       WARN_LIST_USAGE(self, PY_REMAIN_LIST, "remove");
>         for (i = 0; i < self->ob_size; i++) {
>                 int cmp = PyObject_RichCompareBool(self->ob_item[i], v, Py_EQ);
>                 if (cmp > 0) {
> @@ -2372,6 +2463,7 @@
>                self->allocated == 0 || self->allocated == -1);
>
>         /* Empty previous contents */
> +       self->future_type = PY_REMAIN_LIST;
>         if (self->ob_item != NULL) {
>                 (void)list_clear(self);
>         }
> @@ -2456,6 +2548,8 @@
>  static PyObject *
>  list_subscript(PyListObject* self, PyObject* item)
>  {
> +       WARN_LIST_USAGE(self, PY_REMAIN_LIST, "__getitem__");
> +
>         if (PyIndex_Check(item)) {
>                 Py_ssize_t i;
>                 i = PyNumber_AsSsize_t(item, PyExc_IndexError);
> @@ -2505,6 +2599,7 @@
>  static int
>  list_ass_subscript(PyListObject* self, PyObject* item, PyObject* value)
>  {
> +       WARN_LIST_USAGE_INT(self, PY_REMAIN_LIST, "item assignment");
>         if (PyIndex_Check(item)) {
>                 Py_ssize_t i = PyNumber_AsSsize_t(item, PyExc_IndexError);
>                 if (i == -1 && PyErr_Occurred())
> @@ -2874,6 +2969,7 @@
>  {
>         listreviterobject *it;
>
> +       WARN_LIST_USAGE(seq, PY_REMAIN_LIST, "reversed");
>         it = PyObject_GC_New(listreviterobject, &PyListRevIter_Type);
>         if (it == NULL)
>                 return NULL;
>
>
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
>
>
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
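
[Editorial note: the quoted patch threads a future_type field through
PyListObject and emits a warning whenever a list is used in a way its Py3k
replacement will not support. A rough Python-level sketch of that behaviour
follows; the class name and warning text are illustrative, not from the
patch.]

```python
import warnings

class FutureTypeList(list):
    """Hypothetical stand-in for the patch's future_type mechanism:
    a list flagged PY_BECOME_ITER that warns on operations an
    iterator will not support."""
    def __getitem__(self, i):
        warnings.warn("iterator will not support item indexing",
                      DeprecationWarning, stacklevel=2)
        return list.__getitem__(self, i)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    r = FutureTypeList(range(5))   # think: the list returned by range()
    first = r[0]                   # range(100)[0]-style usage -> warning

assert first == 0
assert any(issubclass(w.category, DeprecationWarning) for w in caught)
```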

From guido at python.org  Mon Aug 28 18:42:06 2006
From: guido at python.org (Guido van Rossum)
Date: Mon, 28 Aug 2006 09:42:06 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <20060827184941.1AE8.JCARLSON@uci.edu>
References: <20060827091000.1ADF.JCARLSON@uci.edu>
	<ca471dc20608271417w480c90aeg6b39c766a8f94750@mail.gmail.com>
	<20060827184941.1AE8.JCARLSON@uci.edu>
Message-ID: <ca471dc20608280942s35ca3c8byca725a16484a7e2c@mail.gmail.com>

Josiah (and other supporters of string views),

You seem to be utterly convinced of the superior performance of your
proposal without having done any measurements.

You appear to have a rather naive view on what makes code execute fast
or slow (e.g. you don't seem to appreciate the savings due to a string
object header and its data being consecutive in memory).

Unless you have serious benchmark data (for realistic Python code) I
can't continue to participate in this discussion, where you have said
nothing new in many posts.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From fredrik at pythonware.com  Mon Aug 28 18:48:52 2006
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Mon, 28 Aug 2006 18:48:52 +0200
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <ca471dc20608280942s35ca3c8byca725a16484a7e2c@mail.gmail.com>
References: <20060827091000.1ADF.JCARLSON@uci.edu>	<ca471dc20608271417w480c90aeg6b39c766a8f94750@mail.gmail.com>	<20060827184941.1AE8.JCARLSON@uci.edu>
	<ca471dc20608280942s35ca3c8byca725a16484a7e2c@mail.gmail.com>
Message-ID: <ecv6pj$1mn$1@sea.gmane.org>

Guido van Rossum wrote:

> (e.g. you don't seem to appreciate the savings due to a string
> object header and its data being consecutive in memory).

footnote: note that the Unicode string type still doesn't do that (my 
original implementation *did* support string views, and nobody's ever 
gotten around to fully rip it out), so if anyone wants to benchmark 
things related to this specific feature, comparing unicode strings with 
8-bit strings could be somewhat useful.

</F>


From guido at python.org  Mon Aug 28 18:52:23 2006
From: guido at python.org (Guido van Rossum)
Date: Mon, 28 Aug 2006 09:52:23 -0700
Subject: [Python-3000] Set literals
In-Reply-To: <ecug4k$cg8$1@sea.gmane.org>
References: <ecug4k$cg8$1@sea.gmane.org>
Message-ID: <ca471dc20608280952g33233b37k5a044767e1ed0640@mail.gmail.com>

On 8/28/06, Georg Brandl <g.brandl at gmx.net> wrote:
> At python.org/sf/1547796, there is a preliminary patch for Py3k set literals
> as specified in PEP 3100.

Very cool! This is now checked in.

Georg, can you do something about repr() of an empty set? This
currently produces "{}" while it should produce "set()".
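
[Editorial sketch: `{}` is taken by the empty dict, which is why the empty
set cannot keep brace notation; later Python releases behave exactly as
requested here.]

```python
# {} is taken by the empty dict, so the empty set has no literal form
# and its repr must round-trip through the constructor instead.
assert repr({}) == "{}"                # the empty *dict*
assert repr(set()) == "set()"
assert eval(repr(set())) == set()      # round-trips
assert repr({1}) == "{1}"              # non-empty sets do use braces
```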

> Set comprehensions are not implemented.

ETA?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From g.brandl at gmx.net  Mon Aug 28 19:44:52 2006
From: g.brandl at gmx.net (Georg Brandl)
Date: Mon, 28 Aug 2006 19:44:52 +0200
Subject: [Python-3000] Set literals
In-Reply-To: <ca471dc20608280952g33233b37k5a044767e1ed0640@mail.gmail.com>
References: <ecug4k$cg8$1@sea.gmane.org>
	<ca471dc20608280952g33233b37k5a044767e1ed0640@mail.gmail.com>
Message-ID: <ecva2k$dep$1@sea.gmane.org>

Guido van Rossum wrote:
> On 8/28/06, Georg Brandl <g.brandl at gmx.net> wrote:
>> At python.org/sf/1547796, there is a preliminary patch for Py3k set literals
>> as specified in PEP 3100.
> 
> Very cool! This is now checked in.

Wow, that's fast...

> Georg, can you do something about repr() of an empty set? This
> currently produces "{}" while it should produce "set()".

Right, forgot about that case. I'll correct that now.
(Grr, I even mindlessly changed the unittest that would have caught it)

In the meantime, I played around with the peepholer and tried to copy
the "for x in tuple_or_list" optimization for sets. Results are in SF
patch #1548082.

>> Set comprehensions are not implemented.
> 
> ETA?

There are some points I'd like to have clarified first:

* would it be wise to have some general listcomp <-> genexp
   cleanup first? This starts with the grammar, which currently is slightly
   different (see Grammar:79), and it looks like there's quite a lot of
   (almost) duplicated code in ast.c and compile.c too.

* list comprehensions are special-cased because of the LIST_APPEND opcode.
   If there isn't going to be a special-cased SET_ADD, it's probably the
   easiest thing to transform {x for x in a} into set(x for x in a) in the
   AST step, with "set" of course always being the builtin set.
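
[Editorial sketch, in present-day Python rather than the Py3k branch: the
proposed transform is observable today, where the two spellings build the
same set even though they remain distinct AST node types.]

```python
import ast

a = [1, 2, 2, 3]
# The proposed desugaring: both spellings build the same set...
assert {x for x in a} == set(x for x in a) == {1, 2, 3}

# ...even though they are distinct nodes at the AST level.
assert isinstance(ast.parse("{x for x in a}", mode="eval").body, ast.SetComp)
assert isinstance(ast.parse("set(x for x in a)", mode="eval").body, ast.Call)
```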

Georg


From guido at python.org  Mon Aug 28 20:55:30 2006
From: guido at python.org (Guido van Rossum)
Date: Mon, 28 Aug 2006 11:55:30 -0700
Subject: [Python-3000] Set literals
In-Reply-To: <ecva2k$dep$1@sea.gmane.org>
References: <ecug4k$cg8$1@sea.gmane.org>
	<ca471dc20608280952g33233b37k5a044767e1ed0640@mail.gmail.com>
	<ecva2k$dep$1@sea.gmane.org>
Message-ID: <ca471dc20608281155q444e70e8v3d076399ba0d919d@mail.gmail.com>

On 8/28/06, Georg Brandl <g.brandl at gmx.net> wrote:
> Guido van Rossum wrote:
> > On 8/28/06, Georg Brandl <g.brandl at gmx.net> wrote:
> >> At python.org/sf/1547796, there is a preliminary patch for Py3k set literals
> >> as specified in PEP 3100.
> >
> > Very cool! This is now checked in.
>
> Wow, that's fast...

Well it passed all unit tests and the rules for the py3k branch are a
bit looser than for the head... :)

> > Georg, can you do something about repr() of an empty set? This
> > currently produces "{}" while it should produce "set()".
>
> Right, forgot about that case. I'll correct that now.
> (Grr, I even mindlessly changed the unittest that would have caught it)

Checkin?

> In the meantime, I played around with the peepholer and tried to copy
> the "for x in tuple_or_list" optimization for sets. Results are in SF
> patch #1548082.
>
> >> Set comprehensions are not implemented.
> >
> > ETA?
>
> There are some points I'd like to have clarified first:
>
> * would it be wise to have some general listcomp <-> genexp
>    cleanup first? This starts with the grammar, which currently is slightly
>    different (see Grammar:79), and it looks like there's quite a lot of
>    (almost) duplicated code in ast.c and compile.c too.

I expect this cleanup to be quite a bit of work since the semantics are
seriously different. ([...] uses the surrounding scope for the loop
control variables.)

However you might be able to just cleanup the grammar so they are
identical, that would be simpler I suspect.

> * list comprehensions are special-cased because of the LIST_APPEND opcode.
>    If there isn't going to be a special-cased SET_ADD, it's probably the
>    easiest thing to transform {x for x in a} into set(x for x in a) in the
>    AST step, with "set" of course always being the builtin set.

Right. That might actually become a prototype for how to do the list
translation as well.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jcarlson at uci.edu  Mon Aug 28 21:49:39 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Mon, 28 Aug 2006 12:49:39 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <ca471dc20608280942s35ca3c8byca725a16484a7e2c@mail.gmail.com>
References: <20060827184941.1AE8.JCARLSON@uci.edu>
	<ca471dc20608280942s35ca3c8byca725a16484a7e2c@mail.gmail.com>
Message-ID: <20060828120741.1AF7.JCARLSON@uci.edu>


"Guido van Rossum" <guido at python.org> wrote:
> 
> Josiah (and other supporters of string views),
> 
> You seem to be utterly convinced of the superior performance of your
> proposal without having done any measurements.
> 
> You appear to have a rather naive view on what makes code execute fast
> or slow (e.g. you don't seem to appreciate the savings due to a string
> object header and its data being consecutive in memory).
> 
> Unless you have serious benchmark data (for realistic Python code) I
> can't continue to participate in this discussion, where you have said
> nothing new in many posts.

Put up or shut up, eh?

I have written a simple extension module using Pyrex (my manual C
extension writing is awful).  Here are some sample interactions showing
that string views are indeed quite fast.  In all of these examples, a
naive implementation using only stringview.partition() was able to beat
Python 2.5 str.partition, str.split, and re.finditer.

Attached you will find the implementation of stringview I used, along
with sufficient build scripts to get it working using Python 2.3 and
Pyrex 0.9.3 .  Aside from replacing int usage with Py_ssize_t for 2.5,
and *nix users performing a dos2unix call, it should work without change
with the most recent Python and Pyrex versions.

 - Josiah


Using 2.3 :
    >>> x = stringview(40000*' ')
    >>> if 1:
    ...     t = time.time()
    ...     while x:
    ...             _1, _2, x = x.partition(' ')
    ...     print time.time()-t
    ... 
    0.18700003624
    >>> 

Compared with Python 2.5 beta 2
    >>> x = 40000*' '
    >>> if 1:
    ...     t = time.time()
    ...     while x:
    ...             _1, _2, x = x.partition(' ')
    ...     print time.time()-t
    ...
    0.625
    >>> 

But that's about as bad for Python 2.5 as it can get.  What about
something else?  Like a mail file?  In my 21.5 meg archive of py3k,
which contains 3456 messages, I wanted to discover all messages.

Python 2.3.5 (#62, Feb  8 2005, 16:23:02) [MSC v.1200 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from stringview import *
>>> rest = stringview(open('mail', 'rb').read())
>>> import time
>>> if 1:
...     x = []
...     t = time.time()
...     while rest:
...         cur, found, rest = rest.partition('\r\n.\r\n')
...         x.append(cur)
...     print time.time()-t, len(x)
...
0.0780000686646 3456
>>> 

What about Python 2.5 using split?  That should be fast...

Python 2.5b2 (r25b2:50512, Jul 11 2006, 10:16:14) [MSC v.1310 32 bit (Intel)] on
 win32
Type "help", "copyright", "credits" or "license" for more information.
>>> rest = open('mail', 'rb').read()
>>> import time
>>> if 1:
...     t = time.time()
...     x = rest.split('\r\n.\r\n')
...     print time.time()-t, len(x)
...
0.109999895096 3457
>>> 

Hrm...what about using re?
>>> import re
>>> pat = re.compile('\r\n\.\r\n')
>>> rest = open('mail', 'rb').read()
>>> import time
>>> if 1:
...     x = []
...     t = time.time()
...     for i in pat.finditer(rest):
...         x.append(i)
...     print time.time()-t, len(x)
...
0.125 3456
>>>

Even that's not as good as Python 2.3 + string views.
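
[Editorial sketch: the first comparison above can be reproduced without
string views at all. The partition loop is quadratic because each call
copies the remaining tail, while split() makes a single pass; sizes here
are arbitrary.]

```python
import time

s = 10000 * ' '

# Quadratic: every partition() call copies the remaining tail.
t0 = time.perf_counter()
x, n = s, 0
while x:
    _head, _sep, x = x.partition(' ')
    n += 1
quadratic = time.perf_counter() - t0

# Linear: split() makes a single pass over the string.
t0 = time.perf_counter()
parts = s.split(' ')
linear = time.perf_counter() - t0

assert n == 10000               # one piece per separator consumed
assert len(parts) == 10001      # split yields 10001 empty strings
```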

-------------- next part --------------
A non-text attachment was scrubbed...
Name: stringview_build.py
Type: application/octet-stream
Size: 654 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20060828/916e8238/attachment.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: stringview.pyx
Type: application/octet-stream
Size: 2639 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20060828/916e8238/attachment-0001.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: stringview_helper.h
Type: application/octet-stream
Size: 1656 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20060828/916e8238/attachment-0002.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: _setup.py
Type: application/octet-stream
Size: 255 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20060828/916e8238/attachment-0003.obj 

From g.brandl at gmx.net  Mon Aug 28 21:52:53 2006
From: g.brandl at gmx.net (Georg Brandl)
Date: Mon, 28 Aug 2006 21:52:53 +0200
Subject: [Python-3000] Set literals
In-Reply-To: <ca471dc20608281155q444e70e8v3d076399ba0d919d@mail.gmail.com>
References: <ecug4k$cg8$1@sea.gmane.org>	<ca471dc20608280952g33233b37k5a044767e1ed0640@mail.gmail.com>	<ecva2k$dep$1@sea.gmane.org>
	<ca471dc20608281155q444e70e8v3d076399ba0d919d@mail.gmail.com>
Message-ID: <ecvhil$7mg$1@sea.gmane.org>

Guido van Rossum wrote:

>> > Georg, can you do something about repr() of an empty set? This
>> > currently produces "{}" while it should produce "set()".
>>
>> Right, forgot about that case. I'll correct that now.
>> (Grr, I even mindlessly changed the unittest that would have caught it)
> 
> Checkin?

Done. It now also renders repr(frozenset()) as "frozenset()", which should
cause no problems though.

>> In the meantime, I played around with the peepholer and tried to copy
>> the "for x in tuple_or_list" optimization for sets. Results are in SF
>> patch #1548082.
>>
>> >> Set comprehensions are not implemented.
>> >
>> > ETA?
>>
>> There are some points I'd like to have clarified first:
>>
>> * would it be wise to have some general listcomp <-> genexp
>>    cleanup first? This starts with the grammar, which currently is slightly
>>    different (see Grammar:79), and it looks like there's quite a lot of
>>    (almost) duplicated code in ast.c and compile.c too.
> 
> I expect this cleanup to be quite a bit of work since the semantics are
> seriously different. ([...] uses the surrounding scope for the loop
> control variables.)

I didn't say that I wanted to champion that cleanup ;)

> However you might be able to just cleanup the grammar so they are
> identical, that would be simpler I suspect.

Looking at the grammar, there's only testlist_safe left to kill, in
favor of or_test like in generator expressions. The old_ rules are still
needed.

Hm. Is the precedence in

x = lambda: 1 if 0 else 2

really obvious?
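
[Editorial sketch: it binds the conditional inside the lambda body, i.e.
`lambda: (1 if 0 else 2)`; the other grouping needs explicit parentheses.]

```python
f = lambda: 1 if 0 else 2
assert f() == 2                  # parsed as lambda: (1 if 0 else 2)

g = (lambda: 1) if 0 else 2      # the other grouping needs parentheses
assert g == 2
```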


>> * list comprehensions are special-cased because of the LIST_APPEND opcode.
>>    If there isn't going to be a special-cased SET_ADD, it's probably the
>>    easiest thing to transform {x for x in a} into set(x for x in a) in the
>>    AST step, with "set" of course always being the builtin set.
> 
> Right. That might actually become a prototype for how to do the list
> translation as well.

Would this need a new opcode, or should generators be special-cased by
BUILD_SET?

Which doesn't seem like a good idea because it means that
     {(x for x in iterable)} == {x for x in iterable}
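
[Editorial sketch: the two spellings indeed must stay distinct, as later
Python confirms; a parenthesized generator expression inside braces is a
one-element set literal, not a comprehension.]

```python
import types

iterable = [1, 2, 3]
s1 = {(x for x in iterable)}     # a set containing ONE generator object
s2 = {x for x in iterable}       # the comprehension: the set {1, 2, 3}

assert len(s1) == 1
assert isinstance(next(iter(s1)), types.GeneratorType)
assert s2 == {1, 2, 3}
```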

Georg


From guido at python.org  Mon Aug 28 22:07:55 2006
From: guido at python.org (Guido van Rossum)
Date: Mon, 28 Aug 2006 13:07:55 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <20060828120741.1AF7.JCARLSON@uci.edu>
References: <20060827184941.1AE8.JCARLSON@uci.edu>
	<ca471dc20608280942s35ca3c8byca725a16484a7e2c@mail.gmail.com>
	<20060828120741.1AF7.JCARLSON@uci.edu>
Message-ID: <ca471dc20608281307uf5e1995vfa65538e156f0c0a@mail.gmail.com>

Those are all microbenchmarks. It's easy to prove the superiority of
an approach that way. But what about realistic applications? What if
your views don't end up saving memory or time for an application, but
still cost in terms of added complexity in all string operations?

Anyway, let me begin with your  microbenchmark.

The first one pits a linear algorithm against a quadratic algorithm
with the expected result.

The second one is more interesting; your version doesn't copy while
the split() version copies, and that gives your version the expected
speedup. I never doubted this.

But your code has a worst-case problem: if you take a single short
view of a really long string and then drop the long string, the view
keeps it around. Something like this:

rest = ... # your mailbox file
results = []
for i in range(1000):
  x = rest + "." # Just to force a copy
  results.append(x.partition("\r\n.\r\n")[0])  # Save the *first* message over and over

Now watch the memory growth with your version vs. with standard partition.

Now fix this in your code and re-run your benchmark.
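
[Editorial sketch: Python's later memoryview type exhibits exactly this
worst case, using modern syntax and an arbitrary buffer size: a tiny slice
keeps the whole underlying buffer alive until it is released.]

```python
big = bytes(10_000_000)          # a 10 MB buffer
view = memoryview(big)[:5]       # keep only a 5-byte slice of it
del big                          # the name goes away...

pinned = len(view.obj)           # ...but the view still pins the buffer
assert len(view) == 5
assert pinned == 10_000_000

view.release()                   # the fix: explicitly drop the buffer
```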

Then I come with another worst-case scenario, etc.

Then I ask you to make it so that string views are 99.999%
indistinguishable from strings -- they have all the same methods, are
usable everywhere else, etc.

--Guido

On 8/28/06, Josiah Carlson <jcarlson at uci.edu> wrote:
>
> "Guido van Rossum" <guido at python.org> wrote:
> >
> > Josiah (and other supporters of string views),
> >
> > You seem to be utterly convinced of the superior performance of your
> > proposal without having done any measurements.
> >
> > You appear to have a rather naive view on what makes code execute fast
> > or slow (e.g. you don't seem to appreciate the savings due to a string
> > object header and its data being consecutive in memory).
> >
> > Unless you have serious benchmark data (for realistic Python code) I
> > can't continue to participate in this discussion, where you have said
> > nothing new in many posts.
>
> Put up or shut up, eh?
>
> I have written a simple extension module using Pyrex (my manual C
> extension writing is awful).  Here are some sample interactions showing
> that string views are indeed quite fast.  In all of these examples, a
> naive implementation using only stringview.partition() was able to beat
> Python 2.5 str.partition, str.split, and re.finditer.
>
> Attached you will find the implementation of stringview I used, along
> with sufficient build scripts to get it working using Python 2.3 and
> Pyrex 0.9.3 .  Aside from replacing int usage with Py_ssize_t for 2.5,
> and *nix users performing a dos2unix call, it should work without change
> with the most recent Python and Pyrex versions.
>
>  - Josiah
>
>
> Using 2.3 :
>     >>> x = stringview(40000*' ')
>     >>> if 1:
>     ...     t = time.time()
>     ...     while x:
>     ...             _1, _2, x = x.partition(' ')
>     ...     print time.time()-t
>     ...
>     0.18700003624
>     >>>
>
> Compared with Python 2.5 beta 2
>     >>> x = 40000*' '
>     >>> if 1:
>     ...     t = time.time()
>     ...     while x:
>     ...             _1, _2, x = x.partition(' ')
>     ...     print time.time()-t
>     ...
>     0.625
>     >>>
>
> But that's about as bad for Python 2.5 as it can get.  What about
> something else?  Like a mail file?  In my 21.5 meg archive of py3k,
> which contains 3456 messages, I wanted to discover all messages.
>
> Python 2.3.5 (#62, Feb  8 2005, 16:23:02) [MSC v.1200 32 bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from stringview import *
> >>> rest = stringview(open('mail', 'rb').read())
> >>> import time
> >>> if 1:
> ...     x = []
> ...     t = time.time()
> ...     while rest:
> ...         cur, found, rest = rest.partition('\r\n.\r\n')
> ...         x.append(cur)
> ...     print time.time()-t, len(x)
> ...
> 0.0780000686646 3456
> >>>
>
> What about Python 2.5 using split?  That should be fast...
>
> Python 2.5b2 (r25b2:50512, Jul 11 2006, 10:16:14) [MSC v.1310 32 bit (Intel)] on
>  win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> rest = open('mail', 'rb').read()
> >>> import time
> >>> if 1:
> ...     t = time.time()
> ...     x = rest.split('\r\n.\r\n')
> ...     print time.time()-t, len(x)
> ...
> 0.109999895096 3457
> >>>
>
> Hrm...what about using re?
> >>> import re
> >>> pat = re.compile('\r\n\.\r\n')
> >>> rest = open('mail', 'rb').read()
> >>> import time
> >>> if 1:
> ...     x = []
> ...     t = time.time()
> ...     for i in pat.finditer(rest):
> ...         x.append(i)
> ...     print time.time()-t, len(x)
> ...
> 0.125 3456
> >>>
>
> Even that's not as good as Python 2.3 + string views.
>
>
>
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From rhettinger at ewtllc.com  Mon Aug 28 22:08:48 2006
From: rhettinger at ewtllc.com (Raymond Hettinger)
Date: Mon, 28 Aug 2006 13:08:48 -0700
Subject: [Python-3000] Set literals
In-Reply-To: <ecva2k$dep$1@sea.gmane.org>
References: <ecug4k$cg8$1@sea.gmane.org>	<ca471dc20608280952g33233b37k5a044767e1ed0640@mail.gmail.com>
	<ecva2k$dep$1@sea.gmane.org>
Message-ID: <44F34D50.2080805@ewtllc.com>

Georg Brandl wrote:

>In the meantime, I played around with the peepholer and tried to copy
>the "for x in tuple_or_list" optimization for sets. Results are in SF
>patch #1548082.
>
>  
>
Did you mean "if x in tuple_or_list"?   IIRC, there was some reason that 
mutable lists were not supposed to be made into constants in for-loops.





>* list comprehensions are special-cased because of the LIST_APPEND opcode.
>   If there isn't going to be a special-cased SET_ADD, it's probably the
>   easiest thing to transform {x for x in a} into set(x for x in a) in the
>   AST step, with "set" of course always being the builtin set.
>
>  
>

Set comprehensions and list comprehensions are fundamentally the same 
and therefore should have identical implementations. 

While transformation to a generator expression may seem like a good idea 
now, I expect that you'll observe a two-fold performance hit and end up 
abandoning that approach in favor of the current LIST_APPEND approach.

So it would probably be best to start by teaching the compiler to hide 
the loop variable in a LIST_APPEND approach to list comprehensions and 
then duplicate that approach for set comprehensions.
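[In today's Python, where set comprehensions eventually landed with their own SET_ADD opcode, the equivalence that the proposed AST transformation relies on can be checked directly. A small illustration of the semantics being discussed, not of the compiler internals:]

```python
a = [1, 2, 2, 3]

# The transformation under discussion: a set comprehension behaves like
# feeding a generator expression to the builtin set().
comp = {x * x for x in a}
via_genexp = set(x * x for x in a)

# Duplicates collapse either way; the results are identical.
assert comp == via_genexp == {1, 4, 9}
```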


Raymond



From guido at python.org  Mon Aug 28 22:14:17 2006
From: guido at python.org (Guido van Rossum)
Date: Mon, 28 Aug 2006 13:14:17 -0700
Subject: [Python-3000] Set literals
In-Reply-To: <ecvhil$7mg$1@sea.gmane.org>
References: <ecug4k$cg8$1@sea.gmane.org>
	<ca471dc20608280952g33233b37k5a044767e1ed0640@mail.gmail.com>
	<ecva2k$dep$1@sea.gmane.org>
	<ca471dc20608281155q444e70e8v3d076399ba0d919d@mail.gmail.com>
	<ecvhil$7mg$1@sea.gmane.org>
Message-ID: <ca471dc20608281314y530b88d3gacfa6d02102cea45@mail.gmail.com>

On 8/28/06, Georg Brandl <g.brandl at gmx.net> wrote:
> Guido van Rossum wrote:
>
> >> > Georg, can you do something about repr() of an empty set? This
> >> > currently produces "{}" while it should produce "set()".
> >>
> >> Right, forgot about that case. I'll correct that now.
> >> (Grr, I even mindlessly changed the unittest that would have caught it)
> >
> > Checkin?
>
> Done. It now also renders repr(frozenset()) as "frozenset()", which should
> cause no problems though.

Thanks -- looks good!

> >> In the meantime, I played around with the peepholer and tried to copy
> >> the "for x in tuple_or_list" optimization for sets. Results are in SF
> >> patch #1548082.
> >>
> >> >> Set comprehensions are not implemented.
> >> >
> >> > ETA?
> >>
> >> There are some points I'd like to have clarified first:
> >>
> >> * would it be wise to have some general listcomp <-> genexp
> >>    cleanup first? This starts with the grammar, which currently is slightly
> >>    different (see Grammar:79), and it looks like there's quite a lot of
> >>    (almost) duplicated code in ast.c and compile.c too.
> >
> > I expect this cleanup to be quite a bit of work since the semantics are
> > seriously different. ([...] uses the surrounding scope for the loop
> > control variables.)
>
> I didn't say that I wanted to champion that cleanup ;)

That's fine!

> > However you might be able to just cleanup the grammar so they are
> > identical, that would be simpler I suspect.
>
> Looking at the grammar, there's only testlist_safe left to kill, in
> favor of or_test like in generator expressions. The old_ rules are still
> needed.

Hm, it's been so long... Why?

> Hm. Is the precedence in
>
> x = lambda: 1 if 0 else 2
>
> really obvious?

Yes if you think about how you would use it. Conditionally returning a
lambda or something else is kind of rare. A lambda using a condition
is kind of useful. :-)
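[A quick check of the precedence in question, on a modern interpreter: the conditional expression binds inside the lambda body, i.e. it parses as lambda: (1 if 0 else 2).]

```python
# The expression from the thread: the conditional is the lambda's body.
x = lambda: 1 if 0 else 2
assert x() == 2          # 0 is false, so calling the lambda returns 2

# The other reading requires explicit parentheses:
y = (lambda: 1) if 0 else 2
assert y == 2            # here y is the plain integer 2, not a lambda
```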

> >> * list comprehensions are special-cased because of the LIST_APPEND opcode.
> >>    If there isn't going to be a special-cased SET_ADD, it's probably the
> >>    easiest thing to transform {x for x in a} into set(x for x in a) in the
> >>    AST step, with "set" of course always being the builtin set.
> >
> > Right. That might actually become a prototype for how to do the list
> > translation as well.
>
> Would this need a new opcode, or should generators be special-cased by
> BUILD_SET?

Can't remember what BUILD_SET is.

> Which doesn't seem like a good idea because it means that
>      {(x for x in iterable)} == {x for x in iterable}

That should definitely not happen!

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From g.brandl at gmx.net  Mon Aug 28 22:32:53 2006
From: g.brandl at gmx.net (Georg Brandl)
Date: Mon, 28 Aug 2006 22:32:53 +0200
Subject: [Python-3000] Set literals
In-Reply-To: <ca471dc20608281314y530b88d3gacfa6d02102cea45@mail.gmail.com>
References: <ecug4k$cg8$1@sea.gmane.org>	<ca471dc20608280952g33233b37k5a044767e1ed0640@mail.gmail.com>	<ecva2k$dep$1@sea.gmane.org>	<ca471dc20608281155q444e70e8v3d076399ba0d919d@mail.gmail.com>	<ecvhil$7mg$1@sea.gmane.org>
	<ca471dc20608281314y530b88d3gacfa6d02102cea45@mail.gmail.com>
Message-ID: <ecvjtk$fvi$1@sea.gmane.org>

Guido van Rossum wrote:

>> > However you might be able to just cleanup the grammar so they are
>> > identical, that would be simpler I suspect.
>>
>> Looking at the grammar, there's only testlist_safe left to kill, in
>> favor of or_test like in generator expressions. The old_ rules are still
>> needed.
> 
> Hm, it's been so long... Why?

In listcomps/genexps, old_test and old_lambdef do not allow conditional
expressions in order to avoid confusion with the loop's "if".
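[The grammar restriction Georg describes survives in today's Python, where the filter position takes a disjunction rather than a full conditional expression. A small illustration:]

```python
# A conditional expression is legal as the *item* of a comprehension:
items = [x if x % 2 else -x for x in range(5)]
assert items == [0, 1, -2, 3, -4]

# ...but the filter clause after "for ... in ..." deliberately rejects a
# bare conditional expression, avoiding ambiguity with the loop's "if":
#
#     [x for x in range(5) if x % 2 else 0]    # SyntaxError
#
# Wrapping it in parentheses makes it an ordinary expression again:
filtered = [x for x in range(5) if (x if x > 2 else 0)]
assert filtered == [3, 4]
```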

>> Hm. Is the precedence in
>>
>> x = lambda: 1 if 0 else 2
>>
>> really obvious?
> 
> Yes if you think about how you would use it. Conditionally returning a
> lambda or something else is kind of rare. A lambda using a condition
> is kind of useful. :-)

Okay, that makes sense.

>> >> * list comprehensions are special-cased because of the LIST_APPEND opcode.
>> >>    If there isn't going to be a special-cased SET_ADD, it's probably the
>> >>    easiest thing to transform {x for x in a} into set(x for x in a) in the
>> >>    AST step, with "set" of course always being the builtin set.
>> >
>> > Right. That might actually become a prototype for how to do the list
>> > translation as well.
>>
>> Would this need a new opcode, or should generators be special-cased by
>> BUILD_SET?
> 
> Can't remember what BUILD_SET is.

Sorry... it's the newly introduced opcode that creates a new set.

Georg


From g.brandl at gmx.net  Mon Aug 28 22:42:33 2006
From: g.brandl at gmx.net (Georg Brandl)
Date: Mon, 28 Aug 2006 22:42:33 +0200
Subject: [Python-3000] Set literals
In-Reply-To: <44F34D50.2080805@ewtllc.com>
References: <ecug4k$cg8$1@sea.gmane.org>	<ca471dc20608280952g33233b37k5a044767e1ed0640@mail.gmail.com>	<ecva2k$dep$1@sea.gmane.org>
	<44F34D50.2080805@ewtllc.com>
Message-ID: <ecvkfo$hth$1@sea.gmane.org>

Raymond Hettinger wrote:
> Georg Brandl wrote:
> 
>>In the meantime, I played around with the peepholer and tried to copy
>>the "for x in tuple_or_list" optimization for sets. Results are in SF
>>patch #1548082.
>>
> Did you mean "if x in tuple_or_list"?   IIRC, there was some reason that 
> mutable lists were not supposed to be made into constants in for-loops.

Yep, I meant the "if" case.

>>* list comprehensions are special-cased because of the LIST_APPEND opcode.
>>   If there isn't going to be a special-cased SET_ADD, it's probably the
>>   easiest thing to transform {x for x in a} into set(x for x in a) in the
>>   AST step, with "set" of course always being the builtin set.
> 
> Set comprehensions and list comprehensions are fundamentally the same 
> and therefore should have identical implementations. 
> 
> While transformation to a generator expression may seem like a good idea 
> now, I expect that you'll observe a two-fold performance hit and end up 
> abandoning that approach in favor of the current LIST_APPEND approach.

Of course, the LIST_APPEND approach mustn't be thrown out.

> So it would probably be best to start by teaching the compiler to hide 
> the loop variable in a LIST_APPEND approach to list comprehensions and 
> then duplicate that approach for set comprehensions.

Okay, I'll look in that direction. But first I'll try to remove duplication
in ast.c, which should be possible since the syntax of listcomps, genexps
and setcomps will be the same in Py3k.

Georg


From greg.ewing at canterbury.ac.nz  Tue Aug 29 03:10:07 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 29 Aug 2006 13:10:07 +1200
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <20060827214348.1AF4.JCARLSON@uci.edu>
References: <20060827191547.1AEB.JCARLSON@uci.edu>
	<44F255D6.2060002@canterbury.ac.nz>
	<20060827214348.1AF4.JCARLSON@uci.edu>
Message-ID: <44F393EF.6070304@canterbury.ac.nz>

Josiah Carlson wrote:

> If every operation on a view returned a string copy, then what would be
> the point of the view in the first place?

String views would have all the same methods as a real
string, so you could find(), index(), etc. while operating
efficiently on the original data. To my mind this is
preferable to having little-used optional arguments on
an easily-forgotten subset of the string methods: you
only have to remember one thing (how to create a view)
rather than a bunch of random things.
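[The shape Greg describes can be sketched in a few lines of modern Python. StringView here is a hypothetical illustration, not the Pyrex implementation Josiah mentions elsewhere in the thread: the view records (base, start, stop) and delegates searches to the underlying string without copying.]

```python
class StringView:
    """Illustrative pure-Python string view: no data is copied until
    str() is called on the view."""
    __slots__ = ('base', 'start', 'stop')

    def __init__(self, base, start=0, stop=None):
        self.base = base
        self.start = start
        self.stop = len(base) if stop is None else stop

    def __len__(self):
        return self.stop - self.start

    def __str__(self):
        return self.base[self.start:self.stop]   # copies only on demand

    def find(self, sub):
        # str.find's start/stop arguments do the real work, zero-copy.
        i = self.base.find(sub, self.start, self.stop)
        return i if i < 0 else i - self.start

    def partition(self, sep):
        i = self.base.find(sep, self.start, self.stop)
        if i < 0:
            empty = StringView(self.base, self.stop, self.stop)
            return self, empty, empty
        return (StringView(self.base, self.start, i),
                StringView(self.base, i, i + len(sep)),
                StringView(self.base, i + len(sep), self.stop))
```

Every method returns another view over the same base string, so repeated partition() calls (as in Josiah's benchmark) never copy the tail.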

For some things, such as partition(), it might be worth
having a variant that returned views instead of new strings.
But it would be named differently, so you'd still know
whether you were getting a view or not.

On the other hand, this would introduce another random
set of things to remember, i.e. which methods have
view-returning variants. Although maybe it would be
easier to remember them, being different methods rather
than optional arguments to existing methods. Their
existence would show up more clearly under introspection,
for example.

I'm not personally advocating one approach or the other
here -- just pointing out an alternative that might be
more acceptable to the BDFL.

--
Greg

From greg.ewing at canterbury.ac.nz  Tue Aug 29 03:20:58 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 29 Aug 2006 13:20:58 +1200
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <ecujbm$n90$1@sea.gmane.org>
References: <44F0107B.20205@iinet.net.au> <ecujbm$n90$1@sea.gmane.org>
Message-ID: <44F3967A.7010504@canterbury.ac.nz>

Ron Adam wrote:

> 1. Remove None stored as indices in slice objects. Depending on the step 
> value, Any Nones can be converted to 0 or -1 immediately,

But None isn't the same as -1 in a slice. None means the end
of the sequence, whereas -1 means one less than the end.

I'm also not all that happy about forcing slice indices to
be ints. Traditionally they are, but someone might want to
define a class that uses them in a more general way.

> Once the slice is created the Nones are not needed, valid index values 
> can be determined.

I don't understand what you mean here. Slice objects themselves
know nothing about what object they're going to be used to
slice, so there's no way they can determine "valid index
values" (or even *types* -- see above).

--
Greg

From greg.ewing at canterbury.ac.nz  Tue Aug 29 03:29:38 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 29 Aug 2006 13:29:38 +1200
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <ca471dc20608280942s35ca3c8byca725a16484a7e2c@mail.gmail.com>
References: <20060827091000.1ADF.JCARLSON@uci.edu>
	<ca471dc20608271417w480c90aeg6b39c766a8f94750@mail.gmail.com>
	<20060827184941.1AE8.JCARLSON@uci.edu>
	<ca471dc20608280942s35ca3c8byca725a16484a7e2c@mail.gmail.com>
Message-ID: <44F39882.7090501@canterbury.ac.nz>

Guido van Rossum wrote:

> You seem to be utterly convinced of the superior performance of your
> proposal without having done any measurements.

For my part, superior performance isn't the main
reason for considering string views. Rather it's
the simplification that would result from replacing
the current ad-hoc set of optional start-stop
arguments with a single easy-to-remember idiom.

What are your thoughts on that aspect?

--
Greg

From barry at python.org  Tue Aug 29 04:10:14 2006
From: barry at python.org (Barry Warsaw)
Date: Mon, 28 Aug 2006 22:10:14 -0400
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <44F3967A.7010504@canterbury.ac.nz>
References: <44F0107B.20205@iinet.net.au> <ecujbm$n90$1@sea.gmane.org>
	<44F3967A.7010504@canterbury.ac.nz>
Message-ID: <0B0F9D04-7A68-454C-91F1-E011B862F92A@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 28, 2006, at 9:20 PM, Greg Ewing wrote:

> I'm also not all that happy about forcing slice indices to
> be ints. Traditionally they are, but someone might want to
> define a class that uses them in a more general way.

In fact, we do.  Our application is simulated execution of source  
code, so there are cases where we have multiple values due to  
indeterminate conditionals.  For example, we might know that the  
variable "x" has a value between 1 and 5, and we might know that "z"  
is a string with the value "hello there world".  We want to be able  
to index z with Range(1,5) or slice it with say Range(1,5):Range(3,7).  
Our "z" value is represented by a string-like object that  
presents much of the standard Python string API, so it knows what to  
do with wacky slices with non-integer indices.  We'd definitely want  
to preserve the ability to do that.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Darwin)

iQCVAwUBRPOiDnEjvBPtnXfVAQJAWwQAnna3MD7qKDY0SFYyTmN/Dnoy3nBrsP/l
kemAn8Rqdj/3EL/iJuesI8N81BtH6CUp3BR0XzCUpKnsTCcyZxjo9M9d96aF18Jm
A8K/QKfRfRRNUe0FuSOwiizRjw8m1yP9k8GNqkOI5IO2B5qt6R8dvyvmAdigWIsg
tVFftyC+1Dw=
=HZRO
-----END PGP SIGNATURE-----

From guido at python.org  Tue Aug 29 04:24:59 2006
From: guido at python.org (Guido van Rossum)
Date: Mon, 28 Aug 2006 19:24:59 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <44F39882.7090501@canterbury.ac.nz>
References: <20060827091000.1ADF.JCARLSON@uci.edu>
	<ca471dc20608271417w480c90aeg6b39c766a8f94750@mail.gmail.com>
	<20060827184941.1AE8.JCARLSON@uci.edu>
	<ca471dc20608280942s35ca3c8byca725a16484a7e2c@mail.gmail.com>
	<44F39882.7090501@canterbury.ac.nz>
Message-ID: <ca471dc20608281924k2904dbf2j8183f673fcc5de1e@mail.gmail.com>

On 8/28/06, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Guido van Rossum wrote:
>
> > You seem to be utterly convinced of the superior performance of your
> > proposal without having done any measurements.
>
> For my part, superior performance isn't the main
> reason for considering string views. Rather it's
> the simplification that would result from replacing
> the current ad-hoc set of optional start-stop
> arguments with a single easy-to-remember idiom.
>
> What are your thoughts on that aspect?

A few days ago I posted a bit of code using start-stop arguments and
the same code written using string views. I didn't think the latter
looked better. The start-stop arguments are far from arbitrary. They
are only ad-hoc in the sense that they haven't been added to every API
-- only where they're needed occasionally for performance.
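[The start-stop arguments Guido refers to, illustrated on a current interpreter: str.find accepts optional bounds, so the search can skip the copying slice entirely.]

```python
s = "x" * 1_000_000 + "needle"

# Slicing first copies roughly a megabyte before searching:
i_copy = s[500_000:].find("needle")

# The optional start argument searches in place, with no copy:
i_inplace = s.find("needle", 500_000)

# Same answer either way, modulo the offset:
assert i_inplace - 500_000 == i_copy == 500_000
```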

I still fear that a meme will develop that will encourage the use of
views in many cases where they aren't needed; newbies are more prone
to premature optimization than experienced developers, for whom this
feature is intended, and newbies will more likely copy sections of
code without understanding when/why various complexifications are
necessary.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jcarlson at uci.edu  Tue Aug 29 07:17:11 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Mon, 28 Aug 2006 22:17:11 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <ca471dc20608281307uf5e1995vfa65538e156f0c0a@mail.gmail.com>
References: <20060828120741.1AF7.JCARLSON@uci.edu>
	<ca471dc20608281307uf5e1995vfa65538e156f0c0a@mail.gmail.com>
Message-ID: <20060828132232.1AFD.JCARLSON@uci.edu>


"Guido van Rossum" <guido at python.org> wrote:
> Those are all microbenchmarks. It's easy to prove the superiority of
> an approach that way. But what about realistic applications? What if
> your views don't end up saving memory or time for an application, but
> still cost in terms of added complexity in all string operations?

At no point has anyone claimed that every operation on views will always
be faster than on strings.  Nor has anyone claimed that it will always
reduce memory consumption.  However, for a not insignificant number of
operations, views can be faster, offer better memory use, etc.


I agree with Jean-Paul Calderone:

"If the goal is to avoid speeding up Python programs because views are
too complex or unpythonic or whatever, fine.  But there isn't really any
question as to whether or not this is a real optimization."

"I don't think we see people overusing buffer() in ways which damage
readability now, and buffer is even a builtin.  Tossing something off
into a module somewhere shouldn't really be a problem.  To most people
who don't actually know what they're doing, the idea to optimize code
by reducing memory copying usually just doesn't come up."


While there are examples where views can be slower, this is no different
than the cases where deque is slower than list; sometimes some data
structures are more applicable to the problem than others.  As we have
given users the choice to use a structure that has been optimized for
certain behaviors (set and deque being primary examples), this is just
another structure that offers improved performance for some operations.

> Then I ask you to make it so that string views are 99.999%
> indistinguishable from strings -- they have all the same methods, are
> usable everywhere else, etc.

For reference, I'm about 2 hours into it (including re-reading the
documentation for Pyrex), and I've got [r]partition, [r]find, [r]index,
[r|l]strip. I don't see significant difficulty implementing all other
methods on views.

Astute readers of the original implementation will note that I never
check that the argument being passed in is a string; I use the buffer
interface, so anything offering the buffer interface can be seen as a
read-only view with string methods attached.  Expect a full
implementation later this week.


 - Josiah


From jcarlson at uci.edu  Tue Aug 29 07:31:37 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Mon, 28 Aug 2006 22:31:37 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <44F393EF.6070304@canterbury.ac.nz>
References: <20060827214348.1AF4.JCARLSON@uci.edu>
	<44F393EF.6070304@canterbury.ac.nz>
Message-ID: <20060828213428.1B00.JCARLSON@uci.edu>


Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Josiah Carlson wrote:
> 
> > If every operation on a view returned a string copy, then what would be
> > the point of the view in the first place?
> 
> String views would have all the same methods as a real
> string, so you could find(), index(), etc. while operating
> efficiently on the original data. To my mind this is
> preferable to having little-used optional arguments on
> an easily-forgotten subset of the string methods: you
> only have to remember one thing (how to create a view)
> rather than a bunch of random things.

Indeed, and all of those are preserved if views always returned views,
strings always returned strings, and one used the standard constructors
for both to convert between them; e.g. str(view) -> str and view(str) ->
view.  Rather than guessing which would be the correct type to return
(during the implementation of views), always return a view when operating
on views; it's a constant-time operation per view returned, and if the
user really wanted a string, they can always call str() on the returned
values.


> For some things, such as partition(), it might be worth
> having a variant that returned views instead of new strings.
> But it would be named differently, so you'd still know
> whether you were getting a view or not.

But wouldn't it be confusing if some methods on views returned views,
while others returned strings?  Wouldn't it make more sense if methods
on an object, generally, returned instances of the same type (when it
made sense)? This seems to be the case with almost every other object
available in the Python standard library, with the notable exceptions of
buffer and mmap.

The slicing operations on mmaps make sense, as only recently did mmaps
gain the ability to map partial files not starting from the beginning,
but I'm not sure how well the operating system would handle overlapping
mmaps in the same process (especially during a larger mmap free; that
could bork the heap address space).

For buffer?  I don't know.  Buffer lacks basically every operation that
I use on a string, so I have had little use for it except as a way of
virtually slicing mmaps (for operations where I don't want to pass an
offset argument) and handling socket writing of large blocks of data
that it doesn't make sense to pre-slice*.


> I'm not personally advocating one approach or the other
> here -- just pointing out an alternative that might be
> more acceptable to the BDFL.

Thank you for the input (and thank you for Pyrex, it's making writing
the view object quite easy),

 - Josiah


* Arguably it never makes sense to pre-slice; connection speeds can vary
so significantly that choosing a slice too small results in poor speeds
and high numbers of system calls, and slices that are too large
result in further slicing.  Buffers or their equivalents win by a large
margin.  One trick is to slice the buffer (turning it into a string)
when over half of the original string has been written.  This results in
using at most 2x the minimum amount of memory necessary, while also
guaranteeing that you will only ever slice as much as the minimum
pre-slicing operation would necessitate.
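[Josiah's half-then-slice trick predates memoryview, which now makes the zero-copy version of this write loop trivial. A hypothetical sketch with a stand-in socket object:]

```python
def send_all(sock, data):
    """Send all of `data` without per-send payload copies: memoryview
    slicing is O(1), narrowing the window instead of building a new
    bytes object (the modern analogue of the buffer() trick here)."""
    view = memoryview(data)
    while view:                     # empty view is falsy (len == 0)
        sent = sock.send(view)
        view = view[sent:]

# Demo with a stand-in socket that accepts at most 3 bytes per call:
class ChunkySock:
    def __init__(self):
        self.received = []
    def send(self, view):
        n = min(3, len(view))
        self.received.append(bytes(view[:n]))
        return n

sock = ChunkySock()
send_all(sock, b"hello world")
```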


From rrr at ronadam.com  Tue Aug 29 07:47:24 2006
From: rrr at ronadam.com (Ron Adam)
Date: Tue, 29 Aug 2006 00:47:24 -0500
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <44F3967A.7010504@canterbury.ac.nz>
References: <44F0107B.20205@iinet.net.au> <ecujbm$n90$1@sea.gmane.org>
	<44F3967A.7010504@canterbury.ac.nz>
Message-ID: <ed0kit$2dp$1@sea.gmane.org>

Greg Ewing wrote:
> Ron Adam wrote:
> 
>> 1. Remove None stored as indices in slice objects. Depending on the step 
>> value, Any Nones can be converted to 0 or -1 immediately,
> 
> But None isn't the same as -1 in a slice. None means the end
> of the sequence, whereas -1 means one less than the end.

Yes, you are correct, that's one of those things I get caught on when I 
haven't had enough sleep. ;-)

 >>> 'abcdefg'[-1]
'g'

 >>> 'abcdefg'[0:-1]
'abcdef'

And in addition to that... 0 is not the beginning if the step is -1.

 >>> 'abcdefg'[-1:0:-1]
'gfedcb'

So None for the start index can be 0 or -1.  But for the end index it 
can't be determined.

In the first case above, the stop index would need to be one greater 
than -1 which is 0, and that causes a problem.

In the second case above, the stop index would need to be one less than 
0, then that would again cause a problem.
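[Spelled out on a current interpreter (the behavior is unchanged since 2.x): None in the stop position reaches past index 0 in a way no small integer stop can.]

```python
s = 'abcdefg'

assert s[::-1]   == 'gfedcba'   # None stop: runs through index 0
assert s[:0:-1]  == 'gfedcb'    # stop=0 stops *before* index 0
assert s[:-1:-1] == ''          # stop=-1 is index len-1: empty result
```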



> I'm also not all that happy about forcing slice indices to
> be ints. Traditionally they are, but someone might want to
> define a class that uses them in a more general way.

Hmmmm, thanks for pointing this out. It sounds interesting and is 
something I hadn't thought about.  In most cases I've seen only integers 
and None are ever used.  And I'm used to seeing an exception if anything 
else is used.

 >>> 'abc'[1.0]
Traceback (most recent call last):
   File "<stdin>", line 1, in ?
TypeError: string indices must be integers

That is a string method that is generating the exception then and not 
the slice object?

But then what about the slice.indices() method?  It does generate 
exceptions.

 >>> slc = slice(1.0)
 >>> slc.indices(10)
Traceback (most recent call last):
   File "<stdin>", line 1, in ?
TypeError: slice indices must be integers


>> Once the slice is created the Nones are not needed, valid index values 
>> can be determined.
> 
> I don't understand what you mean here. Slice objects themselves
> know nothing about what object they're going to be used to
> slice, so there's no way they can determine "valid index
> values" (or even *types* -- see above).

Ok, I hadn't considered the possibility of methods being defined to read 
the slice object.  Do you know where I could find an example of that?



Cheers,
    Ron





From g.brandl at gmx.net  Tue Aug 29 10:35:43 2006
From: g.brandl at gmx.net (Georg Brandl)
Date: Tue, 29 Aug 2006 10:35:43 +0200
Subject: [Python-3000] Set literals
In-Reply-To: <ca471dc20608280952g33233b37k5a044767e1ed0640@mail.gmail.com>
References: <ecug4k$cg8$1@sea.gmane.org>
	<ca471dc20608280952g33233b37k5a044767e1ed0640@mail.gmail.com>
Message-ID: <ed0u8v$sd5$1@sea.gmane.org>

Guido van Rossum wrote:
> On 8/28/06, Georg Brandl <g.brandl at gmx.net> wrote:
>> At python.org/sf/1547796, there is a preliminary patch for Py3k set literals
>> as specified in PEP 3100.
> 
> Very cool! This is now checked in.
> 
> Georg, can you do something about repr() of an empty set? This
> currently produces "{}" while it should produce "set()".
> 
>> Set comprehensions are not implemented.
> 
> ETA?

See patch #1548388.

Cheers,
Georg


From greg.ewing at canterbury.ac.nz  Tue Aug 29 11:14:18 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 29 Aug 2006 21:14:18 +1200
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <ed0kit$2dp$1@sea.gmane.org>
References: <44F0107B.20205@iinet.net.au> <ecujbm$n90$1@sea.gmane.org>
	<44F3967A.7010504@canterbury.ac.nz> <ed0kit$2dp$1@sea.gmane.org>
Message-ID: <44F4056A.6000009@canterbury.ac.nz>

Ron Adam wrote:

> And in addition to that... 0 is not the beginning if the step is -1.

Negative steps are downright confusing however you
think about them. :-)

> In most cases I've seen only integers 
> and None are ever used.

Numeric uses various strange things as array indexes, such
as Ellipsis and NewAxis. I don't think it uses them as parts
of slices, but I wouldn't be surprised if they came up with
some such usage one day.

>  >>> 'abc'[1.0]
> Traceback (most recent call last):
>    File "<stdin>", line 1, in ?
> TypeError: string indices must be integers
> 
> That is a string method that is generating the exception then and not 
> the slice object?

Yes, I expect so. From experimenting, it seems you can
pass anything you want to slice():

Python 2.3 (#1, Aug  5 2003, 15:52:30)
[GCC 3.1 20020420 (prerelease)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
 >>> slice(42.3, "banana", {})
slice(42.299999999999997, 'banana', {})

> But then what about the slice.indices() method?  It does generate 
> exceptions.
> 
>  >>> slc = slice(1.0)
>  >>> slc.indices(10)
> Traceback (most recent call last):
>    File "<stdin>", line 1, in ?
> TypeError: slice indices must be integers

That particular method seems to require ints, yes. But
a slice-using object can extract the start, stop and step
and do whatever it wants with them.

> Ok, I hadn't considered the possibility of methods being defined to read 
> the slice object.  Do you know where I could find an example of that?

--
Greg

From rrr at ronadam.com  Tue Aug 29 13:42:16 2006
From: rrr at ronadam.com (Ron Adam)
Date: Tue, 29 Aug 2006 06:42:16 -0500
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <44F4056A.6000009@canterbury.ac.nz>
References: <44F0107B.20205@iinet.net.au>
	<ecujbm$n90$1@sea.gmane.org>	<44F3967A.7010504@canterbury.ac.nz>
	<ed0kit$2dp$1@sea.gmane.org> <44F4056A.6000009@canterbury.ac.nz>
Message-ID: <ed19cb$273$1@sea.gmane.org>

Greg Ewing wrote:
> Ron Adam wrote:
> 
>> And in addition to that... 0 is not the beginning if the step is -1.
> 
> Negative steps are downright confusing however you
> think about them. :-)

Yes, and it seems to me it could be easier.  Of course that would mean 
changing something, and any solution so far is in some way not perfect, 
depending on how you look at it.

>> In most cases I've seen only integers 
>> and None are ever used.
> 
> Numeric uses various strange things as array indexes, such
> as Ellipsis and NewAxis. I don't think it uses them as parts
> of slices, but I wouldn't be surprised if they came up with
> some such usage one day.
 >
>>  >>> 'abc'[1.0]
>> Traceback (most recent call last):
>>    File "<stdin>", line 1, in ?
>> TypeError: string indices must be integers
>>
>> That is a string method that is generating the exception then and not 
>> the slice object?
> 
> Yes, I expect so. From experimenting, it seems you can
> pass anything you want to slice():

Hmm..., after playing around with it, list and string methods probably 
call the slice's indices() method from within __getitem__.

So it's the slice's indices() method that is producing the exceptions in 
both cases.  If other objects allow something besides integers then they 
are probably accessing the stop, start, and step indices directly and 
are not going through slice.indices() to get at them.

The way it seems to work is approximately ...

    s[i:j:k] -> s.__getitem__(x = slice(i,j,k)) # via the SLICE byte code

    i, j, k = x.indices(len(self))    # by s.__getitem__()

The indices method does a type check and fixes the values depending on 
what the length is.

>> But then what about the slice.indices() method?  It does generate 
>> exceptions.
>>
>>  >>> slc = slice(1.0)
>>  >>> slc.indices(10)
>> Traceback (most recent call last):
>>    File "<stdin>", line 1, in ?
>> TypeError: slice indices must be integers
> 
> That particular method seems to require ints, yes. But
> a slice-using object can extract the start, stop and step
> and do whatever it wants with them.

If you could subclass slice, then it would be possible to replace the 
indices method and turn off the int check and/or put in your own value 
check.  But you wouldn't be able to use the i:j:k syntax.  (Maybe a good 
thing.)

     sequence[myslice_object]    # would work.


You would still need to produce int-like values if you use it with 
builtin types.   But not with any of your own objects if you have 
supplied your own __getitem__ method.
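[A sketch of such an object (hypothetical class name; compare Barry's Range example earlier in the thread): the consumer's __getitem__, not slice() itself, decides which component types are acceptable, so the i:j:k syntax works with non-integer components.]

```python
class Recorder:
    # __getitem__ receives the raw slice object; nothing in the
    # language forces its components to be integers.
    def __getitem__(self, item):
        if isinstance(item, slice):
            return ('slice', item.start, item.stop, item.step)
        return ('index', item)

r = Recorder()
assert r['lo':'hi':2] == ('slice', 'lo', 'hi', 2)
assert r[4] == ('index', 4)
```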

Cheers,
    Ron













From guido at python.org  Tue Aug 29 17:42:18 2006
From: guido at python.org (Guido van Rossum)
Date: Tue, 29 Aug 2006 08:42:18 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <20060828132232.1AFD.JCARLSON@uci.edu>
References: <20060828120741.1AF7.JCARLSON@uci.edu>
	<ca471dc20608281307uf5e1995vfa65538e156f0c0a@mail.gmail.com>
	<20060828132232.1AFD.JCARLSON@uci.edu>
Message-ID: <ca471dc20608290842p4798f63ar4e1fc51bbb43c9a9@mail.gmail.com>

On 8/28/06, Josiah Carlson <jcarlson at uci.edu> wrote:
>
> "Guido van Rossum" <guido at python.org> wrote:
> > Those are all microbenchmarks. It's easy to prove the superiority of
> > an approach that way. But what about realistic applications? What if
> > your views don't end up saving memory or time for an application, but
> > still cost in terms of added complexity in all string operations?
>
> At no point has anyone claimed that every operation on views will always
> be faster than on strings.  Nor has anyone claimed that it will always
> reduce memory consumption.  However, for a not insignificant number of
> operations, views can be faster, offer better memory use, etc.
>
>
> I agree with Jean-Paul Calderone:
>
> "If the goal is to avoid speeding up Python programs because views are
> too complex or unpythonic or whatever, fine.  But there isn't really any
> question as to whether or not this is a real optimization."

And without qualification that is as false as anything you've said.

> "I don't think we see people overusing buffer() in ways which damage
> readability now, and buffer is even a builtin.  Tossing something off
> into a module somewhere shouldn't really be a problem.  To most people
> who don't actually know what they're doing, the idea to optimize code
> by reducing memory copying usually just doesn't come up."

Another "yes they do -- no they don't" argument. As I've said
repeatedly before, optimizations are likely to be copied without being
understood by newbies. The buffer() built-in has such a poor
reputation and API that it doesn't get much play; but a new "views"
feature that will magically make all your string processing go faster
surely will.

> While there are examples where views can be slower, this is no different
> than the cases where deque is slower than list; sometimes some data
> structures are more applicable to the problem than others.  As we have
> given users the choice to use a structure that has been optimized for
> certain behaviors (set and deque being primary examples), this is just
> another structure that offers improved performance for some operations.

As long as it is very carefully presented as such I have much less of
a problem with it.

Earlier proposals were implying that all string ops should return
views whenever possible. That, I believe, is never going to fly, and
that's where my main objection lies.

Having views in a library module alleviates many of my objections.
While I still worry that it will be overused, deque doesn't seem to be
overused, so perhaps I should relax.

> > Then I ask you to make it so that string views are 99.999%
> > indistinguishable from strings -- they have all the same methods, are
> > usable everywhere else, etc.
>
> For reference, I'm about 2 hours into it (including re-reading the
> documentation for Pyrex), and I've got [r]partition, [r]find, [r]index,
> [r|l]strip. I don't see significant difficulty implementing all other
> methods on views.
>
> Astute readers of the original implementation will note that I never
> check that the argument being passed in is a string; I use the buffer
> interface, so anything offering the buffer interface can be seen as a
> read-only view with string methods attached.  Expect a full
> implementation later this week.

Good luck!

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jcarlson at uci.edu  Tue Aug 29 18:24:29 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Tue, 29 Aug 2006 09:24:29 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <ca471dc20608290842p4798f63ar4e1fc51bbb43c9a9@mail.gmail.com>
References: <20060828132232.1AFD.JCARLSON@uci.edu>
	<ca471dc20608290842p4798f63ar4e1fc51bbb43c9a9@mail.gmail.com>
Message-ID: <20060829091403.1B09.JCARLSON@uci.edu>


"Guido van Rossum" <guido at python.org> wrote:
> On 8/28/06, Josiah Carlson <jcarlson at uci.edu> wrote:
> > While there are examples where views can be slower, this is no different
> > than the cases where deque is slower than list; sometimes some data
> > structures are more applicable to the problem than others.  As we have
> > given users the choice to use a structure that has been optimized for
> > certain behaviors (set and deque being primary examples), this is just
> > another structure that offers improved performance for some operations.
> 
> As long as it is very carefully presented as such I have much less of
> a problem with it.
> 
> Earlier proposals were implying that all string ops should return
> views whenever possible. That, I believe, is never going to fly, and
> that's where my main objection lies.

String operations always returning views would be arguably insane.  I
hope no one was recommending it (I certainly wasn't, but if my words
were confusing on that part, I apologize); strings are strings, and
views should only be constructed explicitly.

After you have a view, I'm of the opinion that view operations should
return views, except in the case where you explicitly ask for a string
via str(view).


> Having views in a library module alleviates many of my objections.
> While I still worry that it will be overused, deque doesn't seem to be
> overused, so perhaps I should relax.

While it would be interesting (as a social experiment) for views to be
in the __builtins__ module (to test abuse theories), it is probably much
better for it to sit in the collections module.


> > > Then I ask you to make it so that string views are 99.999%
> > > indistinguishable from strings -- they have all the same methods, are
> > > usable everywhere else, etc.
> >
> > For reference, I'm about 2 hours into it (including re-reading the
> > documentation for Pyrex), and I've got [r]partition, [r]find, [r]index,
> > [r|l]strip. I don't see significant difficulty implementing all other
> > methods on views.
> >
> > Astute readers of the original implementation will note that I never
> > check that the argument being passed in is a string; I use the buffer
> > interface, so anything offering the buffer interface can be seen as a
> > read-only view with string methods attached.  Expect a full
> > implementation later this week.
> 
> Good luck!

Thank you!
 - Josiah


From fredrik at pythonware.com  Tue Aug 29 18:32:59 2006
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Tue, 29 Aug 2006 18:32:59 +0200
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <20060827184941.1AE8.JCARLSON@uci.edu>
References: <20060827091000.1ADF.JCARLSON@uci.edu>	<ca471dc20608271417w480c90aeg6b39c766a8f94750@mail.gmail.com>
	<20060827184941.1AE8.JCARLSON@uci.edu>
Message-ID: <ed1q7r$v4s$2@sea.gmane.org>

Josiah Carlson wrote:

> 1. Let us say I was parsing XML.  Rather than allocating a bunch of small
> strings for the various tags, attributes, and data, I could instead
> allocate a bunch of string views with pointers into the one larger XML
> string.

when did you last write an XML parser ?

</F>


From jcarlson at uci.edu  Tue Aug 29 19:30:59 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Tue, 29 Aug 2006 10:30:59 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <ed1q7r$v4s$2@sea.gmane.org>
References: <20060827184941.1AE8.JCARLSON@uci.edu> <ed1q7r$v4s$2@sea.gmane.org>
Message-ID: <20060829102307.1B0F.JCARLSON@uci.edu>


Fredrik Lundh <fredrik at pythonware.com> wrote:
> Josiah Carlson wrote:
> 
> > 1. Let us say I was parsing XML.  Rather than allocating a bunch of small
> > strings for the various tags, attributes, and data, I could instead
> > allocate a bunch of string views with pointers into the one larger XML
> > string.
> 
> when did you last write an XML parser ?

Comparing what I have written as an XML parser to xml.dom, xml.sax,
ElementTree, or others, is a bit like comparing a go-kart with an
automobile.  That is to say, it's been a few years; it was to scratch
an itch for a particular application, and no other XML parser that I
knew of existed at the time for it.

Presumably by your question, you think that the particular example I've
offered is bollocks.  Sounds reasonable, I withdraw it.

 - Josiah


From guido at python.org  Tue Aug 29 19:31:49 2006
From: guido at python.org (Guido van Rossum)
Date: Tue, 29 Aug 2006 10:31:49 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <20060829091403.1B09.JCARLSON@uci.edu>
References: <20060828132232.1AFD.JCARLSON@uci.edu>
	<ca471dc20608290842p4798f63ar4e1fc51bbb43c9a9@mail.gmail.com>
	<20060829091403.1B09.JCARLSON@uci.edu>
Message-ID: <ca471dc20608291031l61efbe6aq50d106f4395ffdaa@mail.gmail.com>

On 8/29/06, Josiah Carlson <jcarlson at uci.edu> wrote:
>
> "Guido van Rossum" <guido at python.org> wrote:
> > On 8/28/06, Josiah Carlson <jcarlson at uci.edu> wrote:
> > > While there are examples where views can be slower, this is no different
> > > than the cases where deque is slower than list; sometimes some data
> > > structures are more applicable to the problem than others.  As we have
> > > given users the choice to use a structure that has been optimized for
> > > certain behaviors (set and deque being primary examples), this is just
> > > another structure that offers improved performance for some operations.
> >
> > As long as it is very carefully presented as such I have much less of
> > a problem with it.
> >
> > Earlier proposals were implying that all string ops should return
> > views whenever possible. That, I believe, is never going to fly, and
> > that's where my main objection lies.
>
> String operations always returning views would be arguably insane.  I
> hope no one was recommending it (I certainly wasn't, but if my words
> were confusing on that part, I apologize); strings are strings, and
> views should only be constructed explicitly.

I don't know about you, but others have definitely been arguing for
that passionately in the past.

> After you have a view, I'm of the opinion that view operations should
> return views, except in the case where you explicitly ask for a string
> via str(view).

I think it's a mixed bag, and depends on the semantics of the operation.

For operations that are guaranteed to return a substring (like slicing
or partition() -- are there even others?) I think views should return
views (on the original buffer, never views on views).

For operations that may be forced to return a new string (e.g.
concatenation) I think the return value should always be a new string,
even if it could be optimized. So for example if v is a view and s is
a string, v+s should always return a new string, even if s is empty.
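Those two rules are easy to prototype. The StrView below is a hypothetical illustration of the semantics being described here, not Josiah's actual implementation: substring-producing operations return views on the original string (never views on views), while concatenation always returns a plain str:

```python
class StrView:
    """Hypothetical read-only view on a str (illustration only)."""

    def __init__(self, s, start=0, stop=None):
        self._s = s                      # the original string, shared
        self._start = start
        self._stop = len(s) if stop is None else stop

    def __len__(self):
        return self._stop - self._start

    def __str__(self):
        # The only place a new string is materialized from the view.
        return self._s[self._start:self._stop]

    def __getitem__(self, item):
        if isinstance(item, slice):
            i, j, step = item.indices(len(self))
            if step != 1:
                raise ValueError("only contiguous slices stay views")
            # A view on the *original* buffer, never a view on a view.
            return StrView(self._s, self._start + i, self._start + j)
        if item < 0:
            item += len(self)
        return self._s[self._start + item]

    def __add__(self, other):
        # Concatenation may not be a substring, so always return str.
        return str(self) + str(other)

v = StrView("hello world")
w = v[6:]                                # a view, zero copies
print(str(w), isinstance(w, StrView))    # world True
print(w + "!")                           # world! -- a plain str
```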

BTW beware that in py3k, strings (which will always be unicode
strings) won't support the buffer API -- bytes objects will. Would you
want views on strings or on bytes or on both?

> > Having views in a library module alleviates many of my objections.
> > While I still worry that it will be overused, deque doesn't seem to be
> > overused, so perhaps I should relax.
>
> While it would be interesting (as a social experiment) for views to be
> in the __builtins__ module (to test abuse theories), it is probably much
> better for it to sit in the collections module.

I'm still very strong on having only a small number of data types
truly built-in; too much choice is much more likely to encourage the
wrong choice, or reduced maintainability.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From fredrik at pythonware.com  Tue Aug 29 19:44:27 2006
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Tue, 29 Aug 2006 19:44:27 +0200
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <20060829102307.1B0F.JCARLSON@uci.edu>
References: <20060827184941.1AE8.JCARLSON@uci.edu> <ed1q7r$v4s$2@sea.gmane.org>
	<20060829102307.1B0F.JCARLSON@uci.edu>
Message-ID: <ed1uds$iog$1@sea.gmane.org>

Josiah Carlson wrote:

>> when did you last write an XML parser ?
> 
> Comparing what I have written as an XML parser to xml.dom, xml.sax,
> ElementTree, or others, is a bit like comparing a go-kart with an
> automobile.  That is to say, it's been a few years, and it was to
> scratch an itch for a particular application, and no other xml parser
> existed at the time for my particular application, that I knew of.
> 
> Presumably by your question, you think that the particular example I've
> offered is bollocks.

not necessarily, but there are lots of issues involved when doing 
high-performance XML stuff, and I'm not sure views would help quite as 
much as one might think.

(writing and tuning cET was a great way to learn that not everything 
that you think you know about C performance applies to C code running 
inside the Python interpreter...)

</F>


From jcarlson at uci.edu  Tue Aug 29 21:04:35 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Tue, 29 Aug 2006 12:04:35 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <ca471dc20608291031l61efbe6aq50d106f4395ffdaa@mail.gmail.com>
References: <20060829091403.1B09.JCARLSON@uci.edu>
	<ca471dc20608291031l61efbe6aq50d106f4395ffdaa@mail.gmail.com>
Message-ID: <20060829111904.1B12.JCARLSON@uci.edu>


"Guido van Rossum" <guido at python.org> wrote:
> On 8/29/06, Josiah Carlson <jcarlson at uci.edu> wrote:
> > String operations always returning views would be arguably insane.  I
> > hope no one was recommending it (I certainly wasn't, but if my words
> > were confusing on that part, I apologize); strings are strings, and
> > views should only be constructed explicitly.
> 
> I don't know about you, but others have definitely been arguing for
> that passionately in the past.
> 
> > After you have a view, I'm of the opinion that view operations should
> > return views, except in the case where you explicitly ask for a string
> > via str(view).
> 
> I think it's a mixed bag, and depends on the semantics of the operation.
> 
> For operations that are guaranteed to return a substring (like slicing
> or partition() -- are there even others?) I think views should return
> views (on the original buffer, never views on views).

I agree.

> For operations that may be forced to return a new string (e.g.
> concatenation) I think the return value should always be a new string,
> even if it could be optimized. So for example if v is a view and s is
> a string, v+s should always return a new string, even if s is empty.

I'm on the fence about this.  On the one hand, I understand the
desirability of being able to get the underlying string object without
difficulty.  On the other hand, its performance characteristics could be
confusing to users of Python who may have come to expect that "st+''" is
a constant time operation, regardless of the length of st.

In the non-null string addition case, I agree that it could make some sense
to return the string (considering you will need to copy it anyway), but
if one returned a view on that string, it would be more consistent with
other methods, and getting the string back via str(view) would offer
equivalent functionality.  It would also require the user to be explicit
about what they really want; though there is the argument that if I'm
passing a string as an operand to addition with a view, I actually want
a string, so give me one.

I'm going to implement it as returning a view, but leave commented
sections for some of them to return a string.


> BTW beware that in py3k, strings (which will always be unicode
> strings) won't support the buffer API -- bytes objects will. Would you
> want views on strings or on bytes or on both?

That's tricky.  Views on bytes will come for free, like array, mmap, and
anything else that supports the buffer protocol. It requires the removal
of the __hash__ method for mutables, but that is certainly expected.
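For what it's worth, this "views for free via the buffer protocol" idea, including the loss of hashability for views over mutable buffers, is essentially the role memoryview ended up filling in modern CPython. A sketch:

```python
# memoryview: a zero-copy view over any object supporting the buffer
# protocol; slices of a view share storage with the original buffer.
data = bytearray(b"abcdef")
mv = memoryview(data)
sub = mv[2:4]                 # no copy is made here

data[2] = ord("X")            # mutate the underlying buffer...
print(bytes(sub))             # b'Xd'  ...and the view sees it

# A view over a mutable buffer cannot be hashed:
try:
    hash(mv)
except ValueError as exc:
    print("unhashable:", exc)
```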

Right now, a large portion of standard library code use strings and
string methods to handle parsing, etc.  Removing immutable byte strings
from 3.x seems likely to result in a huge amount of rewriting necessary
to utilize either bytes or text (something I have mentioned before).  I
believe that with views on bytes (and/or sufficient bytes methods), the
vast majority would likely result in the use of bytes.
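A sketch of the kind of byte-level protocol handling meant here, assuming bytes carry str-style methods such as partition() and strip() (which is in fact how today's Python 3 turned out):

```python
# Parsing a protocol header entirely in the bytes domain, without
# decoding to text first (assumes str-style methods on bytes).
header = b"Content-Length: 42\r\n"
name, sep, value = header.partition(b":")

print(name)                   # b'Content-Length'
print(int(value.strip()))     # 42
```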

Having a text view for such situations that works with the same kinds of
semantics as the bytes view would be nice from a purity/convenience
standpoint, and only needing to handle a single data type (text) could
make its implementation easier.  I don't have any short-term plans of
writing text views, but it may be somewhat easier to do after I'm done
with string/byte views.

 - Josiah


From tomerfiliba at gmail.com  Tue Aug 29 21:43:57 2006
From: tomerfiliba at gmail.com (tomer filiba)
Date: Tue, 29 Aug 2006 21:43:57 +0200
Subject: [Python-3000] regex literals?
Message-ID: <1d85506f0608291243g2cdfd6f6reb0eb82a5c73fab@mail.gmail.com>

i can't say i'm too fond of this, but i thought of bringing this up.
most scripting languages (perl, ruby, and boo, to name some) have
regular expressions as language literals. since such languages are
heavily used for string manipulation, it might seem like a good idea
to add them at the syntax level:

e"[A-Za-z_][A-Za-z_0-9]*"

i thought of prefixing "e" for "regular *e*xpression". could also be
"p" for pattern. it's very simple -- regex literal strings are just
passed to re.compile() upon creation, i.e.:
a = e"[A-Z]"

is the same as
a = re.compile("[A-Z]")

what is it good for?

if e"[A-Z]".match("Q"):
    print "success"

since strings (as well as regex strings) are immutable, the compiler can
re.compile them at compile time, as an optimization.

again, i can't say i like regex literals, and i don't think it would be
a productivity boost (although you would no longer need to import re
and re.compile() your patterns)... but i wanted to bring it to your
consideration.
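For reference, a sketch of what the hypothetical e"..." literal would buy, next to what re already provides: explicit precompilation via re.compile(), plus an internal cache that makes the module-level functions cheap on repeated use:

```python
import re

IDENT = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")   # explicit precompilation

if IDENT.match("Q"):
    print("success")

# Equivalent one-shot form; re.match() consults an internal pattern
# cache, so repeated calls do not recompile the pattern each time.
if re.match(r"[A-Za-z_][A-Za-z_0-9]*", "Q"):
    print("success")
```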


-tomer
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060829/f6a1c658/attachment.htm 

From guido at python.org  Tue Aug 29 21:46:09 2006
From: guido at python.org (Guido van Rossum)
Date: Tue, 29 Aug 2006 12:46:09 -0700
Subject: [Python-3000] regex literals?
In-Reply-To: <1d85506f0608291243g2cdfd6f6reb0eb82a5c73fab@mail.gmail.com>
References: <1d85506f0608291243g2cdfd6f6reb0eb82a5c73fab@mail.gmail.com>
Message-ID: <ca471dc20608291246j307f31fak2db5da7781020962@mail.gmail.com>

Do I even have to say -1?

Regular expressions shouldn't become the front and central of Python's
text processing tools.

--Guido

On 8/29/06, tomer filiba <tomerfiliba at gmail.com> wrote:
> i can't say i'm too fond of this, but i thought of bringing this up. most
> scripting
> languages (perl, ruby, and boo, to name some) have regular expressions as
> language literals. since such languages are heavily used for string
> manipulation, it might seem like a good idea to add them at the syntax
> level:
>
> e"[A-Za-z_][A-Za-z_0-9]*"
>
> i thought of prefixing "e" for "regular *e*xpression". could also be "p" for
> pattern.
> it's very simple -- regex literal strings are just passed to re.compile(),
> upon
> creation, i.e.:
> a = e"[A-Z]"
>
> is the same as
> a = re.compile("[A-Z]")
>
> what is it good for?
>
>  if e"[A-Z]".match("Q"):
>     print "success"
>
> since strings (as well as regex strings) are immutable, the compiler can
> re.compile them at compile time, as an optimization.
>
> again, i can't say i like regex literals, and i don't think it would be a
> productivity boost (although you would no longer need to import re and
> re.compile() your patterns)... but i wanted to bring it to your
> consideration.
>
>
> -tomer
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe:
> http://mail.python.org/mailman/options/python-3000/guido%40python.org
>
>
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Aug 29 21:55:21 2006
From: guido at python.org (Guido van Rossum)
Date: Tue, 29 Aug 2006 12:55:21 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <20060829111904.1B12.JCARLSON@uci.edu>
References: <20060829091403.1B09.JCARLSON@uci.edu>
	<ca471dc20608291031l61efbe6aq50d106f4395ffdaa@mail.gmail.com>
	<20060829111904.1B12.JCARLSON@uci.edu>
Message-ID: <ca471dc20608291255h3972ed51h20df1ae63ca97df@mail.gmail.com>

On 8/29/06, Josiah Carlson <jcarlson at uci.edu> wrote:
> "Guido van Rossum" <guido at python.org> wrote:
> > For operations that may be forced to return a new string (e.g.
> > concatenation) I think the return value should always be a new string,
> > even if it could be optimized. So for example if v is a view and s is
> > a string, v+s should always return a new string, even if s is empty.
>
> I'm on the fence about this.  On the one hand, I understand the
> desireability of being able to get the underlying string object without
> difficulty.  On the other hand, its performance characteristics could be
> confusing to users of Python who may have come to expect that "st+''" is
> a constant time operation, regardless of the length of st.

Well views aren't strings. And s+t (for s and t strings) normally
takes O(len(s)+len(t)) time.

The type consistency and predictability is more important to me.

I didn't mean to recommend v+"" as the best way to turn a view v into
a string; that would be str(v).

> The non-null string addition case, I agree that it could make some sense
> to return the string (considering you will need to copy it anyways), but
> if one returned a view on that string, it would be more consistant with
> other methods, and getting the string back via str(view) would offer
> equivalent functionality.  It would also require the user to be explicit
> about what they really want; though there is the argument that if I'm
> passing a string as an operand to addition with a view, I actually want
> a string, so give me one.

I strongly believe you're mistaken here. I don't think users will have
any trouble with the concept "operations that don't (necessarily)
return a substring will return a new string".

> I'm going to implement it as returning a view, but leave commented
> sections for some of them to return a string.
>
> > BTW beware that in py3k, strings (which will always be unicode
> > strings) won't support the buffer API -- bytes objects will. Would you
> > want views on strings or on bytes or on both?
>
> That's tricky.  Views on bytes will come for free, like array, mmap, and
> anything else that supports the buffer protocol. It requires the removal
> of the __hash__ method for mutables, but that is certainly expected.

The question is, how useful is the buffer protocol going to be? We
don't know yet.

> Right now, a large portion of standard library code use strings and
> string methods to handle parsing, etc.  Removing immutable byte strings
> from 3.x seems likely to result in a huge amount of rewriting necessary
> to utilize either bytes or text (something I have mentioned before).  I
> believe that with views on bytes (and/or sufficient bytes methods), the
> vast majority would likely result in the use of bytes.

Um, unless you consider decoding a GIF file "parsing", parsing would
seem to naturally fall in the realm of text (characters), not bytes.

> Having a text view for such situations that works with the same kinds of
> semantics as the bytes view would be nice from a purity/convenience
> standpoint, and only needing to handle a single data type (text) could
> make its implementation easier.  I don't have any short-term plans of
> writing text views, but it may be somewhat easier to do after I'm done
> with string/byte views.

Unifying the semantics between byte views and text views will be
difficult since bytes are mutable.

I recommend that you have a good look at the bytes implementation in
the p3yk branch.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jimjjewett at gmail.com  Tue Aug 29 23:01:08 2006
From: jimjjewett at gmail.com (Jim Jewett)
Date: Tue, 29 Aug 2006 17:01:08 -0400
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <ca471dc20608291255h3972ed51h20df1ae63ca97df@mail.gmail.com>
References: <20060829091403.1B09.JCARLSON@uci.edu>
	<ca471dc20608291031l61efbe6aq50d106f4395ffdaa@mail.gmail.com>
	<20060829111904.1B12.JCARLSON@uci.edu>
	<ca471dc20608291255h3972ed51h20df1ae63ca97df@mail.gmail.com>
Message-ID: <fb6fbf560608291401o76e8d3cbub022e7b954cb38a7@mail.gmail.com>

On 8/29/06, Guido van Rossum <guido at python.org> wrote:
> On 8/29/06, Josiah Carlson <jcarlson at uci.edu> wrote:
> > "Guido van Rossum" <guido at python.org> wrote:

> The type consistency and predictability is more important to me.

Why is it essential that string views be a different type, rather than
an internal implementation detail, like long vs int?  Today's strings
can already return a new object or an existing one which happens to be
equal.

Is this just a matter of efficiency, or are you making a fundamental
distinction?
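The observation above can be seen directly in CPython, where a full slice of a str hands back the existing object (an implementation detail, not a language guarantee):

```python
s = "hello world"
t = s[:]            # CPython returns the very same object here
print(t is s)       # True (a CPython implementation detail)

u = s[0:5]          # a genuinely new string object
print(u, u is s)    # hello False
```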

-jJ

From jcarlson at uci.edu  Tue Aug 29 23:27:19 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Tue, 29 Aug 2006 14:27:19 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <ca471dc20608291255h3972ed51h20df1ae63ca97df@mail.gmail.com>
References: <20060829111904.1B12.JCARLSON@uci.edu>
	<ca471dc20608291255h3972ed51h20df1ae63ca97df@mail.gmail.com>
Message-ID: <20060829132924.1B15.JCARLSON@uci.edu>


"Guido van Rossum" <guido at python.org> wrote:
> On 8/29/06, Josiah Carlson <jcarlson at uci.edu> wrote:
> > "Guido van Rossum" <guido at python.org> wrote:
> > > For operations that may be forced to return a new string (e.g.
> > > concatenation) I think the return value should always be a new string,
> > > even if it could be optimized. So for example if v is a view and s is
> > > a string, v+s should always return a new string, even if s is empty.
> >
> > I'm on the fence about this.  On the one hand, I understand the
> > desirability of being able to get the underlying string object without
> > difficulty.  On the other hand, its performance characteristics could be
> > confusing to users of Python who may have come to expect that "st+''" is
> > a constant time operation, regardless of the length of st.
> 
> Well views aren't strings. And s+t (for s and t strings) normally
> takes O(len(s)+len(t)) time.

Right, but my hope is for users who want to use views to start using
them without being surprised by what they get back.  You have
previously stated that changing return types based on a flag variable is
a horrible idea.  I agree, as providing a flag variable to change return
types is surprising.  This is changing return types based on variable
type, which could be argued as an implicit flag variable, and perhaps
subject to the same surprising behavior == bad criteria that has stopped
other such suggestions in the past.


> The type consistency and predictability is more important to me.

Is view + <anything that supports the buffer protocol> -> view not
consistent or predictable?


> > The non-null string addition case, I agree that it could make some sense
> > to return the string (considering you will need to copy it anyways), but
> > if one returned a view on that string, it would be more consistent with
> > other methods, and getting the string back via str(view) would offer
> > equivalent functionality.  It would also require the user to be explicit
> > about what they really want; though there is the argument that if I'm
> > passing a string as an operand to addition with a view, I actually want
> > a string, so give me one.
> 
> I strongly believe you're mistaken here. I don't think users will have
> any trouble with the concept "operations that don't (necessarily)
> return a substring will return a new string".

I could certainly be, but offering both isn't difficult.


> > I'm going to implement it as returning a view, but leave commented
> > sections for some of them to return a string.
> >
> > > BTW beware that in py3k, strings (which will always be unicode
> > > strings) won't support the buffer API -- bytes objects will. Would you
> > > want views on strings or on bytes or on both?
> >
> > That's tricky.  Views on bytes will come for free, like array, mmap, and
> > anything else that supports the buffer protocol. It requires the removal
> > of the __hash__ method for mutables, but that is certainly expected.
> 
> The question is, how useful is the buffer protocol going to be? We
> don't know yet.

Pretty useful, apparently: bytes supports decoding to unicode through
its own buffer interface; or really, it uses the decode machinery that
takes a char* and a length.

On the other hand, CharBuffer (as opposed to ReadBuffer and
WriteBuffer[1]) isn't really usable, as the reader has no idea about the
*size* and *type* of the characters it is getting back (8, 16, or 32 bit
integers or characters, even 16, 32, or 64 bit floats, etc.). Maybe
fixing CharBuffer, or creating a different interface (deprecating
CharBuffer) would make sense, and would offer the numarray folks their
'array interface'.


> > Right now, a large portion of standard library code use strings and
> > string methods to handle parsing, etc.  Removing immutable byte strings
> > from 3.x seems likely to result in a huge amount of rewriting necessary
> > to utilize either bytes or text (something I have mentioned before).  I
> > believe that with views on bytes (and/or sufficient bytes methods), the
> > vast majority would likely result in the use of bytes.
> 
> Um, unless you consider decoding a GIF file "parsing", parsing would
> seem to naturally fall in the realm of text (characters), not bytes.

I'm using my own definition of parsing again, I apologize.  What I meant
by parsing is anything that currently performs processing of Python 2.x
strings to determine what it is supposed to do: HTTP header processing
(sending and receiving), email processing, socket protocols in smtplib,
poplib, asynchat, etc.  All of these currently use Python 2.x strings.
They will need to be transitioned to 3.x if 2.x byte strings are removed,
and that transition will be quite a bit of work, regardless of whether
bytes get some string methods, or we wrap bytes to provide string
methods, but significantly more if neither is done.


> > Having a text view for such situations that works with the same kinds of
> > semantics as the bytes view would be nice from a purity/convenience
> > standpoint, and only needing to handle a single data type (text) could
> > make its implementation easier.  I don't have any short-term plans of
> > writing text views, but it may be somewhat easier to do after I'm done
> > with string/byte views.
> 
> Unifying the semantics between byte views and text views will be
> difficult since bytes are mutable.

The only significant nit is that the location of the underlying buffer
pointer changes with byte views.  This is already handled in a generally
satisfactory way in 2.x buffers.


> I recommend that you have a good look at the bytes implementation in
> the p3yk branch.

It is implemented the way I would have expected.

 - Josiah

[1] http://www.python.org/doc/current/api/abstract-buffer.html


From barry at python.org  Tue Aug 29 23:37:16 2006
From: barry at python.org (Barry Warsaw)
Date: Tue, 29 Aug 2006 17:37:16 -0400
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <fb6fbf560608291401o76e8d3cbub022e7b954cb38a7@mail.gmail.com>
References: <20060829091403.1B09.JCARLSON@uci.edu>
	<ca471dc20608291031l61efbe6aq50d106f4395ffdaa@mail.gmail.com>
	<20060829111904.1B12.JCARLSON@uci.edu>
	<ca471dc20608291255h3972ed51h20df1ae63ca97df@mail.gmail.com>
	<fb6fbf560608291401o76e8d3cbub022e7b954cb38a7@mail.gmail.com>
Message-ID: <F7FFF385-231A-46CB-B431-3BE7592C8600@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 29, 2006, at 5:01 PM, Jim Jewett wrote:

> Why is it essential that string views be a different type, rather than
> an internal implementation detail, like long vs int?  Today's strings
> can already return a new object or an existing one which happens to be
> equal.
>
> Is this just a matter of efficiency, or are you making a fundamental
> distinction?

This is a good question.  I haven't been following this thread in  
detail, but ISTM that users shouldn't care and that the object itself  
should do whatever makes the most sense for the most general  
audience.  I'm eager to never have to worry about 8-bit strings vs.  
unicode strings, how they mix and match, and all the nasty corners  
when they interact.  I'd hate to trade that for the worry about  
whether I have a string or a string-view.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Darwin)

iQCVAwUBRPSzjXEjvBPtnXfVAQJ3WAQAuLgT0yOfIo7gNcg7BS0hvKMb33e9Pbdi
IQdlP0seSt6Q0GXMnCk2DPJdXHAap2co/RnqRXuavqAcJScYBwM626tHppjrgoDV
fcQ6FBn1oshsOSChKIT1tVqiudPiEStWaks6d/xg4yP1EAOEbqEhaGoR3FM7e+Vh
h/d6rtxYaXk=
=XQCo
-----END PGP SIGNATURE-----

From guido at python.org  Tue Aug 29 23:42:43 2006
From: guido at python.org (Guido van Rossum)
Date: Tue, 29 Aug 2006 14:42:43 -0700
Subject: [Python-3000] Small Py3k task: fix modulefinder.py
Message-ID: <ca471dc20608291442p3d92790ema7aa35f85d38156a@mail.gmail.com>

Is anyone familiar enough with modulefinder.py to fix its breakage in
Py3k? It chokes in a nasty way (exceeding the recursion limit) on the
relative import syntax. I suspect this is also a problem for 2.5, when
people use that syntax; hence the cross-post. There's no unittest for
modulefinder.py, but I believe py2exe depends on it (and of course
freeze.py, but who uses that still?)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Aug 29 23:51:17 2006
From: guido at python.org (Guido van Rossum)
Date: Tue, 29 Aug 2006 14:51:17 -0700
Subject: [Python-3000] Premature optimization and all that
Message-ID: <ca471dc20608291451n68f85451hd418f2b76a47e25c@mail.gmail.com>

Over lunch with Neal we came upon the topic of optimization and Python 3000.

It is our strong opinion that in this stage of the Py3k project we
should focus on getting the new language spec and implementation
feature-complete, without worrying much about optimizations.

We're doing major feature-level surgery, e.g. int/long unification,
str/unicode unification, range/xrange unification, keys() views, and
many others. Keeping everything working is hard work in and of itself;
having to keep it as fast as it was through all the transformations
just makes it that much harder.

If Python 3.0 alpha 1 is twice as slow as 2.5, that's okay with me; we
will have another year to do performance measurements and add new
optimizations in the ramp-up for 3.0 final. Even if 3.0 final is a bit
slower than 2.5 it doesn't bother me too much; we can continue the
tweaks during the 3.1 and 3.2 development cycle.

Note: I'm not advocating wholesale proactive *removal* of
optimizations. However, I'm allowing new features to slow down
performance temporarily while we get all the features in place. I
expect that the optimization possibilities and needs will be different
than for 2.x, since some of the fundamental data types will be so
different.

In particular, I hope that Martin's int/long unification code can land
ASAP; it's much better to have this feature landed in the p3yk branch,
where everyone can bang on it easily, and learn how this affects user
code, even if it makes everything twice as slow. This seems much
preferable over having it languish in a separate branch until it's
perfect.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jcarlson at uci.edu  Tue Aug 29 23:58:41 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Tue, 29 Aug 2006 14:58:41 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <F7FFF385-231A-46CB-B431-3BE7592C8600@python.org>
References: <fb6fbf560608291401o76e8d3cbub022e7b954cb38a7@mail.gmail.com>
	<F7FFF385-231A-46CB-B431-3BE7592C8600@python.org>
Message-ID: <20060829145412.1B18.JCARLSON@uci.edu>


Barry Warsaw <barry at python.org> wrote:
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On Aug 29, 2006, at 5:01 PM, Jim Jewett wrote:
> 
> > Why is it essential that string views be a different type, rather than
> > an internal implementation detail, like long vs int?  Today's strings
> > can already return a new object or an existing one which happens to be
> > equal.
> >
> > Is this just a matter of efficiency, or are you making a fundamental
> > distinction?
> 
> This is a good question.  I haven't been following this thread in  
> detail, but ISTM that users shouldn't care and that the object itself  
> should do whatever makes the most sense for the most general  
> audience.  I'm eager to never have to worry about 8-bit strings vs.  
> unicode strings, how they mix and match, and all the nasty corners  
> when they interact.  I'd hate to trade that for the worry about  
> whether I have a string or a string-view.

If views are not automatically returned for methods on strings, then you
won't have to worry about views unless you explicitly construct them.

Also, you won't ever have a string-view in py3k, it will be a bytes-view,
and if you want to do something like bts.find(sub), bts.index(sub), or
bts.partition(sub), you are going to need the bytes-view, as bytes don't
offer those methods natively.

 - Josiah


From guido at python.org  Wed Aug 30 00:04:09 2006
From: guido at python.org (Guido van Rossum)
Date: Tue, 29 Aug 2006 15:04:09 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <fb6fbf560608291401o76e8d3cbub022e7b954cb38a7@mail.gmail.com>
References: <20060829091403.1B09.JCARLSON@uci.edu>
	<ca471dc20608291031l61efbe6aq50d106f4395ffdaa@mail.gmail.com>
	<20060829111904.1B12.JCARLSON@uci.edu>
	<ca471dc20608291255h3972ed51h20df1ae63ca97df@mail.gmail.com>
	<fb6fbf560608291401o76e8d3cbub022e7b954cb38a7@mail.gmail.com>
Message-ID: <ca471dc20608291504n4ce0fcd0q20f90ced72d2fb77@mail.gmail.com>

On 8/29/06, Jim Jewett <jimjjewett at gmail.com> wrote:
> On 8/29/06, Guido van Rossum <guido at python.org> wrote:
> > On 8/29/06, Josiah Carlson <jcarlson at uci.edu> wrote:
> > > "Guido van Rossum" <guido at python.org> wrote:
>
> > The type consistency and predictability is more important to me.
>
> Why is it essential that string views be a different type, rather than
> an internal implementation detail, like long vs int?  Today's strings
> can already return a new object or an existing one which happens to be
> equal.
>
> Is this just a matter of efficiency, or are you making a fundamental
> distinction?

Sigh. Josiah just said he wouldn't dream of proposing that all string
ops should return string views. You're not helping by questioning even
that.

The short answer is, if you don't have control over when a view on an
existing string is returned and when a copy, there are easy to see
worst-case behaviors that are worse than the problem they are trying
to fix. For example, you'd get a whole series of problems like this
one:

res = []
for i in range(1000):
  s = " "*1000000    # a new 1MB string
  res.append(s[:1])  # a one-character string that is a view on s,
                     # and hence keeps s alive

If s[:1] were to return a view on s unconditionally, the above loop
would accumulate roughly 1 GB in wasted space.

To fix this you'll have to add heuristics and all sorts of other
things and that will complicate the string implementation and hence
slow it down.
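[Editor's note: the retention problem is easy to reproduce with a toy view
type. `StrView` below is hypothetical, just enough to show that even a
one-character view pins its base string in memory (sizes checked with
CPython's sys.getsizeof):]

```python
import sys

class StrView:
    """Hypothetical string view: holds a reference to the base string."""
    def __init__(self, base, start, stop):
        self.base, self.start, self.stop = base, start, stop
    def __str__(self):
        # materializing a real string copy is the only way to drop the base
        return self.base[self.start:self.stop]

res = []
for i in range(3):
    s = " " * 1000000             # a new 1MB string each iteration
    res.append(StrView(s, 0, 1))  # a one-character view...
# ...but every view still pins its 1MB base string:
assert all(sys.getsizeof(v.base) > 1000000 for v in res)
```

Appending str(view) instead would copy the one character and let each 1MB
string be collected, which is the explicit-copy behavior argued for above.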

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg.ewing at canterbury.ac.nz  Wed Aug 30 02:35:17 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 30 Aug 2006 12:35:17 +1200
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <ca471dc20608290842p4798f63ar4e1fc51bbb43c9a9@mail.gmail.com>
References: <20060828120741.1AF7.JCARLSON@uci.edu>
	<ca471dc20608281307uf5e1995vfa65538e156f0c0a@mail.gmail.com>
	<20060828132232.1AFD.JCARLSON@uci.edu>
	<ca471dc20608290842p4798f63ar4e1fc51bbb43c9a9@mail.gmail.com>
Message-ID: <44F4DD45.6060809@canterbury.ac.nz>

Guido van Rossum wrote:

> Having views in a library module alleviates many of my objections.
> While I still worry that it will be overused, deque doesn't seem to be
> overused, so perhaps I should relax.

Another thought is that there will already be ways
in which Py3k views could lead to inefficiencies if
they're not used carefully. A keys() view of a dict,
for example, will keep the values of the dict alive
as well as the keys, unlike the existing keys()
method.
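[Editor's note: in CPython, where refcounting frees objects as soon as the
last reference disappears, the effect is directly observable; this sketch
assumes the keys() view holds a reference to the dict itself:]

```python
import weakref

class Value:
    pass  # a stand-in for a large value object

d = {"k": Value()}
ref = weakref.ref(d["k"])
view = d.keys()     # the view references the dict, not just the keys
del d
assert ref() is not None  # value survives: view -> dict -> value
del view
assert ref() is None      # dropping the view releases the whole dict
```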

--
Greg

From guido at python.org  Wed Aug 30 02:59:00 2006
From: guido at python.org (Guido van Rossum)
Date: Tue, 29 Aug 2006 17:59:00 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <44F4DD45.6060809@canterbury.ac.nz>
References: <20060828120741.1AF7.JCARLSON@uci.edu>
	<ca471dc20608281307uf5e1995vfa65538e156f0c0a@mail.gmail.com>
	<20060828132232.1AFD.JCARLSON@uci.edu>
	<ca471dc20608290842p4798f63ar4e1fc51bbb43c9a9@mail.gmail.com>
	<44F4DD45.6060809@canterbury.ac.nz>
Message-ID: <ca471dc20608291759v42de405excac9bc64be87bf8e@mail.gmail.com>

On 8/29/06, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Guido van Rossum wrote:
>
> > Having views in a library module alleviates many of my objections.
> > While I still worry that it will be overused, deque doesn't seem to be
> > overused, so perhaps I should relax.
>
> Another thought is that there will already be ways
> in which Py3k views could lead to inefficiencies if
> they're not used carefully. A keys() view of a dict,
> for example, will keep the values of the dict alive
> as well as the keys, unlike the existing keys()
> method.

Right; but I don't expect that such a keys() view will typically have
a lifetime longer than the dict. For substrings OTOH that's quite
common (parsing etc.).

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg.ewing at canterbury.ac.nz  Wed Aug 30 03:37:06 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 30 Aug 2006 13:37:06 +1200
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <20060829145412.1B18.JCARLSON@uci.edu>
References: <fb6fbf560608291401o76e8d3cbub022e7b954cb38a7@mail.gmail.com>
	<F7FFF385-231A-46CB-B431-3BE7592C8600@python.org>
	<20060829145412.1B18.JCARLSON@uci.edu>
Message-ID: <44F4EBC2.8020401@canterbury.ac.nz>

Josiah Carlson wrote:

> If views are not automatically returned for methods on strings, then you
> won't have to worry about views unless you explicitly construct them.

Although you might have to worry about someone else
handing you a view when you weren't expecting it. Minimising
the chance of that is a reason for operations on views
not to return further views by default.

--
Greg

From greg.ewing at canterbury.ac.nz  Wed Aug 30 03:44:57 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 30 Aug 2006 13:44:57 +1200
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <20060829132924.1B15.JCARLSON@uci.edu>
References: <20060829111904.1B12.JCARLSON@uci.edu>
	<ca471dc20608291255h3972ed51h20df1ae63ca97df@mail.gmail.com>
	<20060829132924.1B15.JCARLSON@uci.edu>
Message-ID: <44F4ED99.2060408@canterbury.ac.nz>

Josiah Carlson wrote:
> This is changing return types based on variable type,

How do you make that out? It seems the opposite to me --
Guido is saying that the return type of s+t should *not*
depend on whether s or t happens to be a view rather than
a real string.

--
Greg

From greg.ewing at canterbury.ac.nz  Wed Aug 30 03:45:26 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 30 Aug 2006 13:45:26 +1200
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <fb6fbf560608291401o76e8d3cbub022e7b954cb38a7@mail.gmail.com>
References: <20060829091403.1B09.JCARLSON@uci.edu>
	<ca471dc20608291031l61efbe6aq50d106f4395ffdaa@mail.gmail.com>
	<20060829111904.1B12.JCARLSON@uci.edu>
	<ca471dc20608291255h3972ed51h20df1ae63ca97df@mail.gmail.com>
	<fb6fbf560608291401o76e8d3cbub022e7b954cb38a7@mail.gmail.com>
Message-ID: <44F4EDB6.1000303@canterbury.ac.nz>

Jim Jewett wrote:

> Why is it essential that string views be a different type, rather than
> an internal implementation detail, like long vs int?

We're talking about a more abstract notion of "type" here.
Strings and views are different things with different
performance characteristics, so it's important to know
which one you're getting, whether they're implemented
as different type()s or not.

--
Greg

From greg.ewing at canterbury.ac.nz  Wed Aug 30 03:46:17 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 30 Aug 2006 13:46:17 +1200
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <20060829111904.1B12.JCARLSON@uci.edu>
References: <20060829091403.1B09.JCARLSON@uci.edu>
	<ca471dc20608291031l61efbe6aq50d106f4395ffdaa@mail.gmail.com>
	<20060829111904.1B12.JCARLSON@uci.edu>
Message-ID: <44F4EDE9.1060700@canterbury.ac.nz>

Josiah Carlson wrote:
> On the other hand, its performance characteristics could be
> confusing to users of Python who may have come to expect that "st+''" is
> a constant time operation, regardless of the length of st.

Even if that's always true, I'm not sure it's really a
useful thing to know. How often do you write a string
concatenation expecting that one of the operands will
almost always be empty? I can count the number of times
I've done that on the fingers of one elbow.

--
Greg

From aahz at pythoncraft.com  Wed Aug 30 04:16:25 2006
From: aahz at pythoncraft.com (Aahz)
Date: Tue, 29 Aug 2006 19:16:25 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <ca471dc20608291759v42de405excac9bc64be87bf8e@mail.gmail.com>
References: <20060828120741.1AF7.JCARLSON@uci.edu>
	<ca471dc20608281307uf5e1995vfa65538e156f0c0a@mail.gmail.com>
	<20060828132232.1AFD.JCARLSON@uci.edu>
	<ca471dc20608290842p4798f63ar4e1fc51bbb43c9a9@mail.gmail.com>
	<44F4DD45.6060809@canterbury.ac.nz>
	<ca471dc20608291759v42de405excac9bc64be87bf8e@mail.gmail.com>
Message-ID: <20060830021625.GA19157@panix.com>

On Tue, Aug 29, 2006, Guido van Rossum wrote:
> On 8/29/06, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
>> Guido van Rossum wrote:
>>>
>>> Having views in a library module alleviates many of my objections.
>>> While I still worry that it will be overused, deque doesn't seem to
>>> be overused, so perhaps I should relax.
>>
>> Another thought is that there will already be ways in which Py3k
>> views could lead to inefficiencies if they're not used carefully. A
>> keys() view of a dict, for example, will keep the values of the dict
>> alive as well as the keys, unlike the existing keys() method.
>
> Right; but I don't expect that such a keys() view will typically have
> a lifetime longer than the dict. 

That's true only for newer code that correctly uses sets instead of
dicts -- but we've had this argument before.
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are, by
definition, not smart enough to debug it."  --Brian W. Kernighan

From guido at python.org  Wed Aug 30 05:16:26 2006
From: guido at python.org (Guido van Rossum)
Date: Tue, 29 Aug 2006 20:16:26 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <44F4EBC2.8020401@canterbury.ac.nz>
References: <fb6fbf560608291401o76e8d3cbub022e7b954cb38a7@mail.gmail.com>
	<F7FFF385-231A-46CB-B431-3BE7592C8600@python.org>
	<20060829145412.1B18.JCARLSON@uci.edu>
	<44F4EBC2.8020401@canterbury.ac.nz>
Message-ID: <ca471dc20608292016u747a2535l31fedfd7567e0e72@mail.gmail.com>

On 8/29/06, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Josiah Carlson wrote:
>
> > If views are not automatically returned for methods on strings, then you
> > won't have to worry about views unless you explicitly construct them.
>
> Although you might have to worry about someone else
> handing you a view when you weren't expecting it. Minimising
> the chance of that is a reason for operations on views
> not to return further views by default.

In support of Josiah here: I think that's the caller's responsibility then.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Wed Aug 30 05:18:06 2006
From: guido at python.org (Guido van Rossum)
Date: Tue, 29 Aug 2006 20:18:06 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <44F4ED99.2060408@canterbury.ac.nz>
References: <20060829111904.1B12.JCARLSON@uci.edu>
	<ca471dc20608291255h3972ed51h20df1ae63ca97df@mail.gmail.com>
	<20060829132924.1B15.JCARLSON@uci.edu>
	<44F4ED99.2060408@canterbury.ac.nz>
Message-ID: <ca471dc20608292018s1310eef6k11509048af229be1@mail.gmail.com>

On 8/29/06, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Josiah Carlson wrote:
> > This is changing return types based on variable type,
>
> How do you make that out? It seems the opposite to me --
> Guido is saying that the return type of s+t should *not*
> depend on whether s or t happens to be a view rather than
> a real string.

No, I never meant to say that. There's nothing wrong with the type of
x+y depending on the types of x and y. I meant that s+v, v+s and v+w
(s being a string, v and w being views) should all return strings
because -- in general -- they cannot always be views, and I don't want
the return type to depend on the *value* of the inputs.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From talin at acm.org  Wed Aug 30 06:00:49 2006
From: talin at acm.org (Talin)
Date: Tue, 29 Aug 2006 21:00:49 -0700
Subject: [Python-3000] Comment on iostack library
Message-ID: <44F50D71.5030402@acm.org>

I've been thinking more about the iostack proposal. Right now, a typical 
file handle consists of 3 "layers" - one representing the backing store 
(file, memory, network, etc.), one for adding buffering, and one 
representing the program-level API for reading strings, bytes, decoded 
text, etc.

I wonder if it wouldn't be better to cut that down to two. Specifically, 
I would like to suggest eliminating the buffering layer.

My reasoning is fairly straightforward: Most file system handles, 
network handles and other operating system handles already support 
buffering, and they do a far better job of it than we can. The handles 
that don't support buffering are memory streams - which don't need 
buffering anyway.

Of course, it would make sense for Python to provide its own buffering 
implementation if we were going to always use the lowest-level i/o API 
provided by the operating system, but I can't see why we would want to 
do that. The OS knows how to allocate an optimal buffer, using 
information such as the block size of the filesystem, whereas trying to 
achieve this same level of functionality in the Python standard library 
would be needlessly complex IMHO.

-- Talin

From guido at python.org  Wed Aug 30 06:24:02 2006
From: guido at python.org (Guido van Rossum)
Date: Tue, 29 Aug 2006 21:24:02 -0700
Subject: [Python-3000] Comment on iostack library
In-Reply-To: <44F50D71.5030402@acm.org>
References: <44F50D71.5030402@acm.org>
Message-ID: <ca471dc20608292124w644bfee5gc2221bcf6304228f@mail.gmail.com>

On 8/29/06, Talin <talin at acm.org> wrote:
> I've been thinking more about the iostack proposal. Right now, a typical
> file handle consists of 3 "layers" - one representing the backing store
> (file, memory, network, etc.), one for adding buffering, and one
> representing the program-level API for reading strings, bytes, decoded
> text, etc.
>
> I wonder if it wouldn't be better to cut that down to two. Specifically,
> I would like to suggest eliminating the buffering layer.
>
> My reasoning is fairly straightforward: Most file system handles,
> network handles and other operating system handles already support
> buffering, and they do a far better job of it than we can. The handles
> that don't support buffering are memory streams - which don't need
> buffering anyway.
>
> Of course, it would make sense for Python to provide its own buffering
> implementation if we were going to always use the lowest-level i/o API
> provided by the operating system, but I can't see why we would want to
> do that. The OS knows how to allocate an optimal buffer, using
> information such as the block size of the filesystem, whereas trying to
> achieve this same level of functionality in the Python standard library
> would be needlessly complex IMHO.

I'm not sure I follow.

We *definitely* don't want to use stdio -- it's not part of the OS
anyway, and has some annoying quirks like not giving you any insight
in how it is using the buffer, nor changing the buffer size on the
fly, and crashing when you switch read and write calls.

So given that, how would you implement readline()? Reading one byte at
a time until you've got the \n is definitely way too slow given the
constant overhead of system calls.

Regarding optimal buffer size, I've never seen a program for which 8K
wasn't optimal. Larger buffers simply don't pay off.
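[Editor's note: a minimal sketch of what the buffering layer buys
readline(): amortize read() calls on the raw object over a block, then scan
the buffer for the newline. Class and method names are illustrative, not a
proposed API:]

```python
import io

class LineReader:
    """Buffered readline over any raw object exposing read(n) -> bytes."""
    def __init__(self, raw, bufsize=8192):
        self.raw, self.bufsize, self.buf = raw, bufsize, b""

    def readline(self):
        while b"\n" not in self.buf:
            chunk = self.raw.read(self.bufsize)  # one call per block
            if not chunk:                        # EOF: flush what is left
                line, self.buf = self.buf, b""
                return line
            self.buf += chunk
        line, nl, self.buf = self.buf.partition(b"\n")
        return line + nl

r = LineReader(io.BytesIO(b"spam\neggs\nham"))
assert [r.readline() for _ in range(4)] == [b"spam\n", b"eggs\n", b"ham", b""]
```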

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From talin at acm.org  Wed Aug 30 07:26:59 2006
From: talin at acm.org (Talin)
Date: Tue, 29 Aug 2006 22:26:59 -0700
Subject: [Python-3000] Comment on iostack library
In-Reply-To: <ca471dc20608292124w644bfee5gc2221bcf6304228f@mail.gmail.com>
References: <44F50D71.5030402@acm.org>
	<ca471dc20608292124w644bfee5gc2221bcf6304228f@mail.gmail.com>
Message-ID: <44F521A3.1040304@acm.org>

Guido van Rossum wrote:
> On 8/29/06, Talin <talin at acm.org> wrote:
>> I've been thinking more about the iostack proposal. Right now, a typical
>> file handle consists of 3 "layers" - one representing the backing store
>> (file, memory, network, etc.), one for adding buffering, and one
>> representing the program-level API for reading strings, bytes, decoded
>> text, etc.
>>
>> I wonder if it wouldn't be better to cut that down to two. Specifically,
>> I would like to suggest eliminating the buffering layer.
>>
>> My reasoning is fairly straightforward: Most file system handles,
>> network handles and other operating system handles already support
>> buffering, and they do a far better job of it than we can. The handles
>> that don't support buffering are memory streams - which don't need
>> buffering anyway.
>>
>> Of course, it would make sense for Python to provide its own buffering
>> implementation if we were going to always use the lowest-level i/o API
>> provided by the operating system, but I can't see why we would want to
>> do that. The OS knows how to allocate an optimal buffer, using
>> information such as the block size of the filesystem, whereas trying to
>> achieve this same level of functionality in the Python standard library
>> would be needlessly complex IMHO.
> 
> I'm not sure I follow.
> 
> We *definitely* don't want to use stdio -- it's not part of the OS
> anyway, and has some annoying quirks like not giving you any insight
> in how it is using the buffer, nor changing the buffer size on the
> fly, and crashing when you switch read and write calls.
> 
> So given that, how would you implement readline()? Reading one byte at
> a time until you've got the \n is definitely way too slow given the
> constant overhead of system calls.
> 
> Regarding optimal buffer size, I've never seen a program for which 8K
> wasn't optimal. Larger buffers simply don't pay off.

Well, as far as readline goes: In order to split the text into lines, 
you have to decode the text first anyway, which is a layer 3 operation. 
You can't just read bytes until you get a \n, because the file you are 
reading might be encoded in UCS2 or something. So for example, in a 
big-endian UCS2 encoding, newline would be encoded as 0x00 0x0A, whereas 
in a little-endian UCS2 encoding, it would be 0x0A 0x00. Merely stopping 
at the 0x0A byte is incorrect: you've only read half the character.
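[Editor's note: the point is easy to check with Python's UTF-16 codec, used
here as a stand-in for UCS2. The 0x0A byte lands at different offsets and
alignments depending on byte order, so a byte-level scan can stop
mid-character:]

```python
text = "ab\ncd"
be = text.encode("utf-16-be")
le = text.encode("utf-16-le")
assert be == b"\x00a\x00b\x00\n\x00c\x00d"
assert le == b"a\x00b\x00\n\x00c\x00d\x00"
# 0x0A sits at an odd offset in big-endian output:
assert be.index(b"\n") == 5   # second byte of the two-byte character
assert le.index(b"\n") == 4   # first byte of the two-byte character
```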

You're correct that reading by line does require a buffer if you want to 
do it efficiently. However, in a world of character encodings, the 
readline buffer has to be implemented at a higher level in the IO stack, 
at the same level which understands text encodings. There may be a 
different set of buffers at the lower level to minimize the number of 
disk i/o operations, but they can't really be the same buffer -- either 
that, or the text encoding layer will need to have fairly incestuous 
knowledge of what's going on at the lower layers so that it can peek 
inside its buffers.

It seems to me that no matter how you slice it, you can't have an 
abstract "buffering" layer that is independent of both the layer beneath 
and the layer above. Both the text decoding layer and the disk i/o layer 
need to have fairly intimate knowledge of their buffers if you want 
maximum efficiency. (I'm not opposed to a custom implementation of 
buffering in the level 1 file object itself, although I suspect in most 
cases you'd be better off using what the OS or its standard libs provide.)

As far as stdio not giving you hints as to how it is using the buffer, I 
am not sure what you mean...what kind of information would a custom 
buffer implementation give you that stdio would not? If its early 
detection of \n is what you are thinking of, I've already shown that 
won't work unless you are assuming an 8-bit encoding.

-- Talin

From ronaldoussoren at mac.com  Wed Aug 30 07:47:16 2006
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Wed, 30 Aug 2006 07:47:16 +0200
Subject: [Python-3000] Comment on iostack library
In-Reply-To: <44F521A3.1040304@acm.org>
References: <44F50D71.5030402@acm.org>
	<ca471dc20608292124w644bfee5gc2221bcf6304228f@mail.gmail.com>
	<44F521A3.1040304@acm.org>
Message-ID: <79F38D6C-F609-4B58-9C43-6FF0C2BEECE5@mac.com>


On 30-aug-2006, at 7:26, Talin wrote:

> Guido van Rossum wrote:
>>
>> Regarding optimal buffer size, I've never seen a program for which 8K
>> wasn't optimal. Larger buffers simply don't pay off.

Larger buffers can be useful when doing binary I/O through stdio (at  
least on linux). I've recently had a program that had significant  
speedup when I used a 128K buffer.

>
> Well, as far as readline goes: In order to split the text into lines,
> you have to decode the text first anyway, which is a layer 3  
> operation.


And buffering is a layer 2 operation. Function calls are significantly  
cheaper than system calls. You don't want to do a system call for  
every character read, but might get away with doing a function call  
per character.

Ronald


From fredrik at pythonware.com  Wed Aug 30 10:38:25 2006
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Wed, 30 Aug 2006 10:38:25 +0200
Subject: [Python-3000] Making more effective use of slice objects in Py3k
References: <20060827184941.1AE8.JCARLSON@uci.edu>
	<ed1q7r$v4s$2@sea.gmane.org><20060829102307.1B0F.JCARLSON@uci.edu>
	<ed1uds$iog$1@sea.gmane.org>
Message-ID: <ed3iq2$9iv$1@sea.gmane.org>

Fredrik Lundh wrote:

> not necessarily, but there are lots of issues involved when doing
> high-performance XML stuff, and I'm not sure views would help quite as
> much as one might think.
>
> (writing and tuning cET was a great way to learn that not everything
> that you think you know about C performance applies to C code running
> inside the Python interpreter...)

and also based on the cET (and NFS) experiences, it wouldn't surprise me
if a naive 32-bit text string implementation will, on average, slow things down
*more* than any string view implementation can speed things up again...

(in other words, I'm convinced that we need a polymorphic string type.  I'm not
so sure we need views, but if we have the former, we can use that mechanism to
support the latter)

</F> 




From qrczak at knm.org.pl  Wed Aug 30 11:20:56 2006
From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk)
Date: Wed, 30 Aug 2006 11:20:56 +0200
Subject: [Python-3000] Comment on iostack library
In-Reply-To: <44F521A3.1040304@acm.org> (talin@acm.org's message of "Tue, 29
	Aug 2006 22:26:59 -0700")
References: <44F50D71.5030402@acm.org>
	<ca471dc20608292124w644bfee5gc2221bcf6304228f@mail.gmail.com>
	<44F521A3.1040304@acm.org>
Message-ID: <87wt8q1sw7.fsf@qrnik.zagroda>

Talin <talin at acm.org> writes:

> It seems to me that no matter how you slice it, you can't have an
> abstract "buffering" layer that is independent of both the layer
> beneath and the layer above.

I think buffering makes sense as the topmost layer, and typically only
there.

Encoding conversion and newline conversion should be performed a block
at a time, below buffering, so not only I/O syscalls, but also
invocations of the recoding machinery are amortized by buffering.

Buffering comes in separate byte and character flavors.

Placing buffering below that makes sense only in cases we want to
decode as little bytes as possible at a time (accepting the slowdown
of encoding one character at a time, but avoiding a syscall per
character). I'm not sure whether this is ever necessary. Finding
the end of HTTP headers can be done before conversion to text.

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/

From ncoghlan at gmail.com  Wed Aug 30 11:46:57 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 30 Aug 2006 19:46:57 +1000
Subject: [Python-3000] Premature optimization and all that
In-Reply-To: <ca471dc20608291451n68f85451hd418f2b76a47e25c@mail.gmail.com>
References: <ca471dc20608291451n68f85451hd418f2b76a47e25c@mail.gmail.com>
Message-ID: <44F55E91.4020000@gmail.com>

Guido van Rossum wrote:
> Over lunch with Neal we came upon the topic of optimization and Python 3000.
> 
> It is our strong opinion that in this stage of the Py3k project we
> should focus on getting the new language spec and implementation
> feature-complete, without worrying much about optimizations.

+1 here - this sounds like an excellent plan to me.

Step 1: Make it work
Step 2: Make it work fast

I've made life difficult for myself a few times by trying to do step 2 without 
doing step 1 first :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From ncoghlan at gmail.com  Wed Aug 30 12:06:48 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 30 Aug 2006 20:06:48 +1000
Subject: [Python-3000] Comment on iostack library
In-Reply-To: <44F521A3.1040304@acm.org>
References: <44F50D71.5030402@acm.org>	<ca471dc20608292124w644bfee5gc2221bcf6304228f@mail.gmail.com>
	<44F521A3.1040304@acm.org>
Message-ID: <44F56338.5070802@gmail.com>

Talin wrote:
> It seems to me that no matter how you slice it, you can't have an 
> abstract "buffering" layer that is independent of both the layer beneath 
> and the layer above. Both the text decoding layer and the disk i/o layer 
> need to have fairly intimate knowledge of their buffers if you want 
> maximum efficiency. (I'm not opposed to a custom implementation of 
> buffering in the level 1 file object itself, although I suspect in most 
> cases you'd be better off using what the OS or its standard libs provide.)

You'd insert a buffering layer at the appropriate point for whatever you're 
trying to do. The advantage of pulling the buffering out into a separate layer 
is that it can be reused with different byte sources & sinks by supplying the 
appropriate configuration parameters, instead of having to reimplement it for 
each different source/sink.

Applications generally won't be expected to construct these IO stacks 
manually. File IO stacks, for example, will most likely still be created by a 
call to the open() builtin (although the default mode may change to be binary 
if no text encoding is specified).

Here's a list of the IO stacks I believe will be commonly used:

Unbuffered byte IO stack:
   - byte stream API
   - byte source/sink

Block buffered byte IO stack:
   - byte stream API
   - block buffering layer
   - byte source/sink

Character buffered text IO stack:
   - text stream API
   - text codec layer
   - byte source/sink
(effectively unbuffered for single byte encodings like ASCII)

Block buffered text IO stack:
   - text stream API
   - text codec layer
   - block buffering
   - byte source/sink

Line buffered text IO stack:
   - text stream API
   - line buffering
   - text codec layer
   - block buffering
   - byte source/sink
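The layering above can be sketched as composable wrapper classes. All class and method names below are invented for illustration; they are not the eventual Py3k API, just a minimal sketch of how a buffering layer and a text codec layer could stack on top of a raw byte source:

```python
import codecs
import io

class BlockBuffered:
    """Hypothetical block buffering layer: batches reads from a raw byte source."""
    def __init__(self, raw, bufsize=8192):
        self.raw = raw            # underlying byte source/sink
        self.bufsize = bufsize
        self.buf = b""

    def read(self, n):
        # Refill the buffer with large raw reads instead of many small ones.
        while len(self.buf) < n:
            chunk = self.raw.read(self.bufsize)
            if not chunk:
                break
            self.buf += chunk
        result, self.buf = self.buf[:n], self.buf[n:]
        return result

class TextCodec:
    """Hypothetical text codec layer: decodes bytes read from the layer below."""
    def __init__(self, below, encoding="utf-8"):
        self.below = below
        self.decoder = codecs.getincrementaldecoder(encoding)()
        self.pending = ""         # decoded characters not yet handed out

    def read(self, n):
        while len(self.pending) < n:
            data = self.below.read(64)
            if not data:
                break
            self.pending += self.decoder.decode(data)
        result, self.pending = self.pending[:n], self.pending[n:]
        return result

# Block buffered text IO stack: text codec over block buffering over raw bytes.
stack = TextCodec(BlockBuffered(io.BytesIO("héllo wörld".encode("utf-8"))))
```

The same `TextCodec` could sit directly on the byte source for the unbuffered variants, which is the reuse argument: one buffering implementation, inserted wherever it pays off.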

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From guido at python.org  Wed Aug 30 16:22:11 2006
From: guido at python.org (Guido van Rossum)
Date: Wed, 30 Aug 2006 07:22:11 -0700
Subject: [Python-3000] Comment on iostack library
In-Reply-To: <44F521A3.1040304@acm.org>
References: <44F50D71.5030402@acm.org>
	<ca471dc20608292124w644bfee5gc2221bcf6304228f@mail.gmail.com>
	<44F521A3.1040304@acm.org>
Message-ID: <ca471dc20608300722mcae971ct2eb2f64fffca2603@mail.gmail.com>

On 8/29/06, Talin <talin at acm.org> wrote:
> Guido van Rossum wrote:
> > I'm not sure I follow.
> >
> > We *definitely* don't want to use stdio -- it's not part of the OS
> > anyway, and has some annoying quirks like not giving you any insight
> > in how it is using the buffer, nor changing the buffer size on the
> > fly, and crashing when you switch read and write calls.
> >
> > So given that, how would you implement readline()? Reading one byte at
> > a time until you've got the \n is definitely way too slow given the
> > constant overhead of system calls.
> >
> > Regarding optimal buffer size, I've never seen a program for which 8K
> > wasn't optimal. Larger buffers simply don't pay off.
>
> Well, as far as readline goes: In order to split the text into lines,
> you have to decode the text first anyway, which is a layer 3 operation.

OK, I see some of your point. This may explain why in Java the
buffering layer seems to be sitting on top of the encoding/decoding.

Still, for binary file I/O, we'll need a buffering layer on top of the
raw I/O operations. Lots of file formats are read/written in small
chunks but it would be very expensive to turn each small chunk into a
system call.
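The readline() point can be made concrete with a minimal sketch (class name hypothetical): the buffer amortizes raw reads and scans for the newline in memory, so the cost is one large read per buffer-full rather than one system call per byte.

```python
import io

class BufferedReader:
    """Minimal sketch: readline() over a raw byte source, without stdio."""
    def __init__(self, raw, bufsize=8192):
        self.raw = raw
        self.bufsize = bufsize
        self.buf = b""

    def readline(self):
        # Scan the in-memory buffer; refill with large raw reads as needed.
        while True:
            i = self.buf.find(b"\n")
            if i >= 0:
                line, self.buf = self.buf[:i + 1], self.buf[i + 1:]
                return line
            chunk = self.raw.read(self.bufsize)
            if not chunk:          # EOF: return whatever is left over
                line, self.buf = self.buf, b""
                return line
            self.buf += chunk

r = BufferedReader(io.BytesIO(b"one\ntwo\nthree"))
```

Unlike stdio, a layer like this can expose exactly what is sitting in `self.buf`, which addresses the introspection complaint below.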

> As far as stdio not giving you hints as to how it is using the buffer, I
> am not sure what you mean...what kind of information would a custom
> buffer implementation give you that stdio would not?

The specific problem with stdio is that you can't tell if anything is
in the buffer or not. This can make it difficult to do non-blocking
I/O on a socket through stdio (e.g. when using the makefile() option
of Python sockets). Another is that a read after a write is undefined
in the C std and can give segfaults on some platforms, so Python has
to keep track of the "state" of the I/O buffer.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From paul at prescod.net  Wed Aug 30 17:16:39 2006
From: paul at prescod.net (Paul Prescod)
Date: Wed, 30 Aug 2006 08:16:39 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <ca471dc20608292018s1310eef6k11509048af229be1@mail.gmail.com>
References: <20060829111904.1B12.JCARLSON@uci.edu>
	<ca471dc20608291255h3972ed51h20df1ae63ca97df@mail.gmail.com>
	<20060829132924.1B15.JCARLSON@uci.edu>
	<44F4ED99.2060408@canterbury.ac.nz>
	<ca471dc20608292018s1310eef6k11509048af229be1@mail.gmail.com>
Message-ID: <1cb725390608300816h2400b0f6s9e5a71656d38673e@mail.gmail.com>

I don't understand. If the difference between a string and a string view is
a difference of VALUES, not TYPES, then the return type is varying based
upon the difference of input types (which you say is okay). Conversely, if
the strings and string views only vary in their values (share a type) then
the return code is only varying in its value (which EVERYBODY thinks is
okay).

Or maybe we're dealing with a third (new?) situation in which the
performance characteristics of a return value are being dictated by the
performance characteristics of the inputs rather than being predictable on
the basis of the types or values.

On 8/29/06, Guido van Rossum <guido at python.org> wrote:
>
> On 8/29/06, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> > Josiah Carlson wrote:
> > > This is changing return types based on variable type,
> >
> > How do you make that out? It seems the opposite to me --
> > Guido is saying that the return type of s+t should *not*
> > depend on whether s or t happens to be a view rather than
> > a real string.
>
> No, I never meant to say that. There's nothing wrong with the type of
> x+y depending on the types of x and y. I meant that s+v, v+s and v+w
> (s being a string, v and w being views) should all return strings
> because -- in general -- they cannot always be views, and I don't want
> the return type to depend on the *value* of the inputs.
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe:
> http://mail.python.org/mailman/options/python-3000/paul%40prescod.net
>


From guido at python.org  Wed Aug 30 17:31:07 2006
From: guido at python.org (Guido van Rossum)
Date: Wed, 30 Aug 2006 08:31:07 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <1cb725390608300816h2400b0f6s9e5a71656d38673e@mail.gmail.com>
References: <20060829111904.1B12.JCARLSON@uci.edu>
	<ca471dc20608291255h3972ed51h20df1ae63ca97df@mail.gmail.com>
	<20060829132924.1B15.JCARLSON@uci.edu>
	<44F4ED99.2060408@canterbury.ac.nz>
	<ca471dc20608292018s1310eef6k11509048af229be1@mail.gmail.com>
	<1cb725390608300816h2400b0f6s9e5a71656d38673e@mail.gmail.com>
Message-ID: <ca471dc20608300831s422b737ei5d35fad380f7f072@mail.gmail.com>

The difference between a string and a view is one of TYPE. (Because
they can have such different performance and memory usage
characteristics, it's not right to treat them as the same type.)

You seem to be misunderstanding what I said. I want the return type
only to depend on the input types. This means that all string and view
concatenations must return strings, not views, because we can always
create a new string, but we cannot always create a new view
representing the concatenation (unless views were to support disjoint
sections, which leads to insanity and the complexity and slowness of
ABC's B-tree string implementation).

Assuming v and w are views: Just like v.lower() must sometimes create
a new string, which implies it must always return a string, v+w must
sometimes create a new string, so it must always return a string.
(It's okay to return an existing string if one with the appropriate
value happens to be lying around nearby; but it's not okay to return
one of the input views, because they're not strings.)
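A toy illustration of the rule (the class below is invented for the example, not a proposed API): a view may share storage with its underlying string, but any operation that can require new storage always returns a real str, regardless of the input values.

```python
class strview:
    """Toy read-only view of a slice of an existing str."""
    def __init__(self, s, start, stop):
        self.s, self.start, self.stop = s, start, stop

    def __str__(self):
        return self.s[self.start:self.stop]

    def __add__(self, other):
        # Concatenation cannot in general be represented as a view of
        # the original storage, so it always returns a real str.
        return str(self) + str(other)

    def lower(self):
        # Likewise: lowering may change characters, so return a str.
        return str(self).lower()

v = strview("hello world", 0, 5)   # view of "hello"
w = strview("hello world", 6, 11)  # view of "world"
```

The return type of `v + w` here depends only on the types of the operands, never on whether the two views happen to be adjacent.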

Hope this clarifies things,

--Guido

On 8/30/06, Paul Prescod <paul at prescod.net> wrote:
> I don't understand. If the difference between a string and a string view is
> a difference of VALUES, not TYPES, then the return type is varying based
> upon the difference of input types (which you say is okay). Conversely, if
> the strings and string views only vary in their values (share a type) then
> the return code is only varying in its value (which EVERYBODY thinks is
> okay).
>
> Or maybe we're dealing with a third (new?) situation in which the
> performance characteristics of a return value is being dictated by the
> performance characteristics of the inputs rather than being predictable on
> the basis of the types or values.
>
>
> On 8/29/06, Guido van Rossum <guido at python.org> wrote:
> >
>  On 8/29/06, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> > Josiah Carlson wrote:
> > > This is changing return types based on variable type,
> >
>  > How do you make that out? It seems the opposite to me --
> > Guido is saying that the return type of s+t should *not*
> > depend on whether s or t happens to be a view rather than
> > a real string.
>
>  No, I never meant to say that. There's nothing wrong with the type of
> x+y depending on the types of x and y. I meant that s+v, v+s and v+w
> (s being a string, v and w being views) should all return strings
> because -- in general -- they cannot always be views, and I don't want
> the return type to depend on the *value* of the inputs.
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
> _______________________________________________
>
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
>  Unsubscribe:
> http://mail.python.org/mailman/options/python-3000/paul%40prescod.net
>
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From paul at prescod.net  Wed Aug 30 18:04:47 2006
From: paul at prescod.net (Paul Prescod)
Date: Wed, 30 Aug 2006 09:04:47 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <ca471dc20608300831s422b737ei5d35fad380f7f072@mail.gmail.com>
References: <20060829111904.1B12.JCARLSON@uci.edu>
	<ca471dc20608291255h3972ed51h20df1ae63ca97df@mail.gmail.com>
	<20060829132924.1B15.JCARLSON@uci.edu>
	<44F4ED99.2060408@canterbury.ac.nz>
	<ca471dc20608292018s1310eef6k11509048af229be1@mail.gmail.com>
	<1cb725390608300816h2400b0f6s9e5a71656d38673e@mail.gmail.com>
	<ca471dc20608300831s422b737ei5d35fad380f7f072@mail.gmail.com>
Message-ID: <1cb725390608300904i735df3fcu73d86a1cba83263f@mail.gmail.com>

Yes, thanks for the clarification. From a type theory point of view there is
nothing stopping string + view returning a view always (even if it is a view
of a new string) but that would have very poor performance characteristics.

On 8/30/06, Guido van Rossum <guido at python.org> wrote:
>
> The difference between a string and a view is one of TYPE. (Because
> they can have such different performance and memory usage
> characteristics, it's not right to treat them as the same type.)
>
> You seem to be misunderstanding what I said. I want the return type
> only to depend on the input types. This means that all string and view
> concatenations must return strings, not views, because we can always
> create a new string, but we cannot always create a new view
> representing the concatenation (unless views were to support disjoint
> sections, which leads to insanity and the complexity and slowness of
> ABC's B-tree string implementation).
>
> Assuming v and w are views: Just like v.lower() must sometimes create
> a new string, which implies it must always return a string, v+w must
> sometimes create a new string, so it must always return a string.
> (It's okay to return an existing string if one with the appropriate
> value happens to be lying around nearby; but it's not okay to return
> one of the input views, because they're not strings.)
>
> Hope this clarifies things,
>
> --Guido
>
> On 8/30/06, Paul Prescod <paul at prescod.net> wrote:
> > I don't understand. If the difference between a string and a string view
> is
> > a difference of VALUES, not TYPES, then the return type is varying based
> > upon the difference of input types (which you say is okay). Conversely,
> if
> > the strings and string views only vary in their values (share a type)
> then
> > the return code is only varying in its value (which EVERYBODY thinks is
> > okay).
> >
> > Or maybe we're dealing with a third (new?) situation in which the
> > performance characteristics of a return value is being dictated by the
> > performance characteristics of the inputs rather than being predictable
> on
> > the basis of the types or values.
> >
> >
> > On 8/29/06, Guido van Rossum <guido at python.org> wrote:
> > >
> >  On 8/29/06, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> > > Josiah Carlson wrote:
> > > > This is changing return types based on variable type,
> > >
> >  > How do you make that out? It seems the opposite to me --
> > > Guido is saying that the return type of s+t should *not*
> > > depend on whether s or t happens to be a view rather than
> > > a real string.
> >
> >  No, I never meant to say that. There's nothing wrong with the type of
> > x+y depending on the types of x and y. I meant that s+v, v+s and v+w
> > (s being a string, v and w being views) should all return strings
> > because -- in general -- they cannot always be views, and I don't want
> > the return type to depend on the *value* of the inputs.
> >
> > --
> > --Guido van Rossum (home page: http://www.python.org/~guido/)
> > _______________________________________________
> >
> > Python-3000 mailing list
> > Python-3000 at python.org
> > http://mail.python.org/mailman/listinfo/python-3000
> >  Unsubscribe:
> > http://mail.python.org/mailman/options/python-3000/paul%40prescod.net
> >
> >
>
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>

From guido at python.org  Wed Aug 30 18:54:38 2006
From: guido at python.org (Guido van Rossum)
Date: Wed, 30 Aug 2006 09:54:38 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <1cb725390608300904i735df3fcu73d86a1cba83263f@mail.gmail.com>
References: <20060829111904.1B12.JCARLSON@uci.edu>
	<ca471dc20608291255h3972ed51h20df1ae63ca97df@mail.gmail.com>
	<20060829132924.1B15.JCARLSON@uci.edu>
	<44F4ED99.2060408@canterbury.ac.nz>
	<ca471dc20608292018s1310eef6k11509048af229be1@mail.gmail.com>
	<1cb725390608300816h2400b0f6s9e5a71656d38673e@mail.gmail.com>
	<ca471dc20608300831s422b737ei5d35fad380f7f072@mail.gmail.com>
	<1cb725390608300904i735df3fcu73d86a1cba83263f@mail.gmail.com>
Message-ID: <ca471dc20608300954w41ec2b99k92b048e833e4a4c6@mail.gmail.com>

I'd phrase it differently -- that would be plain silly. :-)

On 8/30/06, Paul Prescod <paul at prescod.net> wrote:
> Yes, thanks for the clarification. From a type theory point of view there is
> nothing stopping string + view returning a view always (even if it is a view
> of a new string) but that would have very poor performance characteristics.
>
>
> On 8/30/06, Guido van Rossum <guido at python.org> wrote:
> > The difference between a string and a view is one of TYPE. (Because
> > they can have such different performance and memory usage
> > characteristics, it's not right to treat them as the same type.)
> >
> > You seem to be misunderstanding what I said. I want the return type
> > only to depend on the input types. This means that all string and view
> > concatenations must return strings, not views, because we can always
> > create a new string, but we cannot always create a new view
> > representing the concatenation (unless views were to support disjoint
> > sections, which leads to insanity and the complexity and slowness of
> > ABC's B-tree string implementation).
> >
> > Assuming v and w are views: Just like v.lower() must sometimes create
> > a new string, which implies it must always return a string, v+w must
> > sometimes create a new string, so it must always return a string.
> > (It's okay to return an existing string if one with the appropriate
> > value happens to be lying around nearby; but it's not okay to return
> > one of the input views, because they're not strings.)
> >
> > Hope this clarifies things,
> >
> > --Guido
> >
> > On 8/30/06, Paul Prescod <paul at prescod.net> wrote:
> > > I don't understand. If the difference between a string and a string view
> is
> > > a difference of VALUES, not TYPES, then the return type is varying based
> > > upon the difference of input types (which you say is okay). Conversely,
> if
> > > the strings and string views only vary in their values (share a type)
> then
> > > the return code is only varying in its value (which EVERYBODY thinks is
> > > okay).
> > >
> > > Or maybe we're dealing with a third (new?) situation in which the
> > > performance characteristics of a return value is being dictated by the
> > > performance characteristics of the inputs rather than being predictable
> on
> > > the basis of the types or values.
> > >
> > >
> > > On 8/29/06, Guido van Rossum <guido at python.org > wrote:
> > > >
> > >  On 8/29/06, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> > > > Josiah Carlson wrote:
> > > > > This is changing return types based on variable type,
> > > >
> > >  > How do you make that out? It seems the opposite to me --
> > > > Guido is saying that the return type of s+t should *not*
> > > > depend on whether s or t happens to be a view rather than
> > > > a real string.
> > >
> > >  No, I never meant to say that. There's nothing wrong with the type of
> > > x+y depending on the types of x and y. I meant that s+v, v+s and v+w
> > > (s being a string, v and w being views) should all return strings
> > > because -- in general -- they cannot always be views, and I don't want
> > > the return type to depend on the *value* of the inputs.
> > >
> > > --
> > > --Guido van Rossum (home page: http://www.python.org/~guido/)
> > > _______________________________________________
> > >
> > > Python-3000 mailing list
> > > Python-3000 at python.org
> > > http://mail.python.org/mailman/listinfo/python-3000
> > >  Unsubscribe:
> > >
> http://mail.python.org/mailman/options/python-3000/paul%40prescod.net
> > >
> > >
> >
> >
> > --
> > --Guido van Rossum (home page: http://www.python.org/~guido/)
> >
>
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jcarlson at uci.edu  Wed Aug 30 20:25:58 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Wed, 30 Aug 2006 11:25:58 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <1cb725390608300904i735df3fcu73d86a1cba83263f@mail.gmail.com>
References: <ca471dc20608300831s422b737ei5d35fad380f7f072@mail.gmail.com>
	<1cb725390608300904i735df3fcu73d86a1cba83263f@mail.gmail.com>
Message-ID: <20060830091620.1B30.JCARLSON@uci.edu>


"Paul Prescod" <paul at prescod.net> wrote:
> Yes, thanks for the clarification. From a type theory point of view there is
> nothing stopping string + view returning a view always (even if it is a view
> of a new string) but that would have very poor performance characteristics.

It depends.  Assume single-segment views (that's what I've been
implementing).  If you have two non-adjacent views, or a view+string
(for non-empty strings), etc., you need to take the time to construct
the new string, that's a given.  But once you have a string, you could
return either the string, or you could return a full view of the string.
The performance differences are fairly insignificant (I was not able to
measure any).
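A sketch of the single-segment idea (all names hypothetical): two adjacent views over the same underlying string can be merged without copying, while any other combination forces construction of a new string, which can then be returned either directly or wrapped in a full view.

```python
class View:
    """Toy single-segment view: (string, start, stop)."""
    def __init__(self, s, start, stop):
        self.s, self.start, self.stop = s, start, stop

    def materialize(self):
        return self.s[self.start:self.stop]

def concat(a, b):
    # Adjacent views over the same underlying string: reuse the storage.
    if a.s is b.s and a.stop == b.start:
        return View(a.s, a.start, b.stop)
    # Otherwise a new string must be built; return a full view of it.
    joined = a.materialize() + b.materialize()
    return View(joined, 0, len(joined))

s = "hello world"
left, right = View(s, 0, 5), View(s, 5, 11)
```

Only the adjacent case avoids the copy; non-adjacent views or view+string still pay for constructing the new string, as described above.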

Up until this morning I was planning on writing everything such that
constructive manipulation (upper(), __add__, etc.) returned views of
strings. While I still feel it would be more consistent to always return
views, returning strings does let the user know that "this operation may
take a while" by virtue of returning a string.


 - Josiah


From steven.bethard at gmail.com  Wed Aug 30 23:40:55 2006
From: steven.bethard at gmail.com (Steven Bethard)
Date: Wed, 30 Aug 2006 15:40:55 -0600
Subject: [Python-3000] have zip() raise exception for sequences of different
	lengths
Message-ID: <d11dcfba0608301440u34f00311x714d3c1fe94f699a@mail.gmail.com>

A couple Python-3000 threads [1] [2] have indicated that the most
natural use of zip() is with sequences of the same lengths.  I feel
the same way, and run into this all the time.  Because the error would
otherwise pass silently, I usually end up adding checks before each
use of zip() to raise an exception if I accidentally pass in sequences
of different lengths.

Any chance that zip() in Python 3000 could automatically raise an
exception if the sequence lengths are different?  If there's really a
need for a zip that just truncates, maybe that could be moved to
itertools?  I think the equal-length scenario is dramatically more
common, and keeping that error from passing silently would be a good
thing IMHO.

[1] http://mail.python.org/pipermail/python-3000/2006-March/000160.html
[2] http://mail.python.org/pipermail/python-3000/2006-August/003094.html

Steve
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From rhettinger at ewtllc.com  Wed Aug 30 23:52:54 2006
From: rhettinger at ewtllc.com (Raymond Hettinger)
Date: Wed, 30 Aug 2006 14:52:54 -0700
Subject: [Python-3000] have zip() raise exception for sequences of
 different lengths
In-Reply-To: <d11dcfba0608301440u34f00311x714d3c1fe94f699a@mail.gmail.com>
References: <d11dcfba0608301440u34f00311x714d3c1fe94f699a@mail.gmail.com>
Message-ID: <44F608B6.5010209@ewtllc.com>

Steven Bethard wrote:

>A couple Python-3000 threads [1] [2] have indicated that the most
>natural use of zip() is with sequences of the same lengths.  I feel
>the same way, and run into this all the time.  Because the error would
>otherwise pass silently, I usually end up adding checks before each
>use of zip() to raise an exception if I accidentally pass in sequences
>of different lengths.
>
>Any chance that zip() in Python 3000 could automatically raise an
>exception if the sequence lengths are different?  If there's really a
>need for a zip that just truncates, maybe that could be moved to
>itertools?  I think the equal-length scenario is dramatically more
>common, and keeping that error from passing silently would be a good
>thing IMHO.
>
>  
>
-1
 I think this would cause much more harm than good and wreck an 
otherwise easy-to-understand tool.


Raymond




From guido at python.org  Wed Aug 30 23:57:48 2006
From: guido at python.org (Guido van Rossum)
Date: Wed, 30 Aug 2006 14:57:48 -0700
Subject: [Python-3000] have zip() raise exception for sequences of
	different lengths
In-Reply-To: <44F608B6.5010209@ewtllc.com>
References: <d11dcfba0608301440u34f00311x714d3c1fe94f699a@mail.gmail.com>
	<44F608B6.5010209@ewtllc.com>
Message-ID: <ca471dc20608301457l63a906f6occ5c7a00721de7cd@mail.gmail.com>

> Steven Bethard wrote:
> >A couple Python-3000 threads [1] [2] have indicated that the most
> >natural use of zip() is with sequences of the same lengths.  I feel
> >the same way, and run into this all the time.  Because the error would
> >otherwise pass silently, I usually end up adding checks before each
> >use of zip() to raise an exception if I accidentally pass in sequences
> >of different lengths.
> >
> >Any chance that zip() in Python 3000 could automatically raise an
> >exception if the sequence lengths are different?  If there's really a
> >need for a zip that just truncates, maybe that could be moved to
> >itertools?  I think the equal-length scenario is dramatically more
> >common, and keeping that error from passing silently would be a good
> >thing IMHO.

[Raymond]
> -1
>  I think this would cause much more harm than good and wreck an
> otherwise easy-to-understand tool.

Perhaps a compromise could be to add a keyword parameter to request
such an exception? (We could even add three options: truncate, pad,
error, with truncate being the default, and pad being the old map()
and filter() behavior.)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From barry at python.org  Thu Aug 31 00:21:34 2006
From: barry at python.org (Barry Warsaw)
Date: Wed, 30 Aug 2006 18:21:34 -0400
Subject: [Python-3000] have zip() raise exception for sequences of
	different lengths
In-Reply-To: <ca471dc20608301457l63a906f6occ5c7a00721de7cd@mail.gmail.com>
References: <d11dcfba0608301440u34f00311x714d3c1fe94f699a@mail.gmail.com>
	<44F608B6.5010209@ewtllc.com>
	<ca471dc20608301457l63a906f6occ5c7a00721de7cd@mail.gmail.com>
Message-ID: <305688A8-0CFA-4F80-80EA-E3D2343D7226@python.org>


On Aug 30, 2006, at 5:57 PM, Guido van Rossum wrote:

> Perhaps a compromise could be to add a keyword parameter to request
> such an exception? (We could even add three options: truncate, pad,
> error, with truncate being the default, and pad being the old map()
> and filter() behavior.)

Caveat: I don't even know if /I/ like this, but I'll spit it out  
anyway in case it spurs an actual good idea from someone else. :)

What about a keyword argument called 'filler' which can be an n-sized  
sequence or a callable.  If it's a sequence, then when zip arguments  
are exhausted, you pull values for that item from the appropriate  
element of the sequence.  If it's a callable, you call it with the  
items you have and None's for the exhausted ones.  Whatever filler()  
returns, zip returns.  filler() could then splice in whatever values  
it wants.  Yeah 'None' for the missing ones can be ambiguous but oh  
well.

You raise a ValueError if filler is a sequence of size that doesn't  
match the number of zip arguments or if filler() doesn't return an  
appropriately sized sequence.

yeah-okay-dumb-5-minute-idea-ly y'rs,
- -Barry

P.S. OTOH, zip's current semantics never bothered me much in practice.


From rhettinger at ewtllc.com  Thu Aug 31 00:41:08 2006
From: rhettinger at ewtllc.com (Raymond Hettinger)
Date: Wed, 30 Aug 2006 15:41:08 -0700
Subject: [Python-3000] have zip() raise exception for sequences of
 different lengths
In-Reply-To: <ca471dc20608301457l63a906f6occ5c7a00721de7cd@mail.gmail.com>
References: <d11dcfba0608301440u34f00311x714d3c1fe94f699a@mail.gmail.com>	
	<44F608B6.5010209@ewtllc.com>
	<ca471dc20608301457l63a906f6occ5c7a00721de7cd@mail.gmail.com>
Message-ID: <44F61404.8010002@ewtllc.com>


>
> Perhaps a compromise could be to add a keyword parameter to request
> such an exception? (We could even add three options: truncate, pad,
> error, with truncate being the default, and pad being the old map()
> and filter() behavior.)


FWIW, I intend to add an itertool called izip_longest() which allows a 
pad value to be specified.  In deciding to accept that feature request, 
I put a great deal of thought and research into the idea.  Along the 
way, I looked at other languages and found both truncating and padding 
versions of zip but did not find any version that raised an exception.  
IMO, such a provision would foul the waters and complicate the use of an 
otherwise simple function.
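For reference, the padding behavior described here corresponds to what itertools.zip_longest does in modern Python (spelled izip_longest in Python 2):

```python
from itertools import zip_longest  # izip_longest in Python 2

# Exhausted inputs are padded with fillvalue instead of truncating.
pairs = list(zip_longest([1, 2, 3], "ab", fillvalue="-"))
```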

Until now, there have been zero requests for zip() to have exception 
raising behavior.

For Python 3k, I recommend:
* simply replacing zip() with itertools.izip()
* keeping the zip_longest() in a separate module
* punting on an exception raising version

The first covers 99% of use cases.
The second covers a handful of situations that are otherwise difficult 
to deal with.
The third is a YAGNI.


Raymond

 



From steven.bethard at gmail.com  Thu Aug 31 01:33:27 2006
From: steven.bethard at gmail.com (Steven Bethard)
Date: Wed, 30 Aug 2006 17:33:27 -0600
Subject: [Python-3000] have zip() raise exception for sequences of
	different lengths
In-Reply-To: <44F61404.8010002@ewtllc.com>
References: <d11dcfba0608301440u34f00311x714d3c1fe94f699a@mail.gmail.com>
	<44F608B6.5010209@ewtllc.com>
	<ca471dc20608301457l63a906f6occ5c7a00721de7cd@mail.gmail.com>
	<44F61404.8010002@ewtllc.com>
Message-ID: <d11dcfba0608301633i4ef09c89ha4e2b15b7f39dd81@mail.gmail.com>

On 8/30/06, Raymond Hettinger <rhettinger at ewtllc.com> wrote:
> Until now, there have been zero requests for zip() to have exception
> raising behavior.
>
> For Python 3k, I recommend:
> * simply replacing zip() with itertools.izip()
> * keeping the zip_longest() in a separate module
> * punting on an exception raising version
>
> The first covers 99% of use cases.

I guess it depends what you mean by "covers".  If you mean "produces
the correct output for correct input" then yes, it does, but so would
the exception raising one.  I contend that it often does the wrong
thing for incorrect input by silently truncating. To try to give a
fair evaluation of this contention, I looked at some stdlib examples
and tried to classify them:

Examples where different lengths should be an error:

compiler/pycodegen.py:        for i, for_ in
zip(range(len(node.quals)), node.quals):
dis.py:    for byte_incr, line_incr in zip(byte_increments, line_increments):
email/Header.py:        return zip(chunks, [charset]*len(chunks))
filecmp.py:        a = dict(izip(imap(os.path.normcase,
self.left_list), self.left_list))
idlelib/keybindingDialog.py:        for modifier, variable in
zip(self.modifiers, self.modifier_vars):

Examples where truncation is needed:

csv.py:        d = dict(zip(self.fieldnames, row))
idlelib/EditorWindow.py:            for i, file in zip(count(), rf_list):

A couple of the examples (pycodegen.py, EditorWindow.py) are really
just performing a poor-man's enumerate(), but with a cursory glance it
still looks to me like there are more cases in the stdlib where it is
a programming error to have lists of different sizes.

If changing zip()'s behavior to match the most common use case is
totally out, the stdlib code at least argues for adding something like
itertools.izip_exact().

Steve
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From steven.bethard at gmail.com  Thu Aug 31 01:56:32 2006
From: steven.bethard at gmail.com (Steven Bethard)
Date: Wed, 30 Aug 2006 17:56:32 -0600
Subject: [Python-3000] have zip() raise exception for sequences of
	different lengths
In-Reply-To: <44F608B6.5010209@ewtllc.com>
References: <d11dcfba0608301440u34f00311x714d3c1fe94f699a@mail.gmail.com>
	<44F608B6.5010209@ewtllc.com>
Message-ID: <d11dcfba0608301656kb599177t548e25e098de3c47@mail.gmail.com>

On 8/30/06, Raymond Hettinger <rhettinger at ewtllc.com> wrote:
> Steven Bethard wrote:
>
> >A couple Python-3000 threads [1] [2] have indicated that the most
> >natural use of zip() is with sequences of the same lengths.  I feel
> >the same way, and run into this all the time.  Because the error would
> >otherwise pass silently, I usually end up adding checks before each
> >use of zip() to raise an exception if I accidentally pass in sequences
> >of different lengths.
> >
> >Any chance that zip() in Python 3000 could automatically raise an
> >exception if the sequence lengths are different?  If there's really a
> >need for a zip that just truncates, maybe that could be moved to
> >itertools?  I think the equal-length scenario is dramatically more
> >common, and keeping that error from passing silently would be a good
> >thing IMHO.
>
> -1
>  I think this would cause much more harm than good and wreck an
> otherwise easy-to-understand tool.

Current documentation:

zip([iterable, ...])
    This function returns a list of tuples, where the i-th tuple
contains the i-th element from each of the argument sequences or
iterables. The returned list is truncated in length to the length of
the shortest argument sequence...

Proposed change:

zip([iterable, ...])
    This function returns a list of tuples, where the i-th tuple
contains the i-th element from each of the argument sequences or
iterables. It is an error if the argument sequences are of different
lengths...

That seems pretty comparable in complexity to me.  Could you explain
how this makes zip() harder to understand?

Steve
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From rhettinger at ewtllc.com  Thu Aug 31 01:58:04 2006
From: rhettinger at ewtllc.com (Raymond Hettinger)
Date: Wed, 30 Aug 2006 16:58:04 -0700
Subject: [Python-3000] have zip() raise exception for sequences
 of	different lengths
In-Reply-To: <d11dcfba0608301633i4ef09c89ha4e2b15b7f39dd81@mail.gmail.com>
References: <d11dcfba0608301440u34f00311x714d3c1fe94f699a@mail.gmail.com>	<44F608B6.5010209@ewtllc.com>	<ca471dc20608301457l63a906f6occ5c7a00721de7cd@mail.gmail.com>	<44F61404.8010002@ewtllc.com>
	<d11dcfba0608301633i4ef09c89ha4e2b15b7f39dd81@mail.gmail.com>
Message-ID: <44F6260C.1040502@ewtllc.com>


>If changing zip()'s behavior to match the most common use case is
>totally out, the stdlib code at least argues for adding something like
>itertools.izip_exact().
>  
>

I'm open to that.

For the time being, let's do this.  Add itertools.izip_longest() in 
Py2.5 and include a recipe for izip_exact() and see if anyone cares 
enough to ever use it.  The new any() and all() functions started out as 
recipes and graduated when their popularity was shown.  If izip_exact() 
proves its worth, then I would be happy to add it as a tool.


Raymond


From greg.ewing at canterbury.ac.nz  Thu Aug 31 01:59:07 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 31 Aug 2006 11:59:07 +1200
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <20060830091620.1B30.JCARLSON@uci.edu>
References: <ca471dc20608300831s422b737ei5d35fad380f7f072@mail.gmail.com>
	<1cb725390608300904i735df3fcu73d86a1cba83263f@mail.gmail.com>
	<20060830091620.1B30.JCARLSON@uci.edu>
Message-ID: <44F6264B.4000005@canterbury.ac.nz>

Josiah Carlson wrote:

> Up until this morning I was planning on writing everything such that
> constructive manipulation (upper(), __add__, etc.) returned views of
> strings.

I was about to say that this would be completely pointless,
when I realised the point is so that further operations on
these results would return views of them. In Josiah's
views-always-return-views world, that would actually make
sense -- but only if we really wanted such a world.

To my mind, the use of views is to temporarily call out
a part of a string for the purpose of applying some
other operation to it. Views will therefore be
short-lived objects that you won't want to keep and
pass around. I suspect that, if views are the default
result of anything done to a view, one will almost
always be doing a str() on the result to turn it back
into a non-view. If that's the case, then returning
views would be the wrong default.

--
Greg

From rhettinger at ewtllc.com  Thu Aug 31 02:03:17 2006
From: rhettinger at ewtllc.com (Raymond Hettinger)
Date: Wed, 30 Aug 2006 17:03:17 -0700
Subject: [Python-3000] have zip() raise exception for sequences
 of	different lengths
In-Reply-To: <d11dcfba0608301656kb599177t548e25e098de3c47@mail.gmail.com>
References: <d11dcfba0608301440u34f00311x714d3c1fe94f699a@mail.gmail.com>	<44F608B6.5010209@ewtllc.com>
	<d11dcfba0608301656kb599177t548e25e098de3c47@mail.gmail.com>
Message-ID: <44F62745.60006@ewtllc.com>


>Proposed change:
>
>zip(  	[iterable, ...])
>    This function returns a list of tuples, where the i-th tuple
>contains the i-th element from each of the argument sequences or
>iterables. It is an error if the argument sequences are of different
>lengths...
>
>That seems pretty comparable in complexity to me.  Could you explain
>how this makes zip() harder to understand?
>  
>

It's a PITA because it precludes all of the use cases where the inputs 
ARE intentionally of different lengths (like when one argument supplies 
an infinite iterator):

   for lineno, ts, line in zip(count(1), timestamp(), sys.stdin):
       print 'Line %d, Time %s:  %s' % (lineno, ts, line)


Raymond


From greg.ewing at canterbury.ac.nz  Thu Aug 31 02:06:56 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 31 Aug 2006 12:06:56 +1200
Subject: [Python-3000] have zip() raise exception for sequences of
 different lengths
In-Reply-To: <ca471dc20608301457l63a906f6occ5c7a00721de7cd@mail.gmail.com>
References: <d11dcfba0608301440u34f00311x714d3c1fe94f699a@mail.gmail.com>
	<44F608B6.5010209@ewtllc.com>
	<ca471dc20608301457l63a906f6occ5c7a00721de7cd@mail.gmail.com>
Message-ID: <44F62820.3090206@canterbury.ac.nz>

Guido van Rossum wrote:

> Perhaps a compromise could be to add a keyword parameter to request
> such an exception?

But who is going to bother using such a keyword,
when it's not necessary for correct operation of
the program in the absence of bugs?

> (We could even add three options: truncate, pad,
> error, with truncate being the default, and pad being the old map()
> and filter() behavior.)

This seems to fall foul of the no-constant-parameters
guideline.

--
Greg

From rrr at ronadam.com  Thu Aug 31 03:26:55 2006
From: rrr at ronadam.com (Ron Adam)
Date: Wed, 30 Aug 2006 20:26:55 -0500
Subject: [Python-3000] have zip() raise exception for sequences of
	different lengths
In-Reply-To: <ca471dc20608301457l63a906f6occ5c7a00721de7cd@mail.gmail.com>
References: <d11dcfba0608301440u34f00311x714d3c1fe94f699a@mail.gmail.com>	<44F608B6.5010209@ewtllc.com>
	<ca471dc20608301457l63a906f6occ5c7a00721de7cd@mail.gmail.com>
Message-ID: <ed5e2k$r7f$1@sea.gmane.org>

Guido van Rossum wrote:

> Perhaps a compromise could be to add a keyword parameter to request
> such an exception? (We could even add three options: truncate, pad,
> error, with truncate being the default, and pad being the old map()
> and filter() behavior.)

Maybe it can be done with just two optional keywords.


If 'match' is True, raise an error if iterables are mismatched.

if a 'pad' is specified then pad, else truncate.

The current truncating behavior would be the default.


    Ron
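
[A minimal sketch of how Ron's two keywords could behave, written as a
wrapper in today's Python.  The function name `zip_flex` and the
`_TRUNCATE` sentinel are invented for illustration; only the `match`
and `pad` semantics come from the proposal above.]

```python
_TRUNCATE = object()  # sentinel meaning "no pad value was given"

def zip_flex(*iterables, match=False, pad=_TRUNCATE):
    # Default: truncate at the shortest input, like the built-in zip().
    # match=True: raise if the inputs have different lengths.
    # pad=<value>: pad shorter inputs out to the longest.
    iterators = [iter(it) for it in iterables]
    while iterators:
        row, exhausted = [], 0
        for it in iterators:
            try:
                row.append(next(it))
            except StopIteration:
                exhausted += 1
                row.append(pad)
        if exhausted == len(iterators):
            return          # all inputs ended together
        if exhausted:
            if match:
                raise ValueError("iterables have different lengths")
            if pad is _TRUNCATE:
                return      # current truncating behaviour
        yield tuple(row)
```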









From jcarlson at uci.edu  Thu Aug 31 04:20:06 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Wed, 30 Aug 2006 19:20:06 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <44F6264B.4000005@canterbury.ac.nz>
References: <20060830091620.1B30.JCARLSON@uci.edu>
	<44F6264B.4000005@canterbury.ac.nz>
Message-ID: <20060830185158.1B3F.JCARLSON@uci.edu>


Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Josiah Carlson wrote:
> 
> > Up until this morning I was planning on writing everything such that
> > constructive manipulation (upper(), __add__, etc.) returned views of
> > strings.
> 
> I was about to say that this would be completely pointless,
> when I realised the point is so that further operations on
> these results would return views of them. In Josiah's
> views-always-return-views world, that would actually make
> sense -- but only if we really wanted such a world.

Code wise, it could easily be a keyword argument on construction.


> To my mind, the use of views is to temporarily call out
> a part of a string for the purpose of applying some
> other operation to it. Views will therefore be
> short-lived objects that you won't want to keep and
> pass around. I suspect that, if views are the default
> result of anything done to a view, one will almost
> always be doing a str() on the result to turn it back
> into a non-view. If that's the case, then returning
> views would be the wrong default.

If views are always returned, then we can perform some optimizations
(adjacent view concatenation, etc.), which may reduce running time,
memory use, etc.  If the user *needs* a string to be returned, they can
always perform str(view).  But remember, since 2.x strings are going
away in 3.x, then it would really be bytes(view).  I've looked through
the methods available to them, and I'm happy that views are gaining
traction, if only so that I can get view(bytes).partition() .

If we always return strings (or bytes in 3.x), then all of those
optimizations are lost.  I'm writing them with optimizations, but they
can certainly be removed later.

Oh, and I've only got about 15 of the 60+ methods left to implement.


 - Josiah


From talin at acm.org  Thu Aug 31 04:35:48 2006
From: talin at acm.org (Talin)
Date: Wed, 30 Aug 2006 19:35:48 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <20060830185158.1B3F.JCARLSON@uci.edu>
References: <20060830091620.1B30.JCARLSON@uci.edu>	<44F6264B.4000005@canterbury.ac.nz>
	<20060830185158.1B3F.JCARLSON@uci.edu>
Message-ID: <44F64B04.9080200@acm.org>

Josiah Carlson wrote:

> If views are always returned, then we can perform some optimizations
> (adjacent view concatenation, etc.), which may reduce running time,
> memory use, etc.  If the user *needs* a string to be returned, they can
> always perform str(view).  But remember, since 2.x strings are going
> away in 3.x, then it would really be bytes(view).  I've looked through
> the methods available to them, and I'm happy that views are gaining
> traction, if only so that I can get view(bytes).partition() .

I know this was shot down before, but I would still like to see a 
"characters" type - that is, a mutable sequence of wide characters, much 
like the Java StringBuffer class - to go along with "bytes". From my 
perspective, it makes perfect sense to have an "array of character" type 
as well as an "array of byte" type, and since the "array of byte" is 
simply called "bytes", then by extension the "array of character" type 
would be called "characters".

Of course, both the 'array' and 'list' types already give you that, but 
"characters" would have additional string-like methods. (However since 
it is mutable, it would not be capable of producing views.)

The 'characters' data type would be particularly optimized for 
character-at-a-time operations, i.e. building up a string one character 
at a time. An example use would be processing escape sequences in 
strings, where you are transforming the escaped string into its 
non-escaped equivalent.

-- Talin

From guido at python.org  Thu Aug 31 05:01:04 2006
From: guido at python.org (Guido van Rossum)
Date: Wed, 30 Aug 2006 20:01:04 -0700
Subject: [Python-3000] have zip() raise exception for sequences of
	different lengths
In-Reply-To: <ed5e2k$r7f$1@sea.gmane.org>
References: <d11dcfba0608301440u34f00311x714d3c1fe94f699a@mail.gmail.com>
	<44F608B6.5010209@ewtllc.com>
	<ca471dc20608301457l63a906f6occ5c7a00721de7cd@mail.gmail.com>
	<ed5e2k$r7f$1@sea.gmane.org>
Message-ID: <ca471dc20608302001x6e32c7bal23650fafb8224ebc@mail.gmail.com>

Actually given Raymond's preferences I take it back

On 8/30/06, Ron Adam <rrr at ronadam.com> wrote:
> Guido van Rossum wrote:
>
> > Perhaps a compromise could be to add a keyword parameter to request
> > such an exception? (We could even add three options: truncate, pad,
> > error, with truncate being the default, and pad being the old map()
> > and filter() behavior.)
>
> Maybe it can be done with just two optional keywords.
>
>
> If 'match' is True, raise an error if iterables are mismatched.
>
> if a 'pad' is specified then pad, else truncate.
>
> The current truncating behavior would be the default.
>
>
>     Ron
>
>
>
>
>
>
>
>
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Thu Aug 31 05:05:26 2006
From: guido at python.org (Guido van Rossum)
Date: Wed, 30 Aug 2006 20:05:26 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <44F64B04.9080200@acm.org>
References: <20060830091620.1B30.JCARLSON@uci.edu>
	<44F6264B.4000005@canterbury.ac.nz>
	<20060830185158.1B3F.JCARLSON@uci.edu> <44F64B04.9080200@acm.org>
Message-ID: <ca471dc20608302005g68e44a05p8c0b041926590786@mail.gmail.com>

On 8/30/06, Talin <talin at acm.org> wrote:
> I know this was shot down before, but I would still like to see a
> "characters" type - that is, a mutable sequence of wide characters, much
> like the Java StringBuffer class - to go along with "bytes". From my
> perspective, it makes perfect sense to have an "array of character" type
> as well as an "array of byte" type, and since the "array of byte" is
> simply called "bytes", then by extension the "array of character" type
> would be called "characters".
>
> Of course, both the 'array' and 'list' types already give you that, but
> "characters" would have additional string-like methods. (However since
> it is mutable, it would not be capable of producing views.)
>
> The 'characters' data type would be particularly optimized for
> character-at-a-time operations, i.e. building up a string one character
> at a time. An example use would be processing escape sequences in
> strings, where you are transforming the escaped string into its
> non-escaped equivalent.

The array module was always usable for this purpose (even for Unicode
characters) but it doesn't seem to have gotten any traction. So it
sounds like a YAGNI to me.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From steven.bethard at gmail.com  Thu Aug 31 05:32:14 2006
From: steven.bethard at gmail.com (Steven Bethard)
Date: Wed, 30 Aug 2006 21:32:14 -0600
Subject: [Python-3000] have zip() raise exception for sequences of
	different lengths
In-Reply-To: <44F6260C.1040502@ewtllc.com>
References: <d11dcfba0608301440u34f00311x714d3c1fe94f699a@mail.gmail.com>
	<44F608B6.5010209@ewtllc.com>
	<ca471dc20608301457l63a906f6occ5c7a00721de7cd@mail.gmail.com>
	<44F61404.8010002@ewtllc.com>
	<d11dcfba0608301633i4ef09c89ha4e2b15b7f39dd81@mail.gmail.com>
	<44F6260C.1040502@ewtllc.com>
Message-ID: <d11dcfba0608302032q93fa6a1h99cbd52b7e6cdfa0@mail.gmail.com>

On 8/30/06, Raymond Hettinger <rhettinger at ewtllc.com> wrote:
> >If changing zip()'s behavior to match the most common use case is
> >totally out, the stdlib code at least argues for adding something like
> >itertools.izip_exact().
>
> I open to that.
>
> For this time being, let's do this.  Add itertools.izip_longest() in
> Py2.5 and include a recipe for izip_exact() and see if anyone cares
> enough to ever use it.  The new any() and all() functions started out as
> recipes and graduated when their popularity was shown.  If izip_exact()
> proves its worth, then I would be happy to add it as a tool.

Fair enough.  Michael Chermside provided a recipe here:

http://mail.python.org/pipermail/python-3000/2006-March/000160.html

Maybe there's a cleaner way to write this, but I couldn't spot one off-hand.
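
[For readers without the archive handy, one possible shape of such a
recipe — a sketch only, not necessarily Chermside's version:]

```python
def izip_exact(*iterables):
    # Like zip(), but raise ValueError if the inputs do not all
    # run out on the same iteration.
    sentinel = object()
    iterators = [iter(it) for it in iterables]
    while iterators:
        row = [next(it, sentinel) for it in iterators]
        if all(item is sentinel for item in row):
            return                      # all exhausted together
        if any(item is sentinel for item in row):
            raise ValueError("iterables have different lengths")
        yield tuple(row)
```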

STeVe
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From jcarlson at uci.edu  Thu Aug 31 05:41:24 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Wed, 30 Aug 2006 20:41:24 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <44F64B04.9080200@acm.org>
References: <20060830185158.1B3F.JCARLSON@uci.edu> <44F64B04.9080200@acm.org>
Message-ID: <20060830203044.1B42.JCARLSON@uci.edu>


Talin <talin at acm.org> wrote:
> I know this was shot down before, but I would still like to see a 
> "characters" type - that is, a mutable sequence of wide characters, much 
> like the Java StringBuffer class - to go along with "bytes". From my 
> perspective, it makes perfect sense to have an "array of character" type 
> as well as an "array of byte" type, and since the "array of byte" is 
> simply called "bytes", then by extension the "array of character" type 
> would be called "characters".

If the buffer API offered information about the size of each element,
similar to what the proposed 'array API' offers, this would just
be one of the supportable cases.  Views could offer the ability to
specify the size of each element during construction (8, 16, or 32 bits),
but variant methods for handling everything would need to be constructed.

> Of course, both the 'array' and 'list' types already give you that, but 
> "characters" would have additional string-like methods. (However since 
> it is mutable, it would not be capable of producing views.)

The view object I have now supports mutable and resizable objects (like
bytes and array).

> The 'characters' data type would be particularly optimized for 
> character-at-a-time operations, i.e. building up a string one character 
> at a time. An example use would be processing escape sequences in 
> strings, where you are transforming the escaped string into its 
> non-escaped equivalent.

That is already possible with array.array('H', ...) or array.array('L', ...),
depending on the unicode width of your platform.  Array performs a more
conservative reallocation strategy (1/16 rather than 1/8), but it seems
to work well enough.  Combine array with wide character support in views,
and we could very well have the functionality that you desire.
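
[For Talin's escape-processing example, the array-based approach looks
roughly like this.  The helper name `unescape` and the single `\n`
escape are illustrative only; 'H' holds 16-bit code points, as Josiah
suggests above.]

```python
from array import array

def unescape(s):
    # Build the result one character at a time in a mutable array of
    # 16-bit code points ('H'), then join back into a string.
    buf = array('H')
    i = 0
    while i < len(s):
        if s[i] == '\\' and i + 1 < len(s) and s[i + 1] == 'n':
            buf.append(ord('\n'))
            i += 2
        else:
            buf.append(ord(s[i]))
            i += 1
    return ''.join(map(chr, buf))
```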

 - Josiah


From bob at redivi.com  Thu Aug 31 05:56:03 2006
From: bob at redivi.com (Bob Ippolito)
Date: Wed, 30 Aug 2006 20:56:03 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <ed3iq2$9iv$1@sea.gmane.org>
References: <20060827184941.1AE8.JCARLSON@uci.edu> <ed1q7r$v4s$2@sea.gmane.org>
	<20060829102307.1B0F.JCARLSON@uci.edu> <ed1uds$iog$1@sea.gmane.org>
	<ed3iq2$9iv$1@sea.gmane.org>
Message-ID: <6a36e7290608302056v4b0e68abrfe0c5b1fc927ff@mail.gmail.com>

On 8/30/06, Fredrik Lundh <fredrik at pythonware.com> wrote:
> Fredrik Lundh wrote:
>
> > not necessarily, but there are lots of issues involved when doing
> > high-performance XML stuff, and I'm not sure views would help quite as
> > much as one might think.
> >
> > (writing and tuning cET was a great way to learn that not everything
> > that you think you know about C performance applies to C code running
> > inside the Python interpreter...)
>
> and also based on the cET (and NFS) experiences, it wouldn't surprise me
> if a naive 32-bit text string implementation will, on average, slow things down
> *more* than any string view implementation can speed things up again...
>
> (in other words, I'm convinced that we need a polymorphic string type.  I'm not
> so sure we need views, but if we have the former, we can use that mechanism to
> support the latter)

+1 for polymorphic strings.

This would give us the best of both worlds: compact representations
for ASCII and Latin-1, full 32-bit text when needed, and the
possibility to implement further optimizations when necessary. It
could add a bit of complexity and/or a massive speed penalty
(depending on how naive the implementation is) around character
operations though.

For implementation ideas, Apple's CoreFoundation has a mature
implementation of polymorphic strings in C (which is the basis for
their NSString type in Objective-C), and there's a cross-platform
subset of it available as CF-Lite:
http://developer.apple.com/opensource/cflite.html

-bob

From jackdied at jackdied.com  Thu Aug 31 06:00:41 2006
From: jackdied at jackdied.com (Jack Diederich)
Date: Thu, 31 Aug 2006 00:00:41 -0400
Subject: [Python-3000] have zip() raise exception for sequences of
	different lengths
In-Reply-To: <ca471dc20608302001x6e32c7bal23650fafb8224ebc@mail.gmail.com>
References: <d11dcfba0608301440u34f00311x714d3c1fe94f699a@mail.gmail.com>
	<44F608B6.5010209@ewtllc.com>
	<ca471dc20608301457l63a906f6occ5c7a00721de7cd@mail.gmail.com>
	<ed5e2k$r7f$1@sea.gmane.org>
	<ca471dc20608302001x6e32c7bal23650fafb8224ebc@mail.gmail.com>
Message-ID: <20060831040041.GF6257@performancedrivers.com>

No need to take it back; as a long-time python-* list reader I only took
your initial post as thinking out loud.

List readers can spot similar threads in the future by looking for these
three indicators:

1) Behavioral function arguments are discouraged and mostly on your say-so.

2) You didn't top post, so it wasn't a pronouncement.

3) Long time readers were sure enough of #1 and #2 that no one added a 
   "GOOD GOD NO" reply

top-posting-ly,

-Jack

On Wed, Aug 30, 2006 at 08:01:04PM -0700, Guido van Rossum wrote:
> Actually given Raymond's preferences I take it back
> 
> On 8/30/06, Ron Adam <rrr at ronadam.com> wrote:
> > Guido van Rossum wrote:
> >
> > > Perhaps a compromise could be to add a keyword parameter to request
> > > such an exception? (We could even add three options: truncate, pad,
> > > error, with truncate being the default, and pad being the old map()
> > > and filter() behavior.)
> >
> > Maybe it can be done with just two optional keywords.
> >
> >
> > If 'match' is True, raise an error if iterables are mismatched.
> >
> > if a 'pad' is specified then pad, else truncate.
> >
> > The current truncating behavior would be the default.
> >
> >
> >     Ron
> >
> >
> >
> >
> >
> >
> >
> >
> > _______________________________________________
> > Python-3000 mailing list
> > Python-3000 at python.org
> > http://mail.python.org/mailman/listinfo/python-3000
> > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
> >
> 
> 
> -- 
> --Guido van Rossum (home page: http://www.python.org/~guido/)
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/jack%40performancedrivers.com
> 

From rrr at ronadam.com  Thu Aug 31 06:27:27 2006
From: rrr at ronadam.com (Ron Adam)
Date: Wed, 30 Aug 2006 23:27:27 -0500
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <20060830185158.1B3F.JCARLSON@uci.edu>
References: <20060830091620.1B30.JCARLSON@uci.edu>	<44F6264B.4000005@canterbury.ac.nz>
	<20060830185158.1B3F.JCARLSON@uci.edu>
Message-ID: <ed5ol4$ktv$1@sea.gmane.org>

Josiah Carlson wrote:

> If views are always returned, then we can perform some optimizations
> (adjacent view concatenation, etc.), which may reduce running time,
> memory use, etc.

Given an empty string and a view to it, how much memory do you think a 
view object will take in comparison to the string object?

Wouldn't there be a minimum size of a string where it would be better to 
just copy the string?










From jack at psynchronous.com  Thu Aug 31 06:43:54 2006
From: jack at psynchronous.com (Jack Diederich)
Date: Thu, 31 Aug 2006 00:43:54 -0400
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <6a36e7290608302056v4b0e68abrfe0c5b1fc927ff@mail.gmail.com>
References: <20060827184941.1AE8.JCARLSON@uci.edu> <ed1q7r$v4s$2@sea.gmane.org>
	<20060829102307.1B0F.JCARLSON@uci.edu>
	<ed1uds$iog$1@sea.gmane.org> <ed3iq2$9iv$1@sea.gmane.org>
	<6a36e7290608302056v4b0e68abrfe0c5b1fc927ff@mail.gmail.com>
Message-ID: <20060831044354.GH6257@performancedrivers.com>

On Wed, Aug 30, 2006 at 08:56:03PM -0700, Bob Ippolito wrote:
> On 8/30/06, Fredrik Lundh <fredrik at pythonware.com> wrote:
> > Fredrik Lundh wrote:
> >
> > > not necessarily, but there are lots of issues involved when doing
> > > high-performance XML stuff, and I'm not sure views would help quite as
> > > much as one might think.
> > >
> > > (writing and tuning cET was a great way to learn that not everything
> > > that you think you know about C performance applies to C code running
> > > inside the Python interpreter...)
> >
> > and also based on the cET (and NFS) experiences, it wouldn't surprise me
> > if a naive 32-bit text string implementation will, on average, slow things down
> > *more* than any string view implementation can speed things up again...
> >
> > (in other words, I'm convinced that we need a polymorphic string type.  I'm not
> > so sure we need views, but if we have the former, we can use that mechanism to
> > support the latter)
> 
> +1 for polymorphic strings.
> 
> This would give us the best of both worlds: compact representations
> for ASCII and Latin-1, full 32-bit text when needed, and the
> possibility to implement further optimizations when necessary. It
> could add a bit of complexity and/or a massive speed penalty
> (depending on how naive the implementation is) around character
> operations though.
> 
> For implementation ideas, Apple's CoreFoundation has a mature
> implementation of polymorphic strings in C (which is the basis for
> their NSString type in Objective-C), and there's a cross-platform
> subset of it available as CF-Lite:
> http://developer.apple.com/opensource/cflite.html
> 

Having watched Fredrik casually double the speed of many str and unicode 
operations in a week, I'm easily +1 on whatever he says.  Bob's support 
makes that a +2; he struck me as quite sane too.

That said, can you guys expand on what polymorphic[1] means here in particular?
Python-wise I can only think of the str/unicode/buffer split.  If the 
fraternity of strings doesn't include views (which I haven't needed either),
what are you considering for the other kinds?

-Jack

[1] My ten pound Webster's says
    "An organism having more than one adult form, as the different castes 
    in social ants" which is close enough to what I think the comp sci
    definition is.

From jcarlson at uci.edu  Thu Aug 31 07:23:05 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Wed, 30 Aug 2006 22:23:05 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <ed5ol4$ktv$1@sea.gmane.org>
References: <20060830185158.1B3F.JCARLSON@uci.edu> <ed5ol4$ktv$1@sea.gmane.org>
Message-ID: <20060830220511.1B45.JCARLSON@uci.edu>


Ron Adam <rrr at ronadam.com> wrote:
> 
> Josiah Carlson wrote:
> 
> > If views are always returned, then we can perform some optimizations
> > (adjacent view concatenation, etc.), which may reduce running time,
> > memory use, etc.
> 
> Given a empty string and a view to it, how much memory do you think a 
> view object will take in comparison to the string object?

On 32 bit platforms, the current implementation uses 8 more bytes than a
Python 2.4 buffer, or 44 bytes rather than 36.  The base string object
takes up at least 24 bytes (for strings of length 2-4, all length 1 and
0 strings are interned).

> Wouldn't there be a minimum size of a string where it would be better to 
> just copy the string?

What do you mean by "better"?  If your question is: at what size would
returning a Python 2.x string be more space efficient than the current
view implementation, that would be a string of up to 24 bytes.

However, as I said before, with views we can do adjacent view
concatenation...

    x,y,z = view.partition(a)
    left_with_sep = x+y
    right_with_sep = y+z

If we returned views from view addition, then both of the additions
above would be constant time operations.  But if we returned strings
from view additions, the above two additions would run in O(n) time
together.

If we were really crazy, we could even handle non-adjacent view
concatenation by checking the readonly flag, and examining data to the
right of the current view.  But even I'm not that crazy.
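
[The constant-time adjacent concatenation can be illustrated with a toy
view class.  This is a sketch, not Josiah's implementation, and
`partition_view` is an invented helper standing in for
`view.partition()`.]

```python
class View:
    # Toy read-only string view: (base, start, stop).  Adjacent views
    # over the same base concatenate in O(1) by widening the bounds.
    def __init__(self, base, start=0, stop=None):
        self.base = base
        self.start = start
        self.stop = len(base) if stop is None else stop

    def __add__(self, other):
        if self.base is other.base and self.stop == other.start:
            return View(self.base, self.start, other.stop)  # O(1), no copy
        return View(str(self) + str(other))                 # O(n) fallback

    def __str__(self):
        return self.base[self.start:self.stop]

def partition_view(v, sep):
    # Stand-in for view.partition(sep): returns three views that all
    # share the same base string.
    i = str(v).find(sep)
    if i < 0:
        return v, View('', 0, 0), View('', 0, 0)
    a = v.start + i
    return (View(v.base, v.start, a),
            View(v.base, a, a + len(sep)),
            View(v.base, a + len(sep), v.stop))
```

With views returned from addition, `x+y` and `y+z` below merely widen
the bounds instead of copying characters.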


 - Josiah


From talin at acm.org  Thu Aug 31 07:36:43 2006
From: talin at acm.org (Talin)
Date: Wed, 30 Aug 2006 22:36:43 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <20060830203044.1B42.JCARLSON@uci.edu>
References: <20060830185158.1B3F.JCARLSON@uci.edu> <44F64B04.9080200@acm.org>
	<20060830203044.1B42.JCARLSON@uci.edu>
Message-ID: <44F6756B.2080606@acm.org>

Josiah Carlson wrote:
> Talin <talin at acm.org> wrote:
>> The 'characters' data type would be particularly optimized for 
>> character-at-a-time operations, i.e. building up a string one character 
>> at a time. An example use would be processing escape sequences in 
>> strings, where you are transforming the escaped string into its 
>> non-escaped equivalent.
> 
> That is already possible with array.array('H', ...) or array.array('L', ...),
> depending on the unicode width of your platform.  Array performs a more
> conservative reallocation strategy (1/16 rather than 1/8), but it seems
> to work well enough.  Combine array with wide character support in views,
> and we could very well have the functionality that you desire.

Well, one of the things I wanted to be able to do is:

    'characters += str'

Or more precisely:

    token_buf = characters()
    token_buf += "example"
    token_buf += "\n"
    print token_buf
    >>> "example\n"

Now, an ordinary list would concatenate the string *object* onto the end 
of the list; whereas the character array would concatenate the string 
characters to the end of the character array. Also note that the __str__ 
method of the character array returns a vanilla string object of its 
contents.

(What I am describing here is exactly the behavior of Java StringBuffer.)
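
[In Python terms, the behaviour Talin describes could be sketched like
this.  The class name and API follow his proposal; no such type exists
in the stdlib.]

```python
class Characters:
    # Mutable character buffer in the spirit of Java's StringBuffer.
    def __init__(self, initial=''):
        self._chars = list(initial)

    def __iadd__(self, text):
        # += splices in the *characters* of the string, not the
        # string object itself (contrast with list.append).
        self._chars.extend(text)
        return self

    def __str__(self):
        # Return a vanilla immutable string of the contents.
        return ''.join(self._chars)
```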

-- Talin

From paul at prescod.net  Thu Aug 31 10:05:18 2006
From: paul at prescod.net (Paul Prescod)
Date: Thu, 31 Aug 2006 01:05:18 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <20060831044354.GH6257@performancedrivers.com>
References: <20060827184941.1AE8.JCARLSON@uci.edu> <ed1q7r$v4s$2@sea.gmane.org>
	<20060829102307.1B0F.JCARLSON@uci.edu> <ed1uds$iog$1@sea.gmane.org>
	<ed3iq2$9iv$1@sea.gmane.org>
	<6a36e7290608302056v4b0e68abrfe0c5b1fc927ff@mail.gmail.com>
	<20060831044354.GH6257@performancedrivers.com>
Message-ID: <1cb725390608310105j2f8ee298p3a44d91fc91140ad@mail.gmail.com>

On 8/30/06, Jack Diederich <jack at psynchronous.com> wrote:
>
> On Wed, Aug 30, 2006 at 08:56:03PM -0700, Bob Ippolito wrote:
> > > and also based on the cET (and NFS) experiences, it wouldn't surprise
> me
> > > if a naive 32-bit text string implementation will, on average, slow
> things down
> > > *more* than any string view implementation can speed things up
> again...
> > >
> > > (in other words, I'm convinced that we need a polymorphic string
> type.  I'm not
> > > so sure we need views, but if we have the former, we can use that
> mechanism to
> > > support the latter)
> >
> > +1 for polymorphic strings.
> >
> > This would give us the best of both worlds: compact representations
> > for ASCII and Latin-1, full 32-bit text when needed, and the
> > possibility to implement further optimizations when necessary. It
> > could add a bit of complexity and/or a massive speed penalty
> > (depending on how naive the implementation is) around character
> > operations though.
>
> Having watched Fredrik casually double the speed of many str and unicode
> operations in a week I'm easily +1 on whatever he says.  Bob's support
> makes that a +2, he struck me as quite sane too.
>
> That said can you guys expand on what polymorphic[1] means here in
> particular?


I think that Bob alluded to it. They are talking about a string that uses 1
byte-per-character for ASCII text, perhaps two bytes-per-character for a mix
of Greek and Russian text and four bytes-per-character for certain Chinese
or Japanese strings. From the Python programmers' point of view it should be
an invisible optimization.

 Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060831/0f61bc29/attachment-0001.htm 

From fredrik at pythonware.com  Thu Aug 31 10:21:00 2006
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Thu, 31 Aug 2006 10:21:00 +0200
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <20060831044354.GH6257@performancedrivers.com>
References: <20060827184941.1AE8.JCARLSON@uci.edu>
	<ed1q7r$v4s$2@sea.gmane.org>	<20060829102307.1B0F.JCARLSON@uci.edu>	<ed1uds$iog$1@sea.gmane.org>
	<ed3iq2$9iv$1@sea.gmane.org>	<6a36e7290608302056v4b0e68abrfe0c5b1fc927ff@mail.gmail.com>
	<20060831044354.GH6257@performancedrivers.com>
Message-ID: <ed665c$nvs$1@sea.gmane.org>

Jack Diederich wrote:

> That said can you guys expand on what polymorphic[1] means here in particular?
> Python wise I can only think of the str/unicode/buffer split.  If the 
> fraternity of strings doesn't include views (which I haven't needed either)
> what are you considering for the other kinds?

the idea is to allow a given string object to use different kinds of 
storage depending on what data it contains, and how it's being used.

off the top of my head, I'd imagine using at least:

     wide unicode (32-bit)
     8-bit ascii/iso-8859-1
     utf-8

and possibly also one or more of

     narrow unicode (16-bit)
     8-bit encoded (arbitrary 8-bit encodings)
     utf-16
     selected asian encodings

all these look and behave the same at the Python level, as well as when 
using "high-level" C APIs.  ob_type may differ (also during an object's 
lifetime), but type(s) is always the same.

this approach gives you lots of advantages:

- lots of operations can be carried out without having to convert the 
  data (all the formats listed above support forward iteration, and 
  most text-level operations).

- you'll save tons of memory in applications that use text mostly in a 
  few character sets (and less memory means more speed).

- adding (or removing) specific string implementations becomes trivial, 
  both for the core developers and extension writers.

etc.

the main disadvantage is that it becomes a bit more difficult to deal 
with strings at the C level (but properly dealing with both 8-bit and 
Unicode strings is already a pain in the ass, and I'm not sure this has 
to be any harder, just slightly different).
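
as a rough illustration of the storage-selection idea (a toy sketch, not 
an actual design; the function name is made up), the choice of 
representation could hinge on the widest code point in the string:

```python
def narrowest_storage(text):
    # Pick the narrowest fixed-width representation that can hold
    # every code point in the string; a real polymorphic string
    # object would switch between these transparently.
    max_cp = max(map(ord, text)) if text else 0
    if max_cp < 0x100:
        return ("latin-1", 1)   # covers ascii/iso-8859-1
    elif max_cp < 0x10000:
        return ("ucs-2", 2)     # basic multilingual plane
    else:
        return ("ucs-4", 4)     # full 32-bit unicode
```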

for some details on apple's implementation (thanks bob!), see:

https://developer.apple.com/documentation/CoreFoundation/Conceptual/CFStrings/Concepts/StringStorage.html

</F>


From jimjjewett at gmail.com  Thu Aug 31 17:38:59 2006
From: jimjjewett at gmail.com (Jim Jewett)
Date: Thu, 31 Aug 2006 11:38:59 -0400
Subject: [Python-3000] have zip() raise exception for sequences of
	different lengths
In-Reply-To: <305688A8-0CFA-4F80-80EA-E3D2343D7226@python.org>
References: <d11dcfba0608301440u34f00311x714d3c1fe94f699a@mail.gmail.com>
	<44F608B6.5010209@ewtllc.com>
	<ca471dc20608301457l63a906f6occ5c7a00721de7cd@mail.gmail.com>
	<305688A8-0CFA-4F80-80EA-E3D2343D7226@python.org>
Message-ID: <fb6fbf560608310838h24cd17aem187a8326398d7cc2@mail.gmail.com>

On 8/30/06, Barry Warsaw <barry at python.org> wrote:
> On Aug 30, 2006, at 5:57 PM, Guido van Rossum wrote:

> > Perhaps a compromise could be to add a keyword parameter to request
> > such an exception? (We could even add three options: truncate, pad,
> > error, with truncate being the default, and pad being the old map()
> > and filter() behavior.)

> What about a keyword argument called 'filler' which can be an n-sized
> sequence or a callable.

How about a keyword-only argument called finish which is a callable to
deal with the problem?  When any sequence is exhausted, its position
is filled with StopIteration, and then finish(result) is returned.

For example,

    >>> g=zip("abc", (1,2))

The third call to g.next() will return the result of
    finish('c', StopIteration)

def finish_truncate(*args):
    # The default, like today
    raise StopIteration

def finish_error(*args):
    if all(v is StopIteration for v in args):
        raise StopIteration
    # wrap args in a 1-tuple so % formats the whole tuple as one value
    raise ValueError("Mismatched sequence lengths: %r" % (args,))

def finish_padNone(*args):
    if all(v is StopIteration for v in args):
        raise StopIteration
    return tuple((v if v is not StopIteration else None) for v in args)
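
for concreteness, the whole thing could be sketched as a generator in
modern-Python terms (the name zip_finish is invented; the built-in zip()
obviously takes no such argument today):

```python
def zip_finish(*iterables, finish=None):
    # When any iterator is exhausted, its slot is filled with
    # StopIteration and finish(*row) supplies (or refuses) the final
    # value; finish=None reproduces today's truncating zip().
    iters = [iter(it) for it in iterables]
    if not iters:
        return
    while True:
        row, exhausted = [], 0
        for it in iters:
            try:
                row.append(next(it))
            except StopIteration:
                row.append(StopIteration)
                exhausted += 1
        if exhausted == 0:
            yield tuple(row)
        elif finish is None:
            return
        else:
            try:
                yield finish(*row)
            except StopIteration:
                return
```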

-jJ

From barry at python.org  Thu Aug 31 17:44:53 2006
From: barry at python.org (Barry Warsaw)
Date: Thu, 31 Aug 2006 11:44:53 -0400
Subject: [Python-3000] have zip() raise exception for sequences of
	different lengths
In-Reply-To: <fb6fbf560608310838h24cd17aem187a8326398d7cc2@mail.gmail.com>
References: <d11dcfba0608301440u34f00311x714d3c1fe94f699a@mail.gmail.com>
	<44F608B6.5010209@ewtllc.com>
	<ca471dc20608301457l63a906f6occ5c7a00721de7cd@mail.gmail.com>
	<305688A8-0CFA-4F80-80EA-E3D2343D7226@python.org>
	<fb6fbf560608310838h24cd17aem187a8326398d7cc2@mail.gmail.com>
Message-ID: <FAD7156D-88B3-4B56-B3F7-E6EB0A1EFD40@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 31, 2006, at 11:38 AM, Jim Jewett wrote:

> On 8/30/06, Barry Warsaw <barry at python.org> wrote:
>
>> What about a keyword argument called 'filler' which can be an n-sized
>> sequence or a callable.
>
> How about a keyword-only argument called finish which is a callable to
> deal with the problem?  When any sequence is exhausted, its position
> is filled with StopIteration, and then finish(result) is returned.

Nice!
- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Darwin)

iQCVAwUBRPcD+3EjvBPtnXfVAQKviQP/fEcBu7t2iXEfBom3flvDgcoauJp+/XSS
s2zdIivkQAZgs8kmbtYpk0R4KPyIUhyjHahzcxvUKKXGakfpIl73FBGSK+XfG/iq
IqQ33dW4Gl6YBt9HpOLVd0NP1RWUGl+QNegLP2ihgLoRFi0QK8fBj0FPoxHdHrfu
rIGXwJe6Qlg=
=0PRM
-----END PGP SIGNATURE-----

From rhettinger at ewtllc.com  Thu Aug 31 18:12:44 2006
From: rhettinger at ewtllc.com (Raymond Hettinger)
Date: Thu, 31 Aug 2006 09:12:44 -0700
Subject: [Python-3000] have zip() raise exception for sequences
 of	different lengths
In-Reply-To: <fb6fbf560608310838h24cd17aem187a8326398d7cc2@mail.gmail.com>
References: <d11dcfba0608301440u34f00311x714d3c1fe94f699a@mail.gmail.com>	<44F608B6.5010209@ewtllc.com>	<ca471dc20608301457l63a906f6occ5c7a00721de7cd@mail.gmail.com>	<305688A8-0CFA-4F80-80EA-E3D2343D7226@python.org>
	<fb6fbf560608310838h24cd17aem187a8326398d7cc2@mail.gmail.com>
Message-ID: <44F70A7C.602@ewtllc.com>


>How about a keyword-only argument called finish which is a callable to
>deal with the problem?  When any sequence is exhausted, its position
>is filled with StopIteration, and then finish(result) is returned.
>
>  
>

How about we resist the urge to complicate the snot out of a basic 
looping construct.  Hypergeneralization is more of a sin than premature 
optimization.

It is important that zip() be left as dirt simple as possible.  In the 
tutorial (section 5.6), we're able to use short, simple examples to 
teach all of the fundamental looping techniques to total beginners in a 
way that lets them save their brain power for learning exceptions, 
classes, generators, packages, and whatnot.

Creative talent is being wasted here just to solve a non-problem.   
Please keep Py3k on track for cruft removal. We're seeing way too much 
discussion on random, screwball proposals rather than focusing on what 
really matters:  Keeping the tried and true while removing stuff we've 
always wanted to take away.



Raymond

From guido at python.org  Thu Aug 31 18:29:32 2006
From: guido at python.org (Guido van Rossum)
Date: Thu, 31 Aug 2006 09:29:32 -0700
Subject: [Python-3000] have zip() raise exception for sequences of
	different lengths
In-Reply-To: <44F70A7C.602@ewtllc.com>
References: <d11dcfba0608301440u34f00311x714d3c1fe94f699a@mail.gmail.com>
	<44F608B6.5010209@ewtllc.com>
	<ca471dc20608301457l63a906f6occ5c7a00721de7cd@mail.gmail.com>
	<305688A8-0CFA-4F80-80EA-E3D2343D7226@python.org>
	<fb6fbf560608310838h24cd17aem187a8326398d7cc2@mail.gmail.com>
	<44F70A7C.602@ewtllc.com>
Message-ID: <ca471dc20608310929u15a2e328i97724625913e7f22@mail.gmail.com>

On 8/31/06, Raymond Hettinger <rhettinger at ewtllc.com> wrote:
>
> >How about a keyword-only argument called finish which is a callable to
> >deal with the problem?  When any sequence is exhausted, its position
> >is filled with StopIteration, and then finish(result) is returned.
>
> How about we resist the urge to complicate the snot out of a basic
> looping construct.  Hypergeneralization is more of a sin than premature
> optimization.

Hear, hear! Hypergeneralization adds features you can never get rid of
even though they may only be useful for <1% of the population. At
least unnecessary optimizations can be rolled back safely.

> It is important that zip() be left as dirt simple as possible.  In the
> tutorial (section 5.6), we're able to use short, simple examples to
> teach all of the fundamental looping techniques to total beginners in a
> way that lets them save their brain power for learning exceptions,
> classes, generators, packages, and whatnot.
>
> Creative talent is being wasted here just to solve a non-problem.
> Please keep Py3k on track for cruft removal. We're seeing way too much
> discussion on random, screwball proposals rather than focusing on what
> really matters:  Keeping the tried and true while removing stuff we've
> always wanted to take away.

Amen.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From g.brandl at gmx.net  Thu Aug 31 19:34:59 2006
From: g.brandl at gmx.net (Georg Brandl)
Date: Thu, 31 Aug 2006 19:34:59 +0200
Subject: [Python-3000] have zip() raise exception for sequences of
	different lengths
In-Reply-To: <44F70A7C.602@ewtllc.com>
References: <d11dcfba0608301440u34f00311x714d3c1fe94f699a@mail.gmail.com>	<44F608B6.5010209@ewtllc.com>	<ca471dc20608301457l63a906f6occ5c7a00721de7cd@mail.gmail.com>	<305688A8-0CFA-4F80-80EA-E3D2343D7226@python.org>	<fb6fbf560608310838h24cd17aem187a8326398d7cc2@mail.gmail.com>
	<44F70A7C.602@ewtllc.com>
Message-ID: <ed76k3$dsi$1@sea.gmane.org>

Raymond Hettinger wrote:
>>How about a keyword-only argument called finish which is a callable to
>>deal with the problem?  When any sequence is exhausted, its position
>>is filled with StopIteration, and then finish(result) is returned.
>>
>>  
>>
> 
> How about we resist the urge to complicate the snot out of a basic 
> looping construct.  Hypergeneralization is more of a sin than premature 
> optimization.
> 
> It is important that zip() be left as dirt simple as possible.

Added to PEP 3099.

Georg


From ironfroggy at gmail.com  Thu Aug 31 19:42:57 2006
From: ironfroggy at gmail.com (Calvin Spealman)
Date: Thu, 31 Aug 2006 13:42:57 -0400
Subject: [Python-3000] Exception Expressions
Message-ID: <76fd5acf0608311042k231fb36w1bf5d1e7e4eebe0c@mail.gmail.com>

I thought I felt in the mood for some abuse today, so I'm proposing
something sure to give me plenty of crap, but maybe someone will enjoy
the idea, anyway. This is a step beyond the recently added conditional
expressions. I actually made this up as a joke, explaining at which
point we would have gone too far with branching logic in an
expression. After making the joke, I was sad to realize I didn't mind
the idea and thought I'd see if anyone else doesn't mind it either.

    expr1 except expr2 if exc_type

For example, given a list, letters, of ['a', 'b', 'c'], we would be
able to do the following:

    print letters[7] except "N/A" if IndexError

This would translate to something along the lines of:

    try:
        _tmp = letters[7]
    except IndexError:
        _tmp = "N/A"
    print _tmp

Obviously, the except in an expression has to take precedence over if
expressions, otherwise it would evaluate '"N/A" if IndexError" first.
The syntax can be extended in some ways, to allow for handling
multiple exception types for one result or different results for
different exception types:

    foo() except "Bar or Baz!?" if BarError, BazError
    foo() except "Bar!" if BarError, "Baz!" if BazError

Other example use cases:

    # Fallback on an alternative path
    open(filename) except open(filename2) if IOError

    # Handle divide-by-zero
    while expr != "quit":
        print eval(expr) except "Can not divide by zero!" if ZeroDivisionError
        expr = raw_input()

    # Use a cache when an external resource times out
    db.get(key) except cache.get(key) if TimeoutError

Only very basic exception handling would be useful with this syntax,
so nothing would ever get out of hand, unless someone didn't care
about their code looking good and keeping reasonable line lengths, in
which case their code probably wouldn't look great to begin with.
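
Today the same pattern can only be spelled with a small helper function;
a sketch (the name trap is made up, not an existing builtin):

```python
def trap(thunk, fallback, exc_types=Exception):
    # Evaluate thunk(); on one of the named exceptions, substitute
    # the fallback value instead of propagating.
    try:
        return thunk()
    except exc_types:
        return fallback

letters = ['a', 'b', 'c']
# equivalent to the proposed: letters[7] except "N/A" if IndexError
trap(lambda: letters[7], "N/A", IndexError)   # -> "N/A"
```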

If there is any positive response I'll write up a PEP.

From brett at python.org  Thu Aug 31 20:20:20 2006
From: brett at python.org (Brett Cannon)
Date: Thu, 31 Aug 2006 11:20:20 -0700
Subject: [Python-3000] Exception Expressions
In-Reply-To: <76fd5acf0608311042k231fb36w1bf5d1e7e4eebe0c@mail.gmail.com>
References: <76fd5acf0608311042k231fb36w1bf5d1e7e4eebe0c@mail.gmail.com>
Message-ID: <bbaeab100608311120v67b23b79p15c2d46fe86cbed9@mail.gmail.com>

On 8/31/06, Calvin Spealman <ironfroggy at gmail.com> wrote:
>
> I thought I felt in the mood for some abuse today, so I'm proposing
> something sure to give me plenty of crap, but maybe someone will enjoy
> the idea, anyway.


Never hurts too much to try, huh?  =)  Plus it gives me a break from my
work.

This is a step beyond the recently added conditional
> expressions. I actually made this up as a joke, explaining at which
> point we would have gone too far with branching logic in an
> expression. After making the joke, I was sad to realize I didn't mind
> the idea and thought I'd see if anyone else doesn't mind it either.
>
>     expr1 except expr2 if exc_type
>
> For example, given a list, letters, of ['a', 'b', 'c'], we would be
> able to do the following:
>
>     print letters[7] except "N/A" if IndexError


So this feels like the Perl idiom of using die: ``open(file) or die`` (or
something like that; I have never been a Perl guy so I could be off).

This would translate to something along the lines of:
>
>     try:
>         _tmp = letters[7]
>     except IndexError:
>         _tmp = "N/A"
>     print _tmp
>
> Obviously, the except in an expression has to take precedence over if
> expressions, otherwise it would evaluate '"N/A" if IndexError" first.
> The syntax can be extended in some ways, to allow for handling
> multiple exception types for one result or different results for
> different exception types:
>
>     foo() except "Bar or Baz!?" if BarError, BazError
>     foo() except "Bar!" if BarError, "Baz!" if BazError
>
> Other example use cases:
>
>     # Fallback on an alternative path
>     open(filename) except open(filename2) if IOError
>
>     # Handle divide-by-zero
>     while expr != "quit":
>         print eval(expr) except "Can not divide by zero!" if
> ZeroDivisionError
>         expr = raw_input()
>
>     # Use a cache when an external resource timesout
>     db.get(key) except cache.get(key) if TimeoutError
>
> Only very basic exception handling would be useful with this syntax,
> so nothing would ever get out of hand, unless someone wasn't caring
> about their code looking good and keeping good line lengths, so their
> code probably wouldn't look great to begin with.



The problem I have with this whole proposal is that catching exceptions
should be very obvious in the source code.  This proposal does not help with
that ideal.  So I am -1 on the whole idea.

-Brett
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060831/602811a4/attachment.html 

From jimjjewett at gmail.com  Thu Aug 31 20:21:32 2006
From: jimjjewett at gmail.com (Jim Jewett)
Date: Thu, 31 Aug 2006 14:21:32 -0400
Subject: [Python-3000] Exception Expressions
In-Reply-To: <76fd5acf0608311042k231fb36w1bf5d1e7e4eebe0c@mail.gmail.com>
References: <76fd5acf0608311042k231fb36w1bf5d1e7e4eebe0c@mail.gmail.com>
Message-ID: <fb6fbf560608311121u6c05a6e4o475af21d6bc1e326@mail.gmail.com>

>     expr1 except expr2 if exc_type
...
>     print letters[7] except "N/A" if IndexError

I sort of like it, though I'm more worried than you about ugly code.

There have been many times when I wanted it so that I could use a list
comprehension (or generator comprehension) instead of a function or
block.

The bad news is that I seem to be an anti-channeller, so my interest
is perhaps not a *good* sign.

-jJ

From barry at python.org  Thu Aug 31 20:28:14 2006
From: barry at python.org (Barry Warsaw)
Date: Thu, 31 Aug 2006 14:28:14 -0400
Subject: [Python-3000] have zip() raise exception for sequences
	of	different lengths
In-Reply-To: <44F70A7C.602@ewtllc.com>
References: <d11dcfba0608301440u34f00311x714d3c1fe94f699a@mail.gmail.com>	<44F608B6.5010209@ewtllc.com>	<ca471dc20608301457l63a906f6occ5c7a00721de7cd@mail.gmail.com>	<305688A8-0CFA-4F80-80EA-E3D2343D7226@python.org>
	<fb6fbf560608310838h24cd17aem187a8326398d7cc2@mail.gmail.com>
	<44F70A7C.602@ewtllc.com>
Message-ID: <01A13D75-12A7-4590-A4A1-F0488D4C105C@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 31, 2006, at 12:12 PM, Raymond Hettinger wrote:

> It is important that zip() be left as dirt simple as possible.  In  
> the tutorial (section 5.6), we're able to use short, simple  
> examples to teach all of the fundamental looping techniques to  
> total beginners in a way that lets them save their brain power for  
> learning exceptions, classes, generators, packages, and whatnot.

Without addressing zip() in particular (as I said before, its current  
API is just fine to me), and while agreeing with the general  
principle of keeping things as simple as they can be, I don't believe  
you have to teach all the ins-and-outs of a particular function,  
class, or module as soon as it's introduced in the tutorial.  It's  
perfectly fine to keep the intro examples short and sweet with a  
footnote saying "go here for more advanced usage".  There's a ton of  
stuff in Python that total beginners just don't need to know right away.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Darwin)

iQCVAwUBRPcqQ3EjvBPtnXfVAQJxSAP/Yk2Dqh88iHThSKoqHHr9rURGbO2UWPvt
R4xAFr4QMy4L8GtzLaG3l/RyeG59UwELgZCzRefw/aDuMotLrjrx4KvSb+FIgWmA
r/lwWnF34xWH+oSwD459WotkRIJxVnwCAUOJtiCGYqSKfSEf0z5OwDJfGCRCb6Iv
8RRqoeBlVVQ=
=iT7K
-----END PGP SIGNATURE-----

From talin at acm.org  Thu Aug 31 20:46:13 2006
From: talin at acm.org (Talin)
Date: Thu, 31 Aug 2006 11:46:13 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <20060831044354.GH6257@performancedrivers.com>
References: <20060827184941.1AE8.JCARLSON@uci.edu>
	<ed1q7r$v4s$2@sea.gmane.org>	<20060829102307.1B0F.JCARLSON@uci.edu>	<ed1uds$iog$1@sea.gmane.org>
	<ed3iq2$9iv$1@sea.gmane.org>	<6a36e7290608302056v4b0e68abrfe0c5b1fc927ff@mail.gmail.com>
	<20060831044354.GH6257@performancedrivers.com>
Message-ID: <44F72E75.2050204@acm.org>

Jack Diederich wrote:
>>> (in other words, I'm convinced that we need a polymorphic string type.  I'm not
>>> so sure we need views, but if we have the former, we can use that mechanism to
>>> support the latter)
>> +1 for polymorphic strings.
>>
>> This would give us the best of both worlds: compact representations
>> for ASCII and Latin-1, full 32-bit text when needed, and the
>> possibility to implement further optimizations when necessary. It
>> could add a bit of complexity and/or a massive speed penalty
>> (depending on how naive the implementation is) around character
>> operations though.
>>
>> For implementation ideas, Apple's CoreFoundation has a mature
>> implementation of polymorphic strings in C (which is the basis for
>> their NSString type in Objective-C), and there's a cross-platform
>> subset of it available as CF-Lite:
>> http://developer.apple.com/opensource/cflite.html
>>
> 
> Having watched Fredrik casually double the speed of many str and unicode 
> operations in a week I'm easily +1 on whatever he says.  Bob's support 
> makes that a +2, he struck me as quite sane too.

One way to handle this efficiently would be to only support the 
encodings which have a constant character size: ASCII, Latin-1, UCS-2 
and UTF-32. In other words, if the content of your text is plain ASCII, 
use an 8-bit-per-character string; if the content is limited to the 
Unicode BMP (Basic Multilingual Plane), use UCS-2; and if you are using 
Unicode supplementary characters, use UTF-32.

(The difference between UCS-2 and UTF-16 is that UCS-2 is always 2 bytes 
per character, and doesn't support the supplemental characters above 
0xffff, whereas UTF-16 characters can be either 2 or 4 bytes.)

By avoiding UTF-8, UTF-16 and other variable-character-length formats, 
you can always ensure that character index operations are done in 
constant time. Index operations would simply require scaling the index 
by the character size, rather than having to scan through the string and 
count characters.

The drawback of this method is that you may be forced to transform the 
entire string into a wider encoding if you add a single character that 
won't fit into the current encoding.
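
The forced widening might look like this in a toy model (illustrative 
only; names are invented and a real implementation would do this in C):

```python
def widen(buf, old_width, new_width):
    # Re-encode a fixed-width buffer of code points at a wider
    # bytes-per-character size -- the whole-string copy that adding
    # a single out-of-range character would force.
    out = bytearray()
    for i in range(0, len(buf), old_width):
        cp = int.from_bytes(buf[i:i + old_width], "little")
        out += cp.to_bytes(new_width, "little")
    return bytes(out)
```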

(Another option is to simply make all strings UTF-32 -- which is not 
that unreasonable, considering that text strings normally make up only a 
small fraction of a program's memory footprint. I am sure that there are 
applications that don't conform to this generalization, however. )

-- Talin

From guido at python.org  Thu Aug 31 20:55:15 2006
From: guido at python.org (Guido van Rossum)
Date: Thu, 31 Aug 2006 11:55:15 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <44F72E75.2050204@acm.org>
References: <20060827184941.1AE8.JCARLSON@uci.edu> <ed1q7r$v4s$2@sea.gmane.org>
	<20060829102307.1B0F.JCARLSON@uci.edu> <ed1uds$iog$1@sea.gmane.org>
	<ed3iq2$9iv$1@sea.gmane.org>
	<6a36e7290608302056v4b0e68abrfe0c5b1fc927ff@mail.gmail.com>
	<20060831044354.GH6257@performancedrivers.com>
	<44F72E75.2050204@acm.org>
Message-ID: <ca471dc20608311155i89d671dtdf99907674cbf87d@mail.gmail.com>

On 8/31/06, Talin <talin at acm.org> wrote:
> One way to handle this efficiently would be to only support the
> encodings which have a constant character size: ASCII, Latin-1, UCS-2
> and UTF-32. In other words, if the content of your text is plain ASCII,
> use an 8-bit-per-character string; If the content is limited to the
> Unicode BMP (Basic Multilingual Plane), use UCS-2; and if you are using
> Unicode supplementary characters, use UTF-32.
>
> (The difference between UCS-2 and UTF-16 is that UCS-2 is always 2 bytes
> per character, and doesn't support the supplemental characters above
> 0xffff, whereas UTF-16 characters can be either 2 or 4 bytes.)

I think we should also support UTF-16, since Java and .NET (and
Win32?) appear to be using it effectively; making surrogate handling an
application issue doesn't seem *too* big a burden for many apps.

> By avoiding UTF-8, UTF-16 and other variable-character-length formats,
> you can always insure that character index operations are done in
> constant time. Index operations would simply require scaling the index
> by the character size, rather than having to scan through the string and
> count characters.
>
> The drawback of this method is that you may be forced to transform the
> entire string into a wider encoding if you add a single character that
> won't fit into the current encoding.

A way to handle UTF-8 strings and other variable-length encodings
would be to maintain a small cache of index positions with the string
object.
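
The index-cache idea could be sketched like so (a toy sketch only; the 
class and attribute names are invented, not anyone's proposal):

```python
class UTF8String:
    # Keep a byte offset for every CHUNK-th character so indexing into
    # a utf-8 buffer scans at most CHUNK characters instead of the
    # whole string.
    CHUNK = 64

    def __init__(self, data):
        self._data = data          # bytes, assumed to be valid utf-8
        self._offsets = [0]        # byte offsets of chars 0, CHUNK, 2*CHUNK, ...
        pos = chars = 0
        while pos < len(data):
            lead = data[pos]
            # the utf-8 sequence length is determined by the lead byte
            step = 1 if lead < 0x80 else 2 if lead < 0xE0 else 3 if lead < 0xF0 else 4
            pos += step
            chars += 1
            if chars % self.CHUNK == 0:
                self._offsets.append(pos)
        self._length = chars

    def __len__(self):
        return self._length

    def __getitem__(self, i):
        if not 0 <= i < self._length:
            raise IndexError(i)
        pos = self._offsets[i // self.CHUNK]
        for _ in range(i % self.CHUNK):            # at most CHUNK - 1 steps
            pos += 1
            while self._data[pos] & 0xC0 == 0x80:  # skip continuation bytes
                pos += 1
        end = pos + 1
        while end < len(self._data) and self._data[end] & 0xC0 == 0x80:
            end += 1
        return self._data[pos:end].decode("utf-8")
```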

> (Another option is to simply make all strings UTF-32 -- which is not
> that unreasonable, considering that text strings normally make up only a
> small fraction of a program's memory footprint. I am sure that there are
> applications that don't conform to this generalization, however. )

Here you are effectively voting against polymorphic strings. I believe
Fredrik has good reasons to doubt this assertion.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From tjreedy at udel.edu  Thu Aug 31 22:58:55 2006
From: tjreedy at udel.edu (tjreedy)
Date: Thu, 31 Aug 2006 22:58:55 +0200
Subject: [Python-3000] Making more effective use of slice objects in Py3k
References: <20060827184941.1AE8.JCARLSON@uci.edu>
	<ed1q7r$v4s$2@sea.gmane.org><20060829102307.1B0F.JCARLSON@uci.edu>
	<ed1uds$iog$1@sea.gmane.org><ed3iq2$9iv$1@sea.gmane.org>
	<6a36e7290608302056v4b0e68abrfe0c5b1fc927ff@mail.gmail.com>
Message-ID: <ed7iii$psn$1@sea.gmane.org>


"Bob Ippolito" <bob at redivi.com> wrote in message 
news:6a36e7290608302056v4b0e68abrfe0c5b1fc927ff at mail.gmail.com...
> +1 for polymorphic strings.

A strong +1 here also.
>
> This would give us the best of both worlds: compact representations
> for ASCII and Latin-1, full 32 bit text when needed, and the
> possibility to implement further optimizations when necessary.

As I understand current plans, Python 3 will have a polymorphic integer type 
that handles details of switching between the two current implementations, 
one for efficiency, and one for generality, behind the scenes.

I think it would be a great selling point for people to adopt Python 3 if it 
also handled the even worse nastiness of text forms behind the scenes, and 
kept the efficiency of special-case uses (such as all-ASCII strings) while 
making the transition to generality more seamless than it is now.

These two similar features would be enough, to me, to make Py3 more than 
just 2.x with cruft removed.

Terry J. Reedy

From rhettinger at ewtllc.com  Thu Aug 31 23:29:36 2006
From: rhettinger at ewtllc.com (Raymond Hettinger)
Date: Thu, 31 Aug 2006 14:29:36 -0700
Subject: [Python-3000] Exception Expressions
In-Reply-To: <fb6fbf560608311121u6c05a6e4o475af21d6bc1e326@mail.gmail.com>
References: <76fd5acf0608311042k231fb36w1bf5d1e7e4eebe0c@mail.gmail.com>
	<fb6fbf560608311121u6c05a6e4o475af21d6bc1e326@mail.gmail.com>
Message-ID: <44F754C0.8080404@ewtllc.com>


>The bad news is that I seem to be an anti-channeller, so my interest
>is perhaps not a *good* sign.
>
>  
>
QOTW

From tomerfiliba at gmail.com  Thu Aug 31 23:43:44 2006
From: tomerfiliba at gmail.com (tomer filiba)
Date: Thu, 31 Aug 2006 23:43:44 +0200
Subject: [Python-3000] Comment on iostack library
Message-ID: <1d85506f0608311443s108822c1n31682ba765b2f3e0@mail.gmail.com>

i haven't been online for the last couple of days, so i'll unify
my replies into one post.

[Talin]
> Right now, a typical
> file handle consists of 3 "layers" - one representing the backing store
> (file, memory, network, etc.), one for adding buffering, and one
> representing the program-level API for reading strings, bytes, decoded
> text, etc.

yes, and it's also good you noted *typical*. the design is to
allow a virtually unlimited number of such layers, stacked one
after the other, giving you a very fine level of control without
having to write a single line of "procedural" or tailored code.
you just mix in what you want.

[Talin]
> I wonder if it wouldn't be better to cut that down to two. Specifically,
> I would like to suggest eliminating the buffering layer.
> My reasoning is fairly straightforward: Most file system handles,
> network handles and other operating system handles already support
> buffering, and they do a far better job of it than we can.

indeed, but as guido said (and i believe it also says so at my
wiki page), stdio cannot be trusted, let alone the way different
OSes implement things. buffering, for one, is a horrible issue.
i remember an old C program i wrote that worked fine on
windows, but not on linux, because i didn't print a newline and
stdout was line-buffered... i couldn't see the output, and it was
a nightmare to debug.

[Talin]
> Well, as far as readline goes: In order to split the text into lines,
> you have to decode the text first anyway, which is a layer 3 operation.
> You can't just read bytes until you get a \n, because the file you are
> reading might be encoded in UCS2 or something.

well, the LineBufferedLayer can be "configured" to split on any
"marker", i.e.: LineBufferedLayer(stream, marker = "\x00\x0a")
and of course layer 3, which creates layer 2, can set this marker
to any byte sequence. note it's a *byte* sequence, not chars,
since this passes down to layer 1 transparently.

i.e.

delimiters = {"utf8" : "\x0a", "utf16" : "\x00\x0a"}

def textfile(filename, mode, encoding = None):
    f = FileStream(filename, mode)
    f = LineBufferingLayer(f, delimiters[encoding])
    f = TextInterface(f, encoding)
    return f

[Talin]
> It seems to me that no matter how you slice it, you can't have an
> abstract "buffering" layer that is independent of both the layer beneath
> and the layer above.

but that's the whole idea! buffering is a complicated task that must
*not* be rewritten for every type of underlying storage. if one wanted
to write or read lines over a socket, one shouldn't need to
reimplement file-like line buffering, as done by socket.py.

i want to be able to read lines directly from any stream: socket, file,
or memory. how i choose to implement my HTTP parser is my only
concern, i don't want to be limited by the kind of stream my parser
would work over.

[Nick]
> You'd insert a buffering layer at the appropriate point for whatever you're
> trying to do. The advantage of pulling the buffering out into a separate layer
> is that it can be reused with different byte sources & sinks by supplying the
> appropriate configuration parameters, instead of having to reimplement it for
> each different source/sink.

indeed

[Marcin]
> I think buffering makes sense as the topmost layer, and typically only
> there.
> Encoding conversion and newline conversion should be performed a block
> at a time, below buffering, so not only I/O syscalls, but also
> invocations of the recoding machinery are amortized by buffering.

you have a good point, which i also stumbled upon when implementing
the TextInterface. but how would you suggest to solve it?

write()ing is always simpler, because you already have the entire
buffer, which you can encode as a chunk.

when read()ing, you can decode() the entire pre-read buffer first,
but then you have a "tail" of undecodable data (an incomplete
character or record), which would be quite nasty to handle.

besides, encoding suffers from many issues. suppose you have a
damaged UTF8 file, which you read char-by-char. when you reach the
damaged part, you'll never be able to "skip" it, as the code will just
keep read()ing bytes, hoping to make a character out of them, until it
reaches EOF, i.e.:

def read_char(self):
    buf = ""
    while not self._stream.eof:
        buf += self._stream.read(1)
        try:
            return buf.decode("utf8")
        except ValueError:
            # corrupt input never decodes, so this loops to EOF
            # and silently falls through, returning None
            pass

which leads me to the following thought: maybe we should have
an "enhanced" encoding library for py3k, which would report
*incomplete* data differently from *invalid* data. today it's just a
ValueError: suppose decode() would raise IncompleteDataError
when the given data is not sufficient to be decoded successfully,
and ValueError when the data is just corrupted.

that could aid iostack greatly.
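
for what it's worth, the split can be sketched today with an incremental
decoder, which buffers incomplete input but raises on corrupt input
(this is the existing codecs API, not the IncompleteDataError proposed
above):

```python
import codecs

dec = codecs.getincrementaldecoder("utf8")()
# a truncated euro sign: incomplete, so the decoder waits silently
assert dec.decode(b"\xe2\x82") == ""
# the final byte arrives and the character is emitted
assert dec.decode(b"\xac") == "\u20ac"
# genuinely corrupt input raises instead of waiting
try:
    dec.decode(b"\xff", final=True)
except UnicodeDecodeError:
    pass
```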



-tomer

From ironfroggy at gmail.com  Thu Aug 31 23:50:02 2006
From: ironfroggy at gmail.com (Calvin Spealman)
Date: Thu, 31 Aug 2006 17:50:02 -0400
Subject: [Python-3000] Exception Expressions
In-Reply-To: <bbaeab100608311120v67b23b79p15c2d46fe86cbed9@mail.gmail.com>
References: <76fd5acf0608311042k231fb36w1bf5d1e7e4eebe0c@mail.gmail.com>
	<bbaeab100608311120v67b23b79p15c2d46fe86cbed9@mail.gmail.com>
Message-ID: <76fd5acf0608311450r6fbddd44n28ab6f83741b8699@mail.gmail.com>

On 8/31/06, Brett Cannon <brett at python.org> wrote:
> So this feels like the Perl idiom of using die: ``open(file) or die`` (or
> something like that; I have never been a Perl guy so I could be off).
>
> > ...
>
> The problem I have with this whole proposal is that catching exceptions
> should be very obvious in the source code.  This proposal does not help with
> that ideal.  So I am -1 on the whole idea.
>
> -Brett

"Ouch" on associating my idea with Perl!

Although I agree that it is good to be obvious about exceptions, there
are some cases when they are simply less than exceptional. For
example, you can do d.get(key, default) if you know something is a
dictionary, but for general mappings you can't rely on that, and may
often use exceptions as a kind of logic control. No, that doesn't square
with the purity of exceptions, but sometimes practicality and
real-world usage trump theory.

Since it only allows a single expression, it shouldn't be able to get
ugly. Also, I hate to admit it, but it could allow a bare 'expr1 except
expr2', something more like the 'or die' paradigm.