From hashcollision at gmail.com  Wed Aug  1 06:55:43 2007
From: hashcollision at gmail.com (hashcollision)
Date: Wed, 1 Aug 2007 00:55:43 -0400
Subject: [Python-3000] renaming suggestion
Message-ID: <37f76d50707312155h2bc297eci5591c777c8331f2d@mail.gmail.com>

I think that WeakKeyDictionary should be renamed to WeakKeyDict (same
with WeakValueDictionary). This will make it consistent with dict and
collections.defaultdict.

Sincerely

From fdrake at acm.org  Wed Aug  1 13:54:56 2007
From: fdrake at acm.org (Fred Drake)
Date: Wed, 1 Aug 2007 07:54:56 -0400
Subject: [Python-3000] renaming suggestion
In-Reply-To: <37f76d50707312155h2bc297eci5591c777c8331f2d@mail.gmail.com>
References: <37f76d50707312155h2bc297eci5591c777c8331f2d@mail.gmail.com>
Message-ID: <60A87254-7A7B-4D5E-B219-02D5DE30846D@acm.org>

On Aug 1, 2007, at 12:55 AM, hashcollision wrote:
> I think that WeakKeyDictionary should be renamed to WeakKeyDict  
> (same with WeakValueDictionary). This will make it consistent with  
> dict and collections.defaultdict.

Hmm.  I'm not opposed to providing new names for these classes if  
that really helps, though I'm not convinced that it does.  The old  
names should be preserved for backward compatibility.
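
A minimal sketch of how shorter names could coexist with the old ones. 
The aliases WeakKeyDict and WeakValueDict are hypothetical (the weakref 
module does not define them); the existing classes stay untouched:

```python
import weakref

# Hypothetical short names; the long names remain importable as before.
WeakKeyDict = weakref.WeakKeyDictionary
WeakValueDict = weakref.WeakValueDictionary


class Node:
    """Plain class whose instances can be weakly referenced."""


cache = WeakValueDict()
node = Node()
cache["root"] = node          # held only as long as 'node' is alive elsewhere
print("root" in cache)
```

Because the aliases are the same class objects, isinstance() checks 
against either spelling keep working.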

If we're looking for some sort of consistency, it seems that having  
both CamelCase and righteouscase doesn't help.  There's precedent for  
both, but the general trend seems to be toward CamelCase for classes  
that aren't built-in.  (The use of a C implementation for defaultdict  
isn't a consideration, IMO.)

Would you consider weakkeydict and weakvaluedict better than  
WeakKeyDict and WeakValueDict?  If not, I suspect that consistency  
isn't the underlying motivation.


   -Fred

-- 
Fred Drake   <fdrake at acm.org>




From greg.ewing at canterbury.ac.nz  Thu Aug  2 02:31:40 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 02 Aug 2007 12:31:40 +1200
Subject: [Python-3000] renaming suggestion
In-Reply-To: <60A87254-7A7B-4D5E-B219-02D5DE30846D@acm.org>
References: <37f76d50707312155h2bc297eci5591c777c8331f2d@mail.gmail.com>
	<60A87254-7A7B-4D5E-B219-02D5DE30846D@acm.org>
Message-ID: <46B125EC.6070706@canterbury.ac.nz>

Fred Drake wrote:
> Would you consider weakkeydict and weakvaluedict better than  
> WeakKeyDict and WeakValueDict?

I like WeakKeyDict and WeakValDict because the existing names
seem excessively long-winded, given that 'dict' or 'Dict' is
a well-established abbreviation.

But I actually think that at least some of the weakref stuff
should be builtin, seeing as support for it is wired directly
into the core implementation. In that case, all-lowercase
names would be more appropriate.

--
Greg

From talin at acm.org  Thu Aug  2 04:01:02 2007
From: talin at acm.org (Talin)
Date: Wed, 01 Aug 2007 19:01:02 -0700
Subject: [Python-3000] More PEP 3101 changes incoming
Message-ID: <46B13ADE.7080901@acm.org>

I had a long discussion with Guido today, where he pointed out numerous 
flaws and inconsistencies in my PEP that I had overlooked. I won't go 
into all of the details of what he found, but I'd like to focus on what 
came out of the discussion. I'm going to be updating the PEP to 
incorporate the latest thinking, but I thought I would post it on Py3K 
first to see what people think.

The first change is to divide the conversion specifiers into two parts, 
which we will call "alignment specifiers" and "format specifiers". So 
the new syntax for a format field will be:

     valueSpec [,alignmentSpec] [:formatSpec]

In other words, alignmentSpec is introduced by a comma, and formatSpec 
is introduced by a colon. This use of comma and colon is taken 
directly from .Net, although our alignment and format specifiers 
themselves look nothing like the ones in .Net.

Alignment specifiers now include the former 'fill', 'align' and 'width' 
properties. So for example, to indicate a field width of 8:

     "Property count {0,8}".format(propertyCount)

The 'formatSpec' now includes the former 'sign' and 'type' parameters:

     "Number of errors: {0:+d}".format(errCount)

In the preceding example, this would indicate an integer field preceded 
by a sign for both positive and negative numbers.
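
To make the proposed field layout concrete, here is a rough sketch of 
how a field such as "0,8:+d" could be split into its three parts. The 
helper name split_field is an assumption made for illustration; it is 
not part of the PEP or of any implementation:

```python
def split_field(field):
    """Split 'valueSpec[,alignmentSpec][:formatSpec]' into three strings.

    Hypothetical helper: the first colon ends the value/alignment part,
    so a formatSpec may itself contain commas or further colons.
    """
    head, _, format_spec = field.partition(":")
    value_spec, _, alignment_spec = head.partition(",")
    return value_spec, alignment_spec, format_spec


print(split_field("0,8"))      # value 0, alignment '8', no format spec
print(split_field("0:+d"))     # value 0, no alignment, format '+d'
print(split_field("0,8:+d"))   # all three parts present
```

Missing parts come back as empty strings, which keeps the caller's 
logic simple.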

There are still some things to be worked out. For example, there are 
currently 3 different meanings of 'width': Minimum width, maximum width, 
and number of digits of decimal precision. The previous version of the 
PEP followed the 2.x convention, which was 'n.n' - 'min.prec' for 
floats, and 'min.max' for everything else. However, that seems confusing.

(I'm actually still working out the details - and in fact a little bit 
of a bikeshed discussion would be welcome at this point, as I could use 
some help ironing out these kinds of little inconsistencies.)

In general, you can think of the difference between format specifier and 
alignment specifier as:

     Format Specifier: Controls how the value is converted to a string.
     Alignment Specifier: Controls how the string is placed on the line.

Another change in the behavior is that the __format__ special method can 
only be used to override the format specifier - it can't be used to 
override the alignment specifier. The reason is simple: __format__ is 
used to control how your object is string-ified. It shouldn't get 
involved in things like left/right alignment or field width, which are 
really properties of the field, not the object being printed.

The __format__ special method can basically completely change how the 
format specifier is interpreted. So for example for Date objects you can 
have a format specifier that looks like the input to strftime().
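
As an illustration of the strftime() idea: the sketch below uses a 
made-up wrapper class, Stamp, and relies on str.format() as found in 
released Python versions rather than on the draft syntax discussed 
here, so treat it as an approximation of the proposal:

```python
import datetime


class Stamp:
    """Hypothetical date wrapper used only for this illustration."""

    def __init__(self, when):
        self.when = when

    def __format__(self, format_spec):
        # Interpret the format specifier as a strftime() pattern.
        if not format_spec:
            return str(self.when)
        return self.when.strftime(format_spec)


stamp = Stamp(datetime.date(2007, 8, 1))
print("{0:%Y-%m-%d}".format(stamp))
```

The object alone decides what the text after the colon means; the 
field machinery only hands the string over.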

However, there are times when you want to override the __format__ hook. 
The primary use case is the 'r' conversion specifier, which is used to 
get the repr() of an object.

At the moment I'm leaning towards using the exclamation mark ('!') to 
indicate this, in a way that's analogous to the CSS "! important" flag - 
it basically means "No, I really mean it!" Two possible syntax 
alternatives are:

     "The repr is {0!r}".format(obj)
     "The repr is {0:r!}".format(obj)

In the first option, we use '!' in place of the colon. In the second 
case, we use '!' as a suffix.
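
The first spelling is accepted by str.format() in released Python 
versions, so its effect can be tried directly. The class Loud below is 
invented for the demonstration:

```python
class Loud:
    """Hypothetical class with both a __format__ hook and a __repr__."""

    def __format__(self, format_spec):
        return "formatted by __format__"

    def __repr__(self):
        return "Loud()"


obj = Loud()
via_hook = "The value is {0}".format(obj)    # goes through __format__
via_repr = "The repr is {0!r}".format(obj)   # '!r' bypasses __format__
print(via_hook)
print(via_repr)
```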

Another change suggested by Guido is explicit support for the Decimal 
type. Under the current proposal, a format specifier of 'f' will cause 
the Decimal object to be coerced to float before printing. That's not 
what we want, because it will cause a loss of precision. Instead, the 
rule should be that Decimal can use all of the same formatting types as 
float, but it won't try to convert the Decimal to float as an 
intermediate step.
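
A small demonstration of the precision loss in question, using only 
the standard decimal module (the variable names are arbitrary):

```python
from decimal import Decimal

price = Decimal("0.1234567890123456789012345")

# Going through float keeps only about 17 significant digits.
as_float = float(price)

print(price)      # all 25 fractional digits survive
print(as_float)   # digits beyond float precision are gone
```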

Here's some pseudo code outlining how the new formatting algorithm for 
fields will work:

     def format_field(value, alignmentSpec, formatSpec):
         if value has a __format__ attribute, and no '!' flag:
             s = value.__format__(formatSpec)
         else:
             if the formatSpec is 'r':
                 s = repr(value)
             else if the formatSpec is 'd' or one of the integer types:
                 # Coerce to int
                 s = formatInteger(int(value), formatSpec)
             else if the formatSpec is 'f' or one of the float types:
                 if value is a Decimal:
                     s = formatDecimal(value, formatSpec)
                 else:
                     # Coerce to float
                     s = formatFloat(float(value), formatSpec)
             else:
                 s = str(value)

         # Now that we have 's', apply the alignment options
         return applyAlignment(s, alignmentSpec)
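
For readers who want to experiment, the pseudo code can be approximated 
in runnable Python. This is only a sketch under several assumptions: 
the hook is looked up under the made-up name format_hook (so that it 
does not collide with the __format__ that every object has in released 
Python), the '!' flag is passed as a keyword argument named bang, the 
alignment spec is reduced to a minimum width with right justification, 
and the built-in format() stands in for formatInteger, formatFloat and 
formatDecimal:

```python
from decimal import Decimal


def apply_alignment(s, alignment_spec):
    """Assumption: alignment_spec is '' or a minimum width; right-justify."""
    return s.rjust(int(alignment_spec)) if alignment_spec else s


def format_field(value, alignment_spec, format_spec, bang=False):
    hook = getattr(value, "format_hook", None)   # hypothetical hook name
    if hook is not None and not bang:
        s = hook(format_spec)
    elif format_spec == "r":
        s = repr(value)
    elif format_spec and format_spec[-1] in "bcdoxX":
        s = format(int(value), format_spec)      # coerce to int
    elif format_spec and format_spec[-1] in "eEfFgG%":
        if isinstance(value, Decimal):
            s = format(value, format_spec)       # no detour through float
        else:
            s = format(float(value), format_spec)
    else:
        s = str(value)
    # Alignment is applied to the finished string, whatever produced it.
    return apply_alignment(s, alignment_spec)


class Tagged:
    """Hypothetical type that supplies its own formatting hook."""

    def format_hook(self, format_spec):
        return "<hook:%s>" % format_spec

    def __repr__(self):
        return "Tagged()"


print(format_field(42, "8", "+d"))
print(format_field(Tagged(), "", "x"))
print(format_field(Tagged(), "", "r", bang=True))
```

Note how the object never sees the alignment spec: width handling 
happens in one place, after the value has been turned into a string.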

My goal is to get a C implementation of just this function working some 
time in the next several weeks. Most of the complexity 
of the PEP implementation is right here IMHO.

Before I edit the PEP I'm going to let this marinate for a week and see 
what the discussion brings up.

-- Talin

From rrr at ronadam.com  Thu Aug  2 06:58:40 2007
From: rrr at ronadam.com (Ron Adam)
Date: Wed, 01 Aug 2007 23:58:40 -0500
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B13ADE.7080901@acm.org>
References: <46B13ADE.7080901@acm.org>
Message-ID: <46B16480.7050502@ronadam.com>



Talin wrote:
> I had a long discussion with Guido today, where he pointed out numerous 
> flaws and inconsistencies in my PEP that I had overlooked. I won't go 
> into all of the details of what he found, but I'd like to focus on what 
> came out of the discussion. I'm going to be updating the PEP to 
> incorporate the latest thinking, but I thought I would post it on Py3K 
> first to see what people think.
> 
> The first change is to divide the conversion specifiers into two parts, 
> which we will call "alignment specifiers" and "format specifiers". So 
> the new syntax for a format field will be:
> 
>      valueSpec [,alignmentSpec] [:formatSpec]
> 
> In other words, alignmentSpec is introduced by a comma, and formatSpec 
> is introduced by a colon. This use of comma and colon is taken 
> directly from .Net, although our alignment and format specifiers 
> themselves look nothing like the ones in .Net.
> 
> Alignment specifiers now include the former 'fill', 'align' and 'width' 
> properties. So for example, to indicate a field width of 8:
> 
>      "Property count {0,8}".format(propertyCount)

How would I specify align right, and width 42?


> The 'formatSpec' now includes the former 'sign' and 'type' parameters:
> 
>      "Number of errors: {0:+d}".format(errCount)
> 
> In the preceding example, this would indicate an integer field preceded 
> by a sign for both positive and negative numbers.

How can I get negative numbers to print in red?  Just kidding. ;-)

(I recently was frustrated by not being able to have text of two different 
colors in a single tkinter button.)


> There are still some things to be worked out. For example, there are 
> currently 3 different meanings of 'width': Minimum width, maximum width, 
> and number of digits of decimal precision. The previous version of the 
> PEP followed the 2.x convention, which was 'n.n' - 'min.prec' for 
> floats, and 'min.max' for everything else. However, that seems confusing.

Yep, enough so that I need to look it up more often than I like.

> (I'm actually still working out the details - and in fact a little bit 
> of a bikeshed discussion would be welcome at this point, as I could use 
> some help ironing out these kinds of little inconsistencies.)
> 
> In general, you can think of the difference between format specifier and 
> alignment specifier as:
> 
>      Format Specifier: Controls how the value is converted to a string.
>      Alignment Specifier: Controls how the string is placed on the line.

Keeping related terms together is important, I think.  The order

    {item, alignment: format}

splits the item and its formatter.  The alignment is more of a container 
property or field property, as you pointed out further down.

So maybe if you group the related values together...

     {valuespec:format, alignment:width}

This has a nice dictionary feel and maybe that may be useful as well. 
Reusing things I'm familiar with does make it easier.


So it would be possible to create a dictionary and use its repr() as a 
formatter.  Nice little bonus. ;-)

     string_format = {0:'i', 'R':12}
     repr(string_format).format(number)

Well almost, there is the tiny problem of the quotes inside it.  :/


So in order to right justify an integer...

    "Property count {0:i, R:8}".format(propertyCount)


The precision for floats is not part of the field width, so it belongs in 
the formatter term.

    "Total cost {0:f2, R:12}".format(totalcost)

I'm not sure if it should be 'f.2' or just "f2".
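
A rough sketch of how such a grouped field might be taken apart. The 
function name and the returned keys are invented for illustration, and 
the syntax itself is only the suggestion made above, not anything the 
PEP defines:

```python
def parse_grouped_field(field):
    """Parse 'valuespec:format, alignment:width' into a dict.

    Hypothetical helper; both the format and the alignment group
    are optional.
    """
    value_part, _, align_part = field.partition(",")
    index, _, fmt = value_part.partition(":")
    align, _, width = align_part.strip().partition(":")
    return {
        "index": index.strip(),
        "format": fmt.strip(),
        "align": align.strip(),
        "width": int(width) if width.strip() else None,
    }


print(parse_grouped_field("0:f2, R:12"))
print(parse_grouped_field("0:i"))
```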


> Another change in the behavior is that the __format__ special method can 
> only be used to override the format specifier - it can't be used to 
> override the alignment specifier. The reason is simple: __format__ is 
> used to control how your object is string-ified. It shouldn't get 
> involved in things like left/right alignment or field width, which are 
> really properties of the field, not the object being printed.

Right, so maybe there should be both a __format__, and an __alignment__ method?


> The __format__ special method can basically completely change how the 
> format specifier is interpreted. So for example for Date objects you can 
> have a format specifier that looks like the input to strftime().
> 
> However, there are times when you want to override the __format__ hook. 
> The primary use case is the 'r' conversion specifier, which is used to 
> get the repr() of an object.
> 
> At the moment I'm leaning towards using the exclamation mark ('!') to 
> indicate this, in a way that's analogous to the CSS "! important" flag - 
> it basically means "No, I really mean it!" Two possible syntax 
> alternatives are:
> 
>      "The repr is {0!r}".format(obj)
>      "The repr is {0:r!}".format(obj)
> 
> In the first option, we use '!' in place of the colon. In the second 
> case, we use '!' as a suffix.

-1

This doesn't feel right to me.  It seems to me if you do this, then we will 
see all sorts of weird __repr__ methods that return things completely 
different than we get now.


As an alternative ...

Some sort of indirect formatting should be possible, but maybe it can be 
part of the data passed to the format method.

     "{0:s, L:10} = {1:!, L:12}".format(obj.name, (obj, repr))


Another possibility is to chain them somehow.

     "{0:!, L:10} = {1:!, L:12}".formatfn(str, repr).format(obj, obj)

Same could be done with the alignments if there is a use case for it.


> Another change suggested by Guido is explicit support for the Decimal 
> type. Under the current proposal, a format specifier of 'f' will cause 
> the Decimal object to be coerced to float before printing. That's not 
> what we want, because it will cause a loss of precision. Instead, the 
> rule should be that Decimal can use all of the same formatting types as 
> float, but it won't try to convert the Decimal to float as an 
> intermediate step.

+1

I'm hoping at some point (in the not too far future) I will be able to tell 
python to use Decimal in place of floats and not have to change nearly 
every function that produces a float literal.  This is a step in that 
direction.

> Here's some pseudo code outlining how the new formatting algorithm for 
> fields will work:
> 
>      def format_field(value, alignmentSpec, formatSpec):
>          if value has a __format__ attribute, and no '!' flag:
>              s = value.__format__(formatSpec)
>          else:
>              if the formatSpec is 'r':
>                  s = repr(value)
>              else if the formatSpec is 'd' or one of the integer types:
>                  # Coerce to int
>                  s = formatInteger(int(value), formatSpec)
>              else if the formatSpec is 'f' or one of the float types:
>                  if value is a Decimal:
>                      s = formatDecimal(value, formatSpec)
>                  else:
>                      # Coerce to float
>                      s = formatFloat(float(value), formatSpec)
>              else:
>                  s = str(value)
> 
>          # Now that we have 's', apply the alignment options
>          return applyAlignment(s, alignmentSpec)
> 
> My goal is to get a C implementation of just this function working some 
> time in the next several weeks. Most of the complexity 
> of the PEP implementation is right here IMHO.
> 
> Before I edit the PEP I'm going to let this marinate for a week and see 
> what the discussion brings up.
> 
> -- Talin

Great work Talin, I cheer your efforts at keeping this moving given the 
many directions and turns it has taken!

Cheers,
    Ron


From talin at acm.org  Thu Aug  2 07:32:55 2007
From: talin at acm.org (Talin)
Date: Wed, 01 Aug 2007 22:32:55 -0700
Subject: [Python-3000] PEP 3115 chaining rules (was Re: pep 3124 plans)
In-Reply-To: <20070731162912.3E2C53A40A7@sparrow.telecommunity.com>
References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com>	<46A19FCC.7070609@acm.org>	<20070721181442.48FB03A403A@sparrow.telecommunity.com>	<46A2AE31.2080105@canterbury.ac.nz>	<20070722020422.5AAAC3A403A@sparrow.telecommunity.com>	<46A3ECB7.9070504@canterbury.ac.nz>	<20070723010750.E27693A40A9@sparrow.telecommunity.com>	<46A453C7.9070407@acm.org>	<20070723153031.D00273A403D@sparrow.telecommunity.com>	<5d44f72f0707262227o6fcf8471ja6654910c7ee07e0@mail.gmail.com>	<ca471dc20707270825j3e53c11dyb2064468f3665c14@mail.gmail.com>	<20070727162212.60E2F3A40E6@sparrow.telecommunity.com>	<20070730201511.C14ED3A406B@sparrow.telecommunity.com>	<46AF037C.9050902@gmail.com>
	<20070731162912.3E2C53A40A7@sparrow.telecommunity.com>
Message-ID: <46B16C87.7060807@acm.org>

Phillip J. Eby wrote:
> At 07:40 PM 7/31/2007 +1000, Nick Coghlan wrote:
>> Phillip J. Eby wrote:
>>> In other words, a class' metaclass has to be a derivative of all 
>>> its bases' metaclasses; ISTM that a __prepare__ namespace needs to 
>>> be a derivative in some sense of all its bases' __prepare__ 
>>> results.  This probably isn't enforceable, but the pattern should 
>>> be documented such that e.g. the overloading metaclass' __prepare__ 
>>> would return a mapping that delegates operations to the mapping 
>>> returned by its super()'s __prepare__, and the actual class 
>>> creation would be similarly chained.  PEP 3115 probably needs a 
>>> section to explain these issues and recommend best practices for 
>>> implementing __prepare__ and class creation on that basis.  I'll 
>>> write something up after I've thought this through some more.
>> A variant of the metaclass rule specific to __prepare__ might look 
>> something like:
>>   A class's metaclass providing the __prepare__ method must be a 
>> subclass of all of the class's base classes providing __prepare__ methods.
> 
> That doesn't really work; among other things, it would require 
> everything to be a dict subclass, since type.__prepare__() will 
> presumably return a dict.  Therefore, it really does need to be 
> delegation instead of inheritance, or it becomes very difficult to 
> provide any "interesting" properties.

I think you are on to something here.

I think that in order to 'mix' metaclasses, each metaclass needs to get 
a crack at the members as they are defined. The 'dict' object really 
isn't important - what's important is to be able to overload the 
creation of a class member.

I can think of a couple ways to accomplish this.

1) The first, and most brute-force, idea is to pass to a metaclass's 
__prepare__ method an extra parameter containing the result of the 
previous metaclass's __prepare__ method. This would allow the 
__prepare__ method to *wrap* the earlier metaclass's dict, 
intercepting the insertion operations or passing them through as needed.

In fact, you could even make it so that the first __prepare__ in the 
chain gets passed in a regular dict. So __prepare__ always gets a dict 
which it is supposed to wrap, although it can choose to ignore it.

2) The second idea is to recognize the fact that we were never all that 
interested in creating a special dict subclass; the reason we chose that 
is because it seemed like an easy way to hook in to the addition of new 
members, by overriding the dict's insertion function. In other words, 
what the metaclass wants is the ability to override *insertion*. So you 
could change the metaclass interface to make it so that insertion is 
overridable, but in an "event chain" way, so that each metaclass gets a 
shot at the insertion event as it occurs.

The problem here is that we need to support more than just insertion, 
but lookup (and possibly deletion) as well.

This also leads to the third idea, which I am sure that you - of all 
people - have already thought of, which is:

3) Use something like your generic function 'next method' pattern. In 
fact, go the whole way and say that "add_class_member(cls:Metaclass, 
name, member, next:next_method)" is a generic function, and then call 
next_method to inform the next metaclass in the chain.

There are two obvious problems here: First, we can't dispatch on 'cls' 
since it's not been created yet.

Second, the metaclass machinery is deep down inside the interpreter and 
operates at the very heart of Python. Which means that in order to use 
generic functions, they would have to be built-in to the heart of Python 
as well. While I would love to see that happen some day, I am not 
comfortable giving an untried, brand new module such 'blessed' status.

-- Talin

From talin at acm.org  Thu Aug  2 07:40:32 2007
From: talin at acm.org (Talin)
Date: Wed, 01 Aug 2007 22:40:32 -0700
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B16480.7050502@ronadam.com>
References: <46B13ADE.7080901@acm.org> <46B16480.7050502@ronadam.com>
Message-ID: <46B16E50.9050308@acm.org>

Ron Adam wrote:

> Splits the item and its formatter.  The alignment is more of a 
> container property or field property as you pointed out further down.
> 
> So maybe if you group the related values together...
> 
>     {valuespec:format, alignment:width}
> 
> This has a nice dictionary feel and maybe that may be useful as well. 
> Reusing things I'm familiar with does make it easier.

I'm certainly open to switching the order of things around. Remember, 
though, that we have *5* fields (6 depending on how you count) of 
formatting options to deal with (see the PEP for details):

    -- alignment
    -- padding char
    -- minwidth
    -- maxwidth
    -- fractional digits
    -- type

...And we need to be able to represent these succinctly. That last is 
important and here's why: None of these formatting codes are necessary 
at all, because in the final analysis you can get the same effect by 
wrapping the arguments to format() with the appropriate padding, 
alignment, and type conversion function calls.

In other words, the whole point of these format codes is that they are 
convenient shortcuts. And shortcuts by definition need to be *short*.
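
To illustrate the "shortcut" point: the two expressions below produce 
the same string. The format code uses the spelling accepted by 
str.format() in released Python versions ("{0:>8}"), which is an 
assumption relative to the comma syntax under discussion here:

```python
count = 42

# With a format code: right-align in a field of width 8.
with_code = "Property count {0:>8}".format(count)

# Without one: wrap the argument in explicit conversion and padding calls.
by_hand = "Property count {0}".format(str(count).rjust(8))

print(with_code == by_hand)
```

The second form is always available; the format code merely saves 
typing, which is why brevity matters so much in its design.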

So we need to strike a balance between convenience and readability.

-- Talin

From rrr at ronadam.com  Thu Aug  2 12:31:54 2007
From: rrr at ronadam.com (Ron Adam)
Date: Thu, 02 Aug 2007 05:31:54 -0500
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B16E50.9050308@acm.org>
References: <46B13ADE.7080901@acm.org> <46B16480.7050502@ronadam.com>
	<46B16E50.9050308@acm.org>
Message-ID: <46B1B29A.3020209@ronadam.com>



Talin wrote:
> Ron Adam wrote:
> 
>> Splits the item and its formatter.  The alignment is more of a 
>> container property or field property as you pointed out further down.
>>
>> So maybe if you group the related values together...
>>
>>     {valuespec:format, alignment:width}
>>
>> This has a nice dictionary feel and maybe that may be useful as well. 
>> Reusing things I'm familiar with does make it easier.
> 
> I'm certainly open to switching the order of things around. Remember, 
> though, that we have *5* fields (6 depending on how you count) of 
> formatting options to deal with (see the PEP for details):
> 
>    -- alignment
>    -- padding char
>    -- minwidth
>    -- maxwidth
>    -- fractional digits
>    -- type
> 
> ...And we need to be able to represent these succinctly. That last is 
> important and here's why: None of these formatting codes are necessary 
> at all, because in the final analysis you can get the same effect by 
> wrapping the arguments to format() with the appropriate padding, 
> alignment, and type conversion function calls.
> 
> In other words, the whole point of these format codes is that they are 
> convenient shortcuts. And shortcuts by definition need to be *short*.
> 
> So we need to strike a balance between convenience and readability.
> 
> -- Talin

I wasn't thinking we would treat each option separately, just type, and 
alignment groups.  Within those, we would have pretty much what you have 
proposed.


Maybe it will help more to go a little slower instead of jumping in and 
offering a bunch of new alternatives.


What code specifies Decimals, "D"?


Any chance of thousands separators?


And what about exchanging commas and decimal points?


Does the following look complete, or does it need anything added or removed?


Format Specifiers:
     As string:
         type           # [s|r]

     As integer:
         sign           # [-|+|()]
         fill_char      # character    \__ +07d -> +0000123
         fill_width     # number       /   fills with zeros
         type           # [b|c|d|o|x|X]  (*)

     As fixed point:
         sign           # [-|+|()]
         fill_char      # character
         fill_width     # number
         precision      # number       (fractional digits)
         type           # [e|E|f|F|g|G|n|%]


Alignment Specifiers:
     If formatted value is shorter than min_width:
         align          # [<|>|^]
         min_width      # number
         padding_char   # character

     If formatted value is longer than max_width:
         max_width      # number
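
The alignment group in the list above could be sketched roughly as 
follows. The function name, the keyword arguments and the choice to 
truncate on the right when max_width is exceeded are all assumptions 
made for the sake of the example:

```python
def apply_alignment(text, align="<", min_width=0, padding_char=" ",
                    max_width=None):
    """Pad or truncate an already formatted string (hypothetical helper)."""
    if len(text) < min_width:
        if align == "<":
            text = text.ljust(min_width, padding_char)
        elif align == ">":
            text = text.rjust(min_width, padding_char)
        elif align == "^":
            text = text.center(min_width, padding_char)
        else:
            raise ValueError("unknown align code: %r" % (align,))
    if max_width is not None and len(text) > max_width:
        text = text[:max_width]   # assumption: cut off the right-hand end
    return text


print(apply_alignment("42", ">", 6, "0"))
print(apply_alignment("ab", "^", 6, "*"))
print(apply_alignment("abcdef", max_width=3))
```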



Cheers,
    Ron

From guido at python.org  Thu Aug  2 16:46:32 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 2 Aug 2007 07:46:32 -0700
Subject: [Python-3000] PEP 3115 chaining rules (was Re: pep 3124 plans)
In-Reply-To: <46B16C87.7060807@acm.org>
References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com>
	<46A453C7.9070407@acm.org>
	<20070723153031.D00273A403D@sparrow.telecommunity.com>
	<5d44f72f0707262227o6fcf8471ja6654910c7ee07e0@mail.gmail.com>
	<ca471dc20707270825j3e53c11dyb2064468f3665c14@mail.gmail.com>
	<20070727162212.60E2F3A40E6@sparrow.telecommunity.com>
	<20070730201511.C14ED3A406B@sparrow.telecommunity.com>
	<46AF037C.9050902@gmail.com>
	<20070731162912.3E2C53A40A7@sparrow.telecommunity.com>
	<46B16C87.7060807@acm.org>
Message-ID: <ca471dc20708020746o36c3d9al7528b789f471622b@mail.gmail.com>

On 8/1/07, Talin <talin at acm.org> wrote:
> I think that in order to 'mix' metaclasses, each metaclass needs to get
> a crack at the members as they are defined. The 'dict' object really
> isn't important - what's important is to be able to overload the
> creation of a class member.
>
> I can think of a couple ways to accomplish this.
>
> 1) The first, and most brute-force, idea is to pass to a metaclass's
> __prepare__ method an extra parameter containing the result of the
> previous metaclass's __prepare__ method. This would allow the
> __prepare__ method to *wrap* the earlier metaclass's dict,
> intercepting the insertion operations or passing them through as needed.

I'm confused. The only way to mix metaclasses is by explicitly
multiply inheriting them. So the normal "super" machinery should work,
shouldn't it? (Except for 'type' not defining __prepare__(), but that
can be fixed.)
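
A sketch of what that could look like, written against Python 3 as 
released (where type does define __prepare__). The names InsertHook and 
hooking_meta are invented for the example; each metaclass wraps the 
namespace returned by the next __prepare__ in the MRO, so every 
metaclass sees each insertion:

```python
from collections.abc import MutableMapping


class InsertHook(MutableMapping):
    """Hypothetical namespace wrapper: report insertions, then delegate."""

    def __init__(self, inner, on_insert):
        self.inner = inner
        self.on_insert = on_insert

    def __getitem__(self, key):
        return self.inner[key]

    def __setitem__(self, key, value):
        self.on_insert(key, value)
        self.inner[key] = value

    def __delitem__(self, key):
        del self.inner[key]

    def __iter__(self):
        return iter(self.inner)

    def __len__(self):
        return len(self.inner)


def hooking_meta(log):
    """Build a metaclass that records member names into 'log'."""

    class Meta(type):
        @classmethod
        def __prepare__(mcls, name, bases, **kwds):
            inner = super().__prepare__(name, bases, **kwds)
            return InsertHook(inner, lambda key, value: log.append(key))

        def __new__(mcls, name, bases, ns, **kwds):
            # type.__new__ insists on a real dict.
            return super().__new__(mcls, name, bases, dict(ns), **kwds)

    return Meta


first_log, second_log = [], []
First, Second = hooking_meta(first_log), hooking_meta(second_log)


class Both(First, Second):
    """Mixing is done by explicit multiple inheritance."""


class Example(metaclass=Both):
    x = 1


print("x" in first_log, "x" in second_log)
```

Nothing beyond ordinary cooperative super() calls is needed here; the 
wrapping is by delegation rather than by subclassing dict.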

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From stargaming at gmail.com  Thu Aug  2 17:03:42 2007
From: stargaming at gmail.com (Stargaming)
Date: Thu, 2 Aug 2007 15:03:42 +0000 (UTC)
Subject: [Python-3000] optimizing [x]range
References: <1d85506f0707280806n1764151cx4961a0573dda435e@mail.gmail.com>
Message-ID: <f8srod$5m7$1@sea.gmane.org>

On Sat, 28 Jul 2007 17:06:50 +0200, tomer filiba wrote:

> currently, testing for "x in xrange(y)" is an O(n) operation.
> 
> since xrange objects (which would become range in py3k) are not real
> lists, there's no reason that __contains__ be an O(n). it can easily be
> made into an O(1) operation. here's a demo code (it should be trivial to
> implement this in CPython)
> 
> 
> class xxrange(object):
>     def __init__(self, *args):
>         if len(args) == 1:
>             self.start, self.stop, self.step = (0, args[0], 1)
>         elif len(args) == 2:
>             self.start, self.stop, self.step = (args[0], args[1], 1)
>         elif len(args) == 3:
>             self.start, self.stop, self.step = args
>         else:
>             raise TypeError("invalid number of args")
> 
>     def __iter__(self):
>         i = self.start
>         while i < self.stop:
>             yield i
>             i += self.step
> 
>     def __contains__(self, num):
>         if num < self.start or num > self.stop:
>             return False
>         return (num - self.start) % self.step == 0
> 
> 
> print list(xxrange(7))            # [0, 1, 2, 3, 4, 5, 6]
> print list(xxrange(0, 7, 2))      # [0, 2, 4, 6]
> print list(xxrange(1, 7, 2))      # [1, 3, 5]
> print 98 in xxrange(100)          # True
> print 98 in xxrange(0, 100, 2)    # True
> print 99 in xxrange(0, 100, 2)    # False
> print 98 in xxrange(1, 100, 2)    # False
> print 99 in xxrange(1, 100, 2)    # True
> 
> 
> 
> -tomer

I gave the implementation a try. I cannot guarantee that it follows every 
guideline for the CPython core but it works, it's fast and passes all 
tests::

    >>> 98 in xrange(0, 100, 2)
    True
    >>> 99 in xrange(0, 100, 2)
    False
    >>> 98 in xrange(1, 100, 2)
    False
    >>> 99 in xrange(1, 100, 2)
    True

Note: test_xrange wasn't really helpful with validating xrange's 
functionality. No tests for negative steps, at least it didn't warn me 
while it didn't work. ;)

It is basically the algorithm you provided, with a fix for negative steps.
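
In Python terms, the check amounts to something like the sketch below. 
The function name is made up, and this is a restatement of the idea 
rather than a transcription of the C patch. Unlike the quoted demo, 
which tests "num > self.stop", it treats the stop value as exclusive:

```python
def in_arithmetic_range(num, start, stop, step):
    """O(1) membership test for the integers start, start+step, ... (< stop)."""
    if step == 0:
        raise ValueError("step must not be zero")
    if step > 0:
        inside = start <= num < stop
    else:
        inside = stop < num <= start
    # Inside the bounds, num is a member only if it lies on the step grid.
    return inside and (num - start) % step == 0


print(in_arithmetic_range(98, 0, 100, 2))
print(in_arithmetic_range(99, 0, 100, 2))
print(in_arithmetic_range(4, 10, 0, -2))
```

Python's % takes the sign of the divisor, so the same remainder test 
works for negative steps without special casing.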

The patch is based on the latest trunk/ checkout, Python 2.6. I don't 
think this is a problem if nobody else made any effort towards making 
xrange more sequence-like in the Python 3000 branch. The C source might 
require some tab/space cleanup.

Speed comparison
================

$ ./python -V
Python 2.6a0
$ python -V
Python 2.5.1

./python is my local build, with the patch applied.

$ ./python -mtimeit "0 in xrange(100)"
1000000 loops, best of 3: 0.641 usec per loop
$ python -mtimeit "0 in xrange(100)"
1000000 loops, best of 3: 0.717 usec per loop

Well, if you generously ignore the small difference, this is still the same.

$ ./python -mtimeit "99 in xrange(100)"
1000000 loops, best of 3: 0.638 usec per loop
$ python -mtimeit "99 in xrange(100)"
100000 loops, best of 3: 6.17 usec per loop

Notice the difference in the magnitude of loops!

$ ./python -mtimeit -s "from sys import maxint" "maxint in xrange(maxint)"
1000000 loops, best of 3: 0.622 usec per loop
$ python -mtimeit -s "from sys import maxint" "maxint in xrange(maxint)"

Still waiting.. ;)


Index: Objects/rangeobject.c
===================================================================
--- Objects/rangeobject.c       (revision 56666)
+++ Objects/rangeobject.c       (working copy)
@@ -129,12 +129,31 @@
        return rtn;
 }

+static int
+range_contains(rangeobject *r, PyIntObject *key) {
+    if (PyInt_Check(key)) {
+        int keyval = key->ob_ival;
+        int start = r->start;
+        int step = r->step;
+        int end = start + (r->len * step);
+
+        if ((step < 0 && keyval <= start && keyval > end) \
+           || (step > 0 && keyval >= start && keyval < end)) {
+            return ((keyval - start) % step) == 0;
+        }
+    }
+    return 0;
+}
+
 static PySequenceMethods range_as_sequence = {
        (lenfunc)range_length,  /* sq_length */
        0,                      /* sq_concat */
        0,                      /* sq_repeat */
        (ssizeargfunc)range_item, /* sq_item */
        0,                      /* sq_slice */
+    0, /* sq_ass_item */
+    0, /* sq_ass_slice */
+    (objobjproc)range_contains, /* sq_contains */
 };

 static PyObject * range_iter(PyObject *seq);


Test suite
==========

OK
288 tests OK.
CAUTION:  stdout isn't compared in verbose mode:
a test that passes in verbose mode may fail without it.
1 test failed:
    test_nis # due to verbosity, I guess
38 tests skipped:
    test_aepack test_al test_applesingle test_bsddb test_bsddb185
    test_bsddb3 test_bz2 test_cd test_cl test_codecmaps_cn
    test_codecmaps_hk test_codecmaps_jp test_codecmaps_kr
    test_codecmaps_tw test_curses test_dbm test_gdbm test_gl
    test_imageop test_imgfile test_linuxaudiodev test_macostools
    test_normalization test_ossaudiodev test_pep277 test_plistlib
    test_scriptpackages test_socketserver test_sqlite test_startfile
    test_sunaudiodev test_tcl test_timeout test_urllib2net
    test_urllibnet test_winreg test_winsound test_zipfile64
5 skips unexpected on linux2:
    test_tcl test_dbm test_bz2 test_gdbm test_bsddb


Should I submit the patch to the SF patch manager as well?

Regards,
Stargaming


From stargaming at gmail.com  Thu Aug  2 18:19:29 2007
From: stargaming at gmail.com (Stargaming)
Date: Thu, 2 Aug 2007 16:19:29 +0000 (UTC)
Subject: [Python-3000] optimizing [x]range
References: <1d85506f0707280806n1764151cx4961a0573dda435e@mail.gmail.com>
	<f8srod$5m7$1@sea.gmane.org>
Message-ID: <f8t06h$efu$1@sea.gmane.org>

On Thu, 02 Aug 2007 15:03:42 +0000, Stargaming wrote:

> On Sat, 28 Jul 2007 17:06:50 +0200, tomer filiba wrote:
> 
>> currently, testing for "x in xrange(y)" is an O(n) operation.
>> 
>> since xrange objects (which would become range in py3k) are not real
>> lists, there's no reason that __contains__ be an O(n). it can easily be
>> made into an O(1) operation. here's a demo code (it should be trivial
>> to implement this in CPython)
[snipped algorithm]
> 
> I gave the implementation a try. 
[snipped patch details]
>
> Should I submit the patch to the SF patch manager as well?

Guido> Yes, please submit to SF.

Submitted to the SF patch manager as patch #1766304. It is marked as a 
Python 2.6 item.

http://sourceforge.net/tracker/index.php?func=detail&aid=1766304&group_id=5470&atid=305470

Regards,
Stargaming



From pje at telecommunity.com  Thu Aug  2 18:24:04 2007
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu, 02 Aug 2007 12:24:04 -0400
Subject: [Python-3000] PEP 3115 chaining rules (was Re: pep 3124 plans)
In-Reply-To: <ca471dc20708020746o36c3d9al7528b789f471622b@mail.gmail.com
 >
References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com>
	<46A453C7.9070407@acm.org>
	<20070723153031.D00273A403D@sparrow.telecommunity.com>
	<5d44f72f0707262227o6fcf8471ja6654910c7ee07e0@mail.gmail.com>
	<ca471dc20707270825j3e53c11dyb2064468f3665c14@mail.gmail.com>
	<20070727162212.60E2F3A40E6@sparrow.telecommunity.com>
	<20070730201511.C14ED3A406B@sparrow.telecommunity.com>
	<46AF037C.9050902@gmail.com>
	<20070731162912.3E2C53A40A7@sparrow.telecommunity.com>
	<46B16C87.7060807@acm.org>
	<ca471dc20708020746o36c3d9al7528b789f471622b@mail.gmail.com>
Message-ID: <20070802162146.2B0DB3A406B@sparrow.telecommunity.com>

At 07:46 AM 8/2/2007 -0700, Guido van Rossum wrote:
>On 8/1/07, Talin <talin at acm.org> wrote:
> > I think that in order to 'mix' metaclasses, each metaclass needs to get
> > a crack at the members as they are defined. The 'dict' object really
> > isn't important - what's important is to be able to overload the
> > creation of a class member.
> >
> > I can think of a couple ways to accomplish this.
> >
> > 1) The first, and most brute force idea is to pass to a metaclass's
> > __prepare__ statement an extra parameter containing the result of the
> > previous metaclass's __prepare__ method. This would allow the
> > __prepare__ statement to *wrap* the earlier metaclass's dict,
> > intercepting the insertion operations or passing them through as needed.
>
>I'm confused. The only way to mix metaclasses is by explicitly
>multiply inheriting them. So the normal "super" machinery should work,
>shouldn't it?

Yes.  As I said in the email Talin was replying to, it's sufficient 
to document the fact that a metaclass should call its super()'s 
__prepare__ and delegate operations to it; the additional stuff Talin 
is suggesting is unnecessary.

>  (Except for 'type' not defining __prepare__(), but that
>can be fixed.)

Yep. 
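
A minimal sketch of the super()-based chaining described above, written against the metaclass syntax of PEP 3115. The names TracingNamespace, MetaA, MetaB, MetaAB and LOG are hypothetical and exist only for this illustration; it also assumes that type provides a default __prepare__().

```python
LOG = []

class TracingNamespace(dict):
    """Hypothetical class-body namespace that records each binding and
    then delegates it to the namespace it wraps."""
    def __init__(self, inner, label):
        super().__init__(inner)
        self._inner = inner
        self._label = label

    def __setitem__(self, key, value):
        LOG.append((self._label, key))
        self._inner[key] = value          # pass the insertion through
        super().__setitem__(key, value)   # keep our own copy for type()

class MetaA(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwds):
        # Wrap whatever the next metaclass in the MRO prepared.
        return TracingNamespace(super().__prepare__(name, bases, **kwds), "A")

class MetaB(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwds):
        return TracingNamespace(super().__prepare__(name, bases, **kwds), "B")

class MetaAB(MetaA, MetaB):
    """Mixing is done by explicit multiple inheritance."""

class Example(metaclass=MetaAB):
    x = 1
```

Each member defined in the body of Example passes through MetaA's wrapper first and then MetaB's, so both metaclasses see it without any extra parameter to __prepare__().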


From guido at python.org  Thu Aug  2 18:48:27 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 2 Aug 2007 09:48:27 -0700
Subject: [Python-3000] PEP 3115 chaining rules (was Re: pep 3124 plans)
In-Reply-To: <20070802162146.2B0DB3A406B@sparrow.telecommunity.com>
References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com>
	<5d44f72f0707262227o6fcf8471ja6654910c7ee07e0@mail.gmail.com>
	<ca471dc20707270825j3e53c11dyb2064468f3665c14@mail.gmail.com>
	<20070727162212.60E2F3A40E6@sparrow.telecommunity.com>
	<20070730201511.C14ED3A406B@sparrow.telecommunity.com>
	<46AF037C.9050902@gmail.com>
	<20070731162912.3E2C53A40A7@sparrow.telecommunity.com>
	<46B16C87.7060807@acm.org>
	<ca471dc20708020746o36c3d9al7528b789f471622b@mail.gmail.com>
	<20070802162146.2B0DB3A406B@sparrow.telecommunity.com>
Message-ID: <ca471dc20708020948u4c87a997p3e93bf7a51f9503b@mail.gmail.com>

On 8/2/07, Phillip J. Eby <pje at telecommunity.com> wrote:
> At 07:46 AM 8/2/2007 -0700, Guido van Rossum wrote:
> >On 8/1/07, Talin <talin at acm.org> wrote:
> > > I think that in order to 'mix' metaclasses, each metaclass needs to get
> > > a crack at the members as they are defined. The 'dict' object really
> > > isn't important - what's important is to be able to overload the
> > > creation of a class member.
> > >
> > > I can think of a couple ways to accomplish this.
> > >
> > > 1) The first, and most brute force idea is to pass to a metaclass's
> > > __prepare__ statement an extra parameter containing the result of the
> > > previous metaclass's __prepare__ method. This would allow the
> > > __prepare__ statement to *wrap* the earlier metaclass's dict,
> > > intercepting the insertion operations or passing them through as needed.
> >
> >I'm confused. The only way to mix metaclasses is by explicitly
> >multiply inheriting them. So the normal "super" machinery should work,
> >shouldn't it?
>
> Yes.  As I said in the email Talin was replying to, it's sufficient
> to document the fact that a metaclass should call its super()'s
> __prepare__ and delegate operations to it; the additional stuff Talin
> is suggesting is unnecessary.
>
> >  (Except for 'type' not defining __prepare__(), but that
> >can be fixed.)
>
> Yep.

Committed revision 56672.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From nicko at nicko.org  Thu Aug  2 18:47:45 2007
From: nicko at nicko.org (Nicko van Someren)
Date: Thu, 2 Aug 2007 17:47:45 +0100
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B13ADE.7080901@acm.org>
References: <46B13ADE.7080901@acm.org>
Message-ID: <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>

On 2 Aug 2007, at 03:01, Talin wrote:
> In general, you can think of the difference between format  
> specifier and
> alignment specifier as:
>
>      Format Specifier: Controls how the value is converted to a  
> string.
>      Alignment Specifier: Controls how the string is placed on the  
> line.
>
> Another change in the behavior is that the __format__ special  
> method can
> only be used to override the format specifier - it can't be used to
> override the alignment specifier. The reason is simple: __format__ is
> used to control how your object is string-ified. It shouldn't get
> involved in things like left/right alignment or field width, which are
> really properties of the field, not the object being printed.

Say I format numbers in an accounting system and, in the absence of  
being able to colour my losses in red, I choose the parentheses sign  
representation style ().  In this case I'd like to be able to have my  
numbers align thus:
	   1000
	    200
	  (3000)
	     40
	 (50000)
I.e. with the bulk of the padding applied before the number but  
conditional padding after the number if there is no closing bracket.

If the placement is done entirely outside the __format__ method then  
you need to make sure that it is documented that, when using the () style
of sign indicator, positive numbers need to have a space placed on
either side, e.g. -100 goes to "(100)" but +100 goes to " 100 ".  If
you do this then it should all come out in the wash, but I think it  
deserves a note somewhere.
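
As a rough sketch of that convention: if the object-level formatting emits the spaces around positive values, plain right-justification by the field code gives the alignment shown above. The function name and the use of str.rjust() are assumptions made for this example, not part of the PEP.

```python
def accounting(value, width):
    """Format an integer in the parentheses style, padded to `width`."""
    if value < 0:
        body = "(%d)" % -value
    else:
        body = " %d " % value   # the spaces stand in for the brackets
    # Alignment is applied afterwards, outside the object's formatting.
    return body.rjust(width)

for amount in (1000, 200, -3000, 40, -50000):
    print(accounting(amount, 8))
```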

	Cheers,
		Nicko


From guido at python.org  Thu Aug  2 20:30:58 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 2 Aug 2007 11:30:58 -0700
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>
References: <46B13ADE.7080901@acm.org>
	<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>
Message-ID: <ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>

Personally I think support for the various accounting-style output is
not worth it. I betcha any accounting system worth the name would not
use this and instead have its own custom code for formatting anyway.

My personal suggestion is to stay close to the .NET formatting language:

  name_specifier [',' width_specifier] [':' conversion_specifier]

where width_specifier is a positive or negative number giving the
minimum width (negative for left-alignment) and conversion_specifier
is passed uninterpreted to the object's __format__ method.

In order to support the use cases for %s and %r, I propose to allow
appending a single letter 's', 'r' or 'f' to the width_specifier
(*not* the conversion_specifier):

 'r' always calls repr() on the object;
 's' always calls str() on the object;
 'f' calls the object's __format__() method passing it the
conversion_specifier, or if it has no __format__() method, calls
repr() on it. This is also the default.

If no __format__() method was called (either because 'r' or 's' was
used, or because there was no __format__() method on the object), the
conversion_specifier (if given) is a *maximum* length; this handles
the pretty common use cases of %.20s and %.20r (limiting the size of a
printed value).

The numeric types are the main types that must provide __format__().
(I also propose that for datetime types the format string ought to be
interpreted as a strftime format string.) I think that
float.__format__() should *not* support the integer formatting codes
(d, x, o etc.) -- I find the current '%d' % 3.14 == '3' an abomination
which is most likely an incidental effect of calling int() on the
argument (should really be __index__()). But int.__format__() should
support the float formatting codes; I think '%6.3f' % 12 should return
'12.000'. This is in line with 1/2 returning 0.5; int values should
produce results identical to the corresponding float values when used
in the same context. I think this should be solved inside
int.__format__() though; the generic formatting code should not have
to know about this.
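
The rules above can be sketched roughly as follows. This is only one reading of the proposal: render_field() is a hypothetical helper, the field is assumed to be already split into its width and conversion parts, and the builtin format() stands in for calling the object's __format__() method.

```python
def render_field(value, width_spec="", conversion=""):
    """Apply 'name,width:conversion' rules to an already-looked-up value."""
    flag = "f"                                   # 'f' is the default
    if width_spec[-1:] in ("r", "s", "f"):
        flag, width_spec = width_spec[-1], width_spec[:-1]
    width = int(width_spec) if width_spec else 0

    if flag == "f":
        # The conversion is passed uninterpreted to the object.
        text = format(value, conversion)
    else:
        text = repr(value) if flag == "r" else str(value)
        if conversion:
            # No __format__() call: the conversion is a maximum length.
            text = text[:int(conversion)]

    # A negative width means left-alignment.
    return text.ljust(-width) if width < 0 else text.rjust(width)
```

For example, render_field(3.14159, "8", ".2f") pads the float to eight columns, while render_field("abcdefgh", "-6s", "3") truncates to three characters and then left-aligns in six.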

--Guido

On 8/2/07, Nicko van Someren <nicko at nicko.org> wrote:
> On 2 Aug 2007, at 03:01, Talin wrote:
> > In general, you can think of the difference between format
> > specifier and
> > alignment specifier as:
> >
> >      Format Specifier: Controls how the value is converted to a
> > string.
> >      Alignment Specifier: Controls how the string is placed on the
> > line.
> >
> > Another change in the behavior is that the __format__ special
> > method can
> > only be used to override the format specifier - it can't be used to
> > override the alignment specifier. The reason is simple: __format__ is
> > used to control how your object is string-ified. It shouldn't get
> > involved in things like left/right alignment or field width, which are
> > really properties of the field, not the object being printed.
>
> Say I format numbers in an accounting system and, in the absence of
> being able to colour my losses in red, I choose the parentheses sign
> representation style ().  In this case I'd like to be able to have my
> numbers align thus:
>            1000
>             200
>           (3000)
>              40
>          (50000)
> I.e. with the bulk of the padding applied before the number but
> conditional padding after the number if there is no closing bracket.
>
> If the placement is done entirely outside the __format__ method then
> you need to make sure that it is documented that, when using the () style
> of sign indicator, positive numbers need to have a space placed on
> either side, e.g. -100 goes to "(100)" but +100 goes to " 100 ".  If
> you do this then it should all come out in the wash, but I think it
> deserves a note somewhere.
>
>         Cheers,
>                 Nicko
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From eric+python-dev at trueblade.com  Thu Aug  2 20:47:44 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Thu, 02 Aug 2007 11:47:44 -0700
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B13ADE.7080901@acm.org>
References: <46B13ADE.7080901@acm.org>
Message-ID: <46B226D0.5010505@trueblade.com>

Talin wrote:
> The first change is to divide the conversion specifiers into two parts, 
> which we will call "alignment specifiers" and "format specifiers". So 
> the new syntax for a format field will be:
> 
>      valueSpec [,alignmentSpec] [:formatSpec]
> 
> In other words, alignmentSpec is introduced by a comma, and conversion 
> spec is introduced by a colon. This use of comma and colon is taken 
> directly from .Net, although our alignment and conversion specifiers 
> themselves look nothing like the ones in .Net.

Should the .format() method (and underlying machinery) interpret the 
formatSpec (and/or the alignmentSpec) at all?  I believe .NET doesn't 
prescribe any meaning to its format specifiers, but instead passes them 
in to the object for parsing and interpretation.  You would lose the 
ability to implicitly convert to ints, floats and strings, but maybe 
that should be explicit, anyway.

And if we do that, why not require all objects to support a __format__
method?  Maybe if it's not present we could convert to a string, and use
the default string __format__ method.  This way, there would be less
special purpose machinery, and .format() could just parse out the {}'s, 
extract the object from the parameters, and call the underlying object's 
__format__ method.
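
A small sketch of what "the object interprets its own formatSpec" looks like in practice. It relies on str.format() and format() handing the text after the colon to __format__() uninterpreted; the Temperature class and its "F" spec are invented for the example.

```python
class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius

    def __format__(self, spec):
        # "F" is a spec only this class understands (hypothetical).
        if spec == "F":
            return "%.1fF" % (self.celsius * 9 / 5 + 32)
        return "%.1fC" % self.celsius

boiling = Temperature(100)
in_fahrenheit = "{0:F}".format(boiling)
in_celsius = format(Temperature(20), "")
```

The generic machinery only locates the braces and the argument; everything after the colon is the object's business.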

Eric.



From jyasskin at gmail.com  Thu Aug  2 20:53:18 2007
From: jyasskin at gmail.com (Jeffrey Yasskin)
Date: Thu, 2 Aug 2007 11:53:18 -0700
Subject: [Python-3000] Updated and simplified PEP 3141: A Type Hierarchy
	for Numbers
In-Reply-To: <5d44f72f0705161731j4700bdb3h4e36e97757bd6a32@mail.gmail.com>
References: <5d44f72f0705161731j4700bdb3h4e36e97757bd6a32@mail.gmail.com>
Message-ID: <5d44f72f0708021153u7ea1f443jfdee3c167b011011@mail.gmail.com>

After some more discussion, I have another version of the PEP with a
draft, partial implementation. Let me know what you think.



PEP: 3141
Title: A Type Hierarchy for Numbers
Version: $Revision: 56646 $
Last-Modified: $Date: 2007-08-01 10:11:55 -0700 (Wed, 01 Aug 2007) $
Author: Jeffrey Yasskin <jyasskin at gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 23-Apr-2007
Post-History: 25-Apr-2007, 16-May-2007, 02-Aug-2007


Abstract
========

This proposal defines a hierarchy of Abstract Base Classes (ABCs) (PEP
3119) to represent number-like classes. It proposes a hierarchy of
``Number :> Complex :> Real :> Rational :> Integral`` where ``A :> B``
means "A is a supertype of B", and a pair of ``Exact``/``Inexact``
classes to capture the difference between ``floats`` and
``ints``. These types are significantly inspired by Scheme's numeric
tower [#schemetower]_.

Rationale
=========

Functions that take numbers as arguments should be able to determine
the properties of those numbers, and if and when overloading based on
types is added to the language, should be overloadable based on the
types of the arguments. For example, slicing requires its arguments to
be ``Integrals``, and the functions in the ``math`` module require
their arguments to be ``Real``.
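
A short sketch of the kind of check this enables. It assumes the ABCs are importable from a module named ``numbers`` under the names used in this PEP, and that the built-in types and ``fractions.Fraction`` are registered with them; ``classify`` is a hypothetical helper.

```python
from fractions import Fraction
from numbers import Complex, Integral, Rational, Real

def classify(x):
    """Return the name of the narrowest numeric ABC that x satisfies."""
    for abc in (Integral, Rational, Real, Complex):
        if isinstance(x, abc):
            return abc.__name__
    return "not a number"
```

A function can then accept "any Real" without naming ``float`` or ``int`` explicitly.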

Specification
=============

This PEP specifies a set of Abstract Base Classes, and suggests a
general strategy for implementing some of the methods. It uses
terminology from PEP 3119, but the hierarchy is intended to be
meaningful for any systematic method of defining sets of classes.

The type checks in the standard library should use these classes
instead of the concrete built-ins.


Numeric Classes
---------------

We begin with a Number class to make it easy for people to be fuzzy
about what kind of number they expect. This class only helps with
overloading; it doesn't provide any operations. ::

    class Number(metaclass=ABCMeta): pass


Most implementations of complex numbers will be hashable, but if you
need to rely on that, you'll have to check it explicitly: mutable
numbers are supported by this hierarchy. **Open issue:** Should
__pos__ coerce the argument to be an instance of the type it's defined
on? Why do the builtins do this? ::

    class Complex(Number):
        """Complex defines the operations that work on the builtin complex type.

        In short, those are: a conversion to complex, .real, .imag, +, -,
        *, /, abs(), .conjugate, ==, and !=.

        If it is given heterogeneous arguments, and doesn't have special
        knowledge about them, it should fall back to the builtin complex
        type as described below.
        """

        @abstractmethod
        def __complex__(self):
            """Return a builtin complex instance."""

        def __bool__(self):
            """True if self != 0."""
            return self != 0

        @abstractproperty
        def real(self):
            """Retrieve the real component of this number.

            This should subclass Real.
            """
            raise NotImplementedError

        @abstractproperty
        def imag(self):
            """Retrieve the imaginary component of this number.

            This should subclass Real.
            """
            raise NotImplementedError

        @abstractmethod
        def __add__(self, other):
            raise NotImplementedError

        @abstractmethod
        def __radd__(self, other):
            raise NotImplementedError

        @abstractmethod
        def __neg__(self):
            raise NotImplementedError

        def __pos__(self):
            return self

        def __sub__(self, other):
            return self + -other

        def __rsub__(self, other):
            return -self + other

        @abstractmethod
        def __mul__(self, other):
            raise NotImplementedError

        @abstractmethod
        def __rmul__(self, other):
            raise NotImplementedError

        @abstractmethod
        def __div__(self, other):
            raise NotImplementedError

        @abstractmethod
        def __rdiv__(self, other):
            raise NotImplementedError

        @abstractmethod
        def __pow__(self, exponent):
            """Like division, a**b should promote to complex when necessary."""
            raise NotImplementedError

        @abstractmethod
        def __rpow__(self, base):
            raise NotImplementedError

        @abstractmethod
        def __abs__(self):
            """Returns the Real distance from 0."""
            raise NotImplementedError

        @abstractmethod
        def conjugate(self):
            """(x+y*i).conjugate() returns (x-y*i)."""
            raise NotImplementedError

        @abstractmethod
        def __eq__(self, other):
            raise NotImplementedError

        def __ne__(self, other):
            return not (self == other)


The ``Real`` ABC indicates that the value is on the real line, and
supports the operations of the ``float`` builtin. Real numbers are
totally ordered except for NaNs (which this PEP basically ignores). ::

    class Real(Complex):
        """To Complex, Real adds the operations that work on real numbers.

        In short, those are: a conversion to float, trunc(), divmod,
        %, <, <=, >, and >=.

        Real also provides defaults for the derived operations.
        """

        @abstractmethod
        def __float__(self):
            """Any Real can be converted to a native float object."""
            raise NotImplementedError

        @abstractmethod
        def __trunc__(self):
            """Truncates self to an Integral.

            Returns an Integral i such that:
              * i>0 iff self>0
              * abs(i) <= abs(self).
            """
            raise NotImplementedError

        def __divmod__(self, other):
            """The pair (self // other, self % other).

            Sometimes this can be computed faster than the pair of
            operations.
            """
            return (self // other, self % other)

        def __rdivmod__(self, other):
            """The pair (other // self, other % self).

            Sometimes this can be computed faster than the pair of
            operations.
            """
            return (other // self, other % self)

        @abstractmethod
        def __floordiv__(self, other):
            """The floor() of self/other. Integral."""
            raise NotImplementedError

        @abstractmethod
        def __rfloordiv__(self, other):
            """The floor() of other/self."""
            raise NotImplementedError

        @abstractmethod
        def __mod__(self, other):
            raise NotImplementedError

        @abstractmethod
        def __rmod__(self, other):
            raise NotImplementedError

        @abstractmethod
        def __lt__(self, other):
            """< on Reals defines a total ordering, except perhaps for NaN."""
            raise NotImplementedError

        @abstractmethod
        def __le__(self, other):
            raise NotImplementedError

        # Concrete implementations of Complex abstract methods.

        def __complex__(self):
            return complex(float(self))

        @property
        def real(self):
            return self

        @property
        def imag(self):
            return 0

        def conjugate(self):
            """Conjugate is a no-op for Reals."""
            return self


There is no built-in rational type, but it's straightforward to write,
so we provide an ABC for it. **Open issue**: Add Demo/classes/Rat.py
to the stdlib? ::

    class Rational(Real, Exact):
        """.numerator and .denominator should be in lowest terms."""

        @abstractproperty
        def numerator(self):
            raise NotImplementedError

        @abstractproperty
        def denominator(self):
            raise NotImplementedError

        # Concrete implementation of Real's conversion to float.

        def __float__(self):
            return self.numerator / self.denominator


And finally integers::

    class Integral(Rational):
        """Integral adds a conversion to int and the bit-string operations."""

        @abstractmethod
        def __int__(self):
            raise NotImplementedError

        def __index__(self):
            return int(self)

        @abstractmethod
        def __pow__(self, exponent, modulus=None):
            """self ** exponent % modulus, but maybe faster.

            Implement this if you want to support the 3-argument version
            of pow(). Otherwise, just implement the 2-argument version
            described in Complex. Raise a TypeError if exponent < 0 or any
            argument isn't Integral.
            """
            raise NotImplementedError

        @abstractmethod
        def __lshift__(self, other):
            raise NotImplementedError

        @abstractmethod
        def __rlshift__(self, other):
            raise NotImplementedError

        @abstractmethod
        def __rshift__(self, other):
            raise NotImplementedError

        @abstractmethod
        def __rrshift__(self, other):
            raise NotImplementedError

        @abstractmethod
        def __and__(self, other):
            raise NotImplementedError

        @abstractmethod
        def __rand__(self, other):
            raise NotImplementedError

        @abstractmethod
        def __xor__(self, other):
            raise NotImplementedError

        @abstractmethod
        def __rxor__(self, other):
            raise NotImplementedError

        @abstractmethod
        def __or__(self, other):
            raise NotImplementedError

        @abstractmethod
        def __ror__(self, other):
            raise NotImplementedError

        @abstractmethod
        def __invert__(self):
            raise NotImplementedError

        # Concrete implementations of Rational and Real abstract methods.

        def __float__(self):
            return float(int(self))

        @property
        def numerator(self):
            return self

        @property
        def denominator(self):
            return 1


Exact vs. Inexact Classes
-------------------------

Floating point values may not exactly obey several of the properties
you would expect. For example, it is possible for ``(X + -X) + 3 ==
3``, but ``X + (-X + 3) == 0``. On the range of values that most
functions deal with this isn't a problem, but it is something to be
aware of.
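
A concrete instance of that example, assuming IEEE 754 double precision floats. The value ``1e20`` is chosen only because adding 3 to it is lost to rounding.

```python
X = 1e20

left = (X + -X) + 3    # the large terms cancel first, so the 3 survives
right = X + (-X + 3)   # the 3 is absorbed into -X by rounding

print(left, right)
```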

Therefore, I define ``Exact`` and ``Inexact`` ABCs to mark whether
types have this problem. Every instance of ``Integral`` and
``Rational`` should be Exact, but ``Reals`` and ``Complexes`` may or
may not be. (Do we really only need one of these, and the other is
defined as ``not`` the first?) ::

    class Exact(Number): pass
    class Inexact(Number): pass


Changes to operations and __magic__ methods
-------------------------------------------

To support more precise narrowing from float to int (and more
generally, from Real to Integral), I'm proposing the following new
__magic__ methods, to be called from the corresponding library
functions. All of these return Integrals rather than Reals.

1. ``__trunc__(self)``, called from a new builtin ``trunc(x)``, which
   returns the Integral closest to ``x`` between 0 and ``x``.

2. ``__floor__(self)``, called from ``math.floor(x)``, which returns
   the greatest Integral ``<= x``.

3. ``__ceil__(self)``, called from ``math.ceil(x)``, which returns the
   least Integral ``>= x``.

4. ``__round__(self)``, called from ``round(x)``, which returns the
   Integral closest to ``x``, rounding half toward even. **Open
   issue:** We could support the 2-argument version, but then we'd
   only return an Integral if the second argument were ``<= 0``.

5. ``__properfraction__(self)``, called from a new function,
   ``math.properfraction(x)``, which resembles C's ``modf()``: returns
   a pair ``(n:Integral, r:Real)`` where ``x == n + r``, both ``n``
   and ``r`` have the same sign as ``x``, and ``abs(r) < 1``. **Open
   issue:** Oh, we already have ``math.modf``. What name do we want
   for this? Should we use divmod(x, 1) instead?
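
A sketch of how a type might implement the first four hooks. ``Halves`` is a hypothetical exact type used only for illustration, and the example assumes ``trunc()`` is reachable as ``math.trunc()`` (the proposal above makes it a builtin) and that ``math.floor()``, ``math.ceil()`` and ``round()`` dispatch to the corresponding methods.

```python
import math

class Halves:
    """Hypothetical exact type holding the value twice / 2."""
    def __init__(self, twice):
        self.twice = twice

    def __floor__(self):
        return self.twice // 2

    def __ceil__(self):
        return -(-self.twice // 2)

    def __trunc__(self):
        # Toward zero: floor for non-negative values, ceil otherwise.
        return self.__floor__() if self.twice >= 0 else self.__ceil__()

    def __round__(self, ndigits=None):
        low = self.twice // 2
        # Exact integers stay put; ties go to the even neighbour.
        if self.twice % 2 == 0 or low % 2 == 0:
            return low
        return low + 1

minus_two_and_a_half = Halves(-5)
results = (math.trunc(minus_two_and_a_half),
           math.floor(minus_two_and_a_half),
           math.ceil(minus_two_and_a_half),
           round(minus_two_and_a_half))
```

All four results are plain integers, which is the point of having these methods return Integrals rather than Reals.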

Because the ``int()`` conversion from ``float`` is equivalent to but
less explicit than ``trunc()``, let's remove it. (Or, if that breaks
too much, just add a deprecation warning.)

``complex.__{divmod,mod,floordiv,int,float}__`` should also go
away. These should continue to raise ``TypeError`` to help confused
porters, but should not appear in ``help(complex)`` to avoid confusing
more people. **Open issue:** This is difficult to do with the
``PyNumberMethods`` struct. What's the best way to accomplish it?


Notes for type implementors
---------------------------

Implementors should be careful to make equal numbers equal and
hash them to the same values. This may be subtle if there are two
different extensions of the real numbers. For example, a complex type
could reasonably implement hash() as follows::

    def __hash__(self):
        return hash(complex(self))

but should be careful of any values that fall outside of the built in
complex's range or precision.
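
As a quick check of the invariant being asked for, the following sketch compares equal values from several existing numeric types. It assumes the built-in types and ``fractions.Fraction`` already hash equal values identically.

```python
from fractions import Fraction

values = [1, 1.0, Fraction(1, 1), complex(1, 0)]

# Equal numbers must be usable interchangeably as dict keys.
all_equal = all(v == values[0] for v in values)
same_hash = len({hash(v) for v in values}) == 1
```

A new numeric type that compares equal to these values but hashes differently would break dictionary and set lookups.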

Adding More Numeric ABCs
~~~~~~~~~~~~~~~~~~~~~~~~

There are, of course, more possible ABCs for numbers, and this would
be a poor hierarchy if it precluded the possibility of adding
those. You can add ``MyFoo`` between ``Complex`` and ``Real`` with::

    class MyFoo(Complex): ...
    MyFoo.register(Real)
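
A runnable version of the same idea, assuming the ABCs live in a ``numbers`` module and use the ``register()`` mechanism from PEP 3119. ``MyFoo`` remains a hypothetical class.

```python
from numbers import Complex, Real

class MyFoo(Complex):
    """Hypothetical ABC sitting between Complex and Real."""

# Declare every Real to be a (virtual) MyFoo.
MyFoo.register(Real)

float_is_myfoo = isinstance(1.5, MyFoo)
complex_is_myfoo = isinstance(1 + 2j, MyFoo)
```

Registration inserts ``MyFoo`` into the hierarchy after the fact, without editing ``Real`` or the built-in types.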

Implementing the arithmetic operations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We want to implement the arithmetic operations so that mixed-mode
operations either call an implementation whose author knew about the
types of both arguments, or convert both to the nearest built in type
and do the operation there. For subtypes of Integral, this means that
__add__ and __radd__ should be defined as::

    class MyIntegral(Integral):

        def __add__(self, other):
            if isinstance(other, MyIntegral):
                return do_my_adding_stuff(self, other)
            elif isinstance(other, OtherTypeIKnowAbout):
                return do_my_other_adding_stuff(self, other)
            else:
                return NotImplemented

        def __radd__(self, other):
            if isinstance(other, MyIntegral):
                return do_my_adding_stuff(other, self)
            elif isinstance(other, OtherTypeIKnowAbout):
                return do_my_other_adding_stuff(other, self)
            elif isinstance(other, Integral):
                return int(other) + int(self)
            elif isinstance(other, Real):
                return float(other) + float(self)
            elif isinstance(other, Complex):
                return complex(other) + complex(self)
            else:
                return NotImplemented


There are 5 different cases for a mixed-type operation on subclasses
of Complex. I'll refer to all of the above code that doesn't refer to
MyIntegral and OtherTypeIKnowAbout as "boilerplate". ``a`` will be an
instance of ``A``, which is a subtype of ``Complex`` (``a : A <:
Complex``), and ``b : B <: Complex``. I'll consider ``a + b``:

    1. If A defines an __add__ which accepts b, all is well.
    2. If A falls back to the boilerplate code, and it were to return
       a value from __add__, we'd miss the possibility that B defines
       a more intelligent __radd__, so the boilerplate should return
       NotImplemented from __add__. (Or A may not implement __add__ at
       all.)
    3. Then B's __radd__ gets a chance. If it accepts a, all is well.
    4. If it falls back to the boilerplate, there are no more possible
       methods to try, so this is where the default implementation
       should live.
    5. If B <: A, Python tries B.__radd__ before A.__add__. This is
       ok, because it was implemented with knowledge of A, so it can
       handle those instances before delegating to Complex.

If ``A<:Complex`` and ``B<:Real`` without sharing any other knowledge,
then the appropriate shared operation is the one involving the built
in complex, and both __radd__s land there, so ``a+b == b+a``.
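
Cases 1 to 3 can be seen in miniature with two unrelated classes. ``Meters`` and ``Feet`` are hypothetical and are not ``Complex`` subclasses; the sketch only shows how returning ``NotImplemented`` from ``__add__`` hands control to the right operand's ``__radd__``.

```python
class Meters:
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if isinstance(other, Meters):
            return Meters(self.value + other.value)
        # Unknown type: let the right operand have a go.
        return NotImplemented

class Feet:
    def __init__(self, value):
        self.value = value

    def __radd__(self, other):
        # Feet was written with knowledge of Meters.
        if isinstance(other, Meters):
            return Meters(other.value + self.value * 0.3048)
        return NotImplemented

total = Meters(1) + Feet(10)   # resolved by Feet.__radd__
```

If both methods return ``NotImplemented``, the interpreter raises ``TypeError``, which is why the generic fallback belongs in ``__radd__`` rather than in ``__add__``.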


Rejected Alternatives
=====================

The initial version of this PEP defined an algebraic hierarchy
inspired by a Haskell Numeric Prelude [#numericprelude]_ including
MonoidUnderPlus, AdditiveGroup, Ring, and Field, and mentioned several
other possible algebraic types before getting to the numbers. I had
expected this to be useful to people using vectors and matrices, but
the NumPy community really wasn't interested, and we ran into the
issue that even if ``x`` is an instance of ``X <: MonoidUnderPlus``
and ``y`` is an instance of ``Y <: MonoidUnderPlus``, ``x + y`` may
still not make sense.

Then I gave the numbers a much more branching structure to include
things like the Gaussian Integers and Z/nZ, which could be Complex but
wouldn't necessarily support things like division. The community
decided that this was too much complication for Python, so I've now
scaled back the proposal to resemble the Scheme numeric tower much
more closely.


References
==========

.. [#pep3119] Introducing Abstract Base Classes
   (http://www.python.org/dev/peps/pep-3119/)

.. [#classtree] Possible Python 3K Class Tree?, wiki page created by
   Bill Janssen
   (http://wiki.python.org/moin/AbstractBaseClasses)

.. [#numericprelude] NumericPrelude: An experimental alternative
   hierarchy of numeric type classes
   (http://darcs.haskell.org/numericprelude/docs/html/index.html)

.. [#schemetower] The Scheme numerical tower
   (http://www.swiss.ai.mit.edu/ftpdir/scheme-reports/r5rs-html/r5rs_8.html#SEC50)


Acknowledgements
================

Thanks to Neal Norwitz for encouraging me to write this PEP in the
first place, to Travis Oliphant for pointing out that the numpy people
didn't really care about the algebraic concepts, to Alan Isaac for
reminding me that Scheme had already done this, and to Guido van
Rossum and lots of other people on the mailing list for refining the
concept.

Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:

From martin at v.loewis.de  Thu Aug  2 21:43:14 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 02 Aug 2007 21:43:14 +0200
Subject: [Python-3000] optimizing [x]range
In-Reply-To: <f8srod$5m7$1@sea.gmane.org>
References: <1d85506f0707280806n1764151cx4961a0573dda435e@mail.gmail.com>
	<f8srod$5m7$1@sea.gmane.org>
Message-ID: <46B233D2.4030304@v.loewis.de>

> The patch is based on the latest trunk/ checkout, Python 2.6. I don't 
> think this is a problem if nobody else made any effort towards making 
> xrange more sequence-like in the Python 3000 branch. The C source might 
> require some tab/space cleanup.

Unfortunately, this is exactly what happened: In Py3k, the range object
is defined in terms of PyObject*, so your patch won't apply to the 3k branch.

Regards,
Martin

From guido at python.org  Thu Aug  2 21:47:09 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 2 Aug 2007 12:47:09 -0700
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B2323E.6040206@cornell.edu>
References: <46B13ADE.7080901@acm.org>
	<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>
	<ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>
	<46B2323E.6040206@cornell.edu>
Message-ID: <ca471dc20708021247w5e7450aam6a16206b93534343@mail.gmail.com>

On 8/2/07, Joel Bender <jjb5 at cornell.edu> wrote:
> > My personal suggestion is to stay close to the .NET formatting language
>
> If Microsoft formatting ideas are going to be used, why not use the
> Excel language?  In my mind it's not any worse than any other string of
> characters with special meanings.  It's widely understood (mostly),
> clearly documented (kinda), and I think the date and time formatting is
> clearer than strftime.
>
> I would expect [Red] to be omitted.

You may be overestimating how widely understood it is. I betcha
that most Python programmers have never heard of it. I certainly have
no idea what the Excel language is (and I've had Excel on my various
laptops for about a decade).

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jjb5 at cornell.edu  Thu Aug  2 21:36:30 2007
From: jjb5 at cornell.edu (Joel Bender)
Date: Thu, 02 Aug 2007 15:36:30 -0400
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>
References: <46B13ADE.7080901@acm.org>	<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>
	<ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>
Message-ID: <46B2323E.6040206@cornell.edu>

Guido van Rossum wrote:

> Personally I think support for the various accounting-style output is
> not worth it. I betcha any accounting system worth the name would not
> use this and instead have its own custom code for formatting anyway.
> 
> My personal suggestion is to stay close to the .NET formatting language

If Microsoft formatting ideas are going to be used, why not use the 
Excel language?  In my mind it's not any worse than any other string of 
characters with special meanings.  It's widely understood (mostly), 
clearly documented (kinda), and I think the date and time formatting is 
clearer than strftime.

I would expect [Red] to be omitted.


Joel

From guido at python.org  Fri Aug  3 00:25:36 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 2 Aug 2007 15:25:36 -0700
Subject: [Python-3000] optimizing [x]range
In-Reply-To: <46B233D2.4030304@v.loewis.de>
References: <1d85506f0707280806n1764151cx4961a0573dda435e@mail.gmail.com>
	<f8srod$5m7$1@sea.gmane.org> <46B233D2.4030304@v.loewis.de>
Message-ID: <ca471dc20708021525m654eecb0wa5997ba602ebd1d7@mail.gmail.com>

On 8/2/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > The patch is based on the latest trunk/ checkout, Python 2.6. I don't
> > think this is a problem if nobody else made any effort towards making
> > xrange more sequence-like in the Python 3000 branch. The C source might
> > require some tab/space cleanup.
>
> Unfortunately, this is exactly what happened: In Py3k, the range object
> is defined in terms of PyObject*, so your patch won't apply to the 3k branch.

FWIW, making xrange (or range in Py3k) "more sequence-like" is exactly
what should *not* happen.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From tjreedy at udel.edu  Fri Aug  3 01:01:39 2007
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 2 Aug 2007 19:01:39 -0400
Subject: [Python-3000] Updated and simplified PEP 3141: A Type
	Hierarchyfor Numbers
References: <5d44f72f0705161731j4700bdb3h4e36e97757bd6a32@mail.gmail.com>
	<5d44f72f0708021153u7ea1f443jfdee3c167b011011@mail.gmail.com>
Message-ID: <f8tnog$45f$1@sea.gmane.org>

"Jeffrey Yasskin" <jyasskin at gmail.com> wrote in message 
news:5d44f72f0708021153u7ea1f443jfdee3c167b011011 at mail.gmail.com...

|        def __bool__(self):
|            """True if self != 0."""
|            return self != 0

Could this be a Number rather than Complex method?
---------------

| There is no built-in rational type

Floats constitute a bit-size bounded (like ints) set of rationals with 
denominators restricted to powers of two.  Decimal literals and Decimals 
constitute a memory bounded (like longs) set of rationals with denominators 
instead restricted to powers of ten.  I suspect that if both were presented 
as such, new programmers would be less likely to ask if
>>> 1.1
1.1000000000000001
is a bug in Python.
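
A quick way to see both claims, as a sketch only: it assumes a Python
recent enough to have the fractions module and a Fraction constructor
that accepts floats and Decimals exactly, neither of which is in 2.5.

```python
from decimal import Decimal
from fractions import Fraction

# Exact rational value of the binary float nearest to 1.1.
as_float = Fraction(1.1)
# Exact rational value of the decimal literal 1.1.
as_decimal = Fraction(Decimal("1.1"))


def is_power_of_two(n):
    """True if n is a positive power of two (including 1)."""
    return n > 0 and n & (n - 1) == 0


print(as_float)
print(is_power_of_two(as_float.denominator))
print(as_decimal)
```

The float's denominator is a power of two, the Decimal's is a power of 
ten, and the two rationals are not equal.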

math.frexp returns a disguised form of (numerator, denominator) (without 
common factor of two removal).  If undisguised functions were added (and 
the same for Decimal), there would be no need, really, for class Real.

If such were done, a .num_denom() method either supplementing or replacing 
.numerator() and .denominator() and returning (num, denom) would have the 
same efficiency justification as int.__divmod__.
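
To make "disguised" concrete, here is a rough sketch of an undisguised 
version built on math.frexp.  The function name num_denom is the 
hypothetical one suggested above, and the 53-bit mantissa is an 
assumption (IEEE 754 doubles); the fractions module is used only to 
reduce to lowest terms.

```python
import math
from fractions import Fraction

MANTISSA_BITS = 53  # assumption: floats are IEEE 754 doubles


def num_denom(x):
    """Return (numerator, denominator) of the float x in lowest terms."""
    # frexp gives x == mantissa * 2**exponent with 0.5 <= |mantissa| < 1.
    mantissa, exponent = math.frexp(x)
    # Scaling the mantissa by 2**53 yields an exact integer.
    numerator = int(math.ldexp(mantissa, MANTISSA_BITS))
    ratio = Fraction(numerator) * Fraction(2) ** (exponent - MANTISSA_BITS)
    return ratio.numerator, ratio.denominator


print(num_denom(0.75))
print(num_denom(1.1))
```

On a Python that has float.as_integer_ratio(), the two should agree.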

I would like to see a conforming Rat.py class with unrestricted 
denominators.
--------------------

| And finally integers::
|
|    class Integral(Rational):
|        """Integral adds a conversion to int and the bit-string
|        operations."""

The bit-string operations are not 'integer' operations.  Rather they are 
'integers represented as powers of two' operations.  While << and >> can be 
interpreted (and implemented) as * and //, the other four are generally 
meaningless for other representations, such as prime factorization or 
Fibonacci base.  The Lib Ref agrees:
   3.4.1 Bit-string Operations on Integer Types
   Plain and long integer types support additional operations that make
   sense only for bit-strings
Other integer types should not have to support them to call themselves 
Integral.  So I think at least |, ^, &, and ~ should be removed from 
Integral and put in a subclass thereof.  Possible names are Pow2Int or 
BitStringInt or BitIntegral.
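
A small sketch of the << and >> point; shift_left and shift_right are 
made-up helper names, written using only * and // so that they would 
work for any integer representation that has those two operations:

```python
def shift_left(x, n):
    """x << n expressed as multiplication by a power of two."""
    return x * 2 ** n


def shift_right(x, n):
    """x >> n expressed as floor division by a power of two."""
    return x // 2 ** n


print(shift_left(5, 3), 5 << 3)
print(shift_right(-5, 1), -5 >> 1)
```

Both agree with the built-in operators for negative values too, since 
>> and // both round toward minus infinity.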
-----------

In short, having read up to the beginning of Exact vs. Inexact Classes, my 
suggestion is to delete the unrealizable 'real' class and add an easily 
realizable non-bit-string integer class.

Terry Jan Reedy




From tjreedy at udel.edu  Fri Aug  3 01:30:59 2007
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 2 Aug 2007 19:30:59 -0400
Subject: [Python-3000] Updated and simplified PEP 3141: A
	TypeHierarchyfor Numbers
References: <5d44f72f0705161731j4700bdb3h4e36e97757bd6a32@mail.gmail.com><5d44f72f0708021153u7ea1f443jfdee3c167b011011@mail.gmail.com>
	<f8tnog$45f$1@sea.gmane.org>
Message-ID: <f8tpfh$85o$1@sea.gmane.org>


"Terry Reedy" <tjreedy at udel.edu> wrote in message 
news:f8tnog$45f$1 at sea.gmane.org...
| In short, having read up to the beginning of Exact vs. Inexact Classes, my
| suggestion is to delete the unrealizable 'real' class

Less than a minute after hitting Send, I realized that one could base a 
(restricted) class of non-rational reals on a tuple of rationals, with one 
being an exponent of the other.  But since operations on such pairs 
generally do not simplify to such a pair, the members of such a class would 
have to be expression trees.  So computation would be mostly symbolic 
rather than actual.  And I don't think we need an ABC for such a 
specialized symbolic computation class.

tjr




From guido at python.org  Fri Aug  3 01:34:50 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 2 Aug 2007 16:34:50 -0700
Subject: [Python-3000] Updated and simplified PEP 3141: A Type
	Hierarchyfor Numbers
In-Reply-To: <f8tnog$45f$1@sea.gmane.org>
References: <5d44f72f0705161731j4700bdb3h4e36e97757bd6a32@mail.gmail.com>
	<5d44f72f0708021153u7ea1f443jfdee3c167b011011@mail.gmail.com>
	<f8tnog$45f$1@sea.gmane.org>
Message-ID: <ca471dc20708021634y1606531el266d37f8f8a5f706@mail.gmail.com>

On 8/2/07, Terry Reedy <tjreedy at udel.edu> wrote:
> Floats constitute a bit-size bounded (like ints) set of rationals with
> denominators restricted to powers of two.  Decimal literals and Decimals
> constitute a memory bounded (like longs) set of rationals with denominators
> instead restricted to powers of ten.  I suspect that if both were presented
> as such, new programmers would be less likely to ask if
> >>> 1.1
> 1.1000000000000001
> is a bug in Python.

You gotta be kidding. That complaint mostly comes from people who
would completely glaze over an explanation like the paragraph above.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From janssen at parc.com  Fri Aug  3 03:26:58 2007
From: janssen at parc.com (Bill Janssen)
Date: Thu, 2 Aug 2007 18:26:58 PDT
Subject: [Python-3000] Updated and simplified PEP 3141: A Type
	Hierarchyfor Numbers
In-Reply-To: <f8tnog$45f$1@sea.gmane.org> 
References: <5d44f72f0705161731j4700bdb3h4e36e97757bd6a32@mail.gmail.com>
	<5d44f72f0708021153u7ea1f443jfdee3c167b011011@mail.gmail.com>
	<f8tnog$45f$1@sea.gmane.org>
Message-ID: <07Aug2.182707pdt."57996"@synergy1.parc.xerox.com>

Terry,

I liked these ideas so much I removed both "integer" and "float" from
the HTTP-NG type system.  See
http://www.parc.com/janssen/pubs/http-next-generation-architecture.html,
section 4.5.1.  Though if I was doing it again, I'd go further, and
make all fixed-point, floating-point, and string types abstract, so
that only application-defined concrete subtypes could be instantiated.

Bill

From greg.ewing at canterbury.ac.nz  Fri Aug  3 03:33:08 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 03 Aug 2007 13:33:08 +1200
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>
References: <46B13ADE.7080901@acm.org>
	<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>
Message-ID: <46B285D4.1060207@canterbury.ac.nz>

Nicko van Someren wrote:
> 	   1000
> 	    200
> 	  (3000)
> 	     40
> 	 (50000)
> I.e. with the bulk of the padding applied before the number but  
> conditional padding after the number if there is no closing bracket.

I think it should be the responsibility of the formatter
to add the extra space when needed. Then the aligner can
just do its usual thing with the result and doesn't have
to know anything about the format.
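
Roughly what I have in mind, as a sketch only (accounting and
align_right are made-up names, not proposed API):

```python
def accounting(n):
    """Formatter: brackets for negatives, a trailing space otherwise."""
    if n < 0:
        return "(%d)" % -n
    # The extra space stands in for the missing closing bracket.
    return "%d " % n


def align_right(text, width):
    """Aligner: knows nothing about the format, just pads on the left."""
    return text.rjust(width)


for amount in (1000, 200, -3000, 40, -50000):
    print(align_right(accounting(amount), 8))
```

The digits line up as in the column quoted above, and the aligner
never has to look inside the string.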

-- 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiem!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at canterbury.ac.nz	   +--------------------------------------+

From greg.ewing at canterbury.ac.nz  Fri Aug  3 04:03:52 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 03 Aug 2007 14:03:52 +1200
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>
References: <46B13ADE.7080901@acm.org>
	<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>
	<ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>
Message-ID: <46B28D08.9000700@canterbury.ac.nz>

Guido van Rossum wrote:
> In order to support the use cases for %s and %r, I propose to allow
> appending a single letter 's', 'r' or 'f' to the width_specifier
> (*not* the conversion_specifier):
> 
>  'r' always calls repr() on the object;
>  's' always calls str() on the object;
>  'f' calls the object's __format__() method passing it the
> conversion_specifier, or if it has no __format__() method, calls
> repr() on it. This is also the default.

Won't it seem a bit unintuitive that 'r' and 's' have
to come before the colon, whereas all the others come
after it?

It would seem more logical to me if 'r' and 's' were
treated as special cases of the conversion specifier
that are recognised before calling __format__.

-- 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiem!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at canterbury.ac.nz	   +--------------------------------------+

From brandon at rhodesmill.org  Fri Aug  3 04:14:58 2007
From: brandon at rhodesmill.org (Brandon Craig Rhodes)
Date: Thu, 02 Aug 2007 22:14:58 -0400
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>
	(Guido van Rossum's message of "Thu, 2 Aug 2007 11:30:58 -0700")
References: <46B13ADE.7080901@acm.org>
	<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>
	<ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>
Message-ID: <87d4y5749p.fsf@ten22.rhodesmill.org>

"Guido van Rossum" <guido at python.org> writes:

> My personal suggestion is to stay close to the .NET formatting language:
>
>   name_specifier [',' width_specifier] [':' conversion_specifier]

A problem is that this format requires brute memorization to remember
where to put things.  If letters were used to prefix specifications,
like "w" for width and "p" for precision, one could write something
like:

   >>> 'The average is: {0:w8p2} today.'.format(avg)
   'The average is:     7.24 today.'

This would give users at least a shot at mnemonically parsing - and
constructing - format strings, and eliminate the problem of having to
decide what goes first.

If, on the other hand, all we have to go on are some commas and
colons, then I, for one, will probably always have to look things up -
just like I always did for C-style percent-sign format specifications
in the first place.
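
To show that the lettered form carries the same information, here is a
rough sketch that translates it into a width and precision.  The name
format_lettered and the "w"/"p" grammar are only my suggestion above,
and the built-in format() with an "8.2f"-style spec is assumed as the
back end:

```python
import re

# Optional "w<digits>" followed by optional "p<digits>".
_LETTERED = re.compile(r"(?:w(\d+))?(?:p(\d+))?")


def format_lettered(value, spec):
    """Format value using a spec such as 'w8p2' (width 8, precision 2)."""
    match = _LETTERED.fullmatch(spec)
    if match is None:
        raise ValueError("unrecognised spec: %r" % spec)
    width, precision = match.groups()
    standard = width or ""
    if precision is not None:
        standard += "." + precision + "f"
    return format(value, standard)


print("The average is: %s today." % format_lettered(7.2449, "w8p2"))
```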

-- 
Brandon Craig Rhodes   brandon at rhodesmill.org   http://rhodesmill.org/brandon

From jyasskin at gmail.com  Fri Aug  3 05:06:32 2007
From: jyasskin at gmail.com (Jeffrey Yasskin)
Date: Thu, 2 Aug 2007 20:06:32 -0700
Subject: [Python-3000] Updated and simplified PEP 3141: A Type
	Hierarchyfor Numbers
In-Reply-To: <f8tnog$45f$1@sea.gmane.org>
References: <5d44f72f0705161731j4700bdb3h4e36e97757bd6a32@mail.gmail.com>
	<5d44f72f0708021153u7ea1f443jfdee3c167b011011@mail.gmail.com>
	<f8tnog$45f$1@sea.gmane.org>
Message-ID: <5d44f72f0708022006l51618c4du7223ffad3bb6a0b6@mail.gmail.com>

On 8/2/07, Terry Reedy <tjreedy at udel.edu> wrote:
> "Jeffrey Yasskin" <jyasskin at gmail.com> wrote in message
> news:5d44f72f0708021153u7ea1f443jfdee3c167b011011 at mail.gmail.com...
>
> |        def __bool__(self):
> |            """True if self != 0."""
> |            return self != 0
>
> Could this be a Number rather than Complex method?

Yes, as could probably __add__, __sub__, __mul__, __abs__, and maybe a
few others, but I didn't have a good criterion for distinguishing.
What's fundamental about a "number"? I chose to punt for now, thinking
that we can move operations up there later if we want.

Remember that this is all duck typed anyway: if you overload a
function based on Number vs Sequence, that doesn't stop you from
dividing the numbers you got.
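
A sketch of what I mean, assuming the ABCs end up importable as
numbers.Number and collections.abc.Sequence (the module names are an
assumption here), with describe and halve as made-up functions:

```python
from collections.abc import Sequence
from fractions import Fraction
from numbers import Number


def describe(x):
    """Overload on the abstract type only."""
    if isinstance(x, Number):
        return "number"
    if isinstance(x, Sequence):
        return "sequence"
    return "other"


def halve(x):
    """Number promises no division, but duck typing still lets us try."""
    if not isinstance(x, Number):
        raise TypeError("expected a number")
    return x / 2


print(describe(3), describe([1, 2]), describe(None))
print(halve(3), halve(Fraction(1, 2)))
```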

> ---------------
>
> | There is no built-in rational type
>
> Floats constitute a bit-size bounded (like ints) set of rationals with
> denominators restricted to powers of two.  Decimal literals and Decimals
> constitute a memory bounded (like longs) set of rationals with denominators
> instead restricted to powers of ten.

You are strictly correct, but I think people think of them as
approximations to the real numbers, rather than restricted and inexact
rationals. In particular, functions like exp, sin, etc. make sense on
approximated reals, but not on rationals.

> math.frexp returns a disguised form of (numerator, denominator) (without
> common factor of two removal).  If undisguised functions were added (and
> the same for Decimal), there would be no need, really, for class Real.
>
> If such were done, a .num_denom() method either supplementing or replacing
> .numerator() and .denominator() and returning (num, denom) would have the
> same efficiency justification as int.__divmod__.
>
> I would like to see a conforming Rat.py class with unrestricted
> denominators.
> --------------------
>
> | And finally integers::
> |
> |    class Integral(Rational):
> |        """Integral adds a conversion to int and the bit-string
> |        operations."""
>
> The bit-string operations are not 'integer' operations.  Rather they are
> 'integers represented as powers of two' operations.  While << and >> can be
> interpreted (and implemented) as * and //, the other four are generally
> meaningless for other representations, such as prime factorization or
> Fibonacci base.  The Lib Ref agrees:
>    3.4.1 Bit-string Operations on Integer Types
>    Plain and long integer types support additional operations that make
>    sense only for bit-strings
> Other integer types should not have to support them to call themselves
> Integral.  So I think at least |, ^, &, and ~ should be removed from
> Integral and put in a subclass thereof.  Possible names are Pow2Int or
> BitStringInt or BitIntegral.

If some more people agree that they want to write integral types that
aren't based on powers of 2 (but with operations like addition that a
prime factorization representation wouldn't support), I wouldn't
object to pulling those operators out of Integral.

Then recall that Integral only needs to be in the standard library so
that the std lib's type checks can check for it rather than int. Are
there any type checks in the standard library that are looking for the
bit-string operations? Can BitString go elsewhere until it's proven
its worth?

> -----------
>
> In short, having read up to the beginning of Exact vs. Inexact Classes, my
> suggestion is to delete the unrealizable 'real' class and add an easily
> realizable non-bit-string integer class.

There are a couple of representations of non-rational subsets of the
reals from the algebraic numbers all the way up to computable reals
represented by Cauchy sequences.
http://darcs.haskell.org/numericprelude/docs/html/index.html has a
couple of these. I think RootSet and PowerSeries are the most concrete
there.

-- 
Namasté,
Jeffrey Yasskin

From guido at python.org  Fri Aug  3 05:14:30 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 2 Aug 2007 20:14:30 -0700
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B28D08.9000700@canterbury.ac.nz>
References: <46B13ADE.7080901@acm.org>
	<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>
	<ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>
	<46B28D08.9000700@canterbury.ac.nz>
Message-ID: <ca471dc20708022014h2e674aaat2652de5698104f5d@mail.gmail.com>

On 8/2/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Guido van Rossum wrote:
> > In order to support the use cases for %s and %r, I propose to allow
> > appending a single letter 's', 'r' or 'f' to the width_specifier
> > (*not* the conversion_specifier):
> >
> >  'r' always calls repr() on the object;
> >  's' always calls str() on the object;
> >  'f' calls the object's __format__() method passing it the
> > conversion_specifier, or if it has no __format__() method, calls
> > repr() on it. This is also the default.
>
> Won't it seem a bit unintuitive that 'r' and 's' have
> to come before the colon, whereas all the others come
> after it?

That depends on how you think of it. My point is that these determine
which formatting API is used.

> It would seem more logical to me if 'r' and 's' were
> treated as special cases of the conversion specifier
> that are recognised before calling __format__.

But that would make it impossible to write a __format__ method that
takes a string that *might* consist of just 'r' or 's'. The conversion
specifier should be completely opaque (as it is in .NET).
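
A sketch of the distinction.  Note that it uses the spelling that
str.format eventually shipped with ('!r' ahead of the colon), which
differs from the comma-based syntax proposed in this thread; Echo is a
made-up class:

```python
class Echo:
    """Reports whatever conversion specifier __format__ receives."""

    def __format__(self, spec):
        return "<spec=%s>" % spec

    def __repr__(self):
        return "Echo()"


# After the colon, 'r' is just an opaque string handed to __format__.
print("{0:r}".format(Echo()))
# The selector before the colon picks repr() and skips __format__.
print("{0!r}".format(Echo()))
# With repr() selected, the rest is applied to the resulting string.
print("{0!r:>10}".format(Echo()))
```

If 'r' after the colon were intercepted by the generic code, the first
call could never reach Echo.__format__.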

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Fri Aug  3 05:16:53 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 2 Aug 2007 20:16:53 -0700
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <87d4y5749p.fsf@ten22.rhodesmill.org>
References: <46B13ADE.7080901@acm.org>
	<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>
	<ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>
	<87d4y5749p.fsf@ten22.rhodesmill.org>
Message-ID: <ca471dc20708022016s2098fc5cxbf64f94d4524193b@mail.gmail.com>

On 8/2/07, Brandon Craig Rhodes <brandon at rhodesmill.org> wrote:
> "Guido van Rossum" <guido at python.org> writes:
>
> > My personal suggestion is to stay close to the .NET formatting language:
> >
> >   name_specifier [',' width_specifier] [':' conversion_specifier]
>
> A problem is that this format requires brute memorization to remember
> where to put things.  If letters were used to prefix specifications,
> like "w" for width and "p" for precision, one could write something
> like:
>
>    >>> 'The average is: {0:w8p2} today.'.format(avg)
>    'The average is:     7.24 today.'
>
> This would give users at least a shot at mnemonically parsing - and
> constructing - format strings, and eliminate the problem of having to
> decide what goes first.
>
> If, on the other hand, all we have to go on are some commas and
> colons, then I, for one, will probably always have to look things up -
> just like I always did for C-style percent-sign format specifications
> in the first place.

I fully expect having to look up the *conversion specifier* syntax,
which is specific to each type. But I expect that the conversion
specifier is relatively rarely used, and instead *most* uses will just
use the width specifier. The width specifier is so simple and
universal that one will quickly remember it. (Experimentation is also
easy enough.)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From nnorwitz at gmail.com  Fri Aug  3 07:59:01 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Thu, 2 Aug 2007 22:59:01 -0700
Subject: [Python-3000] removing __members__ and __methods__
Message-ID: <ee2a432c0708022259k171f6044q481a211fc64be538@mail.gmail.com>

__members__ and __methods__ are both deprecated as of 2.2 and there is
the new __dir__.  Is there any reason to keep them?  I don't notice
anything in PEP 3100, but it seems like they should be removed.
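
For anyone who hasn't used it, a minimal sketch of the __dir__
replacement, with a made-up Point class that keeps its fields in a
private dict:

```python
class Point:
    def __init__(self, x, y):
        self._data = {"x": x, "y": y}

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(name)

    def __dir__(self):
        # Advertise the dynamic attributes; dir() sorts the result.
        return list(self._data)


p = Point(1, 2)
print(dir(p))
print(p.x, p.y)
```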

Also PyMember_[GS]et are documented as obsolete and I plan to remove
them unless I hear otherwise.

n

From nnorwitz at gmail.com  Fri Aug  3 08:01:53 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Thu, 2 Aug 2007 23:01:53 -0700
Subject: [Python-3000] C API cleanup int/long
Message-ID: <ee2a432c0708022301m3dbf7a7tfe7378336fe0cefb@mail.gmail.com>

Since there is a merged int/long type now, we need to decide how the C
API should look.  For example, should the APIs be prefixed with
PyInt_* or PyLong_?  What are the other issues?  What do people want
for the C API in 3k?

n

From greg.ewing at canterbury.ac.nz  Fri Aug  3 08:19:12 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 03 Aug 2007 18:19:12 +1200
Subject: [Python-3000] C API cleanup int/long
In-Reply-To: <ee2a432c0708022301m3dbf7a7tfe7378336fe0cefb@mail.gmail.com>
References: <ee2a432c0708022301m3dbf7a7tfe7378336fe0cefb@mail.gmail.com>
Message-ID: <46B2C8E0.8080409@canterbury.ac.nz>

Neal Norwitz wrote:
> Since there is a merged int/long type now, we need to decide how the C
> API should look.  For example, should the APIs be prefixed with
> PyInt_* or PyLong_?

I've always assumed it would be PyInt_*. If any
integer can be any length, it doesn't make sense
to have any length-related words in the names.

--
Greg

From nnorwitz at gmail.com  Fri Aug  3 08:31:13 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Thu, 2 Aug 2007 23:31:13 -0700
Subject: [Python-3000] C API cleanup int/long
In-Reply-To: <46B2C8E0.8080409@canterbury.ac.nz>
References: <ee2a432c0708022301m3dbf7a7tfe7378336fe0cefb@mail.gmail.com>
	<46B2C8E0.8080409@canterbury.ac.nz>
Message-ID: <ee2a432c0708022331q768f3cbbu527aad3359ed9552@mail.gmail.com>

On 8/2/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Neal Norwitz wrote:
> > Since there is a merged int/long type now, we need to decide how the C
> > API should look.  For example, should the APIs be prefixed with
> > PyInt_* or PyLong_?
>
> I've always assumed it would be PyInt_*. If any
> integer can be any length, it doesn't make sense
> to have any length-related words in the names.

Aside from the name, are there other issues you can think of with any
of the API changes?  There are some small changes, things like macros
only having a function form.  Are these a problem?

Str/unicode is going to be a big change.  Any thoughts there?

n

From talin at acm.org  Fri Aug  3 08:55:03 2007
From: talin at acm.org (Talin)
Date: Thu, 02 Aug 2007 23:55:03 -0700
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>
References: <46B13ADE.7080901@acm.org>	
	<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>
	<ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>
Message-ID: <46B2D147.90606@acm.org>

Guido van Rossum wrote:
> My personal suggestion is to stay close to the .NET formatting language:
> 
>   name_specifier [',' width_specifier] [':' conversion_specifier]
> 
> where width_specifier is a positive or negative number giving the
> minimum width (negative for left-alignment) and conversion_specifier
> is passed uninterpreted to the object's __format__ method.

Before I comment on this I think I need to clear up a mismatch between 
your understanding of how __format__ works and mine. In particular, why 
it won't work for float and int to define a __format__ method.

Remember how I said in your office that it made sense to me there were 
two levels of format hooks in .Net? I realize that I wasn't being very 
clear at the time - as often happens when my thoughts are racing too 
fast for my mouth.

What I meant was that conceptually, there are two stages of 
customization, which I will call "pre-coercion" and "post-coercion" 
customization.

Before I explain what that means, let me say that I don't think that 
this is actually how .Net works, and I'm not proposing that there 
actually be two customization hooks. What I want to do is describe an 
abstract conceptual model of formatting, in which formatting occurs in a 
number of stages.

Pre-coercion formatting means that the real type of the value is used to 
control formatting. We don't attempt to convert the value to an int or 
float or repr() or anything - instead it's allowed to completely 
dominate the interpretation of the format codes. So the case of the 
DateTime object interpreting its specifiers as a strftime argument falls 
into this case.

In most cases, there won't be a pre-coercion hook. In which case the 
formatting proceeds to the next two stages, which are type coercion and 
then post-coercion formatting. The type coercion is driven by a 
*standard interpretation* of the format specifier. After the value is 
converted to the type, we then apply formatting that is specific to that 
type.

Now, I always envisioned that __format__ would allow reinterpretation of 
the format specifier. Therefore, __format__ fits into this model as a 
pre-coercion customization hook - it has to come *before* the type 
coercion, because otherwise type information would be destroyed and 
__format__ wouldn't work.

But the formatters for int and float have to happen *after* type 
coercion. Therefore, those formatters can't be the same as __format__.
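
Here is a rough sketch of that conceptual model, not a proposed 
implementation.  The hook name __preformat__, the Stamp class and the 
format_field function are all made up for illustration, and the 
built-in format() stands in for the post-coercion formatters:

```python
from datetime import date

INT_CODES = {"d", "x", "o"}
FLOAT_CODES = {"e", "f", "g"}


class Stamp:
    """Hypothetical type that takes over formatting before coercion."""

    def __init__(self, when):
        self.when = when

    def __preformat__(self, spec):  # hypothetical hook name
        return self.when.strftime(spec)


def format_field(value, spec):
    # Stage 1: pre-coercion. The real type interprets the specifier.
    hook = getattr(type(value), "__preformat__", None)
    if hook is not None:
        return hook(value, spec)
    # Stage 2: type coercion, driven by the standard type letter.
    code = spec[-1:]
    if code in INT_CODES:
        coerced = int(value)
    elif code in FLOAT_CODES:
        coerced = float(value)
    else:
        coerced = str(value)
    # Stage 3: post-coercion formatting specific to the coerced type.
    return format(coerced, spec)


print(format_field(Stamp(date(2007, 8, 2)), "%Y-%m-%d"))
print(format_field(3, ".2f"))
print(format_field(3.7, "d"))
```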

> In order to support the use cases for %s and %r, I propose to allow
> appending a single letter 's', 'r' or 'f' to the width_specifier
> (*not* the conversion_specifier):
> 
>  'r' always calls repr() on the object;
>  's' always calls str() on the object;
>  'f' calls the object's __format__() method passing it the
> conversion_specifier, or if it has no __format__() method, calls
> repr() on it. This is also the default.
> 
> If no __format__() method was called (either because 'r' or 's' was
> used, or because there was no __format__() method on the object), the
> conversion_specifier (if given) is a *maximum* length; this handles
> the pretty common use cases of %.20s and %.20r (limiting the size of a
> printed value).
> 
> The numeric types are the main types that must provide __format__().
> (I also propose that for datetime types the format string ought to be
> interpreted as a strftime format string.) I think that
> float.__format__() should *not* support the integer formatting codes
> (d, x, o etc.) -- I find the current '%d' % 3.14 == '3' an abomination
> which is most likely an incidental effect of calling int() on the
> argument (should really be __index__()). But int.__format__() should
> support the float formatting codes; I think '%6.3f' % 12 should return
> ' 12.000'. This is in line with 1/2 returning 0.5; int values should
> produce results identical to the corresponding float values when used
> in the same context. I think this should be solved inside
> int.__format__() though; the generic formatting code should not have
> to know about this.

I don't agree that using the 'd' format type to print floats is an 
abomination, but that's because of a difference in design philosophy. 
I'm inclined to be permissive in this, because I don't see the benefit 
of being pedantic here, and I do see the potential usefulness of 
considering 'd' to be the same as 'f' with a precision of 0.
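
For reference, a sketch of how the two readings differ in practice. 
The '%' results are today's behaviour; the format() calls assume the 
built-in as it eventually shipped, where float rejects 'd' and int 
accepts the float codes:

```python
# Old-style '%d' converts through int(), which truncates.
print("%d" % 3.99)
# 'f' with a precision of 0 rounds instead.
print(format(3.99, ".0f"))
# An int formatted with a float code.
print(format(12, ".3f"))

try:
    format(3.14, "d")
except ValueError as exc:
    print("float rejects 'd':", exc)
```

So treating 'd' as 'f' with precision 0 would change results for 
values like 3.99.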

But that's a detail. I want to think about the larger picture.

Earlier I said that there were 6 attributes being controlled by the 
various specifiers, but based on the previous discussion there are 
actually 8, in no particular order:

    -- minimum width
    -- maximum width
    -- decimal precision
    -- alignment
    -- padding
    -- treatment of signs and negative numbers
    -- type coercion options
    -- number formatting options for a given type, such as exponential
       notation.

That seems a lot of parameters to cram into a lowly format string, and I 
can't imagine that anyone would like a system that requires these all to 
be specified individually. It would be cumbersome and hard to remember.

Fortunately, we recognize that these parameters are not all independent. 
Many combinations of parameters are nonsensical, especially when talking 
about non-number types. Therefore, we can compress the visual 
specification of these attributes into a much smaller number of actual 
format codes.

Traditionally the C sprintf function has done two kinds of 
'multiplexing' of these codes. The first is to change the interpretation 
of a particular field (such as precision) based on the number formatting 
type. The second is to use letters to represent combinations of 
attributes - so for example the letter 'd' implies both that it's an 
integer type, and also how that integer type should be formatted.
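
As an illustration of the first kind of multiplexing, this sketch uses Python's existing % operator (which follows the sprintf conventions) to show the same ".3" precision field meaning three different things depending on the type code:

```python
# One precision field, three interpretations, selected by the type letter.
as_string = '%.3s' % 'abcdef'   # maximum number of characters
as_float = '%.3f' % 3.14159     # digits after the decimal point
as_int = '%.3d' % 7             # minimum number of digits, zero padded

print(as_string, as_float, as_int)
```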

So the challenge is to try and figure out how to represent all of the 
sensible permutations of formatting attributes in a way which is both 
intuitive and mnemonic.

There are two approaches to making this system programmer friendly: We 
can either try to invent the best possible system out of whole cloth, or 
we can steal from the past in the hopes that programmers who already 
know a previous syntax for format strings will be able to employ their 
prior knowledge.

If we decide to create a new system out of whole cloth, then what do we 
have to work with? Well, as I see it we have the following tools at our 
disposal for encoding meaning in a short form:

    -- Various delimiter characters: :,.!#$ and so on.
    -- Letters to represent one or more attributes.
    -- Numbers to represent scalar quantities
    -- The relative ordering of all of the above.

We also have to consider what it means to be 'intuitive'. In this case, 
we should consider that the various delimiter characters have 
connotations - such as the fact that '.' suggests a decimal point, or 
that '<' suggests a left-pointing arrow.

(I should also mention that "a:b,c" looks prettier to my eye than 
"a,b:c". There's a reason for this, and it's because of Python syntax. 
Now, in Python, ':' isn't an operator - but if it was, you would have to 
consider its precedence to be very low. Because when we look at an 
expression 'if x: a,b' we know that comma binds more tightly than the 
colon, and so it's the same thing as saying 'if x: (a,b)'. But in any 
case this is purely an aesthetic digression and not terribly weighty.)

That's all I have to say for the moment - I'm still thinking this 
through. In any case, I think it's worthwhile to be scrutinizing this 
issue at a very low level and examining all of the assumptions.

-- Talin


From pc at gafol.net  Fri Aug  3 09:22:39 2007
From: pc at gafol.net (Paul Colomiets)
Date: Fri, 03 Aug 2007 10:22:39 +0300
Subject: [Python-3000] text_zipimport fix
Message-ID: <46B2D7BF.1010502@gafol.net>

Hi,

I've just uploaded a patch that fixes test_zipimport.
http://www.python.org/sf/1766592

I'm still in doubt about some str/bytes issues. Correct me if I'm wrong.
1. imp.get_magic() should return bytes
2. loader.get_data() should return bytes
3. loader.get_source() should return str with encoding given from "# -*- 
coding: something -*-" header

How can the third be achieved without reinventing something? It seems the 
compiler makes use of it somewhere through the AST, but I'm not sure.

Currently it does PyString_FromStringAndSize(bytes), which should be 
equivalent to str(the_bytes), and uses UTF-8, I think.
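
A minimal sketch of point 3, assuming a Python version whose standard library provides tokenize.detect_encoding() (it was added after this message was written). The helper name decode_source is made up for the example and is not what the patch does:

```python
import io
import tokenize


def decode_source(source_bytes):
    """Decode module source using its coding cookie (default UTF-8)."""
    # detect_encoding() reads at most two lines looking for a BOM or a
    # "# -*- coding: ... -*-" cookie and returns (encoding, lines_read).
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return source_bytes.decode(encoding)


latin1_source = b'# -*- coding: latin-1 -*-\nname = "caf\xe9"\n'
print(decode_source(latin1_source))
```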

--
Paul.

From rrr at ronadam.com  Fri Aug  3 10:08:05 2007
From: rrr at ronadam.com (Ron Adam)
Date: Fri, 03 Aug 2007 03:08:05 -0500
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B2D147.90606@acm.org>
References: <46B13ADE.7080901@acm.org>		<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>	<ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>
	<46B2D147.90606@acm.org>
Message-ID: <46B2E265.5080905@ronadam.com>


Talin wrote:

> (I should also mention that "a:b,c" looks prettier to my eye than 
> "a,b:c". There's a reason for this, and it's because of Python syntax. 
> Now, in Python, ':' isn't an operator - but if it was, you would have to 
> consider its precedence to be very low. Because when we look at an 
> expression 'if x: a,b' we know that comma binds more tightly than the 
> colon, and so it's the same thing as saying 'if x: (a,b)'. But in any 
> case this is purely an aesthetic digression and not terribly weighty.)


+1  See below!  :-)


After a fair amount of experimenting today, I think I've found a nice 
middle ground that meets some of what both you and Guido are looking for. 
(And a bit of my own preference too.)

What I've come up with is...

     '{name: specifier_1, specifier_2}'

where the order of specifier_1 and specifier_2 doesn't matter.  That 
shortens the common cases.  It's also one less thing to 
remember. :-)

The field name is set off with a colon, which I think helps strengthen its 
relationship as being a key when kwds are used as an argument source.  And 
it also resembles Python's syntax pattern more closely, as you mentioned above.

The requirement for this to work is that the alignment specifier and the 
format specifiers can never start with the same characters.  Which turns 
out to be easy if you use type prefixes instead of postfixes on the format 
specifiers.

All of the following work:

The most common cases are very short and simple.

      '{0}, {1:s}, {2:r}, {3:d}'   # etc... for types

      '{0:10}'       # min field width
      '{0:.20}'      # max field width
      '{0:^20}'      # centered
      '{0:-20}'      # right justified
      '{0:+20}'      # left justified  (default)

      '{0:10.20}'    # min & max field widths together


If it starts with a letter, it's a format specifier:

     '{0:d+}, {1:d()}, {2:d-}'

     '{0:f.3}, {1:f-.6}'


Or if it starts with '+', '-', '^', '.', or a digit, it's a field alignment 
specifier.


Combinations:

      '{0:10,r}'       # Specifiers are not order dependent.
      '{0:r,10}'       # Both of these produce the same output.

      '{0:-10,f+.3}'   ->  "  +123.346"

      '{0:-15,f()7.3}' ->  "  (    123.456)"

      '{0:^10,r}'      ->  " 'Hello'  "


For filled types such as numbers with leading zeros, a '/' character can 
separate the numeric_width from the fill character.  A numeric_width isn't 
the same as field width, so it's part of the type formatter here.

      # width/char

      '{0:d7/0}'     ->    '0000123'

      '{0:f7/0.3}'   ->    '0000123.000'


Filled widths can be used in the alignment specifiers too and follow the 
same rules.

      '{0:^16/_,s}'          ->    '____John Doe____'

      'Chapter {0:-10/.,d}'  ->    'Chapter........10'
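
A rough sketch of how the field splitting and the alignment step could work. The regular expression is only my reading of the examples above, and the names split_field and apply_alignment are invented for this illustration; the type-specific format specifiers are not interpreted here.

```python
import re

# Assumed grammar for an alignment specifier: [+ - ^][min]['/' fill]['.' max]
ALIGN_RE = re.compile(
    r'^(?P<align>[-+^])?(?P<min>\d+)?(?:/(?P<fill>.))?(?:\.(?P<max>\d+))?$'
)


def split_field(field):
    """Split '{name:spec,spec}' into (name, format_spec, align_spec)."""
    name, _, rest = field.strip('{}').partition(':')
    format_spec = align_spec = ''
    for spec in (s.strip() for s in rest.split(',')):
        if not spec:
            continue
        if spec[0].isalpha():     # a leading letter marks a format specifier
            format_spec = spec
        else:                     # anything else is an alignment specifier
            align_spec = spec
    return name.strip(), format_spec, align_spec


def apply_alignment(text, align_spec):
    """Truncate and pad text that has already been formatted."""
    match = ALIGN_RE.match(align_spec)
    if match is None:
        raise ValueError('bad alignment specifier: %r' % align_spec)
    if match.group('max'):
        text = text[:int(match.group('max'))]
    width = int(match.group('min') or 0)
    fill = match.group('fill') or ' '
    align = match.group('align') or '+'   # '+' (left justified) is the default
    if align == '^':
        return text.center(width, fill)
    if align == '-':
        return text.rjust(width, fill)
    return text.ljust(width, fill)


print(split_field('{0:r,10}'))
print('Chapter' + apply_alignment('10', '-10/.'))
```

Because the two kinds of specifier never start with the same character, the splitter needs no ordering rule, which is the point of the prefix form.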


Some motivational thoughts:

- The prefix form may make remembering formatting character sequences 
easier.  Or if not, an alphabetic look up table would work nicely as a 
reference.

- The colon and comma don't move around or change places.  That may help 
make it more readable and less likely to have bugs due to typos.

- If the format specifier style is too close to some other existing 
language's format, but different enough to not be interchangeable, it could 
be more confusing instead of less confusing.



I have a partial Python implementation with a doctest I can post or send 
to you if you want to see how the parsing is handled.  The format specifier 
parsing isn't implemented, but the top level string, field, and 
alignment_specs parsing is.  It's enough to see how it ties together.

It's a rough sketch, but it should be easy to build on.

Cheers,
    Ron

From stargaming at gmail.com  Fri Aug  3 10:13:58 2007
From: stargaming at gmail.com (Stargaming)
Date: Fri, 3 Aug 2007 08:13:58 +0000 (UTC)
Subject: [Python-3000] optimizing [x]range
References: <1d85506f0707280806n1764151cx4961a0573dda435e@mail.gmail.com>
	<f8srod$5m7$1@sea.gmane.org> <46B233D2.4030304@v.loewis.de>
	<ca471dc20708021525m654eecb0wa5997ba602ebd1d7@mail.gmail.com>
Message-ID: <f8uo46$f76$1@sea.gmane.org>

On Thu, 02 Aug 2007 15:25:36 -0700, Guido van Rossum wrote:

> On 8/2/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> > The patch is based on the latest trunk/ checkout, Python 2.6. I don't
>> > think this is a problem if nobody else made any effort towards making
>> > xrange more sequence-like in the Python 3000 branch. The C source
>> > might require some tab/space cleanup.
>>
>> Unfortunately, this is exactly what happened: In Py3k, the range object
>> is defined in terms of PyObject*, so your patch won't apply to the 3k
>> branch.
> 
> FWIW, making xrange (or range in Py3k) "more sequence-like" is exactly
> what should *not* happen.

No, that's exactly what *should* happen for optimization reasons. 

xrange has never (neither in 2.6 nor 3.0) had an sq_contains slot. 
Growing such a slot is a precondition for implementing 
xrange.__contains__ as an optimized special case, and that makes it more 
sequence-like on the side of the implementation. This does not mean it 
becomes more like the 2.x range, which we're abandoning. 
Sorry for the confusion.
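
To make the optimization concrete, here is a sketch in Python of the constant-time membership test that such a slot could perform for integer arguments. The function name is invented for the example, and the real patch is in C:

```python
def range_contains(start, stop, step, value):
    """Return True if the int `value` is in range(start, stop, step)."""
    if step == 0:
        raise ValueError('step must not be zero')
    if step > 0:
        in_bounds = start <= value < stop
    else:
        in_bounds = stop < value <= start
    # Inside the bounds, value is a member only if it lies on the step grid.
    return in_bounds and (value - start) % step == 0


print(range_contains(0, 10, 3, 9), range_contains(0, 10, 3, 8))
```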



From jeremy at alum.mit.edu  Fri Aug  3 16:14:51 2007
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Fri, 3 Aug 2007 10:14:51 -0400
Subject: [Python-3000] socket makefile bug
Message-ID: <e8bf7a530708030714p37d69020l7a5da240fc58b270@mail.gmail.com>

I'm looking into httplib problems on the struni branch.  One
unexpected problem is that socket.makefile() is not behaving
correctly. The docs say "The file object references a dup()ped version
of the socket file descriptor, so the file object and socket object
may be closed or garbage-collected independently."  In Python 3000,
the object returned by makefile is not a dup()ped version of the file
descriptor.  If I close the socket, I close the file returned by
makefile().

I can dig into the makefile() problem, but I thought I'd mention it in
the hopes that someone else thinks it's easy to fix.

Jeremy

From talin at acm.org  Fri Aug  3 18:06:45 2007
From: talin at acm.org (Talin)
Date: Fri, 03 Aug 2007 09:06:45 -0700
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B2E265.5080905@ronadam.com>
References: <46B13ADE.7080901@acm.org>		<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>	<ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>
	<46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com>
Message-ID: <46B35295.1030007@acm.org>

Ron Adam wrote:
> After a fair amount of experimenting today, I think I've found a nice 
> middle ground that meets some of what both you and Guido are looking 
> for. (And a bit my own preference too.)

First off, thank you very much for taking the time to think about this 
in such detail. There are a lot of good ideas here.

What's missing, however, is a description of how all of this interacts 
with the __format__ hook. The problem we are facing right now is 
sometimes we want to override the __format__ hook and sometimes we 
don't. Right now, the model that we want seems to be:

   1) High precedence type coercion, i.e. 'r', which bypasses __format__.
   2) Check for __format__, and let it interpret the format specifier.
   3) Regular type coercion, i.e. 'd', 'f' and so on.
   4) Regular formatting based on type.
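
The four steps above can be sketched roughly as follows. This is only an illustration of the ordering: format_hook is a hypothetical stand-in for the proposed __format__ method (every object in later Python versions has a __format__, which would hide steps 3 and 4), and the built-in format() stands in for "regular formatting".

```python
def format_field(value, spec=''):
    # 1) High precedence coercion: 'r' bypasses any hook.
    if spec.startswith('r'):
        return repr(value)
    # 2) Let the object's own hook interpret the specifier, if it has one.
    hook = getattr(type(value), 'format_hook', None)   # hypothetical name
    if hook is not None:
        return hook(value, spec)
    # 3) Regular type coercion driven by the trailing type letter.
    if spec.endswith('d'):
        value = int(value)
    elif spec.endswith('f'):
        value = float(value)
    # 4) Regular formatting based on the (possibly coerced) type.
    return format(value, spec)


class Money:
    def __init__(self, amount):
        self.amount = amount

    def format_hook(self, spec):
        return '$%.2f' % self.amount


print(format_field(Money(5), 'x'), format_field(12, '.3f'), format_field('hi', 'r'))
```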

-- Talin


From guido at python.org  Fri Aug  3 18:27:30 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 3 Aug 2007 09:27:30 -0700
Subject: [Python-3000] socket makefile bug
In-Reply-To: <e8bf7a530708030714p37d69020l7a5da240fc58b270@mail.gmail.com>
References: <e8bf7a530708030714p37d69020l7a5da240fc58b270@mail.gmail.com>
Message-ID: <ca471dc20708030927j2a233c9vace57122ca9ba306@mail.gmail.com>

The docs are out of date, we don't dup() any more (that was needed
only because we were using fdopen()). But what *should* happen is that
when you close the file object the socket is still open. The socket
wrapper's close() method should be fixed. I can look into that later
today.

On 8/3/07, Jeremy Hylton <jeremy at alum.mit.edu> wrote:
> I'm looking into httplib problems on the struni branch.  One
> unexpected problem is that socket.makefile() is not behaving
> correctly. The docs say "The file object references a dup()ped version
> of the socket file descriptor, so the file object and socket object
> may be closed or garbage-collected independently."  In Python 3000,
> the object returned by makefile is not a dup()ped version of the file
> descriptor.  If I close the socket, I close the file returned by
> makefile().
>
> I can dig into the makefile() problem, but I thought I'd mention it in
> the hopes that someone else thinks it's easy to fix.
>
> Jeremy
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jeremy at alum.mit.edu  Fri Aug  3 18:34:06 2007
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Fri, 3 Aug 2007 12:34:06 -0400
Subject: [Python-3000] socket makefile bug
In-Reply-To: <ca471dc20708030927j2a233c9vace57122ca9ba306@mail.gmail.com>
References: <e8bf7a530708030714p37d69020l7a5da240fc58b270@mail.gmail.com>
	<ca471dc20708030927j2a233c9vace57122ca9ba306@mail.gmail.com>
Message-ID: <e8bf7a530708030934g7476107bq6d2278d7e0e886e6@mail.gmail.com>

On 8/3/07, Guido van Rossum <guido at python.org> wrote:
> The docs are out of date, we don't dup() any more (that was needed
> only because we were using fdopen()). But what *should* happen is that
> when you close the file object the socket is still open. The socket
> wrapper's close() method should be fixed. I can look into that later
> today.

Ok.  I confirmed that calling dup() fixes the problem, but that
doesn't work on Windows.  I also uncovered a bug in socket.py, which
fails to set _can_dup_socket to True on platforms where you can dup a
socket.
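
For illustration, a small sketch of the dup() behaviour on a platform that supports it. It assumes a Python version where socket.socketpair() and the socket object's dup() method are available, and it is not the code from socket.py:

```python
import socket

left, right = socket.socketpair()
duplicate = left.dup()      # new socket object on a duplicated descriptor
left.close()                # closing the original leaves the duplicate usable
duplicate.sendall(b'ping')
received = right.recv(4)
duplicate.close()
right.close()

print(received)
```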

Jeremy

>
> On 8/3/07, Jeremy Hylton <jeremy at alum.mit.edu> wrote:
> > I'm looking into httplib problems on the struni branch.  One
> > unexpected problem is that socket.makefile() is not behaving
> > correctly. The docs say "The file object references a dup()ped version
> > of the socket file descriptor, so the file object and socket object
> > may be closed or garbage-collected independently."  In Python 3000,
> > the object returned by makefile is not a dup()ped version of the file
> > descriptor.  If I close the socket, I close the file returned by
> > makefile().
> >
> > I can dig into the makefile() problem, but I thought I'd mention it in
> > the hopes that someone else thinks it's easy to fix.
> >
> > Jeremy
>
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>

From guido at python.org  Fri Aug  3 19:04:28 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 3 Aug 2007 10:04:28 -0700
Subject: [Python-3000] removing __members__ and __methods__
In-Reply-To: <ee2a432c0708022259k171f6044q481a211fc64be538@mail.gmail.com>
References: <ee2a432c0708022259k171f6044q481a211fc64be538@mail.gmail.com>
Message-ID: <ca471dc20708031004s1c63de41n730c540f1a076c41@mail.gmail.com>

Yes, they should all go. Expect some cleanup though!

On 8/2/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
> __members__ and __methods__ are both deprecated as of 2.2 and there is
> the new __dir__.  Is there any reason to keep them?  I don't notice
> anything in PEP 3100, but it seems like they should be removed.
>
> Also PyMember_[GS]et are documented as obsolete and I plan to remove
> them unless I hear otherwise.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Fri Aug  3 19:06:06 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 3 Aug 2007 10:06:06 -0700
Subject: [Python-3000] optimizing [x]range
In-Reply-To: <f8uo46$f76$1@sea.gmane.org>
References: <1d85506f0707280806n1764151cx4961a0573dda435e@mail.gmail.com>
	<f8srod$5m7$1@sea.gmane.org> <46B233D2.4030304@v.loewis.de>
	<ca471dc20708021525m654eecb0wa5997ba602ebd1d7@mail.gmail.com>
	<f8uo46$f76$1@sea.gmane.org>
Message-ID: <ca471dc20708031006p34d2af94ybca380bb476f5763@mail.gmail.com>

On 8/3/07, Stargaming <stargaming at gmail.com> wrote:
> On Thu, 02 Aug 2007 15:25:36 -0700, Guido van Rossum wrote:
>
> > On 8/2/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> >> > The patch is based on the latest trunk/ checkout, Python 2.6. I don't
> >> > think this is a problem if nobody else made any effort towards making
> >> > xrange more sequence-like in the Python 3000 branch. The C source
> >> > might require some tab/space cleanup.
> >>
> >> Unfortunately, this is exactly what happened: In Py3k, the range object
> >> is defined in terms of PyObject*, so your patch won't apply to the 3k
> >> branch.
> >
> > FWIW, making xrange (or range in Py3k) "more sequence-like" is exactly
> > what should *not* happen.
>
> No, that's exactly what *should* happen for optimization reasons.
>
> xrange has never (neither in 2.6 nor 3.0) had an sq_contains slot.
> Growing such a slot is a precondition for implementing
> xrange.__contains__ as an optimized special case, and that makes it more
> sequence-like on the side of the implementation. This does not mean it
> becomes more like the 2.x range, which we're abandoning.
> Sorry for the confusion.

OK, gotcha. I was just warning not to add silliness like slicing.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Fri Aug  3 19:20:50 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 3 Aug 2007 10:20:50 -0700
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B35295.1030007@acm.org>
References: <46B13ADE.7080901@acm.org>
	<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>
	<ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>
	<46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com>
	<46B35295.1030007@acm.org>
Message-ID: <ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>

I have no time for a complete response, but a few quickies:

- The more I think about it, the more I think putting knowledge of
floating point formatting into the wrapper is wrong. I really think we
should put this into float.__format__ (and int.__format__, and
Decimal.__format__). I can't find a reason why you don't want this;
perhaps it is an axiom? I think it needs to be challenged.

- The relative priorities of colon and comma vary by context; e.g. in
a[i:j, m:n] the colon binds tighter.

- Interpreting X.Y as min.max, while conventional in C, is hard to remember.

- If we're going to deviate from .NET, we should deviate strongly, and
I propose using semicolon as delimiter.
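
To illustrate the point about a[i:j, m:n], here is a tiny sketch with a made-up Probe class that just returns whatever key it is indexed with. Inside a subscript each colon forms a slice first, and the comma then builds a tuple of those slices:

```python
class Probe:
    """Hypothetical helper: return the key passed to the subscript."""

    def __getitem__(self, key):
        return key


key = Probe()[1:2, 3:4]
print(key)
```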

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Fri Aug  3 20:41:52 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 3 Aug 2007 11:41:52 -0700
Subject: [Python-3000] text_zipimport fix
In-Reply-To: <46B2D7BF.1010502@gafol.net>
References: <46B2D7BF.1010502@gafol.net>
Message-ID: <ca471dc20708031141k17360d01iea0015e5a014c4b9@mail.gmail.com>

I've checked this in as r56707. It looks fine at cursory inspection;
if someone wants to test the handling of encodings more thoroughly, be
my guest.

--Guido

On 8/3/07, Paul Colomiets <pc at gafol.net> wrote:
> Hi,
>
> I've just uploaded patch that fixes test_zipimport.
> http://www.python.org/sf/1766592
>
> I'm still in doubt of some str/bytes issues. Fix me if I'm wrong.
> 1. imp.get_magic() should return bytes
> 2. loader.get_data() should return bytes
> 3. loader.get_source() should return str with encoding given from "# -*-
> coding: something -*-" header
>
> How to achieve third without reinventing something? Seems that compiler
> makes use of it something thought the ast, but I'm not sure.
>
> Currently it does PyString_FromStringAndSize(bytes) which should be
> equivalent of str(the_bytes), and uses utf8 I think.
>
> --
> Paul.


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From rrr at ronadam.com  Fri Aug  3 22:37:36 2007
From: rrr at ronadam.com (Ron Adam)
Date: Fri, 03 Aug 2007 15:37:36 -0500
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B35295.1030007@acm.org>
References: <46B13ADE.7080901@acm.org>		<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>	<ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>
	<46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com>
	<46B35295.1030007@acm.org>
Message-ID: <46B39210.6060202@ronadam.com>



Talin wrote:
 > Ron Adam wrote:
 >> After a fair amount of experimenting today, I think I've found a nice
 >> middle ground that meets some of what both you and Guido are looking
 >> for. (And a bit my own preference too.)
 >
 > First off, thank you very much for taking the time to think about this
 > in such detail. There are a lot of good ideas here.

Thanks, I use string operations a *lot* and I really do want it to work as 
easily as possible in a wide variety of situations.


 > What's missing, however, is a description of how all of this interacts
 > with the __format__ hook. The problem we are facing right now is
 > sometimes we want to override the __format__ hook and sometimes we
 > don't. Right now, the model that we want seems to be:
 >
 >   1) High precedence type coercion, i.e. 'r', which bypasses __format__.

I think you are looking for an internal simplicity which isn't needed, and 
most people won't even think about.

The exposed interface doesn't have any ambiguities if 'r' is a format 
specification just like 's', 'd', or 'f'.  These are what the formatter 
will dispatch on.  I think a few if/else's to catch types that will call 
__repr__ and __str__, instead of __format__ aren't that costly.  I think 
there are other areas that can be optimized more, and/or other points where 
we can hook into and modify the results.

Or am I missing something still?  Maybe if you give an example where it 
makes a difference it would help us sort it out.


 >   2) Check for __format__, and let it interpret the format specifier.
 >   3) Regular type coercion, i.e. 'd', 'f' and so on.
 >   4) Regular formatting based on type.



Here is the parsing sequence I have so far.


1. Split string into a dictionary of fields and a list of string parts.

     'Partnumber: {0:10}  Price: ${1:f.2}'.format('123abc', 99.95)

Results in...

      {'0':('', 10), '1':('f.2', '')}        # key:(format_spec, align_spec)

      ['Partnumber: ', '{0}', '  Price: $', '{1}']


2.  Apply the format_spec and then the alignment_spec to the arguments.

     {'0':'123abc    ', '1':'99.95'}

     * If the arguments are a sequence, they are enumerated to get keys.
     * If they are a dict, the existing keys are used.
     * Passing both *args and **kwds should also work.


3.  Replace the keys in the string list with the corresponding formatted 
dictionary values.

    ['Partnumber: ', '123abc    ', '  Price: $', '99.95']


4.  Join the string parts back together.

    'Partnumber: 123abc      Price: $99.95'
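
The four steps can be sketched roughly like this. To keep the example self-contained, it assumes the built-in format() and its specifier syntax ('.2f') in place of the prefix syntax proposed here ('f.2'); split_template and render are names invented for the sketch, and a field name used twice with different specifiers is not handled.

```python
import re

FIELD_RE = re.compile(r'\{([^{}]*)\}')


def split_template(template):
    """Step 1: return ({name: spec}, [literal and '{name}' parts])."""
    fields, parts, pos = {}, [], 0
    for match in FIELD_RE.finditer(template):
        if match.start() > pos:
            parts.append(template[pos:match.start()])
        name, _, spec = match.group(1).partition(':')
        fields[name] = spec
        parts.append('{%s}' % name)
        pos = match.end()
    if pos < len(template):
        parts.append(template[pos:])
    return fields, parts


def render(template, *args, **kwds):
    fields, parts = split_template(template)
    # Step 2: positional arguments are enumerated to get keys; keywords keep theirs.
    values = {str(index): arg for index, arg in enumerate(args)}
    values.update(kwds)
    formatted = {name: format(values[name], spec) for name, spec in fields.items()}
    # Step 3: replace each '{name}' placeholder with its formatted value.
    replaced = [
        formatted.get(part[1:-1], part)
        if part.startswith('{') and part.endswith('}') else part
        for part in parts
    ]
    # Step 4: join the parts back together.
    return ''.join(replaced)


print(render('Partnumber: {0:10}  Price: ${1:.2f}', '123abc', 99.95))
```

Returning the intermediate dictionary and list from split_template is what would allow pre-processing the specifications or post-processing the formatted values before the final join.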


It may be useful to expose some of these intermediate steps so that we 
could pre-process the specifications, or post-process the formatted results 
before it gets merged back into the string.

Which seems to fit with some of your thoughts, although I think you are 
thinking more in the line of overriding methods instead of directly 
accessing the data.  A little bit of both could go a long ways.

Cheers,
    Ron

From rrr at ronadam.com  Fri Aug  3 23:18:55 2007
From: rrr at ronadam.com (Ron Adam)
Date: Fri, 03 Aug 2007 16:18:55 -0500
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>
References: <46B13ADE.7080901@acm.org>	
	<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>	
	<ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>	
	<46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com>	
	<46B35295.1030007@acm.org>
	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>
Message-ID: <46B39BBF.80809@ronadam.com>


Guido van Rossum wrote:
> I have no time for a complete response, but a few quickies:
> 
> - The more I think about it, the more I think putting knowledge of
> floating point formatting into the wrapper is wrong. I really think we
> should put this into float.__format__ (and int.__format__, and
> Decimal.__format__). I can't find a reason why you don't want this;
> perhaps it is an axiom? I think it needs to be challenged.

I agree.


> - The relative priorities of colon and comma vary by context; e.g. in
> a[i:j, m:n] the colon binds tighter.
> 
> - Interpreting X.Y as min.max, while conventional in C, is hard to remember.
> 
> - If we're going to deviate from .NET, we should deviate strongly, and
> I propose using semicolon as delimiter.

So ...

     '{0:10.20,f.2}'

would become...

     '{0:10;20,f.2}'

Works for me.

I think this would be better because then decimal places would be 
recognizable right off because they *would* have a decimal before them. 
And nothing else would.  And min;max would be recognizable right off 
because of the semicolon.

And I can't think of anything that would be unique and work better.

Cheers,
    Ron



From guido at python.org  Sat Aug  4 00:43:33 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 3 Aug 2007 15:43:33 -0700
Subject: [Python-3000] socket makefile bug
In-Reply-To: <e8bf7a530708030934g7476107bq6d2278d7e0e886e6@mail.gmail.com>
References: <e8bf7a530708030714p37d69020l7a5da240fc58b270@mail.gmail.com>
	<ca471dc20708030927j2a233c9vace57122ca9ba306@mail.gmail.com>
	<e8bf7a530708030934g7476107bq6d2278d7e0e886e6@mail.gmail.com>
Message-ID: <ca471dc20708031543xb5b9248t12e0268ebc7546f8@mail.gmail.com>

On 8/3/07, Jeremy Hylton <jeremy at alum.mit.edu> wrote:
> On 8/3/07, Guido van Rossum <guido at python.org> wrote:
> > The docs are out of date, we don't dup() any more (that was needed
> > only because we were using fdopen()). But what *should* happen is that
> > when you close the file object the socket is still open. The socket
> > wrapper's close() method should be fixed. I can look into that later
> > today.
>
> Ok.  I confirmed that calling dup() fixes the problem, but that
> doesn't work on Windows.  I also uncovered a bug in socket.py, which
> fails to set _can_dup_socket to True on platforms where you can dup a
> socket.

Followup: Jeremy fixed this by adding an explicit reference count to
the socket object, counting how many makefile() streams are hanging
off it. A few more unit tests (including httplib) are now working.
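
A toy sketch of the reference-counting idea, with no real descriptors involved. The class and attribute names are invented for the illustration and are not the ones used in socket.py:

```python
class CountedSocket:
    """Toy stand-in for a socket wrapper that counts its makefile() streams."""

    def __init__(self):
        self.descriptor_open = True
        self._closed = False
        self._stream_count = 0

    def makefile(self):
        self._stream_count += 1
        return CountedStream(self)

    def close(self):
        self._closed = True
        self._maybe_release()

    def _stream_closed(self):
        self._stream_count -= 1
        self._maybe_release()

    def _maybe_release(self):
        # Release the descriptor only once the socket has been closed
        # and no stream created by makefile() is still open.
        if self._closed and self._stream_count <= 0:
            self.descriptor_open = False


class CountedStream:
    def __init__(self, sock):
        self._sock = sock
        self.closed = False

    def close(self):
        if not self.closed:
            self.closed = True
            self._sock._stream_closed()


sock = CountedSocket()
stream = sock.makefile()
sock.close()
print(sock.descriptor_open)   # the stream still holds the descriptor open
stream.close()
print(sock.descriptor_open)
```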

However, things are still not all good. E.g.

$ rm -f CP936.TXT
$ ./python  Lib/test/regrtest.py -uall test_codecmaps_cn
test_codecmaps_cn
	fetching http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP936.TXT
...
test test_codecmaps_cn crashed -- <class 'error'>: (9, 'Bad file descriptor')
1 test failed:
    test_codecmaps_cn
[68157 refs]
$ ./python
...
>>> import urllib
[46065 refs]
>>> x = urllib.urlopen("http://python.org").read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/io.py",
line 390, in read
    return self.readall()
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/io.py",
line 400, in readall
    data = self.read(DEFAULT_BUFFER_SIZE)
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/io.py",
line 392, in read
    n = self.readinto(b)
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/socket.py",
line 264, in readinto
    return self._sock.recv_into(b)
socket.error: (9, 'Bad file descriptor')
[60365 refs]
>>>

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From dalcinl at gmail.com  Sat Aug  4 01:44:23 2007
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Fri, 3 Aug 2007 20:44:23 -0300
Subject: [Python-3000] optimizing [x]range
In-Reply-To: <f8t06h$efu$1@sea.gmane.org>
References: <1d85506f0707280806n1764151cx4961a0573dda435e@mail.gmail.com>
	<f8srod$5m7$1@sea.gmane.org> <f8t06h$efu$1@sea.gmane.org>
Message-ID: <e7ba66e40708031644j664ed699ge3db977eccc985d7@mail.gmail.com>

On 8/2/07, Stargaming <stargaming at gmail.com> wrote:
> >> made into an O(1) operation. here's a demo code (it should be trivial
> >> to implement this in CPython)
> [snipped algorithm]

Did you take into account that your patch is not backward compatible
with py2.5? Just try to do this with your patch:

$ python
Python 2.5.1 (r251:54863, Jun  1 2007, 12:15:26)
>>> class A:
...   def __eq__(self, other):
...     return other == 3
...
>>> A() in xrange(3)
False
>>> A() in xrange(4)
True
>>>

I know, my example is biased, but I have to insist. With this patch,
'a in xrange(...)' will in general not be the same as 'a in range(...)'. I
am fine with this for py3k, but I'm not sure all people will agree on
this for Python 2.6.
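
One way to keep the old answers, sketched here under the assumption that the arithmetic shortcut is applied only to exact ints and everything else falls back to the element-by-element comparison. The function name is made up, and the sketch relies on range objects exposing start, stop and step, as they do in later Python 3 versions:

```python
def sketch_contains(r, value):
    """Membership test for a range object `r` with a compatible fallback."""
    if type(value) is int:
        # Fast path: pure arithmetic, no iteration.
        if r.step > 0:
            in_bounds = r.start <= value < r.stop
        else:
            in_bounds = r.stop < value <= r.start
        return in_bounds and (value - r.start) % r.step == 0
    # Slow path: compare against every element, honouring custom __eq__.
    return any(item == value for item in r)


class A:
    def __eq__(self, other):
        return other == 3


print(sketch_contains(range(3), A()), sketch_contains(range(4), A()))
```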



-- 
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

From rrr at ronadam.com  Sat Aug  4 02:12:17 2007
From: rrr at ronadam.com (Ron Adam)
Date: Fri, 03 Aug 2007 19:12:17 -0500
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B35295.1030007@acm.org>
References: <46B13ADE.7080901@acm.org>		<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>	<ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>
	<46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com>
	<46B35295.1030007@acm.org>
Message-ID: <46B3C461.80704@ronadam.com>



Talin wrote:

> What's missing, however, is a description of how all of this interacts 
> with the __format__ hook. The problem we are facing right now is 
> sometimes we want to override the __format__ hook and sometimes we 
> don't. Right now, the model that we want seems to be:
> 
>   1) High precedence type coercion, i.e. 'r', which bypasses __format__.
>   2) Check for __format__, and let it interpret the format specifier.
>   3) Regular type coercion, i.e. 'd', 'f' and so on.
>   4) Regular formatting based on type.

A few more thoughts regarding this..

We can divide these into concrete and abstract type specifiers.

Concrete type specifiers only accept a specific type.  They would raise an 
exception if the argument is of another type.  Concrete type specifiers 
would always be passed to an object's __format__ method.


Abstract type specifications are more lenient and work with a wider range 
of objects because they use duck typing.

So that splits things up as follows...

(I added an abstract 't' text type, to resolve the ambiguity of the 's' 
type either calling __format__ or __str__)


Concrete type specifiers:

     s -  string type   (not the __str__ method in this case)

     b,c,d,o,x,X - int type

     e,E,f,F - float type


Abstract type specifiers:  (uses duck typing)

     ! -  calls __format__ method,  fallbacks... (__str__, __repr__)
     t -  (text) calls __str__ method,  no fallback
     r -  calls __repr__ method, fallback (__str__)

The '!' type could be the default if no type is specified; it only needs to 
be written out if you also specify formatting options.

    '{0:!xyz}'

So '!xyz' would be passed to the __format__ method if it exists and it 
would be up to that type to know what to do with the xyz.  The format_spec 
wouldn't be passed to __str__ or __repr__.
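
A minimal sketch of the '!', 't' and 'r' dispatch described above.  The 
function name format_abstract and the Celsius/Plain classes are made up for 
illustration, and the "does this type define its own __format__" check is 
only one way to model "if it exists"; none of this is part of the PEP.

```python
def format_abstract(obj, spec=""):
    """Dispatch on the abstract type codes '!', 't' and 'r' (illustrative only)."""
    # An empty spec is treated as '!' with no options.
    kind, options = (spec[0], spec[1:]) if spec else ("!", "")
    if kind == "!":
        # Only hand the options to __format__ if the type defines its own;
        # otherwise fall back to str() and drop the options.
        if type(obj).__format__ is not object.__format__:
            return type(obj).__format__(obj, options)
        return str(obj)
    if kind == "t":
        return str(obj)      # the format options are never passed to __str__
    if kind == "r":
        return repr(obj)     # nor to __repr__
    raise ValueError(f"unknown abstract type code {kind!r}")


class Celsius:
    """Hypothetical type that interprets the options itself."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, options):
        return f"{self.degrees} deg ({options})"


class Plain:
    """Hypothetical type with no __format__ of its own."""

    def __str__(self):
        return "plain"
```

With these definitions, format_abstract(Celsius(21), "!xyz") lets Celsius 
decide what 'xyz' means, while format_abstract(Plain(), "!xyz") quietly falls 
back to str().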

Should there be an abstract numeric type?  These would call an object's 
__int__ or __float__ method if it exists.

One of the things I've noticed is you have 'd' instead of 'i'.  Maybe 'i' 
should be the concrete integer type, and 'd' an abstract integer type that 
calls an object's __int__ method.  Then we would need a floating point 
abstract type that calls __float__ ... 'g'?

In the case of an abstract type that does a conversion, the format spec 
could be forwarded to the __format__ method of the returned object.


Then we have the following left over...

         'g' - General format. This prints the number as a fixed-point
               number, unless the number is too large, in which case
               it switches to 'e' exponent notation.
         'G' - General format. Same as 'g' except switches to 'E'
               if the number gets too large.
         'n' - Number. This is the same as 'g', except that it uses the
               current locale setting to insert the appropriate
               number separator characters.
         '%' - Percentage. Multiplies the number by 100 and displays
               in fixed ('f') format, followed by a percent sign.

Now I'm not sure how to best handle these.  Are they abstract and call a 
particular __method__, or are they concrete and always call __format__ on a 
particular type?


Cheers,
    Ron






From adam at hupp.org  Sat Aug  4 02:15:06 2007
From: adam at hupp.org (Adam Hupp)
Date: Fri, 3 Aug 2007 19:15:06 -0500
Subject: [Python-3000] patch for csv test failures
Message-ID: <20070804001505.GA22643@mouth.upl.cs.wisc.edu>

I've uploaded a patch to SF[0] that fixes the csv struni test
failures.  The patch also implements unicode support in the _csv C
module.  Some questions:

1. The CSV PEP (305) lists Unicode support as a TODO.  Is there a
   particular person I should talk to about having this change reviewed?

2. PEP 7 (C style guide) says to use single tab indentation, except
   for py3k which uses 4 spaces per indent.  _csv.c has a mix of both
   spaces and tabs.  Should I reindent the whole thing or just leave
   it as-is?

[0] http://www.python.org/sf/1767398 

-- 
Adam Hupp | http://hupp.org/adam/


From rhamph at gmail.com  Sat Aug  4 07:03:14 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Fri, 3 Aug 2007 23:03:14 -0600
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B35295.1030007@acm.org>
References: <46B13ADE.7080901@acm.org>
	<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>
	<ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>
	<46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com>
	<46B35295.1030007@acm.org>
Message-ID: <aac2c7cb0708032203j21e5e405gb5545631d140c932@mail.gmail.com>

On 8/3/07, Talin <talin at acm.org> wrote:
> Ron Adam wrote:
> > After a fair amount of experimenting today, I think I've found a nice
> > middle ground that meets some of what both you and Guido are looking
> > for. (And a bit my own preference too.)
>
> First off, thank you very much for taking the time to think about this
> in such detail. There are a lot of good ideas here.
>
> What's missing, however, is a description of how all of this interacts
> with the __format__ hook. The problem we are facing right now is
> sometimes we want to override the __format__ hook and sometimes we
> don't. Right now, the model that we want seems to be:
>
>    1) High precedence type coercion, i.e. 'r', which bypasses __format__.
>    2) Check for __format__, and let it interpret the format specifier.
>    3) Regular type coercion, i.e. 'd', 'f' and so on.
>    4) Regular formatting based on type.

Why not let __format__ return NotImplemented to mean "use a
fallback"?  E.g., 'd' would fall back to obj.__index__, 'r' to
repr(obj), etc.  You'd then have code like this:

class float:
    def __format__(self, type, ...):
        if type == 'f':
            return formatted float
        else:
            return NotImplemented

class MyFloat:
    def __format__(self, type, ...):
        if type == 'D':
            return custom format
        else:
            return float(self).__format__(type, ...)

class Decimal:
     def __format__(self, type, ...):
         if type == 'f':
             return formatted similar to float
         else:
             return NotImplemented

def handle_format(obj, type, ...):
    if hasattr(obj, '__format__'):
        s = obj.__format__(type, ...)
    else:
        s = NotImplemented

    if s is NotImplemented:
        if type == 'f':
            s = float(obj).__format__(type, ...)
        elif type == 'd':
            s = operator.index(obj).__format__(type, ...)
        elif type == 'r':
            s = repr(obj)
        elif type == 's':
            s = str(obj)
        else:
            raise ValueError("Unsupported format type")

    return s

-- 
Adam Olsen, aka Rhamphoryncus

From kbk at shore.net  Sat Aug  4 07:11:12 2007
From: kbk at shore.net (Kurt B. Kaiser)
Date: Sat, 04 Aug 2007 01:11:12 -0400
Subject: [Python-3000] map() Returns Iterator
Message-ID: <87r6mjj34f.fsf@hydra.hampton.thirdcreek.com>

Although there has been quite a bit of discussion on dropping reduce()
and retaining map(), filter(), and zip(), there has been less discussion
(at least that I can find) on changing them to return iterators instead
of lists.

I think of map() and filter() as sequence transformers.  To me, it's
an unexpected semantic change that the result is no longer a list.

In existing Lib/ code, the result of map() is twice as likely to be
assigned as it is to be used as an iterator in a flow control
statement.

If the statistics on the usage of map() stay the same, 2/3 of the time
the current implementation will require code like

        foo = list(map(fcn, bar)).

map() and filter() were retained primarily because they can produce
more compact and readable code when used correctly.  Adding list() most
of the time seems to diminish this benefit, especially when combined with
a lambda as the first arg.

There are a number of instances where map() is called for its side
effect, e.g.

        map(print, line_sequence)

with the return result ignored.  In py3k this has caused many silent
failures.  We've been weeding these out, and there are only a couple
left, but there are no doubt many more in 3rd party code.
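
The silent failure is easy to reproduce.  A small sketch, using 
list.append in place of print so the effect can be checked (the variable 
names are made up for illustration):

```python
lines = ["alpha", "beta", "gamma"]
collected = []

# With an iterator-returning map(), building the map object calls nothing.
pending = map(collected.append, lines)
count_before = len(collected)

# The calls only happen once something consumes the iterator.
for _ in pending:
    pass
count_after = len(collected)

# A plain loop has the side effect immediately and says what it means.
explicit = []
for line in lines:
    explicit.append(line)
```

Here count_before is still zero even though map() has "run"; nothing is 
appended until the loop drains the iterator.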

The situation with filter() is similar, though it's not used purely
for side effects.  zip() is infrequently used.  However, IMO for
consistency they should all act the same way.

I've seen GvR slides suggesting replacing map() et al. with list
comprehensions, but never with generator expressions.

PEP 3100: "Make built-ins return an iterator where appropriate
(e.g. range(), zip(), map(), filter(), etc.)"

It makes sense for range() to return an iterator.  I have my doubts on
map(), filter(), and zip().  Having them return iterators seems to
be a premature optimization.  Could something be done in the ast phase
of compilation instead?






-- 
KBK

From rhamph at gmail.com  Sat Aug  4 07:13:05 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Fri, 3 Aug 2007 23:13:05 -0600
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <aac2c7cb0708032203j21e5e405gb5545631d140c932@mail.gmail.com>
References: <46B13ADE.7080901@acm.org>
	<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>
	<ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>
	<46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com>
	<46B35295.1030007@acm.org>
	<aac2c7cb0708032203j21e5e405gb5545631d140c932@mail.gmail.com>
Message-ID: <aac2c7cb0708032213x5027f55bq5a37ffd4d1bfb628@mail.gmail.com>

On 8/3/07, Adam Olsen <rhamph at gmail.com> wrote:
> class MyFloat:
>     def __format__(self, type, ...):
>         if type == 'D':
>             return custom format
>         else:
>             return float(self).__format__(type, ...)

Oops, explicitly falling back to float is unnecessary here.  It should
instead be:

class MyFloat:
    def __float__(self):
        return self as float
    def __format__(self, type, ...):
        if type == 'D':
            return custom format
        else:
            return NotImplemented  # Falls back to self.__float__().__format__()
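
For anyone who wants to try the idea, here is a runnable sketch of the 
NotImplemented fallback.  It assumes a hook called format_hook, a made-up 
stand-in for the proposed __format__ (the real __format__ protocol takes a 
single spec string and is not expected to return NotImplemented), and it 
leans on the built-in format() for the fallback formatting.

```python
import operator


def handle_format(obj, code):
    """Ask the object first; fall back by type code if it declines."""
    hook = getattr(obj, "format_hook", None)  # hypothetical hook name
    result = hook(code) if hook is not None else NotImplemented

    if result is NotImplemented:
        if code == "f":
            result = format(float(obj), "f")
        elif code == "d":
            result = format(operator.index(obj), "d")
        elif code == "r":
            result = repr(obj)
        elif code == "s":
            result = str(obj)
        else:
            raise ValueError(f"Unsupported format type {code!r}")
    return result


class MyFloat:
    """Handles its own 'D' code and defers everything else."""

    def __init__(self, value):
        self.value = value

    def __float__(self):
        return float(self.value)

    def format_hook(self, code):
        if code == "D":
            return f"<{self.value}>"
        return NotImplemented  # handle_format() then goes through __float__
```

handle_format(MyFloat(1.5), "D") uses the custom branch, while 
handle_format(MyFloat(1.5), "f") is formatted via float(obj).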

-- 
Adam Olsen, aka Rhamphoryncus

From jyasskin at gmail.com  Sat Aug  4 09:56:04 2007
From: jyasskin at gmail.com (Jeffrey Yasskin)
Date: Sat, 4 Aug 2007 00:56:04 -0700
Subject: [Python-3000] map() Returns Iterator
In-Reply-To: <87r6mjj34f.fsf@hydra.hampton.thirdcreek.com>
References: <87r6mjj34f.fsf@hydra.hampton.thirdcreek.com>
Message-ID: <5d44f72f0708040056y7f7b8f0ah2141ee7230860b2f@mail.gmail.com>

<wild-speculation>
Is it possible to make the result of map() look like a list if people
are paying attention, but use memory like an iterator when they're
not? We'd want to distinguish between:
  x = map(...)
and
  for x in map(...)

Actually, to get any use out of it, we'd need to allow the first case,
as long as the first call to .__iter__() were also the last use of the
value. (I think this makes the return value of map() a 'view' rather
than an iterator?) How could we know that in time to do something
about it?

It looks to me that this could be accomplished if the last use of a
variable in a particular scope didn't increment the reference count
when passing that variable to a function. (Of course, I don't know
anything about how function calls actually work, which is why this is
wild speculation.) Then when the map-view's .__iter__() method is
called, it could check self's reference count. If that count is 1,
just proceed to iterate down the list, throwing away values after
computing them. If the count is >1, then create a list and fill it
while computing the map.

This could also be a handy optimization for plain list iterators, and
maybe other types. If .__iter__() is called on the last reference to a
particular value, then as it walks the list, it can decrement the
reference count of the items it has passed, since nobody can ever
again retrieve them through that list.
</wild-speculation>

Also, calling map() for its side-effects is a perversion of the
concept and shouldn't be encouraged by the language. Write a for loop.
;)

On 8/3/07, Kurt B. Kaiser <kbk at shore.net> wrote:
> Although there has been quite a bit of discussion on dropping reduce()
> and retaining map(), filter(), and zip(), there has been less discussion
> (at least that I can find) on changing them to return iterators instead
> of lists.
>
> I think of map() and filter() as sequence transformers.  To me, it's
> an unexpected semantic change that the result is no longer a list.
>
> In existing Lib/ code, it's twice as likely that the result of map()
> will be assigned than to use it as an iterator in a flow control
> statement.
>
> If the statistics on the usage of map() stay the same, 2/3 of the time
> the current implementation will require code like
>
>         foo = list(map(fcn, bar)).
>
> map() and filter() were retained primarily because they can produce
> more compact and readable code when used correctly.  Adding list() most
> of the time seems to diminish this benefit, especially when combined with
> a lambda as the first arg.
>
> There are a number of instances where map() is called for its side
> effect, e.g.
>
>         map(print, line_sequence)
>
> with the return result ignored.  In py3k this has caused many silent
> failures.  We've been weeding these out, and there are only a couple
> left, but there are no doubt many more in 3rd party code.
>
> The situation with filter() is similar, though it's not used purely
> for side effects.  zip() is infrequently used.  However, IMO for
> consistency they should all act the same way.
>
> I've seen GvR slides suggesting replacing map() et. al. with list
> comprehensions, but never with generator expressions.
>
> PEP 3100: "Make built-ins return an iterator where appropriate
> (e.g. range(), zip(), map(), filter(), etc.)"
>
> It makes sense for range() to return an iterator.  I have my doubts on
> map(), filter(), and zip().  Having them return iterators seems to
> be a premature optimization.  Could something be done in the ast phase
> of compilation instead?
>
>
>
>
>
>
> --
> KBK


-- 
Namasté,
Jeffrey Yasskin
http://jeffrey.yasskin.info/

"Religion is an improper response to the Divine." -- "Skinny Legs and
All", by Tom Robbins

From greg.ewing at canterbury.ac.nz  Sat Aug  4 13:55:23 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 04 Aug 2007 23:55:23 +1200
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B2D147.90606@acm.org>
References: <46B13ADE.7080901@acm.org>
	<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>
	<ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>
	<46B2D147.90606@acm.org>
Message-ID: <46B4692B.7060308@canterbury.ac.nz>

Talin wrote:
> But the formatters for int and float have to happen *after* type 
> coercion.

I don't see why. Couldn't the __format__ method
for an int recognise float formats as well and coerce
itself as necessary?
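
For what it's worth, the format() built-in that later shipped in Python 3 
behaves this way: int accepts the float presentation codes and converts 
itself.  A tiny check (variable names are illustrative):

```python
# An int formatted with float codes; no explicit float() call is needed.
fixed = format(3, ".2f")
exponent = format(1200, ".1e")
percent = format(1, ".0%")
```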

> (I should also mention that "a:b,c" looks prettier to my eye than 
> "a,b:c".

It seems more logical to me, too, for the colon
to separate the value from all the stuff telling
how to format it.

--
Greg

From greg.ewing at canterbury.ac.nz  Sat Aug  4 14:15:29 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 05 Aug 2007 00:15:29 +1200
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B39BBF.80809@ronadam.com>
References: <46B13ADE.7080901@acm.org>
	<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>
	<ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>
	<46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com>
	<46B35295.1030007@acm.org>
	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>
	<46B39BBF.80809@ronadam.com>
Message-ID: <46B46DE1.1090403@canterbury.ac.nz>

Ron Adam wrote:

>      '{0:10;20,f.2}'
> 
> Works for me.

It doesn't work for me, as it breaks up into

   0:10; 20,f.2

i.e. semicolons separate more strongly than commas
to my eyes.

--
Greg

From greg.ewing at canterbury.ac.nz  Sat Aug  4 14:33:16 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 05 Aug 2007 00:33:16 +1200
Subject: [Python-3000] map() Returns Iterator
In-Reply-To: <5d44f72f0708040056y7f7b8f0ah2141ee7230860b2f@mail.gmail.com>
References: <87r6mjj34f.fsf@hydra.hampton.thirdcreek.com>
	<5d44f72f0708040056y7f7b8f0ah2141ee7230860b2f@mail.gmail.com>
Message-ID: <46B4720C.5050504@canterbury.ac.nz>

Jeffrey Yasskin wrote:
> <wild-speculation>
> Is it possible to make the result of map() look like a list if people
> are paying attention, but use memory like an iterator when they're
> not?

I suppose it could lazily materialise a list behind the
scenes when needed (i.e. on the first __getitem__ or
__len__ call), but the semantics still wouldn't be *exactly*
the same, as it wouldn't be possible to iterate over it more
than once. Also, any side-effects would be delayed.
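
A rough sketch of such a view, to make the difference concrete.  The class 
name LazyMapView is invented here, and calling len() in the middle of an 
iteration is deliberately not handled.

```python
class LazyMapView:
    """Streams like an iterator until __len__ or __getitem__ forces a list."""

    def __init__(self, func, iterable):
        self._source = map(func, iterable)
        self._items = None  # filled in only when list behaviour is needed

    def _materialise(self):
        if self._items is None:
            self._items = list(self._source)
        return self._items

    def __len__(self):
        return len(self._materialise())

    def __getitem__(self, index):
        return self._materialise()[index]

    def __iter__(self):
        if self._items is None:
            # Streaming mode: values are thrown away after use, so a
            # second pass over the view finds nothing left.
            yield from self._source
        else:
            yield from self._items


streamed = LazyMapView(str.upper, ["a", "b"])
first_pass = [x for x in streamed]
second_pass = [x for x in streamed]   # empty: the source is exhausted

listed = LazyMapView(str.upper, ["a", "b"])
size = len(listed)                    # forces the hidden list
again = [x for x in listed]
and_again = [x for x in listed]       # repeatable once materialised
```

So the same object is re-iterable or not depending on whether something 
happened to ask for its length first, which is the kind of surprise 
described above.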

--
Greg

From rrr at ronadam.com  Sat Aug  4 17:02:39 2007
From: rrr at ronadam.com (Ron Adam)
Date: Sat, 04 Aug 2007 10:02:39 -0500
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B46DE1.1090403@canterbury.ac.nz>
References: <46B13ADE.7080901@acm.org>	<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>	<ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>	<46B2D147.90606@acm.org>
	<46B2E265.5080905@ronadam.com>	<46B35295.1030007@acm.org>	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>	<46B39BBF.80809@ronadam.com>
	<46B46DE1.1090403@canterbury.ac.nz>
Message-ID: <46B4950F.40905@ronadam.com>



Greg Ewing wrote:
> Ron Adam wrote:
> 
>>      '{0:10;20,f.2}'
>>
>> Works for me.
> 
> It doesn't work for me, as it breaks up into
> 
>    0:10; 20,f.2
> 
> i.e. semicolons separate more strongly than commas
> to my eyes.

And after my reply I realized this looks a bit odd.

     {0:;20,f.2}

But I figured I could get used to it.


An alternative I thought of this morning is to reuse the alignment symbols 
'^', '+', and '-' and require a minimum width if a maximum width is specified.

Then the field aligner would also have instructions for how to align/trim 
something if it is wider than max_width.

    {0:0+7}        'Hello W'

    {0:0-7}        'o World'

    {0:0^7}        'llo Wor'


Where this may make the most sense is if we are centering something in a 
minimum width field, but want to be sure we can see one end or the other if 
it's over the maximum field width.

The trade-off is that it adds an extra character for cases where only 
max_width would be needed.

    {0:0+20,f.2}
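
The trimming half of this can be sketched in a few lines.  This follows my 
reading of the three 'Hello World' examples above ('+' keeps the start, '-' 
keeps the end, '^' keeps the middle); trim_to_width is a made-up helper, not 
proposed syntax.

```python
def trim_to_width(text, max_width, keep="+"):
    """Cut text down to max_width, keeping the part selected by keep."""
    if len(text) <= max_width:
        return text
    extra = len(text) - max_width
    if keep == "+":
        return text[:max_width]            # keep the start
    if keep == "-":
        return text[extra:]                # keep the end
    if keep == "^":
        start = extra // 2
        return text[start:start + max_width]  # keep the middle
    raise ValueError(f"unknown trim symbol {keep!r}")
```

trim_to_width('Hello World', 7, '+') gives 'Hello W', '-' gives 'o World' 
and '^' gives 'llo Wor', matching the table above.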



Separators in tkinter:

The text widget has a line.column index.  '1.0' is the first line, and 
first character.

# window geometry is width x height + x_offset + y_offset (no spaces!)
root.geometry("400x300+30+30")


Cheers,
    Ron


From rrr at ronadam.com  Sat Aug  4 18:30:24 2007
From: rrr at ronadam.com (Ron Adam)
Date: Sat, 04 Aug 2007 11:30:24 -0500
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B4950F.40905@ronadam.com>
References: <46B13ADE.7080901@acm.org>	<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>	<ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>	<46B2D147.90606@acm.org>	<46B2E265.5080905@ronadam.com>	<46B35295.1030007@acm.org>	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>	<46B39BBF.80809@ronadam.com>	<46B46DE1.1090403@canterbury.ac.nz>
	<46B4950F.40905@ronadam.com>
Message-ID: <46B4A9A0.9070206@ronadam.com>



Ron Adam wrote:

> An alternative I thought of this morning is to reuse the alignment symbols 
> '^', '+', and '-' and require a minimum width if a maximum width is specified.

One more (or two) additions to this...

In the common cases of generating columnar reports, the min_width and 
max_width values would be equal.  So instead of repeating the numbers we 
could just prefix this case with double alignment symbols.

So instead of:

    '{0:+20+20}, {1:^100+100}, {2:-15+15}'

We could use:

    '{0:++20}, {1:^+100}, {2:-+15}'

Which would result in a first column that right aligns, a second column 
that centers unless the value is longer than 100, in which case it 
right-aligns and cuts off the end, and a third column that left aligns, but 
cuts off the right if it's over 15.


One other feature might be to use the fill syntax form to specify an 
overflow replacement character...

    '{0:10+10/#}'.format('Python')                 ->  'Python    '

    '{0:10+10/#}'.format('To be, or not to be.')   ->  '##########'
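
In plain Python the overflow-replacement behaviour would be something like 
the following.  fit_or_fill is a hypothetical helper written for this 
message; it only covers the left-aligned, min_width == max_width case shown 
in the two examples.

```python
def fit_or_fill(text, width, overflow_char="#"):
    """Pad text to width, or replace it entirely if it does not fit."""
    if len(text) > width:
        return overflow_char * width
    return text.ljust(width)
```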



Another way to think of the double alignment specification term, is that it 
moves slicing and preformatting of exceptional cases into the string format 
operation so we don't have to do the following just to catch the rare 
possibility of exceptional cases.  And it avoids altering the source data.

if len(value1) > max_width:
     value1 = value1[:max_width]    # or value1[len(value1)-max_width:]

Etc... for value2, value3 ...

line = format_string.format(value1, value2, value3, ...)


Cheers,
    Ron







From skip at pobox.com  Sat Aug  4 23:06:30 2007
From: skip at pobox.com (skip at pobox.com)
Date: Sat, 4 Aug 2007 16:06:30 -0500
Subject: [Python-3000] py3k conversion docs?
Message-ID: <18100.59990.335150.692487@montanaro.dyndns.org>

I'm looking at the recently submitted patch for the csv module and am
scratching my head a bit trying to understand the code transformations.
I've not looked at any py3k code yet, so this is all new to me.  Is there
any documentation about the Py3k conversion?  I'm particularly interested in
the string->unicode conversion.

Here's one confusing conversion.  I see PyString_FromStringAndSize replaced
by PyUnicode_FromUnicode.  In another place I see PyString_FromString
replaced by PyUnicode_DecodeASCII.  In some places I see a char left alone.
In other places I see it replaced by Py_UNICODE.

Skip

From skip at pobox.com  Sat Aug  4 23:41:26 2007
From: skip at pobox.com (skip at pobox.com)
Date: Sat, 4 Aug 2007 16:41:26 -0500
Subject: [Python-3000] atexit module problems/questions
Message-ID: <18100.62086.177289.274444@montanaro.dyndns.org>

During the recast of the atexit module into C it grew _clear and unregister
functions.  I can understand that a clear function might be handy, but why
is it private?

Given that sys.exitfunc is gone is there a reason to have _run_exitfuncs?
Who's going to call it?

Finally, I can see a situation where you might register the same function
multiple times with different argument lists, yet unregister takes only the
function as the discriminator.  I think that's going to be of at-best
minimal use, and error-prone.  (A common use might be to register os.unlink
for various files the program created during its run.  Unregister would be
quite useless there.)  In GTK's gobject library, when you register idle or
timer functions an integer id is returned.  That id is what is used to later
remove that function.  If you decide to retain unregister (I would vote to
remove it if it's not going to be fixed) I think you might as well break the
register function's api and return ncallbacks instead of the function.
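
To illustrate the id-based scheme, here is a pure-Python sketch.  
ExitRegistry is an invented class, not the atexit API; it just shows that 
an integer handle can tell apart two registrations of the same function.

```python
import itertools


class ExitRegistry:
    """Callbacks are removed by the integer id that register() returned."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._callbacks = {}

    def register(self, func, *args, **kwargs):
        handle = next(self._ids)
        self._callbacks[handle] = (func, args, kwargs)
        return handle

    def unregister(self, handle):
        # Unknown or already-removed ids are ignored.
        self._callbacks.pop(handle, None)

    def run(self):
        # Last registered, first run, as atexit does.
        for handle in sorted(self._callbacks, reverse=True):
            func, args, kwargs = self._callbacks.pop(handle)
            func(*args, **kwargs)


removed = []
registry = ExitRegistry()
first = registry.register(removed.append, "a.tmp")
second = registry.register(removed.append, "b.tmp")
registry.unregister(first)   # only the 'a.tmp' registration goes away
registry.run()
```

The same function is registered twice with different arguments, and only 
the registration for 'a.tmp' is dropped.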

Skip

From guido at python.org  Sat Aug  4 23:48:47 2007
From: guido at python.org (Guido van Rossum)
Date: Sat, 4 Aug 2007 14:48:47 -0700
Subject: [Python-3000] py3k conversion docs?
In-Reply-To: <18100.59990.335150.692487@montanaro.dyndns.org>
References: <18100.59990.335150.692487@montanaro.dyndns.org>
Message-ID: <ca471dc20708041448m1326ec14y3c4ab6b572b10c03@mail.gmail.com>

I haven't seen the patch you mention, and unfortunately there aren't
docs for the conversion yet.

However, one thing to note is that in 2.x, the PyString type ('str')
is used for binary data, encoded text data, and decoded text data. In
3.0, binary and encoded text are represented using PyBytes ('bytes'),
and decoded text is represented as PyUnicode (now called 'str').
Perhaps it helps in understanding the patch to know that 'char*' is
likely encoded text, while 'Py_UNICODE*' is likely decoded text.
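
At the Python level the same split looks like this (a small illustration; 
the sample string is arbitrary):

```python
text = "naïve café"                 # decoded text: str in 3.0
encoded = text.encode("utf-8")      # encoded text: bytes
decoded = encoded.decode("utf-8")   # and back to str

# Ten characters, but twelve bytes: the two accented letters each
# take two bytes in UTF-8.
char_count = len(text)
byte_count = len(encoded)
```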

Sorry,

--Guido

On 8/4/07, skip at pobox.com <skip at pobox.com> wrote:
> I'm looking at the recently submitted patch for the csv module and am
> scratching my head a bit trying to understand the code transformations.
> I've not looked at any py3k code yet, so this is all new to me.  Is there
> any documentation about the Py3k conversion?  I'm particularly interested in
> the string->unicode conversion.
>
> Here's one confusing conversion.  I see PyString_FromStringAndSize replaced
> by PyUnicode_FromUnicode.  In another place I see PyString_FromString
> replaced by PyUnicodeDecodeASCII.  In some places I see a char left alone.
> In other places I see it replaced by PyUNICODE.
>
> Skip


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip at pobox.com  Sat Aug  4 23:49:50 2007
From: skip at pobox.com (skip at pobox.com)
Date: Sat, 4 Aug 2007 16:49:50 -0500
Subject: [Python-3000] atexit module problems/questions
In-Reply-To: <18100.62086.177289.274444@montanaro.dyndns.org>
References: <18100.62086.177289.274444@montanaro.dyndns.org>
Message-ID: <18100.62590.744303.912325@montanaro.dyndns.org>


    skip> Given that sys.exitfunc is gone is there a reason to have
    skip> _run_exitfuncs?  Who's going to call it?

I should have elaborated.  Clearly you need some way to call it, but since
that is going to be called from C code (isn't it?), why expose it to Python
code?

Skip

From lists at cheimes.de  Sun Aug  5 02:44:44 2007
From: lists at cheimes.de (Christian Heimes)
Date: Sun, 05 Aug 2007 02:44:44 +0200
Subject: [Python-3000] atexit module problems/questions
In-Reply-To: <18100.62590.744303.912325@montanaro.dyndns.org>
References: <18100.62086.177289.274444@montanaro.dyndns.org>
	<18100.62590.744303.912325@montanaro.dyndns.org>
Message-ID: <f936i2$nii$1@sea.gmane.org>

skip at pobox.com wrote:
>     skip> Given that sys.exitfunc is gone is there a reason to have
>     skip> _run_exitfuncs?  Who's going to call it?
> 
> I should have elaborated.  Clearly you need some way to call it, but since
> that is going to be called from C code (isn't it?), why expose it to Python
> code?

Unit tests? Some developers might want to test their registered functions.

Christian


From greg.ewing at canterbury.ac.nz  Sun Aug  5 03:13:06 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 05 Aug 2007 13:13:06 +1200
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B4A9A0.9070206@ronadam.com>
References: <46B13ADE.7080901@acm.org>
	<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>
	<ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>
	<46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com>
	<46B35295.1030007@acm.org>
	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>
	<46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz>
	<46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com>
Message-ID: <46B52422.2090006@canterbury.ac.nz>

Ron Adam wrote:
> Which would result in a first column that right aligns, a second column 
> that centers unless the value is longer than 100, in which case it right 
> align, and cuts the end, and a third column that left aligns, but cuts off 
> the right if it's over 15.

All this talk about cutting things off worries me. In the
case of numbers at least, if you can't afford to expand the
column width, normally the right thing to do is *not* to cut
them off, but replace them with **** or some other thing that
stands out.

This suggests that the formatting and field width options may
not be as easily separable as we would like.

--
Greg

From greg.ewing at canterbury.ac.nz  Sun Aug  5 03:25:18 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 05 Aug 2007 13:25:18 +1200
Subject: [Python-3000] atexit module problems/questions
In-Reply-To: <18100.62086.177289.274444@montanaro.dyndns.org>
References: <18100.62086.177289.274444@montanaro.dyndns.org>
Message-ID: <46B526FE.4060500@canterbury.ac.nz>

skip at pobox.com wrote:
> I can see a situation where you might register the same function
> multiple times with different argument lists, yet unregister takes only the
> function as the discriminator.

One way to fix this would be to remove the ability to
register arguments along with the function. It's not
necessary, as you can always use a closure to get
the same effect. Then you have a unique handle for
each registered callback.
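
A sketch of that approach using functools.partial as the closure.  It 
assumes the register/unregister pair discussed in this thread; remove_file 
is a stand-in that records the path instead of deleting anything.

```python
import atexit
from functools import partial


def remove_file(path, log):
    """Stand-in for os.unlink: just record which path was 'removed'."""
    log.append(path)


log = []

# One callable per file: each partial object is its own handle.
cleanup_a = partial(remove_file, "a.tmp", log)
cleanup_b = partial(remove_file, "b.tmp", log)

atexit.register(cleanup_a)
atexit.register(cleanup_b)

# Only the 'a.tmp' callback is removed; 'b.tmp' stays registered.
atexit.unregister(cleanup_a)
```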

--
Greg

From jyasskin at gmail.com  Sun Aug  5 03:53:45 2007
From: jyasskin at gmail.com (Jeffrey Yasskin)
Date: Sat, 4 Aug 2007 18:53:45 -0700
Subject: [Python-3000] test_asyncore fails intermittently on Darwin
In-Reply-To: <46AE943D.1040105@canterbury.ac.nz>
References: <2cda2fc90707261505tdd9a0f1t861b5801c37ad11e@mail.gmail.com>
	<1d36917a0707261618oac94f20l98f464a2ab1edc4e@mail.gmail.com>
	<2cda2fc90707292338pff060c1i810737dcf6d5df54@mail.gmail.com>
	<2cda2fc90707292340k7eb11f2w82003e6f705438c3@mail.gmail.com>
	<46AE943D.1040105@canterbury.ac.nz>
Message-ID: <5d44f72f0708041853m1bb0d005h9f1ff77103b9ebbe@mail.gmail.com>

Well, regardless of the brokenness of the patch, I do get two
different failures from this test on OSX. The first is caused by
trying to socket.bind() a port that's already been bound recently:

Exception in thread Thread-2:
Traceback (most recent call last):
  File "/Users/jyasskin/src/python/test_asyncore/Lib/threading.py", line 464, in __bootstrap
    self.run()
  File "/Users/jyasskin/src/python/test_asyncore/Lib/threading.py", line 444, in run
    self.__target(*self.__args, **self.__kwargs)
  File "Lib/test/test_asyncore.py", line 59, in capture_server
    serv.bind(("", PORT))
  File "<string>", line 1, in bind
socket.error: (48, 'Address already in use')

That looks pretty easy to fix.

The second:

======================================================================
ERROR: test_send (__main__.DispatcherWithSendTests_UsePoll)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "Lib/test/test_asyncore.py", line 351, in test_send
    d.send(data)
  File "/Users/jyasskin/src/python/test_asyncore/Lib/asyncore.py", line 468, in send
    self.initiate_send()
  File "/Users/jyasskin/src/python/test_asyncore/Lib/asyncore.py", line 455, in initiate_send
    num_sent = dispatcher.send(self, self.out_buffer[:512])
  File "/Users/jyasskin/src/python/test_asyncore/Lib/asyncore.py", line 335, in send
    if why[0] == EWOULDBLOCK:
TypeError: 'error' object is unindexable

seems to be caused by a change in exceptions. I've reduced the problem
into the attached patch, which adds a test to Lib/test/test_socket.py.
It looks like subscripting is no longer the way to get values out of
socket.errors, but another way hasn't been implemented yet.
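
For comparison, this is how the value can be read in later Python 3 
releases, where socket.error is an alias of OSError.  It is a sketch of the 
access pattern only, not the fix for asyncore, and the error is raised by 
hand rather than by a real socket.

```python
import errno
import socket

try:
    # Simulate the failure instead of opening a socket.
    raise OSError(errno.EWOULDBLOCK, "operation would block")
except socket.error as why:
    code = why.errno          # the attribute replaces why[0]
    via_args = why.args[0]    # the args tuple still carries the same value
    try:
        why[0]
        subscriptable = True
    except TypeError:
        subscriptable = False
```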

On 7/30/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Hasan Diwan wrote:
> > The issue seems to be in the socket.py close method. It needs to sleep
> > socket.SO_REUSEADDR seconds before returning.
>
> WHAT??? socket.SO_REUSEADDR is a flag that you pass when
> creating a socket to tell it to re-use an existing address,
> not something to be used as a timeout value, as far as
> I know.
>
> --
> Greg


-- 
Namasté,
Jeffrey Yasskin
http://jeffrey.yasskin.info/

"Religion is an improper response to the Divine." -- "Skinny Legs and
All", by Tom Robbins
-------------- next part --------------
A non-text attachment was scrubbed...
Name: socket_breakage.diff
Type: application/octet-stream
Size: 1164 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070804/c8e25af9/attachment.obj 

From guido at python.org  Sun Aug  5 04:09:45 2007
From: guido at python.org (Guido van Rossum)
Date: Sat, 4 Aug 2007 19:09:45 -0700
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B52422.2090006@canterbury.ac.nz>
References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org>
	<46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org>
	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>
	<46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz>
	<46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com>
	<46B52422.2090006@canterbury.ac.nz>
Message-ID: <ca471dc20708041909j3f86b107hace9407673c302ea@mail.gmail.com>

On 8/4/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Ron Adam wrote:
> > Which would result in a first column that right aligns, a second column
> > that centers unless the value is longer than 100, in which case it right
> > align, and cuts the end, and a third column that left aligns, but cuts off
> > the right if it's over 15.
>
> All this talk about cutting things off worries me. In the
> case of numbers at least, if you can't afford to expand the
> column width, normally the right thing to do is *not* to cut
> them off, but replace them with **** or some other thing that
> stands out.
>
> This suggests that the formatting and field width options may
> not be as easily separable as we would like.

I remember a language that did the *** thing; it was called Fortran.
It was an absolutely terrible feature. A later language (Pascal)
solved it by ignoring the field width if the number didn't fit -- it
would mess up your layout but at least you'd see the value. That
strategy worked much better, and later languages (e.g. C) followed it.
So I think a maximum width is quite unnecessary for numbers. For
strings, of course, it's useful; it can be made part of the
string-specific conversion specifier.
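
For illustration, the behaviour described above can be tried in the format() built-in that later shipped with Python 3. This is only a sketch of that released syntax, not of anything settled in this thread: a numeric width acts as a minimum, while a string precision acts as a maximum.

```python
# Width is a minimum for numbers: a value that does not fit is never cut off.
padded = format(42, "5d")            # right-aligned in 5 columns
overflowed = format(1234567, "5d")   # wider than 5, shown in full anyway

# For strings, the precision after the dot is a maximum width.
clipped = format("To be, or not to be.", ".10")
padded_text = format("Python", "10.10")   # min 10, max 10, left-aligned

print(repr(padded), repr(overflowed), repr(clipped), repr(padded_text))
```

The layout gets messed up for the long number, but the value stays visible, which is the trade-off argued for above.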

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip at pobox.com  Sun Aug  5 04:48:18 2007
From: skip at pobox.com (skip at pobox.com)
Date: Sat, 4 Aug 2007 21:48:18 -0500
Subject: [Python-3000] atexit module problems/questions
In-Reply-To: <f936i2$nii$1@sea.gmane.org>
References: <18100.62086.177289.274444@montanaro.dyndns.org>
	<18100.62590.744303.912325@montanaro.dyndns.org>
	<f936i2$nii$1@sea.gmane.org>
Message-ID: <18101.14962.541067.708881@montanaro.dyndns.org>


    skip> Given that sys.exitfunc is gone is there a reason to have
    skip> _run_exitfuncs?  Who's going to call it?

    Christian> Unit tests? Some developers might want to test their
    Christian> registered functions.

Your tests can just fork another instance of Python which prints:

    python -c 'import atexit
def f(*args, **kwds):
    print("atexit", args, kwds)

atexit.register(f, 1, x=2)
'

and have your test case expect to see the appropriate output.
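
A rough sketch of what such a test could look like, assuming the subprocess module is used to start the child interpreter (the names CHILD_SOURCE and run_child are made up for this example):

```python
import subprocess
import sys

# Source for the child interpreter: it registers f and then simply exits.
CHILD_SOURCE = """
import atexit

def f(*args, **kwds):
    print("atexit", args, kwds)

atexit.register(f, 1, x=2)
"""

def run_child(source):
    """Run source in a fresh interpreter and return whatever it printed."""
    completed = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout

output = run_child(CHILD_SOURCE)
print(output, end="")
```

The test case then only has to compare the captured output with the expected line.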

Skip

From skip at pobox.com  Sun Aug  5 04:51:13 2007
From: skip at pobox.com (skip at pobox.com)
Date: Sat, 4 Aug 2007 21:51:13 -0500
Subject: [Python-3000] atexit module problems/questions
In-Reply-To: <46B526FE.4060500@canterbury.ac.nz>
References: <18100.62086.177289.274444@montanaro.dyndns.org>
	<46B526FE.4060500@canterbury.ac.nz>
Message-ID: <18101.15137.718715.98755@montanaro.dyndns.org>


    >> I can see a situation where you might register the same function
    >> multiple times with different argument lists, yet unregister takes
    >> only the function as the discriminator.

    Greg> One way to fix this would be to remove the ability to register
    Greg> arguments along with the function. It's not necessary, as you can
    Greg> always use a closure to get the same effect. Then you have a
    Greg> unique handle for each registered callback.

Then you need to hang onto the closure.  That might be some distance away
from the point at which the function was registered.  Returning a unique id
corresponding to the specific call to atexit.register is much simpler than
forcing the caller to build a closure.
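
To make the idea concrete, here is a minimal sketch of a registry that hands back an id per registration. ExitRegistry and its methods are hypothetical; this is not the atexit API, just an illustration of the handle-based design.

```python
import itertools

class ExitRegistry:
    """Hypothetical registry: register() returns an id usable for unregister()."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._callbacks = {}   # id -> (func, args, kwds), in registration order

    def register(self, func, *args, **kwds):
        handle = next(self._ids)
        self._callbacks[handle] = (func, args, kwds)
        return handle

    def unregister(self, handle):
        # Removes exactly one registration, even if func was registered twice.
        self._callbacks.pop(handle, None)

    def run(self):
        # Run in reverse registration order, as atexit does.
        while self._callbacks:
            _, (func, args, kwds) = self._callbacks.popitem()
            func(*args, **kwds)

log = []
registry = ExitRegistry()
first = registry.register(log.append, "first")
second = registry.register(log.append, "second")   # same function, other args
third = registry.register(log.append, "third")
registry.unregister(second)
registry.run()
print(log)
```

The same function is registered three times with different arguments, and only the middle registration is dropped.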

Skip


From talin at acm.org  Sun Aug  5 06:17:21 2007
From: talin at acm.org (Talin)
Date: Sat, 04 Aug 2007 21:17:21 -0700
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B4A9A0.9070206@ronadam.com>
References: <46B13ADE.7080901@acm.org>	<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>	<ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>	<46B2D147.90606@acm.org>	<46B2E265.5080905@ronadam.com>	<46B35295.1030007@acm.org>	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>	<46B39BBF.80809@ronadam.com>	<46B46DE1.1090403@canterbury.ac.nz>	<46B4950F.40905@ronadam.com>
	<46B4A9A0.9070206@ronadam.com>
Message-ID: <46B54F51.40705@acm.org>

Ron Adam wrote:

> Ron Adam wrote:
> 
>> An alternative I thought of this morning is to reuse the alignment symbols 
>> '^', '+', and '-' and require a minimum width if a maximum width is specified.
> 
> One more (or two) additions to this...

(snipped)

I've kind of lost track of what the proposal is at this specific point. 
I like several of the ideas you have proposed, but I think it needs to 
be slimmed down even more.

I don't have a particular syntax in mind - yet - but I can tell you what 
I would like to see in general.

Guido used the term "mini-language" to describe the conversion specifier 
syntax. I think that's a good term, because it implies that it's not 
just a set of isolated properties, but rather a grammar where the 
arrangement and ordering of things matters.

Like real human languages, it has a "Huffman-coding" property, where the 
most commonly-uttered phrases are the shortest. This conciseness is 
achieved by sacrificing some degree of orthogonality (in the same way 
that a CISC machine instruction is shorter than an equivalent RISC 
instruction.) In practical terms it means that the interpretation of a 
symbol depends on what comes before it.

So in general common cases should be short, uncommon cases should be 
possible. And we don't have to allow every possible combination of 
options, just the ones that are most important.

Another thing I want to point out is that Guido and I (in a private 
discussion) have resolved our argument about the role of __format__. 
Well, not so much *agreed* I guess, more like I capitulated.

But in any case, the deal is that int, float, and decimal all get to 
have a __format__ method which interprets the format string for those 
types. There is no longer any automatic coercion of types based on the 
format string - so simply defining an __int__ method for a type is 
insufficient if you want to use the 'd' format type. Instead, if you 
want to use 'd' you can simply write the following:

    class MyClass:
       def __format__(self, spec):
          return int(self).__format__(spec)

This at least has the advantage of simplifying the problem quite a bit. 
The global 'format(value, spec)' function now just does:

    1) check for the 'repr' override, if present return repr(val)
    2) call val.__format__(spec) if it exists
    3) call str(val).__format__(spec)

Note that this also means that float.__format__ will have to handle 'd' 
and int.__format__ will handle 'f', and so on, although this can be done 
by explicit type conversion in the __format__ method. (No need for float 
to handle 'x' and the like, even though it does work with %-formatting 
today.)
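
A sketch of the three-step dispatch listed above, under some assumptions: the function name sketch_format and the use_repr flag are invented here (the thread has not settled how the 'repr' override is spelled), and "if it exists" is read as "the type defines its own __format__".

```python
def sketch_format(val, spec, use_repr=False):
    # 1) the 'repr' override wins outright
    if use_repr:
        return repr(val)
    # 2) use the type's own __format__ if it defines one
    fmt = getattr(type(val), "__format__", None)
    if fmt is not None and fmt is not object.__format__:
        return fmt(val, spec)
    # 3) otherwise fall back to formatting the str() of the value
    return str(val).__format__(spec)

class Temperature:
    """Has __int__, and opts in to 'd' by delegating __format__ to int."""
    def __init__(self, degrees):
        self.degrees = degrees
    def __int__(self):
        return round(self.degrees)
    def __format__(self, spec):
        return int(self).__format__(spec)

class Plain:
    """No __format__ of its own, so step 3 applies."""
    def __str__(self):
        return "plain"

as_int = sketch_format(Temperature(21.6), "5d")
as_text = sketch_format(Plain(), ">8")
as_repr = sketch_format("x", "", use_repr=True)
print(repr(as_int), repr(as_text), repr(as_repr))
```

The format specs used here ("5d", ">8") follow the syntax Python 3 eventually released, only so that the sketch can run.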

> One other feature might be to use the fill syntax form to specify an 
> overflow replacement character...
> 
>     '{0:10+10/#}'.format('Python')                 ->  'Python    '
> 
>     '{0:10+10/#}'.format('To be, or not to be.')   ->  '##########'

Yeah, as Guido pointed out in another message that's not going to fly.

A few minor points on syntax of the minilanguage:

-- I like your idea that :xxxx and ,yyyy can occur in any order.

-- I'm leaning towards the .Net conversion spec syntax convention where 
the type letter comes first: ':f10'. The idea being that the first 
letter changes the interpretation of subsequent letters.

Note that in the .Net case, the numeric quantity after the letter 
represents a *precision* specifier, not a min/max field width.

So for example, in .Net having a float field of minimum width 10 and a 
decimal precision of 3 digits would be ':f3,10'.

Now, as stated above, there's no 'max field width' for any data type 
except strings. So in the case of strings, we can re-use the precision 
specifier just like C printf does: ':s10' to limit the string to 10 
characters. So 's:10,5' to indicate a max width of 10, min width of 5.

-- There's no decimal precision quantity for any data type except 
floats. So ':d10' doesn't mean anything I think, but ':d,10' is minimum 
10 digits.

-- I don't have an opinion yet on where the other stuff (sign options, 
padding, alignment) should go, except that sign should go next to the 
type letter, while the rest should go after the comma.

-- For the 'repr' override, Guido suggests putting 'r' in the alignment 
field: '{0,r}'. How that mixes with alignment and padding is unknown, 
although frankly why anyone would want to pad and align a repr() is 
completely beyond me.

-- Talin

From rrr at ronadam.com  Sun Aug  5 08:06:43 2007
From: rrr at ronadam.com (Ron Adam)
Date: Sun, 05 Aug 2007 01:06:43 -0500
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <ca471dc20708041909j3f86b107hace9407673c302ea@mail.gmail.com>
References: <46B13ADE.7080901@acm.org>
	<46B2D147.90606@acm.org>	<46B2E265.5080905@ronadam.com>
	<46B35295.1030007@acm.org>	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>	<46B39BBF.80809@ronadam.com>
	<46B46DE1.1090403@canterbury.ac.nz>	<46B4950F.40905@ronadam.com>
	<46B4A9A0.9070206@ronadam.com>	<46B52422.2090006@canterbury.ac.nz>
	<ca471dc20708041909j3f86b107hace9407673c302ea@mail.gmail.com>
Message-ID: <46B568F3.9060105@ronadam.com>



Guido van Rossum wrote:
> On 8/4/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
>> Ron Adam wrote:
>>> Which would result in a first column that right aligns, a second column
>>> that centers unless the value is longer than 100, in which case it right
>>> align, and cuts the end, and a third column that left aligns, but cuts off
>>> the right if it's over 15.
>> All this talk about cutting things off worries me. In the
>> case of numbers at least, if you can't afford to expand the
>> column width, normally the right thing to do is *not* to cut
>> them off, but replace them with **** or some other thing that
>> stands out.
>>
>> This suggests that the formatting and field width options may
>> not be as easily separable as we would like.
> 
> I remember a language that did the *** thing; it was called Fortran.
> It was an absolutely terrible feature. A later language (Pascal)
> solved it by ignoring the field width if the number didn't fit -- it
> would mess up your layout but at least you'd see the value. That
> strategy worked much better, and later languages (e.g. C) followed it.
> So I think a maximum width is quite unnecessary for numbers. For
> strings, of course, it's useful; it can be made part of the
> string-specific conversion specifier.

I looked up Fortran's print descriptors and it seems they have only a 
single width descriptor which, as you say, automatically includes the *** 
overflow behavior.  So the programmer doesn't have a choice: they can 
either specify a width and get that too, or not specify a width.  I can 
see how that would be very annoying.

    See section 2...

    http://www-solar.mcs.st-and.ac.uk/~steveb/course/notes/set4.pdf

The field width specification I've described is rich enough so that the 
programmer can choose the behavior they want.  So it doesn't have the same 
problem.

A programmer can choose to implement the Fortran behavior if they really 
want to.  They would need to specify an overflow replacement character to 
turn that on.  Otherwise it never occurs.

    '{0:10+20/*,s}'

In the above case the field width would normally be 10, but could expand 
up to 20, and only if it goes over 20 is the field filled with '*'s.  But 
that behavior was explicitly specified by the programmer by supplying an 
overflow replacement character along with the max_width size.  It's not 
automatically included as in the Fortran case.  Truncating behavior is 
explicitly specified by giving a max_width size without a replacement 
character. And a minimum width is explicitly specified by supplying a 
min_width size.

So the programmer has full and explicit control of the alignment behaviors 
in all cases.
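
The three behaviors can be sketched in a few lines. The helper fit_field and its parameters are made up for this illustration, and left alignment is assumed throughout; it is not an implementation of the proposed syntax.

```python
def fit_field(text, min_width, max_width=None, overflow_char=None):
    """Pad to min_width; past max_width either truncate or fill."""
    if max_width is not None and len(text) > max_width:
        if overflow_char is not None:
            # Fortran-style behavior, only because it was asked for.
            return overflow_char * max_width
        # Plain truncation: cut off the right-hand side.
        return text[:max_width]
    # No overflow: left-align and pad up to the minimum width.
    return text.ljust(min_width)

short = fit_field("Python", 10, 10, "#")
replaced = fit_field("To be, or not to be.", 10, 10, "#")
truncated = fit_field("To be, or not to be.", 10, 10)
expanded = fit_field("x" * 15, 10, 20, "*")   # between min and max: untouched
print(repr(short), repr(replaced), repr(truncated), repr(expanded))
```

Leaving overflow_char unset gives truncation, and leaving max_width unset gives the expand-as-needed behavior.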

Since an alignment specification is always paired with a format 
specification, the programmer can choose the best alignment behavior to go 
along with a formatter in the context of their application.  This is a good 
thing even though some programmers may not always make the best choices at 
first.  I believe they will learn fairly quickly what not to do.

So the choices are:

1 - Remove the replacement character alignment option. It may not be all 
that useful, and by removing it we protect programmers from making some 
mistakes, but limit others from this feature who may find it useful.

So just how useful/desirable is this?

2 - Only use max_width inside string formatters.  This further protects 
programmers from making silly choices, and further limits others who may 
want to use max_width with other types.  It also breaks up the clean split 
of alignment and format specifiers.  (But this may be a matter of 
perspective.)



I'm +0 on (1), and -1 on (2) moving max_width to the string formatter.  So 
what do others think about these features?

If you do #2, then #1 also goes, unless it too is moved to the string 
formatter.


Note: Moving these to the string type formatter doesn't prevent them from 
being used with numbers in all cases.  A general text class would still be 
able to use them with numeric entries because it would call the __str__ 
method of the number to first convert the number to a string, but then call 
__format__ on that string and forward these string options.  It just 
requires more thought to do, and a better understanding of the internal 
process.

But also this depends on the choice of the underlying implementation.
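
As a small illustration of that note, assuming the string formatter grows a max width the way the released Python 3 syntax did (precision after the dot), a helper can convert first and then apply the string options. text_field is a hypothetical name.

```python
def text_field(value, spec):
    """Convert to str first, then let the string formatter apply spec."""
    return format(str(value), spec)

direct = format(1234.5678, ".6")         # float rules: 6 significant digits
via_str = text_field(1234.5678, ">12.6")  # string rules: at most 6 characters
print(repr(direct), repr(via_str))
```

The same ".6" means something different once the value is a string, which is why the extra conversion step needs some thought.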

Cheers,
    Ron




From hasan.diwan at gmail.com  Sun Aug  5 09:12:35 2007
From: hasan.diwan at gmail.com (Hasan Diwan)
Date: Sun, 5 Aug 2007 00:12:35 -0700
Subject: [Python-3000] test_asyncore fails intermittently on Darwin
In-Reply-To: <5d44f72f0708041853m1bb0d005h9f1ff77103b9ebbe@mail.gmail.com>
References: <2cda2fc90707261505tdd9a0f1t861b5801c37ad11e@mail.gmail.com>
	<1d36917a0707261618oac94f20l98f464a2ab1edc4e@mail.gmail.com>
	<2cda2fc90707292338pff060c1i810737dcf6d5df54@mail.gmail.com>
	<2cda2fc90707292340k7eb11f2w82003e6f705438c3@mail.gmail.com>
	<46AE943D.1040105@canterbury.ac.nz>
	<5d44f72f0708041853m1bb0d005h9f1ff77103b9ebbe@mail.gmail.com>
Message-ID: <2cda2fc90708050012p49831ad5ga69f7a069acff3d2@mail.gmail.com>

On 04/08/07, Jeffrey Yasskin <jyasskin at gmail.com> wrote:
> Well, regardless of the brokenness of the patch, I do get two
> different failures from this test on OSX. The first is caused by
> trying to socket.bind() a port that's already been bound recently:
>
> Exception in thread Thread-2:
> Traceback (most recent call last):
>   File "/Users/jyasskin/src/python/test_asyncore/Lib/threading.py",
> line 464, in __bootstrap
>     self.run()
>   File "/Users/jyasskin/src/python/test_asyncore/Lib/threading.py",
> line 444, in run
>     self.__target(*self.__args, **self.__kwargs)
>   File "Lib/test/test_asyncore.py", line 59, in capture_server
>     serv.bind(("", PORT))
>   File "<string>", line 1, in bind
> socket.error: (48, 'Address already in use')

Patch number 1767834 -- uncommitted as yet -- fixes this problem.
-- 
Cheers,
Hasan Diwan <hasan.diwan at gmail.com>

From rrr at ronadam.com  Sun Aug  5 11:57:25 2007
From: rrr at ronadam.com (Ron Adam)
Date: Sun, 05 Aug 2007 04:57:25 -0500
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B54F51.40705@acm.org>
References: <46B13ADE.7080901@acm.org>	<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>	<ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>	<46B2D147.90606@acm.org>	<46B2E265.5080905@ronadam.com>	<46B35295.1030007@acm.org>	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>	<46B39BBF.80809@ronadam.com>	<46B46DE1.1090403@canterbury.ac.nz>	<46B4950F.40905@ronadam.com>
	<46B4A9A0.9070206@ronadam.com> <46B54F51.40705@acm.org>
Message-ID: <46B59F05.3070200@ronadam.com>



Talin wrote:
> Ron Adam wrote:
> 
>> Ron Adam wrote:
>>
>>> An alternative I thought of this morning is to reuse the alignment 
>>> symbols '^', '+', and '-' and require a minimum width if a maximum 
>>> width is specified.
>>
>> One more (or two) additions to this...
> 
> (snipped)
> 
> I've kind of lost track of what the proposal is at this specific point. 
> I like several of the ideas you have proposed, but I think it needs to 
> be slimmed down even more.

I put in a lot of implementation details, so it may seem heavier than it 
really is.

> I don't have a particular syntax in mind - yet - but I can tell you what 
> I would like to see in general.
> 
> Guido used the term "mini-language" to describe the conversion specifier 
> syntax. I think that's a good term, because it implies that it's not 
> just a set of isolated properties, but rather a grammar where the 
> arrangement and ordering of things matters.

I agree, a mini-language also implies a richness that a simple option list 
doesn't have.

> Like real human languages, it has a "Huffman-coding" property, where the 
> most commonly-uttered phrases are the shortest. This conciseness is 
> achieved by sacrificing some degree of orthogonality (in the same way 
> that a CISC machine instruction is shorter than an equivalent RISC 
> instruction.) In practical terms it means that the interpretation of a 
> symbol depends on what comes before it.

Sounds good.

> So in general common cases should be short, uncommon cases should be 
> possible. And we don't have to allow every possible combination of 
> options, just the ones that are most important.

I figured some of what I suggested would be vetoed, but included them in 
case they are desirable.  It's not always easy to know beforehand how the 
community, or Guido, ;-) is going to respond to any suggestion.


> Another thing I want to point out is that Guido and I (in a private 
> discussion) have resolved our argument about the role of __format__. 
> Well, not so much *agreed* I guess, more like I capitulated.

Refer to the message in this thread where I discuss the difference between 
concrete and abstract format specifiers.  I think this is basically where 
you and Guido are differing on these issues.  I got the impression you 
prefer the more abstract interpretation and Guido prefers a more 
traditional interpretation.  We can have both as long as they are well 
defined and documented as being one or the other.  It's when we try to make 
one format specifier have both qualities at different times that it gets messy.


Here's how the apply_format function could look, we may not be in as much 
disagreement as you think.

def apply_format(value, format_spec):
    abstract = False
    type = format_spec[0]
    if type in 'rtgd':
        abstract = True
        if type == 'r':        # abstract repr
            value = repr(value)
        elif type == 't':      # abstract text
            value = str(value)
        elif type == 'g':      # abstract float
            value = float(value)
        else:                  # type == 'd': abstract int
            value = int(value)
    return value.__format__(format_spec, abstract)

The above abstract types use duck typing to convert to concrete types 
before calling the returned type's __format__ method. There aren't that many 
abstract types needed.  We only need a few to cover the most common cases.

That's it.  It's up to each type's __format__ method to figure out things 
from there.  They can look at the original type spec passed to them and 
handle special cases if need be.

If the abstract flag is False and the format_spec type doesn't match the 
type of the __format__ method's class, then an exception can be raised. 
This offers a wider range of strictness/leniency to string formatting. 
There are cases where you may want either.
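
Since the two-argument __format__ does not exist on the built-in types, here is a self-contained variant that can actually be run. Everything in it is an assumption made for the sketch: the tables ABSTRACT, CONCRETE and PY_SPEC, the concrete letters 's', 'f' and 'x', and the fact that only the type letter of the spec is looked at.

```python
# Abstract letters convert the value first (lenient).
ABSTRACT = {"r": repr, "t": str, "g": float, "d": int}
# Concrete letters require the value to already be of this type (strict).
CONCRETE = {"s": str, "f": float, "x": int}
# How each letter is handed on to the built-in format() for this demo.
PY_SPEC = {"r": "", "t": "", "s": "", "g": "g", "d": "d", "f": "f", "x": "x"}

def apply_format(value, format_spec):
    kind = format_spec[0]
    if kind in ABSTRACT:
        value = ABSTRACT[kind](value)
    elif kind not in CONCRETE or not isinstance(value, CONCRETE[kind]):
        raise TypeError(
            "cannot use %r with %s" % (kind, type(value).__name__))
    return format(value, PY_SPEC[kind])

print(apply_format("3", "d"))     # string converted through int()
print(apply_format(2.5, "d"))     # float converted through int()
print(apply_format(255, "x"))     # concrete: already an int
try:
    apply_format(2.5, "x")        # concrete: float is not an int
except TypeError as exc:
    print("rejected:", exc)
```

The abstract letters give the lenient behavior and the concrete ones the strict behavior.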


> But in any case, the deal is that int, float, and decimal all get to 
> have a __format__ method which interprets the format string for those 
> types.

Good, +1

> There is no longer any automatic coercion of types based on the 
> format string

Ever?  This seems to contradict below where you say int needs to handle 
float, and float needs to handle int.  Can you explain further?


> - so simply defining an __int__ method for a type is 
> insufficient if you want to use the 'd' format type. Instead, if you 
> want to use 'd' you can simply write the following:
> 
>    class MyClass:
>       def __format__(self, spec):
>          return int(self).__format__(spec)


So if an item has an __int__ method, but not a __format__ method, and you 
tried to print it with a 'd' format type, it would raise an exception?

 From your descriptions elsewhere in this reply it sounds like it would 
fall back to string output.  Or am I missing something?



> This at least has the advantage of simplifying the problem quite a bit. 
> The global 'format(value, spec)' function now just does:
> 
>    1) check for the 'repr' override, if present return repr(val)
>    2) call val.__format__(spec) if it exists
>    3) call str(val).__format__(spec)

The repr override is the same as in the above function, except in the above 
example any options after the 'r' would be interpreted by the string 
__format__ method.

Since there aren't any string-specific options yet... it can just be 
returned early as in #1 here, but if options are added to the string type, 
that could be changed to forward the format_spec to the string __format__ 
method.

Number two is the same also.

Number three could be the same...  Just put the __format__() in a 
try/except and call str(value) on the exception.

It sounds like we may be getting hung up on interpretation rather than a 
real difference.

> Note that this also means that float.__format__ will have to handle 'd' 
> and int.__format__ will handle 'f', and so on, although this can be done 
> by explicit type conversion in the __format__ method. (No need for float 
> to handle 'x' and the like, even though it does work with %-formatting 
> today.)

This happens in my example above in the case of 'g' and 'd' types 
specifiers, but I'm not sure when it happens in your description if no 
conversions are made?


>> One other feature might be to use the fill syntax form to specify an 
>> overflow replacement character...
>>
>>     '{0:10+10/#}'.format('Python')                 ->  'Python    '
>>
>>     '{0:10+10/#}'.format('To be, or not to be.')   ->  '##########'
> 
> Yeah, as Guido pointed out in another message that's not going to fly.

This one was just a see-if-it-flies suggestion.  It apparently didn't, 
unless a bunch of people all of a sudden say they have actual and valid use 
cases for it that make sense.

Sometimes you just have to punt and see what happens. ;-)


> A few minor points on syntax of the minilanguage:
> 
> -- I like your idea that :xxxx and ,yyyy can occur in any order.
 >
> -- I'm leaning towards the .Net conversion spec syntax convention where 
> the type letter comes first: ':f10'. The idea being that the first 
> letter changes the interpretation of subsequent letters.
 >
> Note that in the .Net case, the numeric quantity after the letter 
> represents a *precision* specifier, not a min/max field width.

I agree with these points of course.

> So for example, in .Net having a float field of minimum width 10 and a 
> decimal precision of 3 digits would be ':f3,10'.

It looks ok to me, but there may be some cases where it could be ambiguous. 
   How would you specify leading 0's?  Or would we do that in the alignment 
specifier?

     {0:f3,-10/0}    '000123.000'


> Now, as stated above, there's no 'max field width' for any data type 
> except strings. So in the case of strings, we can re-use the precision 
> specifier just like C printf does: ':s10' to limit the string to 10 
> characters. So 's:10,5' to indicate a max width of 10, min width of 5.

I'm sure you meant '{0:s10,5}' here.

What happens if the string is too long? Does it always cut the left side 
off? Or do we use '+', '-' and '^' here too?

> -- There's no decimal precision quantity for any data type except 
> floats. So ':d10' doesn't mean anything I think, but ':d,10' is minimum 
> 10 digits.

This is fine... The maximum value is optional, so this works in my examples 
as well.  If there's not enough cases where specifying a maximum width is 
useful I'm ok with not having it.

The reason I prefer it in the alignment side, is it applies to all cases 
equally.  A consistency I prefer, but maybe not one that's needed.

> -- I don't have an opinion yet on where the other stuff (sign options, 
> padding, alignment) should go, except that sign should go next to the 
> type letter, while the rest should go after the comma.

I think I agree here.


> -- For the 'repr' override, Guido suggests putting 'r' in the alignment 
> field: '{0,r}'. How that mixes with alignment and padding is unknown, 
> although frankly why anyone would want to pad and align a repr() is 
> completely beyond me.

Sometimes it's handy for formatting a variable repr output in columns. 
Mostly for debugging, learning exercises, or documentation purposes.

Since there is no actual Repr type, it may seem like it shouldn't be a type 
specifier. But if you consider it as an indirect string type, an abstract type 
that converts to string type, the idea and implementation works fine and it 
can then forward its type specifier to the string's __format__ method.  (or 
not)

The exact behavior can be flexible.

To me there is an underlying consistency with grouping abstract/indirect 
types with more concrete types rather than making an exception in the 
field alignment specifier.

Moving repr to the format side sort of breaks the original clean idea of 
having a field alignment specifier and separate type format specifiers.


I think if we continue to sort out the detail behaviors of the underlying 
implementation, the best overall solution will sort itself out.  Good and 
complete example test cases will help too.

I think we actually agree on quite a lot so far. :-)

Cheers,
    Ron

From martin at v.loewis.de  Sun Aug  5 14:37:15 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 05 Aug 2007 14:37:15 +0200
Subject: [Python-3000] C API cleanup str
In-Reply-To: <ee2a432c0708022331q768f3cbbu527aad3359ed9552@mail.gmail.com>
References: <ee2a432c0708022301m3dbf7a7tfe7378336fe0cefb@mail.gmail.com>	<46B2C8E0.8080409@canterbury.ac.nz>
	<ee2a432c0708022331q768f3cbbu527aad3359ed9552@mail.gmail.com>
Message-ID: <46B5C47B.5090703@v.loewis.de>

> Aside from the name, are there other issues you can think of with any
> of the API changes?  There are some small changes, things like macros
> only having a function form.  Are these a problem?
> 
> Str/unicode is going to be a big change.  Any thoughts there?

We need some rules on what the character set is on the C level.
E.g. if you do PyString_FromStringAndSize, is that ASCII, Latin-1,
UTF-8? Likewise, what is the encoding in PyArg_ParseTuple for s
and s# parameters?

Regards,
Martin

From ironfroggy at gmail.com  Sun Aug  5 15:20:33 2007
From: ironfroggy at gmail.com (Calvin Spealman)
Date: Sun, 5 Aug 2007 09:20:33 -0400
Subject: [Python-3000] map() Returns Iterator
In-Reply-To: <87r6mjj34f.fsf@hydra.hampton.thirdcreek.com>
References: <87r6mjj34f.fsf@hydra.hampton.thirdcreek.com>
Message-ID: <76fd5acf0708050620v25da595fi11a3e8d5f76306c1@mail.gmail.com>

I can't remember specifics, but I had always expected map and filter
to be replaced by their itertools counterparts.

On 8/4/07, Kurt B. Kaiser <kbk at shore.net> wrote:
> Although there has been quite a bit of discussion on dropping reduce()
> and retaining map(), filter(), and zip(), there has been less discussion
> (at least that I can find) on changing them to return iterators instead
> of lists.
>
> I think of map() and filter() as sequence transformers.  To me, it's
> an unexpected semantic change that the result is no longer a list.
>
> In existing Lib/ code, it's twice as likely that the result of map()
> will be assigned than to use it as an iterator in a flow control
> statement.
>
> If the statistics on the usage of map() stay the same, 2/3 of the time
> the current implementation will require code like
>
>         foo = list(map(fcn, bar)).
>
> map() and filter() were retained primarily because they can produce
> more compact and readable code when used correctly.  Adding list() most
> of the time seems to diminish this benefit, especially when combined with
> a lambda as the first arg.
>
> There are a number of instances where map() is called for its side
> effect, e.g.
>
>         map(print, line_sequence)
>
> with the return result ignored.  In py3k this has caused many silent
> failures.  We've been weeding these out, and there are only a couple
> left, but there are no doubt many more in 3rd party code.
>
> The situation with filter() is similar, though it's not used purely
> for side effects.  zip() is infrequently used.  However, IMO for
> consistency they should all act the same way.
>
> I've seen GvR slides suggesting replacing map() et. al. with list
> comprehensions, but never with generator expressions.
>
> PEP 3100: "Make built-ins return an iterator where appropriate
> (e.g. range(), zip(), map(), filter(), etc.)"
>
> It makes sense for range() to return an iterator.  I have my doubts on
> map(), filter(), and zip().  Having them return iterators seems to
> be a premature optimization.  Could something be done in the ast phase
> of compilation instead?
>
>
>
>
>
>
> --
> KBK
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/ironfroggy%40gmail.com
>
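
For anyone following along, a short sketch of the silent failure described in the quoted message, as it behaves once map() returns an iterator: nothing runs until the result is consumed.

```python
calls = []
lines = ["a", "b", "c"]

# Called only for its side effect: with an iterator-returning map()
# this does nothing, and raises no error either.
pending = map(calls.append, lines)
calls_before = list(calls)

# Consuming the iterator is what actually triggers the calls.
for _ in pending:
    pass
calls_after = list(calls)

# When a list is wanted, it has to be requested explicitly.
upper = list(map(str.upper, lines))

print(calls_before, calls_after, upper)
```

A plain for loop is usually the clearer way to write the side-effect case.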


-- 
Read my blog! I depend on your acceptance of my opinion! I am interesting!
http://ironfroggy-code.blogspot.com/

From alan.mcintyre at gmail.com  Sun Aug  5 16:02:58 2007
From: alan.mcintyre at gmail.com (Alan McIntyre)
Date: Sun, 5 Aug 2007 10:02:58 -0400
Subject: [Python-3000] test_asyncore fails intermittently on Darwin
In-Reply-To: <5d44f72f0708041853m1bb0d005h9f1ff77103b9ebbe@mail.gmail.com>
References: <2cda2fc90707261505tdd9a0f1t861b5801c37ad11e@mail.gmail.com>
	<1d36917a0707261618oac94f20l98f464a2ab1edc4e@mail.gmail.com>
	<2cda2fc90707292338pff060c1i810737dcf6d5df54@mail.gmail.com>
	<2cda2fc90707292340k7eb11f2w82003e6f705438c3@mail.gmail.com>
	<46AE943D.1040105@canterbury.ac.nz>
	<5d44f72f0708041853m1bb0d005h9f1ff77103b9ebbe@mail.gmail.com>
Message-ID: <1d36917a0708050702n6b48594bn824bd97ea6622421@mail.gmail.com>

On 8/4/07, Jeffrey Yasskin <jyasskin at gmail.com> wrote:
> Well, regardless of the brokenness of the patch, I do get two
> different failures from this test on OSX. The first is caused by
> trying to socket.bind() a port that's already been bound recently:
<snip>
> That looks pretty easy to fix.

It was fixed in the trunk on July 28 as part of rev 56604, by letting
the OS assign the port (binding to port 0). I apologize if everybody
was expecting me to fix this in Python 3000; I thought the initial
complaint was in reference to 2.6. I'm working on test improvements
for 2.6, so I'm sort of fixated on the trunk at the moment. :)  I
wouldn't mind trying to roll my changes forward into Py3k after GSoC
is done if I have the time, though.
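
For reference, the port-0 approach looks roughly like this. It is a minimal sketch, not the actual change from rev 56604; bind_to_free_port is a name invented for the example.

```python
import socket

def bind_to_free_port(host="127.0.0.1"):
    """Bind a listening socket to a port chosen by the OS."""
    serv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    serv.bind((host, 0))            # port 0: let the OS pick a free port
    serv.listen(1)
    port = serv.getsockname()[1]    # ask which port was actually assigned
    return serv, port

serv, port = bind_to_free_port()
print("listening on port", port)
serv.close()
```

The client side of the test then connects to whatever port getsockname() reported, so no fixed PORT constant can collide with a recent run.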

Alan

From guido at python.org  Sun Aug  5 17:08:28 2007
From: guido at python.org (Guido van Rossum)
Date: Sun, 5 Aug 2007 08:08:28 -0700
Subject: [Python-3000] C API cleanup str
In-Reply-To: <46B5C47B.5090703@v.loewis.de>
References: <ee2a432c0708022301m3dbf7a7tfe7378336fe0cefb@mail.gmail.com>
	<46B2C8E0.8080409@canterbury.ac.nz>
	<ee2a432c0708022331q768f3cbbu527aad3359ed9552@mail.gmail.com>
	<46B5C47B.5090703@v.loewis.de>
Message-ID: <ca471dc20708050808o7dfcb8c4pe59acff12b57c8f9@mail.gmail.com>

On 8/5/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > Aside from the name, are there other issues you can think of with any
> > of the API changes?  There are some small changes, things like macros
> > only having a function form.  Are these a problem?
> >
> > Str/unicode is going to be a big change.  Any thoughts there?
>
> We need some rules on what the character set is on the C level.
> E.g. if you do PyString_FromStringAndSize, is that ASCII, Latin-1,
> UTF-8? Likewise, what is the encoding in PyArg_ParseTuple for s
> and s# parameters?

IMO at the C level all conversions between bytes and Unicode that
don't specify a conversion should use UTF-8. That's what most of the
changes made so far do.

An exception should be made for stuff that explicitly handles
filenames; there the filesystem encoding should obviously be used.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at v.loewis.de  Sun Aug  5 17:48:06 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 05 Aug 2007 17:48:06 +0200
Subject: [Python-3000] C API cleanup str
In-Reply-To: <ca471dc20708050808o7dfcb8c4pe59acff12b57c8f9@mail.gmail.com>
References: <ee2a432c0708022301m3dbf7a7tfe7378336fe0cefb@mail.gmail.com>	
	<46B2C8E0.8080409@canterbury.ac.nz>	
	<ee2a432c0708022331q768f3cbbu527aad3359ed9552@mail.gmail.com>	
	<46B5C47B.5090703@v.loewis.de>
	<ca471dc20708050808o7dfcb8c4pe59acff12b57c8f9@mail.gmail.com>
Message-ID: <46B5F136.4010502@v.loewis.de>

> IMO at the C level all conversions between bytes and Unicode that
> don't specify a conversion should use UTF-8. That's what most of the
> changes made so far do.

I agree. We should specify that somewhere, so we have a recorded
guideline to use in case of doubt.

One function that misbehaves under this spec is
PyUnicode_FromString[AndSize], which assumes the input is Latin-1
(i.e. it performs a codepoint-per-codepoint conversion).

As a consequence, this now can fail because of encoding errors
(which it previously couldn't).
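
The difference is easy to see from the Python level. The sketch below uses
bytes.decode as a stand-in for the C functions (an assumption made for
illustration; it does not call PyUnicode_FromString itself): a Latin-1
decode maps every byte to a code point and cannot fail, while a UTF-8 decode
rejects byte sequences that are not well-formed.

```python
raw = b"caf\xe9"   # "cafe" with an e-acute, encoded as Latin-1

# Latin-1: one byte per code point, so every byte string decodes.
latin1_text = raw.decode("latin-1")
all_bytes_ok = len(bytes(range(256)).decode("latin-1")) == 256

# UTF-8: a lone 0xE9 byte is an incomplete sequence, so this raises.
try:
    raw.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Pure ASCII input is valid under both interpretations.
ascii_text = b"cafe".decode("utf-8")
```

Callers that only ever pass ASCII are unaffected by the switch, which is the
case suspected to be the common one.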

> An exception should be made for stuff that explicitly handles
> filenames; there the filesystem encoding should obviously be used.

In most cases, this still follows the rule, as the filename encoding
is specified explicitly. I agree this should also be specified, in
particular when the import code gets fixed (where strings typically
denote file names).

Regards,
Martin

From guido at python.org  Sun Aug  5 17:59:38 2007
From: guido at python.org (Guido van Rossum)
Date: Sun, 5 Aug 2007 08:59:38 -0700
Subject: [Python-3000] C API cleanup str
In-Reply-To: <46B5F136.4010502@v.loewis.de>
References: <ee2a432c0708022301m3dbf7a7tfe7378336fe0cefb@mail.gmail.com>
	<46B2C8E0.8080409@canterbury.ac.nz>
	<ee2a432c0708022331q768f3cbbu527aad3359ed9552@mail.gmail.com>
	<46B5C47B.5090703@v.loewis.de>
	<ca471dc20708050808o7dfcb8c4pe59acff12b57c8f9@mail.gmail.com>
	<46B5F136.4010502@v.loewis.de>
Message-ID: <ca471dc20708050859g3e3178d7ncfc6209e9ea05708@mail.gmail.com>

On 8/5/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > IMO at the C level all conversions between bytes and Unicode that
> > don't specify a conversion should use UTF-8. That's what most of the
> > changes made so far do.
>
> I agree. We should specify that somewhere, so we have a recorded
> guideline to use in case of doubt.

But where? Time to start a PEP for the C API perhaps?

> One function that misbehaves under this spec is
> PyUnicode_FromString[AndSize], which assumes the input is Latin-1
> (i.e. it performs a codepoint-per-codepoint conversion).

Ouch.

> As a consequence, this now can fail because of encoding errors
> (which it previously couldn't).

You mean if it were fixed it could fail, right? Code calling it should
be checking for errors anyway because it allocates memory.

Have you tried making this particular change and seeing what fails?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at v.loewis.de  Sun Aug  5 18:25:53 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 05 Aug 2007 18:25:53 +0200
Subject: [Python-3000] C API cleanup str
In-Reply-To: <ca471dc20708050859g3e3178d7ncfc6209e9ea05708@mail.gmail.com>
References: <ee2a432c0708022301m3dbf7a7tfe7378336fe0cefb@mail.gmail.com>	
	<46B2C8E0.8080409@canterbury.ac.nz>	
	<ee2a432c0708022331q768f3cbbu527aad3359ed9552@mail.gmail.com>	
	<46B5C47B.5090703@v.loewis.de>	
	<ca471dc20708050808o7dfcb8c4pe59acff12b57c8f9@mail.gmail.com>	
	<46B5F136.4010502@v.loewis.de>
	<ca471dc20708050859g3e3178d7ncfc6209e9ea05708@mail.gmail.com>
Message-ID: <46B5FA11.5040404@v.loewis.de>

>> I agree. We should specify that somewhere, so we have a recorded
>> guideline to use in case of doubt.
> 
> But where? Time to start a PEP for the C API perhaps?

I would put it into the API documentation. We can put a daily-generated
version of the documentation online, just as the trunk documentation is
updated daily.

IMO, a PEP is necessary only for disputed cases. As the C API seems to
get few if any disputes, we just need to record the decisions made.

>> As a consequence, this now can fail because of encoding errors
>> (which it previously couldn't).
> 
> You mean if it were fixed it could fail, right?

Right.

> Have you tried making this particular change and seeing what fails?

No. I suspect most callers pass ASCII, so they should be fine. In
the cases where it really fails, the caller likely meant to create
bytes.

Regards,
Martin


From adam at hupp.org  Sun Aug  5 18:31:32 2007
From: adam at hupp.org (Adam Hupp)
Date: Sun, 5 Aug 2007 11:31:32 -0500
Subject: [Python-3000] py3k conversion docs?
In-Reply-To: <18100.59990.335150.692487@montanaro.dyndns.org>
References: <18100.59990.335150.692487@montanaro.dyndns.org>
Message-ID: <20070805163132.GA10277@mouth.upl.cs.wisc.edu>

On Sat, Aug 04, 2007 at 04:06:30PM -0500, skip at pobox.com wrote:
> I'm looking at the recently submitted patch for the csv module and am
> scratching my head a bit trying to understand the code transformations.
> I've not looked at any py3k code yet, so this is all new to me.  Is there
> any documentation about the Py3k conversion?  I'm particularly interested in
> the string->unicode conversion.
> 
> Here's one confusing conversion.  I see PyString_FromStringAndSize replaced
> by PyUnicode_FromUnicode.  


In that case the type of ReaderObj.field has changed from char* to
Py_UNICODE*.  _FromUnicode should be analogous to the
_FromStringAndSize call here.

> In another place I see PyString_FromString replaced by
> PyUnicode_DecodeASCII.  In some places I see a char left alone.  In
> other places I see it replaced by Py_UNICODE.

Actually, I missed one spot that should use Py_UNICODE instead of
char.  get_nullchar_as_None should be taking a Py_UNICODE instead of a
char, and PyUnicode_DecodeASCII should really be a call to
_FromUnicode.

I'll say though that I'm not positive this patch is the Right Way to
do the conversion.  Review by someone who knows would be appreciated.


-- 
Adam Hupp | http://hupp.org/adam/


From talin at acm.org  Sun Aug  5 18:33:29 2007
From: talin at acm.org (Talin)
Date: Sun, 05 Aug 2007 09:33:29 -0700
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B59F05.3070200@ronadam.com>
References: <46B13ADE.7080901@acm.org>	<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>	<ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>	<46B2D147.90606@acm.org>	<46B2E265.5080905@ronadam.com>	<46B35295.1030007@acm.org>	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>	<46B39BBF.80809@ronadam.com>	<46B46DE1.1090403@canterbury.ac.nz>	<46B4950F.40905@ronadam.com>
	<46B4A9A0.9070206@ronadam.com> <46B54F51.40705@acm.org>
	<46B59F05.3070200@ronadam.com>
Message-ID: <46B5FBD9.4020301@acm.org>

Ron Adam wrote:
> Talin wrote:
>> Another thing I want to point out is that Guido and I (in a private 
>> discussion) have resolved our argument about the role of __format__. 
>> Well, not so much *agreed* I guess, more like I capitulated.
> 
> Refer to the message in this thread where I discuss the difference 
> between concrete and abstract format specifiers.  I think this is 
> basically where you and Guido are differing on these issues.  I got the 
> impression you prefer the more abstract interpretation and Guido prefers 
> a more traditional interpretation.  We can have both as long as they are 
> well defined and documented as being one or the other.  It's when we try 
> to make one format specifier have both qualities at different times that 
> it gets messy.
> 
> 
> Here's how the apply_format function could look, we may not be in as 
> much disagreement as you think.
> 
> def apply_format(value, format_spec):
>     abstract = False
>     type = format_spec[0]
>     if type in 'rtgd':
>         abstract = True
>         if format_spec[0] == 'r':      # abstract repr
>             value = repr(value)
>         elif format_spec[0] == 't':    # abstract text
>             value = str(value)
>         elif format_spec[0] == 'g':    # abstract float
>             value = float(value)
>         else:                          # 'd': abstract int
>             value = int(value)
>     return value.__format__(format_spec, abstract)
> 
> The above abstract types use duck typing to convert to concrete types 
> before calling the returned type's __format__ method. There aren't that 
> many abstract types needed.  We only need a few to cover the most common 
> cases.
> 
> That's it.  It's up to each type's __format__ method to figure out things 
> from there.  They can look at the original type spec passed to them and 
> handle special cases if need be.

Let me define some terms again for the discussion. As noted before, the 
',' part is called the alignment specifier. It's no longer appropriate 
to use the term 'conversion specifier', since we're not doing 
conversions, so I guess I will stick with the term 'format specifier' 
for the ':' part.

What Guido wants is for the general 'apply_format' function to not 
examine the format specifier *at all*.

The reason is that for some types, the __format__ method can define its 
own interpretation of the format string which may include the letters 
'rtgd' as part of its regular syntax. Basically, he wants no constraints 
on what __format__ is allowed to do.

Given this constraint, it becomes pretty obvious which attributes go in 
which part. Attributes which are actually involved in generating the 
text (signs and leading digits) would have to go in the 
format_specifier, and attributes which are interpreted by 
apply_format (such as left/right alignment) would have to go in the 
alignment specifier.

Of course, the two can't be entirely isolated because there is 
interaction between the two specifiers for some types. For example, it 
would normally be the case that padding is applied by 'apply_format', 
which knows about the field width and the padding character. However, in 
the case of an integer that is printed with leading zeros, the sign must 
come *before* the padding: '+000000010'. It's not sufficient to simply 
apply padding blindly to the output of __format__, which would give you 
'000000+10'.
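
A small sketch of the problem, for illustration only. The two helper names
are hypothetical and are not part of the PEP; the point is just that the
padding step has to know about the sign if it runs after __format__.

```python
def pad_blindly(text, width, fill="0"):
    """Pad on the left without looking at the text at all."""
    return text.rjust(width, fill)


def pad_sign_aware(text, width, fill="0"):
    """Keep a leading sign in front and pad between sign and digits."""
    sign = ""
    if text and text[0] in "+-":
        sign, text = text[0], text[1:]
    return sign + text.rjust(width - len(sign), fill)


formatted = "+10"                       # what an int __format__ might return
blind = pad_blindly(formatted, 10)      # zeros end up in front of the sign
aware = pad_sign_aware(formatted, 10)   # sign stays in front of the zeros
print(blind, aware)
```

The second helper only works because it inspects the formatted text, which
is exactly the kind of coupling between the two specifiers described above.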

(Maybe leading zeros and padding are different things? But the 
__format__ would still need to know the field width, which is usually 
part of the alignment spec, since it's usually applied as a 
post-processing step by 'apply_format')

> If the abstract flag is False and the format_spec type doesn't match the 
> type of the __format__ methods class, then an exception can be raised. 
> This offers a wider range of strictness/leniency to string formatting. 
> There are cases where you may want either.
> 
> 
>> But in any case, the deal is that int, float, and decimal all get to 
>> have a __format__ method which interprets the format string for those 
>> types.
> 
> Good, +1
> 
>> There is no longer any automatic coercion of types based on the format 
>> string
> 
> Ever?  This seems to contradict below where you say int needs to handle 
> float, and float needs to handle int.  Can you explain further?

What I mean is that a float, upon receiving a format specifier of 'd', 
needs to print the number so that it 'looks like' an integer. It doesn't 
actually have to convert it to an int. So 'd' in this case is just a 
synonym for 'f0'.

>> - so simply defining an __int__ method for a type is insufficient if 
>> you want to use the 'd' format type. Instead, if you want to use 'd' 
>> you can simply write the following:
>>
>>    class MyClass:
>>       def __format__(self, spec):
>>          return int(self).__format__(spec)
> 
> 
> So if an item has an __int__ method, but not a __format__ method, and 
> you tried to print it with a 'd' format type, it would raise an exception?
> 
>  From your descriptions elsewhere in this reply it sounds like it would 
> fall back to string output.  Or am I missing something?

Yes, we have to have some sort of fallback if there's no __format__ 
method at all. My thought here is to coerce to str() in this case.
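
To make the delegation pattern concrete, here is a sketch that runs on a
current Python 3 interpreter. The Celsius class is hypothetical, and the
spec strings use the format mini-language as it exists today, which is an
assumption relative to the syntax still being debated in this thread.

```python
class Celsius:
    """Hypothetical value type that wants to support the 'd' format type."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __int__(self):
        return round(self.degrees)

    def __format__(self, spec):
        # Defining __int__ alone is not enough; hand the spec to int.
        return format(int(self), spec)


reading = Celsius(21.6)
plain = "{0:d}".format(reading)
padded = "{0:+05d}".format(reading)
print(plain, padded)
```

Without the __format__ method the 'd' spec never reaches int at all, which
is why some fallback rule is needed for types that define no __format__.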

>> So for example, in .Net having a float field of minimum width 10 and a 
>> decimal precision of 3 digits would be ':f3,10'.
> 
> It looks ok to me, but there may be some cases where it could be 
> ambiguous.   How would you specify leading 0's.  Or would we do that in 
> the alignment specifier?
> 
>     {0:f3,-10/0}    '000123.000'

I'm not sure. This is the one case where the two specifiers interact, as 
I mentioned above.

>> Now, as stated above, there's no 'max field width' for any data type 
>> except strings. So in the case of strings, we can re-use the precision 
>> specifier just like C printf does: ':s10' to limit the string to 10 
>> characters. So 's:10,5' to indicate a max width of 10, min width of 5.
> 
> I'm sure you meant '{0:s10,5}' here.

Right.

>> -- For the 'repr' override, Guido suggests putting 'r' in the 
>> alignment field: '{0,r}'. How that mixes with alignment and padding is 
>> unknown, although frankly why anyone would want to pad and align a 
>> repr() is completely beyond me.
> 
> Sometimes it's handy for formatting a variable repr output in columns. 
> Mostly for debugging, learning exercises, or documentation purposes.
> 
> Since there is no actual Repr type, it may seem like it shouldn't be a 
> type specifier. But if you consider it as indirect string type, an 
> abstract type that converts to string type, the idea and implementation 
> works fine and it can then forward its type specifier to the string's 
> __format__ method.  (or not)
> 
> The exact behavior can be flexible.
> 
> To me there is an underlying consistency with grouping abstract/indirect 
> types with more concrete types rather than making an exception in the 
> field alignment specifier.
> 
> Moving repr to the format side sort of breaks the original clean idea of 
> having a field alignment specifier and separate type format specifiers.

The reason for this is because of the constraint that apply_format never 
looks at the format specifier, so overrides for repr() can only go in 
the thing that it does look at - the alignment spec.

> I think if we continue to sort out the detail behaviors of the 
> underlying implementation, the best overall solution will sort itself 
> out.  Good and complete example test cases will help too.
> 
> I think we actually agree on quite a lot so far. :-)

Me too.

> Cheers,
>    Ron
> 

From rhamph at gmail.com  Sun Aug  5 20:05:48 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Sun, 5 Aug 2007 12:05:48 -0600
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B5FBD9.4020301@acm.org>
References: <46B13ADE.7080901@acm.org> <46B35295.1030007@acm.org>
	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>
	<46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz>
	<46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com>
	<46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com>
	<46B5FBD9.4020301@acm.org>
Message-ID: <aac2c7cb0708051105g2e82f62dye2042528ca17869f@mail.gmail.com>

On 8/5/07, Talin <talin at acm.org> wrote:
> Ron Adam wrote:
> > To me there is an underlying consistency with grouping abstract/indirect
> > types with more concrete types rather than making an exception in the
> > field alignment specifier.
> >
> > Moving repr to the format side sort of breaks the original clean idea of
> > having a field alignment specifier and separate type format specifiers.
>
> The reason for this is because of the constraint that apply_format never
> looks at the format specifier, so overrides for repr() can only go in
> the thing that it does look at - the alignment spec.

How important is this constraint?  In my proposal, apply_format (which
I called handle_format, alas) immediately called __format__.  Only if
__format__ didn't exist or it returned NotImplemented would it check
what type was expected and attempt a coercion (__float__, __index__,
etc), then calling __format__ on that.
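
Roughly, as a sketch and not the actual proposal text: the COERCIONS table
and the rule of reading the expected type from the first character of the
spec are assumptions made for illustration, and the spec strings are kept to
single letters that a current Python 3 interpreter also accepts.

```python
import operator

# Hypothetical table: first letter of the spec -> coercion to try.
COERCIONS = {"d": operator.index, "f": float, "s": str}


def handle_format(value, spec):
    """Try the value's own __format__ first, then fall back to coercion."""
    fmt = type(value).__format__
    if fmt is not object.__format__:       # the type defines its own hook
        result = fmt(value, spec)
        if result is not NotImplemented:
            return result
    coerce = COERCIONS.get(spec[:1])
    if coerce is None:
        raise ValueError("no fallback for format spec %r" % (spec,))
    return format(coerce(value), spec)


class Slot:
    """Has __index__ but no __format__ of its own."""
    def __init__(self, n):
        self.n = n
    def __index__(self):
        return self.n


class Picky:
    """Has __format__, but declines every spec."""
    def __format__(self, spec):
        return NotImplemented
    def __float__(self):
        return 2.5


print(handle_format(42, "d"), handle_format(Slot(7), "d"),
      handle_format(Picky(), "f"))
```

With this ordering a type keeps full control over its own spec syntax, and
coercion only happens when the type has nothing to say.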

-- 
Adam Olsen, aka Rhamphoryncus

From rrr at ronadam.com  Sun Aug  5 21:41:59 2007
From: rrr at ronadam.com (Ron Adam)
Date: Sun, 05 Aug 2007 14:41:59 -0500
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B5FBD9.4020301@acm.org>
References: <46B13ADE.7080901@acm.org>	<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>	<ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>	<46B2D147.90606@acm.org>	<46B2E265.5080905@ronadam.com>	<46B35295.1030007@acm.org>	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>	<46B39BBF.80809@ronadam.com>	<46B46DE1.1090403@canterbury.ac.nz>	<46B4950F.40905@ronadam.com>
	<46B4A9A0.9070206@ronadam.com> <46B54F51.40705@acm.org>
	<46B59F05.3070200@ronadam.com> <46B5FBD9.4020301@acm.org>
Message-ID: <46B62807.4030106@ronadam.com>



Talin wrote:
> Ron Adam wrote:
>> Talin wrote:

> Let me define some terms again for the discussion. As noted before, the 
> ',' part is called the alignment specifier. It's no longer appropriate 
> to use the term 'conversion specifier', since we're not doing 
> conversions, so I guess I will stick with the term 'format specifier' 
> for the ':' part.

I don't consider them as conversions, it's all going to end up as either a 
string or an exception at the end.  It's just a matter of the best way to 
get there.

The only case where a conversion of any type *doesn't* happen is when a 
value is already a string and a string specifier is applied to it or there 
is no format specifier.  In almost all other cases, some sort of converting 
process occurs, although it may be a manual reading of characters or bytes 
and not an explicit type cast.  And in those cases, it's more a matter of 
when it happens rather than how it happens that is important.

Also this is a one directional data path.  The process should never have 
side effects that may affect an object that is passed into a formatter. 
This isn't enforceable, but Python's built-in mechanisms should never do 
that.  Creating new objects in an intermediate step doesn't do that.


> What Guido wants is for the general 'apply_format' function to not 
> examine the format specifier *at all*.

Hmmm...  With this, it becomes much harder to determine what a format 
specifier will do because it depends totally on the object's __format__ 
method implementation.  So the behavior of a specific format specifier may 
change depending on the argument object type.

It also makes the __format__ methods much more complex because you need to 
have them know how to handle a wider variety of possibilities.

What will the built-in types' __format__ methods do if they get a specifier 
they don't know how to handle?  Raise an exception, or fall back to str, or 
repr?


> The reason is that for some types, the __format__ method can define its 
> own interpretation of the format string which may include the letters 
> 'rtgd' as part of its regular syntax. Basically, he wants no constraints 
> on what __format__ is allowed to do.

You suggested the format specification be interpreted like a mini language. 
  That implies there may be a global format interpreter that an object's 
__format__ method can call.

Such an interpreter would know how to handle the built in types and be 
extendable. Or we could supply a __format__ method to change the behavior 
if we want something else.  In effect, it moves any 
conversions/interpretations that may happen later in the event chain.

Is this the direction he wants to go in?

Or does he want each built-in object to have its own __format__ method 
independent from each other?


> Given this constraint, it becomes pretty obvious which attributes go in 
> which part. Attributes which are actually involved in generating the 
> text (signs and leading digits) would have to go in the 
> format_specifier, and attributes which are interpreted by 
> apply_format (such as left/right alignment) would have to go in the 
> alignment specifier.
> 
> Of course, the two can't be entirely isolated because there is 
> interaction between the two specifiers for some types. For example, it 
> would normally be the case that padding is applied by 'apply_format', 
> which knows about the field width and the padding character. However, in 
> the case of an integer that is printed with leading zeros, the sign must 
> come *before* the padding: '+000000010'. It's not sufficient to simply 
> apply padding blindly to the output of __format__, which would give you 
> '000000+10'.
> 
> (Maybe leading zeros and padding are different things? But the 
> __format__ would still need to know the field width, which is usually 
> part of the alignment spec, since it's usually applied as a 
> post-processing step by 'apply_format')

It is different.  That is why earlier I made the distinction between a 
numeric width and a field width.  This would be a numeric width, and it 
would be inside a field which may have its own minimum width and possibly 
a different fill character.

    '{0:d+6/0,^15/_}'.format(123)  ->      '____+000123____'

This way, the two terms don't have to know about each other.
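
The same two-step idea can be tried on a current Python 3 interpreter. The
spec strings below use today's format mini-language rather than the syntax
proposed above (that substitution is an assumption made so the sketch runs);
the number is formatted to its numeric width first, and the resulting text
is then placed in the field as a separate step.

```python
# Step 1: numeric width. Sign, then zero fill, 7 characters in total.
number_text = format(123, "+07d")

# Step 2: field width. Center the finished text in 15 columns of '_'.
field_text = format(number_text, "_^15")

print(repr(number_text), repr(field_text))
```

The second step treats its input as plain text, so it never needs to know
where the sign is.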

The same output in some cases can be generated in more than one way, but I 
don't think that is always a bad thing.  Trying to avoid that makes things 
more complex.


>>> There is no longer any automatic coercion of types based on the 
>>> format string
>>
>> Ever?  This seems to contradict below where you say int needs to 
>> handle float, and float needs to handle int.  Can you explain further?
> 
> What I mean is that a float, upon receiving a format specifier of 'd', 
> needs to print the number so that it 'looks like' an integer. It doesn't 
> actually have to convert it to an int. So 'd' in this case is just a 
> synonym for 'f0'.

I will think about this a bit.  It seems to me, the results are the same 
with more work.

What about rounding behaviors, isn't 'f0' different in that regard?
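
A quick check of the two paths on a current Python 3 interpreter, using
today's '.0f' spelling in place of 'f0' (an assumption, since the syntax
here is still a proposal). Coercing with int() truncates toward zero, while
formatting the float with zero decimals rounds.

```python
samples = [2.5, 2.7, -2.7, -0.4]
results = {}
for x in samples:
    as_int = format(int(x), "d")   # coerce first: truncates toward zero
    as_f0 = format(x, ".0f")       # format the float directly: rounds
    results[x] = (as_int, as_f0)
    print(x, as_int, as_f0)
```

So the two are not interchangeable: 2.7 prints as 2 on one path and 3 on the
other, and a small negative value keeps its sign on the float path.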


>>> - so simply defining an __int__ method for a type is insufficient if 
>>> you want to use the 'd' format type. Instead, if you want to use 'd' 
>>> you can simply write the following:
>>>
>>>    class MyClass:
>>>       def __format__(self, spec):
>>>          return int(self).__format__(spec)
>>
>>
>> So if an item has an __int__ method, but not a __format__ method, and 
>> you tried to print it with a 'd' format type, it would raise an 
>> exception?
>>
>>  From your descriptions elsewhere in this reply it sounds like it 
>> would fall back to string output.  Or am I missing something?
> 
> Yes, we have to have some sort of fallback if there's no __format__ 
> method at all. My thought here is to coerce to str() in this case.

Will a string have a __format__ method and if so, will the format specifier 
term be forwarded to the string's __format__ method in this case too?


>>> So for example, in .Net having a float field of minimum width 10 and 
>>> a decimal precision of 3 digits would be ':f3,10'.
>>
>> It looks ok to me, but there may be some cases where it could be 
>> ambiguous.   How would you specify leading 0's.  Or would we do that 
>> in the alignment specifier?
>>
>>     {0:f3,-10/0}    '000123.000'
> 
> I'm not sure. This is the one case where the two specifiers interact, as 
> I mentioned above.

Yes, that is why I asked about it. To avoid interaction you need for floats 
to have a 'numeric width'.  And to avoid ambiguities with the precision 
term you need the '.'.

      {0:f+6/0.3}       '-000123.000'
      {0:f+6.3}         '+   456.000'
      {0:f6}            '   789.0'
      {0:f.3}           '42.000'



>> To me there is an underlying consistency with grouping 
>> abstract/indirect types with more concrete types rather than making 
>> an exception in the field alignment specifier.
>>
>> Moving repr to the format side sort of breaks the original clean idea 
>> of having a field alignment specifier and separate type format 
>> specifiers.
> 
> The reason for this is because of the constraint that apply_format never 
> looks at the format specifier, so overrides for repr() can only go in 
> the thing that it does look at - the alignment spec.

Ok. But I'm -1 on this for the record.   It creates an exceptional case. 
ie... the format is applied first, except if the alignment term has an 'r' 
in it.

Then what happens to the format specifier term if it exists?  Is it 
forwarded to the string __format__ method here?, ignored?, or is an 
exception raised?


I'm going to think about these issues some more. Maybe I'll change my mind 
  or find another way to 'see' this.

Cheers,
    Ron


From martin at v.loewis.de  Sun Aug  5 22:32:16 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 05 Aug 2007 22:32:16 +0200
Subject: [Python-3000] C API cleanup str
In-Reply-To: <ca471dc20708050859g3e3178d7ncfc6209e9ea05708@mail.gmail.com>
References: <ee2a432c0708022301m3dbf7a7tfe7378336fe0cefb@mail.gmail.com>	
	<46B2C8E0.8080409@canterbury.ac.nz>	
	<ee2a432c0708022331q768f3cbbu527aad3359ed9552@mail.gmail.com>	
	<46B5C47B.5090703@v.loewis.de>	
	<ca471dc20708050808o7dfcb8c4pe59acff12b57c8f9@mail.gmail.com>	
	<46B5F136.4010502@v.loewis.de>
	<ca471dc20708050859g3e3178d7ncfc6209e9ea05708@mail.gmail.com>
Message-ID: <46B633D0.7050902@v.loewis.de>

> You mean if it were fixed it could fail, right? Code calling it should
> be checking for errors anyway because it allocates memory.
> 
> Have you tried making this particular change and seeing what fails?

I now tried, and it turned out that bytes.__reduce__ would break
(again); I fixed it and changed it in r56755.

It turned out that PyUnicode_FromString was even documented to
accept latin-1.

While I was looking at it, I wondered why PyUnicode_FromStringAndSize
allows a NULL first argument, creating a null-initialized Unicode
object. This functionality is already available as
PyUnicode_FromUnicode, and callers who previously wrote

   obuf = PyString_FromStringAndSize(NULL, bufsize);
   if (!obuf) return NULL;
   buf = PyString_AsString(obuf);

could be tricked into believing that they now can change the
string object they just created - which they cannot, as
buf will just be the UTF-8 encoded version of the real string.

Regards,
Martin

From martin at v.loewis.de  Sun Aug  5 22:49:33 2007
From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 05 Aug 2007 22:49:33 +0200
Subject: [Python-3000] Immutable bytes type and dbm modules
Message-ID: <46B637DD.7070905@v.loewis.de>

I changed bsddb so that it consistently produces and
consumes bytes only, and added convenience wrappers
StringKeys and StringValues for people whose databases
are known to store only strings as either keys or
values; those get UTF-8 encoded.
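
The idea behind the key wrapper, as a sketch only. This is not the code
checked into bsddb; the class body below is an assumption that shows the
shape of such a wrapper over any mapping with bytes keys, with a plain dict
standing in for the database.

```python
from collections.abc import MutableMapping


class StringKeys(MutableMapping):
    """Present a bytes-keyed mapping as a str-keyed one, via UTF-8."""

    def __init__(self, db):
        self._db = db

    def __getitem__(self, key):
        return self._db[key.encode("utf-8")]

    def __setitem__(self, key, value):
        self._db[key.encode("utf-8")] = value

    def __delitem__(self, key):
        del self._db[key.encode("utf-8")]

    def __iter__(self):
        return (raw.decode("utf-8") for raw in self._db)

    def __len__(self):
        return len(self._db)


store = {}                      # stand-in for a database with bytes keys
view = StringKeys(store)
view["L\u00f6wis"] = b"\x01\x02"
print(list(store), list(view))
```

The underlying store only ever sees bytes, so the database layer itself
stays free of any encoding decisions.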

While I could fix test_bsddb with these changes,
anydbm and whichdb broke, as they expect to use
string keys. Changing them to use bytes keys then
broke dumbdbm, which uses a dictionary internally
for the index.

This brings me to join others in the desire for
immutable bytes objects: I think such a type is
needed, and it should probably use the same
hash algorithm as str8.

I don't think it needs to be a separate type,
instead, bytes objects could have an idempotent
.freeze() method which switches the "immutable"
bit on. There would be no way to switch it off
again.
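
The underlying problem is that a mutable buffer cannot safely be a
dictionary key. The sketch below shows this on a current Python 3
interpreter, where the mutable type is spelled bytearray and the immutable
one bytes; that naming postdates this message, and there is no .freeze()
method involved, so read it only as an illustration of why an immutable,
hashable form is needed.

```python
mutable = bytearray(b"key")

# A mutable buffer is unhashable, so it cannot index a dict.
try:
    {mutable: 1}
    hashable = True
except TypeError:
    hashable = False

# An immutable snapshot of the same data can.
frozen = bytes(mutable)
index = {frozen: 1}

# Later changes to the buffer do not disturb the stored key.
mutable[0] = ord("K")
print(hashable, index, mutable)
```

A dictionary-based index such as the one in dumbdbm needs keys that behave
like the snapshot, not like the buffer.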

If that is not acceptable, please tell me how else
to fix the dbm modules.

Regards,
Martin

From fdrake at acm.org  Sun Aug  5 23:04:39 2007
From: fdrake at acm.org (Fred Drake)
Date: Sun, 5 Aug 2007 17:04:39 -0400
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46B637DD.7070905@v.loewis.de>
References: <46B637DD.7070905@v.loewis.de>
Message-ID: <393F0424-E74F-4E27-9AFB-45EC70704A56@acm.org>

On Aug 5, 2007, at 4:49 PM, Martin v. Löwis wrote:
> I don't think it needs to be a separate type,
> instead, bytes objects could have an idempotent
> .freeze() method which switches the "immutable"
> bit on. There would be no way to switch it off
> again.

+1


   -Fred

-- 
Fred Drake   <fdrake at acm.org>




From greg.ewing at canterbury.ac.nz  Mon Aug  6 02:10:20 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 06 Aug 2007 12:10:20 +1200
Subject: [Python-3000] More PEP 3101 changes incoming
Message-ID: <46B666EC.2090807@canterbury.ac.nz>

Guido wrote:
> I remember a language that did the *** thing; it was called Fortran.
> It was an absolutely terrible feature.

I agree that expanding the field width is much to be
preferred if possible. But if you *must* have a maximum
field width, it's better to show no number at all than
a number with some of its digits invisibly chopped
off.
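
To illustrate the three behaviours being compared, here is a sketch with
hypothetical helper names (none of them are proposed API): expanding the
field, filling it with asterisks on overflow, and silently chopping digits.

```python
def format_expanding(number, min_width):
    """Preferred: treat the width as a minimum and let the field grow."""
    return str(number).rjust(min_width)


def format_fixed(number, max_width):
    """Fortran style: if the number does not fit, show no number at all."""
    text = str(number)
    if len(text) > max_width:
        return "*" * max_width
    return text.rjust(max_width)


def format_chopped(number, max_width):
    """The dangerous one: digits vanish and the result looks valid."""
    return str(number).rjust(max_width)[:max_width]


value = 1234567
print(format_expanding(value, 5), format_fixed(value, 5),
      format_chopped(value, 5))
```

Only the last variant produces output that looks like a perfectly good
number while being wrong by two orders of magnitude.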

 > So I think a maximum width is quite unnecessary for numbers.

I agree with that.

By the way, I've always thought the reason C has maximum
widths for string formats is so you can deal with strings
which are not null-terminated, an issue that doesn't
arise in Python.

--
Greg


From greg.ewing at canterbury.ac.nz  Mon Aug  6 02:12:21 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 06 Aug 2007 12:12:21 +1200
Subject: [Python-3000] atexit module problems/questions
In-Reply-To: <18101.15137.718715.98755@montanaro.dyndns.org>
References: <18100.62086.177289.274444@montanaro.dyndns.org>
	<46B526FE.4060500@canterbury.ac.nz>
	<18101.15137.718715.98755@montanaro.dyndns.org>
Message-ID: <46B66765.7060406@canterbury.ac.nz>

skip at pobox.com wrote:
> Then you need to hang onto the closure.  That might be some distance away
> from the point at which the function was registered.

Well, if you get a unique ID, you need to hang onto that
somehow, too.

--
Greg

From greg.ewing at canterbury.ac.nz  Mon Aug  6 02:24:20 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 06 Aug 2007 12:24:20 +1200
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B54F51.40705@acm.org>
References: <46B13ADE.7080901@acm.org>
	<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>
	<ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>
	<46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com>
	<46B35295.1030007@acm.org>
	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>
	<46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz>
	<46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com>
	<46B54F51.40705@acm.org>
Message-ID: <46B66A34.4080202@canterbury.ac.nz>

Talin wrote:
> So 's:10,5' to indicate a max width of 10, min width of 5.

If you just say

   ':s10'

does this mean there's *no* minimum width, or that the
minimum width is also 10?

The former would be somewhat unintuitive, but if the latter,
then the separation between format and width specifiers
breaks down.
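
For comparison, here is how the printf-like reading behaves, using the
format mini-language of a current Python 3 interpreter as a stand-in (an
assumption; the ':s10' syntax above is a different, still-open proposal). In
that reading the maximum is a precision and the minimum is a width, and
giving only the maximum implies no minimum at all.

```python
long_text = "abcdefghijklmnop"
short_text = "ab"

max_only_long = format(long_text, ".10")     # truncated to 10 characters
max_only_short = format(short_text, ".10")   # no minimum: stays 2 wide
both_short = format(short_text, "5.10")      # minimum 5: padded on the right
both_long = format(long_text, "5.10")        # maximum still applies

print(repr(max_only_long), repr(max_only_short),
      repr(both_short), repr(both_long))
```

That corresponds to the first of the two readings, the one described above
as somewhat unintuitive.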

--
Greg

From greg.ewing at canterbury.ac.nz  Mon Aug  6 02:31:28 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 06 Aug 2007 12:31:28 +1200
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B568F3.9060105@ronadam.com>
References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org>
	<46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org>
	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>
	<46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz>
	<46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com>
	<46B52422.2090006@canterbury.ac.nz>
	<ca471dc20708041909j3f86b107hace9407673c302ea@mail.gmail.com>
	<46B568F3.9060105@ronadam.com>
Message-ID: <46B66BE0.7090005@canterbury.ac.nz>

Ron Adam wrote:
> 
> Truncating behavior is 
> explicitly specified by giving a max_width size without a replacement 
> character.

I think that would be an extremely bad default for numbers.

 > I believe they will learn fairly quickly what not to do.

Even if that's true, why make the behaviour that's desirable
in the vast majority of cases more difficult to specify?

--
Greg

From greg.ewing at canterbury.ac.nz  Mon Aug  6 02:42:38 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 06 Aug 2007 12:42:38 +1200
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B59F05.3070200@ronadam.com>
References: <46B13ADE.7080901@acm.org>
	<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>
	<ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>
	<46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com>
	<46B35295.1030007@acm.org>
	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>
	<46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz>
	<46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com>
	<46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com>
Message-ID: <46B66E7E.4060209@canterbury.ac.nz>

Ron Adam wrote:
>      return value.__format__(format_spec, abstract)

Why would the __format__ method need to be passed an
'abstract' flag? It can tell from the format_spec if
it needs to know.

--
Greg

From greg.ewing at canterbury.ac.nz  Mon Aug  6 03:08:46 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 06 Aug 2007 13:08:46 +1200
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B5FBD9.4020301@acm.org>
References: <46B13ADE.7080901@acm.org>
	<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>
	<ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>
	<46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com>
	<46B35295.1030007@acm.org>
	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>
	<46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz>
	<46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com>
	<46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com>
	<46B5FBD9.4020301@acm.org>
Message-ID: <46B6749E.9020304@canterbury.ac.nz>

Talin wrote:
> in 
> the case of an integer that is printed with leading zeros, the sign must 
> come *before* the padding: '+000000010'. It's not sufficient to simply 
> apply padding blindly to the output of __format__, which would give you 
> '000000+10'.

How about this, then: The apply_format function parses
the alignment spec and passes the result to the __format__
method along with the format spec. The __format__ method
can then choose to do its own alignment and padding to
achieve the specified field width. If it returns something
less than the specified width, apply_format then uses the
default alignment algorithm.

Then the __format__ method has complete control over the
whole process if it wants, the only distinction being that
the alignment spec has a fixed syntax whereas the format
spec can be anything.
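
The division of labour described above can be sketched roughly as follows. The function names and signatures here are assumptions made for illustration only; the thread has not settled on any of them.

```python
def pad_blindly(formatted, width, fill="0"):
    # Generic post-processing: pad on the left without looking at the content.
    return formatted.rjust(width, fill)


def pad_sign_aware(value, width):
    # What an integer's own formatter can do: emit the sign first, then zero-fill.
    sign = "-" if value < 0 else "+"
    digits = str(abs(value))
    return sign + digits.rjust(width - 1, "0")


def apply_alignment(formatted, width):
    # Fallback step: pad with spaces only if the formatter left the field short.
    return formatted if len(formatted) >= width else formatted.rjust(width)


blind = pad_blindly("+10", 10)                    # sign ends up inside the zeros
aware = pad_sign_aware(10, 10)                    # sign stays in front
final = apply_alignment(aware, 10)                # already wide enough, untouched
```

Because the type-specific formatter already returns a string of the requested width, the generic alignment step has nothing left to do for it, while types that ignore the width still get default padding.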

--
Greg

From skip at pobox.com  Mon Aug  6 04:12:32 2007
From: skip at pobox.com (skip at pobox.com)
Date: Sun, 5 Aug 2007 21:12:32 -0500
Subject: [Python-3000] atexit module problems/questions
In-Reply-To: <46B66765.7060406@canterbury.ac.nz>
References: <18100.62086.177289.274444@montanaro.dyndns.org>
	<46B526FE.4060500@canterbury.ac.nz>
	<18101.15137.718715.98755@montanaro.dyndns.org>
	<46B66765.7060406@canterbury.ac.nz>
Message-ID: <18102.33680.607164.490861@montanaro.dyndns.org>


    >> Then you need to hang onto the closure.  That might be some distance
    >> away from the point at which the function was registered.

    Greg> Well, if you get a unique ID, you need to hang onto that somehow,
    Greg> too.

Yes, but an int is both much smaller than a function and can't be involved
in cyclic garbage.

Skip


From rrr at ronadam.com  Mon Aug  6 06:47:36 2007
From: rrr at ronadam.com (Ron Adam)
Date: Sun, 05 Aug 2007 23:47:36 -0500
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B66E7E.4060209@canterbury.ac.nz>
References: <46B13ADE.7080901@acm.org>	<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>	<ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>	<46B2D147.90606@acm.org>
	<46B2E265.5080905@ronadam.com>	<46B35295.1030007@acm.org>	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>	<46B39BBF.80809@ronadam.com>
	<46B46DE1.1090403@canterbury.ac.nz>	<46B4950F.40905@ronadam.com>
	<46B4A9A0.9070206@ronadam.com>	<46B54F51.40705@acm.org>
	<46B59F05.3070200@ronadam.com> <46B66E7E.4060209@canterbury.ac.nz>
Message-ID: <46B6A7E8.7040001@ronadam.com>



Greg Ewing wrote:
> Ron Adam wrote:
>>      return value.__format__(format_spec, abstract)
> 
> Why would the __format__ method need to be passed an
> 'abstract' flag? It can tell from the format_spec if
> it needs to know.

I may have been thinking too far ahead on this one.  I first wrote that 
without the abstract flag, but then changed it because it seemed there was 
an ambiguous situation that I thought this would clear up.

I think I was thinking of a generic way to tell a __format__ method 
whether to raise an exception or fall back to str or repr.


Let's say a string __format__ method looks like the following...

    def __format__(self, specifier, abstract=False):
        # Accept an empty spec, an 's' spec, or any spec in abstract mode.
        if not specifier or specifier[0] == 's' or abstract:
            return self
        raise ValueError("invalid type for format specifier")

It would be more complex than this in most cases, but it doesn't need to 
know about any other specifier types to work.  Of course string types don't 
need to fall back, but that doesn't mean it is always desirable for them to 
succeed.


For example if we have...

     '{0:k10}'.format('python')

Should it even try to succeed, or should it complain immediately?

If the string __format__ method got 'k10' as a format specifier, it has no 
idea what the 'k10' is supposed to mean.  It needs to make a choice to either 
fall back to str(), or raise an exception that could be caught and handled.

So,

Is it useful to sometimes be strict and at other times forgive and fall back?

And if so, how can we handle that best?

(The exact mechanism can be figured out later; it's the desired behaviors 
that need to be determined for now.)

Cheers,
    Ron

From rrr at ronadam.com  Mon Aug  6 06:48:51 2007
From: rrr at ronadam.com (Ron Adam)
Date: Sun, 05 Aug 2007 23:48:51 -0500
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B66BE0.7090005@canterbury.ac.nz>
References: <46B13ADE.7080901@acm.org>
	<46B2D147.90606@acm.org>	<46B2E265.5080905@ronadam.com>
	<46B35295.1030007@acm.org>	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>	<46B39BBF.80809@ronadam.com>
	<46B46DE1.1090403@canterbury.ac.nz>	<46B4950F.40905@ronadam.com>
	<46B4A9A0.9070206@ronadam.com>	<46B52422.2090006@canterbury.ac.nz>	<ca471dc20708041909j3f86b107hace9407673c302ea@mail.gmail.com>	<46B568F3.9060105@ronadam.com>
	<46B66BE0.7090005@canterbury.ac.nz>
Message-ID: <46B6A833.70007@ronadam.com>



Greg Ewing wrote:
> Ron Adam wrote:
>> Truncating behavior is 
>> explicitly specified by giving a max_width size without a replacement 
>> character.
> 
> I think that would be an extremely bad default for numbers.

It's *not* a default.  The default is to have no max_width.


>  > I believe they will learn fairly quickly what not to do.
> 
> Even if that's true, why make the behaviour that's desirable
> in the vast majority of cases more difficult to specify?
> 
> --
> Greg


From talin at acm.org  Mon Aug  6 07:40:07 2007
From: talin at acm.org (Talin)
Date: Sun, 05 Aug 2007 22:40:07 -0700
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B6749E.9020304@canterbury.ac.nz>
References: <46B13ADE.7080901@acm.org>	<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>	<ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>	<46B2D147.90606@acm.org>
	<46B2E265.5080905@ronadam.com>	<46B35295.1030007@acm.org>	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>	<46B39BBF.80809@ronadam.com>
	<46B46DE1.1090403@canterbury.ac.nz>	<46B4950F.40905@ronadam.com>
	<46B4A9A0.9070206@ronadam.com>	<46B54F51.40705@acm.org>
	<46B59F05.3070200@ronadam.com>	<46B5FBD9.4020301@acm.org>
	<46B6749E.9020304@canterbury.ac.nz>
Message-ID: <46B6B437.7080901@acm.org>

Greg Ewing wrote:
> Talin wrote:
>> in 
>> the case of an integer that is printed with leading zeros, the sign must 
>> come *before* the padding: '+000000010'. It's not sufficient to simply 
>> apply padding blindly to the output of __format__, which would give you 
>> '000000+10'.
> 
> How about this, then: The apply_format function parses
> the alignment spec and passes the result to the __format__
> method along with the format spec. The __format__ method
> can then choose to do its own alignment and padding to
> achieve the specified field width. If it returns something
> less than the specified width, apply_format then uses the
> default alignment algorithm.
> 
> Then the __format__ method has complete control over the
> whole process if it wants, the only distinction being that
> the alignment spec has a fixed syntax whereas the format
> spec can be anything.

I think that this is right - at least, I can't think of another way to 
do it.

-- Talin

From skip at pobox.com  Mon Aug  6 08:03:02 2007
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 6 Aug 2007 01:03:02 -0500
Subject: [Python-3000] atexit module problems/questions
In-Reply-To: <18102.33680.607164.490861@montanaro.dyndns.org>
References: <18100.62086.177289.274444@montanaro.dyndns.org>
	<46B526FE.4060500@canterbury.ac.nz>
	<18101.15137.718715.98755@montanaro.dyndns.org>
	<46B66765.7060406@canterbury.ac.nz>
	<18102.33680.607164.490861@montanaro.dyndns.org>
Message-ID: <18102.47510.862152.964621@montanaro.dyndns.org>


    skip> Yes, but an int is both much smaller than a function and can't be
    skip> involved in cyclic garbage.

I also forgot to mention that inexperienced users will probably find it
easier to hang onto an int than to create a closure.

Skip

From greg.ewing at canterbury.ac.nz  Mon Aug  6 08:44:05 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 06 Aug 2007 18:44:05 +1200
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B6A7E8.7040001@ronadam.com>
References: <46B13ADE.7080901@acm.org>
	<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>
	<ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>
	<46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com>
	<46B35295.1030007@acm.org>
	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>
	<46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz>
	<46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com>
	<46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com>
	<46B66E7E.4060209@canterbury.ac.nz> <46B6A7E8.7040001@ronadam.com>
Message-ID: <46B6C335.4080504@canterbury.ac.nz>

Ron Adam wrote:
> If the string __format__ method got 'k10' as a format specifier, it has 
> no idea what the 'k10' is supposed to mean, it needs to make a choice to 
> either fall back to str(), or raise an exception that could be caught 
> and handled.

I think Guido's scheme handles this okay. Each type's
__format__ decides whether it can handle the format
spec, and if not, explicitly delegates to something
else such as str(self).__format__. Eventually you
will get to a type that either understands the format
or has nowhere left to delegate to and raises an
exception.
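
A minimal sketch of that delegation chain, written against the `__format__` protocol and the built-in `format()` as they later shipped in Python 3 (which may differ in detail from the scheme under discussion here). The `Temperature` class and its 'c' spec are hypothetical:

```python
class Temperature:
    """Hypothetical type: handles the 'c' spec itself, delegates the rest."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __str__(self):
        return "%.1f" % self.degrees

    def __format__(self, spec):
        if spec == "c":
            return "%.1f C" % self.degrees
        # Not a spec we understand: delegate to the string form, whose own
        # __format__ either handles the spec or raises.
        return format(str(self), spec)


t = Temperature(21.5)
handled = format(t, "c")        # handled by Temperature itself
delegated = format(t, ">8")     # handled by str.__format__

try:
    format(t, "k10")            # nobody understands 'k10'
    rejected = False
except ValueError:
    rejected = True
```

The exception comes from the last type in the chain, so no separate strict/lenient flag has to be threaded through the call.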

--
Greg

From walter at livinglogic.de  Mon Aug  6 09:51:08 2007
From: walter at livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=)
Date: Mon, 06 Aug 2007 09:51:08 +0200
Subject: [Python-3000] C API cleanup str
In-Reply-To: <46B633D0.7050902@v.loewis.de>
References: <ee2a432c0708022301m3dbf7a7tfe7378336fe0cefb@mail.gmail.com>		<46B2C8E0.8080409@canterbury.ac.nz>		<ee2a432c0708022331q768f3cbbu527aad3359ed9552@mail.gmail.com>		<46B5C47B.5090703@v.loewis.de>		<ca471dc20708050808o7dfcb8c4pe59acff12b57c8f9@mail.gmail.com>		<46B5F136.4010502@v.loewis.de>	<ca471dc20708050859g3e3178d7ncfc6209e9ea05708@mail.gmail.com>
	<46B633D0.7050902@v.loewis.de>
Message-ID: <46B6D2EC.601@livinglogic.de>

Martin v. Löwis wrote:

>> You mean if it were fixed it could fail, right? Code calling it should
>> be checking for errors anyway because it allocates memory.
>>
>> Have you tried making this particular change and seeing what fails?
> 
> I now tried, and it turned out that bytes.__reduce__ would break
> (again); I fixed it and changed it in r56755.
> 
> It turned out that PyUnicode_FromString was even documented to
> accept latin-1.

Yes, that seemed to me to be the most obvious interpretation.

> While I was looking at it, I wondered why PyUnicode_FromStringAndSize
> allows a NULL first argument, creating a null-initialized Unicode
> object.

Because that's what PyString_FromStringAndSize() does.

> This functionality is already available as
> PyUnicode_FromUnicode, and callers who previously wrote
> 
>    obuf = PyString_FromStringAndSize(NULL, bufsize);
>    if (!obuf) return NULL;
>    buf = PyString_AsString(obuf);
> 
> could be tricked into believing that they now can change the
> string object they just created - which they cannot, as
> buf will just be the UTF-8 encoded version of the real string.

True, this will no longer work.

So should NULL support be dropped from PyUnicode_FromStringAndSize()?

Servus,
    Walter

From martin at v.loewis.de  Mon Aug  6 10:07:20 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 06 Aug 2007 10:07:20 +0200
Subject: [Python-3000] C API cleanup str
In-Reply-To: <46B6D2EC.601@livinglogic.de>
References: <ee2a432c0708022301m3dbf7a7tfe7378336fe0cefb@mail.gmail.com>		<46B2C8E0.8080409@canterbury.ac.nz>		<ee2a432c0708022331q768f3cbbu527aad3359ed9552@mail.gmail.com>		<46B5C47B.5090703@v.loewis.de>		<ca471dc20708050808o7dfcb8c4pe59acff12b57c8f9@mail.gmail.com>		<46B5F136.4010502@v.loewis.de>	<ca471dc20708050859g3e3178d7ncfc6209e9ea05708@mail.gmail.com>
	<46B633D0.7050902@v.loewis.de> <46B6D2EC.601@livinglogic.de>
Message-ID: <46B6D6B8.7000207@v.loewis.de>

>> I now tried, and it turned out that bytes.__reduce__ would break
>> (again); I fixed it and changed it in r56755.
>>
>> It turned out that PyUnicode_FromString was even documented to
>> accept latin-1.
> 
> Yes, that seemed to me to be the most obvious interpretation.

Unfortunately, this made creating and retrieving asymmetric:
when you do PyUnicode_AsString, you'll get an UTF-8 string; when
you do PyUnicode_FromString, you did have to pass Latin-1. Making
AsString also return Latin-1 would, of course, restrict the number of
cases where it works.
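
The asymmetry can be illustrated at the Python level. This is only an analogy using str.encode and bytes.decode, not the C API calls themselves:

```python
text = "L\u00f6wis"

# "AsString" direction: the text comes out as UTF-8 bytes.
as_utf8 = text.encode("utf-8")

# "FromString" direction, if the input is assumed to be Latin-1:
misread = as_utf8.decode("latin-1")     # two wrong characters replace the umlaut

# Using UTF-8 in both directions round-trips cleanly.
roundtrip = as_utf8.decode("utf-8")

# Using Latin-1 in both directions would round-trip too, but only for
# characters that Latin-1 can represent at all.
try:
    "\u20ac".encode("latin-1")          # EURO SIGN
    latin1_covers_euro = True
except UnicodeEncodeError:
    latin1_covers_euro = False
```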

>> While I was looking at it, I wondered why PyUnicode_FromStringAndSize
>> allows a NULL first argument, creating a null-initialized Unicode
>> object.
> 
> Because that's what PyString_FromStringAndSize() does.

I guessed that was the historic reason; I just wondered whether the
rationale for having it in PyString_FromStringAndSize still applies
to Unicode.

> So should NULL support be dropped from PyUnicode_FromStringAndSize()?

That's my proposal, yes.

Regards,
Martin

From rrr at ronadam.com  Mon Aug  6 10:24:27 2007
From: rrr at ronadam.com (Ron Adam)
Date: Mon, 06 Aug 2007 03:24:27 -0500
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B6C219.4040900@canterbury.ac.nz>
References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org>
	<46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org>
	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>
	<46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz>
	<46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com>
	<46B52422.2090006@canterbury.ac.nz>
	<ca471dc20708041909j3f86b107hace9407673c302ea@mail.gmail.com>
	<46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz>
	<46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz>
Message-ID: <46B6DABB.3080509@ronadam.com>



Greg Ewing wrote:
> Ron Adam wrote:
>> It's *not* a default.  The default is to have no max_width.
> 
> You're suggesting it would be a default if you
> *did* specify a max width but no replacement
> char. That's what I'm saying would be a bad
> default.

Absolutely, *for field widths*, which operate on a string of characters 
after the formatting step is done.

We could add a numeric max_width, specific to numbers, that defaults to 
the '*' character on width overflow.  As long as the field max_width isn't 
specified, or the numeric max_width is shorter than the field max_width, 
it would do what you want.

We could have a pre_process step that adjusts the format width to be within 
the field min-max range or raises an exception if you really want that.


     {0:d,10+20}      # Field widths are just string operations done after
                      # formatting is done.

     {0:d10+20}       # Numeric widths,  differ from field widths.
                      # They are specific to the type so can handle special
                      # cases.


Now here's the problem with all of this.  As we add the widths back into 
the format specifications, we are basically saying the idea of a separate 
field width specifier is wrong.

So maybe it's not really a separate independent thing after all, and it's 
just a convenient grouping for readability purposes only.

So in that case there is no field alignment function, and it's up to the 
__format__ method to do both.  :-/

Cheers,
    Ron

From rrr at ronadam.com  Mon Aug  6 10:40:32 2007
From: rrr at ronadam.com (Ron Adam)
Date: Mon, 06 Aug 2007 03:40:32 -0500
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B6C335.4080504@canterbury.ac.nz>
References: <46B13ADE.7080901@acm.org>	<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>	<ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>	<46B2D147.90606@acm.org>
	<46B2E265.5080905@ronadam.com>	<46B35295.1030007@acm.org>	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>	<46B39BBF.80809@ronadam.com>
	<46B46DE1.1090403@canterbury.ac.nz>	<46B4950F.40905@ronadam.com>
	<46B4A9A0.9070206@ronadam.com>	<46B54F51.40705@acm.org>
	<46B59F05.3070200@ronadam.com>	<46B66E7E.4060209@canterbury.ac.nz>
	<46B6A7E8.7040001@ronadam.com> <46B6C335.4080504@canterbury.ac.nz>
Message-ID: <46B6DE80.2050000@ronadam.com>



Greg Ewing wrote:
> Ron Adam wrote:
>> If the string __format__ method got 'k10' as a format specifier, it has 
>> no idea what the 'k10' is supposed to mean, it needs to make a choice to 
>> either fall back to str(), or raise an exception that could be caught 
>> and handled.
> 
> I think Guido's scheme handles this okay. Each type's
> __format__ decides whether it can handle the format
> spec, and if not, explicitly delegates to something
> else such as str(self).__format__. Eventually you
> will get to a type that either understands the format
> or has nowhere left to delegate to and raises an
> exception.

It sounds like we are describing the same thing, but differ on "where" and 
"when" things are done, but we still haven't worked out the "what" and 
"how" yet.


I think we need to work out the details "what" from the bottom up and then 
see "how" we can do those.  So the questions I asked are still important.

What should happen in various situations of mismatched or invalid type 
specifiers?

When should exceptions occur?

Then after that, what features should each format specifier have?


Then we can determine what is best for "where" and "when" things should be 
done.

Cheers,
    Ron

From walter at livinglogic.de  Mon Aug  6 11:14:21 2007
From: walter at livinglogic.de (=?UTF-8?B?V2FsdGVyIETDtnJ3YWxk?=)
Date: Mon, 06 Aug 2007 11:14:21 +0200
Subject: [Python-3000] C API cleanup str
In-Reply-To: <46B6D6B8.7000207@v.loewis.de>
References: <ee2a432c0708022301m3dbf7a7tfe7378336fe0cefb@mail.gmail.com>		<46B2C8E0.8080409@canterbury.ac.nz>		<ee2a432c0708022331q768f3cbbu527aad3359ed9552@mail.gmail.com>		<46B5C47B.5090703@v.loewis.de>		<ca471dc20708050808o7dfcb8c4pe59acff12b57c8f9@mail.gmail.com>		<46B5F136.4010502@v.loewis.de>	<ca471dc20708050859g3e3178d7ncfc6209e9ea05708@mail.gmail.com>
	<46B633D0.7050902@v.loewis.de> <46B6D2EC.601@livinglogic.de>
	<46B6D6B8.7000207@v.loewis.de>
Message-ID: <46B6E66D.80301@livinglogic.de>

Martin v. Löwis wrote:

>>> I now tried, and it turned out that bytes.__reduce__ would break
>>> (again); I fixed it and changed it in r56755.
>>>
>>> It turned out that PyUnicode_FromString was even documented to
>>> accept latin-1.
>> Yes, that seemed to me to be the most obvious interpretation.
> 
> Unfortunately, this made creating and retrieving asymmetric:
> when you do PyUnicode_AsString, you'll get an UTF-8 string; when
> you do PyUnicode_FromString, you did have to pass Latin-1. Making
> AsString also return Latin-1 would, of course, restrict the number of
> cases where it works.

True, UTF-8 seems to be the better choice. However all spots in the C
source that call PyUnicode_FromString() only pass ASCII anyway, which
will probably be the most common case.

>>> While I was looking at it, I wondered why PyUnicode_FromStringAndSize
>>> allows a NULL first argument, creating a null-initialized Unicode
>>> object.
>> Because that's what PyString_FromStringAndSize() does.
> 
> I guessed that was the historic reason; I just wondered whether the
> rationale for having it in PyString_FromStringAndSize still applies
> to Unicode.
> 
>> So should NULL support be dropped from PyUnicode_FromStringAndSize()?
> 
> That's my proposal, yes.

At least this would give a clear error message in case someone passes NULL.

Servus,
   Walter


From ncoghlan at gmail.com  Mon Aug  6 12:58:13 2007
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 06 Aug 2007 20:58:13 +1000
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B5FBD9.4020301@acm.org>
References: <46B13ADE.7080901@acm.org>	<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>	<ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>	<46B2D147.90606@acm.org>	<46B2E265.5080905@ronadam.com>	<46B35295.1030007@acm.org>	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>	<46B39BBF.80809@ronadam.com>	<46B46DE1.1090403@canterbury.ac.nz>	<46B4950F.40905@ronadam.com>	<46B4A9A0.9070206@ronadam.com>
	<46B54F51.40705@acm.org>	<46B59F05.3070200@ronadam.com>
	<46B5FBD9.4020301@acm.org>
Message-ID: <46B6FEC5.9040503@gmail.com>

Talin wrote:
> Of course, the two can't be entirely isolated because there is 
> interaction between the two specifiers for some types. For example, it 
> would normally be the case that padding is applied by 'apply_format', 
> which knows about the field width and the padding character. However, in 
> the case of an integer that is printed with leading zeros, the sign must 
> come *before* the padding: '+000000010'. It's not sufficient to simply 
> apply padding blindly to the output of __format__, which would give you 
> '000000+10'.
> 
> (Maybe leading zeros and padding are different things? But the 
> __format__ would still need to know the field width, which is usually 
> part of the alignment spec, since it's usually applied as a 
> post-processing step by 'apply_format')

Is the signature of __format__ up for negotiation?

If __format__ receives both the alignment specifier and the format 
specifier as arguments, then the method would be free to return its own 
string that has already been adjusted to meet the minimum field width. 
Objects which don't care about alignment details can just return their 
formatted result and let the standard alignment handler deal with the 
minimum field width.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From ncoghlan at gmail.com  Mon Aug  6 13:02:35 2007
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 06 Aug 2007 21:02:35 +1000
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B6B437.7080901@acm.org>
References: <46B13ADE.7080901@acm.org>	<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>	<ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>	<46B2D147.90606@acm.org>	<46B2E265.5080905@ronadam.com>	<46B35295.1030007@acm.org>	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>	<46B39BBF.80809@ronadam.com>	<46B46DE1.1090403@canterbury.ac.nz>	<46B4950F.40905@ronadam.com>	<46B4A9A0.9070206@ronadam.com>	<46B54F51.40705@acm.org>	<46B59F05.3070200@ronadam.com>	<46B5FBD9.4020301@acm.org>	<46B6749E.9020304@canterbury.ac.nz>
	<46B6B437.7080901@acm.org>
Message-ID: <46B6FFCB.6010106@gmail.com>

Talin wrote:
> Greg Ewing wrote:
>> Talin wrote:
>>> in 
>>> the case of an integer that is printed with leading zeros, the sign must 
>>> come *before* the padding: '+000000010'. It's not sufficient to simply 
>>> apply padding blindly to the output of __format__, which would give you 
>>> '000000+10'.
>> How about this, then: The apply_format function parses
>> the alignment spec and passes the result to the __format__
>> method along with the format spec. The __format__ method
>> can then choose to do its own alignment and padding to
>> achieve the specified field width. If it returns something
>> less than the specified width, apply_format then uses the
>> default alignment algorithm.
>>
>> Then the __format__ method has complete control over the
>> whole process if it wants, the only distinction being that
>> the alignment spec has a fixed syntax whereas the format
>> spec can be anything.
> 
> I think that this is right - at least, I can't think of another way to 
> do it.

Heh, I could have saved myself some typing by reading more before replying.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From ncoghlan at gmail.com  Mon Aug  6 13:04:48 2007
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 06 Aug 2007 21:04:48 +1000
Subject: [Python-3000] atexit module problems/questions
In-Reply-To: <18102.47510.862152.964621@montanaro.dyndns.org>
References: <18100.62086.177289.274444@montanaro.dyndns.org>	<46B526FE.4060500@canterbury.ac.nz>	<18101.15137.718715.98755@montanaro.dyndns.org>	<46B66765.7060406@canterbury.ac.nz>	<18102.33680.607164.490861@montanaro.dyndns.org>
	<18102.47510.862152.964621@montanaro.dyndns.org>
Message-ID: <46B70050.5010006@gmail.com>

skip at pobox.com wrote:
>     skip> Yes, but an int is both much smaller than a function and can't be
>     skip> involved in cyclic garbage.
> 
> I also forgot to mention that inexperienced users will probably find it
> easier to hang onto an int than create a closure

a) functools.partial isn't that hard to use
b) we could create it automatically in atexit.register and return it
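
For (a), a minimal sketch using the atexit API as it later shipped in Python 3, where register() returns the callable it was given and unregister() removes it again. That is not necessarily the design being debated in this thread, and `flush_log` and its argument are hypothetical:

```python
import atexit
import functools


def flush_log(name):
    # Hypothetical cleanup function that needs an argument.
    print("flushing", name)


# Bind the argument once; the partial object doubles as the handle to keep.
handle = functools.partial(flush_log, "app.log")
returned = atexit.register(handle)

# Later, possibly far from the registration site:
atexit.unregister(handle)
```

The caller holds on to one object either way; the difference from an integer ID is only in what that object is.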

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From ncoghlan at gmail.com  Mon Aug  6 13:11:33 2007
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 06 Aug 2007 21:11:33 +1000
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46B637DD.7070905@v.loewis.de>
References: <46B637DD.7070905@v.loewis.de>
Message-ID: <46B701E5.3030206@gmail.com>

Martin v. Löwis wrote:
> I don't think it needs to be a separate type,
> instead, bytes objects could have an idempotent
> .freeze() method which switches the "immutable"
> bit on. There would be no way to switch it off
> again.

+1 here - hashable byte sequences are very handy for dealing with 
fragments of low level serial protocols.

It would also be nice if b"" literals set that immutable flag 
automatically - otherwise converting some of my lookup tables over to 
Py3k would be a serious pain (not a pain I'm likely to have to deal with 
personally given the relative time frames involved, but a pain nonetheless).
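
A minimal sketch of that kind of lookup table. It relies on the behaviour of Python 3 as eventually released, where bytes is immutable and hashable and bytearray is the mutable type, rather than on a freeze() method. The opcode values and command names are made up for illustration:

```python
# Hypothetical two-byte opcodes of some serial protocol.
COMMANDS = {
    b"\x01\x00": "reset",
    b"\x02\x10": "read_status",
}


def decode(frame):
    # Copy the header into an immutable bytes object so it can be a dict key,
    # whatever buffer type the frame arrived in.
    return COMMANDS.get(bytes(frame[:2]), "unknown")


name = decode(bytearray(b"\x02\x10\xff"))

# A mutable byte sequence cannot be used as a key directly.
try:
    hash(bytearray(b"\x01\x00"))
    mutable_is_hashable = True
except TypeError:
    mutable_is_hashable = False
```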

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From stargaming at gmail.com  Mon Aug  6 13:40:13 2007
From: stargaming at gmail.com (Stargaming)
Date: Mon, 6 Aug 2007 11:40:13 +0000 (UTC)
Subject: [Python-3000] optimizing [x]range
References: <1d85506f0707280806n1764151cx4961a0573dda435e@mail.gmail.com>
	<f8srod$5m7$1@sea.gmane.org> <46B233D2.4030304@v.loewis.de>
Message-ID: <f971at$8ou$1@sea.gmane.org>

On Thu, 02 Aug 2007 21:43:14 +0200, Martin v. Löwis wrote:

>> The patch is based on the latest trunk/ checkout, Python 2.6. I don't
>> think this is a problem if nobody else made any effort towards making
>> xrange more sequence-like in the Python 3000 branch. The C source might
>> require some tab/space cleanup.
> 
> Unfortunately, this is exactly what happened: In Py3k, the range object
> is defined in terms PyObject*, so your patch won't apply to the 3k
> branch.
> 
> Regards,
> Martin

Fixed. I rewrote the patch for the p3yk branch. I'm not sure if I used the 
PyNumber API correctly; I mostly modeled this patch on the other range_* 
methods. See
http://sourceforge.net/tracker/index.php?func=detail&aid=1766304&group_id=5470&atid=305470

Regards,
Stargaming


From skip at pobox.com  Mon Aug  6 14:02:12 2007
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 6 Aug 2007 07:02:12 -0500
Subject: [Python-3000] atexit module problems/questions
In-Reply-To: <46B70050.5010006@gmail.com>
References: <18100.62086.177289.274444@montanaro.dyndns.org>
	<46B526FE.4060500@canterbury.ac.nz>
	<18101.15137.718715.98755@montanaro.dyndns.org>
	<46B66765.7060406@canterbury.ac.nz>
	<18102.33680.607164.490861@montanaro.dyndns.org>
	<18102.47510.862152.964621@montanaro.dyndns.org>
	<46B70050.5010006@gmail.com>
Message-ID: <18103.3524.814185.479602@montanaro.dyndns.org>


    Nick> a) functools.partial isn't that hard to use

Never heard of it and I've been writing Python since the mid-90's.  The
point is that not everybody dreams in a functional programming style.

    Nick> b) we could create it automatically in atexit.register and return
    Nick>    it

That's a possibility, though I'm still inclined to think returning an
ever-increasing int (which is already available as the index into the array,
is cleaner, and would be microscopically more efficient) is the way to go.
In the no-arg case you'd just return the function which was passed in.  Is
creating and returning a closure going to be a challenge for Jython or
IronPython?

Changing the focus of this thread a bit, this all seems to be getting a bit
baroque.  Maybe we should back up and ask why atexit needed to be recast in
C in the first place.  Can someone enlighten me?  At some level it seems
more like gratuitous bug insertion than a true necessity.

Skip


From nicko at nicko.org  Mon Aug  6 14:10:11 2007
From: nicko at nicko.org (Nicko van Someren)
Date: Mon, 6 Aug 2007 13:10:11 +0100
Subject: [Python-3000] map() Returns Iterator
In-Reply-To: <87r6mjj34f.fsf@hydra.hampton.thirdcreek.com>
References: <87r6mjj34f.fsf@hydra.hampton.thirdcreek.com>
Message-ID: <918D7BB8-3EAE-4295-A387-6B9AEB06D921@nicko.org>

On 4 Aug 2007, at 06:11, Kurt B. Kaiser wrote:

> Although there has been quite a bit of discussion on dropping reduce()
> and retaining map(), filter(), and zip(), there has been less discussion
> (at least that I can find) on changing them to return iterators instead
> of lists.
>
> I think of map() and filter() as sequence transformers.  To me, it's
> an unexpected semantic change that the result is no longer a list.

I agree.  In almost all of the cases where I would naturally use map  
rather than a list comprehension, either I want the transformed list  
(rather than something that can generate it) or I want the function  
explicitly called on all the elements of the source list right away  
(rather than some time later, or perhaps never).

> In existing Lib/ code, it's twice as likely that the result of map()
> will be assigned than to use it as an iterator in a flow control
> statement.
>
> If the statistics on the usage of map() stay the same, 2/3 of the time
> the current implementation will require code like
>
>         foo = list(map(fcn, bar)).

I presume that if this semantic change stays we are going to have to  
add something to 2to3 which will force the creation of a list from  
the result of any call to map.

> map() and filter() were retained primarily because they can produce
> more compact and readable code when used correctly.  Adding list()  
> most
> of the time seems to diminish this benefit, especially when  
> combined with
> a lambda as the first arg.
>
> There are a number of instances where map() is called for its side
> effect, e.g.
>
>         map(print, line_sequence)
>
> with the return result ignored.  In py3k this has caused many silent
> failures.  We've been weeding these out, and there are only a couple
> left, but there are no doubt many more in 3rd party code.

I'm sure that there are lots of these.  Other scenarios which will  
make for ugly bugs include things like map(db_commit,  
changed_record_list).

> The situation with filter() is similar, though it's not used purely
> for side effects.  zip() is infrequently used.  However, IMO for
> consistency they should all act the same way.

Filter returning an iterator is going to break lots of code which  
says things like:
	interesting_things = filter(predicate, things)
	...
	if foo in interesting_things: ...

Again, if this semantic stays then 2to3 better fix it.  Arguably 2to3  
could translate a call to filter() to a list comprehension.

> I've seen GvR slides suggesting replacing map() et. al. with list
> comprehensions, but never with generator expressions.
>
> PEP 3100: "Make built-ins return an iterator where appropriate
> (e.g. range(), zip(), map(), filter(), etc.)"
>
> It makes sense for range() to return an iterator.  I have my doubts on
> map(), filter(), and zip().  Having them return iterators seems to
> be a premature optimization.  Could something be done in the ast phase
> of compilation instead?

Looking through code I've written, I suspect that basically whenever  
I use map(), filter() or zip() in any context other than in a for...  
loop I am after the concrete list and not an iterator for it.

I would hesitate to suggest that it be optimised at compile time,  
irrespective of the issues resulting from these being built-ins  
rather than keywords (and thus can be reassigned).  Suppose we have  
a function f() that has a printing side effect; then we have:
	for j in [f(i) for i in range(3)]: print j
	f:  0
	f:  1
	f:  2
	0
	1
	2
And we have
	for j in (f(i) for i in range(3)): print j
	f:  0
	0
	f:  1
	1
	f:  2
	2
We're talking about changing the behaviour of:
	for j in map(f, range(3)): print j
from the former to the latter.  If we did some AST phase optimisation  
so that most of the time map() returned a list but it gave an  
iterator if it was used inside a for... loop I think it would be  
dreadfully confusing.
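
The ordering difference above can be reproduced with a minimal sketch,
assuming Python 3, where map() is already lazy.  The helper names make_f
and run are made up for illustration, and output is collected in a list
rather than printed so the two orderings can be compared directly.

```python
def make_f(log):
    """Return a function f that records each call in log, then returns its argument."""
    def f(i):
        log.append("f: %d" % i)
        return i
    return f

def run(make_iterable):
    """Loop over make_iterable(f, range(3)) and record the interleaving of events."""
    log = []
    f = make_f(log)
    for j in make_iterable(f, range(3)):
        log.append(str(j))
    return log

eager = run(lambda f, seq: [f(i) for i in seq])    # list comprehension
lazy = run(lambda f, seq: (f(i) for i in seq))     # generator expression
via_map = run(map)                                 # Python 3 map()

print(eager)    # all calls to f happen before the loop body runs
print(lazy)     # calls to f alternate with the loop body
print(via_map)  # same interleaving as the generator expression
```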

IMHO, when I read "Make built-ins return an iterator where  
appropriate..." I'm inclined to think that it's appropriate for
range() but not for the others.

	Nicko


From jeremy at alum.mit.edu  Mon Aug  6 15:30:13 2007
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Mon, 6 Aug 2007 09:30:13 -0400
Subject: [Python-3000] should rfc822 accept text io or binary io?
Message-ID: <e8bf7a530708060630x699a44aaqa940cfe96a0e158b@mail.gmail.com>

This is a fairly specific question, but it gets at a more general
issue I don't fully understand.

I recently updated httplib and urllib so that they work on the struni
branch.  A recurring problem with these libraries is that they call
methods like strip() and split().  On a string object, calling these
methods with no arguments means strip/split whitespace.  The bytes
object has no corresponding default arguments; whitespace may not be
well-defined for bytes.  (Or is it?)
In general, the approach was to read data as bytes off the socket and
convert header lines to iso-8859-1 before processing them.

test_urllib2_localnet still fails.  One of the problems is that
BaseHTTPServer doesn't process HTTP responses correctly.  Like
httplib, it converts the HTTP status line to iso-8859-1.  But it
parses the rest of the headers by calling mimetools.Message, which is
really rfc822.Message.  The header lines of an RFC 822 message
(really, RFC 2822) are ascii, so it should be easy to do the
conversion.  rfc822.Message assumes it is reading from a text file and
that readline() returns a string.

So the short question is: Should rfc822.Message require a text io
object or a binary io object?  Or should it accept either (via some
new constructor or extra arguments to the existing constructor)?  I'm
not sure how to design an API for bytes vs strings.  The API used to
be equally well suited for reading from a file or a socket, but they
don't behave the same way anymore.
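
One way to picture the "accept either" option is a minimal sketch,
assuming Python 3 and the iso-8859-1 decoding approach described above.
The function name parse_header_line is hypothetical and is not part of
rfc822, mimetools or the email package.

```python
def parse_header_line(raw):
    """Split one 'Name: value' header line; accepts bytes or str input."""
    if isinstance(raw, (bytes, bytearray)):
        # iso-8859-1 maps every byte to a code point, so decoding cannot fail.
        raw = raw.decode("iso-8859-1")
    name, sep, value = raw.partition(":")
    if not sep:
        raise ValueError("not a header line: %r" % raw)
    # After decoding, strip() with no arguments means "strip whitespace".
    return name.strip(), value.strip()

from_socket = parse_header_line(b"Content-Type: text/html\r\n")
from_text_file = parse_header_line("Content-Length: 42\n")
```

The design choice here is to decode once at the boundary and do all
parsing on text, so the parsing code itself never has to care which kind
of io object supplied the line.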

Jeremy

From steven.bethard at gmail.com  Mon Aug  6 17:22:58 2007
From: steven.bethard at gmail.com (Steven Bethard)
Date: Mon, 6 Aug 2007 09:22:58 -0600
Subject: [Python-3000] map() Returns Iterator
In-Reply-To: <918D7BB8-3EAE-4295-A387-6B9AEB06D921@nicko.org>
References: <87r6mjj34f.fsf@hydra.hampton.thirdcreek.com>
	<918D7BB8-3EAE-4295-A387-6B9AEB06D921@nicko.org>
Message-ID: <d11dcfba0708060822q1155d372icbe88897fb5d8708@mail.gmail.com>

On 8/6/07, Nicko van Someren <nicko at nicko.org> wrote:
> On 4 Aug 2007, at 06:11, Kurt B. Kaiser wrote:
> > There are a number of instances where map() is called for its side
> > effect, e.g.
> >
> >         map(print, line_sequence)
> >
> > with the return result ignored.  In py3k this has caused many silent
> > failures.  We've been weeding these out, and there are only a couple
> > left, but there are no doubt many more in 3rd party code.
>
> I'm sure that there are lots of these.  Other scenarios which will
> make for ugly bugs include things like map(db_commit,
> changed_record_list).

I'd just like to say that I'll be glad to see these kinds of things go
away. This is really a confusing abuse of map() for a reader of the
code.

> Filter returning an iterator is going to break lots of code which
> says things like:
>         interesting_things = filter(predicate, things)
>         ...
>         if foo in interesting_things: ...

Actually, as written, that code will work just fine::

>>> from itertools import ifilter as filter
>>> interesting_things = filter(str.isalnum, 'a 1 . a1 a1.'.split())
>>> if 'a1' in interesting_things:
...     print 'it worked!'
...
it worked!

Perhaps you meant to have multiple if clauses?

STeVe
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From guido at python.org  Mon Aug  6 19:58:27 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 6 Aug 2007 10:58:27 -0700
Subject: [Python-3000] C API cleanup str
In-Reply-To: <46B6E66D.80301@livinglogic.de>
References: <ee2a432c0708022301m3dbf7a7tfe7378336fe0cefb@mail.gmail.com>
	<ee2a432c0708022331q768f3cbbu527aad3359ed9552@mail.gmail.com>
	<46B5C47B.5090703@v.loewis.de>
	<ca471dc20708050808o7dfcb8c4pe59acff12b57c8f9@mail.gmail.com>
	<46B5F136.4010502@v.loewis.de>
	<ca471dc20708050859g3e3178d7ncfc6209e9ea05708@mail.gmail.com>
	<46B633D0.7050902@v.loewis.de> <46B6D2EC.601@livinglogic.de>
	<46B6D6B8.7000207@v.loewis.de> <46B6E66D.80301@livinglogic.de>
Message-ID: <ca471dc20708061058s4b468b44k6f9ecd98019dbeeb@mail.gmail.com>

Do you guys need more guidance on this? It seems Martin's checkin
didn't make things worse in the tests department -- I find (on Ubuntu)
that test_ctypes is now failing, but test_threaded_import started
passing.

One issue with just putting this in the C API docs is that I believe
(tell me if I'm wrong) that these haven't been kept up to date in the
struni branch so we'll need to make a lot more changes than just this
one...

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Mon Aug  6 20:06:26 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 6 Aug 2007 11:06:26 -0700
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46B637DD.7070905@v.loewis.de>
References: <46B637DD.7070905@v.loewis.de>
Message-ID: <ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>

On 8/5/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> This brings me to join others in the desire for
> immutable bytes objects: I think such a type is
> needed, and it should probably use the same
> hash algorithm as str8.
>
> I don't think it needs to be a separate type,
> instead, bytes objects could have an idempotent
> .freeze() method which switches the "immutable"
> bit on. There would be no way to switch it off
> again.

I'm sorry, but this is unacceptable. It would make all reasoning based
upon the type of the object unsound: if type(X) == bytes, is it
hashable? Can we append, delete or replace values? What is the type of
a slice of it? If it is currently mutable, will it still be mutable
after I call some other function on it?

Python has traditionally always used a separate type for this purpose:
list vs. tuple, set vs. frozenset. If we are to have a frozen
bytes type, it should be a separate type.
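
The point about reasoning from the type can be illustrated with a small
sketch, assuming Python 3.  The helper is_hashable is made up for
illustration; the pairs are the ones named above.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

# For each mutable/frozen pair, the type alone answers "is it hashable?".
results = {
    "list": is_hashable([1, 2]),
    "tuple": is_hashable((1, 2)),
    "set": is_hashable({1, 2}),
    "frozenset": is_hashable(frozenset({1, 2})),
}
print(results)
```

With a .freeze() bit on a single type, the same question would need a
runtime check on each object instead of a look at its type.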

> If that is not acceptable, please tell me how else
> to fix the dbm modules.

By fixing the code that uses them? By using str8 (perhaps renamed to
frozenbytes and certainly stripped of its locale-dependent APIs)?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Mon Aug  6 20:18:23 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 6 Aug 2007 11:18:23 -0700
Subject: [Python-3000] map() Returns Iterator
In-Reply-To: <87r6mjj34f.fsf@hydra.hampton.thirdcreek.com>
References: <87r6mjj34f.fsf@hydra.hampton.thirdcreek.com>
Message-ID: <ca471dc20708061118k57d10f16g6f6a57ce3a9f907e@mail.gmail.com>

On 8/3/07, Kurt B. Kaiser <kbk at shore.net> wrote:
> Although there has been quite a bit of discussion on dropping reduce()
> and retaining map(), filter(), and zip(), there has been less discussion
> (at least that I can find) on changing them to return iterators instead
> of lists.

That's probably because over the years that this has been on my list
of things I'd change most people agreed silently.

> I think of map() and filter() as sequence transformers.  To me, it's
> an unexpected semantic change that the result is no longer a list.

Well, enough people thought of them as iterables to request that imap(),
ifilter() and izip() be added to the itertools library.

> In existing Lib/ code, it's twice as likely that the result of map()
> will be assigned than to use it as an iterator in a flow control
> statement.

Did you take into account the number of calls to imap()?

> If the statistics on the usage of map() stay the same, 2/3 of the time
> the current implementation will require code like
>
>         foo = list(map(fcn, bar)).

And the 2to3 tool should make this transformation (unless it can tell
from context that it's unnecessary, e.g. in a for-loop, or in a call
to list(), tuple() or sorted()).

We didn't write the 2to3 transform, but it's easier than some others
we already did (e.g. keys()).

> map() and filter() were retained primarily because they can produce
> more compact and readable code when used correctly.  Adding list() most
> of the time seems to diminish this benefit, especially when combined with
> a lambda as the first arg.

When you have a lambda as the first argument the better translation is
*definitely* a list comprehension, as it saves creating stack frames
for the lambda calls.
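
A minimal sketch of that translation, assuming Python 3; the names bar,
via_map and via_listcomp are placeholders for illustration.

```python
bar = [1, 2, 3]

# Direct translation: wrap the iterator-returning map() in list().
via_map = list(map(lambda x: x * 2, bar))

# Suggested translation: the lambda body moves into the comprehension,
# so no separate function call is made per element.
via_listcomp = [x * 2 for x in bar]

print(via_map == via_listcomp)
```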

> There are a number of instances where map() is called for its side
> effect, e.g.
>
>         map(print, line_sequence)
>
> with the return result ignored.

I'd just call that bad style.

> In py3k this has caused many silent
> failures.  We've been weeding these out, and there are only a couple
> left, but there are no doubt many more in 3rd party code.

And that's why the 2to3 tool needs improvement.

> The situation with filter() is similar, though it's not used purely
> for side effects.

Same reply though.

Also, have you ever liked the behavior that filter returns a string if
the input is a string, a tuple if the input is a tuple, and a list for
all other cases? That really sucks IMO.

> zip() is infrequently used.

It was especially designed for use in for-loops (to end the fruitless
discussions trying to come up with parallel iteration syntax). If it
wasn't for the fact that iterators hadn't been invented yet at the
time, zip() would definitely have returned an iterator right from the
start, just as enumerate().

> However, IMO for consistency they should all act the same way.

That's not much of consistency.

> I've seen GvR slides suggesting replacing map() et. al. with list
> comprehensions, but never with generator expressions.

Depends purely on context. Also, it's easier to talk about "list
comprehensions" than about "list comprehensions or generator
expressions" all the time, so I may have abbreviated my suggestions
occasionally.

> PEP 3100: "Make built-ins return an iterator where appropriate
> (e.g. range(), zip(), map(), filter(), etc.)"
>
> It makes sense for range() to return an iterator.  I have my doubts on
> map(), filter(), and zip().  Having them return iterators seems to
> be a premature optimization.  Could something be done in the ast phase
> of compilation instead?

Not likely, the compiler doesn't know enough about the state of builtins.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From nicko at nicko.org  Mon Aug  6 20:33:03 2007
From: nicko at nicko.org (Nicko van Someren)
Date: Mon, 6 Aug 2007 19:33:03 +0100
Subject: [Python-3000] map() Returns Iterator
In-Reply-To: <d11dcfba0708060822q1155d372icbe88897fb5d8708@mail.gmail.com>
References: <87r6mjj34f.fsf@hydra.hampton.thirdcreek.com>
	<918D7BB8-3EAE-4295-A387-6B9AEB06D921@nicko.org>
	<d11dcfba0708060822q1155d372icbe88897fb5d8708@mail.gmail.com>
Message-ID: <CCDA31AE-2865-4247-8D68-F871DE45D89D@nicko.org>

On 6 Aug 2007, at 16:22, Steven Bethard wrote:

> On 8/6/07, Nicko van Someren <nicko at nicko.org> wrote:
...
>> Filter returning an iterator is going to break lots of code which
>> says things like:
>>         interesting_things = filter(predicate, things)
>>         ...
>>         if foo in interesting_things: ...
>
> Actually, as written, that code will work just fine::
> ...
> Perhaps you meant to have multiple if clauses?

You're right, I did!  I wrote a much longer example with multiple  
ifs and it looked too verbose, so I edited it down, and lost the  
meaning.

In fact, from a bug tracing point of view it's even worse, since,  
reusing your example, code which currently reads:
	interesting_things = filter(str.isalnum, 'a 1 . a1 a1.'.split())
	if '1' in interesting_things:
		print "Failure"
	if 'a1' in interesting_things:
		print "... is not an option"
will currently do the same as, but in v3 will be different from:
	interesting_things = filter(str.isalnum, 'a 1 . a1 a1.'.split())
	if 'a1' in interesting_things:
		print "Failure"
	if '1' in interesting_things:
		print "... is not an option"

If filter() really has to be made to return an iterator then (a) 2to3  
is going to have to make lists of its output and (b) the behaviour is  
going to need to be very clearly documented.  I do think that many  
people are going to be confused by this change.
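
The order dependence can be checked with a short sketch, assuming
Python 3, where filter() already returns an iterator.  The helper names
check_iterator and check_list are made up for illustration.

```python
WORDS = 'a 1 . a1 a1.'.split()

def check_iterator(first, second):
    """Run two membership tests, in order, against one filter() iterator."""
    interesting = filter(str.isalnum, WORDS)   # yields 'a', '1', 'a1'
    # Each 'in' test consumes items up to and including the first match.
    return (first in interesting, second in interesting)

def check_list(first, second):
    """The same two tests against a concrete list, as 2to3 would produce."""
    interesting = list(filter(str.isalnum, WORDS))
    return (first in interesting, second in interesting)

print(check_iterator('1', 'a1'))   # both tests succeed
print(check_iterator('a1', '1'))   # second test fails: '1' was already consumed
print(check_list('a1', '1'))       # a list gives the same answer in either order
```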

	Nicko


From bioinformed at gmail.com  Mon Aug  6 22:18:22 2007
From: bioinformed at gmail.com (Kevin Jacobs <jacobs@bioinformed.com>)
Date: Mon, 6 Aug 2007 16:18:22 -0400
Subject: [Python-3000] map() Returns Iterator
In-Reply-To: <ca471dc20708061118k57d10f16g6f6a57ce3a9f907e@mail.gmail.com>
References: <87r6mjj34f.fsf@hydra.hampton.thirdcreek.com>
	<ca471dc20708061118k57d10f16g6f6a57ce3a9f907e@mail.gmail.com>
Message-ID: <2e1434c10708061318x1a2edf89x9faf352ecee58f55@mail.gmail.com>

On 8/6/07, Guido van Rossum <guido at python.org> wrote:
>
> On 8/3/07, Kurt B. Kaiser <kbk at shore.net> wrote:
> > If the statistics on the usage of map() stay the same, 2/3 of the time
> > the current implementation will require code like
> >
> >         foo = list(map(fcn, bar)).
>
> And the 2to3 tool should make this transformation (unless it can tell
> from context that it's unnecessary, e.g. in a for-loop, or in a call
> to list(), tuple() or sorted()).



I hate to be pedantic, but it is not possible for 2to3 to tell, in general,
that it is safe to elide the list() because the result is used directly in a
for loop (for the usual arguments that use the Halting Problem as a trump
card). e.g.,

  foo = map(f,seq)
  for i in foo:
    if g(i):
      break

cannot be correctly transformed to (in Python 2.5 parlance):

  foo = imap(f,seq)
  for i in foo:
    if g(i):
      break

or equivalently:

  for x in seq:
    if g(f(x)):
       break

when f has side-effects, since:
  1. the loop need not evaluate the entire sequence, resulting in f(i) being
called for some prefix of seq
  2. g(i) may depend on the side-effects of f not yet determined
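
Point 1 can be seen in a minimal sketch, assuming Python 3, where map()
behaves like the imap() of Python 2.5.  The helper calls_made and the
stand-in condition i >= 2 for g(i) are made up for illustration.

```python
def calls_made(wrap):
    """Return the arguments f was called with when the loop breaks early."""
    calls = []

    def f(x):
        calls.append(x)   # the side effect of f
        return x

    for i in wrap(map(f, [1, 2, 3, 4])):
        if i >= 2:        # stands in for g(i)
            break
    return calls

eager_calls = calls_made(list)   # list(map(...)): f runs on every element
lazy_calls = calls_made(iter)    # bare iterator: f runs only on a prefix

print(eager_calls)
print(lazy_calls)
```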

Given that Python revels in being a non-pure functional language, we can
poke fun at examples like:

  map(db.commit, transactions)

but we need to recognize that the semantics are very specific, like:

  map(db.commit_phase2, map(db.commit_phase1, transactions))

that performs all phase 1 commits before any phase 2 commits.
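
A sketch of how that ordering changes once map() is lazy, assuming
Python 3.  The names commit_order, phase1 and phase2 are stand-ins for
illustration and do not belong to any real database API.

```python
def commit_order(eager):
    """Record the order of phase 1 and phase 2 calls for nested map()s."""
    log = []

    def phase1(tx):
        log.append(("phase1", tx))
        return tx

    def phase2(tx):
        log.append(("phase2", tx))
        return tx

    # eager=True mimics the list-returning map(); eager=False leaves it lazy.
    wrap = list if eager else iter
    for _ in map(phase2, wrap(map(phase1, ["a", "b"]))):
        pass
    return log

print(commit_order(eager=True))    # every phase 1 call precedes any phase 2 call
print(commit_order(eager=False))   # phase 1 and phase 2 alternate per transaction
```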

More so, the former idiom is much more sensible when 'commit' returns a
meaningful return value that must be stored.  Would we blink too much at:

  results = []
  for tx in transactions:
    results.append(db.commit(tx))

except to consider rewriting in 'modern' syntax as a list comprehension:

  results = [ db.commit(tx) for tx in transactions ]

I'm all for 2to3 being dumb.  Dumb but correct.  It should always put list()
around all uses of map(), filter(), dict.keys(), etc to maintain the exact
behavior from 2.6. Let the author of the code optimize away the extra work
if/when they feel comfortable doing so.  After all, it is their
job/reputation/life on the line, not ours.

~Kevin

From brett at python.org  Mon Aug  6 22:29:07 2007
From: brett at python.org (Brett Cannon)
Date: Mon, 6 Aug 2007 13:29:07 -0700
Subject: [Python-3000] should rfc822 accept text io or binary io?
In-Reply-To: <e8bf7a530708060630x699a44aaqa940cfe96a0e158b@mail.gmail.com>
References: <e8bf7a530708060630x699a44aaqa940cfe96a0e158b@mail.gmail.com>
Message-ID: <bbaeab100708061329l49e658fcg4d17c0a6a61bf1f2@mail.gmail.com>

On 8/6/07, Jeremy Hylton <jeremy at alum.mit.edu> wrote:
> This is a fairly specific question, but it gets at a more general
> issue I don't fully understand.
>
> I recently updated httplib and urllib so that they work on the struni
> branch.  A recurring problem with these libraries is that they call
> methods like strip() and split().  On a string object, calling these
> methods with no arguments means strip/split whitespace.  The bytes
> object has no corresponding default arguments; whitespace may not be
> well-defined for bytes.  (Or is it?)
> In general, the approach was to read data as bytes off the socket and
> convert header lines to iso-8859-1 before processing them.
>
> test_urllib2_localnet still fails.  One of the problems is that
> BaseHTTPServer doesn't process HTTP responses correctly.  Like
> httplib, it converts the HTTP status line to iso-8859-1.  But it
> parses the rest of the headers by calling mimetools.Message, which is
> really rfc822.Message.  The header lines of an RFC 822 message
> (really, RFC 2822) are ascii, so it should be easy to do the
> conversion.  rfc822.Message assumes it is reading from a text file and
> that readline() returns a string.
>
> So the short question is: Should rfc822.Message require a text io
> object or a binary io object?  Or should it accept either (via some
> new constructor or extra arguments to the existing constructor)?  I'm
> not sure how to design an API for bytes vs strings.  The API used to
> be equally well suited for reading from a file or a socket, but they
> don't behave the same way anymore.
>

This really should be redirected as a question for the email package
as rfc822 and mimetools have been deprecated for a while now in favor
of using 'email'.

-Brett

From brett at python.org  Mon Aug  6 22:30:23 2007
From: brett at python.org (Brett Cannon)
Date: Mon, 6 Aug 2007 13:30:23 -0700
Subject: [Python-3000] C API cleanup str
In-Reply-To: <ca471dc20708061058s4b468b44k6f9ecd98019dbeeb@mail.gmail.com>
References: <ee2a432c0708022301m3dbf7a7tfe7378336fe0cefb@mail.gmail.com>
	<46B5C47B.5090703@v.loewis.de>
	<ca471dc20708050808o7dfcb8c4pe59acff12b57c8f9@mail.gmail.com>
	<46B5F136.4010502@v.loewis.de>
	<ca471dc20708050859g3e3178d7ncfc6209e9ea05708@mail.gmail.com>
	<46B633D0.7050902@v.loewis.de> <46B6D2EC.601@livinglogic.de>
	<46B6D6B8.7000207@v.loewis.de> <46B6E66D.80301@livinglogic.de>
	<ca471dc20708061058s4b468b44k6f9ecd98019dbeeb@mail.gmail.com>
Message-ID: <bbaeab100708061330x11880239iefb0ccbf6971004a@mail.gmail.com>

On 8/6/07, Guido van Rossum <guido at python.org> wrote:
> Do you guys need more guidance on this? It seems Martin's checkin
> didn't make things worse in the tests department -- I find (on Ubuntu)
> that test_ctypes is now failing, but test_threaded_import started
> passing.

The test_threaded_import pass is from a fix I checked in, not Martin (I think).

-Brett

From guido at python.org  Mon Aug  6 22:43:43 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 6 Aug 2007 13:43:43 -0700
Subject: [Python-3000] map() Returns Iterator
In-Reply-To: <2e1434c10708061318x1a2edf89x9faf352ecee58f55@mail.gmail.com>
References: <87r6mjj34f.fsf@hydra.hampton.thirdcreek.com>
	<ca471dc20708061118k57d10f16g6f6a57ce3a9f907e@mail.gmail.com>
	<2e1434c10708061318x1a2edf89x9faf352ecee58f55@mail.gmail.com>
Message-ID: <ca471dc20708061343i1a0b8fecvc753d7a2a8f6845c@mail.gmail.com>

On 8/6/07, Kevin Jacobs <jacobs at bioinformed.com> <bioinformed at gmail.com> wrote:
> I hate to be pedantic, but it is not possible for 2to3 to tell, in general,
> that it is safe to elide the list() because the result is used directly in a
> for loop (for the usual arguments that use the Halting Problem as a trump
> card).

Of course not. 2to3 is a pragmatic tool and doesn't really care.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip at pobox.com  Mon Aug  6 22:46:15 2007
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 6 Aug 2007 15:46:15 -0500
Subject: [Python-3000] should rfc822 accept text io or binary io?
In-Reply-To: <e8bf7a530708060630x699a44aaqa940cfe96a0e158b@mail.gmail.com>
References: <e8bf7a530708060630x699a44aaqa940cfe96a0e158b@mail.gmail.com>
Message-ID: <18103.34967.170146.660275@montanaro.dyndns.org>

I thought rfc822 was going away.  From the current module documentation:

    Deprecated since release 2.3. The email package should be used in
    preference to the rfc822 module. This module is present only to maintain
    backward compatibility.

Shouldn't rfc822 be gone altogether in Python 3?

Skip

From martin at v.loewis.de  Mon Aug  6 23:00:43 2007
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Mon, 06 Aug 2007 23:00:43 +0200
Subject: [Python-3000] C API cleanup str
In-Reply-To: <46B6E66D.80301@livinglogic.de>
References: <ee2a432c0708022301m3dbf7a7tfe7378336fe0cefb@mail.gmail.com>		<46B2C8E0.8080409@canterbury.ac.nz>		<ee2a432c0708022331q768f3cbbu527aad3359ed9552@mail.gmail.com>		<46B5C47B.5090703@v.loewis.de>		<ca471dc20708050808o7dfcb8c4pe59acff12b57c8f9@mail.gmail.com>		<46B5F136.4010502@v.loewis.de>	<ca471dc20708050859g3e3178d7ncfc6209e9ea05708@mail.gmail.com>
	<46B633D0.7050902@v.loewis.de> <46B6D2EC.601@livinglogic.de>
	<46B6D6B8.7000207@v.loewis.de> <46B6E66D.80301@livinglogic.de>
Message-ID: <46B78BFB.7000005@v.loewis.de>

>> Unfortunately, this made creating and retrieving asymmetric:
>> when you do PyUnicode_AsString, you'll get a UTF-8 string; when
>> you do PyUnicode_FromString, you did have to pass Latin-1. Making
>> AsString also return Latin-1 would, of course, restrict the number of
>> cases where it works.
> 
> True, UTF-8 seems to be the better choice. However all spots in the C
> source that call PyUnicode_FromString() only pass ASCII anyway, which
> will probably be the most common case.

Right - so from a practical point of view, it makes no difference.
However, we still need to agree, then standardize, now, so we can
give people a consistent picture.

>>> So should NULL support be dropped from PyUnicode_FromStringAndSize()?
>> That's my proposal, yes.
> 
> At least this would give a clear error message in case someone passes NULL.

Ok, so I'm going to change it, then.

Regards,
Martin

From martin at v.loewis.de  Mon Aug  6 23:07:34 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 06 Aug 2007 23:07:34 +0200
Subject: [Python-3000] C API cleanup str
In-Reply-To: <ca471dc20708061058s4b468b44k6f9ecd98019dbeeb@mail.gmail.com>
References: <ee2a432c0708022301m3dbf7a7tfe7378336fe0cefb@mail.gmail.com>	
	<ee2a432c0708022331q768f3cbbu527aad3359ed9552@mail.gmail.com>	
	<46B5C47B.5090703@v.loewis.de>	
	<ca471dc20708050808o7dfcb8c4pe59acff12b57c8f9@mail.gmail.com>	
	<46B5F136.4010502@v.loewis.de>	
	<ca471dc20708050859g3e3178d7ncfc6209e9ea05708@mail.gmail.com>	
	<46B633D0.7050902@v.loewis.de> <46B6D2EC.601@livinglogic.de>	
	<46B6D6B8.7000207@v.loewis.de> <46B6E66D.80301@livinglogic.de>
	<ca471dc20708061058s4b468b44k6f9ecd98019dbeeb@mail.gmail.com>
Message-ID: <46B78D96.4090901@v.loewis.de>

> One issue with just putting this in the C API docs is that I believe
> (tell me if I'm wrong) that these haven't been kept up to date in the
> struni branch so we'll need to make a lot more changes than just this
> one...

That's certainly the case. However, if we end up deleting the str8
type entirely, I'd be in favor of recycling the PyString_* names
for Unicode, in which case everything needs to be edited, anyway.

Regards,
Martin


From martin at v.loewis.de  Mon Aug  6 23:15:50 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 06 Aug 2007 23:15:50 +0200
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>
References: <46B637DD.7070905@v.loewis.de>
	<ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>
Message-ID: <46B78F86.9000505@v.loewis.de>

>> If that is not acceptable, please tell me how else
>> to fix the dbm modules.
> 
> By fixing the code that uses them?

I don't know how to do that. All implementation strategies I
can think of have significant drawbacks.

> By using str8 (perhaps renamed to
> frozenbytes and certainly stripped of its locale-dependent APIs)?

Ok. So you could agree to a frozenbytes type, then? I'll add one,
reusing the implementation of the bytes object.

If that is done:

a) should one of these be the base type of the other?
b) should bytes literals be regular or frozen bytes?
c) is it still ok to provide a .freeze() method on
   bytes returning frozenbytes?
d) should unicode.defenc be frozen?
e) should (or may) codecs return frozenbytes?

Regards,
Martin



From fdrake at acm.org  Mon Aug  6 23:18:21 2007
From: fdrake at acm.org (Fred Drake)
Date: Mon, 6 Aug 2007 17:18:21 -0400
Subject: [Python-3000] should rfc822 accept text io or binary io?
In-Reply-To: <18103.34967.170146.660275@montanaro.dyndns.org>
References: <e8bf7a530708060630x699a44aaqa940cfe96a0e158b@mail.gmail.com>
	<18103.34967.170146.660275@montanaro.dyndns.org>
Message-ID: <3CC0BA3B-38D2-44F6-A230-E3B59C1016DE@acm.org>

On Aug 6, 2007, at 4:46 PM, skip at pobox.com wrote:
> I thought rfc822 was going away.  From the current module  
> documentation:
> ...
> Shouldn't rfc822 be gone altogether in Python 3?

Yes.  And the answers to Jeremy's questions about what sort of IO is  
appropriate for the email package should be left to the email-sig as  
well, I suspect.  It's good that they've come up.


   -Fred

-- 
Fred Drake   <fdrake at acm.org>




From talex5 at gmail.com  Mon Aug  6 21:33:18 2007
From: talex5 at gmail.com (Thomas Leonard)
Date: Mon, 6 Aug 2007 20:33:18 +0100
Subject: [Python-3000] Binary compatibility
Message-ID: <cd53a0140708061233v3776e9d5m40f39f3a022bb76d@mail.gmail.com>

Hi all,

I recently asked about the UCS2 / UCS4 binary compatibility issues
with Python on Guido's blog, and Guido suggested I continue the
discussion here:

http://www.artima.com/forums/flat.jsp?forum=106&thread=211430

The issue is that Python has a compile-time configuration setting
which changes its ABI. For example, on Ubuntu we have:

$ objdump -T /usr/bin/python|grep UCS
080ac3e0 g    DF .text  00000206  Base        PyUnicodeUCS4_EncodeUTF8
080b2810 g    DF .text  000000ba  Base        PyUnicodeUCS4_DecodeLatin1
080b6c20 g    DF .text  000002b3  Base        PyUnicodeUCS4_RSplit
...

Whereas on some other systems, including compiled-from-source Python, you get:

$ objdump -T python|grep UCS
080abc80 g    DF .text  00000201  Base        PyUnicodeUCS2_EncodeUTF8
080b32e0 g    DF .text  000000c7  Base        PyUnicodeUCS2_DecodeLatin1
080b6740 g    DF .text  000002b9  Base        PyUnicodeUCS2_RSplit

(note "UCS2" vs "UCS4")

This means that I can't distribute Python extensions as binaries. Any
extension built on Ubuntu may fail on some other system. I confess I
haven't tried this recently, but it has caused me trouble in the past.
I'd like to be sure it won't happen with Python 3.
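
Without objdump, the same build setting can be read from inside the
interpreter.  This is a minimal sketch using sys.maxunicode; the function
name unicode_build is made up for illustration.  On the interpreters
discussed here a narrow (UCS2) build reports 0xFFFF and a wide (UCS4)
build reports 0x10FFFF.

```python
import sys

def unicode_build():
    """Classify the running interpreter by the largest code point it supports."""
    # 0xFFFF means 16-bit code units (UCS2); anything larger means UCS4.
    return "wide" if sys.maxunicode > 0xFFFF else "narrow"

print(unicode_build(), hex(sys.maxunicode))
```

An installer could run a check like this before unpacking a prebuilt
extension, and refuse to proceed on a mismatch rather than fail later
with unresolved PyUnicodeUCS2_* or PyUnicodeUCS4_* symbols.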

I've hit this problem with both of the open source projects I work on;
the ROX desktop (http://rox.sf.net) and Zero Install
(http://0install.net).

ROX is a desktop environment. Most of our programs are written in
(pure) Python. Some, including ROX-Filer, are pure C. Sometimes it
would have been useful to combine the two: for example we could write
the pager applet in Python if it could use C to talk to the libwnck
library, or we could add Python scripting to the filer and gradually
migrate more of the code to Python.

Zero Install is a decentralised software installation system, itself
written entirely in Python, in which software authors publish
GPG-signed XML feed files on their websites. These feeds list versions
of their programs along with a cryptographic digest of each version's
contents (think GIT tree IDs here). This allows installing software
without needing root access, while still sharing libraries and
programs automatically between (mutually suspicious) users. Although
we don't need to use C extensions for the system itself, distributing
Python/C hybrid programs with it has been problematic.

Another group having similar problems is the Autopackage project:

 http://trac.autopackage.org/wiki/LinuxProblems#Python
 http://trac.autopackage.org/wiki/PackagingPythonApps
 http://plan99.net/~mike/blog/2006/05/24/python-unicode-abi/

Finally, the issue has also been brought up before on the Python lists:
   http://mail.python.org/pipermail/python-dev/2005-September/056837.html

Guido suggested:

 "Why don't you distribute a Python interpreter binary built with the
right options? Depending on users having installed the correct Python
version (especially if your users are not programmers) is asking for
trouble."

There are several problems for us with this approach:

- We have to maintain our own version of Python, including pushing out
security updates.

- We also have to maintain all the Python modules, in particular
python-gnome, in a similar way.

- Our users have to download Python twice whenever there's a new release.

- If some programs are using the distribution's Python and some are
using ours (libraries installed using Zero Install are only used by
software itself installed the same way; distribution packages aren't
affected), two copies of Python must be loaded into memory. This is
slow and wasteful of memory.

This is assuming all third-party code uses Zero Install for
distribution, so that only one extra version of Python is required.
For people distributing programs by other means, they would also have
to include their own copies of Python, leading to even more waste.

>From our point of view, it would be better if the format of strings
was an internal implementation detail. For most users, it doesn't
matter what the setting is, as long as the public interface doesn't
change! The cost of converting between formats is small, and in any
case most software outside of Python (the GNOME stack, for example)
uses UTF-8, so all strings have to be converted when going in or out
of Python anyway.
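
For anyone wanting to check which setting a given interpreter was built
with, here is a minimal sketch. It assumes only the standard
sys.maxunicode attribute; the helper name unicode_build is made up for
illustration. (On Python 3.3 and later the answer is always the wide
value, because PEP 393 removed the narrow/wide build distinction.)

```python
import sys

def unicode_build():
    """Return 'UCS2' for a narrow build and 'UCS4' for a wide one."""
    # Narrow builds report 0xFFFF here; wide builds report 0x10FFFF.
    return "UCS4" if sys.maxunicode > 0xFFFF else "UCS2"

print(unicode_build())
```

An extension module compiled against one setting cannot be loaded by an
interpreter built with the other, which is the distribution problem
described above.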

An alternative would be to default to UCS4, and give the option an
alarming name such as --with-unicode-for-space-limited-devices or
something so that packagers don't mess with it.

Thanks,


-- 
Dr Thomas Leonard		http://rox.sourceforge.net
GPG: 9242 9807 C985 3C07 44A6  8B9A AE07 8280 59A5 3CC1

From guido at python.org  Mon Aug  6 23:33:34 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 6 Aug 2007 14:33:34 -0700
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46B78F86.9000505@v.loewis.de>
References: <46B637DD.7070905@v.loewis.de>
	<ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>
	<46B78F86.9000505@v.loewis.de>
Message-ID: <ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>

On 8/6/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> >> If that is not acceptable, please tell me how else
> >> to fix the dbm modules.
> >
> > By fixing the code that uses them?
>
> I don't know how to do that. All implementation strategies I
> can think of have significant drawbacks.

Can you elaborate about the issues?

> > By using str8 (perhaps renamed to
> > frozenbytes and certainly stripped of its locale-dependent APIs)?
>
> Ok. So you could agree to a frozenbytes type, then? I'll add one,
> reusing the implementation of the bytes object.

Not quite. It's the least evil. I'm hoping to put off the decision.

Could you start using str8 instead for now? Or is that not usable for
a fix? (If so, why not?)

> If that is done:
>
> a) should one of these be the base type of the other?

No. List and tuple don't inherit from each other, nor do set and
frozenset. A common base class is okay. (We didn't quite do this for
sets but it makes sense for Py3k to change this.)

> b) should bytes literals be regular or frozen bytes?

Regular -- set literals produce mutable sets, too.

> c) is it still ok to provide a .freeze() method on
>    bytes returning frozenbytes?

I'd rather use the same kind of API used between set and frozenset:
each constructor takes the other as argument.
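
As a minimal sketch of that pattern, this is how the existing set and
frozenset constructors convert in both directions. A frozenbytes type
does not exist at this point, so the set types stand in for it here.

```python
mutable = {1, 2, 3}

frozen = frozenset(mutable)   # frozenset constructor accepts a set
thawed = set(frozen)          # set constructor accepts a frozenset

thawed.add(4)                 # only the new mutable copy changes
print(sorted(frozen), sorted(thawed))
```

No .freeze() method is needed in this style; each conversion is an
explicit constructor call that makes a copy.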

> d) should unicode.defenc be frozen?

Yes. It's currently a str8 isn't it? So that's already the case.

> e) should (or may) codecs return frozenbytes?

I think it would be more convenient and consistent if all APIs
returned mutable bytes and the only API that creates frozen bytes was
the frozen bytes constructor. (defenc excepted as it's a C-level API
and having it be mutable would be bad.)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at v.loewis.de  Mon Aug  6 23:52:21 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 06 Aug 2007 23:52:21 +0200
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>
References: <46B637DD.7070905@v.loewis.de>	
	<ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>	
	<46B78F86.9000505@v.loewis.de>
	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>
Message-ID: <46B79815.1030504@v.loewis.de>

>> I don't know how to do that. All implementation strategies I
>> can think of have significant drawbacks.
> 
> Can you elaborate about the issues?

It's a decision tree:

0. whichdb fails

1. should the DB APIs use strings or bytes as keys and values?
   Given the discussion of bsddb, I went for "bytes". I replace

   f["1"] = b"1"
 with
   f[b"1"] = b"1"

2. then,  dumbdbm fails, with TypeError: keys must be strings.
   I change __setitem__ to expect bytes instead of basestring

3. it fails with unhashable type: 'bytes' in line 166:

   if key not in self._index:

   _index is a dictionary. It's really essential that the key
   can be found quickly in _index, since this is how it finds
   the data in the database (so using, say, a linear search would
   be no option)
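
Step 3 can be reproduced in a few lines. This is only a sketch: it uses
bytearray as a stand-in for the mutable bytes type of the struni branch
and the immutable bytes of released Python 3 as a stand-in for a frozen
variant; the (0, 1) value is an arbitrary placeholder for a position and
size pair.

```python
index = {}

mutable_key = bytearray(b"1")     # stand-in for the mutable bytes type
try:
    index[mutable_key] = (0, 1)
    hashable = True
except TypeError:                 # "unhashable type", as in step 3
    hashable = False

frozen_key = bytes(mutable_key)   # immutable copy, usable as a dict key
index[frozen_key] = (0, 1)
print(hashable, frozen_key in index)
```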

> Not quite. It's the least evil. I'm hoping to put off the decision.

For how long? Do you expect to receive further information that will
make a decision simpler?

> Could you start using str8 instead for now? Or is that not usable for
> a fix? (If so, why not?)

It should work, although I probably will have to fix the index file
generation in dumbdbm (either way), since it uses %r to generate
the index; this would put s prefixes into the file which won't be
understood on reading (it uses eval() to process the index, which
might need fixing, anyway)
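
A rough sketch of an index round trip that avoids eval() follows. The
one-pair-per-line format and the function names dump_index and
load_index are assumptions for illustration, not the dumbdbm code
itself, and the keys use the b'' repr of released Python 3 rather than
the s prefix mentioned above.

```python
import ast

def dump_index(index):
    """Serialise {key: (pos, size)} as one repr() pair per line."""
    return "".join("%r, %r\n" % (key, pos_size)
                   for key, pos_size in index.items())

def load_index(text):
    """Parse the lines back with ast.literal_eval instead of eval()."""
    index = {}
    for line in text.splitlines():
        key, pos_size = ast.literal_eval(line)   # literals only, no code
        index[key] = pos_size
    return index

original = {b"1": (0, 1), b"spam": (512, 7)}
restored = load_index(dump_index(original))
print(restored == original)
```

ast.literal_eval only accepts literals, so a damaged or hostile index
file cannot run arbitrary code on reading.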

> No. List and tuple don't inherit from each other, nor do set and
> frozenset. A common base class is okay. (We didn't quite do this for
> sets but it makes sense for Py3k to change this.)

Ok, so there would be basebytes, I assume.

>> d) should unicode.defenc be frozen?
> 
> Yes. It's currently a str8 isn't it? So that's already the case.

Right.

I think I will have to bite the bullet and use str8 explicitly,
although doing so makes me shudder.

Regards,
Martin

From guido at python.org  Tue Aug  7 00:33:57 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 6 Aug 2007 15:33:57 -0700
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46B79815.1030504@v.loewis.de>
References: <46B637DD.7070905@v.loewis.de>
	<ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>
	<46B78F86.9000505@v.loewis.de>
	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>
	<46B79815.1030504@v.loewis.de>
Message-ID: <ca471dc20708061533h76477676w265052f1a18bd005@mail.gmail.com>

On 8/6/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> I think I will have to bite the bullet and use str8 explicitly,
> although doing so makes me shudder.

This is the right short-term solution IMO. We'll rename or reconsider
later -- either closer to the a1 release or after.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Aug  7 01:55:18 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 6 Aug 2007 16:55:18 -0700
Subject: [Python-3000] Pleaswe help with the countdown to zero failing tests
	in the struni branch!
Message-ID: <ca471dc20708061655w62420da3m1d87ffa5eff669c6@mail.gmail.com>

We're down to 11 failing tests in the struni branch. I'd like to get
this down to zero ASAP so that we can retire the old p3yk (yes, with
typo!) branch and rename py3k-struni to py3k.

Please help! Here's the list of failing tests:

test_ctypes
Recently one test started failing again, after Martin changed
PyUnicode_FromStringAndSize() to use UTF8 instead of Latin1.

test_email
test_email_codecs
test_email_renamed
Can someone contact the email-sig and ask for help with these?

test_minidom
Recently started failing again; probably shallow.

test_sqlite
Virgin territory, probably best done by whoever wrote the code or at
least someone with time to spare.

test_tarfile
Virgin territory again (but different owner :-).

test_urllib2_localnet
test_urllib2net
I think Jeremy Hylton may be close to fixing these, he's done a lot of
work on urllib and httplib.

test_xml_etree_c
Virgin territory again.

There are also a few tests that only fail on CYGWIN or OSX; I won't
bother listing these.

If you want to help, please refer to this wiki page:
http://wiki.python.org/moin/Py3kStrUniTests

There are also other tasks; see http://wiki.python.org/moin/Py3kToDo

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From shiblon at gmail.com  Tue Aug  7 02:09:11 2007
From: shiblon at gmail.com (Chris Monson)
Date: Mon, 6 Aug 2007 20:09:11 -0400
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>
References: <46B637DD.7070905@v.loewis.de>
	<ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>
	<46B78F86.9000505@v.loewis.de>
	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>
Message-ID: <da3f900e0708061709v53d93f7m76519dd6cee92cc5@mail.gmail.com>

On 8/6/07, Guido van Rossum <guido at python.org> wrote:
>
> On 8/6/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > >> If that is not acceptable, please tell me how else
> > >> to fix the dbm modules.
> > >
> > > By fixing the code that uses them?
> >
> > I don't know how to do that. All implementation strategies I
> > can think of have significant drawbacks.
>
> Can you elaborate about the issues?
>
> > > By using str8 (perhaps renamed to
> > > frozenbytes and certainly stripped of its locale-dependent APIs)?
> >
> > Ok. So you could agree to a frozenbytes type, then? I'll add one,
> > reusing the implementation of the bytes object.
>
> Not quite. It's the least evil. I'm hoping to put off the decision.
>
> Could you start using str8 instead for now? Or is that not usable for
> a fix? (If so, why not?)
>
> > If that is done:
> >
> > a) should one of these be the base type of the other?
>
> No. List and tuple don't inherit from each other, nor do set and
> frozenset. A common base class is okay. (We didn't quite do this for
> sets but it makes sense for Py3k to change this.)
>
> > b) should bytes literals be regular or frozen bytes?
>
> Regular -- set literals produce mutable sets, too.


But all other string literals produce immutable types:

""
r""
u"" (going away, but still)
and hopefully b""

Wouldn't it be confusing to have b"" be the only mutable quote-delimited
literal?  For everything else, there's bytes().

:-)

- C

> c) is it still ok to provide a .freeze() method on
> >    bytes returning frozenbytes?
>
> I'd rather use the same kind of API used between set and frozenset:
> each constructor takes the other as argument.
>
> > d) should unicode.defenc be frozen?
>
> Yes. It's currently a str8 isn't it? So that's already the case.
>
> > e) should (or may) codecs return frozenbytes?
>
> I think it would be more convenient and consistent if all APIs
> returned mutable bytes and the only API that creates frozen bytes was
> the frozen bytes constructor. (defenc excepted as it's a C-level API
> and having it be mutable would be bad.)
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Aug  7 02:19:12 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 6 Aug 2007 17:19:12 -0700
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <da3f900e0708061709v53d93f7m76519dd6cee92cc5@mail.gmail.com>
References: <46B637DD.7070905@v.loewis.de>
	<ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>
	<46B78F86.9000505@v.loewis.de>
	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>
	<da3f900e0708061709v53d93f7m76519dd6cee92cc5@mail.gmail.com>
Message-ID: <ca471dc20708061719p3092e6bbu47c68bbfddc11012@mail.gmail.com>

On 8/6/07, Chris Monson <shiblon at gmail.com> wrote:
> On 8/6/07, Guido van Rossum <guido at python.org> wrote:
> > On 8/6/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > > b) should bytes literals be regular or frozen bytes?
> >
> > Regular -- set literals produce mutable sets, too.
>
> But all other string literals produce immutable types:
>
> ""
> r""
> u"" (going away, but still)
> and hopefully b""
>
> Wouldn't it be confusing to have b"" be the only mutable quote-delimited
> literal?  For everything else, there's bytes().

Well, it would be just as confusing to have a bytes literal and not
have it return a bytes object. The frozenbytes type is intended (if I
understand the use case correctly) for the relatively rare case
where bytes must be used as dict keys and we can't assume that the
bytes use any particular encoding.

Personally, I still think that converting to the latin-1 encoding is
probably just as good for this particular use case. So perhaps I don't
understand the use case(s?) correctly.

> :-)

What does the :-) mean? That you're not seriously objecting?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Aug  7 02:39:15 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 6 Aug 2007 17:39:15 -0700
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46B79815.1030504@v.loewis.de>
References: <46B637DD.7070905@v.loewis.de>
	<ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>
	<46B78F86.9000505@v.loewis.de>
	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>
	<46B79815.1030504@v.loewis.de>
Message-ID: <ca471dc20708061739w513fb601v8cf72b2eb2a490fd@mail.gmail.com>

On 8/6/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> >> I don't know how to do that. All implementation strategies I
> >> can think of have significant drawbacks.
> >
> > Can you elaborate about the issues?
>
> It's a decision tree:
>
> 0. whichdb fails
>
> 1. should the DB APIs use strings or bytes as keys and values?
>    Given the discussion of bsddb, I went for "bytes". I replace
>
>    f["1"] = b"1"
>  with
>    f[b"1"] = b"1"
>
> 2. then,  dumbdbm fails, with TypeError: keys must be strings.
>    I change __setitem__ to expect bytes instead of basestring
>
> 3. it fails with unhashable type: 'bytes' in line 166:
>
>    if key not in self._index:
>
>    _index is a dictionary. It's really essential that the key
>    can be found quickly in _index, since this is how it finds
>    the data in the database (so using, say, a linear search would
>    be no option)

I thought about this issue some more.

Given that the *dbm types strive for emulating dicts, I think it makes
sense to use strings for the keys, and bytes for the values; this
makes them more plug-compatible with real dicts. (We should ideally
also change the keys() method etc. to return views.) This of course
requires that we know the encoding used for the keys. Perhaps it would
be acceptable to pick a conservative default encoding (e.g. ASCII) and
add an encoding argument to the open() method.

Perhaps this will work? It seems better than using str8 or bytes for the keys.
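
To make the idea concrete, here is a minimal sketch of string keys layered
over a bytes-keyed store. The class name EncodedKeyDB and its constructor
signature are hypothetical, and a plain dict stands in for the real dbm
object.

```python
class EncodedKeyDB:
    """Hypothetical wrapper: str keys outside, bytes keys in the store."""

    def __init__(self, store, encoding="ascii"):
        self._store = store          # any mapping from bytes to bytes
        self._encoding = encoding    # conservative default, as suggested

    def __setitem__(self, key, value):
        self._store[key.encode(self._encoding)] = value

    def __getitem__(self, key):
        return self._store[key.encode(self._encoding)]

    def keys(self):
        return [k.decode(self._encoding) for k in self._store]

db = EncodedKeyDB({})
db["1"] = b"one"
print(db["1"], db.keys())
```

With the ASCII default, a key outside that range fails loudly with
UnicodeEncodeError instead of being stored under a guessed encoding.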

> > Not quite. It's the least evil. I'm hoping to put off the decision.
>
> For how long? Do you expect to receive further information that will
> make a decision simpler?

I'm waiting for a show-stopper issue that can't be solved without
having an immutable bytes type. It would be great if we could prove to
ourselves that such a show-stopper will never happen; or if we found
one quickly. But so far the show-stopper candidates aren't convincing.
At the same time we still have enough uses of str8 and PyString left
in the code base that we can't kill str8 yet.

It would be great if we had the decision before alpha 1 but I'm okay
if it remains open a bit longer (1-2 months past alpha 1).

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From kbk at shore.net  Tue Aug  7 02:49:04 2007
From: kbk at shore.net (Kurt B. Kaiser)
Date: Mon, 06 Aug 2007 20:49:04 -0400
Subject: [Python-3000] map() Returns Iterator
In-Reply-To: <ca471dc20708061118k57d10f16g6f6a57ce3a9f907e@mail.gmail.com>
	(Guido van Rossum's message of "Mon, 6 Aug 2007 11:18:23 -0700")
References: <87r6mjj34f.fsf@hydra.hampton.thirdcreek.com>
	<ca471dc20708061118k57d10f16g6f6a57ce3a9f907e@mail.gmail.com>
Message-ID: <87k5s8i2yn.fsf@hydra.hampton.thirdcreek.com>

"Guido van Rossum" <guido at python.org> writes:

>> I think of map() and filter() as sequence transformers.  To me, it's
>> an unexpected semantic change that the result is no longer a list.
>
> Well, enough people thought of them as iterables to request imap(),
> ifilter() and izip() added to the itertools library.

Agreed.  When processing (possibly very long) streams, the lazy versions
have great advantages.

3>> def ones():
        while True:
                yield 1

3>> a = map(lambda x: x+x, ones())
3>> b = map(lambda x: x+1, a)
3>> b.__next__()
3

If you try that in 2.6 you fill memory.

However, IMHO eliminating the strict versions of map() and filter() in
favor of the lazy versions from itertools kicks the degree of
sophistication necessary to understand these functions up a notch (or
three).

3>> c, d = map(int, ('1', '2'))
3>> c, d
(1, 2)
3>> e = map(int, ('1', '2'))
3>> f, g = e
3>> f, g
(1, 2)
3>> f, g = e
Traceback (most recent call last):
  File "<pyshell#15>", line 1, in <module>
    f, g = e
ValueError: need more than 0 values to unpack


To say nothing of having to remember to use

3>> foo = list(map(bar, baz))

most of the time.  I'd say keep map(), filter(), imap() and ifilter(), and
use the latter when you're working with streams.

"Explicit is better than implicit."

Then there's the silent failure to process the side-effects of

3>> map(print, lines)

which is rather unexpected.  To me, this code is quite readable and not
at all pathological (no more than any print statement :).  It may not be
Pythonic in the modern idiom (that pattern is found mostly in code.py
and IDLE, and it's very rare), but it's legal and it's a little
surprising that it's necessary to spell it

3>> list(map(print, lines))

now to get any action.  It took me a while to track down the failures in
the interactive interpreter emulator because that pattern was being used
to print the exceptions; the thing just produced no output at all.

The alternatives

3>> print('\n'.join(lines))

or

3>> (print(line) for line in lines)  # oops, nothing happened
3>> [print(line) for line in lines]

aren't much of an improvement.
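
For completeness, a small sketch of the silent failure and of the plain
loop spelling. The helper record() is made up for illustration and takes
the place of print so that the effect can be checked afterwards.

```python
calls = []

def record(line):
    """Side-effecting stand-in for print()."""
    calls.append(line)

lines = ["spam", "eggs"]

lazy = map(record, lines)     # builds an iterator; record() has not run
calls_before = list(calls)

for line in lines:            # the explicit loop runs record() right away
    record(line)
calls_after = list(calls)

print(calls_before, calls_after)
```

The iterator bound to lazy is never consumed here, so the only calls
that happen are the ones made by the loop.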

>> In existing Lib/ code, it's twice as likely that the result of map()
>> will be assigned than to use it as an iterator in a flow control
>> statement.
>
> Did you take into account the number of calls to imap()?

No.  Since the py3k branch is partially converted, I went back to 2.6,
where skipping Lib/test/, there are (approximately!!):

87  assignments of the output of map(), passing a list
21  assignments involving map(), but not directly.  Many of these involve
    'tuple' or 'join' and could accept an iterator.
58  return statements involving map() (39 directly)
1   use to construct a list used as an argument
2   for ... in map()   (!!)   and 1 for ... in enumerate(map(...))
1   use as map(foo, bar) == baz_list
5   uses of imap()

[...]

> We didn't write the 2to3 transform, but it's easier than some others
> we already did (e.g. keys()).

I see a transform in svn.  As an aside, is there any accepted process
for running these transforms over the p3yk branch?  Some parts of Lib/
are converted, possibly by hand, possibly by 2to3, and other parts are
not.

[...]

> When you have a lambda as the first argument the better translation is
> *definitely* a list comprehension, as it saves creating stack frames
> for the lambda calls.

Thanks, good tip.
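
A two-line sketch of that translation, with made-up sample data:

```python
values = [1, 2, 3]

via_map = list(map(lambda x: x + 1, values))   # one lambda call per item
via_listcomp = [x + 1 for x in values]         # expression evaluated inline

print(via_map == via_listcomp)
```

Both spellings build the same list; the comprehension avoids a function
call for every element.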

[...]

> Also, have you ever liked the behavior that filter returns a string if
> the input is a string, a tuple if the input is a tuple, and a list for
> all other cases? That really sucks IMO.

If you look at map() and filter() as sequence transformers, that makes
some sense: preserve the type of the sequence if possible.  But clearly
map() and filter() should act the same way!

>> zip() is infrequently used.
>
> It was especially designed for use in for-loops (to end the fruitless
> discussions trying to come up with parallel iteration syntax). If it
> wasn't for the fact that iterators hadn't been invented yet at the
> time, zip() would definitely have returned an iterator right from the
> start, just as enumerate().
>
>> However, IMO for consistency they should all act the same way.
>
> That's not much of consistency.

It's used only fifteen times in 2.6 Lib/ and four of those are
izip(). Eight are assignments, mostly to build dicts. Six are in
for-loops. One is a return.

>> Could something be done in the ast phase of compilation instead?
>
> Not likely, the compiler doesn't know enough about the state of builtins.

OK, thanks for the reply.

-- 
KBK

From greg.ewing at canterbury.ac.nz  Tue Aug  7 02:53:27 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 07 Aug 2007 12:53:27 +1200
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B6DABB.3080509@ronadam.com>
References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org>
	<46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org>
	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>
	<46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz>
	<46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com>
	<46B52422.2090006@canterbury.ac.nz>
	<ca471dc20708041909j3f86b107hace9407673c302ea@mail.gmail.com>
	<46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz>
	<46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz>
	<46B6DABB.3080509@ronadam.com>
Message-ID: <46B7C287.3030007@canterbury.ac.nz>

Ron Adam wrote:
>     {0:d,10+20}      # Field width are just string operations done after
>                      # formatting is done.
> 
>     {0:d10+20}       # Numeric widths,  differ from field widths.
>                      # They are specific to the type so can handle special
>                      # cases.

Still not good - if you confuse two very similar and easily
confusable things, your numbers get chopped off.

I'm with Guido on this one: only strings should have
a max width, and it should be part of the string format
spec, so you can't accidentally apply it to numbers.
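
As a sketch of what that looks like in practice, the following uses the
format() built-in as it behaves in released Python 3, not the draft
syntax quoted above. The assumption is that the precision part of the
spec plays the role of the max width.

```python
truncated = format("abcdefgh", ".3")   # precision acts as a max width for str
padded = format("ab", "5")             # min width pads, never truncates

try:
    format(1234, ".2d")                # precision on an integer is rejected
    int_precision_allowed = True
except ValueError:
    int_precision_allowed = False

print(repr(truncated), repr(padded), int_precision_allowed)
```

A number can therefore be padded but never silently chopped.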

--
Greg

From greg.ewing at canterbury.ac.nz  Tue Aug  7 02:57:13 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 07 Aug 2007 12:57:13 +1200
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B6DE80.2050000@ronadam.com>
References: <46B13ADE.7080901@acm.org>
	<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>
	<ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>
	<46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com>
	<46B35295.1030007@acm.org>
	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>
	<46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz>
	<46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com>
	<46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com>
	<46B66E7E.4060209@canterbury.ac.nz> <46B6A7E8.7040001@ronadam.com>
	<46B6C335.4080504@canterbury.ac.nz> <46B6DE80.2050000@ronadam.com>
Message-ID: <46B7C369.3040509@canterbury.ac.nz>

Ron Adam wrote:
> What should happen in various situations of mismatched or invalid type 
> specifiers?

I think that a format string that is not understood
by any part of the system should raise an exception
(rather than, e.g. falling back on str()). Refuse the
temptation to guess.
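
A minimal sketch of the behaviour argued for here, using the format()
built-in from released Python 3; "q" is just an arbitrary code that no
type understands.

```python
try:
    format(42, "q")        # no such format code for int
    raised = False
except ValueError:
    raised = True

print(raised)
```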

--
Greg

From greg.ewing at canterbury.ac.nz  Tue Aug  7 03:30:29 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 07 Aug 2007 13:30:29 +1200
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <ca471dc20708061719p3092e6bbu47c68bbfddc11012@mail.gmail.com>
References: <46B637DD.7070905@v.loewis.de>
	<ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>
	<46B78F86.9000505@v.loewis.de>
	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>
	<da3f900e0708061709v53d93f7m76519dd6cee92cc5@mail.gmail.com>
	<ca471dc20708061719p3092e6bbu47c68bbfddc11012@mail.gmail.com>
Message-ID: <46B7CB35.5020204@canterbury.ac.nz>

Guido van Rossum wrote:
> The frozenbytes type is intended (if I
> understand the use case correctly) as for the relatively rare case
> where bytes must be used as dict keys

Another issue I can see is that not having a frozen
bytes literal means that there is no efficient way of
embedding constant bytes data in a program. You
end up with extra overhead in both time and space
(two copies of the data in memory, and extra time
needed to make the copy).

If the literal form is frozen, on the other hand,
you only incur these overheads when you really need
a mutable copy of the data.
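
The difference can be sketched with the types of released Python 3,
where bytes literals are immutable and bytearray is the mutable type.
The identity checks below rely on CPython storing the literal as a
constant in the code object, which is an implementation detail rather
than a language guarantee; the function names are made up.

```python
def frozen_table():
    return b"\x00\x01\x02\x03"              # constant stored once

def mutable_table():
    return bytearray(b"\x00\x01\x02\x03")   # fresh copy on every call

same_object = frozen_table() is frozen_table()
fresh_copy = mutable_table() is not mutable_table()

print(same_object, fresh_copy)
```

With a frozen literal the copy is made only by code that asks for a
mutable object.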

--
Greg

From talin at acm.org  Tue Aug  7 03:40:28 2007
From: talin at acm.org (Talin)
Date: Mon, 06 Aug 2007 18:40:28 -0700
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B6DABB.3080509@ronadam.com>
References: <46B13ADE.7080901@acm.org>
	<46B2D147.90606@acm.org>	<46B2E265.5080905@ronadam.com>
	<46B35295.1030007@acm.org>	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>	<46B39BBF.80809@ronadam.com>
	<46B46DE1.1090403@canterbury.ac.nz>	<46B4950F.40905@ronadam.com>
	<46B4A9A0.9070206@ronadam.com>	<46B52422.2090006@canterbury.ac.nz>	<ca471dc20708041909j3f86b107hace9407673c302ea@mail.gmail.com>	<46B568F3.9060105@ronadam.com>
	<46B66BE0.7090005@canterbury.ac.nz>	<46B6851C.1030204@ronadam.com>
	<46B6C219.4040900@canterbury.ac.nz> <46B6DABB.3080509@ronadam.com>
Message-ID: <46B7CD8C.5070807@acm.org>

Ron Adam wrote:
> Now here's the problem with all of this.  As we add the widths back into 
> the format specifications, we are basically saying the idea of a separate 
> field width specifier is wrong.
> 
> So maybe it's not really a separate independent thing after all, and it 
> is just a convenient grouping for readability purposes only.

I'm beginning to suspect that this is indeed the case.

Before we go too much further, let me give out the URLs for the .Net 
documentation on these topics, since much of the current design we're 
discussing has been inspired by .Net:

http://msdn2.microsoft.com/en-us/library/dwhawy9k.aspx
http://msdn2.microsoft.com/en-us/library/0c899ak8.aspx
http://msdn2.microsoft.com/en-us/library/0asazeez.aspx
http://msdn2.microsoft.com/en-us/library/c3s1ez6e.aspx
http://msdn2.microsoft.com/en-us/library/az4se3k1.aspx
http://msdn2.microsoft.com/en-us/library/txafckwd.aspx

I'd suggest some study of these, although I would warn against adopting 
this wholesale: there are a huge number of features described in 
these documents, more than I think we need.

One other URL for people who want to play around with implementing this 
stuff is my Python prototype of the original version of the PEP. It has 
all the code you need to format floats with decimal precision, 
exponents, and so on:

http://www.viridia.org/hg/python/string_format?f=5e4b833ed285;file=StringFormat.py;style=raw

-- Talin

From fdrake at acm.org  Tue Aug  7 03:41:40 2007
From: fdrake at acm.org (Fred Drake)
Date: Mon, 6 Aug 2007 21:41:40 -0400
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46B7CB35.5020204@canterbury.ac.nz>
References: <46B637DD.7070905@v.loewis.de>
	<ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>
	<46B78F86.9000505@v.loewis.de>
	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>
	<da3f900e0708061709v53d93f7m76519dd6cee92cc5@mail.gmail.com>
	<ca471dc20708061719p3092e6bbu47c68bbfddc11012@mail.gmail.com>
	<46B7CB35.5020204@canterbury.ac.nz>
Message-ID: <B8EB32BA-85A7-47AE-BCFE-EDDF29E4C9F6@acm.org>

On Aug 6, 2007, at 9:30 PM, Greg Ewing wrote:
> If the literal form is frozen, on the other hand,
> you only incur these overheads when you really need
> a mutable copy of the data.

Indeed.  I have no reason to think the desire for a frozen form is  
the oddball case; I suspect it will be much more common than the need  
for mutable bytes objects from literal data in my own code.


   -Fred

-- 
Fred Drake   <fdrake at acm.org>




From talin at acm.org  Tue Aug  7 03:43:03 2007
From: talin at acm.org (Talin)
Date: Mon, 06 Aug 2007 18:43:03 -0700
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46B701E5.3030206@gmail.com>
References: <46B637DD.7070905@v.loewis.de> <46B701E5.3030206@gmail.com>
Message-ID: <46B7CE27.3030103@acm.org>

Nick Coghlan wrote:
> Martin v. Löwis wrote:
>> I don't think it needs to be a separate type,
>> instead, bytes objects could have an idempotent
>> .freeze() method which switches the "immutable"
>> bit on. There would be no way to switch it off
>> again.
> 
> +1 here - hashable byte sequences are very handy for dealing with 
> fragments of low level serial protocols.
> 
> It would also be nice if b"" literals set that immutable flag 
> automatically - otherwise converting some of my lookup tables over to 
> Py3k would be a serious pain (not a pain I'm likely to have to deal with 
> personally given the relative time frames involved, but a pain nonetheless).


I'm also for an immutable bytes type - but I'm not so sure about 
freezing in place.

The most efficient representation of immutable bytes is quite different 
from the most efficient representation of mutable bytes.

Rather, I think that they should both share a common Abstract Base Class 
defining what you can do with immutable byte strings, but the actual 
storage of the bytes themselves should be implemented in the subclass.
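
Roughly what I have in mind, as a sketch only: the names BaseBytes,
FrozenBytes and MutableBytes are hypothetical, and tuple and list stand
in for whatever storage each type would really use.

```python
from abc import ABC, abstractmethod

class BaseBytes(ABC):
    """Hypothetical read-only interface shared by both types."""

    @abstractmethod
    def _data(self):
        """Return the stored bytes as a sequence of ints."""

    def __len__(self):
        return len(self._data())

    def __getitem__(self, i):
        return self._data()[i]

    def __eq__(self, other):
        return (isinstance(other, BaseBytes)
                and list(self._data()) == list(other._data()))

class FrozenBytes(BaseBytes):
    def __init__(self, data):
        self._store = tuple(data)      # immutable storage

    def _data(self):
        return self._store

    def __hash__(self):
        return hash(self._store)

class MutableBytes(BaseBytes):
    __hash__ = None                    # mutable, so deliberately unhashable

    def __init__(self, data):
        self._store = list(data)       # growable storage

    def _data(self):
        return self._store

    def __setitem__(self, i, value):
        self._store[i] = value

f = FrozenBytes(b"ab")
m = MutableBytes(b"ab")
print(f == m, len(f), {f: "usable as a dict key"}[FrozenBytes(b"ab")])
```

Neither concrete class inherits from the other, and only the frozen one
is hashable.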

-- Talin

From mike.klaas at gmail.com  Tue Aug  7 03:57:07 2007
From: mike.klaas at gmail.com (Mike Klaas)
Date: Mon, 6 Aug 2007 18:57:07 -0700
Subject: [Python-3000]  Immutable bytes type and dbm modules
References: <D3DD7578-D515-4F46-A3F3-B96A43AB1FBB@gmail.com>
Message-ID: <B24E6FF9-3080-4494-A377-BE68B9949A52@gmail.com>

On 6-Aug-07, at 5:39 PM, Guido van Rossum wrote:

>
> I thought about this issue some more.
>
> Given that the *dbm types strive for emulating dicts, I think it makes
> sense to use strings for the keys, and bytes for the values; this
> makes them more plug-compatible with real dicts. (We should ideally
> also change the keys() method etc. to return views.) This of course
> requires that we know the encoding used for the keys. Perhaps it would
> be acceptable to pick a conservative default encoding (e.g. ASCII) and
> add an encoding argument to the open() method.
>
> Perhaps this will work? It seems better than using str8 or bytes  
> for the keys.

There are some scenarios that might be difficult under such a regime.

The berkeley api provides means for efficiently mapping a bytestring  
to another bytestring.  Often, the data is not text, and the  
performance of the database is sensitive to the means of serialization.

For instance, it is quite common to use integers as keys.  If you are  
inserting keys in order, it is about a hundred times faster to encode  
the ints in big-endian byte order than little-endian:

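import struct

class MyIntDB(object):
    # self.db, serializer and unserializer are assumed to be defined elsewhere
    def __setitem__(self, key, item):
        # '>Q' packs the key as a big-endian unsigned 64-bit integer
        self.db.put(struct.pack('>Q', key), serializer(item))

    def __getitem__(self, key):
        return unserializer(self.db.get(struct.pack('>Q', key)))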

How do you envision these types of tasks being accomplished with  
unicode keys?  It is conceivable that one could write a custom  
unicode encoding that accomplishes this, convert the key to unicode,  
and pass the custom encoding name to the constructor.
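
A short sketch of why the byte order matters for ordered inserts: with
fixed-width big-endian packing, the byte strings sort in the same order
as the integers, while little-endian packing scrambles that order. The
sample keys are arbitrary, and this shows only the ordering property,
not the timing figure quoted above.

```python
import struct

keys = [1, 255, 256, 65536]           # already in increasing order

big = [struct.pack(">Q", k) for k in keys]
little = [struct.pack("<Q", k) for k in keys]

big_sorted = sorted(big) == big           # byte order matches integer order
little_sorted = sorted(little) == little  # it does not for these keys

print(big_sorted, little_sorted)
```

This prints True False. Presumably the speed difference comes from the
database seeing sequential keys in the big-endian case.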

regards,
-Mike


From guido at python.org  Tue Aug  7 04:06:34 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 6 Aug 2007 19:06:34 -0700
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <B24E6FF9-3080-4494-A377-BE68B9949A52@gmail.com>
References: <D3DD7578-D515-4F46-A3F3-B96A43AB1FBB@gmail.com>
	<B24E6FF9-3080-4494-A377-BE68B9949A52@gmail.com>
Message-ID: <ca471dc20708061906w5957811evfc02cf9f04d7cfde@mail.gmail.com>

On 8/6/07, Mike Klaas <mike.klaas at gmail.com> wrote:
> On 6-Aug-07, at 5:39 PM, Guido van Rossum wrote:
> > Given that the *dbm types strive for emulating dicts, I think it makes
> > sense to use strings for the keys, and bytes for the values; this
> > makes them more plug-compatible with real dicts. (We should ideally
> > also change the keys() method etc. to return views.) This of course
> > requires that we know the encoding used for the keys. Perhaps it would
> > be acceptable to pick a conservative default encoding (e.g. ASCII) and
> > add an encoding argument to the open() method.
> >
> > Perhaps this will work? It seems better than using str8 or bytes
> > for the keys.
>
> There are some scenarios that might be difficult under such a regime.
>
> The berkeley api provides means for efficiently mapping a bytestring
> to another bytestring.  Often, the data is not text, and the
> performance of the database is sensitive to the means of serialization.
>
> For instance, it is quite common to use integers as keys.  If you are
> inserting keys in order, it is about a hundred times faster to encode
> the ints in big-endian byte order than little-endian:

I'm assuming that this speed difference says something about the
implementation of the underlying dbm package. Which package did you
use to measure this?

> class MyIntDB(object):
>     def __setitem__(self, key, item):
>         self.db.put(struct.pack('>Q', key), serializer(item))
>     def __getitem__(self, key):
>         return unserializer(self.db.get(struct.pack('>Q', key)))
>
> How do you envision these types of tasks being accomplished with
> unicode keys?  It is conceivable that one could write a custom
> unicode encoding that accomplishes this, convert the key to unicode,
> and pass the custom encoding name to the constructor.

Well, the *easiest* (I don't know about simplest) way to use ints as
keys is of course to use the decimal representation. You'd use
str(key) instead of struct.pack(). This would of course not maintain
key order -- is that important? If you need to be compatible with
struct.pack(), and we were to choose Unicode strings for the keys in
the API, then you might have to do something like
struct.pack(...).encode("latin-1") and specify latin-1 as the
database's key encoding.

Of course this may not be compatible with an external constraint (e.g.
another application that already has a key format) but in that case
you may have to use arbitrary tricks anyway (the latin-1 encoding
might still be helpful).
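
A minimal sketch of the latin-1 trick, assuming the bytes/str split as it
later shipped in Python 3, where turning packed bytes into a string is
spelled decode() and the reverse is encode(). The helper names are
hypothetical and only illustrate the round trip.

```python
import struct

def int_to_text_key(n):
    # Pack as 8 big-endian bytes, then map each byte value 0-255
    # to the code point with the same number.
    return struct.pack('>Q', n).decode('latin-1')

def text_key_to_int(key):
    # latin-1 encodes code points 0-255 back to the identical bytes.
    return struct.unpack('>Q', key.encode('latin-1'))[0]

samples = (0, 1, 255, 256, 2**40)
keys = [int_to_text_key(n) for n in samples]

# Every value survives the round trip through text.
round_trip_ok = all(text_key_to_int(k) == n for k, n in zip(keys, samples))

# Strings compare by code point, so the big-endian ordering carries over.
order_kept = keys == sorted(keys)
```

Because latin-1 maps bytes to code points one for one, nothing is lost,
but the resulting strings are meaningless as text.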

However, I give you that a pure bytes API would be more convenient at times.

How about we define two APIs, one using raw bytes and one using strings +
a given encoding?

Or perhaps a special value of the encoding argument passed to
*dbm.open() (maybe None, maybe the default, maybe "raw" or "bytes"?)
to specify that the key values are to be bytes?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Aug  7 04:22:40 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 6 Aug 2007 19:22:40 -0700
Subject: [Python-3000] map() Returns Iterator
In-Reply-To: <87k5s8i2yn.fsf@hydra.hampton.thirdcreek.com>
References: <87r6mjj34f.fsf@hydra.hampton.thirdcreek.com>
	<ca471dc20708061118k57d10f16g6f6a57ce3a9f907e@mail.gmail.com>
	<87k5s8i2yn.fsf@hydra.hampton.thirdcreek.com>
Message-ID: <ca471dc20708061922g4e8babceh85dc2687beae8978@mail.gmail.com>

On 8/6/07, Kurt B. Kaiser <kbk at shore.net> wrote:
> "Guido van Rossum" <guido at python.org> writes:
[...pushback...]

> However, IMHO eliminating the strict versions of map() and filter() in
> favor of the lazy versions from itertools kicks the degree of
> sophistication necessary to understand these functions up a notch (or
> three).

I wonder how bad this is given that range() and dict.keys() and
friends will also stop returning lists? I don't think you ever saw any
of my Py3k presentations (the slides of the latest one are here:
http://conferences.oreillynet.com/presentations/os2007/os_vanrossum.ppt).
I've always made a point of suggesting that we're switching to
returning iterators instead of lists from as many APIs as makes sense
(I stop at str.split() though, as I can't think of a use case where
the list would be so big as to be bothersome).

> To say nothing of remembering having to use
>
> 3>> foo = (list(map(bar)))
>
> most of the time.

I think you're overreacting due to your experience with conversion of
existing code. I expect that new use cases where a list is needed will
generally be written using list comprehensions in Py3k-specific
tutorials, and generator expressions for situations where a list isn't
needed (as a slightly more advanced feature). Then map() and filter()
can be shown as more advanced optimizations of certain end cases.

> I'd say keep map(), filter(), imap() and ifilter(), and
> use the latter when you're working with streams.
>
> "Explicit is better than implicit."
>
> Then there's the silent failure to process the side-effects of
>
> 3>> map(print, lines)
>
> which is rather unexpected.  To me, this code is quite readable and not
> at all pathological (no more than any print statement :).  It may not be
> Pythonic in the modern idiom (that pattern is found mostly in code.py
> and IDLE, and it's very rare), but it's legal and it's a little
> surprising that it's necessary to spell it
>
> 3>> list(map(print, lines))
>
> now to get any action.

Aren't you a little too fond of this idiom? I've always found it a
little surprising when I encountered it, and replaced it with the more
straightforward

 for line in lines:
    print(line)
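
To make the difference concrete, here is a small sketch, assuming map()
returns a lazy iterator as proposed. The record() helper is a hypothetical
stand-in for print() so that the side effects can be inspected afterwards.

```python
seen = []

def record(line):
    # Stand-in for print(): a side effect we can observe later.
    seen.append(line)

lines = ["a", "b", "c"]

m = map(record, lines)   # builds an iterator; record() has not run yet
before = list(seen)

for line in lines:       # the plain loop runs the side effect right away
    record(line)
after_loop = list(seen)

list(m)                  # consuming the iterator finally triggers the calls
after_consume = list(seen)
```

Nothing is recorded until the iterator is consumed, which is the silent
failure described above; the for-loop has no such trap.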

> It took me awhile to track down the failures in
> the interactive interpreter emulator because that pattern was being used
> to print the exceptions; the thing just produced no output at all.

I think that's just a side effect of the conversion. I take it you
didn't use 2to3?

> The alternatives
>
> 3>> print('\n'.join(lines))
>
> or
>
> 3>> (print(line) for line in lines)  # oops, nothing happened
> 3>> [print(line) for line in lines]
>
> aren't much of an improvement.

Well duh. Really. What's wrong with writing it as a plain old for-loop?

> >> In existing Lib/ code, it's twice as likely that the result of map()
> >> will be assigned than to use it as an iterator in a flow control
> >> statement.
> >
> > Did you take into account the number of calls to imap()?
>
> No.  Since the py3k branch is partially converted, I went back to 2.6,
> where skipping Lib/test/, there are (approximately!!):
>
> 87  assignments of the output of map(), passing a list
> 21  assignments involving map(), but not directly.  Many of these involve
>     'tuple' or 'join' and could accept an iterator.
> 58  return statements involving map() (39 directly)
> 1   use to construct a list used as an argument
> 2   for ... in map()   (!!)   and 1 for ... in enumerate(map(...))
> 1   use as map(foo, bar) == baz_list
> 5   uses of imap()

I'm not sure what the relevance of assignments is. I can assign an
iterator to a variable and do stuff with it and never require it to be
a list. I can also pass a map() call into a function and then it
depends on what the function does to that argument.

> [...]
>
> > We didn't write the 2to3 transform, but it's easier than some others
> > we already did (e.g. keys()).
>
> I see a transform in svn.

I guess I didn't look well enough.

> As an aside, is there any accepted process
> for running these transforms over the p3yk branch?  Some parts of Lib/
> are converted, possibly by hand, possibly by 2to3, and other parts are
> not.

(Aside: Please skip the p3yk branch and use the py3k-struni branch --
it's the way of the future.)

I tend to do manual conversion of the stdlib because it's on the
bleeding edge. At times I've regretted this, and gone back and run a
particular transform over some of the code. I rarely use the full set
of transforms on a whole subtree, although others sometimes do that.
Do note the options that help convert doctests and deal with print()
already being a function.

[zip()]
> It's used only fifteen times in 2.6 Lib/ and four of those are
> izip(). Eight are assignments, mostly to build dicts.

I don't understand. What's an "assignment" to build a dict? Do you
mean something like

  dict(zip(keys, values))

? That's an ideal use case for an iterator.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From mike.klaas at gmail.com  Tue Aug  7 04:47:37 2007
From: mike.klaas at gmail.com (Mike Klaas)
Date: Mon, 6 Aug 2007 19:47:37 -0700
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <ca471dc20708061906w5957811evfc02cf9f04d7cfde@mail.gmail.com>
References: <D3DD7578-D515-4F46-A3F3-B96A43AB1FBB@gmail.com>
	<B24E6FF9-3080-4494-A377-BE68B9949A52@gmail.com>
	<ca471dc20708061906w5957811evfc02cf9f04d7cfde@mail.gmail.com>
Message-ID: <DB2ED88C-7254-4F93-9AF5-39BA6BF8F29E@gmail.com>


On 6-Aug-07, at 7:06 PM, Guido van Rossum wrote:

> On 8/6/07, Mike Klaas <mike.klaas at gmail.com> wrote:

>> For instance, it is quite common to use integers as keys.  If you are
>> inserting keys in order, it is about a hundred times faster to encode
>> the ints in big-endian byte order than little-endian:
>
> I'm assuming that this speed difference says something about the
> implementation of the underlying dbm package. Which package did you
> use to measure this?

This is true for bsddb backed by Berkeley DB, but it should be true  
to some extent in any btree-based database.

btrees are much more efficiently-constructed if built in key-order  
(especially when they don't fit in memory), and the difference stems  
purely from the nature of the representation: the little-endian byte  
representation of sorted integers is no longer sorted.  The big-endian
representation preserves the sort order.
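
A quick sketch of that claim using only the struct module; the range of
integers is an arbitrary choice for illustration.

```python
import struct

ints = list(range(0, 1000, 7))   # already in ascending order

big = [struct.pack('>Q', n) for n in ints]      # big-endian keys
little = [struct.pack('<Q', n) for n in ints]   # little-endian keys

# Byte strings compare lexicographically, which is how a btree
# would order them by default.
big_sorted = big == sorted(big)
little_sorted = little == sorted(little)
```

With big-endian packing the keys arrive in btree order; with little-endian
packing the low byte comes first, so 259 (03 01 ...) sorts before
252 (fc 00 ...).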

>> class MyIntDB(object):
>>     def __setitem__(self, key, item):
>>         self.db.put(struct.pack('>Q', key), serializer(item))
>>     def __getitem__(self, key):
>>         return unserializer(self.db.get(struct.pack('>Q', key)))
>>
>> How do you envision these types of tasks being accomplished with
>> unicode keys?  It is conceivable that one could write a custom
>> unicode encoding that accomplishes this, convert the key to unicode,
>> and pass the custom encoding name to the constructor.
>
> Well, the *easiest* (I don't know about simplest) way to use ints as
> keys is of course to use the decimal representation. You'd use
> str(key) instead of struct.pack(). This would of course not maintain
> key order -- is that important? If you need to be compatible with
> struct.pack(), and we were to choose Unicode strings for the keys in
> the API, then you might have to do something like
> struct.pack(...).encode("latin-1") and specify latin-1 as the
> database's key encoding.

The decimal representation would work if it were left-padded  
appropriately, though it would be somewhat space-inefficient.  The  
second option you propose is likely the most feasible.
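
As a rough illustration of the padding point (the width of 20 is an
assumption chosen because the largest 64-bit unsigned value has 20
decimal digits):

```python
ints = [2, 10, 100, 99]

# Plain decimal strings sort lexicographically, not numerically.
plain = sorted(str(n) for n in ints)

# Left-padding with zeros to a fixed width restores numeric order.
padded = sorted(str(n).zfill(20) for n in ints)

plain_as_ints = [int(s) for s in plain]
padded_as_ints = [int(s) for s in padded]

# Space cost per key: 20 characters versus 8 packed bytes.
key_width = len(str(2**64 - 1))
```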

> Of course this may not be compatible with an external constraint (e.g.
> another application that already has a key format) but in that case
> you may have to use arbitrary tricks anyway (the latin-1 encoding
> might still be helpful).
>
> However, I give you that a pure bytes API would be more convenient  
> at times.
>
> How about we define two APIs, one using raw bytes and one using strings +
> a given encoding?
>

> Or perhaps a special value of the encoding argument passed to
> *dbm.open() (maybe None, maybe the default, maybe "raw" or "bytes"?)
> to specify that the key values are to be bytes?

Either option sounds fine, but you're still left with the need to  
implement the raw bytes version in dumbdbm.

ISTM that this issue boils down to the question of if and how byte  
sequences can be hashed in py3k.  Assuming you are trying to  
implement a file parser that dispatches to various methods based on  
binary data, what is the pythonic way to do this in py3k?

One option is to .decode('latin-1') and dispatch on the (meaningless)  
text.  Another is for something like str8 to be kept around.  Yet  
another is to use non-hash-based datastructures (like trees) to  
implement these algorithms.

-Mike

From rrr at ronadam.com  Tue Aug  7 04:53:26 2007
From: rrr at ronadam.com (Ron Adam)
Date: Mon, 06 Aug 2007 21:53:26 -0500
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B7C369.3040509@canterbury.ac.nz>
References: <46B13ADE.7080901@acm.org>	<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>	<ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>	<46B2D147.90606@acm.org>
	<46B2E265.5080905@ronadam.com>	<46B35295.1030007@acm.org>	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>	<46B39BBF.80809@ronadam.com>
	<46B46DE1.1090403@canterbury.ac.nz>	<46B4950F.40905@ronadam.com>
	<46B4A9A0.9070206@ronadam.com>	<46B54F51.40705@acm.org>
	<46B59F05.3070200@ronadam.com>	<46B66E7E.4060209@canterbury.ac.nz>
	<46B6A7E8.7040001@ronadam.com>	<46B6C335.4080504@canterbury.ac.nz>
	<46B6DE80.2050000@ronadam.com> <46B7C369.3040509@canterbury.ac.nz>
Message-ID: <46B7DEA6.5050609@ronadam.com>



Greg Ewing wrote:
> Ron Adam wrote:
>> What should happen in various situations of mismatched or invalid type 
>> specifiers?
> 
> I think that a format string that is not understood
> by any part of the system should raise an exception
> (rather than, e.g. falling back on str()). Refuse the
> temptation to guess.

That handles invalid type specifiers.


What about mismatched specifiers?

Try to convert the data?

Raise an exception?

Either depending on what the type specifier is?


I think the opinion so far is to let the object's __format__ method
determine this, but we need to figure out what the built-in types will do.
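
For reference, a small sketch of how the built-in types in later Python 3
releases ended up answering this: some mismatches are converted, others
raise. The try_format() helper is hypothetical and only collects the
outcomes.

```python
def try_format(value, spec):
    # Return the formatted text, or a marker if the type rejects the spec.
    try:
        return format(value, spec)
    except ValueError:
        return "ValueError"

int_as_float = try_format(3, 'f')     # int accepts a float presentation type
int_as_str = try_format(3, 's')       # int rejects the string presentation type
str_as_int = try_format('abc', 'd')   # str rejects the integer presentation type
```

So the answer there is "either, depending on what the type specifier is",
decided by each type's __format__ method.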


Ron



From martin at v.loewis.de  Tue Aug  7 05:13:43 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 07 Aug 2007 05:13:43 +0200
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <ca471dc20708061719p3092e6bbu47c68bbfddc11012@mail.gmail.com>
References: <46B637DD.7070905@v.loewis.de>	
	<ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>	
	<46B78F86.9000505@v.loewis.de>	
	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>	
	<da3f900e0708061709v53d93f7m76519dd6cee92cc5@mail.gmail.com>
	<ca471dc20708061719p3092e6bbu47c68bbfddc11012@mail.gmail.com>
Message-ID: <46B7E367.5060409@v.loewis.de>

> Personally, I still think that converting to the latin-1 encoding is
> probably just as good for this particular use case. So perhaps I don't
> understand the use case(s?) correctly.

No, it rather means that this solution didn't occur to me. It's a bit
expensive, since every access (getitem or setitem) will cause a
recoding, if the parameters are required to be bytes - but so would any
other solution that you can accept (i.e. use str8, use a separate
frozenbytes) - they all require that you copy the key parameter in
setitem/getitem.

So this sounds better than using str8.

Regards,
Martin

From martin at v.loewis.de  Tue Aug  7 05:27:40 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 07 Aug 2007 05:27:40 +0200
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <ca471dc20708061739w513fb601v8cf72b2eb2a490fd@mail.gmail.com>
References: <46B637DD.7070905@v.loewis.de>	
	<ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>	
	<46B78F86.9000505@v.loewis.de>	
	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>	
	<46B79815.1030504@v.loewis.de>
	<ca471dc20708061739w513fb601v8cf72b2eb2a490fd@mail.gmail.com>
Message-ID: <46B7E6AC.3020102@v.loewis.de>

> I thought about this issue some more.
> 
> Given that the *dbm types strive for emulating dicts, I think it makes
> sense to use strings for the keys, and bytes for the values; this
> makes them more plug-compatible with real dicts. (We should ideally
> also change the keys() method etc. to return views.) This of course
> requires that we know the encoding used for the keys. Perhaps it would
> be acceptable to pick a conservative default encoding (e.g. ASCII) and
> add an encoding argument to the open() method.
> 
> Perhaps this will work? It seems better than using str8 or bytes for the keys.

It would work, but it would not be good. The dbm files traditionally did
not have any notion of character encoding for keys or values; they are
really bytes:bytes mappings. The encoding used for the keys might not
be known, or it might not be consistent across all keys.

Furthermore, for the specific case of bsddb, some users pointed out that
they absolutely think that keys must be bytes, since they *conceptually*
aren't text at all. "Big" users of bsddb create databases where some
tables are index tables for other tables; in such tables, the keys are
combinations of fields where the byte representation allows for
efficient lookup (akin to postgres "create index foo_idx on foo(f1, f2,
f3);" where the key to the index becomes the concatenation of f1, f2,
and f3 - and f2 may be INTEGER, f3 TIMESTAMP WITHOUT TIME ZONE, say).

It's always possible to treat these as if they were latin-1, but this
is so unnaturally hacky that I didn't think of it.

Regards,
Martin


From martin at v.loewis.de  Tue Aug  7 05:29:33 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 07 Aug 2007 05:29:33 +0200
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46B7CE27.3030103@acm.org>
References: <46B637DD.7070905@v.loewis.de> <46B701E5.3030206@gmail.com>
	<46B7CE27.3030103@acm.org>
Message-ID: <46B7E71D.1050705@v.loewis.de>

> The most efficient representation of immutable bytes is quite different
> from the most efficient representation of mutable bytes.

In what way?

Curious,
Martin

From martin at v.loewis.de  Tue Aug  7 05:41:58 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 07 Aug 2007 05:41:58 +0200
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <ca471dc20708061906w5957811evfc02cf9f04d7cfde@mail.gmail.com>
References: <D3DD7578-D515-4F46-A3F3-B96A43AB1FBB@gmail.com>	<B24E6FF9-3080-4494-A377-BE68B9949A52@gmail.com>
	<ca471dc20708061906w5957811evfc02cf9f04d7cfde@mail.gmail.com>
Message-ID: <46B7EA06.5040106@v.loewis.de>

> Or perhaps a special value of the encoding argument passed to
> *dbm.open() (maybe None, maybe the default, maybe "raw" or "bytes"?)
> to specify that the key values are to be bytes?

This is essentially the state of the bsddb module in the struni branch
right now. The default is bytes keys and values; if you want string
keys, you write

   db = bsddb.open(...)
   db = bsddb.StringKeys(db)

which arranges for transparent UTF-8 encoding; it would be possible to
extend this to

   db = bsddb.open(...)
   db = bsddb.StringKeys(db, encoding="latin-1")

However, this has the view that there is a single "proper" key
representation, which is bytes, and then reinterpretations.

Now if you say that the dbm files are dicts conceptually, and
bytes are not allowed as dict keys, then any API that allows
for bytes as dbm keys (whether by default or as an option) is
conceptually inconsistent - as you now do have dict-like objects
which use bytes keys. This causes confusion if you pass one of
them to, say, .update of a "real" dict, which then fails. IOW,
I couldn't do

   d = {}
   d.update(db)

if db is in the "keys are bytes" mode.
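
A sketch of that failure, with assumptions stated up front: bytearray
stands in for the mutable bytes type under discussion here, and
BytesKeyDB is a hypothetical stand-in for a dbm object, not a real
bsddb class.

```python
class BytesKeyDB:
    """Hypothetical dbm-like object whose keys are mutable byte arrays."""

    def __init__(self):
        self._items = [(bytearray(b'key1'), b'value1')]

    def keys(self):
        return [k for k, _ in self._items]

    def __getitem__(self, key):
        for k, v in self._items:
            if k == key:
                return v
        raise KeyError(key)

db = BytesKeyDB()
d = {}
try:
    d.update(db)       # dict must hash every key it receives
    failed = False
except TypeError:      # mutable byte arrays are unhashable
    failed = True
```

In the Python 3 that eventually shipped, bytes became immutable and
hashable, so this only illustrates the mutable case.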

Regards,
Martin

From martin at v.loewis.de  Tue Aug  7 05:45:00 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 07 Aug 2007 05:45:00 +0200
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <B24E6FF9-3080-4494-A377-BE68B9949A52@gmail.com>
References: <D3DD7578-D515-4F46-A3F3-B96A43AB1FBB@gmail.com>
	<B24E6FF9-3080-4494-A377-BE68B9949A52@gmail.com>
Message-ID: <46B7EABC.1060909@v.loewis.de>

> For instance, it is quite common to use integers as keys.  If you are  
> inserting keys in order, it is about a hundred times faster to encode  
> the ints in big-endian byte order than little-endian:
> 
> class MyIntDB(object):
>     def __setitem__(self, key, item):
>         self.db.put(struct.pack('>Q', key), serializer(item))
>     def __getitem__(self, key):
>         return unserializer(self.db.get(struct.pack('>Q', key)))

I guess Guido wants you to write

class MyIntDB(object):
  def __setitem__(self, key, item):
    self.db.put(struct.pack('>Q', key).encode("latin-1"),
                serializer(item))
  def __getitem__(self, key):
    return unserializer(self.db.get(
       struct.pack('>Q', key).encode("latin-1")))

here.

> How do you envision these types of tasks being accomplished with  
> unicode keys?  It is conceivable that one could write a custom  
> unicode encoding that accomplishes this, convert the key to unicode,  
> and pass the custom encoding name to the constructor.

See above. It's always trivial to do that with latin-1 as the encoding
(I'm glad you didn't see that, either :-).

Regards,
Martin

From python at rcn.com  Tue Aug  7 05:45:09 2007
From: python at rcn.com (Raymond Hettinger)
Date: Mon, 6 Aug 2007 20:45:09 -0700
Subject: [Python-3000] map() Returns Iterator
References: <87r6mjj34f.fsf@hydra.hampton.thirdcreek.com><ca471dc20708061118k57d10f16g6f6a57ce3a9f907e@mail.gmail.com>
	<87k5s8i2yn.fsf@hydra.hampton.thirdcreek.com>
Message-ID: <006201c7d8a5$5aff9050$f001a8c0@RaymondLaptop1>

From: "Kurt B. Kaiser" <kbk at shore.net>
> However, IMHO eliminating the strict versions of map() and filter() in
> favor of the lazy versions from itertools kicks the degree of
> sophistication necessary to understand these functions up a notch (or
> three).

Not really.  Once range() starts returning an iterator,
that will be the new, basic norm.  With that as a foundation,
it would be surprising if map() and enumerate() and zip()
did not return iterators.  Learn once, use everywhere.


Raymond

From talin at acm.org  Tue Aug  7 05:50:27 2007
From: talin at acm.org (Talin)
Date: Mon, 06 Aug 2007 20:50:27 -0700
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46B7E71D.1050705@v.loewis.de>
References: <46B637DD.7070905@v.loewis.de> <46B701E5.3030206@gmail.com>
	<46B7CE27.3030103@acm.org> <46B7E71D.1050705@v.loewis.de>
Message-ID: <46B7EC03.7000605@acm.org>

Martin v. Löwis wrote:
>> The most efficient representation of immutable bytes is quite different
>> from the most efficient representation of mutable bytes.
> 
> In what way?

Well, in some runtime environments (I'm not sure about Python), for 
immutables you can combine the object header and the bytes array into a 
single allocation. Further, the header need not contain an explicit 
pointer to the bytes themselves, instead the bytes are obtained by doing 
pointer arithmetic on the header address.

For a mutable bytes object, you'll need to allocate the actual bytes 
separately from the header. Typically you'll also need a second 'length' 
field to represent the current physical capacity of the allocated memory 
block, in addition to the logical length of the byte array.

So in other words, the in-memory layout of the two structs is different 
enough that attempting to combine them into a single struct is kind of 
awkward.

> Curious,
> Martin
> 

From martin at v.loewis.de  Tue Aug  7 05:51:26 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 07 Aug 2007 05:51:26 +0200
Subject: [Python-3000] Binary compatibility
In-Reply-To: <cd53a0140708061233v3776e9d5m40f39f3a022bb76d@mail.gmail.com>
References: <cd53a0140708061233v3776e9d5m40f39f3a022bb76d@mail.gmail.com>
Message-ID: <46B7EC3E.3070802@v.loewis.de>

> This means that I can't distribute Python extensions as binaries.

I think this conclusion is completely wrong. Why do you come to it?

If you want to distribute extension modules for Ubuntu, just distribute
the UCS-4 module. You need separate binary packages for different
microprocessors and operating systems, anyway, as you can't use the
same binary for Windows, OSX, Ubuntu, or Solaris.

> Any extension built on Ubuntu may fail on some other system.

Every extension built on Ubuntu *will* fail on other processors
or operating systems - even if the Unicode issue was solved, it
would still be a different instruction set (x86 vs. SPARC
or Itanium, say), and even for a single microprocessor, it will
fail if the OS ABI is different (different C libraries etc).

Now, you seem to talk about different *Linux* systems. On Linux,
use UCS-4.

Regards,
Martin


From guido at python.org  Tue Aug  7 05:56:35 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 6 Aug 2007 20:56:35 -0700
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46B7EA06.5040106@v.loewis.de>
References: <D3DD7578-D515-4F46-A3F3-B96A43AB1FBB@gmail.com>
	<B24E6FF9-3080-4494-A377-BE68B9949A52@gmail.com>
	<ca471dc20708061906w5957811evfc02cf9f04d7cfde@mail.gmail.com>
	<46B7EA06.5040106@v.loewis.de>
Message-ID: <ca471dc20708062056w362a951ma705452868e41472@mail.gmail.com>

On 8/6/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > Or perhaps a special value of the encoding argument passed to
> > *dbm.open() (maybe None, maybe the default, maybe "raw" or "bytes"?)
> > to specify that the key values are to be bytes?
>
> This is essentially the state of the bsddb module in the struni branch
> right now. The default is bytes keys and values; if you want string
> keys, you write
>
>    db = bsddb.open(...)
>    db = bsddb.StringKeys(db)
>
> which arranges for transparent UTF-8 encoding;

Ah. I hadn't realized that this was the API. It sounds like as good a
solution as mine.

> it would be possible to extend this to
>
>    db = bsddb.open(...)
>    db = bsddb.StringKeys(db, encoding="latin-1")

This would be even better.

> However, this has the view that there is a single "proper" key
> representation, which is bytes, and then reinterpretations.
>
> Now if you say that the dbm files are dicts conceptually, and
> bytes are not allowed as dict keys, then any API that allows
> for bytes as dbm keys (whether by default or as an option) is
> conceptually inconsistent - as you now do have dict-like objects
> which use bytes keys. This causes confusion if you pass one of
> them to, say, .update of a "real" dict, which then fails. IOW,
> I couldn't do
>
>    d = {}
>    d.update(db)
>
> if db is in the "keys are bytes" mode.

I guess we have to rethink our use of these databases somewhat. I
think I'm fine with the model that the basic dbm implementations map
bytes to bytes, and aren't particularly compatible with dicts. (They
aren't, really, anyway -- the key and value types are typically
restricted, and the reference semantics are different.)

But, just like for regular files we have TextIOWrapper which wraps a
binary file with a layer for encoded text I/O, I think it would be
very useful to have a layer around the *dbm modules for making them
handle text.

Perhaps the StringKeys and/or StringValues wrappers can be
generalized? Or perhaps we could borrow from io.open(), and use a
combination of the mode and the encoding to determine how to stack
wrappers.

Another approach might be to generalize shelve. It already supports
pickling values. There could be a few variants for dealing with keys
that are either strings or arbitrary immutables; the keys used for the
underlying *dbm file would then be either an encoding (if the keys are
limited to strings) or a pickle (if they aren't). (The latter would
require some kind of canonical pickling version, so may not be
practical; there also may not be enough of a use case to bother.)
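
A minimal sketch of such a text layer, under stated assumptions: the class
name StringKeysSketch and its signature are hypothetical (this is not the
bsddb.StringKeys API), and a plain dict stands in for the underlying
bytes-to-bytes database.

```python
class StringKeysSketch:
    """Hypothetical text-key layer over any bytes-to-bytes mapping."""

    def __init__(self, db, encoding="utf-8"):
        self._db = db
        self._encoding = encoding

    def __getitem__(self, key):
        return self._db[key.encode(self._encoding)]

    def __setitem__(self, key, value):
        self._db[key.encode(self._encoding)] = value

    def __contains__(self, key):
        return key.encode(self._encoding) in self._db

    def keys(self):
        # Decode on the way out so callers only ever see text keys.
        return [k.decode(self._encoding) for k in self._db.keys()]

raw = {}   # stand-in for the underlying dbm object
db = StringKeysSketch(raw, encoding="latin-1")
db["caf\xe9"] = b"\x00\x01"
```

The underlying store only ever sees bytes, much as TextIOWrapper leaves
the binary file beneath it untouched.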

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Aug  7 06:03:45 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 6 Aug 2007 21:03:45 -0700
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46B7EC03.7000605@acm.org>
References: <46B637DD.7070905@v.loewis.de> <46B701E5.3030206@gmail.com>
	<46B7CE27.3030103@acm.org> <46B7E71D.1050705@v.loewis.de>
	<46B7EC03.7000605@acm.org>
Message-ID: <ca471dc20708062103i1d74c40dyb804ec3f9b2dff03@mail.gmail.com>

On 8/6/07, Talin <talin at acm.org> wrote:
> Martin v. Löwis wrote:
> >> The most efficient representation of immutable bytes is quite different
> >> from the most efficient representation of mutable bytes.
> >
> > In what way?
>
> Well, in some runtime environments (I'm not sure about Python), for
> immutables you can combine the object header and the bytes array into a
> single allocation. Further, the header need not contain an explicit
> pointer to the bytes themselves, instead the bytes are obtained by doing
> pointer arithmetic on the header address.
>
> For a mutable bytes object, you'll need to allocate the actual bytes
> separately from the header. Typically you'll also need a second 'length'
> field to represent the current physical capacity of the allocated memory
> block, in addition to the logical length of the byte array.
>
> So in other words, the in-memory layout of the two structs is different
> enough that attempting to combine them into a single struct is kind of
> awkward.

Right. You've described exactly the difference between str8 and bytes
(PyString and PyBytes) in the struni branch (or in the future in
Python 2.6 for that matter).

There are two savings here: (1) the string object uses less memory
(only a single instance of the malloc header and round-off waste); (2)
the string object uses less time to allocate and free (only a single
call to malloc() or free()).
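
The same split can be observed from Python in later CPython releases,
where the immutable type is called bytes and the growable one bytearray.
This sketch is CPython-specific: it relies on bytearray.__alloc__(), which
reports the physical capacity of the separately allocated buffer.

```python
immutable = bytes(100)       # header and data live in one allocation
mutable = bytearray(100)     # header points at a separate, resizable buffer

mutable.extend(b"x" * 50)    # growing may reallocate the data buffer

logical = len(mutable)           # logical length of the byte array
capacity = mutable.__alloc__()   # physical size of the allocated block
```

The capacity is never smaller than the logical length; that second size
field is exactly the extra bookkeeping described above.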

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at v.loewis.de  Tue Aug  7 06:22:24 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 07 Aug 2007 06:22:24 +0200
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46B7EC03.7000605@acm.org>
References: <46B637DD.7070905@v.loewis.de> <46B701E5.3030206@gmail.com>
	<46B7CE27.3030103@acm.org> <46B7E71D.1050705@v.loewis.de>
	<46B7EC03.7000605@acm.org>
Message-ID: <46B7F380.1050805@v.loewis.de>

>>> The most efficient representation of immutable bytes is quite different
>>> from the most efficient representation of mutable bytes.
>>
>> In what way?
> 
> Well, in some runtime environments (I'm not sure about Python), for
> immutables you can combine the object header and the bytes array into a
> single allocation. Further, the header need not contain an explicit
> pointer to the bytes themselves, instead the bytes are obtained by doing
> pointer arithmetic on the header address.

Hmm. That assumes that the mutable bytes type also supports changes to
its length. I see that the Python bytes type does that, but I don't
think it's really necessary - I'm not even sure it's useful.

For a bytes array, you don't need a separate allocation, and it still
can be mutable.

> So in other words, the in-memory layout of the two structs is different
> enough that attempting to combine them into a single struct is kind of
> awkward.

... assuming the mutable bytes type behaves like a Python list, that
is. If it behaved like a Java/C byte[], this issue would not exist.

Regards,
Martin

From jyasskin at gmail.com  Tue Aug  7 06:26:32 2007
From: jyasskin at gmail.com (Jeffrey Yasskin)
Date: Mon, 6 Aug 2007 21:26:32 -0700
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <ca471dc20708061739w513fb601v8cf72b2eb2a490fd@mail.gmail.com>
References: <46B637DD.7070905@v.loewis.de>
	<ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>
	<46B78F86.9000505@v.loewis.de>
	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>
	<46B79815.1030504@v.loewis.de>
	<ca471dc20708061739w513fb601v8cf72b2eb2a490fd@mail.gmail.com>
Message-ID: <5d44f72f0708062126u1853d99fx500990f91841e6b6@mail.gmail.com>

On 8/6/07, Guido van Rossum <guido at python.org> wrote:
> On 8/6/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > For how long? Do you expect to receive further information that will
> > make a decision simpler?
>
> I'm waiting for a show-stopper issue that can't be solved without
> having an immutable bytes type.

Apologies if this has been answered before, but why are you waiting
for a show-stopper that requires an immutable bytes type rather than
one that requires a mutable one? This being software, there isn't
likely to be a real show-stopper (especially if you're willing to copy
the whole object), just things that are unnecessarily annoying or
confusing. Hashing seems to be one of those.

Taking TOOWTDI as a guideline: If you have immutable bytes and need a
mutable object, just use list(). If you have mutable bytes and need an
immutable object, you could 1) convert it to an int (probably
big-endian), 2) convert it to a latin-1 unicode object (containing
garbage, of course), 3) figure out an encoding in which to assume the
bytes represent text and create a unicode string from that, or 4) use
the deprecated str8 type. Why isn't this a clear win for immutable
bytes?
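For readers following along today, the conversions listed above can be sketched with the types that eventually shipped. This assumes a current Python 3, where the mutable type is spelled bytearray and the immutable one bytes; neither name had that meaning when this was written, and the helper name is made up for illustration.

```python
# Sketch only: assumes a current Python 3 (bytearray = mutable, bytes = immutable).
def immutable_views(buf):
    """Return several immutable stand-ins for a mutable byte buffer (hypothetical helper)."""
    as_int = int.from_bytes(buf, "big")      # option 1: big-endian integer
    as_latin1 = buf.decode("latin-1")        # option 2: one code point per byte
    as_bytes = bytes(buf)                    # what an immutable bytes type gives directly
    return as_int, as_latin1, as_bytes

buf = bytearray(b"\x01\x02\xff")
as_int, as_latin1, as_bytes = immutable_views(buf)
lookup = {as_bytes: "usable as a dict key"}  # a bytearray itself is unhashable
```

The integer form silently drops leading zero bytes, which is one more reason the workarounds feel awkward next to a plain immutable copy.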

Jeffrey

From stephen at xemacs.org  Tue Aug  7 06:53:06 2007
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 07 Aug 2007 13:53:06 +0900
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46B7E6AC.3020102@v.loewis.de>
References: <46B637DD.7070905@v.loewis.de>
	<ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>
	<46B78F86.9000505@v.loewis.de>
	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>
	<46B79815.1030504@v.loewis.de>
	<ca471dc20708061739w513fb601v8cf72b2eb2a490fd@mail.gmail.com>
	<46B7E6AC.3020102@v.loewis.de>
Message-ID: <87ejig7xot.fsf@uwakimon.sk.tsukuba.ac.jp>

"Martin v. Löwis" writes:

 > It's always possible to treat these as if they were latin-1, but this
 > is so unnaturally hacky that I didn't think of it.

Emacs and XEmacs have both suffered (in different ways) from treating
raw bytes as ISO 8859-1.  Python is very different (among other
things, the Unicode type is already well-developed and the preferred
representation for text), but I think it's just as well that you avoid
this.  Even if it costs a little extra work.


From martin at v.loewis.de  Tue Aug  7 06:43:21 2007
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue, 07 Aug 2007 06:43:21 +0200
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <5d44f72f0708062126u1853d99fx500990f91841e6b6@mail.gmail.com>
References: <46B637DD.7070905@v.loewis.de>	
	<ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>	
	<46B78F86.9000505@v.loewis.de>	
	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>	
	<46B79815.1030504@v.loewis.de>	
	<ca471dc20708061739w513fb601v8cf72b2eb2a490fd@mail.gmail.com>
	<5d44f72f0708062126u1853d99fx500990f91841e6b6@mail.gmail.com>
Message-ID: <46B7F869.6080007@v.loewis.de>

> Apologies if this has been answered before, but why are you waiting
> for a show-stopper that requires an immutable bytes type rather than
> one that requires a mutable one?

You mean, the need for a mutable bytes type might not be clear yet?

Code that has been ported to the bytes type probably doesn't use it
correctly yet, but to me, the need for a buffery thing where you
can allocate some buffer, and then fill it byte-for-byte is quite
obvious. It's a standard thing in all kinds of communication
protocols: in sending, you allocate plenty of memory, fill it, and
then send the fraction you actually consumed. In receiving, you
allocate plenty of memory (not knowing yet how much you will receive),
then only process as much as you needed. You do all that without
creating new buffers all the time - you use a single one over and
over again.
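A minimal sketch of the receive side of that pattern, assuming a current Python 3 where the mutable type is bytearray and streams offer readinto(). The helper name and the tiny buffer size are illustrative only, and io.BytesIO stands in for a real socket or file.

```python
import io

def read_chunks(stream, bufsize=8):
    """Yield each chunk of data using one preallocated buffer (illustrative helper)."""
    buf = bytearray(bufsize)          # allocated once, reused for every read
    view = memoryview(buf)
    while True:
        n = stream.readinto(buf)      # fills buf in place, returns the byte count
        if not n:
            break
        yield bytes(view[:n])         # copy out only the part actually filled

chunks = list(read_chunks(io.BytesIO(b"hello, wire protocol")))
```

The only allocations per iteration are the small immutable copies handed to the caller; the working buffer itself never changes identity.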

Code that has been ported to bytes from str8 often tends to still
follow the immutable pattern, creating a list of bytes objects to
be joined later - this can be improved in code reviews.

> Taking TOOWTDI as a guideline: If you have immutable bytes and need a
> mutable object, just use list().

I don't think this is adequate. Too much lower-level API relies on
having memory blocks, and that couldn't be implemented efficiently
with a list.

Regards,
Martin

From martin at v.loewis.de  Tue Aug  7 06:53:32 2007
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue, 07 Aug 2007 06:53:32 +0200
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <ca471dc20708062056w362a951ma705452868e41472@mail.gmail.com>
References: <D3DD7578-D515-4F46-A3F3-B96A43AB1FBB@gmail.com>	
	<B24E6FF9-3080-4494-A377-BE68B9949A52@gmail.com>	
	<ca471dc20708061906w5957811evfc02cf9f04d7cfde@mail.gmail.com>	
	<46B7EA06.5040106@v.loewis.de>
	<ca471dc20708062056w362a951ma705452868e41472@mail.gmail.com>
Message-ID: <46B7FACC.8030503@v.loewis.de>

> I guess we have to rethink our use of these databases somewhat.

Ok. In the interest of progress, I'll be looking at coming up with
some fixes for the code base right now; as we agree that the
underlying semantics is bytes:bytes, any encoding wrappers on
top of it can be added later.
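As a rough illustration of what such an encoding wrapper could look like on top of a bytes:bytes mapping: the class name is hypothetical, a plain dict stands in for an opened dbm file, and UTF-8 is only an assumed default.

```python
from collections.abc import MutableMapping

class StringKeyView(MutableMapping):
    """Hypothetical wrapper: str keys layered over a bytes:bytes mapping."""

    def __init__(self, raw, encoding="utf-8"):
        self._raw = raw
        self._encoding = encoding

    def __getitem__(self, key):
        return self._raw[key.encode(self._encoding)]

    def __setitem__(self, key, value):
        self._raw[key.encode(self._encoding)] = value

    def __delitem__(self, key):
        del self._raw[key.encode(self._encoding)]

    def __iter__(self):
        return (k.decode(self._encoding) for k in self._raw)

    def __len__(self):
        return len(self._raw)

raw = {}                      # stand-in for an opened dbm file
db = StringKeyView(raw)
db["spam"] = b"eggs"          # stored under the key b"spam"
```

Because the wrapper never changes what is written underneath, existing database files keep their on-disk key representation.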

> Perhaps the StringKeys and/or StringValues wrappers can be
> generalized? Or perhaps we could borrow from io.open(), and use a
> combination of the mode and the encoding to determine how to stack
> wrappers.

I thought about this, and couldn't think of a good place to put
them. Also, the bsddb versions provide additional functions
(such as .first() and .last()) which don't belong to the dict
API.

Furthermore, for dumbdbm, it would indeed be better if the dumbdbm
object knew that keys are meant to be strings. It could support
that natively - although not in a binary-backwards compatible
manner with 2.x. Doing so would be more efficient in the
implementation, as you'd avoid recoding.

> Another approach might be to generalize shelve. It already supports
> pickling values. There could be a few variants for dealing with keys
> that are either strings or arbitrary immutables; the keys used for the
> underlying *dbm file would then be either an encoding (if the keys are
> limited to strings) or a pickle (if they aren't). (The latter would
> require some kind of canonical pickling version, so may not be
> practical; there also may not be enough of a use case to bother.)

My concern is that people need to access existing databases. It's
all fine that the code accessing them breaks, and that they have
to actively port to Py3k. However, telling them that they have to
represent the keys in their dbm disk files in a different manner
might cause a revolt...

Regards,
Martin


From kbk at shore.net  Tue Aug  7 07:02:12 2007
From: kbk at shore.net (Kurt B. Kaiser)
Date: Tue, 07 Aug 2007 01:02:12 -0400
Subject: [Python-3000] map() Returns Iterator
In-Reply-To: <ca471dc20708061922g4e8babceh85dc2687beae8978@mail.gmail.com>
	(Guido van Rossum's message of "Mon, 6 Aug 2007 19:22:40 -0700")
References: <87r6mjj34f.fsf@hydra.hampton.thirdcreek.com>
	<ca471dc20708061118k57d10f16g6f6a57ce3a9f907e@mail.gmail.com>
	<87k5s8i2yn.fsf@hydra.hampton.thirdcreek.com>
	<ca471dc20708061922g4e8babceh85dc2687beae8978@mail.gmail.com>
Message-ID: <87fy2whr8r.fsf@hydra.hampton.thirdcreek.com>

"Guido van Rossum" <guido at python.org> writes:

> [...pushback...]
>
>> However, IMHO eliminating the strict versions of map() and filter() in
>> favor of the lazy versions from itertools kicks the degree of
>> sophistication necessary to understand these functions up a notch (or
>> three).
>
> I wonder how bad this is given that range() and dict.keys() and
> friends will also stop returning lists? 

Don't know.  It's straightforward for us, but we use it every day.  I'm
with you on the dict methods; I just view map() and filter() differently.
I'll get used to it.  Let's see what we hear from the high schools in a
few years.

> I don't think you ever saw any of my Py3k presentations (the slides of
> the latest one are here:
> http://conferences.oreillynet.com/presentations/os2007/os_vanrossum.ppt).

Yes, I had dug them out.  This link is the best so far, thanks!

> I've always made a point of suggesting that we're switching to
> returning iterators instead of lists from as many APIs as makes sense
> (I stop at str.split() though, as I can't think of a use case where
> the list would be so big as to be bothersome).

It's not your father's snake  :-)

[...]

> I think you're overreacting due to your experience with conversion of
> existing code. I expect that new use cases where a list is needed will
> generally be written using list comprehensions in Py3k-specific
> tutorials, and generator expressions for situations where a list isn't
> needed (as a slightly more advanced feature). Then map() and filter()
> can be shown as more advanced optimizations of certain end cases.

I think you are correct.

[...]

> (Aside: Please skip the p3yk branch and use the py3k-struni branch --
> it's the way of the future.)

I was working on IDLE in p3yk because I expect a whole new set of
failures when I jump it to py3k-struni.  Maybe I'm wrong about that.

It's mostly working now; I've been editing/testing Python 3000 with it
for several weeks.

> I tend to do manual conversion of the stdlib because it's on the
> bleeding edge. At times I've regretted this, and gone back and run a
> particular transform over some of the code. I rarely use the full set
> of transforms on a whole subtree, although others sometimes do that.
> Do note the options that help convert doctests and deal with print()
> already being a function.

I'll give it a shot.  It probably would have helped me get IDLE going
sooner; I had to trace the interpreter failure through IDLE into
code.py.  The biggest problem was those four map(print...) statements
which I'll wager you wrote back in your salad days :-)

I have my answer, thanks!  See you in py3k-struni!

> [zip()]
>> It's used only fifteen times in 2.6 Lib/ and four of those are
>> izip(). Eight are assignments, mostly to build dicts.
>
> I don't understand. What's an "assignment" to build a dict? Do you
> mean something like
>
>   dict(zip(keys, values))
>
> ? That's an ideal use case for an iterator.

Yup, typical lines are

Lib/filecmp.py:        a = dict(izip(imap(os.path.normcase, self.left_list), self.left_list))

Lib/mailbox.py:        self._toc = dict(enumerate(zip(starts, stops)))
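For reference, a small sketch of why this idiom is indifferent to zip() being lazy, assuming a current Python 3 where zip() returns an iterator. The sample data is made up.

```python
keys = ["a", "b", "c"]
values = [1, 2, 3]

pairs = zip(keys, values)      # lazy iterator in Python 3, not a list
table = dict(pairs)            # dict() consumes it one pair at a time
leftover = list(pairs)         # nothing left: the iterator is exhausted

# Same shape as the mailbox.py line: index -> (start, stop)
toc = dict(enumerate(zip([0, 10], [9, 19])))
```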

-- 
KBK

From kbk at shore.net  Tue Aug  7 07:09:15 2007
From: kbk at shore.net (Kurt B. Kaiser)
Date: Tue, 07 Aug 2007 01:09:15 -0400
Subject: [Python-3000] map() Returns Iterator
In-Reply-To: <006201c7d8a5$5aff9050$f001a8c0@RaymondLaptop1> (Raymond
	Hettinger's message of "Mon, 6 Aug 2007 20:45:09 -0700")
References: <87r6mjj34f.fsf@hydra.hampton.thirdcreek.com>
	<ca471dc20708061118k57d10f16g6f6a57ce3a9f907e@mail.gmail.com>
	<87k5s8i2yn.fsf@hydra.hampton.thirdcreek.com>
	<006201c7d8a5$5aff9050$f001a8c0@RaymondLaptop1>
Message-ID: <87bqdkhqx0.fsf@hydra.hampton.thirdcreek.com>

"Raymond Hettinger" <python at rcn.com> writes:

> Not really.  Once range() starts returning an iterator,
> that will be the new, basic norm.  With that as a foundation,
> it would be surprising if map() and enumerate() and zip()
> did not return iterators.  Learn once, use everywhere.

Except that range() is usually used in a loop, while map() and filter()
are not.  It seems to me that these two functions are going to expose
naked iterators to beginners (well, intermediates) more than the other
changes will.
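A short sketch of what "naked iterator" means in practice, assuming a current Python 3 where map() is lazy. Someone who stores the result and uses it twice sees the difference immediately.

```python
squares = map(lambda x: x * x, [1, 2, 3])

first_pass = list(squares)     # consumes the iterator
second_pass = list(squares)    # already exhausted, so this is empty
has_len = hasattr(squares, "__len__")   # map objects do not support len()
```

Inside a for loop none of this is visible, which is why range() raises the question less often.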

-- 
KBK

From jyasskin at gmail.com  Tue Aug  7 07:45:34 2007
From: jyasskin at gmail.com (Jeffrey Yasskin)
Date: Mon, 6 Aug 2007 22:45:34 -0700
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46B7F869.6080007@v.loewis.de>
References: <46B637DD.7070905@v.loewis.de>
	<ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>
	<46B78F86.9000505@v.loewis.de>
	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>
	<46B79815.1030504@v.loewis.de>
	<ca471dc20708061739w513fb601v8cf72b2eb2a490fd@mail.gmail.com>
	<5d44f72f0708062126u1853d99fx500990f91841e6b6@mail.gmail.com>
	<46B7F869.6080007@v.loewis.de>
Message-ID: <5d44f72f0708062245k32e79de4s4b545c59a974612a@mail.gmail.com>

On 8/6/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > Apologies if this has been answered before, but why are you waiting
> > for a show-stopper that requires an immutable bytes type rather than
> > one that requires a mutable one?
>
> You mean, the need for a mutable bytes type might not be clear yet?
>
> Code that has been ported to the bytes type probably doesn't use it
> correctly yet, but to me, the need for a buffery thing where you
> can allocate some buffer, and then fill it byte-for-byte is quite
> obvious. It's a standard thing in all kinds of communication
> protocols: in sending, you allocate plenty of memory, fill it, and
> then send the fraction you actually consumed. In receiving, you
> allocate plenty of memory (not knowing yet how much you will receive),
> then only process as much as you needed. You do all that without
> creating new buffers all the time - you use a single one over and
> over again.

For low-level I/O code, I totally agree that a mutable buffery object
is needed. What I'm wondering about is why that object needs to bleed
up into the code the struni branch is fixing. The bytes type isn't
even going to serve that function without some significant interface
changes. For example, to support re-using bytes buffers, socket.send()
would need to take start and end offsets into its bytes argument.
Otherwise, you have to slice the object to select the right data,
which *because bytes are mutable* requires a copy. PEP 3116's .write()
method has the same problem. Making those changes is, of course,
doable, but it seems like something that should be consciously
committed to.

Python 2 seems to have gotten away with doing all the buffery stuff in
C. Is there a reason Python 3 shouldn't do the same? I was about to
wonder if the performance was even worth the nuisance, but then I
realized that I could run my own (naïve) benchmark. Running revision
56747 of the p3yk branch, I get:

$ ./python.exe -m timeit 'b = bytes(v % 256 for v in range(1000))'
1000 loops, best of 3: 272 usec per loop
$ ./python.exe -m timeit -s 'b=bytes(v%256 for v in range(2000))' 'for v in range(1000): b[v] = v % 256'
1000 loops, best of 3: 298 usec per loop

which seems to demonstrate that pre-allocating the bytes object is
slightly _more_ expensive than re-allocating it each time.
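The same comparison can be reproduced from a script. This is a sketch for a current Python 3, where the mutable buffer has to be a bytearray; the function names and loop count are arbitrary, and no timings are quoted because they vary by machine and build.

```python
import timeit

def build_fresh():
    """Allocate a new immutable object on every call."""
    return bytes(v % 256 for v in range(1000))

def refill(buf):
    """Overwrite the first 1000 bytes of an existing buffer in place."""
    for v in range(1000):
        buf[v] = v % 256
    return buf

shared = bytearray(2000)                       # preallocated once
t_fresh = timeit.timeit(build_fresh, number=200)
t_refill = timeit.timeit(lambda: refill(shared), number=200)
print("fresh: %.6f s   refill: %.6f s" % (t_fresh, t_refill))
```

Both loops are dominated by per-byte interpreter overhead, so a result like this says little about the cost of the allocation itself.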

In any case, if people want to use bytes as both the low-level buffery
I/O thing and the high-level byte string, I think PEP 358 should
document it, since right now it just asserts that bytes are mutable
without any reason why.

> Code that has been ported to bytes from str8 often tends to still
> follow the immutable pattern, creating a list of bytes objects to
> be joined later - this can be improved in code reviews.
>
> > Taking TOOWTDI as a guideline: If you have immutable bytes and need a
> > mutable object, just use list().
>
> I don't think this is adequate. Too much lower-level API relies on
> having memory blocks, and that couldn't be implemented efficiently
> with a list.
>
> Regards,
> Martin
>


-- 
Namasté,
Jeffrey Yasskin

From lists at cheimes.de  Tue Aug  7 08:13:07 2007
From: lists at cheimes.de (Christian Heimes)
Date: Tue, 07 Aug 2007 08:13:07 +0200
Subject: [Python-3000] Pleaswe help with the countdown to zero failing
 tests in the struni branch!
In-Reply-To: <ca471dc20708061655w62420da3m1d87ffa5eff669c6@mail.gmail.com>
References: <ca471dc20708061655w62420da3m1d87ffa5eff669c6@mail.gmail.com>
Message-ID: <46B80D73.5050009@cheimes.de>

Guido van Rossum wrote:
> test_minidom
> Recently started failing again; probably shallow.

test_minidom is passing for me (Ubuntu 7.04, r56793, UCS2 build).

> test_tarfile
> Virgin territory again (but different owner :-).

The tarfile should be addressed by either its original author or
somebody with lots of spare time. As stated earlier it's a beast. I
tried to fix it several weeks ago because I thought it is a low hanging
fruit. I was totally wrong. :/

Christian


From martin at v.loewis.de  Tue Aug  7 08:22:02 2007
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue, 07 Aug 2007 08:22:02 +0200
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <5d44f72f0708062245k32e79de4s4b545c59a974612a@mail.gmail.com>
References: <46B637DD.7070905@v.loewis.de>	
	<ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>	
	<46B78F86.9000505@v.loewis.de>	
	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>	
	<46B79815.1030504@v.loewis.de>	
	<ca471dc20708061739w513fb601v8cf72b2eb2a490fd@mail.gmail.com>	
	<5d44f72f0708062126u1853d99fx500990f91841e6b6@mail.gmail.com>	
	<46B7F869.6080007@v.loewis.de>
	<5d44f72f0708062245k32e79de4s4b545c59a974612a@mail.gmail.com>
Message-ID: <46B80F8A.7060906@v.loewis.de>

> For low-level I/O code, I totally agree that a mutable buffery object
> is needed.

The code we are looking at right now (dbm interfaces) *is* low-level
I/O code.

> For example, to support re-using bytes buffers, socket.send()
> would need to take start and end offsets into its bytes argument.
> Otherwise, you have to slice the object to select the right data,
> which *because bytes are mutable* requires a copy. PEP 3116's .write()
> method has the same problem. Making those changes is, of course,
> doable, but it seems like something that should be consciously
> committed to.

Sure. There are several ways to do that, including producing view
objects - which would be possible even though the underlying buffer
is mutable; the view would then be just as mutable.
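One way such a view can look, sketched with the memoryview type that current Python 3 provides. That type postdates this message, so treat it as an illustration of the idea rather than what was being proposed.

```python
buf = bytearray(b"HEADERpayload-bytes")

view = memoryview(buf)[6:13]   # no copy: refers to the same memory as buf[6:13]
view[0:1] = b"P"               # writing through the view changes buf itself
payload = view.tobytes()       # an explicit copy, made only when one is wanted
```

An API that accepts such a view needs no separate start and end offsets, since the view already carries them.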

> Python 2 seems to have gotten away with doing all the buffery stuff in
> C. Is there a reason Python 3 shouldn't do the same?

I think Python 2 has demonstrated that this doesn't really work. People
repeatedly did += on strings (leading to quadratic performance),
invented the buffer interface (which is semantically flawed), added
direct support for mmap, and so on.
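The += problem is easy to show side by side. A sketch for a current Python 3, with bytearray as the mutable buffer; the function names are illustrative.

```python
def build_by_concat(parts):
    """Immutable accumulation: every step builds a new object."""
    out = b""
    for p in parts:
        out += p          # may copy everything accumulated so far
    return out

def build_in_place(parts):
    """Mutable accumulation: one buffer that grows as needed."""
    out = bytearray()
    for p in parts:
        out += p          # extends the same buffer, amortized linear
    return bytes(out)

parts = [b"ab"] * 1000
same = build_by_concat(parts) == build_in_place(parts) == b"".join(parts)
```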

> $ ./python.exe -m timeit 'b = bytes(v % 256 for v in range(1000))'
> 1000 loops, best of 3: 272 usec per loop
> $ ./python.exe -m timeit -s 'b=bytes(v%256 for v in range(2000))' 'for v in range(1000): b[v] = v % 256'
> 1000 loops, best of 3: 298 usec per loop
> 
> which seems to demonstrate that pre-allocating the bytes object is
> slightly _more_ expensive than re-allocating it each time.

There must be more conditions to it; I get

martin at mira:~/work/3k$ ./python -m timeit 'b = bytes(v % 256 for v in range(1000))'
1000 loops, best of 3: 434 usec per loop
martin at mira:~/work/3k$ ./python -m timeit -s 'b=bytes(v%256 for v in range(2000))' 'for v in range(1000): b[v] = v % 256'
1000 loops, best of 3: 394 usec per loop

which is the reverse result.

> In any case, if people want to use bytes as both the low-level buffery
> I/O thing and the high-level byte string, I think PEP 358 should
> document it, since right now it just asserts that bytes are mutable
> without any reason why.

That point is moot now; the PEP has been accepted. Documenting things
is always good, but the time for objections to the PEP is over now -
that's what the PEP process is for.

Regards,
Martin


From talex5 at gmail.com  Tue Aug  7 09:15:06 2007
From: talex5 at gmail.com (Thomas Leonard)
Date: Tue, 7 Aug 2007 08:15:06 +0100
Subject: [Python-3000] Binary compatibility
In-Reply-To: <46B7EC3E.3070802@v.loewis.de>
References: <cd53a0140708061233v3776e9d5m40f39f3a022bb76d@mail.gmail.com>
	<46B7EC3E.3070802@v.loewis.de>
Message-ID: <cd53a0140708070015j7f944ea5qca963da2684d5e86@mail.gmail.com>

On 8/7/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > This means that I can't distribute Python extensions as binaries.
>
> I think this conclusion is completely wrong. Why do you come to it?
>
> If you want to distribute extension modules for Ubuntu, just distribute
> the UCS-4 module. You need separate binary packages for different
> microprocessors and operating systems, anyway, as you can't use the
> same binary for Windows, OSX, Ubuntu, or Solaris.

You're right that we already have to provide several binaries
(although OSX and Windows users usually aren't all that interested in
running Unix desktop environments like ROX ;-), but each new
combination is more work for us. Linux/x86 covers pretty much all our
non-technical users, I think.

Autopackage double-compiles C++ programs (C++ being the other piece of
Linux infrastructure with an unstable ABI), for example, but if they
want to provide binaries for a C++ program using Python, that's 4
binaries per architecture!

(You also have to special-case the selection logic. Every installation
system understands about different versions and different processors,
but they need custom code to figure out which of two flavours of
Python is installed).

> > Any extension built on Ubuntu may fail on some other system.
>
> Every extension built on Ubuntu *will* fail on other processors
> or operating systems - even if the Unicode issue was solved, it
> would still be a different instruction set (if you compare x86 vs. SPARC
> or Itanium, say),

> and even for a single microprocessor, it will
> fail if the OS ABI is different (different C libraries etc).

Generally it doesn't. Our ROX-Filer/x86 binary using GTK+ runs on all
Linux/x86 systems (as far as I know). Linux binary compatibility is
currently very good, provided you avoid C++ and Python extensions.

> Now, you seem to talk about different *Linux* systems. On Linux,
> use UCS-4.

Yes, that's what we want. But Python 2.5 defaults to UCS-2 (at least
last time I tried), while many distros have used UCS-4. If Linux
always used UCS-4, that would be fine, but currently there's no
guarantee of that.
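Custom selection logic of this kind usually comes down to inspecting sys.maxunicode. A minimal sketch, with a made-up helper name; on CPython 3.3 and later the narrow/wide distinction no longer exists and the value is always 0x10FFFF.

```python
import sys

def unicode_build():
    """Classify the interpreter's Unicode width (illustrative helper name)."""
    if sys.maxunicode == 0xFFFF:
        return "narrow (UCS-2)"
    return "wide (UCS-4 range)"

build = unicode_build()
print(sys.maxunicode, build)
```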


-- 
Dr Thomas Leonard		http://rox.sourceforge.net
GPG: 9242 9807 C985 3C07 44A6  8B9A AE07 8280 59A5 3CC1

From pj at place.org  Tue Aug  7 06:37:49 2007
From: pj at place.org (Paul Jimenez)
Date: Mon, 06 Aug 2007 23:37:49 -0500
Subject: [Python-3000] Plea for help:
	python/branches/py3k-struni/Lib/tarfile.py
Message-ID: <20070807043749.E11A8179C7E@place.org>


  This evening I had a couple hours to spare and happened to read Guido's
plea for help near the beginning of it. I picked up a failing testcase
that no one had claimed and did what I could: it's not finished, but it
fixes approximately 75% of the errors in test_tarfile. I concentrated
on fixing problems that the testcase turned up; a pure inspection of
the source would turn up lots of things I missed, I'm sure. I hope it's
useful; it probably needs minor attention from me on what the Right Thing
to do is in the case of encoding and decoding: ascii? I had to do a
.decode('latin-1') to pass the umlaut-in-a-filename test, but I'm not at
all sure that that's the true Right Thing. Anyway, here's a start; I'm
explicitly *not* claiming that I'll ever touch this source code again; I
don't want to block anyone else from working on it.  Enjoy.
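To make the encoding question concrete, here is a standalone sketch of the two helpers involved, written for a current Python 3. It is not the patch's code: it assumes latin-1 in both directions, chosen only because every byte value round-trips, and that is not necessarily what tarfile ended up doing.

```python
NUL = b"\0"

def stn(s, length, encoding="latin-1"):
    """Encode s, then pad or truncate it to exactly `length` bytes."""
    raw = s.encode(encoding)[:length]
    return raw + (length - len(raw)) * NUL

def nts(raw, encoding="latin-1"):
    """Decode a header field up to its first NUL byte."""
    p = raw.find(NUL)
    if p != -1:
        raw = raw[:p]
    return raw.decode(encoding)

field = stn("Grüße.txt", 16)   # umlaut-in-a-filename case
name = nts(field)
```

With ascii on the encoding side, as in the patch, the same filename would raise UnicodeEncodeError instead of round-tripping.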

  --pj


Index: tarfile.py
===================================================================
--- tarfile.py	(revision 56785)
+++ tarfile.py	(working copy)
@@ -72,33 +72,33 @@
 #---------------------------------------------------------
 # tar constants
 #---------------------------------------------------------
-NUL = "\0"                      # the null character
+NUL = b"\0"                     # the null character
 BLOCKSIZE = 512                 # length of processing blocks
 RECORDSIZE = BLOCKSIZE * 20     # length of records
-GNU_MAGIC = "ustar  \0"         # magic gnu tar string
-POSIX_MAGIC = "ustar\x0000"     # magic posix tar string
+GNU_MAGIC = b"ustar  \0"        # magic gnu tar string
+POSIX_MAGIC = b"ustar\x0000"    # magic posix tar string
 
 LENGTH_NAME = 100               # maximum length of a filename
 LENGTH_LINK = 100               # maximum length of a linkname
 LENGTH_PREFIX = 155             # maximum length of the prefix field
 
-REGTYPE = "0"                   # regular file
-AREGTYPE = "\0"                 # regular file
-LNKTYPE = "1"                   # link (inside tarfile)
-SYMTYPE = "2"                   # symbolic link
-CHRTYPE = "3"                   # character special device
-BLKTYPE = "4"                   # block special device
-DIRTYPE = "5"                   # directory
-FIFOTYPE = "6"                  # fifo special device
-CONTTYPE = "7"                  # contiguous file
+REGTYPE = b"0"                   # regular file
+AREGTYPE = b"\0"                 # regular file
+LNKTYPE = b"1"                   # link (inside tarfile)
+SYMTYPE = b"2"                   # symbolic link
+CHRTYPE = b"3"                   # character special device
+BLKTYPE = b"4"                   # block special device
+DIRTYPE = b"5"                   # directory
+FIFOTYPE = b"6"                  # fifo special device
+CONTTYPE = b"7"                  # contiguous file
 
-GNUTYPE_LONGNAME = "L"          # GNU tar longname
-GNUTYPE_LONGLINK = "K"          # GNU tar longlink
-GNUTYPE_SPARSE = "S"            # GNU tar sparse file
+GNUTYPE_LONGNAME = b"L"          # GNU tar longname
+GNUTYPE_LONGLINK = b"K"          # GNU tar longlink
+GNUTYPE_SPARSE = b"S"            # GNU tar sparse file
 
-XHDTYPE = "x"                   # POSIX.1-2001 extended header
-XGLTYPE = "g"                   # POSIX.1-2001 global header
-SOLARIS_XHDTYPE = "X"           # Solaris extended header
+XHDTYPE = b"x"                   # POSIX.1-2001 extended header
+XGLTYPE = b"g"                   # POSIX.1-2001 global header
+SOLARIS_XHDTYPE = b"X"           # Solaris extended header
 
 USTAR_FORMAT = 0                # POSIX.1-1988 (ustar) format
 GNU_FORMAT = 1                  # GNU tar format
@@ -176,6 +176,9 @@
 def stn(s, length):
     """Convert a python string to a null-terminated string buffer.
     """
+    #return s[:length].encode('ascii') + (length - len(s)) * NUL
+    if type(s) != type(b''):
+        s = s.encode('ascii')
     return s[:length] + (length - len(s)) * NUL
 
 def nts(s):
@@ -184,8 +187,8 @@
     # Use the string up to the first null char.
     p = s.find("\0")
     if p == -1:
-        return s
-    return s[:p]
+        return s.decode('latin-1')
+    return s[:p].decode('latin-1')
 
 def nti(s):
     """Convert a number field to a python number.
@@ -214,7 +217,7 @@
     # encoding, the following digits-1 bytes are a big-endian
     # representation. This allows values up to (256**(digits-1))-1.
     if 0 <= n < 8 ** (digits - 1):
-        s = "%0*o" % (digits - 1, n) + NUL
+        s = ("%0*o" % (digits - 1, n)).encode('ascii') + NUL
     else:
         if format != GNU_FORMAT or n >= 256 ** (digits - 1):
             raise ValueError("overflow in number field")
@@ -412,7 +415,7 @@
         self.comptype = comptype
         self.fileobj  = fileobj
         self.bufsize  = bufsize
-        self.buf      = ""
+        self.buf      = b""
         self.pos      = 0
         self.closed   = False
 
@@ -434,7 +437,7 @@
             except ImportError:
                 raise CompressionError("bz2 module is not available")
             if mode == "r":
-                self.dbuf = ""
+                self.dbuf = b""
                 self.cmp = bz2.BZ2Decompressor()
             else:
                 self.cmp = bz2.BZ2Compressor()
@@ -451,10 +454,10 @@
                                             self.zlib.DEF_MEM_LEVEL,
                                             0)
         timestamp = struct.pack("<L", int(time.time()))
-        self.__write("\037\213\010\010%s\002\377" % timestamp)
+        self.__write(b"\037\213\010\010" + timestamp + b"\002\377")
         if self.name.endswith(".gz"):
             self.name = self.name[:-3]
-        self.__write(self.name + NUL)
+        self.__write(self.name.encode('ascii') + NUL)
 
     def write(self, s):
         """Write string s to the stream.
@@ -487,7 +490,7 @@
 
         if self.mode == "w" and self.buf:
             self.fileobj.write(self.buf)
-            self.buf = ""
+            self.buf = b""
             if self.comptype == "gz":
                 # The native zlib crc is an unsigned 32-bit integer, but
                 # the Python wrapper implicitly casts that to a signed C
@@ -507,12 +510,12 @@
         """Initialize for reading a gzip compressed fileobj.
         """
         self.cmp = self.zlib.decompressobj(-self.zlib.MAX_WBITS)
-        self.dbuf = ""
+        self.dbuf = b""
 
         # taken from gzip.GzipFile with some alterations
-        if self.__read(2) != "\037\213":
+        if self.__read(2) != b"\037\213":
             raise ReadError("not a gzip file")
-        if self.__read(1) != "\010":
+        if self.__read(1) != b"\010":
             raise CompressionError("unsupported compression method")
 
         flag = ord(self.__read(1))
@@ -564,7 +567,7 @@
                 if not buf:
                     break
                 t.append(buf)
-            buf = "".join(t)
+            buf = b"".join(t)
         else:
             buf = self._read(size)
         self.pos += len(buf)
@@ -588,7 +591,7 @@
                 raise ReadError("invalid compressed data")
             t.append(buf)
             c += len(buf)
-        t = "".join(t)
+        t = b"".join(t)
         self.dbuf = t[size:]
         return t[:size]
 
@@ -604,7 +607,7 @@
                 break
             t.append(buf)
             c += len(buf)
-        t = "".join(t)
+        t = b"".join(t)
         self.buf = t[size:]
         return t[:size]
 # class _Stream
@@ -655,7 +658,7 @@
         if self.mode == "r":
             self.bz2obj = bz2.BZ2Decompressor()
             self.fileobj.seek(0)
-            self.buf = ""
+            self.buf = b""
         else:
             self.bz2obj = bz2.BZ2Compressor()
 
@@ -670,7 +673,7 @@
             except EOFError:
                 break
             x += len(data)
-        self.buf = "".join(b)
+        self.buf = b"".join(b)
 
         buf = self.buf[:size]
         self.buf = self.buf[size:]
@@ -753,7 +756,7 @@
                 break
             size -= len(buf)
             data.append(buf)
-        return "".join(data)
+        return b"".join(data)
 
     def readsparsesection(self, size):
         """Read a single section of a sparse file.
@@ -761,7 +764,7 @@
         section = self.sparse.find(self.position)
 
         if section is None:
-            return ""
+            return b""
 
         size = min(size, section.offset + section.size - self.position)
 
@@ -793,7 +796,7 @@
         self.size = tarinfo.size
 
         self.position = 0
-        self.buffer = ""
+        self.buffer = b""
 
     def read(self, size=None):
         """Read at most size bytes from the file. If size is not
@@ -802,11 +805,11 @@
         if self.closed:
             raise ValueError("I/O operation on closed file")
 
-        buf = ""
+        buf = b""
         if self.buffer:
             if size is None:
                 buf = self.buffer
-                self.buffer = ""
+                self.buffer = b""
             else:
                 buf = self.buffer[:size]
                 self.buffer = self.buffer[size:]
@@ -827,16 +830,16 @@
         if self.closed:
             raise ValueError("I/O operation on closed file")
 
-        if "\n" in self.buffer:
-            pos = self.buffer.find("\n") + 1
+        if b"\n" in self.buffer:
+            pos = self.buffer.find(b"\n") + 1
         else:
             buffers = [self.buffer]
             while True:
                 buf = self.fileobj.read(self.blocksize)
                 buffers.append(buf)
-                if not buf or "\n" in buf:
-                    self.buffer = "".join(buffers)
-                    pos = self.buffer.find("\n") + 1
+                if not buf or b"\n" in buf:
+                    self.buffer = b"".join(buffers)
+                    pos = self.buffer.find(b"\n") + 1
                     if pos == 0:
                         # no newline found.
                         pos = len(self.buffer)
@@ -848,7 +851,7 @@
         buf = self.buffer[:pos]
         self.buffer = self.buffer[pos:]
         self.position += len(buf)
-        return buf
+        return buf.decode()
 
     def readlines(self):
         """Return a list with all remaining lines.
@@ -886,7 +889,7 @@
         else:
             raise ValueError("Invalid argument")
 
-        self.buffer = ""
+        self.buffer = b""
         self.fileobj.seek(self.position)
 
     def close(self):
@@ -1015,7 +1018,7 @@
         """
         info["magic"] = GNU_MAGIC
 
-        buf = ""
+        buf = b""
         if len(info["linkname"]) > LENGTH_LINK:
             buf += self._create_gnu_long_header(info["linkname"], GNUTYPE_LONGLINK)
 
@@ -1071,7 +1074,7 @@
         if pax_headers:
             buf = self._create_pax_generic_header(pax_headers)
         else:
-            buf = ""
+            buf = b""
 
         return buf + self._create_header(info, USTAR_FORMAT)
 
@@ -1108,7 +1111,7 @@
             itn(info.get("gid", 0), 8, format),
             itn(info.get("size", 0), 12, format),
             itn(info.get("mtime", 0), 12, format),
-            "        ", # checksum field
+            b"        ", # checksum field
             info.get("type", REGTYPE),
             stn(info.get("linkname", ""), 100),
             stn(info.get("magic", POSIX_MAGIC), 8),
@@ -1119,9 +1122,9 @@
             stn(info.get("prefix", ""), 155)
         ]
 
-        buf = struct.pack("%ds" % BLOCKSIZE, "".join(parts))
+        buf = struct.pack("%ds" % BLOCKSIZE, b"".join(parts))
         chksum = calc_chksums(buf[-BLOCKSIZE:])[0]
-        buf = buf[:-364] + "%06o\0" % chksum + buf[-357:]
+        buf = buf[:-364] + ("%06o\0" % chksum).encode('ascii') + buf[-357:]
         return buf
 
     @staticmethod
@@ -1139,10 +1142,10 @@
         """Return a GNUTYPE_LONGNAME or GNUTYPE_LONGLINK sequence
            for name.
         """
-        name += NUL
+        name = name.encode('ascii') + NUL
 
         info = {}
-        info["name"] = "././@LongLink"
+        info["name"] = b"././@LongLink"
         info["type"] = type
         info["size"] = len(name)
         info["magic"] = GNU_MAGIC
@@ -1324,7 +1327,7 @@
             lastpos = offset + numbytes
             pos += 24
 
-        isextended = ord(buf[482])
+        isextended = buf[482]
         origsize = nti(buf[483:495])
 
         # If the isextended flag is given,
@@ -1344,7 +1347,7 @@
                 realpos += numbytes
                 lastpos = offset + numbytes
                 pos += 24
-            isextended = ord(buf[504])
+            isextended = buf[504]
 
         if lastpos < origsize:
             sp.append(_hole(lastpos, origsize - lastpos))
Index: test/test_tarfile.py
===================================================================
--- test/test_tarfile.py	(revision 56784)
+++ test/test_tarfile.py	(working copy)
@@ -115,7 +115,7 @@
         fobj.seek(0, 2)
         self.assertEqual(tarinfo.size, fobj.tell(),
                      "seek() to file's end failed")
-        self.assert_(fobj.read() == "",
+        self.assert_(fobj.read() == b"",
                      "read() at file's end did not return empty string")
         fobj.seek(-tarinfo.size, 2)
         self.assertEqual(0, fobj.tell(),
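
The recurring changes in the patch above follow from three bytes behaviours. A minimal sketch, assuming current Python 3 semantics (the values below are made up for illustration and are not taken from tarfile.py):

```python
# Indexing a bytes object yields an int, so ord() is no longer needed.
buf = b"\x00" * 482 + b"\x01"
isextended = buf[482]

# Joining bytes chunks needs a bytes separator; "".join() would raise TypeError.
data = b"".join([b"abc", b"def"])

# %-formatting here produces str, so the checksum field is encoded explicitly
# before being spliced into the bytes header.
chksum = 0o1234  # illustrative checksum value
field = ("%06o\0" % chksum).encode("ascii")
```

The encoded field is seven bytes long, which matches the gap between buf[:-364] and buf[-357:] in the patched _create_header().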

From greg.ewing at canterbury.ac.nz  Tue Aug  7 10:20:23 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 07 Aug 2007 20:20:23 +1200
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <ca471dc20708061739w513fb601v8cf72b2eb2a490fd@mail.gmail.com>
References: <46B637DD.7070905@v.loewis.de>
	<ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>
	<46B78F86.9000505@v.loewis.de>
	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>
	<46B79815.1030504@v.loewis.de>
	<ca471dc20708061739w513fb601v8cf72b2eb2a490fd@mail.gmail.com>
Message-ID: <46B82B47.9090108@canterbury.ac.nz>

Guido van Rossum wrote:
> At the same time we still have enough uses of str9
                                                 ^^^^

For holding data from 9-track tapes? :-)

--
Greg

From greg.ewing at canterbury.ac.nz  Tue Aug  7 10:21:28 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 07 Aug 2007 20:21:28 +1200
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <ca471dc20708061719p3092e6bbu47c68bbfddc11012@mail.gmail.com>
References: <46B637DD.7070905@v.loewis.de>
	<ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>
	<46B78F86.9000505@v.loewis.de>
	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>
	<da3f900e0708061709v53d93f7m76519dd6cee92cc5@mail.gmail.com>
	<ca471dc20708061719p3092e6bbu47c68bbfddc11012@mail.gmail.com>
Message-ID: <46B82B88.5000804@canterbury.ac.nz>

Guido van Rossum wrote:
> Personally, I still think that converting to the latin-1 encoding is
> probably just as good for this particular use case.

Although that's a conceptually screwy thing to do
when your data has nothing to do with characters.

--
Greg

From greg.ewing at canterbury.ac.nz  Tue Aug  7 10:47:20 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 07 Aug 2007 20:47:20 +1200
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46B7EA06.5040106@v.loewis.de>
References: <D3DD7578-D515-4F46-A3F3-B96A43AB1FBB@gmail.com>
	<B24E6FF9-3080-4494-A377-BE68B9949A52@gmail.com>
	<ca471dc20708061906w5957811evfc02cf9f04d7cfde@mail.gmail.com>
	<46B7EA06.5040106@v.loewis.de>
Message-ID: <46B83198.5090502@canterbury.ac.nz>

Martin v. Löwis wrote:
> Now if you say that the dbm files are dicts conceptually,

I wouldn't say they're dicts, rather they're mappings.
Restriction of keys to immutable values is a peculiarity
of dicts, not a required feature of mappings in general.

--
Greg


From greg.ewing at canterbury.ac.nz  Tue Aug  7 10:51:50 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 07 Aug 2007 20:51:50 +1200
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46B7F380.1050805@v.loewis.de>
References: <46B637DD.7070905@v.loewis.de> <46B701E5.3030206@gmail.com>
	<46B7CE27.3030103@acm.org> <46B7E71D.1050705@v.loewis.de>
	<46B7EC03.7000605@acm.org> <46B7F380.1050805@v.loewis.de>
Message-ID: <46B832A6.5000104@canterbury.ac.nz>

Martin v. Löwis wrote:
> That assumes that the mutable bytes type also supports changes to
> its length.

It would be surprising if it didn't, because that would
make it different from all the other builtin mutable
sequences.

--
Greg

From greg.ewing at canterbury.ac.nz  Tue Aug  7 10:54:49 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 07 Aug 2007 20:54:49 +1200
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <5d44f72f0708062126u1853d99fx500990f91841e6b6@mail.gmail.com>
References: <46B637DD.7070905@v.loewis.de>
	<ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>
	<46B78F86.9000505@v.loewis.de>
	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>
	<46B79815.1030504@v.loewis.de>
	<ca471dc20708061739w513fb601v8cf72b2eb2a490fd@mail.gmail.com>
	<5d44f72f0708062126u1853d99fx500990f91841e6b6@mail.gmail.com>
Message-ID: <46B83359.8020204@canterbury.ac.nz>

Jeffrey Yasskin wrote:
> If you have mutable bytes and need an
> immutable object, you could 1) convert it to an int (probably
> big-endian),

That's not a reversible transformation, because you lose
information about leading zero bits.

> 4) use the deprecated str8 type

Which won't exist in Py3k, so it'll be a bit hard to use...

--
Greg
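
The point about leading zero bits can be checked directly. A minimal sketch, assuming the int.from_bytes()/int.to_bytes() methods of current Python 3 (they did not exist when this thread was written):

```python
# Two different byte strings that differ only in a leading zero byte.
a = b"\x00\x01"
b = b"\x01"

# Both collapse to the same big-endian integer.
n_a = int.from_bytes(a, "big")
n_b = int.from_bytes(b, "big")

# Converting back with the minimal length cannot restore the leading zero.
recovered = n_a.to_bytes((n_a.bit_length() + 7) // 8, "big")
```

The original length would have to be stored alongside the integer to make the conversion reversible.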

From greg.ewing at canterbury.ac.nz  Tue Aug  7 11:01:42 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 07 Aug 2007 21:01:42 +1200
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46B7F869.6080007@v.loewis.de>
References: <46B637DD.7070905@v.loewis.de>
	<ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>
	<46B78F86.9000505@v.loewis.de>
	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>
	<46B79815.1030504@v.loewis.de>
	<ca471dc20708061739w513fb601v8cf72b2eb2a490fd@mail.gmail.com>
	<5d44f72f0708062126u1853d99fx500990f91841e6b6@mail.gmail.com>
	<46B7F869.6080007@v.loewis.de>
Message-ID: <46B834F6.7050307@canterbury.ac.nz>

Martin v. Löwis wrote:
> Code that has been ported to the bytes type probably doesn't use it
> correctly yet, but to me, the need for a buffery thing where you
> can allocate some buffer, and then fill it byte-for-byte is quite
> obvious.

We actually already *have* something like that,
i.e. array.array('B').

So I don't think it's a priori a silly idea to
consider making the bytes type immutable only,
and using the array type for when you want a
mutable buffer.

--
Greg
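
For illustration, a minimal sketch of using array.array('B') as the mutable buffer, with an immutable snapshot taken afterwards. It assumes current Python 3, where bytes is the immutable type:

```python
from array import array

# Allocate a four-byte buffer of zeros, then fill it byte by byte.
buf = array("B", bytes(4))
buf[0] = 0x50
buf[1] = 0x79

# Like other mutable sequences, the array can also change length.
buf.append(0x33)

# Take an immutable, hashable snapshot of the current contents.
snapshot = bytes(buf)
lookup = {snapshot: "found"}
```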

From greg.ewing at canterbury.ac.nz  Tue Aug  7 11:33:51 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 07 Aug 2007 21:33:51 +1200
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B7DEA6.5050609@ronadam.com>
References: <46B13ADE.7080901@acm.org>
	<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>
	<ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>
	<46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com>
	<46B35295.1030007@acm.org>
	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>
	<46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz>
	<46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com>
	<46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com>
	<46B66E7E.4060209@canterbury.ac.nz> <46B6A7E8.7040001@ronadam.com>
	<46B6C335.4080504@canterbury.ac.nz> <46B6DE80.2050000@ronadam.com>
	<46B7C369.3040509@canterbury.ac.nz> <46B7DEA6.5050609@ronadam.com>
Message-ID: <46B83C7F.603@canterbury.ac.nz>

Ron Adam wrote:
> What about mismatched specifiers?

It's not clear exactly what you mean by a "mismatched"
specifier.

Some types may recognise when they're being passed
a format spec that belongs to another type, and try
to convert themselves to that type (e.g. applying
'f' to an int or 'd' to a float).

If the type doesn't recognise the format at all,
and doesn't have a fallback type to delegate to
(as will probably be the case with str) then
you will get an exception.

> I think the opinion so far is to let the object's __format__ method
> determine this, but we need to figure out what the built-in types
> will do.

My suggestions would be:

   int - understands all the 'integer' formats
         (d, x, o, etc.)
       - recognises the 'float' formats ('f', 'e', etc.)
         and delegates to float
       - delegates anything it doesn't recognise to str

   float - understands all the 'float' formats
         - recognises the 'integer' formats and delegates to int
         - delegates anything it doesn't recognise to str

   str - recognises the 'string' formats (only one?)
       - raises an exception for anything it doesn't understand

I've forgotten where 'r' was supposed to fit into
this scheme. Can anyone remind me?

--
Greg
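
The delegation scheme suggested above could be sketched as follows. This is a hypothetical dispatcher, not the PEP 3101 implementation: sketch_format(), INT_CODES and FLOAT_CODES are names invented for illustration, and only single-letter type codes are handled.

```python
INT_CODES = set("dxXob")      # assumed set of 'integer' format codes
FLOAT_CODES = set("feEgG")    # assumed set of 'float' format codes

def sketch_format(value, code):
    """Format value with a one-letter code, delegating as proposed."""
    if isinstance(value, int):
        if code in INT_CODES:
            return format(value, code)
        if code in FLOAT_CODES:
            # int recognises a float format and delegates to float.
            return format(float(value), code)
        # Anything else is delegated to str.
        return sketch_format(str(value), code)
    if isinstance(value, float):
        if code in FLOAT_CODES:
            return format(value, code)
        if code in INT_CODES:
            # float recognises an integer format and delegates to int.
            return format(int(value), code)
        return sketch_format(str(value), code)
    if isinstance(value, str):
        if code == "s":
            return format(value, "s")
        # str has no fallback type, so an unknown code is an error.
        raise ValueError("unknown format code %r for str" % code)
    raise TypeError("unsupported type: %s" % type(value).__name__)
```

Under this sketch, sketch_format(3, "f") and sketch_format(2.0, "d") both succeed through delegation, while sketch_format("a", "d") raises ValueError.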

From theller at ctypes.org  Tue Aug  7 14:06:27 2007
From: theller at ctypes.org (Thomas Heller)
Date: Tue, 07 Aug 2007 14:06:27 +0200
Subject: [Python-3000] Pleaswe help with the countdown to zero failing
 tests in the struni branch!
In-Reply-To: <ca471dc20708061655w62420da3m1d87ffa5eff669c6@mail.gmail.com>
References: <ca471dc20708061655w62420da3m1d87ffa5eff669c6@mail.gmail.com>
Message-ID: <f99n87$8or$1@sea.gmane.org>

Guido van Rossum schrieb:
> We're down to 11 failing tests in the struni branch. I'd like to get
> this down to zero ASAP so that we can retire the old p3yk (yes, with
> typo!) branch and rename py3k-struni to py3k.
> 
> Please help! Here's the list of failing tests:
> 
> test_ctypes
> Recently one test started failing again, after Martin changed
> PyUnicode_FromStringAndSize() to use UTF8 instead of Latin1.

I wanted to look into this and noticed that 'import time' on Windows
doesn't work anymore on my machine.  The reason is that PyUnicode_FromStringAndSize()
is called for the string 'Westeuropäische Normalzeit', and then fails with

UnicodeDecodeError: 'utf8' codec can't decode bytes in position 9-11: invalid data

Thomas


From theller at ctypes.org  Tue Aug  7 14:12:26 2007
From: theller at ctypes.org (Thomas Heller)
Date: Tue, 07 Aug 2007 14:12:26 +0200
Subject: [Python-3000] C API cleanup str
In-Reply-To: <46B78D96.4090901@v.loewis.de>
References: <ee2a432c0708022301m3dbf7a7tfe7378336fe0cefb@mail.gmail.com>		<ee2a432c0708022331q768f3cbbu527aad3359ed9552@mail.gmail.com>		<46B5C47B.5090703@v.loewis.de>		<ca471dc20708050808o7dfcb8c4pe59acff12b57c8f9@mail.gmail.com>		<46B5F136.4010502@v.loewis.de>		<ca471dc20708050859g3e3178d7ncfc6209e9ea05708@mail.gmail.com>		<46B633D0.7050902@v.loewis.de>
	<46B6D2EC.601@livinglogic.de>		<46B6D6B8.7000207@v.loewis.de>
	<46B6E66D.80301@livinglogic.de>	<ca471dc20708061058s4b468b44k6f9ecd98019dbeeb@mail.gmail.com>
	<46B78D96.4090901@v.loewis.de>
Message-ID: <f99nja$8ft$1@sea.gmane.org>

Martin v. Löwis schrieb:
>> One issue with just putting this in the C API docs is that I believe
>> (tell me if I'm wrong) that these haven't been kept up to date in the
>> struni branch so we'll need to make a lot more changes than just this
>> one...
> 
> That's certainly the case. However, if we end up deleting the str8
> type entirely, I'd be in favor of recycling the PyString_* names
> for Unicode, in which case everything needs to be edited, anyway.

PyText_*, maybe?

Thomas


From ncoghlan at gmail.com  Tue Aug  7 14:33:36 2007
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 07 Aug 2007 22:33:36 +1000
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46B82B88.5000804@canterbury.ac.nz>
References: <46B637DD.7070905@v.loewis.de>	<ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>	<46B78F86.9000505@v.loewis.de>	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>	<da3f900e0708061709v53d93f7m76519dd6cee92cc5@mail.gmail.com>	<ca471dc20708061719p3092e6bbu47c68bbfddc11012@mail.gmail.com>
	<46B82B88.5000804@canterbury.ac.nz>
Message-ID: <46B866A0.2040800@gmail.com>

Greg Ewing wrote:
> Guido van Rossum wrote:
>> Personally, I still think that converting to the latin-1 encoding is
>> probably just as good for this particular use case.
> 
> Although that's a conceptually screwy thing to do
> when your data has nothing to do with characters.

Yeah, this approach seems to run counter to the whole point of getting 
rid of the current str type: "for binary data use bytes, for text use 
Unicode, unless you need your binary data to be hashable, and then you 
decode it to gibberish Unicode via the latin-1 codec"

This would mean that the Unicode type would acquire all of the ambiguity
currently associated with the 8-bit str type: does it contain actual 
text, or does it contain arbitrary latin-1 decoded binary data?

A separate frozenbytes type (with the bytes API instead of the string 
API) would solve the problem far more cleanly.

Easy-for-me-to-say-when-I'm-not-providing-the-code-'ly yours,
Nick.


-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org
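
The two alternatives under discussion can be compared in a short sketch. It assumes current Python 3, where bytes is immutable and hashable and bytearray is the mutable type; that is not the state of the struni branch discussed in this thread.

```python
raw = bytes(range(256))          # arbitrary binary data, not text

# Workaround: decode via latin-1 to obtain a hashable str key.
# Every byte value maps to exactly one code point, so it round-trips.
key = raw.decode("latin-1")
table = {key: "payload"}
round_trip = key.encode("latin-1")

# A mutable byte sequence cannot be used as a dict key.
try:
    hash(bytearray(raw))
    unhashable = False
except TypeError:
    unhashable = True

# An immutable byte sequence can be used directly.
direct = {raw: "payload"}
```

The latin-1 key works, but nothing in its type says that it holds binary data rather than text, which is the ambiguity described above.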

From guido at python.org  Tue Aug  7 16:36:20 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 7 Aug 2007 07:36:20 -0700
Subject: [Python-3000] Pleaswe help with the countdown to zero failing
	tests in the struni branch!
In-Reply-To: <f99n87$8or$1@sea.gmane.org>
References: <ca471dc20708061655w62420da3m1d87ffa5eff669c6@mail.gmail.com>
	<f99n87$8or$1@sea.gmane.org>
Message-ID: <ca471dc20708070736t6797477bw6667e0608bccb316@mail.gmail.com>

On 8/7/07, Thomas Heller <theller at ctypes.org> wrote:
> Guido van Rossum schrieb:
> > test_ctypes
> > Recently one test started failing again, after Martin changed
> > PyUnicode_FromStringAndSize() to use UTF8 instead of Latin1.
>
> I wanted to look into this and noticed that 'import time' on Windows
> doesn't work anymore on my machine.  The reason is that PyUnicode_FromStringAndSize()
> is called for the string 'Westeuropäische Normalzeit', and then fails with
>
> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 9-11: invalid data

I'm assuming that's a literal somewhere? In what encoding is it? That
function was recently changed to require the input to be UTF-8. If the
input isn't UTF-8, you'll have to use another API with an explicit
encoding, PyUnicode_Decode().

I'm pretty sure this change is also responsible for the one failure
(as it started around the time that change was made) but I don't
understand the failure well enough to track it down. (It looked like
uninitialized memory was being accessed though.)

In case you wonder why it was changed, it's for symmetry with
_PyUnicode_AsDefaultEncodedString(), which is the most common way to
turn Unicode back into a char* without specifying an encoding. (And
yes, that name needs to be changed.)

See recent posts here.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip at pobox.com  Tue Aug  7 16:36:01 2007
From: skip at pobox.com (skip at pobox.com)
Date: Tue, 7 Aug 2007 09:36:01 -0500
Subject: [Python-3000] Pleaswe help with the countdown to zero failing
 tests in the struni branch!
In-Reply-To: <f99n87$8or$1@sea.gmane.org>
References: <ca471dc20708061655w62420da3m1d87ffa5eff669c6@mail.gmail.com>
	<f99n87$8or$1@sea.gmane.org>
Message-ID: <18104.33617.706079.853923@montanaro.dyndns.org>

test_csv got removed from the failing list after Guido applied Adam Hupp's
patch.  (I checked in a small update for one thing Adam missed.)  I'm still
getting test failures though:

    ======================================================================
    FAIL: test_reader_attrs (__main__.Test_Csv)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "Lib/test/test_csv.py", line 63, in test_reader_attrs
        self._test_default_attrs(csv.reader, [])
      File "Lib/test/test_csv.py", line 47, in _test_default_attrs
        self.assertEqual(obj.dialect.delimiter, ',')
    AssertionError: s'\x00' != ','

This same exception crops up six times.  Maybe this isn't
str->unicode-related, but it sure seems like it to me.  I spent some time
over the past few days trying to figure it out, but I struck out.

Skip


From guido at python.org  Tue Aug  7 16:42:42 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 7 Aug 2007 07:42:42 -0700
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46B7F380.1050805@v.loewis.de>
References: <46B637DD.7070905@v.loewis.de> <46B701E5.3030206@gmail.com>
	<46B7CE27.3030103@acm.org> <46B7E71D.1050705@v.loewis.de>
	<46B7EC03.7000605@acm.org> <46B7F380.1050805@v.loewis.de>
Message-ID: <ca471dc20708070742t4b379010kf5d6d9e519002693@mail.gmail.com>

On 8/6/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> >>> The most efficient representation of immutable bytes is quite different
> >>> from the most efficient representation of mutable bytes.
> >>
> >> In what way?
> >
> > Well, in some runtime environments (I'm not sure about Python), for
> > immutables you can combine the object header and the bytes array into a
> > single allocation. Further, the header need not contain an explicit
> > pointer to the bytes themselves, instead the bytes are obtained by doing
> > pointer arithmetic on the header address.
>
> Hmm. That assumes that the mutable bytes type also supports changes to
> its length. I see that the Python bytes type does that, but I don't
> think it's really necessary - I'm not even sure it's useful.

It is. The I/O library uses it extensively for buffers: instead of
allocating a new object each time some data is added to a buffer, the
buffer is simply extended. This saves the malloc/free calls for the
object header, and in some cases realloc is also free (there is some
overallocation in the bytes type and sometimes realloc can extend an
object without moving it, if the space after it happens to be free).

> For a bytes array, you don't need a separate allocation, and it still
> can be mutable.
>
> > So in other words, the in-memory layout of the two structs is different
> > enough that attempting to combine them into a single struct is kind of
> > awkward.
>
> ... assuming the mutable bytes type behaves like a Python list, that
> is. If it behaved like a Java/C byte[], this issue would not exist.

There is no requirement to copy bad features from Java.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
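
A minimal sketch of the buffer-extension pattern described above, using bytearray as the mutable type (an assumption based on current Python 3; in the struni branch the mutable type was called bytes):

```python
buf = bytearray()
ident = id(buf)

# Each chunk is appended in place; no new buffer object is created per chunk.
for chunk in (b"abc", b"def", b"ghi"):
    buf += chunk

same_object = id(buf) == ident
contents = bytes(buf)
```

Whether the underlying realloc moves the data is an implementation detail that this sketch cannot observe; it only shows that the Python-level object is reused.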

From guido at python.org  Tue Aug  7 16:48:46 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 7 Aug 2007 07:48:46 -0700
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <5d44f72f0708062126u1853d99fx500990f91841e6b6@mail.gmail.com>
References: <46B637DD.7070905@v.loewis.de>
	<ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>
	<46B78F86.9000505@v.loewis.de>
	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>
	<46B79815.1030504@v.loewis.de>
	<ca471dc20708061739w513fb601v8cf72b2eb2a490fd@mail.gmail.com>
	<5d44f72f0708062126u1853d99fx500990f91841e6b6@mail.gmail.com>
Message-ID: <ca471dc20708070748y12f2fb88gdd359833ddad6ad4@mail.gmail.com>

On 8/6/07, Jeffrey Yasskin <jyasskin at gmail.com> wrote:
> On 8/6/07, Guido van Rossum <guido at python.org> wrote:
> > I'm waiting for a show-stopper issue that can't be solved without
> > having an immutable bytes type.
>
> Apologies if this has been answered before, but why are you waiting
> for a show-stopper that requires an immutable bytes type rather than
> one that requires a mutable one? This being software, there isn't
> likely to be a real show-stopper (especially if you're willing to copy
> the whole object), just things that are unnecessarily annoying or
> confusing. Hashing seems to be one of those.

Well one reason of course is that we currently have a mutable bytes
object and that it works well in most situations.

> Taking TOOWTDI as a guideline: If you have immutable bytes and need a
> mutable object, just use list().

That would not work with low-level I/O (sometimes readinto() is
useful), and in general list(b) (where b is a bytes object) takes up
an order of magnitude more memory than b.

> If you have mutable bytes and need an
> immutable object, you could 1) convert it to an int (probably
> big-endian), 2) convert it to a latin-1 unicode object (containing
> garbage, of course), 3) figure out an encoding in which to assume the
> bytes represent text and create a unicode string from that, or 4) use
> the deprecated str8 type. Why isn't this a clear win for immutable
> bytes?

IMO there are some use cases where mutable bytes are the only
realistic solution. These mostly have to do with doing large amounts
of I/O reusing a buffer. Currently the array module can be used for
this but I would like to get rid of it in favor of bytes and Travis
Oliphant's new buffer API (which serves a similar purpose as the array
module but has a much more powerful mini-language to describe the
internal structure of the elements, similar to the struct module.)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
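
Both points, readinto() with a reused buffer and the memory cost of list(b), can be illustrated with a short sketch. It assumes current CPython 3, with io.BytesIO standing in for a real file; the sizes reported by sys.getsizeof() are implementation details.

```python
import io
import sys

# Read a stream in fixed-size pieces into one preallocated buffer.
source = io.BytesIO(b"x" * 10000)
buf = bytearray(4096)
total = 0
while True:
    n = source.readinto(buf)   # fills buf in place, returns the byte count
    if not n:
        break
    total += n

# Compare the footprint of a bytes object with a list of its elements.
b = bytes(1000)
size_bytes = sys.getsizeof(b)
size_list = sys.getsizeof(list(b))   # one pointer per element
```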

From aahz at pythoncraft.com  Tue Aug  7 16:56:02 2007
From: aahz at pythoncraft.com (Aahz)
Date: Tue, 7 Aug 2007 07:56:02 -0700
Subject: [Python-3000] map() Returns Iterator
In-Reply-To: <ca471dc20708061922g4e8babceh85dc2687beae8978@mail.gmail.com>
References: <87r6mjj34f.fsf@hydra.hampton.thirdcreek.com>
	<ca471dc20708061118k57d10f16g6f6a57ce3a9f907e@mail.gmail.com>
	<87k5s8i2yn.fsf@hydra.hampton.thirdcreek.com>
	<ca471dc20708061922g4e8babceh85dc2687beae8978@mail.gmail.com>
Message-ID: <20070807145602.GA20333@panix.com>

On Mon, Aug 06, 2007, Guido van Rossum wrote:
>
> I've always made a point of suggesting that we're switching to
> returning iterators instead of lists from as many APIs as makes sense
> (I stop at str.split() though, as I can't think of a use case where
> the list would be so big as to be bothersome).

s = ('123456789' * 10) + '\n'
s = s * 10**9
s.split('\n')

Now, maybe we "shouldn't" be processing all that in memory, but if your
argument applies to other things, I don't see why it shouldn't apply to
split().  Keep in mind that because split() generates a new string for
each line, that really does eat lots of memory, even if you switch to
10**6 instead of 10**9, which seems like a very common use case.
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

This is Python.  We don't care much about theory, except where it intersects 
with useful practice.  
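
For comparison, a lazy variant of split() could look like the sketch below. iter_split() is a hypothetical helper written for illustration, not an existing or proposed str method; it yields the same pieces as s.split(sep) for a non-empty separator, one at a time.

```python
def iter_split(s, sep):
    """Yield the pieces of s separated by sep without building a list."""
    if not sep:
        raise ValueError("empty separator")
    start = 0
    while True:
        idx = s.find(sep, start)
        if idx == -1:
            # No more separators: the remainder is the last piece.
            yield s[start:]
            return
        yield s[start:idx]
        start = idx + len(sep)

# Only one piece is held at a time while counting the non-empty lines.
s = (("123456789" * 10) + "\n") * 1000
line_count = sum(1 for line in iter_split(s, "\n") if line)
```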

From guido at python.org  Tue Aug  7 16:59:05 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 7 Aug 2007 07:59:05 -0700
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46B866A0.2040800@gmail.com>
References: <46B637DD.7070905@v.loewis.de>
	<ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>
	<46B78F86.9000505@v.loewis.de>
	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>
	<da3f900e0708061709v53d93f7m76519dd6cee92cc5@mail.gmail.com>
	<ca471dc20708061719p3092e6bbu47c68bbfddc11012@mail.gmail.com>
	<46B82B88.5000804@canterbury.ac.nz> <46B866A0.2040800@gmail.com>
Message-ID: <ca471dc20708070759v6f9894a6w84838553f672cd5f@mail.gmail.com>

On 8/7/07, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Yeah, this approach seems to run counter to the whole point of getting
> rid of the current str type: "for binary data use bytes, for text use
> Unicode, unless you need your binary data to be hashable, and then you
> decode it to gibberish Unicode via the latin-1 codec"
>
> This would mean that the Unicode type would acquire all of the ambiguity
> currently associated with the 8-bit str type: does it contain actual
> text, or does it contain arbitrary latin-1 decoded binary data?

Not necessarily, as this kind of use is typically very localized.
Remember practicality beats purity.

> A separate frozenbytes type (with the bytes API instead of the string
> API) would solve the problem far more cleanly.

But at a cost: an extra data type, more code to maintain, more docs to
write, thicker books, etc.

To me, the most important cost is that every time you need to use
bytes you would have to think about whether to use frozen or mutable
bytes.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From shiblon at gmail.com  Tue Aug  7 16:59:31 2007
From: shiblon at gmail.com (Chris Monson)
Date: Tue, 7 Aug 2007 10:59:31 -0400
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <ca471dc20708061719p3092e6bbu47c68bbfddc11012@mail.gmail.com>
References: <46B637DD.7070905@v.loewis.de>
	<ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>
	<46B78F86.9000505@v.loewis.de>
	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>
	<da3f900e0708061709v53d93f7m76519dd6cee92cc5@mail.gmail.com>
	<ca471dc20708061719p3092e6bbu47c68bbfddc11012@mail.gmail.com>
Message-ID: <da3f900e0708070759y43f9b925le9914fe9c63be4d1@mail.gmail.com>

On 8/6/07, Guido van Rossum <guido at python.org> wrote:
>
> On 8/6/07, Chris Monson <shiblon at gmail.com> wrote:
> > On 8/6/07, Guido van Rossum <guido at python.org> wrote:
> > > On 8/6/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > > > b) should bytes literals be regular or frozen bytes?
> > >
> > > Regular -- set literals produce mutable sets, too.
> >
> > But all other string literals produce immutable types:
> >
> > ""
> > r""
> > u"" (going away, but still)
> > and hopefully b""
> >
> > Wouldn't it be confusing to have b"" be the only mutable quote-delimited
> > literal?  For everything else, there's bytes().
>
> Well, it would be just as confusing to have a bytes literal and not
> have it return a bytes object. The frozenbytes type is intended (if I
> understand the use case correctly) for the relatively rare case
> where bytes must be used as dict keys and we can't assume that the
> bytes use any particular encoding.
>
> Personally, I still think that converting to the latin-1 encoding is
> probably just as good for this particular use case. So perhaps I don't
> understand the use case(s?) correctly.
>
> > :-)
>
> What does the :-) mean? That you're not seriously objecting?


No, just that I'm friendly.  (just a smile, not a wink).

I still think that having b"" be the only mutable string-looking thing is
a bad idea.  Just because the types are named "bytes" and "frozenbytes"
instead of "bytes" and "BytesIO" or something similar doesn't mean that the
syntax magically looks right.

--
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>

From guido at python.org  Tue Aug  7 17:01:41 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 7 Aug 2007 08:01:41 -0700
Subject: [Python-3000] Pleaswe help with the countdown to zero failing
	tests in the struni branch!
In-Reply-To: <18104.33617.706079.853923@montanaro.dyndns.org>
References: <ca471dc20708061655w62420da3m1d87ffa5eff669c6@mail.gmail.com>
	<f99n87$8or$1@sea.gmane.org>
	<18104.33617.706079.853923@montanaro.dyndns.org>
Message-ID: <ca471dc20708070801obfe5adp179c59474b196e14@mail.gmail.com>

Odd. It passes for me. What platform? What locale? Have you tried svn
up and rebuilding? Do you have any local changes (svn st)?

Note that in s'\x00', the 's' prefix is produced by the repr() of a
str8 object; this may be enough of a hint to track it down. Perhaps
there's a call to PyString_From... that got missed by the conversion
and only matters for certain locales?

--Guido

On 8/7/07, skip at pobox.com <skip at pobox.com> wrote:
> test_csv got removed from the failing list after Guido applied Adam Hupp's
> patch.  (I checked in a small update for one thing Adam missed.)  I'm still
> getting test failures though:
>
>     ======================================================================
>     FAIL: test_reader_attrs (__main__.Test_Csv)
>     ----------------------------------------------------------------------
>     Traceback (most recent call last):
>       File "Lib/test/test_csv.py", line 63, in test_reader_attrs
>         self._test_default_attrs(csv.reader, [])
>       File "Lib/test/test_csv.py", line 47, in _test_default_attrs
>         self.assertEqual(obj.dialect.delimiter, ',')
>     AssertionError: s'\x00' != ','
>
> This same exception crops up six times.  Maybe this isn't
> str->unicode-related, but it sure seems like it to me.  I spent some time
> over the past few days trying to figure it out, but I struck out.
>
> Skip


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From p.f.moore at gmail.com  Tue Aug  7 17:02:40 2007
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 7 Aug 2007 16:02:40 +0100
Subject: [Python-3000] Pleaswe help with the countdown to zero failing
	tests in the struni branch!
In-Reply-To: <ca471dc20708070736t6797477bw6667e0608bccb316@mail.gmail.com>
References: <ca471dc20708061655w62420da3m1d87ffa5eff669c6@mail.gmail.com>
	<f99n87$8or$1@sea.gmane.org>
	<ca471dc20708070736t6797477bw6667e0608bccb316@mail.gmail.com>
Message-ID: <79990c6b0708070802y112f5750o5b5e79a2833a19b8@mail.gmail.com>

On 07/08/07, Guido van Rossum <guido at python.org> wrote:
> On 8/7/07, Thomas Heller <theller at ctypes.org> wrote:
> > I wanted to look into this and noticed that 'import time' on Windows
> > doesn't work anymore on my machine.  The reason is that PyUnicode_FromStringAndSize()
> > is called for the string 'Westeuropäische Normalzeit', and then fails with
> >
> > UnicodeDecodeError: 'utf8' codec can't decode bytes in position 9-11: invalid data
>
> I'm assuming that's a literal somewhere? In what encoding is it? That
> function was recently changed to require the input to be UTF-8. If the
> input isn't UTF-8, you'll have to use another API with an explicit
> encoding, PyUnicode_Decode().

I'd guess it's coming from a call to a Windows API somewhere. The
correct fix is probably to switch to using the "wide character"
Windows APIs, which will give Unicode values as results directly. A
shorter-term fix is possibly to use Windows' default code page to
decode all strings coming back from Windows APIs (although I'm not
sure it'll be any quicker in practice!).
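
To make the short-term fix concrete, here is a minimal Python sketch
(not the actual timemodule.c change).  It assumes the time zone name
arrived in cp1252, which is only a guess at the ANSI code page on a
Western European Windows machine, and shows why the same bytes are
rejected as UTF-8 but decode cleanly with an explicit code page:

```python
# Assumption: the C runtime handed back the name encoded in cp1252.
# "\xe4" is "a" with umlaut; the escape keeps this source file pure ASCII.
raw = "Westeurop\xe4ische Normalzeit".encode("cp1252")

try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    # Byte 0xE4 announces a multi-byte UTF-8 sequence, but the bytes
    # that follow it are plain ASCII letters, not continuation bytes.
    utf8_error = exc

# Decoding with the code page the bytes were written in succeeds.
name = raw.decode("cp1252")
```

The failure starts at byte offset 9, the same position reported in the
traceback quoted above.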

Paul.

From lists at cheimes.de  Tue Aug  7 17:21:55 2007
From: lists at cheimes.de (Christian Heimes)
Date: Tue, 07 Aug 2007 17:21:55 +0200
Subject: [Python-3000] Pleaswe help with the countdown to zero failing
 tests in the struni branch!
In-Reply-To: <ca471dc20708070739i2294ac62qce404ddbf42b1ab6@mail.gmail.com>
References: <ca471dc20708061655w62420da3m1d87ffa5eff669c6@mail.gmail.com>	
	<46B80D73.5050009@cheimes.de>
	<ca471dc20708070739i2294ac62qce404ddbf42b1ab6@mail.gmail.com>
Message-ID: <46B88E13.7070908@cheimes.de>

Guido van Rossum wrote:
> Alas, not for me (Ubuntu 6.06 LTS, UCS2 build):
> 
> ======================================================================
> ERROR: testEncodings (__main__.MinidomTest)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "Lib/test/test_minidom.py", line 872, in testEncodings
>     self.assertEqual(doc.toxml(),
>   File "/usr/local/google/home/guido/python/py3k-struni/Lib/xml/dom/minidom.py",
> line 46, in toxml
>     return self.toprettyxml("", "", encoding)
>   File "/usr/local/google/home/guido/python/py3k-struni/Lib/xml/dom/minidom.py",
> line 54, in toprettyxml
>     self.writexml(writer, "", indent, newl, encoding)
>   File "/usr/local/google/home/guido/python/py3k-struni/Lib/xml/dom/minidom.py",
> line 1747, in writexml
>     node.writexml(writer, indent, addindent, newl)
>   File "/usr/local/google/home/guido/python/py3k-struni/Lib/xml/dom/minidom.py",
> line 817, in writexml
>     node.writexml(writer,indent+addindent,addindent,newl)
>   File "/usr/local/google/home/guido/python/py3k-struni/Lib/xml/dom/minidom.py",
> line 1036, in writexml
>     _write_data(writer, "%s%s%s"%(indent, self.data, newl))
>   File "/usr/local/google/home/guido/python/py3k-struni/Lib/xml/dom/minidom.py",
> line 301, in _write_data
>     writer.write(data)
>   File "/usr/local/google/home/guido/python/py3k-struni/Lib/io.py",
> line 1023, in write
>     b = s.encode(self._encoding)
> UnicodeEncodeError: 'latin-1' codec can't encode character '\u20ac' in
> position 0: ordinal not in range(256)

What's your locale? My locale setting is de_DE.UTF-8.
When I run the minidom unit test with "LC_ALL=C ./python
Lib/test/test_minidom.py", testEncodings fails for me, too.

> So true. I'm hoping the real author will identify himself. :-)

His name is Lars Gustäbel.

Christian


From theller at ctypes.org  Tue Aug  7 17:50:27 2007
From: theller at ctypes.org (Thomas Heller)
Date: Tue, 07 Aug 2007 17:50:27 +0200
Subject: [Python-3000] Pleaswe help with the countdown to zero failing
 tests in the struni branch!
In-Reply-To: <ca471dc20708070736t6797477bw6667e0608bccb316@mail.gmail.com>
References: <ca471dc20708061655w62420da3m1d87ffa5eff669c6@mail.gmail.com>	<f99n87$8or$1@sea.gmane.org>
	<ca471dc20708070736t6797477bw6667e0608bccb316@mail.gmail.com>
Message-ID: <f9a4c3$1ef$1@sea.gmane.org>

Guido van Rossum schrieb:
> On 8/7/07, Thomas Heller <theller at ctypes.org> wrote:
>> Guido van Rossum schrieb:
>> > test_ctypes
>> > Recently one test started failing again, after Martin changed
>> > PyUnicode_FromStringAndSize() to use UTF8 instead of Latin1.
>>
>> I wanted to look into this and noticed that 'import time' on Windows
>> doesn't work anymore on my machine.  The reason is that PyUnicode_FromStringAndSize()
>> is called for the string 'Westeuropäische Normalzeit', and then fails with
>>
>> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 9-11: invalid data
> 
> I'm assuming that's a literal somewhere? In what encoding is it? That
> function was recently changed to require the input to be UTF-8. If the
> input isn't UTF-8, you'll have to use another API with an explicit
> encoding, PyUnicode_Decode().

It's in Modules/timemodule.c, line 691:
	PyModule_AddObject(m, "tzname",
			   Py_BuildValue("(zz)", tzname[0], tzname[1]));

According to MSDN, tzname is a global variable; its contents are somehow
derived from the TZ environment variable (which is not set in my case).

Is there another Py_BuildValue code that should be used?  BTW: There are
other occurrences of Py_BuildValue("(zz)", ...) in this file; someone should
probably check whether UTF-8 can be assumed as input.

> I'm pretty sure this change is also responsible for the one failure
> (as it started around the time that change was made) but I don't
> understand the failure well enough to track it down. (It looked like
> uninitialized memory was being accessed though.)

I'm not sure what failure you are talking about here.

> In case you wonder why it was changed, it's for symmetry with
> _PyUnicode_AsDefaultEncodedString(), which is the most common way to
> turn Unicode back into a char* without specifying an encoding. (And
> yes, that name needs to be changed.)
> 
> See recent posts here.
> 


From jeremy at alum.mit.edu  Tue Aug  7 17:52:05 2007
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Tue, 7 Aug 2007 11:52:05 -0400
Subject: [Python-3000] should rfc822 accept text io or binary io?
In-Reply-To: <3CC0BA3B-38D2-44F6-A230-E3B59C1016DE@acm.org>
References: <e8bf7a530708060630x699a44aaqa940cfe96a0e158b@mail.gmail.com>
	<18103.34967.170146.660275@montanaro.dyndns.org>
	<3CC0BA3B-38D2-44F6-A230-E3B59C1016DE@acm.org>
Message-ID: <e8bf7a530708070852g3877bfbayc7f722e9ee8d0c0c@mail.gmail.com>

On 8/6/07, Fred Drake <fdrake at acm.org> wrote:
> On Aug 6, 2007, at 4:46 PM, skip at pobox.com wrote:
> > I thought rfc822 was going away.  From the current module
> > documentation:
> > ...
> > Shouldn't rfc822 be gone altogether in Python 3?
>
> Yes.  And the answers to Jeremy's questions about what sort of IO is
> appropriate for the email package should be left to the email-sig as
> well, I suspect.  It's good that they've come up.

Hmmm.  Should we be using the email package to parse HTTP headers?
RFC 2616 says that HTTP headers follow the "same generic format" as
RFC 822, but RFC 822 says headers are ASCII and RFC 2616 says headers
are arbitrary 8-bit values.  You'd need to parse them differently.

I also wonder if it makes sense for httplib to depend on email.  If it
is possible to write generic code, maybe it belongs in a common
library rather than in either email or httplib.

I meant my original email to ask a more general question:  Does anyone
have some suggestions about how to design libraries that could deal
with bytes or strings?  If an HTTP header value contains 8-bit binary
data, does the client application expect bytes or a string in some
encoding?

If you have a library that consumes file-like objects, how do you deal
with bytes vs. strings?  Do you have two constructor options so that
the client can specify what kind of output the file-like object
produces?  Do you try to guess?  Do you just write code assuming
strings and let it fail on a bad lower() call when it gets bytes?

Jeremy


From jimjjewett at gmail.com  Tue Aug  7 17:56:49 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Tue, 7 Aug 2007 11:56:49 -0400
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <ca471dc20708070759v6f9894a6w84838553f672cd5f@mail.gmail.com>
References: <46B637DD.7070905@v.loewis.de>
	<ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>
	<46B78F86.9000505@v.loewis.de>
	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>
	<da3f900e0708061709v53d93f7m76519dd6cee92cc5@mail.gmail.com>
	<ca471dc20708061719p3092e6bbu47c68bbfddc11012@mail.gmail.com>
	<46B82B88.5000804@canterbury.ac.nz> <46B866A0.2040800@gmail.com>
	<ca471dc20708070759v6f9894a6w84838553f672cd5f@mail.gmail.com>
Message-ID: <fb6fbf560708070856g4435666bkd47ae65d9fc3a7a4@mail.gmail.com>

On 8/7/07, Guido van Rossum <guido at python.org> wrote:
> On 8/7/07, Nick Coghlan <ncoghlan at gmail.com> wrote:
> > This would mean that the Unicode type would acquire all of the ambiguity
> > currently associated with the 8-bit str type: does it contain actual
> > text, or does it contain arbitrary latin-1 decoded binary data?

...

> > A separate frozenbytes type (with the bytes API instead of the string
> > API) would solve the problem far more cleanly.

> But at a cost: an extra data type, more code to maintain, more docs to
> write, thicker books, etc.

I think that cost is already there, and we're making it even worse by
trying to use the same name for two distinct concepts.

(1)  A mutable buffer
(2)  A literal which isn't "characters"

Historically, most of the type(2) examples have just used ASCII (or at
least Latin-1) for convenience, so that they *look* like characters.
The actual requirements are on the bytes, though, so recoding them to
a different output format is not OK.

Also note that for type(2), immutability is important, not just for
efficiency, but conceptually.  These are generally compile-time
constants, and letting them change *will* lead to confusion.  (Even
letting them get replaced is confusing, but that sort of
monkey-patching is sufficiently rare and obvious that it seems to work
out OK today.)

-jJ

From collinw at gmail.com  Tue Aug  7 18:22:47 2007
From: collinw at gmail.com (Collin Winter)
Date: Tue, 7 Aug 2007 09:22:47 -0700
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46B834F6.7050307@canterbury.ac.nz>
References: <46B637DD.7070905@v.loewis.de>
	<ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>
	<46B78F86.9000505@v.loewis.de>
	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>
	<46B79815.1030504@v.loewis.de>
	<ca471dc20708061739w513fb601v8cf72b2eb2a490fd@mail.gmail.com>
	<5d44f72f0708062126u1853d99fx500990f91841e6b6@mail.gmail.com>
	<46B7F869.6080007@v.loewis.de> <46B834F6.7050307@canterbury.ac.nz>
Message-ID: <43aa6ff70708070922m645189dbvc744a1fbbbb88800@mail.gmail.com>

On 8/7/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Martin v. Löwis wrote:
> > Code that has been ported to the bytes type probably doesn't use it
> > correctly yet, but to me, the need for a buffery thing where you
> > can allocate some buffer, and then fill it byte-for-byte is quite
> > obvious.
>
> We actually already *have* something like that,
> i.e. array.array('B').

Could someone please explain to me the conceptual difference between
array.array('B'), bytes(), buffer objects and simple lists of
integers? I'm confused about when I should use which.

Collin Winter

From ncoghlan at gmail.com  Tue Aug  7 18:22:51 2007
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 08 Aug 2007 02:22:51 +1000
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <ca471dc20708070759v6f9894a6w84838553f672cd5f@mail.gmail.com>
References: <46B637DD.7070905@v.loewis.de>	
	<ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>	
	<46B78F86.9000505@v.loewis.de>	
	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>	
	<da3f900e0708061709v53d93f7m76519dd6cee92cc5@mail.gmail.com>	
	<ca471dc20708061719p3092e6bbu47c68bbfddc11012@mail.gmail.com>	
	<46B82B88.5000804@canterbury.ac.nz> <46B866A0.2040800@gmail.com>
	<ca471dc20708070759v6f9894a6w84838553f672cd5f@mail.gmail.com>
Message-ID: <46B89C5B.7090104@gmail.com>

Guido van Rossum wrote:
> On 8/7/07, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> A separate frozenbytes type (with the bytes API instead of the
>> string API) would solve the problem far more cleanly.
> 
> But at a cost: an extra data type, more code to maintain, more docs
> to write, thicker books, etc.
> 
> To me, the most important cost is that every time you need to use 
> bytes you would have to think about whether to use frozen or mutable 
> bytes.

I agree this cost exists, but I don't think it is very high. I would 
expect the situation to be the same as with sets - you'd use the mutable
version by default, unless there was some specific reason to want the 
frozen version (usually because you want something that is hashable, or 
easy to share safely amongst multiple clients).

However, code also talks louder than words in this case, and I don't 
have any relevant code, so I am going to try to stay out of this thread now.

Cheers,
Nick.


-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From guido at python.org  Tue Aug  7 18:35:00 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 7 Aug 2007 09:35:00 -0700
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <43aa6ff70708070922m645189dbvc744a1fbbbb88800@mail.gmail.com>
References: <46B637DD.7070905@v.loewis.de>
	<ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>
	<46B78F86.9000505@v.loewis.de>
	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>
	<46B79815.1030504@v.loewis.de>
	<ca471dc20708061739w513fb601v8cf72b2eb2a490fd@mail.gmail.com>
	<5d44f72f0708062126u1853d99fx500990f91841e6b6@mail.gmail.com>
	<46B7F869.6080007@v.loewis.de> <46B834F6.7050307@canterbury.ac.nz>
	<43aa6ff70708070922m645189dbvc744a1fbbbb88800@mail.gmail.com>
Message-ID: <ca471dc20708070935j50715ccre93143b0a30a0ae9@mail.gmail.com>

On 8/7/07, Collin Winter <collinw at gmail.com> wrote:
> Could someone please explain to me the conceptual difference between
> array.array('B'), bytes(), buffer objects and simple lists of
> integers? I'm confused about when I should use which.

Assuming you weren't being sarcastic, array('B') and bytes() are very
close except bytes have a literal notation and many string-ish
methods. The buffer objects returned by the buffer() builtin provide a
read-only view on other objects that happen to have an internal
buffer, like strings, bytes, arrays, PIL images, and numpy arrays.
Lists of integers don't have the property that the other three share
which is that their C representation is a contiguous array of bytes
(char* in C). This representation is important because to do efficient
I/O in C you need char*.
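
A small sketch of that distinction, written against the names that
later Python 3 releases settled on (an assumption relative to this
thread: there the mutable type is spelled bytearray and the buffer()
builtin became memoryview):

```python
import array

payload = b"abc"                    # bytes: has a literal notation
mutable = bytearray(payload)        # mutable byte buffer
arr = array.array("B", payload)     # typed array of unsigned bytes
numbers = list(payload)             # plain list of ints

# memoryview gives a view on any object exposing a contiguous buffer.
views = [memoryview(obj) for obj in (payload, mutable, arr)]
same_bytes = all(v.tobytes() == payload for v in views)

# A list of integers has no such buffer, so it cannot be viewed this way.
try:
    memoryview(numbers)
    list_has_buffer = True
except TypeError:
    list_has_buffer = False
```

The first three objects can be handed to C-level I/O without copying
element by element; the list would have to be converted first.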

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Aug  7 18:39:26 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 7 Aug 2007 09:39:26 -0700
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46B89C5B.7090104@gmail.com>
References: <46B637DD.7070905@v.loewis.de>
	<ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>
	<46B78F86.9000505@v.loewis.de>
	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>
	<da3f900e0708061709v53d93f7m76519dd6cee92cc5@mail.gmail.com>
	<ca471dc20708061719p3092e6bbu47c68bbfddc11012@mail.gmail.com>
	<46B82B88.5000804@canterbury.ac.nz> <46B866A0.2040800@gmail.com>
	<ca471dc20708070759v6f9894a6w84838553f672cd5f@mail.gmail.com>
	<46B89C5B.7090104@gmail.com>
Message-ID: <ca471dc20708070939i648f710fp95b0e1d878ed87cf@mail.gmail.com>

On 8/7/07, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Guido van Rossum wrote:
> > On 8/7/07, Nick Coghlan <ncoghlan at gmail.com> wrote:
> >> A separate frozenbytes type (with the bytes API instead of the
> >> string API) would solve the problem far more cleanly.
> >
> > But at a cost: an extra data type, more code to maintain, more docs
> > to write, thicker books, etc.
> >
> > To me, the most important cost is that every time you need to use
> > bytes you would have to think about whether to use frozen or mutable
> > bytes.
>
> I agree this cost exists, but I don't think it is very high. I would
> expect the situation to be the same as with sets - you'd use the mutable
> version by default, unless there was some specific reason to want the
> frozen version (usually because you want something that is hashable, or
> easy to share safely amongst multiple clients).

That would imply that b"..." should return a mutable bytes object,
which many people have objected to. If b"..." is immutable, the
immutable bytes type is in your face all the time and you'll have to
deal with the difference all the time. E.g. is the result of
concatenating a mutable and an immutable bytes object mutable? Does it
matter whether the mutable operand is first or second? Is a slice of
an immutable bytes array immutable itself?
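
For reference, this is how later Python 3 releases ended up answering
these questions.  It postdates this thread (the mutable type is spelled
bytearray there), so treat it as an illustration of one possible set of
answers rather than as what was being proposed here:

```python
frozen = b"abc"                 # immutable bytes literal
mutable = bytearray(b"def")     # mutable counterpart

# Concatenation: the result takes the type of the left operand.
left_frozen = frozen + mutable
left_mutable = mutable + frozen

# Slicing: the result has the same type as the object being sliced.
frozen_slice = frozen[0:2]
mutable_slice = mutable[0:2]
```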

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip at pobox.com  Tue Aug  7 18:46:50 2007
From: skip at pobox.com (skip at pobox.com)
Date: Tue, 7 Aug 2007 11:46:50 -0500
Subject: [Python-3000] Pleaswe help with the countdown to zero failing
 tests in the struni branch!
In-Reply-To: <ca471dc20708070801obfe5adp179c59474b196e14@mail.gmail.com>
References: <ca471dc20708061655w62420da3m1d87ffa5eff669c6@mail.gmail.com>
	<f99n87$8or$1@sea.gmane.org>
	<18104.33617.706079.853923@montanaro.dyndns.org>
	<ca471dc20708070801obfe5adp179c59474b196e14@mail.gmail.com>
Message-ID: <18104.41466.798544.931265@montanaro.dyndns.org>


    Guido> Odd. It passes for me. What platform? What locale? Have you tried
    Guido> svn up and rebuilding? Do you have any local changes (svn st)?

I am completely up-to-date:

    >>> sys.subversion
    ('CPython', 'branches/py3k-struni', '56800')

Running on Mac OS X (G4 Powerbook), no local modifications.  Configured like
so:

    ./configure --prefix=/Users/skip/local LDFLAGS=-L/opt/local/lib CPPFLAGS=-I/opt/local/include --with-pydebug

Locale:

    >>> locale.getdefaultlocale()
    (None, 'mac-roman')

Is there some environment variable I can set to run in a different locale?

    Guido> Note that in s\x00, the s prefix is produced by the repr() of a
    Guido> str8 object; this may be enough of a hint to track it
    Guido> down. Perhaps there's a call to PyString_From... that got missed
    Guido> by the conversion and only matters for certain locales?

I don't see any PyString_From... calls left in Modules/_csv.c.  Should I be
looking elsewhere?

Skip


From guido at python.org  Tue Aug  7 18:51:34 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 7 Aug 2007 09:51:34 -0700
Subject: [Python-3000] should rfc822 accept text io or binary io?
In-Reply-To: <e8bf7a530708070852g3877bfbayc7f722e9ee8d0c0c@mail.gmail.com>
References: <e8bf7a530708060630x699a44aaqa940cfe96a0e158b@mail.gmail.com>
	<18103.34967.170146.660275@montanaro.dyndns.org>
	<3CC0BA3B-38D2-44F6-A230-E3B59C1016DE@acm.org>
	<e8bf7a530708070852g3877bfbayc7f722e9ee8d0c0c@mail.gmail.com>
Message-ID: <ca471dc20708070951s4492aa25hd84cf40ef8a5df53@mail.gmail.com>

On 8/7/07, Jeremy Hylton <jeremy at alum.mit.edu> wrote:
> On 8/6/07, Fred Drake <fdrake at acm.org> wrote:
> > On Aug 6, 2007, at 4:46 PM, skip at pobox.com wrote:
> > > I thought rfc822 was going away.  From the current module
> > > documentation:
> > > ...
> > > Shouldn't rfc822 be gone altogether in Python 3?
> >
> > Yes.  And the answers to Jeremy's questions about what sort of IO is
> > appropriate for the email package should be left to the email-sig as
> > well, I suspect.  It's good that they've come up.
>
> Hmmm.  Should we be using the email package to parse HTTP headers?
> RFC 2616 says that HTTP headers follow the "same generic format" as
> RFC 822, but RFC 822 says headers are ASCII and RFC 2616 says headers
> are arbitrary 8-bit values.  You'd need to parse them differently.

I'm confused (and too lazy to read the RFCs). How can you have case
insensitivity (as HTTP clearly has) if the headers are arbitrary 8-bit
values? Assuming they mean it's an ASCII superset, does that mean that
HTTP doesn't have case insensitivity for bytes with values > 127?

> I also wonder if it makes sense for httplib to depend on email.  If it
> is possible to write generic code, maybe it belongs in a common
> library rather than in either email or httplib.
>
> I meant my original email to ask a more general question:  Does anyone
> have some suggestions about how to design libraries that could deal
> with bytes or strings?  If an HTTP header value contains 8-bit binary
> data, does the client application expect bytes or a string in some
> encoding?
>
> If you have a library that consumes file-like objects, how do you deal
> with bytes vs. strings?  Do you have two constructor options so that
> the client can specify what kind of output the file-like object
> produces?  Do you try to guess?  Do you just write code assuming
> strings and let it fail on a bad lower() call when it gets bytes?

In general I'm against writing polymorphic code that tries to work for
strings as well as bytes, except very small algorithms. For larger
amounts of code, you almost always run into the need for literals or
hashing or case conversion or other differences (e.g. \n vs. \r\n when
doing I/O).

I think it's conceptually cleaner to pick a particular type for an API
and stick to it. E.g. sockets, binary files (io.RawIOBase) and *dbm
files read/write bytes; text files (io.TextIOBase) read/write strings.
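
A minimal sketch of that split using the io module as it exists in
later Python 3 releases (io.BytesIO stands in for a binary file here;
that choice is only for illustration):

```python
import io

# A binary stream hands back bytes, exactly as stored.
raw = io.BytesIO("caf\u00e9\r\n".encode("utf-8"))
chunk = raw.read()

# Wrapping the same stream as text fixes the type at str: the wrapper
# owns the decoding and the newline translation.
raw.seek(0)
text = io.TextIOWrapper(raw, encoding="utf-8", newline=None)
line = text.readline()
```

The caller picks one layer and gets one type back; nothing in the API
has to guess between bytes and strings.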

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Aug  7 18:55:48 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 7 Aug 2007 09:55:48 -0700
Subject: [Python-3000] Pleaswe help with the countdown to zero failing
	tests in the struni branch!
In-Reply-To: <f9a4c3$1ef$1@sea.gmane.org>
References: <ca471dc20708061655w62420da3m1d87ffa5eff669c6@mail.gmail.com>
	<f99n87$8or$1@sea.gmane.org>
	<ca471dc20708070736t6797477bw6667e0608bccb316@mail.gmail.com>
	<f9a4c3$1ef$1@sea.gmane.org>
Message-ID: <ca471dc20708070955k4c734bcbge8efd08e6b001af8@mail.gmail.com>

On 8/7/07, Thomas Heller <theller at ctypes.org> wrote:
> Guido van Rossum schrieb:
> > On 8/7/07, Thomas Heller <theller at ctypes.org> wrote:
> >> Guido van Rossum schrieb:
> >> > test_ctypes
> >> > Recently one test started failing again, after Martin changed
> >> > PyUnicode_FromStringAndSize() to use UTF8 instead of Latin1.
> >>
> >> I wanted to look into this and noticed that 'import time' on Windows
> >> doesn't work anymore on my machine.  The reason is that PyUnicode_FromStringAndSize()
> >> is called for the string 'Westeuropäische Normalzeit', and then fails with
> >>
> >> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 9-11: invalid data
> >
> > I'm assuming that's a literal somewhere? In what encoding is it? That
> > function was recently changed to require the input to be UTF-8. If the
> > input isn't UTF-8, you'll have to use another API with an explicit
> > encoding, PyUnicode_Decode().
>
> It's in Modules/timemodule.c, line 691:
>         PyModule_AddObject(m, "tzname",
>                            Py_BuildValue("(zz)", tzname[0], tzname[1]));
>
> According to MSDN, tzname is a global variable; the contents is somehow
> derived from the TZ environment variable (which is not set in my case).

Is there anything from which you can guess the encoding (e.g. the
filesystem encoding?).

> Is there another Py_BuildValue code that should be used?  BTW: There are
> other occurrences of Py_BuildValue("(zz)", ...) in this file; someone should
> probably check if the UTF8 can be assumed as input.

These are all externally-provided strings. It will depend on the
platform what the encoding is.

I wonder if we need to add another format code to Py_BuildValue (and
its friends) to designate "platform default encoding" instead of
UTF-8.

> > I'm pretty sure this change is also responsible for the one failure
> > (as it started around the time that change was made) but I don't
> > understand the failure well enough to track it down. (It looked like
> > uninitialized memory was being accessed though.)
>
> I'm not sure what failure you are talking about here.

When I run test_ctypes I get this (1 error out of 301 tests):

======================================================================
ERROR: test_functions (ctypes.test.test_stringptr.StringPtrTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/ctypes/test/test_stringptr.py",
line 72, in test_functions
    x1 = r[0], r[1], r[2], r[3], r[4]
UnicodeDecodeError: 'utf8' codec can't decode byte 0xdb in position 0:
unexpected end of data

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jeremy at alum.mit.edu  Tue Aug  7 19:38:44 2007
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Tue, 7 Aug 2007 13:38:44 -0400
Subject: [Python-3000] should rfc822 accept text io or binary io?
In-Reply-To: <ca471dc20708070951s4492aa25hd84cf40ef8a5df53@mail.gmail.com>
References: <e8bf7a530708060630x699a44aaqa940cfe96a0e158b@mail.gmail.com>
	<18103.34967.170146.660275@montanaro.dyndns.org>
	<3CC0BA3B-38D2-44F6-A230-E3B59C1016DE@acm.org>
	<e8bf7a530708070852g3877bfbayc7f722e9ee8d0c0c@mail.gmail.com>
	<ca471dc20708070951s4492aa25hd84cf40ef8a5df53@mail.gmail.com>
Message-ID: <e8bf7a530708071038o25fb251au3f742c4019ee786e@mail.gmail.com>

On 8/7/07, Guido van Rossum <guido at python.org> wrote:
> On 8/7/07, Jeremy Hylton <jeremy at alum.mit.edu> wrote:
> > On 8/6/07, Fred Drake <fdrake at acm.org> wrote:
> > > On Aug 6, 2007, at 4:46 PM, skip at pobox.com wrote:
> > > > I thought rfc822 was going away.  From the current module
> > > > documentation:
> > > > ...
> > > > Shouldn't rfc822 be gone altogether in Python 3?
> > >
> > > Yes.  And the answers to Jeremy's questions about what sort of IO is
> > > appropriate for the email package should be left to the email-sig as
> > > well, I suspect.  It's good that they've come up.
> >
> > Hmmm.  Should we be using the email package to parse HTTP headers?
> > RFC 2616 says that HTTP headers follow the "same generic format" as
> > RFC 822, but RFC 822 says headers are ASCII and RFC 2616 says headers
> > are arbitrary 8-bit values.  You'd need to parse them differently.
>
> I'm confused (and too lazy to read the RFCs). How can you have case
> insensitivity (as HTTP clearly has) if the headers are arbitrary 8-bit
> values? Assuming they mean it's an ASCII superset, does that mean that
> HTTP doesn't have case insensitivity for bytes with values > 127?

For HTTP, the header names need to be ASCII, but the values can contain
bytes greater than 127.  I haven't read enough of the spec to know which header
values might include binary data and how you are supposed to interpret
them.  Assuming that the spec allows OCTET instead of token (which is
ASCII) for a reason, it suggests that the header values need to be
bytes.
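
One way to sketch that reading of the spec: decode the name, which is
an ASCII token, and leave the value as bytes.  The helper name
split_header and the X-Example header are made up for illustration;
this is not code from httplib or the email package:

```python
def split_header(line: bytes):
    """Split one raw header line into (name: str, value: bytes)."""
    name, sep, value = line.rstrip(b"\r\n").partition(b":")
    if not sep:
        raise ValueError("not a header line")
    # Names are ASCII tokens, so decoding and case-folding them is safe.
    # The value is left as bytes because its encoding is unknown here.
    return name.decode("ascii").lower(), value.strip()


name, value = split_header(b"X-Example: caf\xe9\r\n")
```

Case-insensitive lookup then only ever touches the ASCII name, and the
caller decides how, or whether, to decode the value.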

> > I also wonder if it makes sense for httplib to depend on email.  If it
> > is possible to write generic code, maybe it belongs in a common
> > library rather than in either email or httplib.
> >
> > I meant my original email to ask a more general question:  Does anyone
> > have some suggestions about how to design libraries that could deal
> > with bytes or strings?  If an HTTP header value contains 8-bit binary
> > data, does the client application expect bytes or a string in some
> > encoding?
> >
> > If you have a library that consumes file-like objects, how do you deal
> > with bytes vs. strings?  Do you have two constructor options so that
> > the client can specify what kind of output the file-like object
> > > produces?  Do you try to guess?  Do you just write code assuming
> > strings and let it fail on a bad lower() call when it gets bytes?
>
> In general I'm against writing polymorphic code that tries to work for
> strings as well as bytes, except very small algorithms. For larger
> amounts of code, you almost always run into the need for literals or
> hashing or case conversion or other differences (e.g. \n vs. \r\n when
> doing I/O).
>
> I think it's conceptually cleaner to pick a particular type for an API
> and stick to it. E.g. sockets, binary files (io.RawIOBase) and *dbm
> files read/write bytes; text files (io.TextIOBase) read/write strings.

It certainly makes rfc822 tricky to update.  Is it intended to work
with files or sockets?  In Python 2.x, it works with either.  If we
have some future email/rfc822/httpheaders library that parses the
"generic format," will it work with sockets or files or will we have
two versions?

Jeremy

From guido at python.org  Tue Aug  7 19:52:26 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 7 Aug 2007 10:52:26 -0700
Subject: [Python-3000] should rfc822 accept text io or binary io?
In-Reply-To: <e8bf7a530708071038o25fb251au3f742c4019ee786e@mail.gmail.com>
References: <e8bf7a530708060630x699a44aaqa940cfe96a0e158b@mail.gmail.com>
	<18103.34967.170146.660275@montanaro.dyndns.org>
	<3CC0BA3B-38D2-44F6-A230-E3B59C1016DE@acm.org>
	<e8bf7a530708070852g3877bfbayc7f722e9ee8d0c0c@mail.gmail.com>
	<ca471dc20708070951s4492aa25hd84cf40ef8a5df53@mail.gmail.com>
	<e8bf7a530708071038o25fb251au3f742c4019ee786e@mail.gmail.com>
Message-ID: <ca471dc20708071052j1dc74dber6cb7eb890c31668b@mail.gmail.com>

On 8/7/07, Jeremy Hylton <jeremy at alum.mit.edu> wrote:
> On 8/7/07, Guido van Rossum <guido at python.org> wrote:
> > On 8/7/07, Jeremy Hylton <jeremy at alum.mit.edu> wrote:
> > > Hmmm.  Should we be using the email package to parse HTTP headers?
> > > RFC 2616 says that HTTP headers follow the "same generic format" as
> > > RFC 822, but RFC 822 says headers are ASCII and RFC 2616 says headers
> > > are arbitrary 8-bit values.  You'd need to parse them differently.
> >
> > I'm confused (and too lazy to read the RFCs). How can you have case
> > insensitivity (as HTTP clearly has) if the headers are arbitrary 8-bit
> > values? Assuming they mean it's an ASCII superset, does that mean that
> > HTTP doesn't have case insensitivity for bytes with values > 127?
>
> For HTTP, the header names need to be ASCII, but the values can contain
> bytes greater than 127.  I haven't read enough of the spec to know which header
> values might include binary data and how you are supposed to interpret
> them.  Assuming that the spec allows OCTET instead of token (which is
> ASCII) for a reason, it suggests that the header values need to be
> bytes.

Bizarre. I'm not aware of any HTTP header that requires *binary*
values. I can imagine though that they may contain *encoded* text and
that they are leaving the encoding up to separate negotiations between
client and server, or another header, or specified explicitly by the
header, etc. It can't be pure binary because it's still subject to the
\r\n line terminator.

> > In general I'm against writing polymorphic code that tries to work for
> > strings as well as bytes, except very small algorithms. For larger
> > amounts of code, you almost always run into the need for literals or
> > hashing or case conversion or other differences (e.g. \n vs. \r\n when
> > doing I/O).
> >
> > I think it's conceptually cleaner to pick a particular type for an API
> > and stick to it. E.g. sockets, binary files (io.RawIOBase) and *dbm
> > files read/write bytes; text files (io.TextIOBase) read/write strings.
>
> It certainly makes rfc822 tricky to update.  Is it intended to work
> with files or sockets?  In Python 2.x, it works with either.  If we
> have some future email/rfc822/httpheaders library that parses the
> "generic format," will it work with sockets or files or will we have
> two versions?

It never worked with socket objects, did it? If it worked with the
objects returned by makefile(), why not use text mode ("r" or "w") as
the mode arg? (Then you can even specify an encoding.) IMO it makes
more sense to treat rfc822 headers as text, since they are for all
intents and purposes meant to be human-readable, and there's case
insensitivity implied.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jeremy at alum.mit.edu  Tue Aug  7 20:31:30 2007
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Tue, 7 Aug 2007 14:31:30 -0400
Subject: [Python-3000] should rfc822 accept text io or binary io?
In-Reply-To: <ca471dc20708071052j1dc74dber6cb7eb890c31668b@mail.gmail.com>
References: <e8bf7a530708060630x699a44aaqa940cfe96a0e158b@mail.gmail.com>
	<18103.34967.170146.660275@montanaro.dyndns.org>
	<3CC0BA3B-38D2-44F6-A230-E3B59C1016DE@acm.org>
	<e8bf7a530708070852g3877bfbayc7f722e9ee8d0c0c@mail.gmail.com>
	<ca471dc20708070951s4492aa25hd84cf40ef8a5df53@mail.gmail.com>
	<e8bf7a530708071038o25fb251au3f742c4019ee786e@mail.gmail.com>
	<ca471dc20708071052j1dc74dber6cb7eb890c31668b@mail.gmail.com>
Message-ID: <e8bf7a530708071131r257c3506tbb163df1097b4e02@mail.gmail.com>

On 8/7/07, Guido van Rossum <guido at python.org> wrote:
> On 8/7/07, Jeremy Hylton <jeremy at alum.mit.edu> wrote:
> > On 8/7/07, Guido van Rossum <guido at python.org> wrote:
> > > On 8/7/07, Jeremy Hylton <jeremy at alum.mit.edu> wrote:
> > > > Hmmm.  Should we be using the email package to parse HTTP headers?
> > > > RFC 2616 says that HTTP headers follow the "same generic format" as
> > > > RFC 822, but RFC 822 says headers are ASCII and RFC 2616 says headers
> > > > are arbitrary 8-bit values.  You'd need to parse them differently.
> > >
> > > I'm confused (and too lazy to read the RFCs). How can you have case
> > > insensitivity (as HTTP clearly has) if the headers are arbitrary 8-bit
> > > values? Assuming they mean it's an ASCII superset, does that mean that
> > > HTTP doesn't have case insensitivity for bytes with values > 127?
> >
> > For HTTP, the header names need to be ASCII, but the values can contain
> > bytes > 127.  I haven't read enough of the spec to know which header
> > values might include binary data and how you are supposed to interpret
> > them.  Assuming that the spec allows OCTET instead of token (which is
> > ASCII) for a reason, it suggests that the header values need to be
> > bytes.
>
> Bizarre. I'm not aware of any HTTP header that requires *binary*
> values. I can imagine though that they may contain *encoded* text and
> that they are leaving the encoding up to separate negotiations between
> client and server, or another header, or specified explicitly by the
> header, etc. It can't be pure binary because it's still subject to the
> \r\n line terminator.

I did a little more reading.

"""The TEXT rule is only used for descriptive field contents and values
   that are not intended to be interpreted by the message parser. Words
   of *TEXT MAY contain characters from character sets other than ISO-
   8859-1 [22] only when encoded according to the rules of RFC 2047
   [14].

       TEXT           = <any OCTET except CTLs,
                        but including LWS>
"""

The odd thing here is that RFC 2047 (MIME) seems to be about encoding
non-ASCII character sets in ASCII.  So the spec is kind of odd here.
The actual bytes on the wire seem to be ASCII, but they may have an
interpretation where those ASCII bytes represent a non-ASCII string.
So the shared parsing with email/rfc822 does seem reasonable.
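
A minimal sketch of that shared parsing, assuming the email package as
shipped in released Python 3 rather than the 2007 struni branch. The
header block and the X-Comment field are made up for illustration.

```python
from email.header import decode_header
from email.parser import BytesParser

# Hypothetical response head: ASCII header names, CRLF line endings.
raw = (
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"X-Comment: =?iso-8859-1?q?p=F6stal?=\r\n"
    b"\r\n"
)

# Parse only the header block with the RFC 822 style parser.
msg = BytesParser().parsebytes(raw, headersonly=True)

# Header names are looked up case-insensitively.
content_type = msg["content-type"]

# An RFC 2047 encoded word is plain ASCII on the wire, but it stands
# for non-ASCII text once decoded with the charset it names.
(payload, charset), = decode_header(msg["X-Comment"])
comment = payload.decode(charset)
```

Here the bytes stay ASCII until decode_header is asked to interpret them,
which matches the reading of the spec above.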

> > > In general I'm against writing polymorphic code that tries to work for
> > > strings as well as bytes, except very small algorithms. For larger
> > > amounts of code, you almost always run into the need for literals or
> > > hashing or case conversion or other differences (e.g. \n vs. \r\n when
> > > doing I/O).
> > >
> > > I think it's conceptually cleaner to pick a particular type for an API
> > > and stick to it. E.g. sockets, binary files (io.RawIOBase) and *dbm
> > > files read/write bytes; text files (io.TextIOBase) read/write strings.
> >
> > It certainly makes rfc822 tricky to update.  Is it intended to work
> > with files or sockets?  In Python 2.x, it works with either.  If we
> > have some future email/rfc822/httpheaders library that parses the
> > "generic format," will it work with sockets or files or will we have
> > two versions?
>
> It never worked with socket objects, did it? If it worked with the
> objects returned by makefile(), why not use text mode ("r" or "w") as
> the mode arg? (Then you can even specify an encoding.) IMO it makes
> more sense to treat rfc822 headers as text, since they are for all
> intents and purposes meant to be human-readable, and there's case
> insensitivity implied.

We use the same makefile() object to read the headers and the body.
We can't trust the body is text.  I guess we could change the code to
use two different makefile() calls--a text one for headers that is
closed when the headers are done, and a binary one for the body.

Jeremy

From stephen at xemacs.org  Tue Aug  7 20:49:53 2007
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 08 Aug 2007 03:49:53 +0900
Subject: [Python-3000] should rfc822 accept text io or binary io?
In-Reply-To: <ca471dc20708071052j1dc74dber6cb7eb890c31668b@mail.gmail.com>
References: <e8bf7a530708060630x699a44aaqa940cfe96a0e158b@mail.gmail.com>
	<18103.34967.170146.660275@montanaro.dyndns.org>
	<3CC0BA3B-38D2-44F6-A230-E3B59C1016DE@acm.org>
	<e8bf7a530708070852g3877bfbayc7f722e9ee8d0c0c@mail.gmail.com>
	<ca471dc20708070951s4492aa25hd84cf40ef8a5df53@mail.gmail.com>
	<e8bf7a530708071038o25fb251au3f742c4019ee786e@mail.gmail.com>
	<ca471dc20708071052j1dc74dber6cb7eb890c31668b@mail.gmail.com>
Message-ID: <873ayv89im.fsf@uwakimon.sk.tsukuba.ac.jp>

Guido van Rossum writes:

 > Bizarre. I'm not aware of any HTTP header that requires *binary*
 > values. I can imagine though that they may contain *encoded* text and
 > that they are leaving the encoding up to separate negotiations between
 > client and server, or another header, or specified explicitly by the
 > header, etc. It can't be pure binary because it's still subject to the
 > \r\n line terminator.

I assume that the relevant explanation is from RFC 2616, sec 2.2
<ftp://ftp.rfc-editor.org/in-notes/rfc2616.txt>:

   The TEXT rule is only used for descriptive field contents and values
   that are not intended to be interpreted by the message parser. Words
   of *TEXT MAY contain characters from character sets other than ISO-
   8859-1 [22] only when encoded according to the rules of RFC 2047
   [14].

       TEXT           = <any OCTET except CTLs, but including LWS>

   A CRLF is allowed in the definition of TEXT only as part of a header
   field continuation. It is expected that the folding LWS will be
   replaced with a single SP before interpretation of the TEXT value.

Many parsed fields are made up of tokens, whose components are a
subset of CHAR, which is US-ASCII characters as octets (also
sec. 2.2).  This is the ASCII coded character set (EBCDIC encoding of
the ASCII repertoire won't do).  Other parsed fields contain special
data, such as dates, written with some subset of ASCII.
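
The folding rule quoted above can be sketched in a few lines. This is
only an illustration, assuming the header block is already in memory as
bytes; the function name and the sample headers are hypothetical.

```python
import re

# A CRLF followed by at least one space or tab marks a folded line.
_FOLD = re.compile(rb"\r\n[ \t]+")


def unfold(header_block: bytes) -> bytes:
    """Replace each folding LWS with a single SP, as sec. 2.2 expects."""
    return _FOLD.sub(b" ", header_block)


folded = b"X-Note: first part\r\n\tsecond part\r\nHost: example.org\r\n"
unfolded = unfold(folded)
```

A CRLF that is not followed by a space or tab still ends the field, so
the Host line stays on its own line.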

From guido at python.org  Tue Aug  7 20:38:25 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 7 Aug 2007 11:38:25 -0700
Subject: [Python-3000] should rfc822 accept text io or binary io?
In-Reply-To: <e8bf7a530708071131r257c3506tbb163df1097b4e02@mail.gmail.com>
References: <e8bf7a530708060630x699a44aaqa940cfe96a0e158b@mail.gmail.com>
	<18103.34967.170146.660275@montanaro.dyndns.org>
	<3CC0BA3B-38D2-44F6-A230-E3B59C1016DE@acm.org>
	<e8bf7a530708070852g3877bfbayc7f722e9ee8d0c0c@mail.gmail.com>
	<ca471dc20708070951s4492aa25hd84cf40ef8a5df53@mail.gmail.com>
	<e8bf7a530708071038o25fb251au3f742c4019ee786e@mail.gmail.com>
	<ca471dc20708071052j1dc74dber6cb7eb890c31668b@mail.gmail.com>
	<e8bf7a530708071131r257c3506tbb163df1097b4e02@mail.gmail.com>
Message-ID: <ca471dc20708071138s7722ddbdp7d675f8b83084000@mail.gmail.com>

On 8/7/07, Jeremy Hylton <jeremy at alum.mit.edu> wrote:
> We use the same makefile() object to read the headers and the body.
> We can't trust the body is text.  I guess we could change the code to
> use two different makefile() calls--a text one for headers that is
> closed when the headers are done, and a binary one for the body.

That would cause problems with the buffering, but it is safe to
extract the underlying binary buffered stream from the TextIOWrapper
instance using the .buffer attribute -- this is intentionally not
prefixed with an underscore.
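
A minimal sketch, assuming the io module as shipped in released
Python 3. There the text wrapper may read ahead in chunks, so this
sketch does not read through the wrapper at all: it takes the binary
stream from .buffer, decodes the header lines itself, and reads the body
as bytes. The helper name and the sample message are made up.

```python
import io


def read_head(stream):
    """Read header lines from a binary stream up to the blank line."""
    headers = {}
    while True:
        line = stream.readline()
        if line in (b"\r\n", b"\n", b""):
            break
        name, _, value = line.decode("iso-8859-1").partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


stream = io.BytesIO(b"Content-Length: 4\r\n\r\n\x00\x01\x02\xff")
wrapper = io.TextIOWrapper(stream, encoding="iso-8859-1")

# .buffer is the public handle on the underlying binary stream.
binary = wrapper.buffer

headers = read_head(binary)
body = binary.read(int(headers["content-length"]))
```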

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From lars at gustaebel.de  Tue Aug  7 20:40:00 2007
From: lars at gustaebel.de (Lars =?iso-8859-15?Q?Gust=E4bel?=)
Date: Tue, 7 Aug 2007 20:40:00 +0200
Subject: [Python-3000] Pleaswe help with the countdown to zero
	failing	tests in the struni branch!
In-Reply-To: <46B80D73.5050009@cheimes.de>
References: <ca471dc20708061655w62420da3m1d87ffa5eff669c6@mail.gmail.com>
	<46B80D73.5050009@cheimes.de>
Message-ID: <20070807184000.GA26947@core.g33x.de>

On Tue, Aug 07, 2007 at 08:13:07AM +0200, Christian Heimes wrote:
> > test_tarfile
> > Virgin territory again (but different owner :-).
> 
> The tarfile should be addressed by either its original author or
> somebody with lots of spare time. As stated earlier it's a beast. I
> tried to fix it several weeks ago because I thought it is a low hanging
> fruit. I was totally wrong. :/

Okay, I fixed tarfile.py. It isn't that hard if you know how to tame
the beast ;-) I hope everything works fine now.

-- 
Lars Gustäbel
lars at gustaebel.de

Man can indeed do what he wills,
but he cannot will what he wills.
(Arthur Schopenhauer)


From guido at python.org  Tue Aug  7 21:27:40 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 7 Aug 2007 12:27:40 -0700
Subject: [Python-3000] Pleaswe help with the countdown to zero failing
	tests in the struni branch!
In-Reply-To: <20070807184000.GA26947@core.g33x.de>
References: <ca471dc20708061655w62420da3m1d87ffa5eff669c6@mail.gmail.com>
	<46B80D73.5050009@cheimes.de> <20070807184000.GA26947@core.g33x.de>
Message-ID: <ca471dc20708071227pfb24d03ifa7de91b92d8e9b8@mail.gmail.com>

I still get these three failures on Ubuntu dapper:


======================================================================
ERROR: test_fileobj_iter (test.test_tarfile.Bz2UstarReadTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/test/test_tarfile.py",
line 83, in test_fileobj_iter
    tarinfo = self.tar.getmember("ustar/regtype")
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py",
line 2055, in extract
    self._extract_member(tarinfo, os.path.join(path, tarinfo.name))
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py",
line 2131, in _extract_member
    self.makefile(tarinfo, targetpath)
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py",
line 2169, in makefile
    copyfileobj(source, target)
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py",
line 254, in copyfileobj
    shutil.copyfileobj(src, dst)
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/shutil.py",
line 21, in copyfileobj
    buf = fsrc.read(length)
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py",
line 809, in read
    buf += self.fileobj.read(size - len(buf))
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py",
line 718, in read
    return self.readnormal(size)
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py",
line 727, in readnormal
    return self.fileobj.read(size)
ValueError: the bz2 library has received wrong parameters

======================================================================
ERROR: test_fileobj_readlines (test.test_tarfile.Bz2UstarReadTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/test/test_tarfile.py",
line 67, in test_fileobj_readlines
    tarinfo = self.tar.getmember("ustar/regtype")
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py",
line 2055, in extract
    self._extract_member(tarinfo, os.path.join(path, tarinfo.name))
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py",
line 2131, in _extract_member
    self.makefile(tarinfo, targetpath)
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py",
line 2169, in makefile
    copyfileobj(source, target)
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py",
line 254, in copyfileobj
    shutil.copyfileobj(src, dst)
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/shutil.py",
line 21, in copyfileobj
    buf = fsrc.read(length)
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py",
line 809, in read
    buf += self.fileobj.read(size - len(buf))
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py",
line 718, in read
    return self.readnormal(size)
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py",
line 727, in readnormal
    return self.fileobj.read(size)
ValueError: the bz2 library has received wrong parameters

======================================================================
ERROR: test_fileobj_seek (test.test_tarfile.Bz2UstarReadTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/test/test_tarfile.py",
line 93, in test_fileobj_seek
    fobj = open(os.path.join(TEMPDIR, "ustar/regtype"), "rb")
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py",
line 2055, in extract
    self._extract_member(tarinfo, os.path.join(path, tarinfo.name))
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py",
line 2131, in _extract_member
    self.makefile(tarinfo, targetpath)
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py",
line 2169, in makefile
    copyfileobj(source, target)
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py",
line 254, in copyfileobj
    shutil.copyfileobj(src, dst)
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/shutil.py",
line 21, in copyfileobj
    buf = fsrc.read(length)
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py",
line 809, in read
    buf += self.fileobj.read(size - len(buf))
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py",
line 718, in read
    return self.readnormal(size)
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py",
line 727, in readnormal
    return self.fileobj.read(size)
ValueError: the bz2 library has received wrong parameters

----------------------------------------------------------------------
Ran 140 tests in 5.346s

FAILED (errors=3)
test test_tarfile failed -- errors occurred; run in verbose mode for details
1 test failed:
    test_tarfile
[69852 refs]


On 8/7/07, Lars Gustäbel <lars at gustaebel.de> wrote:
> On Tue, Aug 07, 2007 at 08:13:07AM +0200, Christian Heimes wrote:
> > > test_tarfile
> > > Virgin territory again (but different owner :-).
> >
> > The tarfile should be addressed by either its original author or
> > somebody with lots of spare time. As stated earlier it's a beast. I
> > tried to fix it several weeks ago because I thought it is a low hanging
> > fruit. I was totally wrong. :/
>
> Okay, I fixed tarfile.py. It isn't that hard if you know how to tame
> the beast ;-) I hope everything works fine now.
>
> --
> Lars Gustäbel
> lars at gustaebel.de
>
> Man can indeed do what he wills,
> but he cannot will what he wills.
> (Arthur Schopenhauer)


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From rrr at ronadam.com  Tue Aug  7 21:52:55 2007
From: rrr at ronadam.com (Ron Adam)
Date: Tue, 07 Aug 2007 14:52:55 -0500
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B83C7F.603@canterbury.ac.nz>
References: <46B13ADE.7080901@acm.org>	<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>	<ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>	<46B2D147.90606@acm.org>
	<46B2E265.5080905@ronadam.com>	<46B35295.1030007@acm.org>	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>	<46B39BBF.80809@ronadam.com>
	<46B46DE1.1090403@canterbury.ac.nz>	<46B4950F.40905@ronadam.com>
	<46B4A9A0.9070206@ronadam.com>	<46B54F51.40705@acm.org>
	<46B59F05.3070200@ronadam.com>	<46B66E7E.4060209@canterbury.ac.nz>
	<46B6A7E8.7040001@ronadam.com>	<46B6C335.4080504@canterbury.ac.nz>
	<46B6DE80.2050000@ronadam.com>	<46B7C369.3040509@canterbury.ac.nz>
	<46B7DEA6.5050609@ronadam.com> <46B83C7F.603@canterbury.ac.nz>
Message-ID: <46B8CD97.8020000@ronadam.com>



Greg Ewing wrote:
> Ron Adam wrote:
>> What about mismatched specifiers?
> 
> It's not clear exactly what you mean by a "mismatched"
> specifier.
>
> Some types may recognise when they're being passed
> a format spec that belongs to another type, and try
> to convert themselves to that type (e.g. applying
> 'f' to an int or 'd' to a float).

After thinking about this a bit more, I think specifiers don't have types 
and don't belong to types. They are the instructions to convert an object 
to a string, and to format that string in special ways.  It seems they 
are mapped one to many, and not one to one.


> If the type doesn't recognise the format at all,
> and doesn't have a fallback type to delegate to
> (as will probably be the case with str) then
> you will get an exception.

I agree.


>> I think the opinion so far is to let the object's __format__ method 
>> determine this, but we need to figure out what the built-in types 
>> will do.
> 
> My suggestions would be:
> 
>    int - understands all the 'integer' formats
>          (d, x, o, etc.)
>        - recognises the 'float' formats ('f', 'e', etc.)
>          and delegates to float
>        - delegates anything it doesn't recognise to str
> 
>    float - understands all the 'float' formats
>          - recognises the 'integer' formats and delegates to int
>          - delegates anything it doesn't recognise to str
> 
>    str - recognises the 'string' formats (only one?)
>        - raises an exception for anything it doesn't understand
> 
> I've forgotten where 'r' was supposed to fit into
> this scheme. Can anyone remind me?

So if I want my own objects to work with these other specifiers, I need to 
do something like the following in every object?

     def __format__(self, specifier):
         if specifier[0] in ['i', 'x', 'o', etc]:
             return int(self).format(specifier)
         if specifier[0] in ['f', 'e', etc]:
             return float(self).format(specifier)
         if specifier[0] == 'r':
             return repr(self)
         if specifier[0] == 's':
             return str(self).format(specifier)
         if specifier[0] in '...':
             ...
             ... my own specifier handler
             ...
         raise ValueError('invalid specifier for this object type')
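
A runnable sketch of that delegation pattern, written against the
format() builtin of released Python 3 rather than the draft API under
discussion here. The Celsius class is hypothetical, and the type code is
read from the end of the specifier because that is where released Python
puts it.

```python
class Celsius:
    """Hypothetical value type that reuses the built-in format codes."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __int__(self):
        return int(self.degrees)

    def __float__(self):
        return float(self.degrees)

    def __str__(self):
        return f"{self.degrees} C"

    def __format__(self, spec):
        if not spec:
            return str(self)
        kind = spec[-1]
        if kind in ("d", "x", "o", "b"):
            # Integer codes: delegate to int.
            return format(int(self), spec)
        if kind in ("f", "e", "g", "%"):
            # Float codes: delegate to float.
            return format(float(self), spec)
        if kind == "s":
            # String code: delegate to str.
            return format(str(self), spec)
        raise ValueError(f"invalid specifier {spec!r} for Celsius")


t = Celsius(21.5)
as_float = format(t, ".2f")
as_hex = format(t, "x")
padded = format(t, ">10s")
```

Every class that wants the standard codes has to repeat this dispatch,
which is the cost the question above is pointing at.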



I'm currently playing with a model where specifiers are objects. This seems 
to simplify some things.  The specifier object parses the specifier term 
and has a method to apply it to a value.  It can know about all the 
standard built in specifiers.

It could call an object's __format__ method for an unknown specifier or we 
can have the __format__ method have first crack at it and write the default 
__format__ method like this...

     def __format__(self, specifier):
         return specifier.apply(self)



Then we can override it in various ways...

     def __format__(self, specifier):
         ...
         ... my specifier handler
         ...
         return result


     def __format__(self, specifier):
         if specifier[0] in '...':
             ...
             ... my specifier handler
             ...
             return result
         return specifier.apply(self)


     def __format__(self, specifier):
         try:
             return specifier.apply(self)
         except ValueError:
             pass
         ...
         ... my specifier handler
         ...
         return result


Or we can say the standard ones don't call your __format__ method, but if 
you use the '!' specifier, it will call your __format__ method.  More 
limiting, but much simpler.  I'm not sure I have a preference here yet.
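
A rough sketch of the specifier-object model, using the "standard codes
first, own handler second" variant. Everything here is hypothetical: the
Specifier class, its apply() method, the Money type and its '$' code are
made up, and the object is built from the plain string that released
Python 3 passes to __format__.

```python
class Specifier:
    """Hypothetical specifier object that knows the standard codes."""

    STANDARD = frozenset("dxofes")

    def __init__(self, text):
        self.text = text
        self.kind = text[-1:]  # type code, taken from the end

    def apply(self, value):
        if not self.text:
            return str(value)
        if self.kind not in self.STANDARD:
            raise ValueError(f"unknown specifier {self.text!r}")
        if self.kind == "s":
            return format(str(value), self.text)
        if self.kind in ("d", "x", "o"):
            return format(int(value), self.text)
        return format(float(value), self.text)


class Money:
    """Hypothetical type: standard codes first, then its own '$' code."""

    def __init__(self, amount):
        self.amount = amount

    def __int__(self):
        return int(self.amount)

    def __float__(self):
        return float(self.amount)

    def __str__(self):
        return f"{self.amount:.2f}"

    def __format__(self, spec):
        specifier = Specifier(spec)
        try:
            return specifier.apply(self)
        except ValueError:
            pass
        # Fall back to this type's own handler.
        if specifier.text == "$":
            return f"${self.amount:,.2f}"
        raise ValueError(f"invalid specifier {spec!r} for Money")


price = Money(1234.5)
plain = format(price, ".1f")
custom = format(price, "$")
```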

Cheers,
    Ron








From guido at python.org  Tue Aug  7 22:13:03 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 7 Aug 2007 13:13:03 -0700
Subject: [Python-3000] Pleaswe help with the countdown to zero failing
	tests in the struni branch!
In-Reply-To: <18104.33617.706079.853923@montanaro.dyndns.org>
References: <ca471dc20708061655w62420da3m1d87ffa5eff669c6@mail.gmail.com>
	<f99n87$8or$1@sea.gmane.org>
	<18104.33617.706079.853923@montanaro.dyndns.org>
Message-ID: <ca471dc20708071313j5256adb7j79c7cc6d374daa66@mail.gmail.com>

I see this too now, but only on OSX.

On 8/7/07, skip at pobox.com <skip at pobox.com> wrote:
> test_csv got removed from the failing list after Guido applied Adam Hupp's
> patch.  (I checked in a small update for one thing Adam missed.)  I'm still
> getting test failures though:
>
>     ======================================================================
>     FAIL: test_reader_attrs (__main__.Test_Csv)
>     ----------------------------------------------------------------------
>     Traceback (most recent call last):
>       File "Lib/test/test_csv.py", line 63, in test_reader_attrs
>         self._test_default_attrs(csv.reader, [])
>       File "Lib/test/test_csv.py", line 47, in _test_default_attrs
>         self.assertEqual(obj.dialect.delimiter, ',')
>     AssertionError: s'\x00' != ','
>
> This same exception crops up six times.  Maybe this isn't
> str->unicode-related, but it sure seems like it to me.  I spent some time
> over the past few days trying to figure it out, but I struck out.
>
> Skip


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From rrr at ronadam.com  Tue Aug  7 22:26:54 2007
From: rrr at ronadam.com (Ron Adam)
Date: Tue, 07 Aug 2007 15:26:54 -0500
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B7CD8C.5070807@acm.org>
References: <46B13ADE.7080901@acm.org>
	<46B2D147.90606@acm.org>	<46B2E265.5080905@ronadam.com>
	<46B35295.1030007@acm.org>	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>	<46B39BBF.80809@ronadam.com>
	<46B46DE1.1090403@canterbury.ac.nz>	<46B4950F.40905@ronadam.com>
	<46B4A9A0.9070206@ronadam.com>	<46B52422.2090006@canterbury.ac.nz>	<ca471dc20708041909j3f86b107hace9407673c302ea@mail.gmail.com>	<46B568F3.9060105@ronadam.com>
	<46B66BE0.7090005@canterbury.ac.nz>	<46B6851C.1030204@ronadam.com>
	<46B6C219.4040900@canterbury.ac.nz>
	<46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org>
Message-ID: <46B8D58E.5040501@ronadam.com>



Talin wrote:
> Ron Adam wrote:
>> Now here's the problem with all of this.  As we add the widths back 
>> into the format specifications, we are basically saying the idea of a 
>> separate field width specifier is wrong.
>>
>> So maybe it's not really a separate independent thing after all, and 
>> it's just a convenient grouping for readability purposes only.
> 
> I'm beginning to suspect that this is indeed the case.

Yes, I believe so even more after experimenting last night with specifier 
objects.

For now I'm using ','s for separating *all* the terms.  I don't intend that 
this should be used for a final version, but for now it makes parsing the terms 
and getting the behavior right much easier.

      f,.3,>7     right justify in field width 7, with 3 decimal places.

      s,^10,w20   Center in field 10,  expands up to width 20.

      f,.3,%

This allows me to just split on ',' and experiment with ordering and see 
how some terms might need to interact with other terms and how to do that 
without having to fight the syntax problem for now.
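
A small sketch of that split-on-comma experiment, covering only the two
examples that are described above. The function names are made up, the
'%' term is left out because its meaning is not spelled out here, and
reading "w20" as a maximum width is an assumption.

```python
def parse_terms(spec):
    """Split a comma-separated specifier into named terms."""
    kind, *rest = spec.split(",")
    terms = {"type": kind, "precision": None, "align": None,
             "width": None, "max_width": None}
    for term in rest:
        head, tail = term[:1], term[1:]
        if head == ".":
            terms["precision"] = int(tail)
        elif head in ("<", ">", "^"):
            terms["align"], terms["width"] = head, int(tail)
        elif head == "w":
            terms["max_width"] = int(tail)
        else:
            raise ValueError(f"unknown term {term!r}")
    return terms


def apply_terms(value, spec):
    """Format value according to the parsed terms."""
    terms = parse_terms(spec)
    if terms["type"] == "f":
        precision = 6 if terms["precision"] is None else terms["precision"]
        text = format(float(value), f".{precision}f")
    else:
        text = str(value)
    if terms["max_width"] is not None:
        # Assumed meaning of wN: never grow past N characters.
        text = text[:terms["max_width"]]
    if terms["align"] is not None:
        text = format(text, f"{terms['align']}{terms['width']}")
    return text


right = apply_terms(3.14159, "f,.3,>7")
centered = apply_terms("abcd", "s,^10,w20")
```

Because each term is its own list item, the order of the terms can be
changed without touching the parser.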

Later the syntax can be compressed and tested with a fairly complete 
doctest as a separate problem.


> Before we go too much further, let me give out the URLs for the .Net 
> documentation on these topics, since much of the current design we're 
> discussing has been inspired by .Net:
> 
> http://msdn2.microsoft.com/en-us/library/dwhawy9k.aspx
> http://msdn2.microsoft.com/en-us/library/0c899ak8.aspx
> http://msdn2.microsoft.com/en-us/library/0asazeez.aspx
> http://msdn2.microsoft.com/en-us/library/c3s1ez6e.aspx
> http://msdn2.microsoft.com/en-us/library/az4se3k1.aspx
> http://msdn2.microsoft.com/en-us/library/txafckwd.aspx
> 
> I'd suggest some study of these. Although I would warn against adopting 
> this wholesale, as there are a huge number of features described in 
> these documents, more than I think we need.
> 
> One other URL for people who want to play around with implementing this 
> stuff is my Python prototype of the original version of the PEP. It has 
> all the code you need to format floats with decimal precision, 
> exponents, and so on:
> 
> http://www.viridia.org/hg/python/string_format?f=5e4b833ed285;file=StringFormat.py;style=raw 

Thanks, I'll take a look at it.

Cheers,
    Ron



From jimjjewett at gmail.com  Tue Aug  7 23:03:37 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Tue, 7 Aug 2007 17:03:37 -0400
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <ca471dc20708070935j50715ccre93143b0a30a0ae9@mail.gmail.com>
References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de>
	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>
	<46B79815.1030504@v.loewis.de>
	<ca471dc20708061739w513fb601v8cf72b2eb2a490fd@mail.gmail.com>
	<5d44f72f0708062126u1853d99fx500990f91841e6b6@mail.gmail.com>
	<46B7F869.6080007@v.loewis.de> <46B834F6.7050307@canterbury.ac.nz>
	<43aa6ff70708070922m645189dbvc744a1fbbbb88800@mail.gmail.com>
	<ca471dc20708070935j50715ccre93143b0a30a0ae9@mail.gmail.com>
Message-ID: <fb6fbf560708071403u43fd4ab8lf8ed6a6cf967a680@mail.gmail.com>

On 8/7/07, Guido van Rossum <guido at python.org> wrote:
> On 8/7/07, Collin Winter <collinw at gmail.com> wrote:
> > Could someone please explain to me the conceptual difference between
> > array.array('B'), bytes(), buffer objects and simple lists of
> > integers? I'm confused about when I should use which.

[bytes and array.array are similar, but bytes have extra methods and a
literal notation]

[buffer is read-only to your code, but may not be immutable]

> Lists of integers don't have the property that the other three share
> which is that their C representation is a contiguous array of bytes
> (char* in C). This representation is important because to do efficient
> I/O in C you need char*.

This sounds almost as if they were all interchangeable implementations
of the same interface, and you should choose based on quality of
implementation.

If the need for immutable isn't worth a distinct type, I'm not sure
why "I want it to be fast" is worth two (or three) extra types,
distinguished by the equivalent of cursor isolation level.

FWLIW, I think the migration path for the existing three types makes
sense, but I would like b" ... " to be an immutable bytes object, and
bytes(" ... ") to be the constructor for something that can mutate.

-jJ

From nas at arctrix.com  Tue Aug  7 23:12:01 2007
From: nas at arctrix.com (Neil Schemenauer)
Date: Tue, 7 Aug 2007 21:12:01 +0000 (UTC)
Subject: [Python-3000] should rfc822 accept text io or binary io?
References: <e8bf7a530708060630x699a44aaqa940cfe96a0e158b@mail.gmail.com>
	<18103.34967.170146.660275@montanaro.dyndns.org>
	<3CC0BA3B-38D2-44F6-A230-E3B59C1016DE@acm.org>
	<e8bf7a530708070852g3877bfbayc7f722e9ee8d0c0c@mail.gmail.com>
Message-ID: <f9an71$vtp$1@sea.gmane.org>

Jeremy Hylton <jeremy at alum.mit.edu> wrote:
> Hmmm.  Should we be using the email package to parse HTTP headers?
> RFC 2616 says that HTTP headers follow the "same generic format" as
> RFC 822, but RFC 822 says headers are ASCII and RFC 2616 says headers
> are arbitrary 8-bit values.  You'd need to parse them differently.

It would be good to have a good RFC 2616 header parser in the
standard library.  I believe every Python web framework implements
its own.

  Neil


From jimjjewett at gmail.com  Tue Aug  7 23:14:00 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Tue, 7 Aug 2007 17:14:00 -0400
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <ca471dc20708070939i648f710fp95b0e1d878ed87cf@mail.gmail.com>
References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de>
	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>
	<da3f900e0708061709v53d93f7m76519dd6cee92cc5@mail.gmail.com>
	<ca471dc20708061719p3092e6bbu47c68bbfddc11012@mail.gmail.com>
	<46B82B88.5000804@canterbury.ac.nz> <46B866A0.2040800@gmail.com>
	<ca471dc20708070759v6f9894a6w84838553f672cd5f@mail.gmail.com>
	<46B89C5B.7090104@gmail.com>
	<ca471dc20708070939i648f710fp95b0e1d878ed87cf@mail.gmail.com>
Message-ID: <fb6fbf560708071414u4b7b55fey7cb00daa30973ff5@mail.gmail.com>

On 8/7/07, Guido van Rossum <guido at python.org> wrote:

> If b"..." is immutable, the
> immutable bytes type is in your face all the time and you'll have to
> deal with the difference all the time.

There is a conceptual difference between the main use cases for
mutable (a buffer) and the main use cases for immutable (a protocol
constant).

I'm not sure why you would need a literal for the mutable version.
How often do you create a new buffer with initial values?  (Note:  not
pointing to existing memory; creating a new one.)

> E.g. is the result of
> concatenating a mutable and an immutable bytes object mutable?
> Does it matter whether the mutable operand is first or second?

I would say immutable; you're taking a snapshot.  (I would have some
sympathy for taking the type of the first operand, but then you need
to worry about + vs +=, and whether the start of the new object will
notice later state changes.)

> Is a slice of an immutable bytes array immutable itself?

Why wouldn't it be?

The question is what to do with a slice from a *mutable* array.  Most
of Python uses copies (and keeps type, so the result is also mutable).
NumPy often shares state for efficiency.  Making an immutable copy
makes sense for every sane use case *I* can come up with.  (The insane
case is that you might want to pass output_buffer[60:70] to some other
object for its status output.)
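
For comparison, a short sketch of how these questions look in released
Python 3, assuming current CPython behaviour (bytes, bytearray and
memoryview) rather than anything decided in this thread. The sample
values are made up.

```python
frozen = b"GET "                 # immutable literal
buf = bytearray(b"/index.html")  # mutable buffer

# Concatenation takes the type of the first operand.
left = frozen + buf
right = buf + frozen

# Slicing a mutable buffer copies and keeps the type.
piece = buf[:6]
piece[0:1] = b"?"

# Sharing state needs an explicit memoryview.
view = memoryview(buf)[1:6]
view[0:1] = b"I"
```

Writing through the view changes buf, while writing to the sliced copy
does not.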

-jJ

From guido at python.org  Wed Aug  8 00:41:40 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 7 Aug 2007 15:41:40 -0700
Subject: [Python-3000] Pleaswe help with the countdown to zero failing
	tests in the struni branch!
In-Reply-To: <ca471dc20708061655w62420da3m1d87ffa5eff669c6@mail.gmail.com>
References: <ca471dc20708061655w62420da3m1d87ffa5eff669c6@mail.gmail.com>
Message-ID: <ca471dc20708071541s68ad50c5sb8b5c1becc9fca9d@mail.gmail.com>

Here's a followup.

We need help from someone with a 64-bit Linux box; these tests are
failing on 64-bit only: test_io, test_largefile, test_ossaudiodev,
test_poll, test_shelve, test_socket_ssl.

I suspect that the _fileio.c module probably is one of the culprits.

Other news:

On 8/6/07, Guido van Rossum <guido at python.org> wrote:
> We're down to 11 failing tests in the struni branch. I'd like to get
> this down to zero ASAP so that we can retire the old p3yk (yes, with
> typo!) branch and rename py3k-struni to py3k.
>
> Please help! Here's the list of failing tests:
>
> test_ctypes
> Recently one test started failing again, after Martin changed
> PyUnicode_FromStringAndSize() to use UTF8 instead of Latin1.
>
> test_email
> test_email_codecs
> test_email_renamed
> Can someone contact the email-sig and ask for help with these?
>
> test_minidom
> Recently started failing again; probably shallow.
>
> test_sqlite
> Virgin territory, probably best done by whoever wrote the code or at
> least someone with time to spare.
>
> test_tarfile
> Virgin territory again (but different owner :-).

Lars Gustaebel fixed this except for a few bz2-related tests.

> test_urllib2_localnet
> test_urllib2net
> I think Jeremy Hylton may be close to fixing these, he's done a lot of
> work on urllib and httplib.
>
> test_xml_etree_c
> Virgin territory again.
>
> There are also a few tests that only fail on CYGWIN or OSX; I won't
> bother listing these.

The two OSX tests listed at the time were fixed, thanks to those volunteers!

We now only have an OSX-specific failure in test_csv.

> If you want to help, please refer to this wiki page:
> http://wiki.python.org/moin/Py3kStrUniTests
>
> There are also other tasks; see http://wiki.python.org/moin/Py3kToDo

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From lars at gustaebel.de  Wed Aug  8 01:21:14 2007
From: lars at gustaebel.de (Lars =?iso-8859-15?Q?Gust=E4bel?=)
Date: Wed, 8 Aug 2007 01:21:14 +0200
Subject: [Python-3000] Pleaswe help with the countdown to zero
	failing	tests in the struni branch!
In-Reply-To: <ca471dc20708071227pfb24d03ifa7de91b92d8e9b8@mail.gmail.com>
References: <ca471dc20708061655w62420da3m1d87ffa5eff669c6@mail.gmail.com>
	<46B80D73.5050009@cheimes.de> <20070807184000.GA26947@core.g33x.de>
	<ca471dc20708071227pfb24d03ifa7de91b92d8e9b8@mail.gmail.com>
Message-ID: <20070807232114.GA29701@core.g33x.de>

On Tue, Aug 07, 2007 at 12:27:40PM -0700, Guido van Rossum wrote:
> I still get these three failures on Ubuntu dapper:
> 
> 
> ======================================================================
> ERROR: test_fileobj_iter (test.test_tarfile.Bz2UstarReadTest)
> ----------------------------------------------------------------------
[...]
> ValueError: the bz2 library has received wrong parameters

This is actually a bug in the bz2 module. The read() method of
bz2.BZ2File raises this ValueError with a size argument of 0.
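
For reference, a minimal sketch of the behaviour one would expect
from a fixed read(), written against a current Python 3 where BZ2File
accepts a file object (an assumption about the reader's interpreter,
not about the branch discussed here).  The payload is made up:

```python
import bz2
import io

payload = b"struni"
compressed = io.BytesIO(bz2.compress(payload))

with bz2.BZ2File(compressed) as f:
    empty = f.read(0)   # the call that raised ValueError in the report
    rest = f.read()     # the remaining data is still readable
```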

-- 
Lars Gustäbel
lars at gustaebel.de

A chicken is an egg's way of producing more eggs.
(Anonymous)

From guido at python.org  Wed Aug  8 01:29:29 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 7 Aug 2007 16:29:29 -0700
Subject: [Python-3000] Pleaswe help with the countdown to zero failing
	tests in the struni branch!
In-Reply-To: <20070807232114.GA29701@core.g33x.de>
References: <ca471dc20708061655w62420da3m1d87ffa5eff669c6@mail.gmail.com>
	<46B80D73.5050009@cheimes.de> <20070807184000.GA26947@core.g33x.de>
	<ca471dc20708071227pfb24d03ifa7de91b92d8e9b8@mail.gmail.com>
	<20070807232114.GA29701@core.g33x.de>
Message-ID: <ca471dc20708071629hb0c3854yeb529577340f7528@mail.gmail.com>

Thanks -- fixed!
Committed revision 56814.


On 8/7/07, Lars Gustäbel <lars at gustaebel.de> wrote:
> On Tue, Aug 07, 2007 at 12:27:40PM -0700, Guido van Rossum wrote:
> > I still get these three failures on Ubuntu dapper:
> >
> >
> > ======================================================================
> > ERROR: test_fileobj_iter (test.test_tarfile.Bz2UstarReadTest)
> > ----------------------------------------------------------------------
> [...]
> > ValueError: the bz2 library has received wrong parameters
>
> This is actually a bug in the bz2 module. The read() method of
> bz2.BZ2File raises this ValueError with a size argument of 0.
>
> --
> Lars Gustäbel
> lars at gustaebel.de
>
> A chicken is an egg's way of producing more eggs.
> (Anonymous)
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg.ewing at canterbury.ac.nz  Wed Aug  8 02:04:56 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 08 Aug 2007 12:04:56 +1200
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <ca471dc20708070759v6f9894a6w84838553f672cd5f@mail.gmail.com>
References: <46B637DD.7070905@v.loewis.de>
	<ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>
	<46B78F86.9000505@v.loewis.de>
	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>
	<da3f900e0708061709v53d93f7m76519dd6cee92cc5@mail.gmail.com>
	<ca471dc20708061719p3092e6bbu47c68bbfddc11012@mail.gmail.com>
	<46B82B88.5000804@canterbury.ac.nz> <46B866A0.2040800@gmail.com>
	<ca471dc20708070759v6f9894a6w84838553f672cd5f@mail.gmail.com>
Message-ID: <46B908A8.8040605@canterbury.ac.nz>

Guido van Rossum wrote:

> On 8/7/07, Nick Coghlan <ncoghlan at gmail.com> wrote:
>
>>This would mean that the Unicode type would acquire all of the ambiguity
>>currently associated with the 8-bit str type
>
> Not necessarily, as this kind of use is typically very localized.
> Remember practicality beats purity.

Has anyone considered that, depending on the implementation,
a latin1-decoded unicode string could take 2-4 times as much
memory?
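
A back-of-the-envelope sketch of the 2-4x figure.  It uses encoded
lengths as a proxy for the storage of a 2-byte (UCS-2) or 4-byte
(UCS-4) internal representation; that proxy is an assumption, and an
implementation that stores such strings more compactly would not show
the same growth:

```python
raw = bytes(range(256)) * 4           # 1024 arbitrary 8-bit values
text = raw.decode("latin-1")          # one code point per input byte

ucs2_size = len(text.encode("utf-16-le"))   # 2 bytes per character
ucs4_size = len(text.encode("utf-32-le"))   # 4 bytes per character

ratios = (ucs2_size // len(raw), ucs4_size // len(raw))
```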

--
Greg

From guido at python.org  Wed Aug  8 02:11:53 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 7 Aug 2007 17:11:53 -0700
Subject: [Python-3000] Pleaswe help with the countdown to zero failing
	tests in the struni branch!
In-Reply-To: <18104.33617.706079.853923@montanaro.dyndns.org>
References: <ca471dc20708061655w62420da3m1d87ffa5eff669c6@mail.gmail.com>
	<f99n87$8or$1@sea.gmane.org>
	<18104.33617.706079.853923@montanaro.dyndns.org>
Message-ID: <ca471dc20708071711y36caa780ta85537c57d591f64@mail.gmail.com>

Fixed now. This was OSX only due to an endianness issue; but the bug
was universal: we were treating a unicode character using
structmodule's T_CHAR. Since other similar fields of the dialect type
were dealt with properly it seems this was merely an oversight.
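
A small illustration of why the bug only showed on one platform.  It
assumes a 2-byte code unit (the width is a guess for illustration)
and uses struct to mimic reading only the first byte of the stored
character, as a single-char field would:

```python
import struct

delimiter = ","

# Assumption: a 2-byte code unit, as on a narrow Unicode build.
little = struct.pack("<H", ord(delimiter))
big = struct.pack(">H", ord(delimiter))

# Reading the field as a single char picks up only the first byte.
first_little = little[:1]   # the comma survives by accident
first_big = big[:1]         # a NUL byte, matching the test failure
```

On a little-endian machine the first byte happens to be the right
one, which would explain why the failure was invisible there.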

On 8/7/07, skip at pobox.com <skip at pobox.com> wrote:
> test_csv got removed from the failing list after Guido applied Adam Hupp's
> patch.  (I checked in a small update for one thing Adam missed.)  I'm still
> getting test failures though:
>
>     ======================================================================
>     FAIL: test_reader_attrs (__main__.Test_Csv)
>     ----------------------------------------------------------------------
>     Traceback (most recent call last):
>       File "Lib/test/test_csv.py", line 63, in test_reader_attrs
>         self._test_default_attrs(csv.reader, [])
>       File "Lib/test/test_csv.py", line 47, in _test_default_attrs
>         self.assertEqual(obj.dialect.delimiter, ',')
>     AssertionError: s'\x00' != ','
>
> This same exception crops up six times.  Maybe this isn't
> str->unicode-related, but it sure seems like it to me.  I spent some time
> over the past few days trying to figure it out, but I struck out.
>
> Skip
>
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg.ewing at canterbury.ac.nz  Wed Aug  8 02:26:22 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 08 Aug 2007 12:26:22 +1200
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46B89C5B.7090104@gmail.com>
References: <46B637DD.7070905@v.loewis.de>
	<ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>
	<46B78F86.9000505@v.loewis.de>
	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>
	<da3f900e0708061709v53d93f7m76519dd6cee92cc5@mail.gmail.com>
	<ca471dc20708061719p3092e6bbu47c68bbfddc11012@mail.gmail.com>
	<46B82B88.5000804@canterbury.ac.nz> <46B866A0.2040800@gmail.com>
	<ca471dc20708070759v6f9894a6w84838553f672cd5f@mail.gmail.com>
	<46B89C5B.7090104@gmail.com>
Message-ID: <46B90DAE.8050506@canterbury.ac.nz>

Nick Coghlan wrote:
> I would 
> expect the situation to be the same as with sets - you'd use the mutable
> version by default, unless there was some specific reason to want the 
> frozen version (usually because you want something that is hashable, or 
> easy to share safely amongst multiple clients).

My instinct with regard to sets is the other way around,
i.e. use immutable sets unless there's a reason they
need to be mutable. The reason is safety -- accidentally
trying to mutate an immutable object fails more quickly
and obviously than the converse.
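
A short sketch of that asymmetry with the existing set types (the
variable names are only for illustration):

```python
shared = frozenset({"a", "b"})

try:
    shared.add("c")            # frozenset has no mutating methods
    outcome = "mutated"
except AttributeError:
    outcome = "failed immediately"

# The converse mistake is silent: a mutable set accepts the change.
exposed = {"a", "b"}
exposed.add("c")
```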

If Python had had both mutable and immutable strings
from the beginning, would you be giving the same
advice, i.e. use mutable strings unless they need to
be immutable? If not, what makes strings different from
sets in this regard?

--
Greg

From greg.ewing at canterbury.ac.nz  Wed Aug  8 02:57:35 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 08 Aug 2007 12:57:35 +1200
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <ca471dc20708070748y12f2fb88gdd359833ddad6ad4@mail.gmail.com>
References: <46B637DD.7070905@v.loewis.de>
	<ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>
	<46B78F86.9000505@v.loewis.de>
	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>
	<46B79815.1030504@v.loewis.de>
	<ca471dc20708061739w513fb601v8cf72b2eb2a490fd@mail.gmail.com>
	<5d44f72f0708062126u1853d99fx500990f91841e6b6@mail.gmail.com>
	<ca471dc20708070748y12f2fb88gdd359833ddad6ad4@mail.gmail.com>
Message-ID: <46B914FF.6030606@canterbury.ac.nz>

Guido van Rossum wrote:
> Currently the array module can be used for
> this but I would like to get rid of it in favor of bytes and Travis
> Oliphant's new buffer API

I thought the plan was to *enhance* the array module
so that it provides multi-dimensional arrays that
support the new buffer protocol.

If the plan is instead to axe it completely, then
I'm disappointed. Bytes is only a replacement for
array.array('B'), not any of the other types.

--
Greg

From greg.ewing at canterbury.ac.nz  Wed Aug  8 03:14:47 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 08 Aug 2007 13:14:47 +1200
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B8CD97.8020000@ronadam.com>
References: <46B13ADE.7080901@acm.org>
	<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>
	<ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>
	<46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com>
	<46B35295.1030007@acm.org>
	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>
	<46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz>
	<46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com>
	<46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com>
	<46B66E7E.4060209@canterbury.ac.nz> <46B6A7E8.7040001@ronadam.com>
	<46B6C335.4080504@canterbury.ac.nz> <46B6DE80.2050000@ronadam.com>
	<46B7C369.3040509@canterbury.ac.nz> <46B7DEA6.5050609@ronadam.com>
	<46B83C7F.603@canterbury.ac.nz> <46B8CD97.8020000@ronadam.com>
Message-ID: <46B91907.6030705@canterbury.ac.nz>

Ron Adam wrote:
> 
> Greg Ewing wrote:
>
>> Some types may recognise when they're being passed
>> a format spec that belongs to another type, and try
>> to convert themselves to that type (e.g. applying
>> 'f' to an int or 'd' to a float).
> 
> After thinking about this a bit more, I think specifiers don't have 
> types and don't belong to types.

I agree - I was kind of speaking in shorthand there.
What I really meant was that some types have some knowledge
of format specifiers recognised by other types. E.g.
int doesn't itself know how to format something using
a spec that starts with 'f', but it knows that float
*does* know, so it converts itself to a float and
lets float handle it from there.

If you were to pass an 'f' format to something that
had no clue about it at all, e.g. a datetime, you
would ultimately get an exception. And there's
nothing stopping another type from recognising
'f' and doing something of its own with it that
doesn't involve conversion to float (e.g. decimal).

So the one-to-many mapping you mention is accommodated.
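
A sketch of the three cases using the format() builtin as it exists
in current Python 3 (an assumption relative to the draft PEP under
discussion).  Seen from the outside, int accepts 'f', Decimal handles
'f' itself without going through float, and a type with no clue
raises:

```python
from decimal import Decimal

as_float = format(3, ".2f")                    # int accepts the 'f' spec
as_decimal = format(Decimal("0.1"), ".20f")    # no binary float rounding

try:
    format(object(), ".2f")                    # no idea what 'f' means
    clueless = "formatted"
except TypeError:
    clueless = "TypeError"
```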

--
Greg

From greg.ewing at canterbury.ac.nz  Wed Aug  8 01:53:42 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 08 Aug 2007 11:53:42 +1200
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B6FEC5.9040503@gmail.com>
References: <46B13ADE.7080901@acm.org>
	<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>
	<ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>
	<46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com>
	<46B35295.1030007@acm.org>
	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>
	<46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz>
	<46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com>
	<46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com>
	<46B5FBD9.4020301@acm.org> <46B6FEC5.9040503@gmail.com>
Message-ID: <46B90606.9070006@canterbury.ac.nz>

Nick Coghlan wrote:
> If __format__ receives both the alignment specifier and the format 
> specifier as arguments,

My suggestion would be for it to receive the alignment
spec pre-parsed, since apply_format has to at least
partially parse it itself, and there doesn't seem to
be anything gained by having *both* the format and
alignment specs arbitrary, as anything type-specific
can go in the format spec. So the alignment spec
might as well have a fixed syntax.

--
Greg

From greg.ewing at canterbury.ac.nz  Wed Aug  8 03:25:43 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 08 Aug 2007 13:25:43 +1200
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <ca471dc20708070939i648f710fp95b0e1d878ed87cf@mail.gmail.com>
References: <46B637DD.7070905@v.loewis.de>
	<ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>
	<46B78F86.9000505@v.loewis.de>
	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>
	<da3f900e0708061709v53d93f7m76519dd6cee92cc5@mail.gmail.com>
	<ca471dc20708061719p3092e6bbu47c68bbfddc11012@mail.gmail.com>
	<46B82B88.5000804@canterbury.ac.nz> <46B866A0.2040800@gmail.com>
	<ca471dc20708070759v6f9894a6w84838553f672cd5f@mail.gmail.com>
	<46B89C5B.7090104@gmail.com>
	<ca471dc20708070939i648f710fp95b0e1d878ed87cf@mail.gmail.com>
Message-ID: <46B91B97.30603@canterbury.ac.nz>

Guido van Rossum wrote:
> That would imply that b"..." should return a mutable bytes object,
> which many people have objected to.

I'm still very uncomfortable about this. It's so
completely unlike anything else in the language.
I have a strong feeling that it is going to trip
people up a lot, and end up being one of the
Famous Warts To Be Fixed In Py4k.

There's some evidence of this already in the way
we're referring to it as a "bytes literal", when
it's *not* actually a literal, but a constructor.
Or at least it's a literal with an implied
construction operation around it.

> is the result of
> concatenating a mutable and an immutable bytes object mutable? Does it
> matter whether the mutable operand is first or second? Is a slice of
> an immutable bytes array immutable itself?

These are valid questions, but I don't think they're
imponderable enough to be show-stoppers.

With lists/tuples it's resolved by not allowing them
to be concatenated with each other, but that's probably
too restrictive here.

My feeling is that like should produce like, and where
there's a conflict, immutability should win. Mutable
buffers tend to be used as an internal part of something
else, such as an IOStream, and aren't exposed to the
outside, or if they are, they're exposed in a read-only
kind of way.

So concatenating mutable and immutable should give the
same result as concatenating two immutables, i.e. an
immutable. If you need to add something to the end of
your buffer, while keeping it mutable, you use extend().

This gives us

   immutable + immutable -> immutable
   mutable + immutable -> immutable
   immutable + mutable -> immutable
   mutable + mutable -> mutable  (*)

   immutable[:] -> immutable
   mutable[:] -> mutable         (*)

(*) One might argue that these should be immutable,
on the grounds of safety, but I think that would be
too surprisingly different from the way other mutable
sequences work.
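
The concatenation rule above can be sketched as follows.  The helper
concat() is hypothetical, and bytes/bytearray are used as stand-ins
for the immutable and mutable types; this models the proposal only,
not the behaviour of any real + operator:

```python
def concat(left, right):
    """Proposed rule: the result is mutable only if both operands are."""
    joined = bytes(left) + bytes(right)
    if isinstance(left, bytearray) and isinstance(right, bytearray):
        return bytearray(joined)
    return joined

mutable, immutable = bytearray(b"ab"), b"cd"

results = {
    "imm+imm": type(concat(immutable, immutable)).__name__,
    "mut+imm": type(concat(mutable, immutable)).__name__,
    "imm+mut": type(concat(immutable, mutable)).__name__,
    "mut+mut": type(concat(mutable, mutable)).__name__,
}
```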

--
Greg

From guido at python.org  Wed Aug  8 03:52:58 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 7 Aug 2007 18:52:58 -0700
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46B914FF.6030606@canterbury.ac.nz>
References: <46B637DD.7070905@v.loewis.de>
	<ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>
	<46B78F86.9000505@v.loewis.de>
	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>
	<46B79815.1030504@v.loewis.de>
	<ca471dc20708061739w513fb601v8cf72b2eb2a490fd@mail.gmail.com>
	<5d44f72f0708062126u1853d99fx500990f91841e6b6@mail.gmail.com>
	<ca471dc20708070748y12f2fb88gdd359833ddad6ad4@mail.gmail.com>
	<46B914FF.6030606@canterbury.ac.nz>
Message-ID: <ca471dc20708071852q66bc0b7cve4f625e63f5e55d@mail.gmail.com>

On 8/7/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Guido van Rossum wrote:
> > Currently the array module can be used for
> > this but I would like to get rid of it in favor of bytes and Travis
> > Oliphant's new buffer API
>
> I thought the plan was to *enhance* the array module
> so that it provides multi-dimensional arrays that
> support the new buffer protocol.
>
> If the plan is instead to axe it completely, then
> I'm disappointed. Bytes is only a replacement for
> array.array('B'), not any of the other types.

I wouldn't ax it unless there was a replacement.  But I'm not holding
the replacement to any kind of compatibility with the old array
module, and I expect it would more likely take the form of a wrapper
around anything that supports the (new) buffer API, such as bytes.
This would render the array module obsolete.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Wed Aug  8 04:01:35 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 7 Aug 2007 19:01:35 -0700
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46B90DAE.8050506@canterbury.ac.nz>
References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de>
	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>
	<da3f900e0708061709v53d93f7m76519dd6cee92cc5@mail.gmail.com>
	<ca471dc20708061719p3092e6bbu47c68bbfddc11012@mail.gmail.com>
	<46B82B88.5000804@canterbury.ac.nz> <46B866A0.2040800@gmail.com>
	<ca471dc20708070759v6f9894a6w84838553f672cd5f@mail.gmail.com>
	<46B89C5B.7090104@gmail.com> <46B90DAE.8050506@canterbury.ac.nz>
Message-ID: <ca471dc20708071901k69243f10u3bd414cd531fea3b@mail.gmail.com>

On 8/7/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> My instinct with regard to sets is the other way around,
> i.e. use immutable sets unless there's a reason they
> need to be mutable. The reason is safety -- accidentally
> trying to mutate an immutable object fails more quickly
> and obviously than the converse.

But this is impractical -- a very common way to work is to build up a
set incrementally. With immutable sets this would quickly become
O(N**2). That's why set() is mutable and {...} creates a set, and the
only way to create an immutable set is to use frozenset(...).
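
A minimal sketch of the two ways of building up a set; the function
names are made up.  The frozenset version creates a fresh object on
every step, copying everything collected so far, which is where the
quadratic cost comes from:

```python
def build_mutable(items):
    result = set()
    for item in items:
        result.add(item)              # amortised O(1) per element
    return result

def build_immutable(items):
    result = frozenset()
    for item in items:
        result = result | {item}      # copies everything seen so far
    return result

same = build_mutable(range(1000)) == build_immutable(range(1000))
```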

> If Python had had both mutable and immutable strings
> from the beginning, would you be giving the same
> advice, i.e. use mutable strings unless they need to
> be immutable? If not, what makes strings different from
> sets in this regard?

That's easy. sets are mutable for the same reason lists are mutable --
lists are conceptually containers for "larger" amounts of data than
strings. I don't adhere to the "let's just make copying really fast
by using tricks like refcounting etc." school -- that was a pain in
the B* for ABC.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From talin at acm.org  Wed Aug  8 04:16:16 2007
From: talin at acm.org (Talin)
Date: Tue, 07 Aug 2007 19:16:16 -0700
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <ca471dc20708070935j50715ccre93143b0a30a0ae9@mail.gmail.com>
References: <46B637DD.7070905@v.loewis.de>	<ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>	<46B78F86.9000505@v.loewis.de>	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>	<46B79815.1030504@v.loewis.de>	<ca471dc20708061739w513fb601v8cf72b2eb2a490fd@mail.gmail.com>	<5d44f72f0708062126u1853d99fx500990f91841e6b6@mail.gmail.com>	<46B7F869.6080007@v.loewis.de>
	<46B834F6.7050307@canterbury.ac.nz>	<43aa6ff70708070922m645189dbvc744a1fbbbb88800@mail.gmail.com>
	<ca471dc20708070935j50715ccre93143b0a30a0ae9@mail.gmail.com>
Message-ID: <46B92770.1080907@acm.org>

Guido van Rossum wrote:

> Assuming you weren't being sarcastic, array('B') and bytes() are very
> close except bytes have a literal notation and many string-ish
> methods. The buffer objects returned by the buffer() builtin provide a
> read-only view on other objects that happen to have an internal
> buffer, like strings, bytes, arrays, PIL images, and numpy arrays.
> Lists of integers don't have the property that the other three share
> which is that their C representation is a contiguous array of bytes
> (char* in C). This representation is important because to do efficient
> I/O in C you need char*.

I've been following the discussion in a cursory way, and I can see that 
there is a lot of disagreement and confusion around the whole issue of 
mutable vs. immutable bytes.

If it were up to me, and I was starting over from scratch, here's the 
design I would create:

1) I'd reserve the term 'bytes' to refer to an immutable byte string.

2) I would re-purpose the existing term 'buffer' to refer to a mutable, 
resizable byte buffer.

Rationale: "buffer" has been used historically in many computer 
languages to refer to a mutable area of memory. The word 'bytes', on the 
other hand, seems to imply a *value* rather than a *location*, and 
values (like numbers) are generally considered immutable.

3) Both 'bytes' and 'buffer' would be derived from an abstract base 
class called ByteSequence. ByteSequence defines all of the read-only 
accessor methods common to both classes.

4) Literals of both types are available - using a prefix of small 'b' 
for bytes, and capital B for 'buffer'.

5) Both 'bytes' and 'buffer' would support the 'buffer protocol', 
although in the former case it would be read-only. Other things which 
are not buffers could also support this protocol.

6) Library APIs that required a byte sequence would be written to test 
vs. the abstract ByteSequence type.

7) Both bytes and buffer objects would be inter-convertible using the 
appropriate constructors.
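
Points 3, 6 and 7 might look roughly like this.  It is only a sketch:
ByteSequence and checksum() are hypothetical, and bytes/bytearray
stand in for the proposed 'bytes' and 'buffer' types:

```python
from abc import ABC

class ByteSequence(ABC):
    """Hypothetical abstract base for the read-only byte-sequence API."""

# Stand-ins: bytes is the immutable type, bytearray the mutable 'buffer'.
ByteSequence.register(bytes)
ByteSequence.register(bytearray)

def checksum(data):
    # Point 6: library code tests against the abstract type.
    if not isinstance(data, ByteSequence):
        raise TypeError("expected a byte sequence")
    return sum(data) % 256

frozen = b"\x01\x02"
buf = bytearray(frozen)        # point 7: convert via the constructor
buf.append(3)                  # only the mutable type can grow
```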

-- Talin

From shiblon at gmail.com  Wed Aug  8 04:30:44 2007
From: shiblon at gmail.com (Chris Monson)
Date: Tue, 7 Aug 2007 22:30:44 -0400
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46B92770.1080907@acm.org>
References: <46B637DD.7070905@v.loewis.de>
	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>
	<46B79815.1030504@v.loewis.de>
	<ca471dc20708061739w513fb601v8cf72b2eb2a490fd@mail.gmail.com>
	<5d44f72f0708062126u1853d99fx500990f91841e6b6@mail.gmail.com>
	<46B7F869.6080007@v.loewis.de> <46B834F6.7050307@canterbury.ac.nz>
	<43aa6ff70708070922m645189dbvc744a1fbbbb88800@mail.gmail.com>
	<ca471dc20708070935j50715ccre93143b0a30a0ae9@mail.gmail.com>
	<46B92770.1080907@acm.org>
Message-ID: <da3f900e0708071930occbcbc4ge0ddd223a8ee19b8@mail.gmail.com>

Wow.  +1 for pure lucid reasoning.

(Sorry for top-posting; blame the crackberry)


On 8/7/07, Talin <talin at acm.org> wrote:
> Guido van Rossum wrote:
>
> > Assuming you weren't being sarcastic, array('B') and bytes() are very
> > close except bytes have a literal notation and many string-ish
> > methods. The buffer objects returned by the buffer() builtin provide a
> > read-only view on other objects that happen to have an internal
> > buffer, like strings, bytes, arrays, PIL images, and numpy arrays.
> > Lists of integers don't have the property that the other three share
> > which is that their C representation is a contiguous array of bytes
> > (char* in C). This representation is important because to do efficient
> > I/O in C you need char*.
>
> I've been following the discussion in a cursory way, and I can see that
> there is a lot of disagreement and confusion around the whole issue of
> mutable vs. immutable bytes.
>
> If it were up to me, and I was starting over from scratch, here's the
> design I would create:
>
> 1) I'd reserve the term 'bytes' to refer to an immutable byte string.
>
> 2) I would re-purpose the existing term 'buffer' to refer to a mutable,
> resizable byte buffer.
>
> Rationale: "buffer" has been used historically in many computer
> languages to refer to a mutable area of memory. The word 'bytes', on the
> other hand, seems to imply a *value* rather than a *location*, and
> values (like numbers) are generally considered immutable.
>
> 3) Both 'bytes' and 'buffer' would be derived from an abstract base
> class called ByteSequence. ByteSequence defines all of the read-only
> accessor methods common to both classes.
>
> 4) Literals of both types are available - using a prefix of small 'b'
> for bytes, and capital B for 'buffer'.
>
> 5) Both 'bytes' and 'buffer' would support the 'buffer protocol',
> although in the former case it would be read-only. Other things which
> are not buffers could also support this protocol.
>
> 6) Library APIs that required a byte sequence would be written to test
> vs. the abstract ByteSequence type.
>
> 7) Both bytes and buffer objects would be inter-convertible using the
> appropriate constructors.
>
> -- Talin

From skip at pobox.com  Wed Aug  8 04:49:52 2007
From: skip at pobox.com (skip at pobox.com)
Date: Tue, 7 Aug 2007 21:49:52 -0500
Subject: [Python-3000] Pleaswe help with the countdown to zero failing
 tests in the struni branch!
In-Reply-To: <ca471dc20708071711y36caa780ta85537c57d591f64@mail.gmail.com>
References: <ca471dc20708061655w62420da3m1d87ffa5eff669c6@mail.gmail.com>
	<f99n87$8or$1@sea.gmane.org>
	<18104.33617.706079.853923@montanaro.dyndns.org>
	<ca471dc20708071711y36caa780ta85537c57d591f64@mail.gmail.com>
Message-ID: <18105.12112.271066.494963@montanaro.dyndns.org>


    Guido> Fixed now. This was OSX only due to an endianness issue; but the
    Guido> bug was universal: we were treating a unicode character using
    Guido> structmodule's T_CHAR. Since other similar fields of the dialect
    Guido> type were dealt with properly it seems this was merely an
    Guido> oversight.

Thanks.  I'm sure I would not have figured that out for quite a while.

Skip

From ntoronto at cs.byu.edu  Wed Aug  8 04:37:51 2007
From: ntoronto at cs.byu.edu (Neil Toronto)
Date: Tue, 07 Aug 2007 20:37:51 -0600
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46B91B97.30603@canterbury.ac.nz>
References: <46B637DD.7070905@v.loewis.de>	<ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>	<46B78F86.9000505@v.loewis.de>	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>	<da3f900e0708061709v53d93f7m76519dd6cee92cc5@mail.gmail.com>	<ca471dc20708061719p3092e6bbu47c68bbfddc11012@mail.gmail.com>	<46B82B88.5000804@canterbury.ac.nz>
	<46B866A0.2040800@gmail.com>	<ca471dc20708070759v6f9894a6w84838553f672cd5f@mail.gmail.com>	<46B89C5B.7090104@gmail.com>	<ca471dc20708070939i648f710fp95b0e1d878ed87cf@mail.gmail.com>
	<46B91B97.30603@canterbury.ac.nz>
Message-ID: <46B92C7F.3090500@cs.byu.edu>

Greg Ewing wrote:
> Guido van Rossum wrote:
>   
>> That would imply that b"..." should return a mutable bytes object,
>> which many people have objected to.
>>     
>
> I'm still very uncomfortable about this. It's so
> completely unlike anything else in the language.
> I have a strong feeling that it is going to trip
> people up a lot, and end up being one of the
> Famous Warts To Be Fixed In Py4k.
>
> There's some evidence of this already in the way
> we're referring to it as a "bytes literal", when
> it's *not* actually a literal, but a constructor.
> Or at least it's a literal with an implied
> construction operation around it.
>   

Not only that, but it's the only *string prefix* that causes the 
interpreter to create and return a mutable object.

It's not too late to go with Talin's suggestions (bytes = immutable, 
buffer = mutable), is it? I got warm fuzzies reading that.

Neil


From greg.ewing at canterbury.ac.nz  Wed Aug  8 04:57:30 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 08 Aug 2007 14:57:30 +1200
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46B92770.1080907@acm.org>
References: <46B637DD.7070905@v.loewis.de>
	<ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>
	<46B78F86.9000505@v.loewis.de>
	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>
	<46B79815.1030504@v.loewis.de>
	<ca471dc20708061739w513fb601v8cf72b2eb2a490fd@mail.gmail.com>
	<5d44f72f0708062126u1853d99fx500990f91841e6b6@mail.gmail.com>
	<46B7F869.6080007@v.loewis.de> <46B834F6.7050307@canterbury.ac.nz>
	<43aa6ff70708070922m645189dbvc744a1fbbbb88800@mail.gmail.com>
	<ca471dc20708070935j50715ccre93143b0a30a0ae9@mail.gmail.com>
	<46B92770.1080907@acm.org>
Message-ID: <46B9311A.7020908@canterbury.ac.nz>

Talin wrote:
> 4) Literals of both types are available - using a prefix of small 'b' 
> for bytes, and capital B for 'buffer'.

I don't see that it would be really necessary to have a
distinct syntax for a buffer constructor (no literal!)
because you could always write

   buffer(b"...")

This is what it would have to be doing underneath
anyway.
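
As a sketch, with bytearray standing in for the proposed buffer type
(an assumption): the constructor copies the immutable value, so the
original stays untouched when the buffer is modified.

```python
literal = b"abc"
buf = bytearray(literal)   # stand-in for the proposed buffer(b"...")
buf[0] = ord("x")          # mutating the buffer leaves the literal alone
```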

--
Greg

From rhamph at gmail.com  Wed Aug  8 05:52:31 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Tue, 7 Aug 2007 21:52:31 -0600
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46B91B97.30603@canterbury.ac.nz>
References: <46B637DD.7070905@v.loewis.de>
	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>
	<da3f900e0708061709v53d93f7m76519dd6cee92cc5@mail.gmail.com>
	<ca471dc20708061719p3092e6bbu47c68bbfddc11012@mail.gmail.com>
	<46B82B88.5000804@canterbury.ac.nz> <46B866A0.2040800@gmail.com>
	<ca471dc20708070759v6f9894a6w84838553f672cd5f@mail.gmail.com>
	<46B89C5B.7090104@gmail.com>
	<ca471dc20708070939i648f710fp95b0e1d878ed87cf@mail.gmail.com>
	<46B91B97.30603@canterbury.ac.nz>
Message-ID: <aac2c7cb0708072052y5f492a8ct779f6a91b024b9b4@mail.gmail.com>

On 8/7/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> So concatenating mutable and immutable should give the
> same result as concatenating two immutables, i.e. an
> immutable. If you need to add something to the end of
> your buffer, while keeping it mutable, you use extend().
>
> This gives us
>
>    immutable + immutable -> immutable
>    mutable + immutable -> immutable
>    immutable + mutable -> immutable
>    mutable + mutable -> mutable  (*)

>>> () + []
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can only concatenate tuple (not "list") to tuple

Less confusing to prohibit concatenation of mismatched types.  There are
always trivial workarounds (i.e. () + tuple([]) or .extend()).

-- 
Adam Olsen, aka Rhamphoryncus

From martin at v.loewis.de  Wed Aug  8 07:38:42 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 08 Aug 2007 07:38:42 +0200
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46B908A8.8040605@canterbury.ac.nz>
References: <46B637DD.7070905@v.loewis.de>	<ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>	<46B78F86.9000505@v.loewis.de>	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>	<da3f900e0708061709v53d93f7m76519dd6cee92cc5@mail.gmail.com>	<ca471dc20708061719p3092e6bbu47c68bbfddc11012@mail.gmail.com>	<46B82B88.5000804@canterbury.ac.nz>
	<46B866A0.2040800@gmail.com>	<ca471dc20708070759v6f9894a6w84838553f672cd5f@mail.gmail.com>
	<46B908A8.8040605@canterbury.ac.nz>
Message-ID: <46B956E2.2040600@v.loewis.de>

>>> This would mean that the Unicode type would acquire all of the ambiguity
>>> currently associated with the 8-bit str type
>> Not necessarily, as this kind of use is typically very localized.
>> Remember practicality beats purity.
> 
> Has anyone considered that, depending on the implementation,
> a latin1-decoded unicode string could take 2-4 times as much
> memory?

I considered it, then ignored it. If you have the need for hashing,
the string won't be long.

Regards,
Martin

From martin at v.loewis.de  Wed Aug  8 07:45:57 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 08 Aug 2007 07:45:57 +0200
Subject: [Python-3000] Binary compatibility
In-Reply-To: <cd53a0140708070015j7f944ea5qca963da2684d5e86@mail.gmail.com>
References: <cd53a0140708061233v3776e9d5m40f39f3a022bb76d@mail.gmail.com>	<46B7EC3E.3070802@v.loewis.de>
	<cd53a0140708070015j7f944ea5qca963da2684d5e86@mail.gmail.com>
Message-ID: <46B95895.20705@v.loewis.de>

>> Now, you seem to talk about different *Linux* systems. On Linux,
>> use UCS-4.
> 
> Yes, that's what we want. But Python 2.5 defaults to UCS-2 (at least
> last time I tried), while many distros have used UCS-4. If Linux
> always used UCS-4, that would be fine, but currently there's no
> guarantee of that.

I see why a guarantee would help, but I don't think it's necessary.
Just provide UCS-4 binaries only on Linux, and when somebody complains,
tell them to recompile Python, or to recompile your software themselves.

The defaults in 2.5.x cannot be changed anymore. The defaults could
be changed for Linux in 2.6, but then the question is: why just for
Linux?

Regards,
Martin


From nnorwitz at gmail.com  Wed Aug  8 07:57:32 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Tue, 7 Aug 2007 22:57:32 -0700
Subject: [Python-3000] infinite recursion with python -v
Message-ID: <ee2a432c0708072257v68f57d70l9b155947650bb640@mail.gmail.com>

The wiki seems to be down, so sorry for the spam.

python -v crashes due to infinite recursion (well, it tried to be
infinite until it got a stack overflow :-)  The problem seems to be
that Lib/encodings/latin_1.py is loaded, but it tries to be converted
to latin_1, so it tries to load the module, and ...  Or something like
that.  See below for a call stack.
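
A toy model of the suspected cycle, in pure Python. Every name below is hypothetical and merely stands in for the C functions in the trace; it only illustrates the shape of the recursion, not the real import machinery.

```python
# Toy model only: these functions are stand-ins, not CPython's API.
loaded_codecs = set()

def write_stderr(text):
    # Writing text to stderr first has to encode it.
    return encode(text, "latin_1")

def encode(text, codec):
    # Encoding needs the codec module, which may not be imported yet.
    if codec not in loaded_codecs:
        import_codec(codec)
    return text.encode(codec)

def import_codec(codec):
    # With -v, the import announces itself on stderr *before* the
    # module is registered, which re-enters write_stderr().
    write_stderr("# %s.pyc matches %s.py\n" % (codec, codec))
    loaded_codecs.add(codec)

try:
    write_stderr("hello\n")
    recursed = False
except RecursionError:
    recursed = True
```

In this model the cycle is write_stderr -> encode -> import_codec -> write_stderr, which matches the repeating frames in the trace.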

Minimal version:

PyFile_WriteString (s= "# Lib/encodings/latin_1.pyc matches
Lib/encodings/latin_1.py\n", f=) at Objects/fileobject.c:184
mywrite (name= "stderr", fp=, format= "# %s matches %s\n", va=) at
Python/sysmodule.c:1350
PySys_WriteStderr (format= "# %s matches %s\n") at Python/sysmodule.c:1380
check_compiled_module (pathname= "Lib/encodings/latin_1.py", mtime=,
cpathname= "Lib/encodings/latin_1.pyc") at Python/import.c:755
load_source_module (name= "encodings.latin_1", pathname=
"Lib/encodings/latin_1.py", fp=) at Python/import.c:938
load_module (name= "encodings.latin_1", fp=,buf=
"Lib/encodings/latin_1.py", type=1, loader=) at Python/import.c:1733
import_submodule (mod=, subname= "latin_1",fullname=
"encodings.latin_1") at Python/import.c:2418
load_next (mod=,altmod=, p_name=,buf= "encodings.latin_1", p_buflen=)
at Python/import.c:2213
import_module_level (name=, globals=, locals=, fromlist=, level=0) at
Python/import.c:1992
PyImport_ImportModuleLevel (name= "encodings.latin_1", globals=,
locals=, fromlist=, level=0) at Python/import.c:2056
builtin___import__ () at Python/bltinmodule.c:151
[...]
_PyCodec_Lookup (encoding= "latin-1") at Python/codecs.c:147
codec_getitem (encoding= "latin-1",index=0) at Python/codecs.c:211
PyCodec_Encoder (encoding= "latin-1") at Python/codecs.c:275
PyCodec_Encode (object=,encoding= "latin-1", errors=) at Python/codecs.c:322
PyString_AsEncodedObject (str=,encoding= "latin-1", errors=) at
Objects/stringobject.c:459
string_encode () at Objects/stringobject.c:3138
[...]
PyFile_WriteObject (v=, f=, flags=1) at Objects/fileobject.c:159
PyFile_WriteString (s= "# Lib/encodings/latin_1.pyc matches
Lib/encodings/latin_1.py\n",f=) at Objects/fileobject.c:184

== Stack trace for python -v recursion (argument values are mostly trimmed) ==

PyFile_WriteString (s= "# Lib/encodings/latin_1.pyc matches
Lib/encodings/latin_1.py\n", f=) at Objects/fileobject.c:184
mywrite (name= "stderr", fp=, format= "# %s matches %s\n", va=) at
Python/sysmodule.c:1350
PySys_WriteStderr (format= "# %s matches %s\n") at Python/sysmodule.c:1380
check_compiled_module (pathname= "Lib/encodings/latin_1.py", mtime=,
cpathname= "Lib/encodings/latin_1.pyc") at Python/import.c:755
load_source_module (name= "encodings.latin_1", pathname=
"Lib/encodings/latin_1.py", fp=) at Python/import.c:938
load_module (name= "encodings.latin_1", fp=,buf=
"Lib/encodings/latin_1.py", type=1, loader=) at Python/import.c:1733
import_submodule (mod=, subname= "latin_1",fullname=
"encodings.latin_1") at Python/import.c:2418
load_next (mod=,altmod=, p_name=,buf= "encodings.latin_1", p_buflen=)
at Python/import.c:2213
import_module_level (name=, globals=, locals=, fromlist=, level=0) at
Python/import.c:1992
PyImport_ImportModuleLevel (name= "encodings.latin_1", globals=,
locals=, fromlist=, level=0) at Python/import.c:2056
builtin___import__ () at Python/bltinmodule.c:151
PyCFunction_Call () at Objects/methodobject.c:77
PyObject_Call () at Objects/abstract.c:1736
do_call () at Python/ceval.c:3764
call_function (pp_stack=, oparg=513) at Python/ceval.c:3574
PyEval_EvalFrameEx (f=, throwflag=0) at Python/ceval.c:2216
PyEval_EvalCodeEx () at Python/ceval.c:2835
function_call () at Objects/funcobject.c:634
PyObject_Call () at Objects/abstract.c:1736
PyEval_CallObjectWithKeywords () at Python/ceval.c:3431
_PyCodec_Lookup (encoding= "latin-1") at Python/codecs.c:147
codec_getitem (encoding= "latin-1",index=0) at Python/codecs.c:211
PyCodec_Encoder (encoding= "latin-1") at Python/codecs.c:275
PyCodec_Encode (object=,encoding= "latin-1", errors=) at Python/codecs.c:322
PyString_AsEncodedObject (str=,encoding= "latin-1", errors=) at
Objects/stringobject.c:459
string_encode () at Objects/stringobject.c:3138
PyCFunction_Call () at Objects/methodobject.c:73
call_function () at Python/ceval.c:3551
PyEval_EvalFrameEx (f=, throwflag=0) at Python/ceval.c:2216
PyEval_EvalCodeEx () at Python/ceval.c:2835
function_call () at Objects/funcobject.c:634
PyObject_Call () at Objects/abstract.c:1736
method_call () at Objects/classobject.c:397
PyObject_Call () at Objects/abstract.c:1736
PyEval_CallObjectWithKeywords () at Python/ceval.c:3431
PyFile_WriteObject (v=, f=, flags=1) at Objects/fileobject.c:159
PyFile_WriteString (s= "# Lib/encodings/latin_1.pyc matches
Lib/encodings/latin_1.py\n",f=) at Objects/fileobject.c:184

From g.brandl at gmx.net  Wed Aug  8 08:38:11 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Wed, 08 Aug 2007 08:38:11 +0200
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <fb6fbf560708071414u4b7b55fey7cb00daa30973ff5@mail.gmail.com>
References: <46B637DD.7070905@v.loewis.de>
	<46B78F86.9000505@v.loewis.de>	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>	<da3f900e0708061709v53d93f7m76519dd6cee92cc5@mail.gmail.com>	<ca471dc20708061719p3092e6bbu47c68bbfddc11012@mail.gmail.com>	<46B82B88.5000804@canterbury.ac.nz>
	<46B866A0.2040800@gmail.com>	<ca471dc20708070759v6f9894a6w84838553f672cd5f@mail.gmail.com>	<46B89C5B.7090104@gmail.com>	<ca471dc20708070939i648f710fp95b0e1d878ed87cf@mail.gmail.com>
	<fb6fbf560708071414u4b7b55fey7cb00daa30973ff5@mail.gmail.com>
Message-ID: <f9boc9$etk$1@sea.gmane.org>

Jim Jewett schrieb:
> On 8/7/07, Guido van Rossum <guido at python.org> wrote:
> 
>> If b"..." is immutable, the
>> immutable bytes type is in your face all the time and you'll have to
>> deal with the difference all the time.
> 
> There is a conceptual difference between the main use cases for
> mutable (a buffer) and the main use cases for immutable (a protocol
> constant).
> 
> I'm not sure why you would need a literal for the mutable version.
> How often do you create a new buffer with initial values?  (Note:  not
> pointing to existing memory; creating a new one.)

The same reason that you might create empty lists or dicts: to fill them.

>> E.g. is the result of
>> concatenating a mutable and an immutable bytes object mutable?
>> Does it matter whether the mutable operand is first or second?
> 
> I would say immutable; you're taking a snapshot.  (I would have some
> sympathy for taking the type of the first operand, but then you need
> to worry about + vs +=, and whether the start of the new object will
> notice later state changes.)

But what about

mutable = mutable + immutable

mutable += immutable

I'd expect it to stay mutable in both cases.
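
For comparison, a sketch using the types as they ended up in released Python 3, where the mutable type is spelled bytearray and bytes is immutable (so this is not the struni-branch bytes discussed here). There, both spellings keep the result mutable, but only += updates the existing object:

```python
mutable = bytearray(b"ab")
alias = mutable

# In-place concatenation: same object, still mutable.
mutable += b"cd"
same_object = alias is mutable

# Plain concatenation: a new object; bytearray + bytes gives a bytearray.
result = mutable + b"ef"
```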

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From g.brandl at gmx.net  Wed Aug  8 08:40:02 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Wed, 08 Aug 2007 08:40:02 +0200
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46B91B97.30603@canterbury.ac.nz>
References: <46B637DD.7070905@v.loewis.de>	<ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>	<46B78F86.9000505@v.loewis.de>	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>	<da3f900e0708061709v53d93f7m76519dd6cee92cc5@mail.gmail.com>	<ca471dc20708061719p3092e6bbu47c68bbfddc11012@mail.gmail.com>	<46B82B88.5000804@canterbury.ac.nz>
	<46B866A0.2040800@gmail.com>	<ca471dc20708070759v6f9894a6w84838553f672cd5f@mail.gmail.com>	<46B89C5B.7090104@gmail.com>	<ca471dc20708070939i648f710fp95b0e1d878ed87cf@mail.gmail.com>
	<46B91B97.30603@canterbury.ac.nz>
Message-ID: <f9bofq$etk$2@sea.gmane.org>

Greg Ewing schrieb:

> So concatenating mutable and immutable should give the
> same result as concatenating two immutables, i.e. an
> immutable. If you need to add something to the end of
> your buffer, while keeping it mutable, you use extend().
> 
> This gives us
> 
>    immutable + immutable -> immutable
>    mutable + immutable -> immutable
>    immutable + mutable -> immutable
>    mutable + mutable -> mutable  (*)

NB: when dealing with sets and frozensets, you get the type of
the first operand.  Doing something different here is confusing.
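
A quick sketch of that precedent. The last two lines assume the released Python 3 types (immutable bytes, mutable bytearray), which follow the same first-operand rule:

```python
s, f = {1, 2}, frozenset({2, 3})

left_mutable = s | f      # set on the left gives a set
left_frozen = f | s       # frozenset on the left gives a frozenset

# Released Python 3: the byte types behave the same way under +.
bytes_first = b"a" + bytearray(b"b")
bytearray_first = bytearray(b"a") + b"b"
```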

Georg


-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From martin at v.loewis.de  Wed Aug  8 09:04:45 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 08 Aug 2007 09:04:45 +0200
Subject: [Python-3000] Pleaswe help with the countdown to zero failing
 tests in the struni branch!
In-Reply-To: <ca471dc20708070955k4c734bcbge8efd08e6b001af8@mail.gmail.com>
References: <ca471dc20708061655w62420da3m1d87ffa5eff669c6@mail.gmail.com>	<f99n87$8or$1@sea.gmane.org>	<ca471dc20708070736t6797477bw6667e0608bccb316@mail.gmail.com>	<f9a4c3$1ef$1@sea.gmane.org>
	<ca471dc20708070955k4c734bcbge8efd08e6b001af8@mail.gmail.com>
Message-ID: <46B96B0D.6080605@v.loewis.de>

>> It's in Modules/timemodule.c, line 691:
>>         PyModule_AddObject(m, "tzname",
>>                            Py_BuildValue("(zz)", tzname[0], tzname[1]));
>>
>> According to MSDN, tzname is a global variable; the contents is somehow
>> derived from the TZ environment variable (which is not set in my case).
> 
> Is there anything from which you can guess the encoding (e.g. the
> filesystem encoding?).

It's in the locale's encoding. On Windows, that will be "mbcs"; on other
systems, the timezone names are typically all in ASCII - this would
allow for a quick work-around. Using the filesystem encoding would also
work, although it would be an equal hack: it's *meant* to be used only
for file names (and on OSX at least, it deviates from the locale's
encoding - although I have no idea what tzname is encoded in on OSX).

> These are all externally-provided strings. It will depend on the
> platform what the encoding is.
> 
> I wonder if we need to add another format code to Py_BuildValue (and
> its friends) to designate "platform default encoding" instead of
> UTF-8.

For symmetry with ParseTuple, there could be the 'e' versions
(es, ez, ...) which would take a codec name also.

"platform default encoding" is a tricky concept, of course:
Windows alone has two of them on each installation.

Regards,
Martin


From jyasskin at gmail.com  Wed Aug  8 09:57:05 2007
From: jyasskin at gmail.com (Jeffrey Yasskin)
Date: Wed, 8 Aug 2007 00:57:05 -0700
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46B80F8A.7060906@v.loewis.de>
References: <46B637DD.7070905@v.loewis.de>
	<ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>
	<46B78F86.9000505@v.loewis.de>
	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>
	<46B79815.1030504@v.loewis.de>
	<ca471dc20708061739w513fb601v8cf72b2eb2a490fd@mail.gmail.com>
	<5d44f72f0708062126u1853d99fx500990f91841e6b6@mail.gmail.com>
	<46B7F869.6080007@v.loewis.de>
	<5d44f72f0708062245k32e79de4s4b545c59a974612a@mail.gmail.com>
	<46B80F8A.7060906@v.loewis.de>
Message-ID: <5d44f72f0708080057u16e72916rf3bf369ea2889859@mail.gmail.com>

I agree completely with Talin's suggestion for the arrangement of the
mutable and immutable alternatives, but there are a couple points here
that I wanted to answer.

On 8/6/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > For low-level I/O code, I totally agree that a mutable buffery object
> > is needed.
>
> The code we are looking at right now (dbm interfaces) *is* low-level
> I/O code.

But you want an immutable interface to it that looks like a dict. I
think that's entirely appropriate because the underlying C code is the
real low-level I/O code, while the Python wrapper is actually pretty
high-level.

> > For example, to support re-using bytes buffers, socket.send()
> > would need to take start and end offsets into its bytes argument.
> > Otherwise, you have to slice the object to select the right data,
> > which *because bytes are mutable* requires a copy. PEP 3116's .write()
> > method has the same problem. Making those changes is, of course,
> > doable, but it seems like something that should be consciously
> > committed to.
>
> Sure. There are several ways to do that, including producing view
> objects - which would be possible even though the underlying buffer
> is mutable; the view would then be just as mutable.
>
> > Python 2 seems to have gotten away with doing all the buffery stuff in
> > C. Is there a reason Python 3 shouldn't do the same?
>
> I think Python 2 has demonstrated that this doesn't really work. People
> repeatedly did += on strings (leading to quadratic performance),

This argues for mutable strings at least as much as it argues for
mutable high-level bytes. Now that they exist, generators are a pretty
natural way to build up immutable objects, so people certainly have
the option to avoid quadratic performance whatever the mutability of
their objects.
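
A small sketch of what I mean: rather than growing an immutable object with repeated +=, feed a generator to a single join() call. The chunks() helper is invented for the example.

```python
def chunks():
    # Hypothetical data source yielding small immutable pieces.
    for i in range(5):
        yield str(i)

# One join() call avoids re-copying the growing prefix on every step.
joined = "".join(chunks())

# The same idiom works for byte strings.
joined_bytes = b"".join(c.encode("ascii") for c in chunks())
```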

> invented the buffer interface (which is semantically flawed), added
> direct support for mmap, and so on.

And those still exist in Python 3 (perhaps in an updated form). A
mutable bytes doesn't obsolete them. It may be a handy concrete type
for the buffer interface, but then so is array.

> > me: [benchmarks showing 10% faster construction]
[Probably this just means that something hasn't been optimized enough
on Intel Macs]
> Martin: [same benchmarks showing 10% faster copying]

I'd really say it's the same result (and shouldn't have claimed
otherwise in my email. Sorry). A 10% difference either way is likely
to be dwarfed by the costs of actually doing I/O. Before picking
interfaces around the notion that either allocation or copying is
expensive, it would be wise to run benchmarks to figure out what the
performance actually looks like.

On 8/7/07, Guido van Rossum <guido at python.org> wrote:
> [list()] would not work with low-level I/O (sometimes readinto() is useful)

When is "sometimes"? Is it the same times that rewriting into C would
be a good idea? I'd really like to see any benchmarks people have
written to decide this.

In any case, the obvious thing to do may well be different when you're
writing performance-critical code and when you're writing code that
just needs to be readable. I haven't seen any such distinguishing
circumstance for the various hashing techniques.

On 8/7/07, Guido van Rossum <guido at python.org> wrote:
> On 8/6/07, Jeffrey Yasskin <jyasskin at gmail.com> wrote:
> ...why are you waiting
> > for a show-stopper that requires an immutable bytes type rather than
> > one that requires a mutable one?
>
> Well one reason of course is that we currently have a mutable bytes
> object and that it works well in most situations.

The status quo argument must be weaker given that bytes hasn't existed
in any released Python. I was really asking why you picked mutable as
the first type to experiment with, and I guess I/O is the answer to
that, although it seems to me like a case of the tail wagging the dog.

On 8/7/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Jeffrey Yasskin wrote:
> > If you have mutable bytes and need an
> > immutable object, you could 1) convert it to an int (probably
> > big-endian),
>
> That's not a reversible transformation, because you lose
> information about leading zero bits.

Good point. You'd need a length along with the data, unless you're
dealing with a fixed-length thing like 4CCs. This is still probably
among the most efficient representations.
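
To make the leading-zero problem concrete, a sketch using int.from_bytes() and int.to_bytes() from later Python 3 releases (they did not exist when this thread was written):

```python
data = b"\x00\x00\x01\x02"

# Big-endian conversion drops the leading zero bytes...
n = int.from_bytes(data, "big")

# ...so the length has to travel with the integer to get them back.
key = (len(data), n)
restored = key[1].to_bytes(key[0], "big")
```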

On 8/7/07, Guido van Rossum <guido at python.org> wrote:
> But this is impractical -- a very common way to work is to build up a
> set incrementally. With immutable sets this would quickly become
> O(N**2). That's why set() is mutable and {...} creates a set, and the
> only way to create an immutable set is to use frozenset(...).

I would probably default to constructing an immutable set with a
generator. If I needed to do something more complicated, I'd fall back
to a mutable set. Of course, making the name for the immutable version
3 times as long biases the language toward the mutable version.
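
Roughly what I have in mind, as a sketch with made-up data:

```python
# Build the immutable set in one step from a generator expression.
squares = frozenset(n * n for n in range(6))

# Fall back to a mutable set when construction is more involved,
# then freeze it at the end if an immutable result is needed.
work = set()
for n in range(6):
    if n % 2:
        work.add(n * n)
odd_squares = frozenset(work)
```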

-- 
Namasté,
Jeffrey Yasskin

From greg.ewing at canterbury.ac.nz  Wed Aug  8 12:57:24 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 08 Aug 2007 22:57:24 +1200
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <aac2c7cb0708072052y5f492a8ct779f6a91b024b9b4@mail.gmail.com>
References: <46B637DD.7070905@v.loewis.de>
	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>
	<da3f900e0708061709v53d93f7m76519dd6cee92cc5@mail.gmail.com>
	<ca471dc20708061719p3092e6bbu47c68bbfddc11012@mail.gmail.com>
	<46B82B88.5000804@canterbury.ac.nz> <46B866A0.2040800@gmail.com>
	<ca471dc20708070759v6f9894a6w84838553f672cd5f@mail.gmail.com>
	<46B89C5B.7090104@gmail.com>
	<ca471dc20708070939i648f710fp95b0e1d878ed87cf@mail.gmail.com>
	<46B91B97.30603@canterbury.ac.nz>
	<aac2c7cb0708072052y5f492a8ct779f6a91b024b9b4@mail.gmail.com>
Message-ID: <46B9A194.90009@canterbury.ac.nz>

Adam Olsen wrote:
> Less confusing to prohibit concatenation of mismatched types.  There's
> always trivial workarounds (ie () + tuple([]) or .extend()).

Normally I would agree, but in this case I feel that
it would be inconvenient. With the scheme I proposed,
code that treats bytes as read-only doesn't have to
care whether it has a mutable or immutable object.

If they were as rigidly separated as lists and tuples,
every API would have to be strictly aware of whether
it dealt with mutable or immutable bytes.

I could be wrong, though. It may turn out that keeping
them separate is the right thing to do.

--
Greg

From greg.ewing at canterbury.ac.nz  Wed Aug  8 13:03:15 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 08 Aug 2007 23:03:15 +1200
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <f9boc9$etk$1@sea.gmane.org>
References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de>
	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>
	<da3f900e0708061709v53d93f7m76519dd6cee92cc5@mail.gmail.com>
	<ca471dc20708061719p3092e6bbu47c68bbfddc11012@mail.gmail.com>
	<46B82B88.5000804@canterbury.ac.nz> <46B866A0.2040800@gmail.com>
	<ca471dc20708070759v6f9894a6w84838553f672cd5f@mail.gmail.com>
	<46B89C5B.7090104@gmail.com>
	<ca471dc20708070939i648f710fp95b0e1d878ed87cf@mail.gmail.com>
	<fb6fbf560708071414u4b7b55fey7cb00daa30973ff5@mail.gmail.com>
	<f9boc9$etk$1@sea.gmane.org>
Message-ID: <46B9A2F3.9050305@canterbury.ac.nz>

Georg Brandl wrote:

> mutable = mutable + immutable
> 
> mutable += immutable

I wouldn't have a problem with these being different.
They're already different with list + tuple (although
in that case, one of them is disallowed).
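
A sketch of the list/tuple case, for anyone who hasn't run into it:

```python
items = [1, 2]

# += accepts any iterable and extends the list in place.
items += (3, 4)

# Plain + between a list and a tuple is the disallowed case.
try:
    items = items + (5, 6)
    plus_allowed = True
except TypeError:
    plus_allowed = False
```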

--
Greg

From greg.ewing at canterbury.ac.nz  Wed Aug  8 13:04:58 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 08 Aug 2007 23:04:58 +1200
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <f9bofq$etk$2@sea.gmane.org>
References: <46B637DD.7070905@v.loewis.de>
	<ca471dc20708061106i4089da2cq44981422c7787b44@mail.gmail.com>
	<46B78F86.9000505@v.loewis.de>
	<ca471dc20708061433m37ab0ce6sfa6781295fc5ef23@mail.gmail.com>
	<da3f900e0708061709v53d93f7m76519dd6cee92cc5@mail.gmail.com>
	<ca471dc20708061719p3092e6bbu47c68bbfddc11012@mail.gmail.com>
	<46B82B88.5000804@canterbury.ac.nz> <46B866A0.2040800@gmail.com>
	<ca471dc20708070759v6f9894a6w84838553f672cd5f@mail.gmail.com>
	<46B89C5B.7090104@gmail.com>
	<ca471dc20708070939i648f710fp95b0e1d878ed87cf@mail.gmail.com>
	<46B91B97.30603@canterbury.ac.nz> <f9bofq$etk$2@sea.gmane.org>
Message-ID: <46B9A35A.7030406@canterbury.ac.nz>

Georg Brandl wrote:

> NB: when dealing with sets and frozensets, you get the type of
> the first operand.  Doing something different here is confusing.

Hmmm, I don't think I would have designed it that
way. I might be willing to go along with that
precedent, though.

--
Greg

From victor.stinner at haypocalc.com  Wed Aug  8 18:14:05 2007
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Wed, 8 Aug 2007 18:14:05 +0200
Subject: [Python-3000] py3k-struni: proposition to fix ctypes bug,
	ctypes c_char creates bytes
Message-ID: <200708081814.05073.victor.stinner@haypocalc.com>

Hi,

I heard Guido's request to fix the last py3k-struni bugs. I downloaded the
subversion trunk and started to work on the ctypes tests.

The problem is that ctypes c_char (and c_char_p) creates a unicode string
instead of a byte string. I attached a proposed patch to change this
behaviour (use bytes for c_char).

So in the next example, it will display 'bytes' and not 'str':
  from ctypes import c_buffer, c_char
  buf = c_buffer("abcdef")
  print (type(buf[0]))

Other behaviour changes:
 - repr(c_char) adds a "b"
   eg. repr(c_char('x')) is "c_char(b'x')" instead of "c_char('x')"
 - bytes is mutable whereas str is not: 
   this may break some modules based on ctypes
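
For readers on a released Python 3, a sketch of the same check. Note the assumptions: create_string_buffer() is used instead of the c_buffer alias, the initial value is written as a bytes literal, and bytes there is immutable, unlike the struni-branch type described above.

```python
from ctypes import create_string_buffer, c_char

buf = create_string_buffer(b"abcdef")
first = buf[0]                # a length-1 bytes object, not str
shown = repr(c_char(b"x"))    # the repr carries the b prefix
```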

Victor Stinner aka haypo
http://hachoir.org/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: py3k-struni-ctypes.diff
Type: text/x-diff
Size: 4992 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070808/d4bfd476/attachment.bin 

From guido at python.org  Wed Aug  8 18:45:38 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 8 Aug 2007 09:45:38 -0700
Subject: [Python-3000] py3k-struni: proposition to fix ctypes bug,
	ctypes c_char creates bytes
In-Reply-To: <200708081814.05073.victor.stinner@haypocalc.com>
References: <200708081814.05073.victor.stinner@haypocalc.com>
Message-ID: <ca471dc20708080945n52ce9aex2557a4079f310b70@mail.gmail.com>

Thanks! Would you mind submitting to SF and assigning to Thomas Heller
(theller I think)?

And update the wiki (http://wiki.python.org/moin/Py3kStrUniTests)

On 8/8/07, Victor Stinner <victor.stinner at haypocalc.com> wrote:
> Hi,
>
> I hear Guido's request to fix last py3k-struni bugs. I downloaded subversion
> trunk and started to work on ctypes tests.
>
> The problem is that ctypes c_char (and c_char_p) creates unicode string
> instead of byte string. I attached a proposition (patch) to change this
> behaviour (use bytes for c_char).
>
> So in next example, it will display 'bytes' and not 'str':
>   from ctypes import c_buffer, c_char
>   buf = c_buffer("abcdef")
>   print (type(buf[0]))
>
> Other behaviour changes:
>  - repr(c_char) adds a "b"
>    eg. repr(c_char('x')) is "c_char(b'x')" instead of "c_char('x')"
>  - bytes is mutable whereas str is not:
>    this may break some modules based on ctypes
>
> Victor Stinner aka haypo
> http://hachoir.org/


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From theller at ctypes.org  Wed Aug  8 20:32:48 2007
From: theller at ctypes.org (Thomas Heller)
Date: Wed, 08 Aug 2007 20:32:48 +0200
Subject: [Python-3000] py3k-struni: proposition to fix ctypes bug,
 ctypes c_char creates bytes
In-Reply-To: <200708081814.05073.victor.stinner@haypocalc.com>
References: <200708081814.05073.victor.stinner@haypocalc.com>
Message-ID: <f9d28k$tdg$1@sea.gmane.org>

Victor Stinner schrieb:
> Hi,
> 
> I hear Guido's request to fix last py3k-struni bugs. I downloaded subversion 
> trunk and started to work on ctypes tests.
> 
> The problem is that ctypes c_char (and c_char_p) creates unicode string 
> instead of byte string. I attached a proposition (patch) to change this 
> behaviour (use bytes for c_char).
> 
> So in next example, it will display 'bytes' and not 'str':
>   from ctypes import c_buffer, c_char
>   buf = c_buffer("abcdef")
>   print (type(buf[0]))
> 
> Other behaviour changes:
>  - repr(c_char) adds a "b"
>    eg. repr(c_char('x')) is "c_char(b'x')" instead of "c_char('x')"
>  - bytes is mutable whereas str is not: 
>    this may break some modules based on ctypes

This patch looks correct.  I will test it and then commit if all works well.


The problem I had fixing this is that I was not sure whether the c_char types
should 'contain' bytes objects or str8 objects.  str8 will be going away, so
it seems the decision is clear.

OTOH, I'm a little bit confused about the bytes type.  I think this behaviour
is a little bit confusing, but maybe that's just me:

>>> b"abc"[:]
b'abc'
>>> b"abc"[:1]
b'a'
>>> b"abc"[1]
98
>>> b"abc"[1] = 42
>>> b"abc"[1] = "f"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object cannot be interpreted as an integer
>>> b"abc"[1] = b"f"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'bytes' object cannot be interpreted as an integer
>>>

Especially confusing is that the repr of a bytes object looks like a string,
but bytes do not contain characters but integers instead.

Thomas


From guido at python.org  Wed Aug  8 20:40:52 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 8 Aug 2007 11:40:52 -0700
Subject: [Python-3000] py3k-struni: proposition to fix ctypes bug,
	ctypes c_char creates bytes
In-Reply-To: <f9d28k$tdg$1@sea.gmane.org>
References: <200708081814.05073.victor.stinner@haypocalc.com>
	<f9d28k$tdg$1@sea.gmane.org>
Message-ID: <ca471dc20708081140ge8b5d09od87e55f38f97ad9@mail.gmail.com>

On 8/8/07, Thomas Heller <theller at ctypes.org> wrote:
> OTOH, I'm a little bit confused about the bytes type.  I think this behaviour
> is a little bit confusing, but maybe that's just me:
>
> >>> b"abc"[:]
> b'abc'
> >>> b"abc"[:1]
> b'a'
> >>> b"abc"[1]
> 98
> >>> b"abc"[1] = 42
> >>> b"abc"[1] = "f"
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: 'str' object cannot be interpreted as an integer
> >>> b"abc"[1] = b"f"
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: 'bytes' object cannot be interpreted as an integer
> >>>
>
> Especially confusing is that the repr of a bytes object looks like a string,
> but bytes do not contain characters but integers instead.

I hope you can get used to it. This design is a bit of a compromise --
conceptually, bytes really contain small unsigned integers in [0,
256), but in order to be maximally useful, the bytes literal (and
hence the bytes repr()) shows printable ASCII characters as themselves
(controls and non-ASCII are shown as \xXX). This is more compact too.
PBP!
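
A sketch of the same session against a released Python 3, assuming bytearray as the mutable type (there, bytes itself is immutable, so item assignment needs a bytearray):

```python
buf = bytearray(b"abc")

item = buf[1]          # indexing yields an int in range(256)
piece = buf[:1]        # slicing yields another bytearray

buf[1] = 42            # assigning an int works; 42 is ASCII '*'
shown = repr(buf)

try:
    buf[1] = "f"       # a str is not an integer
    str_ok = True
except TypeError:
    str_ok = False

# Control and non-ASCII bytes are shown as \xXX escapes.
escaped = repr(bytes([0x01, 0x41, 0xff]))
```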

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at v.loewis.de  Wed Aug  8 20:41:33 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 08 Aug 2007 20:41:33 +0200
Subject: [Python-3000] [Python-Dev] Regular expressions, Unicode etc.
In-Reply-To: <E1IIhpw-0001jT-2W@virgo.cus.cam.ac.uk>
References: <E1IIhpw-0001jT-2W@virgo.cus.cam.ac.uk>
Message-ID: <46BA0E5D.60109@v.loewis.de>

> My second one is about Unicode.  I really, but REALLY regard it as
> a serious defect that there is no escape for printing characters.
> Any code that checks arbitrary text is likely to need them - yes,
> I know why Perl and hence PCRE doesn't have that, but let's skip
> that.  That is easy to add, though choosing a letter is tricky.
> Currently \c and \C, for 'character' (I would prefer 'text' or
> 'printable', but \t is obviously insane and \P is asking for
> incompatibility with Perl and Java).

Before discussing the escape, I'd like to see a specification of
it first - what characters precisely would classify as "printing"?

> But attempting to rebuild the Unicode database hasn't worked.
> Tools/unicode is, er, a trifle incomplete and out of date.  The
> only file I need to change is Objects/unicodetype_db.h, but the
> init attempts to run Tools/unicode/makeunicodedata.py have not
> been successful.
> 
> I may be able to reverse engineer the mechanism enough to get
> the files off the Unicode site and run it, but I don't want to
> spend forever on it.  Any clues?

I see that you managed to do something here, so I'm not sure
what kind of help you still need.

Regards,
Martin

From mike.klaas at gmail.com  Wed Aug  8 20:56:58 2007
From: mike.klaas at gmail.com (Mike Klaas)
Date: Wed, 8 Aug 2007 11:56:58 -0700
Subject: [Python-3000] [Python-Dev] Regular expressions, Unicode etc.
In-Reply-To: <E1IIhpw-0001jT-2W@virgo.cus.cam.ac.uk>
References: <E1IIhpw-0001jT-2W@virgo.cus.cam.ac.uk>
Message-ID: <9CBC9283-52BF-48AB-A39F-0DAE0E4EAFAE@gmail.com>

On 8-Aug-07, at 2:28 AM, Nick Maclaren wrote:

> I have needed to push my stack to teach REs (don't ask), and am
> taking a look at the RE code.  I may be able to extend it to support
> RFE 694374 and (more importantly) atomic groups and possessive
> quantifiers.  While I regard such things as revolting beyond belief,
> they make a HELL of a difference to the efficiency of recognising
> things like HTML tags in a morass of mixed text.

+1.  I would use such a feature.

> The other approach, which is to stick to true regular expressions,
> and wholly or partially convert to DFAs, has already been rendered
> impossible by even the limited Perl/PCRE extensions that Python
> has adopted.

Impossible?  Surely, a sufficiently-competent re engine could detect  
when a DFA is possible to construct?

-Mike

From lists at cheimes.de  Wed Aug  8 21:44:59 2007
From: lists at cheimes.de (Christian Heimes)
Date: Wed, 08 Aug 2007 21:44:59 +0200
Subject: [Python-3000] Removal of cStringIO and StringIO module
Message-ID: <f9d6fv$c6o$1@sea.gmane.org>

I've spent some free time today to work on a patch that removes
cStringIO and StringIO from the py3k-struni branch. The patch is
available at http://www.python.org/sf/1770008

It adds a deprecation warning to StringIO.py and a facade cStringIO.py.
Both modules act as an alias for io.StringIO. You may remove the files.
I didn't notice that 2to3 has a fixer for cStringIO and StringIO. But
the files may be useful because the fixer doesn't fix doc tests.

Some unit tests are failing because I don't know how to handle
StringIO(buffer()). Georg Brandl suggested using io.BytesIO but that
doesn't work correctly.

Christian
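
A minimal sketch of the facade idea described above, not the code from
the patch. The shim function name and the warning text are assumptions
made for illustration; io.StringIO and io.BytesIO are the real classes.

```python
import io
import warnings


def StringIO(initial_value="", newline="\n"):
    """Hypothetical compatibility shim: warn, then hand back io.StringIO."""
    warnings.warn(
        "the StringIO module is deprecated; use io.StringIO instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return io.StringIO(initial_value, newline)


# Text goes through io.StringIO; raw bytes need io.BytesIO instead.
text_buf = io.StringIO("spam")
byte_buf = io.BytesIO(b"spam")
```

Callers that used StringIO for binary data are the ones that need
io.BytesIO, which is where the failing tests mentioned above come in.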


From victor.stinner at haypocalc.com  Wed Aug  8 21:56:11 2007
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Wed, 8 Aug 2007 21:56:11 +0200
Subject: [Python-3000] py3k-struni: proposition to fix ctypes bug,
	ctypes c_char creates bytes
In-Reply-To: <ca471dc20708080945n52ce9aex2557a4079f310b70@mail.gmail.com>
References: <200708081814.05073.victor.stinner@haypocalc.com>
	<ca471dc20708080945n52ce9aex2557a4079f310b70@mail.gmail.com>
Message-ID: <200708082156.11985.victor.stinner@haypocalc.com>

On Wednesday 08 August 2007 18:45:38 you wrote:
> Thanks! Would you mind submitting to SF and assigning to Thomas Heller
> (theller I think)?
>
> And update the wiki (http://wiki.python.org/moin/Py3kStrUniTests)

Thomas Heller did it. Thanks ;-)

Victor Stinner aka haypo
http://hachoir.org/

From brett at python.org  Wed Aug  8 22:07:54 2007
From: brett at python.org (Brett Cannon)
Date: Wed, 8 Aug 2007 13:07:54 -0700
Subject: [Python-3000] Removal of cStringIO and StringIO module
In-Reply-To: <f9d6fv$c6o$1@sea.gmane.org>
References: <f9d6fv$c6o$1@sea.gmane.org>
Message-ID: <bbaeab100708081307u61f5a5c2ka828d5aa5b3d34fc@mail.gmail.com>

On 8/8/07, Christian Heimes <lists at cheimes.de> wrote:
> I've spent some free time today to work on a patch that removes
> cStringIO and StringIO from the py3k-struni branch. The patch is
> available at http://www.python.org/sf/1770008
>

Thanks for the work!  And with Alexandre's Summer of Code project to
have a C version of io.StringIO that is used transparently that should
work out well!

> It adds a deprecation warning to StringIO.py and a facade cStringIO.py.
> Both modules act as an alias for io.StringIO. You may remove the files.
> I didn't notice that 2to3 has a fixer for cStringIO and StringIO. But
> the files may be useful because the fixer doesn't fix doc tests.
>

Deprecation warnings for modules that are going away have not been
handled yet.  Stdlib stuff is on the table for after 3.0a1.  Chances
are the stdlib deprecations will be a 2.6 thing and 3.0 won't have any
since we expect people to go 2.6 -> 3.0, not the other way around.
There will be 2to3 fixers for the imports, but not until we tackle the
stdlib and its cleanup.

> Some unit tests are failing because I don't know how to handle
> StringIO(buffer()). Georg Brandl suggested using io.BytesIO but that
> doesn't work correctly.

Really?  I did that in a couple of places and it worked for me.
What's the problem specifically?

-Brett

From jimjjewett at gmail.com  Wed Aug  8 22:50:14 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Wed, 8 Aug 2007 16:50:14 -0400
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <f9boc9$etk$1@sea.gmane.org>
References: <46B637DD.7070905@v.loewis.de>
	<da3f900e0708061709v53d93f7m76519dd6cee92cc5@mail.gmail.com>
	<ca471dc20708061719p3092e6bbu47c68bbfddc11012@mail.gmail.com>
	<46B82B88.5000804@canterbury.ac.nz> <46B866A0.2040800@gmail.com>
	<ca471dc20708070759v6f9894a6w84838553f672cd5f@mail.gmail.com>
	<46B89C5B.7090104@gmail.com>
	<ca471dc20708070939i648f710fp95b0e1d878ed87cf@mail.gmail.com>
	<fb6fbf560708071414u4b7b55fey7cb00daa30973ff5@mail.gmail.com>
	<f9boc9$etk$1@sea.gmane.org>
Message-ID: <fb6fbf560708081350n5329e77ej6bfbf0ef08ae819e@mail.gmail.com>

On 8/8/07, Georg Brandl <g.brandl at gmx.net> wrote:
> Jim Jewett schrieb:

> > I'm not sure why you would need a literal for the mutable version.
> > How often do you create a new buffer with initial values?  (Note:  not
> > pointing to existing memory; creating a new one.)

> The same reason that you might create empty lists or dicts: to fill them.

Let me rephrase that -- how often do you create new non-empty buffers?

The equivalent of an empty list or dict is buffer().  If you really
want to save keystrokes, call it buf().  The question is whether we
really need to abbreviate

    >>> mybuf = buffer("abcde")

as

    >>> mybuf = b"abcde"

I would say leave literal syntax for the immutable type.

(And other than this nit, I also lend my support to Talin's suggestion.)

-jJ
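
For comparison, a short sketch of how this split looks in Python 3 as
later released (which postdates this thread): the b"..." literal
creates an immutable bytes object, and the mutable type is spelled
bytearray rather than buffer. The variable names are illustrative.

```python
frozen = b"abcde"               # literal syntax -> immutable bytes
mutable = bytearray(b"abcde")   # the mutable type has no literal of its own

mutable[0] = ord("X")           # in-place modification is allowed

error = None
try:
    frozen[0] = ord("X")        # bytes does not support item assignment
except TypeError as exc:
    error = str(exc)

empty = bytearray()             # the "empty list" equivalent, ready to be filled
empty.extend(b"ab")
```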

From guido at python.org  Wed Aug  8 23:19:24 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 8 Aug 2007 14:19:24 -0700
Subject: [Python-3000] Moving to a "py3k" branch soon
Message-ID: <ca471dc20708081419q27a96511o18ee9e45bf0a08f6@mail.gmail.com>

I would like to move to a new branch soon for all Py3k development.

I plan to name the branch "py3k".  It will be branched from
py3k-struni.  I will do one last set of merges from the trunk via p3yk
(note typo!) and py3k-struni, and then I will *delete* the old p3yk
and py3k-struni branches (you will still be able to access their last
known good status by syncing back to a previous revision).  I will
temporarily shut up some unit tests to avoid getting endless spam from
Neal's buildbot.

After the switch, you should be able to switch your workspaces to the
new branch using the "svn switch" command.

If anyone is in the middle of something that would become painful due
to this changeover, let me know ASAP and I'll delay.

I will send out another message when I start the move, and another
when I finish it.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From nnorwitz at gmail.com  Wed Aug  8 23:52:58 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Wed, 8 Aug 2007 14:52:58 -0700
Subject: [Python-3000] C API cleanup str
In-Reply-To: <46B5FA11.5040404@v.loewis.de>
References: <ee2a432c0708022301m3dbf7a7tfe7378336fe0cefb@mail.gmail.com>
	<46B2C8E0.8080409@canterbury.ac.nz>
	<ee2a432c0708022331q768f3cbbu527aad3359ed9552@mail.gmail.com>
	<46B5C47B.5090703@v.loewis.de>
	<ca471dc20708050808o7dfcb8c4pe59acff12b57c8f9@mail.gmail.com>
	<46B5F136.4010502@v.loewis.de>
	<ca471dc20708050859g3e3178d7ncfc6209e9ea05708@mail.gmail.com>
	<46B5FA11.5040404@v.loewis.de>
Message-ID: <ee2a432c0708081452k8b874e7j8547f36bb908fc23@mail.gmail.com>

On 8/5/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> >> I agree. We should specify that somewhere, so we have a recorded
> >> guideline to use in case of doubt.
> >
> > But where? Time to start a PEP for the C API perhaps?
>
> I would put it into the API documentation. We can put a daily-generated
> version of the documentation online, just as the trunk documentation is
> updated daily.

That's already been done for a while:  http://docs.python.org/dev/3.0/

It's even updated every 12 hours. :-)

n

From lists at cheimes.de  Thu Aug  9 00:22:51 2007
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 09 Aug 2007 00:22:51 +0200
Subject: [Python-3000] tp_bytes and __bytes__ magic method
Message-ID: <f9dfo1$b7e$1@sea.gmane.org>

Hey Pythonistas!

Victor Stinner just made a good point at #python. The py3k has no magic
method and type slot for bytes. Python has magic methods like __int__
for int(ob) and __str__ for str(ob). Are you considering adding a
__bytes__ method and tp_bytes?

I can think of a bunch of use cases for a magic method.

Christian


From victor.stinner at haypocalc.com  Thu Aug  9 00:49:30 2007
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Thu, 9 Aug 2007 00:49:30 +0200
Subject: [Python-3000] tp_bytes and __bytes__ magic method
In-Reply-To: <f9dfo1$b7e$1@sea.gmane.org>
References: <f9dfo1$b7e$1@sea.gmane.org>
Message-ID: <200708090049.30405.victor.stinner@haypocalc.com>

On Thursday 09 August 2007 00:22:51 Christian Heimes wrote:
> Hey Pythonistas!
>
> Victor Stinner just made a good point at #python. The py3k has no magic
> method and type slot for bytes.

And another problem: mix of __str__ and __unicode__ methods.

class A:
  def __str__(self): return '__str__'

class B:
  def __str__(self): return '__str__'
  def __unicode__(self): return '__unicode__'

print (repr(str( A() )))  # display '__str__'
print (repr(str( B() )))  # display '__unicode__'


Proposition:

  __str__() -> str (2.x) becomes __bytes__() -> bytes (3000)
  __unicode__() -> unicode (2.x) becomes __str__() -> str (3000)

Victor Stinner aka haypo
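
A sketch of what the proposition amounts to, written against Python 3
as later released, where bytes(obj) does call obj.__bytes__(). The
class name and the choice of UTF-8 are assumptions for illustration.

```python
class Message:
    """Hypothetical class with separate text and byte representations."""

    def __init__(self, subject):
        self.subject = subject

    def __str__(self):
        # Text (Unicode) representation, used by str() and print().
        return "Subject: " + self.subject

    def __bytes__(self):
        # Byte representation, used by bytes(); the encoding is an explicit choice.
        return str(self).encode("utf-8")


msg = Message("caf\u00e9")
text = str(msg)      # str
data = bytes(msg)    # bytes, via __bytes__
```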

From guido at python.org  Thu Aug  9 00:54:47 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 8 Aug 2007 15:54:47 -0700
Subject: [Python-3000] tp_bytes and __bytes__ magic method
In-Reply-To: <f9dfo1$b7e$1@sea.gmane.org>
References: <f9dfo1$b7e$1@sea.gmane.org>
Message-ID: <ca471dc20708081554x834b59cpcde5c9b1d9ca3c4a@mail.gmail.com>

On 8/8/07, Christian Heimes <lists at cheimes.de> wrote:
> Victor Stinner just made a good point at #python. The py3k has no magic
> method and type slot for bytes. Python has magic methods like __int__
> for int(ob) and __str__ for str(ob). Are you considering adding a
> __bytes__ method and tp_bytes?

Never occurred to me. The intention is that bytes() has a fixed
signature. It's far less important than str(). __int__() is different,
numeric types must be convertible.

> I can think of a bunch of use cases for a magic method.

Such as?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Thu Aug  9 00:55:40 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 8 Aug 2007 15:55:40 -0700
Subject: [Python-3000] tp_bytes and __bytes__ magic method
In-Reply-To: <200708090049.30405.victor.stinner@haypocalc.com>
References: <f9dfo1$b7e$1@sea.gmane.org>
	<200708090049.30405.victor.stinner@haypocalc.com>
Message-ID: <ca471dc20708081555gf96c4e3l1fa73ae9afbe562@mail.gmail.com>

The plan is to kill __unicode__ and only use __str__. But we're not
quite there yet.

On 8/8/07, Victor Stinner <victor.stinner at haypocalc.com> wrote:
> On Thursday 09 August 2007 00:22:51 Christian Heimes wrote:
> > Hey Pythonistas!
> >
> > Victor Stinner just made a good point at #python. The py3k has no magic
> > method and type slot for bytes.
>
> And another problem: mix of __str__ and __unicode__ methods.
>
> class A:
>   def __str__(self): return '__str__'
>
> class B:
>   def __str__(self): return '__str__'
>   def __unicode__(self): return '__unicode__'
>
> print (repr(str( A() )))  # display '__str__'
> print (repr(str( B() )))  # display '__unicode__'
>
>
> Proposition:
>
>   __str__() -> str (2.x) becomes __bytes__() -> bytes (3000)
>   __unicode__() -> unicode (2.x) becomes __str__() -> str (3000)
>
> Victor Stinner aka haypo
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From victor.stinner at haypocalc.com  Thu Aug  9 01:13:35 2007
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Thu, 9 Aug 2007 01:13:35 +0200
Subject: [Python-3000] tp_bytes and __bytes__ magic method
In-Reply-To: <ca471dc20708081554x834b59cpcde5c9b1d9ca3c4a@mail.gmail.com>
References: <f9dfo1$b7e$1@sea.gmane.org>
	<ca471dc20708081554x834b59cpcde5c9b1d9ca3c4a@mail.gmail.com>
Message-ID: <200708090113.35900.victor.stinner@haypocalc.com>

On Thursday 09 August 2007 00:54:47 Guido van Rossum wrote:
> On 8/8/07, Christian Heimes <lists at cheimes.de> wrote:
> > Victor Stinner just made a good point at #python. The py3k has no magic
> > method and type slot for bytes (...)
> > I can think of a bunch of use cases for a magic method.
>
> Such as?

I'm working on the email module and I guess that some __str__ methods should 
return bytes instead of str (and so should be renamed to __bytes__). Maybe 
the one of Message class (Lib/email/message.py).

Victor Stinner aka haypo
http://hachoir.org/

From lists at cheimes.de  Thu Aug  9 01:49:35 2007
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 09 Aug 2007 01:49:35 +0200
Subject: [Python-3000] tp_bytes and __bytes__ magic method
In-Reply-To: <ca471dc20708081554x834b59cpcde5c9b1d9ca3c4a@mail.gmail.com>
References: <f9dfo1$b7e$1@sea.gmane.org>
	<ca471dc20708081554x834b59cpcde5c9b1d9ca3c4a@mail.gmail.com>
Message-ID: <f9dkqg$off$1@sea.gmane.org>

Guido van Rossum wrote:
>> I can think of a bunch of use cases for a magic method.
> 
> Such as?

The __bytes__ method could be used to implement a byte representation of
an arbitrary object. The byte representation can then be used to send
the object over the wire or dump it into a file. In Python 2.x I could
override __str__ to send an object over a socket, but in Python 3k str()
returns a unicode object that can't be transmitted over sockets. Sockets
support bytes only.

Christian


From victor.stinner at haypocalc.com  Thu Aug  9 01:59:46 2007
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Thu, 9 Aug 2007 01:59:46 +0200
Subject: [Python-3000] fix email module for bytes/str
Message-ID: <200708090159.46301.victor.stinner@haypocalc.com>

Hi,

I started to work on the email module, but I have trouble understanding whether a 
function should return bytes or str (because I don't know the email module).

Header.encode() -> bytes?
Message.as_string() -> bytes?
decode_header() -> list of (bytes, str|None) or (str, str|None)?
base64MIME.encode() -> bytes?

message_from_string() <- bytes?

Message.get_payload() -> bytes or str?

A charset name type is str, right?

---------------

Things to change to get bytes:
 - replace StringIO with BytesIO
 - add 'b' prefix, eg. '' becomes b''
 - replace "%s=%s" % (x, y) with b''.join((x, b'=', y))
   => is it the best method to concatenate bytes?

Problems (to port python 2.x code to 3000):
 - When obj.lower() is used, I expect obj to be str but it's bytes
 - obj.strip() doesn't work when obj is a bytes object, it requires an
   argument but I don't know the right value! Maybe b'\n\r\v\t '?
 - iterate on a bytes object gives number and not bytes object, eg.
      for c in b"small text":
         if re.match("(\n|\r)", c): ...
   Is it possible to use a 'bytes' regex? re.compile(b"x") raises an exception

-- 
Victor Stinner aka haypo
http://hachoir.org/
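
The points in the list above can be tried out as follows. This is a
sketch against Python 3 as later released, not the py3k-struni branch
under discussion, where some of these calls failed as described. The
sample values are made up.

```python
import re

x, y = b"charset", b"utf-8"
joined = b"".join((x, b"=", y))       # one way to build b"charset=utf-8"

raw = b"  Small Text\r\n"
stripped = raw.strip()                # with no argument, strips ASCII whitespace
lowered = stripped.lower()            # ASCII-only case conversion on bytes

# Iterating yields integers; slicing yields length-1 bytes objects instead.
as_ints = list(b"ab")
as_bytes = [b"ab"[i:i + 1] for i in range(2)]

# A bytes pattern matches against bytes input.
newline = re.compile(b"(\n|\r)")
has_newline = newline.search(raw) is not None
```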

From guido at python.org  Thu Aug  9 02:00:44 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 8 Aug 2007 17:00:44 -0700
Subject: [Python-3000] tp_bytes and __bytes__ magic method
In-Reply-To: <f9dkqg$off$1@sea.gmane.org>
References: <f9dfo1$b7e$1@sea.gmane.org>
	<ca471dc20708081554x834b59cpcde5c9b1d9ca3c4a@mail.gmail.com>
	<f9dkqg$off$1@sea.gmane.org>
Message-ID: <ca471dc20708081700s208617cbneeb4d811c50a108c@mail.gmail.com>

> On Thursday 09 August 2007 00:54:47 Guido van Rossum wrote:
> > On 8/8/07, Christian Heimes <lists at cheimes.de> wrote:
> > > Victor Stinner just made a good point at #python. The py3k has no magic
> > > method and type slot for bytes (...)
> > > I can think of a bunch of use cases for a magic method.
> >
> > Such as?

On 8/8/07, Victor Stinner <victor.stinner at haypocalc.com> wrote:
> I'm working on the email module and I guess that some __str__ methods should
> return bytes instead of str (and so should be renamed to __bytes__). Maybe
> the one of Message class (Lib/email/message.py).

On 8/8/07, Christian Heimes <lists at cheimes.de> wrote:
> The __bytes__ method could be used to implement a byte representation of
> an arbitrary object. The byte representation can then be used to submit
> the object over wire or dump it into a file. In Python 2.x I could
> overwrite __str__ to send an object over a socket but in Python 3k str()
> returns a unicode object that can't be transmitted over sockets. Sockets
> support bytes only.

This could just as well be done using a method on that specific
object. I don't think having to write x.as_bytes() is worse than
bytes(x), *unless* there are contexts where it's important to convert
something to bytes without knowing what kind of thing it is. For
str(), such a context exists: print(). For bytes(), I'm not so sure.
The use cases given here seem to be either very specific to a certain
class, or could be solved using other generic APIs like pickling.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
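
A small sketch of the two alternatives mentioned above: an explicit
method on the specific class, and a generic API (pickling). The Point
class and its wire format are hypothetical. The pickle call is applied
to the instance's attribute dict to keep the example self-contained;
pickling the instance itself also works when the class is importable.

```python
import pickle


class Point:
    """Hypothetical class with an explicit, class-specific byte encoding."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def as_bytes(self):
        # Explicit method: the caller knows it is dealing with a Point.
        return ("%d,%d" % (self.x, self.y)).encode("ascii")


p = Point(3, 4)
wire = p.as_bytes()                 # class-specific representation
generic = pickle.dumps(vars(p))     # generic API applied to the object's state
restored = pickle.loads(generic)
```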

From guido at python.org  Thu Aug  9 02:01:14 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 8 Aug 2007 17:01:14 -0700
Subject: [Python-3000] fix email module for bytes/str
In-Reply-To: <200708090159.46301.victor.stinner@haypocalc.com>
References: <200708090159.46301.victor.stinner@haypocalc.com>
Message-ID: <ca471dc20708081701k321d1dfei4e0dec8669b3c8e5@mail.gmail.com>

You might want to send this to the email-sig.

On 8/8/07, Victor Stinner <victor.stinner at haypocalc.com> wrote:
> Hi,
>
> I started to work on the email module, but I have trouble understanding whether a
> function should return bytes or str (because I don't know the email module).
>
> Header.encode() -> bytes?
> Message.as_string() -> bytes?
> decode_header() -> list of (bytes, str|None) or (str, str|None)?
> base64MIME.encode() -> bytes?
>
> message_from_string() <- bytes?
>
> Message.get_payload() -> bytes or str?
>
> A charset name type is str, right?
>
> ---------------
>
> Things to change to get bytes:
>  - replace StringIO with BytesIO
>  - add 'b' prefix, eg. '' becomes b''
>  - replace "%s=%s" % (x, y) with b''.join((x, b'=', y))
>    => is it the best method to concatenate bytes?
>
> Problems (to port python 2.x code to 3000):
>  - When obj.lower() is used, I expect obj to be str but it's bytes
>  - obj.strip() doesn't work when obj is a bytes object, it requires an
>    argument but I don't know the right value! Maybe b'\n\r\v\t '?
>  - iterate on a bytes object gives number and not bytes object, eg.
>       for c in b"small text":
>          if re.match("(\n|\r)", c): ...
>    Is it possible to use a 'bytes' regex? re.compile(b"x") raises an exception
>
> --
> Victor Stinner aka haypo
> http://hachoir.org/


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From victor.stinner at haypocalc.com  Thu Aug  9 04:27:19 2007
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Thu, 9 Aug 2007 04:27:19 +0200
Subject: [Python-3000] bytes regular expression?
Message-ID: <200708090427.19830.victor.stinner@haypocalc.com>

Hi,

Since Python 3000 regular expressions are now Unicode by default, how can I 
use a bytes regex? Very simplified example of my problem:
  import re
  print( re.sub("(.)", b"[\\1]", b'abc') )

This code fails with exception:
  File "(...)/py3k-struni/Lib/re.py", line 241, in _compile_repl
     p = _cache_repl.get(key)
  TypeError: unhashable type: 'bytes'

Does a "frozen bytes" (immutable) type exist, so that a cache can be used?

Victor Stinner aka haypo
http://hachoir.org/
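
For reference, a sketch of the same substitution in Python 3 as later
released, where bytes is immutable and hashable and the re module
accepts bytes patterns. Note that the pattern, the replacement and the
subject all have to be bytes; the py3k-struni behaviour reported above
was different.

```python
import re

# Pattern, replacement and subject are all bytes.
result = re.sub(b"(.)", b"[\\1]", b"abc")

# Mixing a str pattern with a bytes subject is rejected with TypeError.
try:
    re.sub("(.)", "[\\1]", b"abc")
    mixed_ok = True
except TypeError:
    mixed_ok = False
```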

From skip at pobox.com  Thu Aug  9 04:55:26 2007
From: skip at pobox.com (skip at pobox.com)
Date: Wed, 8 Aug 2007 21:55:26 -0500
Subject: [Python-3000] C API cleanup str
In-Reply-To: <ee2a432c0708081452k8b874e7j8547f36bb908fc23@mail.gmail.com>
References: <ee2a432c0708022301m3dbf7a7tfe7378336fe0cefb@mail.gmail.com>
	<46B2C8E0.8080409@canterbury.ac.nz>
	<ee2a432c0708022331q768f3cbbu527aad3359ed9552@mail.gmail.com>
	<46B5C47B.5090703@v.loewis.de>
	<ca471dc20708050808o7dfcb8c4pe59acff12b57c8f9@mail.gmail.com>
	<46B5F136.4010502@v.loewis.de>
	<ca471dc20708050859g3e3178d7ncfc6209e9ea05708@mail.gmail.com>
	<46B5FA11.5040404@v.loewis.de>
	<ee2a432c0708081452k8b874e7j8547f36bb908fc23@mail.gmail.com>
Message-ID: <18106.33310.386717.634156@montanaro.dyndns.org>


    Neal> That's already been done for a while:
    Neal> http://docs.python.org/dev/3.0/

Cool.  If the Google Sprint Chicago Edition becomes a reality (we've been
discussing it on chicago at python.org) and I get to go I think I will probably
devote much of my time to documentation.

Skip

From guido at python.org  Thu Aug  9 06:07:12 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 8 Aug 2007 21:07:12 -0700
Subject: [Python-3000] bytes regular expression?
In-Reply-To: <200708090427.19830.victor.stinner@haypocalc.com>
References: <200708090427.19830.victor.stinner@haypocalc.com>
Message-ID: <ca471dc20708082107q640c70ccjf535b72f1744ae2e@mail.gmail.com>

A quick temporary hack is to use buffer(b'abc') instead. (buffer() is
so incredibly broken that it lets you hash() even if the underlying
object is broken. :-)

The correct solution is to fix the re library to avoid using hash()
directly on the underlying data type altogether; that never had sound
semantics (as proven by the buffer() hack above).

--Guido

On 8/8/07, Victor Stinner <victor.stinner at haypocalc.com> wrote:
> Hi,
>
> Since Python 3000 regular expressions are now Unicode by default, how can I
> use a bytes regex? Very simplified example of my problem:
>   import re
>   print( re.sub("(.)", b"[\\1]", b'abc') )
>
> This code fails with exception:
>   File "(...)/py3k-struni/Lib/re.py", line 241, in _compile_repl
>      p = _cache_repl.get(key)
>   TypeError: unhashable type: 'bytes'
>
> Does a "frozen bytes" (immutable) type exist, so that a cache can be used?
>
> Victor Stinner aka haypo
> http://hachoir.org/


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From g.brandl at gmx.net  Thu Aug  9 08:06:32 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Thu, 09 Aug 2007 08:06:32 +0200
Subject: [Python-3000] C API cleanup str
In-Reply-To: <18106.33310.386717.634156@montanaro.dyndns.org>
References: <ee2a432c0708022301m3dbf7a7tfe7378336fe0cefb@mail.gmail.com>	<46B2C8E0.8080409@canterbury.ac.nz>	<ee2a432c0708022331q768f3cbbu527aad3359ed9552@mail.gmail.com>	<46B5C47B.5090703@v.loewis.de>	<ca471dc20708050808o7dfcb8c4pe59acff12b57c8f9@mail.gmail.com>	<46B5F136.4010502@v.loewis.de>	<ca471dc20708050859g3e3178d7ncfc6209e9ea05708@mail.gmail.com>	<46B5FA11.5040404@v.loewis.de>	<ee2a432c0708081452k8b874e7j8547f36bb908fc23@mail.gmail.com>
	<18106.33310.386717.634156@montanaro.dyndns.org>
Message-ID: <f9east$814$1@sea.gmane.org>

skip at pobox.com schrieb:
>     Neal> That's already been done for a while:
>     Neal> http://docs.python.org/dev/3.0/
> 
> Cool.  If the Google Sprint Chicago Edition becomes a reality (we've been
> discussing it on chicago at python.org) and I get to go I think I will probably
> devote much of my time to documentation.

When will that be? I think we should then switch over to the reST tree
before you start.

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From brett at python.org  Thu Aug  9 08:31:08 2007
From: brett at python.org (Brett Cannon)
Date: Wed, 8 Aug 2007 23:31:08 -0700
Subject: [Python-3000] C API cleanup str
In-Reply-To: <f9east$814$1@sea.gmane.org>
References: <ee2a432c0708022301m3dbf7a7tfe7378336fe0cefb@mail.gmail.com>
	<ee2a432c0708022331q768f3cbbu527aad3359ed9552@mail.gmail.com>
	<46B5C47B.5090703@v.loewis.de>
	<ca471dc20708050808o7dfcb8c4pe59acff12b57c8f9@mail.gmail.com>
	<46B5F136.4010502@v.loewis.de>
	<ca471dc20708050859g3e3178d7ncfc6209e9ea05708@mail.gmail.com>
	<46B5FA11.5040404@v.loewis.de>
	<ee2a432c0708081452k8b874e7j8547f36bb908fc23@mail.gmail.com>
	<18106.33310.386717.634156@montanaro.dyndns.org>
	<f9east$814$1@sea.gmane.org>
Message-ID: <bbaeab100708082331w20f0b456j7b0e9f68c3a8f6e7@mail.gmail.com>

On 8/8/07, Georg Brandl <g.brandl at gmx.net> wrote:
> skip at pobox.com schrieb:
> >     Neal> That's already been done for a while:
> >     Neal> http://docs.python.org/dev/3.0/
> >
> > Cool.  If the Google Sprint Chicago Edition becomes a reality (we've been
> > discussing it on chicago at python.org) and I get to go I think I will probably
> > devote much of my time to documentation.
>
> When will that be? I think we should then switch over to the reST tree
> before you start.

Aug 22-25: http://wiki.python.org/moin/GoogleSprint?highlight=%28googlesprint%29
.

-Brett

From talex5 at gmail.com  Wed Aug  8 20:27:56 2007
From: talex5 at gmail.com (Thomas Leonard)
Date: Wed, 8 Aug 2007 18:27:56 +0000 (UTC)
Subject: [Python-3000] Binary compatibility
References: <cd53a0140708061233v3776e9d5m40f39f3a022bb76d@mail.gmail.com>
	<46B7EC3E.3070802@v.loewis.de>
	<cd53a0140708070015j7f944ea5qca963da2684d5e86@mail.gmail.com>
	<46B95895.20705@v.loewis.de>
Message-ID: <f9d1vc$qmf$1@sea.gmane.org>

On Wed, 08 Aug 2007 07:45:57 +0200, Martin v. Löwis wrote:

>>> Now, you seem to talk about different *Linux* systems. On Linux,
>>> use UCS-4.
>> 
>> Yes, that's what we want. But Python 2.5 defaults to UCS-2 (at least
>> last time I tried), while many distros have used UCS-4. If Linux
>> always used UCS-4, that would be fine, but currently there's no
>> guarantee of that.
> 
> I see why a guarantee would help, but I don't think it's necessary.
> Just provide UCS-4 binaries only on Linux, and when somebody complains,
> tell them to recompile Python, or to recompile your software themselves.

Won't recompiling Python break every other Python program on their system,
though? (e.g. anything that itself uses a C Python library)

Also, anything involving recompiling isn't exactly user friendly... we
might give Linux a bad name!

> The defaults in 2.5.x cannot be changed anymore. The defaults could
> be changed for Linux in 2.6, but then the question is: why just for
> Linux?

Are there different Windows python binaries around with different UCS-2/4
settings? If so, I'd imagine that would be a problem too, although as I
say we don't have many Windows users for ROX.

BTW, none of this is urgent. We experimented with Python/C hybrids in the
past. It didn't work, so we carried on using pure C programs for anything
that needed any part in C. So it's not causing actual problems for users
right now. It would just be nice to have it sorted out one day, so we
could use Python more in the future.


-- 
Dr Thomas Leonard		http://rox.sourceforge.net
GPG: 9242 9807 C985 3C07 44A6  8B9A AE07 8280 59A5 3CC1


From talin at acm.org  Thu Aug  9 09:31:37 2007
From: talin at acm.org (Talin)
Date: Thu, 09 Aug 2007 00:31:37 -0700
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B8D58E.5040501@ronadam.com>
References: <46B13ADE.7080901@acm.org>
	<46B2D147.90606@acm.org>	<46B2E265.5080905@ronadam.com>
	<46B35295.1030007@acm.org>	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>	<46B39BBF.80809@ronadam.com>
	<46B46DE1.1090403@canterbury.ac.nz>	<46B4950F.40905@ronadam.com>
	<46B4A9A0.9070206@ronadam.com>	<46B52422.2090006@canterbury.ac.nz>	<ca471dc20708041909j3f86b107hace9407673c302ea@mail.gmail.com>	<46B568F3.9060105@ronadam.com>
	<46B66BE0.7090005@canterbury.ac.nz>	<46B6851C.1030204@ronadam.com>
	<46B6C219.4040900@canterbury.ac.nz>
	<46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org>
	<46B8D58E.5040501@ronadam.com>
Message-ID: <46BAC2D9.2020902@acm.org>

Ron Adam wrote:
> Talin wrote:
>> Ron Adam wrote:
>>> Now here's the problem with all of this.  As we add the widths back 
>>> into the format specifications, we are basically saying the idea of a 
>>> separate field width specifier is wrong.
>>>
>>> So maybe it's not really a separate independent thing after all, and 
>>> it's just a convenient grouping for readability purposes only.
>>
>> I'm beginning to suspect that this is indeed the case.
> 
> Yes, I believe so even more after experimenting last night with 
> specifier objects.
> 
> For now I'm using ','s for separating *all* the terms.  I don't intend 
> that should be used for a final version, but for now it makes parsing 
> the terms and getting the behavior right much easier.
> 
>      f,.3,>7     right justify in field width 7, with 3 decimal places.
> 
>      s,^10,w20   Center in field 10,  expands up to width 20.
> 
>      f,.3,%
> 
> This allows me to just split on ',' and experiment with ordering and see 
> how some terms might need to interact with other terms and how to do 
> that without having to fight the syntax problem for now.
> 
> Later the syntax can be compressed and tested with a fairly complete 
> doctest as a separate problem.

When you get a chance, can you write down your current thinking in a 
single document? Right now, there are lots of suggestions scattered in a 
bunch of different messages, some of which have been superseded, and 
it's hard to sew them together.

At this point, I think that as far as the mini-language goes, after 
wandering far afield from the original PEP we have arrived at a design 
that's not very far - at least semantically - from what we started with. 
In other words, other than the special case of 'repr', we find that 
pretty much everything can fit into a single specifier string; Attempts 
to break it up into two independent specifiers that are handled by two 
different entities run into the problem that the specifiers aren't 
independent and there are interactions between the two. Because the 
dividing line between "format specifier" and "alignment specifier" 
changes based on the type of data being formatted, trying to keep them 
separate results in redundancy and duplication, where we end up with 
more than one way to specify padding, alignment, or minimum width.

So I'm tempted to just use what's in the PEP now as a starting point - 
perhaps re-arranging the order of attributes, as has been discussed, or 
perhaps not - and then handling 'repr' via a different prefix character 
other than ':'. The 'repr' flag does nothing more than call __repr__ on 
the object, and then call __format__ on the result using whatever 
conversion spec was specified. (There might be a similar flag that does 
a call to __str__, which has the effect of calling str.__format__ 
instead of the object's native __format__ function.)

As far as requiring the different built-in versions of __format__ to 
have to parse the standard conversion specifier, that is not a problem 
in practice, as we'll have a little mini-parser that parses the 
conversion spec and fills in a C struct. There will also be a 
Python-accessible version of the same thing for people extending 
formatters in Python.

So, the current action items are:

1) Get consensus on the syntax of the formatting mini-language.

2) Create a pure-python implementation of the global 'format' function, 
which will be a new standard library function that formats a single 
value, given a conversion spec:

    format(value, conversion)

3) Write implementations of str.__format__, int.__format__, 
float.__format__, decimal.__format__ and so on.

4) Create C implementations of the above.

5) Write the code for complex, multi-value formatting as specified in 
the PEP, and hook up to the built-in string class.

-- Talin
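
To make items 2 and 3 concrete, here is a sketch using the format()
built-in and the __format__ hook as they exist in Python 3 as later
released. The spec syntax shown is the released one, which may differ
from the variants being debated in this thread; the Celsius class is a
hypothetical example.

```python
class Celsius:
    """Hypothetical type that reuses float's format specs."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __repr__(self):
        return "Celsius(%r)" % self.degrees

    def __format__(self, spec):
        # Delegate the spec to float, then append a unit suffix.
        return format(self.degrees, spec) + " C"


right = format(3.14159, ">7.3f")          # width 7, 3 decimals, right-justified
centred = format("abc", "^10")            # centred in a field of width 10
custom = "{:.1f}".format(Celsius(21.456)) # goes through Celsius.__format__
as_repr = "{!r:>16}".format(Celsius(2))   # repr() first, then str formatting
```

The "!r" prefix corresponds to the 'repr' flag described above: the
object's own __format__ is bypassed and the spec is applied to the
string returned by __repr__.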

From martin at v.loewis.de  Thu Aug  9 10:30:53 2007
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Thu, 09 Aug 2007 10:30:53 +0200
Subject: [Python-3000] Binary compatibility
In-Reply-To: <f9d1vc$qmf$1@sea.gmane.org>
References: <cd53a0140708061233v3776e9d5m40f39f3a022bb76d@mail.gmail.com>	<46B7EC3E.3070802@v.loewis.de>	<cd53a0140708070015j7f944ea5qca963da2684d5e86@mail.gmail.com>	<46B95895.20705@v.loewis.de>
	<f9d1vc$qmf$1@sea.gmane.org>
Message-ID: <46BAD0BD.7040905@v.loewis.de>

>> I see why a guarantee would help, but I don't think it's necessary.
>> Just provide UCS-4 binaries only on Linux, and when somebody complains,
>> tell them to recompile Python, or to recompile your software themselves.
> 
> Won't recompiling Python break every other Python program on their system,
> though? (e.g. anything that itself uses a C Python library)

It depends. If they use their own Python installation, just tell them
to use the vendor-supplied one instead - it likely is UCS-4.

If the vendor-supplied one is UCS-2, talk to the vendor. For the user,
tell them to make a separate installation (e.g. in /usr/local/bin).
This won't interfere with the existing installation.

> Also, anything involving recompiling isn't exactly user friendly... we
> might give Linux a bad name!

Hmm. Some think that Linux has become worse since people stopped
compiling the kernel themselves.

>> The defaults in 2.5.x cannot be changed anymore. The defaults could
>> be changed for Linux in 2.6, but then the question is: why just for
>> Linux?
> 
> Are there different Windows python binaries around with different UCS-2/4
> settings?

No. The definition of Py_UNICODE on Windows is mandated by the operating
system, whose Unicode APIs use two-byte characters, and Python wants to
use them.

> BTW, none of this is urgent. We experimented with Python/C hybrids in the
> past. It didn't work, so we carried on using pure C programs for anything
> that needed any part in C. So it's not causing actual problems for users
> right now. It would just be nice to have it sorted out one day, so we
> could use Python more in the future.

I can understand the concern, but I believe it is fairly theoretical.
Most distributions do use UCS-4 these days, so the problem went away
by de-facto standardization, not by de-jure standardization.
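
For checking which kind of build a given interpreter is, sys.maxunicode
distinguishes the two. A small sketch (unicode_build is a hypothetical
helper name; on Python 3.3 and later the narrow/wide distinction is gone
and this always reports UCS-4):

```python
import sys


def unicode_build():
    """Return 'UCS-2' or 'UCS-4' depending on the interpreter build."""
    # sys.maxunicode is 0xFFFF on narrow (UCS-2) builds and 0x10FFFF on
    # wide (UCS-4) builds.
    return "UCS-2" if sys.maxunicode == 0xFFFF else "UCS-4"
```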

Regards,
Martin


From rrr at ronadam.com  Thu Aug  9 12:42:56 2007
From: rrr at ronadam.com (Ron Adam)
Date: Thu, 09 Aug 2007 05:42:56 -0500
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46BAC2D9.2020902@acm.org>
References: <46B13ADE.7080901@acm.org>
	<46B2D147.90606@acm.org>	<46B2E265.5080905@ronadam.com>
	<46B35295.1030007@acm.org>	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>	<46B39BBF.80809@ronadam.com>
	<46B46DE1.1090403@canterbury.ac.nz>	<46B4950F.40905@ronadam.com>
	<46B4A9A0.9070206@ronadam.com>	<46B52422.2090006@canterbury.ac.nz>	<ca471dc20708041909j3f86b107hace9407673c302ea@mail.gmail.com>	<46B568F3.9060105@ronadam.com>
	<46B66BE0.7090005@canterbury.ac.nz>	<46B6851C.1030204@ronadam.com>
	<46B6C219.4040900@canterbury.ac.nz>
	<46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org>
	<46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org>
Message-ID: <46BAEFB0.9050400@ronadam.com>



Talin wrote:
> Ron Adam wrote:
>> Talin wrote:
>>> Ron Adam wrote:
>>>> Now here's the problem with all of this.  As we add the widths back 
>>>> into the format specifications, we are basically saying the idea of 
>>>> a separate field width specifier is wrong.
>>>>
>>>> So maybe it's not really a separate independent thing after all, and 
>>>> it is just a convenient grouping for readability purposes only.
>>>
>>> I'm beginning to suspect that this is indeed the case.
>>
>> Yes, I believe so even more after experimenting last night with 
>> specifier objects.
>>
>> For now I'm using ','s for separating *all* the terms.  I don't intend 
>> that should be used for a final version, but for now it makes parsing 
>> the terms and getting the behavior right much easier.
>>
>>      f,.3,>7     right justify in field width 7, with 3 decimal places.
>>
>>      s,^10,w20   Center in field 10,  expands up to width 20.
>>
>>      f,.3,%
>>
>> This allows me to just split on ',' and experiment with ordering and 
>> see how some terms might need to interact with other terms and how to 
>> do that without having to fight the syntax problem for now.
>>
>> Later the syntax can be compressed and tested with a fairly complete 
>> doctest as a separate problem.
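
A rough sketch of the split-on-',' approach described above. It only
handles a type code, a '.N' precision term and a '<N', '>N' or '^N'
alignment term; the 'wN' and '%' terms from the examples are left out,
and the function name apply_terms is hypothetical. This is a guess at the
behavior, not the experimental code itself:

```python
def apply_terms(spec, value):
    """Apply a comma-separated spec such as 'f,.3,>7' to a single value."""
    terms = spec.split(",")
    type_code, rest = terms[0], terms[1:]
    precision = ""
    align, width = "", 0
    for term in rest:
        if term.startswith("."):
            precision = term                  # e.g. '.3'
        elif term and term[0] in "<>^":
            align, width = term[0], int(term[1:])
        else:
            raise ValueError("unsupported term: %r" % term)
    # First convert the value, then place the text in its field.
    text = format(value, precision + type_code)
    if align:
        text = format(text, "%s%d" % (align, width))
    return text
```

For example, apply_terms('f,.3,>7', 3.14159) converts with three decimal
places first and only then right-justifies the result in seven columns.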
> 
> When you get a chance, can you write down your current thinking in a 
> single document? Right now, there are lots of suggestions scattered in a 
> bunch of different messages, some of which have been superseded, and 
> it's hard to sew them together.

I'll see what I can come up with.  But I think you pretty much covered it 
below.

> At this point, I think that as far as the mini-language goes, after 
> wandering far afield from the original PEP we have arrived at a design 
> that's not very far - at least semantically - from what we started with. 

Yes, I agree.

> In other words, other than the special case of 'repr', we find that 
> pretty much everything can fit into a single specifier string; Attempts 
> to break it up into two independent specifiers that are handled by two 
> different entities run into the problem that the specifiers aren't 
> independent and there are interactions between the two. Because the 
> dividing line between "format specifier" and "alignment specifier" 
> changes based on the type of data being formatted, trying to keep them 
> separate results in redundancy and duplication, where we end up with 
> more than one way to specify padding, alignment, or minimum width.

Yes.

Another deciding factor is whether or not users want a general formatting 
language that is very flexible, and allows them to combine and order 
instructions to do a wide variety of things.  Some of which may not make 
much sense.  (Just like you can create regular expressions that don't make 
sense.)

Or do they want an option based system that limits what they can do to a 
set of well defined behaviors?

It seems having well defined behaviors (limited to things that make sense.) 
is preferred.

(Although I prefer the former myself.)


> So I'm tempted to just use what's in the PEP now as a starting point - 
> perhaps re-arranging the order of attributes, as has been discussed, or 
> perhaps not - and then handling 'repr' via a different prefix character 
> other than ':'. The 'repr' flag does nothing more than call __repr__ on 
> the object, and then call __format__ on the result using whatever 
> conversion spec was specified. (There might be a similar flag that does 
> a call to __str__, which has the effect of calling str.__format__ 
> instead of the object's native __format__ function.)

The way to think of 'repr' and 'str' is that of a general "object" format 
type/specifier.  That puts str and repr into the same context as the rest 
of the format types.  This is really a point of view issue and not so much 
of a semantic one.  I think {0:r} and {0:s} are to "object", as {0:d} and 
{0:e} are to "float" ...  just another relationship relative to the value 
being formatted.  So I don't understand the need to treat them differently.


> As far as requiring the different built-in versions of __format__ to 
> have to parse the standard conversion specifier, that is not a problem 
> in practice, as we'll have a little mini-parser that parses the 
> conversion spec and fills in a C struct. There will also be a 
> Python-accessible version of the same thing for people extending 
> formatters in Python.

This is not too far from what I was thinking then.

I'm not sure I can add much to that.

My current experimental implementation allows for pre-parsing a format 
string so the parsing step can be moved outside of a loop and doesn't have 
to be reparsed on each use, or it can be examined and possibly modified 
before applying it to arguments.

I'm not sure how useful that is, but instead of iterating a string and 
handling each item sequentially, it parses the whole string and all the 
format fields at one time, then formats all the arguments, then does a 
list.join() operation to combine them.  This may be faster in pure python, 
but probably slower in C.
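
To make the parse-once idea concrete, here is a sketch built on
string.Formatter.parse() from today's standard library, which did not
exist when this was written. The helper names preparse and render are
hypothetical, and only explicitly numbered positional fields such as {0}
are supported:

```python
import string


def preparse(template):
    """Parse the template once into (literal, field, spec, conversion)."""
    return list(string.Formatter().parse(template))


def render(parsed, *args):
    """Format args against a pre-parsed template, then join the pieces."""
    parts = []
    for literal, field, spec, conversion in parsed:
        parts.append(literal)
        if field is None:        # trailing literal text, no field to fill
            continue
        value = args[int(field)]
        if conversion == "r":
            value = repr(value)
        elif conversion == "s":
            value = str(value)
        parts.append(format(value, spec))
    return "".join(parts)
```

The parsed list can be built outside a loop, inspected or modified, and
then passed to render() for each set of arguments.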


> So, the current action items are:
> 
> 1) Get consensus on the syntax of the formatting mini-language.

Putting the syntax first can introduce side effects or limitations as a 
result of the syntax.  So this might be better as a later step.

By getting a consensus on the exact behaviors and then proceeding to the 
implementation, I think it will move things along faster.  While this is 
for the most part is in the pep, I think any loose ends on the behavior 
side should be nailed down completely before the final syntax is worked out.

Then we can find a syntax that works with the implementation, rather than 
try to make the implementation work with the syntax.


> 2) Create a pure-python implementation of the global 'format' function, 
> which will be a new standard library function that formats a single 
> value, given a conversion spec:
> 
>    format(value, conversion)
> 
> 3) Write implementations of str.__format__, int.__format__, 
> float.__format__, decimal.__format__ and so on.
> 
> 4) Create C implementations of the above.
> 
> 5) Write the code for complex, multi-value formatting as specified in 
> the PEP, and hook up to the built-in string class.

I think finishing up #2 and #3 should come first with very extensive tests. 
  (Using whatever syntax works for now.)

I've been going over the tests in the sand box trying to get my 
experimental version to pass them.  Once I get it to pass most of them I'll 
send you a copy.

BTW.. I noticed str.center() has an odd behavior of alternating uneven 
padding widths on odd- or even-length strings.  Is this intentional?

 >>> 'a'.center(2)
'a '
 >>> 'aa'.center(3)
' aa'
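
The two results above are consistent with a rule where the extra pad
character goes on the left only when both the padding and the total width
are odd. The sketch below is reconstructed from the observed outputs and
is an assumption, not a quotation of the C source; center_like is a
hypothetical name:

```python
def center_like(text, width, fill=" "):
    """Center text the way the examples above suggest (illustrative)."""
    pad = width - len(text)
    if pad <= 0:
        return text
    # Half the padding goes left; one extra goes left only when both
    # the padding and the requested width are odd.
    left = pad // 2 + (pad & width & 1)
    return fill * left + text + fill * (pad - left)
```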


Cheers,
    Ron

From nmm1 at cus.cam.ac.uk  Wed Aug  8 11:28:16 2007
From: nmm1 at cus.cam.ac.uk (Nick Maclaren)
Date: Wed, 08 Aug 2007 10:28:16 +0100
Subject: [Python-3000] Regular expressions, Unicode etc.
Message-ID: <E1IIhpw-0001jT-2W@virgo.cus.cam.ac.uk>

I have needed to push my stack to teach REs (don't ask), and am
taking a look at the RE code.  I may be able to extend it to support
RFE 694374 and (more importantly) atomic groups and possessive
quantifiers.  While I regard such things as revolting beyond belief,
they make a HELL of a difference to the efficiency of recognising
things like HTML tags in a morass of mixed text.

The other approach, which is to stick to true regular expressions,
and wholly or partially convert to DFAs, has already been rendered
impossible by even the limited Perl/PCRE extensions that Python
has adopted.

My first question is whether this would clash with any ongoing
work, including being superseded by any changes in Python 3000.

Note that I am NOT proposing to do a fixed task, but will produce
a proper proposal only when I know what I can achieve for a small
amount of work.  If the SRE engine turns out to be unsuitable to
extend in these ways, I shall quietly abandon the project.



My second one is about Unicode.  I really, but REALLY regard it as
a serious defect that there is no escape for printing characters.
Any code that checks arbitrary text is likely to need them - yes,
I know why Perl and hence PCRE doesn't have that, but let's skip
that.  That is easy to add, though choosing a letter is tricky.
Currently \c and \C, for 'character' (I would prefer 'text' or
'printable', but \t is obviously insane and \P is asking for
incompatibility with Perl and Java).

But attempting to rebuild the Unicode database hasn't worked.
Tools/unicode is, er, a trifle incomplete and out of date.  The
only file I need to change is Objects/unicodetype_db.h, but the
initial attempts to run Tools/unicode/makeunicodedata.py have not
been successful.

I may be able to reverse engineer the mechanism enough to get
the files off the Unicode site and run it, but I don't want to
spend forever on it.  Any clues?


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  nmm1 at cam.ac.uk
Tel.:  +44 1223 334761    Fax:  +44 1223 334679

From skip at pobox.com  Thu Aug  9 13:14:07 2007
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 9 Aug 2007 06:14:07 -0500
Subject: [Python-3000] C API cleanup str
In-Reply-To: <f9east$814$1@sea.gmane.org>
References: <ee2a432c0708022301m3dbf7a7tfe7378336fe0cefb@mail.gmail.com>
	<46B2C8E0.8080409@canterbury.ac.nz>
	<ee2a432c0708022331q768f3cbbu527aad3359ed9552@mail.gmail.com>
	<46B5C47B.5090703@v.loewis.de>
	<ca471dc20708050808o7dfcb8c4pe59acff12b57c8f9@mail.gmail.com>
	<46B5F136.4010502@v.loewis.de>
	<ca471dc20708050859g3e3178d7ncfc6209e9ea05708@mail.gmail.com>
	<46B5FA11.5040404@v.loewis.de>
	<ee2a432c0708081452k8b874e7j8547f36bb908fc23@mail.gmail.com>
	<18106.33310.386717.634156@montanaro.dyndns.org>
	<f9east$814$1@sea.gmane.org>
Message-ID: <18106.63231.371228.836379@montanaro.dyndns.org>


    Georg> When will that be? I think we should then switch over to the reST
    Georg> tree before you start.

Georg,

Can you remind me how to get at your new doc tree?

Skip

From guido at python.org  Thu Aug  9 15:57:58 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 9 Aug 2007 06:57:58 -0700
Subject: [Python-3000] Moving to a "py3k" branch *NOW*
Message-ID: <ca471dc20708090657w496ec055u8736de0ca5787b81@mail.gmail.com>

I am starting now. Please, no more checkins to either p3yk or py3k-struni.

On 8/8/07, Guido van Rossum <guido at python.org> wrote:
> I would like to move to a new branch soon for all Py3k development.
>
> I plan to name the branch "py3k".  It will be branched from
> py3k-struni.  I will do one last set of merges from the trunk via p3yk
> (note typo!) and py3k-struni, and then I will *delete* the old py3k
> and py3k-struni branches (you will still be able to access their last
> known good status by syncing back to a previous revision).  I will
> temporarily shut up some unit tests to avoid getting endless spam from
> Neal's buildbot.
>
> After the switch, you should be able to switch your workspaces to the
> new branch using the "svn switch" command.
>
> If anyone is in the middle of something that would become painful due
> to this changeover, let me know ASAP and I'll delay.
>
> I will send out another message when I start the move, and another
> when I finish it.
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Thu Aug  9 16:43:41 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 9 Aug 2007 07:43:41 -0700
Subject: [Python-3000] Move to a "py3k" branch *DONE*
Message-ID: <ca471dc20708090743k30562834uee2abd624f2f41b5@mail.gmail.com>

This is done. The new py3k branch is ready for business.

If you currently have the py3k-struni branch checked out (at its top
level), *don't update*, but issue the following commands:

  svn switch svn+ssh://pythondev at svn.python.org/python/branches/py3k
  svn update

Only a small amount of activity should result (unless you didn't svn
update for a long time).

For the p3yk branch, the same instructions will work, but the svn
update will update most of your tree. A "make clean" is recommended in
this case.

Left to do:

- update the wikis
- clean out the old branches
- switch the buildbot and the doc builder to use the new branch (Neal)

There are currently about 7 failing unit tests left:

test_bsddb
test_bsddb3
test_email
test_email_codecs
test_email_renamed
test_sqlite
test_urllib2_localnet

See http://wiki.python.org/moin/Py3kStrUniTests for detailed status
regarding these.

--Guido

On 8/9/07, Guido van Rossum <guido at python.org> wrote:
> I am starting now. Please, no more checkins to either p3yk or py3k-struni.
>
> On 8/8/07, Guido van Rossum <guido at python.org> wrote:
> > I would like to move to a new branch soon for all Py3k development.
> >
> > I plan to name the branch "py3k".  It will be branched from
> > py3k-struni.  I will do one last set of merges from the trunk via p3yk
> > (note typo!) and py3k-struni, and then I will *delete* the old py3k
> > and py3k-struni branches (you will still be able to access their last
> > known good status by syncing back to a previous revision).  I will
> > temporarily shut up some unit tests to avoid getting endless spam from
> > Neal's buildbot.
> >
> > After the switch, you should be able to switch your workspaces to the
> > new branch using the "svn switch" command.
> >
> > If anyone is in the middle of something that would become painful due
> > to this changeover, let me know ASAP and I'll delay.
> >
> > I will send out another message when I start the move, and another
> > when I finish it.
> >
> > --
> > --Guido van Rossum (home page: http://www.python.org/~guido/)
> >
>
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From victor.stinner at haypocalc.com  Thu Aug  9 17:40:58 2007
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Thu, 9 Aug 2007 17:40:58 +0200
Subject: [Python-3000] bytes regular expression?
In-Reply-To: <ca471dc20708082107q640c70ccjf535b72f1744ae2e@mail.gmail.com>
References: <200708090427.19830.victor.stinner@haypocalc.com>
	<ca471dc20708082107q640c70ccjf535b72f1744ae2e@mail.gmail.com>
Message-ID: <200708091740.59070.victor.stinner@haypocalc.com>

Hi,

On Thursday 09 August 2007 06:07:12 Guido van Rossum wrote:
> A quick temporary hack is to use buffer(b'abc') instead. (buffer() is
> so incredibly broken that it lets you hash() even if the underlying
> object is broken. :-)

I prefer str8 which looks to be a good candidate for "frozenbytes" type.

> The correct solution is to fix the re library to avoid using hash()
> directly on the underlying data type altogether; that never had sound
> semantics (as proven by the buffer() hack above).

re module uses a dictionary to store compiled expressions and the key is a 
tuple (pattern, flags) where pattern is a bytes (str8) or str and flags is an 
int.

re module bugs:
 1. _compile() doesn't support bytes
 2. escape() doesn't support bytes

My attached patch fixes both bugs:
 - convert bytes to str8 in _compile() to be able to hash it
 - add a special version of escape() for bytes

I don't know the best method to create a bytes in a for loop. In Python 2.x, the 
best method is to use a list() and ''.join(). Since bytes is mutable I 
chose to use append() and concatenation (a += b).
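
For illustration, a bytes-only escape() built with append() and += might
look like the sketch below. It is written against today's types
(bytearray as the mutable buffer, bytes as the result), the name
escape_bytes is hypothetical, and it is not the code from the attached
patch:

```python
import string

# Byte values (ints) of the ASCII letters and digits.
_ALPHANUM = frozenset((string.ascii_letters + string.digits).encode("ascii"))


def escape_bytes(pattern):
    """Backslash-escape every non-alphanumeric byte in pattern."""
    result = bytearray()
    for byte in pattern:            # iterating over bytes yields ints
        if byte in _ALPHANUM:
            result.append(byte)
        elif byte == 0:
            result += b"\\000"      # NUL is written as an octal escape
        else:
            result.append(0x5C)     # a backslash
            result.append(byte)
    return bytes(result)
```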

I also added new unit test for escape() function with bytes argument.

You may not apply my patch directly. I don't know Python 3000 very well nor 
Python coding style. But my patch should help to fix the bugs ;-)

-----

Why does the re module have code for Python < 2.2 (optional finditer() function)? Since 
the code is now specific to Python 3000, we should use new types like set 
(use a set for _alphanum instead of a dictionary) and functions like 
enumerate (in _escape for str block).

Victor Stinner
http://hachoir.org/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: py3k-struni-re.diff
Type: text/x-diff
Size: 3440 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070809/23e53b6a/attachment-0001.bin 

From jason.orendorff at gmail.com  Thu Aug  9 17:42:46 2007
From: jason.orendorff at gmail.com (Jason Orendorff)
Date: Thu, 9 Aug 2007 11:42:46 -0400
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46B7FACC.8030503@v.loewis.de>
References: <D3DD7578-D515-4F46-A3F3-B96A43AB1FBB@gmail.com>
	<B24E6FF9-3080-4494-A377-BE68B9949A52@gmail.com>
	<ca471dc20708061906w5957811evfc02cf9f04d7cfde@mail.gmail.com>
	<46B7EA06.5040106@v.loewis.de>
	<ca471dc20708062056w362a951ma705452868e41472@mail.gmail.com>
	<46B7FACC.8030503@v.loewis.de>
Message-ID: <bb8868b90708090842j27a8a238p588846b9fda80c7b@mail.gmail.com>

On 8/7/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> My concern is that people need to access existing databases. It's
> all fine that the code accessing them breaks, and that they have
> to actively port to Py3k. However, telling them that they have to
> represent the keys in their dbm disk files in a different manner
> might cause a revolt...

Too true.  Offhand, why not provide hooks for serializing and
deserializing keys?  The same for values, too.  People porting to
py3k could use those.  Besides, this thread makes it sound like
people usually write their own wrapper classes whenever they
use *dbm.  Hooks would help with that, or even eliminate the
need altogether.
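
A sketch of what such hooks could look like: a thin wrapper around any
dbm-style mapping that converts keys and values on the way in and out.
The class name HookedDB and its parameter names are hypothetical, and a
plain dict can stand in for the database when trying it out:

```python
class HookedDB:
    """Wrap a bytes-to-bytes mapping with key/value conversion hooks."""

    def __init__(self, db, dump_key, load_key, dump_value, load_value):
        self._db = db
        self._dump_key, self._load_key = dump_key, load_key
        self._dump_value, self._load_value = dump_value, load_value

    def __getitem__(self, key):
        return self._load_value(self._db[self._dump_key(key)])

    def __setitem__(self, key, value):
        self._db[self._dump_key(key)] = self._dump_value(value)

    def __delitem__(self, key):
        del self._db[self._dump_key(key)]

    def __contains__(self, key):
        return self._dump_key(key) in self._db

    def keys(self):
        # Deserialize the stored keys back into application-level keys.
        return [self._load_key(raw) for raw in self._db.keys()]
```

Existing files keep their on-disk representation; only the hooks decide
how application objects map to the stored bytes.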

-j

From jjb5 at cornell.edu  Thu Aug  9 18:11:16 2007
From: jjb5 at cornell.edu (Joel Bender)
Date: Thu, 09 Aug 2007 12:11:16 -0400
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <bb8868b90708090842j27a8a238p588846b9fda80c7b@mail.gmail.com>
References: <D3DD7578-D515-4F46-A3F3-B96A43AB1FBB@gmail.com>	<B24E6FF9-3080-4494-A377-BE68B9949A52@gmail.com>	<ca471dc20708061906w5957811evfc02cf9f04d7cfde@mail.gmail.com>	<46B7EA06.5040106@v.loewis.de>	<ca471dc20708062056w362a951ma705452868e41472@mail.gmail.com>	<46B7FACC.8030503@v.loewis.de>
	<bb8868b90708090842j27a8a238p588846b9fda80c7b@mail.gmail.com>
Message-ID: <46BB3CA4.9010904@cornell.edu>

Jason Orendorff wrote:

> Hooks would help with that, or even eliminate the need altogether.

IMHO, having a __bytes__ method would go a long way.


Joel

From kbk at shore.net  Thu Aug  9 18:21:58 2007
From: kbk at shore.net (Kurt B. Kaiser)
Date: Thu, 09 Aug 2007 12:21:58 -0400
Subject: [Python-3000] IDLE in new py3k
Message-ID: <87y7gkel09.fsf@hydra.hampton.thirdcreek.com>


After a clean checkout in py3k, IDLE fails even w/o subprocess...

trader ~/PYDOTORG/projects/python/branches/py3k$ ./python Lib/idlelib/idle.py
Fatal Python error: PyEval_SaveThread: NULL tstate
Aborted

trader ~/PYDOTORG/projects/python/branches/py3k$ ./python
Python 3.0x (py3k:56858, Aug  9 2007, 12:09:06) 
[GCC 4.1.2 20061027 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 

trader ~/PYDOTORG/projects/python/branches/py3k$ ./python Lib/idlelib/idle.py -n
Fatal Python error: PyEval_SaveThread: NULL tstate
Aborted

Any ideas on where to look?
-- 
KBK

From jason.orendorff at gmail.com  Thu Aug  9 18:34:11 2007
From: jason.orendorff at gmail.com (Jason Orendorff)
Date: Thu, 9 Aug 2007 12:34:11 -0400
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46BB3CA4.9010904@cornell.edu>
References: <D3DD7578-D515-4F46-A3F3-B96A43AB1FBB@gmail.com>
	<B24E6FF9-3080-4494-A377-BE68B9949A52@gmail.com>
	<ca471dc20708061906w5957811evfc02cf9f04d7cfde@mail.gmail.com>
	<46B7EA06.5040106@v.loewis.de>
	<ca471dc20708062056w362a951ma705452868e41472@mail.gmail.com>
	<46B7FACC.8030503@v.loewis.de>
	<bb8868b90708090842j27a8a238p588846b9fda80c7b@mail.gmail.com>
	<46BB3CA4.9010904@cornell.edu>
Message-ID: <bb8868b90708090934h5a43d1ccvd4de91fc6faf105a@mail.gmail.com>

On 8/9/07, Joel Bender <jjb5 at cornell.edu> wrote:
> Jason Orendorff wrote:
> > Hooks would help with that, or even eliminate the need altogether.
>
> IMHO, having a __bytes__ method would go a long way.

Well, it would go halfway--you also need to deserialize.
__bytes__ alone would be useless.

Of course Python does have a library for serializing and
deserializing practically anything: pickle.  What I proposed
is a generalization of shelve.

  http://docs.python.org/lib/module-shelve.html

-j

From guido at python.org  Thu Aug  9 18:36:27 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 9 Aug 2007 09:36:27 -0700
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46BB3CA4.9010904@cornell.edu>
References: <D3DD7578-D515-4F46-A3F3-B96A43AB1FBB@gmail.com>
	<B24E6FF9-3080-4494-A377-BE68B9949A52@gmail.com>
	<ca471dc20708061906w5957811evfc02cf9f04d7cfde@mail.gmail.com>
	<46B7EA06.5040106@v.loewis.de>
	<ca471dc20708062056w362a951ma705452868e41472@mail.gmail.com>
	<46B7FACC.8030503@v.loewis.de>
	<bb8868b90708090842j27a8a238p588846b9fda80c7b@mail.gmail.com>
	<46BB3CA4.9010904@cornell.edu>
Message-ID: <ca471dc20708090936o7f74815fi15d2880ae4203ffe@mail.gmail.com>

On 8/9/07, Joel Bender <jjb5 at cornell.edu> wrote:
> Jason Orendorff wrote:
>
> > Hooks would help with that, or even eliminate the need altogether.
>
> IMHO, having a __bytes__ method would go a long way.

I've heard this before, but there are many different, equally
attractive ways to serialize objects to bytes (e.g. marshal, pickle,
repr + encode, etc.).

Bytes are not the same as strings, and you need to think differently about them.
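
To illustrate the point, here are the three routes mentioned above
applied to the same object. The helper name serializations is
hypothetical, and UTF-8 is an arbitrary choice for the repr + encode
case:

```python
import marshal
import pickle


def serializations(value):
    """Return several different byte representations of one object."""
    return {
        "marshal": marshal.dumps(value),
        "pickle": pickle.dumps(value),
        "repr+encode": repr(value).encode("utf-8"),
    }
```

All three results are bytes, all three differ, and none is obviously the
one a generic __bytes__ method ought to return.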

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From victor.stinner at haypocalc.com  Thu Aug  9 18:38:00 2007
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Thu, 9 Aug 2007 18:38:00 +0200
Subject: [Python-3000] bytes regular expression?
In-Reply-To: <200708091740.59070.victor.stinner@haypocalc.com>
References: <200708090427.19830.victor.stinner@haypocalc.com>
	<ca471dc20708082107q640c70ccjf535b72f1744ae2e@mail.gmail.com>
	<200708091740.59070.victor.stinner@haypocalc.com>
Message-ID: <200708091838.00298.victor.stinner@haypocalc.com>

On Thursday 09 August 2007 17:40:58 I wrote:
> My attached patch fix both bugs:
>  - convert bytes to str8 in _compile() to be able to hash it
>  - add a special version of escape() for bytes

My first try was buggy for this code snippet:
   import re
   assert type(re.sub(b'', b'', b'')) is bytes
   assert type(re.sub(b'(x)', b'[\\1]', b'x')) is bytes

My first patch mixes bytes and str8 and so re.sub fails in some cases.

So here is a new patch using str8 in dictionary key and str in regex parsing 
(sre_parse.py) (and then reconvert to bytes for 'literals' variable).

Victor Stinner
http://hachoir.org/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: py3k-struni-re2.diff
Type: text/x-diff
Size: 5398 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070809/7ff5e743/attachment.bin 

From guido at python.org  Thu Aug  9 18:44:58 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 9 Aug 2007 09:44:58 -0700
Subject: [Python-3000] IDLE in new py3k
In-Reply-To: <87y7gkel09.fsf@hydra.hampton.thirdcreek.com>
References: <87y7gkel09.fsf@hydra.hampton.thirdcreek.com>
Message-ID: <ca471dc20708090944i32deb00eu55ffe53b28a8d95f@mail.gmail.com>

On 8/9/07, Kurt B. Kaiser <kbk at shore.net> wrote:
>
> After a clean checkout in py3k, IDLE fails even w/o subprocess...
>
> trader ~/PYDOTORG/projects/python/branches/py3k$ ./python Lib/idlelib/idle.py
> Fatal Python error: PyEval_SaveThread: NULL tstate
> Aborted
>
> trader ~/PYDOTORG/projects/python/branches/py3k$ ./python
> Python 3.0x (py3k:56858, Aug  9 2007, 12:09:06)
> [GCC 4.1.2 20061027 (prerelease)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>>
>
> trader ~/PYDOTORG/projects/python/branches/py3k$ ./python Lib/idlelib/idle.py -n
> Fatal Python error: PyEval_SaveThread: NULL tstate
> Aborted

So it does. :-(

> Any ideas on where to look?

No, but I'll see if I can find anything with gdb.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jarausch at igpm.rwth-aachen.de  Thu Aug  9 18:38:18 2007
From: jarausch at igpm.rwth-aachen.de (Helmut Jarausch)
Date: Thu, 09 Aug 2007 18:38:18 +0200 (CEST)
Subject: [Python-3000] idle3.0 - is is supposed to work?
Message-ID: <tkrat.1e8187685ad86255@igpm.rwth-aachen.de>

Hi,

probably, I am too impatient.
I've just installed py3k (the new branch).
Trying 
idle3.0

I get

Traceback (most recent call last):
  File "/usr/local/bin/idle3.0", line 3, in <module>
    from idlelib.PyShell import main
  File "/usr/local/lib/python3.0/idlelib/PyShell.py", line 26, in <module>
    from .EditorWindow import EditorWindow, fixwordbreaks
  File "/usr/local/lib/python3.0/idlelib/EditorWindow.py", line 16, in <module>
    from . import GrepDialog
  File "/usr/local/lib/python3.0/idlelib/GrepDialog.py", line 5, in <module>
    import SearchEngine
ImportError: No module named SearchEngine

Thanks all of you for improving Python even more,

Helmut Jarausch

Lehrstuhl fuer Numerische Mathematik
RWTH - Aachen University
D 52056 Aachen, Germany

From theller at ctypes.org  Thu Aug  9 19:21:27 2007
From: theller at ctypes.org (Thomas Heller)
Date: Thu, 09 Aug 2007 19:21:27 +0200
Subject: [Python-3000] bytes regular expression?
In-Reply-To: <200708091740.59070.victor.stinner@haypocalc.com>
References: <200708090427.19830.victor.stinner@haypocalc.com>	<ca471dc20708082107q640c70ccjf535b72f1744ae2e@mail.gmail.com>
	<200708091740.59070.victor.stinner@haypocalc.com>
Message-ID: <f9fien$ccf$1@sea.gmane.org>

Victor Stinner wrote:
> 
> I prefer str8 which looks to be a good candidate for "frozenbytes" type.
> 

I love this idea!  Leave str8 as it is, maybe extend Python so that it understands
the s"..." literals and we are done.

Thomas


From kbk at shore.net  Thu Aug  9 19:22:07 2007
From: kbk at shore.net (Kurt B. Kaiser)
Date: Thu, 09 Aug 2007 13:22:07 -0400
Subject: [Python-3000] idle3.0 - is is supposed to work?
In-Reply-To: <tkrat.1e8187685ad86255@igpm.rwth-aachen.de> (Helmut Jarausch's
	message of "Thu, 09 Aug 2007 18:38:18 +0200 (CEST)")
References: <tkrat.1e8187685ad86255@igpm.rwth-aachen.de>
Message-ID: <87tzr8ei80.fsf@hydra.hampton.thirdcreek.com>

Helmut Jarausch <jarausch at igpm.rwth-aachen.de> writes:

> probably, I am too impatient.
> I've just installed py3k (the new branch).
> Trying 
> idle3.0
>
> I get
>
> Traceback (most recent call last):
>   File "/usr/local/bin/idle3.0", line 3, in <module>
>     from idlelib.PyShell import main
>   File "/usr/local/lib/python3.0/idlelib/PyShell.py", line 26, in <module>
>     from .EditorWindow import EditorWindow, fixwordbreaks
>   File "/usr/local/lib/python3.0/idlelib/EditorWindow.py", line 16, in <module>
>     from . import GrepDialog
>   File "/usr/local/lib/python3.0/idlelib/GrepDialog.py", line 5, in <module>
>     import SearchEngine
> ImportError: No module named SearchEngine

I just checked in a fix - GrepDialog.py wasn't using relative imports.
I'm not sure why you hit this exception and I didn't.  Probably a sys.path
difference.

Try again.

What platform are you using?  On

Linux trader 2.6.18-ARCH #1 SMP PREEMPT Sun Nov 19 09:14:35 CET 2006 i686 Intel(R) Pentium(R) 4 CPU 2.40GHz GenuineIntel GNU/Linux

(and whatever GvR is running)  IDLE isn't starting at all in py3k.

-- 
KBK

From guido at python.org  Thu Aug  9 19:31:18 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 9 Aug 2007 10:31:18 -0700
Subject: [Python-3000] idle3.0 - is is supposed to work?
In-Reply-To: <87tzr8ei80.fsf@hydra.hampton.thirdcreek.com>
References: <tkrat.1e8187685ad86255@igpm.rwth-aachen.de>
	<87tzr8ei80.fsf@hydra.hampton.thirdcreek.com>
Message-ID: <ca471dc20708091031k3ed03d8ap6c211706c4f19f7b@mail.gmail.com>

On 8/9/07, Kurt B. Kaiser <kbk at shore.net> wrote:
> What platform are you using?  On
>
> Linux trader 2.6.18-ARCH #1 SMP PREEMPT Sun Nov 19 09:14:35 CET 2006 i686 Intel(R) Pentium(R) 4 CPU 2.40GHz GenuineIntel GNU/Linux
>
> (and whatever GvR is running)  IDLE isn't starting at all in py3k.

I get the same failure on OSX (PPC) and on Linux (x86 Ubuntu). It has
to do with the Tcl/Tk wrapping code, in particular it's in the
LEAVE_PYTHON macro on line 1995 on _tkinter.c.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From theller at ctypes.org  Thu Aug  9 19:40:12 2007
From: theller at ctypes.org (Thomas Heller)
Date: Thu, 09 Aug 2007 19:40:12 +0200
Subject: [Python-3000] Pleaswe help with the countdown to zero failing
 tests in the struni branch!
In-Reply-To: <46B96B0D.6080605@v.loewis.de>
References: <ca471dc20708061655w62420da3m1d87ffa5eff669c6@mail.gmail.com>	<f99n87$8or$1@sea.gmane.org>	<ca471dc20708070736t6797477bw6667e0608bccb316@mail.gmail.com>	<f9a4c3$1ef$1@sea.gmane.org>	<ca471dc20708070955k4c734bcbge8efd08e6b001af8@mail.gmail.com>
	<46B96B0D.6080605@v.loewis.de>
Message-ID: <f9fjhs$gqa$1@sea.gmane.org>

Martin v. Löwis wrote:
>>> It's in Modules/timemodule.c, line 691:
>>>         PyModule_AddObject(m, "tzname",
>>>                            Py_BuildValue("(zz)", tzname[0], tzname[1]));
>>>
>>> According to MSDN, tzname is a global variable; the contents is somehow
>>> derived from the TZ environment variable (which is not set in my case).
>> 
>> Is there anything from which you can guess the encoding (e.g. the
>> filesystem encoding?).
> 
> It's in the locale's encoding. On Windows, that will be "mbcs"; on other
> systems, the timezone names are typically all in ASCII - this would
> allow for a quick work-around. Using the filesystemencoding would also
> work, although it would be an equal hack: it's *meant* to be used only
> for file names (and on OSX at least, it deviates from the locale's
> encoding - although I have no idea what tzname is encoded in on OSX).
> 
>> These are all externally-provided strings. It will depend on the
>> platform what the encoding is.
>> 
>> I wonder if we need to add another format code to Py_BuildValue (and
>> its friends) to designate "platform default encoding" instead of
>> UTF-8.
> 
> For symmetry with ParseTuple, there could be the 'e' versions
> (es, ez, ...) which would take a codec name also.

That would be great, imo. OTOH, I have no time to do this.
Currently, I have to set TZ=GMT to be able to start python 3.
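
For illustration only, the decoding step under discussion can be sketched
in Python rather than C. The helper name decode_platform_string is
hypothetical, and treating the locale's preferred encoding as the right
one for such strings is an assumption, not something settled in this thread.

```python
import locale


def decode_platform_string(raw, encoding=None):
    """Decode bytes handed over by the platform (hypothetical helper)."""
    if encoding is None:
        # Assumption: the locale's preferred encoding applies to this string.
        encoding = locale.getpreferredencoding(False)
    # "replace" avoids a hard failure on bytes the codec cannot decode.
    return raw.decode(encoding, errors="replace")


print(decode_platform_string(b"GMT"))
```

With an explicit codec name the same helper mirrors the idea of an 'e'-style
format code that takes the codec as an extra argument.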

Thomas


From steven.bethard at gmail.com  Thu Aug  9 19:39:50 2007
From: steven.bethard at gmail.com (Steven Bethard)
Date: Thu, 9 Aug 2007 11:39:50 -0600
Subject: [Python-3000] bytes regular expression?
In-Reply-To: <200708091740.59070.victor.stinner@haypocalc.com>
References: <200708090427.19830.victor.stinner@haypocalc.com>
	<ca471dc20708082107q640c70ccjf535b72f1744ae2e@mail.gmail.com>
	<200708091740.59070.victor.stinner@haypocalc.com>
Message-ID: <d11dcfba0708091039m7e976a39gc006c1eb9726baf7@mail.gmail.com>

On 8/9/07, Victor Stinner <victor.stinner at haypocalc.com> wrote:
> re module uses a dictionary to store compiled expressions and the key is a
> tuple (pattern, flags) where pattern is a bytes (str8) or str and flags is an
> int.

So why not just skip caching for anything that doesn't hash()?  If
you're really worried about efficiency, simply re.compile() the
expression once and don't rely on the re module's internal cache.

STeVe
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From martin at v.loewis.de  Thu Aug  9 23:04:46 2007
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Thu, 09 Aug 2007 23:04:46 +0200
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <bb8868b90708090842j27a8a238p588846b9fda80c7b@mail.gmail.com>
References: <D3DD7578-D515-4F46-A3F3-B96A43AB1FBB@gmail.com>	
	<B24E6FF9-3080-4494-A377-BE68B9949A52@gmail.com>	
	<ca471dc20708061906w5957811evfc02cf9f04d7cfde@mail.gmail.com>	
	<46B7EA06.5040106@v.loewis.de>	
	<ca471dc20708062056w362a951ma705452868e41472@mail.gmail.com>	
	<46B7FACC.8030503@v.loewis.de>
	<bb8868b90708090842j27a8a238p588846b9fda80c7b@mail.gmail.com>
Message-ID: <46BB816E.1070507@v.loewis.de>

Jason Orendorff wrote:
> On 8/7/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> My concern is that people need to access existing databases. It's
>> all fine that the code accessing them breaks, and that they have
>> to actively port to Py3k. However, telling them that they have to
>> represent the keys in their dbm disk files in a different manner
>> might cause a revolt...
> 
> Too true.  Offhand, why not provide hooks for serializing and
> deserializing keys? 

Perhaps YAGNI? We already support pickling values (dbshelve),
and I added support for encoding/decoding strings as either
keys or values (though in a limited manner).

In any case, somebody would have to make a specification
for that, and then somebody would have to provide an
implementation of it.
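
To make the suggestion concrete, here is a rough sketch of what key hooks
could look like. The class name KeyHookMapping and the dumps_key/loads_key
parameters are invented for this example, and a plain dict stands in for an
opened dbm file; none of this is an existing API.

```python
class KeyHookMapping:
    """Hypothetical wrapper applying key hooks around a bytes-keyed store."""

    def __init__(self, store, dumps_key, loads_key):
        self._store = store
        self._dumps_key = dumps_key  # application key -> bytes on disk
        self._loads_key = loads_key  # bytes on disk -> application key

    def __getitem__(self, key):
        return self._store[self._dumps_key(key)]

    def __setitem__(self, key, value):
        self._store[self._dumps_key(key)] = value

    def keys(self):
        return [self._loads_key(raw) for raw in self._store.keys()]


store = {}  # a plain dict stands in for an opened dbm file
db = KeyHookMapping(
    store,
    dumps_key=lambda key: key.encode("latin-1"),
    loads_key=lambda raw: raw.decode("latin-1"),
)
db["caf\xe9"] = b"value"
print(store)
```

The on-disk representation stays whatever the hooks produce, so an existing
database would not need to be rewritten.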

Regards,
Martin


From guido at python.org  Thu Aug  9 23:49:44 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 9 Aug 2007 14:49:44 -0700
Subject: [Python-3000] IDLE in new py3k
In-Reply-To: <ca471dc20708090944i32deb00eu55ffe53b28a8d95f@mail.gmail.com>
References: <87y7gkel09.fsf@hydra.hampton.thirdcreek.com>
	<ca471dc20708090944i32deb00eu55ffe53b28a8d95f@mail.gmail.com>
Message-ID: <ca471dc20708091449j281c7f52y8493b93e569d4bb@mail.gmail.com>

I've checked in a fix for the immediate cause of the fatal error: an
error path in PythonCmd() was passing through the LEAVE_PYTHON macro
twice. This bug was present even on the trunk, where I fixed it too
(and probably in 2.5 as well, but I didn't check).

But the reason we got here was that an AsString() call failed. Why?
Here's the traceback:

Traceback (most recent call last):
  File "/usr/local/google/home/guido/python/py3k/Lib/runpy.py", line
83, in run_module
    filename, loader, alter_sys)
  File "/usr/local/google/home/guido/python/py3k/Lib/runpy.py", line
50, in _run_module_code
    mod_name, mod_fname, mod_loader)
  File "/usr/local/google/home/guido/python/py3k/Lib/runpy.py", line
32, in _run_code
    exec(code, run_globals)
  File "/usr/local/google/home/guido/python/py3k/Lib/idlelib/idle.py",
line 21, in <module>
    idlelib.PyShell.main()
  File "/usr/local/google/home/guido/python/py3k/Lib/idlelib/PyShell.py",
line 1385, in main
    shell = flist.open_shell()
  File "/usr/local/google/home/guido/python/py3k/Lib/idlelib/PyShell.py",
line 272, in open_shell
    self.pyshell = PyShell(self)
  File "/usr/local/google/home/guido/python/py3k/Lib/idlelib/PyShell.py",
line 795, in __init__
    OutputWindow.__init__(self, flist, None, None)
  File "/usr/local/google/home/guido/python/py3k/Lib/idlelib/OutputWindow.py",
line 16, in __init__
    EditorWindow.__init__(self, *args)
  File "/usr/local/google/home/guido/python/py3k/Lib/idlelib/EditorWindow.py",
line 231, in __init__
    per.insertfilter(color)
  File "/usr/local/google/home/guido/python/py3k/Lib/idlelib/Percolator.py",
line 35, in insertfilter
    filter.setdelegate(self.top)
  File "/usr/local/google/home/guido/python/py3k/Lib/idlelib/ColorDelegator.py",
line 49, in setdelegate
    self.config_colors()
  File "/usr/local/google/home/guido/python/py3k/Lib/idlelib/ColorDelegator.py",
line 56, in config_colors
    self.tag_configure(tag, **cnf)
  File "/usr/local/google/home/guido/python/py3k/Lib/lib-tk/Tkinter.py",
line 3066, in tag_configure
    return self._configure(('tag', 'configure', tagName), cnf, kw)
  File "/usr/local/google/home/guido/python/py3k/Lib/lib-tk/Tkinter.py",
line 1187, in _configure
    self.tk.call(_flatten((self._w, cmd)) + self._options(cnf))
_tkinter.TclError: unknown option "#000000"

--Guido

On 8/9/07, Guido van Rossum <guido at python.org> wrote:
> On 8/9/07, Kurt B. Kaiser <kbk at shore.net> wrote:
> >
> > After a clean checkout in py3k, IDLE fails even w/o subprocess...
> >
> > trader ~/PYDOTORG/projects/python/branches/py3k$ ./python Lib/idlelib/idle.py
> > Fatal Python error: PyEval_SaveThread: NULL tstate
> > Aborted
> >
> > trader ~/PYDOTORG/projects/python/branches/py3k$ ./python
> > Python 3.0x (py3k:56858, Aug  9 2007, 12:09:06)
> > [GCC 4.1.2 20061027 (prerelease)] on linux2
> > Type "help", "copyright", "credits" or "license" for more information.
> > >>>
> >
> > trader ~/PYDOTORG/projects/python/branches/py3k$ ./python Lib/idlelib/idle.py -n
> > Fatal Python error: PyEval_SaveThread: NULL tstate
> > Aborted
>
> So it does. :-(
>
> > Any ideas on where to look?
>
> No, but I'll see if I can find anything with gdb.
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jason.orendorff at gmail.com  Fri Aug 10 00:00:58 2007
From: jason.orendorff at gmail.com (Jason Orendorff)
Date: Thu, 9 Aug 2007 18:00:58 -0400
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46BB816E.1070507@v.loewis.de>
References: <D3DD7578-D515-4F46-A3F3-B96A43AB1FBB@gmail.com>
	<B24E6FF9-3080-4494-A377-BE68B9949A52@gmail.com>
	<ca471dc20708061906w5957811evfc02cf9f04d7cfde@mail.gmail.com>
	<46B7EA06.5040106@v.loewis.de>
	<ca471dc20708062056w362a951ma705452868e41472@mail.gmail.com>
	<46B7FACC.8030503@v.loewis.de>
	<bb8868b90708090842j27a8a238p588846b9fda80c7b@mail.gmail.com>
	<46BB816E.1070507@v.loewis.de>
Message-ID: <bb8868b90708091500k46d62e59id7a1a203a2cc6536@mail.gmail.com>

On 8/9/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > Too true.  Offhand, why not provide hooks for serializing and
> > deserializing keys?
>
> Perhaps YAGNI? We already support pickling values (dbshelve),
> and I added support for encoding/decoding strings as either
> keys or values (though in a limited manner).

You don't need to go outside this thread to find a use case
not covered by either of those.

> In any case, somebody would have to make a specification
> for that, and then somebody would have to provide an
> implementation of it.

It was just a suggestion.  I wish this could occasionally
go without saying.

-j

From martin at v.loewis.de  Fri Aug 10 00:08:09 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 10 Aug 2007 00:08:09 +0200
Subject: [Python-3000] bytes regular expression?
In-Reply-To: <f9fien$ccf$1@sea.gmane.org>
References: <200708090427.19830.victor.stinner@haypocalc.com>	<ca471dc20708082107q640c70ccjf535b72f1744ae2e@mail.gmail.com>	<200708091740.59070.victor.stinner@haypocalc.com>
	<f9fien$ccf$1@sea.gmane.org>
Message-ID: <46BB9049.7090406@v.loewis.de>

>> I prefer str8 which looks to be a good candidate for "frozenbytes" type.
>>
> 
> I love this idea!  Leave str8 as it is, maybe extend Python so that it understands
> the s"..." literals and we are done.

Please, no. Two string-like types with literals are enough.

Regards,
Martin

From martin at v.loewis.de  Fri Aug 10 00:16:35 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 10 Aug 2007 00:16:35 +0200
Subject: [Python-3000] IDLE in new py3k
In-Reply-To: <ca471dc20708091449j281c7f52y8493b93e569d4bb@mail.gmail.com>
References: <87y7gkel09.fsf@hydra.hampton.thirdcreek.com>	<ca471dc20708090944i32deb00eu55ffe53b28a8d95f@mail.gmail.com>
	<ca471dc20708091449j281c7f52y8493b93e569d4bb@mail.gmail.com>
Message-ID: <46BB9243.60101@v.loewis.de>

> But the reason we got here was that an AsString() call failed. Why?

The only reason I can see is that PyObject_Str failed; that may happen
if PyObject_Str fails. That, in turn, can happen for bytes objects
if they are not UTF-8.

I think _tkinter should get rid of AsString, and use the Tcl object
API instead (not sure how to do that specifically, though)

> Here's the traceback:

Are you sure these are related? This traceback looks like a Tcl
error - so what does that have to do with AsString?

Regards,
Martin

From martin at v.loewis.de  Fri Aug 10 00:36:15 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 10 Aug 2007 00:36:15 +0200
Subject: [Python-3000] idle3.0 - is is supposed to work?
In-Reply-To: <ca471dc20708091031k3ed03d8ap6c211706c4f19f7b@mail.gmail.com>
References: <tkrat.1e8187685ad86255@igpm.rwth-aachen.de>	<87tzr8ei80.fsf@hydra.hampton.thirdcreek.com>
	<ca471dc20708091031k3ed03d8ap6c211706c4f19f7b@mail.gmail.com>
Message-ID: <46BB96DF.5060305@v.loewis.de>

> I get the same failure on OSX (PPC) and on Linux (x86 Ubuntu). It has
> to do with the Tcl/Tk wrapping code, in particular it's in the
> LEAVE_PYTHON macro on line 1995 on _tkinter.c.

I'm not convinced. The actual failure is that "tag configure" is invoked
with a None tagname (which then gets stripped through flatten, apparently).

The ColorDelegator it originates from has these colors:


[('COMMENT', {'foreground': '#dd0000', 'background': '#ffffff'}),
 ('DEFINITION', {'foreground': '#0000ff', 'background': '#ffffff'}),
 ('hit', {'foreground': '#ffffff', 'background': '#000000'}),
 ('STRING', {'foreground': '#00aa00', 'background': '#ffffff'}),
 ('KEYWORD', {'foreground': '#ff7700', 'background': '#ffffff'}),
 ('stdout', {'foreground': 'blue', 'background': '#ffffff'}),
 ('stdin', {'foreground': None, 'background': None}),
 ('SYNC', {'foreground': None, 'background': None}),
 ('BREAK', {'foreground': 'black', 'background': '#ffff55'}),
 ('BUILTIN', {'foreground': '#900090', 'background': '#ffffff'}),
 ('stderr', {'foreground': 'red', 'background': '#ffffff'}),
 ('ERROR', {'foreground': '#000000', 'background': '#ff7777'}),
 (None, {'foreground': '#000000', 'background': '#ffffff'}),
 ('console', {'foreground': '#770000', 'background': '#ffffff'}),
 ('TODO', {'foreground': None, 'background': None})]

and invokes this code:

        for tag, cnf in self.tagdefs.items():
            if cnf:
                self.tag_configure(tag, **cnf)

so if None is a dictionary key (as it is), you get the error you
see.
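
The effect can be reproduced without Tk. The sketch below uses simplified
stand-ins for Tkinter's _flatten and _options helpers (dropping None items
and turning {'key': value} into ('-key', value)); the widget path ".text"
and the function tag_configure_args are made up for the example.

```python
def flatten(seq):
    """Simplified stand-in for Tkinter's _flatten: expands nested
    tuples/lists and silently drops None items."""
    result = ()
    for item in seq:
        if isinstance(item, (tuple, list)):
            result += flatten(item)
        elif item is not None:
            result += (item,)
    return result


def options(cnf):
    """Simplified stand-in for Tkinter's _options: {'k': v} -> ('-k', v)."""
    result = ()
    for key, value in cnf.items():
        if value is not None:
            result += ("-" + key, value)
    return result


def tag_configure_args(widget, tag, cnf):
    """Build the argument tuple that would be handed to tk.call()."""
    return flatten((widget, ("tag", "configure", tag))) + options(cnf)


cnf = {"foreground": "#000000", "background": "#ffffff"}
print(tag_configure_args(".text", "ERROR", cnf))
print(tag_configure_args(".text", None, cnf))
```

In the second call the tag name has vanished, so Tcl would presumably take
"-foreground" as the tag name and "#000000" as the option, which matches the
"unknown option" message in the traceback.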

Regards,
Martin

From guido at python.org  Fri Aug 10 00:37:23 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 9 Aug 2007 15:37:23 -0700
Subject: [Python-3000] IDLE in new py3k
In-Reply-To: <46BB9243.60101@v.loewis.de>
References: <87y7gkel09.fsf@hydra.hampton.thirdcreek.com>
	<ca471dc20708090944i32deb00eu55ffe53b28a8d95f@mail.gmail.com>
	<ca471dc20708091449j281c7f52y8493b93e569d4bb@mail.gmail.com>
	<46BB9243.60101@v.loewis.de>
Message-ID: <ca471dc20708091537t538e2b7fme071c2e3bfce4368@mail.gmail.com>

On 8/9/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > But the reason we got here was that an AsString() call failed. Why?
>
> The only reason I can see is that PyObject_Str failed; that may happen
> if PyObject_Str fails. That, in turn, can happen for bytes objects
> if they are not UTF-8.
>
> I think _tkinter should get rid of AsString, and use the Tcl object
> API instead (not sure how to do that specifically, though)
>
> > Here's the traceback:
>
> Are you sure these are related? This traceback looks like a Tcl
> error - so what does that have to do with AsString?

My only evidence is that after I fixed the segfault, this traceback
appeared. Possibly something in IDLE is catching the AsString() error
in a later stage. There may also be timing dependencies since idle
makes heavy use of after().

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at v.loewis.de  Fri Aug 10 00:45:28 2007
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Fri, 10 Aug 2007 00:45:28 +0200
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <bb8868b90708091500k46d62e59id7a1a203a2cc6536@mail.gmail.com>
References: <D3DD7578-D515-4F46-A3F3-B96A43AB1FBB@gmail.com>	
	<B24E6FF9-3080-4494-A377-BE68B9949A52@gmail.com>	
	<ca471dc20708061906w5957811evfc02cf9f04d7cfde@mail.gmail.com>	
	<46B7EA06.5040106@v.loewis.de>	
	<ca471dc20708062056w362a951ma705452868e41472@mail.gmail.com>	
	<46B7FACC.8030503@v.loewis.de>	
	<bb8868b90708090842j27a8a238p588846b9fda80c7b@mail.gmail.com>	
	<46BB816E.1070507@v.loewis.de>
	<bb8868b90708091500k46d62e59id7a1a203a2cc6536@mail.gmail.com>
Message-ID: <46BB9908.8000704@v.loewis.de>

Jason Orendorff wrote:
> On 8/9/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>>> Too true.  Offhand, why not provide hooks for serializing and
>>> deserializing keys?
>
> It was just a suggestion.  I wish this could occasionally
> go without saying.

Perhaps using "I suggest" instead of asking "why not" would
have clued me; English is not my native language, and I take
questions as literally asking something. Normally, the answer
to "why not do XYZ" is "because nobody has the time to do
that", but too many people asking this specific question
still haven't learned this, so I feel obliged to provide the
obvious answer rather than ignoring the poster. This is
in turn because I heard too many times "I posted this years
ago, but nobody listened".

In some cases, posters genuinely don't know that nobody
else will work on it and that they need to become active
if they want to see things happen. So telling them has the
small chance that we get more contributions out of it than
mere suggestions; this is well worth the time spent to
tell people like you what they already knew.

Regards,
Martin

From kbk at shore.net  Fri Aug 10 00:44:35 2007
From: kbk at shore.net (Kurt B. Kaiser)
Date: Thu, 09 Aug 2007 18:44:35 -0400
Subject: [Python-3000] idle3.0 - is is supposed to work?
In-Reply-To: <46BB96DF.5060305@v.loewis.de> (Martin v. =?iso-8859-1?Q?L=F6?=
	=?iso-8859-1?Q?wis's?= message of
	"Fri, 10 Aug 2007 00:36:15 +0200")
References: <tkrat.1e8187685ad86255@igpm.rwth-aachen.de>
	<87tzr8ei80.fsf@hydra.hampton.thirdcreek.com>
	<ca471dc20708091031k3ed03d8ap6c211706c4f19f7b@mail.gmail.com>
	<46BB96DF.5060305@v.loewis.de>
Message-ID: <87ps1we3ak.fsf@hydra.hampton.thirdcreek.com>

"Martin v. Löwis" <martin at v.loewis.de> writes:

> I'm not convinced. The actual failure is that "tag configure" is invoked
> with a None tagname (which then gets stripped through flatten, apparently).

OTOH, IDLE ran w/o this error in p3yk...

-- 
KBK

From rrr at ronadam.com  Fri Aug 10 00:50:50 2007
From: rrr at ronadam.com (Ron Adam)
Date: Thu, 09 Aug 2007 17:50:50 -0500
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46BB816E.1070507@v.loewis.de>
References: <D3DD7578-D515-4F46-A3F3-B96A43AB1FBB@gmail.com>		<B24E6FF9-3080-4494-A377-BE68B9949A52@gmail.com>		<ca471dc20708061906w5957811evfc02cf9f04d7cfde@mail.gmail.com>		<46B7EA06.5040106@v.loewis.de>		<ca471dc20708062056w362a951ma705452868e41472@mail.gmail.com>		<46B7FACC.8030503@v.loewis.de>	<bb8868b90708090842j27a8a238p588846b9fda80c7b@mail.gmail.com>
	<46BB816E.1070507@v.loewis.de>
Message-ID: <46BB9A4A.3070201@ronadam.com>



Martin v. Löwis wrote:
> Jason Orendorff wrote:
>> On 8/7/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>>> My concern is that people need to access existing databases. It's
>>> all fine that the code accessing them breaks, and that they have
>>> to actively port to Py3k. However, telling them that they have to
>>> represent the keys in their dbm disk files in a different manner
>>> might cause a revolt...
>> Too true.  Offhand, why not provide hooks for serializing and
>> deserializing keys? 
> 
> Perhaps YAGNI? We already support pickling values (dbshelve),
> and I added support for encoding/decoding strings as either
> keys or values (though in a limited manner).
> 
> In any case, somebody would have to make a specification
> for that, and then somebody would have to provide an
> implementation of it.
> 
> Regards,
> Martin

Just a thought...

Would some sort of indirect reference type help?  Possibly an
object_id_based_reference used as the key, instead of using or hashing the
object itself?  This wouldn't change if the object mutates between accesses,
and could be immutable.

Ron


From guido at python.org  Fri Aug 10 00:58:58 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 9 Aug 2007 15:58:58 -0700
Subject: [Python-3000] infinite recursion with python -v
In-Reply-To: <ee2a432c0708072257v68f57d70l9b155947650bb640@mail.gmail.com>
References: <ee2a432c0708072257v68f57d70l9b155947650bb640@mail.gmail.com>
Message-ID: <ca471dc20708091558x545dfc9as5bb216b6622a702f@mail.gmail.com>

I've checked in a band-aid fix for this (r56878). The band-aid works by
pre-importing the latin-1 codec (and also the utf-8 codec, just to be
sure) *before* setting sys.stdout and sys.stderr (this happens in
site.py, in installnewio()).

This is a horrible hack though, and only works because, as long as
sys.stderr isn't set, the call to PyFile_WriteString() in mywrite()
(in sysmodule.c) returns a quick error, causing mywrite() to write
directly to C's stderr.
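
The pre-import idea can be sketched in a few lines. This is not the code
from r56878; preload_codecs is a hypothetical helper that only illustrates
the ordering (load the codecs first, install the new streams afterwards).

```python
import codecs
import sys


def preload_codecs(names=("latin-1", "utf-8")):
    """Force the codec modules to be imported up front (hypothetical helper)."""
    for name in names:
        # codecs.lookup() makes the encodings package import the codec module.
        codecs.lookup(name)


preload_codecs()
# Only at this point would the new stream objects be installed as
# sys.stdout and sys.stderr, so writing to them cannot trigger the import.
print("encodings.latin_1" in sys.modules)
```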

I've also checked in a change to PyFile_WriteString() to call
PyUnicode_FromString() instead of PyString_FromString(), but that
doesn't appear to make any difference (r56879).

FWIW, I've attached an edited version of the traceback mailed by Neal;
the email mangled the formatting too much. Maybe someone else has a
bright idea.

--Guido

On 8/7/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
> The wiki seems to be down, so sorry for the spam.
>
> python -v crashes due to infinite recursion (well, it tried to be
> infinite until it got a stack overflow :-)  The problem seems to be
> that Lib/encodings/latin_1.py is loaded, but it tries to be converted
> to latin_1, so it tries to load the module, and ...  Or something like
> that.  See below for a call stack.
>
> Minimal version:
>
> PyFile_WriteString (s= "# Lib/encodings/latin_1.pyc matches
> Lib/encodings/latin_1.py\n", f=) at Objects/fileobject.c:184
> mywrite (name= "stderr", fp=, format= "# %s matches %s\n", va=) at
> Python/sysmodule.c:1350
> PySys_WriteStderr (format= "# %s matches %s\n") at Python/sysmodule.c:1380
> check_compiled_module (pathname= "Lib/encodings/latin_1.py", mtime=,
> cpathname= "Lib/encodings/latin_1.pyc") at Python/import.c:755
> load_source_module (name= "encodings.latin_1", pathname=
> "Lib/encodings/latin_1.py", fp=) at Python/import.c:938
> load_module (name= "encodings.latin_1", fp=,buf=
> "Lib/encodings/latin_1.py", type=1, loader=) at Python/import.c:1733
> import_submodule (mod=, subname= "latin_1",fullname=
> "encodings.latin_1") at Python/import.c:2418
> load_next (mod=,altmod=, p_name=,buf= "encodings.latin_1", p_buflen=)
> at Python/import.c:2213
> import_module_level (name=, globals=, locals=, fromlist=, level=0) at
> Python/import.c:1992
> PyImport_ImportModuleLevel (name= "encodings.latin_1", globals=,
> locals=, fromlist=, level=0) at Python/import.c:2056
> builtin___import__ () at Python/bltinmodule.c:151
> [...]
> _PyCodec_Lookup (encoding= "latin-1") at Python/codecs.c:147
> codec_getitem (encoding= "latin-1",index=0) at Python/codecs.c:211
> PyCodec_Encoder (encoding= "latin-1") at Python/codecs.c:275
> PyCodec_Encode (object=,encoding= "latin-1", errors=) at Python/codecs.c:322
> PyString_AsEncodedObject (str=,encoding= "latin-1", errors=) at
> Objects/stringobject.c:459
> string_encode () at Objects/stringobject.c:3138
> [...]
> PyFile_WriteObject (v=, f=, flags=1) at Objects/fileobject.c:159
> PyFile_WriteString (s= "# Lib/encodings/latin_1.pyc matches
> Lib/encodings/latin_1.py\n",f=) at Objects/fileobject.c:184
>
> == Stack trace for python -v recursion (argument values are mostly trimmed) ==
>
> PyFile_WriteString (s= "# Lib/encodings/latin_1.pyc matches
> Lib/encodings/latin_1.py\n", f=) at Objects/fileobject.c:184
> mywrite (name= "stderr", fp=, format= "# %s matches %s\n", va=) at
> Python/sysmodule.c:1350
> PySys_WriteStderr (format= "# %s matches %s\n") at Python/sysmodule.c:1380
> check_compiled_module (pathname= "Lib/encodings/latin_1.py", mtime=,
> cpathname= "Lib/encodings/latin_1.pyc") at Python/import.c:755
> load_source_module (name= "encodings.latin_1", pathname=
> "Lib/encodings/latin_1.py", fp=) at Python/import.c:938
> load_module (name= "encodings.latin_1", fp=,buf=
> "Lib/encodings/latin_1.py", type=1, loader=) at Python/import.c:1733
> import_submodule (mod=, subname= "latin_1",fullname=
> "encodings.latin_1") at Python/import.c:2418
> load_next (mod=,altmod=, p_name=,buf= "encodings.latin_1", p_buflen=)
> at Python/import.c:2213
> import_module_level (name=, globals=, locals=, fromlist=, level=0) at
> Python/import.c:1992
> PyImport_ImportModuleLevel (name= "encodings.latin_1", globals=,
> locals=, fromlist=, level=0) at Python/import.c:2056
> builtin___import__ () at Python/bltinmodule.c:151
> PyCFunction_Call () at Objects/methodobject.c:77
> PyObject_Call () at Objects/abstract.c:1736
> do_call () at Python/ceval.c:3764
> call_function (pp_stack=, oparg=513) at Python/ceval.c:3574
> PyEval_EvalFrameEx (f=, throwflag=0) at Python/ceval.c:2216
> PyEval_EvalCodeEx () at Python/ceval.c:2835
> function_call () at Objects/funcobject.c:634
> PyObject_Call () at Objects/abstract.c:1736
> PyEval_CallObjectWithKeywords () at Python/ceval.c:3431
> _PyCodec_Lookup (encoding= "latin-1") at Python/codecs.c:147
> codec_getitem (encoding= "latin-1",index=0) at Python/codecs.c:211
> PyCodec_Encoder (encoding= "latin-1") at Python/codecs.c:275
> PyCodec_Encode (object=,encoding= "latin-1", errors=) at Python/codecs.c:322
> PyString_AsEncodedObject (str=,encoding= "latin-1", errors=) at
> Objects/stringobject.c:459
> string_encode () at Objects/stringobject.c:3138
> PyCFunction_Call () at Objects/methodobject.c:73
> call_function () at Python/ceval.c:3551
> PyEval_EvalFrameEx (f=, throwflag=0) at Python/ceval.c:2216
> PyEval_EvalCodeEx () at Python/ceval.c:2835
> function_call () at Objects/funcobject.c:634
> PyObject_Call () at Objects/abstract.c:1736
> method_call () at Objects/classobject.c:397
> PyObject_Call () at Objects/abstract.c:1736
> PyEval_CallObjectWithKeywords () at Python/ceval.c:3431
> PyFile_WriteObject (v=, f=, flags=1) at Objects/fileobject.c:159
> PyFile_WriteString (s= "# Lib/encodings/latin_1.pyc matches
> Lib/encodings/latin_1.py\n",f=) at Objects/fileobject.c:184


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
-------------- next part --------------
short:

PyFile_WriteString (s="# Lib/encodings/latin_1.pyc matches Lib/encodings/latin_1.py\n", f=) at Objects/fileobject.c:184
mywrite (name="stderr", fp=, format="# %s matches %s\n", va=) at Python/sysmodule.c:1350
PySys_WriteStderr (format="# %s matches %s\n") at Python/sysmodule.c:1380
check_compiled_module (pathname="Lib/encodings/latin_1.py", mtime=, cpathname="Lib/encodings/latin_1.pyc") at Python/import.c:755
load_source_module (name="encodings.latin_1", pathname="Lib/encodings/latin_1.py", fp=) at Python/import.c:938
load_module (name="encodings.latin_1", fp=,buf="Lib/encodings/latin_1.py", type=1, loader=) at Python/import.c:1733
import_submodule (mod=, subname="latin_1",fullname="encodings.latin_1") at Python/import.c:2418
load_next (mod=,altmod=, p_name=,buf="encodings.latin_1", p_buflen=) at Python/import.c:2213
import_module_level (name=, globals=, locals=, fromlist=, level=0) at Python/import.c:1992
PyImport_ImportModuleLevel (name="encodings.latin_1", globals=, locals=, fromlist=, level=0) at Python/import.c:2056
builtin___import__ () at Python/bltinmodule.c:151
[...]
_PyCodec_Lookup (encoding="latin-1") at Python/codecs.c:147
codec_getitem (encoding="latin-1",index=0) at Python/codecs.c:211
PyCodec_Encoder (encoding="latin-1") at Python/codecs.c:275
PyCodec_Encode (object=,encoding="latin-1", errors=) at Python/codecs.c:322
PyString_AsEncodedObject (str=,encoding="latin-1", errors=) at Objects/stringobject.c:459
string_encode () at Objects/stringobject.c:3138
[...]
PyFile_WriteObject (v=, f=, flags=1) at Objects/fileobject.c:159
PyFile_WriteString (s="# Lib/encodings/latin_1.pyc matches Lib/encodings/latin_1.py\n",f=) at Objects/fileobject.c:184




long:

PyFile_WriteString (s= "# Lib/encodings/latin_1.pyc matches Lib/encodings/latin_1.py\n", f=) at Objects/fileobject.c:184
mywrite (name= "stderr", fp=, format= "# %s matches %s\n", va=) at Python/sysmodule.c:1350
PySys_WriteStderr (format= "# %s matches %s\n") at Python/sysmodule.c:1380
check_compiled_module (pathname= "Lib/encodings/latin_1.py", mtime=, cpathname= "Lib/encodings/latin_1.pyc") at Python/import.c:755
load_source_module (name= "encodings.latin_1", pathname= "Lib/encodings/latin_1.py", fp=) at Python/import.c:938
load_module (name= "encodings.latin_1", fp=,buf= "Lib/encodings/latin_1.py", type=1, loader=) at Python/import.c:1733
import_submodule (mod=, subname= "latin_1",fullname= "encodings.latin_1") at Python/import.c:2418
load_next (mod=,altmod=, p_name=,buf= "encodings.latin_1", p_buflen=) at Python/import.c:2213
import_module_level (name=, globals=, locals=, fromlist=, level=0) at Python/import.c:1992
PyImport_ImportModuleLevel (name= "encodings.latin_1", globals=, locals=, fromlist=, level=0) at Python/import.c:2056
builtin___import__ () at Python/bltinmodule.c:151
PyCFunction_Call () at Objects/methodobject.c:77
PyObject_Call () at Objects/abstract.c:1736
do_call () at Python/ceval.c:3764
call_function (pp_stack=, oparg=513) at Python/ceval.c:3574
PyEval_EvalFrameEx (f=, throwflag=0) at Python/ceval.c:2216
PyEval_EvalCodeEx () at Python/ceval.c:2835
function_call () at Objects/funcobject.c:634
PyObject_Call () at Objects/abstract.c:1736
PyEval_CallObjectWithKeywords () at Python/ceval.c:3431
_PyCodec_Lookup (encoding= "latin-1") at Python/codecs.c:147
codec_getitem (encoding= "latin-1",index=0) at Python/codecs.c:211
PyCodec_Encoder (encoding= "latin-1") at Python/codecs.c:275
PyCodec_Encode (object=,encoding= "latin-1", errors=) at Python/codecs.c:322
PyString_AsEncodedObject (str=,encoding= "latin-1", errors=) at Objects/stringobject.c:459
string_encode () at Objects/stringobject.c:3138
PyCFunction_Call () at Objects/methodobject.c:73
call_function () at Python/ceval.c:3551
PyEval_EvalFrameEx (f=, throwflag=0) at Python/ceval.c:2216
PyEval_EvalCodeEx () at Python/ceval.c:2835
function_call () at Objects/funcobject.c:634
PyObject_Call () at Objects/abstract.c:1736
method_call () at Objects/classobject.c:397
PyObject_Call () at Objects/abstract.c:1736
PyEval_CallObjectWithKeywords () at Python/ceval.c:3431
PyFile_WriteObject (v=, f=, flags=1) at Objects/fileobject.c:159
PyFile_WriteString (s= "# Lib/encodings/latin_1.pyc matches Lib/encodings/latin_1.py\n",f=) at Objects/fileobject.c:184

From guido at python.org  Fri Aug 10 00:59:49 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 9 Aug 2007 15:59:49 -0700
Subject: [Python-3000] idle3.0 - is is supposed to work?
In-Reply-To: <87ps1we3ak.fsf@hydra.hampton.thirdcreek.com>
References: <tkrat.1e8187685ad86255@igpm.rwth-aachen.de>
	<87tzr8ei80.fsf@hydra.hampton.thirdcreek.com>
	<ca471dc20708091031k3ed03d8ap6c211706c4f19f7b@mail.gmail.com>
	<46BB96DF.5060305@v.loewis.de>
	<87ps1we3ak.fsf@hydra.hampton.thirdcreek.com>
Message-ID: <ca471dc20708091559j1d26c9deh8066efb13be59135@mail.gmail.com>

On 8/9/07, Kurt B. Kaiser <kbk at shore.net> wrote:
> "Martin v. Löwis" <martin at v.loewis.de> writes:
>
> > I'm not convinced. The actual failure is that "tag configure" is invoked
> > with a None tagname (which then gets stripped through flatten, apparently).
>
> OTOH, IDLE ran w/o this error in p3yk...

Yeah, in the new branch there will be more occurrences of PyUnicode,
which causes _tkinter.c to take different paths in many cases.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at v.loewis.de  Fri Aug 10 01:01:43 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 10 Aug 2007 01:01:43 +0200
Subject: [Python-3000] idle3.0 - is is supposed to work?
In-Reply-To: <87ps1we3ak.fsf@hydra.hampton.thirdcreek.com>
References: <tkrat.1e8187685ad86255@igpm.rwth-aachen.de>	<87tzr8ei80.fsf@hydra.hampton.thirdcreek.com>	<ca471dc20708091031k3ed03d8ap6c211706c4f19f7b@mail.gmail.com>	<46BB96DF.5060305@v.loewis.de>
	<87ps1we3ak.fsf@hydra.hampton.thirdcreek.com>
Message-ID: <46BB9CD7.2030301@v.loewis.de>

>> I'm not convinced. The actual failure is that "tag configure" is invoked
>> with a None tagname (which then gets stripped through flatten, apparently).
> 
> OTOH, IDLE ran w/o this error in p3yk...

Yes. Somebody would have to study what precisely the problem is: is it
that there is a None key in that dictionary, and that you must not use
None as a tag name? In that case: where does the None come from?
Or else: is it that you can use None as a tagname in 2.x, but can't
anymore in 3.0? If so: why not?

Regards,
Martin

From greg.ewing at canterbury.ac.nz  Fri Aug 10 01:24:13 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 10 Aug 2007 11:24:13 +1200
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46BAEFB0.9050400@ronadam.com>
References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org>
	<46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org>
	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>
	<46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz>
	<46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com>
	<46B52422.2090006@canterbury.ac.nz>
	<ca471dc20708041909j3f86b107hace9407673c302ea@mail.gmail.com>
	<46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz>
	<46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz>
	<46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org>
	<46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org>
	<46BAEFB0.9050400@ronadam.com>
Message-ID: <46BBA21D.9060403@canterbury.ac.nz>

> Talin wrote:
>>In other words, other than the special case of 'repr', we find that 
>>pretty much everything can fit into a single specifier string;

I think there might still be merit in separating the
field width and alignment spec, at least syntactically,
since all format specs will have it and it should have
a uniform syntax across all of them, and it would be
good if it didn't have to be parsed by all the __format__
methods individually.

-- 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiem!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at canterbury.ac.nz	   +--------------------------------------+

From greg.ewing at canterbury.ac.nz  Fri Aug 10 02:15:51 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 10 Aug 2007 12:15:51 +1200
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46BB9908.8000704@v.loewis.de>
References: <D3DD7578-D515-4F46-A3F3-B96A43AB1FBB@gmail.com>
	<B24E6FF9-3080-4494-A377-BE68B9949A52@gmail.com>
	<ca471dc20708061906w5957811evfc02cf9f04d7cfde@mail.gmail.com>
	<46B7EA06.5040106@v.loewis.de>
	<ca471dc20708062056w362a951ma705452868e41472@mail.gmail.com>
	<46B7FACC.8030503@v.loewis.de>
	<bb8868b90708090842j27a8a238p588846b9fda80c7b@mail.gmail.com>
	<46BB816E.1070507@v.loewis.de>
	<bb8868b90708091500k46d62e59id7a1a203a2cc6536@mail.gmail.com>
	<46BB9908.8000704@v.loewis.de>
Message-ID: <46BBAE37.8090600@canterbury.ac.nz>

Martin v. Löwis wrote:
> Perhaps using "I suggest" instead of asking "why not" would
> have clued me; English is not my native language, and I take
> questions as literally asking something.

Well, it's really a suggestion and a question. If
there's some reason the suggestion is a bad idea,
there's nothing wrong with pointing that out in
a reply.

-- 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiem!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at canterbury.ac.nz	   +--------------------------------------+

From greg.ewing at canterbury.ac.nz  Fri Aug 10 02:18:25 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 10 Aug 2007 12:18:25 +1200
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46BB9A4A.3070201@ronadam.com>
References: <D3DD7578-D515-4F46-A3F3-B96A43AB1FBB@gmail.com>
	<B24E6FF9-3080-4494-A377-BE68B9949A52@gmail.com>
	<ca471dc20708061906w5957811evfc02cf9f04d7cfde@mail.gmail.com>
	<46B7EA06.5040106@v.loewis.de>
	<ca471dc20708062056w362a951ma705452868e41472@mail.gmail.com>
	<46B7FACC.8030503@v.loewis.de>
	<bb8868b90708090842j27a8a238p588846b9fda80c7b@mail.gmail.com>
	<46BB816E.1070507@v.loewis.de> <46BB9A4A.3070201@ronadam.com>
Message-ID: <46BBAED1.8090700@canterbury.ac.nz>

Ron Adam wrote:
> Would some sort of an indirect reference type help?  Possibly an 
> object_id_based_reference as keys instead of using or hashing the object 
> itself?  This wouldn't change if the object mutates between accesses and 
> could be immutable.

But if the object did mutate, the cached re would be out
of date, and this wouldn't be noticed.

-- 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiem!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at canterbury.ac.nz	   +--------------------------------------+

From kbk at shore.net  Fri Aug 10 02:26:49 2007
From: kbk at shore.net (Kurt B. Kaiser)
Date: Thu, 09 Aug 2007 20:26:49 -0400
Subject: [Python-3000] idle3.0 - is is supposed to work?
In-Reply-To: <46BB9CD7.2030301@v.loewis.de> (Martin v. =?iso-8859-1?Q?L=F6?=
	=?iso-8859-1?Q?wis's?= message of
	"Fri, 10 Aug 2007 01:01:43 +0200")
References: <tkrat.1e8187685ad86255@igpm.rwth-aachen.de>
	<87tzr8ei80.fsf@hydra.hampton.thirdcreek.com>
	<ca471dc20708091031k3ed03d8ap6c211706c4f19f7b@mail.gmail.com>
	<46BB96DF.5060305@v.loewis.de>
	<87ps1we3ak.fsf@hydra.hampton.thirdcreek.com>
	<46BB9CD7.2030301@v.loewis.de>
Message-ID: <87lkckdyk6.fsf@hydra.hampton.thirdcreek.com>

"Martin v. Löwis" <martin at v.loewis.de> writes:

>> OTOH, IDLE ran w/o this error in p3yk...
>
> Yes. Somebody would have to study what precisely the problem is: is it
> that there is a None key in that dictionary, and that you must not use
> None as a tag name? In that case: where does the None come from?
> Or else: is it that you can use None as a tagname in 2.x, but can't
> anymore in 3.0? If so: why not?

OK, I'll start looking at it.

-- 
KBK

From greg.ewing at canterbury.ac.nz  Fri Aug 10 02:30:38 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 10 Aug 2007 12:30:38 +1200
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46BAEFB0.9050400@ronadam.com>
References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org>
	<46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org>
	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>
	<46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz>
	<46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com>
	<46B52422.2090006@canterbury.ac.nz>
	<ca471dc20708041909j3f86b107hace9407673c302ea@mail.gmail.com>
	<46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz>
	<46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz>
	<46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org>
	<46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org>
	<46BAEFB0.9050400@ronadam.com>
Message-ID: <46BBB1AE.5010207@canterbury.ac.nz>

Ron Adam wrote:
> The way to think of 'repr' and 'str' is that of a general "object" format 
> type/specifier.  That puts str and repr into the same context as the rest 
> of the format types.  This is really a point of view issue and not so much 
> of a semantic one.  I think {0:r} and {0:s} are to "object", as {0:d} and 
> {0:e} are to "float" ...  just another relationship relative to the value 
> being formatted.  So I don't understand the need to treat them differently.

There's no need to treat 's' specially, but 'r' is different,
at least if we want

    "{0:r}".format(x)

to always mean the same thing as

    "{0:s}".format(repr(x))

To achieve that without requiring every __format__ method
to recognise 'r' and handle it itself is going to require
format() to intercept 'r' before calling the __format__
method, as far as I can see. It can't be done by str,
since by the time str.__format__ gets called, the object
has already been passed through str(), and it's too late
to call repr() on it.
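
A minimal sketch of that interception, assuming a hypothetical helper
named format_field (not part of PEP 3101) and assuming 'r' appears as
the first character of the spec. It only illustrates the ordering
problem: repr() has to be applied before any __format__ method runs.

```python
def format_field(value, spec):
    """Hypothetical helper: format one field, intercepting a leading 'r'."""
    if spec.startswith("r"):
        # Apply repr() first, then let str.__format__ handle the rest
        # of the spec (width, alignment, ...).
        return format(repr(value), spec[1:])
    # Every other spec goes straight to the value's own __format__.
    return format(value, spec)


class Point:
    """Example type whose __format__ knows nothing about 'r'."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        return "Point(%r, %r)" % (self.x, self.y)

    def __format__(self, spec):
        return "(%s, %s)" % (self.x, self.y)


print(format_field(Point(1, 2), "r"))   # Point(1, 2)
print(format_field(Point(1, 2), ""))    # (1, 2)
print(format_field("abc", "r"))         # 'abc'
```

Point.__format__ never sees 'r' here, which is the property being
argued for.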

-- 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiem!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at canterbury.ac.nz	   +--------------------------------------+

From eric+python-dev at trueblade.com  Fri Aug 10 03:26:30 2007
From: eric+python-dev at trueblade.com (Eric V. Smith)
Date: Thu, 09 Aug 2007 21:26:30 -0400
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B5FBD9.4020301@acm.org>
References: <46B13ADE.7080901@acm.org>	<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>	<ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>	<46B2D147.90606@acm.org>	<46B2E265.5080905@ronadam.com>	<46B35295.1030007@acm.org>	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>	<46B39BBF.80809@ronadam.com>	<46B46DE1.1090403@canterbury.ac.nz>	<46B4950F.40905@ronadam.com>	<46B4A9A0.9070206@ronadam.com>
	<46B54F51.40705@acm.org>	<46B59F05.3070200@ronadam.com>
	<46B5FBD9.4020301@acm.org>
Message-ID: <46BBBEC6.5030705@trueblade.com>

I'm just getting back from vacation and trying to catch up.  I think 
I've caught the sense of the discussion, but forgive me if I haven't.

Talin wrote:
> The reason is that for some types, the __format__ method can define its 
> own interpretation of the format string which may include the letters 
> 'rtgd' as part of its regular syntax. Basically, he wants no constraints 
> on what __format__ is allowed to do.

Why would this not be true for all types?  Why have ints interpret "f", 
or other things that don't apply to ints?

If you want:

x = 3
"{0:f}".format(x)

then be explicit and write:

"{0:f}".format(float(x))

I realize it's a little more verbose, but now the __format__ functions 
only need to worry about what applies to their own type, and we get out 
of the business of deciding how and when to convert between ints and 
floats and decimals and whatever other types are involved.

And once you decide that the entire specifier is interpreted only by the 
type, you no longer need the default specifier (":f" in this case), and 
you could just write:
"{0}".format(float(x))

That is, since we already know the type, we don't need to specify the 
type in the specifier.  Now the "d", or "x", or whatever could just be 
used by ints (for example), and only as needed.
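
A small sketch of a type that interprets the whole specifier itself.
The Celsius class and its 'c'/'f' codes are made up for illustration;
the date example relies on date.__format__ accepting strftime-style
specs, which is how the standard library ended up implementing it.

```python
from datetime import date


class Celsius:
    """Hypothetical type that gives the letter 'f' its own meaning."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, spec):
        if spec == "f":
            # Here 'f' means Fahrenheit, not fixed-point notation.
            return "%.1f F" % (self.degrees * 9 / 5 + 32)
        if spec in ("", "c"):
            return "%.1f C" % self.degrees
        raise ValueError("unknown format spec %r" % spec)


print("{0:f}".format(Celsius(100)))               # 212.0 F
print("{0}".format(Celsius(100)))                 # 100.0 C
print("{0:%Y-%m-%d}".format(date(2007, 8, 10)))   # 2007-08-10
```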

I grant that repr() might be a different case, as Greg Ewing points out 
in a subsequent message.  But maybe we use something other than a colon 
to get __repr__ called, like:
"{`0`}".format(float(x))

I'm kidding with the back-ticks, of course.  Find some syntax which can 
be disambiguated from all specifiers.  Maybe:
"{0#}".format(float(x))
Or something similar.

Eric.


From rrr at ronadam.com  Fri Aug 10 06:35:34 2007
From: rrr at ronadam.com (Ron Adam)
Date: Thu, 09 Aug 2007 23:35:34 -0500
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46BBB1AE.5010207@canterbury.ac.nz>
References: <46B13ADE.7080901@acm.org>
	<46B2D147.90606@acm.org>	<46B2E265.5080905@ronadam.com>
	<46B35295.1030007@acm.org>	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>	<46B39BBF.80809@ronadam.com>
	<46B46DE1.1090403@canterbury.ac.nz>	<46B4950F.40905@ronadam.com>
	<46B4A9A0.9070206@ronadam.com>	<46B52422.2090006@canterbury.ac.nz>	<ca471dc20708041909j3f86b107hace9407673c302ea@mail.gmail.com>	<46B568F3.9060105@ronadam.com>
	<46B66BE0.7090005@canterbury.ac.nz>	<46B6851C.1030204@ronadam.com>
	<46B6C219.4040900@canterbury.ac.nz>	<46B6DABB.3080509@ronadam.com>
	<46B7CD8C.5070807@acm.org>	<46B8D58E.5040501@ronadam.com>
	<46BAC2D9.2020902@acm.org>	<46BAEFB0.9050400@ronadam.com>
	<46BBB1AE.5010207@canterbury.ac.nz>
Message-ID: <46BBEB16.2040205@ronadam.com>



Greg Ewing wrote:
> Ron Adam wrote:
>> The way to think of 'repr' and 'str' is that of a general "object" format 
>> type/specifier.  That puts str and repr into the same context as the rest 
>> of the format types.  This is really a point of view issue and not so much 
>> of a semantic one.  I think {0:r} and {0:s} are to "object", as {0:d} and 
>> {0:e} are to "float" ...  just another relationship relative to the value 
>> being formatted.  So I don't understand the need to treat them differently.
> 
> There's no need to treat 's' specially, but 'r' is different,
> at least if we want
> 
>     "{0:r}".format(x)
> 
> to always mean the same thing as
> 
>     "{0:s}".format(repr(x))
> 
> To achieve that without requiring every __format__ method
> to recognise 'r' and handle it itself is going to require
> format() to intercept 'r' before calling the __format__
> method, as far as I can see. It can't be done by str,
> since by the time str.__format__ gets called, the object
> has already been passed through str(), and it's too late
> to call repr() on it.

This doesn't require a different syntax to do.

Let's start at the top... what will the str.format() method look like?



Maybe an approximation might be:  (only one possible variation)

class str(object):
     ...
     def format(self, *args, **kwds):
         return format(self, *args, **kwds)  #calls global function.
     ...

And then for each format field, it will call the __format__ method of the 
matching position or named value.

class object():
     ...
     def __format__(self, value, format_spec):
        return value, format_spec
     ...

It doesn't actually do anything because it's a pre-format hook so that 
users can override the default behavior.  An overridden __format__ method 
can do one of the following...

     - handle the format spec on its own and return a (string, None) [*]

     - alter the value and return (new_value, format_spec)

     - alter the format_spec and return (value, new_format_spec)

     - do logging of some values, and return the (value, format_spec) unchanged.

     - do something entirely different and return ('', None)

[* None could indicate an already formatted value in this case.]
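
A runnable sketch of the hook protocol listed above. The method name
preformat, the driver format_one, and the example classes are all
hypothetical; the hook is given its own name here so it does not clash
with the real __format__ signature.

```python
class Base:
    def preformat(self, spec):
        # Default hook: change nothing.
        return self, spec


class Secret(Base):
    def preformat(self, spec):
        # Handle the spec entirely on its own: (string, None).
        return "<hidden>", None


class Loud(Base):
    def __init__(self, text):
        self.text = text

    def preformat(self, spec):
        # Alter the value and pass the spec through unchanged.
        return self.text.upper(), spec


def format_one(obj, spec):
    """Hypothetical driver: run the hook, then the default formatters."""
    value, spec = obj.preformat(spec)
    if spec is None:
        return value                      # already formatted by the hook
    if spec.startswith("r"):
        # 'r' is just one more default formatter in this scheme.
        return format(repr(value), spec[1:])
    return format(value, spec)


print(format_one(Secret(), ">10"))    # <hidden>
print(format_one(Loud("hi"), ">4"))   #   HI
print(format_one(Loud("hi"), "r"))    # 'HI'
```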


Does this look ok, or would you do it a different way?

If we do it this way, then the 'r' formatter isn't handled any differently 
from any of the others.  The exceptional case is the custom formatters in 
__format__ methods, not the 'r' case.

Cheers,
    Ron

From talin at acm.org  Fri Aug 10 06:39:12 2007
From: talin at acm.org (Talin)
Date: Thu, 09 Aug 2007 21:39:12 -0700
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46BB3CA4.9010904@cornell.edu>
References: <D3DD7578-D515-4F46-A3F3-B96A43AB1FBB@gmail.com>	<B24E6FF9-3080-4494-A377-BE68B9949A52@gmail.com>	<ca471dc20708061906w5957811evfc02cf9f04d7cfde@mail.gmail.com>	<46B7EA06.5040106@v.loewis.de>	<ca471dc20708062056w362a951ma705452868e41472@mail.gmail.com>	<46B7FACC.8030503@v.loewis.de>	<bb8868b90708090842j27a8a238p588846b9fda80c7b@mail.gmail.com>
	<46BB3CA4.9010904@cornell.edu>
Message-ID: <46BBEBF0.80508@acm.org>

Joel Bender wrote:
> Jason Orendorff wrote:
> 
>> Hooks would help with that, or even eliminate the need altogether.
> 
> IMHO, having a __bytes__ method would go a long way.

This would be better done with generic functions once we have them.

In general, I feel it's better not to embed knowledge of a particular 
serialization scheme in an object. Otherwise, you'd end up with every 
class having to know about 'pickle' and 'shelve' and 'marshal' and 
'JSON' and 'serialize-to-XML' and every other weird serialization format 
that people come up with.

Instead, this is exactly what GFs are good for, so that neither the 
object nor the serializer have to handle the N*M combinations of the two.
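
A sketch of the generic-function idea, using functools.singledispatch
as a stand-in. That decorator was added to the standard library well
after this thread, so it is an assumption here rather than the
mechanism being proposed; to_bytes and Point are hypothetical names.

```python
from functools import singledispatch


@singledispatch
def to_bytes(obj):
    # Fallback when no conversion has been registered for the type.
    raise TypeError("no bytes conversion for %s" % type(obj).__name__)


@to_bytes.register(str)
def _(obj):
    return obj.encode("utf-8")


@to_bytes.register(int)
def _(obj):
    return str(obj).encode("ascii")


class Point:
    """Knows nothing about serialization."""

    def __init__(self, x, y):
        self.x, self.y = x, y


# The conversion lives outside both the class and the serializer.
@to_bytes.register(Point)
def _(obj):
    return to_bytes("%d,%d" % (obj.x, obj.y))


print(to_bytes("abc"))         # b'abc'
print(to_bytes(42))            # b'42'
print(to_bytes(Point(1, 2)))   # b'1,2'
```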

Of course, this criticism also works against having a __str__ method, 
instead of simply defining 'str()' as a GF. And there is some validity 
to that point. But for historical reasons, we're not likely to change 
it. And there's also some validity to the argument that a 'printable' 
representation is the one universal converter that deserves special status.

-- Talin

From g.brandl at gmx.net  Fri Aug 10 07:10:00 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Fri, 10 Aug 2007 07:10:00 +0200
Subject: [Python-3000] Console encoding detection broken
Message-ID: <f9grv8$1eg$1@sea.gmane.org>

Well, subject says it all. While 2.5 sets sys.std*.encoding correctly to
UTF-8, 3k sets it to 'latin-1', breaking output of Unicode strings.

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From nnorwitz at gmail.com  Fri Aug 10 07:37:48 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Thu, 9 Aug 2007 22:37:48 -0700
Subject: [Python-3000] Move to a "py3k" branch *DONE*
In-Reply-To: <ca471dc20708090743k30562834uee2abd624f2f41b5@mail.gmail.com>
References: <ca471dc20708090743k30562834uee2abd624f2f41b5@mail.gmail.com>
Message-ID: <ee2a432c0708092237i3b1cf045ied38af29c75d394c@mail.gmail.com>

On 8/9/07, Guido van Rossum <guido at python.org> wrote:
> This is done. The new py3k branch is ready for business.
>
> Left to do:
>
> - switch the buildbot and the doc builder to use the new branch (Neal)

I've updated to use the new branch.  I got the docs building, but
there are many more problems.  I won't re-enable the cronjob until
more things are working.

> There are currently about 7 failing unit tests left:
>
> test_bsddb
> test_bsddb3
> test_email
> test_email_codecs
> test_email_renamed
> test_sqlite
> test_urllib2_localnet

Ok, I disabled these, so if only they fail, mail shouldn't be sent
(when I enable the script).

There are other problems:
 * had to kill test_poplib due to taking all cpu without progress
 * bunch of tests leak (./python ./Lib/test/regrtest.py -R 4:3:
test_foo test_bar ...)
 * at least one test fails with a fatal error
 * make install fails

Here are the details (probably best to update the wiki with status
before people start working on these):

I'm not sure what was happening with test_poplib.  I had to kill
test_poplib due to taking all cpu without progress.  When I ran it by
itself, it was fine.  So there was some bad interaction with another
test.

Ref leaks and fatal error (see
http://docs.python.org/dev/3.0/results/make-test-refleak.out):
test_array leaked [11, 11, 11] references, sum=33
test_bytes leaked [4, 4, 4] references, sum=12
test_codeccallbacks leaked [21, 21, 21] references, sum=63
test_codecs leaked [260, 260, 260] references, sum=780
test_ctypes leaked [10, 10, 10] references, sum=30
Fatal Python error:
/home/neal/python/py3k/Modules/datetimemodule.c:1175 object at
0xb60b19c8 has negative ref count -4

There are probably more, but I haven't had a chance to run more after
test_datetime.

This failure occurred while running with -R:

test test_coding failed -- Traceback (most recent call last):
  File "/tmp/python-test-3.0/local/lib/python3.0/test/test_coding.py",
line 12, in test_bad_coding2
    self.verify_bad_module(module_name)
  File "/tmp/python-test-3.0/local/lib/python3.0/test/test_coding.py",
line 20, in verify_bad_module
    text = fp.read()
  File "/tmp/python-test-3.0/local/lib/python3.0/io.py", line 1148, in read
    res += decoder.decode(self.buffer.read(), True)
  File "/tmp/python-test-3.0/local/lib/python3.0/encodings/ascii.py",
line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position
0: ordinal not in range(128)


See http://docs.python.org/dev/3.0/results/make-install.out for this failure:

Compiling /tmp/python-test-3.0/local/lib/python3.0/test/test_pep263.py ...
Traceback (most recent call last):
  File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line
162, in <module>
    exit_status = int(not main())
  File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line
152, in main
    force, rx, quiet):
  File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line
89, in compile_dir
    if not compile_dir(fullname, maxlevels - 1, dfile, force, rx, quiet):
  File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line
65, in compile_dir
    ok = py_compile.compile(fullname, None, dfile, True)
  File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line
144, in compile
    py_exc = PyCompileError(err.__class__,err.args,dfile or file)
  File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line
49, in __init__
    tbtext = ''.join(traceback.format_exception_only(exc_type, exc_value))
  File "/tmp/python-test-3.0/local/lib/python3.0/traceback.py", line
179, in format_exception_only
    filename = value.filename or "<string>"
AttributeError: 'tuple' object has no attribute 'filename'

I'm guessing this came from the change in exception args handling?

  File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line
144, in compile
    py_exc = PyCompileError(err.__class__,err.args,dfile or file)
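
A small sketch of why passing err.args would produce that
AttributeError, assuming the guess above is right. The SyntaxError is
constructed by hand purely for illustration: the exception instance
carries .filename, while its .args tuple does not.

```python
import traceback

# Hand-built SyntaxError: (message, (filename, lineno, offset, text)).
err = SyntaxError("invalid syntax", ("bad.py", 1, 5, "x = = 1\n"))

print(err.filename)                     # bad.py
print(hasattr(err.args, "filename"))    # False

# Passing the exception instance (not err.args) formats cleanly.
lines = traceback.format_exception_only(type(err), err)
print(lines[-1], end="")                # SyntaxError: invalid syntax
```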

n

From theller at ctypes.org  Fri Aug 10 07:49:51 2007
From: theller at ctypes.org (Thomas Heller)
Date: Fri, 10 Aug 2007 07:49:51 +0200
Subject: [Python-3000] Move to a "py3k" branch *DONE*
In-Reply-To: <ee2a432c0708092237i3b1cf045ied38af29c75d394c@mail.gmail.com>
References: <ca471dc20708090743k30562834uee2abd624f2f41b5@mail.gmail.com>
	<ee2a432c0708092237i3b1cf045ied38af29c75d394c@mail.gmail.com>
Message-ID: <f9gu9v$55i$1@sea.gmane.org>

Neal Norwitz schrieb:
> On 8/9/07, Guido van Rossum <guido at python.org> wrote:
>> This is done. The new py3k branch is ready for business.
>>
>> Left to do:
>>
>> - switch the buildbot and the doc builder to use the new branch (Neal)

Shouldn't there be a py3k buildbot at http://www.python.org/dev/buildbot/
as well?

Thomas


From nnorwitz at gmail.com  Fri Aug 10 08:15:43 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Thu, 9 Aug 2007 23:15:43 -0700
Subject: [Python-3000] Move to a "py3k" branch *DONE*
In-Reply-To: <f9gu9v$55i$1@sea.gmane.org>
References: <ca471dc20708090743k30562834uee2abd624f2f41b5@mail.gmail.com>
	<ee2a432c0708092237i3b1cf045ied38af29c75d394c@mail.gmail.com>
	<f9gu9v$55i$1@sea.gmane.org>
Message-ID: <ee2a432c0708092315r6a55f4cer8912687333122f73@mail.gmail.com>

On 8/9/07, Thomas Heller <theller at ctypes.org> wrote:
> Neal Norwitz schrieb:
> > On 8/9/07, Guido van Rossum <guido at python.org> wrote:
> >> This is done. The new py3k branch is ready for business.
> >>
> >> Left to do:
> >>
> >> - switch the buildbot and the doc builder to use the new branch (Neal)
>
> Shouldn't there be a py3k buildbot at http://www.python.org/dev/buildbot/
> as well?

I plan to add one, but things still need to settle down more first.
As long as there are failing tests, it's not too worthwhile,
especially with all the other failures.

My plan is to get things working really well on one platform first and
then enable the buildbots.  We still have quite a bit of work to do
(see my previous mail in this thread).

n

From martin at v.loewis.de  Fri Aug 10 08:17:32 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 10 Aug 2007 08:17:32 +0200
Subject: [Python-3000] Console encoding detection broken
In-Reply-To: <f9grv8$1eg$1@sea.gmane.org>
References: <f9grv8$1eg$1@sea.gmane.org>
Message-ID: <46BC02FC.6080107@v.loewis.de>

Georg Brandl schrieb:
> Well, subject says it all. While 2.5 sets sys.std*.encoding correctly to
> UTF-8, 3k sets it to 'latin-1', breaking output of Unicode strings.

And not surprisingly so: io.py says

        if encoding is None:
            # XXX This is questionable
            encoding = sys.getfilesystemencoding() or "latin-1"

First, at the point where this call is made, sys.getfilesystemencoding()
still returns None; second, the code is broken because
getfilesystemencoding is not the correct value for sys.stdout.encoding.
Instead, the way it should be computed is:

1. On Unix, use the same value that sys.getfilesystemencoding will get,
   namely the result of nl_langinfo(CODESET); if that is not available,
   fall back - to anything, but the most logical choices are UTF-8
   (if you want output to always succeed) and ASCII (if you don't want
   to risk mojibake).
2. On Windows, if output is to a terminal, use GetConsoleOutputCP.
   Else fall back, probably to CP_ACP (ie. "mbcs")
3. On OSX, I don't know. If output is to a terminal, UTF-8 may be
   a good bet (although some people operate their Terminal.apps
   not in UTF-8; there is no way to find out). Otherwise, use the
   locale's encoding - not sure how to find out what that is.
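
A sketch of step 1 only (the Unix branch), with the Windows and OSX
cases left out. The function name guess_stream_encoding and the choice
of UTF-8 as the fallback are assumptions; nl_langinfo reports the
codeset of whatever LC_CTYPE locale is currently in effect.

```python
import codecs
import locale


def guess_stream_encoding(fallback="utf-8"):
    """Hypothetical helper: locale codeset on Unix, else the fallback."""
    nl_langinfo = getattr(locale, "nl_langinfo", None)   # absent on Windows
    codeset = getattr(locale, "CODESET", None)
    if nl_langinfo is None or codeset is None:
        return fallback
    name = nl_langinfo(codeset)
    if not name:
        return fallback
    try:
        # Validate and normalise; the C library may report a codeset
        # name that Python has no codec for.
        return codecs.lookup(name).name
    except LookupError:
        return fallback


print(guess_stream_encoding())
```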

Regards,
Martin

From nnorwitz at gmail.com  Fri Aug 10 08:31:13 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Thu, 9 Aug 2007 23:31:13 -0700
Subject: [Python-3000] Move to a "py3k" branch *DONE*
In-Reply-To: <ee2a432c0708092237i3b1cf045ied38af29c75d394c@mail.gmail.com>
References: <ca471dc20708090743k30562834uee2abd624f2f41b5@mail.gmail.com>
	<ee2a432c0708092237i3b1cf045ied38af29c75d394c@mail.gmail.com>
Message-ID: <ee2a432c0708092331s45df1950yd1bfe14bac22fdbb@mail.gmail.com>

I wonder if a lot of the refleaks may have the same cause as this one:

  b'\xff'.decode("utf8", "ignore")

No leaks jumped out at me.  Here are the rest of the leaks that have
been reported so far.  I don't know how many have the same cause.

test_multibytecodec leaked [72, 72, 72] references, sum=216
test_parser leaked [5, 5, 5] references, sum=15

The other failures that occurred with -R:

test test_collections failed -- errors occurred; run in verbose mode for details

test test_gzip failed -- Traceback (most recent call last):
  File "/home/neal/python/dev/py3k/Lib/test/test_gzip.py", line 77, in
test_many_append
    ztxt = zgfile.read(8192)
  File "/home/neal/python/dev/py3k/Lib/gzip.py", line 236, in read
    self._read(readsize)
  File "/home/neal/python/dev/py3k/Lib/gzip.py", line 301, in _read
    self._read_eof()
  File "/home/neal/python/dev/py3k/Lib/gzip.py", line 317, in _read_eof
    crc32 = read32(self.fileobj)
  File "/home/neal/python/dev/py3k/Lib/gzip.py", line 40, in read32
    return struct.unpack("<l", input.read(4))[0]
  File "/home/neal/python/dev/py3k/Lib/struct.py", line 97, in unpack
    return o.unpack(s)
struct.error: unpack requires a string argument of length 4
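
What that struct.error means, as a sketch: unpack("<l", ...) needs
exactly four bytes, so a short read at the end of the stream fails.
Whether a short read is what actually happened in this run is not
known; the guarded read32 below is a hypothetical variant, not the
code in gzip.py.

```python
import io
import struct


def read32(stream):
    """Read a little-endian signed 32-bit integer, checking the length."""
    data = stream.read(4)
    if len(data) != 4:
        # Report the short read instead of letting struct.unpack fail.
        raise EOFError("expected 4 bytes, got %d" % len(data))
    return struct.unpack("<l", data)[0]


print(read32(io.BytesIO(b"\x01\x00\x00\x00")))   # 1
```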

test test_runpy failed -- Traceback (most recent call last):
  File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 230,
in test_run_module
    self._check_module(depth)
  File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 168,
in _check_module
    d2 = run_module(mod_name) # Read from bytecode
  File "/home/neal/python/dev/py3k/Lib/runpy.py", line 72, in run_module
    raise ImportError("No module named %s" % mod_name)
ImportError: No module named runpy_test

test_textwrap was the last test to complete.  test_thread was still running.

n

From martin at v.loewis.de  Fri Aug 10 08:55:34 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 10 Aug 2007 08:55:34 +0200
Subject: [Python-3000] idle3.0 - is is supposed to work?
In-Reply-To: <87lkckdyk6.fsf@hydra.hampton.thirdcreek.com>
References: <tkrat.1e8187685ad86255@igpm.rwth-aachen.de>	<87tzr8ei80.fsf@hydra.hampton.thirdcreek.com>	<ca471dc20708091031k3ed03d8ap6c211706c4f19f7b@mail.gmail.com>	<46BB96DF.5060305@v.loewis.de>	<87ps1we3ak.fsf@hydra.hampton.thirdcreek.com>	<46BB9CD7.2030301@v.loewis.de>
	<87lkckdyk6.fsf@hydra.hampton.thirdcreek.com>
Message-ID: <46BC0BE6.90908@v.loewis.de>

>>> OTOH, IDLE ran w/o this error in p3yk...
>> Yes. Somebody would have to study what precisely the problem is: is it
>> that there is a None key in that dictionary, and that you must not use
>> None as a tag name? In that case: where does the None come from?
>> Or else: is it that you can use None as a tagname in 2.x, but can't
>> anymore in 3.0? If so: why not?
> 
> OK, I'll start looking at it.

So did I, somewhat. It looks like a genuine bug in IDLE to me: you
can't use None as a tag name, AFAIU. I'm not quite sure why this
doesn't cause an exception in 2.x; if I try to give a None tag
separately (i.e. in a stand-alone program) in 2.5,
it gives me the same exception.

Regards,
Martin


From greg.ewing at canterbury.ac.nz  Fri Aug 10 09:21:04 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 10 Aug 2007 19:21:04 +1200
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46BBBEC6.5030705@trueblade.com>
References: <46B13ADE.7080901@acm.org>
	<05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org>
	<ca471dc20708021130le2e295es8f30ccc987d3de1b@mail.gmail.com>
	<46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com>
	<46B35295.1030007@acm.org>
	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>
	<46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz>
	<46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com>
	<46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com>
	<46B5FBD9.4020301@acm.org> <46BBBEC6.5030705@trueblade.com>
Message-ID: <46BC11E0.9040201@canterbury.ac.nz>

Eric V. Smith wrote:
> If you want:
> 
> x = 3
> "{0:f}".format(x)
> 
> then be explicit and write:
> 
> "{0:f}".format(float(x))

That would be quite inconvenient, I think. It's very common
to use ints and floats interchangeably in contexts which
are conceptually float. The rest of the language facilitates
this, and it would be a nuisance if it didn't extend to
formatting.

So I think that ints and floats should know a little about
each other's format specs, just enough to know when to
delegate to each other.
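
A minimal sketch of that delegation, assuming float-style presentation
codes can be recognised by their final character. DelegatingInt and
FLOAT_CODES are hypothetical names used for illustration only; they are
not part of PEP 3101:

```python
class DelegatingInt(int):
    """Hypothetical int subclass that hands float-style specs to float."""

    # Assumption: these trailing characters mark a float presentation type.
    FLOAT_CODES = frozenset("eEfFgG%")

    def __format__(self, spec):
        if spec and spec[-1] in self.FLOAT_CODES:
            # Delegate: let float interpret the whole specifier.
            return format(float(self), spec)
        # Otherwise fall back to plain int formatting.
        return format(int(self), spec)


print(format(DelegatingInt(3), ".2f"))
print(format(DelegatingInt(3), "d"))
```

The int type only needs to recognise that a spec is "not mine"; it does
not have to parse the float spec itself.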

> And once you decide that the entire specifier is interpreted only by the 
> type, you no longer need the default specifier (":f" in this case), and 
> you could just write:
> "{0}".format(float(x))

That can be done.

> I grant that repr() might be a different case, as Greg Ewing points out 
> in a subsequent message.  But maybe we use something other than a colon 
> to get __repr__ called, like:
> "{`0`}".format(float(x))

I'm inclined to think the best thing is just to declare
that 'r' is special and gets intercepted by format().
It means that __format__ methods don't *quite* get
complete control, but I think it would be a practical
solution.

Another way would be to ditch 'r' completely and just
tell people to wrap repr() around their arguments if
they want it. That might be seen as a backward step
in terms of convenience, though.

--
Greg

From nnorwitz at gmail.com  Fri Aug 10 09:31:29 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Fri, 10 Aug 2007 00:31:29 -0700
Subject: [Python-3000] Move to a "py3k" branch *DONE*
In-Reply-To: <ee2a432c0708092331s45df1950yd1bfe14bac22fdbb@mail.gmail.com>
References: <ca471dc20708090743k30562834uee2abd624f2f41b5@mail.gmail.com>
	<ee2a432c0708092237i3b1cf045ied38af29c75d394c@mail.gmail.com>
	<ee2a432c0708092331s45df1950yd1bfe14bac22fdbb@mail.gmail.com>
Message-ID: <ee2a432c0708100031x3424e3e9v716e50c7464e4912@mail.gmail.com>

Bah, who needs sleep anyways.  This list of problems should be fairly
complete when running with -R.  (it skips the fatal error from
test_datetime though)

Code to trigger a leak:   b'\xff'.decode("utf8", "ignore")

Leaks:
test_array leaked [11, 11, 11] references, sum=33
test_bytes leaked [4, 4, 4] references, sum=12
test_codeccallbacks leaked [21, 21, 21] references, sum=63
test_codecs leaked [260, 260, 260] references, sum=780
test_ctypes leaked [-22, 43, 10] references, sum=31
test_multibytecodec leaked [72, 72, 72] references, sum=216
test_parser leaked [5, 5, 5] references, sum=15
test_unicode leaked [4, 4, 4] references, sum=12
test_xml_etree leaked [128, 128, 128] references, sum=384
test_xml_etree_c leaked [128, 128, 128] references, sum=384
test_zipimport leaked [29, 29, 29] references, sum=87
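
Each bracketed triple above is one reference-count delta per measured
run. A rough sketch of that kind of measurement follows. It assumes a
debug build where sys.gettotalrefcount() exists, and it assumes that a
4:3 setting means four warm-up runs followed by three measured ones.
refcount_deltas is a hypothetical helper, not the regrtest code:

```python
import sys


def refcount_deltas(func, warmups=4, runs=3):
    """Return per-run total-refcount deltas, or None on release builds."""
    get_total = getattr(sys, "gettotalrefcount", None)
    if get_total is None:
        # Only debug builds of CPython expose the total reference count.
        return None
    for _ in range(warmups):
        func()  # let caches and interned objects settle
    deltas = []
    for _ in range(runs):
        before = get_total()
        func()
        deltas.append(get_total() - before)
    return deltas


deltas = refcount_deltas(lambda: b"\xff".decode("utf8", "ignore"))
print(deltas)
```

A steady non-zero delta across the measured runs is what suggests a
leak; a single noisy value usually does not.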

Failures with -R:

test test_collections failed -- errors occurred; run in verbose mode for details

test test_gzip failed -- Traceback (most recent call last):
 File "/home/neal/python/dev/py3k/Lib/test/test_gzip.py", line 77, in
test_many_append
   ztxt = zgfile.read(8192)
 File "/home/neal/python/dev/py3k/Lib/gzip.py", line 236, in read
   self._read(readsize)
 File "/home/neal/python/dev/py3k/Lib/gzip.py", line 301, in _read
   self._read_eof()
 File "/home/neal/python/dev/py3k/Lib/gzip.py", line 317, in _read_eof
   crc32 = read32(self.fileobj)
 File "/home/neal/python/dev/py3k/Lib/gzip.py", line 40, in read32
   return struct.unpack("<l", input.read(4))[0]
 File "/home/neal/python/dev/py3k/Lib/struct.py", line 97, in unpack
   return o.unpack(s)
struct.error: unpack requires a string argument of length 4

test test_runpy failed -- Traceback (most recent call last):
 File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 230,
in test_run_module
   self._check_module(depth)
 File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 168,
in _check_module
   d2 = run_module(mod_name) # Read from bytecode
 File "/home/neal/python/dev/py3k/Lib/runpy.py", line 72, in run_module
   raise ImportError("No module named %s" % mod_name)
ImportError: No module named runpy_test

test test_shelve failed -- errors occurred; run in verbose mode for details

test test_structmembers failed -- errors occurred; run in verbose mode
for details

test_univnewlines skipped -- This Python does not have universal newline support

Traceback (most recent call last):
  File "/home/neal/python/dev/py3k/Lib/SocketServer.py", line 222, in
handle_request
    self.process_request(request, client_address)
  File "/home/neal/python/dev/py3k/Lib/SocketServer.py", line 241, in
process_request
    self.finish_request(request, client_address)
  File "/home/neal/python/dev/py3k/Lib/SocketServer.py", line 254, in
finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/home/neal/python/dev/py3k/Lib/SocketServer.py", line 522, in __init__
    self.handle()
  File "/home/neal/python/dev/py3k/Lib/BaseHTTPServer.py", line 316, in handle
    self.handle_one_request()
  File "/home/neal/python/dev/py3k/Lib/BaseHTTPServer.py", line 303,
in handle_one_request
    if not self.parse_request(): # An error code has been sent, just exit
  File "/home/neal/python/dev/py3k/Lib/BaseHTTPServer.py", line 281,
in parse_request
    self.headers = self.MessageClass(self.rfile, 0)
  File "/home/neal/python/dev/py3k/Lib/mimetools.py", line 16, in __init__
    rfc822.Message.__init__(self, fp, seekable)
  File "/home/neal/python/dev/py3k/Lib/rfc822.py", line 104, in __init__
    self.readheaders()
  File "/home/neal/python/dev/py3k/Lib/rfc822.py", line 172, in readheaders
    headerseen = self.isheader(line)
  File "/home/neal/python/dev/py3k/Lib/rfc822.py", line 202, in isheader
    return line[:i].lower()
AttributeError: 'bytes' object has no attribute 'lower'

On 8/9/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
> I wonder if a lot of the refleaks may have the same cause as this one:
>
>   b'\xff'.decode("utf8", "ignore")
>
> No leaks jumped out at me.  Here is the rest of the leaks that have
> been reported so far.  I don't know how many have the same cause.
>
> test_multibytecodec leaked [72, 72, 72] references, sum=216
> test_parser leaked [5, 5, 5] references, sum=15
>
> The other failures that occurred with -R:
>
> test test_collections failed -- errors occurred; run in verbose mode for details
>
> test test_gzip failed -- Traceback (most recent call last):
>   File "/home/neal/python/dev/py3k/Lib/test/test_gzip.py", line 77, in
> test_many_append
>     ztxt = zgfile.read(8192)
>   File "/home/neal/python/dev/py3k/Lib/gzip.py", line 236, in read
>     self._read(readsize)
>   File "/home/neal/python/dev/py3k/Lib/gzip.py", line 301, in _read
>     self._read_eof()
>   File "/home/neal/python/dev/py3k/Lib/gzip.py", line 317, in _read_eof
>     crc32 = read32(self.fileobj)
>   File "/home/neal/python/dev/py3k/Lib/gzip.py", line 40, in read32
>     return struct.unpack("<l", input.read(4))[0]
>   File "/home/neal/python/dev/py3k/Lib/struct.py", line 97, in unpack
>     return o.unpack(s)
> struct.error: unpack requires a string argument of length 4
>
> test test_runpy failed -- Traceback (most recent call last):
>   File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 230,
> in test_run_module
>     self._check_module(depth)
>   File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 168,
> in _check_module
>     d2 = run_module(mod_name) # Read from bytecode
>   File "/home/neal/python/dev/py3k/Lib/runpy.py", line 72, in run_module
>     raise ImportError("No module named %s" % mod_name)
> ImportError: No module named runpy_test
>
> test_textwrap was the last test to complete.  test_thread was still running.
>
> n
> --
> On 8/9/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
> > On 8/9/07, Guido van Rossum <guido at python.org> wrote:
> > > This is done. The new py3k branch is ready for business.
> > >
> > > Left to do:
> > >
> > > - switch the buildbot and the doc builder to use the new branch (Neal)
> >
> > I've updated to use the new branch.  I got the docs building, but
> > there are many more problems.  I won't re-enable the cronjob until
> > more things are working.
> >
> > > There are currently about 7 failing unit tests left:
> > >
> > > test_bsddb
> > > test_bsddb3
> > > test_email
> > > test_email_codecs
> > > test_email_renamed
> > > test_sqlite
> > > test_urllib2_localnet
> >
> > Ok, I disabled these, so if only they fail, mail shouldn't be sent
> > (when I enable the script).
> >
> > There are other problems:
> >  * had to kill test_poplib due to taking all cpu without progress
> >  * bunch of tests leak (./python ./Lib/test/regrtest.py -R 4:3:
> > test_foo test_bar ...)
> >  * at least one test fails with a fatal error
> >  * make install fails
> >
> > Here are the details (probably best to update the wiki with status
> > before people start working on these):
> >
> > I'm not sure what was happening with test_poplib.  I had to kill
> > test_poplib due to taking all cpu without progress.  When I ran it by
> > itself, it was fine.  So there was some bad interaction with another
> > test.
> >
> > Ref leaks and fatal error (see
> > http://docs.python.org/dev/3.0/results/make-test-refleak.out):
> > test_array leaked [11, 11, 11] references, sum=33
> > test_bytes leaked [4, 4, 4] references, sum=12
> > test_codeccallbacks leaked [21, 21, 21] references, sum=63
> > test_codecs leaked [260, 260, 260] references, sum=780
> > test_ctypes leaked [10, 10, 10] references, sum=30
> > Fatal Python error:
> > /home/neal/python/py3k/Modules/datetimemodule.c:1175 object at
> > 0xb60b19c8 has negative ref count -4
> >
> > There are probably more, but I haven't had a chance to run more after
> > test_datetime.
> >
> > This failure occurred while running with -R:
> >
> > test test_coding failed -- Traceback (most recent call last):
> >   File "/tmp/python-test-3.0/local/lib/python3.0/test/test_coding.py",
> > line 12, in test_bad_coding2
> >     self.verify_bad_module(module_name)
> >   File "/tmp/python-test-3.0/local/lib/python3.0/test/test_coding.py",
> > line 20, in verify_bad_module
> >     text = fp.read()
> >   File "/tmp/python-test-3.0/local/lib/python3.0/io.py", line 1148, in read
> >     res += decoder.decode(self.buffer.read(), True)
> >   File "/tmp/python-test-3.0/local/lib/python3.0/encodings/ascii.py",
> > line 26, in decode
> >     return codecs.ascii_decode(input, self.errors)[0]
> > UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position
> > 0: ordinal not in range(128)
> >
> >
> > See http://docs.python.org/dev/3.0/results/make-install.out for this failure:
> >
> > Compiling /tmp/python-test-3.0/local/lib/python3.0/test/test_pep263.py ...
> > Traceback (most recent call last):
> >   File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line
> > 162, in <module>
> >     exit_status = int(not main())
> >   File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line
> > 152, in main
> >     force, rx, quiet):
> >   File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line
> > 89, in compile_dir
> >     if not compile_dir(fullname, maxlevels - 1, dfile, force, rx, quiet):
> >   File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line
> > 65, in compile_dir
> >     ok = py_compile.compile(fullname, None, dfile, True)
> >   File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line
> > 144, in compile
> >     py_exc = PyCompileError(err.__class__,err.args,dfile or file)
> >   File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line
> > 49, in __init__
> >     tbtext = ''.join(traceback.format_exception_only(exc_type, exc_value))
> >   File "/tmp/python-test-3.0/local/lib/python3.0/traceback.py", line
> > 179, in format_exception_only
> >     filename = value.filename or "<string>"
> > AttributeError: 'tuple' object has no attribute 'filename'
> >
> > I'm guessing this came from the change in exception args handling?
> >
> >   File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line
> > 144, in compile
> >     py_exc = PyCompileError(err.__class__,err.args,dfile or file)
> >
> > n
> >
>

From greg.ewing at canterbury.ac.nz  Fri Aug 10 09:32:09 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 10 Aug 2007 19:32:09 +1200
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46BBEB16.2040205@ronadam.com>
References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org>
	<46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org>
	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>
	<46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz>
	<46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com>
	<46B52422.2090006@canterbury.ac.nz>
	<ca471dc20708041909j3f86b107hace9407673c302ea@mail.gmail.com>
	<46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz>
	<46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz>
	<46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org>
	<46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org>
	<46BAEFB0.9050400@ronadam.com> <46BBB1AE.5010207@canterbury.ac.nz>
	<46BBEB16.2040205@ronadam.com>
Message-ID: <46BC1479.30405@canterbury.ac.nz>

Ron Adam wrote:
>     - alter the value and return (new_value, format_spec)
>     - alter the format_spec and return (value, new_format_spec)
>     - do logging of some values, and return the (value, format_spec) 
> unchanged.

I would ditch all of these. They're not necessary, as
the same effect can be achieved by explicitly calling
another __format__ method, or one's own __format__
method with different args, and returning the result.

>     - do something entirely different and return ('', None)

I don't understand. What is meant to happen in that case?

> Does this look ok, or would you do it a different way?

You haven't explained how this addresses the 'r' issue
without requiring every __format__ method to recognise
and deal with it.

--
Greg


From brett at python.org  Fri Aug 10 09:45:33 2007
From: brett at python.org (Brett Cannon)
Date: Fri, 10 Aug 2007 00:45:33 -0700
Subject: [Python-3000] Move to a "py3k" branch *DONE*
In-Reply-To: <ee2a432c0708092237i3b1cf045ied38af29c75d394c@mail.gmail.com>
References: <ca471dc20708090743k30562834uee2abd624f2f41b5@mail.gmail.com>
	<ee2a432c0708092237i3b1cf045ied38af29c75d394c@mail.gmail.com>
Message-ID: <bbaeab100708100045i462ede60s9d512ba84e6c96d0@mail.gmail.com>

On 8/9/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
[SNIP]
> See http://docs.python.org/dev/3.0/results/make-install.out for this failure:
>
> Compiling /tmp/python-test-3.0/local/lib/python3.0/test/test_pep263.py ...
> Traceback (most recent call last):
>   File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line
> 162, in <module>
>     exit_status = int(not main())
>   File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line
> 152, in main
>     force, rx, quiet):
>   File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line
> 89, in compile_dir
>     if not compile_dir(fullname, maxlevels - 1, dfile, force, rx, quiet):
>   File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line
> 65, in compile_dir
>     ok = py_compile.compile(fullname, None, dfile, True)
>   File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line
> 144, in compile
>     py_exc = PyCompileError(err.__class__,err.args,dfile or file)
>   File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line
> 49, in __init__
>     tbtext = ''.join(traceback.format_exception_only(exc_type, exc_value))
>   File "/tmp/python-test-3.0/local/lib/python3.0/traceback.py", line
> 179, in format_exception_only
>     filename = value.filename or "<string>"
> AttributeError: 'tuple' object has no attribute 'filename'
>
> I'm guessing this came from the change in exception args handling?

What change are you thinking of?  'args' was not changed, only the
removal of 'message'.
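
One possible reading of the traceback, offered as an assumption rather
than a diagnosis: py_compile passes err.args (a tuple) where
traceback.format_exception_only expects the exception instance, and
only the instance carries a filename attribute. The sketch below shows
just that difference; the file name example.py is made up:

```python
try:
    compile("x = (", "example.py", "exec")
except SyntaxError as err:
    caught = err  # keep a reference; 'err' is cleared after the block

# The args tuple is plain data and has no 'filename' attribute,
# while the exception instance itself does.
print(type(caught.args).__name__)
print(hasattr(caught.args, "filename"))
print(caught.filename)
```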

-Brett

From lists at cheimes.de  Fri Aug 10 10:57:26 2007
From: lists at cheimes.de (Christian Heimes)
Date: Fri, 10 Aug 2007 10:57:26 +0200
Subject: [Python-3000] tp_bytes and __bytes__ magic method
In-Reply-To: <ca471dc20708081700s208617cbneeb4d811c50a108c@mail.gmail.com>
References: <f9dfo1$b7e$1@sea.gmane.org>	<ca471dc20708081554x834b59cpcde5c9b1d9ca3c4a@mail.gmail.com>	<f9dkqg$off$1@sea.gmane.org>
	<ca471dc20708081700s208617cbneeb4d811c50a108c@mail.gmail.com>
Message-ID: <f9h99n$2bi$1@sea.gmane.org>

Guido van Rossum wrote:
> This could just as well be done using a method on that specific
> object. I don't think having to write x.as_bytes() is worse than
> bytes(x), *unless* there are contexts where it's important to convert
> something to bytes without knowing what kind of thing it is. For
> str(), such a context exists: print(). For bytes(), I'm not so sure.
> The use cases given here seem to be either very specific to a certain
> class, or could be solved using other generic APIs like pickling.


I see your point. Since nobody besides Victor and me is interested
in __bytes__, I retract my proposal. Thanks for your time.

Christian


From lists at cheimes.de  Fri Aug 10 11:20:23 2007
From: lists at cheimes.de (Christian Heimes)
Date: Fri, 10 Aug 2007 11:20:23 +0200
Subject: [Python-3000] No (C) optimization flag
Message-ID: <f9hakp$6n4$1@sea.gmane.org>

Good morning py3k-dev!

If I understand correctly, the new C optimization for io.py by Alexandre
Vassalotti and possibly other optimizations for modules like pickle.py
are going to be dropped in automatically. The Python implementation is a
reference implementation and will be used as a fallback only.

On the one hand it is an improvement. We are getting instant
optimization without teaching people to use a cFoo module. But on the
other hand it is going to make debugging with pdb much harder because
pdb can't step into C code.

I'd like to propose a --disable-optimization (-N for no optimization) flag
for Python that disables the use of optimized implementations. The
status of the flag can be set by either a command line argument or a C
function call before Py_Initialize() and it can be queried by
sys.getoptimization(). It's not possible to change the flag during
runtime. That should make the code simple and straightforward.

When the flag is set, modules like io and pickle must not use their
optimized versions and must fall back to their Python implementations. I'm
willing to try writing the necessary code myself. I think I'm
familiar enough with the Python C API after my work on PythonNet for
this simple task.
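
The fall-back behaviour described above might look like the following
at the module level. This is a sketch only: load_implementation and its
allow_optimized parameter are hypothetical stand-ins for the proposed
flag, and the module names in the usage lines are illustrative:

```python
import importlib


def load_implementation(fast_name, pure_name, allow_optimized=True):
    """Import the accelerated module if allowed, else the Python one."""
    if allow_optimized:
        try:
            return importlib.import_module(fast_name)
        except ImportError:
            # No accelerator available: fall through to the reference code.
            pass
    return importlib.import_module(pure_name)


# The accelerator name below does not exist, so the fallback is used.
mod = load_implementation("_no_such_accelerator", "json")
print(mod.__name__)
```

With allow_optimized=False the accelerated module is never tried, which
is the behaviour the proposed flag would select globally.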

Christian


From victor.stinner at haypocalc.com  Fri Aug 10 12:09:30 2007
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Fri, 10 Aug 2007 12:09:30 +0200
Subject: [Python-3000] bytes regular expression?
In-Reply-To: <f9fien$ccf$1@sea.gmane.org>
References: <200708090427.19830.victor.stinner@haypocalc.com>
	<200708091740.59070.victor.stinner@haypocalc.com>
	<f9fien$ccf$1@sea.gmane.org>
Message-ID: <200708101209.30281.victor.stinner@haypocalc.com>

On Thursday 09 August 2007 19:21:27 Thomas Heller wrote:
> Victor Stinner schrieb:
> > I prefer str8 which looks to be a good candidate for "frozenbytes" type.
>
> I love this idea!  Leave str8 as it is, maybe extend Python so that it
> understands the s"..." literals and we are done.

Hmm, today str8 sits between the bytes and str types. str8 has more methods (e.g. 
lower()) than bytes, it behaves differently in comparisons (b'a' != 'a' 
but str8('a') == 'a'), and issubclass(str8, basestring) is True.

I think that a frozenbytes type is required for backward compatibility (in 
Python 2.x, "a" is immutable), e.g. to use bytes as a key in a dictionary (which 
seems to be needed in the re and dbm modules).
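
The hashing requirement can be checked directly. The sketch assumes the
split that later Python 3 releases settled on (immutable bytes, mutable
bytearray), which differs from the mutable bytes type under discussion
in this thread:

```python
# Immutable bytes objects are hashable, so they work as dictionary keys.
table = {b"key": "value"}
print(table[b"key"])

# A mutable byte buffer is not hashable and is rejected as a key.
try:
    {bytearray(b"key"): "value"}
except TypeError as exc:
    print("unhashable:", exc)
```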

Victor Stinner aka haypo
http://hachoir.org/

From walter at livinglogic.de  Fri Aug 10 12:12:44 2007
From: walter at livinglogic.de (Walter Dörwald)
Date: Fri, 10 Aug 2007 12:12:44 +0200
Subject: [Python-3000] Move to a "py3k" branch *DONE*
In-Reply-To: <ee2a432c0708100031x3424e3e9v716e50c7464e4912@mail.gmail.com>
References: <ca471dc20708090743k30562834uee2abd624f2f41b5@mail.gmail.com>	<ee2a432c0708092237i3b1cf045ied38af29c75d394c@mail.gmail.com>	<ee2a432c0708092331s45df1950yd1bfe14bac22fdbb@mail.gmail.com>
	<ee2a432c0708100031x3424e3e9v716e50c7464e4912@mail.gmail.com>
Message-ID: <46BC3A1C.205@livinglogic.de>

Neal Norwitz wrote:

> [...]
> Code to trigger a leak:   b'\xff'.decode("utf8", "ignore")

This should be fixed in r56894.

> [...]

Servus,
   Walter

From jimjjewett at gmail.com  Fri Aug 10 16:27:07 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Fri, 10 Aug 2007 10:27:07 -0400
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46BBBEC6.5030705@trueblade.com>
References: <46B13ADE.7080901@acm.org>
	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>
	<46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz>
	<46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com>
	<46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com>
	<46B5FBD9.4020301@acm.org> <46BBBEC6.5030705@trueblade.com>
Message-ID: <fb6fbf560708100727tdecc140heac8481eaef5e5db@mail.gmail.com>

On 8/9/07, Eric V. Smith <eric+python-dev at trueblade.com> wrote:

> If you want:
>
> x = 3
> "{0:f}".format(x)
>
> then be explicit and write:
>
> "{0:f}".format(float(x))

Because then you can't really create formatting strings.  Instead of

    >>> print("The high temperature at {place:s}, on {date:YYYY-MM-DD} "
    ...       "was {temp:f}" % tempsdict)
    >>> print("{name} scored {score:f}" % locals())

You would have to write

    >>> _tempsdict_copy = dict(tempsdict)
    >>> _tempsdict_copy['place'] = str(_tempsdict_copy['place'])
    >>> _tempsdict_copy['date'] = (
    ...         datetime.date(_tempsdict_copy['date']).isoformat())
    >>> _tempsdict_copy['temp'] = float(_tempsdict_copy['temp'])
    >>> print("The high temperature at {place}, on {date} was {temp}"
    ...       % _tempsdict_copy)

    >>> _f_score = float(score)
    >>> print("{name} scored {score}" % locals())
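
For comparison, a runnable version of the first form, assuming the
str.format method as it eventually shipped. The sample values are made
up, and the date field uses strftime codes rather than the YYYY-MM-DD
spelling above:

```python
import datetime

tempsdict = {
    "place": "Oslo",
    "date": datetime.date(2007, 8, 10),
    "temp": 21,  # an int, formatted with a float spec below
}

# Each value is formatted by its own type; no manual conversion is needed.
template = "The high temperature at {place:s}, on {date:%Y-%m-%d} was {temp:f}"
line = template.format(**tempsdict)
print(line)
```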

-jJ

From jimjjewett at gmail.com  Fri Aug 10 16:33:21 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Fri, 10 Aug 2007 10:33:21 -0400
Subject: [Python-3000] No (C) optimization flag
In-Reply-To: <f9hakp$6n4$1@sea.gmane.org>
References: <f9hakp$6n4$1@sea.gmane.org>
Message-ID: <fb6fbf560708100733i5f9bd67g9e4556669525f59c@mail.gmail.com>

On 8/10/07, Christian Heimes <lists at cheimes.de> wrote:

> I'd like to propose a --disable-optimization (-N for no optimization) flag
> for Python that disables the use of optimized implementations. The
> status of the flag can be set by either a command line argument or a C
> function call before Py_Initialize() and it can be queried by
> sys.getoptimization(). It's not possible to change the flag during
> runtime. That should make the code simple and straightforward.

So you want it global and not per-module?

It strikes me as something that ought to be controllable at a
finer-grained level, if only to ensure that regression tests continue
to also test the python version automatically.
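
Per-module control of this kind can be approximated without any
interpreter flag by blocking the accelerator module for the duration of
a test. A sketch, assuming the standard import behaviour that a None
entry in sys.modules makes the import fail; blocked_module and the
module name _fastthing are hypothetical:

```python
import contextlib
import importlib
import sys

_MISSING = object()


@contextlib.contextmanager
def blocked_module(name):
    """Temporarily make 'import name' fail, then restore the old state."""
    saved = sys.modules.get(name, _MISSING)
    sys.modules[name] = None  # a None entry halts the import
    try:
        yield
    finally:
        if saved is _MISSING:
            del sys.modules[name]
        else:
            sys.modules[name] = saved


with blocked_module("_fastthing"):
    try:
        importlib.import_module("_fastthing")
    except ImportError:
        print("accelerator blocked; the pure-Python path would be used")
```

A regression test could run once normally and once inside the context
manager, so that both implementations are exercised.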

-jJ

From guido at python.org  Fri Aug 10 16:44:14 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 10 Aug 2007 07:44:14 -0700
Subject: [Python-3000] No (C) optimization flag
In-Reply-To: <f9hakp$6n4$1@sea.gmane.org>
References: <f9hakp$6n4$1@sea.gmane.org>
Message-ID: <ca471dc20708100744u58df433s11479973620aa93a@mail.gmail.com>

If you really need to step through the Python code, you can just
sabotage the loading of the non-Python version, e.g. remove or rename
the .so or .dll file temporarily.

I wonder about the usefulness of this debugging though -- if you're
debugging something that requires you to step through the C code, how
do you know that the same bug is present in the Python code you're
stepping through instead? Otherwise (if you're debugging a bug in your
own program) I'm not sure I see how stepping through the I/O library
is helpful.

Sounds like what you're really after is *understanding* how the I/O
library works. For that, perhaps reading the docs and then reading the
source code would be more effective.

--Guido

On 8/10/07, Christian Heimes <lists at cheimes.de> wrote:
> Good morning py3k-dev!
>
> If I understand correctly, the new C optimization for io.py by Alexandre
> Vassalotti and possibly other optimizations for modules like pickle.py
> are going to be dropped in automatically. The Python implementation is a
> reference implementation and will be used as a fallback only.
>
> On the one hand it is an improvement. We are getting instant
> optimization without teaching people to use a cFoo module. But on the
> other hand it is going to make debugging with pdb much harder because
> pdb can't step into C code.
>
> I'd like to propose a --disable-optimization (-N for no optimization) flag
> for Python that disables the use of optimized implementations. The
> status of the flag can be set by either a command line argument or a C
> function call before Py_Initialize() and it can be queried by
> sys.getoptimization(). It's not possible to change the flag during
> runtime. That should make the code simple and straightforward.
>
> When the flag is set, modules like io and pickle must not use their
> optimized versions and must fall back to their Python implementations. I'm
> willing to try writing the necessary code myself. I think I'm
> familiar enough with the Python C API after my work on PythonNet for
> this simple task.
>
> Christian
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From eric+python-dev at trueblade.com  Fri Aug 10 17:26:55 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Fri, 10 Aug 2007 11:26:55 -0400
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <fb6fbf560708100727tdecc140heac8481eaef5e5db@mail.gmail.com>
References: <46B13ADE.7080901@acm.org>	
	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>	
	<46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz>	
	<46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com>	
	<46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com>	
	<46B5FBD9.4020301@acm.org> <46BBBEC6.5030705@trueblade.com>
	<fb6fbf560708100727tdecc140heac8481eaef5e5db@mail.gmail.com>
Message-ID: <46BC83BF.3000407@trueblade.com>

Jim Jewett wrote:
> On 8/9/07, Eric V. Smith <eric+python-dev at trueblade.com> wrote:
> 
>> If you want:
>>
>> x = 3
>> "{0:f}".format(x)
>>
>> then be explicit and write:
>>
>> "{0:f}".format(float(x))
> 
> Because then you can't really create formatting strings.  Instead of
> 
>     >>> print("The high temperature at {place:s}, on {date:YYYY-MM-DD}
> was {temp:f}" % tempsdict)
>     >>> print("{name} scored {score:f}" % locals())
> 
> You would have to write
> 
>     >>> _tempsdict_copy = dict(tempsdict)
>     >>> _tempsdict_copy['place'] = str(_tempsdict_copy['place'])
>     >>> _tempsdict_copy['date'] =
>     ...         datetime.date(_tempsdict_copy['date']).isoformat()
>     >>> _tempsdict_copy['temp'] = float(_tempsdict_copy['temp'])
>     >>> print("The high temperature at {place}, on {date} was {temp}"
> % _tempsdict_copy)
> 
>     >>> _f_score = float(score)
>     >>> print("{name} scored {score}" % locals())
> 
> -jJ
> 

I concede your point that while using dictionaries it's convenient not 
to have to convert types manually.

However, your date example wouldn't require conversion to a string, 
since YYYY-MM would just be passed to datetime.date.__format__().  And
"{score}" in your second example would need to be changed to "{_f_score}".


Anyway, if we're keeping conversions, I see two approaches:

1: "".format() (or Talin's format_field, actually) understands which 
types can be converted to other types, and does the conversions.  This 
is how Patrick and I wrote the original PEP 3101 sandbox prototype.

2: each type's __format__ function understands how to convert to some 
subset of all types (int can convert to float and decimal, for example).

I was going to argue for approach 2, but after describing it, it became 
too difficult to understand, and I think I'll instead argue for approach 
1.  The problem with approach 2 is that there's logic in 
int.__format__() that understands float.__format__() specifiers, and 
vice-versa.  At least with approach 1, all of this logic is in one place.

So I think format_field() has logic like:

def format_field(value, specifier):
     # handle special repr case
     if is_repr_specifier(specifier):
         return value.__repr__()

     # handle special string case
     if is_string_specifier(specifier):
         return str(value).__format__(specifier)

     # handle built-in conversions
     if (isinstance(value, (float, basestring))
           and is_int_specifier(specifier)):
         return int(value).__format__(specifier)
     if (isinstance(value, (int, basestring))
           and is_float_specifier(specifier)):
         return float(value).__format__(specifier)

     # handle all other cases
     return value.__format__(specifier)

This implies that string and repr specifiers are discernible across all 
types, and int and float specifiers are unique amongst themselves.  The 
trick, of course, is what's in is_XXX_specifier.
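
To make the open question concrete, here is one runnable guess at the
is_XXX_specifier helpers, written against the str.format machinery as
it eventually shipped. Everything in it is an assumption for
illustration: the helper bodies classify a specifier by its last
character only, and str stands in for basestring:

```python
def is_repr_specifier(spec):
    return spec == "r"


def is_string_specifier(spec):
    return spec.endswith("s")


def is_int_specifier(spec):
    return spec[-1:] in ("b", "d", "o", "x", "X")


def is_float_specifier(spec):
    return spec[-1:] in ("e", "E", "f", "F", "g", "G")


def format_field(value, spec):
    # Special cases first: repr and string specifiers apply to any type.
    if is_repr_specifier(spec):
        return repr(value)
    if is_string_specifier(spec):
        return format(str(value), spec)
    # Built-in conversions between the numeric types and strings.
    if isinstance(value, (float, str)) and is_int_specifier(spec):
        return format(int(value), spec)
    if isinstance(value, (int, str)) and is_float_specifier(spec):
        return format(float(value), spec)
    # Everything else is left to the value's own __format__.
    return format(value, spec)


print(format_field(3, "f"))
print(format_field(3.7, "d"))
print(format_field("x", "r"))
```

Classifying by the last character is the simplest rule that keeps the
int and float specifiers disjoint; a user-defined type whose specifiers
end in one of these letters would be misrouted, which is the weakness of
keeping all of this logic in one place.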

I don't know enough about decimal to know if it's possible or desirable 
to automatically convert it to other types, or other types to it.

Eric.



From nnorwitz at gmail.com  Fri Aug 10 18:27:01 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Fri, 10 Aug 2007 09:27:01 -0700
Subject: [Python-3000] Move to a "py3k" branch *DONE*
In-Reply-To: <bbaeab100708100045i462ede60s9d512ba84e6c96d0@mail.gmail.com>
References: <ca471dc20708090743k30562834uee2abd624f2f41b5@mail.gmail.com>
	<ee2a432c0708092237i3b1cf045ied38af29c75d394c@mail.gmail.com>
	<bbaeab100708100045i462ede60s9d512ba84e6c96d0@mail.gmail.com>
Message-ID: <ee2a432c0708100927x22d9c29dtb3fc37ed46019b4@mail.gmail.com>

On 8/10/07, Brett Cannon <brett at python.org> wrote:
> On 8/9/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
> [SNIP]
> > See http://docs.python.org/dev/3.0/results/make-install.out for this failure:
> >
> > Compiling /tmp/python-test-3.0/local/lib/python3.0/test/test_pep263.py ...
> > Traceback (most recent call last):
> >   File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line
> > 162, in <module>
> >     exit_status = int(not main())
> >   File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line
> > 152, in main
> >     force, rx, quiet):
> >   File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line
> > 89, in compile_dir
> >     if not compile_dir(fullname, maxlevels - 1, dfile, force, rx, quiet):
> >   File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line
> > 65, in compile_dir
> >     ok = py_compile.compile(fullname, None, dfile, True)
> >   File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line
> > 144, in compile
> >     py_exc = PyCompileError(err.__class__,err.args,dfile or file)
> >   File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line
> > 49, in __init__
> >     tbtext = ''.join(traceback.format_exception_only(exc_type, exc_value))
> >   File "/tmp/python-test-3.0/local/lib/python3.0/traceback.py", line
> > 179, in format_exception_only
> >     filename = value.filename or "<string>"
> > AttributeError: 'tuple' object has no attribute 'filename'
> >
> > I'm guessing this came from the change in exception args handling?
>
> What change are you thinking of?  'args' was not changed, only the
> removal of 'message'.

That was probably the change I was thinking of.  Though wasn't there
also a change with unpacking args when catching an exception?

I didn't dig into this problem or the code; it was a guess, so it could
be totally off.  I was really thinking out loud (hence the question),
hoping it might trigger some better ideas (or get people looking into
the problem).

n

From ncoghlan at gmail.com  Fri Aug 10 18:36:16 2007
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 11 Aug 2007 02:36:16 +1000
Subject: [Python-3000] No (C) optimization flag
In-Reply-To: <ca471dc20708100744u58df433s11479973620aa93a@mail.gmail.com>
References: <f9hakp$6n4$1@sea.gmane.org>
	<ca471dc20708100744u58df433s11479973620aa93a@mail.gmail.com>
Message-ID: <46BC9400.70803@gmail.com>

Guido van Rossum wrote:
> If you really need to step through the Python code, you can just
> sabotage the loading of the non-Python version, e.g. remove or rename
> the .so or .dll file temporarily.
> 
> I wonder about the usefulness of this debugging though -- if you're
> debugging something that requires you to step through the C code, how
> do you know that the same bug is present in the Python code you're
> stepping through instead? Otherwise (if you're debugging a bug in your
> own program) I'm not sure I see how stepping through the I/O library
> is helpful.
> 
> Sounds like what you're really after is *understanding* how the I/O
> library works. For that, perhaps reading the docs and then reading the
> source code would be more effective.

However we select between Python and native module versions, the build 
bots need to be set up to run the modules both ways (with and without C 
optimisation).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From rhamph at gmail.com  Fri Aug 10 19:02:43 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Fri, 10 Aug 2007 11:02:43 -0600
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46BC83BF.3000407@trueblade.com>
References: <46B13ADE.7080901@acm.org> <46B46DE1.1090403@canterbury.ac.nz>
	<46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com>
	<46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com>
	<46B5FBD9.4020301@acm.org> <46BBBEC6.5030705@trueblade.com>
	<fb6fbf560708100727tdecc140heac8481eaef5e5db@mail.gmail.com>
	<46BC83BF.3000407@trueblade.com>
Message-ID: <aac2c7cb0708101002t1d09606cl82ce318cd5f26beb@mail.gmail.com>

On 8/10/07, Eric Smith <eric+python-dev at trueblade.com> wrote:
> Anyway, if we're keeping conversions, I see two approaches:
>
> 1: "".format() (or Talin's format_field, actually) understands which
> types can be converted to other types, and does the conversions.  This
> is how Patrick and I wrote the original PEP 3101 sandbox prototype.
>
> 2: each type's __format__ function understands how to convert to some
> subset of all types (int can convert to float and decimal, for example).

I feel I must be missing something obvious here, but could somebody
explain the problem with __format__ returning NotImplemented to mean
"use a fallback"?  It seems like it'd have the advantages of both, i.e.
repr, str, and several other formats are automatic, while it's still
possible to override any format or create new ones.
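
A rough sketch of that idea, for illustration only.  The helper name
format_field, the Point class and its "xy" specifier are all made up
here, and today's built-in types do not return NotImplemented from
__format__, so the hook is called directly on a user-defined class:

```python
def format_field(value, spec):
    """Hypothetical top-level formatter: ask the type first, then fall back."""
    # Call the hook directly; the built-in format() would reject NotImplemented.
    result = type(value).__format__(value, spec)
    if result is not NotImplemented:
        return result
    # Fallbacks shared by every type, so no __format__ has to repeat them.
    if spec == "r":
        return repr(value)
    if spec in ("", "s"):
        return str(value)
    raise ValueError("unsupported format specifier %r" % (spec,))


class Point:
    """Illustrative type that only knows one custom specifier."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return "Point(%r, %r)" % (self.x, self.y)

    def __format__(self, spec):
        if spec == "xy":
            return "%s,%s" % (self.x, self.y)
        return NotImplemented  # means "use a fallback"


p = Point(1, 2)
custom = format_field(p, "xy")    # handled by Point.__format__
fallback = format_field(p, "r")   # handled by the shared fallback
```

The type only has to know about the specifiers it cares about; 'r' and
's' never reach it unless it chooses to override them.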


-- 
Adam Olsen, aka Rhamphoryncus

From guido at python.org  Fri Aug 10 19:26:13 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 10 Aug 2007 10:26:13 -0700
Subject: [Python-3000] Console encoding detection broken
In-Reply-To: <46BC02FC.6080107@v.loewis.de>
References: <f9grv8$1eg$1@sea.gmane.org> <46BC02FC.6080107@v.loewis.de>
Message-ID: <ca471dc20708101026j6368a727r8a946579c22e165b@mail.gmail.com>

On 8/9/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> Georg Brandl schrieb:
> > Well, subject says it all. While 2.5 sets sys.std*.encoding correctly to
> > UTF-8, 3k sets it to 'latin-1', breaking output of Unicode strings.
>
> And not surprisingly so: io.py says
>
>         if encoding is None:
>             # XXX This is questionable
>             encoding = sys.getfilesystemencoding() or "latin-1"

Guilty as charged.

Alas, I don't know much about the machinery of console and filesystem
encodings, so I need help!

> First, at the point where this call is made, sys.getfilesystemencoding
> is still None,

What can we do about this? Set it earlier? It should really be set by
the time site.py is imported (which sets sys.stdin/out/err), as this
is the first time a lot of Python code is run that touches the
filesystem (e.g. sys.path mangling).

> plus the code is broken as getfilesystemencoding is not
> the correct value for sys.stdout.encoding. Instead, the way it should
> be computed is:
>
> 1. On Unix, use the same value that sys.getfilesystemencoding will get,
>    namely the result of nl_langinfo(CODESET); if that is not available,
>    fall back - to anything, but the most logical choices are UTF-8
>    (if you want output to always succeed) and ASCII (if you don't want
>    to risk mojibake).
> 2. On Windows, if output is to a terminal, use GetConsoleOutputCP.
>    Else fall back, probably to CP_ACP (i.e. "mbcs")
> 3. On OSX, I don't know. If output is to a terminal, UTF-8 may be
>    a good bet (although some people operate their Terminal.apps
>    not in UTF-8; there is no way to find out). Otherwise, use the
>    locale's encoding - not sure how to find out what that is.

Feel free to add code that implements this. I suppose it would be a
good idea to have a separate function io.guess_console_encoding(...)
which takes some argument (perhaps a raw file?) and returns an
encoding name, never None. This could then be implemented by switching
on the platform into platform-specific functions and a default.
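
Something along these lines, as a starting point rather than a finished
implementation.  The function and helper names are placeholders, the
stream argument is assumed to be any object with an isatty() method, and
the OSX branch simply encodes the guess quoted above:

```python
import locale
import sys


def _guess_unix(stream):
    # Same source as the filesystem encoding: nl_langinfo(CODESET).
    try:
        codeset = locale.nl_langinfo(locale.CODESET)
    except AttributeError:  # platform without nl_langinfo
        codeset = ""
    return codeset or "utf-8"  # fallback choice is an assumption


def _guess_windows(stream):
    import ctypes  # windll only exists on Windows, so import lazily

    if stream is not None and stream.isatty():
        codepage = ctypes.windll.kernel32.GetConsoleOutputCP()
        if codepage:
            return "cp%d" % codepage
    return "mbcs"  # CP_ACP


def _guess_osx(stream):
    # Assumption: a terminal is probably UTF-8; otherwise use the locale.
    if stream is not None and stream.isatty():
        return "utf-8"
    return _guess_unix(stream)


def guess_console_encoding(stream=None):
    """Return an encoding name for a console stream, never None."""
    if sys.platform == "win32":
        return _guess_windows(stream)
    if sys.platform == "darwin":
        return _guess_osx(stream)
    return _guess_unix(stream)


encoding = guess_console_encoding(sys.stdout)
```

Keeping the per-platform logic in separate helpers means each one can be
fixed by whoever knows that platform, without touching the others.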

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Fri Aug 10 19:56:00 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 10 Aug 2007 10:56:00 -0700
Subject: [Python-3000] Move to a "py3k" branch *DONE*
In-Reply-To: <ee2a432c0708092237i3b1cf045ied38af29c75d394c@mail.gmail.com>
References: <ca471dc20708090743k30562834uee2abd624f2f41b5@mail.gmail.com>
	<ee2a432c0708092237i3b1cf045ied38af29c75d394c@mail.gmail.com>
Message-ID: <ca471dc20708101056h7ef8e014va5188033b70e3e1c@mail.gmail.com>

On 8/9/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
> > There are currently about 7 failing unit tests left:
> >
> > test_bsddb

Looks like this (trivial) test now passes, at least on the one box I
have where it isn't skipped (an ancient red hat 7.3 box that just
won't die :-).

> > test_bsddb3

FWIW, this one *hangs* for me on the only box where I have the right
environment to build bsddb, after running many tests successfully.

> There are other problems:
>  * had to kill test_poplib due to taking all cpu without progress

I tried to reproduce this but couldn't (using a debug build). Is it
reproducible for you? What are the exact arguments you pass to Python
and regrtest and what platform did you use?

>  * bunch of tests leak (./python ./Lib/test/regrtest.py -R 4:3:
> test_foo test_bar ...)

Did Walter's submit fix these? If not, can you provide more details of
the remaining leaks?

>  * at least one test fails with a fatal error

I'll look into this.

>  * make install fails

I think I fixed this. The traceback module no longer likes getting a
tuple instead of an Exception instance, so a small adjustment had to
be made to py_compile.py. The syntax error that py_compile was trying
to report is now no longer fatal.
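
For anyone hitting the same thing elsewhere, the pattern is to hand the
exception instance itself to the traceback module rather than its args
tuple.  A small sketch (this is not the actual py_compile.py change, and
describe_compile_error is a made-up helper):

```python
import traceback


def describe_compile_error(source, filename="<example>"):
    """Return the formatted SyntaxError for source, or "" if it compiles."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as err:
        # Pass the exception instance, not err.args: the formatter reads
        # attributes such as err.filename, which a plain tuple lacks.
        return "".join(traceback.format_exception_only(type(err), err))
    return ""


message = describe_compile_error("x = (")
```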

> Here are the details (probably best to update the wiki with status
> before people start working on these):

I don't think it hurts if multiple people look into this. These are
all tough problems. Folks, do report back here as soon as you've got a
result (positive or negative).

> I'm not sure what was happening with test_poplib.  I had to kill
> test_poplib due to taking all cpu without progress.  When I ran it by
> itself, it was fine.  So there was some bad interaction with another
> test.

And I can't reproduce it either. Can you?

> Ref leaks and fatal error (see
> http://docs.python.org/dev/3.0/results/make-test-refleak.out):
> test_array leaked [11, 11, 11] references, sum=33
> test_bytes leaked [4, 4, 4] references, sum=12
> test_codeccallbacks leaked [21, 21, 21] references, sum=63
> test_codecs leaked [260, 260, 260] references, sum=780
> test_ctypes leaked [10, 10, 10] references, sum=30
> Fatal Python error:
> /home/neal/python/py3k/Modules/datetimemodule.c:1175 object at
> 0xb60b19c8 has negative ref count -4
>
> There are probably more, but I haven't had a chance to run more after
> test_datetime.

I'm running regrtest.py -uall -R4:3: now.

> This failure occurred while running with -R:
>
> test test_coding failed -- Traceback (most recent call last):
>   File "/tmp/python-test-3.0/local/lib/python3.0/test/test_coding.py",
> line 12, in test_bad_coding2
>     self.verify_bad_module(module_name)
>   File "/tmp/python-test-3.0/local/lib/python3.0/test/test_coding.py",
> line 20, in verify_bad_module
>     text = fp.read()
>   File "/tmp/python-test-3.0/local/lib/python3.0/io.py", line 1148, in read
>     res += decoder.decode(self.buffer.read(), True)
>   File "/tmp/python-test-3.0/local/lib/python3.0/encodings/ascii.py",
> line 26, in decode
>     return codecs.ascii_decode(input, self.errors)[0]
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position
> 0: ordinal not in range(128)

This doesn't fail in isolation for me.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From rrr at ronadam.com  Fri Aug 10 20:09:13 2007
From: rrr at ronadam.com (Ron Adam)
Date: Fri, 10 Aug 2007 13:09:13 -0500
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46BC1479.30405@canterbury.ac.nz>
References: <46B13ADE.7080901@acm.org>
	<46B2D147.90606@acm.org>	<46B2E265.5080905@ronadam.com>
	<46B35295.1030007@acm.org>	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>	<46B39BBF.80809@ronadam.com>
	<46B46DE1.1090403@canterbury.ac.nz>	<46B4950F.40905@ronadam.com>
	<46B4A9A0.9070206@ronadam.com>	<46B52422.2090006@canterbury.ac.nz>	<ca471dc20708041909j3f86b107hace9407673c302ea@mail.gmail.com>	<46B568F3.9060105@ronadam.com>
	<46B66BE0.7090005@canterbury.ac.nz>	<46B6851C.1030204@ronadam.com>
	<46B6C219.4040900@canterbury.ac.nz>	<46B6DABB.3080509@ronadam.com>
	<46B7CD8C.5070807@acm.org>	<46B8D58E.5040501@ronadam.com>
	<46BAC2D9.2020902@acm.org>	<46BAEFB0.9050400@ronadam.com>
	<46BBB1AE.5010207@canterbury.ac.nz>	<46BBEB16.2040205@ronadam.com>
	<46BC1479.30405@canterbury.ac.nz>
Message-ID: <46BCA9C9.1010306@ronadam.com>



Greg Ewing wrote:
> Ron Adam wrote:
>>     - alter the value and return (new_value, format_spec)
>>     - alter the format_spec and return (value, new_format_spec)
>>     - do logging of some values, and return the (value, format_spec) 
>> unchanged.
> 
> I would ditch all of these. They're not necessary, as
> the same effect can be achieved by explicitly calling
> another __format__ method, or one's own __format__
> method with different args, and returning the result.

I'm not sure what you mean by "ditch all of these".  Do you mean not 
document them, or not have the format function do anything further on the 
(value, format_spec) after it is returned from the __format__ method?


>>     - do something entirely different and return ('', None)
> 
> I don't understand. What is meant to happen in that case?

The output in this case is a null string. Setting the format spec to None 
tells the format() function not to do anything more to it.

Whatever else happens in the __format__ method is up to the programmer.


>> Does this look ok, or would you do it a different way?
> 
> You haven't explained how this addresses the 'r' issue
> without requiring every __format__ method to recognise
> and deal with it.

The format function recognizes and deals with 'r' just as you suggest, but 
it also recognizes and deals with all the other standard formatter types in 
the same way.

The format function would first call the object's __format__ method and give 
it a chance to have control, and depending on what is returned, try to 
handle it or not.

If you want the 'r' specifier to always have precedence over even custom 
__format__ methods, then you can do that too, but I don't see the need.
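
Roughly, the flow described above could look like the sketch below.  To
keep it runnable on a current Python, the per-object hook is called
format_hook instead of __format__ (the real __format__ must return a
string), and format_value, Celsius and Secret are invented for the
example:

```python
def format_value(value, spec):
    """Sketch of a format function that gives the object the first say."""
    hook = getattr(value, "format_hook", None)
    if hook is not None:
        value, spec = hook(spec)  # object may alter either, or neither
    if spec is None:
        return value  # the hook produced the final text; do nothing more
    # Standard specifiers are handled here, once, for every type.
    if spec == "r":
        return repr(value)
    if spec in ("", "s"):
        return str(value)
    return format(value, spec)


class Celsius:
    def __init__(self, degrees):
        self.degrees = degrees

    def __repr__(self):
        return "Celsius(%r)" % (self.degrees,)

    def format_hook(self, spec):
        if spec == "short":
            # Alter both the value and the spec, then hand back control.
            return self.degrees, ".1f"
        return self, spec  # unchanged: let the format function decide


class Secret:
    def format_hook(self, spec):
        return "", None  # do something entirely different: print nothing


short = format_value(Celsius(21.456), "short")
as_repr = format_value(Celsius(1.0), "r")
hidden = format_value(Secret(), "r")
```

In this version 'r' does not take precedence over the hook, which is why
Secret can still suppress its output.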

Ron





From nas at arctrix.com  Fri Aug 10 20:17:21 2007
From: nas at arctrix.com (Neil Schemenauer)
Date: Fri, 10 Aug 2007 18:17:21 +0000 (UTC)
Subject: [Python-3000] No (C) optimization flag
References: <f9hakp$6n4$1@sea.gmane.org>
	<ca471dc20708100744u58df433s11479973620aa93a@mail.gmail.com>
	<46BC9400.70803@gmail.com>
Message-ID: <f9ia3h$jbb$1@sea.gmane.org>

Nick Coghlan <ncoghlan at gmail.com> wrote:
> However we select between Python and native module versions, the build 
> bots need to be set up to run the modules both ways (with and without C 
> optimisation).

If there is a way to explicitly import each module separately then I
think that meets both needs.
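
One way this could work, sketched with heapq and its _heapq accelerator
as they exist in a current CPython.  The helper name is made up, and the
trick relies on a None entry in sys.modules making an import raise
ImportError:

```python
import importlib
import sys
import types


def import_pure_python(name, blocked):
    """Import name afresh while pretending the modules in blocked are missing."""
    names = [name] + list(blocked)
    saved = {n: sys.modules[n] for n in names if n in sys.modules}
    for n in names:
        sys.modules.pop(n, None)
    for n in blocked:
        sys.modules[n] = None  # "import n" now raises ImportError
    try:
        return importlib.import_module(name)
    finally:
        # Put sys.modules back the way we found it.
        for n in names:
            sys.modules.pop(n, None)
        sys.modules.update(saved)


py_heapq = import_pure_python("heapq", ["_heapq"])

heap = []
for item in (3, 1, 2):
    py_heapq.heappush(heap, item)
smallest = py_heapq.heappop(heap)
```

A test suite can then run the same test cases against py_heapq and the
normally imported heapq, and a debugger can step through the Python
version without touching the .so file.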

  Neil


From guido at python.org  Fri Aug 10 20:18:35 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 10 Aug 2007 11:18:35 -0700
Subject: [Python-3000] Move to a "py3k" branch *DONE*
In-Reply-To: <ee2a432c0708100031x3424e3e9v716e50c7464e4912@mail.gmail.com>
References: <ca471dc20708090743k30562834uee2abd624f2f41b5@mail.gmail.com>
	<ee2a432c0708092237i3b1cf045ied38af29c75d394c@mail.gmail.com>
	<ee2a432c0708092331s45df1950yd1bfe14bac22fdbb@mail.gmail.com>
	<ee2a432c0708100031x3424e3e9v716e50c7464e4912@mail.gmail.com>
Message-ID: <ca471dc20708101118y59b1e56dh5fda77fa461e69b8@mail.gmail.com>

Status update:

The following still leak (regrtest.py -R4:3:)

test_array leaked [11, 11, 11] references, sum=33
test_multibytecodec leaked [72, 72, 72] references, sum=216
test_parser leaked [5, 5, 5] references, sum=15
test_zipimport leaked [29, 29, 29] references, sum=87

I can't reproduce the test_shelve failure.

I *do* see the test_structmembers failure, will investigate.

I see a failure but no segfault in test_datetime; will investigate.

Regarding test_univnewlines, this is virgin territory. I've never met
anyone who used the newlines attribute on file objects. I'll make a
separate post to call it out.

--Guido

On 8/10/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
> Bah, who needs sleep anyways.  This list of problems should be fairly
> complete when running with -R.  (it skips the fatal error from
> test_datetime though)
>
> Code to trigger a leak:   b'\xff'.decode("utf8", "ignore")
>
> Leaks:
> test_array leaked [11, 11, 11] references, sum=33
> test_bytes leaked [4, 4, 4] references, sum=12
> test_codeccallbacks leaked [21, 21, 21] references, sum=63
> test_codecs leaked [260, 260, 260] references, sum=780
> test_ctypes leaked [-22, 43, 10] references, sum=31
> test_multibytecodec leaked [72, 72, 72] references, sum=216
> test_parser leaked [5, 5, 5] references, sum=15
> test_unicode leaked [4, 4, 4] references, sum=12
> test_xml_etree leaked [128, 128, 128] references, sum=384
> test_xml_etree_c leaked [128, 128, 128] references, sum=384
> test_zipimport leaked [29, 29, 29] references, sum=87
>
> Failures with -R:
>
> test test_collections failed -- errors occurred; run in verbose mode for details
>
> test test_gzip failed -- Traceback (most recent call last):
>  File "/home/neal/python/dev/py3k/Lib/test/test_gzip.py", line 77, in
> test_many_append
>    ztxt = zgfile.read(8192)
>  File "/home/neal/python/dev/py3k/Lib/gzip.py", line 236, in read
>    self._read(readsize)
>  File "/home/neal/python/dev/py3k/Lib/gzip.py", line 301, in _read
>    self._read_eof()
>  File "/home/neal/python/dev/py3k/Lib/gzip.py", line 317, in _read_eof
>    crc32 = read32(self.fileobj)
>  File "/home/neal/python/dev/py3k/Lib/gzip.py", line 40, in read32
>    return struct.unpack("<l", input.read(4))[0]
>  File "/home/neal/python/dev/py3k/Lib/struct.py", line 97, in unpack
>    return o.unpack(s)
> struct.error: unpack requires a string argument of length 4
>
> test test_runpy failed -- Traceback (most recent call last):
>  File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 230,
> in test_run_module
>    self._check_module(depth)
>  File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 168,
> in _check_module
>    d2 = run_module(mod_name) # Read from bytecode
>  File "/home/neal/python/dev/py3k/Lib/runpy.py", line 72, in run_module
>    raise ImportError("No module named %s" % mod_name)
> ImportError: No module named runpy_test
>
> test test_shelve failed -- errors occurred; run in verbose mode for details
>
> test test_structmembers failed -- errors occurred; run in verbose mode
> for details
>
> test_univnewlines skipped -- This Python does not have universal newline support
>
> Traceback (most recent call last):
>   File "/home/neal/python/dev/py3k/Lib/SocketServer.py", line 222, in
> handle_request
>     self.process_request(request, client_address)
>   File "/home/neal/python/dev/py3k/Lib/SocketServer.py", line 241, in
> process_request
>     self.finish_request(request, client_address)
>   File "/home/neal/python/dev/py3k/Lib/SocketServer.py", line 254, in
> finish_request
>     self.RequestHandlerClass(request, client_address, self)
>   File "/home/neal/python/dev/py3k/Lib/SocketServer.py", line 522, in __init__
>     self.handle()
>   File "/home/neal/python/dev/py3k/Lib/BaseHTTPServer.py", line 316, in handle
>     self.handle_one_request()
>   File "/home/neal/python/dev/py3k/Lib/BaseHTTPServer.py", line 303,
> in handle_one_request
>     if not self.parse_request(): # An error code has been sent, just exit
>   File "/home/neal/python/dev/py3k/Lib/BaseHTTPServer.py", line 281,
> in parse_request
>     self.headers = self.MessageClass(self.rfile, 0)
>   File "/home/neal/python/dev/py3k/Lib/mimetools.py", line 16, in __init__
>     rfc822.Message.__init__(self, fp, seekable)
>   File "/home/neal/python/dev/py3k/Lib/rfc822.py", line 104, in __init__
>     self.readheaders()
>   File "/home/neal/python/dev/py3k/Lib/rfc822.py", line 172, in readheaders
>     headerseen = self.isheader(line)
>   File "/home/neal/python/dev/py3k/Lib/rfc822.py", line 202, in isheader
>     return line[:i].lower()
> AttributeError: 'bytes' object has no attribute 'lower'
>
> On 8/9/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
> > I wonder if a lot of the refleaks may have the same cause as this one:
> >
> >   b'\xff'.decode("utf8", "ignore")
> >
> > No leaks jumped out at me.  Here is the rest of the leaks that have
> > been reported so far.  I don't know how many have the same cause.
> >
> > test_multibytecodec leaked [72, 72, 72] references, sum=216
> > test_parser leaked [5, 5, 5] references, sum=15
> >
> > The other failures that occurred with -R:
> >
> > test test_collections failed -- errors occurred; run in verbose mode for details
> >
> > test test_gzip failed -- Traceback (most recent call last):
> >   File "/home/neal/python/dev/py3k/Lib/test/test_gzip.py", line 77, in
> > test_many_append
> >     ztxt = zgfile.read(8192)
> >   File "/home/neal/python/dev/py3k/Lib/gzip.py", line 236, in read
> >     self._read(readsize)
> >   File "/home/neal/python/dev/py3k/Lib/gzip.py", line 301, in _read
> >     self._read_eof()
> >   File "/home/neal/python/dev/py3k/Lib/gzip.py", line 317, in _read_eof
> >     crc32 = read32(self.fileobj)
> >   File "/home/neal/python/dev/py3k/Lib/gzip.py", line 40, in read32
> >     return struct.unpack("<l", input.read(4))[0]
> >   File "/home/neal/python/dev/py3k/Lib/struct.py", line 97, in unpack
> >     return o.unpack(s)
> > struct.error: unpack requires a string argument of length 4
> >
> > test test_runpy failed -- Traceback (most recent call last):
> >   File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 230,
> > in test_run_module
> >     self._check_module(depth)
> >   File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 168,
> > in _check_module
> >     d2 = run_module(mod_name) # Read from bytecode
> >   File "/home/neal/python/dev/py3k/Lib/runpy.py", line 72, in run_module
> >     raise ImportError("No module named %s" % mod_name)
> > ImportError: No module named runpy_test
> >
> > test_textwrap was the last test to complete.  test_thread was still running.
> >
> > n
> > --
> > On 8/9/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
> > > On 8/9/07, Guido van Rossum <guido at python.org> wrote:
> > > > This is done. The new py3k branch is ready for business.
> > > >
> > > > Left to do:
> > > >
> > > > - switch the buildbot and the doc builder to use the new branch (Neal)
> > >
> > > I've updated to use the new branch.  I got the docs building, but
> > > there are many more problems.  I won't re-enable the cronjob until
> > > more things are working.
> > >
> > > > There are currently about 7 failing unit tests left:
> > > >
> > > > test_bsddb
> > > > test_bsddb3
> > > > test_email
> > > > test_email_codecs
> > > > test_email_renamed
> > > > test_sqlite
> > > > test_urllib2_localnet
> > >
> > > Ok, I disabled these, so if only they fail, mail shouldn't be sent
> > > (when I enable the script).
> > >
> > > There are other problems:
> > >  * had to kill test_poplib due to taking all cpu without progress
> > >  * bunch of tests leak (./python ./Lib/test/regrtest.py -R 4:3:
> > > test_foo test_bar ...)
> > >  * at least one test fails with a fatal error
> > >  * make install fails
> > >
> > > Here are the details (probably best to update the wiki with status
> > > before people start working on these):
> > >
> > > I'm not sure what was happening with test_poplib.  I had to kill
> > > test_poplib due to taking all cpu without progress.  When I ran it by
> > > itself, it was fine.  So there was some bad interaction with another
> > > test.
> > >
> > > Ref leaks and fatal error (see
> > > http://docs.python.org/dev/3.0/results/make-test-refleak.out):
> > > test_array leaked [11, 11, 11] references, sum=33
> > > test_bytes leaked [4, 4, 4] references, sum=12
> > > test_codeccallbacks leaked [21, 21, 21] references, sum=63
> > > test_codecs leaked [260, 260, 260] references, sum=780
> > > test_ctypes leaked [10, 10, 10] references, sum=30
> > > Fatal Python error:
> > > /home/neal/python/py3k/Modules/datetimemodule.c:1175 object at
> > > 0xb60b19c8 has negative ref count -4
> > >
> > > There are probably more, but I haven't had a chance to run more after
> > > test_datetime.
> > >
> > > This failure occurred while running with -R:
> > >
> > > test test_coding failed -- Traceback (most recent call last):
> > >   File "/tmp/python-test-3.0/local/lib/python3.0/test/test_coding.py",
> > > line 12, in test_bad_coding2
> > >     self.verify_bad_module(module_name)
> > >   File "/tmp/python-test-3.0/local/lib/python3.0/test/test_coding.py",
> > > line 20, in verify_bad_module
> > >     text = fp.read()
> > >   File "/tmp/python-test-3.0/local/lib/python3.0/io.py", line 1148, in read
> > >     res += decoder.decode(self.buffer.read(), True)
> > >   File "/tmp/python-test-3.0/local/lib/python3.0/encodings/ascii.py",
> > > line 26, in decode
> > >     return codecs.ascii_decode(input, self.errors)[0]
> > > UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position
> > > 0: ordinal not in range(128)
> > >
> > >
> > > See http://docs.python.org/dev/3.0/results/make-install.out for this failure:
> > >
> > > Compiling /tmp/python-test-3.0/local/lib/python3.0/test/test_pep263.py ...
> > > Traceback (most recent call last):
> > >   File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line
> > > 162, in <module>
> > >     exit_status = int(not main())
> > >   File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line
> > > 152, in main
> > >     force, rx, quiet):
> > >   File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line
> > > 89, in compile_dir
> > >     if not compile_dir(fullname, maxlevels - 1, dfile, force, rx, quiet):
> > >   File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line
> > > 65, in compile_dir
> > >     ok = py_compile.compile(fullname, None, dfile, True)
> > >   File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line
> > > 144, in compile
> > >     py_exc = PyCompileError(err.__class__,err.args,dfile or file)
> > >   File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line
> > > 49, in __init__
> > >     tbtext = ''.join(traceback.format_exception_only(exc_type, exc_value))
> > >   File "/tmp/python-test-3.0/local/lib/python3.0/traceback.py", line
> > > 179, in format_exception_only
> > >     filename = value.filename or "<string>"
> > > AttributeError: 'tuple' object has no attribute 'filename'
> > >
> > > I'm guessing this came from the change in exception args handling?
> > >
> > >   File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line
> > > 144, in compile
> > >     py_exc = PyCompileError(err.__class__,err.args,dfile or file)
> > >
> > > n
> > >
> >
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Fri Aug 10 20:23:45 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 10 Aug 2007 11:23:45 -0700
Subject: [Python-3000] Universal newlines support in Python 3.0
Message-ID: <ca471dc20708101123y3628e352n7116839dbfdc9bb3@mail.gmail.com>

Python 3.0 currently has limited universal newlines support: by
default, \r\n is translated into \n for text files, but this can be
controlled by the newline= keyword parameter. For details on how, see
PEP 3116. The PEP prescribes that a lone \r must also be translated,
though this hasn't been implemented yet (any volunteers?).

However, the old universal newlines feature also set an attribute named
'newlines' on the file object to a tuple of up to three elements
giving the actual line endings that were observed on the file so far
(\r, \n, or \r\n). This feature is not in PEP 3116, and it is not
implemented. I'm tempted to kill it. Does anyone have a use case for
this? Has anyone even ever used this?
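
For reference, this is the behaviour in question, shown with the io
module as it exists in a current CPython (where the 'newlines'
attribute is available on text streams), not the 2007 branch.  The
read_text helper is just for the example:

```python
import io

RAW = b"one\r\ntwo\rthree\n"  # mixes all three line-ending styles


def read_text(raw, newline):
    """Decode raw bytes as text with the given newline= setting."""
    stream = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii",
                              newline=newline)
    text = stream.read()
    return text, stream.newlines


# newline=None: universal newlines, every ending is translated to \n.
translated, seen = read_text(RAW, None)

# newline="": endings are recognised but passed through untranslated.
untouched, _ = read_text(RAW, "")
```

Here 'seen' is the tuple of line endings observed while reading, which
is the only thing the attribute offers beyond the newline= parameter.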

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Fri Aug 10 20:28:05 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 10 Aug 2007 11:28:05 -0700
Subject: [Python-3000] Move to a "py3k" branch *DONE*
In-Reply-To: <ca471dc20708101118y59b1e56dh5fda77fa461e69b8@mail.gmail.com>
References: <ca471dc20708090743k30562834uee2abd624f2f41b5@mail.gmail.com>
	<ee2a432c0708092237i3b1cf045ied38af29c75d394c@mail.gmail.com>
	<ee2a432c0708092331s45df1950yd1bfe14bac22fdbb@mail.gmail.com>
	<ee2a432c0708100031x3424e3e9v716e50c7464e4912@mail.gmail.com>
	<ca471dc20708101118y59b1e56dh5fda77fa461e69b8@mail.gmail.com>
Message-ID: <ca471dc20708101128v43204087jb29e1a538fed8991@mail.gmail.com>

Um, Neal reported some more failures with -R earlier. I can reproduce
the failures for test_collections and test_runpy, but test_gzip passes
fine for me (standalone). I'll look into these. I'm still running the
full set as well, it'll take all day. I can't reproduce Neal's problem
with test_poplib (which was pegging the CPU for him).

On 8/10/07, Guido van Rossum <guido at python.org> wrote:
> Status update:
>
> The following still leak (regrtest.py -R4:3:)
>
> test_array leaked [11, 11, 11] references, sum=33
> test_multibytecodec leaked [72, 72, 72] references, sum=216
> test_parser leaked [5, 5, 5] references, sum=15
> test_zipimport leaked [29, 29, 29] references, sum=87
>
> I can't reproduce the test_shelve failure.
>
> I *do* see the test_structmembers failure, will investigate.
>
> I see a failure but no segfault in test_datetime; will investigate.
>
> Regarding test_univnewlines, this is virgin territory. I've never met
> anyone who used the newlines attribute on file objects. I'll make a
> separate post to call it out.
>
> --Guido
>
> On 8/10/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
> > Bah, who needs sleep anyways.  This list of problems should be fairly
> > complete when running with -R.  (it skips the fatal error from
> > test_datetime though)
> >
> > Code to trigger a leak:   b'\xff'.decode("utf8", "ignore")
> >
> > Leaks:
> > test_array leaked [11, 11, 11] references, sum=33
> > test_bytes leaked [4, 4, 4] references, sum=12
> > test_codeccallbacks leaked [21, 21, 21] references, sum=63
> > test_codecs leaked [260, 260, 260] references, sum=780
> > test_ctypes leaked [-22, 43, 10] references, sum=31
> > test_multibytecodec leaked [72, 72, 72] references, sum=216
> > test_parser leaked [5, 5, 5] references, sum=15
> > test_unicode leaked [4, 4, 4] references, sum=12
> > test_xml_etree leaked [128, 128, 128] references, sum=384
> > test_xml_etree_c leaked [128, 128, 128] references, sum=384
> > test_zipimport leaked [29, 29, 29] references, sum=87
> >
> > Failures with -R:
> >
> > test test_collections failed -- errors occurred; run in verbose mode for details
> >
> > test test_gzip failed -- Traceback (most recent call last):
> >  File "/home/neal/python/dev/py3k/Lib/test/test_gzip.py", line 77, in
> > test_many_append
> >    ztxt = zgfile.read(8192)
> >  File "/home/neal/python/dev/py3k/Lib/gzip.py", line 236, in read
> >    self._read(readsize)
> >  File "/home/neal/python/dev/py3k/Lib/gzip.py", line 301, in _read
> >    self._read_eof()
> >  File "/home/neal/python/dev/py3k/Lib/gzip.py", line 317, in _read_eof
> >    crc32 = read32(self.fileobj)
> >  File "/home/neal/python/dev/py3k/Lib/gzip.py", line 40, in read32
> >    return struct.unpack("<l", input.read(4))[0]
> >  File "/home/neal/python/dev/py3k/Lib/struct.py", line 97, in unpack
> >    return o.unpack(s)
> > struct.error: unpack requires a string argument of length 4
> >
> > test test_runpy failed -- Traceback (most recent call last):
> >  File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 230,
> > in test_run_module
> >    self._check_module(depth)
> >  File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 168,
> > in _check_module
> >    d2 = run_module(mod_name) # Read from bytecode
> >  File "/home/neal/python/dev/py3k/Lib/runpy.py", line 72, in run_module
> >    raise ImportError("No module named %s" % mod_name)
> > ImportError: No module named runpy_test
> >
> > test test_shelve failed -- errors occurred; run in verbose mode for details
> >
> > test test_structmembers failed -- errors occurred; run in verbose mode
> > for details
> >
> > test_univnewlines skipped -- This Python does not have universal newline support
> >
> > Traceback (most recent call last):
> >   File "/home/neal/python/dev/py3k/Lib/SocketServer.py", line 222, in
> > handle_request
> >     self.process_request(request, client_address)
> >   File "/home/neal/python/dev/py3k/Lib/SocketServer.py", line 241, in
> > process_request
> >     self.finish_request(request, client_address)
> >   File "/home/neal/python/dev/py3k/Lib/SocketServer.py", line 254, in
> > finish_request
> >     self.RequestHandlerClass(request, client_address, self)
> >   File "/home/neal/python/dev/py3k/Lib/SocketServer.py", line 522, in __init__
> >     self.handle()
> >   File "/home/neal/python/dev/py3k/Lib/BaseHTTPServer.py", line 316, in handle
> >     self.handle_one_request()
> >   File "/home/neal/python/dev/py3k/Lib/BaseHTTPServer.py", line 303,
> > in handle_one_request
> >     if not self.parse_request(): # An error code has been sent, just exit
> >   File "/home/neal/python/dev/py3k/Lib/BaseHTTPServer.py", line 281,
> > in parse_request
> >     self.headers = self.MessageClass(self.rfile, 0)
> >   File "/home/neal/python/dev/py3k/Lib/mimetools.py", line 16, in __init__
> >     rfc822.Message.__init__(self, fp, seekable)
> >   File "/home/neal/python/dev/py3k/Lib/rfc822.py", line 104, in __init__
> >     self.readheaders()
> >   File "/home/neal/python/dev/py3k/Lib/rfc822.py", line 172, in readheaders
> >     headerseen = self.isheader(line)
> >   File "/home/neal/python/dev/py3k/Lib/rfc822.py", line 202, in isheader
> >     return line[:i].lower()
> > AttributeError: 'bytes' object has no attribute 'lower'
> >
> > On 8/9/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
> > > I wonder if a lot of the refleaks may have the same cause as this one:
> > >
> > >   b'\xff'.decode("utf8", "ignore")
> > >
> > > No leaks jumped out at me.  Here is the rest of the leaks that have
> > > been reported so far.  I don't know how many have the same cause.
> > >
> > > test_multibytecodec leaked [72, 72, 72] references, sum=216
> > > test_parser leaked [5, 5, 5] references, sum=15
> > >
> > > The other failures that occurred with -R:
> > >
> > > test test_collections failed -- errors occurred; run in verbose mode for details
> > >
> > > test test_gzip failed -- Traceback (most recent call last):
> > >   File "/home/neal/python/dev/py3k/Lib/test/test_gzip.py", line 77, in
> > > test_many_append
> > >     ztxt = zgfile.read(8192)
> > >   File "/home/neal/python/dev/py3k/Lib/gzip.py", line 236, in read
> > >     self._read(readsize)
> > >   File "/home/neal/python/dev/py3k/Lib/gzip.py", line 301, in _read
> > >     self._read_eof()
> > >   File "/home/neal/python/dev/py3k/Lib/gzip.py", line 317, in _read_eof
> > >     crc32 = read32(self.fileobj)
> > >   File "/home/neal/python/dev/py3k/Lib/gzip.py", line 40, in read32
> > >     return struct.unpack("<l", input.read(4))[0]
> > >   File "/home/neal/python/dev/py3k/Lib/struct.py", line 97, in unpack
> > >     return o.unpack(s)
> > > struct.error: unpack requires a string argument of length 4
> > >
> > > test test_runpy failed -- Traceback (most recent call last):
> > >   File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 230,
> > > in test_run_module
> > >     self._check_module(depth)
> > >   File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 168,
> > > in _check_module
> > >     d2 = run_module(mod_name) # Read from bytecode
> > >   File "/home/neal/python/dev/py3k/Lib/runpy.py", line 72, in run_module
> > >     raise ImportError("No module named %s" % mod_name)
> > > ImportError: No module named runpy_test
> > >
> > > test_textwrap was the last test to complete.  test_thread was still running.
> > >
> > > n
> > > --
> > > On 8/9/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
> > > > On 8/9/07, Guido van Rossum <guido at python.org> wrote:
> > > > > This is done. The new py3k branch is ready for business.
> > > > >
> > > > > Left to do:
> > > > >
> > > > > - switch the buildbot and the doc builder to use the new branch (Neal)
> > > >
> > > > I've updated to use the new branch.  I got the docs building, but
> > > > there are many more problems.  I won't re-enable the cronjob until
> > > > more things are working.
> > > >
> > > > > There are currently about 7 failing unit tests left:
> > > > >
> > > > > test_bsddb
> > > > > test_bsddb3
> > > > > test_email
> > > > > test_email_codecs
> > > > > test_email_renamed
> > > > > test_sqlite
> > > > > test_urllib2_localnet
> > > >
> > > > Ok, I disabled these, so if only they fail, mail shouldn't be sent
> > > > (when I enable the script).
> > > >
> > > > There are other problems:
> > > >  * had to kill test_poplib due to taking all cpu without progress
> > > >  * bunch of tests leak (./python ./Lib/test/regrtest.py -R 4:3:
> > > > test_foo test_bar ...)
> > > >  * at least one test fails with a fatal error
> > > >  * make install fails
> > > >
> > > > Here are the details (probably best to update the wiki with status
> > > > before people start working on these):
> > > >
> > > > I'm not sure what was happening with test_poplib.  I had to kill
> > > > test_poplib due to taking all cpu without progress.  When I ran it by
> > > > itself, it was fine.  So there was some bad interaction with another
> > > > test.
> > > >
> > > > Ref leaks and fatal error (see
> > > > http://docs.python.org/dev/3.0/results/make-test-refleak.out):
> > > > test_array leaked [11, 11, 11] references, sum=33
> > > > test_bytes leaked [4, 4, 4] references, sum=12
> > > > test_codeccallbacks leaked [21, 21, 21] references, sum=63
> > > > test_codecs leaked [260, 260, 260] references, sum=780
> > > > test_ctypes leaked [10, 10, 10] references, sum=30
> > > > Fatal Python error:
> > > > /home/neal/python/py3k/Modules/datetimemodule.c:1175 object at
> > > > 0xb60b19c8 has negative ref count -4
> > > >
> > > > There are probably more, but I haven't had a chance to run more after
> > > > test_datetime.
> > > >
> > > > This failure occurred while running with -R:
> > > >
> > > > test test_coding failed -- Traceback (most recent call last):
> > > >   File "/tmp/python-test-3.0/local/lib/python3.0/test/test_coding.py",
> > > > line 12, in test_bad_coding2
> > > >     self.verify_bad_module(module_name)
> > > >   File "/tmp/python-test-3.0/local/lib/python3.0/test/test_coding.py",
> > > > line 20, in verify_bad_module
> > > >     text = fp.read()
> > > >   File "/tmp/python-test-3.0/local/lib/python3.0/io.py", line 1148, in read
> > > >     res += decoder.decode(self.buffer.read(), True)
> > > >   File "/tmp/python-test-3.0/local/lib/python3.0/encodings/ascii.py",
> > > > line 26, in decode
> > > >     return codecs.ascii_decode(input, self.errors)[0]
> > > > UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position
> > > > 0: ordinal not in range(128)
> > > >
> > > >
> > > > See http://docs.python.org/dev/3.0/results/make-install.out for this failure:
> > > >
> > > > Compiling /tmp/python-test-3.0/local/lib/python3.0/test/test_pep263.py ...
> > > > Traceback (most recent call last):
> > > >   File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line
> > > > 162, in <module>
> > > >     exit_status = int(not main())
> > > >   File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line
> > > > 152, in main
> > > >     force, rx, quiet):
> > > >   File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line
> > > > 89, in compile_dir
> > > >     if not compile_dir(fullname, maxlevels - 1, dfile, force, rx, quiet):
> > > >   File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line
> > > > 65, in compile_dir
> > > >     ok = py_compile.compile(fullname, None, dfile, True)
> > > >   File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line
> > > > 144, in compile
> > > >     py_exc = PyCompileError(err.__class__,err.args,dfile or file)
> > > >   File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line
> > > > 49, in __init__
> > > >     tbtext = ''.join(traceback.format_exception_only(exc_type, exc_value))
> > > >   File "/tmp/python-test-3.0/local/lib/python3.0/traceback.py", line
> > > > 179, in format_exception_only
> > > >     filename = value.filename or "<string>"
> > > > AttributeError: 'tuple' object has no attribute 'filename'
> > > >
> > > > I'm guessing this came from the change in exception args handling?
> > > >
> > > >   File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line
> > > > 144, in compile
> > > >     py_exc = PyCompileError(err.__class__,err.args,dfile or file)
> > > >
> > > > n
> > > >
> > >
> >
>
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From stephen at xemacs.org  Fri Aug 10 21:15:45 2007
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 11 Aug 2007 04:15:45 +0900
Subject: [Python-3000]  Universal newlines support in Python 3.0
In-Reply-To: <ca471dc20708101123y3628e352n7116839dbfdc9bb3@mail.gmail.com>
References: <ca471dc20708101123y3628e352n7116839dbfdc9bb3@mail.gmail.com>
Message-ID: <87wsw3p5em.fsf@uwakimon.sk.tsukuba.ac.jp>

Guido van Rossum writes:

 > However, the old universal newlines feature also set an attribute named
 > 'newlines' on the file object to a tuple of up to three elements
 > giving the actual line endings that were observed on the file so far
 > (\r, \n, or \r\n). This feature is not in PEP 3116, and it is not
 > implemented. I'm tempted to kill it. Does anyone have a use case for
 > this?

I have run into files that intentionally have more than one newline
convention used (mbox and Babyl mail folders, with messages received
from various platforms).  However, most of the time, multiple newline
conventions are a sign that the file is either corrupt or isn't text.
If so, then saving the file may corrupt it.  The newlines attribute
could be used to check for this condition.
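
A minimal sketch of that check, assuming a Python whose text files
expose the 'newlines' attribute (current io.TextIOWrapper does when
opened with newline=None).  The helper names observed_newlines and
has_mixed_newlines are made up for illustration.

```python
import io


def observed_newlines(data, encoding="utf-8"):
    """Return the set of line endings seen while reading data as text."""
    # newline=None enables universal newlines translation, which is what
    # makes the wrapper record the endings it has seen.
    with io.TextIOWrapper(io.BytesIO(data), encoding=encoding,
                          newline=None) as f:
        f.read()
        seen = f.newlines
    if seen is None:            # no line ending encountered at all
        return set()
    if isinstance(seen, str):   # exactly one convention encountered
        return {seen}
    return set(seen)            # a tuple of two or three conventions


def has_mixed_newlines(data, encoding="utf-8"):
    """True if more than one newline convention appears in data."""
    return len(observed_newlines(data, encoding)) > 1


if __name__ == "__main__":
    print(has_mixed_newlines(b"a\r\nb\nc\rd\n"))
    print(has_mixed_newlines(b"a\nb\n"))
```

A caller could refuse to save (or at least warn) when
has_mixed_newlines() is true, instead of silently normalizing the file.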

 > Has anyone even ever used this?

Not I.  When I care about such issues I prefer that the codec raise an
exception at the time of detection.


From brett at python.org  Fri Aug 10 21:11:12 2007
From: brett at python.org (Brett Cannon)
Date: Fri, 10 Aug 2007 12:11:12 -0700
Subject: [Python-3000] No (C) optimization flag
In-Reply-To: <46BC9400.70803@gmail.com>
References: <f9hakp$6n4$1@sea.gmane.org>
	<ca471dc20708100744u58df433s11479973620aa93a@mail.gmail.com>
	<46BC9400.70803@gmail.com>
Message-ID: <bbaeab100708101211nd7c04b8x9726c93e09d4452b@mail.gmail.com>

On 8/10/07, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Guido van Rossum wrote:
> > If you really need to step through the Python code, you can just
> > sabotage the loading of the non-Python version, e.g. remove or rename
> > the .so or .dll file temporarily.
> >
> > I wonder about the usefulness of this debugging though -- if you're
> > debugging something that requires you to step through the C code, how
> > do you know that the same bug is present in the Python code you're
> > stepping through instead? Otherwise (if you're debugging a bug in your
> > own program) I'm not sure I see how stepping through the I/O library
> > is helpful.
> >
> > Sounds like what you're really after is *understanding* how the I/O
> > library works. For that, perhaps reading the docs and then reading the
> > source code would be more effective.
>
> However we select between Python and native module versions, the build
> bots need to be set up to run the modules both ways (with and without C
> optimisation).
>

Part of Alexandre's SoC work is to come up with a mechanism to do this.

-Brett

From guido at python.org  Fri Aug 10 21:16:47 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 10 Aug 2007 12:16:47 -0700
Subject: [Python-3000] Move to a "py3k" branch *DONE*
In-Reply-To: <ca471dc20708101128v43204087jb29e1a538fed8991@mail.gmail.com>
References: <ca471dc20708090743k30562834uee2abd624f2f41b5@mail.gmail.com>
	<ee2a432c0708092237i3b1cf045ied38af29c75d394c@mail.gmail.com>
	<ee2a432c0708092331s45df1950yd1bfe14bac22fdbb@mail.gmail.com>
	<ee2a432c0708100031x3424e3e9v716e50c7464e4912@mail.gmail.com>
	<ca471dc20708101118y59b1e56dh5fda77fa461e69b8@mail.gmail.com>
	<ca471dc20708101128v43204087jb29e1a538fed8991@mail.gmail.com>
Message-ID: <ca471dc20708101216m213028d0yec379995cf473157@mail.gmail.com>

I've updated the wiki page with the status for these. I've confirmed
the test_datetime segfault, but I can only provoke it when run in
sequence with all the others.

I'm also experiencing a hang of test_asynchat when run in sequence.

http://wiki.python.org/moin/Py3kStrUniTests

--Guido

On 8/10/07, Guido van Rossum <guido at python.org> wrote:
> Um, Neal reported some more failures with -R earlier. I can reproduce
> the failures for test_collections and test_runpy, but test_gzip passes
> fine for me (standalone). I'll look into these. I'm still running the
> full set as well, it'll take all day. I can't reproduce Neal's problem
> with test_poplib (which was pegging the CPU for him).
>
> On 8/10/07, Guido van Rossum <guido at python.org> wrote:
> > Status update:
> >
> > The following still leak (regrtest.py -R4:3:)
> >
> > test_array leaked [11, 11, 11] references, sum=33
> > test_multibytecodec leaked [72, 72, 72] references, sum=216
> > test_parser leaked [5, 5, 5] references, sum=15
> > test_zipimport leaked [29, 29, 29] references, sum=87
> >
> > I can't reproduce the test_shelve failure.
> >
> > I *do* see the test_structmembers failure, will investigate.
> >
> > I see a failure but no segfault in test_datetime; will investigate.
> >
> > Regarding test_univnewlines, this is virgin territory. I've never met
> > anyone who used the newlines attribute on file objects. I'll make a
> > separate post to call it out.
> >
> > --Guido
> >
> > On 8/10/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
> > > Bah, who needs sleep anyways.  This list of problems should be fairly
> > > complete when running with -R.  (it skips the fatal error from
> > > test_datetime though)
> > >
> > > Code to trigger a leak:   b'\xff'.decode("utf8", "ignore")
> > >
> > > Leaks:
> > > test_array leaked [11, 11, 11] references, sum=33
> > > test_bytes leaked [4, 4, 4] references, sum=12
> > > test_codeccallbacks leaked [21, 21, 21] references, sum=63
> > > test_codecs leaked [260, 260, 260] references, sum=780
> > > test_ctypes leaked [-22, 43, 10] references, sum=31
> > > test_multibytecodec leaked [72, 72, 72] references, sum=216
> > > test_parser leaked [5, 5, 5] references, sum=15
> > > test_unicode leaked [4, 4, 4] references, sum=12
> > > test_xml_etree leaked [128, 128, 128] references, sum=384
> > > test_xml_etree_c leaked [128, 128, 128] references, sum=384
> > > test_zipimport leaked [29, 29, 29] references, sum=87
> > >
> > > Failures with -R:
> > >
> > > test test_collections failed -- errors occurred; run in verbose mode for details
> > >
> > > test test_gzip failed -- Traceback (most recent call last):
> > >  File "/home/neal/python/dev/py3k/Lib/test/test_gzip.py", line 77, in
> > > test_many_append
> > >    ztxt = zgfile.read(8192)
> > >  File "/home/neal/python/dev/py3k/Lib/gzip.py", line 236, in read
> > >    self._read(readsize)
> > >  File "/home/neal/python/dev/py3k/Lib/gzip.py", line 301, in _read
> > >    self._read_eof()
> > >  File "/home/neal/python/dev/py3k/Lib/gzip.py", line 317, in _read_eof
> > >    crc32 = read32(self.fileobj)
> > >  File "/home/neal/python/dev/py3k/Lib/gzip.py", line 40, in read32
> > >    return struct.unpack("<l", input.read(4))[0]
> > >  File "/home/neal/python/dev/py3k/Lib/struct.py", line 97, in unpack
> > >    return o.unpack(s)
> > > struct.error: unpack requires a string argument of length 4
> > >
> > > test test_runpy failed -- Traceback (most recent call last):
> > >  File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 230,
> > > in test_run_module
> > >    self._check_module(depth)
> > >  File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 168,
> > > in _check_module
> > >    d2 = run_module(mod_name) # Read from bytecode
> > >  File "/home/neal/python/dev/py3k/Lib/runpy.py", line 72, in run_module
> > >    raise ImportError("No module named %s" % mod_name)
> > > ImportError: No module named runpy_test
> > >
> > > test test_shelve failed -- errors occurred; run in verbose mode for details
> > >
> > > test test_structmembers failed -- errors occurred; run in verbose mode
> > > for details
> > >
> > > test_univnewlines skipped -- This Python does not have universal newline support
> > >
> > > Traceback (most recent call last):
> > >   File "/home/neal/python/dev/py3k/Lib/SocketServer.py", line 222, in
> > > handle_request
> > >     self.process_request(request, client_address)
> > >   File "/home/neal/python/dev/py3k/Lib/SocketServer.py", line 241, in
> > > process_request
> > >     self.finish_request(request, client_address)
> > >   File "/home/neal/python/dev/py3k/Lib/SocketServer.py", line 254, in
> > > finish_request
> > >     self.RequestHandlerClass(request, client_address, self)
> > >   File "/home/neal/python/dev/py3k/Lib/SocketServer.py", line 522, in __init__
> > >     self.handle()
> > >   File "/home/neal/python/dev/py3k/Lib/BaseHTTPServer.py", line 316, in handle
> > >     self.handle_one_request()
> > >   File "/home/neal/python/dev/py3k/Lib/BaseHTTPServer.py", line 303,
> > > in handle_one_request
> > >     if not self.parse_request(): # An error code has been sent, just exit
> > >   File "/home/neal/python/dev/py3k/Lib/BaseHTTPServer.py", line 281,
> > > in parse_request
> > >     self.headers = self.MessageClass(self.rfile, 0)
> > >   File "/home/neal/python/dev/py3k/Lib/mimetools.py", line 16, in __init__
> > >     rfc822.Message.__init__(self, fp, seekable)
> > >   File "/home/neal/python/dev/py3k/Lib/rfc822.py", line 104, in __init__
> > >     self.readheaders()
> > >   File "/home/neal/python/dev/py3k/Lib/rfc822.py", line 172, in readheaders
> > >     headerseen = self.isheader(line)
> > >   File "/home/neal/python/dev/py3k/Lib/rfc822.py", line 202, in isheader
> > >     return line[:i].lower()
> > > AttributeError: 'bytes' object has no attribute 'lower'
> > >
> > > On 8/9/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
> > > > I wonder if a lot of the refleaks may have the same cause as this one:
> > > >
> > > >   b'\xff'.decode("utf8", "ignore")
> > > >
> > > > No leaks jumped out at me.  Here is the rest of the leaks that have
> > > > been reported so far.  I don't know how many have the same cause.
> > > >
> > > > test_multibytecodec leaked [72, 72, 72] references, sum=216
> > > > test_parser leaked [5, 5, 5] references, sum=15
> > > >
> > > > The other failures that occurred with -R:
> > > >
> > > > test test_collections failed -- errors occurred; run in verbose mode for details
> > > >
> > > > test test_gzip failed -- Traceback (most recent call last):
> > > >   File "/home/neal/python/dev/py3k/Lib/test/test_gzip.py", line 77, in
> > > > test_many_append
> > > >     ztxt = zgfile.read(8192)
> > > >   File "/home/neal/python/dev/py3k/Lib/gzip.py", line 236, in read
> > > >     self._read(readsize)
> > > >   File "/home/neal/python/dev/py3k/Lib/gzip.py", line 301, in _read
> > > >     self._read_eof()
> > > >   File "/home/neal/python/dev/py3k/Lib/gzip.py", line 317, in _read_eof
> > > >     crc32 = read32(self.fileobj)
> > > >   File "/home/neal/python/dev/py3k/Lib/gzip.py", line 40, in read32
> > > >     return struct.unpack("<l", input.read(4))[0]
> > > >   File "/home/neal/python/dev/py3k/Lib/struct.py", line 97, in unpack
> > > >     return o.unpack(s)
> > > > struct.error: unpack requires a string argument of length 4
> > > >
> > > > test test_runpy failed -- Traceback (most recent call last):
> > > >   File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 230,
> > > > in test_run_module
> > > >     self._check_module(depth)
> > > >   File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 168,
> > > > in _check_module
> > > >     d2 = run_module(mod_name) # Read from bytecode
> > > >   File "/home/neal/python/dev/py3k/Lib/runpy.py", line 72, in run_module
> > > >     raise ImportError("No module named %s" % mod_name)
> > > > ImportError: No module named runpy_test
> > > >
> > > > test_textwrap was the last test to complete.  test_thread was still running.
> > > >
> > > > n
> > > > --
> > > > On 8/9/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
> > > > > On 8/9/07, Guido van Rossum <guido at python.org> wrote:
> > > > > > This is done. The new py3k branch is ready for business.
> > > > > >
> > > > > > Left to do:
> > > > > >
> > > > > > - switch the buildbot and the doc builder to use the new branch (Neal)
> > > > >
> > > > > I've updated to use the new branch.  I got the docs building, but
> > > > > there are many more problems.  I won't re-enable the cronjob until
> > > > > more things are working.
> > > > >
> > > > > > There are currently about 7 failing unit tests left:
> > > > > >
> > > > > > test_bsddb
> > > > > > test_bsddb3
> > > > > > test_email
> > > > > > test_email_codecs
> > > > > > test_email_renamed
> > > > > > test_sqlite
> > > > > > test_urllib2_localnet
> > > > >
> > > > > Ok, I disabled these, so if only they fail, mail shouldn't be sent
> > > > > (when I enable the script).
> > > > >
> > > > > There are other problems:
> > > > >  * had to kill test_poplib due to taking all cpu without progress
> > > > >  * bunch of tests leak (./python ./Lib/test/regrtest.py -R 4:3:
> > > > > test_foo test_bar ...)
> > > > >  * at least one test fails with a fatal error
> > > > >  * make install fails
> > > > >
> > > > > Here are the details (probably best to update the wiki with status
> > > > > before people start working on these):
> > > > >
> > > > > I'm not sure what was happening with test_poplib.  I had to kill
> > > > > test_poplib due to taking all cpu without progress.  When I ran it by
> > > > > itself, it was fine.  So there was some bad interaction with another
> > > > > test.
> > > > >
> > > > > Ref leaks and fatal error (see
> > > > > http://docs.python.org/dev/3.0/results/make-test-refleak.out):
> > > > > test_array leaked [11, 11, 11] references, sum=33
> > > > > test_bytes leaked [4, 4, 4] references, sum=12
> > > > > test_codeccallbacks leaked [21, 21, 21] references, sum=63
> > > > > test_codecs leaked [260, 260, 260] references, sum=780
> > > > > test_ctypes leaked [10, 10, 10] references, sum=30
> > > > > Fatal Python error:
> > > > > /home/neal/python/py3k/Modules/datetimemodule.c:1175 object at
> > > > > 0xb60b19c8 has negative ref count -4
> > > > >
> > > > > There are probably more, but I haven't had a chance to run more after
> > > > > test_datetime.
> > > > >
> > > > > This failure occurred while running with -R:
> > > > >
> > > > > test test_coding failed -- Traceback (most recent call last):
> > > > >   File "/tmp/python-test-3.0/local/lib/python3.0/test/test_coding.py",
> > > > > line 12, in test_bad_coding2
> > > > >     self.verify_bad_module(module_name)
> > > > >   File "/tmp/python-test-3.0/local/lib/python3.0/test/test_coding.py",
> > > > > line 20, in verify_bad_module
> > > > >     text = fp.read()
> > > > >   File "/tmp/python-test-3.0/local/lib/python3.0/io.py", line 1148, in read
> > > > >     res += decoder.decode(self.buffer.read(), True)
> > > > >   File "/tmp/python-test-3.0/local/lib/python3.0/encodings/ascii.py",
> > > > > line 26, in decode
> > > > >     return codecs.ascii_decode(input, self.errors)[0]
> > > > > UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position
> > > > > 0: ordinal not in range(128)
> > > > >
> > > > >
> > > > > See http://docs.python.org/dev/3.0/results/make-install.out for this failure:
> > > > >
> > > > > Compiling /tmp/python-test-3.0/local/lib/python3.0/test/test_pep263.py ...
> > > > > Traceback (most recent call last):
> > > > >   File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line
> > > > > 162, in <module>
> > > > >     exit_status = int(not main())
> > > > >   File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line
> > > > > 152, in main
> > > > >     force, rx, quiet):
> > > > >   File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line
> > > > > 89, in compile_dir
> > > > >     if not compile_dir(fullname, maxlevels - 1, dfile, force, rx, quiet):
> > > > >   File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line
> > > > > 65, in compile_dir
> > > > >     ok = py_compile.compile(fullname, None, dfile, True)
> > > > >   File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line
> > > > > 144, in compile
> > > > >     py_exc = PyCompileError(err.__class__,err.args,dfile or file)
> > > > >   File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line
> > > > > 49, in __init__
> > > > >     tbtext = ''.join(traceback.format_exception_only(exc_type, exc_value))
> > > > >   File "/tmp/python-test-3.0/local/lib/python3.0/traceback.py", line
> > > > > 179, in format_exception_only
> > > > >     filename = value.filename or "<string>"
> > > > > AttributeError: 'tuple' object has no attribute 'filename'
> > > > >
> > > > > I'm guessing this came from the change in exception args handling?
> > > > >
> > > > >   File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line
> > > > > 144, in compile
> > > > >     py_exc = PyCompileError(err.__class__,err.args,dfile or file)
> > > > >
> > > > > n
> > > > >
> > > >
> > >
> >
> >
> > --
> > --Guido van Rossum (home page: http://www.python.org/~guido/)
> >
>
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Fri Aug 10 21:19:48 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 10 Aug 2007 12:19:48 -0700
Subject: [Python-3000] No (C) optimization flag
In-Reply-To: <f9ia3h$jbb$1@sea.gmane.org>
References: <f9hakp$6n4$1@sea.gmane.org>
	<ca471dc20708100744u58df433s11479973620aa93a@mail.gmail.com>
	<46BC9400.70803@gmail.com> <f9ia3h$jbb$1@sea.gmane.org>
Message-ID: <ca471dc20708101219l2fc1ff2bk3501c00a8bf13faa@mail.gmail.com>

On 8/10/07, Neil Schemenauer <nas at arctrix.com> wrote:
> Nick Coghlan <ncoghlan at gmail.com> wrote:
> > However we select between Python and native module versions, the build
> > bots need to be set up to run the modules both ways (with and without C
> > optimisation).
>
> If there is a way to explicitly import each module separately then I
> think that meets both needs.

This sounds good. It may be as simple as moving the Python
implementation into a separate module as well, and having the public
module attempt to import first from the C code, then from the Python
code.

I think that if there's code for which no C equivalent exists (e.g.
some stuff in heapq.py, presumably some stuff in io.py), it should be
in the public module, so the latter can do something like this:

try:
  from _c_foo import *  # C version
except ImportError:
  from _py_foo import *  # Py version
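
A runnable sketch of the same fallback, written so that a test suite
can also load each implementation explicitly by name.  The module
names _c_foo and _py_foo are the placeholders from above, and
load_implementation is a hypothetical helper; the stand-in module is
created in-process only so the example runs on its own.

```python
import importlib
import sys
import types


def load_implementation(candidates):
    """Return the first importable module from candidates, in order."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("no implementation available: %r" % (candidates,))


# Stand-in for the pure Python implementation (hypothetical module).
# No _c_foo is provided, so the C version is "missing" in this demo.
_py_foo = types.ModuleType("_py_foo")
_py_foo.describe = lambda: "pure Python"
sys.modules["_py_foo"] = _py_foo

# What the public module would do: prefer C, fall back to Python.
impl = load_implementation(["_c_foo", "_py_foo"])

# What a test run could do: ask for one implementation explicitly.
py_only = load_implementation(["_py_foo"])

if __name__ == "__main__":
    print(impl.__name__, impl.describe())
```

Keeping the two implementations in separately named modules is what
makes the explicit import possible; the public module only decides
which one to re-export.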

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From tony at PageDNA.com  Fri Aug 10 21:27:46 2007
From: tony at PageDNA.com (Tony Lownds)
Date: Fri, 10 Aug 2007 12:27:46 -0700
Subject: [Python-3000] Universal newlines support in Python 3.0
In-Reply-To: <ca471dc20708101123y3628e352n7116839dbfdc9bb3@mail.gmail.com>
References: <ca471dc20708101123y3628e352n7116839dbfdc9bb3@mail.gmail.com>
Message-ID: <E96C8047-00EF-48A7-A693-42D476557F99@PageDNA.com>


On Aug 10, 2007, at 11:23 AM, Guido van Rossum wrote:

> Python 3.0 currently has limited universal newlines support: by
> default, \r\n is translated into \n for text files, but this can be
> controlled by the newline= keyword parameter. For details on how, see
> PEP 3116. The PEP prescribes that a lone \r must also be translated,
> though this hasn't been implemented yet (any volunteers?).

I'll give it a shot!
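
For reference, a small sketch of the behaviour the PEP asks for, using
the newline= parameter as it exists in current io.TextIOWrapper (the
helper name read_text is made up, and this is not the patch itself):

```python
import io

RAW = b"a\r\nb\rc\n"


def read_text(data, newline):
    """Decode data through a text wrapper using the given newline mode."""
    with io.TextIOWrapper(io.BytesIO(data), encoding="ascii",
                          newline=newline) as f:
        return f.read()


# newline=None: universal newlines, every ending is translated to "\n",
# including the lone "\r".
translated = read_text(RAW, None)

# newline="": endings are still recognized but passed through untranslated.
untranslated = read_text(RAW, "")

if __name__ == "__main__":
    print(repr(translated))
    print(repr(untranslated))
```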

-Tony

From jeremy at alum.mit.edu  Fri Aug 10 22:13:46 2007
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Fri, 10 Aug 2007 16:13:46 -0400
Subject: [Python-3000] Move to a "py3k" branch *DONE*
In-Reply-To: <ca471dc20708101216m213028d0yec379995cf473157@mail.gmail.com>
References: <ca471dc20708090743k30562834uee2abd624f2f41b5@mail.gmail.com>
	<ee2a432c0708092237i3b1cf045ied38af29c75d394c@mail.gmail.com>
	<ee2a432c0708092331s45df1950yd1bfe14bac22fdbb@mail.gmail.com>
	<ee2a432c0708100031x3424e3e9v716e50c7464e4912@mail.gmail.com>
	<ca471dc20708101118y59b1e56dh5fda77fa461e69b8@mail.gmail.com>
	<ca471dc20708101128v43204087jb29e1a538fed8991@mail.gmail.com>
	<ca471dc20708101216m213028d0yec379995cf473157@mail.gmail.com>
Message-ID: <e8bf7a530708101313m6f992b1du38c6200b17b9a63b@mail.gmail.com>

I also see test_shelve failing because something is passing bytes as a
dictionary key.  I've just started seeing it, but can't figure out
what caused the change.

Jeremy

On 8/10/07, Guido van Rossum <guido at python.org> wrote:
> I've updated the wiki page with the status for these. I've confirmed
> the test_datetime segfault, but I can only provoke it when run in
> sequence with all the others.
>
> I'm also experiencing a hang of test_asynchat when run in sequence.
>
> http://wiki.python.org/moin/Py3kStrUniTests
>
> --Guido
>
> On 8/10/07, Guido van Rossum <guido at python.org> wrote:
> > Um, Neal reported some more failures with -R earlier. I can reproduce
> > the failures for test_collections and test_runpy, but test_gzip passes
> > fine for me (standalone). I'll look into these. I'm still running the
> > full set as well, it'll take all day. I can't reproduce Neal's problem
> > with test_poplib (which was pegging the CPU for him).
> >
> > On 8/10/07, Guido van Rossum <guido at python.org> wrote:
> > > Status update:
> > >
> > > The following still leak (regrtest.py -R4:3:)
> > >
> > > test_array leaked [11, 11, 11] references, sum=33
> > > test_multibytecodec leaked [72, 72, 72] references, sum=216
> > > test_parser leaked [5, 5, 5] references, sum=15
> > > test_zipimport leaked [29, 29, 29] references, sum=87
> > >
> > > I can't reproduce the test_shelve failure.
> > >
> > > I *do* see the test_structmembers failure, will investigate.
> > >
> > > I see a failure but no segfault in test_datetime; will investigate.
> > >
> > > Regarding test_univnewlines, this is virgin territory. I've never met
> > > anyone who used the newlines attribute on file objects. I'll make a
> > > separate post to call it out.
> > >
> > > --Guido
> > >
> > > On 8/10/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
> > > > Bah, who needs sleep anyways.  This list of problems should be fairly
> > > > complete when running with -R.  (it skips the fatal error from
> > > > test_datetime though)
> > > >
> > > > Code to trigger a leak:   b'\xff'.decode("utf8", "ignore")
> > > >
> > > > Leaks:
> > > > test_array leaked [11, 11, 11] references, sum=33
> > > > test_bytes leaked [4, 4, 4] references, sum=12
> > > > test_codeccallbacks leaked [21, 21, 21] references, sum=63
> > > > test_codecs leaked [260, 260, 260] references, sum=780
> > > > test_ctypes leaked [-22, 43, 10] references, sum=31
> > > > test_multibytecodec leaked [72, 72, 72] references, sum=216
> > > > test_parser leaked [5, 5, 5] references, sum=15
> > > > test_unicode leaked [4, 4, 4] references, sum=12
> > > > test_xml_etree leaked [128, 128, 128] references, sum=384
> > > > test_xml_etree_c leaked [128, 128, 128] references, sum=384
> > > > test_zipimport leaked [29, 29, 29] references, sum=87
> > > >
> > > > Failures with -R:
> > > >
> > > > test test_collections failed -- errors occurred; run in verbose mode for details
> > > >
> > > > test test_gzip failed -- Traceback (most recent call last):
> > > >  File "/home/neal/python/dev/py3k/Lib/test/test_gzip.py", line 77, in
> > > > test_many_append
> > > >    ztxt = zgfile.read(8192)
> > > >  File "/home/neal/python/dev/py3k/Lib/gzip.py", line 236, in read
> > > >    self._read(readsize)
> > > >  File "/home/neal/python/dev/py3k/Lib/gzip.py", line 301, in _read
> > > >    self._read_eof()
> > > >  File "/home/neal/python/dev/py3k/Lib/gzip.py", line 317, in _read_eof
> > > >    crc32 = read32(self.fileobj)
> > > >  File "/home/neal/python/dev/py3k/Lib/gzip.py", line 40, in read32
> > > >    return struct.unpack("<l", input.read(4))[0]
> > > >  File "/home/neal/python/dev/py3k/Lib/struct.py", line 97, in unpack
> > > >    return o.unpack(s)
> > > > struct.error: unpack requires a string argument of length 4
> > > >
> > > > test test_runpy failed -- Traceback (most recent call last):
> > > >  File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 230,
> > > > in test_run_module
> > > >    self._check_module(depth)
> > > >  File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 168,
> > > > in _check_module
> > > >    d2 = run_module(mod_name) # Read from bytecode
> > > >  File "/home/neal/python/dev/py3k/Lib/runpy.py", line 72, in run_module
> > > >    raise ImportError("No module named %s" % mod_name)
> > > > ImportError: No module named runpy_test
> > > >
> > > > test test_shelve failed -- errors occurred; run in verbose mode for details
> > > >
> > > > test test_structmembers failed -- errors occurred; run in verbose mode
> > > > for details
> > > >
> > > > test_univnewlines skipped -- This Python does not have universal newline support
> > > >
> > > > Traceback (most recent call last):
> > > >   File "/home/neal/python/dev/py3k/Lib/SocketServer.py", line 222, in
> > > > handle_request
> > > >     self.process_request(request, client_address)
> > > >   File "/home/neal/python/dev/py3k/Lib/SocketServer.py", line 241, in
> > > > process_request
> > > >     self.finish_request(request, client_address)
> > > >   File "/home/neal/python/dev/py3k/Lib/SocketServer.py", line 254, in
> > > > finish_request
> > > >     self.RequestHandlerClass(request, client_address, self)
> > > >   File "/home/neal/python/dev/py3k/Lib/SocketServer.py", line 522, in __init__
> > > >     self.handle()
> > > >   File "/home/neal/python/dev/py3k/Lib/BaseHTTPServer.py", line 316, in handle
> > > >     self.handle_one_request()
> > > >   File "/home/neal/python/dev/py3k/Lib/BaseHTTPServer.py", line 303,
> > > > in handle_one_request
> > > >     if not self.parse_request(): # An error code has been sent, just exit
> > > >   File "/home/neal/python/dev/py3k/Lib/BaseHTTPServer.py", line 281,
> > > > in parse_request
> > > >     self.headers = self.MessageClass(self.rfile, 0)
> > > >   File "/home/neal/python/dev/py3k/Lib/mimetools.py", line 16, in __init__
> > > >     rfc822.Message.__init__(self, fp, seekable)
> > > >   File "/home/neal/python/dev/py3k/Lib/rfc822.py", line 104, in __init__
> > > >     self.readheaders()
> > > >   File "/home/neal/python/dev/py3k/Lib/rfc822.py", line 172, in readheaders
> > > >     headerseen = self.isheader(line)
> > > >   File "/home/neal/python/dev/py3k/Lib/rfc822.py", line 202, in isheader
> > > >     return line[:i].lower()
> > > > AttributeError: 'bytes' object has no attribute 'lower'
> > > >
> > > > On 8/9/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
> > > > > I wonder if a lot of the refleaks may have the same cause as this one:
> > > > >
> > > > >   b'\xff'.decode("utf8", "ignore")
> > > > >
> > > > > No leaks jumped out at me.  Here is the rest of the leaks that have
> > > > > been reported so far.  I don't know how many have the same cause.
> > > > >
> > > > > test_multibytecodec leaked [72, 72, 72] references, sum=216
> > > > > test_parser leaked [5, 5, 5] references, sum=15
> > > > >
> > > > > The other failures that occurred with -R:
> > > > >
> > > > > test test_collections failed -- errors occurred; run in verbose mode for details
> > > > >
> > > > > test test_gzip failed -- Traceback (most recent call last):
> > > > >   File "/home/neal/python/dev/py3k/Lib/test/test_gzip.py", line 77, in
> > > > > test_many_append
> > > > >     ztxt = zgfile.read(8192)
> > > > >   File "/home/neal/python/dev/py3k/Lib/gzip.py", line 236, in read
> > > > >     self._read(readsize)
> > > > >   File "/home/neal/python/dev/py3k/Lib/gzip.py", line 301, in _read
> > > > >     self._read_eof()
> > > > >   File "/home/neal/python/dev/py3k/Lib/gzip.py", line 317, in _read_eof
> > > > >     crc32 = read32(self.fileobj)
> > > > >   File "/home/neal/python/dev/py3k/Lib/gzip.py", line 40, in read32
> > > > >     return struct.unpack("<l", input.read(4))[0]
> > > > >   File "/home/neal/python/dev/py3k/Lib/struct.py", line 97, in unpack
> > > > >     return o.unpack(s)
> > > > > struct.error: unpack requires a string argument of length 4
> > > > >
> > > > > test test_runpy failed -- Traceback (most recent call last):
> > > > >   File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 230,
> > > > > in test_run_module
> > > > >     self._check_module(depth)
> > > > >   File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 168,
> > > > > in _check_module
> > > > >     d2 = run_module(mod_name) # Read from bytecode
> > > > >   File "/home/neal/python/dev/py3k/Lib/runpy.py", line 72, in run_module
> > > > >     raise ImportError("No module named %s" % mod_name)
> > > > > ImportError: No module named runpy_test
> > > > >
> > > > > test_textwrap was the last test to complete.  test_thread was still running.
> > > > >
> > > > > n
> > > > > --
> > > > > On 8/9/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
> > > > > > On 8/9/07, Guido van Rossum <guido at python.org> wrote:
> > > > > > > This is done. The new py3k branch is ready for business.
> > > > > > >
> > > > > > > Left to do:
> > > > > > >
> > > > > > > - switch the buildbot and the doc builder to use the new branch (Neal)
> > > > > >
> > > > > > I've updated to use the new branch.  I got the docs building, but
> > > > > > there are many more problems.  I won't re-enable the cronjob until
> > > > > > more things are working.
> > > > > >
> > > > > > > There are currently about 7 failing unit tests left:
> > > > > > >
> > > > > > > test_bsddb
> > > > > > > test_bsddb3
> > > > > > > test_email
> > > > > > > test_email_codecs
> > > > > > > test_email_renamed
> > > > > > > test_sqlite
> > > > > > > test_urllib2_localnet
> > > > > >
> > > > > > Ok, I disabled these, so if only they fail, mail shouldn't be sent
> > > > > > (when I enable the script).
> > > > > >
> > > > > > There are other problems:
> > > > > >  * had to kill test_poplib due to taking all cpu without progress
> > > > > >  * bunch of tests leak (./python ./Lib/test/regrtest.py -R 4:3:
> > > > > > test_foo test_bar ...)
> > > > > >  * at least one test fails with a fatal error
> > > > > >  * make install fails
> > > > > >
> > > > > > Here are the details (probably best to update the wiki with status
> > > > > > before people start working on these):
> > > > > >
> > > > > > I'm not sure what was happening with test_poplib.  I had to kill
> > > > > > test_poplib due to taking all cpu without progress.  When I ran it by
> > > > > > itself, it was fine.  So there was some bad interaction with another
> > > > > > test.
> > > > > >
> > > > > > Ref leaks and fatal error (see
> > > > > > http://docs.python.org/dev/3.0/results/make-test-refleak.out):
> > > > > > test_array leaked [11, 11, 11] references, sum=33
> > > > > > test_bytes leaked [4, 4, 4] references, sum=12
> > > > > > test_codeccallbacks leaked [21, 21, 21] references, sum=63
> > > > > > test_codecs leaked [260, 260, 260] references, sum=780
> > > > > > test_ctypes leaked [10, 10, 10] references, sum=30
> > > > > > Fatal Python error:
> > > > > > /home/neal/python/py3k/Modules/datetimemodule.c:1175 object at
> > > > > > 0xb60b19c8 has negative ref count -4
> > > > > >
> > > > > > There are probably more, but I haven't had a chance to run more after
> > > > > > test_datetime.
> > > > > >
> > > > > > This failure occurred while running with -R:
> > > > > >
> > > > > > test test_coding failed -- Traceback (most recent call last):
> > > > > >   File "/tmp/python-test-3.0/local/lib/python3.0/test/test_coding.py",
> > > > > > line 12, in test_bad_coding2
> > > > > >     self.verify_bad_module(module_name)
> > > > > >   File "/tmp/python-test-3.0/local/lib/python3.0/test/test_coding.py",
> > > > > > line 20, in verify_bad_module
> > > > > >     text = fp.read()
> > > > > >   File "/tmp/python-test-3.0/local/lib/python3.0/io.py", line 1148, in read
> > > > > >     res += decoder.decode(self.buffer.read(), True)
> > > > > >   File "/tmp/python-test-3.0/local/lib/python3.0/encodings/ascii.py",
> > > > > > line 26, in decode
> > > > > >     return codecs.ascii_decode(input, self.errors)[0]
> > > > > > UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position
> > > > > > 0: ordinal not in range(128)
> > > > > >
> > > > > >
> > > > > > See http://docs.python.org/dev/3.0/results/make-install.out for this failure:
> > > > > >
> > > > > > Compiling /tmp/python-test-3.0/local/lib/python3.0/test/test_pep263.py ...
> > > > > > Traceback (most recent call last):
> > > > > >   File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line
> > > > > > 162, in <module>
> > > > > >     exit_status = int(not main())
> > > > > >   File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line
> > > > > > 152, in main
> > > > > >     force, rx, quiet):
> > > > > >   File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line
> > > > > > 89, in compile_dir
> > > > > >     if not compile_dir(fullname, maxlevels - 1, dfile, force, rx, quiet):
> > > > > >   File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line
> > > > > > 65, in compile_dir
> > > > > >     ok = py_compile.compile(fullname, None, dfile, True)
> > > > > >   File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line
> > > > > > 144, in compile
> > > > > >     py_exc = PyCompileError(err.__class__,err.args,dfile or file)
> > > > > >   File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line
> > > > > > 49, in __init__
> > > > > >     tbtext = ''.join(traceback.format_exception_only(exc_type, exc_value))
> > > > > >   File "/tmp/python-test-3.0/local/lib/python3.0/traceback.py", line
> > > > > > 179, in format_exception_only
> > > > > >     filename = value.filename or "<string>"
> > > > > > AttributeError: 'tuple' object has no attribute 'filename'
> > > > > >
> > > > > > I'm guessing this came from the change in exception args handling?
> > > > > >
> > > > > >   File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line
> > > > > > 144, in compile
> > > > > >     py_exc = PyCompileError(err.__class__,err.args,dfile or file)
> > > > > >
> > > > > > n
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > --Guido van Rossum (home page: http://www.python.org/~guido/)
> > >
> >
> >
> > --
> > --Guido van Rossum (home page: http://www.python.org/~guido/)
> >
>
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>

From guido at python.org  Fri Aug 10 23:04:28 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 10 Aug 2007 14:04:28 -0700
Subject: [Python-3000] Move to a "py3k" branch *DONE*
In-Reply-To: <e8bf7a530708101313m6f992b1du38c6200b17b9a63b@mail.gmail.com>
References: <ca471dc20708090743k30562834uee2abd624f2f41b5@mail.gmail.com>
	<ee2a432c0708092237i3b1cf045ied38af29c75d394c@mail.gmail.com>
	<ee2a432c0708092331s45df1950yd1bfe14bac22fdbb@mail.gmail.com>
	<ee2a432c0708100031x3424e3e9v716e50c7464e4912@mail.gmail.com>
	<ca471dc20708101118y59b1e56dh5fda77fa461e69b8@mail.gmail.com>
	<ca471dc20708101128v43204087jb29e1a538fed8991@mail.gmail.com>
	<ca471dc20708101216m213028d0yec379995cf473157@mail.gmail.com>
	<e8bf7a530708101313m6f992b1du38c6200b17b9a63b@mail.gmail.com>
Message-ID: <ca471dc20708101404h4a2d8500i22a1a313ba638053@mail.gmail.com>

I tried test_shelve on three boxes, with the following results:

Ubuntu, using gdbm: pass
OSX, using dbm: pass
Red Hat 7.3, using bsddb: fail

So this seems to be a lingering bsddb failure. (I think that's the
"simple" bsddb module, not the full bsddb3 package.)

--Guido

On 8/10/07, Jeremy Hylton <jeremy at alum.mit.edu> wrote:
> I also see test_shelve failing because something is passing bytes as a
> dictionary key.  I've just started seeing it, but can't figure out
> what caused the change.
>
> Jeremy
>
> On 8/10/07, Guido van Rossum <guido at python.org> wrote:
> > I've updated the wiki page with the status for these. I've confirmed
> > the test_datetime segfault, but I can only provoke it when run in
> > sequence with all the others.
> >
> > I'm also experiencing a hang of test_asynchat when run in sequence.
> >
> > http://wiki.python.org/moin/Py3kStrUniTests
> >
> > --Guido
> >
> > On 8/10/07, Guido van Rossum <guido at python.org> wrote:
> > > Um, Neal reported some more failures with -R earlier. I can reproduce
> > > the failures for test_collections and test_runpy, but test_gzip passes
> > > fine for me (standalone). I'll look into these. I'm still running the
> > > full set as well, it'll take all day. I can't reproduce Neal's problem
> > > with test_poplib (which was pegging the CPU for him).
> > >
> > > On 8/10/07, Guido van Rossum <guido at python.org> wrote:
> > > > Status update:
> > > >
> > > > The following still leak (regrtest.py -R4:3:)
> > > >
> > > > test_array leaked [11, 11, 11] references, sum=33
> > > > test_multibytecodec leaked [72, 72, 72] references, sum=216
> > > > test_parser leaked [5, 5, 5] references, sum=15
> > > > test_zipimport leaked [29, 29, 29] references, sum=87
> > > >
> > > > I can't reproduce the test_shelve failure.
> > > >
> > > > I *do* see the test_structmember failure, will investigate.
> > > >
> > > > I see a failure but no segfault in test_datetime; will investigate.
> > > >
> > > > Regarding test_univnewlines, this is virgin territory. I've never met
> > > > anyone who used the newlines attribute on file objects. I'll make a
> > > > separate post to call it out.
> > > >
> > > > --Guido
> > > >
> > > > On 8/10/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
> > > > > Bah, who needs sleep anyways.  This list of problems should be fairly
> > > > > complete when running with -R.  (it skips the fatal error from
> > > > > test_datetime though)
> > > > >
> > > > > Code to trigger a leak:   b'\xff'.decode("utf8", "ignore")
> > > > >
> > > > > Leaks:
> > > > > test_array leaked [11, 11, 11] references, sum=33
> > > > > test_bytes leaked [4, 4, 4] references, sum=12
> > > > > test_codeccallbacks leaked [21, 21, 21] references, sum=63
> > > > > test_codecs leaked [260, 260, 260] references, sum=780
> > > > > test_ctypes leaked [-22, 43, 10] references, sum=31
> > > > > test_multibytecodec leaked [72, 72, 72] references, sum=216
> > > > > test_parser leaked [5, 5, 5] references, sum=15
> > > > > test_unicode leaked [4, 4, 4] references, sum=12
> > > > > test_xml_etree leaked [128, 128, 128] references, sum=384
> > > > > test_xml_etree_c leaked [128, 128, 128] references, sum=384
> > > > > test_zipimport leaked [29, 29, 29] references, sum=87
> > > > >
> > > > > Failures with -R:
> > > > >
> > > > > test test_collections failed -- errors occurred; run in verbose mode for details
> > > > >
> > > > > test test_gzip failed -- Traceback (most recent call last):
> > > > >  File "/home/neal/python/dev/py3k/Lib/test/test_gzip.py", line 77, in
> > > > > test_many_append
> > > > >    ztxt = zgfile.read(8192)
> > > > >  File "/home/neal/python/dev/py3k/Lib/gzip.py", line 236, in read
> > > > >    self._read(readsize)
> > > > >  File "/home/neal/python/dev/py3k/Lib/gzip.py", line 301, in _read
> > > > >    self._read_eof()
> > > > >  File "/home/neal/python/dev/py3k/Lib/gzip.py", line 317, in _read_eof
> > > > >    crc32 = read32(self.fileobj)
> > > > >  File "/home/neal/python/dev/py3k/Lib/gzip.py", line 40, in read32
> > > > >    return struct.unpack("<l", input.read(4))[0]
> > > > >  File "/home/neal/python/dev/py3k/Lib/struct.py", line 97, in unpack
> > > > >    return o.unpack(s)
> > > > > struct.error: unpack requires a string argument of length 4
> > > > >
> > > > > test test_runpy failed -- Traceback (most recent call last):
> > > > >  File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 230,
> > > > > in test_run_module
> > > > >    self._check_module(depth)
> > > > >  File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 168,
> > > > > in _check_module
> > > > >    d2 = run_module(mod_name) # Read from bytecode
> > > > >  File "/home/neal/python/dev/py3k/Lib/runpy.py", line 72, in run_module
> > > > >    raise ImportError("No module named %s" % mod_name)
> > > > > ImportError: No module named runpy_test
> > > > >
> > > > > test test_shelve failed -- errors occurred; run in verbose mode for details
> > > > >
> > > > > test test_structmembers failed -- errors occurred; run in verbose mode
> > > > > for details
> > > > >
> > > > > test_univnewlines skipped -- This Python does not have universal newline support
> > > > >
> > > > > Traceback (most recent call last):
> > > > >   File "/home/neal/python/dev/py3k/Lib/SocketServer.py", line 222, in
> > > > > handle_request
> > > > >     self.process_request(request, client_address)
> > > > >   File "/home/neal/python/dev/py3k/Lib/SocketServer.py", line 241, in
> > > > > process_request
> > > > >     self.finish_request(request, client_address)
> > > > >   File "/home/neal/python/dev/py3k/Lib/SocketServer.py", line 254, in
> > > > > finish_request
> > > > >     self.RequestHandlerClass(request, client_address, self)
> > > > >   File "/home/neal/python/dev/py3k/Lib/SocketServer.py", line 522, in __init__
> > > > >     self.handle()
> > > > >   File "/home/neal/python/dev/py3k/Lib/BaseHTTPServer.py", line 316, in handle
> > > > >     self.handle_one_request()
> > > > >   File "/home/neal/python/dev/py3k/Lib/BaseHTTPServer.py", line 303,
> > > > > in handle_one_request
> > > > >     if not self.parse_request(): # An error code has been sent, just exit
> > > > >   File "/home/neal/python/dev/py3k/Lib/BaseHTTPServer.py", line 281,
> > > > > in parse_request
> > > > >     self.headers = self.MessageClass(self.rfile, 0)
> > > > >   File "/home/neal/python/dev/py3k/Lib/mimetools.py", line 16, in __init__
> > > > >     rfc822.Message.__init__(self, fp, seekable)
> > > > >   File "/home/neal/python/dev/py3k/Lib/rfc822.py", line 104, in __init__
> > > > >     self.readheaders()
> > > > >   File "/home/neal/python/dev/py3k/Lib/rfc822.py", line 172, in readheaders
> > > > >     headerseen = self.isheader(line)
> > > > >   File "/home/neal/python/dev/py3k/Lib/rfc822.py", line 202, in isheader
> > > > >     return line[:i].lower()
> > > > > AttributeError: 'bytes' object has no attribute 'lower'
> > > > >
> > > > > On 8/9/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
> > > > > > I wonder if a lot of the refleaks may have the same cause as this one:
> > > > > >
> > > > > >   b'\xff'.decode("utf8", "ignore")
> > > > > >
> > > > > > No leaks jumped out at me.  Here is the rest of the leaks that have
> > > > > > been reported so far.  I don't know how many have the same cause.
> > > > > >
> > > > > > test_multibytecodec leaked [72, 72, 72] references, sum=216
> > > > > > test_parser leaked [5, 5, 5] references, sum=15
> > > > > >
> > > > > > The other failures that occurred with -R:
> > > > > >
> > > > > > test test_collections failed -- errors occurred; run in verbose mode for details
> > > > > >
> > > > > > test test_gzip failed -- Traceback (most recent call last):
> > > > > >   File "/home/neal/python/dev/py3k/Lib/test/test_gzip.py", line 77, in
> > > > > > test_many_append
> > > > > >     ztxt = zgfile.read(8192)
> > > > > >   File "/home/neal/python/dev/py3k/Lib/gzip.py", line 236, in read
> > > > > >     self._read(readsize)
> > > > > >   File "/home/neal/python/dev/py3k/Lib/gzip.py", line 301, in _read
> > > > > >     self._read_eof()
> > > > > >   File "/home/neal/python/dev/py3k/Lib/gzip.py", line 317, in _read_eof
> > > > > >     crc32 = read32(self.fileobj)
> > > > > >   File "/home/neal/python/dev/py3k/Lib/gzip.py", line 40, in read32
> > > > > >     return struct.unpack("<l", input.read(4))[0]
> > > > > >   File "/home/neal/python/dev/py3k/Lib/struct.py", line 97, in unpack
> > > > > >     return o.unpack(s)
> > > > > > struct.error: unpack requires a string argument of length 4
> > > > > >
> > > > > > test test_runpy failed -- Traceback (most recent call last):
> > > > > >   File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 230,
> > > > > > in test_run_module
> > > > > >     self._check_module(depth)
> > > > > >   File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 168,
> > > > > > in _check_module
> > > > > >     d2 = run_module(mod_name) # Read from bytecode
> > > > > >   File "/home/neal/python/dev/py3k/Lib/runpy.py", line 72, in run_module
> > > > > >     raise ImportError("No module named %s" % mod_name)
> > > > > > ImportError: No module named runpy_test
> > > > > >
> > > > > > test_textwrap was the last test to complete.  test_thread was still running.
> > > > > >
> > > > > > n
> > > > > > --
> > > > > > On 8/9/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
> > > > > > > On 8/9/07, Guido van Rossum <guido at python.org> wrote:
> > > > > > > > This is done. The new py3k branch is ready for business.
> > > > > > > >
> > > > > > > > Left to do:
> > > > > > > >
> > > > > > > > - switch the buildbot and the doc builder to use the new branch (Neal)
> > > > > > >
> > > > > > > I've updated to use the new branch.  I got the docs building, but
> > > > > > > there are many more problems.  I won't re-enable the cronjob until
> > > > > > > more things are working.
> > > > > > >
> > > > > > > > There are currently about 7 failing unit tests left:
> > > > > > > >
> > > > > > > > test_bsddb
> > > > > > > > test_bsddb3
> > > > > > > > test_email
> > > > > > > > test_email_codecs
> > > > > > > > test_email_renamed
> > > > > > > > test_sqlite
> > > > > > > > test_urllib2_localnet
> > > > > > >
> > > > > > > Ok, I disabled these, so if only they fail, mail shouldn't be sent
> > > > > > > (when I enable the script).
> > > > > > >
> > > > > > > There are other problems:
> > > > > > >  * had to kill test_poplib due to taking all cpu without progress
> > > > > > >  * bunch of tests leak (./python ./Lib/test/regrtest.py -R 4:3:
> > > > > > > test_foo test_bar ...)
> > > > > > >  * at least one test fails with a fatal error
> > > > > > >  * make install fails
> > > > > > >
> > > > > > > Here are the details (probably best to update the wiki with status
> > > > > > > before people start working on these):
> > > > > > >
> > > > > > > I'm not sure what was happening with test_poplib.  I had to kill
> > > > > > > test_poplib due to taking all cpu without progress.  When I ran it by
> > > > > > > itself, it was fine.  So there was some bad interaction with another
> > > > > > > test.
> > > > > > >
> > > > > > > Ref leaks and fatal error (see
> > > > > > > http://docs.python.org/dev/3.0/results/make-test-refleak.out):
> > > > > > > test_array leaked [11, 11, 11] references, sum=33
> > > > > > > test_bytes leaked [4, 4, 4] references, sum=12
> > > > > > > test_codeccallbacks leaked [21, 21, 21] references, sum=63
> > > > > > > test_codecs leaked [260, 260, 260] references, sum=780
> > > > > > > test_ctypes leaked [10, 10, 10] references, sum=30
> > > > > > > Fatal Python error:
> > > > > > > /home/neal/python/py3k/Modules/datetimemodule.c:1175 object at
> > > > > > > 0xb60b19c8 has negative ref count -4
> > > > > > >
> > > > > > > There are probably more, but I haven't had a chance to run more after
> > > > > > > test_datetime.
> > > > > > >
> > > > > > > This failure occurred while running with -R:
> > > > > > >
> > > > > > > test test_coding failed -- Traceback (most recent call last):
> > > > > > >   File "/tmp/python-test-3.0/local/lib/python3.0/test/test_coding.py",
> > > > > > > line 12, in test_bad_coding2
> > > > > > >     self.verify_bad_module(module_name)
> > > > > > >   File "/tmp/python-test-3.0/local/lib/python3.0/test/test_coding.py",
> > > > > > > line 20, in verify_bad_module
> > > > > > >     text = fp.read()
> > > > > > >   File "/tmp/python-test-3.0/local/lib/python3.0/io.py", line 1148, in read
> > > > > > >     res += decoder.decode(self.buffer.read(), True)
> > > > > > >   File "/tmp/python-test-3.0/local/lib/python3.0/encodings/ascii.py",
> > > > > > > line 26, in decode
> > > > > > >     return codecs.ascii_decode(input, self.errors)[0]
> > > > > > > UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position
> > > > > > > 0: ordinal not in range(128)
> > > > > > >
> > > > > > >
> > > > > > > See http://docs.python.org/dev/3.0/results/make-install.out for this failure:
> > > > > > >
> > > > > > > Compiling /tmp/python-test-3.0/local/lib/python3.0/test/test_pep263.py ...
> > > > > > > Traceback (most recent call last):
> > > > > > >   File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line
> > > > > > > 162, in <module>
> > > > > > >     exit_status = int(not main())
> > > > > > >   File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line
> > > > > > > 152, in main
> > > > > > >     force, rx, quiet):
> > > > > > >   File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line
> > > > > > > 89, in compile_dir
> > > > > > >     if not compile_dir(fullname, maxlevels - 1, dfile, force, rx, quiet):
> > > > > > >   File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line
> > > > > > > 65, in compile_dir
> > > > > > >     ok = py_compile.compile(fullname, None, dfile, True)
> > > > > > >   File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line
> > > > > > > 144, in compile
> > > > > > >     py_exc = PyCompileError(err.__class__,err.args,dfile or file)
> > > > > > >   File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line
> > > > > > > 49, in __init__
> > > > > > >     tbtext = ''.join(traceback.format_exception_only(exc_type, exc_value))
> > > > > > >   File "/tmp/python-test-3.0/local/lib/python3.0/traceback.py", line
> > > > > > > 179, in format_exception_only
> > > > > > >     filename = value.filename or "<string>"
> > > > > > > AttributeError: 'tuple' object has no attribute 'filename'
> > > > > > >
> > > > > > > I'm guessing this came from the change in exception args handling?
> > > > > > >
> > > > > > >   File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line
> > > > > > > 144, in compile
> > > > > > >     py_exc = PyCompileError(err.__class__,err.args,dfile or file)
> > > > > > >
> > > > > > > n
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > --Guido van Rossum (home page: http://www.python.org/~guido/)
> > > >
> > >
> > >
> > > --
> > > --Guido van Rossum (home page: http://www.python.org/~guido/)
> > >
> >
> >
> > --
> > --Guido van Rossum (home page: http://www.python.org/~guido/)
> >
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From alexandre at peadrop.com  Fri Aug 10 23:20:21 2007
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Fri, 10 Aug 2007 17:20:21 -0400
Subject: [Python-3000] No (C) optimization flag
In-Reply-To: <46BC9400.70803@gmail.com>
References: <f9hakp$6n4$1@sea.gmane.org>
	<ca471dc20708100744u58df433s11479973620aa93a@mail.gmail.com>
	<46BC9400.70803@gmail.com>
Message-ID: <acd65fa20708101420t291a9dc6me42d04cfcf8b30e1@mail.gmail.com>

On 8/10/07, Nick Coghlan <ncoghlan at gmail.com> wrote:
> However we select between Python and native module versions, the build
> bots need be set up to run the modules both ways (with and without C
> optimisation).

That is trivial to do without any runtime flags. For example, to test
both the C and Python implementations of StringIO (and BytesIO), I
define the Python implementation with a leading underscore and fall
back to it only if the C implementation is unavailable:

  class _StringIO(TextIOWrapper):
      ...

  # Use the faster implementation of StringIO if available
  try:
      from _stringio import StringIO
  except ImportError:
      StringIO = _StringIO

This way, the Python implementation remains available for testing
(or debugging). When testing the modules, I first check whether the C
implementation is available, then define the tests accordingly --
see for yourself:
http://svn.python.org/view/python/branches/cpy_merge/Lib/test/test_memoryio.py?rev=56445&view=markup
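
A minimal sketch of the same testing idea, assuming a current CPython
where the C implementation is exposed as io.StringIO and the pure-Python
one as _pyio.StringIO (those module names are an assumption about the
reader's interpreter, not about the cpy_merge branch). The helper name
check_roundtrip is hypothetical:

```python
import io
import _pyio  # pure-Python implementation shipped with current CPython


def check_roundtrip(stringio_cls):
    """Run the same checks against whichever implementation is passed in."""
    buf = stringio_cls()
    buf.write("hello")
    buf.write(" world")
    return buf.getvalue()


# Exercise both implementations with identical test logic, keyed by the
# module each class actually comes from.
results = {
    cls.__module__: check_roundtrip(cls)
    for cls in (io.StringIO, _pyio.StringIO)
}
```

Passing the class in as a parameter keeps one set of checks for both
versions, so neither implementation can drift without a test noticing.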

-- Alexandre

From victor.stinner at haypocalc.com  Sat Aug 11 01:49:10 2007
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Sat, 11 Aug 2007 01:49:10 +0200
Subject: [Python-3000] [Email-SIG] fix email module for python 3000
	(bytes/str)
In-Reply-To: <200708090241.08369.victor.stinner@haypocalc.com>
References: <200708090241.08369.victor.stinner@haypocalc.com>
Message-ID: <200708110149.10939.victor.stinner@haypocalc.com>

Hi,

On Thursday 09 August 2007 02:41:08 Victor Stinner wrote:
> I started to work on email module to port it for Python 3000, but I have
> trouble to understand if a function should returns bytes or str (because I
> don't know email module).

It's really hard to convert the email module to Python 3000 because it mixes
byte strings and (unicode) character strings...

I wrote some notes about bytes/str to help people migrate Python 2.x code
to Python 3000, or at least to explain the difference between the Python
2.x "str" type and the Python 3000 "bytes" type:
   http://wiki.python.org/moin/BytesStr

About the email module, some conclusions:
 test_email.py: openfile() must use 'rb' file mode for all tests
 base64MIME.decode() and base64MIME.encode() should accept bytes and str
 base64MIME.decode() result type is bytes
 base64MIME.encode() result type should be... bytes or str, no idea

Other decode() and encode() functions should follow the same rules about types.

The Python modules binascii and base64 chose the bytes type for the encode result.
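
As an illustration of that last point, here is a small sketch against
the base64 and binascii modules as they behave in a current Python 3
(an assumption; the 2007 branch may have differed in details). It only
shows the result types, not how the email package should behave:

```python
import base64
import binascii

raw = b"hello"

# Encoding takes bytes and returns bytes, not str.
encoded = base64.b64encode(raw)
hexed = binascii.hexlify(raw)

# Decoding returns bytes as well.
decoded = base64.b64decode(encoded)

# Turning the encoded form into text is an explicit, separate step.
encoded_text = encoded.decode("ascii")
```

Keeping the encode result as bytes leaves the choice of a text encoding
to the caller, which is one argument for base64MIME.encode() doing the
same.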

Victor Stinner aka haypo
http://hachoir.org/

From victor.stinner at haypocalc.com  Sat Aug 11 02:25:27 2007
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Sat, 11 Aug 2007 02:25:27 +0200
Subject: [Python-3000] bytes: compare bytes to integer
Message-ID: <200708110225.28056.victor.stinner@haypocalc.com>

Hi,

I don't like how Python 3000 behaves when an item of a bytes string is
compared with a bytes string of length 1:
   >>> b'xyz'[0] == b'x'
   False

The code can be seen as:
   >>> ord(b'x') == b'x'
   False

or also:
   >>> 120 == b'x'
   False


Two solutions:
 1. b'xyz'[0] returns a new bytes object (b'x' instead of 120),
    like b'xyz'[0:1] does
 2. allow comparing a one-byte bytes string with an integer

I prefer (2) since (1) is wrong: bytes contains integers and not bytes!
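
To make the "bytes contains integers" point concrete, here is a short
sketch of how indexing, slicing and iteration behave in a current
Python 3 (assumed here; the variable names are only for illustration):

```python
data = b"xyz"

first_as_int = data[0]        # indexing yields an int
first_as_bytes = data[0:1]    # slicing yields a bytes object of length 1
as_ints = list(data)          # iteration yields ints as well

# Going back from an int to a one-byte bytes object needs a list wrapper.
rebuilt = bytes([first_as_int])
```

So a comparison has to be int-to-int (data[0] == 120) or bytes-to-bytes
(data[0:1] == b'x'); mixing the two is what gives False above.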

Victor Stinner aka haypo
http://hachoir.org/

From victor.stinner at haypocalc.com  Sat Aug 11 02:35:43 2007
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Sat, 11 Aug 2007 02:35:43 +0200
Subject: [Python-3000] Fix imghdr module for bytes
Message-ID: <200708110235.43664.victor.stinner@haypocalc.com>

Hi,

I just saw that the what() function of the imghdr module requires the str
type for argument h, which is totally wrong! An image file is composed of
bytes and not characters.

The attached patch should fix it. Notes:
 - I used .startswith() instead of h[:len(s)] == s
 - I used h[0] == ord(b'P') instead of h[0] == b'P' because the second syntax
   doesn't work (see my other email "bytes: compare bytes to integer")
 - str is allowed but doesn't work: what() always returns None

I dislike "h[0] == ord(b'P')"; in Python 2.x it's simply "h[0] == 'P'". A
shorter syntax would be "h[0] == 80", but I prefer an explicit test. Maybe
that's silly: we manipulate bytes and not characters, so "h[0] == 80" is
acceptable... maybe with a comment?
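
A rough sketch of the kind of checks described in the notes, written
for a current Python 3. The function sniff_image_type is a hypothetical
stand-in, not the code from the attached patch, and it covers only
three formats:

```python
def sniff_image_type(h):
    """Guess an image type from the first bytes of a file (hypothetical)."""
    # Magic numbers are compared as bytes prefixes.
    if h.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if h.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    # PBM: 'P', then '1' or '4', then whitespace. h[0] is an int, so it
    # is compared against ord(b"P") (80) rather than against b"P".
    if len(h) >= 3 and h[0] == ord(b"P") and h[1] in b"14" and h[2] in b" \t\n\r":
        return "pbm"
    return None


kind = sniff_image_type(b"P4\n1 1\n\x00")
```

The ord(b"P") form costs a function call per check but documents which
character the magic number stands for; a literal 80 with a comment
would be equivalent.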


Is imghdr covered by the unit tests?


Victor Stinner
http://hachoir.org/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: py3k-imghdr.patch
Type: text/x-diff
Size: 2512 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070811/de5fac37/attachment.bin 

From rhamph at gmail.com  Sat Aug 11 02:45:33 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Fri, 10 Aug 2007 18:45:33 -0600
Subject: [Python-3000] Fix imghdr module for bytes
In-Reply-To: <200708110235.43664.victor.stinner@haypocalc.com>
References: <200708110235.43664.victor.stinner@haypocalc.com>
Message-ID: <aac2c7cb0708101745w53247653ifc4aeef0e9c287a4@mail.gmail.com>

On 8/10/07, Victor Stinner <victor.stinner at haypocalc.com> wrote:
> Hi,
>
> I just see that function what() of imghdr module requires str type for
> argument h which is totally wrong! An image file is composed of bytes and not
> characters.
>
> Attached patch should fix it. Notes:
>  - I used .startswith() instead of h[:len(s)] == s
>  - I used h[0] == ord(b'P') instead of h[0] == b'P' because the second syntax
> doesn't work (see my other email "bytes: compare bytes to integer")
> - str is allowed but doesn't work: what() always returns None
>
> I dislike "h[0] == ord(b'P')", in Python 2.x it's simply "h[0] == 'P'". A
> shorter syntax would be "h[0] == 80" but I prefer explicit test. It's maybe
> stupid, we manipulate bytes and not character, so "h[0] == 80" is
> acceptable... maybe with a comment?

Try h[0:1] == b'P'.  Slicing will ensure it stays as a bytes object,
rather than just giving the integer it contains.
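
A minimal sketch of the difference, assuming the Python 3 semantics discussed
in this thread (indexing a bytes object yields an int, slicing yields bytes).
The header value is made up for illustration and is not taken from the patch:

```python
# Hypothetical image header, used only for illustration.
h = b"P4\n# comment\n"

first_as_int = h[0]      # indexing yields an int (80 for ASCII 'P')
first_as_bytes = h[0:1]  # slicing keeps a bytes object of length 1

print(first_as_int == ord(b"P"))  # explicit integer comparison
print(first_as_int == 80)         # same test with a magic number
print(first_as_bytes == b"P")     # the slicing form suggested above
print(h.startswith(b"P"))         # prefix test, no indexing needed
print(first_as_int == b"P")       # int vs. bytes: never equal
```

The slice and startswith() forms keep the literal readable without ord().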

-- 
Adam Olsen, aka Rhamphoryncus

From brotchie at gmail.com  Sat Aug 11 02:51:15 2007
From: brotchie at gmail.com (James Brotchie)
Date: Sat, 11 Aug 2007 10:51:15 +1000
Subject: [Python-3000] idle3.0 - is is supposed to work?
In-Reply-To: <46BC0BE6.90908@v.loewis.de>
References: <tkrat.1e8187685ad86255@igpm.rwth-aachen.de>
	<87tzr8ei80.fsf@hydra.hampton.thirdcreek.com>
	<ca471dc20708091031k3ed03d8ap6c211706c4f19f7b@mail.gmail.com>
	<46BB96DF.5060305@v.loewis.de>
	<87ps1we3ak.fsf@hydra.hampton.thirdcreek.com>
	<46BB9CD7.2030301@v.loewis.de>
	<87lkckdyk6.fsf@hydra.hampton.thirdcreek.com>
	<46BC0BE6.90908@v.loewis.de>
Message-ID: <8e766a670708101751l5f1f3e7fh2e7e614520b2f7f0@mail.gmail.com>

On 8/10/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>
> >>> OTOH, IDLE ran w/o this error in p3yk...
> >> Yes. Somebody would have to study what precisely the problem is: is it
> >> that there is a None key in that dictionary, and that you must not use
> >> None as a tag name? In that case: where does the None come from?
> >> Or else: is it that you can use None as a tagname in 2.x, but can't
> >> anymore in 3.0? If so: why not?
> >
> > OK, I'll start looking at it.
>
> So did I, somewhat. It looks like a genuine bug in IDLE to me: you
> can't use None as a tag name, AFAIU. I'm not quite sure why this
> doesn't cause an exception in 2.x; if I try to give a None tag
> separately (i.e. in a stand-alone program) in 2.5,
> it gives me the same exception.


In 2.x the 'tag configure None' call does indeed raise a TclError; this can
be seen by trapping calls to Tkinter_Error in _tkinter.c and outputting
the Tkapp_Result.

For some reason on py3k this TclError doesn't get caught anywhere, whilst in
2.x it either gets caught or just disappears.

This behaviour can be demonstrated with:

    def config_colors(self):
        for tag, cnf in self.tagdefs.items():
            if cnf:
                try:
                    self.tag_configure(tag, **cnf)
                except:
                    sys.exit(1)
        self.tag_raise('sel')

on py3k the exception is caught and execution stops
on 2.x no exception is caught and execution continues (however Tkinter_Error
is still called during Tkiner_Call execution!?)

tag_configure doesn't behave this way when used in a trivial stand-alone
program, must be some obscurity within idle.

I'm confused....

James
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20070811/c62dff31/attachment.htm 

From victor.stinner at haypocalc.com  Sat Aug 11 03:09:00 2007
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Sat, 11 Aug 2007 03:09:00 +0200
Subject: [Python-3000] Fix sndhdr module for bytes
Message-ID: <200708110309.01014.victor.stinner@haypocalc.com>

Hi,

Like imghdr, the sndhdr tests were still based on Unicode strings instead of bytes.

The attached patch should fix the module. I'm very sorry, I was unable to test it.

Note: I replaced aifc.openfp with aifc.open since it's the new public 
function.

sndhdr requires some cleanup: it doesn't check division by zero in functions 
test_hcom and test_voc. I think that division by zero means that the file is 
invalid. I didn't want to fix these bugs in the same patch. So first I'm 
waiting for your comments about this one :-)

Victor Stinner
http://hachoir.org/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: py3k-sndhdr.patch
Type: text/x-diff
Size: 3258 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070811/25d513cc/attachment-0001.bin 

From greg.ewing at canterbury.ac.nz  Sat Aug 11 03:31:49 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 11 Aug 2007 13:31:49 +1200
Subject: [Python-3000] No (C) optimization flag
In-Reply-To: <f9hakp$6n4$1@sea.gmane.org>
References: <f9hakp$6n4$1@sea.gmane.org>
Message-ID: <46BD1185.2080702@canterbury.ac.nz>

Christian Heimes wrote:
> But on the
> other hand it is going to make debugging with pdb much harder because
> pdb can't step into C code.

But wouldn't the only reason you want to step into,
e.g. pickle be if there were a bug in pickle itself?
And if this happens when you're using the C version
of pickle, you need to debug the C version. Debugging
the Python version instead isn't going to help you.

--
Greg

From greg.ewing at canterbury.ac.nz  Sat Aug 11 03:48:11 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 11 Aug 2007 13:48:11 +1200
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46BC83BF.3000407@trueblade.com>
References: <46B13ADE.7080901@acm.org>
	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>
	<46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz>
	<46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com>
	<46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com>
	<46B5FBD9.4020301@acm.org> <46BBBEC6.5030705@trueblade.com>
	<fb6fbf560708100727tdecc140heac8481eaef5e5db@mail.gmail.com>
	<46BC83BF.3000407@trueblade.com>
Message-ID: <46BD155B.2010202@canterbury.ac.nz>

Eric Smith wrote:
> 1: "".format() ... understands which 
> types can be converted to other types, and does the conversions.
> 
> 2: each type's __format__ function understands how to convert to some 
> subset of all types (int can convert to float and decimal, for example).
> 
> The problem with approach 2 is that there's logic in 
> int.__format__() that understands float.__format__() specifiers, and 
> vice-versa.  At least with approach 1, all of this logic is in one place.

Whereas the problem with approach 1 is that it's not
extensible. You can't add new types with new format
specifiers that can be interconverted.

I don't think the logic needs to be complicated. As
long as the format spec syntaxes are chosen sensibly,
it's not necessary for e.g. int.__format__ to be able
to parse float's format specs, only to recognise when
it's got one. That could be as simple as

   if spec[:1] in 'efg':
     return float(self).__format__(spec)

> This implies that string and repr specifiers are discernible across all 
> types, and int and float specifiers are unique amongst themselves.

Another advantage of letting the __format__ methods
handle it is that a given type *can* handle another
type's format spec itself if it wants. E.g. if float
has some way of handling the 'd' format that's
considered better than converting to int first, then
it can do that.
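
A rough sketch of the "recognise, don't parse" idea. It assumes the
format-spec layout of current Python 3, where the type code is the last
character rather than the first; DelegatingInt and FLOAT_TYPES are
hypothetical names invented for this example:

```python
class DelegatingInt(int):
    """Hypothetical int subclass that hands float-style specs to float."""

    FLOAT_TYPES = "efg"  # assumption: these type codes belong to float

    def __format__(self, spec):
        # Recognise a float spec by its type code without parsing the rest.
        # The 'spec and' guard matters: '' in 'efg' is True in Python, so an
        # empty spec would otherwise be treated as a float spec.
        if spec and spec[-1] in self.FLOAT_TYPES:
            return float(self).__format__(spec)
        # Everything else goes to the plain int formatter.
        return int(self).__format__(spec)


print(format(DelegatingInt(3), ".2f"))
print(format(DelegatingInt(3), "5d"))
print(format(DelegatingInt(3), ""))
```

Note that the unguarded test "spec[:1] in 'efg'" is also true for an empty
spec, so a real implementation would need the same guard.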

--
Greg

From greg.ewing at canterbury.ac.nz  Sat Aug 11 03:57:42 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 11 Aug 2007 13:57:42 +1200
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46BCA9C9.1010306@ronadam.com>
References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org>
	<46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org>
	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>
	<46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz>
	<46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com>
	<46B52422.2090006@canterbury.ac.nz>
	<ca471dc20708041909j3f86b107hace9407673c302ea@mail.gmail.com>
	<46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz>
	<46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz>
	<46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org>
	<46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org>
	<46BAEFB0.9050400@ronadam.com> <46BBB1AE.5010207@canterbury.ac.nz>
	<46BBEB16.2040205@ronadam.com> <46BC1479.30405@canterbury.ac.nz>
	<46BCA9C9.1010306@ronadam.com>
Message-ID: <46BD1796.3000904@canterbury.ac.nz>

Ron Adam wrote:
> 
> I'm not sure what you mean by "ditch all of these".

I was guessing that what's meant by returning a
(value, format_spec) tuple is to re-try the
formatting using the new value and spec. That's
what I thought was unnecessary, since the method
can do that itself if it wants.

> The output in this case is a null string. Setting the format spec to 
> None tells the format() function not to do anything more to it.

Then why not just return an empty string?

> The format function would first call the object's __format__ method and 
> give it a chance to have control, and depending on what is returned, try 
> to handle it or not.

Okay, I see now -- your format function has more smarts
in it than mine.

But as was suggested earlier, returning NotImplemented
ought to be enough to signal a fallback to a different
strategy.

--
Greg

From greg.ewing at canterbury.ac.nz  Sat Aug 11 04:03:39 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 11 Aug 2007 14:03:39 +1200
Subject: [Python-3000] Universal newlines support in Python 3.0
In-Reply-To: <ca471dc20708101123y3628e352n7116839dbfdc9bb3@mail.gmail.com>
References: <ca471dc20708101123y3628e352n7116839dbfdc9bb3@mail.gmail.com>
Message-ID: <46BD18FB.5030901@canterbury.ac.nz>

Guido van Rossum wrote:
> However, the old universal newlines feature also set an attribute named
> 'newlines' on the file object to a tuple of up to three elements
> giving the actual line endings that were observed on the file so far
> (\r, \n, or \r\n).

I've never used it, but I can see how it could be
useful, e.g. if you're implementing a text editor
that wants to be able to save the file back in
the same format it had before.

But such a specialised use could just as well be
provided by a library facility, such as a wrapper
class around a raw I/O stream.

--
Greg

From eric+python-dev at trueblade.com  Sat Aug 11 05:30:33 2007
From: eric+python-dev at trueblade.com (Eric V. Smith)
Date: Fri, 10 Aug 2007 23:30:33 -0400
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46BD155B.2010202@canterbury.ac.nz>
References: <46B13ADE.7080901@acm.org>	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>	<46B39BBF.80809@ronadam.com>
	<46B46DE1.1090403@canterbury.ac.nz>	<46B4950F.40905@ronadam.com>
	<46B4A9A0.9070206@ronadam.com>	<46B54F51.40705@acm.org>
	<46B59F05.3070200@ronadam.com>	<46B5FBD9.4020301@acm.org>
	<46BBBEC6.5030705@trueblade.com>	<fb6fbf560708100727tdecc140heac8481eaef5e5db@mail.gmail.com>	<46BC83BF.3000407@trueblade.com>
	<46BD155B.2010202@canterbury.ac.nz>
Message-ID: <46BD2D59.1040209@trueblade.com>

Greg Ewing wrote:
> Eric Smith wrote:
>> 1: "".format() ... understands which 
>> types can be converted to other types, and does the conversions.
>>
>> 2: each type's __format__ function understands how to convert to some 
>> subset of all types (int can convert to float and decimal, for example).
>>
>> The problem with approach 2 is that there's logic in 
>> int.__format__() that understands float.__format__() specifiers, and 
>> vice-versa.  At least with approach 1, all of this logic is in one place.
> 
> Whereas the problem with approach 1 is that it's not
> extensible. You can't add new types with new format
> specifiers that can be interconverted.

Granted.

> I don't think the logic needs to be complicated. As
> long as the format spec syntaxes are chosen sensibly,
> it's not necessary for e.g. int.__format__ to be able
> to parse float's format specs, only to recognise when
> it's got one. That could be as simple as
> 
>    if spec[:1] in 'efg':
>      return float(self).__format__(spec)

Right.  Your "if" test is my is_float_specifier function.  The problem 
is that this needs to be shared between int and float and string, and 
anything else (maybe decimal?) that can be converted to a float.  Maybe 
we should make is_float_specifier a classmethod of float[1], so that 
int's __format__ (and also string's __format__) could say:

if float.is_float_specifier(spec):
    return float(self).__format__(spec)

And float's __format__ function could do all of the specifier testing, 
for types it knows to convert itself to, and then say:

if not float.is_float_specifier(spec):
     return NotImplemented
else:
     # do the actual formatting

And then presumably the top-level "".format() could check for 
NotImplemented and convert the value to a string and use the specifier 
on that:

result = value.__format__(spec)
if result is NotImplemented:
    return str(value).__format__(spec)
else:
    return result

Then we could take my approach number 1 above, but have the code that 
does the specifier testing be centralized.  I agree that central to all 
of this is choosing specifiers sensibly, for those types that we expect 
to supply interconversions (great word!).
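
To make the three fragments above concrete, here is a small sketch of how
they might fit together. Everything in it is an assumption made for
illustration: is_float_specifier is written as a plain function rather than a
classmethod of float, the type code is taken to be the last character of the
spec, and Money and top_level_format are hypothetical names:

```python
def is_float_specifier(spec):
    """Hypothetical shared test: does the spec end in a float type code?"""
    return bool(spec) and spec[-1] in "efg"


class Money:
    """Hypothetical type whose __format__ only understands float specs."""

    def __init__(self, amount):
        self.amount = amount

    def __str__(self):
        return "$%.2f" % self.amount

    def __format__(self, spec):
        if not is_float_specifier(spec):
            return NotImplemented  # signal "try another strategy"
        return float(self.amount).__format__(spec)


def top_level_format(value, spec):
    """Stand-in for the top-level "".format() logic described above."""
    result = value.__format__(spec)
    if result is NotImplemented:
        # Fall back: convert to a string and apply the specifier to that.
        return str(value).__format__(spec)
    return result


print(top_level_format(Money(3), ".1f"))
print(top_level_format(Money(3), ">8"))
```

The helper calls __format__ directly because the built-in format() of current
Python rejects a non-string result instead of falling back.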

For types that won't be participating in any conversions, such as date 
or user defined types, no such sensible specifiers are needed.

> Another advantage of letting the __format__ methods
> handle it is that a given type *can* handle another
> type's format spec itself if it wants. E.g. if float
> has some way of handling the 'd' format that's
> considered better than converting to int first, then
> it can do that.

But then float's int formatting code would have to fully implement the 
int formatter.  You couldn't add functionality to the int formatter (and 
its specifiers) without updating the code in 2 places: both int and float.

Eric.

[1]: If we make it a class method, it could just be is_specifier(), or 
maybe __is_specifier__.  This name would be implemented by all types 
that participate in the interconversions we're describing.

From rrr at ronadam.com  Sat Aug 11 05:50:02 2007
From: rrr at ronadam.com (Ron Adam)
Date: Fri, 10 Aug 2007 22:50:02 -0500
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46BD1796.3000904@canterbury.ac.nz>
References: <46B13ADE.7080901@acm.org>
	<46B2D147.90606@acm.org>	<46B2E265.5080905@ronadam.com>
	<46B35295.1030007@acm.org>	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>	<46B39BBF.80809@ronadam.com>
	<46B46DE1.1090403@canterbury.ac.nz>	<46B4950F.40905@ronadam.com>
	<46B4A9A0.9070206@ronadam.com>	<46B52422.2090006@canterbury.ac.nz>	<ca471dc20708041909j3f86b107hace9407673c302ea@mail.gmail.com>	<46B568F3.9060105@ronadam.com>
	<46B66BE0.7090005@canterbury.ac.nz>	<46B6851C.1030204@ronadam.com>
	<46B6C219.4040900@canterbury.ac.nz>	<46B6DABB.3080509@ronadam.com>
	<46B7CD8C.5070807@acm.org>	<46B8D58E.5040501@ronadam.com>
	<46BAC2D9.2020902@acm.org>	<46BAEFB0.9050400@ronadam.com>
	<46BBB1AE.5010207@canterbury.ac.nz>	<46BBEB16.2040205@ronadam.com>
	<46BC1479.30405@canterbury.ac.nz>	<46BCA9C9.1010306@ronadam.com>
	<46BD1796.3000904@canterbury.ac.nz>
Message-ID: <46BD31EA.6040507@ronadam.com>



Greg Ewing wrote:
> Ron Adam wrote:
>> I'm not sure what you mean by "ditch all of these".
> 
> I was guessing that what's meant by returning a
> (value, format_spec) tuple is to re-try the
> formatting using the new value and spec. That's
> what I thought was unnecessary, since the method
> can do that itself if it wants.

It's not retrying because it hasn't tried yet.  As you noted below I think.

It lets the __format__ method do its thing first and then depending on what 
it gets back, it (the format function) may or may not do any formatting.

It's handy to pass both the format specifier and the value both times as 
the __format__ function may only alter one or the other.


>> The output in this case is a null string. Setting the format spec to 
>> None tells the format() function not to do anything more to it.
> 
> Then why not just return an empty string?

Because an empty string is a valid string.  It can be expanded to a minimum 
width, which we may not want to do.

Now if returning a single value was equivalent to returning ('', None) or 
('', ''), then that would work.

The format function could check for that case.


>> The format function would first call the object's __format__ method and 
>> give it a chance to have control, and depending on what is returned, try 
>> to handle it or not.
> 
> Okay, I see now -- your format function has more smarts
> in it than mine.

Yes, enough so that you can either *not have* a __format__ method as the 
default, or supply "object" with a very simple generic one.  Which is a 
very easy way to give all objects their __format__ methods.

In the case of not having one, format would check for it.  In the case of a 
very simple one that doesn't do anything, it just gets back what it sent, 
or possibly a NotImplemented exception.

Whichever strategy is faster should probably be the one chosen here.

> But as was suggested earlier, returning NotImplemented
> ought to be enough to signal a fallback to a different
> strategy.

That isn't the case we're referring to, it's the case where we want to 
suppress a fall back choice that's available.
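
A rough sketch of the protocol being described, under stated assumptions:
the per-object hook is given the made-up name format_hook (so it does not
clash with the real __format__), it returns a (value, format_spec) pair, and
a spec of None means "do no further formatting". generic_format, Blank and
Percent are hypothetical names:

```python
def generic_format(value, spec):
    """Hypothetical format function that lets the object go first."""
    hook = getattr(value, "format_hook", None)
    if hook is not None:
        # The hook may alter the value, the spec, both, or neither.
        value, spec = hook(spec)
    if spec is None:
        # The object asked for no further formatting at all.
        return str(value)
    return format(value, spec)


class Blank:
    """Always renders as nothing, even when a minimum width is requested."""

    def format_hook(self, spec):
        return "", None


class Percent:
    """Changes only the value and leaves the spec for generic_format."""

    def __init__(self, ratio):
        self.ratio = ratio

    def format_hook(self, spec):
        return self.ratio * 100, spec


print(repr(generic_format(Blank(), ">10")))
print(generic_format(Percent(0.25), ".1f"))
print(repr(generic_format("ab", ">4")))
```

Objects without the hook, such as the plain string in the last call, go
straight to the generic path.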

Cheers,
    Ron


From greg.ewing at canterbury.ac.nz  Sat Aug 11 07:08:29 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 11 Aug 2007 17:08:29 +1200
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46BD31EA.6040507@ronadam.com>
References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org>
	<46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org>
	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>
	<46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz>
	<46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com>
	<46B52422.2090006@canterbury.ac.nz>
	<ca471dc20708041909j3f86b107hace9407673c302ea@mail.gmail.com>
	<46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz>
	<46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz>
	<46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org>
	<46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org>
	<46BAEFB0.9050400@ronadam.com> <46BBB1AE.5010207@canterbury.ac.nz>
	<46BBEB16.2040205@ronadam.com> <46BC1479.30405@canterbury.ac.nz>
	<46BCA9C9.1010306@ronadam.com> <46BD1796.3000904@canterbury.ac.nz>
	<46BD31EA.6040507@ronadam.com>
Message-ID: <46BD444D.1060907@canterbury.ac.nz>

Ron Adam wrote:
>
> Greg Ewing wrote:
>
> > Then why not just return an empty string?
> 
> Because an empty string is a valid string.  It can be expanded to a 
> minimum width, which we may not want to do.

I'm not seeing a use case for this. If the user says
he wants his field a certain minimum width, what
business does the type have overriding that?

--
Greg

From talin at acm.org  Sat Aug 11 08:48:56 2007
From: talin at acm.org (Talin)
Date: Fri, 10 Aug 2007 23:48:56 -0700
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46BCA9C9.1010306@ronadam.com>
References: <46B13ADE.7080901@acm.org>	<46B2D147.90606@acm.org>	<46B2E265.5080905@ronadam.com>	<46B35295.1030007@acm.org>	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>	<46B39BBF.80809@ronadam.com>	<46B46DE1.1090403@canterbury.ac.nz>	<46B4950F.40905@ronadam.com>	<46B4A9A0.9070206@ronadam.com>	<46B52422.2090006@canterbury.ac.nz>	<ca471dc20708041909j3f86b107hace9407673c302ea@mail.gmail.com>	<46B568F3.9060105@ronadam.com>	<46B66BE0.7090005@canterbury.ac.nz>	<46B6851C.1030204@ronadam.com>	<46B6C219.4040900@canterbury.ac.nz>	<46B6DABB.3080509@ronadam.com>	<46B7CD8C.5070807@acm.org>	<46B8D58E.5040501@ronadam.com>	<46BAC2D9.2020902@acm.org>	<46BAEFB0.9050400@ronadam.com>	<46BBB1AE.5010207@canterbury.ac.nz>	<46BBEB16.2040205@ronadam.com>	<46BC1479.30405@canterbury.ac.nz>
	<46BCA9C9.1010306@ronadam.com>
Message-ID: <46BD5BD8.7030706@acm.org>

I'm going to address several issues of the discussion, in hopes of 
short-cutting through some of the debate. I may not be responding to the 
correct person in all cases.

Ron Adam wrote:
> If you want the 'r' specifier to always have precedence over even custom 
> __format__ methods, then you can do that too, but I don't see the need.

In my conversation with Guido, he felt pretty strongly that he wanted 
'repr' to be able to override the type-specific __format__ function.

I'm assuming, therefore, that this is a non-negotiable design 
constraint. Unfortunately, it's one that completely destroys the nice 
neat orthogonal design that you've proposed.

I often find that the best way to reason about irreconcilable design 
constraints is to reduce them to a set of contradictory logical 
propositions:

   a) A __format__ method must be able to redefine the meaning of a format 
specifier.
   b) The 'repr' option must be able to take precedence over the __format__ 
method.

The only possible resolution to this dilemma is that the 'repr' option 
must not be part of the format specifier, but rather must be part of 
something else.

Assuming that we continue with the assumption that we want to delegate 
as much as possible to the __format__ methods of individual types, this 
means that we are pretty much forced to divide the format string into 
two pieces, which are:

    1) The part that __format__ is allowed to reinterpret.
    2) The part that __format__ is required to implement without 
reinterpreting.

----

Now, as far as delegating formatting between types: We don't need a 
hyper-extensible system for delegating to different formatters.

For all Python types except the numeric types, the operation of 
__format__ is pretty simple: the format specifier is passed to 
__format__ and that's it. If the __format__ method can't handle the 
specifier, that's an error, end of story.

Numeric types are special because they are silently 
interconvertible. For example, you can add a float and an 
int, and Python will just do the right thing without complaining. It 
means that a Python programmer may not always know the exact numeric 
type they are dealing with - and this is a feature IMHO.

Therefore, it's my strong belief that you should be able to format any 
numeric type without knowing exactly what type it is. Which means that 
all numeric types need to be able to handle, in some way, all valid 
number format strings. IMHO.

Fortunately, the set of number types is small and fixed, and is not 
likely to increase any time soon. And this requirement does *not* apply 
to any data type other than numbers.
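
As an illustration of formatting a number without knowing its exact type,
the sketch below uses the built-in format() of current Python 3, not the
draft syntax under discussion here. The choice of values is arbitrary:

```python
from decimal import Decimal

# Three different numeric types holding the same quantity.
values = [3, 3.0, Decimal("3")]

# One float-style spec applied to all of them, without a type check.
formatted = [format(v, "8.2f") for v in values]

for value, text in zip(values, formatted):
    print(type(value).__name__, repr(text))
```

Each value renders identically, so the caller never has to branch on type.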

----

As to the issue of how flexible the system should be:

My belief is that one of the primary design criteria for the format 
specifier mini-language is that it doesn't detract from the readability 
of the format string.

So for example, if I have a string "Total: {0:d} Tax: {1:d}", it's 
fairly easy for me to mentally filter out the "{0:d}" part and replace 
it with a number, and this in turn lets me imagine how the string might 
look when printed.

(The older syntax, 'Total: %d', was even better in this regard, but when 
you start to use named/numbered fields it actually is worse. And 
implicit ordering is brittle.)
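
A short comparison of the two styles, using the str.format() syntax as it
exists in current Python 3. The amounts are made-up values:

```python
total, tax = 120, 8  # illustrative values only

# Numbered fields: each "{n:d}" reads as a placeholder for one number.
new_style = "Total: {0:d} Tax: {1:d}".format(total, tax)

# Older %-style: shorter, but relies on implicit ordering.
old_style = "Total: %d Tax: %d" % (total, tax)

# Numbered fields allow the text to be reordered without touching the arguments.
reordered = "Tax: {1:d} Total: {0:d}".format(total, tax)

print(new_style)
print(old_style)
print(reordered)
```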

My design goal here is relatively simple: For the most common use cases, 
the format field shouldn't be much longer (if at all) than the value to 
be printed would be. For uncommon cases, where the programmer is 
invoking additional options, the format field can be longer, but it 
should still be kept concise.

----

One final thing I wanted to mention, which Guido reminded me, is that 
we're getting short on time. This PEP has not yet been officially 
accepted, and the reason is because of the lack of an implementation. I 
don't want to miss the boat. (The boat in this case being Alpha 1.)

-- Talin

From stephen at xemacs.org  Sat Aug 11 09:02:07 2007
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 11 Aug 2007 16:02:07 +0900
Subject: [Python-3000] Universal newlines support in Python 3.0
In-Reply-To: <46BD18FB.5030901@canterbury.ac.nz>
References: <ca471dc20708101123y3628e352n7116839dbfdc9bb3@mail.gmail.com>
	<46BD18FB.5030901@canterbury.ac.nz>
Message-ID: <874pj67dw0.fsf@uwakimon.sk.tsukuba.ac.jp>

Greg Ewing writes:

 > Guido van Rossum wrote:
 > > However, the old universal newlines feature also set an attribute named
 > > 'newlines' on the file object to a tuple of up to three elements
 > > giving the actual line endings that were observed on the file so far
 > > (\r, \n, or \r\n).
 > 
 > I've never used it, but I can see how it could be
 > useful, e.g. if you're implementing a text editor
 > that wants to be able to save the file back in
 > the same format it had before.

But if there's more than one line ending used, that's not good
enough.  Universal newlines is a wonderful convenience for most text
usage, but if you really need to be able to preserve format, it's not
going to be enough.

I think it's best for universal newlines to be simple.  Let fancy
facilities be provided by a library wrapping raw IO, as you suggest.
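
One way such a library facility might look, as a sketch only: a helper that
reads a raw byte stream and reports which line endings it saw. The name
observed_newlines is hypothetical, and this is not the design that was
adopted:

```python
import io
import re


def observed_newlines(raw):
    """Return the distinct line endings found in a binary stream."""
    data = raw.read()
    # Try \r\n first so a CRLF pair is counted once, not as \r plus \n.
    seen = set(re.findall(rb"\r\n|\r|\n", data))
    return tuple(sorted(seen))


mixed = io.BytesIO(b"one\r\ntwo\nthree\rfour")
unix_only = io.BytesIO(b"one\ntwo\n")

print(observed_newlines(mixed))
print(observed_newlines(unix_only))
```

As noted above, a set of observed endings is not enough to restore a file
with mixed endings; that would require recording the ending of each line.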

From nnorwitz at gmail.com  Sat Aug 11 09:16:28 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Sat, 11 Aug 2007 00:16:28 -0700
Subject: [Python-3000] release plans (was: More PEP 3101 changes incoming)
Message-ID: <ee2a432c0708110016x669f0be9vfe7269686cdc22b@mail.gmail.com>

On 8/10/07, Talin <talin at acm.org> wrote:
>
> One final thing I wanted to mention, which Guido reminded me, is that
> we're getting short on time. This PEP has not yet been officially
> accepted, and the reason is because of the lack of an implementation. I
> don't want to miss the boat. (The boat in this case being Alpha 1.)

Alpha 1 is a few *weeks* away.  The release will hopefully come
shortly after the sprint at Google which is Aug 22-25.
http://wiki.python.org/moin/GoogleSprint

It's time to get stuff done!  There's still a ton of work to do:
http://wiki.python.org/moin/Py3kToDo

One thing not mentioned on the wiki is finding and removing all the
cruft in the code and docs that has been deprecated.  If you know of
any of these things, please add a note to the wiki.  Some common
strings to look for are:  deprecated, compatibility, b/w, backward,
and obsolete.

n

From rrr at ronadam.com  Sat Aug 11 10:48:27 2007
From: rrr at ronadam.com (Ron Adam)
Date: Sat, 11 Aug 2007 03:48:27 -0500
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46BD444D.1060907@canterbury.ac.nz>
References: <46B13ADE.7080901@acm.org>
	<46B2D147.90606@acm.org>	<46B2E265.5080905@ronadam.com>
	<46B35295.1030007@acm.org>	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>	<46B39BBF.80809@ronadam.com>
	<46B46DE1.1090403@canterbury.ac.nz>	<46B4950F.40905@ronadam.com>
	<46B4A9A0.9070206@ronadam.com>	<46B52422.2090006@canterbury.ac.nz>	<ca471dc20708041909j3f86b107hace9407673c302ea@mail.gmail.com>	<46B568F3.9060105@ronadam.com>
	<46B66BE0.7090005@canterbury.ac.nz>	<46B6851C.1030204@ronadam.com>
	<46B6C219.4040900@canterbury.ac.nz>	<46B6DABB.3080509@ronadam.com>
	<46B7CD8C.5070807@acm.org>	<46B8D58E.5040501@ronadam.com>
	<46BAC2D9.2020902@acm.org>	<46BAEFB0.9050400@ronadam.com>
	<46BBB1AE.5010207@canterbury.ac.nz>	<46BBEB16.2040205@ronadam.com>
	<46BC1479.30405@canterbury.ac.nz>	<46BCA9C9.1010306@ronadam.com>
	<46BD1796.3000904@canterbury.ac.nz>	<46BD31EA.6040507@ronadam.com>
	<46BD444D.1060907@canterbury.ac.nz>
Message-ID: <46BD77DB.60405@ronadam.com>



Greg Ewing wrote:
> Ron Adam wrote:
>> Greg Ewing wrote:
>  >
>>> Then why not just return an empty string?
>> Because an empty string is a valid string.  It can be expanded to a 
>> minimum width, which we may not want to do.
> 
> I'm not seeing a use case for this. If the user says
> he wants his field a certain minimum width, what
> business does the type have overriding that?

Just pointing out *one* capability that is possible. Maybe it was just a 
poor choice as an example.

Cheers,
    Ron








From talin at acm.org  Sat Aug 11 10:57:16 2007
From: talin at acm.org (Talin)
Date: Sat, 11 Aug 2007 01:57:16 -0700
Subject: [Python-3000] Format specifier proposal
Message-ID: <46BD79EC.1020301@acm.org>

Taking some ideas from the various threads, here's what I'd like to propose:

(Assume that brackets [] mean 'optional field')

   [:[type][align][sign][[0]minwidth][.precision]][/fill][!r]

Examples:

    :f        # Floating point number of natural width
    :f10      # Floating point number, width at least 10
    :f010     # Floating point number, width at least 10, leading zeros
    :f.2      # Floating point number with two decimal digits
    :8        # Minimum width 8, type defaults to natural type
    :d+2      # Integer number, 2 digits, sign always shown
    !r        # repr() format
    :10!r     # Field width 10, repr() format
    :s10      # String right-aligned within field of minimum width
              # of 10 chars.
    :s10.10   # String right-aligned within field of minimum width
              # of 10 chars, maximum width 10.
    :s<10     # String left-aligned in 10 char (min) field.
    :d^15     # Integer centered in 15 character field
    :>15/.    # Right align and pad with '.' chars
    :f<+015.5 # Floating point, left aligned, always show sign,
              # leading zeros, field width 15 (min), 5 decimal places.

Notes:

   -- Leading zeros are different from the fill character, although the two 
are mutually exclusive. (Leading zeros always go between the sign and 
the number, padding does not.)
   -- For strings, precision is used as maximum field width.
   -- __format__ functions are not allowed to re-interpret '!r'.

I realize that the grouping of things is a little odd - for example, it 
would be nice to put minwidth, padding and alignment in their own little 
group so that they could be processed independently from __format__. 
However:

   -- Since minwidth is the most common option, I wanted it to have no 
special prefix char.
   -- I wanted precision to come after minwidth, since the 'm.n' format 
feels intuitive and traditional.
   -- I wanted type to come first, since it affects how some attributes 
are interpreted.
   -- Putting the sign right before the width field also feels right.

The regex for interpreting this, BTW, is something like the following:

    "(?::([a-z])?(<|>|\^)?(\+|-)?(\d+)?(\.\d+)?)?(/.)?(!r)?"

(Although it may make more sense to allow the fill and repr fields to 
appear in any order. In other words, any field that is identified by a 
unique prefix char can be specified in any order.)
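
A sketch of how the proposed grammar could be parsed with Python's re module.
It is one reading of the proposal, not a reference implementation: every
field inside the ':' group is treated as optional so that the examples above
all match, and SPEC_RE and parse_spec are hypothetical names:

```python
import re

# [:[type][align][sign][[0]minwidth][.precision]][/fill][!r]
SPEC_RE = re.compile(
    r"(?::(?P<type>[a-z])?(?P<align>[<>^])?(?P<sign>[+-])?"
    r"(?P<width>\d+)?(?:\.(?P<precision>\d+))?)?"
    r"(?:/(?P<fill>.))?(?P<repr>!r)?"
)


def parse_spec(text):
    """Split a proposed-style specifier into its fields."""
    match = SPEC_RE.fullmatch(text)
    if match is None:
        raise ValueError("not a valid specifier: %r" % text)
    fields = match.groupdict()
    width = fields["width"]
    # A leading '0' in the width requests leading zeros.
    fields["zero_pad"] = bool(width) and width.startswith("0")
    fields["width"] = int(width) if width else None
    precision = fields["precision"]
    fields["precision"] = int(precision) if precision else None
    fields["repr"] = fields["repr"] is not None
    return fields


for example in (":f", ":f010", ":d+2", "!r", ":10!r", ":>15/.", ":f<+015.5"):
    print(example, parse_spec(example))
```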

-- Talin


From rrr at ronadam.com  Sat Aug 11 11:48:22 2007
From: rrr at ronadam.com (Ron Adam)
Date: Sat, 11 Aug 2007 04:48:22 -0500
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46BD5BD8.7030706@acm.org>
References: <46B13ADE.7080901@acm.org>	<46B2D147.90606@acm.org>	<46B2E265.5080905@ronadam.com>	<46B35295.1030007@acm.org>	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>	<46B39BBF.80809@ronadam.com>	<46B46DE1.1090403@canterbury.ac.nz>	<46B4950F.40905@ronadam.com>	<46B4A9A0.9070206@ronadam.com>	<46B52422.2090006@canterbury.ac.nz>	<ca471dc20708041909j3f86b107hace9407673c302ea@mail.gmail.com>	<46B568F3.9060105@ronadam.com>	<46B66BE0.7090005@canterbury.ac.nz>	<46B6851C.1030204@ronadam.com>	<46B6C219.4040900@canterbury.ac.nz>	<46B6DABB.3080509@ronadam.com>	<46B7CD8C.5070807@acm.org>	<46B8D58E.5040501@ronadam.com>	<46BAC2D9.2020902@acm.org>	<46BAEFB0.9050400@ronadam.com>	<46BBB1AE.5010207@canterbury.ac.nz>	<46BBEB16.2040205@ronadam.com>	<46BC1479.30405@canterbury.ac.nz>
	<46BCA9C9.1010306@ronadam.com> <46BD5BD8.7030706@acm.org>
Message-ID: <46BD85E6.1030005@ronadam.com>



Talin wrote:
> I'm going to address several issues of the discussion, in hopes of 
> short-cutting through some of the debate. I may not be responding to the 
> correct person in all cases.
> 
> Ron Adam wrote:
>> If you want the 'r' specifier to always have precedence over even 
>> custom __format__ methods, then you can do that too, but I don't see 
>> the need.
> 
> In my conversation with Guido, he felt pretty strongly that he wanted 
> 'repr' to be able to override the type-specific __format__ function.
>
> I'm assuming, therefore, that this is a non-negotiable design 
> constraint. Unfortunately, it's one that completely destroys the nice 
> neat orthogonal design that you've proposed.

It doesn't completely destroy it, but it does create one exception that 
needs to be remembered.  As I said, it doesn't have to be a special syntax. 
  {0:r} is just fine.


> I often find that the best way to reason about irreconcilable design 
> constraints is to reduce them to a set of contradictory logical 
> propositions:
> 
>   a) A __format__ method must be able to redefine the meaning of a format 
> specifier.
>   b) The 'repr' option must be able to take precedence over the __format__ 
> method.
> 
> The only possible resolution to this di-lemma is that the 'repr' option 
> must not be part of the format specifier, but rather must be part of 
> something else.

You are either going to have one or the other, but never both, so there 
isn't any conflict.

    'object: {0:r}'.format(obj)

or

    'object: {0:s}'.format(repr(obj))

These don't collide in any way.  The only question is whether the 'r' 
specifier also allows for other options like width and alignment.
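
A small sketch of the two spellings side by side, assuming a type whose
__format__ treats a trailing 'r' as "use repr" and a trailing 's' as "use
str". The Point class and that trailing-character convention are
hypothetical, chosen only to show that width and alignment can ride along
with 'r':

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        return "Point(%r, %r)" % (self.x, self.y)

    def __str__(self):
        return "(%s, %s)" % (self.x, self.y)

    def __format__(self, spec):
        # Hypothetical convention: the last character of the spec is the type.
        if spec.endswith("r"):
            text, rest = repr(self), spec[:-1]
        elif spec.endswith("s"):
            text, rest = str(self), spec[:-1]
        else:
            text, rest = str(self), spec
        # Width and alignment are handled by ordinary string formatting.
        return format(text, rest)


p = Point(1, 2)
via_spec = "object: {0:r}".format(p)        # 'r' handled by __format__
via_call = "object: {0:s}".format(repr(p))  # repr applied by the caller
padded = "[{0:>14r}]".format(p)             # 'r' combined with alignment
```

Both via_spec and via_call come out as the same string; padded
right-aligns the repr in a 14-character field.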


> Assuming that we continue with the assumption that we want to delegate 
> as much as possible to the __format__ methods of individual types, this 
> means that we are pretty much forced to divide the format string into 
> two pieces, which are:
> 
>    1) The part that __format__ is allowed to reinterpret.
>    2) The part that __format__ is required to implement without 
> reinterpreting.

What should not be allowed?  And why?


> Now, as far as delegating formatting between types: We don't need a 
> hyper-extensible system for delegating to different formatters.
> 
> For all Python types except the numeric types, the operation of 
> __format__ is pretty simple: the format specifier is passed to 
> __format__ and that's it. If the __format__ method can't handle the 
> specifier, that's an error, end of story.
> 
> Numeric types are special because of the fact that they are silently 
> inter-convertible to each other. For example, you can add a float and an 
> int, and Python will just do the right thing without complaining. It 
> means that a Python programmer may not always know the exact numeric 
> type they are dealing with - and this is a feature IMHO.
> 
> Therefore, it's my strong belief that you should be able to format any 
> numeric type without knowing exactly what type it is. Which means that 
> all numeric types need to be able to handle, in some way, all valid 
> number format strings. IMHO.
>
> Fortunately, the set of number types is small and fixed, and is not 
> likely to increase any time soon. And this requirement does *not* apply 
> to any data type other than numbers.
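
A brief illustration of the point about numeric types, written against
the format() builtin as it behaves in released Python 3; the show()
helper is only for illustration. An int, a float, and a Decimal all
accept the same fixed-point specifier, while a non-numeric type simply
rejects a specifier it does not understand:

```python
from decimal import Decimal


def show(value, spec=".2f"):
    """Format any numeric value with a single specifier."""
    return format(value, spec)


results = [show(v) for v in (3, 2.5, Decimal("2.5"))]

# A non-numeric type rejects a numeric specifier: "an error, end of story".
try:
    format("abc", "d")
except ValueError as exc:
    str_error = exc
else:
    str_error = None
```

In released Python the symmetry is not complete: int accepts the float
presentation types, but float does not accept 'd'.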

If this is the direction you and Guido want, I'll try to help make it work. 
  It seems it's also what Greg is thinking of.

I think to get the Python implementation moving, we need to implement the 
whole event chain, not just a function.

So starting with a fstr() type and then possibly fint() and ffloat() etc..., 
  where each subclasses its respective type and adds format and __format__ 
methods respectively.  (for now and for testing)

Then working out the rest will be more productive I think.
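
One way the whole chain might be prototyped, as a rough sketch only: fstr
gets a format() method that finds the fields and hands each specifier to
the value's __format__, while fint and ffloat supply __format__ by
delegating to their base types. The field regex and the delegation
strategy are assumptions for testing purposes, not the PEP's
implementation:

```python
import re

# Matches simple positional fields such as {0} or {1:.2f}.
_FIELD = re.compile(r"\{(\d+)(?::([^{}]*))?\}")


class fstr(str):
    def format(self, *args):
        def substitute(match):
            index = int(match.group(1))
            spec = match.group(2) or ""
            # format() dispatches to type(value).__format__(value, spec).
            return format(args[index], spec)
        return _FIELD.sub(substitute, self)


class fint(int):
    def __format__(self, spec):
        return int.__format__(self, spec)


class ffloat(float):
    def __format__(self, spec):
        return float.__format__(self, spec)


line = fstr("Total: {0:d} Tax: {1:.2f}").format(fint(42), ffloat(3.5))
```

With the chain in place, each subclass can be given experimental
__format__ behaviour without touching the field-parsing code.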


> ----
> 
> As to the issue of how flexible the system should be:

Umm... I think you got sidetracked here.  The paragraphs below are about 
readability and syntax, not the flexibility of the underlying system.

The question of flexibility is more to do with what things can be allowed 
to be overridden, and what things should not.  For example just how much 
control does a __format__ method have?  I thought the idea was the entire 
format specifier is sent to the __format__ method and then it can do what 
it wants with it.  Possibly replacing the specifier altogether and 
initiating another type's __format__ method with the substituted format 
specifier.  Which sounds just fine to me and is what I want too.
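
A sketch of that kind of delegation, assuming a hypothetical Percent type:
its __format__ receives the whole specifier, substitutes a default when
none is given, and then hands the result to float's __format__:

```python
class Percent:
    """Hypothetical type that stores a fraction and prints a percentage."""

    def __init__(self, fraction):
        self.fraction = fraction

    def __format__(self, spec):
        # Replace an empty specifier with this type's preferred default...
        if not spec:
            spec = ".1%"
        # ...then delegate to another type's __format__ method.
        return float.__format__(float(self.fraction), spec)


default_text = "{0}".format(Percent(0.5))
explicit_text = "{0:.0%}".format(Percent(0.25))
```

The caller never needs to know that the work is done by float.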

So far we've established that the repr formatter cannot be overridden, but we 
haven't addressed anything else or how to do that specifically.  (We've 
addressed it in general terms, yes, but we haven't gotten into the details.)


> My belief is that one of the primary design criteria for the format 
> specifier mini-language is that it doesn't detract from the readability 
> of the format string.
> 
> So for example, if I have a string "Total: {0:d} Tax: {1:d}", it's 
> fairly easy for me to mentally filter out the "{0:d}" part and replace 
> it with a number, and this in turn lets me imagine how the string might 
> look when printed.
> 
> (The older syntax, 'Total: %d', was even better in this regard, but when 
> you start to use named/numbered fields it actually is worse. And 
> implicit ordering is brittle.)
> 
> My design goal here is relatively simple: For the most common use cases, 
> the format field shouldn't be much longer (if at all) than the value to 
> be printed would be. For uncommon cases, where the programmer is 
> invoking additional options, the format field can be longer, but it 
> should still be kept concise.
> 
> ----
> 
> One final thing I wanted to mention, which Guido reminded me, is that 
> we're getting short on time. This PEP has not yet been officially 
> accepted, and the reason is because of the lack of an implementation. I 
> don't want to miss the boat. (The boat in this case being Alpha 1.)

I'll forward what I've been playing with so you can take a look.  It's not 
the preferred way, but it may have some things in it you can use.  And it's 
rather incomplete still.

Cheers,
    Ron


From benji at benjiyork.com  Sat Aug 11 14:32:03 2007
From: benji at benjiyork.com (Benji York)
Date: Sat, 11 Aug 2007 08:32:03 -0400
Subject: [Python-3000] No (C) optimization flag
In-Reply-To: <46BD1185.2080702@canterbury.ac.nz>
References: <f9hakp$6n4$1@sea.gmane.org> <46BD1185.2080702@canterbury.ac.nz>
Message-ID: <46BDAC43.3050904@benjiyork.com>

Greg Ewing wrote:
> Christian Heimes wrote:
>> But on the
>> other hand it is going to make debugging with pdb much harder because
>> pdb can't step into C code.
> 
> But wouldn't the only reason you want to step into,
> e.g. pickle be if there were a bug in pickle itself?

I believe he's talking about a situation where pickle calls back into 
Python.
-- 
Benji York
http://benjiyork.com

From martin at v.loewis.de  Sat Aug 11 16:25:42 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 11 Aug 2007 16:25:42 +0200
Subject: [Python-3000] Move to a "py3k" branch *DONE*
In-Reply-To: <ee2a432c0708100031x3424e3e9v716e50c7464e4912@mail.gmail.com>
References: <ca471dc20708090743k30562834uee2abd624f2f41b5@mail.gmail.com>	<ee2a432c0708092237i3b1cf045ied38af29c75d394c@mail.gmail.com>	<ee2a432c0708092331s45df1950yd1bfe14bac22fdbb@mail.gmail.com>
	<ee2a432c0708100031x3424e3e9v716e50c7464e4912@mail.gmail.com>
Message-ID: <46BDC6E6.1080601@v.loewis.de>

> test_array leaked [11, 11, 11] references, sum=33

I fixed that in r56924

Regards,
Martin

From martin at v.loewis.de  Sat Aug 11 16:46:29 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 11 Aug 2007 16:46:29 +0200
Subject: [Python-3000] Console encoding detection broken
In-Reply-To: <ca471dc20708101026j6368a727r8a946579c22e165b@mail.gmail.com>
References: <f9grv8$1eg$1@sea.gmane.org> <46BC02FC.6080107@v.loewis.de>
	<ca471dc20708101026j6368a727r8a946579c22e165b@mail.gmail.com>
Message-ID: <46BDCBC5.2060503@v.loewis.de>

> Feel free to add code that implements this. I suppose it would be a
> good idea to have a separate function io.guess_console_encoding(...)
> which takes some argument (perhaps a raw file?) and returns an
> encoding name, never None. This could then be implemented by switching
> on the platform into platform-specific functions and a default.

I've added os.device_encoding, which returns the terminal's encoding
if possible. If the device is not a terminal, it falls back to
locale.getpreferredencoding().
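
A short usage sketch, assuming only that os.device_encoding() takes a file
descriptor. In later released versions it returns None when the descriptor
is not a terminal, so the hypothetical guess_encoding() helper below
applies the locale fallback explicitly and works either way:

```python
import locale
import os


def guess_encoding(fd):
    """Return the terminal's encoding for fd, else the locale's preference."""
    encoding = os.device_encoding(fd)
    if encoding is None:
        # Not a terminal (or unknown): fall back to the locale setting.
        encoding = locale.getpreferredencoding(False)
    return encoding


# A pipe is never a terminal, so this exercises the fallback path.
read_fd, write_fd = os.pipe()
pipe_encoding = guess_encoding(read_fd)
os.close(read_fd)
os.close(write_fd)
```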

Regards,
Martin


From tony at PageDNA.com  Sat Aug 11 18:45:37 2007
From: tony at PageDNA.com (Tony Lownds)
Date: Sat, 11 Aug 2007 09:45:37 -0700
Subject: [Python-3000] Universal newlines support in Python 3.0
In-Reply-To: <ca471dc20708101123y3628e352n7116839dbfdc9bb3@mail.gmail.com>
References: <ca471dc20708101123y3628e352n7116839dbfdc9bb3@mail.gmail.com>
Message-ID: <E1DD53D0-168D-4DC1-BF76-E7A3DE64F359@PageDNA.com>


On Aug 10, 2007, at 11:23 AM, Guido van Rossum wrote:

> Python 3.0 currently has limited universal newlines support: by
> default, \r\n is translated into \n for text files, but this can be
> controlled by the newline= keyword parameter. For details on how, see
> PEP 3116. The PEP prescribes that a lone \r must also be translated,
> though this hasn't been implemented yet (any volunteers?).
>

I'm working on this, but now I'm not sure how the file is supposed to be
read when the newline parameter is \r or \r\n. Here's the PEP language:

   buffer is a reference to the BufferedIOBase object to be wrapped with
   the TextIOWrapper. encoding refers to an encoding to be used for
   translating between the byte-representation and
   character-representation. If it is None, then the system's locale
   setting will be used as the default. newline can be None, '\n', '\r',
   or '\r\n' (all other values are illegal); it indicates the translation
   for '\n' characters written. If None, a system-specific default is
   chosen, i.e., '\r\n' on Windows and '\n' on Unix/Linux. Setting
   newline='\n' on input means that no CRLF translation is done; lines
   ending in '\r\n' will be returned as '\r\n'. ('\r' support is still
   needed for some OSX applications that produce files using '\r' line
   endings; Excel (when exporting to text) and Adobe Illustrator EPS files
   are the most common examples.

Is this ok: when newline='\r\n' or newline='\r' is passed, only that
string is used to determine the end of lines. No translation to '\n' is
done.
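
For reference, a sketch of the proposed reading behaviour using
io.TextIOWrapper as it works in released Python 3, which matches the rule
above: an explicit newline value is the only terminator recognised, and
nothing is translated. The sample bytes and the read_lines() helper are
made up for illustration:

```python
import io

DATA = b"one\rtwo\r\nthree\n"


def read_lines(newline):
    """Read DATA as text with the given newline setting."""
    wrapper = io.TextIOWrapper(io.BytesIO(DATA), encoding="ascii",
                               newline=newline)
    return wrapper.readlines()


only_cr = read_lines("\r")      # lines end at '\r' only, untranslated
only_crlf = read_lines("\r\n")  # lines end at '\r\n' only, untranslated
universal = read_lines(None)    # all three endings recognised, mapped to '\n'
```

With newline='\r' the '\n' that follows "two\r" is just an ordinary
character at the start of the next line.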

> However, the old universal newlines feature also set an attribute named
> 'newlines' on the file object to a tuple of up to three elements
> giving the actual line endings that were observed on the file so far
> (\r, \n, or \r\n). This feature is not in PEP 3116, and it is not
> implemented. I'm tempted to kill it. Does anyone have a use case for
> this? Has anyone even ever used this?
>

This strikes me as a pragmatic feature, making it easy to read a file
and write back the same line ending. I can include it in the patch.

http://www.google.com/codesearch?hl=en&q=+lang:python+%22.newlines%22+show:cz2Fhijwr3s:yutdXigOmYY:YDns9IyEkLQ&sa=N&cd=12&ct=rc&cs_p=http://ftp.gnome.org/pub/gnome/sources/meld/1.0/meld-1.0.0.tar.bz2&cs_f=meld-1.0.0/filediff.py#a0

http://www.google.com/codesearch?hl=en&q=+lang:python+%22.newlines%22+show:SLyZnjuFadw:kOTmKU8aU2I:VX_dFr3mrWw&sa=N&cd=37&ct=rc&cs_p=http://svn.python.org/projects/ctypes/trunk&cs_f=ctypeslib/ctypeslib/dynamic_module.py#a0
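
A sketch of the read-and-write-back use case, written against the
'newlines' attribute as it exists on io.TextIOWrapper in released Python 3
(None, a single string, or a tuple of strings). The helper name and the
choice to fall back to '\n' for mixed or absent endings are assumptions:

```python
import io


def rewrite_preserving_newlines(data, transform, encoding="utf-8"):
    """Apply transform to the text in data, keeping its line ending."""
    reader = io.TextIOWrapper(io.BytesIO(data), encoding=encoding)
    text = reader.read()          # universal newlines: text uses '\n'
    seen = reader.newlines        # endings actually observed while reading
    # Only a single observed ending can be restored unambiguously.
    ending = seen if isinstance(seen, str) else "\n"

    out = io.BytesIO()
    writer = io.TextIOWrapper(out, encoding=encoding, newline=ending)
    writer.write(transform(text))
    writer.flush()
    result = out.getvalue()
    writer.detach()
    return result


dos_result = rewrite_preserving_newlines(b"a\r\nb\r\n", str.upper)
mixed_result = rewrite_preserving_newlines(b"a\nb\r\n", str.upper)
```

The mixed case shows the limitation: once more than one ending has been
seen, the attribute no longer says which line had which.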

Thanks
-Tony


From guido at python.org  Sat Aug 11 19:29:38 2007
From: guido at python.org (Guido van Rossum)
Date: Sat, 11 Aug 2007 10:29:38 -0700
Subject: [Python-3000] Universal newlines support in Python 3.0
In-Reply-To: <E1DD53D0-168D-4DC1-BF76-E7A3DE64F359@PageDNA.com>
References: <ca471dc20708101123y3628e352n7116839dbfdc9bb3@mail.gmail.com>
	<E1DD53D0-168D-4DC1-BF76-E7A3DE64F359@PageDNA.com>
Message-ID: <ca471dc20708111029n5722ff04h824eb8788a4c824e@mail.gmail.com>

On 8/11/07, Tony Lownds <tony at pagedna.com> wrote:
>
> On Aug 10, 2007, at 11:23 AM, Guido van Rossum wrote:
>
> > Python 3.0 currently has limited universal newlines support: by
> > default, \r\n is translated into \n for text files, but this can be
> > controlled by the newline= keyword parameter. For details on how, see
> > PEP 3116. The PEP prescribes that a lone \r must also be translated,
> > though this hasn't been implemented yet (any volunteers?).
> >
>
> I'm working on this, but now I'm not sure how the file is supposed to
> be read when
> the newline parameter is \r or \r\n. Here's the PEP language:
>
>    buffer is a reference to the BufferedIOBase object to be wrapped
> with the TextIOWrapper.
>    encoding refers to an encoding to be used for translating between
> the byte-representation
>    and character-representation. If it is None, then the system's
> locale setting will be used
>    as the default. newline can be None, '\n', '\r', or '\r\n' (all
> other values are illegal);
>    it indicates the translation for '\n' characters written. If None,
> a system-specific default
>    is chosen, i.e., '\r\n' on Windows and '\n' on Unix/Linux. Setting
> newline='\n' on input
>    means that no CRLF translation is done; lines ending in '\r\n'
> will be returned as '\r\n'.
>    ('\r' support is still needed for some OSX applications that
> produce files using '\r' line
>    endings; Excel (when exporting to text) and Adobe Illustrator EPS
> files are the most common examples.
>
> Is this ok: when newline='\r\n' or newline='\r' is passed, only that
> string is used to determine
> the end of lines. No translation to '\n' is done.

I *think* it would be more useful if it always returned lines ending
in \n (not \r\n or \r). Wouldn't it? Although this is not how it
currently behaves; when you set newline='\r\n', it returns the \r\n
unchanged, so it would make sense to do this too when newline='\r'.
Caveat user I guess.

> > However, the old universal newlines feature also set an attribute named
> > 'newlines' on the file object to a tuple of up to three elements
> > giving the actual line endings that were observed on the file so far
> > (\r, \n, or \r\n). This feature is not in PEP 3116, and it is not
> > implemented. I'm tempted to kill it. Does anyone have a use case for
> > this? Has anyone even ever used this?
> >
>
> This strikes me as a pragmatic feature, making it easy to read a file
> and write back the same line ending. I can include in patch.

OK, if you think you can, that's good. It's not always sufficient (not
if there was a mix of line endings) but it's a start.

> http://www.google.com/codesearch?hl=en&q=+lang:python+%22.newlines%22
> +show:cz2Fhijwr3s:yutdXigOmYY:YDns9IyEkLQ&sa=N&cd=12&ct=rc&cs_p=http://f
> tp.gnome.org/pub/gnome/sources/meld/1.0/
> meld-1.0.0.tar.bz2&cs_f=meld-1.0.0/filediff.py#a0
>
> http://www.google.com/codesearch?hl=en&q=+lang:python+%22.newlines%22
> +show:SLyZnjuFadw:kOTmKU8aU2I:VX_dFr3mrWw&sa=N&cd=37&ct=rc&cs_p=http://s
> vn.python.org/projects/ctypes/trunk&cs_f=ctypeslib/ctypeslib/
> dynamic_module.py#a0

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Sat Aug 11 19:53:02 2007
From: guido at python.org (Guido van Rossum)
Date: Sat, 11 Aug 2007 10:53:02 -0700
Subject: [Python-3000] Four new failing tests
Message-ID: <ca471dc20708111053l3e9daa92j751143df3c7cefc3@mail.gmail.com>

I see four tests fail that passed yesterday:

< test_csv
< test_shelve
< test_threaded_import
< test_wsgiref

Details:

test_csv: one error
======================================================================
ERROR: test_char_write (__main__.TestArrayWrites)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "Lib/test/test_csv.py", line 648, in test_char_write
    a = array.array('u', string.letters)
ValueError: string length not a multiple of item size

test_shelve: 9 errors, last:
======================================================================
ERROR: test_write (__main__.TestProto2FileShelve)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/google/home/guido/python/py3k/Lib/test/mapping_tests.py",
line 118, in test_write
    self.failIf(knownkey in d)
  File "/usr/local/google/home/guido/python/py3k/Lib/shelve.py", line
92, in __contains__
    return key.encode(self.keyencoding) in self.dict
TypeError: gdbm key must be string, not bytes

test_threaded_import:
Trying 20 threads ... OK.
Trying 50 threads ... OK.
Trying 20 threads ... OK.
Trying 50 threads ... OK.
Trying 20 threads ... OK.
Trying 50 threads ... OK.
Traceback (most recent call last):
  File "Lib/test/test_threaded_import.py", line 75, in <module>
    test_main()
  File "Lib/test/test_threaded_import.py", line 72, in test_main
    test_import_hangers()
  File "Lib/test/test_threaded_import.py", line 36, in test_import_hangers
    raise TestFailed(test.threaded_import_hangers.errors)
test.test_support.TestFailed: ['tempfile.TemporaryFile appeared to hang']
testing import hangers ... [54531 refs]

test_wsgiref: 3 errors, last:
======================================================================
ERROR: test_validated_hello (__main__.IntegrationTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "Lib/test/test_wsgiref.py", line 145, in test_validated_hello
    out, err = run_amock(validator(hello_app))
  File "Lib/test/test_wsgiref.py", line 58, in run_amock
    server.finish_request((inp, out), ("127.0.0.1",8888))
  File "/usr/local/google/home/guido/python/py3k/Lib/SocketServer.py",
line 254, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/local/google/home/guido/python/py3k/Lib/SocketServer.py",
line 522, in __init__
    self.handle()
  File "/usr/local/google/home/guido/python/py3k/Lib/wsgiref/simple_server.py",
line 131, in handle
    if not self.parse_request(): # An error code has been sent, just exit
  File "/usr/local/google/home/guido/python/py3k/Lib/BaseHTTPServer.py",
line 283, in parse_request
    text = io.TextIOWrapper(self.rfile)
  File "/usr/local/google/home/guido/python/py3k/Lib/io.py", line 975,
in __init__
    encoding = os.device_encoding(buffer.fileno())
  File "/usr/local/google/home/guido/python/py3k/Lib/io.py", line 576, in fileno
    return self.raw.fileno()
  File "/usr/local/google/home/guido/python/py3k/Lib/io.py", line 299, in fileno
    self._unsupported("fileno")
  File "/usr/local/google/home/guido/python/py3k/Lib/io.py", line 185,
in _unsupported
    (self.__class__.__name__, name))
io.UnsupportedOperation: BytesIO.fileno() not supported
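
The last traceback comes from asking an in-memory buffer for a file
descriptor. A defensive sketch of the encoding lookup, assuming a
hypothetical helper name; this is not the fix that was applied, only an
illustration of guarding the fileno() call:

```python
import io
import locale
import os


def encoding_for_stream(buffer):
    """Pick a text encoding for buffer without requiring a real fd."""
    try:
        fd = buffer.fileno()
    except (AttributeError, io.UnsupportedOperation):
        # In-memory streams such as BytesIO have no file descriptor.
        return locale.getpreferredencoding(False)
    return os.device_encoding(fd) or locale.getpreferredencoding(False)


memory_encoding = encoding_for_stream(io.BytesIO(b"GET / HTTP/1.0\r\n"))
```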


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From nnorwitz at gmail.com  Sat Aug 11 20:39:10 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Sat, 11 Aug 2007 11:39:10 -0700
Subject: [Python-3000] Four new failing tests
In-Reply-To: <ca471dc20708111053l3e9daa92j751143df3c7cefc3@mail.gmail.com>
References: <ca471dc20708111053l3e9daa92j751143df3c7cefc3@mail.gmail.com>
Message-ID: <ee2a432c0708111139g68aee53dg47cb1e9275c7f5f6@mail.gmail.com>

On 8/11/07, Guido van Rossum <guido at python.org> wrote:
> I see four tests fail that passed yesterday:
>
> < test_csv
> < test_shelve
> < test_threaded_import
> < test_wsgiref

The only failure I could reproduce was test_wsgiref.  That problem was
fixed in 56932.

I had updated the previous revision and built from make clean.  I
wonder if there are some subtle stability problems given the various
intermittent problems we've seen.  If anyone has time, it would be
interesting to use valgrind or purify on 3k.

n

From tony at pagedna.com  Sat Aug 11 20:41:08 2007
From: tony at pagedna.com (Tony Lownds)
Date: Sat, 11 Aug 2007 11:41:08 -0700
Subject: [Python-3000] Universal newlines support in Python 3.0
In-Reply-To: <ca471dc20708111029n5722ff04h824eb8788a4c824e@mail.gmail.com>
References: <ca471dc20708101123y3628e352n7116839dbfdc9bb3@mail.gmail.com>
	<E1DD53D0-168D-4DC1-BF76-E7A3DE64F359@PageDNA.com>
	<ca471dc20708111029n5722ff04h824eb8788a4c824e@mail.gmail.com>
Message-ID: <B416984A-309B-42FA-90D2-B37549D8F139@pagedna.com>


On Aug 11, 2007, at 10:29 AM, Guido van Rossum wrote:
>> Is this ok: when newline='\r\n' or newline='\r' is passed, only that
>> string is used to determine
>> the end of lines. No translation to '\n' is done.
>
> I *think* it would be more useful if it always returned lines ending
> in \n (not \r\n or \r). Wouldn't it? Although this is not how it
> currently behaves; when you set newline='\r\n', it returns the \r\n
> unchanged, so it would make sense to do this too when newline='\r'.
> Caveat user I guess.

Because there's an easy way to translate, having the option to not  
translate
apply to all valid newline values is probably more useful. I do think  
it's easier
to define the behavior this way.

> OK, if you think you can, that's good. It's not always sufficient (not
> if there was a mix of line endings) but it's a start.

Right

-Tony

From martin at v.loewis.de  Sat Aug 11 21:29:31 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 11 Aug 2007 21:29:31 +0200
Subject: [Python-3000] Four new failing tests
In-Reply-To: <ca471dc20708111053l3e9daa92j751143df3c7cefc3@mail.gmail.com>
References: <ca471dc20708111053l3e9daa92j751143df3c7cefc3@mail.gmail.com>
Message-ID: <46BE0E1B.3050000@v.loewis.de>

> test_csv: one error
> ======================================================================
> ERROR: test_char_write (__main__.TestArrayWrites)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "Lib/test/test_csv.py", line 648, in test_char_write
>     a = array.array('u', string.letters)
> ValueError: string length not a multiple of item size

Please try again. gdbm wasn't using bytes properly.

Regards,
Martin

From martin at v.loewis.de  Sat Aug 11 21:39:46 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 11 Aug 2007 21:39:46 +0200
Subject: [Python-3000] Four new failing tests
In-Reply-To: <ca471dc20708111053l3e9daa92j751143df3c7cefc3@mail.gmail.com>
References: <ca471dc20708111053l3e9daa92j751143df3c7cefc3@mail.gmail.com>
Message-ID: <46BE1082.40300@v.loewis.de>

> ======================================================================
> ERROR: test_char_write (__main__.TestArrayWrites)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "Lib/test/test_csv.py", line 648, in test_char_write
>     a = array.array('u', string.letters)
> ValueError: string length not a multiple of item size

I think some decision should be made wrt. string.letters.

Clearly, string.letters cannot reasonably contain *all* letters
(i.e. all characters of categories Ll, Lu, Lt, Lo). Or can it?

Traditionally, string.letters contained everything that is a letter
in the current locale. Still, computing this string might be expensive
assuming you have to go through all Unicode code points and determine
whether they are letters in the current locale.

So I see the following options:
1. remove it entirely. Keep string.ascii_letters instead
2. remove string.ascii_letters, and make string.letters
   ASCII only.
3. Make string.letters contain all letters in the current locale.
4. Make string.letters truly contain everything that is classified
   as a letter in the Unicode database.

Which one should happen?
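
For comparison, a sketch of what option 1 looks like from user code in
released Python 3, where string.letters is gone and string.ascii_letters
remains. The is_letter() helper is an illustrative assumption showing that
a per-character test avoids building any large string:

```python
import string

ASCII_LETTERS = string.ascii_letters   # 'a'..'z' followed by 'A'..'Z'


def is_letter(ch):
    """Per-character test using the Unicode database via str.isalpha()."""
    return len(ch) == 1 and ch.isalpha()


has_locale_letters = hasattr(string, "letters")
```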

Regards,
Martin

From rhamph at gmail.com  Sat Aug 11 22:46:07 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Sat, 11 Aug 2007 14:46:07 -0600
Subject: [Python-3000] Four new failing tests
In-Reply-To: <46BE1082.40300@v.loewis.de>
References: <ca471dc20708111053l3e9daa92j751143df3c7cefc3@mail.gmail.com>
	<46BE1082.40300@v.loewis.de>
Message-ID: <aac2c7cb0708111346w51a251afh167c6f5e0774369f@mail.gmail.com>

On 8/11/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > ======================================================================
> > ERROR: test_char_write (__main__.TestArrayWrites)
> > ----------------------------------------------------------------------
> > Traceback (most recent call last):
> >   File "Lib/test/test_csv.py", line 648, in test_char_write
> >     a = array.array('u', string.letters)
> > ValueError: string length not a multiple of item size
>
> I think some decision should be made wrt. string.letters.
>
> Clearly, string.letters cannot reasonably contain *all* letters
> (i.e. all characters of categories Ll, Lu, Lt, Lo). Or can it?
>
> Traditionally, string.letters contained everything that is a letter
> in the current locale. Still, computing this string might be expensive
> assuming you have to go through all Unicode code points and determine
> whether they are letters in the current locale.
>
> So I see the following options:
> 1. remove it entirely. Keep string.ascii_letters instead
> 2. remove string.ascii_letters, and make string.letters to be
>    ASCII only.
> 3. Make string.letters contain all letters in the current locale.
> 4. Make string.letters truly contain everything that is classified
>    as a letter in the Unicode database.

Wasn't unicodedata.ascii_letters suggested at one point (to eliminate
the string module), or was that my imagination?

IMO, if there is a need for unicode or locale letters, we should
provide a function to generate them as needed.  It can be passed
directly to set or whatever datastructure is actually needed.  We
shouldn't burden the startup cost with such a large datastructure
unless absolutely necessary (nor should we use a property to load it
when first needed; expensive to compute attribute and all that).
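
A sketch of such a generate-on-demand function, using the four categories
Martin listed (Ll, Lu, Lt, Lo). The function name and its range parameters
are assumptions; nothing is computed until the caller iterates:

```python
import sys
import unicodedata

LETTER_CATEGORIES = frozenset(["Ll", "Lu", "Lt", "Lo"])


def iter_letters(start=0, stop=sys.maxunicode + 1):
    """Yield each character in [start, stop) that Unicode calls a letter."""
    for code_point in range(start, stop):
        ch = chr(code_point)
        if unicodedata.category(ch) in LETTER_CATEGORIES:
            yield ch


# The result can be passed straight to set() or any other container.
latin1_letters = set(iter_letters(0, 0x100))
```

Restricting the range keeps the example cheap; iterating the full code
space is where the cost mentioned above would show up.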

-- 
Adam Olsen, aka Rhamphoryncus

From greg.ewing at canterbury.ac.nz  Sun Aug 12 03:52:11 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 12 Aug 2007 13:52:11 +1200
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46BD5BD8.7030706@acm.org>
References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org>
	<46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org>
	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>
	<46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz>
	<46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com>
	<46B52422.2090006@canterbury.ac.nz>
	<ca471dc20708041909j3f86b107hace9407673c302ea@mail.gmail.com>
	<46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz>
	<46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz>
	<46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org>
	<46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org>
	<46BAEFB0.9050400@ronadam.com> <46BBB1AE.5010207@canterbury.ac.nz>
	<46BBEB16.2040205@ronadam.com> <46BC1479.30405@canterbury.ac.nz>
	<46BCA9C9.1010306@ronadam.com> <46BD5BD8.7030706@acm.org>
Message-ID: <46BE67CB.9010101@canterbury.ac.nz>

Talin wrote:
> we are pretty much forced to divide the format string into 
> two pieces, which are:
> 
>    1) The part that __format__ is allowed to reinterpret.
>    2) The part that __format__ is required to implement without 
> reinterpreting.

or
      2) the part that format() interprets itself without
         involving the __format__ method.

The trouble is, treating 'r' this way means inventing a
whole new part of the format string whose *only* use is
to provide a way of specifying 'r'. Furthermore, whenever
this part is used, the form of the regular format spec
that goes with it will be highly constrained, as it doesn't
make sense to use anything other than an 's'-type format
along with 'r'.

Given the extreme non-orthogonality that these two parts
of the format spec would have, separating them doesn't seem
like a good idea to me. It looks like excessive purity at
the expense of practicality.
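
A sketch of the alternative in which the formatting driver interprets 'r'
itself and never consults __format__, giving 'r' exactly the options that
's' has. format_field() and the Stubborn class are hypothetical names used
only for this illustration:

```python
class Stubborn:
    """A type whose __format__ refuses every specifier."""

    def __repr__(self):
        return "<Stubborn>"

    def __format__(self, spec):
        raise ValueError("unsupported specifier: %r" % (spec,))


def format_field(value, spec=""):
    # A trailing 'r' is handled here, before __format__ is ever consulted;
    # the rest of the spec is applied as if it were an 's' format.
    if spec.endswith("r"):
        return format(repr(value), spec[:-1])
    return format(value, spec)


forced = format_field(Stubborn(), "<12r")
quoted = format_field("hi", ">6r")
number = format_field(42, "5d")
```

Because the repr path goes through ordinary string formatting, width and
alignment come for free and no second part of the format string is needed.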

> This PEP has not yet been officially 
> accepted, and the reason is because of the lack of an implementation. I 
> don't want to miss the boat.

Although it wouldn't be good to rush things and end
up committed to something that wasn't the best. Py3k
is supposed to be removing warts, not introducing new
ones.

--
Greg

From greg.ewing at canterbury.ac.nz  Sun Aug 12 03:54:44 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 12 Aug 2007 13:54:44 +1200
Subject: [Python-3000] Universal newlines support in Python 3.0
In-Reply-To: <874pj67dw0.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <ca471dc20708101123y3628e352n7116839dbfdc9bb3@mail.gmail.com>
	<46BD18FB.5030901@canterbury.ac.nz>
	<874pj67dw0.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <46BE6864.9020103@canterbury.ac.nz>

Stephen J. Turnbull wrote:
> But if there's more than one line ending used, that's not good
> enough.

If there's more than one, then you're in trouble anyway.
In the usual case where there is only one, it provides
a way of finding out what it is.

--
Greg

From kbk at shore.net  Sun Aug 12 04:01:09 2007
From: kbk at shore.net (Kurt B. Kaiser)
Date: Sat, 11 Aug 2007 22:01:09 -0400
Subject: [Python-3000] idle3.0 - is is supposed to work?
In-Reply-To: <46BC0BE6.90908@v.loewis.de> (Martin v. =?iso-8859-1?Q?L=F6wi?=
	=?iso-8859-1?Q?s's?= message of "Fri, 10 Aug 2007 08:55:34 +0200")
References: <tkrat.1e8187685ad86255@igpm.rwth-aachen.de>
	<87tzr8ei80.fsf@hydra.hampton.thirdcreek.com>
	<ca471dc20708091031k3ed03d8ap6c211706c4f19f7b@mail.gmail.com>
	<46BB96DF.5060305@v.loewis.de>
	<87ps1we3ak.fsf@hydra.hampton.thirdcreek.com>
	<46BB9CD7.2030301@v.loewis.de>
	<87lkckdyk6.fsf@hydra.hampton.thirdcreek.com>
	<46BC0BE6.90908@v.loewis.de>
Message-ID: <87absxldei.fsf@hydra.hampton.thirdcreek.com>

"Martin v. Löwis" <martin at v.loewis.de> writes:

>>>> OTOH, IDLE ran w/o this error in p3yk...
>>> Yes. Somebody would have to study what precisely the problem is: is it
>>> that there is a None key in that dictionary, and that you must not use
>>> None as a tag name? In that case: where does the None come from?
>>> Or else: is it that you can use None as a tagname in 2.x, but can't
>>> anymore in 3.0? If so: why not?
>> 
>> OK, I'll start looking at it.
>
> So did I, somewhat. It looks like a genuine bug in IDLE to me: you
> can't use None as a tag name, AFAIU. I'm not quite sure why this
> doesn't cause an exception in 2.x; if I try to give a None tag
> separately (i.e. in a stand-alone program) in 2.5,
> it gives me the same exception.

I've commented out the None tag.  It appears to be inoperative in any case.

That plus initializing 'iomark' correctly got me to the point where IDLE
is producing encoding errors.  See following message.

-- 
KBK

From kbk at shore.net  Sun Aug 12 04:13:38 2007
From: kbk at shore.net (Kurt B. Kaiser)
Date: Sat, 11 Aug 2007 22:13:38 -0400
Subject: [Python-3000] IDLE encoding setup
Message-ID: <87643llctp.fsf@hydra.hampton.thirdcreek.com>

I've checked in a version of PyShell.py which directs exceptions to the
terminal instead of to IDLE's shell since the latter isn't working right now.

There also is apparently an encoding issue with the subprocess setup
which I'm ignoring for now by starting IDLE w/o the subprocess:

cd Lib/idlelib
../../python ./idle.py -n

Traceback (most recent call last):
  File "./idle.py", line 21, in <module>
    idlelib.PyShell.main()
  File "/home/kbk/PYDOTORG/projects/python/branches/py3k/Lib/idlelib/PyShell.py", line 1389, in main
    shell = flist.open_shell()
  File "/home/kbk/PYDOTORG/projects/python/branches/py3k/Lib/idlelib/PyShell.py", line 274, in open_shell
    if not self.pyshell.begin():
  File "/home/kbk/PYDOTORG/projects/python/branches/py3k/Lib/idlelib/PyShell.py", line 976, in begin
    self.firewallmessage, idlever.IDLE_VERSION, nosub))
  File "/home/kbk/PYDOTORG/projects/python/branches/py3k/Lib/idlelib/PyShell.py", line 1214, in write
    OutputWindow.write(self, s, tags, "iomark")
  File "/home/kbk/PYDOTORG/projects/python/branches/py3k/Lib/idlelib/OutputWindow.py", line 42, in write
    s = str(s, IOBinding.encoding)
TypeError: decoding Unicode is not supported

Hopefully MvL has a few minutes to revisit the IOBinding.py code which is
setting IDLE's encoding.  I'm not sure how it should be configured.
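
The TypeError arises because str(s, encoding) only decodes bytes-like
objects, and s is already text here. A minimal sketch of one possible
guard, assuming a hypothetical to_text() helper; this is not the actual
IDLE change:

```python
def to_text(s, encoding="utf-8"):
    """Decode bytes to str; pass text through unchanged."""
    if isinstance(s, (bytes, bytearray)):
        return str(s, encoding)
    return s


decoded = to_text(b"caf\xc3\xa9")
unchanged = to_text("already text")
```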

-- 
KBK

From greg.ewing at canterbury.ac.nz  Sun Aug 12 04:16:38 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 12 Aug 2007 14:16:38 +1200
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46BD85E6.1030005@ronadam.com>
References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org>
	<46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org>
	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>
	<46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz>
	<46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com>
	<46B52422.2090006@canterbury.ac.nz>
	<ca471dc20708041909j3f86b107hace9407673c302ea@mail.gmail.com>
	<46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz>
	<46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz>
	<46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org>
	<46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org>
	<46BAEFB0.9050400@ronadam.com> <46BBB1AE.5010207@canterbury.ac.nz>
	<46BBEB16.2040205@ronadam.com> <46BC1479.30405@canterbury.ac.nz>
	<46BCA9C9.1010306@ronadam.com> <46BD5BD8.7030706@acm.org>
	<46BD85E6.1030005@ronadam.com>
Message-ID: <46BE6D86.6040205@canterbury.ac.nz>

Ron Adam wrote:
> 
> The only question is whether the 'r'
> specifier also allows for other options like width and alignment.

I'd say it should have exactly the same options as 's'.

--
Greg

From collinw at gmail.com  Sun Aug 12 04:27:51 2007
From: collinw at gmail.com (Collin Winter)
Date: Sat, 11 Aug 2007 21:27:51 -0500
Subject: [Python-3000] Untested py3k regressions
Message-ID: <43aa6ff70708111927q5a1d924cx14f73517c0143ff4@mail.gmail.com>

Hi all,

I've started a wiki page to catalog known regressions in the py3k
branch that aren't covered by the test suite:
http://wiki.python.org/moin/Py3kRegressions.

First up: dir() doesn't work on traceback objects (it now produces an
empty list). A patch for this is up at http://python.org/sf/1772489.
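
A minimal check for this regression might look like the sketch below. It assumes an interpreter where the fix is in place; capture_traceback is a hypothetical helper name, not something from the patch.

```python
import sys

def capture_traceback():
    """Raise and catch an exception, returning its traceback object."""
    try:
        raise ValueError('example')
    except ValueError:
        return sys.exc_info()[2]

tb = capture_traceback()
# With the regression present this list comes back empty; a working dir()
# exposes the tb_* members of the traceback object.
tb_attrs = [name for name in dir(tb) if name.startswith('tb_')]
```

Something along these lines could go into the test suite so the regression stays covered.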

Collin Winter

From eric+python-dev at trueblade.com  Sun Aug 12 04:34:14 2007
From: eric+python-dev at trueblade.com (Eric V. Smith)
Date: Sat, 11 Aug 2007 22:34:14 -0400
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46BE6D86.6040205@canterbury.ac.nz>
References: <46B13ADE.7080901@acm.org>
	<46B2D147.90606@acm.org>	<46B2E265.5080905@ronadam.com>
	<46B35295.1030007@acm.org>	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>	<46B39BBF.80809@ronadam.com>
	<46B46DE1.1090403@canterbury.ac.nz>	<46B4950F.40905@ronadam.com>
	<46B4A9A0.9070206@ronadam.com>	<46B52422.2090006@canterbury.ac.nz>	<ca471dc20708041909j3f86b107hace9407673c302ea@mail.gmail.com>	<46B568F3.9060105@ronadam.com>
	<46B66BE0.7090005@canterbury.ac.nz>	<46B6851C.1030204@ronadam.com>
	<46B6C219.4040900@canterbury.ac.nz>	<46B6DABB.3080509@ronadam.com>
	<46B7CD8C.5070807@acm.org>	<46B8D58E.5040501@ronadam.com>
	<46BAC2D9.2020902@acm.org>	<46BAEFB0.9050400@ronadam.com>
	<46BBB1AE.5010207@canterbury.ac.nz>	<46BBEB16.2040205@ronadam.com>
	<46BC1479.30405@canterbury.ac.nz>	<46BCA9C9.1010306@ronadam.com>
	<46BD5BD8.7030706@acm.org>	<46BD85E6.1030005@ronadam.com>
	<46BE6D86.6040205@canterbury.ac.nz>
Message-ID: <46BE71A6.9020609@trueblade.com>

Greg Ewing wrote:
> Ron Adam wrote:
>> The only question is whether the 'r'
>> specifier also allows for other options like width and alignment.
> 
> I'd say it should have exactly the same options as 's'.

Agreed.

From nnorwitz at gmail.com  Sun Aug 12 04:40:40 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Sat, 11 Aug 2007 19:40:40 -0700
Subject: [Python-3000] Untested py3k regressions
In-Reply-To: <43aa6ff70708111927q5a1d924cx14f73517c0143ff4@mail.gmail.com>
References: <43aa6ff70708111927q5a1d924cx14f73517c0143ff4@mail.gmail.com>
Message-ID: <ee2a432c0708111940x40d63a69i4a24a4d6fb0af66d@mail.gmail.com>

On 8/11/07, Collin Winter <collinw at gmail.com> wrote:
> Hi all,
>
> I've started a wiki page to catalog known regressions in the py3k
> branch that aren't covered by the test suite:
> http://wiki.python.org/moin/Py3kRegressions.
>
> First up: dir() doesn't work on traceback objects (it now produces an
> empty list). A patch for this is up at http://python.org/sf/1772489.

I've moved the other documented regression (using PYTHONDUMPREFS env't
var) from the Py3kStrUniTests page to the new page.  I expect there
are a bunch of options that have problems, since those don't get great
testing.

I've also noticed that since io is now in Python, if you catch the
control-C just right, you can get strange error messages where the
code assumed an error meant something specific.  In my case, I typed
control-C while doing an execfile (before I removed it) and got two
different errors:  SyntaxError and some error related to BOM IIRC.

n

From greg.ewing at canterbury.ac.nz  Sun Aug 12 04:42:26 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 12 Aug 2007 14:42:26 +1200
Subject: [Python-3000] Four new failing tests
In-Reply-To: <46BE1082.40300@v.loewis.de>
References: <ca471dc20708111053l3e9daa92j751143df3c7cefc3@mail.gmail.com>
	<46BE1082.40300@v.loewis.de>
Message-ID: <46BE7392.7090807@canterbury.ac.nz>

Martin v. Löwis wrote:
> So I see the following options:
> 1. remove it entirely. Keep string.ascii_letters instead

I'd vote for this one. The only major use case for
string.letters I can see is testing whether something
is a letter using 'c in letters'. This obviously
doesn't scale when there can be thousands of letters,
and a function for testing letterness covers that use
case just as well.

The only other thing you might want to do is iterate
over all the possible letters, and that doesn't scale
either.
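
To illustrate the difference, here is a small sketch assuming current Python 3 string semantics. is_letter is a hypothetical helper standing in for "a function for testing letterness"; it simply wraps str.isalpha().

```python
import string

def is_letter(ch):
    """Return True if the single character ch is alphabetic (hypothetical helper)."""
    return len(ch) == 1 and ch.isalpha()

# Membership in a fixed table only works for the letters listed in it.
ascii_only = 'e' in string.ascii_letters
accented_in_ascii = '\xe9' in string.ascii_letters   # e-acute is not ASCII

# A predicate needs no table at all, however many letters exist.
accented_is_letter = is_letter('\xe9')
```

The predicate does not need to enumerate the alphabet, which is the scaling point above.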

--
Greg

From nnorwitz at gmail.com  Sun Aug 12 04:49:12 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Sat, 11 Aug 2007 19:49:12 -0700
Subject: [Python-3000] IDLE encoding setup
In-Reply-To: <87643llctp.fsf@hydra.hampton.thirdcreek.com>
References: <87643llctp.fsf@hydra.hampton.thirdcreek.com>
Message-ID: <ee2a432c0708111949u4ace48e8ma021e1d5964f98f3@mail.gmail.com>

On 8/11/07, Kurt B. Kaiser <kbk at shore.net> wrote:
> I've checked in a version of PyShell.py which directs exceptions to the
> terminal instead of to IDLE's shell since the latter isn't working right now.
>
> There also is apparently an encoding issue with the subprocess setup
> which I'm ignoring for now by starting IDLE w/o the subprocess:
>
> cd Lib/idlelib
> ../../python ./idle.py -n
>
> Traceback (most recent call last):
>   File "./idle.py", line 21, in <module>
>     idlelib.PyShell.main()
>   File "/home/kbk/PYDOTORG/projects/python/branches/py3k/Lib/idlelib/PyShell.py", line 1389, in main
>     shell = flist.open_shell()
>   File "/home/kbk/PYDOTORG/projects/python/branches/py3k/Lib/idlelib/PyShell.py", line 274, in open_shell
>     if not self.pyshell.begin():
>   File "/home/kbk/PYDOTORG/projects/python/branches/py3k/Lib/idlelib/PyShell.py", line 976, in begin
>     self.firewallmessage, idlever.IDLE_VERSION, nosub))
>   File "/home/kbk/PYDOTORG/projects/python/branches/py3k/Lib/idlelib/PyShell.py", line 1214, in write
>     OutputWindow.write(self, s, tags, "iomark")
>   File "/home/kbk/PYDOTORG/projects/python/branches/py3k/Lib/idlelib/OutputWindow.py", line 42, in write
>     s = str(s, IOBinding.encoding)
> TypeError: decoding Unicode is not supported

I can't reproduce this problem in idle.  Here's how the error seems to
be caused:

>>> str('abc', 'utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: decoding Unicode is not supported

Also:

>>> str(str('abc', 'utf-8'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: decoding Unicode is not supported

This hack might work to get you farther:

    s = str(s.encode('utf-8'), IOBinding.encoding)

(ie, add the encode() part)  I don't know what should be done to
really fix it though.
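
For reference, a minimal sketch of the distinction involved, assuming a current Python 3 interpreter rather than the py3k branch (the wording of the error message varies between versions, so it is not reproduced here): only bytes can be decoded, and passing an encoding together with a str raises TypeError.

```python
data = 'abc'.encode('utf-8')      # bytes: decoding is meaningful here
text = str(data, 'utf-8')         # equivalent to data.decode('utf-8')

try:
    str(text, 'utf-8')            # text is already a str, so this fails
    raised = False
except TypeError:
    raised = True
```

This is why the encode() hack gets past the error: it hands str() a bytes object to decode.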

n

From oliphant.travis at ieee.org  Sun Aug 12 06:24:14 2007
From: oliphant.travis at ieee.org (Travis E. Oliphant)
Date: Sat, 11 Aug 2007 22:24:14 -0600
Subject: [Python-3000] Need help compiling py3k-buffer branch
Message-ID: <f9m1b7$h2e$1@sea.gmane.org>

Hi everyone,

I apologize for my quietness on this list (I'm actually in the middle of 
a move),  but I recently implemented most of PEP 3118 in the py3k-buffer 
branch (it's implemented but not tested...)

However, I'm running into trouble getting it to link.  The compilation 
step proceeds fine but then I get a segmentation fault during the link 
stage.

It might be my platform (I've been having trouble with my 7-year old 
computer, as of late) or my installation of gcc.

I wondered if somebody would be willing to check out the py3k-buffer 
branch and try to compile it to see if there is some other problem that 
I'm not able to detect.

Thanks so much for any help.

-Travis Oliphant


From nnorwitz at gmail.com  Sun Aug 12 06:51:49 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Sat, 11 Aug 2007 21:51:49 -0700
Subject: [Python-3000] Need help compiling py3k-buffer branch
In-Reply-To: <f9m1b7$h2e$1@sea.gmane.org>
References: <f9m1b7$h2e$1@sea.gmane.org>
Message-ID: <ee2a432c0708112151g3b7d29b5i647c3ae9492c3720@mail.gmail.com>

On 8/11/07, Travis E. Oliphant <oliphant.travis at ieee.org> wrote:
>
> However, I'm running into trouble getting it to link.  The compilation
> step proceeds fine but then I get a segmentation fault during the link
> stage.

The problem is that python is crashing when trying to run setup.py.  I
fixed the immediate problem which was that the type wasn't initialized
properly.  It usually starts up now.

I also fixed a 64-bit problem with a mismatch between an int and Py_ssize_t.

> It might be my platform (I've been having trouble with my 7-year old
> computer, as of late) or my installation of gcc.

I was seeing crashes from dereferencing null pointers sometimes on
startup, sometimes on shutdown.

Good luck!

n

From oliphant.travis at ieee.org  Sun Aug 12 07:32:50 2007
From: oliphant.travis at ieee.org (Travis E. Oliphant)
Date: Sat, 11 Aug 2007 23:32:50 -0600
Subject: [Python-3000] Need help compiling py3k-buffer branch
In-Reply-To: <ee2a432c0708112151g3b7d29b5i647c3ae9492c3720@mail.gmail.com>
References: <f9m1b7$h2e$1@sea.gmane.org>
	<ee2a432c0708112151g3b7d29b5i647c3ae9492c3720@mail.gmail.com>
Message-ID: <f9m5br$o3e$1@sea.gmane.org>

Neal Norwitz wrote:
> On 8/11/07, Travis E. Oliphant <oliphant.travis at ieee.org> wrote:
>> However, I'm running into trouble getting it to link.  The compilation
>> step proceeds fine but then I get a segmentation fault during the link
>> stage.
> 
> The problem is that python is crashing when trying to run setup.py.  I
> fixed the immediate problem which was that the type wasn't initialized
> properly.  It usually starts up now.
> 
> I also fixed a 64-bit problem with a mismatch between an int and Py_ssize_t.
> 
>> It might be my platform (I've been having trouble with my 7-year old
>> computer, as of late) or my installation of gcc.
> 
> I was seeing crashes from dereferencing null pointers sometimes on
> startup, sometimes on shutdown.
> 

Thanks for the quick fix.   That will definitely help me make more 
progress.

Thanks,

-Travis


From martin at v.loewis.de  Sun Aug 12 09:08:04 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 12 Aug 2007 09:08:04 +0200
Subject: [Python-3000] IDLE encoding setup
In-Reply-To: <87643llctp.fsf@hydra.hampton.thirdcreek.com>
References: <87643llctp.fsf@hydra.hampton.thirdcreek.com>
Message-ID: <46BEB1D4.5000307@v.loewis.de>

>     s = str(s, IOBinding.encoding)
> TypeError: decoding Unicode is not supported
> 
> Hopefully MvL has a few minutes to revisit the IOBinding.py code which is
> setting IDLE's encoding.  I'm not sure how it should be configured.

This code was now bogus. In 2.x, the line read

      s = unicode(s, IOBinding.encoding)

Then unicode got systematically replaced by str, but so did the type of
s, and this entire block of code was now obsolete; I removed it in
56951.

I now get an IDLE window which crashes as soon as I type something.

Regards,
Martin

From oliphant.travis at ieee.org  Sat Aug 11 23:47:05 2007
From: oliphant.travis at ieee.org (Travis E. Oliphant)
Date: Sat, 11 Aug 2007 15:47:05 -0600
Subject: [Python-3000] Help with compiling the py3k-buffer branch
Message-ID: <46BE2E59.2090008@ieee.org>


Hi everyone,

I apologize for my quietness on this list (I'm actually in the middle of 
a move),  but I recently implemented most of PEP 3118 in the py3k-buffer 
branch (it's implemented but not tested...)

However, I'm running into trouble getting it to link.  The compilation 
step proceeds fine but then I get a segmentation fault during the link 
stage.

It might be my platform (I've been having trouble with my 7-year old 
computer, as of late) or my installation of gcc.

I wondered if somebody would be willing to check out the py3k-buffer 
branch and try to compile it to see if there is some other problem that 
I'm not able to detect.

Thanks so much for any help.

-Travis Oliphant



From barry at python.org  Sun Aug 12 16:50:05 2007
From: barry at python.org (Barry Warsaw)
Date: Sun, 12 Aug 2007 09:50:05 -0500
Subject: [Python-3000] [Email-SIG] fix email module for python 3000
	(bytes/str)
In-Reply-To: <200708110149.10939.victor.stinner@haypocalc.com>
References: <200708090241.08369.victor.stinner@haypocalc.com>
	<200708110149.10939.victor.stinner@haypocalc.com>
Message-ID: <8B640CF2-EB88-45A5-A85F-1267AF24749E@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 10, 2007, at 6:49 PM, Victor Stinner wrote:

> It's really hard to convert email module to Python 3000 because it  
> does mix
> byte strings and (unicode) character strings...

Indeed, but I'm making progress.

Just a very quick follow up now, with hopefully more detail soon.   
I'm cross posting this one on purpose because of a couple of more  
general py3k issues involved.

In r56957 I committed changes to sndhdr.py and imghdr.py so that they  
compare what they read out of the files against proper byte  
literals.  AFAICT, neither module has a unittest, and if you run them  
from the command line, you'll see that they're completely broken  
(without my fix).  The email package uses these to guess content type  
subparts for the MIMEAudio and MIMEImage subclasses.  I didn't add  
unittests, just some judicious 'b' prefixes, and a quick command line  
test seems to make the situation better.  This also makes a bunch of  
email unittests pass.

Another general Python thing that bit me was when an exception gets  
raised with a non-ascii message, e.g.

 >>> raise RuntimeError('oops')
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
RuntimeError: oops
 >>> raise RuntimeError('oo\xfcps')
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
 >>>

Um, what?  (I'm using a XEmacs shell buffer on OS X, but you get  
something similar in an iTerm and Terminal window.).  In the email  
unittests, I was getting one unexpected exception that had a non- 
ascii character in it, but this crashed the unittest harness because  
when it tried to print the exception message out, you'd instead get  
an exception in io.py and the test run would exit.  Okay, that all  
makes sense, but IWBNI py3k could do better <wink>.

Fixing other simple issues (not checked in yet), I'm down to 20  
failures, 13 errors out of 247 tests.  I'm running  
test_email_renamed.py only because test_email.py will go away (we  
should remove the old module names and bump the email pkg version  
number too).

As for the other questions Victor raises, we definitely need to  
answer them, but that should be for another reply.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRr8eHXEjvBPtnXfVAQIrJgQAoWGaoN82/KFLggu0IIM0BSghIQppiFVv
9weB+Kq6oAcgN95XKGSCZmPwA8jHkeUAWRpm8gZn7k44N2fJuZw11Klajy0tzUPW
Y4b5y8jPVU85phOKinynmHb9suXroyb35ZgMSp+WipL4L5PkOMv/x9q59Rs6ldjZ
cQu3Sssai9I=
=QG9j
-----END PGP SIGNATURE-----

From eric+python-dev at trueblade.com  Sun Aug 12 17:10:16 2007
From: eric+python-dev at trueblade.com (Eric V. Smith)
Date: Sun, 12 Aug 2007 11:10:16 -0400
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46BD5BD8.7030706@acm.org>
References: <46B13ADE.7080901@acm.org>	<46B2D147.90606@acm.org>	<46B2E265.5080905@ronadam.com>	<46B35295.1030007@acm.org>	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>	<46B39BBF.80809@ronadam.com>	<46B46DE1.1090403@canterbury.ac.nz>	<46B4950F.40905@ronadam.com>	<46B4A9A0.9070206@ronadam.com>	<46B52422.2090006@canterbury.ac.nz>	<ca471dc20708041909j3f86b107hace9407673c302ea@mail.gmail.com>	<46B568F3.9060105@ronadam.com>	<46B66BE0.7090005@canterbury.ac.nz>	<46B6851C.1030204@ronadam.com>	<46B6C219.4040900@canterbury.ac.nz>	<46B6DABB.3080509@ronadam.com>	<46B7CD8C.5070807@acm.org>	<46B8D58E.5040501@ronadam.com>	<46BAC2D9.2020902@acm.org>	<46BAEFB0.9050400@ronadam.com>	<46BBB1AE.5010207@canterbury.ac.nz>	<46BBEB16.2040205@ronadam.com>	<46BC1479.30405@canterbury.ac.nz>	<46BCA9C9.1010306@ronadam.com>
	<46BD5BD8.7030706@acm.org>
Message-ID: <46BF22D8.2090309@trueblade.com>

Talin wrote:
> One final thing I wanted to mention, which Guido reminded me, is that 
> we're getting short on time. This PEP has not yet been officially 
> accepted, and the reason is because of the lack of an implementation. I 
> don't want to miss the boat. (The boat in this case being Alpha 1.)

I have hooked up the existing PEP 3101 sandbox implementation into the 
py3k branch as unicode.format().  It implements the earlier PEP syntax 
for specifiers.

I'm going to work on removing some cruft, adding tests, and then slowly 
change it over to use the new proposed specifier syntax.

Eric.


From p.f.moore at gmail.com  Sun Aug 12 18:58:44 2007
From: p.f.moore at gmail.com (Paul Moore)
Date: Sun, 12 Aug 2007 17:58:44 +0100
Subject: [Python-3000] Universal newlines support in Python 3.0
In-Reply-To: <ca471dc20708111029n5722ff04h824eb8788a4c824e@mail.gmail.com>
References: <ca471dc20708101123y3628e352n7116839dbfdc9bb3@mail.gmail.com>
	<E1DD53D0-168D-4DC1-BF76-E7A3DE64F359@PageDNA.com>
	<ca471dc20708111029n5722ff04h824eb8788a4c824e@mail.gmail.com>
Message-ID: <79990c6b0708120958p588aabd1ic6dadf2f65de86d3@mail.gmail.com>

On 11/08/07, Guido van Rossum <guido at python.org> wrote:
> On 8/11/07, Tony Lownds <tony at pagedna.com> wrote:
> > Is this ok: when newline='\r\n' or newline='\r' is passed, only that
> > string is used to determine
> > the end of lines. No translation to '\n' is done.
>
> I *think* it would be more useful if it always returned lines ending
> in \n (not \r\n or \r). Wouldn't it? Although this is not how it
> currently behaves; when you set newline='\r\n', it returns the \r\n
> unchanged, so it would make sense to do this too when newline='\r'.
> Caveat user I guess.

Neither this wording, nor the PEP are clear to me, but I'm
assuming/hoping that there will be a way to spell the current
behaviour for universal newlines on input[1], namely that files can
have *either* bare \n, *or* the combination \r\n, to delimit lines.
Whichever is used (I have no need for mixed-style files) gets
translated to \n so that the program sees the same data regardless.

[1] ... at least the bit I care about :-)

This behaviour is immensely useful for uniform treatment of Windows
text files, which are an inconsistent mess of \n-only and \r\n
conventions.

Specifically, I'm looking to replicate this behaviour:

>xxd crlf
0000000: 610d 0a62 0d0a                           a..b..

>xxd lf
0000000: 610a 620a                                a.b.

>python
Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> open('crlf').read()
'a\nb\n'
>>> open('lf').read()
'a\nb\n'
>>>

As demonstrated, this is the default in Python 2.5. I'd hope it was so
in 3.0 as well.
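
In case it helps whoever can test this, the behaviour I'm after could be checked with something like the sketch below. It assumes the newline argument of open() as it exists in released Python 3; read_text is a hypothetical helper, and the temporary file is only there to make the snippet self-contained.

```python
import os
import tempfile

def read_text(raw, newline=None):
    """Write raw bytes to a temporary file and read them back in text mode."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(raw)
        with open(path, 'r', encoding='ascii', newline=newline) as f:
            return f.read()
    finally:
        os.remove(path)

crlf_text = read_text(b'a\r\nb\r\n')              # newline=None: '\r\n' becomes '\n'
lf_text = read_text(b'a\nb\n')                    # already '\n', unchanged
untranslated = read_text(b'a\r\nb\r\n', '\r\n')   # newline='\r\n': no translation
```

The first two calls are the uniform treatment described above; the third is the "returns the \r\n unchanged" case from the quoted text.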

Sorry I can't test this for myself - I don't have the time/toolset to
build my own Py3k on Windows...

Paul.

From talin at acm.org  Sun Aug 12 19:00:55 2007
From: talin at acm.org (Talin)
Date: Sun, 12 Aug 2007 10:00:55 -0700
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46BF22D8.2090309@trueblade.com>
References: <46B13ADE.7080901@acm.org>	<46B2D147.90606@acm.org>	<46B2E265.5080905@ronadam.com>	<46B35295.1030007@acm.org>	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>	<46B39BBF.80809@ronadam.com>	<46B46DE1.1090403@canterbury.ac.nz>	<46B4950F.40905@ronadam.com>	<46B4A9A0.9070206@ronadam.com>	<46B52422.2090006@canterbury.ac.nz>	<ca471dc20708041909j3f86b107hace9407673c302ea@mail.gmail.com>	<46B568F3.9060105@ronadam.com>	<46B66BE0.7090005@canterbury.ac.nz>	<46B6851C.1030204@ronadam.com>	<46B6C219.4040900@canterbury.ac.nz>	<46B6DABB.3080509@ronadam.com>	<46B7CD8C.5070807@acm.org>	<46B8D58E.5040501@ronadam.com>	<46BAC2D9.2020902@acm.org>	<46BAEFB0.9050400@ronadam.com>	<46BBB1AE.5010207@canterbury.ac.nz>	<46BBEB16.2040205@ronadam.com>	<46BC1479.30405@canterbury.ac.nz>	<46BCA9C9.1010306@ronadam.com>
	<46BD5BD8.7030706@acm.org> <46BF22D8.2090309@trueblade.com>
Message-ID: <46BF3CC7.6010405@acm.org>

Eric V. Smith wrote:
> Talin wrote:
>> One final thing I wanted to mention, which Guido reminded me, is that 
>> we're getting short on time. This PEP has not yet been officially 
>> accepted, and the reason is because of the lack of an implementation. 
>> I don't want to miss the boat. (The boat in this case being Alpha 1.)
> 
> I have hooked up the existing PEP 3101 sandbox implementation into the 
> py3k branch as unicode.format().  It implements the earlier PEP syntax 
> for specifiers.

Woo hoo! Thanks Eric. This is great news.

At some point I'd like to build that branch myself, I might send you 
questions later.

> I'm going to work on removing some cruft, adding tests, and then slowly 
> change it over to use the new proposed specifier syntax.

I'm not sure that I'm happy with my own syntax proposal just yet. I want 
to sit down with Guido and talk it over before I commit to anything.

I think part of the difficulty that I am having is as follows:

In writing this PEP, it was never my intention to invent something brand 
new, but rather to find a way to cleanly merge together all of the 
various threads and requests and wishes that people had expressed on the 
subject. (Although I will admit that there are a few minor innovations 
of my own, but they aren't fundamental to the PEP.)

That's why I originally chose a format specifier syntax which was as 
close to the original % formatting syntax as I could manage. The idea 
was that people could simply transfer over the knowledge from the old 
system to the new one.

The old system is quite surprising in how many features it packs into 
just a few characters worth of specifiers. (For example, I don't know if 
anyone has noticed the option for putting a space in front of positive 
numbers, so that negative and positive numbers line up correctly when 
using fixed-width fields.)
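
As a quick illustration of that space flag in the existing % syntax (nothing new here, just the old behaviour):

```python
# The space flag reserves the sign column: blank for positive, '-' for negative.
rows = ['% d' % n for n in (5, -5, 12)]

# It combines with a field width in the usual way.
widths = ['% 4d' % n for n in (5, -5)]
```

Both signs end up occupying the same column, so the digits line up.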

And some of the suggested additions, like centering using the '^' 
character, seemed to fit in with the old scheme quite well.

However, the old syntax doesn't fit very well with the new requirements: 
the desire to have the 'repr' option take precedence over the 
type-specific formatter, and the desire to split the format specifier 
into two parts, one which is handled by the type-specific formatter, and 
one which is handled by the general formatter.

So I find that I'm having to invent something brand new, and as I'm 
writing stuff down I'm continually asking myself the question "OK, so 
how is the typical Python programmer going to see this?" Normally, I 
don't have a problem with this, because usually when one is doing an 
architectural design, if you look hard enough, eventually you'll find 
some obviously superior configuration of design elements that is clearly 
the simplest and best way to do it. And so you can assume that everyone 
who uses your design will look at it and see that, yes indeed, this is 
the right design.

But with these format strings, it seems (to me, anyway) that the design 
choices are a lot more arbitrary and driven by aesthetics. Almost any 
combination of specifiers will work, the question is how to arrange them 
in a way that is easy to memorize.

And I find I'm having trouble envisioning what a typical Python 
programmer will or won't find intuitive. Moreover, for some reason 
all of the syntax proposals, including my own, seem kind of "line-noisy" 
to me aesthetically, for all but the simplest cases.

This is made more challenging by the fact that the old syntax allowed so 
many options to be crammed into such a small space; I didn't want to 
have the new system be significantly less capable than the old, and yet 
I find it's rather difficult to shoehorn all of those capabilities into 
a new syntax without making something that is either complex, or too 
verbose (although I admit I have a fairly strict definition of verbosity.)

Guido's been suggesting that I model the format specifiers after the 
.Net numeric formatting strings, but this system is significantly less 
capable than %-style format specifiers. Yes, you can do fancy things 
like "(###)###-####", but there's no provision for centering or for a 
custom fill character.
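
For comparison, here is how centering and a custom fill character look in the format-spec syntax accepted by the built-in format() in released Python 3. This is a sketch of that later syntax only, not of any of the drafts discussed in this thread.

```python
centered = format('ab', '^6')     # centre in a field of width 6
filled = format('ab', '*^6')      # same, padded with '*' instead of spaces
number = format(42, '*>8d')       # fill and alignment also apply to numbers
```

The fill character precedes the alignment character, and both precede the width.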

This would be easier if I was sitting in a room with other Python 
programmers so that I could show them various suggestions and see what 
their emotional reactions are. I'm having a hard time doing this in 
isolation. That's kind of why I want to meet with Guido on this, as he's 
good at cutting through this kind of crap.

-- Talin

From janssen at parc.com  Sun Aug 12 19:09:26 2007
From: janssen at parc.com (Bill Janssen)
Date: Sun, 12 Aug 2007 10:09:26 PDT
Subject: [Python-3000] [Email-SIG] fix email module for python 3000
	(bytes/str)
In-Reply-To: <200708110149.10939.victor.stinner@haypocalc.com> 
References: <200708090241.08369.victor.stinner@haypocalc.com>
	<200708110149.10939.victor.stinner@haypocalc.com>
Message-ID: <07Aug12.100928pdt."57996"@synergy1.parc.xerox.com>

>  base64MIME.decode() and base64MIME.encode() should accept bytes and str
>  base64MIME.decode() result type is bytes
>  base64MIME.encode() result type should be... bytes or str, no idea
> 
> Other decode() and encode() functions should use same rules about types.

Victor,

Here's my take on this:

base64MIME.decode converts string to bytes
base64MIME.encode converts bytes to string

Pretty straightforward.
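
A minimal sketch of that rule, using the standard base64 module rather than email's base64MIME (whose py3k signatures are still under discussion). The two function names are hypothetical:

```python
import base64

def b64_encode_to_str(data):
    """bytes -> str: base64 output is pure ASCII, so decoding it is safe."""
    return base64.b64encode(data).decode('ascii')

def b64_decode_to_bytes(text):
    """str -> bytes: the decoded payload is arbitrary binary data."""
    return base64.b64decode(text.encode('ascii'))

encoded = b64_encode_to_str(b'\x00\xffhello')
decoded = b64_decode_to_bytes(encoded)
```

The encoded form is text that can go straight into a message body; the decoded form is whatever bytes went in.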

Bill

From janssen at parc.com  Sun Aug 12 19:11:18 2007
From: janssen at parc.com (Bill Janssen)
Date: Sun, 12 Aug 2007 10:11:18 PDT
Subject: [Python-3000] bytes: compare bytes to integer
In-Reply-To: <200708110225.28056.victor.stinner@haypocalc.com> 
References: <200708110225.28056.victor.stinner@haypocalc.com>
Message-ID: <07Aug12.101123pdt."57996"@synergy1.parc.xerox.com>

> I don't like the behaviour of Python 3000 when we compare a bytes string
> of length 1:
>    >>> b'xyz'[0] == b'x'
>    False
> 
> The code can be seen as:
>    >>> ord(b'x') == b'x'
>    False
> 
> or also:
>    >>> 120 == b'x'
>    False
> 
> Two solutions:
>  1. b'xyz'[0] returns a new bytes object (b'x' instead of 120) 
>     like b'xyz'[0:1] does
>  2. allow to compare a bytes string of 1 byte with an integer
> 
> I prefer (2) since (1) is wrong: bytes contains integers and not bytes!

Why not just write

   b'xyz'[0:1] == b'x'

in the first place?  Let's not start adding "special" cases.
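
A quick sketch of the difference, assuming the bytes semantics described in the quoted text (indexing gives an integer, slicing gives bytes):

```python
data = b'xyz'
first_item = data[0]        # indexing yields an int
first_slice = data[0:1]     # slicing yields a bytes object of length 1

item_matches = (first_item == b'x')     # int compared with bytes
slice_matches = (first_slice == b'x')   # bytes compared with bytes
```

Comparing like with like needs no special case.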

Bill

From martin at v.loewis.de  Sun Aug 12 19:24:42 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 12 Aug 2007 19:24:42 +0200
Subject: [Python-3000] Four new failing tests
In-Reply-To: <aac2c7cb0708111346w51a251afh167c6f5e0774369f@mail.gmail.com>
References: <ca471dc20708111053l3e9daa92j751143df3c7cefc3@mail.gmail.com>	<46BE1082.40300@v.loewis.de>
	<aac2c7cb0708111346w51a251afh167c6f5e0774369f@mail.gmail.com>
Message-ID: <46BF425A.3070100@v.loewis.de>

> Wasn't unicodedata.ascii_letters suggested at one point (to eliminate
> the string module), or was that my imagination?

Not sure - I don't recall such a proposal.

> IMO, if there is a need for unicode or locale letters, we should
> provide a function to generate them as needed.  It can be passed
> directly to set or whatever datastructure is actually needed.  We
> shouldn't burden the startup cost with such a large datastructure
> unless absolutely necessary (nor should we use a property to load it
> when first needed; expensive to compute attribute and all that).

Exactly my feelings. Still, people seem to like string.letters a lot,
and I'm unsure as to why that is.

Regards,
Martin


From g.brandl at gmx.net  Sun Aug 12 19:41:40 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Sun, 12 Aug 2007 19:41:40 +0200
Subject: [Python-3000] bytes: compare bytes to integer
In-Reply-To: <07Aug12.101123pdt."57996"@synergy1.parc.xerox.com>
References: <200708110225.28056.victor.stinner@haypocalc.com>
	<07Aug12.101123pdt."57996"@synergy1.parc.xerox.com>
Message-ID: <f9ngoi$epc$1@sea.gmane.org>

Bill Janssen schrieb:
>> I don't like the behaviour of Python 3000 when we compare a bytes string
>> of length 1:
>>    >>> b'xyz'[0] == b'x'
>>    False
>> 
>> The code can be seen as:
>>    >>> ord(b'x') == b'x'
>>    False
>> 
>> or also:
>>    >>> 120 == b'x'
>>    False
>> 
>> Two solutions:
>>  1. b'xyz'[0] returns a new bytes object (b'x' instead of 120) 
>>     like b'xyz'[0:1] does
>>  2. allow to compare a bytes string of 1 byte with an integer
>> 
>> I prefer (2) since (1) is wrong: bytes contains integers and not bytes!
> 
> Why not just write
> 
>    b'xyz'[0:1] == b'x'
> 
> in the first place?  Let's not start adding "special" cases.

Hm... I have a feeling that this will be one of the first entries in a
hypothetical "Python 3.0 Gotchas" list.

Georg


-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From foom at fuhm.net  Sun Aug 12 20:28:10 2007
From: foom at fuhm.net (James Y Knight)
Date: Sun, 12 Aug 2007 14:28:10 -0400
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46BE67CB.9010101@canterbury.ac.nz>
References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org>
	<46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org>
	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>
	<46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz>
	<46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com>
	<46B52422.2090006@canterbury.ac.nz>
	<ca471dc20708041909j3f86b107hace9407673c302ea@mail.gmail.com>
	<46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz>
	<46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz>
	<46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org>
	<46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org>
	<46BAEFB0.9050400@ronadam.com> <46BBB1AE.5010207@canterbury.ac.nz>
	<46BBEB16.2040205@ronadam.com> <46BC1479.30405@canterbury.ac.nz>
	<46BCA9C9.1010306@ronadam.com> <46BD5BD8.7030706@acm.org>
	<46BE67CB.9010101@canterbury.ac.nz>
Message-ID: <44C1B6D6-A42B-4BE0-AF29-4403A8C01784@fuhm.net>

I've been skimming a lot of the discussion about how to special case  
various bits and pieces of formatting, but it really seems to me as  
if this is really just asking for the use of a generic function. I'm  
not sure how exactly one spells the requirement that the first  
argument be equal to a certain object (e.g. 'r') rather than a  
subtype of it, so I'll just gloss over that issue.

But anyways, it seems like it might look something like this:

# Default behaviors
@overload
def __format__(format_char:'r', obj, format_spec):
   return __format__('s', repr(obj), format_spec)

@overload
def __format__(format_char:'s', obj, format_spec):
   return __format__('s', str(obj), format_spec)

@overload
def __format__(format_char:'f', obj, format_spec):
   return __format__('f', float(obj), format_spec)

@overload
def __format__(format_char:'d', obj, format_spec):
   return __format__('d', int(obj), format_spec)


# Type specific behaviors
@overload
def __format__(format_char:'s', obj:str, format_spec):
   ...string formatting...

@overload
def __format__(format_char:'f', obj:float, format_spec):
   ...float formatting...

@overload
def __format__(format_char:'d', obj:int, format_spec):
   ...integer formatting...
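
A minimal sketch of the dispatch idea above in plain Python, without a real
generic-function library. The names overload, do_format and _formatters are
hypothetical, and the "first argument equals 'r'" requirement is handled by
using the format character as part of a dictionary key:

```python
# Hypothetical registry: maps (format_char, type) to a formatter function.
_formatters = {}

def overload(format_char, obj_type=object):
    """Register a formatter for one format character and one object type."""
    def register(func):
        _formatters[(format_char, obj_type)] = func
        return func
    return register

def do_format(format_char, obj, format_spec=""):
    """Pick the most specific formatter by walking the object's MRO."""
    for klass in type(obj).__mro__:
        func = _formatters.get((format_char, klass))
        if func is not None:
            return func(obj, format_spec)
    raise TypeError("no formatter for %r" % format_char)

# Default behaviours: convert, then dispatch again on the converted value.
@overload('r')
def _(obj, spec):
    return do_format('s', repr(obj), spec)

@overload('d')
def _(obj, spec):
    return do_format('d', int(obj), spec)

# Type-specific behaviours (here the spec is just a minimum width).
@overload('s', str)
def _(obj, spec):
    return obj.rjust(int(spec or 0))

@overload('d', int)
def _(obj, spec):
    return str(obj).rjust(int(spec or 0))
```

Calling do_format('d', '42', '5') first hits the default 'd' entry, converts
the string with int(), and then lands on the int-specific formatter.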

James

From p.f.moore at gmail.com  Sun Aug 12 21:12:29 2007
From: p.f.moore at gmail.com (Paul Moore)
Date: Sun, 12 Aug 2007 20:12:29 +0100
Subject: [Python-3000] [Python-Dev] Universal newlines support in Python
	3.0
In-Reply-To: <f9nj86$l8b$1@sea.gmane.org>
References: <ca471dc20708101123y3628e352n7116839dbfdc9bb3@mail.gmail.com>
	<E1DD53D0-168D-4DC1-BF76-E7A3DE64F359@PageDNA.com>
	<ca471dc20708111029n5722ff04h824eb8788a4c824e@mail.gmail.com>
	<79990c6b0708120958p588aabd1ic6dadf2f65de86d3@mail.gmail.com>
	<f9nj86$l8b$1@sea.gmane.org>
Message-ID: <79990c6b0708121212m2490d6f0tb151c3c1d5aa1ea3@mail.gmail.com>

On 12/08/07, Georg Brandl <g.brandl at gmx.net> wrote:
> Note that Python does nothing special in the above case. For non-Windows
> platforms, you'd get two different results -- the conversion from \r\n to
> \n is done by the Windows C runtime since the default open() mode is text mode.
>
> Only with mode 'U' does Python use its own universal newline mode.

Pah. You're right - I almost used 'U' and then "discovered" that I
didn't need it (and got bitten by a portability bug as a result :-()

> With Python 3.0, the C library is not used and Python uses universal newline
> mode by default.

That's what I expected, but I was surprised to find that the PEP is
pretty unclear on this. The phrase "universal newlines" is mentioned
only once, and never defined. Knowing the meaning, I can see how the
PEP is intended to say that universal newlines on input is the default
(and you set the newline argument to specify a *specific*,
non-universal, newline value) - but I missed it on first reading.
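
A small sketch of the distinction, using the newline argument of open() as it
exists in Python 3. The file contents are made up for illustration:

```python
import os
import tempfile

# Write raw bytes so the line endings on disk are exactly what we choose.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"one\r\ntwo\rthree\n")

# newline=None (the default): universal newlines, every ending becomes "\n".
with open(path, "r", encoding="ascii", newline=None) as f:
    universal = f.read()

# newline="\r\n": only "\r\n" ends a line, and nothing is translated.
with open(path, "r", encoding="ascii", newline="\r\n") as f:
    specific = f.readlines()

os.remove(path)
```

With the default, all three conventions are recognised on input; with a
specific value, the lone "\r" and the final "\n" are left inside the data.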

Thanks for the clarification.
Paul.

From p.f.moore at gmail.com  Sun Aug 12 21:19:23 2007
From: p.f.moore at gmail.com (Paul Moore)
Date: Sun, 12 Aug 2007 20:19:23 +0100
Subject: [Python-3000] Fix imghdr module for bytes
In-Reply-To: <aac2c7cb0708101745w53247653ifc4aeef0e9c287a4@mail.gmail.com>
References: <200708110235.43664.victor.stinner@haypocalc.com>
	<aac2c7cb0708101745w53247653ifc4aeef0e9c287a4@mail.gmail.com>
Message-ID: <79990c6b0708121219x3aecef78hc58443b592c0a13d@mail.gmail.com>

On 11/08/07, Adam Olsen <rhamph at gmail.com> wrote:
> Try h[0:1] == b'P'.  Slicing will ensure it stays as a bytes object,
> rather than just giving the integer it contains.

Ugh. Alternatively, h[0] == ord('P') should work.

Unless you're writing source in EBCDIC (is that allowed?).

All of the alternatives seem somewhat ugly. While I agree with the
idea that the bytes should be kept clean & simple, we seem to be
finding a few non-optimal corner cases. It would be a shame if the
bytes type turned into a Python 3.0 wart from day 1...

Would it be worth keeping a wiki page of the bytes type "idioms" that
are needed, as people discover them? Putting them all in one place
might give a better feel as to whether there is a real problem to
address.

Paul.

From rhamph at gmail.com  Sun Aug 12 21:39:23 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Sun, 12 Aug 2007 13:39:23 -0600
Subject: [Python-3000] Four new failing tests
In-Reply-To: <46BF425A.3070100@v.loewis.de>
References: <ca471dc20708111053l3e9daa92j751143df3c7cefc3@mail.gmail.com>
	<46BE1082.40300@v.loewis.de>
	<aac2c7cb0708111346w51a251afh167c6f5e0774369f@mail.gmail.com>
	<46BF425A.3070100@v.loewis.de>
Message-ID: <aac2c7cb0708121239j539af705ud56c1b79d8e1c6a3@mail.gmail.com>

On 8/12/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > Wasn't unicodedata.ascii_letters suggested at one point (to eliminate
> > the string module), or was that my imagination?
>
> Not sure - I don't recall such a proposal.
>
> > IMO, if there is a need for unicode or locale letters, we should
> > provide a function to generate them as needed.  It can be passed
> > directly to set or whatever datastructure is actually needed.  We
> > shouldn't burden the startup cost with such a large datastructure
> > unless absolutely necessary (nor should we use a property to load it
> > when first needed; expensive to compute attribute and all that).
>
> Exactly my feelings. Still, people seem to like string.letters a lot,
> and I'm unsure as to why that is.

I think because it feels like the most direct, least obscured
approach.  Calling ord() feels like a hack, re is overkill and
maligned for many reasons, and c.isalpha() would behave differently if
passed unicode instead of str.

Perhaps we should have a .isasciialpha() and document that as the
preferred alternative.

Looking over Google Code Search results, I don't find myself enamored
with the existing string.letters usages.  Most can be easily converted
to .isalpha/isalnum/isasciialpha/etc.  What can't easily be converted
could be done using something else, and I don't think those cases warrant
the use of string.letters, given how regularly it is misused.  What's
really frightening is the tendency to use string.letters to build
regular expressions.
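
A rough sketch of the alternatives mentioned here. There is no
.isasciialpha() method; is_ascii_alpha below is a hypothetical stand-in built
on str.isascii(), which assumes Python 3.7 or later:

```python
import re
import string

def is_ascii_alpha(text):
    """True if text is non-empty and contains only A-Z / a-z."""
    return text.isascii() and text.isalpha()

# string.ascii_letters is fixed, unlike the locale-dependent string.letters.
ASCII_LETTERS = frozenset(string.ascii_letters)

# For regular expressions, an explicit character class avoids building
# the pattern out of a string constant.
ASCII_LETTER_RE = re.compile(r"[A-Za-z]")
```

Note that "é".isalpha() is true on its own, which is the str/unicode
difference being discussed; the isascii() check is what restricts the answer
to ASCII.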

-- 
Adam Olsen, aka Rhamphoryncus

From eric+python-dev at trueblade.com  Sun Aug 12 21:49:51 2007
From: eric+python-dev at trueblade.com (Eric V. Smith)
Date: Sun, 12 Aug 2007 15:49:51 -0400
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46BF3CC7.6010405@acm.org>
References: <46B13ADE.7080901@acm.org>	<46B2D147.90606@acm.org>	<46B2E265.5080905@ronadam.com>	<46B35295.1030007@acm.org>	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>	<46B39BBF.80809@ronadam.com>	<46B46DE1.1090403@canterbury.ac.nz>	<46B4950F.40905@ronadam.com>	<46B4A9A0.9070206@ronadam.com>	<46B52422.2090006@canterbury.ac.nz>	<ca471dc20708041909j3f86b107hace9407673c302ea@mail.gmail.com>	<46B568F3.9060105@ronadam.com>	<46B66BE0.7090005@canterbury.ac.nz>	<46B6851C.1030204@ronadam.com>	<46B6C219.4040900@canterbury.ac.nz>	<46B6DABB.3080509@ronadam.com>	<46B7CD8C.5070807@acm.org>	<46B8D58E.5040501@ronadam.com>	<46BAC2D9.2020902@acm.org>	<46BAEFB0.9050400@ronadam.com>	<46BBB1AE.5010207@canterbury.ac.nz>	<46BBEB16.2040205@ronadam.com>	<46BC1479.30405@canterbury.ac.nz>	<46BCA9C9.1010306@ronadam.com>
	<46BD5BD8.7030706@acm.org> <46BF22D8.2090309@trueblade.com>
	<46BF3CC7.6010405@acm.org>
Message-ID: <46BF645F.3000808@trueblade.com>

Talin wrote:
> Eric V. Smith wrote:
>> I have hooked up the existing PEP 3101 sandbox implementation into the 
>> py3k branch as unicode.format().  It implements the earlier PEP syntax 
>> for specifiers.
> 
> Woo hoo! Thanks Eric. This is great news.
> 
> At some point I'd like to build that branch myself, I might send you 
> questions later.

I'm currently just developing in the py3k branch.  I know this is 
sub-optimal, since I can't check in there until the PEP is accepted. 
The original PEP 3101 sample implementation was in sandbox/pep3101, and 
was never a branch, because it was built as an external module.

Maybe I should create a py3k-pep3101 (or py3k-format) branch, so I can 
check my stuff in?

> I'm not sure that I'm happy with my own syntax proposal just yet. I want 
> to sit down with Guido and talk it over before I commit to anything.

I'll have some comments in the thread with your proposal.

> This would be easier if I was sitting in a room with other Python 
> programmers so that I could show them various suggestions and see what 
> their emotional reactions are. I'm having a hard time doing this in 
> isolation. That's kind of why I want to meet with Guido on this, as he's 
> good at cutting through this kind of crap.

Good luck!  I'm open to private email, or chatting on the phone, if you 
want someone to bounce ideas off of.

Eric.

From rhamph at gmail.com  Sun Aug 12 21:53:27 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Sun, 12 Aug 2007 13:53:27 -0600
Subject: [Python-3000] Fix imghdr module for bytes
In-Reply-To: <79990c6b0708121219x3aecef78hc58443b592c0a13d@mail.gmail.com>
References: <200708110235.43664.victor.stinner@haypocalc.com>
	<aac2c7cb0708101745w53247653ifc4aeef0e9c287a4@mail.gmail.com>
	<79990c6b0708121219x3aecef78hc58443b592c0a13d@mail.gmail.com>
Message-ID: <aac2c7cb0708121253h5c3b69cbwae7cfcc9842d478d@mail.gmail.com>

On 8/12/07, Paul Moore <p.f.moore at gmail.com> wrote:
> On 11/08/07, Adam Olsen <rhamph at gmail.com> wrote:
> > Try h[0:1] == b'P'.  Slicing will ensure it stays as a bytes object,
> > rather than just giving the integer it contains.
>
> Ugh. Alternatively, h[0] == ord('P') should work.
>
> Unless you're writing source in EBCDIC (is that allowed?).

I doubt it, but if it was it should be translated to unicode upon
loading and have no effect on the semantics.


> All of the alternatives seem somewhat ugly. While I agree with the
> idea that the bytes should be kept clean & simple, we seem to be
> finding a few non-optimal corner cases. It would be a shame if the
> bytes type turned into a Python 3.0 wart from day 1...

I don't think this behaviour change is a problem.  It's just a bit
surprising and something that has to be learned when you switch to
3.0.  It matches list behaviour and in the end will reduce the
concepts needed to use the language.


> Would it be worth keeping a wiki page of the bytes type "idioms" that
> are needed, as people discover them? Putting them all in one place
> might give a better feel as to whether there is a real problem to
> address.

-- 
Adam Olsen, aka Rhamphoryncus

From martin at v.loewis.de  Sun Aug 12 23:54:36 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 12 Aug 2007 23:54:36 +0200
Subject: [Python-3000] Four new failing tests
In-Reply-To: <aac2c7cb0708121239j539af705ud56c1b79d8e1c6a3@mail.gmail.com>
References: <ca471dc20708111053l3e9daa92j751143df3c7cefc3@mail.gmail.com>	
	<46BE1082.40300@v.loewis.de>	
	<aac2c7cb0708111346w51a251afh167c6f5e0774369f@mail.gmail.com>	
	<46BF425A.3070100@v.loewis.de>
	<aac2c7cb0708121239j539af705ud56c1b79d8e1c6a3@mail.gmail.com>
Message-ID: <46BF819C.8040107@v.loewis.de>

>> Exactly my feelings. Still, people seem to like string.letters a lot,
>> and I'm unsure as to why that is.
> 
> I think because it feels like the most direct, least obscured
> approach.  Calling ord() feels like a hack, re is overkill and
> maligned for many reasons, and c.isalpha() would behave differently if
> passed unicode instead of str.

I think the first ones might apply, but the last one surely doesn't.
When people use string.letters, they don't consider issues such as
character set. If they did, they would know that string.letters may vary
with locale.

> What's really frightening
> is the tendency to use string.letters to build regular expressions.

Indeed. However, if string.letters is removed, I trust that people
start listing all characters explicitly in the regex, and curse
python-dev for removing such a useful facility.

Regards,
Martin

From kbk at shore.net  Mon Aug 13 00:31:44 2007
From: kbk at shore.net (Kurt B. Kaiser)
Date: Sun, 12 Aug 2007 18:31:44 -0400
Subject: [Python-3000] IDLE encoding setup
In-Reply-To: <46BEB1D4.5000307@v.loewis.de> (Martin v. =?iso-8859-1?Q?L=F6?=
	=?iso-8859-1?Q?wis's?= message of
	"Sun, 12 Aug 2007 09:08:04 +0200")
References: <87643llctp.fsf@hydra.hampton.thirdcreek.com>
	<46BEB1D4.5000307@v.loewis.de>
Message-ID: <87sl6ojsfj.fsf@hydra.hampton.thirdcreek.com>

"Martin v. Löwis" <martin at v.loewis.de> writes:

>> Hopefully MvL has a few minutes to revisit the IOBinding.py code which is
>> setting IDLE's encoding.  I'm not sure how it should be configured.
>
> This code was now bogus. In 2.x, the line read
>
>       s = unicode(s, IOBinding.encoding)
>
> Then unicode got systematically replaced by str, but so did the type of
> s, and this entire block of code was now obsolete; I removed it in
> 56951.

OK, thanks.

Is the code which sets IOBinding.encoding still correct?  That value is
used in several places in IDLE, including setting the encoding for
std{in,err,out}.

Same question for IOBinding.py:IOBinding.{encode(),decode()} !

>
> I now get an IDLE window which crashes as soon as I type something.

Yes, something like

  File "/home/kbk/PYDOTORG/projects/python/branches/py3k/Lib/lib-tk/Tkinter.py", line 1022, in mainloop
    self.tk.mainloop(n)
TypeError: expected string, bytes found

I can duplicate this using just WidgetRedirector.main() (no IDLE), but I
haven't figured out the problem as yet.  That's a very interesting module ::-P

-- 
KBK

From martin at v.loewis.de  Mon Aug 13 00:51:47 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 13 Aug 2007 00:51:47 +0200
Subject: [Python-3000] IDLE encoding setup
In-Reply-To: <87sl6ojsfj.fsf@hydra.hampton.thirdcreek.com>
References: <87643llctp.fsf@hydra.hampton.thirdcreek.com>	<46BEB1D4.5000307@v.loewis.de>
	<87sl6ojsfj.fsf@hydra.hampton.thirdcreek.com>
Message-ID: <46BF8F03.9030705@v.loewis.de>

> Is the code which sets IOBinding.encoding still correct?  That value is
> used in several places in IDLE, including setting the encoding for
> std{in,err,out}.

I think so, yes. The conditions in which it needs to be used will have
to change, though: Python 3 defaults to UTF-8 as the source encoding,
so there is no need to use a computed encoding when there is no declared
one, anymore.

What encoding IDLE should use for sys.stdout is as debatable as it
always was (i.e. should it use a fixed one, independent of installation,
or a variable one, depending on the user's locale?).

> Same question for IOBinding.py:IOBinding.{encode(),decode()} !

This is still mostly correct, except that it should encode as UTF-8
in the absence of any declared encoding (see above).
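
As an illustration of the "UTF-8 unless declared otherwise" rule, here is a
sketch using tokenize.detect_encoding() from the Python 3 standard library.
This is not IDLE's IOBinding code; source_encoding is a hypothetical helper:

```python
import io
import tokenize

def source_encoding(source_bytes):
    """Return the encoding Python 3 would use to decode this source."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding

# No coding declaration: falls back to the UTF-8 default.
no_cookie = source_encoding(b"x = 1\n")

# An explicit declaration on the first line overrides the default.
with_cookie = source_encoding(b"# -*- coding: latin-1 -*-\nx = 1\n")
```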

I'll fix that when I find some time.

Regards,
Martin

From greg.ewing at canterbury.ac.nz  Mon Aug 13 01:46:20 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 13 Aug 2007 11:46:20 +1200
Subject: [Python-3000] Format specifier proposal
In-Reply-To: <46BD79EC.1020301@acm.org>
References: <46BD79EC.1020301@acm.org>
Message-ID: <46BF9BCC.7010505@canterbury.ac.nz>

Talin wrote:
>     :s10      # String right-aligned within field of minimum width
>               # of 10 chars.

I'm wondering whether the default alignment for
strings should be left instead of right. The C
way is all very consistent and all, but it's not
a very practical default. How often do you want
a right-aligned string?
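
For comparison, a short sketch of how the built-in format() in released
Python 3 treats a bare width. It uses the shipped specifier syntax rather
than the ":s10" form under discussion in this thread:

```python
# Same minimum width, no explicit alignment character.
text_cell = format("ab", "6")    # strings are padded on the right
number_cell = format(42, "6")    # numbers are padded on the left

# Explicit alignment overrides the default for either type.
right_text = format("ab", ">6")
left_number = format(42, "<6")
```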

--
Greg

From greg.ewing at canterbury.ac.nz  Mon Aug 13 01:53:06 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 13 Aug 2007 11:53:06 +1200
Subject: [Python-3000] bytes: compare bytes to integer
In-Reply-To: <f9ngoi$epc$1@sea.gmane.org>
References: <200708110225.28056.victor.stinner@haypocalc.com>
	<07Aug12.101123pdt.57996@synergy1.parc.xerox.com>
	<f9ngoi$epc$1@sea.gmane.org>
Message-ID: <46BF9D62.4080707@canterbury.ac.nz>

Georg Brandl wrote:
> Hm... I have a feeling that this will be one of the first entries in a
> hypothetical "Python 3.0 Gotchas" list.

And probably it's exacerbated by calling them byte
"strings", when they're really a kind of array rather
than a kind of string, and the use of b"..." as a
constructor syntax.

--
Greg

From greg.ewing at canterbury.ac.nz  Mon Aug 13 01:58:30 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 13 Aug 2007 11:58:30 +1200
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <44C1B6D6-A42B-4BE0-AF29-4403A8C01784@fuhm.net>
References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org>
	<46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org>
	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>
	<46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz>
	<46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com>
	<46B52422.2090006@canterbury.ac.nz>
	<ca471dc20708041909j3f86b107hace9407673c302ea@mail.gmail.com>
	<46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz>
	<46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz>
	<46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org>
	<46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org>
	<46BAEFB0.9050400@ronadam.com> <46BBB1AE.5010207@canterbury.ac.nz>
	<46BBEB16.2040205@ronadam.com> <46BC1479.30405@canterbury.ac.nz>
	<46BCA9C9.1010306@ronadam.com> <46BD5BD8.7030706@acm.org>
	<46BE67CB.9010101@canterbury.ac.nz>
	<44C1B6D6-A42B-4BE0-AF29-4403A8C01784@fuhm.net>
Message-ID: <46BF9EA6.10706@canterbury.ac.nz>

James Y Knight wrote:
> I've been skimming a lot of the discussion about how to special case  
> various bits and pieces of formatting, but it really seems to me as  
> if this is really just asking for the use of a generic function.

I was afraid someone would suggest that. I think it
would be a bad idea to use something like that in such
a fundamental part of the core until GFs are much
better tried and tested.

--
Greg

From greg.ewing at canterbury.ac.nz  Mon Aug 13 02:03:41 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 13 Aug 2007 12:03:41 +1200
Subject: [Python-3000] [Python-Dev] Universal newlines support in Python
	3.0
In-Reply-To: <79990c6b0708121212m2490d6f0tb151c3c1d5aa1ea3@mail.gmail.com>
References: <ca471dc20708101123y3628e352n7116839dbfdc9bb3@mail.gmail.com>
	<E1DD53D0-168D-4DC1-BF76-E7A3DE64F359@PageDNA.com>
	<ca471dc20708111029n5722ff04h824eb8788a4c824e@mail.gmail.com>
	<79990c6b0708120958p588aabd1ic6dadf2f65de86d3@mail.gmail.com>
	<f9nj86$l8b$1@sea.gmane.org>
	<79990c6b0708121212m2490d6f0tb151c3c1d5aa1ea3@mail.gmail.com>
Message-ID: <46BF9FDD.7090402@canterbury.ac.nz>

Paul Moore wrote:
> and you set the newline argument to specify a *specific*,
> non-universal, newline value

It still seems wrong to not translate the newlines, though,
since it's still a *text* mode, and the standard Python
representation of text has \n line endings.

--
Greg

From greg.ewing at canterbury.ac.nz  Mon Aug 13 02:08:28 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 13 Aug 2007 12:08:28 +1200
Subject: [Python-3000] Fix imghdr module for bytes
In-Reply-To: <79990c6b0708121219x3aecef78hc58443b592c0a13d@mail.gmail.com>
References: <200708110235.43664.victor.stinner@haypocalc.com>
	<aac2c7cb0708101745w53247653ifc4aeef0e9c287a4@mail.gmail.com>
	<79990c6b0708121219x3aecef78hc58443b592c0a13d@mail.gmail.com>
Message-ID: <46BFA0FC.2060707@canterbury.ac.nz>

Paul Moore wrote:
> Ugh. Alternatively, h[0] == ord('P') should work.

I'm wondering whether we want a "byte character literal"
to go along with "byte string literals":

   h[0] == c"P"

After all, if it makes sense to write an array of bytes
as though they were ASCII characters, it must make sense
to write a single byte that way as well.

--
Greg

From greg.ewing at canterbury.ac.nz  Mon Aug 13 02:12:23 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 13 Aug 2007 12:12:23 +1200
Subject: [Python-3000] Four new failing tests
In-Reply-To: <46BF819C.8040107@v.loewis.de>
References: <ca471dc20708111053l3e9daa92j751143df3c7cefc3@mail.gmail.com>
	<46BE1082.40300@v.loewis.de>
	<aac2c7cb0708111346w51a251afh167c6f5e0774369f@mail.gmail.com>
	<46BF425A.3070100@v.loewis.de>
	<aac2c7cb0708121239j539af705ud56c1b79d8e1c6a3@mail.gmail.com>
	<46BF819C.8040107@v.loewis.de>
Message-ID: <46BFA1E7.6070300@canterbury.ac.nz>

Martin v. Löwis wrote:
> However, if string.letters is removed, I trust that people
> start listing all characters explicitly in the regex, and curse
> python-dev for removing such a useful facility.

On the other hand, if it's kept, but turns into something
tens of kilobytes long, what effect will *that* have on
people's regular expressions?

--
Greg

From victor.stinner at haypocalc.com  Mon Aug 13 02:19:56 2007
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Mon, 13 Aug 2007 02:19:56 +0200
Subject: [Python-3000] bytes: compare bytes to integer
In-Reply-To: <07Aug12.101123pdt."57996"@synergy1.parc.xerox.com>
References: <200708110225.28056.victor.stinner@haypocalc.com>
	<07Aug12.101123pdt."57996"@synergy1.parc.xerox.com>
Message-ID: <200708130219.57043.victor.stinner@haypocalc.com>

On Sunday 12 August 2007 19:11:18 Bill Janssen wrote:
> Why not just write
>
>    b'xyz'[0:1] == b'x'

It's just strange to write:
   'abc'[0] == 'a'
for character string and:
   b'abc'[0:1] == b'a'
for byte string.

The problem in my brain is that str is a special case, since a str item is also 
a string, whereas a bytes item is an integer.

It's clear that "[5, 9, 10][0] == [5]" is wrong, but for bytes and str it's 
not intuitive because of the b'...' syntax. If I had to write [120, 121, 122] 
instead of b'xyz', it would be easier to understand that the first value is an 
integer and not the *letter* X or the *string* X.
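
The behaviour being described can be sketched as follows for Python 3 bytes
objects; the variable names are only for illustration:

```python
data = b"xyz"

first_item = data[0]       # an int, like indexing a list of ints
first_slice = data[0:1]    # a bytes object of length 1

# Three equivalent ways to test the first byte.
by_slice = data[0:1] == b"x"
by_int = data[0] == ord("x")
by_prefix = data.startswith(b"x")

# Comparing the int item with a bytes literal evaluates to False
# (silently, unless Python is run with the -b option).
mismatch = data[0] == b"x"
```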


I dislike b'xyz'[0:1] == b'x' since I want to check the first item, not 
compare substrings.

Victor Stinner aka haypo
http://hachoir.org/

From eric+python-dev at trueblade.com  Mon Aug 13 02:22:27 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Sun, 12 Aug 2007 20:22:27 -0400
Subject: [Python-3000] Format specifier proposal
In-Reply-To: <46BD79EC.1020301@acm.org>
References: <46BD79EC.1020301@acm.org>
Message-ID: <46BFA443.2000005@trueblade.com>

Talin wrote:
> Taking some ideas from the various threads, here's what I'd like to propose:
> 
> (Assume that brackets [] means 'optional field')
> 
>    [:[type][align][sign][[0]minwidth][.precision]][/fill][!r]
> 
> Examples:
> 
>     :f        # Floating point number of natural width
>     :f10      # Floating point number, width at least 10
>     :f010     # Floating point number, width at least 10, leading zeros
>     :f.2      # Floating point number with two decimal digits
>     :8        # Minimum width 8, type defaults to natural type
>     :d+2      # Integer number, 2 digits, sign always shown
>     !r        # repr() format
>     :10!r     # Field width 10, repr() format
>     :s10      # String right-aligned within field of minimum width
>               # of 10 chars.
>     :s10.10   # String right-aligned within field of minimum width
>               # of 10 chars, maximum width 10.
>     :s<10     # String left-aligned in 10 char (min) field.
>     :d^15     # Integer centered in 15 character field
>     :>15/.    # Right align and pad with '.' chars
>     :f<+015.5 # Floating point, left aligned, always show sign,
>               # leading zeros, field width 15 (min), 5 decimal places.

For those cases where we're going to special-case either conversions or 
repr, it would be convenient if the character were always first.  And 
since repr and string formatting are so similar, it would be convenient 
if they were the same, except for the "r" part.  But the "!" (or 
something similar) is needed, otherwise no format string could ever 
begin with an "r".

So, how about making "!r" leftmost for repr formatting?  The similarities 
would be:

"!r"       # default repr formatting
":s"       # default string formatting
"!r10"     # repr right aligned, minimum 10 chars width
":s10"     # convert to string, right aligned, minimum 10 chars width

Admittedly the "r" is now superfluous, but I think it's clearer with the 
"r" present than without it.  And it would allow for future expansion of 
such top-level functionality to bypass __format__.
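
For reference, a sketch of the same pairing in the str.format() syntax of
released Python 3, where the conversion is written before the colon. This is
not the exact "!r10" spelling proposed above:

```python
value = "ab"

as_str = "{0:>8}".format(value)     # str() of the value, right-aligned
as_repr = "{0!r:>8}".format(value)  # repr() of the value, right-aligned
```

The two fields differ only in the "!r" part, which is the similarity being
argued for here.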

Eric.

From victor.stinner at haypocalc.com  Mon Aug 13 02:26:03 2007
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Mon, 13 Aug 2007 02:26:03 +0200
Subject: [Python-3000] [Email-SIG] fix email module for python 3000
	(bytes/str)
In-Reply-To: <8B640CF2-EB88-45A5-A85F-1267AF24749E@python.org>
References: <200708090241.08369.victor.stinner@haypocalc.com>
	<200708110149.10939.victor.stinner@haypocalc.com>
	<8B640CF2-EB88-45A5-A85F-1267AF24749E@python.org>
Message-ID: <200708130226.03670.victor.stinner@haypocalc.com>

On Sunday 12 August 2007 16:50:05 Barry Warsaw wrote:
> In r56957 I committed changes to sndhdr.py and imghdr.py so that they
> compare what they read out of the files against proper byte
> literals.

So nobody read my patches? :-( See my emails "[Python-3000] Fix imghdr module 
for bytes" and "[Python-3000] Fix sndhdr module for bytes" from last 
Saturday. Oh well, my patches look similar.

Barry's patch is incomplete: test_voc() is wrong.

I attached a new patch:
 - fix "h[sbseek] == b'\1'" and "ratecode = ord(h[sbseek+4])" in test_voc()
 - avoid division by zero
 - use startswith method: replace h[:2] == b'BM' by h.startswith(b'BM')
 - use aifc.open() instead of old aifc.openfp()
 - use ord(b'P') instead of ord('P')
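
A simplified sketch of the style of check the list above describes. The two
functions are hypothetical and much reduced; they are not the code in the
attached patch or in imghdr.py:

```python
def looks_like_bmp(header):
    """Hypothetical check: BMP files start with the two bytes 'BM'."""
    return header.startswith(b"BM")

def looks_like_pnm(header):
    """Hypothetical check: 'P' followed by a digit from 1 to 6."""
    return (
        len(header) >= 2
        and header[0] == ord(b"P")   # header[0] is an int in Python 3
        and header[1] in b"123456"   # int membership test against bytes
    )
```

startswith() avoids slicing altogether, and ord(b'P') keeps the comparison
between two integers.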

Victor Stinner aka haypo
http://hachoir.org/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: py3k-imgsnd-hdr.patch
Type: text/x-diff
Size: 5326 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070813/081c76b4/attachment-0001.bin 

From victor.stinner at haypocalc.com  Mon Aug 13 02:32:29 2007
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Mon, 13 Aug 2007 02:32:29 +0200
Subject: [Python-3000] bytes regular expression?
In-Reply-To: <d11dcfba0708091039m7e976a39gc006c1eb9726baf7@mail.gmail.com>
References: <200708090427.19830.victor.stinner@haypocalc.com>
	<200708091740.59070.victor.stinner@haypocalc.com>
	<d11dcfba0708091039m7e976a39gc006c1eb9726baf7@mail.gmail.com>
Message-ID: <200708130232.30033.victor.stinner@haypocalc.com>

On Thursday 09 August 2007 19:39:50 you wrote:
> So why not just skip caching for anything that doesn't hash()?  If
> you're really worried about efficiency, simply re.compile() the
> expression once and don't rely on the re module's internal cache.

I tried to keep backward compatibility.

Why are character strings "optimized" (cached) but not byte strings? Since regex 
parsing is slow, it's a good idea to avoid recomputation in re.compile().

Regular expressions for bytes are useful for file, network, picture, etc. 
manipulation.

Victor Stinner aka haypo
http://hachoir.org/

From talin at acm.org  Mon Aug 13 03:01:03 2007
From: talin at acm.org (Talin)
Date: Sun, 12 Aug 2007 18:01:03 -0700
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <44C1B6D6-A42B-4BE0-AF29-4403A8C01784@fuhm.net>
References: <46B13ADE.7080901@acm.org>
	<46B2D147.90606@acm.org>	<46B2E265.5080905@ronadam.com>
	<46B35295.1030007@acm.org>	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>	<46B39BBF.80809@ronadam.com>
	<46B46DE1.1090403@canterbury.ac.nz>	<46B4950F.40905@ronadam.com>
	<46B4A9A0.9070206@ronadam.com>	<46B52422.2090006@canterbury.ac.nz>	<ca471dc20708041909j3f86b107hace9407673c302ea@mail.gmail.com>	<46B568F3.9060105@ronadam.com>
	<46B66BE0.7090005@canterbury.ac.nz>	<46B6851C.1030204@ronadam.com>
	<46B6C219.4040900@canterbury.ac.nz>	<46B6DABB.3080509@ronadam.com>
	<46B7CD8C.5070807@acm.org>	<46B8D58E.5040501@ronadam.com>
	<46BAC2D9.2020902@acm.org>	<46BAEFB0.9050400@ronadam.com>
	<46BBB1AE.5010207@canterbury.ac.nz>	<46BBEB16.2040205@ronadam.com>
	<46BC1479.30405@canterbury.ac.nz>	<46BCA9C9.1010306@ronadam.com>
	<46BD5BD8.7030706@acm.org>	<46BE67CB.9010101@canterbury.ac.nz>
	<44C1B6D6-A42B-4BE0-AF29-4403A8C01784@fuhm.net>
Message-ID: <46BFAD4F.3000700@acm.org>

James Y Knight wrote:
> I've been skimming a lot of the discussion about how to special case  
> various bits and pieces of formatting, but it really seems to me as  
> if this is really just asking for the use of a generic function. I'm  
> not sure how exactly one spells the requirement that the first  
> argument be equal to a certain object (e.g. 'r') rather than a  
> subtype of it, so I'll just gloss over that issue.

The plan is to eventually use generic functions once we actually have an 
implementation of them. That probably won't happen in Alpha 1.

-- Talin


From steven.bethard at gmail.com  Mon Aug 13 04:22:46 2007
From: steven.bethard at gmail.com (Steven Bethard)
Date: Sun, 12 Aug 2007 20:22:46 -0600
Subject: [Python-3000] bytes regular expression?
In-Reply-To: <200708130232.30033.victor.stinner@haypocalc.com>
References: <200708090427.19830.victor.stinner@haypocalc.com>
	<200708091740.59070.victor.stinner@haypocalc.com>
	<d11dcfba0708091039m7e976a39gc006c1eb9726baf7@mail.gmail.com>
	<200708130232.30033.victor.stinner@haypocalc.com>
Message-ID: <d11dcfba0708121922o716d8cc4od408ac0af2df8f69@mail.gmail.com>

On 8/12/07, Victor Stinner <victor.stinner at haypocalc.com> wrote:
> On Thursday 09 August 2007 19:39:50 you wrote:
> > So why not just skip caching for anything that doesn't hash()?  If
> > you're really worried about efficiency, simply re.compile() the
> > expression once and don't rely on the re module's internal cache.
>
> I tried to keep backward compatibility.

It's not actually backwards incompatible -- the re docs don't promise
anywhere to do any caching for you. I'd rather wait and see whether
the caching is really necessary for bytes than keep str8 around if we
don't have to.
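
The "compile once" advice can be sketched like this for a bytes pattern; the
PNG signature and the is_png helper are just an illustrative choice:

```python
import re

# Compile the bytes pattern once at import time and reuse the object,
# so nothing depends on the re module's internal cache.
PNG_SIGNATURE = re.compile(rb"\x89PNG\r\n\x1a\n")

def is_png(data):
    """Hypothetical helper: does this buffer start with the PNG signature?"""
    return PNG_SIGNATURE.match(data) is not None
```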

STeVe
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From eric+python-dev at trueblade.com  Mon Aug 13 04:37:54 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Sun, 12 Aug 2007 22:37:54 -0400
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46BD2D59.1040209@trueblade.com>
References: <46B13ADE.7080901@acm.org>	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>	<46B39BBF.80809@ronadam.com>	<46B46DE1.1090403@canterbury.ac.nz>	<46B4950F.40905@ronadam.com>	<46B4A9A0.9070206@ronadam.com>	<46B54F51.40705@acm.org>	<46B59F05.3070200@ronadam.com>	<46B5FBD9.4020301@acm.org>	<46BBBEC6.5030705@trueblade.com>	<fb6fbf560708100727tdecc140heac8481eaef5e5db@mail.gmail.com>	<46BC83BF.3000407@trueblade.com>	<46BD155B.2010202@canterbury.ac.nz>
	<46BD2D59.1040209@trueblade.com>
Message-ID: <46BFC402.6060804@trueblade.com>

Eric V. Smith wrote:
> Right.  Your "if" test is my is_float_specifier function.  The problem 
> is that this needs to be shared between int and float and string, and 
> anything else (maybe decimal?) that can be converted to a float.  Maybe 
> we should make is_float_specifier a classmethod of float[1], so that 
> int's __format__ (and also string's __format__) could say:
> 
> if float.is_float_specifier(spec):
>     return float(self).__format__(spec)
> 
> And float's __format__ function could do all of the specifier testing, 
> for types it knows to convert itself to, and then say:

As I've begun implementing this, I think we really do need these 
is_XXX_specifier functions.

Say I create a new int-like class, not derived from int, named MyInt. 
And I want to use it like an int, maybe to print it as a hex number:

i = MyInt()
"{0:x}".format(i)

In order for me to write the __format__ function in MyInt, I have to 
know if the specifier is in fact an int specifier.  Rather than put this 
specifier checking logic into every class that wants to convert itself 
to an int, we could centralize it in a class method int.is_int_specifier 
(or maybe int.is_specifier):

class MyInt:
     def __format__(self, spec):
         if int.is_int_specifier(spec):
             return int(self).__format__(spec)
         return "MyInt instance with custom specifier " + spec
     def __int__(self):
         return <some local state>
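
A runnable variant of the sketch above, under some assumptions:
int.is_int_specifier does not exist, so a module-level is_int_specifier
stands in for it, and it uses the released Python 3 convention that the type
code is the last character of the specifier:

```python
INT_TYPE_CODES = "bcdoxXn"   # assumed set of integer presentation types

def is_int_specifier(spec):
    """Hypothetical stand-in for the proposed int.is_int_specifier()."""
    return spec != "" and spec[-1] in INT_TYPE_CODES

class MyInt:
    """An int-like class that does not derive from int."""

    def __init__(self, value):
        self._value = value

    def __int__(self):
        return self._value

    def __format__(self, spec):
        if is_int_specifier(spec):
            # Delegate to int's own formatting.
            return format(int(self), spec)
        return "MyInt instance with custom specifier " + spec
```

With this in place, "{0:x}".format(MyInt(255)) goes through int's formatting,
while a specifier that does not end in an integer type code stays with MyInt.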


The problem with this logic is that every class that implements __int__ 
would probably want to contain this same logic.

Maybe we want to move this into unicode.format, and say that any class 
that implements __int__ automatically will participate in a conversion 
for a specifier that looks like an int specifier.  Of course the same 
logic would exist for float and maybe string.  Then we wouldn't need a 
public int.is_int_specifier.

The disadvantage of this approach is that if you do implement __int__, 
you're restricted in what format specifiers your __format__ method will 
ever be called with.  You're restricted from using a specifier that 
starts with d, x, etc.  That argues for making every __format__ method 
implement this test itself, only if it wants to.  Which means we would 
want to have int.is_int_specifier.
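
A minimal sketch of this last option, where each class opts in by running
the test itself.  The module-level is_int_specifier below is a hypothetical
stand-in for the proposed int.is_int_specifier (no such method exists on
int), and its regular expression is only an approximation of what an int
specifier might look like:

```python
import re

# Hypothetical approximation of "looks like an int specifier":
# optional alignment, sign, '#', zero flag, width, then an integer type code.
_INT_SPEC = re.compile(r"[<>=^]?[-+ ]?#?0?\d*[bcdoxX]")


def is_int_specifier(spec):
    """Stand-in for the proposed int.is_int_specifier (an assumption)."""
    return _INT_SPEC.fullmatch(spec) is not None


class MyInt:
    """Int-like class that is deliberately not derived from int."""

    def __init__(self, value):
        self._value = value

    def __int__(self):
        return self._value

    def __format__(self, spec):
        # The class decides for itself whether to delegate to int.
        if is_int_specifier(spec):
            return format(int(self), spec)
        return "MyInt instance with custom specifier " + spec


print("{0:x}".format(MyInt(255)))
print("{0:custom}".format(MyInt(255)))
```

Because the check lives inside MyInt.__format__, the class stays free to
claim any specifier for itself before (or instead of) delegating.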

Thoughts?

Eric.



From rhamph at gmail.com  Mon Aug 13 06:05:56 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Sun, 12 Aug 2007 22:05:56 -0600
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46BFC402.6060804@trueblade.com>
References: <46B13ADE.7080901@acm.org> <46B54F51.40705@acm.org>
	<46B59F05.3070200@ronadam.com> <46B5FBD9.4020301@acm.org>
	<46BBBEC6.5030705@trueblade.com>
	<fb6fbf560708100727tdecc140heac8481eaef5e5db@mail.gmail.com>
	<46BC83BF.3000407@trueblade.com> <46BD155B.2010202@canterbury.ac.nz>
	<46BD2D59.1040209@trueblade.com> <46BFC402.6060804@trueblade.com>
Message-ID: <aac2c7cb0708122105j6e4e12br183da6d03d8feebd@mail.gmail.com>

On 8/12/07, Eric Smith <eric+python-dev at trueblade.com> wrote:
> Eric V. Smith wrote:
> > Right.  Your "if" test is my is_float_specifier function.  The problem
> > is that this needs to be shared between int and float and string, and
> > anything else (maybe decimal?) that can be converted to a float.  Maybe
> > we should make is_float_specifier a classmethod of float[1], so that
> > int's __format__ (and also string's __format__) could say:
> >
> > if float.is_float_specifier(spec):
> >     return float(self).__format__(spec)
> >
> > And float's __format__ function could do all of the specifier testing,
> > for types it knows to convert itself to, and then say:
>
> As I've begun implementing this, I think we really do need these
> is_XXX_specifier functions.
>
> Say I create a new int-like class, not derived from int, named MyInt.
> And I want to use it like an int, maybe to print it as a hex number:
>
> i = MyInt()
> "{0:x}".format(i)
>
> In order for me to write the __format__ function in MyInt, I have to
> know if the specifier is in fact an int specifier.  Rather than put this
> specifier checking logic into every class that wants to convert itself
> to an int, we could centralize it in a class method int.is_int_specifier
> (or maybe int.is_specifier):
>
> class MyInt:
>      def __format__(self, spec):
>          if int.is_int_specifier(spec):
>              return int(self).__format__(spec)
>          return "MyInt instance with custom specifier " + spec
>      def __int__(self):
>          return <some local state>

My proposal was to flip this logic: __format__ should check for its
own specifiers first, and only if it doesn't match will it return
NotImplemented (triggering a call to __int__, or maybe __index__).

class MyInt:
    def __format__(self, spec):
        if is_custom_spec(spec):
            return "MyInt instance with custom specifier " + spec
        return NotImplemented
    def __int__(self):
        return <some local state>

This avoids the need for a public is_int_specifier.  unicode.format
would still have the logic, but since it runs only after __format__ has
declined the spec, you're not restricted from starting with d, x, etc.
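
A rough sketch of how the caller side of this could look.  Everything here
is hypothetical: format_with_fallback stands in for the logic unicode.format
would run, the "my:" prefix is an invented custom specifier, and the type
code tables are assumptions.  (The str.format that eventually shipped does
not do this; returning NotImplemented from __format__ there is an error.)

```python
INT_CODES = set("bcdoxX")      # assumed set of int type codes
FLOAT_CODES = set("eEfFgG%")   # assumed set of float type codes


def format_with_fallback(value, spec):
    """Hypothetical dispatcher: the object gets first crack at the spec."""
    result = type(value).__format__(value, spec)
    if result is not NotImplemented:
        return result
    # Only now is the spec inspected for a built-in type code.
    type_code = spec[-1:]
    if type_code in INT_CODES and hasattr(value, "__int__"):
        return format(int(value), spec)
    if type_code in FLOAT_CODES and hasattr(value, "__float__"):
        return format(float(value), spec)
    raise ValueError("no formatter accepted specifier %r" % spec)


class MyInt:
    def __init__(self, value):
        self._value = value

    def __format__(self, spec):
        if spec.startswith("my:"):          # invented custom specifier
            return "MyInt instance with custom specifier " + spec
        return NotImplemented

    def __int__(self):
        return self._value


print(format_with_fallback(MyInt(255), "x"))
print(format_with_fallback(MyInt(255), "my:x"))
```

Note that "my:x" ends in an int type code but is still handled by MyInt,
which is the point of checking the object's own specifiers first.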


> The problem with this logic is that every class that implements __int__
> would probably want to contain this same logic.
>
> Maybe we want to move this into unicode.format, and say that any class
> that implements __int__ automatically will participate in a conversion
> for a specifier that looks like an int specifier.  Of course the same
> logic would exist for float and maybe string.  Then we wouldn't need a
> public int.is_int_specifier.
>
> The disadvantage of this approach is that if you do implement __int__,
> you're restricted in what format specifiers your __format__ method will
> ever be called with.  You're restricted from using a specifier that
> starts with d, x, etc.  That argues for making every __format__ method
> implement this test itself, only if it wants to.  Which means we would
> want to have int.is_int_specifier.
>
> Thoughts?
>
> Eric.


-- 
Adam Olsen, aka Rhamphoryncus

From eric+python-dev at trueblade.com  Mon Aug 13 06:30:01 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Mon, 13 Aug 2007 00:30:01 -0400
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <aac2c7cb0708122105j6e4e12br183da6d03d8feebd@mail.gmail.com>
References: <46B13ADE.7080901@acm.org> <46B54F51.40705@acm.org>	
	<46B59F05.3070200@ronadam.com> <46B5FBD9.4020301@acm.org>	
	<46BBBEC6.5030705@trueblade.com>	
	<fb6fbf560708100727tdecc140heac8481eaef5e5db@mail.gmail.com>	
	<46BC83BF.3000407@trueblade.com>
	<46BD155B.2010202@canterbury.ac.nz>	
	<46BD2D59.1040209@trueblade.com> <46BFC402.6060804@trueblade.com>
	<aac2c7cb0708122105j6e4e12br183da6d03d8feebd@mail.gmail.com>
Message-ID: <46BFDE49.7090404@trueblade.com>

Adam Olsen wrote:
> My proposal was to flip this logic: __format__ should check for its
> own specifiers first, and only if it doesn't match will it return
> NotImplemented (triggering a call to __int__, or maybe __index__).
> 
> class MyInt:
>     def __format__(self, spec):
>         if is_custom_spec(spec):
>             return "MyInt instance with custom specifier " + spec
>         return NotImplemented
>     def __int__(self):
>         return <some local state>
> 
> This avoids the need for a public is_int_specifier.  unicode.format
> would still have the logic, but since it's called after you're not
> restricted from starting with d, x, etc.

That makes sense, since the object would have first crack at the spec, 
but needn't implement the conversions itself.  Let me see where that 
gets me.  Now I see what you were getting at with your earlier posts on 
the subject.  It wasn't clear to me that the "use a fallback" would 
include "convert based on the spec, if possible".

If accepted, this should go into the PEP, of course.

It's not clear to me if __int__ or __index__ is correct, here.  I think 
it's __int__, since float won't have __index__, and we want to be able 
to convert float to int (right?).
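
A small check of that reasoning: float provides __int__ (truncating) but
not __index__, so a fallback built on __index__ would refuse floats.

```python
import operator

value = 2.75

has_int = hasattr(value, "__int__")
has_index = hasattr(value, "__index__")
truncated = int(value)           # uses float.__int__, truncates toward zero

try:
    operator.index(value)        # requires __index__
    index_worked = True
except TypeError:
    index_worked = False

print(has_int, has_index, truncated, index_worked)
```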

Thanks!

Eric.


From pc at gafol.net  Mon Aug 13 11:51:35 2007
From: pc at gafol.net (Paul Colomiets)
Date: Mon, 13 Aug 2007 12:51:35 +0300
Subject: [Python-3000] Four new failing tests
In-Reply-To: <ca471dc20708111053l3e9daa92j751143df3c7cefc3@mail.gmail.com>
References: <ca471dc20708111053l3e9daa92j751143df3c7cefc3@mail.gmail.com>
Message-ID: <46C029A7.8080604@gafol.net>

Guido van Rossum wrote:
> I see four tests fail that passed yesterday:
> [...]
> < test_threaded_import
Patch attached.
Need any comments?
-------------- next part --------------
A non-text attachment was scrubbed...
Name: py3k_tempfile.diff
Type: text/x-patch
Size: 486 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070813/1da6249e/attachment.bin 

From p.f.moore at gmail.com  Mon Aug 13 14:50:12 2007
From: p.f.moore at gmail.com (Paul Moore)
Date: Mon, 13 Aug 2007 13:50:12 +0100
Subject: [Python-3000] [Python-Dev] Universal newlines support in Python
	3.0
In-Reply-To: <46BF9FDD.7090402@canterbury.ac.nz>
References: <ca471dc20708101123y3628e352n7116839dbfdc9bb3@mail.gmail.com>
	<E1DD53D0-168D-4DC1-BF76-E7A3DE64F359@PageDNA.com>
	<ca471dc20708111029n5722ff04h824eb8788a4c824e@mail.gmail.com>
	<79990c6b0708120958p588aabd1ic6dadf2f65de86d3@mail.gmail.com>
	<f9nj86$l8b$1@sea.gmane.org>
	<79990c6b0708121212m2490d6f0tb151c3c1d5aa1ea3@mail.gmail.com>
	<46BF9FDD.7090402@canterbury.ac.nz>
Message-ID: <79990c6b0708130550y22ddb47crb406f46376c31233@mail.gmail.com>

On 13/08/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Paul Moore wrote:
> > and you set the newline argument to specify a *specific*,
> > non-universal, newline value
>
> It still seems wrong to not translate the newlines, though,
> since it's still a *text* mode, and the standard Python
> representation of text has \n line endings.

Yes, I'd agree with that (it's not a case I particularly care about
myself, but I agree with your logic).

Paul.

From kbk at shore.net  Mon Aug 13 16:22:15 2007
From: kbk at shore.net (Kurt B. Kaiser)
Date: Mon, 13 Aug 2007 10:22:15 -0400
Subject: [Python-3000] IDLE encoding setup
In-Reply-To: <87sl6ojsfj.fsf@hydra.hampton.thirdcreek.com> (Kurt B. Kaiser's
	message of "Sun, 12 Aug 2007 18:31:44 -0400")
References: <87643llctp.fsf@hydra.hampton.thirdcreek.com>
	<46BEB1D4.5000307@v.loewis.de>
	<87sl6ojsfj.fsf@hydra.hampton.thirdcreek.com>
Message-ID: <874pj35xbc.fsf@hydra.hampton.thirdcreek.com>

"Kurt B. Kaiser" <kbk at shore.net> writes:

>> I now get an IDLE window which crashes as soon as I type something.
>
> Yes, something like
>
>   File "/home/kbk/PYDOTORG/projects/python/branches/py3k/Lib/lib-tk/Tkinter.py", line 1022, in mainloop
>     self.tk.mainloop(n)
> TypeError: expected string, bytes found
>
> I can duplicate this using just WidgetRedirector.main() (no IDLE), but I
> haven't figured out the problem as yet.  That's a very interesting module ::-P

The changes you checked in to _tkinter.c fixed WidgetRedirector, and
those and the other changes you made in idlelib have IDLE working
without the subprocess.

Thanks very much for working on this!

-- 
KBK

From martin at v.loewis.de  Mon Aug 13 17:48:19 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 13 Aug 2007 17:48:19 +0200
Subject: [Python-3000] IDLE encoding setup
In-Reply-To: <874pj35xbc.fsf@hydra.hampton.thirdcreek.com>
References: <87643llctp.fsf@hydra.hampton.thirdcreek.com>	<46BEB1D4.5000307@v.loewis.de>	<87sl6ojsfj.fsf@hydra.hampton.thirdcreek.com>
	<874pj35xbc.fsf@hydra.hampton.thirdcreek.com>
Message-ID: <46C07D43.7080601@v.loewis.de>

> The changes you checked in to _tkinter.c fixed WidgetRedirector, and
> those and the other changes you made in idlelib have IDLE working
> without the subprocess.
> 
> Thanks very much for working on this!

I doubt that all is working yet, though. So some thorough testing would
probably be necessary - plus getting the subprocess case to work, of
course.

Regards,
Martin

From skip at pobox.com  Mon Aug 13 18:55:26 2007
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 13 Aug 2007 11:55:26 -0500
Subject: [Python-3000] [Python-Dev] Universal newlines support in Python
 3.0
In-Reply-To: <79990c6b0708120958p588aabd1ic6dadf2f65de86d3@mail.gmail.com>
References: <ca471dc20708101123y3628e352n7116839dbfdc9bb3@mail.gmail.com>
	<E1DD53D0-168D-4DC1-BF76-E7A3DE64F359@PageDNA.com>
	<ca471dc20708111029n5722ff04h824eb8788a4c824e@mail.gmail.com>
	<79990c6b0708120958p588aabd1ic6dadf2f65de86d3@mail.gmail.com>
Message-ID: <18112.36094.979628.85609@montanaro.dyndns.org>


    Paul> ...  that files can have *either* bare \n, *or* the combination
    Paul> \r\n, to delimit lines.

As someone else pointed out, \r needs to be supported as well.  Many Mac
applications (Excel comes to mind) still emit text files with \r as the line
terminator.

Skip

From janssen at parc.com  Mon Aug 13 19:10:23 2007
From: janssen at parc.com (Bill Janssen)
Date: Mon, 13 Aug 2007 10:10:23 PDT
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46BFC402.6060804@trueblade.com> 
References: <46B13ADE.7080901@acm.org>
	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>
	<46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz>
	<46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com>
	<46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com>
	<46B5FBD9.4020301@acm.org> <46BBBEC6.5030705@trueblade.com>
	<fb6fbf560708100727tdecc140heac8481eaef5e5db@mail.gmail.com>
	<46BC83BF.3000407@trueblade.com>
	<46BD155B.2010202@canterbury.ac.nz>
	<46BD2D59.1040209@trueblade.com> <46BFC402.6060804@trueblade.com>
Message-ID: <07Aug13.101032pdt."57996"@synergy1.parc.xerox.com>

> Say I create a new int-like class, not derived from int, named MyInt. 
> And I want to use it like an int, maybe to print it as a hex number:

Then derive that class from "int", for heaven's sake!

Bill

From guido at python.org  Mon Aug 13 19:25:41 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 13 Aug 2007 10:25:41 -0700
Subject: [Python-3000] Python 3000 Sprint @ Google
Message-ID: <ca471dc20708131025k23a57570jc85c022065ebaff1@mail.gmail.com>

It's official! The second annual Python Sprint @ Google is happening
again: August 22-25 (Wed-Sat).  We're sprinting at two locations, this
time Google headquarters in Mountain View and the Google office in
Chicago (thanks to Brian Fitzpatrick). We'll connect the two sprints
with full-screen videoconferencing. The event is *free* and includes
Google's *free gourmet food*.

Anyone with reasonable Python experience is invited to attend. The
primary goal is to work on Python 3000, to polish off the first alpha
release; other ideas are welcome too. Experienced Python core
developers will be available for mentoring. (The goal is not to learn
Python; it is to learn *contributing* to Python.)

For more information and to sign up, please see the wiki page on python.org:
http://wiki.python.org/moin/GoogleSprint

Sign-up via the wiki page is strongly recommended to avoid lines
getting badges. Please read the whole wiki page to make sure you're
prepared.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From rrr at ronadam.com  Mon Aug 13 19:33:03 2007
From: rrr at ronadam.com (Ron Adam)
Date: Mon, 13 Aug 2007 12:33:03 -0500
Subject: [Python-3000] Format specifier proposal
In-Reply-To: <46BD79EC.1020301@acm.org>
References: <46BD79EC.1020301@acm.org>
Message-ID: <46C095CF.2060507@ronadam.com>


>     :f<+015.5 # Floating point, left aligned, always show sign,
>               # leading zeros, field width 15 (min), 5 decimal places.

Which has precedence... left alignment or zero padding?

Or should this be an error?
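
For comparison only, and not in the syntax proposed above: in the format
mini-language that str.format and format() ended up with, the type code
comes last, and an explicit alignment takes precedence over the zero flag,
which then only supplies the fill character.  A short check:

```python
value = 3.14159

# Explicit '<' alignment plus the zero flag: zeros are used as fill,
# but the number itself stays left-aligned.
left_aligned = format(value, "<+015.5f")

# Zero flag alone: sign-aware zero padding between sign and digits.
zero_padded = format(value, "+015.5f")

print(repr(left_aligned))
print(repr(zero_padded))
```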

Ron



From rowen at cesmail.net  Mon Aug 13 19:46:08 2007
From: rowen at cesmail.net (Russell E Owen)
Date: Mon, 13 Aug 2007 10:46:08 -0700
Subject: [Python-3000] Universal newlines support in Python 3.0
References: <ca471dc20708101123y3628e352n7116839dbfdc9bb3@mail.gmail.com>
	<87wsw3p5em.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <rowen-D017E8.10460713082007@sea.gmane.org>

In article <87wsw3p5em.fsf at uwakimon.sk.tsukuba.ac.jp>,
 "Stephen J. Turnbull" <stephen at xemacs.org> wrote:

> Guido van Rossum writes:
> 
>  > However, the old universal newlines feature also set an attribute named
>  > 'newlines' on the file object to a tuple of up to three elements
>  > giving the actual line endings that were observed on the file so far
>  > (\r, \n, or \r\n). This feature is not in PEP 3116, and it is not
>  > implemented. I'm tempted to kill it. Does anyone have a use case for
>  > this?
> 
> I have run into files that intentionally have more than one newline
> convention used (mbox and Babyl mail folders, with messages received
> from various platforms).  However, most of the time multiple newline
> conventions is a sign that the file is either corrupt or isn't text.
> If so, then saving the file may corrupt it.  The newlines attribute
> could be used to check for this condition.

There is at least one Mac source code editor (SubEthaEdit) that is all 
too happy to add one kind of newline to a file that started out with a 
different line ending character. As a result I have seen a fair number 
of text files with mixed line endings. I don't see as many these days, 
though; perhaps because the current version of SubEthaEdit handles 
things a bit better. So perhaps it won't matter much for Python 3000.
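
A minimal sketch of the check Stephen describes, using the newlines
attribute as it exists on io.TextIOWrapper in released Python 3.  The helper
name observed_newlines is made up for illustration:

```python
import io


def observed_newlines(data, encoding="ascii"):
    """Return the set of line endings seen while reading data as text."""
    stream = io.TextIOWrapper(io.BytesIO(data), encoding=encoding,
                              newline=None)
    stream.read()
    seen = stream.newlines       # None, a str, or a tuple of str
    if seen is None:
        return set()
    if isinstance(seen, str):
        return {seen}
    return set(seen)


mixed = observed_newlines(b"a\rb\r\nc\nd")
if len(mixed) > 1:
    print("mixed line endings:", sorted(mixed))
```

As noted later in the thread, this only says which conventions occurred,
not which line used which one.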

-- Russell


From guido at python.org  Mon Aug 13 19:51:18 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 13 Aug 2007 10:51:18 -0700
Subject: [Python-3000] [Email-SIG] fix email module for python 3000
	(bytes/str)
In-Reply-To: <200708130226.03670.victor.stinner@haypocalc.com>
References: <200708090241.08369.victor.stinner@haypocalc.com>
	<200708110149.10939.victor.stinner@haypocalc.com>
	<8B640CF2-EB88-45A5-A85F-1267AF24749E@python.org>
	<200708130226.03670.victor.stinner@haypocalc.com>
Message-ID: <ca471dc20708131051x176f7d87q2848e0c209e842d@mail.gmail.com>

Checked in. But next time please do use SF to submit patches (and feel
free to assign them to me and mail the list about it).

On 8/12/07, Victor Stinner <victor.stinner at haypocalc.com> wrote:
> On Sunday 12 August 2007 16:50:05 Barry Warsaw wrote:
> > In r56957 I committed changes to sndhdr.py and imghdr.py so that they
> > compare what they read out of the files against proper byte
> > literals.
>
> So nobody read my patches? :-( See my emails "[Python-3000] Fix imghdr module
> for bytes" and "[Python-3000] Fix sndhdr module for bytes" from last
> saturday. But well, my patches look similar.
>
> Barry's patch is incomplete: test_voc() is wrong.
>
> I attached a new patch:
>  - fix "h[sbseek] == b'\1'" and "ratecode = ord(h[sbseek+4])" in test_voc()
>  - avoid division by zero
>  - use startswith method: replace h[:2] == b'BM' by h.startswith(b'BM')
>  - use aifc.open() instead of old aifc.openfp()
>  - use ord(b'P') instead of ord('P')

This latter one is questionable. If you really want to compare to
bytes, perhaps write h[:1] == b'P' instead of h[0] == ord(b'P')?
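
A short illustration of the difference, using a made-up header value:
indexing a bytes object yields an int, while slicing yields bytes, so the
slice (or startswith) compares directly against a bytes literal and also
behaves on an empty header.

```python
header = b"P6 640 480"     # hypothetical file header, for illustration only

by_index = header[0]       # an int
by_slice = header[:1]      # a bytes object of length 1

print(by_index == ord("P"), by_slice == b"P", header.startswith(b"P"))

# Slicing an empty header is safe; indexing it would raise IndexError.
empty = b""
print(empty[:1] == b"P")
```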

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Mon Aug 13 20:57:28 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 13 Aug 2007 11:57:28 -0700
Subject: [Python-3000] Four new failing tests
In-Reply-To: <46BE1082.40300@v.loewis.de>
References: <ca471dc20708111053l3e9daa92j751143df3c7cefc3@mail.gmail.com>
	<46BE1082.40300@v.loewis.de>
Message-ID: <ca471dc20708131157y33e80745neb8f04ebbd6e106e@mail.gmail.com>

On 8/11/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > ======================================================================
> > ERROR: test_char_write (__main__.TestArrayWrites)
> > ----------------------------------------------------------------------
> > Traceback (most recent call last):
> >   File "Lib/test/test_csv.py", line 648, in test_char_write
> >     a = array.array('u', string.letters)
> > ValueError: string length not a multiple of item size

I fixed this by removing the code from _locale.c that changes string.letters.

> I think some decision should be made wrt. string.letters.
>
> Clearly, string.letters cannot reasonably contain *all* letters
> (i.e. all characters of categories Ll, Lu, Lt, Lo). Or can it?
>
> Traditionally, string.letters contained everything that is a letter
> in the current locale. Still, computing this string might be expensive
> assuming you have to go through all Unicode code points and determine
> whether they are letters in the current locale.
>
> So I see the following options:
> 1. remove it entirely. Keep string.ascii_letters instead
> 2. remove string.ascii_letters, and make string.letters to be
>    ASCII only.
> 3. Make string.letters contain all letters in the current locale.
> 4. Make string.letters truly contain everything that is classified
>    as a letter in the Unicode database.
>
> Which one should happen?

First I'd like to rule out 3 and 4. I don't like 3 because in our new
all-unicode world, using the locale for deciding what letters are
makes no sense -- one should use isalpha() etc. I think 4 is not at
all what people who use string.letters expect, and it's too large.

I think 2 is unnecessarily punishing people who use
string.ascii_letters -- they have already declared they don't care
about Unicode and we shouldn't break their code.

So that leaves 1.

There are (I think) two categories of users who use string.letters:

(a) People who have never encountered a non-English locale and for
whom there is no difference between string.ascii_letters and
string.letters. Their code may or may not work in other locales. We're
doing them a favor by flagging this in their code by removing
string.letters.

(b) People who want locale-specific behavior. Their code will probably
break anyway, since they are apparently processing text using 8-bit
characters encoded in a fixed-width encoding (e.g. the various Latin-N
encodings). They ought to convert their code to Unicode. Once they are
processing Unicode strings, they can just use isalpha() etc. If they
really want to know the set of letters that can be encoded in their
locale's encoding, they can use locale.getpreferredencoding() and
deduce it from there, e.g.:

import locale
enc = locale.getpreferredencoding()
letters = [c for c in bytes(range(256)).decode(enc) if c.isalpha()]

This won't work for multi-byte encodings of course -- but their code
never worked in that case anyway.
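
A slightly more defensive variant of the same idea, as a sketch: it decodes
one byte at a time and skips bytes that are undefined in the encoding, so it
also returns something sensible (just the single-byte letters) when the
encoding is multi-byte.  The function name letters_in_encoding is made up
here.

```python
def letters_in_encoding(encoding):
    """Letters representable as a single byte in the given encoding."""
    letters = []
    for byte in range(256):
        try:
            char = bytes([byte]).decode(encoding)
        except UnicodeDecodeError:
            continue             # byte is undefined or incomplete on its own
        if char.isalpha():
            letters.append(char)
    return letters


latin1_letters = letters_in_encoding("latin-1")
print(len(latin1_letters), "".join(latin1_letters[:10]))
```

To apply it to the current locale, pass locale.getpreferredencoding() as
the argument.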

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Mon Aug 13 21:07:21 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 13 Aug 2007 12:07:21 -0700
Subject: [Python-3000] Four new failing tests
In-Reply-To: <46C029A7.8080604@gafol.net>
References: <ca471dc20708111053l3e9daa92j751143df3c7cefc3@mail.gmail.com>
	<46C029A7.8080604@gafol.net>
Message-ID: <ca471dc20708131207h6a86b648w3bf7ef0defd9b34f@mail.gmail.com>

On 8/13/07, Paul Colomiets <pc at gafol.net> wrote:
> Guido van Rossum wrote:
> > I see four tests fail that passed yesterday:
> > [...]
> > < test_threaded_import
> Patch attached.
> Need any comments?

Thanks! The patch as-is didn't help, but after changing the write()
line to b'blat' it works.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Mon Aug 13 22:15:03 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 13 Aug 2007 13:15:03 -0700
Subject: [Python-3000] [Python-Dev] Universal newlines support in Python
	3.0
In-Reply-To: <rowen-D017E8.10460713082007@sea.gmane.org>
References: <ca471dc20708101123y3628e352n7116839dbfdc9bb3@mail.gmail.com>
	<87wsw3p5em.fsf@uwakimon.sk.tsukuba.ac.jp>
	<rowen-D017E8.10460713082007@sea.gmane.org>
Message-ID: <ca471dc20708131315i6dba68d6u9efae161d1b647ca@mail.gmail.com>

On 8/13/07, Russell E Owen <rowen at cesmail.net> wrote:
> In article <87wsw3p5em.fsf at uwakimon.sk.tsukuba.ac.jp>,
>  "Stephen J. Turnbull" <stephen at xemacs.org> wrote:
>
> > Guido van Rossum writes:
> >
> >  > However, the old universal newlines feature also set an attribute named
> >  > 'newlines' on the file object to a tuple of up to three elements
> >  > giving the actual line endings that were observed on the file so far
> >  > (\r, \n, or \r\n). This feature is not in PEP 3116, and it is not
> >  > implemented. I'm tempted to kill it. Does anyone have a use case for
> >  > this?
> >
> > I have run into files that intentionally have more than one newline
> > convention used (mbox and Babyl mail folders, with messages received
> > from various platforms).  However, most of the time multiple newline
> > conventions is a sign that the file is either corrupt or isn't text.
> > If so, then saving the file may corrupt it.  The newlines attribute
> > could be used to check for this condition.
>
> There is at least one Mac source code editor (SubEthaEdit) that is all
> too happy to add one kind of newline to a file that started out with a
> different line ending character. As a result I have seen a fair number
> of text files with mixed line endings. I don't see as many these days,
> though; perhaps because the current version of SubEthaEdit handles
> things a bit better. So perhaps it won't matter much for Python 3000.

I've seen similar behavior in MS VC++ (long ago, dunno what it does
these days). It would read files with \r\n and \n line endings, and
whenever you edited a line, that line also got a \r\n ending. But
unchanged lines that started out with \n-only endings would keep the
\n only. And there was no way for the end user to see or control this.

To emulate this behavior in Python you'd have to read the file in
binary mode *or* we'd have to have an additional flag specifying to
return line endings as encountered in the file. The newlines attribute
(as defined in 2.x) doesn't help, because it doesn't tell which lines
used which line ending. I think the newline feature in PEP 3116 falls
short too; it seems mostly there to override the line ending *written*
(from the default os.linesep).

I think we may need different flags for input and for output.

For input, we'd need two things: (a) which are acceptable line
endings; (b) whether to translate acceptable line endings to \n or
not. For output, we need two things again: (c) whether to translate
line endings at all; (d) which line endings to translate. I guess we
could map (c) to (b) and (d) to (a) for a signature that's the same
for input and output (and makes sense for read+write files as well).
The default would be (a)=={'\n', '\r\n', '\r'} and (b)==True.
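
For illustration, here is how the single newline argument of the io module
in released Python 3 behaves for these cases.  This shows the shipped API
rather than the (a)/(b) flag design sketched above, and the helper names are
made up:

```python
import io

RAW = b"one\r\ntwo\nthree\rfour"


def read_lines(data, newline):
    """Read data as text lines with the given newline mode."""
    stream = io.TextIOWrapper(io.BytesIO(data), encoding="ascii",
                              newline=newline)
    return stream.readlines()


def written_bytes(text, newline):
    """Write text with the given newline mode and return the raw bytes."""
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding="ascii", newline=newline)
    stream.write(text)
    stream.flush()
    data = buffer.getvalue()
    stream.detach()              # keep the wrapper from closing the buffer
    return data


translated = read_lines(RAW, None)   # accept \n, \r, \r\n; translate to \n
preserved = read_lines(RAW, "")      # accept all three; keep them as found
only_lf = read_lines(RAW, "\n")      # only \n ends a line; no translation

crlf_out = written_bytes("a\nb\n", "\r\n")   # \n written as \r\n
plain_out = written_bytes("a\nb\n", "")      # no translation on output

print(translated, preserved, only_lf, crlf_out, plain_out, sep="\n")
```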

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at v.loewis.de  Mon Aug 13 22:22:56 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 13 Aug 2007 22:22:56 +0200
Subject: [Python-3000] Four new failing tests
In-Reply-To: <ca471dc20708131157y33e80745neb8f04ebbd6e106e@mail.gmail.com>
References: <ca471dc20708111053l3e9daa92j751143df3c7cefc3@mail.gmail.com>	
	<46BE1082.40300@v.loewis.de>
	<ca471dc20708131157y33e80745neb8f04ebbd6e106e@mail.gmail.com>
Message-ID: <46C0BDA0.3090207@v.loewis.de>

> So that leaves 1.

Ok. So several people have spoken in favor of removing string.letters;
I'll work on removing it.

Regards,
Martin

From brett at python.org  Tue Aug 14 00:13:10 2007
From: brett at python.org (Brett Cannon)
Date: Mon, 13 Aug 2007 15:13:10 -0700
Subject: [Python-3000] Python 3000 Sprint @ Google
In-Reply-To: <ca471dc20708131025k23a57570jc85c022065ebaff1@mail.gmail.com>
References: <ca471dc20708131025k23a57570jc85c022065ebaff1@mail.gmail.com>
Message-ID: <bbaeab100708131513q22017848y7bc91001886c6e34@mail.gmail.com>

On 8/13/07, Guido van Rossum <guido at python.org> wrote:
> It's official! The second annual Python Sprint @ Google is happening
> again: August 22-25 (Wed-Sat).

I can't attend this year (damn doctor's appt.), but I will try to be
on Google Talk (username of bcannon) in case I can help out somehow
remotely.

-Brett

From chris.monsanto at gmail.com  Tue Aug 14 01:52:09 2007
From: chris.monsanto at gmail.com (Chris Monsanto)
Date: Mon, 13 Aug 2007 19:52:09 -0400
Subject: [Python-3000] 100% backwards compatible parenless function call
	statements
Message-ID: <799316b70708131652y32d77ee0kee84d1d3fd0ad065@mail.gmail.com>

Since Python makes such a distinction between statements and expressions, I
am proposing that function calls as statements should be allowed to omit
parentheses. What I am proposing is 100% compatible with Python 2.x's
behavior of function calls; so those uncomfortable with this (basic) idea
can continue to use parens in their function calls. Expressions still
require parens because of ambiguity and clarity issues.

--Some examples:--

print "Parenless function call!", file=my_file

print(".. but this is still allowed")

# We still need parens for calls to functions where the sole argument is a
# tuple
# But you'd have to do this anyway in Python 2.x... nothing lost.
print((1, 2))

# Need parens if the function call isn't the only thing in the statement
cos(3) + 4

# Need parens if function call isn't a statement, otherwise how would we get
# the function itself?
x = cos(3)

# Make my_func2 another name for my_func...
my_func2 = my_func
my_func2 # call other function
my_func2() # call it again

# Method call?
f = open("myfile")
f.close

# Chained method
obj.something().somethinganother().yeah

--Notes:--

A lot of other things in Python 2.x/Python 3k at the moment have this same
behavior...

# No parens required
x, y = b, a

# But sometimes they are
func((1, 2))

# Generator expressions sometimes don't need parens
func(i for i in list)

# But sometimes they do
func(a, (i for i in list))

--Pros:--

1) Removes unnecessary verbosity for the majority of situations.
2) Python 2.x code works the same unmodified.
3) No weird stuff with non-first class objects, ala Ruby meth.call().
Functions still remain assignable to other values without other trickery.
4) Because it's completely backwards compatible, you could even have
something like from __future__ import parenless in Python 2.6 for a
transition.

--Cons:--

1) Can't type "func" bare in interpreter to get its repr. I think this is a
non-issue; I personally never do this, and with parenless calls you can just
type "repr func" anyway. Specifically I think this shouldn't be considered
because in scripts doing something like "f.close" does absolutely nothing
and giving it some functionality would be nice. It also solves one of the
Python gotchas found here:
http://www.ferg.org/projects/python_gotchas.html (specifically #5)
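
For reference, the gotcha as it behaves in current Python, shown with an
in-memory file so the example is self-contained: a bare "f.close" only
looks up the method and discards it.

```python
import io

f = io.StringIO("some text")

f.close                      # evaluates the bound method, calls nothing
still_open = not f.closed

f.close()                    # the call actually closes the file
now_closed = f.closed

print(still_open, now_closed)
```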

I'm willing to write up a proper PEP if anyone is interested in the idea. I
figured I'd poll around first.

From rrr at ronadam.com  Tue Aug 14 01:52:31 2007
From: rrr at ronadam.com (Ron Adam)
Date: Mon, 13 Aug 2007 18:52:31 -0500
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46BF3CC7.6010405@acm.org>
References: <46B13ADE.7080901@acm.org>	<46B2D147.90606@acm.org>	<46B2E265.5080905@ronadam.com>	<46B35295.1030007@acm.org>	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>	<46B39BBF.80809@ronadam.com>	<46B46DE1.1090403@canterbury.ac.nz>	<46B4950F.40905@ronadam.com>	<46B4A9A0.9070206@ronadam.com>	<46B52422.2090006@canterbury.ac.nz>	<ca471dc20708041909j3f86b107hace9407673c302ea@mail.gmail.com>	<46B568F3.9060105@ronadam.com>	<46B66BE0.7090005@canterbury.ac.nz>	<46B6851C.1030204@ronadam.com>	<46B6C219.4040900@canterbury.ac.nz>	<46B6DABB.3080509@ronadam.com>	<46B7CD8C.5070807@acm.org>	<46B8D58E.5040501@ronadam.com>	<46BAC2D9.2020902@acm.org>	<46BAEFB0.9050400@ronadam.com>	<46BBB1AE.5010207@canterbury.ac.nz>	<46BBEB16.2040205@ronadam.com>	<46BC1479.30405@canterbury.ac.nz>	<46BCA9C9.1010306@ronadam.com>	<46BD5BD8.7030706@acm.org>
	<46BF22D8.2090309@trueblade.com> <46BF3CC7.6010405@acm.org>
Message-ID: <46C0EEBF.3010206@ronadam.com>



Talin wrote:

> I'm not sure that I'm happy with my own syntax proposal just yet. I want 
> to sit down with Guido and talk it over before I commit to anything.
> 
> I think part of the difficulty that I am having is as follows:

> However, the old syntax doesn't fit very well with the new requirements: 
> the desire to have the 'repr' option take precedence over the 
> type-specific formatter, and the desire to split the format specifier 
> into two parts, one which is handled by the type-specific formatter, and 
> one which is handled by the general formatter.

> But with these format strings, it seems (to me, anyway) that the design 
> choices are a lot more arbitrary and driven by aesthetics. Almost any 
> combination of specifiers will work, the question is how to arrange them 
> in a way that is easy to memorize.

They seem arbitrary because intuition says to put things together that 
"look" like they belong together, but they may really be two different 
things. Or because we try to use one thing for two different purposes. 
This is also what makes it hard to understand.

I reconsidered the split term forms a bit more and I think I've come up
with a better way to think about them. Sometimes a slight conceptual shift
can make a difference.

The basic form is:

       {name[:][type][alignment_term][,content_modifying_term]}


TYPE:
     The specifier type.  One of 'deifsrx'.
     (and any others I left off)

     No type defaults to 's'
     (this is the safest default, but it could
      be more flexible if need be.)


ALIGNMENT TERM:   [direction]width[/fill]

     direction:    is one of   '<^>'
     width:        a positive integer
     /fill:         a character

     * This pattern is always the same and makes up an alignment term.
       By keeping this part consistent, it makes it easy to remember
       and understand.
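
As a rough illustration of the alignment term, here is a minimal Python
sketch. The name apply_alignment is made up for this example, and
defaulting to left alignment when no direction is given is an assumption
(it matches the 's10' and 's<10' results in the doctest examples). It is
not the code behind fstr().

```python
import re

# [direction]width[/fill], e.g. '>15/.' or '^15' or '10'
_ALIGN_TERM = re.compile(r"([<^>])?(\d+)(?:/(.))?")


def apply_alignment(text, term):
    """Pad text according to an alignment term (hypothetical helper)."""
    match = _ALIGN_TERM.fullmatch(term)
    if match is None:
        raise ValueError("not an alignment term: %r" % term)
    direction, width, fill = match.groups()
    direction = direction or "<"   # assumption: no direction means left
    fill = fill or " "             # assumption: default fill is a space
    width = int(width)
    if direction == "<":
        return text.ljust(width, fill)
    if direction == ">":
        return text.rjust(width, fill)
    return text.center(width, fill)


print(repr(apply_alignment("123.456", ">15/.")))  # '........123.456'
print(repr(apply_alignment("123", "^15")))        # '      123      '
```

Because the term contains only a direction, a width and a fill, it can be
recognized with one small pattern, which is the point of keeping it
separate from the sign and digit symbols.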


CONTENT MODIFYING TERMS:

* Strings and numbers are handled differently. You would never use both of 
these at the same time.

     STRINGS:  [string_width]

         string_width:    Positive or negative integer to clip a
                          long string.

         * Works like a slice: s[:n] or s[-n:]


     NUMBERS:   [sign][0][digits][.decimals][%]

	sign:       '-', '+', '(', ' ', ''
                     * The trailing ')' is optional.

         0:	    use leading zeros

         digits:     number of digits, or number before decimal.

         .decimal:   number of decimal places

         %:          multiplies by 100 and adds ' %' to end.



The differences are that alignment terms never have '+-' in them, or any of 
the number formatting symbols.  They are consistent.

The digits value is the number of digits before the decimal.  This doesn't 
include the other symbols used in the field so it isn't the same as a field 
width.

    (I believe this is one of the points of confusion.  Or it is for me.)

It bothered me that to figure out the number of digits before the decimal I 
had to subtract all the other parts.  You can think of this as a shorter form 
of the # syntax.

       ######.###  ->  6.3
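
To make the "digits before the decimal" rule concrete, here is a small
Python sketch. The function name format_digits and its parameters are
invented for illustration, and sign handling is left out on purpose
because the rule above does not pin it down. The two printed results are
the same as the f10 and f010.5 doctest results in this message.

```python
def format_digits(value, digits, decimals=None, zero=False):
    """Pad only the integer part to `digits` characters (hypothetical)."""
    if value < 0:
        raise ValueError("sign handling is left out of this sketch")
    if decimals is None:
        text = repr(float(value))          # natural number of decimals
    else:
        text = f"{value:.{decimals}f}"     # fixed number of decimals
    whole, dot, frac = text.partition(".")
    pad = "0" if zero else " "
    # Only the digits before the decimal point are padded, so the
    # total width is digits + 1 + len(frac), not `digits`.
    return whole.rjust(digits, pad) + dot + frac


print(repr(format_digits(123.456, 10)))                # '       123.456'
print(repr(format_digits(123.456, 10, 5, zero=True)))  # '0000000123.45600'
```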



Surprisingly this does not have a big impact on the latest proposed syntax 
or the number of characters used unless someone wants to specify both a 
numeric formatting term and a larger alignment term.

So here's how it compares, with an actual doctest that passes.

Note: fstr() and ffloat() are versions of str and float with the needed 
methods to work.

Examples from python3000 list:
     (With only a few changes where it makes sense or to make it work.)

     >>> floatvalue = ffloat(123.456)

     :f        # Floating point number of natural width
     >>> fstr('{0:f}').format(floatvalue)
     '123.456'

     :f10      # Floating point number, width at least 10
               ## behavior changed
     >>> fstr('{0:f10}').format(floatvalue)
     '       123.456'

     >>> fstr('{0:f>10}').format(floatvalue)
     '   123.456'

     :f010     # Floating point number, width at least 10, leading zeros
     >>> fstr('{0:f010}').format(floatvalue)
     '0000000123.456'

     :f.2      # Floating point number with two decimal digits
     >>> fstr('{0:f.2}').format(floatvalue)
     '123.46'

     :8        # Minimum width 8, type defaults to natural type
               ## default is string type, no type is guessed.
     >>> fstr('{0:8}').format(floatvalue)
     '123.456 '

     :d+2      # Integer number, 2 digits, sign always shown
               ## (minor change to show padded digits.)
     >>> fstr('{0:d+5}').format(floatvalue)
     '+  123'

     :!r       # repr() format
               ## ALTERED, not special case syntax.
               ## the 'r' is special even so.
     >>> fstr('{0:r}').format(floatvalue)
     '123.456'

     :10!r     # Field width 10, repr() format
               ## ALTERED, see above.
     >>> fstr('{0:r10}').format(floatvalue)
     '123.456   '


     :s10      # String right-aligned within field of minimum width
               # of 10 chars.
     >>> fstr('{0:s10}').format(floatvalue)
     '123.456   '


     :s10.10   # String right-aligned within field of minimum width
               # of 10 chars, maximum width 10.
               ## ALTERED, comma instead of period.
     >>> fstr('{0:s10,10}').format(floatvalue)
     '123.456   '

     :s<10     # String left-aligned in 10 char (min) field.
     >>> fstr('{0:s<10}').format(floatvalue)
     '123.456   '

     :d^15     # Integer centered in 15 character field
     >>> fstr('{0:d^15}').format(floatvalue)
     '      123      '

     :>15/.    # Right align and pad with '.' chars
     >>> fstr('{0:>15/.}').format(floatvalue)
     '........123.456'

     :f<+015.5 # Floating point, left aligned, always show sign,
               # leading zeros, field width 15 (min), 5 decimal places.
               ## ALTERED: '<' not needed, size of digits reduced.
     >>> fstr('{0:f010.5}').format(floatvalue)
     '0000000123.45600'


So very little actually changed in most of these, syntax-wise.  Some 
behavioral changes to number formatting. But I think these are pluses.

  - I haven't special-cased the '!' syntax.

  - The behavior of digits in the numeric format term is changed.


So if the terms have the following patterns they can easily be identified.

     <direction> <number> </char>    alignment term

     <sign> <number>      With strings type only ... A clipping term

     <sign> <0> <digits> <.> <decimals> <%>   A numeric format term



An example of using these together...

     s>25,-25            right align short strings,
                         clip long strings from the end.
                         (clips the beginning off  s = s[-25:])


     f^30/_,(010.3%)     Centers a zero padded number with 3 decimal,
                         and with parentheses around negative numbers,
                         and spaces around positive numbers,
                         in a field 30 characters wide,
                         with underscore padding.

Yes, this example is a bit long, but it does a lot!
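
The first of the two examples, s>25,-25, can be read as "clip, then
align". A minimal Python sketch of that reading follows; the function
name clip_and_right_align is hypothetical, and the order of the two steps
is an assumption based on the description above.

```python
def clip_and_right_align(s, width=25):
    """Sketch of 's>25,-25': keep the last `width` chars, pad on the left."""
    if width <= 0:
        # s[-0:] would return the whole string, so reject it here.
        raise ValueError("width must be positive")
    clipped = s[-width:]           # the ',-25' clipping term: s[-25:]
    return clipped.rjust(width)    # the '>25' alignment term


print(repr(clip_and_right_align("abc", 5)))       # '  abc'
print(repr(clip_and_right_align("abcdefgh", 5)))  # 'defgh'
```

Short strings are only padded and long strings are only clipped, so the
result is always exactly `width` characters.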



> Guido's been suggesting that I model the format specifiers after the 
> .Net numeric formatting strings, but this system is significantly less 
> capable than %-style format specifiers. Yes, you can do fancy things 
> like "(###)###-####", but there's no provision for centering or for a 
> custom fill character.

Usually when you use this type of formatting it's very specific and doesn't 
need any sort of aligning.  Maybe we can try to add this in later after the 
rest is figured out and working?  It would fit naturally as an alternative 
content modifying term for strings.

       s^30,(###)###-####   Center phone numbers in a 30 character column.

The number signs are enough to identify the type here.


> This would be easier if I was sitting in a room with other Python 
> programmers so that I could show them various suggestions and see what 
> their emotional reactions are. I'm having a hard time doing this in 
> isolation. That's kind of why I want to meet with Guido on this, as he's 
> good at cutting through this kind of crap.

I agree, it would be easier.

Cheers,
    Ron



From greg.ewing at canterbury.ac.nz  Tue Aug 14 02:58:27 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 14 Aug 2007 12:58:27 +1200
Subject: [Python-3000] Format specifier proposal
In-Reply-To: <46BFA443.2000005@trueblade.com>
References: <46BD79EC.1020301@acm.org> <46BFA443.2000005@trueblade.com>
Message-ID: <46C0FE33.1010708@canterbury.ac.nz>

Eric Smith wrote:
> But the "!" (or 
> something similar) is needed, otherwise no format string could ever 
> begin with an "r".

My current preference is for 'r' to always mean the
same thing for all types. That means if you're designing
a new format string, you just have to choose something
other than 'r'. I don't see that as a big problem.

--
Greg

From greg.ewing at canterbury.ac.nz  Tue Aug 14 03:10:42 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 14 Aug 2007 13:10:42 +1200
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46BFC402.6060804@trueblade.com>
References: <46B13ADE.7080901@acm.org>
	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>
	<46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz>
	<46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com>
	<46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com>
	<46B5FBD9.4020301@acm.org> <46BBBEC6.5030705@trueblade.com>
	<fb6fbf560708100727tdecc140heac8481eaef5e5db@mail.gmail.com>
	<46BC83BF.3000407@trueblade.com> <46BD155B.2010202@canterbury.ac.nz>
	<46BD2D59.1040209@trueblade.com> <46BFC402.6060804@trueblade.com>
Message-ID: <46C10112.4040506@canterbury.ac.nz>

Eric Smith wrote:
> In order for me to write the __format__ function in MyInt, I have to 
> know if the specifier is in fact an int specifier.
 >
 > class MyInt:
 >      def __format__(self, spec):
 >          if int.is_int_specifier(spec):
 >              return int(self).__format__(spec)
 >          return "MyInt instance with custom specifier " + spec

I would do this the other way around, i.e. first look
to see whether the spec is one that MyInt wants to handle
specially, and if not, *assume* that it's an int specifier.
E.g. if MyInt defines a new "m" format:

   def __format__(self, spec):
     if spec.startswith("m"):
       return self.do_my_formatting(spec)
     else:
       return int(self).__format__(spec)
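
For readers who want to try this, a self-contained version that runs on
a released Python 3 is sketched below. Making MyInt a subclass of int and
the body of do_my_formatting (it wraps the value in angle brackets) are
assumptions added for the example; only the dispatch in __format__ comes
from the message above.

```python
class MyInt(int):
    def __format__(self, spec):
        # Handle the specs this class owns first...
        if spec.startswith("m"):
            return self.do_my_formatting(spec)
        # ...and assume everything else is an ordinary int specifier.
        return int(self).__format__(spec)

    def do_my_formatting(self, spec):
        # Hypothetical custom format: wrap the value in angle brackets.
        return "<%d>" % int(self)


print(format(MyInt(7), "m"))     # <7>
print(format(MyInt(7), "03d"))   # 007
```

With this ordering no is_int_specifier() helper is needed: a bad spec
simply fails inside int's own __format__.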

--
Greg

From greg.ewing at canterbury.ac.nz  Tue Aug 14 03:36:48 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 14 Aug 2007 13:36:48 +1200
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46BFDE49.7090404@trueblade.com>
References: <46B13ADE.7080901@acm.org> <46B54F51.40705@acm.org>
	<46B59F05.3070200@ronadam.com> <46B5FBD9.4020301@acm.org>
	<46BBBEC6.5030705@trueblade.com>
	<fb6fbf560708100727tdecc140heac8481eaef5e5db@mail.gmail.com>
	<46BC83BF.3000407@trueblade.com> <46BD155B.2010202@canterbury.ac.nz>
	<46BD2D59.1040209@trueblade.com> <46BFC402.6060804@trueblade.com>
	<aac2c7cb0708122105j6e4e12br183da6d03d8feebd@mail.gmail.com>
	<46BFDE49.7090404@trueblade.com>
Message-ID: <46C10730.5010702@canterbury.ac.nz>

Eric Smith wrote:
> It's not clear to me if __int__ or __index__ is correct, here.  I think 
> it's __int__, since float won't have __index__, and we want to be able 
> to convert float to int (right?).

This issue doesn't arise if the object itself does the
fallback conversion, as in the example I posted, rather
than leave it to generic code in format().

--
Greg

From bwinton at latte.ca  Tue Aug 14 03:48:41 2007
From: bwinton at latte.ca (Blake Winton)
Date: Mon, 13 Aug 2007 21:48:41 -0400
Subject: [Python-3000] 100% backwards compatible parenless function call
 statements
In-Reply-To: <799316b70708131652y32d77ee0kee84d1d3fd0ad065@mail.gmail.com>
References: <799316b70708131652y32d77ee0kee84d1d3fd0ad065@mail.gmail.com>
Message-ID: <46C109F9.1040503@latte.ca>

Chris Monsanto wrote:
> so those uncomfortable with 
> this (basic) idea can continue to use parens in their function calls. 

But we would have to read people's code who didn't use them.

> my_func2 # call other function
> my_func2() # call it again

So, those two are the same, but these two are different?
print my_func2
print my_func2()

What about these two?
x.y().z
x.y().z()

Would this apply to anything which implements callable?

> # Method call?
> f = open("myfile")
> f.close

What happens in
for x in dir(f):
     x
?  If some things are functions, do they get called and the other things 
don't?

> --Pros:--
> 1) Removes unnecessary verbosity for the majority of situations.

"unnecessary verbosity" is kind of stretching it.  Two whole characters 
in some situations is hardly a huge burden.

> I'm willing to write up a proper PEP if anyone is interested in the 
> idea. I figured I'd poll around first.

I vote "AAAAAAaaaahhhh!  Dear god, no!".  ;)

Seriously, knowing at a glance the difference between function 
references and function invocations is one of the reasons I like Python 
(and dislike Ruby).  Your proposal would severely compromise that 
functionality.

Later,
Blake.

From bwinton at latte.ca  Tue Aug 14 03:51:45 2007
From: bwinton at latte.ca (Blake Winton)
Date: Mon, 13 Aug 2007 21:51:45 -0400
Subject: [Python-3000] [Python-Dev] Universal newlines support in Python
 3.0
In-Reply-To: <ca471dc20708131315i6dba68d6u9efae161d1b647ca@mail.gmail.com>
References: <ca471dc20708101123y3628e352n7116839dbfdc9bb3@mail.gmail.com>	<87wsw3p5em.fsf@uwakimon.sk.tsukuba.ac.jp>	<rowen-D017E8.10460713082007@sea.gmane.org>
	<ca471dc20708131315i6dba68d6u9efae161d1b647ca@mail.gmail.com>
Message-ID: <46C10AB1.4020708@latte.ca>

Guido van Rossum wrote:
> On 8/13/07, Russell E Owen <rowen at cesmail.net> wrote:
>> In article <87wsw3p5em.fsf at uwakimon.sk.tsukuba.ac.jp>,
>>  "Stephen J. Turnbull" <stephen at xemacs.org> wrote:
>>> I have run into files that intentionally have more than one newline
>>> convention used (mbox and Babyl mail folders, with messages received
>>> from various platforms).  However, most of the time multiple newline
>>> conventions is a sign that the file is either corrupt or isn't text.
>> There is at least one Mac source code editor (SubEthaEdit) that is all
>> too happy to add one kind of newline to a file that started out with a
>> different line ending character.
> I've seen similar behavior in MS VC++ (long ago, dunno what it does
> these days). It would read files with \r\n and \n line endings, and
> whenever you edited a line, that line also got a \r\n ending. But
> unchanged lines that started out with \n-only endings would keep the
> \n only. And there was no way for the end user to see or control this.

I've seen it in Scite (an editor based around Scintilla) just yesterday. 
  It was rather annoying, since it messed up my diffs something awful, 
and was invisible to the naked eye.  (But it lets you "Show Line 
Endings", which quickly made the problem apparent.)

Later,
Blake.

From victor.stinner at haypocalc.com  Tue Aug 14 03:52:45 2007
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Tue, 14 Aug 2007 03:52:45 +0200
Subject: [Python-3000] [Email-SIG] fix email module for python 3000
	(bytes/str)
In-Reply-To: <ca471dc20708131051x176f7d87q2848e0c209e842d@mail.gmail.com>
References: <200708090241.08369.victor.stinner@haypocalc.com>
	<200708130226.03670.victor.stinner@haypocalc.com>
	<ca471dc20708131051x176f7d87q2848e0c209e842d@mail.gmail.com>
Message-ID: <200708140352.45989.victor.stinner@haypocalc.com>

Hi,

On Monday 13 August 2007 19:51:18 Guido van Rossum wrote:
> Checked in. But next time please do use SF to submit patches (and feel
> free to assign them to me and mail the list about it).

Ah yes, you already asked to use SF. I will use it next time.

> On 8/12/07, Victor Stinner <victor.stinner at haypocalc.com> wrote:
> > On Sunday 12 August 2007 16:50:05 Barry Warsaw wrote:
> > > In r56957 I committed changes to sndhdr.py and imghdr.py so that they
> > > compare what they read out of the files against proper byte
> > > literals.
> >
> > So nobody read my patches?
> > (...) 
> > I attached a new patch 
> > (...)
> >  - use ord(b'P') instead of ord('P')
>
> This latter one is questionable. If you really want to compare to
> bytes, perhaps write h[:1] == b'P' instead of b[0] == ord(b'P')?

Someone proposed c'P' syntax for ord(b'P') which is like an alias for 80.

I prefer letters to numbers when letters make sense.

I also think (I may be wrong) that b'xyz'[0] == 80 is faster than b'xyz'[:1] 
== b'x' since b'xyz'[:1] creates a new object. If we keep the speed argument, 
b'xyz'[0] == ord(b'P') may be slower than b'xyz'[:1] == b'x' since ord(b'P') 
is recomputed each time (is that right?).

But well, speed argument is stupid since it's a micro-optimization :-)
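
Setting speed aside, the two spellings can be compared directly on a
released Python 3, where indexing a bytes object gives an int and slicing
gives bytes. The helper names below are made up for the example.

```python
def starts_with_p_index(data):
    # Indexing yields an int, so compare against ord(b"P") (which is 80).
    # The length check is needed because b""[0] raises IndexError.
    return len(data) > 0 and data[0] == ord(b"P")


def starts_with_p_slice(data):
    # Slicing yields bytes and is safe on empty input: b""[:1] == b"".
    return data[:1] == b"P"


header = b"P6 640 480"
print(header[0])     # 80
print(header[:1])    # b'P'
print(starts_with_p_index(header), starts_with_p_slice(header))
```

One practical difference is the empty-input case: the slice form needs no
extra guard, while the index form does.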

Victor Stinner aka haypo
http://hachoir.org/

From greg.ewing at canterbury.ac.nz  Tue Aug 14 04:02:21 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 14 Aug 2007 14:02:21 +1200
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46C0EEBF.3010206@ronadam.com>
References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org>
	<46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org>
	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>
	<46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz>
	<46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com>
	<46B52422.2090006@canterbury.ac.nz>
	<ca471dc20708041909j3f86b107hace9407673c302ea@mail.gmail.com>
	<46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz>
	<46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz>
	<46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org>
	<46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org>
	<46BAEFB0.9050400@ronadam.com> <46BBB1AE.5010207@canterbury.ac.nz>
	<46BBEB16.2040205@ronadam.com> <46BC1479.30405@canterbury.ac.nz>
	<46BCA9C9.1010306@ronadam.com> <46BD5BD8.7030706@acm.org>
	<46BF22D8.2090309@trueblade.com> <46BF3CC7.6010405@acm.org>
	<46C0EEBF.3010206@ronadam.com>
Message-ID: <46C10D2D.60705@canterbury.ac.nz>

Ron Adam wrote:

> The digits value are number of digits before the decimal.  This doesn't 
> include the other symbols used in the field so it isn't the same as a field 
> width.

How does this work with formats where the number of
digits before the decimal can vary, but before+after
is constant?

Also, my feeling about the whole of this is that
it's too complicated. It seems like you can have
at least three numbers in a format, and at first
glance it's quite confusing as to what they all
mean.

--
Greg

From skip at pobox.com  Tue Aug 14 04:15:39 2007
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 13 Aug 2007 21:15:39 -0500
Subject: [Python-3000] os.extsep & RISCOS support removal
Message-ID: <18113.4171.231630.187319@montanaro.dyndns.org>

I'm working my way through RISCOS code removal.  I came across this note in
Misc/HISTORY: 

- os.extsep -- a new variable needed by the RISCOS support.  It is the
  separator used by extensions, and is '.' on all platforms except
  RISCOS, where it is '/'.  There is no need to use this variable
  unless you have a masochistic desire to port your code to RISCOS.

If RISCOS is going away should os.extsep as well?

Skip


From victor.stinner at haypocalc.com  Tue Aug 14 04:22:36 2007
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Tue, 14 Aug 2007 04:22:36 +0200
Subject: [Python-3000] Questions about email bytes/str (python 3000)
Message-ID: <200708140422.36818.victor.stinner@haypocalc.com>

Hi,

After many tests, I'm unable to convert the email module to Python 3000. I'm 
also unable to decide on the best type for some contents.



(1) Should email parts be stored as byte or character strings?

Related methods: Generator class, Message.get_payload(), Message.as_string().

Let's take an example: multipart (MIME) email with latin-1 and base64 (ascii) 
sections. Mix latin-1 and ascii => mix bytes. So the best type should be 
bytes.

=> bytes



(2) Parsing file (raw string): use bytes or str in parsing?

The parser uses methods related to str like splitlines(), lower(), strip(). But 
it should be easy to rewrite/avoid these methods. I think that low-level 
parsing should be done on bytes. At the end, or when we know the charset, we 
can convert to str.

=> bytes
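
A toy sketch of "parse on bytes, convert to str once the charset is
known" is given below. It is not the email package's parser: the
function name decode_body, the ASCII default and the very simplified
header handling are all assumptions made for the example. It relies only
on bytes methods (partition, splitlines, strip, lower, split) that exist
in released Python 3.

```python
def decode_body(raw):
    """Split headers from body on bytes, then decode the body (toy code)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    headers = {}
    for line in head.splitlines():
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip()

    charset = "ascii"   # assumed default when no charset is declared
    content_type = headers.get(b"content-type", b"")
    for param in content_type.split(b";")[1:]:
        key, _, val = param.partition(b"=")
        if key.strip().lower() == b"charset":
            charset = val.strip().decode("ascii")

    # Only here, with the charset known, do we leave bytes for str.
    return body.decode(charset)


raw = b"Content-Type: text/plain; charset=latin-1\r\n\r\ncaf\xe9\r\n"
print(repr(decode_body(raw)))   # 'café\r\n'
```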



About base64, I agree with Bill Janssen:
 - base64MIME.decode converts string to bytes
 - base64MIME.encode converts bytes to string

But decode may accept bytes as input (as the base64 module does): use 
str(value, 'ascii', 'ignore') or str(value, 'ascii', 'strict').
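
The direction of the two conversions can be sketched with the standard
base64 module. The function names below are hypothetical and are not the
base64MIME API; the sketch only shows encode going from bytes to str and
decode going from str (or ASCII bytes) to bytes.

```python
import base64


def b64_encode_to_text(raw):
    """bytes -> str: base64 output is plain ASCII text."""
    return base64.b64encode(raw).decode("ascii")


def b64_decode_to_bytes(text):
    """str -> bytes; bytes input is accepted and treated as ASCII."""
    if isinstance(text, (bytes, bytearray)):
        text = str(text, "ascii", "strict")
    return base64.b64decode(text)


encoded = b64_encode_to_text(b"hi")
print(encoded)                          # aGk=
print(b64_decode_to_bytes(encoded))     # b'hi'
print(b64_decode_to_bytes(b"aGk="))     # b'hi'
```

Using 'strict' rather than 'ignore' makes non-ASCII input fail loudly
instead of being silently dropped; which of the two is wanted is the open
question in the message above.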


I wrote 4 different (non-working) patches. So if you want to work on the email 
module and Python 3000, please contact me first. When I have a better 
patch, I will submit it.


Victor Stinner aka haypo
http://hachoir.org/

From guido at python.org  Tue Aug 14 04:24:20 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 13 Aug 2007 19:24:20 -0700
Subject: [Python-3000] 100% backwards compatible parenless function call
	statements
In-Reply-To: <799316b70708131652y32d77ee0kee84d1d3fd0ad065@mail.gmail.com>
References: <799316b70708131652y32d77ee0kee84d1d3fd0ad065@mail.gmail.com>
Message-ID: <ca471dc20708131924m74f07b7fka6d9e3fe180a729@mail.gmail.com>

This is a topic for python-ideas, not python-3000.

To be absolutely brutally honest, it doesn't look like you understand
parsing well enough to be able to write a PEP. E.g. why is

  cos(3)+4

not interpreted as

  cos((3)+4)

in your proposal?

Python's predecessor had something like this, and they *did* do it
properly. The result was that if you wanted the other interpretation
you'd have to write

  (cos 3) + 4

Similarly in Haskell, I believe.
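
For reference, the way the existing grammar groups cos(3)+4 can be
checked with the standard ast module on a released Python 3. This only
shows today's parenthesized-call behavior; it says nothing about how a
paren-less grammar would have to be written.

```python
import ast
import math

# Parse the expression and look at the top-level node.
tree = ast.parse("cos(3)+4", mode="eval").body

# The addition is the outer node and the call is its left operand,
# i.e. the text is read as (cos(3)) + 4.
print(type(tree).__name__)        # BinOp
print(type(tree.left).__name__)   # Call

# The two readings give different values.
print(math.cos(3) + 4 == math.cos((3) + 4))   # False
```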

In any case, I don't believe the claim from the subject, especially if
you don't distinguish between

  f.close

and

  f.close()

How would you even know that 'close' is a method and not an attribute?
E.g. how do you avoid interpreting

  f.closed

as

  f.closed()

(which would be a TypeError)?
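
The difference is easy to see on a released Python 3. The sketch uses
io.StringIO instead of a real file so that it has no side effects; that
substitution is the only assumption here.

```python
import io

f = io.StringIO("data")

# 'close' is a bound method, 'closed' is a plain bool attribute.
print(callable(f.close))    # True
print(callable(f.closed))   # False

try:
    f.closed()              # calling a bool
except TypeError as exc:
    print("TypeError:", exc)
```

The syntax alone gives no way to tell the two names apart; only the
runtime type of the attribute does.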

Skeptically,

--Guido

On 8/13/07, Chris Monsanto <chris.monsanto at gmail.com> wrote:
> Since Python makes such a distinction between statements and expressions, I
> am proposing that function calls as statements should be allowed to omit
> parentheses. What I am proposing is 100% compatible with Python 2.x's
> behavior of function calls; so those uncomfortable with this (basic) idea
> can continue to use parens in their function calls. Expressions still
> require parens because of ambiguity and clarity issues.
>
> --Some examples:--
>
> print "Parenless function call!", file=my_file
>
> print(".. but this is still allowed")
>
> # We still need parens for calls to functions where the sole argument is a
> tuple
> # But you'd have to do this anyway in Python 2.x... nothing lost.
> print((1, 2))
>
> # Need parens if the function call isnt the only thing in the statement
> cos(3) + 4
>
> # Need parens if function call isnt a statement, otherwise how would we get
> the function itself?
> x = cos(3)
>
> # Make a the value of my_func...
> my_func2 = my_func
> my_func2 # call other function
> my_func2() # call it again
>
> # Method call?
> f = open("myfile")
> f.close
>
> # Chained method
> obj.something().somethinganother().yeah
>
> --Notes:--
>
> A lot of other things in Python 2.x/Python 3k at the moment have this same
> behavior...
>
> # No parens required
> x, y = b, a
>
> # But sometimes they are
> func((1, 2))
>
> # Generator expressions sometimes don't need parens
> func(i for i in list)
>
> # But sometimes they do
> func(a, (i for i in list))
>
> --Pros:--
>
> 1) Removes unnecessary verbosity for the majority of situations.
> 2) Python 2.x code works the same unmodified.
> 3) No weird stuff with non-first class objects, ala Ruby meth.call().
> Functions still remain assignable to other values without other trickery.
> 4) Because it's completely backwards compatible, you could even have
> something like from __future__ import parenless in Python 2.6 for a
> transition.
>
> --Cons:--
>
> 1) Can't type "func" bare in interpreter to get its repr. I think this is a
> non-issue; I personally never do this, and with parenless calls you can just
> type "repr func" anyway. Specifically I think this shouldn't be considered
> because in scripts doing something like " f.close" does absolutely nothing
> and giving it some functionality would be nice. It also solves one of the
> Python gotchas found here:
> http://www.ferg.org/projects/python_gotchas.html
> (specifically #5)
>
> I'm willing to write up a proper PEP if anyone is interested in the idea. I
> figured I'd poll around first.


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Aug 14 04:25:51 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 13 Aug 2007 19:25:51 -0700
Subject: [Python-3000] os.extsep & RISCOS support removal
In-Reply-To: <18113.4171.231630.187319@montanaro.dyndns.org>
References: <18113.4171.231630.187319@montanaro.dyndns.org>
Message-ID: <ca471dc20708131925v6bd6e6d5q76adfb847ec92c7a@mail.gmail.com>

On 8/13/07, skip at pobox.com <skip at pobox.com> wrote:
> I'm working my way through RISCOS code removal.  I came across this note in
> Misc/HISTORY:
>
> - os.extsep -- a new variable needed by the RISCOS support.  It is the
>   separator used by extensions, and is '.' on all platforms except
>   RISCOS, where it is '/'.  There is no need to use this variable
>   unless you have a masochistic desire to port your code to RISCOS.
>
> If RISCOS is going away should os.extsep as well?

Yes please. It just causes more code for no good reason.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip at pobox.com  Tue Aug 14 04:38:01 2007
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 13 Aug 2007 21:38:01 -0500
Subject: [Python-3000] os.extsep & RISCOS support removal
In-Reply-To: <ca471dc20708131925v6bd6e6d5q76adfb847ec92c7a@mail.gmail.com>
References: <18113.4171.231630.187319@montanaro.dyndns.org>
	<ca471dc20708131925v6bd6e6d5q76adfb847ec92c7a@mail.gmail.com>
Message-ID: <18113.5513.764384.130318@montanaro.dyndns.org>


    >> If RISCOS is going away should os.extsep as well?

    Guido> Yes please. It just causes more code for no good reason.

Good.  I already removed it in my sandbox. ;-)

While I'm thinking about it, should I be identifying tasks I'm working on
somewhere to avoid duplication of effort?  There was something on the
py3kstruni wiki page where people marked the failing tests they were working
on.  I'm not aware of a similar page for more general tasks.

Skip


From talin at acm.org  Tue Aug 14 05:13:35 2007
From: talin at acm.org (Talin)
Date: Mon, 13 Aug 2007 20:13:35 -0700
Subject: [Python-3000] Format specifier proposal
In-Reply-To: <46C095CF.2060507@ronadam.com>
References: <46BD79EC.1020301@acm.org> <46C095CF.2060507@ronadam.com>
Message-ID: <46C11DDF.2080607@acm.org>

Ron Adam wrote:
> 
>>     :f<+015.5 # Floating point, left aligned, always show sign,
>>               # leading zeros, field width 15 (min), 5 decimal places.
> 
> Which has precedence... left alignment or zero padding?
> 
> Or should this be an error?

The answer is: Just ignore that proposal entirely :)

------

So I sat down with Guido and as I expected he has simplified my thoughts 
greatly. Based on the conversation we had, I think we both agree on what 
should be done:


1) There will be a new built-in function "format" that formats a single 
field. This function takes two arguments, a value to format, and a 
format specifier string.

The "format" function does exactly the following:

    def format(value, spec):
       return value.__format__(spec)

(I believe this even works if value is 'None'.)

In other words, any type conversion or fallbacks must be done by 
__format__; Any interpretation or parsing of the format specifier is 
also done by __format__.

"format" does not, however, handle the "!r" specifier. That is done by 
the caller of this function (usually the Formatter class.)
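
As a quick check of how little this function does, here is the same
two-line definition under another name (my_format, chosen so the sketch
does not shadow the built-in that released Python 3 provides), followed
by a few calls whose work is all done by the types' own __format__
methods.

```python
def my_format(value, spec):
    # All parsing of `spec` happens inside the value's __format__.
    return value.__format__(spec)


print(my_format(3.14159, ".2f"))     # 3.14
print(repr(my_format(42, ">6")))     # '    42'
print(repr(my_format("abc", "<5")))  # 'abc  '
print(my_format(None, ""))           # None
```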


2) The various type-specific __format__ methods are allowed to know 
about other types - so 'int' knows about 'float' and so on.

Note that other than the special case of int <--> float, this knowledge 
is one way only, meaning that the dependency graph is acyclic.

For most types, if they see a type letter that they don't recognize, 
they should coerce to their nearest built-in type (int, float, etc.) and 
re-invoke __format__.


3) In addition to int.__format__, float.__format__, and str.__format__, 
there will also be object.__format__, which simply coerces the object to 
a string, and calls __format__ on the result.

   class object:
      def __format__(self, spec):
         return str(self).__format__(spec)

So in other words, all objects are formattable if they can be converted 
to a string.
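
The fallback can be tried out with an explicit base class, sketched
below. StrFallback and Point are invented names; the sketch spells the
fallback out itself rather than relying on what object.__format__ does in
any particular Python release.

```python
class StrFallback:
    """Stand-in for the proposed object.__format__ behavior."""

    def __format__(self, spec):
        # Coerce to str, then let str.__format__ interpret the spec.
        return str(self).__format__(spec)


class Point(StrFallback):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        return "(%d, %d)" % (self.x, self.y)


p = Point(1, 2)
print(repr(format(p, ">8")))      # '  (1, 2)'
print("{0:*^10}".format(p))       # **(1, 2)**
```

Point defines no __format__ of its own, yet width, alignment and fill
all work because they are handled by str.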


4) Explicit type coercion is a separate field from the format spec:

     {name[:format_spec][!coercion]}

Where 'coercion' can be 'r' (to convert to repr()), 's' (to convert to 
string.) Other letters may be added later based on need.

The coercion field causes the formatter class to attempt to coerce the 
value to the specified type before calling format(value, format_spec)
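
A minimal sketch of that division of labor follows. The function
format_field stands in for the part of the Formatter class that handles
one field; its name and signature are assumptions, and it uses the
built-in format() of a released Python 3 for the final step.

```python
def format_field(value, format_spec="", coercion=None):
    """Coerce first (caller's job), then format (the type's job)."""
    if coercion == "r":
        value = repr(value)
    elif coercion == "s":
        value = str(value)
    elif coercion is not None:
        raise ValueError("unknown coercion: %r" % coercion)
    # After coercion the value is a str, so str.__format__ sees the spec.
    return format(value, format_spec)


print(repr(format_field("x", ">5", "r")))   # "  'x'"
print(format_field(7, "03d"))               # 007
```

Because the coercion happens before format() is called, no type's
__format__ ever needs to know about 'r'.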


5) Mini-language for format specifiers:

So I do like your (Ron's) latest proposal, and I am thinking about it 
quite a bit.

Guido suggested (and I am favorable to the idea) that we simply keep the 
2.5 format syntax, or the slightly more advanced variation that's in the 
PEP now.

This has a couple of advantages:

-- It means that Python programmers won't have to learn a new syntax.
-- It makes the 2to3 conversion of format strings trivial. (Although 
there are some other difficulties with automatic conversion of '%', but 
they are unrelated to format specifiers.)

Originally I liked the idea of putting the type letter at the front, 
instead of at the back like it is in 2.5. However, when you think about 
it, it actually makes sense to have it at the back. Because the type 
letter is now optional, it won't need to be there most of the time. The 
type letter is really just an optional modifier flag, not a "type" at all.

Two features of your proposal that aren't supported in the old syntax are:

   -- Arbitrary fill characters, as opposed to just '0' and ' '.
   -- Taking the string value from the left or right.

I'm not sure how much we need the first. The second sounds kind of 
useful though.

I'm thinking that we might be able to take your ideas and simply extend 
the old 2.5 syntax, so that it would be backwards compatible. On the 
other hand, it seems to me that once we have a *real* implementation 
(which we will soon), it will be relatively easy for people to 
experiment with new features and syntactical innovations.


6) Finally, Guido stressed that he wants to make sure that the 
implementation supports fields within fields, such as:

    {0:{1}.{2}}

Fortunately, the 'format' function doesn't have to handle this (it only 
formats a single value.) This would be done by the higher-level code.
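
One way the higher-level code could do this is to expand the inner
fields into a plain spec string first and only then call format(). The
sketch below does that for positional fields only; the helper names are
hypothetical, and the comparison on the last line uses str.format from a
released Python 3.

```python
import re


def expand_inner_fields(spec, args):
    """Replace each {n} inside a format spec with str(args[n])."""
    return re.sub(r"\{(\d+)\}", lambda m: str(args[int(m.group(1))]), spec)


def format_nested(args, spec_template, index=0):
    spec = expand_inner_fields(spec_template, args)   # '{1}.{2}' -> '6.3'
    return format(args[index], spec)                  # single-value format


args = ("abcdefgh", 6, 3)
print(expand_inner_fields("{1}.{2}", args))           # 6.3
print(repr(format_nested(args, "{1}.{2}")))           # 'abc   '
print(repr("{0:{1}.{2}}".format(*args)))              # 'abc   '
```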

-- Talin

From guido at python.org  Tue Aug 14 05:15:25 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 13 Aug 2007 20:15:25 -0700
Subject: [Python-3000] os.extsep & RISCOS support removal
In-Reply-To: <18113.5513.764384.130318@montanaro.dyndns.org>
References: <18113.4171.231630.187319@montanaro.dyndns.org>
	<ca471dc20708131925v6bd6e6d5q76adfb847ec92c7a@mail.gmail.com>
	<18113.5513.764384.130318@montanaro.dyndns.org>
Message-ID: <ca471dc20708132015u12d31582r3f2046c6ea7d678d@mail.gmail.com>

Use this page: http://wiki.python.org/moin/Py3kToDo

The "master" Python 3000 page in the wiki links to these and other
resources: http://wiki.python.org/moin/Python3000

--Guido

On 8/13/07, skip at pobox.com <skip at pobox.com> wrote:
>
>     >> If RISCOS is going away should os.extsep as well?
>
>     Guido> Yes please. It just causes more code for no good reason.
>
> Good.  I already removed it in my sandbox. ;-)
>
> While I'm thinking about it, should I be identifying tasks I'm working on
> somewhere to avoid duplication of effort?  There was something on the
> py3kstruni wiki page where people marked the failing tests they were working
> on.  I'm not aware of a similar page for more general tasks.
>
> Skip
>
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From rrr at ronadam.com  Tue Aug 14 05:44:38 2007
From: rrr at ronadam.com (Ron Adam)
Date: Mon, 13 Aug 2007 22:44:38 -0500
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46C10D2D.60705@canterbury.ac.nz>
References: <46B13ADE.7080901@acm.org>
	<46B2D147.90606@acm.org>	<46B2E265.5080905@ronadam.com>
	<46B35295.1030007@acm.org>	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>	<46B39BBF.80809@ronadam.com>
	<46B46DE1.1090403@canterbury.ac.nz>	<46B4950F.40905@ronadam.com>
	<46B4A9A0.9070206@ronadam.com>	<46B52422.2090006@canterbury.ac.nz>	<ca471dc20708041909j3f86b107hace9407673c302ea@mail.gmail.com>	<46B568F3.9060105@ronadam.com>
	<46B66BE0.7090005@canterbury.ac.nz>	<46B6851C.1030204@ronadam.com>
	<46B6C219.4040900@canterbury.ac.nz>	<46B6DABB.3080509@ronadam.com>
	<46B7CD8C.5070807@acm.org>	<46B8D58E.5040501@ronadam.com>
	<46BAC2D9.2020902@acm.org>	<46BAEFB0.9050400@ronadam.com>
	<46BBB1AE.5010207@canterbury.ac.nz>	<46BBEB16.2040205@ronadam.com>
	<46BC1479.30405@canterbury.ac.nz>	<46BCA9C9.1010306@ronadam.com>
	<46BD5BD8.7030706@acm.org>	<46BF22D8.2090309@trueblade.com>
	<46BF3CC7.6010405@acm.org>	<46C0EEBF.3010206@ronadam.com>
	<46C10D2D.60705@canterbury.ac.nz>
Message-ID: <46C12526.8040807@ronadam.com>



Greg Ewing wrote:
> Ron Adam wrote:
> 
>> The digits value is the number of digits before the decimal point.  This
>> doesn't include the other symbols used in the field, so it isn't the same
>> as a field width.
> 
> How does this work with formats where the number of
> digits before the decimal can vary, but before+after
> is constant?

I think this is what you're looking for.

    f>15,.3   #15 field width, 3 decimal places, right aligned.

In this case the sign will be right before the most significant digit.

Or you could use...

    f 10.3   # total width = 15

In this one, the sign would be to the far left of the field.  So they are
not the same thing.  The space is used here to make positive numbers the
same width as negative values.
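
For comparison, here is a minimal sketch of the same two sign placements
using the format spec mini-language that later shipped in Python 3. That
syntax is an assumption relative to this message, which predates it and
proposes a different notation; the variable names are illustrative only.

```python
# Mini-language as it later shipped in Python 3 (not the 'f>15,.3'
# notation proposed in this message).
value = -3.14159

# '>' right-aligns the whole number, so the sign sits next to the digits.
sign_next_to_digits = format(value, ">15.3f")

# '=' pads between the sign and the digits, so the sign sits at the far left.
sign_at_far_left = format(value, "=15.3f")

# A space in the sign position reserves one column on positive numbers.
positive_with_space = format(3.14159, " .3f")

print(repr(sign_next_to_digits))
print(repr(sign_at_far_left))
print(repr(positive_with_space))
```

Both aligned results are 15 characters wide; only the position of the
padding relative to the sign differs.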


> Also, my feeling about the whole of this is that
> it's too complicated. It seems like you can have
> at least three numbers in a format, and at first
> glance it's quite confusing as to what they all
> mean.

Well, at first glance so is everything else that's been suggested; it's
because we are doing a lot in very little space.  In this case we are
adding just a touch of complexity to the syntax in order to use grouping to
remove complexity in understanding the expression.

These are all field width terms:

     >10       right align in field 10
     ^15/_     center in field 15, pad with underscores
     20/*      left align in field 20, pad with *

They are easy to identify because other terms do not contain '<^>/'. And
since they are separate from other format terms, once you get it, you've
got it. Nothing more to remember here.

It doesn't make sense to put signs in front of field widths because the 
signs have no relation to the field width at all.


These are all number formats:

     +10.4
     (10.4)
     .6
     ' 9.3'     Quoted so you can see the space.
     10.

Here, we don't use alignment symbols.  Alignments have no meaning in the 
context of number of digits.  So these taken as a smaller chunk of the 
whole will also be easier to remember.  There are no complex interactions 
between field alignment terms and number terms this way.  That makes it
simpler to understand and learn.


Let's take apart the alternative syntax.

     f<+15.2

       f   fixed point     # of decimals is specified

       <   align left      (field attribute)

       +   sign            (number attribute)

       15  width           (field attribute)

       .2  decimals        (number attribute)

So what you have is that some of them apply to the field width and some of
them affect how the number is displayed.  But they alternate.  (Does anyone
else find that kind of odd?)

The specifier syntax described here groups related items together.

    f<15,+.2

    f   fixed point

    <     left align
    15    field width

    +     sign
    .2    decimals


Yes, we can get rid of one number by just using the field width in place of 
a digits width.  But it's a trade off.  I think it complicates the concept 
in exchange for simplifying the syntax.

Regards,
    Ron


From guido at python.org  Tue Aug 14 05:53:26 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 13 Aug 2007 20:53:26 -0700
Subject: [Python-3000] Format specifier proposal
In-Reply-To: <46C11DDF.2080607@acm.org>
References: <46BD79EC.1020301@acm.org> <46C095CF.2060507@ronadam.com>
	<46C11DDF.2080607@acm.org>
Message-ID: <ca471dc20708132053h1fa1c18ai87d61d8b85aadf07@mail.gmail.com>

On 8/13/07, Talin <talin at acm.org> wrote:
> So I sat down with Guido and as I expected he has simplified my thoughts
> greatly. Based on the conversation we had, I think we both agree on what
> should be done:
>
>
> 1) There will be a new built-in function "format" that formats a single
> field. This function takes two arguments, a value to format, and a
> format specifier string.
>
> The "format" function does exactly the following:
>
>     def format(value, spec):
>        return value.__format__(spec)
>
> (I believe this even works if value is 'None'.)

Yes, assuming the definition of object.__format__ you give later.
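
A minimal sketch of the delegation described above. `format_sketch` and
`Point` are hypothetical names used only for illustration; the fallback in
`Point.__format__` copies the shape of the proposed object.__format__.

```python
def format_sketch(value, spec):
    """Stand-in for the proposed built-in: delegate everything to __format__."""
    return value.__format__(spec)


class Point:
    """Hypothetical type that formats itself via its str() form."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        return "(%s, %s)" % (self.x, self.y)

    def __format__(self, spec):
        # Same shape as the proposed object.__format__: coerce to str,
        # then let str interpret the spec.
        return str(self).__format__(spec)


padded_point = format_sketch(Point(1, 2), ">10")
plain_none = format_sketch(None, "")

print(repr(padded_point))
print(repr(plain_none))
```

The None case is shown with an empty spec only; as far as I know, the
object.__format__ that eventually shipped rejects non-empty specs, which
differs from the proposal quoted here.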

> In other words, any type conversion or fallbacks must be done by
> __format__; Any interpretation or parsing of the format specifier is
> also done by __format__.
>
> "format" does not, however, handle the "!r" specifier. That is done by
> the caller of this function (usually the Formatter class.)
>
>
> 2) The various type-specific __format__ methods are allowed to know
> about other types - so 'int' knows about 'float' and so on.
>
> Note that other than the special case of int <--> float, this knowledge
> is one way only, meaning that the dependency graph is acyclic.

Though we don't necessarily care (witness the exception for
int<->float -- other types could know about each other too, if it's
useful).

> For most types, if they see a type letter that they don't recognize,
> they should coerce to their nearest built-in type (int, float, etc.) and
> re-invoke __format__.

Make that "for numeric types".

One of my favorite examples of non-numeric types are the date, time
and datetime types from the datetime module; here I propose that their
__format__ be defined like this:

  def __format__(self, spec):
      return self.strftime(spec)
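
As a rough illustration, this is how the strftime-based __format__ reads in
the Python 3 that later shipped, where the datetime types behave this way
for a non-empty spec. The dates are arbitrary sample values.

```python
from datetime import date, datetime

released = date(2007, 8, 13)
posted = datetime(2007, 8, 13, 20, 53)

# The format spec is handed to strftime() unchanged.
iso_style = format(released, "%Y-%m-%d")

# Inside a format string, everything after the first colon is the spec,
# so strftime directives that contain ':' still work.
time_only = "{0:%H:%M}".format(posted)

print(iso_style)
print(time_only)
```

Note that a spec such as "30" would be passed to strftime() as well, so
plain width and alignment are not available with this definition.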

> 3) In addition to int.__format__, float.__format__, and str.__format__,
> there will also be object.__format__, which simply coerces the object to
> a string, and calls __format__ on the result.
>
>    class object:
>       def __format__(self, spec):
>          return str(self).__format__(spec)
>
> So in other words, all objects are formattable if they can be converted
> to a string.
>
>
> 4) Explicit type coercion is a separate field from the format spec:
>
>      {name[:format_spec][!coercion]}

Over lunch we discussed putting !coercion first. IMO {foo!r:20} reads
more naturally from left to right: take foo, call repr() on it, then
call format(_, '20') on the resulting string.
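
A small sketch of that left-to-right reading, written against str.format as
it later shipped (which uses the `!r` before `:` ordering). The value "foo"
is just sample data.

```python
name = "foo"

# One step: conversion first, then the format spec.
combined = "{0!r:20}".format(name)

# The same thing spelled out: repr() first, then format() on the result.
step_by_step = format(repr(name), "20")

print(repr(combined))
print(combined == step_by_step)
```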

> Where 'coercion' can be 'r' (to convert to repr()), 's' (to convert to
> string.) Other letters may be added later based on need.
>
> The coercion field causes the formatter class to attempt to coerce the
> value to the specified type before calling format(value, format_spec)
>
>
> 5) Mini-language for format specifiers:
>
> So I do like your (Ron's) latest proposal, and I am thinking about it
> quite a bit.
>
> Guido suggested (and I am favorable to the idea) that we simply keep the
> 2.5 format syntax, or the slightly more advanced variation that's in the
> PEP now.
>
> This has a couple of advantages:
>
> -- It means that Python programmers won't have to learn a new syntax.
> -- It makes the 2to3 conversion of format strings trivial. (Although
> there are some other difficulties with automatic conversion of '%', but
> they are unrelated to format specifiers.)
>
> Originally I liked the idea of putting the type letter at the front,
> instead of at the back like it is in 2.5. However, when you think about
> it, it actually makes sense to have it at the back. Because the type
> letter is now optional, it won't need to be there most of the time. The
> type letter is really just an optional modifier flag, not a "type" at all.
>
> Two features of your proposal that aren't supported in the old syntax are:
>
>    -- Arbitrary fill characters, as opposed to just '0' and ' '.
>    -- Taking the string value from the left or right.
>
> I'm not sure how much we need the first. The second sounds kind of
> useful though.

The second could be added to the mini-language for strings
(str.__format__); I don't see how it would make sense for numbers. (If
you want the last N digits of an int x, by all means use x%10**N.)
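
For instance, a quick sketch of the x%10**N idiom next to the string
equivalent; the numbers are arbitrary, and the zero padding is added here
because the modulo result drops leading zeros.

```python
x = 1000045
n = 3

# Numeric route: the last n digits as an integer (leading zeros are lost).
last_digits = x % 10**n

# Pad back to n digits when the textual form matters.
last_digits_text = "{0:0{1}d}".format(last_digits, n)

# String route: take the value from the right, as proposed for strings.
sliced_text = str(x)[-n:]

print(last_digits, last_digits_text, sliced_text)
```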

> I'm thinking that we might be able to take your ideas and simply extend
> the old 2.5 syntax, so that it would be backwards compatible. On the
> other hand, it seems to me that once we have a *real* implementation
> (which we will soon), it will be relatively easy for people to
> experiment with new features and syntactical innovations.
>
>
> 6) Finally, Guido stressed that he wants to make sure that the
> implementation supports fields within fields, such as:
>
>     {0:{1}.{2}}
>
> Fortunately, the 'format' function doesn't have to handle this (it only
> formats a single value.) This would be done by the higher-level code.

Yup. Great summary overall!
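
A minimal sketch of fields within fields, using str.format as it later
shipped. The inner fields are substituted first and the result is then used
as the format spec for the outer field; the sample values are arbitrary,
and the trailing `f` in the second example is an addition for clarity.

```python
# Same shape as {0:{1}.{2}}: width and precision come from arguments.
# For a string, precision truncates and width pads.
truncated = "{0:{1}.{2}}".format("abcdefgh", 6, 3)

# With an explicit 'f' type the same idea applies to floats.
fixed = "{0:{1}.{2}f}".format(3.14159, 10, 3)

print(repr(truncated))
print(repr(fixed))
```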

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From rrr at ronadam.com  Tue Aug 14 07:49:11 2007
From: rrr at ronadam.com (Ron Adam)
Date: Tue, 14 Aug 2007 00:49:11 -0500
Subject: [Python-3000] Format specifier proposal
In-Reply-To: <46C11DDF.2080607@acm.org>
References: <46BD79EC.1020301@acm.org> <46C095CF.2060507@ronadam.com>
	<46C11DDF.2080607@acm.org>
Message-ID: <46C14257.6090902@ronadam.com>



Talin wrote:
> Ron Adam wrote:
>>
>>>     :f<+015.5 # Floating point, left aligned, always show sign,
>>>               # leading zeros, field width 15 (min), 5 decimal places.
>>
>> Which has precedence... left alignment or zero padding?
>>
>> Or should this be an error?
> 
> The answer is: Just ignore that proposal entirely :)

Ok :)

> ------
> 
> So I sat down with Guido and as I expected he has simplified my thoughts 
> greatly. Based on the conversation we had, I think we both agree on what 
> should be done:
> 
> 
> 1) There will be a new built-in function "format" that formats a single 
> field. This function takes two arguments, a value to format, and a 
> format specifier string.
> 
> The "format" function does exactly the following:
> 
>    def format(value, spec):
>       return value.__format__(spec)
>
> (I believe this even works if value is 'None'.)
> 
> In other words, any type conversion or fallbacks must be done by 
> __format__; Any interpretation or parsing of the format specifier is 
> also done by __format__.
> 
> "format" does not, however, handle the "!r" specifier. That is done by 
> the caller of this function (usually the Formatter class.)
> 
> 
> 2) The various type-specific __format__ methods are allowed to know 
> about other types - so 'int' knows about 'float' and so on.
> 
> Note that other than the special case of int <--> float, this knowledge 
> is one way only, meaning that the dependency graph is acyclic.
> 
> For most types, if they see a type letter that they don't recognize, 
> they should coerce to their nearest built-in type (int, float, etc.) and 
> re-invoke __format__.

If it coerces the value first, it can then just call format(value, spec).
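
A sketch of that coerce-then-delegate pattern. `Celsius` and its "C" type
letter are hypothetical, invented here only to show a numeric type that
handles one letter itself and hands everything else to float.

```python
class Celsius:
    """Hypothetical numeric type with one custom type letter, 'C'."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __float__(self):
        return float(self.degrees)

    def __format__(self, spec):
        if spec.endswith("C"):
            # Our own letter: format the number, then append the unit.
            return format(float(self), spec[:-1]) + " C"
        # Anything we don't recognize: coerce, then call format() again.
        return format(float(self), spec)


reading = Celsius(21.5)
with_unit = format(reading, ".1fC")
delegated = format(reading, "8.2f")

print(repr(with_unit))
print(repr(delegated))
```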


> 3) In addition to int.__format__, float.__format__, and str.__format__, 
> there will also be object.__format__, which simply coerces the object to 
> a string, and calls __format__ on the result.
> 
>   class object:
>      def __format__(self, spec):
>         return str(self).__format__(spec)
> 
> So in other words, all objects are formattable if they can be converted 
> to a string.
> 
> 
> 4) Explicit type coercion is a separate field from the format spec:
> 
>     {name[:format_spec][!coercion]}
> 
> Where 'coercion' can be 'r' (to convert to repr()), 's' (to convert to 
> string.) Other letters may be added later based on need.
> 
> The coercion field causes the formatter class to attempt to coerce the
> value to the specified type before calling format(value, format_spec)

So the !letters refer to actual types, whereas the format specifier letters
are output format designators that mean whatever the object interprets them
as.

Hmmm... ok, I see why Guido leans towards putting it before the colon.  In 
a way it's more like a function call and not related to the format 
specifier type at all.

     {repr(name):format_spec}

Heck, it could even be first...

     {r!name:format_spec}

Or maybe because it's closer to name.__repr__ he prefers the name!r ordering?


A wilder idea I was thinking about somewhat related to this was to be able 
to chain format specifiers, but I haven't worked out the details yet.


> 5) Mini-language for format specifiers:
> 
> So I do like your (Ron's) latest proposal, and I am thinking about it 
> quite a bit.

I'm actually testing them before I post them.  That filters out most of the 
really bad ideas.  ;-)

Although I'd also like to see a few more people agree with it before 
committing to something new.

> Guido suggested (and I am favorable to the idea) that we simply keep the 
> 2.5 format syntax, or the slightly more advanced variation that's in the 
> PEP now.
> 
> This has a couple of advantages:
> 
> -- It means that Python programmers won't have to learn a new syntax.
> -- It makes the 2to3 conversion of format strings trivial. (Although 
> there are some other difficulties with automatic conversion of '%', but 
> they are unrelated to format specifiers.)

Yes, the 2 to 3 conversion will be a challenge with a new syntax, but as 
long as the new syntax is richer than the old one, it shouldn't be that 
much trouble.  If we remove things we could do before, then it gets much 
harder.


> Originally I liked the idea of putting the type letter at the front, 
> instead of at the back like it is in 2.5. However, when you think about 
> it, it actually makes sense to have it at the back. Because the type 
> letter is now optional, it won't need to be there most of the time. The 
> type letter is really just an optional modifier flag, not a "type" at all.

The reason it's in the back for % formatting is it serves as the closing
bracket.  With the {}'s we can put it anywhere that makes the most
sense.


> Two features of your proposal that aren't supported in the old syntax are:
> 
>   -- Arbitrary fill characters, as opposed to just '0' and ' '.
>   -- Taking the string value from the left or right.
> 
> I'm not sure how much we need the first. The second sounds kind of 
> useful though.

The fill characters are already implemented in the strings rjust, ljust, 
and center methods.

  |  center(...)
  |      S.center(width[, fillchar]) -> string
  |
  |      Return S centered in a string of length width. Padding is
  |      done using the specified fill character (default is a space)

So adding it is just a matter of calling these with the fillchar.

And as Guido also pointed out... the taking of string values from the left 
and right should work on strings and not numbers.
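
A short sketch of the fill characters using the existing string methods,
next to the equivalent spelling in the mini-language that later shipped
(fill character, then alignment, then width). The later syntax is an
assumption relative to this thread.

```python
text = "abc"

# Existing str methods already accept a fill character.
centered = text.center(9, "*")
right = text.rjust(6, ".")
left = text.ljust(6, "_")

# Equivalent specs in the later mini-language.
centered_spec = format(text, "*^9")
right_spec = format(text, ".>6")
left_spec = format(text, "_<6")

print(centered, right, left)
```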


> I'm thinking that we might be able to take your ideas and simply extend 
> the old 2.5 syntax, so that it would be backwards compatible. On the 
> other hand, it seems to me that once we have a *real* implementation 
> (which we will soon), it will be relatively easy for people to 
> experiment with new features and syntactical innovations.

I'm looking forward to that.  :-)


> 6) Finally, Guido stressed that he wants to make sure that the 
> implementation supports fields within fields, such as:
> 
>    {0:{1}.{2}}

I've been thinking about this also for the use of dynamically formatting 
strings.  Is that the use case he is after?

     "{0:{1},{2}}".format(value, '^40', 'f(20.2)')

Which would first insert {1} and {2} into the string before formatting 0.

      {0:^40,f(20.2)}   Use your favorite syntax of course. ;-)

The items 1, and 2 would probably not be string literals in this case, but 
come from a data source associated to the value.

And of course what actually gets inserted in inner fields can be anything.


> Fortunately, the 'format' function doesn't have to handle this (it only 
> formats a single value.) This would be done by the higher-level code.

Looks like this is moving along nicely now. :-)

Cheers,
    Ron




From andrew.j.wade at gmail.com  Tue Aug 14 08:28:05 2007
From: andrew.j.wade at gmail.com (Andrew James Wade)
Date: Tue, 14 Aug 2007 02:28:05 -0400
Subject: [Python-3000] Format specifier proposal
In-Reply-To: <ca471dc20708132053h1fa1c18ai87d61d8b85aadf07@mail.gmail.com>
References: <46BD79EC.1020301@acm.org> <46C095CF.2060507@ronadam.com>
	<46C11DDF.2080607@acm.org>
	<ca471dc20708132053h1fa1c18ai87d61d8b85aadf07@mail.gmail.com>
Message-ID: <20070814022805.4a2b44a2.ajwade+py3k@andrew.wade.networklinux.net>

On Mon, 13 Aug 2007 20:53:26 -0700
"Guido van Rossum" <guido at python.org> wrote:

...

> One of my favorite examples of non-numeric types are the date, time
> and datetime types from the datetime module; here I propose that their
> __format__ be defined like this:
> 
>   def __format__(self, spec):
>       return self.strftime(spec)

You lose the ability to align the field then. What about:

    def __format__(self, align_spec, spec="%Y-%m-%d %H:%M:%S"):
        return format(self.strftime(spec), align_spec)
 
with

    def format(value, spec):
        if "," in spec:
            align_spec, custom_spec = spec.split(",",1)
            return value.__format__(align_spec, custom_spec)
        else:
            return value.__format__(spec)

":,%Y-%m-%d" may be slightly more gross than ":%Y-%m-%d", but on the plus
side ":30" would mean the same thing across all types.
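
A runnable sketch of the same idea as a standalone function, so it can be
tried without changing the datetime type. `format_datetime` and
`DEFAULT_SPEC` are hypothetical names, and the "align,strftime" split is the
convention suggested above, not an existing API.

```python
from datetime import datetime

DEFAULT_SPEC = "%Y-%m-%d %H:%M:%S"


def format_datetime(value, spec=""):
    """Split spec at the first comma into alignment and strftime parts."""
    align_spec, _, custom_spec = spec.partition(",")
    text = value.strftime(custom_spec or DEFAULT_SPEC)
    # The alignment part is applied to the already-rendered string.
    return format(text, align_spec)


stamp = datetime(2007, 8, 14, 2, 28, 5)
right_aligned_date = format_datetime(stamp, ">12,%Y-%m-%d")
padded_default = format_datetime(stamp, "30")

print(repr(right_aligned_date))
print(repr(padded_default))
```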

-- Andrew

From walter at livinglogic.de  Tue Aug 14 09:55:35 2007
From: walter at livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=)
Date: Tue, 14 Aug 2007 09:55:35 +0200
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46C12526.8040807@ronadam.com>
References: <46B13ADE.7080901@acm.org>	<46B2E265.5080905@ronadam.com>	<46B35295.1030007@acm.org>	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>	<46B39BBF.80809@ronadam.com>	<46B46DE1.1090403@canterbury.ac.nz>	<46B4950F.40905@ronadam.com>	<46B4A9A0.9070206@ronadam.com>	<46B52422.2090006@canterbury.ac.nz>	<ca471dc20708041909j3f86b107hace9407673c302ea@mail.gmail.com>	<46B568F3.9060105@ronadam.com>	<46B66BE0.7090005@canterbury.ac.nz>	<46B6851C.1030204@ronadam.com>	<46B6C219.4040900@canterbury.ac.nz>	<46B6DABB.3080509@ronadam.com>	<46B7CD8C.5070807@acm.org>	<46B8D58E.5040501@ronadam.com>	<46BAC2D9.2020902@acm.org>	<46BAEFB0.9050400@ronadam.com>	<46BBB1AE.5010207@canterbury.ac.nz>	<46BBEB16.2040205@ronadam.com>	<46BC1479.30405@canterbury.ac.nz>	<46BCA9C9.1010306@ronadam.com>	<46BD5BD8.7030706@acm.org>	<46BF22D8.2090309@trueblade.com>	<46BF3CC7.6010405@acm.org>	<46C0EEBF.3010206@ronadam.com>	<46C10D2D.60705@canterbury.ac.nz>
	<46C12526.8040807@ronadam.com>
Message-ID: <46C15FF7.8020106@livinglogic.de>

Ron Adam wrote:
> 
> Greg Ewing wrote:
>> Ron Adam wrote:
>>
>>> The digits value is the number of digits before the decimal point.  This
>>> doesn't include the other symbols used in the field, so it isn't the same
>>> as a field width.
>> How does this work with formats where the number of
>> digits before the decimal can vary, but before+after
>> is constant?
> 
> I think this is what you're looking for.
> 
>     f>15,.3   #15 field width, 3 decimal places, right aligned.
> 
> In this case the sign will be right before the most significant digit.
> 
> Or you could use...
> 
>     f 10.3   # total width = 15
> 
> In this one, the sign would be to the far left of the field.  So they are
> not the same thing.  The space is used here to make positive numbers the
> same width as negative values.
> 
> 
>> Also, my feeling about the whole of this is that
>> it's too complicated. It seems like you can have
>> at least three numbers in a format, and at first
>> glance it's quite confusing as to what they all
>> mean.
> 
> Well, at first glance so is everything else that's been suggested,  it's 
> because we are doing a lot in a very little space.  In this case we are 
> adding just a touch of complexity to the syntax in order to use grouping to 
> remove complexity in understanding the expression.
> 
> These are all field width terms:
> 
>      >10       right align in field 10
>      ^15/_     center in field 15, pad with underscores
>      20/*      left align in field 20, pad with *
> 
> They are easy to identify because other terms do not contain '<^>/'. And
> since they are separate from other format terms, once you get it, you've
> got it. Nothing more to remember here.
> 
> It doesn't make sense to put signs in front of field widths because the 
> signs have no relation to the field width at all.
> 
> 
> These are all number formats:
> 
>      +10.4
>      (10.4)
>      .6
>      ' 9.3'     Quoted so you can see the space.
>      10.
> 
> Here, we don't use alignment symbols.  Alignments have no meaning in the 
> context of number of digits.  So these taken as a smaller chunk of the 
> whole will also be easier to remember.  There are no complex interactions 
> between field alignment terms and number terms this way.  That makes it
> simpler to understand and learn.
> 
> 
> Let's take apart the alternative syntax.
> 
>      f<+15.2
> 
>        f   fixed point     # of decimals is specified
> 
>        <   align left      (field attribute)
> 
>        +   sign            (number attribute)
> 
>        15  width           (field attribute)
> 
>        .2  decimals        (number attribute)

Then why not have something more readable like

    al;s+;w15;d2

This is longer than <+15.2, but IMHO much more readable, because it's
clear where each specifier ends and begins.
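
To make the comparison concrete, here is a small sketch that translates the
verbose form into a compact spec. Everything in it is an assumption: the key
letters are read as a=align, s=sign, w=width, d=decimals, the align codes
l/r/c are guessed, and the output targets the mini-language that later
shipped rather than any syntax proposed in this thread.

```python
# Hypothetical mapping from the verbose align codes to compact symbols.
ALIGN = {"l": "<", "r": ">", "c": "^"}


def translate(verbose_spec):
    """Turn e.g. 'al;s+;w15;d2' into a compact spec such as '<+15.2f'."""
    # Each item is one key letter followed by its value; order is free.
    parts = {item[0]: item[1:] for item in verbose_spec.split(";") if item}
    align = ALIGN.get(parts.get("a", ""), "")
    sign = parts.get("s", "")
    width = parts.get("w", "")
    decimals = parts.get("d")
    precision = "." + decimals + "f" if decimals is not None else ""
    return align + sign + width + precision


compact = translate("al;s+;w15;d2")
reordered = translate("w15;d2;s+;al")
rendered = format(3.14159, compact)

print(compact)
print(repr(rendered))
```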

Servus,
    Walter


From p.f.moore at gmail.com  Tue Aug 14 12:52:48 2007
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 14 Aug 2007 11:52:48 +0100
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46C15FF7.8020106@livinglogic.de>
References: <46B13ADE.7080901@acm.org> <46BC1479.30405@canterbury.ac.nz>
	<46BCA9C9.1010306@ronadam.com> <46BD5BD8.7030706@acm.org>
	<46BF22D8.2090309@trueblade.com> <46BF3CC7.6010405@acm.org>
	<46C0EEBF.3010206@ronadam.com> <46C10D2D.60705@canterbury.ac.nz>
	<46C12526.8040807@ronadam.com> <46C15FF7.8020106@livinglogic.de>
Message-ID: <79990c6b0708140352n7df1a758h6ed2fb37138ea930@mail.gmail.com>

On 14/08/07, Walter Dörwald <walter at livinglogic.de> wrote:
> Then why not have something more readable like
>
>    al;s+;w15;d2

A brief sanity check from someone who is not reading this thread, but
happened to see this post (and it's *not* a dig at Walter, just a
general comment):

If that's *more* readable, I'd hate to see what it's more readable *than*.

I'd suggest that someone take a step back and think about how people
will use these things in practice. I'd probably refuse to accept
something like that in a code review without a comment. And I'd
certainly swear if I had to deal with it in maintenance...

Paul.

From barry at python.org  Tue Aug 14 15:30:58 2007
From: barry at python.org (Barry Warsaw)
Date: Tue, 14 Aug 2007 09:30:58 -0400
Subject: [Python-3000] [Email-SIG] fix email module for python 3000
	(bytes/str)
In-Reply-To: <200708130226.03670.victor.stinner@haypocalc.com>
References: <200708090241.08369.victor.stinner@haypocalc.com>
	<200708110149.10939.victor.stinner@haypocalc.com>
	<8B640CF2-EB88-45A5-A85F-1267AF24749E@python.org>
	<200708130226.03670.victor.stinner@haypocalc.com>
Message-ID: <E50AC401-C936-4202-903A-1691BD56ABE5@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 12, 2007, at 8:26 PM, Victor Stinner wrote:

> On Sunday 12 August 2007 16:50:05 Barry Warsaw wrote:
>> In r56957 I committed changes to sndhdr.py and imghdr.py so that they
>> compare what they read out of the files against proper byte
>> literals.
>
> So nobody read my patches? :-( See my emails "[Python-3000] Fix imghdr
> module for bytes" and "[Python-3000] Fix sndhdr module for bytes" from
> last Saturday. But well, my patches look similar.

Victor, sorry but my email was very spotty and I definitely missed  
your original patches.  Sorry for duplicating work and thanks for  
fixing the last few things in these modules.  Glad Guido got these  
committed.

I'll follow up on email package more in a bit.
- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRsGuknEjvBPtnXfVAQLbfgQAqfiBeaVwIN35nXn9D7DZXItkzoZSd+1V
f/a4PnzBHTdvFZgggisK/7o5b1uULOaHILLSmiQMFp0W/zV2JFCvKI7kc1/SkjSo
UgIXK3o9WtmljH3aj1njc6fgy3VCVfa09NDKf89/rCy15AaSxF21YinIDIqF/yGN
Sn2RQJqvNPc=
=KpZC
-----END PGP SIGNATURE-----

From skip at pobox.com  Tue Aug 14 15:34:20 2007
From: skip at pobox.com (skip at pobox.com)
Date: Tue, 14 Aug 2007 08:34:20 -0500
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <79990c6b0708140352n7df1a758h6ed2fb37138ea930@mail.gmail.com>
References: <46B13ADE.7080901@acm.org> <46BC1479.30405@canterbury.ac.nz>
	<46BCA9C9.1010306@ronadam.com> <46BD5BD8.7030706@acm.org>
	<46BF22D8.2090309@trueblade.com> <46BF3CC7.6010405@acm.org>
	<46C0EEBF.3010206@ronadam.com> <46C10D2D.60705@canterbury.ac.nz>
	<46C12526.8040807@ronadam.com> <46C15FF7.8020106@livinglogic.de>
	<79990c6b0708140352n7df1a758h6ed2fb37138ea930@mail.gmail.com>
Message-ID: <18113.44892.399786.404146@montanaro.dyndns.org>


    Paul> If that's *more* readable, I'd hate to see what it's more readable
    Paul> *than*.

    Paul> I'd suggest that someone take a step back and think about how
    Paul> people will use these things in practice. I'd probably refuse to
    Paul> accept something like that in a code review without a comment. And
    Paul> I'd certainly swear if I had to deal with it in maintenance...

I'm with Paul.  When I first saw the examples of the proposed notation I
thought, "Exactly how is this better than the current printf-style format
strings?"  Since then I have basically ignored the discussion.  Before you
go too much farther I would suggest feeding the proposal to the
wolves^H^H^H^H^H^Hfolks on comp.lang.python to see what they think.

Skip

From barry at python.org  Tue Aug 14 15:45:45 2007
From: barry at python.org (Barry Warsaw)
Date: Tue, 14 Aug 2007 09:45:45 -0400
Subject: [Python-3000] bytes: compare bytes to integer
In-Reply-To: <f9ngoi$epc$1@sea.gmane.org>
References: <200708110225.28056.victor.stinner@haypocalc.com>
	<07Aug12.101123pdt."57996"@synergy1.parc.xerox.com>
	<f9ngoi$epc$1@sea.gmane.org>
Message-ID: <719423F4-779C-4596-8045-0E60603A9F92@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 12, 2007, at 1:41 PM, Georg Brandl wrote:

> Bill Janssen schrieb:
>>> I don't like the behaviour of Python 3000 when we compare a bytes
>>> string with length=1:
>>>>>> b'xyz'[0] == b'x'
>>>    False
>>>
>>> The code can be seen as:
>>>>>> ord(b'x') == b'x'
>>>    False
>>>
>>> or also:
>>>>>> 120 == b'x'
>>>    False
>>>
>>> Two solutions:
>>>  1. b'xyz'[0] returns a new bytes object (b'x' instead of 120)
>>>     like b'xyz'[0:1] does
>>>  2. allow to compare a bytes string of 1 byte with an integer
>>>
>>> I prefer (2) since (1) is wrong: bytes contains integers and not  
>>> bytes!
>>
>> Why not just write
>>
>>    b'xyz'[0:1] == b'x'
>>
>> in the first place?  Let's not start adding "special" cases.
>
> Hm... I have a feeling that this will be one of the first entries in a
> hypothetical "Python 3.0 Gotchas" list.

Yes, it will because the b-prefix tricks you by being just similar  
enough to 8-bit strings for you to want them to act the same way.   
I'm not advocating getting rid of bytes literals though (they are  
just too handy), but if you were forced to spell it bytes('xyz') I  
don't think you'd get as much confusion.

Any tutorial on bytes should include the following example:

 >>> a = list('xyz')
 >>> a[0]
'x'
 >>> a[0:1]
['x']
 >>> b = bytes('xyz')
 >>> b[0]
120
 >>> b[0:1]
b'x'
 >>> b == b'xyz'
True

That makes it pretty clear, IMO.
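
The same comparison as a script, written for the Python 3 that eventually
shipped. One difference from the session above is an assumption worth
flagging: later versions require an explicit encoding when building bytes
from a str, so bytes('xyz') becomes bytes('xyz', 'ascii') here.

```python
chars = list("xyz")
data = bytes("xyz", "ascii")

# Indexing a list of characters gives a str; slicing gives a list.
first_char = chars[0]
char_slice = chars[0:1]

# Indexing bytes gives an int; slicing gives bytes.
first_item = data[0]
byte_slice = data[0:1]

print(first_char, char_slice)
print(first_item, byte_slice)
print(data == b"xyz")
```

So the comparison that surprised people is between an int and a bytes
object, and slicing is the spelling that yields a one-byte bytes value.
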
- -Barry



-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRsGyC3EjvBPtnXfVAQJEtAQAtMUk8fVAFeMHYam6iNg4G3+NwmPWVXp4
YJSh8ZBEICSNlyJSNk8ntE0vKkqLSFMnI24RtoFDJJ2lKrbPtBoH2OyWuXHgfCzd
VG/LBMjMRV0IMQjkl2EtpD2atBBfDhQ6IPZtqaZJQ7HM10IUZtEq3gf/Q2Alttm4
nr4W46Pny3s=
=1rz/
-----END PGP SIGNATURE-----

From barry at python.org  Tue Aug 14 15:58:32 2007
From: barry at python.org (Barry Warsaw)
Date: Tue, 14 Aug 2007 09:58:32 -0400
Subject: [Python-3000] [Python-Dev] Universal newlines support in Python
	3.0
In-Reply-To: <ca471dc20708131315i6dba68d6u9efae161d1b647ca@mail.gmail.com>
References: <ca471dc20708101123y3628e352n7116839dbfdc9bb3@mail.gmail.com>
	<87wsw3p5em.fsf@uwakimon.sk.tsukuba.ac.jp>
	<rowen-D017E8.10460713082007@sea.gmane.org>
	<ca471dc20708131315i6dba68d6u9efae161d1b647ca@mail.gmail.com>
Message-ID: <FAEFC901-BAE1-4D56-8FEE-6098E307DAF2@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 13, 2007, at 4:15 PM, Guido van Rossum wrote:

> I've seen similar behavior in MS VC++ (long ago, dunno what it does
> these days). It would read files with \r\n and \n line endings, and
> whenever you edited a line, that line also got a \r\n ending. But
> unchanged lines that started out with \n-only endings would keep the
> \n only. And there was no way for the end user to see or control this.
>
> To emulate this behavior in Python you'd have to read the file in
> binary mode *or* we'd have to have an additional flag specifying to
> return line endings as encountered in the file. The newlines attribute
> (as defined in 2.x) doesn't help, because it doesn't tell which lines
> used which line ending. I think the newline feature in PEP 3116 falls
> short too; it seems mostly there to override the line ending *written*
> (from the default os.linesep).
>
> I think we may need different flags for input and for output.
>
> For input, we'd need two things: (a) which are acceptable line
> endings; (b) whether to translate acceptable line endings to \n or
> not. For output, we need two things again: (c) whether to translate
> line endings at all; (d) which line endings to translate. I guess we
> could map (c) to (b) and (d) to (a) for a signature that's the same
> for input and output (and makes sense for read+write files as well).
> The default would be (a)=={'\n', '\r\n', '\r'} and (b)==True.

I haven't thought about the output side of the equation, but I've  
already hit a situation where I'd like to see the input side (b)  
option implemented.

I'm still sussing out the email package changes (down to 7F/9E of 247  
tests!) but in trying to fix things I found myself wanting to open  
files in text mode so that I got strings out of the file instead of  
bytes.  This was all fine except that some of the tests started  
failing because of the EOL translation that happens unconditionally  
now.   The file contained \r\n and the test was ensuring these EOLs  
were preserved in the parsed text.  I switched back to opening the  
file in binary mode, and doing a crufty conversion of bytes to  
strings (which I suspect is error prone but gets me farther along).

It would have been perfect, I think, if I could have opened the file  
in text mode so that read() gave me strings, with universal newlines  
and preservation of line endings (i.e. no translation to \n).
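
For reference, a sketch of how this can be spelled in the Python 3 that
later shipped, where passing newline="" to the text layer keeps universal
newline recognition but returns line endings untranslated. That behavior
postdates this message; `read_text` is a hypothetical helper, and an
in-memory buffer stands in for a real file.

```python
import io

raw = b"first\r\nsecond\nthird\r"


def read_text(data, newline):
    """Decode bytes through a text wrapper using the given newline mode."""
    buffer = io.BytesIO(data)
    with io.TextIOWrapper(buffer, encoding="ascii", newline=newline) as f:
        return f.read()


# Default mode: every recognized line ending is translated to '\n'.
translated = read_text(raw, None)

# newline="": strings come back, but line endings are left as found.
preserved = read_text(raw, "")

print(repr(translated))
print(repr(preserved))
print(preserved.splitlines(keepends=True))
```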

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRsG1CXEjvBPtnXfVAQKF3AP/X+/E44KI2EB3w0i3N5cGBCajJbMV93fk
j2S/lfQf4tjBH3ZFEhUnybcJxsNukYY65T4MdzKh+IgJHV5s0rQtl2Hzr85e7Y0O
i5Z3N4TAKc11PjSIk6vKrkgwPCEMzvwIQ5DFxeQBF5kOF6cZuXKaeDzB6z/GBYNv
YiJEnOeZkW8=
=u6OL
-----END PGP SIGNATURE-----

From rrr at ronadam.com  Tue Aug 14 16:20:26 2007
From: rrr at ronadam.com (Ron Adam)
Date: Tue, 14 Aug 2007 09:20:26 -0500
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46C15FF7.8020106@livinglogic.de>
References: <46B13ADE.7080901@acm.org>	<46B35295.1030007@acm.org>	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>	<46B39BBF.80809@ronadam.com>	<46B46DE1.1090403@canterbury.ac.nz>	<46B4950F.40905@ronadam.com>	<46B4A9A0.9070206@ronadam.com>	<46B52422.2090006@canterbury.ac.nz>	<ca471dc20708041909j3f86b107hace9407673c302ea@mail.gmail.com>	<46B568F3.9060105@ronadam.com>	<46B66BE0.7090005@canterbury.ac.nz>	<46B6851C.1030204@ronadam.com>	<46B6C219.4040900@canterbury.ac.nz>	<46B6DABB.3080509@ronadam.com>	<46B7CD8C.5070807@acm.org>	<46B8D58E.5040501@ronadam.com>	<46BAC2D9.2020902@acm.org>	<46BAEFB0.9050400@ronadam.com>	<46BBB1AE.5010207@canterbury.ac.nz>	<46BBEB16.2040205@ronadam.com>	<46BC1479.30405@canterbury.ac.nz>	<46BCA9C9.1010306@ronadam.com>	<46BD5BD8.7030706@acm.org>	<46BF22D8.2090309@trueblade.com>	<46BF3CC7.6010405@acm.org>	<46C0EEBF.3010206@ronadam.com>	<46C10D2D.60705@canterbury.ac.nz>
	<46C12526.8040807@ronadam.com> <46C15FF7.8020106@livinglogic.de>
Message-ID: <46C1BA2A.9050303@ronadam.com>



Walter Dörwald wrote:

>> Let's take apart the alternative syntax.
>>
>>      f<+15.2
>>
>>        f   fixed point     # of decimals is specified
>>
>>        <   align left      (field attribute)
>>
>>        +   sign            (number attribute)
>>
>>        15  width           (field attribute)
>>
>>        .2  decimals        (number attribute)
> 
> Then why not have something more readable like
> 
>    al;s+;w15;d2
> 
> This is longer than <+15.2, but IMHO much more readable, because it's
> clear where each specifier ends and begins.
> 
> Servus,
>    Walter

Well, depending on what it's for, that might very well be appropriate.  It's
order independent and has other benefits, but it's probably the other
extreme in the case of string formatting.

It's been expressed here quite a few times that compactness is also
desirable.  By dividing it into two terms, you still get the compactness in
the most common cases, and you get terms that are easier to read and
understand in the more complex cases.



Or, if you format fields dynamically by inserting components, it makes
things a bit easier.

BTW, the order of the grouping is flexible...

       value_spec = "f.2"
       "{0:{1},^30}".format(value, value_spec)

Or...

      field_spec = "^30"
      "{0:f.2,{1}}".format(value, field_spec)


So it breaks it up into logical parts as well.
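
For comparison, here is a small sketch of building a format spec from
components.  It assumes the str.format() syntax that eventually shipped in
Python 3 (nested replacement fields, with the type letter last, as in
"^30.2f"), not the comma-separated terms proposed above.  The variable
names are only for illustration.

```python
# Assumption: final str.format() mini-language, not the "f.2,^30" proposal.
value = 3.14159

# Insert the whole spec dynamically through a nested replacement field.
spec = "^30.2f"
whole = "{0:{1}}".format(value, spec)

# Or insert only one component (here the field width).
width = 30
partial = "{0:^{1}.2f}".format(value, width)

print(repr(whole))
print(repr(partial))
```

Both calls produce the same 30-character field, so the split into a
"value" part and a "field" part can still be expressed, just with a
different spelling.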


Cheers,
    Ron






From barry at python.org  Tue Aug 14 17:39:29 2007
From: barry at python.org (Barry Warsaw)
Date: Tue, 14 Aug 2007 11:39:29 -0400
Subject: [Python-3000] Questions about email bytes/str (python 3000)
In-Reply-To: <200708140422.36818.victor.stinner@haypocalc.com>
References: <200708140422.36818.victor.stinner@haypocalc.com>
Message-ID: <E8DCAEF8-B7F8-4946-8256-AD0732492C51@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 13, 2007, at 10:22 PM, Victor Stinner wrote:

> After many tests, I'm unable to convert the email module to Python
> 3000. I'm also
> unable to decide on the best type for some contents.

I made a lot of progress on the email package while I was traveling,  
though I haven't checked things in yet.  I probably will very soon,  
even if I haven't yet fixed the last few remaining problems.  I'm  
down to 7 failures, 9 errors of 247 tests.

> (1) Should email parts be stored as byte or character strings?

Strings.  Email messages are conceptually strings so I think it makes  
sense to represent them internally as such.  The FeedParser should  
expect strings and the Generator should output strings.  One place  
where I think bytes should show up would be in decoded payloads, but
in that case I really want to make an API change so that
.get_payload(decode=True) is deprecated in favor of a separate method.

I'm proposing other API changes to make things work better, a few of  
which are in my current patch, but others I want to defer if they  
don't directly contribute to getting these tests to pass.

> Related methods: Generator class, Message.get_payload(),  
> Message.as_string().
>
> Let's take an example: multipart (MIME) email with latin-1 and  
> base64 (ascii)
> sections. Mix latin-1 and ascii => mix bytes. So the best type  
> should be
> bytes.
>
> => bytes

Except that by the time they're parsed into an email message, they  
must be ascii, either encoded as base64 or quoted-printable.  We also  
have to know at that point the charset being used, so I think it  
makes sense to keep everything as strings.

> (2) Parsing file (raw string): use bytes or str in parsing?
>
> The parser uses methods related to str like splitlines(), lower(),
> strip(). But
> it should be easy to rewrite/avoid these methods. I think that
> low-level
> parsing should be done on bytes. At the end, or when we know the
> charset, we
> can convert to str.
>
> => bytes

Maybe, though I'm not totally convinced.  It's certainly easier to  
get the tests to pass if we stick with parsing strings.   
email.message_from_string() should continue to accept strings,  
otherwise obviously it would have to be renamed, but also because  
its primary use case is turning a triple quoted string literal into
an email message.

I alluded to the one crufty part of this in a separate thread.  In  
order to accept universal newlines but preserve end-of-line  
characters, you currently have to open files in binary mode.  Then,  
because my parser works on strings you have to convert those bytes to  
strings, which I am successfully doing now, but which I suspect is  
ultimately error prone.  I would like to see a flag to preserve line  
endings on files opened in text + universal newlines mode, and then I  
think the hack for Parser.parse() would go away.  We'd define how  
files passed to this method must be opened.  Besides, I think it is  
much more common to be parsing strings into email messages anyway.

> About base64, I agree with Bill Janssen:
>  - base64MIME.decode converts string to bytes
>  - base64MIME.encode converts bytes to string

I agree.
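
A minimal sketch of that direction of conversion, using the standard
library base64 module as a stand-in (an assumption: the email package's
own base64MIME helpers are not shown here, and the payload is invented
for illustration):

```python
import base64

payload = b"\x00\xffbinary payload"

# encode: bytes -> str.  Base64 output is pure ASCII, so this decode
# step cannot fail.
text = base64.b64encode(payload).decode("ascii")

# decode: str -> bytes.  b64decode() also accepts ASCII-only str input.
restored = base64.b64decode(text)

print(text)
print(restored == payload)
```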

> But decode may accept bytes as input (as the base64 module does): use
> str(value, 'ascii', 'ignore') or str(value, 'ascii', 'strict').

Hmm, I'm not sure about this, but I think that .encode() may have to  
accept strings.

> I wrote 4 different (non-working) patches. So if you want to work
> on the email
> module and Python 3000, please contact me first. When I get a
> better
> patch, I will submit it.

Like I said, I also have an extensive patch that gets me most of the  
way there.  I don't want to have dueling patches, so I think what
I'll do is put a branch in the sandbox and apply my changes there for  
now.  Then we will have real code to discuss.

A few other things from my notes and diff:

Do we need email.message_from_bytes() and Message.as_bytes()?  While  
I'm (currently <wink>) pretty well convinced that email messages  
should be strings, the use case for bytes includes reading them  
directly to or from sockets, though in this case because the RFCs  
generally require ascii with encodings and charsets clearly  
described, I think a bytes-to-string wrapper may suffice.

Charset class: How do we do conversions from input charset to output  
charset?  This is required by e.g. Japanese to go from euc-jp to  
iso-2022-jp IIUC.  Currently I have to use a crufty string-to-bytes  
converter like so:

 >>> bytes(ord(c) for c in s)

rather than just bytes(s).  I'm sure there's a better way I haven't  
found yet.
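
One possible alternative, sketched under the assumption that every
character in the string has a code point below 256: encoding to Latin-1
gives the same bytes as the generator expression.  Charset conversion
can then go through str, decoding with the input charset and encoding
with the output charset.  The sample strings are invented for
illustration.

```python
text = "caf\xe9"  # every code point is below 256

# The generator-based conversion quoted above...
crufty = bytes(ord(c) for c in text)
# ...matches Latin-1, which maps code points 0-255 to single bytes.
assert crufty == text.encode("latin-1")

# Converting between charsets: decode with the input charset, then
# encode with the output charset.
japanese = "\u3053\u3093\u306b\u3061\u306f"  # sample hiragana text
euc_bytes = japanese.encode("euc-jp")
jis_bytes = euc_bytes.decode("euc-jp").encode("iso-2022-jp")

print(crufty)
print(jis_bytes)
```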

Generator._write_headers() and the _is8bitstring() test aren't really
appropriate or correct now that everything's unicode.  This
affected quite a few tests because long headers that previously were
getting split were now not getting split.  I ended up ditching the
_is8bitstring() test, but that led me into an API change for
Message.__str__() and Message.as_string(), which I've long wanted to
do anyway.  First, Message.__str__() no longer includes the Unix-From
header, but more importantly, .as_string() takes the maxheaderlen as
an argument and defaults to no header wrapping.  By changing various
related tests to call .as_string(maxheaderlen=78), these split header
tests can be made to pass again.  I think these changes make
str(some_message) saner and more explicit (because it does not split
headers) but these may be controversial in the email-sig.

You asked earlier about decode_header().  This should definitely  
return a list of tuples of (bytes, charset|None).

Header is going to need some significant revision.  First, there's the
whole mess of .encode() vs. __str__() vs. __unicode__() to sort out.   
It's insane that the latter two had different semantics w.r.t.  
whitespace preservation between encoded words, so let's fix that.   
Also, if the common use case is to do something like this:

 >>> msg['subject'] = 'a subject string'

then I wonder if we shouldn't be doing more sanity checking on the  
header value.  For example, if the value had a non-ascii character in  
it, then what should we do?  One way would be to throw an exception,  
requiring the use of something like:

 >>> msg['subject'] = Header('a \xfc subject', 'utf-8')

or we could do the most obvious thing and try to convert to 'ascii'  
then 'utf-8' if no charset is given explicitly.  I thought about  
always turning headers into Header instances, but I think that might  
break some common use cases.  It might be possible to define equality  
and other operations on Header instances so that these common cases  
continue to work.  The email-sig can address that later.
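
As a rough illustration of the explicit-Header route, here is a sketch
against the email package as it exists in current Python 3 (an
assumption: the behaviour discussed in this message was still being
designed, so this shows where things ended up, not the patch itself):

```python
from email.header import Header, decode_header, make_header
from email.message import Message

subject = Header("a \xfc subject", "utf-8")

msg = Message()
msg["subject"] = subject

# Encoding the header yields RFC 2047 encoded words, so the wire form
# is plain ASCII.
encoded = subject.encode()

# decode_header() returns (value, charset) pairs; make_header() turns
# them back into a Header whose str() is the original text.
parts = decode_header(encoded)
roundtrip = str(make_header(parts))

print(encoded)
print(parts)
```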

However, if all Header instances are unicode and have a valid  
charset, I wonder if the splittable tests are still relevant, and  
whether we can simplify header splitting.  I have to think about this  
some more.

As for the remaining failures and errors, they come down to  
simplifying the splittable logic, dealing with Message.__str__() vs.  
Message.__unicode__(), verifying that the UnicodeErrors some tests
expect to get raised don't make sense any more, and fixing a couple of
other small issues I haven't gotten to yet.

I will create a sandbox branch and apply my changes later today so we  
have something concrete to look at.

Cheers,
- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRsHMsXEjvBPtnXfVAQLfCwP8CeHi9RBW5ULri3w6sBz5a1fkdVCftk71
uW8q0LercTJSa2ewvtrlWdKm9F403IabYjh2Bg8cZfHmYyZ+/b18oU64zzkZylo/
pHw9Iyvk9ZW6G7mwJRwpV9c6JXJNvsQtKRWipuue0ZMagI5OJBXR8vhRIDGkt+NC
ARhIrHXPEW8=
=DBLp
-----END PGP SIGNATURE-----

From adam at hupp.org  Tue Aug 14 18:16:21 2007
From: adam at hupp.org (Adam Hupp)
Date: Tue, 14 Aug 2007 11:16:21 -0500
Subject: [Python-3000] [Python-Dev] Universal newlines support in Python
	3.0
In-Reply-To: <FAEFC901-BAE1-4D56-8FEE-6098E307DAF2@python.org>
References: <ca471dc20708101123y3628e352n7116839dbfdc9bb3@mail.gmail.com>
	<87wsw3p5em.fsf@uwakimon.sk.tsukuba.ac.jp>
	<rowen-D017E8.10460713082007@sea.gmane.org>
	<ca471dc20708131315i6dba68d6u9efae161d1b647ca@mail.gmail.com>
	<FAEFC901-BAE1-4D56-8FEE-6098E307DAF2@python.org>
Message-ID: <20070814161621.GA26420@mouth.upl.cs.wisc.edu>

On Tue, Aug 14, 2007 at 09:58:32AM -0400, Barry Warsaw wrote:
> This was all fine except that some of the tests started  
> failing because of the EOL translation that happens unconditionally  
> now.   The file contained \r\n and the test was ensuring these EOLs  
> were preserved in the parsed text.  I switched back to opening the  
> file in binary mode, and doing a crufty conversion of bytes to  
> strings (which I suspect is error prone but gets me farther along).
> 
> It would have been perfect, I think, if I could have opened the file  
> in text mode so that read() gave me strings, with universal newlines  
> and preservation of line endings (i.e. no translation to \n).

FWIW this same issue (and solution) came up while fixing the csv
tests.

-- 
Adam Hupp | http://hupp.org/adam/


From martin at v.loewis.de  Tue Aug 14 18:35:56 2007
From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 14 Aug 2007 18:35:56 +0200
Subject: [Python-3000] PEP 3131: Identifier syntax
Message-ID: <46C1D9EC.70902@v.loewis.de>

I'm trying to finalize PEP 3131, and want to collect proposals on
modifications of the identifier syntax. I will ignore any proposals
that suggest that different versions of the syntax should be used
depending on various conditions; I'm only asking for modifications
to the current proposed syntax.

So far, I recall two specific suggestions which I have now
incorporated into the PEP: usage of NFKC instead of NFC, and
usage of XID_Start and XID_Continue instead of ID_Start and
ID_Continue (although I'm still uncertain on how precisely
these properties are defined).
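
A small sketch of what those two choices mean in practice, using only
the standard library as it exists in current Python 3 (the sample names
are invented for illustration):

```python
import unicodedata

# NFKC folds compatibility characters into their plain equivalents.
ligature_name = "\ufb01le"  # starts with U+FB01 LATIN SMALL LIGATURE FI
normalized = unicodedata.normalize("NFKC", ligature_name)

# str.isidentifier() reports whether a string is a valid identifier.
checks = {name: name.isidentifier()
          for name in ("caf\xe9", "x1", "1x", "a-b")}

print(normalized)
print(checks)
```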

What other changes should be implemented?

Regards,
Martin

From guido at python.org  Tue Aug 14 18:41:48 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 14 Aug 2007 09:41:48 -0700
Subject: [Python-3000] Format specifier proposal
In-Reply-To: <20070814022805.4a2b44a2.ajwade+py3k@andrew.wade.networklinux.net>
References: <46BD79EC.1020301@acm.org> <46C095CF.2060507@ronadam.com>
	<46C11DDF.2080607@acm.org>
	<ca471dc20708132053h1fa1c18ai87d61d8b85aadf07@mail.gmail.com>
	<20070814022805.4a2b44a2.ajwade+py3k@andrew.wade.networklinux.net>
Message-ID: <ca471dc20708140941o1428bda9w9aa1f9be772d80e5@mail.gmail.com>

On 8/13/07, Andrew James Wade <andrew.j.wade at gmail.com> wrote:
> On Mon, 13 Aug 2007 20:53:26 -0700
> "Guido van Rossum" <guido at python.org> wrote:
>
> ...
>
> > One of my favorite examples of non-numeric types are the date, time
> > and datetime types from the datetime module; here I propose that their
> > __format__ be defined like this:
> >
> >   def __format__(self, spec):
> >       return self.strftime(spec)
>
> You lose the ability to align the field then. What about:
>
>     def __format__(self, align_spec, spec="%Y-%m-%d %H:%M:%S"):
>         return format(self.strftime(spec), align_spec)
>
> with
>
>     def format(value, spec):
>         if "," in spec:
>             align_spec, custom_spec = spec.split(",",1)
>             return value.__format__(align_spec, custom_spec)
>         else:
>             return value.__format__(spec)
>
> ":,%Y-%m-%d" may be slightly more gross than ":%Y-%m-%d", but on the plus
> side ":30" would mean the same thing across all types.

Sorry, I really don't like imposing *any* syntactic constraints on the
spec apart from !r and !s.

You can get the default format with a custom size by using !s:30.

If you want a custom format *and* padding, just add extra spaces to the spec.
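
A sketch of how these options look with the datetime type in current
Python 3, where __format__ passes a non-empty spec to strftime() (the
timestamp is just an example value):

```python
from datetime import datetime

moment = datetime(2007, 8, 14, 9, 41, 48)

# The spec is handed straight to strftime().
plain = format(moment, "%Y-%m-%d")

# !s converts to str first, so the ordinary string spec (width 30) applies.
default_30 = "{0!s:30}".format(moment)

# A custom format plus alignment can be had by formatting twice:
# first the datetime, then the resulting string.
aligned = "{0:>30}".format(format(moment, "%Y-%m-%d %H:%M:%S"))

print(repr(plain))
print(repr(default_30))
print(repr(aligned))
```

Formatting twice is one way to recover alignment without putting any
syntactic constraints on the spec itself.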

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Aug 14 18:52:47 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 14 Aug 2007 09:52:47 -0700
Subject: [Python-3000] [Python-Dev] Universal newlines support in Python
	3.0
In-Reply-To: <FAEFC901-BAE1-4D56-8FEE-6098E307DAF2@python.org>
References: <ca471dc20708101123y3628e352n7116839dbfdc9bb3@mail.gmail.com>
	<87wsw3p5em.fsf@uwakimon.sk.tsukuba.ac.jp>
	<rowen-D017E8.10460713082007@sea.gmane.org>
	<ca471dc20708131315i6dba68d6u9efae161d1b647ca@mail.gmail.com>
	<FAEFC901-BAE1-4D56-8FEE-6098E307DAF2@python.org>
Message-ID: <ca471dc20708140952g75cb6099xc7beed59f415b80@mail.gmail.com>

On 8/14/07, Barry Warsaw <barry at python.org> wrote:
> It would have been perfect, I think, if I could have opened the file
> in text mode so that read() gave me strings, with universal newlines
> and preservation of line endings (i.e. no translation to \n).

You can do that already, by passing newline="\n" to the open()
function when using text mode. Try this script for a demo:

f = open("@", "wb")
# bytes literals, since the file is opened in binary mode
f.write(b"bare nl\n"
        b"crlf\r\n"
        b"bare nl\n"
        b"crlf\r\n")
f.close()

f = open("@", "r")  # default, universal newlines mode
print(f.readlines())
f.close()

f = open("@", "r", newline="\n")  # recognize only \n as newline
print(f.readlines())
f.close()

This outputs:

['bare nl\n', 'crlf\n', 'bare nl\n', 'crlf\n']
['bare nl\n', 'crlf\r\n', 'bare nl\n', 'crlf\r\n']

Now, this doesn't support bare \r as line terminator, but I doubt you
care much about that (unless you want to port the email package to Mac
OS 9 :-).
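
For reference, a sketch of the same experiment against the open() that
shipped in Python 3, where newline="" keeps universal newline
recognition (including bare \r) but turns translation off.  The
temporary file and its contents are invented for illustration.

```python
import os
import tempfile

data = b"bare nl\ncrlf\r\nbare cr\r"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "wb") as f:
        f.write(data)

    # newline="" recognizes \n, \r\n and \r, but returns them untranslated.
    with open(path, "r", newline="") as f:
        preserved = f.readlines()

    # The default (newline=None) translates every line ending to "\n".
    with open(path, "r") as f:
        translated = f.readlines()

print(preserved)
print(translated)
```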

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From lists at cheimes.de  Tue Aug 14 19:22:28 2007
From: lists at cheimes.de (Christian Heimes)
Date: Tue, 14 Aug 2007 19:22:28 +0200
Subject: [Python-3000] No (C) optimization flag
In-Reply-To: <46BDAC43.3050904@benjiyork.com>
References: <f9hakp$6n4$1@sea.gmane.org> <46BD1185.2080702@canterbury.ac.nz>
	<46BDAC43.3050904@benjiyork.com>
Message-ID: <f9socq$cb2$1@sea.gmane.org>

Benji York wrote:
>> But wouldn't the only reason you want to step into,
>> e.g. pickle be if there were a bug in pickle itself?
> 
> I believe he's talking about a situation where pickle calls back into 
> Python.

Yes, Benji is right. In the past I ran into trouble with pickles two or
three times. I was successfully able to debug and resolve my problem with the
pickle module and pdb. I'd like to keep the option in the Python 3.0
series. In my opinion it is very useful to step through Python code to
see how the code is supposed to work.

I'm trying to get involved in the Python core development process. It
seems that I'm not ready yet to contribute new ideas because I'm missing
the big picture. On the other hand I don't know how I can contribute to
existing sub projects for Py3k. I find it difficult to get in. :/

Christian


From brett at python.org  Tue Aug 14 20:16:25 2007
From: brett at python.org (Brett Cannon)
Date: Tue, 14 Aug 2007 11:16:25 -0700
Subject: [Python-3000] No (C) optimization flag
In-Reply-To: <f9socq$cb2$1@sea.gmane.org>
References: <f9hakp$6n4$1@sea.gmane.org> <46BD1185.2080702@canterbury.ac.nz>
	<46BDAC43.3050904@benjiyork.com> <f9socq$cb2$1@sea.gmane.org>
Message-ID: <bbaeab100708141116x29b98bb1w2533b83ee894f85a@mail.gmail.com>

On 8/14/07, Christian Heimes <lists at cheimes.de> wrote:
> Benji York wrote:
> >> But wouldn't the only reason you want to step into,
> >> e.g. pickle be if there were a bug in pickle itself?
> >
> > I believe he's talking about a situation where pickle calls back into
> > Python.
>
> Yes, Benji is right. In the past I ran into trouble with pickles two or
> three times. I was successfully able to debug and resolve my problem with the
> pickle module and pdb. I'd like to keep the option in the Python 3.0
> series. In my opinion it is very useful to step through Python code to
> see how the code is supposed to work.
>
> I'm trying to get involved in the Python core development process. It
> seems that I'm not ready yet to contribute new ideas because I'm missing
> the big picture.

Just stick around for a while and you will pick up on a general theme
in how decisions are made.

> On the other hand I don't know how I can contribute to
> existing sub projects for Py3k. I find it difficult to get in. :/

Well, don't force it unless you like the subproject.  If you are just
looking for something to do there are always bugs to squash or patches
to evaluate.  Otherwise I would suggest just waiting until something
comes along that grabs your attention and bugging anyone else who is
working on it for any guidance you need.

Yes, it can take a little while to get into the groove, but we are all
nice guys and are happy to answer your questions.

-Brett

From jimjjewett at gmail.com  Tue Aug 14 21:33:20 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Tue, 14 Aug 2007 15:33:20 -0400
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46C0EEBF.3010206@ronadam.com>
References: <46B13ADE.7080901@acm.org> <46BAEFB0.9050400@ronadam.com>
	<46BBB1AE.5010207@canterbury.ac.nz> <46BBEB16.2040205@ronadam.com>
	<46BC1479.30405@canterbury.ac.nz> <46BCA9C9.1010306@ronadam.com>
	<46BD5BD8.7030706@acm.org> <46BF22D8.2090309@trueblade.com>
	<46BF3CC7.6010405@acm.org> <46C0EEBF.3010206@ronadam.com>
Message-ID: <fb6fbf560708141233k193e36d0rae99e1decff42f11@mail.gmail.com>

On 8/13/07, Ron Adam <rrr at ronadam.com> wrote:
> I reconsidered the split term forms a bit more and I think I've come up
> with a better way to think about them. Sometimes a slight conceptual
> shift can make a difference.

> The basic form is:

>        {name[:][type][alignment_term][,content_modifying_term]}

That sounds good, but make sure it works in practice; I think you were
already tempted to violate it yourself in your details section.  You
used the (alignment term) width as the number of digits before the
decimal, instead of as the field width.

> TYPE:
>         The specifier type.  One of 'deifsrx'.
>                 (and any others I left off)

There should be one that says "just trust the object, and if it
doesn't have a __format__, then gripe".  (We wouldn't need to support
arbitrary content_modifying_terms unless it were possible to use more
than the builtin types.)

> ALIGNMENT TERM:   [direction]width[/fill]
>
>      direction:    is one of   '<^>'
>      width:        a positive integer
>      /fill:         a character


So this assumes fixed-width, with fill?

Can I leave the width off to say "whatever it takes"?

Can I say "width=whatever it takes, up to 72 chars ... but don't pad
it if you don't need to"?

(And once you support variable-width, then minimum is needed again.)

I'm not sure that variable lengths and alignment even *should* be
supported in the same expression, but forcing everything to
fixed-width would be enough of a change that it needs an explicit
callout.

>      NUMBERS:   [sign][0][digits][.decimals][%]

I read Greg's question:

    How does this work with formats where the number of
    digits before the decimal can vary, but before+after
    is constant?

differently, as about significant figures.  It may be that 403 and
14.1 are both valid values, but 403.0 would imply too much precision.
(Would it *always* be OK to write these as 4.03e+2 and 1.41e+1?)

Maybe the answer is that sig figs are a special case, and need a
template with callbacks instead of a format string ... but that
doesn't feel right.

>      /fill:         a character

I think you need to specify it a bit more than that.  Can you use a
comma?  (It looks like the start of the content modifier.)  How about
a quote-mark, or a carriage return?

-jJ

From g.brandl at gmx.net  Tue Aug 14 22:19:06 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Tue, 14 Aug 2007 22:19:06 +0200
Subject: [Python-3000] Documentation switch imminent
Message-ID: <f9t2nn$ksg$1@sea.gmane.org>

Now that the converted documentation is fairly bug-free, I want to
make the switch.

I will replace the old Doc/ trees in the trunk and py3k branches
tomorrow, moving over the reST ones found at
svn+ssh://svn.python.org/doctools/Doc-{26,3k}.

Neal will change his build scripts, so that the 2.6 and 3.0 devel
documentation pages at docs.python.org will be built from these new
trees soon.

Information for people who will write docs in the new trees can be found in the
new "Documenting Python" document, at the moment still available from
http://pydoc.gbrandl.de:3000/documenting/, especially the "Differences"
section at http://pydoc.gbrandl.de:3000/documenting/fromlatex/ (which
is not complete, patches are welcome :)

Cheers,
Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From lists at cheimes.de  Tue Aug 14 22:46:56 2007
From: lists at cheimes.de (Christian Heimes)
Date: Tue, 14 Aug 2007 22:46:56 +0200
Subject: [Python-3000] Documentation switch imminent
In-Reply-To: <f9t2nn$ksg$1@sea.gmane.org>
References: <f9t2nn$ksg$1@sea.gmane.org>
Message-ID: <46C214C0.3050703@cheimes.de>

Georg Brandl wrote:
> Information for people who will write docs in the new trees can be found in the
> new "Documenting Python" document, at the moment still available from
> http://pydoc.gbrandl.de:3000/documenting/, especially the "Differences"
> section at http://pydoc.gbrandl.de:3000/documenting/fromlatex/ (which
> is not complete, patches are welcome :)

http://pydoc.gbrandl.de:3000/documenting/fromlatex/ doesn't work for me:

Keyword Not Found

The keyword documenting/fromlatex is not directly associated with a page.


Christian


From g.brandl at gmx.net  Tue Aug 14 22:55:36 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Tue, 14 Aug 2007 22:55:36 +0200
Subject: [Python-3000] Documentation switch imminent
In-Reply-To: <46C214C0.3050703@cheimes.de>
References: <f9t2nn$ksg$1@sea.gmane.org> <46C214C0.3050703@cheimes.de>
Message-ID: <f9t4s4$sn9$1@sea.gmane.org>

Christian Heimes wrote:
> Georg Brandl wrote:
>> Information for people who will write docs in the new trees can be found in the
>> new "Documenting Python" document, at the moment still available from
>> http://pydoc.gbrandl.de:3000/documenting/, especially the "Differences"
>> section at http://pydoc.gbrandl.de:3000/documenting/fromlatex/ (which
>> is not complete, patches are welcome :)
> 
> http://pydoc.gbrandl.de:3000/documenting/fromlatex/ doesn't work for me:
> 
> Keyword Not Found
> 
> The keyword documenting/fromlatex is not directly associated with a page.

Oops... should be fixed now.

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From rrr at ronadam.com  Wed Aug 15 00:53:17 2007
From: rrr at ronadam.com (Ron Adam)
Date: Tue, 14 Aug 2007 17:53:17 -0500
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <fb6fbf560708141233k193e36d0rae99e1decff42f11@mail.gmail.com>
References: <46B13ADE.7080901@acm.org> <46BAEFB0.9050400@ronadam.com>	
	<46BBB1AE.5010207@canterbury.ac.nz> <46BBEB16.2040205@ronadam.com>	
	<46BC1479.30405@canterbury.ac.nz>
	<46BCA9C9.1010306@ronadam.com>	 <46BD5BD8.7030706@acm.org>
	<46BF22D8.2090309@trueblade.com>	 <46BF3CC7.6010405@acm.org>
	<46C0EEBF.3010206@ronadam.com>
	<fb6fbf560708141233k193e36d0rae99e1decff42f11@mail.gmail.com>
Message-ID: <46C2325D.1010209@ronadam.com>



Jim Jewett wrote:
> On 8/13/07, Ron Adam <rrr at ronadam.com> wrote:
>> I reconsidered the split term forms a bit more and I think I've come up
>> with a better way to think about them. Sometimes a slight conceptual
>> shift can make a difference.
> 
>> The basic form is:
> 
>>        {name[:][type][alignment_term][,content_modifying_term]}
> 
> That sounds good, but make sure it works in practice; I think you were
> already tempted to violate it yourself in your details section.  You
> used the (alignment term) width as the number of digits before the
> decimal, instead of as the field width.

I have a test version to test these ideas with.  It uses a centralized
parser rather than the distributed method that has been decided on, but it's
good for testing the syntax.  In some ways it's easier for that.

You can leave out either term.  So that may have been what you were seeing.


There are more details that can be worked out.

You can also switch the order of the terms.  But I'm leaning towards having 
it always process the terms left to right.  For example you might have...

    {0:!r,s+30,<30}

So in this case, it would first do a repr(), then trim long strings to 30 
characters, then left align them in a field 30 characters wide.

But I haven't tested the idea of left to right sequential formatting yet as 
I need to move a lot of stuff around in my test implementation to get that 
to work.
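
For comparison, the same three steps (repr, trim to 30, left align in
30) can be sketched in the str.format() syntax that eventually shipped.
This is an assumption about the final mini-language, not the
{0:!r,s+30,<30} spelling above: the conversion goes before the colon,
and the precision (.30) does the trimming.

```python
# Assumption: final str.format() syntax, not the proposal in this thread.
fmt = "{0!r:<30.30}"

short = fmt.format("ab")       # repr is 4 chars, padded out to 30
long_ = fmt.format("x" * 50)   # repr is 52 chars, trimmed to 30

print(repr(short))
print(repr(long_))
```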

Even with that, the most common cases are just single terms, so it adds 
capability for those who want it, or need it, without penalizing newbies.

     {0:f8.2}
   or
     {0:^30}

Simple expressions like these will be what is used 9 times out of 10.


>> TYPE:
>>         The specifier type.  One of 'deifsrx'.
>>                 (and any others I left off)
> 
> There should be one that says "just trust the object, and if it
> doesn't have a __format__, then gripe".  (We wouldn't need to support
> arbitrary content_modifying_terms unless it were possible to use more
> than the builtin types.)

That's the default behavior since the object gets it first.


>> ALIGNMENT TERM:   [direction]width[/fill]
>>
>>      direction:    is one of   '<^>'
>>      width:        a positive integer
>>      /fill:         a character
> 
> 
> So this assumes fixed-width, with fill?

Minimal width with fill for shorter than width items.  It expands if the 
length of the item is longer than width.
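
A short sketch of that minimum-width behaviour, written in the
str.format() syntax that eventually shipped (an assumption: there the
fill character comes before the alignment character, rather than
after a '/' as in this proposal):

```python
# Shorter than the width: padded with the fill character.
padded = "{0:*<10}".format("abc")

# Longer than the width: the field simply expands, nothing is cut off.
unchanged = "{0:*<10}".format("abcdefghijkl")

# No width at all: "whatever it takes".
no_width = "{0}".format("abc")

print(padded, unchanged, no_width)
```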


> Can I leave the width off to say "whatever it takes"?

Yes

> Can I say "width=whatever it takes, up to 72 chars ... but don't pad
> it if you don't need to"?

That's the default behavior.

> (And once you support variable-width, then minimum is needed again.)
> 
> I'm not sure that variable lengths and alignment even *should* be
> supported in the same expression, but it forcing everything to
> fixed-width would be enough of a change that it needs an explicit
> callout.

Alignment is needed for when the length of the value is shorter than the
length of the field.  So if a field has a minimal width, and a value is
shorter than that, the alignment will be used.


>>      NUMBERS:   [sign][0][digits][.decimals][%]
> 
> I read Greg's question:
> 
>     How does this work with formats where the number of
>     digits before the decimal can vary, but before+after
>     is constant?
> 
> differently, as about significant figures.  It may be that 403 and
> 14.1 are both valid values, but 403.0 would imply too much precision.
> (Would it *always* be OK to write these as 4.03e+2 and 1.41e+1?)

I've been avoiding scientific notation so far.  :-)  So any suggestions on 
this part will be good.  I think the format would be 'e1.2' or even just 
'e.2'.  It should follow the same pattern if possible.

> Maybe the answer is that sig figs are a special case, and need a
> template with callbacks instead of a format string ... but that
> doesn't feel right.
> 
>>      /fill:         a character
> 
> I think you need to specify it a bit more than that.  Can you use a
> comma? 

Yes, any character after a '/' works.


> (It looks like the start of the content modifier.)  How about
> a quote-mark, or a carriage return?

Quotes (if they match the string delimiters) will need to be escaped as
well as new lines, but I don't see any reason why they couldn't be used. 
I'm not sure why you would want to use those, but I lean towards letting 
the programmer figure out that part. ;-)


Cheers,
    Ron





From brett at python.org  Wed Aug 15 01:57:19 2007
From: brett at python.org (Brett Cannon)
Date: Tue, 14 Aug 2007 16:57:19 -0700
Subject: [Python-3000] Documentation switch imminent
In-Reply-To: <f9t2nn$ksg$1@sea.gmane.org>
References: <f9t2nn$ksg$1@sea.gmane.org>
Message-ID: <bbaeab100708141657n16a012fbqdceece1dff992a06@mail.gmail.com>

On 8/14/07, Georg Brandl <g.brandl at gmx.net> wrote:
> Now that the converted documentation is fairly bug-free, I want to
> make the switch.
>
> I will replace the old Doc/ trees in the trunk and py3k branches
> tomorrow, moving over the reST ones found at
> svn+ssh://svn.python.org/doctools/Doc-{26,3k}.

First, that address is wrong; missing a 'trunk' in there.

Second, are we going to keep the docs in a separate tree forever, or
is this just for now?  I am not thinking so much about the tools, but
whether we will need to do two separate commits in order to make code
changes *and* change the docs?  Or are you going to add an externals
dependency in the trees to their respective doc directories?

-Brett

From greg.ewing at canterbury.ac.nz  Wed Aug 15 02:02:35 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 15 Aug 2007 12:02:35 +1200
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46C12526.8040807@ronadam.com>
References: <46B13ADE.7080901@acm.org> <46B2E265.5080905@ronadam.com>
	<46B35295.1030007@acm.org>
	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>
	<46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz>
	<46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com>
	<46B52422.2090006@canterbury.ac.nz>
	<ca471dc20708041909j3f86b107hace9407673c302ea@mail.gmail.com>
	<46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz>
	<46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz>
	<46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org>
	<46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org>
	<46BAEFB0.9050400@ronadam.com> <46BBB1AE.5010207@canterbury.ac.nz>
	<46BBEB16.2040205@ronadam.com> <46BC1479.30405@canterbury.ac.nz>
	<46BCA9C9.1010306@ronadam.com> <46BD5BD8.7030706@acm.org>
	<46BF22D8.2090309@trueblade.com> <46BF3CC7.6010405@acm.org>
	<46C0EEBF.3010206@ronadam.com> <46C10D2D.60705@canterbury.ac.nz>
	<46C12526.8040807@ronadam.com>
Message-ID: <46C2429B.1090507@canterbury.ac.nz>

Ron Adam wrote:
> 
> Greg Ewing wrote:
> >
> > How does this work with formats where the number of
> > digits before the decimal can vary, but before+after
> > is constant?
> 
> I think this is what you're looking for.
> 
>    f>15,.3   #15 field width, 3 decimal places, right aligned.

No, I'm talking about formats such as "g" where the
number of significant digits is fixed, but the position
of the decimal point can change depending on the magnitude
of the number. That wouldn't fit into your before.after
format.
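The behaviour described here can be seen with the "g" presentation type
in the format-spec syntax that eventually shipped with str.format().  A
minimal sketch, assuming a current Python 3 interpreter; the helper name
sig4 is made up for illustration and is not part of any proposal in this
thread:

```python
def sig4(x):
    """Format x with 4 significant digits using the 'g' presentation type."""
    return format(x, ".4g")


samples = [0.00012345678, 12.345678, 1234.5678, 1234567.0]
for x in samples:
    # The decimal point moves (or an exponent appears) with the magnitude,
    # while the number of significant digits stays at four.
    print(sig4(x))
```

The number of digits before and after the point varies from sample to
sample, which is why a fixed before.after layout cannot describe it.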

>> Also, my feeling about the whole of this is that
>> it's too complicated.
>
> it's because we are doing a lot in a very little space.

Yes, and I think you're trying to do a bit too much.
The format strings are starting to look like line
noise.

--
Greg

From greg.ewing at canterbury.ac.nz  Wed Aug 15 02:15:24 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 15 Aug 2007 12:15:24 +1200
Subject: [Python-3000] Format specifier proposal
In-Reply-To: <ca471dc20708132053h1fa1c18ai87d61d8b85aadf07@mail.gmail.com>
References: <46BD79EC.1020301@acm.org> <46C095CF.2060507@ronadam.com>
	<46C11DDF.2080607@acm.org>
	<ca471dc20708132053h1fa1c18ai87d61d8b85aadf07@mail.gmail.com>
Message-ID: <46C2459C.1000405@canterbury.ac.nz>

Guido van Rossum wrote:
> Over lunch we discussed putting !coercion first. IMO {foo!r:20} reads
> more naturally from left to right

It also has the advantage that the common case of
'r' with no other specifications is one character
shorter and looks tidier, i.e. {foo!r} rather than
{foo:!r}.

But either way, I suspect I'll find it difficult
to avoid writing it as {foo:r} in the heat of the
moment.

> On 8/13/07, Talin <talin at acm.org> wrote:
> > Where 'coercion' can be 'r' (to convert to repr()), 's' (to convert to
> > string.)

Is there ever a case where you would need to
convert to a string?

> > Originally I liked the idea of putting the type letter at the front,
> > instead of at the back like it is in 2.5. However, when you think about
> > it, it actually makes sense to have it at the back.

I'm not so sure about that. Since most of the time
it's going to be used as a discriminator that determines
how the rest of the format spec is interpreted, it
could make more sense to have it at the front.

The only reason it's at the back in % formats is
because that's the only way of telling where the
format spec ends. We don't have that problem here.

> > 6) Finally, Guido stressed that he wants to make sure that the
> > implementation supports fields within fields, such as:
> >
> >    {0:{1}.{2}}

Is that recursive? In other words, can the
nested {} contain another full format spec?

--
Greg


From guido at python.org  Wed Aug 15 02:27:37 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 14 Aug 2007 17:27:37 -0700
Subject: [Python-3000] Format specifier proposal
In-Reply-To: <46C2459C.1000405@canterbury.ac.nz>
References: <46BD79EC.1020301@acm.org> <46C095CF.2060507@ronadam.com>
	<46C11DDF.2080607@acm.org>
	<ca471dc20708132053h1fa1c18ai87d61d8b85aadf07@mail.gmail.com>
	<46C2459C.1000405@canterbury.ac.nz>
Message-ID: <ca471dc20708141727t1d2505ffhc30d2e89ecdf0da3@mail.gmail.com>

On 8/14/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Guido van Rossum wrote:
> > Over lunch we discussed putting !coercion first. IMO {foo!r:20} reads
> > more naturally from left to right
>
> It also has the advantage that the common case of
> 'r' with no other specifications is one character
> shorter and looks tidier, i.e. {foo!r} rather than
> {foo:!r}.
>
> But either way, I suspect I'll find it difficult
> to avoid writing it as {foo:r} in the heat of the
> moment.

I guess __format__ implementations should fall back to a default
formatting spec rather than raising an exception when they don't
understand the spec passed to them.
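A rough sketch of what such a fallback could look like, assuming a
current Python 3 interpreter.  The Celsius class is purely hypothetical,
and note that the built-in numeric types in released Python raise
ValueError for an unknown format code rather than falling back:

```python
class Celsius:
    """Hypothetical type whose __format__ falls back instead of raising."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __str__(self):
        return "{0} C".format(self.degrees)

    def __format__(self, spec):
        try:
            # Delegate the spec to the underlying float.
            return format(self.degrees, spec) + " C"
        except ValueError:
            # Spec not understood (e.g. a stray 'r'): use the default form.
            return str(self)


print(format(Celsius(21.5), ".1f"))
print(format(Celsius(21.5), "r"))
```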

> > On 8/13/07, Talin <talin at acm.org> wrote:
> > > Where 'coercion' can be 'r' (to convert to repr()), 's' (to convert to
> > > string.)
>
> Is there ever a case where you would need to
> convert to a string?

When the default output produced by __format__ is different from that
produced by __str__ or __repr__ (hard to imagine), or when you want to
use a string-specific option to pad or truncate (more likely).

> > > Originally I liked the idea of putting the type letter at the front,
> > > instead of at the back like it is in 2.5. However, when you think about
> > > it, it actually makes sense to have it at the back.
>
> I'm not so sure about that. Since most of the time
> it's going to be used as a discriminator that determines
> how the rest of the format spec is interpreted, it
> could make more sense to have it at the front.

This can be decided on a type-by-type basis (except for numbers, which
should follow the built-in numbers' example).

> The only reason it's at the back in % formats is
> because that's the only way of telling where the
> format spec ends. We don't have that problem here.

But it's still more familiar to read "10.3f" than "f10.3".

> > > 6) Finally, Guido stressed that he wants to make sure that the
> > > implementation supports fields within fields, such as:
> > >
> > >    {0:{1}.{2}}
>
> Is that recursive? In other words, can the
> nested {} contain another full format spec?

We were trying not to open that can of worms. It's probably
unnecessary to support that, but it may be easier to support it than
to forbid it, and I don't see anything wrong with it.
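For reference, the points discussed above can be tried in the
str.format() that later shipped.  This is a small sketch assuming a
current Python 3 interpreter; the variable names are illustrative only,
and an explicit "f" type letter is appended to the nested example so the
precision means decimal places:

```python
value = 3.14159

# Nested fields: width and precision are taken from arguments 1 and 2.
nested = "{0:{1}.{2}f}".format(value, 10, 3)

# Conversion first, then the spec: repr() the value, then right-align it.
as_repr = "{0!r:>8}".format("ab")

# !s makes string-specific options such as truncation available.
truncated = "{0!s:.3}".format(12345)

print(repr(nested), repr(as_repr), repr(truncated))
```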

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg.ewing at canterbury.ac.nz  Wed Aug 15 02:56:54 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 15 Aug 2007 12:56:54 +1200
Subject: [Python-3000] Format specifier proposal
In-Reply-To: <ca471dc20708140941o1428bda9w9aa1f9be772d80e5@mail.gmail.com>
References: <46BD79EC.1020301@acm.org> <46C095CF.2060507@ronadam.com>
	<46C11DDF.2080607@acm.org>
	<ca471dc20708132053h1fa1c18ai87d61d8b85aadf07@mail.gmail.com>
	<20070814022805.4a2b44a2.ajwade+py3k@andrew.wade.networklinux.net>
	<ca471dc20708140941o1428bda9w9aa1f9be772d80e5@mail.gmail.com>
Message-ID: <46C24F56.5050104@canterbury.ac.nz>

Guido van Rossum wrote:
> On 8/13/07, Andrew James Wade <andrew.j.wade at gmail.com> wrote:
> 
>>On Mon, 13 Aug 2007 20:53:26 -0700
>>"Guido van Rossum" <guido at python.org> wrote:

>>>I propose that their
>>>__format__ be defined like this:
>>>
>>>  def __format__(self, spec):
>>>      return self.strftime(spec)
>>
>>You lose the ability to align the field then.

This might be a use case for the chaining of format specs
that Ron mentioned. Suppose you could do

   "{{1:spec1}:spec2}".format(x)

which would be equivalent to

   format(format(x, "spec1"), "spec2")

then you could do

   "{{1:%Y-%m-%d %H:%M:%S}:<20}".format(my_date)

and get your date left-aligned in a 20-wide field.
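The nested-brace syntax is only a suggestion at this point, but the
equivalent pair of format() calls can be sketched directly.  This assumes
a current Python 3 interpreter, where datetime objects pass their spec to
strftime(); the sample date is arbitrary:

```python
import datetime

my_date = datetime.datetime(2007, 8, 15, 12, 56, 54)

# Stage 1: the datetime's own __format__ interprets the strftime spec.
stamp = format(my_date, "%Y-%m-%d %H:%M:%S")

# Stage 2: the resulting str is left-aligned in a 20-wide field.
field = format(stamp, "<20")

print(repr(field))
```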

(BTW, I'm not sure about using strftime-style formats
as-is, since the % chars look out of place in our new
format syntax.)

--
Greg

From greg.ewing at canterbury.ac.nz  Wed Aug 15 02:59:45 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 15 Aug 2007 12:59:45 +1200
Subject: [Python-3000] [Python-Dev] Universal newlines support in Python
	3.0
In-Reply-To: <ca471dc20708140952g75cb6099xc7beed59f415b80@mail.gmail.com>
References: <ca471dc20708101123y3628e352n7116839dbfdc9bb3@mail.gmail.com>
	<87wsw3p5em.fsf@uwakimon.sk.tsukuba.ac.jp>
	<rowen-D017E8.10460713082007@sea.gmane.org>
	<ca471dc20708131315i6dba68d6u9efae161d1b647ca@mail.gmail.com>
	<FAEFC901-BAE1-4D56-8FEE-6098E307DAF2@python.org>
	<ca471dc20708140952g75cb6099xc7beed59f415b80@mail.gmail.com>
Message-ID: <46C25001.6080806@canterbury.ac.nz>

Guido van Rossum wrote:

> Now, this doesn't support bare \r as line terminator, but I doubt you
> care much about that (unless you want to port the email package to Mac
> OS 9 :-).

Haven't we decided that '\r' still occurs in some
cases even on MacOSX?

--
Greg

From guido at python.org  Wed Aug 15 03:41:47 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 14 Aug 2007 18:41:47 -0700
Subject: [Python-3000] [Python-Dev] Universal newlines support in Python
	3.0
In-Reply-To: <46C25001.6080806@canterbury.ac.nz>
References: <ca471dc20708101123y3628e352n7116839dbfdc9bb3@mail.gmail.com>
	<87wsw3p5em.fsf@uwakimon.sk.tsukuba.ac.jp>
	<rowen-D017E8.10460713082007@sea.gmane.org>
	<ca471dc20708131315i6dba68d6u9efae161d1b647ca@mail.gmail.com>
	<FAEFC901-BAE1-4D56-8FEE-6098E307DAF2@python.org>
	<ca471dc20708140952g75cb6099xc7beed59f415b80@mail.gmail.com>
	<46C25001.6080806@canterbury.ac.nz>
Message-ID: <ca471dc20708141841k6a6b7ab5n49df9a34a0e63000@mail.gmail.com>

On 8/14/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Guido van Rossum wrote:
>
> > Now, this doesn't support bare \r as line terminator, but I doubt you
> > care much about that (unless you want to port the email package to Mac
> > OS 9 :-).
>
> Haven't we decided that '\r' still occurs in some
> cases even on MacOSX?

Yes.

I was simply describing what works today. The \r option still needs to
be added to io.py. But it is in the PEP.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From janssen at parc.com  Wed Aug 15 03:44:54 2007
From: janssen at parc.com (Bill Janssen)
Date: Tue, 14 Aug 2007 18:44:54 PDT
Subject: [Python-3000] Questions about email bytes/str (python 3000)
In-Reply-To: <E8DCAEF8-B7F8-4946-8256-AD0732492C51@python.org> 
References: <200708140422.36818.victor.stinner@haypocalc.com>
	<E8DCAEF8-B7F8-4946-8256-AD0732492C51@python.org>
Message-ID: <07Aug14.184454pdt."57996"@synergy1.parc.xerox.com>

> > Let's take an example: multipart (MIME) email with latin-1 and  
> > base64 (ascii)
> > sections. Mix latin-1 and ascii => mix bytes. So the best type  
> > should be
> > bytes.
> >
> > => bytes
> 
> Except that by the time they're parsed into an email message, they  
> must be ascii, either encoded as base64 or quoted-printable.  We also  
> have to know at that point the charset being used, so I think it  
> makes sense to keep everything as strings.

Actually, Victor's right here -- it makes more sense to treat them as
bytes.  It's RFC 821 (SMTP) that requires 7-bit ASCII, not the MIME
format.  Non-SMTP mail transports do exist, and are popular in various
places.  Email transported via other transport mechanisms may, for
instance, use a Content-Transfer-Encoding of "binary" for some
sections of the message.  Some parts of the top-most header of the
message may be counted on to be encoded as ASCII strings, but not the
whole message in general.

> > About base64, I agree with Bill Janssen:
> >  - base64MIME.decode converts string to bytes
> >  - base64MIME.encode converts bytes to string
> 
> I agree.
> 
> > But decode may accept bytes as input (as base64 modules does): use
> > str(value, 'ascii', 'ignore') or str(value, 'ascii', 'strict').
> 
> Hmm, I'm not sure about this, but I think that .encode() may have to  
> accept strings.

Personally, I think it would avoid more errors if it didn't.  Let the
user explicitly encode the string to a particular representation
before calling base64.encode().
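A minimal sketch of that discipline, using the base64 module as it
exists in current Python 3 (not the email package's base64MIME helpers
discussed above); the sample text and charset are arbitrary:

```python
import base64

text = "caf\u00e9"

# The caller picks the charset explicitly before encoding to base64.
payload = text.encode("latin-1")
encoded = base64.b64encode(payload)      # bytes in, bytes out

# Decoding returns bytes; turning them back into text is a separate step.
decoded = base64.b64decode(encoded).decode("latin-1")

try:
    base64.b64encode(text)               # passing a str is rejected
    rejected = False
except TypeError:
    rejected = True

print(encoded, decoded, rejected)
```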

Bill

From rrr at ronadam.com  Wed Aug 15 03:58:59 2007
From: rrr at ronadam.com (Ron Adam)
Date: Tue, 14 Aug 2007 20:58:59 -0500
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46C2429B.1090507@canterbury.ac.nz>
References: <46B13ADE.7080901@acm.org>
	<46B35295.1030007@acm.org>	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>	<46B39BBF.80809@ronadam.com>
	<46B46DE1.1090403@canterbury.ac.nz>	<46B4950F.40905@ronadam.com>
	<46B4A9A0.9070206@ronadam.com>	<46B52422.2090006@canterbury.ac.nz>	<ca471dc20708041909j3f86b107hace9407673c302ea@mail.gmail.com>	<46B568F3.9060105@ronadam.com>
	<46B66BE0.7090005@canterbury.ac.nz>	<46B6851C.1030204@ronadam.com>
	<46B6C219.4040900@canterbury.ac.nz>	<46B6DABB.3080509@ronadam.com>
	<46B7CD8C.5070807@acm.org>	<46B8D58E.5040501@ronadam.com>
	<46BAC2D9.2020902@acm.org>	<46BAEFB0.9050400@ronadam.com>
	<46BBB1AE.5010207@canterbury.ac.nz>	<46BBEB16.2040205@ronadam.com>
	<46BC1479.30405@canterbury.ac.nz>	<46BCA9C9.1010306@ronadam.com>
	<46BD5BD8.7030706@acm.org>	<46BF22D8.2090309@trueblade.com>
	<46BF3CC7.6010405@acm.org>	<46C0EEBF.3010206@ronadam.com>
	<46C10D2D.60705@canterbury.ac.nz>	<46C12526.8040807@ronadam.com>
	<46C2429B.1090507@canterbury.ac.nz>
Message-ID: <46C25DE3.6060906@ronadam.com>



Greg Ewing wrote:
> Ron Adam wrote:
>> Greg Ewing wrote:
>  >
>>> How does this work with formats where the number of
>>> digits before the decimal can vary, but before+after
>>> is constant?
>> I think this is what you're looking for.
>>
>>    f>15,.3   #15 field width, 3 decimal places, right aligned.
> 
> No, I'm talking about formats such as "g" where the
> number of significant digits is fixed, but the position
> of the decimal point can change depending on the magnitude
> of the number. That wouldn't fit into your before.after
> format.

It would probably just be a number with no decimal point in it.  Something
like 'g10' seems simple enough.  You will always have the 'g' in this case.


>>> Also, my feeling about the whole of this is that
>>> it's too complicated.
>> it's because we are doing a lot in a very little space.
> 
> Yes, and I think you're trying to do a bit too much.
> The format strings are starting to look like line
> noise.

Do you have a specific example or is it just an overall feeling?


One of the motivations for finding something else is because the % 
formatting terms are confusing to some. A few here have said they need to 
look them up repeatedly and have difficulty remembering the exact forms and 
order.

And part of it is the suggestion of splitting it up into a part that is
interpreted by the object's __format__ method, and a part that is
interpreted by the format function.  For example, the field alignment
part can be handled by the format function, and the value format part can
be handled by the __format__ method.  It helps to have the alignment part
be well defined and completely separate from the content formatter part in
this case.  And it saves everyone from having to parse and implement
alignments in their __format__ methods.  I think that is really the biggest
reason to do this.

I'm not sure you can split up field aligning and numeric formatting that 
way when using the % style formatting.  They are combined too tightly.  So 
each type would need to do both in its __format__ method.  And chances are
there will be many types that do one or the other but not both just because 
it's too much work, or just due to plain laziness.

So before we discard this, I'd like to see a full working version with 
complete __format__ methods for int, float, and str types and any 
supporting functions they may use.

And my apologies if it's starting to seem like line noise.  I'm not that
good at explaining things in simple ways.  I tend to add too much detail 
when I don't need to, or not enough when I do.  A complaint I get often 
enough.  But I think this one is fixable by anyone who is a bit better at 
writing and explaining things in simple ways than I am. :-)

Cheers,
   Ron

From rrr at ronadam.com  Wed Aug 15 04:12:32 2007
From: rrr at ronadam.com (Ron Adam)
Date: Tue, 14 Aug 2007 21:12:32 -0500
Subject: [Python-3000] Format specifier proposal
In-Reply-To: <46C24F56.5050104@canterbury.ac.nz>
References: <46BD79EC.1020301@acm.org>
	<46C095CF.2060507@ronadam.com>	<46C11DDF.2080607@acm.org>	<ca471dc20708132053h1fa1c18ai87d61d8b85aadf07@mail.gmail.com>	<20070814022805.4a2b44a2.ajwade+py3k@andrew.wade.networklinux.net>	<ca471dc20708140941o1428bda9w9aa1f9be772d80e5@mail.gmail.com>
	<46C24F56.5050104@canterbury.ac.nz>
Message-ID: <46C26110.8020001@ronadam.com>



Greg Ewing wrote:
> Guido van Rossum wrote:
>> On 8/13/07, Andrew James Wade <andrew.j.wade at gmail.com> wrote:
>>
>>> On Mon, 13 Aug 2007 20:53:26 -0700
>>> "Guido van Rossum" <guido at python.org> wrote:
> 
>>>> I propose that their
>>>> __format__ be defined like this:
>>>>
>>>>  def __format__(self, spec):
>>>>      return self.strftime(spec)
>>> You lose the ability to align the field then.
> 
> This might be a use case for the chaining of format specs
> that Ron mentioned. Suppose you could do
> 
>    "{{1:spec1}:spec2}".format(x)
> 
> which would be equivalent to
> 
>    format(format(x, "spec1"), "spec2")


What I was thinking of was just a simple left to right evaluation order.

     "{0:spec1, spec2, ... }".format(x)

I don't expect this will ever get very long.


> then you could do
> 
>    "{{1:%Y-%m-%d %H:%M:%S}:<20}".format(my_date)
> 
> and get your date left-aligned in a 20-wide field.

So in this case all you would need is...

     {0:%Y-%m-%d %H:%M:%S,<20}



> (BTW, I'm not sure about using strftime-style formats
> as-is, since the % chars look out of place in our new
> format syntax.)
> 
> --
> Greg

From andrew.j.wade at gmail.com  Wed Aug 15 05:02:27 2007
From: andrew.j.wade at gmail.com (Andrew James Wade)
Date: Tue, 14 Aug 2007 23:02:27 -0400
Subject: [Python-3000] Format specifier proposal
In-Reply-To: <ca471dc20708140941o1428bda9w9aa1f9be772d80e5@mail.gmail.com>
References: <46BD79EC.1020301@acm.org> <46C095CF.2060507@ronadam.com>
	<46C11DDF.2080607@acm.org>
	<ca471dc20708132053h1fa1c18ai87d61d8b85aadf07@mail.gmail.com>
	<20070814022805.4a2b44a2.ajwade+py3k@andrew.wade.networklinux.net>
	<ca471dc20708140941o1428bda9w9aa1f9be772d80e5@mail.gmail.com>
Message-ID: <20070814230227.0c9be356.ajwade+py3k@andrew.wade.networklinux.net>

On Tue, 14 Aug 2007 09:41:48 -0700
"Guido van Rossum" <guido at python.org> wrote:

> On 8/13/07, Andrew James Wade <andrew.j.wade at gmail.com> wrote:
> > On Mon, 13 Aug 2007 20:53:26 -0700
> > "Guido van Rossum" <guido at python.org> wrote:
> >
> > ...
> >
> > > One of my favorite examples of non-numeric types are the date, time
> > > and datetime types from the datetime module; here I propose that their
> > > __format__ be defined like this:
> > >
> > >   def __format__(self, spec):
> > >       return self.strftime(spec)
> >
> > You lose the ability to align the field then. What about:
> >
> >     def __format__(self, align_spec, spec="%Y-%m-%d %H:%M:%S"):
> >         return format(self.strftime(spec), align_spec)
> >
> > with
> >
> >     def format(value, spec):
> >         if "," in spec:
> >             align_spec, custom_spec = spec.split(",",1)
> >             return value.__format__(align_spec, custom_spec)
> >         else:
> >             return value.__format__(spec)
> >
> > ":,%Y-%m-%d" may be slightly more gross than ":%Y-%m-%d", but on the plus
> > side ":30" would mean the same thing across all types.
> 
> Sorry, I really don't like imposing *any* syntactic constraints on the
> spec apart from !r and !s.

Does this mean that {1!30:%Y-%m-%d} would be legal syntax, that __format__
can do what it pleases with? That'd be great: there's an obvious place
for putting standard fields, and another for putting custom formatting
where collisions with !r and !s are not a concern:
{1!30}
{1:%Y-%m-%d}
{1:!renewal date: %Y-%m-%d} # no special meaning for ! here.
{1!30:%Y-%m-%d}

"!" wouldn't necessarily have to be followed by standard codes, but I'm
not sure why you'd want to put anything else (aside from !r, !s) there.

> You can get the default format with a custom size by using !s:30.
>
> If you want a custom format *and* padding, just add extra spaces to the spec.

That doesn't work for ":%A" or ":%B"; not if you want to pad to a fixed
width. I really think you'll see support for the standard
string-formatting codes appear in most formatting specifications in some
guise or another; they may as well appear in a standard place too.

-- Andrew

From barry at python.org  Wed Aug 15 05:44:26 2007
From: barry at python.org (Barry Warsaw)
Date: Tue, 14 Aug 2007 23:44:26 -0400
Subject: [Python-3000] [Python-Dev] Universal newlines support in Python
	3.0
In-Reply-To: <ca471dc20708140952g75cb6099xc7beed59f415b80@mail.gmail.com>
References: <ca471dc20708101123y3628e352n7116839dbfdc9bb3@mail.gmail.com>
	<87wsw3p5em.fsf@uwakimon.sk.tsukuba.ac.jp>
	<rowen-D017E8.10460713082007@sea.gmane.org>
	<ca471dc20708131315i6dba68d6u9efae161d1b647ca@mail.gmail.com>
	<FAEFC901-BAE1-4D56-8FEE-6098E307DAF2@python.org>
	<ca471dc20708140952g75cb6099xc7beed59f415b80@mail.gmail.com>
Message-ID: <D74D8B6A-C467-45C0-B7CB-94E9F30FA83F@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 14, 2007, at 12:52 PM, Guido van Rossum wrote:

> On 8/14/07, Barry Warsaw <barry at python.org> wrote:
>> It would have been perfect, I think, if I could have opened the file
>> in text mode so that read() gave me strings, with universal newlines
>> and preservation of line endings (i.e. no translation to \n).
>
> You can do that already, by passing newline="\n" to the open()
> function when using text mode.

Cute, but obscure.  I'm not sure I like it as the ultimate way of  
spelling these semantics.

> Try this script for a demo:
>
> f = open("@", "wb")
> f.write("bare nl\n"
>         "crlf\r\n"
>         "bare nl\n"
>         "crlf\r\n")
> f.close()
>
> f = open("@", "r")  # default, universal newlines mode
> print(f.readlines())
> f.close()
>
> f = open("@", "r", newline="\n")  # recognize only \n as newline
> print(f.readlines())
> f.close()
>
> This outputs:
>
> ['bare nl\n', 'crlf\n', 'bare nl\n', 'crlf\n']
> ['bare nl\n', 'crlf\r\n', 'bare nl\n', 'crlf\r\n']
>
> Now, this doesn't support bare \r as line terminator, but I doubt you
> care much about that (unless you want to port the email package to Mac
> OS 9 :-).

Naw, I don't, though someday we'll get just such a file and a bug  
report about busted line endings ;).

There's still a problem though: this works for .readlines() but not  
for .read() which unconditionally converts \r\n to \n.  The  
FeedParser uses .read() and I think the behavior should be the same  
for both methods.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRsJ2mnEjvBPtnXfVAQIL8AP/YhVUAoR9yWMniTUls5thI4ubUmPJlln4
R2cDOCw97lsYEDBk80bS2d/ZgncG5EnleIBmg+UtkEoSduhTOLZjot3cgmfy1DqX
LHFfUCe8AnHLjuZBV7RbOcpn14X8fGtqNkYq25yvyOIvIYdIBP64ZjbyFD+kZhTA
Ss8e10D+YJw=
=otBw
-----END PGP SIGNATURE-----

From guido at python.org  Wed Aug 15 06:03:40 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 14 Aug 2007 21:03:40 -0700
Subject: [Python-3000] [Python-Dev] Universal newlines support in Python
	3.0
In-Reply-To: <D74D8B6A-C467-45C0-B7CB-94E9F30FA83F@python.org>
References: <ca471dc20708101123y3628e352n7116839dbfdc9bb3@mail.gmail.com>
	<87wsw3p5em.fsf@uwakimon.sk.tsukuba.ac.jp>
	<rowen-D017E8.10460713082007@sea.gmane.org>
	<ca471dc20708131315i6dba68d6u9efae161d1b647ca@mail.gmail.com>
	<FAEFC901-BAE1-4D56-8FEE-6098E307DAF2@python.org>
	<ca471dc20708140952g75cb6099xc7beed59f415b80@mail.gmail.com>
	<D74D8B6A-C467-45C0-B7CB-94E9F30FA83F@python.org>
Message-ID: <ca471dc20708142103p54b397a0n21b8cd524fc2a4b1@mail.gmail.com>

On 8/14/07, Barry Warsaw <barry at python.org> wrote:
> On Aug 14, 2007, at 12:52 PM, Guido van Rossum wrote:
> > On 8/14/07, Barry Warsaw <barry at python.org> wrote:
> >> It would have been perfect, I think, if I could have opened the file
> >> in text mode so that read() gave me strings, with universal newlines
> >> and preservation of line endings (i.e. no translation to \n).
> >
> > You can do that already, by passing newline="\n" to the open()
> > function when using text mode.
>
> Cute, but obscure.  I'm not sure I like it as the ultimate way of
> spelling these semantics.

It was the best we could come up with in the 3 minutes we devoted to
this at PyCon when drafting PEP 3116. If you have a better idea,
please don't hide it!

> > Try this script for a demo:
> >
> > f = open("@", "wb")
> > f.write("bare nl\n"
> >         "crlf\r\n"
> >         "bare nl\n"
> >         "crlf\r\n")
> > f.close()
> >
> > f = open("@", "r")  # default, universal newlines mode
> > print(f.readlines())
> > f.close()
> >
> > f = open("@", "r", newline="\n")  # recognize only \n as newline
> > print(f.readlines())
> > f.close()
> >
> > This outputs:
> >
> > ['bare nl\n', 'crlf\n', 'bare nl\n', 'crlf\n']
> > ['bare nl\n', 'crlf\r\n', 'bare nl\n', 'crlf\r\n']
> >
> > Now, this doesn't support bare \r as line terminator, but I doubt you
> > care much about that (unless you want to port the email package to Mac
> > OS 9 :-).
>
> Naw, I don't, though someday we'll get just such a file and a bug
> report about busted line endings ;).
>
> There's still a problem though: this works for .readlines() but not
> for .read() which unconditionally converts \r\n to \n.  The
> FeedParser uses .read() and I think the behavior should be the same
> for both methods.

Ow, that's a bug! I'll look into fixing it; this was unintentional!

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From talin at acm.org  Wed Aug 15 06:27:08 2007
From: talin at acm.org (Talin)
Date: Tue, 14 Aug 2007 21:27:08 -0700
Subject: [Python-3000] PEP 3101 Updated
Message-ID: <46C2809C.3000806@acm.org>

A new version is up, incorporating material from the various discussions 
on this list:

	http://www.python.org/dev/peps/pep-3101/

Diffs are here:

http://svn.python.org/view/peps/trunk/pep-3101.txt?rev=57044&r1=56535&r2=57044

-- Talin

From barry at python.org  Wed Aug 15 06:33:46 2007
From: barry at python.org (Barry Warsaw)
Date: Wed, 15 Aug 2007 00:33:46 -0400
Subject: [Python-3000] [Python-Dev] Universal newlines support in Python
	3.0
In-Reply-To: <ca471dc20708142103p54b397a0n21b8cd524fc2a4b1@mail.gmail.com>
References: <ca471dc20708101123y3628e352n7116839dbfdc9bb3@mail.gmail.com>
	<87wsw3p5em.fsf@uwakimon.sk.tsukuba.ac.jp>
	<rowen-D017E8.10460713082007@sea.gmane.org>
	<ca471dc20708131315i6dba68d6u9efae161d1b647ca@mail.gmail.com>
	<FAEFC901-BAE1-4D56-8FEE-6098E307DAF2@python.org>
	<ca471dc20708140952g75cb6099xc7beed59f415b80@mail.gmail.com>
	<D74D8B6A-C467-45C0-B7CB-94E9F30FA83F@python.org>
	<ca471dc20708142103p54b397a0n21b8cd524fc2a4b1@mail.gmail.com>
Message-ID: <7375B721-C62F-49ED-945B-5A9107246D6C@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 15, 2007, at 12:03 AM, Guido van Rossum wrote:

> On 8/14/07, Barry Warsaw <barry at python.org> wrote:
>> On Aug 14, 2007, at 12:52 PM, Guido van Rossum wrote:
>>> On 8/14/07, Barry Warsaw <barry at python.org> wrote:
>>>> It would have been perfect, I think, if I could have opened the  
>>>> file
>>>> in text mode so that read() gave me strings, with universal  
>>>> newlines
>>>> and preservation of line endings (i.e. no translation to \n).
>>>
>>> You can do that already, by passing newline="\n" to the open()
>>> function when using text mode.
>>
>> Cute, but obscure.  I'm not sure I like it as the ultimate way of
>> spelling these semantics.
>
> It was the best we could come up with in the 3 minutes we devoted to
> this at PyCon when drafting PEP 3116. If you have a better idea,
> please don't hide it!

I think you (almost) suggested it in your first message!  Add a flag  
called preserve_eols that defaults to False, is ignored unless  
universal newline mode is turned on, and when True, disables the  
replacement on input.

>> There's still a problem though: this works for .readlines() but not
>> for .read() which unconditionally converts \r\n to \n.  The
>> FeedParser uses .read() and I think the behavior should be the same
>> for both methods.
>
> Ow, that's a bug! I'll look into fixing it; this was unintentional!

Oh, excellent!  I think that will nicely take care of email package's  
needs and will allow me to remove the crufty conversion.

Thanks!
- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRsKCK3EjvBPtnXfVAQLgcgP/eFUci/jmPqEY5TDE1bUHgiMhY3F1GxXX
epYc4Q7wDOf05Ky1pjmRDRMkfQkalL/seP58IAW7b1FaWT98bSP56vrLcyuy+oje
23e7bqggEikfS/+E15U7E/xz+h1qbKdEr7c43/sl/s8flBE47MHXAI/sMKKfvS+6
kVqHKWXX0Lk=
=81z3
-----END PGP SIGNATURE-----

From andrew.j.wade at gmail.com  Wed Aug 15 06:42:04 2007
From: andrew.j.wade at gmail.com (Andrew James Wade)
Date: Wed, 15 Aug 2007 00:42:04 -0400
Subject: [Python-3000] Format specifier proposal
In-Reply-To: <46C26110.8020001@ronadam.com>
References: <46BD79EC.1020301@acm.org> <46C095CF.2060507@ronadam.com>
	<46C11DDF.2080607@acm.org>
	<ca471dc20708132053h1fa1c18ai87d61d8b85aadf07@mail.gmail.com>
	<20070814022805.4a2b44a2.ajwade+py3k@andrew.wade.networklinux.net>
	<ca471dc20708140941o1428bda9w9aa1f9be772d80e5@mail.gmail.com>
	<46C24F56.5050104@canterbury.ac.nz> <46C26110.8020001@ronadam.com>
Message-ID: <20070815004204.f986b4b1.ajwade+py3k@andrew.wade.networklinux.net>

On Tue, 14 Aug 2007 21:12:32 -0500
Ron Adam <rrr at ronadam.com> wrote:

> 
> 
> Greg Ewing wrote:
> > Guido van Rossum wrote:
> >> On 8/13/07, Andrew James Wade <andrew.j.wade at gmail.com> wrote:
> >>
> >>> On Mon, 13 Aug 2007 20:53:26 -0700
> >>> "Guido van Rossum" <guido at python.org> wrote:
> > 
> >>>> I propose that their
> >>>> __format__ be defined like this:
> >>>>
> >>>>  def __format__(self, spec):
> >>>>      return self.strftime(spec)
> >>> You lose the ability to align the field then.
> > 
> > This might be a use case for the chaining of format specs
> > that Ron mentioned. Suppose you could do
> > 
> >    "{{1:spec1}:spec2}".format(x)
> > 
> > which would be equivalent to
> > 
> >    format(format(x, "spec1"), "spec2")

That would be a solution to my concerns, though that would have to be:

    "{ {1:spec1}:spec2}"

> 
> 
> What I was thinking of was just a simple left to right evaluation order.
> 
>      "{0:spec1, spec2, ... }".format(x)
> 
> I don't expect this will ever get very long.

The first __format__ will return a str, so chains longer than 2 don't
make a lot of sense. And the delimiter character should be allowed in
spec1; limiting the length of the chain to 2 allows that without escaping:

    "{0:spec1-with-embedded-comma,}".format(x)

My scheme did the same sort of thing with spec1 and spec2 reversed.
Your order makes more intuitive sense; I chose my order because I
wanted the syntax to be a generalization of formatting strings.

Handling the chaining within the __format__ methods should be all of
two lines of boilerplate per method.
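As a rough illustration of that boilerplate, here is a sketch assuming a
current Python 3 interpreter.  The ChainedDate class and the
"spec1,spec2" convention are hypothetical and exist only in this
discussion; splitting on the last comma lets spec1 contain commas itself:

```python
import datetime


class ChainedDate(datetime.datetime):
    """Hypothetical datetime that accepts 'strftime-spec,align-spec'."""

    def __format__(self, spec):
        # Everything before the last comma is spec1, the rest is spec2.
        inner, sep, outer = spec.rpartition(",")
        if not sep:
            # No comma at all: behave like a plain datetime.
            return super().__format__(spec)
        return format(super().__format__(inner), outer)


d = ChainedDate(2007, 8, 15)
print(repr(format(d, "%Y-%m-%d,>12")))
print(repr(format(d, "%d, %Y,<12")))
```

A trailing comma, as in "spec1-with-embedded-comma,", leaves spec2 empty,
so the result of spec1 is returned unchanged.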

-- Andrew

From guido at python.org  Wed Aug 15 06:56:23 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 14 Aug 2007 21:56:23 -0700
Subject: [Python-3000] Proposed new language for newline parameter to
	TextIOBase
Message-ID: <ca471dc20708142156l9e7ef7ue33e314041d4ad90@mail.gmail.com>

I thought some more about the universal newlines situation, and I
think I can handle all the use cases with a single 'newline'
parameter. The use cases are:

(A) input use cases:

(1) newline=None: input with default universal newlines mode; lines
may end in \r, \n, or \r\n, and these are translated to \n.

(2) newline='': input with untranslated universal newlines mode; lines
may end in \r, \n, or \r\n, and these are returned untranslated.

(3) newline='\r', newline='\n', newline='\r\n': input lines must end
with the given character(s), and these are translated to \n.

(B) output use cases:

(1) newline=None: every \n written is translated to os.linesep.

(2) newline='': no translation takes place.

(3) newline='\r', newline='\n', newline='\r\n': every \n written is
translated to the value of newline.

Note that cases (2) are new, and case (3) changes from the current PEP
and/or from the current implementation (which seems to deviate from
the PEP).

Also note that it doesn't matter whether .readline(), .read() or
.read(N) is used. The PEP is currently unclear on this and the
implementation is wrong.
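
The use cases above can be sketched as two pure functions over
already-decoded text.  This is only an illustration of the proposed
rules, not the io implementation; the names translate_input and
translate_output are made up for the example.

```python
import os
import re

# The only values the proposal allows for the newline parameter.
LEGAL_NEWLINES = (None, '', '\n', '\r', '\r\n')


def translate_input(text, newline=None):
    """Apply the proposed input rules (A) to decoded text."""
    if newline not in LEGAL_NEWLINES:
        raise ValueError("illegal newline value: %r" % (newline,))
    if newline is None:
        # (A1) universal newlines, translated: CRLF and lone CR become LF.
        return re.sub('\r\n|\r', '\n', text)
    if newline == '':
        # (A2) universal newlines, untranslated: text passes through.
        return text
    # (A3) only the given terminator ends a line; it is returned as LF.
    return text.replace(newline, '\n')


def translate_output(text, newline=None):
    """Apply the proposed output rules (B) to text about to be written."""
    if newline not in LEGAL_NEWLINES:
        raise ValueError("illegal newline value: %r" % (newline,))
    if newline is None:
        # (B1) every LF written becomes the platform line separator.
        newline = os.linesep
    if newline == '':
        # (B2) no translation takes place.
        return text
    # (B3) every LF written becomes the requested terminator.
    return text.replace('\n', newline)
```

Because both functions work on whole strings, the result cannot
depend on whether the caller used .read(), .read(N) or .readline(),
which is the property the note above asks for.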

Proposed language for the PEP:


    ``.__init__(self, buffer, encoding=None, newline=None)``

        ``buffer`` is a reference to the ``BufferedIOBase`` object to
        be wrapped with the ``TextIOWrapper``.

        ``encoding`` refers to an encoding to be used for translating
        between the byte-representation and character-representation.
        If it is ``None``, then the system's locale setting will be
        used as the default.

        ``newline`` can be ``None``, ``''``, ``'\n'``, ``'\r'``, or
        ``'\r\n'``; all other values are illegal.  It controls the
        handling of line endings.  It works as follows:

        * On input, if ``newline`` is ``None``, universal newlines
          mode is enabled.  Lines in the input can end in ``'\n'``,
          ``'\r'``, or ``'\r\n'``, and these are translated into
          ``'\n'`` before being returned to the caller.  If it is
          ``''``, universal newline mode is enabled, but line endings
          are returned to the caller untranslated.  If it has any of
          the other legal values, input lines are only terminated by
          the given string, and the line ending is returned to the
          caller translated to ``'\n'``.

        * On output, if ``newline`` is ``None``, any ``'\n'``
          characters written are translated to the system default
          line separator, ``os.linesep``.  If ``newline`` is ``''``,
          no translation takes place.  If ``newline`` is any of the
          other legal values, any ``'\n'`` characters written are
          translated to the given string.

        Further notes on the ``newline`` parameter:

        * ``'\r'`` support is still needed for some OSX applications
          that produce files using ``'\r'`` line endings; Excel (when
          exporting to text) and Adobe Illustrator EPS files are the
          most common examples.

        * If translation is enabled, it happens regardless of which
          method is called for reading or writing.  For example,
          ``f.read()`` will always produce the same result as
          ``''.join(f.readlines())``.

        * If universal newlines without translation are requested on
          input (i.e. ``newline=''``), if a system read operation
          returns a buffer ending in ``'\r'``, another system read
          operation is done to determine whether it is followed by
          ``'\n'`` or not.  In universal newlines mode with
          translation, the second system read operation may be
          postponed until the next read request, and if the following
          system read operation returns a buffer starting with
          ``'\n'``, that character is simply discarded.
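
The last note, for universal newlines mode with translation, can be
sketched as a generator over the buffers returned by successive
system reads.  It is a rough illustration under the assumption that
the buffers are already decoded to str; translate_chunks and
pending_cr are hypothetical names, not part of any proposed API.

```python
def translate_chunks(chunks):
    """Yield each chunk with CRLF and lone CR rewritten to LF."""
    pending_cr = False  # True when the previous chunk ended in CR
    for chunk in chunks:
        if not chunk:
            continue
        if pending_cr and chunk[0] == '\n':
            # The CR was already emitted as LF; this LF was the second
            # half of a CRLF pair split across two reads, so drop it.
            chunk = chunk[1:]
        pending_cr = chunk.endswith('\r')
        translated = chunk.replace('\r\n', '\n').replace('\r', '\n')
        if translated:
            yield translated
```

No second read is forced when a buffer ends in CR: the decision is
simply postponed until the next buffer arrives.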


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From barry at python.org  Wed Aug 15 07:50:30 2007
From: barry at python.org (Barry Warsaw)
Date: Wed, 15 Aug 2007 01:50:30 -0400
Subject: [Python-3000] Questions about email bytes/str (python 3000)
In-Reply-To: <E8DCAEF8-B7F8-4946-8256-AD0732492C51@python.org>
References: <200708140422.36818.victor.stinner@haypocalc.com>
	<E8DCAEF8-B7F8-4946-8256-AD0732492C51@python.org>
Message-ID: <F4B589A6-09C3-490E-95A7-070E6E2CBCEF@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 14, 2007, at 11:39 AM, Barry Warsaw wrote:

> I will create a sandbox branch and apply my changes later today so  
> we have something concrete to look at.

Done.  See:

http://svn.python.org/view/sandbox/trunk/emailpkg/5_0-exp/

I'm down to 5 failures and 6 errors (in test_email.py only), and I  
think most if not all of them are related to the broken header  
splittable stuff.

Please take a look.
- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRsKUJnEjvBPtnXfVAQISBQQAnEKytL8fqLbe+HADIyIBr1gDFtzbc4nw
zY4oEDPV+d4zFiAj9Ap5uePCfQxnqRdBMsHhkbCkB9k0XSDoWv2NxC10KLdE2CEO
YMLB+BB5uMjTCkHhaUVr/rIdKv/4LKZFy1v9dJv5X3BF5clugWa3L+tioe0kPk9X
jDkjZKc59LE=
=73uN
-----END PGP SIGNATURE-----

From rrr at ronadam.com  Wed Aug 15 08:52:33 2007
From: rrr at ronadam.com (Ron Adam)
Date: Wed, 15 Aug 2007 01:52:33 -0500
Subject: [Python-3000] Format specifier proposal
In-Reply-To: <20070815004204.f986b4b1.ajwade+py3k@andrew.wade.networklinux.net>
References: <46BD79EC.1020301@acm.org>	<46C095CF.2060507@ronadam.com>	<46C11DDF.2080607@acm.org>	<ca471dc20708132053h1fa1c18ai87d61d8b85aadf07@mail.gmail.com>	<20070814022805.4a2b44a2.ajwade+py3k@andrew.wade.networklinux.net>	<ca471dc20708140941o1428bda9w9aa1f9be772d80e5@mail.gmail.com>	<46C24F56.5050104@canterbury.ac.nz>	<46C26110.8020001@ronadam.com>
	<20070815004204.f986b4b1.ajwade+py3k@andrew.wade.networklinux.net>
Message-ID: <46C2A2B1.1010708@ronadam.com>



Andrew James Wade wrote:
> On Tue, 14 Aug 2007 21:12:32 -0500
> Ron Adam <rrr at ronadam.com> wrote:

>> What I was thinking of was just a simple left to right evaluation order.
>>
>>      "{0:spec1, spec2, ... }".format(x)
>>
>> I don't expect this will ever get very long.
> 
> The first __format__ will return a str, so chains longer than 2 don't
> make a lot of sense. And the delimiter character should be allowed in
> spec1; limiting the length of the chain to 2 allows that without escaping:
> 
>     "{0:spec1-with-embedded-comma,}".format(x)
> 
> My scheme did the same sort of thing with spec1 and spec2 reversed.
> Your order makes more intuitive sense; I chose my order because I
> wanted the syntax to be a generalization of formatting strings.
 >
> Handling the chaining within the __format__ methods should be all of
> two lines of boilerplate per method.

I went ahead and tried this out and it actually cleared up some difficulty 
  in organizing the parsing code.  That was a very nice surprise. :)

     (actual doctest)

     >>> import time
     >>> class GetTime(object):
     ...     def __init__(self, time=time.gmtime()):
     ...         self.time = time
     ...     def __format__(self, spec):
     ...         return fstr(time.strftime(spec, self.time))

     >>> start = GetTime(time.gmtime(1187154773.0085449))

     >>> fstr("Start: {0:%d/%m/%Y %H:%M:%S,<30}").format(start)
     'Start: 15/08/2007 05:12:53           '

After each term is returned from the __format__ call, the result's 
__format__ method is called with the next specifier.  GetTime.__format__ 
returns a string.  str.__format__ aligns it.  A nice left to right 
sequence of events.

The chaining is handled before the __format__ method calls so each 
__format__ method only needs to be concerned with doing its own thing.

The alignment is no longer special cased as it's just part of the string 
formatter.  No other types need it as long as their __format__ methods 
return strings. Which means nobody needs to write parsers to handle field 
alignments.
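
A rough sketch of that left to right chaining, written against the
format() builtin of current Python 3 rather than the fstr prototype
used in the doctest.  chain_format is a hypothetical helper, and it
makes no attempt to escape commas inside a specifier.

```python
import time


def chain_format(value, spec):
    """Apply comma-separated format terms from left to right."""
    for term in spec.split(','):
        # Each term goes to the __format__ method of the previous result.
        value = format(value, term)
    return value


class GetTime:
    def __init__(self, t):
        self.time = t

    def __format__(self, spec):
        # Knows nothing about alignment; it only returns a string.
        return time.strftime(spec, self.time)


start = GetTime(time.gmtime(1187154773.0085449))
line = 'Start: ' + chain_format(start, '%d/%m/%Y %H:%M:%S,<30')
```

The first term is consumed by GetTime.__format__; the second, '<30',
is consumed by str.__format__, which pads the date to 30 characters.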

If you had explicit conversions for other types besides !r and !s, it might 
be useful to do things like the following.  Suppose you had text data with 
floats in it along with some other junk.  You could do the following...

      # Purposely longish example just to show sequence of events.

      "The total is: ${0:s-10,!f,(.2),>12}".format(line)

Which would grab 10 characters from the end of the line, convert them to a 
float, call the float's __format__ method to format it to 2 decimal 
places, and then right align the result in a field 12 characters wide.

That could be shortened to {0:s-10,f(.2),>12} as long as string types know 
how to convert to float.  Or if you want the () to line up on both sides, 
you'd probably just use {0:s-10,f(7.2)}.

Along with the nested substitutions Guido wants, this would be a 
pretty powerful mini formatting language like the one Talin hinted at 
earlier.

I don't think there is any need to limit the number of terms, that sort of 
spoils the design.  The two downsides of this are it's a bit different from 
what users are used to, and we would need to escape commas inside of 
specifiers somehow.

It simplifies the parsing and formatting code underneath like I was hoping, 
but it may scare some people off.  But the simple common cases are still 
really simple, so I hope not.

BTW... I don't think I can add anything more to this idea.  The rest is 
just implementation details and documentation. :)

Cheers,
    Ron


From brett at python.org  Wed Aug 15 09:47:40 2007
From: brett at python.org (Brett Cannon)
Date: Wed, 15 Aug 2007 00:47:40 -0700
Subject: [Python-3000] Proposed new language for newline parameter to
	TextIOBase
In-Reply-To: <ca471dc20708142156l9e7ef7ue33e314041d4ad90@mail.gmail.com>
References: <ca471dc20708142156l9e7ef7ue33e314041d4ad90@mail.gmail.com>
Message-ID: <bbaeab100708150047i6bcc59e1pdfbcfe4f655f5f1d@mail.gmail.com>

On 8/14/07, Guido van Rossum <guido at python.org> wrote:
> I thought some more about the universal newlines situation, and I
> think I can handle all the use cases with a single 'newline'
> parameter. The use cases are:
>
> (A) input use cases:
>
> (1) newline=None: input with default universal newlines mode; lines
> may end in \r, \n, or \r\n, and these are translated to \n.
>
> (2) newline='': input with untranslated universal newlines mode; lines
> may end in \r, \n, or \r\n, and these are returned untranslated.
>
> (3) newline='\r', newline='\n', newline='\r\n': input lines must end
> with the given character(s), and these are translated to \n.
>
> (B) output use cases:
>
> (1) newline=None: every \n written is translated to os.linesep.
>
> (2) newline='': no translation takes place.
>
> (3) newline='\r', newline='\n', newline='\r\n': every \n written is
> translated to the value of newline.
>

I like the options, but I would swap the meaning of None and the empty
string.  My reasoning for this is that for option 3 it says to me
"here is a string representing EOL, and make it \n".  So I would think
of the empty string as, "I don't know what EOL is, but I want it
translated to \n".  Then None means, "I don't want any translation
done" by the fact that the argument is not a string.  In other words,
the existence of a string argument means you want EOL translated to
\n, and the specific value of 'newline' specifying how to determine
what EOL is.

-Brett

From martin at v.loewis.de  Wed Aug 15 10:37:50 2007
From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 15 Aug 2007 10:37:50 +0200
Subject: [Python-3000] PEP 3131 is implemented
Message-ID: <46C2BB5E.7060102@v.loewis.de>

I just implemented PEP 3131 (non-ASCII identifiers).

There are several problems with displaying error messages,
in particular when the terminal cannot render the string;
if anybody wants to work on this, please go ahead.

Regards,
Martin

From eric+python-dev at trueblade.com  Wed Aug 15 11:04:32 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Wed, 15 Aug 2007 05:04:32 -0400
Subject: [Python-3000] PEP 3101 Updated
In-Reply-To: <46C2809C.3000806@acm.org>
References: <46C2809C.3000806@acm.org>
Message-ID: <46C2C1A0.4060002@trueblade.com>

Talin wrote:
> A new version is up, incorporating material from the various discussions 
> on this list:
> 
> 	http://www.python.org/dev/peps/pep-3101/

I have a number of parts of this implemented.  I'm refactoring the 
original PEP 3101 sandbox code to get it working.  Mostly it involves 
un-optimizing string handling in the original work :(

These tests all pass:

self.assertEquals('{0[{1}]}'.format('abcdefg', 4), 'e')
self.assertEquals('{foo[{bar}]}'.format(foo='abcdefg', bar=4), 'e')
self.assertEqual("My name is {0}".format('Fred'), "My name is Fred")
self.assertEqual("My name is {0[name]}".format(dict(name='Fred')),
                  "My name is Fred")
self.assertEqual("My name is {0} :-{{}}".format('Fred'),
                  "My name is Fred :-{}")

I have not added the !r syntax yet.

I've only spent 5 minutes looking at this so far, but I can't figure out 
where to add a __format__ to object.  If someone could point me to the 
right place, that would be helpful.

Thanks.

From barry at python.org  Wed Aug 15 14:25:49 2007
From: barry at python.org (Barry Warsaw)
Date: Wed, 15 Aug 2007 08:25:49 -0400
Subject: [Python-3000] Proposed new language for newline parameter to
	TextIOBase
In-Reply-To: <bbaeab100708150047i6bcc59e1pdfbcfe4f655f5f1d@mail.gmail.com>
References: <ca471dc20708142156l9e7ef7ue33e314041d4ad90@mail.gmail.com>
	<bbaeab100708150047i6bcc59e1pdfbcfe4f655f5f1d@mail.gmail.com>
Message-ID: <C8D75D6B-B77B-4203-A2B6-7D44D15EC61F@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 15, 2007, at 3:47 AM, Brett Cannon wrote:

> On 8/14/07, Guido van Rossum <guido at python.org> wrote:
>> I thought some more about the universal newlines situation, and I
>> think I can handle all the use cases with a single 'newline'
>> parameter. The use cases are:
>>
>> (A) input use cases:
>>
>> (1) newline=None: input with default universal newlines mode; lines
>> may end in \r, \n, or \r\n, and these are translated to \n.
>>
>> (2) newline='': input with untranslated universal newlines mode;
>> lines may end in \r, \n, or \r\n, and these are returned untranslated.
>>
>> (3) newline='\r', newline='\n', newline='\r\n': input lines must end
>> with the given character(s), and these are translated to \n.
>>
>> (B) output use cases:
>>
>> (1) newline=None: every \n written is translated to os.linesep.
>>
>> (2) newline='': no translation takes place.
>>
>> (3) newline='\r', newline='\n', newline='\r\n': every \n written is
>> translated to the value of newline.
>>
>
> I like the options, but I would swap the meaning of None and the empty
> string.  My reasoning for this is that for option 3 it says to me
> "here is a string representing EOL, and make it \n".  So I would think
> of the empty string as, "I don't know what EOL is, but I want it
> translated to \n".  Then None means, "I don't want any translation
> done" by the fact that the argument is not a string.  In other words,
> the existence of a string argument means you want EOL translated to
> \n, and the specific value of 'newline' specifying how to determine
> what EOL is.

What Brett said.
- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRsLwznEjvBPtnXfVAQKMQAP9FzztQ09re2pLBN/uNKrLCf2i5Z1ENZQU
Rbfwv8Ek2ZcBurvDht8Oyj3wgOzOKUhk6XfHdHD0Mf3CW9XL6dMvSZHQOv3sORQF
Fh6MI4B9HezL/Fuy2C9OenM0TaYHkH5aoYagIjM9/LOezEkxliHU/gOMGY4657dG
Turqz+xPunw=
=xC3V
-----END PGP SIGNATURE-----

From barry at python.org  Wed Aug 15 14:28:56 2007
From: barry at python.org (Barry Warsaw)
Date: Wed, 15 Aug 2007 08:28:56 -0400
Subject: [Python-3000] PEP 3131 is implemented
In-Reply-To: <46C2BB5E.7060102@v.loewis.de>
References: <46C2BB5E.7060102@v.loewis.de>
Message-ID: <D436BF54-006F-41D3-9911-1F0D2995C08A@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 15, 2007, at 4:37 AM, Martin v. Löwis wrote:

> I just implemented PEP 3131 (non-ASCII identifiers).
>
> There are several problems with displaying error messages,
> in particular when the terminal cannot render the string;
> if anybody wants to work on this, please go ahead.

I'm not sure this is related only to PEP 3131 changes (I haven't  
tried it yet), but I hit a similar problem when I was working on the  
email package.  I think I posted a message about it in a separate  
thread.

If an exception gets raised with a message containing characters that  
can't be printed on your terminal, you get a nasty exception inside  
io, which obscures the real exception.  The resulting traceback makes  
it difficult to debug what's going on.  I haven't looked deeper but  
/my/ solution was to repr the message before instantiating the  
exception.  IWBNI Python itself Did Something Better.
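
A minimal sketch of that workaround, assuming a current Python 3,
where ascii() is the builtin that escapes every non-ASCII character
(repr() there leaves printable non-ASCII characters alone).
safe_error is a hypothetical helper name.

```python
def safe_error(exc_type, message):
    """Build an exception whose message contains only ASCII."""
    # Escaping up front means printing the traceback cannot fail on a
    # terminal whose encoding lacks the original characters.
    return exc_type(ascii(message))


err = safe_error(ValueError, 'bad header: caf\xe9')
```

The cost is that the message shows escape sequences and quotes
instead of the original text, so this only hides the display problem.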

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRsLxiHEjvBPtnXfVAQLSkAQAuC0UAWwFb5kC6uQb9zhCm4zH/BMKaN1k
hb6PheDHHl2KnwKoVB+Lw3XrBlbrvZotpYnThQEAG4vNtW92O59zf1uxtRwFo16Q
dvQhqwx4fAobWWQYkIK1F7i6SaEyHa+8D8iXy33RTcZKkwKvD69miSFyGxEyHq2x
2zH1Uk+qzos=
=P+66
-----END PGP SIGNATURE-----

From lists at cheimes.de  Wed Aug 15 15:28:05 2007
From: lists at cheimes.de (Christian Heimes)
Date: Wed, 15 Aug 2007 15:28:05 +0200
Subject: [Python-3000] Proposed new language for newline parameter to
	TextIOBase
In-Reply-To: <bbaeab100708150047i6bcc59e1pdfbcfe4f655f5f1d@mail.gmail.com>
References: <ca471dc20708142156l9e7ef7ue33e314041d4ad90@mail.gmail.com>
	<bbaeab100708150047i6bcc59e1pdfbcfe4f655f5f1d@mail.gmail.com>
Message-ID: <f9uv15$jva$1@sea.gmane.org>

Brett Cannon wrote:
> I like the options, but I would swap the meaning of None and the empty
> string.  My reasoning for this is that for option 3 it says to me
> "here is a string representing EOL, and make it \n".  So I would think
> of the empty string as, "I don't know what EOL is, but I want it
> translated to \n".  Then None means, "I don't want any translation
> done" by the fact that the argument is not a string.  In other words,
> the existence of a string argument means you want EOL translated to
> \n, and the specific value of 'newline' specifying how to determine
> what EOL is.

I'd like to propose some constants which should be used instead of the
strings:

MAC = '\r'
UNIX = '\n'
WINDOWS = '\r\n'
UNIVERSAL = ''
NOTRANSLATE = None

I think that open(filename, newline=io.UNIVERSAL) or open(filename,
newline=io.WINDOWS) is much more readable than open(filename,
newline=''). Besides I always forget if Windows is '\r\n' or '\n\r'. *g*
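
A small sketch of how such constants might read at a call site.  The
names are defined locally because the io module does not provide
them, and the example relies on the output behaviour of open() in
current Python 3 rather than on the semantics still under discussion
here, so only the explicit line-ending values are used.

```python
import os
import tempfile

# Hypothetical names taken from the proposal above.
MAC = '\r'
UNIX = '\n'
WINDOWS = '\r\n'


def write_lines(path, lines, newline):
    """Write each item as one line; open() translates the line ending."""
    with open(path, 'w', newline=newline) as f:
        for line in lines:
            f.write(line + '\n')


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.txt')
    write_lines(path, ['a', 'b'], WINDOWS)
    with open(path, 'rb') as f:
        data = f.read()  # raw bytes, so the line endings are visible
```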

Christian


From g.brandl at gmx.net  Wed Aug 15 15:36:56 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Wed, 15 Aug 2007 15:36:56 +0200
Subject: [Python-3000] Proposed new language for newline parameter to
	TextIOBase
In-Reply-To: <bbaeab100708150047i6bcc59e1pdfbcfe4f655f5f1d@mail.gmail.com>
References: <ca471dc20708142156l9e7ef7ue33e314041d4ad90@mail.gmail.com>
	<bbaeab100708150047i6bcc59e1pdfbcfe4f655f5f1d@mail.gmail.com>
Message-ID: <f9uvhk$l07$1@sea.gmane.org>

Brett Cannon schrieb:
> On 8/14/07, Guido van Rossum <guido at python.org> wrote:
>> I thought some more about the universal newlines situation, and I
>> think I can handle all the use cases with a single 'newline'
>> parameter. The use cases are:
>>
>> (A) input use cases:
>>
>> (1) newline=None: input with default universal newlines mode; lines
>> may end in \r, \n, or \r\n, and these are translated to \n.
>>
>> (2) newline='': input with untranslated universal newlines mode; lines
>> may end in \r, \n, or \r\n, and these are returned untranslated.
>>
>> (3) newline='\r', newline='\n', newline='\r\n': input lines must end
>> with the given character(s), and these are translated to \n.
>>
>> (B) output use cases:
>>
>> (1) newline=None: every \n written is translated to os.linesep.
>>
>> (2) newline='': no translation takes place.
>>
>> (3) newline='\r', newline='\n', newline='\r\n': every \n written is
>> translated to the value of newline.
>>
> 
> I like the options, but I would swap the meaning of None and the empty
> string.  My reasoning for this is that for option 3 it says to me
> "here is a string representing EOL, and make it \n".  So I would think
> of the empty string as, "I don't know what EOL is, but I want it
> translated to \n".  Then None means, "I don't want any translation
> done" by the fact that the argument is not a string.  In other words,
> the existence of a string argument means you want EOL translated to
> \n, and the specific value of 'newline' specifying how to determine
> what EOL is.

I'd use None and "\r"/... as proposed, but "U" instead of the empty string
for universal newline mode. "U" already has that established meaning, and
you don't have to remember the difference between the two (false) values ""
and None.

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From g.brandl at gmx.net  Wed Aug 15 16:05:12 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Wed, 15 Aug 2007 16:05:12 +0200
Subject: [Python-3000] Documentation switch imminent
In-Reply-To: <bbaeab100708141657n16a012fbqdceece1dff992a06@mail.gmail.com>
References: <f9t2nn$ksg$1@sea.gmane.org>
	<bbaeab100708141657n16a012fbqdceece1dff992a06@mail.gmail.com>
Message-ID: <f9v16k$rq5$1@sea.gmane.org>

Brett Cannon schrieb:
> On 8/14/07, Georg Brandl <g.brandl at gmx.net> wrote:
>> Now that the converted documentation is fairly bug-free, I want to
>> make the switch.
>>
>> I will replace the old Doc/ trees in the trunk and py3k branches
>> tomorrow, moving over the reST ones found at
>> svn+ssh://svn.python.org/doctools/Doc-{26,3k}.
> 
> First, that address is wrong; missing a 'trunk' in there.

Sorry again.

> Second, are we going to keep the docs in a separate tree forever, or
> is this just for now? 

They will be moved (in a few minutes...) to the location where the
Latex docs are now.

> I am not thinking so much about the tools, but
> whether we will need to do two separate commits in order to make code
> changes *and* change the docs?  Or are you going to add an externals
> dependency in the trees to their respective doc directories?

No separate commits will be needed to commit changes to the docs.
However, the tool to build the docs will not be in the tree under Doc/,
but continue to be maintained in the doctools/ toplevel project.

I spoke with Martin about including them as externals, but we agreed that
they are not needed and cost too much time on every "svn up".  Instead,
the Doc/ makefile checks out the tools in a separate directory and runs
them from there. (The Doc/README.txt file explains this in more detail.)

Cheers,
Georg



From g.brandl at gmx.net  Wed Aug 15 16:33:21 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Wed, 15 Aug 2007 16:33:21 +0200
Subject: [Python-3000] Documentation switch imminent
In-Reply-To: <f9t2nn$ksg$1@sea.gmane.org>
References: <f9t2nn$ksg$1@sea.gmane.org>
Message-ID: <f9v2rd$2dl$1@sea.gmane.org>

Georg Brandl schrieb:
> Now that the converted documentation is fairly bug-free, I want to
> make the switch.
> 
> I will replace the old Doc/ trees in the trunk and py3k branches
> tomorrow, moving over the reST ones found at
> svn+ssh://svn.python.org/doctools/Doc-{26,3k}.
> 
> Neal will change his build scripts, so that the 2.6 and 3.0 devel
> documentation pages at docs.python.org will be built from these new
> trees soon.

Okay, I made the switch.  I tagged the state of both Python branches
before the switch as tags/py{26,3k}-before-rstdocs/.

From now on, I'll make changes that apply to 2.6 and 3.0 only in the
trunk and hope that svnmerge will continue to work.

I'll also handle the backport of doc changes for bugfixes to the 2.5
branch if you drop me a mail which revision I should backport.

Cheers,
Georg



From python3now at gmail.com  Wed Aug 15 16:38:50 2007
From: python3now at gmail.com (James Thiele)
Date: Wed, 15 Aug 2007 07:38:50 -0700
Subject: [Python-3000] PEP 3101 Updated
In-Reply-To: <46C2809C.3000806@acm.org>
References: <46C2809C.3000806@acm.org>
Message-ID: <8f01efd00708150738y36deee91v918bfd3f80944d9b@mail.gmail.com>

I think the example:

    "My name is {0.name}".format(file('out.txt'))

Would be easier to understand if you added:

Which would produce:

        "My name is 'out.txt'"



On 8/14/07, Talin <talin at acm.org> wrote:
> A new version is up, incorporating material from the various discussions
> on this list:
>
>         http://www.python.org/dev/peps/pep-3101/
>
> Diffs are here:
>
> http://svn.python.org/view/peps/trunk/pep-3101.txt?rev=57044&r1=56535&r2=57044
>
>
> -- Talin
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/python3now%40gmail.com
>

From python3now at gmail.com  Wed Aug 15 16:52:32 2007
From: python3now at gmail.com (James Thiele)
Date: Wed, 15 Aug 2007 07:52:32 -0700
Subject: [Python-3000] PEP 3101 Updated
In-Reply-To: <46C2C1A0.4060002@trueblade.com>
References: <46C2809C.3000806@acm.org> <46C2C1A0.4060002@trueblade.com>
Message-ID: <8f01efd00708150752i1b09464frb8d3217209b47a11@mail.gmail.com>

The section on the explicit conversion flag contains the following line:

      These flags are typically placed before the format specifier:

Where else can they be placed?

Also there is no description of what action (if any) is taken if an
unknown explicit conversion flag is encountered.

On 8/15/07, Eric Smith <eric+python-dev at trueblade.com> wrote:
> Talin wrote:
> > A new version is up, incorporating material from the various discussions
> > on this list:
> >
> >       http://www.python.org/dev/peps/pep-3101/
>
> I have a number of parts of this implemented.  I'm refactoring the
> original PEP 3101 sandbox code to get it working.  Mostly it involves
> un-optimizing string handling in the original work :(
>
> These tests all pass:
>
> self.assertEquals('{0[{1}]}'.format('abcdefg', 4), 'e')
> self.assertEquals('{foo[{bar}]}'.format(foo='abcdefg', bar=4), 'e')
> self.assertEqual("My name is {0}".format('Fred'), "My name is Fred")
> self.assertEqual("My name is {0[name]}".format(dict(name='Fred')),
>                   "My name is Fred")
> self.assertEqual("My name is {0} :-{{}}".format('Fred'),
>                   "My name is Fred :-{}")
>
> I have not added the !r syntax yet.
>
> I've only spent 5 minutes looking at this so far, but I can't figure out
> where to add a __format__ to object.  If someone could point me to the
> right place, that would be helpful.
>
> Thanks.
>

From fdrake at acm.org  Wed Aug 15 16:54:01 2007
From: fdrake at acm.org (Fred Drake)
Date: Wed, 15 Aug 2007 10:54:01 -0400
Subject: [Python-3000] Proposed new language for newline parameter to
	TextIOBase
In-Reply-To: <f9uv15$jva$1@sea.gmane.org>
References: <ca471dc20708142156l9e7ef7ue33e314041d4ad90@mail.gmail.com>
	<bbaeab100708150047i6bcc59e1pdfbcfe4f655f5f1d@mail.gmail.com>
	<f9uv15$jva$1@sea.gmane.org>
Message-ID: <436D659B-DBF2-4342-AAFB-966F7FA44F86@acm.org>

On Aug 15, 2007, at 9:28 AM, Christian Heimes wrote:
> I like to propose some constants which should be used instead of the
> strings:

+1 for this.  This should make code easier to read, too; not everyone  
spends time with line-oriented I/O, and the strings are just magic  
numbers in that case.


   -Fred

-- 
Fred Drake   <fdrake at acm.org>




From steven.bethard at gmail.com  Wed Aug 15 17:02:32 2007
From: steven.bethard at gmail.com (Steven Bethard)
Date: Wed, 15 Aug 2007 09:02:32 -0600
Subject: [Python-3000] Proposed new language for newline parameter to
	TextIOBase
In-Reply-To: <f9uvhk$l07$1@sea.gmane.org>
References: <ca471dc20708142156l9e7ef7ue33e314041d4ad90@mail.gmail.com>
	<bbaeab100708150047i6bcc59e1pdfbcfe4f655f5f1d@mail.gmail.com>
	<f9uvhk$l07$1@sea.gmane.org>
Message-ID: <d11dcfba0708150802o1d3cd731l8435fee8cecacbd7@mail.gmail.com>

On 8/15/07, Georg Brandl <g.brandl at gmx.net> wrote:
> I'd use None and "\r"/... as proposed, but "U" instead of the empty string
> for universal newline mode.

I know that "U" already has this meaning as a differently named
parameter, but if I saw something like::

    open(file_name, newline='U')

my first intuition would be that it's some weird file format where
each chunk of the file is delimited by letter U.  Probably just a
result of dealing with too many bad file formats though. ;-)

STeVe
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From jimjjewett at gmail.com  Wed Aug 15 18:07:22 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Wed, 15 Aug 2007 12:07:22 -0400
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46C2325D.1010209@ronadam.com>
References: <46B13ADE.7080901@acm.org> <46BBEB16.2040205@ronadam.com>
	<46BC1479.30405@canterbury.ac.nz> <46BCA9C9.1010306@ronadam.com>
	<46BD5BD8.7030706@acm.org> <46BF22D8.2090309@trueblade.com>
	<46BF3CC7.6010405@acm.org> <46C0EEBF.3010206@ronadam.com>
	<fb6fbf560708141233k193e36d0rae99e1decff42f11@mail.gmail.com>
	<46C2325D.1010209@ronadam.com>
Message-ID: <fb6fbf560708150907q5d2f037ex85139dce51dec7fb@mail.gmail.com>

On 8/14/07, Ron Adam <rrr at ronadam.com> wrote:
> Jim Jewett wrote:
> > On 8/13/07, Ron Adam <rrr at ronadam.com> wrote:

> >>        {name[:][type][alignment_term][,content_modifying_term]}

> > ...  You used the (alignment term)
> > width as the number of digits before the decimal,
> > instead of as the field width.

> You can leave out either term.  So that may have
> been what you are seeing.

I thought I was going based on your spec, rather than the examples.

>      {0:f8.2}

Because there is no comma, this should be a field of width 8 -- two
after a decimal point, the decimal point itself, and at most 5 digits
(or 4 and sign) before the decimal point.

I (mis?)read your message as special-casing float to 8 digits before
the decimal point, for a total field width of 11.


...
> Minimal width with fill for shorter than width items.  It
> expands if the length of the item is longer than width.
 ...
> > Can I say "width=whatever it takes, up to 72 chars ...
> > but don't pad it if you don't need to"?

> That's the default behavior.

If width is only a minimum, where did it get the maximum of 72 chars part?

> > I'm not sure that variable lengths and alignment even
> > *should* be supported in the same expression, but if
> > forcing everything to fixed-width would be enough of
> > a change that it needs an explicit callout.

> Alignment is needed for when the length of the value is shorter than the
> length of the field.  So if a field has a minimal width, and a value is
> shorter than that, it will be used.

I should have been more explicit -- variable length *field*.  How can
one report say "use up to ten characters if you need them, but only
use three if that is all you need" and another report say "use exactly
ten characters; right align and fill with spaces."
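
For comparison, a short sketch using the format specification syntax
of current Python 3, which is not the syntax being debated in this
thread.  There the width is a minimum and, for strings, the precision
is a maximum, so both reports can be expressed.

```python
# Width 8 with two decimals: short values are padded to the width...
padded = format(3.14159, '8.2f')
# ...and longer values simply overflow it.
overflow = format(123456.789, '8.2f')

# "Use up to ten characters, but only three if that is all you need."
at_most_ten = format('abc', '.10')
truncated = format('abcdefghijklmnop', '.10')

# "Use exactly ten characters; right align and fill with spaces."
exactly_ten = format('abc', '>10.10')
```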

-jJ

From barry at python.org  Wed Aug 15 18:11:18 2007
From: barry at python.org (Barry Warsaw)
Date: Wed, 15 Aug 2007 12:11:18 -0400
Subject: [Python-3000] Proposed new language for newline parameter to
	TextIOBase
In-Reply-To: <f9uv15$jva$1@sea.gmane.org>
References: <ca471dc20708142156l9e7ef7ue33e314041d4ad90@mail.gmail.com>
	<bbaeab100708150047i6bcc59e1pdfbcfe4f655f5f1d@mail.gmail.com>
	<f9uv15$jva$1@sea.gmane.org>
Message-ID: <D584A49D-5E3C-4F51-BB34-021D688C7F52@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 15, 2007, at 9:28 AM, Christian Heimes wrote:

> I like to propose some constants which should be used instead of the
> strings:
>
> MAC = '\r'
> UNIX = '\n'
> WINDOWS = '\r\n'
> UNIVERSAL = ''
> NOTRANSLATE = None
>
> I think that open(filename, newline=io.UNIVERSAL) or open(filename,
> newline=io.WINDOWS) is much more readable than open(filename,
> newline=''). Besides I always forget if Windows is '\r\n' or '\n\r'. *g*

Yes, excellent idea.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRsMlpnEjvBPtnXfVAQINBAP/en1BYxU9wKErov26dyqo8snJLNnregEO
YVP/8b9EM4csEMAJbO/pOBjsOuub/TO5h7nCdiuV0GAGTAzzt4kICHr/cEVGKnOU
dCd949uTLeIYVkgJnPnJ/ynE5Q30uMIIysXBbrbNx3rWJt74fNBDuF0xLHgw4d0O
cHvT5rzmdvs=
=DwhF
-----END PGP SIGNATURE-----

From eric+python-dev at trueblade.com  Wed Aug 15 19:02:55 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Wed, 15 Aug 2007 13:02:55 -0400
Subject: [Python-3000] PEP 3101 Updated
In-Reply-To: <8f01efd00708150738y36deee91v918bfd3f80944d9b@mail.gmail.com>
References: <46C2809C.3000806@acm.org>
	<8f01efd00708150738y36deee91v918bfd3f80944d9b@mail.gmail.com>
Message-ID: <46C331BF.2020104@trueblade.com>

James Thiele wrote:
> I think the example:
> 
>     "My name is {0.name}".format(file('out.txt'))
> 
> Would be easier to understand if you added:
> 
> Which would produce:
> 
>         "My name is 'out.txt'"

I agree.

Also, the example a couple of paragraphs down:
"My name is {0[name]}".format(dict(name='Fred'))
should show the expected output:
"My name is Fred"

I was adding test cases from the PEP last night, and I ignored the file 
one because I didn't want to mess with files.  I've looked around for a 
replacement, but I couldn't find a built in type with an attribute that 
would be easy to test.  Maybe we could stick with file, and use sys.stdin:

"File name is {0.name}".format(sys.stdin)

which would produce:

'File name is 0'

I don't know if the "0" is platform dependent or not.  If anyone has an 
example of a builtin (or standard module) type, variable, or whatever 
that has an attribute that can have a known value, I'd like to see it.
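One candidate, assuming str.format as it was eventually released in
Python 3: complex numbers are built in and expose real and imag
attributes whose values do not depend on the platform.  A small sketch,
which also shows the dict lookup example together with its output:

```python
# Attribute access on a built-in type with a known value.
attr_example = "The real part is {0.real}".format(3 + 4j)

# Item access, as in the PEP's dict example.
item_example = "My name is {0[name]}".format(dict(name="Fred"))

print(attr_example)  # The real part is 3.0
print(item_example)  # My name is Fred
```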

When working on this, I notice that in 2.3.3 (on the same machine), 
sys.stdin.name is '<stdin>', but in py3k it's 0.  Not sure if that's a 
bug or intentional.

In any event, if we leave this example in the PEP, not only should we 
include the expected output, it should probably be changed to use "open" 
instead of "file":
"My name is {0.name}".format(open('out.txt'))
since I think file(filename) is deprecated (but still works).  At least 
I thought it was deprecated, now that I look around I can't find any 
mention of it.


Eric.


From brett at python.org  Wed Aug 15 19:16:10 2007
From: brett at python.org (Brett Cannon)
Date: Wed, 15 Aug 2007 10:16:10 -0700
Subject: [Python-3000] Documentation switch imminent
In-Reply-To: <f9v16k$rq5$1@sea.gmane.org>
References: <f9t2nn$ksg$1@sea.gmane.org>
	<bbaeab100708141657n16a012fbqdceece1dff992a06@mail.gmail.com>
	<f9v16k$rq5$1@sea.gmane.org>
Message-ID: <bbaeab100708151016o1329d493m393bdb76149c20c0@mail.gmail.com>

On 8/15/07, Georg Brandl <g.brandl at gmx.net> wrote:
> Brett Cannon schrieb:
> > On 8/14/07, Georg Brandl <g.brandl at gmx.net> wrote:
> >> Now that the converted documentation is fairly bug-free, I want to
> >> make the switch.
> >>
> >> I will replace the old Doc/ trees in the trunk and py3k branches
> >> tomorrow, moving over the reST ones found at
> >> svn+ssh://svn.python.org/doctools/Doc-{26,3k}.
> >
> > First, that address is wrong; missing a 'trunk' in there.
>
> Sorry again.
>

Not a problem.  I also noticed, though, that the user (pythondev) is
missing as well.  =)

> > Second, are we going to keep the docs in a separate tree forever, or
> > is this just for now?
>
> They will be moved (in a few minutes...) to the location where the
> Latex docs are now.
>

Yep, just did an update.

> > I am not thinking so much about the tools, but
> > whether we will need to do two separate commits in order to make code
> > changes *and* change the docs?  Or are you going to add an externals
> > dependency in the trees to their respective doc directories?
>
> No separate commits will be needed to commit changes to the docs.
> However, the tool to build the docs will not be in the tree under Doc/,
> but continue to be maintained in the doctools/ toplevel project.
>

OK.

> I spoke with Martin about including them as externals, but we agreed that
> they are not needed and cost too much time on every "svn up".  Instead,
> the Doc/ makefile checks out the tools in a separate directory and runs
> them from there. (The Doc/README.txt file explains this in more detail.)

Seems simple enough!  Thanks again for doing this, Georg (and the doc SIG)!

-Brett

From rhamph at gmail.com  Wed Aug 15 19:20:27 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Wed, 15 Aug 2007 11:20:27 -0600
Subject: [Python-3000] Format specifier proposal
In-Reply-To: <46C2A2B1.1010708@ronadam.com>
References: <46BD79EC.1020301@acm.org> <46C095CF.2060507@ronadam.com>
	<46C11DDF.2080607@acm.org>
	<ca471dc20708132053h1fa1c18ai87d61d8b85aadf07@mail.gmail.com>
	<20070814022805.4a2b44a2.ajwade+py3k@andrew.wade.networklinux.net>
	<ca471dc20708140941o1428bda9w9aa1f9be772d80e5@mail.gmail.com>
	<46C24F56.5050104@canterbury.ac.nz> <46C26110.8020001@ronadam.com>
	<20070815004204.f986b4b1.ajwade+py3k@andrew.wade.networklinux.net>
	<46C2A2B1.1010708@ronadam.com>
Message-ID: <aac2c7cb0708151020sb51a5d6u55f736bc838e2c38@mail.gmail.com>

On 8/15/07, Ron Adam <rrr at ronadam.com> wrote:
>
>
> Andrew James Wade wrote:
> > On Tue, 14 Aug 2007 21:12:32 -0500
> > Ron Adam <rrr at ronadam.com> wrote:
>
> >> What I was thinking of was just a simple left to right evaluation order.
> >>
> >>      "{0:spec1, spec2, ... }".format(x)
> >>
> >> I don't expect this will ever get very long.
> >
> > The first __format__ will return a str, so chains longer than 2 don't
> > make a lot of sense. And the delimiter character should be allowed in
> > spec1; limiting the length of the chain to 2 allows that without escaping:
> >
> >     "{0:spec1-with-embedded-comma,}".format(x)
> >
> > My scheme did the same sort of thing with spec1 and spec2 reversed.
> > Your order makes more intuitive sense; I chose my order because I
> > wanted the syntax to be a generalization of formatting strings.
>  >
> > Handling the chaining within the __format__ methods should be all of
> > two lines of boilerplate per method.
>
> I went ahead and tried this out and it actually cleared up some difficulty
>   in organizing the parsing code.  That was a very nice surprise. :)
>
>      (actual doctest)
>
>      >>> import time
>      >>> class GetTime(object):
>      ...     def __init__(self, time=time.gmtime()):
>      ...         self.time = time
>      ...     def __format__(self, spec):
>      ...         return fstr(time.strftime(spec, self.time))
>
>      >>> start = GetTime(time.gmtime(1187154773.0085449))
>
>      >>> fstr("Start: {0:%d/%m/%Y %H:%M:%S,<30}").format(start)
>      'Start: 15/08/2007 05:12:53           '

Caveat: some date formats include a comma.  I think the only
workaround would be splitting them into separate formats (and using
the input date twice).
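A possible alternative, sketched under the assumption that a chain is
limited to two specs as Andrew suggested: split on the last comma only,
so earlier commas stay part of the first spec.  split_spec and
format_time are hypothetical helpers, not part of any proposal in this
thread, and format() is used with the alignment syntax that Python 3
eventually adopted.

```python
import time


def split_spec(spec):
    """Split 'spec1,spec2' at the LAST comma; no comma means no second spec."""
    first, sep, second = spec.rpartition(",")
    if not sep:
        return spec, ""
    return first, second


def format_time(struct_time, spec):
    """Apply the strftime part first, then align the resulting string."""
    strftime_spec, align_spec = split_spec(spec)
    text = time.strftime(strftime_spec, struct_time)
    return format(text, align_spec)


start = time.gmtime(1187154773.0085449)
print(repr(format_time(start, "%d/%m/%Y %H:%M:%S,<30")))
# A date format with its own comma; the trailing comma marks an empty
# second spec so the embedded comma is not mistaken for the separator.
print(repr(format_time(start, "%d/%m/%Y, %H:%M,")))
```

The cost is that a date format containing a comma must always end with
a separator comma, even when no alignment is wanted.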

-- 
Adam Olsen, aka Rhamphoryncus

From eric+python-dev at trueblade.com  Wed Aug 15 19:26:33 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Wed, 15 Aug 2007 13:26:33 -0400
Subject: [Python-3000] PEP 3101 Updated
In-Reply-To: <8f01efd00708150752i1b09464frb8d3217209b47a11@mail.gmail.com>
References: <46C2809C.3000806@acm.org> <46C2C1A0.4060002@trueblade.com>
	<8f01efd00708150752i1b09464frb8d3217209b47a11@mail.gmail.com>
Message-ID: <46C33749.5010702@trueblade.com>

James Thiele wrote:
> The section on the explicit conversion flag contains the following line:
> 
>       These flags are typically placed before the format specifier:
> 
> Where else can they be placed?

I'd like this to say they can only be placed where the PEP describes 
them, or maybe only at the end.
"{0!r:20}".format("Hello")
or
"{0:20!r}".format("Hello")

Putting them at the end makes the parsing easier, although I grant you 
that that's not a great reason for specifying it that way.  Whatever it 
is, I think there should be only one place they can go.

> Also there is no description of what action (if any) is taken if an
> unknown explicit conversion flag is encountered.

I would assume a ValueError, but yes, it should be explicit.
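
For reference, a sketch of how str.format behaves in released Python 3
(which may differ from what had been settled at this point in the
thread): the flag goes before the colon, and both a misplaced flag and
an unknown flag raise ValueError.  raises_value_error is an
illustrative helper.

```python
def raises_value_error(template, *args):
    """Return True if formatting the template raises ValueError."""
    try:
        template.format(*args)
    except ValueError:
        return True
    return False


# Flag before the format specifier: repr() first, then pad to width 20.
padded = "{0!r:20}".format("Hello")
print(repr(padded))

# Flag after the specifier: '20!r' is not a valid format spec for str.
print(raises_value_error("{0:20!r}", "Hello"))

# Unknown conversion flag.
print(raises_value_error("{0!z}", "Hello"))
```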

From guido at python.org  Wed Aug 15 19:28:00 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 15 Aug 2007 10:28:00 -0700
Subject: [Python-3000] Proposed new language for newline parameter to
	TextIOBase
In-Reply-To: <bbaeab100708150047i6bcc59e1pdfbcfe4f655f5f1d@mail.gmail.com>
References: <ca471dc20708142156l9e7ef7ue33e314041d4ad90@mail.gmail.com>
	<bbaeab100708150047i6bcc59e1pdfbcfe4f655f5f1d@mail.gmail.com>
Message-ID: <ca471dc20708151028m4a31662bpb982ebb9b881629@mail.gmail.com>

On 8/14/07, Guido van Rossum <guido at python.org> wrote:
> I thought some more about the universal newlines situation, and I
> think I can handle all the use cases with a single 'newline'
> parameter. The use cases are:
>
> (A) input use cases:
>
> (1) newline=None: input with default universal newlines mode; lines
> may end in \r, \n, or \r\n, and these are translated to \n.
>
> (2) newline='': input with untranslated universal newlines mode; lines
> may end in \r, \n, or \r\n, and these are returned untranslated.
>
> (3) newline='\r', newline='\n', newline='\r\n': input lines must end
> with the given character(s), and these are translated to \n.
>
> (B) output use cases:
>
> (1) newline=None: every \n written is translated to os.linesep.
>
> (2) newline='': no translation takes place.
>
> (3) newline='\r', newline='\n', newline='\r\n': every \n written is
> translated to the value of newline.

I'm going to respond to several replies in one email. Warning:
bikeshedding ahead!

On 8/15/07, Brett Cannon <brett at python.org> wrote:
> I like the options, but I would swap the meaning of None and the empty
> string.  My reasoning for this is that for option 3 it says to me
> "here is a string representing EOL, and make it \n".  So I would think
> of the empty string as, "I don't know what EOL is, but I want it
> translated to \n".  Then None means, "I don't want any translation
> done" by the fact that the argument is not a string.  In other words,
> the existence of a string argument means you want EOL translated to
> \n, and the specific value of 'newline' specifying how to determine
> what EOL is.

I see it differently. None is the natural default, which is universal
newline modes with translation on input, and translation of \n to
os.linesep on output. On input, all the other forms mean "no
translation", and the value is the character string that ends a line
(leaving the door open for a future extension to arbitrary record
separators, either as an eventual standard feature, or as a compatible
user-defined variant). If it is empty, that is clearly an exception
(since io.py is not able to paranormally guess when a line ends
without searching for a character), so we give that the special
meaning "disable translation, but use the default line ending
separators".

On output, the situation isn't quite symmetrical, since the use cases
are different: the natural default is to translate \n to os.linesep,
and the most common other choices are probably to translate \n to a
specific line ending (this helps keep the line ending choice separate
from the code that produces the output). Again, translating \n to the
empty string makes no sense, so the empty string can be used for
another special case: and again, it is the "give the app the most
control" case.

Note that translation on input when a specific line ending is given
doesn't make much sense, and can even create ambiguities -- e.g. if
the line ending is \r\n, an input line of the form XXX\nYYY\r\n would
be translated to XXX\nYYY\n, and then one would wonder why it wasn't
split at the first \n. (If you want translation, you're apparently not
all that interested in the details, so the default is best for you.)
For output, it's different: *not* translating on output doesn't
require one to specify a line ending when opening the file.

Here are a few complete scenarios:

- Copy a file (perhaps changing the encoding) while keeping line
endings the same: specify newline="" on input and output.

- Copy a file translating line endings to the platform default:
specify newline=None on input and output.

- Copy a file translating line endings to a specific string: specify
newline=None on input and newline="<string>" on output.

- Read a Windows file the way it would be interpreted by certain tools
on Windows: set newline="\r\n" (this treats a lone \n or \r as a
regular character).
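
These scenarios can be tried against the io module as it ended up in
Python 3, assuming its newline handling matches the description above
(on input, a specific line ending is not translated).  The sketch wraps
in-memory bytes instead of real files; read_lines and write_text are
illustrative helpers.

```python
import io


def read_lines(data, newline):
    """Decode bytes through TextIOWrapper and return the list of lines."""
    wrapper = io.TextIOWrapper(io.BytesIO(data), encoding="ascii",
                               newline=newline)
    return wrapper.readlines()


def write_text(text, newline):
    """Encode text through TextIOWrapper and return the resulting bytes."""
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding="ascii", newline=newline)
    wrapper.write(text)
    wrapper.flush()
    result = raw.getvalue()
    wrapper.detach()  # stop the wrapper from closing raw later
    return result


mixed = b"a\r\nb\rc\n"
print(read_lines(mixed, None))    # universal newlines, translated to \n
print(read_lines(mixed, ""))      # universal newlines, untranslated
print(read_lines(mixed, "\r\n"))  # only \r\n ends a line
print(write_text("x\ny\n", "\r\n"))  # every \n written becomes \r\n
print(write_text("x\ny\n", ""))      # no translation on output
```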

On 8/15/07, Christian Heimes <lists at cheimes.de> wrote:
> I like to propose some constants which should be used instead of the
> strings:
>
> MAC = '\r'
> UNIX = '\n'
> WINDOWS = '\r\n'
> UNIVERSAL = ''
> NOTRANSLATE = None
>
> I think that open(filename, newline=io.UNIVERSAL) or open(filename,
> newline=io.WINDOWS) is much more readable than open(filename,
> newline=''). Besides I always forget if Windows is '\r\n' or '\n\r'. *g*

I find named constants unpythonic; taken to the extreme you'd also
want to define names for modes like "r" and "w+b". I also think it's a
bad idea to use platform names -- lots of places besides Windows use
\r\n (e.g. most standard internet protocols), and most modern Mac
applications use \n, not \r.

On 8/15/07, Georg Brandl <g.brandl at gmx.net> wrote:
> I'd use None and "\r"/... as proposed, but "U" instead of the empty string
> for universal newline mode. "U" already has that established meaning, and
> you don't have to remember the difference between the two (false) values ""
> and None.

But it would close off the possible extension to other separators I
mentioned above.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From rhamph at gmail.com  Wed Aug 15 19:39:34 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Wed, 15 Aug 2007 11:39:34 -0600
Subject: [Python-3000] Proposed new language for newline parameter to
	TextIOBase
In-Reply-To: <f9uv15$jva$1@sea.gmane.org>
References: <ca471dc20708142156l9e7ef7ue33e314041d4ad90@mail.gmail.com>
	<bbaeab100708150047i6bcc59e1pdfbcfe4f655f5f1d@mail.gmail.com>
	<f9uv15$jva$1@sea.gmane.org>
Message-ID: <aac2c7cb0708151039j260ccffby19b04cf632e51e9a@mail.gmail.com>

On 8/15/07, Christian Heimes <lists at cheimes.de> wrote:
> Brett Cannon wrote:
> > I like the options, but I would swap the meaning of None and the empty
> > string.  My reasoning for this is that for option 3 it says to me
> > "here is a string representing EOL, and make it \n".  So I would think
> > of the empty string as, "I don't know what EOL is, but I want it
> > translated to \n".  Then None means, "I don't want any translation
> > done" by the fact that the argument is not a string.  In other words,
> > the existence of a string argument means you want EOL translated to
> > \n, and the specific value of 'newline' specifying how to determine
> > what EOL is.
>
> I like to propose some constants which should be used instead of the
> strings:
>
> MAC = '\r'
> UNIX = '\n'
> WINDOWS = '\r\n'
> UNIVERSAL = ''
> NOTRANSLATE = None
>
> I think that open(filename, newline=io.UNIVERSAL) or open(filename,
> newline=io.WINDOWS) is much more readable than open(filename,
> newline=''). Besides I always forget if Windows is '\r\n' or '\n\r'. *g*

I agree, but please make the constants opaque.  I don't want to see a
random mix of constants and non-constants.  Plus, opaque constants
could be self-documenting.
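
One way to read "opaque" here, as a sketch only: a tiny class whose
instances carry the real newline string privately and print a
descriptive repr.  Newline, the three constants and resolve() are
hypothetical names; nothing like this exists in the io module.

```python
class Newline:
    """Opaque marker for a newline mode; callers never see the raw string."""

    def __init__(self, name, value):
        self._name = name
        self._value = value

    def __repr__(self):
        return "<newline mode %s>" % self._name


# Values follow the constants proposed in the quoted message.
UNIVERSAL = Newline("UNIVERSAL", "")
NOTRANSLATE = Newline("NOTRANSLATE", None)
WINDOWS = Newline("WINDOWS", "\r\n")


def resolve(mode):
    """Reject raw strings so constants and non-constants cannot be mixed."""
    if not isinstance(mode, Newline):
        raise TypeError("expected a Newline constant, got %r" % (mode,))
    return mode._value


print(repr(WINDOWS))
print(repr(resolve(WINDOWS)))
```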

-- 
Adam Olsen, aka Rhamphoryncus

From jjb5 at cornell.edu  Wed Aug 15 19:44:04 2007
From: jjb5 at cornell.edu (Joel Bender)
Date: Wed, 15 Aug 2007 13:44:04 -0400
Subject: [Python-3000] Fix imghdr module for bytes
In-Reply-To: <46BFA0FC.2060707@canterbury.ac.nz>
References: <200708110235.43664.victor.stinner@haypocalc.com>	<aac2c7cb0708101745w53247653ifc4aeef0e9c287a4@mail.gmail.com>	<79990c6b0708121219x3aecef78hc58443b592c0a13d@mail.gmail.com>
	<46BFA0FC.2060707@canterbury.ac.nz>
Message-ID: <46C33B64.9040104@cornell.edu>

Greg Ewing wrote:

> I'm wondering whether we want a "byte character literal"
> to go along with "byte string literals":
> 
>    h[0] == c"P"
> 
> After all, if it makes sense to write an array of bytes
> as though they were ASCII characters, it must make sense
> to write a single byte that way as well.

Would you propose these to be mutable as well?  Ugh.  :-)
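
For context, a sketch of how the comparison is spelled in Python 3 as
released, where no c"..." literal exists (an observation from later
releases, not something decided in this thread): indexing a bytes
object yields an int, while slicing yields bytes.  The header value
below is made up.

```python
h = b"P4 sample header"  # hypothetical image header bytes

first_is_p_int = h[0] == ord("P")      # indexing gives an int
first_is_p_slice = h[:1] == b"P"       # a one-byte slice is still bytes
first_is_p_prefix = h.startswith(b"P")  # prefix test on bytes

print(first_is_p_int, first_is_p_slice, first_is_p_prefix)
```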


Joel


From martin at v.loewis.de  Wed Aug 15 19:51:02 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 15 Aug 2007 19:51:02 +0200
Subject: [Python-3000] Documentation switch imminent
In-Reply-To: <f9v2rd$2dl$1@sea.gmane.org>
References: <f9t2nn$ksg$1@sea.gmane.org> <f9v2rd$2dl$1@sea.gmane.org>
Message-ID: <46C33D06.9030607@v.loewis.de>

> Okay, I made the switch.  I tagged the state of both Python branches
> before the switch as tags/py{26,3k}-before-rstdocs/.

Update instructions:

1. svn diff Doc; any pending changes will need to be redone
2. svn up; this will remove the tex sources, and then likely
   fail if there were still other files present in Doc, e.g.
   from building the documentation
3. review any files left in Doc
4. rm -rf Doc
5. svn up

If you are certain there is nothing of interest in your sandbox
copy of Doc, you can start with step 4.

Regards,
Martin

From tony at PageDNA.com  Wed Aug 15 20:27:45 2007
From: tony at PageDNA.com (Tony Lownds)
Date: Wed, 15 Aug 2007 11:27:45 -0700
Subject: [Python-3000] Proposed new language for newline parameter to
	TextIOBase
In-Reply-To: <ca471dc20708142156l9e7ef7ue33e314041d4ad90@mail.gmail.com>
References: <ca471dc20708142156l9e7ef7ue33e314041d4ad90@mail.gmail.com>
Message-ID: <BBE1B683-4A81-4835-A4E2-1416DCAF7711@PageDNA.com>


On Aug 14, 2007, at 9:56 PM, Guido van Rossum wrote:

> I thought some more about the universal newlines situation, and I
> think I can handle all the use cases with a single 'newline'
> parameter. The use cases are:
>
> (A) input use cases:
>
> (1) newline=None: input with default universal newlines mode; lines
> may end in \r, \n, or \r\n, and these are translated to \n.
>
> (2) newline='': input with untranslated universal newlines mode; lines
> may end in \r, \n, or \r\n, and these are returned untranslated.
>
> (3) newline='\r', newline='\n', newline='\r\n': input lines must end
> with the given character(s), and these are translated to \n.
>
> (B) output use cases:
>
> (1) newline=None: every \n written is translated to os.linesep.
>
> (2) newline='': no translation takes place.
>
> (3) newline='\r', newline='\n', newline='\r\n': every \n written is
> translated to the value of newline.
>

These make a lot of sense to me. I'm working on test cases / cleanup,
but I have a patch that implements the behavior above. And the
newlines attribute.

Thanks
-Tony



From victor.stinner at haypocalc.com  Wed Aug 15 21:52:38 2007
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Wed, 15 Aug 2007 21:52:38 +0200
Subject: [Python-3000] Questions about email bytes/str (python 3000)
In-Reply-To: <07Aug14.184454pdt."57996"@synergy1.parc.xerox.com>
References: <200708140422.36818.victor.stinner@haypocalc.com>
	<E8DCAEF8-B7F8-4946-8256-AD0732492C51@python.org>
	<07Aug14.184454pdt."57996"@synergy1.parc.xerox.com>
Message-ID: <200708152152.38839.victor.stinner@haypocalc.com>

On Wednesday 15 August 2007 03:44:54 Bill Janssen wrote:
> > (...) I think that base64MIME.encode() may have to accept strings.
>
> Personally, I think it would avoid more errors if it didn't.

Yeah, how can you guess which charset the user wants to use? For most users, 
there is only one charset: latin-1. So if you use UTF-8, they will not 
understand the conversion errors.

Another argument: I like bidirectional codecs:
   decode(encode(x)) == x
   encode(decode(x)) == x

So if you mix bytes and str, these relations will be wrong.
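
A sketch of the round trip with the standard base64 module as it
behaves in Python 3 (the email package function named above is not used
here): encoding takes bytes and returns bytes, so text has to be
encoded with an explicitly chosen charset first.

```python
import base64

text = "caf\u00e9"
raw = text.encode("utf-8")        # the caller picks the charset
encoded = base64.b64encode(raw)   # bytes in, bytes out

# Both directions of the round trip hold when only bytes are involved.
round_trip_decode = base64.b64decode(encoded) == raw
round_trip_encode = base64.b64encode(base64.b64decode(encoded)) == encoded

try:
    base64.b64encode(text)        # str is rejected outright
    str_rejected = False
except TypeError:
    str_rejected = True

print(round_trip_decode, round_trip_encode, str_rejected)
```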

Victor Stinner aka haypo
http://hachoir.org/

From rhamph at gmail.com  Wed Aug 15 22:14:25 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Wed, 15 Aug 2007 14:14:25 -0600
Subject: [Python-3000] Proposed new language for newline parameter to
	TextIOBase
In-Reply-To: <ca471dc20708142156l9e7ef7ue33e314041d4ad90@mail.gmail.com>
References: <ca471dc20708142156l9e7ef7ue33e314041d4ad90@mail.gmail.com>
Message-ID: <aac2c7cb0708151314m459f555mf0126f58ec0117b@mail.gmail.com>

On 8/14/07, Guido van Rossum <guido at python.org> wrote:
> I thought some more about the universal newlines situation, and I
> think I can handle all the use cases with a single 'newline'
> parameter. The use cases are:
>
> (A) input use cases:
>
> (1) newline=None: input with default universal newlines mode; lines
> may end in \r, \n, or \r\n, and these are translated to \n.
>
> (2) newline='': input with untranslated universal newlines mode; lines
> may end in \r, \n, or \r\n, and these are returned untranslated.

Caveat: this mode cannot be supported by sockets.  When reading a lone
\r you need to peek ahead to ensure the next character is not a \n,
but for sockets that may block indefinitely.

I don't expect sockets to use the file API by default, but there's
enough overlap (named pipes?) that limitations like this should be
well documented (and if possible, produce an explicit error!)


> (3) newline='\r', newline='\n', newline='\r\n': input lines must end
> with the given character(s), and these are translated to \n.
>
> (B) output use cases:
>
> (1) newline=None: every \n written is translated to os.linesep.
>
> (2) newline='': no translation takes place.
>
> (3) newline='\r', newline='\n', newline='\r\n': every \n written is
> translated to the value of newline.
>
> Note that cases (2) are new, and case (3) changes from the current PEP
> and/or from the current implementation (which seems to deviate from
> the PEP).

[snip]
>
>         * If universal newlines without translation are requested on
>           input (i.e. ``newline=''``), if a system read operation
>           returns a buffer ending in ``'\r'``, another system read
>           operation is done to determine whether it is followed by
>           ``'\n'`` or not.  In universal newlines mode with
>           translation, the second system read operation may be
>           postponed until the next read request, and if the following
>           system read operation returns a buffer starting with
>           ``'\n'``, that character is simply discarded.


-- 
Adam Olsen, aka Rhamphoryncus

From guido at python.org  Wed Aug 15 22:17:15 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 15 Aug 2007 13:17:15 -0700
Subject: [Python-3000] Proposed new language for newline parameter to
	TextIOBase
In-Reply-To: <aac2c7cb0708151314m459f555mf0126f58ec0117b@mail.gmail.com>
References: <ca471dc20708142156l9e7ef7ue33e314041d4ad90@mail.gmail.com>
	<aac2c7cb0708151314m459f555mf0126f58ec0117b@mail.gmail.com>
Message-ID: <ca471dc20708151317w71082250na15f5faba4ed40bb@mail.gmail.com>

On 8/15/07, Adam Olsen <rhamph at gmail.com> wrote:
> On 8/14/07, Guido van Rossum <guido at python.org> wrote:
> > (2) newline='': input with untranslated universal newlines mode; lines
> > may end in \r, \n, or \r\n, and these are returned untranslated.
>
> Caveat: this mode cannot be supported by sockets.  When reading a lone
> \r you need to peek ahead to ensure the next character is not a \n,
> but for sockets that may block indefinitely.

It depends on what you want. In general *any* read from a socket may
block indefinitely. If the protocol requires turning around at \r *or*
\r\n I'd say the protocol is insane.

> I don't expect sockets to use the file API by default, but there's
> enough overlap (named pipes?) that limitations like this should be
> well documented (and if possible, produce an explicit error!)

Why do you want it to produce an error? Who says I don't know what I'm
doing when I request that mode?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From rhamph at gmail.com  Wed Aug 15 22:36:58 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Wed, 15 Aug 2007 14:36:58 -0600
Subject: [Python-3000] Proposed new language for newline parameter to
	TextIOBase
In-Reply-To: <ca471dc20708151317w71082250na15f5faba4ed40bb@mail.gmail.com>
References: <ca471dc20708142156l9e7ef7ue33e314041d4ad90@mail.gmail.com>
	<aac2c7cb0708151314m459f555mf0126f58ec0117b@mail.gmail.com>
	<ca471dc20708151317w71082250na15f5faba4ed40bb@mail.gmail.com>
Message-ID: <aac2c7cb0708151336m616ee4dalec885cff89e05186@mail.gmail.com>

On 8/15/07, Guido van Rossum <guido at python.org> wrote:
> On 8/15/07, Adam Olsen <rhamph at gmail.com> wrote:
> > On 8/14/07, Guido van Rossum <guido at python.org> wrote:
> > > (2) newline='': input with untranslated universal newlines mode; lines
> > > may end in \r, \n, or \r\n, and these are returned untranslated.
> >
> > Caveat: this mode cannot be supported by sockets.  When reading a lone
> > \r you need to peek ahead to ensure the next character is not a \n,
> > but for sockets that may block indefinitely.
>
> It depends on what you want. In general *any* read from a socket may
> block indefinitely. If the protocol requires turning around at \r *or*
> \r\n I'd say the protocol is insane.
>
> > I don't expect sockets to use the file API by default, but there's
> > enough overlap (named pipes?) that limitations like this should be
> > well documented (and if possible, produce an explicit error!)
>
> Why do you want it to produce an error? Who says I don't know what I'm
> doing when I request that mode?

As you just said, you'd be insane to require it.

But on second thought I don't think we can reliably say it's wrong.  A
named pipe may just have a file cat'd through it, which would handle
this mode just fine.

It should be documented that interactive streams cannot safely use this mode.
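
The reason is easier to see in a sketch of an untranslated
universal-newlines splitter: a chunk that ends in \r cannot be turned
into a line until the next chunk shows whether a \n follows.
UniversalLineSplitter is a hypothetical illustration, not the io.py
implementation.

```python
class UniversalLineSplitter:
    """Split text chunks into lines ending in \\r, \\n or \\r\\n, untranslated."""

    def __init__(self):
        self._pending = ""  # text not yet known to form a complete line

    def feed(self, chunk):
        data = self._pending + chunk
        lines = []
        start = i = 0
        n = len(data)
        while i < n:
            ch = data[i]
            if ch == "\n":
                i += 1
                lines.append(data[start:i])
                start = i
            elif ch == "\r":
                if i + 1 == n:
                    # Trailing \r: must wait for more input before deciding.
                    break
                i += 2 if data[i + 1] == "\n" else 1
                lines.append(data[start:i])
                start = i
            else:
                i += 1
        self._pending = data[start:]
        return lines

    def close(self):
        """End of stream: whatever is pending is the final line."""
        rest, self._pending = self._pending, ""
        return [rest] if rest else []


splitter = UniversalLineSplitter()
first = splitter.feed("a\r")      # nothing can be returned yet
second = splitter.feed("\nb\rc")  # now 'a\r\n' and 'b\r' are complete
last = splitter.close()
print(first, second, last)
```

On an interactive stream the second feed() may never come, which is
exactly the blocking case described above.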

-- 
Adam Olsen, aka Rhamphoryncus

From rrr at ronadam.com  Wed Aug 15 22:52:56 2007
From: rrr at ronadam.com (Ron Adam)
Date: Wed, 15 Aug 2007 15:52:56 -0500
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <fb6fbf560708150907q5d2f037ex85139dce51dec7fb@mail.gmail.com>
References: <46B13ADE.7080901@acm.org> <46BBEB16.2040205@ronadam.com>	
	<46BC1479.30405@canterbury.ac.nz>
	<46BCA9C9.1010306@ronadam.com>	 <46BD5BD8.7030706@acm.org>
	<46BF22D8.2090309@trueblade.com>	 <46BF3CC7.6010405@acm.org>
	<46C0EEBF.3010206@ronadam.com>	
	<fb6fbf560708141233k193e36d0rae99e1decff42f11@mail.gmail.com>	
	<46C2325D.1010209@ronadam.com>
	<fb6fbf560708150907q5d2f037ex85139dce51dec7fb@mail.gmail.com>
Message-ID: <46C367A8.4040601@ronadam.com>



Jim Jewett wrote:
> On 8/14/07, Ron Adam <rrr at ronadam.com> wrote:
>> Jim Jewett wrote:
>>> On 8/13/07, Ron Adam <rrr at ronadam.com> wrote:
> 
>>>>        {name[:][type][alignment_term][,content_modifying_term]}
> 
>>> ...  You used the (alignment term)
>>> width as the number of digits before the decimal,
>>> instead of as the field width.
> 
>> You can leave out either term.  So that may have
>> been what you are seeing.
> 
> I thought I was going based on your spec, rather than the examples.
> 
>>      {0:f8.2}
>
> Because there is no comma, this should be a field of width 8 -- two
> after a decimal point, the decimal point itself, and at most 5 digits
> (or 4 and sign) before the decimal point
> 
> I (mis?)read your message as special-casing float to 8 digits before
> the decimal point, for a total field width of 11.

No, but I see where the confusion is coming from.  The spec is split up a 
bit differently than what you are expecting.

Maybe this will help...

       {name[:[type][align][,format]]}

If the alignment is left out, the comma isn't needed so it becomes..

       {name[:[type][format]]}

Which is what you are seeing.


BUT, after trying sequential specification terms and finding that they work 
better, this changes a bit.  More on this at the end of this message.  I'll 
try to answer your questions first.

> ...
>> Minimal width with fill for shorter than width items.  It
>> expands if the length of the item is longer than width.
>  ...
>>> Can I say "width=whatever it takes, up to 72 chars ...
>>> but don't pad it if you don't need to"?
> 
>> That's the default behavior.
> 
> If width is only a minimum, where did it get the maximum of 72 chars part?

Ok.. we aren't communicating...  Let's see if I can clear this up.

       width = whatever it takes    # size of a field
       max_width = 72

Padding doesn't make sense here.  Whatever it takes is basically saying 
there is no minimum width, so there is nothing to say how much padding to 
use.  Unless you want a fixed 72 width with padding.  Then the whatever it 
takes part doesn't make sense.

So if you want a padded width of 30, and a maximum width of 72, you would 
use the following.

     {0:s^30/_,+72}    Pad strings shorter than 30 with '_',
                       trim long strings to 72 max.

In this case the +72 is evaluated in the context of a string.

>>> I'm not sure that variable lengths and alignment even
>>> *should* be supported in the same expression, but if
>>> forcing everything to fixed-width would be enough of
>>> a change that it needs an explicit callout.
> 
>> Alignment is needed for when the length of the value is shorter than the
>> length of the field.  So if a field has a minimal width, and a value is
>> shorter than that, it will be used.
> 
> I should have been more explicit -- variable length *field*.  How can
> one report say "use up to ten characters if you need them, but only
> use three if that is all you need" and another report say "use exactly
> ten characters; right align and fill with spaces."
> 
> -jJ

    {0:s<3,+10}     The second part is like s[:+10]

Minimum left-aligned width of 3 (strings of 1 and 2 characters go to the 
left, with spaces as padding up to 3 characters total), but expand up to 10.  
Anything over that is trimmed to 10 characters.

    {0:s>10,+10}

Right align shorter than 10 character strings and pad with spaces, and cut 
longer strings to their leftmost 10 characters.

    {0:s>10,-10}   The second term is like s[-10:]

Right align shorter than 10 character strings, and pad with spaces, and cut 
longer strings to their rightmost 10 characters.
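
The three examples above can be modelled with a short sketch.  This is
a hypothetical reading of the proposed terms, not the proof-of-concept
implementation: apply_term and chain are made-up names, and the padding
itself is delegated to format() with the alignment syntax Python 3
eventually adopted.

```python
import re

_ALIGN = re.compile(r"s?([<>^])(\d+)(?:/(.))?$")  # e.g. s>10 or s^30/_
_CLIP = re.compile(r"s?([+-])(\d+)$")             # e.g. +10 or -10


def apply_term(text, term):
    """Apply one align or clip term to a string."""
    m = _ALIGN.match(term)
    if m:
        direction, width, fill = m.group(1), int(m.group(2)), m.group(3) or " "
        return format(text, "%s%s%d" % (fill, direction, width))
    m = _CLIP.match(term)
    if m:
        n = int(m.group(2))
        if n == 0:
            return ""
        # +N behaves like s[:N], -N behaves like s[-N:].
        return text[:n] if m.group(1) == "+" else text[-n:]
    raise ValueError("unsupported term: %r" % (term,))


def chain(value, spec):
    """Evaluate comma-separated terms from left to right."""
    text = str(value)
    for term in spec.split(","):
        text = apply_term(text, term)
    return text


print(repr(chain("ab", "s<3,+10")))
print(repr(chain("abcdefghijklmnop", "s<3,+10")))
print(repr(chain("abc", "s>10,+10")))
print(repr(chain("abcdefghijkl", "s>10,-10")))
```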




Now... for a different variation.

I've found we can do all this by chaining multiple specifiers in a simple 
left to right evaluated expression.  It doesn't split the type letters from 
the terms like what confused you earlier, so you may like this better.

The biggest problem with this is the use of commas as separators: they can 
clash with commas in the specifiers.  This is a problem with any multi-part 
split specifier as well, so if you can think of a way to resolve that, it 
would be good.



BTW, The examples above work unchanged.

      {0:s>10,+10}

This is equivalent to...

       s>10,s+10

We can drop the 's' off the second term because the result of the first 
term is a string.  Or it can be left on for clarity.  So this evaluates to...

       value.__format__('s>10').__format__('+10')

We could combine the terms, but because the behavior is so different, 
aligning  vs clipping, I don't think that's a good idea even though they 
are both handled by the string type.

With numbers it works the same way...

      {0:f10.2,>12}

      value.__format__('f10.2').__format__('>12')

Here a string is returned after the f format spec is applied, and then 
string's __format__ handles the field alignment.  The leading 's' isn't 
needed for 's>12' because the type is already a string.
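
The same left-to-right chaining can be sketched with the built-in
format() from released Python 3.  Note the assumption: the shipped
syntax puts the type letter last ('10.2f'), unlike the 'f10.2' spelling
used in this message, and format_chain is a made-up helper.

```python
from functools import reduce


def format_chain(value, *specs):
    """Apply each spec in turn; every step after the first formats a str."""
    return reduce(format, specs, value)


# Fixed-point with two decimals in a width of 10, then right-aligned in 12.
result = format_chain(3.14159, "10.2f", ">12")
print(repr(result))
```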


It's much simpler than it may sound in practice.  I also have a test (proof 
of concept) implementation if you'd like to see it.

I've worked in parts of Talin's latest PEP update as well.  There are 
still plenty of things to add to it though, like scientific notation and 
general number forms, unused argument testing, nested terms, and attribute 
access.


     General form:

         {label[:spec1[,spec2][,spec3]...]}


     Each value's __format__ method is called with the next specifier,
     in left-to-right sequence.

     In most cases only one or two specifiers are needed.


     * = Not implemented yet

     The general string presentation type.

         's' - String. Outputs aligned and/or trimmed strings.
         'r' - Repr string. Outputs a string by using repr().

          Strings are the default for anything without a __format__ method.


     The available integer presentation types are:

       * 'b' - Binary. Outputs the number in base 2.
       * 'c' - Character. Converts the integer to the corresponding
               unicode character before printing.
         'd' - Decimal Integer. Outputs the number in base 10.
         'o' - Octal format. Outputs the number in base 8.
         'x' - Hex format. Outputs the number in base 16, using lower-
               case letters for the digits above 9.
         'X' - Hex format. Outputs the number in base 16, using upper-
               case letters for the digits above 9.

     The available floating point presentation types are:

       * 'e' - Exponent notation. Prints the number in scientific
               notation using the letter 'e' to indicate the exponent.
       * 'E' - Exponent notation. Same as 'e' except it uses an upper
               case 'E' as the separator character.
         'f' - Fixed point. Displays the number as a fixed-point
               number.
         'F' - Fixed point. Same as 'f'.
       * 'g' - General format. This prints the number as a fixed-point
               number, unless the number is too large, in which case
               it switches to 'e' exponent notation.
       * 'G' - General format. Same as 'g' except switches to 'E'
               if the number gets too large.
       * 'n' - Number. This is the same as 'g', except that it uses the
               current locale setting to insert the appropriate
               number separator characters.
         '%' - Percentage. Multiplies the number by 100 and displays
               in fixed ('f') format, followed by a percent sign.


     String alignment:

         Sets minimum field width, justification, and padding chars.

         [s|r][justify][width][/padding_char]

         <   left justify            (default)
         >   right justify
         ^   center

         width               minimum field width

         /padding_char       A single character


     String clipping:

         [s|r][trimmed width]

         +n  Use n characters from beginning.  s[:+n]
         -n  Use n characters from end.  s[-n:]
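
The two clipping forms map directly onto slicing.  A minimal sketch, where
clip is a hypothetical helper name used only for illustration:

```python
def clip(s, n):
    """Clip a string: positive n keeps the first n characters (s[:n]),
    negative n keeps the last abs(n) characters (s[n:])."""
    if n >= 0:
        return s[:n]
    return s[n:]

long_str = "World" * 3
print(clip(long_str, +12))   # first 12 characters
print(clip(long_str, -12))   # last 12 characters
```

Strings shorter than the requested width pass through unchanged, since
slicing past either end of a string is not an error.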


     Numeric formats:

         [type][sign][0][digits][.decimals][%]

         signs:
             -   show negative sign   (default)
             +   show both negative and positive sign
             (   parentheses around negatives, spaces around positives
                   (ending ')' is optional.)

         %   multiplies number by 100 and places ' %' after it.

         0   pad digits with leading zeros.

         digits      number of digits before the decimal

         decimals    number of digits after the decimal


EXAMPLES:

     >>> short_str = fstr("World")
     >>> long_str = fstr("World" * 3)
     >>> pos_int = fint(12345)
     >>> neg_int = fint(-12345)
     >>> pos_float = ffloat(123.45678)
     >>> neg_float = ffloat(-123.45678)


     >>> fstr("Hello {0}").format(short_str)    # default __str__
     'Hello World'

     >>> fstr("Hello {0:s}").format(short_str)
     'Hello World'

     >>> fstr("Hello {0:r}").format(short_str)
     "Hello 'World'"

     >>> fstr("Hello {0:s<10}").format(short_str)
     'Hello World     '

     >>> fstr("Hello {0:s^10/_}").format(short_str)
     'Hello __World___'

     >>> fstr("Hello {0:s+12,<10}").format(short_str)
     'Hello World     '

     >>> fstr("Hello {0:s+12,<10}").format(long_str)
     'Hello WorldWorldWo'

     >>> fstr("Hello {0:r+12,<10}").format(long_str)
     "Hello 'WorldWorldW"

     >>> fstr("Hello {0:s-12}").format(long_str)
     'Hello ldWorldWorld'


     INTEGERS:

     >>> fstr("Item Number: {0}").format(pos_int)
     'Item Number: 12345'

     >>> fstr("Item Number: {0:i}").format(pos_int)
     'Item Number: 12345'

     >>> fstr("Item Number: {0:x}").format(pos_int)
     'Item Number: 0x3039'

     >>> fstr("Item Number: {0:X}").format(pos_int)
     'Item Number: 0X3039'

     >>> fstr("Item Number: {0:o}").format(pos_int)
     'Item Number: 030071'

     >>> fstr("Item Number: {0:>10}").format(pos_int)
     'Item Number:      12345'

     >>> fstr("Item Number: {0:<10}").format(pos_int)
     'Item Number: 12345     '

     >>> fstr("Item Number: {0:^10}").format(pos_int)
     'Item Number:   12345   '

     >>> fstr("Item Number: {0:i010%}").format(neg_int)
     'Item Number: -0001234500 %'

     >>> fstr("Item Number: {0:i()}").format(pos_int)
     'Item Number:  12345 '

     >>> fstr("Item Number: {0:i()}").format(neg_int)
     'Item Number: (12345)'


     FIXEDPOINT:

     >>> fstr("Item Number: {0}").format(pos_float)
     'Item Number: 123.45678'

     >>> fstr("Item Number: {0:f}").format(pos_float)
     'Item Number: 123.45678'

     >>> fstr("Item Number: {0:>12}").format(pos_float)
     'Item Number:    123.45678'

     >>> fstr("Item Number: {0:<12}").format(pos_float)
     'Item Number: 123.45678   '

     >>> fstr("Item Number: {0:^12}").format(pos_float)
     'Item Number:  123.45678  '

     >>> fstr("Item Number: {0:f07.2%}").format(neg_float)
     'Item Number: -0012345.68 %'

     >>> fstr("Item Number: {0:F.3}").format(neg_float)
     'Item Number: -123.457'

     >>> fstr("Item Number: {0:f.7}").format(neg_float)
     'Item Number: -123.4567800'

     >>> fstr("Item Number: {0:f(05.3)}").format(neg_float)
     'Item Number: (00123.457)'

     >>> fstr("Item Number: {0:f05.7}").format(neg_float)
     'Item Number: -00123.4567800'

     >>> fstr("Item Number: {0:f06.2}").format(neg_float)
     'Item Number: -000123.46'

     >>> import time
     >>> class GetTime(object):
     ...     def __init__(self, time=time.gmtime()):
     ...         self.time = time
     ...     def __format__(self, spec):
     ...         return fstr(time.strftime(spec, self.time))

     >>> start = GetTime(time.gmtime(1187154773.0085449))

     >>> fstr("Start: {0:%d/%m/%Y %H:%M:%S,<30}").format(start)
     'Start: 15/08/2007 05:12:53           '


-----------------------------------------------------------------
Examples from python3000 list:
     (With only a few changes where it makes sense or to make it work.)

     >>> floatvalue = ffloat(123.456)

     # Floating point number of natural width
     >>> fstr('{0:f}').format(floatvalue)
     '123.456'

     # Floating point number, with 10 digits before the decimal
     >>> fstr('{0:f10}').format(floatvalue)
     '       123.456'

     # Floating point number, in field of width 10, right justified.
     >>> fstr('{0:f,>10}').format(floatvalue)
     '   123.456'

     # Floating point number, width at least 10 digits before
     # the decimal, leading zeros
     >>> fstr('{0:f010}').format(floatvalue)
     '0000000123.456'

     # Floating point number with two decimal digits
     >>> fstr('{0:f.2}').format(floatvalue)
     '123.46'

     # Minimum width 8, type defaults to natural type
     >>> fstr('{0:8}').format(floatvalue)
     '123.456 '

     # Integer number, 5 digits, sign always shown
     >>> fstr('{0:d+5}').format(floatvalue)
     '+  123'

     # repr() format
     >>> fstr('{0:r}').format(floatvalue)
     "'123.456'"

     # Field width 10, repr() format
     >>> fstr('{0:r10}').format(floatvalue)
     "'123.456' "

     # String left-aligned (the default) within field of minimum width
     >>> fstr('{0:s10}').format(floatvalue)
     '123.456   '

     # String clipped to 10 characters, then placed in field of minimum width 10
     >>> fstr('{0:s+10,10}').format(floatvalue)
     '123.456   '

     # String left-aligned in 10 char (min) field.
     >>> fstr('{0:s<10}').format(floatvalue)
     '123.456   '

     # Integer centered in 15 character field
     >>> fstr('{0:d,^15}').format(floatvalue)
     '      123      '

     # Right align and pad with '.' chars
     >>> fstr('{0:>15/.}').format(floatvalue)
     '........123.456'

     # Floating point, leading zeros, 10 digits before decimal,
     # 5 decimal places.
     >>> fstr('{0:f010.5}').format(floatvalue)
     '0000000123.45600'





From rrr at ronadam.com  Wed Aug 15 23:29:40 2007
From: rrr at ronadam.com (Ron Adam)
Date: Wed, 15 Aug 2007 16:29:40 -0500
Subject: [Python-3000] Format specifier proposal
In-Reply-To: <aac2c7cb0708151020sb51a5d6u55f736bc838e2c38@mail.gmail.com>
References: <46BD79EC.1020301@acm.org> <46C095CF.2060507@ronadam.com>	
	<46C11DDF.2080607@acm.org>	
	<ca471dc20708132053h1fa1c18ai87d61d8b85aadf07@mail.gmail.com>	
	<20070814022805.4a2b44a2.ajwade+py3k@andrew.wade.networklinux.net>	
	<ca471dc20708140941o1428bda9w9aa1f9be772d80e5@mail.gmail.com>	
	<46C24F56.5050104@canterbury.ac.nz> <46C26110.8020001@ronadam.com>	
	<20070815004204.f986b4b1.ajwade+py3k@andrew.wade.networklinux.net>	
	<46C2A2B1.1010708@ronadam.com>
	<aac2c7cb0708151020sb51a5d6u55f736bc838e2c38@mail.gmail.com>
Message-ID: <46C37044.6000406@ronadam.com>



Adam Olsen wrote:
> On 8/15/07, Ron Adam <rrr at ronadam.com> wrote:
>>
>> Andrew James Wade wrote:
>>> On Tue, 14 Aug 2007 21:12:32 -0500
>>> Ron Adam <rrr at ronadam.com> wrote:
>>>> What I was thinking of was just a simple left to right evaluation order.
>>>>
>>>>      "{0:spec1, spec2, ... }".format(x)
>>>>
>>>> I don't expect this will ever get very long.
>>> The first __format__ will return a str, so chains longer than 2 don't
>>> make a lot of sense. And the delimiter character should be allowed in
>>> spec1; limiting the length of the chain to 2 allows that without escaping:
>>>
>>>     "{0:spec1-with-embedded-comma,}".format(x)
>>>
>>> My scheme did the same sort of thing with spec1 and spec2 reversed.
>>> Your order makes more intuitive sense; I chose my order because I
>>> wanted the syntax to be a generalization of formatting strings.
>>>
>>> Handling the chaining within the __format__ methods should be all of
>>> two lines of boilerplate per method.
>> I went ahead and tried this out and it actually cleared up some difficulty
>>   in organizing the parsing code.  That was a very nice surprise. :)
>>
>>      (actual doctest)
>>
>>      >>> import time
>>      >>> class GetTime(object):
>>      ...     def __init__(self, time=time.gmtime()):
>>      ...         self.time = time
>>      ...     def __format__(self, spec):
>>      ...         return fstr(time.strftime(spec, self.time))
>>
>>      >>> start = GetTime(time.gmtime(1187154773.0085449))
>>
>>      >>> fstr("Start: {0:%d/%m/%Y %H:%M:%S,<30}").format(start)
>>      'Start: 15/08/2007 05:12:53           '
> 
> Caveat: some date formats include a comma.  I think the only
> workaround would be splitting them into separate formats (and using
> the input date twice).

Maybe having an escaped comma?   '\,'

It really isn't any different than escaping quotes.  It could be limited to 
just inside format {} expressions I think.

Using raw strings with '\54' won't work.

Ron


From eric+python-dev at trueblade.com  Wed Aug 15 23:34:26 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Wed, 15 Aug 2007 17:34:26 -0400
Subject: [Python-3000] Change in sys.stdin.name
Message-ID: <46C37162.5020705@trueblade.com>

I mentioned this in another message, but I thought I'd mention it here.

I see this change in the behavior of sys.stdin.name, between 2.3.3 and 
3.0x (checked out a few minutes ago).

$ python
Python 2.3.3 (#1, May  7 2004, 10:31:40)
[GCC 3.3.3 20040412 (Red Hat Linux 3.3.3-7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
 >>> import sys
 >>> sys.stdin.name
'<stdin>'


$ ./python
Python 3.0x (py3k:57077M, Aug 15 2007, 17:27:26)
[GCC 3.3.3 20040412 (Red Hat Linux 3.3.3-7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
 >>> import sys
 >>> sys.stdin.name
0


I see similar behavior with sys.stdout and sys.stderr.

Is this deliberate?  I can file a bug report if need be, just let me know.

Eric.


From brett at python.org  Wed Aug 15 23:40:22 2007
From: brett at python.org (Brett Cannon)
Date: Wed, 15 Aug 2007 14:40:22 -0700
Subject: [Python-3000] [Python-Dev]  Documentation switch imminent
In-Reply-To: <46C33D06.9030607@v.loewis.de>
References: <f9t2nn$ksg$1@sea.gmane.org> <f9v2rd$2dl$1@sea.gmane.org>
	<46C33D06.9030607@v.loewis.de>
Message-ID: <bbaeab100708151440u7df9e6fco8cc611ae98a2ae95@mail.gmail.com>

On 8/15/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > Okay, I made the switch.  I tagged the state of both Python branches
> > before the switch as tags/py{26,3k}-before-rstdocs/.
>
> Update instructions:
>
> 1. svn diff Doc; any pending changes will need to be redone
> 2. svn up; this will remove the tex sources, and then likely
>    fail if there were still other files present in Doc, e.g.
>    from building the documentation
> 3. review any files left in Doc
> 4. rm -rf Doc
> 5. svn up
>
> If you are certain there is nothing of interest in your sandbox
> copy of Doc, you can start with step 4.

Why the 'rm' call?  When I did ``svn update`` it deleted the files for
me.  Is this to ditch some metadata?

-Brett

From martin at v.loewis.de  Wed Aug 15 23:54:01 2007
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Wed, 15 Aug 2007 23:54:01 +0200
Subject: [Python-3000] [Python-Dev]  Documentation switch imminent
In-Reply-To: <bbaeab100708151440u7df9e6fco8cc611ae98a2ae95@mail.gmail.com>
References: <f9t2nn$ksg$1@sea.gmane.org> <f9v2rd$2dl$1@sea.gmane.org>	
	<46C33D06.9030607@v.loewis.de>
	<bbaeab100708151440u7df9e6fco8cc611ae98a2ae95@mail.gmail.com>
Message-ID: <46C375F9.3020908@v.loewis.de>

>> 1. svn diff Doc; any pending changes will need to be redone
>> 2. svn up; this will remove the tex sources, and then likely
>>    fail if there were still other files present in Doc, e.g.
>>    from building the documentation
>> 3. review any files left in Doc
>> 4. rm -rf Doc
>> 5. svn up
>>
>> If you are certain there is nothing of interest in your sandbox
>> copy of Doc, you can start with step 4.
> 
> Why the 'rm' call?  When I did ``svn update`` it deleted the files for
> me.  Is this to ditch some metadata?

No, it's to delete any files in this tree not under version control,
see step 2. If you had any such files, step 2 would abort with an
error message

svn: Failed to add directory 'Doc': an object of the same name already
exists

(or some such)

Regards,
Martin

From jimjjewett at gmail.com  Thu Aug 16 01:27:57 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Wed, 15 Aug 2007 19:27:57 -0400
Subject: [Python-3000] Format specifier proposal
In-Reply-To: <46C2459C.1000405@canterbury.ac.nz>
References: <46BD79EC.1020301@acm.org> <46C095CF.2060507@ronadam.com>
	<46C11DDF.2080607@acm.org>
	<ca471dc20708132053h1fa1c18ai87d61d8b85aadf07@mail.gmail.com>
	<46C2459C.1000405@canterbury.ac.nz>
Message-ID: <fb6fbf560708151627k17bed17dn89d5f5f7970aae17@mail.gmail.com>

On 8/14/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:

> ... {foo!r} rather than {foo:!r}

> But either way, I suspect I'll find it difficult
> to avoid writing it as {foo:r} in the heat of the
> moment.

Me too.

And I don't like using up the "!" character.  I know that it is still
available outside of strings, but ... it is one more character that
python used to leave for other tools.  Giving a meaning to "@" mostly
went OK, but ...

From guido at python.org  Thu Aug 16 01:40:27 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 15 Aug 2007 16:40:27 -0700
Subject: [Python-3000] Questions about email bytes/str (python 3000)
In-Reply-To: <200708152152.38839.victor.stinner@haypocalc.com>
References: <200708140422.36818.victor.stinner@haypocalc.com>
	<E8DCAEF8-B7F8-4946-8256-AD0732492C51@python.org>
	<200708152152.38839.victor.stinner@haypocalc.com>
Message-ID: <ca471dc20708151640v78018026q6915d6e792fa2c87@mail.gmail.com>

(Warning: quotation intentionally out of context!)

On 8/15/07, somebody wrote:
> For most users, there is only one charset: latin-1.

Whoa! Careful with those assumptions. This is very culture- and
platform-dependent. For Americans it's ASCII (only half joking :-).
For most of the world it's likely UTF-8. In Asia, it's anything *but*
Latin-1. On my Mac laptop, the file system and the Terminal program
default to UTF-8.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jimjjewett at gmail.com  Thu Aug 16 01:41:59 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Wed, 15 Aug 2007 19:41:59 -0400
Subject: [Python-3000] Format specifier proposal
In-Reply-To: <46C2A2B1.1010708@ronadam.com>
References: <46BD79EC.1020301@acm.org> <46C095CF.2060507@ronadam.com>
	<46C11DDF.2080607@acm.org>
	<ca471dc20708132053h1fa1c18ai87d61d8b85aadf07@mail.gmail.com>
	<20070814022805.4a2b44a2.ajwade+py3k@andrew.wade.networklinux.net>
	<ca471dc20708140941o1428bda9w9aa1f9be772d80e5@mail.gmail.com>
	<46C24F56.5050104@canterbury.ac.nz> <46C26110.8020001@ronadam.com>
	<20070815004204.f986b4b1.ajwade+py3k@andrew.wade.networklinux.net>
	<46C2A2B1.1010708@ronadam.com>
Message-ID: <fb6fbf560708151641q3ce3bbe8sbfa4131711fafee@mail.gmail.com>

On 8/15/07, Ron Adam <rrr at ronadam.com> wrote:

> After each term is returned from the __format__ call, the result's
> __format__ method is called with the next specifier.  GetTime.__format__
> returns a string.  str.__format__, aligns it.  A nice left to right
> sequence of events.

Is this a pattern that objects should normally follow, or a convention
enforced by format itself?  In other words, does

    "{0:abc,def,ghi}".format(value)

mean

    # Assume value.__format__ will delegate properly, to
    #    result1.__format__("def,ghi")
    #
    # There are some surprises when a trailing field size gets ignored by
    # value.__class__.
    #
    # Are  infinite loops more likely?
    value.__format__("abc,def,ghi")

or

    # The separator character (","?) gets hard to use in format strings...
    value.__format__("abc").__format__("def").__format__("ghi")

-jJ

From guido at python.org  Thu Aug 16 02:37:19 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 15 Aug 2007 17:37:19 -0700
Subject: [Python-3000] Change in sys.stdin.name
In-Reply-To: <46C37162.5020705@trueblade.com>
References: <46C37162.5020705@trueblade.com>
Message-ID: <ca471dc20708151737w314a2f74kf66a4bd267943562@mail.gmail.com>

It sort of is -- the new I/O library uses the file descriptor if no
filename is given. There were no unit tests that verified the old
behavior, and I think it was of pretty marginal usefulness. Code
inspecting f.name can tell the difference by looking at its type -- if
it is an int, it's a file descriptor, if it is a string, it's a file
name.
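
A minimal sketch of that type check.  It assumes the behavior of the io
library in current Python 3 releases, where a file object opened from a bare
descriptor reports that descriptor as its name; describe_name is a
hypothetical helper name.

```python
import os

def describe_name(f):
    """Report whether f.name is a file descriptor or a file name."""
    name = f.name
    if isinstance(name, int):
        return "file descriptor %d" % name
    if isinstance(name, str):
        return "file name %r" % name
    return "unknown name type: %r" % type(name)

# A file object created from a bare descriptor has no filename to report.
read_fd, write_fd = os.pipe()
with open(read_fd) as reader, open(write_fd, "w") as writer:
    print(describe_name(reader))
    print(describe_name(writer))
```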

On 8/15/07, Eric Smith <eric+python-dev at trueblade.com> wrote:
> I mentioned this in another message, but I thought I'd mention it here.
>
> I see this change in the behavior of sys.stdin.name, between 2.3.3 and
> 3.0x (checked out a few minutes ago).
>
> $ python
> Python 2.3.3 (#1, May  7 2004, 10:31:40)
> [GCC 3.3.3 20040412 (Red Hat Linux 3.3.3-7)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>  >>> import sys
>  >>> sys.stdin.name
> '<stdin>'
>
>
> $ ./python
> Python 3.0x (py3k:57077M, Aug 15 2007, 17:27:26)
> [GCC 3.3.3 20040412 (Red Hat Linux 3.3.3-7)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>  >>> import sys
>  >>> sys.stdin.name
> 0
>
>
> I see similar behavior with sys.stdout and sys.stderr.
>
> Is this deliberate?  I can file a bug report if need be, just let me know.
>
> Eric.


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From lists at cheimes.de  Thu Aug 16 02:47:27 2007
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 16 Aug 2007 02:47:27 +0200
Subject: [Python-3000] Change in sys.stdin.name
In-Reply-To: <46C37162.5020705@trueblade.com>
References: <46C37162.5020705@trueblade.com>
Message-ID: <fa06r5$v10$1@sea.gmane.org>

Eric Smith wrote:
> Is this deliberate?  I can file a bug report if need be, just let me know.

I'm sure it is a bug. The site.installnewio() function doesn't set the
names. The attached patch fixes the issue and adds a unit test, too.

Christian
-------------- next part --------------
A non-text attachment was scrubbed...
Name: stdnames.patch
Type: text/x-patch
Size: 1060 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070816/a40faf04/attachment.bin 

From greg.ewing at canterbury.ac.nz  Thu Aug 16 03:06:42 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 16 Aug 2007 13:06:42 +1200
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46C25DE3.6060906@ronadam.com>
References: <46B13ADE.7080901@acm.org>
	<ca471dc20708031020t43af7a32i9c9507bc2eedcbe9@mail.gmail.com>
	<46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz>
	<46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com>
	<46B52422.2090006@canterbury.ac.nz>
	<ca471dc20708041909j3f86b107hace9407673c302ea@mail.gmail.com>
	<46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz>
	<46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz>
	<46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org>
	<46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org>
	<46BAEFB0.9050400@ronadam.com> <46BBB1AE.5010207@canterbury.ac.nz>
	<46BBEB16.2040205@ronadam.com> <46BC1479.30405@canterbury.ac.nz>
	<46BCA9C9.1010306@ronadam.com> <46BD5BD8.7030706@acm.org>
	<46BF22D8.2090309@trueblade.com> <46BF3CC7.6010405@acm.org>
	<46C0EEBF.3010206@ronadam.com> <46C10D2D.60705@canterbury.ac.nz>
	<46C12526.8040807@ronadam.com> <46C2429B.1090507@canterbury.ac.nz>
	<46C25DE3.6060906@ronadam.com>
Message-ID: <46C3A322.8040904@canterbury.ac.nz>

Ron Adam wrote:
> 
> Greg Ewing wrote:
> > The format strings are starting to look like line
> > noise.
> 
> Do you have a specific example or is it just an overall feeling?

It's an overall feeling from looking at your examples.
I can't take them in at a glance -- I have to minutely
examine them character by character, which is tiring.

With the traditional format strings, at least I can
visually parse them without much trouble, even if I
don't know precisely what all the parts mean.

> For example the field alignment
> part can be handled by the format function, and the value format part 
> can be handled by the __format__ method.

Yes, although that seems to be about the *only* thing
that can be separated, and it can be specified using
just one character, which should be easy enough to
strip out before passing on the format string.

> And my apologies if it's starting to seem like line noise.  I'm not that 
> good at explaining things in simple ways.

It doesn't really have anything to do with explanation.
As I indicated above, even if I understand exactly
what each part means, it's still hard work parsing
the string if it contains more than a couple of the
allowed elements.

-- 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiem!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at canterbury.ac.nz	   +--------------------------------------+

From rrr at ronadam.com  Thu Aug 16 03:07:02 2007
From: rrr at ronadam.com (Ron Adam)
Date: Wed, 15 Aug 2007 20:07:02 -0500
Subject: [Python-3000] Format specifier proposal
In-Reply-To: <fb6fbf560708151641q3ce3bbe8sbfa4131711fafee@mail.gmail.com>
References: <46BD79EC.1020301@acm.org> <46C095CF.2060507@ronadam.com>	
	<46C11DDF.2080607@acm.org>	
	<ca471dc20708132053h1fa1c18ai87d61d8b85aadf07@mail.gmail.com>	
	<20070814022805.4a2b44a2.ajwade+py3k@andrew.wade.networklinux.net>	
	<ca471dc20708140941o1428bda9w9aa1f9be772d80e5@mail.gmail.com>	
	<46C24F56.5050104@canterbury.ac.nz> <46C26110.8020001@ronadam.com>	
	<20070815004204.f986b4b1.ajwade+py3k@andrew.wade.networklinux.net>	
	<46C2A2B1.1010708@ronadam.com>
	<fb6fbf560708151641q3ce3bbe8sbfa4131711fafee@mail.gmail.com>
Message-ID: <46C3A336.7080706@ronadam.com>



Jim Jewett wrote:
> On 8/15/07, Ron Adam <rrr at ronadam.com> wrote:
> 
>> After each term is returned from the __format__ call, the result's
>> __format__ method is called with the next specifier.  GetTime.__format__
>> returns a string.  str.__format__, aligns it.  A nice left to right
>> sequence of events.
> 
> Is this a pattern that objects should normally follow, or a convention
> enforced by format itself?  In other words, does
> 
>     "{0:abc,def,ghi}".format(value)
> 
> mean
> 
>     # Assume value.__format__ will delegate properly, to
>     #    result1.__format__("def,ghi")
>     #
>     # There are some surprises when a trailing field size gets ignored by
>     # value.__class__.
>     #
>     # Are  infinite loops more likely?
>     value.__format__("abc,def,ghi")
> 
> or
> 
>     # The separator character (","?) gets hard to use in format strings...
>     value.__format__("abc").__format__("def").__format__("ghi")

It would have to be this version.  There isn't any way for the vformat 
method (*) to decide which ',' belongs where unless you create some strict 
and awkward rules about when you can use commas and when you can't.

* vformat is the method described in the PEP responsible for calling 
format() with each value and specifier.  So it is where the chaining is done.

Currently I have only the following for this part, but it could be more 
sophisticated.

     value = self.get_value(key, args, kwargs)
     for term in spec.split(','):
         value = format(value, term)

There are two ways around this.  One is to have a comma escape sequence such 
as '\,'.  Then after the split, vformat can replace the '\,' with ',' and 
call format with the specifier containing the un-escaped commas.
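
A minimal sketch of the escape approach.  The names split_specs and
chain_format are hypothetical, the built-in format() stands in for the
proof-of-concept classes, and an escaped backslash directly before a comma
is not handled.

```python
import datetime
import re

def split_specs(spec):
    """Split a chained spec on unescaped commas, then un-escape '\\,'."""
    terms = re.split(r"(?<!\\),", spec)
    return [term.replace("\\,", ",") for term in terms]

def chain_format(value, spec):
    """Feed the value through format() once per term, left to right."""
    for term in split_specs(spec):
        value = format(value, term)
    return value

# The escaped commas stay inside the first (strftime) term; only the last
# comma separates it from the alignment term.
print(split_specs(r"%Y\,%m\,%d,>12"))
print(repr(chain_format(datetime.date(2007, 8, 15), r"%Y\,%m\,%d,>12")))
```

The raw-string prefix keeps Python itself from interpreting the backslash,
so the escape only has meaning inside the format expression.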

Another way might be to designate an alternative separator in 
some way.  {0:|:abc|def,ghi}  Where :sep: is the separator to use other 
than comma.  Or :: to force a single term with no chaining.  Or some other 
syntax might work?  <shrug>  It's not an impossible problem to solve.

The idea is that __format__ methods only need to be concerned with their part. 
They shouldn't have to parse some other object's specifier and pass it 
along.  (But you can still do that if you really want to.)

Cheers,
    Ron




From greg.ewing at canterbury.ac.nz  Thu Aug 16 03:08:54 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 16 Aug 2007 13:08:54 +1200
Subject: [Python-3000] Format specifier proposal
In-Reply-To: <46C26110.8020001@ronadam.com>
References: <46BD79EC.1020301@acm.org> <46C095CF.2060507@ronadam.com>
	<46C11DDF.2080607@acm.org>
	<ca471dc20708132053h1fa1c18ai87d61d8b85aadf07@mail.gmail.com>
	<20070814022805.4a2b44a2.ajwade+py3k@andrew.wade.networklinux.net>
	<ca471dc20708140941o1428bda9w9aa1f9be772d80e5@mail.gmail.com>
	<46C24F56.5050104@canterbury.ac.nz> <46C26110.8020001@ronadam.com>
Message-ID: <46C3A3A6.2060109@canterbury.ac.nz>

Ron Adam wrote:

> What I was thinking of was just a simple left to right evaluation order.
> 
>     "{0:spec1, spec2, ... }".format(x)

That would work too, as long as you were willing to
forbid "," as a possible character in a type-specific
format string.

-- 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiem!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at canterbury.ac.nz	   +--------------------------------------+

From greg.ewing at canterbury.ac.nz  Thu Aug 16 03:18:12 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 16 Aug 2007 13:18:12 +1200
Subject: [Python-3000] Format specifier proposal
In-Reply-To: <20070814230227.0c9be356.ajwade+py3k@andrew.wade.networklinux.net>
References: <46BD79EC.1020301@acm.org> <46C095CF.2060507@ronadam.com>
	<46C11DDF.2080607@acm.org>
	<ca471dc20708132053h1fa1c18ai87d61d8b85aadf07@mail.gmail.com>
	<20070814022805.4a2b44a2.ajwade+py3k@andrew.wade.networklinux.net>
	<ca471dc20708140941o1428bda9w9aa1f9be772d80e5@mail.gmail.com>
	<20070814230227.0c9be356.ajwade+py3k@andrew.wade.networklinux.net>
Message-ID: <46C3A5D4.3060602@canterbury.ac.nz>

Andrew James Wade wrote:
> {1:!renewal date: %Y-%m-%d} # no special meaning for ! here.

Yuck. Although it might happen to work due to reuse of
strftime, I'd consider that bad style -- constant parts
of the output string should be outside of the format
specs, i.e.:

   "renewal date: {1:%Y-%m-%d}".format(my_date)
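
A minimal sketch of the two styles side by side.  It assumes the date
formatting of current Python, where datetime.date passes its format spec to
strftime, and it uses positional index 0 and no '!' flag for simplicity.

```python
import datetime

my_date = datetime.date(2007, 8, 16)

# Literal text inside the spec happens to work, because the whole spec is
# handed to strftime, which copies non-directive characters through.
inside = "{0:renewal date: %Y-%m-%d}".format(my_date)

# Keeping the constant text in the template itself reads more clearly.
outside = "renewal date: {0:%Y-%m-%d}".format(my_date)

print(inside == outside)
```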

-- 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiem!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at canterbury.ac.nz	   +--------------------------------------+

From greg.ewing at canterbury.ac.nz  Thu Aug 16 03:22:47 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 16 Aug 2007 13:22:47 +1200
Subject: [Python-3000] [Python-Dev] Universal newlines support in Python
	3.0
In-Reply-To: <7375B721-C62F-49ED-945B-5A9107246D6C@python.org>
References: <ca471dc20708101123y3628e352n7116839dbfdc9bb3@mail.gmail.com>
	<87wsw3p5em.fsf@uwakimon.sk.tsukuba.ac.jp>
	<rowen-D017E8.10460713082007@sea.gmane.org>
	<ca471dc20708131315i6dba68d6u9efae161d1b647ca@mail.gmail.com>
	<FAEFC901-BAE1-4D56-8FEE-6098E307DAF2@python.org>
	<ca471dc20708140952g75cb6099xc7beed59f415b80@mail.gmail.com>
	<D74D8B6A-C467-45C0-B7CB-94E9F30FA83F@python.org>
	<ca471dc20708142103p54b397a0n21b8cd524fc2a4b1@mail.gmail.com>
	<7375B721-C62F-49ED-945B-5A9107246D6C@python.org>
Message-ID: <46C3A6E7.4010702@canterbury.ac.nz>

Barry Warsaw wrote:
> Add a flag  
> called preserve_eols that defaults to False, is ignored unless  
> universal newline mode is turned on,

Is there any reason it shouldn't work in non-universal-
newlines mode too?

-- 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiem!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at canterbury.ac.nz	   +--------------------------------------+

From greg.ewing at canterbury.ac.nz  Thu Aug 16 03:35:36 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 16 Aug 2007 13:35:36 +1200
Subject: [Python-3000] Proposed new language for newline parameter to
 TextIOBase
In-Reply-To: <f9uv15$jva$1@sea.gmane.org>
References: <ca471dc20708142156l9e7ef7ue33e314041d4ad90@mail.gmail.com>
	<bbaeab100708150047i6bcc59e1pdfbcfe4f655f5f1d@mail.gmail.com>
	<f9uv15$jva$1@sea.gmane.org>
Message-ID: <46C3A9E8.7040207@canterbury.ac.nz>

Christian Heimes wrote:
> Besides I always forget if Windows is '\r\n' or '\n\r'.

Oh, that's easy. The teletype needs to get the CR before
the LF so that it can start moving the carriage back
to the left while it's scrolling the paper up.

What? You don't have a teletype? Well, er, in that
case...

-- 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiem!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at canterbury.ac.nz	   +--------------------------------------+

From greg.ewing at canterbury.ac.nz  Thu Aug 16 03:38:09 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 16 Aug 2007 13:38:09 +1200
Subject: [Python-3000] Proposed new language for newline parameter to
 TextIOBase
In-Reply-To: <f9uvhk$l07$1@sea.gmane.org>
References: <ca471dc20708142156l9e7ef7ue33e314041d4ad90@mail.gmail.com>
	<bbaeab100708150047i6bcc59e1pdfbcfe4f655f5f1d@mail.gmail.com>
	<f9uvhk$l07$1@sea.gmane.org>
Message-ID: <46C3AA81.9090602@canterbury.ac.nz>

Georg Brandl wrote:
> "U" instead of the empty string for universal newline mode.

It would be good to leave open the possibility of
allowing arbitrary line separation strings at some
time in the future. In that case, newline == "U" would
mean lines separated by "U".

Predefined constants seem like a good idea to me.
Otherwise I'm sure I'll always have to look up what
'' and None mean.

-- 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiem!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at canterbury.ac.nz	   +--------------------------------------+

From greg.ewing at canterbury.ac.nz  Thu Aug 16 03:56:50 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 16 Aug 2007 13:56:50 +1200
Subject: [Python-3000] Fix imghdr module for bytes
In-Reply-To: <46C33B64.9040104@cornell.edu>
References: <200708110235.43664.victor.stinner@haypocalc.com>
	<aac2c7cb0708101745w53247653ifc4aeef0e9c287a4@mail.gmail.com>
	<79990c6b0708121219x3aecef78hc58443b592c0a13d@mail.gmail.com>
	<46BFA0FC.2060707@canterbury.ac.nz> <46C33B64.9040104@cornell.edu>
Message-ID: <46C3AEE2.1030201@canterbury.ac.nz>

Joel Bender wrote:

> Would you propose these to be mutable as well?

No, they'd be integers.

-- 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiem!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at canterbury.ac.nz	   +--------------------------------------+

From greg.ewing at canterbury.ac.nz  Thu Aug 16 04:01:04 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 16 Aug 2007 14:01:04 +1200
Subject: [Python-3000] Proposed new language for newline parameter to
 TextIOBase
In-Reply-To: <aac2c7cb0708151314m459f555mf0126f58ec0117b@mail.gmail.com>
References: <ca471dc20708142156l9e7ef7ue33e314041d4ad90@mail.gmail.com>
	<aac2c7cb0708151314m459f555mf0126f58ec0117b@mail.gmail.com>
Message-ID: <46C3AFE0.4030906@canterbury.ac.nz>

Adam Olsen wrote:
> On 8/14/07, Guido van Rossum <guido at python.org> wrote:
 >
> > (2) newline='': input with untranslated universal newlines mode; lines
> > may end in \r, \n, or \r\n, and these are returned untranslated.
> 
> Caveat: this mode cannot be supported by sockets.  When reading a lone
> \r you need to peek ahead to ensure the next character is not a \n,
> but for sockets that may block indefinitely.

You could return as soon as you see the '\r', with
a flag set indicating that if the next character
that comes in is '\n' it should be ignored.
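
A minimal sketch of that idea, assuming a hypothetical incremental splitter (the class name LineSplitter and its feed method are invented for illustration and are not part of any io API). A line is handed back the moment '\r' arrives, and a flag remembers to swallow a '\n' if that is the very next character:

```python
class LineSplitter:
    """Hypothetical incremental line splitter; names are illustrative."""

    def __init__(self):
        self._buf = []          # characters of the line being built
        self._skip_lf = False   # set after '\r': ignore one following '\n'

    def feed(self, data):
        """Consume a chunk of text and return the lines completed by it."""
        lines = []
        for ch in data:
            if self._skip_lf:
                self._skip_lf = False
                if ch == "\n":
                    continue    # second half of '\r\n'; line already emitted
            self._buf.append(ch)
            if ch in "\r\n":
                lines.append("".join(self._buf))
                self._buf = []
                self._skip_lf = (ch == "\r")
        return lines
```

No peeking ahead is needed, so a reader built this way never blocks waiting for the character after '\r'. The cost is that a '\r\n' pair comes back as a line ending in '\r' only.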

-- 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiem!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at canterbury.ac.nz	   +--------------------------------------+

From rhamph at gmail.com  Thu Aug 16 04:41:36 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Wed, 15 Aug 2007 20:41:36 -0600
Subject: [Python-3000] Proposed new language for newline parameter to
	TextIOBase
In-Reply-To: <46C3AFE0.4030906@canterbury.ac.nz>
References: <ca471dc20708142156l9e7ef7ue33e314041d4ad90@mail.gmail.com>
	<aac2c7cb0708151314m459f555mf0126f58ec0117b@mail.gmail.com>
	<46C3AFE0.4030906@canterbury.ac.nz>
Message-ID: <aac2c7cb0708151941l3875b5ebg21cb7683561487f7@mail.gmail.com>

On 8/15/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Adam Olsen wrote:
> > On 8/14/07, Guido van Rossum <guido at python.org> wrote:
>  >
> > > (2) newline='': input with untranslated universal newlines mode; lines
> > > may end in \r, \n, or \r\n, and these are returned untranslated.
> >
> > Caveat: this mode cannot be supported by sockets.  When reading a lone
> > \r you need to peek ahead to ensure the next character is not a \n,
> > but for sockets that may block indefinitely.
>
> You could return as soon as you see the '\r', with
> a flag set indicating that if the next character
> that comes in is '\n' it should be ignored.

That would be the *other* universal newlines mode. ;)

(Once you're already modifying the output, you might as well convert
everything to '\n'.)

-- 
Adam Olsen, aka Rhamphoryncus

From talin at acm.org  Thu Aug 16 05:12:24 2007
From: talin at acm.org (Talin)
Date: Wed, 15 Aug 2007 20:12:24 -0700
Subject: [Python-3000] PEP 3101 Updated
In-Reply-To: <46C331BF.2020104@trueblade.com>
References: <46C2809C.3000806@acm.org>
	<8f01efd00708150738y36deee91v918bfd3f80944d9b@mail.gmail.com>
	<46C331BF.2020104@trueblade.com>
Message-ID: <46C3C098.6080601@acm.org>

Eric Smith wrote:
> James Thiele wrote:
>> I think the example:
>>
>>     "My name is {0.name}".format(file('out.txt'))
>>
>> Would be easier to understand if you added:
>>
>> Which would produce:
>>
>>         "My name is 'out.txt'"
> 
> I agree.
> 
> Also, the example a couple of paragraphs down:
> "My name is {0[name]}".format(dict(name='Fred'))
> should show the expected output:
> "My name is Fred"

Those examples are kind of contrived to begin with. Maybe we should 
replace them with more realistic ones.

-- Talin

From talin at acm.org  Thu Aug 16 05:13:20 2007
From: talin at acm.org (Talin)
Date: Wed, 15 Aug 2007 20:13:20 -0700
Subject: [Python-3000] PEP 3101 Updated
In-Reply-To: <46C33749.5010702@trueblade.com>
References: <46C2809C.3000806@acm.org> <46C2C1A0.4060002@trueblade.com>
	<8f01efd00708150752i1b09464frb8d3217209b47a11@mail.gmail.com>
	<46C33749.5010702@trueblade.com>
Message-ID: <46C3C0D0.50101@acm.org>

Eric Smith wrote:
> James Thiele wrote:
>> The section on the explicit conversion flag contains the following line:
>>
>>       These flags are typically placed before the format specifier:
>>
>> Where else can they be placed?
> 
> I'd like this to say they can only be placed where the PEP describes 
> them, or maybe to be only at the end.
> "{0!r:20}".format("Hello")
> or
> "{0:20!r}".format("Hello")
> 
> Putting them at the end makes the parsing easier, although I grant you 
> that that's not a great reason for specifying it that way.  Whatever it 
> is, I think there should be only one place they can go.

Guido expressed a definite preference for having them be first.

>> Also there is no description of what action (if any) is taken if an
>> unknown explicit conversion flag is encountered.
> 
> I would assume a ValueError, but yes, it should be explicit.

This is one of those things I leave up to the implementor and then 
document later :)


From eric+python-dev at trueblade.com  Thu Aug 16 05:26:44 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Wed, 15 Aug 2007 23:26:44 -0400
Subject: [Python-3000] PEP 3101 Updated
In-Reply-To: <46C3C0D0.50101@acm.org>
References: <46C2809C.3000806@acm.org> <46C2C1A0.4060002@trueblade.com>
	<8f01efd00708150752i1b09464frb8d3217209b47a11@mail.gmail.com>
	<46C33749.5010702@trueblade.com> <46C3C0D0.50101@acm.org>
Message-ID: <46C3C3F4.7060307@trueblade.com>

Talin wrote:
> Eric Smith wrote:
>> James Thiele wrote:
>>> The section on the explicit conversion flag contains the following line:
>>>
>>>       These flags are typically placed before the format specifier:
>>>
>>> Where else can they be placed?
>>
>> I'd like this to say they can only be placed where the PEP describes 
>> them, or maybe to be only at the end.
>> "{0!r:20}".format("Hello")
>> or
>> "{0:20!r}".format("Hello")
>>
>> Putting them at the end makes the parsing easier, although I grant you 
>> that that's not a great reason for specifying it that way.  Whatever 
>> it is, I think there should be only one place they can go.
> 
> Guido expressed a definite preference for having them be first.

I was afraid of that.  Then can we say they'll always go first?  Or is 
the intent really to say they can go anywhere (PEP says "typically placed")?

The sample implementation of vformat in the PEP says they'll go last:

               # Check for explicit type conversion
               field_spec, _, explicit = field_spec.partition("!")

Eric.

From talin at acm.org  Thu Aug 16 05:31:18 2007
From: talin at acm.org (Talin)
Date: Wed, 15 Aug 2007 20:31:18 -0700
Subject: [Python-3000] PEP 3101 Updated
In-Reply-To: <46C3C3F4.7060307@trueblade.com>
References: <46C2809C.3000806@acm.org> <46C2C1A0.4060002@trueblade.com>
	<8f01efd00708150752i1b09464frb8d3217209b47a11@mail.gmail.com>
	<46C33749.5010702@trueblade.com> <46C3C0D0.50101@acm.org>
	<46C3C3F4.7060307@trueblade.com>
Message-ID: <46C3C506.7050802@acm.org>

Eric Smith wrote:
> Talin wrote:
>> Eric Smith wrote:
>>> James Thiele wrote:
>>>> The section on the explicit conversion flag contains the following 
>>>> line:
>>>>
>>>>       These flags are typically placed before the format specifier:
>>>>
>>>> Where else can they be placed?
>>>
>>> I'd like this to say they can only be placed where the PEP describes 
>>> them, or maybe to be only at the end.
>>> "{0!r:20}".format("Hello")
>>> or
>>> "{0:20!r}".format("Hello")
>>>
>>> Putting them at the end makes the parsing easier, although I grant 
>>> you that that's not a great reason for specifying it that way.  
>>> Whatever it is, I think there should be only one place they can go.
>>
>> Guido expressed a definite preference for having them be first.
> 
> I was afraid of that.  Then can we say they'll always go first?  Or is 
> the intent really to say they can go anywhere (PEP says "typically 
> placed")?

I can revise it to say that they always come first if that would make 
it easier.

> The sample implementation of vformat in the PEP says they'll go last:
> 
>               # Check for explicit type conversion
>               field_spec, _, explicit = field_spec.partition("!")

That's a bug.

Too bad there's no unit tests for pseudo-code :)
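
For illustration only: in the str.format that later shipped with Python, the conversion flag sits before the colon. The snippet below is a small check of that ordering, not the PEP's sample vformat code:

```python
# Conversion flag first, then the format spec: repr() is applied, then padded.
first = "{0!r:20}".format("Hello")

# With the flag last, "20!r" is handed to str.__format__ as the format spec,
# which rejects it.
try:
    "{0:20!r}".format("Hello")
except ValueError as exc:
    flag_last_error = exc
else:
    flag_last_error = None
```

A pair of assertions like these is roughly the kind of unit test the pseudo-code was missing.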

-- Talin


From eric+python-dev at trueblade.com  Thu Aug 16 05:37:42 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Wed, 15 Aug 2007 23:37:42 -0400
Subject: [Python-3000] PEP 3101 Updated
In-Reply-To: <46C3C506.7050802@acm.org>
References: <46C2809C.3000806@acm.org> <46C2C1A0.4060002@trueblade.com>
	<8f01efd00708150752i1b09464frb8d3217209b47a11@mail.gmail.com>
	<46C33749.5010702@trueblade.com> <46C3C0D0.50101@acm.org>
	<46C3C3F4.7060307@trueblade.com> <46C3C506.7050802@acm.org>
Message-ID: <46C3C686.1000500@trueblade.com>

Talin wrote:
> Eric Smith wrote:
>> Talin wrote:
>>> Guido expressed a definite preference for having them be first.
>>
>> I was afraid of that.  Then can we say they'll always go first?  Or is 
>> the intent really to say they can go anywhere (PEP says "typically 
>> placed")?
> 
> I can revise it to say that they always come first if that would make 
> it easier.

That would make it easier to code, and I suspect easier to read the 
Python code that uses them.  I'll keep coding as if it says they're 
first; no sense updating the PEP until we batch up some changes.

>> The sample implementation of vformat in the PEP says they'll go last:
>>
>>               # Check for explicit type conversion
>>               field_spec, _, explicit = field_spec.partition("!")
> 
> That's a bug.
> 
> Too bad there's no unit tests for pseudo-code :)

There's a task for someone!


From aholkner at cs.rmit.edu.au  Thu Aug 16 05:17:50 2007
From: aholkner at cs.rmit.edu.au (Alex Holkner)
Date: Thu, 16 Aug 2007 13:17:50 +1000
Subject: [Python-3000] PEP 3101 Updated
In-Reply-To: <46C2809C.3000806@acm.org>
References: <46C2809C.3000806@acm.org>
Message-ID: <46C3C1DE.6070302@cs.rmit.edu.au>

Talin wrote:
> A new version is up, incorporating material from the various discussions 
> on this list:
> 
> 	http://www.python.org/dev/peps/pep-3101/

I've been following this thread for a few weeks, and I believe the 
following issues haven't yet been addressed:

The PEP abstract says this proposal will replace the '%' operator, yet 
all the examples use the more verbose .format() method.  Can a later 
section in the PEP (perhaps "String Methods") confirm that '%' on string 
is synonymous with the format method in Python 3000?

What is the behaviour of whitespace in a format specifier?  e.g.
how much of the following is valid?

      "{  foo . name  : 20s }".format(foo=open('bar'))

One use-case might be to visually line up fields (in source) with a
minimum field width.  Even if not permitted, I believe this should be 
mentioned in the PEP.

Does a brace that does not begin a format specifier raise an exception 
or get treated as character data?  e.g.

     "{@foo}"

or, if no whitespace is permitted:

     "{ foo }"

or, an unmatched closing brace:

     " } "

I don't have any preference on either behaviour, but would like to see 
it clarified in the PEP.

Has there been any consideration for omitting the field name?  The 
behaviour would be the same as the current string interpolation:

     "The {:s} sat on the {:s}".format('cat', 'mat')

IMO this gives a nicer abbreviated form for the most common use case:

     "Your name is {}.".format(name)

This has the benefit of having similar syntax to the default 
interpolation for UNIX find and xargs commands, and eliminating errors 
from giving the wrong field number (there have been several posts in 
this thread erroneously using a 1-based index).
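
For reference, a short check of how these cases behave in str.format as shipped in Python 3.1 and later. This reflects the released implementation, not necessarily the PEP text under discussion here:

```python
# Omitted field names are numbered automatically, left to right.
sentence = "The {:s} sat on the {:s}".format("cat", "mat")
greeting = "Your name is {}.".format("Alex")

# An unmatched closing brace is rejected rather than treated as text.
try:
    " } ".format()
except ValueError as exc:
    stray_brace_error = exc
else:
    stray_brace_error = None
```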

Apologies if I'm repeating answered questions.

Cheers
Alex.

From talin at acm.org  Thu Aug 16 06:07:50 2007
From: talin at acm.org (Talin)
Date: Wed, 15 Aug 2007 21:07:50 -0700
Subject: [Python-3000] PEP 3101 Updated
In-Reply-To: <46C3C1DE.6070302@cs.rmit.edu.au>
References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au>
Message-ID: <46C3CD96.4070902@acm.org>

Alex Holkner wrote:
> Talin wrote:
>> A new version is up, incorporating material from the various discussions 
>> on this list:
>>
>> 	http://www.python.org/dev/peps/pep-3101/
> 
> I've been following this thread for a few weeks, and I believe the 
> following issues haven't yet been addressed:
> 
> The PEP abstract says this proposal will replace the '%' operator, yet 
> all the examples use the more verbose .format() method.  Can a later 
> section in the PEP (perhaps "String Methods") confirm that '%' on string 
> is synonymous with the format method in Python 3000?

Well, originally it was my intent that the .format method would co-exist 
beside the '%' operator, but Guido appears to want to deprecate the '%' 
operator (it will continue to be supported until 3.1 at least however.)

> What is the behaviour of whitespace in a format specifier?  e.g.
> how much of the following is valid?
> 
>       "{  foo . name  : 20s }".format(foo=open('bar'))

Eric, it's your call :)

> One use-case might be to visually line up fields (in source) with a
> minimum field width.  Even if not permitted, I believe this should be 
> mentioned in the PEP.
> 
> Does a brace that does not begin a format specifier raise an exception 
> or get treated as character data?  e.g.
> 
>      "{@foo}"
> 
> or, if no whitespace is permitted:
> 
>      "{ foo }"
> 
> or, an unmatched closing brace:
> 
>      " } "

I would say unmatched brace should not be considered an error, but I'm 
the permissive type.

> I don't have any preference on either behaviour, but would like to see 
> it clarified in the PEP.
> 
> Has there been any consideration for omitting the field name?  The 
> behaviour would be the same as the current string interpolation:
> 
>      "The {:s} sat on the {:s}".format('cat', 'mat')

I suspect that this is something many people would like to move away 
from. Particularly in cases where different format strings are being 
used on the same data (a common example is localized strings), it's 
useful to be able to change around field order without changing the 
arguments to the function.
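
A small sketch of the localization case, assuming a hypothetical message catalogue (the "xx" locale and its wording are invented). The call site passes the same arguments every time and each template picks its own field order:

```python
# Hypothetical message catalogue: same arguments, different word order.
templates = {
    "en": "{0} sat on the {1}",
    "xx": "On the {1} sat {0}",   # invented locale, for illustration only
}
args = ("cat", "mat")
rendered = {lang: t.format(*args) for lang, t in templates.items()}
```

With purely positional, unnamed fields the second template could not be written without also reordering the arguments.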

> IMO this gives a nicer abbreviated form for the most common use case:
> 
>      "Your name is {}.".format(name)
> 
> This has the benefit of having similar syntax to the default 
> interpolation for UNIX find and xargs commands, and eliminating errors 
> from giving the wrong field number (there have been several posts in 
> this thread erroneously using a 1-based index).
> 
> Apologies if I'm repeating answered questions.
> 
> Cheers
> Alex.

From nnorwitz at gmail.com  Thu Aug 16 07:07:05 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Wed, 15 Aug 2007 22:07:05 -0700
Subject: [Python-3000] Documentation switch imminent
In-Reply-To: <f9v2rd$2dl$1@sea.gmane.org>
References: <f9t2nn$ksg$1@sea.gmane.org> <f9v2rd$2dl$1@sea.gmane.org>
Message-ID: <ee2a432c0708152207j66c26fbdu6e9bcec6dae0ccf5@mail.gmail.com>

On 8/15/07, Georg Brandl <g.brandl at gmx.net> wrote:
> Georg Brandl schrieb:
> >
> > Neal will change his build scripts, so that the 2.6 and 3.0 devel
> > documentation pages at docs.python.org will be built from these new
> > trees soon.
>
> Okay, I made the switch.  I tagged the state of both Python branches
> before the switch as tags/py{26,3k}-before-rstdocs/.

http://docs.python.org/dev/
http://docs.python.org/dev/3.0/

The upgrade went smoothly.  Below are all the issues I noticed.  I had
to install a version of python 2.5 since that is a minimum
requirement.  I had to change from a plain 'make' in the Doc directory
to 'make html'.  The output is in build/html rather than html/ now.

2.6 output:
trying to load pickled env... failed: [Errno 2] No such file or
directory: 'build/doctrees/environment.pickle'

writing output...
... library/contextlib.rst<string>:3: Warning: 'with' will become a
reserved keyword in Python 2.6
tutorial/errors.rst<string>:1: Warning: 'with' will become a reserved
keyword in Python

3.0 output:
Traceback (most recent call last):
  File "tools/sphinx-build.py", line 13, in <module>
    from sphinx import main
  File "/home/neal/python/py3k/Doc/tools/sphinx/__init__.py", line 16,
in <module>
    from .builder import builders
  File "/home/neal/python/py3k/Doc/tools/sphinx/builder.py", line 35,
in <module>
    from .environment import BuildEnvironment
  File "/home/neal/python/py3k/Doc/tools/sphinx/environment.py", line
34, in <module>
    from docutils.parsers.rst.states import Body
  File "/home/neal/python/py3k/Doc/tools/docutils/parsers/rst/__init__.py",
line 77, in <module>
    from docutils.parsers.rst import states
  File "/home/neal/python/py3k/Doc/tools/docutils/parsers/rst/states.py",
line 110, in <module>
    import roman
ImportError: No module named roman

After this error, I just linked my tools directory to the one in 2.6
(trunk) and that worked. I'm not sure if this will create problems in
the future.

trying to load pickled env... failed: [Errno 2] No such file or
directory: 'build/doctrees/environment.pickle'

writing output...
... library/contextlib.rst<string>:3: Warning: 'with' will become a
reserved keyword in Python 2.6
library/shutil.rst<string>:17: Warning: 'as' will become a reserved
keyword in Python 2.6
library/subprocess.rst<string>:7: Warning: 'as' will become a reserved
keyword in Python 2.6
tutorial/errors.rst<string>:1: Warning: 'with' will become a reserved
keyword in Python 2.6

I realize none of these are a big deal.  However, it would be nice if
it was cleaned up so that people unfamiliar with building the docs
aren't surprised.

n

From andrew.j.wade at gmail.com  Mon Aug 13 01:58:56 2007
From: andrew.j.wade at gmail.com (Andrew James Wade)
Date: Sun, 12 Aug 2007 19:58:56 -0400
Subject: [Python-3000] Format specifier proposal
In-Reply-To: <46BD79EC.1020301@acm.org>
References: <46BD79EC.1020301@acm.org>
Message-ID: <20070812195856.d6f085e8.ajwade+00@andrew.wade.networklinux.net>

On Sat, 11 Aug 2007 01:57:16 -0700
Talin <talin at acm.org> wrote:

> Taking some ideas from the various threads, here's what I'd like to propose:
> 
> (Assume that brackets [] means 'optional field')
> 
>    [:[type][align][sign][[0]minwidth][.precision]][/fill][!r]
> 
> Examples:
> 
>     :f        # Floating point number of natural width
>     :f10      # Floating point number, width at least 10
>     :f010     # Floating point number, width at least 10, leading zeros
>     :f.2      # Floating point number with two decimal digits
>     :8        # Minimum width 8, type defaults to natural type
>     :d+2      # Integer number, 2 digits, sign always shown
>     !r        # repr() format
>     :10!r     # Field width 10, repr() format
>     :s10      # String right-aligned within field of minimum width
>               # of 10 chars.
>     :s10.10   # String right-aligned within field of minimum width
>               # of 10 chars, maximum width 10.
>     :s<10     # String left-aligned in 10 char (min) field.
>     :d^15     # Integer centered in 15 character field
>     :>15/.    # Right align and pad with '.' chars
>     :f<+015.5 # Floating point, left aligned, always show sign,
>               # leading zeros, field width 15 (min), 5 decimal places.
> 
> Notes:
> 
>    -- Leading zeros is different than fill character, although the two 
> are mutually exclusive. (Leading zeros always go between the sign and 
> the number, padding does not.)
>    -- For strings, precision is used as maximum field width.
>    -- __format__ functions are not allowed to re-interpret '!r'.
> 
> I realize that the grouping of things is a little odd - for example, it 
> would be nice to put minwidth, padding and alignment in their own little 
> group so that they could be processed independently from __format__. 

Most custom formatting specs will probably end up putting width,
padding and alignment in their own little group and will delegate
those functions to str.__format__. Like so:

:>30/.,yyyy-MM-dd HH:mm:ss

def __format__(self, specifiers):
    align_spec, foo_spec = (specifiers.split(",",1) + [""])[:2]
    ... format foo ...
    return str.__format__(formatted_foo, align_spec.replace("foo", "s"))

(I would suggest allowing ,yyyy-MM-dd as a short form of :,yyyy-MM-dd).

I suspect there will be few cases where it makes sense to intermingle
the width/alignment/padding fields with other fields. 

I would move !r to the start of the formatting specification; it should be
prominent when it appears, and format will want to find it easily and
unambiguously rather than leaving it to boilerplate in each __format__
method.
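
A runnable sketch of the delegation described above. It makes two assumptions: the Stamp class is hypothetical, and it uses the fill/align syntax of released Python ('.>12') and strftime codes in place of the '>30/.' and 'yyyy-MM-dd' notation proposed in this thread:

```python
from datetime import datetime


class Stamp:
    """Hypothetical wrapper; the 'align,datefmt' spec layout is an assumption."""

    def __init__(self, when):
        self.when = when

    def __format__(self, spec):
        # Everything before the first comma is alignment, the rest is ours.
        align_spec, _, date_spec = spec.partition(",")
        text = self.when.strftime(date_spec or "%Y-%m-%d")
        # Delegate width, fill and alignment to str's own formatting.
        return format(text, align_spec)


padded = format(Stamp(datetime(2007, 8, 12)), ".>12,%Y-%m-%d")
```

The type-specific part of the spec never has to know about padding, which is the separation argued for above.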

-- Andrew

From carlmj at hawaii.edu  Mon Aug 13 12:08:50 2007
From: carlmj at hawaii.edu (Carl Johnson)
Date: Mon, 13 Aug 2007 00:08:50 -1000
Subject: [Python-3000] More PEP 3101 changes incoming
Message-ID: <633738EA-3F45-4933-BF81-0410BACBAF1E@hawaii.edu>

(First, let me apologize for diving into a bike shed discussion.)

There are two proposed ways to handle custom __format__ methods:

> class MyInt:
>      def __format__(self, spec):
>          if int.is_int_specifier(spec):
>              return int(self).__format__(spec)
>          return "MyInt instance with custom specifier " + spec
>      def __int__(self):
>          return <some local state>

and

> class MyInt:
>     def __format__(self, spec):
>         if is_custom_spec(spec):
>             return "MyInt instance with custom specifier " + spec
>         return NotImplemented
>     def __int__(self):
>         return <some local state>

I think this would be more straightforward as:

class MyInt:
     def __format__(self, spec):
         if is_MyInt_specific_spec(spec):
             return "MyInt instance with custom specifier " + spec
         else:
             return int(self).__format__(spec)
     def __int__(self):
         return <some local state>

The makers of the MyInt class should be the ones responsible for knowing
that MyInt can be converted to int as needed for output. If they want
MyInt to handle all the same format spec options as int, it's up to them
to either implement them all in their __format__ or to cast the instance
object to int then call its __format__ method by themselves. I don't see
the point in having format guess what MyInt should be converted to if it
can't handle the options passed to it. If we go too far down this road,
if MyInt craps out when given ":MM-DD-YY", then format will be obliged to
try casting to Date just to see if it will work. No, I think the format
function should be somewhat dumb, since dumb makes more sense to
__format__ implementers than clever. Let them figure out what their type
can be cast into.
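
A concrete version of that third variant, as a sketch. The "my" spec is an invented stand-in for is_MyInt_specific_spec, and the stored value replaces the "<some local state>" placeholder:

```python
class MyInt:
    """Hypothetical int-like class; the 'my' spec is invented for the demo."""

    def __init__(self, value):
        self._value = value

    def __int__(self):
        return self._value

    def __format__(self, spec):
        if spec == "my":   # stand-in for is_MyInt_specific_spec(spec)
            return "MyInt instance with custom specifier " + spec
        # The class itself decides that int is the right fallback.
        return format(int(self), spec)


signed = format(MyInt(10), "+d")
```

The format machinery stays dumb here: it calls __format__ once and the class chooses what to delegate to.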

In the case that regular int can't handle the given format spec either,
int.__format__ will raise (return?) NotImplemented, in which case the
format function will try string conversion, and then if that also pukes,
a runtime exception should be raised.

I also like the idea of using "!r" for calling repr and agree that it
should be listed first. The syntax seems to be calling out for a little
bit of extension though. Might it be nice to be able to do something
like this?

s = "10"
print("{0!i:+d}".format(s)) #prints "+10"

The !i attempts to cast the string to int. If it fails, then an
exception is raised. If it succeeds, then the int.__format__ method is
used on the remainder of the spec string. The logic is that ! commands
are abbreviated functions that are applied to the input before other
formatting options are given.
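
One way to experiment with this idea, as a sketch. It relies on the string.Formatter class from the standard library of released Python. The '!i' and '!f' conversions and the CastingFormatter name are assumptions made for illustration; they are not built in:

```python
import string


class CastingFormatter(string.Formatter):
    """Hypothetical Formatter subclass adding '!i' and '!f' conversions."""

    def convert_field(self, value, conversion):
        if conversion == "i":
            return int(value)      # raises ValueError if the cast fails
        if conversion == "f":
            return float(value)
        # Fall back to the standard '!r' and '!s' handling.
        return super().convert_field(value, conversion)


result = CastingFormatter().format("{0!i:+d}", "10")
```

The conversion runs before the rest of the spec is applied, which matches the "abbreviated function applied to the input" reading above.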

On the one hand, this does risk a descent into "line noise" if too many
! options are provided. On the other hand, I think that providing !
options for just repr, str, int, and float probably wouldn't be too bad,
and might save some tedious writing of int(s), etc. in spots. It seems
like if we're going to have a weird syntax for repr anyway, we might as
well use it to make things more convenient in other ways. Or is this too
TMTOWTDI-ish, since one could just write int(s) instead? (But by that
logic, one could write repr(s) too?)

The format function would end up looking like this:

def format(obj, spec):
     if spec[0] == "!":
         # switch statement for applying obj = repr(obj), obj = int(obj), etc.
         spec = spec[2:]
     if obj.__format__ and type(obj) is not str:
         try:
             # if spec contains letters not understood, __format__ raises NI
             return obj.__format__(spec)
         except NotImplemented:
             pass  # everything gets put through str as a last resort
     return str(obj).__format__(spec)  # last chance before throwing exception

Does this make sense to anyone else?

--Carl Johnson

From talin at acm.org  Thu Aug 16 09:03:52 2007
From: talin at acm.org (Talin)
Date: Thu, 16 Aug 2007 00:03:52 -0700
Subject: [Python-3000] [Python-Dev]  Documentation switch imminent
In-Reply-To: <ee2a432c0708152207j66c26fbdu6e9bcec6dae0ccf5@mail.gmail.com>
References: <f9t2nn$ksg$1@sea.gmane.org> <f9v2rd$2dl$1@sea.gmane.org>
	<ee2a432c0708152207j66c26fbdu6e9bcec6dae0ccf5@mail.gmail.com>
Message-ID: <46C3F6D8.2090502@acm.org>

Neal Norwitz wrote:
> On 8/15/07, Georg Brandl <g.brandl at gmx.net> wrote:
>> Georg Brandl schrieb:
>>> Neal will change his build scripts, so that the 2.6 and 3.0 devel
>>> documentation pages at docs.python.org will be built from these new
>>> trees soon.
>> Okay, I made the switch.  I tagged the state of both Python branches
>> before the switch as tags/py{26,3k}-before-rstdocs/.
> 
> http://docs.python.org/dev/
> http://docs.python.org/dev/3.0/

So awesome. Great job everyone!

-- Talin


From talin at acm.org  Thu Aug 16 09:09:39 2007
From: talin at acm.org (Talin)
Date: Thu, 16 Aug 2007 00:09:39 -0700
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <633738EA-3F45-4933-BF81-0410BACBAF1E@hawaii.edu>
References: <633738EA-3F45-4933-BF81-0410BACBAF1E@hawaii.edu>
Message-ID: <46C3F833.7040209@acm.org>

Carl Johnson wrote:
> (First, let me apologize for diving into a bike shed discussion.)
> 
> There are two proposed ways to handle custom __format__ methods:
> 
>> class MyInt:
>>      def __format__(self, spec):
>>          if int.is_int_specifier(spec):
>>              return int(self).__format__(spec)
>>          return "MyInt instance with custom specifier " + spec
>>      def __int__(self):
>>          return <some local state>
> 
> and
> 
>> class MyInt:
>>     def __format__(self, spec):
>>         if is_custom_spec(spec):
>>             return "MyInt instance with custom specifier " + spec
>>         return NotImplemented
>>     def __int__(self):
>>         return <some local state>
> 
> I think this would be more straightforward as:
> 
> class MyInt:
>      def __format__(self, spec):
>          if is_MyInt_specific_spec(spec):
>              return "MyInt instance with custom specifier " + spec
>          else:
>              return int(self).__format__(spec)
>      def __int__(self):
>          return <some local state>
> 
> The makers of the MyInt class should be the ones responsible for knowing
> that MyInt can be converted to int as needed for output. If they want
> MyInt to handle all the same format spec options as int, it's up to them
> to either implement them all in their __format__ or to cast the instance
> object to int then call its __format__ method by themselves. I don't see
> the point in having format guess what MyInt should be converted to if it
> can't handle the options passed to it. If we go too far down this road,
> if MyInt craps out when given ":MM-DD-YY", then format will be obliged to
> try casting to Date just to see if it will work. No, I think the format
> function should be somewhat dumb, since dumb makes more sense to
> __format__ implementers than clever. Let them figure out what their type
> can be cast into.

+1

> In the case that regular int can't handle the given format spec either,
> int.__format__ will raise (return?) NotImplemented, in which case the
> format function will try string conversion, and then if that also pukes,
> a runtime exception should be raised.
> 
> I also like the idea of using "!r" for calling repr and agree that it
> should be listed first. The syntax seems to be calling out for a little
> bit of extension though. Might it be nice to be able to do something
> like this?
> 
> s = "10"
> print("{0!i:+d}".format(s)) #prints "+10"

Extending it has been discussed. The plan is to first implement the
more restricted version and let people hack on it, adding whatever
features prove useful in practice.

> The !i attempts to cast the string to int. If it fails, then an
> exception is raised. If it succeeds, then the int.__format__ method is
> used on the remainder of the spec string. The logic is that ! commands
> are abbreviated functions that are applied to the input before other
> formatting options are given.
> 
> On the one hand, this does risk a descent into "line noise" if too many
> ! options are provided. On the other hand, I think that providing !
> options for just repr, str, int, and float probably wouldn't be too bad,
> and might save some tedious writing of int(s), etc. in spots. It seems
> like if we're going to have a weird syntax for repr anyway, we might as
> well use it to make things more convenient in other ways. Or is this too
> TMTOWTDI-ish, since one could just write int(s) instead? (But by that
> logic, one could write repr(s) too?)
> 
> The format function would end up looking like this:
> 
> def format(obj, spec):
>      if spec[0] == "!":
>          # switch statement for applying obj = repr(obj), obj = int(obj), etc.
>          spec = spec[2:]
>      if obj.__format__ and type(obj) is not str:
>          try:
>              # if spec contains letters not understood, __format__ raises NI
>              return obj.__format__(spec)
>          except NotImplemented:
>              pass  # everything gets put through str as a last resort
>      return str(obj).__format__(spec)  # last chance before throwing exception

The built-in 'format' function doesn't handle '!r', that's done by the
caller. The 'spec' argument passed in to 'format' is the part *after*
the colon.

Also, there's no need to test for the existence of __format__, because all
objects will have a __format__ method which is inherited from
object.__format__.
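
A minimal sketch of that division of labour, assuming the semantics that
later shipped in released Python 3 (the variable names are illustrative
only): the "!r" conversion is applied by the caller, str.format(), and the
built-in format() only ever sees the text after the colon.

```python
# Sketch only: assumes the format()/str.format() behaviour of released Python 3.
s = "10"

# format() receives only the spec, i.e. the part after the colon.
# Converting the string to int is done explicitly by the caller here.
plus_ten = format(int(s), "+d")

# The "!r" conversion is handled by str.format() itself; the spec ">8"
# is then handed to the __format__ method of the resulting repr string.
padded_repr = "{0!r:>8}".format("ab")

# Every object inherits __format__ from object, so no existence test is needed.
has_format = hasattr(object(), "__format__")

print(plus_ten)
print(padded_repr)
print(has_format)
```

The explicit int(s) call stands in for the proposed "!i" conversion, which
is not part of the restricted first version described above.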

> Does this make sense to anyone else?

Perfectly.

> --Carl Johnson

From andrew.j.wade at gmail.com  Thu Aug 16 09:28:06 2007
From: andrew.j.wade at gmail.com (Andrew James Wade)
Date: Thu, 16 Aug 2007 03:28:06 -0400
Subject: [Python-3000] Format specifier proposal
In-Reply-To: <46C3A5D4.3060602@canterbury.ac.nz>
References: <46BD79EC.1020301@acm.org> <46C095CF.2060507@ronadam.com>
	<46C11DDF.2080607@acm.org>
	<ca471dc20708132053h1fa1c18ai87d61d8b85aadf07@mail.gmail.com>
	<20070814022805.4a2b44a2.ajwade+py3k@andrew.wade.networklinux.net>
	<ca471dc20708140941o1428bda9w9aa1f9be772d80e5@mail.gmail.com>
	<20070814230227.0c9be356.ajwade+py3k@andrew.wade.networklinux.net>
	<46C3A5D4.3060602@canterbury.ac.nz>
Message-ID: <20070816032806.a8427bbd.ajwade+py3k@andrew.wade.networklinux.net>

On Thu, 16 Aug 2007 13:18:12 +1200
Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:

> Andrew James Wade wrote:
> > {1:!renewal date: %Y-%m-%d} # no special meaning for ! here.
> 
> Yuck. Although it might happen to work due to reuse of
> strftime, I'd consider that bad style -- constant parts
> of the output string should be outside of the format
> specs, i.e.:
> 
>    "renewal date: {1:%Y-%m-%d}".format(my_date)

To be sure; it's just that I couldn't think of a better example. My
point is by putting spec1 last, the only things you need to escape
are { and }. (They can be escaped as {lb} and {rb} by passing the
right parameters.)

The alternatives I see are:
1. [:spec1[,spec2]]   - {1: %B %d, %Y} doesn't work as expected.
2. [!spec2][:spec1]   - order reversed.
                      - meaning of spec2 is overloaded by !r, !s.[1]
3. [:spec2[,spec1]]   - order reversed.
                      - spec1-only syntax is too elaborate: {1:,%Y-%m-%d}
4. [,spec2][:spec1]  long discussion here:
http://mail.python.org/pipermail/python-3000/2007-August/009066.html
                      - order reversed problem is particularly bad,
                        because : looks like it should have low
                        precedence.
                      - meaning of spec2 is overloaded by ,r ,s.[1]
                      - On the positive side, this is similar to .NET
                        syntax.
5. { {1:spec1}:spec2} - looks like a replacement field for the name
                        specifier. (Though a spec1 like %Y-%m-%d would
                        tend to counteract that impression.)

[1] This is particularly awkward since spec2 should be applied after
spec1, but !s and !r should be applied before spec1.

And in Talin's proposal, spec2 will be superfluous for strings and
integers. It's also not needed when all you want to do is align str(x).

I don't think any of them will fly :-(. My guess is that __format__
methods will do the chaining themselves with little standardization
on the syntax to do so.
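
As a rough illustration of a __format__ method doing the chaining itself,
here is a sketch. The Renewal class and the "|" separator between the
strftime-style spec1 and the alignment-style spec2 are assumptions made up
for this example; nothing in the thread standardizes them.

```python
import datetime


class Renewal:
    """Hypothetical wrapper whose spec has the form 'spec1|spec2'."""

    def __init__(self, when):
        self.when = when

    def __format__(self, spec):
        # spec1 (before '|') is a strftime pattern; spec2 (after '|') is
        # applied afterwards to the resulting string.
        inner, _, outer = spec.partition("|")
        text = self.when.strftime(inner) if inner else str(self.when)
        return format(text, outer)


r = Renewal(datetime.date(2007, 8, 16))
plain = "renewal date: {0:%Y-%m-%d}".format(r)
aligned = "[{0:%Y-%m-%d|>12}]".format(r)
print(plain)
print(aligned)
```

Because the separator is private to this one class, every type that wanted
chaining would have to pick its own, which is the lack of standardization
predicted above.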

-- Andrew

From p.f.moore at gmail.com  Thu Aug 16 11:44:21 2007
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 16 Aug 2007 10:44:21 +0100
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46C367A8.4040601@ronadam.com>
References: <46B13ADE.7080901@acm.org> <46BCA9C9.1010306@ronadam.com>
	<46BD5BD8.7030706@acm.org> <46BF22D8.2090309@trueblade.com>
	<46BF3CC7.6010405@acm.org> <46C0EEBF.3010206@ronadam.com>
	<fb6fbf560708141233k193e36d0rae99e1decff42f11@mail.gmail.com>
	<46C2325D.1010209@ronadam.com>
	<fb6fbf560708150907q5d2f037ex85139dce51dec7fb@mail.gmail.com>
	<46C367A8.4040601@ronadam.com>
Message-ID: <79990c6b0708160244t51902b6es50540c5684a99345@mail.gmail.com>

On 15/08/07, Ron Adam <rrr at ronadam.com> wrote:
> EXAMPLES:
>
[...]
> Examples from python3000 list:
[...]

Can I suggest that these all go into the PEP, to give readers some
flavour of what the new syntax will look like?

I'd also repeat the suggestion that these examples be posted to
comp.lang.python, to get more general community feedback.

Paul.

From rrr at ronadam.com  Thu Aug 16 12:21:00 2007
From: rrr at ronadam.com (Ron Adam)
Date: Thu, 16 Aug 2007 05:21:00 -0500
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <79990c6b0708160244t51902b6es50540c5684a99345@mail.gmail.com>
References: <46B13ADE.7080901@acm.org> <46BCA9C9.1010306@ronadam.com>	
	<46BD5BD8.7030706@acm.org> <46BF22D8.2090309@trueblade.com>	
	<46BF3CC7.6010405@acm.org> <46C0EEBF.3010206@ronadam.com>	
	<fb6fbf560708141233k193e36d0rae99e1decff42f11@mail.gmail.com>	
	<46C2325D.1010209@ronadam.com>	
	<fb6fbf560708150907q5d2f037ex85139dce51dec7fb@mail.gmail.com>	
	<46C367A8.4040601@ronadam.com>
	<79990c6b0708160244t51902b6es50540c5684a99345@mail.gmail.com>
Message-ID: <46C4250C.3050806@ronadam.com>

Paul Moore wrote:
> On 15/08/07, Ron Adam <rrr at ronadam.com> wrote:
>> EXAMPLES:
>>
> [...]
>> Examples from python3000 list:
> [...]
> 
> Can I suggest that these all go into the PEP, to give readers some
> flavour of what the new syntax will look like?
> 
> I'd also repeat the suggestion that these examples be posted to
> comp.lang.python, to get more general community feedback.
> 
> Paul.

Currently these particular examples aren't the syntax supported by the PEP.
It's an alternative/possible syntax only if there is enough support for a
serial left to right specification pattern as outlined.

What the pep supports is a single value that is passed to the __format__ 
function.  So the pep syntax combines alignment and other options into one 
term that the __format__ methods must decode all at once.

I think most of the developers here are still looking at various details and
are still undecided.  Do you have a preference for one or the other yet?

Cheers,
    Ron



From p.f.moore at gmail.com  Thu Aug 16 13:08:28 2007
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 16 Aug 2007 12:08:28 +0100
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46C4250C.3050806@ronadam.com>
References: <46B13ADE.7080901@acm.org> <46BF22D8.2090309@trueblade.com>
	<46BF3CC7.6010405@acm.org> <46C0EEBF.3010206@ronadam.com>
	<fb6fbf560708141233k193e36d0rae99e1decff42f11@mail.gmail.com>
	<46C2325D.1010209@ronadam.com>
	<fb6fbf560708150907q5d2f037ex85139dce51dec7fb@mail.gmail.com>
	<46C367A8.4040601@ronadam.com>
	<79990c6b0708160244t51902b6es50540c5684a99345@mail.gmail.com>
	<46C4250C.3050806@ronadam.com>
Message-ID: <79990c6b0708160408h79c91a9eq7195497ce43a53cb@mail.gmail.com>

On 16/08/07, Ron Adam <rrr at ronadam.com> wrote:
> Currently these particular examples aren't the syntax supported by the PEP.
>  It's an alternative/possible syntax only if there is enough support for a
> serial left to right specification pattern as outlined.

Ah, I hadn't realised that. I've been skipping most of the
discussions, mainly because of the lack of concrete examples :-)

> I think most of the developers here are still looking at various details and
> are still undecided.  Do you have a preference for one or the other yet?

As evidenced by the fact that I failed to notice the difference, I
can't distinguish the two :-)

All of the examples I've seen are hard to read. As Greg said, I find
that I have to study the format string, mentally breaking it into
parts, before I understand it. This is in complete contrast to
printf-style "%.10s" formats. I'm not at all sure this is anything
more than unfamiliarity, compounded by the fact that most of the
examples I see on the list are relatively complex, or edge cases. But
it's a major barrier to both understanding and acceptance of the new
proposals.

I'd still really like to see:

1. A simple listing of common cases, maybe taken from something like
stdlib uses of %-formats. Yes, most of them would be pretty trivial.
That's the point!

2. A *very short* comparison of a few more advanced cases - I'd
suggest formatting floats as fixed width, 2 decimal places (%5.2f),
formatting 8-digit hex (%.8X) and maybe a simple date format
(%Y-%m-%d). Yes, those are the sort of things I consider advanced.
Examples I've seen in the discussion aren't "advanced" in my book,
they are "I'll never use that" :-)

3. Another very short list of a couple of things you can do with the
new format, which you can't do with the existing % formats.
Concentrate here on real-world use cases - tabular reports, reordering
fields for internationalisation, things like that. As a data point,
I've never needed to centre a field in a print statement. Heck, I
don't even recall ever needing to specify how the sign of a number was
printed!
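
A short side-by-side along the lines requested in the list above. This is
a sketch that assumes the {}-syntax as it eventually shipped in released
Python 3, which may differ from the draft under discussion at the time of
this message; the sample values are made up.

```python
import datetime

value = 3.14159
day = datetime.date(2007, 8, 16)

# Fixed width, 2 decimal places: the same spec, minus the '%'.
old_float = "%5.2f" % value
new_float = "{0:5.2f}".format(value)

# 8-digit hex: %.8X uses a precision; the new spelling uses zero padding.
old_hex = "%.8X" % 255
new_hex = "{0:08X}".format(255)

# Simple date format: the date object interprets the spec itself.
old_date = day.strftime("%Y-%m-%d")
new_date = "{0:%Y-%m-%d}".format(day)

# Two things % does not offer for positional arguments:
# reordering fields (useful for internationalisation) and centring.
reordered = "{1}, {0}".format("Paul", "Moore")
centred = "[{0:^9}]".format("mid")

for old, new in ((old_float, new_float), (old_hex, new_hex), (old_date, new_date)):
    print(repr(old), repr(new))
print(reordered)
print(centred)
```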

I get the impression that the "clever" new features aren't actually
going to address the sorts of formatting problems I hit a lot. That's
fine, I can write code to do what I want, but there's a sense of YAGNI
about the discussion, because (for example) by the time I need to
format a centred, max-18, min-5 character number with 3 decimal places
and the sign hard to the left, I'm also going to want to dot-fill a
string to 30 characters and insert commas into the number, and I'm
writing code anyway, so why bother with an obscure format string that
only does half the job? (It goes without saying that if the format
string can do everything I want, that argument doesn't work, but then
we get to the complexity issues that hit regular expressions :-))

Sorry if this sounds a bit skeptical, but there's a *lot* of
discussion here over a feature I expect to use pretty infrequently.
99% of my % formats use nothing more than %s!

Paul.

From eric+python-dev at trueblade.com  Thu Aug 16 14:08:14 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Thu, 16 Aug 2007 08:08:14 -0400
Subject: [Python-3000] Adding __format__ to object
Message-ID: <46C43E2E.3000308@trueblade.com>

As part of implementing PEP 3101, I need to add __format__ to object, to 
achieve the equivalent of:

class object:
     def __format__(self, format_spec):
         return format(str(self), format_spec)

I've added __format__ to int, unicode, etc., but I can't figure out 
where or how to add it to object.
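
For readers who want to see the intended behaviour from pure Python, here
is a small sketch. The Point class is hypothetical; it simply spells out
the fallback shown above on a user-defined class rather than on object
itself.

```python
class Point:
    """Hypothetical class that formats itself via its str() form."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        return "(%d, %d)" % (self.x, self.y)

    def __format__(self, format_spec):
        # Same fallback as in the message: format the str() of the object.
        return format(str(self), format_spec)


right_aligned = "{0:>10}".format(Point(1, 2))
print(repr(right_aligned))
```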

Any pointers are appreciated.  Something as simple as "look at foo.c" or 
"grep for __baz__" would be good enough.

Thanks!
Eric.

From skip at pobox.com  Thu Aug 16 14:29:12 2007
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 16 Aug 2007 07:29:12 -0500
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <46C3C1DE.6070302@cs.rmit.edu.au>
References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au>
Message-ID: <18116.17176.123168.265491@montanaro.dyndns.org>


    Alex> The PEP abstract says this proposal will replace the '%' operator,

I hope this really doesn't happen.  printf-style formatting has a long
history both in C and Python and is well-understood.  Its few limitations
are mostly due to the binary nature of the % operator, not to the power or
flexibility of the format strings themselves.  In contrast, the new format
"language" seems to have no history (is it based on features in other
languages?  does anyone know if it will actually be usable in common
practice?) and at least to the casual observer of recent threads on this
topic seems extremely baroque.

Python has a tradition of incorporating the best ideas from other languages.
String formatting is so common that it doesn't seem to me we should need to
invent a new, unproven mechanism to do this.

Skip

From eric+python-dev at trueblade.com  Thu Aug 16 14:50:21 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Thu, 16 Aug 2007 08:50:21 -0400
Subject: [Python-3000] PEP 3101 Updated
In-Reply-To: <46C3CD96.4070902@acm.org>
References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au>
	<46C3CD96.4070902@acm.org>
Message-ID: <46C4480D.1060603@trueblade.com>

Talin wrote:
> Alex Holkner wrote:
>> What is the behaviour of whitespace in a format specifier?  e.g.
>> how much of the following is valid?
>>
>>       "{  foo . name  : 20s }".format(foo=open('bar'))
> 
> Eric, it's your call :)

I'm okay with whitespace before the colon (or !, as the case may be).
After the colon, I'd say it's significant and can't be automatically
removed, because a particular formatter might care (for example,
"%Y %M %D" for dates).

Currently the code doesn't allow whitespace before the colon.  I can add 
this if time permits, once everything else is implemented.
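
A tiny sketch of why whitespace after the colon has to be preserved,
assuming the behaviour of released Python 3. EchoSpec is a made-up class
whose only job is to show the exact spec string its __format__ receives.

```python
class EchoSpec:
    """Hypothetical helper: returns the repr of whatever spec it is given."""

    def __format__(self, spec):
        return repr(spec)


# The spaces around "20s" reach __format__ untouched.
shown = "{0: 20s }".format(EchoSpec())
print(shown)
```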


From g.brandl at gmx.net  Thu Aug 16 14:58:08 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Thu, 16 Aug 2007 14:58:08 +0200
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <18116.17176.123168.265491@montanaro.dyndns.org>
References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au>
	<18116.17176.123168.265491@montanaro.dyndns.org>
Message-ID: <fa1hkr$gfj$1@sea.gmane.org>

skip at pobox.com schrieb:
>     Alex> The PEP abstract says this proposal will replace the '%' operator,
> 
> I hope this really doesn't happen.  printf-style formatting has a long
> history both in C and Python and is well-understood.  Its few limitations
> are mostly due to the binary nature of the % operator, not to the power or
> flexibility of the format strings themselves.  In contrast, the new format
> "language" seems to have no history (is it based on features in other
> languages?  does anyone know if it will actually be usable in common
> practice?) and at least to the casual observer of recent threads on this
> topic seems extremely baroque.
> 
> Python has a tradition of incorporating the best ideas from other languages.
> String formatting is so common that it doesn't seem to me we should need to
> invent a new, unproven mechanism to do this.

Not to mention the pain of porting %-style format strings and % formatting
to {}-style format strings and .format() in Py3k.

Georg


-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From lists at cheimes.de  Thu Aug 16 15:12:25 2007
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 16 Aug 2007 15:12:25 +0200
Subject: [Python-3000] Adding __format__ to object
In-Reply-To: <46C43E2E.3000308@trueblade.com>
References: <46C43E2E.3000308@trueblade.com>
Message-ID: <46C44D39.7040408@cheimes.de>

Eric Smith wrote:
> Any pointers are appreciated.  Something as simple as "look at foo.c" or 
> "grep for __baz__" would be good enough.

look at Objects/typeobject.c and grep for PyMethodDef object_methods[]

Christian


From barry at python.org  Thu Aug 16 15:43:29 2007
From: barry at python.org (Barry Warsaw)
Date: Thu, 16 Aug 2007 09:43:29 -0400
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <18116.17176.123168.265491@montanaro.dyndns.org>
References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au>
	<18116.17176.123168.265491@montanaro.dyndns.org>
Message-ID: <32B81623-9B2E-4B0C-99DF-12415E3F79E3@python.org>

On Aug 16, 2007, at 8:29 AM, skip at pobox.com wrote:

>     Alex> The PEP abstract says this proposal will replace the '%' operator,
>
> I hope this really doesn't happen.  printf-style formatting has a long
> history both in C and Python and is well-understood.  Its few limitations
> are mostly due to the binary nature of the % operator, not to the power or
> flexibility of the format strings themselves.  In contrast, the new format
> "language" seems to have no history (is it based on features in other
> languages?  does anyone know if it will actually be usable in common
> practice?) and at least to the casual observer of recent threads on this
> topic seems extremely baroque.
>
> Python has a tradition of incorporating the best ideas from other languages.
> String formatting is so common that it doesn't seem to me we should need to
> invent a new, unproven mechanism to do this.

There are two parts to this, one is the language you use to define  
formatting and the other is the syntax you use to invoke it.  I've  
been mostly ignoring the PEP 3101 discussions because every time I  
see examples of an advanced format string I shudder and resign myself  
to never remembering how to use it.  It certainly doesn't feel like  
it's going to fit /my/ brain.

OTOH, I'm not saying that a super whizzy all-encompassing
inscrutable-but-powerful format language is a bad thing for Python, but it
may not be the best /default/ language for formatting.  OTOH, I don't
think the three different formatting languages need three different
syntaxes to invoke them.

The three formatting languages are, in order of decreasing simplicity  
and familiarity, but increasing power:

- PEP 292 $-strings
- Python 2 style % substitutions
- PEP 3101 format specifiers

I think all three languages have their place, but I would like to see  
if there's some way to make spelling their use more consistent.  Not  
that I have any great ideas on that front, but I think the proposals  
in PEP 3101 (which I've only skimmed) are tied too closely to the  
latter format language.
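
For comparison, here is the same message spelled in each of the three
formatting languages listed above. This is a sketch: the sample values are
made up, and the third line assumes the {}-syntax as it later shipped.

```python
from string import Template

user, food = "Barry", "spam"

# PEP 292 $-strings
dollar = Template("User $user ate $food").substitute(user=user, food=food)

# Python 2 style % substitutions
percent = "User %s ate %s" % (user, food)

# PEP 3101 format specifiers
braces = "User {0} ate {1}".format(user, food)

print(dollar)
print(percent)
print(braces)
```

Each language is invoked differently (a class plus a method, a binary
operator, a string method), which is the inconsistency in spelling that
this message is pointing at.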

While I agree with Skip's sentiment, I'd say that it's not the
%-operator I care as much about as it is the middle road that the
formatting language it uses takes.  For example, the logging package
uses the same language but exposes a better (IMO) way to spell its use:

 >>> log.info('User %s ate %s', user, food)

So the question is, is there some way to unify the use of format  
strings in the three different formatting languages, giving equal  
weight to each under the acknowledgment that all three use cases are  
equally valid?

-Barry


From guido at python.org  Thu Aug 16 16:53:01 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 16 Aug 2007 07:53:01 -0700
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <fa1hkr$gfj$1@sea.gmane.org>
References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au>
	<18116.17176.123168.265491@montanaro.dyndns.org>
	<fa1hkr$gfj$1@sea.gmane.org>
Message-ID: <ca471dc20708160753t30343b70ve8cf2e3a468b3a0d@mail.gmail.com>

> >     Alex> The PEP abstract says this proposal will replace the '%' operator,

> skip at pobox.com:
> > I hope this really doesn't happen.  printf-style formatting has a long
> > history both in C and Python and is well-understood.  Its few limitations
> > are mostly due to the binary nature of the % operator, not to the power or
> > flexibility of the format strings themselves.  In contrast, the new format
> > "language" seems to have no history (is it based on features in other
> > languages?  does anyone know if it will actually be usable in common
> > practice?) and at least to the casual observer of recent threads on this
> > topic seems extremely baroque.
> >
> > Python has a tradition of incorporating the best ideas from other languages.
> > String formatting is so common that it doesn't seem to me we should need to
> > invent a new, unproven mechanism to do this.

On 8/16/07, Georg Brandl <g.brandl at gmx.net> wrote:
> Not to mention the pain of porting %-style format strings and % formatting
> to {}-style format strings and .format() in Py3k.

There are many aspects to this. First of all, the discussion of PEP
3101 is taking too long, and some of the proposals are indeed outright
scary. I have long stopped following it -- basically I only pay
attention when Talin personally tells me that there's a new proposal.
I *think* that with Monday's breakthrough we're actually close, but
that apparently doesn't stop a few folks from continuing the heated
discussion.

Second, most aspects of the proposal have actually been lifted from
other languages. The {...} notation is common in many web templating
languages and also in .NET. In .NET, for example, you can write {0},
{1} etc. to reference positional parameters just like in the PEP. I
don't recall if it supports {x} to refer to parameters by name,
but that's common in web templating languages. The idea of allowing
{x.name} and {x[key]} or {x.name[1]} also comes from web templating
languages. In .NET, if you have additional formatting requirements, you
can write {0,10} to format parameter 0 with a minimum width of 10, and
{0,-10} to left-align it. In .NET you can also write {0:xxx} where
xxx is a mini-language used to express more details; this is used to
request things like hex output, or the many variants of formatting
floats.

While we're not copying .NET *exactly*, most of the basic ideas are
very similar; the discussion at this point is mostly about the
type-specific mini-languages. My proposal is to use *exactly the same
mini-language as used in 2.x %-formatting* (but without the '%'
character), in particular: {0:10.3f} will format a float in a field
10 characters wide with 3 digits behind the decimal point, and {0:08x}
will format an int in hex with 0-padding in a field 8 characters wide.
For strings, you can write {0:10.20s} to specify a min width of 10 and
a max width of 20 (I betcha you didn't even know you could do this
with %10.20s :-). The only added wrinkle is that you can also write
{0!r} to *force* using repr() on the value. This is similar to %r in
2.x. Of course, non-numeric types can define their own mini-language,
but that's all advanced stuff. (The concept of type-specific
mini-languages is straight from .NET though.)
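
The specs mentioned in the paragraph above, tried out as a sketch against
the str.format() that later shipped in released Python 3 (the sample
values are made up):

```python
pi = 3.14159265

float_field = "{0:10.3f}".format(pi)      # width 10, 3 digits after the point
hex_field = "{0:08x}".format(255)         # zero-padded hex, width 8
short_str = "{0:10.20s}".format("spam")   # padded to at least 10 characters
long_str = "{0:10.20s}".format("x" * 30)  # truncated to at most 20 characters
forced_repr = "{0!r}".format("spam")      # repr() forced, like %r

for text in (float_field, hex_field, short_str, long_str, forced_repr):
    print("[" + text + "]")
```

One difference worth knowing in the released implementation: a padded
string is left-aligned by default with {0:10.20s}, whereas %10.20s
right-aligns it.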

I'm afraid there's an awful lot of bikeshedding going on trying to
improve on this, e.g. people want the 'f' or 'x' in front, but I think
time to the first alpha release is so close that we should stop
discussing this and start implementing. (Fortunately at least one
person has already implemented most of this.)

Much of the earlier discussion was also terribly misguided because of
an earlier assumption that the mini-language should coerce the type.
This caused endless confusion about what to do with types that have
their own __format__ override. In the end we (I) wisely decided that
the object's __format__ always wins and numeric types will just have
to support the same mini-language by convention. The user of the
format() method won't care about any of this.

Now on to the transition. On the one hand I always planned this to
*replace* the old %-formatting syntax, which has a number of real
problems: "%s" % x raises an exception if x happens to be a tuple, and
you have to write "%s" % (x,) to format an object if you aren't sure
about its type; also, it's very common to forget the trailing 's' in
"%(name)s" % {'name': ...}.

On the other hand it's too close to the alpha 1 release to fix all the
current uses of %. (In fact it would be just short of a miracle if a
working format() implementation made it into 3.0a1 at all. But I
believe in miracles.)

The mechanical translation is relatively straightforward when the
format string is given as a literal, and this part is well within the
scope of the 2to3 tool (someone just has to write the converter). The
problems come, however, when formatting strings are passed around in
variables or arguments. We can't very well assume that every string
that happens to contain a % sign is a format string, and we can't
assume that every use of the % operator is a formatting operator,
either. Talin has jokingly proposed to translate *all* occurrences of
x%y into _legacy_percent(x, y) which would be a function that does
on-the-fly translation of format strings if x is a string, and returns
x%y if it isn't, but that doesn't sound attractive at all.

I don't know what percentage of %-formatting uses a string literal on
the left; if it's a really high number (high 90s), I'd like to kill
%-formatting and go with mechanical translation; otherwise, I think
we'll have to phase out %-formatting in 3.x or 4.0.

I hope this takes away some of the fears; and gives the PEP 3101 crowd
the incentive to stop bikeshedding and start coding!

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip at pobox.com  Thu Aug 16 17:05:36 2007
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 16 Aug 2007 10:05:36 -0500
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <ca471dc20708160753t30343b70ve8cf2e3a468b3a0d@mail.gmail.com>
References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au>
	<18116.17176.123168.265491@montanaro.dyndns.org>
	<fa1hkr$gfj$1@sea.gmane.org>
	<ca471dc20708160753t30343b70ve8cf2e3a468b3a0d@mail.gmail.com>
Message-ID: <18116.26560.234919.274765@montanaro.dyndns.org>


Thanks for the detailed response.

    Guido> Now on to the transition. On the one hand I always planned this
    Guido> to *replace* the old %-formatting syntax, which has a number of
    Guido> real problems: "%s" % x raises an exception if x happens to be a
    Guido> tuple, and you have to write "%s" % (x,) to format an object if
    Guido> you aren't sure about its type; also, it's very common to forget
    Guido> the trailing 's' in "%(name)s" % {'name': ...}.
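
The two %-formatting pitfalls quoted above can be reproduced with a short
sketch (the sample values are made up; the {}-style line assumes the
syntax as it later shipped):

```python
x = (1, 2)

# Pitfall 1: a tuple on the right is unpacked as the argument list.
tuple_error = None
try:
    "%s" % x
except TypeError as exc:
    tuple_error = str(exc)

wrapped = "%s" % (x,)          # the defensive spelling
new_style = "{0}".format(x)    # no special case for tuples

# Pitfall 2: forgetting the trailing 's' after a mapping key.
missing_s_error = None
try:
    "%(name)" % {"name": "spam"}
except ValueError as exc:
    missing_s_error = str(exc)

print(tuple_error)
print(wrapped, new_style)
print(missing_s_error)
```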

I was conflating the format string and the % operator in some of my (casual)
thinking.  I'm much less married to retaining the % operator itself (that is
the source of most of the current warts I believe), but as you pointed out
some of the format string proposals are pretty scary.

    Guido> I don't know what percentage of %-formatting uses a string
    Guido> literal on the left; if it's a really high number (high 90s), I'd
    Guido> like to kill %-formatting and go with mechanical translation;
    Guido> otherwise, I think we'll have to phase out %-formatting in 3.x or
    Guido> 4.0.

Yow!  You're already thinking about Python 4???

Skip

From eric+python-dev at trueblade.com  Thu Aug 16 17:09:56 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Thu, 16 Aug 2007 11:09:56 -0400
Subject: [Python-3000] Adding __format__ to object
In-Reply-To: <46C44D39.7040408@cheimes.de>
References: <46C43E2E.3000308@trueblade.com> <46C44D39.7040408@cheimes.de>
Message-ID: <46C468C4.3020907@trueblade.com>

Christian Heimes wrote:
> Eric Smith wrote:
>> Any pointers are appreciated.  Something as simple as "look at foo.c" or 
>> "grep for __baz__" would be good enough.
> 
> look at Objects/typeobject.c and grep for PyMethodDef object_methods[]

I should have mentioned that's among the things I've already tried.

But that appears to add methods to 'type', not to an instance of 
'object'.  If you do dir(object()):

$ ./python
Python 3.0x (py3k:57077M, Aug 16 2007, 10:10:04)
[GCC 3.3.3 20040412 (Red Hat Linux 3.3.3-7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
 >>> dir(object())
['__class__', '__delattr__', '__doc__', '__eq__', '__ge__', 
'__getattribute__', '__gt__', '__hash__', '__init__', '__le__', 
'__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', 
'__repr__', '__setattr__', '__str__']

You don't see the methods in typeobject.c (__mro__, etc).

This is pretty much the last hurdle in finishing my implementation of 
PEP 3101.  The rest of it is either done, or just involves refactoring 
existing code.

Eric.

From jimjjewett at gmail.com  Thu Aug 16 17:17:18 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Thu, 16 Aug 2007 11:17:18 -0400
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <633738EA-3F45-4933-BF81-0410BACBAF1E@hawaii.edu>
References: <633738EA-3F45-4933-BF81-0410BACBAF1E@hawaii.edu>
Message-ID: <fb6fbf560708160817v4b8b23a0pd6974961d01da5c7@mail.gmail.com>

On 8/13/07, Carl Johnson <carlmj at hawaii.edu> wrote:
> I also like the idea of using "!r" for calling repr ...

> s = "10"
> print("{0!i:+d}".format(s)) #prints "+10"

> The !i attempts to cast the string to int. ...
> The logic is that ! commands are abbreviated functions ...

Which does the "i" mean?
    (1)  Call s.__format__(...) with a flag indicating that it should
format itself like an integer.
    (2)  Ignore s.__format__, and instead call s.__index__().__format__(...)

If it is (case 1) an instruction to the object, then I don't see why
it needs to be special-cased; objects can handle (or not) any format
string, and "i" may well typically mean integer, but not always.

If it is (case 2) an instruction to the format function, then what are
the limits?  I see the value of r for repr, because that is already a
built-in alternative representation.  If we also allow int, then we
might as well allow arbitrary functions to check for validity
constraints.


    from operator import index

    def valcheck(val, spec=None):
        v = index(val)  # coerce through __index__
        if v not in range(11):
            raise ValueError(
                "Expected an integer in [0..10], but got {0!r}".format(v))
        if spec is None:
            return v
        return format(v, spec)

...

    "You rated your experience as {0!valcheck:d} out of 10."

> ... is this too TMTOWTDI-ish, since one could
> just write int(s) instead?

You can't write int(s) if you're passing a mapping (or tuple) from
someone else; at best you can copy the mapping and modify certain
values.

-jJ

From p.f.moore at gmail.com  Thu Aug 16 17:32:43 2007
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 16 Aug 2007 16:32:43 +0100
Subject: [Python-3000] Adding __format__ to object
In-Reply-To: <46C468C4.3020907@trueblade.com>
References: <46C43E2E.3000308@trueblade.com> <46C44D39.7040408@cheimes.de>
	<46C468C4.3020907@trueblade.com>
Message-ID: <79990c6b0708160832v2a7a105bp9f6323a77b05361b@mail.gmail.com>

On 16/08/07, Eric Smith <eric+python-dev at trueblade.com> wrote:
> Christian Heimes wrote:
> > look at Objects/typeobject.c and grep for PyMethodDef object_methods[]
>
> I should have mentioned that's among the things I've already tried.
[...]
> You don't see the methods in typeobject.c (__mro__, etc).

__mro__ is in type_members (at the top of the file). You want
object_methods (lower down). All it currently defines is __reduce__
and __reduce_ex__ (which are in dir(object())).
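
The distinction can be checked from the interpreter without reading the C
source. This sketch assumes a released Python 3, where __format__ has
already been added to object:

```python
on_instance = dir(object())

# Methods defined on object show up on every instance.
has_reduce = "__reduce__" in on_instance
has_format = "__format__" in on_instance

# __mro__ belongs to type, so it is visible on classes, not on instances.
mro_on_instance = "__mro__" in on_instance
mro_on_type = "__mro__" in dir(type)

print(has_reduce, has_format, mro_on_instance, mro_on_type)
```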

I've not tested (or ever used) this, so I could be wrong, of course.
Paul.

From talin at acm.org  Thu Aug 16 17:49:28 2007
From: talin at acm.org (Talin)
Date: Thu, 16 Aug 2007 08:49:28 -0700
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <ca471dc20708160753t30343b70ve8cf2e3a468b3a0d@mail.gmail.com>
References: <46C2809C.3000806@acm.org>
	<46C3C1DE.6070302@cs.rmit.edu.au>	<18116.17176.123168.265491@montanaro.dyndns.org>	<fa1hkr$gfj$1@sea.gmane.org>
	<ca471dc20708160753t30343b70ve8cf2e3a468b3a0d@mail.gmail.com>
Message-ID: <46C47208.4070103@acm.org>

Guido van Rossum wrote:
> While we're not copying .NET *exactly*, most of the basic ideas are
> very similar; the discussion at this point is mostly about the
> type-specific mini-languages. My proposal is to use *exactly the same
> mini-language as used in 2.x %-formatting* (but without the '%'
> character), in particular: {0:10.3f} will format a float in a field
> 10 characters wide with 3 digits behind the decimal point, and {0:08x}
> will format an int in hex with 0-padding in a field 8 characters wide.
> For strings, you can write {0:10.20s} to specify a min width of 10 and
> a max width of 20 (I betcha you didn't even know you could do this
> with %10.20s :-). The only added wrinkle is that you can also write
> {0!r} to *force* using repr() on the value. This is similar to %r in
> 2.x. Of course, non-numeric types can define their own mini-language,
> but that's all advanced stuff. (The concept of type-specific
> mini-languages is straight from .NET though.)

Just to follow up on what Guido said:

The current language of the PEP uses a formatting mini-language which is 
very close to the conversion specifiers of the existing '%' operator, 
and which is backwards compatible with it. So essentially if you 
understand printf-style formatting, you can use what you know.
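
As a rough illustration of the specifiers Guido lists above, here is how
they behave in str.format() as it later shipped. This is a sketch run
against a modern Python 3, not against the 2007 py3k branch, and the
sample values are invented.

```python
# Sample values are invented for illustration.
pi_ish = 3.14159
flags = 255
word = "abc"

# Float: field 10 characters wide, 3 digits after the decimal point.
as_float = "{0:10.3f}".format(pi_ish)

# Int in hex, zero-padded to a field 8 characters wide.
as_hex = "{0:08x}".format(flags)

# String: minimum width 10, maximum width 20.
# Note: str.format() left-aligns strings by default, whereas
# "%10.20s" % word right-aligns them.
as_str = "{0:10.20s}".format(word)

# Force repr() on the value, similar to %r in 2.x.
as_repr = "{0!r}".format(word)

print(repr(as_float), repr(as_hex), repr(as_str), as_repr)
```

The numeric cases produce the same text as "%10.3f" and "%08x" would;
only the default alignment of strings differs.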

It may be that additional formatting options can be added in a future 
release of Python, however they should (a) be backwards compatible with 
what we have now, (b) demonstrate a compelling need, and (c) require 
their own PEP. In other words, I'm not going to add any more creeping 
features to the current PEP. Except for updating the examples and adding 
some clarifications that people have asked for, it's *done*.

I admit that part of this whole syntax discussion was my fault - I did 
ask for a bit of a bikeshed discussion and I got way more than I 
bargained for :)

-- Talin

From janssen at parc.com  Thu Aug 16 18:29:05 2007
From: janssen at parc.com (Bill Janssen)
Date: Thu, 16 Aug 2007 09:29:05 PDT
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <18116.17176.123168.265491@montanaro.dyndns.org> 
References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au>
	<18116.17176.123168.265491@montanaro.dyndns.org>
Message-ID: <07Aug16.092913pdt."57996"@synergy1.parc.xerox.com>

>     Alex> The PEP abstract says this proposal will replace the '%' operator,
> 
> I hope this really doesn't happen.  printf-style formatting has a long
> history both in C and Python and is well-understood.  Its few limitations
> are mostly due to the binary nature of the % operator, not to the power or
> flexibility of the format strings themselves.  In contrast, the new format
> "language" seems to have no history (is it based on features in other
> languages?  does anyone know if it will actually be usable in common
> practice?) and at least to the casual observer of recent threads on this
> topic seems extremely baroque.

> String formatting is so common that it doesn't seem to me we should need to
> invent a new, unproven mechanism to do this.

I strongly agree with Skip here.  The current proposal seems to me a
poor solution to a non-problem.

Bill

From guido at python.org  Thu Aug 16 18:33:31 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 16 Aug 2007 09:33:31 -0700
Subject: [Python-3000] Adding __format__ to object
In-Reply-To: <79990c6b0708160832v2a7a105bp9f6323a77b05361b@mail.gmail.com>
References: <46C43E2E.3000308@trueblade.com> <46C44D39.7040408@cheimes.de>
	<46C468C4.3020907@trueblade.com>
	<79990c6b0708160832v2a7a105bp9f6323a77b05361b@mail.gmail.com>
Message-ID: <ca471dc20708160933y3981b006tb34d6d9f4214f196@mail.gmail.com>

Paul's right. I agree it's confusing that object and type are both
defined in the same file (though there's probably a good reason, given
that type is derived from object and object is an instance of type
:-). To add methods to object, add them to object_methods in that
file. I've tested this.

On 8/16/07, Paul Moore <p.f.moore at gmail.com> wrote:
> On 16/08/07, Eric Smith <eric+python-dev at trueblade.com> wrote:
> > Christian Heimes wrote:
> > > look at Objects/typeobject.c and grep for PyMethodDef object_methods[]
> >
> > I should have mentioned that's among the things I've already tried.
> [...]
> > You don't see the methods in typeobject.c (__mro__, etc).
>
> __mro__ is in type_members (at the top of the file). You want
> object_methods (lower down). All it currently defines is __reduce__
> and __reduce_ex__ (which are in dir(object())).
>
> I've not tested (or ever used) this, so I could be wrong, of course.
> Paul.


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Thu Aug 16 18:38:32 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 16 Aug 2007 09:38:32 -0700
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <18116.26560.234919.274765@montanaro.dyndns.org>
References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au>
	<18116.17176.123168.265491@montanaro.dyndns.org>
	<fa1hkr$gfj$1@sea.gmane.org>
	<ca471dc20708160753t30343b70ve8cf2e3a468b3a0d@mail.gmail.com>
	<18116.26560.234919.274765@montanaro.dyndns.org>
Message-ID: <ca471dc20708160938s364ed232t498e5f9c842d938c@mail.gmail.com>

On 8/16/07, skip at pobox.com <skip at pobox.com> wrote:
>
> Thanks for the detailed response.
>
>     Guido> Now on to the transition. On the one hand I always planned this
>     Guido> to *replace* the old %-formatting syntax, which has a number of
>     Guido> real problems: "%s" % x raises an exception if x happens to be a
>     Guido> tuple, and you have to write "%s" % (x,) to format an object if
>     Guido> you aren't sure about its type; also, it's very common to forget
>     Guido> the trailing 's' in "%(name)s" % {'name': ...}.
>
> I was conflating the format string and the % operator in some of my (casual)
> thinking.  I'm much less married to retaining the % operator itself (that is
> the source of most of the current warts I believe), but as you pointed out
> some of the format string proposals are pretty scary.

Which is why they won't be accepted. ;-)

To clarify: the % operator is the cause of the first wart; the second
wart is caused by the syntax for individual formats. Keeping one but
changing the other seems to keep some of the flaws of one or the other
while still paying the full price of change, so this isn't an option.
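
The two warts can be reproduced in a few lines. This is a minimal sketch
assuming a modern Python 3; the sample values are invented, and the
str.format() calls use the syntax as it eventually shipped.

```python
x = (1, 2)  # invented sample value that happens to be a tuple

# Wart 1 (the % operator): a tuple on the right-hand side is unpacked
# as the argument list, so a two-element tuple does not fit one "%s".
try:
    "%s" % x
    tuple_failed = False
except TypeError:
    tuple_failed = True

# The defensive spelling wraps the value in a one-element tuple.
safe = "%s" % (x,)

# Wart 2 (the format syntax): forgetting the trailing 's' in "%(name)s".
try:
    "Hello %(name)" % {"name": "Skip"}
    missing_s_failed = False
except (TypeError, ValueError):
    missing_s_failed = True

# str.format() avoids both: arguments are passed explicitly and the
# closing brace ends the field.
via_format = "{0}".format(x)
named = "Hello {name}".format(name="Skip")

print(tuple_failed, safe, missing_s_failed, via_format, named)
```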

>     Guido> I don't know what percentage of %-formatting uses a string
>     Guido> literal on the left; if it's a really high number (high 90s), I'd
>     Guido> like to kill %-formatting and go with mechanical translation;
>     Guido> otherwise, I think we'll have to phase out %-formatting in 3.x or
>     Guido> 4.0.
>
> Yow!  You're already thinking about Python 4???

I use it in the same way we used to refer to Python 3000 in the past. :-)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From rhamph at gmail.com  Thu Aug 16 18:46:38 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Thu, 16 Aug 2007 10:46:38 -0600
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <79990c6b0708160408h79c91a9eq7195497ce43a53cb@mail.gmail.com>
References: <46B13ADE.7080901@acm.org> <46BF3CC7.6010405@acm.org>
	<46C0EEBF.3010206@ronadam.com>
	<fb6fbf560708141233k193e36d0rae99e1decff42f11@mail.gmail.com>
	<46C2325D.1010209@ronadam.com>
	<fb6fbf560708150907q5d2f037ex85139dce51dec7fb@mail.gmail.com>
	<46C367A8.4040601@ronadam.com>
	<79990c6b0708160244t51902b6es50540c5684a99345@mail.gmail.com>
	<46C4250C.3050806@ronadam.com>
	<79990c6b0708160408h79c91a9eq7195497ce43a53cb@mail.gmail.com>
Message-ID: <aac2c7cb0708160946ybbdca13la27f43a19e62f49f@mail.gmail.com>

On 8/16/07, Paul Moore <p.f.moore at gmail.com> wrote:
> On 16/08/07, Ron Adam <rrr at ronadam.com> wrote:
> > Currently these particular examples aren't the syntax supported by the PEP.
>  It's an alternative/possible syntax only if there is enough support for a
> > serial left to right specification pattern as outlined.
>
> Ah, I hadn't realised that. I've been skipping most of the
> discussions, mainly because of the lack of concrete examples :-)
>
> > I think most of the developers here are still looking at various details and
> > are still undecided.  Do you have a preference for one or the other yet?
>
> As evidenced by the fact that I failed to notice the difference, I
> can't distinguish the two :-)
>
> All of the examples I've seen are hard to read. As Greg said, I find
> that I have to study the format string, mentally breaking it into
> parts, before I understand it. This is in complete contrast to
> printf-style "%.10s" formats. I'm not at all sure this is anything
> more than unfamiliarity, compounded by the fact that most of the
> examples I see on the list are relatively complex, or edge cases. But
> it's a major barrier to both understanding and acceptance of the new
> proposals.
>
> I'd still really like to see:
>
> 1. A simple listing of common cases, maybe taken from something like
> stdlib uses of %-formats. Yes, most of them would be pretty trivial.
> That's the point!

Seconded!  This discussion needs some grounding.

>
> 2. A *very short* comparison of a few more advanced cases - I'd
> suggest formatting floats as fixed width, 2 decimal places (%5.2f),
> formatting 8-digit hex (%.8X) and maybe a simple date format
> (%Y-%m-%d). Yes, those are the sort of things I consider advanced.
> Examples I've seen in the discussion aren't "advanced" in my book,
> they are "I'll never use that" :-)
>
> 3. Another very short list of a couple of things you can do with the
> new format, which you can't do with the existing % formats.
> Concentrate here on real-world use cases - tabular reports, reordering
> fields for internationalisation, things like that. As a data point,
> I've never needed to centre a field in a print statement. Heck, I
> don't even recall ever needing to specify how the sign of a number was
> printed!
>
> I get the impression that the "clever" new features aren't actually
> going to address the sorts of formatting problems I hit a lot. That's
> fine, I can write code to do what I want, but there's a sense of YAGNI
> about the discussion, because (for example) by the time I need to
> format a centred, max-18, min-5 character number with 3 decimal places
> and the sign hard to the left, I'm also going to want to dot-fill a
> string to 30 characters and insert commas into the number, and I'm
> writing code anyway, so why bother with an obscure format string that
> only does half the job? (It goes without saying that if the format
> string can do everything I want, that argument doesn't work, but then
> we get to the complexity issues that hit regular expressions :-))
>
> Sorry if this sounds a bit skeptical, but there's a *lot* of
> discussion here over a feature I expect to use pretty infrequently.
> 99% of my % formats use nothing more than %s!
>
> Paul.


-- 
Adam Olsen, aka Rhamphoryncus

From rhamph at gmail.com  Thu Aug 16 18:48:12 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Thu, 16 Aug 2007 10:48:12 -0600
Subject: [Python-3000] PEP 3101 Updated
In-Reply-To: <46C4480D.1060603@trueblade.com>
References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au>
	<46C3CD96.4070902@acm.org> <46C4480D.1060603@trueblade.com>
Message-ID: <aac2c7cb0708160948i5c8cf9d0u6a39e375baceba@mail.gmail.com>

On 8/16/07, Eric Smith <eric+python-dev at trueblade.com> wrote:
> Talin wrote:
> > Alex Holkner wrote:
> >> What is the behaviour of whitespace in a format specifier?  e.g.
> >> how much of the following is valid?
> >>
> >>       "{  foo . name  : 20s }".format(foo=open('bar'))
> >
> > Eric, it's your call :)
>
> I'm okay with whitespace before the colon (or !, as the case may be).
> After the colon, I'd say it's significant and can't be automatically
> removed, because a particular formatter might care (for example, "%Y %M
> %D" for dates).
>
> Currently the code doesn't allow whitespace before the colon.  I can add
> this if time permits, once everything else is implemented.

YAGNI.
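
For what it's worth, Eric's point about whitespace after the colon can
be seen with dates. A small sketch, assuming str.format() as it later
shipped and using strftime's %m and %d for month and day; the date is
arbitrary.

```python
import datetime

d = datetime.date(2007, 8, 16)  # sample date, chosen arbitrarily

# Everything after the colon is handed to date.__format__ untouched,
# so spaces inside the specifier survive into the result.
spaced = "{0:%Y %m %d}".format(d)
compact = "{0:%Y%m%d}".format(d)

print(spaced)
print(compact)
```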

-- 
Adam Olsen, aka Rhamphoryncus

From eric+python-dev at trueblade.com  Thu Aug 16 18:57:13 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Thu, 16 Aug 2007 12:57:13 -0400
Subject: [Python-3000] Adding __format__ to object
In-Reply-To: <ca471dc20708160933y3981b006tb34d6d9f4214f196@mail.gmail.com>
References: <46C43E2E.3000308@trueblade.com> <46C44D39.7040408@cheimes.de>	
	<46C468C4.3020907@trueblade.com>	
	<79990c6b0708160832v2a7a105bp9f6323a77b05361b@mail.gmail.com>
	<ca471dc20708160933y3981b006tb34d6d9f4214f196@mail.gmail.com>
Message-ID: <46C481E9.9010006@trueblade.com>

Guido van Rossum wrote:
> Paul's right. I agree it's confusing that object and type are both
> defined in the same file (though there's probably a good reason, given
> that type is derived from object and object is an instance of type
> :-). To add methods to object, add them to object_methods in that
> file. I've tested this.

Awesome!  I thought I looked for all occurrences in that file, but 
apparently not.

Thanks all for the help.

Eric.


From walter at livinglogic.de  Thu Aug 16 19:10:48 2007
From: walter at livinglogic.de (=?UTF-8?B?V2FsdGVyIETDtnJ3YWxk?=)
Date: Thu, 16 Aug 2007 19:10:48 +0200
Subject: [Python-3000] UTF-32 codecs
Message-ID: <46C48518.3070701@livinglogic.de>

I have a patch against the py3k branch (http://bugs.python.org/1775604)
that adds UTF-32 codecs. On a narrow build it combines surrogate pairs
in the unicode object into one codepoint on encoding and creates
surrogate pairs for codepoints outside the BMP on decoding.
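
To make the surrogate-pair point concrete, here is a small sketch using
the utf-32 codecs as they exist in current Python 3 (where str is no
longer narrow or wide). The sample character is an arbitrary choice of
a codepoint outside the BMP.

```python
ch = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the BMP

# UTF-32 stores the codepoint as a single 4-byte unit.
utf32 = ch.encode("utf-32-le")

# UTF-16 needs a surrogate pair (0xD834, 0xDD1E): two 2-byte units.
utf16 = ch.encode("utf-16-le")

# Both encodings decode back to the same single codepoint.
round_trip_ok = (utf32.decode("utf-32-le") == ch
                 and utf16.decode("utf-16-le") == ch)

print(utf32, utf16, round_trip_ok)
```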

Should I apply this to the py3k branch only, or do we want that for
Python 2.6 too (using str instead of bytes)?

Servus,
   Walter

From fdrake at acm.org  Thu Aug 16 19:18:44 2007
From: fdrake at acm.org (Fred Drake)
Date: Thu, 16 Aug 2007 13:18:44 -0400
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <ca471dc20708160938s364ed232t498e5f9c842d938c@mail.gmail.com>
References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au>
	<18116.17176.123168.265491@montanaro.dyndns.org>
	<fa1hkr$gfj$1@sea.gmane.org>
	<ca471dc20708160753t30343b70ve8cf2e3a468b3a0d@mail.gmail.com>
	<18116.26560.234919.274765@montanaro.dyndns.org>
	<ca471dc20708160938s364ed232t498e5f9c842d938c@mail.gmail.com>
Message-ID: <F8EC835C-24A0-4669-9C9D-3224CB8E78B0@acm.org>

On Aug 16, 2007, at 12:38 PM, Guido van Rossum wrote:
> I use it in the same way we used to refer to Python 3000 in the  
> past. :-)

If we acknowledge that Python 3.0 isn't the same as Python 3000, we  
don't even need a new name for it.  ;-)


   -Fred

-- 
Fred Drake   <fdrake at acm.org>




From martin at v.loewis.de  Thu Aug 16 19:42:09 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 16 Aug 2007 19:42:09 +0200
Subject: [Python-3000] UTF-32 codecs
In-Reply-To: <46C48518.3070701@livinglogic.de>
References: <46C48518.3070701@livinglogic.de>
Message-ID: <46C48C71.3000400@v.loewis.de>

Walter Dörwald wrote:
> I have a patch against the py3k branch (http://bugs.python.org/1775604)
> that adds UTF-32 codecs. On a narrow build it combines surrogate pairs
> in the unicode object into one codepoint on encoding and creates
> surrogate pairs for codepoints outside the BMP on decoding.
> 
> Should I apply this to the py3k branch only, or do we want that for
> Python 2.6 too (using str instead of bytes)?

If it's no effort, I would like to see this on the trunk also.

In general, I'm skeptical about the "new features only in 3k" strategy.
Some features can be added easily with no backwards-compatibility issues
in 2.x, and would have normally been added to the next major 2.x release
without much discussion.

Regards,
Martin

From eric+python-dev at trueblade.com  Thu Aug 16 19:54:03 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Thu, 16 Aug 2007 13:54:03 -0400
Subject: [Python-3000] PEP 3101 Updated
In-Reply-To: <46C3C098.6080601@acm.org>
References: <46C2809C.3000806@acm.org>
	<8f01efd00708150738y36deee91v918bfd3f80944d9b@mail.gmail.com>
	<46C331BF.2020104@trueblade.com> <46C3C098.6080601@acm.org>
Message-ID: <46C48F3B.50203@trueblade.com>

Talin wrote:
> Eric Smith wrote:
>> James Thiele wrote:
>>> I think the example:
>>>
>>>     "My name is {0.name}".format(file('out.txt'))

> Those examples are kind of contrived to begin with. Maybe we should 
> replace them with more realistic ones.

I just added this test case:

         d = datetime.date(2007, 8, 18)
         self.assertEqual("The year is {0.year}".format(d),
                          "The year is 2007")

maybe we can use it.

From lists at cheimes.de  Thu Aug 16 20:31:29 2007
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 16 Aug 2007 20:31:29 +0200
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <18116.17176.123168.265491@montanaro.dyndns.org>
References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au>
	<18116.17176.123168.265491@montanaro.dyndns.org>
Message-ID: <fa256g$orv$1@sea.gmane.org>

skip at pobox.com wrote:
>     Alex> The PEP abstract says this proposal will replace the '%' operator,

[...]

I agree with Skip, too. The % printf operator is a very useful and
powerful feature. I'm doing newbie support at my university and in
#python. Newbies are often astonished how easy and powerful printf() is
in Python. I like the % format operator, too. It's easy and fast to type
for small jobs.

I beg you to keep the feature. I agree that the new PEP 3101 style
format is useful and required for more complex string formatting. But
please keep a simple one for simple jobs.

Christian


From guido at python.org  Thu Aug 16 20:33:19 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 16 Aug 2007 11:33:19 -0700
Subject: [Python-3000] UTF-32 codecs
In-Reply-To: <46C48C71.3000400@v.loewis.de>
References: <46C48518.3070701@livinglogic.de> <46C48C71.3000400@v.loewis.de>
Message-ID: <ca471dc20708161133hd890bd4m33abd2ba68d914da@mail.gmail.com>

On 8/16/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> Walter Dörwald wrote:
> > I have a patch against the py3k branch (http://bugs.python.org/1775604)
> > that adds UTF-32 codecs. On a narrow build it combines surrogate pairs
> > in the unicode object into one codepoint on encoding and creates
> > surrogate pairs for codepoints outside the BMP on decoding.
> >
> > Should I apply this to the py3k branch only, or do we want that for
> > Python 2.6 too (using str instead of bytes)?
>
> If it's no effort, I would like to see this on the trunk also.
>
> In general, I'm skeptical about the "new features only in 3k" strategy.
> Some features can be added easily with no backwards-compatibility issues
> in 2.x, and would have normally been added to the next major 2.x release
> without much discussion.

Agreed, especially since we're planning on backporting much to 2.6. I
want to draw the line at *dropping* stuff from 2.6 though (or
replacing it, or changing it). 2.6 needs to be *very* compatible with
2.5, in order to lure most users into upgrading to 2.6, which is a
prerequisite for porting to 3.0 eventually.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From steven.bethard at gmail.com  Thu Aug 16 20:47:59 2007
From: steven.bethard at gmail.com (Steven Bethard)
Date: Thu, 16 Aug 2007 12:47:59 -0600
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <fa256g$orv$1@sea.gmane.org>
References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au>
	<18116.17176.123168.265491@montanaro.dyndns.org>
	<fa256g$orv$1@sea.gmane.org>
Message-ID: <d11dcfba0708161147x24f0a432o75a811371210961d@mail.gmail.com>

On 8/16/07, Christian Heimes <lists at cheimes.de> wrote:
> skip at pobox.com wrote:
> >     Alex> The PEP abstract says this proposal will replace the '%' operator,
>
> [...]
>
> I agree with Skip, too. The % printf operator is a very useful and
> powerful feature. I'm doing newbie support at my university and in
> #python. Newbies are often astonished how easy and powerful printf() is
> in Python. I like the % format operator, too. It's easy and fast to type
> for small jobs.
>
> I beg you to keep the feature. I agree that the new PEP 3101 style
> format is useful and required for more complex string formatting. But
> please keep a simple one for simple jobs.

I honestly can't see the point of keeping this::

    >>> '%-10s bought %02i apples for %3.2f' % ('John', 8, 3.78)
    'John       bought 08 apples for 3.78'

alongside this::

    >>> '{0:-10} bought {1:02i} apples for {2:3.2f}'.format('John', 8, 3.78)
    'John       bought 08 apples for 3.78'

They're so similar I don't see why you think the latter is no longer
"easy and powerful".
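
For reference, the spelling that str.format() ended up accepting
differs slightly from the draft syntax shown above: left alignment is
written '<' rather than '-', and the integer type code is 'd' rather
than 'i'. A small check, assuming a modern Python 3:

```python
args = ("John", 8, 3.78)

# printf-style formatting, unchanged from 2.x.
old_style = "%-10s bought %02i apples for %3.2f" % args

# The equivalent in str.format() as shipped.
new_style = "{0:<10} bought {1:02d} apples for {2:3.2f}".format(*args)

print(repr(old_style))
print(repr(new_style))
```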

STeVe
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From lists at cheimes.de  Thu Aug 16 20:53:31 2007
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 16 Aug 2007 20:53:31 +0200
Subject: [Python-3000] Adding __format__ to object
In-Reply-To: <46C468C4.3020907@trueblade.com>
References: <46C43E2E.3000308@trueblade.com> <46C44D39.7040408@cheimes.de>
	<46C468C4.3020907@trueblade.com>
Message-ID: <46C49D2B.709@cheimes.de>

Eric Smith wrote:
> I should have mentioned that's among the things I've already tried.
> 
> But that appears to add methods to 'type', not to an instance of 
> 'object'.  If you do dir(object()):
>
> You don't see the methods in typeobject.c (__mro__, etc).
> 
> This is pretty much the last hurdle in finishing my implementation of 
> PEP 3101.  The rest of it is either done, or just involves refactoring 
> existing code.

It works for me:

$ LC_ALL=C svn diff Objects/typeobject.c
Index: Objects/typeobject.c
===================================================================
--- Objects/typeobject.c        (revision 57099)
+++ Objects/typeobject.c        (working copy)
@@ -2938,6 +2938,8 @@
         PyDoc_STR("helper for pickle")},
        {"__reduce__", object_reduce, METH_VARARGS,
         PyDoc_STR("helper for pickle")},
+        {"__format__", object_reduce, METH_VARARGS,
+         PyDoc_STR("helper for pickle")},
        {0}
 };

$ ./python
Python 3.0x (py3k:57099M, Aug 16 2007, 20:45:17)
[GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> dir(object())
['__class__', '__delattr__', '__doc__', '__eq__', '__format__',
'__ge__', '__getattribute__', '__gt__', '__hash__', '__init__',
'__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__setattr__', '__str__']
>>> object().__format__
<built-in method __format__ of object object at 0xb7d0c468>
>>> object().__format__()
(<function _reconstructor at 0xb7cc71ac>, (<type 'object'>, <type
'object'>, None))

Are you sure that you have changed the correct array?

Christian


From brett at python.org  Thu Aug 16 21:05:57 2007
From: brett at python.org (Brett Cannon)
Date: Thu, 16 Aug 2007 12:05:57 -0700
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <fa256g$orv$1@sea.gmane.org>
References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au>
	<18116.17176.123168.265491@montanaro.dyndns.org>
	<fa256g$orv$1@sea.gmane.org>
Message-ID: <bbaeab100708161205s3a37a051l451bf9090bf218b7@mail.gmail.com>

[Tonight, the role of old, cranky python-dev'er will be played by
Brett Cannon.  Don't take this personally, Christian, your email just
happened to be last.  =)]

On 8/16/07, Christian Heimes <lists at cheimes.de> wrote:
> skip at pobox.com wrote:
> >     Alex> The PEP abstract says this proposal will replace the '%' operator,
>
> [...]
>
> I agree with Skip, too. The % printf operator is a very useful and
> powerful feature. I'm doing newbie support at my university and in
> #python. Newbies are often astonished how easy and powerful printf() is
> in Python. I like the % format operator, too. It's easy and fast to type
> for small jobs.
>

But how is::

  "{0} is happy to see {1}".format('Brett', 'Christian')

that much harder to read than::

  "%s is happy to see %s" % ('Brett', 'Christian')

?  Yes, PEP 3101 style is more to type but it isn't grievous; we have
just been spoiled by the overloading of the % operator.  And I don't
know how newbies think these days, but I know I find the numeric
markers much easier to follow than the '%s', especially if the string
ends up becoming long.

And if it is the use of a method instead of an operator that the
newbies might have issues with, well methods and functions come up
quick so they probably won't go long without knowing what is going on.

> I beg you to keep the feature. I agree that the new PEP 3101 style
> format is useful and required for more complex string formatting. But
> please keep a simple one for simple jobs.

This is where the cranky python-dev'er comes in: PEP 3101 was
published in April 2006 which is over a year ago!  This is not a new
PEP or a new plan.  I personally stayed out of the discussions on this
subject as I knew reasonable people were keeping an eye on it and I
didn't feel I had anything to contribute.  That means I just go with
what they decide whether I like it or not.

I understand the feeling of catching up on a thread and going, "oh no,
I don't like that!", but that is the nature of the beast.  In my view,
if you just don't have the time or energy (which I completely
understand not having, don't get me wrong) for a thread, you basically
have to defer to the people who do and trust that the proper things
were discussed and that the group as a whole (or Guido in those cases
where his gut tells him to ignore everyone) is going to make a sound
decision.

At this point the discussion has gone long enough with Guido
participating and agreeing with key decisions, that the only way to
get this course of action changed is to come up with really good
examples of how the % format is hugely better than PEP 3101 and
convince the people involved.  But just saying you like %s over {0} is
like saying you don't like the decorator syntax: that's nice and all,
but that is not a compelling reason to change the decision being made.

-Brett

From eric+python-dev at trueblade.com  Thu Aug 16 21:23:01 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Thu, 16 Aug 2007 15:23:01 -0400
Subject: [Python-3000] Adding __format__ to object
In-Reply-To: <46C49D2B.709@cheimes.de>
References: <46C43E2E.3000308@trueblade.com> <46C44D39.7040408@cheimes.de>
	<46C468C4.3020907@trueblade.com> <46C49D2B.709@cheimes.de>
Message-ID: <46C4A415.7090408@trueblade.com>

> Are you sure that you have changed the correct array?

Yes, that was the issue.  I changed the wrong array.  I stupidly assumed 
that it was one object per file, but of course there's no valid reason 
to make that assumption.

I'm sure I don't have the best version of this coded up, but that's 
a problem for another day, and I'll ask for help on that when all of my 
tests pass.

Thanks again for your (and others') help on this.  I now have 
object.__format__ working, so I can finally get back to 
unicode.__format__ and parsing format specifiers.
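
As a sketch of what the hook enables once it exists on object, a class
can override __format__ to interpret the specifier itself. The Point
class below is purely hypothetical and is written against Python 3 as
it later shipped, not against the patch under discussion.

```python
class Point:
    """Hypothetical example type; not part of the patch discussed."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __format__(self, spec):
        # With an empty specifier, fall back to a plain rendering.
        if not spec:
            return "({0}, {1})".format(self.x, self.y)
        # Otherwise apply the same specifier to each coordinate.
        return "({0}, {1})".format(format(self.x, spec),
                                   format(self.y, spec))


p = Point(1.5, 2.5)
plain = "{0}".format(p)
fixed = "{0:.2f}".format(p)

print(plain)
print(fixed)
```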

Eric.


From rrr at ronadam.com  Thu Aug 16 21:35:33 2007
From: rrr at ronadam.com (Ron Adam)
Date: Thu, 16 Aug 2007 14:35:33 -0500
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <79990c6b0708160408h79c91a9eq7195497ce43a53cb@mail.gmail.com>
References: <46B13ADE.7080901@acm.org> <46BF22D8.2090309@trueblade.com>	
	<46BF3CC7.6010405@acm.org> <46C0EEBF.3010206@ronadam.com>	
	<fb6fbf560708141233k193e36d0rae99e1decff42f11@mail.gmail.com>	
	<46C2325D.1010209@ronadam.com>	
	<fb6fbf560708150907q5d2f037ex85139dce51dec7fb@mail.gmail.com>	
	<46C367A8.4040601@ronadam.com>	
	<79990c6b0708160244t51902b6es50540c5684a99345@mail.gmail.com>	
	<46C4250C.3050806@ronadam.com>
	<79990c6b0708160408h79c91a9eq7195497ce43a53cb@mail.gmail.com>
Message-ID: <46C4A705.9030506@ronadam.com>



Paul Moore wrote:
> On 16/08/07, Ron Adam <rrr at ronadam.com> wrote:
>> Currently these particular examples aren't the syntax supported by the PEP.
>>  It's an alternative/possible syntax only if there is enough support for a
>> serial left to right specification pattern as outlined.
> 
> Ah, I hadn't realised that. I've been skipping most of the
> discussions, mainly because of the lack of concrete examples :-)

And the discussion has progressed and changed in ways that make things a 
bit more confusing as well.  So the earlier parts of it don't connect with 
the later parts well.

It looks like the consensus is to keep something very close to the current 
%xxx style syntax just without the '%' on the front after all.  So this is 
all academic, although it may be useful sometime in the future.


>> I think most of the developers here are still looking at various details and
>> are still undecided.  Do you have a preference for one or the other yet?
> 
> As evidenced by the fact that I failed to notice the difference, I
> can't distinguish the two :-)
> 
> All of the examples I've seen are hard to read. As Greg said, I find
> that I have to study the format string, mentally breaking it into
> parts, before I understand it. This is in complete contrast to
> printf-style "%.10s" formats. I'm not at all sure this is anything
> more than unfamiliarity, compounded by the fact that most of the
> examples I see on the list are relatively complex, or edge cases. But
> it's a major barrier to both understanding and acceptance of the new
> proposals.

Yes, familiarity is a big part.  I think the last version was the simplest 
because it breaks things up logically to start with (you don't have to do 
it mentally), it's just a matter of reading it left to right once you are 
familiar with the basic idea.

But it's still quite a bit different from what others are used to.  Enough 
so that it may take a bit of unlearning when things are changed this much. 
And enough so that it may seem overly complex at first glance.  In most 
cases it wouldn't be.

Anyways...  Skip to the new threads where they are discussing what the 
current status and things left to do are. Even with the older syntax, it 
will still be better than what we had because you can still create objects 
with custom formatting if you want to.  Most people won't even need that I 
suspect.

Cheers,
    Ron



> I'd still really like to see:
> 
> 1. A simple listing of common cases, maybe taken from something like
> stdlib uses of %-formats. Yes, most of them would be pretty trivial.
> That's the point!
> 2. A *very short* comparison of a few more advanced cases - I'd
> suggest formatting floats as fixed width, 2 decimal places (%5.2f),
> formatting 8-digit hex (%.8X) and maybe a simple date format
> (%Y-%m-%d). Yes, those are the sort of things I consider advanced.
> Examples I've seen in the discussion aren't "advanced" in my book,
> they are "I'll never use that" :-)
> 
> 3. Another very short list of a couple of things you can do with the
> new format, which you can't do with the existing % formats.
> Concentrate here on real-world use cases - tabular reports, reordering
> fields for internationalisation, things like that. As a data point,
> I've never needed to centre a field in a print statement. Heck, I
> don't even recall ever needing to specify how the sign of a number was
> printed!
> 
> I get the impression that the "clever" new features aren't actually
> going to address the sorts of formatting problems I hit a lot. That's
> fine, I can write code to do what I want, but there's a sense of YAGNI
> about the discussion, because (for example) by the time I need to
> format a centred, max-18, min-5 character number with 3 decimal places
> and the sign hard to the left, I'm also going to want to dot-fill a
> string to 30 characters and insert commas into the number, and I'm
> writing code anyway, so why bother with an obscure format string that
> only does half the job? (It goes without saying that if the format
> string can do everything I want, that argument doesn't work, but then
> we get to the complexity issues that hit regular expressions :-))
> 
> Sorry if this sounds a bit skeptical, but there's a *lot* of
> discussion here over a feature I expect to use pretty infrequently.
> 99% of my % formats use nothing more than %s!
> 
> Paul.
> 
> 
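
For reference, the cases Paul lists can be sketched side by side. This is
only a minimal sketch: it assumes the format-spec syntax that str.format
eventually shipped with (which differs in places from the drafts discussed
in this thread), and the sample values are made up for illustration.

```python
import datetime

# Fixed width, 2 decimal places.
old_float = '%5.2f' % 3.14159
new_float = '{0:5.2f}'.format(3.14159)

# 8-digit hex: str.format rejects a precision on integers, so
# zero-padding to width 8 stands in for the precision in %.8X.
old_hex = '%.8X' % 255
new_hex = '{0:08X}'.format(255)

# Simple date format: the text after the colon is handed to the date object.
day = datetime.date(2007, 8, 16)
old_date = day.strftime('%Y-%m-%d')
new_date = '{0:%Y-%m-%d}'.format(day)

# Two things %-formatting has no direct spelling for.
centred = '{0:^9}'.format('mid')      # centre in a 9-character field
grouped = '{0:,}'.format(1234567)     # thousands separators
```

The trivial cases stay trivial; the hex case is the one where a mechanical
translation of the old spec does not carry over unchanged.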

From walter at livinglogic.de  Thu Aug 16 21:47:19 2007
From: walter at livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=)
Date: Thu, 16 Aug 2007 21:47:19 +0200
Subject: [Python-3000] UTF-32 codecs
In-Reply-To: <ca471dc20708161133hd890bd4m33abd2ba68d914da@mail.gmail.com>
References: <46C48518.3070701@livinglogic.de> <46C48C71.3000400@v.loewis.de>
	<ca471dc20708161133hd890bd4m33abd2ba68d914da@mail.gmail.com>
Message-ID: <46C4A9C7.9060408@livinglogic.de>

Guido van Rossum wrote:

> On 8/16/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> Walter Dörwald wrote:
>>> I have a patch against the py3k branch (http://bugs.python.org/1775604)
>>> that adds UTF-32 codecs. On a narrow build it combines surrogate pairs
>>> in the unicode object into one codepoint on encoding and creates
>>> surrogate pairs for codepoints outside the BMP on decoding.
>>>
>>> Should I apply this to the py3k branch only, or do we want that for
>>> Python 2.6 too (using str instead of bytes)?
>> If it's no effort, I would like to see this on the trunk also.
>>
>> In general, I'm skeptical about the "new features only in 3k" strategy.
>> Some features can be added easily with no backwards-compatibility issues
>> in 2.x, and would have normally been added to the next major 2.x release
>> without much discussion.
> 
> Agreed, especially since we're planning on backporting much to 2.6. I
> want to draw the line at *dropping* stuff from 2.6 though (or
> replacing it, or changing it). 2.6 needs to be *very* compatible with
> 2.5, in order to lure most users into upgrading to 2.6, which is a
> prerequisite for porting to 3.0 eventually.

OK, then I'll check it into the py3k branch, and backport it to the trunk.

Servus,
    Walter
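
To make the surrogate handling concrete, here is a small sketch of the
arithmetic involved. It is written against a current Python 3, where the
utf-32 codecs are available, and is not taken from the patch itself; the
helper names split_surrogates and join_surrogates are made up for
illustration.

```python
def split_surrogates(codepoint):
    """Split a codepoint outside the BMP into a (high, low) surrogate pair."""
    offset = codepoint - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def join_surrogates(high, low):
    """Combine a (high, low) surrogate pair back into one codepoint."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = split_surrogates(0x1F600)

# UTF-32 stores each codepoint as one 4-byte unit, whatever its size.
big_endian = '\U0001F600'.encode('utf-32-be')
little_endian = '\U0001F600'.encode('utf-32-le')
```

On a narrow build the encoder would apply join_surrogates-style logic to a
pair found in the unicode object, and the decoder the reverse.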


From skip at pobox.com  Thu Aug 16 21:52:21 2007
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 16 Aug 2007 14:52:21 -0500
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <d11dcfba0708161147x24f0a432o75a811371210961d@mail.gmail.com>
References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au>
	<18116.17176.123168.265491@montanaro.dyndns.org>
	<fa256g$orv$1@sea.gmane.org>
	<d11dcfba0708161147x24f0a432o75a811371210961d@mail.gmail.com>
Message-ID: <18116.43765.194435.952513@montanaro.dyndns.org>


    STeVe> I honestly can't see the point of keeping this::

    >>>> '%-10s bought %02i apples for %3.2f' % ('John', 8, 3.78)
    STeVe>     'John       bought 08 apples for 3.78'

    STeVe> alongside this::

    >>>> '{0:-10} bought {1:02i} apples for {2:3.2f}'.format('John', 8, 3.78)
    STeVe>     'John       bought 08 apples for 3.78'

    STeVe> They're so similar I don't see why you think the latter is no
    STeVe> longer "easy and powerful".

You mean other than:

   * the new is more verbose than the old
   * the curly braces and [012]: prefixes are just syntactic sugar when
     converting old to new
   * in situations where the format string isn't a literal that mechanical
     translation from old to new won't be possible
   * lots of people are familiar with the old format, few with the new

?

I suppose nothing.

Skip
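
A runnable version of the comparison quoted above, as a sketch only. It
assumes the spec syntax that str.format ended up with, where left
alignment is spelled "<" and the integer code is "d"; the {0:-10} and
{1:02i} spellings in the quoted example come from the draft and are
rejected by current Python.

```python
args = ('John', 8, 3.78)

old = '%-10s bought %02i apples for %3.2f' % args
new = '{0:<10} bought {1:02d} apples for {2:3.2f}'.format(*args)

# The draft spelling of the first field is not accepted for strings.
try:
    '{0:-10}'.format('John')
    draft_accepted = True
except ValueError:
    draft_accepted = False
```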



From martin at v.loewis.de  Thu Aug 16 21:56:50 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 16 Aug 2007 21:56:50 +0200
Subject: [Python-3000] UTF-32 codecs
In-Reply-To: <46C4A9C7.9060408@livinglogic.de>
References: <46C48518.3070701@livinglogic.de> <46C48C71.3000400@v.loewis.de>
	<ca471dc20708161133hd890bd4m33abd2ba68d914da@mail.gmail.com>
	<46C4A9C7.9060408@livinglogic.de>
Message-ID: <46C4AC02.3020203@v.loewis.de>

> OK, then I'll check it into the py3k branch, and backport it to the trunk.

This raises another procedural question: are we still merging from the
trunk to the 3k branch, or are they now officially split?

If we still merge, and assuming that the implementations are
sufficiently similar and live in the same files, it would be better
to commit into the trunk, then merge (or wait for somebody else to
merge), then apply any modifications that the 3k branch needs.

Regards,
Martin

From skip at pobox.com  Thu Aug 16 21:57:33 2007
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 16 Aug 2007 14:57:33 -0500
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <bbaeab100708161205s3a37a051l451bf9090bf218b7@mail.gmail.com>
References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au>
	<18116.17176.123168.265491@montanaro.dyndns.org>
	<fa256g$orv$1@sea.gmane.org>
	<bbaeab100708161205s3a37a051l451bf9090bf218b7@mail.gmail.com>
Message-ID: <18116.44077.156543.671421@montanaro.dyndns.org>


    Brett> But how is::

    Brett>   "{0} is happy to see {1}".format('Brett', 'Christian')

    Brett> that less easier to read than::

    Brett>   "%s is happy to see %s" % ('Brett', 'Christian')

    Brett> ?  Yes, PEP 3101 style is more to type but it isn't grievous; we
    Brett> have just been spoiled by the overloading of the % operator.  And
    Brett> I don't know how newbies think these days, but I know I find the
    Brett> numeric markers much easier to follow than the '%s', especially
    Brett> if the string ends up becoming long.

If you decide to insert another format token in the middle the new is more
error-prone than the old:

    "{0} asks {1} if he is happy to see {1}".format('Brett', 'Skip', 'Christian')

                                        ^^^ whoops

vs:

    "%s asks %s if he is happy to see %s" % ('Brett', 'Skip', 'Christian')

Now extend that to format strings with more than a couple expansions.

    Brett> This is where the cranky python-dev'er comes in: PEP 3101 was
    Brett> published in April 2006 which is over a year ago!  This is not a
    Brett> new PEP or a new plan.

Yes, but Python 3 is more real today than 15 months ago, hence the greater
focus now than before.

Skip

From walter at livinglogic.de  Thu Aug 16 22:03:07 2007
From: walter at livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=)
Date: Thu, 16 Aug 2007 22:03:07 +0200
Subject: [Python-3000] UTF-32 codecs
In-Reply-To: <46C4AC02.3020203@v.loewis.de>
References: <46C48518.3070701@livinglogic.de> <46C48C71.3000400@v.loewis.de>
	<ca471dc20708161133hd890bd4m33abd2ba68d914da@mail.gmail.com>
	<46C4A9C7.9060408@livinglogic.de> <46C4AC02.3020203@v.loewis.de>
Message-ID: <46C4AD7B.5040808@livinglogic.de>

Martin v. Löwis wrote:

>> OK, then I'll check it into the py3k branch, and backport it to the trunk.
> 
> This raises another procedural question: are we still merging from the
> trunk to the 3k branch, or are they now officially split?
> 
> If we still merge, and assuming that the implementations are
> sufficiently similar

See below.

> and live in the same files,

Mostly they do, but there are three new files in Lib/encodings: 
utf_32.py, utf_32_le.py and utf_32_be.py

> it would be better
> to commit into the trunk, then merge (or wait for somebody else to
> merge), then apply any modifications that the 3k branch needs.

A simple merge won't work, because in 3.0 the codec uses bytes and in 
2.6 it uses str. Also the call to the decoding error handler has 
changed, because in 3.0 the error handler could modify the mutable input 
buffer.

Servus,
    Walter

From martin at v.loewis.de  Thu Aug 16 22:11:19 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 16 Aug 2007 22:11:19 +0200
Subject: [Python-3000] UTF-32 codecs
In-Reply-To: <46C4AD7B.5040808@livinglogic.de>
References: <46C48518.3070701@livinglogic.de> <46C48C71.3000400@v.loewis.de>
	<ca471dc20708161133hd890bd4m33abd2ba68d914da@mail.gmail.com>
	<46C4A9C7.9060408@livinglogic.de> <46C4AC02.3020203@v.loewis.de>
	<46C4AD7B.5040808@livinglogic.de>
Message-ID: <46C4AF67.1090209@v.loewis.de>

> A simple merge won't work, because in 3.0 the codec uses bytes and in
> 2.6 it uses str. Also the call to the decoding error handler has
> changed, because in 3.0 the error handler could modify the mutable input
> buffer.

So what's the strategy then? Block the trunk revision from merging?

Regards,
Martin

From lists at cheimes.de  Thu Aug 16 22:15:05 2007
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 16 Aug 2007 22:15:05 +0200
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <bbaeab100708161205s3a37a051l451bf9090bf218b7@mail.gmail.com>
References: <46C2809C.3000806@acm.org>
	<46C3C1DE.6070302@cs.rmit.edu.au>	<18116.17176.123168.265491@montanaro.dyndns.org>	<fa256g$orv$1@sea.gmane.org>
	<bbaeab100708161205s3a37a051l451bf9090bf218b7@mail.gmail.com>
Message-ID: <fa2b8g$ehm$1@sea.gmane.org>

Brett Cannon wrote:
> [Tonight, the role of old, cranky python-dev'er will be played by
> Brett Cannon.  Don't take this personally, Christian, your email just
> happened to be last.  =)]

hehe :)
I don't feel offended.

> But how is::
> 
>   "{0} is happy to see {1}".format('Brett', 'Christian')
> 
> that less easier to read than::
> 
>   "%s is happy to see %s" % ('Brett', 'Christian')
> 
> ?  Yes, PEP 3101 style is more to type but it isn't grievous; we have
> just been spoiled by the overloading of the % operator.  And I don't
> know how newbies think these days, but I know I find the numeric
> markers much easier to follow than the '%s', especially if the string
> ends up becoming long.
> 
> And if it is the use of a method instead of an operator that the
> newbies might have issues with, well methods and functions come up
> quick so they probably won't go long without knowing what is going on.

My concerns are partly based on my laziness and my antipathy against '{'
and '}'. On my German keyboard I have to move my whole hand to another
position to enter a { or }. It's right ALT (Alt Gr) + 7 and 0. The %
character is much easier to type for me. :]

> This is where the cranky python-dev'er comes in: PEP 3101 was
> published in April 2006 which is over a year ago!  This is not a new
> PEP or a new plan.  I personally stayed out of the discussions on this
> subject as I knew reasonable people were keeping an eye on it and I
> didn't feel I had anything to contribute.  That means I just go with
> what they decide whether I like it or not.

I read the PEP about a year ago. I was always under the impression
that the PEP was going to *add* an alternative and more powerful format
to Python. I didn't notice that the PEP was about a *replacement* for
the % format operator. My fault ;)

> I understand the feeling of catching up on a thread and going, "oh no,
> I don't like that!", but that is the nature of the beast.  In my view,
> if you just don't have the time or energy (which I completely
> understand not having, don't get me wrong) for a thread, you basically
> have to defer to the people who do and trust that the proper things
> were discussed and that the group as a whole (or Guido in those cases
> where his gut tells him to ignore everyone) is going to make a sound
> decision.
> 
> At this point the discussion has gone long enough with Guido
> participating and agreeing with key decisions, that the only way to
> get this course of action changed is to come up with really good
> examples of how the % format is hugely better than PEP 3101 and
> convince the people involved.  But just saying you like %s over {0} is
> like saying you don't like the decorator syntax: that's nice and all,
> but that is not a compelling reason to change the decision being made.

You are right. I'm guilty as charged to be a participant  of a red bike
shed discussion. :) I'm seeing myself as a small Python user and
developer who is trying to get in touch with the gods in the temple of
python core development (exaggerated *G*). I've been using Python for
about 5 years and I'm trying to give something back to the community. In
the past months I've submitted patches for small bugs (low hanging
fruits) and I've raised my voice to show my personal - sometimes
inadequate - opinion.

By the way it's great that the core developers are taking their time to
discuss this matter with a newbie. Although it is sometimes
disappointing to see that my ideas don't make it into the core I don't
feel denied. It gives me the feeling that my work is appreciated but not
(yet) good enough to meet the quality standards.

I'll stick around and see how I can be of service in the future.

Christian


From barry at python.org  Thu Aug 16 22:16:49 2007
From: barry at python.org (Barry Warsaw)
Date: Thu, 16 Aug 2007 16:16:49 -0400
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <18116.43765.194435.952513@montanaro.dyndns.org>
References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au>
	<18116.17176.123168.265491@montanaro.dyndns.org>
	<fa256g$orv$1@sea.gmane.org>
	<d11dcfba0708161147x24f0a432o75a811371210961d@mail.gmail.com>
	<18116.43765.194435.952513@montanaro.dyndns.org>
Message-ID: <18861C49-7121-44B2-B17A-30992DF25E0D@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 16, 2007, at 3:52 PM, skip at pobox.com wrote:

>     STeVe> I honestly can't see the point of keeping this::
>
>     >>>> '%-10s bought %02i apples for %3.2f' % ('John', 8, 3.78)
>     STeVe>     'John       bought 08 apples for 3.78'
>
>     STeVe> alongside this::
>
>     >>>> '{0:-10} bought {1:02i} apples for {2:3.2f}'.format('John', 8, 3.78)
>     STeVe>     'John       bought 08 apples for 3.78'
>
>     STeVe> They're so similar I don't see why you think the latter is no
>     STeVe> longer "easy and powerful".
>
> You mean other than:
>
>    * the new is more verbose than the old
>    * the curly braces and [012]: prefixes are just syntactic sugar when
>      converting old to new
>    * in situations where the format string isn't a literal that mechanical
>      translation from old to new won't be possible
>    * lots of people are familiar with the old format, few with the new

There's one other problem that I see, though it might be minor or  
infrequent enough not to matter.  %s positional placeholders are  
easier to generate programmatically than {#} placeholders.   Think  
about translating this:

def make_query(flag1, flag2):
     base_query = 'SELECT %s from %s WHERE name = %s '
     if flag1:
         base_query += 'AND age = %s '
     if flag2:
         base_query += 'AND height = %s '
     base_query += 'AND gender = %s'
     return base_query

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRsSwsXEjvBPtnXfVAQLyPgP+P/bnNZhZUGbAcM6lDJdACVYmEFh3bGDR
NJH874DLXp7fsn5iLJ3Fel7eiWqLZ0/lvGYEmAAz/4SYagKFnrYAFTsDKglFroiL
bHzZBKHHuf/Db1oNJBcuQakpbhddX0WMu+XxcKXbgUK87tJE4kbaZPTjU8WF5XDW
EriR/UZBZ40=
=qHQs
-----END PGP SIGNATURE-----
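
One way the function above might look with {}-style placeholders, as a
sketch only. It relies on the auto-numbered empty {} fields that
str.format accepts since Python 3.1, which are not part of the proposal as
discussed in this thread, and number_placeholders is a hypothetical helper
for the case where explicit indexes are wanted.

```python
import itertools
import re


def make_query(flag1, flag2):
    # Each clause carries its own empty placeholders, so clauses can be
    # added or dropped without renumbering anything.
    clauses = ['SELECT {} FROM {} WHERE name = {}']
    if flag1:
        clauses.append('AND age = {}')
    if flag2:
        clauses.append('AND height = {}')
    clauses.append('AND gender = {}')
    return ' '.join(clauses)


def number_placeholders(template):
    """Rewrite each empty {} as {0}, {1}, ... in order of appearance."""
    counter = itertools.count()
    return re.sub(r'\{\}', lambda match: '{%d}' % next(counter), template)
```

With explicit indexes only, the generating code would have to thread a
counter through every branch, which is the inconvenience described above.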

From brett at python.org  Thu Aug 16 22:19:49 2007
From: brett at python.org (Brett Cannon)
Date: Thu, 16 Aug 2007 13:19:49 -0700
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <18116.44077.156543.671421@montanaro.dyndns.org>
References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au>
	<18116.17176.123168.265491@montanaro.dyndns.org>
	<fa256g$orv$1@sea.gmane.org>
	<bbaeab100708161205s3a37a051l451bf9090bf218b7@mail.gmail.com>
	<18116.44077.156543.671421@montanaro.dyndns.org>
Message-ID: <bbaeab100708161319v77af7b82x69f9a4e2afbe04d5@mail.gmail.com>

On 8/16/07, skip at pobox.com <skip at pobox.com> wrote:
>
>     Brett> But how is::
>
>     Brett>   "{0} is happy to see {1}".format('Brett', 'Christian')
>
>     Brett> that less easier to read than::
>
>     Brett>   "%s is happy to see %s" % ('Brett', 'Christian')
>
>     Brett> ?  Yes, PEP 3101 style is more to type but it isn't grievous; we
>     Brett> have just been spoiled by the overloading of the % operator.  And
>     Brett> I don't know how newbies think these days, but I know I find the
>     Brett> numeric markers much easier to follow than the '%s', especially
>     Brett> if the string ends up becoming long.
>
> If you decide to insert another format token in the middle the new is more
> error-prone than the old:
>
>     "{0} asks {1} if he is happy to see {1}".format('Brett', 'Skip', 'Christian')
>
>                                         ^^^ whoops
>
> vs:
>
>     "%s asks %s if he is happy to see %s" % ('Brett', 'Skip', 'Christian')
>
> Now extend that to format strings with more than a couple expansions.
>

Sure, but I find the %s form harder to read honestly.  Plus you didn't
need to insert in that order; you could have put it as::

  "{0} asks {2} if he is happy to see {1}".format("Brett", "Christian", "Skip")

Not perfect, but it works.  Or you just name your arguments.  We are
talking simple things here, and as soon as you start to scale you will
most likely move to name-based arguments anyway once your quick hack
format scheme doesn't hold.

>     Brett> This is where the cranky python-dev'er comes in: PEP 3101 was
>     Brett> published in April 2006 which is over a year ago!  This is not a
>     Brett> new PEP or a new plan.
>
> Yes, but Python 3 is more real today than 15 months ago, hence the greater
> focus now than before.

Well, for me the "realness" of Py3K is the same now as it was back
when Guido created the p3yk branch.

-Brett

From steven.bethard at gmail.com  Thu Aug 16 22:29:31 2007
From: steven.bethard at gmail.com (Steven Bethard)
Date: Thu, 16 Aug 2007 14:29:31 -0600
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <18116.43765.194435.952513@montanaro.dyndns.org>
References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au>
	<18116.17176.123168.265491@montanaro.dyndns.org>
	<fa256g$orv$1@sea.gmane.org>
	<d11dcfba0708161147x24f0a432o75a811371210961d@mail.gmail.com>
	<18116.43765.194435.952513@montanaro.dyndns.org>
Message-ID: <d11dcfba0708161329ia79a264ued8ea7165cbb4072@mail.gmail.com>

On 8/16/07, skip at pobox.com <skip at pobox.com> wrote:
>
>     STeVe> I honestly can't see the point of keeping this::
>
>     >>>> '%-10s bought %02i apples for %3.2f' % ('John', 8, 3.78)
>     STeVe>     'John       bought 08 apples for 3.78'
>
>     STeVe> alongside this::
>
>     >>>> '{0:-10} bought {1:02i} apples for {2:3.2f}'.format('John', 8, 3.78)
>     STeVe>     'John       bought 08 apples for 3.78'
>
>     STeVe> They're so similar I don't see why you think the latter is no
>     STeVe> longer "easy and powerful".
>
> You mean other than:
>
>    * the new is more verbose than the old
>    * the curly braces and [012]: prefixes are just syntactic sugar when
>      converting old to new
>    * in situations where the format string isn't a literal that mechanical
>      translation from old to new won't be possible
>    * lots of people are familiar with the old format, few with the new
>
> ?

As I understand it, it's already been decided that {}-style formatting
will be present in Python 3.  So the question is not about the merits
of {}-style formatting vs. %-style formatting.  That debate's already
been had.  The question is whether it makes sense to keep %-style
formatting around when {}-style formatting is so similar.

Since %-style formatting saves at most a couple of characters per
specifier, that doesn't seem to justify the massive duplication to me.

STeVe
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From eric+python-dev at trueblade.com  Thu Aug 16 22:34:19 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Thu, 16 Aug 2007 16:34:19 -0400
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <18116.44077.156543.671421@montanaro.dyndns.org>
References: <46C2809C.3000806@acm.org>
	<46C3C1DE.6070302@cs.rmit.edu.au>	<18116.17176.123168.265491@montanaro.dyndns.org>	<fa256g$orv$1@sea.gmane.org>	<bbaeab100708161205s3a37a051l451bf9090bf218b7@mail.gmail.com>
	<18116.44077.156543.671421@montanaro.dyndns.org>
Message-ID: <46C4B4CB.7020001@trueblade.com>

skip at pobox.com wrote:
 >     Brett> But how is::
 >
 >     Brett>   "{0} is happy to see {1}".format('Brett', 'Christian')
 >
 >     Brett> that less easier to read than::
 >
 >     Brett>   "%s is happy to see %s" % ('Brett', 'Christian')
 >
 >     Brett> ?  Yes, PEP 3101 style is more to type but it isn't grievous; we
 >     Brett> have just been spoiled by the overloading of the % operator.  And
 >     Brett> I don't know how newbies think these days, but I know I find the
 >     Brett> numeric markers much easier to follow than the '%s', especially
 >     Brett> if the string ends up becoming long.
 >
 > If you decide to insert another format token in the middle the new is more
 > error-prone than the old:
 >
 >     "{0} asks {1} if he is happy to see {1}".format('Brett', 'Skip', 'Christian')
 >
 >                                         ^^^ whoops
 >
 > vs:
 >
 >     "%s asks %s if he is happy to see %s" % ('Brett', 'Skip', 'Christian')

The whole point of the indexes is that the order now doesn't matter:

"{0} asks {2} if he is happy to see {1}".format('Brett', 'Christian',
                                                 'Skip')

If you really have many items to expand, name them, and then it matters 
even less:

"{asker} asks {askee} if he is happy to see {person}".format(
    asker='Brett', person='Christian', askee='Skip')

Which I think is way better than the %-formatting equivalent.
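
As a runnable sketch of the spellings above (the names are the ones from
the example; the mapping-based %-format is added only for comparison):

```python
by_index = "{0} asks {2} if he is happy to see {1}".format(
    'Brett', 'Christian', 'Skip')

by_name = "{asker} asks {askee} if he is happy to see {person}".format(
    asker='Brett', person='Christian', askee='Skip')

# The closest %-formatting equivalent takes a mapping instead of a tuple.
by_mapping = "%(asker)s asks %(askee)s if he is happy to see %(person)s" % {
    'asker': 'Brett', 'person': 'Christian', 'askee': 'Skip'}
```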


From fdrake at acm.org  Thu Aug 16 22:47:31 2007
From: fdrake at acm.org (Fred Drake)
Date: Thu, 16 Aug 2007 16:47:31 -0400
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <bbaeab100708161319v77af7b82x69f9a4e2afbe04d5@mail.gmail.com>
References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au>
	<18116.17176.123168.265491@montanaro.dyndns.org>
	<fa256g$orv$1@sea.gmane.org>
	<bbaeab100708161205s3a37a051l451bf9090bf218b7@mail.gmail.com>
	<18116.44077.156543.671421@montanaro.dyndns.org>
	<bbaeab100708161319v77af7b82x69f9a4e2afbe04d5@mail.gmail.com>
Message-ID: <D6F75A15-7B43-485B-8A6B-68C473D8D88A@acm.org>

On Aug 16, 2007, at 4:19 PM, Brett Cannon wrote:
> Well, for me the "realness" of Py3K is the same now as it was back
> when Guido created the p3yk branch.

Somehow, I suspect the reality of Python 3.0 for any individual is  
strongly tied to the amount of time they've had to read the emails,  
PEPs, blogs(!) and what-not related to it.  From where I stand, I'm  
not yet certain that 2.5 is real.  ;-/


   -Fred

-- 
Fred Drake   <fdrake at acm.org>




From guido at python.org  Thu Aug 16 23:15:27 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 16 Aug 2007 14:15:27 -0700
Subject: [Python-3000] UTF-32 codecs
In-Reply-To: <46C4AC02.3020203@v.loewis.de>
References: <46C48518.3070701@livinglogic.de> <46C48C71.3000400@v.loewis.de>
	<ca471dc20708161133hd890bd4m33abd2ba68d914da@mail.gmail.com>
	<46C4A9C7.9060408@livinglogic.de> <46C4AC02.3020203@v.loewis.de>
Message-ID: <ca471dc20708161415k632226f6r138e96d873b86af2@mail.gmail.com>

On 8/16/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > OK, then I'll check it into the py3k branch, and backport it to the trunk.
>
> This raises another procedural question: are we still merging from the
> trunk to the 3k branch, or are they now officially split?

I plan to set up merging again; I still think it's useful.

> If we still merge, and assuming that the implementations are
> sufficiently similar and live in the same files, it would be better
> to commit into the trunk, then merge (or wait for somebody else to
> merge), then apply any modifications that the 3k branch needs.

Yes. But no biggie for new code if it's done the other way around.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From walter at livinglogic.de  Thu Aug 16 23:32:17 2007
From: walter at livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=)
Date: Thu, 16 Aug 2007 23:32:17 +0200
Subject: [Python-3000] UTF-32 codecs
In-Reply-To: <46C4AF67.1090209@v.loewis.de>
References: <46C48518.3070701@livinglogic.de> <46C48C71.3000400@v.loewis.de>
	<ca471dc20708161133hd890bd4m33abd2ba68d914da@mail.gmail.com>
	<46C4A9C7.9060408@livinglogic.de> <46C4AC02.3020203@v.loewis.de>
	<46C4AD7B.5040808@livinglogic.de> <46C4AF67.1090209@v.loewis.de>
Message-ID: <46C4C261.2080304@livinglogic.de>

Martin v. Löwis wrote:

>> A simple merge won't work, because in 3.0 the codec uses bytes and in
>> 2.6 it uses str. Also the call to the decoding error handler has
>> changed, because in 3.0 the error handler could modify the mutable input
>> buffer.
> 
> So what's the strategy then? Block the trunk revision from merging?

I've never used svnmerge, so I don't know what the strategy for 
automatic merging would be. What I would do is check in the patch for 
the py3k branch, then apply the patch to the trunk, get it to work and 
check it in.

Servus,
    Walter


From guido at python.org  Thu Aug 16 23:43:49 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 16 Aug 2007 14:43:49 -0700
Subject: [Python-3000] UTF-32 codecs
In-Reply-To: <46C4C261.2080304@livinglogic.de>
References: <46C48518.3070701@livinglogic.de> <46C48C71.3000400@v.loewis.de>
	<ca471dc20708161133hd890bd4m33abd2ba68d914da@mail.gmail.com>
	<46C4A9C7.9060408@livinglogic.de> <46C4AC02.3020203@v.loewis.de>
	<46C4AD7B.5040808@livinglogic.de> <46C4AF67.1090209@v.loewis.de>
	<46C4C261.2080304@livinglogic.de>
Message-ID: <ca471dc20708161443k692d56e4haa6ff709969a27a@mail.gmail.com>

On 8/16/07, Walter Dörwald <walter at livinglogic.de> wrote:
> Martin v. Löwis wrote:
>
> >> A simple merge won't work, because in 3.0 the codec uses bytes and in
> >> 2.6 it uses str. Also the call to the decoding error handler has
> >> changed, because in 3.0 the error handler could modify the mutable input
> >> buffer.
> >
> > So what's the strategy then? Block the trunk revision from merging?
>
> I've never used svnmerge, so I don't know what the strategy for
> automatic merging would be. What I would do is check in the patch for
> the py3k branch, then apply the patch to the trunk, get it to work and
> check it in.

Go right ahead. I'll clean up afterwards.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jimjjewett at gmail.com  Thu Aug 16 23:50:50 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Thu, 16 Aug 2007 17:50:50 -0400
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <bbaeab100708161205s3a37a051l451bf9090bf218b7@mail.gmail.com>
References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au>
	<18116.17176.123168.265491@montanaro.dyndns.org>
	<fa256g$orv$1@sea.gmane.org>
	<bbaeab100708161205s3a37a051l451bf9090bf218b7@mail.gmail.com>
Message-ID: <fb6fbf560708161450j7d6e3cqf91e3ca985c5267f@mail.gmail.com>

On 8/16/07, Brett Cannon <brett at python.org> wrote:

> But how is::

>   "{0} is happy to see {1}".format('Brett', 'Christian')

> that less easier to read than::

>   "%s is happy to see %s" % ('Brett', 'Christian')

Excluding minor aesthetics, they are equivalent.

The new format only has advantages when the formatting string is
complicated.  So the real question is:

Should we keep the old way for backwards compatibility?

Or should we force people to upgrade their code (and their translation
data files), even if their code doesn't benefit, and wouldn't need to
change otherwise?

Remember that most of the time, the old way worked fine, and it will
be the new way that seems redundant.  Remember also that 2to3 won't
get this change entirely right.  Remember that people can already
subclass string.Template if they really do need fancy logic.  Note
that this removal alone would go a huge way toward preventing code
that works in both 3.0 and 2.5 (or 2.2).


> But just saying you like %s over {0} is
> like saying you don't like the decorator syntax: that's nice and all,
> but that is not a compelling reason to change the decision being made.

It is more like saying you prefer the old style of rebinding the name.
 Adding the new format is one thing; removing the old is another.

-jJ

From martin at v.loewis.de  Thu Aug 16 23:52:07 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 16 Aug 2007 23:52:07 +0200
Subject: [Python-3000] UTF-32 codecs
In-Reply-To: <46C4C261.2080304@livinglogic.de>
References: <46C48518.3070701@livinglogic.de> <46C48C71.3000400@v.loewis.de>
	<ca471dc20708161133hd890bd4m33abd2ba68d914da@mail.gmail.com>
	<46C4A9C7.9060408@livinglogic.de> <46C4AC02.3020203@v.loewis.de>
	<46C4AD7B.5040808@livinglogic.de> <46C4AF67.1090209@v.loewis.de>
	<46C4C261.2080304@livinglogic.de>
Message-ID: <46C4C707.5010302@v.loewis.de>

> I've never used svnmerge, so I don't know what the strategy for
> automatic merging would be.

Reading about svnmerge tells me that you probably should use
"svnmerge merge -r <trunk-rev> -S trunk -M" on the 3k branch;
this should record the revision from the branch as already
(manually) merged.

Regards,
Martin

From brett at python.org  Fri Aug 17 00:11:42 2007
From: brett at python.org (Brett Cannon)
Date: Thu, 16 Aug 2007 15:11:42 -0700
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <fa2b8g$ehm$1@sea.gmane.org>
References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au>
	<18116.17176.123168.265491@montanaro.dyndns.org>
	<fa256g$orv$1@sea.gmane.org>
	<bbaeab100708161205s3a37a051l451bf9090bf218b7@mail.gmail.com>
	<fa2b8g$ehm$1@sea.gmane.org>
Message-ID: <bbaeab100708161511s77f0913ft85fe78561b14efa0@mail.gmail.com>

On 8/16/07, Christian Heimes <lists at cheimes.de> wrote:
[SNIP]

> You are right. I'm guilty as charged to be a participant  of a red bike
> shed discussion. :) I'm seeing myself as a small Python user and
> developer who is trying to get in touch with the gods in the temple of
> python core development (exaggerated *G*). I've been using Python for
> about 5 years and I'm trying to give something back to the community.

Which is great!

> In
> the past months I've submitted patches for small bugs (low hanging
> fruits) and I've raised my voice to show my personal - sometimes
> inadequate - opinion.
>

Your opinion can't be inadequate; perk of it being subjective.  =)
And I partially wrote the email as I did to explain to you and to other
people who have not been around for a long time how a
late-in-the-process objection like this tends to be taken.

> By the way it's great that the core developers are taking their time to
> discuss this matter with a newbie.

Reasonable discussions are fine, newbie or not.  Perk of the Python
community being friendly is most people are happy to answer questions.

> Although it is sometimes
> disappointing to see that my ideas don't make it into the core I don't
> feel denied.

That's good.  We all have ideas that have been rejected (including Guido  =).

> It gives me the feeling that my work is appreciated but not
> (yet) good enough to meet the quality standards.
>
> I'll stick around and see how I can be of service in the future.

Wonderful!

-Brett

From martin at v.loewis.de  Fri Aug 17 00:35:39 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 17 Aug 2007 00:35:39 +0200
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <18116.43765.194435.952513@montanaro.dyndns.org>
References: <46C2809C.3000806@acm.org>
	<46C3C1DE.6070302@cs.rmit.edu.au>	<18116.17176.123168.265491@montanaro.dyndns.org>	<fa256g$orv$1@sea.gmane.org>	<d11dcfba0708161147x24f0a432o75a811371210961d@mail.gmail.com>
	<18116.43765.194435.952513@montanaro.dyndns.org>
Message-ID: <46C4D13B.8070608@v.loewis.de>

>    * the new is more verbose than the old
>    * the curly braces and [012]: prefixes are just syntactic sugar when
>      converting old to new
>    * in situations where the format string isn't a literal that mechanical
>      translation from old to new won't be possible
>    * lots of people are familiar with the old format, few with the new

I think most of these points are irrelevant. The curly braces are not
just syntactic sugar, at least the opening brace is not; the digit
is not syntactic sugar in the case of message translations.

That lots of people are familiar with the old format and only few are
with the new is merely a matter of time. As Guido van Rossum says:
the number of Python programs yet to be written is hopefully larger
than the number of programs already written (or else continuing the
Python development is futile).

That the new format is more verbose than the old one is true, but only
slightly so - typing .format is actually easier for me than typing
% (which requires a shift key).

Porting programs that have computed format strings is indeed a
challenge. The theory here is that this affects only few programs.

Regards,
Martin

From martin at v.loewis.de  Fri Aug 17 00:45:32 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 17 Aug 2007 00:45:32 +0200
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <18861C49-7121-44B2-B17A-30992DF25E0D@python.org>
References: <46C2809C.3000806@acm.org>
	<46C3C1DE.6070302@cs.rmit.edu.au>	<18116.17176.123168.265491@montanaro.dyndns.org>	<fa256g$orv$1@sea.gmane.org>	<d11dcfba0708161147x24f0a432o75a811371210961d@mail.gmail.com>	<18116.43765.194435.952513@montanaro.dyndns.org>
	<18861C49-7121-44B2-B17A-30992DF25E0D@python.org>
Message-ID: <46C4D38C.8000402@v.loewis.de>

> There's one other problem that I see, though it might be minor or  
> infrequent enough not to matter.  %s positional placeholders are  
> easier to generate programmatically than {#} placeholders.   Think  
> about translating this:
> 
> def make_query(flag1, flag2):
>      base_query = 'SELECT %s from %s WHERE name = %s '
>      if flag1:
>          base_query += 'AND age = %s '
>      if flag2:
>          base_query += 'AND height = %s '
>      base_query += 'AND gender = %s'
>      return base_query

Of course, *this* specific example is flawed: you are likely to
pass the result to a DB-API library, which supports %s as a
placeholder independent of whether strings support the modulo
operator (it is then flawed also in that you don't typically
have placeholders for the result fields and table name - not sure
whether you even can in DB-API).

If I had to generate a computed format string, I'd probably
use the named placeholders, rather than the indexed ones.

   base_query = 'SELECT {field} FROM {table} WHERE name = {name} '
   if flag1:
      base_query += 'AND age = {age} '
   if flag2:
      base_query += 'AND height = {height} '
   base_query += 'AND gender = {gender}'
   return base_query
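
A minimal sketch of how such a computed format string with named
placeholders might be filled in, assuming the values arrive as a dict.
The name make_query comes from the quoted example; the values dict and
its contents are made up for illustration. As noted above, real SQL
values would normally go through the DB-API's own parameter passing
rather than string formatting.

```python
def make_query(flag1, flag2):
    # Build the template piece by piece; each placeholder carries its own name.
    base_query = 'SELECT {field} FROM {table} WHERE name = {name} '
    if flag1:
        base_query += 'AND age = {age} '
    if flag2:
        base_query += 'AND height = {height} '
    base_query += 'AND gender = {gender}'
    return base_query

# Hypothetical values; keys the template does not use (here: height)
# are simply ignored by str.format.
values = dict(field='id', table='people', name="'Fred'",
              age=42, height=180, gender="'m'")
query = make_query(True, False).format(**values)
print(query)
```

Because every placeholder is looked up by name, the caller does not need
to know which optional clauses ended up in the template.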

Regards,
Martin

From janssen at parc.com  Fri Aug 17 01:09:32 2007
From: janssen at parc.com (Bill Janssen)
Date: Thu, 16 Aug 2007 16:09:32 PDT
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <bbaeab100708161205s3a37a051l451bf9090bf218b7@mail.gmail.com> 
References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au>
	<18116.17176.123168.265491@montanaro.dyndns.org>
	<fa256g$orv$1@sea.gmane.org>
	<bbaeab100708161205s3a37a051l451bf9090bf218b7@mail.gmail.com>
Message-ID: <07Aug16.160935pdt."57996"@synergy1.parc.xerox.com>

> But just saying you like %s over {0} is
> like saying you don't like the decorator syntax: that's nice and all,
> but that is not a compelling reason to change the decision being made.

My guess is that the folks who object to it are, like me, folks who
primarily work in Python and C, and don't want to try to keep two
really different sets of formatting codes in their heads.  Folks who
work primarily in Python and .NET (or wherever these new-fangled
codes come from) probably feel the opposite.  But it's a mistake to
say that it's just "taste"; for this one there's a real cognitive
load that affects ease of programming.

Bill

From janssen at parc.com  Fri Aug 17 01:16:10 2007
From: janssen at parc.com (Bill Janssen)
Date: Thu, 16 Aug 2007 16:16:10 PDT
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <46C4D13B.8070608@v.loewis.de> 
References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au>
	<18116.17176.123168.265491@montanaro.dyndns.org>
	<fa256g$orv$1@sea.gmane.org>
	<d11dcfba0708161147x24f0a432o75a811371210961d@mail.gmail.com>
	<18116.43765.194435.952513@montanaro.dyndns.org>
	<46C4D13B.8070608@v.loewis.de>
Message-ID: <07Aug16.161620pdt."57996"@synergy1.parc.xerox.com>

> I think most of these points are irrelevant. The curly braces are not
> just syntactic sugar, at least the opening brace is not; the digit
> is not syntactic sugar in the case of message translations.

Are there "computation of matching braces" problems here?

> That lots of people are familiar with the old format and only few are
> with the new is merely a matter of time.

Sure, but the problem is that there are a lot of Python programmers
*now* and learning the new syntax imposes a burden on all of *them*.
Who cares how many people know the syntax in the future?

> That the new format is more verbose than the old one is true, but only
> slightly so - typing .format is actually easier for me than typing
> % (which requires a shift key).

I don't mind the switch to ".format"; it's the formatting codes that I
don't want to see changed.

> Porting programs that have computed format strings is indeed a
> challenge. The theory here is that this affects only few programs.

I think you'll find it's more than a few.  This issue is obviously an
iceberg issue; most folks never thought you were going to remove the
old formatting codes, just add a newer and more capable set.

Bill


From alexandre at peadrop.com  Fri Aug 17 01:43:10 2007
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Thu, 16 Aug 2007 19:43:10 -0400
Subject: [Python-3000] [Python-Dev]  Documentation switch imminent
In-Reply-To: <ee2a432c0708152207j66c26fbdu6e9bcec6dae0ccf5@mail.gmail.com>
References: <f9t2nn$ksg$1@sea.gmane.org> <f9v2rd$2dl$1@sea.gmane.org>
	<ee2a432c0708152207j66c26fbdu6e9bcec6dae0ccf5@mail.gmail.com>
Message-ID: <acd65fa20708161643s6d7c0c42x135eb0957980aa14@mail.gmail.com>

On 8/16/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
> On 8/15/07, Georg Brandl <g.brandl at gmx.net> wrote:
> > Okay, I made the switch.  I tagged the state of both Python branches
> > before the switch as tags/py{26,3k}-before-rstdocs/.
>
> http://docs.python.org/dev/
> http://docs.python.org/dev/3.0/
>

Is it just me, or is the markup of the new docs quite heavy?

alex% wget -q -O- http://docs.python.org/api/genindex.html | wc -c
77868
alex% wget -q -O- http://docs.python.org/dev/3.0/genindex.html | wc -c
918359

Firefox, on my fairly recent machine, takes ~5 seconds rendering the
index of the new docs from disk, compared to a fraction of a second
for the old one.

-- Alexandre

From greg.ewing at canterbury.ac.nz  Fri Aug 17 03:05:32 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 17 Aug 2007 13:05:32 +1200
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <ca471dc20708160753t30343b70ve8cf2e3a468b3a0d@mail.gmail.com>
References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au>
	<18116.17176.123168.265491@montanaro.dyndns.org>
	<fa1hkr$gfj$1@sea.gmane.org>
	<ca471dc20708160753t30343b70ve8cf2e3a468b3a0d@mail.gmail.com>
Message-ID: <46C4F45C.1040904@canterbury.ac.nz>

Guido van Rossum wrote:
> The only added wrinkle is that you can also write
> {0!r} to *force* using repr() on the value.

What if you want a field width with that? Will you be
able to write {0!10r} or will it have to be {0!r:10}?

--
Greg

From greg.ewing at canterbury.ac.nz  Fri Aug 17 03:16:20 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 17 Aug 2007 13:16:20 +1200
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <fb6fbf560708160817v4b8b23a0pd6974961d01da5c7@mail.gmail.com>
References: <633738EA-3F45-4933-BF81-0410BACBAF1E@hawaii.edu>
	<fb6fbf560708160817v4b8b23a0pd6974961d01da5c7@mail.gmail.com>
Message-ID: <46C4F6E4.9050602@canterbury.ac.nz>

Jim Jewett wrote:
> You can't write int(s) if you're passing a mapping (or tuple) from
> someone else; at best you can copy the mapping and modify certain
> values.

Maybe this could be handled using a wrapper object that
takes a sequence or mapping and a collection of functions
to be applied to specified items.

    "i = {0}, x = {1}".format(convert(stuff, int, float))

or using names

    "i = {i}, x = {x}".format(convert(stuff, i = int, x = float))

This would have the advantage of allowing arbitrarily
complex conversions while keeping the potentially verbose
specifications of those conversions out of the format
string. Plus the convert() wrapper could be useful in its
own right for other things besides formatting.
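
A rough sketch of what such a wrapper could look like. convert() is
hypothetical (it is only proposed in this message, not an existing
function), and the sketch assumes the result is unpacked with * or **
when passed to format(), which is how the released str.format takes its
arguments.

```python
def convert(values, *funcs, **named_funcs):
    """Hypothetical helper: apply conversion functions to selected items."""
    if isinstance(values, dict):
        # Mapping: convert only the keys that have a function; copy the rest.
        return {key: named_funcs.get(key, lambda v: v)(val)
                for key, val in values.items()}
    # Sequence: pair each item with its positional function, if there is one.
    return tuple(funcs[i](item) if i < len(funcs) else item
                 for i, item in enumerate(values))

positional = "i = {0}, x = {1}".format(*convert(("3", "2.5"), int, float))
named = "i = {i}, x = {x}".format(**convert({"i": "3", "x": "2.5"},
                                            i=int, x=float))
print(positional)
print(named)
```

The original sequence or mapping is left untouched; the wrapper returns a
converted copy, which is the "copy and modify certain values" step done
in one place.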

--
Greg

From greg.ewing at canterbury.ac.nz  Fri Aug 17 03:36:51 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 17 Aug 2007 13:36:51 +1200
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <18116.44077.156543.671421@montanaro.dyndns.org>
References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au>
	<18116.17176.123168.265491@montanaro.dyndns.org>
	<fa256g$orv$1@sea.gmane.org>
	<bbaeab100708161205s3a37a051l451bf9090bf218b7@mail.gmail.com>
	<18116.44077.156543.671421@montanaro.dyndns.org>
Message-ID: <46C4FBB3.4060607@canterbury.ac.nz>

skip at pobox.com wrote:
>     "{0} asks {1} if he is happy to see {1}".format('Brett', 'Skip', 'Christian')
> 
>                                         ^^^ whoops

This kind of mistake is easy to spot if the format
string is short.

If it's not short, it would be better to use names:

   "{asker} asks {askee} if he is happy to see {friend}".format(
      asker = 'Brett', askee = 'Skip', friend = 'Christian')
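
A small sketch of the difference, assuming the behaviour of the
str.format found in released Python 3 versions, where unused positional
arguments are silently ignored:

```python
# Skip's example: the third argument is never referenced, and no error
# is raised, so the slip goes unnoticed at run time.
positional = "{0} asks {1} if he is happy to see {1}".format(
    'Brett', 'Skip', 'Christian')

# With names, each placeholder says which value it expects.
named = "{asker} asks {askee} if he is happy to see {friend}".format(
    asker='Brett', askee='Skip', friend='Christian')

print(positional)
print(named)
```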

--
Greg

From skip at pobox.com  Fri Aug 17 03:58:28 2007
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 16 Aug 2007 20:58:28 -0500
Subject: [Python-3000] Is it possible to avoid filenames with spaces?
Message-ID: <18117.196.883933.981385@montanaro.dyndns.org>

Given that filenames containing spaces make things a bit more challenging
for tools like find and xargs would it be possible to get rid of them in the
Python source tree?  I only see two files containing spaces at the moment:

    ./Mac/Icons/Disk Image.icns
    ./Mac/Icons/Python Folder.icns

Can they be easily renamed without creating havoc somewhere else?

Thx,

Skip

From greg.ewing at canterbury.ac.nz  Fri Aug 17 04:03:21 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 17 Aug 2007 14:03:21 +1200
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <46C4D38C.8000402@v.loewis.de>
References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au>
	<18116.17176.123168.265491@montanaro.dyndns.org>
	<fa256g$orv$1@sea.gmane.org>
	<d11dcfba0708161147x24f0a432o75a811371210961d@mail.gmail.com>
	<18116.43765.194435.952513@montanaro.dyndns.org>
	<18861C49-7121-44B2-B17A-30992DF25E0D@python.org>
	<46C4D38C.8000402@v.loewis.de>
Message-ID: <46C501E9.5030408@canterbury.ac.nz>

Martin v. Löwis wrote:
> (it is then flawed also in that you don't typically
> have placeholders for the result fields and table name - not sure
> whether you even can in DB-API).

It probably depends on the underlying DB, but most DB
interfaces I know of don't allow such a thing.

My biggest gripe about the DB-API is that you have
to use one of about five different ways of marking
parameters *depending on the DB*. Which kind of
defeats the purpose of having a DB-independent API
in the first place...
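
To make both points concrete, here is a sketch using the standard
library's sqlite3 module as one example driver. The table and values are
made up. sqlite3 happens to accept the "?" and ":name" styles; other
DB-API modules may expect "%s" or "%(name)s" instead, which is the
portability problem described above (PEP 249 lists five such styles
under paramstyle).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")

# Values can be bound with placeholders; sqlite3 accepts "?" and ":name".
conn.execute("INSERT INTO people VALUES (?, ?)", ("Fred", 42))
qmark_rows = conn.execute(
    "SELECT name FROM people WHERE age = ?", (42,)).fetchall()
named_rows = conn.execute(
    "SELECT name FROM people WHERE age = :age", {"age": 42}).fetchall()

# A table name cannot be bound the same way; SQLite rejects the statement.
try:
    conn.execute("SELECT name FROM ?", ("people",))
    table_placeholder_ok = True
except sqlite3.Error:
    table_placeholder_ok = False

conn.close()
```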

--
Greg

From guido at python.org  Fri Aug 17 04:11:11 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 16 Aug 2007 19:11:11 -0700
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <46C4F45C.1040904@canterbury.ac.nz>
References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au>
	<18116.17176.123168.265491@montanaro.dyndns.org>
	<fa1hkr$gfj$1@sea.gmane.org>
	<ca471dc20708160753t30343b70ve8cf2e3a468b3a0d@mail.gmail.com>
	<46C4F45C.1040904@canterbury.ac.nz>
Message-ID: <ca471dc20708161911w797dd85cwd9e870a92255b5f9@mail.gmail.com>

On 8/16/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Guido van Rossum wrote:
> > The only added wrinkle is that you can also write
> > {0!r} to *force* using repr() on the value.
>
> What if you want a field width with that? Will you be
> able to write {0!10r} or will it have to be {0!r:10}?

The latter.
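
A short sketch of that spelling, assuming the behaviour of str.format in
released Python 3 versions (conversion first, then a colon and the
format spec):

```python
# repr('ab') is the 4-character string 'ab' (with quotes); the width of
# 10 then pads it on the right, since strings are left-aligned by default.
padded = "{0!r:10}|".format("ab")

# The other order is not accepted by the parser.
try:
    "{0!10r}".format("ab")
    other_order_ok = True
except ValueError:
    other_order_ok = False

print(padded)
```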

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Fri Aug 17 05:53:29 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 16 Aug 2007 20:53:29 -0700
Subject: [Python-3000] Two new test failures (one OSX PPC only)
Message-ID: <ca471dc20708162053m3c55c2ecyeed870dbbbdf7a4f@mail.gmail.com>

I see two new tests failing tonight:

- test_xmlrpc fails on all platforms I have. This is due to several
new tests that were merged in from the trunk; presumably those tests
need changes due to str vs. bytes.

- test_codecs fails on OSX PPC only. This is in the new UTF-32 codecs;
probably a byte order issue.

There's still one leak that Neal would like to see fixed, in
test_zipimport. Instructions to reproduce: in a *debug* build, run
this command:

  ./python Lib/test/regrtest.py -R1:1: test_zipimport

This reports 29 leaked references.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at v.loewis.de  Fri Aug 17 06:40:37 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 17 Aug 2007 06:40:37 +0200
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <07Aug16.161620pdt."57996"@synergy1.parc.xerox.com>
References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au>
	<18116.17176.123168.265491@montanaro.dyndns.org>
	<fa256g$orv$1@sea.gmane.org>
	<d11dcfba0708161147x24f0a432o75a811371210961d@mail.gmail.com>
	<18116.43765.194435.952513@montanaro.dyndns.org>
	<46C4D13B.8070608@v.loewis.de>
	<07Aug16.161620pdt."57996"@synergy1.parc.xerox.com>
Message-ID: <46C526C5.20906@v.loewis.de>

Bill Janssen schrieb:
>> I think most of these points are irrelevant. The curly braces are not
>> just syntactic sugar, at least the opening brace is not; the digit
>> is not syntactic sugar in the case of message translations.
> 
> Are there "computation of matching braces" problems here?

I don't understand: AFAIK, the braces don't nest, so the closing
brace just marks the end of the place holder (which in the printf
format is defined by the type letter).

>> That lots of people are familiar with the old format and only few are
>> with the new is merely a matter of time.
> 
> Sure, but the problem is that there are a lot of Python programmers
> *now* and learning the new syntax imposes a burden on all of *them*.
> Who cares how many people know the syntax in the future?

That is the problem of any change, right? People know the current
language; they don't know the changed language. Still, there are
conditions when change is "allowed".

For example, the syntax of the except clause changes in Py3k, replacing
the comma with "as"; this is also a burden for all Python programmers,
yet the change has been made.
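
For reference, a minimal example of the new except spelling (the failing
int() call is just a convenient way to raise an exception):

```python
try:
    int("not a number")
except ValueError as exc:   # Python 2 spelling: except ValueError, exc:
    message = str(exc)

print(message)
```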

>> That the new format is more verbose than the old one is true, but only
>> slightly so - typing .format is actually easier for me than typing
>> % (which requires a shift key).
> 
> I don't mind the switch to ".format"; it's the formatting codes that I
> don't want to see changed.

Ok. For these, the "more verbose" argument holds even less: in the most
simple case, it's just one character more verbose per placeholder.

Regards,
Martin

From eric+python-dev at trueblade.com  Fri Aug 17 07:09:21 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Fri, 17 Aug 2007 01:09:21 -0400
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <46C526C5.20906@v.loewis.de>
References: <46C2809C.3000806@acm.org>
	<46C3C1DE.6070302@cs.rmit.edu.au>	<18116.17176.123168.265491@montanaro.dyndns.org>	<fa256g$orv$1@sea.gmane.org>	<d11dcfba0708161147x24f0a432o75a811371210961d@mail.gmail.com>	<18116.43765.194435.952513@montanaro.dyndns.org>	<46C4D13B.8070608@v.loewis.de>	<07Aug16.161620pdt."57996"@synergy1.parc.xerox.com>
	<46C526C5.20906@v.loewis.de>
Message-ID: <46C52D81.9010003@trueblade.com>

Martin v. Löwis wrote:
> Bill Janssen schrieb:
>>> I think most of these points are irrelevant. The curly braces are not
>>> just syntactic sugar, at least the opening brace is not; the digit
>>> is not syntactic sugar in the case of message translations.
>> Are there "computation of matching braces" problems here?
> 
> I don't understand: AFAIK, the braces don't nest, so the closing
> brace just marks the end of the place holder (which in the printf
> format is defined by the type letter).

I don't understand, either.  The braces do nest, but I don't know what 
the "computation of matching brace" problem is.

This test currently passes in my implementation:
         self.assertEqual('{0[{bar}]}'.format('abcdefg', bar=4), 'e')

This shows nesting braces working.  Bill, what problem are you thinking of?
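
For readers trying this on a released Python 3: as far as I can tell,
the str.format that eventually shipped nests replacement fields only
inside the format specification, not inside the field name as in the
test above. A small sketch of that form, with made-up values:

```python
# The width is itself taken from another argument.
left = '{0:{width}}|'.format('abc', width=6)
right = '{0:>{width}}|'.format('abc', width=6)

print(left)
print(right)
```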

Eric.


From rrr at ronadam.com  Fri Aug 17 07:49:33 2007
From: rrr at ronadam.com (Ron Adam)
Date: Fri, 17 Aug 2007 00:49:33 -0500
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <46C526C5.20906@v.loewis.de>
References: <46C2809C.3000806@acm.org>
	<46C3C1DE.6070302@cs.rmit.edu.au>	<18116.17176.123168.265491@montanaro.dyndns.org>	<fa256g$orv$1@sea.gmane.org>	<d11dcfba0708161147x24f0a432o75a811371210961d@mail.gmail.com>	<18116.43765.194435.952513@montanaro.dyndns.org>	<46C4D13B.8070608@v.loewis.de>	<07Aug16.161620pdt."57996"@synergy1.parc.xerox.com>
	<46C526C5.20906@v.loewis.de>
Message-ID: <46C536ED.6060500@ronadam.com>



Martin v. Löwis wrote:
> Bill Janssen schrieb:
>>> I think most of these points are irrelevant. The curly braces are not
>>> just syntactic sugar, at least the opening brace is not; the digit
>>> is not syntactic sugar in the case of message translations.
>> Are there "computation of matching braces" problems here?
> 
> I don't understand: AFAIK, the braces don't nest, so the closing
> brace just marks the end of the place holder (which in the printf
> format is defined by the type letter).

They can nest.  See these tests that Eric posted earlier, second example.

Eric Smith wrote:
> These tests all pass:
> 
> self.assertEquals('{0[{1}]}'.format('abcdefg', 4), 'e')
> self.assertEquals('{foo[{bar}]}'.format(foo='abcdefg', bar=4), 'e')
> self.assertEqual("My name is {0}".format('Fred'), "My name is Fred")
> self.assertEqual("My name is {0[name]}".format(dict(name='Fred')),
>                   "My name is Fred")
> self.assertEqual("My name is {0} :-{{}}".format('Fred'),
>                   "My name is Fred :-{}")


So expressions like the following might be difficult to spell.

      '{{foo}{bar}}'.format(foo='FOO', bar='BAR', FOOBAR = "Fred")

This would probably produce an unmatched brace error on the first '}'.
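
One way around this, sketched under the assumption that two formatting
passes are acceptable: build the key name first, then wrap it in braces
and format again. The doubled-brace escape from Eric's tests is shown
alongside for contrast.

```python
# Doubled braces produce literal braces in the output.
literal = "My name is {0} :-{{}}".format('Fred')

# Two passes: first compute the key name, then look it up.
key = '{foo}{bar}'.format(foo='FOO', bar='BAR')
value = ('{' + key + '}').format(FOOBAR='Fred')

print(literal)
print(key, value)
```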


>>> That lots of people are familiar with the old format and only few are
>>> with the new is merely a matter of time.
>> Sure, but the problem is that there are a lot of Python programmers
>> *now* and learning the new syntax imposes a burden on all of *them*.
>> Who cares how many people know the syntax in the future?
> 
> That is the problem of any change, right? People know the current
> language; they don't know the changed language. Still, there are
> conditions when change is "allowed".
> 
> For example, the syntax of the except clause changes in Py3k, replacing
> the comma with "as"; this is also a burden for all Python programmers,
> yet the change has been made.
> 
>>> That the new format is more verbose than the old one is true, but only
>>> slightly so - typing .format is actually easier for me than typing
>>> % (which requires a shift key).
>> I don't mind the switch to ".format"; it's the formatting codes that I
>> don't want to see changed.
> 
> Ok. For these, the "more verbose" argument holds even less: in the most
> simple case, it's just one character more verbose per placeholder.

I think having more verbose syntax is a matter of trade offs.  I don't mind 
one or two characters if it saves me from writing 20 or 30 someplace else.

That is the case here: if you can't do something in the format string, you 
need to do it someplace else, and that will often take up a few lines rather 
than a few characters.

_RON


From g.brandl at gmx.net  Fri Aug 17 08:16:55 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Fri, 17 Aug 2007 08:16:55 +0200
Subject: [Python-3000] [Python-Dev]  Documentation switch imminent
In-Reply-To: <acd65fa20708161643s6d7c0c42x135eb0957980aa14@mail.gmail.com>
References: <f9t2nn$ksg$1@sea.gmane.org>
	<f9v2rd$2dl$1@sea.gmane.org>	<ee2a432c0708152207j66c26fbdu6e9bcec6dae0ccf5@mail.gmail.com>
	<acd65fa20708161643s6d7c0c42x135eb0957980aa14@mail.gmail.com>
Message-ID: <fa3egl$56d$1@sea.gmane.org>

Alexandre Vassalotti schrieb:
> On 8/16/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
>> On 8/15/07, Georg Brandl <g.brandl at gmx.net> wrote:
>> > Okay, I made the switch.  I tagged the state of both Python branches
>> > before the switch as tags/py{26,3k}-before-rstdocs/.
>>
>> http://docs.python.org/dev/
>> http://docs.python.org/dev/3.0/
>>
> 
> Is it just me, or is the markup of the new docs quite heavy?

Docutils markup tends to be a bit verbose, yes, but the index is not
even generated by them.

> alex% wget -q -O- http://docs.python.org/api/genindex.html | wc -c
> 77868
> alex% wget -q -O- http://docs.python.org/dev/3.0/genindex.html | wc -c
> 918359

The new index includes all documents (api, lib, ref, ...), so the ratio
is more like 678000 : 950000 (using 2.6 here), and the difference can be
explained quite easily because (a) Sphinx uses different anchor names
("mailbox.Mailbox.__contains__" vs "l2h-849") and (b) the hrefs have to
include subdirs like "reference/".

I've now removed leading spaces in the index output, and the character
count is down to 850000.

> Firefox, on my fairly recent machine, takes ~5 seconds rendering the
> index of the new docs from disk, compared to a fraction of a second
> for the old one.

But you're right that rendering is slow there.  It may be caused by the
more complicated CSS... perhaps the index should be split up in several
pages.

Georg


-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From walter at livinglogic.de  Fri Aug 17 10:20:22 2007
From: walter at livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=)
Date: Fri, 17 Aug 2007 10:20:22 +0200
Subject: [Python-3000] Two new test failures (one OSX PPC only)
In-Reply-To: <ca471dc20708162053m3c55c2ecyeed870dbbbdf7a4f@mail.gmail.com>
References: <ca471dc20708162053m3c55c2ecyeed870dbbbdf7a4f@mail.gmail.com>
Message-ID: <46C55A46.4040608@livinglogic.de>

Guido van Rossum wrote:

> I see two new tests failing tonight:
> 
> - test_xmlrpc fails on all platforms I have. This is due to several
> new tests that were merged in from the trunk; presumably those tests
> need changes due to str vs. bytes.
> 
> - test_codecs fails on OSX PPC only. This is in the new UTF-32 codecs;
> probably a byte order issue.

We have a PPC Mac here at work, so I can investigate where the problem lies.

 > [...]

Servus,
    Walter

From victor.stinner at haypocalc.com  Fri Aug 17 11:23:00 2007
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Fri, 17 Aug 2007 11:23:00 +0200
Subject: [Python-3000] format() method and % operator
Message-ID: <200708171123.00674.victor.stinner@haypocalc.com>

Hi,

I read many people saying that
   "{0} {1}".format('Hello', 'World')
is easier to read than
   "%s %s" % ('Hello', 'World')


But to me it looks more complex: we have to maintain indexes (0, 1, 
2, ...), each marker is different ({0} != {1}), etc.


I have read neither the PEP nor all of the email discussions. So can you
tell me if it would be possible to write simply:
   "{} {}".format('Hello', 'World')

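For what it's worth, released Python versions from 3.1 (and 2.7) onward
do accept empty braces and number them automatically. A quick check:

```python
auto = "{} {}".format('Hello', 'World')        # fields numbered 0, 1 implicitly
explicit = "{0} {1}".format('Hello', 'World')  # same result with explicit indexes

print(auto)
print(explicit)
```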

Victor Stinner aka haypo
http://hachoir.org/

From eric+python-dev at trueblade.com  Fri Aug 17 12:45:54 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Fri, 17 Aug 2007 06:45:54 -0400
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <46C536ED.6060500@ronadam.com>
References: <46C2809C.3000806@acm.org>	<46C3C1DE.6070302@cs.rmit.edu.au>	<18116.17176.123168.265491@montanaro.dyndns.org>	<fa256g$orv$1@sea.gmane.org>	<d11dcfba0708161147x24f0a432o75a811371210961d@mail.gmail.com>	<18116.43765.194435.952513@montanaro.dyndns.org>	<46C4D13B.8070608@v.loewis.de>	<07Aug16.161620pdt."57996"@synergy1.parc.xerox.com>	<46C526C5.20906@v.loewis.de>
	<46C536ED.6060500@ronadam.com>
Message-ID: <46C57C62.3040000@trueblade.com>

Ron Adam wrote:
> 
> Martin v. Löwis wrote:
>> Bill Janssen schrieb:
>>>> I think most of these points are irrelevant. The curly braces are not
>>>> just syntactic sugar, at least the opening brace is not; the digit
>>>> is not syntactic sugar in the case of message translations.
>>> Are there "computation of matching braces" problems here?
>> I don't understand: AFAIK, the braces don't nest, so the closing
>> brace just marks the end of the place holder (which in the printf
>> format is defined by the type letter).

> So expressions like the following might be difficult to spell.
> 
>       '{{foo}{bar}}'.format(foo='FOO', bar='BAR', FOOBAR = "Fred")
> 
> This would probably produce an unmatched brace error on the first '}'.

Ah, I see.  I hadn't thought of that case.  You're correct, it gives an 
error on the first '}'.  This is a case where allowing whitespace would 
solve the problem, sort of like C++'s "< <" template issue (which I 
think they've since addressed).  I'm not sure if it's worth doing, though:

'{ {foo}{bar} }'.format(foo='FOO', bar='BAR', FOOBAR = "Fred")

On second thought, that won't work.  For example, this currently doesn't 
work:
'{0[{foo}{bar}]}'.format({'FOOBAR': 'abc'}, foo='FOO', bar='BAR')
KeyError: 'FOO'

I can't decide if that's a bug or not.

Eric.



From skip at pobox.com  Fri Aug 17 15:02:35 2007
From: skip at pobox.com (skip at pobox.com)
Date: Fri, 17 Aug 2007 08:02:35 -0500
Subject: [Python-3000] AtheOS?
Message-ID: <18117.40043.124714.520626@montanaro.dyndns.org>

I just got rid of BeOS and RiscOS.  I'm about to launch into Irix and Tru64,
the other two listed on

    http://wiki.python.org/moin/Py3kDeprecated

I wonder, should AtheOS support be removed as well?  According to Wikipedia
it's no longer being developed, having been superseded by something called
Syllable.  According to the Syllable Wikipedia page, it supports Python.

Skip

From lists at cheimes.de  Fri Aug 17 14:44:55 2007
From: lists at cheimes.de (Christian Heimes)
Date: Fri, 17 Aug 2007 14:44:55 +0200
Subject: [Python-3000] Two new test failures (one OSX PPC only)
In-Reply-To: <ca471dc20708162053m3c55c2ecyeed870dbbbdf7a4f@mail.gmail.com>
References: <ca471dc20708162053m3c55c2ecyeed870dbbbdf7a4f@mail.gmail.com>
Message-ID: <46C59847.9040501@cheimes.de>

Guido van Rossum wrote:
> There's still one leak that Neal would like to see fixed, in
> test_zipimport. Instructions to reproduce: in a *debug* build, run
> this command:
> 
>   ./python Lib/test/regrtest.py -R1:1: test_zipimport
> 
> This reports 29 leaked references.
> 

Err, this patch is using a better name for the data. *blush*

Index: Modules/zipimport.c
===================================================================
--- Modules/zipimport.c (Revision 57115)
+++ Modules/zipimport.c (Arbeitskopie)
@@ -851,10 +851,11 @@
        }
        buf[data_size] = '\0';

-       if (compress == 0) {  /* data is not compressed */
-               raw_data = PyBytes_FromStringAndSize(buf, data_size);
-               return raw_data;
-       }
+       if (compress == 0) { /* data is not compressed */
+               data = PyBytes_FromStringAndSize(buf, data_size);
+               Py_DECREF(raw_data);
+               return data;
+    }

        /* Decompress with zlib */
        decompress = get_decompress_func();

Christian

From lists at cheimes.de  Fri Aug 17 14:41:07 2007
From: lists at cheimes.de (Christian Heimes)
Date: Fri, 17 Aug 2007 14:41:07 +0200
Subject: [Python-3000] Two new test failures (one OSX PPC only)
In-Reply-To: <ca471dc20708162053m3c55c2ecyeed870dbbbdf7a4f@mail.gmail.com>
References: <ca471dc20708162053m3c55c2ecyeed870dbbbdf7a4f@mail.gmail.com>
Message-ID: <46C59763.5080408@cheimes.de>

Guido van Rossum wrote:
> There's still one leak that Neal would like to see fixed, in
> test_zipimport. Instructions to reproduce: in a *debug* build, run
> this command:
> 
>   ./python Lib/test/regrtest.py -R1:1: test_zipimport
> 
> This reports 29 leaked references.
> 

I found the problem in Modules/zipimport.c around line 850. raw_data
wasn't DECREFed.

LC_ALL=C svn diff Modules/zipimport.c
Index: Modules/zipimport.c
===================================================================
--- Modules/zipimport.c (revision 57115)
+++ Modules/zipimport.c (working copy)
@@ -851,10 +851,11 @@
        }
        buf[data_size] = '\0';

-       if (compress == 0) {  /* data is not compressed */
-               raw_data = PyBytes_FromStringAndSize(buf, data_size);
-               return raw_data;
-       }
+       if (compress == 0) { /* data is not compressed */
+               decompress = PyBytes_FromStringAndSize(buf, data_size);
+               Py_DECREF(raw_data);
+               return decompress;
+    }

        /* Decompress with zlib */
        decompress = get_decompress_func();

Christian

From ncoghlan at gmail.com  Fri Aug 17 16:00:25 2007
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 18 Aug 2007 00:00:25 +1000
Subject: [Python-3000] [Python-Dev]  Documentation switch imminent
In-Reply-To: <fa3egl$56d$1@sea.gmane.org>
References: <f9t2nn$ksg$1@sea.gmane.org>	<f9v2rd$2dl$1@sea.gmane.org>	<ee2a432c0708152207j66c26fbdu6e9bcec6dae0ccf5@mail.gmail.com>	<acd65fa20708161643s6d7c0c42x135eb0957980aa14@mail.gmail.com>
	<fa3egl$56d$1@sea.gmane.org>
Message-ID: <46C5A9F9.1070002@gmail.com>

Georg Brandl wrote:
>> Firefox, on my fairly recent machine, takes ~5 seconds rendering the
>> index of the new docs from disk, compared to a fraction of a second
>> for the old one.
> 
> But you're right that rendering is slow there.  It may be caused by the
> more complicated CSS... perhaps the index should be split up in several
> pages.

Splitting out the C API index would probably be a reasonable start. (It 
may also be worth considering ignoring a leading Py or _Py in that index 
- many of the C API index entries end up under just two index groups).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From guido at python.org  Fri Aug 17 16:32:15 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 17 Aug 2007 07:32:15 -0700
Subject: [Python-3000] AtheOS?
In-Reply-To: <18117.40043.124714.520626@montanaro.dyndns.org>
References: <18117.40043.124714.520626@montanaro.dyndns.org>
Message-ID: <ca471dc20708170732v7152a171ge8652dd217dc420d@mail.gmail.com>

I'd get in touch with the last known maintainer of the AtheOS port, or
some of the Syllable maintainers (if all else fails, spam their wiki's
front page :-). If it's just a renaming maybe they're relying on the
same #ifdefs still.

Thanks for doing this BTW! I love cleanups.

(If there are other people interested in helping out with cleanups,
getting rid of deprecated behavior is also a great starter project.
Look for DeprecationWarning in Python or C code.)

On 8/17/07, skip at pobox.com <skip at pobox.com> wrote:
> I just got rid of BeOS and RiscOS.  I'm about to launch into Irix and Tru64,
> the other two listed on
>
>     http://wiki.python.org/moin/Py3kDeprecated
>
> I wonder, should AtheOS support be removed as well?  According to Wikipedia
> it's no longer being developed, having been superseded by something called
> Syllable.  According to the Syllable Wikipedia page, it supports Python.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From rrr at ronadam.com  Fri Aug 17 17:37:52 2007
From: rrr at ronadam.com (Ron Adam)
Date: Fri, 17 Aug 2007 10:37:52 -0500
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <46C57C62.3040000@trueblade.com>
References: <46C2809C.3000806@acm.org>	<46C3C1DE.6070302@cs.rmit.edu.au>	<18116.17176.123168.265491@montanaro.dyndns.org>	<fa256g$orv$1@sea.gmane.org>	<d11dcfba0708161147x24f0a432o75a811371210961d@mail.gmail.com>	<18116.43765.194435.952513@montanaro.dyndns.org>	<46C4D13B.8070608@v.loewis.de>	<07Aug16.161620pdt."57996"@synergy1.parc.xerox.com>	<46C526C5.20906@v.loewis.de>
	<46C536ED.6060500@ronadam.com> <46C57C62.3040000@trueblade.com>
Message-ID: <46C5C0D0.7040609@ronadam.com>



Eric Smith wrote:
> Ron Adam wrote:
>>
>> Martin v. Löwis wrote:
>>> Bill Janssen schrieb:
>>>>> I think most of these points are irrelevant. The curly braces are not
>>>>> just syntactic sugar, at least the opening brace is not; the digit
>>>>> is not syntactic sugar in the case of message translations.
>>>> Are there "computation of matching braces" problems here?
>>> I don't understand: AFAIK, the braces don't nest, so the closing
>>> brace just marks the end of the place holder (which in the printf
>>> format is defined by the type letter).
> 
>> So expressions like the following might be difficult to spell.
>>
>>       '{{foo}{bar}}'.format(foo='FOO', bar='BAR', FOOBAR = "Fred")
>>
>> This would probably produce an unmatched brace error on the first '}'.
> 
> Ah, I see.  I hadn't thought of that case.  You're correct, it gives an 
> error on the first '}'.  This is a case where allowing whitespace would 
> solve the problem, sort of like C++'s "< <" template issue (which I 
> think they've since addressed).  I'm not sure if it's worth doing, though:
> 
> '{ {foo}{bar} }'.format(foo='FOO', bar='BAR', FOOBAR = "Fred")
> 
> On second thought, that won't work.  For example, this currently doesn't 
> work:
> '{0[{foo}{bar}]}'.format({'FOOBAR': 'abc'}, foo='FOO', bar='BAR')
> KeyError: 'FOO'
> 
> I can't decide if that's a bug or not.

I think it will be a bug.  Someone is bound to run into it at some point 
if they are using nested braces routinely.  Although most people never 
will, so it may be a limitation we can live with.

White space will only work on the name side, not the specifier side of the 
colon as it's significant on that side.

_RON





From guido at python.org  Fri Aug 17 17:42:56 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 17 Aug 2007 08:42:56 -0700
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <46C57C62.3040000@trueblade.com>
References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au>
	<18116.17176.123168.265491@montanaro.dyndns.org>
	<fa256g$orv$1@sea.gmane.org>
	<d11dcfba0708161147x24f0a432o75a811371210961d@mail.gmail.com>
	<18116.43765.194435.952513@montanaro.dyndns.org>
	<46C4D13B.8070608@v.loewis.de> <46C526C5.20906@v.loewis.de>
	<46C536ED.6060500@ronadam.com> <46C57C62.3040000@trueblade.com>
Message-ID: <ca471dc20708170842i27b1f6f5rc244f06a8a5045c@mail.gmail.com>

I think you should just disallow {...} for the start of the variable
reference. I.e. {0.{1}} is okay, but {{1}} is not.

On 8/17/07, Eric Smith <eric+python-dev at trueblade.com> wrote:
> Ron Adam wrote:
> >
> > Martin v. Löwis wrote:
> >> Bill Janssen schrieb:
> >>>> I think most of these points are irrelevant. The curly braces are not
> >>>> just syntactic sugar, at least the opening brace is not; the digit
> >>>> is not syntactic sugar in the case of message translations.
> >>> Are there "computation of matching braces" problems here?
> >> I don't understand: AFAIK, the braces don't nest, so the closing
> >> brace just marks the end of the place holder (which in the printf
> >> format is defined by the type letter).
>
> > So expressions like the following might be difficult to spell.
> >
> >       '{{foo}{bar}}'.format(foo='FOO', bar='BAR', FOOBAR = "Fred")
> >
> > This would probably produce an unmatched brace error on the first '}'.
>
> Ah, I see.  I hadn't thought of that case.  You're correct, it gives an
> error on the first '}'.  This is a case where allowing whitespace would
> solve the problem, sort of like C++'s "< <" template issue (which I
> think they've since addressed).  I'm not sure if it's worth doing, though:
>
> '{ {foo}{bar} }'.format(foo='FOO', bar='BAR', FOOBAR = "Fred")
>
> On second thought, that won't work.  For example, this currently doesn't
> work:
> '{0[{foo}{bar}]}'.format({'FOOBAR': 'abc'}, foo='FOO', bar='BAR')
> KeyError: 'FOO'
>
> I can't decide if that's a bug or not.
>
> Eric.


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From cjw at sympatico.ca  Fri Aug 17 16:42:51 2007
From: cjw at sympatico.ca (Colin J. Williams)
Date: Fri, 17 Aug 2007 10:42:51 -0400
Subject: [Python-3000] format() method and % operator
In-Reply-To: <200708171123.00674.victor.stinner@haypocalc.com>
References: <200708171123.00674.victor.stinner@haypocalc.com>
Message-ID: <fa4c5j$vv2$1@sea.gmane.org>

Victor Stinner wrote:
> Hi,
> 
> I read many people saying that
>    "{0} {1}".format('Hello', 'World')
> is easier to read than
>    "%s %s" % ('Hello', 'World')
> 
Not me.
> 
> But for me it looks to be more complex: we have to maintain indexes (0, 1, 
> 2, ...), marker is different ({0} != {1}), etc.
> 
> 
> I didn't read the PEP nor all email discussions. 
Ditto

Colin W.


From martin at v.loewis.de  Fri Aug 17 18:17:48 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 17 Aug 2007 18:17:48 +0200
Subject: [Python-3000] Nested brackets (Was: Please don't kill the %
	operator...)
In-Reply-To: <46C5C0D0.7040609@ronadam.com>
References: <46C2809C.3000806@acm.org>	<46C3C1DE.6070302@cs.rmit.edu.au>	<18116.17176.123168.265491@montanaro.dyndns.org>	<fa256g$orv$1@sea.gmane.org>	<d11dcfba0708161147x24f0a432o75a811371210961d@mail.gmail.com>	<18116.43765.194435.952513@montanaro.dyndns.org>	<46C4D13B.8070608@v.loewis.de>	<07Aug16.161620pdt."57996"@synergy1.parc.xerox.com>	<46C526C5.20906@v.loewis.de>
	<46C536ED.6060500@ronadam.com> <46C57C62.3040000@trueblade.com>
	<46C5C0D0.7040609@ronadam.com>
Message-ID: <46C5CA2C.1000306@v.loewis.de>

>> On second thought, that won't work.  For example, this currently
>> doesn't work:
>> '{0[{foo}{bar}]}'.format({'FOOBAR': 'abc'}, foo='FOO', bar='BAR')
>> KeyError: 'FOO'
>>
>> I can't decide if that's a bug or not.
> 
> I think it will be a bug.  Some one is bound to run into it at some
> point if they are using nested braces routinely.  Although most people
> never will, so it may be a limitation we can live with.

OK, I think both the PEP and the understanding must get some serious
tightening.

According to the PEP, "The rules for parsing an item key are very
simple" - unfortunately without specifying what the rules actually
*are*, other than "If it starts with a digit, then its treated as a
number, otherwise it is used as a string".

So we know the key is a string (it does not start with a digit); the
next question: which string?

The PEP says, as an implementation note, "The str.format() function
will have a minimalist parser which only attempts to figure out when it
is "done" with an identifier (by finding a '.' or a ']', or '}', etc.)."

This probably means to say that it looks for ']' in this context (a
getitem operator), so then the string would be "{foo}{bar}". I would
expect that this produces

KeyError: '{foo}{bar}'

I.e. according to the PEP

a) nested curly braces are not supported in compound field names (*),
   the only valid operators are '.' and '[]'.
b) concatenation of strings in keys is not supported (again because
   the only operators are getattr and getitem)

I now agree with Bill that we have a "computation of matching braces"
problem, surprisingly: people disagree with each other, and with the
PEP, about what the braces in the above example mean.

Regards,
Martin

(*) they are supported in format specifiers
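
A short sketch of the distinction drawn above, run against str.format as it ships in current CPython releases (which may differ from the sandbox implementation discussed in this thread). Nested braces work in the format specifier, while a computed key has to be built in a separate step:

```python
# Nested replacement fields are accepted inside the format specifier.
padded = "{0:{width}}|".format("abc", width=6)
print(padded)

# A computed key cannot be spelled inside the field name, so build it first.
data = {"FOOBAR": "abc"}
key = "{foo}{bar}".format(foo="FOO", bar="BAR")
looked_up = "{0}".format(data[key])
print(looked_up)

# Doubled braces are an escape for a literal brace, not a nested field.
literal = "{{foo}}".format(foo="FOO")
print(literal)

# The spelling from the thread is rejected rather than nested.
try:
    "{{foo}{bar}}".format(foo="FOO", bar="BAR", FOOBAR="Fred")
    rejected = False
except ValueError as exc:
    rejected = True
    print("ValueError:", exc)
```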

From alexandre at peadrop.com  Fri Aug 17 18:28:43 2007
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Fri, 17 Aug 2007 12:28:43 -0400
Subject: [Python-3000] [Python-Dev] Documentation switch imminent
In-Reply-To: <fa3egl$56d$1@sea.gmane.org>
References: <f9t2nn$ksg$1@sea.gmane.org> <f9v2rd$2dl$1@sea.gmane.org>
	<ee2a432c0708152207j66c26fbdu6e9bcec6dae0ccf5@mail.gmail.com>
	<acd65fa20708161643s6d7c0c42x135eb0957980aa14@mail.gmail.com>
	<fa3egl$56d$1@sea.gmane.org>
Message-ID: <acd65fa20708170928y6dc91623tefa112d4fe7528e5@mail.gmail.com>

On 8/17/07, Georg Brandl <g.brandl at gmx.net> wrote:
> Alexandre Vassalotti schrieb:
> > On 8/16/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
> >> On 8/15/07, Georg Brandl <g.brandl at gmx.net> wrote:
> >> > Okay, I made the switch.  I tagged the state of both Python branches
> >> > before the switch as tags/py{26,3k}-before-rstdocs/.
> >>
> >> http://docs.python.org/dev/
> >> http://docs.python.org/dev/3.0/
> >>
> >
> > Is it just me, or is the markup of the new docs quite heavy?
>
> Docutils markup tends to be a bit verbose, yes, but the index is not
> even generated by them.
>
> > alex% wget -q -O- http://docs.python.org/api/genindex.html | wc -c
> > 77868
> > alex% wget -q -O- http://docs.python.org/dev/3.0/genindex.html | wc -c
> > 918359
>
> The new index includes all documents (api, lib, ref, ...), so the ratio
> is more like 678000 : 950000 (using 2.6 here), and the difference can be
> explained quite easily because (a) sphinx uses different anchor names
> ("mailbox.Mailbox.__contains__" vs "l2h-849") and (b) the hrefs have to
> include subdirs like "reference/".

Ah, I didn't notice that index included all the documents. That
explains the huge size increase. However, would it be possible to keep
the indexes separated? I find what I want more quickly when
the indexes are separated.

> I've now removed leading spaces in the index output, and the character
> count is down to 850000.
>
> > Firefox, on my fairly recent machine, takes ~5 seconds rendering the
> > index of the new docs from disk, compared to a fraction of a second
> > for the old one.
>
> But you're right that rendering is slow there.  It may be caused by the
> more complicated CSS... perhaps the index should be split up in several
> pages.
>

I disabled CSS-support (with View->Page Style->No Style), but it
didn't affect the initial rendering speed. However, scrolling was
*much* faster without CSS.

-- Alexandre

From martin at v.loewis.de  Fri Aug 17 18:32:01 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 17 Aug 2007 18:32:01 +0200
Subject: [Python-3000] AtheOS?
In-Reply-To: <ca471dc20708170732v7152a171ge8652dd217dc420d@mail.gmail.com>
References: <18117.40043.124714.520626@montanaro.dyndns.org>
	<ca471dc20708170732v7152a171ge8652dd217dc420d@mail.gmail.com>
Message-ID: <46C5CD81.1040903@v.loewis.de>

Guido van Rossum schrieb:
> I'd get in touch with the last known maintainer of the AtheOS port, or
> some of the Syllable maintainers (if all else fails, spam their wiki's
> front page :-). If it's just a renaming maybe they're relying on the
> same #ifdefs still.

The port was originally contributed by Octavian Cerna (sf:tavyc),
in bugs.python.org/488073. He did that because the version of Python
that came with AtheOS was 1.5.2.

> Thanks for doing this BTW! I love cleanups.

It took some effort to integrate this for 2.3, so I feel sad that this
is now all ripped out again. I'm not certain the code gets cleaner
that way - just smaller. Perhaps I should just reject patches that
port Python to minor platforms in the future, as the chance is high
that the original contributor won't keep it up-to-date, and nobody else
will, either, for several years.

Regards,
Martin

From guido at python.org  Fri Aug 17 18:40:25 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 17 Aug 2007 09:40:25 -0700
Subject: [Python-3000] AtheOS?
In-Reply-To: <46C5CD81.1040903@v.loewis.de>
References: <18117.40043.124714.520626@montanaro.dyndns.org>
	<ca471dc20708170732v7152a171ge8652dd217dc420d@mail.gmail.com>
	<46C5CD81.1040903@v.loewis.de>
Message-ID: <ca471dc20708170940n55e06e14p6689ac6969dfcb8c@mail.gmail.com>

On 8/17/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> Guido van Rossum schrieb:
> > I'd get in touch with the last known maintainer of the AtheOS port, or
> > some of the Syllable maintainers (if all else fails, spam their wiki's
> > front page :-). If it's just a renaming maybe they're relying on the
> > same #ifdefs still.
>
> The port was originally contributed by Octavian Cerna (sf:tavyc),
> in bugs.python.org/488073. He did that because the version of Python
> that came with AtheOS was 1.5.2.
>
> > Thanks for doing this BTW! I love cleanups.
>
> It took some effort to integrate this for 2.3, so I feel sad that this
> is now all ripped out again. I'm not certain the code gets cleaner
> that way - just smaller. Perhaps I should just reject patches that
> port Python to minor platforms in the future, as the chance is high
> that the original contributor won't keep it up-to-date, and nobody else
> will, either, for several years.

True, I'm also a bit sad -- my pride used to be the number of
platforms that ran Python. But minority platforms need to learn that
they should support the established conventions rather than invent
their own if they want to be able to run most open source software.
And I think they *are* learning. Now all we need to do is get rid of
all the silly differences between the *BSD versions. :-)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From walter at livinglogic.de  Fri Aug 17 18:42:22 2007
From: walter at livinglogic.de (=?UTF-8?B?V2FsdGVyIETDtnJ3YWxk?=)
Date: Fri, 17 Aug 2007 18:42:22 +0200
Subject: [Python-3000] UTF-32 codecs
In-Reply-To: <ca471dc20708161443k692d56e4haa6ff709969a27a@mail.gmail.com>
References: <46C48518.3070701@livinglogic.de> <46C48C71.3000400@v.loewis.de>	
	<ca471dc20708161133hd890bd4m33abd2ba68d914da@mail.gmail.com>	
	<46C4A9C7.9060408@livinglogic.de> <46C4AC02.3020203@v.loewis.de>	
	<46C4AD7B.5040808@livinglogic.de> <46C4AF67.1090209@v.loewis.de>	
	<46C4C261.2080304@livinglogic.de>
	<ca471dc20708161443k692d56e4haa6ff709969a27a@mail.gmail.com>
Message-ID: <46C5CFEE.8030301@livinglogic.de>

Guido van Rossum wrote:
> On 8/16/07, Walter Dörwald <walter at livinglogic.de> wrote:
>> Martin v. Löwis wrote:
>>
>>>> A simple merge won't work, because in 3.0 the codec uses bytes and in
>>>> 2.6 it uses str. Also the call to the decoding error handler has
>>>> changed, because in 3.0 the error handler could modify the mutable input
>>>> buffer.
>>> So what's the strategy then? Block the trunk revision from merging?
>> I've never used svnmerge, so I don't know what the strategy for
>> automatic merging would be. What I would do is check in the patch for
>> the py3k branch, then apply the patch to the trunk, get it to work and
>> check it in.
> 
> Go right ahead.

Done! The bug surfacing on Mac is fixed too (stupid typo).

> I'll clean up afterwards.

Thanks!

Servus,
   Walter


From jimjjewett at gmail.com  Fri Aug 17 18:47:12 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Fri, 17 Aug 2007 12:47:12 -0400
Subject: [Python-3000] format() method and % operator
In-Reply-To: <200708171123.00674.victor.stinner@haypocalc.com>
References: <200708171123.00674.victor.stinner@haypocalc.com>
Message-ID: <fb6fbf560708170947jd2aa639oea58157396d66f4f@mail.gmail.com>

On 8/17/07, Victor Stinner <victor.stinner at haypocalc.com> wrote:
> But for me it looks to be more complex: we have to maintain indexes (0, 1,
> 2, ...), marker is different ({0} != {1}), etc.

> ... tell me if it would be possible to write simply:
>    "{} {}".format('Hello', 'World')

It would be possible to support that, but I think it was excluded
intentionally, as a nudge toward more robust formatting strings.

(1)  Translators may need to reorder the arguments.  So the format
string might change from
    "{0} xxx {1}"
to a more idiomatic (in the other language)
    "yyy {1} {0}"

This doesn't by itself rule out {} for the default case, but being
explicit makes things more parallel, and easier to verify.

(2)  You already have to maintain indices mentally; it is just
bug-prone on strings long enough for the formatting language to
matter.  For example, if

    gossip="%s told %s that %s"
changes to
    gossip="%s told %s on %s that %s"

Then in some other part of the program, you will also have to change
    gossip % (name1, name2, msg)
to
    gossip % (name1, date, name2, msg)

Using a name mapping (speaker=, ... hearer=..., ) is a better answer,
but explicit numbers are a halfway measure.
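
To make both points concrete, here is a small sketch with made-up sample strings. The names and messages are illustrative only. With explicit indices or names, the template can be reordered or extended without touching the argument order at the call site:

```python
args = ("Alice", "Bob", "the build is broken")

# (1) A translator can reorder the fields; the call site stays the same.
original = "{0} told {1} that {2}"
reordered = "{1} heard from {0} that {2}"
print(original.format(*args))
print(reordered.format(*args))

# (2) With names, adding a field does not shift the other arguments.
gossip = "{speaker} told {hearer} on {date} that {msg}"
line = gossip.format(speaker="Alice", hearer="Bob",
                     msg="the build is broken", date="Friday")
print(line)
```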

-jJ

From skip at pobox.com  Fri Aug 17 19:02:23 2007
From: skip at pobox.com (skip at pobox.com)
Date: Fri, 17 Aug 2007 12:02:23 -0500
Subject: [Python-3000] AtheOS?
In-Reply-To: <46C5CD81.1040903@v.loewis.de>
References: <18117.40043.124714.520626@montanaro.dyndns.org>
	<ca471dc20708170732v7152a171ge8652dd217dc420d@mail.gmail.com>
	<46C5CD81.1040903@v.loewis.de>
Message-ID: <18117.54431.306667.256066@montanaro.dyndns.org>


    Martin> It took some effort to integrate this for 2.3, so I feel sad
    Martin> that this is now all ripped out again. I'm not certain the code
    Martin> gets cleaner that way - just smaller.

Well, fewer #ifdefs can't be a bad thing.

I noticed this on the Syllable Wikipedia page:

    It was forked from the stagnant AtheOS in July 2002.

It would appear that AtheOS has been defunct for quite a while.

Skip

From martin at v.loewis.de  Fri Aug 17 19:22:18 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 17 Aug 2007 19:22:18 +0200
Subject: [Python-3000] should rfc822 accept text io or binary io?
In-Reply-To: <e8bf7a530708071131r257c3506tbb163df1097b4e02@mail.gmail.com>
References: <e8bf7a530708060630x699a44aaqa940cfe96a0e158b@mail.gmail.com>	<18103.34967.170146.660275@montanaro.dyndns.org>	<3CC0BA3B-38D2-44F6-A230-E3B59C1016DE@acm.org>	<e8bf7a530708070852g3877bfbayc7f722e9ee8d0c0c@mail.gmail.com>	<ca471dc20708070951s4492aa25hd84cf40ef8a5df53@mail.gmail.com>	<e8bf7a530708071038o25fb251au3f742c4019ee786e@mail.gmail.com>	<ca471dc20708071052j1dc74dber6cb7eb890c31668b@mail.gmail.com>
	<e8bf7a530708071131r257c3506tbb163df1097b4e02@mail.gmail.com>
Message-ID: <46C5D94A.4030808@v.loewis.de>

> The odd thing here is that RFC 2047 (MIME) seems to be about encoding
> non-ASCII character sets in ASCII.  So the spec is kind of odd here.
> The actual bytes on the wire seem to be ASCII, but they may have an
> interpretation where those ASCII bytes represent a non-ASCII string.

HTTP is fairly confused about usage of non-ASCII characters in headers.
For example, RFC 2617 specifies that, for Basic authentication, userid
and password are *TEXT (excluding : in the userid); it then says that
user-pass is base64-encoded. It nowhere says what the charset of userid
or password should be.

People now interpret that as saying: it's TEXT, so you need to encode
it according to RFC 2047 before using it in a header, requiring that
the userid first gets MIME-Q-encoded (say, or B), and then the result
gets base64-encoded again, then transmitted. Neither web browsers nor
web servers implement that correctly today.

But in short, the intention seems to be that the HTTP headers are
strict ASCII on the wire, with non-ASCII encoded using MIME header
encoding.

A library implementing that in Python should certainly use bytes
at the network (stream) side, and strings at the application side.
Even though the format is human-readable, the protocol is byte-oriented,
not character-oriented.
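
As an illustration of that split, the sketch below uses the standard library's email.header module to turn a non-ASCII string into an RFC 2047 encoded word, produce ASCII bytes for the stream side, and recover the string on the application side. The "From: " header line is only an example; nothing here is specific to HTTP.

```python
from email.header import Header, decode_header, make_header

# Application side: a real string containing a non-ASCII character.
name = "Martin v. L\u00f6wis"

# RFC 2047 encoding yields a string made of ASCII characters only.
encoded = Header(name, "utf-8").encode()

# Stream side: the header line as bytes, strictly ASCII on the wire.
wire = ("From: " + encoded + "\r\n").encode("ascii")
print(wire)

# Decoding the encoded word gives the original string back.
decoded = str(make_header(decode_header(encoded)))
print(decoded == name)
```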

Regards,
Martin

From jimjjewett at gmail.com  Fri Aug 17 19:27:57 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Fri, 17 Aug 2007 13:27:57 -0400
Subject: [Python-3000] PEP 3101 clarification requests
Message-ID: <fb6fbf560708171027h1aa2e8f2pecbafde5235ea164@mail.gmail.com>

The PEP says:

  The general form of a standard format specifier is:

        [[fill]align][sign][width][.precision][type]

but then says:

    A zero fill character without an alignment flag
    implies an alignment type of '='.

In the above form, how can you get a fill character without an
alignment character?  And why would you want to support it; It just
makes the width look like an (old-style) octal.  (The spec already
says that you need the alignment if you use a digit other than zero.)
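
For reference, this is how the zero-fill rule behaves in the format mini-language of current Python releases, which may not match the draft wording quoted above. A leading 0 before the width acts like fill '0' with alignment '=', so the padding goes between the sign and the digits:

```python
implicit = format(-42, "06")    # zero flag, no explicit alignment
explicit = format(-42, "0=6")   # fill '0' with '=' alignment spelled out
right = format(-42, "0>6")      # fill '0' with ordinary right alignment
plain = format(-42, "6")        # width only, default space fill

for label, text in [("06", implicit), ("0=6", explicit),
                    ("0>6", right), ("6", plain)]:
    print(label.ljust(4), repr(text))
```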

--------------

The explicit conversion flag is limited to "r" and "s", but I assume
that can be overridden in a Formatter subclass.  That possibility
might be worth mentioning explicitly.

--------------


    'check_unused_args' is used to implement checking
    for unused arguments ... The intersection of these two
    sets will be the set of unused args.

Huh?  I *think* the actual intent is (args union kwargs)-used.  I
can't find an intersection in there.
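
A minimal sketch of the set arithmetic, written against string.Formatter as it exists in today's standard library. StrictFormatter is a hypothetical subclass for illustration; the PEP text quoted above may describe a slightly different interface:

```python
import string


class StrictFormatter(string.Formatter):
    """Reject arguments that the format string never references."""

    def check_unused_args(self, used_args, args, kwargs):
        # Everything supplied: positional indices plus keyword names.
        supplied = set(range(len(args))) | set(kwargs)
        # A set difference, not an intersection: supplied minus used.
        unused = supplied - set(used_args)
        if unused:
            raise ValueError("unused arguments: "
                             + repr(sorted(unused, key=str)))


fmt = StrictFormatter()
print(fmt.format("{0} and {name}", "spam", name="eggs"))

try:
    fmt.format("{0}", "spam", "extra")
    complained = False
except ValueError as exc:
    complained = True
    print("ValueError:", exc)
```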

-----------------
    This can easily be done by overriding get_named() as follows:

I assume that should be get_value.

    class NamespaceFormatter(Formatter):
          def __init__(self, namespace={}, flags=0):
              Formatter.__init__(self, flags)

but the Formatter class took no init parameters -- should flags be
added to the Formatter constructor, or taken out of here?

The get_value override can be expressed more simply as

    def get_value(self, key, args, kwds):
        try:
            # simplify even more by assuming PEP 3135?
            return super(NamespaceFormatter, self).get_value(key, args, kwds)
        except KeyError:
            return self.namespace[key]

The example usage then takes globals()...

        fmt = NamespaceFormatter(globals())
        greeting = "hello"
        print(fmt("{greeting}, world!"))

Is there now a promise that the objects returned by locals() and
globals() will be "live", so that they would reflect the new value of
"greeting", even though it was set after the Formatter was created?

-jJ
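
A runnable sketch of the NamespaceFormatter idea against today's string.Formatter, whose constructor takes no flags argument, so that parameter is simply left out here. A plain dict stands in for globals(). At module level, globals() returns the module's own dictionary, so it behaves like the dict below; no such assumption is made here about locals().

```python
import string


class NamespaceFormatter(string.Formatter):
    def __init__(self, namespace=None):
        super().__init__()
        # Keep a reference to the caller's mapping rather than a copy.
        self.namespace = {} if namespace is None else namespace

    def get_value(self, key, args, kwargs):
        try:
            # Explicit arguments win over the namespace.
            return super().get_value(key, args, kwargs)
        except KeyError:
            return self.namespace[key]


namespace = {}
fmt = NamespaceFormatter(namespace)
namespace["greeting"] = "hello"   # set after the formatter was created
print(fmt.format("{greeting}, world!"))
print(fmt.format("{greeting}, world!", greeting="goodbye"))
```

Because the formatter holds a reference to the mapping, a value added after construction is still found.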

From martin at v.loewis.de  Fri Aug 17 19:36:09 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 17 Aug 2007 19:36:09 +0200
Subject: [Python-3000] AtheOS?
In-Reply-To: <18117.54431.306667.256066@montanaro.dyndns.org>
References: <18117.40043.124714.520626@montanaro.dyndns.org>
	<ca471dc20708170732v7152a171ge8652dd217dc420d@mail.gmail.com>
	<46C5CD81.1040903@v.loewis.de>
	<18117.54431.306667.256066@montanaro.dyndns.org>
Message-ID: <46C5DC89.5040207@v.loewis.de>

>     Martin> It took some effort to integrate this for 2.3, so I feel sad
>     Martin> that this is now all ripped out again. I'm not certain the code
>     Martin> gets cleaner that way - just smaller.
> 
> Well, fewer #ifdefs can't be a bad thing.

By that principle, it would be best if Python supported only a single
platform - I would choose Linux (that would also save me creating
Windows binaries :-)

Fewer ifdefs are a bad thing if they also go along with less
functionality, or worse portability. As I said, people contributed their
time to write this code (in this case, it took me several hours of work,
to understand and adjust the patch being contributed), and I do find
it bad, in principle, that this work is now all declared wasted.

I'm in favor of removing code that is clearly not needed anymore,
and I (sadly) agree to removing the AtheOS port - although it's not
obvious to me that it isn't needed anymore.

My only plea is that PEP 11 gets followed strictly, i.e. that code
is only removed after users of a platform have been given the chance
to object. If it isn't followed, I better withdraw it (I notice that
AtheOS is listed for "unsupported" status in 2.6, so in this case,
it's fine).

Regards,
Martin

From steven.bethard at gmail.com  Fri Aug 17 20:13:05 2007
From: steven.bethard at gmail.com (Steven Bethard)
Date: Fri, 17 Aug 2007 12:13:05 -0600
Subject: [Python-3000] format() method and % operator
In-Reply-To: <200708171123.00674.victor.stinner@haypocalc.com>
References: <200708171123.00674.victor.stinner@haypocalc.com>
Message-ID: <d11dcfba0708171113m63445f36leec51168a8e2219c@mail.gmail.com>

On 8/17/07, Victor Stinner <victor.stinner at haypocalc.com> wrote:
> I didn't read the PEP nor all email discussions. So can you tell me if it
> would be possible to write simply:
>    "{} {}".format('Hello', 'World')

If you really want to write this, I suggest adding the following
helper function to your code somewhere::

    >>> import itertools, re
    >>> def fix_format(fmt):
    ...     def get_index(match, indices=itertools.count()):
    ...         return str(next(indices))
    ...     return re.sub(r'(?<={)(?=})', get_index, fmt)
    ...
    >>> fix_format('{} {}')
    '{0} {1}'
    >>> fix_format('{} {} blah {}')
    '{0} {1} blah {2}'

That way, if you really want to bypass the precautions that the new
format strings try to take for you, you can do it in only four lines
of code.

STeVe
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From eric+python-dev at trueblade.com  Fri Aug 17 21:03:46 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Fri, 17 Aug 2007 15:03:46 -0400
Subject: [Python-3000] Looking for advice on PEP 3101 implementation details
Message-ID: <46C5F112.5050101@trueblade.com>

I'm refactoring the sandbox implementation, and I need to add the code 
that parses the standard format specifiers to py3k.  Since strings, 
ints, and floats share the same format specifiers, I want to have only a 
single parser.

My first question is:  where should this parser code live?  Should I 
create a file Python/format.c, or is there a better place?  Should the 
.h file be Include/format.h?


I also need to have C code that is called by both str.format, and that 
is also used by the Formatter implementation.

So my second question is:  should I create a Module/_format.c for this 
code?  And why do some of these modules have leading underscores?  Is it 
a problem if str.format uses code in Module/_format.c?  Where would the 
.h file for this code go, if str.format (implemented in unicodeobject.c) 
needs to get access to it?

Thanks for your help, and ongoing patience with a Python internals 
newbie (but C/C++ veteran).

Eric.


PS: I realize that both of my questions have multiple parts.  Sorry if 
that's confusing.


From barry at python.org  Fri Aug 17 21:03:11 2007
From: barry at python.org (Barry Warsaw)
Date: Fri, 17 Aug 2007 15:03:11 -0400
Subject: [Python-3000] should rfc822 accept text io or binary io?
In-Reply-To: <46C5D94A.4030808@v.loewis.de>
References: <e8bf7a530708060630x699a44aaqa940cfe96a0e158b@mail.gmail.com>	<18103.34967.170146.660275@montanaro.dyndns.org>	<3CC0BA3B-38D2-44F6-A230-E3B59C1016DE@acm.org>	<e8bf7a530708070852g3877bfbayc7f722e9ee8d0c0c@mail.gmail.com>	<ca471dc20708070951s4492aa25hd84cf40ef8a5df53@mail.gmail.com>	<e8bf7a530708071038o25fb251au3f742c4019ee786e@mail.gmail.com>	<ca471dc20708071052j1dc74dber6cb7eb890c31668b@mail.gmail.com>
	<e8bf7a530708071131r257c3506tbb163df1097b4e02@mail.gmail.com>
	<46C5D94A.4030808@v.loewis.de>
Message-ID: <786213B3-2067-41DA-9EE0-75ECB78B240A@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 17, 2007, at 1:22 PM, Martin v. Löwis wrote:

> A library implementing that in Python should certainly use bytes
> at the network (stream) side, and strings at the application side.
> Even though the format is human-readable, the protocol is
> byte-oriented, not character-oriented.

I should point out that the email package does not really intend to  
operate from the wire endpoints.  For example, for sending a message  
over an smtp connection it expects that something like smtplib would  
properly transform \n to \r\n as required by RFC 2821.  It's a bit  
dicier on the input side since you could envision a milter or  
something taking an on-the-wire email representation from an smtpd  
and parsing it into an internal representation.

As I'm working on the email package I'm realizing that classes like  
the parser and generator need to be stricter about how they interpret  
their input, and that both use cases are reasonable in many  
situations.  Sometimes you want the parser to accept strings, but  
bytes are not always unreasonable.  Similarly with generating  
output.  Internally though, I feel fairly strongly that an email  
message should be represented as strings, though sometimes (certainly  
for idempotency) you still need to carry around the charset (i.e.  
encoding).  Headers are an example of this.

The email package conflates 8-bit strings and bytes all over the  
place and I'm trying now to make its semantics much clearer.   
Ideally, the package would be well suited not only for wire-to-wire  
and all-internal uses, but also related domains like HTTP and other  
RFC 2822-like contexts.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRsXw8HEjvBPtnXfVAQIjrAP+LJ5X3CPqYYMpTZHl3WQeMPq1p4SA36yo
exM518OJl/10i5DGDCxnwdVylnlQpvKG+wnjNCXSdfEf9O/Fk63tDrpGqlGBNBkx
lNGcHl/s2b+vMm8uhkqu0d1wjOo90od8HFtMA3Y1iSsJw73F4/6sZ7XPR6ERd0yU
o1EIR1sHuwE=
=pE1O
-----END PGP SIGNATURE-----

From brett at python.org  Fri Aug 17 21:23:46 2007
From: brett at python.org (Brett Cannon)
Date: Fri, 17 Aug 2007 12:23:46 -0700
Subject: [Python-3000] AtheOS?
In-Reply-To: <18117.40043.124714.520626@montanaro.dyndns.org>
References: <18117.40043.124714.520626@montanaro.dyndns.org>
Message-ID: <bbaeab100708171223y30c0f1f7v30302c582bf4d16b@mail.gmail.com>

On 8/17/07, skip at pobox.com <skip at pobox.com> wrote:
> I just got rid of BeOS and RiscOS.

Just so you know, Skip, BeOS still has a maintainer on the 2.x branch.
 Whether we want to continue support past 2.x is another question (as
Guido says in another email, it's a hassle and so we should try to
minimize the OS support to those that follow convention).

>  I'm about to launch into Irix and Tru64,
> the other two listed on
>
>     http://wiki.python.org/moin/Py3kDeprecated
>
> I wonder, should AtheOS support be removed as well?  According to Wikipedia
> it's no longer being developed, having been superceded by something called
> Syllable.  According to the Syllable Wikipedia page, it supports Python.

AtheOS has been slated for removal after 2.6 already, so you should be
able to get rid of it.  I couldn't get a hold of a maintainer for it.

-Brett

From guido at python.org  Fri Aug 17 21:36:37 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 17 Aug 2007 12:36:37 -0700
Subject: [Python-3000] Looking for advice on PEP 3101 implementation
	details
In-Reply-To: <46C5F112.5050101@trueblade.com>
References: <46C5F112.5050101@trueblade.com>
Message-ID: <ca471dc20708171236v34eeea14v5bf55e4b72e218a0@mail.gmail.com>

On 8/17/07, Eric Smith <eric+python-dev at trueblade.com> wrote:
> I'm refactoring the sandbox implementation, and I need to add the code
> that parses the standard format specifiers to py3k.  Since strings,
> ints, and floats share the same format specifiers, I want to have only a
> single parser.

Really? Strings support only a tiny subset of the numeric
mini-language (only [-]N[.N]).

> My first question is:  where should this parser code live?  Should I
> create a file Python/format.c, or is there a better place?  Should the
> .h file be Include/format.h?

Is it only callable from C? Or is it also callable from Python? If so,
how would Python access it?

> I also need to have C code that is called by both str.format, and that
> is also used by the Formatter implementation.
>
> So my second question is:  should I create a Modules/_format.c for this
> code?  And why do some of these modules have leading underscores?  Is it
> a problem if str.format uses code in Modules/_format.c?  Where would the
> .h file for this code go, if str.format (implemented in unicodeobject.c)
> needs to get access to it?
>
> Thanks for your help, and ongoing patience with a Python internals
> newbie (but C/C++ veteran).

Unless the plan is for it to be importable from Python, it should not
live in Modules. Modules with an underscore are typically imported
only by a "wrapper" .py module (e.g. _hashlib.c vs. hashlib.py).
Modules without an underscore are for direct import (though there are
a few legacy exceptions, e.g. socket.c should really be _socket.c).

Putting it in Modules makes it harder to access from C, as those
modules are dynamically loaded. If you can't put it in floatobject.c,
and it's not for import, you could create a new file under Python/.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From eric+python-dev at trueblade.com  Fri Aug 17 22:14:31 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Fri, 17 Aug 2007 16:14:31 -0400
Subject: [Python-3000] Looking for advice on PEP 3101 implementation
	details
In-Reply-To: <ca471dc20708171236v34eeea14v5bf55e4b72e218a0@mail.gmail.com>
References: <46C5F112.5050101@trueblade.com>
	<ca471dc20708171236v34eeea14v5bf55e4b72e218a0@mail.gmail.com>
Message-ID: <46C601A7.6050408@trueblade.com>

Guido van Rossum wrote:
> On 8/17/07, Eric Smith <eric+python-dev at trueblade.com> wrote:
>> I'm refactoring the sandbox implementation, and I need to add the code
>> that parses the standard format specifiers to py3k.  Since strings,
>> ints, and floats share the same format specifiers, I want to have only a
>> single parser.
> 
> Really? Strings support only a tiny subset of the numeric
> mini-language (only [-]N[.N]).

I think strings are:
[[fill]align][width][.precision][type]

ints are:
[[fill]align][sign][width][type]

floats are the full thing:
[[fill]align][sign][width][.precision][type]

They seem similar enough that a single parser would make sense.  Is it 
acceptable to put this parser in unicodeobject.c, and have it callable by
floatobject.c and longobject.c?  I'm okay with that, I just want to make 
sure I'm not violating some convention that objects don't call into each 
other's implementation files.
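
To make the "single parser" idea concrete, here is a minimal sketch in
Python rather than C.  It is only an illustration under stated
assumptions, not the sandbox code: the names parse_spec and ALLOWED are
hypothetical, and the alignment characters (<, >, =, ^) and sign
characters (+, -, space) are assumed from the PEP 3101 draft.  One
grammar covers the full float form, and a per-type table rejects the
fields that strings and ints do not accept.

```python
import re

# One grammar for the full form: [[fill]align][sign][width][.precision][type]
_SPEC = re.compile(
    r"(?:(?P<fill>.)?(?P<align>[<>=^]))?"
    r"(?P<sign>[-+ ])?"
    r"(?P<width>\d+)?"
    r"(?:\.(?P<precision>\d+))?"
    r"(?P<type>[A-Za-z%])?",
    re.DOTALL,
)

# Which fields each kind of object accepts, following the three lists above.
ALLOWED = {
    "str": {"fill", "align", "width", "precision", "type"},
    "int": {"fill", "align", "sign", "width", "type"},
    "float": {"fill", "align", "sign", "width", "precision", "type"},
}


def parse_spec(spec, kind):
    """Parse spec once, then reject fields that kind does not support."""
    match = _SPEC.fullmatch(spec)
    if match is None:
        raise ValueError("invalid format specifier: %r" % (spec,))
    # Keep only the fields that were actually present in the specifier.
    fields = {k: v for k, v in match.groupdict().items() if v is not None}
    extra = sorted(set(fields) - ALLOWED[kind])
    if extra:
        raise ValueError("%s does not accept: %s" % (kind, ", ".join(extra)))
    return fields


print(parse_spec("*<10.3f", "float"))
print(parse_spec("+5d", "int"))
```

The design point is that the grammar lives in one place and the
per-type differences reduce to a small table, which is roughly what a
shared C parser plus per-type validation would look like.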

>> My first question is:  where should this parser code live?  Should I
>> create a file Python/format.c, or is there a better place?  Should the
>> .h file be Include/format.h?
> 
> Is it only callable from C? Or is it also callable from Python? If so,
> how would Python access it?

I think the parser only needs to be callable from C.

>> I also need to have C code that is called by both str.format, and that
>> is also used by the Formatter implementation.
>>
>> So my second question is:  should I create a Modules/_format.c for this
>> code?  And why do some of these modules have leading underscores?  Is it
>> a problem if str.format uses code in Modules/_format.c?  Where would the
>> .h file for this code go, if str.format (implemented in unicodeobject.c)
>> needs to get access to it?
>>
>> Thanks for your help, and ongoing patience with a Python internals
>> newbie (but C/C++ veteran).
> 
> Unless the plan is for it to be importable from Python, it should not
> live in Modules. Modules with an underscore are typically imported
> only by a "wrapper" .py module (e.g. _hashlib.c vs. hashlib.py).
> Modules without an underscore are for direct import (though there are
> a few legacy exceptions, e.g. socket.c should really be _socket.c).

The PEP calls for a string.Formatter class, that is subclassable in 
Python code.  I was originally thinking that this class would be written 
in Python, but now I'm not so sure.  Let me digest your answers here and 
I'll re-read the PEP, and see where it takes me.

> Putting it in Modules makes it harder to access from C, as those
> modules are dynamically loaded. If you can't put it in floatobject.c,
> and it's not for import, you could create a new file under Python/.

Okay.  Thanks for the help.

Eric.


From skip at pobox.com  Fri Aug 17 22:17:19 2007
From: skip at pobox.com (skip at pobox.com)
Date: Fri, 17 Aug 2007 15:17:19 -0500
Subject: [Python-3000] AtheOS?
In-Reply-To: <bbaeab100708171223y30c0f1f7v30302c582bf4d16b@mail.gmail.com>
References: <18117.40043.124714.520626@montanaro.dyndns.org>
	<bbaeab100708171223y30c0f1f7v30302c582bf4d16b@mail.gmail.com>
Message-ID: <18118.591.126364.329836@montanaro.dyndns.org>


    Brett> On 8/17/07, skip at pobox.com <skip at pobox.com> wrote:
    >> I just got rid of BeOS and RiscOS.

    Brett> Just so you know, Skip, BeOS still has a maintainer on the 2.x
    Brett> branch.  Whether we want to continue support past 2.x is another
    Brett> question (as Guido says in another email, it's a hassle and so we
    Brett> should try to minimize the OS support to those that follow
    Brett> convention).

I was going by the list on the wiki.  BeOS was on that list.  I removed it
in a single checkin so if it's decided in the near future to put it back
that should be easy to do.

    >> I'm about to launch into Irix and Tru64,
    >> the other two listed on
    >> 
    >> http://wiki.python.org/moin/Py3kDeprecated
    >> 
    >> I wonder, should AtheOS support be removed as well?  According to
    >> Wikipedia it's no longer being developed, having been superseded by
    >> something called Syllable.  According to the Syllable Wikipedia page,
    >> it supports Python.

    Brett> AtheOS has been slated for removal after 2.6 already, so you
    Brett> should be able to get rid of it.  I couldn't get a hold of a
    Brett> maintainer for it.

I'm curious about this Syllable thing.  It is a fork of AtheOS, appears
to be currently maintained, and advertises that Python is supported.  I
posted a note to a discussion forum asking whether Syllable relies on the
AtheOS bits:

    http://www.syllable.org/discussion.php?topic_id=2320

and got a reply back saying that yes, they do use the AtheOS stuff.  I will
hold off on removing the AtheOS bits until we clear up things with the
Syllable folks.

Skip

From alexandre at peadrop.com  Fri Aug 17 22:50:24 2007
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Fri, 17 Aug 2007 16:50:24 -0400
Subject: [Python-3000] AtheOS?
In-Reply-To: <18117.54431.306667.256066@montanaro.dyndns.org>
References: <18117.40043.124714.520626@montanaro.dyndns.org>
	<ca471dc20708170732v7152a171ge8652dd217dc420d@mail.gmail.com>
	<46C5CD81.1040903@v.loewis.de>
	<18117.54431.306667.256066@montanaro.dyndns.org>
Message-ID: <acd65fa20708171350v343c0814p59053e150f093efa@mail.gmail.com>

[disclaimer: I am a clueless newbie in the portability area.]

On 8/17/07, skip at pobox.com <skip at pobox.com> wrote:
>
>     Martin> It took some effort to integrate this for 2.3, so I feel sad
>     Martin> that this is now all ripped out again. I'm not certain the code
>     Martin> gets cleaner that way - just smaller.
>
> Well, fewer #ifdefs can't be a bad thing.
>

Perhaps it would be a good idea to take Plan9's approach to
portability -- i.e., you develop an extreme allergy to code filled
with #if, #ifdef, #else, #elif; localize system dependencies in
separate files and hide them behind interfaces.

By the way, there is a great chapter about portability in The Practice
of Programming, by Brian W. Kernighan and Rob Pike
(http://plan9.bell-labs.com/cm/cs/tpop/). That is where I first
learned about this approach.

-- Alexandre

From janssen at parc.com  Fri Aug 17 22:54:38 2007
From: janssen at parc.com (Bill Janssen)
Date: Fri, 17 Aug 2007 13:54:38 PDT
Subject: [Python-3000] should rfc822 accept text io or binary io?
In-Reply-To: <786213B3-2067-41DA-9EE0-75ECB78B240A@python.org> 
References: <e8bf7a530708060630x699a44aaqa940cfe96a0e158b@mail.gmail.com>
	<18103.34967.170146.660275@montanaro.dyndns.org>
	<3CC0BA3B-38D2-44F6-A230-E3B59C1016DE@acm.org>
	<e8bf7a530708070852g3877bfbayc7f722e9ee8d0c0c@mail.gmail.com>
	<ca471dc20708070951s4492aa25hd84cf40ef8a5df53@mail.gmail.com>
	<e8bf7a530708071038o25fb251au3f742c4019ee786e@mail.gmail.com>
	<ca471dc20708071052j1dc74dber6cb7eb890c31668b@mail.gmail.com>
	<e8bf7a530708071131r257c3506tbb163df1097b4e02@mail.gmail.com>
	<46C5D94A.4030808@v.loewis.de>
	<786213B3-2067-41DA-9EE0-75ECB78B240A@python.org>
Message-ID: <07Aug17.135444pdt."57996"@synergy1.parc.xerox.com>

> Ideally, the package would be well suited not only for wire-to-wire
> and all-internal uses, but also related domains like HTTP and other
> RFC 2822-like contexts.

But that's exactly why the internal representation should be bytes,
not strings.  HTTP's use of MIME, for instance, uses "binary" quite a
lot.

> Internally though, I feel fairly strongly that an email 
> message should be represented as strings, though sometimes (certainly 
> for idempotency) you still need to carry around the charset (i.e.
> encoding).

What if you've got a PNG as one of the multipart components?  With a
Content-Transfer-Encoding of "binary"?  There's no way to represent that
as a string.

I wonder if we're misunderstanding each other here.  The "mail
message" itself is essentially a binary data structure, not a sequence
of strings, though many of its fields consist of carefully specified
string values.  Is that what you're saying?  That when decoding the
message, the fields which are string-valued should be kept as strings
in the internal Python representation of the message?

Bill


From guido at python.org  Fri Aug 17 22:58:51 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 17 Aug 2007 13:58:51 -0700
Subject: [Python-3000] Looking for advice on PEP 3101 implementation
	details
In-Reply-To: <46C601A7.6050408@trueblade.com>
References: <46C5F112.5050101@trueblade.com>
	<ca471dc20708171236v34eeea14v5bf55e4b72e218a0@mail.gmail.com>
	<46C601A7.6050408@trueblade.com>
Message-ID: <ca471dc20708171358o33543610i3d2b6f551db5c80f@mail.gmail.com>

On 8/17/07, Eric Smith <eric+python-dev at trueblade.com> wrote:
> Guido van Rossum wrote:
> > On 8/17/07, Eric Smith <eric+python-dev at trueblade.com> wrote:
> >> I'm refactoring the sandbox implementation, and I need to add the code
> >> that parses the standard format specifiers to py3k.  Since strings,
> >> ints, and floats share the same format specifiers, I want to have only a
> >> single parser.
> >
> > Really? Strings support only a tiny subset of the numeric
> > mini-language (only [-]N[.N]).
>
> I think strings are:
> [[fill]align][width][.precision][type]

The fill doesn't do anything in 2.x.

> ints are:
> [[fill]align][sign][width][type]

I thought the sign came first. But it appears both orders are accepted.

> floats are the full thing:
> [[fill]align][sign][width][.precision][type]
>
> They seem similar enough that a single parser would make sense.  Is it
> acceptable to put this parser in unicodeobject.c, and have it callable by
> floatobject.c and longobject.c?  I'm okay with that, I just want to make
> sure I'm not violating some convention that objects don't call into each
> other's implementation files.

Sure, that's fine.

> >> My first question is:  where should this parser code live?  Should I
> >> create a file Python/format.c, or is there a better place?  Should the
> >> .h file be Include/format.h?
> >
> > Is it only callable from C? Or is it also callable from Python? If so,
> > how would Python access it?
>
> I think the parser only needs to be callable from C.

Great.

> >> I also need to have C code that is called by both str.format, and that
> >> is also used by the Formatter implementation.
> >>
> >> So my second question is:  should I create a Modules/_format.c for this
> >> code?  And why do some of these modules have leading underscores?  Is it
> >> a problem if str.format uses code in Modules/_format.c?  Where would the
> >> .h file for this code go, if str.format (implemented in unicodeobject.c)
> >> needs to get access to it?
> >>
> >> Thanks for your help, and ongoing patience with a Python internals
> >> newbie (but C/C++ veteran).
> >
> > Unless the plan is for it to be importable from Python, it should not
> > live in Modules. Modules with an underscore are typically imported
> > only by a "wrapper" .py module (e.g. _hashlib.c vs. hashlib.py).
> > Modules without an underscore are for direct import (though there are
> > a few legacy exceptions, e.g. socket.c should really be _socket.c).
>
> The PEP calls for a string.Formatter class, that is subclassable in
> Python code.  I was originally thinking that this class would be written
> in Python, but now I'm not so sure.  Let me digest your answers here and
> I'll re-read the PEP, and see where it takes me.

Also talk to Talin, we had long discussions about this at some point.
I think the Formatter class can be written in Python, because none of
the C code involved in the built-in format() needs it.

> > Putting it in Modules makes it harder to access from C, as those
> > modules are dynamically loaded. If you can't put it in floatobject.c,
> > and it's not for import, you could create a new file under Python/.
>
> Okay.  Thanks for the help.

You're welcome.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From rrr at ronadam.com  Fri Aug 17 23:03:59 2007
From: rrr at ronadam.com (Ron Adam)
Date: Fri, 17 Aug 2007 16:03:59 -0500
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <46C57C62.3040000@trueblade.com>
References: <46C2809C.3000806@acm.org>	<46C3C1DE.6070302@cs.rmit.edu.au>	<18116.17176.123168.265491@montanaro.dyndns.org>	<fa256g$orv$1@sea.gmane.org>	<d11dcfba0708161147x24f0a432o75a811371210961d@mail.gmail.com>	<18116.43765.194435.952513@montanaro.dyndns.org>	<46C4D13B.8070608@v.loewis.de>	<07Aug16.161620pdt."57996"@synergy1.parc.xerox.com>	<46C526C5.20906@v.loewis.de>
	<46C536ED.6060500@ronadam.com> <46C57C62.3040000@trueblade.com>
Message-ID: <46C60D3F.1000808@ronadam.com>



Eric Smith wrote:
> Ron Adam wrote:
>>
>> Martin v. Löwis wrote:
>>> Bill Janssen schrieb:
>>>>> I think most of these points are irrelevant. The curly braces are not
>>>>> just syntactic sugar, at least the opening brace is not; the digit
>>>>> is not syntactic sugar in the case of message translations.
>>>> Are there "computation of matching braces" problems here?
>>> I don't understand: AFAIK, the braces don't nest, so the closing
>>> brace just marks the end of the place holder (which in the printf
>>> format is defined by the type letter).
> 
>> So expressions like the following might be difficult to spell.
>>
>>       '{{foo}{bar}}'.format(foo='FOO', bar='BAR', FOOBAR = "Fred")
>>
>> This would probably produce an unmatched brace error on the first '}'.
> 
> Ah, I see.  I hadn't thought of that case.  You're correct, it gives an 
> error on the first '}'.  This is a case where allowing whitespace would 
> solve the problem, sort of like C++'s "< <" template issue (which I 
> think they've since addressed).  I'm not sure if it's worth doing, though:
> 
> '{ {foo}{bar} }'.format(foo='FOO', bar='BAR', FOOBAR = "Fred")
> 
> On second thought, that won't work.  For example, this currently doesn't 
> work:
> '{0[{foo}{bar}]}'.format({'FOOBAR': 'abc'}, foo='FOO', bar='BAR')
> KeyError: 'FOO'
> 
> I can't decide if that's a bug or not.

I think escaping the braces with '\' would work better.

I used the following to test the idea and it seems to work and should 
convert to C without any trouble.  So to those who can say, would something 
like this be an ok solution?

     def vformat(self, format_string, args, kwargs):
         # Needs unused args check code.
         while 1:
             front, field, back = self._get_inner_field(format_string)
             if not field:
                 break
             key, sep, spec = field.partition(':')
             value = self.get_value(key, args, kwargs)
             result = self.format_field(value, spec)
             format_string = front + result + back
         return format_string.replace('\\{', '{').replace('\\}', '}')

     def _get_inner_field(self, s):
         # Get an inner most field from right to left.
         end = 0
         while end < len(s):
             if s[end] == '}' and not self._is_escaped(s, end, '}'):
                 break
             end += 1
         if end == len(s):
             return s, '', ''
         start = end - 1
         while start >= 0:
             if s[start] == '{' and not self._is_escaped(s, start, '{'):
                 break
             start -= 1
         if start < 0:
             raise ValueError("mismatched braces")
         return s[:start], s[start+1:end], s[end+1:]

     def _is_escaped(self, s, i, char):
         # Determine if the char is escaped with '\'.
         if s[i] != char or i == 0:
             return False
         i -= 1
         n = 0
         while i >= 0 and s[i] == '\\':
             n += 1
             i -= 1
         return n % 2 == 1
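
For anyone who wants to try the idea without a Formatter class around
it, the same approach can be sketched as standalone functions.  This is
a hedged restatement, not part of any implementation: the names
is_escaped, inner_field and expand are hypothetical, values are looked
up only in keyword arguments, and the built-in format() stands in for
format_field.

```python
def is_escaped(s, i):
    """True if s[i] is preceded by an odd number of backslashes."""
    n = 0
    i -= 1
    while i >= 0 and s[i] == '\\':
        n += 1
        i -= 1
    return n % 2 == 1


def inner_field(s):
    """Split s around its first innermost {field}; field is '' if none."""
    end = 0
    while end < len(s) and not (s[end] == '}' and not is_escaped(s, end)):
        end += 1
    if end == len(s):
        return s, '', ''
    # Walk back from the closing brace to the nearest unescaped opener.
    start = end - 1
    while start >= 0 and not (s[start] == '{' and not is_escaped(s, start)):
        start -= 1
    if start < 0:
        raise ValueError("mismatched braces")
    return s[:start], s[start + 1:end], s[end + 1:]


def expand(template, **kwargs):
    """Repeatedly substitute innermost fields, then unescape braces."""
    while True:
        front, field, back = inner_field(template)
        if not field:
            break
        key, _, spec = field.partition(':')
        template = front + format(kwargs[key], spec) + back
    return template.replace('\\{', '{').replace('\\}', '}')


print(expand('{{foo}{bar}}', foo='FOO', bar='BAR', FOOBAR='Fred'))  # Fred
print(expand(r'\{{x}\}', x=1))                                      # {1}
```

One consequence of substituting in place is that a value which itself
contains braces gets rescanned on the next pass, so whether that is
wanted would need to be decided before porting this to C.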





From brett at python.org  Fri Aug 17 23:06:48 2007
From: brett at python.org (Brett Cannon)
Date: Fri, 17 Aug 2007 14:06:48 -0700
Subject: [Python-3000] AtheOS?
In-Reply-To: <18118.591.126364.329836@montanaro.dyndns.org>
References: <18117.40043.124714.520626@montanaro.dyndns.org>
	<bbaeab100708171223y30c0f1f7v30302c582bf4d16b@mail.gmail.com>
	<18118.591.126364.329836@montanaro.dyndns.org>
Message-ID: <bbaeab100708171406k6b3d0ffeue2cf3b6fc4698f01@mail.gmail.com>

On 8/17/07, skip at pobox.com <skip at pobox.com> wrote:
>
>     Brett> On 8/17/07, skip at pobox.com <skip at pobox.com> wrote:
>     >> I just got rid of BeOS and RiscOS.
>
>     Brett> Just so you know, Skip, BeOS still has a maintainer on the 2.x
>     Brett> branch.  Whether we want to continue support past 2.x is another
>     Brett> question (as Guido says in another email, it's a hassle and so we
>     Brett> should try to minimize the OS support to those that follow
>     Brett> convention).
>
> I was going by the list on the wiki.  BeOS was on that list.

Don't know who created the list.

> I removed it
> in a single checkin so if it's decided in the near future to put it back
> that should be easy to do.
>

OK.  I will contact the BeOS maintainer to see if they are up for
doing Python 3.0 as well.

-Brett

From jeremy at alum.mit.edu  Fri Aug 17 23:17:44 2007
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Fri, 17 Aug 2007 17:17:44 -0400
Subject: [Python-3000] should rfc822 accept text io or binary io?
In-Reply-To: <-7726188769332043533@unknownmsgid>
References: <e8bf7a530708060630x699a44aaqa940cfe96a0e158b@mail.gmail.com>
	<3CC0BA3B-38D2-44F6-A230-E3B59C1016DE@acm.org>
	<e8bf7a530708070852g3877bfbayc7f722e9ee8d0c0c@mail.gmail.com>
	<ca471dc20708070951s4492aa25hd84cf40ef8a5df53@mail.gmail.com>
	<e8bf7a530708071038o25fb251au3f742c4019ee786e@mail.gmail.com>
	<ca471dc20708071052j1dc74dber6cb7eb890c31668b@mail.gmail.com>
	<e8bf7a530708071131r257c3506tbb163df1097b4e02@mail.gmail.com>
	<46C5D94A.4030808@v.loewis.de>
	<786213B3-2067-41DA-9EE0-75ECB78B240A@python.org>
	<-7726188769332043533@unknownmsgid>
Message-ID: <e8bf7a530708171417r4c0343e9q262813c5bf984dc4@mail.gmail.com>

On 8/17/07, Bill Janssen <janssen at parc.com> wrote:
> > Ideally, the package would be well suited not only for wire-to-wire
> > and all-internal uses, but also related domains like HTTP and other
> > RFC 2822-like contexts.
>
> But that's exactly why the internal representation should be bytes,
> not strings.  HTTP's use of MIME, for instance, uses "binary" quite a
> lot.

In the specific case of HTTP, it certainly looks like the headers are
represented on the wire as 7-bit ASCII and could be treated as bytes
or strings by the header processing code it uses via rfc822.py.  The
actual body of the response should still be represented as bytes,
which can be converted to strings by the application.

I assume the current rfc822 handling means that MIME-encoded binary
data in HTTP headers will come back as un-decoded strings.  (But I'm
not sure.)  We don't have any tests in the httplib code for these
cases.  I would expect an application would prefer bytes for the
un-decoded data or strings for the decoded data.  Will email / rfc822
support this?

Jeremy

> > Internally though, I feel fairly strongly that an email
> > message should be represented as strings, though sometimes (certainly
> > for idempotency) you still need to carry around the charset (i.e.
> > encoding).
>
> What if you've got a PNG as one of the multipart components?  With a
> Content-Transfer-Encoding of "binary"?  There's no way to represent that
> as a string.
>
> I wonder if we're misunderstanding each other here.  The "mail
> message" itself is essentially a binary data structure, not a sequence
> of strings, though many of its fields consist of carefully specified
> string values.  Is that what you're saying?  That when decoding the
> message, the fields which are string-valued should be kept as strings
> in the internal Python representation of the message?
>
> Bill

From martin at v.loewis.de  Fri Aug 17 23:22:41 2007
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Fri, 17 Aug 2007 23:22:41 +0200
Subject: [Python-3000] AtheOS?
In-Reply-To: <acd65fa20708171350v343c0814p59053e150f093efa@mail.gmail.com>
References: <18117.40043.124714.520626@montanaro.dyndns.org>	
	<ca471dc20708170732v7152a171ge8652dd217dc420d@mail.gmail.com>	
	<46C5CD81.1040903@v.loewis.de>	
	<18117.54431.306667.256066@montanaro.dyndns.org>
	<acd65fa20708171350v343c0814p59053e150f093efa@mail.gmail.com>
Message-ID: <46C611A1.9080604@v.loewis.de>

> Perhaps, it would be a good idea to take Plan9's approach to
> portability -- i.e., you develop an extreme allergy to code filled
> with #if, #ifdef, #else, #elif; localize system dependencies in
> separate files and hide them behind interfaces.
> 
> By the way, there is a great chapter about portability in The Practice
> of Programming, by Brian W. Kernighan and Rob Pike
> (http://plan9.bell-labs.com/cm/cs/tpop/). That is where I first
> learned about this approach.

I'm doubtful whether that makes the code more readable, as you need
to go through layers of indirections to find the place where something
is actually implemented. In any case, contributions to apply this
strategy to selected places are welcome, assuming they don't slow down
the code too much.

Regards,
Martin

From martin at v.loewis.de  Fri Aug 17 23:26:12 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 17 Aug 2007 23:26:12 +0200
Subject: [Python-3000] should rfc822 accept text io or binary io?
In-Reply-To: <07Aug17.135444pdt."57996"@synergy1.parc.xerox.com>
References: <e8bf7a530708060630x699a44aaqa940cfe96a0e158b@mail.gmail.com>
	<18103.34967.170146.660275@montanaro.dyndns.org>
	<3CC0BA3B-38D2-44F6-A230-E3B59C1016DE@acm.org>
	<e8bf7a530708070852g3877bfbayc7f722e9ee8d0c0c@mail.gmail.com>
	<ca471dc20708070951s4492aa25hd84cf40ef8a5df53@mail.gmail.com>
	<e8bf7a530708071038o25fb251au3f742c4019ee786e@mail.gmail.com>
	<ca471dc20708071052j1dc74dber6cb7eb890c31668b@mail.gmail.com>
	<e8bf7a530708071131r257c3506tbb163df1097b4e02@mail.gmail.com>
	<46C5D94A.4030808@v.loewis.de>
	<786213B3-2067-41DA-9EE0-75ECB78B240A@python.org>
	<07Aug17.135444pdt."57996"@synergy1.parc.xerox.com>
Message-ID: <46C61274.1050501@v.loewis.de>

> What if you've got a PNG as one of the multipart components?  With a
> Content-Transfer-Encoding of "binary"?  There's no way to represent that
> as a string.

Sure is. Any byte sequence can be interpreted as latin-1.

Not that I think this would be a good thing to do.
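
The round trip can be checked directly.  This is just a small
demonstration with Python's "latin-1" codec, which maps each byte value
to the code point with the same number (the range 0x80-0x9F comes out
as the C1 control code points U+0080-U+009F); the variable names are
only for illustration.

```python
# Every possible byte value, standing in for arbitrary binary content.
data = bytes(range(256))

# Decoding never fails: byte N becomes code point N.
text = data.decode("latin-1")
assert [ord(c) for c in text] == list(range(256))

# Encoding back recovers the original bytes exactly.
assert text.encode("latin-1") == data

print(len(text), repr(text[0x80]), repr(text[0xE9]))
```

So the representation is lossless, but the resulting string says
nothing about what the bytes mean, which is the reason for doubting it
is a good thing to do.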

> I wonder if we're misunderstanding each other here.  The "mail
> message" itself is essentially a binary data structure, not a sequence
> of strings, though many of its fields consist of carefully specified
> string values.  Is that what you're saying?

I don't think so - I assume Barry really wants to use strings as the
data type to represent the internal structure. It works fine for all
aspects except for the 8bit and binary content-transfer-encodings.

Regards,
Martin

From guido at python.org  Fri Aug 17 23:44:56 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 17 Aug 2007 14:44:56 -0700
Subject: [Python-3000] AtheOS?
In-Reply-To: <46C611A1.9080604@v.loewis.de>
References: <18117.40043.124714.520626@montanaro.dyndns.org>
	<ca471dc20708170732v7152a171ge8652dd217dc420d@mail.gmail.com>
	<46C5CD81.1040903@v.loewis.de>
	<18117.54431.306667.256066@montanaro.dyndns.org>
	<acd65fa20708171350v343c0814p59053e150f093efa@mail.gmail.com>
	<46C611A1.9080604@v.loewis.de>
Message-ID: <ca471dc20708171444p3b78a482k5b987c7499d34fb7@mail.gmail.com>

On 8/17/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > Perhaps, it would be a good idea to take Plan9's approach to
> > portability -- i.e., you develop an extreme allergy to code filled
> > with #if, #ifdef, #else, #elif; localize system dependencies in
> > separate files and hide them behind interfaces.
> >
> > By the way, there is a great chapter about portability in The Practice
> > of Programming, by Brian W. Kernighan and Rob Pike
> > (http://plan9.bell-labs.com/cm/cs/tpop/). That is where I first
> > learned about this approach.
>
> I'm doubtful whether that makes the code more readable, as you need
> to go through layers of indirections to find the place where something
> is actually implemented. In any case, contributions to apply this
> strategy to selected places are welcome, assuming they don't slow down
> the code too much.

We already do this, e.g. pyport.h contains lots of stuff like that,
and sometimes thinking about it some more makes it possible to move
more stuff there (or to another platform-specific file). It just
doesn't make sense to turn *every* silly little #ifdef into a system
API, no matter what my esteemed colleagues say. ;-)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From steve at holdenweb.com  Sat Aug 18 00:58:44 2007
From: steve at holdenweb.com (Steve Holden)
Date: Fri, 17 Aug 2007 18:58:44 -0400
Subject: [Python-3000] [Python-Dev] Documentation switch imminent
In-Reply-To: <acd65fa20708170928y6dc91623tefa112d4fe7528e5@mail.gmail.com>
References: <f9t2nn$ksg$1@sea.gmane.org>
	<f9v2rd$2dl$1@sea.gmane.org>	<ee2a432c0708152207j66c26fbdu6e9bcec6dae0ccf5@mail.gmail.com>	<acd65fa20708161643s6d7c0c42x135eb0957980aa14@mail.gmail.com>	<fa3egl$56d$1@sea.gmane.org>
	<acd65fa20708170928y6dc91623tefa112d4fe7528e5@mail.gmail.com>
Message-ID: <46C62824.90002@holdenweb.com>

Alexandre Vassalotti wrote:
> On 8/17/07, Georg Brandl <g.brandl at gmx.net> wrote:
[...]
> Ah, I didn't notice that index included all the documents. That
> explains the huge size increase. However, would it be possible to keep
> the indexes separated? I noticed that I find what I want more quickly when
> the indexes are separated.
> 
Which is fine when you know which section to expect to find your content 
in. But let's retain an "all-documentation" index if we can, as this is 
particularly helpful to the newcomers who aren't that familiar with the 
structure of the documentation.

>> I've now removed leading spaces in the index output, and the character
>> count is down to 850000.
>>
>>> Firefox, on my fairly recent machine, takes ~5 seconds rendering the
>>> index of the new docs from disk, compared to a fraction of a second
>>> for the old one.
>> But you're right that rendering is slow there.  It may be caused by the
>> more complicated CSS... perhaps the index should be split up in several
>> pages.
>>
> 
> I disabled CSS-support (with View->Page Style->No Style), but it
> didn't affect the initial rendering speed. However, scrolling was
> *much* faster without CSS.
> 
Probably because the positional calculations are more straightforward then.

regards
  Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC/Ltd           http://www.holdenweb.com
Skype: holdenweb      http://del.icio.us/steve.holden


From janssen at parc.com  Sat Aug 18 01:28:14 2007
From: janssen at parc.com (Bill Janssen)
Date: Fri, 17 Aug 2007 16:28:14 PDT
Subject: [Python-3000] should rfc822 accept text io or binary io?
In-Reply-To: <e8bf7a530708171417r4c0343e9q262813c5bf984dc4@mail.gmail.com> 
References: <e8bf7a530708060630x699a44aaqa940cfe96a0e158b@mail.gmail.com>
	<3CC0BA3B-38D2-44F6-A230-E3B59C1016DE@acm.org>
	<e8bf7a530708070852g3877bfbayc7f722e9ee8d0c0c@mail.gmail.com>
	<ca471dc20708070951s4492aa25hd84cf40ef8a5df53@mail.gmail.com>
	<e8bf7a530708071038o25fb251au3f742c4019ee786e@mail.gmail.com>
	<ca471dc20708071052j1dc74dber6cb7eb890c31668b@mail.gmail.com>
	<e8bf7a530708071131r257c3506tbb163df1097b4e02@mail.gmail.com>
	<46C5D94A.4030808@v.loewis.de>
	<786213B3-2067-41DA-9EE0-75ECB78B240A@python.org>
	<-7726188769332043533@unknownmsgid>
	<e8bf7a530708171417r4c0343e9q262813c5bf984dc4@mail.gmail.com>
Message-ID: <07Aug17.162825pdt."57996"@synergy1.parc.xerox.com>

> On 8/17/07, Bill Janssen <janssen at parc.com> wrote:
> > > Ideally, the package would be well suited not only for wire-to-wire
> > > and all-internal uses, but also related domains like HTTP and other
> > > RFC 2822-like contexts.
> >
> > But that's exactly why the internal representation should be bytes,
> > not strings.  HTTP's use of MIME, for instance, uses "binary" quite a
> > lot.
> 
> In the specific case of HTTP, it certainly looks like the headers are
> represented on the wire as 7-bit ASCII and could be treated as bytes
> or strings by the header processing code it uses via rfc822.py.  The
> actual body of the response should still be represented as bytes,
> which can be converted to strings by the application.

Note that, in the case of HTTP, both the request message and the
response message may contain MIME-tagged binary data.  And some of the
header values for those message types may contain arbitrary ISO-8859-1
octets, not necessarily encoded.  See sections 4.2 and 2.2 of RFC
2616.

But we're not really interested in those message headers -- that's a
consideration for the HTTP libraries.  I'm just concerned about the
MIME standard, which both HTTP and email use, though in different
ways.  The MIME processing in the "email" module must follow the MIME
spec, RFC 2045, 2046, etc., rather than assume RFC 2821 (SMTP) and RFC
2822 encoding everywhere.  SMTP is only one form of message envelope.

The important thing is that we understand that raw mail messages --
say in MH format in a file -- do not consist of "lines" of "text";
they are complicated binary data structures, often largely composed of
pieces of text encoded in very specific ways.  As such, the raw
message *must* be treated as a sequence of bytes.  And the content of
any body part may also be an arbitrary sequence of bytes (which, in an
RFC 2822 context, must be encoded into ASCII octets).  The values of
any header may be an arbitrary string in an arbitrary language in an
arbitrary character set (see RFCs 2047 and 2231), though it must be
put into the message appropriately encoded as a sequence of octets
which must be drawn from a set of octets which happens to be a subset
of the octets in ASCII.
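
A minimal sketch of that distinction, using the email package from
current Python 3 (which postdates this thread, so treat it as an
illustration rather than the design under discussion): the raw message
goes in as bytes, a header value becomes text only after RFC 2047
decoding, and a body part comes back out as bytes.  The address and the
tiny base64 payload are made up for the example.

```python
from email import message_from_bytes
from email.header import decode_header, make_header

# A raw message is a byte sequence with explicit CRLF line endings.
# The address and payload below are hypothetical.
raw = (
    b"From: =?ISO-8859-1?Q?BJ=F6rn_Lindqvist?= <bjourne@example.invalid>\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: application/octet-stream\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"iVBORw0KGgo=\r\n"
)

msg = message_from_bytes(raw)

# The header travels as ASCII octets; RFC 2047 decoding recovers the text.
sender = str(make_header(decode_header(msg["From"])))

# The body part is arbitrary bytes once the transfer encoding is undone.
payload = msg.get_payload(decode=True)

print(sender)
print(payload)
```

Here the payload happens to be the eight-byte PNG signature, which is
not text in any character set.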

Maybe all of this argues for separating "mime" and "email" into two
different packages.  And maybe renaming "email" "internet-email" or
"rfc2822-email".

Bill



From janssen at parc.com  Sat Aug 18 01:33:05 2007
From: janssen at parc.com (Bill Janssen)
Date: Fri, 17 Aug 2007 16:33:05 PDT
Subject: [Python-3000] should rfc822 accept text io or binary io?
In-Reply-To: <46C61274.1050501@v.loewis.de> 
References: <e8bf7a530708060630x699a44aaqa940cfe96a0e158b@mail.gmail.com>
	<18103.34967.170146.660275@montanaro.dyndns.org>
	<3CC0BA3B-38D2-44F6-A230-E3B59C1016DE@acm.org>
	<e8bf7a530708070852g3877bfbayc7f722e9ee8d0c0c@mail.gmail.com>
	<ca471dc20708070951s4492aa25hd84cf40ef8a5df53@mail.gmail.com>
	<e8bf7a530708071038o25fb251au3f742c4019ee786e@mail.gmail.com>
	<ca471dc20708071052j1dc74dber6cb7eb890c31668b@mail.gmail.com>
	<e8bf7a530708071131r257c3506tbb163df1097b4e02@mail.gmail.com>
	<46C5D94A.4030808@v.loewis.de>
	<786213B3-2067-41DA-9EE0-75ECB78B240A@python.org>
	<07Aug17.135444pdt."57996"@synergy1.parc.xerox.com>
	<46C61274.1050501@v.loewis.de>
Message-ID: <07Aug17.163310pdt."57996"@synergy1.parc.xerox.com>

> > What if you've got a PNG as one of the multipart components?  With a
> > Content-Transfer-Encoding of "binary"?  There's no way to represent that
> > as a string.
> 
> Sure is. Any byte sequence can be interpreted as latin-1.

Last time I looked, Latin-1 didn't cover the octets 0x80 - 0x9F.
Maybe you're thinking of Microsoft codepage 1252?

> > I wonder if we're misunderstanding each other here.  The "mail
> > message" itself is essentially a binary data structure, not a sequence
> > of strings, though many of its fields consist of carefully specified
> > string values.  Is that what you're saying?
> 
> I don't think so - I assume Barry really wants to use strings as the
> data type to represent the internal structure. It works fine for all
> aspects except for the 8bit and binary content-transfer-encodings.

Yep, that's what I'm saying -- doing it that way breaks on certain
content-transfer-encodings.  There's also a problem with line endings;
the mail standards call for an explicit CRLF sequence.

These things really aren't strings.  Few data packets are.

Bill

From bjourne at gmail.com  Sat Aug 18 02:31:10 2007
From: bjourne at gmail.com (=?ISO-8859-1?Q?BJ=F6rn_Lindqvist?=)
Date: Sat, 18 Aug 2007 00:31:10 +0000
Subject: [Python-3000] Documentation switch imminent
In-Reply-To: <f9t2nn$ksg$1@sea.gmane.org>
References: <f9t2nn$ksg$1@sea.gmane.org>
Message-ID: <740c3aec0708171731qc9324c3o17debfafe4c1530d@mail.gmail.com>

It is fantastic! Totally super work. I just have one small request;
pretty please do not set the font. I'm very happy with my browsers
default (Verdana), and Bitstream Vera Sans renders badly for me.

-- 
mvh Björn

From greg.ewing at canterbury.ac.nz  Sat Aug 18 03:01:48 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 18 Aug 2007 13:01:48 +1200
Subject: [Python-3000] Nested brackets (Was: Please don't kill the %
 operator...)
In-Reply-To: <46C5CA2C.1000306@v.loewis.de>
References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au>
	<18116.17176.123168.265491@montanaro.dyndns.org>
	<fa256g$orv$1@sea.gmane.org>
	<d11dcfba0708161147x24f0a432o75a811371210961d@mail.gmail.com>
	<18116.43765.194435.952513@montanaro.dyndns.org>
	<46C4D13B.8070608@v.loewis.de>
	<07Aug16.161620pdt.57996@synergy1.parc.xerox.com>
	<46C526C5.20906@v.loewis.de>
	<46C536ED.6060500@ronadam.com> <46C57C62.3040000@trueblade.com>
	<46C5C0D0.7040609@ronadam.com> <46C5CA2C.1000306@v.loewis.de>
Message-ID: <46C644FC.7020108@canterbury.ac.nz>

Martin v. Löwis wrote:
> I now agree with Bill that we have a "computation of matching braces
> problem", surprisingly: people disagree with each other and with the
> PEP what the meaning of the braces in above example is.

I think it should be considered an error to use
a name which is not a valid identifier, even if
the implementation doesn't detect this.

--
Greg

From stephen at xemacs.org  Sat Aug 18 03:20:11 2007
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 18 Aug 2007 10:20:11 +0900
Subject: [Python-3000] AtheOS?
In-Reply-To: <acd65fa20708171350v343c0814p59053e150f093efa@mail.gmail.com>
References: <18117.40043.124714.520626@montanaro.dyndns.org>
	<ca471dc20708170732v7152a171ge8652dd217dc420d@mail.gmail.com>
	<46C5CD81.1040903@v.loewis.de>
	<18117.54431.306667.256066@montanaro.dyndns.org>
	<acd65fa20708171350v343c0814p59053e150f093efa@mail.gmail.com>
Message-ID: <87tzqx3agk.fsf@uwakimon.sk.tsukuba.ac.jp>

Alexandre Vassalotti writes:

 > Perhaps, it would be a good idea to take Plan9's approach to
 > portability -- i.e., you develop an extreme allergy to code filled
 > with #if, #ifdef, #else, #elif; localize system dependencies in
 > separate files and hide them behind interfaces.

If I understand correctly, Emacs uses this approach (although not
thoroughly).  The resulting portability files are called s & m files
for several reasons.  One of course is that they are kept in
subdirectories called "s" for "system" and "m" for "machine".  The
more relevant here is the pain they inflict on developers, because
they make it more difficult to find implementations of functions, and
hide (or at least distribute across many files) potentially relevant
information.

Although parsing a deep #ifdef tree is very error-prone and annoying,
this can be improved with technology (eg, hideif.el in Emacs).  The
#ifdef approach also has an advantage that all alternative
implementations and their comments are in your face.  Mostly they're
irrelevant, but often enough they're very suggestive.  Though I
suppose a sufficiently disciplined programmer would think to use that
resource if split out into files, I am not one.  This is an important
difference between the approaches for me and perhaps for others who
only intermittently work on the code, and only on parts relevant to
their daily lives.

In the end, the difference doesn't seem to be that great, but for us
preferred practice (especially as we add support for new platforms) is
definitely to put the platform dependencies into configure, to try to
organize them by feature rather than by cpu-os-versions-thereof, and
to use #ifdefs.

FWIW YMMV.


From talin at acm.org  Sat Aug 18 03:48:54 2007
From: talin at acm.org (Talin)
Date: Fri, 17 Aug 2007 18:48:54 -0700
Subject: [Python-3000] PEP 3101 clarification requests
In-Reply-To: <fb6fbf560708171027h1aa2e8f2pecbafde5235ea164@mail.gmail.com>
References: <fb6fbf560708171027h1aa2e8f2pecbafde5235ea164@mail.gmail.com>
Message-ID: <46C65006.9030907@acm.org>

Wow, excellent feedback. I've added your email to the list of reminders 
for the next round of edits.

Jim Jewett wrote:
> The PEP says:
> 
>   The general form of a standard format specifier is:
> 
>         [[fill]align][sign][width][.precision][type]
> 
> but then says:
> 
>     A zero fill character without an alignment flag
>     implies an alignment type of '='.
> 
> In the above form, how can you get a fill character without an
> alignment character?  And why would you want to support it; It just
> makes the width look like an (old-style) octal.  (The spec already
> says that you need the alignment if you use a digit other than zero.)
> 
> --------------
> 
> The explicit conversion flag is limited to "r" and "s", but I assume
> that can be overridden in a Formatter subclass.  That possibility
> might be worth mentioning explicitly.
> 
> --------------
> 
> 
>     'check_unused_args' is used to implement checking
>     for unused arguments ... The intersection of these two
>     sets will be the set of unused args.
> 
> Huh?  I *think* the actual intent is (args union kwargs)-used.  I
> can't find an intersection in there.
> 
> -----------------
>     This can easily be done by overriding get_named() as follows:
> 
> I assume that should be get_value.
> 
>     class NamespaceFormatter(Formatter):
>           def __init__(self, namespace={}, flags=0):
>               Formatter.__init__(self, flags)
> 
> but the Formatter class took no init parameters -- should flags be
> added to the Formatter constructor, or taken out of here?
> 
> The get_value override can be expressed more simply as
> 
>     def get_value(self, key, args, kwds):
>         try:
>             # simplify even more by assuming PEP 3135?
>             return super(NamespaceFormatter, self).get_value(key, args, kwds)
>         except KeyError:
>             return self.namespace[key]
> 
> The example usage then takes globals()...
> 
>         fmt = NamespaceFormatter(globals())
>         greeting = "hello"
>         print(fmt("{greeting}, world!"))
> 
> Is there now a promise that the objects returned by locals() and
> globals() will be "live", so that they would reflect the new value of
> "greeting", even though it was set after the Formatter was created?
> 
> -jJ
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/talin%40acm.org
> 
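
Two of the points quoted above can be tried out against the format
machinery that eventually shipped in Python 3.  This is only a sketch
under that assumption, not the PEP text as it stood at the time: the
first part shows a zero fill with and without an explicit alignment
character, the second is the get_value override built on
string.Formatter.  An explicit dict stands in for globals() so the
example does not depend on where it is run; the dict is read at format
time, so names added after the formatter is created are still seen.

```python
from string import Formatter

# Zero fill without an alignment character behaves like '=' alignment
# (padding goes between the sign and the digits).
implied = format(-42, "05d")
explicit = format(-42, "0=5d")
right_aligned = format(-42, "0>5d")   # fill goes before the sign instead


class NamespaceFormatter(Formatter):
    """Look names up in a fallback namespace when kwargs lack them."""

    def __init__(self, namespace=None):
        super().__init__()
        # Avoid a shared mutable default argument.
        self.namespace = {} if namespace is None else namespace

    def get_value(self, key, args, kwds):
        try:
            return super().get_value(key, args, kwds)
        except KeyError:
            return self.namespace[key]


namespace = {}                       # stand-in for globals()
fmt = NamespaceFormatter(namespace)
namespace["greeting"] = "hello"      # set after the formatter exists
result = fmt.format("{greeting}, world!")

print(implied, explicit, right_aligned)
print(result)
```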

From talin at acm.org  Sat Aug 18 03:54:36 2007
From: talin at acm.org (Talin)
Date: Fri, 17 Aug 2007 18:54:36 -0700
Subject: [Python-3000] AtheOS?
In-Reply-To: <46C5DC89.5040207@v.loewis.de>
References: <18117.40043.124714.520626@montanaro.dyndns.org>	<ca471dc20708170732v7152a171ge8652dd217dc420d@mail.gmail.com>	<46C5CD81.1040903@v.loewis.de>	<18117.54431.306667.256066@montanaro.dyndns.org>
	<46C5DC89.5040207@v.loewis.de>
Message-ID: <46C6515C.20804@acm.org>

Martin v. Löwis wrote:
>>     Martin> It took some effort to integrate this for 2.3, so I feel sad
>>     Martin> that this is now all ripped out again. I'm not certain the code
>>     Martin> gets cleaner that way - just smaller.
>>
>> Well, fewer #ifdefs can't be a bad thing.
> 
> By that principle, it would be best if Python supported only a single
> platform - I would chose Linux (that would also save me creating
> Windows binaries :-)
> 
> Fewer ifdefs are a bad thing if they also go along with less
> functionality, or worse portability. As I said, people contributed their
> time to write this code (in this case, it took me several hours of work,
> to understand and adjust the patch being contributed), and I do find
> it bad, in principle, that this work is now all declared wasted.

I wonder how hard it would be - and how much it would distort the Python 
code base - if most if not all platform-specific differences could be 
externalized from the core Python source code. Ideally, a platform 
wishing to support Python shouldn't have to be part of the core Python 
distribution, and a "port" of Python should consist of the concatenation 
of two packages, the universal Python sources, and a set of 
platform-specific adapters, which may or may not be hosted on the main 
Python site.

However, this just might be wishful thinking on my part...

-- Talin

From guido at python.org  Sat Aug 18 04:35:37 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 17 Aug 2007 19:35:37 -0700
Subject: [Python-3000] AtheOS?
In-Reply-To: <46C6515C.20804@acm.org>
References: <18117.40043.124714.520626@montanaro.dyndns.org>
	<ca471dc20708170732v7152a171ge8652dd217dc420d@mail.gmail.com>
	<46C5CD81.1040903@v.loewis.de>
	<18117.54431.306667.256066@montanaro.dyndns.org>
	<46C5DC89.5040207@v.loewis.de> <46C6515C.20804@acm.org>
Message-ID: <ca471dc20708171935h75edf9f6t2ca180677ec4a37f@mail.gmail.com>

On 8/17/07, Talin <talin at acm.org> wrote:
> I wonder how hard it would be - and how much it would distort the Python
> code base - if most if not all platform-specific differences could be
> externalized from the core Python source code. Ideally, a platform
> wishing to support Python shouldn't have to be part of the core Python
> distribution, and a "port" of Python should consist of the concatenation
> of two packages, the universal Python sources, and a set of
> platform-specific adapters, which may or may not be hosted on the main
> Python site.

Go read the source code and look for #ifdefs. They are all over the
place and for all sorts of reasons. It would be nice if there was a
limited number of established platform dependent APIs, like "open",
"read", "write" etc. But those rarely are the problem. The real
platform differences are things like which pre-release version of
pthreads they support, what the symbol is you have to #define to get
the BSD extensions added to the header files, whether there's a bug in
their va_args implementation (and which bug it is), what the header
file name is to get the ANSI C-compatible C signal definitions, what
the error code is for interrupted I/O, whether I/O is interruptible at
all, etc., etc.

Don't forget that the POSIX and C standards *require* #ifdefs for many
features that are optional or that have different possible semantics.

Just read through fileobject.c or import.c or posixmodule.c.

Sure, there are *some* situations where this approach could clarify
some code a bit. But *most* places are too ad-hoc to define an
interface -- having three lines of random code in a different file
instead of inside an #ifdef doesn't really have any benefits.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at v.loewis.de  Sat Aug 18 07:50:33 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 18 Aug 2007 07:50:33 +0200
Subject: [Python-3000] should rfc822 accept text io or binary io?
In-Reply-To: <07Aug17.163310pdt."57996"@synergy1.parc.xerox.com>
References: <e8bf7a530708060630x699a44aaqa940cfe96a0e158b@mail.gmail.com>
	<18103.34967.170146.660275@montanaro.dyndns.org>
	<3CC0BA3B-38D2-44F6-A230-E3B59C1016DE@acm.org>
	<e8bf7a530708070852g3877bfbayc7f722e9ee8d0c0c@mail.gmail.com>
	<ca471dc20708070951s4492aa25hd84cf40ef8a5df53@mail.gmail.com>
	<e8bf7a530708071038o25fb251au3f742c4019ee786e@mail.gmail.com>
	<ca471dc20708071052j1dc74dber6cb7eb890c31668b@mail.gmail.com>
	<e8bf7a530708071131r257c3506tbb163df1097b4e02@mail.gmail.com>
	<46C5D94A.4030808@v.loewis.de>
	<786213B3-2067-41DA-9EE0-75ECB78B240A@python.org>
	<07Aug17.135444pdt."57996"@synergy1.parc.xerox.com>
	<46C61274.1050501@v.loewis.de>
	<07Aug17.163310pdt."57996"@synergy1.parc.xerox.com>
Message-ID: <46C688A9.9020702@v.loewis.de>

>>> What if you've got a PNG as one of the multipart components?  With a
>>> Content-Transfer-Encoding of "binary"?  There's no way to represent that
>>> as a string.
>> Sure is. Any byte sequence can be interpreted as latin-1.
> 
> Last time I looked, Latin-1 didn't cover the octets 0x80 - 0x9F.

Depends on where you looked. The IANA charset ISO_8859-1:1987 (MIBenum
4, alias latin1), defined in RFC 1345, has the C1 controls in this
place. Python's Latin-1 codec implements that specification, and when
Unicode says that the first 256 Unicode code points are identical
to Latin-1, they also refer to this definition of Latin-1.

If you look at section 1 of ISO 8859-1, you'll find that it can be used
with the coded control functions in ISO 6429. People typically assume
that it is indeed used in such a way, because you could not encode
line breaks otherwise (among other things).
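
The claim about Python's codec is easy to check.  A small sketch,
assuming a current Python 3 interpreter: every one of the 256 octet
values decodes under "latin-1" to the code point with the same number
and encodes back unchanged, while "cp1252" maps 0x80 to a different
character and leaves a few octets in that range undefined.

```python
all_octets = bytes(range(256))

# Latin-1 as implemented by Python: octet N <-> code point N, no gaps.
text = all_octets.decode("latin-1")
code_points = [ord(ch) for ch in text]
round_trip = text.encode("latin-1")

# 0x80 is a C1 control under latin-1 but the euro sign under cp1252.
c1_as_latin1 = b"\x80".decode("latin-1")
c1_as_cp1252 = b"\x80".decode("cp1252")

# cp1252 has unassigned octets, so it cannot represent arbitrary bytes.
try:
    b"\x81".decode("cp1252")
    cp1252_total = True
except UnicodeDecodeError:
    cp1252_total = False

print(round_trip == all_octets, repr(c1_as_latin1), repr(c1_as_cp1252))
```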

> Maybe you're thinking of Microsoft codepage 1252?

Definitely not.

Regards,
Martin


From oliphant.travis at ieee.org  Sat Aug 18 13:33:51 2007
From: oliphant.travis at ieee.org (Travis E. Oliphant)
Date: Sat, 18 Aug 2007 05:33:51 -0600
Subject: [Python-3000] Py3k-buffer branch merged back to py3k branch
Message-ID: <fa6kp4$s9b$1@sea.gmane.org>


Hello all,

I'm sorry I won't be attending the Google sprints next week.  I'm going 
to be moving from Utah to Texas next week and will be offline for 
several days.

In preparation for the sprints, I have converted all Python objects to 
use the new buffer protocol PEP and implemented most of the C-API.  This 
work took place in the py3k-buffer branch which now passes all the tests 
that py3k does.

So, I merged the changes back to the py3k branch in hopes that others 
can continue working on what I've done.  The merge took place after 
fully syncing the py3k-buffer branch with the current trunk.

There will be somebody from our community that will be at the Sprints 
next week.  He has agreed to try and work on the buffer protocol some 
more.  He is new to Python and so will probably need some help.  He has 
my cell phone number and will call me with questions which I hope to 
answer.

Left to do:

1) Finish the MemoryViewObject (getitem/setitem needs work).
2) Finish the struct module changes (I've started, but have not checked
	the changes in).
3) Add tests
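
For readers unfamiliar with item 1, this is roughly what getitem and
setitem on a memory view look like from Python code.  The sketch uses
the memoryview type as it exists in current Python 3, which may differ
from the state of the branch described here.

```python
buf = bytearray(b"abc")
view = memoryview(buf)      # shares memory with buf, no copy

first = view[0]             # getitem yields an integer
view[0] = ord("x")          # setitem writes through to buf
tail = view[1:3].tobytes()  # slicing gives another view

print(first, buf, tail)
```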

Possible problems:

It seems that whenever a PyExc_BufferError is raised, problems (like 
segfaults) occur.  I tried to add a new error object by copying how 
Python did it for other errors, but it's likely that I didn't do it right.

I will have email contact for a few days (until Tuesday) but will not 
have much time to work.

Thanks,


-Travis Oliphant


From skip at pobox.com  Sat Aug 18 14:07:30 2007
From: skip at pobox.com (skip at pobox.com)
Date: Sat, 18 Aug 2007 07:07:30 -0500
Subject: [Python-3000] Bus error after updating this morning
Message-ID: <18118.57602.910814.663900@montanaro.dyndns.org>

After reading Travis's email about the py3k-buffer merge this morning I
updated my sandbox on my Mac and rebuilt.  I got a bus error when trying to
run the tests.

    (gdb) run -E -tt ./Lib/test/regrtest.py -l 
    Starting program: /Users/skip/src/python-svn/py3k/python.exe -E -tt ./Lib/test/regrtest.py -l
    Reading symbols for shared libraries . done
    Reading symbols for shared libraries . done
    Reading symbols for shared libraries . done
    Reading symbols for shared libraries . done

    Program received signal EXC_BAD_ACCESS, Could not access memory.
    Reason: KERN_PROTECTION_FAILURE at address: 0x00000021
    0x006d0ec8 in binascii_hexlify (self=0x0, args=0x1012e70) at /Users/skip/src/python-svn/py3k/Modules/binascii.c:953
    953                     retbuf[j++] = c;
    (gdb) bt
    #0  0x006d0ec8 in binascii_hexlify (self=0x0, args=0x1012e70) at /Users/skip/src/python-svn/py3k/Modules/binascii.c:953
    #1  0x0011b020 in PyCFunction_Call (func=0x1011c78, arg=0x1012e70, kw=0x0) at Objects/methodobject.c:73
    ...

The build was configured like so:

    ./configure --prefix=/Users/skip/local LDFLAGS=-L/opt/local/lib \
    CPPFLAGS=-I/opt/local/include --with-pydebug 

Thinking maybe something didn't get rebuilt that should have, I am rebuilding
from scratch after a make distclean.  I'll report back when I have more info
if someone with a faster computer doesn't beat me to it.

Skip


From skip at pobox.com  Sat Aug 18 14:21:12 2007
From: skip at pobox.com (skip at pobox.com)
Date: Sat, 18 Aug 2007 07:21:12 -0500
Subject: [Python-3000] Bus error after updating this morning
Message-ID: <18118.58424.155135.287777@montanaro.dyndns.org>


    Thinking maybe something didn't get rebuilt that should have I am
    rebuilding from scratch after a make distclean.

make test now gets to the point where it's actually running tests, so
the make distclean seems to have solved the problem.  Perhaps there's a
missing Makefile dependency somewhere.

Skip



From ncoghlan at gmail.com  Sat Aug 18 15:04:38 2007
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 18 Aug 2007 23:04:38 +1000
Subject: [Python-3000] bytes: compare bytes to integer
In-Reply-To: <200708110225.28056.victor.stinner@haypocalc.com>
References: <200708110225.28056.victor.stinner@haypocalc.com>
Message-ID: <46C6EE66.8010701@gmail.com>

Victor Stinner wrote:
> Hi,
> 
> I don't like the behaviour of Python 3000 when we compare a bytes string
> of length 1:
>    >>> b'xyz'[0] == b'x'
>    False
> 
> The code can be seen as:
>    >>> ord(b'x') == b'x'
>    False

This seems to suggest its own solution:

   bytes_obj[0] == ord('x')

(Given that ord converts *characters* to bytes, does it actually make 
sense to allow a bytes object as an argument to ord()?)
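
A short sketch of the comparisons in question, as they behave in
current Python 3 (indexing a bytes object gives an integer, slicing
gives bytes):

```python
data = b"xyz"
first = data[0]                       # an int, not a length-1 bytes

int_vs_bytes = first == b"x"          # int never equals bytes
int_vs_ord = first == ord("x")        # compare integer with integer
int_vs_index = first == b"x"[0]       # same, without going through str
slice_vs_bytes = data[0:1] == b"x"    # compare bytes with bytes
prefix = data.startswith(b"x")

print(int_vs_bytes, int_vs_ord, int_vs_index, slice_vs_bytes, prefix)
```

Slicing or startswith() keeps both sides as bytes, which avoids the
integer conversion altogether.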

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From guido at python.org  Sat Aug 18 18:45:00 2007
From: guido at python.org (Guido van Rossum)
Date: Sat, 18 Aug 2007 09:45:00 -0700
Subject: [Python-3000] Bus error after updating this morning
In-Reply-To: <18118.58424.155135.287777@montanaro.dyndns.org>
References: <18118.58424.155135.287777@montanaro.dyndns.org>
Message-ID: <ca471dc20708180945p1a783f98n93348ea6c29630c1@mail.gmail.com>

On 8/18/07, skip at pobox.com <skip at pobox.com> wrote:
>     Thinking maybe something didn't get rebuilt that should have I am
>     rebuilding from scratch after a make distclean.
>
> make test is actually getting to the point where it's actually running
> tests, so the make distclean seems to have solved the problem.  Perhaps
> there's a missing Makefile dependency somewhere.

This typically happens when an essential .h file changes -- the
setup.py script that builds the extension modules doesn't check these
dependencies. Nothing we can fix in the Makefile, alas.

The shorter fix is rm -rf build.

Thanks for figuring this out! I was just about to start an investigation.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Sat Aug 18 19:05:57 2007
From: guido at python.org (Guido van Rossum)
Date: Sat, 18 Aug 2007 10:05:57 -0700
Subject: [Python-3000] Wanted: tasks for Py3k Sprint next week
Message-ID: <ca471dc20708181005u3058fb42t35fdeee9077d9cd2@mail.gmail.com>

I'm soliciting ideas for things that need to be done for the 3.0
release that would make good sprint topics. Assume we'll have a mix of
more and less experienced developers on hand.

(See wiki.python.org/moin/GoogleSprint .)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From brett at python.org  Sat Aug 18 20:28:57 2007
From: brett at python.org (Brett Cannon)
Date: Sat, 18 Aug 2007 11:28:57 -0700
Subject: [Python-3000] AtheOS?
In-Reply-To: <18118.591.126364.329836@montanaro.dyndns.org>
References: <18117.40043.124714.520626@montanaro.dyndns.org>
	<bbaeab100708171223y30c0f1f7v30302c582bf4d16b@mail.gmail.com>
	<18118.591.126364.329836@montanaro.dyndns.org>
Message-ID: <bbaeab100708181128x247ed7aahd50ff3a0faf35789@mail.gmail.com>

On 8/17/07, skip at pobox.com <skip at pobox.com> wrote:
>
>     Brett> On 8/17/07, skip at pobox.com <skip at pobox.com> wrote:
>     >> I just got rid of BeOS and RiscOS.
>
>     Brett> Just so you know, Skip, BeOS still has a maintainer on the 2.x
>     Brett> branch.  Whether we want to continue support past 2.x is another
>     Brett> question (as Guido says in another email, it's a hassle and so we
>     Brett> should try to minimize the OS support to those that follow
>     Brett> convention).
>
> I was going by the list on the wiki.  BeOS was on that list.  I removed it
> in a single checkin so if it's decided in the near future to put it back
> that should be easy to do.
>

Well, the maintainer of the current port said he has been moving away
from BeOS.  He guessed the Haiku developers didn't need the special
support (but that's a guess).

Looks like this can probably be removed from 2.6 as well if there is
no maintainer.

-Brett

From skip at pobox.com  Sat Aug 18 21:10:15 2007
From: skip at pobox.com (skip at pobox.com)
Date: Sat, 18 Aug 2007 14:10:15 -0500
Subject: [Python-3000] AtheOS?
In-Reply-To: <bbaeab100708181128x247ed7aahd50ff3a0faf35789@mail.gmail.com>
References: <18117.40043.124714.520626@montanaro.dyndns.org>
	<bbaeab100708171223y30c0f1f7v30302c582bf4d16b@mail.gmail.com>
	<18118.591.126364.329836@montanaro.dyndns.org>
	<bbaeab100708181128x247ed7aahd50ff3a0faf35789@mail.gmail.com>
Message-ID: <18119.17431.670483.153498@montanaro.dyndns.org>


    Brett> Well, the maintainer of the current port said he has been moving
    Brett> away from BeOS.  He guessed the Haiku developers didn't need the
    Brett> special support (but that's a guess).

What does poetry have to do with BeOS?

    Brett> Looks like this can probably be removed from 2.6 as well if there
    Brett> is no maintainer.

I'll update PEP 11.  Deprecate in 2.6.  Break the build in 2.7.  Gone
altogether in 3.0.

Skip



From brett at python.org  Sat Aug 18 21:13:08 2007
From: brett at python.org (Brett Cannon)
Date: Sat, 18 Aug 2007 12:13:08 -0700
Subject: [Python-3000] AtheOS?
In-Reply-To: <18119.17431.670483.153498@montanaro.dyndns.org>
References: <18117.40043.124714.520626@montanaro.dyndns.org>
	<bbaeab100708171223y30c0f1f7v30302c582bf4d16b@mail.gmail.com>
	<18118.591.126364.329836@montanaro.dyndns.org>
	<bbaeab100708181128x247ed7aahd50ff3a0faf35789@mail.gmail.com>
	<18119.17431.670483.153498@montanaro.dyndns.org>
Message-ID: <bbaeab100708181213n15334842m8bc203f8a5b08e4f@mail.gmail.com>

On 8/18/07, skip at pobox.com <skip at pobox.com> wrote:
>
>     Brett> Well, the maintainer of the current port said he has been moving
>     Brett> away from BeOS.  He guessed the Haiku developers didn't need the
>     Brett> special support (but that's a guess).
>
> What does poetry have to do with BeOS?

I am assuming you are referencing Haiku.  =)  Haiku is to BeOS what
Syllable is to AtheOS: a group of people who loved a dead OS enough to
start a new open source project to mimic the original.

>
>     Brett> Looks like this can probably be removed from 2.6 as well if there
>     Brett> is no maintainer.
>
> I'll update PEP 11.  Deprecate in 2.6.  Break the build in 2.7.  Gone
> altogether in 3.0.

Great!

-Brett

From guido at python.org  Sat Aug 18 21:48:54 2007
From: guido at python.org (Guido van Rossum)
Date: Sat, 18 Aug 2007 12:48:54 -0700
Subject: [Python-3000] bytes: compare bytes to integer
In-Reply-To: <46C6EE66.8010701@gmail.com>
References: <200708110225.28056.victor.stinner@haypocalc.com>
	<46C6EE66.8010701@gmail.com>
Message-ID: <ca471dc20708181248u79eaaa63i34902eda9347a725@mail.gmail.com>

On 8/18/07, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Victor Stinner wrote:
> > Hi,
> >
> > I don't like the behaviour of Python 3000 when we compare a bytes string
> > of length 1:
> >    >>> b'xyz'[0] == b'x'
> >    False
> >
> > The code can be seen as:
> >    >>> ord(b'x') == b'x'
> >    False
>
> This seems to suggest its own solution:
>
>    bytes_obj[0] == ord('x')
>
> (Given that ord converts *characters* to bytes, does it actually make
> sense to allow a bytes object as an argument to ord()?)

No, I added that as a quick hack during the transition. If someone has
the time, please kill this behavior and fix the (hopefully) few places
that were relying on it.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Sat Aug 18 22:18:47 2007
From: guido at python.org (Guido van Rossum)
Date: Sat, 18 Aug 2007 13:18:47 -0700
Subject: [Python-3000] Py3k-buffer branch merged back to py3k branch
In-Reply-To: <fa6kp4$s9b$1@sea.gmane.org>
References: <fa6kp4$s9b$1@sea.gmane.org>
Message-ID: <ca471dc20708181318h50e95cf7gdbb4c94adaee2a98@mail.gmail.com>

Wow. Thanks for a great job, Travis! I'll accept your PEP now. :-)

We'll attend to the details at the sprint.

--Guido

On 8/18/07, Travis E. Oliphant <oliphant.travis at ieee.org> wrote:
>
> Hello all,
>
> I'm sorry I won't be attending the Google sprints next week.  I'm going
> to be moving from Utah to Texas next week and will be offline for
> several days.
>
> In preparation for the sprints, I have converted all Python objects to
> use the new buffer protocol PEP and implemented most of the C-API.  This
> work took place in the py3k-buffer branch which now passes all the tests
> that py3k does.
>
> So, I merged the changes back to the py3k branch in hopes that others
> can continue working on what I've done.  The merge took place after
> fully syncing the py3k-buffer branch with the current trunk.
>
> There will be somebody from our community that will be at the Sprints
> next week.  He has agreed to try and work on the buffer protocol some
> more.  He is new to Python and so will probably need some help.  He has
> my cell phone number and will call me with questions which I hope to
> answer.
>
> Left to do:
>
> 1) Finish the MemoryViewObject (getitem/setitem needs work).
> 2) Finish the struct module changes (I've started, but have not checked
>         the changes in).
> 3) Add tests
>
> Possible problems:
>
> It seems that whenever a PyExc_BufferError is raised, problems (like
> segfaults) occur.  I tried to add a new error object by copying how
> Python did it for other errors, but it's likely that I didn't do it right.
>
> I will have email contact for a few days (until Tuesday) but will not
> have much time to work.
>
> Thanks,
>
>
> -Travis Oliphant


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From aahz at pythoncraft.com  Sun Aug 19 00:13:44 2007
From: aahz at pythoncraft.com (Aahz)
Date: Sat, 18 Aug 2007 15:13:44 -0700
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <ca471dc20708160753t30343b70ve8cf2e3a468b3a0d@mail.gmail.com>
References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au>
	<18116.17176.123168.265491@montanaro.dyndns.org>
	<fa1hkr$gfj$1@sea.gmane.org>
	<ca471dc20708160753t30343b70ve8cf2e3a468b3a0d@mail.gmail.com>
Message-ID: <20070818221344.GA5742@panix.com>

On Thu, Aug 16, 2007, Guido van Rossum wrote:
>
> I don't know what percentage of %-formatting uses a string literal on
> the left; if it's a really high number (high 90s), I'd like to kill
> %-formatting and go with mechanical translation; otherwise, I think
> we'll have to phase out %-formatting in 3.x or 4.0.

Then there's the pseudo-literal for long lines:

    if not data:
        msg = "Syntax error at line %s: missing 'data' element"
        msg %= line_number

Including those, my code probably has quite a few non-literals.  Even
when you exclude them, going through and finding the non-literals will
cause much pain, because we do use "%" for numeric purposes and because
our homebrew templating language uses "%" to indicate a variable.
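
The measurement Guido asks about, and the trouble described above, can
be sketched with the ast module from current Python 3 (ast.Constant
needs 3.8 or later).  The function name and the sample source are made
up for illustration.  The sketch counts % operations by whether the
left operand is a string literal; everything else lands in one bucket,
because a static pass cannot tell "msg %= line_number" apart from
numeric modulo.

```python
import ast


def classify_percent_uses(source):
    """Count % operations by whether the left side is a str literal."""
    counts = {"literal": 0, "other": 0}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Mod):
            left = node.left
        elif isinstance(node, ast.AugAssign) and isinstance(node.op, ast.Mod):
            left = node.target
        else:
            continue
        is_literal = isinstance(left, ast.Constant) and isinstance(left.value, str)
        counts["literal" if is_literal else "other"] += 1
    return counts


sample = '''
a = "line %s" % n
msg = "Syntax error at line %s: missing 'data' element"
msg %= n
r = n % 2
'''

counts = classify_percent_uses(sample)
print(counts)
```

Only the first statement in the sample is a candidate for mechanical
translation; the pseudo-literal and the numeric modulo are
indistinguishable without type information.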
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"And if that makes me an elitist...I couldn't be happier."  --JMS

From skip at pobox.com  Sun Aug 19 00:27:28 2007
From: skip at pobox.com (skip at pobox.com)
Date: Sat, 18 Aug 2007 17:27:28 -0500
Subject: [Python-3000] PEP 11 update - Call for port maintainers to step
	forward
Message-ID: <18119.29264.543717.894262@montanaro.dyndns.org>

I made a quick update to PEP 11, "Removing support for little used
platforms".  I added details about ending support for AtheOS/Syllable and
BeOS.

I also added a yet-to-be-fleshed out section entitled "Platform
Maintainers".  I intend that to the extent possible we document the
responsible parties for various platforms.  Obviously, common platforms like
Windows, Mac OS X, Linux and common Unix platforms (Solaris, *BSD, what
else?)  will continue to be supported by the core Python developer
community, but lesser platforms should have one or more champions, and we
should be able to get ahold of them to determine their continued interest in
supporting Python on their platform(s).  If you are the "owner" of a minor
platform, please drop me a note.  Ones I'm aware of that probably need
specialized support outside the core Python developers include:

    IRIX
    Tru64 (aka OSF/1 and other names (what else?))
    OS2/EMX (Andrew MacIntyre?)
    Cygwin
    MinGW
    HP-UX
    AIX
    Solaris < version 8
    SCO
    Unixware

IRIX and Tru64 are likely to go the way of the dodo if someone doesn't step
up soon to offer support.  I don't expect the others to disappear soon, but
they tend to need more specialized support, especially in more "challenging"
areas (shared library support, threading, etc).

If you maintain the platform-specific aspects for any of these platforms,
please let me know.  If you aren't that person but know who is, please pass
this note along to them.  If I've missed any other platforms (I know I must
have missed something), let me know that as well.

Thanks,

-- 
Skip Montanaro - skip at pobox.com - http://www.webfast.com/~skip/

From greg.ewing at canterbury.ac.nz  Sun Aug 19 02:27:38 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 19 Aug 2007 12:27:38 +1200
Subject: [Python-3000] bytes: compare bytes to integer
In-Reply-To: <46C6EE66.8010701@gmail.com>
References: <200708110225.28056.victor.stinner@haypocalc.com>
	<46C6EE66.8010701@gmail.com>
Message-ID: <46C78E7A.10600@canterbury.ac.nz>

Nick Coghlan wrote:
>    bytes_obj[0] == ord('x')

That's a rather expensive way of comparing an
integer with a constant, though.
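
A small sketch of the alternatives, assuming the Python 3 behaviour in
which indexing a bytes object yields an int.  The names bytes_obj and X
are illustrative only.

```python
bytes_obj = b"xyz"

# Indexing bytes yields an int, so it is compared against an int.
via_ord = bytes_obj[0] == ord("x")   # calls ord() on every comparison

# Precomputing the constant once avoids the repeated call.
X = ord("x")
via_const = bytes_obj[0] == X

# Slicing keeps a bytes object, which compares against a bytes literal.
via_slice = bytes_obj[:1] == b"x"

print(via_ord, via_const, via_slice)
```

Whether the ord() call is expensive enough to matter depends on how hot
the comparison is; the precomputed constant sidesteps the question.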

--
Greg

From lists at cheimes.de  Sun Aug 19 03:18:29 2007
From: lists at cheimes.de (Christian Heimes)
Date: Sun, 19 Aug 2007 03:18:29 +0200
Subject: [Python-3000] Py3k-buffer branch merged back to py3k branch
In-Reply-To: <fa6kp4$s9b$1@sea.gmane.org>
References: <fa6kp4$s9b$1@sea.gmane.org>
Message-ID: <fa85p6$lpb$1@sea.gmane.org>

Travis E. Oliphant wrote:
> Left to do:
> 
> 1) Finish the MemoryViewObject (getitem/setitem needs work).
> 2) Finish the struct module changes (I've started, but have not checked
> 	the changes in).
> 3) Add tests
> 
> Possible problems:
> 
> It seems that whenever a PyExc_BufferError is raised, problems (like 
> segfaults) occur.  I tried to add a new error object by copying how 
> Python did it for other errors, but it's likely that I didn't do it right.
> 
> I will have email contact for a few days (until Tuesday) but will not 
> have much time to work.

I was wondering what the memoryview is doing so I tried it with a string:

./python -c "memoryview('test')"
Segmentation fault

Ooops! gdb says this about the error:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1210415424 (LWP 14436)]
0x080f77a0 in PyErr_SetObject (exception=0x81962c0, value=0xb7cee3a8) at Python/errors.c:55
55              if (exception != NULL &&

Bug report:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1777057&group_id=5470
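
For comparison, here is a sketch of the same experiment on a current
CPython 3 interpreter (an assumption about the reader's environment, not
a description of the 2007 branch): a bytes argument yields a view, and a
str argument is rejected with an exception rather than a crash.

```python
# A bytes object exports the buffer interface, so a view can be taken.
view = memoryview(b"test")
first = view[0]          # an int: the value of the first byte
as_list = view.tolist()  # one int per byte

# A str does not export a buffer; current interpreters raise TypeError.
try:
    memoryview("test")
    str_error = None
except TypeError as exc:
    str_error = exc

print(first, as_list, type(str_error).__name__)
```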

Christian


From nnorwitz at gmail.com  Sun Aug 19 06:32:24 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Sat, 18 Aug 2007 21:32:24 -0700
Subject: [Python-3000] Py3k-buffer branch merged back to py3k branch
In-Reply-To: <fa6kp4$s9b$1@sea.gmane.org>
References: <fa6kp4$s9b$1@sea.gmane.org>
Message-ID: <ee2a432c0708182132v2d89ef9cn1edc3cae933762b1@mail.gmail.com>

On 8/18/07, Travis E. Oliphant <oliphant.travis at ieee.org> wrote:
>
> In preparation for the sprints, I have converted all Python objects to
> use the new buffer protocol PEP and implemented most of the C-API.  This
> work took place in the py3k-buffer branch which now passes all the tests
> that py3k does.
>
> So, I merged the changes back to the py3k branch in hopes that others
> can continue working on what I've done.  The merge took place after
> fully syncing the py3k-buffer branch with the current trunk.
>
> Left to do:
>
> 1) Finish the MemoryViewObject (getitem/setitem needs work).
> 2) Finish the struct module changes (I've started, but have not checked
>         the changes in).
> 3) Add tests

Also need to add doc.  I noticed not all the new APIs mentioned the
meaning of the return value.  Do all the new functions which return
int only return 0 on success and -1 on failure?  Or do any return a
size?  I'm thinking of possible issues with Py_ssize_t vs int
mismatches.  I saw a couple which might have been a problem.  See
below.

> Possible problems:
>
> It seems that whenever a PyExc_BufferError is raised, problems (like
> segfaults) occur.  I tried to add a new error object by copying how
> Python did it for other errors, but it's likely that I didn't do it right.

I think I fixed this.  Needed to add PRE_INIT and POST_INIT for the
new exception.  This fixed the problem reported by Christian Heimes in
this thread.

I checked in revision 57193 which was a code review.  I pointed out
all the places I thought there were problems.  Since some of this code
is tricky, I expect there will be more issues.  This code really,
really needs tests.

I added a comment about a memory leak.  Below is the stack trace of
where the memory was allocated.  I added a comment (in release buffer)
where I thought it could be freed, but I'm not sure that's the right
place.

When I ran the test suite test_xmlrpc failed.  I'm not sure if this
was from your checkin, my checkin, or something else.

n
--
Memory leaked when allocated from:
 array_buffer_getbuf (arraymodule.c:1775)
 buffer_getbuf (bufferobject.c:28)
 bytes_init (bytesobject.c:807)
 type_call (typeobject.c:429)

From guido at python.org  Sun Aug 19 06:37:42 2007
From: guido at python.org (Guido van Rossum)
Date: Sat, 18 Aug 2007 21:37:42 -0700
Subject: [Python-3000] Py3k-buffer branch merged back to py3k branch
In-Reply-To: <ee2a432c0708182132v2d89ef9cn1edc3cae933762b1@mail.gmail.com>
References: <fa6kp4$s9b$1@sea.gmane.org>
	<ee2a432c0708182132v2d89ef9cn1edc3cae933762b1@mail.gmail.com>
Message-ID: <ca471dc20708182137m32fb688eq62ec133c415f4cd9@mail.gmail.com>

On 8/18/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
> When I ran the test suite test_xmlrpc failed.  I'm not sure if this
> was from your checkin, my checkin, or something else.

This was already failing before; I think I reported it Friday or
Thursday night. This started happening after a merge from the trunk
brought in a bunch of new unit test code for xmlrpc. I'm guessing it's
str/bytes issues.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From rhamph at gmail.com  Sun Aug 19 06:50:10 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Sat, 18 Aug 2007 22:50:10 -0600
Subject: [Python-3000] Wanted: tasks for Py3k Sprint next week
In-Reply-To: <ca471dc20708181005u3058fb42t35fdeee9077d9cd2@mail.gmail.com>
References: <ca471dc20708181005u3058fb42t35fdeee9077d9cd2@mail.gmail.com>
Message-ID: <aac2c7cb0708182150n2c2f7e04u8d24aa761da2a64a@mail.gmail.com>

On 8/18/07, Guido van Rossum <guido at python.org> wrote:
> I'm soliciting ideas for things that need to be done for the 3.0
> release that would make good sprint topics. Assume we'll have a mix of
> more and less experienced developers on hand.
>
> (See wiki.python.org/moin/GoogleSprint .)

Would ripping out the malloc macros[1] be a suitable suggestion?

[1]
Include/objimpl.h:#define PyObject_MALLOC               PyMem_MALLOC
Include/pymem.h:#define PyMem_MALLOC            PyObject_MALLOC

-- 
Adam Olsen, aka Rhamphoryncus

From steve at holdenweb.com  Sat Aug 18 00:58:44 2007
From: steve at holdenweb.com (Steve Holden)
Date: Fri, 17 Aug 2007 18:58:44 -0400
Subject: [Python-3000] [Python-Dev] Documentation switch imminent
In-Reply-To: <acd65fa20708170928y6dc91623tefa112d4fe7528e5@mail.gmail.com>
References: <f9t2nn$ksg$1@sea.gmane.org>
	<f9v2rd$2dl$1@sea.gmane.org>	<ee2a432c0708152207j66c26fbdu6e9bcec6dae0ccf5@mail.gmail.com>	<acd65fa20708161643s6d7c0c42x135eb0957980aa14@mail.gmail.com>	<fa3egl$56d$1@sea.gmane.org>
	<acd65fa20708170928y6dc91623tefa112d4fe7528e5@mail.gmail.com>
Message-ID: <46C62824.90002@holdenweb.com>

Alexandre Vassalotti wrote:
> On 8/17/07, Georg Brandl <g.brandl at gmx.net> wrote:
[...]
> Ah, I didn't notice that index included all the documents. That
> explains the huge size increase. However, would it be possible to keep
> the indexes separated? I noticed that I find what I want more quickly when
> the indexes are separated.
> 
Which is fine when you know which section to expect to find your content 
in. But let's retain an "all-documentation" index if we can, as this is 
particularly helpful to the newcomers who aren't that familiar with the 
structure of the documentation.

>> I've now removed leading spaces in the index output, and the character
>> count is down to 850000.
>>
>>> Firefox, on my fairly recent machine, takes ~5 seconds rendering the
>>> index of the new docs from disk, compared to a fraction of a second
>>> for the old one.
>> But you're right that rendering is slow there.  It may be caused by the
>> more complicated CSS... perhaps the index should be split up in several
>> pages.
>>
> 
> I disabled CSS-support (with View->Page Style->No Style), but it
> didn't affect the initial rendering speed. However, scrolling was
> *much* faster without CSS.
> 
Probably because the positional calculations are more straightforward then.

regards
  Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC/Ltd           http://www.holdenweb.com
Skype: holdenweb      http://del.icio.us/steve.holden

From nnorwitz at gmail.com  Sun Aug 19 20:50:47 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Sun, 19 Aug 2007 11:50:47 -0700
Subject: [Python-3000] cleaning up different ways to free an object
Message-ID: <ee2a432c0708191150l655caa0fq941b3c65751e20d5@mail.gmail.com>

I just fixed a bug in the new memoryview that used PyObject_DEL which
caused a problem in debug mode.  I had to change it to a Py_DECREF.
It seems we have a lot of spellings of ways to free an object and I
wonder if there are more problems lurking in there.

$ cat */*.c | grep -c PyObject_Del
103
$ cat */*.c | grep -c PyObject_DEL
16
$ cat */*.c | grep -c PyObject_Free
16
$ cat */*.c | grep -c PyObject_FREE
19

I don't know how many of these are correct or incorrect.

Note that in Include/objimpl.h, the Del and Free variants are the same.  I
plan to get rid of one of them.

#define PyObject_Del            PyObject_Free
#define PyObject_DEL            PyObject_FREE

The definitions of PyObject_{MALLOC,REALLOC,FREE} depend on whether
Python is compiled in debug mode, with pymalloc, or with neither.

What are the rules for when a particular API should be used (or not
used) to free an object?

n

From brett at python.org  Sun Aug 19 23:08:41 2007
From: brett at python.org (Brett Cannon)
Date: Sun, 19 Aug 2007 14:08:41 -0700
Subject: [Python-3000] cleaning up different ways to free an object
In-Reply-To: <ee2a432c0708191150l655caa0fq941b3c65751e20d5@mail.gmail.com>
References: <ee2a432c0708191150l655caa0fq941b3c65751e20d5@mail.gmail.com>
Message-ID: <bbaeab100708191408h7ade8712wdbace192fc2ea2f1@mail.gmail.com>

On 8/19/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
> I just fixed a bug in the new memoryview that used PyObject_DEL which
> caused a problem in debug mode.  I had to change it to a Py_DECREF.
> It seems we have a lot of spellings of ways to free an object and I
> wonder if there are more problems lurking in there.
>
> $ cat */*.c | grep -c PyObject_Del
> 103
> $ cat */*.c | grep -c PyObject_DEL
> 16
> $ cat */*.c | grep -c PyObject_Free
> 16
> $ cat */*.c | grep -c PyObject_FREE
> 19
>
> I don't know how many of these are correct or incorrect.
>
> Note in Include/objimpl, the Del and Free variants are the same.  I
> plan to get rid of one of them.
>
> #define PyObject_Del            PyObject_Free
> #define PyObject_DEL            PyObject_FREE
>
> PyObject_{MALLOC,REALLOC,FREE} depend upon whether python is compiled
> with debug mode, pymalloc, or not.
>
> What are the rules for when a particular API should be used (or not
> used) to free an object?

If you read the comment at the top of Include/objimpl.h, it says that
PyObject_(New|NewVar|Del) are for object allocation while
PyObject_(Malloc|Realloc|Free) are just like malloc/free, but they use
pymalloc instead of the system malloc.

After that there are the usual performance macros.

I am sure that prefixing the pymalloc versions of malloc/free with
PyObject_ is confusing for people.  Maybe that can change to something
like PyMalloc_* to disambiguate better.

-Brett

From rhamph at gmail.com  Sun Aug 19 23:33:48 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Sun, 19 Aug 2007 15:33:48 -0600
Subject: [Python-3000] cleaning up different ways to free an object
In-Reply-To: <ee2a432c0708191150l655caa0fq941b3c65751e20d5@mail.gmail.com>
References: <ee2a432c0708191150l655caa0fq941b3c65751e20d5@mail.gmail.com>
Message-ID: <aac2c7cb0708191433s44183416uc426970c8692e8b6@mail.gmail.com>

On 8/19/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
> I just fixed a bug in the new memoryview that used PyObject_DEL which
> caused a problem in debug mode.  I had to change it to a Py_DECREF.
> It seems we have a lot of spellings of ways to free an object and I
> wonder if there are more problems lurking in there.
>
> $ cat */*.c | grep -c PyObject_Del
> 103
> $ cat */*.c | grep -c PyObject_DEL
> 16
> $ cat */*.c | grep -c PyObject_Free
> 16
> $ cat */*.c | grep -c PyObject_FREE
> 19
>
> I don't know how many of these are correct or incorrect.
>
> Note in Include/objimpl, the Del and Free variants are the same.  I
> plan to get rid of one of them.
>
> #define PyObject_Del            PyObject_Free
> #define PyObject_DEL            PyObject_FREE
>
> PyObject_{MALLOC,REALLOC,FREE} depend upon whether python is compiled
> with debug mode, pymalloc, or not.
>
> What are the rules for when a particular API should be used (or not
> used) to free an object?

Going from the lowest level to the highest level we have:

{malloc,realloc,free} - libc's functions, arguments are bytes.

PyMem_{Malloc,Realloc,Free} - Simple wrapper of {malloc,realloc,free}
or PyObject_{Malloc,Realloc,Free}, but guarantees 0-byte allocations
will always succeed.  Do we really need this?  At best it seems
synonymous with PyObject_{Malloc,Realloc,Free}.  It is a better name
though.

PyObject_{Malloc,Realloc,Free} - obmalloc.c's functions, arguments are
bytes.  Despite the name, I believe it can be used for arbitrary
allocations (not just PyObjects.)  Probably shouldn't be in Objects/.
configure calls these pymalloc and they are controlled by the
WITH_PYMALLOC define.  Also guarantees 0-byte allocations will
succeed.

_PyObject_{New,NewVar} - object.c's functions, arguments are a
PyTypeObject and optionally a size.  Determines the number of bytes
automatically, initializes ob_type and ob_refcnt fields.

_PyObject_Del - Does nothing in particular (wraps free/Free), but the
argument is intended to be a PyObject returned by
_PyObject_{New,NewVar}.  Exists only to complement other functions.
Currently only a macro.  Could be extended to sanity-check ob_refcnt
field on debug builds.

_PyObject_GC_{New,NewVar,Del} - As _PyObject_{New,NewVar,Del}, but
adds hidden accounting info needed by cycle GC.

PyObject{,_GC}_{New,NewVar,Del} - Macros that add typecasting to the above.

-- 
Adam Olsen, aka Rhamphoryncus

From nnorwitz at gmail.com  Mon Aug 20 02:18:28 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Sun, 19 Aug 2007 17:18:28 -0700
Subject: [Python-3000] cleaning up different ways to free an object
In-Reply-To: <bbaeab100708191408h7ade8712wdbace192fc2ea2f1@mail.gmail.com>
References: <ee2a432c0708191150l655caa0fq941b3c65751e20d5@mail.gmail.com>
	<bbaeab100708191408h7ade8712wdbace192fc2ea2f1@mail.gmail.com>
Message-ID: <ee2a432c0708191718u7919aee0y55c440b1ae5e1701@mail.gmail.com>

On 8/19/07, Brett Cannon <brett at python.org> wrote:
> On 8/19/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
> > I just fixed a bug in the new memoryview that used PyObject_DEL which
> > caused a problem in debug mode.  I had to change it to a Py_DECREF.
> > It seems we have a lot of spellings of ways to free an object and I
> > wonder if there are more problems lurking in there.
> >
> > $ cat */*.c | grep -c PyObject_Del
> > 103
> > $ cat */*.c | grep -c PyObject_DEL
> > 16
> > $ cat */*.c | grep -c PyObject_Free
> > 16
> > $ cat */*.c | grep -c PyObject_FREE
> > 19
> >
> > I don't know how many of these are correct or incorrect.
> >
> > Note in Include/objimpl, the Del and Free variants are the same.  I
> > plan to get rid of one of them.
> >
> > #define PyObject_Del            PyObject_Free
> > #define PyObject_DEL            PyObject_FREE
> >
> > PyObject_{MALLOC,REALLOC,FREE} depend upon whether python is compiled
> > with debug mode, pymalloc, or not.
> >
> > What are the rules for when a particular API should be used (or not
> > used) to free an object?
>
> If you read the comment at the top of Include/objimpl.h, it says that
> PyObject_(New|NewVar|Del) are for object allocation while
> PyObject_(Malloc|Realloc|Free) are just like malloc/free, but they use
> pymalloc instead of the system malloc.

Ya, I'm not talking about the distinctions/categories.  They make
sense.  The 'correctness' I was referring to (thus the rules) was when
to use PyObject_Del vs Py_DECREF (ie, the problem with memoryview).  I
was trying to point out with the greps that the DELs/FREEs were
infrequently used.  I know there are some cases in _bsddb.c and I'm
wondering if those are correct (there are a handful of other modules
which also use them).  The Del variants are used more in Modules while
the Free variants are used more in the core.

Changing PyObject_Del to Py_DECREF may require that more of a
structure be initialized before DECREFing; otherwise the
dealloc might access uninitialized memory.
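
A loose Python-level analogy of that hazard, not the C code under
discussion: a finalizer runs even when construction stopped part-way, so
the fields it reads must be set before anything can fail.  The Resource
class and its fields are hypothetical, and the prompt finalization relies
on CPython's reference counting.

```python
import gc


class Resource:
    released = []  # records handles that the finalizer released

    def __init__(self, fail=False):
        # Put the object in a state the finalizer can handle *before*
        # doing anything that may fail.
        self.handle = None
        if fail:
            raise RuntimeError("setup failed")
        self.handle = "open"

    def __del__(self):
        # Runs even for an object whose __init__ raised part-way through.
        if self.handle is not None:
            Resource.released.append(self.handle)


try:
    Resource(fail=True)
except RuntimeError:
    pass

r = Resource()
del r          # last reference dropped; the finalizer runs
gc.collect()   # only matters on interpreters without reference counting
print(Resource.released)
```

Without the early "self.handle = None", the finalizer of the failed
object would hit a missing attribute, which is the Python-level cousin of
a dealloc reading uninitialized memory.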

I guess I can't really get rid of the aliases though, not without
making the API inconsistent.

> After that there are the usual performance macros.

Another thing that kinda bugs me is that the 'macro' versions are not
normally macros.  In a default build (ie, with pymalloc), they are
non-inlined function calls.

n

From eric+python-dev at trueblade.com  Mon Aug 20 02:56:03 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Sun, 19 Aug 2007 20:56:03 -0400
Subject: [Python-3000] PEP 3101 clarification requests
In-Reply-To: <46C65006.9030907@acm.org>
References: <fb6fbf560708171027h1aa2e8f2pecbafde5235ea164@mail.gmail.com>
	<46C65006.9030907@acm.org>
Message-ID: <46C8E6A3.4070202@trueblade.com>

Talin wrote:
> Wow, excellent feedback. I've added your email to the list of reminders 
> for the next round of edits.

Here's something else for future edits:

1. When converting a string to an integer, what should the rules be? 
Should:
format("0xd", "d")
produce "13", or should it be an error?

2. I'm making the format specifiers as strict as I can.  So, I've made 
these ValueErrors:

For strings:
   - specifying a sign
   - specifying an alignment of '='

For longs:
   - specifying a precision
   - specifying a sign with type of 'c'
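
The cases above can be checked with a short sketch against the built-in
format() of a current CPython 3 interpreter.  That interpreter is an
assumption on my part (the implementation discussed here was still in
progress), and the helper name rejects is made up for illustration.

```python
def rejects(value, spec):
    """Return True if format(value, spec) raises ValueError."""
    try:
        format(value, spec)
    except ValueError:
        return True
    return False


checks = {
    "str with sign": rejects("abc", "+"),
    "str with '=' alignment": rejects("abc", "=10"),
    "int with precision": rejects(42, ".2d"),
    "int with sign and 'c'": rejects(65, "+c"),
    "str formatted as 'd'": rejects("0xd", "d"),
}

for name, rejected in checks.items():
    print(name, "->", "ValueError" if rejected else "accepted")
```

The last entry corresponds to question 1: formatting the string "0xd"
with type 'd' is treated as an error rather than converted to 13.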

Eric.




From eric+python-dev at trueblade.com  Mon Aug 20 03:16:34 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Sun, 19 Aug 2007 21:16:34 -0400
Subject: [Python-3000] PEP 3101 clarification requests
In-Reply-To: <46C8E6A3.4070202@trueblade.com>
References: <fb6fbf560708171027h1aa2e8f2pecbafde5235ea164@mail.gmail.com>	<46C65006.9030907@acm.org>
	<46C8E6A3.4070202@trueblade.com>
Message-ID: <46C8EB72.2030505@trueblade.com>

Eric Smith wrote:

> 2. I'm making the format specifiers as strict as I can.  So, I've made 
> these ValueError's:

I should have mentioned that I expect there to be criticism of this 
decision.  I'd like to start by making the specifier parser strict; we 
can always loosen it if we find the need when converting actual code.

Eric.


From oliphant.travis at ieee.org  Mon Aug 20 09:21:18 2007
From: oliphant.travis at ieee.org (Travis E. Oliphant)
Date: Mon, 20 Aug 2007 01:21:18 -0600
Subject: [Python-3000] Py3k-buffer branch merged back to py3k branch
In-Reply-To: <ee2a432c0708182132v2d89ef9cn1edc3cae933762b1@mail.gmail.com>
References: <fa6kp4$s9b$1@sea.gmane.org>
	<ee2a432c0708182132v2d89ef9cn1edc3cae933762b1@mail.gmail.com>
Message-ID: <fabenk$afa$1@sea.gmane.org>

Neal Norwitz wrote:
> On 8/18/07, Travis E. Oliphant <oliphant.travis at ieee.org> wrote:
>> In preparation for the sprints, I have converted all Python objects to
>> use the new buffer protocol PEP and implemented most of the C-API.  This
>> work took place in the py3k-buffer branch which now passes all the tests
>> that py3k does.
>>
>> So, I merged the changes back to the py3k branch in hopes that others
>> can continue working on what I've done.  The merge took place after
>> fully syncing the py3k-buffer branch with the current trunk.
>>
>> Left to do:
>>
>> 1) Finish the MemoryViewObject (getitem/setitem needs work).
>> 2) Finish the struct module changes (I've started, but have not checked
>>         the changes in).
>> 3) Add tests
> 
> Also need to add doc.  I noticed not all the new APIs mentioned the
> meaning of the return value.  Do all the new functions which return
> int only return 0 on success and -1 on failure.  Or do any return a
> size.  I'm thinking of possible issues with Py_ssize_t vs int
> mismatches.  I saw a couple which might have been a problem.  See
> below.

Yes,  IIRC that is correct.

> 
>> Possible problems:
>>
>> It seems that whenever a PyExc_BufferError is raised, problems (like
>> segfaults) occur.  I tried to add a new error object by copying how
>> Python did it for other errors, but it's likely that I didn't do it right.
> 
> I think I fixed this.  Needed to add PRE_INIT and POST_INIT for the
> new exception.  This fixed the problem reported by Christian Heimes in
> this thread.

Thanks very much.

> 
> I checked in revision 57193 which was a code review.  I pointed out
> all the places I thought there were problems.  Since some of this code
> is tricky, I expect there will be more issues.  This code really,
> really needs tests.

If Chris (the guy who will be at the sprint) does not write tests, I 
will, but it will probably be after about Aug. 27.

> 
> I added a comment about a memory leak.  Below is the stack trace of
> where the memory was allocated.  I added a comment (in release buffer)
> where I thought it could be freed, but I'm not sure that's the right
> place.

There should be no memory to free there.  The get and release buffer 
mechanism doesn't allocate or free any memory (there was a hack in 
arrayobject which I just fixed).

Now, perhaps there are some reference counting issues, but the mechanism 
doesn't really play with reference counts either.  I will be around 
after August 27th to test the code more (it will help to finish 
implementing the MemoryView Object -- i.e. get its tolist function 
working, and so forth).

> 
> When I ran the test suite test_xmlrpc failed.  I'm not sure if this
> was from your checkin, my checkin, or something else.
> 

This was definitely happening prior to my checking in.

> n
> --
> Memory leaked when allocated from:
>  array_buffer_getbuf (arraymodule.c:1775)
>  buffer_getbuf (bufferobject.c:28)
>  bytes_init (bytesobject.c:807)
>  type_call (typeobject.c:429)

Hmm.  I'm not sure what memory is being leaked unless there are 
reference counting issues I'm not seeing.

In bytes_init for example, that line number is a static memory 
allocation?  How is static memory being leaked?

The arraymodule.c malloc call should be gone now as the possible strings 
needed are now in the source code itself.


From brett at python.org  Mon Aug 20 09:51:41 2007
From: brett at python.org (Brett Cannon)
Date: Mon, 20 Aug 2007 00:51:41 -0700
Subject: [Python-3000] Planning to switch to new tracker on August 23rd
Message-ID: <bbaeab100708200051kadbe485h2cdc3176b157d8a4@mail.gmail.com>

Having squashed the final issues, we are now ready to switch over to
the new tracker!  The plan is to do it on the 23rd.  But before I
announce to the community I wanted to make sure there was not some
specific objection by python-dev or python-3000.  If there is please
let me know by midday Monday so that we can postpone to next week if
needed.

-Brett

From nnorwitz at gmail.com  Mon Aug 20 18:37:30 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Mon, 20 Aug 2007 09:37:30 -0700
Subject: [Python-3000] Py3k-buffer branch merged back to py3k branch
In-Reply-To: <fabenk$afa$1@sea.gmane.org>
References: <fa6kp4$s9b$1@sea.gmane.org>
	<ee2a432c0708182132v2d89ef9cn1edc3cae933762b1@mail.gmail.com>
	<fabenk$afa$1@sea.gmane.org>
Message-ID: <ee2a432c0708200937h558a5832l409d565722257809@mail.gmail.com>

On 8/20/07, Travis E. Oliphant <oliphant.travis at ieee.org> wrote:
>
> > Memory leaked when allocated from:
> >  array_buffer_getbuf (arraymodule.c:1775)
> >  buffer_getbuf (bufferobject.c:28)
> >  bytes_init (bytesobject.c:807)
> >  type_call (typeobject.c:429)
>
> Hmm.  I'm not sure what memory is being leaked unless there are
> reference counting issues I'm not seeing.
>
> In bytes_init for example, that line number is a static memory
> allocation?  How is static memory being leaked?

I'm not sure if this was before or after my checkin, so the line
numbers could have been off a bit.

> The arraymodule.c malloc call should be gone now as the possible strings
> needed are now in the source code itself.

That was the only leak AFAIK.  So hopefully by removing it there
aren't any more.  Once there are tests it will be worthwhile to check
again.  I don't think I checked for refleaks.

n

From rhamph at gmail.com  Mon Aug 20 18:49:09 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Mon, 20 Aug 2007 10:49:09 -0600
Subject: [Python-3000] cleaning up different ways to free an object
In-Reply-To: <ee2a432c0708191718u7919aee0y55c440b1ae5e1701@mail.gmail.com>
References: <ee2a432c0708191150l655caa0fq941b3c65751e20d5@mail.gmail.com>
	<bbaeab100708191408h7ade8712wdbace192fc2ea2f1@mail.gmail.com>
	<ee2a432c0708191718u7919aee0y55c440b1ae5e1701@mail.gmail.com>
Message-ID: <aac2c7cb0708200949p5c88f23fyfa2c9492448df625@mail.gmail.com>

On 8/19/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
> On 8/19/07, Brett Cannon <brett at python.org> wrote:
> > On 8/19/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
> > > I just fixed a bug in the new memoryview that used PyObject_DEL which
> > > caused a problem in debug mode.  I had to change it to a Py_DECREF.
> > > It seems we have a lot of spellings of ways to free an object and I
> > > wonder if there are more problems lurking in there.
> > >
> > > $ cat */*.c | grep -c PyObject_Del
> > > 103
> > > $ cat */*.c | grep -c PyObject_DEL
> > > 16
> > > $ cat */*.c | grep -c PyObject_Free
> > > 16
> > > $ cat */*.c | grep -c PyObject_FREE
> > > 19
> > >
> > > I don't know how many of these are correct or incorrect.
> > >
> > > Note in Include/objimpl, the Del and Free variants are the same.  I
> > > plan to get rid of one of them.
> > >
> > > #define PyObject_Del            PyObject_Free
> > > #define PyObject_DEL            PyObject_FREE
> > >
> > > PyObject_{MALLOC,REALLOC,FREE} depend upon whether python is compiled
> > > with debug mode, pymalloc, or not.
> > >
> > > What are the rules for when a particular API should be used (or not
> > > used) to free an object?
> >
> > If you read the comment at the top of Include/objimpl.h, it says that
> > PyObject_(New|NewVar|Del) are for object allocation while
> > PyObject_(Malloc|Realloc|Free) are just like malloc/free, but they use
> > pymalloc instead of the system malloc.
>
> Ya, I'm not talking about the distinctions/categories.  They make
> sense.  The 'correctness' I was referring to (thus the rules) was when
> to use PyObject_Del vs Py_DECREF (ie, the problem with memoryview).  I
> was trying to point out with the greps that the DELs/FREEs were
> infrequently used.  I know there are some cases in _bsddb.c and I'm
> wondering if those are correct (there are a handful of other modules
> which also use them).  The Del variants are used more in Modules while
> the Free variants are used more in the core.

Thus my suggestion to add a refcount check to _PyObject_Del.  It
should only be used when the refcount hits 0.  Using it at 1 could be
allowed too, or maybe that should be a ForceDel variant?


> Changing PyObject_Del to Py_DECREF may require that more of a
> structure needs to be initialized before DECREFing, otherwise the
> dealloc might access uninitialized memory.
>
> I guess I can't really get rid of the aliases though, not without
> making the API inconsistent.
>
> > After that there are the usual performance macros.
>
> Another thing that kinda bugs me is that the 'macro' versions are not
> normally macros.  In a default build (ie, with pymalloc), they are
> non-inlined function calls.

I'd very much like to see the macros (other than the type casting ones)
ripped out.  I doubt a performance advantage for normal non-debug uses
can be demonstrated.  (Prove me wrong of course.)

The hardest part of understanding what to call is all these macros
and ifdefs.

-- 
Adam Olsen, aka Rhamphoryncus

From guido at python.org  Mon Aug 20 21:01:16 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 20 Aug 2007 12:01:16 -0700
Subject: [Python-3000] PEP 3101 clarification requests
In-Reply-To: <46C8E6A3.4070202@trueblade.com>
References: <fb6fbf560708171027h1aa2e8f2pecbafde5235ea164@mail.gmail.com>
	<46C65006.9030907@acm.org> <46C8E6A3.4070202@trueblade.com>
Message-ID: <ca471dc20708201201u5a36885fhae5604c6c840121@mail.gmail.com>

On 8/19/07, Eric Smith <eric+python-dev at trueblade.com> wrote:
> Talin wrote:
> > Wow, excellent feedback. I've added your email to the list of reminders
> > for the next round of edits.
>
> Here's something else for future edits:
>
> 1. When converting a string to an integer, what should the rules be?
> Should:
> format("0xd", "d")
> produce "13", or should it be an error?

I can't see that as anything besides an error. There should be no
implicit conversions from strings to ints.

> 2. I'm making the format specifiers as strict as I can.  So, I've made
> these ValueError's:
>
> For strings:
>    - specifying a sign
>    - specifying an alignment of '='
>
> For longs:
>    - specify a precision
>    - specify a sign with type of 'c'

Works for me. Will probably catch a few bugs.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Mon Aug 20 21:05:13 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 20 Aug 2007 12:05:13 -0700
Subject: [Python-3000] Wanted: tasks for Py3k Sprint next week
In-Reply-To: <aac2c7cb0708182150n2c2f7e04u8d24aa761da2a64a@mail.gmail.com>
References: <ca471dc20708181005u3058fb42t35fdeee9077d9cd2@mail.gmail.com>
	<aac2c7cb0708182150n2c2f7e04u8d24aa761da2a64a@mail.gmail.com>
Message-ID: <ca471dc20708201205o2f2a6e75oa28838a53bde1fa1@mail.gmail.com>

If one of these pairs exists solely for backwards compatibility, yes.
I think Neal Norwitz started a discussion of a similar issue.

On 8/18/07, Adam Olsen <rhamph at gmail.com> wrote:
> On 8/18/07, Guido van Rossum <guido at python.org> wrote:
> > I'm soliciting ideas for things that need to be done for the 3.0
> > release that would make good sprint topics. Assume we'll have a mix of
> > more and less experienced developers on hand.
> >
> > (See wiki.python.org/moin/GoogleSprint .)
>
> Would ripping out the malloc macros[1] be a suitable suggestion?
>
> [1]
> Include/objimpl.h:#define PyObject_MALLOC               PyMem_MALLOC
> Include/pymem.h:#define PyMem_MALLOC            PyObject_MALLOC
>
> --
> Adam Olsen, aka Rhamphoryncus
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From eric+python-dev at trueblade.com  Mon Aug 20 21:11:27 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Mon, 20 Aug 2007 15:11:27 -0400
Subject: [Python-3000] PEP 3101 clarification requests
In-Reply-To: <ca471dc20708201201u5a36885fhae5604c6c840121@mail.gmail.com>
References: <fb6fbf560708171027h1aa2e8f2pecbafde5235ea164@mail.gmail.com>	
	<46C65006.9030907@acm.org> <46C8E6A3.4070202@trueblade.com>
	<ca471dc20708201201u5a36885fhae5604c6c840121@mail.gmail.com>
Message-ID: <46C9E75F.2040208@trueblade.com>

Guido van Rossum wrote:
> On 8/19/07, Eric Smith <eric+python-dev at trueblade.com> wrote:
>> Talin wrote:
>>> Wow, excellent feedback. I've added your email to the list of reminders
>>> for the next round of edits.
>> Here's something else for future edits:
>>
>> 1. When converting a string to an integer, what should the rules be?
>> Should:
>> format("0xd", "d")
>> produce "13", or should it be an error?
> 
> I can't see that as anything besides an error. There should be no
> implicit conversions from strings to ints.

OK.  I had been planning on implicitly converting between strings, ints, 
and floats (in all directions).  The PEP doesn't really say.

So the only implicit conversions will be:
int->float
int->string
float->int
float->string

Now that I look at it, % doesn't support string->float or string->int 
conversions.  Not sure where I got the idea it was needed.  I'll remove 
it and update my test cases.

Converting to strings doesn't really buy you much, since we have the !s 
specifier. But I think it's needed for backward compatibility with % 
formatting.
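
For readers who want to try these rules, here is a minimal sketch, assuming a released CPython 3 interpreter rather than the 2007 branch discussed in this thread. As far as I can tell only the int->float direction survived in the released format(); the helper name try_format is made up for this illustration.

```python
def try_format(value, spec):
    """Return the formatted text, or the name of the exception raised."""
    try:
        return format(value, spec)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__

int_as_float = try_format(13, ".1f")     # int accepts float presentation types
float_as_int = try_format(13.0, "d")     # float rejects the integer type code
str_as_int = try_format("0xd", "d")      # no implicit string -> int conversion
int_as_str = try_format(13, "s")         # no implicit int -> string conversion
explicit = format(int("0xd", 16), "d")   # convert explicitly, then format
```

The last line shows the explicit route: parse the string yourself, then hand an int to format().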


From guido at python.org  Mon Aug 20 21:16:08 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 20 Aug 2007 12:16:08 -0700
Subject: [Python-3000] PEP 3101 clarification requests
In-Reply-To: <46C9E75F.2040208@trueblade.com>
References: <fb6fbf560708171027h1aa2e8f2pecbafde5235ea164@mail.gmail.com>
	<46C65006.9030907@acm.org> <46C8E6A3.4070202@trueblade.com>
	<ca471dc20708201201u5a36885fhae5604c6c840121@mail.gmail.com>
	<46C9E75F.2040208@trueblade.com>
Message-ID: <ca471dc20708201216h27555a7nac9a8f3308a502f@mail.gmail.com>

On 8/20/07, Eric Smith <eric+python-dev at trueblade.com> wrote:
> Guido van Rossum wrote:
> > On 8/19/07, Eric Smith <eric+python-dev at trueblade.com> wrote:
> >> Talin wrote:
> >>> Wow, excellent feedback. I've added your email to the list of reminders
> >>> for the next round of edits.
> >> Here's something else for future edits:
> >>
> >> 1. When converting a string to an integer, what should the rules be?
> >> Should:
> >> format("0xd", "d")
> >> produce "13", or should it be an error?
> >
> > I can't see that as anything besides an error. There should be no
> > implicit conversions from strings to ints.
>
> OK.  I had been planning on implicitly converting between strings, ints,
> and floats (in all directions).  The PEP doesn't really say.
>
> So the only implicit conversions will be:
> int->float
> int->string
> float->int
> float->string
>
> Now that I look at it, % doesn't support string->float or string->int
> conversions.  Not sure where I got the idea it was needed.  I'll remove
> it and update my test cases.
>
> Converting to strings doesn't really buy you much, since we have the !s
> specifier. But I think it's needed for backward compatibility with %
> formatting.

Why? The conversion code can just generate !s:-20 instead of :-20s.
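
A small sketch of that translation, assuming the format syntax as it later shipped in CPython 3. The "-20" above mirrors the % flag; in the released mini-language left alignment is spelled "<", so the equivalent of %-20s would be !s:<20.

```python
value = 42

old_style = "%-20s" % value              # %-formatting: left-justify in 20 columns
new_style = "{0!s:<20}".format(value)    # !s calls str() first, then pads the string

# Without !s, int.__format__ sees the spec and rejects the 's' type code.
try:
    "{0:<20s}".format(value)
    rejected = False
except ValueError:
    rejected = True
```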

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From eric+python-dev at trueblade.com  Mon Aug 20 21:46:41 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Mon, 20 Aug 2007 15:46:41 -0400
Subject: [Python-3000] PEP 3101 clarification requests
In-Reply-To: <ca471dc20708201216h27555a7nac9a8f3308a502f@mail.gmail.com>
References: <fb6fbf560708171027h1aa2e8f2pecbafde5235ea164@mail.gmail.com>	
	<46C65006.9030907@acm.org> <46C8E6A3.4070202@trueblade.com>	
	<ca471dc20708201201u5a36885fhae5604c6c840121@mail.gmail.com>	
	<46C9E75F.2040208@trueblade.com>
	<ca471dc20708201216h27555a7nac9a8f3308a502f@mail.gmail.com>
Message-ID: <46C9EFA1.5050906@trueblade.com>

Guido van Rossum wrote:
>> Converting to strings doesn't really buy you much, since we have the !s
>> specifier. But I think it's needed for backward compatibility with %
>> formatting.
> 
> Why? The conversion code can just generate !s:-20 instead of :-20s.

True enough.  I'll take it out, too.

Talin:  On your list of to-do items for the PEP, could you add that the 
only conversions for the standard conversion specifiers are int <-> float?

Thanks.

Eric.


From eric+python-dev at trueblade.com  Tue Aug 21 02:18:42 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Mon, 20 Aug 2007 20:18:42 -0400
Subject: [Python-3000] Looking for advice on PEP 3101 implementation
	details
In-Reply-To: <ca471dc20708171358o33543610i3d2b6f551db5c80f@mail.gmail.com>
References: <46C5F112.5050101@trueblade.com>	
	<ca471dc20708171236v34eeea14v5bf55e4b72e218a0@mail.gmail.com>	
	<46C601A7.6050408@trueblade.com>
	<ca471dc20708171358o33543610i3d2b6f551db5c80f@mail.gmail.com>
Message-ID: <46CA2F62.8060903@trueblade.com>

I'm basically done with format(), string.format(), object.__format__(), 
string.__format__(), long.__format__(), and float.__format__().  I have 
some cleanup left to do from all of the refactoring, but it's passing 
the vast majority of my tests.

The only real remaining work is to implement string.Formatter.  This is 
a class designed to be overridden to customize the formatting behavior. 
  It will share much of the C code with string.format().
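
To give an idea of the kind of customization meant here, a minimal sketch against string.Formatter as it exists in released Python 3. The subclass name and its placeholder behaviour are hypothetical, invented only for this illustration; get_value() is one of the documented override points.

```python
import string

class DefaultingFormatter(string.Formatter):
    """Hypothetical subclass: missing keyword fields render as a placeholder."""

    def __init__(self, missing="<unset>"):
        super().__init__()
        self.missing = missing

    def get_value(self, key, args, kwargs):
        # Keyword lookups fall back to the placeholder; positional ones
        # keep the base-class behaviour.
        if isinstance(key, str):
            return kwargs.get(key, self.missing)
        return super().get_value(key, args, kwargs)

fmt = DefaultingFormatter()
greeting = fmt.format("{0}, {name}! ({mood})", "Hello", name="Greg")
```

Everything else (parsing, conversion, applying the format spec) is inherited unchanged, which is the point of making the class overridable.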

My plan is to write this class in Python, and put it in Lib/string.py.
Given the complexities and book-keeping involved, writing it in C
doesn't seem worth the hassle.  In order to talk back to the C
implementation code, I'll create a private module in Modules/_formatter.c.

Does this seem reasonable?

If so, my question is how to add a module to the Modules directory.  There 
is some logic in the top level Makefile.pre.in, but it doesn't look like 
it applies to all of the code in Modules, just some of the files.

Modules/Setup.dist contains this comment:
# This only contains the minimal set of modules required to run the
# setup.py script in the root of the Python source tree.

I think this applies to me, as setup.py indirectly includes string.

So, is the right thing to do to insert my _formatter.c into 
Modules/Setup.dist?  Is there anything else I need to do?  Is there some 
existing code in Modules that I should base my approach on?

I googled for help on this, but didn't get anywhere.

Thanks again for your assistance.

Eric.




From greg.ewing at canterbury.ac.nz  Tue Aug 21 02:48:25 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 21 Aug 2007 12:48:25 +1200
Subject: [Python-3000] PEP 3101 clarification requests
In-Reply-To: <46C9EFA1.5050906@trueblade.com>
References: <fb6fbf560708171027h1aa2e8f2pecbafde5235ea164@mail.gmail.com>
	<46C65006.9030907@acm.org> <46C8E6A3.4070202@trueblade.com>
	<ca471dc20708201201u5a36885fhae5604c6c840121@mail.gmail.com>
	<46C9E75F.2040208@trueblade.com>
	<ca471dc20708201216h27555a7nac9a8f3308a502f@mail.gmail.com>
	<46C9EFA1.5050906@trueblade.com>
Message-ID: <46CA3659.6010304@canterbury.ac.nz>

Eric Smith wrote:
> Guido van Rossum wrote:
> 
> > Why? The conversion code can just generate !s:-20 instead of :-20s.
>  
> Talin:  On your list of to-do items for the PEP, could you add that the 
> only conversions for the standard conversion specifiers are int <-> float?

Please, no! While the converter may be able to handle
it, "!s:-20" is terribly ugly for humans.

--
Greg

From guido at python.org  Tue Aug 21 03:00:55 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 20 Aug 2007 18:00:55 -0700
Subject: [Python-3000] Looking for advice on PEP 3101 implementation
	details
In-Reply-To: <46CA2F62.8060903@trueblade.com>
References: <46C5F112.5050101@trueblade.com>
	<ca471dc20708171236v34eeea14v5bf55e4b72e218a0@mail.gmail.com>
	<46C601A7.6050408@trueblade.com>
	<ca471dc20708171358o33543610i3d2b6f551db5c80f@mail.gmail.com>
	<46CA2F62.8060903@trueblade.com>
Message-ID: <ca471dc20708201800p3c1e546s906b30a7a7611e9c@mail.gmail.com>

On 8/20/07, Eric Smith <eric+python-dev at trueblade.com> wrote:
> I'm basically done with format(), string.format(), object.__format__(),
> string.__format__(), long.__format__(), and float.__format__().  I have
> some cleanup left to do from all of the refactoring, but it's passing
> the vast majority of my tests.
>
> The only real remaining work is to implement string.Formatter.  This is
> a class designed to be overridden to customize the formatting behavior.
>   It will share much of the C code with string.format().
>
> My plan is to write this class in Python, and put it in Lib/string.py.
> Given the complexities and book-keeping involved, writing it in C
> doesn't seem worth the hassle.  In order to talk back to the C
> implementation code, I'll create a private module in Modules/_formatter.c.
>
> Does this seem reasonable?

Sure.

> If so, my question is how to add a module to the Modules directory.  There
> is some logic in the top level Makefile.pre.in, but it doesn't look like
> it applies to all of the code in Modules, just some of the files.
>
> Modules/Setup.dist contains this comment:
> # This only contains the minimal set of modules required to run the
> # setup.py script in the root of the Python source tree.
>
> I think this applies to me, as setup.py indirectly includes string.
>
> So, is the right thing to do to insert my _formatter.c into
> Modules/Setup.dist?  Is there anything else I need to do?  Is there some
> existing code in Modules that I should base my approach on?
>
> I googled for help on this, but didn't get anywhere.
>
> Thanks again for your assistance.

You can ignore Makefile* and Modules/Setup*; instead, you should be
editing setup.py at the toplevel. Since your new module doesn't depend
on anything external it should be a one-line change, modeled after
this one:

exts.append( Extension('_weakref', ['_weakref.c']) )

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Aug 21 03:03:09 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 20 Aug 2007 18:03:09 -0700
Subject: [Python-3000] PEP 3101 clarification requests
In-Reply-To: <46CA3659.6010304@canterbury.ac.nz>
References: <fb6fbf560708171027h1aa2e8f2pecbafde5235ea164@mail.gmail.com>
	<46C65006.9030907@acm.org> <46C8E6A3.4070202@trueblade.com>
	<ca471dc20708201201u5a36885fhae5604c6c840121@mail.gmail.com>
	<46C9E75F.2040208@trueblade.com>
	<ca471dc20708201216h27555a7nac9a8f3308a502f@mail.gmail.com>
	<46C9EFA1.5050906@trueblade.com> <46CA3659.6010304@canterbury.ac.nz>
Message-ID: <ca471dc20708201803n52302cd6w6520d7d87a54d956@mail.gmail.com>

On 8/20/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Eric Smith wrote:
> > Guido van Rossum wrote:
> >
> > > Why? The conversion code can just generate !s:-20 instead of :-20s.
> >
> > Talin:  On your list of to-do items for the PEP, could you add that the
> > only conversions for the standard conversion specifiers are int <-> float?
>
> Please, no! While the converter may be able to handle
> it, "!s:-20" is terribly ugly for humans.

But how often will you need this? (You only need the !s part if you
don't know that the argument is a string.) The alternative would
require every type's formatter to interpret -20s the same way, which
goes against the idea that the conversion mini-language is an object's
own business.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From eric+python-dev at trueblade.com  Tue Aug 21 03:22:22 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Mon, 20 Aug 2007 21:22:22 -0400
Subject: [Python-3000] Looking for advice on PEP 3101 implementation
	details
In-Reply-To: <ca471dc20708201800p3c1e546s906b30a7a7611e9c@mail.gmail.com>
References: <46C5F112.5050101@trueblade.com>	
	<ca471dc20708171236v34eeea14v5bf55e4b72e218a0@mail.gmail.com>	
	<46C601A7.6050408@trueblade.com>	
	<ca471dc20708171358o33543610i3d2b6f551db5c80f@mail.gmail.com>	
	<46CA2F62.8060903@trueblade.com>
	<ca471dc20708201800p3c1e546s906b30a7a7611e9c@mail.gmail.com>
Message-ID: <46CA3E4E.1020203@trueblade.com>

Guido van Rossum wrote:
> On 8/20/07, Eric Smith <eric+python-dev at trueblade.com> wrote:

>> Modules/Setup.dist contains this comment:
>> # This only contains the minimal set of modules required to run the
>> # setup.py script in the root of the Python source tree.
>>
>> I think this applies to me, as setup.py indirectly includes string.

> You can ignore Makefile* and Modules/Setup*; instead, you should be
> editing setup.py at the toplevel. Since your new module doesn't depend
> on anything external it should be a one-line change, modeled after
> this one:
> 
> exts.append( Extension('_weakref', ['_weakref.c']) )

But if string.py imports _formatter, then setup.py fails with being 
unable to "import string":

$ ./python setup.py
object  : ImportError('No module named _formatter',)
type    : ImportError
refcount: 4
address : 0xf6f9acac
lost sys.stderr

That's why I referenced the comment in Setup.dist.

From guido at python.org  Tue Aug 21 04:25:46 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 20 Aug 2007 19:25:46 -0700
Subject: [Python-3000] Looking for advice on PEP 3101 implementation
	details
In-Reply-To: <46CA3E4E.1020203@trueblade.com>
References: <46C5F112.5050101@trueblade.com>
	<ca471dc20708171236v34eeea14v5bf55e4b72e218a0@mail.gmail.com>
	<46C601A7.6050408@trueblade.com>
	<ca471dc20708171358o33543610i3d2b6f551db5c80f@mail.gmail.com>
	<46CA2F62.8060903@trueblade.com>
	<ca471dc20708201800p3c1e546s906b30a7a7611e9c@mail.gmail.com>
	<46CA3E4E.1020203@trueblade.com>
Message-ID: <ca471dc20708201925i5acdb0a8kc2b595f5c5807523@mail.gmail.com>

On 8/20/07, Eric Smith <eric+python-dev at trueblade.com> wrote:
> Guido van Rossum wrote:
> > On 8/20/07, Eric Smith <eric+python-dev at trueblade.com> wrote:
>
> >> Modules/Setup.dist contains this comment:
> >> # This only contains the minimal set of modules required to run the
> >> # setup.py script in the root of the Python source tree.
> >>
> >> I think this applies to me, as setup.py indirectly includes string.
>
> > You can ignore Makefile* and Modules/Setup*; instead, you should be
> > editing setup.py at the toplevel. Since your new module doesn't depend
> > on anything external it should be a one-line change, modeled after
> > this one:
> >
> > exts.append( Extension('_weakref', ['_weakref.c']) )
>
> But if string.py imports _formatter, then setup.py fails with being
> unable to "import string":
>
> $ ./python setup.py
> object  : ImportError('No module named _formatter',)
> type    : ImportError
> refcount: 4
> address : 0xf6f9acac
> lost sys.stderr
>
> That's why I referenced the comment in Setup.dist.

Hm, those damn dependencies. In that case I suggest adding it to sys
instead of creating a new internal method. It could be
sys._formatparser or whatever useful name you'd like to give it, as
long as it starts with an underscore. That should solve it.
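
The helper under discussion is essentially a parser for format strings. This thread does not settle what it was finally called, so the sketch below only assumes the public route available in released Python 3, string.Formatter.parse(), which yields one tuple per chunk of the format string.

```python
import string

parser = string.Formatter()

# Each tuple is (literal_text, field_name, format_spec, conversion).
pieces = list(parser.parse("id={0!s:>5} name={user.name}"))

first_literal, first_field, first_spec, first_conversion = pieces[0]
second_field = pieces[1][1]
second_conversion = pieces[1][3]
```

The field name is returned unsplit ("user.name"); resolving the attribute lookup is left to get_field().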

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From eric+python-dev at trueblade.com  Tue Aug 21 04:32:49 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Mon, 20 Aug 2007 22:32:49 -0400
Subject: [Python-3000] Looking for advice on PEP 3101 implementation
	details
In-Reply-To: <ca471dc20708201925i5acdb0a8kc2b595f5c5807523@mail.gmail.com>
References: <46C5F112.5050101@trueblade.com>	
	<ca471dc20708171236v34eeea14v5bf55e4b72e218a0@mail.gmail.com>	
	<46C601A7.6050408@trueblade.com>	
	<ca471dc20708171358o33543610i3d2b6f551db5c80f@mail.gmail.com>	
	<46CA2F62.8060903@trueblade.com>	
	<ca471dc20708201800p3c1e546s906b30a7a7611e9c@mail.gmail.com>	
	<46CA3E4E.1020203@trueblade.com>
	<ca471dc20708201925i5acdb0a8kc2b595f5c5807523@mail.gmail.com>
Message-ID: <46CA4ED1.5010606@trueblade.com>

Guido van Rossum wrote:
> Hm, those damn dependencies. In that case I suggest adding it to sys
> instead of creating a new internal method. It could be
> sys._formatparser or whatever useful name you'd like to give it, as
> long as it starts with an underscore. That should solve it.

Okay, that's much easier for me.  I'll go in that direction.


From greg.ewing at canterbury.ac.nz  Tue Aug 21 07:46:42 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 21 Aug 2007 17:46:42 +1200
Subject: [Python-3000] PEP 3101 clarification requests
In-Reply-To: <ca471dc20708201803n52302cd6w6520d7d87a54d956@mail.gmail.com>
References: <fb6fbf560708171027h1aa2e8f2pecbafde5235ea164@mail.gmail.com>
	<46C65006.9030907@acm.org> <46C8E6A3.4070202@trueblade.com>
	<ca471dc20708201201u5a36885fhae5604c6c840121@mail.gmail.com>
	<46C9E75F.2040208@trueblade.com>
	<ca471dc20708201216h27555a7nac9a8f3308a502f@mail.gmail.com>
	<46C9EFA1.5050906@trueblade.com> <46CA3659.6010304@canterbury.ac.nz>
	<ca471dc20708201803n52302cd6w6520d7d87a54d956@mail.gmail.com>
Message-ID: <46CA7C42.5040007@canterbury.ac.nz>

Guido van Rossum wrote:
> But how often will you need this? (You only need the !s part if you
> don't know that the argument is a string.)

Maybe I'm confused. I thought we had agreed that most
types would delegate to str if they didn't understand
the format, so most of the time there wouldn't be any
need to use "!s". Is that still true?

If not, I think it will be very inconvenient, as I
very frequently format things of all sorts of types
using "%s", and rely on it doing something reasonable.

--
Greg

From ericsmith at windsor.com  Tue Aug 21 00:54:51 2007
From: ericsmith at windsor.com (Eric V. Smith)
Date: Mon, 20 Aug 2007 18:54:51 -0400
Subject: [Python-3000] Looking for advice on PEP 3101 implementation
	details
In-Reply-To: <46C5F112.5050101@trueblade.com>
References: <46C5F112.5050101@trueblade.com>
Message-ID: <46CA1BBB.6070603@windsor.com>

I've completed most of the implementation for PEP 3101.  The only thing
I have left to do is the Formatter class, which is supposed to live in
the string module.

My plan is to write this part in Python, and put it in Lib/string.py.
Given the complexities and book-keeping involved, writing it in C
doesn't seem worth the hassle.  In order to talk back to the existing C
implementation, I'll create a private module in Modules/_formatter.c.

Does this seem reasonable?

If so, my question is how to add a module to the Modules directory.  There 
is some logic in the top level Makefile.pre.in, but it doesn't look like 
it applies to all of the code in Modules, just some of the files.

Modules/Setup.dist contains this comment:
# This only contains the minimal set of modules required to run the
# setup.py script in the root of the Python source tree.

I think this applies to me, as setup.py indirectly includes string.

So, is the right thing to do to insert my _formatter.c into 
Modules/Setup.dist?  Is there anything else I need to do?

I googled for help on this, but didn't get anywhere.

Thanks again for any assistance.

Eric.


From g.brandl at gmx.net  Tue Aug 21 08:17:30 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Tue, 21 Aug 2007 08:17:30 +0200
Subject: [Python-3000] Documentation switch imminent
In-Reply-To: <740c3aec0708171731qc9324c3o17debfafe4c1530d@mail.gmail.com>
References: <f9t2nn$ksg$1@sea.gmane.org>
	<740c3aec0708171731qc9324c3o17debfafe4c1530d@mail.gmail.com>
Message-ID: <fae01k$b0o$1@sea.gmane.org>

BJörn Lindqvist wrote:
> It is fantastic! Totally super work. I just have one small request;
> pretty please do not set the font. I'm very happy with my browsers
> default (Verdana), and Bitstream Vera Sans renders badly for me.

Okay, I've changed the stylesheet; it should go live on docs.python.org
shortly.

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From eric+python-dev at trueblade.com  Tue Aug 21 11:21:09 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Tue, 21 Aug 2007 05:21:09 -0400
Subject: [Python-3000] Looking for advice on PEP 3101
	implementation	details
In-Reply-To: <46CA1BBB.6070603@windsor.com>
References: <46C5F112.5050101@trueblade.com> <46CA1BBB.6070603@windsor.com>
Message-ID: <46CAAE85.1070206@trueblade.com>

Eric V. Smith wrote:
[a duplicate message]

Please ignore this.  I accidentally sent it twice.

From eric+python-dev at trueblade.com  Tue Aug 21 11:55:23 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Tue, 21 Aug 2007 05:55:23 -0400
Subject: [Python-3000] PEP 3101 clarification requests
In-Reply-To: <46CA7C42.5040007@canterbury.ac.nz>
References: <fb6fbf560708171027h1aa2e8f2pecbafde5235ea164@mail.gmail.com>
	<46C65006.9030907@acm.org> <46C8E6A3.4070202@trueblade.com>
	<ca471dc20708201201u5a36885fhae5604c6c840121@mail.gmail.com>
	<46C9E75F.2040208@trueblade.com>
	<ca471dc20708201216h27555a7nac9a8f3308a502f@mail.gmail.com>
	<46C9EFA1.5050906@trueblade.com>
	<46CA3659.6010304@canterbury.ac.nz>
	<ca471dc20708201803n52302cd6w6520d7d87a54d956@mail.gmail.com>
	<46CA7C42.5040007@canterbury.ac.nz>
Message-ID: <46CAB68B.9060900@trueblade.com>

Greg Ewing wrote:
> Guido van Rossum wrote:
>> But how often will you need this? (You only need the !s part if you
>> don't know that the argument is a string.)
> 
> Maybe I'm confused. I thought we had agreed that most
> types would delegate to str if they didn't understand
> the format, so most of the time there wouldn't be any
> need to use "!s". Is that still true?

Yes, it is true.  Here's a working test case:

     # class with __str__, but no __format__
     class E:
         def __init__(self, x):
             self.x = x
         def __str__(self):
             return 'E(' + self.x + ')'

     self.assertEqual('{0}'.format(E('data')), 'E(data)')
     self.assertEqual('{0:^10}'.format(E('data')), ' E(data)  ')
     self.assertEqual('{0:^10s}'.format(E('data')), ' E(data)  ')

The formatting in all 3 cases is being done by string.__format__() 
(actually object.__format__, which calls str(o).__format__).
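
A caveat for anyone running this on a current interpreter: to my knowledge, since Python 3.4 object.__format__ raises TypeError when given a non-empty format spec, so the second and third assertions no longer pass as written. A minimal sketch that keeps the same results by spelling out the delegation (the added __format__ method is my addition, not part of the test case above):

```python
class E:
    """Has __str__; __format__ forwards explicitly to the string formatter."""

    def __init__(self, x):
        self.x = x

    def __str__(self):
        return 'E(' + self.x + ')'

    def __format__(self, format_spec):
        # Spell out the delegation instead of relying on object.__format__.
        return format(str(self), format_spec)

plain = '{0}'.format(E('data'))
centered = '{0:^10}'.format(E('data'))
centered_s = '{0:^10s}'.format(E('data'))
```

Writing '{0!s:^10}' would work as well, without any __format__ method on the class.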

> If not, I think it will be very inconvenient, as I
> very frequently format things of all sorts of types
> using "%s", and rely on it doing something reasonable.

That will continue to work, for objects that don't provide a __format__ 
function.  The problem is that if an object does define its own 
__format__, it either needs to understand all of the string formatting, 
or at least recognize a string format and send it along to 
string.__format__() (or object.__format__, which will convert to string 
for you).  Another working test case:

     # class with __format__ that forwards to string,
     #  for some format_spec's
     class G:
         def __init__(self, x):
             self.x = x
         def __str__(self):
             return "string is " + self.x
         def __format__(self, format_spec):
             if format_spec == 's':
                 return 'G(' + self.x + ')'
             return object.__format__(self, format_spec)

     self.assertEqual('{0:s}'.format(G('data')), 'G(data)')

     # unknown spec, will call object.__format__, which calls str()
     self.assertEqual('{0:>15s}'.format(G('data')), ' string is data')

     # convert to string explicitly, overriding G.__format__
     self.assertEqual('{0!s}'.format(G('data')), 'string is data')

Note the collision with the 's' format_spec in this example.  You'd have 
to carefully design your object's __format__ specifiers to be able to 
recognize string specifiers as being different from its own specifiers 
(something that G does not cleanly do).
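
One possible way to avoid that collision, sketched under my own assumptions: reserve a prefix for the object's private specifiers and pass everything else to the string formatter. The Temperature class and the '@' prefix are hypothetical, chosen only because '@' has no meaning in the standard string specifiers.

```python
class Temperature:
    """Hypothetical type whose private specs all start with '@'."""

    def __init__(self, kelvin):
        self.kelvin = kelvin

    def __str__(self):
        return '%.2f K' % self.kelvin

    def __format__(self, format_spec):
        if format_spec.startswith('@'):
            unit = format_spec[1:]
            if unit == 'C':
                return '%.2f C' % (self.kelvin - 273.15)
            if unit == 'K':
                return str(self)
            raise ValueError('unknown Temperature spec: ' + format_spec)
        # Anything else is treated as a string spec.
        return format(str(self), format_spec)

t = Temperature(300.0)
own = '{0:@C}'.format(t)            # handled by Temperature itself
as_string = '{0:>12s}'.format(t)    # forwarded to the string formatter
forced = '{0!s:>12}'.format(t)      # bypasses Temperature.__format__
```

With this split an 's' spec can never be mistaken for one of the object's own, which is the ambiguity G runs into.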

int is like G: it defines its own __format__.  "!s" says: skip the 
object's own __format__ function, just convert the object to a string 
and call string.__format__.  So what Guido is saying is that for int, 
instead of having int.__format__ recognize string formatting specifiers 
and doing the conversion to string, you need to convert it to a string 
yourself with "!s".

Whether that's better or not, I leave up to Guido.  I personally think 
that for int and float, having them recognize "s" format specs is 
sufficiently handy that it's worth having, but I understand not 
providing that feature.

Eric.

From dalcinl at gmail.com  Tue Aug 21 17:00:36 2007
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Tue, 21 Aug 2007 12:00:36 -0300
Subject: [Python-3000] Py3k-buffer branch merged back to py3k branch
In-Reply-To: <fa6kp4$s9b$1@sea.gmane.org>
References: <fa6kp4$s9b$1@sea.gmane.org>
Message-ID: <e7ba66e40708210800s7fc187a3ma0c683244dd8496c@mail.gmail.com>

Travis, I did not have much time to follow you on the py3k-buffer branch,
but now that you have merged it into py3k, I want to make a small comment
for your consideration.

Perhaps the 'PyBuffer' struct could be named differently, something like
'Py_buffer'. The use case has some similarities to the 'Py_complex' struct.
It is not related to any 'PyObject*', but is a public structure which
(if I understand right) can be declared and used in static storage.

In short, I am proposing the naming below. Note I removed
'bufferinfo' in the typedef line, as it seems not to be needed (it
only appears here after grepping the sources) and could conflict with
user code.

/* buffer interface */
typedef struct {
    .....
} Py_buffer;

typedef struct {
        PyObject_HEAD
        PyObject *base;
        Py_buffer view;
} PyMemoryViewObject;


Again, taking complex as an example, please note the symmetry:


typedef struct {
    double real;
    double imag;
} Py_complex;

typedef struct {
    PyObject_HEAD
    Py_complex cval;
} PyComplexObject;


Regards,


On 8/18/07, Travis E. Oliphant <oliphant.travis at ieee.org> wrote:
> In preparation for the sprints, I have converted all Python objects to
> use the new buffer protocol PEP and implemented most of the C-API.  This
> work took place in the py3k-buffer branch which now passes all the tests
> that py3k does.
>
> So, I merged the changes back to the py3k branch in hopes that others
> can continue working on what I've done.  The merge took place after
> fully syncing the py3k-buffer branch with the current trunk.

-- 
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

From guido at python.org  Tue Aug 21 19:06:32 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 21 Aug 2007 10:06:32 -0700
Subject: [Python-3000] PEP 3101 clarification requests
In-Reply-To: <46CA7C42.5040007@canterbury.ac.nz>
References: <fb6fbf560708171027h1aa2e8f2pecbafde5235ea164@mail.gmail.com>
	<46C65006.9030907@acm.org> <46C8E6A3.4070202@trueblade.com>
	<ca471dc20708201201u5a36885fhae5604c6c840121@mail.gmail.com>
	<46C9E75F.2040208@trueblade.com>
	<ca471dc20708201216h27555a7nac9a8f3308a502f@mail.gmail.com>
	<46C9EFA1.5050906@trueblade.com> <46CA3659.6010304@canterbury.ac.nz>
	<ca471dc20708201803n52302cd6w6520d7d87a54d956@mail.gmail.com>
	<46CA7C42.5040007@canterbury.ac.nz>
Message-ID: <ca471dc20708211006v1066b20o33f81e30227a4d47@mail.gmail.com>

On 8/20/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Guido van Rossum wrote:
> > But how often will you need this? (You only need the !s part if you
> > don't know that the argument is a string.)
>
> Maybe I'm confused. I thought we had agreed that most
> types would delegate to str if they didn't understand
> the format, so most of the time there wouldn't be any
> need to use "!s". Is that still true?

Yes, by virtue of this being what object.__format__ does (AFAIU).

> If not, I think it will be very inconvenient, as I
> very frequently format things of all sorts of types
> using "%s", and rely on it doing something reasonable.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From oliphant at enthought.com  Tue Aug 21 18:16:15 2007
From: oliphant at enthought.com (Travis Oliphant)
Date: Tue, 21 Aug 2007 10:16:15 -0600
Subject: [Python-3000] Py3k-buffer branch merged back to py3k branch
In-Reply-To: <e7ba66e40708210800s7fc187a3ma0c683244dd8496c@mail.gmail.com>
References: <fa6kp4$s9b$1@sea.gmane.org>
	<e7ba66e40708210800s7fc187a3ma0c683244dd8496c@mail.gmail.com>
Message-ID: <46CB0FCF.3080100@enthought.com>

Lisandro Dalcin wrote:
> Travis, I did not have much time to follow you on the py3k-buffer branch,
> but now that you have merged it into py3k, I want to make a small comment
> for your consideration.
>
> Perhaps the 'PyBuffer' struct could be named differently, something like
> 'Py_buffer'. The use case has some similarities to the 'Py_complex' struct.
> It is not related to any 'PyObject*', but is a public structure which
> (if I understand right) can be declared and used in static storage.
>
> In short, I am proposing the naming below. Note I removed
> 'bufferinfo' in the typedef line, as it seems not to be needed (it
> only appears here after grepping the sources) and could conflict with
> user code.
>   

I have no problems with these changes.  I will be unable to do them 
myself though this week.

-Travis

> /* buffer interface */
> typedef struct {
>     .....
> } Py_buffer;
>
> typedef struct {
>         PyObject_HEAD
>         PyObject *base;
>         Py_buffer view;
> } PyMemoryViewObject;
>
>
> Again, taking complex as an example, please note the symmetry:
>
>
> typedef struct {
>     double real;
>     double imag;
> } Py_complex;
>
> typedef struct {
>     PyObject_HEAD
>     Py_complex cval;
> } PyComplexObject;
>
>
>   


From gvanrossum at gmail.com  Tue Aug 21 19:56:50 2007
From: gvanrossum at gmail.com (gvanrossum at gmail.com)
Date: Tue, 21 Aug 2007 10:56:50 -0700
Subject: [Python-3000] Py3k Sprint Tasks (Google Docs & Spreadsheets)
Message-ID: <c09ffb51ed04383961b5e8ff223d43@gmail.com>

I've shared a document with you called "Py3k Sprint Tasks":
http://spreadsheets.google.com/ccc?key=pBLWM8elhFAmKbrhhh0ApQA&inv=python-3000 at python.org&t=3328567089265242420&guest

It's not an attachment -- it's stored online at Google Docs & Spreadsheets. To open this document, just click the link above.

(resend, I'm not sure this made it out the first time)

This spreadsheet is where I'm organizing the tasks for the Google
Sprint starting tomorrow.

Feel free to add. If you're coming to the sprint, feel free to claim
ownership of a task.

---
Note: You'll need to sign into Google with this email address. To use a different email address, just reply to this message and ask me to invite your other one.

From skip at pobox.com  Tue Aug 21 20:23:11 2007
From: skip at pobox.com (skip at pobox.com)
Date: Tue, 21 Aug 2007 13:23:11 -0500
Subject: [Python-3000] Py3k Sprint Tasks (Google Docs & Spreadsheets)
In-Reply-To: <c09ffb51ed04383961b5e8ff223d43@gmail.com>
References: <c09ffb51ed04383961b5e8ff223d43@gmail.com>
Message-ID: <18123.11663.259022.539912@montanaro.dyndns.org>


    Guido> Feel free to add. If you're coming to the sprint, feel free to
    Guido> claim ownership of a task.

I started to edit the spreadsheet but then held off, remembering the edit
conflict problems I caused you just a few minutes earlier with the wiki page
(sorry about that).  Does the Google docs/spreadsheets server do a decent
job handling multiple document writers?

Skip


From guido at python.org  Tue Aug 21 20:34:08 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 21 Aug 2007 11:34:08 -0700
Subject: [Python-3000] Py3k Sprint Tasks (Google Docs & Spreadsheets)
In-Reply-To: <18123.11663.259022.539912@montanaro.dyndns.org>
References: <c09ffb51ed04383961b5e8ff223d43@gmail.com>
	<18123.11663.259022.539912@montanaro.dyndns.org>
Message-ID: <ca471dc20708211134m7d0aac81qf4fbcc426ddd2c1b@mail.gmail.com>

On 8/21/07, skip at pobox.com <skip at pobox.com> wrote:
>
>     Guido> Feel free to add. If you're coming to the sprint, feel free to
>     Guido> claim ownership of a task.
>
> I started to edit the spreadsheet but then held off, remembering the edit
> conflict problems I caused you just a few minutes earlier with the wiki page
> (sorry about that).  Does the Google docs/spreadsheets server do a decent
> job handling multiple document writers?

Yes. I think the only way you can create a conflict is by editing the
same cell simultaneously; and it will show who is on which cell (at
least if you open "Discuss"). So by all means go ahead!

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From barry at python.org  Wed Aug 22 00:12:40 2007
From: barry at python.org (Barry Warsaw)
Date: Tue, 21 Aug 2007 18:12:40 -0400
Subject: [Python-3000] Py3k Sprint Tasks (Google Docs & Spreadsheets)
In-Reply-To: <c09ffb51ed04383961b5e8ff223d43@gmail.com>
References: <c09ffb51ed04383961b5e8ff223d43@gmail.com>
Message-ID: <93DBB66F-5D0D-4E46-8480-D2BFC693722A@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 21, 2007, at 1:56 PM, gvanrossum at gmail.com wrote:

> I've shared a document with you called "Py3k Sprint Tasks":
> http://spreadsheets.google.com/ccc?key=pBLWM8elhFAmKbrhhh0ApQA&inv=python-3000 at python.org&t=3328567089265242420&guest
>
> It's not an attachment -- it's stored online at Google Docs &  
> Spreadsheets. To open this document, just click the link above.
>
> (resend, I'm not sure this made it out the first time)
>
> This spreadsheet is where I'm organizing the tasks for the Google
> Sprint starting tomorrow.
>
> Feel free to add. If you're coming to the sprint, feel free to claim
> ownership of a task.

I have approval to spend some official time at this sprint, though  
I'll be working from home and will be on IRC, Skype, etc.

I've been spending hours of my own time on the email package for py3k  
this week and every time I think I'm nearing success I get defeated  
again.  I think Victor Stinner came to similar conclusions.  To put  
it mildly, the email package is effed up!  But I'm determined to  
solve the worst of the problems this week.

I only have Wednesday and Thursday to work on this, with most of my  
time available on Thursday.  I'd really like to find one or two other  
folks to connect with to help work out the stickiest issues.  Please  
contact me directly or on this list to arrange a time with me.  I'm  
UTC-4 if that helps.  I'll be on #python-dev (barry) too.

Remember that the current code is in the python sandbox (under  
emailpkg/5_0-exp).  I have some uncommitted code which I'll try to  
check in tonight, though I don't know if it will make matters better  
or worse. ;)

Cheers,
- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRstjWXEjvBPtnXfVAQLQcQP+Lo/D1YH1+w/51kNyQN1+zrzu1Cov7ERk
1xtT5L2LlaPjXGeVMlc6Xz0bbLVc96kSQ4SIrkc5RRNorcYzMf8kID4rLkO6S+kU
CXtpOVgmzkX9zotAL9O72v2uOHT6c0fcK8ag44EiAtWei3Tdf+R2rL6lOzo0lHgj
qmVPFzlzGCA=
=t1nr
-----END PGP SIGNATURE-----

From janssen at parc.com  Wed Aug 22 02:01:28 2007
From: janssen at parc.com (Bill Janssen)
Date: Tue, 21 Aug 2007 17:01:28 PDT
Subject: [Python-3000] Py3k Sprint Tasks
In-Reply-To: <c09ffb51ed04383961b5e8ff223d43@gmail.com> 
References: <c09ffb51ed04383961b5e8ff223d43@gmail.com>
Message-ID: <07Aug21.170133pdt."57996"@synergy1.parc.xerox.com>

I'd like to spend some time during the Sprint doing three things:

1.  Doing a code review of the modified SSL C code with 2 or 3 others.
    Can we get a small conference room with a projector to use for an hour?
    If not, I can provide one at PARC.  I also need a few volunteers to be
    the review group.

2.  Working on the test cases and adding them to the standard test suite.

3.  Improving the documentation.  In particular there needs to be better
    documentation on certificates, what's in them, how to use them, what
    certificate validation does and does not provide, where to get standard
    root certificates.  It would be useful to document some standard code
    patterns, like how to shift into TLS after a STARTTLS request has been
    received, etc.

A fourth thing I'd like to do, which isn't strictly Sprint-related, is to
learn more about distutils and packaging.

Bill

From greg.ewing at canterbury.ac.nz  Wed Aug 22 02:11:50 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 22 Aug 2007 12:11:50 +1200
Subject: [Python-3000] PEP 3101 clarification requests
In-Reply-To: <46CAB68B.9060900@trueblade.com>
References: <fb6fbf560708171027h1aa2e8f2pecbafde5235ea164@mail.gmail.com>
	<46C65006.9030907@acm.org> <46C8E6A3.4070202@trueblade.com>
	<ca471dc20708201201u5a36885fhae5604c6c840121@mail.gmail.com>
	<46C9E75F.2040208@trueblade.com>
	<ca471dc20708201216h27555a7nac9a8f3308a502f@mail.gmail.com>
	<46C9EFA1.5050906@trueblade.com> <46CA3659.6010304@canterbury.ac.nz>
	<ca471dc20708201803n52302cd6w6520d7d87a54d956@mail.gmail.com>
	<46CA7C42.5040007@canterbury.ac.nz> <46CAB68B.9060900@trueblade.com>
Message-ID: <46CB7F46.1030200@canterbury.ac.nz>

Eric Smith wrote:
> The problem is that if an object does its own
> __format__, it either needs to understand all of the string formatting, 
> or at least recognize a string format and send it along to 
> string.__format__() (or object.__format__, which will convert to string 
> for you).

No, all it needs to do is tell when it *doesn't*
recognise the format and call its inherited __format__
method. Eventually that will get to object.__format__
which will delegate to str.

> Note the collision with the 's' format_spec in this example.

I'd say you should normally design your format specs
so that they don't conflict with string formats.

If you want to implement the string formats your own
way, that's okay, but then it's your responsibility
to support all of them.
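
A minimal sketch of the delegation described above, assuming a hypothetical `Celsius` type that recognises one custom spec, `C`. It inherits from `float` rather than directly from `object` because, in Python 3 as eventually released, `object.__format__` raises `TypeError` for a non-empty format spec, so the chain has to end at a type that understands the standard specs.

```python
class Celsius(float):
    """Hypothetical example type; only the 'C' spec is custom."""

    def __format__(self, spec):
        if spec == "C":
            # The one format this class recognises itself.
            return "{:.1f} C".format(float(self))
        # Anything else is unrecognised here, so pass it up the chain.
        return super().__format__(spec)


custom = format(Celsius(21.5), "C")       # handled by Celsius.__format__
standard = format(Celsius(21.5), ".2f")   # handled by float.__format__
```

The class never has to parse the standard mini-language; it only has to notice that a spec is not its own.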

--
Greg

From guido at python.org  Wed Aug 22 16:57:28 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 22 Aug 2007 07:57:28 -0700
Subject: [Python-3000] Py3k Sprint Tasks
In-Reply-To: <9106183526398442256@unknownmsgid>
References: <c09ffb51ed04383961b5e8ff223d43@gmail.com>
	<9106183526398442256@unknownmsgid>
Message-ID: <ca471dc20708220757t12b27d22wd2c38ee15aac72dc@mail.gmail.com>

On 8/21/07, Bill Janssen <janssen at parc.com> wrote:
> I'd like to spend some time during the Sprint doing three things:
>
> 1.  Doing a code review of the modified SSL C code with 2 or 3 others.
>     Can we get a small conference room with a projector to use for an hour?
>     If not, I can provide one at PARC.  I also need a few volunteers to be
>     the review group.

NP, I'll try to book something.

> 2.  Working on the test cases and adding them to the standard test suite.
>
> 3.  Improving the documentation.  In particular there needs to be better
>     documentation on certificates, what's in them, how to use them, what
>     certificate validation does and does not provide, where to get standard
>     root certificates.  It would be useful to document some standard code
>     patterns, like how to shift into TLS after a STARTTLS request has been
>     received, etc.
>
> A fourth thing I'd like to do, which isn't strictly Sprint-related, is to
> learn more about distutils and packaging.

Did you ever manage to view the task spreadsheet?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From eric+python-dev at trueblade.com  Wed Aug 22 18:48:43 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Wed, 22 Aug 2007 12:48:43 -0400
Subject: [Python-3000] PEP 3101 Updated
In-Reply-To: <46C2C1A0.4060002@trueblade.com>
References: <46C2809C.3000806@acm.org> <46C2C1A0.4060002@trueblade.com>
Message-ID: <46CC68EB.2030609@trueblade.com>

Eric Smith wrote:
> Talin wrote:
>> A new version is up, incorporating material from the various discussions 
>> on this list:
>>
>> 	http://www.python.org/dev/peps/pep-3101/
> 
> self.assertEquals('{0[{1}]}'.format('abcdefg', 4), 'e')
> self.assertEquals('{foo[{bar}]}'.format(foo='abcdefg', bar=4), 'e')

I've been re-reading the PEP, in an effort to make sure everything is 
working.  I realized that these tests should not pass.  The PEP says 
that "Format specifiers can themselves contain replacement fields".  The 
tests above have replacement fields in the field name, which is not 
allowed.  I'm going to remove this functionality.

I believe the intent is to support a replacement for:
"%.*s" % (4, 'how now brown cow')

Which would be:
"{0:.{1}}".format('how now brown cow', 4)

For this, there's no need for replacement on field name.  I've taken it 
out of the code, and made these tests into errors.
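
For reference, a small sketch of the two spellings side by side. The helper names `truncate_percent` and `truncate_format` are made up for illustration; only the two format strings come from the discussion above.

```python
def truncate_percent(text, width):
    """Old style: '*' takes the precision from the argument tuple."""
    return "%.*s" % (width, text)


def truncate_format(text, width):
    """New style: the nested {1} is expanded inside the format spec."""
    return "{0:.{1}}".format(text, width)


old = truncate_percent("how now brown cow", 4)
new = truncate_format("how now brown cow", 4)
```

Both calls truncate the string to its first four characters; the replacement field appears only after the colon, never in the field name.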

Eric.


From jyasskin at gmail.com  Wed Aug 22 21:36:31 2007
From: jyasskin at gmail.com (Jeffrey Yasskin)
Date: Wed, 22 Aug 2007 12:36:31 -0700
Subject: [Python-3000] Updated and simplified PEP 3141: A Type Hierarchy
	for Numbers
In-Reply-To: <5d44f72f0708021153u7ea1f443jfdee3c167b011011@mail.gmail.com>
References: <5d44f72f0705161731j4700bdb3h4e36e97757bd6a32@mail.gmail.com>
	<5d44f72f0708021153u7ea1f443jfdee3c167b011011@mail.gmail.com>
Message-ID: <5d44f72f0708221236k7c3ea054k43eb237f4a3ef577@mail.gmail.com>

There are still some open issues here that need answers:

* Should __pos__ coerce the argument to be an instance of the type
it's defined on?
* Add Demo/classes/Rat.py to the stdlib?
* How many of __trunc__, __floor__, __ceil__, and __round__ should be
magic methods? For __round__, when do we want to return an Integral?
[__properfraction__ is probably subsumed by divmod(x, 1).]
* How to give the removed methods (divmod, etc. on complex) good error
messages without having them show up in help(complex)?

I'll look into this during the sprint.
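
Whether `divmod(x, 1)` really subsumes `__properfraction__` depends on the sign convention. A small sketch using the existing `math.modf` and `divmod` (the value -2.5 is just an illustration) suggests the two split negative inputs differently:

```python
import math

x = -2.5

# math.modf: both parts carry the sign of x (fractional part first).
frac, whole = math.modf(x)

# divmod(x, 1): floors the quotient, so the remainder is never negative.
quotient, remainder = divmod(x, 1)
```

Here `math.modf` gives -0.5 and -2.0, while `divmod` gives -3.0 and 0.5. Both pairs sum back to `x`, but only the first matches the "same sign as `x`" wording in the draft.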

On 8/2/07, Jeffrey Yasskin <jyasskin at gmail.com> wrote:
> After some more discussion, I have another version of the PEP with a
> draft, partial implementation. Let me know what you think.
>
>
>
> PEP: 3141
> Title: A Type Hierarchy for Numbers
> Version: $Revision: 56646 $
> Last-Modified: $Date: 2007-08-01 10:11:55 -0700 (Wed, 01 Aug 2007) $
> Author: Jeffrey Yasskin <jyasskin at gmail.com>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 23-Apr-2007
> Post-History: 25-Apr-2007, 16-May-2007, 02-Aug-2007
>
>
> Abstract
> ========
>
> This proposal defines a hierarchy of Abstract Base Classes (ABCs) (PEP
> 3119) to represent number-like classes. It proposes a hierarchy of
> ``Number :> Complex :> Real :> Rational :> Integral`` where ``A :> B``
> means "A is a supertype of B", and a pair of ``Exact``/``Inexact``
> classes to capture the difference between ``floats`` and
> ``ints``. These types are significantly inspired by Scheme's numeric
> tower [#schemetower]_.
>
> Rationale
> =========
>
> Functions that take numbers as arguments should be able to determine
> the properties of those numbers, and if and when overloading based on
> types is added to the language, should be overloadable based on the
> types of the arguments. For example, slicing requires its arguments to
> be ``Integrals``, and the functions in the ``math`` module require
> their arguments to be ``Real``.
>
> Specification
> =============
>
> This PEP specifies a set of Abstract Base Classes, and suggests a
> general strategy for implementing some of the methods. It uses
> terminology from PEP 3119, but the hierarchy is intended to be
> meaningful for any systematic method of defining sets of classes.
>
> The type checks in the standard library should use these classes
> instead of the concrete built-ins.
>
>
> Numeric Classes
> ---------------
>
> We begin with a Number class to make it easy for people to be fuzzy
> about what kind of number they expect. This class only helps with
> overloading; it doesn't provide any operations. ::
>
>     class Number(metaclass=ABCMeta): pass
>
>
> Most implementations of complex numbers will be hashable, but if you
> need to rely on that, you'll have to check it explicitly: mutable
> numbers are supported by this hierarchy. **Open issue:** Should
> __pos__ coerce the argument to be an instance of the type it's defined
> on? Why do the builtins do this? ::
>
>     class Complex(Number):
>         """Complex defines the operations that work on the builtin complex type.
>
>         In short, those are: a conversion to complex, .real, .imag, +, -,
>         *, /, abs(), .conjugate, ==, and !=.
>
>         If it is given heterogeneous arguments, and doesn't have special
>         knowledge about them, it should fall back to the builtin complex
>         type as described below.
>         """
>
>         @abstractmethod
>         def __complex__(self):
>             """Return a builtin complex instance."""
>
>         def __bool__(self):
>             """True if self != 0."""
>             return self != 0
>
>         @abstractproperty
>         def real(self):
>             """Retrieve the real component of this number.
>
>             This should subclass Real.
>             """
>             raise NotImplementedError
>
>         @abstractproperty
>         def imag(self):
>             """Retrieve the imaginary component of this number.
>
>             This should subclass Real.
>             """
>             raise NotImplementedError
>
>         @abstractmethod
>         def __add__(self, other):
>             raise NotImplementedError
>
>         @abstractmethod
>         def __radd__(self, other):
>             raise NotImplementedError
>
>         @abstractmethod
>         def __neg__(self):
>             raise NotImplementedError
>
>         def __pos__(self):
>             return self
>
>         def __sub__(self, other):
>             return self + -other
>
>         def __rsub__(self, other):
>             return -self + other
>
>         @abstractmethod
>         def __mul__(self, other):
>             raise NotImplementedError
>
>         @abstractmethod
>         def __rmul__(self, other):
>             raise NotImplementedError
>
>         @abstractmethod
>         def __div__(self, other):
>             raise NotImplementedError
>
>         @abstractmethod
>         def __rdiv__(self, other):
>             raise NotImplementedError
>
>         @abstractmethod
>         def __pow__(self, exponent):
>             """Like division, a**b should promote to complex when necessary."""
>             raise NotImplementedError
>
>         @abstractmethod
>         def __rpow__(self, base):
>             raise NotImplementedError
>
>         @abstractmethod
>         def __abs__(self):
>             """Returns the Real distance from 0."""
>             raise NotImplementedError
>
>         @abstractmethod
>         def conjugate(self):
>             """(x+y*i).conjugate() returns (x-y*i)."""
>             raise NotImplementedError
>
>         @abstractmethod
>         def __eq__(self, other):
>             raise NotImplementedError
>
>         def __ne__(self, other):
>             return not (self == other)
>
>
> The ``Real`` ABC indicates that the value is on the real line, and
> supports the operations of the ``float`` builtin. Real numbers are
> totally ordered except for NaNs (which this PEP basically ignores). ::
>
>     class Real(Complex):
>         """To Complex, Real adds the operations that work on real numbers.
>
>         In short, those are: a conversion to float, trunc(), divmod,
>         %, <, <=, >, and >=.
>
>         Real also provides defaults for the derived operations.
>         """
>
>         @abstractmethod
>         def __float__(self):
>             """Any Real can be converted to a native float object."""
>             raise NotImplementedError
>
>         @abstractmethod
>         def __trunc__(self):
>             """Truncates self to an Integral.
>
>             Returns an Integral i such that:
>               * i>0 iff self>0
>               * abs(i) <= abs(self).
>             """
>             raise NotImplementedError
>
>         def __divmod__(self, other):
>             """The pair (self // other, self % other).
>
>             Sometimes this can be computed faster than the pair of
>             operations.
>             """
>             return (self // other, self % other)
>
>         def __rdivmod__(self, other):
>             """The pair (other // self, other % self).
>
>             Sometimes this can be computed faster than the pair of
>             operations.
>             """
>             return (other // self, other % self)
>
>         @abstractmethod
>         def __floordiv__(self, other):
>             """The floor() of self/other. Integral."""
>             raise NotImplementedError
>
>         @abstractmethod
>         def __rfloordiv__(self, other):
>             """The floor() of other/self."""
>             raise NotImplementedError
>
>         @abstractmethod
>         def __mod__(self, other):
>             raise NotImplementedError
>
>         @abstractmethod
>         def __rmod__(self, other):
>             raise NotImplementedError
>
>         @abstractmethod
>         def __lt__(self, other):
>             """< on Reals defines a total ordering, except perhaps for NaN."""
>             raise NotImplementedError
>
>         @abstractmethod
>         def __le__(self, other):
>             raise NotImplementedError
>
>         # Concrete implementations of Complex abstract methods.
>
>         def __complex__(self):
>             return complex(float(self))
>
>         @property
>         def real(self):
>             return self
>
>         @property
>         def imag(self):
>             return 0
>
>         def conjugate(self):
>             """Conjugate is a no-op for Reals."""
>             return self
>
>
> There is no built-in rational type, but it's straightforward to write,
> so we provide an ABC for it. **Open issue**: Add Demo/classes/Rat.py
> to the stdlib? ::
>
>     class Rational(Real, Exact):
>         """.numerator and .denominator should be in lowest terms."""
>
>         @abstractproperty
>         def numerator(self):
>             raise NotImplementedError
>
>         @abstractproperty
>         def denominator(self):
>             raise NotImplementedError
>
>         # Concrete implementation of Real's conversion to float.
>
>         def __float__(self):
>             return self.numerator / self.denominator
>
>
> And finally integers::
>
>     class Integral(Rational):
>         """Integral adds a conversion to int and the bit-string operations."""
>
>         @abstractmethod
>         def __int__(self):
>             raise NotImplementedError
>
>         def __index__(self):
>             return int(self)
>
>         @abstractmethod
>         def __pow__(self, exponent, modulus):
>             """self ** exponent % modulus, but maybe faster.
>
>             Implement this if you want to support the 3-argument version
>             of pow(). Otherwise, just implement the 2-argument version
>             described in Complex. Raise a TypeError if exponent < 0 or any
>             argument isn't Integral.
>             """
>             raise NotImplementedError
>
>         @abstractmethod
>         def __lshift__(self, other):
>             raise NotImplementedError
>
>         @abstractmethod
>         def __rlshift__(self, other):
>             raise NotImplementedError
>
>         @abstractmethod
>         def __rshift__(self, other):
>             raise NotImplementedError
>
>         @abstractmethod
>         def __rrshift__(self, other):
>             raise NotImplementedError
>
>         @abstractmethod
>         def __and__(self, other):
>             raise NotImplementedError
>
>         @abstractmethod
>         def __rand__(self, other):
>             raise NotImplementedError
>
>         @abstractmethod
>         def __xor__(self, other):
>             raise NotImplementedError
>
>         @abstractmethod
>         def __rxor__(self, other):
>             raise NotImplementedError
>
>         @abstractmethod
>         def __or__(self, other):
>             raise NotImplementedError
>
>         @abstractmethod
>         def __ror__(self, other):
>             raise NotImplementedError
>
>         @abstractmethod
>         def __invert__(self):
>             raise NotImplementedError
>
>         # Concrete implementations of Rational and Real abstract methods.
>
>         def __float__(self):
>             return float(int(self))
>
>         @property
>         def numerator(self):
>             return self
>
>         @property
>         def denominator(self):
>             return 1
>
>
> Exact vs. Inexact Classes
> -------------------------
>
> Floating point values may not exactly obey several of the properties
> you would expect. For example, it is possible for ``(X + -X) + 3 ==
> 3``, but ``X + (-X + 3) == 0``. On the range of values that most
> functions deal with this isn't a problem, but it is something to be
> aware of.
>
> Therefore, I define ``Exact`` and ``Inexact`` ABCs to mark whether
> types have this problem. Every instance of ``Integral`` and
> ``Rational`` should be Exact, but ``Reals`` and ``Complexes`` may or
> may not be. (Do we really only need one of these, and the other is
> defined as ``not`` the first?) ::
>
>     class Exact(Number): pass
>     class Inexact(Number): pass
>
>
> Changes to operations and __magic__ methods
> -------------------------------------------
>
> To support more precise narrowing from float to int (and more
> generally, from Real to Integral), I'm proposing the following new
> __magic__ methods, to be called from the corresponding library
> functions. All of these return Integrals rather than Reals.
>
> 1. ``__trunc__(self)``, called from a new builtin ``trunc(x)``, which
>    returns the Integral closest to ``x`` between 0 and ``x``.
>
> 2. ``__floor__(self)``, called from ``math.floor(x)``, which returns
>    the greatest Integral ``<= x``.
>
> 3. ``__ceil__(self)``, called from ``math.ceil(x)``, which returns the
>    least Integral ``>= x``.
>
> 4. ``__round__(self)``, called from ``round(x)``, which returns the
>    Integral closest to ``x``, rounding half toward even. **Open
>    issue:** We could support the 2-argument version, but then we'd
>    only return an Integral if the second argument were ``<= 0``.
>
> 5. ``__properfraction__(self)``, called from a new function,
>    ``math.properfraction(x)``, which resembles C's ``modf()``: returns
>    a pair ``(n:Integral, r:Real)`` where ``x == n + r``, both ``n``
>    and ``r`` have the same sign as ``x``, and ``abs(r) < 1``. **Open
>    issue:** Oh, we already have ``math.modf``. What name do we want
>    for this? Should we use divmod(x, 1) instead?
>
> Because the ``int()`` conversion from ``float`` is equivalent to but
> less explicit than ``trunc()``, let's remove it. (Or, if that breaks
> too much, just add a deprecation warning.)
>
> ``complex.__{divmod,mod,floordiv,int,float}__`` should also go
> away. These should continue to raise ``TypeError`` to help confused
> porters, but should not appear in ``help(complex)`` to avoid confusing
> more people. **Open issue:** This is difficult to do with the
> ``PyNumberMethods`` struct. What's the best way to accomplish it?
>
>
> Notes for type implementors
> ---------------------------
>
> Implementors should be careful to make equal numbers equal and
> hash them to the same values. This may be subtle if there are two
> different extensions of the real numbers. For example, a complex type
> could reasonably implement hash() as follows::
>
>         def __hash__(self):
>             return hash(complex(self))
>
> but should be careful of any values that fall outside of the built in
> complex's range or precision.
>
> Adding More Numeric ABCs
> ~~~~~~~~~~~~~~~~~~~~~~~~
>
> There are, of course, more possible ABCs for numbers, and this would
> be a poor hierarchy if it precluded the possibility of adding
> those. You can add ``MyFoo`` between ``Complex`` and ``Real`` with::
>
>     class MyFoo(Complex): ...
>     MyFoo.register(Real)
>
> Implementing the arithmetic operations
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> We want to implement the arithmetic operations so that mixed-mode
> operations either call an implementation whose author knew about the
> types of both arguments, or convert both to the nearest built in type
> and do the operation there. For subtypes of Integral, this means that
> __add__ and __radd__ should be defined as::
>
>     class MyIntegral(Integral):
>
>         def __add__(self, other):
>             if isinstance(other, MyIntegral):
>                 return do_my_adding_stuff(self, other)
>             elif isinstance(other, OtherTypeIKnowAbout):
>                 return do_my_other_adding_stuff(self, other)
>             else:
>                 return NotImplemented
>
>         def __radd__(self, other):
>             if isinstance(other, MyIntegral):
>                 return do_my_adding_stuff(other, self)
>             elif isinstance(other, OtherTypeIKnowAbout):
>                 return do_my_other_adding_stuff(other, self)
>             elif isinstance(other, Integral):
>                 return int(other) + int(self)
>             elif isinstance(other, Real):
>                 return float(other) + float(self)
>             elif isinstance(other, Complex):
>                 return complex(other) + complex(self)
>             else:
>                 return NotImplemented
>
>
> There are 5 different cases for a mixed-type operation on subclasses
> of Complex. I'll refer to all of the above code that doesn't refer to
> MyIntegral and OtherTypeIKnowAbout as "boilerplate". ``a`` will be an
> instance of ``A``, which is a subtype of ``Complex`` (``a : A <:
> Complex``), and ``b : B <: Complex``. I'll consider ``a + b``:
>
>     1. If A defines an __add__ which accepts b, all is well.
>     2. If A falls back to the boilerplate code, and it were to return
>        a value from __add__, we'd miss the possibility that B defines
>        a more intelligent __radd__, so the boilerplate should return
>        NotImplemented from __add__. (Or A may not implement __add__ at
>        all.)
>     3. Then B's __radd__ gets a chance. If it accepts a, all is well.
>     4. If it falls back to the boilerplate, there are no more possible
>        methods to try, so this is where the default implementation
>        should live.
>     5. If B <: A, Python tries B.__radd__ before A.__add__. This is
>        ok, because it was implemented with knowledge of A, so it can
>        handle those instances before delegating to Complex.
>
> If ``A<:Complex`` and ``B<:Real`` without sharing any other knowledge,
> then the appropriate shared operation is the one involving the built
> in complex, and both __radd__s land there, so ``a+b == b+a``.
>
>
> Rejected Alternatives
> =====================
>
> The initial version of this PEP defined an algebraic hierarchy
> inspired by a Haskell Numeric Prelude [#numericprelude]_ including
> MonoidUnderPlus, AdditiveGroup, Ring, and Field, and mentioned several
> other possible algebraic types before getting to the numbers. I had
> expected this to be useful to people using vectors and matrices, but
> the NumPy community really wasn't interested, and we ran into the
> issue that even if ``x`` is an instance of ``X <: MonoidUnderPlus``
> and ``y`` is an instance of ``Y <: MonoidUnderPlus``, ``x + y`` may
> still not make sense.
>
> Then I gave the numbers a much more branching structure to include
> things like the Gaussian Integers and Z/nZ, which could be Complex but
> wouldn't necessarily support things like division. The community
> decided that this was too much complication for Python, so I've now
> scaled back the proposal to resemble the Scheme numeric tower much
> more closely.
>
>
> References
> ==========
>
> .. [#pep3119] Introducing Abstract Base Classes
>    (http://www.python.org/dev/peps/pep-3119/)
>
> .. [#classtree] Possible Python 3K Class Tree?, wiki page created by
> Bill Janssen
>    (http://wiki.python.org/moin/AbstractBaseClasses)
>
> .. [#numericprelude] NumericPrelude: An experimental alternative
> hierarchy of numeric type classes
>    (http://darcs.haskell.org/numericprelude/docs/html/index.html)
>
> .. [#schemetower] The Scheme numerical tower
>    (http://www.swiss.ai.mit.edu/ftpdir/scheme-reports/r5rs-html/r5rs_8.html#SEC50)
>
>
> Acknowledgements
> ================
>
> Thanks to Neal Norwitz for encouraging me to write this PEP in the
> first place, to Travis Oliphant for pointing out that the numpy people
> didn't really care about the algebraic concepts, to Alan Isaac for
> reminding me that Scheme had already done this, and to Guido van
> Rossum and lots of other people on the mailing list for refining the
> concept.
>
> Copyright
> =========
>
> This document has been placed in the public domain.
>
>
>
> ..
>    Local Variables:
>    mode: indented-text
>    indent-tabs-mode: nil
>    sentence-end-double-space: t
>    fill-column: 70
>    coding: utf-8
>    End:
>
>
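
The mixed-mode rules quoted above can be sketched roughly as follows. This assumes the ABCs are importable from the `numbers` module, which is where the hierarchy eventually shipped. `Wrapped` is a hypothetical integer-like type, not a full `Integral` implementation, and it knows about no other types.

```python
from numbers import Complex, Integral, Real


class Wrapped:
    """Hypothetical integer-like type; not a full Integral implementation."""

    def __init__(self, value):
        self.value = int(value)

    def __int__(self):
        return self.value

    def __add__(self, other):
        if isinstance(other, Wrapped):
            return Wrapped(self.value + other.value)
        # Unknown right operand: let its __radd__ have a turn.
        return NotImplemented

    def __radd__(self, other):
        # Boilerplate fallback: convert to the nearest built-in type.
        if isinstance(other, Integral):
            return int(other) + int(self)
        if isinstance(other, Real):
            return float(other) + float(int(self))
        if isinstance(other, Complex):
            return complex(other) + complex(int(self))
        return NotImplemented


int_sum = 3 + Wrapped(2)        # int.__add__ declines, __radd__ answers
float_sum = 1.5 + Wrapped(2)
complex_sum = 1j + Wrapped(2)
```

Returning `NotImplemented` from `__add__` is what gives the other operand's `__radd__` its chance; a left operand that nobody recognises still ends in `TypeError`.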


-- 
Namasté,
Jeffrey Yasskin
http://jeffrey.yasskin.info/

"Religion is an improper response to the Divine." -- "Skinny Legs and
All", by Tom Robbins

From skip at pobox.com  Wed Aug 22 21:55:06 2007
From: skip at pobox.com (skip at pobox.com)
Date: Wed, 22 Aug 2007 14:55:06 -0500
Subject: [Python-3000] Str v. Unicode in C?
Message-ID: <18124.38042.622272.863273@montanaro.dyndns.org>


If I want to check an object for stringedness in py3k do I use
PyString_Check or PyUnicode_Check?

Thx,

Skip


From guido at python.org  Wed Aug 22 21:57:32 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 22 Aug 2007 12:57:32 -0700
Subject: [Python-3000] Updated and simplified PEP 3141: A Type Hierarchy
	for Numbers
In-Reply-To: <5d44f72f0708221236k7c3ea054k43eb237f4a3ef577@mail.gmail.com>
References: <5d44f72f0705161731j4700bdb3h4e36e97757bd6a32@mail.gmail.com>
	<5d44f72f0708021153u7ea1f443jfdee3c167b011011@mail.gmail.com>
	<5d44f72f0708221236k7c3ea054k43eb237f4a3ef577@mail.gmail.com>
Message-ID: <ca471dc20708221257u37420efam25c895fe138b72f7@mail.gmail.com>

On 8/22/07, Jeffrey Yasskin <jyasskin at gmail.com> wrote:
> There are still some open issues here that need answers:
>
> * Should __pos__ coerce the argument to be an instance of the type
> it's defined on?

Yes, I think so. That's what the built-in types do (in case the object
is an instance of a subclass). It makes sense because all other
operators do this too (unless overridden).
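
A quick illustration of that built-in behaviour as it can be observed in CPython. `MyInt` is a hypothetical subclass that overrides nothing:

```python
class MyInt(int):
    """Hypothetical subclass that overrides nothing."""


value = MyInt(5)
positive = +value          # int.__pos__ builds a plain int
total = value + MyInt(1)   # so does int.__add__
```

Both results are plain `int` instances, not `MyInt`, which is the coercion referred to above.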

> * Add Demo/classes/Rat.py to the stdlib?

Yes, but it needs a makeover. At the very least I'd propose the module
name to be rational.

The code is really old.

> * How many of __trunc__, __floor__, __ceil__, and __round__ should be
> magic methods?

I'm okay with all of these.
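
A rough sketch of how a user-defined type could plug into these hooks. `Tenths` is a hypothetical class, and the sketch uses `math.trunc`, which is where the `__trunc__` hook is reachable in released Python 3 (the draft proposes a builtin `trunc` instead).

```python
import math


class Tenths:
    """Hypothetical exact value stored as an integer count of tenths."""

    def __init__(self, tenths):
        self.tenths = tenths

    def __floor__(self):
        return self.tenths // 10

    def __ceil__(self):
        return -(-self.tenths // 10)

    def __trunc__(self):
        # Round toward zero: floor for positives, ceil for negatives.
        return self.__floor__() if self.tenths >= 0 else self.__ceil__()


x = Tenths(-25)  # represents -2.5
lowest, highest, toward_zero = math.floor(x), math.ceil(x), math.trunc(x)
```

Each `math` function only looks up the matching magic method, so the type never needs a conversion to `float`.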

> For __round__, when do we want to return an Integral?

When the second argument is absent only.
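
In terms of the built-in `float`, that rule looks roughly like this in Python 3 as released (the literals are arbitrary examples):

```python
one_arg = round(2.5)         # no second argument: an int, ties go to even
two_arg = round(2.567, 2)    # second argument given: stays a float
```

The one-argument call returns the `int` 2, and the two-argument call returns a `float` close to 2.57.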

> [__properfraction__ is probably subsumed by divmod(x, 1).]

Probably, but see PEP 3100, which still lists __mod__ and __divmod__
as to be deleted.

> * How to give the removed methods (divmod, etc. on complex) good error
> messages without having them show up in help(complex)?

If Complex doesn't define them, they'll be TypeErrors, and that's good
enough IMO.
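
A small check of that outcome against the built-in `complex` in Python 3. `raises_type_error` is a helper made up for this sketch:

```python
def raises_type_error(operation):
    """Return True if calling operation() raises TypeError."""
    try:
        operation()
    except TypeError:
        return True
    return False


z = 3 + 4j
results = {
    "divmod": raises_type_error(lambda: divmod(z, 2)),
    "floordiv": raises_type_error(lambda: z // 2),
    "mod": raises_type_error(lambda: z % 2),
    "int": raises_type_error(lambda: int(z)),
    "float": raises_type_error(lambda: float(z)),
}
```

Every entry in `results` comes out `True`: each operation fails with `TypeError` rather than returning a value.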

> I'll look into this during the sprint.
>
> On 8/2/07, Jeffrey Yasskin <jyasskin at gmail.com> wrote:
> > After some more discussion, I have another version of the PEP with a
> > draft, partial implementation. Let me know what you think.
> >
> >
> >
> > PEP: 3141
> > Title: A Type Hierarchy for Numbers
> > Version: $Revision: 56646 $
> > Last-Modified: $Date: 2007-08-01 10:11:55 -0700 (Wed, 01 Aug 2007) $
> > Author: Jeffrey Yasskin <jyasskin at gmail.com>
> > Status: Draft
> > Type: Standards Track
> > Content-Type: text/x-rst
> > Created: 23-Apr-2007
> > Post-History: 25-Apr-2007, 16-May-2007, 02-Aug-2007
> >
> >
> > Abstract
> > ========
> >
> > This proposal defines a hierarchy of Abstract Base Classes (ABCs) (PEP
> > 3119) to represent number-like classes. It proposes a hierarchy of
> > ``Number :> Complex :> Real :> Rational :> Integral`` where ``A :> B``
> > means "A is a supertype of B", and a pair of ``Exact``/``Inexact``
> > classes to capture the difference between ``floats`` and
> > ``ints``. These types are significantly inspired by Scheme's numeric
> > tower [#schemetower]_.
> >
> > Rationale
> > =========
> >
> > Functions that take numbers as arguments should be able to determine
> > the properties of those numbers, and if and when overloading based on
> > types is added to the language, should be overloadable based on the
> > types of the arguments. For example, slicing requires its arguments to
> > be ``Integrals``, and the functions in the ``math`` module require
> > their arguments to be ``Real``.
> >
> > Specification
> > =============
> >
> > This PEP specifies a set of Abstract Base Classes, and suggests a
> > general strategy for implementing some of the methods. It uses
> > terminology from PEP 3119, but the hierarchy is intended to be
> > meaningful for any systematic method of defining sets of classes.
> >
> > The type checks in the standard library should use these classes
> > instead of the concrete built-ins.
> >
> >
> > Numeric Classes
> > ---------------
> >
> > We begin with a Number class to make it easy for people to be fuzzy
> > about what kind of number they expect. This class only helps with
> > overloading; it doesn't provide any operations. ::
> >
> >     class Number(metaclass=ABCMeta): pass
> >
> >
> > Most implementations of complex numbers will be hashable, but if you
> > need to rely on that, you'll have to check it explicitly: mutable
> > numbers are supported by this hierarchy. **Open issue:** Should
> > __pos__ coerce the argument to be an instance of the type it's defined
> > on? Why do the builtins do this? ::
> >
> >     class Complex(Number):
> >         """Complex defines the operations that work on the builtin complex type.
> >
> >         In short, those are: a conversion to complex, .real, .imag, +, -,
> >         *, /, abs(), .conjugate, ==, and !=.
> >
> >         If it is given heterogeneous arguments, and doesn't have special
> >         knowledge about them, it should fall back to the builtin complex
> >         type as described below.
> >         """
> >
> >         @abstractmethod
> >         def __complex__(self):
> >             """Return a builtin complex instance."""
> >
> >         def __bool__(self):
> >             """True if self != 0."""
> >             return self != 0
> >
> >         @abstractproperty
> >         def real(self):
> >             """Retrieve the real component of this number.
> >
> >             This should subclass Real.
> >             """
> >             raise NotImplementedError
> >
> >         @abstractproperty
> >         def imag(self):
> >             """Retrieve the imaginary component of this number.
> >
> >             This should subclass Real.
> >             """
> >             raise NotImplementedError
> >
> >         @abstractmethod
> >         def __add__(self, other):
> >             raise NotImplementedError
> >
> >         @abstractmethod
> >         def __radd__(self, other):
> >             raise NotImplementedError
> >
> >         @abstractmethod
> >         def __neg__(self):
> >             raise NotImplementedError
> >
> >         def __pos__(self):
> >             return self
> >
> >         def __sub__(self, other):
> >             return self + -other
> >
> >         def __rsub__(self, other):
> >             return -self + other
> >
> >         @abstractmethod
> >         def __mul__(self, other):
> >             raise NotImplementedError
> >
> >         @abstractmethod
> >         def __rmul__(self, other):
> >             raise NotImplementedError
> >
> >         @abstractmethod
> >         def __div__(self, other):
> >             raise NotImplementedError
> >
> >         @abstractmethod
> >         def __rdiv__(self, other):
> >             raise NotImplementedError
> >
> >         @abstractmethod
> >         def __pow__(self, exponent):
> >             """Like division, a**b should promote to complex when necessary."""
> >             raise NotImplementedError
> >
> >         @abstractmethod
> >         def __rpow__(self, base):
> >             raise NotImplementedError
> >
> >         @abstractmethod
> >         def __abs__(self):
> >             """Returns the Real distance from 0."""
> >             raise NotImplementedError
> >
> >         @abstractmethod
> >         def conjugate(self):
> >             """(x+y*i).conjugate() returns (x-y*i)."""
> >             raise NotImplementedError
> >
> >         @abstractmethod
> >         def __eq__(self, other):
> >             raise NotImplementedError
> >
> >         def __ne__(self, other):
> >             return not (self == other)
> >
> >
> > The ``Real`` ABC indicates that the value is on the real line, and
> > supports the operations of the ``float`` builtin. Real numbers are
> > totally ordered except for NaNs (which this PEP basically ignores). ::
> >
> >     class Real(Complex):
> >         """To Complex, Real adds the operations that work on real numbers.
> >
> >         In short, those are: a conversion to float, trunc(), divmod,
> >         %, <, <=, >, and >=.
> >
> >         Real also provides defaults for the derived operations.
> >         """
> >
> >         @abstractmethod
> >         def __float__(self):
> >             """Any Real can be converted to a native float object."""
> >             raise NotImplementedError
> >
> >         @abstractmethod
> >         def __trunc__(self):
> >             """Truncates self to an Integral.
> >
> >             Returns an Integral i such that:
> >               * i>0 iff self>0
> >               * abs(i) <= abs(self).
> >             """
> >             raise NotImplementedError
> >
> >         def __divmod__(self, other):
> >             """The pair (self // other, self % other).
> >
> >             Sometimes this can be computed faster than the pair of
> >             operations.
> >             """
> >             return (self // other, self % other)
> >
> >         def __rdivmod__(self, other):
> >             """The pair (other // self, other % self).
> >
> >             Sometimes this can be computed faster than the pair of
> >             operations.
> >             """
> >             return (other // self, other % self)
> >
> >         @abstractmethod
> >         def __floordiv__(self, other):
> >             """The floor() of self/other. Integral."""
> >             raise NotImplementedError
> >
> >         @abstractmethod
> >         def __rfloordiv__(self, other):
> >             """The floor() of other/self."""
> >             raise NotImplementedError
> >
> >         @abstractmethod
> >         def __mod__(self, other):
> >             raise NotImplementedError
> >
> >         @abstractmethod
> >         def __rmod__(self, other):
> >             raise NotImplementedError
> >
> >         @abstractmethod
> >         def __lt__(self, other):
> >             """< on Reals defines a total ordering, except perhaps for NaN."""
> >             raise NotImplementedError
> >
> >         @abstractmethod
> >         def __le__(self, other):
> >             raise NotImplementedError
> >
> >         # Concrete implementations of Complex abstract methods.
> >
> >         def __complex__(self):
> >             return complex(float(self))
> >
> >         @property
> >         def real(self):
> >             return self
> >
> >         @property
> >         def imag(self):
> >             return 0
> >
> >         def conjugate(self):
> >             """Conjugate is a no-op for Reals."""
> >             return self
> >
> >
> > There is no built-in rational type, but it's straightforward to write,
> > so we provide an ABC for it. **Open issue**: Add Demo/classes/Rat.py
> > to the stdlib? ::
> >
> >     class Rational(Real, Exact):
> >         """.numerator and .denominator should be in lowest terms."""
> >
> >         @abstractproperty
> >         def numerator(self):
> >             raise NotImplementedError
> >
> >         @abstractproperty
> >         def denominator(self):
> >             raise NotImplementedError
> >
> >         # Concrete implementation of Real's conversion to float.
> >
> >         def __float__(self):
> >             return self.numerator / self.denominator
> >
> >
> > And finally integers::
> >
> >     class Integral(Rational):
> >         """Integral adds a conversion to int and the bit-string operations."""
> >
> >         @abstractmethod
> >         def __int__(self):
> >             raise NotImplementedError
> >
> >         def __index__(self):
> >             return int(self)
> >
> >         @abstractmethod
> >         def __pow__(self, exponent, modulus):
> >             """self ** exponent % modulus, but maybe faster.
> >
> >             Implement this if you want to support the 3-argument version
> >             of pow(). Otherwise, just implement the 2-argument version
> >             described in Complex. Raise a TypeError if exponent < 0 or any
> >             argument isn't Integral.
> >             """
> >             raise NotImplementedError
> >
> >         @abstractmethod
> >         def __lshift__(self, other):
> >             raise NotImplementedError
> >
> >         @abstractmethod
> >         def __rlshift__(self, other):
> >             raise NotImplementedError
> >
> >         @abstractmethod
> >         def __rshift__(self, other):
> >             raise NotImplementedError
> >
> >         @abstractmethod
> >         def __rrshift__(self, other):
> >             raise NotImplementedError
> >
> >         @abstractmethod
> >         def __and__(self, other):
> >             raise NotImplementedError
> >
> >         @abstractmethod
> >         def __rand__(self, other):
> >             raise NotImplementedError
> >
> >         @abstractmethod
> >         def __xor__(self, other):
> >             raise NotImplementedError
> >
> >         @abstractmethod
> >         def __rxor__(self, other):
> >             raise NotImplementedError
> >
> >         @abstractmethod
> >         def __or__(self, other):
> >             raise NotImplementedError
> >
> >         @abstractmethod
> >         def __ror__(self, other):
> >             raise NotImplementedError
> >
> >         @abstractmethod
> >         def __invert__(self):
> >             raise NotImplementedError
> >
> >         # Concrete implementations of Rational and Real abstract methods.
> >
> >         def __float__(self):
> >             return float(int(self))
> >
> >         @property
> >         def numerator(self):
> >             return self
> >
> >         @property
> >         def denominator(self):
> >             return 1
> >
> >
> > Exact vs. Inexact Classes
> > -------------------------
> >
> > Floating point values may not exactly obey several of the properties
> > you would expect. For example, it is possible for ``(X + -X) + 3 ==
> > 3``, but ``X + (-X + 3) == 0``. On the range of values that most
> > functions deal with this isn't a problem, but it is something to be
> > aware of.
> >
> > Therefore, I define ``Exact`` and ``Inexact`` ABCs to mark whether
> > types have this problem. Every instance of ``Integral`` and
> > ``Rational`` should be Exact, but ``Reals`` and ``Complexes`` may or
> > may not be. (Do we really only need one of these, and the other is
> > defined as ``not`` the first?) ::
> >
> >     class Exact(Number): pass
> >     class Inexact(Number): pass
> >
> >
> > Changes to operations and __magic__ methods
> > -------------------------------------------
> >
> > To support more precise narrowing from float to int (and more
> > generally, from Real to Integral), I'm proposing the following new
> > __magic__ methods, to be called from the corresponding library
> > functions. All of these return Integrals rather than Reals.
> >
> > 1. ``__trunc__(self)``, called from a new builtin ``trunc(x)``, which
> >    returns the Integral closest to ``x`` between 0 and ``x``.
> >
> > 2. ``__floor__(self)``, called from ``math.floor(x)``, which returns
> >    the greatest Integral ``<= x``.
> >
> > 3. ``__ceil__(self)``, called from ``math.ceil(x)``, which returns the
> >    least Integral ``>= x``.
> >
> > 4. ``__round__(self)``, called from ``round(x)``, which returns the
> >    Integral closest to ``x``, rounding half toward even. **Open
> >    issue:** We could support the 2-argument version, but then we'd
> >    only return an Integral if the second argument were ``<= 0``.
> >
> > 5. ``__properfraction__(self)``, called from a new function,
> >    ``math.properfraction(x)``, which resembles C's ``modf()``: returns
> >    a pair ``(n:Integral, r:Real)`` where ``x == n + r``, both ``n``
> >    and ``r`` have the same sign as ``x``, and ``abs(r) < 1``. **Open
> >    issue:** Oh, we already have ``math.modf``. What name do we want
> >    for this? Should we use divmod(x, 1) instead?
> >
> > Because the ``int()`` conversion from ``float`` is equivalent to but
> > less explicit than ``trunc()``, let's remove it. (Or, if that breaks
> > too much, just add a deprecation warning.)
> >
> > ``complex.__{divmod,mod,floordiv,int,float}__`` should also go
> > away. These should continue to raise ``TypeError`` to help confused
> > porters, but should not appear in ``help(complex)`` to avoid confusing
> > more people. **Open issue:** This is difficult to do with the
> > ``PyNumberMethods`` struct. What's the best way to accomplish it?
> >
> >
> > Notes for type implementors
> > ---------------------------
> >
> > Implementors should be careful to make equal numbers equal and
> > hash them to the same values. This may be subtle if there are two
> > different extensions of the real numbers. For example, a complex type
> > could reasonably implement hash() as follows::
> >
> >         def __hash__(self):
> >             return hash(complex(self))
> >
> > but should be careful of any values that fall outside of the built in
> > complex's range or precision.
> >
> > Adding More Numeric ABCs
> > ~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > There are, of course, more possible ABCs for numbers, and this would
> > be a poor hierarchy if it precluded the possibility of adding
> > those. You can add ``MyFoo`` between ``Complex`` and ``Real`` with::
> >
> >     class MyFoo(Complex): ...
> >     MyFoo.register(Real)
> >
> > Implementing the arithmetic operations
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > We want to implement the arithmetic operations so that mixed-mode
> > operations either call an implementation whose author knew about the
> > types of both arguments, or convert both to the nearest built in type
> > and do the operation there. For subtypes of Integral, this means that
> > __add__ and __radd__ should be defined as::
> >
> >     class MyIntegral(Integral):
> >
> >         def __add__(self, other):
> >             if isinstance(other, MyIntegral):
> >                 return do_my_adding_stuff(self, other)
> >             elif isinstance(other, OtherTypeIKnowAbout):
> >                 return do_my_other_adding_stuff(self, other)
> >             else:
> >                 return NotImplemented
> >
> >         def __radd__(self, other):
> >             if isinstance(other, MyIntegral):
> >                 return do_my_adding_stuff(other, self)
> >             elif isinstance(other, OtherTypeIKnowAbout):
> >                 return do_my_other_adding_stuff(other, self)
> >             elif isinstance(other, Integral):
> >                 return int(other) + int(self)
> >             elif isinstance(other, Real):
> >                 return float(other) + float(self)
> >             elif isinstance(other, Complex):
> >                 return complex(other) + complex(self)
> >             else:
> >                 return NotImplemented
> >
> >
> > There are 5 different cases for a mixed-type operation on subclasses
> > of Complex. I'll refer to all of the above code that doesn't refer to
> > MyIntegral and OtherTypeIKnowAbout as "boilerplate". ``a`` will be an
> > instance of ``A``, which is a subtype of ``Complex`` (``a : A <:
> > Complex``), and ``b : B <: Complex``. I'll consider ``a + b``:
> >
> >     1. If A defines an __add__ which accepts b, all is well.
> >     2. If A falls back to the boilerplate code, and it were to return
> >        a value from __add__, we'd miss the possibility that B defines
> >        a more intelligent __radd__, so the boilerplate should return
> >        NotImplemented from __add__. (Or A may not implement __add__ at
> >        all.)
> >     3. Then B's __radd__ gets a chance. If it accepts a, all is well.
> >     4. If it falls back to the boilerplate, there are no more possible
> >        methods to try, so this is where the default implementation
> >        should live.
> >     5. If B <: A, Python tries B.__radd__ before A.__add__. This is
> >        ok, because it was implemented with knowledge of A, so it can
> >        handle those instances before delegating to Complex.
> >
> > If ``A<:Complex`` and ``B<:Real`` without sharing any other knowledge,
> > then the appropriate shared operation is the one involving the built
> > in complex, and both __radd__s land there, so ``a+b == b+a``.
> >
> >
> > Rejected Alternatives
> > =====================
> >
> > The initial version of this PEP defined an algebraic hierarchy
> > inspired by a Haskell Numeric Prelude [#numericprelude]_ including
> > MonoidUnderPlus, AdditiveGroup, Ring, and Field, and mentioned several
> > other possible algebraic types before getting to the numbers. I had
> > expected this to be useful to people using vectors and matrices, but
> > the NumPy community really wasn't interested, and we ran into the
> > issue that even if ``x`` is an instance of ``X <: MonoidUnderPlus``
> > and ``y`` is an instance of ``Y <: MonoidUnderPlus``, ``x + y`` may
> > still not make sense.
> >
> > Then I gave the numbers a much more branching structure to include
> > things like the Gaussian Integers and Z/nZ, which could be Complex but
> > wouldn't necessarily support things like division. The community
> > decided that this was too much complication for Python, so I've now
> > scaled back the proposal to resemble the Scheme numeric tower much
> > more closely.
> >
> >
> > References
> > ==========
> >
> > .. [#pep3119] Introducing Abstract Base Classes
> >    (http://www.python.org/dev/peps/pep-3119/)
> >
> > .. [#classtree] Possible Python 3K Class Tree?, wiki page created by
> > Bill Janssen
> >    (http://wiki.python.org/moin/AbstractBaseClasses)
> >
> > .. [#numericprelude] NumericPrelude: An experimental alternative
> > hierarchy of numeric type classes
> >    (http://darcs.haskell.org/numericprelude/docs/html/index.html)
> >
> > .. [#schemetower] The Scheme numerical tower
> >    (http://www.swiss.ai.mit.edu/ftpdir/scheme-reports/r5rs-html/r5rs_8.html#SEC50)
> >
> >
> > Acknowledgements
> > ================
> >
> > Thanks to Neal Norwitz for encouraging me to write this PEP in the
> > first place, to Travis Oliphant for pointing out that the numpy people
> > didn't really care about the algebraic concepts, to Alan Isaac for
> > reminding me that Scheme had already done this, and to Guido van
> > Rossum and lots of other people on the mailing list for refining the
> > concept.
> >
> > Copyright
> > =========
> >
> > This document has been placed in the public domain.
> >
> >
> >
> > ..
> >    Local Variables:
> >    mode: indented-text
> >    indent-tabs-mode: nil
> >    sentence-end-double-space: t
> >    fill-column: 70
> >    coding: utf-8
> >    End:
> >
> >
>
>
> --
> Namasté,
> Jeffrey Yasskin
> http://jeffrey.yasskin.info/
>
> "Religion is an improper response to the Divine." -- "Skinny Legs and
> All", by Tom Robbins
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at v.loewis.de  Wed Aug 22 22:28:09 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 22 Aug 2007 22:28:09 +0200
Subject: [Python-3000] Str v. Unicode in C?
In-Reply-To: <18124.38042.622272.863273@montanaro.dyndns.org>
References: <18124.38042.622272.863273@montanaro.dyndns.org>
Message-ID: <46CC9C59.1000306@v.loewis.de>

skip at pobox.com schrieb:
> If I want to check an object for stringedness in py3k do I use
> PyString_Check or PyUnicode_Check?

In the medium term, you should use PyUnicode_Check. In the short
term, additionally, do PyString_Check as well if you want to
support str8 (your choice). In the long term, it might be that
PyUnicode_Check gets renamed to PyString_Check, provided that
str8 is removed from the code base.

Regards,
Martin

From rrr at ronadam.com  Thu Aug 23 00:43:03 2007
From: rrr at ronadam.com (Ron Adam)
Date: Wed, 22 Aug 2007 17:43:03 -0500
Subject: [Python-3000] PEP 3101 Updated
In-Reply-To: <46CC68EB.2030609@trueblade.com>
References: <46C2809C.3000806@acm.org> <46C2C1A0.4060002@trueblade.com>
	<46CC68EB.2030609@trueblade.com>
Message-ID: <46CCBBF7.3060201@ronadam.com>



Eric Smith wrote:
> Eric Smith wrote:
>> Talin wrote:
>>> A new version is up, incorporating material from the various discussions 
>>> on this list:
>>>
>>> 	http://www.python.org/dev/peps/pep-3101/
>> self.assertEquals('{0[{1}]}'.format('abcdefg', 4), 'e')
>> self.assertEquals('{foo[{bar}]}'.format(foo='abcdefg', bar=4), 'e')
>
> I've been re-reading the PEP, in an effort to make sure everything is 
> working.  I realized that these tests should not pass.  The PEP says 
> that "Format specifiers can themselves contain replacement fields".  The 
> tests above have replacement fields in the field name, which is not 
> allowed.  I'm going to remove this functionality.
> 
> I believe the intent is to support a replacement for:
> "%.*s" % (4, 'how now brown cow')
> 
> Which would be:
> "{0:.{1}}".format('how now brown cow', 4)
> 
> For this, there's no need for replacement on field name.  I've taken it 
> out of the code, and made these tests in to errors.
> 
> Eric.

I think it should work myself, but it could be added back in later if there 
is a need.
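
For reference, the case Eric describes (a replacement field inside the
format specifier) can be sketched as follows. This reflects str.format()
as it behaves in released Python 3; the variable names are only for the
example.

```python
text = 'how now brown cow'

# Old style: the precision comes from the argument tuple via "*".
old_style = "%.*s" % (4, text)

# New style: the precision comes from a nested field in the format spec.
new_style = "{0:.{1}}".format(text, 4)

# The same nesting works for the width of a centred field.
centred = "{0:*^{1}}".format('Python', 12)

# Doubled braces outside a field are literal braces.
escaped = "{{{0}}}".format('Python')

print(repr(old_style), repr(new_style), centred, escaped)
```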


I'm still concerned about the choice of {{ and }} as escaped brackets.

What does the following do?


"{0:{{^{1}}".format('Python', '12')

"{{{{Python}}}}"



"{{{0:{{^{1}}}}".format('Python', '12')

"{{{{{Python}}}}}"



class ShowSpec(str):
    def __format__(self, spec):
        return spec

ShowSpec("{0:{{{1}}}}").format('abc', 'xyz')

"{{xyz}}"



"{0}".format('{value:{{^{width}}', width='10', value='Python')

"{{Python}}"


_RON






From john.reese at gmail.com  Thu Aug 23 00:43:35 2007
From: john.reese at gmail.com (John Reese)
Date: Wed, 22 Aug 2007 15:43:35 -0700
Subject: [Python-3000] proposed fix for test_xmlrpc.py in py3k
Message-ID: <bc1fe4bd0708221543h5fa41d87jf94145275e2c2bdd@mail.gmail.com>

Good afternoon.  I'm in the Google Python Sprint working on getting
the test_xmlrpc unittest to pass.  The following patch was prepared by
Jacques Frechet and me.  We'd appreciate feedback on the attached
patch.

What was broken:


1. BaseHTTPServer attempts to parse the http headers with an
rfc822.Message class.  This was changed in r56905 by Jeremy Hylton to
use the new io library instead of stringio as before.  Unfortunately
Jeremy's change resulted in TextIOWrapper stealing part of the HTTP
request body, due to its buffering quantum.  This was not seen in
normal tests because GET requests have no body, but xmlrpc uses POSTs.
We fixed this by doing the equivalent of what was done before, but
using io.StringIO instead of the old cStringIO class: we pull out just
the header using a sequence of readlines.


2. Once this was fixed, a second error asserted:
test_xmlrpc.test_with{_no,}_info call .get on the headers object from
xmlrpclib.ProtocolError.  This fails because the headers object became
a list in r57194.  The story behind this is somewhat complicated:
  - xmlrpclib used to use httplib.HTTP, which is old and deprecated
  - r57024 Jeremy Hylton switched py3k to use more modern httplib
infrastructure, but broke xmlrpclib.Transport.request; the "headers"
variable was now referenced without being set
  - r57194 Hyeshik Chang fixed xmlrpclib.Transport.request to get the
headers in a way that didn't explode; unfortunately, it now returned a
list instead of a dict, but there were no tests to catch this
  - r57221 Guido integrated xmlrpc changes from the trunk, including
r57158, which added tests that relied on headers being a dict.
Unfortunately, it no longer was.


3. test_xmlrpc.test_fail_with_info was failing because the ValueError
string of int('nonintegralstring') in py3k currently has an "s".  This
is presumably going away soon; the test now uses a regular expression
with an optional leading "s", which is a little silly, but r56209 is
prior art.

>>> int('z')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: s'z'
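
The idea behind the regular expression in item 3 can be sketched like
this. The pattern below is an assumption written for illustration, not
the one in the attached patch.

```python
import re

# Optional "s" before the quoted literal, so both spellings are accepted.
pattern = re.compile(r"invalid literal for int\(\) with base 10: s?'z'")

try:
    int('z')
except ValueError as exc:
    message = str(exc)

# Message produced by the running interpreter.
matches_current = pattern.search(message) is not None

# Message as shown in the traceback above, with the leading "s".
matches_prefixed = pattern.search(
    "invalid literal for int() with base 10: s'z'") is not None

print(matches_current, matches_prefixed)
```
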
-------------- next part --------------
A non-text attachment was scrubbed...
Name: xmlrpc.patch
Type: application/octet-stream
Size: 3372 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070822/aa87a546/attachment.obj 

From python3now at gmail.com  Thu Aug 23 01:23:07 2007
From: python3now at gmail.com (James Thiele)
Date: Wed, 22 Aug 2007 16:23:07 -0700
Subject: [Python-3000] PEP 3101 Updated
In-Reply-To: <46C2809C.3000806@acm.org>
References: <46C2809C.3000806@acm.org>
Message-ID: <8f01efd00708221623w1694fc18if330452c64e76bea@mail.gmail.com>

In the section "Explicit Conversion Flag" of PEP 3101 it says:

Currently, two explicit conversion flags are recognized:

        !r - convert the value to a string using repr().
        !s - convert the value to a string using str().
--
It does not say what action is taken if an unrecognized explicit
conversion flag is found.

Later in the PEP the pseudocode for vformat() silently ignores the
case of an unrecognized explicit conversion flag. This seems
unPythonic to me but if this is the desired behavior please make it
clear in the PEP.

On 8/14/07, Talin <talin at acm.org> wrote:
> A new version is up, incorporating material from the various discussions
> on this list:
>
>         http://www.python.org/dev/peps/pep-3101/
>
> Diffs are here:
>
> http://svn.python.org/view/peps/trunk/pep-3101.txt?rev=57044&r1=56535&r2=57044
>
>
> -- Talin

From guido at python.org  Thu Aug 23 01:42:39 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 22 Aug 2007 16:42:39 -0700
Subject: [Python-3000] proposed fix for test_xmlrpc.py in py3k
In-Reply-To: <bc1fe4bd0708221543h5fa41d87jf94145275e2c2bdd@mail.gmail.com>
References: <bc1fe4bd0708221543h5fa41d87jf94145275e2c2bdd@mail.gmail.com>
Message-ID: <ca471dc20708221642r537eaf22v208638c655c90e9c@mail.gmail.com>

Thanks! I've checked the bulk of this in, excepting the fix for #3,
which I fixed at the source in longobject.c. Also, I changed the call
to io.StringIO() to first convert the bytes to characters, using the
same encoding as used for the HTTP request header line (Latin-1).
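
The approach can be sketched roughly as below. This is not the checked-in
code: read_headers is a hypothetical helper, and email.parser stands in
for the rfc822.Message class mentioned in the report, since that is what
is available in today's standard library.

```python
import io
from email.parser import Parser


def read_headers(rfile):
    """Read header lines up to the blank line, leaving the body unread."""
    lines = []
    while True:
        line = rfile.readline()
        lines.append(line)
        # A bare line ending (or EOF) marks the end of the headers.
        if line in (b"\r\n", b"\n", b""):
            break
    # Convert bytes to characters as Latin-1, then parse from a text stream.
    text = b"".join(lines).decode("iso-8859-1")
    return Parser().parse(io.StringIO(text))


# A fake POST payload: two headers, a blank line, then a 5-byte body.
raw = io.BytesIO(
    b"Content-Type: text/xml\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
)

headers = read_headers(raw)
# The body is still in the stream because only header lines were consumed.
body = raw.read(int(headers["Content-Length"]))
print(headers["Content-Type"], body)
```

Reading line by line from the binary stream is what keeps the request
body from being swallowed by a buffering text wrapper.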

--Guido

On 8/22/07, John Reese <john.reese at gmail.com> wrote:
> Good afternoon.  I'm in the Google Python Sprint working on getting
> the test_xmlrpc unittest to pass.  The following patch was prepared by
> Jacques Frechet and me.  We'd appreciate feedback on the attached
> patch.
>
> What was broken:
>
>
> 1. BaseHTTPServer attempts to parse the http headers with an
> rfc822.Message class.  This was changed in r56905 by Jeremy Hylton to
> use the new io library instead of stringio as before.  Unfortunately
> Jeremy's change resulted in TextIOWrapper stealing part of the HTTP
> request body, due to its buffering quantum.  This was not seen in
> normal tests because GET requests have no body, but xmlrpc uses POSTs.
> We fixed this by doing the equivalent of what was done before, but
> using io.StringIO instead of the old cStringIO class: we pull out just
> the header using a sequence of readlines.
>
>
> 2. Once this was fixed, a second error asserted:
> test_xmlrpc.test_with{_no,}_info call .get on the headers object from
> xmlrpclib.ProtocolError.  This fails because the headers object became
> a list in r57194.  The story behind this is somewhat complicated:
>   - xmlrpclib used to use httplib.HTTP, which is old and deprecated
>   - r57024 Jeremy Hylton switched py3k to use more modern httplib
> infrastructure, but broke xmlrpclib.Transport.request; the "headers"
> variable was now referenced without being set
>   - r57194 Hyeshik Chang fixed xmlrpclib.Transport.request to get the
> headers in a way that didn't explode; unfortunately, it now returned a
> list instead of a dict, but there were no tests to catch this
>   - r57221 Guido integrated xmlrpc changes from the trunk, including
> r57158, which added tests that relied on headers being a dict.
> Unfortunately, it no longer was.
>
>
> 3. test_xmlrpc.test_fail_with_info was failing because the ValueError
> string of int('nonintegralstring') in py3k currently has an "s".  This
> is presumably going away soon; the test now uses a regular expression
> with an optional leading "s", which is a little silly, but r56209 is
> prior art.
>
> >>> int('z')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: invalid literal for int() with base 10: s'z'
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From eric+python-dev at trueblade.com  Thu Aug 23 01:46:34 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Wed, 22 Aug 2007 19:46:34 -0400
Subject: [Python-3000] PEP 3101 Updated
In-Reply-To: <8f01efd00708221623w1694fc18if330452c64e76bea@mail.gmail.com>
References: <46C2809C.3000806@acm.org>
	<8f01efd00708221623w1694fc18if330452c64e76bea@mail.gmail.com>
Message-ID: <46CCCADA.1060206@trueblade.com>

James Thiele wrote:
> In the section "Explicit Conversion Flag" of PEP 3101 it says:
> 
> Currently, two explicit conversion flags are recognized:
> 
>         !r - convert the value to a string using repr().
>         !s - convert the value to a string using str().
> --
> It does not say what action is taken if an unrecognized explicit
> conversion flag is found.

My implementation raises a ValueError, which I think is the desired 
behavior:

 >>> "{0!x}".format(1)
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
ValueError: Unknown converion specifier x

I agree the PEP should be explicit about this.
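
A short sketch of the two recognized flags and the error case, as
str.format() behaves in released Python 3 (only the exception type is
checked, not the message text):

```python
as_repr = "{0!r}".format('abc')   # value converted with repr()
as_str = "{0!s}".format(42)       # value converted with str()

try:
    "{0!x}".format(1)             # 'x' is not a recognized conversion flag
except ValueError as exc:
    error = exc
else:
    error = None

print(as_repr, as_str, type(error).__name__)
```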


From guido at python.org  Thu Aug 23 01:52:20 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 22 Aug 2007 16:52:20 -0700
Subject: [Python-3000] PEP 3101 Updated
In-Reply-To: <46CCCADA.1060206@trueblade.com>
References: <46C2809C.3000806@acm.org>
	<8f01efd00708221623w1694fc18if330452c64e76bea@mail.gmail.com>
	<46CCCADA.1060206@trueblade.com>
Message-ID: <ca471dc20708221652p27fd3d60w3310cf6ed28c471e@mail.gmail.com>

On 8/22/07, Eric Smith <eric+python-dev at trueblade.com> wrote:
> James Thiele wrote:
> > In the section "Explicit Conversion Flag" of PEP 3101 it says:
> >
> > Currently, two explicit conversion flags are recognized:
> >
> >         !r - convert the value to a string using repr().
> >         !s - convert the value to a string using str().
> > --
> > It does not say what action is taken if an unrecognized explicit
> > conversion flag is found.
>
> My implementation raises a ValueError, which I think is the desired
> behavior:
>
>  >>> "{0!x}".format(1)
> Traceback (most recent call last):
>    File "<stdin>", line 1, in <module>
> ValueError: Unknown converion specifier x

You raise ValueErrors for other errors with the format, right? If
there's a reason to be more lenient, the best approach would probably
be to interpret it as !r.

> I agree the PEP should be explicit about this.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From eric+python-dev at trueblade.com  Thu Aug 23 02:00:57 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Wed, 22 Aug 2007 20:00:57 -0400
Subject: [Python-3000] PEP 3101 Updated
In-Reply-To: <ca471dc20708221652p27fd3d60w3310cf6ed28c471e@mail.gmail.com>
References: <46C2809C.3000806@acm.org>	
	<8f01efd00708221623w1694fc18if330452c64e76bea@mail.gmail.com>	
	<46CCCADA.1060206@trueblade.com>
	<ca471dc20708221652p27fd3d60w3310cf6ed28c471e@mail.gmail.com>
Message-ID: <46CCCE39.5060301@trueblade.com>

Guido van Rossum wrote:
> On 8/22/07, Eric Smith <eric+python-dev at trueblade.com> wrote:
>> James Thiele wrote:
>>> In the section "Explicit Conversion Flag" of PEP 3101 it says:
>>>
>>> Currently, two explicit conversion flags are recognized:
>>>
>>>         !r - convert the value to a string using repr().
>>>         !s - convert the value to a string using str().
>>> --
>>> It does not say what action is taken if an unrecognized explicit
>>> conversion flag is found.
>> My implementation raises a ValueError, which I think is the desired
>> behavior:
>>
>>  >>> "{0!x}".format(1)
>> Traceback (most recent call last):
>>    File "<stdin>", line 1, in <module>
>> ValueError: Unknown converion specifier x
> 
> You raise ValueErrors for other errors with the format, right? If
> there's a reason to be more lenient, the best approach would probably
> be to interpret it as !r.

Yes, ValueError gets raised for other errors with the format specifiers. 
  My concern is that if we silently treat unknown conversion specifiers 
as !r, we can't add other specifiers in the future without breaking 
existing code.
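
For readers following the thread later, here is a minimal sketch of the strict behaviour being argued for, as str.format behaves in released Python 3 (not the 2007 sandbox code). The Demo class and try_format helper are hypothetical names used only for illustration. The !a flag shown at the end was added after this discussion, which is the kind of extension that silently mapping unknown flags to !r would have made awkward.

```python
# Sketch only: released Python 3 behaviour, not the sandbox implementation.
class Demo:
    """Hypothetical class whose repr() and str() differ visibly."""

    def __repr__(self):
        return "Demo-repr"

    def __str__(self):
        return "Demo-str"


def try_format(template, value):
    """Return the formatted string, or the ValueError text on failure."""
    try:
        return template.format(value)
    except ValueError as exc:
        return "ValueError: %s" % exc


repr_result = try_format("{0!r}", Demo())      # uses repr()
str_result = try_format("{0!s}", Demo())       # uses str()
unknown_result = try_format("{0!x}", Demo())   # unrecognized flag is rejected
ascii_result = "{0!a}".format("\xe9")          # later addition: uses ascii()
```

Because the unknown flag is an error rather than a synonym for !r, no existing format string could have depended on "{0!a}" meaning repr().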


From mierle at gmail.com  Thu Aug 23 01:58:28 2007
From: mierle at gmail.com (Keir Mierle)
Date: Wed, 22 Aug 2007 16:58:28 -0700
Subject: [Python-3000] [PATCH] Fix math.ceil() behaviour for PEP3141
Message-ID: <ef5675f30708221658k2746c1al8e585d92efc97393@mail.gmail.com>

The attached patch fixes math.ceil to delegate to x.__ceil__() if it
is defined, according to PEP 3141, and adds tests to cover the new
cases. Patch is against r57303.

No new test failures are introduced.

Keir
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ceil.diff
Type: text/x-patch
Size: 1880 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070822/6654ccee/attachment-0001.bin 
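
Since the patch itself was scrubbed from the archive, here is a small sketch of the delegation it describes, written against math.ceil as it behaves in released Python 3. The Celsius class is a hypothetical example type, not something from the patch.

```python
import math


class Celsius:
    """Hypothetical numeric wrapper, used only to show the __ceil__ hook."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __ceil__(self):
        # math.ceil() calls this method when the argument defines it.
        return math.ceil(self.degrees)


ceil_of_wrapper = math.ceil(Celsius(21.2))  # delegates to Celsius.__ceil__
ceil_of_float = math.ceil(21.2)             # ordinary float path
```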

From eric+python-dev at trueblade.com  Thu Aug 23 02:10:11 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Wed, 22 Aug 2007 20:10:11 -0400
Subject: [Python-3000] PEP 3101 Updated
In-Reply-To: <46CCBBF7.3060201@ronadam.com>
References: <46C2809C.3000806@acm.org> <46C2C1A0.4060002@trueblade.com>
	<46CC68EB.2030609@trueblade.com> <46CCBBF7.3060201@ronadam.com>
Message-ID: <46CCD063.2060007@trueblade.com>

Ron Adam wrote:
>> I've been re-reading the PEP, in an effort to make sure everything is 
>> working.  I realized that these tests should not pass.  The PEP says 
>> that "Format specifiers can themselves contain replacement fields".  
>> The tests above have replacement fields in the field name, which is 
>> not allowed.  I'm going to remove this functionality.
>>
>> I believe the intent is to support a replacement for:
>> "%.*s" % (4, 'how now brown cow')
>>
>> Which would be:
>> "{0:.{1}}".format('how now brown cow', 4)
>>
>> For this, there's no need for replacement on field name.  I've taken 
>> it out of the code, and made these tests into errors.
> 
> I think it should work myself, but it could be added back in later if 
> there is a need to.
> 
> 
> I'm still concerned about the choice of {{ and }} as escaped brackets.
> 
> What does the following do?
> 
> 
> "{0:{{^{1}}".format('Python', '12')

 >>> "{0:{{^{1}}".format('Python', '12')
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
ValueError: unterminated replacement field

But:
 >>> "{0:^{1}}".format('Python', '12')
'   Python   '

> "{{{0:{{^{1}}}}".format('Python', '12')
 >>> "{{{0:{{^{1}}}}".format('Python', '12')
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
ValueError: Unknown conversion type }

But,
 >>> "{{{0:^{1}}".format('Python', '12')
'{   Python   '

> class ShowSpec(str):
>     def __format__(self, spec):
>         return spec
> 
> ShowSpec("{0:{{{1}}}}").format('abc', 'xyz')
> 

 >>> ShowSpec("{0:{{{1}}}}").format('abc', 'xyz')
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
ValueError: Invalid conversion specification

I think you mean:
ShowSpec("{0:{1}}").format('abc', 'xyz')

But I have some error with that.  I'm looking into it.

> "{0}".format('{value:{{^{width}}', width='10', value='Python')

 >>> "{0}".format('{value:{{^{width}}', width='10', value='Python')
'{value:{{^{width}}'
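
A short sketch of the cases that do work, for anyone wanting to try them. This assumes str.format as shipped in released Python 3, where a replacement field is allowed inside the format specifier but the substituted argument text is not parsed again.

```python
# Nested field supplies the precision, replacing "%.*s" % (4, ...).
truncated = "{0:.{1}}".format("how now brown cow", 4)

# Nested field supplies the width for centering.
centered = "{0:^{1}}".format("Python", "12")

# Doubled braces outside a field produce literal braces.
literal_braces = "{{{0:^{1}}}}".format("Python", "12")

# Braces inside an argument value are output as-is, not expanded.
not_reexpanded = "{0}".format("{value:^{width}}", width="10", value="Python")
```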



From greg at electricrain.com  Thu Aug 23 01:59:29 2007
From: greg at electricrain.com (Gregory P. Smith)
Date: Wed, 22 Aug 2007 16:59:29 -0700
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46B7FACC.8030503@v.loewis.de>
References: <D3DD7578-D515-4F46-A3F3-B96A43AB1FBB@gmail.com>
	<B24E6FF9-3080-4494-A377-BE68B9949A52@gmail.com>
	<ca471dc20708061906w5957811evfc02cf9f04d7cfde@mail.gmail.com>
	<46B7EA06.5040106@v.loewis.de>
	<ca471dc20708062056w362a951ma705452868e41472@mail.gmail.com>
	<46B7FACC.8030503@v.loewis.de>
Message-ID: <20070822235929.GA12780@electricrain.com>

On Tue, Aug 07, 2007 at 06:53:32AM +0200, "Martin v. Löwis" wrote:
> > I guess we have to rethink our use of these databases somewhat.
> 
> Ok. In the interest of progress, I'll be looking at coming up with
> some fixes for the code base right now; as we agree that the
> underlying semantics is bytes:bytes, any encoding wrappers on
> top of it can be added later.

The underlying Modules/_bsddb.c today uses PyArg_Parse(..., "s#", ...)
which, if I read Python/getargs.c correctly, is very lenient on the
input types it accepts.  It appears to accept anything with a buffer
API, auto-converting unicode to the default encoding as needed.

IMHO all of that is desirable in many situations but it is not strict.
bytes:bytes or int:bytes (depending on the database type) are
fundamentally all the C berkeleydb library knows.  Attaching meaning
to the keys and values is up to the user.  I'm about to try a _bsddb.c
that strictly enforces bytes as values for the underlying bsddb.db API
provided by _bsddb in my sandbox under the assumption that being
strict about bytes is desired at that level there.  I predict lots of
Lib/bsddb/test/ edits.

> My concern is that people need to access existing databases. It's
> all fine that the code accessing them breaks, and that they have
> to actively port to Py3k. However, telling them that they have to
> represent the keys in their dbm disk files in a different manner
> might cause a revolt...

agreed.  thus the importance of allowing bytes:bytes.
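
To make the layering concrete, here is a rough sketch of what an encoding wrapper on top of a strict bytes:bytes store could look like. EncodedMapping is a hypothetical name, and a plain dict stands in for the database so the example is self-contained; it is not the bsddb API.

```python
from collections.abc import MutableMapping


class EncodedMapping(MutableMapping):
    """Hypothetical str:str view over any bytes:bytes mapping."""

    def __init__(self, raw, encoding="utf-8"):
        self._raw = raw            # underlying store only ever sees bytes
        self._encoding = encoding

    def __getitem__(self, key):
        return self._raw[key.encode(self._encoding)].decode(self._encoding)

    def __setitem__(self, key, value):
        self._raw[key.encode(self._encoding)] = value.encode(self._encoding)

    def __delitem__(self, key):
        del self._raw[key.encode(self._encoding)]

    def __iter__(self):
        return (key.decode(self._encoding) for key in self._raw)

    def __len__(self):
        return len(self._raw)


raw_store = {}                     # stand-in for a bytes:bytes database
view = EncodedMapping(raw_store)
view["name"] = "L\xf6wis"
```

The on-disk representation stays whatever bytes the existing database already holds; only the wrapper decides how those bytes map to text.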


From greg at electricrain.com  Thu Aug 23 02:41:55 2007
From: greg at electricrain.com (Gregory P. Smith)
Date: Wed, 22 Aug 2007 17:41:55 -0700
Subject: [Python-3000] Move to a "py3k" branch *DONE*
In-Reply-To: <ee2a432c0708092237i3b1cf045ied38af29c75d394c@mail.gmail.com>
References: <ca471dc20708090743k30562834uee2abd624f2f41b5@mail.gmail.com>
	<ee2a432c0708092237i3b1cf045ied38af29c75d394c@mail.gmail.com>
Message-ID: <20070823004155.GB12780@electricrain.com>

> > There are currently about 7 failing unit tests left:
> >
> > test_bsddb
> > test_bsddb3
...

fyi these two pass for me on the current py3k branch on ubuntu linux
and mac os x 10.4.9.

-greg


From rrr at ronadam.com  Thu Aug 23 03:08:35 2007
From: rrr at ronadam.com (Ron Adam)
Date: Wed, 22 Aug 2007 20:08:35 -0500
Subject: [Python-3000] PEP 3101 Updated
In-Reply-To: <46CCD063.2060007@trueblade.com>
References: <46C2809C.3000806@acm.org> <46C2C1A0.4060002@trueblade.com>
	<46CC68EB.2030609@trueblade.com> <46CCBBF7.3060201@ronadam.com>
	<46CCD063.2060007@trueblade.com>
Message-ID: <46CCDE13.60502@ronadam.com>



Eric Smith wrote:
> Ron Adam wrote:
>>> I've been re-reading the PEP, in an effort to make sure everything is 
>>> working.  I realized that these tests should not pass.  The PEP says 
>>> that "Format specifiers can themselves contain replacement fields".  
>>> The tests above have replacement fields in the field name, which is 
>>> not allowed.  I'm going to remove this functionality.
>>>
>>> I believe the intent is to support a replacement for:
>>> "%.*s" % (4, 'how now brown cow')
>>>
>>> Which would be:
>>> "{0:.{1}}".format('how now brown cow', 4)
>>>
>>> For this, there's no need for replacement on field name.  I've taken 
>>> it out of the code, and made these tests into errors.
>>
>> I think it should work myself, but it could be added back in later if 
>> there is a need to.
>>
>>
>> I'm still concerned about the choice of {{ and }} as escaped brackets.
>>
>> What does the following do?
>>
>>
>> "{0:{{^{1}}".format('Python', '12')
> 
>  >>> "{0:{{^{1}}".format('Python', '12')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: unterminated replacement field

When are the "{{" and "}}" escape characters replaced with  '{' and '}'?

> But:
>  >>> "{0:^{1}}".format('Python', '12')
> '   Python   '
 >

>> "{{{0:{{^{1}}}}".format('Python', '12')
>  >>> "{{{0:{{^{1}}}}".format('Python', '12')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: Unknown conversion type }
> 
> But,
>  >>> "{{{0:^{1}}".format('Python', '12')
> '{   Python   '

So escaping '{' with '{{' and '}' with '}}' doesn't work inside of format 
expressions?

That would mean there is no way to pass a brace to a __format__ method.


>> class ShowSpec(str):
>>     def __format__(self, spec):
>>         return spec
>>
>> ShowSpec("{0:{{{1}}}}").format('abc', 'xyz')
>>
> 
>  >>> ShowSpec("{0:{{{1}}}}").format('abc', 'xyz')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: Invalid conversion specification

> I think you mean:
> ShowSpec("{0:{1}}").format('abc', 'xyz')

No, because you may need to be able to pass the '{' and '}' character to 
the format specifier in some way.  The standard specifiers don't use them, 
but custom specifiers may need them.


> But I have some error with that.  I'm looking into it.
> 
>> "{0}".format('{value:{{^{width}}', width='10', value='Python')
> 
>  >>> "{0}".format('{value:{{^{width}}', width='10', value='Python')
> '{value:{{^{width}}'

Depending on whether or not the evaluation is recursive this may or may not 
be correct.

I think it's actually easier to do it recursively and not put limits on 
where format specifiers can be used or not.

_RON






From eric+python-dev at trueblade.com  Thu Aug 23 03:33:19 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Wed, 22 Aug 2007 21:33:19 -0400
Subject: [Python-3000] PEP 3101 Updated
In-Reply-To: <46CCDE13.60502@ronadam.com>
References: <46C2809C.3000806@acm.org> <46C2C1A0.4060002@trueblade.com>
	<46CC68EB.2030609@trueblade.com> <46CCBBF7.3060201@ronadam.com>
	<46CCD063.2060007@trueblade.com> <46CCDE13.60502@ronadam.com>
Message-ID: <46CCE3DF.1090004@trueblade.com>

Ron Adam wrote:
> 
> 
> Eric Smith wrote:
>> Ron Adam wrote:
>>>> I've been re-reading the PEP, in an effort to make sure everything 
>>>> is working.  I realized that these tests should not pass.  The PEP 
>>>> says that "Format specifiers can themselves contain replacement 
>>>> fields".  The tests above have replacement fields in the field name, 
>>>> which is not allowed.  I'm going to remove this functionality.
>>>>
>>>> I believe the intent is to support a replacement for:
>>>> "%.*s" % (4, 'how now brown cow')
>>>>
>>>> Which would be:
>>>> "{0:.{1}}".format('how now brown cow', 4)
>>>>
>>>> For this, there's no need for replacement on field name.  I've taken 
>>>> it out of the code, and made these tests into errors.
>>>
>>> I think it should work myself, but it could be added back in later if 
>>> there is a need to.
>>>
>>>
>>> I'm still concerned about the choice of {{ and }} as escaped brackets.
>>>
>>> What does the following do?
>>>
>>>
>>> "{0:{{^{1}}".format('Python', '12')
>>
>>  >>> "{0:{{^{1}}".format('Python', '12')
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> ValueError: unterminated replacement field
> 
> When are the "{{" and "}}" escape characters replaced with  '{' and '}'?

While parsing for the starting '{'.  I'm not saying this is the best or 
only or even PEP-specified way of doing it, but that's how the sample 
implementation does it (and the way the sandbox version has done it for 
many months).

>> But,
>>  >>> "{{{0:^{1}}".format('Python', '12')
>> '{   Python   '
> 
> So escaping '{' with '{{' and '}' with '}}' doesn't work inside of 
> format expressions?

As I have it implemented, yes.

> That would mean there is no way to pass a brace to a __format__ method.

No way using string.format, correct.  You could pass it in using the 
builtin format(), or by calling __format__ directly.  But you're 
correct, for the most part if string.format doesn't accept it, it's not 
practical.

>> I think you mean:
>> ShowSpec("{0:{1}}").format('abc', 'xyz')
> 
> No, because you may need to be able to pass the '{' and '}' character to 
> the format specifier in some way.  The standard specifiers don't use 
> them, but custom specifiers may need them.

Also true.

>>> "{0}".format('{value:{{^{width}}', width='10', value='Python')
>>
>>  >>> "{0}".format('{value:{{^{width}}', width='10', value='Python')
>> '{value:{{^{width}}'
> 
> Depending on whether or not the evaluation is recursive this may or may 
> not be correct.
> 
> I think it's actually easier to do it recursively and not put limits on 
> where format specifiers can be used or not.

But then you'd always have to worry that some replaced string looks like 
something that could be interpreted as a field, even if that's not what 
you want.

What if "{value}" came from user supplied input?  I don't think you'd 
want (or expect) any string you output that contains braces to be expanded.
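
A small sketch of the two points above, assuming the builtin format() and str.format as found in released Python 3. ShowSpec here is my guess at the class discussed in the thread (a __format__ that simply echoes its specifier), so treat it as illustrative.

```python
class ShowSpec:
    """Assumed shape of the thread's example: echo the format specifier."""

    def __format__(self, spec):
        return spec


# Braces reach __format__ untouched through the builtin or a direct call.
via_builtin = format(ShowSpec(), "{abc}")
via_dunder = ShowSpec().__format__("{abc}")

# Text that merely looks like a field is not expanded when it is an argument.
user_text = "{value}"
echoed = "{0}".format(user_text)
```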

From guido at python.org  Thu Aug 23 04:06:59 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 22 Aug 2007 19:06:59 -0700
Subject: [Python-3000] Move to a "py3k" branch *DONE*
In-Reply-To: <20070823004155.GB12780@electricrain.com>
References: <ca471dc20708090743k30562834uee2abd624f2f41b5@mail.gmail.com>
	<ee2a432c0708092237i3b1cf045ied38af29c75d394c@mail.gmail.com>
	<20070823004155.GB12780@electricrain.com>
Message-ID: <ca471dc20708221906r4ec1bee4o83aa6da564c1d55f@mail.gmail.com>

For me too. Great work whoever fixed these (and other tests, like xmlrpc).

All I've got left failing is the three email tests, and I know Barry
Warsaw is working on those. (Although he used some choice swearwords
to describe the current state. :-)

--Guido

On 8/22/07, Gregory P. Smith <greg at electricrain.com> wrote:
> > > There are currently about 7 failing unit tests left:
> > >
> > > test_bsddb
> > > test_bsddb3
> ...
>
> fyi these two pass for me on the current py3k branch on ubuntu linux
> and mac os x 10.4.9.
>
> -greg
>
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Thu Aug 23 04:07:51 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 22 Aug 2007 19:07:51 -0700
Subject: [Python-3000] proposed fix for test_xmlrpc.py in py3k
In-Reply-To: <bc1fe4bd0708221828g533d04f2n8f92a7218ac04e8b@mail.gmail.com>
References: <bc1fe4bd0708221543h5fa41d87jf94145275e2c2bdd@mail.gmail.com>
	<ca471dc20708221642r537eaf22v208638c655c90e9c@mail.gmail.com>
	<bc1fe4bd0708221828g533d04f2n8f92a7218ac04e8b@mail.gmail.com>
Message-ID: <ca471dc20708221907t72070ad9o4e545db75ae918e2@mail.gmail.com>

My mistake, out of habit I limited the submit to the Lib subdirectory.
Will do later.

On 8/22/07, John Reese <jtr at miskatonic.nu> wrote:
> Thanks, sounds good.  I'm curious why you left out the change to
> Doc/library/xmlrpclib.rst -- the documentation of the type of the
> parameter was out-of-date, if it was ever right.
>
> On 8/22/07, Guido van Rossum <guido at python.org> wrote:
> > Thanks! I've checked the bulk of this in, excepting the fix for #3,
> > which I fixed at the source in longobject.c. Also, I changed the call
> > to io.StringIO() to first convert the bytes to characters, using the
> > same encoding as used for the HTTP request header line (Latin-1).
> >
> > --Guido
> >
> > On 8/22/07, John Reese <john.reese at gmail.com> wrote:
> > > Good afternoon.  I'm in the Google Python Sprint working on getting
> > > the test_xmlrpc unittest to pass.  The following patch was prepared by
> > > Jacques Frechet and me.  We'd appreciate feedback on the attached
> > > patch.
> > >
> > > What was broken:
> > >
> > >
> > > 1. BaseHTTPServer attempts to parse the http headers with an
> > > rfc822.Message class.  This was changed in r56905 by Jeremy Hylton to
> > > use the new io library instead of stringio as before.  Unfortunately
> > > Jeremy's change resulted in TextIOWrapper stealing part of the HTTP
> > > request body, due to its buffering quantum.  This was not seen in
> > > normal tests because GET requests have no body, but xmlrpc uses POSTs.
> > >  We fixed this by doing the equivalent of what was done before, but
> > > using io.StringIO instead of the old cStringIO class: we pull out just
> > > the header using a sequence of readlines.
> > >
> > >
> > > 2. Once this was fixed, a second error asserted:
> > > test_xmlrpc.test_with{_no,}_info call .get on the headers object from
> > > xmlrpclib.ProtocolError.  This fails because the headers object became
> > > a list in r57194.  The story behind this is somewhat complicated:
> > >   - xmlrpclib used to use httplib.HTTP, which is old and deprecated
> > >   - r57024 Jeremy Hylton switched py3k to use more modern httplib
> > > infrastructure, but broke xmlrpclib.Transport.request; the "headers"
> > > variable was now referenced without being set
> > >   - r57194 Hyeshik Chang fixed xmlrpclib.Transport.request to get the
> > > headers in a way that didn't explode; unfortunately, it now returned a
> > > list instead of a dict, but there were no tests to catch this
> > >   - r57221 Guido integrated xmlrpc changes from the trunk, including
> > > r57158, which added tests that relied on headers being a dict.
> > > Unfortunately, it no longer was.
> > >
> > >
> > > 3. test_xmlrpc.test_fail_with_info was failing because the ValueError
> > > string of int('nonintegralstring') in py3k currently has an "s".  This
> > > is presumably going away soon; the test now uses a regular expression
> > > with an optional leading "s", which is a little silly, but r56209 is
> > > prior art.
> > >
> > > >>> int('z')
> > > Traceback (most recent call last):
> > >   File "<stdin>", line 1, in <module>
> > > ValueError: invalid literal for int() with base 10: s'z'
> > >
> >
> >
> > --
> > --Guido van Rossum (home page: http://www.python.org/~guido/)
> >
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From andrew.j.wade at gmail.com  Thu Aug 23 05:15:56 2007
From: andrew.j.wade at gmail.com (Andrew James Wade)
Date: Wed, 22 Aug 2007 23:15:56 -0400
Subject: [Python-3000] PEP 3101 Updated
In-Reply-To: <46CCE3DF.1090004@trueblade.com>
References: <46C2809C.3000806@acm.org> <46C2C1A0.4060002@trueblade.com>
	<46CC68EB.2030609@trueblade.com> <46CCBBF7.3060201@ronadam.com>
	<46CCD063.2060007@trueblade.com> <46CCDE13.60502@ronadam.com>
	<46CCE3DF.1090004@trueblade.com>
Message-ID: <20070822231556.c1f9f647.ajwade+py3k@andrew.wade.networklinux.net>

On Wed, 22 Aug 2007 21:33:19 -0400
Eric Smith <eric+python-dev at trueblade.com> wrote:

> Ron Adam wrote:
...
> > That would mean there is no way to pass a brace to a __format__ method.
> 
> No way using string.format, correct.  You could pass it in using the 
> builtin format(), or by calling __format__ directly.  But you're 
> correct, for the most part if string.format doesn't accept it, it's not 
> practical.

What about:
>>>  "{0:{lb}{1}{rb}}".format(ShowSpec(), 'abc', lb='{', rb='}')
'{abc}'
Ugly, but better than nothing.

> > I think it's actually easier to do it recursively and not put limits on 
> > where format specifiers can be used or not.
>
> But then you'd always have to worry that some replaced string looks like 
> something that could be interpreted as a field, even if that's not what 
> you want.
> 
> What if "{value}" came from user supplied input?  I don't think you'd 
> want (or expect) any string you output that contains braces to be expanded.

Not a problem with recursion:

$ echo $(echo $(pwd))
/home/ajwade
$ a='echo $(pwd)'
$ echo $a
echo $(pwd)
$ echo $($a)
$(pwd)
$ echo $($($a))
bash: $(pwd): command not found

The key is to do substitution only once at each level of recursion;
which is what a naive recursive algorithm would do anyway. And I'd do
the recursive substitution before even starting to parse the field:
it's simple and powerful.

-- Andrew
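
For anyone wanting to try the brace workaround and the substitute-once rule, here is a minimal sketch against str.format in released Python 3. ShowSpec is assumed to be a class whose __format__ returns the specifier it receives, as in the earlier messages.

```python
class ShowSpec:
    """Assumed example class: __format__ returns the specifier unchanged."""

    def __format__(self, spec):
        return spec


# Braces are smuggled into the specifier through nested fields.
spec_with_braces = "{0:{lb}{1}{rb}}".format(ShowSpec(), "abc", lb="{", rb="}")

# The nested field is substituted once; its result is not parsed again.
spec_once = "{0:{1}}".format(ShowSpec(), "{pwd}")
```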

From andrew.j.wade at gmail.com  Thu Aug 23 06:56:19 2007
From: andrew.j.wade at gmail.com (Andrew James Wade)
Date: Thu, 23 Aug 2007 00:56:19 -0400
Subject: [Python-3000] PEP 3101 Updated
In-Reply-To: <20070822231556.c1f9f647.ajwade+py3k@andrew.wade.networklinux.net>
References: <46C2809C.3000806@acm.org> <46C2C1A0.4060002@trueblade.com>
	<46CC68EB.2030609@trueblade.com> <46CCBBF7.3060201@ronadam.com>
	<46CCD063.2060007@trueblade.com> <46CCDE13.60502@ronadam.com>
	<46CCE3DF.1090004@trueblade.com>
	<20070822231556.c1f9f647.ajwade+py3k@andrew.wade.networklinux.net>
Message-ID: <20070823005619.bdc3c1e1.ajwade+py3k@andrew.wade.networklinux.net>

On Wed, 22 Aug 2007 23:15:56 -0400
Andrew James Wade <andrew.j.wade at gmail.com> wrote:

> And I'd do
> the recursive substitution before even starting to parse the field:
> it's simple and powerful.

Scratch that suggestion; the implications need to be thought through.
If we allow recursive substitution only in the format specifier that
decision can always be re-visited at a later date.

-- Andrew

From jtr at miskatonic.nu  Thu Aug 23 03:28:50 2007
From: jtr at miskatonic.nu (John Reese)
Date: Wed, 22 Aug 2007 18:28:50 -0700
Subject: [Python-3000] proposed fix for test_xmlrpc.py in py3k
In-Reply-To: <ca471dc20708221642r537eaf22v208638c655c90e9c@mail.gmail.com>
References: <bc1fe4bd0708221543h5fa41d87jf94145275e2c2bdd@mail.gmail.com>
	<ca471dc20708221642r537eaf22v208638c655c90e9c@mail.gmail.com>
Message-ID: <bc1fe4bd0708221828g533d04f2n8f92a7218ac04e8b@mail.gmail.com>

Thanks, sounds good.  I'm curious why you left out the change to
Doc/library/xmlrpclib.rst -- the documentation of the type of the
parameter was out-of-date, if it was ever right.

On 8/22/07, Guido van Rossum <guido at python.org> wrote:
> Thanks! I've checked the bulk of this in, excepting the fix for #3,
> which I fixed at the source in longobject.c. Also, I changed the call
> to io.StringIO() to first convert the bytes to characters, using the
> same encoding as used for the HTTP request header line (Latin-1).
>
> --Guido
>
> On 8/22/07, John Reese <john.reese at gmail.com> wrote:
> > Good afternoon.  I'm in the Google Python Sprint working on getting
> > the test_xmlrpc unittest to pass.  The following patch was prepared by
> > Jacques Frechet and me.  We'd appreciate feedback on the attached
> > patch.
> >
> > What was broken:
> >
> >
> > 1. BaseHTTPServer attempts to parse the http headers with an
> > rfc822.Message class.  This was changed in r56905 by Jeremy Hylton to
> > use the new io library instead of stringio as before.  Unfortunately
> > Jeremy's change resulted in TextIOWrapper stealing part of the HTTP
> > request body, due to its buffering quantum.  This was not seen in
> > normal tests because GET requests have no body, but xmlrpc uses POSTs.
> >  We fixed this by doing the equivalent of what was done before, but
> > using io.StringIO instead of the old cStringIO class: we pull out just
> > the header using a sequence of readlines.
> >
> >
> > 2. Once this was fixed, a second error asserted:
> > test_xmlrpc.test_with{_no,}_info call .get on the headers object from
> > xmlrpclib.ProtocolError.  This fails because the headers object became
> > a list in r57194.  The story behind this is somewhat complicated:
> >   - xmlrpclib used to use httplib.HTTP, which is old and deprecated
> >   - r57024 Jeremy Hylton switched py3k to use more modern httplib
> > infrastructure, but broke xmlrpclib.Transport.request; the "headers"
> > variable was now referenced without being set
> >   - r57194 Hyeshik Chang fixed xmlrpclib.Transport.request to get the
> > headers in a way that didn't explode; unfortunately, it now returned a
> > list instead of a dict, but there were no tests to catch this
> >   - r57221 Guido integrated xmlrpc changes from the trunk, including
> > r57158, which added tests that relied on headers being a dict.
> > Unfortunately, it no longer was.
> >
> >
> > 3. test_xmlrpc.test_fail_with_info was failing because the ValueError
> > string of int('nonintegralstring') in py3k currently has an "s".  This
> > is presumably going away soon; the test now uses a regular expression
> > with an optional leading "s", which is a little silly, but r56209 is
> > prior art.
> >
> > >>> int('z')
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in <module>
> > ValueError: invalid literal for int() with base 10: s'z'
> >
>
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>
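
A rough sketch of the approach described in item 1 above: read only the header lines from the byte stream, decode them as Latin-1, and hand that text to io.StringIO, leaving the body unread. The function name read_header_block and the sample request are assumptions for illustration; this is not the code that was checked in.

```python
import io


def read_header_block(rfile):
    """Read bytes up to and including the blank line that ends the headers."""
    lines = []
    while True:
        line = rfile.readline()
        lines.append(line)
        # A bare line ending marks the end of the headers; b"" means EOF.
        if line in (b"\r\n", b"\n", b""):
            break
    return b"".join(lines)


# Stand-in for the socket file of a POST request with a 5-byte body.
request = io.BytesIO(
    b"Content-Type: text/xml\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
)

# Only the header text is wrapped; nothing buffers past the blank line.
header_file = io.StringIO(read_header_block(request).decode("latin-1"))
body = request.read()
```

Because readline() stops at each line ending, the body bytes are still available on the original stream afterwards.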

From keir at google.com  Thu Aug 23 03:34:20 2007
From: keir at google.com (Keir Mierle)
Date: Wed, 22 Aug 2007 18:34:20 -0700
Subject: [Python-3000] [PATCH] Fix broken round and truncate behaviour
	(PEP3141)
Message-ID: <c9a9f8a90708221834v787ff5datd89b3949b7cd57b9@mail.gmail.com>

This patch fixes the previously added truncate, and also fixes round
behavior. The two argument version of round is not currently handling
the round toward even case.

Keir
-------------- next part --------------
A non-text attachment was scrubbed...
Name: round_truncate_fix.diff
Type: text/x-patch
Size: 6257 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070822/757bdb72/attachment.bin 
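
The attachment is not preserved, so as a reference point here is a sketch of the semantics under discussion as they behave in released Python 3: round() sends ties to the nearest even result, and math.trunc() drops the fractional part toward zero. The two-argument inputs were chosen because they are exactly representable in binary floating point, so they are true ties.

```python
import math

# One-argument round(): ties go to the even integer.
one_arg = [round(x) for x in (0.5, 1.5, 2.5, 3.5)]

# Two-argument round(): exact ties also go to the even last digit.
two_arg = [round(x, 2) for x in (0.125, 0.375)]

# Truncation always moves toward zero, for either sign.
truncated = [math.trunc(x) for x in (2.7, -2.7)]
```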

From rrr at ronadam.com  Thu Aug 23 07:31:04 2007
From: rrr at ronadam.com (Ron Adam)
Date: Thu, 23 Aug 2007 00:31:04 -0500
Subject: [Python-3000] PEP 3101 Updated
In-Reply-To: <46CCE3DF.1090004@trueblade.com>
References: <46C2809C.3000806@acm.org> <46C2C1A0.4060002@trueblade.com>
	<46CC68EB.2030609@trueblade.com> <46CCBBF7.3060201@ronadam.com>
	<46CCD063.2060007@trueblade.com> <46CCDE13.60502@ronadam.com>
	<46CCE3DF.1090004@trueblade.com>
Message-ID: <46CD1B98.1090008@ronadam.com>



Eric Smith wrote:
> Ron Adam wrote:
>>
>>
>> Eric Smith wrote:
>>> Ron Adam wrote:
>>>>> I've been re-reading the PEP, in an effort to make sure everything 
>>>>> is working.  I realized that these tests should not pass.  The PEP 
>>>>> says that "Format specifiers can themselves contain replacement 
>>>>> fields".  The tests above have replacement fields in the field 
>>>>> name, which is not allowed.  I'm going to remove this functionality.
>>>>>
>>>>> I believe the intent is to support a replacement for:
>>>>> "%.*s" % (4, 'how now brown cow')
>>>>>
>>>>> Which would be:
>>>>> "{0:.{1}}".format('how now brown cow', 4)
>>>>>
>>>>> For this, there's no need for replacement on field name.  I've 
>>>>> taken it out of the code, and made these tests into errors.
>>>>
>>>> I think it should work myself, but it could be added back in later 
>>>> if there is a need to.
>>>>
>>>>
>>>> I'm still concerned about the choice of {{ and }} as escaped brackets.
>>>>
>>>> What does the following do?
>>>>
>>>>
>>>> "{0:{{^{1}}".format('Python', '12')
>>>
>>>  >>> "{0:{{^{1}}".format('Python', '12')
>>> Traceback (most recent call last):
>>>   File "<stdin>", line 1, in <module>
>>> ValueError: unterminated replacement field
>>
>> When are the "{{" and "}}" escape characters replaced with  '{' and '}'?
> 
> While parsing for the starting '{'.  I'm not saying this is the best or 
> only or even PEP-specified way of doing it, but that's how the sample 
> implementation does it (and the way the sandbox version has done it for 
> many months).

Any problems can be fixed of course once the desired behavior is decided 
on.  These are just some loose ends that still need to be spelled out in 
the PEP.


>>> But,
>>>  >>> "{{{0:^{1}}".format('Python', '12')
>>> '{   Python   '
>>
>> So escaping '{' with '{{' and '}' with '}}' doesn't work inside of 
>> format expressions?
> 
> As I have it implemented, yes.
> 
>> That would mean there is no way to pass a brace to a __format__ method.
> 
> No way using string.format, correct.  You could pass it in using the 
> builtin format(), or by calling __format__ directly.  But you're 
> correct, for the most part if string.format doesn't accept it, it's not 
> practical.

See the suggestions below.


>>>> "{0}".format('{value:{{^{width}}', width='10', value='Python')
>>>
>>>  >>> "{0}".format('{value:{{^{width}}', width='10', value='Python')
>>> '{value:{{^{width}}'
>>
>> Depending on whether or not the evaluation is recursive this may or 
>> may not be correct.
>>
>> I think it's actually easier to do it recursively and not put limits 
>> on where format specifiers can be used or not.
> 
> But then you'd always have to worry that some replaced string looks like 
> something that could be interpreted as a field, even if that's not what 
> you want.
> 
> What if "{value}" came from user supplied input?  I don't think you'd 
> want (or expect) any string you output that contains braces to be expanded.

Ok, after thinking about it for a while...

Then maybe it's best not to use any recursion, not even at the top level.

The above expressions would then need to be spelled:


     "{{0:.{0}}}".format(4).format('how now brown cow')


     "{{value:^{width}}}".format(width='10').format(value='Python')


     "{{0:{{^{0}}}".format(12).format('Python')


Do those work?

This is not that different to how '%' formatting already works.



Either way I'd like to see unambiguous escapes used for braces.  For 
both ease of implementation and ease of reading.   Maybe we can re-use the 
'%' for escaping characters.  ['%%', '%{', '%}']

     "%{0:.{0}%}".format('4').format('how now brown cow')


     "%{value:^{width}%}".format(width='10').format(value='Python')


     "%{0:%%%{^{0}%}".format('12').format('Python')


The only drawback of this is the '%' specifier type needs to be expressed 
as either '%%' or maybe by another letter, 'p'?   Which is a minor issue I 
think.

Reasons for doing this...

     - It makes determining where fields start and stop easier than
       using '{{' and '}}' when other braces are in the string.
       (Both for humans and for code.)

     - It's a better alternative than '\{' and '\}' because it doesn't
       have any issues with back slashes and raw strings or cause
       excessive '\'s in strings.

     - It doesn't collide with regular expressions.

     - It looks familiar in the context that it's used in.

     - Everyone is already familiar with '%%'. So adding '%{' and '%}'
       would not seem out of place.


Although a recursive solution is neat and doesn't require typing '.format' 
as much, these two suggestions together are easy to understand and avoid 
all of the above issues.  (As near as I can tell.)

Cheers,
   _RON






From martin at v.loewis.de  Thu Aug 23 07:58:33 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 23 Aug 2007 07:58:33 +0200
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <20070822235929.GA12780@electricrain.com>
References: <D3DD7578-D515-4F46-A3F3-B96A43AB1FBB@gmail.com>
	<B24E6FF9-3080-4494-A377-BE68B9949A52@gmail.com>
	<ca471dc20708061906w5957811evfc02cf9f04d7cfde@mail.gmail.com>
	<46B7EA06.5040106@v.loewis.de>
	<ca471dc20708062056w362a951ma705452868e41472@mail.gmail.com>
	<46B7FACC.8030503@v.loewis.de>
	<20070822235929.GA12780@electricrain.com>
Message-ID: <46CD2209.8000408@v.loewis.de>

> IMHO all of that is desirable in many situations but it is not strict.
> bytes:bytes or int:bytes (depending on the database type) are
> fundamentally all the C berkeleydb library knows.  Attaching meaning
> to the keys and values is up to the user.  I'm about to try a _bsddb.c
> that strictly enforces bytes as values for the underlying bsddb.db API
> provided by _bsddb in my sandbox under the assumption that being
> strict about bytes is desired at that level there.  I predict lots of
> Lib/bsddb/test/ edits.

I fixed it all a few weeks ago, in revisions r56754, r56840, r56890,
r56892, r56914. I predict you'll find that most of the edits are
already committed.

Regards,
Martin

From greg at electricrain.com  Thu Aug 23 08:12:32 2007
From: greg at electricrain.com (Gregory P. Smith)
Date: Wed, 22 Aug 2007 23:12:32 -0700
Subject: [Python-3000] easy int to bytes conversion similar to chr?
Message-ID: <20070823061232.GA4405@electricrain.com>

Is there anything similar to chr(65) for creating a single byte string
that doesn't involve creating an intermediate string or tuple object?

 bytes(chr(65))
 bytes((65,))

both seem slightly weird.

Greg


From greg at electricrain.com  Thu Aug 23 08:54:08 2007
From: greg at electricrain.com (Gregory P. Smith)
Date: Wed, 22 Aug 2007 23:54:08 -0700
Subject: [Python-3000] easy int to bytes conversion similar to chr?
In-Reply-To: <20070823061232.GA4405@electricrain.com>
References: <20070823061232.GA4405@electricrain.com>
Message-ID: <20070823065408.GB4405@electricrain.com>

On Wed, Aug 22, 2007 at 11:12:32PM -0700, Gregory P. Smith wrote:
> Is there anything similar to chr(65) for creating a single byte string
> that doesn't involve creating an intermediate string or tuple object?
> 
>  bytes(chr(65))
>  bytes((65,))
> 
> both seem slightly weird.
> 
> Greg

yes i know.. bad example.  b'\x41' works for that.  pretend i used an
integer variable not an up front constant.

 bytes(chr(my_int))    # not strictly correct unless 0<=my_int<=255
 bytes((my_int,))
 struct.pack('B', my_int)

This came up as being useful in unittests for the bsddb bytes:bytes
changes i'm making but at the moment I'm not coming up with practical
examples where it's important.  maybe this is a nonissue.
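
[Editor's sketch] For comparison, this is how the tuple and struct
spellings above behave in Python 3 as eventually released.  It is a
small illustration only; my_int is the placeholder variable from the
message, and the other names are made up for the example.

```python
import struct

my_int = 65

# Each of these builds a one-byte bytes object from an integer in range(256).
from_tuple = bytes((my_int,))
from_struct = struct.pack('B', my_int)
print(from_tuple, from_struct)   # b'A' b'A'

# Out-of-range values are rejected rather than silently truncated.
try:
    bytes((256,))
    rejected = False
except ValueError:
    rejected = True
print(rejected)                  # True
```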

-gps

From martin at v.loewis.de  Thu Aug 23 09:27:37 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 23 Aug 2007 09:27:37 +0200
Subject: [Python-3000] easy int to bytes conversion similar to chr?
In-Reply-To: <20070823061232.GA4405@electricrain.com>
References: <20070823061232.GA4405@electricrain.com>
Message-ID: <46CD36E9.1090609@v.loewis.de>

Gregory P. Smith schrieb:
> Is there anything similar to chr(65) for creating a single byte string
> that doesn't involve creating an intermediate string or tuple object?
> 
>  bytes(chr(65))
>  bytes((65,))
> 
> both seem slightly weird.

b = bytes(1)
b[0] = 65

doesn't create an intermediate string or tuple object.
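
[Editor's sketch] In Python 3 as eventually released, bytes became
immutable and the mutable buffer type is bytearray, so the same idea
would be spelled roughly as follows.  This is an illustration under that
assumption, not the code that ran on the py3k branch at the time.

```python
# bytearray(1) allocates a single zero byte, which is then overwritten in place.
b = bytearray(1)
b[0] = 65
print(b)          # bytearray(b'A')
print(bytes(b))   # b'A'
```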

Regards,
Martin

From greg at electricrain.com  Thu Aug 23 09:38:38 2007
From: greg at electricrain.com (Gregory P. Smith)
Date: Thu, 23 Aug 2007 00:38:38 -0700
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46CD2209.8000408@v.loewis.de>
References: <D3DD7578-D515-4F46-A3F3-B96A43AB1FBB@gmail.com>
	<B24E6FF9-3080-4494-A377-BE68B9949A52@gmail.com>
	<ca471dc20708061906w5957811evfc02cf9f04d7cfde@mail.gmail.com>
	<46B7EA06.5040106@v.loewis.de>
	<ca471dc20708062056w362a951ma705452868e41472@mail.gmail.com>
	<46B7FACC.8030503@v.loewis.de>
	<20070822235929.GA12780@electricrain.com>
	<46CD2209.8000408@v.loewis.de>
Message-ID: <20070823073837.GA14725@electricrain.com>

On Thu, Aug 23, 2007 at 07:58:33AM +0200, "Martin v. Löwis" wrote:
> > IMHO all of that is desirable in many situations but it is not strict.
> > bytes:bytes or int:bytes (depending on the database type) are
> > fundamentally all the C berkeleydb library knows.  Attaching meaning
> > to the keys and values is up to the user.  I'm about to try a _bsddb.c
> > that strictly enforces bytes as values for the underlying bsddb.db API
> > provided by _bsddb in my sandbox under the assumption that being
> > strict about bytes is desired at that level there.  I predict lots of
> > Lib/bsddb/test/ edits.
> 
> I fixed it all a few weeks ago, in revisions r56754, r56840, r56890,
> r56892, r56914. I predict you'll find that most of the edits are
> already committed.
> 
> Regards,
> Martin

Yeah you did the keys (good!).  I just checked in a change to require
values to also be bytes.  Maybe that goes so far as to be inconvenient?
It's accurate.  All retrieved data comes back as bytes.

Greg

From martin at v.loewis.de  Thu Aug 23 09:49:19 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 23 Aug 2007 09:49:19 +0200
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <20070823073837.GA14725@electricrain.com>
References: <D3DD7578-D515-4F46-A3F3-B96A43AB1FBB@gmail.com>
	<B24E6FF9-3080-4494-A377-BE68B9949A52@gmail.com>
	<ca471dc20708061906w5957811evfc02cf9f04d7cfde@mail.gmail.com>
	<46B7EA06.5040106@v.loewis.de>
	<ca471dc20708062056w362a951ma705452868e41472@mail.gmail.com>
	<46B7FACC.8030503@v.loewis.de>
	<20070822235929.GA12780@electricrain.com>
	<46CD2209.8000408@v.loewis.de>
	<20070823073837.GA14725@electricrain.com>
Message-ID: <46CD3BFF.5080904@v.loewis.de>

> Yeah you did the keys (good!).  I just checked in a change to require
> values to also be bytes.  Maybe that goes so far as to be inconvenient?

Ah, ok. I think it is fine. We still need to discuss what the best
way is to do string:string databases, or string:bytes databases.

I added StringKeys and StringValues to allow for such cases, and I
also changed shelve to use string keys (not bytes keys), as this
is really a dictionary-like application; this all needs to be
discussed.

Regards,
Martin

From barry at python.org  Thu Aug 23 13:16:52 2007
From: barry at python.org (Barry Warsaw)
Date: Thu, 23 Aug 2007 07:16:52 -0400
Subject: [Python-3000] Move to a "py3k" branch *DONE*
In-Reply-To: <ca471dc20708221906r4ec1bee4o83aa6da564c1d55f@mail.gmail.com>
References: <ca471dc20708090743k30562834uee2abd624f2f41b5@mail.gmail.com>
	<ee2a432c0708092237i3b1cf045ied38af29c75d394c@mail.gmail.com>
	<20070823004155.GB12780@electricrain.com>
	<ca471dc20708221906r4ec1bee4o83aa6da564c1d55f@mail.gmail.com>
Message-ID: <352F453B-81BD-4AE1-AA1B-08B325601172@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 22, 2007, at 10:06 PM, Guido van Rossum wrote:

> For me too. Great work whoever fixed these (and other tests, like  
> xmlrpc).
>
> All I've got left failing is the three email tests, and I know Barry
> Warsaw is working on those. (Although he used some choice swearwords
> to describe the current state. :-)

I plan to spend some Real Time on those today and I think Bill is  
going to meet up with me on #python-dev when the Californians wake up.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRs1spHEjvBPtnXfVAQLepwQArE3MEL1ygNiEvfHa1uBShfUyRwjT/JyI
WzPv8pVWUumwdSqgzj0CW1iyAqV1dUtm9MoRgImyJQu7rowtPnDyOutdJJSyo9xN
y/oUSj6pRPftu785u6ZcbOWA34ROjmbv8R4wFvfFHs2fBnX18OosfSLoR9rWqSlM
ae2kv0maDFw=
=WzNl
-----END PGP SIGNATURE-----

From aahz at pythoncraft.com  Thu Aug 23 16:28:59 2007
From: aahz at pythoncraft.com (Aahz)
Date: Thu, 23 Aug 2007 07:28:59 -0700
Subject: [Python-3000] [PATCH] Fix math.ceil() behaviour for PEP3141
In-Reply-To: <ef5675f30708221658k2746c1al8e585d92efc97393@mail.gmail.com>
References: <ef5675f30708221658k2746c1al8e585d92efc97393@mail.gmail.com>
Message-ID: <20070823142859.GA16448@panix.com>

On Wed, Aug 22, 2007, Keir Mierle wrote:
>
> The attached patch fixes math.ceil to delegate to x.__ceil__() if it
> is defined, according to PEP 3141, and adds tests to cover the new
> cases. Patch is against r57303.
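
[Editor's sketch] The delegation described in the quoted patch is how
math.ceil behaves in released Python 3: an object that defines
__ceil__ gets to supply the result.  The Measurement class below is
hypothetical and exists only to show the hook being called.

```python
import math

class Measurement:
    """Hypothetical numeric-like class used only for illustration."""

    def __init__(self, value):
        self.value = value

    def __ceil__(self):
        # math.ceil(obj) calls this method when the type defines it.
        return math.ceil(self.value)

result = math.ceil(Measurement(2.1))
print(result)   # 3
```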

Please wait until the new python.org bug tracker is up and post the
patch there.
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"If you don't know what your program is supposed to do, you'd better not
start writing it."  --Dijkstra

From jeremy at alum.mit.edu  Thu Aug 23 16:37:35 2007
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Thu, 23 Aug 2007 10:37:35 -0400
Subject: [Python-3000] proposed fix for test_xmlrpc.py in py3k
In-Reply-To: <bc1fe4bd0708221543h5fa41d87jf94145275e2c2bdd@mail.gmail.com>
References: <bc1fe4bd0708221543h5fa41d87jf94145275e2c2bdd@mail.gmail.com>
Message-ID: <e8bf7a530708230737j1ecb224exaa00064f76bffb42@mail.gmail.com>

On 8/22/07, John Reese <john.reese at gmail.com> wrote:
> Good afternoon.  I'm in the Google Python Sprint working on getting
> the test_xmlrpc unittest to pass.  The following patch was prepared by
> Jacques Frechet and me.  We'd appreciate feedback on the attached
> patch.
>
> What was broken:
>
>
> 1. BaseHTTPServer attempts to parse the http headers with an
> rfc822.Message class.  This was changed in r56905 by Jeremy Hylton to
> use the new io library instead of stringio as before.  Unfortunately
> Jeremy's change resulted in TextIOWrapper stealing part of the HTTP
> request body, due to its buffering quantum.  This was not seen in
> normal tests because GET requests have no body, but xmlrpc uses POSTs.
>  We fixed this by doing the equivalent of what was done before, but
> using io.StringIO instead of the old cStringIO class: we pull out just
> the header using a sequence of readlines.
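
[Editor's sketch] The approach described in item 1 (read only the header
lines from the unbuffered byte stream, then parse them, so that no text
wrapper can read ahead into the request body) can be sketched roughly as
below.  This is not the patch itself; read_headers is a hypothetical
helper, and it uses the modern email.parser API rather than the
rfc822.Message class mentioned above.

```python
import io
from email.parser import Parser

def read_headers(rfile):
    """Read header lines up to the blank line; leave the body unread."""
    lines = []
    while True:
        line = rfile.readline()
        lines.append(line)
        # A bare line ending (or EOF) marks the end of the header block.
        if line in (b"\r\n", b"\n", b""):
            break
    text = b"".join(lines).decode("iso-8859-1")
    return Parser().parsestr(text)

# Usage: a POST-like request whose body must survive header parsing.
raw = io.BytesIO(b"Content-Type: text/xml\r\n"
                 b"Content-Length: 5\r\n"
                 b"\r\n"
                 b"hello")
headers = read_headers(raw)
body = raw.read(int(headers["Content-Length"]))
print(headers["Content-Type"], body)   # text/xml b'hello'
```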

Thanks for the fix.  Are there any tests you can add to
test_xmlrpc_net that would have caught this error?  There was some
non-unittest test code in xmlrpc but it seemed to use servers or
requests that don't work anymore.  I couldn't find any xmlrpc servers
that I could use to test more than the getCurrentTime() test that we
currently have.

Jeremy

>
>
> 2. Once this was fixed, a second error asserted:
> test_xmlrpc.test_with{_no,}_info call .get on the headers object from
> xmlrpclib.ProtocolError.  This fails because the headers object became
> a list in r57194.  The story behind this is somewhat complicated:
>   - xmlrpclib used to use httplib.HTTP, which is old and deprecated
>   - r57024 Jeremy Hylton switched py3k to use more modern httplib
> infrastructure, but broke xmlrpclib.Transport.request; the "headers"
> variable was now referenced without being set
>   - r57194 Hyeshik Chang fixed xmlrpclib.Transport.request to get the
> headers in a way that didn't explode; unfortunately, it now returned a
> list instead of a dict, but there were no tests to catch this
>   - r57221 Guido integrated xmlrpc changes from the trunk, including
> r57158, which added tests that relied on headers being a dict.
> Unfortunately, it no longer was.
>
>
> 3. test_xmlrpc.test_fail_with_info was failing because the ValueError
> string of int('nonintegralstring') in py3k currently has an "s".  This
> is presumably going away soon; the test now uses a regular expression
> with an optional leading "s", which is a little silly, but r56209 is
> prior art.
>
> >>> int('z')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: invalid literal for int() with base 10: s'z'
>

From guido at python.org  Thu Aug 23 16:38:05 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 23 Aug 2007 07:38:05 -0700
Subject: [Python-3000] [PATCH] Fix math.ceil() behaviour for PEP3141
In-Reply-To: <20070823142859.GA16448@panix.com>
References: <ef5675f30708221658k2746c1al8e585d92efc97393@mail.gmail.com>
	<20070823142859.GA16448@panix.com>
Message-ID: <ca471dc20708230738y44129798r8dfbe25d5ef99713@mail.gmail.com>

Aahz,

While the sprint is going on (and because I knew about the tracker
move) I've encouraged people at the sprint to post their patches
directly to python-3000 -- most likely the review feedback will be
given in person and then the patch will be checked in. This is quicker
than using a tracker for this particular purpose. Patches that are
left hanging past the sprint will have to be uploaded to the (new)
tracker.

--Guido

On 8/23/07, Aahz <aahz at pythoncraft.com> wrote:
> On Wed, Aug 22, 2007, Keir Mierle wrote:
> >
> > The attached patch fixes math.ceil to delegate to x.__ceil__() if it
> > is defined, according to PEP 3141, and adds tests to cover the new
> > cases. Patch is against r57303.
>
> Please wait until the new python.org bug tracker is up and post the
> patch there.
> --
> Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/
>
> "If you don't know what your program is supposed to do, you'd better not
> start writing it."  --Dijkstra


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From p.f.moore at gmail.com  Thu Aug 23 17:36:35 2007
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 23 Aug 2007 16:36:35 +0100
Subject: [Python-3000] Is __cmp__ deprecated?
Message-ID: <79990c6b0708230836i78fdd6e9w23f75eb7be639371@mail.gmail.com>

Can I just check - is __cmp__ due for removal in Py3K? There's no
mention of it in PEP 3100, but its status seems unclear from
references I've found.

Actually, is __coerce__ still around, as well? Again, I can't see a
clear answer in the PEPs or list discussions.

Paul.

From guido at python.org  Thu Aug 23 17:43:52 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 23 Aug 2007 08:43:52 -0700
Subject: [Python-3000] Is __cmp__ deprecated?
In-Reply-To: <79990c6b0708230836i78fdd6e9w23f75eb7be639371@mail.gmail.com>
References: <79990c6b0708230836i78fdd6e9w23f75eb7be639371@mail.gmail.com>
Message-ID: <ca471dc20708230843w70b31409gca41c091367566d7@mail.gmail.com>

Coerce is definitely dead.

cmp() is still alive and __cmp__ is used to overload it; on the one
hand I'd like to get rid of it but OTOH it's occasionally useful. So
it'll probably stay. However, to overload <, == etc. you *have* to
overload __lt__ and friends.
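
[Editor's sketch] A minimal example of overloading the comparison
operators through the rich comparison methods.  The Version class is
hypothetical, and functools.total_ordering is a later standard library
addition (not available when this message was written) that derives the
remaining operators from __eq__ and __lt__ to save boilerplate.

```python
import functools

@functools.total_ordering
class Version:
    """Hypothetical example class; not from the thread."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def _key(self):
        return (self.major, self.minor)

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() < other._key()

print(Version(1, 2) < Version(1, 3))    # True
print(Version(2, 0) >= Version(1, 9))   # True, derived by total_ordering
```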

On 8/23/07, Paul Moore <p.f.moore at gmail.com> wrote:
> Can I just check - is __cmp__ due for removal in Py3K? There's no
> mention of it in PEP 3100, but its status seems unclear from
> references I've found.
>
> Actually, is __coerce__ still around, as well? Again, I can't see a
> clear answer in the PEPs or list discussions.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From p.f.moore at gmail.com  Thu Aug 23 17:55:08 2007
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 23 Aug 2007 16:55:08 +0100
Subject: [Python-3000] Is __cmp__ deprecated?
In-Reply-To: <ca471dc20708230843w70b31409gca41c091367566d7@mail.gmail.com>
References: <79990c6b0708230836i78fdd6e9w23f75eb7be639371@mail.gmail.com>
	<ca471dc20708230843w70b31409gca41c091367566d7@mail.gmail.com>
Message-ID: <79990c6b0708230855k720ecff7n45a2b570a823daa3@mail.gmail.com>

On 23/08/07, Guido van Rossum <guido at python.org> wrote:
> Coerce is definitely dead.
>
> cmp() is still alive and __cmp__ is used to overload it; on the one
> hand I'd like to get rid of it but OTOH it's occasionally useful. So
> it'll probably stay. However, to overload <, == etc. you *have* to
> overload __lt__ and friends.

Thanks. In particular, thanks for the comment about overloading < etc
- that's what I was looking at, and I was wondering about using
__cmp__ to save some boilerplate. You saved me some headaches!

Paul.

From p.f.moore at gmail.com  Thu Aug 23 18:20:27 2007
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 23 Aug 2007 17:20:27 +0100
Subject: [Python-3000] Updated and simplified PEP 3141: A Type Hierarchy
	for Numbers
In-Reply-To: <ca471dc20708221257u37420efam25c895fe138b72f7@mail.gmail.com>
References: <5d44f72f0705161731j4700bdb3h4e36e97757bd6a32@mail.gmail.com>
	<5d44f72f0708021153u7ea1f443jfdee3c167b011011@mail.gmail.com>
	<5d44f72f0708221236k7c3ea054k43eb237f4a3ef577@mail.gmail.com>
	<ca471dc20708221257u37420efam25c895fe138b72f7@mail.gmail.com>
Message-ID: <79990c6b0708230920x323dd369h9369e33f29989517@mail.gmail.com>

On 22/08/07, Guido van Rossum <guido at python.org> wrote:
> > * Add Demo/classes/Rat.py to the stdlib?
>
> Yes, but it needs a makeover. At the very least I'd propose the module
> name to be rational.

If no-one else gets to this, I might take a look. But I'm not likely
to make fast progress as I don't have a lot of free time... (And I
don't have a Windows compiler, so I'll need to set up a Linux VM and
find out how to build Py3K on that!)

> The code is really old.

Too right - it's riddled with "isinstance" calls which probably aren't
very flexible, and it seems to try to handle mixed-mode operations
with float and complex, which I can't see the use for...

Given that the basic algorithms are pretty trivial, the major part of
any makeover would be rewriting the special method implementations to
conform to the Rational ABC. A makeover is probably more or less a
rewrite. I wrote a rational number class in C++ for Boost once, it
wouldn't be too hard to port.

Paul.

From greg at electricrain.com  Thu Aug 23 19:18:38 2007
From: greg at electricrain.com (Gregory P. Smith)
Date: Thu, 23 Aug 2007 10:18:38 -0700
Subject: [Python-3000] Immutable bytes type and bsddb or other IO
In-Reply-To: <46CD3BFF.5080904@v.loewis.de>
References: <D3DD7578-D515-4F46-A3F3-B96A43AB1FBB@gmail.com>
	<B24E6FF9-3080-4494-A377-BE68B9949A52@gmail.com>
	<ca471dc20708061906w5957811evfc02cf9f04d7cfde@mail.gmail.com>
	<46B7EA06.5040106@v.loewis.de>
	<ca471dc20708062056w362a951ma705452868e41472@mail.gmail.com>
	<46B7FACC.8030503@v.loewis.de>
	<20070822235929.GA12780@electricrain.com>
	<46CD2209.8000408@v.loewis.de>
	<20070823073837.GA14725@electricrain.com>
	<46CD3BFF.5080904@v.loewis.de>
Message-ID: <20070823171837.GI24059@electricrain.com>

On Thu, Aug 23, 2007 at 09:49:19AM +0200, "Martin v. Löwis" wrote:
> > Yeah you did the keys (good!).  I just checked in a change to require
> > values to also be bytes.  Maybe that goes so far as to be inconvenient?
> 
> Ah, ok. I think it is fine. We still need to discuss what the best
> way is to do string:string databases, or string:bytes databases.
> 
> I added StringKeys and StringValues to allow for such cases, and I
> also changed shelve to use string keys (not bytes keys), as this
> is really a dictionary-like application; this all needs to be
> discussed.
> 
> Regards,
> Martin

Alright, regarding bytes being mutable.  I realized this morning that
things just won't work with the database libraries that way.
PyBytes_AS_STRING() returns the bytesobject's char *ob_bytes pointer.
But database operations occur with the GIL released so that mutable
string is free to change out from underneath it.

I -detest- the idea of making another temporary copy of the data just
to allow the GIL to be released during IO.  data copies == bad.
Wasn't a past mailing list thread claiming the bytes type was supposed
to be great for IO?  How's that possible unless we add a lock to the
bytesobject?  (It's not -likely- that bytes objects will be modified
while in use for IO in most circumstances but just the possibility
that it could be is a problem)

I don't have much sprint time available today but I'll stop by to talk
about this one a bit.

-greg

From janssen at parc.com  Thu Aug 23 20:41:09 2007
From: janssen at parc.com (Bill Janssen)
Date: Thu, 23 Aug 2007 11:41:09 PDT
Subject: [Python-3000] Move to a "py3k" branch *DONE*
In-Reply-To: <352F453B-81BD-4AE1-AA1B-08B325601172@python.org> 
References: <ca471dc20708090743k30562834uee2abd624f2f41b5@mail.gmail.com>
	<ee2a432c0708092237i3b1cf045ied38af29c75d394c@mail.gmail.com>
	<20070823004155.GB12780@electricrain.com>
	<ca471dc20708221906r4ec1bee4o83aa6da564c1d55f@mail.gmail.com>
	<352F453B-81BD-4AE1-AA1B-08B325601172@python.org>
Message-ID: <07Aug23.114119pdt."57996"@synergy1.parc.xerox.com>

> I plan to spend some Real Time on those today and I think Bill is  
> going to meet up with me on #python-dev when the Californians wake up.

That's not working real well right now... IRC seems wedged for me.

Probably a firewall issue.

I've got to try a different location, and we'll try to connect again
when I'm there.

Bill

From g.brandl at gmx.net  Thu Aug 23 20:44:11 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Thu, 23 Aug 2007 20:44:11 +0200
Subject: [Python-3000] Move to a "py3k" branch *DONE*
In-Reply-To: <07Aug23.114119pdt."57996"@synergy1.parc.xerox.com>
References: <ca471dc20708090743k30562834uee2abd624f2f41b5@mail.gmail.com>	<ee2a432c0708092237i3b1cf045ied38af29c75d394c@mail.gmail.com>	<20070823004155.GB12780@electricrain.com>	<ca471dc20708221906r4ec1bee4o83aa6da564c1d55f@mail.gmail.com>	<352F453B-81BD-4AE1-AA1B-08B325601172@python.org>
	<07Aug23.114119pdt."57996"@synergy1.parc.xerox.com>
Message-ID: <fakkhj$2rt$1@sea.gmane.org>

Bill Janssen schrieb:
>> I plan to spend some Real Time on those today and I think Bill is  
>> going to meet up with me on #python-dev when the Californians wake up.
> 
> That's not working real well right now... IRC seems wedged for me.
> 
> Probably a firewall issue.
> 
> I've got to try a different location, and we'll try to connect again
> when I'm there.

Note that if it's a simple blocked port issue, you can also connect to freenode
on port 8000.

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From collinw at gmail.com  Thu Aug 23 20:47:24 2007
From: collinw at gmail.com (Collin Winter)
Date: Thu, 23 Aug 2007 11:47:24 -0700
Subject: [Python-3000] Move to a "py3k" branch *DONE*
In-Reply-To: <20070823004155.GB12780@electricrain.com>
References: <ca471dc20708090743k30562834uee2abd624f2f41b5@mail.gmail.com>
	<ee2a432c0708092237i3b1cf045ied38af29c75d394c@mail.gmail.com>
	<20070823004155.GB12780@electricrain.com>
Message-ID: <43aa6ff70708231147h673bf14evbb87ef262094a7af@mail.gmail.com>

On 8/22/07, Gregory P. Smith <greg at electricrain.com> wrote:
> > > There are currently about 7 failing unit tests left:
> > >
> > > test_bsddb
> > > test_bsddb3
> ...
>
> fyi these two pass for me on the current py3k branch on ubuntu linux
> and mac os x 10.4.9.

test_bsddb works for me on Ubuntu, and test_bsddb3 was working for me
yesterday, but now fails with

python: /home/collinwinter/src/python/py3k/Modules/_bsddb.c:388:
make_dbt: Assertion `((((PyObject*)(obj))->ob_type) == (&PyBytes_Type)
|| PyType_IsSubtype((((PyObject*)(obj))->ob_type), (&PyBytes_Type)))'
failed.

The failure occurs after this line is emitted

test02_cursors (bsddb.test.test_dbshelve.EnvThreadHashShelveTestCase) ... ok

Collin Winter

From martin at v.loewis.de  Thu Aug 23 21:18:59 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 23 Aug 2007 21:18:59 +0200
Subject: [Python-3000] Immutable bytes type and bsddb or other IO
In-Reply-To: <20070823171837.GI24059@electricrain.com>
References: <D3DD7578-D515-4F46-A3F3-B96A43AB1FBB@gmail.com>
	<B24E6FF9-3080-4494-A377-BE68B9949A52@gmail.com>
	<ca471dc20708061906w5957811evfc02cf9f04d7cfde@mail.gmail.com>
	<46B7EA06.5040106@v.loewis.de>
	<ca471dc20708062056w362a951ma705452868e41472@mail.gmail.com>
	<46B7FACC.8030503@v.loewis.de>
	<20070822235929.GA12780@electricrain.com>
	<46CD2209.8000408@v.loewis.de>
	<20070823073837.GA14725@electricrain.com>
	<46CD3BFF.5080904@v.loewis.de>
	<20070823171837.GI24059@electricrain.com>
Message-ID: <46CDDDA3.1050906@v.loewis.de>

> I -detest- the idea of making another temporary copy of the data just
> to allow the GIL to be released during IO.  data copies == bad.
> Wasn't a past mailing list thread claiming the bytes type was supposed
> to be great for IO?  How's that possible unless we add a lock to the
> bytesobject?  (It's not -likely- that bytes objects will be modified
> while in use for IO in most circumstances but just the possibility
> that it could be is a problem)

I agree. There must be a way to lock a bytes object from modification,
preferably not by locking an attempt to modify it, but by raising an
exception when a locked bytes object is modified.

(I do realise that this gives something very close to immutable bytes
objects).
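
[Editor's sketch] One way to picture the idea above: a buffer that can
be flagged as locked and raises, rather than blocks, when someone tries
to modify it in that state.  LockableBuffer is entirely hypothetical and
is not a real or proposed Python type; it wraps a bytearray and guards
only item assignment, which is enough to show the behaviour.

```python
class LockableBuffer:
    """Hypothetical wrapper; illustrates raise-on-modify while locked."""

    def __init__(self, data=b""):
        self._data = bytearray(data)
        self._locked = False

    def lock(self):
        self._locked = True

    def unlock(self):
        self._locked = False

    def __setitem__(self, index, value):
        # Modification is refused, not delayed, while the buffer is in use.
        if self._locked:
            raise BufferError("buffer is locked for I/O")
        self._data[index] = value

    def __getitem__(self, index):
        return self._data[index]

    def __bytes__(self):
        return bytes(self._data)

buf = LockableBuffer(b"abc")
buf[0] = 65                 # allowed: not locked yet
buf.lock()                  # e.g. just before releasing the GIL for I/O
try:
    buf[1] = 66
    refused = False
except BufferError:
    refused = True
buf.unlock()
print(bytes(buf), refused)  # b'Abc' True
```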

Regards,
Martin

From greg at electricrain.com  Thu Aug 23 21:24:08 2007
From: greg at electricrain.com (Gregory P. Smith)
Date: Thu, 23 Aug 2007 12:24:08 -0700
Subject: [Python-3000] Immutable bytes type and bsddb or other IO
In-Reply-To: <46CDDDA3.1050906@v.loewis.de>
References: <ca471dc20708061906w5957811evfc02cf9f04d7cfde@mail.gmail.com>
	<46B7EA06.5040106@v.loewis.de>
	<ca471dc20708062056w362a951ma705452868e41472@mail.gmail.com>
	<46B7FACC.8030503@v.loewis.de>
	<20070822235929.GA12780@electricrain.com>
	<46CD2209.8000408@v.loewis.de>
	<20070823073837.GA14725@electricrain.com>
	<46CD3BFF.5080904@v.loewis.de>
	<20070823171837.GI24059@electricrain.com>
	<46CDDDA3.1050906@v.loewis.de>
Message-ID: <20070823192408.GJ24059@electricrain.com>

On Thu, Aug 23, 2007 at 09:18:59PM +0200, "Martin v. Löwis" wrote:
> > I -detest- the idea of making another temporary copy of the data just
> > to allow the GIL to be released during IO.  data copies == bad.
> > Wasn't a past mailing list thread claiming the bytes type was supposed
> > to be great for IO?  How's that possible unless we add a lock to the
> > bytesobject?  (It's not -likely- that bytes objects will be modified
> > while in use for IO in most circumstances but just the possibility
> > that it could be is a problem)
> 
> I agree. There must be a way to lock a bytes object from modification,
> preferably not by locking an attempt to modify it, but by raising an
> exception when a locked bytes object is modified.
> 
> (I do realise that this gives something very close to immutable bytes
> objects).
> 
> Regards,
> Martin

I like that idea.  It's simple and leaves any actual locking up to a
subclass or other wrapper.

-gps

From pfdubois at gmail.com  Thu Aug 23 23:06:40 2007
From: pfdubois at gmail.com (Paul Dubois)
Date: Thu, 23 Aug 2007 14:06:40 -0700
Subject: [Python-3000] document processing tools conversion report
Message-ID: <f74a6c2f0708231406q6038a28ev5429db86d2fc8631@mail.gmail.com>

FYI: docutils will require some modification of at least the io module, to
the extent that the ideal mode of fixing the current sources so that a
subsequent pass of 2to3 will do the job, is probably not possible (but may
be outside of this one file). I've made a report to the docutils tracker to
that effect; will see what dialog ensues. There are things to handle images
etc. so I think fixing this is above my paygrade.

pygments converts ok but uses cPickle; changing to pickle is easy. Since
pygments uses docutils.io, it isn't possible to run it further.

sphinx converts ok, and can be imported.

From g.brandl at gmx.net  Thu Aug 23 23:59:50 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Thu, 23 Aug 2007 23:59:50 +0200
Subject: [Python-3000] document processing tools conversion report
In-Reply-To: <f74a6c2f0708231406q6038a28ev5429db86d2fc8631@mail.gmail.com>
References: <f74a6c2f0708231406q6038a28ev5429db86d2fc8631@mail.gmail.com>
Message-ID: <fal00e$g8q$1@sea.gmane.org>

Paul Dubois schrieb:
> FYI: docutils will require some modification of at least the io module,
> to the extent that the ideal mode of fixing the current sources so that
> a subsequent pass of 2to3 will do the job, is probably not possible (but
> may be outside of this one file). I've made a report to the docutils
> tracker to that effect; will see what dialog ensues. There are things to
> handle images etc. so I think fixing this is above my paygrade.

Thanks for looking into this! If just the one file is problematic and the
rest can be handled by 2to3, we might be able to set up a way to do this
automatically once people want to build the docs with 3k.

I'll monitor the docutils tracker issue in any case.

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From guido at python.org  Fri Aug 24 00:07:55 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 23 Aug 2007 15:07:55 -0700
Subject: [Python-3000] [PATCH] Fix broken round and truncate behaviour
	(PEP3141)
In-Reply-To: <c9a9f8a90708221834v787ff5datd89b3949b7cd57b9@mail.gmail.com>
References: <c9a9f8a90708221834v787ff5datd89b3949b7cd57b9@mail.gmail.com>
Message-ID: <ca471dc20708231507h62d63b1ei4fd3fc342f081590@mail.gmail.com>

Alex and I did a bunch more work based on this patch and checked it in:

Committed revision 57359.


On 8/22/07, Keir Mierle <keir at google.com> wrote:
> This patch fixes the previously added truncate, and also fixes round
> behavior. The two argument version of round is not currently handling
> the round toward even case.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From mierle at gmail.com  Fri Aug 24 01:45:15 2007
From: mierle at gmail.com (Keir Mierle)
Date: Thu, 23 Aug 2007 16:45:15 -0700
Subject: [Python-3000] [PATCH] Fix rich set comparison
Message-ID: <ef5675f30708231645s6f5e5493i60c4979ed6e9d6d0@mail.gmail.com>

This patch fixes rich set comparison so that x < y works when x is a
set and y is something which implements the corresponding comparison.

Keir
-------------- next part --------------
A non-text attachment was scrubbed...
Name: richsetcmp.diff
Type: text/x-patch
Size: 2070 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070823/989ac442/attachment.bin 

From larry at hastings.org  Fri Aug 24 02:03:21 2007
From: larry at hastings.org (Larry Hastings)
Date: Thu, 23 Aug 2007 17:03:21 -0700
Subject: [Python-3000] [PATCH] Fix dumbdbm, which fixes test_shelve (for me);
 instrument other tests so we catch this sooner (and more directly)
Message-ID: <46CE2049.8020802@hastings.org>



Attached is a patch for review.

As of revision 57341 (only a couple hours old as of this writing), 
test_shelve was failing on my machine.  This was because I didn't have 
any swell databases available, so anydbm was falling back to dumbdbm, 
and dumbdbm had a bug.  In Py3k, dumbdbm's dict-like interface now 
requires byte objects, which it internally encodes to "latin-1" then 
uses with a real dict.  But dumbdbm.__contains__ was missing the 
conversion, so it was trying to use a bytes object with a real dict, and 
that failed with an error (as bytes objects are not hashable).  This 
patch fixes dumbdbm.__contains__ so it encodes the key, fixing 
test_shelve on my machine.
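
To make the failure mode concrete, here is a toy sketch, not dumbdbm itself. It assumes a wrapper that converts byte keys to str before touching a real dict, and it uses bytearray as a stand-in for the unhashable bytes type of the 2007 branch. ToyDbm and _convert are names invented for this sketch.

```python
class ToyDbm:
    """Toy stand-in for the dict-like wrapper described above (not dumbdbm)."""

    def __init__(self):
        self._index = {}  # real dict keyed by str

    @staticmethod
    def _convert(key):
        # bytes-like key -> hashable str, one character per byte
        return bytes(key).decode("latin-1")

    def __setitem__(self, key, value):
        self._index[self._convert(key)] = value

    def __getitem__(self, key):
        return self._index[self._convert(key)]

    def __contains__(self, key):
        # The bug was the equivalent of `key in self._index` here,
        # which raises TypeError for an unhashable key.
        return self._convert(key) in self._index


db = ToyDbm()
db[bytearray(b"spam")] = b"eggs"
found = bytearray(b"spam") in db
```

Every entry point that accepts a key has to apply the same conversion; missing it in one method is enough to break callers that only use that method.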

But there's more!  Neal Norwitz pointed out that test_shelve /didn't/ 
fail on his machine.  That's because dumbdbm is the last resort of 
anydbm, and he had a superior database module available.  So the 
regression test suite was really missing two things:

    * test_dumbdbm should test dumbdbm.__contains__.
    * test_anydbm should test all the database modules available, not
      merely its first choice.

So this patch also adds test_write_contains() to test_dumbdbm, and a new 
external function to test_anydbm: dbm_iterate(), which returns an 
iterator over all database modules available to anydbm, /and/ internally 
forces anydbm to use that database module, restoring anydbm to its first 
choice when it finishes iteration.  I also renamed _delete_files() to 
delete_files() so it could be the canonical dbm cleanup function for 
other tests.
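
The shape of such an iterator can be sketched as below. This is not the code from the patch: the chooser object and its "default" attribute are assumptions standing in for whatever anydbm uses internally to remember its preferred module, and the candidate names in the usage lines are arbitrary importable modules rather than real database backends.

```python
import importlib
from types import SimpleNamespace


def dbm_iterate(chooser, candidate_names):
    """Yield each importable module, forcing chooser.default to it meanwhile.

    `chooser` is a hypothetical object whose `default` attribute names the
    module a front end would pick; it is restored when iteration ends.
    """
    original = chooser.default
    try:
        for name in candidate_names:
            try:
                module = importlib.import_module(name)
            except ImportError:
                continue  # backend not available on this machine
            chooser.default = module
            yield module
    finally:
        chooser.default = original


# Usage with stand-in module names:
chooser = SimpleNamespace(default="first-choice")
visited = [m.__name__ for m in dbm_iterate(chooser, ["json", "no_such_backend", "math"])]
```

Putting the restore step in a finally clause means the first choice comes back even if a test fails partway through the loop and the generator is closed early.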

While I was at it, I noticed that test_whichdbm.py did a good job of 
testing all the databases available, but with a slightly odd approach: 
it iterated over all the possible databases, and created new test 
methods--inserting them into the class object--for each one that was 
available.  I changed it to use dbm_iterate() and delete_files() from 
test.test_anydbm, so that that logic can live in only one place.  I 
didn't preserve the setattr() approach; I simply iterate over all the 
modules and run the tests inside one conventional method.

One final thought, for the folks who defined this "in Py3k, 
database-backed dict-like objects use byte objects as keys" interface.  
dumbdbm.keys() returns self._index.keys(), which means that it's serving 
up real strings, not byte objects.  Shouldn't it return 
[k.encode("latin-1") for k in self._index.keys()] ?  (Or perhaps change 
iterkeys to return that as a generator expression, and change keys() to 
return list(self.iterkeys())?)
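
The second alternative could look roughly like this. ToyIndex is a hypothetical stand-in that only mimics the _index attribute mentioned above.

```python
class ToyIndex:
    """Hypothetical stand-in holding str keys internally, like _index above."""

    def __init__(self, names):
        self._index = dict.fromkeys(names)

    def iterkeys(self):
        # Generator expression: convert each stored str key back to bytes.
        return (k.encode("latin-1") for k in self._index)

    def keys(self):
        return list(self.iterkeys())


keys = ToyIndex(["a", "b"]).keys()
```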

Thanks,


/larry/
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lch.py3k.dumbdb.contains.diff.1.txt
Url: http://mail.python.org/pipermail/python-3000/attachments/20070823/d08bb467/attachment-0001.txt 

From mierle at gmail.com  Fri Aug 24 02:23:47 2007
From: mierle at gmail.com (Keir Mierle)
Date: Thu, 23 Aug 2007 17:23:47 -0700
Subject: [Python-3000] Strange method resolution problem with __trunc__,
	__round__ on floats
Message-ID: <ef5675f30708231723s616e5a75l6d45842809d0064f@mail.gmail.com>

The newly introduced trunc() and round() have the following odd behavior:

$ ./python
Python 3.0x (py3k, Aug 23 2007, 17:15:22)
[GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> trunc(3.14)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: type float doesn't define __trunc__ method
[36040 refs]
>>> 3.14.__trunc__
<built-in method __trunc__ of float object at 0x8255244>
[36230 refs]
>>> trunc(3.14)
3
[36230 refs]
>>>

It looks like builtin_trunc() is failing at the call to
_PyType_Lookup(), which must be returning NULL to get the above
behavior. I'm not sure what's causing this; perhaps someone more
experienced than me has an idea?

Keir

From larry at hastings.org  Fri Aug 24 02:50:31 2007
From: larry at hastings.org (Larry Hastings)
Date: Thu, 23 Aug 2007 17:50:31 -0700
Subject: [Python-3000] [PATCH] Fix dumbdbm, which fixes test_shelve (for
 me); instrument other tests so we catch this sooner (and more directly)
In-Reply-To: <46CE2049.8020802@hastings.org>
References: <46CE2049.8020802@hastings.org>
Message-ID: <46CE2B57.2040909@hastings.org>


Patch submitted to Roundup; it's issue #1007:
    http://bugs.python.org/issue1007
(It's listed under Python 2.6 as there's currently no appropriate choice 
in the "Versions" list.)


/larry/

From ero.carrera at gmail.com  Fri Aug 24 02:55:03 2007
From: ero.carrera at gmail.com (Ero Carrera)
Date: Thu, 23 Aug 2007 17:55:03 -0700
Subject: [Python-3000] String to unicode fixes in time and datetime
Message-ID: <883A2C41-5CCB-4C2A-97D6-E5ACE5DEA46F@gmail.com>


Hi,

I'm attaching a small patch, the result of attempting to tackle part of  
one of the tasks in the Google Sprint.
The patch removes most of the references to PyString_* calls in the  
"time" and "datetime" modules and adds Unicode support instead.

There's a problem in "datetime" with  
"_PyUnicode_AsDefaultEncodedString". As there's no current equivalent  
that would provide an object of type "bytes", there are two  
occurrences of PyString_* functions to handle the returned "default  
encoded string" and convert it into bytes.

cheers,
--
Ero


-------------- next part --------------
A non-text attachment was scrubbed...
Name: time_datetime_pystring_patch.diff
Type: application/octet-stream
Size: 9162 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070823/8b0f0b34/attachment.obj 

From guido at python.org  Fri Aug 24 04:02:23 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 23 Aug 2007 19:02:23 -0700
Subject: [Python-3000] Strange method resolution problem with __trunc__,
	__round__ on floats
In-Reply-To: <ef5675f30708231723s616e5a75l6d45842809d0064f@mail.gmail.com>
References: <ef5675f30708231723s616e5a75l6d45842809d0064f@mail.gmail.com>
Message-ID: <ca471dc20708231902w49aea02hbef53ac73dcbd002@mail.gmail.com>

I figured it out by stepping through builtin_trunc and into
_PyType_Lookup for a bit. The type is not completely initialized;
apparently fundamental types like float get initialized lazily
*really* late. Inserting this block of code before the _PyType_Lookup
call fixes things:

        if (Py_Type(number)->tp_dict == NULL) {
                if (PyType_Ready(Py_Type(number)) < 0)
                        return NULL;
        }

I'll check in a change ASAP.
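
For readers less familiar with the C side: the lookup in question goes through the type, not the instance. The same rule can be observed from pure Python. The sketch below uses math.trunc, which is where trunc() lives in current Python; the class names are made up for the example.

```python
import math


class Meters:
    """Defines __trunc__ on the type, so type-based lookup finds it."""

    def __init__(self, value):
        self.value = value

    def __trunc__(self):
        return int(self.value)


class NoTrunc:
    pass


whole = math.trunc(Meters(3.99))

obj = NoTrunc()
obj.__trunc__ = lambda: 7  # instance attribute only; the type has no slot
try:
    math.trunc(obj)
    instance_attr_used = True
except TypeError:
    instance_attr_used = False
```

Because only the type is consulted, a type whose dictionary has not been filled in yet looks as if it has no __trunc__ at all, which matches the symptom above.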

(Eric: this applies to the code I mailed you for format() earlier too!)

--Guido

On 8/23/07, Keir Mierle <mierle at gmail.com> wrote:
> The newly introduced trunc() and round() have the following odd behavior:
>
> $ ./python
> Python 3.0x (py3k, Aug 23 2007, 17:15:22)
> [GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> trunc(3.14)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: type float doesn't define __trunc__ method
> [36040 refs]
> >>> 3.14.__trunc__
> <built-in method __trunc__ of float object at 0x8255244>
> [36230 refs]
> >>> trunc(3.14)
> 3
> [36230 refs]
> >>>
>
> It looks like builtin_trunc() is failing at the call to
> _PyType_Lookup(), which must be returning NULL to get the above
> behavior. I'm not sure what's causing this; perhaps someone more
> experienced than me has an idea?
>
> Keir
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From adrian at holovaty.com  Fri Aug 24 04:50:16 2007
From: adrian at holovaty.com (Adrian Holovaty)
Date: Thu, 23 Aug 2007 21:50:16 -0500
Subject: [Python-3000] Should 2to3 point out *possible*,
	but not definite changes?
Message-ID: <a2eb1c260708231950y534b3efajafc0e9dc0e29294a@mail.gmail.com>

As part of the Python 3000 sprint (at Google's Chicago office), I've
been working on the documentation for 2to3. I'm publishing updates at
http://red-bean.com/~adrian/2to3.rst and will submit this as a
documentation patch when it's completed. (I didn't get as much done
today as I would have liked, but I'll be back at it Friday.)

In my research of the 2to3 utility, I've been thinking about whether
it should be expanded to include the equivalent of "warnings." I know
one of its design goals has been to be "dumb but correct," but I
propose that including optional warnings would be a bit
smarter/helpful, without risking the tool's correctness.

Specifically, I propose:

*  2to3 gains either an "--include-warnings" option or an
"--exclude-warnings" option, depending on which behavior is decided to
be default.

* If this option is set, the utility would search for an *additional*
set of fixes -- fixes that *might* need to be made to the code but
cannot be determined with certainty. An example of this is noted in
the "Limitations" section of the 2to3 README:

      a = apply
      a(f, *args)

(2to3 cannot handle this because it cannot detect reassignment.)

Under my proposal, the utility would notice that "apply" is a builtin
whose behavior is changing, and that this is a situation in which the
correct 2to3 porting is ambiguous. The utility would designate this in
the output with a Python comment on the previous line:

      # 2to3note: The semantics of apply() have changed.
      a = apply
      a(f, *args)

Each comment would have a common prefix such as "2to3note" for easy grepping.
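
A minimal sketch of the annotation step, written against the standard ast module rather than 2to3's own fixer machinery (so it is an illustration of the idea, not a proposed implementation). The function name and the decision to flag only non-call references are assumptions of this sketch.

```python
import ast

NOTE = "# 2to3note: The semantics of apply() have changed."


def annotate_apply_references(source):
    """Insert a 2to3note comment above lines that reference `apply` without calling it."""
    tree = ast.parse(source)
    # Name nodes that are the callee of a call are the unambiguous case; skip them.
    called = {id(node.func) for node in ast.walk(tree) if isinstance(node, ast.Call)}
    flagged = {
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and node.id == "apply" and id(node) not in called
    }
    out = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if lineno in flagged:
            indent = line[: len(line) - len(line.lstrip())]
            out.append(indent + NOTE)
        out.append(line)
    return "\n".join(out) + "\n"


annotated = annotate_apply_references("a = apply\na(f, *args)\n")
```

A real version would need to cope with backslash continuations and would cover more names than apply; this one only shows how a greppable comment can be attached to an ambiguous line.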

Given the enormity of the Python 3000 syntax change, I think that the
2to3 utility should provide as much guidance as possible. What it does
currently is extremely cool (I daresay miraculous), but I think we can
get closer to 100% coverage if we take into account the ambiguous
changes.

Oh, and I'm happy to (attempt to) write this addition to the tool, as
long as the powers that be deem it worthwhile.

Thoughts?

Adrian

--
Adrian Holovaty
holovaty.com | djangoproject.com

From yginsburg at gmail.com  Fri Aug 24 04:24:02 2007
From: yginsburg at gmail.com (Yuri Ginsburg)
Date: Thu, 23 Aug 2007 19:24:02 -0700
Subject: [Python-3000] make uuid.py creation threadsafe
Message-ID: <3343b3d90708231924rac129a5p2da2cd03a274dfed@mail.gmail.com>

The attached small patch makes the output buffer local, thus making uuid.py thread-safe.

-- 
Yuri Ginsburg (YG10)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: uuid.py.diff
Type: application/octet-stream
Size: 1570 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070823/3d6eabfd/attachment.obj 

From greg.ewing at canterbury.ac.nz  Fri Aug 24 05:36:25 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 24 Aug 2007 15:36:25 +1200
Subject: [Python-3000] Is __cmp__ deprecated?
In-Reply-To: <ca471dc20708230843w70b31409gca41c091367566d7@mail.gmail.com>
References: <79990c6b0708230836i78fdd6e9w23f75eb7be639371@mail.gmail.com>
	<ca471dc20708230843w70b31409gca41c091367566d7@mail.gmail.com>
Message-ID: <46CE5239.1000506@canterbury.ac.nz>

Guido van Rossum wrote:
> cmp() is still alive and __cmp__ is used to overload it; on the one
> hand I'd like to get rid of it but OTOH it's occasionally useful.

Maybe you could keep cmp() but implement it in terms
of, say, __lt__ and __eq__?
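
Something along these lines, as a sketch of the idea rather than a proposal for the actual builtin:

```python
def cmp(a, b):
    """Three-way comparison built only from == and <."""
    if a == b:
        return 0
    return -1 if a < b else 1
```

Note that this performs up to two rich comparisons per call.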

-- 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiem!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at canterbury.ac.nz	   +--------------------------------------+

From greg.ewing at canterbury.ac.nz  Fri Aug 24 05:40:54 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 24 Aug 2007 15:40:54 +1200
Subject: [Python-3000] Immutable bytes type and bsddb or other IO
In-Reply-To: <20070823171837.GI24059@electricrain.com>
References: <D3DD7578-D515-4F46-A3F3-B96A43AB1FBB@gmail.com>
	<B24E6FF9-3080-4494-A377-BE68B9949A52@gmail.com>
	<ca471dc20708061906w5957811evfc02cf9f04d7cfde@mail.gmail.com>
	<46B7EA06.5040106@v.loewis.de>
	<ca471dc20708062056w362a951ma705452868e41472@mail.gmail.com>
	<46B7FACC.8030503@v.loewis.de>
	<20070822235929.GA12780@electricrain.com>
	<46CD2209.8000408@v.loewis.de>
	<20070823073837.GA14725@electricrain.com>
	<46CD3BFF.5080904@v.loewis.de>
	<20070823171837.GI24059@electricrain.com>
Message-ID: <46CE5346.10301@canterbury.ac.nz>

Gregory P. Smith wrote:
> Wasn't a past mailing list thread claiming the bytes type was supposed
> to be great for IO?  How's that possible unless we add a lock to the
> bytesobject?

Doesn't the new buffer protocol provide something for
getting a locked view of the data? If so, it seems like
bytes should implement that.

-- 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiem!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at canterbury.ac.nz	   +--------------------------------------+

From eric+python-dev at trueblade.com  Fri Aug 24 05:57:04 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Thu, 23 Aug 2007 23:57:04 -0400
Subject: [Python-3000] PEP 3101 implementation uploaded to the tracker.
Message-ID: <46CE5710.2030907@trueblade.com>

There are a handful of remaining issues, but it works for the most part.

http://bugs.python.org/issue1009

Thanks to Guido and Talin for all of their help the last few days, and 
thanks to Patrick Maupin for help with the initial implementation.

Known issues:
Better error handling, per the PEP.

Need to write Formatter class.

test_long is failing, but I don't think it's my doing.

Need to fix this warning that I introduced when compiling 
Python/formatter_unicode.c:
Objects/stringlib/unicodedefs.h:26: warning: `STRINGLIB_CMP' defined but 
not used

Need more tests for sign handling for int and float.

It still supports "()" sign formatting from an earlier PEP version.

Eric.

From mierle at gmail.com  Fri Aug 24 06:03:39 2007
From: mierle at gmail.com (Keir Mierle)
Date: Thu, 23 Aug 2007 21:03:39 -0700
Subject: [Python-3000] [PATCH] Implement remaining rich comparison
	operations on dictviews
Message-ID: <ef5675f30708232103u70ee3527he94af64e07103b06@mail.gmail.com>

This patch implements rich comparisons with dict views such that
dict().keys() can be compared like a set (i.e. < is subset, etc).
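
For example, with the comparisons in place, key views behave like sets in expressions such as these (a usage sketch, with made-up dictionaries):

```python
small = {"a": 1}
large = {"a": 1, "b": 2}

is_proper_subset = small.keys() < large.keys()  # every key of small is in large
is_subset_of_set = small.keys() <= {"a"}        # mixing a view and a plain set
is_superset = large.keys() >= small.keys()
```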

Keir
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dictview_richcompare.diff
Type: text/x-patch
Size: 4420 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070823/9623468b/attachment.bin 

From guido at python.org  Fri Aug 24 06:10:25 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 23 Aug 2007 21:10:25 -0700
Subject: [Python-3000] String to unicode fixes in time and datetime
In-Reply-To: <883A2C41-5CCB-4C2A-97D6-E5ACE5DEA46F@gmail.com>
References: <883A2C41-5CCB-4C2A-97D6-E5ACE5DEA46F@gmail.com>
Message-ID: <ca471dc20708232110gac70193u40d3a26c5ba1ac40@mail.gmail.com>

Hi Ero,

Thanks for these! I checked them in.

The datetime patch had a few problems (did you run the unit test?)
that I got rid of.

The function you were looking for does exist, it's
PyUnicode_AsUTF8String() (which returns a new object instead of a
borrowed reference). I changed your code to use this.

I changed a few places from PyBuffer_FromStringAndSize("", 1) to
<ditto>("", 0) -- the bytes object always allocates an extra null byte
that isn't included in the count.

I changed a few places from using strlen() to using the
PyBuffer_GET_SIZE() macro.

PyBuffer_AS_STRING() can be NULL if the size is 0; I rearranged some
code to avoid asserts triggering in this case.

There are still two remaining problems: test_datetime leaks a bit (49
references) and test_strptime and test_strftime leak a lot (over 2000
references!).  We can hunt these down tomorrow.

--Guido

On 8/23/07, Ero Carrera <ero.carrera at gmail.com> wrote:
>
> Hi,
>
> I'm attaching a small patch result of attempting to tackle part of
> one of the tasks in the Google Sprint.
> The patch removes most of the references of PyString_* calls in the
> "time" and "datetime" modules and adds Unicode support instead.
>
> There's a problem in "datetime" with
> "_PyUnicode_AsDefaultEncodedString". As there's no current equivalent
> that would provide an object of type "bytes", there are two
> occurrences of PyString_* functions to handle the returned "default
> encoded string" and convert it into bytes.
>
> cheers,
> --
> Ero
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Fri Aug 24 06:13:51 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 23 Aug 2007 21:13:51 -0700
Subject: [Python-3000] make uuid.py creation threadsafe
In-Reply-To: <3343b3d90708231924rac129a5p2da2cd03a274dfed@mail.gmail.com>
References: <3343b3d90708231924rac129a5p2da2cd03a274dfed@mail.gmail.com>
Message-ID: <ca471dc20708232113s396ecf3co8c0137c51ff09552@mail.gmail.com>

Thanks!

Committed revision 57375.


On 8/23/07, Yuri Ginsburg <yginsburg at gmail.com> wrote:
> The attached small patch makes the output buffer local, thus making uuid.py thread-safe.
>
> --
> Yuri Ginsburg (YG10)


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From janssen at parc.com  Fri Aug 24 06:15:26 2007
From: janssen at parc.com (Bill Janssen)
Date: Thu, 23 Aug 2007 21:15:26 PDT
Subject: [Python-3000] sprint patch for server-side SSL
Message-ID: <46CE5B5E.8030005@parc.com>

Here's the final form of the SSL patch. Now includes a test file.   All 
bugs discovered on Wednesday have been fixed.

Bill

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ssl-update-diff
Url: http://mail.python.org/pipermail/python-3000/attachments/20070823/0c1c0c37/attachment-0001.txt 

From guido at python.org  Fri Aug 24 06:17:04 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 23 Aug 2007 21:17:04 -0700
Subject: [Python-3000] Immutable bytes type and bsddb or other IO
In-Reply-To: <46CE5346.10301@canterbury.ac.nz>
References: <D3DD7578-D515-4F46-A3F3-B96A43AB1FBB@gmail.com>
	<46B7EA06.5040106@v.loewis.de>
	<ca471dc20708062056w362a951ma705452868e41472@mail.gmail.com>
	<46B7FACC.8030503@v.loewis.de>
	<20070822235929.GA12780@electricrain.com>
	<46CD2209.8000408@v.loewis.de>
	<20070823073837.GA14725@electricrain.com>
	<46CD3BFF.5080904@v.loewis.de>
	<20070823171837.GI24059@electricrain.com>
	<46CE5346.10301@canterbury.ac.nz>
Message-ID: <ca471dc20708232117o290ccdees4175d057de483f56@mail.gmail.com>

On 8/23/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Gregory P. Smith wrote:
> > Wasn't a past mailing list thread claiming the bytes type was supposed
> > to be great for IO?  How's that possible unless we add a lock to the
> > bytesobject?
>
> Doesn't the new buffer protocol provide something for
> getting a locked view of the data? If so, it seems like
> bytes should implement that.

It *does* implement that! So there's the solution: these APIs should
not insist on bytes but use the buffer API. It's quite a bit of work I
suspect (especially since you can't use PyArg_ParseTuple with y# any
more) but worth it.
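
At the Python level, the locking effect can be seen with memoryview in current Python. This is a sketch using a bytearray; it does not show the C API calls discussed here.

```python
data = bytearray(b"payload")
view = memoryview(data)  # takes a buffer export on `data`

try:
    data.extend(b"!")  # resizing is refused while the export is held
    resized_while_exported = True
except BufferError:
    resized_while_exported = False

view.release()     # give the export back
data.extend(b"!")  # now the resize goes through
```

While the view is held, an I/O routine can read from or write into the memory without the object being reallocated underneath it.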

BTW PyUnicode should *not* support the buffer API.

I'll add both of these to the task spreadsheet.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Fri Aug 24 06:36:45 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 23 Aug 2007 21:36:45 -0700
Subject: [Python-3000] Should 2to3 point out *possible*,
	but not definite changes?
In-Reply-To: <a2eb1c260708231950y534b3efajafc0e9dc0e29294a@mail.gmail.com>
References: <a2eb1c260708231950y534b3efajafc0e9dc0e29294a@mail.gmail.com>
Message-ID: <ca471dc20708232136w725482d1jc6ffd857984b0758@mail.gmail.com>

Yes, I think this would be way cool! I believe there are already a few
fixers that print messages about things they know are wrong but don't
know how to fix, those could also be integrated (although arguably
you'd want those messages to be treated as more severe).

Does this mean that Django is committing to converting to Py3k? :-)

--Guido

On 8/23/07, Adrian Holovaty <adrian at holovaty.com> wrote:
> As part of the Python 3000 sprint (at Google's Chicago office), I've
> been working on the documentation for 2to3. I'm publishing updates at
> http://red-bean.com/~adrian/2to3.rst and will submit this as a
> documentation patch when it's completed. (I didn't get as much done
> today as I would have liked, but I'll be back at it Friday.)
>
> In my research of the 2to3 utility, I've been thinking about whether
> it should be expanded to include the equivalent of "warnings." I know
> one of its design goals has been to be "dumb but correct," but I
> propose that including optional warnings would be a bit
> smarter/helpful, without risking the tool's correctness.
>
> Specifically, I propose:
>
> *  2to3 gains either an "--include-warnings" option or an
> "--exclude-warnings" option, depending on which behavior is decided to
> be default.
>
> * If this option is set, the utility would search for an *additional*
> set of fixes -- fixes that *might* need to be made to the code but
> cannot be determined with certainty. An example of this is noted in
> the "Limitations" section of the 2to3 README:
>
>       a = apply
>       a(f, *args)
>
> (2to3 cannot handle this because it cannot detect reassignment.)
>
> Under my proposal, the utility would notice that "apply" is a builtin
> whose behavior is changing, and that this is a situation in which the
> correct 2to3 porting is ambiguous. The utility would designate this in
> the output with a Python comment on the previous line:
>
>       # 2to3note: The semantics of apply() have changed.
>       a = apply
>       a(f, *args)
>
> Each comment would have a common prefix such as "2to3note" for easy grepping.
>
> Given the enormity of the Python 3000 syntax change, I think that the
> 2to3 utility should provide as much guidance as possible. What it does
> currently is extremely cool (I daresay miraculous), but I think we can
> get closer to 100% coverage if we take into account the ambiguous
> changes.
>
> Oh, and I'm happy to (attempt to) write this addition to the tool, as
> long as the powers that be deem it worthwhile.
>
> Thoughts?
>
> Adrian
>
> --
> Adrian Holovaty
> holovaty.com | djangoproject.com


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Fri Aug 24 07:03:57 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 23 Aug 2007 22:03:57 -0700
Subject: [Python-3000] uuid creation not thread-safe?
In-Reply-To: <1185671069.839769.274620@z28g2000prd.googlegroups.com>
References: <ca471dc20707201052p68883fc5l3efd8ecc5cfd497f@mail.gmail.com>
	<1185671069.839769.274620@z28g2000prd.googlegroups.com>
Message-ID: <ca471dc20708232203h5e918e3aw9af156da3b0fbb9b@mail.gmail.com>

This was now fixed in 3.0. Somebody might want to backport.
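
The pattern of the fix, allocating a fresh output buffer per call instead of sharing a module-level one, looks roughly like this. It is a sketch, not the committed change: _fill_random is a hypothetical stand-in for the C library call (it copies os.urandom output so the example runs without libuuid), and uuid4_threadsafe is a made-up name.

```python
import ctypes
import os
import uuid


def _fill_random(buf):
    """Hypothetical stand-in for the C routine that writes 16 bytes into buf."""
    ctypes.memmove(buf, os.urandom(16), 16)


def uuid4_threadsafe():
    # A new buffer per call: no other thread can overwrite it between
    # the fill and the read of buf.raw.
    buf = ctypes.create_string_buffer(16)
    _fill_random(buf)
    return uuid.UUID(bytes=buf.raw, version=4)


sample = uuid4_threadsafe()
```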

On 7/28/07, lcaamano <lcaamano at gmail.com> wrote:
>
> On Jul 20, 1:52 pm, "Guido van Rossum" <gu... at python.org> wrote:
> > I discovered what appears to be a thread-unsafety in uuid.py. This is
> > in the trunk as well as in 3.x; I'm using the trunk here for easy
> > reference. There's some code around line 395:
> >
> >     import ctypes, ctypes.util
> >     _buffer = ctypes.create_string_buffer(16)
> >
> > This creates a *global* buffer which is used as the output parameter
> > to later calls to _uuid_generate_random() and _uuid_generate_time().
> > For example, around line 481, in uuid1():
> >
> >         _uuid_generate_time(_buffer)
> >         return UUID(bytes=_buffer.raw)
> >
> > Clearly if two threads do this simultaneously they are overwriting
> > _buffer in unpredictable order. There are a few other occurrences of
> > this too.
> >
> > I find it somewhat disturbing that what seems a fairly innocent
> > function that doesn't *appear* to have global state is nevertheless
> > not thread-safe. Would it be wise to fix this, e.g. by allocating a
> > fresh output buffer inside uuid1() and other callers?
> >
>
>
> I didn't find any reply to this, which is odd, so forgive me if it's
> old news.
>
> I agree with you that it's not thread safe and that a local buffer in
> the stack should fix it.
>
> Just for reference, the thread-safe uuid extension we've been using
> since python 2.1, which I don't recall where we borrow it from, uses a
> local buffer in the stack.  It looks like this:
>
> -----begin uuid.c--------------
>
> static char uuid__doc__ [] =
> "DCE compatible Universally Unique Identifier module";
>
> #include "Python.h"
> #include <uuid/uuid.h>
>
> static char uuidgen__doc__ [] =
> "Create a new DCE compatible UUID value";
>
> static PyObject *
> uuidgen(void)
> {
> uuid_t out;
> char buf[48];
>
>     uuid_generate(out);
>     uuid_unparse(out, buf);
>     return PyString_FromString(buf);
> }
>
> static PyMethodDef uuid_methods[] = {
>     {"uuidgen", uuidgen, 0, uuidgen__doc__},
>     {NULL,      NULL}        /* Sentinel */
> };
>
> DL_EXPORT(void)
> inituuid(void)
> {
>     Py_InitModule4("uuid",
>                uuid_methods,
>                uuid__doc__,
>                (PyObject *)NULL,
>                PYTHON_API_VERSION);
> }
>
> -----end uuid.c--------------
>
>
> It also seems that using uuid_generate()/uuid_unparse() should be
> faster than using uuid_generate_random() and then creating a python
> object to call its __str__ method.  If so, it would be nice if the
> uuid.py module also provided equivalent fast versions that returned
> strings instead of objects.
>
>
> --
> Luis P Caamano
> Atlanta, GA, USA
>
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Fri Aug 24 07:08:20 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 23 Aug 2007 22:08:20 -0700
Subject: [Python-3000] Move to a "py3k" branch *DONE*
In-Reply-To: <43aa6ff70708231147h673bf14evbb87ef262094a7af@mail.gmail.com>
References: <ca471dc20708090743k30562834uee2abd624f2f41b5@mail.gmail.com>
	<ee2a432c0708092237i3b1cf045ied38af29c75d394c@mail.gmail.com>
	<20070823004155.GB12780@electricrain.com>
	<43aa6ff70708231147h673bf14evbb87ef262094a7af@mail.gmail.com>
Message-ID: <ca471dc20708232208n2104fa95paf6e23c9fccf361e@mail.gmail.com>

That looks like a simple logic bug in the routine where the assert is
failing (should return 1 when detecting None). I'll check in a fix
momentarily.

--Guido

On 8/23/07, Collin Winter <collinw at gmail.com> wrote:
> On 8/22/07, Gregory P. Smith <greg at electricrain.com> wrote:
> > > > There are currently about 7 failing unit tests left:
> > > >
> > > > test_bsddb
> > > > test_bsddb3
> > ...
> >
> > fyi these two pass for me on the current py3k branch on ubuntu linux
> > and mac os x 10.4.9.
>
> test_bsddb works for me on Ubuntu, and test_bsddb3 was working for me
> yesterday, but now fails with
>
> python: /home/collinwinter/src/python/py3k/Modules/_bsddb.c:388:
> make_dbt: Assertion `((((PyObject*)(obj))->ob_type) == (&PyBytes_Type)
> || PyType_IsSubtype((((PyObject*)(obj))->ob_type), (&PyBytes_Type)))'
> failed.
>
> The failure occurs after this line is emitted
>
> test02_cursors (bsddb.test.test_dbshelve.EnvThreadHashShelveTestCase) ... ok
>
> Collin Winter


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at v.loewis.de  Fri Aug 24 07:09:28 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 24 Aug 2007 07:09:28 +0200
Subject: [Python-3000] Immutable bytes type and bsddb or other IO
In-Reply-To: <ca471dc20708232117o290ccdees4175d057de483f56@mail.gmail.com>
References: <D3DD7578-D515-4F46-A3F3-B96A43AB1FBB@gmail.com>	<46B7EA06.5040106@v.loewis.de>	<ca471dc20708062056w362a951ma705452868e41472@mail.gmail.com>	<46B7FACC.8030503@v.loewis.de>	<20070822235929.GA12780@electricrain.com>	<46CD2209.8000408@v.loewis.de>	<20070823073837.GA14725@electricrain.com>	<46CD3BFF.5080904@v.loewis.de>	<20070823171837.GI24059@electricrain.com>	<46CE5346.10301@canterbury.ac.nz>
	<ca471dc20708232117o290ccdees4175d057de483f56@mail.gmail.com>
Message-ID: <46CE6808.1070007@v.loewis.de>

> It *does* implement that! So there's the solution: these APIs should
> not insist on bytes but use the buffer API. It's quite a bit of work I
> suspect (especially since you can't use PyArg_ParseTuple with y# any
> more) but worth it.

I think there could be another code for PyArg_ParseTuple (or the meaning
of y# be changed): that code would not only return char* and Py_ssize_t,
but also a PyObject* and fill a PyBuffer b to be passed to
PyObject_ReleaseBuffer(o, &b).

> BTW PyUnicode should *not* support the buffer API.

Why not? It should set readonly to 1, and format to "u" or "w".

Regards,
Martin


From guido at python.org  Fri Aug 24 07:19:49 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 23 Aug 2007 22:19:49 -0700
Subject: [Python-3000] Immutable bytes type and bsddb or other IO
In-Reply-To: <46CE6808.1070007@v.loewis.de>
References: <D3DD7578-D515-4F46-A3F3-B96A43AB1FBB@gmail.com>
	<46B7FACC.8030503@v.loewis.de>
	<20070822235929.GA12780@electricrain.com>
	<46CD2209.8000408@v.loewis.de>
	<20070823073837.GA14725@electricrain.com>
	<46CD3BFF.5080904@v.loewis.de>
	<20070823171837.GI24059@electricrain.com>
	<46CE5346.10301@canterbury.ac.nz>
	<ca471dc20708232117o290ccdees4175d057de483f56@mail.gmail.com>
	<46CE6808.1070007@v.loewis.de>
Message-ID: <ca471dc20708232219s5f931788i64af49abebb97e3@mail.gmail.com>

On 8/23/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > It *does* implement that! So there's the solution: these APIs should
> > not insist on bytes but use the buffer API. It's quite a bit of work I
> > suspect (especially since you can't use PyArg_ParseTuple with y# any
> > more) but worth it.
>
> I think there could be another code for PyArg_ParseTuple (or the meaning
> of y# be changed): that code would not only return char* and Py_ssize_t,
> but also a PyObject* and fill a PyBuffer b to be passed to
> PyObject_ReleaseBuffer(o, &b).

That hardly saves any work compared to O though.

> > BTW PyUnicode should *not* support the buffer API.
>
> Why not? It should set readonly to 1, and format to "u" or "w".

Because the write() method of binary files (and similar places, like
socket.send() and in the future probably various database objects)
accept anything that supports the buffer API, but writing a (text)
string to these is almost certainly a bug. Not supporting the buffer
API in PyUnicode is IMO preferable to making explicit exceptions for
PyUnicode in all those places.

I don't think that the savings possible when writing to a text file
using the UTF-16 or -32 encoding (whichever matches Py_UNICODE_SIZE)
in the native byte order are worth leaving that bug unchecked.
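
As a rough illustration of the bug class described above, here is a minimal
sketch. It assumes a released Python 3 interpreter (which postdates this
thread), where str does not expose the buffer API; the helper names are made
up for the example.

```python
import io


def try_binary_write(payload):
    """Return the number of bytes written, or the name of the exception raised."""
    stream = io.BytesIO()  # stands in for a binary file or a socket
    try:
        return stream.write(payload)
    except TypeError as exc:
        return type(exc).__name__


def exposes_buffer(obj):
    """True if obj can be wrapped in a memoryview, i.e. supports the buffer API."""
    try:
        memoryview(obj)
    except TypeError:
        return False
    return True


print(try_binary_write(b"abc"))
print(try_binary_write("abc"))
print(exposes_buffer(bytearray(b"abc")), exposes_buffer("abc"))
```

Because the text string is rejected at the buffer level, the binary writer
needs no special case for it.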

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From aahz at pythoncraft.com  Fri Aug 24 15:02:52 2007
From: aahz at pythoncraft.com (Aahz)
Date: Fri, 24 Aug 2007 06:02:52 -0700
Subject: [Python-3000] Is __cmp__ deprecated?
In-Reply-To: <46CE5239.1000506@canterbury.ac.nz>
References: <79990c6b0708230836i78fdd6e9w23f75eb7be639371@mail.gmail.com>
	<ca471dc20708230843w70b31409gca41c091367566d7@mail.gmail.com>
	<46CE5239.1000506@canterbury.ac.nz>
Message-ID: <20070824130252.GB18456@panix.com>

On Fri, Aug 24, 2007, Greg Ewing wrote:
> Guido van Rossum wrote:
>>
>> cmp() is still alive and __cmp__ is used to overload it; on the one
>> hand I'd like to get rid of it but OTOH it's occasionally useful.
> 
> Maybe you could keep cmp() but implement it in terms of, say, __lt__
> and __eq__?

No!  The whole point of cmp() is to be able to make *one* call; this is
especially important for things like Decimal and NumPy.
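
To make the cost concrete, here is a small sketch of a cmp() emulated through
rich comparisons. ``Counted`` and ``cmp_via_rich`` are hypothetical names
invented for this illustration, not anything proposed in the thread.

```python
class Counted:
    """Wraps a value and counts how many comparison methods get invoked."""

    calls = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Counted.calls += 1
        return self.value == other.value

    def __lt__(self, other):
        Counted.calls += 1
        return self.value < other.value


def cmp_via_rich(a, b):
    """Hypothetical cmp() emulation built only on __eq__ and __lt__."""
    if a == b:
        return 0
    return -1 if a < b else 1


Counted.calls = 0
result = cmp_via_rich(Counted(1), Counted(2))
print(result, Counted.calls)
```

Whenever the operands differ, the emulation pays for two comparison calls
where a single __cmp__ call would have been enough.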
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"If you don't know what your program is supposed to do, you'd better not
start writing it."  --Dijkstra

From thomas at python.org  Fri Aug 24 16:33:47 2007
From: thomas at python.org (Thomas Wouters)
Date: Fri, 24 Aug 2007 16:33:47 +0200
Subject: [Python-3000] Removing simple slicing
Message-ID: <9e804ac0708240733r1f1781e2m18902834e2218777@mail.gmail.com>

I did some work at last year's Google sprint on removing the simple slicing
API (__getslice__, tp_as_sequence->sq_slice) in favour of the more flexible
sliceobject API (__getitem__ and tp_as_mapping->mp_subscript using slice
objects as index.) For some more detail, see the semi-PEP below. (I hesitate
to call it a PEP because it's way past the Py3k PEP deadline, but the email
I was originally going to send on this subject grew in such a size that I
figured I might as well use PEP layout and use the opportunity to record
some best practices and behaviour. And the change should probably be
recorded in a PEP anyway, even though it has never been formally proposed,
just taken as a given.)

If anyone is bored and/or interested in doing some complicated work, there
is still a bit of (optional) work to be done in this area: I uploaded
patches (to be applied to the trunk) to SF 8 months ago -- extended slicing
support for a bunch of types. Some of that extended slicing support is
limited to step-1 slices, though, most notably UserString.MutableString and
ctypes. I can guarantee adding non-step-1 support to them is a challenging
and fulfilling exercise, having done it for several types, but I can't
muster the intellectual stamina to do it for these (to me) fringe types. The
patches can be found in Roundup:
http://bugs.python.org/issue?%40search_text=&title=&%40columns=title&id=&%40columns=id&creation=&creator=twouters&activity=&%40columns=activity&%40sort=activity&actor=&type=&components=&versions=&severity=&dependencies=&assignee=&keywords=&priority=&%40group=priority&status=1&%40columns=status&resolution=&%40pagesize=50&%40startwith=0&%40action=search
(there doesn't seem to be a shorter URL; just search for issues created by
'twouters' instead.)

If nobody cares, I will be checking these patches into the trunk this
weekend (after updating them), and then update and check in the rest of the
p3yk-noslice branch into the py3k branch.

Abstract
========

This proposal discusses getting rid of one of the two types of slicing Python
uses, ``simple`` and ``extended``. Extended slicing was added later, and uses a
different API at both the C and the Python level for backward compatibility.
Extended slicing can express everything simple slicing can express,
however, making the simple slicing API practically redundant.

A Tale of Two APIs
==================

Simple slicing is a slice operation without a step, Ellipsis or tuple of
slices -- the archetypical slice of just `start` and/or `stop`, with a
single colon separating them and both sides being optional::

    L[1:3]
    L[2:]
    L[:-5]
    L[:]

An extended slice is any slice that isn't simple::

    L[1:5:2]
    L[1:3, 8:10]
    L[1, ..., 5:-2]
    L[1:3:]

(Note that the presence of an extra colon in the last example makes the very
first simple slice an extended slice, but otherwise expresses the exact same
slicing operation.)

In applying a simple slice, Python does the work of translating omitted, out
of bounds or negative indices into the appropriate actual indices, based on
the length of the sequence. The normalized ``start`` and ``stop`` indices
are then passed to the appropriate method: ``__getslice__``,
``__setslice__`` or ``__delslice__`` for Python classes,
``tp_as_sequence``'s ``sq_slice`` or ``sq_ass_slice`` for C types.

For extended slicing, no special handling of slice indices is done. The
indices in ``start:stop:step`` are wrapped in a ``slice`` object, with
missing indices represented as None. The indices are otherwise taken as-is.
The sequence object is then indexed with the slice object as if it were a
mapping: ``__getitem__``, ``__setitem__`` or ``__delitem__`` for Python
classes, ``tp_as_mapping``'s ``mp_subscript`` or ``mp_ass_subscript`` for C
types.
It is entirely up to the sequence to interpret the meaning of missing, out
of bounds or negative indices, let alone non-numerical indices like tuples
or Ellipsis or arbitrary objects.
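
What the sequence actually receives can be seen with a small probe class. This
is a sketch that assumes a Python 3 interpreter, where every slice syntax goes
through ``__getitem__``; ``Probe`` is a hypothetical name used only here.

```python
class Probe:
    """Returns whatever index object the subscript syntax produced."""

    def __getitem__(self, index):
        return index


p = Probe()
print(p[1:3])           # slice(1, 3, None)
print(p[:-5])           # slice(None, -5, None)
print(p[1:5:2])         # slice(1, 5, 2)
print(p[1:3, 8:10])     # (slice(1, 3, None), slice(8, 10, None))
print(p[1, ..., 5:-2])  # (1, Ellipsis, slice(5, -2, None))
```

Missing indices arrive as ``None`` and negative ones are passed through
untouched, so all normalization is left to the receiving object.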

Since at least Python 2.1, applying a simple slice to an object that does not
implement the simple slicing API will fall back to using extended slicing,
calling ``__getitem__`` (or ``mp_subscript``) instead of ``__getslice__`` (or
``sq_slice``), and similarly for slice assignment/deletion.

Problems
========

Aside from the obvious disadvantage of having two ways to do the same thing,
simple slicing is an inconvenient wart for several reasons:

 1) It (passively) promotes supporting only simple slicing, as observed by
    the builtin types only supporting extended slicing many years after
    extended slicing was introduced.

 2) The Python VM dedicates 12 of its opcodes, about 11%, to support
    simple slicing, and effectively reserves another 13 for code
    convenience. Reducing the Big Switch in the bytecode interpreter
    would certainly not hurt Python performance.

 3) The same goes for the number of functions, macros and function-pointers
    supporting simple slicing, although the impact would be maintainability
    and readability of the source rather than performance.

Proposed Solution
=================

The proposed solution, as implemented in the p3yk-noslice SVN branch, gets
rid of the simple slicing methods and PyType entries. The simple C API
(using ``Py_ssize_t`` for start and stop) remains, but creates a slice
object as necessary instead. Various types had to be updated to support
slice objects, or improve the simple slicing case of extended slicing.

The result is that ``__getslice__``, ``__setslice__`` and ``__delslice__``
are no longer called in any situation. Classes that delegate ``__getitem__``
(or the C equivalent) to a sequence type get any slicing behaviour of that
type for free. Classes that implement their own slicing will have to be
modified to accept slice objects and process the indices themselves. This
means that at the C level, as is already the case at the Python level, the
same method is used for mapping-like access as for slicing. C types will
still want to implement ``tp_as_sequence->sq_item``, but that function will
only be called when using the ``PySequence_*Item()`` API. Those API functions
do not (yet) fall back to using ``tp_as_mapping->mp_subscript``, although
they possibly should.

A casualty of this change is ``PyMapping_Check()``. It used to check for
``tp_as_mapping`` being available, and was modified to check for
``tp_as_mapping`` but *not* ``tp_as_sequence->sq_slice`` when extended
slicing was added to the builtin types. It could conceivably check for
``tp_as_sequence->sq_item`` instead of ``sq_slice``, but the added value is
unclear (especially considering ABCs.) In the standard library and CPython
itself, ``PyMapping_Check()`` is used mostly to provide early errors, for
instance by checking the arguments to ``exec()``.

Alternate Solution
------------------

A possible alternative to removing simple slicing completely would be to
introduce a new typestruct hook, with the same signature as
``tp_as_mapping->mp_subscript``, which would be called for slicing
operations. All as-mapping index operations would have to fall back to this
new ``sq_extended_slice`` hook, in order for ``seq[slice(...)]`` to work as
expected. For some added efficiency and error-checking, expressions using
actual slice syntax could compile into bytecodes specific for slicing (of
which there would only be three, instead of twelve.) This approach would
simplify C types wanting to support extended slicing but not
arbitrary-object indexing (and vice-versa) somewhat, but the benefit seems
too small to warrant the added complexity in the CPython runtime itself.


Implementing Extended Slicing
=============================

Supporting extended slicing in C types is not as easily done as supporting
simple slicing. There are a number of edge cases in interpreting the odder
combinations of ``start``, ``stop`` and ``step``. This section tries to give
some explanations and best practices.

Extended Slicing in C
---------------------

Because the mapping API takes precedence over the sequence API, any
``tp_as_mapping->mp_subscript`` and ``tp_as_mapping->mp_ass_subscript``
functions need to do proper type checks on their argument. In Python 2.5 and
later, this is best done using ``PyIndex_Check()`` and ``PySlice_Check()``
(and possibly ``PyTuple_Check()`` and comparison against ``Py_Ellipsis``.)
For compatibility with Python 2.4 and earlier, ``PyIndex_Check()`` would
have to be replaced with ``PyInt_Check()`` and ``PyLong_Check()``.

Indices that pass ``PyIndex_Check()`` should be converted to a
``Py_ssize_t`` using ``PyNumber_AsSsize_t()`` and delegated to
``tp_as_sequence->sq_item``. (For compatibility with Python 2.4, use
``PyInt_AsLong()`` and downcast to an ``int`` instead.)

The exact meaning of tuples of slices, and of Ellipsis, is up to the type,
as no standard-library types support it. It may be useful to use the same
convention as the Numpy package. Slices inside tuples, if supported, should
probably follow the same rules as direct slices.

Correct indices can be extracted from slice objects with
``PySlice_GetIndicesEx()``. Negative and out-of-bounds indices will be
adjusted based on the provided length, but a negative ``step``, and a
``stop`` before a ``start`` are kept as-is. This means that, for a getslice
operation, a simple for-loop can be used to visit the correct items in the
correct order::

    for (cur = start, i = 0; i < slicelength; cur += step, i++)
        dest[i] = src[cur];


If ``PySlice_GetIndicesEx()`` is not appropriate, the individual indices can
be extracted from the ``PySlice`` object. If the indices are to be converted
to C types, that should be done using ``PyIndex_Check()``,
``PyNumber_AsSsize_t()`` and the ``Py_ssize_t`` type, except that ``None``
should be accepted as the default value for the index.

For deleting slices (``mp_ass_subscript`` called with ``NULL`` as
value) where the order does not matter, a reverse slice can be turned into
the equivalent forward slice with::

    if (step < 0) {
        stop = start + 1;
        start = stop + step*(slicelength - 1) - 1;
        step = -step;
    }
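
The transformation above can be checked with a Python transcription. This is
only a sketch: ``forward_equivalent`` and ``normalized`` are hypothetical
helper names, and ``slice.indices()`` is assumed to stand in for the
normalization that ``PySlice_GetIndicesEx()`` performs.

```python
def forward_equivalent(start, stop, step, slicelength):
    """Mirror of the C snippet: turn a normalized reverse slice into a forward one."""
    if step < 0:
        stop = start + 1
        start = stop + step * (slicelength - 1) - 1
        step = -step
    return start, stop, step


def normalized(sl, length):
    """Return (start, stop, step, slicelength) for a slice over `length` items."""
    start, stop, step = sl.indices(length)
    return start, stop, step, len(range(start, stop, step))


reverse = slice(None, None, -3)
print(list(range(*reverse.indices(10))))
print(list(range(*forward_equivalent(*normalized(reverse, 10)))))
```

Both lines print the same set of indices, the first in descending and the
second in ascending order.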


For slice assignment with a ``step`` other than 1, it's usually necessary to
require the source iterable to have the same length as the slice. When
assigning to a slice of length 0, care needs to be taken to select the right
insertion point. For a slice S[5:2], the correct insertion point is before
index 5, not before index 2.
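
Both rules can be observed on the builtin list, which serves here as the
reference behaviour. This is a minimal sketch; the function names are invented
for the example.

```python
def insert_via_empty_slice():
    s = list(range(10))
    s[5:2] = ["a", "b"]  # empty slice: items are inserted before index 5
    return s


def assign_wrong_length():
    s = list(range(10))
    try:
        s[2:5:2] = [1, 2, 3]  # the slice selects 2 items, 3 are supplied
    except ValueError as exc:
        return type(exc).__name__
    return s


print(insert_via_empty_slice())
print(assign_wrong_length())
```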

For both slice deletion and slice assignment, it is important to remember
that arbitrary Python code may be executed when calling Py_DECREF() or otherwise
interacting with arbitrary objects. Because of that, it's important your
datatype stays consistent throughout the operation. Either operate on a copy
of your datatype, or delay (for instance) Py_DECREF() calls until the
datatype is updated. The latter is usually done by keeping a scratchpad of
to-be-DECREF'ed items.

Extended slicing in Python
--------------------------

The simplest way to support extended slicing in Python is by delegating to
an underlying type that already supports extended slicing. The class can
simply index the underlying type with the slice object (or tuple) it was
indexed with.
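
A delegating class can be as small as the following sketch. ``Tagged`` is a
hypothetical wrapper around a list, used only to show the index object being
passed through unchanged.

```python
class Tagged:
    """Hypothetical wrapper that delegates all indexing to an underlying list."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        # ints, slice objects and anything else list understands pass through
        return self._items[index]

    def __setitem__(self, index, value):
        self._items[index] = value

    def __delitem__(self, index):
        del self._items[index]


t = Tagged(range(10))
print(t[1:5:2])
t[2:5] = [0]
print(len(t))
```

The wrapper never inspects the index, so it picks up whatever slicing support
the underlying type has.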

Barring that, the Python code will have to pretty much apply the same logic
as the C type. The conversion done by ``PyNumber_AsSsize_t()`` is available
as ``operator.index()``, with a ``try/except`` block replacing
``PyIndex_Check()``. ``isinstance(o, slice)`` and ``sliceobj.indices()``
replace ``PySlice_Check()`` and ``PySlice_GetIndices()``, but the slicelength
(which is provided by ``PySlice_GetIndicesEx()``) has to be calculated
manually.
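
A sketch of that logic in pure Python follows. ``Evens`` is a hypothetical
read-only sequence invented for this example; the slicelength arithmetic is
one way to compute the value and is not taken from the branch.

```python
import operator


class Evens:
    """Hypothetical read-only sequence of the first n even numbers."""

    def __init__(self, n):
        self._n = n

    def __len__(self):
        return self._n

    def __getitem__(self, index):
        if isinstance(index, slice):
            start, stop, step = index.indices(self._n)
            # slicelength has to be computed by hand in Python
            if step > 0:
                slicelength = max(0, (stop - start + step - 1) // step)
            else:
                slicelength = max(0, (stop - start + step + 1) // step)
            result = []
            cur = start
            for _ in range(slicelength):
                result.append(2 * cur)
                cur += step
            return result
        try:
            i = operator.index(index)
        except TypeError:
            raise TypeError("indices must be integers or slices") from None
        if i < 0:
            i += self._n
        if not 0 <= i < self._n:
            raise IndexError("index out of range")
        return 2 * i


e = Evens(10)
print(e[1:8:3], e[::-4], e[-1])
```

The loop is the Python counterpart of the C for-loop driven by start, step
and slicelength.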

Testing extended slicing
------------------------

Proper tests of extended slicing capabilities should at least include the
following (if the operations are supported), assuming a sequence of
length 10. Triple-colon notation is used everywhere so it uses extended
slicing even in Python 2.5 and earlier::

   S[2:5:] (same as S[2:5])
   S[5:2:] (same as S[5:2], an empty slice)
   S[::] (same as S[:], a copy of the sequence)
   S[:2:] (same as S[:2])
   S[:11:] (same as S[:11], a copy of the sequence)
   S[5::] (same as S[5:])
   S[-11::] (same as S[-11:], a copy of the sequence)
   S[-5:2:1] (same as S[-5:2], an empty slice)
   S[-5:-2:2] ([ S[5], S[7] ])
   S[5:2:-1] (the reverse of S[3:6])
   S[-2:-5:-1] (the reverse of S[-4:-1])

   S[:5:2] ([ S[0], S[2], S[4] ])
   S[9::2] ([ S[9] ])
   S[8::2] ([ S[8] ])
   S[7::2] ([ S[7], S[9]])
   S[1::-1] ([ S[1], S[0] ])
   S[1:0:-1] ([ S[1] ], does not include S[0]!)
   S[1:-1:-1] (an empty slice)
   S[::10] ([ S[0] ])
   S[::-10] ([ S[9] ])

   S[2:5:] = [1, 2, 3] ([ S[2], S[3], S[4] ] become [1, 2, 3])
   S[2:5:] = [1] (S[2] becomes 1, S[3] and S[4] are deleted)
   S[5:2:] = [1, 2, 3] ([1, 2, 3] inserted before S[5])
   S[2:5:2] = [1, 2] ([ S[2], S[4] ] become [1, 2])
   S[5:2:-2] = [1, 2] ([ S[3], S[5] ] become [2, 1])
   S[3::3] = [1, 2, 3] ([ S[3], S[6], S[9] ] become [1, 2, 3])
   S[:-5:-2] = [1, 2] ([ S[7], S[9] ] become [2, 1])

   S[::-1] = S (reverse S in-place awkwardly)
   S[:5:] = S (replaces S[:5] with a copy of S)

   S[2:5:2] = [1, 2, 3] (error: assigning length-3 to slicelength-2)
   S[2:5:2] = None (error: need iterable)
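
A selection of these checks can be run mechanically. The sketch below uses
the builtin list as the reference implementation; ``run_checks`` and the
``fresh`` factory are hypothetical names, and another mutable sequence type
could be tried by swapping the factory.

```python
def run_checks(fresh=lambda: list(range(10))):
    """Run a subset of the listed cases; returns True if all of them hold."""
    S = fresh()
    assert S[5:2:] == []
    assert S[:11:] == S[-11::] == list(range(10))
    assert S[-2:-5:-1] == [8, 7, 6]
    assert S[:5:2] == [0, 2, 4]
    assert S[7::2] == [7, 9]
    assert S[1:0:-1] == [1]
    assert S[1:-1:-1] == []
    assert S[::-10] == [9]

    S = fresh(); S[2:5:] = [1]
    assert S == [0, 1, 1, 5, 6, 7, 8, 9]
    S = fresh(); S[5:2:] = [1, 2, 3]
    assert S == [0, 1, 2, 3, 4, 1, 2, 3, 5, 6, 7, 8, 9]
    S = fresh(); S[5:2:-2] = [1, 2]
    assert (S[3], S[5]) == (2, 1)
    S = fresh(); S[:-5:-2] = [1, 2]
    assert (S[7], S[9]) == (2, 1)

    S = fresh(); S[::-1] = S
    assert S == list(range(9, -1, -1))
    S = fresh(); S[:5:] = S
    assert S == list(range(10)) + [5, 6, 7, 8, 9]

    for bad, expected in (([1, 2, 3], ValueError), (None, TypeError)):
        S = fresh()
        try:
            S[2:5:2] = bad
        except expected:
            pass
        else:
            raise AssertionError("expected %s" % expected.__name__)
    return True


print(run_checks())
```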

From collinw at gmail.com  Fri Aug 24 18:46:29 2007
From: collinw at gmail.com (Collin Winter)
Date: Fri, 24 Aug 2007 09:46:29 -0700
Subject: [Python-3000] Should 2to3 point out *possible*,
	but not definite changes?
In-Reply-To: <ca471dc20708232136w725482d1jc6ffd857984b0758@mail.gmail.com>
References: <a2eb1c260708231950y534b3efajafc0e9dc0e29294a@mail.gmail.com>
	<ca471dc20708232136w725482d1jc6ffd857984b0758@mail.gmail.com>
Message-ID: <43aa6ff70708240946s4fb506a6o53bf1de54b4c96d6@mail.gmail.com>

On 8/23/07, Guido van Rossum <guido at python.org> wrote:
> Yes, I think this would be way cool! I believe there are already a few
> fixers that print messages about things they know are wrong but don't
> know how to fix, those could also be integrated (although arguably
> you'd want those messages to be treated as more severe).

Adrian and I talked about this this morning, and he said he's going to
go ahead with an implementation. The original warning messages were a
good idea, but they tend to get lost when converting large projects.

Collin Winter


> On 8/23/07, Adrian Holovaty <adrian at holovaty.com> wrote:
> > As part of the Python 3000 sprint (at Google's Chicago office), I've
> > been working on the documentation for 2to3. I'm publishing updates at
> > http://red-bean.com/~adrian/2to3.rst and will submit this as a
> > documentation patch when it's completed. (I didn't get as much done
> > today as I would have liked, but I'll be back at it Friday.)
> >
> > In my research of the 2to3 utility, I've been thinking about whether
> > it should be expanded to include the equivalent of "warnings." I know
> > one of its design goals has been to be "dumb but correct," but I
> > propose that including optional warnings would be a bit
> > smarter/helpful, without risking the tool's correctness.
> >
> > Specifically, I propose:
> >
> > *  2to3 gains either an "--include-warnings" option or an
> > "--exclude-warnings" option, depending on which behavior is decided to
> > be default.
> >
> > * If this option is set, the utility would search for an *additional*
> > set of fixes -- fixes that *might* need to be made to the code but
> > cannot be determined with certainty. An example of this is noted in
> > the "Limitations" section of the 2to3 README:
> >
> >       a = apply
> >       a(f, *args)
> >
> > (2to3 cannot handle this because it cannot detect reassignment.)
> >
> > Under my proposal, the utility would notice that "apply" is a builtin
> > whose behavior is changing, and that this is a situation in which the
> > correct 2to3 porting is ambiguous. The utility would designate this in
> > the output with a Python comment on the previous line:
> >
> >       # 2to3note: The semantics of apply() have changed.
> >       a = apply
> >       a(f, *args)
> >
> > Each comment would have a common prefix such as "2to3note" for easy grepping.
> >
> > Given the enormity of the Python 3000 syntax change, I think that the
> > 2to3 utility should provide as much guidance as possible. What it does
> > currently is extremely cool (I daresay miraculous), but I think we can
> > get closer to 100% coverage if we take into account the ambiguous
> > changes.
> >
> > Oh, and I'm happy to (attempt to) write this addition to the tool, as
> > long as the powers that be deem it worthwhile.
> >
> > Thoughts?
> >
> > Adrian
> >
> > --
> > Adrian Holovaty
> > holovaty.com | djangoproject.com
> >
>
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>

From fumanchu at aminus.org  Fri Aug 24 18:35:10 2007
From: fumanchu at aminus.org (Robert Brewer)
Date: Fri, 24 Aug 2007 09:35:10 -0700
Subject: [Python-3000] Removing simple slicing
References: <9e804ac0708240733r1f1781e2m18902834e2218777@mail.gmail.com>
Message-ID: <9BBC2D2B2CCF7E4DA0E212D1BD0CC6FA1FEECA@ex10.hostedexchange.local>

Thomas Wouters wrote:

> 1) It (passively) promotes supporting only simple slicing,
> as observed by the builtin types only supporting extended
> slicing many years after extended slicing was introduced

Should that read "...only supporting simple slicing..."?

> The proposed solution, as implemented in the p3yk-noslice
> SVN branch, gets rid of the simple slicing methods and
> PyType entries. The simple C API (using ``Py_ssize_t``
> for start and stop) remains, but creates a slice object
> as necessary instead. Various types had to be updated to
> support slice objects, or improve the simple slicing case
> of extended slicing.

Am I reading this correctly, that: since the "simple C API
remains", one can still write S[3:8] with only one colon
and have it work as before? Or would it have to be rewritten
to include two colons?


Robert Brewer
fumanchu at aminus.org

From guido at python.org  Fri Aug 24 18:51:02 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 24 Aug 2007 09:51:02 -0700
Subject: [Python-3000] Removing simple slicing
In-Reply-To: <9BBC2D2B2CCF7E4DA0E212D1BD0CC6FA1FEECA@ex10.hostedexchange.local>
References: <9e804ac0708240733r1f1781e2m18902834e2218777@mail.gmail.com>
	<9BBC2D2B2CCF7E4DA0E212D1BD0CC6FA1FEECA@ex10.hostedexchange.local>
Message-ID: <ca471dc20708240951j782700bbj2d5d45457fb2602e@mail.gmail.com>

On 8/24/07, Robert Brewer <fumanchu at aminus.org> wrote:
> Thomas Wouters wrote:
>  > The proposed solution, as implemented in the p3yk-noslice
>  > SVN branch, gets rid of the simple slicing methods and
>  > PyType entries. The simple C API (using ``Py_ssize_t``
>  > for start and stop) remains, but creates a slice object
>  > as necessary instead. Various types had to be updated to
>  > support slice objects, or improve the simple slicing case
>  > of extended slicing.
>
>  Am I reading this correctly, that: since the "simple C API
>  remains", one can still write S[3:8] with only one colon
>  and have it work as before? Or would it have to be rewritten
>  to include two colons?

Don't worry, this syntax won't go away; it will be executed differently.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg at electricrain.com  Fri Aug 24 18:58:24 2007
From: greg at electricrain.com (Gregory P. Smith)
Date: Fri, 24 Aug 2007 09:58:24 -0700
Subject: [Python-3000] Immutable bytes type and bsddb or other IO
In-Reply-To: <ca471dc20708232117o290ccdees4175d057de483f56@mail.gmail.com>
References: <46B7EA06.5040106@v.loewis.de>
	<ca471dc20708062056w362a951ma705452868e41472@mail.gmail.com>
	<46B7FACC.8030503@v.loewis.de>
	<20070822235929.GA12780@electricrain.com>
	<46CD2209.8000408@v.loewis.de>
	<20070823073837.GA14725@electricrain.com>
	<46CD3BFF.5080904@v.loewis.de>
	<20070823171837.GI24059@electricrain.com>
	<46CE5346.10301@canterbury.ac.nz>
	<ca471dc20708232117o290ccdees4175d057de483f56@mail.gmail.com>
Message-ID: <20070824165823.GM24059@electricrain.com>

On Thu, Aug 23, 2007 at 09:17:04PM -0700, Guido van Rossum wrote:
> On 8/23/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> > Gregory P. Smith wrote:
> > > Wasn't a past mailing list thread claiming the bytes type was supposed
> > > to be great for IO?  How's that possible unless we add a lock to the
> > > bytesobject?
> >
> > Doesn't the new buffer protocol provide something for
> > getting a locked view of the data? If so, it seems like
> > bytes should implement that.
> 
> It *does* implement that! So there's the solution: these APIs should
> not insist on bytes but use the buffer API. It's quite a bit of work I
> suspect (especially since you can't use PyArg_ParseTuple with y# any
> more) but worth it.
> 
> BTW PyUnicode should *not* support the buffer API.
> 
> I'll add both of these to the task spreadsheet.

this sounds good, i'll work on it today for bsddb and hashlib.

-greg

From guido at python.org  Fri Aug 24 19:02:26 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 24 Aug 2007 10:02:26 -0700
Subject: [Python-3000] Removing simple slicing
In-Reply-To: <9e804ac0708240733r1f1781e2m18902834e2218777@mail.gmail.com>
References: <9e804ac0708240733r1f1781e2m18902834e2218777@mail.gmail.com>
Message-ID: <ca471dc20708241002y42a35d06t821f0d040ad485c2@mail.gmail.com>

On 8/24/07, Thomas Wouters <thomas at python.org> wrote:
> If nobody cares, I will be checking these patches into the trunk this
> weekend (after updating them), and then update and check in the rest of the
> p3yk-noslice branch into the py3k branch.

In the trunk? I'm concerned that that might make it (ever so slightly)
incompatible with 2.5, and we're trying to make it as easy as possible
to migrate to 2.6. Or perhaps you're just proposing to change the
standard builtin types to always use the extended API, without
removing the possibility of user types (either in C or in Python)
using the simple API, at least in 2.6?

I think in 2.6, if a class defines __{get,set,del}slice__, that should
still be called when simple slice syntax is used in preference to
__{get,set,del}item__. I'm less sure that this is relevant for the C
API; perhaps someone more familiar with numpy could comment. In 3.0
this should all be gone of course.

Apart from that, I'm looking forward to getting this over with, and
checked in to both 2.6 and 3.0!

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Fri Aug 24 19:26:16 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 24 Aug 2007 10:26:16 -0700
Subject: [Python-3000] Immutable bytes type and bsddb or other IO
In-Reply-To: <ca471dc20708232219s5f931788i64af49abebb97e3@mail.gmail.com>
References: <D3DD7578-D515-4F46-A3F3-B96A43AB1FBB@gmail.com>
	<20070822235929.GA12780@electricrain.com>
	<46CD2209.8000408@v.loewis.de>
	<20070823073837.GA14725@electricrain.com>
	<46CD3BFF.5080904@v.loewis.de>
	<20070823171837.GI24059@electricrain.com>
	<46CE5346.10301@canterbury.ac.nz>
	<ca471dc20708232117o290ccdees4175d057de483f56@mail.gmail.com>
	<46CE6808.1070007@v.loewis.de>
	<ca471dc20708232219s5f931788i64af49abebb97e3@mail.gmail.com>
Message-ID: <ca471dc20708241026q2b385b37he6a39a9672b2faa1@mail.gmail.com>

On 8/23/07, Guido van Rossum <guido at python.org> wrote:
> > > BTW PyUnicode should *not* support the buffer API.

> On 8/23/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > Why not? It should set readonly to 1, and format to "u" or "w".

[me again]
> Because the write() method of binary files (and similar places, like
> socket.send() and in the future probably various database objects)
> accept anything that supports the buffer API, but writing a (text)
> string to these is almost certainly a bug. Not supporting the buffer
> API in PyUnicode is IMO preferable to making explicit exceptions for
> PyUnicode in all those places.
>
> I don't think that the savings possible when writing to a text file
> using the UTF-16 or -32 encoding (whichever matches Py_UNICODE_SIZE)
> in the native byte order are worth leaving that bug unchecked.

I looked at the code, and it's even more complicated than that. The
new buffer API continues to make a distinction between binary and
character data, and there's collusion between the bytes and unicode
types so that this works:

  b = b"abc"
  b[1:2] = "X"

even though these things all fail:

  b.extend("XYZ")
  b += "ZYX"

Unfortunately taking the buffer API away from unicode makes things
fail early (before sys.std{in,out,err} are set), so apparently the I/O
library or something else somehow depends on this.

I'll investigate.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From thomas at python.org  Fri Aug 24 19:39:54 2007
From: thomas at python.org (Thomas Wouters)
Date: Fri, 24 Aug 2007 19:39:54 +0200
Subject: [Python-3000] Removing simple slicing
In-Reply-To: <ca471dc20708241002y42a35d06t821f0d040ad485c2@mail.gmail.com>
References: <9e804ac0708240733r1f1781e2m18902834e2218777@mail.gmail.com>
	<ca471dc20708241002y42a35d06t821f0d040ad485c2@mail.gmail.com>
Message-ID: <9e804ac0708241039x502989f5tfe01d4f1318b6082@mail.gmail.com>

On 8/24/07, Guido van Rossum <guido at python.org> wrote:
>
> On 8/24/07, Thomas Wouters <thomas at python.org> wrote:
> > If nobody cares, I will be checking these patches into the trunk this
> > weekend (after updating them), and then update and check in the rest of the
> > p3yk-noslice branch into the py3k branch.
>
> In the trunk? I'm concerned that that might make it (ever so slightly)
> incompatible with 2.5, and we're trying to make it as easy as possible
> to migrate to 2.6. Or perhaps you're just proposing to change the
> standard builtin types to always use the extended API, without
> removing the possibility of user types (either in C or in Python)
> using the simple API, at least in 2.6?


The changes I uploaded only implement (and in some cases, fix some bugs in)
extended slicing support in various builtin types. None of the API changes
would be backported (although 2.6 in py3k-warning-mode should obviously tell
people to not define __getslice__, and instead accept slice objects in
__getitem__. Perhaps even when not in py3k-warnings-mode.)

-- 
Thomas Wouters <thomas at python.org>

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!

From guido at python.org  Fri Aug 24 19:46:46 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 24 Aug 2007 10:46:46 -0700
Subject: [Python-3000] Removing simple slicing
In-Reply-To: <9e804ac0708241039x502989f5tfe01d4f1318b6082@mail.gmail.com>
References: <9e804ac0708240733r1f1781e2m18902834e2218777@mail.gmail.com>
	<ca471dc20708241002y42a35d06t821f0d040ad485c2@mail.gmail.com>
	<9e804ac0708241039x502989f5tfe01d4f1318b6082@mail.gmail.com>
Message-ID: <ca471dc20708241046s6b1d0226tc0bb94d15bfde2dd@mail.gmail.com>

Oh, good! Forget what I said about 2.6 then. :-)

On 8/24/07, Thomas Wouters <thomas at python.org> wrote:
>
>
> On 8/24/07, Guido van Rossum <guido at python.org> wrote:
> > On 8/24/07, Thomas Wouters <thomas at python.org> wrote:
> > > If nobody cares, I will be checking these patches into the trunk this
> > > weekend (after updating them), and then update and check in the rest of the
> > > p3yk-noslice branch into the py3k branch.
> >
> > In the trunk? I'm concerned that that might make it (ever so slightly)
> > incompatible with 2.5, and we're trying to make it as easy as possible
> > to migrate to 2.6. Or perhaps you're just proposing to change the
> > standard builtin types to always use the extended API, without
> > removing the possibility of user types (either in C or in Python)
> > using the simple API, at least in 2.6?
>
> The changes I uploaded only implement (and in some cases, fix some bugs in)
> extended slicing support in various builtin types. None of the API changes
> would be backported (although 2.6 in py3k-warning-mode should obviously tell
> people to not define __getslice__, and instead accept slice objects in
> __getitem__. Perhaps even when not in py3k-warnings-mode.)
>
> --
> Thomas Wouters <thomas at python.org>
>
> Hi! I'm a .signature virus! copy me into your .signature file to help me
> spread!


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From thomas at python.org  Fri Aug 24 19:47:42 2007
From: thomas at python.org (Thomas Wouters)
Date: Fri, 24 Aug 2007 19:47:42 +0200
Subject: [Python-3000] Removing simple slicing
In-Reply-To: <9BBC2D2B2CCF7E4DA0E212D1BD0CC6FA1FEECA@ex10.hostedexchange.local>
References: <9e804ac0708240733r1f1781e2m18902834e2218777@mail.gmail.com>
	<9BBC2D2B2CCF7E4DA0E212D1BD0CC6FA1FEECA@ex10.hostedexchange.local>
Message-ID: <9e804ac0708241047s970fe75m9d2310686b5db096@mail.gmail.com>

On 8/24/07, Robert Brewer <fumanchu at aminus.org> wrote:
>
>  Thomas Wouters wrote:
>
> > 1) It (passively) promotes supporting only simple slicing,
> > as observed by the builtin types only supporting extended
> > slicing many years after extended slicing was introduced
>
> Should that read "...only supporting simple slicing..."?
>
Yes :)

> > The proposed solution, as implemented in the p3yk-noslice
> > SVN branch, gets rid of the simple slicing methods and
> > PyType entries. The simple C API (using ``Py_ssize_t``
> > for start and stop) remains, but creates a slice object
> > as necessary instead. Various types had to be updated to
> > support slice objects, or improve the simple slicing case
> > of extended slicing.
>
> Am I reading this correctly, that: since the "simple C API
> remains", one can still write S[3:8] with only one colon
> and have it work as before? Or would it have to be rewritten
> to include two colons?
>
No. We're just talking about the underlying object API. The methods on
objects that get called. The changes just mean that S[3:8] will behave
exactly like S[3:8:]. Currently, the former calls __getslice__ or
__getitem__ (if __getslice__ does not exist), the latter always calls
__getitem__.
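
A minimal sketch of the behaviour described above, assuming Python 3
semantics (where __getslice__ no longer exists). The Probe class is
hypothetical and only echoes back the key it receives:

```python
class Probe:
    """Hypothetical helper: returns whatever key __getitem__ is given."""

    def __getitem__(self, key):
        return key


p = Probe()
simple = p[3:8]      # one colon
extended = p[3:8:]   # two colons, step omitted

# Both spellings reach __getitem__ with an equal slice object.
print(simple)
print(extended)
```

Under these assumptions a type only needs to handle slice objects in
__getitem__ to support both spellings.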

-- 
Thomas Wouters <thomas at python.org>

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!

From skip at pobox.com  Fri Aug 24 23:00:41 2007
From: skip at pobox.com (skip at pobox.com)
Date: Fri, 24 Aug 2007 16:00:41 -0500
Subject: [Python-3000] Removal of PyArg_Parse()
Message-ID: <18127.18169.547364.692145@montanaro.dyndns.org>

I started in looking at removing PyArg_Parse.  The first module I tackled
was the time module.  That was harder than I thought it would be
(PyArg_Parse is only called from one place), in large part I think because
it can take a number of different types of arguments.  Is there some
recommended way of getting rid of it?  I think I can simply replace it with
PyArg_ParseTuple if the format string is enclosed in parens, but is there a
reasonably mechanical approach if the format string doesn't state that the
argument must be a tuple?

Thx,

Skip

From skip at pobox.com  Fri Aug 24 23:20:50 2007
From: skip at pobox.com (skip at pobox.com)
Date: Fri, 24 Aug 2007 16:20:50 -0500
Subject: [Python-3000] Removal of PyArg_Parse()
In-Reply-To: <18127.18169.547364.692145@montanaro.dyndns.org>
References: <18127.18169.547364.692145@montanaro.dyndns.org>
Message-ID: <18127.19378.582174.753256@montanaro.dyndns.org>


    skip> I started in looking at removing PyArg_Parse.  

Before I go any farther, perhaps I should ask:  Is PyArg_Parse going away or
just its use as the argument parser for METH_OLDARGS functions?

Skip

From mierle at gmail.com  Fri Aug 24 23:22:30 2007
From: mierle at gmail.com (Keir Mierle)
Date: Fri, 24 Aug 2007 14:22:30 -0700
Subject: [Python-3000] Fwd: [issue1015] [PATCH] Updated patch for rich dict
	view (dict().keys()) comparisons
In-Reply-To: <1187990408.37.0.755658114648.issue1015@psf.upfronthosting.co.za>
References: <1187990408.37.0.755658114648.issue1015@psf.upfronthosting.co.za>
Message-ID: <ef5675f30708241422o2aeace9emde34216ae631124a@mail.gmail.com>

I'm sending this to the py3k list to make sure the old patch is not used.

Keir

---------- Forwarded message ----------
From: Keir Mierle <report at bugs.python.org>
Date: Aug 24, 2007 2:20 PM
Subject: [issue1015] [PATCH] Updated patch for rich dict view
(dict().keys()) comparisons
To: mierle at gmail.com

New submission from Keir Mierle:

This is an updated version of the patch I submitted earlier to python-3000;
it is almost identical except it extends the test case to cover more of
the code.

----------
components: Interpreter Core
files: dictview_richcompare_ver2.diff
messages: 55275
nosy: keir
severity: normal
status: open
title: [PATCH] Updated patch for rich dict view (dict().keys()) comparisons
versions: Python 3.0

__________________________________
Tracker <report at bugs.python.org>
<http://bugs.python.org/issue1015>
__________________________________
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dictview_richcompare_ver2.diff
Type: text/x-patch
Size: 4662 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070824/06740548/attachment.bin 

From guido at python.org  Fri Aug 24 23:48:04 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 24 Aug 2007 14:48:04 -0700
Subject: [Python-3000] Removal of PyArg_Parse()
In-Reply-To: <18127.19378.582174.753256@montanaro.dyndns.org>
References: <18127.18169.547364.692145@montanaro.dyndns.org>
	<18127.19378.582174.753256@montanaro.dyndns.org>
Message-ID: <ca471dc20708241448sd6f3d0fx659f0bc7cd182c06@mail.gmail.com>

I think that's a question for Martin von Loewis. Are there any
existing uses (in the core) that are hard to replace with
PyArg_ParseTuple()?

On 8/24/07, skip at pobox.com <skip at pobox.com> wrote:
>
>     skip> I started in looking at removing PyArg_Parse.
>
> Before I go any farther, perhaps I should ask:  Is PyArg_Parse going away or
> just its use as the argument parser for METH_OLDARGS functions?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From nnorwitz at gmail.com  Sat Aug 25 00:32:01 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Fri, 24 Aug 2007 15:32:01 -0700
Subject: [Python-3000] what to do with profilers in the stdlib
Message-ID: <ee2a432c0708241532ndd42617w81efc88faad69176@mail.gmail.com>

We ought to clean up the profiling modules.  There was a long
discussion about this here:

http://mail.python.org/pipermail/python-dev/2005-November/058212.html

Much of the discussion revolved around whether to add lsprof in the
stdlib.  That's been resolved.  It was added.  Now what do we do?

I suggest merging profile and cProfile (which uses _lsprof) similar to
how stringio and pickle are being merged.  This leaves hotshot as odd
man out.  We should remove it.  If we don't remove it, we should try
to merge these modules so they have the same API and capabilities as
much as possible, even if they work in different ways.

The hotshot doc states:

Note

The hotshot module focuses on minimizing the overhead while profiling,
at the expense of long data post-processing times. For common usages
it is recommended to use cProfile instead. hotshot is not maintained
and might be removed from the standard library in the future.

Caveat

The hotshot profiler does not yet work well with threads. It is useful
to use an unthreaded script to run the profiler over the code you're
interested in measuring if at all possible.

n

From guido at python.org  Sat Aug 25 01:05:30 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 24 Aug 2007 16:05:30 -0700
Subject: [Python-3000] what to do with profilers in the stdlib
In-Reply-To: <ee2a432c0708241532ndd42617w81efc88faad69176@mail.gmail.com>
References: <ee2a432c0708241532ndd42617w81efc88faad69176@mail.gmail.com>
Message-ID: <ca471dc20708241605iee75de9p4b7aa02ca8ccbcd4@mail.gmail.com>

I'm still a happy user of profile.py, so I'm probably not the right
one to drive this discussion. :-)

On 8/24/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
> We ought to clean up the profiling modules.  There was a long
> discussion about this here:
>
> http://mail.python.org/pipermail/python-dev/2005-November/058212.html
>
> Much of the discussion revolved around whether to add lsprof in the
> stdlib.  That's been resolved.  It was added.  Now what do we do?
>
> I suggest merging profile and cProfile (which uses _lsprof) similar to
> how stringio and pickle are being merged.  This leaves hotshot as odd
> man out.  We should remove it.  If we don't remove it, we should try
> to merge these modules so they have the same API and capabilities as
> much as possible, even if they work in different ways.
>
> The hotshot doc states:
>
> Note
>
> The hotshot module focuses on minimizing the overhead while profiling,
> at the expense of long data post-processing times. For common usages
> it is recommended to use cProfile instead. hotshot is not maintained
> and might be removed from the standard library in the future.
>
> Caveat
>
> The hotshot profiler does not yet work well with threads. It is useful
> to use an unthreaded script to run the profiler over the code you're
> interested in measuring if at all possible.
>
> n
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip at pobox.com  Sat Aug 25 02:14:32 2007
From: skip at pobox.com (skip at pobox.com)
Date: Fri, 24 Aug 2007 19:14:32 -0500
Subject: [Python-3000] Removal of PyArg_Parse()
In-Reply-To: <ca471dc20708241448sd6f3d0fx659f0bc7cd182c06@mail.gmail.com>
References: <18127.18169.547364.692145@montanaro.dyndns.org>
	<18127.19378.582174.753256@montanaro.dyndns.org>
	<ca471dc20708241448sd6f3d0fx659f0bc7cd182c06@mail.gmail.com>
Message-ID: <18127.29800.339928.420183@montanaro.dyndns.org>


    Guido> Are there any existing uses (in the core) that are hard to
    Guido> replace with PyArg_ParseTuple()?

There are lots of uses where the arguments aren't tuples.  I was
particularly vexed by the time module because it was used to extract
arguments both from tuples and from time.struct_time objects.

I suspect most of the low-hanging fruit (PyArg_Parse used to parse tuples)
has already been plucked.

Skip

From skip at pobox.com  Sat Aug 25 02:16:53 2007
From: skip at pobox.com (skip at pobox.com)
Date: Fri, 24 Aug 2007 19:16:53 -0500
Subject: [Python-3000] what to do with profilers in the stdlib
In-Reply-To: <ee2a432c0708241532ndd42617w81efc88faad69176@mail.gmail.com>
References: <ee2a432c0708241532ndd42617w81efc88faad69176@mail.gmail.com>
Message-ID: <18127.29941.900987.118459@montanaro.dyndns.org>


    Neal> The hotshot doc states:

    Neal> Note

    Neal> The hotshot module focuses on minimizing the overhead while
    Neal> profiling, at the expense of long data post-processing times. For
    Neal> common usages it is recommended to use cProfile instead. hotshot
    Neal> is not maintained and might be removed from the standard library
    Neal> in the future.

    Neal> Caveat

    Neal> The hotshot profiler does not yet work well with threads. It is
    Neal> useful to use an unthreaded script to run the profiler over the
    Neal> code you're interested in measuring if at all possible.

The cProfile module has the same benefit as hotshot (low run-time cost),
without the downside of long post-processing times.  On that basis alone I
would argue that hotshot be dropped.
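
As an illustration of the cProfile workflow, here is a small sketch. The
work() function is made up for the example; only the standard cProfile,
pstats and io modules are used:

```python
import cProfile
import io
import pstats


def work(n):
    """Made-up workload: sum of squares below n."""
    return sum(i * i for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
result = work(10000)
profiler.disable()

# Collect the report in memory instead of writing it to stdout directly.
buf = io.StringIO()
stats = pstats.Stats(profiler, stream=buf)
stats.sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
print(report)
```

The statistics are available as soon as the profiler is disabled; there is
no separate log-loading step.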

Skip

From alexandre at peadrop.com  Sat Aug 25 02:45:53 2007
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Fri, 24 Aug 2007 20:45:53 -0400
Subject: [Python-3000] what to do with profilers in the stdlib
In-Reply-To: <ee2a432c0708241532ndd42617w81efc88faad69176@mail.gmail.com>
References: <ee2a432c0708241532ndd42617w81efc88faad69176@mail.gmail.com>
Message-ID: <acd65fa20708241745t56392ff7k773f89bd618f0e06@mail.gmail.com>

On 8/24/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
> I suggest merging profile and cProfile (which uses _lsprof) similar to
> how stringio and pickle are being merged.

cProfile and profile.py are on my merge to-do. I was supposed to merge
cProfile/profile.py as part of my GSoC, but stringio and pickle have
taken most of my time. So, I will merge the profile modules in my free
time.

> This leaves hotshot as odd man out.  We should remove it.  If we don't
> remove it, we should try to merge these modules so they have the same API
> and capabilities as much as possible, even if they work in different
> ways.

I don't think hotshot has any features that cProfile or profile don't
(but I haven't checked thoroughly yet). So, I agree that it should be
removed.

-- Alexandre

From nnorwitz at gmail.com  Sat Aug 25 03:32:04 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Fri, 24 Aug 2007 18:32:04 -0700
Subject: [Python-3000] what to do with profilers in the stdlib
In-Reply-To: <acd65fa20708241745t56392ff7k773f89bd618f0e06@mail.gmail.com>
References: <ee2a432c0708241532ndd42617w81efc88faad69176@mail.gmail.com>
	<acd65fa20708241745t56392ff7k773f89bd618f0e06@mail.gmail.com>
Message-ID: <ee2a432c0708241832j446d21a6td1913c4a3f2b7c02@mail.gmail.com>

On 8/24/07, Alexandre Vassalotti <alexandre at peadrop.com> wrote:
> On 8/24/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
> > I suggest merging profile and cProfile (which uses _lsprof) similar to
> > how stringio and pickle are being merged.
>
> cProfile and profile.py are on my merge to-do. I was supposed to merge
> cProfile/profile.py as part of my GSoC, but stringio and pickle have
> taken most of my time. So, I will merge the profile modules in my free
> time.

Awesome!  I was hoping you would volunteer.  It looks like you've made
a ton of progress on stringio and pickle so far.  They are more
important to get done. After they are completed, we can finish off the
profile modules.

n

From greg.ewing at canterbury.ac.nz  Sat Aug 25 03:31:07 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 25 Aug 2007 13:31:07 +1200
Subject: [Python-3000] Immutable bytes type and bsddb or other IO
In-Reply-To: <ca471dc20708241026q2b385b37he6a39a9672b2faa1@mail.gmail.com>
References: <D3DD7578-D515-4F46-A3F3-B96A43AB1FBB@gmail.com>
	<20070822235929.GA12780@electricrain.com>
	<46CD2209.8000408@v.loewis.de>
	<20070823073837.GA14725@electricrain.com>
	<46CD3BFF.5080904@v.loewis.de>
	<20070823171837.GI24059@electricrain.com>
	<46CE5346.10301@canterbury.ac.nz>
	<ca471dc20708232117o290ccdees4175d057de483f56@mail.gmail.com>
	<46CE6808.1070007@v.loewis.de>
	<ca471dc20708232219s5f931788i64af49abebb97e3@mail.gmail.com>
	<ca471dc20708241026q2b385b37he6a39a9672b2faa1@mail.gmail.com>
Message-ID: <46CF865B.2050508@canterbury.ac.nz>

Guido van Rossum wrote:
> there's collusion between the bytes and unicode
> types so that this works:
> 
>   b = b"abc"
>   b[1:2] = "X"

Is this intentional? Doesn't it run counter to the idea
that text and bytes should be clearly separated?

> Unfortunately taking the buffer API away from unicode makes things
> fail early

If the buffer API distinguishes between text and binary
buffers, then the binary streams can just accept binary
buffers only, and unicode can keep its buffer API.

--
Greg

From guido at python.org  Sat Aug 25 04:15:49 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 24 Aug 2007 19:15:49 -0700
Subject: [Python-3000] Removal of PyArg_Parse()
In-Reply-To: <18127.29800.339928.420183@montanaro.dyndns.org>
References: <18127.18169.547364.692145@montanaro.dyndns.org>
	<18127.19378.582174.753256@montanaro.dyndns.org>
	<ca471dc20708241448sd6f3d0fx659f0bc7cd182c06@mail.gmail.com>
	<18127.29800.339928.420183@montanaro.dyndns.org>
Message-ID: <ca471dc20708241915y7150aeecg49329815fe5337a9@mail.gmail.com>

On 8/24/07, skip at pobox.com <skip at pobox.com> wrote:
>
>     Guido> Are there any existing uses (in the core) that are hard to
>     Guido> replace with PyArg_ParseTuple()?
>
> There are lots of uses where the arguments aren't tuples.  I was
> particularly vexed by the time module because it was used to extract
> arguments both from tuples and from time.struct_time objects.
>
> I suspect most of the low-hanging fruit (PyArg_Parse used to parse tuples)
> has already been plucked.

Then I don't think it's a priority to try to get rid of it, and maybe
it should just stay.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Sat Aug 25 04:19:12 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 24 Aug 2007 19:19:12 -0700
Subject: [Python-3000] Immutable bytes type and bsddb or other IO
In-Reply-To: <46CF865B.2050508@canterbury.ac.nz>
References: <D3DD7578-D515-4F46-A3F3-B96A43AB1FBB@gmail.com>
	<20070823073837.GA14725@electricrain.com>
	<46CD3BFF.5080904@v.loewis.de>
	<20070823171837.GI24059@electricrain.com>
	<46CE5346.10301@canterbury.ac.nz>
	<ca471dc20708232117o290ccdees4175d057de483f56@mail.gmail.com>
	<46CE6808.1070007@v.loewis.de>
	<ca471dc20708232219s5f931788i64af49abebb97e3@mail.gmail.com>
	<ca471dc20708241026q2b385b37he6a39a9672b2faa1@mail.gmail.com>
	<46CF865B.2050508@canterbury.ac.nz>
Message-ID: <ca471dc20708241919i5d987826sf0994946e4890e28@mail.gmail.com>

On 8/24/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Guido van Rossum wrote:
> > there's collusion between the bytes and unicode
> > types so that this works:
> >
> >   b = b"abc"
> >   b[1:2] = "X"
>
> Is this intentional? Doesn't it run counter to the idea
> that text and bytes should be clearly separated?

Sorry, I wasn't clear. I was describing the status quo, which I am as
unhappy about as you are.

> > Unfortunately taking the buffer API away from unicode makes things
> > fail early
>
> If the buffer API distinguishes between text and binary
> buffers, then the binary streams can just accept binary
> buffers only, and unicode can keep its buffer API.

Yes, but the bytes object is the one doing the work, and for some
reason that I don't yet fathom it asks for character buffers. Probably
because there are tons of places where str and bytes are still being
mixed. :-(

I tried to change the bytes constructor so that bytes(s) is invalid if
isinstance(s, str), forcing one to use bytes(s, <encoding>). This
caused many failures, some of which I could fix, others which seem to
hinge on a fundamental problem (asserting that bytes objects support
the string API).
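
A short sketch of the separation being aimed for. It assumes the behaviour
that later Python 3 releases settled on, not the py3k branch as it stood
when this was written; rejects_str() is a hypothetical helper:

```python
def rejects_str(action):
    """Hypothetical helper: True if calling action() raises TypeError."""
    try:
        action()
    except TypeError:
        return True
    return False


# bytes(s) with a str and no encoding is refused ...
constructor_rejected = rejects_str(lambda: bytes("abc"))
# ... while naming the encoding works.
encoded = bytes("abc", "utf-8")

buf = bytearray(b"abc")


def assign_text():
    buf[1:2] = "X"  # text assigned into a mutable byte buffer


slice_rejected = rejects_str(assign_text)
buf[1:2] = b"X"     # a bytes value is accepted

print(constructor_rejected, slice_rejected, encoded, buf)
```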

More work to do... :-(

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg.ewing at canterbury.ac.nz  Sat Aug 25 04:23:02 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 25 Aug 2007 14:23:02 +1200
Subject: [Python-3000] Removal of PyArg_Parse()
In-Reply-To: <ca471dc20708241915y7150aeecg49329815fe5337a9@mail.gmail.com>
References: <18127.18169.547364.692145@montanaro.dyndns.org>
	<18127.19378.582174.753256@montanaro.dyndns.org>
	<ca471dc20708241448sd6f3d0fx659f0bc7cd182c06@mail.gmail.com>
	<18127.29800.339928.420183@montanaro.dyndns.org>
	<ca471dc20708241915y7150aeecg49329815fe5337a9@mail.gmail.com>
Message-ID: <46CF9286.7040207@canterbury.ac.nz>

Guido van Rossum wrote:
> Then I don't think it's a priority to try to get rid of it, and maybe
> it should just stay.

Maybe it should be renamed to reflect the fact that
it's now general-purpose and no longer used at all
for argument parsing? Perhaps PyObject_Parse?

--
Greg

From eric+python-dev at trueblade.com  Sat Aug 25 04:36:54 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Fri, 24 Aug 2007 22:36:54 -0400
Subject: [Python-3000] PEP 3101 implementation uploaded to the tracker.
In-Reply-To: <46CE5710.2030907@trueblade.com>
References: <46CE5710.2030907@trueblade.com>
Message-ID: <46CF95C6.3020606@trueblade.com>

Per Guido, I've checked a slightly different version of this patch in to 
the py3k branch as revision 57444.  The primary difference is that I 
modified sysmodule.c and unicodeobject.c to start implementing the 
string.Formatter class.
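
For readers who have not tried PEP 3101 formatting yet, a small sketch of
the two entry points follows. It assumes the API as it exists in released
Python 3 versions, which may differ in detail from revision 57444:

```python
import string

# Alignment and width via str.format().
padded = "{0:>6}|{1:<4}|".format("ab", "cd")

# Sign handling: "+" always shows a sign, " " leaves a space for positives.
signed = "{0:+d} {1:+d} {0: d}".format(5, -5)

# The same field syntax through the string.Formatter class.
named = string.Formatter().format("{0}-{name}", 1, name="x")

print(repr(padded))
print(repr(signed))
print(repr(named))
```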

Should I mark the original patch as closed in the tracker?

Eric Smith wrote:
> There are a handful of remaining issues, but it works for the most part.
> 
> http://bugs.python.org/issue1009
> 
> Thanks to Guido and Talin for all of their help the last few days, and 
> thanks to Patrick Maupin for help with the initial implementation.
> 
> Known issues:
> Better error handling, per the PEP.
> 
> Need to write Formatter class.
> 
> test_long is failing, but I don't think it's my doing.
> 
> Need to fix this warning that I introduced when compiling 
> Python/formatter_unicode.c:
> Objects/stringlib/unicodedefs.h:26: warning: `STRINGLIB_CMP' defined but 
> not used
> 
> Need more tests for sign handling for int and float.
> 
> It still supports "()" sign formatting from an earlier PEP version.
> 
> Eric.


From guido at python.org  Sat Aug 25 05:08:15 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 24 Aug 2007 20:08:15 -0700
Subject: [Python-3000] PEP 3101 implementation uploaded to the tracker.
In-Reply-To: <46CF95C6.3020606@trueblade.com>
References: <46CE5710.2030907@trueblade.com> <46CF95C6.3020606@trueblade.com>
Message-ID: <ca471dc20708242008x32ab5525q208daaa9fd352524@mail.gmail.com>

On 8/24/07, Eric Smith <eric+python-dev at trueblade.com> wrote:
> Per Guido, I've checked a slightly different version of this patch in to
> the py3k branch as revision 57444.  The primary difference is that I
> modified sysmodule.c and unicodeobject.c to start implementing the
> string.Formatter class.

Great! I'm looking forward to taking it for a spin.

> Should I mark the original patch as closed in the tracker?

Sure, it's served its purpose.

--Guido

> Eric Smith wrote:
> > There are a handful of remaining issues, but it works for the most part.
> >
> > http://bugs.python.org/issue1009
> >
> > Thanks to Guido and Talin for all of their help the last few days, and
> > thanks to Patrick Maupin for help with the initial implementation.
> >
> > Known issues:
> > Better error handling, per the PEP.
> >
> > Need to write Formatter class.
> >
> > test_long is failing, but I don't think it's my doing.
> >
> > Need to fix this warning that I introduced when compiling
> > Python/formatter_unicode.c:
> > Objects/stringlib/unicodedefs.h:26: warning: `STRINGLIB_CMP' defined but
> > not used
> >
> > Need more tests for sign handling for int and float.
> >
> > It still supports "()" sign formatting from an earlier PEP version.
> >
> > Eric.


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From nnorwitz at gmail.com  Sat Aug 25 05:30:48 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Fri, 24 Aug 2007 20:30:48 -0700
Subject: [Python-3000] Removal of PyArg_Parse()
In-Reply-To: <ca471dc20708241915y7150aeecg49329815fe5337a9@mail.gmail.com>
References: <18127.18169.547364.692145@montanaro.dyndns.org>
	<18127.19378.582174.753256@montanaro.dyndns.org>
	<ca471dc20708241448sd6f3d0fx659f0bc7cd182c06@mail.gmail.com>
	<18127.29800.339928.420183@montanaro.dyndns.org>
	<ca471dc20708241915y7150aeecg49329815fe5337a9@mail.gmail.com>
Message-ID: <ee2a432c0708242030u4f0e5471mf262781f06c64a91@mail.gmail.com>

On 8/24/07, Guido van Rossum <guido at python.org> wrote:
> On 8/24/07, skip at pobox.com <skip at pobox.com> wrote:
> >
> >     Guido> Are there any existing uses (in the core) that are hard to
> >     Guido> replace with PyArg_ParseTuple()?
> >
> > There are lots of uses where the arguments aren't tuples.  I was
> > particularly vexed by the time module because it was used to extract
> > arguments both from tuples and from time.struct_time objects.

There are 45 uses in */*.c spread across 9 modules:
  arraymodule.c, posixmodule.c,
  _hashopenssl.c (2), dbmmodule.c (4), gdbmmodule.c (2),
  mactoolboxglue.c (5), stringobject.c (2)

with Python/mactoolboxglue.c looking like it's low hanging fruit, and
stringobject.c will hopefully go away.  Some of the others don't look
bad.  The bulk of the uses are in the array and posix modules.  I'm not
sure if those are easy to change.

The remaining 65 uses are in Mac modules.  I'm not sure if all of them
are sticking around.  (That's a separate discussion we should
have--which of the mac modules should go.)

> > I suspect most of the low-hanging fruit (PyArg_Parse used to parse tuples)
> > has already been plucked.

I think this is mostly true, but there are still some that are
low-hanging.  Maybe just kill the low hanging fruit for now.

> Then I don't think it's a priority to try to get rid of it, and maybe
> it should just stay.

I agree it's not the biggest priority, but it would be nice if it was
done.  There's still over 500 uses of PyString which is higher
priority, but also probably harder in many cases.

n

From nnorwitz at gmail.com  Sat Aug 25 05:35:47 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Fri, 24 Aug 2007 20:35:47 -0700
Subject: [Python-3000] marshalling bytes objects
Message-ID: <ee2a432c0708242035o21bc3fccmc43b7c4ef2bfec09@mail.gmail.com>

I see in PEP 358 (bytes) http://www.python.org/dev/peps/pep-0358/ that
marshalling bytes is an open issue and needs to be specified.  I'm
converting code objects to use bytes for the bytecode and lnotab.  Is
there anything special to be aware of here?  It seems like it can be
treated like a non-interned string.

n

From nnorwitz at gmail.com  Sat Aug 25 05:49:10 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Fri, 24 Aug 2007 20:49:10 -0700
Subject: [Python-3000] marshalling bytes objects
In-Reply-To: <ee2a432c0708242035o21bc3fccmc43b7c4ef2bfec09@mail.gmail.com>
References: <ee2a432c0708242035o21bc3fccmc43b7c4ef2bfec09@mail.gmail.com>
Message-ID: <ee2a432c0708242049q59bfdbb4l4ab42a885c274c65@mail.gmail.com>

On 8/24/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
> I see in PEP 358 (bytes) http://www.python.org/dev/peps/pep-0358/ that
> marshalling bytes is an open issue and needs to be specified.  I'm
> converting code objects to use bytes for the bytecode and lnotab.  Is
> there anything special to be aware of here?

By "here" I was originally thinking about the marshaling aspect.  But
clearly the mutability of bytes isn't particularly good for code
objects. :-)  This goes back to the question of whether bytes should
be able to be immutable (frozen).
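
For reference, a sketch of how this looks in later Python 3 releases, where
bytecode is stored in the immutable bytes type. That outcome is an
assumption relative to this thread, which predates the decision:

```python
import marshal

# Compile a tiny module and round-trip its code object through marshal.
code = compile("x = 1 + 2", "<example>", "exec")
payload = marshal.dumps(code)
restored = marshal.loads(payload)

# The bytecode is a bytes object and survives the round trip unchanged.
bytecode_is_bytes = isinstance(code.co_code, bytes)
same_bytecode = restored.co_code == code.co_code

namespace = {}
exec(restored, namespace)
print(bytecode_is_bytes, same_bytecode, namespace["x"])
```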

n

From martin at v.loewis.de  Sat Aug 25 06:02:32 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 25 Aug 2007 06:02:32 +0200
Subject: [Python-3000] Removal of PyArg_Parse()
In-Reply-To: <ca471dc20708241915y7150aeecg49329815fe5337a9@mail.gmail.com>
References: <18127.18169.547364.692145@montanaro.dyndns.org>	<18127.19378.582174.753256@montanaro.dyndns.org>	<ca471dc20708241448sd6f3d0fx659f0bc7cd182c06@mail.gmail.com>	<18127.29800.339928.420183@montanaro.dyndns.org>
	<ca471dc20708241915y7150aeecg49329815fe5337a9@mail.gmail.com>
Message-ID: <46CFA9D8.4020603@v.loewis.de>

> Then I don't think it's a priority to try to get rid of it, and maybe
> it should just stay.

I think it would be desirable to get rid of METH_OLDARGS. Ideally, this
should already be possible, as all modules should have been changed to
be explicit about their usage of METH_OLDARGS (rather than relying on
the struct field defaulting to 0), this can be "verified" by running the
test suite once and checking that all ml_flags have one of METH_VARARGS,
METH_NOARGS or METH_O set.

Then it would be possible to drop METH_VARARGS, declaring that a 0
value of ml_flags means the default, which is "arguments are passed
as a tuple".

As for the remaining 50 or so PyArg_Parse calls: most of them convert
a single object to some C representation; it should be possible to
use the proper underlying conversion function.

For example:
- dbm/gdbm convert using s#; this can be replaced with the buffer
  API.
- the array module converts the values on setitem using PyArg_Parse;
  these can be replaced with PyInt_AsLong, except that PyArg_Parse
  also does a range check, which could be moved into a range-checking
  function in arraymodule.
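
The range check mentioned for the array module is visible from Python. The
sketch below shows it, together with a hypothetical checked_int() helper
that mirrors the kind of range-checking function suggested above (the real
one would be written in C):

```python
from array import array


def checked_int(value, lo, hi):
    """Hypothetical helper: return value if lo <= value <= hi."""
    if not lo <= value <= hi:
        raise OverflowError(f"{value} is outside [{lo}, {hi}]")
    return value


a = array("b", [0])                  # "b" holds signed chars
a[0] = checked_int(100, -128, 127)   # in range, stored as-is

try:
    a[0] = 300                       # out of range for a signed char
    overflowed = False
except OverflowError:
    overflowed = True

print(a[0], overflowed)
```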

As for the case of timemodule: the surprising feature is that
"(ii)" uses PySequence_GetItem to access the fields, whereas
PyArg_ParseTuple uses PyTuple_GET_ITEM, so it won't work for
StructSequences.

Regards,
Martin

From adrian at holovaty.com  Sat Aug 25 07:58:57 2007
From: adrian at holovaty.com (Adrian Holovaty)
Date: Sat, 25 Aug 2007 00:58:57 -0500
Subject: [Python-3000] [patch] Should 2to3 point out *possible*,
	but not definite changes?
In-Reply-To: <43aa6ff70708240946s4fb506a6o53bf1de54b4c96d6@mail.gmail.com>
References: <a2eb1c260708231950y534b3efajafc0e9dc0e29294a@mail.gmail.com>
	<ca471dc20708232136w725482d1jc6ffd857984b0758@mail.gmail.com>
	<43aa6ff70708240946s4fb506a6o53bf1de54b4c96d6@mail.gmail.com>
Message-ID: <a2eb1c260708242258n6f9b1934r4907eca764b73563@mail.gmail.com>

On 8/24/07, Collin Winter <collinw at gmail.com> wrote:
> Adrian and I talked about this this morning, and he said he's going to
> go ahead with an implementation. The original warning messages were a
> good idea, but they tend to get lost when converting large projects.

(I assume this is the place to post patches for the 2to3 utility, but
please set me straight if I should use bugs.python.org instead...)

I've attached two patches that implement the 2to3 change discussed in
this thread. In 2to3_insert_comment.diff --

* fixes/util.py gets an insert_comment() function. Give it a Node/Leaf
and a comment message, and it will insert a Python comment before the
given Node/Leaf. This takes indentation into account, such that the
comment will be indented to match the indentation of the line it is
commenting. For example:

    if foo:
        # comment about bar()
        bar()

It also handles existing comments gracefully. If a line already has a
comment above it, the new comment will be added on a new line under
the old one.

* pytree.Base gets two new methods: get_previous_sibling() and
get_previous_in_tree(). These just made it easier and clearer to
implement insert_comment().

* tests/test_util.py has unit tests for insert_comment(), and
tests/test_pytree.py has tests for the two new pytree.Base methods.
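
To make the intended behaviour concrete, here is a simplified sketch that
works on a flat list of source lines. It is not the pytree-based
insert_comment() from the patch; the function below and its signature are
hypothetical:

```python
def insert_comment(lines, index, message):
    """Hypothetical helper: return a copy of lines with a comment inserted
    directly above lines[index], indented like that line."""
    target = lines[index]
    indent = target[:len(target) - len(target.lstrip())]
    comment = f"{indent}# {message}"
    # Inserting right above the target keeps any existing comment first.
    return lines[:index] + [comment] + lines[index:]


source = [
    "if foo:",
    "    # old note",
    "    bar()",
]
result = insert_comment(source, 2, "comment about bar()")
print("\n".join(result))
```

Because the new line goes immediately above the target, an existing comment
stays first and the new one lands under it.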

The other patch, 2to3_comment_warnings.diff, is an example of how we
could integrate this new insert_comment() method to replace the
current functionality of fixes.basefix.BaseFix.warning(). To see this
in action, apply these two patches and run the 2to3 script
(refactor.py) on the following input:

    foo()
    map(f, x)

The resulting output should display a Python comment above the map()
call instead of outputting a warning to stdout, which was the previous
behavior.

If these patches are accepted, the next steps would be to change the
behavior of warns() and warns_unchanged() in tests/test_fixers.py, so
that the tests can catch the new behavior.

Adrian

--
Adrian Holovaty
holovaty.com | djangoproject.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 2to3_comment_warnings.diff
Type: application/octet-stream
Size: 741 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070825/c87ff631/attachment.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 2to3_insert_comment.diff
Type: application/octet-stream
Size: 11031 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070825/c87ff631/attachment-0001.obj 

From stephen at xemacs.org  Sat Aug 25 08:10:04 2007
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 25 Aug 2007 15:10:04 +0900
Subject: [Python-3000] Py3k Sprint Tasks (Google Docs & Spreadsheets)
In-Reply-To: <93DBB66F-5D0D-4E46-8480-D2BFC693722A@python.org>
References: <c09ffb51ed04383961b5e8ff223d43@gmail.com>
	<93DBB66F-5D0D-4E46-8480-D2BFC693722A@python.org>
Message-ID: <87y7g0401v.fsf@uwakimon.sk.tsukuba.ac.jp>

Barry Warsaw writes:

 > I've been spending hours of my own time on the email package for py3k  
 > this week and every time I think I'm nearing success I get defeated  
 > again.

I'm ankle deep in the Big Muddy (daughter tested positive for TB as
expected -- the Japanese inoculate all children against it because of
the sins of their fathers -- and school starts on Tuesday, so we need
to make a bunch of extra trips to doctors and whatnot), so what thin
hope I had of hanging out with the big boys at the Python-3000 sprint
long since evaporated.

However, starting next week I should have a day a week or so I can
devote to email stuff -- if you want to send any thoughts or
requisitions my way (or an URL to sprint IRC transcripts), I'd love to
help.  Of course you'll get it all done and leave none for me, right?

 > But I'm determined to solve the worst of the problems this week.

Bu-wha-ha-ha!

Steve


From skip at pobox.com  Sat Aug 25 15:31:50 2007
From: skip at pobox.com (skip at pobox.com)
Date: Sat, 25 Aug 2007 08:31:50 -0500
Subject: [Python-3000] Removal of PyArg_Parse()
In-Reply-To: <46CFA9D8.4020603@v.loewis.de>
References: <18127.18169.547364.692145@montanaro.dyndns.org>
	<18127.19378.582174.753256@montanaro.dyndns.org>
	<ca471dc20708241448sd6f3d0fx659f0bc7cd182c06@mail.gmail.com>
	<18127.29800.339928.420183@montanaro.dyndns.org>
	<ca471dc20708241915y7150aeecg49329815fe5337a9@mail.gmail.com>
	<46CFA9D8.4020603@v.loewis.de>
Message-ID: <18128.12102.439038.376077@montanaro.dyndns.org>


    Martin> As for the case of timemodule: the surprising feature is that
    Martin> "(ii)" uses PySequence_Getitem to access the fields, whereas
    Martin> PyArg_ParseTuple uses PyTuple_GET_ITEM, so it won't work for
    Martin> StructSequences.

I believe I've already fixed this (r57416) by inserting an intermediate
function to convert time.struct_time objects to tuples before
PyArg_ParseTuple sees them.  It would be nice if someone could take a minute
or two and review that change.

Skip

From guido at python.org  Sat Aug 25 15:34:01 2007
From: guido at python.org (Guido van Rossum)
Date: Sat, 25 Aug 2007 06:34:01 -0700
Subject: [Python-3000] marshalling bytes objects
In-Reply-To: <ee2a432c0708242049q59bfdbb4l4ab42a885c274c65@mail.gmail.com>
References: <ee2a432c0708242035o21bc3fccmc43b7c4ef2bfec09@mail.gmail.com>
	<ee2a432c0708242049q59bfdbb4l4ab42a885c274c65@mail.gmail.com>
Message-ID: <ca471dc20708250634l21a20ef8gae9210e5b03a0d4d@mail.gmail.com>

Can we put this decision off till after the a1 release? At this point
I don't expect PyString to be removed in time for the release, which I
want to be done by August 31.

On 8/24/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
> On 8/24/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
> > I see in PEP 358 (bytes) http://www.python.org/dev/peps/pep-0358/ that
> > marshalling bytes is an open issue and needs to be specified.  I'm
> > converting code objects to use bytes for the bytecode and lnotab.  Is
> > there anything special to be aware of here?
>
> By "here" I was originally thinking about the marshaling aspect.  But
> clearly the mutability of bytes isn't particularly good for code
> objects. :-)  This goes back to the question of whether bytes should
> be able to be immutable (frozen).

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Sat Aug 25 15:36:01 2007
From: guido at python.org (Guido van Rossum)
Date: Sat, 25 Aug 2007 06:36:01 -0700
Subject: [Python-3000] Removing email package until it's fixed
Message-ID: <ca471dc20708250636t5c0c6c1er50621f837a1a0f0b@mail.gmail.com>

FYI, I'm removing the email package from the py3k branch for now.
If/when Barry has a working version we'll add it back. Given that it's
so close to the release I'd rather release without the email package
than with a broken one. If Barry finishes it after the a1 release,
people who need it can always download his version directly.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip at pobox.com  Sat Aug 25 15:44:58 2007
From: skip at pobox.com (skip at pobox.com)
Date: Sat, 25 Aug 2007 08:44:58 -0500
Subject: [Python-3000] Removal of PyArg_Parse()
In-Reply-To: <ee2a432c0708242030u4f0e5471mf262781f06c64a91@mail.gmail.com>
References: <18127.18169.547364.692145@montanaro.dyndns.org>
	<18127.19378.582174.753256@montanaro.dyndns.org>
	<ca471dc20708241448sd6f3d0fx659f0bc7cd182c06@mail.gmail.com>
	<18127.29800.339928.420183@montanaro.dyndns.org>
	<ca471dc20708241915y7150aeecg49329815fe5337a9@mail.gmail.com>
	<ee2a432c0708242030u4f0e5471mf262781f06c64a91@mail.gmail.com>
Message-ID: <18128.12890.680091.978246@montanaro.dyndns.org>


    Neal> with Python/mactoolboxglue.c looking like it's low hanging fruit,

I already took care of the easy cases there, though I haven't checked it in
yet.

    Neal> The remaining 65 uses are in Mac modules.  I'm not sure if all of
    Neal> them are sticking around.  (That's a separate discussion we should
    Neal> have--which of the mac modules should go.)

As I understand it, these are generated by bgen.  Presumably we could change
that code and regenerate those modules.

Skip

From lists at cheimes.de  Sat Aug 25 16:47:43 2007
From: lists at cheimes.de (Christian Heimes)
Date: Sat, 25 Aug 2007 16:47:43 +0200
Subject: [Python-3000] [patch] roman.py
Message-ID: <fapfev$6s6$1@sea.gmane.org>

The patch fixes roman.py for Py3k (<> and raise fixes).

Christian
-------------- next part --------------
A non-text attachment was scrubbed...
Name: roman.diff
Type: text/x-patch
Size: 1213 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070825/f8883817/attachment.bin 

From guido at python.org  Sat Aug 25 16:56:17 2007
From: guido at python.org (Guido van Rossum)
Date: Sat, 25 Aug 2007 07:56:17 -0700
Subject: [Python-3000] [patch] roman.py
In-Reply-To: <fapfev$6s6$1@sea.gmane.org>
References: <fapfev$6s6$1@sea.gmane.org>
Message-ID: <ca471dc20708250756k5fe3d67ayc52528dd4cfde00@mail.gmail.com>

Thanks, applied.

There's a lot more to being able to run "make html PYTHON=python3.0"
successfully, isn't there?

On 8/25/07, Christian Heimes <lists at cheimes.de> wrote:
> The patch fixes roman.py for Py3k (<> and raise fixes).

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From lists at cheimes.de  Sat Aug 25 18:27:53 2007
From: lists at cheimes.de (Christian Heimes)
Date: Sat, 25 Aug 2007 18:27:53 +0200
Subject: [Python-3000] [patch] io.py improvements
Message-ID: <faplav$mla$1@sea.gmane.org>

The patch improves io.py and socket.py's SocketIO:

* I've removed all asserts and replaced them with explicit raises
* I've added four convenient methods _check_readable, _check_writable,
_check_seekable and _check_closed. The methods take an optional msg
argument for future usage.
* unit tests for the stdin.name ... and io.__all__.
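
A minimal sketch of the assert-to-raise change listed above, for
illustration only. The helper names and the optional msg argument come
from the description; the class name SketchIO, the constructor flags,
the exception types and the message texts are assumptions and are not
taken from the patch.

```python
class SketchIO:
    """Hypothetical stand-in for an io.py class; not the patch itself."""

    def __init__(self, readable=True, writable=True, seekable=True):
        self._readable = readable
        self._writable = writable
        self._seekable = seekable
        self.closed = False

    # Explicit raises survive "python -O", which strips assert statements.
    def _check_closed(self, msg=None):
        if self.closed:
            raise ValueError("I/O operation on closed file."
                             if msg is None else msg)

    def _check_readable(self, msg=None):
        if not self._readable:
            raise IOError("Stream is not readable." if msg is None else msg)

    def _check_writable(self, msg=None):
        if not self._writable:
            raise IOError("Stream is not writable." if msg is None else msg)

    def _check_seekable(self, msg=None):
        if not self._seekable:
            raise IOError("Stream is not seekable." if msg is None else msg)

    def read(self):
        # Previously this kind of precondition would have been an assert.
        self._check_closed()
        self._check_readable()
        return b""

    def close(self):
        self.closed = True
```

With this shape a caller gets a catchable exception with a message
rather than a bare AssertionError, and the check still runs when
asserts are disabled.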

open problems:

The io.__all__ tuple contains a reference to SocketIO but SocketIO is in
socket.py, so from io import * fails. Should the facade SocketIO class
be moved to io.py or should SocketIO be removed from io.__all__?
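
The failure mode can be reproduced without touching io.py. This is a
sketch under stated assumptions: "fakeio" is a made-up module built at
runtime, standing in for a module whose __all__ lists a name it does
not define.

```python
import sys
import types

# Hypothetical module; "fakeio" does not exist outside this example.
fake = types.ModuleType("fakeio")
fake.FileIO = object                      # a name the module really defines
fake.__all__ = ("FileIO", "SocketIO")     # SocketIO is listed but missing
sys.modules["fakeio"] = fake


def star_import_fails():
    """Return True if "from fakeio import *" raises AttributeError."""
    namespace = {}
    try:
        exec("from fakeio import *", namespace)
    except AttributeError:
        return True
    return False
```

A star import looks up every name in __all__ on the module, so one
missing entry is enough to make the whole import fail.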

The predecessor of the patch was discussed in the sf.net bug tracker
http://bugs.python.org/issue1771364

Christian
-------------- next part --------------
A non-text attachment was scrubbed...
Name: io_assert.patch
Type: text/x-patch
Size: 10330 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070825/f76066fb/attachment-0001.bin 

From nnorwitz at gmail.com  Sat Aug 25 18:30:58 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Sat, 25 Aug 2007 09:30:58 -0700
Subject: [Python-3000] marshalling bytes objects
In-Reply-To: <ca471dc20708250634l21a20ef8gae9210e5b03a0d4d@mail.gmail.com>
References: <ee2a432c0708242035o21bc3fccmc43b7c4ef2bfec09@mail.gmail.com>
	<ee2a432c0708242049q59bfdbb4l4ab42a885c274c65@mail.gmail.com>
	<ca471dc20708250634l21a20ef8gae9210e5b03a0d4d@mail.gmail.com>
Message-ID: <ee2a432c0708250930u3e3bb26an9c25e8972196ee4d@mail.gmail.com>

On 8/25/07, Guido van Rossum <guido at python.org> wrote:
> Can we put this decision off till after the a1 release?

Yes.

> At this point
> I don't expect PyString to be removed in time for the release, which I
> want to be done by August 31.

Agreed.  I plan to make a patch for this and upload it.  All tests
except modulefinder pass (I'm not sure why).  There is a hack in
marshal to convert co_code and co_lnotab to a bytes object after
reading in a string.

n
--
>
> On 8/24/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
> > On 8/24/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
> > > I see in PEP 358 (bytes) http://www.python.org/dev/peps/pep-0358/ that
> > > marshalling bytes is an open issue and needs to be specified.  I'm
> > > converting code objects to use bytes for the bytecode and lnotab.  Is
> > > there anything special to be aware of here?
> >
> > By "here" I was originally thinking about the marshaling aspect.  But
> > clearly the mutability of bytes isn't particularly good for code
> > objects. :-)  This goes back to the question of whether bytes should
> > be able to be immutable (frozen).
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>

From guido at python.org  Sat Aug 25 18:40:06 2007
From: guido at python.org (Guido van Rossum)
Date: Sat, 25 Aug 2007 09:40:06 -0700
Subject: [Python-3000] [patch] io.py improvements
In-Reply-To: <faplav$mla$1@sea.gmane.org>
References: <faplav$mla$1@sea.gmane.org>
Message-ID: <ca471dc20708250940w597ff317h5f165fbcd1f9fef8@mail.gmail.com>

Would you mind uploading this to the new tracker at bugs.python.org?
And you can close the predecessor of the patch there (unless you want
to reuse that one).

(If you're having trouble using the tracker, you may need to reset
your password -- it'll send an email with a new password to your SF
account. You can then edit your profile to change the password once
again and reset the email.)

--Guido

On 8/25/07, Christian Heimes <lists at cheimes.de> wrote:
> The patch improves io.py and socket.py's SocketIO:
>
> * I've removed all asserts and replaced them with explicit raises
> * I've added four convenient methods _check_readable, _check_writable,
> _check_seekable and _check_closed. The methods take an optional msg
> argument for future usage.
> * unit tests for the stdin.name ... and io.__all__.
>
> open problems:
>
> The io.__all__ tuple contains a reference to SocketIO but SocketIO is in
> socket.py, so from io import * fails. Should the facade SocketIO class
> be moved to io.py or should SocketIO be removed from io.__all__?
>
> The predecessor of the patch was discussed in the sf.net bug tracker
> http://bugs.python.org/issue1771364
>
> Christian
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From lists at cheimes.de  Sat Aug 25 18:59:05 2007
From: lists at cheimes.de (Christian Heimes)
Date: Sat, 25 Aug 2007 18:59:05 +0200
Subject: [Python-3000] [patch] io.py improvements
In-Reply-To: <ca471dc20708250940w597ff317h5f165fbcd1f9fef8@mail.gmail.com>
References: <faplav$mla$1@sea.gmane.org>
	<ca471dc20708250940w597ff317h5f165fbcd1f9fef8@mail.gmail.com>
Message-ID: <46D05FD9.7080607@cheimes.de>

Guido van Rossum wrote:
> Would you mind uploading this to the new tracker at bugs.python.org?
> And you can close the predecessor of the patch there (unless you want
> to reuse that one).
> 
> (If you're having trouble using the tracker, you may need to reset
> your password -- it'll send an email with a new password to your SF
> account. You can then edit your profile to change the password once
> again and reset the email.)

I'd like to close the bug but I'm not the owner of the bug any more. In
fact all my bug reports and patches aren't assigned to me any more. I
thought that I'd keep the assignments after the migration. Is it a bug?

Christian

From fdrake at acm.org  Sat Aug 25 21:12:05 2007
From: fdrake at acm.org (Fred Drake)
Date: Sat, 25 Aug 2007 15:12:05 -0400
Subject: [Python-3000] Removing email package until it's fixed
In-Reply-To: <ca471dc20708250636t5c0c6c1er50621f837a1a0f0b@mail.gmail.com>
References: <ca471dc20708250636t5c0c6c1er50621f837a1a0f0b@mail.gmail.com>
Message-ID: <8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org>

On Aug 25, 2007, at 9:36 AM, Guido van Rossum wrote:
> FYI, I'm removing the email package from the py3k branch for now.
> If/when Barry has a working version we'll add it back. Given that it's
> so close to the release I'd rather release without the email package
> than with a broken one. If Barry finishes it after the a1 release,
> people who need it can always download his version directly.

Alternately, we could move toward separate libraries for such  
components; this allows separate packages to have separate  
maintenance cycles, and makes it easier for applications to pick up  
bug fixes.


   -Fred

-- 
Fred Drake   <fdrake at acm.org>




From greg at electricrain.com  Sat Aug 25 21:38:15 2007
From: greg at electricrain.com (Gregory P. Smith)
Date: Sat, 25 Aug 2007 12:38:15 -0700
Subject: [Python-3000] Removal of PyArg_Parse()
In-Reply-To: <ee2a432c0708242030u4f0e5471mf262781f06c64a91@mail.gmail.com>
References: <18127.18169.547364.692145@montanaro.dyndns.org>
	<18127.19378.582174.753256@montanaro.dyndns.org>
	<ca471dc20708241448sd6f3d0fx659f0bc7cd182c06@mail.gmail.com>
	<18127.29800.339928.420183@montanaro.dyndns.org>
	<ca471dc20708241915y7150aeecg49329815fe5337a9@mail.gmail.com>
	<ee2a432c0708242030u4f0e5471mf262781f06c64a91@mail.gmail.com>
Message-ID: <20070825193815.GO24059@electricrain.com>

On Fri, Aug 24, 2007 at 08:30:48PM -0700, Neal Norwitz wrote:
> On 8/24/07, Guido van Rossum <guido at python.org> wrote:
> > On 8/24/07, skip at pobox.com <skip at pobox.com> wrote:
> > >
> > >     Guido> Are there any existing uses (in the core) that are hard to
> > >     Guido> replace with PyArg_ParseTuple()?
> > >
> > > There are lots of uses where the arguments aren't tuples.  I was
> > > particularly vexed by the time module because it was used to extract
> > > arguments both from tuples and from time.struct_time objects.
> 
> There are 45 uses in */*.c spread across 9 modules:
>   arraymodule.c, posixmodule.c,
>   _hashopenssl.c (2), dbmmodule.c (4), gdbmmodule.c (2),
>   mactoolboxglue.c (5), stringobject.c (2)

_hashopenssl.c will stop using it soon enough as I modify it to take
objects supporting the buffer api.

-greg

From guido at python.org  Sat Aug 25 22:50:06 2007
From: guido at python.org (Guido van Rossum)
Date: Sat, 25 Aug 2007 13:50:06 -0700
Subject: [Python-3000] Removing email package until it's fixed
In-Reply-To: <8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org>
References: <ca471dc20708250636t5c0c6c1er50621f837a1a0f0b@mail.gmail.com>
	<8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org>
Message-ID: <ca471dc20708251350j6790a6cq954365fa3949db9c@mail.gmail.com>

Works for me. Barry?

On 8/25/07, Fred Drake <fdrake at acm.org> wrote:
> On Aug 25, 2007, at 9:36 AM, Guido van Rossum wrote:
> > FYI, I'm removing the email package from the py3k branch for now.
> > If/when Barry has a working version we'll add it back. Given that it's
> > so close to the release I'd rather release without the email package
> > than with a broken one. If Barry finishes it after the a1 release,
> > people who need it can always download his version directly.
>
> Alternately, we could move toward separate libraries for such
> components; this allows separate packages to have separate
> maintenance cycles, and makes it easier for applications to pick up
> bug fixes.
>
>
>    -Fred
>
> --
> Fred Drake   <fdrake at acm.org>
>
>
>
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From baranguren at gmail.com  Sat Aug 25 20:09:20 2007
From: baranguren at gmail.com (Benjamin Aranguren)
Date: Sat, 25 Aug 2007 11:09:20 -0700
Subject: [Python-3000] backported ABC
Message-ID: <f413c71e0708251109t31a92bedrdd5003d5053c852@mail.gmail.com>

Worked with Alex Martelli at the Google Python Sprint.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20070825/6486578b/attachment.htm 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pyABC_backport_to_2_6.patch
Type: text/x-patch
Size: 1521 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070825/6486578b/attachment.bin 

From brett at python.org  Sun Aug 26 00:00:15 2007
From: brett at python.org (Brett Cannon)
Date: Sat, 25 Aug 2007 15:00:15 -0700
Subject: [Python-3000] Removing email package until it's fixed
In-Reply-To: <8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org>
References: <ca471dc20708250636t5c0c6c1er50621f837a1a0f0b@mail.gmail.com>
	<8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org>
Message-ID: <bbaeab100708251500s48810d7fn521dcc74c2a42db7@mail.gmail.com>

On 8/25/07, Fred Drake <fdrake at acm.org> wrote:
> On Aug 25, 2007, at 9:36 AM, Guido van Rossum wrote:
> > FYI, I'm removing the email package from the py3k branch for now.
> > If/when Barry has a working version we'll add it back. Given that it's
> > so close to the release I'd rather release without the email package
> > than with a broken one. If Barry finishes it after the a1 release,
> > people who need it can always download his version directly.
>
> Alternately, we could move toward separate libraries for such
> components; this allows separate packages to have separate
> maintenance cycles, and makes it easier for applications to pick up
> bug fixes.

Are you suggesting just leaving email out of the core and having
people download it as necessary?  Or just having it developed
externally and thus have its own release schedule, but then pull in
the latest stable release when we do a new Python release?

I don't like the former, but the latter is intriguing.  If we could
host large packages (e.g., email, sqlite, ctypes, etc.) on python.org
by providing tracker, svn, and web space they could be developed and
released on their own schedule.  The Python release would then
become a sumo release of these various packages.  People could release
code that still depends flatly on a specific Python version (and thus
not have external dependencies), or say it needs Python 2.6
+ email 42.2 or something (if some feature is really needed).  But
obviously this ups the resource needs on Python's infrastructure so I
don't know how reasonable it really is in the end.

-Brett

From greg at krypto.org  Sun Aug 26 00:26:14 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Sat, 25 Aug 2007 15:26:14 -0700
Subject: [Python-3000] Removing email package until it's fixed
In-Reply-To: <bbaeab100708251500s48810d7fn521dcc74c2a42db7@mail.gmail.com>
References: <ca471dc20708250636t5c0c6c1er50621f837a1a0f0b@mail.gmail.com>
	<8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org>
	<bbaeab100708251500s48810d7fn521dcc74c2a42db7@mail.gmail.com>
Message-ID: <20070825222614.GQ24059@electricrain.com>

On Sat, Aug 25, 2007 at 03:00:15PM -0700, Brett Cannon wrote:
> On 8/25/07, Fred Drake <fdrake at acm.org> wrote:
> > Alternately, we could move toward separate libraries for such
> > components; this allows separate packages to have separate
> > maintenance cycles, and makes it easier for applications to pick up
> > bug fixes.
> 
> Are you suggesting just leaving email out of the core and having
> people download it as necessary?  Or just having it developed
> externally and thus have its own release schedule, but then pull in
> the latest stable release when we do a new Python release?
> 
> I don't like the former, but the latter is intriguing.  If we could
> host large packages (e.g., email, sqlite, ctypes, etc.) on python.org
> by providing tracker, svn, and web space they could be developed and
> released on their own schedule.  The Python release would then
> become a sumo release of these various packages.  People could release
> code that still depends flatly on a specific Python version (and thus
> not have external dependencies), or say it needs Python 2.6
> + email 42.2 or something (if some feature is really needed).  But
> obviously this ups the resource needs on Python's infrastructure so I
> don't know how reasonable it really is in the end.
> 
> -Brett

Agreed, the latter of still pulling in the latest stable release when
doing a new Python release is preferred.  Libraries not included with
the standard library set in python distributions are much less likely
to be used because not all python installs will include them by default.

I think something better than 'latest stable release' of any given
module would make sense.  Presumably we'd want to keep the rule of
no feature/API changes within a given Python 3.x release's standard
library?

Or would that just become "no backwards incompatible API changes" to
allow for new features; all such modules would need to include their
own version info.  In that case we should make it easy to specify an
API version at import time causing an ImportError if the API version
is not met.

brainstorm:

import spam api 3.0
from spam api 3.0 import eggs as chickens

import spam(3.0)
from spam(3.0) import eggs as chickens

It could get annoying to need to think much about package versions in
import statements.  It's much less casual; it should not be required.
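
The same check can be sketched today without new syntax. Everything
here is hypothetical: the helper name require_api and the
__api_version__ attribute are assumptions made for illustration, not
an existing convention.

```python
import importlib


def require_api(name, minimum):
    """Import module `name`; raise ImportError if its API is too old.

    Assumes (hypothetically) that the module exposes an
    `__api_version__` tuple such as (3, 0).
    """
    module = importlib.import_module(name)
    found = getattr(module, "__api_version__", None)
    if found is None or tuple(found) < tuple(minimum):
        raise ImportError("%s: need API >= %r, found %r"
                          % (name, tuple(minimum), found))
    return module
```

Usage would look like spam = require_api("spam", (3, 0)), which keeps
the version requirement optional: code that does not care just uses a
plain import.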

-gps

From p.f.moore at gmail.com  Sun Aug 26 00:33:16 2007
From: p.f.moore at gmail.com (Paul Moore)
Date: Sat, 25 Aug 2007 23:33:16 +0100
Subject: [Python-3000] Removing email package until it's fixed
In-Reply-To: <bbaeab100708251500s48810d7fn521dcc74c2a42db7@mail.gmail.com>
References: <ca471dc20708250636t5c0c6c1er50621f837a1a0f0b@mail.gmail.com>
	<8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org>
	<bbaeab100708251500s48810d7fn521dcc74c2a42db7@mail.gmail.com>
Message-ID: <79990c6b0708251533tf0a2dd0k3c07e94fd52736bc@mail.gmail.com>

On 25/08/07, Brett Cannon <brett at python.org> wrote:
> On 8/25/07, Fred Drake <fdrake at acm.org> wrote:
> > On Aug 25, 2007, at 9:36 AM, Guido van Rossum wrote:
> > > FYI, I'm removing the email package from the py3k branch for now.
> > > If/when Barry has a working version we'll add it back. Given that it's
> > > so close to the release I'd rather release without the email package
> > > than with a broken one. If Barry finishes it after the a1 release,
> > > people who need it can always download his version directly.
> >
> > Alternately, we could move toward separate libraries for such
> > components; this allows separate packages to have separate
> > maintenance cycles, and makes it easier for applications to pick up
> > bug fixes.
>
> Are you suggesting just leaving email out of the core and having
> people download it as necessary?  Or just having it developed
> externally and thus have its own release schedule, but then pull in
> the latest stable release when we do a new Python release?

FWIW, I'm very much against moving email out of the core. This has
been discussed a number of times before, and as far as I am aware, no
conclusion reached. However, the "batteries included" approach of
Python is a huge benefit for me. Every time I have to endure writing
Perl, I find some module that I don't have available as standard. I
can download it, sure, but I can't *rely* on it.

No matter how good eggs and/or PyPI get, please let's keep the
standard library with the "batteries included" philosophy.

(Apologies if removing email permanently was never the intention - you
just touched a nerve there!)

Paul.

From janssen at parc.com  Sun Aug 26 01:00:27 2007
From: janssen at parc.com (Bill Janssen)
Date: Sat, 25 Aug 2007 16:00:27 PDT
Subject: [Python-3000] Removing email package until it's fixed
In-Reply-To: <79990c6b0708251533tf0a2dd0k3c07e94fd52736bc@mail.gmail.com> 
References: <ca471dc20708250636t5c0c6c1er50621f837a1a0f0b@mail.gmail.com>
	<8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org>
	<bbaeab100708251500s48810d7fn521dcc74c2a42db7@mail.gmail.com>
	<79990c6b0708251533tf0a2dd0k3c07e94fd52736bc@mail.gmail.com>
Message-ID: <07Aug25.160033pdt."57996"@synergy1.parc.xerox.com>

> FWIW, I'm very much against moving email out of the core. This has
> been discussed a number of times before, and as far as I am aware, no
> conclusion reached. However, the "batteries included" approach of
> Python is a huge benefit for me.

I agree.  But if the current code doesn't work with 3K, not sure what
else to do.  I guess it could just be labelled a "show-stopper" till
it's fixed.

Bill

From greg at krypto.org  Sun Aug 26 01:30:30 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Sat, 25 Aug 2007 16:30:30 -0700
Subject: [Python-3000] Removing email package until it's fixed
In-Reply-To: <7829056871282917102@unknownmsgid>
References: <ca471dc20708250636t5c0c6c1er50621f837a1a0f0b@mail.gmail.com>
	<8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org>
	<bbaeab100708251500s48810d7fn521dcc74c2a42db7@mail.gmail.com>
	<79990c6b0708251533tf0a2dd0k3c07e94fd52736bc@mail.gmail.com>
	<7829056871282917102@unknownmsgid>
Message-ID: <52dc1c820708251630u17e3e96by1c447747a5896d10@mail.gmail.com>

On 8/25/07, Bill Janssen <janssen at parc.com> wrote:
> > FWIW, I'm very much against moving email out of the core. This has
> > been discussed a number of times before, and as far as I am aware, no
> > conclusion reached. However, the "batteries included" approach of
> > Python is a huge benefit for me.
>
> I agree.  But if the current code doesn't work with 3K, not sure what
> else to do.  I guess it could just be labelled a "show-stopper" till
> it's fixed.

yeah, relax.  it's not as if it's going away for good, just long
enough to get 3.0a1 out.  though by the time py3k is popular maybe
sms and jabber libraries would be more useful. ;)

From greg at krypto.org  Sun Aug 26 02:54:13 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Sat, 25 Aug 2007 17:54:13 -0700
Subject: [Python-3000] PyBuffer ndim unsigned
Message-ID: <52dc1c820708251754w467f207amf09c5d6deea89cb0@mail.gmail.com>

Anyone mind if I do this?

--- Include/object.h    (revision 57412)
+++ Include/object.h    (working copy)
@@ -148,7 +148,7 @@
         Py_ssize_t itemsize;  /* This is Py_ssize_t so it can be
                                  pointed to by strides in simple case.*/
         int readonly;
-        int ndim;
+        unsigned int ndim;
         char *format;
         Py_ssize_t *shape;
         Py_ssize_t *strides;


PEP 3118 and all reality as I know it says ndim must be >= 0 so it
makes sense to me.

From barry at python.org  Sun Aug 26 03:51:23 2007
From: barry at python.org (Barry Warsaw)
Date: Sat, 25 Aug 2007 21:51:23 -0400
Subject: [Python-3000] Removing email package until it's fixed
In-Reply-To: <07Aug25.160033pdt."57996"@synergy1.parc.xerox.com>
References: <ca471dc20708250636t5c0c6c1er50621f837a1a0f0b@mail.gmail.com>
	<8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org>
	<bbaeab100708251500s48810d7fn521dcc74c2a42db7@mail.gmail.com>
	<79990c6b0708251533tf0a2dd0k3c07e94fd52736bc@mail.gmail.com>
	<07Aug25.160033pdt."57996"@synergy1.parc.xerox.com>
Message-ID: <70E776FF-416F-4F40-A86C-7CDD1D76C26A@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 25, 2007, at 7:00 PM, Bill Janssen wrote:

>> FWIW, I'm very much against moving email out of the core. This has
>> been discussed a number of times before, and as far as I am aware, no
>> conclusion reached. However, the "batteries included" approach of
>> Python is a huge benefit for me.
>
> I agree.  But if the current code doesn't work with 3K, not sure what
> else to do.  I guess it could just be labelled a "show-stopper" till
> it's fixed.

Just a quick reply for right now, more later.

email /will/ be made to work with py3k and I'm against removing it  
permanently unless as part of a py3k-wide policy to detach large  
parts of the stdlib.  I made a lot of progress last week and I intend  
to continue working on it until it passes all the tests.  Please  
check out the temporary sandbox version until then.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRtDcnHEjvBPtnXfVAQJlnQQAlQORbGXzqnhw6z+5PGkTrb2p3kpHE2rf
AN/MYQ+sF7ASMHiNE9ZqKvbOjsNi7HW49LdBcJ6ySOYolzo8k1+pjh0HJCt6ROST
T4hPSFBIHtOtlBtg3LAo8q+y5fAynviSE2r7jn+LyezdVD9vTJnTGJlGWtYoIHZt
+LDF5uY4arc=
=Z1gN
-----END PGP SIGNATURE-----

From guido at python.org  Sun Aug 26 03:55:31 2007
From: guido at python.org (Guido van Rossum)
Date: Sat, 25 Aug 2007 18:55:31 -0700
Subject: [Python-3000] Limitations of "batteries included"
Message-ID: <ca471dc20708251855p19ec32a4ib159d92fe2b28be1@mail.gmail.com>

[Subject was: Removing email package until it's fixed]

I find there are pluses and minuses to the "batteries included"
philosophy. Not so much in the case of the email package (which I'm
sure will be back before 3.0 final is released), but in general, in
order for a package to be suitable for inclusion in the core, it must
have pretty much stopped evolving, or at least its evolution rate must
have slowed down to the same 18-24 month feature release cycle that
the core language and library experience.

Take for example GUI packages. Tkinter is far from ideal, but there
are many competitors, none of them perfect (not even those packages
specifically designed to be platform-neutral). We can't very well
include all of the major packages (PyQt, PyGtk, wxPython, anygui) --
the release would just bloat tremendously, and getting stable versions
of all of these would just be a maintenance nightmare. (I don't know
how Linux distros do it, but they tend to have a large group of people
*just* devoted to *bundling* stuff, and their release cycles are even
slower. I don't think Python should be in that business.)

Database wrappers are in the same boat, and IMO the approach of
separately downloadable 3rd party wrappers (sometimes multiple
competing wrappers for the same database) has served the users well.

Or consider the major pain caused by PyXML (xmlplus), which tried to
pre-empt the namespace of the built-in xml package, causing endless
confusion and breakage.

Would anyone seriously consider including something like Django,
TurboGears or Pylons in a Python release? I hope not -- these all
evolve at a rate about 10x that of Python, and the version included
with a core distribution would be out of date (and a nuisance to
replace) within months of the core release.

I believe the only reasonable solution is to promote the use of
package managers, and to let go of the "batteries included" philosophy
when it comes to major external functionality. When it links to
something that requires me to install a pre-built external
non-Python bundle anyway (e.g. Berkeley Db, Sqlite, and others), the
included battery is useless until it is "charged" by installing that
dependency; the Python wrapper might as well be managed by the same
package manager.

Now, there's plenty of pure Python (or Python-specific) functionality
for which "batteries included" makes total sense, including the email
package, wsgiref, XML processing, and more; it's often a judgement
call. But I want to warn against the desire to include everything --
it's not going to happen, and it shouldn't.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Sun Aug 26 04:08:03 2007
From: guido at python.org (Guido van Rossum)
Date: Sat, 25 Aug 2007 19:08:03 -0700
Subject: [Python-3000] PyBuffer ndim unsigned
In-Reply-To: <52dc1c820708251754w467f207amf09c5d6deea89cb0@mail.gmail.com>
References: <52dc1c820708251754w467f207amf09c5d6deea89cb0@mail.gmail.com>
Message-ID: <ca471dc20708251908w47c50f2ei28f24410fe46314e@mail.gmail.com>

I look at it from another POV -- does anyone care about not being able
to represent dimensionalities over 2 billion? I don't see the
advantage of saying unsigned int here; it just means that we'll get
more compiler warnings in code that is otherwise fine. After all, the
previous line says 'int readonly' -- I'm sure that's meant to be a
bool as well. Hey, Python sequences use Py_ssize_t to express their
length, and I've never seen a string with a negative length either.
:-)

I could even see code computing the difference between two dimensions
and checking if it is negative; don't some compilers actively work
against making such code work correctly?
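
A small illustration of the concern about differences (not from the
thread): Python's ctypes fixed-width types stand in here for C's int
and unsigned int, assuming both are 32 bits wide. The function names
are hypothetical and exist only for this sketch.

```python
import ctypes


def dim_difference_signed(a, b):
    """Subtract two dimension counts as a 32-bit signed C int would."""
    return ctypes.c_int32(a - b).value


def dim_difference_unsigned(a, b):
    """Subtract two dimension counts as a 32-bit unsigned C int would."""
    # ctypes integer types do no overflow checking, so the value wraps
    # modulo 2**32, the same way unsigned arithmetic wraps in C.
    return ctypes.c_uint32(a - b).value


# With a signed type, the "is the difference negative?" test works.
print(dim_difference_signed(2, 3) < 0)
# With an unsigned type, the result wraps around and the test never fires.
print(dim_difference_unsigned(2, 3) < 0)
print(dim_difference_unsigned(2, 3))
```

The last line prints a very large positive number rather than -1,
which is the kind of silent surprise an unsigned ndim would invite.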

--Guido

On 8/25/07, Gregory P. Smith <greg at krypto.org> wrote:
> Anyone mind if I do this?
>
> --- Include/object.h    (revision 57412)
> +++ Include/object.h    (working copy)
> @@ -148,7 +148,7 @@
>          Py_ssize_t itemsize;  /* This is Py_ssize_t so it can be
>                                   pointed to by strides in simple case.*/
>          int readonly;
> -        int ndim;
> +        unsigned int ndim;
>          char *format;
>          Py_ssize_t *shape;
>          Py_ssize_t *strides;
>
>
> PEP 3118 and all reality as I know it says ndim must be >= 0 so it
> makes sense to me.
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Sun Aug 26 04:10:16 2007
From: guido at python.org (Guido van Rossum)
Date: Sat, 25 Aug 2007 19:10:16 -0700
Subject: [Python-3000] backported ABC
In-Reply-To: <f413c71e0708251109t31a92bedrdd5003d5053c852@mail.gmail.com>
References: <f413c71e0708251109t31a92bedrdd5003d5053c852@mail.gmail.com>
Message-ID: <ca471dc20708251910o1ef3ea77nf0b2579862977211@mail.gmail.com>

Um, that patch contains only the C code for overloading isinstance()
and issubclass().

Did you do anything about abc.py and _abcoll.py/collections.py and
their respective unit tests? Or what about the unit tests for
isinstance()/issubclass()?
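
For readers unfamiliar with what "overloading isinstance() and
issubclass()" means at the Python level, here is a minimal sketch in
Python 3 syntax. It is not taken from the patch under discussion; the
class names (EvenMeta, Even, Stack, ListStack) are hypothetical.

```python
from abc import ABC


class EvenMeta(type):
    """Hypothetical metaclass: isinstance() is decided by value."""

    def __instancecheck__(cls, instance):
        # Called by isinstance(instance, cls) instead of the default check.
        return isinstance(instance, int) and instance % 2 == 0


class Even(metaclass=EvenMeta):
    pass


class Stack(ABC):
    """Hypothetical ABC that accepts registered "virtual" subclasses."""


class ListStack:
    """Does not inherit from Stack at all."""


# abc.py builds on the same hooks: register() makes issubclass() and
# isinstance() succeed without any real inheritance.
Stack.register(ListStack)

print(isinstance(4, Even))
print(isinstance(3, Even))
print(issubclass(ListStack, Stack))
print(isinstance(ListStack(), Stack))
```

The first, third and fourth checks succeed even though no inheritance
relationship exists; the second fails because 3 is odd.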

On 8/25/07, Benjamin Aranguren <baranguren at gmail.com> wrote:
> Worked with Alex Martelli at the Google Python Sprint.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip at pobox.com  Sun Aug 26 04:54:56 2007
From: skip at pobox.com (skip at pobox.com)
Date: Sat, 25 Aug 2007 21:54:56 -0500
Subject: [Python-3000] A couple 2to3 questions
Message-ID: <18128.60288.458934.140003@montanaro.dyndns.org>

I ran 2to3 over the Doc/tools directory.  This left a number of problems
which I initially began replacing manually.  I then realized that it would
be better to tweak 2to3.  A couple things I wondered about:

    1. How are we supposed to maintain changes to Doc/tools?  Running svn
       status doesn't show any changes.

    2. I noticed a couple places where it seems to replace "if isinstance"
       with "ifinstance".  Seems like an output bug of some sort.

    3. Here are some obvious transformations (I don't know what to do to
       make these changes to 2to3):

          * replace uppercase and lowercase from the string module with
            their "ascii_"-prefixed names.

          * replace types.StringType and types.UnicodeType with str and
            unicode.

Skip

From greg at krypto.org  Sun Aug 26 05:02:08 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Sat, 25 Aug 2007 20:02:08 -0700
Subject: [Python-3000] PyBuffer ndim unsigned
In-Reply-To: <ca471dc20708251908w47c50f2ei28f24410fe46314e@mail.gmail.com>
References: <52dc1c820708251754w467f207amf09c5d6deea89cb0@mail.gmail.com>
	<ca471dc20708251908w47c50f2ei28f24410fe46314e@mail.gmail.com>
Message-ID: <52dc1c820708252002v3efce97eu869fd46e97e88271@mail.gmail.com>

heh good point.  ignore that thought.  python is a signed language.  :)

On 8/25/07, Guido van Rossum <guido at python.org> wrote:
> I look at it from another POV -- does anyone care about not being able
> to represent dimensionalities over 2 billion? I don't see the
> advantage of saying unsigned int here; it just means that we'll get
> more compiler warnings in code that is otherwise fine. After all, the
> previous line says 'int readonly' -- I'm sure that's meant to be a
> bool as well. Hey, Python sequences use Py_ssize_t to express their
> length, and I've never seen a string with a negative length either.
> :-)
>
> I could even see code computing the difference between two dimensions
> and checking if it is negative; don't some compilers actively work
> against making such code work correctly?
>
> --Guido
>
> On 8/25/07, Gregory P. Smith <greg at krypto.org> wrote:
> > Anyone mind if I do this?
> >
> > --- Include/object.h    (revision 57412)
> > +++ Include/object.h    (working copy)
> > @@ -148,7 +148,7 @@
> >          Py_ssize_t itemsize;  /* This is Py_ssize_t so it can be
> >                                   pointed to by strides in simple case.*/
> >          int readonly;
> > -        int ndim;
> > +        unsigned int ndim;
> >          char *format;
> >          Py_ssize_t *shape;
> >          Py_ssize_t *strides;
> >
> >
> > PEP 3118 and all reality as I know it says ndim must be >= 0 so it
> > makes sense to me.
> >
>
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>

From jimjjewett at gmail.com  Sun Aug 26 05:02:22 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Sat, 25 Aug 2007 23:02:22 -0400
Subject: [Python-3000] Limitations of "batteries included"
In-Reply-To: <ca471dc20708251855p19ec32a4ib159d92fe2b28be1@mail.gmail.com>
References: <ca471dc20708251855p19ec32a4ib159d92fe2b28be1@mail.gmail.com>
Message-ID: <fb6fbf560708252002t88d14eew8a1f40b069d6eee7@mail.gmail.com>

On 8/25/07, Guido van Rossum <guido at python.org> wrote:

> I believe the only reasonable solution is to promote the use of
> package managers, and to let go of the "batteries included" philosophy
> where it comes to major external functionality. When it links to
> something that requires me to install a pre-built external
> non-Python bundle anyway (e.g. Berkeley Db, Sqlite, and others), the
> included battery is useless until it is "charged" by installing that
> dependency; the Python wrapper might as well be managed by the same
> package manager.

Windows is in a slightly different category; many people can't easily
install the external bundle.  If it is included in the python binary
(sqlite3, tcl), then everything is fine.  But excluding them by
default on non-windows machines seems to be opening the door to
bitrot.  (Remember that one of the pushes toward the buildbots was a
realization of how long the windows build had stayed broken without
anyone noticing.)

-jJ

From nnorwitz at gmail.com  Sun Aug 26 05:13:10 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Sat, 25 Aug 2007 20:13:10 -0700
Subject: [Python-3000] A couple 2to3 questions
In-Reply-To: <18128.60288.458934.140003@montanaro.dyndns.org>
References: <18128.60288.458934.140003@montanaro.dyndns.org>
Message-ID: <ee2a432c0708252013j4bc9bd58v9eea4abf31ee896b@mail.gmail.com>

On 8/25/07, skip at pobox.com <skip at pobox.com> wrote:
> I ran 2to3 over the Doc/tools directory.  This left a number of problems
> which I initially began replacing manually.  I then realized that it would
> be better to tweak 2to3.  A couple things I wondered about:
>
>     1. How are we supposed to maintain changes to Doc/tools?  Running svn
>        status doesn't show any changes.

Dunno, Georg will have to answer this one.

>     2. I noticed a couple places where it seems to replace "if isinstance"
>        with "ifinstance".  Seems like an output bug of some sort.

That bug was probably me.  I did some large changes and broke
some things a while back.  I've since learned my lesson and just use
2to3 to automate the task. :-)

>     3. Here are some obvious transformations (I don't know what to do to
>        make these changes to 2to3):
>
>           * replace uppercase and lowercase from the string module with
>             their "ascii_"-prefixed names.

This should be an easy fixer.  Typically the easiest thing to do is
copy an existing fixer that is similar and replace the pattern and
transform method.  To figure out the pattern, use 2to3/find_pattern.py.
1) Pass the Python expression on the command line.  2) Press return
until it shows you the expression you are interested in.  3) Then type
y<return> and you have your pattern.  Here's an example:

$ python find_pattern.py 'string.letters'
'.letters'

'string.letters'
y
power< 'string' trailer< '.' 'letters' > >

That last line is the pattern to use.  Use that string in the fixer as
the PATTERN.  You may want to add names so you can pull out pieces.
For example, if we want to pull out letters, modify the pattern to add
my_name:

power< 'string' trailer< '.' my_name='letters' > >

Then modify the transform method to get my_name, clone the node, and
set the new node to what you want.  (See another fixer for the
details.)
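
The match-then-rewrite idea can be tried outside 2to3 as well. The
sketch below is NOT a 2to3 fixer and does not use the 2to3 pattern
language; it assumes Python 3.9+ (for ast.unparse) and uses the
standard ast module to find attribute accesses on the name "string"
and rename them. The names RENAMES, StringConstantRenamer and
rename_string_constants are made up for this example.

```python
import ast

# Old string-module constants and their "ascii_"-prefixed names.
RENAMES = {
    "letters": "ascii_letters",
    "uppercase": "ascii_uppercase",
    "lowercase": "ascii_lowercase",
}


class StringConstantRenamer(ast.NodeTransformer):
    """Rewrite string.<old> to string.<new> for the names in RENAMES."""

    def visit_Attribute(self, node):
        # Visit children first so nested attribute chains are handled.
        self.generic_visit(node)
        if (isinstance(node.value, ast.Name)
                and node.value.id == "string"
                and node.attr in RENAMES):
            node.attr = RENAMES[node.attr]
        return node


def rename_string_constants(source):
    """Return source with the old string-module constants renamed."""
    tree = StringConstantRenamer().visit(ast.parse(source))
    return ast.unparse(tree)


print(rename_string_constants("chars = string.letters + other.letters"))
```

Only the attribute on "string" is renamed; "other.letters" is left
alone. Unlike 2to3, this approach drops comments and original
formatting, so it is useful for understanding the transformation
rather than for converting real code.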

>           * replace types.StringType and types.UnicodeType with str and
>             unicode.

This one is already done.  I checked in a fixer for this a few days ago
(during the sprint).  See 2to3/fixes/fix_types.py .  It also handles
other builtin types that were aliased in the types module.

HTH,
n

From janssen at parc.com  Sun Aug 26 05:53:33 2007
From: janssen at parc.com (Bill Janssen)
Date: Sat, 25 Aug 2007 20:53:33 PDT
Subject: [Python-3000] Limitations of "batteries included"
In-Reply-To: <fb6fbf560708252002t88d14eew8a1f40b069d6eee7@mail.gmail.com> 
References: <ca471dc20708251855p19ec32a4ib159d92fe2b28be1@mail.gmail.com>
	<fb6fbf560708252002t88d14eew8a1f40b069d6eee7@mail.gmail.com>
Message-ID: <07Aug25.205334pdt."57996"@synergy1.parc.xerox.com>

> On 8/25/07, Guido van Rossum <guido at python.org> wrote:
> 
> I believe the only reasonable solution is to promote the use of
> package managers, and to let go of the "batteries included" philosophy

It's important to realize that most operating systems (Windows, OS X)
don't really support the use of package managers.  Installers, yes;
package managers, no.  And installers don't do dependencies.  And most
users (and probably most developers) are running one of these
package-manager-less systems.

Even with package managers, installing an external extension is out of
bounds for most users.  Many work for companies where the IT
department controls what can and can't be installed, and the IT
department does the installs.  I do this myself, out of sheer laziness
-- I don't want to understand the system of dependencies for each
Linux variant and I don't want to work as a sysop, so when I need some
package on some Fedora box that isn't there, I don't fire up "yum";
instead, I call our internal tech support to make it happen.  This
means a turn-around time that varies from an hour to several days.
This can be a killer if you just want to try something out -- the
energy barrier is too high.  So as soon as you require an install of
something, you lose 80% of your potential users.

Though I agree with some of your other points, those about the
fast-moving unstable frameworks, and about the packages that depend on
an external non-Python non-standard resource.  Aside from that,
though, I believe "batteries included" is really effective.  I'd like
to see more API-based work, like the DB-API work, and the WSGI work,
both of which have been very effective.  I'd like to see something
like PyGUI added as a standard UI API, with a default binding for each
platform (GTK+ for Windows and Linux, Cocoa for OS X, Swing for
Jython, HTML5 for Web apps, perhaps a Tk binding for legacy systems,
etc.)  I think a standard image-processing API, perhaps based on PIL,
would be another interesting project.

Bill

From nas at arctrix.com  Sun Aug 26 05:57:55 2007
From: nas at arctrix.com (Neil Schemenauer)
Date: Sun, 26 Aug 2007 03:57:55 +0000 (UTC)
Subject: [Python-3000] Limitations of "batteries included"
References: <ca471dc20708251855p19ec32a4ib159d92fe2b28be1@mail.gmail.com>
Message-ID: <faqto3$dt4$1@sea.gmane.org>

Guido van Rossum <guido at python.org> wrote:
> Now, there's plenty of pure Python (or Python-specific) functionality
> for which "batteries included" makes total sense, including the email
> package, wsgiref, XML processing, and more; it's often a judgement
> call. But I want to warn against the desire to include everything --
> it's not going to happen, and it shouldn't.

It sounds like we basically agree as to what "batteries included"
means.  Still, I think we should include more batteries.  The
problem is that, with the current model, the Python development team
has to take on too much responsibility in order to include them.

The "email" package is a good example.  Most people would agree that
it should be included in the distribution.  It meets the
requirements of a battery: it provides widely useful functionality,
it has a (relatively) stable API, and it's well documented.
However, it should not have to live in the Python source tree and be
looked after by the Python developers.

There should be a set of packages that are part of the Python
release that are managed by their own teams (e.g. email, ElementTree).
In order to make a Python release, we would coordinate with the
other teams to pull known good versions of their packages into the
final distribution package.  There could be a PEP that defines how
the package must be organized, making it possible to automate most
of the bundling process (e.g. unit test and documentation
conventions).

  Neil


From nas at arctrix.com  Sun Aug 26 06:10:17 2007
From: nas at arctrix.com (Neil Schemenauer)
Date: Sun, 26 Aug 2007 04:10:17 +0000 (UTC)
Subject: [Python-3000] Removing email package until it's fixed
References: <ca471dc20708250636t5c0c6c1er50621f837a1a0f0b@mail.gmail.com>
	<8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org>
	<bbaeab100708251500s48810d7fn521dcc74c2a42db7@mail.gmail.com>
Message-ID: <faquf9$f8j$1@sea.gmane.org>

Brett Cannon <brett at python.org> wrote:
> I don't like the former, but the latter is intriguing.  If we could
> host large packages (e.g., email, sqlite, ctypes, etc.) on python.org
> by providing tracker, svn, and web space they could be developed and
> released on their own schedule.  The Python release would then
> become a sumo release of these various packages.

Hosting them on python.org is a separate decision.  We should be
able to pull in packages that are hosted anywhere into the
"batteries included" distribution.

It sounds like most people are supportive of this idea. All that's
needed is a little documentation outlining the rules that packages
must conform to, and a little scripting.

We could have another file in the distribution, similar to
Modules/Setup, say Modules/Batteries. :-)  Something like:

    # ElementTree -- An XML API
    http://effbot.org/downloads/elementtree.tar.gz

    # email -- An email and MIME handling package
    http://www.python.org/downloads/email.tar.gz

There could be a makefile target or script that downloads them and
unpacks them into the right places.
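
As a rough sketch of the "little scripting" involved, the code below
parses a file in the layout shown above: a "# name -- description"
comment line followed by a download URL. The format is only the one
proposed in this message, and parse_batteries is a hypothetical helper
name. The download-and-unpack step is deliberately left out.

```python
def parse_batteries(text):
    """Parse the proposed Modules/Batteries layout into a list of dicts."""
    batteries = []
    pending = None  # (name, description) from the most recent comment line
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # "# email -- An email package" -> ("email", "An email package")
            name, _, description = line.lstrip("#").strip().partition(" -- ")
            pending = (name.strip(), description.strip())
        elif pending is None:
            raise ValueError("URL without a preceding comment line: " + line)
        else:
            batteries.append(
                {"name": pending[0], "description": pending[1], "url": line}
            )
            pending = None
    return batteries


SAMPLE = """\
# ElementTree -- An XML API
http://effbot.org/downloads/elementtree.tar.gz

# email -- An email and MIME handling package
http://www.python.org/downloads/email.tar.gz
"""

for battery in parse_batteries(SAMPLE):
    print(battery["name"], battery["url"])
```

A makefile target could feed each parsed URL to a fetch-and-unpack
step; pinning a version number per entry would be a natural extension.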

  Neil


From aahz at pythoncraft.com  Sun Aug 26 07:13:02 2007
From: aahz at pythoncraft.com (Aahz)
Date: Sat, 25 Aug 2007 22:13:02 -0700
Subject: [Python-3000] Removing email package until it's fixed
In-Reply-To: <faquf9$f8j$1@sea.gmane.org>
References: <ca471dc20708250636t5c0c6c1er50621f837a1a0f0b@mail.gmail.com>
	<8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org>
	<bbaeab100708251500s48810d7fn521dcc74c2a42db7@mail.gmail.com>
	<faquf9$f8j$1@sea.gmane.org>
Message-ID: <20070826051302.GC24678@panix.com>

On Sun, Aug 26, 2007, Neil Schemenauer wrote:
> Brett Cannon <brett at python.org> wrote:
>>
>> I don't like the former, but the latter is intriguing.  If we could
>> host large packages (e.g., email, sqlite, ctypes, etc.) on python.org
>> by providing tracker, svn, and web space they could be developed and
>> released on their own schedule.  The Python release would then
>> become a sumo release of these various packages.
> 
> Hosting them on python.org is a separate decision.  We should be able
> to pull in packages that are hosted anywhere into the "batteries
> included" distribution.
>
> It sounds like most people are supportive of this idea.

Please don't interpret a missing chorus of opposition as support.  I'm
only -0, but I definitely am negative on the idea based on my guess about
the likelihood of problems.

(OTOH, I have no opinion about temporarily removing the email package
for a1 -- though I'm tempted to suggest we call it a0 instead.)
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"If you don't know what your program is supposed to do, you'd better not
start writing it."  --Dijkstra

From aahz at pythoncraft.com  Sun Aug 26 07:17:19 2007
From: aahz at pythoncraft.com (Aahz)
Date: Sat, 25 Aug 2007 22:17:19 -0700
Subject: [Python-3000] Limitations of "batteries included"
In-Reply-To: <ca471dc20708251855p19ec32a4ib159d92fe2b28be1@mail.gmail.com>
References: <ca471dc20708251855p19ec32a4ib159d92fe2b28be1@mail.gmail.com>
Message-ID: <20070826051719.GD24678@panix.com>

On Sat, Aug 25, 2007, Guido van Rossum wrote:
>
> I believe the only reasonable solution is to promote the use of
> package managers, and to let go of the "batteries included" philosophy
> where it comes to major external functionality. When it links to
> something that requires me to install a pre-built external
> non-Python bundle anyway (e.g. Berkeley Db, Sqlite, and others), the
> included battery is useless until it is "charged" by installing that
> dependency; the Python wrapper might as well be managed by the same
> package manager.
> 
> Now, there's plenty of pure Python (or Python-specific) functionality
> for which "batteries included" makes total sense, including the email
> package, wsgiref, XML processing, and more; it's often a judgement
> call. But I want to warn against the desire to include everything --
> it's not going to happen, and it shouldn't.

That overall makes sense and is roughly my understanding of the status
for the past while -- it's why we've been pushing PyPI.  What I would say
is that the Python philosophy stays "batteries included" and does not
move closer to a "sumo" philosophy.  I do think a separate sumo
distribution might make sense if someone wants to drive it.
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"If you don't know what your program is supposed to do, you'd better not
start writing it."  --Dijkstra

From skip at pobox.com  Sun Aug 26 13:43:37 2007
From: skip at pobox.com (skip at pobox.com)
Date: Sun, 26 Aug 2007 06:43:37 -0500
Subject: [Python-3000] Removing email package until it's fixed
In-Reply-To: <20070826051302.GC24678@panix.com>
References: <ca471dc20708250636t5c0c6c1er50621f837a1a0f0b@mail.gmail.com>
	<8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org>
	<bbaeab100708251500s48810d7fn521dcc74c2a42db7@mail.gmail.com>
	<faquf9$f8j$1@sea.gmane.org> <20070826051302.GC24678@panix.com>
Message-ID: <18129.26473.77328.489985@montanaro.dyndns.org>


    aahz> Please don't interpret a missing chorus of opposition as support.
    aahz> I'm only -0, but I definitely am negative on the idea based on my
    aahz> guess about the likelihood of problems.

-0 on the idea of more batteries or fewer batteries?

Skip

From barry at python.org  Sun Aug 26 14:03:59 2007
From: barry at python.org (Barry Warsaw)
Date: Sun, 26 Aug 2007 08:03:59 -0400
Subject: [Python-3000] Removing email package until it's fixed
In-Reply-To: <07Aug25.160033pdt."57996"@synergy1.parc.xerox.com>
References: <ca471dc20708250636t5c0c6c1er50621f837a1a0f0b@mail.gmail.com>
	<8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org>
	<bbaeab100708251500s48810d7fn521dcc74c2a42db7@mail.gmail.com>
	<79990c6b0708251533tf0a2dd0k3c07e94fd52736bc@mail.gmail.com>
	<07Aug25.160033pdt."57996"@synergy1.parc.xerox.com>
Message-ID: <D218DEAC-4742-4961-9442-2DE94B4B4796@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 25, 2007, at 7:00 PM, Bill Janssen wrote:

>> FWIW, I'm very much against moving email out of the core. This has
>> been discussed a number of times before, and as far as I am aware, no
>> conclusion reached. However, the "batteries included" approach of
>> Python is a huge benefit for me.
>
> I agree.  But if the current code doesn't work with 3K, not sure what
> else to do.  I guess it could just be labelled a "show-stopper" till
> it's fixed.

Another possibility (which I personally favor) is to leave the email  
package in a1 as flawed as it is, but to disable the tests.  It's an
/alpha/ for gosh sakes, so maybe leaving it in and partly broken will
help rustle up some volunteers to help get it fixed.

If you're agreeable to this, you can just merge the sandbox[1] into  
the head of the branch.  I'm traveling until Monday and won't get a  
chance to do that until I'm back on the net.  The sandbox is
up-to-date with all my latest changes.

- -Barry

[1] svn+ssh://pythondev at svn.python.org/sandbox/trunk/emailpkg/5_0-exp

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRtFsL3EjvBPtnXfVAQKRXQQAqUOoafghpcoE5ENEiNDWmzQgQStXe3VP
WFsh8QwcCGXXxKTih4dNHK8yLd+ayrwZCxzqFpv4Ie5DFacQ6/d4qq+XPX+vK92Y
wWPMIRKXscTK5Ep0n6lfvb/3I+d9E/AJKa+exgXarHhkpSaij1V8FrXxqx1GgNMK
Bw/nU5stMjA=
=M2tb
-----END PGP SIGNATURE-----

From barry at python.org  Sun Aug 26 14:05:37 2007
From: barry at python.org (Barry Warsaw)
Date: Sun, 26 Aug 2007 08:05:37 -0400
Subject: [Python-3000] Removing email package until it's fixed
In-Reply-To: <faquf9$f8j$1@sea.gmane.org>
References: <ca471dc20708250636t5c0c6c1er50621f837a1a0f0b@mail.gmail.com>
	<8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org>
	<bbaeab100708251500s48810d7fn521dcc74c2a42db7@mail.gmail.com>
	<faquf9$f8j$1@sea.gmane.org>
Message-ID: <A03D5C1D-C35C-4272-AF47-C1C6FE72B3FE@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 26, 2007, at 12:10 AM, Neil Schemenauer wrote:

> Brett Cannon <brett at python.org> wrote:
>> I don't like the former, but the latter is intriguing.  If we could
>> host large packages (e.g., email, sqlite, ctypes, etc.) on python.org
>> by providing tracker, svn, and web space they could be developed and
>> released on their own schedule.  The Python release would then
>> become a sumo release of these various packages.
>
> Hosting them on python.org is a separate decision.  We should be
> able to pull in packages that are hosted anywhere into the
> "batteries included" distribution.
>
> It sounds like most people are supportive of this idea. All that's
> needed is a little documentation outlining the rules that packages
> must conform to, and a little scripting.
>
> We could have another file in the distribution, similar to
> Modules/Setup, say Modules/Batteries. :-)  Something like:
>
>     # ElementTree -- An XML API
>     http://effbot.org/downloads/elementtree.tar.gz
>
>     # email -- An email and MIME handling package
>     http://www.python.org/downloads/email.tar.gz
>
> There could be a makefile target or script that downloads them and
> unpacks them into the right places.

/IF/ we do this, I would require that the packages be available on  
the cheese^H^H^H^H^H er, PyPI.  The key thing is the version number  
since there will be at least 3 versions of email being maintained.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRtFskXEjvBPtnXfVAQLTxQQAhhmFOkjLUgMPl2Kt6q7yn1anZUhQlagb
cxroOsXZ55tScn8KVnQ5oFbv1l5IFg+bzdEZZcNyEsCptFs9WuKqYUB7/hAJ+mF+
Cw/zQUGoZUT2ZyB19pIfb9At1tp6sf2vLZraXztsHh6jib2uQVc0kCKR3HA+Tjef
XQKdyUTjVW4=
=bM9q
-----END PGP SIGNATURE-----

From p.f.moore at gmail.com  Sun Aug 26 14:33:26 2007
From: p.f.moore at gmail.com (Paul Moore)
Date: Sun, 26 Aug 2007 13:33:26 +0100
Subject: [Python-3000] Limitations of "batteries included"
In-Reply-To: <2088134209622619925@unknownmsgid>
References: <ca471dc20708251855p19ec32a4ib159d92fe2b28be1@mail.gmail.com>
	<fb6fbf560708252002t88d14eew8a1f40b069d6eee7@mail.gmail.com>
	<2088134209622619925@unknownmsgid>
Message-ID: <79990c6b0708260533x105ca70fn3b528a8d632ddb99@mail.gmail.com>

On 26/08/07, Bill Janssen <janssen at parc.com> wrote:
> > On 8/25/07, Guido van Rossum <guido at python.org> wrote:
> >
> > I believe the only reasonable solution is to promote the use of
> > package managers, and to let go of the "batteries included" philosophy
>
> It's important to realize that most operating systems (Windows, OS X)
> don't really support the use of package managers.
[...]
> Even with package managers, installing an external extension is out of
> bounds for most users.
[...]
> So as soon as you require an install of something, you lose 80% of your
> potential users.

These are very good points, and fit exactly with my experience. For my
personal use, I happily install and use any package that helps. For
deployment, however, I very rarely contemplate relying on anything
other than "the essentials" (to me, that covers Python, pywin32, and
cx_Oracle - they get installed by default on any of our systems).

> Though I agree with some of your other points, those about the
> fast-moving unstable frameworks, and about the packages that depend on
> an external non-Python non-standard resource.

Definitely. I think the whole issue of inclusion in the standard
library is a delicate balance - but one which Python has so far got
just about right. I'd like to see that continue. The improvements in
PyPI, and the rise of setuptools and eggs, are great, but shouldn't in
themselves be a reason to slim down the standard library.

Paul.

From aahz at pythoncraft.com  Sun Aug 26 16:07:18 2007
From: aahz at pythoncraft.com (Aahz)
Date: Sun, 26 Aug 2007 07:07:18 -0700
Subject: [Python-3000] Removing email package until it's fixed
In-Reply-To: <18129.26473.77328.489985@montanaro.dyndns.org>
References: <ca471dc20708250636t5c0c6c1er50621f837a1a0f0b@mail.gmail.com>
	<8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org>
	<bbaeab100708251500s48810d7fn521dcc74c2a42db7@mail.gmail.com>
	<faquf9$f8j$1@sea.gmane.org> <20070826051302.GC24678@panix.com>
	<18129.26473.77328.489985@montanaro.dyndns.org>
Message-ID: <20070826140718.GA15100@panix.com>

On Sun, Aug 26, 2007, skip at pobox.com wrote:
> 
>     aahz> Please don't interpret a missing chorus of opposition as support.
>     aahz> I'm only -0, but I definitely am negative on the idea based on my
>     aahz> guess about the likelihood of problems.
> 
> -0 on the idea of more batteries or fewer batteries?

-0 on the idea of making "batteries included" include PyPI packages.
Anything part of "batteries included" IMO should just be part of the
standard install.

BTW, you snipped too much context, so that I had to go rummaging through
my old e-mail to figure out what the context was.
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"If you don't know what your program is supposed to do, you'd better not
start writing it."  --Dijkstra

From guido at python.org  Sun Aug 26 16:52:02 2007
From: guido at python.org (Guido van Rossum)
Date: Sun, 26 Aug 2007 07:52:02 -0700
Subject: [Python-3000] Removing email package until it's fixed
In-Reply-To: <D218DEAC-4742-4961-9442-2DE94B4B4796@python.org>
References: <ca471dc20708250636t5c0c6c1er50621f837a1a0f0b@mail.gmail.com>
	<8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org>
	<bbaeab100708251500s48810d7fn521dcc74c2a42db7@mail.gmail.com>
	<79990c6b0708251533tf0a2dd0k3c07e94fd52736bc@mail.gmail.com>
	<D218DEAC-4742-4961-9442-2DE94B4B4796@python.org>
Message-ID: <ca471dc20708260752ya06978do6b61256bf90423f9@mail.gmail.com>

On 8/26/07, Barry Warsaw <barry at python.org> wrote:
> Another possibility (which I personally favor) is to leave the email
> package in a1 as flawed as it is, but to disable the tests.  It's an
> /alpha/ for gosh sakes, so maybe leaving it in and partly broken will
> help rustle up some volunteers to help get it fixed.
>
> If you're agreeable to this, you can just merge the sandbox[1] into
> the head of the branch.  I'm traveling until Monday and won't get a
> chance to do that until I'm back on the net.  The sandbox is
> up-to-date with all my latest changes.

No, thanks. The broken package doesn't do people much good. People
whose code depends on the email package can't use it anyway until it's
fixed; they either have to wait, or they have to help fix it.
Instructions for accessing the broken package will of course be
included in the README, and as soon as the email package is fixed
we'll include it again (hopefully in 3.0a2).

BTW I'm surprised nobody else is helping out fixing it. I sent several
requests to the email-sig and AFAIK nobody piped up.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Sun Aug 26 16:56:42 2007
From: guido at python.org (Guido van Rossum)
Date: Sun, 26 Aug 2007 07:56:42 -0700
Subject: [Python-3000] backported ABC
In-Reply-To: <e17b64640708260329t1c53e4b7r504418793c6ddd97@mail.gmail.com>
References: <f413c71e0708251109t31a92bedrdd5003d5053c852@mail.gmail.com>
	<ca471dc20708251910o1ef3ea77nf0b2579862977211@mail.gmail.com>
	<e17b64640708260329t1c53e4b7r504418793c6ddd97@mail.gmail.com>
Message-ID: <ca471dc20708260756j5a518fe3m789500cddf1385a9@mail.gmail.com>

Thanks!

Would it inconvenience you terribly to upload this all to the new
tracker (bugs.python.org)? Preferably as a single patch against the
svn trunk (to use svn diff, you have to svn add the new files first!)

Also, are you planning to work on _abcoll.py and the changes to collections.py?

--Guido

On 8/26/07, Benjamin Aranguren <baranguren at gmail.com> wrote:
> We copied abc.py and test_abc.py from py3k svn and modified to work with 2.6.
>
> After making all the changes we ran all the tests to ensure that no
> other modules were affected.
>
> Attached are abc.py, test_abc.py, and their relevant patches from 3.0 to 2.6.
>
> On 8/25/07, Guido van Rossum <guido at python.org> wrote:
> > Um, that patch contains only the C code for overloading isinstance()
> > and issubclass().
> >
> > Did you do anything about abc.py and _abcoll.py/collections.py and
> > their respective unit tests? Or what about the unit tests for
> > isinstance()/issubclass()?
> >
> > On 8/25/07, Benjamin Aranguren <baranguren at gmail.com> wrote:
> > > Worked with Alex Martelli at the Google Python Sprint.
> >
> > --
> > --Guido van Rossum (home page: http://www.python.org/~guido/)
> >
>
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From baranguren at gmail.com  Sun Aug 26 12:29:25 2007
From: baranguren at gmail.com (Benjamin Aranguren)
Date: Sun, 26 Aug 2007 03:29:25 -0700
Subject: [Python-3000] backported ABC
In-Reply-To: <ca471dc20708251910o1ef3ea77nf0b2579862977211@mail.gmail.com>
References: <f413c71e0708251109t31a92bedrdd5003d5053c852@mail.gmail.com>
	<ca471dc20708251910o1ef3ea77nf0b2579862977211@mail.gmail.com>
Message-ID: <e17b64640708260329t1c53e4b7r504418793c6ddd97@mail.gmail.com>

We copied abc.py and test_abc.py from py3k svn and modified to work with 2.6.

After making all the changes we ran all the tests to ensure that no
other modules were affected.

Attached are abc.py, test_abc.py, and their relevant patches from 3.0 to 2.6.

On 8/25/07, Guido van Rossum <guido at python.org> wrote:
> Um, that patch contains only the C code for overloading isinstance()
> and issubclass().
>
> Did you do anything about abc.py and _abcoll.py/collections.py and
> their respective unit tests? Or what about the unit tests for
> isinstance()/issubclass()?
>
> On 8/25/07, Benjamin Aranguren <baranguren at gmail.com> wrote:
> > Worked with Alex Martelli at the Google Python Sprint.
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: abc.py
Type: text/x-python
Size: 7986 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070826/296fbf4b/attachment-0002.py 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_abc.py
Type: text/x-python
Size: 4591 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070826/296fbf4b/attachment-0003.py 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: abc_backport_to_2_6.patch
Type: text/x-patch
Size: 1867 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070826/296fbf4b/attachment-0002.bin 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_abc_backport_to_2_6.patch
Type: text/x-patch
Size: 3543 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070826/296fbf4b/attachment-0003.bin 

From nas at arctrix.com  Sun Aug 26 19:20:36 2007
From: nas at arctrix.com (Neil Schemenauer)
Date: Sun, 26 Aug 2007 17:20:36 +0000 (UTC)
Subject: [Python-3000] Removing email package until it's fixed
References: <ca471dc20708250636t5c0c6c1er50621f837a1a0f0b@mail.gmail.com>
	<8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org>
	<bbaeab100708251500s48810d7fn521dcc74c2a42db7@mail.gmail.com>
	<faquf9$f8j$1@sea.gmane.org> <20070826051302.GC24678@panix.com>
	<18129.26473.77328.489985@montanaro.dyndns.org>
	<20070826140718.GA15100@panix.com>
Message-ID: <fascp4$kqj$1@sea.gmane.org>

Aahz <aahz at pythoncraft.com> wrote:
> -0 on the idea of making "batteries included" include PyPI packages.
> Anything part of "batteries included" IMO should just be part of the
> standard install.

I think you misunderstand the proposal.  The "batteries" would be
included as part of the final Python release.  From the end user's
point of view there would be no change from the current model.  The
difference would be from the Python developer's point of view.  Some
libraries would no longer be part of SVN checkout and you would have
to run a script to pull them into your source tree.

IMO, depending on PyPI is not necessary or even desirable.  All that's
necessary is that the batteries conform to some standards regarding
layout, documentation and unit tests.  They could be pulled based on
URLs, and the hostname of the URL is not important.  That scheme
would make it easier for someone to make a sumo distribution just by
adding more URLs to the list before building it.

  Neil


From baranguren at gmail.com  Sun Aug 26 19:47:17 2007
From: baranguren at gmail.com (Benjamin Aranguren)
Date: Sun, 26 Aug 2007 10:47:17 -0700
Subject: [Python-3000] backported ABC
In-Reply-To: <e17b64640708261029r372c835ake134636e9dbbb73e@mail.gmail.com>
References: <f413c71e0708251109t31a92bedrdd5003d5053c852@mail.gmail.com>
	<ca471dc20708251910o1ef3ea77nf0b2579862977211@mail.gmail.com>
	<e17b64640708260329t1c53e4b7r504418793c6ddd97@mail.gmail.com>
	<ca471dc20708260756j5a518fe3m789500cddf1385a9@mail.gmail.com>
	<e17b64640708261029r372c835ake134636e9dbbb73e@mail.gmail.com>
Message-ID: <e17b64640708261047s5f96e854ybed62a7248f3ee51@mail.gmail.com>

I got it now.  Both modules need to be backported as well.  I'm on it.

On 8/26/07, Benjamin Aranguren <baranguren at gmail.com> wrote:
> No problem.  Created issue 1026 in tracker with a single patch file attached.
>
> I'm not aware of what changes need to be made to _abcoll.py and
> collections.py.  If you can point me in the right direction, I would
> definitely like to work on it.
>
> On 8/26/07, Guido van Rossum <guido at python.org> wrote:
> > Thanks!
> >
> > Would it inconvenience you terribly to upload this all to the new
> > tracker (bugs.python.org)? Preferably as a single patch against the
> > svn trunk (to use svn diff, you have to svn add the new files first!)
> >
> > Also, are you planning to work on _abcoll.py and the changes to collections.py?
> >
> > --Guido
> >
> > On 8/26/07, Benjamin Aranguren <baranguren at gmail.com> wrote:
> > > We copied abc.py and test_abc.py from py3k svn and modified them to work with 2.6.
> > >
> > > After making all the changes we ran all the tests to ensure that no
> > > other modules were affected.
> > >
> > > Attached are abc.py, test_abc.py, and their relevant patches from 3.0 to 2.6.
> > >
> > > On 8/25/07, Guido van Rossum <guido at python.org> wrote:
> > > > Um, that patch contains only the C code for overloading isinstance()
> > > > and issubclass().
> > > >
> > > > Did you do anything about abc.py and _abcoll.py/collections.py and
> > > > their respective unit tests? Or what about the unit tests for
> > > > isinstance()/issubclass()?
> > > >
> > > > On 8/25/07, Benjamin Aranguren <baranguren at gmail.com> wrote:
> > > > > Worked with Alex Martelli at the Google Python Sprint.
> > > >
> > > > --
> > > > --Guido van Rossum (home page: http://www.python.org/~guido/)
> > > >
> > >
> > >
> >
> >
> > --
> > --Guido van Rossum (home page: http://www.python.org/~guido/)
> >
>

From john.m.camara at comcast.net  Sun Aug 26 19:50:48 2007
From: john.m.camara at comcast.net (john.m.camara at comcast.net)
Date: Sun, 26 Aug 2007 17:50:48 +0000
Subject: [Python-3000] Limitations of "batteries included"
Message-ID: <082620071750.21994.46D1BD7800071D07000055EA22120207840E9D0E030E0CD203D202080106@comcast.net>

Sorry.  Forgot to change the subject

 -------------- Original message ----------------------
From: john.m.camara at comcast.net
> On 8/25/07, "Guido van Rossum" <guido at python.org> wrote:
> > Take for example GUI packages. Tkinter is far from ideal, but there
> > are many competitors, none of them perfect (not even those packages
> > specifically designed to be platform-neutral). We can't very well
> > include all of the major packages (PyQt, PyGtk, wxPython, anygui) --
> > the release would just bloat tremendously, and getting stable versions
> > of all of these would just be a maintenance nightmare. (I don't know
> > how Linux distros do it, but they tend to have a large group of people
> > *just* devoted to *bundling* stuff, and their release cycles are even
> > slower. I don't think Python should be in that business.)
> 
> Python can't include all the major packages but it is necessary for any 
> language to support a good GUI package in order to be widely adopted 
> by the masses.  Right now this is one of Python's weaknesses that needs 
> to be corrected.  I agree with you that none of the major packages are 
> perfect and at the current slow rate of progress in this area I doubt any 
> of them will be perfect any time soon.  There just doesn't seem to be
> enough motivation out there for this issue to correct itself, unlike the
> situation with the web frameworks, where significant
> progress has been made in the last 2 years.  I think it's time to just
> pronounce a package, as that will be good for the community.  My vote would
> be for wxPython but I'm not someone who truly cares much about GUIs 
> as I much prefer to write the back ends of systems and stay far away from 
> the front ends.
> > 
> > Database wrappers are in the same boat, and IMO the approach of
> > separately downloadable 3rd party wrappers (sometimes multiple
> > competing wrappers for the same database) has served the users well.
> 
> I agree with you at this point in time but SQLAlchemy is something special 
> and will likely be worthy to be part of the std library in 18-24 months if the 
> current rate of development continues.  In my opinion, it's Python's new 
> killer library and I expect it will be given a significant amount of positive 
> press soon and will help Python's user base grow.
> 
> > 
> > Would anyone seriously consider including something like Django,
> > TurboGears or Pylons in a Python release? I hope not -- these all
> > evolve at a rate about 10x that of Python, and the version included
> > with a core distribution would be out of date (and a nuisance to
> > replace) within months of the core release.
> 
> At this point in time none of the web frameworks are worthy to be included 
> in the standard library.  I believe the community has been doing a good 
> job in this area with great progress being made in the last few years.  What 
> we need in the standard library are some additional low level libraries/api 
> like WSGI.  For example libraries for authentication/authorization, a web 
> services bus to manage WSGI services (to provide start, stop, reload, 
> events, scheduler, etc), and a new configuration system so that higher 
> level frameworks can seamlessly work together.
> 
> John


From john.m.camara at comcast.net  Sun Aug 26 19:48:45 2007
From: john.m.camara at comcast.net (john.m.camara at comcast.net)
Date: Sun, 26 Aug 2007 17:48:45 +0000
Subject: [Python-3000] Python-3000 Digest, Vol 18, Issue 116
Message-ID: <082620071748.1124.46D1BCFD000DEB390000046422120207840E9D0E030E0CD203D202080106@comcast.net>

On 8/25/07, "Guido van Rossum" <guido at python.org> wrote:
> Take for example GUI packages. Tkinter is far from ideal, but there
> are many competitors, none of them perfect (not even those packages
> specifically designed to be platform-neutral). We can't very well
> include all of the major packages (PyQt, PyGtk, wxPython, anygui) --
> the release would just bloat tremendously, and getting stable versions
> of all of these would just be a maintenance nightmare. (I don't know
> how Linux distros do it, but they tend to have a large group of people
> *just* devoted to *bundling* stuff, and their release cycles are even
> slower. I don't think Python should be in that business.)

Python can't include all the major packages but it is necessary for any 
language to support a good GUI package in order to be widely adopted 
by the masses.  Right now this is one of Python's weaknesses that needs 
to be corrected.  I agree with you that none of the major packages are 
perfect and at the current slow rate of progress in this area I doubt any 
of them will be perfect any time soon.  There just doesn't seem to be
enough motivation out there for this issue to correct itself, unlike the
situation with the web frameworks, where significant
progress has been made in the last 2 years.  I think it's time to just
pronounce a package, as that will be good for the community.  My vote would
be for wxPython but I'm not someone who truly cares much about GUIs 
as I much prefer to write the back ends of systems and stay far away from 
the front ends.
> 
> Database wrappers are in the same boat, and IMO the approach of
> separately downloadable 3rd party wrappers (sometimes multiple
> competing wrappers for the same database) has served the users well.

I agree with you at this point in time but SQLAlchemy is something special 
and will likely be worthy to be part of the std library in 18-24 months if the 
current rate of development continues.  In my opinion, it's Python's new 
killer library and I expect it will be given a significant amount of positive 
press soon and will help Python's user base grow.

> 
> Would anyone seriously consider including something like Django,
> TurboGears or Pylons in a Python release? I hope not -- these all
> evolve at a rate about 10x that of Python, and the version included
> with a core distribution would be out of date (and a nuisance to
> replace) within months of the core release.

At this point in time none of the web frameworks are worthy to be included 
in the standard library.  I believe the community has been doing a good 
job in this area with great progress being made in the last few years.  What 
we need in the standard library are some additional low level libraries/api 
like WSGI.  For example libraries for authentication/authorization, a web 
services bus to manage WSGI services (to provide start, stop, reload, 
events, scheduler, etc), and a new configuration system so that higher 
level frameworks can seamlessly work together.

John

From barry at python.org  Sun Aug 26 20:30:47 2007
From: barry at python.org (Barry Warsaw)
Date: Sun, 26 Aug 2007 14:30:47 -0400
Subject: [Python-3000] Py3k Sprint Tasks (Google Docs & Spreadsheets)
In-Reply-To: <87y7g0401v.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <c09ffb51ed04383961b5e8ff223d43@gmail.com>
	<93DBB66F-5D0D-4E46-8480-D2BFC693722A@python.org>
	<87y7g0401v.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <9CBCCF2F-B428-4D37-8C18-1EAFB86CD7D9@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 25, 2007, at 2:10 AM, Stephen J. Turnbull wrote:

> Barry Warsaw writes:
>
>> I've been spending hours of my own time on the email package for py3k
>> this week and every time I think I'm nearing success I get defeated
>> again.
>
> I'm ankle deep in the Big Muddy (daughter tested positive for TB as
> expected -- the Japanese inoculate all children against it because of
> the sins of their fathers -- and school starts on Tuesday, so we need
> to make a bunch of extra trips to doctors and whatnot), so what thin
> hope I had of hanging out with the big boys at the Python-3000 sprint
> long since evaporated.

Stephen, sorry to hear about your daughter and I hope she's going to  
be okay of course!

> However, starting next week I should have a day a week or so I can
> devote to email stuff -- if you want to send any thoughts or
> requisitions my way (or an URL to sprint IRC transcripts), I'd love to
> help.  Of course you'll get it all done and leave none for me, right?

Unfortunately, we didn't really sprint much on it, but I did get a  
chance to spend time on the branch.  I think I see the light at the  
end of the tunnel for getting the existing tests to pass, though I  
haven't even looked at test_email_codecs.py yet.  Because of the way  
things are going to work with input and output codecs, I'll  
definitely want to get some sanity checks with Asian codecs.  I'll  
try to put together a list of issues and questions and get those sent  
out next week.

>> But I'm determined to solve the worst of the problems this week.
>
> Bu-wha-ha-ha!

Heh, well I'm getting closer.  We're definitely going to have some  
API changes, so I'll outline those as well.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRtHG13EjvBPtnXfVAQKCngP+PUTm82FjnVpqz7HvPLS/zPXBMelDNhkK
AKGIk5hveka180QEbA/DMsu7LZmPK2jXOQJWxufRsLfuzwKL3WtDF1IIyiICkC/I
HoR04bHZJzUdEzZuZPL53I704JoO8QBpXEOn/JdauFEaZ6qakueLdnqx1Ab0LbSP
RCLiVh9BxtU=
=6Ngh
-----END PGP SIGNATURE-----

From janssen at parc.com  Sun Aug 26 20:44:48 2007
From: janssen at parc.com (Bill Janssen)
Date: Sun, 26 Aug 2007 11:44:48 PDT
Subject: [Python-3000] Limitations of "batteries included"
In-Reply-To: <79990c6b0708260533x105ca70fn3b528a8d632ddb99@mail.gmail.com> 
References: <ca471dc20708251855p19ec32a4ib159d92fe2b28be1@mail.gmail.com>
	<fb6fbf560708252002t88d14eew8a1f40b069d6eee7@mail.gmail.com>
	<2088134209622619925@unknownmsgid>
	<79990c6b0708260533x105ca70fn3b528a8d632ddb99@mail.gmail.com>
Message-ID: <07Aug26.114451pdt."57996"@synergy1.parc.xerox.com>

> These are very good points, and fit exactly with my experience. For my
> personal use, I happily install and use any package that helps. For
> deployment, however, I very rarely contemplate relying on anything
> other than "the essentials" (to me, that covers Python, pywin32, and
> cx_Oracle - they get installed by default on any of our systems).

Indeed.  I still write everything against Python 2.3.5, just so that
OS X users can use my stuff -- few people will install a second Python
on their machine.

Bill

From guido at python.org  Sun Aug 26 21:24:51 2007
From: guido at python.org (Guido van Rossum)
Date: Sun, 26 Aug 2007 12:24:51 -0700
Subject: [Python-3000] How should the hash digest of a Unicode string be
	computed?
Message-ID: <ca471dc20708261224k6d4ab9eer71882d93517ac647@mail.gmail.com>

Change r57490 by Gregory P Smith broke a test in test_unicodedata and,
on PPC OSX, several tests in test_hashlib.

Looking into this it's pretty clear *why* it broke: before, the 's#'
format code was used, while Gregory's change changed this into using
the buffer API (to ensure the data won't move around). Now, when a
(Unicode) string is passed to s#, it uses the UTF-8 encoding. But the
buffer API uses the raw bytes in the Unicode object, which is
typically UTF-16 or UTF-32. (I can't quite figure out why the tests
didn't fail on my Linux box; I'm guessing it's an endianness issue,
but it can't be that simple. Perhaps that box happens to be falling
back on a different implementation of the checksums?)

I checked in a fix (because I don't like broken tests :-) which
restores the old behavior by passing PyBUF_CHARACTER to
PyObject_GetBuffer(), which enables a special case in the buffer API
for PyUnicode that returns the UTF-8 encoded bytes instead of the raw
bytes. (I still find this questionable, especially since a few random
places in bytesobject.c also use PyBUF_CHARACTER, presumably to make
tests pass, but for the *bytes* type, requesting *characters* (even
encoded ones) is iffy.)

But I'm wondering if passing a Unicode string to the various hash
digest functions should work at all! Hashes are defined on sequences
of bytes, and IMO we should insist that the user pass us bytes, and
not second-guess what to do with Unicode.
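
To make the bytes-versus-text point concrete, here is a minimal sketch
(not code from the checkin under discussion) showing that the digest
depends on which encoding turns the text into bytes.  It assumes a
current Python 3, where hashlib accepts only bytes-like input; the
helper name digest_of is made up for illustration.

```python
import hashlib


def digest_of(text, encoding):
    """Hash the bytes produced by encoding `text` (hypothetical helper)."""
    return hashlib.sha256(text.encode(encoding)).hexdigest()


text = "abc"
utf8_digest = digest_of(text, "utf-8")
utf16_digest = digest_of(text, "utf-16-le")

# Same characters, different byte sequences, therefore different digests.
print(utf8_digest == utf16_digest)

# Passing the str itself is rejected; the caller has to pick an encoding.
try:
    hashlib.sha256(text)
except TypeError as exc:
    print("TypeError:", exc)
```

With an explicit encode() call the choice of encoding is visible at the
call site, which is the behaviour argued for above.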

Opinions?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From collinw at gmail.com  Sun Aug 26 21:44:54 2007
From: collinw at gmail.com (Collin Winter)
Date: Sun, 26 Aug 2007 12:44:54 -0700
Subject: [Python-3000] A couple 2to3 questions
In-Reply-To: <ee2a432c0708252013j4bc9bd58v9eea4abf31ee896b@mail.gmail.com>
References: <18128.60288.458934.140003@montanaro.dyndns.org>
	<ee2a432c0708252013j4bc9bd58v9eea4abf31ee896b@mail.gmail.com>
Message-ID: <43aa6ff70708261244v5ed8b85bj8d62b001bb630134@mail.gmail.com>

On 8/25/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
> On 8/25/07, skip at pobox.com <skip at pobox.com> wrote:
> >     2. I noticed a couple places where it seems to replace "if isinstance"
> >        with "ifinstance".  Seems like an output bug of some sort.
>
> That bug was probably me.  I did some large changes and broke
> some things a while back.  I've since learned my lesson and just use
> 2to3 to automate the task. :-)

It wasn't you; it was a bug in fix_type_equality. I've fixed it in r57514.

Collin Winter

From amauryfa at gmail.com  Sun Aug 26 23:23:37 2007
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Sun, 26 Aug 2007 23:23:37 +0200
Subject: [Python-3000] python 3 closes sys.stdout
Message-ID: <e27efe130708261423x6631df35h78b6263662577cd7@mail.gmail.com>

Hello,

It seems that the new I/O system closes the 3 standard descriptors
(stdin, stdout and stderr) when the sys module is unloaded.

I don't know if it is a good thing on Unix, but on Windows at least,
python crashes on exit, when call_ll_exitfuncs calls fflush(stdout)
and fflush(stderr).

As a quick correction, I changed a test in _fileio.c::internal_close():

Index: Modules/_fileio.c
===========================================
--- Modules/_fileio.c        (revision 57506)
+++ Modules/_fileio.c        (working copy)
@@ -45,7 +45,7 @@
 internal_close(PyFileIOObject *self)
 {
        int save_errno = 0;
-       if (self->fd >= 0) {
+       if (self->fd >= 3) {
                int fd = self->fd;
                self->fd = -1;
                Py_BEGIN_ALLOW_THREADS

OTOH, documentation of io.open() says
"""
    (*) If a file descriptor is given, it is closed when the returned
    I/O object is closed.  If you don't want this to happen, use
    os.dup() to create a duplicate file descriptor.
"""

So a more correct change would be to dup the three descriptors behind
sys.stdin, sys.stdout and sys.stderr in site.py: installnewio()
(BTW, the -S option is broken. You guess why)

What are the consequences of a dup() on the standard descriptors?
I don't like the idea of sys.stdout.fileno() being different from 1. I
know some code that uses the numbers 0, 1, 2 to refer to the standard files.

Or we could change the behaviour to "If a file descriptor is given, it
won't be closed". You opened it, you close it.
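
The ownership options discussed above can be sketched as follows.  This
is an illustration only, not the py3k code of the time: it assumes a
current Python 3, whose built-in open() accepts closefd=False, and it
uses a scratch file from tempfile.mkstemp() instead of the real standard
descriptors.  The helper fd_is_open is hypothetical.

```python
import os
import tempfile


def fd_is_open(fd):
    """Return True if `fd` still refers to an open descriptor."""
    try:
        os.fstat(fd)
        return True
    except OSError:
        return False


fd, path = tempfile.mkstemp()
try:
    # Option 1: hand the wrapper a duplicate; closing it leaves `fd` alone.
    with open(os.dup(fd), "wb") as dup_file:
        dup_file.write(b"via dup\n")
    open_after_dup = fd_is_open(fd)

    # Option 2: tell the wrapper that it does not own the descriptor.
    with open(fd, "ab", closefd=False) as borrowed:
        borrowed.write(b"via closefd=False\n")
    open_after_borrow = fd_is_open(fd)

    # Default behaviour: the wrapper owns the descriptor and closes it.
    with open(fd, "ab") as owner:
        owner.write(b"owned\n")
    open_after_owner = fd_is_open(fd)
finally:
    os.unlink(path)

print(open_after_dup, open_after_borrow, open_after_owner)
```

Option 2 keeps the descriptor number unchanged, which matters for code
that refers to the standard files as 0, 1 and 2.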

What do you think?

-- 
Amaury Forgeot d'Arc

From nnorwitz at gmail.com  Sun Aug 26 23:36:02 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Sun, 26 Aug 2007 14:36:02 -0700
Subject: [Python-3000] python 3 closes sys.stdout
In-Reply-To: <e27efe130708261423x6631df35h78b6263662577cd7@mail.gmail.com>
References: <e27efe130708261423x6631df35h78b6263662577cd7@mail.gmail.com>
Message-ID: <ee2a432c0708261436h27c7d1b2tc6c4fc47fc258994@mail.gmail.com>

On 8/26/07, Amaury Forgeot d'Arc <amauryfa at gmail.com> wrote:
> Hello,
>
> It seems that the new I/O system closes the 3 standard descriptors
> (stdin, stdout and stderr) when the sys module is unloaded.

Amaury,

Other than this problem, can you report on how py3k is working on
Windows?  How did you compile it?  What version of the compiler?  Did
you have any problems?  Do you have outstanding changes to make it
work?  Which tests are failing?  etc.

Thanks,
n

From victor.stinner at haypocalc.com  Mon Aug 27 00:11:21 2007
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Mon, 27 Aug 2007 00:11:21 +0200
Subject: [Python-3000] python 3 closes sys.stdout
In-Reply-To: <e27efe130708261423x6631df35h78b6263662577cd7@mail.gmail.com>
References: <e27efe130708261423x6631df35h78b6263662577cd7@mail.gmail.com>
Message-ID: <200708270011.21719.victor.stinner@haypocalc.com>

On Sunday 26 August 2007 23:23:37 Amaury Forgeot d'Arc wrote:
>  internal_close(PyFileIOObject *self)
>  {
>         int save_errno = 0;
> -       if (self->fd >= 0) {
> +       if (self->fd >= 3) {
>                 int fd = self->fd;
>                 self->fd = -1;
>                 Py_BEGIN_ALLOW_THREADS

Hum, a better fix would be to add an option to choose whether the file
should be closed or not on object destruction.

Victor Stinner aka haypo
http://hachoir.org/

From greg at krypto.org  Mon Aug 27 00:54:07 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Sun, 26 Aug 2007 15:54:07 -0700
Subject: [Python-3000] How should the hash digest of a Unicode string be
	computed?
In-Reply-To: <ca471dc20708261224k6d4ab9eer71882d93517ac647@mail.gmail.com>
References: <ca471dc20708261224k6d4ab9eer71882d93517ac647@mail.gmail.com>
Message-ID: <52dc1c820708261554n6b31e40bya6d885c0f683a633@mail.gmail.com>

I'm in favor of not allowing unicode for hash functions.  Depending on
the system default encoding for a hash will not be portable.

Another question for hashlib:  It uses PyArg_Parse to get a single 's'
out of an optional parameter [see the code] and I couldn't figure out
what the best thing to do there was.  It just needs a C string to pass
to openssl to look up a hash function by name.  It's C, so I doubt it'll
ever be anything but ASCII.  How should that parameter be parsed
instead of the old 's' string format?  PyBUF_CHARACTER actually sounds
ideal in that case, assuming it guarantees UTF-8, but I wasn't clear
that it did that (is it always UTF-8, or is it the system "default
encoding", which is possibly useless as far as APIs expecting C strings
are concerned?).
Requiring a bytes object would also work but I really don't like the
idea of users needing to use a specific type for something so simple.
(I consider string constants with their preceding b, r, u, s, type
characters ugly in code without a good reason for them to be there.)

test_hashlib.py passed on the x86 OS X system I was using to write the
code.  I neglected to run the full suite or grep for hashlib in other
test suites and run those, so I missed the test_unicodedata failure;
sorry about the breakage.

Is it just me or do unicode objects supporting the buffer api seem
like an odd concept given that buffer api consumers (rather than
unicode consumers) shouldn't need to know about encodings of the data
being received?

-gps

On 8/26/07, Guido van Rossum <guido at python.org> wrote:
> Change r57490 by Gregory P Smith broke a test in test_unicodedata and,
> on PPC OSX, several tests in test_hashlib.
>
> Looking into this it's pretty clear *why* it broke: before, the 's#'
> format code was used, while Gregory's change changed this into using
> the buffer API (to ensure the data won't move around). Now, when a
> (Unicode) string is passed to s#, it uses the UTF-8 encoding. But the
> buffer API uses the raw bytes in the Unicode object, which is
> typically UTF-16 or UTF-32. (I can't quite figure out why the tests
> didn't fail on my Linux box; I'm guessing it's an endianness issue,
> but it can't be that simple. Perhaps that box happens to be falling
> back on a different implementation of the checksums?)
>
> I checked in a fix (because I don't like broken tests :-) which
> restores the old behavior by passing PyBUF_CHARACTER to
> PyObject_GetBuffer(), which enables a special case in the buffer API
> for PyUnicode that returns the UTF-8 encoded bytes instead of the raw
> bytes. (I still find this questionable, especially since a few random
> places in bytesobject.c also use PyBUF_CHARACTER, presumably to make
> tests pass, but for the *bytes* type, requesting *characters* (even
> encoded ones) is iffy.)
>
> But I'm wondering if passing a Unicode string to the various hash
> digest functions should work at all! Hashes are defined on sequences
> of bytes, and IMO we should insist on the user to pass us bytes, and
> not second-guess what to do with Unicode.
>
> Opinions?
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/greg%40krypto.org
>

From adam at hupp.org  Mon Aug 27 03:00:44 2007
From: adam at hupp.org (Adam Hupp)
Date: Sun, 26 Aug 2007 21:00:44 -0400
Subject: [Python-3000] Support for newline and encoding arguments to open in
	tempfile module, also mktemp deprecation
Message-ID: <766a29bd0708261800y376b65a9n4723910ea27e17f6@mail.gmail.com>

It would be useful to support 'newline' and 'encoding' arguments in
tempfile.TemporaryFile and friends.  These new arguments would be
passed directly into io.open.  I've uploaded a patch for this to:

http://bugs.python.org/issue1033

The 'bufsize' argument to os.fdopen has changed to 'buffering' so I
went ahead and made the same change to TemporaryFile etc.  Is this
desirable?

While in tempfile, I noticed that tempfile.mktemp() has the following comment:

"This function is unsafe and should not be used."

The docs list it as "Deprecated since release 2.3".  Should it be
removed in py3k?


-- 
Adam Hupp | http://hupp.org/adam/

From oliphant.travis at ieee.org  Mon Aug 27 03:13:33 2007
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Sun, 26 Aug 2007 20:13:33 -0500
Subject: [Python-3000] How should the hash digest of a Unicode string be
	computed?
In-Reply-To: <52dc1c820708261554n6b31e40bya6d885c0f683a633@mail.gmail.com>
References: <ca471dc20708261224k6d4ab9eer71882d93517ac647@mail.gmail.com>
	<52dc1c820708261554n6b31e40bya6d885c0f683a633@mail.gmail.com>
Message-ID: <fat8fv$u9v$1@sea.gmane.org>

Gregory P. Smith wrote:
> I'm in favor of not allowing unicode for hash functions.  Depending on
> the system default encoding for a hash will not be portable.
> 
> another question for hashlib:  It uses PyArg_Parse to get a single 's'
> out of an optional parameter [see the code] and I couldn't figure out
> what the best thing to do there was.  It just needs a C string to pass
> to openssl to lookup a hash function by name.  Its C so i doubt it'll
> ever be anything but ascii.  How should that parameter be parsed
> instead of the old 's' string format?  PyBUF_CHARACTER actually sounds
> ideal in that case assuming it guarantees UTF-8 but I wasn't clear
> that it did that (is it always utf-8 or the possibly useless as far as
> APIs expecting C strings are concerned system "default encoding")?
> Requiring a bytes object would also work but I really don't like the
> idea of users needing to use a specific type for something so simple.
> (i consider string constants with their preceding b, r, u, s, type
> characters ugly in code without a good reason for them to be there)
>

The PyBUF_CHARACTER flag was an add-on after I realized that the old 
buffer API was being used in several places to get Unicode objects to encode 
their data as a string (in the default encoding of the system, I believe).

The unicode object is the only one that I know of that actually does 
something different when it is called with PyBUF_CHARACTER.


> test_hashlib.py passed on the x86 osx system i was using to write the
> code.  I neglected to run the full suite or grep for hashlib in other
> test suites and run those so i missed the test_unicodedata failure,
> sorry about the breakage.
> 
> Is it just me or do unicode objects supporting the buffer api seem
> like an odd concept given that buffer api consumers (rather than
> unicode consumers) shouldn't need to know about encodings of the data
> being received.

I think you have a point.   The buffer API does support the concept of 
"formats" but not "encodings" so having this PyBUF_CHARACTER flag looks 
rather like a hack.   I'd have to look, because I don't even remember 
what is returned as the "format" from a unicode object if it is 
requested (it is probably not correct).

I would prefer that the notion of encoding a unicode object is separated 
from the notion of the buffer API, but last week I couldn't see another 
way to un-tease it.


-Travis



> 
> -gps
> 
> On 8/26/07, Guido van Rossum <guido at python.org> wrote:
>> Change r57490 by Gregory P Smith broke a test in test_unicodedata and,
>> on PPC OSX, several tests in test_hashlib.
>>
>> Looking into this it's pretty clear *why* it broke: before, the 's#'
>> format code was used, while Gregory's change changed this into using
>> the buffer API (to ensure the data won't move around). Now, when a
>> (Unicode) string is passed to s#, it uses the UTF-8 encoding. But the
>> buffer API uses the raw bytes in the Unicode object, which is
>> typically UTF-16 or UTF-32. (I can't quite figure out why the tests
>> didn't fail on my Linux box; I'm guessing it's an endianness issue,
>> but it can't be that simple. Perhaps that box happens to be falling
>> back on a different implementation of the checksums?)
>>
>> I checked in a fix (because I don't like broken tests :-) which
>> restores the old behavior by passing PyBUF_CHARACTER to
>> PyObject_GetBuffer(), which enables a special case in the buffer API
>> for PyUnicode that returns the UTF-8 encoded bytes instead of the raw
>> bytes. (I still find this questionable, especially since a few random
>> places in bytesobject.c also use PyBUF_CHARACTER, presumably to make
>> tests pass, but for the *bytes* type, requesting *characters* (even
>> encoded ones) is iffy.)
>>
>> But I'm wondering if passing a Unicode string to the various hash
>> digest functions should work at all! Hashes are defined on sequences
>> of bytes, and IMO we should insist on the user to pass us bytes, and
>> not second-guess what to do with Unicode.
>>
>> Opinions?
>>
>> --
>> --Guido van Rossum (home page: http://www.python.org/~guido/)
>>


From guido at python.org  Mon Aug 27 03:51:39 2007
From: guido at python.org (Guido van Rossum)
Date: Sun, 26 Aug 2007 18:51:39 -0700
Subject: [Python-3000] Support for newline and encoding arguments to
	open in tempfile module, also mktemp deprecation
In-Reply-To: <766a29bd0708261800y376b65a9n4723910ea27e17f6@mail.gmail.com>
References: <766a29bd0708261800y376b65a9n4723910ea27e17f6@mail.gmail.com>
Message-ID: <ca471dc20708261851h1b16c6acie6359d066729b90e@mail.gmail.com>

On 8/26/07, Adam Hupp <adam at hupp.org> wrote:
> It would be useful to support 'newline' and 'encoding' arguments in
> tempfile.TemporaryFile and friends.  These new arguments would be
> passed directly into io.open.  I've uploaded a patch for this to:
>
> http://bugs.python.org/issue1033
>
> The 'bufsize' argument to os.fdopen has changed to 'buffering' so I
> went ahead and made the same change to TemporaryFile etc.  Is this
> desirable?

Hm, why not just create the temporary file in binary mode and wrap an
io.TextIOWrapper instance around it?
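
A minimal sketch of that suggestion, assuming today's tempfile and io
modules; the helper names open_text_tempfile and roundtrip are made up
for illustration and are not part of either module:

```python
import io
import tempfile

def open_text_tempfile(encoding="utf-8", newline="\n"):
    """Create a binary temporary file and wrap it for text I/O."""
    raw = tempfile.TemporaryFile(mode="w+b")
    # The wrapper, not tempfile, owns the encoding and newline handling.
    return io.TextIOWrapper(raw, encoding=encoding, newline=newline)

def roundtrip(text):
    """Write text through the wrapper and read it back."""
    with open_text_tempfile() as f:
        f.write(text)
        f.seek(0)  # seeking flushes pending writes first
        return f.read()
```

With this approach tempfile itself would not need the extra arguments;
the caller picks the encoding and newline policy when wrapping.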

> While in tempfile, I noticed that tempfile.mktemp() has the following comment:
>
> "This function is unsafe and should not be used."
>
> The docs list it as "Deprecated since release 2.3".  Should it be
> removed in py3k?

I personally think the deprecation was an overreaction to the security
concerns. People avoid the warning by calling mkstemp() but then just
close the file descriptor and use the filename anyway; that's just as
unsafe, but often there's just no other way. I say, remove the
deprecation.

The attack on mktemp() is much less likely because the name is much
more random anyway.

(If you haven't heard of the attack: another process could guess the
name of the tempfile and quickly replace it with a symbolic link
pointing to a file owned by the user owning the process, e.g.
/etc/passwd, which will then get overwritten. This is because /tmp is
writable by anyone. It works for non-root users too, to some extent.)
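
For comparison, a sketch of the pattern that avoids the race: keep
using the descriptor that mkstemp() hands back rather than reopening
the file by name. The helper name write_temp_safely is hypothetical:

```python
import os
import tempfile

def write_temp_safely(data, directory=None):
    """Write bytes to a new temporary file and return its path."""
    # mkstemp() creates the file itself and returns an open descriptor,
    # so there is no window in which another process can plant a
    # symlink under the chosen name.
    fd, path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path
```

Closing the descriptor and reopening the path later brings the race
back, which is the misuse described above.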

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Mon Aug 27 03:54:49 2007
From: guido at python.org (Guido van Rossum)
Date: Sun, 26 Aug 2007 18:54:49 -0700
Subject: [Python-3000] How should the hash digest of a Unicode string be
	computed?
In-Reply-To: <fat8fv$u9v$1@sea.gmane.org>
References: <ca471dc20708261224k6d4ab9eer71882d93517ac647@mail.gmail.com>
	<52dc1c820708261554n6b31e40bya6d885c0f683a633@mail.gmail.com>
	<fat8fv$u9v$1@sea.gmane.org>
Message-ID: <ca471dc20708261854m4911cfdaj47fee2598db18a17@mail.gmail.com>

On 8/26/07, Travis Oliphant <oliphant.travis at ieee.org> wrote:
> Gregory P. Smith wrote:
> > I'm in favor of not allowing unicode for hash functions.  Depending on
> > the system default encoding for a hash will not be portable.
> >
> > another question for hashlib:  It uses PyArg_Parse to get a single 's'
> > out of an optional parameter [see the code] and I couldn't figure out
> > what the best thing to do there was.  It just needs a C string to pass
> > to openssl to lookup a hash function by name.  Its C so i doubt it'll
> > ever be anything but ascii.  How should that parameter be parsed
> > instead of the old 's' string format?  PyBUF_CHARACTER actually sounds
> > ideal in that case assuming it guarantees UTF-8 but I wasn't clear
> > that it did that (is it always utf-8 or the possibly useless as far as
> > APIs expecting C strings are concerned system "default encoding")?
> > Requiring a bytes object would also work but I really don't like the
> > idea of users needing to use a specific type for something so simple.
> > (i consider string constants with their preceding b, r, u, s, type
> > characters ugly in code without a good reason for them to be there)
> >
>
> The PyBUF_CHARACTER flag was an add-on after I realized that the old
> buffer API was being used in several places to get Unicode objects to encode
> their data as a string (in the default encoding of the system, I believe).
>
> The unicode object is the only one that I know of that actually does
> something different when it is called with PyBUF_CHARACTER.

Aha, I figured something like that.

> > test_hashlib.py passed on the x86 osx system i was using to write the
> > code.  I neglected to run the full suite or grep for hashlib in other
> > test suites and run those so i missed the test_unicodedata failure,
> > sorry about the breakage.
> >
> > Is it just me or do unicode objects supporting the buffer api seem
> > like an odd concept given that buffer api consumers (rather than
> > unicode consumers) shouldn't need to know about encodings of the data
> > being received.
>
> I think you have a point.   The buffer API does support the concept of
> "formats" but not "encodings" so having this PyBUF_CHARACTER flag looks
> rather like a hack.   I'd have to look, because I don't even remember
> what is returned as the "format" from a unicode object if it is
> requested (it is probably not correct).
>
> I would prefer that the notion of encoding a unicode object is separated
> from the notion of the buffer API, but last week I couldn't see another
> way to un-tease it.

I'll work on this some more. The problem is that it is currently
relied on in a number of places (some of which probably don't even
know it), and all those places must be changed to explicitly encode
the Unicode string instead of passing it to some API that expects
bytes.
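
A small sketch of what "explicitly encode" looks like from Python code,
using hashlib as it behaves in current Python 3 releases (not
necessarily the 2007 branch); digest_text and rejects_str are
illustrative names:

```python
import hashlib

def digest_text(text, encoding="utf-8"):
    """Hash a str by first choosing an encoding explicitly."""
    return hashlib.sha256(text.encode(encoding)).hexdigest()

def rejects_str():
    """Return True if hashlib refuses an unencoded str."""
    try:
        hashlib.sha256("abc")
    except TypeError:
        return True
    return False
```

The same text hashed as UTF-8 and as UTF-16 gives different digests,
which is why the choice has to be the caller's.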

FWIW, this is the only issue that I have with your work so far. Two of
your friends made it to the Sprint at least one day, but I have to
admit that I don't know if they made any changes.

--Guido

> -Travis
>
>
>
> >
> > -gps
> >
> > On 8/26/07, Guido van Rossum <guido at python.org> wrote:
> >> Change r57490 by Gregory P Smith broke a test in test_unicodedata and,
> >> on PPC OSX, several tests in test_hashlib.
> >>
> >> Looking into this it's pretty clear *why* it broke: before, the 's#'
> >> format code was used, while Gregory's change changed this into using
> >> the buffer API (to ensure the data won't move around). Now, when a
> >> (Unicode) string is passed to s#, it uses the UTF-8 encoding. But the
> >> buffer API uses the raw bytes in the Unicode object, which is
> >> typically UTF-16 or UTF-32. (I can't quite figure out why the tests
> >> didn't fail on my Linux box; I'm guessing it's an endianness issue,
> >> but it can't be that simple. Perhaps that box happens to be falling
> >> back on a different implementation of the checksums?)
> >>
> >> I checked in a fix (because I don't like broken tests :-) which
> >> restores the old behavior by passing PyBUF_CHARACTER to
> >> PyObject_GetBuffer(), which enables a special case in the buffer API
> >> for PyUnicode that returns the UTF-8 encoded bytes instead of the raw
> >> bytes. (I still find this questionable, especially since a few random
> >> places in bytesobject.c also use PyBUF_CHARACTER, presumably to make
> >> tests pass, but for the *bytes* type, requesting *characters* (even
> >> encoded ones) is iffy.)
> >>
> >> But I'm wondering if passing a Unicode string to the various hash
> >> digest functions should work at all! Hashes are defined on sequences
> >> of bytes, and IMO we should insist on the user to pass us bytes, and
> >> not second-guess what to do with Unicode.
> >>
> >> Opinions?
> >>
> >> --
> >> --Guido van Rossum (home page: http://www.python.org/~guido/)
> >> _______________________________________________
> >> Python-3000 mailing list
> >> Python-3000 at python.org
> >> http://mail.python.org/mailman/listinfo/python-3000
> >> Unsubscribe: http://mail.python.org/mailman/options/python-3000/greg%40krypto.org
> >>
>
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg at krypto.org  Mon Aug 27 05:43:30 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Sun, 26 Aug 2007 20:43:30 -0700
Subject: [Python-3000] How should the hash digest of a Unicode string be
	computed?
In-Reply-To: <fat8fv$u9v$1@sea.gmane.org>
References: <ca471dc20708261224k6d4ab9eer71882d93517ac647@mail.gmail.com>
	<52dc1c820708261554n6b31e40bya6d885c0f683a633@mail.gmail.com>
	<fat8fv$u9v$1@sea.gmane.org>
Message-ID: <52dc1c820708262043s1358ec81mfdf39b309381f249@mail.gmail.com>

On 8/26/07, Travis Oliphant <oliphant.travis at ieee.org> wrote:
>
> Gregory P. Smith wrote:
> > I'm in favor of not allowing unicode for hash functions.  Depending on
> > the system default encoding for a hash will not be portable.
> >
> > another question for hashlib:  It uses PyArg_Parse to get a single 's'
> > out of an optional parameter [see the code] and I couldn't figure out
> > what the best thing to do there was.  It just needs a C string to pass
> > to openssl to lookup a hash function by name.  Its C so i doubt it'll
> > ever be anything but ascii.  How should that parameter be parsed
> > instead of the old 's' string format?  PyBUF_CHARACTER actually sounds
> > ideal in that case assuming it guarantees UTF-8 but I wasn't clear
> > that it did that (is it always utf-8 or the possibly useless as far as
> > APIs expecting C strings are concerned system "default encoding")?
> > Requiring a bytes object would also work but I really don't like the
> > idea of users needing to use a specific type for something so simple.
> > (i consider string constants with their preceding b, r, u, s, type
> > characters ugly in code without a good reason for them to be there)
> >
>
> The PyBUF_CHARACTER flag was an add-on after I realized that the old
> buffer API was being used in several places to get Unicode objects to encode
> their data as a string (in the default encoding of the system, I believe).
>
> The unicode object is the only one that I know of that actually does
> something different when it is called with PyBUF_CHARACTER.
>
> > Is it just me or do unicode objects supporting the buffer api seem
> > like an odd concept given that buffer api consumers (rather than
> > unicode consumers) shouldn't need to know about encodings of the data
> > being received.
>
> I think you have a point.   The buffer API does support the concept of
> "formats" but not "encodings" so having this PyBUF_CHARACTER flag looks
> rather like a hack.   I'd have to look, because I don't even remember
> what is returned as the "format" from a unicode object if it is
> requested (it is probably not correct).


given that utf-8 characters are varying widths i don't see how it could ever
practically be correct for unicode.

> I would prefer that the notion of encoding a unicode object is separated
> from the notion of the buffer API, but last week I couldn't see another
> way to un-tease it.
>
> -Travis


A thought that just occurred to me... Would a PyBUF_CANONICAL flag be useful
instead of CHARACTERS?  For unicode that'd mean utf-8 (not just the default
encoding) but I could imagine other potential uses such as multi-dimension
buffers (PIL image objects?) presenting a defined canonical form of the data
useful for either serialization or hashing.  Any buffer api implementing
object would define its own canonical form.

-gps

From guido at python.org  Mon Aug 27 07:02:16 2007
From: guido at python.org (Guido van Rossum)
Date: Sun, 26 Aug 2007 22:02:16 -0700
Subject: [Python-3000] How should the hash digest of a Unicode string be
	computed?
In-Reply-To: <52dc1c820708262043s1358ec81mfdf39b309381f249@mail.gmail.com>
References: <ca471dc20708261224k6d4ab9eer71882d93517ac647@mail.gmail.com>
	<52dc1c820708261554n6b31e40bya6d885c0f683a633@mail.gmail.com>
	<fat8fv$u9v$1@sea.gmane.org>
	<52dc1c820708262043s1358ec81mfdf39b309381f249@mail.gmail.com>
Message-ID: <ca471dc20708262202x1214db31gde15145dd9c93246@mail.gmail.com>

On 8/26/07, Gregory P. Smith <greg at krypto.org> wrote:
> On 8/26/07, Travis Oliphant <oliphant.travis at ieee.org> wrote:
> > Gregory P. Smith wrote:
> > > I'm in favor of not allowing unicode for hash functions.  Depending on
> > > the system default encoding for a hash will not be portable.
> > >
> > > another question for hashlib:  It uses PyArg_Parse to get a single 's'
> > > out of an optional parameter [see the code] and I couldn't figure out
> > > what the best thing to do there was.  It just needs a C string to pass
> > > to openssl to lookup a hash function by name.  Its C so i doubt it'll
> > > ever be anything but ascii.  How should that parameter be parsed
> > > instead of the old 's' string format?  PyBUF_CHARACTER actually sounds
> > > ideal in that case assuming it guarantees UTF-8 but I wasn't clear
> > > that it did that (is it always utf-8 or the possibly useless as far as
> > > APIs expecting C strings are concerned system "default encoding")?
> > > Requiring a bytes object would also work but I really don't like the
> > > idea of users needing to use a specific type for something so simple.
> > > (i consider string constants with their preceding b, r, u, s, type
> > > characters ugly in code without a good reason for them to be there)
> > >
> >
> > The PyBUF_CHARACTER flag was an add-on after I realized that the old
> > buffer API was being used in several places to get Unicode objects to encode
> > their data as a string (in the default encoding of the system, I believe).
> >
> > The unicode object is the only one that I know of that actually does
> > something different when it is called with PyBUF_CHARACTER.
> >
> > > Is it just me or do unicode objects supporting the buffer api seem
> > > like an odd concept given that buffer api consumers (rather than
> > > unicode consumers) shouldn't need to know about encodings of the data
> > > being received.
> >
> > I think you have a point.   The buffer API does support the concept of
> > "formats" but not "encodings" so having this PyBUF_CHARACTER flag looks
> > rather like a hack.   I'd have to look, because I don't even remember
> > what is returned as the "format" from a unicode object if it is
> > requested (it is probably not correct).
>
> given that utf-8 characters are varying widths i don't see how it could ever
> practically be correct for unicode.

Well, *practically*, the unicode object returns UTF-8 for
PyBUF_CHARACTER. That is correct (at least until I rip all this out,
which I'm in the middle of -- but no time to finish it tonight).

> > I would prefer that the notion of encoding a unicode object is separated
> > from the notion of the buffer API, but last week I couldn't see another
> > way to un-tease it.
> >
> > -Travis
>
> A thought that just occurred to me... Would a PyBUF_CANONICAL flag be useful
> instead of CHARACTERS?  For unicode that'd mean utf-8 (not just the default
> encoding) but I could imagine other potential uses such as multi-dimension
> buffers (PIL image objects?) presenting a defined canonical form of the data
> useful for either serialization or hashing.  Any buffer api implementing
> object would define its own canonical form.

Note, the default encoding in 3.0 is fixed to UTF-8. (And it's fixed
in a much more permanent way than in 2.x -- it is really hardcoded and
there is really no way to change it.)

But I'm thinking YAGNI -- the buffer API should always just return the
bytes as they already are sitting in memory, not some transformation
thereof. The current behavior of the unicode object for
PyBUF_CHARACTER violates this. (There are no other violations BTW.)
This is why I want to rip it out. I'm close...
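
As a sketch of where this ends up, here is how current Python 3
behaves, seen through memoryview; this is the present-day outcome, not
the 2007 code, and exposes_buffer is a made-up helper:

```python
def exposes_buffer(obj):
    """Return True if obj supports the buffer protocol."""
    try:
        memoryview(obj)
    except TypeError:
        return False
    return True

# A bytes object hands out exactly the bytes it stores, untransformed.
raw = "caf\u00e9".encode("utf-8")
view_bytes = bytes(memoryview(raw))
```

A str no longer exports a buffer at all, so consumers that want bytes
have to ask for an encoding first.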

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From baranguren at gmail.com  Sun Aug 26 19:29:18 2007
From: baranguren at gmail.com (Benjamin Aranguren)
Date: Sun, 26 Aug 2007 10:29:18 -0700
Subject: [Python-3000] backported ABC
In-Reply-To: <ca471dc20708260756j5a518fe3m789500cddf1385a9@mail.gmail.com>
References: <f413c71e0708251109t31a92bedrdd5003d5053c852@mail.gmail.com>
	<ca471dc20708251910o1ef3ea77nf0b2579862977211@mail.gmail.com>
	<e17b64640708260329t1c53e4b7r504418793c6ddd97@mail.gmail.com>
	<ca471dc20708260756j5a518fe3m789500cddf1385a9@mail.gmail.com>
Message-ID: <e17b64640708261029r372c835ake134636e9dbbb73e@mail.gmail.com>

No problem.  Created issue 1026 in tracker with a single patch file attached.

I'm not aware of what changes need to be done with _abcoll.py and
collections.py.  If you can point me to the right direction, I would
definitely like to work on it.

On 8/26/07, Guido van Rossum <guido at python.org> wrote:
> Thanks!
>
> Would it inconvenience you terribly to upload this all to the new
> tracker (bugs.python.org)? Preferably as a single patch against the
> svn trunk (to use svn diff, you have to svn add the new files first!)
>
> Also, are you planning to work on _abcoll.py and the changes to collections.py?
>
> --Guido
>
> On 8/26/07, Benjamin Aranguren <baranguren at gmail.com> wrote:
> > We copied abc.py and test_abc.py from py3k svn and modified to work with 2.6.
> >
> > After making all the changes we ran all the tests to ensure that no
> > other modules were affected.
> >
> > Attached are abc.py, test_abc.py, and their relevant patches from 3.0 to 2.6.
> >
> > On 8/25/07, Guido van Rossum <guido at python.org> wrote:
> > > Um, that patch contains only the C code for overloading isinstance()
> > > and issubclass().
> > >
> > > Did you do anything about abc.py and _abcoll.py/collections.py and
> > > their respective unit tests? Or what about the unit tests for
> > > isinstance()/issubclass()?
> > >
> > > On 8/25/07, Benjamin Aranguren <baranguren at gmail.com> wrote:
> > > > Worked with Alex Martelli at the Google Python Sprint.
> > >
> > > --
> > > --Guido van Rossum (home page: http://www.python.org/~guido/)
> > >
> >
> >
>
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>

From nnorwitz at gmail.com  Mon Aug 27 08:57:07 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Sun, 26 Aug 2007 23:57:07 -0700
Subject: [Python-3000] backported ABC
In-Reply-To: <e17b64640708261047s5f96e854ybed62a7248f3ee51@mail.gmail.com>
References: <f413c71e0708251109t31a92bedrdd5003d5053c852@mail.gmail.com>
	<ca471dc20708251910o1ef3ea77nf0b2579862977211@mail.gmail.com>
	<e17b64640708260329t1c53e4b7r504418793c6ddd97@mail.gmail.com>
	<ca471dc20708260756j5a518fe3m789500cddf1385a9@mail.gmail.com>
	<e17b64640708261029r372c835ake134636e9dbbb73e@mail.gmail.com>
	<e17b64640708261047s5f96e854ybed62a7248f3ee51@mail.gmail.com>
Message-ID: <ee2a432c0708262357o51622336t55e90ad3625751fd@mail.gmail.com>

Another thing that needs to be ported are the changes to
Lib/test/regrtest.py.  Pretty much anything that references ABCs in
there needs backporting.  You can verify it works properly by running
regrtest.py with the -R option on any test that uses an ABC.  It
should not report leaks.  The full command line should look something
like:  ./python Lib/test/regrtest.py -R 4:3 test_abc

n
--
On 8/26/07, Benjamin Aranguren <baranguren at gmail.com> wrote:
> I got it now.  both modules need to be backported as well.  I'm on it.
>
> On 8/26/07, Benjamin Aranguren <baranguren at gmail.com> wrote:
> > No problem.  Created issue 1026 in tracker with a single patch file attached.
> >
> > I'm not aware of what changes need to be done with _abcoll.py and
> > collections.py.  If you can point me to the right direction, I would
> > definitely like to work on it.
> >
> > On 8/26/07, Guido van Rossum <guido at python.org> wrote:
> > > Thanks!
> > >
> > > Would it inconvenience you terribly to upload this all to the new
> > > tracker (bugs.python.org)? Preferably as a single patch against the
> > > svn trunk (to use svn diff, you have to svn add the new files first!)
> > >
> > > Also, are you planning to work on _abcoll.py and the changes to collections.py?
> > >
> > > --Guido
> > >
> > > On 8/26/07, Benjamin Aranguren <baranguren at gmail.com> wrote:
> > > > We copied abc.py and test_abc.py from py3k svn and modified to work with 2.6.
> > > >
> > > > After making all the changes we ran all the tests to ensure that no
> > > > other modules were affected.
> > > >
> > > > Attached are abc.py, test_abc.py, and their relevant patches from 3.0 to 2.6.
> > > >
> > > > On 8/25/07, Guido van Rossum <guido at python.org> wrote:
> > > > > Um, that patch contains only the C code for overloading isinstance()
> > > > > and issubclass().
> > > > >
> > > > > Did you do anything about abc.py and _abcoll.py/collections.py and
> > > > > their respective unit tests? Or what about the unit tests for
> > > > > isinstance()/issubclass()?
> > > > >
> > > > > On 8/25/07, Benjamin Aranguren <baranguren at gmail.com> wrote:
> > > > > > Worked with Alex Martelli at the Google Python Sprint.
> > > > >
> > > > > --
> > > > > --Guido van Rossum (home page: http://www.python.org/~guido/)
> > > > >
> > > >
> > > >
> > >
> > >
> > > --
> > > --Guido van Rossum (home page: http://www.python.org/~guido/)
> > >
> >
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/nnorwitz%40gmail.com
>

From nnorwitz at gmail.com  Mon Aug 27 09:48:52 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Mon, 27 Aug 2007 00:48:52 -0700
Subject: [Python-3000] status (preparing for first alpha)
Message-ID: <ee2a432c0708270048jfbc7344ge51d1ff039db7ca2@mail.gmail.com>

Py3k is progressing nicely.  We are planning the first alpha sometime
this week.  The tests are mostly passing.  With all the churn over the
last week, I'm sure it's about to change.  :-)  AFAIK, nearly all the
tests pass on Linux and Mac OS X.  There was a report that Windows/VC8
was able to build python but it crashed in test_builtin.  Can anyone
confirm this?

Here are the tasks that we need help with before the alpha is released:
 * Verify Windows build works with VC7 (currently the default compiler for 2.5)
 * Verify Windows build passes all tests
 * Verify other Unix builds work and pass all tests
 * Fix reference leaks probably related to IO
 * Fix problem with signal 32 on old gentoo box (new IO related?)

See below for more details about many of these.

The string/unicode merge is making good progress.  There are less than
400 references to PyString.  Most of the references are in about 5-10
modules.  Less than 50 modules in the core have any references to
PyString.  We still need help converting over to use unicode.  If you
are interested in helping out, the spreadsheet is the best place to
look for tasks:
http://spreadsheets.google.com/ccc?key=pBLWM8elhFAmKbrhhh0ApQA&hl=en_US&pli=1

Separate sheets exist for C, Python, Writing, and Reading tasks.

You can review the 3.0 docs at:  http://docs.python.org/dev/3.0/
They are updated every 12 hours.  There are many parts which need improvement.

There are 4 tests that report reference leaks:

test_io leaked [62, 62] references
test_urllib leaked [122, 122] references
test_urllib2_localnet leaked [3, 3] references
test_xmlrpc leaked [26, 26] references

On the gentoo machine that builds the docs, I would like to run the
tests.  2.x is currently running python without a problem.  In 3.0,
there is a strange error about receiving an Unknown signal 32.  I'm
guessing this is related to the new IO library, but that's really a
guess.  Does anyone have a clue about what's happening here?  I don't
think I can catch the signal (I tried).  Part of the reason I suspect
IO is that the problem seems to occur while running various tests that
use sockets.  test_poplib is often the first one.  But even if that's
skipped many other tests can cause the problem, including:  test_queue
test_smtplib test_socket test_socket_ssl test_socketserver.  Perhaps
there's a test that triggers the problem and almost any other test
seems to be causing the problem?

There are some unexplained oddities.  The most recent issue I saw was
this strange exception while running the tests:

  File "Lib/httplib.py", line 1157, in __init__
    HTTPConnection.__init__(self, host, port, strict, timeout)
TypeError: unbound method __init__() must be called with
FakeHTTPConnection instance as first argument (got HTTPSConnection
instance instead)

I've seen this exactly once.  I don't know what happened.  Completely
unrelated, I also had a problem with using uninitialized memory from
test_bytes.  That also only happened once.  It could have been a
problem with an underlying library.

n

From amauryfa at gmail.com  Mon Aug 27 10:22:42 2007
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Mon, 27 Aug 2007 10:22:42 +0200
Subject: [Python-3000] status (preparing for first alpha)
In-Reply-To: <ee2a432c0708270048jfbc7344ge51d1ff039db7ca2@mail.gmail.com>
References: <ee2a432c0708270048jfbc7344ge51d1ff039db7ca2@mail.gmail.com>
Message-ID: <e27efe130708270122y2f9afe62h9f55525eff861927@mail.gmail.com>

Hello,

Neal Norwitz wrote:
> There was a report that Windows/VC8
> was able to build python but it crashed in test_builtin.  Can anyone
> confirm this?

After some more digging:
- Only the debug build is concerned. No crash with a release build.
- The crash is a stack overflow.
- the failing function is test_cmp() in test_builtin.py, and indeed it
tries to "verify that circular objects are not handled", by expecting
a RuntimeError.
- The debugger stops in PyUnicode_EncodeUTF8. This function defines a
somewhat large variable:
    #define MAX_SHORT_UNICHARS 300  /* largest size we'll do on the stack */
    char stackbuf[MAX_SHORT_UNICHARS * 4];

I suspect that the stack requirements for a recursive __cmp__ have increased.
It may be lower for a release build thanks to compiler optimizations.
I will try to come later with more precise measurements.
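
For reference, a sketch of the kind of check involved, written against
current Python 3, where == stands in for the old cmp(); this is not
the actual test_cmp() code, and compare_circular is an illustrative
name:

```python
def compare_circular():
    """Compare two distinct self-referential lists.

    Returns the exception raised by the comparison, or None.
    """
    a = []
    a.append(a)
    b = []
    b.append(b)
    try:
        a == b  # recurses: a[0] vs b[0] is again a vs b
    except RuntimeError as exc:  # RecursionError subclasses RuntimeError
        return exc
    return None
```

The interpreter is expected to stop the recursion with an exception
before the C stack runs out; a larger per-frame stack cost makes that
margin smaller.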

-- 
Amaury Forgeot d'Arc

From greg at electricrain.com  Mon Aug 27 09:59:25 2007
From: greg at electricrain.com (Gregory P. Smith)
Date: Mon, 27 Aug 2007 00:59:25 -0700
Subject: [Python-3000] Immutable bytes type and bsddb or other IO
In-Reply-To: <20070824165823.GM24059@electricrain.com>
References: <ca471dc20708062056w362a951ma705452868e41472@mail.gmail.com>
	<46B7FACC.8030503@v.loewis.de>
	<20070822235929.GA12780@electricrain.com>
	<46CD2209.8000408@v.loewis.de>
	<20070823073837.GA14725@electricrain.com>
	<46CD3BFF.5080904@v.loewis.de>
	<20070823171837.GI24059@electricrain.com>
	<46CE5346.10301@canterbury.ac.nz>
	<ca471dc20708232117o290ccdees4175d057de483f56@mail.gmail.com>
	<20070824165823.GM24059@electricrain.com>
Message-ID: <20070827075925.GT24059@electricrain.com>

On Fri, Aug 24, 2007 at 09:58:24AM -0700, Gregory P. Smith wrote:
> On Thu, Aug 23, 2007 at 09:17:04PM -0700, Guido van Rossum wrote:
> > On 8/23/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> > > Gregory P. Smith wrote:
> > > > Wasn't a past mailing list thread claiming the bytes type was supposed
> > > > to be great for IO?  How's that possible unless we add a lock to the
> > > > bytesobject?
> > >
> > > Doesn't the new buffer protocol provide something for
> > > getting a locked view of the data? If so, it seems like
> > > bytes should implement that.
> > 
> > It *does* implement that! So there's the solution: these APIs should
> > not insist on bytes but use the buffer API. It's quite a bit of work I
> > suspect (especially since you can't use PyArg_ParseTuple with y# any
> > more) but worth it.
> > 
> > BTW PyUnicode should *not* support the buffer API.
> > 
> > I'll add both of these to the task spreadsheet.
> 
> this sounds good, i'll work on it today for bsddb and hashlib.

So I converted _bsddb.c to use the buffer API everywhere only to find
that bytes objects don't support the PyBUF_LOCKDATA option of the
buffer API...  I should've seen that coming.  :)  Anyways I opened a
bug to track that.  It's needed in order to release the GIL while doing
I/O from bytes objects.

 http://bugs.python.org/issue1035
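
A rough illustration of the guarantee being asked for, using the
export lock that current Python 3 places on a bytearray while a
memoryview of it exists.  That is today's mechanism and an assumption
on my part as an analogy, not the PyBUF_LOCKDATA flag discussed here:

```python
def resize_while_exported():
    """Try to resize a bytearray while a buffer view of it is live."""
    data = bytearray(b"abc")
    view = memoryview(data)  # exports the underlying buffer
    try:
        data.extend(b"def")  # would move or resize the memory
    except BufferError:
        return "blocked"
    finally:
        view.release()       # drop the export again
    return "resized"
```

While the view is held the memory cannot be resized underneath a
consumer, which is the property needed before releasing the GIL for
I/O.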

My _bsddb patch is stored for posterity in issue1036 until issue1035
can be fixed.  I'll test it another day ignoring the mutability issues
(as the current _bsddb.c does with its direct use of bytes) and update
the patch after squashing bugs.

-gps


From skip at pobox.com  Mon Aug 27 13:12:53 2007
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 27 Aug 2007 06:12:53 -0500
Subject: [Python-3000] status (preparing for first alpha)
In-Reply-To: <ee2a432c0708270048jfbc7344ge51d1ff039db7ca2@mail.gmail.com>
References: <ee2a432c0708270048jfbc7344ge51d1ff039db7ca2@mail.gmail.com>
Message-ID: <18130.45493.75756.332057@montanaro.dyndns.org>


    Neal> The string/unicode merge is making good progress.  There are less
    Neal> than 400 references to PyString.  Most of the references are in
    Neal> about 5-10 modules.  Less than 50 modules in the core have any
    Neal> references to PyString.  We still need help converting over to use
    Neal> unicode.  If you are interested in helping out, the spreadsheet is
    Neal> the best place to look for tasks:
    Neal> http://spreadsheets.google.com/ccc?key=pBLWM8elhFAmKbrhhh0ApQA&hl=en_US&pli=1

As someone who hasn't participated in the string->unicode conversion up to
this point (not even looking at any of the hundreds of related checkins)
it's not at all obvious how to correctly replace PyString_* with PyUnicode_*
in any given situation.  Is there some document somewhere that can be used
to at least give some hints?  (I think I asked this before.  I'm not sure I
got an answer.)

Also, given that we are close, shouldn't a few buildbots be set up?

Skip

From theller at ctypes.org  Mon Aug 27 13:17:37 2007
From: theller at ctypes.org (Thomas Heller)
Date: Mon, 27 Aug 2007 13:17:37 +0200
Subject: [Python-3000] status (preparing for first alpha)
In-Reply-To: <ee2a432c0708270048jfbc7344ge51d1ff039db7ca2@mail.gmail.com>
References: <ee2a432c0708270048jfbc7344ge51d1ff039db7ca2@mail.gmail.com>
Message-ID: <faubsh$1tv$1@sea.gmane.org>

Neal Norwitz schrieb:
> Py3k is progressing nicely.  We are planning the first alpha sometime
> this week.  The tests are mostly passing.  With all the churn over the
> last week, I'm sure it's about to change.  :-)  AFAIK, nearly all the
> tests pass on Linux and Mac OS X.  There was a report that Windows/VC8
> was able to build python but it crashed in test_builtin.  Can anyone
> confirm this?
> 
> Here are the tasks that we need help with before the alpha is released:
>  * Verify Windows build works with VC7 (currently the default compiler for 2.5)

The build works for me, now that I've fixed PCBuild\build_ssl.py for Python3.

>  * Verify Windows build passes all tests
Hehe.

For me Python3 still cannot 'import time', because of umlauts in the _tzname
libc variable:

  c:\svn\py3k\PCbuild>python_d
  Python 3.0x (py3k:57555, Aug 27 2007, 10:00:25) [MSC v.1310 32 bit (Intel)] on win32
  Type "help", "copyright", "credits" or "license" for more information.
  >>> import time
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  UnicodeDecodeError: 'utf8' codec can't decode bytes in position 9-11: invalid data
  [36397 refs]
  >>>

Setting the environment variable TZ to 'GMT' for example is a workaround for this problem.
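
A sketch of the failure mode in plain Python.  The sample zone name
and the latin-1 fallback are assumptions for illustration; the real
_tzname bytes depend on the Windows locale:

```python
def decode_tzname(raw, fallback="latin-1"):
    """Decode a libc time zone name, falling back if it is not UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # A lone 0xE4 (a-umlaut in latin-1) is not valid UTF-8.
        return raw.decode(fallback)

# Hypothetical zone name containing an umlaut, encoded as latin-1.
sample = "Mitteleurop\u00e4ische Zeit".encode("latin-1")
```

An ASCII-only name such as 'GMT' decodes the same way under either
codec, which is why the TZ workaround helps.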


Running the PCBuild\rt.bat script fails when it compares the expected output
with the actual output.  Some inspection shows that the comparison fails because
there are '\n' linefeeds in the expected and '\r\n' linefeeds in the actual output:

  c:\svn\py3k\PCbuild>python_d  -E -tt ../lib/test/regrtest.py
  test_grammar
  test test_grammar produced unexpected output:
  **********************************************************************
  *** mismatch between line 1 of expected output and line 1 of actual output:
  - test_grammar
  + test_grammar
  ?             +
  (['test_grammar\n'], ['test_grammar\r\n'])
  ... and so on ...

(The last line is printed by some code I added to Lib\regrtest.py.)
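
The mismatch can be reproduced in isolation.  This is only a sketch using the
io module as it exists in current Python 3 releases, not the 2007 py3k code;
the helper name written_bytes is made up for the illustration:

```python
import io

def written_bytes(newline):
    """Write one line through a text layer and return the raw bytes produced."""
    raw = io.BytesIO()
    text = io.TextIOWrapper(raw, encoding="ascii", newline=newline)
    text.write("test_grammar\n")
    text.flush()
    return raw.getvalue()

translated = written_bytes("\r\n")   # '\n' is rewritten as '\r\n' on output
untranslated = written_bytes("\n")   # '\n' is passed through unchanged
```

A test runner that compares captured output byte for byte against an expected
file with '\n' endings will see every translated line as different.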

It seems that this behaviour was introduced by r57186:

  New I/O code from Tony Lownds implement newline feature correctly,
  and implements .newlines attribute in a 2.x-compatible fashion.

Temporarily reverting this change from Lib\io.py I can run the tests
without all these comparison failures.  What I see is:

...
test test_builtin failed -- Traceback (most recent call last):
  File "c:\svn\py3k\lib\test\test_builtin.py", line 1473, in test_round
    self.assertEqual(round(1e20), 1e20)
AssertionError: 0 != 1e+020
...
(a lot of failures in test_doctest.  Could this also be a line ending problem?)

Unicode errors in various tests:
  test_glob
  test test_glob failed -- Traceback (most recent call last):
    File "c:\svn\py3k\lib\test\test_glob.py", line 87, in test_glob_directory_names
      eq(self.glob('*', '*a'), [])
    File "c:\svn\py3k\lib\test\test_glob.py", line 41, in glob
      res = glob.glob(p)
    File "c:\svn\py3k\lib\glob.py", line 16, in glob
      return list(iglob(pathname))
    File "c:\svn\py3k\lib\glob.py", line 42, in iglob
      for name in glob_in_dir(dirname, basename):
    File "c:\svn\py3k\lib\glob.py", line 56, in glob1
      names = os.listdir(dirname)
  UnicodeDecodeError: 'utf8' codec can't decode bytes in position 27-31: unexpected end of data

I'll stop the report here.  A py3k buildbot on Windows would allow everyone to look at the test outcome.

Thomas


From theller at ctypes.org  Mon Aug 27 13:33:32 2007
From: theller at ctypes.org (Thomas Heller)
Date: Mon, 27 Aug 2007 13:33:32 +0200
Subject: [Python-3000] status (preparing for first alpha)
In-Reply-To: <faubsh$1tv$1@sea.gmane.org>
References: <ee2a432c0708270048jfbc7344ge51d1ff039db7ca2@mail.gmail.com>
	<faubsh$1tv$1@sea.gmane.org>
Message-ID: <faucqc$45a$1@sea.gmane.org>

Thomas Heller schrieb:
> Neal Norwitz schrieb:
>> Py3k is progressing nicely.  We are planning the first alpha sometime
>> this week.  The tests are mostly passing.  With all the churn over the
>> last week, I'm sure it's about to change.  :-)  AFAIK, nearly all the
>> tests pass on Linux and Mac OS X.  There was a report that Windows/VC8
>> was able to build python but it crashed in test_builtin.  Can anyone
>> confirm this?
>> 
>> Here are the tasks that we need help with before the alpha is released:
>>  * Verify Windows build works with VC7 (currently the default compiler for 2.5)
> 
> The build works for me, now that I've fixed PCBuild\build_ssl.py for Python3.
> 
>>  * Verify Windows build passes all tests

> Running the PCBuild\rt.bat script fails when it compares the expected output
> with the actual output.  Some inspection shows that the comparison fails because
> there are '\n' linefeeds in the expected and '\r\n' linefeeds in the actual output:
> 
>   c:\svn\py3k\PCbuild>python_d  -E -tt ../lib/test/regrtest.py
>   test_grammar
>   test test_grammar produced unexpected output:
>   **********************************************************************
>   *** mismatch between line 1 of expected output and line 1 of actual output:
>   - test_grammar
>   + test_grammar
>   ?             +
>   (['test_grammar\n'], ['test_grammar\r\n'])
>   ... and so on ...
> 
> (The last line is printed by some code I added to Lib\regrtest.py.)


http://bugs.python.org/issue1029 apparently fixes this problem.

Thomas


From guido at python.org  Mon Aug 27 16:12:59 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 27 Aug 2007 07:12:59 -0700
Subject: [Python-3000] Immutable bytes type and bsddb or other IO
In-Reply-To: <20070827075925.GT24059@electricrain.com>
References: <ca471dc20708062056w362a951ma705452868e41472@mail.gmail.com>
	<20070822235929.GA12780@electricrain.com>
	<46CD2209.8000408@v.loewis.de>
	<20070823073837.GA14725@electricrain.com>
	<46CD3BFF.5080904@v.loewis.de>
	<20070823171837.GI24059@electricrain.com>
	<46CE5346.10301@canterbury.ac.nz>
	<ca471dc20708232117o290ccdees4175d057de483f56@mail.gmail.com>
	<20070824165823.GM24059@electricrain.com>
	<20070827075925.GT24059@electricrain.com>
Message-ID: <ca471dc20708270712p3170fbb1o9450e24c11042bd5@mail.gmail.com>

On 8/27/07, Gregory P. Smith <greg at electricrain.com> wrote:
> So I converted _bsddb.c to use the buffer API everywhere only to find
> that bytes objects don't support the PyBUF_LOCKDATA option of the
> buffer API...  I should've seen that coming.  :)  Anyways I opened a
> bug to track that.  It's needed in order to release the GIL while doing
> I/O from bytes objects.
>
>  http://bugs.python.org/issue1035
>
> My _bsddb patch is stored for posterity until issue1035 can be fixed
> in issue1036.  I'll test it another day ignoring the mutability issues
> (as the current _bsddb.c does with its direct use of bytes) and update
> the patch after squashing bugs.

Adding data locking shouldn't be too complicated, but is it necessary?
The bytes object does support locking the buffer in place; isn't that
enough? It means someone evil could still produce a phase error by
changing the contents while you're looking at it (basically sabotaging
their own application) but I don't see how they could cause a segfault
that way.

Even if you really need the LOCKDATA feature, perhaps you can check in
a slight mod of your code that uses SIMPLE for now -- use a macro for
the flags that's defined as PyBUF_SIMPLE and add a comment that you'd
like it to be LOCKDATA once bytes support that.

That way we have less code in the tracker and more in subversion --
always a good thing IMO.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Mon Aug 27 16:21:50 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 27 Aug 2007 07:21:50 -0700
Subject: [Python-3000] status (preparing for first alpha)
In-Reply-To: <18130.45493.75756.332057@montanaro.dyndns.org>
References: <ee2a432c0708270048jfbc7344ge51d1ff039db7ca2@mail.gmail.com>
	<18130.45493.75756.332057@montanaro.dyndns.org>
Message-ID: <ca471dc20708270721s7ecfba17uf6472a32ae87b44b@mail.gmail.com>

On 8/27/07, skip at pobox.com <skip at pobox.com> wrote:
> As someone who hasn't participated in the string->unicode conversion up to
> this point (not even looking at any of the hundreds of related checkins)
> it's not at all obvious how to correctly replace PyString_* with PyUnicode_*
> in any given situation.  Is there some document somewhere that can be used
> to at least give some hints?  (I think I asked this before.  I'm not sure I
> got an answer.)

There isn't one recipe. You first have to decide whether a particular
API should use bytes or str. I would like to write something up
because I know it will be important for maintainers of extension
modules; but I don't have the time right now.

> Also, given that we are close, shouldn't a few buildbots be set up?

Agreed. Neal tried to set up a buildbot on the only machine he can
easily use for this, but that's the "old gentoo box" where he keeps
getting signal 32. (I suspect this may be a kernel bug and not our
fault.) I forget who can set up buildbots -- is it Martin? Can someone
else help?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip at pobox.com  Mon Aug 27 16:32:29 2007
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 27 Aug 2007 09:32:29 -0500
Subject: [Python-3000] status (preparing for first alpha)
In-Reply-To: <ca471dc20708270721s7ecfba17uf6472a32ae87b44b@mail.gmail.com>
References: <ee2a432c0708270048jfbc7344ge51d1ff039db7ca2@mail.gmail.com>
	<18130.45493.75756.332057@montanaro.dyndns.org>
	<ca471dc20708270721s7ecfba17uf6472a32ae87b44b@mail.gmail.com>
Message-ID: <18130.57469.16905.629301@montanaro.dyndns.org>


    >> Also, given that we are close, shouldn't a few buildbots be set up?

    Guido> Agreed. Neal tried to set up a buildbot on the only machine he
    Guido> can easily use for this, but that's the "old gentoo box" where he
    Guido> keeps getting signal 32. (I suspect this may be a kernel bug and
    Guido> not our fault.) I forget who can set up buildbots -- is it
    Guido> Martin? Can someone else help?

I run a couple community buildbots on my G5 for SQLAlchemy.  I can set one
up there for py3k if desired.  Just let me know what to do.

Skip

From aahz at pythoncraft.com  Mon Aug 27 18:32:51 2007
From: aahz at pythoncraft.com (Aahz)
Date: Mon, 27 Aug 2007 09:32:51 -0700
Subject: [Python-3000] Removing email package until it's fixed
In-Reply-To: <fascp4$kqj$1@sea.gmane.org>
References: <ca471dc20708250636t5c0c6c1er50621f837a1a0f0b@mail.gmail.com>
	<8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org>
	<bbaeab100708251500s48810d7fn521dcc74c2a42db7@mail.gmail.com>
	<faquf9$f8j$1@sea.gmane.org> <20070826051302.GC24678@panix.com>
	<18129.26473.77328.489985@montanaro.dyndns.org>
	<20070826140718.GA15100@panix.com> <fascp4$kqj$1@sea.gmane.org>
Message-ID: <20070827163251.GA9067@panix.com>

On Sun, Aug 26, 2007, Neil Schemenauer wrote:
> Aahz <aahz at pythoncraft.com> wrote:
>>
>> -0 on the idea of making "batteries included" include PyPI packages.
>> Anything part of "batteries included" IMO should just be part of the
>> standard install.
> 
> I think you misunderstand the proposal.  The "batteries" would be
> included as part of the final Python release.  From the end user's
> point of view there would be no change from the current model.  The
> difference would be from the Python developer's point of view.  Some
> libraries would no longer be part of SVN checkout and you would have
> to run a script to pull them into your source tree.

Given how little dev I do, I'm not entitled to an opinion, but given the
number of messages I see to the mailing lists that end up as being
checkout synch problems, I see this as a recipe for trouble, particularly
for regression testing.

Because this is just an infrastructure/procedure change to the dev
process, it should be easy enough to revert if it proves problematic, so
I remove my -0.
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"If you don't know what your program is supposed to do, you'd better not
start writing it."  --Dijkstra

From rhamph at gmail.com  Mon Aug 27 19:21:21 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Mon, 27 Aug 2007 11:21:21 -0600
Subject: [Python-3000] Removing email package until it's fixed
In-Reply-To: <fascp4$kqj$1@sea.gmane.org>
References: <ca471dc20708250636t5c0c6c1er50621f837a1a0f0b@mail.gmail.com>
	<8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org>
	<bbaeab100708251500s48810d7fn521dcc74c2a42db7@mail.gmail.com>
	<faquf9$f8j$1@sea.gmane.org> <20070826051302.GC24678@panix.com>
	<18129.26473.77328.489985@montanaro.dyndns.org>
	<20070826140718.GA15100@panix.com> <fascp4$kqj$1@sea.gmane.org>
Message-ID: <aac2c7cb0708271021o17371fd6ob51997137fb130e1@mail.gmail.com>

On 8/26/07, Neil Schemenauer <nas at arctrix.com> wrote:
> Aahz <aahz at pythoncraft.com> wrote:
> > -0 on the idea of making "batteries included" include PyPI packages.
> > Anything part of "batteries included" IMO should just be part of the
> > standard install.
>
> I think you misunderstand the proposal.  The "batteries" would be
> included as part of the final Python release.  From the end user's
> point of view there would be no change from the current model.  The
> difference would be from the Python developer's point of view.  Some
> libraries would no longer be part of SVN checkout and you would have
> to run a script to pull them into your source tree.
>
> IMO, depending on PyPI is not necessary or even desirable.  All that's
> necessary is that the batteries conform to some standards regarding
> layout, documentation and unit tests.  They could be pulled based on
> URLs and the hostname of the URL is not important.  That scheme
> would make it easier for someone to make a sumo distribution just by
> adding more URLs to the list before building it.

This would complicate the work of various packaging systems.  Either
they'd need to build their own mechanism to pull the sources from
their archives, or they'd split them into separate packages and would
no longer distribute with all the "batteries included" packages by
default.

Or more likely they'd pull them into a single source tarball in
advance and ignore the whole mess.  We'd probably distribute a single
source tarball too, so we'd only burden the developers with this whole
architecture.

-1.  email is a temporary situation.  There are no consequences, so no
further thought is needed.

-- 
Adam Olsen, aka Rhamphoryncus

From jimjjewett at gmail.com  Mon Aug 27 19:59:40 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Mon, 27 Aug 2007 13:59:40 -0400
Subject: [Python-3000] How should the hash digest of a Unicode string be
	computed?
In-Reply-To: <ca471dc20708261224k6d4ab9eer71882d93517ac647@mail.gmail.com>
References: <ca471dc20708261224k6d4ab9eer71882d93517ac647@mail.gmail.com>
Message-ID: <fb6fbf560708271059n6f0d05c7ndc7693381190f598@mail.gmail.com>

On 8/26/07, Guido van Rossum <guido at python.org> wrote:
> But I'm wondering if passing a Unicode string to the various hash
> digest functions should work at all! Hashes are defined on sequences
> of bytes, and IMO we should insist on the user to pass us bytes, and
> not second-guess what to do with Unicode.

Conceptually, unicode *by itself* can't be represented as a buffer.

What can be represented is a unicode string + an encoding.  The
question is whether the hash function needs to know the encoding to
figure out the hash.

If you're hashing arbitrary bytes, then it doesn't really matter --
there is no expectation that a recoding should have the same hash.

For hashing as a shortcut to __ne__, it does matter for text.

Unfortunately, for historical reasons, plenty of code grabs the string
buffer expecting text.

For dict comparisons, we really ought to specify the equality (and
therefore hash) in terms of a canonical equivalent, encoded in X (It
isn't clear to me that X should be UTF-8 in particular, but the main
thing is to pick something.)

The alternative is that defensive code will need to do a (normally
useless boilerplate) decode/canonicalize/reencode dance before
dictionary checks and insertions.

I would rather see that boilerplate done once in the unicode type (and
again in any equivalent types, if need be), because
   (1)  most storage type/encodings would be able to take shortcuts.
   (2)  if people don't do the defensive coding, the bugs will be very obscure

-jJ

From guido at python.org  Mon Aug 27 20:05:30 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 27 Aug 2007 11:05:30 -0700
Subject: [Python-3000] How should the hash digest of a Unicode string be
	computed?
In-Reply-To: <fb6fbf560708271059n6f0d05c7ndc7693381190f598@mail.gmail.com>
References: <ca471dc20708261224k6d4ab9eer71882d93517ac647@mail.gmail.com>
	<fb6fbf560708271059n6f0d05c7ndc7693381190f598@mail.gmail.com>
Message-ID: <ca471dc20708271105k6d185305s99d5122ef789a00e@mail.gmail.com>

On 8/27/07, Jim Jewett <jimjjewett at gmail.com> wrote:
> On 8/26/07, Guido van Rossum <guido at python.org> wrote:
> > But I'm wondering if passing a Unicode string to the various hash
> > digest functions should work at all! Hashes are defined on sequences
> > of bytes, and IMO we should insist on the user to pass us bytes, and
> > not second-guess what to do with Unicode.
>
> Conceptually, unicode *by itself* can't be represented as a buffer.
>
> What can be represented is a unicode string + an encoding.  The
> question is whether the hash function needs to know the encoding to
> figure out the hash.
>
> If you're hashing arbitrary bytes, then it doesn't really matter --
> there is no expectation that a recoding should have the same hash.
>
> For hashing as a shortcut to __ne__, it does matter for text.
>
> Unfortunately, for historical reasons, plenty of code grabs the string
> buffer expecting text.

Such code is broken, and this will be an error soon. I think this
handles all the other issues -- as promised, *any* operation that
mixes str and bytes (or anything else supporting the buffer API) will
fail with a TypeError unless an encoding is specified explicitly.
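
A short sketch of that rule as it behaves in current Python 3 releases (the
sample text and variable names are illustrative only):

```python
import hashlib

text = "caf\u00e9"

# Digest functions are defined on bytes, so a str is rejected outright.
try:
    hashlib.sha256(text)
    str_accepted = True
except TypeError:
    str_accepted = False

# With an explicit encoding the call works, and the digest depends on that choice.
utf8_digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
latin1_digest = hashlib.sha256(text.encode("latin-1")).hexdigest()

# Mixing str and bytes in one operation fails the same way.
try:
    text + b"!"
    mixing_allowed = True
except TypeError:
    mixing_allowed = False
```

Because the two digests differ, the caller has to name the encoding; there is
no single "hash of a Unicode string" for the library to guess at.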

> For dict comparisons, we really ought to specify the equality (and
> therefore hash) in terms of a canonical equivalent, encoded in X (It
> isn't clear to me that X should be UTF-8 in particular, but the main
> thing is to pick something.)

No, dict keys can't be bytes or buffers.

> The alternative is that defensive code will need to do a (normally
> useless boilerplate) decode/canonicalize/reencode dance before
> dictionary checks and insertions.
>
> I would rather see that boilerplate done once in the unicode type (and
> again in any equivalent types, if need be), because
>    (1)  most storage type/encodings would be able to take shortcuts.
>    (2)  if people don't do the defensive coding, the bugs will be very obscure

There is no dance.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From brakhane at googlemail.com  Mon Aug 27 20:01:49 2007
From: brakhane at googlemail.com (Dennis Brakhane)
Date: Mon, 27 Aug 2007 20:01:49 +0200
Subject: [Python-3000] Will standard library modules comply with PEP 8?
Message-ID: <226a19190708271101p10cb55ffrf65d42da0d3f1dd7@mail.gmail.com>

Hi,

sorry if this has been answered before, I searched the mailing list and
didn't find anything.

I'd like to ask if the modules in the standard library will comply
with PEP 8. I've always found it weird that - in the logging module,
for example - I have to get the logger via getLogger instead of
get_logger. I understand that the logging module is older than PEP 8
and therefore couldn't be changed. So if there's a time to "fix"
logging, it'd probably be now.
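
For illustration only, a PEP 8 style spelling could sit next to the existing
name as a plain alias.  get_logger is a hypothetical name here; the logging
module does not define it:

```python
import logging

# Hypothetical PEP 8 style alias; the logging module itself does not provide it.
get_logger = logging.getLogger

log = get_logger("example")

# Both spellings return the same logger object for the same name.
same_object = log is logging.getLogger("example")
```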

Greetings,
  Dennis

From brett at python.org  Mon Aug 27 21:25:35 2007
From: brett at python.org (Brett Cannon)
Date: Mon, 27 Aug 2007 12:25:35 -0700
Subject: [Python-3000] Will standard library modules comply with PEP 8?
In-Reply-To: <226a19190708271101p10cb55ffrf65d42da0d3f1dd7@mail.gmail.com>
References: <226a19190708271101p10cb55ffrf65d42da0d3f1dd7@mail.gmail.com>
Message-ID: <bbaeab100708271225r49f7e0a0x23426d5a7ce3f821@mail.gmail.com>

On 8/27/07, Dennis Brakhane <brakhane at googlemail.com> wrote:
> Hi,
>
> sorry if this has been answered before, I searched the mailing list and
> didn't find anything.
>
> I'd like to ask if the modules in the standard library will comply
> with PEP 8. I've always found it weird that - in the logging module,
> for example - I have to get the logger via getLogger instead of
> get_logger. I understand that the logging module is older than PEP 8
> and therefore couldn't be changed. So if there's a time to "fix"
> logging, it'd probably be now.

Standard library decisions have not been made yet.  But this could
definitely be a possibility.

-Brett

From g.brandl at gmx.net  Mon Aug 27 21:31:47 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Mon, 27 Aug 2007 21:31:47 +0200
Subject: [Python-3000] [patch] roman.py
In-Reply-To: <ca471dc20708250756k5fe3d67ayc52528dd4cfde00@mail.gmail.com>
References: <fapfev$6s6$1@sea.gmane.org>
	<ca471dc20708250756k5fe3d67ayc52528dd4cfde00@mail.gmail.com>
Message-ID: <fav8qs$6e3$1@sea.gmane.org>

Guido van Rossum schrieb:
> Thanks, applied.
> 
> There's a lot more to being able to run "make html PYTHON=python3.0"
> successfully, isn't there?

Yes, there is; IMO it won't have to work for alpha1, but I'll work on this
during the next few weeks.

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From guido at python.org  Mon Aug 27 21:37:24 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 27 Aug 2007 12:37:24 -0700
Subject: [Python-3000] [patch] roman.py
In-Reply-To: <fav8qs$6e3$1@sea.gmane.org>
References: <fapfev$6s6$1@sea.gmane.org>
	<ca471dc20708250756k5fe3d67ayc52528dd4cfde00@mail.gmail.com>
	<fav8qs$6e3$1@sea.gmane.org>
Message-ID: <ca471dc20708271237q3071eb28s329e425f607a847a@mail.gmail.com>

On 8/27/07, Georg Brandl <g.brandl at gmx.net> wrote:
> Guido van Rossum schrieb:
> > Thanks, applied.
> >
> > There's a lot more to being able to run "make html PYTHON=python3.0"
> > successfully, isn't there?
>
> Yes, there is; IMO it won't have to work for alpha1, but I'll work on this
> during the next few weeks.

Right. That's why I forced PYTHON = python2.5 in the Makefile for now... :-)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From g.brandl at gmx.net  Mon Aug 27 21:44:01 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Mon, 27 Aug 2007 21:44:01 +0200
Subject: [Python-3000] A couple 2to3 questions
In-Reply-To: <18128.60288.458934.140003@montanaro.dyndns.org>
References: <18128.60288.458934.140003@montanaro.dyndns.org>
Message-ID: <fav9hp$8lb$1@sea.gmane.org>

skip at pobox.com schrieb:
> I ran 2to3 over the Doc/tools directory.  This left a number of problems
> which I initially began replacing manually.  I then realized that it would
> be better to tweak 2to3.  A couple things I wondered about:
> 
>     1. How are we supposed to maintain changes to Doc/tools?  Running svn
>        status doesn't show any changes.

The individual tools are checked out from different repositories on the first
"make html".

tools/docutils and tools/pygments are fixed versions of the
respective libraries, checked out from the svn.python.org/external/ repository.
I'll have Pygments 2to3-ready with the next (0.9) release, and I'll probably
look at docutils soon too, perhaps creating a branch.

tools/sphinx is checked out from svn.python.org/doctools/trunk and maintained
there, so if you want to change that code (which you're welcome to do) just
check it out from there.
(In theory, if you cd to tools/sphinx, you can use svn from there too.)

Cheers,
Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From lists at cheimes.de  Mon Aug 27 21:59:30 2007
From: lists at cheimes.de (Christian Heimes)
Date: Mon, 27 Aug 2007 21:59:30 +0200
Subject: [Python-3000] Will standard library modules comply with PEP 8?
In-Reply-To: <226a19190708271101p10cb55ffrf65d42da0d3f1dd7@mail.gmail.com>
References: <226a19190708271101p10cb55ffrf65d42da0d3f1dd7@mail.gmail.com>
Message-ID: <favafe$bve$1@sea.gmane.org>

Dennis Brakhane wrote:
> I'd like to ask if the modules in the standard library will comply
> with PEP 8. I've always found it weird that - in the logging module,
> for example - I have to get the logger via getLogger instead of
> get_logger. I understand that the logging module is older than PEP 8
> and therefore couldn't be changed. So if there's a time to "fix"
> logging, it'd probably be now.

If I were in the position to decide I would rather change the PEP than
the logging module. I prefer Zope 3 style camel case names for public
attributes and methods
(http://wiki.zope.org/zope3/ZopePythonNamingConventions point 3) over
underscore names. I'd like to see the camel case style for public names as
an alternative in PEP 8. I find it easier to read and less to type. But
again it is just my personal and subjective opinion.

Provided that a package uses a *single* style I can live with both
styles but I'm using the camel case style for my projects.

Christian


From nas at arctrix.com  Mon Aug 27 22:01:58 2007
From: nas at arctrix.com (Neil Schemenauer)
Date: Mon, 27 Aug 2007 14:01:58 -0600
Subject: [Python-3000] Removing email package until it's fixed
In-Reply-To: <aac2c7cb0708271021o17371fd6ob51997137fb130e1@mail.gmail.com>
References: <ca471dc20708250636t5c0c6c1er50621f837a1a0f0b@mail.gmail.com>
	<8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org>
	<bbaeab100708251500s48810d7fn521dcc74c2a42db7@mail.gmail.com>
	<faquf9$f8j$1@sea.gmane.org> <20070826051302.GC24678@panix.com>
	<18129.26473.77328.489985@montanaro.dyndns.org>
	<20070826140718.GA15100@panix.com> <fascp4$kqj$1@sea.gmane.org>
	<aac2c7cb0708271021o17371fd6ob51997137fb130e1@mail.gmail.com>
Message-ID: <20070827200158.GA4566@arctrix.com>

On Mon, Aug 27, 2007 at 11:21:21AM -0600, Adam Olsen wrote:
> This would complicate the work of various packaging systems.

You're not getting it.  The tarball that we distribute as a Python
release would look basically like it does now (i.e. it would include
things like the "email" package).  I can't see how that would
complicate the life of anyone downstream of the people putting
together the Python release.

  Neil

From guido at python.org  Mon Aug 27 22:05:13 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 27 Aug 2007 13:05:13 -0700
Subject: [Python-3000] Will standard library modules comply with PEP 8?
In-Reply-To: <favafe$bve$1@sea.gmane.org>
References: <226a19190708271101p10cb55ffrf65d42da0d3f1dd7@mail.gmail.com>
	<favafe$bve$1@sea.gmane.org>
Message-ID: <ca471dc20708271305u6961c69h2ffd97c54e2fe613@mail.gmail.com>

On 8/27/07, Christian Heimes <lists at cheimes.de> wrote:
> Dennis Brakhane wrote:
> > I'd like to ask if the modules in the standard library will comply
> > with PEP 8. I've always found it weird that - in the logging module,
> > for example - I have to get the logger via getLogger instead of
> > get_logger. I understand that the logging module is older than PEP 8
> > and therefore couldn't be changed. So if there's a time to "fix"
> > logging, it'd probably be now.
>
> If I were in the position to decide I would rather change the PEP than
> the logging module. I prefer Zope 3 style camel case names for public
> attributes and methods
> (http://wiki.zope.org/zope3/ZopePythonNamingConventions point 3) over
> underscore names. I'd like to see the camel case style for public names as
> an alternative in PEP 8. I find it easier to read and less to type. But
> again it is just my personal and subjective opinion.

Let's not start another bikeshed color debate. The PEP has been
discussed, discussed again, and accepted.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From lists at cheimes.de  Mon Aug 27 22:15:00 2007
From: lists at cheimes.de (Christian Heimes)
Date: Mon, 27 Aug 2007 22:15:00 +0200
Subject: [Python-3000] Will standard library modules comply with PEP 8?
In-Reply-To: <ca471dc20708271305u6961c69h2ffd97c54e2fe613@mail.gmail.com>
References: <226a19190708271101p10cb55ffrf65d42da0d3f1dd7@mail.gmail.com>	
	<favafe$bve$1@sea.gmane.org>
	<ca471dc20708271305u6961c69h2ffd97c54e2fe613@mail.gmail.com>
Message-ID: <46D330C4.3060409@cheimes.de>

Guido van Rossum wrote:
> Let's not start another bikeshed color debate. The PEP has been
> discussed, discussed again, and accepted.

*g* :]

I was on the verge of writing that I don't want to start another bike
shed [1] discussion ...

Christian

[1]
http://www.freebsd.org/cgi/getmsg.cgi?fetch=506636+517178+/usr/local/www/db/text/1999/freebsd-hackers/19991003.freebsd-hackers

From guido at python.org  Mon Aug 27 22:38:54 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 27 Aug 2007 13:38:54 -0700
Subject: [Python-3000] Does bytes() need to support bytes(<str>, <encoding>)?
Message-ID: <ca471dc20708271338v68c51b53sd66a549e64a114af@mail.gmail.com>

I'm still working on stricter enforcement of the "don't mix str and
bytes" rule. I'm finding a lot of trivial problems, which are
relatively easy to fix but time-consuming.

While doing this, I realize there are two idioms for converting a str
to bytes: s.encode(e) or bytes(s, e). These have identical results. I
think we can't really drop s.encode(), for symmetry with b.decode().
So is bytes(s, e) redundant?

To make things murkier, str(b, e) is not quite redundant compared to
b.decode(e), since str(b, e) also accepts buffer objects. But this
doesn't apply to bytes(s, e) -- that one *only* accepts str. (NB:
bytes(x) is a different API and accepts a different set of types.)
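
A sketch of the two spellings side by side, run against current Python 3
(where memoryview stands in for a "buffer object"; the 2007 py3k types may
have behaved differently):

```python
s = "na\u00efve"
e = "utf-8"

# The two str -> bytes idioms produce identical results.
same_bytes = bytes(s, e) == s.encode(e)

# The two bytes -> str idioms also agree for a bytes object.
b = s.encode(e)
same_text = str(b, e) == b.decode(e)

# str(obj, e) additionally accepts other buffer-supporting objects,
# which have no decode method of their own.
view = memoryview(b)
from_view = str(view, e)
view_has_decode = hasattr(view, "decode")

# bytes(obj, e) has no such extra reach: with an encoding it only takes str.
try:
    bytes(b, e)
    bytes_accepts_bytes = True
except TypeError:
    bytes_accepts_bytes = False
```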

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From john.m.camara at comcast.net  Tue Aug 28 00:16:51 2007
From: john.m.camara at comcast.net (john.m.camara at comcast.net)
Date: Mon, 27 Aug 2007 22:16:51 +0000
Subject: [Python-3000] Will standard library modules comply with PEP 8?
Message-ID: <082720072216.11186.46D34D530002BA1F00002BB222155670740E9D0E030E0CD203D202080106@comcast.net>

> Message: 10
> Date: Mon, 27 Aug 2007 13:05:13 -0700
> From: "Guido van Rossum" <guido at python.org>
> Subject: Re: [Python-3000] Will standard library modules comply with
> 	PEP 8?
> To: "Christian Heimes" <lists at cheimes.de>
> Cc: python-3000 at python.org
> Message-ID:
> 	<ca471dc20708271305u6961c69h2ffd97c54e2fe613 at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
> 
On 8/27/07, "Guido van Rossum" <guido at python.org> wrote:
> On 8/27/07, Christian Heimes <lists at cheimes.de> wrote:
> > Dennis Brakhane wrote:
> > > I'd like to ask if the modules in the standard library will comply
> > > with PEP 8. I've always found it weird that - in the logging module,
> > > for example - I have to get the logger via getLogger instead of
> > > get_logger. I understand that the logging module is older than PEP 8
> > > and therefore couldn't be changed. So if there's a time to "fix"
> > > logging, it'd probably be now.
> >
> > If I were in the position to decide I would rather change the PEP than
> > the logging module. I prefer Zope 3 style camel case names for public
> > attributes and methods
> > (http://wiki.zope.org/zope3/ZopePythonNamingConventions point 3) over
> > underscore names. I'd like to see the camel case style for public names as
> > an alternative in PEP 8. I find it easier to read and less to type. But
> > again it is just my personal and subjective opinion.
> 
> Let's not start another bikeshed color debate. The PEP has been
> discussed, discussed again, and accepted.
> 
Not trying to continue the bikeshed debate but just pointing out an area in PEP 8 which could be improved.

I would like to see PEP 8 remove the "as necessary to improve readability" in the function and method naming conventions.  That way methods like StringIO.getvalue() can be renamed to StringIO.get_value().

from PEP 8

      Function Names

            Function names should be lowercase, with words separated by underscores
            as necessary to improve readability.

            ...

      Method Names and Instance Variables

            Use the function naming rules: lowercase with words separated by
            underscores as necessary to improve readability.

            ...

John

From greg at krypto.org  Tue Aug 28 01:33:45 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Mon, 27 Aug 2007 16:33:45 -0700
Subject: [Python-3000] Does bytes() need to support bytes(<str>,
	<encoding>)?
In-Reply-To: <ca471dc20708271338v68c51b53sd66a549e64a114af@mail.gmail.com>
References: <ca471dc20708271338v68c51b53sd66a549e64a114af@mail.gmail.com>
Message-ID: <52dc1c820708271633k42b64363ufd64650d1fc02cb8@mail.gmail.com>

+1 from me, I don't see a reason for bytes(s, e) to exist when
s.encode(e) does the same job and is more symmetric.

On 8/27/07, Guido van Rossum <guido at python.org> wrote:
> I'm still working on stricter enforcement of the "don't mix str and
> bytes" rule. I'm finding a lot of trivial problems, which are
> relatively easy to fix but time-consuming.
>
> While doing this, I realize there are two idioms for converting a str
> to bytes: s.encode(e) or bytes(s, e). These have identical results. I
> think we can't really drop s.encode(), for symmetry with b.decode().
> So is bytes(s, e) redundant?
>
> To make things murkier, str(b, e) is not quite redundant compared to
> b.decode(e), since str(b, e) also accepts buffer objects. But this
> doesn't apply to bytes(s, e) -- that one *only* accepts str. (NB:
> bytes(x) is a different API and accepts a different set of types.)
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/greg%40krypto.org
>
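
The spellings being compared can be put side by side.  This is only a sketch, and it assumes the behaviour of a present-day Python 3 interpreter (where both spellings are still accepted) rather than the 2007 py3k branch:

```python
s = "caf\u00e9"
e = "utf-8"

# The two spellings for str -> bytes produce identical results.
via_method = s.encode(e)
via_constructor = bytes(s, e)

# The reverse direction also has two spellings.
b = via_method
via_decode = b.decode(e)
via_str = str(b, e)

# str(obj, e) additionally accepts other buffer objects, which have no
# decode() method of their own.
from_buffer = str(memoryview(b), e)

# bytes(x) without an encoding is a different API: it rejects str.
try:
    bytes(s)
    rejected = False
except TypeError:
    rejected = True
```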

From guido at python.org  Tue Aug 28 02:16:37 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 27 Aug 2007 17:16:37 -0700
Subject: [Python-3000] Need help enforcing strict str/bytes distinctions
Message-ID: <ca471dc20708271716s475bbebfs18f0390c86f3ac91@mail.gmail.com>

As anyone following the py3k checkins should have figured out by now,
I'm on a mission to require all code to be consistent about bytes vs.
str. For example binary files will soon refuse str arguments to
write(), and vice versa.

I have a patch that turns on this enforcement, but I have about 14
failing unit tests that require a lot of attention. I'm hoping a few
folks might have time to help out.

Here are the unit tests that still need work:

test_asynchat
test_bsddb3
test_cgi
test_cmd_line
test_csv
test_doctest
test_gettext
test_httplib
test_shelve
test_sqlite
test_tarfile
test_urllib
test_urllib2
test_urllib2_localnet

Attached is the patch that makes them fail. Note that it forces an
error when you use PyBUF_CHARACTERS when calling PyObject_GetBuffer on
a str (PyUnicode) object.
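
What the enforcement looks like from Python code can be sketched with the in-memory streams from the io module.  This assumes a present-day Python 3, where the rule is in force; the helper name accepts() is made up for the illustration:

```python
import io

binary_file = io.BytesIO()
text_file = io.StringIO()


def accepts(stream, data):
    """Return True if stream.write(data) succeeds, False on TypeError."""
    try:
        stream.write(data)
        return True
    except TypeError:
        return False


results = {
    "bytes -> binary": accepts(binary_file, b"spam"),
    "str -> binary": accepts(binary_file, "spam"),
    "str -> text": accepts(text_file, "spam"),
    "bytes -> text": accepts(text_file, b"spam"),
}
```

Only the two matching combinations are accepted; the mixed ones raise TypeError.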

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: strictbytes.diff
Type: text/x-patch
Size: 11911 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070827/cce5ffb8/attachment-0001.bin 

From adam at hupp.org  Tue Aug 28 03:58:51 2007
From: adam at hupp.org (Adam Hupp)
Date: Mon, 27 Aug 2007 21:58:51 -0400
Subject: [Python-3000] Support for newline and encoding arguments to
	open in tempfile module, also mktemp deprecation
In-Reply-To: <ca471dc20708261851h1b16c6acie6359d066729b90e@mail.gmail.com>
References: <766a29bd0708261800y376b65a9n4723910ea27e17f6@mail.gmail.com>
	<ca471dc20708261851h1b16c6acie6359d066729b90e@mail.gmail.com>
Message-ID: <766a29bd0708271858x2d76d15bn4344b7d431e0ac3f@mail.gmail.com>

On 8/26/07, Guido van Rossum <guido at python.org> wrote:
>
> Hm, why not just create the temporary file in binary mode and wrap an
> io.TextIOWrapper instance around it?

That works, but leaves TemporaryFile with a text mode that is somewhat
crippled.  TemporaryFile unconditionally uses the default filesystem
encoding when in text mode so it can't be relied upon to hold
arbitrary strings.  This is error prone and confusing IMO.

An additional reason for adding newline and encoding: TemporaryFile
has always taken all of the optional arguments open() has, namely
'mode' and 'bufsize'.  There is a nice symmetry in adding these new
arguments as well.
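
Both approaches can be sketched as follows.  The second half assumes the proposal is adopted, i.e. that TemporaryFile accepts encoding and newline keywords the way open() does (as it does in a present-day Python 3):

```python
import io
import tempfile

# Workaround: open the temporary file in binary mode and wrap it.
with tempfile.TemporaryFile(mode="w+b") as raw:
    wrapper = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")
    wrapper.write("na\u00efve\n")
    wrapper.flush()
    raw.seek(0)
    wrapped_bytes = raw.read()
    wrapper.detach()  # leave closing to the with-statement

# Proposal: pass encoding and newline straight through.
with tempfile.TemporaryFile(mode="w+", encoding="utf-8", newline="") as tmp:
    tmp.write("one\r\ntwo\n")
    tmp.seek(0)
    direct_text = tmp.read()
```

With newline="" no line-ending translation happens in either direction, so the text comes back exactly as written.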


-- 
Adam Hupp | http://hupp.org/adam/

From guido at python.org  Tue Aug 28 04:03:22 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 27 Aug 2007 19:03:22 -0700
Subject: [Python-3000] Support for newline and encoding arguments to
	open in tempfile module, also mktemp deprecation
In-Reply-To: <766a29bd0708271858x2d76d15bn4344b7d431e0ac3f@mail.gmail.com>
References: <766a29bd0708261800y376b65a9n4723910ea27e17f6@mail.gmail.com>
	<ca471dc20708261851h1b16c6acie6359d066729b90e@mail.gmail.com>
	<766a29bd0708271858x2d76d15bn4344b7d431e0ac3f@mail.gmail.com>
Message-ID: <ca471dc20708271903k59f64260u6e7b8d42769690c7@mail.gmail.com>

OK, I think you've convinced me.

Now, how about also making the default mode be text instead of binary?
I've got a hunch that text files are used more than binary files, even
where temporary files are concerned.

--Guido

On 8/27/07, Adam Hupp <adam at hupp.org> wrote:
> On 8/26/07, Guido van Rossum <guido at python.org> wrote:
> >
> > Hm, why not just create the temporary file in binary mode and wrap an
> > io.TextIOWrapper instance around it?
>
> That works, but leaves TemporaryFile with a text mode that is somewhat
> crippled.  TemporaryFile unconditionally uses the default filesystem
> encoding when in text mode so it can't be relied upon to hold
> arbitrary strings.  This is error prone and confusing IMO.
>
> An additional reason for adding newline and encoding: TemporaryFile
> has always taken all of the optional arguments open() has, namely
> 'mode' and 'bufsize'.  There is a nice symmetry in adding these new
> arguments as well.
>
>
> --
> Adam Hupp | http://hupp.org/adam/
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From stephen at xemacs.org  Tue Aug 28 04:09:00 2007
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 28 Aug 2007 11:09:00 +0900
Subject: [Python-3000] PyBuffer ndim unsigned
In-Reply-To: <52dc1c820708252002v3efce97eu869fd46e97e88271@mail.gmail.com>
References: <52dc1c820708251754w467f207amf09c5d6deea89cb0@mail.gmail.com>
	<ca471dc20708251908w47c50f2ei28f24410fe46314e@mail.gmail.com>
	<52dc1c820708252002v3efce97eu869fd46e97e88271@mail.gmail.com>
Message-ID: <87ir70mmv7.fsf@uwakimon.sk.tsukuba.ac.jp>

Gregory P. Smith writes:

 > heh good point.  ignore that thought.  python is a signed language.  :)

For what little it's worth, I object strongly.  The problem isn't
Python, it's C.  Because the rules give unsigned precedence over
signed in implicit conversions, mixed signed/unsigned arithmetic in C
is just a world of pain.  It's especially dangerous when dealing with
Unix-convention stream functions where non-negative returns are
lengths and negative returns are error codes.  Often the only
indication you get is one of those stupid "due to insufficient range
of type comparison is always true" warnings.

In my experience except when dealing with standard functions of
unsigned type, it's best to avoid unsigned like the plague.  It's
worth the effort of doing a range check on an unsigned return and then
stuffing it into a signed if you got one big enough.
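
The pitfall can be imitated from Python with ctypes, which wraps values the way C's conversion to unsigned does.  This is only a sketch: the 32-bit width and the helper name to_signed_checked() are assumptions made for the illustration.

```python
import ctypes

INT32_MAX = 2**31 - 1

# A Unix-style return value: negative means error.
error_code = -1
buffer_len = 10  # imagine this is an unsigned length in C

# Pure Python: the comparison does what it says.
python_says_error = error_code < buffer_len

# C with an unsigned operand: -1 is converted to unsigned first.
as_unsigned = ctypes.c_uint32(error_code).value
c_says_error = as_unsigned < buffer_len


def to_signed_checked(u):
    """Range-check an unsigned 32-bit value, then store it as signed."""
    if not 0 <= u <= INT32_MAX:
        raise OverflowError("value does not fit in a signed 32-bit int")
    return ctypes.c_int32(u).value
```

Here as_unsigned is 4294967295, so the "is this an error?" test silently becomes false once the unsigned conversion is involved.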

Sign me ... "Escaped from Unsigned Purgatory in XEmacs"

From adam at hupp.org  Tue Aug 28 04:54:31 2007
From: adam at hupp.org (Adam Hupp)
Date: Mon, 27 Aug 2007 22:54:31 -0400
Subject: [Python-3000] Need help enforcing strict str/bytes distinctions
In-Reply-To: <ca471dc20708271716s475bbebfs18f0390c86f3ac91@mail.gmail.com>
References: <ca471dc20708271716s475bbebfs18f0390c86f3ac91@mail.gmail.com>
Message-ID: <766a29bd0708271954t49ec448fhaac11c633de447f0@mail.gmail.com>

This patch (already in the tracker) fixes test_csv:

http://bugs.python.org/issue1033

On 8/27/07, Guido van Rossum <guido at python.org> wrote:
> As anyone following the py3k checkins should have figured out by now,
> I'm on a mission to require all code to be consistent about bytes vs.
> str. For example binary files will soon refuse str arguments to
> write(), and vice versa.
>
> I have a patch that turns on this enforcement, but I have about 14
> failing unit tests that require a lot of attention. I'm hoping a few
> folks might have time to help out.
>
> Here are the unit tests that still need work:
>
> test_asynchat
> test_bsddb3
> test_cgi
> test_cmd_line
> test_csv
> test_doctest
> test_gettext
> test_httplib
> test_shelve
> test_sqlite
> test_tarfile
> test_urllib
> test_urllib2
> test_urllib2_localnet
>
> Attached is the patch that makes them fail. Note that it forces an
> error when you use PyBUF_CHARACTERS when calling PyObject_GetBuffer on
> a str (PyUnicode) object.
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>


-- 
Adam Hupp | http://hupp.org/adam/

From barry at python.org  Tue Aug 28 04:57:52 2007
From: barry at python.org (Barry Warsaw)
Date: Mon, 27 Aug 2007 22:57:52 -0400
Subject: [Python-3000] Does bytes() need to support bytes(<str>,
	<encoding>)?
In-Reply-To: <ca471dc20708271338v68c51b53sd66a549e64a114af@mail.gmail.com>
References: <ca471dc20708271338v68c51b53sd66a549e64a114af@mail.gmail.com>
Message-ID: <06E2D54D-1F9A-4B66-ACE8-692A6BF93CA6@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 27, 2007, at 4:38 PM, Guido van Rossum wrote:

> I'm still working on stricter enforcement of the "don't mix str and
> bytes" rule. I'm finding a lot of trivial problems, which are
> relatively easy to fix but time-consuming.
>
> While doing this, I realize there are two idioms for converting a str
> to bytes: s.encode(e) or bytes(s, e). These have identical results. I
> think we can't really drop s.encode(), for symmetry with b.decode().
> So is bytes(s, e) redundant?

I think it might be.  I've hit this several times while working on the
email package and it's certainly confusing.  I've also run into
situations where I did not like the default e=utf-8 argument for
bytes().  Sometimes I was able to work around failures by doing
"bytes(ord(c) for c in s)", until I found "bytes(s, 'raw-unicode-escape')".

I'm probably doing something really dumb to need that, but it does  
get me farther along.  I do intend to go back and look at those  
(there are only a few) when I get the rest of the package working again.

Getting back to the original question, I'd like to see "bytes(s, e)"
dropped in favor of "s.encode(e)" and maayyybeee (he says bracing for
the shout down) "bytes(s)" to be defined as
"bytes(s, 'raw-unicode-escape')".

- -Barry


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCUAwUBRtOPMXEjvBPtnXfVAQKOoAP3RDpIXe1LHFCZuZmCGUlkg579RftvV4H+
Q8Roy+RUbCBlw17dZjjlfVUyESdCnLF0Pv2LHKm6fIvsUeKRpFFFeNbV71aTk8kB
zaZixFIhH7pQMReHiQ6Ich8SBnIxj0Hixz4KQ7tp8w1TENOE9secAtTWPhWSwIZU
09XeNyFXJw==
=orby
-----END PGP SIGNATURE-----

From barry at python.org  Tue Aug 28 05:02:34 2007
From: barry at python.org (Barry Warsaw)
Date: Mon, 27 Aug 2007 23:02:34 -0400
Subject: [Python-3000] Will standard library modules comply with PEP 8?
In-Reply-To: <082720072216.11186.46D34D530002BA1F00002BB222155670740E9D0E030E0CD203D202080106@comcast.net>
References: <082720072216.11186.46D34D530002BA1F00002BB222155670740E9D0E030E0CD203D202080106@comcast.net>
Message-ID: <6FB6CAC3-9E78-4EA7-BAD5-48CFCEDD8661@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 27, 2007, at 6:16 PM, john.m.camara at comcast.net wrote:

> I would like to see PEP 8 remove the "as necessary to improve  
> readability" in the function and method naming conventions.  That  
> way methods like StringIO.getvalue() can be renamed to  
> StringIO.get_value().

+1
- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRtOQSnEjvBPtnXfVAQJASQQAlkOOBa0Nvznx3saiN3d3SuzPA1AqhOqU
4D3lRSh4o6UdlorsXKYtP7KJJqha01lE5zb3hc4u3okmt6zXL11CKu74hBDTbMrR
5b3Q3Gw8b6Uvw+YqYF5P/39VkaEb3/FJ9Fq7r5qP4d8m3xAieAEJXsQdIewM++qW
5TFohaILL28=
=b+dD
-----END PGP SIGNATURE-----

From guido at python.org  Tue Aug 28 05:20:15 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 27 Aug 2007 20:20:15 -0700
Subject: [Python-3000] Does bytes() need to support bytes(<str>,
	<encoding>)?
In-Reply-To: <06E2D54D-1F9A-4B66-ACE8-692A6BF93CA6@python.org>
References: <ca471dc20708271338v68c51b53sd66a549e64a114af@mail.gmail.com>
	<06E2D54D-1F9A-4B66-ACE8-692A6BF93CA6@python.org>
Message-ID: <ca471dc20708272020x2301e3f3tcd59c773a05f3c@mail.gmail.com>

On 8/27/07, Barry Warsaw <barry at python.org> wrote:
> On Aug 27, 2007, at 4:38 PM, Guido van Rossum wrote:
>
> > I'm still working on stricter enforcement of the "don't mix str and
> > bytes" rule. I'm finding a lot of trivial problems, which are
> > relatively easy to fix but time-consuming.
> >
> > While doing this, I realize there are two idioms for converting a str
> > to bytes: s.encode(e) or bytes(s, e). These have identical results. I
> > think we can't really drop s.encode(), for symmetry with b.decode().
> > So is bytes(s, e) redundant?
>
> I think it might be.  I've hit this several times while working on the
> email package and it's certainly confusing.  I've also run into
> situations where I did not like the default e=utf-8 argument for
> bytes().  Sometimes I was able to work around failures by doing
> "bytes(ord(c) for c in s)", until I found "bytes(s, 'raw-unicode-escape')".
>
> I'm probably doing something really dumb to need that, but it does
> get me farther along.  I do intend to go back and look at those
> (there are only a few) when I get the rest of the package working again.
>
> Getting back to the original question, I'd like to see "bytes(s, e)"
> dropped in favor of "s.encode(e)" and maayyybeee (he says bracing for
> the shout down) "bytes(s)" to be defined as
> "bytes(s, 'raw-unicode-escape')".

I see a consensus developing for dropping bytes(s, e). Start avoiding
it like the plague now to help reduce the work needed once it's
actually gone.

But I don't see the point of defaulting to raw-unicode-escape --
what's the use case for that? I think you should just explicitly say
s.encode('raw-unicode-escape') where you need that. Any reason you
can't?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Aug 28 05:22:06 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 27 Aug 2007 20:22:06 -0700
Subject: [Python-3000] Will standard library modules comply with PEP 8?
In-Reply-To: <6FB6CAC3-9E78-4EA7-BAD5-48CFCEDD8661@python.org>
References: <082720072216.11186.46D34D530002BA1F00002BB222155670740E9D0E030E0CD203D202080106@comcast.net>
	<6FB6CAC3-9E78-4EA7-BAD5-48CFCEDD8661@python.org>
Message-ID: <ca471dc20708272022t1e3bf36ct107ac512ef4226bf@mail.gmail.com>

On 8/27/07, Barry Warsaw <barry at python.org> wrote:
>
> On Aug 27, 2007, at 6:16 PM, john.m.camara at comcast.net wrote:
>
> > I would like to see PEP 8 remove the "as necessary to improve
> > readability" in the function and method naming conventions.  That
> > way methods like StringIO.getvalue() can be renamed to
> > StringIO.get_value().
>
> +1
> - -Barry

Sure, but after the 3.0a1 release (slated for 8/31, i.e. this Friday).
We've got enough changes coming down the pike already that affect
every other file, and IMO this clearly belongs to the library reorg.

(I'm personally perfectly fine with getvalue(), but I understand
others don't see it that way.)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From stephen at xemacs.org  Tue Aug 28 05:30:20 2007
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 28 Aug 2007 12:30:20 +0900
Subject: [Python-3000] Python-3000 Digest, Vol 18, Issue 116
In-Reply-To: <082620071748.1124.46D1BCFD000DEB390000046422120207840E9D0E030E0CD203D202080106@comcast.net>
References: <082620071748.1124.46D1BCFD000DEB390000046422120207840E9D0E030E0CD203D202080106@comcast.net>
Message-ID: <87veb0z67n.fsf@uwakimon.sk.tsukuba.ac.jp>

john.m.camara at comcast.net writes:

 > Python can't include all the major packages but it is necessary for any 
 > language to support a good GUI package in order to be widely adopted 
 > by the masses.  [...]  My vote would 
 > be for wxPython but I'm not someone who truly cares much about GUIs 
 > as I much prefer to write the back ends of systems and stay far away from 
 > the front ends.

My experience with wxPython on Mac OS X using the MacPorts (formerly
DarwinPorts) distribution has been somewhat annoying.  wxPython seems
to be closely bound to wxWindows, which in turn has a raft of
dependencies making upgrades delicate.  It also seems to be quite
heavy compared to the more specialized GUIs like PyGTK and PyQt.


From guido at python.org  Tue Aug 28 05:36:30 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 27 Aug 2007 20:36:30 -0700
Subject: [Python-3000] Need help enforcing strict str/bytes distinctions
In-Reply-To: <766a29bd0708271954t49ec448fhaac11c633de447f0@mail.gmail.com>
References: <ca471dc20708271716s475bbebfs18f0390c86f3ac91@mail.gmail.com>
	<766a29bd0708271954t49ec448fhaac11c633de447f0@mail.gmail.com>
Message-ID: <ca471dc20708272036y533aaa7cm982f1b37e482cfe3@mail.gmail.com>

So it does! Thanks! (Did you perchance borrow my time machine? :-)

--Guido

On 8/27/07, Adam Hupp <adam at hupp.org> wrote:
> This patch (already in the tracker) fixes test_csv:
>
> http://bugs.python.org/issue1033
>
> On 8/27/07, Guido van Rossum <guido at python.org> wrote:
> > As anyone following the py3k checkins should have figured out by now,
> > I'm on a mission to require all code to be consistent about bytes vs.
> > str. For example binary files will soon refuse str arguments to
> > write(), and vice versa.
> >
> > I have a patch that turns on this enforcement, but I have about 14
> > failing unit tests that require a lot of attention. I'm hoping a few
> > folks might have time to help out.
> >
> > Here are the unit tests that still need work:
> >
> > test_asynchat
> > test_bsddb3
> > test_cgi
> > test_cmd_line
> > test_csv
> > test_doctest
> > test_gettext
> > test_httplib
> > test_shelve
> > test_sqlite
> > test_tarfile
> > test_urllib
> > test_urllib2
> > test_urllib2_localnet
> >
> > Attached is the patch that makes them fail. Note that it forces an
> > error when you use PyBUF_CHARACTERS when calling PyObject_GetBuffer on
> > a str (PyUnicode) object.
> >
> > --
> > --Guido van Rossum (home page: http://www.python.org/~guido/)
> >
>
>
> --
> Adam Hupp | http://hupp.org/adam/
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From stephen at xemacs.org  Tue Aug 28 05:36:56 2007
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 28 Aug 2007 12:36:56 +0900
Subject: [Python-3000] Py3k Sprint Tasks (Google Docs & Spreadsheets)
In-Reply-To: <9CBCCF2F-B428-4D37-8C18-1EAFB86CD7D9@python.org>
References: <c09ffb51ed04383961b5e8ff223d43@gmail.com>
	<93DBB66F-5D0D-4E46-8480-D2BFC693722A@python.org>
	<87y7g0401v.fsf@uwakimon.sk.tsukuba.ac.jp>
	<9CBCCF2F-B428-4D37-8C18-1EAFB86CD7D9@python.org>
Message-ID: <87tzqkz5wn.fsf@uwakimon.sk.tsukuba.ac.jp>

Barry Warsaw writes:

 > Stephen, sorry to hear about your daughter and I hope she's going to  
 > be okay of course!

Oh, she's *fine*.  There's just a conflict between the Japanese
practice of vaccinating all school children against TB, and the
U.S. practice of testing for TB antibodies.  About 1 in 3 kids coming
from Japan to U.S. schools get snagged.  Annoying, but I'll trade this
for the problems with visas and the like that colleagues have had
*any* day.

 > haven't even looked at test_email_codecs.py yet.  Because of the way  
 > things are going to work with input and output codecs, I'll  
 > definitely want to get some sanity checks with Asian codecs.

OK, *that* I can help with!


From talin at acm.org  Tue Aug 28 07:36:02 2007
From: talin at acm.org (Talin)
Date: Mon, 27 Aug 2007 22:36:02 -0700
Subject: [Python-3000] Python-3000 Digest, Vol 18, Issue 116
In-Reply-To: <87veb0z67n.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <082620071748.1124.46D1BCFD000DEB390000046422120207840E9D0E030E0CD203D202080106@comcast.net>
	<87veb0z67n.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <46D3B442.20600@acm.org>

Stephen J. Turnbull wrote:
> john.m.camara at comcast.net writes:
> 
>  > Python can't include all the major packages but it is necessary for any 
>  > language to support a good GUI package in order to be widely adopted 
>  > by the masses.  [...]  My vote would 
>  > be for wxPython but I'm not someone who truly cares much about GUIs 
>  > as I much prefer to write the back ends of systems and stay far away from 
>  > the front ends.
> 
> My experience with wxPython on Mac OS X using the MacPorts (formerly
> DarwinPorts) distribution has been somewhat annoying.  wxPython seems
> to be closely bound to wxWindows, which in turn has a raft of
> dependencies making upgrades delicate.  It also seems to be quite
> heavy compared to the more specialized GUIs like PyGTK and PyQt.

Part of the problem is that all GUI toolkits today are heavy, because 
the set of standard widgets that a GUI toolkit is expected to support 
has grown enormously. A typical UI programmer today would be very 
disappointed in a toolkit that didn't support, say, multi-column grids, 
dynamic layout, tabbed dialogs, toolbars, static HTML rendering, and so 
on. I myself generally won't bother with a GUI toolkit that doesn't have 
draggable tabbed document windows, since I tend to design apps that use 
that style of document management.

I know that Greg Ewing was working on a "minimal" python GUI 
(http://www.cosc.canterbury.ac.nz/greg.ewing/python_gui/), but it hasn't 
been updated in over a year. And I'm not sure that a minimal toolkit is 
really all that useful. Even if you restricted it to only those things 
needed to write IDLE, that still means you have to have a text editor 
widget which is itself a major component.

But I sure would like a completely "Pythonic" GUI that supported all of 
the features that I need.

-- Talin

From greg.ewing at canterbury.ac.nz  Tue Aug 28 08:06:05 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 28 Aug 2007 18:06:05 +1200
Subject: [Python-3000] How should the hash digest of a Unicode string be
 computed?
In-Reply-To: <52dc1c820708262043s1358ec81mfdf39b309381f249@mail.gmail.com>
References: <ca471dc20708261224k6d4ab9eer71882d93517ac647@mail.gmail.com>
	<52dc1c820708261554n6b31e40bya6d885c0f683a633@mail.gmail.com>
	<fat8fv$u9v$1@sea.gmane.org>
	<52dc1c820708262043s1358ec81mfdf39b309381f249@mail.gmail.com>
Message-ID: <46D3BB4D.1000505@canterbury.ac.nz>

Gregory P. Smith wrote:
> A thought that just occurred to me... Would a PyBUF_CANONICAL flag be 
> useful instead of CHARACTERS?

I don't think the buffer API should be allowing for
anything that requires, or could require, the provider
to convert the data into a different form. It should
stick to being a way of getting direct access to the
underlying data, whatever form it's in.

There could be type codes for various representations
of Unicode if desired. But re-encoding should be a
separate step.
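
Seen from Python, that separation might look like the following sketch.  It uses memoryview as a stand-in for a buffer consumer and assumes a present-day Python 3, where str does not export a buffer at all:

```python
data = bytearray(b"spam")
view = memoryview(data)

# The view exposes the underlying storage directly: no copy, no conversion.
data[0] = ord("S")
first_byte = view[0]

# A str does not export a buffer at all ...
try:
    memoryview("spam")
    str_exports_buffer = True
except TypeError:
    str_exports_buffer = False

# ... so choosing a byte representation is an explicit, separate step.
encoded_view = memoryview("spam".encode("utf-16-le"))
```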

--
Greg

From greg.ewing at canterbury.ac.nz  Tue Aug 28 08:15:37 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 28 Aug 2007 18:15:37 +1200
Subject: [Python-3000] Immutable bytes type and bsddb or other IO
In-Reply-To: <ca471dc20708270712p3170fbb1o9450e24c11042bd5@mail.gmail.com>
References: <ca471dc20708062056w362a951ma705452868e41472@mail.gmail.com>
	<20070822235929.GA12780@electricrain.com>
	<46CD2209.8000408@v.loewis.de>
	<20070823073837.GA14725@electricrain.com>
	<46CD3BFF.5080904@v.loewis.de>
	<20070823171837.GI24059@electricrain.com>
	<46CE5346.10301@canterbury.ac.nz>
	<ca471dc20708232117o290ccdees4175d057de483f56@mail.gmail.com>
	<20070824165823.GM24059@electricrain.com>
	<20070827075925.GT24059@electricrain.com>
	<ca471dc20708270712p3170fbb1o9450e24c11042bd5@mail.gmail.com>
Message-ID: <46D3BD89.4000909@canterbury.ac.nz>

Guido van Rossum wrote:
> someone evil could still produce a phase error by
> changing the contents while you're looking at it (basically sabotaging
> their own application) but I don't see how they could cause a segfault
> that way.

Maybe not in the same program, but if the data is
output from the back end of a compiler, and it
gets corrupted, then when you try to run the
resulting object file...

--
Greg

From greg.ewing at canterbury.ac.nz  Tue Aug 28 08:27:39 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 28 Aug 2007 18:27:39 +1200
Subject: [Python-3000] Does bytes() need to support bytes(<str>,
 <encoding>)?
In-Reply-To: <ca471dc20708271338v68c51b53sd66a549e64a114af@mail.gmail.com>
References: <ca471dc20708271338v68c51b53sd66a549e64a114af@mail.gmail.com>
Message-ID: <46D3C05B.2060706@canterbury.ac.nz>

Guido van Rossum wrote:
> I think we can't really drop s.encode(), for symmetry with b.decode().

Do we actually need b.decode()?

--
Greg

From greg at krypto.org  Tue Aug 28 08:33:24 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Mon, 27 Aug 2007 23:33:24 -0700
Subject: [Python-3000] Immutable bytes type and bsddb or other IO
In-Reply-To: <ca471dc20708270712p3170fbb1o9450e24c11042bd5@mail.gmail.com>
References: <ca471dc20708062056w362a951ma705452868e41472@mail.gmail.com>
	<46CD2209.8000408@v.loewis.de>
	<20070823073837.GA14725@electricrain.com>
	<46CD3BFF.5080904@v.loewis.de>
	<20070823171837.GI24059@electricrain.com>
	<46CE5346.10301@canterbury.ac.nz>
	<ca471dc20708232117o290ccdees4175d057de483f56@mail.gmail.com>
	<20070824165823.GM24059@electricrain.com>
	<20070827075925.GT24059@electricrain.com>
	<ca471dc20708270712p3170fbb1o9450e24c11042bd5@mail.gmail.com>
Message-ID: <52dc1c820708272333m517bb5d0v3c658f9eb46ea16b@mail.gmail.com>

> Adding data locking shouldn't be too complicated, but is it necessary?
> The bytes object does support locking the buffer in place; isn't that
> enough? It means someone evil could still produce a phase error by
> changing the contents while you're looking at it (basically sabotaging
> their own application) but I don't see how they could cause a segfault
> that way.

I'm sure the BerkeleyDB library is not expecting the data passed in as
a lookup key to change mid database traversal.  I have no idea whether
it will handle that gracefully, but I wouldn't expect it to, and I bet
it's possible to cause a segfault and/or irreparable database damage
that way.  The same goes for any other C APIs that you may pass data to
that release the GIL.
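
The difference between locking the buffer in place and locking the data can be sketched like this.  It assumes a present-day Python 3, with bytearray playing the role of the mutable bytes type and a memoryview standing in for the C library that holds the buffer:

```python
key = bytearray(b"lookup-key")
view = memoryview(key)  # stands in for a C library holding the buffer

# Resizing is refused while the buffer is exported ...
try:
    key.extend(b"!")
    resized = True
except BufferError:
    resized = False

# ... but the contents can still be overwritten in place.
key[0] = ord("L")
seen_by_holder = bytes(view)

view.release()
key.extend(b"!")  # allowed again once the export is gone
```

So the memory cannot move out from under the holder, but nothing stops the bytes themselves from changing.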

> Even if you really need the LOCKDATA feature, perhaps you can check in
> a slight mod of your code that uses SIMPLE for now -- use a macro for
> the flags that's defined as PyBUF_SIMPLE and add a comment that you'd
> like it to be LOCKDATA once bytes support that.
>
> That way we have less code in the tracker and more in subversion --
> always a good thing IMO.

Yeah, I have it almost working in SIMPLE mode for now; I'll check it in
soon.  It's no worse in behavior than the existing bytes-object-using
code currently checked in.

From greg.ewing at canterbury.ac.nz  Tue Aug 28 08:39:20 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 28 Aug 2007 18:39:20 +1200
Subject: [Python-3000] Will standard library modules comply with PEP 8?
In-Reply-To: <6FB6CAC3-9E78-4EA7-BAD5-48CFCEDD8661@python.org>
References: <082720072216.11186.46D34D530002BA1F00002BB222155670740E9D0E030E0CD203D202080106@comcast.net>
	<6FB6CAC3-9E78-4EA7-BAD5-48CFCEDD8661@python.org>
Message-ID: <46D3C318.60403@canterbury.ac.nz>

On Aug 27, 2007, at 6:16 PM, john.m.camara at comcast.net wrote:

> I would like to see PEP 8 remove the "as necessary to improve  
> readability" in the function and method naming conventions.

That would leave a whole bunch of built-in stuff
non-conforming with PEP 8, though...

--
Greg

From greg.ewing at canterbury.ac.nz  Tue Aug 28 09:18:13 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 28 Aug 2007 19:18:13 +1200
Subject: [Python-3000] Python-3000 Digest, Vol 18, Issue 116
In-Reply-To: <46D3B442.20600@acm.org>
References: <082620071748.1124.46D1BCFD000DEB390000046422120207840E9D0E030E0CD203D202080106@comcast.net>
	<87veb0z67n.fsf@uwakimon.sk.tsukuba.ac.jp> <46D3B442.20600@acm.org>
Message-ID: <46D3CC35.9040203@canterbury.ac.nz>

Talin wrote:
> I know that Greg Ewing was working on a "minimal" python GUI 
> (http://www.cosc.canterbury.ac.nz/greg.ewing/python_gui/), but it hasn't 
> been updated in over a year. And I'm not sure that a minimal toolkit is 
> really all that useful.

Don't worry, I haven't given up! And I plan to support
text editing by wrapping the Cocoa and gtk text widgets.

My belief is that a Python GUI wrapper can be both
lightweight and featureful, provided there is native
support on the platform concerned for the desired features.
If there isn't such support, that's a weakness of the
platform, not of the pure-Python wrapper philosophy.

--
Greg

From nnorwitz at gmail.com  Tue Aug 28 09:23:27 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Tue, 28 Aug 2007 00:23:27 -0700
Subject: [Python-3000] buildbots
Message-ID: <ee2a432c0708280023x377e0f77if438fa21589a190b@mail.gmail.com>

We got 'em.  Let the spam begin! :-)

This page is not linked from the web anywhere:
  http://python.org/dev/buildbot/3.0/

I'm not expecting a lot of signal out of them at the beginning.  All
but one have successfully compiled py3k though.  I noticed there were
many warnings on windows.  I wonder if they are important:

pythoncore - 0 error(s), 6 warning(s)
_ctypes - 0 error(s), 1 warning(s)
bz2 - 0 error(s), 9 warning(s)
_ssl - 0 error(s), 23 warning(s)
_socket - 0 error(s), 1 warning(s)

On trunk, the same machine only has:
bz2 - 0 error(s), 2 warning(s)

There are several other known warnings on various platforms:
Objects/stringobject.c:4104: warning: comparison is always false due to limited range of data type
Python/import.c:886: warning: comparison is always true due to limited range of data type
Python/../Objects/stringlib/unicodedefs.h:26: warning: 'STRINGLIB_CMP' defined but not used

I find it interesting that the gentoo buildbot can run the tests to
completion even though I can't run the tests from the command line.
There was one error:

Traceback (most recent call last):
  File "/home/buildslave/python-trunk/3.0.norwitz-x86/build/Lib/test/test_normalization.py", line 36, in test_main
    for line in open_urlresource(TESTDATAURL):
  File "/home/buildslave/python-trunk/3.0.norwitz-x86/build/Lib/io.py", line 1240, in __next__
    line = self.readline()
  File "/home/buildslave/python-trunk/3.0.norwitz-x86/build/Lib/io.py", line 1319, in readline
    readahead, pending = self._read_chunk()
  File "/home/buildslave/python-trunk/3.0.norwitz-x86/build/Lib/io.py", line 1123, in _read_chunk
    pending = self._decoder.decode(readahead, not readahead)
  File "/home/buildslave/python-trunk/3.0.norwitz-x86/build/Lib/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 105: ordinal not in range(128)
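
The same class of failure can be reproduced in isolation.  This is a sketch under the assumption that the downloaded test data is UTF-8 and was being read through a text layer defaulting to ASCII; the payload and the helper read_lines() are made up for the illustration:

```python
import io

# U+1E0A encodes to UTF-8 as e1 b8 8a, so the first byte is 0xe1.
payload = "1E0A;\u1e0a\n".encode("utf-8")


def read_lines(raw_bytes, encoding):
    """Decode raw_bytes line by line through a TextIOWrapper."""
    stream = io.TextIOWrapper(io.BytesIO(raw_bytes), encoding=encoding)
    return list(stream)


try:
    read_lines(payload, "ascii")
    ascii_ok = True
except UnicodeDecodeError as exc:
    ascii_ok = False
    bad_byte = exc.object[exc.start]

utf8_lines = read_lines(payload, "utf-8")
```

Passing the encoding explicitly when wrapping the stream avoids depending on whatever default the machine happens to have.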

The alpha has this problem:

test_socket
sem_init: Too many open files
Unhandled exception in thread started by <bound method BasicSocketPairTest.clientRun of <test.test_socket.BasicSocketPairTest testMethod=testRecv>>
Traceback (most recent call last):
  File "/net/taipan/scratch1/nnorwitz/python/3.0.norwitz-tru64/build/Lib/test/test_socket.py", line 124, in clientRun
    self.server_ready.wait()
  File "/net/taipan/scratch1/nnorwitz/python/3.0.norwitz-tru64/build/Lib/threading.py", line 367, in wait
    self._cond.wait(timeout)
  File "/net/taipan/scratch1/nnorwitz/python/3.0.norwitz-tru64/build/Lib/threading.py", line 209, in wait
    waiter = _allocate_lock()
thread.error: can't allocate lock
Fatal Python error: UNREF invalid object
*** IOT/Abort trap

Also test_long failed on the Alpha.

ia64 had this problem:

test test_builtin failed -- Traceback (most recent call last):
  File "/home/pybot/buildarea/3.0.klose-debian-ia64/build/Lib/test/test_builtin.py", line 1474, in test_round
    self.assertEqual(round(1e20), 1e20)
AssertionError: 0 != 1e+20

Then:

test_tarfile
python: Objects/exceptions.c:1392: PyUnicodeDecodeError_Create:
Assertion `start < 2147483647' failed.
make: *** [buildbottest] Aborted

On the amd64 (ubuntu) test_unicode_file fails all 3 tests.

The windows buildbot seems to be failing due to line ending issues?

Another windows buildbot failed to compile:
_tkinter - 3 error(s), 1 warning(s)

See the link for more details.  Lots of little errors.  It doesn't
look like any buildbot will pass on the first run.  However, it looks
like many are pretty close.

n
PS Sorry about the spam on python-checkins.  It looks like there can
be only a single mailing list and that it's all or nothing for getting
mail.  At least I didn't see an obvious way to configure by branch.
You'll just have to filter out the stuff to py3k.

Since I always seem to recreate the steps necessary for adding a new
branch, here are some notes (mostly for me).  If anyone else wants to
help out with the buildbot, etc, that would be great.  To add a new
branch for a buildbot:
 * Add the branch in the buildbot master.cfg file.  2 places need to be updated.
 * Add new rules in the apache default configuration file (2 lines).
Make sure to use the same port number in both the changes.
 * Check in the buildbot master config.  apache config too?

Remember it takes a while (30-60 seconds) to restart both apache and
the buildbot master.  Both need to be restarted for the change to take
effect.

From lars at gustaebel.de  Tue Aug 28 09:44:20 2007
From: lars at gustaebel.de (Lars Gustäbel)
Date: Tue, 28 Aug 2007 09:44:20 +0200
Subject: [Python-3000] Need help enforcing strict str/bytes distinctions
In-Reply-To: <ca471dc20708271716s475bbebfs18f0390c86f3ac91@mail.gmail.com>
References: <ca471dc20708271716s475bbebfs18f0390c86f3ac91@mail.gmail.com>
Message-ID: <20070828074420.GA15998@core.g33x.de>

On Mon, Aug 27, 2007 at 05:16:37PM -0700, Guido van Rossum wrote:
> As anyone following the py3k checkins should have figured out by now,
> I'm on a mission to require all code to be consistent about bytes vs.
> str. For example binary files will soon refuse str arguments to
> write(), and vice versa.
> 
> I have a patch that turns on this enforcement, but I have about 14
> failing unit tests that require a lot of attention. I'm hoping a few
> folks might have time to help out.
> 
> Here are the unit tests that still need work:
> [...]
> test_tarfile

Fixed in r57608.

-- 
Lars Gustäbel
lars at gustaebel.de

The direct use of force is such a poor solution to any problem,
it is generally employed only by small children and large nations.
(David Friedman)

From python at rcn.com  Tue Aug 28 10:15:07 2007
From: python at rcn.com (Raymond Hettinger)
Date: Tue, 28 Aug 2007 01:15:07 -0700
Subject: [Python-3000] Will standard library modules comply with PEP 8?
References: <082720072216.11186.46D34D530002BA1F00002BB222155670740E9D0E030E0CD203D202080106@comcast.net>
	<6FB6CAC3-9E78-4EA7-BAD5-48CFCEDD8661@python.org>
Message-ID: <014101c7e94b$8c6c8650$6701a8c0@RaymondLaptop1>

> On Aug 27, 2007, at 6:16 PM, john.m.camara at comcast.net wrote:
>
>> I would like to see PEP 8 remove the "as necessary to improve
>> readability" in the function and method naming conventions.  That
>> way methods like StringIO.getvalue() can be renamed to
>> StringIO.get_value().

Gratuitous breakage -- for nothing.  This is idiotic, pedantic, and counterproductive.  (No offense intended, I'm talking about the 
suggestion, not the suggestor).

Ask ten of your programmer friends to write down "result equals object dot get value" and see if more than one in ten uses an 
underscore (no stacking the deck with Cobol programmers).


Raymond 

From eric+python-dev at trueblade.com  Tue Aug 28 11:49:06 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Tue, 28 Aug 2007 05:49:06 -0400
Subject: [Python-3000] buildbots
In-Reply-To: <ee2a432c0708280023x377e0f77if438fa21589a190b@mail.gmail.com>
References: <ee2a432c0708280023x377e0f77if438fa21589a190b@mail.gmail.com>
Message-ID: <46D3EF92.6020406@trueblade.com>

Neal Norwitz wrote:

> There are several other known warnings on various platforms:
...
> Python/../Objects/stringlib/unicodedefs.h:26: warning: 'STRINGLIB_CMP'
> defined but not used

I fixed this warning in r57613.  Unfortunately I had to change from an
inline function to a macro, but I don't see another way.



From barry at python.org  Tue Aug 28 13:40:23 2007
From: barry at python.org (Barry Warsaw)
Date: Tue, 28 Aug 2007 07:40:23 -0400
Subject: [Python-3000] Does bytes() need to support bytes(<str>,
	<encoding>)?
In-Reply-To: <ca471dc20708272020x2301e3f3tcd59c773a05f3c@mail.gmail.com>
References: <ca471dc20708271338v68c51b53sd66a549e64a114af@mail.gmail.com>
	<06E2D54D-1F9A-4B66-ACE8-692A6BF93CA6@python.org>
	<ca471dc20708272020x2301e3f3tcd59c773a05f3c@mail.gmail.com>
Message-ID: <7A1473AB-611F-4D53-82EB-E9682F2741CD@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 27, 2007, at 11:20 PM, Guido van Rossum wrote:

> But I don't see the point of defaulting to raw-unicode-escape --
> what's the use case for that? I think you should just explicitly say
> s.encode('raw-unicode-escape') where you need that. Any reason you
> can't?

Nope.  So what would bytes(s) do?

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRtQJp3EjvBPtnXfVAQIh8AP+KFlZjz8sF40L/6AKZNYiOHn48HBitV8a
29Blv/JhTJlt7ZLEypm+SbudCfRmQTnUoBPfvTxezKhjHzaffaZyjqB308VlPqxv
nv3aTGJvxrQNDzJT1GeltddZj/GBG7Pk5ZpsjjejROe0OGHyGwpWXt0py6tfDED/
2Dk9Fdp8zCU=
=ESMM
-----END PGP SIGNATURE-----

From eric+python-dev at trueblade.com  Tue Aug 28 13:48:24 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Tue, 28 Aug 2007 07:48:24 -0400
Subject: [Python-3000] string.Formatter class
Message-ID: <46D40B88.4080202@trueblade.com>

One of the things that PEP 3101 deliberately underspecifies is the 
Formatter class, leaving decisions up to the implementation.  Now that a 
working implementation exists, I think it's reasonable to tighten it up.

I have checked in a Formatter class that specifies the following methods 
(in addition to the ones already defined in the PEP):

parse(format_string)
Loops over the format_string and returns an iterable of tuples 
(literal_text, field_name, format_spec, conversion).  This is used by 
vformat to break the string into either literal text, or fields that 
need expanding.  If literal_text is None, then expand (field_name, 
format_spec, conversion) and append it to the output.  If literal_text 
is not None, append it to the output.

get_field(field_name, args, kwargs, used_args)
Given a field_name as returned by parse, convert it to an object to be 
formatted.  The default version takes strings of the form defined in the 
PEP, such as "0[name]" or "label.title".  It records which args have 
been used in used_args.  args and kwargs are as passed in to vformat.

convert_field(value, conversion)
Converts the value (returned by get_field) using the conversion 
(returned by the parse tuple).  The default version understands 'r' 
(repr) and 's' (str).
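
For anyone who wants to poke at these three hooks, here is a minimal sketch using the string.Formatter class as it ships in current Python 3. It rests on assumptions that differ from the draft described above: in the shipped class, each tuple from parse() carries a literal chunk together with the field that follows it (rather than using literal_text is None to mark a field), and get_field() takes no used_args argument and returns a pair.

```python
from string import Formatter

fmt = Formatter()

# parse() yields (literal_text, field_name, format_spec, conversion) tuples.
pieces = list(fmt.parse("*{0!r:>5}*"))

# get_field() resolves a field name against args/kwargs and returns
# a pair: (object, key that was looked up).
obj, used_key = fmt.get_field("0", ("foo",), {})

# convert_field() applies the '!r' / '!s' conversion to the object.
converted = fmt.convert_field(obj, "r")

print(pieces)
print(obj, used_key, converted)
```

The trailing literal comes back as a tuple whose last three items are None, which is how vformat knows there is nothing left to expand.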

Given these, we can define a formatter that uses the normal syntax, but 
calls its arguments to get their value:

=================
class CallFormatter(Formatter):
     def format_field(self, value, format_spec):
         return format(value(), format_spec)

fmt = CallFormatter()
print(fmt.format('*{0}*', datetime.datetime.now))
=================
which prints:
*2007-08-28 07:39:29.946909*

Or, something that uses vertical bars for separating markup:
=================
class BarFormatter(Formatter):
     # returns an iterable that contains tuples of the form:
     # (literal_text, field_name, format_spec, conversion)
     def parse(self, format_string):
         for field in format_string.split('|'):
             if field[0] == '+':
                 # it's markup
                 field_name, _, format_spec = field[1:].partition(':')
                 yield None, field_name, format_spec, None
             else:
                 yield field, None, None, None
fmt = BarFormatter()
print(fmt.format('*|+0:^10s|*', 'foo'))
=================
which prints:
*   foo    *

Or, define your own conversion character:
=================
class XFormatter(Formatter):
     def convert_field(self, value, conversion):
         if conversion == 'x':
             return None
         if conversion == 'r':
             return repr(value)
         if conversion == 's':
             return str(value)
         return value
fmt = XFormatter()
print(fmt.format("{0!r}:{0!x}", fmt))
=================
which prints:
<__main__.XFormatter object at 0xf6f6d2cc>:None

These are obviously contrived examples, without great error checking, 
but I think they demonstrate the flexibility.  I'm not wild about the 
method names, so any suggestions are appreciated.  Any other comments 
are welcome, too.

Eric.

From barry at python.org  Tue Aug 28 14:22:37 2007
From: barry at python.org (Barry Warsaw)
Date: Tue, 28 Aug 2007 08:22:37 -0400
Subject: [Python-3000] Will standard library modules comply with PEP 8?
In-Reply-To: <ca471dc20708272022t1e3bf36ct107ac512ef4226bf@mail.gmail.com>
References: <082720072216.11186.46D34D530002BA1F00002BB222155670740E9D0E030E0CD203D202080106@comcast.net>
	<6FB6CAC3-9E78-4EA7-BAD5-48CFCEDD8661@python.org>
	<ca471dc20708272022t1e3bf36ct107ac512ef4226bf@mail.gmail.com>
Message-ID: <E3699EAD-DB62-46F9-8C1D-2BCF0A634ECF@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 27, 2007, at 11:22 PM, Guido van Rossum wrote:

> On 8/27/07, Barry Warsaw <barry at python.org> wrote:
>>
>> On Aug 27, 2007, at 6:16 PM, john.m.camara at comcast.net wrote:
>>
>>> I would like to see PEP 8 remove the "as necessary to improve
>>> readability" in the function and method naming conventions.  That
>>> way methods like StringIO.getvalue() can be renamed to
>>> StringIO.get_value().
>>
>> +1
>> - -Barry
>
> Sure, but after the 3.0a1 release (slated for 8/31, i.e. this Friday).
> We've got enough changes coming down the pike already that affect
> every other file, and IMO this clearly belongs to the library reorg.

Yes definitely.  I was +1'ing the change to the PEP language.

- -Barry


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRtQTjXEjvBPtnXfVAQJ/ygQAoLb1mHJcrIJzZGe3ACb+crVPvtOQ8j/f
7x/LRe3ETklODsegq7+kgy353Nfob8QKLjd+AAlB/44btO6pXMth0AeUQyZ9ZPFz
/BwfcDHij1UvdxfSRov9kspnGhd18rPeEfP+mnXsBGKFSgTdiCottB5C5yfmXI8z
2tEQnSQ2FGo=
=XI6x
-----END PGP SIGNATURE-----

From bwinton at latte.ca  Tue Aug 28 15:27:01 2007
From: bwinton at latte.ca (Blake Winton)
Date: Tue, 28 Aug 2007 09:27:01 -0400
Subject: [Python-3000] Will standard library modules comply with PEP 8?
In-Reply-To: <014101c7e94b$8c6c8650$6701a8c0@RaymondLaptop1>
References: <082720072216.11186.46D34D530002BA1F00002BB222155670740E9D0E030E0CD203D202080106@comcast.net>	<6FB6CAC3-9E78-4EA7-BAD5-48CFCEDD8661@python.org>
	<014101c7e94b$8c6c8650$6701a8c0@RaymondLaptop1>
Message-ID: <46D422A5.7010501@latte.ca>

Raymond Hettinger wrote:
>> On Aug 27, 2007, at 6:16 PM, john.m.camara at comcast.net wrote:
>>> I would like to see PEP 8 remove the "as necessary to improve
>>> readability" in the function and method naming conventions.  That
>>> way methods like StringIO.getvalue() can be renamed to
>>> StringIO.get_value().
> 
> Gratuitous breakage -- for nothing.  This is idiotic, pedantic,
> and counterproductive.  (No offense intended, I'm talking about
> the suggestion, not the suggestor).
> 
> Ask ten of your programmer friends to write down "result equals
> object dot get value" and see if more than one in ten uses an
> underscore (no stacking the deck with Cobol programmers).

Sure, but given the rise of Java, how many of them will spell it with a 
capital 'V'?   ;)

On the one hand, I really like consistency in my programming languages.
On the other hand, a foolish consistency is the hobgoblin of little minds.

Later,
Blake.

From benji at benjiyork.com  Tue Aug 28 15:51:31 2007
From: benji at benjiyork.com (Benji York)
Date: Tue, 28 Aug 2007 09:51:31 -0400
Subject: [Python-3000] Will standard library modules comply with PEP 8?
In-Reply-To: <46D422A5.7010501@latte.ca>
References: <082720072216.11186.46D34D530002BA1F00002BB222155670740E9D0E030E0CD203D202080106@comcast.net>	<6FB6CAC3-9E78-4EA7-BAD5-48CFCEDD8661@python.org>	<014101c7e94b$8c6c8650$6701a8c0@RaymondLaptop1>
	<46D422A5.7010501@latte.ca>
Message-ID: <46D42863.8010300@benjiyork.com>

Blake Winton wrote:
> Raymond Hettinger wrote:
>>> On Aug 27, 2007, at 6:16 PM, john.m.camara at comcast.net wrote:
>>>> I would like to see PEP 8 remove the "as necessary to improve
>>>> readability" in the function and method naming conventions.  That
>>>> way methods like StringIO.getvalue() can be renamed to
>>>> StringIO.get_value().
>> Gratuitous breakage -- for nothing.  This is idiotic, pedantic,
>> and counterproductive.  (No offense intended, I'm talking about
>> the suggestion, not the suggestor).
>> Ask ten of your programmer friends to write down "result equals
>> object dot get value" and see if more than one in ten uses an
>> underscore (no stacking the deck with Cobol programmers).
> 
> Sure, but given the rise of Java, how many of them will spell it with a 
> capital 'V'?   ;)
> 
> On the one hand, I really like consistency in my programming languages.
> On the other hand, a foolish consistency is the hobgoblin of little minds.

I call quote misapplication.  Having predictable identifier names isn't 
"foolish".  Having to divine what is and is not "necessary to improve 
readability" isn't either, but is perhaps suboptimal.
-- 
Benji York
http://benjiyork.com

From ncoghlan at gmail.com  Tue Aug 28 16:08:45 2007
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 29 Aug 2007 00:08:45 +1000
Subject: [Python-3000] Will standard library modules comply with PEP 8?
In-Reply-To: <46D42863.8010300@benjiyork.com>
References: <082720072216.11186.46D34D530002BA1F00002BB222155670740E9D0E030E0CD203D202080106@comcast.net>	<6FB6CAC3-9E78-4EA7-BAD5-48CFCEDD8661@python.org>	<014101c7e94b$8c6c8650$6701a8c0@RaymondLaptop1>	<46D422A5.7010501@latte.ca>
	<46D42863.8010300@benjiyork.com>
Message-ID: <46D42C6D.3050602@gmail.com>

Benji York wrote:
> Blake Winton wrote:
>> On the one hand, I really like consistency in my programming languages.
>> On the other hand, a foolish consistency is the hobgoblin of little minds.
> 
> I call quote misapplication.  Having predictable identifier names isn't 
> "foolish".  Having to divine what is and is not "necessary to improve 
> readability" isn't either, but is perhaps suboptimal.

On the gripping hand, breaking getattr, getitem, setattr, setitem, 
delattr and delitem without a *really* good reason would mean seriously 
annoying a heck of a lot of people for no real gain.

Being more consistent in following PEP 8 would be good, particularly for 
stuff which is going to break (or at least need to be looked at) anyway. 
The question of whether or not to change things which would otherwise be 
fine needs to be considered far more carefully.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From guido at python.org  Tue Aug 28 17:19:15 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 28 Aug 2007 08:19:15 -0700
Subject: [Python-3000] Does bytes() need to support bytes(<str>,
	<encoding>)?
In-Reply-To: <46D3C05B.2060706@canterbury.ac.nz>
References: <ca471dc20708271338v68c51b53sd66a549e64a114af@mail.gmail.com>
	<46D3C05B.2060706@canterbury.ac.nz>
Message-ID: <ca471dc20708280819q5687909ct887aca8612b6df51@mail.gmail.com>

On 8/27/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Guido van Rossum wrote:
> > I think we can't really drop s.encode(), for symmetry with b.decode().
>
> Do we actually need b.decode()?

For symmetry with s.encode().

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Aug 28 17:21:50 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 28 Aug 2007 08:21:50 -0700
Subject: [Python-3000] Does bytes() need to support bytes(<str>,
	<encoding>)?
In-Reply-To: <7A1473AB-611F-4D53-82EB-E9682F2741CD@python.org>
References: <ca471dc20708271338v68c51b53sd66a549e64a114af@mail.gmail.com>
	<06E2D54D-1F9A-4B66-ACE8-692A6BF93CA6@python.org>
	<ca471dc20708272020x2301e3f3tcd59c773a05f3c@mail.gmail.com>
	<7A1473AB-611F-4D53-82EB-E9682F2741CD@python.org>
Message-ID: <ca471dc20708280821x3d6eef54l648a901379fa60db@mail.gmail.com>

On 8/28/07, Barry Warsaw <barry at python.org> wrote:

> On Aug 27, 2007, at 11:20 PM, Guido van Rossum wrote:
>
> > But I don't see the point of defaulting to raw-unicode-escape --
> > what's the use case for that? I think you should just explicitly say
> > s.encode('raw-unicode-escape') where you need that. Any reason you
> > can't?
>
> Nope.  So what would bytes(s) do?

Raise TypeError (when s is a str). The argument to bytes() must be
either an int (then it creates a zero-filled bytes array of that
length) or an iterable of ints (then it creates a bytes array
initialized with those ints -- if any int is out of range, an
exception is raised, and also if any value is not an int).
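
A small sketch of the rules just described, as they can be observed in current Python 3. The helper name raises() is made up for illustration. One caveat: released Python 3 still accepts bytes(s, encoding), so only the no-encoding case is shown as rejected here.

```python
def raises(exc_type, func, *args):
    """Return True if func(*args) raises exc_type (hypothetical helper)."""
    try:
        func(*args)
    except exc_type:
        return True
    return False

zeros = bytes(3)                # an int gives a zero-filled array of that length
from_ints = bytes([65, 66])     # an iterable of ints initialises the array

str_rejected = raises(TypeError, bytes, "abc")    # str with no encoding
out_of_range = raises(ValueError, bytes, [256])   # int outside range(256)
non_int_item = raises(TypeError, bytes, ["a"])    # item is not an int

encoded = "abc".encode("ascii")  # the explicit spelling for str -> bytes
```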

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Aug 28 17:26:57 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 28 Aug 2007 08:26:57 -0700
Subject: [Python-3000] Will standard library modules comply with PEP 8?
In-Reply-To: <46D42C6D.3050602@gmail.com>
References: <082720072216.11186.46D34D530002BA1F00002BB222155670740E9D0E030E0CD203D202080106@comcast.net>
	<6FB6CAC3-9E78-4EA7-BAD5-48CFCEDD8661@python.org>
	<014101c7e94b$8c6c8650$6701a8c0@RaymondLaptop1>
	<46D422A5.7010501@latte.ca> <46D42863.8010300@benjiyork.com>
	<46D42C6D.3050602@gmail.com>
Message-ID: <ca471dc20708280826n60634825i7cb7f629fe987918@mail.gmail.com>

On 8/28/07, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Benji York wrote:
> > Blake Winton wrote:
> >> On the one hand, I really like consistency in my programming languages.
> >> On the other hand, a foolish consistency is the hobgoblin of little minds.
> >
> > I call quote misapplication.  Having predictable identifier names isn't
> > "foolish".  Having to divine what is and is not "necessary to improve
> > readability" isn't either, but is perhaps suboptimal.
>
> On the gripping hand, breaking getattr, getitem, setattr, setitem,
> delattr and delitem without a *really* good reason would mean seriously
> annoying a heck of a lot of people for no real gain.
>
> Being more consistent in following PEP 8 would be good, particularly for
> stuff which is going to break (or at least need to be looked at) anyway.
> The question of whether or not to change things which would otherwise be
> fine needs to be considered far more carefully.

The prudent way is to change the PEP (I'll do it) but to be
conservative in implementation. The PEP should be used to guide new
API design and the occasional grand refactoring; it should not be used
as an excuse to change every non-conforming API.

That said, I do want to get rid of all module and package names that
still use CapitalWords. Module names are relatively easy to fix with
2to3.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From thomas at python.org  Tue Aug 28 17:42:26 2007
From: thomas at python.org (Thomas Wouters)
Date: Tue, 28 Aug 2007 17:42:26 +0200
Subject: [Python-3000] Removing simple slicing
In-Reply-To: <9e804ac0708240733r1f1781e2m18902834e2218777@mail.gmail.com>
References: <9e804ac0708240733r1f1781e2m18902834e2218777@mail.gmail.com>
Message-ID: <9e804ac0708280842p734f0603l89c587dae6631bb7@mail.gmail.com>

I updated the patches destined for the trunk (slice-object support for all
objects that supported simple slicing, and actual extended slicing support
for most of them) and checked them in. Next stop is cleaning up the actual
slice-removal bits. I do have two remaining issues: what do we do about
PyMapping_Check(), and should I make the post an actual PEP? I'm thinking
PyMapping_Check() should be removed, or made to look at
tp_as_sequence->sq_item instead of tp_as_sequence->sq_slice and deprecated.
I also think this change should be documented in an actual PEP, as it'll be
a pretty big change for any C extension implementing simple slices but not
slice-object support.

On 8/24/07, Thomas Wouters <thomas at python.org> wrote:
>
>
> I did some work at last year's Google sprint on removing the simple
> slicing API (__getslice__, tp_as_sequence->sq_slice) in favour of the more
> flexible sliceobject API (__getitem__ and tp_as_mapping->mp_subscript using
> slice objects as index.) For some more detail, see the semi-PEP below. (I
> hesitate to call it a PEP because it's way past the Py3k PEP deadline, but
> the email I was originally going to send on this subject grew in such a size
> that I figured I might as well use PEP layout and use the opportunity to
> record some best practices and behaviour. And the change should probably be
> recorded in a PEP anyway, even though it has never been formally proposed,
> just taken as a given.)
>
> If anyone is bored and/or interested in doing some complicated work, there
> is still a bit of (optional) work to be done in this area: I uploaded
> patches to be applied to the trunk SF 8 months ago -- extended slicing
> support for a bunch of types. Some of that extended slicing support is
> limited to step-1 slices, though, most notably UserString.MutableString
> and ctypes. I can guarantee adding non-step-1 support to them is a
> challenging and fulfilling exercise, having done it for several types, but I
> can't muster the intellectual stamina to do it for these (to me) fringe
> types. The patches can be found in Roundup: http://bugs.python.org/issue?%40search_text=&title=&%40columns=title&id=&%40columns=id&creation=&creator=twouters&activity=&%40columns=activity&%40sort=activity&actor=&type=&components=&versions=&severity=&dependencies=&assignee=&keywords=&priority=&%40group=priority&status=1&%40columns=status&resolution=&%40pagesize=50&%40startwith=0&%40action=search
> (there doesn't seem to be a shorter URL; just search for issues created by
> 'twouters' instead.)
>
> If nobody cares, I will be checking these patches into the trunk this
> weekend (after updating them), and then update and check in the rest of the
> p3yk-noslice branch into the py3k branch.
>
> Abstract
> ========
>
> This proposal discusses getting rid of the two types of slicing Python
> uses, ``simple`` and ``extended``. Extended slicing was added later, and
> uses a different API at both the C and the Python level for backward
> compatibility. Extended slicing can express everything simple slicing
> can express, however, making the simple slicing API practically
> redundant.
>
> A Tale of Two APIs
> ==================
>
> Simple slicing is a slice operation without a step, Ellipsis or tuple of
> slices -- the archetypical slice of just `start` and/or `stop`, with a
> single colon separating them and both sides being optional::
>
>     L[1:3]
>     L[2:]
>     L[:-5]
>     L[:]
>
> An extended slice is any slice that isn't simple::
>
>     L[1:5:2]
>     L[1:3, 8:10]
>     L[1, ..., 5:-2]
>     L[1:3:]
>
> (Note that the presence of an extra colon in the last example makes the
> very first simple slice an extended slice, but otherwise expresses the
> exact same slicing operation.)
>
> In applying a simple slice, Python does the work of translating omitted,
> out of bounds or negative indices into the appropriate actual indices,
> based on the length of the sequence. The normalized ``start`` and
> ``stop`` indices are then passed to the appropriate method:
> ``__getslice__``, ``__setslice__`` or ``__delslice__`` for Python
> classes, ``tp_as_sequence``'s ``sq_slice`` or ``sq_ass_slice`` for C
> types.
>
> For extended slicing, no special handling of slice indices is done. The
> indices in ``start:stop:step`` are wrapped in a ``slice`` object, with
> missing indices represented as None. The indices are otherwise taken
> as-is. The sequence object is then indexed with the slice object as if
> it were a mapping: ``__getitem__``, ``__setitem__`` or ``__delitem__``
> for Python classes, ``tp_as_mapping``'s ``mp_subscript`` or
> ``mp_ass_subscript`` for C types. It is entirely up to the sequence to
> interpret the meaning of missing, out of bounds or negative indices, let
> alone non-numerical indices like tuples or Ellipsis or arbitrary
> objects.
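
To see what actually arrives at ``__getitem__``, a minimal sketch follows. The class name Probe is made up for illustration. It is run under Python 3, where the simple slicing API no longer exists, so even a plain start:stop slice arrives as a slice object.

```python
class Probe:
    """Hypothetical helper: hands back whatever key __getitem__ receives."""

    def __getitem__(self, key):
        return key


p = Probe()
simple = p[1:3]          # start and stop only
open_ended = p[2:]       # missing indices are represented as None
stepped = p[1:5:2]       # start, stop and step
multi = p[1:3, 8:10]     # a tuple of slice objects
mixed = p[1, ..., 5:-2]  # an int, Ellipsis and a slice, taken as-is
```

Nothing is normalized: the -2 in the last example reaches the object untouched, and interpreting it is the object's job.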
>
> Since at least Python 2.1, applying a simple slice to an object that
> does not implement the simple slicing API will fall back to using
> extended slicing, calling ``__getitem__`` (or ``mp_subscript``) instead
> of ``__getslice__`` (or ``sq_slice``), and similarly for slice
> assignment/deletion.
>
> Problems
> ========
>
> Aside from the obvious disadvantage of having two ways to do the same
> thing, simple slicing is an inconvenient wart for several reasons:
>
>  1) It (passively) promotes supporting only simple slicing, as observed by
>     the builtin types only supporting extended slicing many years after
>     extended slicing was introduced.
>
>  2) The Python VM dedicates 12 of its opcodes, about 11%, to support
>     simple slicing, and effectively reserves another 13 for code
>     convenience. Reducing the Big Switch in the bytecode interpreter
>     would certainly not hurt Python performance.
>
>  3) The same goes for the number of functions, macros and
>     function-pointers supporting simple slicing, although the impact
>     would be on maintainability and readability of the source rather
>     than performance.
>
> Proposed Solution
> =================
>
> The proposed solution, as implemented in the p3yk-noslice SVN branch, gets
> rid of the simple slicing methods and PyType entries. The simple C API
> (using ``Py_ssize_t`` for start and stop) remains, but creates a slice
> object as necessary instead. Various types had to be updated to support
> slice objects, or improve the simple slicing case of extended slicing.
>
> The result is that ``__getslice__``, ``__setslice__`` and ``__delslice__``
> are no longer called in any situation. Classes that delegate
> ``__getitem__`` (or the C equivalent) to a sequence type get any slicing
> behaviour of that type for free. Classes that implement their own
> slicing will have to be modified to accept slice objects and process the
> indices themselves. This means that at the C level, as is already the
> case at the Python level, the same method is used for mapping-like
> access as for slicing. C types will still want to implement
> ``tp_as_sequence->sq_item``, but that function will only be called when
> using the ``PySequence_*Item()`` API. Those API functions do not (yet)
> fall back to using ``tp_as_mapping->mp_subscript``, although they
> possibly should.
>
> A casualty of this change is ``PyMapping_Check()``. It used to check for
> ``tp_as_mapping`` being available, and was modified to check for
> ``tp_as_mapping`` but *not* ``tp_as_sequence->sq_slice`` when extended
> slicing was added to the builtin types. It could conceivably check for
> ``tp_as_sequence->sq_item`` instead of ``sq_slice``, but the added value
> is unclear (especially considering ABCs.) In the standard library and
> CPython itself, ``PyMapping_Check()`` is used mostly to provide early
> errors, for instance by checking the arguments to ``exec()``.
>
> Alternate Solution
> ------------------
>
> A possible alternative to removing simple slicing completely, would be to
> introduce a new typestruct hook, with the same signature as
> ``tp_as_mapping->mp_subscript``, which would be called for slicing
> operations. All as-mapping index operations would have to fall back to
> this new ``sq_extended_slice`` hook, in order for ``seq[slice(...)]`` to
> work as expected. For some added efficiency and error-checking,
> expressions using actual slice syntax could compile into bytecodes
> specific for slicing (of which there would only be three, instead of
> twelve.) This approach would simplify C types wanting to support
> extended slicing but not arbitrary-object indexing (and vice-versa)
> somewhat, but the benefit seems too small to warrant the added
> complexity in the CPython runtime itself.
>
>
> Implementing Extended Slicing
> =============================
>
> Supporting extended slicing in C types is not as easily done as supporting
> simple slicing. There are a number of edge cases in interpreting the odder
> combinations of ``start``, ``stop`` and ``step``. This section tries to
> give some explanations and best practices.
>
> Extended Slicing in C
> ---------------------
>
> Because the mapping API takes precedence over the sequence API, any
> ``tp_as_mapping->mp_subscript`` and ``tp_as_mapping->mp_ass_subscript``
> functions need to do proper typechecks on their argument. In Python 2.5
> and later, this is best done using ``PyIndex_Check()`` and
> ``PySlice_Check()`` (and possibly ``PyTuple_Check()`` and comparison
> against ``Py_Ellipsis``.) For compatibility with Python 2.4 and earlier,
> ``PyIndex_Check()`` would have to be replaced with ``PyInt_Check()`` and
> ``PyLong_Check()``.
>
> Indices that pass ``PyIndex_Check()`` should be converted to a
> ``Py_ssize_t`` using ``PyNumber_AsSsize_t()`` and delegated to
> ``tp_as_sequence->sq_item``. (For compatibility with Python 2.4, use
> ``PyInt_AsLong()`` and downcast to an ``int`` instead.)
>
> The exact meaning of tuples of slices, and of Ellipsis, is up to the type,
> as no standard-library types support it. It may be useful to use the same
> convention as the Numpy package. Slices inside tuples, if supported,
> should probably follow the same rules as direct slices.
>
> From slice objects, correct indices can be extracted with
> ``PySlice_GetIndicesEx()``. Negative and out-of-bounds indices will be
> adjusted based on the provided length, but a negative ``step``, and a
> ``stop`` before a ``start`` are kept as-is. This means that, for a getslice
> operation, a simple for-loop can be used to visit the correct items in the
> correct order::
>
>     for (cur = start, i = 0; i < slicelength; cur += step, i++)
>         dest[i] = src[cur];
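
The same loop can be sketched at the Python level. This is an illustration, not the C implementation. It assumes ``slice.indices()`` as the stand-in for ``PySlice_GetIndicesEx()`` and computes the slice length with ``len(range(...))``, since ``indices()`` does not return it. The function name get_slice is made up.

```python
def get_slice(src, key):
    """Copy the items selected by slice object `key` out of sequence `src`."""
    start, stop, step = key.indices(len(src))
    # indices() clamps start/stop to the length but keeps a negative step
    # and a stop before start as-is; the length is derived separately.
    slicelength = len(range(start, stop, step))
    dest = [None] * slicelength
    cur = start
    for i in range(slicelength):
        dest[i] = src[cur]
        cur += step
    return dest


S = list(range(10))
reversed_part = get_slice(S, slice(5, 2, -1))
empty = get_slice(S, slice(5, 2))
```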
>
>
> If ``PySlice_GetIndicesEx()`` is not appropriate, the individual indices
> can be extracted from the ``PySlice`` object. If the indices are to be
> converted to C types, that should be done using ``PyIndex_Check()``,
> ``PyNumber_AsSsize_t()`` and the ``Py_ssize_t`` type, except that
> ``None`` should be accepted as the default value for the index.
>
> For deleting slices (``mp_ass_subscript`` called with ``NULL`` as
> value) where the order does not matter, a reverse slice can be turned into
> the equivalent forward slice with::
>
>     if (step < 0) {
>         stop = start + 1;
>         start = stop + step*(slicelength - 1) - 1;
>         step = -step;
>     }
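
A quick way to convince yourself the transformation above visits the same indices is to transcribe it into Python and compare against ``range()``. The function name forward_equivalent is made up, and ``slice.indices()`` is assumed as the source of the normalized values.

```python
def forward_equivalent(key, length):
    """Return (start, stop, step) with step > 0 covering the same indices."""
    start, stop, step = key.indices(length)
    slicelength = len(range(start, stop, step))
    if step < 0:
        # Same arithmetic as the C fragment above.
        stop = start + 1
        start = stop + step * (slicelength - 1) - 1
        step = -step
    return start, stop, step


fwd = forward_equivalent(slice(8, 2, -2), 10)
```

For S[8:2:-2] on a sequence of length 10 the visited indices are 8, 6, 4, and the forward form covers 4, 6, 8. An empty reverse slice also comes out as an empty forward range.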
>
>
> For slice assignment with a ``step`` other than 1, it's usually necessary
> to require the source iterable to have the same length as the slice. When
> assigning to a slice of length 0, care needs to be taken to select the
> right insertion point. For a slice S[5:2], the correct insertion point is
> before index 5, not before index 2.
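
The builtin list already follows both rules, so it makes a convenient reference when writing tests for another type. A short sketch of that behaviour:

```python
S = list(range(10))
S[5:2] = ["a", "b"]    # zero-length slice: items go in before index 5

T = list(range(10))
try:
    T[::2] = [1]       # five slots but only one item
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc
```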
>
> For both slice deletion and slice assignment, it is important to remember
> that arbitrary Python code may be executed when calling Py_DECREF() or
> otherwise interacting with arbitrary objects. Because of that, it's
> important your datatype stays consistent throughout the operation. Either
> operate on a copy of your datatype, or delay (for instance) Py_DECREF()
> calls until the datatype is updated. The latter is usually done by keeping
> a scratchpad of to-be-DECREF'ed items.
>
> Extended slicing in Python
> --------------------------
>
> The simplest way to support extended slicing in Python is by delegating to
> an underlying type that already supports extended slicing. The class can
> simply index the underlying type with the slice object (or tuple) it was
> indexed with.
>
> Barring that, the Python code will have to pretty much apply
> the same logic as the C type. ``PyIndex_AsSsizeT()`` is available as
> ``operator.index()``, with a ``try/except`` block replacing
> ``PyIndex_Check()``. ``isinstance(o, slice)`` and ``sliceobj.indices()``
> replace ``PySlice_Check()`` and ``PySlice_GetIndices()``, but the slicelength
> (which is provided by ``PySlice_GetIndicesEx()``) has to be calculated
> manually.
>
> Testing extended slicing
> ------------------------
>
> Proper tests of extended slicing capabilities should at least include the
> following (if the operations are supported), assuming a sequence of
> length 10. Triple-colon notation is used everywhere so it uses extended
> slicing even in Python 2.5 and earlier::
>
>    S[2:5:] (same as S[2:5])
>    S[5:2:] (same as S[5:2], an empty slice)
>    S[::] (same as S[:], a copy of the sequence)
>    S[:2:] (same as S[:2])
>    S[:11:] (same as S[:11], a copy of the sequence)
>    S[5::] (same as S[5:])
>    S[-11::] (same as S[-11:], a copy of the sequence)
>    S[-5:2:1] (same as S[5:2], an empty slice)
>    S[-5:-2:2] ([ S[5], S[7] ])
>    S[5:2:-1] (the reverse of S[3:6])
>    S[-2:-5:-1] (the reverse of S[-4:-1])
>
>    S[:5:2] ([ S[0], S[2], S[4] ])
>    S[9::2] ([ S[9] ])
>    S[8::2] ([ S[8] ])
>    S[7::2] ([ S[7], S[9]])
>    S[1::-1] ([ S[1], S[0] ])
>    S[1:0:-1] ([ S[1] ], does not include S[0]!)
>    S[1:-1:-1] (an empty slice)
>    S[::10] ([ S[0] ])
>    S[::-10] ([ S[9] ])
>
>    S[2:5:] = [1, 2, 3] ([ S[2], S[3], S[4] ] become [1, 2, 3])
>    S[2:5:] = [1] (S[2] becomes 1, S[3] and S[4] are deleted)
>    S[5:2:] = [1, 2, 3] ([1, 2, 3] inserted before S[5])
>    S[2:5:2] = [1, 2] ([ S[2], S[4] ] become [1, 2])
>    S[5:2:-2] = [1, 2] ([ S[3], S[5] ] become [2, 1])
>    S[3::3] = [1, 2, 3] ([ S[3], S[6], S[9] ] become [1, 2, 3])
>    S[:-5:-2] = [1, 2] ([ S[7], S[9] ] become [2, 1])
>
>    S[::-1] = S (reverse S in-place awkwardly)
>    S[:5:] = S (replaces S[:5] with a copy of S)
>
>    S[2:5:2] = [1, 2, 3] (error: assigning length-3 to slicelength-2)
>    S[2:5:2] = None (error: need iterable)
>
>
>
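
The "Extended slicing in Python" rules quoted above can be sketched in pure
Python. This is only an illustration under stated assumptions: the helper
names (resolve_slice, getitem, forward_equivalent) are made up for this
example and are not part of the PEP draft or of any library. It shows one way
to compute the slicelength by hand, the same walk as the C for-loop, and the
reverse-to-forward conversion used for order-independent deletion.

```python
import operator


def resolve_slice(key, length):
    """Return (start, stop, step, slicelength) for a slice over `length` items."""
    start, stop, step = key.indices(length)
    # slice.indices() does not report how many items are selected, so the
    # slicelength is computed by hand; len(range(...)) is one way to do it.
    slicelength = len(range(start, stop, step))
    return start, stop, step, slicelength


def getitem(seq, key):
    """Index `seq` with an integer-like object or a slice, without delegating."""
    if isinstance(key, slice):
        start, _stop, step, slicelength = resolve_slice(key, len(seq))
        result = []
        cur = start
        for _ in range(slicelength):  # same walk as the C for-loop
            result.append(seq[cur])
            cur += step
        return result
    try:
        index = operator.index(key)  # accepts anything with __index__
    except TypeError:
        raise TypeError("indices must be integers or slices") from None
    return seq[index]


def forward_equivalent(start, stop, step, slicelength):
    """Turn a reverse slice into the forward slice covering the same items."""
    if step < 0:
        stop = start + 1
        start = stop + step * (slicelength - 1) - 1
        step = -step
    return start, stop, step
```

Comparing getitem() against a built-in list for each entry in the test list
above is a cheap way to check a hand-written implementation.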


-- 
Thomas Wouters <thomas at python.org>

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!

From guido at python.org  Tue Aug 28 17:51:41 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 28 Aug 2007 08:51:41 -0700
Subject: [Python-3000] Immutable bytes type and bsddb or other IO
In-Reply-To: <52dc1c820708272333m517bb5d0v3c658f9eb46ea16b@mail.gmail.com>
References: <ca471dc20708062056w362a951ma705452868e41472@mail.gmail.com>
	<20070823073837.GA14725@electricrain.com>
	<46CD3BFF.5080904@v.loewis.de>
	<20070823171837.GI24059@electricrain.com>
	<46CE5346.10301@canterbury.ac.nz>
	<ca471dc20708232117o290ccdees4175d057de483f56@mail.gmail.com>
	<20070824165823.GM24059@electricrain.com>
	<20070827075925.GT24059@electricrain.com>
	<ca471dc20708270712p3170fbb1o9450e24c11042bd5@mail.gmail.com>
	<52dc1c820708272333m517bb5d0v3c658f9eb46ea16b@mail.gmail.com>
Message-ID: <ca471dc20708280851s2fbdd82ayccb2c447317014f7@mail.gmail.com>

On 8/27/07, Gregory P. Smith <greg at krypto.org> wrote:
> I'm sure the BerkeleyDB library is not expecting the data passed in as
> a lookup key to change mid database traversal.  No idea if it'll
> handle that gracefully or not but I wouldn't expect it to and bet it's
> possible to cause a segfault and/or irreparable database damage that
> way.  The same goes for any other C APIs that you may pass data to
> that release the GIL.

In the case of BerkeleyDB I find this a weak argument -- there are so
many other things you can do to that API from Python that might cause
it to go berserk, that mutating the bytes while it's looking at them
sounds like a rather roundabout approach to sabotage.

Now, in general, I'm the first one to worry about techniques that
could let "pure Python" code cause a segfault, but when using a 3rd
party library, there usually isn't a choice.

Yet another thing is malignant *data*, but that's not the case here --
you would have to actively write evil code to trigger this condition.
So I don't see this as a security concern (otherwise the mere
existence of code probably would qualify as a security concern ;-).

IOW I'm not worried. (Though I'm not saying I would reject a patch
that adds the data locking facility to the bytes type. :-)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From bjourne at gmail.com  Tue Aug 28 17:56:21 2007
From: bjourne at gmail.com (=?ISO-8859-1?Q?BJ=F6rn_Lindqvist?=)
Date: Tue, 28 Aug 2007 17:56:21 +0200
Subject: [Python-3000] Will standard library modules comply with PEP 8?
In-Reply-To: <014101c7e94b$8c6c8650$6701a8c0@RaymondLaptop1>
References: <082720072216.11186.46D34D530002BA1F00002BB222155670740E9D0E030E0CD203D202080106@comcast.net>
	<6FB6CAC3-9E78-4EA7-BAD5-48CFCEDD8661@python.org>
	<014101c7e94b$8c6c8650$6701a8c0@RaymondLaptop1>
Message-ID: <740c3aec0708280856i6da94edfr4ca7274894315421@mail.gmail.com>

On 8/28/07, Raymond Hettinger <python at rcn.com> wrote:
> > On Aug 27, 2007, at 6:16 PM, john.m.camara at comcast.net wrote:
> >
> >> I would like to see PEP 8 remove the "as necessary to improve
> >> readability" in the function and method naming conventions.  That
> >> way methods like StringIO.getvalue() can be renamed to
> >> StringIO.get_value().
>
> Gratuitous breakage -- for nothing.  This is idiotic, pedantic, and counterproductive.  (No offense intended, I'm talking about the
> suggestion, not the suggestor).

Scale up. If X is the amount of pain inflicted by breaking the method
name and X/10 per year is the amount gained due to improved api
predictability, then the investment pays off in only 10 years.
Everyone using Python 5k will thank you for it. Besides, with
deprecations, changing api isn't that painful.

> Ask ten of your programmer friends to write down "result equals object dot get value" and see if more than one in ten uses an
> underscore (no stacking the deck with Cobol programmers).

I wonder how many will write down "result equals object dot get value".... :)

-- 
mvh Björn

From martin at v.loewis.de  Tue Aug 28 18:22:57 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 28 Aug 2007 18:22:57 +0200
Subject: [Python-3000] status (preparing for first alpha)
In-Reply-To: <ca471dc20708270721s7ecfba17uf6472a32ae87b44b@mail.gmail.com>
References: <ee2a432c0708270048jfbc7344ge51d1ff039db7ca2@mail.gmail.com>	<18130.45493.75756.332057@montanaro.dyndns.org>
	<ca471dc20708270721s7ecfba17uf6472a32ae87b44b@mail.gmail.com>
Message-ID: <46D44BE1.6010007@v.loewis.de>

> Agreed. Neal tried to set up a buildbot on the only machine he can
> easily use for this, but that's the "old gentoo box" where he keeps
> getting signal 32. (I suspect this may be a kernel bug and not our
> fault.) I forget who can set up buildbots -- is it Martin? Can someone
> else help?

It's fairly easy to do - I just have to tell the build slaves to
build the 3k branch as well. The active branches (2.5, trunk, 3k)
will then compete for the slaves, in a FIFO manner (assuming there
are concurrent commits).

Regards,
Martin


From martin at v.loewis.de  Tue Aug 28 18:52:50 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 28 Aug 2007 18:52:50 +0200
Subject: [Python-3000] status (preparing for first alpha)
In-Reply-To: <46D44BE1.6010007@v.loewis.de>
References: <ee2a432c0708270048jfbc7344ge51d1ff039db7ca2@mail.gmail.com>	<18130.45493.75756.332057@montanaro.dyndns.org>	<ca471dc20708270721s7ecfba17uf6472a32ae87b44b@mail.gmail.com>
	<46D44BE1.6010007@v.loewis.de>
Message-ID: <46D452E2.3080300@v.loewis.de>

> It's fairly easy to do - I just have to tell the build slaves to
> build the 3k branch as well. The active branches (2.5, trunk, 3k)
> will then compete for the slaves, in a FIFO manner (assuming there
> are concurrent commits).

Apparently, Neal already did that. The 3k buildbots are at

http://www.python.org/dev/buildbot/3.0/

Currently, the tests pass on none of these machines; some
fail to build.

Regards,
Martin

From theller at ctypes.org  Tue Aug 28 18:57:13 2007
From: theller at ctypes.org (Thomas Heller)
Date: Tue, 28 Aug 2007 18:57:13 +0200
Subject: [Python-3000] buildbots
Message-ID: <46D453E9.4020903@ctypes.org>

Unfortunately, I read nearly all my mailing lists through gmane with nntp
- and gmane is down currently (it doesn't deliver new messages any more).
So I cannot write a reply in the original thread :-(

Neal:
> > We got 'em.  Let the spam begin! :-)
> > 
> > This page is not linked from the web anywhere:
> >   http://python.org/dev/buildbot/3.0/
> > 
> > I'm not expecting a lot of signal out of them at the beginning.  All
> > but one has successfully compiled py3k though.  I noticed there were
> > many warnings on windows.  I wonder if they are important:
> > 
> > pythoncore - 0 error(s), 6 warning(s)
> > _ctypes - 0 error(s), 1 warning(s)
> > bz2 - 0 error(s), 9 warning(s)
> > _ssl - 0 error(s), 23 warning(s)
> > _socket - 0 error(s), 1 warning(s)
> > 
> > On trunk, the same machine only has:
> > bz2 - 0 error(s), 2 warning(s)

Since the tests fail on the trunk (on the windows machines),
the 'clean' step is not run.  So, the next build is not a complete
rebuild, and only some parts are actually compiled.  IMO.
If you look at later compiler runs (Windows, py3k), you see that
there are also fewer errors.

> > The windows buildbot seems to be failing due to line ending issues?

Yes.

http://bugs.python.org/issue1029 fixes the problem.  I've also recorded
this failure in http://bugs.python.org/issue1041.

Other windows build problems are recorded as issues 1039, 1040, 1041, 1042, 1043.

The most severe is http://bugs.python.org/issue1039, since it causes the buildbot
test runs to hang (an assertion in the debug MS runtime library displays a messagebox).

> > Another windows buildbot failed to compile:
> > _tkinter - 3 error(s), 1 warning(s)

I'll have to look into that. This is the win64 buildbot.

Thomas



From martin at v.loewis.de  Tue Aug 28 19:31:39 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 28 Aug 2007 19:31:39 +0200
Subject: [Python-3000] buildbots
In-Reply-To: <ee2a432c0708280023x377e0f77if438fa21589a190b@mail.gmail.com>
References: <ee2a432c0708280023x377e0f77if438fa21589a190b@mail.gmail.com>
Message-ID: <46D45BFB.6090501@v.loewis.de>

>  * Add the branch in the buildbot master.cfg file.  2 places need to be updated.
>  * Add new rules in the apache default configuration file (2 lines).
> Make sure to use the same port number in both the changes.
>  * Check in the buildbot master config.  apache config too?

* Edit pydotorg:build/data/dev/buildbot/content.ht

Regards,
Martin



From martin at v.loewis.de  Tue Aug 28 19:35:06 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 28 Aug 2007 19:35:06 +0200
Subject: [Python-3000] buildbots
In-Reply-To: <46D453E9.4020903@ctypes.org>
References: <46D453E9.4020903@ctypes.org>
Message-ID: <46D45CCA.3050206@v.loewis.de>

> Since the tests fail on the trunk (on the windows machines),
> the 'clean' step is not run.

No. The 'clean' step is run even if the test step failed.

The problem must be somewhere else: for some reason, the
connection breaks down/times out; this causes the build
to abort.

Can you check the slave logfile to see what they say?

Regards,
Martin


From theller at ctypes.org  Tue Aug 28 21:06:02 2007
From: theller at ctypes.org (Thomas Heller)
Date: Tue, 28 Aug 2007 21:06:02 +0200
Subject: [Python-3000] buildbots
In-Reply-To: <46D462EB.4070600@ctypes.org>
References: <46D453E9.4020903@ctypes.org> <46D45CCA.3050206@v.loewis.de>
	<46D462EB.4070600@ctypes.org>
Message-ID: <46D4721A.2040208@ctypes.org>

Thomas Heller wrote:
> Martin v. Löwis wrote:
>>> Since the tests fail on the trunk (on the windows machines),
>>> the 'clean' step is not run.
>> 
>> No. The 'clean' step is run even if the test step failed.
>> 
>> The problem must be somewhere else: for some reason, the
>> connection breaks down/times out; this causes the build
>> to abort.
>> 
> On the windows buildbot named x86 XP-3 trunk I see:
> 
> An XP firewall message box asking if python_d should be unblocked (which is possibly unrelated).
> 
> A Debug assertion message box.  Clicking 'Retry' to debug starts Visual Studio;
> it points at line 1343 in db-4.4.20\log\log_put.c:
> 
> 	/*
> 	 * If the open failed for reason other than the file
> 	 * not being there, complain loudly, the wrong user
> 	 * probably started up the application.
> 	 */
> 	if (ret != ENOENT) {
> 		__db_err(dbenv,
> 		     "%s: log file unreadable: %s", *namep, db_strerror(ret));
> =>>		return (__db_panic(dbenv, ret));
> 	}
> 
> Now that I have written this I'm not so sure any longer whether this was for the trunk
> or the py3k build ;-(.

I've checked again: it is in the trunk.

Do you know if it is possible to configure windows so that debug assertions do NOT
display a message box (it is very convenient for interactive testing, but not so
for automatic tests)?

Thomas
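
On the question of silencing debug assertion dialogs during automated runs,
here is a minimal sketch. It assumes a later CPython than the one discussed
in this thread, one whose Windows debug builds expose the MS runtime report
hooks through the msvcrt module (CrtSetReportMode and CrtSetReportFile). The
helper name is made up for illustration, and it only covers reports raised by
the MS debug runtime, not dialogs from the firewall or from other libraries.

```python
import sys


def suppress_crt_dialogs():
    """Send MS debug-runtime reports to stderr instead of a message box.

    Returns True if the report mode was changed, False when the hooks are
    not available (non-Windows platform or a release build).
    """
    if sys.platform != "win32":
        return False
    import msvcrt
    if not hasattr(msvcrt, "CrtSetReportMode"):
        return False  # the Crt* hooks only exist in debug builds
    for report_type in (msvcrt.CRT_WARN, msvcrt.CRT_ERROR, msvcrt.CRT_ASSERT):
        msvcrt.CrtSetReportMode(report_type, msvcrt.CRTDBG_MODE_FILE)
        msvcrt.CrtSetReportFile(report_type, msvcrt.CRTDBG_FILE_STDERR)
    return True
```

A test runner would call this once at startup, before importing extension
modules that might trip an assertion.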

From eric+python-dev at trueblade.com  Tue Aug 28 22:33:42 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Tue, 28 Aug 2007 16:33:42 -0400
Subject: [Python-3000] string.Formatter class
In-Reply-To: <46D40B88.4080202@trueblade.com>
References: <46D40B88.4080202@trueblade.com>
Message-ID: <46D486A6.2070701@trueblade.com>

Eric Smith wrote:
> One of the things that PEP 3101 deliberately underspecifies is the 
> Formatter class, leaving decisions up to the implementation.  Now that a 
> working implementation exists, I think it's reasonable to tighten it up.

<examples deleted>

I should also have included the recipe for the 'smart' formatter
mentioned in the PEP, which automatically searches local and global
namespaces:
---------------------
import inspect
from string import Formatter

class SmartFormatter(Formatter):
     def vformat(self, format_string, args, kwargs):
         self.locals = inspect.currentframe(2).f_locals
         return super(SmartFormatter, self).vformat(format_string, args,
                                                    kwargs)

     def get_value(self, key, args, kwargs):
         if isinstance(key, basestring):
             try:
                 # try kwargs first
                 return kwargs[key]
             except KeyError:
                 try:
                     # try locals next
                     return self.locals[key]
                 except KeyError:
                     # try globals last
                     return globals()[key]
         else:
             return args[key]


def func():
     var0 = 'local--0'
     print(fmt.format('in func   0:{var0} 1:{var1}'))

fmt = SmartFormatter()

var0 = 'global-0'
var1 = 'global-1'
func()

print(fmt.format('in module 0:{var0} 1:{var1}'))
---------------------

This code produces:
in func   0:local--0 1:global-1
in module 0:global-0 1:global-1
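
For readers on a current Python 3, the same recipe can be sketched as follows.
This is an adaptation, not the code from the message above: it assumes
sys._getframe() (a CPython implementation detail) in place of
inspect.currentframe(2), uses str instead of basestring, and looks up the
caller's own globals rather than the globals of the module defining the
class. The frame depth of 2 is only right when vformat() is reached through
Formatter.format().

```python
import sys
from string import Formatter


class SmartFormatter(Formatter):
    """Resolve named fields from kwargs, then the caller's locals, then its globals."""

    def vformat(self, format_string, args, kwargs):
        # Frame 0 is this vformat(), frame 1 is Formatter.format(),
        # frame 2 is the code that called fmt.format(...).
        caller = sys._getframe(2)
        self._namespaces = (kwargs, caller.f_locals, caller.f_globals)
        try:
            return super().vformat(format_string, args, kwargs)
        finally:
            del self._namespaces  # do not keep the caller's frame data alive

    def get_value(self, key, args, kwargs):
        if isinstance(key, str):
            for namespace in self._namespaces:
                if key in namespace:
                    return namespace[key]
            raise KeyError(key)
        return args[key]  # positional field


fmt = SmartFormatter()
var0 = 'global-0'
var1 = 'global-1'


def func():
    var0 = 'local--0'
    return fmt.format('in func   0:{var0} 1:{var1}')


in_func = func()
in_module = fmt.format('in module 0:{var0} 1:{var1}')
print(in_func)
print(in_module)
```

Run as a script, this is expected to print the same two lines as the original
recipe.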




From john.m.camara at comcast.net  Tue Aug 28 23:08:00 2007
From: john.m.camara at comcast.net (john.m.camara at comcast.net)
Date: Tue, 28 Aug 2007 21:08:00 +0000
Subject: [Python-3000] Will standard library modules comply with PEP 8?
Message-ID: <082820072108.14036.46D48EB000054A84000036D422155934140E9D0E030E0CD203D202080106@comcast.net>

On 8/28/07, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On the gripping hand, breaking getattr, getitem, setattr, setitem, 
> delattr and delitem without a *really* good reason would mean seriously 
> annoying a heck of a lot of people for no real gain.
>
Making an exception to the naming convention for builtins seems 
acceptable as it's just a small list of method names for someone to 
remember.  The bigger issue I see is the numerous inconsistencies 
that exist in the standard library.  I know when I was learning Python 
(8 years ago) I found these inconsistencies annoying.  I also see that 
everyone I convince to use Python also has this issue for the first year 
or 2 as they learn to master the language.

At this point in time a change of names would be a pain for myself as 
I'm well aware of the names used in the standard library and will likely 
type them wrong in the beginning if the change takes place.  But I know 
it will only take a short time to get used to the new names so I see it 
as a small price to pay to shorten the learning curve for newbies.  I'll 
even get over the pain of updating 300+ Kloc that I maintain.

John

From thomas at python.org  Tue Aug 28 23:17:37 2007
From: thomas at python.org (Thomas Wouters)
Date: Tue, 28 Aug 2007 23:17:37 +0200
Subject: [Python-3000] Merging the trunk SSL changes.
Message-ID: <9e804ac0708281417i4a613e6en4a5faf86abd90b7b@mail.gmail.com>

I'm trying to merge the trunk into the py3k branch (so I can work on
removing simple slices), but the SSL changes in the trunk are in the way.
That is to say, the new 'ssl' module depends on the Python 2.x layout in the
'socket' module. Specifically, that socket.socket is a wrapper class around
_socket.socket, so it can subclass that and still pass a socket as a
separate argument to __init__. Unfortunately, in Python 3.0, socket.socket
is a subclass of _socket.socket, so that trick won't work. And there isn't
really a way to fake it, either, except by making ssl.sslsocket *not*
subclass socket.socket.

I'm going to check in this merge despite the extra breakage, but it would
be good if someone could either fix the py3k branch proper (I don't see
how), or change the trunk strategy to be more forward-compatible.

-- 
Thomas Wouters <thomas at python.org>

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!

From jimjjewett at gmail.com  Wed Aug 29 00:47:30 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Tue, 28 Aug 2007 18:47:30 -0400
Subject: [Python-3000] Will standard library modules comply with PEP 8?
In-Reply-To: <46D42863.8010300@benjiyork.com>
References: <082720072216.11186.46D34D530002BA1F00002BB222155670740E9D0E030E0CD203D202080106@comcast.net>
	<6FB6CAC3-9E78-4EA7-BAD5-48CFCEDD8661@python.org>
	<014101c7e94b$8c6c8650$6701a8c0@RaymondLaptop1>
	<46D422A5.7010501@latte.ca> <46D42863.8010300@benjiyork.com>
Message-ID: <fb6fbf560708281547u7d229b6cl8ac4591ab020120e@mail.gmail.com>

On 8/28/07, Benji York <benji at benjiyork.com> wrote:
> Blake Winton wrote:
> > Raymond Hettinger wrote:
> >>> On Aug 27, 2007, at 6:16 PM, john.m.camara at comcast.net wrote:

> >> Ask ten of your programmer friends to write down "result equals
> >> object dot get value" ...

> > Sure, but given the rise of Java, how many of them will spell it with a
> > capital 'V'?   ;)

> > On the one hand, I really like consistency in my programming languages.
> > On the other hand, a foolish consistency is the hobgoblin of little minds.

> I call quote misapplication.  Having predictable identifier names isn't
> "foolish".

Agreed; the question is what is predictable.

When I worked in Common Lisp, the separators were usually _ or -, and
it was a royal pain to remember which.

In python, there isn't a consistent separator, because it can be any
of runthewordstogether, wordBreaksCapitalize, IncludingFirstWord, or
underscore_is_your_friend.

Unfortunately, even if we picked a single convention, it still
wouldn't always seem right, because sometimes the words represent
different kinds of things ("get" and name) or involve acronyms that
already have their own capitalization rules (HTTP).  So we can't
possibly get perfect consistency.

If we're at (made-up numbers) 85% now, and can only get to 95%, and
that extra 10% would break other consistencies that we didn't consider
(such as consistency with wrapped code) ... is it really worth
incompatibilities?

-jJ

From jimjjewett at gmail.com  Wed Aug 29 01:07:49 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Tue, 28 Aug 2007 19:07:49 -0400
Subject: [Python-3000] string.Formatter class
In-Reply-To: <46D40B88.4080202@trueblade.com>
References: <46D40B88.4080202@trueblade.com>
Message-ID: <fb6fbf560708281607n377a1513m2a191ad569610af2@mail.gmail.com>

On 8/28/07, Eric Smith <eric+python-dev at trueblade.com> wrote:
> parse(format_string)

>... returns an iterable of tuples
> (literal_text, field_name, format_spec, conversion)

Which are really either

    (literal_text, None, None, None)
or
    (None, field_name, format_spec, conversion)

I can't help thinking that these two return types shouldn't be
alternatives that both pretend to be 4-tuples.  At the least, they
should be

    "literal text"
vs
    (field_name, format_spec, conversion)

but you might want to take inspiration from the "tail" of an
elementtree node, and return the field with the literal next to it as
a single object.

    (literal_text, field_name, format_spec, conversion)

Where the consumer should output the literal text followed by the
results of formatting the field.  And yes, the last tuple would often
be

    (literal_text, None, None, None)

to indicate no additional fields need processing.

-jJ

From eric+python-dev at trueblade.com  Wed Aug 29 01:18:24 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Tue, 28 Aug 2007 19:18:24 -0400
Subject: [Python-3000] string.Formatter class
In-Reply-To: <fb6fbf560708281607n377a1513m2a191ad569610af2@mail.gmail.com>
References: <46D40B88.4080202@trueblade.com>
	<fb6fbf560708281607n377a1513m2a191ad569610af2@mail.gmail.com>
Message-ID: <46D4AD40.9070006@trueblade.com>

Jim Jewett wrote:
> On 8/28/07, Eric Smith <eric+python-dev at trueblade.com> wrote:
>> parse(format_string)
> 
>> ... returns an iterable of tuples
>> (literal_text, field_name, format_spec, conversion)
> 
> Which are really either
> 
>     (literal_text, None, None, None)
> or
>     (None, field_name, format_spec, conversion)
> 
> I can't help thinking that these two return types shouldn't be
> alternatives that both pretend to be 4-tuples.  At the least, they
> should be
> 
>     "literal text"
> vs
>     (field_name, format_spec, conversion)

I agree that it might not be the best interface.  It was originally just 
an internal thing, where it didn't really matter.  But since then it's 
(possibly) become exposed as part of the Formatter API, so rethinking it 
makes sense.  I really didn't want to write:

for result in parse(format_string):
     if isinstance(result, str):
         # it's a literal
     else:
         field_name, format_spec, conversion = result

> but you might want to take inspiration from the "tail" of an
> elementtree node, and return the field with the literal next to it as
> a single object.
> 
>     (literal_text, field_name, format_spec, conversion)

I think I like that best.

Thanks!

Eric.
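
The combined 4-tuple form discussed above is the shape that
string.Formatter.parse() has in current Python 3: each tuple carries the
literal text followed by the field that comes after it, and a trailing
literal has None for the three field entries. A minimal sketch of a consumer
follows; the render() helper and its `values` mapping are hypothetical names
used only for this illustration, and it handles just the !r and !s
conversions.

```python
from string import Formatter


def render(format_string, values):
    """Format `format_string` by walking the tuples produced by parse()."""
    out = []
    parsed = Formatter().parse(format_string)
    for literal_text, field_name, format_spec, conversion in parsed:
        out.append(literal_text)  # the literal always comes first
        if field_name is None:
            continue  # trailing literal, no field to process
        value = values[field_name]
        if conversion == 'r':
            value = repr(value)
        elif conversion == 's':
            value = str(value)
        out.append(format(value, format_spec))
    return ''.join(out)


parsed = list(Formatter().parse('a{x:>3}b{y!r}c'))
rendered = render('a{x:>3}b{y!r}c', {'x': 7, 'y': 'q'})
print(parsed)
print(rendered)
```

With this shape the consumer needs no isinstance() check: it emits the
literal and then formats the field if there is one.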

From janssen at parc.com  Wed Aug 29 01:37:08 2007
From: janssen at parc.com (Bill Janssen)
Date: Tue, 28 Aug 2007 16:37:08 PDT
Subject: [Python-3000] Merging the trunk SSL changes.
In-Reply-To: <9e804ac0708281417i4a613e6en4a5faf86abd90b7b@mail.gmail.com> 
References: <9e804ac0708281417i4a613e6en4a5faf86abd90b7b@mail.gmail.com>
Message-ID: <07Aug28.163713pdt."57996"@synergy1.parc.xerox.com>

> I'm trying to merge the trunk into the py3k branch (so I can work on
> removing simple slices), but the SSL changes in the trunk are in the way.
> That is to say, the new 'ssl' module depends on the Python 2.x layout in the
> 'socket' module. Specifically, that socket.socket is a wrapper class around
> _socket.socket, so it can subclass that and still pass a socket as a
> separate argument to __init__. Unfortunately, in Python 3.0, socket.socket
> is a subclass of _socket.socket, so that trick won't work. And there isn't
> really a way to fake it, either, except by making ssl.sslsocket *not*
> subclass socket.socket.

It's on my list -- in 3K, the plan is that ssl.SSLSocket is going to
inherit from socket.socket and socket.SocketIO, and (I think) pass
fileno instead of _sock.

> I'm going to check in this merge despite the extra breakage, but it would
> be good if someone could either fix the py3k branch proper (I don't see
> how), or change the trunk strategy to be more forward-compatible.

If you can hold off one day before doing the trunk merge, I'm going to
post a fix to the Windows SSL breakage this evening (PDT).

Bill
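
The "pass fileno instead of _sock" idea can be sketched like this: a subclass
of socket.socket that still accepts an existing socket as an argument by
taking over its file descriptor. This assumes current Python 3, where
socket.socket() accepts a fileno keyword and sockets have detach().
AdoptingSocket is a hypothetical name for this illustration, not the ssl
module's class, and no TLS is involved.

```python
import socket


class AdoptingSocket(socket.socket):
    """A socket subclass built from an already-open socket object."""

    def __init__(self, sock):
        # detach() hands over the descriptor and leaves `sock` closed, so
        # exactly one Python object owns the underlying connection.
        super().__init__(family=sock.family, type=sock.type,
                         proto=sock.proto, fileno=sock.detach())


# Demonstration on a local socket pair; no network access is needed.
left, right = socket.socketpair()
wrapped = AdoptingSocket(left)
wrapped.sendall(b"ping")
received = right.recv(4)
original_fd = left.fileno()  # -1 once the socket has been detached
wrapped.close()
right.close()
print(received, original_fd)
```

The subclass is a real socket.socket, so code that checks isinstance() or
calls socket methods keeps working without a wrapped _sock attribute.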

From thomas at python.org  Wed Aug 29 01:48:39 2007
From: thomas at python.org (Thomas Wouters)
Date: Wed, 29 Aug 2007 01:48:39 +0200
Subject: [Python-3000] Merging the trunk SSL changes.
In-Reply-To: <3730382933471592889@unknownmsgid>
References: <9e804ac0708281417i4a613e6en4a5faf86abd90b7b@mail.gmail.com>
	<3730382933471592889@unknownmsgid>
Message-ID: <9e804ac0708281648y70d7ff68nb1a290af793c068b@mail.gmail.com>

On 8/29/07, Bill Janssen <janssen at parc.com> wrote:

> If you can hold off one day before doing the trunk merge, I'm going to
> post a fix to the Windows SSL breakage this evening (PDT).


Too late, sorry, it's already checked in. You can revert the SSL bits if you
want, and take care to merge the proper changes later.

-- 
Thomas Wouters <thomas at python.org>

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!

From janssen at parc.com  Wed Aug 29 02:15:49 2007
From: janssen at parc.com (Bill Janssen)
Date: Tue, 28 Aug 2007 17:15:49 PDT
Subject: [Python-3000] Merging the trunk SSL changes.
In-Reply-To: <9e804ac0708281648y70d7ff68nb1a290af793c068b@mail.gmail.com> 
References: <9e804ac0708281417i4a613e6en4a5faf86abd90b7b@mail.gmail.com>
	<3730382933471592889@unknownmsgid>
	<9e804ac0708281648y70d7ff68nb1a290af793c068b@mail.gmail.com>
Message-ID: <07Aug28.171558pdt."57996"@synergy1.parc.xerox.com>

> > If you can hold off one day before doing the trunk merge, I'm going to
> > post a fix to the Windows SSL breakage this evening (PDT).
> 
> 
> Too late, sorry, it's already checked in. You can revert the SSL bits if you
> want, and take care to merge the proper changes later.

No, that's OK.  I'll just (eventually) generate a 3K patch against
what's in the repo.  Probably not this week.

Here's my work plan (from yesterday's python-dev):

1)  Generate a patch to the trunk to remove all use of socket.ssl in
    library modules (and elsewhere except for
    test/test_socket_ssl.py), and switch them to use the ssl module.
    This would affect httplib, imaplib, poplib, smtplib, urllib,
    and xmlrpclib.

    This patch should also deprecate the use of socket.ssl, and
    particularly the "server" and "issuer" methods on it, which can
    return bad data.

2)  Expand the test suite to exhaustively test edge cases, particularly
    things like invalid protocol ids, bad cert files, bad key files,
    etc.

3)  Take the threaded server example in test/test_ssl.py, clean it up,
    and add it to the Demos directory (maybe it should be a HOWTO?).

4)  Generate a patch for the Py3K branch.  This patch would remove the
    "ssl" function from the socket module, and would also remove the
    "server" and "issuer" methods on the SSL context.  The ssl.sslsocket
    class would be renamed to SSLSocket (PEP 8), and would inherit
    from socket.socket and io.RawIOBase.  The current improvements to
    the Modules/_ssl.c file would be folded in.  The patch would
    also fix all uses of socket.ssl in the other library modules.

5)  Generate a package for older Pythons (2.3-2.5).  This would
    install the ssl module, plus the improved version of _ssl.c.
    Needs more design.


I've currently got a patch for (1).  Sounds like I should switch the
order of (3) and (4).


Bill

From guido at python.org  Wed Aug 29 03:29:13 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 28 Aug 2007 18:29:13 -0700
Subject: [Python-3000] Merging the trunk SSL changes.
In-Reply-To: <-3823197267807538008@unknownmsgid>
References: <9e804ac0708281417i4a613e6en4a5faf86abd90b7b@mail.gmail.com>
	<3730382933471592889@unknownmsgid>
	<9e804ac0708281648y70d7ff68nb1a290af793c068b@mail.gmail.com>
	<-3823197267807538008@unknownmsgid>
Message-ID: <ca471dc20708281829m1aed4c6sdd9148b4bc905b81@mail.gmail.com>

On 8/28/07, Bill Janssen <janssen at parc.com> wrote:
> > > If you can hold off one day before doing the trunk merge, I'm going to
> > > post a fix to the Windows SSL breakage this evening (PDT).
> >
> >
> > Too late, sorry, it's already checked in. You can revert the SSL bits if you
> > want, and take care to merge the proper changes later.
>
> No, that's OK.  I'll just (eventually) generate a 3K patch against
> what's in the repo.  Probably not this week.
>
> Here's my work plan (from yesterday's python-dev):
>
> 1)  Generate a patch to the trunk to remove all use of socket.ssl in
>     library modules (and elsewhere except for
>     test/test_socket_ssl.py), and switch them to use the ssl module.
>     This would affect httplib, imaplib, poplib, smtplib, urllib,
>     and xmlrpclib.
>
>     This patch should also deprecate the use of socket.ssl, and
>     particularly the "server" and "issuer" methods on it, which can
>     return bad data.
>
> 2)  Expand the test suite to exhaustively test edge cases, particularly
>     things like invalid protocol ids, bad cert files, bad key files,
>     etc.
>
> 3)  Take the threaded server example in test/test_ssl.py, clean it up,
>     and add it to the Demos directory (maybe it should be a HOWTO?).
>
> 4)  Generate a patch for the Py3K branch.  This patch would remove the
>     "ssl" function from the socket module, and would also remove the
>     "server" and "issuer" methods on the SSL context.  The ssl.sslsocket
>     class would be renamed to SSLSocket (PEP 8), and would inherit
>     from socket.socket and io.RawIOBase.  The current improvements to
>     the Modules/_ssl.c file would be folded in.  The patch would
>     also fix all uses of socket.ssl in the other library modules.
>
> 5)  Generate a package for older Pythons (2.3-2.5).  This would
>     install the ssl module, plus the improved version of _ssl.c.
>     Needs more design.
>
>
> I've currently got a patch for (1).  Sounds like I should switch the
> order of (3) and (4).

Until ssl.py is fixed, I've added quick hacks to test_ssl.py and
test_socket_ssl.py to disable these tests, so people won't be alarmed
by the test failures.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From janssen at parc.com  Wed Aug 29 05:27:56 2007
From: janssen at parc.com (Bill Janssen)
Date: Tue, 28 Aug 2007 20:27:56 PDT
Subject: [Python-3000] Merging the trunk SSL changes.
In-Reply-To: <ca471dc20708281829m1aed4c6sdd9148b4bc905b81@mail.gmail.com> 
References: <9e804ac0708281417i4a613e6en4a5faf86abd90b7b@mail.gmail.com>
	<3730382933471592889@unknownmsgid>
	<9e804ac0708281648y70d7ff68nb1a290af793c068b@mail.gmail.com>
	<-3823197267807538008@unknownmsgid>
	<ca471dc20708281829m1aed4c6sdd9148b4bc905b81@mail.gmail.com>
Message-ID: <07Aug28.202800pdt."57996"@synergy1.parc.xerox.com>

> Until ssl.py is fixed, I've added quick hacks to test_ssl.py and
> test_socket_ssl.py to disable these tests, so people won't be alarmed
> by the test failures.

You might just want to configure out SSL support, or have Lib/ssl.py
raise an ImportError, for the moment.

Bill

From eric+python-dev at trueblade.com  Wed Aug 29 05:33:10 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Tue, 28 Aug 2007 23:33:10 -0400
Subject: [Python-3000] string.Formatter class
In-Reply-To: <46D4AD40.9070006@trueblade.com>
References: <46D40B88.4080202@trueblade.com>	<fb6fbf560708281607n377a1513m2a191ad569610af2@mail.gmail.com>
	<46D4AD40.9070006@trueblade.com>
Message-ID: <46D4E8F6.30508@trueblade.com>

Eric Smith wrote:
> Jim Jewett wrote:

>> but you might want to take inspiration from the "tail" of an
>> elementtree node, and return the field with the literal next to it as
>> a single object.
>>
>>     (literal_text, field_name, format_spec, conversion)
> 
> I think I like that best.

I implemented this in r57641.  I think it simplifies things.  At least,
it's easier to explain.

Due to an optimization dealing with escaped braces, it's possible for
(literal, None, None, None) to be returned more than once.  I don't
think that's a problem, as long as it's documented.  If you look at
string.py's Formatter.vformat, I don't think it complicates the
implementation at all.

Thanks for the suggestion.
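
For illustration, here is a minimal sketch of the 4-tuple form. It runs
against string.Formatter.parse() as found in a present-day Python 3, not
against r57641 itself, and the format strings are made up for the example:

```python
import string

formatter = string.Formatter()

# Each item is (literal_text, field_name, format_spec, conversion);
# literal_text is the text that precedes the field.
parts = list(formatter.parse("Hello {name!r:>10}!"))
for literal_text, field_name, format_spec, conversion in parts:
    print(repr(literal_text), field_name, format_spec, conversion)

# Escaped braces come back as literal-only items (field_name is None).
# How many such items are produced is an implementation detail, so a
# caller should simply concatenate the literals.
escaped = list(formatter.parse("a{{b}}c"))
joined = "".join(literal for literal, _, _, _ in escaped)
print(joined)
```

A caller that only concatenates the literal parts does not need to care
how many literal-only tuples it receives.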




From guido at python.org  Wed Aug 29 05:32:31 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 28 Aug 2007 20:32:31 -0700
Subject: [Python-3000] Merging the trunk SSL changes.
In-Reply-To: <210293633547667201@unknownmsgid>
References: <9e804ac0708281417i4a613e6en4a5faf86abd90b7b@mail.gmail.com>
	<3730382933471592889@unknownmsgid>
	<9e804ac0708281648y70d7ff68nb1a290af793c068b@mail.gmail.com>
	<-3823197267807538008@unknownmsgid>
	<ca471dc20708281829m1aed4c6sdd9148b4bc905b81@mail.gmail.com>
	<210293633547667201@unknownmsgid>
Message-ID: <ca471dc20708282032p39c3efadsa880a965fd0784c8@mail.gmail.com>

Yes, that makes more sense. Bah, three revisions for one.

On 8/28/07, Bill Janssen <janssen at parc.com> wrote:
> > Until ssl.py is fixed, I've added quick hacks to test_ssl.py and
> > test_socket_ssl.py to disable these tests, so people won't be alarmed
> > by the test failures.
>
> You might just want to configure out SSL support, or have Lib/ssl.py
> raise an ImportError, for the moment.
>
> Bill
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Wed Aug 29 06:07:44 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 28 Aug 2007 21:07:44 -0700
Subject: [Python-3000] Need help enforcing strict str/bytes distinctions
In-Reply-To: <20070828074420.GA15998@core.g33x.de>
References: <ca471dc20708271716s475bbebfs18f0390c86f3ac91@mail.gmail.com>
	<20070828074420.GA15998@core.g33x.de>
Message-ID: <ca471dc20708282107h592cece3i3ac7f023f40a00b4@mail.gmail.com>

On 8/28/07, Lars Gustäbel <lars at gustaebel.de> wrote:
> On Mon, Aug 27, 2007 at 05:16:37PM -0700, Guido van Rossum wrote:
> > As anyone following the py3k checkins should have figured out by now,
> > I'm on a mission to require all code to be consistent about bytes vs.
> > str. For example binary files will soon refuse str arguments to
> > write(), and vice versa.
> >
> > I have a patch that turns on this enforcement, but I have about 14
> > failing unit tests that require a lot of attention. I'm hoping a few
> > folks might have time to help out.
> >
> > Here are the unit tests that still need work:
> > [...]
> > test_tarfile
>
> Fixed in r57608.

Thanks!

I fixed most others (I think); we're down to test_asynchat and
test_urllib2_localnet failing. But I've checked in the main patch
enforcing stricter str/bytes, just to get things rolling.
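
A minimal sketch of what the stricter rule looks like from user code. It
uses the in-memory io.BytesIO and io.StringIO classes of a present-day
Python 3 as stand-ins for binary and text files; try_write() is a helper
made up for this example:

```python
import io


def try_write(stream, data):
    """Return None if write() succeeds, else the TypeError it raised."""
    try:
        stream.write(data)
    except TypeError as exc:
        return exc
    return None


binary_error = try_write(io.BytesIO(), "text")   # str into a binary stream
text_error = try_write(io.StringIO(), b"bytes")  # bytes into a text stream
ok = try_write(io.BytesIO(), b"bytes")           # matching types succeed

print(type(binary_error).__name__, type(text_error).__name__, ok)
```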


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From eric+python-dev at trueblade.com  Wed Aug 29 15:02:57 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Wed, 29 Aug 2007 09:02:57 -0400
Subject: [Python-3000] [Python-checkins] buildbot failure in S-390
	Debian 3.0
In-Reply-To: <20070829125131.714341E4002@bag.python.org>
References: <20070829125131.714341E4002@bag.python.org>
Message-ID: <46D56E81.1030604@trueblade.com>

The URL is getting mangled, it should be:
http://www.python.org/dev/buildbot/all/S-390%20Debian%203.0/builds/9
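
The mangled form looks like the builder name was percent-quoted twice. I
have not checked where in the mail path that happens, so treat this as a
guess; the sketch below (using urllib.parse.quote from a present-day
Python 3) merely reproduces the same text:

```python
from urllib.parse import quote

builder = "S-390 Debian 3.0"

once = quote(builder)   # each space becomes %20
twice = quote(once)     # the '%' itself is escaped again, giving %2520

print(once)
print(twice)
```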

buildbot at python.org wrote:
> The Buildbot has detected a new failure of S-390 Debian 3.0.
> Full details are available at:
>  http://www.python.org/dev/buildbot/all/S-390%2520Debian%25203.0/builds/9
> 
> Buildbot URL: http://www.python.org/dev/buildbot/all/
> 
> Build Reason: 
> Build Source Stamp: [branch branches/py3k] HEAD
> Blamelist: eric.smith
> 
> BUILD FAILED: failed compile
> 
> sincerely,
>  -The Buildbot
> 
> _______________________________________________
> Python-checkins mailing list
> Python-checkins at python.org
> http://mail.python.org/mailman/listinfo/python-checkins
> 


From guido at python.org  Wed Aug 29 15:20:52 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 29 Aug 2007 06:20:52 -0700
Subject: [Python-3000] [Python-checkins] buildbot failure in S-390
	Debian 3.0
In-Reply-To: <46D56E81.1030604@trueblade.com>
References: <20070829125131.714341E4002@bag.python.org>
	<46D56E81.1030604@trueblade.com>
Message-ID: <ca471dc20708290620g3ad16f2es5c29d7f465c4e4af@mail.gmail.com>

I noticed this failed with a traceback from distutils, caused by a bug
in an exception handler; which I fixed.

I also noticed that there were a *lot* of warnings like this:

Objects/object.c:193: warning: format '%d' expects type 'int', but
argument 7 has type 'Py_ssize_t'

These can be fixed using %zd I believe. Any volunteers? This would
tremendously improve 64-bit quality!

--Guido

On 8/29/07, Eric Smith <eric+python-dev at trueblade.com> wrote:
> The URL is getting mangled, it should be:
> http://www.python.org/dev/buildbot/all/S-390%20Debian%203.0/builds/9
>
> buildbot at python.org wrote:
> > The Buildbot has detected a new failure of S-390 Debian 3.0.
> > Full details are available at:
> >  http://www.python.org/dev/buildbot/all/S-390%2520Debian%25203.0/builds/9
> >
> > Buildbot URL: http://www.python.org/dev/buildbot/all/
> >
> > Build Reason:
> > Build Source Stamp: [branch branches/py3k] HEAD
> > Blamelist: eric.smith
> >
> > BUILD FAILED: failed compile
> >
> > sincerely,
> >  -The Buildbot
> >
> > _______________________________________________
> > Python-checkins mailing list
> > Python-checkins at python.org
> > http://mail.python.org/mailman/listinfo/python-checkins
> >
>
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jeremy at alum.mit.edu  Wed Aug 29 16:01:17 2007
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Wed, 29 Aug 2007 10:01:17 -0400
Subject: [Python-3000] Need help enforcing strict str/bytes distinctions
In-Reply-To: <ca471dc20708282107h592cece3i3ac7f023f40a00b4@mail.gmail.com>
References: <ca471dc20708271716s475bbebfs18f0390c86f3ac91@mail.gmail.com>
	<20070828074420.GA15998@core.g33x.de>
	<ca471dc20708282107h592cece3i3ac7f023f40a00b4@mail.gmail.com>
Message-ID: <e8bf7a530708290701x36133aa4j6039fca1f111a8e9@mail.gmail.com>

On 8/29/07, Guido van Rossum <guido at python.org> wrote:
> On 8/28/07, Lars Gustäbel <lars at gustaebel.de> wrote:
> > On Mon, Aug 27, 2007 at 05:16:37PM -0700, Guido van Rossum wrote:
> > > As anyone following the py3k checkins should have figured out by now,
> > > I'm on a mission to require all code to be consistent about bytes vs.
> > > str. For example binary files will soon refuse str arguments to
> > > write(), and vice versa.
> > >
> > > I have a patch that turns on this enforcement, but I have about 14
> > > failing unit tests that require a lot of attention. I'm hoping a few
> > > folks might have time to help out.
> > >
> > > Here are the unit tests that still need work:
> > > [...]
> > > test_tarfile
> >
> > Fixed in r57608.
>
> Thanks!
>
> I fixed most others (I think); we're down to test_asynchat and
> test_urllib2_localnet failing. But I've checked in the main patch
> enforcing stricter str/bytes, just to get things rolling.

I'm working on test_urllib2_localnet now.

Jeremy

>
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at v.loewis.de  Wed Aug 29 16:05:59 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 29 Aug 2007 16:05:59 +0200
Subject: [Python-3000] buildbots
In-Reply-To: <46D4721A.2040208@ctypes.org>
References: <46D453E9.4020903@ctypes.org> <46D45CCA.3050206@v.loewis.de>
	<46D462EB.4070600@ctypes.org> <46D4721A.2040208@ctypes.org>
Message-ID: <46D57D47.1090709@v.loewis.de>

> Do you know if it is possible to configure windows so that debug assertions do NOT
> display a message box (it is very convenient for interactive testing, but not so
> for automatic tests)?

You can use _set_error_mode(_OUT_TO_STDERR) to make assert() go to
stderr rather than to a message box. You can use
_CrtSetReportMode(_CRT_ASSERT /* or _CRT_WARN or _CRT_ERROR */,
_CRTDBG_MODE_FILE) to make _ASSERT() go to a file; you need to
call _CrtSetReportFile( _CRT_ASSERT, _CRTDBG_FILE_STDERR ) in
addition to make the file stderr.

Not sure what window precisely you got, so I can't comment which
of these (if any) would have made the message go away.

Regards,
Martin

From guido at python.org  Wed Aug 29 16:29:36 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 29 Aug 2007 07:29:36 -0700
Subject: [Python-3000] Need help enforcing strict str/bytes distinctions
In-Reply-To: <e8bf7a530708290701x36133aa4j6039fca1f111a8e9@mail.gmail.com>
References: <ca471dc20708271716s475bbebfs18f0390c86f3ac91@mail.gmail.com>
	<20070828074420.GA15998@core.g33x.de>
	<ca471dc20708282107h592cece3i3ac7f023f40a00b4@mail.gmail.com>
	<e8bf7a530708290701x36133aa4j6039fca1f111a8e9@mail.gmail.com>
Message-ID: <ca471dc20708290729u772ff198jc1db010f1ebd2e0f@mail.gmail.com>

Stop, I already fixed it. Sorry!

On 8/29/07, Jeremy Hylton <jeremy at alum.mit.edu> wrote:
> On 8/29/07, Guido van Rossum <guido at python.org> wrote:
> > On 8/28/07, Lars Gustäbel <lars at gustaebel.de> wrote:
> > > On Mon, Aug 27, 2007 at 05:16:37PM -0700, Guido van Rossum wrote:
> > > > As anyone following the py3k checkins should have figured out by now,
> > > > I'm on a mission to require all code to be consistent about bytes vs.
> > > > str. For example binary files will soon refuse str arguments to
> > > > write(), and vice versa.
> > > >
> > > > I have a patch that turns on this enforcement, but I have about 14
> > > > failing unit tests that require a lot of attention. I'm hoping a few
> > > > folks might have time to help out.
> > > >
> > > > Here are the unit tests that still need work:
> > > > [...]
> > > > test_tarfile
> > >
> > > Fixed in r57608.
> >
> > Thanks!
> >
> > I fixed most others (I think); we're down to test_asynchat and
> > test_urllib2_localnet failing. But I've checked in the main patch
> > enforcing stricter str/bytes, just to get things rolling.
>
> I'm working on test_urllib2_localnet now.
>
> Jeremy
>
> >
> >
> > --
> > --Guido van Rossum (home page: http://www.python.org/~guido/)
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jeremy at alum.mit.edu  Wed Aug 29 18:49:08 2007
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Wed, 29 Aug 2007 12:49:08 -0400
Subject: [Python-3000] ctype crashes
Message-ID: <e8bf7a530708290949m2cffe48dm13bf94f0d931e37e@mail.gmail.com>

I'm seeing a bunch of C extensions crash on my box.  I'm uncertain
about a few issues, but I think I'm running a 32-bit binary on a 64-bit
linux box.  The crash I see in ctypes is the following:

#0  0x080a483e in PyUnicodeUCS2_FromString (u=0x5 <Address 0x5 out of bounds>)
    at ../Objects/unicodeobject.c:471
#1  0xf7cd4f8e in z_get (ptr=0x0, size=4)
    at /usr/local/google/home/jhylton/python/py3k/Modules/_ctypes/cfield.c:1380
#2  0xf7ccdbb5 in Simple_get_value (self=0xf7ba8a04)
    at /usr/local/google/home/jhylton/python/py3k/Modules/_ctypes/_ctypes.c:3976
#3  0x0807f218 in PyObject_GenericGetAttr (obj=0xf7ba8a04, name=0xf7e26ea0)
    at ../Objects/object.c:1098
#4  0x080b63da in PyEval_EvalFrameEx (f=0x81ca8fc, throwflag=0)
    at ../Python/ceval.c:1937

I'll look at this again sometime this afternoon, but I'm headed for lunch now.

Jeremy

From guido at python.org  Wed Aug 29 18:52:25 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 29 Aug 2007 09:52:25 -0700
Subject: [Python-3000] [Python-checkins] buildbot failure in S-390
	Debian 3.0
In-Reply-To: <ca471dc20708290620g3ad16f2es5c29d7f465c4e4af@mail.gmail.com>
References: <20070829125131.714341E4002@bag.python.org>
	<46D56E81.1030604@trueblade.com>
	<ca471dc20708290620g3ad16f2es5c29d7f465c4e4af@mail.gmail.com>
Message-ID: <ca471dc20708290952v7211ce49wf94fc140e3c4918b@mail.gmail.com>

Never mind. Amaury pointed out that the code already includes
PY_FORMAT_SIZE_T, but that particular platform doesn't support %zd.
Maybe PY_FORMAT_SIZE_T should be "l" instead on that platform? (As
it's not Windows I'm pretty sure sizeof(long) == sizeof(void*)...)

--Guido

On 8/29/07, Guido van Rossum <guido at python.org> wrote:
> I noticed this failed with a traceback from distutils, caused by a bug
> in an exception handler; which I fixed.
>
> I also noticed that there were a *lot* of warnings like this:
>
> Objects/object.c:193: warning: format '%d' expects type 'int', but
> argument 7 has type 'Py_ssize_t'
>
> These can be fixed using %zd I believe. Any volunteers? This would
> tremendously improve 64-bit quality!
>
> --Guido
>
> On 8/29/07, Eric Smith <eric+python-dev at trueblade.com> wrote:
> > The URL is getting mangled, it should be:
> > http://www.python.org/dev/buildbot/all/S-390%20Debian%203.0/builds/9
> >
> > buildbot at python.org wrote:
> > > The Buildbot has detected a new failure of S-390 Debian 3.0.
> > > Full details are available at:
> > >  http://www.python.org/dev/buildbot/all/S-390%2520Debian%25203.0/builds/9
> > >
> > > Buildbot URL: http://www.python.org/dev/buildbot/all/
> > >
> > > Build Reason:
> > > Build Source Stamp: [branch branches/py3k] HEAD
> > > Blamelist: eric.smith
> > >
> > > BUILD FAILED: failed compile
> > >
> > > sincerely,
> > >  -The Buildbot
> > >
> > > _______________________________________________
> > > Python-checkins mailing list
> > > Python-checkins at python.org
> > > http://mail.python.org/mailman/listinfo/python-checkins
> > >
> >
>
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Wed Aug 29 19:08:44 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 29 Aug 2007 10:08:44 -0700
Subject: [Python-3000] Invalid type for 'u' argument 3
Message-ID: <ca471dc20708291008h647c1c34w30bc194e8e5e6fd2@mail.gmail.com>

On some buildbots I see this failure to build the datetime module:

building 'datetime' extension
gcc -pthread -fPIC -fno-strict-aliasing -g -Wall -Wstrict-prototypes
-I. -I/home2/buildbot/slave/3.0.loewis-linux/build/./Include
-I./Include -I. -I/usr/local/include
-I/home2/buildbot/slave/3.0.loewis-linux/build/Include
-I/home2/buildbot/slave/3.0.loewis-linux/build -c
/home2/buildbot/slave/3.0.loewis-linux/build/Modules/datetimemodule.c
-o build/temp.linux-i686-3.0/home2/buildbot/slave/3.0.loewis-linux/build/Modules/datetimemodule.o
/home2/buildbot/slave/3.0.loewis-linux/build/Modules/datetimemodule.c:
In function 'datetime_strptime':
/home2/buildbot/slave/3.0.loewis-linux/build/Modules/datetimemodule.c:3791:
error: Invalid type for 'u' argument 3

The source line is this:

	if (!PyArg_ParseTuple(args, "uu:strptime", &string, &format))

I think this is relevant, in pyport.h:

#ifdef HAVE_ATTRIBUTE_FORMAT_PARSETUPLE
#define Py_FORMAT_PARSETUPLE(func,p1,p2) __attribute__((format(func,p1,p2)))
#else
#define Py_FORMAT_PARSETUPLE(func,p1,p2)
#endif

But how does this work?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at v.loewis.de  Wed Aug 29 19:17:31 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 29 Aug 2007 19:17:31 +0200
Subject: [Python-3000] [Python-checkins] buildbot failure in
 S-390	Debian 3.0
In-Reply-To: <ca471dc20708290952v7211ce49wf94fc140e3c4918b@mail.gmail.com>
References: <20070829125131.714341E4002@bag.python.org>	<46D56E81.1030604@trueblade.com>	<ca471dc20708290620g3ad16f2es5c29d7f465c4e4af@mail.gmail.com>
	<ca471dc20708290952v7211ce49wf94fc140e3c4918b@mail.gmail.com>
Message-ID: <46D5AA2B.6070409@v.loewis.de>

Guido van Rossum schrieb:
> Never mind. Amaury pointed out that the code already includes
> PY_FORMAT_SIZE_T, but that particular platform doesn't support %zd.
> Maybe PY_FORMAT_SIZE_T should be "l" instead on that platform? (As
> it's not Windows I'm pretty sure sizeof(long) == sizeof(void*)...)

Are you still talking about S/390? I see this from configure:

checking size of int... 4
checking size of long... 4
checking size of void *... 4
checking size of size_t... 4

So:
a) it's not a 64-bit system (it should then be a 31-bit system),
b) Python already would use %ld if sizeof(ssize_t)!=sizeof(int),
   but sizeof(ssize_t)==sizeof(long)
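
As a rough sketch of rule (b), written in Python rather than as the real
preprocessor logic: ssize_t_format() is a hypothetical helper that only
paraphrases the rule, and the "%zd" fallback is my assumption, not
something taken from pyport.h.

```python
import ctypes


def ssize_t_format(sizeof_ssize_t, sizeof_int, sizeof_long):
    """Hypothetical helper: pick a printf format for a ssize_t value."""
    if sizeof_ssize_t == sizeof_int:
        return "%d"     # same width as int, plain %d is enough
    if sizeof_ssize_t == sizeof_long:
        return "%ld"    # differs from int but matches long
    return "%zd"        # assumed fallback, needs C99 printf support


# Sizes reported by configure on the S/390 box: everything is 4 bytes.
s390 = ssize_t_format(4, 4, 4)

# The same decision for whatever machine runs this sketch.
local = ssize_t_format(
    ctypes.sizeof(ctypes.c_ssize_t),
    ctypes.sizeof(ctypes.c_int),
    ctypes.sizeof(ctypes.c_long),
)

print(s390, local)
```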

Not sure why gcc is complaining; according to this change to APR

http://www.mail-archive.com/dev at apr.apache.org/msg18533.html

it might still be that the warning goes away if %ld is used on
S390 (similar to what is done for __APPLE__). Interestingly
enough, they use this code for OSX :-)

+    *apple-darwin*)
+        osver=`uname -r`
+        case $osver in
+           [[0-7]].*)
+              ssize_t_fmt="d"
+              ;;
+           *)
+              ssize_t_fmt="ld"
+              ;;
+        esac

Regards,
Martin

From martin at v.loewis.de  Wed Aug 29 19:24:16 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 29 Aug 2007 19:24:16 +0200
Subject: [Python-3000] Invalid type for 'u' argument 3
In-Reply-To: <ca471dc20708291008h647c1c34w30bc194e8e5e6fd2@mail.gmail.com>
References: <ca471dc20708291008h647c1c34w30bc194e8e5e6fd2@mail.gmail.com>
Message-ID: <46D5ABC0.8040903@v.loewis.de>

> On some buildbots I see this failure to build the datetime module:

See also bugs.python.org/1055.

> error: Invalid type for 'u' argument 3
> 
> The source line is this:
> 
> 	if (!PyArg_ParseTuple(args, "uu:strptime", &string, &format))

and string and format are of type char*; for "u", they should be of
type Py_UNICODE*.

> I think this is relevant, in pyport.h:
> 
> #ifdef HAVE_ATTRIBUTE_FORMAT_PARSETUPLE
> #define Py_FORMAT_PARSETUPLE(func,p1,p2) __attribute__((format(func,p1,p2)))
> #else
> #define Py_FORMAT_PARSETUPLE(func,p1,p2)
> #endif
> 
> But how does this work?

Also consider this:

PyAPI_FUNC(int) PyArg_ParseTuple(PyObject *, const char *, ...)
Py_FORMAT_PARSETUPLE(PyArg_ParseTuple, 2, 3);

Together, they expand to

int PyArg_ParseTuple(PyObject*, const char*, ...)
__attribute__((format(PyArg_ParseTuple, 2, 3)));

It's exactly one buildbot slave (I hope), the one that runs mvlgcc. I
created a patch for GCC to check ParseTuple calls for correctness, and
set up a buildbot slave to use this compiler.

Regards,
Martin


From guido at python.org  Wed Aug 29 19:28:58 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 29 Aug 2007 10:28:58 -0700
Subject: [Python-3000] Invalid type for 'u' argument 3
In-Reply-To: <46D5ABC0.8040903@v.loewis.de>
References: <ca471dc20708291008h647c1c34w30bc194e8e5e6fd2@mail.gmail.com>
	<46D5ABC0.8040903@v.loewis.de>
Message-ID: <ca471dc20708291028u3494c3f5v1de1b843a510a1ef@mail.gmail.com>

Oh wow. I see, very clever and useful. It found a real bug! (Except it
was transparent since these variables were only used to pass to
another "uu" format, canceling out the type.) Fixed. Committed
revision 57665.

--Guido

On 8/29/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > On some buildbots I see this failure to build the datetime module:
>
> See also bugs.python.org/1055.
>
> > error: Invalid type for 'u' argument 3
> >
> > The source line is this:
> >
> >       if (!PyArg_ParseTuple(args, "uu:strptime", &string, &format))
>
> and string and format are of type char*; for "u", they should be of
> type Py_UNICODE*.
>
> > I think this is relevant, in pyport.h:
> >
> > #ifdef HAVE_ATTRIBUTE_FORMAT_PARSETUPLE
> > #define Py_FORMAT_PARSETUPLE(func,p1,p2) __attribute__((format(func,p1,p2)))
> > #else
> > #define Py_FORMAT_PARSETUPLE(func,p1,p2)
> > #endif
> >
> > But how does this work?
>
> Also consider this:
>
> PyAPI_FUNC(int) PyArg_ParseTuple(PyObject *, const char *, ...)
> Py_FORMAT_PARSETUPLE(PyArg_ParseTuple, 2, 3);
>
> Together, they expand to
>
> int PyArg_ParseTuple(PyObject*, const char*, ...)
> __attribute__((format(PyArg_ParseTuple, 2, 3)));
>
> It's exactly one buildbot slave (I hope), the one that runs mvlgcc. I
> created a patch for GCC to check ParseTuple calls for correctness, and
> set up a buildbot slave to use this compiler.
>
> Regards,
> Martin
>
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jeremy at alum.mit.edu  Wed Aug 29 19:31:51 2007
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Wed, 29 Aug 2007 13:31:51 -0400
Subject: [Python-3000] proposal: comparing bytes and str raises TypeError
Message-ID: <e8bf7a530708291031g6a11a585ka01366cfd7d87b9a@mail.gmail.com>

As I was cleaning up the http libraries, I noticed a lot of code that
has comparisons with string literals.  As we change code to return
bytes instead of strings, these comparisons start to fail silently.
When you're lucky, you have a test that catches the failure.  In the
httplib case, there were a couple places where the code got stuck in a
loop, because it was waiting for a socket to return "" before exiting.
 There are lots of places where we are not so lucky.

I made a local change to my bytesobject.c to raise an exception
whenever it is compared to a PyUnicode_Object.  This has caught a
number of real bugs that weren't caught by the test suite.  I think we
should make this the expected behavior for comparisons of bytes and
strings, because users are going to have the same problem and it's
hard to track down without changing the interpreter.

The obvious downside is that you can't have heterogeneous containers
that mix strings and bytes:
>>> L = ["1", b"1"]
>>> "1" in L
True
>>> "2" in L
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't compare str and bytes

But I'm not sure that we actually need to support this case.
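
A minimal sketch of the silent failure in the httplib-style loop. It is
run without the local bytesobject.c change, so bytes == str simply
evaluates to False. The fake recv() and the read_all() helper are made
up for this example, and the iteration cap exists only so the sketch
cannot hang:

```python
def read_all(recv, sentinel, max_reads=10):
    """Collect chunks until recv() returns the sentinel.

    Returns (chunks, finished); finished is False if the sentinel was
    never seen within max_reads calls.
    """
    chunks = []
    for _ in range(max_reads):
        chunk = recv()
        if chunk == sentinel:
            return chunks, True
        chunks.append(chunk)
    return chunks, False


def make_recv():
    """Fake socket read: one line of data, then empty bytes forever."""
    data = [b"HTTP/1.0 200 OK\r\n"]

    def recv():
        return data.pop(0) if data else b""

    return recv


# Waiting for "" never matches b"", so the loop only stops at the cap.
_, finished_with_str = read_all(make_recv(), "")
# Waiting for b"" terminates as intended.
_, finished_with_bytes = read_all(make_recv(), b"")

print(finished_with_str, finished_with_bytes)
```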

Jeremy

From eric+python-dev at trueblade.com  Wed Aug 29 19:34:37 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Wed, 29 Aug 2007 13:34:37 -0400
Subject: [Python-3000] Can a Python object move in memory?
Message-ID: <46D5AE2D.7050005@trueblade.com>

As part of the PEP 3101 stuff, I have an iterator that I've written in 
C.  It has a PyUnicodeObject* in it, which holds the string I'm parsing. 
  I do some parsing, return a result, do some more parsing on the next 
call, etc.  This code is callable from Python code.

I keep Py_UNICODE* pointers into this PyUnicodeObject in my iterator 
object, and I access these pointers on subsequent calls to my next() 
method.  Is this an error?  The more I think about it the more convinced 
I am it's an error.

I can change it to use indexes instead of pointers pretty easily, if I 
need to.

From skip at pobox.com  Wed Aug 29 19:36:43 2007
From: skip at pobox.com (skip at pobox.com)
Date: Wed, 29 Aug 2007 12:36:43 -0500
Subject: [Python-3000] Will Py3K be friendlier to optimization opportunities?
Message-ID: <18133.44715.247285.482372@montanaro.dyndns.org>

At various times in the past Python's highly dynamic nature has gotten in
the way of various optimizations (consider optimizing access to globals
which a number of us have taken cracks at).  I believe Guido has said on
more than one occasion that he could see Python becoming a bit less dynamic
to allow some of these sorts of optimizations (I hope I'm not putting words
into your virtual mouth, Guido).  Another thing that pops up from
time-to-time is the GIL and its impact on multithreaded applications.

Is Python 3 likely to change in any way so as to make future performance
optimization work more fruitful?  I realize that it may be more reasonable
to expect extreme performance gains to come from Python-like systems like
Pyrex or ShedSkin, but it might still be worthwhile to consider what might
be possible after 3.0a1 is released.

Based on the little reading I've done in the PEPs, the changes I've seen
that lean in this direction are:

* from ... import * is no longer supported at function scope
* None, True and False become keywords
* optional function annotations (PEP 3107)
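
A small sketch of the first two items, checked with compile() on a
present-day Python 3 (I have not tried it against the current py3k
branch); is_syntax_error() is a helper made up for this example:

```python
def is_syntax_error(source):
    """Return True if compiling source raises SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return True
    return False


# import * inside a function body is rejected at compile time.
star_in_function = is_syntax_error("def f():\n    from os import *\n")
# At module level it is still accepted (compile() does not run it).
star_at_module = is_syntax_error("from os import *\n")
# None and True are keywords, so they cannot be assignment targets.
assign_none = is_syntax_error("None = 1\n")
assign_true = is_syntax_error("True = 0\n")

print(star_in_function, star_at_module, assign_none, assign_true)
```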

I'm sure there must be other changes which, while not strictly done to
support further optimization, will allow more to be done in some areas.

Is there more than that?

Skip


From martin at v.loewis.de  Wed Aug 29 19:41:08 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 29 Aug 2007 19:41:08 +0200
Subject: [Python-3000] Can a Python object move in memory?
In-Reply-To: <46D5AE2D.7050005@trueblade.com>
References: <46D5AE2D.7050005@trueblade.com>
Message-ID: <46D5AFB4.30205@v.loewis.de>

> I keep Py_UNICODE* pointers into this PyUnicodeObject in my iterator 
> object, and I access these pointers on subsequent calls to my next() 
> method.  Is this an error?  The more I think about it the more convinced 
> I am it's an error.

Because the pointer may change? There is a (silent) promise that for
a given PyUnicodeObject, the Py_UNICODE* will never change.

Regards,
Martin

From eric+python-dev at trueblade.com  Wed Aug 29 19:44:13 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Wed, 29 Aug 2007 13:44:13 -0400
Subject: [Python-3000] Can a Python object move in memory?
In-Reply-To: <46D5AFB4.30205@v.loewis.de>
References: <46D5AE2D.7050005@trueblade.com> <46D5AFB4.30205@v.loewis.de>
Message-ID: <46D5B06D.3040002@trueblade.com>

Martin v. Löwis wrote:
>> I keep Py_UNICODE* pointers into this PyUnicodeObject in my iterator 
>> object, and I access these pointers on subsequent calls to my next() 
>> method.  Is this an error?  The more I think about it the more convinced 
>> I am it's an error.
> 
> Because the pointer may change? There is a (silent) promise that for
> a given PyUnicodeObject, the Py_UNICODE* will never change.

Right, it's the pointer changing that I'm worried about.  Should I not 
bother with changing my code, then?


From martin at v.loewis.de  Wed Aug 29 19:45:33 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 29 Aug 2007 19:45:33 +0200
Subject: [Python-3000] Will Py3K be friendlier to optimization
	opportunities?
In-Reply-To: <18133.44715.247285.482372@montanaro.dyndns.org>
References: <18133.44715.247285.482372@montanaro.dyndns.org>
Message-ID: <46D5B0BD.6010209@v.loewis.de>

> Is Python 3 likely to change in any way so as to make future performance
> optimization work more fruitful?

I think Python 3 is likely what you see in subversion today, plus any
PEPs that have been accepted and not yet implemented (there are only
a few of these). So any higher-reaching goal must wait for Python 4,
Python 5, or Python 6.

In particular, people have repeatedly requested that the GIL be removed
for Python 3. There is nothing remotely resembling a patch implementing
such a feature at the moment, so this won't happen.

Regards,
Martin

From amauryfa at gmail.com  Wed Aug 29 19:51:06 2007
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Wed, 29 Aug 2007 19:51:06 +0200
Subject: [Python-3000] py3k patches for Windows
Message-ID: <e27efe130708291051j32fc0592jb8aa710a8675b2eb@mail.gmail.com>

Hello,

I recently created patches that correct some problems in py3k on Windows.
They are:

- http://bugs.python.org/issue1029 io.StringIO used to transform \n into \r\n.
This problem must be fixed, if you want the stdout comparisons and
doctests to succeed.

- http://bugs.python.org/issue1047 converts PC/subprocess.c to full
Unicode (no more PyString...) and test_subprocess passes without a
change.

- http://bugs.python.org/issue1048 corrects a bogus %zd format used
somewhere by test_float, and prevents a crash...

- http://bugs.python.org/issue1050 prevents test_marshal from crashing
on debug builds where vc8 seems to insert additional items on the
stack: reduce the recursion level.

Would someone want to review (and discuss) them and apply to the branch?
Tonight I plan to have a list of the remaining failing tests (before
the buildbots ;-) )
and maybe propose corrections for some of those...

-- 
Amaury Forgeot d'Arc

From guido at python.org  Wed Aug 29 19:53:28 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 29 Aug 2007 10:53:28 -0700
Subject: [Python-3000] proposal: comparing bytes and str raises TypeError
In-Reply-To: <e8bf7a530708291031g6a11a585ka01366cfd7d87b9a@mail.gmail.com>
References: <e8bf7a530708291031g6a11a585ka01366cfd7d87b9a@mail.gmail.com>
Message-ID: <ca471dc20708291053r222c7f7bp68f230ce4e43fba0@mail.gmail.com>

Thanks! I simply forgot about this. Can you check in the change to
bytesobject.c? We'll deal with the fallout shortly.

On 8/29/07, Jeremy Hylton <jeremy at alum.mit.edu> wrote:
> As I was cleaning up the http libraries, I noticed a lot of code that
> has comparisons with string literals.  As we change code to return
> bytes instead of strings, these comparisons start to fail silently.
> When you're lucky, you have a test that catches the failure.  In the
> httplib case, there were a couple places where the code got stuck in a
> loop, because it was waiting for a socket to return "" before exiting.
>  There are lots of places where we are not so lucky.
>
> I made a local change to my bytesobject.c to raise an exception
> whenever it is compared to a PyUnicode_Object.  This has caught a
> number of real bugs that weren't caught by the test suite.  I think we
> should make this the expected behavior for comparisons of bytes and
> strings, because users are going to have the same problem and it's
> hard to track down without changing the interpreter.
>
> The obvious downside is that you can't have heterogeneous containers
> that mix strings and bytes:
> >>> L = ["1", b"1"]
> >>> "1" in L
> True
> >>> "2" in L
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: can't compare str and bytes
>
> But I'm not sure that we actually need to support this case.
>
> Jeremy
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
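
The silent failure Jeremy describes can be sketched in pure Python. This is
only an illustration, not the bytesobject.c change itself: strict_eq and
drain are hypothetical helpers, and the error message is copied from the
traceback quoted above. The point is that a read loop waiting for an empty
sentinel stops when the types match and fails loudly when they do not.

```python
def strict_eq(a, b):
    """Compare a and b, refusing to mix str and bytes (hypothetical helper)."""
    a_is_bytes = isinstance(a, (bytes, bytearray))
    b_is_bytes = isinstance(b, (bytes, bytearray))
    if (isinstance(a, str) and b_is_bytes) or (a_is_bytes and isinstance(b, str)):
        raise TypeError("can't compare str and bytes")
    return a == b


def drain(chunks, sentinel):
    """Collect chunks until one equals the sentinel, as a socket read loop would."""
    received = []
    for chunk in chunks:
        if strict_eq(chunk, sentinel):
            break
        received.append(chunk)
    return received
```

With a bytes sentinel, drain([b"ab", b"cd", b""], b"") stops at the empty
chunk. With the str sentinel "" it raises TypeError on the first chunk
instead of never matching.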

From martin at v.loewis.de  Wed Aug 29 20:12:46 2007
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Wed, 29 Aug 2007 20:12:46 +0200
Subject: [Python-3000] Can a Python object move in memory?
In-Reply-To: <46D5B06D.3040002@trueblade.com>
References: <46D5AE2D.7050005@trueblade.com> <46D5AFB4.30205@v.loewis.de>
	<46D5B06D.3040002@trueblade.com>
Message-ID: <46D5B71E.3030409@v.loewis.de>

>> Because the pointer may change? There is a (silent) promise that for
>> a given PyUnicodeObject, the Py_UNICODE* will never change.
> 
> Right, it's the pointer changing that I'm worried about.  Should I not
> bother with changing my code, then?

Correct. If you think this promise should be given explicitly in the
documentation, feel free to propose a documentation patch.

Of course, if the underlying (rather, encapsulating) PyObject goes
away, the pointer becomes invalid. IIUC, you have some guarantee
that the unicode object will stay available all the time.

Regards,
Martin


From theller at ctypes.org  Wed Aug 29 20:37:09 2007
From: theller at ctypes.org (Thomas Heller)
Date: Wed, 29 Aug 2007 20:37:09 +0200
Subject: [Python-3000] buildbots
In-Reply-To: <46D57D47.1090709@v.loewis.de>
References: <46D453E9.4020903@ctypes.org>
	<46D45CCA.3050206@v.loewis.de>	<46D462EB.4070600@ctypes.org>
	<46D4721A.2040208@ctypes.org> <46D57D47.1090709@v.loewis.de>
Message-ID: <fb4ecl$t5a$1@sea.gmane.org>

Martin v. Löwis wrote:
>> Do you know if it is possible to configure windows so that debug assertions do NOT
>> display a message box (it is very convenient for interactive testing, but not so
>> for automatic tests)?
> 
> You can use _set_error_mode(_OUT_TO_STDERR) to make assert() go to
> stderr rather than to a message box. You can use
> _CrtSetReportMode(_CRT_ASSERT /* or _CRT_WARN or _CRT_ERROR */,
> _CRTDBG_MODE_FILE) to make _ASSERT() go to a file; you need to
> call _CrtSetReportFile( _CRT_ASSERT, _CRTDBG_FILE_STDERR ) in
> addition to make the file stderr.
> 
> Not sure what window precisely you got, so I can't comment which
> of these (if any) would have made the message go away.

Currently, the debug build of py3k fails in test_os.py with an assertion
in the C library inside the execv call.  This displays a dialog box from the MSVC
Debug Library:

  Debug Assertion Failed!
  Program: c:\svn\py3k\PCBuild\python_d.exe
  File: execv.c
  Line: 44

  Expression: *argvector != NULL

  For information ....
  (Press Retry to debug the application)

  Abort Retry Ignore

The last line shows the labels of the three buttons that are displayed.


If I insert these statements into Modules\posixmodule.c:

	_CrtSetReportMode(_CRT_WARN, _CRTDBG_MODE_FILE);
	_CrtSetReportFile(_CRT_WARN, _CRTDBG_FILE_STDERR);
	_CrtSetReportMode(_CRT_ERROR, _CRTDBG_MODE_FILE);
	_CrtSetReportFile(_CRT_ERROR, _CRTDBG_FILE_STDERR);
	_CrtSetReportMode(_CRT_ASSERT, _CRTDBG_MODE_FILE);
	_CrtSetReportFile(_CRT_ASSERT, _CRTDBG_FILE_STDERR);

	_set_error_mode(_OUT_TO_STDERR);

and recompile and test, then the dialog box looks like this:

  The instruction at "0x10..." referenced memory at "0x00000000".  The memory
  could not be "read".

  Click "OK" to terminate the program.
  Click "Cancel" to debug the program.

      OK Cancel

These messageboxes of course hang the tests on the windows build servers,
so probably it would be good if they could be disabled completely.

Thomas


From jeremy at alum.mit.edu  Wed Aug 29 21:04:48 2007
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Wed, 29 Aug 2007 15:04:48 -0400
Subject: [Python-3000] ctype crashes
In-Reply-To: <e8bf7a530708290949m2cffe48dm13bf94f0d931e37e@mail.gmail.com>
References: <e8bf7a530708290949m2cffe48dm13bf94f0d931e37e@mail.gmail.com>
Message-ID: <e8bf7a530708291204t252dd9e4g161554c4ac28492a@mail.gmail.com>

Never mind.  This was an optimized build.  When I did a make clean and
rebuilt, it went away.

Jeremy

On 8/29/07, Jeremy Hylton <jeremy at alum.mit.edu> wrote:
> I'm seeing a bunch of C extensions crash on my box.  I'm uncertain
> about a few issues, but I think I'm running a 32-bit binary on a 64-bit
> linux box.  The crash I see in ctypes is the following:
>
> #0  0x080a483e in PyUnicodeUCS2_FromString (u=0x5 <Address 0x5 out of bounds>)
>     at ../Objects/unicodeobject.c:471
> #1  0xf7cd4f8e in z_get (ptr=0x0, size=4)
>     at /usr/local/google/home/jhylton/python/py3k/Modules/_ctypes/cfield.c:1380
> #2  0xf7ccdbb5 in Simple_get_value (self=0xf7ba8a04)
>     at /usr/local/google/home/jhylton/python/py3k/Modules/_ctypes/_ctypes.c:3976
> #3  0x0807f218 in PyObject_GenericGetAttr (obj=0xf7ba8a04, name=0xf7e26ea0)
>     at ../Objects/object.c:1098
> #4  0x080b63da in PyEval_EvalFrameEx (f=0x81ca8fc, throwflag=0)
>     at ../Python/ceval.c:1937
>
> I'll look at this again sometime this afternoon, but I'm headed for lunch now.
>
> Jeremy
>

From skip at pobox.com  Wed Aug 29 21:21:25 2007
From: skip at pobox.com (skip at pobox.com)
Date: Wed, 29 Aug 2007 14:21:25 -0500
Subject: [Python-3000] Will Py3K be friendlier to optimization
 opportunities?
In-Reply-To: <46D5B0BD.6010209@v.loewis.de>
References: <18133.44715.247285.482372@montanaro.dyndns.org>
	<46D5B0BD.6010209@v.loewis.de>
Message-ID: <18133.50997.687840.234611@montanaro.dyndns.org>


    >> Is Python 3 likely to change in any way so as to make future
    >> performance optimization work more fruitful?

    Martin> In particular, people have repeatedly requested that the GIL be
    Martin> removed for Python 3. There is nothing remotely resembling a
    Martin> patch implementing such a feature at the moment, so this won't
    Martin> happen.

I certainly wasn't expecting something to be available for review now or in
the near future.  I was actually mostly thinking about language syntax and
semantics when I started writing that email.  I think those are more likely
to be frozen early on in the 3.0 development cycle.  I seem to recall some
message(s) on python-dev a long time ago about maybe restricting outside
modification of a module's globals, e.g.:

    import a
    a.x = 1     # proposed as an error?

which would have allowed easier optimization of global access.

Skip

From barry at python.org  Wed Aug 29 21:33:02 2007
From: barry at python.org (Barry Warsaw)
Date: Wed, 29 Aug 2007 15:33:02 -0400
Subject: [Python-3000] proposal: comparing bytes and str raises TypeError
In-Reply-To: <e8bf7a530708291031g6a11a585ka01366cfd7d87b9a@mail.gmail.com>
References: <e8bf7a530708291031g6a11a585ka01366cfd7d87b9a@mail.gmail.com>
Message-ID: <F7AAB808-0A62-4349-A1FB-646081D598AC@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 29, 2007, at 1:31 PM, Jeremy Hylton wrote:

> I made a local change to my bytesobject.c to raise an exception
> whenever it is compared to a PyUnicode_Object.

+1.  I hit several silent errors in the email package because of  
this.  A TypeError would have been very helpful!

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRtXJ7nEjvBPtnXfVAQLFAwP+MuHX4glrmiapgxdpF9jYxXdvEZ7Bt0sn
VPq0KRgwj/t97CyqA15d2oo/ojkiZagk3erCKfVT8LQUHb73P9334gEVVWt6bIQn
2Cz8S40WhpOysr0FyLYbdhoPKTx4XihK1cmOZJ/Odv2G8SEjaKQfHlY5qAeHwAV7
M85o+Rc5U8o=
=cT6d
-----END PGP SIGNATURE-----

From barry at python.org  Wed Aug 29 22:03:31 2007
From: barry at python.org (Barry Warsaw)
Date: Wed, 29 Aug 2007 16:03:31 -0400
Subject: [Python-3000] Does bytes() need to support bytes(<str>,
	<encoding>)?
In-Reply-To: <ca471dc20708280821x3d6eef54l648a901379fa60db@mail.gmail.com>
References: <ca471dc20708271338v68c51b53sd66a549e64a114af@mail.gmail.com>
	<06E2D54D-1F9A-4B66-ACE8-692A6BF93CA6@python.org>
	<ca471dc20708272020x2301e3f3tcd59c773a05f3c@mail.gmail.com>
	<7A1473AB-611F-4D53-82EB-E9682F2741CD@python.org>
	<ca471dc20708280821x3d6eef54l648a901379fa60db@mail.gmail.com>
Message-ID: <1ACC173E-77E7-430B-9781-71862A81202E@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 28, 2007, at 11:21 AM, Guido van Rossum wrote:

>> Nope.  So what would bytes(s) do?
>
> Raise TypeError (when s is a str). The argument to bytes() must be
> either an int (then it creates a zero-filled bytes array of that
> length) or an iterable of ints (then it creates a bytes array
> initialized with those ints -- if any int is out of range, an
> exception is raised, and also if any value is not an int).

+1
- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRtXRFHEjvBPtnXfVAQLY+gP9HsP7Va5ZNdBLEO/yeOU+AQwmjyR+ei4Y
KqRK6PNV+7dOGUPeExgfvZhmKoPmu11Q6EYQMFcCFN1/2xb/OooQYaSrT4nI6P3J
eNxfmYrUu4H49myygC1IezswJWuestJi3KLawS8MFdLUqphQloH5QfZLBQsRIV8/
m2x3CVXfOrY=
=bO41
-----END PGP SIGNATURE-----
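
The constructor rules Guido lists in the quoted text can be checked with a
short script. This is a sketch run against the built-in bytes of a released
Python 3 interpreter, which is an assumption: the bytes type under discussion
in this thread was still in flux. The show() helper is hypothetical and
exists only to turn exceptions into their names.

```python
def show(arg):
    """Return bytes(arg), or the name of the exception it raises."""
    try:
        return bytes(arg)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__


zero_filled = show(3)            # an int: zero-filled array of that length
from_ints = show([65, 66, 67])   # an iterable of ints in range(256)
out_of_range = show([256])       # an int outside the byte range
not_an_int = show(["a"])         # an iterable holding a non-int
from_str = show("abc")           # a str with no encoding given

print(zero_filled, from_ints, out_of_range, not_an_int, from_str)
```

The last three calls are rejected: the out-of-range value with ValueError,
the other two with TypeError.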

From martin at v.loewis.de  Wed Aug 29 22:08:12 2007
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Wed, 29 Aug 2007 22:08:12 +0200
Subject: [Python-3000] buildbots
In-Reply-To: <fb4ecl$t5a$1@sea.gmane.org>
References: <46D453E9.4020903@ctypes.org>	<46D45CCA.3050206@v.loewis.de>	<46D462EB.4070600@ctypes.org>	<46D4721A.2040208@ctypes.org>
	<46D57D47.1090709@v.loewis.de> <fb4ecl$t5a$1@sea.gmane.org>
Message-ID: <46D5D22C.3010003@v.loewis.de>

> If I insert these statements into Modules\posixmodule.c:
> 
> 	_CrtSetReportMode(_CRT_WARN, _CRTDBG_MODE_FILE);
> 	_CrtSetReportFile(_CRT_WARN, _CRTDBG_FILE_STDERR);
> 	_CrtSetReportMode(_CRT_ERROR, _CRTDBG_MODE_FILE);
> 	_CrtSetReportFile(_CRT_ERROR, _CRTDBG_FILE_STDERR);
> 	_CrtSetReportMode(_CRT_ASSERT, _CRTDBG_MODE_FILE);
> 	_CrtSetReportFile(_CRT_ASSERT, _CRTDBG_FILE_STDERR);
> 
> 	_set_error_mode(_OUT_TO_STDERR);
> 
> and recompile and test then the dialog box looks like this:

Do you get an output to stderr before that next dialog box?

>   The instruction at "0x10..." referenced memory at "0x00000000".  The memory
>   could not be "read".
> 
>   Click "OK" to terminate the program.
>   Click "Cancel" to debug the program.

That is not from the C library, but from the operating system.
Apparently, the CRT continues after giving out the assertion
failure. To work around that, it would be possible to install
a report hook (using _CrtSetReportHook(2)). This hook would
output the error message, and then TerminateProcess.

> These messageboxes of course hang the tests on the windows build servers,
> so probably it would be good if they could be disabled completely.

I think this will be very difficult to achieve.

Regards,
Martin

From guido at python.org  Wed Aug 29 22:19:17 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 29 Aug 2007 13:19:17 -0700
Subject: [Python-3000] py3k patches for Windows
In-Reply-To: <e27efe130708291051j32fc0592jb8aa710a8675b2eb@mail.gmail.com>
References: <e27efe130708291051j32fc0592jb8aa710a8675b2eb@mail.gmail.com>
Message-ID: <ca471dc20708291319n54a7859r4ceed9fe9d9c8b1@mail.gmail.com>

I've checked all those in except the change that prevents closing fds
0, 1, 2. Watch the buildbots, I have no Windows access myself.

On 8/29/07, Amaury Forgeot d'Arc <amauryfa at gmail.com> wrote:
> Hello,
>
> I recently created patches that correct some problems in py3k on Windows.
> They are:
>
> - http://bugs.python.org/issue1029 io.StringIO used to transform \n into \r\n.
> This problem must be fixed, if you want the stdout comparisons and
> doctests to succeed.
>
> - http://bugs.python.org/issue1047 converts PC/subprocess.c to full
> Unicode (no more PyString...) and test_subprocess passes without a
> change.
>
> - http://bugs.python.org/issue1048 corrects a bogus %zd format used
> somewhere by test_float, and prevents a crash...
>
> - http://bugs.python.org/issue1050 prevents test_marshal from crashing
> on debug builds where vc8 seems to insert additional items on the
> stack: reduce the recursion level.
>
> Would someone want to review (and discuss) them and apply to the branch?
> Tonight I plan to have a list of the remaining failing tests (before
> the buildbots ;-) )
> and maybe propose corrections for some of those...
>
> --
> Amaury Forgeot d'Arc


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From db3l.net at gmail.com  Wed Aug 29 22:34:13 2007
From: db3l.net at gmail.com (David Bolen)
Date: Wed, 29 Aug 2007 16:34:13 -0400
Subject: [Python-3000] buildbots
References: <46D453E9.4020903@ctypes.org> <46D45CCA.3050206@v.loewis.de>
	<46D462EB.4070600@ctypes.org> <46D4721A.2040208@ctypes.org>
	<46D57D47.1090709@v.loewis.de> <fb4ecl$t5a$1@sea.gmane.org>
	<46D5D22C.3010003@v.loewis.de>
Message-ID: <m2wsvef5bu.fsf@valheru.db3l.homeip.net>

"Martin v. Löwis" <martin at v.loewis.de> writes:

>> These messageboxes of course hang the tests on the windows build servers,
>> so probably it would be good if they could be disabled completely.
>
> I think this will be very difficult to achieve.

Could the tests be run beneath a shim process that used SetErrorMode()
to disable all the OS-based process failure dialog boxes?  If I
remember correctly the error mode is inherited, so an independent
small exec module could reset the mode, and execute the normal test
sequence as a child process.
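
A minimal sketch of such a shim, assuming it is written in Python with ctypes
rather than as a separate small executable. The function name
run_without_error_dialogs is hypothetical. The SEM_* values are the documented
Win32 error-mode flags, and the inheritance relies on the child not being
created with CREATE_DEFAULT_ERROR_MODE.

```python
import subprocess
import sys

# Win32 error-mode flags accepted by SetErrorMode().
SEM_FAILCRITICALERRORS = 0x0001
SEM_NOGPFAULTERRORBOX = 0x0002
SEM_NOOPENFILEERRORBOX = 0x8000


def run_without_error_dialogs(argv):
    """Set the error mode in this process, then run argv as a child.

    Returns the child's exit code, which is how the parent notices
    that the child died instead of showing a dialog box.
    """
    if sys.platform == "win32":
        import ctypes
        ctypes.windll.kernel32.SetErrorMode(
            SEM_FAILCRITICALERRORS
            | SEM_NOGPFAULTERRORBOX
            | SEM_NOOPENFILEERRORBOX
        )
    # On other platforms there is no error mode; just run the child.
    return subprocess.call(argv)
```

A buildbot step could then wrap its usual test command in
run_without_error_dialogs([...]) and treat a nonzero result as a failure.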

-- David


From db3l.net at gmail.com  Wed Aug 29 22:56:52 2007
From: db3l.net at gmail.com (David Bolen)
Date: Wed, 29 Aug 2007 16:56:52 -0400
Subject: [Python-3000] buildbots
References: <46D453E9.4020903@ctypes.org> <46D45CCA.3050206@v.loewis.de>
	<46D462EB.4070600@ctypes.org> <46D4721A.2040208@ctypes.org>
	<46D57D47.1090709@v.loewis.de> <fb4ecl$t5a$1@sea.gmane.org>
	<46D5D22C.3010003@v.loewis.de>
	<m2wsvef5bu.fsf@valheru.db3l.homeip.net>
Message-ID: <m2sl62f4a3.fsf@valheru.db3l.homeip.net>

David Bolen <db3l.net at gmail.com> writes:

> "Martin v. Löwis" <martin at v.loewis.de> writes:
>
>>> These messageboxes of course hang the tests on the windows build servers,
>>> so probably it would be good if they could be disabled completely.
>>
>> I think this will be very difficult to achieve.
>
> Could the tests be run beneath a shim process that used SetErrorMode()
> to disable all the OS-based process failure dialog boxes?  If I
> remember correctly the error mode is inherited, so an independent
> small exec module could reset the mode, and execute the normal test
> sequence as a child process.

Or if using ctypes is ok, perhaps it could be done right in the test
runner.

While I haven't done any local mods to prevent the C RTL boxes,
selecting Ignore on them gets me to the OS level box, and:

Python 3.0x (py3k, Aug 27 2007, 22:44:06) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import test_os
[50256 refs]
>>> test_os.test_main()

dies with popup in test_execvpe_with_bad_program (test_os.ExecTests).  But

Python 3.0x (py3k, Aug 27 2007, 22:44:06) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import ctypes
[39344 refs]
>>> ctypes.windll.kernel32.SetErrorMode(7)
0
[40694 refs]
>>> import test_os
[55893 refs]
>>> test_os.test_main()

doesn't present the OS popup prior to process exit.
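
The same call could be wrapped so that a test runner restores the previous
mode afterwards. This is only a sketch: quiet_error_mode is a hypothetical
name, and nothing like it is claimed to exist in regrtest. The value 7 used
in the session above is the OR of the three flags named here.

```python
import contextlib
import sys

# Win32 error-mode flags; together they make up the 7 passed above.
SEM_FAILCRITICALERRORS = 0x0001
SEM_NOGPFAULTERRORBOX = 0x0002
SEM_NOALIGNMENTFAULTEXCEPT = 0x0004
QUIET_MODE = (SEM_FAILCRITICALERRORS
              | SEM_NOGPFAULTERRORBOX
              | SEM_NOALIGNMENTFAULTEXCEPT)


@contextlib.contextmanager
def quiet_error_mode(mode=QUIET_MODE):
    """Suppress OS failure dialogs inside the with block (Windows only)."""
    if sys.platform != "win32":
        yield None  # nothing to change on other platforms
        return
    import ctypes
    kernel32 = ctypes.windll.kernel32
    previous = kernel32.SetErrorMode(mode)  # returns the old mode
    try:
        yield previous
    finally:
        kernel32.SetErrorMode(previous)
```

Running the tests inside "with quiet_error_mode():" would then behave like
the second session, without leaving the mode changed for later code.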

-- David


From collinw at gmail.com  Wed Aug 29 23:02:39 2007
From: collinw at gmail.com (Collin Winter)
Date: Wed, 29 Aug 2007 14:02:39 -0700
Subject: [Python-3000] Will Py3K be friendlier to optimization
	opportunities?
In-Reply-To: <18133.50997.687840.234611@montanaro.dyndns.org>
References: <18133.44715.247285.482372@montanaro.dyndns.org>
	<46D5B0BD.6010209@v.loewis.de>
	<18133.50997.687840.234611@montanaro.dyndns.org>
Message-ID: <43aa6ff70708291402m29205719of476e8e1d4e7c965@mail.gmail.com>

On 8/29/07, skip at pobox.com <skip at pobox.com> wrote:
[snip]
> I certainly wasn't expecting something to be available for review now or in
> the near future.  I was actually mostly thinking about language syntax and
> semantics when I started writing that email.  I think those are more likely
> to be frozen early on in the 3.0 development cycle.  I seem to recall some
> message(s) on python-dev a long time ago about maybe restricting outside
> modification of a module's globals, e.g.:
>
>     import a
>     a.x = 1     # proposed as an error?
>
> which would have allowed easier optimization of global access.

When thinking about these kinds of optimizations and restrictions,
keep in mind their effect on testing. For example, I work on code that
makes use of the ability to tinker with another module's view of
os.path in order to simulate error conditions that would otherwise be
hard to test. If you wanted to hide this kind of restriction behind an
-O flag, that would be one thing, but having it on by default seems
like a bad idea.
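
A minimal sketch of the kind of test this refers to, assuming the code under
test looks up os.path.exists at call time. The names config_exists,
call_with_patched_exists and always_missing are hypothetical. The helper
assigns to an attribute of another module, which is exactly the operation the
proposed restriction would forbid.

```python
import os.path


def config_exists(path):
    """Code under test: consults os.path.exists when called."""
    return os.path.exists(path)


def call_with_patched_exists(func, fake_exists, *args):
    """Call func(*args) while os.path.exists is replaced by fake_exists."""
    original = os.path.exists
    os.path.exists = fake_exists      # modify another module's globals
    try:
        return func(*args)
    finally:
        os.path.exists = original     # always restore the real function


def always_missing(path):
    """Simulate a file system on which nothing exists."""
    return False
```

Here call_with_patched_exists(config_exists, always_missing, os.curdir)
reports the current directory as missing, while a plain
config_exists(os.curdir) afterwards sees the real file system again.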

Collin Winter

From martin at v.loewis.de  Thu Aug 30 00:15:53 2007
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Thu, 30 Aug 2007 00:15:53 +0200
Subject: [Python-3000] buildbots
In-Reply-To: <m2wsvef5bu.fsf@valheru.db3l.homeip.net>
References: <46D453E9.4020903@ctypes.org>
	<46D45CCA.3050206@v.loewis.de>	<46D462EB.4070600@ctypes.org>
	<46D4721A.2040208@ctypes.org>	<46D57D47.1090709@v.loewis.de>
	<fb4ecl$t5a$1@sea.gmane.org>	<46D5D22C.3010003@v.loewis.de>
	<m2wsvef5bu.fsf@valheru.db3l.homeip.net>
Message-ID: <46D5F019.9090706@v.loewis.de>

>>> These messageboxes of course hang the tests on the windows build servers,
>>> so probably it would be good if they could be disabled completely.
>> I think this will be very difficult to achieve.
> 
> Could the tests be run beneath a shim process that used SetErrorMode()
> to disable all the OS-based process failure dialog boxes?

I did not know about that - it may help.

> If I
> remember correctly the error mode is inherited, so an independent
> small exec module could reset the mode, and execute the normal test
> sequence as a child process.

It would also be possible to put that into the interpreter itself,
at least when running in debug mode.

What does

"Instead, the system sends the error to the calling process."

mean?

Regards,
Martin

From db3l.net at gmail.com  Thu Aug 30 00:33:09 2007
From: db3l.net at gmail.com (David Bolen)
Date: Wed, 29 Aug 2007 18:33:09 -0400
Subject: [Python-3000] buildbots
In-Reply-To: <46D5F019.9090706@v.loewis.de>
References: <46D453E9.4020903@ctypes.org> <46D45CCA.3050206@v.loewis.de>
	<46D462EB.4070600@ctypes.org> <46D4721A.2040208@ctypes.org>
	<46D57D47.1090709@v.loewis.de> <fb4ecl$t5a$1@sea.gmane.org>
	<46D5D22C.3010003@v.loewis.de>
	<m2wsvef5bu.fsf@valheru.db3l.homeip.net>
	<46D5F019.9090706@v.loewis.de>
Message-ID: <9f94e2360708291533h139b3d37y2d9aef068228579c@mail.gmail.com>

On 8/29/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > If I
> > remember correctly the error mode is inherited, so an independent
> > small exec module could reset the mode, and execute the normal test
> > sequence as a child process.
>
> It would also be possible to put that into the interpreter itself,
> at least when running in debug mode.

Yep, although you might want to choose whether or not to do it in
interactive mode I suppose.  Or, as in my subsequent message, it could
just be incorporated into the test runner (such as in regrtest.py).

> What does
>
> "Instead, the system sends the error to the calling process."
>
> mean?

It's somewhat dependent on the type of problem that was going to lead
to the dialog box.  For a catastrophic failure (e.g., GPF), such as
those that only provide OK (or perhaps Cancel for debug) in the
dialog, the process is still going to be abruptly terminated, as if OK
was pressed with no further execution of code within the process
itself.  A parent process can detect this based on the exit code of
the subprocess.

For other less critical failures (like the box that pops up when
trying to open a file on a removable device that isn't present), an
error is simply returned to the calling process as the result of the
original system call that triggered the failure - just like any other
failing I/O operation, and equivalent I believe to hitting Cancel on
the dialog that would otherwise have popped up.

-- David

From amauryfa at gmail.com  Thu Aug 30 00:43:02 2007
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Thu, 30 Aug 2007 00:43:02 +0200
Subject: [Python-3000] py3k patches for Windows
In-Reply-To: <ca471dc20708291319n54a7859r4ceed9fe9d9c8b1@mail.gmail.com>
References: <e27efe130708291051j32fc0592jb8aa710a8675b2eb@mail.gmail.com>
	<ca471dc20708291319n54a7859r4ceed9fe9d9c8b1@mail.gmail.com>
Message-ID: <e27efe130708291543m52727791wf44146b5ccc63236@mail.gmail.com>

2007/8/29, Guido van Rossum <guido at python.org>:
> I've checked all those in except the change that prevents closing fds
> 0, 1, 2. Watch the buildbots, I have no Windows access myself.

Oops, the 0,1,2 fds stuff was not supposed to leave my workspace. I
agree that it should be done differently.

Note that there is currently only one Windows buildbot to watch - and
it is a win64. Most tests that fail there pass on my machine.
Nevertheless, the log is much smaller than before.

I see additional errors like "can't use str as char buffer". I suppose
it is because of the recent stricter distinction between bytes and
str.

Thanks for all,

-- 
Amaury Forgeot d'Arc

From db3l.net at gmail.com  Thu Aug 30 00:58:23 2007
From: db3l.net at gmail.com (David Bolen)
Date: Wed, 29 Aug 2007 18:58:23 -0400
Subject: [Python-3000] py3k patches for Windows
References: <e27efe130708291051j32fc0592jb8aa710a8675b2eb@mail.gmail.com>
	<ca471dc20708291319n54a7859r4ceed9fe9d9c8b1@mail.gmail.com>
	<e27efe130708291543m52727791wf44146b5ccc63236@mail.gmail.com>
Message-ID: <m2lkbueynk.fsf@valheru.db3l.homeip.net>

"Amaury Forgeot d'Arc" <amauryfa at gmail.com> writes:

> Note that there is currently only one Windows buildbot to watch - and
> it is a win64. Most tests that fail there pass on my machine.
> Nevertheless, the log is much smaller than before.

There's an offer of mine to host an additional Windows (win32)
buildbot, for whatever versions are helpful, in the moderator queue
for python-dev.  Although in looking at the current 3.0 buildbot
status page there would seem to be two others already running, so I
wasn't sure if another was really needed.

-- David


From guido at python.org  Thu Aug 30 01:06:40 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 29 Aug 2007 16:06:40 -0700
Subject: [Python-3000] py3k patches for Windows
In-Reply-To: <e27efe130708291543m52727791wf44146b5ccc63236@mail.gmail.com>
References: <e27efe130708291051j32fc0592jb8aa710a8675b2eb@mail.gmail.com>
	<ca471dc20708291319n54a7859r4ceed9fe9d9c8b1@mail.gmail.com>
	<e27efe130708291543m52727791wf44146b5ccc63236@mail.gmail.com>
Message-ID: <ca471dc20708291606l6d380996y12172df5b194466b@mail.gmail.com>

On 8/29/07, Amaury Forgeot d'Arc <amauryfa at gmail.com> wrote:
> I see additional errors like "can't use str as char buffer". I suppose
> it is because of the recent stricter distinction between bytes and
> str.

Indeed. It typically means that something is using
PyObject_AsCharBuffer() instead of PyUnicode_AsStringAndSize(). Please
fix occurrences you find and upload patches.

It seems that any use of the 't#' format for PyArg_ParseTuple() also
triggers this error. I don't understand the code for that format; I've
been successful in the short term by changing these to 's#' but I
don't think that's the correct solution...

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From nnorwitz at gmail.com  Thu Aug 30 01:08:50 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Wed, 29 Aug 2007 16:08:50 -0700
Subject: [Python-3000] py3k patches for Windows
In-Reply-To: <m2lkbueynk.fsf@valheru.db3l.homeip.net>
References: <e27efe130708291051j32fc0592jb8aa710a8675b2eb@mail.gmail.com>
	<ca471dc20708291319n54a7859r4ceed9fe9d9c8b1@mail.gmail.com>
	<e27efe130708291543m52727791wf44146b5ccc63236@mail.gmail.com>
	<m2lkbueynk.fsf@valheru.db3l.homeip.net>
Message-ID: <ee2a432c0708291608u1dd1b4dck19a09a8d4fd8ccdc@mail.gmail.com>

On 8/29/07, David Bolen <db3l.net at gmail.com> wrote:
>
> There's an offer of mine to host an additional Windows (win32)
> buildbot, for whatever versions are helpful, in the moderator queue
> for python-dev.

Hmm, that's odd.

> Although in looking at the current 3.0 buildbot
> status page there would seem to be two others already running, so I
> wasn't sure if another was really needed.

I think it would help.  We have 4 windows buildbots IIRC, but 2 are
offline and there are various problems.  Given all the slight
variations of Windows, I think it would be good to get another Windows
bot.

Note that a bot for one branch is a bot for all branches.  Only one
branch will be run at a time though.

Contact me offline with some of the details and I will try to get this
set up tonight.  Let me know what timezone you are in too.

Thanks!
n

From guido at python.org  Thu Aug 30 01:18:46 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 29 Aug 2007 16:18:46 -0700
Subject: [Python-3000] Need Windows build instructions
Message-ID: <ca471dc20708291618v523af477uec4bcee59cae14fd@mail.gmail.com>

Can someone familiar with building Py3k on Windows add a section on
how to build it on Windows to the new README? (Do a svn up first; I've
rewritten most of it.)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim.peters at gmail.com  Thu Aug 30 01:25:36 2007
From: tim.peters at gmail.com (Tim Peters)
Date: Wed, 29 Aug 2007 19:25:36 -0400
Subject: [Python-3000] py3k patches for Windows
In-Reply-To: <ee2a432c0708291608u1dd1b4dck19a09a8d4fd8ccdc@mail.gmail.com>
References: <e27efe130708291051j32fc0592jb8aa710a8675b2eb@mail.gmail.com>
	<ca471dc20708291319n54a7859r4ceed9fe9d9c8b1@mail.gmail.com>
	<e27efe130708291543m52727791wf44146b5ccc63236@mail.gmail.com>
	<m2lkbueynk.fsf@valheru.db3l.homeip.net>
	<ee2a432c0708291608u1dd1b4dck19a09a8d4fd8ccdc@mail.gmail.com>
Message-ID: <1f7befae0708291625m79f1db4fyf3c3eb55e3b9f283@mail.gmail.com>

[Neal Norwitz]
> ...
> We have 4 windows buildbots IIRC, but 2 are offline and there are
> various problems.  Given all the slight variations of Windows, I
> think it would be good to get another Windows bot.

FYI, my bot is offline now just because I'm having major HW problems
(endless spontaneous reboots, possibly due to overheating).  I'm
paying a fellow who knows more about PC HW to come over this weekend,
and I hope that will resolve it.  If not, I'll just make the PSF buy
me a new machine ;-)  One way or another, I should get a bot back
online within the next couple weeks.

From nnorwitz at gmail.com  Thu Aug 30 01:28:36 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Wed, 29 Aug 2007 16:28:36 -0700
Subject: [Python-3000] py3k patches for Windows
In-Reply-To: <1f7befae0708291625m79f1db4fyf3c3eb55e3b9f283@mail.gmail.com>
References: <e27efe130708291051j32fc0592jb8aa710a8675b2eb@mail.gmail.com>
	<ca471dc20708291319n54a7859r4ceed9fe9d9c8b1@mail.gmail.com>
	<e27efe130708291543m52727791wf44146b5ccc63236@mail.gmail.com>
	<m2lkbueynk.fsf@valheru.db3l.homeip.net>
	<ee2a432c0708291608u1dd1b4dck19a09a8d4fd8ccdc@mail.gmail.com>
	<1f7befae0708291625m79f1db4fyf3c3eb55e3b9f283@mail.gmail.com>
Message-ID: <ee2a432c0708291628j566d645dq9268d11350115cf0@mail.gmail.com>

On 8/29/07, Tim Peters <tim.peters at gmail.com> wrote:
> [Neal Norwitz]
> > ...
> > We have 4 windows buildbots IIRC, but 2 are offline and there are
> > various problems.  Given all the slight variations of Windows, I
> > think it would be good to get another Windows bot.
>
> FYI, my bot is offline now just because I'm having major HW problems
> (endless spontaneous reboots, possibly due to overheating).  I'm
> paying a fellow who knows more about PC HW to come over this weekend,
> and I hope that will resolve it.  If not, I'll just make the PSF buy
> me a new machine ;-)  One way or another, I should get a bot back
> online within the next couple weeks.

Do you want two machines?  It's long past time for the PSF to buy you
a machine.  Getting you a new, faster machine would be great for the
PSF.  I still have a checkbook; how much do you need? :-)

n

From greg at krypto.org  Thu Aug 30 01:47:28 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Wed, 29 Aug 2007 16:47:28 -0700
Subject: [Python-3000] patch: bytes object PyBUF_LOCKDATA read-only and
	immutable support
Message-ID: <20070829234728.GV24059@electricrain.com>

Attached is what I've come up with so far.  Only a single field is
added to the PyBytesObject struct.  This adds support to the bytes
object for PyBUF_LOCKDATA buffer API operation.  bytes objects can be
marked temporarily read-only for use while the buffer API has handed
them off to something which may run without the GIL (think IO).  Any
attempt to modify them during that time will raise an exception as I
believe Martin suggested earlier.

As an added bonus, because it's been discussed here, support for setting
a bytes object immutable has been added, since it's pretty trivial once
the read-only export support was in place.  That's not required but was
trivial to include.
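
For comparison only: present-day CPython's bytearray has a related
guard, refusing to resize while a buffer export such as a memoryview
is outstanding.  The sketch below shows that existing behaviour, not
this patch; the variable names are made up for illustration.

```python
# Illustration only: bytearray and memoryview from present-day CPython,
# not the PyBUF_LOCKDATA patch described in this message.
buf = bytearray(b"hello")
view = memoryview(buf)        # an outstanding buffer export

try:
    buf.extend(b" world")     # resizing while exported is refused
except BufferError as exc:
    resize_error = exc
else:
    resize_error = None

view.release()                # drop the export
buf.extend(b" world")         # now the resize goes through
```

Current CPython only blocks resizing during an export, whereas the
patch described here aims to reject every modification while a
PyBUF_LOCKDATA export is held.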

I'd appreciate any feedback.

My TODO list for this patch:

 0. Get feedback and make adjustments as necessary.

 1. Deciding between PyBUF_SIMPLE and PyBUF_WRITEABLE for the internal
    uses of the _getbuffer() function.  bytesobject.c contains both read-only
    and read-write uses of the buffers; I'll add a boolean parameter for
    that.

 2. More testing: a few tests in the test suite fail after this but the
    number was low and I haven't had time to look at why or what the
    failures were.

 3. Exporting methods suggested in the TODO at the top of the file.

 4. Unit tests for all of the functionality this adds.

NOTE: after these changes I had to run make clean and rm -rf build before
imports stopped segfaulting.  Without that, I suspect some things
(modules?) were not properly recompiled after the bytesobject.h struct
change.

-gps

-------------- next part --------------
Index: Include/bytesobject.h
===================================================================
--- Include/bytesobject.h	(revision 57679)
+++ Include/bytesobject.h	(working copy)
@@ -17,17 +17,18 @@
  * For the convenience of C programmers, the bytes type is considered
  * to contain a char pointer, not an unsigned char pointer.
  */
 
 /* Object layout */
 typedef struct {
     PyObject_VAR_HEAD
     /* XXX(nnorwitz): should ob_exports be Py_ssize_t? */
-    int ob_exports; /* how many buffer exports */
+    int ob_exports; /* How many buffer exports */
+    int ob_readonly_exports; /* How many buffer exports as readonly */
     Py_ssize_t ob_alloc; /* How many bytes allocated */
     char *ob_bytes;
 } PyBytesObject;
 
 /* Type object */
 PyAPI_DATA(PyTypeObject) PyBytes_Type;
 
 /* Type check macros */
Index: Objects/bytesobject.c
===================================================================
--- Objects/bytesobject.c	(revision 57679)
+++ Objects/bytesobject.c	(working copy)
@@ -1,16 +1,156 @@
 /* Bytes object implementation */
 
 /* XXX TO DO: optimizations */
 
 #define PY_SSIZE_T_CLEAN
 #include "Python.h"
 #include "structmember.h"
 
+/*
+ * Constants for use with the PyBytesObject.ob_readonly_exports.
+ */
+#define IMMUTABLE               (INT_MAX)
+#define MAX_READONLY_EXPORTS    (INT_MAX-1)
+
+/* 
+ * Should we bounds check PyBytesObject.ob_exports and
+ * ob_readonly_exports when we increment them?
+ */
+#if MAX_READONLY_EXPORTS <= USHRT_MAX
+#define BOUNDS_CHECK_EXPORTS 1
+#else
+#undef BOUNDS_CHECK_EXPORTS
+#endif
+
+/*
+ * XXX(gps): I added support for immutability because it was a trivial
+ * addition to the work I was already doing to add PyBUF_LOCKDATA
+ * support to bytes objects.  It isn't required but is included as an
+ * example to decide if it should stay.
+ *
+ * TODO(gps) Do we want to provide an exported interface for any of 
+ * these inlines for use by C code that uses Bytes objects directly
+ * rather than the buffer API?  I suggest C code should prefer to use
+ * the buffer API (though it is heavier weight).
+ * 
+ *   APIs I think should be public in C and Python:
+ *     is_readonly
+ *     set_immutable  &  is_immutable
+ */
+
+/*
+ * Set a bytes object to be immutable.  If outstanding non-readonly
+ * exports exist this will raise an error instead.  Once immutable,
+ * always immutable.  This cannot be undone.
+ *
+ * Returns: 0 on success, 1 on failure with an exception set.
+ */
+Py_LOCAL_INLINE(int) set_immutable(PyBytesObject *obj)
+{
+    if (obj->ob_exports > 0) {
+        PyErr_SetString(PyExc_BufferError,
+                "bytes with outstanding non-readonly exports "
+                "cannot be made immutable.");
+        return 1;
+    }
+    obj->ob_readonly_exports = IMMUTABLE;
+    return 0;
+}
+
+/*
+ * Is this bytes object immutable?  0: no, 1: yes
+ */
+Py_LOCAL_INLINE(int) is_immutable(PyBytesObject *obj)
+{
+    return obj->ob_readonly_exports == IMMUTABLE;
+}
+
+/*
+ * Is this bytes object currently read only?  0: no, 1: yes
+ */
+Py_LOCAL_INLINE(int) is_readonly(PyBytesObject *obj)
+{
+    assert(is_immutable(obj) || obj->ob_readonly_exports <= obj->ob_exports);
+    return (obj->ob_readonly_exports > 0 && obj->ob_exports == 0);
+}
+
+/*
+ * Increment the export count.  For use by getbuffer.
+ *
+ * Returns: 0 on success, -1 on failure with an exception set.
+ * (-1 matches the required buffer API getbuffer return value)
+ */
+Py_LOCAL_INLINE(int) inc_exports(PyBytesObject *obj)
+{
+    obj->ob_exports++;
+#ifdef BOUNDS_CHECK_EXPORTS
+    if (obj->ob_exports <= 0) {
+        PyErr_SetString(PyExc_RuntimeError,
+                "ob_exports integer overflow");
+        obj->ob_exports--;
+        return -1;
+    }
+#endif
+    return 0;
+}
+
+/*
+ * Decrement the export count.  For use by releasebuffer.
+ */
+Py_LOCAL_INLINE(void) dec_exports(PyBytesObject *obj)
+{
+    obj->ob_exports--;
+}
+
+
+/*
+ * Increment the readonly export count if the object is mutable.
+ * Must be called with the GIL held.
+ *
+ * For use by the buffer API to implement PyBUF_LOCKDATA requests.
+ */
+Py_LOCAL_INLINE(void) inc_readonly_exports(PyBytesObject *obj)
+{
+#ifdef BOUNDS_CHECK_EXPORTS
+    if (obj->ob_readonly_exports == MAX_READONLY_EXPORTS) {
+        /* XXX(gps): include object id in this warning? */
+        PyErr_WarnEx(PyExc_RuntimeWarning,
+            "readonly_exports overflow on bytes object; "
+            "marking it immutable.", 1);
+        obj->ob_readonly_exports = IMMUTABLE;
+    }
+#endif
+    /* 
+     * NOTE: Even if the above check isn't made, the values are such that
+     * incrementing ob_readonly_exports past the max value will cause it
+     * to become immutable (as a partial safety feature).
+     */
+    if (obj->ob_readonly_exports != IMMUTABLE) {
+        obj->ob_readonly_exports++;
+    }
+}
+
+
+/*
+ * Decrement the readonly export count.
+ * Must be called with the GIL held.
+ *
+ * For use by the buffer API to implement PyBUF_LOCKDATA requests.
+ */
+Py_LOCAL_INLINE(void) dec_readonly_exports(PyBytesObject *obj)
+{
+    assert(is_immutable(obj) || obj->ob_readonly_exports <= obj->ob_exports);
+    if (obj->ob_readonly_exports != IMMUTABLE) {
+        obj->ob_readonly_exports--;
+    }
+}
+
+
 /* The nullbytes are used by the stringlib during partition.
  * If partition is removed from bytes, nullbytes and its helper
  * Init/Fini should also be removed.
  */
 static PyBytesObject *nullbytes = NULL;
 
 void
 PyBytes_Fini(void)
@@ -49,35 +189,44 @@
     return 1;
 }
 
 static int
 bytes_getbuffer(PyBytesObject *obj, PyBuffer *view, int flags)
 {
         int ret;
         void *ptr;
+        int readonly = 0;
         if (view == NULL) {
-                obj->ob_exports++;
-                return 0;
+                return inc_exports(obj);
         }
         if (obj->ob_bytes == NULL)
                 ptr = "";
         else
                 ptr = obj->ob_bytes;
-        ret = PyBuffer_FillInfo(view, ptr, Py_Size(obj), 0, flags);
+        if (((flags & PyBUF_LOCKDATA) == PyBUF_LOCKDATA) &&
+            obj->ob_exports == 0) {
+                inc_readonly_exports(obj);
+                readonly = -1;
+        } else {
+                readonly = is_readonly(obj);
+        }
+        ret = PyBuffer_FillInfo(view, ptr, Py_Size(obj), readonly, flags);
         if (ret >= 0) {
-                obj->ob_exports++;
+                return inc_exports(obj);
         }
         return ret;
 }
 
 static void
 bytes_releasebuffer(PyBytesObject *obj, PyBuffer *view)
 {
-        obj->ob_exports--;
+        dec_exports(obj);
+        if (view && view->readonly == -1)
+                dec_readonly_exports(obj);
 }
 
 static Py_ssize_t
 _getbuffer(PyObject *obj, PyBuffer *view)
 {
     PyBufferProcs *buffer = Py_Type(obj)->tp_as_buffer;
 
     if (buffer == NULL ||
@@ -85,16 +234,20 @@
         buffer->bf_getbuffer == NULL)
     {
         PyErr_Format(PyExc_TypeError,
                      "Type %.100s doesn't support the buffer API",
                      Py_Type(obj)->tp_name);
         return -1;
     }
 
+    /* 
+     * TODO(gps): make this PyBUF_WRITEABLE?  or just verify sanity on
+     * our own before calling if the GIL is not being relinquished?
+     */
     if (buffer->bf_getbuffer(obj, view, PyBUF_SIMPLE) < 0)
             return -1;
     return view->len;
 }
 
 /* Direct API functions */
 
 PyObject *
@@ -151,26 +304,40 @@
 PyBytes_AsString(PyObject *self)
 {
     assert(self != NULL);
     assert(PyBytes_Check(self));
 
     return PyBytes_AS_STRING(self);
 }
 
+#define SET_RO_ERROR(bo)  do {  \
+            if (is_immutable((PyBytesObject *)(bo)))  \
+                PyErr_SetString(PyExc_BufferError,  \
+                    "Immutable flag set: object cannot be modified"); \
+            else \
+                PyErr_SetString(PyExc_BufferError, \
+                    "Readonly export exists: object cannot be modified"); \
+        } while (0)
+
 int
 PyBytes_Resize(PyObject *self, Py_ssize_t size)
 {
     void *sval;
     Py_ssize_t alloc = ((PyBytesObject *)self)->ob_alloc;
 
     assert(self != NULL);
     assert(PyBytes_Check(self));
     assert(size >= 0);
 
+    if (is_readonly((PyBytesObject *)self)) {
+        SET_RO_ERROR(self);
+        return -1;
+    }
+
     if (size < alloc / 2) {
         /* Major downsize; resize down to exact size */
         alloc = size + 1;
     }
     else if (size < alloc) {
         /* Within allocated size; quick exit */
         Py_Size(self) = size;
         ((PyBytesObject *)self)->ob_bytes[size] = '\0'; /* Trailing null */
@@ -275,17 +442,23 @@
     }
 
     mysize = Py_Size(self);
     size = mysize + vo.len;
     if (size < 0) {
         PyObject_ReleaseBuffer(other, &vo);
         return PyErr_NoMemory();
     }
+
     if (size < self->ob_alloc) {
+        if (is_readonly((PyBytesObject *)self)) {
+            SET_RO_ERROR(self);
+            PyObject_ReleaseBuffer(other, &vo);
+            return NULL;
+        }
         Py_Size(self) = size;
         self->ob_bytes[Py_Size(self)] = '\0'; /* Trailing null byte */
     }
     else if (PyBytes_Resize((PyObject *)self, size) < 0) {
         PyObject_ReleaseBuffer(other, &vo);
         return NULL;
     }
     memcpy(self->ob_bytes + mysize, vo.buf, vo.len);
@@ -327,17 +500,22 @@
     Py_ssize_t size;
 
     if (count < 0)
         count = 0;
     mysize = Py_Size(self);
     size = mysize * count;
     if (count != 0 && size / count != mysize)
         return PyErr_NoMemory();
+
     if (size < self->ob_alloc) {
+        if (is_readonly((PyBytesObject *)self)) {
+            SET_RO_ERROR(self);
+            return NULL;
+        }
         Py_Size(self) = size;
         self->ob_bytes[Py_Size(self)] = '\0'; /* Trailing null byte */
     }
     else if (PyBytes_Resize((PyObject *)self, size) < 0)
         return NULL;
 
     if (mysize == 1)
         memset(self->ob_bytes, self->ob_bytes[0], size);
@@ -487,16 +665,22 @@
                                  "can't set bytes slice from %.100s",
                                  Py_Type(values)->tp_name);
                     return -1;
             }
             needed = vbytes.len;
             bytes = vbytes.buf;
     }
 
+    if (is_readonly((PyBytesObject *)self)) {
+        SET_RO_ERROR(self);
+        res = -1;
+        goto finish;
+    }
+
     if (lo < 0)
         lo = 0;
     if (hi < lo)
         hi = lo;
     if (hi > Py_Size(self))
         hi = Py_Size(self);
 
     avail = hi - lo;
@@ -553,16 +737,21 @@
     if (i < 0 || i >= Py_Size(self)) {
         PyErr_SetString(PyExc_IndexError, "bytes index out of range");
         return -1;
     }
 
     if (value == NULL)
         return bytes_setslice(self, i, i+1, NULL);
 
+    if (is_readonly((PyBytesObject *)self)) {
+        SET_RO_ERROR(self);
+        return -1;
+    }
+
     ival = PyNumber_AsSsize_t(value, PyExc_ValueError);
     if (ival == -1 && PyErr_Occurred())
         return -1;
 
     if (ival < 0 || ival >= 256) {
         PyErr_SetString(PyExc_ValueError, "byte must be in range(0, 256)");
         return -1;
     }
@@ -572,16 +761,21 @@
 }
 
 static int
 bytes_ass_subscript(PyBytesObject *self, PyObject *item, PyObject *values)
 {
     Py_ssize_t start, stop, step, slicelen, needed;
     char *bytes;
 
+    if (is_readonly((PyBytesObject *)self)) {
+        SET_RO_ERROR(self);
+        return -1;
+    }
+
     if (PyIndex_Check(item)) {
         Py_ssize_t i = PyNumber_AsSsize_t(item, PyExc_IndexError);
 
         if (i == -1 && PyErr_Occurred())
             return -1;
 
         if (i < 0)
             i += PyBytes_GET_SIZE(self);
@@ -1335,16 +1529,18 @@
 PyDoc_STRVAR(translate__doc__,
 "B.translate(table [,deletechars]) -> bytes\n\
 \n\
 Return a copy of the bytes B, where all characters occurring\n\
 in the optional argument deletechars are removed, and the\n\
 remaining characters have been mapped through the given\n\
 translation table, which must be a bytes of length 256.");
 
+/* XXX(gps): bytes could also use an in place bytes_itranslate method? */
+
 static PyObject *
 bytes_translate(PyBytesObject *self, PyObject *args)
 {
     register char *input, *output;
     register const char *table;
     register Py_ssize_t i, c, changed = 0;
     PyObject *input_obj = (PyObject*)self;
     const char *table1, *output_start, *del_table=NULL;
@@ -2030,16 +2226,18 @@
 
 PyDoc_STRVAR(replace__doc__,
 "B.replace (old, new[, count]) -> bytes\n\
 \n\
 Return a copy of bytes B with all occurrences of subsection\n\
 old replaced by new.  If the optional argument count is\n\
 given, only the first count occurrences are replaced.");
 
+/* XXX(gps): bytes could also use an in place bytes_ireplace method? */
+
 static PyObject *
 bytes_replace(PyBytesObject *self, PyObject *args)
 {
     Py_ssize_t count = -1;
     PyObject *from, *to, *res;
     PyBuffer vfrom, vto;
 
     if (!PyArg_ParseTuple(args, "OO|n:replace", &from, &to, &count))
@@ -2382,16 +2580,21 @@
 \n\
 Reverse the order of the values in bytes in place.");
 static PyObject *
 bytes_reverse(PyBytesObject *self, PyObject *unused)
 {
     char swap, *head, *tail;
     Py_ssize_t i, j, n = Py_Size(self);
 
+    if (is_readonly((PyBytesObject *)self)) {
+        SET_RO_ERROR(self);
+        return NULL;
+    }
+
     j = n / 2;
     head = self->ob_bytes;
     tail = head + n - 1;
     for (i = 0; i < j; i++) {
         swap = *head;
         *head++ = *tail;
         *tail-- = swap;
     }
@@ -2407,16 +2610,21 @@
 bytes_insert(PyBytesObject *self, PyObject *args)
 {
     int value;
     Py_ssize_t where, n = Py_Size(self);
 
     if (!PyArg_ParseTuple(args, "ni:insert", &where, &value))
         return NULL;
 
+    if (is_readonly((PyBytesObject *)self)) {
+        SET_RO_ERROR(self);
+        return NULL;
+    }
+
     if (n == PY_SSIZE_T_MAX) {
         PyErr_SetString(PyExc_OverflowError,
                         "cannot add more objects to bytes");
         return NULL;
     }
     if (value < 0 || value >= 256) {
         PyErr_SetString(PyExc_ValueError,
                         "byte must be in range(0, 256)");
@@ -2472,16 +2680,21 @@
 bytes_pop(PyBytesObject *self, PyObject *args)
 {
     int value;
     Py_ssize_t where = -1, n = Py_Size(self);
 
     if (!PyArg_ParseTuple(args, "|n:pop", &where))
         return NULL;
 
+    if (is_readonly((PyBytesObject *)self)) {
+        SET_RO_ERROR(self);
+        return NULL;
+    }
+
     if (n == 0) {
         PyErr_SetString(PyExc_OverflowError,
                         "cannot pop an empty bytes");
         return NULL;
     }
     if (where < 0)
         where += Py_Size(self);
     if (where < 0 || where >= Py_Size(self)) {
@@ -2505,16 +2718,21 @@
 bytes_remove(PyBytesObject *self, PyObject *arg)
 {
     int value;
     Py_ssize_t where, n = Py_Size(self);
 
     if (! _getbytevalue(arg, &value))
         return NULL;
 
+    if (is_readonly((PyBytesObject *)self)) {
+        SET_RO_ERROR(self);
+        return NULL;
+    }
+
     for (where = 0; where < n; where++) {
         if (self->ob_bytes[where] == value)
             break;
     }
     if (where == n) {
         PyErr_SetString(PyExc_ValueError, "value not found in bytes");
         return NULL;
     }
@@ -2783,16 +3001,18 @@
 
   error:
     Py_DECREF(newbytes);
     return NULL;
 }
 
 PyDoc_STRVAR(reduce_doc, "Return state information for pickling.");
 
+/* XXX(gps): should is_immutable() be in the pickle? */
+
 static PyObject *
 bytes_reduce(PyBytesObject *self)
 {
     PyObject *latin1;
     if (self->ob_bytes)
         latin1 = PyUnicode_DecodeLatin1(self->ob_bytes,
                                         Py_Size(self), NULL);
     else

From thomas at python.org  Thu Aug 30 01:56:53 2007
From: thomas at python.org (Thomas Wouters)
Date: Thu, 30 Aug 2007 01:56:53 +0200
Subject: [Python-3000] refleak in test_io?
Message-ID: <9e804ac0708291656r13c6b39cg19545e4f5a17e12d@mail.gmail.com>

Am I the only one seeing a refleak in test_io?

timberwolf:~/python/python/py3k > ./python -E -tt Lib/test/regrtest.py -R::
test_io
test_io
beginning 9 repetitions
123456789
.........
test_io leaked [62, 62, 62, 62] references, sum=248
1 test OK.

It's in this particular piece of code:

    def test_destructor(self):
        record = []
        class MyFileIO(io.FileIO):
            def __del__(self):
                record.append(1)
                io.FileIO.__del__(self)
            def close(self):
                record.append(2)
                io.FileIO.close(self)
            def flush(self):
                record.append(3)
                io.FileIO.flush(self)
        f = MyFileIO(test_support.TESTFN, "w")
        f.write("xxx")
        del f
        self.assertEqual(record, [1, 2, 3])

which you can simplify to:

    def test_destructor(self):
        class MyFileIO(io.FileIO):
            pass
        f = MyFileIO(test_support.TESTFN, "w")
        del f

That leaks 30 references each time it's called. Taking the class definition
out of the function stops the leak, so it smells like something, somewhere,
is leaking a reference to the MyFileIO class.  Instantiating the class is
necessary to trigger the leak: the refcount of the class goes up after
creating the instance, but does not go down after it's destroyed. However,
creating and destroying another instance does not leak another reference,
only the single reference is leaked.

I tried recreating the leak with more controllable types, but I haven't got
very far. It seems to be caused by some weird interaction between io.FileIO,
_fileio._FileIO and io.IOBase, specifically io.IOBase.__del__() calling
self.close(), and io.FileIO.close() calling _fileio._FileIO.close() *and*
io.RawIOBase.close(). The weird thing is that the contents of
RawIOBase.close() don't matter. The mere act of calling RawIOBase.close(self)
causes the leak. Remove the call, or change it into an attribute fetch, and
the leak is gone. I'm stumped.
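
A minimal way to check whether a locally defined class outlives its
function, assuming CPython and using a plain class as a stand-in.  It
does not reproduce the io.FileIO interaction described above;
make_and_drop and Local are hypothetical names for this sketch.

```python
import gc
import weakref


def make_and_drop():
    # Define a class locally, instantiate it once, then let both go.
    class Local:
        pass

    obj = Local()
    del obj
    # A weak reference lets us observe the class without keeping it alive.
    return weakref.ref(Local)


ref = make_and_drop()
gc.collect()                      # classes sit in reference cycles
class_was_freed = ref() is None   # False here would suggest a leaked reference
```

Swapping a suspect base class in for the plain one is one way to
narrow down which type keeps the reference.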

-- 
Thomas Wouters <thomas at python.org>

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!

From guido at python.org  Thu Aug 30 02:04:53 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 29 Aug 2007 17:04:53 -0700
Subject: [Python-3000] patch: bytes object PyBUF_LOCKDATA read-only and
	immutable support
In-Reply-To: <20070829234728.GV24059@electricrain.com>
References: <20070829234728.GV24059@electricrain.com>
Message-ID: <ca471dc20708291704kca1feb9sc064c7f93ed9cf92@mail.gmail.com>

That's a huge patch to land so close before a release. I'm not sure I
like the immutability API -- it won't be useful unless we add a hash
method, and then we have all sorts of difficulties again -- the
distinction between a hashable and an unhashable object should be made
by type, not by value (tuples containing unhashable values
notwithstanding).
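
The tuple caveat can be seen in a short sketch (plain Python,
independent of this patch): tuple is a hashable type, yet a particular
tuple value fails to hash if it contains a list.  The helper name
is_hashable is made up for illustration.

```python
def is_hashable(value):
    """Return True if hash(value) succeeds, False if it raises TypeError."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


plain = is_hashable((1, 2, 3))      # tuple of ints
nested = is_hashable((1, [2, 3]))   # tuple holding a list
```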

I don't understand the comment about using PyBUF_WRITABLE in
_getbuffer() -- this is only used for data we're *reading* and I don't
think the GIL is even released while we're reading such things.

If you think it's important to get this in the 3.0a1 release, we
should pair-program on it ASAP, preferably tomorrow morning.
Otherwise, let's do a review next week.

--Guido

On 8/29/07, Gregory P. Smith <greg at krypto.org> wrote:
> Attached is what I've come up with so far.  Only a single field is
> added to the PyBytesObject struct.  This adds support to the bytes
> object for PyBUF_LOCKDATA buffer API operation.  bytes objects can be
> marked temporarily read-only for use while the buffer api has handed
> them off to something which may run without the GIL (think IO).  Any
> attempt to modify them during that time will raise an exception as I
> believe Martin suggested earlier.
>
> As an added bonus, because it's been discussed here, support for setting
> a bytes object immutable has been added, since it's pretty trivial once
> the read-only export support was in place.  That's not required but was
> trivial to include.
>
> I'd appreciate any feedback.
>
> My TODO list for this patch:
>
>  0. Get feedback and make adjustments as necessary.
>
>  1. Deciding between PyBUF_SIMPLE and PyBUF_WRITEABLE for the internal
>     uses of the _getbuffer() function.  bytesobject.c contains both read-only
>     and read-write uses of the buffers; I'll add a boolean parameter for
>     that.
>
>  2. More testing: a few tests in the test suite fail after this but the
>     number was low and I haven't had time to look at why or what the
>     failures were.
>
>  3. Exporting methods suggested in the TODO at the top of the file.
>
>  4. Unit tests for all of the functionality this adds.
>
> NOTE: after these changes I had to run make clean and rm -rf build before
> imports stopped segfaulting.  Without that, I suspect some things
> (modules?) were not properly recompiled after the bytesobject.h struct
> change.
>
> -gps


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From nnorwitz at gmail.com  Thu Aug 30 02:07:49 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Wed, 29 Aug 2007 17:07:49 -0700
Subject: [Python-3000] refleak in test_io?
In-Reply-To: <9e804ac0708291656r13c6b39cg19545e4f5a17e12d@mail.gmail.com>
References: <9e804ac0708291656r13c6b39cg19545e4f5a17e12d@mail.gmail.com>
Message-ID: <ee2a432c0708291707l8d604f0s1c149a5745db5650@mail.gmail.com>

On 8/29/07, Thomas Wouters <thomas at python.org> wrote:
>
> Am I the only one seeing a refleak in test_io?

I know of leaks in 4 modules, but they all may point to the same one
you identified:

test_io leaked [62, 62] references, sum=124
test_urllib leaked [122, 122] references, sum=244
test_urllib2_localnet leaked [3, 3] references, sum=6
test_xmlrpc leaked [26, 26] references, sum=52

n

From greg at krypto.org  Thu Aug 30 02:49:45 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Wed, 29 Aug 2007 17:49:45 -0700
Subject: [Python-3000] patch: bytes object PyBUF_LOCKDATA read-only and
	immutable support
In-Reply-To: <ca471dc20708291704kca1feb9sc064c7f93ed9cf92@mail.gmail.com>
References: <20070829234728.GV24059@electricrain.com>
	<ca471dc20708291704kca1feb9sc064c7f93ed9cf92@mail.gmail.com>
Message-ID: <52dc1c820708291749v60d50326me7684678553ce3cb@mail.gmail.com>

I'm inclined to let this one wait for 3.0a2; I'm out of Python time for the
week and will be out of town (but online) until next Thursday. Pairing up to
finish it later on would be nice if needed.  I'm happy if the immutable
support is dropped; I just figured I'd include it as an example once I
realized how easy it was.  I don't want hashable bytes objects either (let
someone implement that using a subclass in Python :).

As for the _getbuffer() stuff, I left it as a comment because I hadn't looked
into it in enough detail yet.  You're right about the GIL.

-gps

On 8/29/07, Guido van Rossum <guido at python.org> wrote:
>
> That's a huge patch to land so close before a release. I'm not sure I
> like the immutability API -- it won't be useful unless we add a hash
> method, and then we have all sorts of difficulties again -- the
> distinction between a hashable and an unhashable object should be made
> by type, not by value (tuples containing unhashable values
> notwithstanding).
>
> I don't understand the comment about using PyBUF_WRITABLE in
> _getbuffer() -- this is only used for data we're *reading* and I don't
> think the GIL is even released while we're reading such things.
>
> If you think it's important to get this in the 3.0a1 release, we
> should pair-program on it ASAP, preferably tomorrow morning.
> Otherwise, let's do a review next week.
>
> --Guido
>
> On 8/29/07, Gregory P. Smith <greg at krypto.org> wrote:
> > Attached is what I've come up with so far.  Only a single field is
> > added to the PyBytesObject struct.  This adds support to the bytes
> > object for PyBUF_LOCKDATA buffer API operation.  bytes objects can be
> > marked temporarily read-only for use while the buffer api has handed
> > them off to something which may run without the GIL (think IO).  Any
> > attempt to modify them during that time will raise an exception as I
> > believe Martin suggested earlier.
> >
> > As an added bonus, because it's been discussed here, support for setting
> > a bytes object immutable has been added, since it's pretty trivial once
> > the read-only export support was in place.  That's not required but was
> > trivial to include.
> >
> > I'd appreciate any feedback.
> >
> > My TODO list for this patch:
> >
> >  0. Get feedback and make adjustments as necessary.
> >
> >  1. Deciding between PyBUF_SIMPLE and PyBUF_WRITEABLE for the internal
> >     uses of the _getbuffer() function.  bytesobject.c contains both
> >     read-only and read-write uses of the buffers; I'll add a boolean
> >     parameter for that.
> >
> >  2. More testing: a few tests in the test suite fail after this but the
> >     number was low and I haven't had time to look at why or what the
> >     failures were.
> >
> >  3. Exporting methods suggested in the TODO at the top of the file.
> >
> >  4. Unit tests for all of the functionality this adds.
> >
> > NOTE: after these changes I had to run make clean and rm -rf build before
> > imports stopped segfaulting.  Without that, I suspect some things
> > (modules?) were not properly recompiled after the bytesobject.h struct
> > change.
> >
> > -gps
>
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>

From barry at python.org  Thu Aug 30 03:35:11 2007
From: barry at python.org (Barry Warsaw)
Date: Wed, 29 Aug 2007 21:35:11 -0400
Subject: [Python-3000] [Python-3000-checkins] r57691 -
	python/branches/py3k/Lib/email
In-Reply-To: <20070830011514.A50B51E4002@bag.python.org>
References: <20070830011514.A50B51E4002@bag.python.org>
Message-ID: <5DF36BD5-BD4A-4935-A25A-6900D08194DD@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 29, 2007, at 9:15 PM, guido.van.rossum wrote:

> Author: guido.van.rossum
> Date: Thu Aug 30 03:15:14 2007
> New Revision: 57691
>
> Added:
>    python/branches/py3k/Lib/email/
>       - copied from r57592, sandbox/trunk/emailpkg/5_0-exp/email/
> Log:
> Copying the email package back, despite its failings.

Oh, okay!  I have a few uncommitted changes that improve things a  
bit, but I'll commit those to the branch and kill off the sandbox  
branch.

Note that there /are/ API changes involved here so the documentation  
and docstrings need to be updated and NEWS entries need to be  
written.  I'll do all that after I fix the last few failures.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRtYez3EjvBPtnXfVAQKL0AP6A5YGJdQ5vRDk1PaHlD/R6qnlFF4O8omT
uFCw/JbWD+FEFfpzEgFGtlJcidPclPbSnhL6xRix1IDOz+O8f6jUHZ/rES+LDjhT
4XkK3cwqH/+qwl/QH92/M0Kz+uS7ADcXvxIKH+cvSZGc1c5W1J4jTb8SZtlAugyd
m4deMR8/E5M=
=tRyX
-----END PGP SIGNATURE-----

From guido at python.org  Thu Aug 30 04:02:41 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 29 Aug 2007 19:02:41 -0700
Subject: [Python-3000] [Python-3000-checkins] r57691 -
	python/branches/py3k/Lib/email
In-Reply-To: <5DF36BD5-BD4A-4935-A25A-6900D08194DD@python.org>
References: <20070830011514.A50B51E4002@bag.python.org>
	<5DF36BD5-BD4A-4935-A25A-6900D08194DD@python.org>
Message-ID: <ca471dc20708291902h4835cbe2if4a8efb26518990f@mail.gmail.com>

Great! The more you can fix up by Friday the better, but I figured
it's better to have a little lead time so we can fix up other things
depending on it, and have a little test time.

On 8/29/07, Barry Warsaw <barry at python.org> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On Aug 29, 2007, at 9:15 PM, guido.van.rossum wrote:
>
> > Author: guido.van.rossum
> > Date: Thu Aug 30 03:15:14 2007
> > New Revision: 57691
> >
> > Added:
> >    python/branches/py3k/Lib/email/
> >       - copied from r57592, sandbox/trunk/emailpkg/5_0-exp/email/
> > Log:
> > Copying the email package back, despite its failings.
>
> Oh, okay!  I have a few uncommitted changes that improve things a
> bit, but I'll commit those to the branch and kill off the sandbox
> branch.
>
> Note that there /are/ API changes involved here so the documentation
> and docstrings need to be updated and NEWS entries need to be
> written.  I'll do all that after I fix the last few failures.
>
> - -Barry
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.7 (Darwin)
>
> iQCVAwUBRtYez3EjvBPtnXfVAQKL0AP6A5YGJdQ5vRDk1PaHlD/R6qnlFF4O8omT
> uFCw/JbWD+FEFfpzEgFGtlJcidPclPbSnhL6xRix1IDOz+O8f6jUHZ/rES+LDjhT
> 4XkK3cwqH/+qwl/QH92/M0Kz+uS7ADcXvxIKH+cvSZGc1c5W1J4jTb8SZtlAugyd
> m4deMR8/E5M=
> =tRyX
> -----END PGP SIGNATURE-----


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip at pobox.com  Thu Aug 30 04:19:46 2007
From: skip at pobox.com (skip at pobox.com)
Date: Wed, 29 Aug 2007 21:19:46 -0500
Subject: [Python-3000] Will Py3K be friendlier to optimization
 opportunities?
In-Reply-To: <43aa6ff70708291402m29205719of476e8e1d4e7c965@mail.gmail.com>
References: <18133.44715.247285.482372@montanaro.dyndns.org>
	<46D5B0BD.6010209@v.loewis.de>
	<18133.50997.687840.234611@montanaro.dyndns.org>
	<43aa6ff70708291402m29205719of476e8e1d4e7c965@mail.gmail.com>
Message-ID: <18134.10562.798661.664005@montanaro.dyndns.org>


    Collin> When thinking about these kinds of optimizations and
    Collin> restrictions, keep in mind their effect on testing. For example,
    Collin> I work on code that makes use of the ability to tinker with
    Collin> another module's view of os.path in order to simulate error
    Collin> conditions that would otherwise be hard to test. If you wanted
    Collin> to hide this kind of restriction behind an -O flag, that would
    Collin> be one thing, but having it on by default seems like a bad idea.

You can achieve these sorts of effects by assigning an object to
sys.modules[modulename].

Skip


From skip at pobox.com  Thu Aug 30 04:23:48 2007
From: skip at pobox.com (skip at pobox.com)
Date: Wed, 29 Aug 2007 21:23:48 -0500
Subject: [Python-3000] Will Py3K be friendlier to optimization
 opportunities?
In-Reply-To: <18134.10562.798661.664005@montanaro.dyndns.org>
References: <18133.44715.247285.482372@montanaro.dyndns.org>
	<46D5B0BD.6010209@v.loewis.de>
	<18133.50997.687840.234611@montanaro.dyndns.org>
	<43aa6ff70708291402m29205719of476e8e1d4e7c965@mail.gmail.com>
	<18134.10562.798661.664005@montanaro.dyndns.org>
Message-ID: <18134.10804.103888.38682@montanaro.dyndns.org>


    skip> You can achieve these sorts of effects by assigning an object to
    skip> sys.modules[modulename].

I forgot: you should always be able to assign to the module's __dict__
attribute as well:

    >>> import os
    >>> os.__dict__['foo'] = 'a'
    >>> os.foo
    'a'

S
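
For illustration, a minimal sketch of the __dict__ approach applied to
Collin's use case. The `victim` module and its `config_present` function
are hypothetical stand-ins, not code from this thread. The point is only
that rebinding `os` in one module's namespace changes that module's view
without touching the real os module.

```python
import os
import types

# Hypothetical module under test, built in memory so the sketch is self-contained.
victim = types.ModuleType("victim")
exec(
    "import os\n"
    "def config_present():\n"
    "    return os.path.exists(os.curdir)\n",
    victim.__dict__,
)


def with_fake_os(module, fake_os, func):
    """Call func() while module's global name 'os' is bound to fake_os."""
    saved = module.__dict__["os"]
    module.__dict__["os"] = fake_os
    try:
        return func()
    finally:
        module.__dict__["os"] = saved  # always restore the real binding


# A stand-in 'os' whose path.exists() reports every path as missing.
fake_os = types.SimpleNamespace(
    curdir=os.curdir,
    path=types.SimpleNamespace(exists=lambda path: False),
)

before = victim.config_present()
during = with_fake_os(victim, fake_os, victim.config_present)
after = victim.config_present()
```

Only `victim` sees the fake; every other module keeps the real os.path,
and the try/finally puts the original binding back even if the call fails.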

From barry at python.org  Thu Aug 30 04:24:02 2007
From: barry at python.org (Barry Warsaw)
Date: Wed, 29 Aug 2007 22:24:02 -0400
Subject: [Python-3000] [Python-3000-checkins] r57691 -
	python/branches/py3k/Lib/email
In-Reply-To: <ca471dc20708291902h4835cbe2if4a8efb26518990f@mail.gmail.com>
References: <20070830011514.A50B51E4002@bag.python.org>
	<5DF36BD5-BD4A-4935-A25A-6900D08194DD@python.org>
	<ca471dc20708291902h4835cbe2if4a8efb26518990f@mail.gmail.com>
Message-ID: <8FE35331-8B63-421B-BBB9-044983EA5760@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 29, 2007, at 10:02 PM, Guido van Rossum wrote:

> Great! The more you can fix up by Friday the better, but I figured
> it's better to have a little lead time so we can fix up other things
> depending on it, and have a little test time.

Guido, do you remember which revision from the sandbox you merged/ 
copied?  It looks like it was missing some stuff I committed last  
night.  If you know which revision you merged it'll make it easy for  
me to copy the latest stuff over.  Sadly I did not use svnmerge. :(

OTOH, if you get to it before I do, the sandbox branch is now  
completely up-to-date with all my changes.  It needs to be merged to  
the head of the py3k trunk and then I can kill the sandbox branch.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRtYqQ3EjvBPtnXfVAQLfkgP+LsMc8jST44xecY/G0TGlHDB3CNXSVqoA
kZuVM/8YPjDfWdhTuaLaGHD9o6ld7OMAGQsYi3PYFG5tOgBM1dauCVvHh1ltSRTf
rYCNcoW6hntCiGnCVqECsK2nCLhcMowI7R0FWylEbCY16Vobs3hHsJKAdMrqR120
9jqTgzr9aic=
=fvgZ
-----END PGP SIGNATURE-----

From fdrake at acm.org  Thu Aug 30 04:25:47 2007
From: fdrake at acm.org (Fred Drake)
Date: Wed, 29 Aug 2007 22:25:47 -0400
Subject: [Python-3000] Will Py3K be friendlier to optimization
	opportunities?
In-Reply-To: <18134.10562.798661.664005@montanaro.dyndns.org>
References: <18133.44715.247285.482372@montanaro.dyndns.org>
	<46D5B0BD.6010209@v.loewis.de>
	<18133.50997.687840.234611@montanaro.dyndns.org>
	<43aa6ff70708291402m29205719of476e8e1d4e7c965@mail.gmail.com>
	<18134.10562.798661.664005@montanaro.dyndns.org>
Message-ID: <DD273557-D0A8-4872-8ECA-261D813E56B9@acm.org>

On Aug 29, 2007, at 10:19 PM, skip at pobox.com wrote:
> You can achieve these sorts of effects by assigning an object to
> sys.modules[modulename].

Only for imports that haven't happened yet, which tends to be fragile.
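
A small sketch of why this is fragile, assuming two throwaway in-memory
modules (`early_client` and `late_client`) and a made-up `demo_settings`
module; none of these names come from the thread. A module that imported
before the swap keeps its reference to the old object.

```python
import sys
import types

CLIENT_SOURCE = (
    "import demo_settings\n"
    "def read():\n"
    "    return demo_settings.VALUE\n"
)


def load_client(name):
    """Build an in-memory module that imports demo_settings at load time."""
    module = types.ModuleType(name)
    exec(CLIENT_SOURCE, module.__dict__)
    return module


real = types.ModuleType("demo_settings")
real.VALUE = "real"
fake = types.ModuleType("demo_settings")
fake.VALUE = "fake"

sys.modules["demo_settings"] = real
early = load_client("early_client")   # imports before the swap

sys.modules["demo_settings"] = fake   # the replacement
late = load_client("late_client")     # imports after the swap

early_value = early.read()
late_value = late.read()

del sys.modules["demo_settings"]      # clean up the hypothetical entry
```

Here `early` still reads from the original module while `late` sees the
replacement, so the outcome depends on import order.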


   -Fred

-- 
Fred Drake   <fdrake at acm.org>




From guido at python.org  Thu Aug 30 03:30:38 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 29 Aug 2007 18:30:38 -0700
Subject: [Python-3000] email package back
Message-ID: <ca471dc20708291830v567fe8d4v6c39698a089562ab@mail.gmail.com>

In preparation for the release Friday, I put the email package back,
from Barry's sandbox.

This breaks a few things. Please help clean these up!

Also a few things were disabled to cope with its temporary demise. If
you remember doing one of these, please undo them!

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Thu Aug 30 05:21:44 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 29 Aug 2007 20:21:44 -0700
Subject: [Python-3000] [Python-3000-checkins] r57691 -
	python/branches/py3k/Lib/email
In-Reply-To: <8FE35331-8B63-421B-BBB9-044983EA5760@python.org>
References: <20070830011514.A50B51E4002@bag.python.org>
	<5DF36BD5-BD4A-4935-A25A-6900D08194DD@python.org>
	<ca471dc20708291902h4835cbe2if4a8efb26518990f@mail.gmail.com>
	<8FE35331-8B63-421B-BBB9-044983EA5760@python.org>
Message-ID: <ca471dc20708292021l19beaeeex891d710651c73c33@mail.gmail.com>

On 8/29/07, Barry Warsaw <barry at python.org> wrote:
> Guido, do you remember which revision from the sandbox you merged/
> copied?  It looks like it was missing some stuff I committed last
> night.  If you know which revision you merged it'll make it easy for
> me to copy the latest stuff over.  Sadly I did not use svnmerge. :(
>
> OTOH, if you get to it before I do, the sandbox branch is now
> completely up-to-date with all my changes.  It needs to be merged to
> the head of the py3k trunk and then I can kill the sandbox branch.

Ouch, I didn't make a note of that. I misunderstood how svn copy
worked, and assumed it would copy the latest revision -- but it copied
my working copy. Fortunately I had done a sync not too long ago.

I think I can reconstruct the diffs and will apply them manually to
the py3k branch, then you can erase the sandbox.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Thu Aug 30 05:28:16 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 29 Aug 2007 20:28:16 -0700
Subject: [Python-3000] [Python-3000-checkins] r57691 -
	python/branches/py3k/Lib/email
In-Reply-To: <ca471dc20708292021l19beaeeex891d710651c73c33@mail.gmail.com>
References: <20070830011514.A50B51E4002@bag.python.org>
	<5DF36BD5-BD4A-4935-A25A-6900D08194DD@python.org>
	<ca471dc20708291902h4835cbe2if4a8efb26518990f@mail.gmail.com>
	<8FE35331-8B63-421B-BBB9-044983EA5760@python.org>
	<ca471dc20708292021l19beaeeex891d710651c73c33@mail.gmail.com>
Message-ID: <ca471dc20708292028s23124f88o798b3bf8d5665303@mail.gmail.com>

No, I don't think I can recover the changes. Would it work to just
copy the files over from the sandbox, forcing Lib/email in the py3k
branch to be identical to emailpkg/5_0-exp/email in the sandbox?

On 8/29/07, Guido van Rossum <guido at python.org> wrote:
> On 8/29/07, Barry Warsaw <barry at python.org> wrote:
> > Guido, do you remember which revision from the sandbox you merged/
> > copied?  It looks like it was missing some stuff I committed last
> > night.  If you know which revision you merged it'll make it easy for
> > me to copy the latest stuff over.  Sadly I did not use svnmerge. :(
> >
> > OTOH, if you get to it before I do, the sandbox branch is now
> > completely up-to-date with all my changes.  It needs to be merged to
> > the head of the py3k trunk and then I can kill the sandbox branch.
>
> Ouch, I didn't make a note of that. I misunderstood how svn copy
> worked, and assumed it would copy the latest revision -- but it copied
> my working copy. Fortunately I had done a sync not too long ago.
>
> I think I can reconstruct the diffs and will apply them manually to
> the py3k branch, then you can erase the sandbox.
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From barry at python.org  Thu Aug 30 05:33:53 2007
From: barry at python.org (Barry Warsaw)
Date: Wed, 29 Aug 2007 23:33:53 -0400
Subject: [Python-3000] [Python-3000-checkins] r57691 -
	python/branches/py3k/Lib/email
In-Reply-To: <ca471dc20708292028s23124f88o798b3bf8d5665303@mail.gmail.com>
References: <20070830011514.A50B51E4002@bag.python.org>
	<5DF36BD5-BD4A-4935-A25A-6900D08194DD@python.org>
	<ca471dc20708291902h4835cbe2if4a8efb26518990f@mail.gmail.com>
	<8FE35331-8B63-421B-BBB9-044983EA5760@python.org>
	<ca471dc20708292021l19beaeeex891d710651c73c33@mail.gmail.com>
	<ca471dc20708292028s23124f88o798b3bf8d5665303@mail.gmail.com>
Message-ID: <6AFC1C7C-6BF7-4057-9E53-76030FEB214C@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 29, 2007, at 11:28 PM, Guido van Rossum wrote:

> No, I don't think I can recover the changes. Would it work to just
> copy the files over from the sandbox, forcing Lib/email in the py3k
> branch to be identical to emailpkg/5_0-exp/email in the sandbox?

Yes, that /should/ work.  I'll lose my last commit to the py3k branch  
but that will be easy to recover.  I'm going to sleep now so if you  
get to it before I wake up I won't do it in the morning before you  
wake up.  And vice versa (or something like that :).

Thanks,
- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRtY6oXEjvBPtnXfVAQK3hgQAqqyeWk9qive/A/VsP6sQB/DUVZoMlWhV
L1VVB133aFxii8TGyk+C8LvZOD0Z31/98vREW5aDMEhxGEhkk9kHQAQkLqSXEEWA
a1kS0fJ0YMXOaDA8cbvFpJ2NWPJpFh1ki1wc+PtobO59O+rRnhnOdL5WyxB86ahR
9LIqkF5xol0=
=dKTM
-----END PGP SIGNATURE-----

From guido at python.org  Thu Aug 30 05:51:05 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 29 Aug 2007 20:51:05 -0700
Subject: [Python-3000] [Python-3000-checkins] r57691 -
	python/branches/py3k/Lib/email
In-Reply-To: <6AFC1C7C-6BF7-4057-9E53-76030FEB214C@python.org>
References: <20070830011514.A50B51E4002@bag.python.org>
	<5DF36BD5-BD4A-4935-A25A-6900D08194DD@python.org>
	<ca471dc20708291902h4835cbe2if4a8efb26518990f@mail.gmail.com>
	<8FE35331-8B63-421B-BBB9-044983EA5760@python.org>
	<ca471dc20708292021l19beaeeex891d710651c73c33@mail.gmail.com>
	<ca471dc20708292028s23124f88o798b3bf8d5665303@mail.gmail.com>
	<6AFC1C7C-6BF7-4057-9E53-76030FEB214C@python.org>
Message-ID: <ca471dc20708292051i495487dbyb03a9ad1059e4d64@mail.gmail.com>

On 8/29/07, Barry Warsaw <barry at python.org> wrote:
> On Aug 29, 2007, at 11:28 PM, Guido van Rossum wrote:
> > No, I don't think I can recover the changes. Would it work to just
> > copy the files over from the sandbox, forcing Lib/email in the py3k
> > branch to be identical to emailpkg/5_0-exp/email in the sandbox?
>
> Yes, that /should/ work.  I'll lose my last commit to the py3k branch
> but that will be easy to recover.  I'm going to sleep now so if you
> get to it before I wake up I won't do it in the morning before you
> wake up.  And vice versa (or something like that :).

OK, I did that. However (despite my promise in the checkin msg) I
couldn't re-apply the changes which you applied to the py3k branch. Can
you reconstruct these yourself when you get up in the morning? I'm
afraid I'll just break more stuff.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at v.loewis.de  Thu Aug 30 07:35:32 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 30 Aug 2007 07:35:32 +0200
Subject: [Python-3000] Need Windows build instructions
In-Reply-To: <ca471dc20708291618v523af477uec4bcee59cae14fd@mail.gmail.com>
References: <ca471dc20708291618v523af477uec4bcee59cae14fd@mail.gmail.com>
Message-ID: <46D65724.7000602@v.loewis.de>

> Can someone familiar with building Py3k on Windows add a section on
> how to build it on Windows to the new README? 

Done, by pointing to PCbuild/readme.txt.

Regards,
Martin

From theller at ctypes.org  Thu Aug 30 08:08:40 2007
From: theller at ctypes.org (Thomas Heller)
Date: Thu, 30 Aug 2007 08:08:40 +0200
Subject: [Python-3000] buildbots
In-Reply-To: <m2sl62f4a3.fsf@valheru.db3l.homeip.net>
References: <46D453E9.4020903@ctypes.org>
	<46D45CCA.3050206@v.loewis.de>	<46D462EB.4070600@ctypes.org>
	<46D4721A.2040208@ctypes.org>	<46D57D47.1090709@v.loewis.de>
	<fb4ecl$t5a$1@sea.gmane.org>	<46D5D22C.3010003@v.loewis.de>	<m2wsvef5bu.fsf@valheru.db3l.homeip.net>
	<m2sl62f4a3.fsf@valheru.db3l.homeip.net>
Message-ID: <fb5mt8$5n4$1@sea.gmane.org>

David Bolen wrote:
> David Bolen <db3l.net at gmail.com> writes:
> 
>> "Martin v. Löwis" <martin at v.loewis.de> writes:
>>
>>>> These messageboxes of course hang the tests on the windows build servers,
>>>> so probably it would be good if they could be disabled completely.
>>>
>>> I think this will be very difficult to achieve.
>>
>> Could the tests be run beneath a shim process that used SetErrorMode()
>> to disable all the OS-based process failure dialog boxes?  If I
>> remember correctly the error mode is inherited, so an independent
>> small exec module could reset the mode, and execute the normal test
>> sequence as a child process.
> 
> Or if using ctypes is ok, perhaps it could be done right in the test
> runner.
> 
> While I haven't done any local mods to prevent the C RTL boxes,
> selecting Ignore on them gets me to the OS level box, and:
> 
> Python 3.0x (py3k, Aug 27 2007, 22:44:06) [MSC v.1310 32 bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
>>>> import test_os
> [50256 refs]
>>>> test_os.test_main()
> 
> dies with popup in test_execvpe_with_bad_program (test_os.ExecTests).  But
> 
> Python 3.0x (py3k, Aug 27 2007, 22:44:06) [MSC v.1310 32 bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
>>>> import ctypes
> [39344 refs]
>>>> ctypes.windll.kernel32.SetErrorMode(7)
> 0
> [40694 refs]
>>>> import test_os
> [55893 refs]
>>>> test_os.test_main()
> 
> doesn't present the OS popup prior to process exit.

Cool, this works!

I suggest to apply this patch, which sets an environment variable in the
Tools\buildbot\test.bat script, detects the Windows debug build, and calls
SetErrorMode(7) as David suggested:

Index: Lib/test/regrtest.py
===================================================================
--- Lib/test/regrtest.py	(revision 57666)
+++ Lib/test/regrtest.py	(working copy)
@@ -208,6 +208,15 @@
     flags on the command line.
     """
 
+    if sys.platform == "win32":
+        if "_d.pyd" in [s[0] for s in imp.get_suffixes()]:
+            # running in a debug build.
+            if os.environ.get("PYTEST_NONINTERACTIVE", ""):
+                # If the PYTEST_NONINTERACTIVE environment variable is
+                # set, we do not want any message boxes.
+                import ctypes
+                ctypes.windll.kernel32.SetErrorMode(7)
+
     test_support.record_original_stdout(sys.stdout)
     try:
         opts, args = getopt.getopt(sys.argv[1:], 'dhvgqxsS:rf:lu:t:TD:NLR:wM:',
Index: Tools/buildbot/test.bat
===================================================================
--- Tools/buildbot/test.bat	(revision 57666)
+++ Tools/buildbot/test.bat	(working copy)
@@ -1,3 +1,4 @@
 @rem Used by the buildbot "test" step.
 cd PCbuild
+set PYTEST_NONINTERACTIVE=1
 call rt.bat -d -q -uall -rw



Thomas
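
For readers unfamiliar with the magic number, a hedged sketch of what
SetErrorMode(7) stands for. The flag values are the documented Win32
constants; the helper name `suppress_os_error_dialogs` is made up for this
example, and the call is skipped on non-Windows platforms.

```python
import ctypes
import sys

# Win32 error-mode flags (values from the Windows SDK headers).
SEM_FAILCRITICALERRORS = 0x0001      # no critical-error handler message box
SEM_NOGPFAULTERRORBOX = 0x0002       # no crash (GP fault) dialog
SEM_NOALIGNMENTFAULTEXCEPT = 0x0004  # fix alignment faults silently

# The literal 7 used in the patch is these three flags combined.
QUIET_MODE = (
    SEM_FAILCRITICALERRORS
    | SEM_NOGPFAULTERRORBOX
    | SEM_NOALIGNMENTFAULTEXCEPT
)


def suppress_os_error_dialogs():
    """Set the quiet error mode; return the previous mode, or None off Windows."""
    if sys.platform != "win32":
        return None
    return ctypes.windll.kernel32.SetErrorMode(QUIET_MODE)
```

As David noted, the mode is inherited by child processes, so calling this
once in the test runner should also cover tests that spawn subprocesses.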


From theller at ctypes.org  Thu Aug 30 08:21:37 2007
From: theller at ctypes.org (Thomas Heller)
Date: Thu, 30 Aug 2007 08:21:37 +0200
Subject: [Python-3000] buildbots
In-Reply-To: <fb5mt8$5n4$1@sea.gmane.org>
References: <46D453E9.4020903@ctypes.org>	<46D45CCA.3050206@v.loewis.de>	<46D462EB.4070600@ctypes.org>	<46D4721A.2040208@ctypes.org>	<46D57D47.1090709@v.loewis.de>	<fb4ecl$t5a$1@sea.gmane.org>	<46D5D22C.3010003@v.loewis.de>	<m2wsvef5bu.fsf@valheru.db3l.homeip.net>	<m2sl62f4a3.fsf@valheru.db3l.homeip.net>
	<fb5mt8$5n4$1@sea.gmane.org>
Message-ID: <fb5nlh$5n4$2@sea.gmane.org>

Thomas Heller wrote:
> 
> I suggest to apply this patch, which sets an environment variable in the
> Tools\buildbot\test.bat script, detects the Windows debug build, and calls
> SetErrorMode(7) as David suggested:

If no one objects, I would like to apply this patch first, see if it avoids
the test_os.py test hanging, and afterwards fix the test_os test.

Thomas

> 
> Index: Lib/test/regrtest.py
> ===================================================================
> --- Lib/test/regrtest.py	(revision 57666)
> +++ Lib/test/regrtest.py	(working copy)
> @@ -208,6 +208,15 @@
>      flags on the command line.
>      """
>  
> +    if sys.platform == "win32":
> +        if "_d.pyd" in [s[0] for s in imp.get_suffixes()]:
> +            # running in a debug build.
> +            if os.environ.get("PYTEST_NONINTERACTIVE", ""):
> +                # If the PYTEST_NONINTERACTIVE environment variable is
> +                # set, we do not want any message boxes.
> +                import ctypes
> +                ctypes.windll.kernel32.SetErrorMode(7)
> +
>      test_support.record_original_stdout(sys.stdout)
>      try:
>          opts, args = getopt.getopt(sys.argv[1:], 'dhvgqxsS:rf:lu:t:TD:NLR:wM:',
> Index: Tools/buildbot/test.bat
> ===================================================================
> --- Tools/buildbot/test.bat	(revision 57666)
> +++ Tools/buildbot/test.bat	(working copy)
> @@ -1,3 +1,4 @@
>  @rem Used by the buildbot "test" step.
>  cd PCbuild
> +set PYTEST_NONINTERACTIVE=1
>  call rt.bat -d -q -uall -rw
> 


From martin at v.loewis.de  Thu Aug 30 08:26:33 2007
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Thu, 30 Aug 2007 08:26:33 +0200
Subject: [Python-3000] buildbots
In-Reply-To: <fb5mt8$5n4$1@sea.gmane.org>
References: <46D453E9.4020903@ctypes.org>	<46D45CCA.3050206@v.loewis.de>	<46D462EB.4070600@ctypes.org>	<46D4721A.2040208@ctypes.org>	<46D57D47.1090709@v.loewis.de>	<fb4ecl$t5a$1@sea.gmane.org>	<46D5D22C.3010003@v.loewis.de>	<m2wsvef5bu.fsf@valheru.db3l.homeip.net>	<m2sl62f4a3.fsf@valheru.db3l.homeip.net>
	<fb5mt8$5n4$1@sea.gmane.org>
Message-ID: <46D66319.7030209@v.loewis.de>

> I suggest to apply this patch, which sets an environment variable in the
> Tools\buildbot\test.bat script, detects the Windows debug build, and calls
> SetErrorMode(7) as David suggested:

Sounds fine with me - although I would leave out the test for debug
build, and just check the environment variable.

Are you saying that calling SetErrorMode also makes the VC _ASSERT
message boxes go away?

Regards,
Martin
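
A sketch of the simplified check, with the debug-build test dropped and
only the environment variable consulted. The function name
`wants_noninteractive` is hypothetical; the variable name is the one used
in Thomas's patch.

```python
import os


def wants_noninteractive(environ=None):
    """Return True when PYTEST_NONINTERACTIVE is set to a non-empty value."""
    if environ is None:
        environ = os.environ
    # An unset variable and an empty one are treated alike, as in the patch.
    return bool(environ.get("PYTEST_NONINTERACTIVE", ""))
```

Passing the mapping as a parameter keeps the check easy to test without
modifying the real process environment.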

From db3l.net at gmail.com  Thu Aug 30 08:32:40 2007
From: db3l.net at gmail.com (David Bolen)
Date: Thu, 30 Aug 2007 02:32:40 -0400
Subject: [Python-3000] buildbots
References: <46D453E9.4020903@ctypes.org> <46D45CCA.3050206@v.loewis.de>
	<46D462EB.4070600@ctypes.org> <46D4721A.2040208@ctypes.org>
	<46D57D47.1090709@v.loewis.de> <fb4ecl$t5a$1@sea.gmane.org>
	<46D5D22C.3010003@v.loewis.de>
	<m2wsvef5bu.fsf@valheru.db3l.homeip.net>
	<m2sl62f4a3.fsf@valheru.db3l.homeip.net>
	<fb5mt8$5n4$1@sea.gmane.org> <46D66319.7030209@v.loewis.de>
Message-ID: <m2hcmhfs6v.fsf@valheru.db3l.homeip.net>

"Martin v. Löwis" <martin at v.loewis.de> writes:

> Are you saying that calling SetErrorMode also makes the VC _ASSERT
> message boxes go away?

I don't believe it should, no.  The assert message boxes are from the VC
runtime, whereas the OS error dialogs are from, well, the OS :-)

Certainly in my manual tests, I still had to "Ignore" my way through the
assert dialogs before checking the results on the OS dialogs.

-- David


From theller at ctypes.org  Thu Aug 30 08:40:05 2007
From: theller at ctypes.org (Thomas Heller)
Date: Thu, 30 Aug 2007 08:40:05 +0200
Subject: [Python-3000] buildbots
In-Reply-To: <46D66319.7030209@v.loewis.de>
References: <46D453E9.4020903@ctypes.org>	<46D45CCA.3050206@v.loewis.de>	<46D462EB.4070600@ctypes.org>	<46D4721A.2040208@ctypes.org>	<46D57D47.1090709@v.loewis.de>	<fb4ecl$t5a$1@sea.gmane.org>	<46D5D22C.3010003@v.loewis.de>	<m2wsvef5bu.fsf@valheru.db3l.homeip.net>	<m2sl62f4a3.fsf@valheru.db3l.homeip.net>	<fb5mt8$5n4$1@sea.gmane.org>
	<46D66319.7030209@v.loewis.de>
Message-ID: <fb5oo5$a6e$1@sea.gmane.org>

Martin v. Löwis wrote:
>> I suggest to apply this patch, which sets an environment variable in the
>> Tools\buildbot\test.bat script, detects the Windows debug build, and calls
>> SetErrorMode(7) as David suggested:
> 
> Sounds fine with me - although I would leave out the test for debug
> build, and just check the environment variable.
> 
> Are you saying that calling SetErrorMode also makes the VC _ASSERT
> message boxes go away?

No. My mistake - I still had some _CrtSetReport... calls in a patched
posixmodule.c.

New patch (still detects the debug build because the name of the C runtime
dll depends on it):


Index: Lib/test/regrtest.py
===================================================================
--- Lib/test/regrtest.py	(revision 57666)
+++ Lib/test/regrtest.py	(working copy)
@@ -208,6 +208,22 @@
     flags on the command line.
     """
 
+    if sys.platform == "win32":
+        import imp
+        if "_d.pyd" in [s[0] for s in imp.get_suffixes()]:
+            # running in a debug build.
+            if os.environ.get("PYTEST_NONINTERACTIVE", ""):
+                # If the PYTEST_NONINTERACTIVE environment variable is
+                # set, we do not want any message boxes.
+                import ctypes
+                # from <crtdbg.h>
+                _CRT_ASSERT = 2
+                _CRTDBG_MODE_FILE = 1
+                _CRTDBG_FILE_STDERR = -5
+                ctypes.cdll.msvcr71d._CrtSetReportMode(_CRT_ASSERT, _CRTDBG_MODE_FILE)
+                ctypes.cdll.msvcr71d._CrtSetReportFile(_CRT_ASSERT, _CRTDBG_FILE_STDERR)
+                ctypes.windll.kernel32.SetErrorMode(7)
+
     test_support.record_original_stdout(sys.stdout)
     try:
         opts, args = getopt.getopt(sys.argv[1:], 'dhvgqxsS:rf:lu:t:TD:NLR:wM:',
Index: Tools/buildbot/test.bat
===================================================================
--- Tools/buildbot/test.bat	(revision 57666)
+++ Tools/buildbot/test.bat	(working copy)
@@ -1,3 +1,4 @@
 @rem Used by the buildbot "test" step.
 cd PCbuild
+set PYTEST_NONINTERACTIVE=1
 call rt.bat -d -q -uall -rw


From nnorwitz at gmail.com  Thu Aug 30 09:00:59 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Thu, 30 Aug 2007 00:00:59 -0700
Subject: [Python-3000] current status
Message-ID: <ee2a432c0708300000p178194cbyb0c354be4437ec51@mail.gmail.com>

There are 6 tests that fail on all platforms AFAIK:

3 tests failed:
    test_mailbox test_old_mailbox test_unicode_file
3 skips unexpected on linux2:
    test_smtplib test_sundry test_ssl

I believe test_smtplib, test_sundry fail for the same reason at least
partially.  They can't import email.base64mime.encode.  There are
decode functions, but encode is gone from base64mime.  I don't know if
that's the way it's supposed to be or not.  But smtplib can't be
imported because encode is missing.

Some of the failures in test_mailbox and test_old_mailbox are the
same, but I think test_mailbox might have more problems.

I hopefully fixed some platform specific problems, but others remain:

* test_normalization fails on several boxes (where locale is not C maybe?)

* On ia64, test_tarfile.PAXUnicodeTest.test_utf7_filename generates
this exception:
Objects/exceptions.c:1392: PyUnicodeDecodeError_Create: Assertion
`start < 2147483647' failed.

* On ia64 and Win64 (IIRC), this fails:  self.assertEqual(round(1e20), 1e20)
AssertionError: 0 != 1e+20

* On PPC64, all the dbm code seems to be crashing

* File "Lib/test/test_nis.py", line 27, in test_maps
    if nis.match(k, nismap) != v:
SystemError: can't use str as char buffer

* On Solaris, hashlib can't import _md5 which creates a bunch of problems.

* On Win64, there's this assert:
   SystemError: Objects\longobject.c:412: bad argument to internal function
I don't see how it's getting triggered based on the traceback though

Win64 has a bunch of weird issues:

http://python.org/dev/buildbot/3.0/amd64%20XP%203.0/builds/40/step-test/0

n
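
The round() failure above reduces to a very small check. A sketch,
assuming an interpreter where one-argument round() on a float returns an
integer; on a healthy build the comparison should hold, whereas the
affected buildbots report 0.

```python
value = 1e20

# 1e20 is exactly representable as a float, so rounding should not change it.
rounded = round(value)
matches = (rounded == value)
```

Running just these lines on ia64 or Win64 may be a quicker way to
reproduce the problem than the full test suite.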

From martin at v.loewis.de  Thu Aug 30 09:01:29 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 30 Aug 2007 09:01:29 +0200
Subject: [Python-3000] buildbots
In-Reply-To: <fb5oo5$a6e$1@sea.gmane.org>
References: <46D453E9.4020903@ctypes.org>	<46D45CCA.3050206@v.loewis.de>	<46D462EB.4070600@ctypes.org>	<46D4721A.2040208@ctypes.org>	<46D57D47.1090709@v.loewis.de>	<fb4ecl$t5a$1@sea.gmane.org>	<46D5D22C.3010003@v.loewis.de>	<m2wsvef5bu.fsf@valheru.db3l.homeip.net>	<m2sl62f4a3.fsf@valheru.db3l.homeip.net>	<fb5mt8$5n4$1@sea.gmane.org>	<46D66319.7030209@v.loewis.de>
	<fb5oo5$a6e$1@sea.gmane.org>
Message-ID: <46D66B49.8070600@v.loewis.de>

> New patch (still detects the debug build because the name of the C runtime
> dll depends on it):

I know it will be difficult to talk you into not using ctypes :-),
but...

I think this should go into the interpreter code itself. One problem
with your patch is that it breaks if Python is built with VC8.

It should still require an environment variable, say
PYTHONNOERRORWINDOW. Whether it should be considered only
in debug releases, I don't know. One place to put it would be
Modules/main.c (where all the other environment variables are
considered).

Regards,
Martin

From talin at acm.org  Thu Aug 30 09:03:30 2007
From: talin at acm.org (Talin)
Date: Thu, 30 Aug 2007 00:03:30 -0700
Subject: [Python-3000] string.Formatter class
In-Reply-To: <46D4E8F6.30508@trueblade.com>
References: <46D40B88.4080202@trueblade.com>	<fb6fbf560708281607n377a1513m2a191ad569610af2@mail.gmail.com>	<46D4AD40.9070006@trueblade.com>
	<46D4E8F6.30508@trueblade.com>
Message-ID: <46D66BC2.3060708@acm.org>

Eric Smith wrote:
> Eric Smith wrote:
>> Jim Jewett wrote:
> 
>>> but you might want to take inspiration from the "tail" of an
>>> elementtree node, and return the field with the literal next to it as
>>> a single object.
>>>
>>>     (literal_text, field_name, format_spec, conversion)
>> I think I like that best.
> 
> I implemented this in r57641.  I think it simplifies things.  At least,
> it's easier to explain.

Actually...I'm in the middle of writing the docs for the reference 
manual, and I'm finding this a little harder to explain. Not *much* 
harder, but a little bit.

I would probably have gone with one of the following:

    # Test for str vs tuple
    literal_text
    (field_name, format_spec, conversion)

    # Test for length of the tuple
    (literal_text)
    (field_name, format_spec, conversion)

    # Test for 'None' format_spec
    (literal_text, None, None)
    (field_name, format_spec, conversion)

However, I'm not adamant about this - it's up to you what you like best, 
I'll come up with a way to explain it. Also I recognize that your method 
is probably more efficient for the nominal use case -- less tuple creation.

Also I wanted to ask: How about making the built-in 'format' function 
have a default value of "" for the second argument? So I can just say:

    format(x)

as a synonym for:

    str(x)
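
A quick sketch of the proposed equivalence, assuming an interpreter in
which the second argument of format() already defaults to the empty
string. The sample values are arbitrary.

```python
samples = [42, 3.5, "text", None, [1, 2]]

# With the default in place, format(x) should match str(x) ...
same_default = [format(x) == str(x) for x in samples]

# ... exactly as the explicit empty format spec does.
same_explicit = [format(x, "") == str(x) for x in samples]
```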

> Due to an optimization dealing with escaped braces, it's possible for
> (literal, None, None, None) to be returned more than once.  I don't
> think that's a problem, as long as it's documented.  If you look at
> string.py's Formatter.vformat, I don't think it complicates the
> implementation at all.

It's also possible for the literal text to be an empty string if you 
have several consecutive format fields - correct?

> Thanks for the suggestion.
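
To make the tuple shapes concrete, a sketch that assumes
string.Formatter.parse() yields the four-element form discussed above,
(literal_text, field_name, format_spec, conversion). The helper name
`parse` and the sample format strings are only for this example.

```python
from string import Formatter


def parse(format_string):
    """Return the parsed pieces of format_string as a list of 4-tuples."""
    return list(Formatter().parse(format_string))


mixed = parse("a{0}b{1!r:>5}c")   # literals interleaved with fields
adjacent = parse("{0}{1}")        # consecutive fields, empty literals
escaped = parse("{{x}}")          # escaped braces, literal text only
```

In `adjacent` each field arrives with an empty literal in front of it, and
in `escaped` the literal text may come back in more than one piece, which
is the repeated (literal, None, None, None) case Eric mentions.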


From theller at ctypes.org  Thu Aug 30 09:25:06 2007
From: theller at ctypes.org (Thomas Heller)
Date: Thu, 30 Aug 2007 09:25:06 +0200
Subject: [Python-3000] buildbots
In-Reply-To: <46D66B49.8070600@v.loewis.de>
References: <46D453E9.4020903@ctypes.org>	<46D45CCA.3050206@v.loewis.de>	<46D462EB.4070600@ctypes.org>	<46D4721A.2040208@ctypes.org>	<46D57D47.1090709@v.loewis.de>	<fb4ecl$t5a$1@sea.gmane.org>	<46D5D22C.3010003@v.loewis.de>	<m2wsvef5bu.fsf@valheru.db3l.homeip.net>	<m2sl62f4a3.fsf@valheru.db3l.homeip.net>	<fb5mt8$5n4$1@sea.gmane.org>	<46D66319.7030209@v.loewis.de>	<fb5oo5$a6e$1@sea.gmane.org>
	<46D66B49.8070600@v.loewis.de>
Message-ID: <fb5rcj$ig9$1@sea.gmane.org>

Martin v. Löwis wrote:
>> New patch (still detects the debug build because the name of the C runtime
>> dll depends on it):
> 
> I know it will be difficult to talk you into not using ctypes :-),
> but...
> 
> I think this should go into the interpreter code itself. One problem
> with your patch is that it breaks if Python is built with VC8.

ctypes isn't perfect - it needs a way to reliably access the currently used
C runtime library on windows.  But that is off-topic for this thread.

> It should still require an environment variable, say
> PYTHONNOERRORWINDOW, whether or not it should be considered only
> in debug releases, I don't know. One place to put it would be
> Modules/main.c (where all the other environment variables are
> considered).

IMO all this is currently a hack for the buildbot only.  Maybe it should be
converted into something more useful.


About debug release:  The _CrtSetReport... functions are only available
in the debug library.  So, they have to live inside a #ifdef _DEBUG/#endif
block.

The set_error_mode() function is more useful; AFAIK it also prevents
a dialog box from being shown when an extension module cannot be loaded
because it depends on a dll that is not found or doesn't have the
entry points that the extension links to.

So, an environment variable would be useful, but maybe there should also be
a Python function available that calls set_error_mode().  sys.set_error_mode()?

Thomas


From martin at v.loewis.de  Thu Aug 30 09:39:06 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 30 Aug 2007 09:39:06 +0200
Subject: [Python-3000] buildbots
In-Reply-To: <fb5rcj$ig9$1@sea.gmane.org>
References: <46D453E9.4020903@ctypes.org>	<46D45CCA.3050206@v.loewis.de>	<46D462EB.4070600@ctypes.org>	<46D4721A.2040208@ctypes.org>	<46D57D47.1090709@v.loewis.de>	<fb4ecl$t5a$1@sea.gmane.org>	<46D5D22C.3010003@v.loewis.de>	<m2wsvef5bu.fsf@valheru.db3l.homeip.net>	<m2sl62f4a3.fsf@valheru.db3l.homeip.net>	<fb5mt8$5n4$1@sea.gmane.org>	<46D66319.7030209@v.loewis.de>	<fb5oo5$a6e$1@sea.gmane.org>	<46D66B49.8070600@v.loewis.de>
	<fb5rcj$ig9$1@sea.gmane.org>
Message-ID: <46D6741A.9040801@v.loewis.de>

> So, an environment variable would be useful, but maybe there should also be
> a Python function available that calls set_error_mode().  sys.set_error_mode()?

Even though this would be somewhat lying - I'd put it into
msvcrt.set_error_mode. For the _CrtSet functions, one might
expose them as-is; they do belong to msvcrt, so the module
would be the proper place. For SetErrorMode, still put it
into msvcrt - it's at least Windows-specific.

Regards,
Martin
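
A sketch of how a test runner might use such a function if msvcrt grew
one. The attribute name `SetErrorMode` is an assumption of this example,
so it is looked up with getattr and the sketch does nothing where the
name (or Windows itself) is absent. `quiet_windows_dialogs` is a made-up
helper name.

```python
import sys


def quiet_windows_dialogs(mode=7):
    """Try to silence OS error dialogs; return True if a call was made."""
    if sys.platform != "win32":
        return False
    import msvcrt  # Windows-only module, imported lazily
    # Assumed name: fall back to doing nothing if msvcrt lacks it.
    set_error_mode = getattr(msvcrt, "SetErrorMode", None)
    if set_error_mode is None:
        return False
    set_error_mode(mode)
    return True


applied = quiet_windows_dialogs()
```

Keeping the lookup defensive means the same runner code works on release
builds, on other platforms, and on interpreters without the function.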

From martin at v.loewis.de  Thu Aug 30 10:38:08 2007
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Thu, 30 Aug 2007 10:38:08 +0200
Subject: [Python-3000] refleak in test_io?
In-Reply-To: <9e804ac0708291656r13c6b39cg19545e4f5a17e12d@mail.gmail.com>
References: <9e804ac0708291656r13c6b39cg19545e4f5a17e12d@mail.gmail.com>
Message-ID: <46D681F0.2050105@v.loewis.de>

> I tried recreating the leak with more controllable types, but I haven't
> got very far. It seems to be caused by some weird interaction between
> io.FileIO, _fileio._FileIO and io.IOBase, specifically
> io.IOBase.__del__() calling self.close(), and io.FileIO.close() calling
> _fileio._FileIO.close() *and* io.RawIOBase.close(). The weird thing is
> that the contents of RawIOBase.close() don't matter. The mere act of
> calling RawIOBase.close(self) causes the leak. Remove the call, or
> change it into an attribute fetch, and the leak is gone. I'm stumped.

I think the problem is that the class remains referenced in
io.RawIOBase._abc_cache:

py> io.RawIOBase._abc_cache
set()
py> class f(io.RawIOBase):pass
...
py> isinstance(f(), io.RawIOBase)
True
py> io.RawIOBase._abc_cache
{<class '__main__.f'>}
py> del f
py> io.RawIOBase._abc_cache
{<class '__main__.f'>}

Each time test_destructor is called, another class will be added to
_abc_cache.

Regards,
Martin

From eric+python-dev at trueblade.com  Thu Aug 30 12:55:18 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Thu, 30 Aug 2007 06:55:18 -0400
Subject: [Python-3000] string.Formatter class
In-Reply-To: <46D66BC2.3060708@acm.org>
References: <46D40B88.4080202@trueblade.com>	<fb6fbf560708281607n377a1513m2a191ad569610af2@mail.gmail.com>	<46D4AD40.9070006@trueblade.com>
	<46D4E8F6.30508@trueblade.com> <46D66BC2.3060708@acm.org>
Message-ID: <46D6A216.9010104@trueblade.com>

Talin wrote:
> Eric Smith wrote:
>> Eric Smith wrote:
>>> Jim Jewett wrote:
>>
>>>> but you might want to take inspiration from the "tail" of an
>>>> elementtree node, and return the field with the literal next to it as
>>>> a single object.
>>>>
>>>>     (literal_text, field_name, format_spec, conversion)
>>> I think I like that best.
>>
>> I implemented this in r57641.  I think it simplifies things.  At least,
>> it's easier to explain.
> 
> Actually...I'm in the middle of writing the docs for the reference 
> manual, and I'm finding this a little harder to explain. Not *much* 
> harder, but a little bit.

I think it's easier because it's always:
Output the (possibly zero length) literal text
then, format and output the field, if field_name is non-None

But I'm flexible.
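
As a rough illustration of that loop (a sketch only, assuming parse()
yields the four-tuple form discussed here; the render() helper and its
positional-only field lookup are made up for this example and are not
the actual vformat code):

```python
from string import Formatter


def render(format_string, *args):
    """Hypothetical helper: expand positional fields using parse()."""
    out = []
    for literal, field_name, format_spec, conversion in Formatter().parse(format_string):
        # Step 1: always output the (possibly zero-length) literal text.
        out.append(literal)
        # Step 2: format and output the field, if there is one.
        if field_name is not None:
            value = args[int(field_name)]  # assumption: plain numeric fields only
            if conversion == 'r':
                value = repr(value)
            elif conversion == 's':
                value = str(value)
            out.append(format(value, format_spec))
    return ''.join(out)


result = render("a{0}b{1!r}", 1, "x")
```

Each tuple is handled the same way, so the caller never has to test
what kind of item it received.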

> I would probably have gone with one of the following:
> 
>    # Test for str vs tuple
>    literal_text
>    (field_name, format_spec, conversion)
> 
>    # Test for length of the tuple
>    (literal_text)
>    (field_name, format_spec, conversion)
> 
>    # Test for 'None' format_spec
>    (literal_text, None, None)
>    (field_name, format_spec, conversion)

If you want to change, I'd go with this last one.  Actually, I had it 
working this way, once, but I thought that re-using the first item 
(which I called literal_or_field_name) was too obscure.

> However, I'm not adamant about this - it's up to you what you like best, 
> I'll come up with a way to explain it. Also I recognize that your method 
> is probably more efficient for the nominal use case -- less tuple creation.

Also, it requires fewer iterations.  Instead of 2 iterations per 
field_name in the string:
   yield literal
   yield field_name, format_spec, conversion

it's just one:
   yield literal, field_name, format_spec, conversion

Like you, I don't feel strongly about which way it works.  But Jim's 
suggestion that it's how elementtree works sort of convinced me.

> Also I wanted to ask: How about making the built-in 'format' function 
> have a default value of "" for the second argument? So I can just say:
> 
>    format(x)
> 
> as a synonym for:
> 
>    str(x)

Makes sense to me.  It would really call x.__format__(''), which the PEP 
suggests (but does not require) be the same as str(x).
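
A small sketch of the equivalence being discussed, assuming format()
accepts a single argument as proposed and that the class does not
override __format__ (Version is just an illustrative class):

```python
class Version:
    """Illustrative class that defines __str__ but not __format__."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def __str__(self):
        return "%d.%d" % (self.major, self.minor)


v = Version(3, 0)

# With an empty format spec, the default __format__ falls back to str().
same_default = format(v) == str(v)
same_explicit = format(v, "") == str(v)
```

A type that defines its own __format__ is free to return something
else for the empty spec, which is why this is a suggestion and not a
requirement.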

>> Due to an optimization dealing with escaped braces, it's possible for
>> (literal, None, None, None) to be returned more than once.  I don't
>> think that's a problem, as long as it's documented.  If you look at
>> string.py's Formatter.vformat, I don't think it complicates the
>> implementation at all.
> 
> It's also possible for the literal text to be an empty string if you 
> have several consecutive format fields - correct?

Correct.  The literal text will always be a zero-or-greater length string.


From eric+python-dev at trueblade.com  Thu Aug 30 13:05:45 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Thu, 30 Aug 2007 07:05:45 -0400
Subject: [Python-3000] buildbots
In-Reply-To: <46D66B49.8070600@v.loewis.de>
References: <46D453E9.4020903@ctypes.org>	<46D45CCA.3050206@v.loewis.de>	<46D462EB.4070600@ctypes.org>	<46D4721A.2040208@ctypes.org>	<46D57D47.1090709@v.loewis.de>	<fb4ecl$t5a$1@sea.gmane.org>	<46D5D22C.3010003@v.loewis.de>	<m2wsvef5bu.fsf@valheru.db3l.homeip.net>	<m2sl62f4a3.fsf@valheru.db3l.homeip.net>	<fb5mt8$5n4$1@sea.gmane.org>	<46D66319.7030209@v.loewis.de>	<fb5oo5$a6e$1@sea.gmane.org>
	<46D66B49.8070600@v.loewis.de>
Message-ID: <46D6A489.9010206@trueblade.com>

Martin v. Löwis wrote:
>> New patch (still detects the debug build because the name of the C runtime
>> dll depends on it):
> 
> I know it will be difficult to talk you into not using ctypes :-),
> but...
> 
> I think this should go into the interpreter code itself. One problem
> with your patch is that it breaks if Python is built with VC8.
> 
> It should still require an environment variable, say
> PYTHONNOERRORWINDOW, whether or not it should be considered only
> in debug releases, I don't know. One place to put it would be
> Modules/main.c (where all the other environment variables are
> considered).

It should also not be used with pythonw.exe, correct?  In that case, you 
want the various dialog boxes.


From rrr at ronadam.com  Thu Aug 30 13:12:13 2007
From: rrr at ronadam.com (Ron Adam)
Date: Thu, 30 Aug 2007 06:12:13 -0500
Subject: [Python-3000] string.Formatter class
In-Reply-To: <46D40B88.4080202@trueblade.com>
References: <46D40B88.4080202@trueblade.com>
Message-ID: <46D6A60D.2070503@ronadam.com>



Eric Smith wrote:
> One of the things that PEP 3101 deliberately under specifies is the 
> Formatter class, leaving decisions up to the implementation.  Now that a 
> working implementation exists, I think it's reasonable to tighten it up.
> 
> I have checked in a Formatter class that specifies the following methods 
> (in addition to the ones already defined in the PEP):
> 
> parse(format_string)
> Loops over the format_string and returns an iterable of tuples 
> (literal_text, field_name, format_spec, conversion).  This is used by 
> vformat to break the string in to either literal text, or fields that 
> need expanding.  If literal_text is None, then expand (field_name, 
> format_spec, conversion) and append it to the output.  If literal_text 
> is not None, append it to the output.
> 
> get_field(field_name, args, kwargs, used_args)
> Given a field_name as returned by parse, convert it to an object to be 
> formatted.  The default version takes strings of the form defined in the 
> PEP, such as "0[name]" or "label.title".  It records which args have 
> been used in used_args.  args and kwargs are as passed in to vformat.

Rather than pass the used_args set out and have it modified in a different 
method, I think it would be better to pass the arg_used back along with 
the object.  That keeps all the code that is involved in checking used args 
in one method.  The arg_used value may be useful in other ways as well.

      obj, arg_used = self.get_field(field_name, args, kwargs)
      used_args.add(arg_used)
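
A runnable sketch of that calling pattern. It assumes a get_field()
that returns an (object, key) pair, which is how string.Formatter
behaves in later Python 3 releases; the sample args and kwargs are
invented for illustration:

```python
from string import Formatter
from types import SimpleNamespace

fmt = Formatter()
used_args = set()

# Illustrative inputs: one positional dict and one keyword object.
args = ({"name": "Ron"},)
kwargs = {"label": SimpleNamespace(title="Formatter")}

objects = []
for field_name in ("0[name]", "label.title"):
    # The caller, not get_field(), records which argument was consumed.
    obj, arg_used = fmt.get_field(field_name, args, kwargs)
    used_args.add(arg_used)
    objects.append(obj)
```

All bookkeeping for used arguments then lives in the calling method.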


> convert_field(value, conversion)
> Converts the value (returned by get_field) using the conversion 
> (returned by the parse tuple).  The default version understands 'r' 
> (repr) and 's' (str).


> Or, define your own conversion character:
> =================
> class XFormatter(Formatter):
>      def convert_field(self, value, conversion):
>          if conversion == 'x':
>              return None
>          if conversion == 'r':
>              return repr(value)
>          if conversion == 's':
>              return str(value)
>          return value
> fmt = XFormatter()
> print(fmt.format("{0!r}:{0!x}", fmt))
> =================
> which prints:
> <__main__.XFormatter object at 0xf6f6d2cc>:None

I wonder if this is splitting things up a bit too finely?  If the format 
function takes a conversion argument, it makes it possible to do everything 
by overriding format_field.

     def format_field(self, value, format_spec, conversion):
         return format(value, format_spec, conversion)


Adding this to Talin's suggestion, the signature of format could be...

     format(value, format_spec="", conversion="")


Then the above example becomes...

   class XFormatter(Formatter):
      def format_field(self, value, format_spec, conversion):
          if conversion == 'x':
              return "None"
          return format(value, format_spec, conversion)


It just seems cleaner to me.

Cheers,
    Ron





From martin at v.loewis.de  Thu Aug 30 13:24:34 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 30 Aug 2007 13:24:34 +0200
Subject: [Python-3000] buildbots
In-Reply-To: <46D6A489.9010206@trueblade.com>
References: <46D453E9.4020903@ctypes.org>	<46D45CCA.3050206@v.loewis.de>	<46D462EB.4070600@ctypes.org>	<46D4721A.2040208@ctypes.org>	<46D57D47.1090709@v.loewis.de>	<fb4ecl$t5a$1@sea.gmane.org>	<46D5D22C.3010003@v.loewis.de>	<m2wsvef5bu.fsf@valheru.db3l.homeip.net>	<m2sl62f4a3.fsf@valheru.db3l.homeip.net>	<fb5mt8$5n4$1@sea.gmane.org>	<46D66319.7030209@v.loewis.de>	<fb5oo5$a6e$1@sea.gmane.org>
	<46D66B49.8070600@v.loewis.de> <46D6A489.9010206@trueblade.com>
Message-ID: <46D6A8F2.2060106@v.loewis.de>

>> It should still require an environment variable, say
>> PYTHONNOERRORWINDOW, whether or not it should be considered only
>> in debug releases, I don't know. One place to put it would be
>> Modules/main.c (where all the other environment variables are
>> considered).
> 
> It should also not be used with pythonw.exe, correct?  In that case, you
> want the various dialog boxes.

I'm not sure. If PYTHONNOERRORWINDOW is set, I would expect that it does
not create error windows, even if it creates windows otherwise just
fine. If you don't want this, don't set PYTHONNOERRORWINDOW.

Regards,
Martin

From barry at python.org  Thu Aug 30 13:27:02 2007
From: barry at python.org (Barry Warsaw)
Date: Thu, 30 Aug 2007 07:27:02 -0400
Subject: [Python-3000] [Python-3000-checkins] r57691 -
	python/branches/py3k/Lib/email
In-Reply-To: <ca471dc20708292051i495487dbyb03a9ad1059e4d64@mail.gmail.com>
References: <20070830011514.A50B51E4002@bag.python.org>
	<5DF36BD5-BD4A-4935-A25A-6900D08194DD@python.org>
	<ca471dc20708291902h4835cbe2if4a8efb26518990f@mail.gmail.com>
	<8FE35331-8B63-421B-BBB9-044983EA5760@python.org>
	<ca471dc20708292021l19beaeeex891d710651c73c33@mail.gmail.com>
	<ca471dc20708292028s23124f88o798b3bf8d5665303@mail.gmail.com>
	<6AFC1C7C-6BF7-4057-9E53-76030FEB214C@python.org>
	<ca471dc20708292051i495487dbyb03a9ad1059e4d64@mail.gmail.com>
Message-ID: <F912696B-016E-4D75-9B3C-0703258B1F88@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 29, 2007, at 11:51 PM, Guido van Rossum wrote:

> On 8/29/07, Barry Warsaw <barry at python.org> wrote:
>> On Aug 29, 2007, at 11:28 PM, Guido van Rossum wrote:
>>> No, I don't think I can recover the changes. Would it work to just
>>> copy the files over from the sandbox, forcing Lib/email in the py3k
>>> branch to be identical to emailpkg/5_0-exp/email in the sandbox?
>>
>> Yes, that /should/ work.  I'll lose my last commit to the py3k branch
>> but that will be easy to recover.  I'm going to sleep now so if you
>> get to it before I wake up I won't do it in the morning before you
>> wake up.  And vice versa (or something like that :).
>
> OK, I did that. However (despite my promise in the checkin msg) I
> couldn't re-apply the changes which you applied to the py3k branch. Can
> you reconstruct these yourself when you get up in the morning? I'm
> afraid I'll just break more stuff.

Thanks Guido.  Yep, will do.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRtaph3EjvBPtnXfVAQIpxwQAnqjZnL7j7LjYTaSURvkAXdycju+IC3FS
p2jDnWAMA4TYEjEsyN/OEhaOMVhkPz7cEa+TYcEDe+toCkNHHq6rdEQH3ouI3y9n
mFgzEPHPu1GrhqJUp4hT4prUqU/oDbeRL9ulryTzv4JNCaIrZsmElscbUWWbrp3W
z4+LAJFD9mo=
=iAFM
-----END PGP SIGNATURE-----

From barry at python.org  Thu Aug 30 13:33:19 2007
From: barry at python.org (Barry Warsaw)
Date: Thu, 30 Aug 2007 07:33:19 -0400
Subject: [Python-3000] current status
In-Reply-To: <ee2a432c0708300000p178194cbyb0c354be4437ec51@mail.gmail.com>
References: <ee2a432c0708300000p178194cbyb0c354be4437ec51@mail.gmail.com>
Message-ID: <5BCDBECB-9509-4F76-A6D8-4DD53AEA5CC7@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 30, 2007, at 3:00 AM, Neal Norwitz wrote:

> There are 6 tests that fail on all platforms AFAIK:
>
> 3 tests failed:
>     test_mailbox test_old_mailbox test_unicode_file
> 3 skips unexpected on linux2:
>     test_smtplib test_sundry test_ssl
>
> I believe test_smtplib, test_sundry fail for the same reason at least
> partially.  They can't import email.base64mime.encode.  There are
> decode functions, but encode is gone from base64mime.  I don't know if
> that's the way it's supposed to be or not.  But smtplib can't be
> imported because encode is missing.

For now, I'll restore .encode() for a1 though it may eventually go away.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRtaq/3EjvBPtnXfVAQIkRQP/WA+alW/UqOxRABBfqOxIvjFsp0Yaif/w
HcJRIrDXeZmMFF5EYX3k2iwYkJ5vQoaEtL2fbPniOU4Vu5HdPBddctjo5yzBKmGE
PsRuHCk4Q+YXoOOxNN9/vqEZnhHPjho6CTZi6wGs08czF7JqqC2vzuFFF3Fn/Iks
X77MbAKUgqM=
=7pmB
-----END PGP SIGNATURE-----

From eric+python-dev at trueblade.com  Thu Aug 30 13:38:18 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Thu, 30 Aug 2007 07:38:18 -0400
Subject: [Python-3000] buildbots
In-Reply-To: <46D6A8F2.2060106@v.loewis.de>
References: <46D453E9.4020903@ctypes.org>	<46D45CCA.3050206@v.loewis.de>	<46D462EB.4070600@ctypes.org>	<46D4721A.2040208@ctypes.org>	<46D57D47.1090709@v.loewis.de>	<fb4ecl$t5a$1@sea.gmane.org>	<46D5D22C.3010003@v.loewis.de>	<m2wsvef5bu.fsf@valheru.db3l.homeip.net>	<m2sl62f4a3.fsf@valheru.db3l.homeip.net>	<fb5mt8$5n4$1@sea.gmane.org>	<46D66319.7030209@v.loewis.de>	<fb5oo5$a6e$1@sea.gmane.org>
	<46D66B49.8070600@v.loewis.de> <46D6A489.9010206@trueblade.com>
	<46D6A8F2.2060106@v.loewis.de>
Message-ID: <46D6AC2A.1080907@trueblade.com>

Martin v. Löwis wrote:
>>> It should still require an environment variable, say
>>> PYTHONNOERRORWINDOW, whether or not it should be considered only
>>> in debug releases, I don't know. One place to put it would be
>>> Modules/main.c (where all the other environment variables are
>>> considered).
>> It should also not be used with pythonw.exe, correct?  In that case, you
>> want the various dialog boxes.
> 
> I'm not sure. If PYTHONNOERRORWINDOW is set, I would expect that it does
> not create error windows, even if it creates windows otherwise just
> fine. If you don't want this, don't set PYTHONNOERRORWINDOW.

But unlike Unix, these text messages are guaranteed to be lost.  I don't 
see the point in setting up a situation where they'd be lost, but I 
don't feel that strongly about it.  As you say, don't set the 
environment variable.


From theller at ctypes.org  Thu Aug 30 13:36:11 2007
From: theller at ctypes.org (Thomas Heller)
Date: Thu, 30 Aug 2007 13:36:11 +0200
Subject: [Python-3000] buildbots
In-Reply-To: <46D6741A.9040801@v.loewis.de>
References: <46D453E9.4020903@ctypes.org>	<46D45CCA.3050206@v.loewis.de>	<46D462EB.4070600@ctypes.org>	<46D4721A.2040208@ctypes.org>	<46D57D47.1090709@v.loewis.de>	<fb4ecl$t5a$1@sea.gmane.org>	<46D5D22C.3010003@v.loewis.de>	<m2wsvef5bu.fsf@valheru.db3l.homeip.net>	<m2sl62f4a3.fsf@valheru.db3l.homeip.net>	<fb5mt8$5n4$1@sea.gmane.org>	<46D66319.7030209@v.loewis.de>	<fb5oo5$a6e$1@sea.gmane.org>	<46D66B49.8070600@v.loewis.de>	<fb5rcj$ig9$1@sea.gmane.org>
	<46D6741A.9040801@v.loewis.de>
Message-ID: <fb6a3c$1ls$1@sea.gmane.org>

Martin v. Löwis wrote:
>> So, an environment variable would be useful, but maybe there should also be
>> a Python function available that calls set_error_mode().  sys.set_error_mode()?
> 
> Even though this would be somewhat lying - I'd put it into
> msvcrt.set_error_mode. For the _CrtSet functions, one might
> expose them as-is; they do belong to msvcrt, so the module
> would be the proper place. For SetErrorMode, still put it
> into msvcrt - it's at least Windows-specific.

These are all great ideas, but I'm afraid it doesn't fit into the
time I have available to spend on this.  My primary goal is to
care about the buildbots.

Ok, so I'll keep clicking 'Abort' on the message boxes whenever I see them,
and I will soon try to fix the assertion in test_os.py.

Thomas


From martin at v.loewis.de  Thu Aug 30 13:52:54 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 30 Aug 2007 13:52:54 +0200
Subject: [Python-3000] buildbots
In-Reply-To: <46D6AC2A.1080907@trueblade.com>
References: <46D453E9.4020903@ctypes.org>	<46D45CCA.3050206@v.loewis.de>	<46D462EB.4070600@ctypes.org>	<46D4721A.2040208@ctypes.org>	<46D57D47.1090709@v.loewis.de>	<fb4ecl$t5a$1@sea.gmane.org>	<46D5D22C.3010003@v.loewis.de>	<m2wsvef5bu.fsf@valheru.db3l.homeip.net>	<m2sl62f4a3.fsf@valheru.db3l.homeip.net>	<fb5mt8$5n4$1@sea.gmane.org>	<46D66319.7030209@v.loewis.de>	<fb5oo5$a6e$1@sea.gmane.org>
	<46D66B49.8070600@v.loewis.de> <46D6A489.9010206@trueblade.com>
	<46D6A8F2.2060106@v.loewis.de> <46D6AC2A.1080907@trueblade.com>
Message-ID: <46D6AF96.9030404@v.loewis.de>

> But unlike Unix, these text messages are guaranteed to be lost.

Not really. We are probably talking about release builds primarily,
where the only such message is the system error (as the assertions
aren't compiled in, anyway). If such an error occurs, the message
is lost - but an error code is returned to the API function that
caused the error. This error code should then translate to a Python
exception (e.g. an ImportError if the dialog tried to say that
a DLL could not be loaded). Whether or not that exception then also
gets lost depends on the application.

Regards,
Martin

From db3l.net at gmail.com  Thu Aug 30 13:49:10 2007
From: db3l.net at gmail.com (David Bolen)
Date: Thu, 30 Aug 2007 07:49:10 -0400
Subject: [Python-3000] buildbots
References: <46D453E9.4020903@ctypes.org> <46D45CCA.3050206@v.loewis.de>
	<46D462EB.4070600@ctypes.org> <46D4721A.2040208@ctypes.org>
	<46D57D47.1090709@v.loewis.de> <fb4ecl$t5a$1@sea.gmane.org>
	<46D5D22C.3010003@v.loewis.de>
	<m2wsvef5bu.fsf@valheru.db3l.homeip.net>
	<m2sl62f4a3.fsf@valheru.db3l.homeip.net>
	<fb5mt8$5n4$1@sea.gmane.org> <46D66319.7030209@v.loewis.de>
	<fb5oo5$a6e$1@sea.gmane.org> <46D66B49.8070600@v.loewis.de>
	<fb5rcj$ig9$1@sea.gmane.org> <46D6741A.9040801@v.loewis.de>
Message-ID: <m2d4x5fdjd.fsf@valheru.db3l.homeip.net>

"Martin v. Löwis" <martin at v.loewis.de> writes:

>> So, an environment variable would be useful, but maybe there should also be
>> a Python function available that calls set_error_mode().  sys.set_error_mode()?
>
> Even though this would be somewhat lying - I'd put it into
> msvcrt.set_error_mode. For the _CrtSet functions, one might
> expose them as-is; they do belong to msvcrt, so the module
> would be the proper place. For SetErrorMode, still put it
> into msvcrt - it's at least Windows-specific.

For SetErrorMode, if you're just looking for a non-ctypes wrapping,
it's already covered by pywin32's win32api, which seems simple enough
to obtain (and likely to already be present) if you're working at this
level with Win32 calls as a user of Python.  Nor is ctypes very
complicated as a fallback.  I'm not sure this is a common enough
call to need a built-in wrapping.
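
For illustration, the ctypes fallback might look roughly like the
sketch below. SetErrorMode and the SEM_* flag values are the real
Win32 API; the PYTHONNOERRORWINDOW variable is only the name proposed
in this thread, and the suppress_error_dialogs() helper is made up
for this example. It does nothing on non-Windows platforms and does
not touch the CRT assertion dialogs:

```python
import os
import sys

# Win32 SetErrorMode flags.
SEM_FAILCRITICALERRORS = 0x0001
SEM_NOGPFAULTERRORBOX = 0x0002
SEM_NOOPENFILEERRORBOX = 0x8000


def suppress_error_dialogs(environ=os.environ):
    """Hypothetical helper: disable system error boxes when requested.

    Returns the previous error mode, or None if nothing was changed.
    """
    if sys.platform != "win32":
        return None
    # Assumption: opt in through the environment variable proposed above.
    if "PYTHONNOERRORWINDOW" not in environ:
        return None
    import ctypes
    mode = (SEM_FAILCRITICALERRORS
            | SEM_NOGPFAULTERRORBOX
            | SEM_NOOPENFILEERRORBOX)
    return ctypes.windll.kernel32.SetErrorMode(mode)


previous_mode = suppress_error_dialogs({})  # variable not set: no change
```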

For this particular case of wanting to use it when developing Python
itself, it actually feels a bit more appropriate to me to make the
call external to the Python executable under test since it's really a
behavior being imposed by the test environment.  If a mechanism was
implemented to have Python issue the call itself, I'd probably limit
it to this specific use case.

-- David


From thomas at python.org  Thu Aug 30 14:17:51 2007
From: thomas at python.org (Thomas Wouters)
Date: Thu, 30 Aug 2007 14:17:51 +0200
Subject: [Python-3000] refleak in test_io?
In-Reply-To: <46D681F0.2050105@v.loewis.de>
References: <9e804ac0708291656r13c6b39cg19545e4f5a17e12d@mail.gmail.com>
	<46D681F0.2050105@v.loewis.de>
Message-ID: <9e804ac0708300517y636fb92bsdee0c798458bea@mail.gmail.com>

On 8/30/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>
> > I tried recreating the leak with more controllable types, but I haven't
> > got very far. It seems to be caused by some weird interaction between
> > io.FileIO, _fileio._FileIO and io.IOBase, specifically
> > io.IOBase.__del__() calling self.close(), and io.FileIO.close() calling
> > _fileio._FileIO.close() *and* io.RawIOBase.close(). The weird thing is
> > that the contents of RawIOBase.close() don't matter. The mere act of
> > calling RawIOBase.close(self) causes the leak. Remove the call, or
> > change it into an attribute fetch, and the leak is gone. I'm stumped.
>
> I think the problem is that the class remains referenced in
> io.RawIOBase._abc_cache:
>
> py> io.RawIOBase._abc_cache
> set()
> py> class f(io.RawIOBase):pass
> ...
> py> isinstance(f(), io.RawIOBase)
> True
> py> io.RawIOBase._abc_cache
> {<class '__main__.f'>}
> py> del f
> py> io.RawIOBase._abc_cache
> {<class '__main__.f'>}
>
> Each time test_destructor is called, another class will be added to
> _abc_cache.


Ahh, thanks, I missed that cache. After browsing the code a bit it seems to
me the _abc_cache and _abc_negative_cache need to be turned into weak sets.
(Since a class can appear in any number of caches, positive and negative, we
can't just check refcounts on the items in the caches.) Do we have a weak
set implementation anywhere yet? I think I have one lying around I wrote for
someone else a while back, it could be added to the weakref module.
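
To make the difference concrete, here is a sketch using
weakref.WeakSet, which current Python versions provide. The two cache
variables and make_class() are illustrative stand-ins, not the actual
abc internals:

```python
import gc
import weakref

strong_cache = set()              # behaves like the leaking cache
weak_cache = weakref.WeakSet()    # holds its members only weakly


def make_class():
    """Create a throwaway class, as a test might do on each run."""
    class Temp:
        pass
    return Temp


# A plain set keeps the class alive after the last name for it is gone.
cls = make_class()
strong_cache.add(cls)
del cls
gc.collect()
strong_count = len(strong_cache)

# A weak set lets the class be collected once nothing else refers to it.
cls = make_class()
weak_cache.add(cls)
del cls
gc.collect()  # classes sit in reference cycles, so run the collector
weak_count = len(weak_cache)
```

After this runs, the plain set still holds its class while the weak
set is empty again.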

-- 
Thomas Wouters <thomas at python.org>

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!

From thomas at python.org  Thu Aug 30 14:24:39 2007
From: thomas at python.org (Thomas Wouters)
Date: Thu, 30 Aug 2007 14:24:39 +0200
Subject: [Python-3000] refleak in test_io?
In-Reply-To: <ee2a432c0708291707l8d604f0s1c149a5745db5650@mail.gmail.com>
References: <9e804ac0708291656r13c6b39cg19545e4f5a17e12d@mail.gmail.com>
	<ee2a432c0708291707l8d604f0s1c149a5745db5650@mail.gmail.com>
Message-ID: <9e804ac0708300524l12490ae5j9aa31392c91c5ebd@mail.gmail.com>

On 8/30/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
>
> On 8/29/07, Thomas Wouters <thomas at python.org> wrote:
> >
> > Am I the only one seeing a refleak in test_io?
>
> I know of leaks in 4 modules, but they all may point to the same one
> you identified:
>
> test_io leaked [62, 62] references, sum=124
> test_urllib leaked [122, 122] references, sum=244
> test_urllib2_localnet leaked [3, 3] references, sum=6
> test_xmlrpc leaked [26, 26] references, sum=52


FWIW, they do. Removing the subclass-cache fixes all these refleaks (but
it's not really a solution ;)

-- 
Thomas Wouters <thomas at python.org>

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!

From martin at v.loewis.de  Thu Aug 30 14:33:08 2007
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Thu, 30 Aug 2007 14:33:08 +0200
Subject: [Python-3000] buildbots
In-Reply-To: <m2d4x5fdjd.fsf@valheru.db3l.homeip.net>
References: <46D453E9.4020903@ctypes.org>
	<46D45CCA.3050206@v.loewis.de>	<46D462EB.4070600@ctypes.org>
	<46D4721A.2040208@ctypes.org>	<46D57D47.1090709@v.loewis.de>
	<fb4ecl$t5a$1@sea.gmane.org>	<46D5D22C.3010003@v.loewis.de>	<m2wsvef5bu.fsf@valheru.db3l.homeip.net>	<m2sl62f4a3.fsf@valheru.db3l.homeip.net>	<fb5mt8$5n4$1@sea.gmane.org>
	<46D66319.7030209@v.loewis.de>	<fb5oo5$a6e$1@sea.gmane.org>
	<46D66B49.8070600@v.loewis.de>	<fb5rcj$ig9$1@sea.gmane.org>
	<46D6741A.9040801@v.loewis.de>
	<m2d4x5fdjd.fsf@valheru.db3l.homeip.net>
Message-ID: <46D6B904.1020505@v.loewis.de>

> For SetErrorMode, if you're just looking for a non-ctypes wrapping,
> it's already covered by pywin32's win32api, which seems simple enough
> to obtain (and likely to already be present) if you're working at this
> level with Win32 calls as a user of Python.  Nor is ctypes very
> complicated as a fallback.  I'm not sure this is a common enough
> call to need a built-in wrapping.

However, we can't use pywin32 on the buildbot slaves - it's not
installed.

> For this particular case of wanting to use it when developing Python
> itself, it actually feels a bit more appropriate to me to make the
> call external to the Python executable under test since it's really a
> behavior being imposed by the test environment.  If a mechanism was
> implemented to have Python issue the call itself, I'd probably limit
> it to this specific use case.

That covers the SetErrorMode case, but not the CRT assertions - their
messagebox settings don't get inherited through CreateProcess.

Not sure why you want to limit it - I think it's a useful feature on
its own to allow Python to run without somebody clicking buttons.
(it would be a useful feature for Windows as well to stop producing
these messages, and report them through some other mechanism).

Regards,
Martin

From guido at python.org  Thu Aug 30 15:43:21 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 30 Aug 2007 06:43:21 -0700
Subject: [Python-3000] current status
In-Reply-To: <ee2a432c0708300000p178194cbyb0c354be4437ec51@mail.gmail.com>
References: <ee2a432c0708300000p178194cbyb0c354be4437ec51@mail.gmail.com>
Message-ID: <ca471dc20708300643i6e1e9b2x3db4107d916179f@mail.gmail.com>

On 8/30/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
> There are 6 tests that fail on all platforms AFAIK:
>
> 3 tests failed:
>     test_mailbox test_old_mailbox test_unicode_file
> 3 skips unexpected on linux2:
>     test_smtplib test_sundry test_ssl

Martin fixed test_unicode_file (I think he may be the only one who
understood that test).

test_ssl is not working because the ssl support in socket.py has been
disabled -- with the latest merge from the trunk, Bill Janssen's
server-side code came in, but it doesn't work yet with 3.0, which has
a completely different set of classes in socket.py. I hope it's OK to
release 3.0a1 without SSL support.

> I believe test_smtplib, test_sundry fail for the same reason at least
> partially.  They can't import email.base64mime.encode.  There are
> decode functions, but encode is gone from base64mime.  I don't know if
> that's the way it's supposed to be or not.  But smtplib can't be
> imported because encode is missing.

Barry said he'd fix this.

> Some of the failures in test_mailbox and test_old_mailbox are the
> same, but I think test_mailbox might have more problems.
>
> I hopefully fixed some platform specific problems, but others remain:
>
> * test_normalization fails on several boxes (where locale is not C maybe?)

Oh, good suggestion. I was wondering about this myself. Alas, I have
no idea what that test does.

> * On ia64, test_tarfile.PAXUnicodeTest.test_utf7_filename generates
> this exception:
> Objects/exceptions.c:1392: PyUnicodeDecodeError_Create: Assertion
> `start < 2147483647' failed.

That's probably an uninitialized variable 'startinpos' in one of the
functions that calls unicode_decode_call_errorhandler(). It's the 7th
parameter. The header of that function is 150 characters wide. Yuck!
Someone will need to reproduce the bug and then point gdb at it and it
should be obvious. :-)

> * On ia64 and Win64 (IIRC), this fails:  self.assertEqual(round(1e20), 1e20)
> AssertionError: 0 != 1e+20
>
> * On PPC64, all the dbm code seems to be crashing
>
> * File "Lib/test/test_nis.py", line 27, in test_maps
>     if nis.match(k, nismap) != v:
> SystemError: can't use str as char buffer

Someone fixed this by changing t# into s#. Do we still need t#? Can
someone who understands it explain what it does?

> * On Solaris, hashlib can't import _md5 which creates a bunch of problems.

I've seen this on other platforms that have an old openssl version. I
think we don't have _md5 any more, so the code that looks for it is
broken -- but this means we're more dependent on openssl than I'm
comfortable with, even though the old _md5 module had an RSA copyright.
:-(

> * On Win64, there's this assert:
>    SystemError: Objects\longobject.c:412: bad argument to internal function
> I don't see how it's getting triggered based on the traceback though
>
> Win64 has a bunch of weird issues:
>
> http://python.org/dev/buildbot/3.0/amd64%20XP%203.0/builds/40/step-test/0

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Thu Aug 30 15:47:24 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 30 Aug 2007 06:47:24 -0700
Subject: [Python-3000] refleak in test_io?
In-Reply-To: <9e804ac0708300517y636fb92bsdee0c798458bea@mail.gmail.com>
References: <9e804ac0708291656r13c6b39cg19545e4f5a17e12d@mail.gmail.com>
	<46D681F0.2050105@v.loewis.de>
	<9e804ac0708300517y636fb92bsdee0c798458bea@mail.gmail.com>
Message-ID: <ca471dc20708300647h3c93d1d9if3bd2d76885c3f3b@mail.gmail.com>

On 8/30/07, Thomas Wouters <thomas at python.org> wrote:
>
>
> On 8/30/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > > I tried recreating the leak with more controllable types, but I haven't
> > > got very far. It seems to be caused by some weird interaction between
> > > io.FileIO, _fileio._FileIO and io.IOBase, specifically
> > > io.IOBase.__del__() calling self.close(), and io.FileIO.close() calling
> > > _fileio._FileIO.close() *and* io.RawIOBase.close(). The weird thing is
> > > that the contents of RawIOBase.close() don't matter. The mere act of
> > > calling RawIOBase.close(self) causes the leak. Remove the call, or
> > > change it into an attribute fetch, and the leak is gone. I'm stumped.
> >
> > I think the problem is that the class remains referenced in
> > io.RawIOBase._abc_cache:
> >
> > py> io.RawIOBase._abc_cache
> > set()
> > py> class f(io.RawIOBase):pass
> > ...
> > py> isinstance(f(), io.RawIOBase)
> > True
> > py> io.RawIOBase._abc_cache
> > {<class '__main__.f'>}
> > py> del f
> > py> io.RawIOBase._abc_cache
> > {<class '__main__.f'>}
> >
> > Each time test_destructor is called, another class will be added to
> > _abc_cache.
>
> Ahh, thanks, I missed that cache. After browsing the code a bit it seems to
> me the _abc_cache and _abc_negative_cache need to be turned into weak sets.
> (Since a class can appear in any number of caches, positive and negative, we
> can't just check refcounts on the items in the caches.) Do we have a weak
> set implementation anywhere yet? I think I have one lying around I wrote for
> someone else a while back, it could be added to the weakref module.

It should be made into weak refs indeed. Post 3.0a1 I suspect. I'll
add an issue so we won't forget.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From walter at livinglogic.de  Thu Aug 30 17:48:52 2007
From: walter at livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=)
Date: Thu, 30 Aug 2007 17:48:52 +0200
Subject: [Python-3000] current status
In-Reply-To: <ca471dc20708300643i6e1e9b2x3db4107d916179f@mail.gmail.com>
References: <ee2a432c0708300000p178194cbyb0c354be4437ec51@mail.gmail.com>
	<ca471dc20708300643i6e1e9b2x3db4107d916179f@mail.gmail.com>
Message-ID: <46D6E6E4.4030902@livinglogic.de>

Guido van Rossum wrote:

> [...]
> 
>> * On ia64, test_tarfile.PAXUnicodeTest.test_utf7_filename generates
>> this exception:
>> Objects/exceptions.c:1392: PyUnicodeDecodeError_Create: Assertion
>> `start < 2147483647' failed.
> 
> That's probably an uninitialized variable 'startinpos' in one of the
> functions that calls unicode_decode_call_errorhandler(). It's the 7th
> parameter. The header of that function is 150 characters wide. Yuck!

Seems that a linefeed has gone missing there.

> Someone will need to reproduce the bug and then point gdb at it and it
> should be obvious. :-)

I've added an initialization to the "illegal special character" branch 
of the code.

However test_tarfile.py still segfaults for me in the py3k branch. The 
top of the stacktrace is:

#0  0xb7eec07f in memcpy () from /lib/tls/libc.so.6
#1  0xb7a905bc in s_pack_internal (soself=0xb77dc97c, args=0xb77cddfc, 
offset=0, buf=0x8433c4c "")
     at /var/home/walter/checkouts/Python/py3k/Modules/_struct.c:1667
#2  0xb7a90a32 in s_pack (self=0xb77dc97c, args=0xb77cddfc) at 
/var/home/walter/checkouts/Python/py3k/Modules/_struct.c:1741
#3  0x08085f96 in PyCFunction_Call (func=0xb7a72a0c, arg=0xb77cddfc, 
kw=0x0) at Objects/methodobject.c:73

Servus,
    Walter

From walter at livinglogic.de  Thu Aug 30 17:53:42 2007
From: walter at livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=)
Date: Thu, 30 Aug 2007 17:53:42 +0200
Subject: [Python-3000] current status
In-Reply-To: <46D6E6E4.4030902@livinglogic.de>
References: <ee2a432c0708300000p178194cbyb0c354be4437ec51@mail.gmail.com>
	<ca471dc20708300643i6e1e9b2x3db4107d916179f@mail.gmail.com>
	<46D6E6E4.4030902@livinglogic.de>
Message-ID: <46D6E806.1080207@livinglogic.de>

Walter Dörwald wrote:

> [...]
> However test_tarfile.py still segfaults for me in the py3k branch. The 
> top of the stacktrace is:
> 
> #0  0xb7eec07f in memcpy () from /lib/tls/libc.so.6
> #1  0xb7a905bc in s_pack_internal (soself=0xb77dc97c, args=0xb77cddfc, 
> offset=0, buf=0x8433c4c "")
>     at /var/home/walter/checkouts/Python/py3k/Modules/_struct.c:1667
> #2  0xb7a90a32 in s_pack (self=0xb77dc97c, args=0xb77cddfc) at 
> /var/home/walter/checkouts/Python/py3k/Modules/_struct.c:1741
> #3  0x08085f96 in PyCFunction_Call (func=0xb7a72a0c, arg=0xb77cddfc, 
> kw=0x0) at Objects/methodobject.c:73

I forgot to mention that it fails in

test_100_char_name (__main__.WriteTest) ...

Servus,
    Walter

From theller at ctypes.org  Thu Aug 30 18:10:16 2007
From: theller at ctypes.org (Thomas Heller)
Date: Thu, 30 Aug 2007 18:10:16 +0200
Subject: [Python-3000] current status
In-Reply-To: <ee2a432c0708300000p178194cbyb0c354be4437ec51@mail.gmail.com>
References: <ee2a432c0708300000p178194cbyb0c354be4437ec51@mail.gmail.com>
Message-ID: <fb6q58$umt$1@sea.gmane.org>

Neal Norwitz schrieb:
> * On Win64, there's this assert:
>    SystemError: Objects\longobject.c:412: bad argument to internal function
> I don't see how it's getting triggered based on the traceback though

Python/getargs.c, line 672:
		ival = PyInt_AsSsize_t(arg);

calls Objects/longobject.c PyLong_AsSsize_t(), but that accepts only
PyLong_Objects (in contrast to PyLong_AsLong which calls nb_int).

Thomas


From guido at python.org  Thu Aug 30 19:02:26 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 30 Aug 2007 10:02:26 -0700
Subject: [Python-3000] Release 3.0a1 Countdown
Message-ID: <ca471dc20708301002l5c0458e7laa7b95178ba49111@mail.gmail.com>

Tomorrow (Friday August 31) I want to do the 3.0a1 release. I want to
do it early in the day (US west coast time). It's going to be a
lightweight release -- I plan to put out a source tarball only, if
Martin wants to contribute an MSI installer that would be great. I
plan to lock the tree Friday morning early (perhaps as early as 6am
PDT, i.e. 15:00 in Germany). So get your stuff in and working (on as
many platforms as possible) by then!

I'll spend most of today writing up a what's new document and release
notes. I'm still hoping that the following unit tests that currently
fail everywhere can be fixed:

test_mailbox
test_old_mailbox
test_smtplib

(test_sundry now passes BTW, I don't see any tests being skipped unexpectedly.)

I've set up a task list on the spreadsheet we used for the sprint.
Here's the invitation:

http://spreadsheets.google.com/ccc?key=pBLWM8elhFAmKbrhhh0ApQA&inv=guido at python.org&t=3328567089265242420&guest

Use the tabs at the bottom to go to the "Countdown" sheet and you can
watch me procrastinate in real time. :-) Don't hesitate to add items!

Some other things I expect to land today:

Thomas Wouters's patch for the ref leak in the ABC cache (issue 1061)
Thomas Wouters's noslice feature
issue 1753395 (Georg)
possibly more work on PEP 3109 and 3134 (Collin)

Also I'd appreciate it if people would check the buildbots and the bug
tracker for possible issues.

Thanks everyone for the large number of improvements that came in this week!

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From barry at python.org  Thu Aug 30 19:49:16 2007
From: barry at python.org (Barry Warsaw)
Date: Thu, 30 Aug 2007 13:49:16 -0400
Subject: [Python-3000] current status
In-Reply-To: <ee2a432c0708300000p178194cbyb0c354be4437ec51@mail.gmail.com>
References: <ee2a432c0708300000p178194cbyb0c354be4437ec51@mail.gmail.com>
Message-ID: <E1A6CDB3-66BB-4E43-9215-5EA36B1B1427@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 30, 2007, at 3:00 AM, Neal Norwitz wrote:

> Some of the failures in test_mailbox and test_old_mailbox are the
> same, but I think test_mailbox might have more problems.

It does, and I won't be spending any more time before a1 looking at  
it.  The problem is that MH.__setitem__() opens its file in binary  
mode, then passes a string to the base class's _dump_message()  
method.  It then tries to write a string to a binary file and you get  
a TypeError.  You can't just encode strings to bytes in _dump_message 
() though because sometimes the file you're passed is a text file and  
so you trade one failure for another.

I don't think it's quite right to do the conversion in MH.__setitem__ 
() either though because _dump_message() isn't prepared to handle  
bytes.  Maybe it should be, but the basic problem is that you can get  
passed either a text or binary file object and you need to be able to  
write either strings or bytes to either.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRtcDHXEjvBPtnXfVAQJPvgP+L2cGjpioinZE/PQ/zLdQu0CebCIygpBj
RYOvSF/Mw1xiK4sOfHEdfG8LaYAgfL2mAP9smn+s5osodbPXP4kYPHTbMgzSN7oT
BhMvvMeqeosz6/sLb0hdEKdk+54zo3yqh62DeLBuYSLMhaLVoVShFdlTvOEs8YPQ
qZGQsiu57Wo=
=+sdc
-----END PGP SIGNATURE-----
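
One way to read the problem described above is that the writer has to look
at both the payload type and the file type. A rough sketch, where the
helper name dump_payload and the default "ascii" encoding are assumptions
for illustration and not the mailbox module's actual code:

```python
import io

def dump_payload(payload, target, encoding="ascii"):
    """Write str or bytes to either a text or a binary file object."""
    is_text = isinstance(target, io.TextIOBase)
    if is_text and isinstance(payload, bytes):
        payload = payload.decode(encoding)   # bytes going to a text file
    elif not is_text and isinstance(payload, str):
        payload = payload.encode(encoding)   # str going to a binary file
    target.write(payload)

text_file = io.StringIO()
binary_file = io.BytesIO()
for chunk in ("From: a\n", b"Subject: b\n"):
    dump_payload(chunk, text_file)
    dump_payload(chunk, binary_file)
```

This only covers file objects built on the io classes; other file-like
objects would need a different check.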

From ntoronto at cs.byu.edu  Thu Aug 30 21:30:33 2007
From: ntoronto at cs.byu.edu (Neil Toronto)
Date: Thu, 30 Aug 2007 13:30:33 -0600
Subject: [Python-3000] Python-love (was Release 3.0a1 Countdown)
In-Reply-To: <ca471dc20708301002l5c0458e7laa7b95178ba49111@mail.gmail.com>
References: <ca471dc20708301002l5c0458e7laa7b95178ba49111@mail.gmail.com>
Message-ID: <46D71AD9.9050905@cs.byu.edu>

Guido van Rossum wrote:
> Thanks everyone for the large number of improvements that came in this week!

Can I echo this in general? I just lurk here, being fascinated by the 
distributed language development process, so I don't have much license 
to post and steal precious developer attention. But I'd like to thank 
everyone whose blood, sweat, and tears - volunteered - have produced the 
first programming language and set of libraries that I really fell in 
love with. What I see here and in the PEPs has got me seriously stoked 
for 3.0.

I feel I can represent thousands of developers when I say: Here's to 
you, guys. Awesome work. Thank you so much. Developing in Python is the 
smoothest and most joyous development I've ever done.

Sorry if this is too off-topic. If there were a "python-love" list I'd 
post it there. :)

Neil


From martin at v.loewis.de  Thu Aug 30 21:54:34 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 30 Aug 2007 21:54:34 +0200
Subject: [Python-3000] Release 3.0a1 Countdown
In-Reply-To: <ca471dc20708301002l5c0458e7laa7b95178ba49111@mail.gmail.com>
References: <ca471dc20708301002l5c0458e7laa7b95178ba49111@mail.gmail.com>
Message-ID: <46D7207A.8080108@v.loewis.de>

> Tomorrow (Friday August 31) I want to do the 3.0a1 release. I want to
> do it early in the day (US west coast time). It's going to be a
> lightweight release -- I plan to put out a source tarball only, if
> Martin wants to contribute an MSI installer that would be great.

I'll see what I can do. An x86 installer would certainly be possible;
for AMD64, I don't have a test machine right now (but I could produce
one "blindly").

Regards,
Martin


From db3l.net at gmail.com  Thu Aug 30 22:15:18 2007
From: db3l.net at gmail.com (David Bolen)
Date: Thu, 30 Aug 2007 16:15:18 -0400
Subject: [Python-3000] buildbots
References: <46D453E9.4020903@ctypes.org> <46D45CCA.3050206@v.loewis.de>
	<46D462EB.4070600@ctypes.org> <46D4721A.2040208@ctypes.org>
	<46D57D47.1090709@v.loewis.de> <fb4ecl$t5a$1@sea.gmane.org>
	<46D5D22C.3010003@v.loewis.de>
	<m2wsvef5bu.fsf@valheru.db3l.homeip.net>
	<m2sl62f4a3.fsf@valheru.db3l.homeip.net>
	<fb5mt8$5n4$1@sea.gmane.org> <46D66319.7030209@v.loewis.de>
	<fb5oo5$a6e$1@sea.gmane.org> <46D66B49.8070600@v.loewis.de>
	<fb5rcj$ig9$1@sea.gmane.org> <46D6741A.9040801@v.loewis.de>
	<m2d4x5fdjd.fsf@valheru.db3l.homeip.net>
	<46D6B904.1020505@v.loewis.de>
Message-ID: <m2k5rceq3t.fsf@valheru.db3l.homeip.net>

"Martin v. Löwis" <martin at v.loewis.de> writes:

> However, we can't use pywin32 on the buildbot slaves - it's not
> installed.

Agreed, thus my original suggestion of a standalone wrapper executable
(or using ctypes).  But for end users of Python on Windows, this is a
direct Windows-specific API wrapping, for which using the pywin32
wrapper seems appropriate, or if needed, ctypes use is trivial (for a
call taking and returning a single ulong).

>> For this particular case of wanting to use it when developing Python
>> itself, it actually feels a bit more appropriate to me to make the
>> call external to the Python executable under test since it's really a
>> behavior being imposed by the test environment.  If a mechanism was
>> implemented to have Python issue the call itself, I'd probably limit
>> it to this specific use case.
>
> That covers the SetErrorMode case, but not the CRT assertions - their
> messagebox settings don't get inherited through CreateProcess.

Agreed - for Python in debug mode, the CRT stuff needs specific
support (although Thomas' example using ctypes, albeit somewhat ugly,
did manage to still keep it out in the test case runner).  I can see
exporting access to them in debug builds could be helpful as they
aren't otherwise wrapped.

> Not sure why you want to limit it - I think it's a useful feature on
> its own to allow Python to run without somebody clicking buttons.
> (it would be a useful feature for windows as well to stop producing
> these messages, and report them through some other mechanism).

I just think that if someone needs the functionality they'll have an
easy time with existing methods.  And I'm not sure it's something to
encourage average use of, if only because Python (and its child,
potentially unrelated, processes) will behave differently than other
applications.

But it's not like I'm vehemently opposed or anything.  At this stage
I'd think having anything that prevented the popups for the buildbots
would be beneficial.  Putting it up in the test code (such as
regrtest), seems less intrusive and complicated, even if it involves
slightly ugly code, than deciding how to incorporate it into the
Python core, which could always be done subsequently.

-- David
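
For reference, the ctypes route mentioned above might look roughly like
this. The SEM_* values are the documented Win32 SetErrorMode flags; the
function name suppress_error_dialogs is made up for this sketch, and the
call is skipped entirely on non-Windows platforms:

```python
import ctypes
import sys

# Flag values documented for the Win32 SetErrorMode() call.
SEM_FAILCRITICALERRORS = 0x0001
SEM_NOGPFAULTERRORBOX = 0x0002
SEM_NOOPENFILEERRORBOX = 0x8000

def suppress_error_dialogs():
    """Ask Windows not to show error dialogs; return the previous mode.

    Returns None on platforms other than Windows, where there is nothing to do.
    """
    if sys.platform != "win32":
        return None
    flags = SEM_FAILCRITICALERRORS | SEM_NOGPFAULTERRORBOX | SEM_NOOPENFILEERRORBOX
    return ctypes.windll.kernel32.SetErrorMode(flags)

previous_mode = suppress_error_dialogs()
```

As noted in the thread, this covers only the SetErrorMode case; the CRT
assertion message boxes in debug builds need separate handling.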


From martin at v.loewis.de  Thu Aug 30 22:40:47 2007
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Thu, 30 Aug 2007 22:40:47 +0200
Subject: [Python-3000] buildbots
In-Reply-To: <m2k5rceq3t.fsf@valheru.db3l.homeip.net>
References: <46D453E9.4020903@ctypes.org>
	<46D45CCA.3050206@v.loewis.de>	<46D462EB.4070600@ctypes.org>
	<46D4721A.2040208@ctypes.org>	<46D57D47.1090709@v.loewis.de>
	<fb4ecl$t5a$1@sea.gmane.org>	<46D5D22C.3010003@v.loewis.de>	<m2wsvef5bu.fsf@valheru.db3l.homeip.net>	<m2sl62f4a3.fsf@valheru.db3l.homeip.net>	<fb5mt8$5n4$1@sea.gmane.org>
	<46D66319.7030209@v.loewis.de>	<fb5oo5$a6e$1@sea.gmane.org>
	<46D66B49.8070600@v.loewis.de>	<fb5rcj$ig9$1@sea.gmane.org>
	<46D6741A.9040801@v.loewis.de>	<m2d4x5fdjd.fsf@valheru.db3l.homeip.net>	<46D6B904.1020505@v.loewis.de>
	<m2k5rceq3t.fsf@valheru.db3l.homeip.net>
Message-ID: <46D72B4F.6040908@v.loewis.de>

> Agreed, thus my original suggestion of a standalone wrapper executable
> (or using ctypes).

That doesn't work well, either - how do we get this wrapper onto the
build slaves? It would work if such wrapper shipped with the
operating system.

> I just think that if someone needs the functionality they'll have an
> easy time with existing methods.

I don't think it's that easy. It took three people two days to find out
how to do it correctly (and I'm still not convinced the code I committed
covers all cases).

> And I'm not sure it's something to
> encourage average use of, if only because Python (and its child,
> potentially unrelated, processes) will behave differently than other
> applications.

I completely disagree. It's a gross annoyance of Windows that it
performs user interaction in a library call. I suspect there are
many cases where people really couldn't tolerate such user
interaction, and where they appreciate builtin support for
a window-less operation.

> But it's not like I'm vehemently opposed or anything.  At this stage
> I'd think having anything that prevented the popups for the buildbots
> would be beneficial.

Ok, I committed PYTHONNOERRORWINDOW.

> Putting it up in the test code (such as
> regrtest), seems less intrusive and complicated,

It might be less intrusive (although I don't see why this is a
desirable property); it is certainly more complicated than
calling C APIs using C code.

Regards,
Martin

From eric+python-dev at trueblade.com  Fri Aug 31 01:05:27 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Thu, 30 Aug 2007 19:05:27 -0400
Subject: [Python-3000] string.Formatter class
In-Reply-To: <46D6A60D.2070503@ronadam.com>
References: <46D40B88.4080202@trueblade.com> <46D6A60D.2070503@ronadam.com>
Message-ID: <46D74D37.5040007@trueblade.com>

Ron Adam wrote:
>> get_field(field_name, args, kwargs, used_args)
>> Given a field_name as returned by parse, convert it to an object to be 
>> formatted.  The default version takes strings of the form defined in 
>> the PEP, such as "0[name]" or "label.title".  It records which args 
>> have been used in used_args.  args and kwargs are as passed in to 
>> vformat.
> 
> Rather than pass the used_args set out and have it modified in a 
> different method, I think it would be better to pass the arg_used back 
> along with the object.  That keeps all the code that is involved in 
> checking used args in one method.  The arg_used value may be useful 
> in other ways as well.
> 
>      obj, arg_used = self.get_field(field_name, args, kwargs)
>      used_args.add(arg_used)

I'm really not wild about either solution, but I suppose yours is less 
objectionable than mine.  I'll check this change in tonight (before the 
deadline).

I think you'd have to say:

if args_used is not None:
    used_args.add(args_used)

as it's possible that the field was not derived from the args or kwargs.

> I wonder if this is splitting things up a bit too finely?  If the format 
> function takes a conversion argument, it makes it possible to do 
> everything by overriding format_field.
> 
>     def format_field(self, value, format_spec, conversion):
>         return format(value, format_spec, conversion)
> 
> 
> Adding this to Talin's suggestion, the signature of format could be...
> 
>     format(value, format_spec="", conversion="")

But this conflates conversions with formatting, which the PEP takes 
pains not to do.  I'd rather leave them separate, but I'll let Talin 
make the call.

Eric.
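
A small sketch of the "return the used argument along with the object"
shape proposed above. It assumes the string.Formatter class as it
eventually shipped in the standard library, where get_field returns an
(obj, used_key) pair; the Label class is invented for the example:

```python
import string

class Label:
    title = "Release notes"

formatter = string.Formatter()

# "0[name]": positional argument 0, then the item keyed by "name".
obj_a, used_a = formatter.get_field("0[name]", ({"name": "py3k"},), {})

# "label.title": keyword argument "label", then its attribute "title".
obj_b, used_b = formatter.get_field("label.title", (), {"label": Label()})

# The caller, not get_field, keeps track of which arguments were used.
used_args = {used_a, used_b}
```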


From amauryfa at gmail.com  Fri Aug 31 02:54:32 2007
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Fri, 31 Aug 2007 02:54:32 +0200
Subject: [Python-3000] New PyException_HEAD fails to compile on Windows
Message-ID: <e27efe130708301754v186df867v560243185f447253@mail.gmail.com>

Hello,

the windows version of py3k suddenly stopped compiling because of an
extra semicolon in PyException_HEAD: PyObject_HEAD definition already
ends with a semicolon.

gcc seems more tolerant though...

-- 
Amaury Forgeot d'Arc

From collinw at gmail.com  Fri Aug 31 02:56:47 2007
From: collinw at gmail.com (Collin Winter)
Date: Thu, 30 Aug 2007 17:56:47 -0700
Subject: [Python-3000] New PyException_HEAD fails to compile on Windows
In-Reply-To: <e27efe130708301754v186df867v560243185f447253@mail.gmail.com>
References: <e27efe130708301754v186df867v560243185f447253@mail.gmail.com>
Message-ID: <43aa6ff70708301756i7f0b3fcg16fdb25f5089e51a@mail.gmail.com>

On 8/30/07, Amaury Forgeot d'Arc <amauryfa at gmail.com> wrote:
> Hello,
>
> the windows version of py3k suddenly stopped compiling because of an
> extra semicolon in PyException_HEAD: PyObject_HEAD definition already
> ends with a semicolon.
>
> gcc seems more tolerant though...

Sorry, fix on the way...

Collin Winter

From rrr at ronadam.com  Fri Aug 31 03:11:47 2007
From: rrr at ronadam.com (Ron Adam)
Date: Thu, 30 Aug 2007 20:11:47 -0500
Subject: [Python-3000] string.Formatter class
In-Reply-To: <46D74D37.5040007@trueblade.com>
References: <46D40B88.4080202@trueblade.com> <46D6A60D.2070503@ronadam.com>
	<46D74D37.5040007@trueblade.com>
Message-ID: <46D76AD3.1040301@ronadam.com>



Eric Smith wrote:
> Ron Adam wrote:
>>> get_field(field_name, args, kwargs, used_args)
>>> Given a field_name as returned by parse, convert it to an object to 
>>> be formatted.  The default version takes strings of the form defined 
>>> in the PEP, such as "0[name]" or "label.title".  It records which 
>>> args have been used in used_args.  args and kwargs are as passed in 
>>> to vformat.
>>
>> Rather than pass the used_args set out and have it modified in a 
>> different method, I think it would be better to pass the arg_used 
>> back along with the object.  That keeps all the code that is involved 
>> in checking used args in one method.  The arg_used value may be 
>> useful in other ways as well.
>>
>>      obj, arg_used = self.get_field(field_name, args, kwargs)
>>      used_args.add(arg_used)
> 
> I'm really not wild about either solution, but I suppose yours is less 
> objectionable than mine.  I'll check this change in tonight (before the 
> deadline).

Cool.  I looked at other possible ways, but this seemed to be the easiest 
to live with.  The alternative is to use attributes to pass and hold 
values, but since the Formatter class isn't a data class, that doesn't seem 
appropriate.

> I think you'd have to say:
> 
> if args_used is not None:
>    used_args.add(args_used)
> 
> as it's possible that the field was not derived from the args or kwargs.

How?  From what I can see an exception would be raised in the get_value method.

When would I ever want to get a None for args_used?


>> I wonder if this is splitting things up a bit too finely?  If the 
>> format function takes a conversion argument, it makes it possible to 
>> do everything by overriding format_field.
>>
>>     def format_field(self, value, format_spec, conversion):
>>         return format(value, format_spec, conversion)
>>
>>
>> Adding this to Talin's suggestion, the signature of format could be...
>>
>>     format(value, format_spec="", conversion="")
> 
> But this conflates conversions with formatting, which the PEP takes 
> pains not to do.  I'd rather leave them separate, but I'll let Talin 
> make the call.

Yes the PEP is pretty specific on the format() function signature.


Cheers,
    Ron


From eric+python-dev at trueblade.com  Fri Aug 31 03:17:05 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Thu, 30 Aug 2007 21:17:05 -0400
Subject: [Python-3000] format_spec parameter to format() builtin defaults to
 "" [was: Re: string.Formatter class]
In-Reply-To: <46D66BC2.3060708@acm.org>
References: <46D40B88.4080202@trueblade.com>	<fb6fbf560708281607n377a1513m2a191ad569610af2@mail.gmail.com>	<46D4AD40.9070006@trueblade.com>
	<46D4E8F6.30508@trueblade.com> <46D66BC2.3060708@acm.org>
Message-ID: <46D76C11.7050208@trueblade.com>

Talin wrote:
> Also I wanted to ask: How about making the built-in 'format' function 
> have a default value of "" for the second argument? So I can just say:
> 
>    format(x)
> 
> as a synonym for:
> 
>    str(x)

I implemented this in r57797.

Eric.
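
A quick check of the behaviour described above, assuming a Python where
the format() builtin has the default empty format_spec. The sample values
are arbitrary:

```python
values = [42, 3.5, "text", None, [1, 2]]

# With no format_spec, format(v) should give the same text as str(v).
matches = [format(v) == str(v) for v in values]
```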



From eric+python-dev at trueblade.com  Fri Aug 31 03:26:45 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Thu, 30 Aug 2007 21:26:45 -0400
Subject: [Python-3000] string.Formatter class
In-Reply-To: <46D76AD3.1040301@ronadam.com>
References: <46D40B88.4080202@trueblade.com> <46D6A60D.2070503@ronadam.com>
	<46D74D37.5040007@trueblade.com> <46D76AD3.1040301@ronadam.com>
Message-ID: <46D76E55.6000709@trueblade.com>

Ron Adam wrote:
> 
> 
> Eric Smith wrote:
>> Ron Adam wrote:
>>>> get_field(field_name, args, kwargs, used_args)
>>>> Given a field_name as returned by parse, convert it to an object to 
>>>> be formatted.  The default version takes strings of the form defined 
>>>> in the PEP, such as "0[name]" or "label.title".  It records which 
>>>> args have been used in used_args.  args and kwargs are as passed in 
>>>> to vformat.
>>>
>>> Rather than pass the used_args set out and have it modified in a 
>>> different method, I think it would be better to pass the arg_used 
>>> back along with the object.  That keeps all the code that is involved 
>>> in checking used args in one method.  The arg_used value may be 
>>> useful in other ways as well.
>>>
>>>      obj, arg_used = self.get_field(field_name, args, kwargs)
>>>      used_args.add(arg_used)
>>
>> I'm really not wild about either solution, but I suppose yours is less 
>> objectionable than mine.  I'll check this change in tonight (before 
>> the deadline).
> 
> Cool.  I looked at other possible ways, but this seemed to be the 
> easiest to live with.  The alternative is to use attributes to pass 
> and hold values, but since the Formatter class isn't a data class, that 
> doesn't seem appropriate.

I agree that attributes seem like the wrong way to go about it.  It also 
makes recursively calling the formatter impossible without saving state.

I'm still planning on making this change tonight.

>> I think you'd have to say:
>>
>> if args_used is not None:
>>    used_args.add(args_used)
>>
>> as it's possible that the field was not derived from the args or kwargs.
> 
> How?  From what I can see an exception would be raised in the get_value 
> method.
> 
> When would I ever want to get a None for args_used?

I meant arg_used.

You're right.  I was confusing get_field with get_value.  (Surely we can 
pick better names!)

Eric.

From talin at acm.org  Fri Aug 31 03:31:51 2007
From: talin at acm.org (Talin)
Date: Thu, 30 Aug 2007 18:31:51 -0700
Subject: [Python-3000] string.Formatter class
In-Reply-To: <46D74D37.5040007@trueblade.com>
References: <46D40B88.4080202@trueblade.com> <46D6A60D.2070503@ronadam.com>
	<46D74D37.5040007@trueblade.com>
Message-ID: <46D76F87.6050006@acm.org>

Eric Smith wrote:
> Ron Adam wrote:
>> I wonder if this is splitting things up a bit too finely?  If the format 
>> function takes a conversion argument, it makes it possible to do 
>> everything by overriding format_field.
>>
>>     def format_field(self, value, format_spec, conversion):
>>         return format(value, format_spec, conversion)
>>
>>
>> Adding this to Talin's suggestion, the signature of format could be...
>>
>>     format(value, format_spec="", conversion="")
> 
> But this conflates conversions with formatting, which the PEP takes 
> pains not to do.  I'd rather leave them separate, but I'll let Talin 
> make the call.

Correct. There's no reason for 'format' to handle conversions, when it's 
trivial for a caller to do it themselves:

   format(repr(value), format_spec)

-- Talin
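
To make the point above concrete, here is a small sketch comparing the
manual conversion with the "!r" conversion in a format string. It assumes
the str.format syntax from the PEP; the value and the ">8" spec are
arbitrary:

```python
value = "hi"

# Caller does the conversion, then formats the resulting string.
manual = format(repr(value), ">8")

# The same thing expressed with a conversion flag in a template.
via_template = "{0!r:>8}".format(value)
```

Both spell the conversion step separately from the format_spec, which is
the separation the PEP is after.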

From talin at acm.org  Fri Aug 31 03:36:06 2007
From: talin at acm.org (Talin)
Date: Thu, 30 Aug 2007 18:36:06 -0700
Subject: [Python-3000] Need Decimal.__format__
Message-ID: <46D77086.3030207@acm.org>

I'm looking for a volunteer who understands the Decimal class well 
enough to write a __format__ method for it. It should handle all of the 
same format specifiers as float.__format__, but it should not use the 
same implementation as float (so as to preserve accuracy.)

Also, I'm interested in suggestions as to any other standard types that 
ought to have a __format__ method, other than the obvious Date/Time 
classes. What kinds of things do people usually want to print?

-- Talin
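
A sketch of why the float implementation can't simply be reused. It
assumes a Decimal.__format__ that accepts float-style specifiers, as later
Python releases provide; the ".20f" spec is just an example:

```python
from decimal import Decimal

spec = ".20f"

# Formatting the Decimal directly keeps the exact decimal value.
as_decimal = format(Decimal("0.1"), spec)

# Going through float first exposes the binary approximation of 0.1.
as_float = format(float(Decimal("0.1")), spec)
```

The two strings differ in their trailing digits, which is the accuracy
loss a dedicated Decimal implementation avoids.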

From eric+python-dev at trueblade.com  Fri Aug 31 04:03:03 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Thu, 30 Aug 2007 22:03:03 -0400
Subject: [Python-3000] Need Decimal.__format__
In-Reply-To: <46D77086.3030207@acm.org>
References: <46D77086.3030207@acm.org>
Message-ID: <46D776D7.7050907@trueblade.com>

Talin wrote:
> I'm looking for a volunteer who understands the Decimal class well 
> enough to write a __format__ method for it. It should handle all of the 
> same format specifiers as float.__format__, but it should not use the 
> same implementation as float (so as to preserve accuracy.)

If no one else steps up, I can look at it.  But I doubt I can finish it 
by a1.

> Also, I'm interested in suggestions as to any other standard types that 
> ought to have a __format__ method, other than the obvious Date/Time 
> classes. What kinds of things do people usually want to print?

I can do datetime.datetime and datetime.date, if no one else already 
has.  I think they're just aliases for strftime.  Is there any problem 
with re-using the C implementation exactly?

static PyMethodDef date_methods[] = {
...
	{"strftime",   	(PyCFunction)date_strftime,	METH_VARARGS | METH_KEYWORDS,
	 PyDoc_STR("format -> strftime() style string.")},
	{"__format__",   (PyCFunction)date_strftime,	METH_VARARGS | METH_KEYWORDS,
	 PyDoc_STR("Alias for strftime.")},

...

I just want to make sure there's no requirement that the function 
pointer be unique within the array, or anything like that.
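
At the Python level, the aliasing idea above would behave roughly as
follows. This assumes the datetime types as they ended up in later
releases, where __format__ with a non-empty spec defers to strftime; the
dates and specs are arbitrary:

```python
from datetime import date, datetime

release_day = date(2007, 8, 31)
spec = "%Y-%m-%d"
same_for_date = format(release_day, spec) == release_day.strftime(spec)

stamp = datetime(2007, 8, 31, 6, 0)
same_for_datetime = format(stamp, "%H:%M") == stamp.strftime("%H:%M")

# With an empty spec, format() falls back to str() rather than strftime("").
empty_spec_is_str = format(release_day) == str(release_day)
```

The empty-spec case is the one place where a pure alias for strftime would
differ from what later releases do.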


From guido at python.org  Fri Aug 31 04:07:46 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 30 Aug 2007 19:07:46 -0700
Subject: [Python-3000] Need Decimal.__format__
In-Reply-To: <46D776D7.7050907@trueblade.com>
References: <46D77086.3030207@acm.org> <46D776D7.7050907@trueblade.com>
Message-ID: <ca471dc20708301907w441c54fn93faf76d7761af3b@mail.gmail.com>

On 8/30/07, Eric Smith <eric+python-dev at trueblade.com> wrote:
> Talin wrote:
> > I'm looking for a volunteer who understands the Decimal class well
> > enough to write a __format__ method for it. It should handle all of the
> > same format specifiers as float.__format__, but it should not use the
> > same implementation as float (so as to preserve accuracy.)
>
> If no one else steps up, I can look at it.  But I doubt I can finish it
> by a1.

No, that's not Talin's point: we're not expecting this in a1, but a2
would be good.

> > Also, I'm interested in suggestions as to any other standard types that
> > ought to have a __format__ method, other than the obvious Date/Time
> > classes. What kinds of things do people usually want to print?
>
> I can do datetime.datetime and datetime.date, if no one else already
> has.  I think they're just aliases for strftime.  Is there any problem
> with re-using the C implementation exactly?
>
> static PyMethodDef date_methods[] = {
> ...
>         {"strftime",    (PyCFunction)date_strftime,     METH_VARARGS | METH_KEYWORDS,
>          PyDoc_STR("format -> strftime() style string.")},
>         {"__format__",   (PyCFunction)date_strftime,    METH_VARARGS | METH_KEYWORDS,
>          PyDoc_STR("Alias for strftime.")},
>
> ...
>
> I just want to make sure there's no requirement that the function
> pointer be unique within the array, or anything like that.


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From rrr at ronadam.com  Fri Aug 31 04:21:18 2007
From: rrr at ronadam.com (Ron Adam)
Date: Thu, 30 Aug 2007 21:21:18 -0500
Subject: [Python-3000] string.Formatter class
In-Reply-To: <46D76E55.6000709@trueblade.com>
References: <46D40B88.4080202@trueblade.com> <46D6A60D.2070503@ronadam.com>
	<46D74D37.5040007@trueblade.com> <46D76AD3.1040301@ronadam.com>
	<46D76E55.6000709@trueblade.com>
Message-ID: <46D77B1E.6030206@ronadam.com>



Eric Smith wrote:
> Ron Adam wrote:
>>
>>
>> Eric Smith wrote:
>>> Ron Adam wrote:
>>>>> get_field(field_name, args, kwargs, used_args)
>>>>> Given a field_name as returned by parse, convert it to an object to 
>>>>> be formatted.  The default version takes strings of the form 
>>>>> defined in the PEP, such as "0[name]" or "label.title".  It records 
>>>>> which args have been used in used_args.  args and kwargs are as 
>>>>> passed in to vformat.
>>>>
>>>> Rather than pass the used_args set out and have it modified in a 
>>>> different method, I think it would be better to pass the arg_used 
>>>> back along with the object.  That keeps all the code that is 
>>>> involved in checking used args in one method.  The arg_used value 
>>>> may be useful in other ways as well.
>>>>
>>>>      obj, arg_used = self.get_field(field_name, args, kwargs)
>>>>      used_args.add(arg_used)
>>>
>>> I'm really not wild about either solution, but I suppose yours is 
>>> less objectionable than mine.  I'll check this change in tonight 
>>> (before the deadline).
>>
>> Cool.  I looked at other possible ways, but this seemed to be the 
>> easiest to live with.  The alternative is to use attributes to pass 
>> and hold values, but since the Formatter class isn't a data class, 
>> that doesn't seem appropriate.
> 
> I agree that attributes seem like the wrong way to go about it.  It also 
> makes recursively calling the formatter impossible without saving state.
> 
> I'm still planning on making this change tonight.
> 
>>> I think you'd have to say:
>>>
>>> if args_used is not None:
>>>    used_args.add(args_used)
>>>
>>> as it's possible that the field was not derived from the args or kwargs.
>>
>> How?  From what I can see an exception would be raised in the 
>> get_value method.
>>
>> When would I ever want to get a None for args_used?
> 
> I meant arg_used.

I understood.

> You're right.  I was confusing get_field with get_value.  (Surely we can 
> pick better names!)


Hmm... how about this?

          if field_name is not None:
              name, sub_names = field_name._formatter_field_name_split()
              obj = self.get_value(name, args, kwargs)
              obj = self.get_sub_value(obj, sub_names)
              obj = self.convert_field(obj, conversion)
              used_args.add(name)

              # format the object and append to the result
              result.append(self.format_field(obj, format_spec))


This doesn't require passing an arg_used value as it's available in vformat.


Get_sub_value replaces get_field.

     def get_sub_value(self, obj, sub_names):
         #  Get sub value of an object.
         #  (indices or attributes)
         for is_attr, i in sub_names:
             if is_attr:
                 obj = getattr(obj, i)
             else:
                 obj = obj[i]
         return obj

While it moves more into vformat, I think it's clearer what everything does.


Cheers,
    Ron
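
A standalone sketch of the attribute/index traversal that get_sub_value
describes, written as a plain function so it can be run on its own. The
(is_attr, key) pairs are built by hand here instead of coming from the
field-name splitter, and the `doc` object is invented for the example:

```python
from types import SimpleNamespace

def get_sub_value(obj, sub_names):
    """Walk (is_attr, key) pairs, taking attributes or items in turn."""
    for is_attr, key in sub_names:
        obj = getattr(obj, key) if is_attr else obj[key]
    return obj

# Hand-built equivalent of the sub-names in a field such as "doc.tags[1]".
doc = SimpleNamespace(tags=["alpha", "beta"])
picked = get_sub_value(doc, [(True, "tags"), (False, 1)])

# With no sub-names, the object itself comes back.
unchanged = get_sub_value(doc, [])
```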

From guido at python.org  Fri Aug 31 06:52:08 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 30 Aug 2007 21:52:08 -0700
Subject: [Python-3000] Release Countdown
Message-ID: <ca471dc20708302152n3036d987w988487d23b1e25bd@mail.gmail.com>

I'm pretty happy where we stand now -- I just squashed the last two
failing tests (test_mailbox and test_old_mailbox). It is 9:30 pm here
and I'm tired, so I'm going to try and get a good night's sleep and do
the release as early as I can tomorrow.

Remember, I'll freeze the branch (not a real lock, just a request to
stop submitting) tomorrow (Friday) around 6 am my time -- that's 9 am
US east coast, 15:00 in most of western Europe. I'd appreciate it if
there were no broken unit tests then. :-)

If there are urgent things that I need to look at, put them in the bug
tracker, set priority to urgent, version to Python 3.0, and assign
them to me. Please exercise restraint in making last-minute sweeping
changes (except to the docs).

Please do review README, RELNOTES (new!), Misc/NEWS, and especially
Doc/whatsnew/3.0.rst (and the rest of the docs), and add what you feel
ought to be added.

You can also preview the web page I plan to use for the release --
it's not linked from anywhere yet, but here it is anyway:
http://www.python.org/download/releases/3.0/. Those of you lucky
enough to be able to edit it, please go ahead; others, add suggestions
to the bug tracker as above. Note that this page ends with a complete
copy of the release notes; I expect to be adding more release notes
after the release has been published, once we figure out what else
isn't working.

I expect 3.0a2 to follow within 2-4 weeks; the alpha release process
is relatively light-weight now that I've figured out most of the
details.

PS. PEP 101 needs a serious rewrite. It still talks about CVS.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From theller at ctypes.org  Fri Aug 31 08:07:15 2007
From: theller at ctypes.org (Thomas Heller)
Date: Fri, 31 Aug 2007 08:07:15 +0200
Subject: [Python-3000] buildbots
In-Reply-To: <46D72B4F.6040908@v.loewis.de>
References: <46D453E9.4020903@ctypes.org>	<46D45CCA.3050206@v.loewis.de>	<46D462EB.4070600@ctypes.org>	<46D4721A.2040208@ctypes.org>	<46D57D47.1090709@v.loewis.de>	<fb4ecl$t5a$1@sea.gmane.org>	<46D5D22C.3010003@v.loewis.de>	<m2wsvef5bu.fsf@valheru.db3l.homeip.net>	<m2sl62f4a3.fsf@valheru.db3l.homeip.net>	<fb5mt8$5n4$1@sea.gmane.org>	<46D66319.7030209@v.loewis.de>	<fb5oo5$a6e$1@sea.gmane.org>	<46D66B49.8070600@v.loewis.de>	<fb5rcj$ig9$1@sea.gmane.org>	<46D6741A.9040801@v.loewis.de>	<m2d4x5fdjd.fsf@valheru.db3l.homeip.net>	<46D6B904.1020505@v.loewis.de>	<m2k5rceq3t.fsf@valheru.db3l.homeip.net>
	<46D72B4F.6040908@v.loewis.de>
Message-ID: <fb8b6k$9n0$1@sea.gmane.org>

Martin v. Löwis wrote:
>> Agreed, thus my original suggestion of a standalone wrapper executable
>> (or using ctypes).
> 
> That doesn't work well, either - how do we get this wrapper onto the
> build slaves? It would work if such wrapper shipped with the
> operating system.
> 
>> I just think that if someone needs the functionality they'll have an
>> easy time with existing methods.
> 
> I don't think it's that easy. It took three people two days to find out
> how to do it correctly (and I'm still not convinced the code I committed
> covers all cases).
> 
>> And I'm not sure it's something to
>> encourage average use of, if only because Python (and its child,
>> potentially unrelated, processes) will behave differently than other
>> applications.
> 
> I completely disagree. It's a gross annoyance of Windows that it
> performs user interaction in a library call. I suspect there are
> many cases where people really couldn't tolerate such user
> interaction, and where they appreciate builtin support for
> a window-less operation.

True, but we're talking about automatic testing on the buildbots in this case.

>> But it's not like I'm vehemently opposed or anything.  At this stage
>> I'd think having anything that prevented the popups for the buildbots
>> would be beneficial.
> 
> Ok, I committed PYTHONNOERRORWINDOW.

It works, but does not have the desired effect (on the buildbots, again).
PCBuild\rt.bat does run 'python_d -E ...', which ignores the environment
variables.
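
To illustrate why -E defeats any switch that lives in the environment,
here is a small sketch.  It uses PYTHONDONTWRITEBYTECODE purely as a
stand-in for an arbitrary PYTHON* variable, and the helper name is made
up; it is not the buildbot setup itself.

```python
import os
import subprocess
import sys


def child_sees_variable(extra_args):
    """Run a child interpreter with PYTHONDONTWRITEBYTECODE set and
    report whether the child honoured it."""
    env = dict(os.environ, PYTHONDONTWRITEBYTECODE="1")
    code = "import sys; print(sys.dont_write_bytecode)"
    result = subprocess.run(
        [sys.executable, *extra_args, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip() == "True"


honoured_without_e = child_sees_variable([])
honoured_with_e = child_sees_variable(["-E"])
print(honoured_without_e, honoured_with_e)
```

With -E the child ignores every PYTHON* variable, so a switch of that
kind has to be delivered some other way, for example on the command line.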

The debug assertion in Lib\test\test_os.py is fixed now, but the test hangs
on Windows in the next debug assertion in _bsddb_d.pyd (IIRC).

Any suggestions?

>> Putting it up in the test code (such as
>> regrtest), seems less intrusive and complicated,
> 
> It might be less intrusive (although I don't see why this is a
> desirable property); it is certainly more complicated than
> calling C APIs using C code.
> 
> Regards,
> Martin

Thomas


From talin at acm.org  Fri Aug 31 08:53:22 2007
From: talin at acm.org (Talin)
Date: Thu, 30 Aug 2007 23:53:22 -0700
Subject: [Python-3000] PATCH: library reference docs for PEP 3101
Message-ID: <46D7BAE2.3050709@acm.org>

I just posted on the tracker a patch which adds extensive documentation 
for PEP 3101 to the Python Library Reference. This includes:

   str.format()
   format()
   __format__
   Formatter
   format string syntax
   format specification mini-language

http://bugs.python.org/issue1068

(Eric, my description of the Formatter overloaded methods may not match 
your latest revisions. Feel free to point out any errors.)

Oh, and thanks to Georg for making it possible for me to actually write 
library documentation :)

-- Talin

From g.brandl at gmx.net  Fri Aug 31 11:24:37 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Fri, 31 Aug 2007 11:24:37 +0200
Subject: [Python-3000] PATCH: library reference docs for PEP 3101
In-Reply-To: <46D7BAE2.3050709@acm.org>
References: <46D7BAE2.3050709@acm.org>
Message-ID: <fb8moe$br7$1@sea.gmane.org>

Talin wrote:
> I just posted on the tracker a patch which adds extensive documentation 
> for PEP 3101 to the Python Library Reference. This includes:
> 
>    str.format()
>    format()
>    __format__
>    Formatter
>    format string syntax
>    format specification mini-language
> 
> http://bugs.python.org/issue1068
> 
> (Eric, my description of the Formatter overloaded methods may not match 
> your latest revisions. Feel free to point out any errors.)
> 
> Oh, and thanks to Georg for making it possible for me to actually write 
> library documentation :)

I hope it was a pleasant experience :)

I've committed the patch together with more string fixes.

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From eric+python-dev at trueblade.com  Fri Aug 31 11:35:22 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Fri, 31 Aug 2007 05:35:22 -0400
Subject: [Python-3000] PATCH: library reference docs for PEP 3101
In-Reply-To: <46D7BAE2.3050709@acm.org>
References: <46D7BAE2.3050709@acm.org>
Message-ID: <46D7E0DA.6000503@trueblade.com>

Talin wrote:
> I just posted on the tracker a patch which adds extensive documentation 
> for PEP 3101 to the Python Library Reference. This includes:
> 
>    str.format()
>    format()
>    __format__
>    Formatter
>    format string syntax
>    format specification mini-language
> 
> http://bugs.python.org/issue1068
> 
> (Eric, my description of the Formatter overloaded methods may not match 
> your latest revisions. Feel free to point out any errors.)

This is awesome!  Thanks.

The only 2 differences are:

- in the implementation for float formatting, a type of '' is the same 
as 'g'.  I think the PEP originally had the wording it does so that 
format(1.0, '') would match str(1.0).  This case now matches, because of 
the change that says zero length format_spec's are the same as str(). 
However, if there's anything else in the format_spec (still with no 
type), it doesn't match what str() would do.
 >>> str(1.0)
'1.0'
 >>> format(1.0)
'1.0'
 >>> format(1.0, "-")
'1'
 >>> format(1.0, "g")
'1'
Actually, str() doesn't add a decimal for exponential notation:
 >>> str(1e100)
'1e+100'

I'd like to see the docs just say that an empty type is the same as 'g', 
but I'm not sure of the use case for what the documentation currently says.


- I changed Formatter.get_field to something like:

    .. method:: get_field(field_name, args, kwargs)

       Given *field_name* as returned by :meth:`parse` (see above),
       convert it to an object to be formatted.  Returns a tuple (obj,
       used_key).  The default version takes strings of the form
       defined in :pep:`3101`, such as "0[name]" or "label.title".
       *args* and *kwargs* are as passed in to :meth:`vformat`.  The
       return value *used_key* has the same meaning as the *key*
       parameter to :meth:`get_value`.
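
A short sketch of the (obj, used_key) return value, run against
string.Formatter as it ships in current Python releases (the code in
the branch at the time of this message may differ).  The argument
values are invented for the example.

```python
from string import Formatter
from types import SimpleNamespace

fmt = Formatter()

# "0[name]": positional argument 0, followed by an index lookup.
obj, used_key = fmt.get_field("0[name]", ({"name": "Talin"},), {})
print(obj, used_key)

# "label.title": keyword argument 'label', followed by an attribute lookup.
label = SimpleNamespace(title="PEP 3101")
obj2, used_key2 = fmt.get_field("label.title", (), {"label": label})
print(obj2, used_key2)
```

In both cases used_key is the first component of the field name, which
is what get_value receives as its key.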


From g.brandl at gmx.net  Fri Aug 31 11:41:38 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Fri, 31 Aug 2007 11:41:38 +0200
Subject: [Python-3000] str.decode; buffers
Message-ID: <fb8nob$eqs$1@sea.gmane.org>

Two short issues:

* Shouldn't str.decode() be removed? Every call to it says
  "TypeError: decoding str is not supported".

* Using e.g. b"abc".find("a") gives "SystemError: can't use str as char buffer".
  This should be a TypeError IMO.
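
For reference, a quick check of both points that can be run on any
interpreter.  The assertions describe current Python 3 releases, not
the branch as it stood when this was written.

```python
# str has no decode() method on current Python 3.
str_has_decode = hasattr(str, "decode")
print(str_has_decode)

# Searching a bytes object for a str raises TypeError.
try:
    b"abc".find("a")
except TypeError as exc:
    error_name = type(exc).__name__
    print(error_name)

# Searching with a bytes argument works as expected.
position = b"abc".find(b"a")
print(position)
```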

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From g.brandl at gmx.net  Fri Aug 31 12:20:13 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Fri, 31 Aug 2007 12:20:13 +0200
Subject: [Python-3000] PATCH: library reference docs for PEP 3101
In-Reply-To: <46D7E0DA.6000503@trueblade.com>
References: <46D7BAE2.3050709@acm.org> <46D7E0DA.6000503@trueblade.com>
Message-ID: <fb8q0m$lu1$1@sea.gmane.org>

Eric Smith wrote:
> Talin wrote:
>> I just posted on the tracker a patch which adds extensive documentation 
>> for PEP 3101 to the Python Library Reference. This includes:
>> 
>>    str.format()
>>    format()
>>    __format__
>>    Formatter
>>    format string syntax
>>    format specification mini-language
>> 
>> http://bugs.python.org/issue1068
>> 
>> (Eric, my description of the Formatter overloaded methods may not match 
>> your latest revisions. Feel free to point out any errors.)
> 
> This is awesome!  Thanks.
> 
> The only 2 differences are:
> 
> - in the implementation for float formatting, a type of '' is the same 
> as 'g'.  I think the PEP originally had the wording it does so that 
> format(1.0, '') would match str(1.0).  This case now matches, because of 
> the change that says zero length format_spec's are the same as str(). 
> However, if there's anything else in the format_spec (still with no 
> type), it doesn't match what str() would do.
>  >>> str(1.0)
> '1.0'
>  >>> format(1.0)
> '1.0'
>  >>> format(1.0, "-")
> '1'
>  >>> format(1.0, "g")
> '1'
> Actually, str() doesn't add a decimal for exponential notation:
>  >>> str(1e100)
> '1e+100'
> 
> I'd like to see the docs just say that an empty type is the same as 'g', 
> but I'm not sure of the use case for what the documentation currently says.

Can you suggest a patch?

> - I changed Formatter.get_field to something like:
> 
>     .. method:: get_field(field_name, args, kwargs)
> 
>        Given *field_name* as returned by :meth:`parse` (see above),
>        convert it to an object to be formatted.  Returns a tuple (obj,
>        used_key).  The default version takes strings of the form
>        defined in :pep:`3101`, such as "0[name]" or "label.title".
>        *args* and *kwargs* are as passed in to :meth:`vformat`.  The
>        return value *used_key* has the same meaning as the *key*
>        parameter to :meth:`get_value`.

I changed this.

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From eric+python-dev at trueblade.com  Fri Aug 31 13:22:29 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Fri, 31 Aug 2007 07:22:29 -0400
Subject: [Python-3000] PATCH: library reference docs for PEP 3101
In-Reply-To: <fb8q0m$lu1$1@sea.gmane.org>
References: <46D7BAE2.3050709@acm.org> <46D7E0DA.6000503@trueblade.com>
	<fb8q0m$lu1$1@sea.gmane.org>
Message-ID: <46D7F9F5.8090209@trueblade.com>

Georg Brandl wrote:
> Eric Smith wrote:
>> Talin wrote:
>>> I just posted on the tracker a patch which adds extensive documentation 
>>> for PEP 3101 to the Python Library Reference. This includes:
>>>
>>>    str.format()
>>>    format()
>>>    __format__
>>>    Formatter
>>>    format string syntax
>>>    format specification mini-language
>>>
>>> http://bugs.python.org/issue1068
>>>
>>> (Eric, my description of the Formatter overloaded methods may not match 
>>> your latest revisions. Feel free to point out any errors.)
>> This is awesome!  Thanks.
>>
>> The only 2 differences are:
>>
>> - in the implementation for float formatting, a type of '' is the same 
>> as 'g'.  I think the PEP originally had the wording it does so that 
>> format(1.0, '') would match str(1.0).  This case now matches, because of 
>> the change that says zero length format_spec's are the same as str(). 
>> However, if there's anything else in the format_spec (still with no 
>> type), it doesn't match what str() would do.
>>  >>> str(1.0)
>> '1.0'
>>  >>> format(1.0)
>> '1.0'
>>  >>> format(1.0, "-")
>> '1'
>>  >>> format(1.0, "g")
>> '1'
>> Actually, str() doesn't add a decimal for exponential notation:
>>  >>> str(1e100)
>> '1e+100'
>>
>> I'd like to see the docs just say that an empty type is the same as 'g', 
>> but I'm not sure of the use case for what the documentation currently says.
> 
> Can you suggest a patch?

If we want the docs to match the code, instead of:
None   similar to ``'g'``, except that it prints at least one digit 
after the decimal point.

it would be:
None   the same as 'g'.

But before you do that, I want to see what Talin says.  I'm not sure if 
instead we shouldn't modify the code to match the docs.
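
Whichever way it goes, the cases in question are easy to check on the
interpreter at hand.  The assertions in this sketch hold on current
CPython releases; the branch under discussion may behave differently,
so the sign-only spec is printed rather than asserted.

```python
value = 1.0

# A zero-length format_spec gives the same result as str().
assert format(value) == str(value) == "1.0"
assert format(value, "") == str(value)

# An explicit 'g' type drops the trailing ".0".
assert format(value, "g") == "1"

# In exponential notation str() and 'g' agree.
assert str(1e100) == format(1e100, "g") == "1e+100"

# A spec with a sign but no type is the contested case; just show it.
sign_only = format(value, "-")
print(repr(sign_only))
```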

(Sorry about not doing a real diff.  I'm short on time, and haven't 
checked out the new docs yet.)

Eric.


From barry at python.org  Fri Aug 31 13:21:19 2007
From: barry at python.org (Barry Warsaw)
Date: Fri, 31 Aug 2007 07:21:19 -0400
Subject: [Python-3000] Release Countdown
In-Reply-To: <ca471dc20708302152n3036d987w988487d23b1e25bd@mail.gmail.com>
References: <ca471dc20708302152n3036d987w988487d23b1e25bd@mail.gmail.com>
Message-ID: <DE23846E-DD6F-499B-A82B-407E15CE9D5A@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 31, 2007, at 12:52 AM, Guido van Rossum wrote:

> I'm pretty happy where we stand now -- I just squashed the last two
> failing tests (test_mailbox and test_oldmailbox). It is 9:30 pm here
> and I'm tired, so I'm going to try and get a good night's sleep and do
> the release as early as I can tomorrow.

G'morning Guido!

I've re-enabled test_email because it now passes completely, although  
I had to cheat a bit on the last couple of failures.  I'll address  
those XXXs after a1.

For me on OS X, I'm still getting a failure in test_plistlib and an  
unexpected skip in test_ssl.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRtf5r3EjvBPtnXfVAQKn1gP9HsX7xYt3O7XGPV/TAXv1W25Coh7aMLwK
IOS2TrUDbhgehuaGEcS3u2Q4HBGsDwhURCguLXpSQKch8b4At2qvUXlesOaIixh1
wpwZ5NuiFn43MG/a4MGc9L2VUuRgSyFnl0HsNw9NvklMt+o8p90cCYYaa1McKwaY
vhyf00oBTeQ=
=7zNb
-----END PGP SIGNATURE-----

From thomas at python.org  Fri Aug 31 13:25:47 2007
From: thomas at python.org (Thomas Wouters)
Date: Fri, 31 Aug 2007 13:25:47 +0200
Subject: [Python-3000] Release Countdown
In-Reply-To: <DE23846E-DD6F-499B-A82B-407E15CE9D5A@python.org>
References: <ca471dc20708302152n3036d987w988487d23b1e25bd@mail.gmail.com>
	<DE23846E-DD6F-499B-A82B-407E15CE9D5A@python.org>
Message-ID: <9e804ac0708310425p42dd5461s13a1e4ddd6c943e9@mail.gmail.com>

On 8/31/07, Barry Warsaw <barry at python.org> wrote:

> For me on OS X, I'm still getting a failure in test_plistlib and an
> unexpected skip in test_ssl.


The skip is intentional; the ssl module is in a state of flux, having the
latest changes from the trunk applied, but not adjusted to the new layout of
the socket.socket class.

-- 
Thomas Wouters <thomas at python.org>

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!

From eric+python-dev at trueblade.com  Fri Aug 31 13:41:49 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Fri, 31 Aug 2007 07:41:49 -0400
Subject: [Python-3000] Release Countdown
In-Reply-To: <DE23846E-DD6F-499B-A82B-407E15CE9D5A@python.org>
References: <ca471dc20708302152n3036d987w988487d23b1e25bd@mail.gmail.com>
	<DE23846E-DD6F-499B-A82B-407E15CE9D5A@python.org>
Message-ID: <46D7FE7D.5020909@trueblade.com>

Barry Warsaw wrote:

> For me on OS X, I'm still getting a failure in test_plistlib and an  
> unexpected skip in test_ssl.

If it helps, the test_plistlib errors follow.


$ ./python.exe Lib/test/test_plistlib.py -v
test_appleformatting (__main__.TestPlistlib) ... ERROR
test_appleformattingfromliteral (__main__.TestPlistlib) ... ERROR
test_bytes (__main__.TestPlistlib) ... ERROR
test_bytesio (__main__.TestPlistlib) ... ERROR
test_controlcharacters (__main__.TestPlistlib) ... ok
test_create (__main__.TestPlistlib) ... ok
test_io (__main__.TestPlistlib) ... ERROR
test_nondictroot (__main__.TestPlistlib) ... ok

======================================================================
ERROR: test_appleformatting (__main__.TestPlistlib)
----------------------------------------------------------------------
Traceback (most recent call last):
   File "Lib/test/test_plistlib.py", line 140, in test_appleformatting
     pl = plistlib.readPlistFromBytes(TESTDATA)
   File "/py3k/Lib/plat-mac/plistlib.py", line 102, in readPlistFromBytes
     return readPlist(BytesIO(data))
   File "/py3k/Lib/plat-mac/plistlib.py", line 77, in readPlist
     rootObject = p.parse(pathOrFile)
   File "/py3k/Lib/plat-mac/plistlib.py", line 405, in parse
     parser.ParseFile(fileobj)
   File "/py3k/Lib/plat-mac/plistlib.py", line 417, in handleEndElement
     handler()
   File "/py3k/Lib/plat-mac/plistlib.py", line 467, in end_data
     self.addObject(Data.fromBase64(self.getData()))
   File "/py3k/Lib/plat-mac/plistlib.py", line 374, in fromBase64
     return cls(binascii.a2b_base64(data))
SystemError: can't use str as char buffer

======================================================================
ERROR: test_appleformattingfromliteral (__main__.TestPlistlib)
----------------------------------------------------------------------
Traceback (most recent call last):
   File "Lib/test/test_plistlib.py", line 147, in 
test_appleformattingfromliteral
     pl2 = plistlib.readPlistFromBytes(TESTDATA)
   File "/py3k/Lib/plat-mac/plistlib.py", line 102, in readPlistFromBytes
     return readPlist(BytesIO(data))
   File "/py3k/Lib/plat-mac/plistlib.py", line 77, in readPlist
     rootObject = p.parse(pathOrFile)
   File "/py3k/Lib/plat-mac/plistlib.py", line 405, in parse
     parser.ParseFile(fileobj)
   File "/py3k/Lib/plat-mac/plistlib.py", line 417, in handleEndElement
     handler()
   File "/py3k/Lib/plat-mac/plistlib.py", line 467, in end_data
     self.addObject(Data.fromBase64(self.getData()))
   File "/py3k/Lib/plat-mac/plistlib.py", line 374, in fromBase64
     return cls(binascii.a2b_base64(data))
SystemError: can't use str as char buffer

======================================================================
ERROR: test_bytes (__main__.TestPlistlib)
----------------------------------------------------------------------
Traceback (most recent call last):
   File "Lib/test/test_plistlib.py", line 133, in test_bytes
     data = plistlib.writePlistToBytes(pl)
   File "/py3k/Lib/plat-mac/plistlib.py", line 109, in writePlistToBytes
     writePlist(rootObject, f)
   File "/py3k/Lib/plat-mac/plistlib.py", line 93, in writePlist
     writer.writeValue(rootObject)
   File "/py3k/Lib/plat-mac/plistlib.py", line 250, in writeValue
     self.writeDict(value)
   File "/py3k/Lib/plat-mac/plistlib.py", line 278, in writeDict
     self.writeValue(value)
   File "/py3k/Lib/plat-mac/plistlib.py", line 256, in writeValue
     self.writeArray(value)
   File "/py3k/Lib/plat-mac/plistlib.py", line 284, in writeArray
     self.writeValue(value)
   File "/py3k/Lib/plat-mac/plistlib.py", line 252, in writeValue
     self.writeData(value)
   File "/py3k/Lib/plat-mac/plistlib.py", line 263, in writeData
     maxlinelength = 76 - len(self.indent.replace("\t", " " * 8) *
TypeError: Type str doesn't support the buffer API

======================================================================
ERROR: test_bytesio (__main__.TestPlistlib)
----------------------------------------------------------------------
Traceback (most recent call last):
   File "Lib/test/test_plistlib.py", line 155, in test_bytesio
     plistlib.writePlist(pl, b)
   File "/py3k/Lib/plat-mac/plistlib.py", line 93, in writePlist
     writer.writeValue(rootObject)
   File "/py3k/Lib/plat-mac/plistlib.py", line 250, in writeValue
     self.writeDict(value)
   File "/py3k/Lib/plat-mac/plistlib.py", line 278, in writeDict
     self.writeValue(value)
   File "/py3k/Lib/plat-mac/plistlib.py", line 256, in writeValue
     self.writeArray(value)
   File "/py3k/Lib/plat-mac/plistlib.py", line 284, in writeArray
     self.writeValue(value)
   File "/py3k/Lib/plat-mac/plistlib.py", line 252, in writeValue
     self.writeData(value)
   File "/py3k/Lib/plat-mac/plistlib.py", line 263, in writeData
     maxlinelength = 76 - len(self.indent.replace("\t", " " * 8) *
TypeError: Type str doesn't support the buffer API

======================================================================
ERROR: test_io (__main__.TestPlistlib)
----------------------------------------------------------------------
Traceback (most recent call last):
   File "Lib/test/test_plistlib.py", line 127, in test_io
     plistlib.writePlist(pl, test_support.TESTFN)
   File "/py3k/Lib/plat-mac/plistlib.py", line 93, in writePlist
     writer.writeValue(rootObject)
   File "/py3k/Lib/plat-mac/plistlib.py", line 250, in writeValue
     self.writeDict(value)
   File "/py3k/Lib/plat-mac/plistlib.py", line 278, in writeDict
     self.writeValue(value)
   File "/py3k/Lib/plat-mac/plistlib.py", line 256, in writeValue
     self.writeArray(value)
   File "/py3k/Lib/plat-mac/plistlib.py", line 284, in writeArray
     self.writeValue(value)
   File "/py3k/Lib/plat-mac/plistlib.py", line 252, in writeValue
     self.writeData(value)
   File "/py3k/Lib/plat-mac/plistlib.py", line 263, in writeData
     maxlinelength = 76 - len(self.indent.replace("\t", " " * 8) *
TypeError: Type str doesn't support the buffer API

----------------------------------------------------------------------
Ran 8 tests in 0.060s

FAILED (errors=5)
Traceback (most recent call last):
   File "Lib/test/test_plistlib.py", line 185, in <module>
     test_main()
   File "Lib/test/test_plistlib.py", line 181, in test_main
     test_support.run_unittest(TestPlistlib)
   File "/py3k/Lib/test/test_support.py", line 541, in run_unittest
     _run_suite(suite)
   File "/py3k/Lib/test/test_support.py", line 523, in _run_suite
     raise TestFailed(msg)
test.test_support.TestFailed: errors occurred; run in verbose mode for 
details
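
Both messages point at str and bytes being mixed.  As a guess only (the
traceback does not show how self.indent is set), the writeData failure
looks like what happens when replace() on a bytes object is handed str
arguments.  A minimal sketch on a current Python 3, where the same
mistake raises TypeError:

```python
indent = b"\t\t"  # assumption: the indent is held as bytes at this point

try:
    indent.replace("\t", " " * 8)      # str arguments on a bytes object
except TypeError as exc:
    failure = type(exc).__name__
    print("mixed types:", failure)

# Using bytes throughout avoids the error.
expanded = indent.replace(b"\t", b" " * 8)
print(len(expanded))
```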




From nicko at nicko.org  Fri Aug 31 14:01:35 2007
From: nicko at nicko.org (Nicko van Someren)
Date: Fri, 31 Aug 2007 13:01:35 +0100
Subject: [Python-3000] Need Decimal.__format__
In-Reply-To: <46D77086.3030207@acm.org>
References: <46D77086.3030207@acm.org>
Message-ID: <76E400CE-5A66-4409-A3DC-A9A4045D6CEB@nicko.org>

On 31 Aug 2007, at 02:36, Talin wrote:
...
> Also, I'm interested in suggestions as to any other standard types  
> that
> ought to have a __format__ method, other than the obvious Date/Time
> classes. What kinds of things do people usually want to print?

For years I've thought that various collection types would benefit  
from better formatters.  If space is limited (e.g. fixed width  
fields) then lists, tuples and sets are often better displayed  
truncated, with an ellipsis in lieu of the remainder of the  
contents.  Being able to have a standard way to print your list as  
[2, 3, 5, 7, 11,...] when you only have 20 characters would  
frequently be useful.  Having a formatter directive for collections  
to ask for the contents without the enclosing []/()/{} would also be  
useful.

It's not clear to me that there's much to be gained from building in  
more complex formatters for dictionary-like objects, since it will be  
hard to describe the plethora of different ways it could be done, and  
while having multi-line formatting options for long lists/sets would  
be nice it may deviate too far from the standard usage of string  
formatting, but displaying simple collections is a sufficiently  
common task that I think it's worth looking at.
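
To make the truncation idea concrete, here is a rough sketch.  The
class name and the convention that the format spec is simply a maximum
width are both invented for illustration; nothing like this exists in
the standard library.

```python
class EllipsisList(list):
    """Hypothetical list whose format spec is a maximum display width."""

    def __format__(self, spec):
        full = repr(list(self))
        if not spec:
            return full
        width = int(spec)
        if len(full) <= width:
            return full
        # Keep as many leading items as still fit before the ",...]" tail.
        shown = []
        for item in self:
            candidate = "[" + ", ".join(shown + [repr(item)]) + ",...]"
            if len(candidate) > width:
                break
            shown.append(repr(item))
        if not shown:
            return "[...]"
        return "[" + ", ".join(shown) + ",...]"


primes = EllipsisList([2, 3, 5, 7, 11, 13, 17, 19, 23])
print(format(primes, "20"))
print("{0:40}".format(primes))
```

A fuller version would need a way to say "no brackets" and to combine
the width with fill and alignment, which is where a real mini-language
for collections would have to be designed.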

	Nicko


From martin at v.loewis.de  Fri Aug 31 14:05:19 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 31 Aug 2007 14:05:19 +0200
Subject: [Python-3000] Release Countdown
In-Reply-To: <46D7FE7D.5020909@trueblade.com>
References: <ca471dc20708302152n3036d987w988487d23b1e25bd@mail.gmail.com>	<DE23846E-DD6F-499B-A82B-407E15CE9D5A@python.org>
	<46D7FE7D.5020909@trueblade.com>
Message-ID: <46D803FF.9000909@v.loewis.de>

>> For me on OS X, I'm still getting a failure in test_plistlib and an  
>> unexpected skip in test_ssl.
> 
> If it helps, the test_plistlib errors follow.

In case it isn't clear: test_plistlib will fail *only* on OS X, because
it isn't run elsewhere. So somebody with OS X needs to fix it.

Regards,
Martin

From eric+python-dev at trueblade.com  Fri Aug 31 14:17:28 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Fri, 31 Aug 2007 08:17:28 -0400
Subject: [Python-3000] Release Countdown
In-Reply-To: <46D803FF.9000909@v.loewis.de>
References: <ca471dc20708302152n3036d987w988487d23b1e25bd@mail.gmail.com>	<DE23846E-DD6F-499B-A82B-407E15CE9D5A@python.org>
	<46D7FE7D.5020909@trueblade.com> <46D803FF.9000909@v.loewis.de>
Message-ID: <46D806D8.4070905@trueblade.com>

Martin v. Löwis wrote:
>>> For me on OS X, I'm still getting a failure in test_plistlib and an  
>>> unexpected skip in test_ssl.
>> If it helps, the test_plistlib errors follow.
> 
> In case it isn't clear: test_plistlib will fail *only* on OS X, because
> it isn't run elsewhere. So somebody with OS X needs to fix it.

Yes, thanks.  I won't have time to fix it before a1; I thought maybe 
this might jog the memory of someone who is more familiar with the test 
to suggest a fix.

If not, I can take a look at it post a1, and/or we could disable the 
test for a1.

Eric.

From eric+python-dev at trueblade.com  Fri Aug 31 14:19:55 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Fri, 31 Aug 2007 08:19:55 -0400
Subject: [Python-3000] Need Decimal.__format__
In-Reply-To: <46D77086.3030207@acm.org>
References: <46D77086.3030207@acm.org>
Message-ID: <46D8076B.8070005@trueblade.com>

Talin wrote:
> I'm looking for a volunteer who understands the Decimal class well 
> enough to write a __format__ method for it. It should handle all of the 
> same format specifiers as float.__format__, but it should not use the 
> same implementation as float (so as to preserve accuracy.)
> 
> Also, I'm interested in suggestions as to any other standard types that 
> ought to have a __format__ method, other than the obvious Date/Time 
> classes. What kinds of things do people usually want to print?

I have a patch for adding __format__ to datetime, date, and time.  For a 
zero length format_spec, they return str(self), otherwise 
self.strftime(format_spec).
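
The rule is small enough to spell out.  This sketch puts it on a
made-up subclass of date so that it stands alone; it is an illustration
of the behaviour described above, not the patch itself.

```python
from datetime import date


class FormattableDate(date):
    """Hypothetical subclass spelling out the rule described above."""

    def __format__(self, format_spec):
        if len(format_spec) == 0:
            return str(self)
        return self.strftime(format_spec)


release_day = FormattableDate(2007, 8, 31)
print(format(release_day))
print(format(release_day, "%Y/%m/%d"))
print("{0:%d.%m.%Y}".format(release_day))
```

The format_spec is handed to strftime untouched, so anything strftime
accepts can appear after the colon in a replacement field.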

I can whip up some tests and check it in if you want this before a1, but 
if you want more discussion on what it should do then we can wait.  Let 
me know.  But since the deadline is in 40 minutes, I guess we can do it 
for a2.

As for what other types, I can't think of any.  I've scanned through my 
real work code, and int, float, string, and datetime pretty much cover it.

Eric.

From barry at python.org  Fri Aug 31 14:40:19 2007
From: barry at python.org (Barry Warsaw)
Date: Fri, 31 Aug 2007 08:40:19 -0400
Subject: [Python-3000] Release Countdown
In-Reply-To: <9e804ac0708310425p42dd5461s13a1e4ddd6c943e9@mail.gmail.com>
References: <ca471dc20708302152n3036d987w988487d23b1e25bd@mail.gmail.com>
	<DE23846E-DD6F-499B-A82B-407E15CE9D5A@python.org>
	<9e804ac0708310425p42dd5461s13a1e4ddd6c943e9@mail.gmail.com>
Message-ID: <1D9688DF-92D5-4602-A93B-0D3998FD8891@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 31, 2007, at 7:25 AM, Thomas Wouters wrote:

> On 8/31/07, Barry Warsaw <barry at python.org> wrote:
> For me on OS X, I'm still getting a failure in test_plistlib and an
> unexpected skip in test_ssl.
>
> The skip is intentional; the ssl module is in a state of flux,  
> having the latest changes from the trunk applied, but not adjusted  
> to the new layout of the socket.socket class.

Does that mean the skip is intentionally unexpected, or unexpectedly  
intentional? :)

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRtgMM3EjvBPtnXfVAQKzkwQApEm7j9iF1PCXZzYvNo6JH7Bu7BEv7TZ6
YFMQOHh4BXay4TmQxvx3jhjD4jnql01e6dBRCaNJ0xCNhsBXMOsAHc/EUkdYR7QF
5D8ozpw3uPEkhWh7AeQpynFuLdtObWmApKEXxjbDFmP/hq5LifAfHUwakx6z4F50
/iRWLNp7k6w=
=aXRL
-----END PGP SIGNATURE-----

From barry at python.org  Fri Aug 31 14:41:26 2007
From: barry at python.org (Barry Warsaw)
Date: Fri, 31 Aug 2007 08:41:26 -0400
Subject: [Python-3000] Release Countdown
In-Reply-To: <46D806D8.4070905@trueblade.com>
References: <ca471dc20708302152n3036d987w988487d23b1e25bd@mail.gmail.com>	<DE23846E-DD6F-499B-A82B-407E15CE9D5A@python.org>
	<46D7FE7D.5020909@trueblade.com> <46D803FF.9000909@v.loewis.de>
	<46D806D8.4070905@trueblade.com>
Message-ID: <797C63A8-888F-45CC-A780-CE9AD859BC1B@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 31, 2007, at 8:17 AM, Eric Smith wrote:

> Martin v. Löwis wrote:
>>>> For me on OS X, I'm still getting a failure in test_plistlib and an
>>>> unexpected skip in test_ssl.
>>> If it helps, the test_plistlib errors follow.
>>
>> In case it isn't clear: test_plistlib will fail *only* on OS X,  
>> because
>> it isn't run elsewhere. So somebody with OS X needs to fix it.
>
> Yes, thanks.  I won't have time to fix it before a1, I thought maybe
> this might jog the memory of someone who is more familiar with the  
> test
> to suggest a fix.
>
> If not, I can take a look at it post a1, and/or we could disable the
> test for a1.

I took a 5 minute crack at it this morning but didn't get anywhere  
and won't have any more time to work on it before a1.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRtgMdnEjvBPtnXfVAQKooAP5AfZCcbg682Ff/hig8Y2ZUoWdlvCpNvgL
hFLac958MYT6VmqH6/HwXnwcW1CD7l7/7RkooFGAfecG1Rr88THQHvh0k6W09Hur
lwSb65yflVRbGer0RsERgUcgZ5S1bZkzo/0NGCbmQB99RPhzTEDfSLWmFKqOyFa1
/GpuHVoIFWA=
=8/zS
-----END PGP SIGNATURE-----

From martin at v.loewis.de  Fri Aug 31 14:52:36 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 31 Aug 2007 14:52:36 +0200
Subject: [Python-3000] buildbots
In-Reply-To: <fb8b6k$9n0$1@sea.gmane.org>
References: <46D453E9.4020903@ctypes.org>	<46D45CCA.3050206@v.loewis.de>	<46D462EB.4070600@ctypes.org>	<46D4721A.2040208@ctypes.org>	<46D57D47.1090709@v.loewis.de>	<fb4ecl$t5a$1@sea.gmane.org>	<46D5D22C.3010003@v.loewis.de>	<m2wsvef5bu.fsf@valheru.db3l.homeip.net>	<m2sl62f4a3.fsf@valheru.db3l.homeip.net>	<fb5mt8$5n4$1@sea.gmane.org>	<46D66319.7030209@v.loewis.de>	<fb5oo5$a6e$1@sea.gmane.org>	<46D66B49.8070600@v.loewis.de>	<fb5rcj$ig9$1@sea.gmane.org>	<46D6741A.9040801@v.loewis.de>	<m2d4x5fdjd.fsf@valheru.db3l.homeip.net>	<46D6B904.1020505@v.loewis.de>	<m2k5rceq3t.fsf@valheru.db3l.homeip.net>	<46D72B4F.6040908@v.loewis.de>
	<fb8b6k$9n0$1@sea.gmane.org>
Message-ID: <46D80F14.5080009@v.loewis.de>

> Any suggestions?

I've now backed out my first patch, and implemented an extension
to msvcrt, as well as a command line option for regrtest. Let's
see how this works.

Regards,
Martin

From theller at ctypes.org  Fri Aug 31 15:11:47 2007
From: theller at ctypes.org (Thomas Heller)
Date: Fri, 31 Aug 2007 15:11:47 +0200
Subject: [Python-3000] buildbots
In-Reply-To: <46D80F14.5080009@v.loewis.de>
References: <46D453E9.4020903@ctypes.org>	<46D45CCA.3050206@v.loewis.de>	<46D462EB.4070600@ctypes.org>	<46D4721A.2040208@ctypes.org>	<46D57D47.1090709@v.loewis.de>	<fb4ecl$t5a$1@sea.gmane.org>	<46D5D22C.3010003@v.loewis.de>	<m2wsvef5bu.fsf@valheru.db3l.homeip.net>	<m2sl62f4a3.fsf@valheru.db3l.homeip.net>	<fb5mt8$5n4$1@sea.gmane.org>	<46D66319.7030209@v.loewis.de>	<fb5oo5$a6e$1@sea.gmane.org>	<46D66B49.8070600@v.loewis.de>	<fb5rcj$ig9$1@sea.gmane.org>	<46D6741A.9040801@v.loewis.de>	<m2d4x5fdjd.fsf@valheru.db3l.homeip.net>	<46D6B904.1020505@v.loewis.de>	<m2k5rceq3t.fsf@valheru.db3l.homeip.net>	<46D72B4F.6040908@v.loewis.de>	<fb8b6k$9n0$1@sea.gmane.org>
	<46D80F14.5080009@v.loewis.de>
Message-ID: <fb942j$ns7$1@sea.gmane.org>

Martin v. Löwis wrote:
>> Any suggestions?
> 
> I've now backed out my first patch, and implemented an extension
> to msvcrt, as well as a command line option for regrtest. Let's
> see how this works.
> 
> Regards,
> Martin

At least the tests on the win32 buildbot now do not hang any longer
if I do not click the abort button on the message box.

See for example
http://python.org/dev/buildbot/3.0/x86%20XP-3%203.0/builds/57/step-test/0

Thanks,
Thomas


From theller at ctypes.org  Fri Aug 31 15:16:49 2007
From: theller at ctypes.org (Thomas Heller)
Date: Fri, 31 Aug 2007 15:16:49 +0200
Subject: [Python-3000] Merging between trunk and py3k?
Message-ID: <fb94c1$nqj$1@sea.gmane.org>

Will commits still be merged between trunk and py3k in the future
(after the 3.0a1 release), or must this now be done by the developers
themselves?

Or is it less work for the one who does the merge if applicable bug fixes are
committed to both trunk and the py3k branch?

Thomas


From guido at python.org  Fri Aug 31 15:25:17 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 31 Aug 2007 06:25:17 -0700
Subject: [Python-3000] Py3k Branch FROZEN
Message-ID: <ca471dc20708310625n703309cfydf6383972b7c35d9@mail.gmail.com>

Please don't submit anything to the py3k branch until I announce it's
unfrozen or I specifically ask you to do something.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Fri Aug 31 15:45:20 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 31 Aug 2007 06:45:20 -0700
Subject: [Python-3000] str.decode; buffers
In-Reply-To: <fb8nob$eqs$1@sea.gmane.org>
References: <fb8nob$eqs$1@sea.gmane.org>
Message-ID: <ca471dc20708310645x561f8849h412b2721a7e2fea8@mail.gmail.com>

Yes on both accounts. Checkin coming up.

On 8/31/07, Georg Brandl <g.brandl at gmx.net> wrote:
> Two short issues:
>
> * Shouldn't str.decode() be removed? Every call to it says
>   "TypeError: decoding str is not supported".
>
> * Using e.g. b"abc".find("a") gives "SystemError: can't use str as char buffer".
>   This should be a TypeError IMO.
>
> Georg
>
> --
> Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
> Four shall be the number of spaces thou shalt indent, and the number of thy
> indenting shall be four. Eight shalt thou not indent, nor either indent thou
> two, excepting that thou then proceed to four. Tabs are right out.
>
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Fri Aug 31 15:49:49 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 31 Aug 2007 06:49:49 -0700
Subject: [Python-3000] str.decode; buffers
In-Reply-To: <ca471dc20708310645x561f8849h412b2721a7e2fea8@mail.gmail.com>
References: <fb8nob$eqs$1@sea.gmane.org>
	<ca471dc20708310645x561f8849h412b2721a7e2fea8@mail.gmail.com>
Message-ID: <ca471dc20708310649u493e8fc1k5d1aa4f126fc4d7@mail.gmail.com>

FWIW I think "s".find(b"b") should also raise a TypeError, but I don't
have the guts to tackle that today.
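
For anyone checking this against a released Python 3 interpreter, here is a
minimal sketch of the three behaviours discussed in this thread. It shows
what current Python 3 releases do, which is not necessarily the state of the
py3k branch on this date. The helper name raises_type_error is made up for
illustration.

```python
def raises_type_error(func, *args):
    """Return True if calling func(*args) raises TypeError."""
    try:
        func(*args)
    except TypeError:
        return True
    return False


# str no longer has a decode() method at all.
str_has_decode = hasattr(str, "decode")

# Mixing bytes and str in find() is a TypeError in both directions.
bytes_find_str = raises_type_error(b"abc".find, "a")
str_find_bytes = raises_type_error("s".find, b"b")

print(str_has_decode, bytes_find_str, str_find_bytes)
```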

On 8/31/07, Guido van Rossum <guido at python.org> wrote:
> Yes on both accounts. Checkin coming up.
>
> On 8/31/07, Georg Brandl <g.brandl at gmx.net> wrote:
> > Two short issues:
> >
> > * Shouldn't str.decode() be removed? Every call to it says
> >   "TypeError: decoding str is not supported".
> >
> > * Using e.g. b"abc".find("a") gives "SystemError: can't use str as char buffer".
> >   This should be a TypeError IMO.
> >
> > Georg
> >
> > --
> > Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
> > Four shall be the number of spaces thou shalt indent, and the number of thy
> > indenting shall be four. Eight shalt thou not indent, nor either indent thou
> > two, excepting that thou then proceed to four. Tabs are right out.
> >
>
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Fri Aug 31 15:57:53 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 31 Aug 2007 06:57:53 -0700
Subject: [Python-3000] Merging between trunk and py3k?
In-Reply-To: <fb94c1$nqj$1@sea.gmane.org>
References: <fb94c1$nqj$1@sea.gmane.org>
Message-ID: <ca471dc20708310657o57c71f80u5e780a9a7d038202@mail.gmail.com>

I haven't heard yet that merging is impossible or useless; there's
still a lot of similarity between the trunk and the branch.

As long as that remains the case, I'd like to continue to do merges
(except for those files that have been completely rewritten or
removed, like README, bufferobject.* or intobject.*).

Once we stop merging, I'd like to reformat all C code to conform to
the new coding standard (4-space indents, no tabs, no trailing
whitespace, 80-col line length strictly enforced). But I expect
that'll be a long time in the future.
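
As a rough illustration of the mechanically checkable parts of that
standard, here is a small sketch. The function name check_c_style and the
max_cols default of 80 are assumptions made for this example, not an
existing tool. It looks only at tabs, trailing whitespace and line length;
checking 4-space indentation properly would need more than a line-by-line
scan.

```python
def check_c_style(source, max_cols=80):
    """Return a list of (line_number, problem) tuples for a C source string."""
    problems = []
    for number, line in enumerate(source.splitlines(), start=1):
        if "\t" in line:
            problems.append((number, "tab character"))
        if line != line.rstrip():
            problems.append((number, "trailing whitespace"))
        if len(line) > max_cols:
            problems.append((number, "line longer than %d columns" % max_cols))
    return problems


# Hypothetical input: line 3 is indented with a tab and ends with a space.
sample = "int main(void)\n{\n\treturn 0; \n}\n"
for number, problem in check_c_style(sample):
    print(number, problem)
```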

--Guido

On 8/31/07, Thomas Heller <theller at ctypes.org> wrote:
> Will commits still be merged between trunk and py3k in the future
> (after the 3.0a1 release), or must this now be done by the developers
> themselves?
>
> Or is it less work for the one who does the merge if applicable bug fixes are
> committed to both trunk and the py3k branch?
>
> Thomas
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Fri Aug 31 18:24:47 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 31 Aug 2007 09:24:47 -0700
Subject: [Python-3000] Python 3.0a1 released!
Message-ID: <ca471dc20708310924r45d66728oe6a2a17a4a686b35@mail.gmail.com>

The release is available from http://python.org/download/releases/3.0/

I'll send a longer announcement to python-list and
python-announce-list. Please blog about this if you have a blog!

Thanks to all who helped out! It's been a great ride.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Fri Aug 31 18:25:39 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 31 Aug 2007 09:25:39 -0700
Subject: [Python-3000] Py3k Branch UNFROZEN
Message-ID: <ca471dc20708310925y3067b977h621d37dec8bec96a@mail.gmail.com>

The branch is now unfrozen. I tagged the release as r30a1.

On 8/31/07, Guido van Rossum <guido at python.org> wrote:
> Please don't submit anything to the py3k branch until I announce it's
> unfrozen or I specifically ask you to do something.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Fri Aug 31 18:36:08 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 31 Aug 2007 09:36:08 -0700
Subject: [Python-3000] Release Countdown
In-Reply-To: <797C63A8-888F-45CC-A780-CE9AD859BC1B@python.org>
References: <ca471dc20708302152n3036d987w988487d23b1e25bd@mail.gmail.com>
	<DE23846E-DD6F-499B-A82B-407E15CE9D5A@python.org>
	<46D7FE7D.5020909@trueblade.com> <46D803FF.9000909@v.loewis.de>
	<46D806D8.4070905@trueblade.com>
	<797C63A8-888F-45CC-A780-CE9AD859BC1B@python.org>
Message-ID: <ca471dc20708310936v4f0f9601ke4819b6c87af95d2@mail.gmail.com>

On 8/31/07, Barry Warsaw <barry at python.org> wrote:
> On Aug 31, 2007, at 8:17 AM, Eric Smith wrote:
> > Martin v. Löwis wrote:
> >>>> For me on OS X, I'm still getting a failure in test_plistlib and an
> >>>> unexpected skip in test_ssl.
> >>> If it helps, the test_plistlib errors follow.
> >>
> >> In case it isn't clear: test_plistlib will fail *only* on OS X, because
> >> it isn't run elsewhere. So somebody with OS X needs to fix it.
> >
> > Yes, thanks.  I won't have time to fix it before a1, I thought maybe
> > this might jog the memory of someone who is more familiar with the test
> > to suggest a fix.
> >
> > If not, I can take a look at it post a1, and/or we could disable the
> > test for a1.
>
> I took a 5 minute crack at it this morning but didn't get anywhere
> and won't have any more time to work on it before a1.

No worry, I cracked it, just in time before the release.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From lists at cheimes.de  Fri Aug 31 18:46:41 2007
From: lists at cheimes.de (Christian Heimes)
Date: Fri, 31 Aug 2007 18:46:41 +0200
Subject: [Python-3000] Compiling Python 3.0 with MS Visual Studio 2005
Message-ID: <fb9glk$hpg$1@sea.gmane.org>

I tried to compile Python 3.0 with MS Visual Studio 2005 on Windows XP
SP2 (German) and I ran into multiple problems with 3rd party modules.
The problem with time on German installations of Windows still exists.
It renders Python 3.0a on Windows for Germans pretty useless. :/

* the import of _time still fails on my (German) windows box because my
time zone contains umlauts. "set TZ=GMT" fixes the problem.

* bzip2: The readme claims that libbz2.lib is compiled automatically but
it's not the case for pcbuild8. I had to compile it manually using the
libbz2.dsp project file with the target "Release".

* bsddb: The recipe for _bsddb isn't working because Berkeley_DB.sln was
created with an older version of Visual Studio. But one can convert the
file and build it with Visual Studio easily without using the shell.
Only the db_static project is required. The bsddb project has another
issue. The file _bsddb.pyd ends in win32pgo and not in win32debug or
win32release.

* MSI: msi.lib is missing on x86. It must be installed with the
platform SDK. The blog entry
http://blogs.msdn.com/heaths/archive/2005/12/15/504399.aspx explains the
background and how to install msi.lib

* sqlite3: The sqlite3.dll isn't copied into the build directories
win32debug and win32release. This breaks the imports and tests.

* SSL: The _ssl project is not listed in the project map and the
compilation of openssl fails with an error in crypto/des/enc_read.c:150
"The POSIX name for this item is deprecated. Instead, use the ISO C++
conformant name: _read."

Christian


From duda.piotr at gmail.com  Fri Aug 31 19:26:08 2007
From: duda.piotr at gmail.com (Piotr Duda)
Date: Fri, 31 Aug 2007 19:26:08 +0200
Subject: [Python-3000] os.stat() raises UnicodeDecodeError on Polish windows
	if file not exist
Message-ID: <3df8f2650708311026k28cb1713vefde51d049750f44@mail.gmail.com>

In 3.0a1 on Polish Windows XP, os.stat() raises UnicodeDecodeError (utf-8
codec can't decode ...) if the file does not exist. This is probably caused
by localized error messages returned by FormatMessage.
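
The suspected failure mode can be sketched without Windows. The message text
below is a hypothetical stand-in (the real string returned by FormatMessage
may differ), and cp1250 is assumed to be the ANSI code page of a Polish
installation. The second half shows the behaviour one would expect from
os.stat() on a missing file: an OSError carrying ENOENT.

```python
import errno
import os
import tempfile

# Hypothetical stand-in for a localized Windows error message.
message = "Nie można odnaleźć pliku"
ansi_bytes = message.encode("cp1250")

# Treating code-page bytes as UTF-8 fails on the non-ASCII letters.
try:
    ansi_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Decoding with the code page the bytes were written in round-trips cleanly.
round_trip_ok = ansi_bytes.decode("cp1250") == message

# Expected behaviour: a missing file yields OSError with errno ENOENT.
with tempfile.TemporaryDirectory() as directory:
    missing = os.path.join(directory, "no-such-file")
    try:
        os.stat(missing)
        stat_errno = None
    except OSError as exc:
        stat_errno = exc.errno

print(decode_failed, round_trip_ok, stat_errno == errno.ENOENT)
```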


From lists at cheimes.de  Fri Aug 31 20:14:26 2007
From: lists at cheimes.de (Christian Heimes)
Date: Fri, 31 Aug 2007 20:14:26 +0200
Subject: [Python-3000] os.stat() raises UnicodeDecodeError on Polish
 windows if file not exist
In-Reply-To: <3df8f2650708311026k28cb1713vefde51d049750f44@mail.gmail.com>
References: <3df8f2650708311026k28cb1713vefde51d049750f44@mail.gmail.com>
Message-ID: <fb9lq3$msd$1@sea.gmane.org>

Piotr Duda wrote:
> In 3.0a1 on Polish winxp os.stat() raises UnicodeDecodeError (utf-8
> codec can't decode ...) if file not exist, it is probably caused by
> localized error messages returned by FormatMessage.

On German Win XP, too

Christian


From amauryfa at gmail.com  Fri Aug 31 22:31:10 2007
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Fri, 31 Aug 2007 22:31:10 +0200
Subject: [Python-3000] os.stat() raises UnicodeDecodeError on Polish
	windows if file not exist
In-Reply-To: <fb9lq3$msd$1@sea.gmane.org>
References: <3df8f2650708311026k28cb1713vefde51d049750f44@mail.gmail.com>
	<fb9lq3$msd$1@sea.gmane.org>
Message-ID: <e27efe130708311331o7667aac7sdbfb1a0a03f086f7@mail.gmail.com>

Hello,
Christian Heimes wrote:
> Piotr Duda wrote:
> > In 3.0a1 on Polish winxp os.stat() raises UnicodeDecodeError (utf-8
> > codec can't decode ...) if file not exist, it is probably caused by
> > localized error messages returned by FormatMessage.
>
> On German Win XP, too

Would you please test with the following patch? It seems to correct
the problem on my French Windows XP.
Maybe we can have it corrected before a complete European tour...

-- 
Amaury Forgeot d'Arc
-------------- next part --------------
A non-text attachment was scrubbed...
Name: errors.diff
Type: application/octet-stream
Size: 4574 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070831/2a318b5d/attachment.obj 

From guido at python.org  Fri Aug 31 22:59:42 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 31 Aug 2007 13:59:42 -0700
Subject: [Python-3000] os.stat() raises UnicodeDecodeError on Polish
	windows if file not exist
In-Reply-To: <e27efe130708311331o7667aac7sdbfb1a0a03f086f7@mail.gmail.com>
References: <3df8f2650708311026k28cb1713vefde51d049750f44@mail.gmail.com>
	<fb9lq3$msd$1@sea.gmane.org>
	<e27efe130708311331o7667aac7sdbfb1a0a03f086f7@mail.gmail.com>
Message-ID: <ca471dc20708311359n672f3ed1n6f7f1a8e3c4e1b42@mail.gmail.com>

Can you guys please put this in the bug tracker too?

On 8/31/07, Amaury Forgeot d'Arc <amauryfa at gmail.com> wrote:
> Hello,
> Christian Heimes wrote:
> > Piotr Duda wrote:
> > > In 3.0a1 on Polish winxp os.stat() raises UnicodeDecodeError (utf-8
> > > codec can't decode ...) if file not exist, it is probably caused by
> > > localized error messages returned by FormatMessage.
> >
> > On German Win XP, too
>
> Would you please test with the following patch? it seems to correct
> the problem on my French Windows XP.
> Maybe we can have it corrected before a complete European tour...
>
> --
> Amaury Forgeot d'Arc
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jimjjewett at gmail.com  Fri Aug 31 23:00:47 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Fri, 31 Aug 2007 17:00:47 -0400
Subject: [Python-3000] Release Countdown
In-Reply-To: <ca471dc20708310936v4f0f9601ke4819b6c87af95d2@mail.gmail.com>
References: <ca471dc20708302152n3036d987w988487d23b1e25bd@mail.gmail.com>
	<DE23846E-DD6F-499B-A82B-407E15CE9D5A@python.org>
	<46D7FE7D.5020909@trueblade.com> <46D803FF.9000909@v.loewis.de>
	<46D806D8.4070905@trueblade.com>
	<797C63A8-888F-45CC-A780-CE9AD859BC1B@python.org>
	<ca471dc20708310936v4f0f9601ke4819b6c87af95d2@mail.gmail.com>
Message-ID: <fb6fbf560708311400n4956a645gffcffd38815a03dc@mail.gmail.com>

On 8/31/07, Guido van Rossum <guido at python.org> wrote:

> > >>>> For me on OS X, I'm still getting a failure in test_plistlib and an

> No worry, I cracked it, just in time before the release.

Seeing the recent changes to plistlib does make me think that bytes is
more awkward than it should be.  The changes I would suggest:

(1)  Allow bytes methods to take a literal string (which will
obviously be in the source file's encoding).

Needing to change

    for line in data.asBase64(maxlinelength).split("\n"):
to
    for line in data.asBase64(maxlinelength).split(b"\n"):

(even when I know the "integers" represent ASCII letters) is exactly
the sort of type-checking that annoys me in Java.

http://svn.python.org/view/python/branches/py3k/Lib/plat-mac/plistlib.py?rev=57844&r1=57744&r2=57844
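
To make the complaint concrete, here is a small sketch of how this behaves
in released Python 3 versions. The payload is made up, and
base64.encodebytes stands in for the asBase64() call in plistlib.

```python
import base64

# Hypothetical payload; any bytes would do.
encoded = base64.encodebytes(b"x" * 100)  # ASCII bytes with "\n" line breaks

# bytes.split() needs a bytes separator ...
lines = encoded.split(b"\n")

# ... and rejects a str separator with TypeError.
try:
    encoded.split("\n")
    str_separator_rejected = False
except TypeError:
    str_separator_rejected = True

# Decoding first gives str, after which a str separator is fine.
text_lines = encoded.decode("ascii").split("\n")

print(str_separator_rejected)
print([line.decode("ascii") for line in lines] == text_lines)
```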


(2)  There really ought to be an immutable bytes type, and the literal
(or at least a literal, if capitalization matters) ought to be the
immutable.

PLISTHEADER = b"""\
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD
PLIST 1.0//EN" "http://www.apple.com/DTDs/
PropertyList-1.0.dtd">
"""

If the value of PLISTHEADER does change during the run, it will almost
certainly be a bug.  I could code defensively by only ever passing
copies, but that seems wasteful, and it could hide other bugs.  If
something does try to modify (not replace, modify) it, then there was
probably a typo or API misunderstanding; I *want* an exception.

http://svn.python.org/view/python/branches/py3k/Lib/plat-mac/plistlib.py?rev=57563&r1=57305&r2=57563
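
For comparison, released Python 3 versions ended up with exactly this split:
bytes literals are immutable and bytearray is the mutable type. That is not
how the branch behaved when this message was written. A minimal sketch, with
a shortened header standing in for PLISTHEADER:

```python
header = b"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"

# bytes objects are immutable: in-place modification raises TypeError.
try:
    header[0] = ord("!")
    modified = True
except TypeError:
    modified = False

# bytearray is the mutable counterpart, created here as an explicit copy.
scratch = bytearray(header)
scratch[0] = ord("!")

print(modified, header[:5], bytes(scratch[:5]))
```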

-jJ

From guido at python.org  Fri Aug 31 23:03:46 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 31 Aug 2007 14:03:46 -0700
Subject: [Python-3000] Release Countdown
In-Reply-To: <fb6fbf560708311400n4956a645gffcffd38815a03dc@mail.gmail.com>
References: <ca471dc20708302152n3036d987w988487d23b1e25bd@mail.gmail.com>
	<DE23846E-DD6F-499B-A82B-407E15CE9D5A@python.org>
	<46D7FE7D.5020909@trueblade.com> <46D803FF.9000909@v.loewis.de>
	<46D806D8.4070905@trueblade.com>
	<797C63A8-888F-45CC-A780-CE9AD859BC1B@python.org>
	<ca471dc20708310936v4f0f9601ke4819b6c87af95d2@mail.gmail.com>
	<fb6fbf560708311400n4956a645gffcffd38815a03dc@mail.gmail.com>
Message-ID: <ca471dc20708311403q6ddd6131x9d20d44996553df1@mail.gmail.com>

On 8/31/07, Jim Jewett <jimjjewett at gmail.com> wrote:
> On 8/31/07, Guido van Rossum <guido at python.org> wrote:
>
> > > >>>> For me on OS X, I'm still getting a failure in test_plistlib and an
>
> > No worry, I cracked it, just in time before the release.
>
> Seeing the recent changes to plistlib does make me think that bytes is
> more awkward than it should be.  The changes I would suggest:
>
> (1)  Allow bytes methods to take a literal string (which will
> obviously be in the source file's encoding).

Yuck, yuck about the source file encoding part. Also, there is no way
to tell that a particular argument was passed a literal. The very
definition of "this was a literal" is iffy -- is x a literal when
passed to f below?

  x = "abc"
  f(x)

> Needing to change
>
>     for line in data.asBase64(maxlinelength).split("\n"):
> to
>     for line in data.asBase64(maxlinelength).split(b"\n"):
>
> (even when I know the "integers" represent ASCII letters) is exactly
> the sort of type-checking that annoys me in Java.
>
> http://svn.python.org/view/python/branches/py3k/Lib/plat-mac/plistlib.py?rev=57844&r1=57744&r2=57844
>
>
> (2)  There really ought to be an immutable bytes type, and the literal
> (or at least a literal, if capitalization matters) ought to be the
> immutable.
>
> PLISTHEADER = b"""\
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE plist PUBLIC "-//Apple Computer//DTD
> PLIST 1.0//EN" "http://www.apple.com/DTDs/
> PropertyList-1.0.dtd">
> """
>
> If the value of PLISTHEADER does change during the run, it will almost
> certainly be a bug.  I could code defensively by only ever passing
> copies, but that seems wasteful, and it could hide other bugs.  If
> something does try to modify (not replace, modify) it, then there was
> probably a typo or API misunderstanding; I *want* an exception.

Sounds like you're worrying too much. Do you have any indication that
this is going to be a common problem?

> http://svn.python.org/view/python/branches/py3k/Lib/plat-mac/plistlib.py?rev=57563&r1=57305&r2=57563
>
> -jJ
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at v.loewis.de  Fri Aug 31 23:23:41 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 31 Aug 2007 23:23:41 +0200
Subject: [Python-3000] Compiling Python 3.0 with MS Visual Studio 2005
In-Reply-To: <fb9glk$hpg$1@sea.gmane.org>
References: <fb9glk$hpg$1@sea.gmane.org>
Message-ID: <46D886DD.2070601@v.loewis.de>

Christian Heimes wrote:
> I tried to compile Python 3.0 with MS Visual Studio 2005 on Windows XP
> SP2 (German) and I run into multiple problems with 3rd party modules.
> The problem with time on German installations of Windows still exists.

Not for me - it works fine here. Are you sure your source is up-to-date?

I can't comment on PCbuild8 problems - this directory is largely
unmaintained.

Regards,
Martin

From pfdubois at gmail.com  Thu Aug 30 19:11:03 2007
From: pfdubois at gmail.com (Paul Dubois)
Date: Thu, 30 Aug 2007 10:11:03 -0700
Subject: [Python-3000] Patch for Doc/tutorial
Message-ID: <f74a6c2f0708301011h5acf7425kcc884e7c7306efbf@mail.gmail.com>

Attached is a patch for changes to the tutorial. I made it by doing:

svn diff tutorial > tutorial.diff

in the Doc directory. I hope this is what is wanted; if not let me know what
to do.

Unfortunately cygwin will not run Sphinx correctly even using 2.5, much less
3.0. And running docutils by hand gets a lot of errors because Sphinx has
hidden a lot of the definitions used in the tutorial. So the bottom line is
I have only an imperfect idea if I have screwed up any formatting.

I would like to rewrite the classes.rst file in particular, and it is the
one that I did not check to be sure the examples worked, but first I need to
do something about getting me a real Linux so I don't have these problems.
So unless someone is hot to trot I'd like to remain 'owner' of this issue on
the spreadsheet.

Whoever puts in these patches, I would appreciate being notified that it is
done.

Paul
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tutorial.diff
Type: application/octet-stream
Size: 54448 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070830/b366efa3/attachment-0001.obj