From andrewm at object-craft.com.au  Mon Sep  1 03:53:45 2008
From: andrewm at object-craft.com.au (Andrew McNamara)
Date: Mon, 01 Sep 2008 11:53:45 +1000
Subject: [Python-3000] Minor problem with ABCMeta?
Message-ID: <20080901015345.A1196600801@longblack.object-craft.com.au>

The __subclasscheck__ method of ABCMeta contains the following code:

        # Check if it's a subclass of a registered class (recursive)
        for rcls in cls._abc_registry:
            if issubclass(subclass, rcls):
                cls._abc_registry.add(subclass)
                return True

It looks to me like this code will result in an unnecessary call to
cls._abc_registry.add() in the case that "subclass" is already in
cls._abc_registry. It looks like the code should be preceded with
something like:

        if subclass in cls._abc_registry:
            return True

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

From ncoghlan at gmail.com  Mon Sep  1 12:05:58 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 01 Sep 2008 20:05:58 +1000
Subject: [Python-3000] Minor problem with ABCMeta?
In-Reply-To: <20080901015345.A1196600801@longblack.object-craft.com.au>
References: <20080901015345.A1196600801@longblack.object-craft.com.au>
Message-ID: <48BBBE86.20506@gmail.com>

Andrew McNamara wrote:
> The __subclasscheck__ method of ABCMeta contains the following code:
> 
>         # Check if it's a subclass of a registered class (recursive)
>         for rcls in cls._abc_registry:
>             if issubclass(subclass, rcls):
>                 cls._abc_registry.add(subclass)
>                 return True
> 
> It looks to me like this code will result in an unnecessary call to
> cls._abc_registry.add() in the case that "subclass" is already in
> cls._abc_registry. It looks like the code should be preceded with
> something like:
> 
>         if subclass in cls._abc_registry:
>             return True

Actually, it looks to me like the subclass is getting added to the wrong
set - it should be going into the _abc_cache, not the _abc_registry.

Tracker item with the 2-line patch here if someone would care to give it
the necessary post-beta review:
http://bugs.python.org/issue3747
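
A minimal sketch of the intended behaviour, using a toy stand-in rather
than the real ABCMeta. The class name MiniRegistry and its attributes are
hypothetical, and the real implementation may use different set types.
Positive results from the recursive check go into a cache, while the
registry only ever holds explicitly registered classes:

```python
class MiniRegistry:
    """Toy model of the bookkeeping discussed above (hypothetical names)."""

    def __init__(self):
        self.registry = set()  # classes explicitly registered
        self.cache = set()     # classes already known to pass the check

    def register(self, cls):
        self.registry.add(cls)

    def is_virtual_subclass(self, subclass):
        # Fast path: an earlier positive answer was cached.
        if subclass in self.cache:
            return True
        # Check if it's a subclass of a registered class (recursive).
        for rcls in self.registry:
            if issubclass(subclass, rcls):
                # Record the result in the cache, not in the registry.
                self.cache.add(subclass)
                return True
        return False


class Base:
    pass


class Derived(Base):
    pass


reg = MiniRegistry()
reg.register(Base)
found = reg.is_virtual_subclass(Derived)
```

With this arrangement the registry is never mutated while it is being
iterated, and a repeated check for the same class is answered from the cache.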

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
            http://www.boredomandlaziness.org

From andrewm at object-craft.com.au  Tue Sep  2 03:46:34 2008
From: andrewm at object-craft.com.au (Andrew McNamara)
Date: Tue, 02 Sep 2008 11:46:34 +1000
Subject: [Python-3000] Minor problem with ABCMeta?
In-Reply-To: <48BBBE86.20506@gmail.com> 
References: <20080901015345.A1196600801@longblack.object-craft.com.au>
	<48BBBE86.20506@gmail.com>
Message-ID: <20080902014634.326D0600801@longblack.object-craft.com.au>

>Actually, it looks to me like the subclass is getting added to the wrong
>set - it should be going into the _abc_cache, not the _abc_registry.

Ah, you're right - that makes a lot more sense. Thanks.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

From andrewm at object-craft.com.au  Tue Sep  2 04:37:39 2008
From: andrewm at object-craft.com.au (Andrew McNamara)
Date: Tue, 02 Sep 2008 12:37:39 +1000
Subject: [Python-3000] re.escape() fails when passed bytes()
Message-ID: <20080902023739.72D14600801@longblack.object-craft.com.au>

In Python 2, re.escape() works with either str or unicode, but in Python
3, it no longer works with bytes().

I've created issue 3756 to track this:

    http://bugs.python.org/issue3756
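
A small check of the behaviour in question, as a sketch. It assumes the
desired result is a bytes pattern that matches the input literally; the
sample value b"a.b" is only illustrative. Recent CPython 3 releases should
pass these checks, while the interpreter current at the time of this
report is what the issue is about:

```python
import re

# Escape a bytes value and use the result as a pattern.
pattern = re.escape(b"a.b")

# The escaped pattern should still be bytes and should match only the
# literal input, not an arbitrary character in place of the dot.
is_bytes = isinstance(pattern, bytes)
matches_literal = re.match(pattern, b"a.b") is not None
matches_other = re.match(pattern, b"axb") is not None
```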

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

From guido at python.org  Tue Sep  2 19:26:27 2008
From: guido at python.org (Guido van Rossum)
Date: Tue, 2 Sep 2008 10:26:27 -0700
Subject: [Python-3000] Should len() clip to sys.maxsize or raise
	OverflowError?
In-Reply-To: <33965e610808300807t3dc7eb1es44a29afd2afcd56f@mail.gmail.com>
References: <33965e610808300807t3dc7eb1es44a29afd2afcd56f@mail.gmail.com>
Message-ID: <ca471dc20809021026v26734dedt4563782a55f714b2@mail.gmail.com>

On Sat, Aug 30, 2008 at 8:07 AM, Hagen Fürstenau
<hagenf at coli.uni-saarland.de> wrote:
> While __len__() is allowed to return a value of any size, issues 2723
> and 3729 need a decision on what len() should do if the value doesn't
> fit in a Py_ssize_t.
>
> In a previous thread
> (http://mail.python.org/pipermail/python-3000/2008-May/013387.html)
> Guido wanted len() to "lie" and return sys.maxsize in this case, but
> several people have voiced strong discomfort with that. Any comments
> or pronouncements?

I stand by my view. I might voice strong discomfort with raising an
exception because it doesn't fit in some implementation detail.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From daniel at stutzbachenterprises.com  Tue Sep  2 20:14:30 2008
From: daniel at stutzbachenterprises.com (Daniel Stutzbach)
Date: Tue, 2 Sep 2008 13:14:30 -0500
Subject: [Python-3000] Should len() clip to sys.maxsize or raise
	OverflowError?
In-Reply-To: <ca471dc20809021026v26734dedt4563782a55f714b2@mail.gmail.com>
References: <33965e610808300807t3dc7eb1es44a29afd2afcd56f@mail.gmail.com>
	<ca471dc20809021026v26734dedt4563782a55f714b2@mail.gmail.com>
Message-ID: <eae285400809021114g7d49735egb134c9376f822288@mail.gmail.com>

On Tue, Sep 2, 2008 at 12:26 PM, Guido van Rossum <guido at python.org> wrote:
> I stand by my view. I might voice strong discomfort with raising an
> exception because it doesn't fit in some implementation detail.

Isn't that precisely what OverflowError is for?  ("it doesn't fit in some
implementation detail")

It seems to me that the Purity angle here would be to allow len() to return
any Python int object.  The Practical angle wants to restrict it to
sys.maxsize for performance reasons.  Throwing an OverflowError seems like a
good way for Practical to cry, "Oops, I've been caught".

(I'm interested in this issue because my list-like extension type
<http://stutzbachenterprises.com/blist> can in some cases have a
length greater than sys.maxsize)

--
Daniel Stutzbach, Ph.D.
http://stutzbachenterprises.com

From guido at python.org  Tue Sep  2 20:21:49 2008
From: guido at python.org (Guido van Rossum)
Date: Tue, 2 Sep 2008 11:21:49 -0700
Subject: [Python-3000] Should len() clip to sys.maxsize or raise
	OverflowError?
In-Reply-To: <eae285400809021114g7d49735egb134c9376f822288@mail.gmail.com>
References: <33965e610808300807t3dc7eb1es44a29afd2afcd56f@mail.gmail.com>
	<ca471dc20809021026v26734dedt4563782a55f714b2@mail.gmail.com>
	<eae285400809021114g7d49735egb134c9376f822288@mail.gmail.com>
Message-ID: <ca471dc20809021121l3d83dd2l2feec1fa59529560@mail.gmail.com>

On Tue, Sep 2, 2008 at 11:14 AM, Daniel Stutzbach
<daniel at stutzbachenterprises.com> wrote:
> On Tue, Sep 2, 2008 at 12:26 PM, Guido van Rossum <guido at python.org> wrote:
>> I stand by my view. I might voice strong discomfort with raising an
>> exception because it doesn't fit in some implementation detail.
>
> Isn't that precisely what OverflowError is for?  ("it doesn't fit in some
> implementation detail")
>
> It seems to me that the Purity angle here would be to allow len() to return
> any Python int object.  The Practical angle wants to restrict it to
> sys.maxsize for performance reasons.  Throwing an OverflowError seems like a
> good way for Practical to cry, "Oops, I've been caught".
>
> (I'm interested in this issue because my list-like extension type can in
> some cases have a length greater than sys.maxsize)

The way I see it is that there are tons of ways I can think of how
raising OverflowError can break unsuspecting programs (e.g. code that
has been tested before but never with a humungous input), whereas
returning a "little white lie" would allow such code to proceed just
fine. Some examples of code that is inconvenienced by the exception:

if len(x): # used as non-empty test
if len(x) > 100: # used to guarantee we can access items 0 through 99
for i in range(len(x)): # will be broken out before reaching the end
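
To make the case concrete, here is a sketch of the kind of object
involved. HugeVirtualSeq is a hypothetical class, and the behaviour shown
is what CPython does when len() raises instead of clipping:

```python
import sys


class HugeVirtualSeq:
    """Hypothetical container whose items are computed on the fly."""

    def __len__(self):
        # One more than the largest value that fits in a Py_ssize_t.
        return sys.maxsize + 1

    def __getitem__(self, i):
        return i


x = HugeVirtualSeq()

# Any use of len(x), even a simple non-empty test, hits the exception.
try:
    if len(x):
        pass
    raised = False
except OverflowError:
    raised = True
```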

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From python at rcn.com  Tue Sep  2 21:08:14 2008
From: python at rcn.com (Raymond Hettinger)
Date: Tue, 2 Sep 2008 12:08:14 -0700
Subject: [Python-3000] Should len() clip to sys.maxsize or
	raise OverflowError?
References: <33965e610808300807t3dc7eb1es44a29afd2afcd56f@mail.gmail.com><ca471dc20809021026v26734dedt4563782a55f714b2@mail.gmail.com><eae285400809021114g7d49735egb134c9376f822288@mail.gmail.com>
	<ca471dc20809021121l3d83dd2l2feec1fa59529560@mail.gmail.com>
Message-ID: <D1971DF6A46C47E485AA1696D370BD55@RaymondLaptop1>

From: "Guido van Rossum" <guido at python.org>
> The way I see it is that there are tons of ways I can think of how
> raising OverflowError can break unsuspecting programs (e.g. code that
> has been tested before but never with a humungous input), whereas
> returning a "little white lie" would allow such code to proceed just
> fine. Some examples of code that is inconvenienced by the exception:
> 
> if len(x): # used as non-empty test
> if len(x) > 100: # used to guarantee we can access items 0 through 99
> for i in range(len(x)): # will be broken out before reaching the end

That makes sense to me and there are probably plenty of examples.
However, I worry more about other examples that will fail
and do so in a way that is nearly impossible to find through
code review (because the code IS correct as written).

  n = len(log_entries)
  if log_entries[n] in handled:
      log_entries.pop(n)

It's not hard to imagine other examples with slicing and whatnot.
These cases may be less common than those pointed out by Guido
but they will be disastrous when they occur and very hard to
defend against or debug.

I would rather face the overflow errors when they arise than
deal with the latter cases.   In the former, I can always make an immediate
fix by replacing the builtin with an Overflow suppressing version.
But, in the latter case, the silent failure is *much* harder to deal with.
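
A sketch of what such an overflow-suppressing replacement might look
like. The name clipped_len and the sample class are hypothetical, and it
assumes the builtin raises OverflowError for oversized lengths:

```python
import builtins
import sys


def clipped_len(obj):
    """Like len(), but clip to sys.maxsize instead of raising."""
    try:
        return builtins.len(obj)
    except OverflowError:
        return sys.maxsize


class HugeVirtualSeq:
    """Hypothetical container reporting a length above sys.maxsize."""

    def __len__(self):
        return sys.maxsize * 2


clipped = clipped_len(HugeVirtualSeq())
ordinary = clipped_len([1, 2, 3])
```

The clipping is then an explicit, local choice made by the code that
wants it, rather than something every caller of len() gets silently.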

Raymond

From guido at python.org  Tue Sep  2 21:20:32 2008
From: guido at python.org (Guido van Rossum)
Date: Tue, 2 Sep 2008 12:20:32 -0700
Subject: [Python-3000] Should len() clip to sys.maxsize or
	raise OverflowError?
In-Reply-To: <D1971DF6A46C47E485AA1696D370BD55@RaymondLaptop1>
References: <33965e610808300807t3dc7eb1es44a29afd2afcd56f@mail.gmail.com>
	<ca471dc20809021026v26734dedt4563782a55f714b2@mail.gmail.com>
	<eae285400809021114g7d49735egb134c9376f822288@mail.gmail.com>
	<ca471dc20809021121l3d83dd2l2feec1fa59529560@mail.gmail.com>
	<D1971DF6A46C47E485AA1696D370BD55@RaymondLaptop1>
Message-ID: <ca471dc20809021220r4d192292r63ba8cb4ae5caf9d@mail.gmail.com>

On Tue, Sep 2, 2008 at 12:08 PM, Raymond Hettinger <python at rcn.com> wrote:
> From: "Guido van Rossum" <guido at python.org>
>>
>> The way I see it is that there are tons of ways I can think of how
>> raising OverflowError can break unsuspecting programs (e.g. code that
>> has been tested before but never with a humungous input), whereas
>> returning a "little white lie" would allow such code to proceed just
>> fine. Some examples of code that is inconvenienced by the exception:
>>
>> if len(x): # used as non-empty test
>> if len(x) > 100: # used to guarantee we can access items 0 through 99
>> for i in range(len(x)): # will be broken out before reaching the end
>
> That makes sense to me and there are probably plenty of examples.
> However, I worry more about other examples that will fail
> and do so in a way that is nearly impossible to find through
> code review (because the code IS correct as written).
>
>  n = len(log_entries)
>  if log_entries[n] in handled:

This should raise an IndexError. I think you meant something else?

>     log_entries.pop(n)
>
> It's not hard to imagine other examples with slicing and whatnot.
> These cases may be less common than those pointed out by Guido
> but they will be disastrous when they occur and very hard to
> defend against or debug.
>
> I would rather face the overflow errors when they arise than
> deal with the latter cases.   In the former, I can always make an immediate
> fix by replacing the builtin with an Overflow suppressing version.
> But, in the latter case, the silent failure is *much* harder to deal with.
>
>
> Raymond
>

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From digitalxero at gmail.com  Tue Sep  2 22:13:21 2008
From: digitalxero at gmail.com (Dj Gilcrease)
Date: Tue, 2 Sep 2008 14:13:21 -0600
Subject: [Python-3000] Should len() clip to sys.maxsize or
	raise OverflowError?
In-Reply-To: <ca471dc20809021220r4d192292r63ba8cb4ae5caf9d@mail.gmail.com>
References: <33965e610808300807t3dc7eb1es44a29afd2afcd56f@mail.gmail.com>
	<ca471dc20809021026v26734dedt4563782a55f714b2@mail.gmail.com>
	<eae285400809021114g7d49735egb134c9376f822288@mail.gmail.com>
	<ca471dc20809021121l3d83dd2l2feec1fa59529560@mail.gmail.com>
	<D1971DF6A46C47E485AA1696D370BD55@RaymondLaptop1>
	<ca471dc20809021220r4d192292r63ba8cb4ae5caf9d@mail.gmail.com>
Message-ID: <e9764b730809021313y11c7fe6dnba1694cfc6455e33@mail.gmail.com>

Why would it raise an IndexError when log_entries has more indices
than sys.maxsize? It should just check the entry at sys.maxsize.

Maybe I missed it, but why can't len() just return an int, which if I
remember correctly is now a long in py3k? On a 64-bit system len()
would (hopefully) never lie, but on a 32-bit system it could return a
number significantly lower than the actual number of entries.

On 9/2/08, Guido van Rossum <guido at python.org> wrote:
> On Tue, Sep 2, 2008 at 12:08 PM, Raymond Hettinger <python at rcn.com> wrote:
>> From: "Guido van Rossum" <guido at python.org>
>>>
>>> The way I see it is that there are tons of ways I can think of how
>>> raising OverflowError can break unsuspecting programs (e.g. code that
>>> has been tested before but never with a humungous input), whereas
>>> returning a "little white lie" would allow such code to proceed just
>>> fine. Some examples of code that is inconvenienced by the exception:
>>>
>>> if len(x): # used as non-empty test
>>> if len(x) > 100: # used to guarantee we can access items 0 through 99
>>> for i in range(len(x)): # will be broken out before reaching the end
>>
>> That makes sense to me and there are probably plenty of examples.
>> However, I worry more about other examples that will fail
>> and do so in a way that is nearly impossible to find through
>> code review (because the code IS correct as written).
>>
>>  n = len(log_entries)
>>  if log_entries[n] in handled:
>
> This should raise an IndexError. I think you meant something else?
>
>>     log_entries.pop(n)
>>
>> It's not hard to imagine other examples with slicing and whatnot.
>> These cases may be less common than those pointed out by Guido
>> but they will be disastrous when they occur and very hard to
>> defend against or debug.
>>
>> I would rather face the overflow errors when they arise than
>> deal with the latter cases.   In the former, I can always make an
>> immediate
>> fix by replacing the builtin with an Overflow suppressing version.
>> But, in the latter case, the silent failure is *much* harder to deal with.
>>
>>
>> Raymond
>>
>
>
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)

-- 
Dj Gilcrease
OpenRPG Developer
~~http://www.openrpg.com
OpenRPG+ Lead Developer
~~http://openrpg.digitalxero.net
XeroPortal Creator
~~http://www.digitalxero.net
Web Admin for Thewarcouncil.us
~~http://www.thewarcouncil.us

From python at rcn.com  Tue Sep  2 22:35:18 2008
From: python at rcn.com (Raymond Hettinger)
Date: Tue, 2 Sep 2008 13:35:18 -0700
Subject: [Python-3000] Should len() clip to sys.maxsize or
	raise OverflowError?
References: <33965e610808300807t3dc7eb1es44a29afd2afcd56f@mail.gmail.com>
	<ca471dc20809021026v26734dedt4563782a55f714b2@mail.gmail.com>
	<eae285400809021114g7d49735egb134c9376f822288@mail.gmail.com>
	<ca471dc20809021121l3d83dd2l2feec1fa59529560@mail.gmail.com>
	<D1971DF6A46C47E485AA1696D370BD55@RaymondLaptop1>
	<ca471dc20809021220r4d192292r63ba8cb4ae5caf9d@mail.gmail.com>
Message-ID: <52F00F4EEEC242D3896F2BF59EE6C1D6@RaymondLaptop1>

>> That makes sense to me and there are probably plenty of examples.
>> However, I worry more about other examples that will fail
>> and do so in a way that is nearly impossible to find through
>> code review (because the code IS correct as written).
>>
>>  n = len(log_entries)
>>  if log_entries[n] in handled:
> 
> This should raise an IndexError. I think you meant something else?
> 
>>     log_entries.pop(n)

Right. It should have been n-1 in my quick example.
The idea is that if the len() return value is actually being used for something
(in this case indexing, but possibly also slicing, resource management, etc),
then the app will silently start doing the wrong thing.

  next_ticket_number = len(tickets)
  create_new_ticket_form(time(), next_ticket_number)

ISTM, there are many uses for len() when it is bad news if the result
is less than the real length.  Those cases will be harder to detect and
correct than if an overflow was raised.

Raymond

From guido at python.org  Tue Sep  2 22:53:00 2008
From: guido at python.org (Guido van Rossum)
Date: Tue, 2 Sep 2008 13:53:00 -0700
Subject: [Python-3000] Should len() clip to sys.maxsize or
	raise OverflowError?
In-Reply-To: <52F00F4EEEC242D3896F2BF59EE6C1D6@RaymondLaptop1>
References: <33965e610808300807t3dc7eb1es44a29afd2afcd56f@mail.gmail.com>
	<ca471dc20809021026v26734dedt4563782a55f714b2@mail.gmail.com>
	<eae285400809021114g7d49735egb134c9376f822288@mail.gmail.com>
	<ca471dc20809021121l3d83dd2l2feec1fa59529560@mail.gmail.com>
	<D1971DF6A46C47E485AA1696D370BD55@RaymondLaptop1>
	<ca471dc20809021220r4d192292r63ba8cb4ae5caf9d@mail.gmail.com>
	<52F00F4EEEC242D3896F2BF59EE6C1D6@RaymondLaptop1>
Message-ID: <ca471dc20809021353n3e424ffbtc13dcec262f6b125@mail.gmail.com>

On Tue, Sep 2, 2008 at 1:35 PM, Raymond Hettinger <python at rcn.com> wrote:
>>> That makes sense to me and there are probably plenty of examples.
>>> However, I worry more about other examples that will fail
>>> and do so in a way that is nearly impossible to find through
>>> code review (because the code IS correct as written).
>>>
>>>  n = len(log_entries)
>>>  if log_entries[n] in handled:
>>
>> This should raise an IndexError. I think you meant something else?
>>
>>>    log_entries.pop(n)
>
> Right. It should have been n-1 in my quick example.

And why not -1? That doesn't have the clipping problem.

> The idea is that if the len() return value is actually being used for
> something
> (in this case indexing, but possibly also slicing, resource management, etc),
> then the app will silently start doing the wrong thing.
>
>  next_ticket_number = len(tickets)
>  create_new_ticket_form(time(), next_ticket_number)
>
> ISTM, there are many uses for len() when it is bad news if the result
> is less than the real length.  Those cases will be harder to detect and
> correct than if an overflow was raised.

I'm sorry, but toy examples like these don't convince me. Most of them
sound like they are likely using real lists.

But for *real* lists, and for anything that actually stores an item (no
matter how small -- could be a reference to None) for each valid index
value, there is no possibility that __len__() will ever overflow, since
the clipping limit is half the memory size, while the theoretical
number of references that can be stored in memory is either 1/4th or
1/8th of the memory size depending on pointer size.
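
The arithmetic can be sketched as follows. The function name is
hypothetical, and it assumes a flat address space of 2**bits bytes that
holds nothing but references, which is the most generous case:

```python
def clip_limit_and_max_refs(pointer_bits):
    """Compare the len() clipping limit with the most references
    that could fit in memory for a given pointer width."""
    address_space = 2 ** pointer_bits         # addressable bytes
    clip_limit = 2 ** (pointer_bits - 1) - 1  # sys.maxsize at this width
    pointer_bytes = pointer_bits // 8         # 4 or 8 bytes per reference
    max_refs = address_space // pointer_bytes
    return clip_limit, max_refs


results = {bits: clip_limit_and_max_refs(bits) for bits in (32, 64)}

# Fraction of the address space that the reference count represents:
# 1/4 for 4-byte pointers, 1/8 for 8-byte pointers.
fractions = {bits: (2 ** bits) // refs for bits, (_, refs) in results.items()}
```

In both cases the largest possible number of stored references stays
below the clipping limit, so a container that really stores its items
cannot reach it.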

The only time when __len__ can be larger than sys.maxsize is when the
class implements some kind of virtual space where the values are
computed on the fly. In such cases trying to walk over all values is
bound to take forever, and the length is likely not of all that much
interest to the caller -- but sometimes we may need to pass such an
object to some library code we didn't write that is making some
trivial use of len(), like the examples I gave before.

That said, I would actually be okay with the status quo (which does
raise an OverflowError) as long as we commit to fixing this properly
in 2.7 / 3.1, by removing the range restriction (like we've done for
other int operations a long time ago).

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From daniel at stutzbachenterprises.com  Tue Sep  2 23:18:08 2008
From: daniel at stutzbachenterprises.com (Daniel Stutzbach)
Date: Tue, 2 Sep 2008 16:18:08 -0500
Subject: [Python-3000] Should len() clip to sys.maxsize or
	raise OverflowError?
In-Reply-To: <ca471dc20809021353n3e424ffbtc13dcec262f6b125@mail.gmail.com>
References: <33965e610808300807t3dc7eb1es44a29afd2afcd56f@mail.gmail.com>
	<ca471dc20809021026v26734dedt4563782a55f714b2@mail.gmail.com>
	<eae285400809021114g7d49735egb134c9376f822288@mail.gmail.com>
	<ca471dc20809021121l3d83dd2l2feec1fa59529560@mail.gmail.com>
	<D1971DF6A46C47E485AA1696D370BD55@RaymondLaptop1>
	<ca471dc20809021220r4d192292r63ba8cb4ae5caf9d@mail.gmail.com>
	<52F00F4EEEC242D3896F2BF59EE6C1D6@RaymondLaptop1>
	<ca471dc20809021353n3e424ffbtc13dcec262f6b125@mail.gmail.com>
Message-ID: <eae285400809021418q77009aefmb9a5e86bc796f633@mail.gmail.com>

On Tue, Sep 2, 2008 at 3:53 PM, Guido van Rossum <guido at python.org> wrote:

> The only time when __len__ can be larger than sys.maxsize is when the
> class implements some kind of virtual space where the values are
> computed on the fly. In such cases trying to walk over all values is
> bound to take forever, and the length is likely not of all that much
> interest to the caller -- but sometimes we may need to pass such an
> object to some library code we didn't write that is making some
> trivial use of len(), like the examples I gave before.

len() is useful for more than iteration, such as setting the bounds for a
binary search (e.g., over a large on-disk data structure)
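
As a sketch of that use case: the Squares class below is a hypothetical
virtual sequence, standing in for a large sorted on-disk structure, and
the size is passed explicitly because len() cannot currently report it:

```python
class Squares:
    """Hypothetical sorted virtual sequence: item i is i * i."""

    def __init__(self, size):
        self.size = size  # plain int, may exceed sys.maxsize

    def __getitem__(self, i):
        if not 0 <= i < self.size:
            raise IndexError(i)
        return i * i


def find_first_at_least(seq, size, target):
    """Binary search for the first index whose item is >= target."""
    lo, hi = 0, size
    while lo < hi:
        mid = (lo + hi) // 2
        if seq[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo


seq = Squares(2 ** 70)
index = find_first_at_least(seq, seq.size, 10 ** 30)
```

The search only needs about 70 probes here, so the exact upper bound
matters even though walking the whole sequence would never finish.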

> That said, I would actually be okay with the status quo (which does
> raise an OverflowError) as long as we commit to fixing this properly
> in 2.7 / 3.1, by removing the range restriction (like we've done for
> other int operations a long time ago).
>

+1

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>

From rhamph at gmail.com  Tue Sep  2 23:34:26 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Tue, 2 Sep 2008 15:34:26 -0600
Subject: [Python-3000] Should len() clip to sys.maxsize or
	raise OverflowError?
In-Reply-To: <ca471dc20809021353n3e424ffbtc13dcec262f6b125@mail.gmail.com>
References: <33965e610808300807t3dc7eb1es44a29afd2afcd56f@mail.gmail.com>
	<ca471dc20809021026v26734dedt4563782a55f714b2@mail.gmail.com>
	<eae285400809021114g7d49735egb134c9376f822288@mail.gmail.com>
	<ca471dc20809021121l3d83dd2l2feec1fa59529560@mail.gmail.com>
	<D1971DF6A46C47E485AA1696D370BD55@RaymondLaptop1>
	<ca471dc20809021220r4d192292r63ba8cb4ae5caf9d@mail.gmail.com>
	<52F00F4EEEC242D3896F2BF59EE6C1D6@RaymondLaptop1>
	<ca471dc20809021353n3e424ffbtc13dcec262f6b125@mail.gmail.com>
Message-ID: <aac2c7cb0809021434r73dfa4edp2b2158076dfe9706@mail.gmail.com>

On Tue, Sep 2, 2008 at 2:53 PM, Guido van Rossum <guido at python.org> wrote:
> The only time when __len__ can be larger than sys.maxsize is when the
> class implements some kind of virtual space where the values are
> computed on the fly. In such cases trying to walk over all values is
> bound to take forever, and the length is likely not of all that much
> interest to the caller -- but sometimes we may need to pass such an
> object to some library code we didn't write that is making some
> trivial use of len(), like the examples I gave before.
>
> That said, I would actually be okay with the status quo (which does
> raise an OverflowError) as long as we commit to fixing this properly
> in 2.7 / 3.1, by removing the range restriction (like we've done for
> other int operations a long time ago).

+1

Otherwise it sounds like these virtual containers shouldn't support
len() at all.  Maybe a .len() method instead, with all the TMTOWTDI
that implies.

-- 
Adam Olsen, aka Rhamphoryncus

From ncoghlan at gmail.com  Tue Sep  2 23:57:59 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 03 Sep 2008 07:57:59 +1000
Subject: [Python-3000] Should len() clip to sys.maxsize
	or raise OverflowError?
In-Reply-To: <ca471dc20809021353n3e424ffbtc13dcec262f6b125@mail.gmail.com>
References: <33965e610808300807t3dc7eb1es44a29afd2afcd56f@mail.gmail.com>	<ca471dc20809021026v26734dedt4563782a55f714b2@mail.gmail.com>	<eae285400809021114g7d49735egb134c9376f822288@mail.gmail.com>	<ca471dc20809021121l3d83dd2l2feec1fa59529560@mail.gmail.com>	<D1971DF6A46C47E485AA1696D370BD55@RaymondLaptop1>	<ca471dc20809021220r4d192292r63ba8cb4ae5caf9d@mail.gmail.com>	<52F00F4EEEC242D3896F2BF59EE6C1D6@RaymondLaptop1>
	<ca471dc20809021353n3e424ffbtc13dcec262f6b125@mail.gmail.com>
Message-ID: <48BDB6E7.5070304@gmail.com>

Guido van Rossum wrote:
> The only time when __len__ can be larger than sys.maxsize is when the
> class implements some kind of virtual space where the values are
> computed on the fly. In such cases trying to walk over all values is
> bound to take forever, and the length is likely not of all that much
> interest to the caller -- but sometimes we may need to pass such an
> object to some library code we didn't write that is making some
> trivial use of len(), like the examples I gave before.
> 
> That said, I would actually be okay with the status quo (which does
> raise an OverflowError) as long as we commit to fixing this properly
> in 2.7 / 3.1, by removing the range restriction (like we've done for
> other int operations a long time ago).

For those that haven't been following issue 2690, the latter paragraph
will make it much easier to turn range() into a proper representative of
collections.Sequence.

I don't actually see any huge technical problems with implementing this
- we're just going to have to add a second C level method slot that uses
the unaryfunc signature (returning PyObject *) for a "virtual length"
method in addition to the existing mp_length and sq_length (which return
Py_ssize_t).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
            http://www.boredomandlaziness.org

From ncoghlan at gmail.com  Wed Sep  3 00:01:19 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 03 Sep 2008 08:01:19 +1000
Subject: [Python-3000] Should len() clip to sys.maxsize
	or raise OverflowError?
In-Reply-To: <e9764b730809021313y11c7fe6dnba1694cfc6455e33@mail.gmail.com>
References: <33965e610808300807t3dc7eb1es44a29afd2afcd56f@mail.gmail.com>	<ca471dc20809021026v26734dedt4563782a55f714b2@mail.gmail.com>	<eae285400809021114g7d49735egb134c9376f822288@mail.gmail.com>	<ca471dc20809021121l3d83dd2l2feec1fa59529560@mail.gmail.com>	<D1971DF6A46C47E485AA1696D370BD55@RaymondLaptop1>	<ca471dc20809021220r4d192292r63ba8cb4ae5caf9d@mail.gmail.com>
	<e9764b730809021313y11c7fe6dnba1694cfc6455e33@mail.gmail.com>
Message-ID: <48BDB7AF.5050702@gmail.com>

Dj Gilcrease wrote:
> Why would it raise an IndexError when log_entries has more indices
> than sys.maxsize? It should just check the entry at sys.maxsize.
> 
> Maybe I missed it, but why can't len() just return an int, which if I
> remember correctly is now a long in py3k? On a 64-bit system len()
> would (hopefully) never lie, but on a 32-bit system it could return a
> number significantly lower than the actual number of entries.

That's the implementation detail Guido is referring to - when len(obj)
delegates to obj.__len__(), the result of the method call gets stored in
a Py_ssize_t value, creating the problem.

It's fixable, but not for the current release cycle.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
            http://www.boredomandlaziness.org

From python at rcn.com  Tue Sep  2 23:30:59 2008
From: python at rcn.com (Raymond Hettinger)
Date: Tue, 2 Sep 2008 14:30:59 -0700
Subject: [Python-3000] Should len() clip to sys.maxsize or
	raise OverflowError?
References: <33965e610808300807t3dc7eb1es44a29afd2afcd56f@mail.gmail.com>
	<ca471dc20809021026v26734dedt4563782a55f714b2@mail.gmail.com>
	<eae285400809021114g7d49735egb134c9376f822288@mail.gmail.com>
	<ca471dc20809021121l3d83dd2l2feec1fa59529560@mail.gmail.com>
	<D1971DF6A46C47E485AA1696D370BD55@RaymondLaptop1>
	<ca471dc20809021220r4d192292r63ba8cb4ae5caf9d@mail.gmail.com>
	<52F00F4EEEC242D3896F2BF59EE6C1D6@RaymondLaptop1>
	<ca471dc20809021353n3e424ffbtc13dcec262f6b125@mail.gmail.com>
Message-ID: <67F7BAE0456A4455B3D9629F8A8BA15A@RaymondLaptop1>

From: "Guido van Rossum" <guido at python.org>
> That said, I would actually be okay with the status quo (which does
> raise an OverflowError) as long as we commit to fixing this properly
> in 2.7 / 3.1, by removing the range restriction (like we've done for
> other int operations a long time ago).

And there was much rejoicing!

Raymond

From greg.ewing at canterbury.ac.nz  Wed Sep  3 03:54:13 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 03 Sep 2008 13:54:13 +1200
Subject: [Python-3000] Should len() clip to sys.maxsize or
	raise OverflowError?
In-Reply-To: <48BDB6E7.5070304@gmail.com>
References: <33965e610808300807t3dc7eb1es44a29afd2afcd56f@mail.gmail.com>
	<ca471dc20809021026v26734dedt4563782a55f714b2@mail.gmail.com>
	<eae285400809021114g7d49735egb134c9376f822288@mail.gmail.com>
	<ca471dc20809021121l3d83dd2l2feec1fa59529560@mail.gmail.com>
	<D1971DF6A46C47E485AA1696D370BD55@RaymondLaptop1>
	<ca471dc20809021220r4d192292r63ba8cb4ae5caf9d@mail.gmail.com>
	<52F00F4EEEC242D3896F2BF59EE6C1D6@RaymondLaptop1>
	<ca471dc20809021353n3e424ffbtc13dcec262f6b125@mail.gmail.com>
	<48BDB6E7.5070304@gmail.com>
Message-ID: <48BDEE45.705@canterbury.ac.nz>

Nick Coghlan wrote:

> - we're just going to have to add a second C level method slot that uses
> the unaryfunc signature (returning PyObject *) for a "virtual length"
> method in addition to the existing mp_length and sq_length (which return
> Py_ssize_t).

As an aside, is there any plan to clean up the duplication
between the mp_ and sq_ method slots?

-- 
Greg

From hagenf at coli.uni-saarland.de  Wed Sep  3 10:23:37 2008
From: hagenf at coli.uni-saarland.de (Hagen Fürstenau)
Date: Wed, 3 Sep 2008 09:23:37 +0100
Subject: [Python-3000] Should len() clip to sys.maxsize or
	raise OverflowError?
In-Reply-To: <ca471dc20809021353n3e424ffbtc13dcec262f6b125@mail.gmail.com>
References: <33965e610808300807t3dc7eb1es44a29afd2afcd56f@mail.gmail.com>
	<ca471dc20809021026v26734dedt4563782a55f714b2@mail.gmail.com>
	<eae285400809021114g7d49735egb134c9376f822288@mail.gmail.com>
	<ca471dc20809021121l3d83dd2l2feec1fa59529560@mail.gmail.com>
	<D1971DF6A46C47E485AA1696D370BD55@RaymondLaptop1>
	<ca471dc20809021220r4d192292r63ba8cb4ae5caf9d@mail.gmail.com>
	<52F00F4EEEC242D3896F2BF59EE6C1D6@RaymondLaptop1>
	<ca471dc20809021353n3e424ffbtc13dcec262f6b125@mail.gmail.com>
Message-ID: <33965e610809030123p4df9c62aq9a02e36a9e79a041@mail.gmail.com>

> That said, I would actually be okay with the status quo (which does
> raise an OverflowError) as long as we commit to fixing this properly
> in 2.7 / 3.1, by removing the range restriction (like we've done for
> other int operations a long time ago).

What should be done when __len__() returns a float? In Python 2.6 the
behaviour depends on whether the class is new-style or not. (Old-style
classes raise a TypeError, new-style classes truncate.) Is there any
good reason for truncating?

- Hagen

From guido at python.org  Wed Sep  3 18:15:14 2008
From: guido at python.org (Guido van Rossum)
Date: Wed, 3 Sep 2008 09:15:14 -0700
Subject: [Python-3000] Should len() clip to sys.maxsize or
	raiseOverflowError?
In-Reply-To: <33965e610809030123p4df9c62aq9a02e36a9e79a041@mail.gmail.com>
References: <33965e610808300807t3dc7eb1es44a29afd2afcd56f@mail.gmail.com>
	<ca471dc20809021026v26734dedt4563782a55f714b2@mail.gmail.com>
	<eae285400809021114g7d49735egb134c9376f822288@mail.gmail.com>
	<ca471dc20809021121l3d83dd2l2feec1fa59529560@mail.gmail.com>
	<D1971DF6A46C47E485AA1696D370BD55@RaymondLaptop1>
	<ca471dc20809021220r4d192292r63ba8cb4ae5caf9d@mail.gmail.com>
	<52F00F4EEEC242D3896F2BF59EE6C1D6@RaymondLaptop1>
	<ca471dc20809021353n3e424ffbtc13dcec262f6b125@mail.gmail.com>
	<33965e610809030123p4df9c62aq9a02e36a9e79a041@mail.gmail.com>
Message-ID: <ca471dc20809030915v560524ecye6ab8b955e411b36@mail.gmail.com>

2008/9/3 Hagen Fürstenau <hagenf at coli.uni-saarland.de>:
>> That said, I would actually be okay with the status quo (which does
>> raise an OverflowError) as long as we commit to fixing this properly
>> in 2.7 / 3.1, by removing the range restriction (like we've done for
>> other int operations a long time ago).
>
> What should be done when __len__() returns a float? In Python 2.6 the
> behaviour depends on whether the class is new-style or not. (Old-style
> classes raise a TypeError, new-style classes truncate.) Is there any
> good reason for truncating?

That sounds like a bug. IMO TypeError is the right response here.
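
A minimal sketch of the case in question, assuming a current Python 3
interpreter (where len() rejects a non-integer result); the class name
FloatLen and the helper len_or_error are hypothetical, for illustration
only:

```python
class FloatLen:
    """Hypothetical class whose __len__ returns a float instead of an int."""

    def __len__(self):
        return 3.7


def len_or_error(obj):
    """Return len(obj), or the name of the exception that len() raised."""
    try:
        return len(obj)
    except TypeError as exc:
        return type(exc).__name__


result = len_or_error(FloatLen())
print(result)
```

On such an interpreter the float is not truncated; len() raises
TypeError instead.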

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jcea at jcea.es  Wed Sep  3 18:59:15 2008
From: jcea at jcea.es (Jesus Cea)
Date: Wed, 03 Sep 2008 18:59:15 +0200
Subject: [Python-3000] Should len() clip to sys.maxsize
	or	raiseOverflowError?
In-Reply-To: <ca471dc20809021353n3e424ffbtc13dcec262f6b125@mail.gmail.com>
References: <33965e610808300807t3dc7eb1es44a29afd2afcd56f@mail.gmail.com>	<ca471dc20809021026v26734dedt4563782a55f714b2@mail.gmail.com>	<eae285400809021114g7d49735egb134c9376f822288@mail.gmail.com>	<ca471dc20809021121l3d83dd2l2feec1fa59529560@mail.gmail.com>	<D1971DF6A46C47E485AA1696D370BD55@RaymondLaptop1>	<ca471dc20809021220r4d192292r63ba8cb4ae5caf9d@mail.gmail.com>	<52F00F4EEEC242D3896F2BF59EE6C1D6@RaymondLaptop1>
	<ca471dc20809021353n3e424ffbtc13dcec262f6b125@mail.gmail.com>
Message-ID: <48BEC263.40402@jcea.es>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Guido van Rossum wrote:
> That said, I would actually be okay with the status quo (which does
> raise an OverflowError) as long as we commit to fixing this properly
> in 2.7 / 3.1, by removing the range restriction (like we've done for
> other int operations a long time ago).

+1

Some of my Python time is spent managing a huge persistent Python
repository of almost two hundred terabytes. I have more than 2^32
objects right now; in fact, a bit more than 2^35, growing about 25-30%
per year. Disk capacity rises about 40% per year, so the hardware
growth is under control :)... barely.
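
To make the range restriction concrete, here is a small sketch assuming
a current CPython 3 interpreter, where len() still has to fit its result
into a Py_ssize_t. The class HugeRepository is hypothetical and only
pretends to hold that many objects:

```python
import sys


class HugeRepository:
    """Hypothetical container reporting more items than fit in Py_ssize_t."""

    def __init__(self, count):
        self._count = count

    def __len__(self):
        return self._count


repo = HugeRepository(sys.maxsize + 1)

try:
    len(repo)
    outcome = "ok"
except OverflowError:
    outcome = "OverflowError"

# Calling the method directly bypasses the Py_ssize_t conversion in len().
direct = repo.__len__()
print(outcome, direct == sys.maxsize + 1)
```

On a 32-bit build sys.maxsize is 2^31 - 1, so a count of 2^35 would
already hit this limit there.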

- --
Jesus Cea Avion                         _/_/      _/_/_/        _/_/_/
jcea at jcea.es - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
jabber / xmpp:jcea at jabber.org         _/_/    _/_/          _/_/_/_/_/
.                              _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iQCVAwUBSL7CYJlgi5GaxT1NAQJwFgP9HHsI/GfNY3i0ZTEvfRt16BGVJ5gOKq35
KNDv4XzuMFmdaPEdwtAuKvGEbcb5f8+jvkbi3kmUHVJI73hs0pO/lKMbtKAjwsal
rq/wlA7na6oHhe7zIN/UljQPJhy1K0SuSO2Y0GzeJ88MTRbBjw/Ulitw3ESc5Dij
eSQkpRrkQHc=
=QSRC
-----END PGP SIGNATURE-----

From jcea at jcea.es  Wed Sep  3 20:03:16 2008
From: jcea at jcea.es (Jesus Cea)
Date: Wed, 03 Sep 2008 20:03:16 +0200
Subject: [Python-3000] bsddb finished for 2.6/3.0 (and "<class
 'BytesWarning'>: str() on a bytes instance")
Message-ID: <48BED164.1050103@jcea.es>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

With this email I confirm that bsddb work for 2.6/3.0 rc1 is done.

I have some issues with the buildbots, nevertheless. See:
<http://www.python.org/dev/buildbot/3.0.stable/g4 osx.4
3.0/builds/1334/step-test/0>, for example.

"""
Re-running failed tests in verbose mode
Re-running test 'test_bsddb3' in verbose mode
test test_bsddb3 crashed -- <class 'BytesWarning'>: str() on a bytes
instance
Traceback (most recent call last):
  File "./Lib/test/regrtest.py", line 603, in runtest_inner
    indirect_test()
  File "/Users/buildslave/bb/3.0.psf-g4/build/Lib/test/test_bsddb3.py",
line 60, in test_main
    print(db.DB_VERSION_STRING, file=sys.stderr)
BytesWarning: str() on a bytes instance
"""

I can't reproduce the issue in my local Python3.0 development version
(here, all tests pass fine). Any suggestions?

"Decoding" the "db.DB_VERSION_STRING" byte string would solve the error,
but I would rather know WHY I am having this issue at all. My
Python3.0 "str()" has no issue with byte values:

"""
[jcea at tesalia tmp]$ python3.0
Python 3.0b3+ (py3k:66121, Sep  1 2008, 22:25:14)
[GCC 4.2.3] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> b=b'some string...'
>>> b
b'some string...'
>>> str(b)
"b'some string...'"
>>>
"""
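
The difference between str() and an explicit decode can be sketched as
follows. The value of version_bytes is a made-up stand-in for
db.DB_VERSION_STRING, and the ASCII encoding is an assumption:

```python
# Hypothetical stand-in for db.DB_VERSION_STRING, which is a bytes object.
version_bytes = b"Berkeley DB x.y.z"

# str() on bytes yields the repr, including the b'' wrapper ...
as_repr = str(version_bytes)

# ... while an explicit decode yields the text itself.
as_text = version_bytes.decode("ascii")

print(as_repr)
print(as_text)
```

Only the decoded form is a real text string; the str() result merely
looks like one when printed.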

Help appreciated.

- --
Jesus Cea Avion                         _/_/      _/_/_/        _/_/_/
jcea at jcea.es - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
jabber / xmpp:jcea at jabber.org         _/_/    _/_/          _/_/_/_/_/
.                              _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iQCVAwUBSL7RXplgi5GaxT1NAQLGywP+KHothVVQ1bxmQZixBfKrdAhyqCRZ3S61
xGHE9U6IOaF1fB5O9S+E/OGEa8RX5hWNyxie5UsjG7N7qt0r6q5tqA8bedomsZtY
/CM0lCOr5F3ssFsUF965WxUD03aD+IRssr+7SKTyotNHH1qGF7ffggfGDJmF0/wq
C3RYRS6cUjs=
=sx2A
-----END PGP SIGNATURE-----

From lists at cheimes.de  Wed Sep  3 20:36:48 2008
From: lists at cheimes.de (Christian Heimes)
Date: Wed, 03 Sep 2008 20:36:48 +0200
Subject: [Python-3000] bsddb finished for 2.6/3.0 (and "<class
 'BytesWarning'>: str() on a bytes instance")
In-Reply-To: <48BED164.1050103@jcea.es>
References: <48BED164.1050103@jcea.es>
Message-ID: <g9mlg0$isv$1@ger.gmane.org>

Jesus Cea wrote:
> I can't reproduce the issue in my local Python3.0 development version
> (here, all tests pass fine). Any suggestions?

Yeah, use my byte warning mode of Python 3.0. Before your time in the 
core team I had a long discussion with Guido and a few others. The 
conclusion of the discussion was the byte warning mode.

./python --help
-b     : issue warnings about str(bytes_instance), str(buffer_instance)
          and comparing bytes/buffer with str. (-bb: issue errors)

By the way buffer_instance should be renamed to bytearray.

./python -bb
Python 3.0b3+ (py3k:66180M, Sep  3 2008, 12:35:13)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
 >>> str(b'')
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
BytesWarning: str() on a bytes instance
[41030 refs]

> "Decoding" the "db.DB_VERSION_STRING" byte string would solve the error,
> but I would rather know WHY I am having this issue at all. My
> Python3.0 "str()" has no issue with byte values:

It has no issues because Guido wanted str() to succeed. Any comparison 
or conversion of a byte / bytearray instance with / to str is most 
likely a bug or design flaw in the application. The byte warning mode 
helps to discover hard to find bugs like "" == b"".
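
As a rough sketch of how the two modes differ, the snippet below runs
the same statement in two child interpreters, one with -bb and one
without. It assumes the running interpreter still accepts the -b/-bb
options and that sys.executable points at it:

```python
import subprocess
import sys

# Run a child interpreter with -bb so str(bytes) becomes an error there.
snippet = "str(b'some string...')"
strict = subprocess.run(
    [sys.executable, "-bb", "-c", snippet],
    capture_output=True,
    text=True,
)

# Without the flag the same call succeeds silently.
default = subprocess.run(
    [sys.executable, "-c", snippet],
    capture_output=True,
    text=True,
)

print(strict.returncode, default.returncode)
```

This would explain why a test can pass locally and still fail on a
buildbot that runs the suite with the byte warning mode enabled.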

Christian

From jcea at jcea.es  Wed Sep  3 23:31:49 2008
From: jcea at jcea.es (Jesus Cea)
Date: Wed, 03 Sep 2008 23:31:49 +0200
Subject: [Python-3000] bsddb finished for 2.6/3.0 (and "<class
 'BytesWarning'>: str() on a bytes instance")
In-Reply-To: <g9mlg0$isv$1@ger.gmane.org>
References: <48BED164.1050103@jcea.es> <g9mlg0$isv$1@ger.gmane.org>
Message-ID: <48BF0245.9000102@jcea.es>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Christian Heimes wrote:
> Jesus Cea wrote:
>> I can't reproduce the issue in my local Python3.0 development version
>> (here, all tests pass fine). Any suggestions?
> 
> Yeah, use my byte warning mode of Python 3.0. Before your time in the
> core team I had a long discussion with Guido and a few others. The
> conclusion of the discussion was the byte warning mode.

So much to learn, so little time :). Do you have a URL at hand?

>> "Decoding" the "db.DB_VERSION_STRING" byte string would solve the error,
>> but I would rather know WHY I am having this issue at all. My
>> Python3.0 "str()" has no issue with byte values:
> 
> It has no issues because Guido wanted str() to succeed. Any comparison
> or conversion of a byte / bytearray instance with / to str is most
> likely a bug or design flaw in the application. The byte warning mode
> helps to discover hard to find bugs like "" == b"".

Just committed the "decode" thing: r66188.

- --
Jesus Cea Avion                         _/_/      _/_/_/        _/_/_/
jcea at jcea.es - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
jabber / xmpp:jcea at jabber.org         _/_/    _/_/          _/_/_/_/_/
.                              _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iQCVAwUBSL8CPplgi5GaxT1NAQK9gQP/SvkaSpRl+hqAtuCy1uh8cj6NwUN2Iw0E
Z6XZfxNIRgcBtwJdugSu/70lqRsCusj9cSrxxhCw5xPQSjUeLQTsVlvqGNPGU1XI
PNKrA6ofqsHRlJgg/umKmdlyOy8PftckWugPOw2RVIeQXeRuWxs35/7F4uVEnCT2
ttCvJJPhDJo=
=hahg
-----END PGP SIGNATURE-----

From barry at python.org  Thu Sep  4 00:29:02 2008
From: barry at python.org (Barry Warsaw)
Date: Wed, 3 Sep 2008 18:29:02 -0400
Subject: [Python-3000] bsddb finished for 2.6/3.0 (and "<class
	'BytesWarning'>: str() on a bytes instance")
In-Reply-To: <48BF0245.9000102@jcea.es>
References: <48BED164.1050103@jcea.es> <g9mlg0$isv$1@ger.gmane.org>
	<48BF0245.9000102@jcea.es>
Message-ID: <EC821EDF-5198-4490-B9A8-AA691E7EF576@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sep 3, 2008, at 5:31 PM, Jesus Cea wrote:

> Christian Heimes wrote:
>> Jesus Cea wrote:
>>> I can't reproduce the issue in my local Python3.0 development  
>>> version
>>> (here, all tests pass fine). Any suggestions?
>>
>> Yeah, use my byte warning mode of Python 3.0. Before your time in the
>> core team I had a long discussion with Guido and a few others. The
>> conclusion of the discussion was the byte warning mode.
>
> So much to learn, so little time :). Do you have a URL at hand?
>
>>> "Decoding" the "db.DB_VERSION_STRING" byte string would solve the  
>>> error,
>>> but I would rather know WHY I am having this issue at all. My
>>> Python3.0 "str()" has no issue with byte values:
>>
>> It has no issues because Guido wanted str() to succeed. Any  
>> comparison
>> or conversion of a byte / bytearray instance with / to str is most
>> likely a bug or design flaw in the application. The byte warning mode
>> helps to discover hard to find bugs like "" == b"".
>
> Just committed the "decode" thing: r66188.

Jesus, again thanks for working so hard on pybsddb, I really  
appreciate the effort.

However, in talking with several developers, there are still concerns  
about bundling bsddb with Python 3.0.  We have to leave it in 2.6 for  
backward compatibility reasons, but we should deprecate it, remove it  
from 3.0 and continue to release it as a separate package.

The issues come down to these:

- - You (Jesus) are the only person maintaining the code
- - The upstream bsddb API has never been the most stable thing in the  
world
- - Concerns that the buildbot environments are not adequately testing  
the code

My own gut feeling is that both pybsddb and Python would be much  
better served with this code outside the core, distributed  
separately.  All your work would still live on, and be appreciated by  
the community, but neither pybsddb nor Python would be tied to each  
other's release cycles.  And of course, your continued maintenance on  
the 2.6 branch is greatly appreciated.

The dilemma for me is whether to let 3.0rc1 go out with or without  
bsddb.  Either way, if there's a pitchfork revolt of this decision  
we'll have to break our rule for rc2 to back the change out.

Guido has already given his approval to remove pybsddb from Python 3.0:

http://mail.python.org/pipermail/python-dev/2008-July/081379.html

and I know Brett agrees, so that's it.  On IRC, I've just asked  
Benjamin to do the honors for 3.0 and Brett will add the deprecations  
for 2.6.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSL8PrnEjvBPtnXfVAQKFfAP9EfWlyrmdBvIrO85vX4dpd/uIjM1Q5Ngm
LP4a20nWPsmA6LpMbW7fjpwVnnNOeJqamqX8JFsqcETw1GOJnIgovqhHItzCgQjb
0X+Uw/m2Uv0TqKcgrf0WXw61sLG8liWdkV4tq92JnbzBVwEzCTdZPDfOGUGEYRop
Q3LLOxKRRow=
=4dVo
-----END PGP SIGNATURE-----

From jcea at jcea.es  Thu Sep  4 01:01:29 2008
From: jcea at jcea.es (Jesus Cea)
Date: Thu, 04 Sep 2008 01:01:29 +0200
Subject: [Python-3000] bsddb finished for 2.6/3.0 (and
 "<class	'BytesWarning'>: str() on a bytes instance")
In-Reply-To: <EC821EDF-5198-4490-B9A8-AA691E7EF576@python.org>
References: <48BED164.1050103@jcea.es>
	<g9mlg0$isv$1@ger.gmane.org>	<48BF0245.9000102@jcea.es>
	<EC821EDF-5198-4490-B9A8-AA691E7EF576@python.org>
Message-ID: <48BF1749.7040803@jcea.es>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Barry Warsaw wrote:
> and I know Brett agrees, so that's it.  On IRC, I've just asked Benjamin
> to do the honors for 3.0 and Brett will add the deprecations for 2.6.

I just committed the fix for bsddb testsuite in Python 3.0 branch:
http://www.python.org/dev/buildbot/3.0.stable/changes/2687

Can I do anything to revert this decision? If not, what can I do for it
to be reconsidered in 3.1?

- --
Jesus Cea Avion                         _/_/      _/_/_/        _/_/_/
jcea at jcea.es - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
jabber / xmpp:jcea at jabber.org         _/_/    _/_/          _/_/_/_/_/
.                              _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iQCVAwUBSL8XSZlgi5GaxT1NAQJQ5QP/ZGivsmwMbMta2mcxYSbc97BgHGbvIavD
fTjuJ7v2R+3p0bQIAAGs7ih7mkJ/6a+F6j2hqC4Qk+0p3NK5IYn+lCThtBmjlIyb
zmcTBzWyctuSdjV/AvDhjziRbnMlCIjhBCBHO9vc82hb5AmiBo5XT9szJHCfpDa+
2DWp8t765t8=
=SrAU
-----END PGP SIGNATURE-----

From barry at python.org  Thu Sep  4 03:25:26 2008
From: barry at python.org (Barry Warsaw)
Date: Wed, 3 Sep 2008 21:25:26 -0400
Subject: [Python-3000] bsddb finished for 2.6/3.0 (and
	"<class	'BytesWarning'>: str() on a bytes instance")
In-Reply-To: <48BF1749.7040803@jcea.es>
References: <48BED164.1050103@jcea.es>
	<g9mlg0$isv$1@ger.gmane.org>	<48BF0245.9000102@jcea.es>
	<EC821EDF-5198-4490-B9A8-AA691E7EF576@python.org>
	<48BF1749.7040803@jcea.es>
Message-ID: <DFB7E9A9-A814-4106-AB7C-24F3A2135300@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sep 3, 2008, at 7:01 PM, Jesus Cea wrote:

> Barry Warsaw wrote:
>> and I know Brett agrees, so that's it.  On IRC, I've just asked  
>> Benjamin
>> to do the honors for 3.0 and Brett will add the deprecations for 2.6.
>
> I just committed the fix for bsddb testsuite in Python 3.0 branch:
> http://www.python.org/dev/buildbot/3.0.stable/changes/2687
>
> Can I do anything to revert this decision?. If not, what can I do to  
> be
> reconsidered in 3.1?.

Start raising some pitchforks.  It looks like Raymond will join the  
march :).

Really, this is about what's best for Python and pybsddb.  In this  
article, Guido unambiguously states his opinion:

http://mail.python.org/pipermail/python-dev/2008-July/081362.html

"+1. In my recollection maintaining bsddb has been nothing but trouble
right from the start when we were all sitting together at "Zope Corp
North" in a rented office in McLean... We can remove it from 3.0. We
can't really remove it from 2.6, but we can certainly start
end-of-lifing it in 2.6."

Jesus, let me stress that IMO this is not a reflection on your work at  
all.  On the contrary, keeping it alive in 2.x and providing a really  
solid independent package for 3.0 is critical for its continued  
relevance to Python programmers.

I completely agree with Guido that bsddb (not pybsddb) has been a  
headache since forever.  For example, IIRC Sleepycat was notorious for  
changing the API in micro releases, though I don't know if that's  
still the case with the current maintainers.  I personally believe  
that Python and pybsddb are both better off with their own maintenance  
lifecycles so I stand by my decision that pulling it out of 3.0 is the  
right thing to do.  3.1 is far enough away that any decision we make  
in 3.0 can be re-evaluated.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSL85BnEjvBPtnXfVAQLfkwQAtoagOP37uAwL1r2H7w73erTsWBYHf4VH
KcTZsjeQ/mEvmaaJIG86ylAtpxmDmMF5x7OClR66bXXxf0oTnWV4KMC9rLdQW8R/
KpMIfuQw/501AQgFmcB0M6SQ6CYyJHU5K+K6X+ScOPHOJoG8usPK1pk8XFGOXBZK
UGXCEHVvlrk=
=7AOQ
-----END PGP SIGNATURE-----

From skip at pobox.com  Thu Sep  4 04:33:48 2008
From: skip at pobox.com (skip at pobox.com)
Date: Wed, 3 Sep 2008 21:33:48 -0500
Subject: [Python-3000] PEP 3108 and the demise of bsddb3
In-Reply-To: <1afaf6160809031622j3b63a08h912686a839bd0ed@mail.gmail.com>
References: <1220484055.04.0.310220074498.issue3769@psf.upfronthosting.co.za>
	<1afaf6160809031622j3b63a08h912686a839bd0ed@mail.gmail.com>
Message-ID: <18623.18700.76260.893902@montanaro-dyndns-org.local>

From issue3769:

    Skip> Remind me why we want to get rid of bsddb?

    Benjamin> The reasons are enumerated in PEP 3108.

Not much justification and no references to outside discussion for such a
heavily used package which has been part of Python for a long time in one
form or another.

I find it amusing that bsddb3 is a key justification for the removal of
bsddb185 and then later on bsddb3 is deemed too much of a maintenance burden
itself to retain.

Does dumbdbm (aka dbm.dumb) work any better than it used to?  I'd hate to
think that's going to be the default cross-platform dictionary-on-disk
package for Python.  Can the PEP at least be edited to reflect which of the
various dbm.{bsd,ndbm,dumb,gnu} modules will be supported on what platforms
by default?

Skip

From brett at python.org  Thu Sep  4 05:26:22 2008
From: brett at python.org (Brett Cannon)
Date: Wed, 3 Sep 2008 20:26:22 -0700
Subject: [Python-3000] PEP 3108 and the demise of bsddb3
In-Reply-To: <18623.18700.76260.893902@montanaro-dyndns-org.local>
References: <1220484055.04.0.310220074498.issue3769@psf.upfronthosting.co.za>
	<1afaf6160809031622j3b63a08h912686a839bd0ed@mail.gmail.com>
	<18623.18700.76260.893902@montanaro-dyndns-org.local>
Message-ID: <bbaeab100809032026j643492f8w6b61169859e1bf05@mail.gmail.com>

On Wed, Sep 3, 2008 at 7:33 PM,  <skip at pobox.com> wrote:
>
> From issue3769:
>
>    Skip> Remind me why we want to get rid of bsddb?
>
>    Benjamin> The reasons are enumerated in PEP 3108.
>
> Not much justification and no references to outside discussion for such a
> heavily used package which has been part of Python for a long time in one
> form or another.
>
> I find it amusing that bsddb3 is a key justification for the removal of
> bsddb185 and then later on bsddb3 is deemed too much of a maintenance burden
> itself to retain.
>
> Does dumbdbm (aka dbm.dumb) work any better than it used to?  I'd hate to
> think that's going to be the default cross-platform dictionary-on-disk
> package for Python.  Can the PEP at least be edited to reflect which of the
> various dbm.{bsd,ndbm,dumb,gnu} modules will be supported on what platforms
> by default?
>

All but dbm.dumb require some pre-existing library to exist to compile
against. So any platform that has the proper libraries installed will
be able to use ndbm or gnu, but as for which platforms those are, I do
not know.
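
For reference, a minimal sketch of the pure-Python fallback, which needs
no external library. The file name "example" and the stored key and
value are made up for illustration:

```python
import dbm.dumb
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example")

    # 'c' creates the database files if they do not exist yet.
    db = dbm.dumb.open(path, "c")
    db[b"release"] = b"3.0rc1"
    db.close()

    # Reopen read-only to confirm the value was persisted.
    db = dbm.dumb.open(path, "r")
    stored = db[b"release"]
    db.close()

print(stored)
```

Keys and values come back as bytes, the same as with the C-backed dbm
variants.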

-Brett

From python at rcn.com  Thu Sep  4 05:36:40 2008
From: python at rcn.com (Raymond Hettinger)
Date: Wed, 3 Sep 2008 20:36:40 -0700
Subject: [Python-3000] PEP 3108 and the demise of bsddb3
References: <1220484055.04.0.310220074498.issue3769@psf.upfronthosting.co.za><1afaf6160809031622j3b63a08h912686a839bd0ed@mail.gmail.com>
	<18623.18700.76260.893902@montanaro-dyndns-org.local>
Message-ID: <1055D9D2507D4F64A6D30656C8A835D1@RaymondLaptop1>

>    Skip> Remind me why we want to get rid of bsddb?
> 
>    Benjamin> The reasons are enumerated in PEP 3108.
> 
> Not much justification and no references to outside discussion for such a
> heavily used package which has been part of Python for a long time in one
> form or another.

Well said.

Raymond

From barry at python.org  Thu Sep  4 05:41:27 2008
From: barry at python.org (Barry Warsaw)
Date: Wed, 3 Sep 2008 23:41:27 -0400
Subject: [Python-3000] Not releasing rc1 tonight
Message-ID: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I'm not going to release rc1 tonight.  There are too many open release  
blockers that I don't want to defer, and I'd like the buildbots to  
churn through the bsddb removal on all platforms.  Let me first thank  
Benjamin, Brett, Mark and Antoine for their help on IRC tonight.

Here are the issues I'm not comfortable with deferring:

   3640 test_cpickle crash on AMD64 Windows build
874900 threading module can deadlock after fork
   3574 compile() cannot decode Latin-1 source encodings
   3657 pickle can pickle the wrong function
   3187 os.listdir can return byte strings
   3660 reference leaks in 3.0
   3594 PyTokenizer_FindEncoding() never succeeds
   3629 Py30b3 won't compile a regex that compiles with 2.5.2 and 30b2

In addition, Mark reported in IRC that there are some regressions in  
the logging module.

I appreciate any feedback or fixes you can provide on these issues.   
You might also want to look at the deferred blockers to see if there's  
anything that really should be blocking rc1.

I'd like to try again on Friday and stick to rc2 on the 17th.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSL9Y53EjvBPtnXfVAQJGXwP+JZUa5EWlQh7yzt7aFdEM3qgiFZnKqWhz
TN4Cen0/eK8c4+t8a5WC+OLvc/P3PhMPhLSnE+g6IqQUO+pt+2LANgpAvCUrUahc
Nk2pt3gCclcmWlzVvCBspVPZjFPkHsW0uVhgK6x1C/2Re90yjeBqPGgT4LGlmaR3
bz6A3iiUnk0=
=Y5aN
-----END PGP SIGNATURE-----

From brett at python.org  Thu Sep  4 06:10:33 2008
From: brett at python.org (Brett Cannon)
Date: Wed, 3 Sep 2008 21:10:33 -0700
Subject: [Python-3000] Problem with grammar for 'except'?
Message-ID: <bbaeab100809032110i58bdbcefpa66d5536ef02c7dc@mail.gmail.com>

I gave a talk last night at the Vancouver Python users group on
2.6/3.0, and I tried the following code and it failed during a live
demo::

  >>> try: pass
  ... except Exception, Exception: pass
    File "<stdin>", line 2
      except Exception, Exception: pass
                                 ^
  SyntaxError: invalid syntax

Now from what I can tell from PEP 3110, that should be legal in 3.0.
Am I reading the PEP correctly?

-Brett

From python at rcn.com  Thu Sep  4 06:14:21 2008
From: python at rcn.com (Raymond Hettinger)
Date: Wed, 3 Sep 2008 21:14:21 -0700
Subject: [Python-3000] [Python-Dev] Not releasing rc1 tonight
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>
Message-ID: <7CA312873E964C5ABB4B4AA4F2793DF2@RaymondLaptop1>

[Barry]
> I'm not going to release rc1 tonight.  

Can I go ahead with some bug fixes and doc improvements
or should I wait until after Friday?

Raymond

From python at rcn.com  Thu Sep  4 06:25:00 2008
From: python at rcn.com (Raymond Hettinger)
Date: Wed, 3 Sep 2008 21:25:00 -0700
Subject: [Python-3000] Problem with grammar for 'except'?
References: <bbaeab100809032110i58bdbcefpa66d5536ef02c7dc@mail.gmail.com>
Message-ID: <A38FA5A4111844B8B3CF639A04795311@RaymondLaptop1>

[Brett]
>I gave a talk last night at the Vancouver Python users group on
> 2.6/3.0, and I tried the following code and it failed during a live
> demo::
> 
>  >>> try: pass
>  ... except Exception, Exception: pass
>    File "<stdin>", line 2
>      except Exception, Exception: pass
>                                 ^
>  SyntaxError: invalid syntax
> 
> Now from what I can tell from PEP 3110, that should be legal in 3.0.
> Am I reading the PEP correctly?

Don't think so.
The parens are necessary for a tuple of exceptions
lest it be confused with the old "except E, v" syntax
which meant "except E as e".
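
A small sketch of the parenthesized form, which is unambiguous in 3.0;
the function classify and its sample data are hypothetical:

```python
def classify(value):
    """Return the name of the lookup error raised for value."""
    try:
        return {"a": [1, 2]}[value][5]
    except (KeyError, IndexError) as exc:
        # The parentheses make this a tuple of exception types;
        # 'as' binds the caught instance to a name.
        return type(exc).__name__


labels = [classify("a"), classify("b")]
print(labels)
```

Here "a" is found but index 5 is out of range, while "b" is missing from
the dict, so both exception types are exercised by one handler.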

Maybe in 3.1, the paren requirement can be dropped.
But for 3.0, it would be a problem given that old
scripts would start getting misinterpreted.

I did something similar for list.sort() by requiring
keyword arguments.  That way, we wouldn't have
list.sort(f) running with f as a cmp function in 2.6 and
as a key function in 3.0.
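
The list.sort() case can be sketched like this, assuming a Python 3
interpreter; the sample list is made up:

```python
words = ["pear", "Apple", "fig"]

# The key function must be passed by keyword.
words.sort(key=str.lower)

try:
    words.sort(str.lower)  # a positional argument is rejected
    positional = "accepted"
except TypeError:
    positional = "TypeError"

print(words, positional)
```

Because the positional call fails loudly, an old list.sort(cmpfunc) call
cannot silently be reinterpreted as a key function.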

Raymond

From greg at krypto.org  Thu Sep  4 07:08:04 2008
From: greg at krypto.org (Gregory P. Smith)
Date: Wed, 3 Sep 2008 22:08:04 -0700
Subject: [Python-3000] PEP 3108 and the demise of bsddb3
In-Reply-To: <1055D9D2507D4F64A6D30656C8A835D1@RaymondLaptop1>
References: <1220484055.04.0.310220074498.issue3769@psf.upfronthosting.co.za>
	<1afaf6160809031622j3b63a08h912686a839bd0ed@mail.gmail.com>
	<18623.18700.76260.893902@montanaro-dyndns-org.local>
	<1055D9D2507D4F64A6D30656C8A835D1@RaymondLaptop1>
Message-ID: <52dc1c820809032208p6e0d5e31x295bebc79beaa86a@mail.gmail.com>

On Wed, Sep 3, 2008 at 8:36 PM, Raymond Hettinger <python at rcn.com> wrote:
>>   Skip> Remind me why we want to get rid of bsddb?
>>
>>   Benjamin> The reasons are enumerated in PEP 3108.
>>
>> Not much justification and no references to outside discussion for such a
>> heavily used package which has been part of Python for a long time in one
>> form or another.
>
> Well said.

Frankly I don't see a big deal with not including it in *3.0* so long
as a reference to where to download it as an add on (jcea's pybsddb
site) is included in the release notes and PEP 3108.  I've updated the
relevant documentation in 3.0.

I'm not going to fight a battle against its removal when I know
several python devs are already way too scarred and cranky to ever
change their minds due to BerkeleyDB itself being painful to get
working right on all platforms.  There's truth to that not being worth
our time if we actually want to test the module properly to avoid
shipping a lemon.

The fact that the Python Lib/bsddb/test/ test suite has uncovered
actual Oracle/Sleepycat BerkeleyDB bugs in supposedly stable releases
has always disturbed me.

I do wish this had been discussed on comp.lang.python before now
rather than pulling the rug out at the last minute.  oh well.

-gps

PS Thank you jcea for your wonderful work on improving bsddb!
Regardless of whether it appears in the standard library in the future
you're making many users very happy with your work.

From greg at krypto.org  Thu Sep  4 07:17:53 2008
From: greg at krypto.org (Gregory P. Smith)
Date: Wed, 3 Sep 2008 22:17:53 -0700
Subject: [Python-3000] Fwd: Beta 3 planned for this Wednesday (OT: Beta
	3 planned for this Wednesday)
In-Reply-To: <8548c5f30808200020i3a48c512hb9bf5fbbc149dfbf@mail.gmail.com>
References: <8548c5f30808200020i3a48c512hb9bf5fbbc149dfbf@mail.gmail.com>
Message-ID: <52dc1c820809032217h2094f4cds3792da66f6489d91@mail.gmail.com>

I agree that this should go in.  zlib should return bytes.  other read
functions and similar modules like bz2module already return bytes.
unless i hear objections, i'll commit this in about 12 hours.
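
A quick sketch of the behaviour being asked for, as it can be checked on
a current Python 3 interpreter; the payload is arbitrary sample data:

```python
import zlib

payload = b"python-3000 " * 10

# Both directions should hand back immutable bytes, not bytearray.
compressed = zlib.compress(payload)
restored = zlib.decompress(compressed)

print(type(compressed).__name__, type(restored).__name__)
```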

On Wed, Aug 20, 2008 at 12:20 AM, Anand Balachandran Pillai
<abpillai at gmail.com> wrote:
> Hi,
>
>    I think the patches for issue 3492 should also be merged in for this beta.
> It affects the behaviour of zlib module. I have submitted patches for
> zlibmodule.c
> and test_zlib.py a few weeks back and they are ready to be merged anytime.
>
> I am forwarding a message where I discussed this with Antoine, who is
> CCed on the bug.
>
> Thanks
>
> --Anand
>
> ---------- Forwarded message ----------
> From: Antoine Pitrou <solipsis at pitrou.net>
> Date: Wed, Aug 20, 2008 at 12:26 PM
> Subject: Re: [Python-3000] Beta 3 planned for this Wednesday
> To: Anand Balachandran Pillai <abpillai at gmail.com>
>
>
>
> Hello Anand,
>
>>       If that is the case is http://bugs.python.org/issue3492 important ? It is
>> not marked as critical or blocker or anything, but I find it strange
>> that zlib in
>> Python 3.0 will return bytearrays, if we don't merge the patches.
>
> Agreed, however I can't do it myself today. You should post your message
> to the list or on the bug tracker if you want someone to apply the patch
> before the beta.
>
> Regards
>
> Antoine.
>
>
>
>
>
> --
> -Anand
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/greg%40krypto.org
>

From abpillai at gmail.com  Thu Sep  4 07:42:43 2008
From: abpillai at gmail.com (Anand Balachandran Pillai)
Date: Thu, 4 Sep 2008 11:12:43 +0530
Subject: [Python-3000] Fwd: Beta 3 planned for this Wednesday (OT: Beta
	3 planned for this Wednesday)
In-Reply-To: <52dc1c820809032217h2094f4cds3792da66f6489d91@mail.gmail.com>
References: <8548c5f30808200020i3a48c512hb9bf5fbbc149dfbf@mail.gmail.com>
	<52dc1c820809032217h2094f4cds3792da66f6489d91@mail.gmail.com>
Message-ID: <8548c5f30809032242w57c8d6f2j612a791a84b5f53c@mail.gmail.com>

On Thu, Sep 4, 2008 at 10:47 AM, Gregory P. Smith <greg at krypto.org> wrote:
> I agree that this should go in.  zlib should return bytes.  other read
> functions and similar modules like bz2module already return bytes.
> unless i hear objections, i'll commit this in about 12 hours.

+1  :)

>

Regards

-- 
-Anand

From mhammond at skippinet.com.au  Thu Sep  4 09:28:36 2008
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Thu, 4 Sep 2008 17:28:36 +1000
Subject: [Python-3000] [Python-Dev] Not releasing rc1 tonight
In-Reply-To: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>
Message-ID: <00cd01c90e5f$daf326a0$90d973e0$@com.au>

Barry writes:

> In addition, Mark reported in IRC that there are some regressions in
> the logging module.

3772 logging module fails with non-ascii data

Which according to the IRC discussion doesn't apply to py3k.  The fix for
2.6 is quite trivial...

Cheers,

Mark

From jcea at jcea.es  Thu Sep  4 11:59:28 2008
From: jcea at jcea.es (Jesus Cea)
Date: Thu, 04 Sep 2008 11:59:28 +0200
Subject: [Python-3000] bsddb finished for 2.6/3.0
 (and	"<class	'BytesWarning'>: str() on a bytes instance")
In-Reply-To: <DFB7E9A9-A814-4106-AB7C-24F3A2135300@python.org>
References: <48BED164.1050103@jcea.es>	<g9mlg0$isv$1@ger.gmane.org>	<48BF0245.9000102@jcea.es>	<EC821EDF-5198-4490-B9A8-AA691E7EF576@python.org>	<48BF1749.7040803@jcea.es>
	<DFB7E9A9-A814-4106-AB7C-24F3A2135300@python.org>
Message-ID: <48BFB180.7010706@jcea.es>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Barry Warsaw wrote:
>> Can I do anything to revert this decision?. If not, what can I do to be
>> reconsidered in 3.1?.
> 
> Start raising some pitchforks.  It looks like Raymond will join the
> march :).

Sorry, I know the word ("pitchfork"), but I don't understand the meaning
you want to communicate. English is not my native language.

> I completely agree with Guido that bsddb (not pybsddb) has been a
> headache since forever.

I took over bsddb maintenance in February/March because it was decided
it was unmaintained and should be removed. I stepped forward to avoid
this risk, because I'm qualified, motivated and I use bsddb every day. I
think that providing a powerful ACID/replication/distributed transaction
module in stock python is a *huge* feature. In fact, given some
enterprise policies, some users of this module won't be able to use it,
because policies don't allow installing "non standard" packages, aside
from python itself.

Being a novice python-dev member is a handicap when working on such a
"visible" and shamed module, and it shows. But although managing the 3.0
conversion hasn't been easy, it is already done.

> I personally believe that Python
> and pybsddb are both better off with their own maintenance lifecycles

I agree, and that is the reason there exists a separate bsddb3 module
available via PYPI. But that is orthogonal to bsddb inclusion in Python.

I will keep maintaining bsddb as a separate package, in any case. But I
will miss being able to get your advice and your knowledge, and the
invaluable patches Neal provides from time to time :).

The 3.0 release is stressful for all of us, I know. Thanks for the time
you spent explaining the situation and letting me argue back :). Thanks
to all of you (python-dev) for the time you wasted teaching me and
suffering buildbot crashes O:-)

PS: I will battle for bsddb readmission. If any of you can provide
positive rationales for it, please, let me know.

- --
Jesus Cea Avion                         _/_/      _/_/_/        _/_/_/
jcea at jcea.es - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
jabber / xmpp:jcea at jabber.org         _/_/    _/_/          _/_/_/_/_/
.                              _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iQCVAwUBSL+xgJlgi5GaxT1NAQIdNAP/bySuzSW6bqUnT89Y0tUuGI0G0Svmol1A
3YNXtW/UIhNSL2BVNrAzrSLlcFjJmoCJOOUCfsK22sMb7+JveLFofPiUz+4Q2eaA
Zs3rIFY/k13eJmFtDd101OExgBtamzIUjkYVyr6OxdxlvIMbDp2zMdwHiFQM3vr8
MUntInFEDQA=
=qTfu
-----END PGP SIGNATURE-----

From andrewm at object-craft.com.au  Thu Sep  4 12:29:04 2008
From: andrewm at object-craft.com.au (Andrew McNamara)
Date: Thu, 04 Sep 2008 20:29:04 +1000
Subject: [Python-3000] bsddb finished for 2.6/3.0 (and "<class
	'BytesWarning'>: str() on a bytes instance")
In-Reply-To: <48BFB180.7010706@jcea.es> 
References: <48BED164.1050103@jcea.es> <g9mlg0$isv$1@ger.gmane.org>
	<48BF0245.9000102@jcea.es>
	<EC821EDF-5198-4490-B9A8-AA691E7EF576@python.org>
	<48BF1749.7040803@jcea.es>
	<DFB7E9A9-A814-4106-AB7C-24F3A2135300@python.org>
	<48BFB180.7010706@jcea.es>
Message-ID: <20080904102904.66DBB600898@longblack.object-craft.com.au>

>> Start raising some pitchforks.  It looks like Raymond will join the
>> march :).
>
>Sorry, I know the word ("pitchfork"), but I don't understand the meaning
>you want to communicate. English is not my native language.

The farmers march on town hall as a mob, carrying their pitchforks,
when the council makes an unpopular rule.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

From skip at pobox.com  Thu Sep  4 13:03:27 2008
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 4 Sep 2008 06:03:27 -0500
Subject: [Python-3000] PEP 3108 and the demise of bsddb3
In-Reply-To: <bbaeab100809032026j643492f8w6b61169859e1bf05@mail.gmail.com>
References: <1220484055.04.0.310220074498.issue3769@psf.upfronthosting.co.za>
	<1afaf6160809031622j3b63a08h912686a839bd0ed@mail.gmail.com>
	<18623.18700.76260.893902@montanaro-dyndns-org.local>
	<bbaeab100809032026j643492f8w6b61169859e1bf05@mail.gmail.com>
Message-ID: <18623.49279.686393.3857@montanaro-dyndns-org.local>

    Brett> All but dbm.dumb require some pre-existing library to exist to
    Brett> compile against. So any platform that has the proper libraries
    Brett> installed will be able to use ndbm or gnu, but as for which
    Brett> platforms that are I do not know.

Wasn't bsddb (either bsddb185 or bsddb3) built for the Windows
distributions?  Without something there's no guarantee that anydbm or shelve
will work out of the box.  As Raymond pointed out, dumbdbm would be a poor
choice as the default dict-on-disk module.
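
To make the "out of the box" concern concrete, here is a minimal sketch
using the Python 3 module names (dbm, dbm.dumb, shelve). The file names
and keys are made up for illustration. It asks which backend a default
shelf ends up on, then forces the pure-Python fallback that would be left
on a platform with no other dbm library:

```python
import dbm
import dbm.dumb
import os
import shelve
import tempfile

# Hypothetical file names, used only for this illustration.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "demo_shelf")

# shelve.open() delegates to dbm.open(), which picks whichever backend
# this particular interpreter was built with.
with shelve.open(path) as shelf:
    shelf["answer"] = {"value": 42}

with shelve.open(path) as shelf:
    restored = shelf["answer"]

# Report which dbm flavour actually created the file on this platform.
backend = dbm.whichdb(path)
print("backend chosen by default:", backend)

# Forcing the pure-Python fallback: always importable, but slow.
dumb_path = os.path.join(workdir, "dumb_shelf")
dumb_shelf = shelve.Shelf(dbm.dumb.open(dumb_path, "c"))
dumb_shelf["answer"] = {"value": 42}
dumb_shelf.close()
dumb_backend = dbm.whichdb(dumb_path)
```

The printed backend differs between builds, which is exactly the
portability question being raised here.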

Skip

From skip at pobox.com  Thu Sep  4 13:08:01 2008
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 4 Sep 2008 06:08:01 -0500
Subject: [Python-3000] Not releasing rc1 tonight
In-Reply-To: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>
Message-ID: <18623.49553.393129.601096@montanaro-dyndns-org.local>

    Barry> In addition, Mark reported in IRC that there are some regressions
    Barry> in the logging module.

Vinay apparently checked in some changes to the logging module with no
review.  In the absence of obvious bug fixes there that should probably be
reverted.

Skip

From facundobatista at gmail.com  Thu Sep  4 13:31:18 2008
From: facundobatista at gmail.com (Facundo Batista)
Date: Thu, 4 Sep 2008 08:31:18 -0300
Subject: [Python-3000] [Python-Dev] Not releasing rc1 tonight
In-Reply-To: <7CA312873E964C5ABB4B4AA4F2793DF2@RaymondLaptop1>
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>
	<7CA312873E964C5ABB4B4AA4F2793DF2@RaymondLaptop1>
Message-ID: <e04bdf310809040431x4db0bfb2y1e0ffaf3ef5725a5@mail.gmail.com>

2008/9/4 Raymond Hettinger <python at rcn.com>:

> Can I go ahead with some bug fixes and doc improvements
> or should I wait until after Friday?

Doc improvements: go ahead.

Bug fixes: the patches should be reviewed by another developer.

(I'll be hanging around in #python-dev today and tomorrow, btw, ping
me if I can help you)

-- 
. Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/

From barry at python.org  Thu Sep  4 15:01:34 2008
From: barry at python.org (Barry Warsaw)
Date: Thu, 4 Sep 2008 09:01:34 -0400
Subject: [Python-3000] [Python-Dev] Not releasing rc1 tonight
In-Reply-To: <7CA312873E964C5ABB4B4AA4F2793DF2@RaymondLaptop1>
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>
	<7CA312873E964C5ABB4B4AA4F2793DF2@RaymondLaptop1>
Message-ID: <5FEA88D7-EEA9-40E1-AD03-FC2607565FEC@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sep 4, 2008, at 12:14 AM, Raymond Hettinger wrote:

> [Barry]
>> I'm not going to release rc1 tonight.
>
> Can I go ahead with some bug fixes and doc improvements
> or should I wait until after Friday?

Doc fixes are fine.  Please have bug fix patches reviewed by another  
python-dev member.  Bonus points for any bug fix that closes a release  
blocker! :)

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSL/cL3EjvBPtnXfVAQKnmgQAlx89LWeq0hEmTRvTGy/DHIYioARqAisG
wJAnZPqinbGI6pkyn0kiwgDOvNzstnFQSZsEFiAFU+iF+nbgkm8agcTf+eLXqCFK
y+o0xXTi7fLXKuaGioY54kz3BcwQH17Ul3X6vRxBdCWYesAe3rIXprnNgt/Euuyy
P5sZLKwfTls=
=b3n4
-----END PGP SIGNATURE-----

From barry at python.org  Thu Sep  4 15:02:44 2008
From: barry at python.org (Barry Warsaw)
Date: Thu, 4 Sep 2008 09:02:44 -0400
Subject: [Python-3000] [Python-Dev] Not releasing rc1 tonight
In-Reply-To: <00cd01c90e5f$daf326a0$90d973e0$@com.au>
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>
	<00cd01c90e5f$daf326a0$90d973e0$@com.au>
Message-ID: <50FD6660-44FE-433E-827C-B1ECD82FB55A@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sep 4, 2008, at 3:28 AM, Mark Hammond wrote:

> Barry writes:
>
>> In addition, Mark reported in IRC that there are some regressions in
>> the logging module.
>
> 3772 logging module fails with non-ascii data
>
> Which according to the IRC discussion doesn't apply to py3k.  The  
> fix for
> 2.6 is quite trivial...

Thanks.  Looks like Vinay committed this.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSL/cdHEjvBPtnXfVAQIb7gP9G2o8eSnWWfEmlanwoqiHGxgqUjQtx8Xz
es/Sjclk5KZ2X4I/jITJcOxGDfTT3h7FX9tDQiUaIzZAVB66qyzWc3957bUwqeqS
9HNqfB4OoIa1Ds2+lukXpEPci6eddl2xVFEkejgsfdyS4q2/K1/R6URTPCXnPNiH
zoiXNaEcBcM=
=Zk4M
-----END PGP SIGNATURE-----

From barry at python.org  Thu Sep  4 15:03:34 2008
From: barry at python.org (Barry Warsaw)
Date: Thu, 4 Sep 2008 09:03:34 -0400
Subject: [Python-3000] Not releasing rc1 tonight
In-Reply-To: <18623.49553.393129.601096@montanaro-dyndns-org.local>
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>
	<18623.49553.393129.601096@montanaro-dyndns-org.local>
Message-ID: <4C4EC784-A7CA-4497-A0FF-1D9B5D0E8399@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sep 4, 2008, at 7:08 AM, skip at pobox.com wrote:

>
>    Barry> In addition, Mark reported in IRC that there are some  
> regressions
>    Barry> in the logging module.
>
> Vinay apparently checked in some changes to the logging module with no
> review.  In the absence of obvious bug fixes there that should  
> probably be
> reverted.

Or did he commit Mark's patch from bug 3772?  If so, that would count  
as a reviewed patch.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSL/cpnEjvBPtnXfVAQIkSwQApjBbIGgyV3X1oBhBLtRjTZrVDgFXPfRH
XyXtVd1r75PT+24UuqPHWNC9l+/sKnUaYqH3kJbHG2duMyr/duG7j6EIJLzOz+QC
SKwqtQr+WDBR0vpH3Q0wrUzQNXhtDyCjWx84IatRbKRVDUfbDlFQy/jj+SLvRBBR
WGJTAFP1x5g=
=mebg
-----END PGP SIGNATURE-----

From barry at python.org  Thu Sep  4 15:04:25 2008
From: barry at python.org (Barry Warsaw)
Date: Thu, 4 Sep 2008 09:04:25 -0400
Subject: [Python-3000] [Python-Dev] Not releasing rc1 tonight
In-Reply-To: <e04bdf310809040431x4db0bfb2y1e0ffaf3ef5725a5@mail.gmail.com>
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>
	<7CA312873E964C5ABB4B4AA4F2793DF2@RaymondLaptop1>
	<e04bdf310809040431x4db0bfb2y1e0ffaf3ef5725a5@mail.gmail.com>
Message-ID: <27C24817-848B-4173-B5F8-7240EB5B28D1@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sep 4, 2008, at 7:31 AM, Facundo Batista wrote:

> (I'll be hanging around in #python-dev today and tomorrow, btw, ping
> me if I can help you)

Me too, though I'm a bit busy at work.  Ping my nick 'barry' if you  
need any RM-level decision.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSL/c2XEjvBPtnXfVAQJ4EQP/SecaG0VRtsezedDRpX+zwmVo6W0n+9EP
rmKH5CWMSTWh53rXySCmE8IS2rrdhoyCZNSy0aERMTGz5JuEh/sw+O5EaxJQMFST
DdYx0aLRVwb62JaQHr7W7YyVWBG5+CQa3BehASFiwsw0dsAp0BpkwW1nIhybkLcW
hXNRzB2gwXI=
=9Mgt
-----END PGP SIGNATURE-----

From skip at pobox.com  Thu Sep  4 15:45:43 2008
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 4 Sep 2008 08:45:43 -0500
Subject: [Python-3000] Not releasing rc1 tonight
In-Reply-To: <4C4EC784-A7CA-4497-A0FF-1D9B5D0E8399@python.org>
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>
	<18623.49553.393129.601096@montanaro-dyndns-org.local>
	<4C4EC784-A7CA-4497-A0FF-1D9B5D0E8399@python.org>
Message-ID: <18623.59015.183377.941143@montanaro-dyndns-org.local>

    Barry> Or did he commit Mark's patch from bug 3772?  If so, that would
    Barry> count as a reviewed patch.

The checkin message says issue 3726:

    Author: vinay.sajip
    Date: Wed Sep  3 11:20:05 2008
    New Revision: 66180

    Log:
    Issue #3726: Allowed spaces in separators in logging configuration files.

    Modified:
       python/trunk/Lib/logging/config.py
       python/trunk/Lib/test/test_logging.py
       python/trunk/Misc/NEWS

I noticed because someone else (Brett?) questioned the apparent lack of
review.

Skip

From barry at python.org  Thu Sep  4 15:48:44 2008
From: barry at python.org (Barry Warsaw)
Date: Thu, 4 Sep 2008 09:48:44 -0400
Subject: [Python-3000] PEP 3108 and the demise of bsddb3
In-Reply-To: <52dc1c820809032208p6e0d5e31x295bebc79beaa86a@mail.gmail.com>
References: <1220484055.04.0.310220074498.issue3769@psf.upfronthosting.co.za>
	<1afaf6160809031622j3b63a08h912686a839bd0ed@mail.gmail.com>
	<18623.18700.76260.893902@montanaro-dyndns-org.local>
	<1055D9D2507D4F64A6D30656C8A835D1@RaymondLaptop1>
	<52dc1c820809032208p6e0d5e31x295bebc79beaa86a@mail.gmail.com>
Message-ID: <AD7C0A3B-94B7-42D9-ABC9-695629FEFE88@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sep 4, 2008, at 1:08 AM, Gregory P. Smith wrote:

> Frankly I don't see a big deal with not including it in *3.0* so long
> as a reference to where to download it as an add on (jcea's pybsddb
> site) is included in the release notes and PEP 3108.  I've updated the
> relevant documentation in 3.0.

Thanks Greg.  I think this is exactly right.  I've updated the  
RELNOTES file to point to the Cheeseshop package.  We should make sure  
to widely advertise the availability of the separate package.

Aside: can someone review the older RELNOTES items?  Ideally, I'd like  
to have this cleaned up to contain just the issues for 3.0 final.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSL/nPXEjvBPtnXfVAQI89QP9GMeaK5YvShAwB1Ok2YKK0FDa0f04LKdk
0rIKpNLCR4Yhw3HTmtff8TVanbGoXcClM17UrTJSkAzzQIZDNSp6dT1Y+lnpe/Gi
/3sVObliEdpOhVg5HPBd0mBr3Vgehqo9x4fxaws4p9GcdPvE7dF/96gFIuJXAK4l
5ybmmErxECI=
=F78M
-----END PGP SIGNATURE-----

From p.f.moore at gmail.com  Thu Sep  4 16:19:07 2008
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 4 Sep 2008 15:19:07 +0100
Subject: [Python-3000] PEP 3108 and the demise of bsddb3
In-Reply-To: <bbaeab100809032026j643492f8w6b61169859e1bf05@mail.gmail.com>
References: <1220484055.04.0.310220074498.issue3769@psf.upfronthosting.co.za>
	<1afaf6160809031622j3b63a08h912686a839bd0ed@mail.gmail.com>
	<18623.18700.76260.893902@montanaro-dyndns-org.local>
	<bbaeab100809032026j643492f8w6b61169859e1bf05@mail.gmail.com>
Message-ID: <79990c6b0809040719y7836b635vc30a6e46f0ce0182@mail.gmail.com>

2008/9/4 Brett Cannon <brett at python.org>:
> On Wed, Sep 3, 2008 at 7:33 PM,  <skip at pobox.com> wrote:
> All but dbm.dumb require some pre-existing library to exist to compile
> against. So any platform that has the proper libraries installed will
> be able to use ndbm or gnu, but as for which platforms that are I do
> not know.

On Windows, none are available except dbm.dumb and bsddb (presently).
If bsddb is to be removed, can/should one of the other "real" dbm
variants be added to the standard binary, so that Windows users have
at least one usable dbm option?

Paul

From barry at python.org  Thu Sep  4 16:39:46 2008
From: barry at python.org (Barry Warsaw)
Date: Thu, 4 Sep 2008 10:39:46 -0400
Subject: [Python-3000] Not releasing rc1 tonight
In-Reply-To: <18623.59015.183377.941143@montanaro-dyndns-org.local>
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>
	<18623.49553.393129.601096@montanaro-dyndns-org.local>
	<4C4EC784-A7CA-4497-A0FF-1D9B5D0E8399@python.org>
	<18623.59015.183377.941143@montanaro-dyndns-org.local>
Message-ID: <CDC12824-E981-4739-88EE-3C504EEB7D23@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sep 4, 2008, at 9:45 AM, skip at pobox.com wrote:

>
>    Barry> Or did he commit Mark's patch from bug 3772?  If so, that  
> would
>    Barry> count as a reviewed patch.
>
> The checkin message says issue 3726:
>
>    Author: vinay.sajip
>    Date: Wed Sep  3 11:20:05 2008
>    New Revision: 66180
>
>    Log:
>    Issue #3726: Allowed spaces in separators in logging  
> configuration files.
>
>    Modified:
>       python/trunk/Lib/logging/config.py
>       python/trunk/Lib/test/test_logging.py
>       python/trunk/Misc/NEWS
>
> I noticed because someone else (Brett?) questioned the apparent lack  
> of
> review.

Yep, that's a different issue.  Unless someone wants to vouch for the  
committed patch after the fact, could someone please revert the change  
and contact Vinay to get a proper fix reviewed?  I noticed that he  
says in the tracker issue that what was committed was modified from  
the posted patch.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSL/zMnEjvBPtnXfVAQJY3QP+LNXhx1YGuCHSw/D2n0yVBj1PLLUbgYnp
k/+zWWmvIRc8YSApV1YyYR4iXfqqYFoi1SH0eh7F1k9+2CZ51HHD0p6CZ0Eb1FQ2
405ocxT28R3UR/E0ozxFca3IuNhGPR2FI/BCfsLrdrA3UtHA4XvZMDvM3KxEMarl
9WdYgop/I8Y=
=b6Ry
-----END PGP SIGNATURE-----

From python at rcn.com  Thu Sep  4 16:59:13 2008
From: python at rcn.com (Raymond Hettinger)
Date: Thu, 4 Sep 2008 07:59:13 -0700
Subject: [Python-3000] PEP 3108 and the demise of bsddb3
References: <1220484055.04.0.310220074498.issue3769@psf.upfronthosting.co.za><1afaf6160809031622j3b63a08h912686a839bd0ed@mail.gmail.com><18623.18700.76260.893902@montanaro-dyndns-org.local><bbaeab100809032026j643492f8w6b61169859e1bf05@mail.gmail.com>
	<79990c6b0809040719y7836b635vc30a6e46f0ce0182@mail.gmail.com>
Message-ID: <E765C4AC7A704CB693B7ACB64D795730@RaymondLaptop1>

[Brett Cannon]
>> On Wed, Sep 3, 2008 at 7:33 PM,  <skip at pobox.com> wrote:
>> All but dbm.dumb require some pre-existing library to exist to compile
>> against. So any platform that has the proper libraries installed will
>> be able to use ndbm or gnu, but as for which platforms that are I do
>> not know.

[Paul Moore]
> On Windows, none are available except dbm.dumb and bsddb (presently).
> If bsddb is to be removed, can/should one of the other "real" dbm
> variants be added to the standard binary, so that Windows users have
> at least one usable dbm option?

This is a major problem for shelves (which I use often).

Some alternative needs to be put in place before bsddb gets ripped-out.

Was any of this discussed or thought through for PEP 3108?
What's wrong with deprecating in 3.0 and replacing in 3.1?
What advantage is there in ripping out bsddb during a release
candidate and crippling shelves on Windows?   
Why the rush to rip it out no matter what?

Raymond

From jcea at jcea.es  Thu Sep  4 17:03:53 2008
From: jcea at jcea.es (Jesus Cea)
Date: Thu, 04 Sep 2008 17:03:53 +0200
Subject: [Python-3000] About "daemon" in threading module
Message-ID: <48BFF8D9.3030002@jcea.es>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

First we had "thread.setDaemon()". This was not PEP8, so Python 3.0
renamed it to "thread.set_daemon()". Later, Python 3.0 changed the
method to an attribute, "thread.daemon".

I think the last change is risky, because you can mistype and create a
new attribute instead of setting daemon mode. Since daemon mode is
usually only visible when things go wrong (the main thread dies), you can
miss the bug for a long time.

Similarly with the new "thread.name" attribute.

I would rather revert to the method style, or redo the class to avoid
new attribute creation, maybe via some "thread.__setattr__()" magic.

Sorry if this issue has already been discussed. I can't find any previous
thread about this.

PS: If you mistype the method name, you get an error. If you mistype the
attribute assignment, the bug goes unnoticed.

- --
Jesus Cea Avion                         _/_/      _/_/_/        _/_/_/
jcea at jcea.es - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
jabber / xmpp:jcea at jabber.org         _/_/    _/_/          _/_/_/_/_/
.                              _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iQCVAwUBSL/415lgi5GaxT1NAQJ+ugP/RA0wG1b2/q0C96FYq18AIMGONrfKMh7+
BjxrdAL3knqwPXsJW1JbgQV5vsOLpMqx6v8epdFN9FLH5KBLTW3jDmn3OAh7FwyN
5CcoXFc8MWT4/tsa2+SUOVC1rBibx5+b2Cz28KK/RnXK6O4WR/u/3f8fpssMApdW
kC6Y0MqFoD4=
=2/g3
-----END PGP SIGNATURE-----

From p.f.moore at gmail.com  Thu Sep  4 17:07:31 2008
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 4 Sep 2008 16:07:31 +0100
Subject: [Python-3000] About "daemon" in threading module
In-Reply-To: <48BFF8D9.3030002@jcea.es>
References: <48BFF8D9.3030002@jcea.es>
Message-ID: <79990c6b0809040807sddbb366h8c8ed0286334d570@mail.gmail.com>

2008/9/4 Jesus Cea <jcea at jcea.es>:
> PS: If you mistype the method name, you get an error. If you mistype the
> attribute assignment, the bug goes unnoticed.

I'm neutral over the threading change, but this is a good point to
consider in general as part of the "method vs property" question when
designing classes.
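
A minimal sketch of the difference, using the current threading.Thread
API. The misspelled names "deamon" and "set_deamon" are deliberate typos
for illustration, not real API:

```python
import threading


def worker():
    pass


# daemon=False is passed explicitly so the example does not depend on
# the daemon status of the thread that creates it.
t = threading.Thread(target=worker, daemon=False)

# Typo in an attribute assignment: Python silently creates a brand-new
# instance attribute and the real flag is left untouched.
t.deamon = True            # intended: t.daemon = True
silently_ignored = (t.daemon is False)

# The same typo in a method call fails immediately.
try:
    t.set_deamon(True)     # hypothetical misspelled method name
except AttributeError:
    method_typo_caught = True
else:
    method_typo_caught = False
```

So the method spelling fails loudly at the call site, while the attribute
spelling only shows up later as a thread that is not a daemon after all.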

Paul.

From barry at python.org  Thu Sep  4 17:24:05 2008
From: barry at python.org (Barry Warsaw)
Date: Thu, 4 Sep 2008 11:24:05 -0400
Subject: [Python-3000] PEP 3108 and the demise of bsddb3
In-Reply-To: <E765C4AC7A704CB693B7ACB64D795730@RaymondLaptop1>
References: <1220484055.04.0.310220074498.issue3769@psf.upfronthosting.co.za><1afaf6160809031622j3b63a08h912686a839bd0ed@mail.gmail.com><18623.18700.76260.893902@montanaro-dyndns-org.local><bbaeab100809032026j643492f8w6b61169859e1bf05@mail.gmail.com>
	<79990c6b0809040719y7836b635vc30a6e46f0ce0182@mail.gmail.com>
	<E765C4AC7A704CB693B7ACB64D795730@RaymondLaptop1>
Message-ID: <AA856BF4-2B0B-4CDF-8381-986C9C8060D1@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Since rc1 did not go out last night, bsddb could be restored.  I still  
don't think it should be, but at this point it's up to Guido to  
override, and I will abide by his decision.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSL/9lXEjvBPtnXfVAQKMygQAi1Wtox+GxN4ZiC1lyiqX3Vklyket4Was
5bCxt4On1DeGmacunjOdCigCzG4Or6fbSe7cSh1y4CbTL3httxKLTh1gok6PME/X
k9MBSY3T6j1ykcvc64ThMoaGvgNuFXKmYo7ifLaDfp2KLSpmGDHVj3uuuYcVPmMy
HwhYQDIEda4=
=Jzrq
-----END PGP SIGNATURE-----

From jnoller at gmail.com  Thu Sep  4 17:33:51 2008
From: jnoller at gmail.com (Jesse Noller)
Date: Thu, 4 Sep 2008 11:33:51 -0400
Subject: [Python-3000] About "daemon" in threading module
In-Reply-To: <48BFF8D9.3030002@jcea.es>
References: <48BFF8D9.3030002@jcea.es>
Message-ID: <4222a8490809040833r4af29fa1j50af4d69ed3be683@mail.gmail.com>

On Thu, Sep 4, 2008 at 11:03 AM, Jesus Cea <jcea at jcea.es> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> First we had "thread.setDaemon()". This was not PEP8, so Python 3.0
> renamed it to "thread.set_daemon()". Later, Python 3.0 changed the
> method to an attribute, "thread.daemon".
>
> I think the last change is risky, because you can mistype and create a
> new attribute instead of setting daemon mode. Since daemon mode is
> usually only visible when things go wrong (the main thread dies), you can
> miss the bug for a long time.
>
> Similarly with the new "thread.name" attribute.
>
> I would rather revert to the method style, or redo the class to avoid
> new attribute creation, maybe via some "thread.__setattr__()" magic.
>
> Sorry if this issue has already been discussed. I can't find any previous
> thread about this.
>
> PS: If you mistype the method name, you get an error. If you mistype the
> attribute assignment, the bug goes unnoticed.
>

Jesus -

The discussion and implementation happened here:

http://bugs.python.org/issue3352

and:

http://mail.python.org/pipermail/python-dev/2008-June/080285.html

-Jesse

From python at rcn.com  Thu Sep  4 17:35:26 2008
From: python at rcn.com (Raymond Hettinger)
Date: Thu, 4 Sep 2008 08:35:26 -0700
Subject: [Python-3000] PEP 3108 and the demise of bsddb3
References: <1220484055.04.0.310220074498.issue3769@psf.upfronthosting.co.za><1afaf6160809031622j3b63a08h912686a839bd0ed@mail.gmail.com><18623.18700.76260.893902@montanaro-dyndns-org.local><bbaeab100809032026j643492f8w6b61169859e1bf05@mail.gmail.com><79990c6b0809040719y7836b635vc30a6e46f0ce0182@mail.gmail.com><E765C4AC7A704CB693B7ACB64D795730@RaymondLaptop1>
	<AA856BF4-2B0B-4CDF-8381-986C9C8060D1@python.org>
Message-ID: <00A761011E6B4617B3728CB9B9FD9A60@RaymondLaptop1>

[Barry]
> Since rc1 did not go out last night, bsddb could be restored.  I still  
> don't think it should be, but at this point it's up to Guido to  
> override, and I will abide by his decision.

Put in my vote for restoration, deprecation, and thought-out removal/replacement in 3.1.
The ensuing discussions have made it clear that immediate removal is controversial and problematic.
Also, part of the original motivation disappeared when Jesus Cea stepped-up.

Raymond

From lists at cheimes.de  Thu Sep  4 17:39:22 2008
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 04 Sep 2008 17:39:22 +0200
Subject: [Python-3000] About "daemon" in threading module
In-Reply-To: <48BFF8D9.3030002@jcea.es>
References: <48BFF8D9.3030002@jcea.es>
Message-ID: <g9ovfa$7ch$1@ger.gmane.org>

Jesus Cea wrote:
> I would rather revert to the method style, or redo the class to avoid
> new attribute creation, maybe via some "thread.__setattr__()" magic.

Or maybe with __slots__ in the Thread class. It'd also save some
memory, and subclasses of Thread would still work as expected.
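
A rough sketch of how that would behave. SlottedWorker and CustomWorker
are hypothetical stand-ins written for this illustration, not the real
threading.Thread:

```python
class SlottedWorker:
    """Hypothetical stand-in for a thread-like class using __slots__."""

    __slots__ = ("name", "daemon")

    def __init__(self, name, daemon=False):
        self.name = name
        self.daemon = daemon


w = SlottedWorker("w1")

# With __slots__ there is no per-instance __dict__, so a mistyped
# attribute name is rejected instead of being silently created.
try:
    w.deamon = True
except AttributeError:
    typo_rejected = True
else:
    typo_rejected = False


class CustomWorker(SlottedWorker):
    # No __slots__ here, so instances get a __dict__ back.
    pass


c = CustomWorker("w2")
c.extra_state = 123        # subclasses can still add their own state
c.deamon = True            # ...but the typo is accepted again
subclass_typo_accepted = hasattr(c, "deamon")
```

One caveat the sketch makes visible: the protection only holds for the
base class itself. Any subclass that does not declare __slots__ gets a
__dict__ again, so typos in subclass instances go unnoticed as before.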

Christian

From jnoller at gmail.com  Thu Sep  4 17:41:03 2008
From: jnoller at gmail.com (Jesse Noller)
Date: Thu, 4 Sep 2008 11:41:03 -0400
Subject: [Python-3000] About "daemon" in threading module
In-Reply-To: <g9ovfa$7ch$1@ger.gmane.org>
References: <48BFF8D9.3030002@jcea.es> <g9ovfa$7ch$1@ger.gmane.org>
Message-ID: <4222a8490809040841k3905bcd0h495f5b673f621bee@mail.gmail.com>

On Thu, Sep 4, 2008 at 11:39 AM, Christian Heimes <lists at cheimes.de> wrote:
> Jesus Cea wrote:
>>
>> I would rather revert to the method style, or redo the class to avoid
>> new attribute creation, maybe via some "thread.__setattr__()" magic.
>
> Or maybe with __slots__ in the Thread class. It'd also save some memory,
> and subclasses of Thread would still work as expected.
>
> Christian

FWIW: Any change like this should also patch the multiprocessing
module - both threading and multiprocessing are "moving in lock-step"

From solipsis at pitrou.net  Thu Sep  4 17:47:36 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 4 Sep 2008 15:47:36 +0000 (UTC)
Subject: [Python-3000] About "daemon" in threading module
References: <48BFF8D9.3030002@jcea.es>
Message-ID: <loom.20080904T154530-598@post.gmane.org>

Jesus Cea <jcea <at> jcea.es> writes:
> 
> First we had "thread.setDaemon()". This was not PEP8, so Python 3.0
> renamed it to "thread.set_daemon()". Later, Python 3.0 changed the
> method to an attribute, "thread.daemon".
> 
> I think the last change is risky, because you can mistype and create a
> new attribute instead of setting daemon mode. Since daemon mode is
> usually only visible when things go wrong (the main thread dies), you can
> miss the bug for a long time.

I've never understood why the "daemon" flag couldn't be passed as one of the
constructor arguments. It would make code shorter, and avoid the mistyping risk
mentioned by Jesus. It also sounds saner, since you shouldn't change the flag
after the thread is started anyway.
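
For illustration only: the Thread constructor did not take such an
argument when this was written, but later Python versions (3.3 and up)
accept a "daemon" keyword. A minimal sketch of the constructor style,
which also shows that the flag cannot be changed once the thread has
started:

```python
import threading

done = threading.Event()


def worker():
    # Block until the main thread lets us finish (with a safety timeout).
    done.wait(timeout=5)


# The flag is fixed at construction time; a misspelled keyword would
# raise TypeError instead of creating a stray attribute.
t = threading.Thread(target=worker, daemon=True)
t.start()

# Changing the flag after start() is refused by the threading module.
try:
    t.daemon = False
except RuntimeError:
    change_after_start_rejected = True
else:
    change_after_start_rejected = False

flag_after_start = t.daemon

done.set()
t.join()
```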

Regards

Antoine.

From jcea at jcea.es  Thu Sep  4 18:02:22 2008
From: jcea at jcea.es (Jesus Cea)
Date: Thu, 04 Sep 2008 18:02:22 +0200
Subject: [Python-3000] PEP 3108 and the demise of bsddb3
In-Reply-To: <52dc1c820809032208p6e0d5e31x295bebc79beaa86a@mail.gmail.com>
References: <1220484055.04.0.310220074498.issue3769@psf.upfronthosting.co.za>	<1afaf6160809031622j3b63a08h912686a839bd0ed@mail.gmail.com>	<18623.18700.76260.893902@montanaro-dyndns-org.local>	<1055D9D2507D4F64A6D30656C8A835D1@RaymondLaptop1>
	<52dc1c820809032208p6e0d5e31x295bebc79beaa86a@mail.gmail.com>
Message-ID: <48C0068E.1060606@jcea.es>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Gregory P. Smith wrote:
> The fact that the Python Lib/bsddb/test/ test suite has uncovered
> actual Oracle/Sleepycat BerkeleyDB bugs in supposedly stable releases
> has always disturbed me.

This is true. But python also uses openssl, for example, and it must be
updated from time to time too. The only difference is that the
bugs are not discovered by python.

In fact, I can say that Berkeley DB 4.7 snapshot releases crashed a lot
with bsddb testsuite. Berkeley DB 4.7.25 is rock solid, in part, because
of pybsddb and the feedback between me and Oracle people.

> PS Thank you jcea for your wonderful work on improving bsddb!
> Regardless of whether it appears in the standard library in the future
> you're making many users very happy with your work.

I would give a leg to know how many, actually :). Err, put that chainsaw
down! :-P

- --
Jesus Cea Avion                         _/_/      _/_/_/        _/_/_/
jcea at jcea.es - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
jabber / xmpp:jcea at jabber.org         _/_/    _/_/          _/_/_/_/_/
.                              _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iQCVAwUBSMAGjplgi5GaxT1NAQIIUgQApuIbyiv6iLf2SgiiXQEJ3iJ3gLK2ksne
Nry+Yb1fwR1DfEyUd94QqkpI6rAsCZblqC2uboNblx59naz6V4Xlg8ts2ZCr0k5y
GD6vDjV+PeGSgDKZHf8X28kCikVRXDSwPcpT659hjjYBfaezxQDkrVMbt+RQFU8H
KbKWpz5A5fc=
=rHIW
-----END PGP SIGNATURE-----

From jcea at jcea.es  Thu Sep  4 18:20:59 2008
From: jcea at jcea.es (Jesus Cea)
Date: Thu, 04 Sep 2008 18:20:59 +0200
Subject: [Python-3000] PEP 3108 and the demise of bsddb3
In-Reply-To: <00A761011E6B4617B3728CB9B9FD9A60@RaymondLaptop1>
References: <1220484055.04.0.310220074498.issue3769@psf.upfronthosting.co.za><1afaf6160809031622j3b63a08h912686a839bd0ed@mail.gmail.com><18623.18700.76260.893902@montanaro-dyndns-org.local><bbaeab100809032026j643492f8w6b61169859e1bf05@mail.gmail.com><79990c6b0809040719y7836b635vc30a6e46f0ce0182@mail.gmail.com><E765C4AC7A704CB693B7ACB64D795730@RaymondLaptop1>	<AA856BF4-2B0B-4CDF-8381-986C9C8060D1@python.org>
	<00A761011E6B4617B3728CB9B9FD9A60@RaymondLaptop1>
Message-ID: <48C00AEB.6030708@jcea.es>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Raymond Hettinger wrote:
> [Barry]
>> Since rc1 did not go out last night, bsddb could be restored.  I
>> still  don't think it should be, but at this point it's up to Guido
>> to  override, and I will abide by his decision.
> 
> Put in my vote for restoration, deprecation, and thought-out
> removal/replacement in 3.1.
> The ensuing discussions have made it clear that immediate removal is
> controversial and problematic.
> Also, part of the original motivation disappeared when Jesus Cea
> stepped-up.

I have a trip this weekend (a 4-day weekend in Madrid, until
Wednesday), but I can cancel and be available for any remaining issue. I
would need to know in the next 24-30 hours or so :).

I'm a bit worried about bsddb being restored and then pulled out again
shortly afterwards if I can't resolve any remaining issues in minutes :).
But I would take the risk.

PS: Something to consider, also, is that (I think) some buildbots could
crash (core dump) while running the bsddb testsuite. That would be an
issue with the Berkeley DB installation, not the bsddb module, undetected
until now because no code used the Berkeley DB library. Since the run would
crash with a core dump, it wouldn't verify the entire python testsuite.
That would be a big issue. What can be done then, if the buildbot admins
are not available to verify library versions, provide the core dump, and
so on?

- --
Jesus Cea Avion                         _/_/      _/_/_/        _/_/_/
jcea at jcea.es - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
jabber / xmpp:jcea at jabber.org         _/_/    _/_/          _/_/_/_/_/
.                              _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iQCVAwUBSMAK6Jlgi5GaxT1NAQK++QP+INECV7OaHxXDoJNj4BUT2aImNuNq+uFc
EkYeWXXIzvbnOjukTT22V6UrH/eXdqKW/E/QMAzw7K9h35/13Xedz+5VLigVVOUK
ELW/0/bBiSaEHiwpLPwfWeq1eCN5NgwlgiHVMH9XJtyCh5Qw8f/pio+llcoJ2wun
NZ9FQnRP3Sw=
=VUCY
-----END PGP SIGNATURE-----

From barry at python.org  Thu Sep  4 19:07:47 2008
From: barry at python.org (Barry Warsaw)
Date: Thu, 4 Sep 2008 13:07:47 -0400
Subject: [Python-3000] PEP 3108 and the demise of bsddb3
In-Reply-To: <48C00AEB.6030708@jcea.es>
References: <1220484055.04.0.310220074498.issue3769@psf.upfronthosting.co.za><1afaf6160809031622j3b63a08h912686a839bd0ed@mail.gmail.com><18623.18700.76260.893902@montanaro-dyndns-org.local><bbaeab100809032026j643492f8w6b61169859e1bf05@mail.gmail.com><79990c6b0809040719y7836b635vc30a6e46f0ce0182@mail.gmail.com><E765C4AC7A704CB693B7ACB64D795730@RaymondLaptop1>	<AA856BF4-2B0B-4CDF-8381-986C9C8060D1@python.org>
	<00A761011E6B4617B3728CB9B9FD9A60@RaymondLaptop1>
	<48C00AEB.6030708@jcea.es>
Message-ID: <36FB505D-1489-4BD0-84D6-74F2791F914D@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sep 4, 2008, at 12:20 PM, Jesus Cea wrote:

> I'm a bit worried about you restoring bsddb and be pulled-off shortly
> again if I can't resolve any remaining issues in minutes :). But I  
> would
> take the risk.

Don't worry about that.  Guido's decision will be binding for 3.0.

> PS: Something to consider, also, is that (I think), some buildbot  
> could
> crash (core dump) while running the bsddb testsuite. That would be an
> issue with Berkeley DB installation, not bsddb module, undetected  
> until
> now because no code used the Berkeley DB library. Since the run would
> crash with core dump, it wouldn't verify the entire python testsuite.
> That would be a big issue. What can be done then, if the buildbot  
> admins
> are not available to verify library versions, provide the coredump,  
> and so?.

Live with it.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSMAV43EjvBPtnXfVAQJRVAP/fr9hqBegZhWSlBa9Rtt0ODfysGCW112Q
4fgOG3LiWjxKOvdk4cTJ4zpRvBqwhF8h5ZSImDPJpQbE+Nzw8qNFGKGjTP37TRfV
XNBm6gJlzEvX8B3N7BtvVWk7LmzhVP2+Dcs/36drGwnrfclUIpOjBjlKktet7sL1
eQan89KEPbg=
=vYG1
-----END PGP SIGNATURE-----

From guido at python.org  Thu Sep  4 19:57:40 2008
From: guido at python.org (Guido van Rossum)
Date: Thu, 4 Sep 2008 10:57:40 -0700
Subject: [Python-3000] PEP 3108 and the demise of bsddb3
In-Reply-To: <36FB505D-1489-4BD0-84D6-74F2791F914D@python.org>
References: <1220484055.04.0.310220074498.issue3769@psf.upfronthosting.co.za>
	<1afaf6160809031622j3b63a08h912686a839bd0ed@mail.gmail.com>
	<18623.18700.76260.893902@montanaro-dyndns-org.local>
	<bbaeab100809032026j643492f8w6b61169859e1bf05@mail.gmail.com>
	<79990c6b0809040719y7836b635vc30a6e46f0ce0182@mail.gmail.com>
	<E765C4AC7A704CB693B7ACB64D795730@RaymondLaptop1>
	<AA856BF4-2B0B-4CDF-8381-986C9C8060D1@python.org>
	<00A761011E6B4617B3728CB9B9FD9A60@RaymondLaptop1>
	<48C00AEB.6030708@jcea.es>
	<36FB505D-1489-4BD0-84D6-74F2791F914D@python.org>
Message-ID: <ca471dc20809041057k346491cbg2975cedf2c5f8186@mail.gmail.com>

[I don't know who added my Google address to the CC list. Please don't
do that again.]

On Thu, Sep 4, 2008 at 10:07 AM, Barry Warsaw <barry at python.org> wrote:
> On Sep 4, 2008, at 12:20 PM, Jesus Cea wrote:
>
>> I'm a bit worried about you restoring bsddb and be pulled-off shortly
>> again if I can't resolve any remaining issues in minutes :). But I would
>> take the risk.
>
> Don't worry about that.  Guido's decision will be binding for 3.0.

I am still in favor of removing bsddb from Python 3.0. It depends on a
3rd party library of enormous complexity whose stability cannot always
be taken for granted. Arguments about code ownership, release cycles,
buildbot stability and more all point towards keeping it separate. I
consider it no different in nature than 3rd party UI packages (e.g.
wxPython or PyQt) or relational database bindings (e.g. the MySQL or
PostgreSQL bindings): very useful to a certain class of users, but
outside the scope of the core distribution.

Python 3.0 is a perfect opportunity to say goodbye to bsddb as a
standard library component. For apps that depend on it, it is just a
download away -- deprecation in 3.0 and removal in 3.1 would actually
send the *wrong* message, since it is very much alive! I am grateful
to Jesus for having taken over maintenance, and hope that the package
blossoms in its newfound freedom.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From barry at python.org  Thu Sep  4 20:16:26 2008
From: barry at python.org (Barry Warsaw)
Date: Thu, 4 Sep 2008 14:16:26 -0400
Subject: [Python-3000] PEP 3108 and the demise of bsddb3
In-Reply-To: <ca471dc20809041057k346491cbg2975cedf2c5f8186@mail.gmail.com>
References: <1220484055.04.0.310220074498.issue3769@psf.upfronthosting.co.za>
	<1afaf6160809031622j3b63a08h912686a839bd0ed@mail.gmail.com>
	<18623.18700.76260.893902@montanaro-dyndns-org.local>
	<bbaeab100809032026j643492f8w6b61169859e1bf05@mail.gmail.com>
	<79990c6b0809040719y7836b635vc30a6e46f0ce0182@mail.gmail.com>
	<E765C4AC7A704CB693B7ACB64D795730@RaymondLaptop1>
	<AA856BF4-2B0B-4CDF-8381-986C9C8060D1@python.org>
	<00A761011E6B4617B3728CB9B9FD9A60@RaymondLaptop1>
	<48C00AEB.6030708@jcea.es>
	<36FB505D-1489-4BD0-84D6-74F2791F914D@python.org>
	<ca471dc20809041057k346491cbg2975cedf2c5f8186@mail.gmail.com>
Message-ID: <5149970C-83E2-4F40-903D-06601860BE0B@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sep 4, 2008, at 1:57 PM, Guido van Rossum wrote:

> I am still in favor of removing bsddb from Python 3.0. It depends on a
> 3rd party library of enormous complexity whose stability cannot always
> be taken for granted. Arguments about code ownership, release cycles,
> bugbot stability and more all point towards keeping it separate. I
> consider it no different in nature than 3rd party UI packages (e.g.
> wxPython or PyQt) or relational database bindings (e.g. the MySQL or
> PostgreSQL bindings): very useful to a certain class of users, but
> outside the scope of the core distribution.
>
> Python 3.0 is a perfect opportunity to say goodbye to bsddb as a
> standard library component. For apps that depend on it, it is just a
> download away -- deprecating in 3.0 and removal in 3.1 would actually
> send the *wrong* message, since it is very much alive! I am grateful
> for Jesus to have taken over maintenance, and hope that the package
> blossoms in its newfound freedom.

Thanks Guido.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSMAl+3EjvBPtnXfVAQKOFAQAoccw1UNoJ9EIqpkauyhD6ITloOYdMEEC
Mqp4hmJWgW8PO2J4YzGYGr7W4ty4JdsL9VGxga20bT9iFvIiTVR8ZOkAPInzhZke
bpiXhac5Z6v9I0+8xLbEguiM9z10xHVXscCmQdjBkk4RRWTtoioq+NCESeU6qgrM
LxHfAPlCMms=
=THGn
-----END PGP SIGNATURE-----

From jcea at jcea.es  Thu Sep  4 20:33:25 2008
From: jcea at jcea.es (Jesus Cea)
Date: Thu, 04 Sep 2008 20:33:25 +0200
Subject: [Python-3000] PEP 3108 and the demise of bsddb3
In-Reply-To: <ca471dc20809041057k346491cbg2975cedf2c5f8186@mail.gmail.com>
References: <1220484055.04.0.310220074498.issue3769@psf.upfronthosting.co.za>	<1afaf6160809031622j3b63a08h912686a839bd0ed@mail.gmail.com>	<18623.18700.76260.893902@montanaro-dyndns-org.local>	<bbaeab100809032026j643492f8w6b61169859e1bf05@mail.gmail.com>	<79990c6b0809040719y7836b635vc30a6e46f0ce0182@mail.gmail.com>	<E765C4AC7A704CB693B7ACB64D795730@RaymondLaptop1>	<AA856BF4-2B0B-4CDF-8381-986C9C8060D1@python.org>	<00A761011E6B4617B3728CB9B9FD9A60@RaymondLaptop1>	<48C00AEB.6030708@jcea.es>	<36FB505D-1489-4BD0-84D6-74F2791F914D@python.org>
	<ca471dc20809041057k346491cbg2975cedf2c5f8186@mail.gmail.com>
Message-ID: <48C029F5.70307@jcea.es>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Guido van Rossum wrote:
> I am still in favor of removing bsddb from Python 3.0.

The BDFL has spoken.

I want to record this:

* I will keep maintaining bsddb in Python 2.6. I have no idea what the plan
is for 2.7, though.

* I will keep bsddb updated and available via PyPI, for both the 2.x and 3.x
branches. Source only. Windows users will be at the mercy of others
compiling the module and making it available.

* I will be available if the decision to drop bsddb from the standard
library is reconsidered.

* I will try to find another Python area of interest to me, to fully
honor my commit privileges.

- --
Jesus Cea Avion                         _/_/      _/_/_/        _/_/_/
jcea at jcea.es - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
jabber / xmpp:jcea at jabber.org         _/_/    _/_/          _/_/_/_/_/
.                              _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iQCVAwUBSMAp8Zlgi5GaxT1NAQLLpAP/Yb5Po5krSTt+L6llyIx/CDkcr60M57Kc
W034uMSH8IfQ4cswkM+d96BwCHlDczew5qWzHYR/f7K0YeZPaKWuT4Z8/WlchejV
oGC0orGq/NQ1LnNyGjzgdFq50htdQt93EWUvBjhQwOyFeiLb1XuxCFyM/ZA3ge2y
fCaamryHnqA=
=1RcD
-----END PGP SIGNATURE-----

From brett at python.org  Thu Sep  4 20:41:30 2008
From: brett at python.org (Brett Cannon)
Date: Thu, 4 Sep 2008 11:41:30 -0700
Subject: [Python-3000] PEP 3108 and the demise of bsddb3
In-Reply-To: <48C029F5.70307@jcea.es>
References: <1220484055.04.0.310220074498.issue3769@psf.upfronthosting.co.za>
	<bbaeab100809032026j643492f8w6b61169859e1bf05@mail.gmail.com>
	<79990c6b0809040719y7836b635vc30a6e46f0ce0182@mail.gmail.com>
	<E765C4AC7A704CB693B7ACB64D795730@RaymondLaptop1>
	<AA856BF4-2B0B-4CDF-8381-986C9C8060D1@python.org>
	<00A761011E6B4617B3728CB9B9FD9A60@RaymondLaptop1>
	<48C00AEB.6030708@jcea.es>
	<36FB505D-1489-4BD0-84D6-74F2791F914D@python.org>
	<ca471dc20809041057k346491cbg2975cedf2c5f8186@mail.gmail.com>
	<48C029F5.70307@jcea.es>
Message-ID: <bbaeab100809041141nb49508cgc4ba99fd38797ed6@mail.gmail.com>

On Thu, Sep 4, 2008 at 11:33 AM, Jesus Cea <jcea at jcea.es> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Guido van Rossum wrote:
>> I am still in favor of removing bsddb from Python 3.0.
>
> BDFL has talked.
>
> I want to record this:
>
> * I will keep maintaining bsddb in Python 2.6. No idea what is the plan
> for 2.7, nevertheless.
>

Great! As everyone has said, this is nothing personal and I am glad
you are not taking it that way.

As for 2.7, it's wait and see. My guess is that there will be one, with
some more stuff backported from 3.0/3.1 to 2.7 to keep the transition
easy.

> * I will keep bsddb updated and available via PYPI, both for 2.x and 3.x
> branches. Source only. Windows users will be at the mercy of other
> compiling the module and making it available.
>
> * I will be available if the decision to drop bsddb from standard lib is
> reconsidered.
>
> * I will try to find another Python area of interest to me, to fully
> honor my commit privileges.
>

As I mentioned in another email, I think it would be a great idea to
change the dbm package so that bsddb can easily be hooked into the dbm
package as a 3rd-party DB back-end (along with any other DB backend
that wants to). If you want to work on that for 2.7/3.1 it would be
greatly appreciated!
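
The kind of hook described above could look roughly like the sketch
below. Every name in it (register_backend, open_with, the "memory"
backend) is hypothetical and made up for illustration; none of them is
part of the dbm package.

```python
# Hypothetical registry; none of these names exist in the standard dbm package.
_BACKENDS = {}

def register_backend(name, opener):
    """Make a third-party opener callable available under the given name."""
    _BACKENDS[name] = opener

def open_with(name, path):
    """Open path with a registered backend, or fail with a clear error."""
    try:
        opener = _BACKENDS[name]
    except KeyError:
        raise LookupError("no DB backend registered as %r" % name) from None
    return opener(path)

# Stand-in backend for the demonstration: one in-memory dict per path.
_fake_files = {}
register_backend("memory", lambda path: _fake_files.setdefault(path, {}))

db = open_with("memory", "/tmp/example.db")
db[b"key"] = b"value"
```

A separately distributed bsddb package would then only need to call the
registration function at import time, instead of being hard-wired into
the standard library.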

-Brett

From barry at python.org  Thu Sep  4 20:52:26 2008
From: barry at python.org (Barry Warsaw)
Date: Thu, 4 Sep 2008 14:52:26 -0400
Subject: [Python-3000] PEP 3108 and the demise of bsddb3
In-Reply-To: <48C029F5.70307@jcea.es>
References: <1220484055.04.0.310220074498.issue3769@psf.upfronthosting.co.za>	<1afaf6160809031622j3b63a08h912686a839bd0ed@mail.gmail.com>	<18623.18700.76260.893902@montanaro-dyndns-org.local>	<bbaeab100809032026j643492f8w6b61169859e1bf05@mail.gmail.com>	<79990c6b0809040719y7836b635vc30a6e46f0ce0182@mail.gmail.com>	<E765C4AC7A704CB693B7ACB64D795730@RaymondLaptop1>	<AA856BF4-2B0B-4CDF-8381-986C9C8060D1@python.org>	<00A761011E6B4617B3728CB9B9FD9A60@RaymondLaptop1>	<48C00AEB.6030708@jcea.es>	<36FB505D-1489-4BD0-84D6-74F2791F914D@python.org>
	<ca471dc20809041057k346491cbg2975cedf2c5f8186@mail.gmail.com>
	<48C029F5.70307@jcea.es>
Message-ID: <6C8A6A2C-0234-4DFD-BBF2-6AF8A4383349@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sep 4, 2008, at 2:33 PM, Jesus Cea wrote:

> * I will try to find another Python area of interest to me, to fully
> honor my commit privileges.

BTW Jesus, if you want to maintain the code on python.org, we can  
create an area in the sandbox for you.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSMAuanEjvBPtnXfVAQLD6QP+MZdd5MtqxpNOO2jFG0KpKf1/2f+jzaHS
UCyjTINbkRIBoAA8QYOTtbkVdNvALnxatCR4N6HPKPEKxVVdAptOP3QQr4iP3dU4
YjBYfd6Ki17gqcuS65ELwjNowPAow+E+8duOfRK3QmrcOS0nl/soTSh1j4ylF6xZ
D9v15jK4NKY=
=VN3g
-----END PGP SIGNATURE-----

From guido at python.org  Thu Sep  4 21:36:12 2008
From: guido at python.org (Guido van Rossum)
Date: Thu, 4 Sep 2008 12:36:12 -0700
Subject: [Python-3000] Problem with grammar for 'except'?
In-Reply-To: <A38FA5A4111844B8B3CF639A04795311@RaymondLaptop1>
References: <bbaeab100809032110i58bdbcefpa66d5536ef02c7dc@mail.gmail.com>
	<A38FA5A4111844B8B3CF639A04795311@RaymondLaptop1>
Message-ID: <ca471dc20809041236l3955a60blcf8046a38adfd928@mail.gmail.com>

On Wed, Sep 3, 2008 at 9:25 PM, Raymond Hettinger <python at rcn.com> wrote:
> [Brett]
>>
>> I gave a talk last night at the Vancouver Python users group on
>> 2.6/3.0, and I tried the following code and it failed during a live
>> demo::
>>
>>  >>> try: pass
>>  ... except Exception, Exception: pass
>>   File "<stdin>", line 2
>>     except Exception, Exception: pass
>>                                ^
>>  SyntaxError: invalid syntax
>>
>> Now from what I can tell from PEP 3110, that should be legal in 3.0.
>> Am I reading the PEP correctly?
>
> Don't think so.
> The parens are necessary for a tuple of exceptions
> lest it be confused with the old "except E, v" syntax
> which meant "except E as e".
>
> Maybe in 3.1, the paren requirement can be dropped.

I would wait longer -- until well after the 2.x line is dead and
buried. It will take some time for every Python user to train their
Python fingers not to type "except E, v:" and we don't want people who
are late in migrating inserting bugs like this in their first 3.x
program.
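
For reference, a minimal sketch of the two spellings that are safe on
every Python 3 release: a parenthesized tuple for several exception
types, and "as" for binding the instance. The function name classify is
only an illustration.

```python
def classify(exc):
    """Raise exc and report which handler caught it."""
    try:
        raise exc
    except (KeyError, IndexError) as err:
        # A tuple of exception types, written with parentheses.
        return "lookup: " + type(err).__name__
    except ValueError as err:
        # 'as' binds the instance; this replaces the old 'except E, v' form.
        return "value: " + type(err).__name__

first = classify(KeyError("missing"))
second = classify(ValueError("bad"))
```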

> But for 3.0, it would be a problem given that old
> scripts would start getting misinterpreted.
>
> I did something similar for list.sort() by requiring
> keyword arguments.  That way, we wouldn't have
> list.sort(f) running with f as a cmp function 2.6 and
> as a key function in 3.0.
>
>
> Raymond

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Thu Sep  4 21:56:51 2008
From: guido at python.org (Guido van Rossum)
Date: Thu, 4 Sep 2008 12:56:51 -0700
Subject: [Python-3000] About "daemon" in threading module
In-Reply-To: <loom.20080904T154530-598@post.gmane.org>
References: <48BFF8D9.3030002@jcea.es>
	<loom.20080904T154530-598@post.gmane.org>
Message-ID: <ca471dc20809041256m5fec338fy4f5081a47260c4c3@mail.gmail.com>

On Thu, Sep 4, 2008 at 8:47 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> Jesus Cea <jcea <at> jcea.es> writes:
>>
>> First we had "thread.setDaemon()". This was not PEP8, so Python 3.0
>> renamed it to "thread.set_daemon()". Lately Python 3.0 changes the
>> method to an attribute "thread.daemon".
>>
>> I think the last change is risky, because you can mistype and create a
>> new attribute, instead of set daemon mode. Since daemon mode is only
>> usually visible when things goes wrong (the main thread dies), you can
>> miss the bug for a long time.
>
> I've never understood why the "daemon" flag couldn't be passed as one of the
> constructor arguments. It would make code shorter, and avoid the mistyping risk
> mentioned by Jesus. It also sounds saner, since you shouldn't change the flag
> after the thread is started anyway.

As to the why question, this was done to match the Java Thread class.
I don't want to speculate why the Java API was designed this way --
possibly it was a relic of an earlier API version in Java, but
possibly there's a reason I can't fathom right now. After all, there
are excellent reasons why start() is a separate call...
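
As a point of comparison, later Python 3 releases (3.3 onward) accept
the flag as a constructor argument. The sketch below assumes such a
release, not the 3.0 API under discussion; the worker function is a
placeholder.

```python
import threading

def worker():
    pass

# Assumption: a Python version whose Thread constructor accepts daemon=.
t = threading.Thread(target=worker, daemon=True)
flag_before_start = t.daemon
t.start()
t.join()

# Changing the flag after the thread has been started raises RuntimeError.
try:
    t.daemon = False
    changed = True
except RuntimeError:
    changed = False
```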

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From barry at python.org  Thu Sep  4 22:05:21 2008
From: barry at python.org (Barry Warsaw)
Date: Thu, 4 Sep 2008 16:05:21 -0400
Subject: [Python-3000] [Python-3000-checkins] r66218 -
	python/branches/py3k/RELNOTES
In-Reply-To: <ca471dc20809041251k113d44dcj1032ce1ede440111@mail.gmail.com>
References: <20080904134435.6093A1E400B@bag.python.org>
	<ca471dc20809041251k113d44dcj1032ce1ede440111@mail.gmail.com>
Message-ID: <B4D6452A-5438-4A00-9099-65BE9C1C2E63@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sep 4, 2008, at 3:51 PM, Guido van Rossum wrote:

> I'm a little confused -- why did you remove the release notes for
> previous betas but leave those for the alphas in place? ISTM that the
> file was an accumulation of release notes throughout the various
> releases -- just like Misc/NEWS, but with a different focus. This is
> how release notes in other products I've seen typically work, too.

Mostly because I wasn't sure which of those release notes are still
relevant.  I asked (albeit in a different thread) for some assistance
in determining what the current state of those is.

I'm not so sure it's helpful to separately indicate release notes for  
alphas and betas, and it's definitely not helpful to include items  
that are no longer relevant.  My thought was to list only those that  
apply to the final release, and then track them with public stable  
releases such as 3.0.1, 3.0.2, etc.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSMA/gXEjvBPtnXfVAQKAhAP+O4/0eQYulAcc1gcuf4fO74kJlUBmF+i9
ezQIoPBSKh98xPGybqPcuwqNw5ZES3TSt4zgX4MNdD+4aLdXE6n/lcTS/OTgcGtd
5Tyw2ltVy9WfNE1oKK785B25uQQe94IpaYORMZW4ABdd2IYL0KOTVbRd3zjKKHqC
krx0XRSzgzU=
=7X4W
-----END PGP SIGNATURE-----

From steven.bethard at gmail.com  Thu Sep  4 22:05:31 2008
From: steven.bethard at gmail.com (Steven Bethard)
Date: Thu, 4 Sep 2008 14:05:31 -0600
Subject: [Python-3000] About "daemon" in threading module
In-Reply-To: <ca471dc20809041256m5fec338fy4f5081a47260c4c3@mail.gmail.com>
References: <48BFF8D9.3030002@jcea.es>
	<loom.20080904T154530-598@post.gmane.org>
	<ca471dc20809041256m5fec338fy4f5081a47260c4c3@mail.gmail.com>
Message-ID: <d11dcfba0809041305i1288de4dje90090161f87930b@mail.gmail.com>

On Thu, Sep 4, 2008 at 1:56 PM, Guido van Rossum <guido at python.org> wrote:
> On Thu, Sep 4, 2008 at 8:47 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>> Jesus Cea <jcea <at> jcea.es> writes:
>>>
>>> First we had "thread.setDaemon()". This was not PEP8, so Python 3.0
>>> renamed it to "thread.set_daemon()". Lately Python 3.0 changes the
>>> method to an attribute "thread.daemon".
>>>
>>> I think the last change is risky, because you can mistype and create a
>>> new attribute, instead of set daemon mode. Since daemon mode is only
>>> usually visible when things goes wrong (the main thread dies), you can
>>> miss the bug for a long time.
>>
>> I've never understood why the "daemon" flag couldn't be passed as one of the
>> constructor arguments. It would make code shorter, and avoid the mistyping risk
>> mentioned by Jesus. It also sounds saner, since you shouldn't change the flag
>> after the thread is started anyway.
>
> As to the why question, this was done to match the Java Thread class.
> I don't want to speculate why the Java API was designed this way --
> possibly it was a relic of an earlier API version in Java, but
> possibly there's a reason I can't fathom right now. After all, there
> are excellent reasons why start() is a separate call...

This may or may not be relevant, but since Java doesn't support
argument defaults, it's often easier to define a very simple
constructor, and use a bunch of setters if you want to modify the
defaults. I've done this myself when programming in Java to avoid the
exponential number of constructor overloads that would be necessary to
do defaults properly.

Of course, I don't know whether or not that had anything to do with
this particular Java decision.

Steve
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
 --- Bucky Katt, Get Fuzzy

From brett at python.org  Thu Sep  4 22:15:08 2008
From: brett at python.org (Brett Cannon)
Date: Thu, 4 Sep 2008 13:15:08 -0700
Subject: [Python-3000] [Python-Dev] Not releasing rc1 tonight
In-Reply-To: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>
Message-ID: <bbaeab100809041315w46e5099o4e2da93b8d3b5725@mail.gmail.com>

On Wed, Sep 3, 2008 at 8:41 PM, Barry Warsaw <barry at python.org> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> I'm not going to release rc1 tonight.  There are too many open release
> blockers that I don't want to defer, and I'd like the buildbots to churn
> through the bsddb removal on all platforms.  Let me first thank Benjamin,
> Brett, Mark and Antoine for their help on IRC tonight.
>
> Here are the issues I'm not comfortable with deferring:
>
>  3640 test_cpickle crash on AMD64 Windows build
> 874900 threading module can deadlock after fork
>  3574 compile() cannot decode Latin-1 source encodings
>  3657 pickle can pickle the wrong function
>  3187 os.listdir can return byte strings
>  3660 reference leaks in 3.0
>  3594 PyTokenizer_FindEncoding() never succeeds
>  3629 Py30b3 won't compile a regex that compiles with 2.5.2 and 30b2
>

I just added issue 3776 to this list: deprecate bsddb/dbhash in 2.6
for removal in 3.0. There is a patch attached to the issue to be
reviewed.

-Brett

From ncoghlan at gmail.com  Thu Sep  4 23:20:49 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 05 Sep 2008 07:20:49 +1000
Subject: [Python-3000] About "daemon" in threading module
In-Reply-To: <ca471dc20809041256m5fec338fy4f5081a47260c4c3@mail.gmail.com>
References: <48BFF8D9.3030002@jcea.es>	<loom.20080904T154530-598@post.gmane.org>
	<ca471dc20809041256m5fec338fy4f5081a47260c4c3@mail.gmail.com>
Message-ID: <48C05131.60209@gmail.com>

Guido van Rossum wrote:
> On Thu, Sep 4, 2008 at 8:47 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>> Jesus Cea <jcea <at> jcea.es> writes:
>>> First we had "thread.setDaemon()". This was not PEP8, so Python 3.0
>>> renamed it to "thread.set_daemon()". Lately Python 3.0 changes the
>>> method to an attribute "thread.daemon".
>>>
>>> I think the last change is risky, because you can mistype and create a
>>> new attribute, instead of set daemon mode. Since daemon mode is only
>>> usually visible when things goes wrong (the main thread dies), you can
>>> miss the bug for a long time.
>> I've never understood why the "daemon" flag couldn't be passed as one of the
>> constructor arguments. It would make code shorter, and avoid the mistyping risk
>> mentioned by Jesus. It also sounds saner, since you shouldn't change the flag
>> after the thread is started anyway.
> 
> As to the why question, this was done to match the Java Thread class.
> I don't want to speculate why the Java API was designed this way --
> possibly it was a relic of an earlier API version in Java, but
> possibly there's a reason I can't fathom right now. After all, there
> are excellent reasons why start() is a separate call...

Hmm, having (daemon=False) as a parameter on start() would probably be
an even better API than having it on __init__() (modulo subclassing
compatibility concerns).

Regarding Jesus's concern, you can always call t._set_daemon(True) and
t._set_name(whatever) if you want the extra defence against typographic
errors. The potential for mistyping attribute names is hardly a problem
that is unique to threading.Thread.
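
The mistyping hazard is easy to reproduce. In the sketch below the
misspelled attribute "deamon" is deliberate, and set_existing is a
hypothetical helper, not something the threading module provides.

```python
import threading

t = threading.Thread(target=lambda: None)
before = t.daemon
t.deamon = not before        # typo: silently creates an unrelated attribute
after = t.daemon             # the real flag is unchanged

def set_existing(obj, name, value):
    """Hypothetical helper: refuse to create attributes that do not exist yet."""
    if not hasattr(obj, name):
        raise AttributeError("%s has no attribute %r" % (type(obj).__name__, name))
    setattr(obj, name, value)

t2 = threading.Thread(target=lambda: None)
set_existing(t2, "daemon", True)      # fine, the attribute exists
try:
    set_existing(t2, "deamon", True)  # the typo is caught here
    typo_caught = False
except AttributeError:
    typo_caught = True
```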

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
            http://www.boredomandlaziness.org

From ncoghlan at gmail.com  Thu Sep  4 23:30:34 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 05 Sep 2008 07:30:34 +1000
Subject: [Python-3000] PEP 3108 and the demise of bsddb3
In-Reply-To: <48C0068E.1060606@jcea.es>
References: <1220484055.04.0.310220074498.issue3769@psf.upfronthosting.co.za>	<1afaf6160809031622j3b63a08h912686a839bd0ed@mail.gmail.com>	<18623.18700.76260.893902@montanaro-dyndns-org.local>	<1055D9D2507D4F64A6D30656C8A835D1@RaymondLaptop1>	<52dc1c820809032208p6e0d5e31x295bebc79beaa86a@mail.gmail.com>
	<48C0068E.1060606@jcea.es>
Message-ID: <48C0537A.9050404@gmail.com>

Jesus Cea wrote:
> This is true. But python uses openssl, for example, and it must be
> updated from time to time, for example. The only difference is that the
> bugs are not discovered by python.
> 
> In fact, I can say that Berkeley DB 4.7 snapshot releases crashed a lot
> with bsddb testsuite. Berkeley DB 4.7.25 is rock solid, in part, because
> of pybsddb and the feedback between me and Oracle people.

I think that comparison actually cuts to the heart of the issue - the
problem isn't the stability of pybsddb itself, it's the stability of the
underlying bsddb libraries. We don't typically have anywhere near the
same level of problems with other wrapped interfaces (tk, sqlite3,
openssl come to mind).

Making anydbm/whichdb more extensible to allow any DB-API compliant
interfaces to add themselves in 2.7/3.1 in a supported fashion would
definitely be a good change though. The ActiveState and Enthought folks
may also give some serious thought to continuing to bundle pybsddb even
with their Python 3.0 releases (especially for Windows).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
            http://www.boredomandlaziness.org

From guido at python.org  Fri Sep  5 00:12:48 2008
From: guido at python.org (Guido van Rossum)
Date: Thu, 4 Sep 2008 15:12:48 -0700
Subject: [Python-3000] [Python-3000-checkins] r66218 -
	python/branches/py3k/RELNOTES
In-Reply-To: <B4D6452A-5438-4A00-9099-65BE9C1C2E63@python.org>
References: <20080904134435.6093A1E400B@bag.python.org>
	<ca471dc20809041251k113d44dcj1032ce1ede440111@mail.gmail.com>
	<B4D6452A-5438-4A00-9099-65BE9C1C2E63@python.org>
Message-ID: <ca471dc20809041512g2240c282p786ce63442f27cf7@mail.gmail.com>

On Thu, Sep 4, 2008 at 1:05 PM, Barry Warsaw <barry at python.org> wrote:
>
> On Sep 4, 2008, at 3:51 PM, Guido van Rossum wrote:
>
>> I'm a little confused -- why did you remove the release notes for
>> previous betas but leave those for the alphas in place? ISTM that the
>> file was an accumulation of release notes throughout the various
>> releases -- just like Misc/NEWS, but with a different focus. This is
>> how release notes in other products I've seen typically work, too.
>
> Mostly because I wasn't sure which of those release notes are still
> relevant.  I asked (albeit in a different thread) for some assistance in
> determining what the current state of those are.
>
> I'm not so sure it's helpful to separately indicate release notes for alphas
> and betas, and it's definitely not helpful to include items that are no
> longer relevant.  My thought was to list only those that apply to the final
> release, and then track them with public stable releases such as 3.0.1,
> 3.0.2, etc.

Well, all the alpha notes should have been fixed by now too -- they
all describe temporary deviations from our high standard for releases.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From brett at python.org  Fri Sep  5 06:13:30 2008
From: brett at python.org (Brett Cannon)
Date: Thu, 4 Sep 2008 21:13:30 -0700
Subject: [Python-3000] [Python-Dev] Not releasing rc1 tonight
In-Reply-To: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>
Message-ID: <bbaeab100809042113x6d2961a5k12226625693fc618@mail.gmail.com>

On Wed, Sep 3, 2008 at 8:41 PM, Barry Warsaw <barry at python.org> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> I'm not going to release rc1 tonight.  There are too many open release
> blockers that I don't want to defer, and I'd like the buildbots to churn
> through the bsddb removal on all platforms.  Let me first thank Benjamin,
> Brett, Mark and Antoine for their help on IRC tonight.
>
> Here are the issues I'm not comfortable with deferring:
>
>  3640 test_cpickle crash on AMD64 Windows build
> 874900 threading module can deadlock after fork
>  3574 compile() cannot decode Latin-1 source encodings
>  3657 pickle can pickle the wrong function
>  3187 os.listdir can return byte strings
>  3660 reference leaks in 3.0
>  3594 PyTokenizer_FindEncoding() never succeeds
>  3629 Py30b3 won't compile a regex that compiles with 2.5.2 and 30b2
>

And because I can't stop causing trouble, I just uploaded a patch for
issue3781 which solidifies warnings.catch_warnings() and its API a
little bit more. Really simple patch.

-Brett

From greg at krypto.org  Fri Sep  5 09:17:13 2008
From: greg at krypto.org (Gregory P. Smith)
Date: Fri, 5 Sep 2008 00:17:13 -0700
Subject: [Python-3000] Fwd: Beta 3 planned for this Wednesday (OT: Beta
	3 planned for this Wednesday)
In-Reply-To: <8548c5f30809032242w57c8d6f2j612a791a84b5f53c@mail.gmail.com>
References: <8548c5f30808200020i3a48c512hb9bf5fbbc149dfbf@mail.gmail.com>
	<52dc1c820809032217h2094f4cds3792da66f6489d91@mail.gmail.com>
	<8548c5f30809032242w57c8d6f2j612a791a84b5f53c@mail.gmail.com>
Message-ID: <52dc1c820809050017p3b1f487cl5601e27ae51d47f1@mail.gmail.com>

Anyone have an opinion on http://bugs.python.org/issue3492 in regards
to it being a release blocker?

The gist of it:  zlib returns bytearrays where other modules return
bytes.  zipimport, because it uses zlib, required bytearrays instead
of bytes as input.  A few other modules also appear to return
bytearrays when they're likely better off returning bytes for
consistency.

IMHO, it seems like bytearrays should rarely be returned by the
existing standard library apis.  Since they are mutable they are
ideally suited for new APIs where they're passed in and modified.

What's the big deal if this is not fixed before release?  Users are
likely to get frustrated at inputs not being hashable without explicit
(data copy) conversion to an immutable type.  And any code that gets
written depending on these returning bytearrays instead of bytes would
need fixing if we waited until 3.1 to fix it.

-gps
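
To make the hashability complaint above concrete, here is a minimal sketch.
It is not taken from issue3492; the names payload, can_hash and cache are
made up for the illustration. A bytearray cannot serve as a dict key or set
member, so a caller has to copy the data into an immutable bytes object
first:

```python
payload = bytearray(b"compressed-data")  # stand-in for a mutable return value


def can_hash(obj):
    """Return True if obj is usable as a dict key or set member."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


mutable_ok = can_hash(payload)    # bytearray is unhashable
frozen = bytes(payload)           # explicit conversion copies the data
immutable_ok = can_hash(frozen)   # bytes is hashable
cache = {frozen: "seen"}          # so it works as a dict key
```

The extra bytes() call is the "explicit (data copy) conversion" the message
refers to; an API that returns bytes in the first place avoids it.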

On Wed, Sep 3, 2008 at 10:42 PM, Anand Balachandran Pillai
<abpillai at gmail.com> wrote:
> On Thu, Sep 4, 2008 at 10:47 AM, Gregory P. Smith <greg at krypto.org> wrote:
>> I agree that this should go in.  zlib should return bytes.  other read
>> functions and similar modules like bz2module already return bytes.
>> unless i hear objections, i'll commit this in about 12 hours.
>
> +1  :)
>
>>
>
> Regards
>
> --
> -Anand
>

From greg at krypto.org  Fri Sep  5 09:30:04 2008
From: greg at krypto.org (Gregory P. Smith)
Date: Fri, 5 Sep 2008 00:30:04 -0700
Subject: [Python-3000] About "daemon" in threading module
In-Reply-To: <g9ovfa$7ch$1@ger.gmane.org>
References: <48BFF8D9.3030002@jcea.es> <g9ovfa$7ch$1@ger.gmane.org>
Message-ID: <52dc1c820809050030r48ae629ejc05a968a6e811c1e@mail.gmail.com>

On Thu, Sep 4, 2008 at 8:39 AM, Christian Heimes <lists at cheimes.de> wrote:
> Jesus Cea wrote:
>>
>> I would rather revert to the method style, or redo the class to avoid
>> new attribute creation, maybe via some "thread.__setattr__()" magic.
>
> Or maybe with __slots__ in the Thread class. It'd also save some memory,
> and subclasses of Thread would still work as expected.

Agreed.  This is what __slots__ is for.
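
A rough sketch of the difference being discussed, assuming a current
Python 3. SlottedWorker is a hypothetical stand-in class, not the real
threading.Thread (which keeps a per-instance __dict__ and therefore accepts
arbitrary attributes):

```python
import threading


class SlottedWorker:
    """Hypothetical stand-in: only the names in __slots__ may be assigned."""
    __slots__ = ("daemon",)

    def __init__(self):
        self.daemon = False


def typo_is_caught(obj):
    """Assign to the misspelled name 'deamon' and report whether it failed."""
    try:
        obj.deamon = True
    except AttributeError:
        return True
    return False


plain = threading.Thread(target=lambda: None)
plain_caught = typo_is_caught(plain)               # typo silently creates an attribute
slotted_caught = typo_is_caught(SlottedWorker())   # typo raises AttributeError
```

With the plain Thread the misspelled assignment succeeds and plain.daemon
stays False, which is exactly the silent failure Jesus described. Note that
a subclass which does not declare its own __slots__ gets a __dict__ again,
so the protection would not extend to such subclasses.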

From jcea at jcea.es  Fri Sep  5 18:08:28 2008
From: jcea at jcea.es (Jesus Cea)
Date: Fri, 05 Sep 2008 18:08:28 +0200
Subject: [Python-3000] Fwd: Beta 3 planned for this Wednesday (OT: Beta
 3 planned for this Wednesday)
In-Reply-To: <52dc1c820809050017p3b1f487cl5601e27ae51d47f1@mail.gmail.com>
References: <8548c5f30808200020i3a48c512hb9bf5fbbc149dfbf@mail.gmail.com>	<52dc1c820809032217h2094f4cds3792da66f6489d91@mail.gmail.com>	<8548c5f30809032242w57c8d6f2j612a791a84b5f53c@mail.gmail.com>
	<52dc1c820809050017p3b1f487cl5601e27ae51d47f1@mail.gmail.com>
Message-ID: <48C1597C.40507@jcea.es>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Gregory P. Smith wrote:
> Anyone have an opinion on http://bugs.python.org/issue3492 in regards
> to it being a release blocker?
> 
> The gist of it:  zlib returns bytearrays where other modules return
> bytes.  zipimport, because it uses zlib, required bytearrays instead
> of bytes as input.  A few other modules also appear to return
> bytearrays when they're likely better off returning bytes for
> consistency.

I strongly agree that zlib *SHOULD* return bytes. If zipimport requires a
bytearray (why?), it can do the conversion itself.

> What's the big deal if this is not fixed before release?  Users are
> likely to get frustrated at inputs not being hashable without explicit
> (data copy) conversion to an immutable type.  And any code that gets
> written depending on these returning bytearrays instead of bytes would
> need fixing if we waited until 3.1 to fix it.

+1. Release Blocker.

- --
Jesus Cea Avion                         _/_/      _/_/_/        _/_/_/
jcea at jcea.es - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
jabber / xmpp:jcea at jabber.org         _/_/    _/_/          _/_/_/_/_/
.                              _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iQCVAwUBSMFZeZlgi5GaxT1NAQKPhgQAgeEIZyLU6jhMYq3ALKv5/Ashpa6tGWQV
8SligFMYOyAY7THD8pxMZ3yWtghRtzIRvaDkJCRhNfJp4sGHO4gLj/FCbxe5cuLv
41pYndNQ+VXZMCVkJ5OsdVvww59vvOHKvqwSOWd6BL3JUWjKdWIe/yyAKM2+tKL9
owJPMlIE+l0=
=j0iu
-----END PGP SIGNATURE-----

From jcea at jcea.es  Fri Sep  5 18:10:38 2008
From: jcea at jcea.es (Jesus Cea)
Date: Fri, 05 Sep 2008 18:10:38 +0200
Subject: [Python-3000] About "daemon" in threading module
In-Reply-To: <48C05131.60209@gmail.com>
References: <48BFF8D9.3030002@jcea.es>	<loom.20080904T154530-598@post.gmane.org>	<ca471dc20809041256m5fec338fy4f5081a47260c4c3@mail.gmail.com>
	<48C05131.60209@gmail.com>
Message-ID: <48C159FE.7070400@jcea.es>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Nick Coghlan wrote:
> Guido van Rossum wrote:
>> On Thu, Sep 4, 2008 at 8:47 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>>> Jesus Cea <jcea <at> jcea.es> writes:
>>>> First we had "thread.setDaemon()". This was not PEP8, so Python 3.0
>>>> renamed it to "thread.set_daemon()". Lately Python 3.0 changes the
>>>> method to an attribute "thread.daemon".
>>>>
>>>> I think the last change is risky, because you can mistype and create a
>>>> new attribute, instead of set daemon mode. Since daemon mode is only
>>>> usually visible when things goes wrong (the main thread dies), you can
>>>> miss the bug for a long time.
>>> I've never understood why the "daemon" flag couldn't be passed as one of the
>>> constructor arguments. It would make code shorter, and avoid the mistyping risk
>>> mentioned by Jesus. It also sounds saner, since you shouldn't change the flag
>>> after the thread is started anyway.
>> As to the why question, this was done to match the Java Thread class.
>> I don't want to speculate why the Java API was designed this way --
>> possibly it was a relic of an earlier API version in Java, but
>> possibly there's a reason I can't fathom right now. After all, there
>> are excellent reasons why start() is a separate call...
> 
> Hmm, having (daemon=False) as a parameter on start() would probably be
> an even better API than having it on __init__() (modulo subclassing
> compatibility concerns).

Agreed. Could it be done for 3.0?

- --
Jesus Cea Avion                         _/_/      _/_/_/        _/_/_/
jcea at jcea.es - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
jabber / xmpp:jcea at jabber.org         _/_/    _/_/          _/_/_/_/_/
.                              _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iQCVAwUBSMFZ/plgi5GaxT1NAQKvoQQApXIgymNMmPtL3ZX/EsllxnnW47oSgzB7
OaOzQXaFsyCo00ErUFm0hluIIHLT6Wqa4nlY1ixx6ThgytNOqHQIRgN/w6oS4kGP
WXO3pztXKaiD3gJfxjUOU7FRdOrlXjqwGryq/OPwKtxKFzyloTdTwUAhKCgpwFt3
9QLSioRgLPo=
=Pztd
-----END PGP SIGNATURE-----

From jnoller at gmail.com  Fri Sep  5 19:05:55 2008
From: jnoller at gmail.com (Jesse Noller)
Date: Fri, 5 Sep 2008 13:05:55 -0400
Subject: [Python-3000] About "daemon" in threading module
In-Reply-To: <48C159FE.7070400@jcea.es>
References: <48BFF8D9.3030002@jcea.es>
	<loom.20080904T154530-598@post.gmane.org>
	<ca471dc20809041256m5fec338fy4f5081a47260c4c3@mail.gmail.com>
	<48C05131.60209@gmail.com> <48C159FE.7070400@jcea.es>
Message-ID: <4222a8490809051005j6a0e32c5q78f36fb6f148310@mail.gmail.com>

On Fri, Sep 5, 2008 at 12:10 PM, Jesus Cea <jcea at jcea.es> wrote:
> Nick Coghlan wrote:
>> Guido van Rossum wrote:
>>> On Thu, Sep 4, 2008 at 8:47 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>>>> Jesus Cea <jcea <at> jcea.es> writes:
>>>>> First we had "thread.setDaemon()". This was not PEP8, so Python 3.0
>>>>> renamed it to "thread.set_daemon()". Lately Python 3.0 changes the
>>>>> method to an attribute "thread.daemon".
>>>>>
>>>>> I think the last change is risky, because you can mistype and create a
>>>>> new attribute, instead of set daemon mode. Since daemon mode is only
>>>>> usually visible when things goes wrong (the main thread dies), you can
>>>>> miss the bug for a long time.
>>>> I've never understood why the "daemon" flag couldn't be passed as one of the
>>>> constructor arguments. It would make code shorter, and avoid the mistyping risk
>>>> mentioned by Jesus. It also sounds saner, since you shouldn't change the flag
>>>> after the thread is started anyway.
>>> As to the why question, this was done to match the Java Thread class.
>>> I don't want to speculate why the Java API was designed this way --
>>> possibly it was a relic of an earlier API version in Java, but
>>> possibly there's a reason I can't fathom right now. After all, there
>>> are excellent reasons why start() is a separate call...
>>
>> Hmm, having (daemon=False) as a parameter on start() would probably be
>> an even better API than having it on __init__() (modulo subclassing
>> compatibility concerns).
>
> Agreed. Could it be done for 3.0?

Personally, I'm staunchly against changing __init__ for the
threading.Thread and multiprocessing.Process classes: it would break,
or at least make more confusing, the subclassing that people commonly do.

I do like the idea of using __slots__, or reverting to a setter
method entirely.

-jesse

From jnoller at gmail.com  Fri Sep  5 19:06:27 2008
From: jnoller at gmail.com (Jesse Noller)
Date: Fri, 5 Sep 2008 13:06:27 -0400
Subject: [Python-3000] About "daemon" in threading module
In-Reply-To: <48C159FE.7070400@jcea.es>
References: <48BFF8D9.3030002@jcea.es>
	<loom.20080904T154530-598@post.gmane.org>
	<ca471dc20809041256m5fec338fy4f5081a47260c4c3@mail.gmail.com>
	<48C05131.60209@gmail.com> <48C159FE.7070400@jcea.es>
Message-ID: <4222a8490809051006h78fe49fcm1385a105688212de@mail.gmail.com>

On Fri, Sep 5, 2008 at 12:10 PM, Jesus Cea <jcea at jcea.es> wrote:
> Nick Coghlan wrote:
>> Guido van Rossum wrote:
>>> On Thu, Sep 4, 2008 at 8:47 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>>>> Jesus Cea <jcea <at> jcea.es> writes:
>>>>> First we had "thread.setDaemon()". This was not PEP8, so Python 3.0
>>>>> renamed it to "thread.set_daemon()". Lately Python 3.0 changes the
>>>>> method to an attribute "thread.daemon".
>>>>>
>>>>> I think the last change is risky, because you can mistype and create a
>>>>> new attribute, instead of set daemon mode. Since daemon mode is only
>>>>> usually visible when things goes wrong (the main thread dies), you can
>>>>> miss the bug for a long time.
>>>> I've never understood why the "daemon" flag couldn't be passed as one of the
>>>> constructor arguments. It would make code shorter, and avoid the mistyping risk
>>>> mentioned by Jesus. It also sounds saner, since you shouldn't change the flag
>>>> after the thread is started anyway.
>>> As to the why question, this was done to match the Java Thread class.
>>> I don't want to speculate why the Java API was designed this way --
>>> possibly it was a relic of an earlier API version in Java, but
>>> possibly there's a reason I can't fathom right now. After all, there
>>> are excellent reasons why start() is a separate call...
>>
>> Hmm, having (daemon=False) as a parameter on start() would probably be
>> an even better API than having it on __init__() (modulo subclassing
>> compatibility concerns).
>
> Agreed. Could it be done for 3.0?

Also, FWIW, I thought we were no longer doing API changes?

From guido at python.org  Fri Sep  5 19:37:03 2008
From: guido at python.org (Guido van Rossum)
Date: Fri, 5 Sep 2008 10:37:03 -0700
Subject: [Python-3000] Fwd: Beta 3 planned for this Wednesday (OT: Beta
	3 planned for this Wednesday)
In-Reply-To: <52dc1c820809050017p3b1f487cl5601e27ae51d47f1@mail.gmail.com>
References: <8548c5f30808200020i3a48c512hb9bf5fbbc149dfbf@mail.gmail.com>
	<52dc1c820809032217h2094f4cds3792da66f6489d91@mail.gmail.com>
	<8548c5f30809032242w57c8d6f2j612a791a84b5f53c@mail.gmail.com>
	<52dc1c820809050017p3b1f487cl5601e27ae51d47f1@mail.gmail.com>
Message-ID: <ca471dc20809051037t69dd4b86pb078a633d34de8cd@mail.gmail.com>

This needs to be fixed. It is surely a relic from the alpha1 situation
where the bytes type was mutable. No read APIs should return mutable
bytes. Write APIs should accept mutable and immutable bytes though.
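
The rule stated above can be checked against zlib with a small sketch,
assuming a current Python 3 in which the fix discussed in this thread has
long since landed (the variable names are illustrative only):

```python
import zlib

raw = bytearray(b"spam " * 100)          # mutable input buffer
packed = zlib.compress(raw)              # write-style API accepts a bytearray
also_packed = zlib.compress(bytes(raw))  # ... and immutable bytes alike
unpacked = zlib.decompress(packed)       # read-style API returns bytes
```

Both compress() and decompress() hand back immutable bytes here, whichever
kind of buffer was passed in.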

On Fri, Sep 5, 2008 at 12:17 AM, Gregory P. Smith <greg at krypto.org> wrote:
> Anyone have an opinion on http://bugs.python.org/issue3492 in regards
> to it being a release blocker?
>
> The gist of it:  zlib returns bytearrays where other modules return
> bytes.  zipimport, because it uses zlib, required bytearrays instead
> of bytes as input.  A few other modules also appear to return
> bytearrays when they're likely better off returning bytes for
> consistency.
>
> IMHO, it seems like bytearrays should rarely be returned by the
> existing standard library apis.  Since they are mutable they are
> ideally suited for new APIs where they're passed in and modified.
>
> What's the big deal if this is not fixed before release?  Users are
> likely to get frustrated at inputs not being hashable without explicit
> (data copy) conversion to an immutable type.  And any code that gets
> written depending on these returning bytearrays instead of bytes would
> need fixing if we waited until 3.1 to fix it.
>
> -gps
>
> On Wed, Sep 3, 2008 at 10:42 PM, Anand Balachandran Pillai
> <abpillai at gmail.com> wrote:
>> On Thu, Sep 4, 2008 at 10:47 AM, Gregory P. Smith <greg at krypto.org> wrote:
>>> I agree that this should go in.  zlib should return bytes.  other read
>>> functions and similar modules like bz2module already return bytes.
>>> unless i hear objections, i'll commit this in about 12 hours.
>>
>> +1  :)
>>
>>>
>>
>> Regards
>>
>> --
>> -Anand
>>
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
>

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jeremy.kloth at gmail.com  Sat Sep  6 03:54:42 2008
From: jeremy.kloth at gmail.com (Jeremy Kloth)
Date: Fri, 5 Sep 2008 19:54:42 -0600
Subject: [Python-3000] PyUnicodeObject implementation
Message-ID: <200809051954.42787.jeremy.kloth@gmail.com>

I don't know if this is too late to do before the final release, but shouldn't 
the implementation of PyUnicodeObject be updated to match the much more 
efficient old PyStringObject layout?  I mean eliminating the double malloc 
that is currently required for each unicode string.

PyStringObject is declared as a PyVarObject allocated in one chunk, whereas 
the current PyUnicodeObject is a PyObject allocated in two chunks, one for 
the object and one for the Py_UNICODE data.

I think that this change would go a long way towards making unicode strings 
comparable to old (2.x) string speeds. I can see that if not changed now, 
there would be 3rd party extensions that would be relying on the particular 
layout of PyUnicodeObject and therefore making changing it later too risky.

If there is interest in this change, I would happily write a patch that makes 
this change.

Thanks,
Jeremy

-- 
Jeremy Kloth
http://4suite.org/

From guido at python.org  Sat Sep  6 04:25:41 2008
From: guido at python.org (Guido van Rossum)
Date: Fri, 5 Sep 2008 19:25:41 -0700
Subject: [Python-3000] PyUnicodeObject implementation
In-Reply-To: <200809051954.42787.jeremy.kloth@gmail.com>
References: <200809051954.42787.jeremy.kloth@gmail.com>
Message-ID: <ca471dc20809051925h39b3f98er2884d39ae3892f66@mail.gmail.com>

This is an excellent idea that fell by the wayside. Since it is a
major coding project there's no way it can be done for 3.0 -- the risk
of introducing new instabilities or leaks is just too high. We *might*
be able to get it in for 3.0.1, if the code is reviewed really well.
Though it might be safer to aim for 3.1.

On Fri, Sep 5, 2008 at 6:54 PM, Jeremy Kloth <jeremy.kloth at gmail.com> wrote:
> I don't know if this is too late to do before the final release, but shouldn't
> the implementation of PyUnicodeObject be updated to match the much more
> efficient old PyStringObject layout?  I mean eliminating the double malloc
> that is currently required for each unicode string.
>
> PyStringObject is declared as a PyVarObject allocated in one chunk, whereas
> the current PyUnicodeObject is a PyObject allocated in two chunks, one for
> the object and one for the Py_UNICODE data.
>
> I think that this change would go a long way towards making unicode strings
> comparable to old (2.x) string speeds. I can see that if not changed now,
> there would be 3rd party extensions that would be relying on the particular
> layout of PyUnicodeObject and therefore making changing it later too risky.
>
> If there is interest in this change, I would happily write a patch that makes
> this change.
>
> Thanks,
> Jeremy
>
> --
> Jeremy Kloth
> http://4suite.org/

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From ncoghlan at gmail.com  Sat Sep  6 04:38:26 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 06 Sep 2008 12:38:26 +1000
Subject: [Python-3000] About "daemon" in threading module
In-Reply-To: <4222a8490809051006h78fe49fcm1385a105688212de@mail.gmail.com>
References: <48BFF8D9.3030002@jcea.es>	
	<loom.20080904T154530-598@post.gmane.org>	
	<ca471dc20809041256m5fec338fy4f5081a47260c4c3@mail.gmail.com>	
	<48C05131.60209@gmail.com> <48C159FE.7070400@jcea.es>
	<4222a8490809051006h78fe49fcm1385a105688212de@mail.gmail.com>
Message-ID: <48C1ED22.5040002@gmail.com>

Jesse Noller wrote:
> On Fri, Sep 5, 2008 at 12:10 PM, Jesus Cea <jcea at jcea.es> wrote:
>> Nick Coghlan wrote:
>>> Hmm, having (daemon=False) as a parameter on start() would probably be
>>> an even better API than having it on __init__() (modulo subclassing
>>> compatibility concerns).
>> Agreed. Could it be done for 3.0?
> 
> Also, FWIW, I thought we were no longer doing API changes?

We aren't - if we'd thought of it a month ago, we could have included
it, but now 2.7/3.1 is the earliest for that change.

As far as the 'typo protection' goes... I'm still not convinced that the
delayed action of the set daemon effect means that the Thread object
needs special protection.

If an application fails to set the attribute properly, then its test
suite will hang on shutdown (as the threading module attempts to do
.join() on a thread that hasn't been told to stop).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
            http://www.boredomandlaziness.org

From jnoller at gmail.com  Sat Sep  6 04:44:18 2008
From: jnoller at gmail.com (Jesse Noller)
Date: Fri, 5 Sep 2008 22:44:18 -0400
Subject: [Python-3000] About "daemon" in threading module
In-Reply-To: <48C1ED22.5040002@gmail.com>
References: <48BFF8D9.3030002@jcea.es>
	<loom.20080904T154530-598@post.gmane.org>
	<ca471dc20809041256m5fec338fy4f5081a47260c4c3@mail.gmail.com>
	<48C05131.60209@gmail.com> <48C159FE.7070400@jcea.es>
	<4222a8490809051006h78fe49fcm1385a105688212de@mail.gmail.com>
	<48C1ED22.5040002@gmail.com>
Message-ID: <4222a8490809051944x20be939ap196ab565291629d4@mail.gmail.com>

On Fri, Sep 5, 2008 at 10:38 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Jesse Noller wrote:
>> On Fri, Sep 5, 2008 at 12:10 PM, Jesus Cea <jcea at jcea.es> wrote:
>>> Nick Coghlan wrote:
>>>> Hmm, having (daemon=False) as a parameter on start() would probably be
>>>> an even better API than having it on __init__() (modulo subclassing
>>>> compatibility concerns).
>>> Agreed. Could it be done for 3.0?
>>
>> Also, FWIW, I thought we were no longer doing API changes?
>
> We aren't - if we'd thought of it a month ago, we could have included
> it, but now 2.7/3.1 is the earliest for that change.
>
> As far as the 'typo protection' goes... I'm still not convinced that the
> delayed action of the set daemon effect means that the Thread object
> needs special protection.
>
> If an application fails to set the attribute properly, then its test
> suite will hang on shutdown (as the threading module attempts to do
> .join() on a thread that hasn't been told to stop).

I happen to really like the property approach. It makes sense to
write thread.daemon = True; it's also clean and feels natural now that
it's there. And you're right - typos in this will bite people fairly
quickly, but to Jesus' point - those people may go chasing something
else before noticing they typed deamon instead of daemon.

-jesse
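
For reference, the constructor-argument idea raised earlier in this thread
did eventually land: Python 3.3 and later accept daemon as a keyword-only
argument to threading.Thread. The sketch below assumes such a version; the
work function is a placeholder. A misspelled keyword is rejected
immediately, unlike a misspelled attribute assignment:

```python
import threading


def work():
    """Placeholder thread body."""


# Correct spelling: the flag is set before the thread is ever started.
t = threading.Thread(target=work, daemon=True)

# Misspelled keyword: Thread.__init__ rejects it with a TypeError.
try:
    threading.Thread(target=work, deamon=True)
except TypeError as exc:
    typo_error = exc
else:
    typo_error = None
```

The attribute form (t.daemon = True) still works as well, so this addresses
the typo risk only for code that passes the flag at construction time.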

From solipsis at pitrou.net  Sat Sep  6 12:40:35 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 6 Sep 2008 10:40:35 +0000 (UTC)
Subject: [Python-3000] PyUnicodeObject implementation
References: <200809051954.42787.jeremy.kloth@gmail.com>
Message-ID: <loom.20080906T103737-758@post.gmane.org>

Jeremy Kloth <jeremy.kloth <at> gmail.com> writes:
> 
> I don't know if this is too late to do before the final release, but shouldn't 
> the implementation of PyUnicodeObject be updated to match the much more 
> efficient old PyStringObject layout?  I mean eliminating the double malloc 
> that is currently required for each unicode string.

I have already written such a patch some months ago, you can find it here:
http://bugs.python.org/issue1943

You will perhaps need to adapt the patch a bit in order for it to work properly
with the current py3k branch.

Also note that Marc-André Lemburg (one of the authors of the unicode
implementation) is opposed to that change. See the discussion in the bug tracker
issue for the details.

Regards

Antoine.

From barry at python.org  Sat Sep  6 18:24:01 2008
From: barry at python.org (Barry Warsaw)
Date: Sat, 6 Sep 2008 12:24:01 -0400
Subject: [Python-3000] [Python-3000-checkins] r66218 -
	python/branches/py3k/RELNOTES
In-Reply-To: <ca471dc20809041512g2240c282p786ce63442f27cf7@mail.gmail.com>
References: <20080904134435.6093A1E400B@bag.python.org>
	<ca471dc20809041251k113d44dcj1032ce1ede440111@mail.gmail.com>
	<B4D6452A-5438-4A00-9099-65BE9C1C2E63@python.org>
	<ca471dc20809041512g2240c282p786ce63442f27cf7@mail.gmail.com>
Message-ID: <0286F806-C214-4C6A-BF0B-DEF10A1961D9@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sep 4, 2008, at 6:12 PM, Guido van Rossum wrote:

> On Thu, Sep 4, 2008 at 1:05 PM, Barry Warsaw <barry at python.org> wrote:
>>
>> On Sep 4, 2008, at 3:51 PM, Guido van Rossum wrote:
>>
>>> I'm a little confused -- why did you remove the release notes for
>>> previous betas but leave those for the alphas in place? ISTM that  
>>> the
>>> file was an accumulation of release notes throughout the various
>>> releases -- just like Misc/NEWS, but with a different focus. This is
>>> how release notes in other products I've seen typically work, too.
>>
>> Mostly because I wasn't sure which of those release notes are still
>> relevant.  I asked (albeit in a different thread) for some  
>> assistance in
>> determining what the current state of those are.
>>
>> I'm not so sure it's helpful to separately indicate release notes  
>> for alphas
>> and betas, and it's definitely not helpful to include items that  
>> are no
>> longer relevant.  My thought was to list only those that apply to  
>> the final
>> release, and then track them with public stable releases such as  
>> 3.0.1,
>> 3.0.2, etc.
>
> Well, all the alpha notes should have been fixed by now too -- they
> all describe temporary deviations from our high standard for releases.

Okay, I'm going to blow away the old alpha issues and just leave known  
big issues.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSMKuoXEjvBPtnXfVAQKIhQP/c7JOmiEjLGk8Oa3tUbmDIl3ka22ttmxg
u+zQJSZJCgCLjVMrU2CUEjmh5QYqHItctSNdIxnQzwXvTPWdETV6D+7Q2Y+Mx5Qz
7kbzeNXXXWGRMaJyacwwfrtoqn5tA517btCJPjCHvwXl/R79suBT0CtTlvM399NG
iwUCmZEtOfo=
=1MLS
-----END PGP SIGNATURE-----

From greg at krypto.org  Sat Sep  6 22:53:53 2008
From: greg at krypto.org (Gregory P. Smith)
Date: Sat, 6 Sep 2008 13:53:53 -0700
Subject: [Python-3000] Fwd: Beta 3 planned for this Wednesday (OT: Beta
	3 planned for this Wednesday)
In-Reply-To: <ca471dc20809051037t69dd4b86pb078a633d34de8cd@mail.gmail.com>
References: <8548c5f30808200020i3a48c512hb9bf5fbbc149dfbf@mail.gmail.com>
	<52dc1c820809032217h2094f4cds3792da66f6489d91@mail.gmail.com>
	<8548c5f30809032242w57c8d6f2j612a791a84b5f53c@mail.gmail.com>
	<52dc1c820809050017p3b1f487cl5601e27ae51d47f1@mail.gmail.com>
	<ca471dc20809051037t69dd4b86pb078a633d34de8cd@mail.gmail.com>
Message-ID: <52dc1c820809061353w6bd21232jf08ccad6b7bde19f@mail.gmail.com>

issue 3797 created with trivial patches for the remaining bytearray
returning abusers.  review needed.

I don't have a build environment for Windows to test the PC/winreg one
on, but it's too simple to be wrong.

On Fri, Sep 5, 2008 at 10:37 AM, Guido van Rossum <guido at python.org> wrote:
> This needs to be fixed. It is surely a relic from the alpha1 situation
> where the bytes type was mutable. No read APIs should return mutable
> bytes. Write APIs should accept mutable and immutable bytes though.
>
> On Fri, Sep 5, 2008 at 12:17 AM, Gregory P. Smith <greg at krypto.org> wrote:
>> Anyone have an opinion on http://bugs.python.org/issue3492 in regards
>> to it being a release blocker?
>>
>> The gist of it:  zlib returns bytearrays where other modules return
>> bytes.  zipimport, because it uses zlib, required bytearrays instead
>> of bytes as input.  A few other modules also appear to return
>> bytearrays when they're likely better off returning bytes for
>> consistency.
>>
>> IMHO, it seems like bytearrays should rarely be returned by the
>> existing standard library apis.  Since they are mutable they are
>> ideally suited for new APIs where they're passed in and modified.
>>
>> What's the big deal if this is not fixed before release?  Users are
>> likely to get frustrated at inputs not being hashable without explicit
>> (data copy) conversion to an immutable type.  And any code that gets
>> written depending on these returning bytearrays instead of bytes would
>> need fixing if we waited until 3.1 to fix it.
>>
>> -gps
>>
>> On Wed, Sep 3, 2008 at 10:42 PM, Anand Balachandran Pillai
>> <abpillai at gmail.com> wrote:
>>> On Thu, Sep 4, 2008 at 10:47 AM, Gregory P. Smith <greg at krypto.org> wrote:
>>>> I agree that this should go in.  zlib should return bytes.  other read
>>>> functions and similar modules like bz2module already return bytes.
>>>> unless i hear objections, i'll commit this in about 12 hours.
>>>
>>> +1  :)
>>>
>>>>
>>>
>>> Regards
>>>
>>> --
>>> -Anand
>>>
>
>
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>

From skip at pobox.com  Sat Sep  6 23:06:49 2008
From: skip at pobox.com (skip at pobox.com)
Date: Sat, 6 Sep 2008 16:06:49 -0500
Subject: [Python-3000] Should package __init__ files include
	pkgutil.extend_path?
Message-ID: <18626.61673.143430.847735@montanaro-dyndns-org.local>

I'm trying to figure out how to install this dbm.sqlite module I have
without overwriting the basic install.  My thought was to create a dbm
package in site-packages then copy sqlite.py there.  That doesn't work
though.  Modifying dbm.__init__.py to include this does:

    import pkgutil
    __path__ = pkgutil.extend_path(__path__, __name__)

I'm wondering if all the core packages in 3.x should include the above in
their __init__.py files.

Skip
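
A self-contained sketch of the mechanism described above, assuming a
current Python 3. The package name demo_split_pkg, the addon module and the
two temporary directories are invented for the demonstration; they stand in
for dbm, sqlite.py, the standard library directory and site-packages:

```python
import importlib
import os
import sys
import tempfile

EXTEND = "import pkgutil\n__path__ = pkgutil.extend_path(__path__, __name__)\n"


def make_split_package(pkg="demo_split_pkg"):
    """Create one package spread over two sys.path entries and import from it."""
    core = tempfile.mkdtemp()    # plays the role of the stdlib directory
    extra = tempfile.mkdtemp()   # plays the role of site-packages
    for root in (core, extra):
        os.makedirs(os.path.join(root, pkg))
        # Each portion gets an __init__.py; only the first one found is run.
        with open(os.path.join(root, pkg, "__init__.py"), "w") as f:
            f.write(EXTEND)
    # The add-on module lives only in the second portion.
    with open(os.path.join(extra, pkg, "addon.py"), "w") as f:
        f.write("VALUE = 42\n")
    sys.path[:0] = [core, extra]
    importlib.invalidate_caches()
    return importlib.import_module(pkg + ".addon")


addon = make_split_package()
package_path = sys.modules["demo_split_pkg"].__path__
```

After the import, package_path lists both directories, which is what lets
the submodule in the second location be found. Without the extend_path
lines only the first directory would be searched.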

From brett at python.org  Sun Sep  7 00:28:18 2008
From: brett at python.org (Brett Cannon)
Date: Sat, 6 Sep 2008 15:28:18 -0700
Subject: [Python-3000] Should package __init__ files include
	pkgutil.extend_path?
In-Reply-To: <18626.61673.143430.847735@montanaro-dyndns-org.local>
References: <18626.61673.143430.847735@montanaro-dyndns-org.local>
Message-ID: <bbaeab100809061528m490f7484m71074ac450ef49c1@mail.gmail.com>

On Sat, Sep 6, 2008 at 2:06 PM,  <skip at pobox.com> wrote:
> I'm trying to figure out how to install this dbm.sqlite module I have
> without overwriting the basic install.  My thought was to create a dbm
> package in site-packages then copy sqlite.py there.  That doesn't work
> though.  Modifying dbm.__init__.py to include this does:
>
>    import pkgutil
>    __path__ = pkgutil.extend_path(__path__, __name__)
>
> I'm wondering if all the core packages in 3.x should include the above in
> their __init__.py files.
>

Well, a side-effect of this is that every package import will suddenly
increase the number of stat calls linearly with the number of entries on
sys.path.

Another option is to use a .pth file that imports your module (named
_dbm_sqlite.py or something similar) and have it, as a side-effect of
importing, set itself on dbm.

-Brett

From skip at pobox.com  Sun Sep  7 00:36:08 2008
From: skip at pobox.com (skip at pobox.com)
Date: Sat, 6 Sep 2008 17:36:08 -0500
Subject: [Python-3000] Nonlinearity in dbm.ndbm?
Message-ID: <18627.1496.785870.379332@montanaro-dyndns-org.local>

While doing a little testing of my dbm.sqlite module (it's pretty damn slow
at the moment) I came across this chestnut.  Given this shell for loop:

    for n in 10 100 1000 10000 ; do
        rm -f /tmp/trash.db*
        python3.0 -m timeit -s 'import dbm.ndbm as db' -s 'f = db.open("/tmp/trash.db", "c")' 'for i in range('$n'): f[str(i)] = str(i)'
    done

I get this output:

    100000 loops, best of 3: 16 usec per loop
    1000 loops, best of 3: 185 usec per loop
    100 loops, best of 3: 5.04 msec per loop
    10 loops, best of 3: 207 msec per loop

Replacing dbm.ndbm with dbm.sqlite shows more linear growth (only went to
n=1000 because it was so slow):

    10 loops, best of 3: 44.9 msec per loop
    10 loops, best of 3: 460 msec per loop
    10 loops, best of 3: 5.26 sec per loop

My guess is there is something nonlinear in the ndbm code, probably the
underlying library, but it may be worth checking the wrapper quickly.

Platform is Mac OSX 10.5.4 on a MacBook Pro.

Now to dig into the abysmal sqlite performance.

Skip

From josiah.carlson at gmail.com  Sun Sep  7 00:47:49 2008
From: josiah.carlson at gmail.com (Josiah Carlson)
Date: Sat, 6 Sep 2008 15:47:49 -0700
Subject: [Python-3000] Nonlinearity in dbm.ndbm?
In-Reply-To: <18627.1496.785870.379332@montanaro-dyndns-org.local>
References: <18627.1496.785870.379332@montanaro-dyndns-org.local>
Message-ID: <e6511dbf0809061547n3e5cbac3s699fee8dcf897399@mail.gmail.com>

On Sat, Sep 6, 2008 at 3:36 PM,  <skip at pobox.com> wrote:
> While doing a little testing of my dbm.sqlite module (it's pretty damn slow
> at the moment) I came across this chestnut.  Given this shell for loop:
>
>    for n in 10 100 1000 10000 ; do
>        rm -f /tmp/trash.db*
>        python3.0 -m timeit -s 'import dbm.ndbm as db' -s 'f = db.open("/tmp/trash.db", "c")' 'for i in range('$n'): f[str(i)] = str(i)'
>    done
>
> I get this output:
>
>    100000 loops, best of 3: 16 usec per loop
>    1000 loops, best of 3: 185 usec per loop
>    100 loops, best of 3: 5.04 msec per loop
>    10 loops, best of 3: 207 msec per loop
>
> Replacing dbm.ndbm with dbm.sqlite shows more linear growth (only went to
> n=1000 because it was so slow):
>
>    10 loops, best of 3: 44.9 msec per loop
>    10 loops, best of 3: 460 msec per loop
>    10 loops, best of 3: 5.26 sec per loop
>
> My guess is there is something nonlinear in the ndbm code, probably the
> underlying library, but it may be worth checking the wrapper quickly.
>
> Platform is Mac OSX 10.5.4 on a MacBook Pro.
>
> Now to dig into the abysmal sqlite performance.

The version I just posted to the tracker reads/writes about 30k
entries/second.  You may want to look at the differences (looks to be
due to your lack of a primary key/index).

 - Josiah

From skip at pobox.com  Sun Sep  7 01:25:48 2008
From: skip at pobox.com (skip at pobox.com)
Date: Sat, 6 Sep 2008 18:25:48 -0500
Subject: [Python-3000] Nonlinearity in dbm.ndbm?
In-Reply-To: <e6511dbf0809061547n3e5cbac3s699fee8dcf897399@mail.gmail.com>
References: <18627.1496.785870.379332@montanaro-dyndns-org.local>
	<e6511dbf0809061547n3e5cbac3s699fee8dcf897399@mail.gmail.com>
Message-ID: <18627.4476.160727.405718@montanaro-dyndns-org.local>

    >> Now to dig into the abysmal sqlite performance.

    Josiah> The version I just posted to the tracker reads/writes about 30k
    Josiah> entries/second.  You may want to look at the differences (looks
    Josiah> to be due to your lack of a primary key/index).

Thanks.  The real speedup was to avoid using cursors.  Here's the
progression:

* My original (no indexes, keys and values are text, using cursors w/
  commit and explicit close, delete+insert to assign key):

    10 loops, best of 3: 51.4 msec per loop
    10 loops, best of 3: 505 msec per loop

* As above, but with a primary key:

    10 loops, best of 3: 52.5 msec per loop
    10 loops, best of 3: 507 msec per loop

* As above, but keys and values are blobs:

    10 loops, best of 3: 50.4 msec per loop
    10 loops, best of 3: 529 msec per loop

* As above, but get rid of del self[key] in __setitem__ and use the replace
  statement instead of insert:

    10 loops, best of 3: 25.4 msec per loop
    10 loops, best of 3: 263 msec per loop

* Remove try/finally with explicit close() calls (Gerhard says he never
  closes cursors.):

    10 loops, best of 3: 23.2 msec per loop
    10 loops, best of 3: 270 msec per loop

* Get rid of cursors, calling the connection's execute method instead:

    1000 loops, best of 3: 198 usec per loop
    100 loops, best of 3: 2.26 msec per loop

Hmmm...  Should cursors be used?  What benefit are they?  Without them is
the sqlite code thread-safe?

Skip

From ncoghlan at gmail.com  Sun Sep  7 04:33:07 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 07 Sep 2008 12:33:07 +1000
Subject: [Python-3000] Should package __init__ files
	include	pkgutil.extend_path?
In-Reply-To: <bbaeab100809061528m490f7484m71074ac450ef49c1@mail.gmail.com>
References: <18626.61673.143430.847735@montanaro-dyndns-org.local>
	<bbaeab100809061528m490f7484m71074ac450ef49c1@mail.gmail.com>
Message-ID: <48C33D63.1010305@gmail.com>

Brett Cannon wrote:
> On Sat, Sep 6, 2008 at 2:06 PM,  <skip at pobox.com> wrote:
>> I'm trying to figure out how to install this dbm.sqlite module I have
>> without overwriting the basic install.  My thought was to create a dbm
>> package in site-packages then copy sqlite.py there.  That doesn't work
>> though.  Modifying dbm.__init__.py to include this does:
>>
>>    import pkgutil
>>    __path__ = pkgutil.extend_path(__path__, __name__)
>>
>> I'm wondering if all the core packages in 3.x should include the above in
>> their __init__.py files.
>>
> 
> Well, a side-effect of this is that all package imports will suddenly
> spike the number of stat calls linearly to the number of entries on
> sys.path.
> 
> Another option is to use a pth file that imports your module (as like
> _dbm_sqlite.py or something) and have it, as a side-effect of
> importing, set itself on dbm.

It would probably be cleaner to add "extend_path" functions to the
extensible core packages rather than have them automatically extend
their path list on startup.

E.g. dbm.__init__.py may have something like the following:

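def extend_package(dirs=None):
    """Extend this package's __path__, but only on request."""
    global __path__
    if dirs is None:
        # No directories given: scan sys.path for other portions.
        import pkgutil
        # __package__ (PEP 366) holds the package name when it is set.
        name = __package__ if __package__ is not None else __name__
        __path__ = pkgutil.extend_path(__path__, name)
    else:
        # Explicit directories: append them as given.
        __path__.extend(dirs)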

So the standard library packages would be self-contained by default, but
an application could explicitly request that the extensible packages be
expanded to incorporate other directories.
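
As a rough usage sketch of that opt-in idea, assuming a throwaway
package (demo_extensible_pkg and plugin.py are hypothetical names) and
a simplified extend_package that only handles the explicit-directories
case: the submodule is not importable until the application asks for
the package to be extended.

```python
import importlib
import os
import sys
import tempfile

PKG = "demo_extensible_pkg"  # hypothetical package name for this sketch

# Simplified stand-in for the extend_package() idea: explicit dirs only.
INIT_SOURCE = '''
def extend_package(dirs):
    """Append extra directories to this package's search path."""
    __path__.extend(dirs)
'''


def build_demo():
    """Create a core package and a separate directory holding a plugin."""
    core = tempfile.mkdtemp()   # stands in for the standard library
    extra = tempfile.mkdtemp()  # stands in for a third-party location
    os.mkdir(os.path.join(core, PKG))
    with open(os.path.join(core, PKG, "__init__.py"), "w") as f:
        f.write(INIT_SOURCE)
    with open(os.path.join(extra, "plugin.py"), "w") as f:
        f.write("VALUE = 42\n")
    return core, extra


def plugin_importable():
    """Return True if PKG.plugin can currently be imported."""
    try:
        importlib.import_module(PKG + ".plugin")
    except ImportError:
        return False
    return True


core_dir, extra_dir = build_demo()
sys.path.insert(0, core_dir)
importlib.invalidate_caches()
pkg = importlib.import_module(PKG)

before = plugin_importable()     # package is self-contained by default
pkg.extend_package([extra_dir])  # the application opts in explicitly
importlib.invalidate_caches()
after = plugin_importable()
print(before, after)
```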

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
            http://www.boredomandlaziness.org

From skip at pobox.com  Sun Sep  7 05:37:16 2008
From: skip at pobox.com (skip at pobox.com)
Date: Sat, 6 Sep 2008 22:37:16 -0500
Subject: [Python-3000] Nonlinearity in dbm.ndbm?
In-Reply-To: <18627.4476.160727.405718@montanaro-dyndns-org.local>
References: <18627.1496.785870.379332@montanaro-dyndns-org.local>
	<e6511dbf0809061547n3e5cbac3s699fee8dcf897399@mail.gmail.com>
	<18627.4476.160727.405718@montanaro-dyndns-org.local>
Message-ID: <18627.19564.859436.270707@montanaro-dyndns-org.local>

    Josiah> The version I just posted to the tracker reads/writes about 30k
    Josiah> entries/second.  You may want to look at the differences (looks
    Josiah> to be due to your lack of a primary key/index).

    me> Thanks.  The real speedup was to avoid using cursors.

Let me take another stab at this.  My __setitem__ looks like this:

    def __setitem__(self, key, val):
        c = self._conn.cursor()
        c.execute("replace into dict"
                  " (key, value) values (?, ?)", (key, val))
        self._conn.commit()

This works (tests pass), but is slow (23-25 msec per loop).  If I change it
to this:

    def __setitem__(self, key, val):
        self._conn.execute("replace into dict"
                           " (key, value) values (?, ?)", (key, val))

which is essentially your __setitem__ without the type checks on the key and
value, it runs much faster (about 300 usec per loop), but the unit tests
fail.  This also works:

    def __setitem__(self, key, val):
        self._conn.execute("replace into dict"
                           " (key, value) values (?, ?)", (key, val))
        self._conn.commit()

I think you need the commits and have to suffer with the speed penalty.

Skip

From josiah.carlson at gmail.com  Sun Sep  7 05:58:20 2008
From: josiah.carlson at gmail.com (Josiah Carlson)
Date: Sat, 6 Sep 2008 20:58:20 -0700
Subject: [Python-3000] Nonlinearity in dbm.ndbm?
In-Reply-To: <18627.19564.859436.270707@montanaro-dyndns-org.local>
References: <18627.1496.785870.379332@montanaro-dyndns-org.local>
	<e6511dbf0809061547n3e5cbac3s699fee8dcf897399@mail.gmail.com>
	<18627.4476.160727.405718@montanaro-dyndns-org.local>
	<18627.19564.859436.270707@montanaro-dyndns-org.local>
Message-ID: <e6511dbf0809062058w39f607d0vec946849e256d52@mail.gmail.com>

On Sat, Sep 6, 2008 at 8:37 PM,  <skip at pobox.com> wrote:
>
>    Josiah> The version I just posted to the tracker reads/writes about 30k
>    Josiah> entries/second.  You may want to look at the differences (looks
>    Josiah> to be due to your lack of a primary key/index).
>
>    me> Thanks.  The real speedup was to avoid using cursors.
>
> Let me take another stab at this.  My __setitem__ looks like this:
>
>    def __setitem__(self, key, val):
>        c = self._conn.cursor()
>        c.execute("replace into dict"
>                  " (key, value) values (?, ?)", (key, val))
>        self._conn.commit()
>
> This works (tests pass), but is slow (23-25 msec per loop).  If I change it
> to this:
>
>    def __setitem__(self, key, val):
>        self._conn.execute("replace into dict"
>                           " (key, value) values (?, ?)", (key, val))
>
> which is essentially your __setitem__ without the type checks on the key and
> value, it runs much faster (about 300 usec per loop), but the unit tests
> fail.  This also works:
>
>    def __setitem__(self, key, val):
>        self._conn.execute("replace into dict"
>                           " (key, value) values (?, ?)", (key, val))
>        self._conn.commit()
>
> I think you need the commits and have to suffer with the speed penalty.

I guess I need to look at your unittests, because in my testing,
reading/writing with a single instance works great, but if you want
changes to be seen by other instances (in other threads or processes),
you need to .commit() changes.  I'm thinking that that's a reasonable
expectation; I never expected bsddbs to be able to share their data
with other processes until I did a .sync(), but maybe I never expected
much from my dbm-like interfaces?

 - Josiah
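
A minimal sketch of the visibility point above, assuming a simplified
SqliteDict class (a hypothetical name; this is neither the dbm.sqlite
module nor the version on the tracker) and Python's default sqlite3
transaction handling. Skipping commit() in __setitem__ avoids the
per-write cost, but a second connection does not see the write until
the first one commits.

```python
import os
import sqlite3
import tempfile


class SqliteDict:
    """Tiny dict-like store backed by sqlite3; illustration only."""

    def __init__(self, path, commit_each_write=True):
        self._conn = sqlite3.connect(path)
        self._commit_each_write = commit_each_write
        self._conn.execute(
            "create table if not exists dict"
            " (key blob primary key, value blob)")
        self._conn.commit()

    def __setitem__(self, key, val):
        self._conn.execute("replace into dict"
                           " (key, value) values (?, ?)", (key, val))
        if self._commit_each_write:
            # Slower, but other connections see the change right away.
            self._conn.commit()

    def __len__(self):
        rows = self._conn.execute("select count(*) from dict").fetchall()
        return rows[0][0]

    def sync(self):
        """Make pending writes visible to other connections."""
        self._conn.commit()

    def close(self):
        self._conn.close()


path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = SqliteDict(path, commit_each_write=False)
reader = SqliteDict(path)  # a second, independent connection

writer["spam"] = "eggs"
seen_by_writer = len(writer)    # the writer sees its own pending row
seen_before_sync = len(reader)  # the reader does not see it yet
writer.sync()
seen_after_sync = len(reader)   # visible once committed

writer.close()
reader.close()
print(seen_by_writer, seen_before_sync, seen_after_sync)
```

An explicit sync() keeps the fast path for single-instance use while
leaving the choice of when to pay for a commit to the caller.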

From josiah.carlson at gmail.com  Sun Sep  7 06:08:03 2008
From: josiah.carlson at gmail.com (Josiah Carlson)
Date: Sat, 6 Sep 2008 21:08:03 -0700
Subject: [Python-3000] Nonlinearity in dbm.ndbm?
In-Reply-To: <e6511dbf0809062058w39f607d0vec946849e256d52@mail.gmail.com>
References: <18627.1496.785870.379332@montanaro-dyndns-org.local>
	<e6511dbf0809061547n3e5cbac3s699fee8dcf897399@mail.gmail.com>
	<18627.4476.160727.405718@montanaro-dyndns-org.local>
	<18627.19564.859436.270707@montanaro-dyndns-org.local>
	<e6511dbf0809062058w39f607d0vec946849e256d52@mail.gmail.com>
Message-ID: <e6511dbf0809062108x366d98fu5252c46f1f3f1174@mail.gmail.com>

On Sat, Sep 6, 2008 at 8:58 PM, Josiah Carlson <josiah.carlson at gmail.com> wrote:
> On Sat, Sep 6, 2008 at 8:37 PM,  <skip at pobox.com> wrote:
>>
>>    Josiah> The version I just posted to the tracker reads/writes about 30k
>>    Josiah> entries/second.  You may want to look at the differences (looks
>>    Josiah> to be due to your lack of a primary key/index).
>>
>>    me> Thanks.  The real speedup was to avoid using cursors.
>>
>> Let me take another stab at this.  My __setitem__ looks like this:
>>
>>    def __setitem__(self, key, val):
>>        c = self._conn.cursor()
>>        c.execute("replace into dict"
>>                  " (key, value) values (?, ?)", (key, val))
>>        self._conn.commit()
>>
>> This works (tests pass), but is slow (23-25 msec per loop).  If I change it
>> to this:
>>
>>    def __setitem__(self, key, val):
>>        self._conn.execute("replace into dict"
>>                           " (key, value) values (?, ?)", (key, val))
>>
>> which is essentially your __setitem__ without the type checks on the key and
>> value, it runs much faster (about 300 usec per loop), but the unit tests
>> fail.  This also works:
>>
>>    def __setitem__(self, key, val):
>>        self._conn.execute("replace into dict"
>>                           " (key, value) values (?, ?)", (key, val))
>>        self._conn.commit()
>>
>> I think you need the commits and have to suffer with the speed penalty.
>
> I guess I need to look at your unittests, because in my testing,
> reading/writing with a single instance works great, but if you want
> changes to be seen by other instances (in other threads or processes),
> you need to .commit() changes.  I'm thinking that that's a reasonable
> expectation; I never expected bsddbs to be able to share their data
> with other processes until I did a .sync(), but maybe I never expected
> much from my dbm-like interfaces?

I took sandbox/trunk/dbm_sqlite/Lib/test/test_dbm_sqlite.py, changed
some of the imports to be used with 2.6, got rid of the 'b' prefix on
bytes objects, and my implementation passes in 2.5 (I had to add
support for buffer, double-close, and the 'c' flag).

Maybe there's something funky going on with Python 3.0's sqlite3?

 - Josiah

From stefan_ml at behnel.de  Sun Sep  7 09:15:42 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 07 Sep 2008 09:15:42 +0200
Subject: [Python-3000] PyUnicodeObject implementation
In-Reply-To: <loom.20080906T103737-758@post.gmane.org>
References: <200809051954.42787.jeremy.kloth@gmail.com>
	<loom.20080906T103737-758@post.gmane.org>
Message-ID: <g9vv2u$3ot$1@ger.gmane.org>

Antoine Pitrou wrote:
> Also note that Marc-André Lemburg (one of the authors of the unicode
> implementation) is opposed to that change. See the discussion in the bug tracker
> issue for the details.

>From a Cython perspective, I find the lack of efficient subclassing after such
a change particularly striking. That seriously bit me in Py2 when I tried
making XML text content a bit more intelligent in lxml (i.e. make it remember
what XML element it originated from). Having the same problem for unicode in
Py3 doesn't sound like a good idea to me.

Stefan

From solipsis at pitrou.net  Sun Sep  7 15:52:54 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 7 Sep 2008 13:52:54 +0000 (UTC)
Subject: [Python-3000] PyUnicodeObject implementation
References: <200809051954.42787.jeremy.kloth@gmail.com>
	<loom.20080906T103737-758@post.gmane.org>
	<g9vv2u$3ot$1@ger.gmane.org>
Message-ID: <loom.20080907T132230-87@post.gmane.org>

Stefan Behnel <stefan_ml <at> behnel.de> writes:
> 
> From a Cython perspective, I find the lack of efficient subclassing after such
> a change particularly striking. That seriously bit me in Py2 when I tried
> making XML text content a bit more intelligent in lxml (i.e. make it remember
> what XML element it originated from).

I've used a library which had adopted this kind of behaviour (I think it was
BeautifulSoup). After using it several times in a row I noticed memory
consumption of my program exploded. The problem was that the library was
returning objects which looked innocently like strings, but internally kept a
reference to a multi-megabyte HTML tree. The solution was to convert them
explicitly to str before storing them for later use, which defeated the point of
having an str-derived type.

In these cases I think it's much friendlier to the user of the API to use
composition rather than inheritance. Or, simply, just return a raw string and
let the user keep the context separately if he wants to.

PS: what do you call "efficient subclassing"? If you look at the current
implementation of unicode_subtype_new() in unicodeobject.c, it isn't very
efficient (everything including the raw data buffer is allocated twice).
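
A small sketch of the memory issue described above, assuming
hypothetical Tree, SmartText and TextResult classes (none of them come
from BeautifulSoup or lxml). The str subclass silently keeps the whole
tree alive; converting to a plain str drops the back-reference, and a
wrapper makes the choice explicit.

```python
import gc
import weakref


class Tree:
    """Stand-in for a large parsed document."""

    def __init__(self):
        self.payload = bytearray(1024 * 1024)


class SmartText(str):
    """Inheritance: a str subclass that remembers its parent tree."""

    def __new__(cls, value, parent):
        self = super().__new__(cls, value)
        self.parent = parent
        return self


class TextResult:
    """Composition: text and context are stored side by side."""

    def __init__(self, text, parent):
        self.text = text
        self.parent = parent


tree = Tree()
tree_ref = weakref.ref(tree)
smart = SmartText("hello", tree)
del tree
gc.collect()
kept_alive = tree_ref() is not None  # the string still pins the tree

plain = str(smart)  # an exact str, with no reference to the tree
del smart
gc.collect()
released = tree_ref() is None

# With the wrapper, callers pick the plain text explicitly.
stored = TextResult("hello", Tree()).text
print(kept_alive, released, type(plain) is str, type(stored) is str)
```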

From guido at python.org  Sun Sep  7 16:38:06 2008
From: guido at python.org (Guido van Rossum)
Date: Sun, 7 Sep 2008 07:38:06 -0700
Subject: [Python-3000] PyUnicodeObject implementation
In-Reply-To: <g9vv2u$3ot$1@ger.gmane.org>
References: <200809051954.42787.jeremy.kloth@gmail.com>
	<loom.20080906T103737-758@post.gmane.org> <g9vv2u$3ot$1@ger.gmane.org>
Message-ID: <ca471dc20809070738v39fdfcccg64d052ba35616bbb@mail.gmail.com>

On Sun, Sep 7, 2008 at 12:15 AM, Stefan Behnel <stefan_ml at behnel.de> wrote:
> Antoine Pitrou wrote:
>> Also note that Marc-André Lemburg (one of the authors of the unicode
>> implementation) is opposed to that change. See the discussion in the bug tracker
>> issue for the details.
>
> From a Cython perspective, I find the lack of efficient subclassing after such
> a change particularly striking. That seriously bit me in Py2 when I tried
> making XML text content a bit more intelligent in lxml (i.e. make it remember
> what XML element it originated from). Having the same problem for unicode in
> Py3 doesn't sound like a good idea to me.

Can you explain this a bit more? I presume you're talking about
subclassing in C, which is always precarious -- from the Python
perspective there's no difference, the objects are opaque. I do note
that the mechanisms that exist for supporting adding a __dict__ to a
str (in 2.x; or bytes in 3.x) or a tuple could be extended for other
purposes.

Also, please explain why instead of subclassing you couldn't use a
wrapper class? (I.e. use containment instead of inheritance.)

All in all, given the advantage (half the number of allocations) of
the proposal I think there would have to be *very* good arguments
against before we reject this outright. I'd like to understand
Marc-Andre's reasons too.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From stefan_ml at behnel.de  Sun Sep  7 16:46:29 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 07 Sep 2008 16:46:29 +0200
Subject: [Python-3000] PyUnicodeObject implementation
In-Reply-To: <loom.20080907T132230-87@post.gmane.org>
References: <200809051954.42787.jeremy.kloth@gmail.com>	<loom.20080906T103737-758@post.gmane.org>	<g9vv2u$3ot$1@ger.gmane.org>
	<loom.20080907T132230-87@post.gmane.org>
Message-ID: <ga0pg4$to6$1@ger.gmane.org>

Antoine Pitrou wrote:
> Stefan Behnel <stefan_ml <at> behnel.de> writes:
>> From a Cython perspective, I find the lack of efficient subclassing after such
>> a change particularly striking. That seriously bit me in Py2 when I tried
>> making XML text content a bit more intelligent in lxml (i.e. make it remember
>> what XML element it originated from).
> 
> I've used a library which had adopted this kind of behaviour (I think it was
> BeautifulSoup). After using it several times in a row I noticed memory
> consumption of my program exploded. The problem was that the library was
> returning objects which looked innocently like strings, but internally kept a
> reference to a multi-megabyte HTML tree. The solution was to convert them
> explicitly to str before storing them for later use, which defeated the point of
> having an str-derived type.

I'm aware of that problem.

> In these cases I think it's much friendlier to the user of the API to use
> composition rather than inheritance. Or, simply, just return a raw string and
> let the user keep the context separately if he wants to.

That's not that easy for the result of an arbitrary XPath query. But you can
switch the behaviour off when you build the query, so that it gives you a
straight string as result.

> PS: what do you call "efficient subclassing"? if you look at the current
> implementation of unicode_subtype_new() in unicodeobject.c, it isn't very
> efficient (everything including the raw data buffer is allocated twice).

That's something that may be optimised one day without affecting user code. A
different memory layout that prevents C-level subclassing is a very different
kind of change.

Plus, even with the double-allocation, a C-level subclass is still faster than
a Python-level subclass for me. Setup for timeit:

        s = b"abcdef ghijk";
        from lxml.etree import _ElementUnicodeResult;
        u = type("u", (unicode,), {})

$ python2.6 -m timeit -s ... 'unicode(s)'
1000000 loops, best of 3: 0.623 usec per loop

$ python2.6 -m timeit -s ... '_ElementUnicodeResult(s)'
1000000 loops, best of 3: 0.822 usec per loop

$ python2.6 -m timeit -s ... 'u(s)'
1000000 loops, best of 3: 0.849 usec per loop

$ python2.6 -m timeit -s ... 'unicode(s, "utf-8")'
1000000 loops, best of 3: 0.622 usec per loop

$ python2.6 -m timeit -s ... '_ElementUnicodeResult(s, "utf-8")'
1000000 loops, best of 3: 0.806 usec per loop

$ python2.6 -m timeit -s ... 'u(s, "utf-8")'
1000000 loops, best of 3: 0.844 usec per loop

Doing the same with a unicode string as input gives me lower but similar numbers.

Stefan

From stefan_ml at behnel.de  Sun Sep  7 16:58:13 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 07 Sep 2008 16:58:13 +0200
Subject: [Python-3000] PyUnicodeObject implementation
In-Reply-To: <ca471dc20809070738v39fdfcccg64d052ba35616bbb@mail.gmail.com>
References: <200809051954.42787.jeremy.kloth@gmail.com>	<loom.20080906T103737-758@post.gmane.org>
	<g9vv2u$3ot$1@ger.gmane.org>
	<ca471dc20809070738v39fdfcccg64d052ba35616bbb@mail.gmail.com>
Message-ID: <ga0q63$vrd$1@ger.gmane.org>

Hi,

Guido van Rossum wrote:
> On Sun, Sep 7, 2008 at 12:15 AM, Stefan Behnel <stefan_ml at behnel.de> wrote:
>> Antoine Pitrou wrote:
>>> Also note that Marc-André Lemburg (one of the authors of the unicode
>>> implementation) is opposed to that change. See the discussion in the bug tracker
>>> issue for the details.
>> From a Cython perspective, I find the lack of efficient subclassing after such
>> a change particularly striking. That seriously bit me in Py2 when I tried
>> making XML text content a bit more intelligent in lxml (i.e. make it remember
>> what XML element it originated from). Having the same problem for unicode in
>> Py3 doesn't sound like a good idea to me.
> 
> Can you explain this a bit more? I presume you're talking about
> subclassing in C

Yes, I mentioned Cython above.

> I do note that the mechanisms that exist for supporting adding a __dict__
> to a str (in 2.x; or bytes in 3.x) or a tuple could be extended for other
> purposes.

I never looked into these, but this does not sound like it would impact
subclassing.

> Also, please explain why instead of subclassing you couldn't use a
> wrapper class? (I.e. use containment instead of inheritance.)

Because users will expect that the return values can be passed into anything
that accepts a string, which is much more than you could catch with a wrapper
class. There are tons of C-level APIs inside and outside of Python itself that
require strings for certain operations and will not accept any other object.
Just think of passing a wrapper object as type name of a newly created type.
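
The type-name case can be sketched as follows, assuming two
hypothetical classes, NameStr and NameWrapper. type() insists on a real
string for the class name, so the str subclass is accepted while the
wrapper is rejected with a TypeError, even though it defines __str__.

```python
class NameStr(str):
    """A str subclass: still a real string as far as C-level code goes."""


class NameWrapper:
    """A wrapper that merely contains a string."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text


def make_type(name):
    """Return a new class called `name`, or None if type() rejects it."""
    try:
        return type(name, (), {})
    except TypeError:
        return None


from_subclass = make_type(NameStr("Dynamic"))
from_wrapper = make_type(NameWrapper("Dynamic"))
print(from_subclass, from_wrapper)
```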

Stefan

From skip at pobox.com  Sun Sep  7 17:22:11 2008
From: skip at pobox.com (skip at pobox.com)
Date: Sun, 7 Sep 2008 10:22:11 -0500
Subject: [Python-3000] Would someone please look at this bug report?
Message-ID: <18627.61859.941803.115472@montanaro-dyndns-org.local>

I created this bug report against 3.0 yesterday:

    http://bugs.python.org/issue3799

I marked it high priority because it seems to me that all the dbm.* modules
should agree on whether they accept strings as keys or require bytes.
That's clearly not the case at the moment.  I suppose perhaps I should have
marked it as a release blocker, but I don't think that's my call.

Thx,

Skip

From martin at v.loewis.de  Sun Sep  7 18:01:53 2008
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sun, 07 Sep 2008 18:01:53 +0200
Subject: [Python-3000] XML as bytes or unicode?
In-Reply-To: <ca471dc20808250923h39fd06bei16c71508e391610f@mail.gmail.com>
References: <ffb592890808171040g5db8f25flb9a1222f40364760@mail.gmail.com>	
	<bbaeab100808171209q7b29fff0pa585c02e0c95053a@mail.gmail.com>	
	<loom.20080818T162617-305@post.gmane.org>	
	<1afaf6160808180935k3470efc0n65c318b87d54a99@mail.gmail.com>	
	<loom.20080818T164216-491@post.gmane.org>	
	<bbaeab100808181029w1eb1051fp9aecf0c6e239a818@mail.gmail.com>	
	<48B24525.3080808@v.loewis.de>
	<ca471dc20808250923h39fd06bei16c71508e391610f@mail.gmail.com>
Message-ID: <48C3FAF1.5090909@v.loewis.de>

>> Parsing Unicode XML strings isn't quite that meaningful.
> 
> Maybe not according to the XML standard, but I can see lots of
> practical situations where the encoding is always known and applied by
> some other layer, i.e. the I/O library or a database wrapper. Forcing
> XML to be interpreted as binary isn't always the best idea. E.g.
> consider storing XML in a SVN repository. Or consider storing XML
> fragments in Python string literals.

Stefan got it right - a "higher-level protocol" may override the
encoding declaration in the XML data. In the case of Python Unicode
strings, the data is 16-bit Unicode (or 32-bit), "obviously" overriding
the declared encoding (although technically, that protocol needs to
explicitly state what encoding takes precedence).

So let me rephrase: "Parsing Unicode XML strings may easily lead
to parsing problems" (i.e. if the parser hasn't been told that a
higher-layer protocol was in place). This is currently the case in 3.0:

py> import xml.dom.minidom
py> d=xml.dom.minidom.parseString("<?xml version='1.0' encoding='iso-8859-1'?><hallo>\u20ac</hallo>")
py> d.documentElement.childNodes[0].data
'â\x82¬'
py> list(map(ord,d.documentElement.childNodes[0].data))
[226, 130, 172]

Regards,
Martin

From barry at python.org  Sun Sep  7 18:02:06 2008
From: barry at python.org (Barry Warsaw)
Date: Sun, 7 Sep 2008 12:02:06 -0400
Subject: [Python-3000] [Python-Dev] Not releasing rc1 tonight
In-Reply-To: <ga0ppm$udq$1@ger.gmane.org>
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>
	<ga0ppm$udq$1@ger.gmane.org>
Message-ID: <9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sep 7, 2008, at 10:51 AM, Fredrik Lundh wrote:

> Barry Warsaw wrote:
>
>> I'm not going to release rc1 tonight.  There are too many open  
>> release blockers that I don't want to defer, and I'd like the  
>> buildbots to churn through the bsddb removal on all platforms.
>
>> I'd like to try again on Friday and stick to rc2 on the 17th.
>
> any news on this front?
>
> (I have a few minor ET fixes, and possibly a Unicode 5.1 patch, but  
> have had absolutely no time to spend on that.  is the window still  
> open?)

There are 8 open release blockers, a few of which have patches that  
need review.  So I think we are still not ready to release rc1.  But  
it worries me because I think this is going to push the final release  
beyond our October 1st goal.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSMP6/3EjvBPtnXfVAQIprQQAsWgxQPKyxM/rrG5TWL4UqI7xne6dLTjL
Nx3OBpi8hcNXEqyxzoosFXXZy4PpSWU+SwxuI1YQT9rUjv/ks6yxu3cBcEVhtEHV
KE34YS4D825tVGvbvpsOXF06fsfv5j5zZGB6hlSipZoiv1rhR3uEsO2zkWaI4eQ6
Ty2Cfuxu10A=
=8eP5
-----END PGP SIGNATURE-----

From brett at python.org  Sun Sep  7 21:58:15 2008
From: brett at python.org (Brett Cannon)
Date: Sun, 7 Sep 2008 12:58:15 -0700
Subject: [Python-3000] Should package __init__ files include
	pkgutil.extend_path?
In-Reply-To: <48C33D63.1010305@gmail.com>
References: <18626.61673.143430.847735@montanaro-dyndns-org.local>
	<bbaeab100809061528m490f7484m71074ac450ef49c1@mail.gmail.com>
	<48C33D63.1010305@gmail.com>
Message-ID: <bbaeab100809071258h7348527ai2c9e0815f096681c@mail.gmail.com>

On Sat, Sep 6, 2008 at 7:33 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
[SNIP]
> So the standard library packages would be self-contained by default, but
> an application could explicitly request that the extensible packages be
> expanded to incorporate other directories.
>

I was thinking about this the other day and realized there is a risk
of people mistaking third-party code for part of the stdlib. It also
could lead to future name clashes if we were ever to add a module with
the same name as one injected by a third-party.

-Brett

From brett at python.org  Sun Sep  7 22:02:11 2008
From: brett at python.org (Brett Cannon)
Date: Sun, 7 Sep 2008 13:02:11 -0700
Subject: [Python-3000] Would someone please look at this bug report?
In-Reply-To: <18627.61859.941803.115472@montanaro-dyndns-org.local>
References: <18627.61859.941803.115472@montanaro-dyndns-org.local>
Message-ID: <bbaeab100809071302v45e0a57du75834e8cf5606bb2@mail.gmail.com>

On Sun, Sep 7, 2008 at 8:22 AM,  <skip at pobox.com> wrote:
> I created this bug report against 3.0 yesterday:
>
>    http://bugs.python.org/issue3799
>
> I marked it high priority because it seems to me that all the dbm.* modules
> should agree on whether they accept strings as keys or require bytes.
> That's clearly not the case at the moment.  I suppose perhaps I should have
> marked it as a release blocker, but I don't think that's my call.
>

Well, I think it is your call by being a core developer. If Barry
disagrees he can lower the priority.

-Brett

From ncoghlan at gmail.com  Sun Sep  7 23:23:26 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 08 Sep 2008 07:23:26 +1000
Subject: [Python-3000] PyUnicodeObject implementation
In-Reply-To: <ca471dc20809070738v39fdfcccg64d052ba35616bbb@mail.gmail.com>
References: <200809051954.42787.jeremy.kloth@gmail.com>	<loom.20080906T103737-758@post.gmane.org>
	<g9vv2u$3ot$1@ger.gmane.org>
	<ca471dc20809070738v39fdfcccg64d052ba35616bbb@mail.gmail.com>
Message-ID: <48C4464E.5010707@gmail.com>

Guido van Rossum wrote:
> All in all, given the advantage (half the number of allocations) of
> the proposal I think there would have to be *very* good arguments
> against before we reject this outright. I'd like to understand
> Marc-Andre's reasons too.

As Stefan notes, because of the frequency with which strings are
manipulated in C code via PyString_* / PyUnicode_* calls, it is a data
type where "accept no substitutes" prevails.

MAL's primary concern appears to be that having Unicode as a plain
PyObject leaves the type more open to subclass-based optimisations that
have been rejected for the builtin types themselves. Having
PyString/PyBytes as PyVarObjects means that subclasses are more limited
in what they can do.

One possibility that occurs to me is to use a PyVarObject variant that
allocates space for an additional void pointer before the variable sized
section of the object. The builtin type would leave that pointer NULL,
but subtypes could perform the second allocation needed to populate it.

The question is whether the 4-8 bytes wasted per object would be worth
the fact that only one memory allocation would be needed.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
            http://www.boredomandlaziness.org

From ncoghlan at gmail.com  Sun Sep  7 23:25:14 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 08 Sep 2008 07:25:14 +1000
Subject: [Python-3000] Would someone please look at this bug report?
In-Reply-To: <bbaeab100809071302v45e0a57du75834e8cf5606bb2@mail.gmail.com>
References: <18627.61859.941803.115472@montanaro-dyndns-org.local>
	<bbaeab100809071302v45e0a57du75834e8cf5606bb2@mail.gmail.com>
Message-ID: <48C446BA.8070301@gmail.com>

Brett Cannon wrote:
> On Sun, Sep 7, 2008 at 8:22 AM,  <skip at pobox.com> wrote:
>> I created this bug report against 3.0 yesterday:
>>
>>    http://bugs.python.org/issue3799
>>
>> I marked it high priority because it seems to me that all the dbm.* modules
>> should agree on whether they accept strings as keys or require bytes.
>> That's clearly not the case at the moment.  I suppose perhaps I should have
>> marked it as a release blocker, but I don't think that's my call.
>>
> 
> Well, I think it is your call by being a core developer. If Barry
> disagrees he can lower the priority.

That's the way I've been interpreting it (and while a couple of them
have certainly turned out to be less urgent than I thought after further
analysis, I don't regret getting an explicit decision on them before the
rc went out).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
            http://www.boredomandlaziness.org

From guido at python.org  Mon Sep  8 00:55:32 2008
From: guido at python.org (Guido van Rossum)
Date: Sun, 7 Sep 2008 15:55:32 -0700
Subject: [Python-3000] PyUnicodeObject implementation
In-Reply-To: <48C4464E.5010707@gmail.com>
References: <200809051954.42787.jeremy.kloth@gmail.com>
	<loom.20080906T103737-758@post.gmane.org> <g9vv2u$3ot$1@ger.gmane.org>
	<ca471dc20809070738v39fdfcccg64d052ba35616bbb@mail.gmail.com>
	<48C4464E.5010707@gmail.com>
Message-ID: <ca471dc20809071555g337d098au10b5cae210dfa0f9@mail.gmail.com>

On Sun, Sep 7, 2008 at 2:23 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Guido van Rossum wrote:
>> All in all, given the advantage (half the number of allocations) of
>> the proposal I think there would have to be *very* good arguments
>> against before we reject this outright. I'd like to understand
>> Marc-Andre's reasons too.
>
> As Stefan notes, because of the frequency with which strings are
> manipulated in C code via PyString_* / PyUnicode_* calls, it is a data
> type where "accept no substitutes" prevails.
>
> MAL's primary concern appears to be that having Unicode as a plain
> PyObject leaves the type more open to subclass-based optimisations that
> have been rejected for the builtin types themselves.

Hm. I don't have any particularly insightful imagination as to what
those optimizations might be. Have any been implemented (in 3rd party
code) in the 8 years that the Unicode object has existed?

> Having
> PyString/PyBytes as PyVarObjects means that subclasses are more limited
> in what they can do.

True.

> One possibility that occurs to me is to use a PyVarObject variant that
> allocates space for an additional void pointer before the variable sized
> section of the object. The builtin type would leave that pointer NULL,
> but subtypes could perform the second allocation needed to populate it.
>
> The question is whether the 4-8 bytes wasted per object would be worth
> the fact that only one memory allocation would be needed.

I believe that 4-8 bytes is more than the overhead of an extra memory
allocation from the obmalloc heap. It is probably about the same as
the overhead for a memory allocation from the regular malloc heap. So
for short strings (of which there are often a lot) it would be more
expensive; for longer objects it would probably work out just about
the same.

There could be a different approach though, whereby the offset from
the start of the object to the start of the character array wasn't a
constant but a value stored in the class object. (In fact,
tp_basicsize could probably be used for this.) It would slow down
access to the characters a bit though -- a classic time-space
trade-off that would require careful measurement in order to decide
which is better.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From foom at fuhm.net  Mon Sep  8 02:23:21 2008
From: foom at fuhm.net (James Y Knight)
Date: Sun, 7 Sep 2008 20:23:21 -0400
Subject: [Python-3000] PyUnicodeObject implementation
In-Reply-To: <ca471dc20809071555g337d098au10b5cae210dfa0f9@mail.gmail.com>
References: <200809051954.42787.jeremy.kloth@gmail.com>
	<loom.20080906T103737-758@post.gmane.org>
	<g9vv2u$3ot$1@ger.gmane.org>
	<ca471dc20809070738v39fdfcccg64d052ba35616bbb@mail.gmail.com>
	<48C4464E.5010707@gmail.com>
	<ca471dc20809071555g337d098au10b5cae210dfa0f9@mail.gmail.com>
Message-ID: <C5B0E433-C918-413B-8109-C91121B3904C@fuhm.net>

On Sep 7, 2008, at 6:55 PM, Guido van Rossum wrote:
>> One possibility that occurs to me is to use a PyVarObject variant  
>> that
>> allocates space for an additional void pointer before the variable  
>> sized
>> section of the object. The builtin type would leave that pointer  
>> NULL,
>> but subtypes could perform the second allocation needed to populate  
>> it.
>>
>> The question is whether the 4-8 bytes wasted per object would be  
>> worth
>> the fact that only one memory allocation would be needed.
>
> I believe that 4-8 bytes is more than the overhead of an extra memory
> allocation from the obmalloc heap. It is probably about the same as
> the overhead for a memory allocation from the regular malloc heap. So
> for short strings (of which there are often a lot) it would be more
> expensive; for longer objects it would probably work out just about
> the same.
>
> There could be a different approach though, whereby the offset from
> the start of the object to the start of the character array wasn't a
> constant but a value stored in the class object. (In fact,
> tp_basicsize could probably be used for this.) It would slow down
> access to the characters a bit though -- a classic time-space
> trade-off that would require careful measurement in order to decide
> which is better.

Given that you can, today, subclass str in Python, without wasting an  
extra 4/8 bytes of memory, or adding anything new to the class object,  
why wouldn't anyone who really wanted to make a hypothetical optimized  
subclass just use the same mechanism (putting your additional data  
*after* the character data) to subclass it in C?

It may be a little tricky, but not exactly rocket science, and given  
that all these C subclasses of str are so far hypothetical, just  
leaving it as "it's possible" seems perfectly reasonable...

James

From wescpy at gmail.com  Mon Sep  8 02:34:59 2008
From: wescpy at gmail.com (wesley chun)
Date: Sun, 7 Sep 2008 17:34:59 -0700
Subject: [Python-3000] [Python-Dev] Not releasing rc1 tonight
In-Reply-To: <9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org>
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>
	<ga0ppm$udq$1@ger.gmane.org>
	<9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org>
Message-ID: <78b3a9580809071734u5967f305mbff6120dfca538b7@mail.gmail.com>

>> Barry Warsaw wrote:
>>> I'm not going to release rc1 tonight.
>>> I'd like to try again on Friday and stick to rc2 on the 17th.
>
> There are 8 open release blockers, a few of which have patches that need
> review.  So I think we are still not ready to release rc1.  But it worries
> me because I think this is going to push the final release beyond our
> October 1st goal.

the goal is admirable, but unless there are paying sponsors that
require this deadline be met, i'd suggest that we can push the
releases until they're ready.  the changes that 2.6 and 3.0 bring are
too major to be released before they are ready for primetime.

also, there hasn't been a beta3 download available for Win users
(aside from the developers who can build it) since Martin has been on
vacation... they will effectively be leapfrogged from b2 directly to
rc1. i think he comes back tomorrow, so if rc1 really is going out
soon, would it make sense for him to make b3 MSI files too?

just my $0.02,
-wesley

From abpillai at gmail.com  Mon Sep  8 06:10:01 2008
From: abpillai at gmail.com (Anand Balachandran Pillai)
Date: Mon, 8 Sep 2008 09:40:01 +0530
Subject: [Python-3000] About "daemon" in threading module
In-Reply-To: <4222a8490809051944x20be939ap196ab565291629d4@mail.gmail.com>
References: <48BFF8D9.3030002@jcea.es>
	<loom.20080904T154530-598@post.gmane.org>
	<ca471dc20809041256m5fec338fy4f5081a47260c4c3@mail.gmail.com>
	<48C05131.60209@gmail.com> <48C159FE.7070400@jcea.es>
	<4222a8490809051006h78fe49fcm1385a105688212de@mail.gmail.com>
	<48C1ED22.5040002@gmail.com>
	<4222a8490809051944x20be939ap196ab565291629d4@mail.gmail.com>
Message-ID: <8548c5f30809072110y9faeb0bw2a582e36d8794ff3@mail.gmail.com>

On Sat, Sep 6, 2008 at 8:14 AM, Jesse Noller <jnoller at gmail.com> wrote:
> On Fri, Sep 5, 2008 at 10:38 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> Jesse Noller wrote:
>>> On Fri, Sep 5, 2008 at 12:10 PM, Jesus Cea <jcea at jcea.es> wrote:
>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>> Hash: SHA1
>>>>
>>>> Nick Coghlan wrote:
>>>>> Hmm, having (daemon=False) as a parameter on start() would probably be
>>>>> an even better API than having it on __init__() (modulo subclassing
>>>>> compatibility concerns).
>>>> Agreed. Could it be done for 3.0?
>>>
>>> Also, FWIW, I thought we were no longer doing API changes?
>>
>> We aren't - if we'd thought of it a month ago, we could have included
>> it, but now 2.7/3.1 is the earliest for that change.
>>
>> As far as the 'typo protection' goes... I'm still not convinced that the
>> delayed action of the set daemon effect means that the Thread object
>> needs special protection.
>>
>> If an application fails to set the attribute properly, then its test
>> suite will hang on shutdown (as the threading module attempts to do
>> .join() on a thread that hasn't been told to stop).
>
> I happen to really like the property-approach. It makes sense to
> call thread.daemon = True, it's also clean and feels natural now that
> it's there. And you're right - typos in this will bite people fairly
> quickly, but to Jesus' point - those people may go chasing something
> else before noticing they typed deamon instead of daemon.

I think Jesus raises a very valid point. I have often typed "setDeamon"
instead of "setDaemon" for my thread objects. I always make it a point
to keep open the module documentation for threading.Thread before
calling this method on the objects.

I think the "Pythonic" way of doing it would be to use properties,
so 'thread.daemon=1' is very nice. But without __slots__, we are
going to have many developers write 'thread.deamon=1' and not
notice this is the problem when they start debugging after stuff
does not work the way they expect at process shutdown. They
are going to chase after some other thread...

I guess adding __slots__ to Thread class is the best approach
for this. +1 for that...
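
To make the failure mode concrete, here is a minimal sketch, assuming a
current Python 3 threading module. SlottedFlag is a hypothetical
stand-in class, not threading.Thread itself; it only shows how
__slots__ would reject a misspelled attribute name:

```python
import threading


def worker():
    pass


t = threading.Thread(target=worker, daemon=False)

# Typo: this silently creates a brand-new attribute named "deamon".
t.deamon = True

# The real flag is untouched, so the thread is still non-daemonic.
typo_was_silent = (t.daemon is False) and (t.deamon is True)


class SlottedFlag:
    # Hypothetical stand-in, not threading.Thread: with __slots__ and no
    # __dict__, assigning to an undeclared name raises AttributeError.
    __slots__ = ("daemon",)

    def __init__(self):
        self.daemon = False


flag = SlottedFlag()
try:
    flag.deamon = True
    typo_rejected = False
except AttributeError:
    typo_rejected = True

print(typo_was_silent, typo_rejected)
```

Note that __slots__ only has this effect when no class in the
hierarchy provides a __dict__, so it would have to be declared on
Thread itself rather than on a user subclass.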

IMHO, this is perhaps late for 3.0, but definitely a good thing
to add for 3.1.

>
> -jesse
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/abpillai%40gmail.com
>

Regards,

-- 
-Anand

From martin at v.loewis.de  Mon Sep  8 06:35:28 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 08 Sep 2008 06:35:28 +0200
Subject: [Python-3000] PEP 3108 and the demise of bsddb3
In-Reply-To: <79990c6b0809040719y7836b635vc30a6e46f0ce0182@mail.gmail.com>
References: <1220484055.04.0.310220074498.issue3769@psf.upfronthosting.co.za>	<1afaf6160809031622j3b63a08h912686a839bd0ed@mail.gmail.com>	<18623.18700.76260.893902@montanaro-dyndns-org.local>	<bbaeab100809032026j643492f8w6b61169859e1bf05@mail.gmail.com>
	<79990c6b0809040719y7836b635vc30a6e46f0ce0182@mail.gmail.com>
Message-ID: <48C4AB90.1020601@v.loewis.de>

> On Windows, none are available except dbm.dumb and bsddb (presently).
> If bsddb is to be removed, can/should one of the other "real" dbm
> variants be added to the standard binary, so that Windows users have
> at least one usable dbm option?

Which one specifically? What are the licensing implications?

For 3.0, I think that is too late.

Regards,
Martin

From martin at v.loewis.de  Mon Sep  8 07:00:48 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 08 Sep 2008 07:00:48 +0200
Subject: [Python-3000] PyUnicodeObject implementation
In-Reply-To: <ga0q63$vrd$1@ger.gmane.org>
References: <200809051954.42787.jeremy.kloth@gmail.com>	<loom.20080906T103737-758@post.gmane.org>	<g9vv2u$3ot$1@ger.gmane.org>	<ca471dc20809070738v39fdfcccg64d052ba35616bbb@mail.gmail.com>
	<ga0q63$vrd$1@ger.gmane.org>
Message-ID: <48C4B180.2050301@v.loewis.de>

>> Can you explain this a bit more? I presume you're talking about
>> subclassing in C
> 
> Yes, I mentioned Cython above.

Can you please still elaborate? I have never used Cython before, but
if it cannot efficiently subclass str, isn't that a bug in Cython?

>> I do note that the mechanisms that exist for supporting adding a __dict__
>> to a str (in 2.x; or bytes in 3.x) or a tuple could be extended for other
>> purposes.
> 
> I never looked into these, but this does not sound like it would impact
> subclassing.

To me, the relationship is fairly straight: if you want to subclass a
type, *all* you need is a way to place an __dict__ in the object, if
it doesn't already have one. If the base object already has an __dict__,
the layout of the subtype can be the same as the layout of the base
type.

Now, what Guido (probably) refers to is the implementation strategy
used for adding __dict__ could be generalized for adding additional
slots as well: for a variable-sized object (str or tuple), the
dictoffset is negative, indicating that you have to count from the
end of the object, not from the start, to find the slot. So if you
are worried about __dict__-stored attributes being too slow (*), this
approach could be a solution.

(*) This assumes that the lack of additional slots actually *is* your
concern.
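
The negative offset can be observed from Python. This is only a sketch,
assuming CPython; the class names are made up for illustration, and the
exact offset value differs between versions and platforms, so only its
sign is checked here:

```python
class PlainTuple(tuple):
    """Subclass with an instance __dict__ (no __slots__ declared)."""


class SlottedTuple(tuple):
    """Subclass with no per-instance __dict__."""
    __slots__ = ()


p = PlainTuple((1, 2, 3))
p.label = "extra attribute stored in the instance __dict__"

# On CPython the dict pointer of a variable-sized subtype is recorded
# as a negative offset, i.e. relative to the end of the object.
print(PlainTuple.__dictoffset__)

# A subtype that adds nothing needs no dict slot at all.
print(SlottedTuple.__dictoffset__)
```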

Regards,
Martin

From martin at v.loewis.de  Mon Sep  8 07:07:46 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 08 Sep 2008 07:07:46 +0200
Subject: [Python-3000] PyUnicodeObject implementation
In-Reply-To: <C5B0E433-C918-413B-8109-C91121B3904C@fuhm.net>
References: <200809051954.42787.jeremy.kloth@gmail.com>	<loom.20080906T103737-758@post.gmane.org>	<g9vv2u$3ot$1@ger.gmane.org>	<ca471dc20809070738v39fdfcccg64d052ba35616bbb@mail.gmail.com>	<48C4464E.5010707@gmail.com>	<ca471dc20809071555g337d098au10b5cae210dfa0f9@mail.gmail.com>
	<C5B0E433-C918-413B-8109-C91121B3904C@fuhm.net>
Message-ID: <48C4B322.2060401@v.loewis.de>

> Given that you can, today, subclass str in Python, without wasting an
> extra 4/8 bytes of memory, or adding anything new to the class object,
> why wouldn't anyone who really wanted to make a hypothetical optimized
> subclass just use the same mechanism (putting your additional data
> *after* the character data) to subclass it in C?
> 
> It may be a little tricky, but not exactly rocket science

I believe many people do consider it rocket science, or at least so much
out of their reach that it doesn't actually come to their mind as a
possible solution.

I'm really curious about Stefan's explanation why efficient subclassing
of str is not possible in Cython (is it not possible at all? is it
possible but inefficient? if so, how much, and why?)

Regards,
Martin

From abpillai at gmail.com  Mon Sep  8 07:40:45 2008
From: abpillai at gmail.com (Anand Balachandran Pillai)
Date: Mon, 8 Sep 2008 11:10:45 +0530
Subject: [Python-3000] Fwd: Beta 3 planned for this Wednesday (OT: Beta
	3 planned for this Wednesday)
In-Reply-To: <52dc1c820809061353w6bd21232jf08ccad6b7bde19f@mail.gmail.com>
References: <8548c5f30808200020i3a48c512hb9bf5fbbc149dfbf@mail.gmail.com>
	<52dc1c820809032217h2094f4cds3792da66f6489d91@mail.gmail.com>
	<8548c5f30809032242w57c8d6f2j612a791a84b5f53c@mail.gmail.com>
	<52dc1c820809050017p3b1f487cl5601e27ae51d47f1@mail.gmail.com>
	<ca471dc20809051037t69dd4b86pb078a633d34de8cd@mail.gmail.com>
	<52dc1c820809061353w6bd21232jf08ccad6b7bde19f@mail.gmail.com>
Message-ID: <8548c5f30809072240n26dce363h2545e5147da1f37@mail.gmail.com>

Hi Gregory,

         If you need help in testing out the bytearray related patches
on various platforms (#3797, #3492) let me know.
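
For anyone following along, the user-visible difference at stake in
#3492 can be sketched as follows. This assumes a Python 3 release in
which zlib already returns bytes, which is the behaviour the issue asks
for:

```python
import zlib

payload = zlib.decompress(zlib.compress(b"spam"))

# bytes is immutable and hashable, so it can be used as a dict key.
cache = {payload: "ok"}

# bytearray is mutable and therefore unhashable.
try:
    hash(bytearray(payload))
    bytearray_hashable = True
except TypeError:
    bytearray_hashable = False

print(type(payload).__name__, bytearray_hashable)
```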

Regards

--Anand

On Sun, Sep 7, 2008 at 2:23 AM, Gregory P. Smith <greg at krypto.org> wrote:
> issue 3797 created with trivial patches for the remaining bytearray
> returning abusers.  review needed.
>
> I don't have a build environment for windows to test the PC/winreg one
> on but it's too simple to be wrong.
>
> On Fri, Sep 5, 2008 at 10:37 AM, Guido van Rossum <guido at python.org> wrote:
>> This needs to be fixed. It is surely a relic from the alpha1 situation
>> where the bytes type was mutable. No read APIs should return mutable
>> bytes. Write APIs should accept mutable and immutable bytes though.
>>
>> On Fri, Sep 5, 2008 at 12:17 AM, Gregory P. Smith <greg at krypto.org> wrote:
>>> Anyone have an opinion on http://bugs.python.org/issue3492 in regards
>>> to it being a release blocker?
>>>
>>> The gist of it:  zlib returns bytearrays where other modules return
>>> bytes.  zipimport, because it uses zlib, required bytearrays instead
>>> of bytes as input.  A few other modules also appear to return
>>> bytearrays when they're likely better off returning bytes for
>>> consistency.
>>>
>>> IMHO, it seems like bytearrays should rarely be returned by the
>>> existing standard library apis.  Since they are mutable they are
>>> ideally suited for new APIs where they're passed in and modified.
>>>
>>> What's the big deal if this is not fixed before release?  Users are
>>> likely to get frustrated at inputs not being hashable without explicit
>>> (data copy) conversion to an immutable type.  And any code that gets
>>> written depending on these returning bytearrays instead of bytes would
>>> need fixing if we waited until 3.1 to fix it.
>>>
>>> -gps
>>>
>>> On Wed, Sep 3, 2008 at 10:42 PM, Anand Balachandran Pillai
>>> <abpillai at gmail.com> wrote:
>>>> On Thu, Sep 4, 2008 at 10:47 AM, Gregory P. Smith <greg at krypto.org> wrote:
>>>>> I agree that this should go in.  zlib should return bytes.  other read
>>>>> functions and similar modules like bz2module already return bytes.
>>>>> unless i hear objections, i'll commit this in about 12 hours.
>>>>
>>>> +1  :)
>>>>
>>>>>
>>>>
>>>> Regards
>>>>
>>>> --
>>>> -Anand
>>>>
>>>
>>
>>
>>
>> --
>> --Guido van Rossum (home page: http://www.python.org/~guido/)
>>
>

-- 
-Anand

From stefan_ml at behnel.de  Mon Sep  8 08:56:17 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 08 Sep 2008 08:56:17 +0200
Subject: [Python-3000] PyUnicodeObject implementation
In-Reply-To: <48C4B180.2050301@v.loewis.de>
References: <200809051954.42787.jeremy.kloth@gmail.com>	<loom.20080906T103737-758@post.gmane.org>	<g9vv2u$3ot$1@ger.gmane.org>	<ca471dc20809070738v39fdfcccg64d052ba35616bbb@mail.gmail.com>	<ga0q63$vrd$1@ger.gmane.org>
	<48C4B180.2050301@v.loewis.de>
Message-ID: <ga2iah$6e7$1@ger.gmane.org>

Martin v. Löwis wrote:
>>> Can you explain this a bit more? I presume you're talking about
>>> subclassing in C
>> Yes, I mentioned Cython above.
> 
> Can you please still elaborate? I have never used Cython before

Should have been clearer, sorry. C-level subtyping in Cython/Pyrex works as
follows. We create a new struct for the type that contains the parent-struct
as first field, and then we add the new attributes of the new type behind
that. This implies that this kind of subtyping is single inheritance (as
opposed to normal Python subclassing, which is the same in Pyrex/Cython and
Python). This currently works for all builtin types, except str. It results in
a very regular memory layout for extension types.

The way it's written in Pyrex/Cython is:

    cdef class MyListSubType(PyListObject):
        cdef int some_additional_int_field
        cdef my_struct* some_struct

        def __init__(self):
            self.some_struct = get_the_struct_pointer(...)
            self.some_additional_int_field = 1

PyListObject will become a struct member called "__pyx_base" in the new struct
for MyListSubType, and access to members of the base type does a straight

    self->__pyx_base-> (... possibly more __pyx_base derefs ...) -> field_name

The C compiler will make this a straight "self[index_of_field_name]" pointer
deref, unbeatable in speed.
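
The layout rule can be checked without a C compiler. The following is
a sketch using ctypes; BaseStruct and its two fields are hypothetical
stand-ins for the parent object's struct, not the real PyListObject
definition:

```python
import ctypes


class BaseStruct(ctypes.Structure):
    # Hypothetical stand-in for the parent object's C struct.
    _fields_ = [
        ("refcount", ctypes.c_ssize_t),
        ("type_ptr", ctypes.c_void_p),
    ]


class SubStruct(ctypes.Structure):
    # Parent struct embedded as the first member, new fields behind it.
    _fields_ = [
        ("base", BaseStruct),
        ("some_additional_int_field", ctypes.c_int),
        ("some_struct", ctypes.c_void_p),
    ]


base_offset = SubStruct.base.offset
int_offset = SubStruct.some_additional_int_field.offset
print(base_offset, int_offset, ctypes.sizeof(BaseStruct))
```

Because the parent sits at offset 0, a pointer to the subtype is also a
valid pointer to the parent, and every new field has an offset that is
fixed at compile time.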

The exact memory layout only needs to be available at C compile time. Also,
the exact members of the parent type(s) are not required at Cython compile
time (only those used in the code), as the C compiler will get them right when
it reads their header file.

> if it cannot efficiently subclass str, isn't that a bug in Cython?

I wouldn't mind letting Cython special case subtypes of str (or unicode in
Py3) *somehow*, as long as this "somehow" proves to be a viable solution that
only applies to exactly those types *and* can be done reliably for subtypes
of subtypes. I'm just not aware of such a solution.

>>> I do note that the mechanisms that exist for supporting adding a __dict__
>>> to a str (in 2.x; or bytes in 3.x) or a tuple could be extended for other
>>> purposes.
>> I never looked into these, but this does not sound like it would impact
>> subclassing.
> 
> To me, the relationship is fairly straight: if you want to subclass a
> type, *all* you need is a way to place an __dict__ in the object, if
> it doesn't already have one. If the base object already has an __dict__,
> the layout of the subtype can be the same as the layout of the base
> type.

As long as you accept the dictionary indirection and type unpacking for
accessing fields even in the context of private C-level type members of an
extension type, which are currently accessible through straight pointers.

There is a huge performance difference between e.g. a) dereferencing a pointer
to a C int, and b) asking a dictionary for a name, have it find the result,
check if the result is empty, check if the result is a Python long or int (or
a pointer object, or whatever), unpack the result into a C int. Plus the need
to raise an exception in the error case, plus the Python-level visibility of
internal C-level fields (such as arbitrary pointers), plus the inability to do
this without holding the GIL. Plus the casting all over the place when it's
not a C int but a struct pointer, for example.

> Now, what Guido (probably) refers to is the implementation strategy
> used for adding __dict__ could be generalized for adding additional
> slots as well: for a variable-sized object (str or tuple), the
> dictoffset is negative, indicating that you have to count from the
> end of the object, not from the start, to find the slot.

This does sound interesting, but I will have to look into the implications. As
I said, it has to be a viable solution without (noticeable) impact on other
types. I'm not sure how this would interact with subtypes of subtypes, and
what the memory layout would be in that case.

Stefan

From solipsis at pitrou.net  Mon Sep  8 12:19:54 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 8 Sep 2008 10:19:54 +0000 (UTC)
Subject: [Python-3000] PyUnicodeObject implementation
References: <200809051954.42787.jeremy.kloth@gmail.com>	<loom.20080906T103737-758@post.gmane.org>	<g9vv2u$3ot$1@ger.gmane.org>	<ca471dc20809070738v39fdfcccg64d052ba35616bbb@mail.gmail.com>	<ga0q63$vrd$1@ger.gmane.org>
	<48C4B180.2050301@v.loewis.de> <ga2iah$6e7$1@ger.gmane.org>
Message-ID: <loom.20080908T100214-459@post.gmane.org>

Stefan Behnel <stefan_ml <at> behnel.de> writes:
> 
>     cdef class MyListSubType(PyListObject):
>         cdef int some_additional_int_field
>         cdef my_struct* some_struct
> 
>         def __init__(self):
>             self.some_struct = get_the_struct_pointer(...)
>             self.some_additional_int_field = 1

In your example, you could wrap the additional fields (additional_int_field and
some_struct) in a dedicated struct, and define a macro which gives a pointer to
this struct when given the address of the object. Once you have the pointer to
the struct, accessing additional fields is as simple as in the non-PyVarObject
case.

Something like (pseudocode):

#define MyStrSubType_FIELDS_ADDR(op) \
  ((struct MyStrSubType_subfields *) ((char *)(op) + PyString_Type.tp_basicsize \
      + Py_SIZE(op) * PyString_Type.tp_itemsize))

It's not as trivially cheap as a straight field access, but much less expensive
than a dictionary lookup.

(perhaps this needs to be a bit more complicated if you want a specific
alignment for your fields)
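
The same arithmetic can be tried from Python, since the type's
tp_basicsize and tp_itemsize are exposed as __basicsize__ and
__itemsize__. This is a sketch assuming CPython; trailing_offset and
align_up are hypothetical helper names, and bytes stands in for the
2.x str type:

```python
def trailing_offset(obj, base_type):
    """Byte offset just past the variable-sized data of obj."""
    return base_type.__basicsize__ + len(obj) * base_type.__itemsize__


def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment (a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)


data = b"hello"
offset = trailing_offset(data, bytes)

# Extra fields would start here, rounded up if they need alignment.
print(offset, align_up(offset, 8))
```

The offset depends on the length of each instance, which is why it
cannot be a compile-time constant as in the fixed-size case.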

From p.f.moore at gmail.com  Mon Sep  8 13:04:35 2008
From: p.f.moore at gmail.com (Paul Moore)
Date: Mon, 8 Sep 2008 12:04:35 +0100
Subject: [Python-3000] [Python-Dev] Not releasing rc1 tonight
In-Reply-To: <78b3a9580809071734u5967f305mbff6120dfca538b7@mail.gmail.com>
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>
	<ga0ppm$udq$1@ger.gmane.org>
	<9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org>
	<78b3a9580809071734u5967f305mbff6120dfca538b7@mail.gmail.com>
Message-ID: <79990c6b0809080404h4ae2f636xe77b46ede7c437ec@mail.gmail.com>

2008/9/8 wesley chun <wescpy at gmail.com>:
> the goal is admirable, but unless there are paying sponsors that
> require this deadline be met, i'd suggest that we can push the
> releases until they're ready.  the changes that 2.6 and 3.0 bring are
> too major to be released before they are ready for primetime.

I believe that the reason for the Oct 1st deadline is that, if we hit
it, the new versions will be included in some vendor OS releases (I
don't know the exact details, but that's my recollection).

> also, there hasn't been a beta3 download available for Win users
> (aside from the developers who can build it) since Martin has been on
> vacation... they will effectively be leapfrogged from b2 directly to
> rc1. i think he comes back tomorrow, so if rc1 really is going out
> soon, would it make sense for him to make b3 MSI files too?

I agree that the lack of Windows installers is somewhat frustrating
(not that I begrudge Martin his holiday!) but in practice I wonder how
much impact it has. I've used the earlier betas and alphas, but most
of my code relies on one or more external packages, so I tend to have
to wait for 2.6 (or 3.0) compatible binaries of those. The only one
readily available is pywin32, where there's a 2.6 version (but still
no 3.0). I don't know how common my situation is, but certainly the
Windows betas don't get as much testing by me as I'd like.

Paul.

From solipsis at pitrou.net  Mon Sep  8 13:24:03 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 8 Sep 2008 11:24:03 +0000 (UTC)
Subject: [Python-3000] os.write accepts unicode strings
Message-ID: <loom.20080908T112246-241@post.gmane.org>

Hello,

I thought I'd mention the following issue before it's too late to possibly fix
it in 3.0. Basically, os.write() accepts str as well as bytes objects, which
doesn't sound right.

http://bugs.python.org/issue3782
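
A sketch of the stricter behaviour one would expect, assuming a Python 3
release in which os.write() only takes bytes-like objects (as far as I
know this is how current releases behave):

```python
import os

read_fd, write_fd = os.pipe()
try:
    # bytes: accepted, returns the number of bytes written.
    written = os.write(write_fd, b"data")

    # str: expected to be rejected rather than implicitly encoded.
    try:
        os.write(write_fd, "data")
        str_rejected = False
    except TypeError:
        str_rejected = True

    # Encoding explicitly makes the intended byte sequence unambiguous.
    written_text = os.write(write_fd, "data".encode("utf-8"))
finally:
    os.close(read_fd)
    os.close(write_fd)

print(written, str_rejected, written_text)
```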

Regards

Antoine.

From ncoghlan at gmail.com  Mon Sep  8 13:25:11 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 08 Sep 2008 21:25:11 +1000
Subject: [Python-3000] About "daemon" in threading module
In-Reply-To: <8548c5f30809072110y9faeb0bw2a582e36d8794ff3@mail.gmail.com>
References: <48BFF8D9.3030002@jcea.es>	
	<loom.20080904T154530-598@post.gmane.org>	
	<ca471dc20809041256m5fec338fy4f5081a47260c4c3@mail.gmail.com>	
	<48C05131.60209@gmail.com> <48C159FE.7070400@jcea.es>	
	<4222a8490809051006h78fe49fcm1385a105688212de@mail.gmail.com>	
	<48C1ED22.5040002@gmail.com>	
	<4222a8490809051944x20be939ap196ab565291629d4@mail.gmail.com>
	<8548c5f30809072110y9faeb0bw2a582e36d8794ff3@mail.gmail.com>
Message-ID: <48C50B97.1060409@gmail.com>

Anand Balachandran Pillai wrote:
> I guess adding __slots__ to Thread class is the best approach
> for this. +1 for that...
> 
> IMHO, this is perhaps late for 3.0, but definitely a good thing
> to add for 3.1.

I'd want at least one major release with a deprecation warning on
Thread's __setattr__ before we did anything like blocking the addition
of new attributes. Thread has been exposed as a normal python class for
a long time, and there is sure to be code out there that relies on
setting new attributes on Thread instances.

And I still don't know what makes daemon so special that it needs typo
protection when almost everything else in the standard library doesn't
have it. Given the amount of memory that is going to be allocated for
the new thread's stack, the saving of the space for an empty __dict__
slot also isn't a particularly significant gain.

(I deliberately pronounce daemon as day-mon though, so I don't forget
how to spell it - perhaps pronouncing it as dee-mon makes it harder to
remember the order of the 'a' and the 'e'?)

If it is just the specific typo as 'deamon' that concerns people, adding
a property specifically to raise an exception for that name would be far
less hassle than locking down the attributes of all Thread instances.
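
Roughly what I have in mind, written here as a subclass so it can be
tried without patching the threading module. GuardedThread is a
hypothetical name, and this is a sketch rather than a proposed patch:

```python
import threading


class GuardedThread(threading.Thread):
    """Hypothetical subclass that traps only the 'deamon' misspelling."""

    @property
    def deamon(self):
        raise AttributeError("'deamon' is a typo; did you mean 'daemon'?")

    @deamon.setter
    def deamon(self, value):
        raise AttributeError("'deamon' is a typo; did you mean 'daemon'?")


t = GuardedThread(target=lambda: None)
t.daemon = True            # correct spelling still works
t.note = "other ad-hoc attributes remain allowed"

try:
    t.deamon = True        # the specific typo now fails loudly
    typo_caught = False
except AttributeError:
    typo_caught = True

print(t.daemon, typo_caught)
```

Existing code that sets its own attributes on Thread instances keeps
working, which is the main advantage over __slots__.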

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
            http://www.boredomandlaziness.org

From barry at python.org  Mon Sep  8 15:16:07 2008
From: barry at python.org (Barry Warsaw)
Date: Mon, 8 Sep 2008 09:16:07 -0400
Subject: [Python-3000] [Python-Dev] Not releasing rc1 tonight
In-Reply-To: <ga1ck7$ldi$1@ger.gmane.org>
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>	<ga0ppm$udq$1@ger.gmane.org>
	<9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org>
	<ga1ck7$ldi$1@ger.gmane.org>
Message-ID: <F1FCDA57-E3A8-44F6-AB40-83889C4CC3FA@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sep 7, 2008, at 4:12 PM, Fredrik Lundh wrote:

> Barry Warsaw wrote:
>
>>> (I have a few minor ET fixes, and possibly a Unicode 5.1 patch,  
>>> but have had absolutely no time to spend on that.  is the window  
>>> still open?)
>> There are 8 open release blockers, a few of which have patches that  
>> need review.  So I think we are still not ready to release rc1.
>
> So what's the new ETA?  Should I set aside some time to work on the  
> patches, say, tomorrow, or is it too late?

It's not too late.  If they fix bugs and the code gets reviewed then  
yes, you can check them in.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSMUlmHEjvBPtnXfVAQJ51QP7BdUGcKN4+L9vD+g7y2TI0+TSw4Ms+eAc
yXprcbQnfGp1+uxzjiTCeAv0OSAodw4aakAaI4wzrAkKYNmsVaWOiGKiKrLvR7+Y
++qBxxxVwlKL606hlJCKgphD4hbZcW1w3wY94CXkmrTqyZe/XrStvBj7X10gWeYW
lwC3ATaQQ5Y=
=tyym
-----END PGP SIGNATURE-----

From barry at python.org  Mon Sep  8 15:17:46 2008
From: barry at python.org (Barry Warsaw)
Date: Mon, 8 Sep 2008 09:17:46 -0400
Subject: [Python-3000] [Python-Dev] Not releasing rc1 tonight
In-Reply-To: <79990c6b0809080404h4ae2f636xe77b46ede7c437ec@mail.gmail.com>
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>
	<ga0ppm$udq$1@ger.gmane.org>
	<9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org>
	<78b3a9580809071734u5967f305mbff6120dfca538b7@mail.gmail.com>
	<79990c6b0809080404h4ae2f636xe77b46ede7c437ec@mail.gmail.com>
Message-ID: <98B2280E-5101-44AE-B7E4-4A880F78A0B2@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sep 8, 2008, at 7:04 AM, Paul Moore wrote:

> 2008/9/8 wesley chun <wescpy at gmail.com>:
>> the goal is admirable, but unless there are paying sponsors that
>> require this deadline be met, i'd suggest that we can push the
>> releases until they're ready.  the changes that 2.6 and 3.0 bring are
>> too major to be released before they are ready for primetime.
>
> I believe that the reason for the Oct 1st deadline is that, if we hit
> it, the new versions will be included in some vendor OS releases (I
> don't know the exact details, but that's my recollection).

This is what I've been told.  I haven't been told that if we miss the  
mark, it /won't/ be included but that's my assumption.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSMUl+nEjvBPtnXfVAQJHsQQAhCt1HfqB3JooC0KZXzUryRJUNMdC7QZh
KiX1dayV8q0R2QZtJFBaxP05uqCMEP0uxnWGwmyUm3LT4Idmde6ZGcTnBO160HgL
bjwYGYDMtS7X9PxQjMyszVY1gwIX4iFX4KhYtqXKrtodMrqwSbuH69b5cM/0RZ9s
DUUPYS/qKjo=
=9zBO
-----END PGP SIGNATURE-----

From barry at python.org  Mon Sep  8 15:23:37 2008
From: barry at python.org (Barry Warsaw)
Date: Mon, 8 Sep 2008 09:23:37 -0400
Subject: [Python-3000] Proposed revised schedule
In-Reply-To: <ga1ck7$ldi$1@ger.gmane.org>
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>	<ga0ppm$udq$1@ger.gmane.org>
	<9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org>
	<ga1ck7$ldi$1@ger.gmane.org>
Message-ID: <D1F8B586-F45C-4B8C-9887-612A6A794A45@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I don't think there's any way we're going to make our October 1st  
goal.  We have 8 open release critical bugs, and 18 deferred  
blockers.  We do not have a beta3 Windows installer and I don't have  
high hopes for rectifying all of these problems in the next day or two.

I propose that we push the entire schedule back two weeks.  This means  
that the planned rc2 on 17-September becomes our rc1.  The planned  
final release for 01-October becomes our rc2, and we release the  
finals on 15-October.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSMUnWXEjvBPtnXfVAQIEAQQAnut+CRyBAacC2zzptb5l9cphwke0sEjx
THJXHCBUfidaEV7SCtyfkh6i+IpqynvFRsKyOYSWsMojAa5rO/iM6ZJLkUav9c62
IzweJ6Nw3UnOJ/7xksCesDVxDRncFtvu0eRUZWDkOsrNawL+Z21DGKtAuau/pgiY
sFnKeyP7NX0=
=ZNPm
-----END PGP SIGNATURE-----

From guido at python.org  Mon Sep  8 19:13:08 2008
From: guido at python.org (Guido van Rossum)
Date: Mon, 8 Sep 2008 10:13:08 -0700
Subject: [Python-3000] Proposed revised schedule
In-Reply-To: <D1F8B586-F45C-4B8C-9887-612A6A794A45@python.org>
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>
	<ga0ppm$udq$1@ger.gmane.org>
	<9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org>
	<ga1ck7$ldi$1@ger.gmane.org>
	<D1F8B586-F45C-4B8C-9887-612A6A794A45@python.org>
Message-ID: <ca471dc20809081013s63ed2bc2ja10e0fa75059642f@mail.gmail.com>

On Mon, Sep 8, 2008 at 6:23 AM, Barry Warsaw <barry at python.org> wrote:
> I don't think there's any way we're going to make our October 1st goal.  We
> have 8 open release critical bugs, and 18 deferred blockers.  We do not have
> a beta3 Windows installer and I don't have high hopes for rectifying all of
> these problems in the next day or two.
>
> I propose that we push the entire schedule back two weeks.  This means that
> the planned rc2 on 17-September becomes our rc1.  The planned final release
> for 01-October becomes our rc2, and we release the finals on 15-October.
>
> - -Barry

Perhaps it's time to separate the 2.6 and 3.0 release schedules? I
don't care if the next version of OSX contains 3.0 or not -- but I do
care about it having 2.6.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From tjreedy at udel.edu  Mon Sep  8 22:10:07 2008
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 08 Sep 2008 16:10:07 -0400
Subject: [Python-3000] About "daemon" in threading module
In-Reply-To: <48C50B97.1060409@gmail.com>
References: <48BFF8D9.3030002@jcea.es>		<loom.20080904T154530-598@post.gmane.org>		<ca471dc20809041256m5fec338fy4f5081a47260c4c3@mail.gmail.com>		<48C05131.60209@gmail.com>
	<48C159FE.7070400@jcea.es>		<4222a8490809051006h78fe49fcm1385a105688212de@mail.gmail.com>		<48C1ED22.5040002@gmail.com>		<4222a8490809051944x20be939ap196ab565291629d4@mail.gmail.com>	<8548c5f30809072110y9faeb0bw2a582e36d8794ff3@mail.gmail.com>
	<48C50B97.1060409@gmail.com>
Message-ID: <ga40qv$dvp$1@ger.gmane.org>

Nick Coghlan wrote:

> (I deliberately pronounce daemon as day-mon though, so I don't forget
> how to spell it - perhaps pronouncing it as dee-mon makes it harder to
> remember the order of the 'a' and the 'e'?)
> 
> If it is just the specific typo as 'deamon' that concerns people, adding
> a property specifically to raise an exception for that name would be far
> less hassle than locking down the attributes of all Thread instances.

Different people have different mis-spelling quirks.  I might type demon 
(in other contexts) but never deamon instead of daemon.  There are other 
stdlib attributes I am more likely to misspell, so worrying about just 
this one, to the point of changing the implementation, seems a bit 
mis-directed.

From ncoghlan at gmail.com  Mon Sep  8 23:01:41 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 09 Sep 2008 07:01:41 +1000
Subject: [Python-3000] About "daemon" in threading module
In-Reply-To: <ga40qv$dvp$1@ger.gmane.org>
References: <48BFF8D9.3030002@jcea.es>		<loom.20080904T154530-598@post.gmane.org>		<ca471dc20809041256m5fec338fy4f5081a47260c4c3@mail.gmail.com>		<48C05131.60209@gmail.com>	<48C159FE.7070400@jcea.es>		<4222a8490809051006h78fe49fcm1385a105688212de@mail.gmail.com>		<48C1ED22.5040002@gmail.com>		<4222a8490809051944x20be939ap196ab565291629d4@mail.gmail.com>	<8548c5f30809072110y9faeb0bw2a582e36d8794ff3@mail.gmail.com>	<48C50B97.1060409@gmail.com>
	<ga40qv$dvp$1@ger.gmane.org>
Message-ID: <48C592B5.6060203@gmail.com>

Terry Reedy wrote:
> Different people have different mis-spelling quirks.  I might type demon
> (in other contexts) but never deamon instead of daemon.  There are other
> stdlib attributes I am more likely to misspell, so worrying about just
> this one, to the point of changing the implementation, seems a bit
> mis-directed.

I actually agree, but the concern about misspelling daemon in particular
was raised by a couple of folks (my own opinion is that failing to set a
thread's daemon status correctly should be picked up by even a pretty
basic unit test suite).

However, if anything at all was to be done about this, explicitly
intercepting a couple of common spelling errors (such as 'demon' and
'deamon') struck me as a lower impact approach than completely blocking
the addition of new attributes to Thread instances.
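
A minimal sketch of that lower-impact approach, assuming a hypothetical
subclass named CheckedThread and a hypothetical helper _misspelled
(neither exists in the threading module). Only the two misspellings are
intercepted; any other attribute can still be set freely:

```python
import threading


def _misspelled(wrong):
    """Build a property that rejects any use of a misspelled attribute name."""
    def fail(self, *args):
        raise AttributeError(
            "%r looks like a misspelling of 'daemon'" % wrong)
    # The same function serves as getter (self) and setter (self, value).
    return property(fail, fail)


class CheckedThread(threading.Thread):
    """Hypothetical Thread subclass; not part of the threading module."""
    deamon = _misspelled("deamon")
    demon = _misspelled("demon")
```

With this in place, "t.daemon = True" works as usual, while
"t.deamon = True" raises AttributeError instead of silently creating a
new attribute.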

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
            http://www.boredomandlaziness.org

From martin at v.loewis.de  Tue Sep  9 00:12:36 2008
From: martin at v.loewis.de (Martin v. Löwis)
Date: Tue, 09 Sep 2008 00:12:36 +0200
Subject: [Python-3000] PyUnicodeObject implementation
In-Reply-To: <ga2iah$6e7$1@ger.gmane.org>
References: <200809051954.42787.jeremy.kloth@gmail.com>	<loom.20080906T103737-758@post.gmane.org>	<g9vv2u$3ot$1@ger.gmane.org>	<ca471dc20809070738v39fdfcccg64d052ba35616bbb@mail.gmail.com>	<ga0q63$vrd$1@ger.gmane.org>	<48C4B180.2050301@v.loewis.de>
	<ga2iah$6e7$1@ger.gmane.org>
Message-ID: <48C5A354.6070908@v.loewis.de>

> I wouldn't mind letting Cython special case subtypes of str (or unicode in
> Py3) *somehow*, as long as this "somehow" proves to be a viable solution that
> only applies to exactly those types *and* can be done reliably for subtypes
> of subtypes. I'm just not aware of such a solution.

As people have pointed out: add new fields *after* the variable-sized
members. To access it, you need to compute the length of the base
object, and then cast the pointer to an extension struct.

That extends to further subtypes, too.

Access is slightly slower, i.e. it's not a compile-time constant, but

  base_address + base_address[ob_len]*elem_size - more_fields_size

This still compiles efficiently, e.g. on x86, gcc compiles a struct
field access to

  movl    20(%eax), %eax

and an access with a var-sized offset into

  movl    8(%eax), %edx; fetch length into edx
  movl    -20(%eax,%edx,2), %eax; access 20-byte sized struct,
                                ; assuming elements of size 2

> This does sound interesting, but I will have to look into the implications. As
> I said, it has to be a viable solution without (noticeable) impact on other
> types. I'm not sure how this would interact with subtypes of subtypes, and
> what the memory layout would be in that case.

See above.

Regards,
Martin

From martin at v.loewis.de  Tue Sep  9 00:16:38 2008
From: martin at v.loewis.de (Martin v. Löwis)
Date: Tue, 09 Sep 2008 00:16:38 +0200
Subject: [Python-3000] PyUnicodeObject implementation
In-Reply-To: <48C5A354.6070908@v.loewis.de>
References: <200809051954.42787.jeremy.kloth@gmail.com>	<loom.20080906T103737-758@post.gmane.org>	<g9vv2u$3ot$1@ger.gmane.org>	<ca471dc20809070738v39fdfcccg64d052ba35616bbb@mail.gmail.com>	<ga0q63$vrd$1@ger.gmane.org>	<48C4B180.2050301@v.loewis.de>	<ga2iah$6e7$1@ger.gmane.org>
	<48C5A354.6070908@v.loewis.de>
Message-ID: <48C5A446.8040101@v.loewis.de>

>   base_address + base_address[ob_len]*elem_size - more_fields_size

The subtraction is wrong, of course - it's still an addition. I was
just confused by tp_dictoffset being negative in that case; the sign
is but a mere flag in that case, and the offset is still positive.
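
The corrected (additive) offset computation can be sketched in Python
by simulating an object's memory with a bytearray. Everything here is
an assumption made for illustration: the 4-byte header holding only an
item count, the 2-byte elements, the single 4-byte extension field, and
the function names. Alignment padding is ignored:

```python
import struct

HEADER = struct.Struct("<I")   # hypothetical fixed part: just an item count
ELEM_SIZE = 2                  # 2-byte elements, as in the assembly example
EXTRA = struct.Struct("<I")    # one field added by a hypothetical subtype


def pack_object(items, extra):
    """Lay out header, then variable-sized data, then the subtype field."""
    body = struct.pack("<%dH" % len(items), *items)
    return bytearray(HEADER.pack(len(items)) + body + EXTRA.pack(extra))


def extra_offset(buf):
    """Offset of the subtype field: fixed size + length * element size."""
    (length,) = HEADER.unpack_from(buf, 0)
    return HEADER.size + length * ELEM_SIZE


def read_extra(buf):
    """Fetch the subtype field, which has no compile-time-constant offset."""
    return EXTRA.unpack_from(buf, extra_offset(buf))[0]
```

The offset depends on the stored length, so it has to be recomputed per
object; that is the extra load shown in the second assembly sequence.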

Regards,
Martin

From musiccomposition at gmail.com  Tue Sep  9 01:13:29 2008
From: musiccomposition at gmail.com (Benjamin Peterson)
Date: Mon, 8 Sep 2008 18:13:29 -0500
Subject: [Python-3000] Proposed revised schedule
In-Reply-To: <ca471dc20809081013s63ed2bc2ja10e0fa75059642f@mail.gmail.com>
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>
	<ga0ppm$udq$1@ger.gmane.org>
	<9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org>
	<ga1ck7$ldi$1@ger.gmane.org>
	<D1F8B586-F45C-4B8C-9887-612A6A794A45@python.org>
	<ca471dc20809081013s63ed2bc2ja10e0fa75059642f@mail.gmail.com>
Message-ID: <1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com>

On Mon, Sep 8, 2008 at 12:13 PM, Guido van Rossum <guido at python.org> wrote:
> On Mon, Sep 8, 2008 at 6:23 AM, Barry Warsaw <barry at python.org> wrote:
>> I don't think there's any way we're going to make our October 1st goal.  We
>> have 8 open release critical bugs, and 18 deferred blockers.  We do not have
>> a beta3 Windows installer and I don't have high hopes for rectifying all of
>> these problems in the next day or two.
>>
>> I propose that we push the entire schedule back two weeks.  This means that
>> the planned rc2 on 17-September becomes our rc1.  The planned final release
>> for 01-October becomes our rc2, and we release the finals on 15-October.
>>
>> - -Barry
>
> Perhaps it's time to separate the 2.6 and 3.0 release schedules? I
> don't care if the next version of OSX contains 3.0 or not -- but I do
> care about it having 2.6.

I'm not really sure what good that would do us unless we wanted to
bring 3.0 back to the beta phase and continue to work on some larger
issues with it. I also suspect doing two separate, but close together
final releases would be more stressful than having them in lock and
step.

Just my pocket change, though.

-- 
Cheers,
Benjamin Peterson
"There's no place like 127.0.0.1."

From eric at trueblade.com  Tue Sep  9 01:15:04 2008
From: eric at trueblade.com (Eric Smith)
Date: Mon, 08 Sep 2008 19:15:04 -0400
Subject: [Python-3000] PyUnicodeObject implementation
In-Reply-To: <48C5A354.6070908@v.loewis.de>
References: <200809051954.42787.jeremy.kloth@gmail.com>	<loom.20080906T103737-758@post.gmane.org>	<g9vv2u$3ot$1@ger.gmane.org>	<ca471dc20809070738v39fdfcccg64d052ba35616bbb@mail.gmail.com>	<ga0q63$vrd$1@ger.gmane.org>	<48C4B180.2050301@v.loewis.de>	<ga2iah$6e7$1@ger.gmane.org>
	<48C5A354.6070908@v.loewis.de>
Message-ID: <48C5B1F8.5070707@trueblade.com>

Martin v. Löwis wrote:
>> I wouldn't mind letting Cython special case subtypes of str (or unicode in
>> Py3) *somehow*, as long as this "somehow" proves to be a viable solution that
>> only applies to exactly those types *and* can be done reliably for subtypes
>> of subtypes. I'm just not aware of such a solution.
> 
> As people have pointed out: add new fields *after* the variable-sized
> members. To access it, you need to compute the length of the base
> object, and then cast the pointer to an extension struct.

How about putting the variable sized data _before_ the struct?  That is, 
make the memory layout:

<string data>
<PyObject fields>
<PyUnicodeObject fields>
<derived object fields>

Admittedly, accessing the string data is now more complex, since you 
have to know where it starts (which we already know, based on the size). 
But that might be simpler than having the offset logic when accessing 
derived object fields, because that would be different from all other C 
objects.

There would be some complications when allocating, because of alignment 
issues, but I don't think it would be impossible to do this. We'd need 
to be careful when deallocating, as well (of course).

Eric.

From guido at python.org  Tue Sep  9 01:25:10 2008
From: guido at python.org (Guido van Rossum)
Date: Mon, 8 Sep 2008 16:25:10 -0700
Subject: [Python-3000] Proposed revised schedule
In-Reply-To: <1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com>
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>
	<ga0ppm$udq$1@ger.gmane.org>
	<9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org>
	<ga1ck7$ldi$1@ger.gmane.org>
	<D1F8B586-F45C-4B8C-9887-612A6A794A45@python.org>
	<ca471dc20809081013s63ed2bc2ja10e0fa75059642f@mail.gmail.com>
	<1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com>
Message-ID: <ca471dc20809081625s16f5c6f4m243375298748b02d@mail.gmail.com>

On Mon, Sep 8, 2008 at 4:13 PM, Benjamin Peterson
<musiccomposition at gmail.com> wrote:
> On Mon, Sep 8, 2008 at 12:13 PM, Guido van Rossum <guido at python.org> wrote:
>> On Mon, Sep 8, 2008 at 6:23 AM, Barry Warsaw <barry at python.org> wrote:
>>> I don't think there's any way we're going to make our October 1st goal.  We
>>> have 8 open release critical bugs, and 18 deferred blockers.  We do not have
>>> a beta3 Windows installer and I don't have high hopes for rectifying all of
>>> these problems in the next day or two.
>>>
>>> I propose that we push the entire schedule back two weeks.  This means that
>>> the planned rc2 on 17-September becomes our rc1.  The planned final release
>>> for 01-October becomes our rc2, and we release the finals on 15-October.
>>>
>>> - -Barry
>>
>> Perhaps it's time to separate the 2.6 and 3.0 release schedules? I
>> don't care if the next version of OSX contains 3.0 or not -- but I do
>> care about it having 2.6.
>
> I'm not really sure what good that would do us unless we wanted to
> bring 3.0 back to the beta phase and continue to work on some larger
> issues with it. I also suspect doing two separate, but close together
> final releases would be more stressful than having them in lock and
> step.

Well, from the number of release blockers it sounds like another 3.0
beta is the right thing. For 2.6 however I believe we're much closer
to the finish line -- there aren't all those bytes/str issues to clean
up, for example! And apparently the benefit of releasing on schedule
is that we will be included in OSX. That's a much bigger deal for 2.6
than for 3.0 (I doubt that Apple would add two versions anyway).

> Just my pocket change, though.
>
>
>
> --
> Cheers,
> Benjamin Peterson
> "There's no place like 127.0.0.1."
>

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From lists at cheimes.de  Tue Sep  9 02:11:49 2008
From: lists at cheimes.de (Christian Heimes)
Date: Tue, 09 Sep 2008 02:11:49 +0200
Subject: [Python-3000] Proposed revised schedule
In-Reply-To: <ca471dc20809081625s16f5c6f4m243375298748b02d@mail.gmail.com>
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>	<ga0ppm$udq$1@ger.gmane.org>	<9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org>	<ga1ck7$ldi$1@ger.gmane.org>	<D1F8B586-F45C-4B8C-9887-612A6A794A45@python.org>	<ca471dc20809081013s63ed2bc2ja10e0fa75059642f@mail.gmail.com>	<1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com>
	<ca471dc20809081625s16f5c6f4m243375298748b02d@mail.gmail.com>
Message-ID: <48C5BF45.9020807@cheimes.de>

Guido van Rossum wrote:
> Well, from the number of release blockers it sounds like another 3.0
> beta is the right thing. For 2.6 however I believe we're much closer
> to the finish line -- there aren't all those bytes/str issues to clean
> up, for example! And apparently the benefit of releasing on schedule
> is that we will be included in OSX. That's a much bigger deal for 2.6
> than for 3.0 (I doubt that Apple would add two versions anyway).

I'm on Guido's side.

Ok, from the marketing perspective it's a nice catch to release 2.6 and 
3.0 on the same day. "Python 2.6.0 and 3.0.0 released" makes a great 
headline.
But given the chance to get Python 2.6 into the next OSX version it's 
fine with me to release 3.0 a couple of weeks later. Python 3.0 is not 
ready for a release candidate. We just fixed a bunch of memory leaks and 
critical errors over the last week. And don't forget Windows! The 
Windows builds didn't get thorough testing because we didn't provide our 
testers with official builds.

I'm +1 for a 2.6rc and another beta of 3.0

Christian

From greg.ewing at canterbury.ac.nz  Tue Sep  9 02:10:19 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 09 Sep 2008 12:10:19 +1200
Subject: [Python-3000] PyUnicodeObject implementation
In-Reply-To: <ga2iah$6e7$1@ger.gmane.org>
References: <200809051954.42787.jeremy.kloth@gmail.com>
	<loom.20080906T103737-758@post.gmane.org> <g9vv2u$3ot$1@ger.gmane.org>
	<ca471dc20809070738v39fdfcccg64d052ba35616bbb@mail.gmail.com>
	<ga0q63$vrd$1@ger.gmane.org> <48C4B180.2050301@v.loewis.de>
	<ga2iah$6e7$1@ger.gmane.org>
Message-ID: <48C5BEEB.3030009@canterbury.ac.nz>

Stefan Behnel wrote:

> We create a new struct for the type that contains the parent-struct
> as first field, and then we add the new attributes of the new type behind
> that.

I seem to remember there's a field in the type called tp_basicsize
that's meant to indicate how big the base part of the struct is,
with any variable-size part placed after it.

If a variable-size type always uses this field to find the variable
data, it seems to me that the usual scheme for subclassing should
still work, with the extra fields existing in between those of the
base class and the new position of the variable data.

Does Py_Unicode not take notice of this field? If not, maybe that's
something that should be fixed.
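
The two size fields are visible from Python as the type attributes
__basicsize__ and __itemsize__. A small sketch, assuming a current
CPython 3 build (the helper name is made up, and the exact byte counts
vary by platform and version), using bytes as the variable-size type:

```python
import sys


def fixed_and_variable_size(obj):
    """Split an object's size into its fixed part and its variable part."""
    tp = type(obj)
    fixed = tp.__basicsize__                 # tp_basicsize
    variable = len(obj) * tp.__itemsize__    # item count * tp_itemsize
    return fixed, variable


fixed, variable = fixed_and_variable_size(b"abcd")
# For bytes, CPython reports exactly fixed + variable as the object size.
total = sys.getsizeof(b"abcd")
```

The sketch only shows how the fixed and variable parts add up; it does
not show where a subtype's extra fields would end up.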

-- 
Greg

From python at rcn.com  Tue Sep  9 04:07:48 2008
From: python at rcn.com (Raymond Hettinger)
Date: Mon, 8 Sep 2008 19:07:48 -0700
Subject: [Python-3000] [Python-Dev]  Proposed revised schedule
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org><ga0ppm$udq$1@ger.gmane.org><9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org><ga1ck7$ldi$1@ger.gmane.org><D1F8B586-F45C-4B8C-9887-612A6A794A45@python.org><ca471dc20809081013s63ed2bc2ja10e0fa75059642f@mail.gmail.com><1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com>
	<ca471dc20809081625s16f5c6f4m243375298748b02d@mail.gmail.com>
Message-ID: <3B14584AAB7147C6892032FE3FAE3755@RaymondLaptop1>

[Guido van Rossum]
> Well, from the number of release blockers it sounds like another 3.0
> beta is the right thing. For 2.6 however I believe we're much closer
> to the finish line -- there aren't all those bytes/str issues to clean
> up, for example! And apparently the benefit of releasing on schedule
> is that we will be included in OSX. That's a much bigger deal for 2.6
> than for 3.0 (I doubt that Apple would add two versions anyway).

With the extra time, it would be worthwhile to add dbm.sqlite to 3.0
to compensate for the loss of bsddb so that shelves won't become
useless on Windows builds.

Raymond

From guido at python.org  Tue Sep  9 04:11:02 2008
From: guido at python.org (Guido van Rossum)
Date: Mon, 8 Sep 2008 19:11:02 -0700
Subject: [Python-3000] [Python-Dev]  Proposed revised schedule
In-Reply-To: <3B14584AAB7147C6892032FE3FAE3755@RaymondLaptop1>
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>
	<ga0ppm$udq$1@ger.gmane.org>
	<9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org>
	<ga1ck7$ldi$1@ger.gmane.org>
	<D1F8B586-F45C-4B8C-9887-612A6A794A45@python.org>
	<ca471dc20809081013s63ed2bc2ja10e0fa75059642f@mail.gmail.com>
	<1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com>
	<ca471dc20809081625s16f5c6f4m243375298748b02d@mail.gmail.com>
	<3B14584AAB7147C6892032FE3FAE3755@RaymondLaptop1>
Message-ID: <ca471dc20809081911h6811accdx5065b1244d84c6ac@mail.gmail.com>

On Mon, Sep 8, 2008 at 7:07 PM, Raymond Hettinger <python at rcn.com> wrote:
> [Guido van Rossum]
>>
>> Well, from the number of release blockers it sounds like another 3.0
>> beta is the right thing. For 2.6 however I believe we're much closer
>> to the finish line -- there aren't all those bytes/str issues to clean
>> up, for example! And apparently the benefit of releasing on schedule
>> is that we will be included in OSX. That's a much bigger deal for 2.6
>> than for 3.0 (I doubt that Apple would add two versions anyway).
>
> With the extra time, it would be worthwhile to add dbm.sqlite to 3.0
> to compensate for the loss of bsddb so that shelves won't become
> useless on Windows builds.

So get started already! :-)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip at pobox.com  Tue Sep  9 04:12:23 2008
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 8 Sep 2008 21:12:23 -0500
Subject: [Python-3000] [Python-Dev]  Proposed revised schedule
In-Reply-To: <3B14584AAB7147C6892032FE3FAE3755@RaymondLaptop1>
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>
	<ga0ppm$udq$1@ger.gmane.org>
	<9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org>
	<ga1ck7$ldi$1@ger.gmane.org>
	<D1F8B586-F45C-4B8C-9887-612A6A794A45@python.org>
	<ca471dc20809081013s63ed2bc2ja10e0fa75059642f@mail.gmail.com>
	<1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com>
	<ca471dc20809081625s16f5c6f4m243375298748b02d@mail.gmail.com>
	<3B14584AAB7147C6892032FE3FAE3755@RaymondLaptop1>
Message-ID: <18629.56199.90786.234922@montanaro-dyndns-org.local>

    Raymond> With the extra time, it would be worthwhile to add dbm.sqlite
    Raymond> to 3.0 to compensate for the loss of bsddb so that shelves
    Raymond> won't become useless on Windows builds.

My vote is to separate 2.6 and 3.0 then come back together for 2.7 and 3.1.
I'm a bit less sure about adding dbm.sqlite.  Unless Josiah's version is
substantially faster and more robust I think my version needs to cook a bit
longer.  I'm just not comfortable enough with SQLite to pronounce my version
fit enough.  I only intended it as a proof-of-concept, and it's clear it has
some shortcomings.
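
For reference, the general idea can be sketched as below. This is
neither Josiah's version nor mine; the class name SqliteDict, the table
name kv, and the helper open_shelf are all made up for illustration,
and nothing here addresses speed or robustness:

```python
import shelve
import sqlite3
from collections.abc import MutableMapping


class SqliteDict(MutableMapping):
    """Hypothetical bytes-to-bytes mapping stored in one SQLite table."""

    def __init__(self, path):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS kv (key BLOB PRIMARY KEY, value BLOB)")
        self._db.commit()

    def __getitem__(self, key):
        row = self._db.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]

    def __setitem__(self, key, value):
        self._db.execute(
            "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
            (key, value))
        self._db.commit()

    def __delitem__(self, key):
        cur = self._db.execute("DELETE FROM kv WHERE key = ?", (key,))
        if cur.rowcount == 0:
            raise KeyError(key)
        self._db.commit()

    def __iter__(self):
        # Fetch all keys up front so callers may modify the table meanwhile.
        rows = self._db.execute("SELECT key FROM kv").fetchall()
        return iter([key for (key,) in rows])

    def __len__(self):
        return self._db.execute("SELECT COUNT(*) FROM kv").fetchone()[0]

    def close(self):
        self._db.close()


def open_shelf(path):
    """Wrap the mapping in a shelve.Shelf, which pickles the values."""
    return shelve.Shelf(SqliteDict(path))
```

shelve.Shelf accepts any dict-like object, so a backend of this shape
is enough to keep shelves usable where bsddb is unavailable.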

Skip

From martin at v.loewis.de  Tue Sep  9 07:39:43 2008
From: martin at v.loewis.de (Martin v. Löwis)
Date: Tue, 09 Sep 2008 07:39:43 +0200
Subject: [Python-3000] PyUnicodeObject implementation
In-Reply-To: <48C5B1F8.5070707@trueblade.com>
References: <200809051954.42787.jeremy.kloth@gmail.com>	<loom.20080906T103737-758@post.gmane.org>	<g9vv2u$3ot$1@ger.gmane.org>	<ca471dc20809070738v39fdfcccg64d052ba35616bbb@mail.gmail.com>	<ga0q63$vrd$1@ger.gmane.org>	<48C4B180.2050301@v.loewis.de>	<ga2iah$6e7$1@ger.gmane.org>
	<48C5A354.6070908@v.loewis.de> <48C5B1F8.5070707@trueblade.com>
Message-ID: <48C60C1F.2030501@v.loewis.de>

> How about putting the variable sized data _before_ the struct?

That won't work for container objects (such as tuples); they already
have the GC structure before the PyObject, whose size and layout is
opaque to the objects.

Regards,
Martin

From stefan_ml at behnel.de  Tue Sep  9 08:03:16 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 09 Sep 2008 08:03:16 +0200
Subject: [Python-3000] PyUnicodeObject implementation
In-Reply-To: <48C5A354.6070908@v.loewis.de>
References: <200809051954.42787.jeremy.kloth@gmail.com>	<loom.20080906T103737-758@post.gmane.org>	<g9vv2u$3ot$1@ger.gmane.org>	<ca471dc20809070738v39fdfcccg64d052ba35616bbb@mail.gmail.com>	<ga0q63$vrd$1@ger.gmane.org>	<48C4B180.2050301@v.loewis.de>	<ga2iah$6e7$1@ger.gmane.org>
	<48C5A354.6070908@v.loewis.de>
Message-ID: <ga53j3$2jr$1@ger.gmane.org>

Martin v. Löwis wrote:
>> I wouldn't mind letting Cython special case subtypes of str (or unicode in
>> Py3) *somehow*, as long as this "somehow" proves to be a viable solution that
>> only applies to exactly those types *and* can be done reliably for subtypes
>> of subtypes. I'm just not aware of such a solution.
> 
> As people have pointed out: add new fields *after* the variable-sized
> members. To access it, you need to compute the length of the base
> object, and then cast the pointer to an extension struct.
> 
> That extends to further subtypes, too.

Thanks Martin, Antoine,

this still requires some figuring out of the details for Cython, but I agree
that Cython is a good place to handle this problem and to fix it for both Py2
and whatever Py3 will add to it.

Martin, you compared these things to rocket science, so let me quote a variant
of what some of the Jython people tend to say:

    "Cython - we write C so you don't have to."

Stefan

From stefan_ml at behnel.de  Tue Sep  9 10:31:33 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 9 Sep 2008 08:31:33 +0000 (UTC)
Subject: [Python-3000] PyUnicodeObject implementation
References: <200809051954.42787.jeremy.kloth@gmail.com>
	<loom.20080906T103737-758@post.gmane.org>
	<g9vv2u$3ot$1@ger.gmane.org>
	<ca471dc20809070738v39fdfcccg64d052ba35616bbb@mail.gmail.com>
	<ga0q63$vrd$1@ger.gmane.org> <48C4B180.2050301@v.loewis.de>
	<ga2iah$6e7$1@ger.gmane.org> <48C5BEEB.3030009@canterbury.ac.nz>
Message-ID: <loom.20080909T080232-754@post.gmane.org>

Greg Ewing <greg.ewing <at> canterbury.ac.nz> writes:
> > We create a new struct for the type that contains the parent-struct
> > as first field, and then we add the new attributes of the new type behind
> > that.
> 
> I seem to remember there's a field in the type called tp_basicsize
> that's meant to indicate how big the base part of the struct is,
> with any variable-size part placed after it.
> 
> If a variable-size type always uses this field to find the variable
> data, it seems to me that the usual scheme for subclassing should
> still work, with the extra fields existing in between those of the
> base class and the new position of the variable data.
> 
> Does Py_Unicode not take notice of this field? If not, maybe that's
> something that should be fixed.

Look at the layout of PyStringObject. The last entry is a

    char ob_sval[1]

The only purpose of that entry is to point to the buffer. That's also 
exploited by PyString_AS_STRING(), a macro that translates to the pointer 
deref "s->ob_sval". Subtypes that declare their own members will have them run 
into ob_sval.

As you noted, a general solution for this problem would be to replace 
PyString_AS_STRING() and the future PyUnicode_AS_DATA() (and, well, all 
occurrences of "->ob_sval" in the CPython source code) by

    (s + s->tp_basicsize)

But that would have the same impact on all string data access operations as 
noted by Martin. I expect that this could be done for the new PyUnicode type 
in Py3. The performance impact is relatively small and it removes the C 
subclassing problem, so that may be considered a reasonable trade-off.

Regarding Cython (and Pyrex), however, it doesn't solve the problem in general 
for the existing Py2 versions that Cython supports (starting from 2.3), so a 
portable solution implemented by Cython would still be best.

Stefan

From stefan_ml at behnel.de  Tue Sep  9 10:39:20 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 9 Sep 2008 08:39:20 +0000 (UTC)
Subject: [Python-3000] PyUnicodeObject implementation
References: <200809051954.42787.jeremy.kloth@gmail.com>
	<loom.20080906T103737-758@post.gmane.org>
	<g9vv2u$3ot$1@ger.gmane.org>
	<ca471dc20809070738v39fdfcccg64d052ba35616bbb@mail.gmail.com>
	<ga0q63$vrd$1@ger.gmane.org> <48C4B180.2050301@v.loewis.de>
	<ga2iah$6e7$1@ger.gmane.org> <48C5BEEB.3030009@canterbury.ac.nz>
	<loom.20080909T080232-754@post.gmane.org>
Message-ID: <loom.20080909T083712-865@post.gmane.org>

Stefan Behnel wrote:
>     (s + s->tp_basicsize)

I (obviously) meant

      (s + Py_TYPE(s)->tp_basicsize)

so the impact is another bit bigger.

Stefan

From mal at egenix.com  Tue Sep  9 11:32:37 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 09 Sep 2008 11:32:37 +0200
Subject: [Python-3000] PyUnicodeObject implementation
In-Reply-To: <ca471dc20809071555g337d098au10b5cae210dfa0f9@mail.gmail.com>
References: <200809051954.42787.jeremy.kloth@gmail.com>	<loom.20080906T103737-758@post.gmane.org>
	<g9vv2u$3ot$1@ger.gmane.org>	<ca471dc20809070738v39fdfcccg64d052ba35616bbb@mail.gmail.com>	<48C4464E.5010707@gmail.com>
	<ca471dc20809071555g337d098au10b5cae210dfa0f9@mail.gmail.com>
Message-ID: <48C642B5.2020109@egenix.com>

Before jumping to conclusions, please read the discussion on the
patch ticket:

    http://bugs.python.org/issue1943

It turned out that the patch only provides a marginal performance
improvement, so the perceived main argument for the PyVarObject
implementation doesn't turn out to be a real advantage.

The reasons for choosing a PyObject approach for Unicode rather than
a PyVarObject one like for strings were the following:

 * a pointer to the actual data makes it possible to implement
   optimizations that share data, e.g. slice objects that a
   parser generates when parsing a larger input string or
   view objects that turn a memory mapped file into a live
   Unicode object without any copying overhead

 * a fixed object size results in making good use of the Python
   allocator, since all objects live in the same pool; as a result
   you have better cache locality - which is good for situations
   where you have to deal with lots of objects

 * objects should be small in order to have lots of them in
   the free lists

 * resizing the object should not cause the object's address
   to change, since this is a common operation when creating
   Unicode objects

 * a fixed size PyObject makes extending the object at C level
   very easy

(probably a few more that I've forgotten - it's been a while
since the days of Python 1.6)
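
The first point, sharing data through a pointer instead of copying it,
has a rough Python-level analogue in memoryview. The sketch below is
only an illustration of zero-copy sharing (the helper name is made up)
and says nothing about how the Unicode implementation does it:

```python
def shared_slice(buffer, start, stop):
    """Return a zero-copy view of buffer[start:stop]."""
    return memoryview(buffer)[start:stop]


text = bytearray(b"spam and eggs")
word = shared_slice(text, 9, 13)   # views the bytes b"eggs" without copying
text[9] = ord("E")                 # mutate the shared storage in place
```

Because the view points at the original storage, the change made
through text is visible through word as well.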

The disadvantages of PyVarObjects with respect to extending them in C
were made rather clear in this thread:

 * finding the extensions requires pointer arithmetic

 * the alignment of the extended parts has to be dealt with
   in the object implementation (rather than having the compiler
   take care of this)

 * when resizing the object's data, the extension parts have to
   be copied and realigned as well

 * when resizing the object's data, the addresses of the extension
   parts change, so code has to be aware of this, e.g. caching of
   the offsets is not easily possible

There are also more general disadvantages:

 * resizing the object can cause a change in the object's address,
   so code has to be aware of this

 * objects are spread over many different pools in the memory
   allocator, reducing cache locality

 * keeping PyVarObjects in the free lists requires more memory

IMHO, it's a lot better to tweak the parameters that we have
in the Unicode implementation (e.g. raise the KEEPALIVE_SIZE_LIMIT
to 32, see the ticket for details) and to improve
the memory allocator for storage of small memory chunks or
improve the free list management (which Antoine did with his
free list patch).

The only valid advantage I see with the PyVarObject patch
is the slightly simplified implementation for the standard
case. Given the number of disadvantages, that did not convince
me to change my -1 on the patch.

Regarding making a PyObject -> PyVarObject change in 3.0.1: that's
not a good idea, since it's not a bug fix, but rather a new feature
that also changes the C API significantly.

Regards,
-- 
Marc-Andre Lemburg
eGenix.com

On 2008-09-08 00:55, Guido van Rossum wrote:
> On Sun, Sep 7, 2008 at 2:23 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> Guido van Rossum wrote:
>>> All in all, given the advantage (half the number of allocations) of
>>> the proposal I think there would have to be *very* good arguments
>>> against before we reject this outright. I'd like to understand
>>> Marc-Andre's reasons too.
>> As Stefan notes, because of the frequency with which strings are
>> manipulated in C code via PyString_* / PyUnicode_* calls, it is a data
>> type where "accept no substitutes" prevails.
>>
>> MAL's primary concern appears to be that having Unicode as a plain
>> PyObject leaves the type more open to subclass-based optimisations that
>> have been rejected for the builtin types themselves.
> 
> Hm. I don't have any particularly insightful imagination as to what
> those optimizations might be. Have any been implemented (in 3rd party
> code) in the 8 years that the Unicode object has existed?
> 
>> Having
>> PyString/PyBytes as PyVarObjects means that subclasses are more limited
>> in what they can do.
> 
> True.
> 
>> One possibility that occurs to me is to use a PyVarObject variant that
>> allocates space for an additional void pointer before the variable sized
>> section of the object. The builtin type would leave that pointer NULL,
>> but subtypes could perform the second allocation needed to populate it.
>>
>> The question is whether the 4-8 bytes wasted per object would be worth
>> the fact that only one memory allocation would be needed.
> 
> I believe that 4-8 bytes is more than the overhead of an extra memory
> allocation from the obmalloc heap. It is probably about the same as
> the overhead for a memory allocation from the regular malloc heap. So
> for short strings (of which there are often a lot) it would be more
> expensive; for longer objects it would probably work out just about
> the same.
> 
> There could be a different approach though, whereby the offset from
> the start of the object to the start of the character array wasn't a
> constant but a value stored in the class object. (In fact,
> tp_basicsize could probably be used for this.) It would slow down
> access to the characters a bit though -- a classic time-space
> trade-off that would require careful measurement in order to decide
> which is better.
> 

From ncoghlan at gmail.com  Tue Sep  9 12:20:29 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 09 Sep 2008 20:20:29 +1000
Subject: [Python-3000] [Python-Dev]  Proposed revised schedule
In-Reply-To: <18629.56199.90786.234922@montanaro-dyndns-org.local>
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>	<ga0ppm$udq$1@ger.gmane.org>	<9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org>	<ga1ck7$ldi$1@ger.gmane.org>	<D1F8B586-F45C-4B8C-9887-612A6A794A45@python.org>	<ca471dc20809081013s63ed2bc2ja10e0fa75059642f@mail.gmail.com>	<1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com>	<ca471dc20809081625s16f5c6f4m243375298748b02d@mail.gmail.com>	<3B14584AAB7147C6892032FE3FAE3755@RaymondLaptop1>
	<18629.56199.90786.234922@montanaro-dyndns-org.local>
Message-ID: <48C64DED.3090103@gmail.com>

skip at pobox.com wrote:
>     Raymond> With the extra time, it would be worthwhile to add dbm.sqlite
>     Raymond> to 3.0 to compensate for the loss of bsddb so that shelves
>     Raymond> won't become useless on Windows builds.
> 
> My vote is to separate 2.6 and 3.0 then come back together for 2.7 and 3.1.
> I'm a bit less sure about adding dbm.sqlite.  Unless Josiah's version is
> substantially faster and more robust I think my version needs to cook a bit
> longer.  I'm just not comfortable enough with SQLite to pronounce my version
> fit enough.  I only intended it as a proof-of-concept, and it's clear it has
> some shortcomings.

Given that the *API* is fixed though, it is probably better to have the
module present in 3.0 and bring it back to the main line in 2.7.

If any absolute clangers from a performance/stability point of view get
past Raymond (and everyone else with an interest in this) then they can
be addressed in 3.0.1 in a few months time. Whereas if we leave the
module out entirely, then 3.0 users are completely out of luck until 3.1
(or have to download and possibly build pybsddb).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
            http://www.boredomandlaziness.org

From solipsis at pitrou.net  Tue Sep  9 12:31:19 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 9 Sep 2008 10:31:19 +0000 (UTC)
Subject: [Python-3000] [Python-Dev] dbm.sqlite
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>	<ga0ppm$udq$1@ger.gmane.org>	<9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org>	<ga1ck7$ldi$1@ger.gmane.org>	<D1F8B586-F45C-4B8C-9887-612A6A794A45@python.org>	<ca471dc20809081013s63ed2bc2ja10e0fa75059642f@mail.gmail.com>	<1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com>	<ca471dc20809081625s16f5c6f4m243375298748b02d@mail.gmail.com>	<3B14584AAB7147C6892032FE3FAE3755@RaymondLaptop1>
	<18629.56199.90786.234922@montanaro-dyndns-org.local>
	<48C64DED.3090103@gmail.com>
Message-ID: <loom.20080909T102813-664@post.gmane.org>

Nick Coghlan <ncoghlan <at> gmail.com> writes:
> 
> Given that the *API* is fixed though, it is probably better to have the
> module present in 3.0 and bring it back to the main line in 2.7.
> 
> If any absolute clangers from a performance/stability point of view get
> past Raymond (and everyone else with an interest in this) then they can
> be addressed in 3.0.1 in a few months time.

I agree about performance but I don't think it's right to say we can fix
stability later. This is a storage module, and people risk losing their data if
there are glaring bugs. If we really want an efficient dbm-compatible storage
backend for all platforms on 3.0, then why not bite the bullet and re-add bsddb?
Even though it has its quirks, it's certainly much more tested than a
hypothetical dbm.sqlite whipped up in a few days and used by nobody in the wild.

From stefan_ml at behnel.de  Tue Sep  9 12:55:07 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 9 Sep 2008 10:55:07 +0000 (UTC)
Subject: [Python-3000] PyUnicodeObject implementation
References: <200809051954.42787.jeremy.kloth@gmail.com>	<loom.20080906T103737-758@post.gmane.org>	<g9vv2u$3ot$1@ger.gmane.org>
	<loom.20080907T132230-87@post.gmane.org>
	<ga0pg4$to6$1@ger.gmane.org>
Message-ID: <loom.20080909T105135-954@post.gmane.org>

Stefan Behnel wrote:
> Antoine Pitrou wrote:
>> Stefan Behnel <stefan_ml <at> behnel.de> writes:
>>> From a Cython perspective, I find the lack of efficient subclassing after
>>> such a change particularly striking.
>> what do you call "efficient subclassing"? if you look at the current
>> implementation of unicode_subtype_new() in unicodeobject.c, it isn't very
>> efficient (everything including the raw data buffer is allocated twice).
> 
> That's something that may be optimised one day without affecting user code.

Coming back to this: Why is this done anyway? Can't the new instance of the 
unicode-subtype just steal the buffer pointer of the already allocated unicode 
object?

Stefan

From solipsis at pitrou.net  Tue Sep  9 13:13:57 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 9 Sep 2008 11:13:57 +0000 (UTC)
Subject: [Python-3000] PyUnicodeObject implementation
References: <200809051954.42787.jeremy.kloth@gmail.com>	<loom.20080906T103737-758@post.gmane.org>
	<g9vv2u$3ot$1@ger.gmane.org>	<ca471dc20809070738v39fdfcccg64d052ba35616bbb@mail.gmail.com>	<48C4464E.5010707@gmail.com>
	<ca471dc20809071555g337d098au10b5cae210dfa0f9@mail.gmail.com>
	<48C642B5.2020109@egenix.com>
Message-ID: <loom.20080909T110315-549@post.gmane.org>

Hello,

M.-A. Lemburg <mal <at> egenix.com> writes:
> 
> It turned out that the patch only provides a marginal performance
> improvement, so the perceived main argument for the PyVarObject
> implementation doesn't turn out to be a real advantage.

Uh, while the results are not always overwhelming, they are however far better
than the simple freelist improvement (which is not even always an improvement).

>  * a fixed object size results in making good use of the Python
>    allocator, since all objects live in the same pool; as a result
>    you have better cache locality - which is good for situations
>    where you have to deal with lots of objects

I'm not sure how cache locality of unrelated unicode objects helps performance.
However, having a separate allocation in a different pool for the raw character
data implies that cache locality is worse when it comes to actually accessing
the character data (the pointer and the data it points to are in completely
different areas). Pointer chasing makes memory accesses impossible to predict,
and thus access latencies difficult to hide for the CPU.

Anyway, it's just theoretical speculation, I think running benchmarks and
comparing performance numbers is the most reasonable thing we can do (which is a
bit difficult since we don't have real-world benchmarks for string processing;
stringbench and pybench most probably run from the CPU cache and thus don't
really stress memory access patterns; it's why I chose the simplistic split() of
a very large string to demonstrate performance of my patches).
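
For illustration, here is a minimal sketch of that kind of measurement. It is
not the script used for the patches; the function name, the word pattern and
the sizes are assumptions chosen for the example. The idea is only to split a
string large enough that the resulting objects do not all fit in the CPU cache.

```python
import timeit


def time_large_split(n_words=200_000, repeat=3):
    """Time str.split() on one large string (hypothetical helper).

    Returns (length of the text, number of pieces, best wall-clock time).
    """
    # Build one big string; each split() call creates n_words new strings.
    text = " ".join("word%d" % i for i in range(n_words))
    pieces = len(text.split())
    # Take the best of several single runs to reduce scheduling noise.
    best = min(timeit.repeat(text.split, number=1, repeat=repeat))
    return len(text), pieces, best


if __name__ == "__main__":
    size, pieces, best = time_large_split()
    print("split %d chars into %d pieces in %.4f s" % (size, pieces, best))
```

Running the same script against two interpreter builds gives a rough
comparison; the absolute numbers depend heavily on the machine.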

>  * objects should be small in order to have lots of them in
>    the free lists

But the freelists are less efficient since they only avoid one allocation and
not both of them. And if you make them avoid both allocations, then the
freelists are actually bigger in memory (because of more overhead).

Also, those two arguments could be made for lists vs. tuples, but I've never
seen anyone dispute that tuples are more efficient than lists.
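
As a rough illustration of the tuple/list analogy (CPython-specific, and only
about memory footprint, not speed): a tuple stores its items inline in the
object, while a list object points to a separately allocated item array. The
helper name below is made up for the example.

```python
import sys


def footprint(n_items):
    """Return (tuple size, list size) in bytes for n_items elements.

    CPython detail: sys.getsizeof() on a list also counts its separately
    allocated item array, so the two numbers are comparable.
    """
    items = [None] * n_items
    as_tuple = tuple(items)
    return sys.getsizeof(as_tuple), sys.getsizeof(items)


if __name__ == "__main__":
    for n in (0, 1, 10, 100):
        tuple_size, list_size = footprint(n)
        print(n, tuple_size, list_size)
```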

> IMHO, it's a lot better to tweak the parameters that we have
> in the Unicode implementation (e.g. raise the KEEPALIVE_SIZE_LIMIT
> to 32, see the ticket for details) and to improve
> the memory allocator for storage of small memory chunks or
> improve the free list management (which Antoine did with his
> free list patch).

But that patch, as I said above, yields very mixed results; it even degrades
performance in some cases. I'm not against applying (some variant of) it, but
it's really not a deal-breaker.

Regards

Antoine.

From jnoller at gmail.com  Tue Sep  9 14:49:20 2008
From: jnoller at gmail.com (Jesse Noller)
Date: Tue, 9 Sep 2008 08:49:20 -0400
Subject: [Python-3000] Proposed revised schedule
In-Reply-To: <ca471dc20809081013s63ed2bc2ja10e0fa75059642f@mail.gmail.com>
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>
	<ga0ppm$udq$1@ger.gmane.org>
	<9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org>
	<ga1ck7$ldi$1@ger.gmane.org>
	<D1F8B586-F45C-4B8C-9887-612A6A794A45@python.org>
	<ca471dc20809081013s63ed2bc2ja10e0fa75059642f@mail.gmail.com>
Message-ID: <DF2D22CB-BDF9-45BD-8928-0561CD4C5A3D@gmail.com>

On Sep 8, 2008, at 1:13 PM, "Guido van Rossum" <guido at python.org> wrote:

> On Mon, Sep 8, 2008 at 6:23 AM, Barry Warsaw <barry at python.org> wrote:
>> I don't think there's any way we're going to make our October 1st goal.  We
>> have 8 open release critical bugs, and 18 deferred blockers.  We do not have
>> a beta3 Windows installer and I don't have high hopes for rectifying all of
>> these problems in the next day or two.
>>
>> I propose that we push the entire schedule back two weeks.  This means that
>> the planned rc2 on 17-September becomes our rc1.  The planned final release
>> for 01-October becomes our rc2, and we release the finals on 15-October.
>>
>> - -Barry
>
> Perhaps it's time to separate the 2.6 and 3.0 release schedules? I
> don't care if the next version of OSX contains 3.0 or not -- but I do
> care about it having 2.6.
>

Given that 2.6 is going to be more widely adopted and used by both the  
community and OS distributors, I'm +1 on splitting the releases as well.

-Jesse 

From barry at python.org  Tue Sep  9 15:17:10 2008
From: barry at python.org (Barry Warsaw)
Date: Tue, 9 Sep 2008 09:17:10 -0400
Subject: [Python-3000] Proposed revised schedule
In-Reply-To: <ca471dc20809081013s63ed2bc2ja10e0fa75059642f@mail.gmail.com>
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>
	<ga0ppm$udq$1@ger.gmane.org>
	<9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org>
	<ga1ck7$ldi$1@ger.gmane.org>
	<D1F8B586-F45C-4B8C-9887-612A6A794A45@python.org>
	<ca471dc20809081013s63ed2bc2ja10e0fa75059642f@mail.gmail.com>
Message-ID: <937C6D77-3168-4127-8D4F-59AA291F0A86@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sep 8, 2008, at 1:13 PM, Guido van Rossum wrote:

> Perhaps it's time to separate the 2.6 and 3.0 release schedules? I
> don't care if the next version of OSX contains 3.0 or not -- but I do
> care about it having 2.6.

I've talked with my contact at MajorOS Vendor (tm) and, as much as he  
can say, he would be fine with this.  They're having problems getting  
3rd party modules to build against 3.0 anyway, but if we can release a  
very solid 2.6 by the 1-Oct deadline, I would support splitting the  
releases.

I really don't like doing this, but if we can get 2.6 out on time, and  
3.0 doesn't lag too far behind, I'm okay with it.  We'll have to  
abbreviate the release schedule though, so everyone should concentrate  
on fixing the 2.6 showstoppers.  I think we need to get 2.6rc1 out  
this week, followed by 2.6rc2 next Wednesday as planned and 2.6final  
on 1-October.

I've shuffled the tracker to reduce all 3.0-only bugs to deferred  
blocker, and to increase all 2.6 deferred blockers to release  
blockers.  There are 11 open blocker issues for 2.6:

3629 Python won't compile a regex that compiles with 2.5.2 and 30b2
3640 test_cpickle crash on AMD64 Windows build
3777 long(4.2) now returns an int
3781 warnings.catch_warnings fails gracelessly when recording warnings but...
2876 Write UserDict fixer for 2to3
2350 'exceptions' import fixer
3642 Objects/obmalloc.c:529: warning: comparison is always false due...
3617 Add MS EULA to the list of third-party licenses in the Windows...
3657 pickle can pickle the wrong function
1868 threading.local doesn't free attrs when assigning thread exits
3809 test_logging leaving a 'test.blah' file behind

If we can close them by Wednesday or Thursday, and the 2.6 bots stay  
green, I will cut the 2.6rc1 release this week and the 2.6rc2 and  
final on schedule.

If you're on board with this, please do what you can to resolve these  
open issues.  As always, I'm on irc if you need to discuss anything.

Cheers,
- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSMZ3V3EjvBPtnXfVAQKLbAP6A9b0WBB0H/ONZbKie2TazK/qYLthYnZQ
iIpfJ2UboOA7dJ/ueXIsD413oI8GTbUOsUlJOWbSzAfJ6oBuPHrjr4IFRCZhchKG
lwViDaK/7aWgIusGFpt6y/SgwJBU531wb7o3Lx/P6rLx5Wh5Nr+tvhngt0WkSMSj
WtCsy3mmgmQ=
=3HdI
-----END PGP SIGNATURE-----

From barry at python.org  Tue Sep  9 15:21:53 2008
From: barry at python.org (Barry Warsaw)
Date: Tue, 9 Sep 2008 09:21:53 -0400
Subject: [Python-3000] Proposed revised schedule
In-Reply-To: <ca471dc20809081625s16f5c6f4m243375298748b02d@mail.gmail.com>
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>
	<ga0ppm$udq$1@ger.gmane.org>
	<9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org>
	<ga1ck7$ldi$1@ger.gmane.org>
	<D1F8B586-F45C-4B8C-9887-612A6A794A45@python.org>
	<ca471dc20809081013s63ed2bc2ja10e0fa75059642f@mail.gmail.com>
	<1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com>
	<ca471dc20809081625s16f5c6f4m243375298748b02d@mail.gmail.com>
Message-ID: <914B35A3-C8C2-42B6-9A3B-11E1F0F03998@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sep 8, 2008, at 7:25 PM, Guido van Rossum wrote:

> Well, from the number of release blockers it sounds like another 3.0
> beta is the right thing. For 2.6 however I believe we're much closer
> to the finish line -- there aren't all those bytes/str issues to clean
> up, for example! And apparently the benefit of releasing on schedule
> is that we will be included in OSX. That's a much bigger deal for 2.6
> than for 3.0 (I doubt that Apple would add two versions anyway).

The MajorOS Vendor (tm) may be willing to ship a 3.0 beta if it's far  
enough along, though not as the primary Python version.  They clearly  
want 2.6 for that.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSMZ4cXEjvBPtnXfVAQL4ygP/fLILvf3NhvmN3R2T7htGm08xt/bOBYGt
+BDrV4rapS4j3jo2Cx+McEdjJZCdq9x7BIaTN+4ITwq02LEY5fmhp6NkhzE1dlnq
qdgBq8x/Z4AnsxfydtqYrPhrzLWPpdEZElgll5FB6Dj6XIA7cB8tuds2cE7+OXJI
Guom1Y0k6Ao=
=u4FB
-----END PGP SIGNATURE-----

From barry at python.org  Tue Sep  9 15:23:28 2008
From: barry at python.org (Barry Warsaw)
Date: Tue, 9 Sep 2008 09:23:28 -0400
Subject: [Python-3000] [Python-Dev]  Proposed revised schedule
In-Reply-To: <3B14584AAB7147C6892032FE3FAE3755@RaymondLaptop1>
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org><ga0ppm$udq$1@ger.gmane.org><9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org><ga1ck7$ldi$1@ger.gmane.org><D1F8B586-F45C-4B8C-9887-612A6A794A45@python.org><ca471dc20809081013s63ed2bc2ja10e0fa75059642f@mail.gmail.com><1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com>
	<ca471dc20809081625s16f5c6f4m243375298748b02d@mail.gmail.com>
	<3B14584AAB7147C6892032FE3FAE3755@RaymondLaptop1>
Message-ID: <3DFD4AAC-D8EA-46E6-BC56-C713861C02B7@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sep 8, 2008, at 10:07 PM, Raymond Hettinger wrote:

> [Guido van Rossum]
>> Well, from the number of release blockers it sounds like another 3.0
>> beta is the right thing. For 2.6 however I believe we're much closer
>> to the finish line -- there aren't all those bytes/str issues to clean
>> up, for example! And apparently the benefit of releasing on schedule
>> is that we will be included in OSX. That's a much bigger deal for 2.6
>> than for 3.0 (I doubt that Apple would add two versions anyway).
>
> With the extra time, it would be worthwhile to add dbm.sqlite to 3.0
> to compensate for the loss of bsddb so that shelves won't become
> useless on Windows builds.

That seems risky to me.  First, it's a new feature.  Second, it will  
be largely untested code.  I would much rather see dbm.sqlite released  
as a separate package for possible integration into the core for 3.1.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSMZ40XEjvBPtnXfVAQK2WQP/e3N2rYD2rbsoynEnXvAjzF8lPoPRFDvl
hbjERsbB93uSoBPHaTdjtXnW+InC0W4GC5ogHF9wARbzYTJaxx09WmjihX+PvgsW
JhXwLpG3gtyclfqSAF8MWZHc4UnKnyUt5UgYBlZrzT0z7FhWmelUPl8QhS8/2n9L
oT3qX8eLabI=
=Zu70
-----END PGP SIGNATURE-----

From barry at python.org  Tue Sep  9 15:25:03 2008
From: barry at python.org (Barry Warsaw)
Date: Tue, 9 Sep 2008 09:25:03 -0400
Subject: [Python-3000] [Python-Dev]  Proposed revised schedule
In-Reply-To: <ga587v$nvj$1@ger.gmane.org>
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>	<ga0ppm$udq$1@ger.gmane.org>	<9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org>	<ga1ck7$ldi$1@ger.gmane.org>	<D1F8B586-F45C-4B8C-9887-612A6A794A45@python.org>	<ca471dc20809081013s63ed2bc2ja10e0fa75059642f@mail.gmail.com>	<1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com>
	<ca471dc20809081625s16f5c6f4m243375298748b02d@mail.gmail.com>
	<ga587v$nvj$1@ger.gmane.org>
Message-ID: <BDEAED28-1FC6-4C58-B7BF-947326D49CBA@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sep 9, 2008, at 3:22 AM, Georg Brandl wrote:

> Even if I can't contribute very much at the moment, I'm still +1 to that.
> I doubt Python would get nice publicity if we released a 3.0 but had to
> tell everyone, "but don't really use it yet, it may still contain any
> number of showstoppers."

I completely agree.  We should not release anything that's not ready.   
Assuming that we all agree that 2.6 is much closer to being ready,  
that gives us two options: delay 2.6 to coincide with 3.0 or split the  
releases.  The latter seems like the wisest choice to meet our goals.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSMZ5L3EjvBPtnXfVAQJwSQP/U7FFFI8ao5Xesf6F3QFIUMYFeISrlhof
9ynkQXAskUMelAfayGMSd2nD2+buXA7gyBWplAAEF2rtLhZ3N0+zeh/2HnqcY0b9
EtUM5shAIMlb2948IMoXlxSMplH5auBHMLYFnuPAIH9ERXsGVfyihLnUarAfzmT+
XrWfjrU62TA=
=CUR4
-----END PGP SIGNATURE-----

From ncoghlan at gmail.com  Tue Sep  9 16:17:27 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 10 Sep 2008 00:17:27 +1000
Subject: [Python-3000] [Python-Dev]  Proposed revised schedule
In-Reply-To: <914B35A3-C8C2-42B6-9A3B-11E1F0F03998@python.org>
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>	<ga0ppm$udq$1@ger.gmane.org>	<9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org>	<ga1ck7$ldi$1@ger.gmane.org>	<D1F8B586-F45C-4B8C-9887-612A6A794A45@python.org>	<ca471dc20809081013s63ed2bc2ja10e0fa75059642f@mail.gmail.com>	<1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com>	<ca471dc20809081625s16f5c6f4m243375298748b02d@mail.gmail.com>
	<914B35A3-C8C2-42B6-9A3B-11E1F0F03998@python.org>
Message-ID: <48C68577.6070705@gmail.com>

Barry Warsaw wrote:
> On Sep 8, 2008, at 7:25 PM, Guido van Rossum wrote:
> 
>> Well, from the number of release blockers it sounds like another 3.0
>> beta is the right thing. For 2.6 however I believe we're much closer
>> to the finish line -- there aren't all those bytes/str issues to clean
>> up, for example! And apparently the benefit of releasing on schedule
>> is that we will be included in OSX. That's a much bigger deal for 2.6
>> than for 3.0 (I doubt that Apple would add two versions anyway).
> 
> The MajorOS Vendor (tm) may be willing to ship a 3.0 beta if it's far
> enough along, though not as the primary Python version.  They clearly
> want 2.6 for that.

Given that the sum total of actual Python 3.0 programs is currently
pretty close to zero, I don't really see any reason for *any* OS vendor
(even Linux distros) to be including a 3.0 interpreter in their base
install at this point in time. I personally expect it to stay in the
"optional extras" category until some time next year.

Pessimists-have-more-opportunities-to-be-pleasantly-surprised'ly,
Nick.

_______________________________________________
Python-Dev mailing list
Python-Dev at python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
http://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
            http://www.boredomandlaziness.org

From ncoghlan at gmail.com  Tue Sep  9 16:21:14 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 10 Sep 2008 00:21:14 +1000
Subject: [Python-3000] [Python-Dev]  Proposed revised schedule
In-Reply-To: <937C6D77-3168-4127-8D4F-59AA291F0A86@python.org>
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>	<ga0ppm$udq$1@ger.gmane.org>	<9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org>	<ga1ck7$ldi$1@ger.gmane.org>	<D1F8B586-F45C-4B8C-9887-612A6A794A45@python.org>	<ca471dc20809081013s63ed2bc2ja10e0fa75059642f@mail.gmail.com>
	<937C6D77-3168-4127-8D4F-59AA291F0A86@python.org>
Message-ID: <48C6865A.5000703@gmail.com>

Barry Warsaw wrote:
> 3781 warnings.catch_warnings fails gracelessly when recording warnings

I just assigned this one to myself - I'll have a patch up for review
shortly (the patch will revert back to having this be a regression test
suite only feature).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
            http://www.boredomandlaziness.org

From pje at telecommunity.com  Tue Sep  9 19:06:15 2008
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue, 09 Sep 2008 13:06:15 -0400
Subject: [Python-3000] Should package __init__ files include
 pkgutil.extend_path?
In-Reply-To: <bbaeab100809061528m490f7484m71074ac450ef49c1@mail.gmail.com>
References: <18626.61673.143430.847735@montanaro-dyndns-org.local>
	<bbaeab100809061528m490f7484m71074ac450ef49c1@mail.gmail.com>
Message-ID: <20080909170519.36BAE3A4072@sparrow.telecommunity.com>

At 03:28 PM 9/6/2008 -0700, Brett Cannon wrote:
>On Sat, Sep 6, 2008 at 2:06 PM,  <skip at pobox.com> wrote:
> > I'm trying to figure out how to install this dbm.sqlite module I have
> > without overwriting the basic install.  My thought was to create a dbm
> > package in site-packages then copy sqlite.py there.  That doesn't work
> > though.  Modifying dbm.__init__.py to include this does:
> >
> >    import pkgutil
> >    __path__ = pkgutil.extend_path(__path__, __name__)
> >
> > I'm wondering if all the core packages in 3.x should include the above in
> > their __init__.py files.
> >
>
>Well, a side-effect of this is that all package imports will suddenly
>spike the number of stat calls linearly to the number of entries on
>sys.path.

"All package imports"?  "Spike"?

>Another option is to use a pth file that imports your module (as like
>_dbm_sqlite.py or something) and have it, as a side-effect of
>importing, set itself on dbm.

That adds an import to startup time, whether you use the package or 
not.  At least extend_path will only take effect if you actually 
import that package. 

From josiah.carlson at gmail.com  Tue Sep  9 19:43:34 2008
From: josiah.carlson at gmail.com (Josiah Carlson)
Date: Tue, 9 Sep 2008 10:43:34 -0700
Subject: [Python-3000] [Python-Dev] dbm.sqlite
In-Reply-To: <loom.20080909T102813-664@post.gmane.org>
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>
	<ga1ck7$ldi$1@ger.gmane.org>
	<D1F8B586-F45C-4B8C-9887-612A6A794A45@python.org>
	<ca471dc20809081013s63ed2bc2ja10e0fa75059642f@mail.gmail.com>
	<1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com>
	<ca471dc20809081625s16f5c6f4m243375298748b02d@mail.gmail.com>
	<3B14584AAB7147C6892032FE3FAE3755@RaymondLaptop1>
	<18629.56199.90786.234922@montanaro-dyndns-org.local>
	<48C64DED.3090103@gmail.com> <loom.20080909T102813-664@post.gmane.org>
Message-ID: <e6511dbf0809091043h3079b03dr2a8a7cde710a78ac@mail.gmail.com>

On Tue, Sep 9, 2008 at 3:31 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> Nick Coghlan <ncoghlan <at> gmail.com> writes:
>>
>> Given that the *API* is fixed though, it is probably better to have the
>> module present in 3.0 and bring it back to the main line in 2.7.
>>
>> If any absolute clangers from a performance/stability point of view get
>> past Raymond (and everyone else with an interest in this) then they can
>> be addressed in 3.0.1 in a few months time.
>
> I agree about performance but I don't think it's right to say we can fix
> stability later. This is a storage module, and people risk losing their data if
> there are glaring bugs. If we really want an efficient dbm-compatible storage
> backend for all platforms on 3.0, then why not bite the bullet and re-add bsddb?
> Even though it has its quirks, it's certainly much more tested than a
> hypothetical dbm.sqlite whipped up in a few days and used by nobody in the wild.

Yes and no.  Sqlite in Python 3.0 has been tested and used more than
bsddb in Python 3.0, this can be trivially seen because sqlite has
been working in Python 3.0 for quite a while, which hasn't been the
case with bsddb in Python 3.0.  While the wrapper for sqlite to offer
a dbm-like interface is relatively untested (it does have testcases
thanks to Skip), dealing with a couple-hundred (at most) line wrapper
is far more reasonable for testing, verification, bugfixing, etc.,
than the wrappers for bsddb.

 - Josiah

From brett at python.org  Tue Sep  9 21:38:28 2008
From: brett at python.org (Brett Cannon)
Date: Tue, 9 Sep 2008 12:38:28 -0700
Subject: [Python-3000] Should package __init__ files include
	pkgutil.extend_path?
In-Reply-To: <20080909170519.36BAE3A4072@sparrow.telecommunity.com>
References: <18626.61673.143430.847735@montanaro-dyndns-org.local>
	<bbaeab100809061528m490f7484m71074ac450ef49c1@mail.gmail.com>
	<20080909170519.36BAE3A4072@sparrow.telecommunity.com>
Message-ID: <bbaeab100809091238h79905e95w2864d3cce6dfc2ba@mail.gmail.com>

On Tue, Sep 9, 2008 at 10:06 AM, Phillip J. Eby <pje at telecommunity.com> wrote:
> At 03:28 PM 9/6/2008 -0700, Brett Cannon wrote:
>>
>> On Sat, Sep 6, 2008 at 2:06 PM,  <skip at pobox.com> wrote:
>> > I'm trying to figure out how to install this dbm.sqlite module I have
>> > without overwriting the basic install.  My thought was to create a dbm
>> > package in site-packages then copy sqlite.py there.  That doesn't work
>> > though.  Modifying dbm.__init__.py to include this does:
>> >
>> >    import pkgutil
>> >    __path__ = pkgutil.extend_path(__path__, __name__)
>> >
>> > I'm wondering if all the core packages in 3.x should include the above in
>> > their __init__.py files.
>> >
>>
>> Well, a side-effect of this is that all package imports will suddenly
>> spike the number of stat calls linearly to the number of entries on
>> sys.path.
>
> "All package imports"?  "Spike"?
>

pkgutil.extend_path() would be executed for every package imported,
because it is code at the global level of the module. And if you
look at the implementation of extend_path(), there is an
os.path.isdir() call for every entry on sys.path, and if that succeeds
there is an os.path.isfile() call. Plus there is also an os.path.isfile()
call for every sys.path entry as well.

I call that a "spike" in "all package imports" in terms of stat calls
if this was added to all packages as suggested. And that can be
painful on systems where stat calls are expensive (e.g., NFS). At
least extend_path() appends so the new entries are put at the back of
the list.
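
To make the mechanism concrete, here is a small sketch. The package name
extpath_demo_pkg, the directory names and the helper functions are all made
up for the example. It builds two "portions" of the same package in separate
sys.path entries and shows that extend_path() walks sys.path and appends the
second portion to __path__. The exact filesystem calls made along the way
differ between Python versions, so it does not try to count stat calls.

```python
import importlib
import os
import sys
import tempfile

INIT_SOURCE = (
    "import pkgutil\n"
    "__path__ = pkgutil.extend_path(__path__, __name__)\n"
)


def make_portion(root, pkg_name, module_name):
    """Create root/pkg_name/__init__.py and root/pkg_name/<module_name>.py."""
    pkg_dir = os.path.join(root, pkg_name)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(INIT_SOURCE)
    with open(os.path.join(pkg_dir, module_name + ".py"), "w") as f:
        f.write("WHERE = %r\n" % module_name)
    return root


def demo():
    """Import a package split over two sys.path entries (hypothetical names)."""
    base = tempfile.mkdtemp()
    roots = [
        make_portion(os.path.join(base, "stdlib"), "extpath_demo_pkg", "core"),
        make_portion(os.path.join(base, "site"), "extpath_demo_pkg", "extra"),
    ]
    sys.path[:0] = roots
    importlib.invalidate_caches()
    try:
        import extpath_demo_pkg.core
        import extpath_demo_pkg.extra  # found only thanks to extend_path()
        return list(extpath_demo_pkg.__path__), extpath_demo_pkg.extra.WHERE
    finally:
        for root in roots:
            sys.path.remove(root)


if __name__ == "__main__":
    path, where = demo()
    print(path)
    print(where)
```

Without the two lines in __init__.py, the second import fails, because only
the first directory found on sys.path ends up in __path__.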

>
>> Another option is to use a pth file that imports your module (as like
>> _dbm_sqlite.py or something) and have it, as a side-effect of
>> importing, set itself on dbm.
>
> That adds an import to startup time, whether you use the package or not.  At
> least extend_path will only take effect if you actually import that package.
>

Yes, it's a trade-off depending on what penalty cost you would prefer
to pay. But as I said, I don't like the idea of letting people inject
into the stdlib namespace like this in the first place so I don't want
this to happen in any official capacity.

-Brett

From python at rcn.com  Tue Sep  9 21:47:32 2008
From: python at rcn.com (Raymond Hettinger)
Date: Tue, 9 Sep 2008 12:47:32 -0700
Subject: [Python-3000] [Python-Dev] dbm.sqlite
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org><ga1ck7$ldi$1@ger.gmane.org><D1F8B586-F45C-4B8C-9887-612A6A794A45@python.org><ca471dc20809081013s63ed2bc2ja10e0fa75059642f@mail.gmail.com><1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com><ca471dc20809081625s16f5c6f4m243375298748b02d@mail.gmail.com><3B14584AAB7147C6892032FE3FAE3755@RaymondLaptop1><18629.56199.90786.234922@montanaro-dyndns-org.local><48C64DED.3090103@gmail.com>
	<loom.20080909T102813-664@post.gmane.org>
	<e6511dbf0809091043h3079b03dr2a8a7cde710a78ac@mail.gmail.com>
Message-ID: <D9F7B665D5634A079033110FE857056F@RaymondLaptop1>

>>> Given that the *API* is fixed though, it is probably better to have the
>>> module present in 3.0 and bring it back to the main line in 2.7.
>>>
>>> If any absolute clangers from a performance/stability point of view get
>>> past Raymond (and everyone else with an interest in this) then they can
>>> be addressed in 3.0.1 in a few months time.
>>
>> I agree about performance but I don't think it's right to say we can fix
>> stability later. This is a storage module, and people risk losing their data if
>> there are glaring bugs. If we really want an efficient dbm-compatible storage
>> backend for all platforms on 3.0, then why not bite the bullet and re-add bsddb?
>> Even though it has its quirks, it's certainly much more tested than a
>> hypothetical dbm.sqlite whipped up in a few days and used by nobody in the wild.
>
> Yes and no.  Sqlite in Python 3.0 has been tested and used more than
> bsddb in Python 3.0, this can be trivially seen because sqlite has
> been working in Python 3.0 for quite a while, which hasn't been the
> case with bsddb in Python 3.0.  While the wrapper for sqlite to offer
> a dbm-like interface is relatively untested (it does have testcases
> thanks to Skip), dealing with a couple-hundred (at most) line wrapper
> is far more reasonable for testing, verification, bugfixing, etc.,
> than the wrappers for bsddb.

I concur.  Sqlite is very stable, especially for our purposes here
(records with only a text key paired with a pickled blob).  Also, the
dbm API and the mapping APIs were worked out long ago.  Also, we've got
the shelve test suite to exercise the setup.  And the wrapper module is
small enough and simple enough to be very easy to review.  Doesn't get
much easier than this.

Raymond

From solipsis at pitrou.net  Tue Sep  9 21:49:25 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 9 Sep 2008 19:49:25 +0000 (UTC)
Subject: [Python-3000] [Python-Dev] dbm.sqlite
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>
	<ga1ck7$ldi$1@ger.gmane.org>
	<D1F8B586-F45C-4B8C-9887-612A6A794A45@python.org>
	<ca471dc20809081013s63ed2bc2ja10e0fa75059642f@mail.gmail.com>
	<1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com>
	<ca471dc20809081625s16f5c6f4m243375298748b02d@mail.gmail.com>
	<3B14584AAB7147C6892032FE3FAE3755@RaymondLaptop1>
	<18629.56199.90786.234922@montanaro-dyndns-org.local>
	<48C64DED.3090103@gmail.com>
	<loom.20080909T102813-664@post.gmane.org>
	<e6511dbf0809091043h3079b03dr2a8a7cde710a78ac@mail.gmail.com>
Message-ID: <loom.20080909T194122-356@post.gmane.org>

Josiah Carlson <josiah.carlson <at> gmail.com> writes:
> 
> While the wrapper for sqlite to offer
> a dbm-like interface is relatively untested (it does have testcases
> thanks to Skip), dealing with a couple-hundred (at most) line wrapper
> is far more reasonable for testing, verification, bugfixing, etc.,
> than the wrappers for bsddb.

There is theory and there is practice. There are lots of things that unittests
can't or often don't catch. I don't think it is reasonable at all to add a
completely new and untested storage backend at the last minute, while the usual
advice for module inclusion is to first publish it on PyPI to get feedback and
estimate popularity.

Another possibility is for the module to be clearly labeled as experimental.

From pje at telecommunity.com  Tue Sep  9 22:31:22 2008
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue, 09 Sep 2008 16:31:22 -0400
Subject: [Python-3000] Should package __init__ files include
 pkgutil.extend_path?
In-Reply-To: <bbaeab100809091238h79905e95w2864d3cce6dfc2ba@mail.gmail.com>
References: <18626.61673.143430.847735@montanaro-dyndns-org.local>
	<bbaeab100809061528m490f7484m71074ac450ef49c1@mail.gmail.com>
	<20080909170519.36BAE3A4072@sparrow.telecommunity.com>
	<bbaeab100809091238h79905e95w2864d3cce6dfc2ba@mail.gmail.com>
Message-ID: <20080909203021.8CC703A409C@sparrow.telecommunity.com>

At 12:38 PM 9/9/2008 -0700, Brett Cannon wrote:
>On Tue, Sep 9, 2008 at 10:06 AM, Phillip J. Eby <pje at telecommunity.com> wrote:
> > At 03:28 PM 9/6/2008 -0700, Brett Cannon wrote:
> >>
> >> On Sat, Sep 6, 2008 at 2:06 PM,  <skip at pobox.com> wrote:
> >> > I'm trying to figure out how to install this dbm.sqlite module I have
> >> > without overwriting the basic install.  My thought was to create a dbm
> >> > package in site-packages then copy sqlite.py there.  That doesn't work
> >> > though.  Modifying dbm.__init__.py to include this does:
> >> >
> >> >    import pkgutil
> >> >    __path__ = pkgutil.extend_path(__path__, __name__)
> >> >
> >> > I'm wondering if all the core packages in 3.x should include the above
> >> > in
> >> > their __init__.py files.
> >> >
> >>
> >> Well, a side-effect of this is that all package imports will suddenly
> >> spike the number of stat calls linearly to the number of entries on
> >> sys.path.
> >
> > "All package imports"?  "Spike"?
> >
>
>pkgutil.extend_path() would be executed for every package imported by
>the fact that it is code at the global level of the module.

Each package that uses it and that is imported, yes.

>  And if you
>look at the implementation of extend_path(), there is a
>os.path.isdir() call for every entry on sys.path, and if that succeeds
>there is os.path.isfile() call. Plus there is also an os.path.isfile()
>call for every sys.path entry as well.

Note, btw, that that could be greatly reduced by use of 
sys.path_importer_cache; only entries that are missing or None need 
to have the subdirectory check.

>I call that a "spike" in "all package imports" in terms of stat calls
>if this was added to all packages as suggested. And that can be
>painful on systems where stat calls are expensive (e.g., NFS). At
>least extend_path() appends so the new entries are put at the back of
>the list.

...which actually negates the entire point of the proposal, which was 
somebody wanting to be able to install an override/upgrade of a 
module in a stdlib package.

>Yes, it's a trade-off depending on what penalty cost you would prefer
>to pay. But as I said, I don't like the idea of letting people inject
>into the stdlib namespace like this in the first place so I don't want
>this to happen in any official capacity.

IIUC, the OP was requesting the ability to *upgrade* a 
stdlib-provided module, not add items to the namespace.
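
The append-versus-override point can be checked directly.  What follows
is a minimal sketch, not anything from the stdlib: the package name
gr_demo_pkg and the two temporary directories are made up for
illustration, one standing in for the stdlib copy of a package and the
other for a site-packages copy.  It only shows where extend_path() puts
the extra directory in the returned list.

```python
import importlib
import os
import pkgutil
import sys
import tempfile

PKG = "gr_demo_pkg"  # hypothetical package name, used only for this sketch


def make_portion(root):
    """Create root/PKG/__init__.py and return the package directory."""
    pkg_dir = os.path.join(root, PKG)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w"):
        pass
    return pkg_dir


def demo_extend_path():
    """Return (extended __path__, stock dir, extra dir) as real paths."""
    with tempfile.TemporaryDirectory() as stock_root, \
            tempfile.TemporaryDirectory() as extra_root:
        stock_dir = make_portion(stock_root)  # stands in for the stdlib copy
        extra_dir = make_portion(extra_root)  # stands in for site-packages
        sys.path.insert(0, extra_root)
        importlib.invalidate_caches()
        try:
            # Same call a package __init__.py would make on its own __path__.
            extended = pkgutil.extend_path([stock_dir], PKG)
        finally:
            sys.path.remove(extra_root)
        # Resolve symlinks so the comparison does not depend on the platform.
        return ([os.path.realpath(p) for p in extended],
                os.path.realpath(stock_dir),
                os.path.realpath(extra_dir))


extended, stock, extra = demo_extend_path()
print(extended.index(stock) < extended.index(extra))
```

Since submodules are looked up along __path__ in order, the original
directory is searched first, so a same-named module in the appended
directory would be shadowed rather than act as an upgrade.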

From mal at egenix.com  Tue Sep  9 23:26:34 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 09 Sep 2008 23:26:34 +0200
Subject: [Python-3000] Should package __init__ files
	include	pkgutil.extend_path?
In-Reply-To: <18626.61673.143430.847735@montanaro-dyndns-org.local>
References: <18626.61673.143430.847735@montanaro-dyndns-org.local>
Message-ID: <48C6EA0A.9020006@egenix.com>

On 2008-09-06 23:06, skip at pobox.com wrote:
> I'm trying to figure out how to install this dbm.sqlite module I have
> without overwriting the basic install.  My thought was to create a dbm
> package in site-packages then copy sqlite.py there.  That doesn't work
> though.  Modifying dbm.__init__.py to include this does:
> 
>     import pkgutil
>     __path__ = pkgutil.extend_path(__path__, __name__)
> 
> I'm wondering if all the core packages in 3.x should include the above in
> their __init__.py files.

If all you want to do is get the module into the dbm package, why not
make this explicit by requiring an import to install the extra module ?!

import install_dbm_sqlite

which then does:

import sys, dbm
import dbm_sqlite

# Install dbm_sqlite into the dbm package
sys.modules['dbm.sqlite'] = dbm_sqlite
dbm.sqlite = dbm_sqlite

Unlike pkgutil, this also works with ZIP files and frozen modules
and makes the installation explicit and visible to the user reading
your code.
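
For anyone who wants to try the idea without touching the real dbm
package, here is a self-contained sketch.  The names gr_host_pkg,
gr_backend and install_submodule are hypothetical stand-ins for dbm,
dbm_sqlite and the installer module; only sys.modules, types.ModuleType
and importlib.import_module are real.

```python
import importlib
import sys
import types


def install_submodule(package, name, module):
    """Expose `module` as `<package>.<name>` for later imports."""
    qualified = package.__name__ + "." + name
    sys.modules[qualified] = module   # makes "import pkg.name" succeed
    setattr(package, name, module)    # makes attribute access pkg.name work
    return qualified


# Stand-in for the host package (dbm in the discussion above).
host = types.ModuleType("gr_host_pkg")
host.__path__ = []                    # an empty __path__ marks it as a package
sys.modules["gr_host_pkg"] = host

# Stand-in for the separately installed backend (dbm_sqlite above).
backend = types.ModuleType("gr_backend")
backend.describe = lambda: "hypothetical backend"

install_submodule(host, "sqlite", backend)

# Both access styles now resolve to the same module object.
print(importlib.import_module("gr_host_pkg.sqlite") is backend)
print(host.sqlite.describe())
```

The cost is that the alias exists only after the installer has been
imported, which is exactly the explicitness argued for above.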

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Sep 09 2008)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611

From mal at egenix.com  Tue Sep  9 23:37:15 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 09 Sep 2008 23:37:15 +0200
Subject: [Python-3000] [Python-Dev] dbm.sqlite
In-Reply-To: <loom.20080909T194122-356@post.gmane.org>
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>	<ga1ck7$ldi$1@ger.gmane.org>	<D1F8B586-F45C-4B8C-9887-612A6A794A45@python.org>	<ca471dc20809081013s63ed2bc2ja10e0fa75059642f@mail.gmail.com>	<1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com>	<ca471dc20809081625s16f5c6f4m243375298748b02d@mail.gmail.com>	<3B14584AAB7147C6892032FE3FAE3755@RaymondLaptop1>	<18629.56199.90786.234922@montanaro-dyndns-org.local>	<48C64DED.3090103@gmail.com>	<loom.20080909T102813-664@post.gmane.org>	<e6511dbf0809091043h3079b03dr2a8a7cde710a78ac@mail.gmail.com>
	<loom.20080909T194122-356@post.gmane.org>
Message-ID: <48C6EC8B.7060502@egenix.com>

On 2008-09-09 21:49, Antoine Pitrou wrote:
> Josiah Carlson <josiah.carlson <at> gmail.com> writes:
>> While the wrapper for sqlite to offer
>> a dbm-like interface is relatively untested (it does have testcases
>> thanks to Skip), dealing with a couple-hundred (at most) line wrapper
>> is far more reasonable for testing, verification, bugfixing, etc.,
>> than the wrappers for bsddb.
> 
> There is theory and there is practice. There are lots of things that unittests
> can't or often don't catch. I don't think it is reasonable at all to add a
> completely new and untested storage backend at the last minute, while the usual
> advice for module inclusion is to first publish it on PyPI to get feedback and
> estimate popularity.
> 
> Another possibility is for the module to be clearly labeled as experimental.

Since when are we adding experimental modules to the stdlib ?

Also, we're past beta3. This is not the time to add completely new
modules to the stdlib - regardless of whether they are stable,
experimental or something in between.

Note that this doesn't mean to imply anything regarding the module
implementation itself. It's a matter of quality control and
assurance.

Besides, what's so bad with downloading and installing a package
from PyPI ?

-- 
Marc-Andre Lemburg
eGenix.com

From skip at pobox.com  Wed Sep 10 00:03:11 2008
From: skip at pobox.com (skip at pobox.com)
Date: Tue, 9 Sep 2008 17:03:11 -0500
Subject: [Python-3000] Should package __init__ files include
 pkgutil.extend_path?
In-Reply-To: <48C6EA0A.9020006@egenix.com>
References: <18626.61673.143430.847735@montanaro-dyndns-org.local>
	<48C6EA0A.9020006@egenix.com>
Message-ID: <18630.62111.952479.189074@montanaro-dyndns-org.local>

    mal> If all you want to do is get the module into the dbm package, why
    mal> not make this explicit by requiring an import to install the extra
    mal> module ?!

    mal> import install_dbm_sqlite

    mal> which then does:

    mal> import sys, dbm
    mal> import dbm_sqlite

    mal> # Install dbm_sqlite into the dbm package
    mal> sys.modules['dbm.sqlite'] = dbm_sqlite
    mal> dbm.sqlite = dbm_sqlite

I was hoping to make migration from an external module in test to a module
distributed with Python (if it gets that far) as seamless as possible.

Skip

From skip at pobox.com  Wed Sep 10 00:26:31 2008
From: skip at pobox.com (skip at pobox.com)
Date: Tue, 9 Sep 2008 17:26:31 -0500
Subject: [Python-3000] [Python-Dev] dbm.sqlite
In-Reply-To: <48C6EC8B.7060502@egenix.com>
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>
	<ga1ck7$ldi$1@ger.gmane.org>
	<D1F8B586-F45C-4B8C-9887-612A6A794A45@python.org>
	<ca471dc20809081013s63ed2bc2ja10e0fa75059642f@mail.gmail.com>
	<1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com>
	<ca471dc20809081625s16f5c6f4m243375298748b02d@mail.gmail.com>
	<3B14584AAB7147C6892032FE3FAE3755@RaymondLaptop1>
	<18629.56199.90786.234922@montanaro-dyndns-org.local>
	<48C64DED.3090103@gmail.com> <loom.20080909T102813-664@post.gmane.org>
	<e6511dbf0809091043h3079b03dr2a8a7cde710a78ac@mail.gmail.com>
	<loom.20080909T194122-356@post.gmane.org> <48C6EC8B.7060502@egenix.com>
Message-ID: <18630.63511.654735.927024@montanaro-dyndns-org.local>

    mal> Besides, what's so bad with downloading and installing a package
    mal> from PyPI ?

Nothing, I do it all the time.  But my impression is that when an external
module moves into the core it frequently undergoes some type of name change
(e.g. pysqlite vs sqlite3 or Optik vs optparse) even if the two versions are
functionally identical.  In this case, my hope is that dbm.sqlite will
eventually move into the distributed dbm package.  If so, it would be nice
if the move was transparent.

Skip

From martin at v.loewis.de  Wed Sep 10 00:29:35 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 10 Sep 2008 00:29:35 +0200
Subject: [Python-3000] PyUnicodeObject implementation
In-Reply-To: <loom.20080909T105135-954@post.gmane.org>
References: <200809051954.42787.jeremy.kloth@gmail.com>	<loom.20080906T103737-758@post.gmane.org>	<g9vv2u$3ot$1@ger.gmane.org>	<loom.20080907T132230-87@post.gmane.org>	<ga0pg4$to6$1@ger.gmane.org>
	<loom.20080909T105135-954@post.gmane.org>
Message-ID: <48C6F8CF.1080905@v.loewis.de>

> Coming back to this: Why is this done anyway? Can't the new instance of the 
> unicode-subtype just steal the buffer pointer of the already allocated unicode 
> object?

Only if the refcount of the tmp object is 1. But then, yes, it could.
You then also need to change unicode_dealloc, to only optionally release
the pointer (and probably also to not put the object into the freelist
if it doesn't have a str pointer). IOW, no :-)

Regards,
Martin

From mal at egenix.com  Wed Sep 10 11:23:54 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Wed, 10 Sep 2008 11:23:54 +0200
Subject: [Python-3000] [Python-Dev] dbm.sqlite
In-Reply-To: <18630.63511.654735.927024@montanaro-dyndns-org.local>
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>	<ga1ck7$ldi$1@ger.gmane.org>	<D1F8B586-F45C-4B8C-9887-612A6A794A45@python.org>	<ca471dc20809081013s63ed2bc2ja10e0fa75059642f@mail.gmail.com>	<1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com>	<ca471dc20809081625s16f5c6f4m243375298748b02d@mail.gmail.com>	<3B14584AAB7147C6892032FE3FAE3755@RaymondLaptop1>	<18629.56199.90786.234922@montanaro-dyndns-org.local>	<48C64DED.3090103@gmail.com>
	<loom.20080909T102813-664@post.gmane.org>	<e6511dbf0809091043h3079b03dr2a8a7cde710a78ac@mail.gmail.com>	<loom.20080909T194122-356@post.gmane.org>
	<48C6EC8B.7060502@egenix.com>
	<18630.63511.654735.927024@montanaro-dyndns-org.local>
Message-ID: <48C7922A.2070401@egenix.com>

On 2008-09-10 00:26, skip at pobox.com wrote:
>     mal> Besides, what's so bad with downloading and installing a package
>     mal> from PyPI ?
> 
> Nothing, I do it all the time.  But my impression is that when an external
> module moves into the core it frequently undergoes some type of name change
> (e.g. pysqlite vs sqlite3 or Optik vs optparse) even if the two versions are
> functionally identical.  In this case, my hope is that dbm.sqlite will
> eventually move into the distributed dbm package.  If so, it would be nice
> if the move was transparent.

Transparent as in "I don't have to change my code" ?

I actually find it helpful to have the PyPI packages that ended up in
the stdlib use different names, since that opens up the possibility
to use the more current releases from PyPI in an application.

Switching back to the core version usually just takes a one line
change, if at all...

try:
    import pysqlite as sqlite
except ImportError:
    import sqlite3 as sqlite

-- 
Marc-Andre Lemburg
eGenix.com

From ncoghlan at gmail.com  Wed Sep 10 11:55:35 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 10 Sep 2008 19:55:35 +1000
Subject: [Python-3000] [Python-Dev] dbm.sqlite
In-Reply-To: <48C7922A.2070401@egenix.com>
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>	<ga1ck7$ldi$1@ger.gmane.org>	<D1F8B586-F45C-4B8C-9887-612A6A794A45@python.org>	<ca471dc20809081013s63ed2bc2ja10e0fa75059642f@mail.gmail.com>	<1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com>	<ca471dc20809081625s16f5c6f4m243375298748b02d@mail.gmail.com>	<3B14584AAB7147C6892032FE3FAE3755@RaymondLaptop1>	<18629.56199.90786.234922@montanaro-dyndns-org.local>	<48C64DED.3090103@gmail.com>	<loom.20080909T102813-664@post.gmane.org>	<e6511dbf0809091043h3079b03dr2a8a7cde710a78ac@mail.gmail.com>	<loom.20080909T194122-356@post.gmane.org>	<48C6EC8B.7060502@egenix.com>	<18630.63511.654735.927024@montanaro-dyndns-org.local>
	<48C7922A.2070401@egenix.com>
Message-ID: <48C79997.3040705@gmail.com>

M.-A. Lemburg wrote:
> On 2008-09-10 00:26, skip at pobox.com wrote:
>>     mal> Besides, what's so bad with downloading and installing a package
>>     mal> from PyPI ?
>>
>> Nothing, I do it all the time.  But my impression is that when an external
>> module moves into the core it frequently undergoes some type of name change
>> (e.g. pysqlite vs sqlite3 or Optik vs optparse) even if the two versions are
>> functionally identical.  In this case, my hope is that dbm.sqlite will
>> eventually move into the distributed dbm package.  If so, it would be nice
>> if the move was transparent.
> 
> Transparent as in "I don't have to change my code" ?
> 
> I actually find it helpful to have the PyPI packages that ended up in
> the stdlib use different names, since that opens up the possibility
> to use the more current releases from PyPI in an application.
> 
> Switching back to the core version usually just takes a one line
> change, if at all...
> 
> try:
>     import pysqlite as sqlite
> except ImportError:
>     import sqlite3 as sqlite
> 
I still think it would be kind of nice to be able to write that as:

import pysqlite or sqlite3 as sqlite

(ditto for "from pysqlite or sqlite3 import <whatever>")

You could even do it as a pre-AST transform (similar to
try/except/finally) and not even have to go anywhere near the
implementation of the import system itself.

I've never been motivated enough to write a PEP about it though.
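
In the meantime a plain function gets most of the way there without new
syntax.  A minimal sketch, assuming a hypothetical helper called
import_first (it is not part of the stdlib); it relies only on
importlib.import_module and tries each name in order:

```python
import importlib


def import_first(*names):
    """Return the first module in `names` that can be imported."""
    last_error = None
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            last_error = exc      # remember why the last candidate failed
    raise ImportError(
        "none of %r could be imported" % (names,)) from last_error


# Rough equivalent of "import pysqlite or sqlite3 as sqlite".
sqlite = import_first("pysqlite", "sqlite3")
print(sqlite.__name__)
```

It can't cover the "from ... import <whatever>" form as neatly, since
the names still have to be pulled off the returned module by hand.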

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
            http://www.boredomandlaziness.org

From fdrake at acm.org  Wed Sep 10 14:09:32 2008
From: fdrake at acm.org (Fred Drake)
Date: Wed, 10 Sep 2008 08:09:32 -0400
Subject: [Python-3000] [Python-Dev] dbm.sqlite
In-Reply-To: <48C7922A.2070401@egenix.com>
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>	<ga1ck7$ldi$1@ger.gmane.org>	<D1F8B586-F45C-4B8C-9887-612A6A794A45@python.org>	<ca471dc20809081013s63ed2bc2ja10e0fa75059642f@mail.gmail.com>	<1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com>	<ca471dc20809081625s16f5c6f4m243375298748b02d@mail.gmail.com>	<3B14584AAB7147C6892032FE3FAE3755@RaymondLaptop1>	<18629.56199.90786.234922@montanaro-dyndns-org.local>	<48C64DED.3090103@gmail.com>
	<loom.20080909T102813-664@post.gmane.org>	<e6511dbf0809091043h3079b03dr2a8a7cde710a78ac@mail.gmail.com>	<loom.20080909T194122-356@post.gmane.org>
	<48C6EC8B.7060502@egenix.com>
	<18630.63511.654735.927024@montanaro-dyndns-org.local>
	<48C7922A.2070401@egenix.com>
Message-ID: <605AFC31-2E44-4787-BCC8-929DE09F0BCA@acm.org>

On Sep 10, 2008, at 5:23 AM, M.-A. Lemburg wrote:
> I actually find it helpful to have the PyPI packages that ended up in
> the stdlib use different names, since that opens up the possibility
> to use the more current releases from PyPI in an application.

I'm with Marc-Andre on this; using the same module names for different  
codebases is a pain.  The standard library and the rest of the Python  
world shouldn't overlap module names (this is part of why many have  
requested a separate namespace for the Python standard library).

   -Fred

-- 
Fred Drake   <fdrake at acm.org>

From guido at python.org  Wed Sep 10 18:11:41 2008
From: guido at python.org (Guido van Rossum)
Date: Wed, 10 Sep 2008 09:11:41 -0700
Subject: [Python-3000] [Python-Dev] dbm.sqlite
In-Reply-To: <605AFC31-2E44-4787-BCC8-929DE09F0BCA@acm.org>
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>
	<18629.56199.90786.234922@montanaro-dyndns-org.local>
	<48C64DED.3090103@gmail.com> <loom.20080909T102813-664@post.gmane.org>
	<e6511dbf0809091043h3079b03dr2a8a7cde710a78ac@mail.gmail.com>
	<loom.20080909T194122-356@post.gmane.org>
	<48C6EC8B.7060502@egenix.com>
	<18630.63511.654735.927024@montanaro-dyndns-org.local>
	<48C7922A.2070401@egenix.com>
	<605AFC31-2E44-4787-BCC8-929DE09F0BCA@acm.org>
Message-ID: <ca471dc20809100911y5b0708ffoe9299093d75f7023@mail.gmail.com>

2008/9/10 Fred Drake <fdrake at acm.org>:
> On Sep 10, 2008, at 5:23 AM, M.-A. Lemburg wrote:
>> I actually find it helpful to have the PyPI packages that ended up in
>> the stdlib use different names, since that opens up the possibility
>> to use the more current releases from PyPI in an application.
>
> I'm with Marc-Andre on this; using the same module names for different
> codebases is a pain.  The standard library and the rest of the Python world
> shouldn't overlap module names (this is part of why many have requested a
> separate namespace for the Python standard library).

+1.

Remember the xml / _xmlplus debacle?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From edreamleo at gmail.com  Thu Sep 11 17:56:48 2008
From: edreamleo at gmail.com (Edward K. Ream)
Date: Thu, 11 Sep 2008 10:56:48 -0500
Subject: [Python-3000] 2to3 still broken with b3 on XP
Message-ID: <ffb592890809110856k7047d800p98c0c91c4d0952b6@mail.gmail.com>

Just downloaded the latest Windows installer, python-3.0b3.msi

The 2to3 script is still broken:

C:\Python30\Tools\Scripts>c:\python30\python.exe 2to3.py
Traceback (most recent call last):
  File "2to3.py", line 5, in <module>
    sys.exit(refactor.main())
TypeError: main() takes at least 1 positional argument (0 given)

Edward
--------------------------------------------------------------------
Edward K. Ream email: edreamleo at gmail.com
Leo: http://webpages.charter.net/edreamleo/front.html
--------------------------------------------------------------------

From edreamleo at gmail.com  Thu Sep 11 18:46:33 2008
From: edreamleo at gmail.com (Edward K. Ream)
Date: Thu, 11 Sep 2008 11:46:33 -0500
Subject: [Python-3000] 2to3 still broken with b3 on XP
In-Reply-To: <ffb592890809110856k7047d800p98c0c91c4d0952b6@mail.gmail.com>
References: <ffb592890809110856k7047d800p98c0c91c4d0952b6@mail.gmail.com>
Message-ID: <ffb592890809110946g125b379eu7b3c3b3042d4255f@mail.gmail.com>

On Thu, Sep 11, 2008 at 10:56 AM, Edward K. Ream <edreamleo at gmail.com> wrote:

> The 2to3 script is still broken:
>
> C:\Python30\Tools\Scripts>c:\python30\python.exe 2to3.py
> Traceback (most recent call last):
>  File "2to3.py", line 5, in <module>
>    sys.exit(refactor.main())
> TypeError: main() takes at least 1 positional argument (0 given)

I hacked 2to3.py as follows:

#!/usr/bin/env python
from lib2to3 import refactor
import sys
import os
sys.exit(refactor.main(fixer_dir=os.curdir))

This mostly seems to work.  However, non-ascii characters can cause a crash:

QQQQQQ
C:\leo.repo\trunk\leo>fix test\fix-failure.py

C:\leo.repo\trunk\leo>c:\Python30\python.exe
c:\Python30\Tools\Scripts\2to3.py test\fix-failure.py
Traceback (most recent call last):
  File "c:\Python30\Tools\Scripts\2to3.py", line 9, in <module>
    sys.exit(refactor.main(fixer_dir=os.curdir))
  File "c:\python30\lib\lib2to3\refactor.py", line 85, in main
    rt.refactor_args(args)
  File "c:\python30\lib\lib2to3\refactor.py", line 243, in refactor_args
    self.refactor_file(arg)
  File "c:\python30\lib\lib2to3\refactor.py", line 272, in refactor_file
    input = f.read() + "\n" # Silence certain parse errors
  File "c:\python30\lib\io.py", line 1719, in read
    decoder.decode(self.buffer.read(), final=True))
  File "c:\python30\lib\io.py", line 1294, in decode
    output = self.decoder.decode(input, final=final)
  File "c:\python30\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position
47: character maps to <undefined>

C:\leo.repo\trunk\leo>print /d:con test\fix-failure.py
C:\leo.repo\trunk\leo\test\fix-failure.py is currently being printed
# -*- coding: utf-8 -*-

s = 'abc'.replace(u'??? ???', '" "')
?
C:\leo.repo\trunk\leo>
QQQQQQ

The actual contents of the file are:

# -*- coding: utf-8 -*-

s = 'abc'.replace(u'" "', '" "')

BTW, in other situations I've seen similar crashes with non-ascii
characters outside the first 256 characters on Python 2.5, which seems
strange to me because Leo handles all kinds of unicode characters
correctly.  I have no idea whether I am doing something wrong.

The following appears in my sitecustomize.py file:

sys.setdefaultencoding('utf-8')

Edward
--------------------------------------------------------------------
Edward K. Ream email: edreamleo at gmail.com
Leo: http://webpages.charter.net/edreamleo/front.html
--------------------------------------------------------------------

From edreamleo at gmail.com  Thu Sep 11 18:53:38 2008
From: edreamleo at gmail.com (Edward K. Ream)
Date: Thu, 11 Sep 2008 11:53:38 -0500
Subject: [Python-3000] 2to3 still broken with b3 on XP
In-Reply-To: <ffb592890809110946g125b379eu7b3c3b3042d4255f@mail.gmail.com>
References: <ffb592890809110856k7047d800p98c0c91c4d0952b6@mail.gmail.com>
	<ffb592890809110946g125b379eu7b3c3b3042d4255f@mail.gmail.com>
Message-ID: <ffb592890809110953m1e9cecb6l37728c9a7640c1cd@mail.gmail.com>

On Thu, Sep 11, 2008 at 11:46 AM, Edward K. Ream <edreamleo at gmail.com> wrote:

> The actual contents of the file are:
>
> # -*- coding: utf-8 -*-
>
> s = 'abc'.replace(u'" "', '" "')

The chars got munged. The first string should have the following characters:

U+201C: left double quotation mark
U+201D: right double quotation mark
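
With those two characters restored, the failure can be reproduced
without 2to3 at all.  This is only a sketch under assumptions: SOURCE is
my reconstruction of the test file, and the two helper functions are
made up for the demonstration.  It shows that the UTF-8 bytes of U+201D
end in 0x9d, which cp1252 has no mapping for, and that the coding cookie
can be read with tokenize.detect_encoding instead of relying on the
locale default.

```python
import io
import tokenize

# Reconstruction of the test file, with the two curly quotes restored.
SOURCE = ("# -*- coding: utf-8 -*-\n\n"
          "s = 'abc'.replace(u'\u201c \u201d', '\" \"')\n")
RAW = SOURCE.encode("utf-8")


def first_undecodable_byte(data, encoding):
    """Return the first byte that `encoding` cannot decode, or None."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        return data[exc.start]
    return None


def declared_encoding(data):
    """Read the PEP 263 coding cookie from the first lines of `data`."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(data).readline)
    return encoding


print(first_undecodable_byte(RAW, "cp1252"))   # the byte cp1252 chokes on
print(first_undecodable_byte(RAW, "utf-8"))    # None: decodes cleanly
print(declared_encoding(RAW))
```

So the file itself looks fine; the crash would come from reading it with
the platform default encoding rather than the declared one.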

Edward

--------------------------------------------------------------------
Edward K. Ream email: edreamleo at gmail.com
Leo: http://webpages.charter.net/edreamleo/front.html
--------------------------------------------------------------------

From skip at pobox.com  Fri Sep 12 14:17:55 2008
From: skip at pobox.com (skip at pobox.com)
Date: Fri, 12 Sep 2008 07:17:55 -0500
Subject: [Python-3000] How much should non-dict mappings behave like dict?
Message-ID: <18634.24051.300204.451209@montanaro-dyndns-org.local>

In issue 3783 (http://bugs.python.org/issue3783) the question was raised
about whether or not it's worthwhile making this guarantee:

    zip(d.keys(), d.values()) == d.items()

in the face of no changes to the mapping object.  At issue is whether the
SQL query should force a predictable order on the keys and values fetched
from the database or if that's just wasted CPU cycles.  Making it concrete,
should these two SELECT statements force a consistent ordering on the keys
and values retrieved from the database:

    select key from dict order by key
    select value from dict order by key

Currently SQLite does return the keys and values in the same, predictable,
order, but doesn't guarantee that behavior (so it could change in the
future).
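
To make the trade-off concrete, here is a small sketch against an
in-memory database.  The table layout and the helper names are
assumptions for illustration, not the actual dbm.sqlite code; the
queries are the two above plus the matching one for items.  Because
every query carries the same ORDER BY, the correspondence holds by
construction rather than by accident of SQLite's current behavior.

```python
import sqlite3


def open_demo_db():
    """Create an in-memory table shaped like the one in the queries above."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table dict (key text primary key, value blob)")
    conn.executemany("insert into dict values (?, ?)",
                     [("b", "2"), ("a", "1"), ("c", "3")])
    return conn


def keys(conn):
    return [row[0] for row in
            conn.execute("select key from dict order by key")]


def values(conn):
    return [row[0] for row in
            conn.execute("select value from dict order by key")]


def items(conn):
    return [tuple(row) for row in
            conn.execute("select key, value from dict order by key")]


conn = open_demo_db()
# zip() is lazy in 3.0, so compare lists rather than the raw objects.
print(list(zip(keys(conn), values(conn))) == items(conn))
```

The price is a sort (or an index scan) on every call, which is the
"wasted CPU cycles" side of the question.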

While the discussion in the issue is related to this nascent dbm.sqlite
module, I think it's worth considering the more general issue of how much
behavior non-dict mapping types should be required to share with the dict
type.

In the section "Mapping Types -- dict" in the 2.5.2 library reference:

    http://docs.python.org/lib/typesmapping.html

there is a footnote about ordering of keys and values:

    Keys and values are listed in an arbitrary order which is non-random,
    varies across Python implementations, and depends on the dictionary's
    history of insertions and deletions. If items(), keys(), values(),
    iteritems(), iterkeys(), and itervalues() are called with no intervening
    modifications to the dictionary, the lists will directly
    correspond. This allows the creation of (value, key) pairs using zip():
    "pairs = zip(a.values(), a.keys())". The same relationship holds for the
    iterkeys() and itervalues() methods: "pairs = zip(a.itervalues(),
    a.iterkeys())" provides the same value for pairs. Another way to create
    the same list is "pairs = [(v, k) for (k, v) in a.iteritems()]".

It's not entirely clear if this page is meant to apply just to dictionaries
or if (to the extent possible) it should apply to all mapping types.  I'm of
the opinion it should apply more broadly.  Others are not of that opinion.
Should the documentation be more explicit about this?

Comments?  

Thx,

Skip

From eric at trueblade.com  Fri Sep 12 14:51:32 2008
From: eric at trueblade.com (Eric Smith)
Date: Fri, 12 Sep 2008 08:51:32 -0400
Subject: [Python-3000] How much should non-dict mappings behave like
	dict?
In-Reply-To: <18634.24051.300204.451209@montanaro-dyndns-org.local>
References: <18634.24051.300204.451209@montanaro-dyndns-org.local>
Message-ID: <48CA65D4.9070702@trueblade.com>

skip at pobox.com wrote:
> In issue 3783 (http://bugs.python.org/issue3783) the question was raised
> about whether or not it's worthwhile making this guarantee:
> 
>     zip(d.keys(), d.values()) == d.items()
> 
> in the face of no changes to the mapping object.  At issue is whether the
> SQL query should force a predictable order on the keys and values fetched
> from the database or if that's just wasted CPU cycles.  Making it concrete,
> should these two SELECT statements force a consistent ordering on the keys
> and values retrieved from the database:
> 
>     select key from dict order by key
>     select value from dict order by key
> 
> Currently SQLite does return the keys and values in the same, predictable,
> order, but doesn't guarantee that behavior (so it could change in the
> future).
> 
> While the discussion in the issue is related to this nascent dbm.sqlite
> module, I think it's worth considering the more general issue of how much
> behavior non-dict mapping types should be required to share with the dict
> type.
> 
> In the section "Mapping Types -- dict" in the 2.5.2 library reference:
> 
>     http://docs.python.org/lib/typesmapping.html
> 
> there is a footnote about ordering of keys and values:
> 
>     Keys and values are listed in an arbitrary order which is non-random,
>     varies across Python implementations, and depends on the dictionary's
>     history of insertions and deletions. If items(), keys(), values(),
>     iteritems(), iterkeys(), and itervalues() are called with no intervening
>     modifications to the dictionary, the lists will directly
>     correspond. This allows the creation of (value, key) pairs using zip():
>     "pairs = zip(a.values(), a.keys())". The same relationship holds for the
>     iterkeys() and itervalues() methods: "pairs = zip(a.itervalues(),
>     a.iterkeys())" provides the same value for pairs. Another way to create
>     the same list is "pairs = [(v, k) for (k, v) in a.iteritems()]".
> 
> It's not entirely clear if this page is meant to apply just to dictionaries
> or if (to the extent possible) it should apply to all mapping types.  I'm of
> the opinion it should apply more broadly.  Others are not of that opinion.
> Should the documentation be more explicit about this?
> 
> Comments?  

I think the guarantee should be removed from dicts, and in any event 
shouldn't be a requirement for other mappings. I think the ordering 
should be an implementation detail, just as the part that says "depends 
on the dictionary's history of insertions and deletions" need not be 
true for all mapping implementations. I wouldn't want the performance of 
any dictionary or any mapping type bound by these constraints, 
especially one that might be large and in a database.

Given items(), I don't see why you'd ever need "zip(a.keys(), 
a.values())" to work.

Antoine makes many of these same points in the issue comments.

But then I have no current or imagined use case that would rely on this 
behavior. Others may of course disagree. Has anyone ever seen code that 
relies on this? Might such code predate items()?

Eric.

From barry at python.org  Fri Sep 12 14:54:20 2008
From: barry at python.org (Barry Warsaw)
Date: Fri, 12 Sep 2008 08:54:20 -0400
Subject: [Python-3000] Updated release schedule for 2.6 and 3.0
Message-ID: <93DF7138-3E99-44E3-AF7D-01D89D13E910@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

We had a lot of discussion recently about changing the release  
schedule and splitting Python 2.6 and 3.0.  There was general  
consensus that this was a good idea, in order to hit our October 1  
deadline for Python 2.6 final at least.

There is only one open blocker for 2.6, issue 3657.  Andrew, Fred, Tim  
and I (via IRC) will be getting together tonight to do some Python  
hacking, so we should resolve this issue and release 2.6rc1 tonight.

We'll have an abbreviated 2.6rc1, and I will release 2.6rc2 next  
Wednesday the 17th as planned. The final planned release of 2.6 will  
be Wednesday October 1st.

If 3.0 is looking better, I will release 3.0rc1 on Wednesday,  
otherwise we'll re-evaluate the release schedule for 3.0 as  
necessary.  This means currently the schedule looks like this:

Fri 12-Sep 2.6rc1
Wed 17-Sep 2.6rc2, 3.0rc1
Wed 01-Oct 2.6 final, 3.0rc2
Wed 15-Oct 3.0 final

I've updated the Python Release Schedule gcal and will update the PEP  
momentarily.  We'll close the tree later tonight (UTC-4).

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSMpmfHEjvBPtnXfVAQJUiQP/eTXStyp9M0+Ja7iAFfYcpzfM19j9ddBr
ocMC+KTDSgci5/rmFw4KdMhqO9TBk2sXIdCd9GInnuMOKtndCKZ/PXaexnVvSVGb
P3CpkkMs/vG1itQIc/EXq6CUhzwuxEv9h8Wo8+zcmL05Cc1IrE5d2OYiO0+KQ8ei
lW+j/aNKMWY=
=w2oI
-----END PGP SIGNATURE-----

From skip at pobox.com  Fri Sep 12 14:59:37 2008
From: skip at pobox.com (skip at pobox.com)
Date: Fri, 12 Sep 2008 07:59:37 -0500
Subject: [Python-3000] How much should non-dict mappings behave like
	dict?
In-Reply-To: <48CA65D4.9070702@trueblade.com>
References: <18634.24051.300204.451209@montanaro-dyndns-org.local>
	<48CA65D4.9070702@trueblade.com>
Message-ID: <18634.26553.442410.703389@montanaro-dyndns-org.local>

    Eric> Given items(), I don't see why you'd ever need "zip(a.keys(), 
    Eric> a.values())" to work.

    Eric> Antoine makes many of these same points in the issue comments.

And as I pointed out there's no telling what users will do.  The
zip(keys,values) behavior works for dicts and has probably worked for other
mapping types.  I'm just asking here whether or not the Python documentation
should be more explicit about what is and isn't expected of mapping types.

    Eric> But then I have no current or imagined use case that would rely on
    Eric> this behavior. Others may of course disagree. Has anyone ever seen
    Eric> code that relies on this? Might such code predate items()?

Or might accidentally work:

    keys = d.keys()
    ...do something with keys and other lists...
    # sometime later...
    vals = d.values()
    ...do something with vals and those other lists...

It might just be by accident that the code works, or it might be code which,
as you indicated, predates items() (though it's been around quite a while
now).

The point of raising this only in the context of Python 3 is that this is
presumed to be the place where we remove as many warts as possible.  It
would also be nice to tighten up the documentation in these somewhat murky
areas.

Skip

From edreamleo at gmail.com  Fri Sep 12 15:19:27 2008
From: edreamleo at gmail.com (Edward K. Ream)
Date: Fri, 12 Sep 2008 08:19:27 -0500
Subject: [Python-3000] Updated release schedule for 2.6 and 3.0
In-Reply-To: <93DF7138-3E99-44E3-AF7D-01D89D13E910@python.org>
References: <93DF7138-3E99-44E3-AF7D-01D89D13E910@python.org>
Message-ID: <ffb592890809120619ne821addj108a22e7163afab6@mail.gmail.com>

On Fri, Sep 12, 2008 at 7:54 AM, Barry Warsaw <barry at python.org> wrote:

> We had a lot of discussion recently about changing the release schedule and
> splitting Python 2.6 and 3.0.  There was general consensus that this was a
> good idea, in order to hit our October 1 deadline for Python 2.6 final at
> least.

Does it matter to anyone besides you, the Python developers,
whether the schedule slips by two weeks, or two months, for that
matter?

I am underwhelmed by 3.0 b3: sax and 2to3 are/were broken.  A b4
release for 3.0 (at least) would seem more prudent.

Edward

From barry at python.org  Fri Sep 12 15:20:46 2008
From: barry at python.org (Barry Warsaw)
Date: Fri, 12 Sep 2008 09:20:46 -0400
Subject: [Python-3000] Updated release schedule for 2.6 and 3.0
In-Reply-To: <ffb592890809120619ne821addj108a22e7163afab6@mail.gmail.com>
References: <93DF7138-3E99-44E3-AF7D-01D89D13E910@python.org>
	<ffb592890809120619ne821addj108a22e7163afab6@mail.gmail.com>
Message-ID: <364A23D0-DC52-4C1D-B853-1D8A30C6C928@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sep 12, 2008, at 9:19 AM, Edward K. Ream wrote:

> On Fri, Sep 12, 2008 at 7:54 AM, Barry Warsaw <barry at python.org>  
> wrote:
>
>> We had a lot of discussion recently about changing the release  
>> schedule and
>> splitting Python 2.6 and 3.0.  There was general consensus that  
>> this was a
>> good idea, in order to hit our October 1 deadline for Python 2.6  
>> final at
>> least.
>
> Does it matter to anyone besides you, the Python developers,
> whether the schedule slips by two weeks, or two months, for that
> matter?

For Python 3.0?  No.

> I am underwhelmed by 3.0 b3: sax and 2to3 are/were broken.  A b4
> release for 3.0 (at least) would seem more prudent.

We will release no Python before its time.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSMpsr3EjvBPtnXfVAQKCUQP/WYfGeubbkWjmwI9mlQ4dMbVjGk15imAJ
ArIBs4sH9tbZTE12uNhjNgvXRbN+1QfejNeWOEJEAnPdErPAT0TKAmgA2Rj1MmjP
ook5+MbxkgkNnKbz8lozMPduclc7Djf22CYboAqiskK7G6LfD1fsCrIMEVSku/HX
dQpXGkG/C4g=
=FYgJ
-----END PGP SIGNATURE-----

From guido at python.org  Fri Sep 12 18:13:05 2008
From: guido at python.org (Guido van Rossum)
Date: Fri, 12 Sep 2008 09:13:05 -0700
Subject: [Python-3000] How much should non-dict mappings behave like
	dict?
In-Reply-To: <18634.24051.300204.451209@montanaro-dyndns-org.local>
References: <18634.24051.300204.451209@montanaro-dyndns-org.local>
Message-ID: <ca471dc20809120913j19747998yc855fcb60cc0b3d3@mail.gmail.com>

2008/9/12 skip <skip at pobox.com>:
> In issue 3783 (http://bugs.python.org/issue3783) the question was raised
> about whether or not it's worthwhile making this guarantee:
>
>    zip(d.keys(), d.values()) == d.items()
>
> in the face of no changes to the mapping object.  At issue is whether the
> SQL query should force a predictable order on the keys and values fetched
> from the database or if that's just wasted CPU cycles.  Making it concrete,
> should these two SELECT statements force a consistent ordering on the keys
> and values retrieved from the database:
>
>    select key from dict order by key
>    select value from dict order by key

What's the purpose of the "order by key" clauses here? Doesn't that force
the return order? Perhaps you meant to leave those out?

> Currently SQLite does return the keys and values in the same, predictable,
> order, but doesn't guarantee that behavior (so it could change in the
> future).
>
> While the discussion in the issue is related to this nascent dbm.sqlite
> module, I think it's worth considering the more general issue of how much
> behavior non-dict mapping types should be required to share with the dict
> type.
>
> In the section "Mapping Types -- dict" in the 2.5.2 library reference:
>
>    http://docs.python.org/lib/typesmapping.html
>
> there is a footnote about ordering of keys and values:
>
>    Keys and values are listed in an arbitrary order which is non-random,
>    varies across Python implementations, and depends on the dictionary's
>    history of insertions and deletions. If items(), keys(), values(),
>    iteritems(), iterkeys(), and itervalues() are called with no intervening
>    modifications to the dictionary, the lists will directly
>    correspond. This allows the creation of (value, key) pairs using zip():
>    "pairs = zip(a.values(), a.keys())". The same relationship holds for the
>    iterkeys() and itervalues() methods: "pairs = zip(a.itervalues(),
>    a.iterkeys())" provides the same value for pairs. Another way to create
>    the same list is "pairs = [(v, k) for (k, v) in a.iteritems()]".

That last example is unnecessarily odd -- why does it put the values first?

> It's not entirely clear if this page is meant to apply just to dictionaries
> or if (to the extent possible) it should apply to all mapping types.  I'm of
> the opinion it should apply more broadly.  Others are not of that opinion.
> Should the documentation be more explicit about this?

I probably wrote an early version of that text, and I meant it to apply to
dicts only. (In general this section is a description of the dict
implementation, not of the mapping concept.) I do want to keep this
guarantee for dicts, for the following reasons: (a) it's very unlikely it
will ever change in CPython (note the caveat of no changes); (b) users will
write working code that subtly depends on it, without even realizing it; (c)
no amount of documentation is going to get those users not to make that
assumption; (d) but documenting this requirement (for dicts) is sure to draw
the attention of the implementers of alternative Python versions, who will
have to implement this so as not to break the implicit assumptions of users
in (b).
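
A minimal sketch of the guarantee under discussion, written for Python 3,
where keys(), values() and items() return views rather than lists. The
dictionary contents are made up for illustration:

```python
# Illustrative only: checks the keys()/values()/items() correspondence
# for a built-in dict that is not modified between the calls.
d = {"spam": 1, "eggs": 2, "ham": 3}

# Wrap the views in list() so they can be compared directly.
pairs_from_zip = list(zip(d.keys(), d.values()))
pairs_from_items = list(d.items())

assert pairs_from_zip == pairs_from_items
```

The check only holds when nothing modifies the dict between the calls,
which is the caveat noted in (a).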

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip at pobox.com  Fri Sep 12 19:13:11 2008
From: skip at pobox.com (skip at pobox.com)
Date: Fri, 12 Sep 2008 12:13:11 -0500
Subject: [Python-3000] How much should non-dict mappings behave like
	dict?
In-Reply-To: <ca471dc20809120913j19747998yc855fcb60cc0b3d3@mail.gmail.com>
References: <18634.24051.300204.451209@montanaro-dyndns-org.local>
	<ca471dc20809120913j19747998yc855fcb60cc0b3d3@mail.gmail.com>
Message-ID: <18634.41767.788522.651599@montanaro-dyndns-org.local>

    >> select key from dict order by key
    >> select value from dict order by key

    Guido> What's the purpose of the "order by key" clauses here? Doesn't
    Guido> that force the return order? Perhaps you meant to leave those
    Guido> out?

It's simply to guarantee that the order of the elements of values() is the
same as the order of the elements of keys().  Again, I was thinking that
this property: zip(d.keys(), d.values()) == d.items() was a desirable
property of mappings, not just of the CPython dict implementation.

So is there a definition of what it means to be a mapping?  Maybe this page
in the C API doc?

    http://docs.python.org/api/mapping.html

From that I infer that a mapping must offer these methods: keys, values,
items, __len__, __contains__, __getitem__, __setitem__ and __delitem__.  No
guarantee about the ordering of keys, values and items is made.  Can we
settle on something like this and spell it out explicitly somewhere in the
3.0 docs?

Skip

From guido at python.org  Fri Sep 12 19:33:06 2008
From: guido at python.org (Guido van Rossum)
Date: Fri, 12 Sep 2008 10:33:06 -0700
Subject: [Python-3000] How much should non-dict mappings behave like
	dict?
In-Reply-To: <18634.41767.788522.651599@montanaro-dyndns-org.local>
References: <18634.24051.300204.451209@montanaro-dyndns-org.local>
	<ca471dc20809120913j19747998yc855fcb60cc0b3d3@mail.gmail.com>
	<18634.41767.788522.651599@montanaro-dyndns-org.local>
Message-ID: <ca471dc20809121033j1bbba458w5816317674d9dc8a@mail.gmail.com>

On Fri, Sep 12, 2008 at 10:13 AM,  <skip at pobox.com> wrote:
>
>    >> select key from dict order by key
>    >> select value from dict order by key
>
>    Guido> What's the purpose of the "order by key" clauses here? Doesn't
>    Guido> that force the return order? Perhaps you meant to leave those
>    Guido> out?
>
> It's simply to guarantee that the order of the elements of values() is the
> same as the order of the elements of keys().  Again, I was thinking that
> this property: zip(d.keys(), d.values()) == d.items() was a desirable
> property of mappings, not just of the CPython dict implementation.

But in SQL this would force alphabetical ordering, so of course both
queries would return them in corresponding order. Maybe we should just
drop this, it seems hardly relevant.

> So is there a definition of what it means to be a mapping?  Maybe this page
> in the C API doc?
>
>    http://docs.python.org/api/mapping.html
>
> From that I infer that a mapping must offer these methods: keys, values,
> items, __len__, __contains__, __getitem__, __setitem__ and __delitem__.  No
> guarantee about the ordering of keys, values and items is made.  Can we
> settle on something like this and spell it out explicitly somewhere in the
> 3.0 docs?

That's a C API definition that hasn't been updated. If anything
documents the concept of a mapping it would be the Mapping ABC in the
collections module.
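
As a rough sketch of what the Mapping ABC asks of an implementation: the
class below is hypothetical, assumes unique keys, and imports from
collections.abc, which is where Mapping lives in current Python 3 releases
(at the time of this thread it was exposed as collections.Mapping). Only
__getitem__, __iter__ and __len__ are written by hand; keys(), values() and
items() come from the ABC as mixin methods.

```python
from collections.abc import Mapping


class PairListMapping(Mapping):
    """Hypothetical read-only mapping backed by a list of (key, value) pairs."""

    def __init__(self, pairs):
        # Assumes the keys in `pairs` are unique.
        self._pairs = list(pairs)

    def __getitem__(self, key):
        for k, v in self._pairs:
            if k == key:
                return v
        raise KeyError(key)

    def __iter__(self):
        return (k for k, _ in self._pairs)

    def __len__(self):
        return len(self._pairs)


m = PairListMapping([("a", 1), ("b", 2)])

# The mixin keys(), values() and items() are all derived from __iter__ and
# __getitem__, so they iterate in corresponding order.
assert list(zip(m.keys(), m.values())) == list(m.items())
```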

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From josiah.carlson at gmail.com  Fri Sep 12 19:55:13 2008
From: josiah.carlson at gmail.com (Josiah Carlson)
Date: Fri, 12 Sep 2008 10:55:13 -0700
Subject: [Python-3000] How much should non-dict mappings behave like
	dict?
In-Reply-To: <ca471dc20809121033j1bbba458w5816317674d9dc8a@mail.gmail.com>
References: <18634.24051.300204.451209@montanaro-dyndns-org.local>
	<ca471dc20809120913j19747998yc855fcb60cc0b3d3@mail.gmail.com>
	<18634.41767.788522.651599@montanaro-dyndns-org.local>
	<ca471dc20809121033j1bbba458w5816317674d9dc8a@mail.gmail.com>
Message-ID: <e6511dbf0809121055n4e23aae8rf1830abe22101341@mail.gmail.com>

On Fri, Sep 12, 2008 at 10:33 AM, Guido van Rossum <guido at python.org> wrote:
> On Fri, Sep 12, 2008 at 10:13 AM,  <skip at pobox.com> wrote:
>>
>>    >> select key from dict order by key
>>    >> select value from dict order by key
>>
>>    Guido> What's the purpose of the "order by key" clauses here? Doesn't
>>    Guido> that force the return order? Perhaps you meant to leave those
>>    Guido> out?
>>
>> It's simply to guarantee that the order of the elements of values() is the
>> same as the order of the elements of keys().  Again, I was thinking that
>> this property: zip(d.keys(), d.values()) == d.items() was a desirable
>> property of mappings, not just of the CPython dict implementation.
>
> But in SQL this would force alphabetical ordering so of course it
> would both return them in corresponding order. Maybe we should just
> drop this, it seems hardly relevant.

If the desire is to behave like a bsddb.btree instance, alphabetical
is ok.  Replacing the 'order by key' with 'order by rowid' is
reasonably sane, if alphabetical is explicitly undesired.  Really
there are 3 options: alphabetical ordering, rowid ordering, no
guaranteed ordering (which seems to be on-disk or rowid ordering, my
brief tests tell me nothing).
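
A small sketch of the rowid option using the standard sqlite3 module. The
table layout (a "dict" table with "key" and "value" columns) is assumed from
the SELECT statements quoted above, and the sample rows are invented:

```python
import sqlite3

# In-memory database with an assumed key/value table layout.
conn = sqlite3.connect(":memory:")
conn.execute("create table dict (key text primary key, value text)")
conn.executemany(
    "insert into dict (key, value) values (?, ?)",
    [("pear", "3"), ("apple", "1"), ("fig", "2")],
)

# Ordering both queries by rowid keeps keys and values aligned.
keys = [row[0] for row in conn.execute("select key from dict order by rowid")]
values = [row[0] for row in conn.execute("select value from dict order by rowid")]
items = list(conn.execute("select key, value from dict order by rowid"))

assert list(zip(keys, values)) == items

# Ordering by key instead yields the alphabetical variant.
sorted_keys = [row[0] for row in conn.execute("select key from dict order by key")]
```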

>> So is there a definition of what it means to be a mapping?  Maybe this page
>> in the C API doc?
>>
>>    http://docs.python.org/api/mapping.html
>>
>> From that I infer that a mapping must offer these methods: keys, values,
>> items, __len__, __contains__, __getitem__, __setitem__ and __delitem__.  No
>> guarantee about the ordering of keys, values and items is made.  Can we
>> settle on something like this and spell it out explicitly somewhere in the
>> 3.0 docs?
>
> That's a C API definition that hasn't been updated. If anything
> documents the concept of a mapping it would be the Mapping ABC in the
> collections module.

According to PEP 3119 on the mapping ABC: "i.e. iterating over the
items, keys and values should return results in the same order."

So... key ordered, or rowid ordered?

 - Josiah

From skip at pobox.com  Fri Sep 12 20:02:44 2008
From: skip at pobox.com (skip at pobox.com)
Date: Fri, 12 Sep 2008 13:02:44 -0500
Subject: [Python-3000] How much should non-dict mappings behave like
 dict?
In-Reply-To: <e6511dbf0809121055n4e23aae8rf1830abe22101341@mail.gmail.com>
References: <18634.24051.300204.451209@montanaro-dyndns-org.local>
	<ca471dc20809120913j19747998yc855fcb60cc0b3d3@mail.gmail.com>
	<18634.41767.788522.651599@montanaro-dyndns-org.local>
	<ca471dc20809121033j1bbba458w5816317674d9dc8a@mail.gmail.com>
	<e6511dbf0809121055n4e23aae8rf1830abe22101341@mail.gmail.com>
Message-ID: <18634.44740.453950.907065@montanaro-dyndns-org.local>

    Guido> What's the purpose of the "order by key" clauses here? Doesn't
    Guido> that force the return order? Perhaps you meant to leave those
    Guido> out?
    >>> 
    >>> It's simply to guarantee that the order of the elements of values()
    >>> is the same as the order of the elements of keys().  Again, I was
    >>> thinking that this property: zip(d.keys(), d.values()) == d.items()
    >>> was a desirable property of mappings, not just of the CPython dict
    >>> implementation.
    >> 
    >> But in SQL this would force alphabetical ordering so of course it
    >> would both return them in corresponding order. Maybe we should just
    >> drop this, it seems hardly relevant.

    Josiah> If the desire is to behave like a bsddb.btree instance,
    Josiah> alphabetical is ok.  Replacing the 'order by key' with 'order by
    Josiah> rowid' is reasonably sane, if alphabetical is explicitly
    Josiah> undesired.  Really there are 3 options: alphabetical ordering,
    Josiah> rowid ordering, no guaranteed ordering (which seems to be
    Josiah> on-disk or rowid ordering, my brief tests tell me nothing).

Folks, this is my last comment on this particular issue.  I think everybody
misunderstands what I was getting at here.  All I wanted to do was guarantee
that keys and values were returned in the same order if called with no
intervening updates.  Ordering both statements by the keys seemed to be the
easiest way to accomplish that.  I couldn't have cared less if the result was
sorted, just that it was predictable.  Gerhard suggested that if predictable
ordering was desired that "order by rowid" would be better.

    >> That's a C API definition that hasn't been updated. If anything
    >> documents the concept of a mapping it would be the Mapping ABC in the
    >> collections module.

    Josiah> According to PEP 3119 on the mapping ABC: "i.e. iterating over
    Josiah> the items, keys and values should return results in the same
    Josiah> order."

    Josiah> So... key ordered, or rowid ordered?

I couldn't care less, just so it's predictable.  Ordering by rowid is probably
more efficient.

Skip

From solipsis at pitrou.net  Fri Sep 12 20:10:51 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 12 Sep 2008 18:10:51 +0000 (UTC)
Subject: [Python-3000] How much should non-dict mappings behave like
	dict?
References: <18634.24051.300204.451209@montanaro-dyndns-org.local>
	<ca471dc20809120913j19747998yc855fcb60cc0b3d3@mail.gmail.com>
	<18634.41767.788522.651599@montanaro-dyndns-org.local>
	<ca471dc20809121033j1bbba458w5816317674d9dc8a@mail.gmail.com>
	<e6511dbf0809121055n4e23aae8rf1830abe22101341@mail.gmail.com>
	<18634.44740.453950.907065@montanaro-dyndns-org.local>
Message-ID: <loom.20080912T180849-366@post.gmane.org>

> Gerhard suggested that if predictable
> ordering was desired that "order by rowid" would be better.

I personally don't understand what predictability brings (using a disk backend
implies that you should minimize queries, so using keys() then values() is
inefficient compared to using items() anyway), but this will be my last comment
on the issue as well :)

Regards

Antoine.

From greg at krypto.org  Sun Sep 14 02:07:46 2008
From: greg at krypto.org (Gregory P. Smith)
Date: Sat, 13 Sep 2008 17:07:46 -0700
Subject: [Python-3000] [Python-Dev]  Proposed revised schedule
In-Reply-To: <3DFD4AAC-D8EA-46E6-BC56-C713861C02B7@python.org>
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>
	<ga0ppm$udq$1@ger.gmane.org>
	<9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org>
	<ga1ck7$ldi$1@ger.gmane.org>
	<D1F8B586-F45C-4B8C-9887-612A6A794A45@python.org>
	<ca471dc20809081013s63ed2bc2ja10e0fa75059642f@mail.gmail.com>
	<1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com>
	<ca471dc20809081625s16f5c6f4m243375298748b02d@mail.gmail.com>
	<3B14584AAB7147C6892032FE3FAE3755@RaymondLaptop1>
	<3DFD4AAC-D8EA-46E6-BC56-C713861C02B7@python.org>
Message-ID: <52dc1c820809131707o18e75f6an24a6f3cca1dd084@mail.gmail.com>

On Tue, Sep 9, 2008 at 6:23 AM, Barry Warsaw <barry at python.org> wrote:
>
> That seems risky to me.  First, it's a new feature.  Second, it will be
> largely untested code.  I would much rather see dbm.sqlite released as a
> separate package for possible integration into the core for 3.1.
>
> - -Barry
>

+1

From andreaskalsch at gmx.de  Sun Sep 14 11:57:50 2008
From: andreaskalsch at gmx.de (Andreas Kalsch)
Date: Sun, 14 Sep 2008 11:57:50 +0200
Subject: [Python-3000] Hi
Message-ID: <20080914095750.223700@gmx.net>

Hi,

I am a newbie and I have some questions:

- Is there a searchable archive of this (and any other) mailing list? http://mail.python.org/pipermail/python-3000/ doesn't let me search the archive, so I can't tell whether I am asking a question that has already been answered.

- Where can I find replacements for removed Python 2.x modules? E.g. I want to use the commands module (to execute shell commands in Python). Where do I find it in Python 3000?

Andi


From qgallet at gmail.com  Sun Sep 14 12:21:05 2008
From: qgallet at gmail.com (Quentin Gallet-Gilles)
Date: Sun, 14 Sep 2008 12:21:05 +0200
Subject: [Python-3000] Hi
In-Reply-To: <20080914095750.223700@gmx.net>
References: <20080914095750.223700@gmx.net>
Message-ID: <8b943f2b0809140321p1bc9ab22wbd85c6da05b31c1e@mail.gmail.com>

Hi Andreas,

- There are alternatives to the mailman interface. Gmane, for instance, is
searchable : http://news.gmane.org/gmane.comp.python.python%2d3000.devel

- I suggest you take a look at the 2.6 library reference (
http://docs.python.org/dev/library/index.html). For the "commands" module,
you'll see the following warning :
"In 3.x, getstatus() and two undocumented functions (mk2arg() and mkarg())
have been removed. Also, getstatusoutput() and getoutput() have been moved
to the subprocess module."
Also, if you follow the link to the subprocess module documentation, you'll
see many examples on how to do what you want.
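
For instance, a minimal sketch of the two relocated functions as they exist
in the Python 3 subprocess module (the "echo hello" command is just a
placeholder):

```python
import subprocess

# getstatusoutput() runs the command through the shell and returns a
# (status, output) tuple, with the trailing newline stripped from output.
status, output = subprocess.getstatusoutput("echo hello")

# getoutput() returns only the command's output.
text = subprocess.getoutput("echo hello")
```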

By the way, those questions are best answered on comp.lang.python. This list
is about the core development of Python3000 exclusively.

Cheers,
Quentin

On Sun, Sep 14, 2008 at 11:57 AM, Andreas Kalsch <andreaskalsch at gmx.de> wrote:

> Hi,
>
> I am a newbie and I have some questions:
>
> - Is there a searchable archive of this (and any other) mailing lists?
> http://mail.python.org/pipermail/python-3000/ doesn't let me search the
> archive, so I don't know, if I ask a question which has already been
> answered.
>
> - Where can I find replacements for removed Python 2.x modules? E.g. I want
> to use the commands module (to execute shell commands in Python). Where do I
> find it in Python 3000?
>
> Andi

From andreaskalsch at gmx.de  Sun Sep 14 12:59:52 2008
From: andreaskalsch at gmx.de (Andreas Kalsch)
Date: Sun, 14 Sep 2008 12:59:52 +0200
Subject: [Python-3000] Hi
In-Reply-To: <8b943f2b0809140321p1bc9ab22wbd85c6da05b31c1e@mail.gmail.com>
References: <20080914095750.223700@gmx.net>
	<8b943f2b0809140321p1bc9ab22wbd85c6da05b31c1e@mail.gmail.com>
Message-ID: <20080914105952.18780@gmx.net>

> Hi Andreas,
> 
> - There are alternatives to the mailman interface. Gmane, for instance, is
> searchable : http://news.gmane.org/gmane.comp.python.python%2d3000.devel

This is what I was searching for, thanks!

> - I suggest you take a look at the 2.6 library reference (
> http://docs.python.org/dev/library/index.html). For the "commands" module,
> you'll see the following warning :
> "In 3.x, getstatus() and two undocumented functions (mk2arg() and mkarg())
> have been removed. Also, getstatusoutput() and getoutput() have been moved
> to the subprocess module."
> Also, if you follow the link to the subprocess module documentation, you'll
> see many examples on how to do what you want.

Yes, I have found that there is now the subprocess module.

> By the way, those questions are best answered on comp.lang.python. This list
> is about the core development of Python3000 exclusively.
> 
> Cheers,
> Quentin

Thank you for your answers!

> On Sun, Sep 14, 2008 at 11:57 AM, Andreas Kalsch <andreaskalsch at gmx.de> wrote:
> 
> > Hi,
> >
> > I am a newbie and I have some questions:
> >
> > - Is there a searchable archive of this (and any other) mailing lists?
> > http://mail.python.org/pipermail/python-3000/ doesn't let me search the
> > archive, so I don't know, if I ask a question which has already been
> > answered.
> >
> > - Where can I find replacements for removed Python 2.x modules? E.g. I want
> > to use the commands module (to execute shell commands in Python). Where do I
> > find it in Python 3000?
> >
> > Andi

From ncoghlan at gmail.com  Sun Sep 14 14:06:59 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 14 Sep 2008 22:06:59 +1000
Subject: [Python-3000] Hi
In-Reply-To: <8b943f2b0809140321p1bc9ab22wbd85c6da05b31c1e@mail.gmail.com>
References: <20080914095750.223700@gmx.net>
	<8b943f2b0809140321p1bc9ab22wbd85c6da05b31c1e@mail.gmail.com>
Message-ID: <48CCFE63.6050109@gmail.com>

Quentin Gallet-Gilles wrote:
> Hi Andreas,
> 
> - There are alternatives to the mailman interface. Gmane, for instance,
> is searchable : http://news.gmane.org/gmane.comp.python.python%2d3000.devel

For what it's worth, I personally just use Google on the mailman
archives via a couple of keyword bookmarks in Firefox. (e.g. including
"site:mail.python.org inurl:python-dev" in a Google search will search
the Mailman archives for python-dev, and I can trigger such a search by
typing "pydev <search terms>" in the address bar).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
            http://www.boredomandlaziness.org

From jcea at jcea.es  Wed Sep 17 20:27:21 2008
From: jcea at jcea.es (Jesus Cea)
Date: Wed, 17 Sep 2008 20:27:21 +0200
Subject: [Python-3000] [Python-Dev] dbm.sqlite
In-Reply-To: <loom.20080909T102813-664@post.gmane.org>
References: <E4008E05-32A8-4D09-B56C-B324DF468434@python.org>	<ga0ppm$udq$1@ger.gmane.org>	<9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org>	<ga1ck7$ldi$1@ger.gmane.org>	<D1F8B586-F45C-4B8C-9887-612A6A794A45@python.org>	<ca471dc20809081013s63ed2bc2ja10e0fa75059642f@mail.gmail.com>	<1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com>	<ca471dc20809081625s16f5c6f4m243375298748b02d@mail.gmail.com>	<3B14584AAB7147C6892032FE3FAE3755@RaymondLaptop1>	<18629.56199.90786.234922@montanaro-dyndns-org.local>	<48C64DED.3090103@gmail.com>
	<loom.20080909T102813-664@post.gmane.org>
Message-ID: <48D14C09.10608@jcea.es>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Antoine Pitrou wrote:
> I agree about performance but I don't think it's right to say we can fix
> stability later. This is a storage module, and people risk losing their data if
> there are glaring bugs. If we really want an efficient dbm-compatible storage
> backend for all platforms on 3.0, then why not bite the bullet and re-add bsddb?
> Even though it has its quirks, it's certainly much more tested than a
> hypothetical dbm.sqlite whipped up in a few days and used by nobody in the wild.

Of course I'm +1 on re-adding bsddb, all the more so with 3.0 slipping past
the original October 1st release. But note that Guido himself would "rather
prefer" to drop bsddb in 3.0.

I have a conflict when it comes to the sqlite dbm module in 3.0, so I would
rather not vote on that issue.

- --
Jesus Cea Avion                         _/_/      _/_/_/        _/_/_/
jcea at jcea.es - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
jabber / xmpp:jcea at jabber.org         _/_/    _/_/          _/_/_/_/_/
.                              _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iQCVAwUBSNFMA5lgi5GaxT1NAQI9ywP/U4g7PjtMp5Uae0NMxByCJsbFgJPXkbMx
S8xi31YqUx9j3hc/3vFjYH2+Ywf1WPTDfUN3LLhf0oVBEbwJl9QQKyua0e2AesBY
g6qQ0meZdpRHm0WzHByI5/aMkxAnwEoHILveMubnQRr1KpTexGHEa6mXv5aVwkJm
6KIqS3tG0kk=
=XMnZ
-----END PGP SIGNATURE-----

From barry at python.org  Thu Sep 18 07:40:18 2008
From: barry at python.org (Barry Warsaw)
Date: Thu, 18 Sep 2008 01:40:18 -0400
Subject: [Python-3000] RELEASED Python 2.6rc2 and 3.0rc1
Message-ID: <D1B63EBE-3BA3-46EC-B8A3-F4AC81F0C50D@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On behalf of the Python development team and the Python community, I  
am happy to announce the second and final planned release candidate  
for Python 2.6, as well as the first release candidate for Python 3.0.

These are release candidates, so while they are not suitable for  
production environments, we strongly encourage you to download and  
test them on your software.  We expect only critical bugs to be fixed  
between now and the final releases.  Currently Python 2.6 is scheduled  
for October 1st, 2008.  Python 3.0 release candidate 2 is planned for  
October 1st, with the final release planned for October 15, 2008.

If you find things broken or incorrect, please submit bug reports at

     http://bugs.python.org

For more information and downloadable distributions, see the Python  
2.6 website:

     http://www.python.org/download/releases/2.6/

and the Python 3.0 web site:

     http://www.python.org/download/releases/3.0/

See PEP 361 for release schedule details:

     http://www.python.org/dev/peps/pep-0361/

Enjoy,
- -Barry

Barry Warsaw
barry at python.org
Python 2.6/3.0 Release Manager
(on behalf of the entire python-dev team)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSNHpw3EjvBPtnXfVAQLW9wP/RBCaUvhuheIh+BjLLIHQFBQi7D3uVgqi
l0+4fhhoKGJvtWklLfSM9I1prcjH/d6tzUu4fIOjX7aM+wZSG++vkfmBoehnhyZW
AvU9Lax4mqDwhOJA2QA0WMx0obpYYVHeUl7D1g9kWzbRUkZDX9NZGMWThhEOC1qA
UA3bBYbvWiQ=
=BFNH
-----END PGP SIGNATURE-----

From gregor.lingl at aon.at  Mon Sep 22 00:15:36 2008
From: gregor.lingl at aon.at (Gregor Lingl)
Date: Mon, 22 Sep 2008 00:15:36 +0200
Subject: [Python-3000] turtle.Screen.__init__ issue
Message-ID: <48D6C788.2050400@aon.at>

Hello there,

It's high time to resolve an issue that I have already raised twice
some weeks ago. (You can find a more elaborate description in my earlier
posting cited below.)

There is a tiny difference (also in behaviour!) in
turtle.Screen.__init__() between the versions for 2.6 and 3.0. The
difference results from the fact that I submitted the 3.0 version
approx. a week later, after having ported it to 3.0. In this process I
found what I now consider to be a bug in 2.6 and changed it
accordingly.  In short:

If you already have a Screen object containing some turtles and some
graphics,

in 2.6: s = Screen() returns an object with identical state and
behaviour, but clears (re-initializes) the screen and thus destroys the
content
in 3.0: s = Screen() returns an object with identical state and
behaviour, but leaves the content untouched

The difference in code consists only of indenting the call of the
__init__ method of the parent class, so that it is executed only
conditionally.

Anyway, as this difference between the two versions is highly
undesirable, there are (imho) three options to proceed:

(1) correct 2.6 in order that it will work like 3.0
(2) undo the change in 3.0 in order that it will work like 2.6
(3) find a different solution for both

I would (like Vern, see below) decidedly prefer option (1), and I
suppose that there is not enough time left to choose option (3), as this
would probably need some discussion.

What is your opinion, and who should decide?

For your convenience I've attached a diff-file which also contains the 
description of three other small bugs, which I've found in the meantime 
and which shouldn't cause any controversies.

Regards, Gregor

%%%%%%%%%%

Here follows the answer of Vern Ceder, a long-term turtle graphics user
and author of several patches for the old turtle module, to my earlier
posting:

 >> Gregor,
 >>
 >> I don't feel authoritative on the correctness/appropriateness of the
 >> implementation, but I do agree completely that behavior b, or what you
 >> have in the 3.0 version, is vastly preferable.
 >>
 >> Cheers,
 >> Vern

-------- Original Message --------
Subject: 	[Python-Dev] turtle.Screen- how to implement best a Singleton
Date: 	Mon, 18 Aug 2008 10:15:45 +0200
From: 	Gregor Lingl <gregor.lingl at aon.at>
To: 	python-dev at python.org
CC: 	Toby Donaldson <tjd at sfu.ca>, python-3000 at python.org, 
jjposner at snet.net, Brad Miller <bonelake at gmail.com>, Vern Ceder 
<vceder at canterburyschool.org>

Hi,

this posting - concerning the new turtle module - goes to the Python-Dev
and Python-3000 lists and to a couple of 'power users' of turtle
graphics, in the hope of receiving feedback from the developer's point of
view as well as from the user's point of view.

Currently the implementations of the turtle.Screen class for Python 2.6
and Python 3.0 differ by a 'tiny' detail with an important difference in
behaviour. So clearly this has to be resolved before the final
release. (The origin of this difference is that when I ported turtle.py
to Python 3.0 I discovered (and 'fixed') what I now consider to be a bug
in the 2.6 version.) I'd like to ask you kindly for your advice on how to
achieve an optimal solution.

The posting consists of three parts:
1. Exposition of design goals
2. Problem with the implementation 
3. How to solve it?

Preliminary remark:  I've had some discussions on this topic before but 
I still do not see a clear solution. Moreover I'm well aware of the fact 
that using the Singleton pattern is controversial. So ...

1. Exposition of design goals
... why use the Singleton design pattern? The turtle module contains a
TurtleScreen class, which implements methods to control the drawing area
the turtle is (turtles are) drawing on. Its constructor needs a Tkinter
Canvas as argument. In order to avoid the need for users to tinker
around with Tkinter stuff there is the Screen(TurtleScreen) class,
designed to be used by beginners (students, kids, ...), particularly in
interactive sessions.

A (THE (!)) Screen object is essentially a window containing a scrolled
canvas, the TurtleScreen. So it's a resource which should exist only
once. It can be constructed in several ways:
- implicitly by calling an arbitrary function derived from a
  Turtle method, such as forward(100), or by constructing a Turtle such as
  bob = Turtle()
- implicitly by calling an arbitrary function derived from a Screen
  method, such as bgcolor("red")
- explicitly by calling its constructor, such as s = Screen()
Anyway this construction should only happen if a Screen object doesn't
exist yet.
Now for the pending question: What should happen when s = Screen() is
called explicitly and 'the' Screen object already exists?
(i) Clearly s should get a reference to the existing Screen object, but ...
(ii) (a)... should s be reinitialized (this is the case now in Python
2.6), or
    (b)... should s be left untouched (this is the case now in Python 3.0)?

I, for my part, prefer the latter solution (b). Example: a student,
having (interactively) produced some design using some turtle t =
Turtle(), decides spontaneously to change the background colour. s = Screen();
s.bgcolor("pink") should do this for her, instead of deleting her
design and moreover her turtle. To reinitialize the screen she can still
use s.clear().

Of course, there are workarounds to achieve the same effect with
solution (a) as well, for instance by assigning s = Screen() *before*
drawing anything or by assigning s = t.getscreen(). But imho (which derives
from my experience as a teacher) solution (b) better supports the
OOP view as well as spontaneous experimenting in interactive sessions.

2. Problem with the implementation
The task is to derive a Singleton class from a Nonsingleton class 
(Screen from TurtleScreen). The current implementations of the Screen 
'Singleton' both use the Borg idiom.  Just for *explaining* the 
difference between the two versions of class Screen here concisely,  
I'll use a 'standard' Singleton pattern (roughly equivalent to the Borg 
idiom):

class Spam(object):
   def __init__(self, s):
       self.s = s

class SingleSpam(Spam):
   _inst = None
   def __new__(cls, *args, **kwargs):
       # create the single instance on first use only
       if cls != type(cls._inst):
           # object.__new__() takes no further arguments, so pass only cls
           cls._inst = Spam.__new__(cls)
       return cls._inst
   def __init__(self, s):
       if vars(self): return    ######  should this be here???
       Spam.__init__(self, s)

In short, this means that SingleSpam.__init__() acts like an empty method
whenever a (the!) SingleSpam object already exists. The 3.0 version of
Screen acts like this. By contrast, the 2.6 version of Screen acts as if the
second-to-last line were not there and thus reinitializes the Screen object.
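
The two behaviours can be compared side by side in a minimal sketch. The
class names GuardedSpam and ResettingSpam and the 'content' attribute are
made up for illustration (they stand in for the 3.0 and 2.6 variants of
Screen and for the drawing on the screen); this is not the turtle.py code.

```python
class Spam(object):
    def __init__(self, s):
        self.s = s
        self.content = []   # stands in for turtles and drawings


class GuardedSpam(Spam):
    """Behaves like the 3.0 Screen: a second call leaves the state alone."""
    _inst = None

    def __new__(cls, *args, **kwargs):
        if cls != type(cls._inst):
            cls._inst = Spam.__new__(cls)
        return cls._inst

    def __init__(self, s):
        if vars(self):      # already initialized: do nothing
            return
        Spam.__init__(self, s)


class ResettingSpam(Spam):
    """Behaves like the 2.6 Screen: every call runs the parent __init__."""
    _inst = None

    def __new__(cls, *args, **kwargs):
        if cls != type(cls._inst):
            cls._inst = Spam.__new__(cls)
        return cls._inst

    def __init__(self, s):
        Spam.__init__(self, s)   # wipes 'content' on every call
```

In both variants the second call returns the same object; only the guarded
variant keeps whatever was stored in 'content' before that call.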

3. How to solve it?

Main question: which *behaviour* of the Screen class should be
preferred?  If that of 3.0, is it feasible and correct not to call the
constructor of the parent class if the object already exists?

Additional question: Do you consider the Borg idiom a good solution for 
this task or should the standard singleton pattern as shown above be 
preferred. Or would you suggest a solution/an approach different from both?
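
For comparison, the Borg idiom mentioned above could look roughly like
this. It is only a sketch under assumptions: TurtleScreenStandIn,
ScreenStandIn and the placeholder canvas are invented names, and the real
turtle.py may differ in detail.

```python
class TurtleScreenStandIn(object):
    """Hypothetical non-singleton parent class (stands in for TurtleScreen)."""
    def __init__(self, canvas):
        self.canvas = canvas
        self.items = []      # stands in for the drawing on the canvas


class ScreenStandIn(TurtleScreenStandIn):
    """Borg idiom: many instances, one shared attribute dictionary."""
    _shared_state = {}

    def __init__(self):
        # Every instance uses the same __dict__, so all state is shared.
        self.__dict__ = self._shared_state
        # Run the parent constructor only while the shared state is empty,
        # which corresponds to behaviour (b).
        if not self._shared_state:
            TurtleScreenStandIn.__init__(self, canvas="canvas placeholder")
```

Unlike the Singleton version, two calls give two distinct objects, but
since they share one attribute dictionary they behave identically.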

Thanks for your patience, and - in advance - for your assistance

Regards,
Gregor

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: turtle_patch_rc2-diff.txt
URL: <http://mail.python.org/pipermail/python-3000/attachments/20080922/ac8279f3/attachment.txt>

From Graham.Dumpleton at gmail.com  Fri Sep 19 13:37:09 2008
From: Graham.Dumpleton at gmail.com (Graham Dumpleton)
Date: Fri, 19 Sep 2008 04:37:09 -0700 (PDT)
Subject: [Python-3000] PySys_SetObject() crashes in secondary sub
	interpreters.
Message-ID: <8d669f5a-ff02-482c-9fa4-a1fb303ba0d8@o40g2000prn.googlegroups.com>

For early Python 3.0 alpha versions I had mod_wsgi working with no
problems. When I tried b3, it broke for secondary sub interpreters. In
particular, a call to PySys_SetObject() crashes.

As far as I can tell so far, the problem is that 'interp->sysdict' is
NULL after calling Py_NewInterpreter() to create a secondary sub
interpreter.

Reading through the code and using a debugger, at this point this seems to
be due to the following conditional code:

        sysmod = _PyImport_FindExtension("sys", "sys");
        if (bimod != NULL && sysmod != NULL) {
                interp->sysdict = PyModule_GetDict(sysmod);
                if (interp->sysdict == NULL)
                        goto handle_error;
                Py_INCREF(interp->sysdict);
                PySys_SetPath(Py_GetPath());
                PyDict_SetItemString(interp->sysdict, "modules",
                                     interp->modules);
                _PyImportHooks_Init();
                initmain();
                if (!Py_NoSiteFlag)
                        initsite();
        }

in Py_NewInterpreter() not executing due to
_PyImport_FindExtension("sys", "sys") returning NULL.

Down in _PyImport_FindExtension(), it appears that the reason it fails
is that the following code returns NULL.

        def = (PyModuleDef*)PyDict_GetItemString(extensions, filename);

        .....

                if (def->m_base.m_init == NULL)
                        return NULL;

In other words, whatever m_base.m_init is meant to be is NULL when
perhaps it isn't meant to be.

(gdb) call ((PyModuleDef*)PyDict_GetItemString(extensions,"builtins"))->m_base.m_init
$9 = (PyObject *(*)()) 0
(gdb) call ((PyModuleDef*)PyDict_GetItemString(extensions,"sys"))->m_base.m_init
$10 = (PyObject *(*)()) 0

I am going to keep tracking through to try and work out why, but
posting this initial information in case this rings a bell with
anyone.

I'll also try creating a small test outside of mod_wsgi which creates
a secondary interpreter and calls PySys_SetObject() to see if it
crashes. This should show if there is an underlying problem, or
something to do with how mod_wsgi uses interpreter creation code.

Thanks in advance for any feedback.

Graham

From krstic at solarsail.hcs.harvard.edu  Fri Sep 26 02:15:22 2008
From: krstic at solarsail.hcs.harvard.edu (=?UTF-8?Q?Ivan_Krsti=C4=87?=)
Date: Thu, 25 Sep 2008 20:15:22 -0400
Subject: [Python-3000] PyCon 2009 - Call for proposals
Message-ID: <C7AE5709-D281-452B-8DFD-5A719A28A367@solarsail.hcs.harvard.edu>

Hi folks,

PyCon '09 will be opening for talk proposals shortly; see below. We'd  
love to have some great talks on Python 3000, so please don't be shy!

Cheers,

Ivan Krstić
Chair, PyCon 2009 Program Committee

                               * * *

Call for proposals -- PyCon 2009 -- <http://us.pycon.org/2009/>
===============================================================

Want to share your experience and expertise? PyCon 2009 is looking for
proposals to fill the formal presentation tracks. The PyCon conference
days will be March 27-29, 2009 in Chicago, Illinois, preceded by the
tutorial days (March 25-26), and followed by four days of development
sprints (March 30-April 2).

Previous PyCon conferences have had a broad range of presentations,
from reports on academic and commercial projects to tutorials and case
studies. We hope to continue that tradition this year.

Online proposal submission will open on September 29, 2008. Proposals
will be accepted through November 03, with acceptance notifications
coming out on December 15. For the detailed call for proposals,
please see:

  <http://us.pycon.org/2009/conference/proposals/>

We look forward to seeing you in Chicago!

--
Ivan Krstić <krstic at solarsail.hcs.harvard.edu> | http://radian.org

From solipsis at pitrou.net  Fri Sep 26 12:08:49 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 26 Sep 2008 10:08:49 +0000 (UTC)
Subject: [Python-3000] PyUnicodeObject implementation
References: <200809051954.42787.jeremy.kloth@gmail.com>	<loom.20080906T103737-758@post.gmane.org>
	<g9vv2u$3ot$1@ger.gmane.org>	<ca471dc20809070738v39fdfcccg64d052ba35616bbb@mail.gmail.com>	<48C4464E.5010707@gmail.com>
	<ca471dc20809071555g337d098au10b5cae210dfa0f9@mail.gmail.com>
	<48C642B5.2020109@egenix.com>
	<loom.20080909T110315-549@post.gmane.org>
Message-ID: <loom.20080926T100812-765@post.gmane.org>

So what would be the outcome of this discussion, and should a decision (and
which one) be taken?

Regards

Antoine.

From mal at egenix.com  Fri Sep 26 16:42:59 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 26 Sep 2008 16:42:59 +0200
Subject: [Python-3000] PyUnicodeObject implementation
In-Reply-To: <loom.20080926T100812-765@post.gmane.org>
References: <200809051954.42787.jeremy.kloth@gmail.com>	<loom.20080906T103737-758@post.gmane.org>	<g9vv2u$3ot$1@ger.gmane.org>	<ca471dc20809070738v39fdfcccg64d052ba35616bbb@mail.gmail.com>	<48C4464E.5010707@gmail.com>	<ca471dc20809071555g337d098au10b5cae210dfa0f9@mail.gmail.com>	<48C642B5.2020109@egenix.com>	<loom.20080909T110315-549@post.gmane.org>
	<loom.20080926T100812-765@post.gmane.org>
Message-ID: <48DCF4F3.3050803@egenix.com>

On 2008-09-26 12:08, Antoine Pitrou wrote:
> So what would be the outcome of this discussion, and should a decision (and
> which one) be taken?

I'm still -1 on changing Unicode objects to PyVarObjects for the
reasons already stated in various postings on this thread and on
the ticket.

I'd much rather like to see the parameters of the implementation
optimized (both in the Unicode implementation and pymalloc). See the
ticket discussion for details.

Regards,
-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Sep 26 2008)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611

From guido at python.org  Fri Sep 26 19:00:09 2008
From: guido at python.org (Guido van Rossum)
Date: Fri, 26 Sep 2008 10:00:09 -0700
Subject: [Python-3000] PyUnicodeObject implementation
In-Reply-To: <48DCF4F3.3050803@egenix.com>
References: <200809051954.42787.jeremy.kloth@gmail.com>
	<loom.20080906T103737-758@post.gmane.org> <g9vv2u$3ot$1@ger.gmane.org>
	<ca471dc20809070738v39fdfcccg64d052ba35616bbb@mail.gmail.com>
	<48C4464E.5010707@gmail.com>
	<ca471dc20809071555g337d098au10b5cae210dfa0f9@mail.gmail.com>
	<48C642B5.2020109@egenix.com>
	<loom.20080909T110315-549@post.gmane.org>
	<loom.20080926T100812-765@post.gmane.org>
	<48DCF4F3.3050803@egenix.com>
Message-ID: <ca471dc20809261000j3e6a27a4s1136113bedbe6d7a@mail.gmail.com>

On Fri, Sep 26, 2008 at 7:42 AM, M.-A. Lemburg <mal at egenix.com> wrote:
> On 2008-09-26 12:08, Antoine Pitrou wrote:
>> So what would be the outcome of this discussion, and should a decision (and
>> which one) be taken?
>
> I'm still -1 on changing Unicode objects to PyVarObjects for the
> reasons already stated in various postings on this thread and on
> the ticket.

I still find those reasons rather weak; the old 8-bit string object
was pretty darn successful despite being a PyVarObject.

> I'd much rather like to see the parameters of the implementation
> optimized (both in the Unicode implementation and pymalloc). See the
> ticket discussion for details.

I think the only way to decide is to have an alternative
implementation ready and prove that it is faster. Or maybe it isn't,
which would also decide the case. In order to allow for a fair race,
if the new implementation comes up with a neat speed-up trick that
could also be applied to the old implementation, it should be removed
or applied to both implementations. That is, I don't want a black-box
race -- I want to see it proven that for a realistic app the choice to
use a PyVarObject actually makes a difference.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From barry at python.org  Sat Sep 27 00:24:37 2008
From: barry at python.org (Barry Warsaw)
Date: Fri, 26 Sep 2008 18:24:37 -0400
Subject: [Python-3000] Reminder: Python 2.6 final next Wednesday
Message-ID: <AA1F5430-661B-46AE-86C8-3E9CC9CE26C8@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

This is a reminder that Python 2.6 final is scheduled for release next  
Wednesday, October 1st.

Once again, I've gone through the release blocker issues and knocked  
anything that doesn't specifically affect 2.6 to deferred blocker.   
This leaves us with  7 open blocking issues.

Please spend some time over the next several days reviewing patches,  
making comments and working toward closing these issues.  Email me  
directly if you have any questions, or ping me on irc.  Unfortunately,  
my 'net connection may be a little bit flakey until Wednesday, but I  
will do my best to get online and follow up as needed.

Please pitch in to help get Python 2.6 released on time!

Thanks,
- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSN1hJnEjvBPtnXfVAQIQfQQAgYr0tuzJhm3LZX/1SaCwRJcX09UvNH1I
CjZHs2TKS22MjF9d3mmBgcEJPl9AwGE+6EF6OiSgrsNRoRtnN0MMT3nQo+deRkan
P3jUgMFJMFkA7Uq5MmuNnEnKZXa/bsu/8Om/4wqHqvtDXbUQkZPfyE8BFwBzJJSM
Aa6Wp3wieFs=
=r4iD
-----END PGP SIGNATURE-----

From giles at spacepigs.com  Sat Sep 27 22:16:32 2008
From: giles at spacepigs.com (Giles Constant)
Date: Sat, 27 Sep 2008 21:16:32 +0100 (BST)
Subject: [Python-3000] Alternative to standard regular expressions
Message-ID: <59903.81.152.141.254.1222546592.squirrel@spacepigs.com>

I'm unsure whether this is the right place to voice my thoughts on such a
thing, but given the idealism of Python (particularly as an antithesis to
many of the ideas of Perl), it occurred to me, after trying to fix a broken
Perl script late at night, that regular expressions are somewhat
un-Pythonic.  I actually find the Python 're' module, although more
versatile than regular expressions in Perl, something that I always have to
refer to the manual for, in spite of the number of times I've used it.  In
other words, I'm tempted to apply our beloved term "unpythonic" to regular
expressions.  This is rare for a small Python module.

So I thought it's time to start something new, perhaps as a python module.

I've googled around to see whether there are any attempts at an alternative
out there, and found nothing, although some people have written some very
good articles about how regular expressions are a problem in a number of
ways:

1) They look horrible.  Like line noise.  Each character is a functional
unit, meaning something that would take a paragraph to describe is reduced
to a small number of characters.  Given that programmers tend to spend more
time thinking than typing, I don't see any advantage to this.

2) They can fail in subtle ways.  Exceptional cases can emerge where an
expression which works in 99% of cases starts losing characters whose
possibility was missed by the author.

3) They can very quickly become rather long (check the expression for an
email address in the back of the 'Mastering Regular Expressions' O'Reilly
book).

4) The use of multi-line switches and other trailing-end characters
complicates things further.

One of the great things about python is that its string, slice, and
split/join functions mean that I rarely use regular expressions in python.
In fact, I try to avoid it.  But a more pythonic matching and substitution
system could be a great thing.

The first thing that occurred to me in trying to imagine what an easier to
use alternative would look like is that they're the wrong way round: the
functional characters - the things that actually do things - are escaped,
while the match strings written in text are the default.  Unless you're
trying to write a '/' or '\', that is, which you have to escape (carefully,
if you're writing something exposed to the internet and you don't want your
server hosed by a hacker).

In other words, it is the match string which should be treated as special,
and the special functions which should be the norm.  So, for an example
first foray into this idea (I'm making this up as I go along.. I should
point out!)

Instead of:
  /\d+hello/

How about (explanation of syntax to follow):

 boolean = match(input, "oneormore(digit).one('hello')")

I'm using a '.' to separate lexical units here.  The specifying functions
indicate how many times or under what circumstances the unit is matched, and
within the brackets are classes representing what needs to be matched.
'digit' represents '\d' in this case, and a string is just that.

Taking it a bit further:

  /\d{1,3}hello/

is replaced by

  boolean = match(input, "range(digit, (1,3)).one('hello')")

Ok, so what about substitution..

  s/.*(hello).*/$1/

  result = substitute(input, "many(char)|one('hello')|many(char)",
                      "match(0)")

Instead of dots, matches which should be captured are contained between pipe
symbols.  I'm still having an argument with myself as to whether some sort of
function/keyword should be used instead.  I dunno.  That's why I emailed you
guys :-)
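
For reference, here is roughly how the three Perl-style patterns above read
with the existing 're' module when the verbose flag is used to comment each
unit. This is only a sketch for comparison; the names digits_then_hello,
short_digits_then_hello and keep_hello are made up.

```python
import re

# /\d+hello/ with one commented unit per line
digits_then_hello = re.compile(r"""
    \d+        # one or more digits
    hello      # the literal text 'hello'
""", re.VERBOSE)

# /\d{1,3}hello/
short_digits_then_hello = re.compile(r"""
    \d{1,3}    # between one and three digits
    hello      # the literal text 'hello'
""", re.VERBOSE)


def keep_hello(text):
    """Equivalent of s/.*(hello).*/$1/: keep only the captured group."""
    return re.sub(r".*(hello).*", r"\1", text)
```

The verbose flag addresses the "line noise" complaint to some degree, but
the functional characters are still the terse ones, which is the point
being argued above.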

I'm going to have a bigger think about this tomorrow, but I think it could be
a great feature.

Cheers!  (and thanks for a great language),

Giles

From musiccomposition at gmail.com  Sat Sep 27 22:52:50 2008
From: musiccomposition at gmail.com (Benjamin Peterson)
Date: Sat, 27 Sep 2008 15:52:50 -0500
Subject: [Python-3000] Alternative to standard regular expressions
In-Reply-To: <59903.81.152.141.254.1222546592.squirrel@spacepigs.com>
References: <59903.81.152.141.254.1222546592.squirrel@spacepigs.com>
Message-ID: <1afaf6160809271352m3e30be10jc260fd5fb2b4d0de@mail.gmail.com>

On Sat, Sep 27, 2008 at 3:16 PM, Giles Constant <giles at spacepigs.com> wrote:
> I'm unsure whether this is the right place to voice my thoughts on such a
> thing, but given the idealism of Python (particularly as an antithesis to
> many of the ideas of Perl), it occurred to me, after trying to fix a broken
> Perl script late at night, that regular expressions are somewhat
> un-Pythonic.  I actually find the Python 're' module, although more
> versatile than regular expressions in Perl, something that I always have to
> refer to the manual for, in spite of the number of times I've used it.  In
> other words, I'm tempted to apply our beloved term "unpythonic" to regular
> expressions.  This is rare for a small Python module.

Try the comp.lang.python or python-ideas mailing list. This list is
more devoted to the current development of Python than new ideas.

-- 
Cheers,
Benjamin Peterson
"There's no place like 127.0.0.1."

From digitalxero at gmail.com  Sun Sep 28 21:00:10 2008
From: digitalxero at gmail.com (Dj Gilcrease)
Date: Sun, 28 Sep 2008 13:00:10 -0600
Subject: [Python-3000] Alternative to standard regular expressions
In-Reply-To: <59903.81.152.141.254.1222546592.squirrel@spacepigs.com>
References: <59903.81.152.141.254.1222546592.squirrel@spacepigs.com>
Message-ID: <e9764b730809281200x79b97b4eyaa110e47606683e1@mail.gmail.com>

On Sat, Sep 27, 2008 at 2:16 PM, Giles Constant <giles at spacepigs.com> wrote:
> Instead of:
>  /\d+hello/
>
> How about (explanation of syntax to follow):
>
>  boolean = match(input, "oneormore(digit).one('hello')")

Looks like you want pyparsing http://pyparsing.wikispaces.com/

From greg at krypto.org  Sun Sep 28 22:34:50 2008
From: greg at krypto.org (Gregory P. Smith)
Date: Sun, 28 Sep 2008 13:34:50 -0700
Subject: [Python-3000] [Python-Dev] Filename as byte string in python
	2.6 or 3.0?
In-Reply-To: <48DE705E.6050405@v.loewis.de>
References: <200809271404.25654.victor.stinner@haypocalc.com>
	<48DE705E.6050405@v.loewis.de>
Message-ID: <52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com>

On 9/27/08, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> I think that the problem is important because it's a regression from 2.5
>> to
>> 2.6/3.0. Python 2.5 uses bytes filename, so it was possible to
>> open/unlink "invalid" unicode strings (since it's not unicode but bytes).
>
> I'd like to stress that the problem is *not* a regression from 2.5 to 2.6.
>
> As for 3.0, I'd like to argue that the problem is a minor issue. Even
> though you may run into file names that can't be decoded, that happening
> really indicates some bigger problem in the management of the system
> where this happens, and the proper solution (IMO) should be to change
> the system (leaving open the question whether or not Python should
> be also changed to work with such broken systems).
>
> Regards,
> Martin

Note:  bcc python-dev,cc: python-3000

"broken" systems will always exist.  Code to deal with them must be
possible to write in python 3.0.

Since any given path (not just a filesystem) can have its own encoding, it
makes the most sense to me to let the OS deal with the errors and not try
to enforce a bytes vs. string type at the Python library level.

-gps

From martin at v.loewis.de  Sun Sep 28 23:13:38 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 28 Sep 2008 23:13:38 +0200
Subject: [Python-3000] [Python-Dev] Filename as byte string in python
	2.6 or 3.0?
In-Reply-To: <52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com>
References: <200809271404.25654.victor.stinner@haypocalc.com>	
	<48DE705E.6050405@v.loewis.de>
	<52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com>
Message-ID: <48DFF382.7020006@v.loewis.de>

> "broken" systems will always exist.  Code to deal with them must be
> possible to write in python 3.0.

Python 3.0 will have bugs. This might just be one of them. I can agree
that Python 3.x will need to support that somehow, but perhaps not 3.0.

Regards,
Martin

From greg.ewing at canterbury.ac.nz  Mon Sep 29 00:55:09 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 29 Sep 2008 10:55:09 +1200
Subject: [Python-3000] Alternative to standard regular expressions
In-Reply-To: <59903.81.152.141.254.1222546592.squirrel@spacepigs.com>
References: <59903.81.152.141.254.1222546592.squirrel@spacepigs.com>
Message-ID: <48E00B4D.9040904@canterbury.ac.nz>

Giles Constant wrote:

> How about (explanation of syntax to follow):
> 
>  boolean = match(input, "oneormore(digit).one('hello')")

Take this a step further and use constructor functions
to build the RE.

   from spiffy_re import one, oneormore
   pattern = oneormore(digit) + one('hello')
   match = pattern.match(input)
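
One possible shape for such constructor functions, as a sketch only:
spiffy_re above is a hypothetical module, and the Pattern class and the
'between' helper below are likewise invented names. Each constructor just
builds a fragment of an ordinary 're' pattern.

```python
import re


class Pattern:
    """Wraps a regular-expression fragment; '+' concatenates fragments."""

    def __init__(self, source):
        self.source = source

    def __add__(self, other):
        return Pattern(self.source + other.source)

    def match(self, text):
        # Delegates to the standard re module, anchored at the start.
        return re.match(self.source, text)


digit = Pattern(r"\d")


def one(item):
    """Exactly one occurrence of a literal string or of another Pattern."""
    if isinstance(item, str):
        return Pattern(re.escape(item))   # literal text, escaped for us
    return Pattern("(?:%s)" % item.source)


def oneormore(item):
    return Pattern("(?:%s)+" % one(item).source)


def between(item, low, high):
    """Counterpart of the proposed range(digit, (1,3))."""
    return Pattern("(?:%s){%d,%d}" % (one(item).source, low, high))


pattern = oneormore(digit) + one('hello')
```

With this, pattern.match("42hello") returns a match object and
pattern.match("hello") returns None, and literal strings never need manual
escaping.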

-- 
Greg

From greg at krypto.org  Mon Sep 29 01:21:16 2008
From: greg at krypto.org (Gregory P. Smith)
Date: Sun, 28 Sep 2008 16:21:16 -0700
Subject: [Python-3000] [Python-Dev] Filename as byte string in python
	2.6 or 3.0?
In-Reply-To: <48DFF382.7020006@v.loewis.de>
References: <200809271404.25654.victor.stinner@haypocalc.com>
	<48DE705E.6050405@v.loewis.de>
	<52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com>
	<48DFF382.7020006@v.loewis.de>
Message-ID: <52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com>

On Sun, Sep 28, 2008 at 2:13 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> "broken" systems will always exist.  Code to deal with them must be
>> possible to write in python 3.0.
>
> Python 3.0 will have bugs. This might just be one of them. I can agree
> that Python 3.x will need to support that somehow, but perhaps not 3.0.
>
> Regards,
> Martin

Agreed.  At this point I think we just need to get 3.0 out there and
be willing to fix flaws like this for 3.1 or in some cases for 3.0.1.

From foom at fuhm.net  Mon Sep 29 06:43:55 2008
From: foom at fuhm.net (James Y Knight)
Date: Mon, 29 Sep 2008 00:43:55 -0400
Subject: [Python-3000] [Python-Dev] Filename as byte string in python
	2.6 or 3.0?
In-Reply-To: <52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com>
References: <200809271404.25654.victor.stinner@haypocalc.com>
	<48DE705E.6050405@v.loewis.de>
	<52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com>
	<48DFF382.7020006@v.loewis.de>
	<52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com>
Message-ID: <96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net>

On Sep 28, 2008, at 7:21 PM, Gregory P. Smith wrote:

> On Sun, Sep 28, 2008 at 2:13 PM, "Martin v. Löwis"  
> <martin at v.loewis.de> wrote:
>>> "broken" systems will always exist.  Code to deal with them must be
>>> possible to write in python 3.0.
>>
>> Python 3.0 will have bugs. This might just be one of them. I can  
>> agree
>> that Python 3.x will need to support that somehow, but perhaps not  
>> 3.0.
>>
>> Regards,
>> Martin
>
> Agreed.  At this point I think we just need to get 3.0 out there and
> be willing to fix flaws like this for 3.1 or in some cases for 3.0.1.

This problem sure would be "practically" solved simply by switching  
the way the filesystemencoding is selected. You'll note that if you  
want things to Just Work for a backup tool with today's Py3k, all you  
need to do is switch the filesystem encoding to iso-8859-1. In that  
encoding, every byte string has an associated unique unicode string,  
so there's no problem with any possible filename.
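
The round-trip property can be checked directly. A small sketch (the file
name and the helper decodes_as are made up for the example):

```python
# Every possible byte value, 0x00 through 0xFF.
every_byte = bytes(range(256))

# ISO-8859-1 maps byte N to code point N, so decoding cannot fail ...
as_text = every_byte.decode("iso-8859-1")
# ... and encoding gives back exactly the original bytes.
round_tripped = as_text.encode("iso-8859-1")

# A made-up file name that is not valid UTF-8.
awkward_name = b"caf\xe9-\xff.txt"


def decodes_as(raw, encoding):
    """Return True if raw can be decoded with the given encoding."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True
```

Here decodes_as(awkward_name, "utf-8") is False while
decodes_as(awkward_name, "iso-8859-1") is True, which is why a backup tool
would see every file under the latter encoding.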

With that in mind, here's my proposal:
a) Whenever ASCII would be selected as a filesystem encoding, use  
iso-8859-1 instead.
b) Whenever UTF-8 would be selected as a filesystem encoding, use  
UTF-8b [1] instead.

It's clearly not a 100% perfect solution, but it completely solves the  
issue for users with the most popular filesystem encodings: ASCII,  
iso-8859-1, and UTF-8. IMO, that's good enough to just leave things  
there.

But even if it's deemed not good enough, and the byte-string level  
file access APIs are all implemented, I *still* think doing the above  
is a good idea. It makes unicode string file/environment/argv access  
work in a huge majority of cases: a) windows always, b) Mac OS X  
always, c) ASCII locale always, d) ISO-8859-1 locale always, e) UTF-8  
locale always, f) other locales when the filenames really are encoded  
in their locale.

It will make users happy, and it's simple enough to implement for  
python 3.0.

James

[1] UTF-8b has a similar property to 8859-1, in that all byte strings  
can be successfully round-tripped. It's not currently implemented in  
python core, but it's a pretty trivial encoding, and is available  
under the BSD license, see below.

Background:
http://mail.nl.linux.org/linux-utf8/2000-07/msg00040.html

Blog post:
http://bsittler.livejournal.com/10381.html

Implementation for python:
http://hyperreal.org/~est/libutf8b/

James

From martin at v.loewis.de  Mon Sep 29 07:09:11 2008
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Mon, 29 Sep 2008 07:09:11 +0200
Subject: [Python-3000] [Python-Dev] Filename as byte string in python
 2.6 or 3.0?
In-Reply-To: <96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net>
References: <200809271404.25654.victor.stinner@haypocalc.com>
	<48DE705E.6050405@v.loewis.de>
	<52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com>
	<48DFF382.7020006@v.loewis.de>
	<52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com>
	<96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net>
Message-ID: <48E062F7.8060501@v.loewis.de>

> This problem sure would be "practically" solved simply by switching the
> way the filesystemencoding is selected.

Great minds think alike :-) I just proposed a similar approach in the
tracker, with the following variations:
- applications can explicitly set the file system encoding. If they set
  it to Latin-1, they can access all files on a POSIX system.
- use private-use characters for unrepresentable bytes

For the second item, there was the immediate objection that this gives
conflicts in UTF-8, for which UTF-8b could be a good solution.

Regards,
Martin

From rhamph at gmail.com  Mon Sep 29 09:32:48 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Mon, 29 Sep 2008 01:32:48 -0600
Subject: [Python-3000] [Python-Dev] Filename as byte string in python
	2.6 or 3.0?
In-Reply-To: <96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net>
References: <200809271404.25654.victor.stinner@haypocalc.com>
	<48DE705E.6050405@v.loewis.de>
	<52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com>
	<48DFF382.7020006@v.loewis.de>
	<52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com>
	<96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net>
Message-ID: <aac2c7cb0809290032x336951e3pd430e464607c4fb0@mail.gmail.com>

On Sun, Sep 28, 2008 at 10:43 PM, James Y Knight <foom at fuhm.net> wrote:
> [1] UTF-8b has a similar property to 8859-1, in that all byte strings can be
> successfully round-tripped. It's not currently implemented in python core,
> but it's a pretty trivial encoding, and is available under the BSD license,
> see below.

UTF-8b doesn't work as intended.  It produces an invalid unicode
object (garbage surrogates) that cannot be used with external APIs or
libraries that require unicode.  If you don't need unicode then your
code should state so explicitly, and 8859-1 is ideal there.

-- 
Adam Olsen, aka Rhamphoryncus

From solipsis at pitrou.net  Mon Sep 29 13:12:43 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 29 Sep 2008 11:12:43 +0000 (UTC)
Subject: [Python-3000] [Python-Dev] Filename as byte string in python
	2.6 or 3.0?
References: <200809271404.25654.victor.stinner@haypocalc.com>
	<48DE705E.6050405@v.loewis.de>
	<52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com>
	<48DFF382.7020006@v.loewis.de>
	<52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com>
	<96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net>
	<aac2c7cb0809290032x336951e3pd430e464607c4fb0@mail.gmail.com>
Message-ID: <loom.20080929T110713-396@post.gmane.org>

Adam Olsen <rhamph <at> gmail.com> writes:
> 
> UTF-8b doesn't work as intended.  It produces an invalid unicode
> object (garbage surrogates) that cannot be used with external APIs or
> libraries that require unicode.

At least it works with all Python operations supported by the unicode type
(methods, concatenation, etc.) without any bad surprise. That feeding it to e.g.
PyGTK may give bogus results is another problem.

> If you don't need unicode then your
> code should state so explicitly, and 8859-1 is ideal there.

But then you can say bye-bye to proper representation (e.g. using print()) of
even valid filenames.
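
A small sketch of the display problem described here, assuming a current
Python 3. The filename is a hypothetical example, not one taken from the
thread.

```python
name = "résumé.txt"                # hypothetical, correctly encoded filename
raw = name.encode("utf-8")         # the bytes a UTF-8 system stores on disk
shown = raw.decode("iso-8859-1")   # what a Latin-1 filesystem encoding yields

# ascii() keeps the output printable on any terminal; each "\xc3\xa9" pair
# below is the two-character sequence that replaces a single "é".
print(ascii(shown))

# The bytes still round-trip, but the text no longer matches the real name.
print(shown.encode("iso-8859-1") == raw, shown == name)
```

The name survives as bytes, so file access keeps working, but what the user
sees is no longer the name they typed.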

From victor.stinner at haypocalc.com  Mon Sep 29 14:07:55 2008
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Mon, 29 Sep 2008 14:07:55 +0200
Subject: [Python-3000] New proposition for Python3 bytes filename issue
Message-ID: <200809291407.55291.victor.stinner@haypocalc.com>

Hi,

After reading the previous discussion, here is a new proposition.

Python 2.x and Windows are not affected by this issue. Only Python3 on POSIX 
(e.g. Linux or *BSD) is affected.

Some systems are broken, but Python has to be able to open/copy/move/remove 
files with an "invalid filename".

The issue can wait for Python 3.0.1 / 3.1.

Windows
-------

On Windows, we might reject bytes filenames for all file operations: open(), 
unlink(), os.path.join(), etc. (raise a TypeError or UnicodeError)

POSIX OS
--------

The default behaviour should be to use unicode and raise an error if 
conversion to unicode fails. It should also be possible to use bytes using 
bytes arguments and optional arguments (for getcwd).

 - listdir(unicode) -> unicode and raise an error on invalid filename
 - listdir(bytes) -> bytes
 - getcwd() -> unicode
 - getcwd(bytes=True) -> bytes
 - open(): accept bytes or unicode

os.path.*() should accept operations on bytes filenames, but maybe not on 
bytes+unicode arguments. os.path.join('directory', b'filename'): raise an 
error (or use *implicit* conversion to bytes)?

To display a filename on the screen, the user can use:
   text = str(filename, fs_encoding, "replace")
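
The proposal above can be sketched as follows. This assumes a current
Python 3, where os.listdir() returns the type it is given and os.getcwdb()
exists; display_name() is a hypothetical helper, and os.fsencode() is a
later addition that the thread does not mention.

```python
import os
import sys
import tempfile


def display_name(raw, encoding=None):
    """Decode a bytes filename for display only (hypothetical helper)."""
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    # Undecodable bytes become U+FFFD, so the result must not be passed
    # back to the operating system as a filename.
    return str(raw, encoding, "replace")


with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "example.txt"), "w").close()
    str_names = os.listdir(tmp)                  # str in   -> list of str
    bytes_names = os.listdir(os.fsencode(tmp))   # bytes in -> list of bytes

cwd_text = os.getcwd()     # str
cwd_bytes = os.getcwdb()   # bytes

print(str_names, bytes_names)
print(ascii(display_name(b"caf\xe9.txt", "utf-8")))
```

Keeping the bytes name for file operations and decoding only for output
means a lossy "replace" cannot cause the wrong file to be opened.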

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/

From victor.stinner at haypocalc.com  Mon Sep 29 14:12:07 2008
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Mon, 29 Sep 2008 14:12:07 +0200
Subject: [Python-3000] [Python-Dev] Filename as byte string in python
	2.6 or 3.0?
In-Reply-To: <96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net>
References: <200809271404.25654.victor.stinner@haypocalc.com>
	<52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com>
	<96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net>
Message-ID: <200809291412.07840.victor.stinner@haypocalc.com>

On Monday 29 September 2008 06:43:55, you wrote:
> It will make users happy, and it's simple enough to implement for
> python 3.0.

I dislike your argument. A "quick and dirty hack" is always faster to 
implement than a real solution, but we may hit new issues later if we don't 
choose the right solution.

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/

From victor.stinner at haypocalc.com  Mon Sep 29 15:23:04 2008
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Mon, 29 Sep 2008 15:23:04 +0200
Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes
	filename issue
In-Reply-To: <200809291407.55291.victor.stinner@haypocalc.com>
References: <200809291407.55291.victor.stinner@haypocalc.com>
Message-ID: <200809291523.04421.victor.stinner@haypocalc.com>

Patches are already available in issue #3187 (os.listdir):

On Monday 29 September 2008 14:07:55, Victor Stinner wrote:
>  - listdir(unicode) -> unicode and raise an error on invalid filename

Need raise_decoding_errors.patch (don't clear the Unicode error)

>  - listdir(bytes) -> bytes

Always working.

>  - getcwd() -> unicode
>  - getcwd(bytes=True) -> bytes

Need merge_os_getcwd_getcwdu.patch

Note that the current implementation of getcwd() uses PyUnicode_FromString() 
to decode the directory, whereas getcwdu() uses the correct code 
(PyUnicode_Decode). So I merged both functions to keep only the correct 
version: getcwdu() => getcwd().

>  - open(): accept bytes or unicode

Need io_byte_filename.patch (just remove a check)

> os.path.*() should accept operations on bytes filenames, but maybe not on
> bytes+unicode arguments. os.path.join('directory', b'filename'): raise an
> error (or use *implicit* conversion to bytes)?

os.path.join() already rejects mixing bytes + str.

But os.path.join(), glob.glob(), fnmatch.*(), etc. don't support bytes. I 
wrote some patches like:
 - glob1_bytes.patch: Fix glob.glob() to accept invalid directory name
 - fnmatch_bytes.patch: Patch fnmatch.filter() to accept bytes filenames

But I dislike both patches since they mix bytes and str. So this part still 
needs some work.

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/

From steven.bethard at gmail.com  Mon Sep 29 17:16:47 2008
From: steven.bethard at gmail.com (Steven Bethard)
Date: Mon, 29 Sep 2008 09:16:47 -0600
Subject: [Python-3000] New proposition for Python3 bytes filename issue
In-Reply-To: <200809291407.55291.victor.stinner@haypocalc.com>
References: <200809291407.55291.victor.stinner@haypocalc.com>
Message-ID: <d11dcfba0809290816k40b75e84pf6df8cfd6263ab15@mail.gmail.com>

On Mon, Sep 29, 2008 at 6:07 AM, Victor Stinner
<victor.stinner at haypocalc.com> wrote:
> The default behaviour should be to use unicode and raise an error if
> conversion to unicode fails. It should also be possible to use bytes using
> bytes arguments and optional arguments (for getcwd).
>
>  - listdir(unicode) -> unicode and raise an error on invalid filename
>  - listdir(bytes) -> bytes
>  - getcwd() -> unicode
>  - getcwd(bytes=True) -> bytes

Please let's not introduce boolean flags like this. How about
``getcwdb`` in parallel with the old ``getcwdu``?

Steve
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From victor.stinner at haypocalc.com  Mon Sep 29 18:00:18 2008
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Mon, 29 Sep 2008 18:00:18 +0200
Subject: [Python-3000] New proposition for Python3 bytes filename issue
In-Reply-To: <d11dcfba0809290816k40b75e84pf6df8cfd6263ab15@mail.gmail.com>
References: <200809291407.55291.victor.stinner@haypocalc.com>
	<d11dcfba0809290816k40b75e84pf6df8cfd6263ab15@mail.gmail.com>
Message-ID: <200809291800.18911.victor.stinner@haypocalc.com>

On Monday 29 September 2008 17:16:47, Steven Bethard wrote:
> >  - getcwd() -> unicode
> >  - getcwd(bytes=True) -> bytes
>
> Please let's not introduce boolean flags like this. How about
> ``getcwdb`` in parallel with the old ``getcwdu``?

Yeah, you're right. So I wrote a new patch: os_getcwdb.patch

With my patch we get (Python3):
 * os.getcwd() -> unicode
 * os.getcwdb() -> bytes

Previously in Python2 it was:
 * os.getcwd() -> str (bytes)
 * os.getcwdu() -> unicode

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/

From foom at fuhm.net  Mon Sep 29 18:16:32 2008
From: foom at fuhm.net (James Y Knight)
Date: Mon, 29 Sep 2008 12:16:32 -0400
Subject: [Python-3000] [Python-Dev] Filename as byte string in python
	2.6 or 3.0?
In-Reply-To: <aac2c7cb0809290032x336951e3pd430e464607c4fb0@mail.gmail.com>
References: <200809271404.25654.victor.stinner@haypocalc.com>
	<48DE705E.6050405@v.loewis.de>
	<52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com>
	<48DFF382.7020006@v.loewis.de>
	<52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com>
	<96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net>
	<aac2c7cb0809290032x336951e3pd430e464607c4fb0@mail.gmail.com>
Message-ID: <2040788B-98C7-4AA2-94AD-2E85E7DF07E8@fuhm.net>

On Sep 29, 2008, at 3:32 AM, Adam Olsen wrote:
> On Sun, Sep 28, 2008 at 10:43 PM, James Y Knight <foom at fuhm.net>  
> wrote:
>> [1] UTF-8b has a similar property to 8859-1, in that all byte  
>> strings can be
>> successfully round-tripped. It's not currently implemented in  
>> python core,
>> but it's a pretty trivial encoding, and is available under the BSD  
>> license,
>> see below.
>
> UTF-8b doesn't work as intended.  It produces an invalid unicode
> object (garbage surrogates) that cannot be used with external APIs or
> libraries that require unicode.

I'd be interested to hear more detail on what you expect the practical  
ramifications of this to be. It doesn't sound likely to be a problem  
to me.

> If you don't need unicode then your
> code should state so explicitly, and 8859-1 is ideal there.

But, I *do* want unicode. ALL my filenames are encoded in utf8.  
Except...that one over there. That's the whole point of UTF-8b:  
correctly encoded names get decoded correctly and readably, and the  
other cases get decoded into something unique that cannot possibly  
conflict.
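
A rough illustration of that behaviour, assuming a current Python 3. The
"surrogateescape" error handler used here was added to Python after this
thread and works on the same idea as UTF-8b (undecodable bytes become lone
surrogates); it is not the libutf8b implementation linked earlier, and the
filenames are hypothetical.

```python
valid = "naïve.txt".encode("utf-8")   # correctly encoded name
broken = b"caf\xe9.txt"               # Latin-1 bytes, not valid UTF-8

valid_text = valid.decode("utf-8", "surrogateescape")
broken_text = broken.decode("utf-8", "surrogateescape")

# The valid name decodes to readable text; the stray byte 0xE9 becomes
# the lone surrogate U+DCE9 instead of raising an error.
print(ascii(valid_text), ascii(broken_text))

# Encoding with the same handler restores the original bytes exactly.
restored = broken_text.encode("utf-8", "surrogateescape")
print(restored == broken)
```

Because valid UTF-8 never decodes to a lone surrogate, an escaped name
cannot collide with a correctly encoded one.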

James

From g.brandl at gmx.net  Mon Sep 29 18:45:28 2008
From: g.brandl at gmx.net (Georg Brandl)
Date: Mon, 29 Sep 2008 18:45:28 +0200
Subject: [Python-3000] New proposition for Python3 bytes filename issue
In-Reply-To: <200809291407.55291.victor.stinner@haypocalc.com>
References: <200809291407.55291.victor.stinner@haypocalc.com>
Message-ID: <gbr0nv$iqu$1@ger.gmane.org>

Victor Stinner schrieb:

> POSIX OS
> --------
> 
> The default behaviour should be to use unicode and raise an error if 
> conversion to unicode fails. It should also be possible to use bytes using 
> bytes arguments and optional arguments (for getcwd).
> 
>  - listdir(unicode) -> unicode and raise an error on invalid filename
>  - listdir(bytes) -> bytes
>  - getcwd() -> unicode
>  - getcwd(bytes=True) -> bytes
>  - open(): accept bytes or unicode
> 
> os.path.*() should accept operations on bytes filenames, but maybe not on 
> bytes+unicode arguments. os.path.join('directory', b'filename'): raise an 
> error (or use *implicit* conversion to bytes)?

This approach (changing all path-handling functions to accept either bytes
or string, but not both) is doomed in my eyes. First, there are lots of them,
second, they are not only in os.path but in many modules and also in user
code, and third, I see no clean way of implementing them in the specified way.
(Just try to do it with os.path.join as an example; I couldn't find the
good way to write it, only the bad and the ugly...)

If I had to choose, I'd still argue for the modified UTF-8 as filesystem
encoding (if it were UTF-8 otherwise), despite possible surprises when a
such-encoded filename escapes from Python.

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.

From guido at python.org  Mon Sep 29 19:06:01 2008
From: guido at python.org (Guido van Rossum)
Date: Mon, 29 Sep 2008 10:06:01 -0700
Subject: [Python-3000] New proposition for Python3 bytes filename issue
In-Reply-To: <gbr0nv$iqu$1@ger.gmane.org>
References: <200809291407.55291.victor.stinner@haypocalc.com>
	<gbr0nv$iqu$1@ger.gmane.org>
Message-ID: <ca471dc20809291006n93de9c6y2aad06d59b22aca3@mail.gmail.com>

> Victor Stinner schrieb:

(Thanks Victor for moving this to the list. Having a discussion in the
tracker is really painful, I find.)

>> POSIX OS
>> --------
>>
>> The default behaviour should be to use unicode and raise an error if
>> conversion to unicode fails. It should also be possible to use bytes using
>> bytes arguments and optional arguments (for getcwd).
>>
>>  - listdir(unicode) -> unicode and raise an error on invalid filename

I know I keep flipflopping on this one, but the more I think about it
the more I believe it is better to drop those names than to raise an
exception. Otherwise a "naive" program that happens to use
os.listdir() can be rendered completely useless by a single non-UTF-8
filename. Consider the use of os.listdir() by the glob module. If I am
globbing for *.py, why should the presence of a file named b'\xff'
cause it to fail?

Robust programs using os.listdir() should use the bytes->bytes version.
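
A sketch of the bytes-only route for the globbing case, assuming a current
Python 3, where fnmatch accepts bytes patterns. The list stands in for what
os.listdir(b".") might return; the names are hypothetical.

```python
import fnmatch

# Stand-in for a bytes directory listing; two names are not valid UTF-8.
names = [b"setup.py", b"\xff", b"notes.txt", b"caf\xe9.py"]

# Pattern and names are all bytes, so nothing has to be decoded and the
# undecodable entries cannot make the match fail.
matches = fnmatch.filter(names, b"*.py")

print(matches)
```

The file named b'\xff' is simply not matched, and the non-UTF-8 name that
does end in .py is still found.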

>>  - listdir(bytes) -> bytes
>>  - getcwd() -> unicode
>>  - getcwd(bytes=True) -> bytes
>>  - open(): accept bytes or unicode
>>
>> os.path.*() should accept operations on bytes filenames, but maybe not on
>> bytes+unicode arguments. os.path.join('directory', b'filename'): raise an
>> error (or use *implicit* conversion to bytes)?

(Yeah, it should be all bytes or all strings.)
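
For illustration, this is how the "all bytes or all strings" rule looks on a
current Python 3, which postdates this thread. The path components are
hypothetical.

```python
import os.path

text_path = os.path.join("directory", "filename")      # all str   -> str
bytes_path = os.path.join(b"directory", b"filename")   # all bytes -> bytes

try:
    os.path.join("directory", b"filename")             # mixed types
    mixed_error = None
except TypeError as exc:
    # No implicit conversion is attempted; the call is rejected.
    mixed_error = exc

print(type(text_path).__name__, type(bytes_path).__name__)
print(type(mixed_error).__name__)
```

Rejecting the mixed call avoids having to guess an encoding inside
os.path.join() itself.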

On Mon, Sep 29, 2008 at 9:45 AM, Georg Brandl <g.brandl at gmx.net> wrote:

> This approach (changing all path-handling functions to accept either bytes
> or string, but not both) is doomed in my eyes. First, there are lots of them,
> second, they are not only in os.path but in many modules and also in user
> code, and third, I see no clean way of implementing them in the specified way.
> (Just try to do it with os.path.join as an example; I couldn't find the
> good way to write it, only the bad and the ugly...)

It doesn't have to be supported for all operations -- just enough to
be able to access all the system calls, and do the most basic pathname
manipulations (split and join -- almost everything else can be built
out of those).

> If I had to choose, I'd still argue for the modified UTF-8 as filesystem
> encoding (if it were UTF-8 otherwise), despite possible surprises when a
> such-encoded filename escapes from Python.

I'm having a hard time finding info about UTF-8b. Does anyone have a
decent link?

I noticed that OSX has a different approach yet. I believe it insists
on valid UTF-8 filenames. It may even require some normalization but I
don't know if the kernel enforces this. I tried to create a file named
b'\xff' and it came out as %ff. Then "rm %ff" worked. So I think it
may be replacing all bad UTF8 sequences with their % encoding.

The "set filesystem encoding to be Latin-1" approach has a certain
charm as well, but clearly would be a mistake on OSX, and probably on
other systems too (whenever the user doesn't think in Latin-1).

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From rhamph at gmail.com  Mon Sep 29 23:57:45 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Mon, 29 Sep 2008 15:57:45 -0600
Subject: [Python-3000] [Python-Dev] Filename as byte string in python
	2.6 or 3.0?
In-Reply-To: <loom.20080929T110713-396@post.gmane.org>
References: <200809271404.25654.victor.stinner@haypocalc.com>
	<48DE705E.6050405@v.loewis.de>
	<52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com>
	<48DFF382.7020006@v.loewis.de>
	<52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com>
	<96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net>
	<aac2c7cb0809290032x336951e3pd430e464607c4fb0@mail.gmail.com>
	<loom.20080929T110713-396@post.gmane.org>
Message-ID: <aac2c7cb0809291457t2bdee45fj497d0596da41797b@mail.gmail.com>

On Mon, Sep 29, 2008 at 5:12 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> Adam Olsen <rhamph <at> gmail.com> writes:
>>
>> UTF-8b doesn't work as intended.  It produces an invalid unicode
>> object (garbage surrogates) that cannot be used with external APIs or
>> libraries that require unicode.
>
> At least it works with all Python operations supported by the unicode type
> (methods, concatenation, etc.) without any bad surprise. That feeding it to e.g.
> PyGTK may give bogus results is another problem.
>
>> If you don't need unicode then your
>> code should state so explicitly, and 8859-1 is ideal there.
>
> But then you can say bye-bye to proper representation (e.g. using print()) of
> even valid filenames.

You can't print UTF-8b either.  Printing requires converting the
unicode object to UTF-8 (or whatever output encoding), and the unicode
object isn't valid, so you'd get an exception[1].
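
A sketch of the failure described above, assuming a current Python 3. The
codecs of 2008 may have behaved differently, and the "surrogateescape" and
"backslashreplace" handlers shown are one possible way to produce and
display such a string, not something proposed in this message.

```python
# A lone surrogate of the kind a UTF-8b style decoder produces for 0xFF.
escaped = b"\xff".decode("utf-8", "surrogateescape")

try:
    escaped.encode("utf-8")            # strict encoding, as output requires
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# For display, the surrogate can be written as an escape sequence instead.
shown = escaped.encode("utf-8", "backslashreplace")

print(strict_ok, shown)
```

The string is usable inside Python but has to be handled specially at
every boundary where conformant Unicode is expected.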

The same applies to all other hacks (such as PUA scalars).  Either the
scalar value already has an expected behaviour, in which case decoding
is lossy and reencoding replaces the correct behaviour, or it's not a
valid scalar value, which then can't be used with any external API
that requires conformant unicode.  There's no solution except to not
decode, and 8859-1 is the way to do that.

[1] Python's UTF codecs are broken in a couple respects, including the
fact that python itself uses CESU-8(!).  See
http://bugs.python.org/issue3297 and http://bugs.python.org/issue3672

-- 
Adam Olsen, aka Rhamphoryncus

From rhamph at gmail.com  Tue Sep 30 00:04:38 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Mon, 29 Sep 2008 16:04:38 -0600
Subject: [Python-3000] New proposition for Python3 bytes filename issue
In-Reply-To: <200809291800.18911.victor.stinner@haypocalc.com>
References: <200809291407.55291.victor.stinner@haypocalc.com>
	<d11dcfba0809290816k40b75e84pf6df8cfd6263ab15@mail.gmail.com>
	<200809291800.18911.victor.stinner@haypocalc.com>
Message-ID: <aac2c7cb0809291504k7f19e190ga9f0dd82587d4448@mail.gmail.com>

On Mon, Sep 29, 2008 at 10:00 AM, Victor Stinner
<victor.stinner at haypocalc.com> wrote:
> On Monday 29 September 2008 17:16:47, Steven Bethard wrote:
>> >  - getcwd() -> unicode
>> >  - getcwd(bytes=True) -> bytes
>>
>> Please let's not introduce boolean flags like this. How about
>> ``getcwdb`` in parallel with the old ``getcwdu``?
>
> Yeah, you're right. So I wrote a new patch: os_getcwdb.patch
>
> With my patch we get (Python3):
>  * os.getcwd() -> unicode
>  * os.getcwdb() -> bytes
>
> Previously in Python2 it was:
>  * os.getcwd() -> str (bytes)
>  * os.getcwdu() -> unicode

Why not do:
 * os.getcwd() -> unicode
 * posix.getcwdb() -> bytes

os gets the standard version and posix has an (unambiguously named)
platform-specific version.

-- 
Adam Olsen, aka Rhamphoryncus

From rhamph at gmail.com  Tue Sep 30 00:17:20 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Mon, 29 Sep 2008 16:17:20 -0600
Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes
	filename issue
In-Reply-To: <ca471dc20809291006n93de9c6y2aad06d59b22aca3@mail.gmail.com>
References: <200809291407.55291.victor.stinner@haypocalc.com>
	<gbr0nv$iqu$1@ger.gmane.org>
	<ca471dc20809291006n93de9c6y2aad06d59b22aca3@mail.gmail.com>
Message-ID: <aac2c7cb0809291517h13a631bbqd4ea49ec757f36e1@mail.gmail.com>

On Mon, Sep 29, 2008 at 11:06 AM, Guido van Rossum <guido at python.org> wrote:
> On Mon, Sep 29, 2008 at 9:45 AM, Georg Brandl <g.brandl at gmx.net> wrote:
>
>> This approach (changing all path-handling functions to accept either bytes
>> or string, but not both) is doomed in my eyes. First, there are lots of them,
>> second, they are not only in os.path but in many modules and also in user
>> code, and third, I see no clean way of implementing them in the specified way.
>> (Just try to do it with os.path.join as an example; I couldn't find the
>> good way to write it, only the bad and the ugly...)
>
> It doesn't have to be supported for all operations -- just enough to
> be able to access all the system calls, and do the most basic pathname
> manipulations (split and join -- almost everything else can be built
> out of those).
>
>> If I had to choose, I'd still argue for the modified UTF-8 as filesystem
>> encoding (if it were UTF-8 otherwise), despite possible surprises when a
>> such-encoded filename escapes from Python.
>
> I'm having a hard time finding info about UTF-8b. Does anyone have a
> decent link?

http://mail.nl.linux.org/linux-utf8/2000-07/msg00040.html

Scroll down to item D, near the bottom.

It turns malformed bytes into lone (therefore malformed) surrogates.

> I noticed that OSX has a different approach yet. I believe it insists
> on valid UTF-8 filenames. It may even require some normalization but I
> don't know if the kernel enforces this. I tried to create a file named
> b'\xff' and it came out as %ff. Then "rm %ff" worked. So I think it
> may be replacing all bad UTF8 sequences with their % encoding.

I suspect linux will eventually take this route as well.  If ext3 had
an option for UTF-8 validation I know I'd want it on.  That'd move the
error to the program creating bogus file names, rather than those
trying to read, display, and manage them.

> The "set filesystem encoding to be Latin-1" approach has a certain
> charm as well, but clearly would be a mistake on OSX, and probably on
> other systems too (whenever the user doesn't think in Latin-1).

Aye, it's a better hack than UTF-8b, but adding byte functions is even better.

-- 
Adam Olsen, aka Rhamphoryncus

From foom at fuhm.net  Tue Sep 30 00:28:31 2008
From: foom at fuhm.net (James Y Knight)
Date: Mon, 29 Sep 2008 18:28:31 -0400
Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes
	filename issue
In-Reply-To: <aac2c7cb0809291517h13a631bbqd4ea49ec757f36e1@mail.gmail.com>
References: <200809291407.55291.victor.stinner@haypocalc.com>
	<gbr0nv$iqu$1@ger.gmane.org>
	<ca471dc20809291006n93de9c6y2aad06d59b22aca3@mail.gmail.com>
	<aac2c7cb0809291517h13a631bbqd4ea49ec757f36e1@mail.gmail.com>
Message-ID: <D6599E0D-3744-4D08-913D-544BB13743E9@fuhm.net>

On Sep 29, 2008, at 6:17 PM, Adam Olsen wrote:
> I suspect linux will eventually take this route as well.  If ext3 had
> an option for UTF-8 validation I know I'd want it on.  That'd move the
> error to the program creating bogus file names, rather than those
> trying to read, display, and manage them.

Of course, even on Mac OS X, or a theoretical UTF-8-enforcing ext3,  
random byte strings are still possible in your program's argv, in  
environment variables, and as arguments to subprocesses.

So python still needs to do something...

James

From martin at v.loewis.de  Tue Sep 30 00:56:18 2008
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue, 30 Sep 2008 00:56:18 +0200
Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes
	filename issue
In-Reply-To: <200809291407.55291.victor.stinner@haypocalc.com>
References: <200809291407.55291.victor.stinner@haypocalc.com>
Message-ID: <48E15D12.40009@v.loewis.de>

> The default behaviour should be to use unicode and raise an error if 
> conversion to unicode fails. It should also be possible to use bytes using 
> bytes arguments and optional arguments (for getcwd).

I'm still opposed to allowing bytes as file names at all in 3k. Python
should really strive for providing a uniform datatype, and that should
be the character string type.

For applications that cannot trust that the conversion works always
correctly on POSIX systems, sys.setfilesystemencoding should be
provided.

In the long run, need for explicit calls to this function should be
reduced, by
a) systems getting more consistent in their file name encoding, and
b) Python providing better defaults for detecting the file name
   encoding, and better round-trip support for non-encodable bytes.
Part b) is probably out-of-scope for 3.0 now, but should be reconsidered
for 3.1

Regards,
Martin

From martin at v.loewis.de  Tue Sep 30 01:14:29 2008
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue, 30 Sep 2008 01:14:29 +0200
Subject: [Python-3000] [Python-Dev] Filename as byte string in python
 2.6 or 3.0?
In-Reply-To: <aac2c7cb0809291457t2bdee45fj497d0596da41797b@mail.gmail.com>
References: <200809271404.25654.victor.stinner@haypocalc.com>	<48DE705E.6050405@v.loewis.de>	<52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com>	<48DFF382.7020006@v.loewis.de>	<52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com>	<96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net>	<aac2c7cb0809290032x336951e3pd430e464607c4fb0@mail.gmail.com>	<loom.20080929T110713-396@post.gmane.org>
	<aac2c7cb0809291457t2bdee45fj497d0596da41797b@mail.gmail.com>
Message-ID: <48E16155.1040209@v.loewis.de>

Adam Olsen wrote:
> There's no solution except to not
> decode, and 8859-1 is the way to do that.

I think you need to elaborate on that. What does ISO-8859-1 have to do
with a Python datatype in this context: which datatype, and what
algorithm on it are you specifically referring to?

When I do (in 2.x)

py> "foo".decode("iso-8859-1")
u'foo'

ISTM that 8859-1 is all about decoding, so I don't understand why
you say it is a way not to decode.

Regards,
Martin

From rhamph at gmail.com  Tue Sep 30 01:23:52 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Mon, 29 Sep 2008 17:23:52 -0600
Subject: [Python-3000] [Python-Dev] Filename as byte string in python
	2.6 or 3.0?
In-Reply-To: <48E16155.1040209@v.loewis.de>
References: <200809271404.25654.victor.stinner@haypocalc.com>
	<48DE705E.6050405@v.loewis.de>
	<52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com>
	<48DFF382.7020006@v.loewis.de>
	<52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com>
	<96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net>
	<aac2c7cb0809290032x336951e3pd430e464607c4fb0@mail.gmail.com>
	<loom.20080929T110713-396@post.gmane.org>
	<aac2c7cb0809291457t2bdee45fj497d0596da41797b@mail.gmail.com>
	<48E16155.1040209@v.loewis.de>
Message-ID: <aac2c7cb0809291623ke994058qcdbe9837695f3b0c@mail.gmail.com>

On Mon, Sep 29, 2008 at 5:14 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> Adam Olsen wrote:
>> There's no solution except to not
>> decode, and 8859-1 is the way to do that.
>
> I think you need to elaborate on that. What does ISO-8859-1 have to do
> with a Python datatype in this context: which datatype, and what
> algorithm on it are you specifically referring to?
>
> When I do (in 2.x)
>
> py> "foo".decode("iso-8859-1")
> u'foo'
>
> ISTM that 8859-1 is all about decoding, so I don't understand why
> you say it is a way not to decode.

8859-1 has no invalid bytes and is a 1-to-1 mapping.  If you have an
API that always returns unicode but accepts an encoding you can use
it, then reencode using 8859-1 to get back the original bytes.

An ugly hack, but more correct than UTF-8b or any similar attempt to
do "unicode but not quite unicode"; either it's lossy, or it's not
unicode.  There's no in between.

-- 
Adam Olsen, aka Rhamphoryncus

From victor.stinner at haypocalc.com  Tue Sep 30 01:29:24 2008
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Tue, 30 Sep 2008 01:29:24 +0200
Subject: [Python-3000] New proposition for Python3 bytes filename issue
In-Reply-To: <ca471dc20809291006n93de9c6y2aad06d59b22aca3@mail.gmail.com>
References: <200809291407.55291.victor.stinner@haypocalc.com>
	<gbr0nv$iqu$1@ger.gmane.org>
	<ca471dc20809291006n93de9c6y2aad06d59b22aca3@mail.gmail.com>
Message-ID: <200809300129.24972.victor.stinner@haypocalc.com>

On Monday 29 September 2008 19:06:01, Guido van Rossum wrote:
> >>  - listdir(unicode) -> unicode and raise an error on invalid filename
>
> I know I keep flipflopping on this one, but the more I think about it
> the more I believe it is better to drop those names than to raise an
> exception. Otherwise a "naive" program that happens to use
> os.listdir() can be rendered completely useless by a single non-UTF-8
> filename. Consider the use of os.listdir() by the glob module. If I am
> globbing for *.py, why should the presence of a file named b'\xff'
> cause it to fail?

It would be hard for a newbie programmer to understand why he's unable to find 
his very important file ("important réport.doc") using os.listdir(). And yes, 
if your file system is broken, glob(<unicode>) will fail.

If we choose to support bytes on Linux, a robust and portable program has to 
use only bytes filenames on Linux to always be able to list and open files.

A full example to list files and display filenames:

  import os
  import os.path
  import sys
  if os.path.supports_unicode_filenames:
     cwd = os.getcwd()
  else:
     # bytes in, bytes out: no filename can fail to decode
     cwd = os.getcwdb()
     encoding = sys.getfilesystemencoding()
  for filename in os.listdir(cwd):
     if os.path.supports_unicode_filenames:
        text = filename
     else:
        # decode for display only; keep the bytes name for open()
        text = str(filename, encoding, "replace")
     print("=== File {0} ===".format(text))
     for line in open(filename):
        ...

We need an "if" to choose the directory. The second "if" is only needed to
display the filename. Using bytes, it would be possible to write better code:
detect the real charset (e.g. ISO-8859-1 in a UTF-8 file system) and so
display the filename correctly and/or propose to rename the file. Would it
be possible using UTF-8b / PUA hacks?
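
To illustrate the "detect the real charset" idea, here is a minimal sketch.
The helper name guess_display_name and the candidate order are assumptions
made for this example only; they are not part of any patch in this thread.

```python
def guess_display_name(raw, candidates=("utf-8", "iso-8859-1")):
    """Return (text, encoding) for the first candidate that decodes raw strictly.

    Hypothetical helper: the bytes name stays the key for opening the file,
    the returned text is only meant for display.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # No candidate matched: fall back to a lossy form, for display only.
    return raw.decode("ascii", "replace"), None


print(guess_display_name(b"dossi\xc3\xa9"))   # valid UTF-8
print(guess_display_name(b"dossi\xe9"))       # not UTF-8, decodes as ISO-8859-1
```

Since ISO-8859-1 accepts every byte, it only makes sense as the last
candidate; a real tool would probably want a smarter heuristic.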

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/

From martin at v.loewis.de  Tue Sep 30 01:31:11 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 30 Sep 2008 01:31:11 +0200
Subject: [Python-3000] [Python-Dev] Filename as byte string in python
 2.6 or 3.0?
In-Reply-To: <aac2c7cb0809291623ke994058qcdbe9837695f3b0c@mail.gmail.com>
References: <200809271404.25654.victor.stinner@haypocalc.com>	
	<48DE705E.6050405@v.loewis.de>	
	<52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com>	
	<48DFF382.7020006@v.loewis.de>	
	<52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com>	
	<96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net>	
	<aac2c7cb0809290032x336951e3pd430e464607c4fb0@mail.gmail.com>	
	<loom.20080929T110713-396@post.gmane.org>	
	<aac2c7cb0809291457t2bdee45fj497d0596da41797b@mail.gmail.com>	
	<48E16155.1040209@v.loewis.de>
	<aac2c7cb0809291623ke994058qcdbe9837695f3b0c@mail.gmail.com>
Message-ID: <48E1653F.2050308@v.loewis.de>

>> ISTM that 8859-1 is all about decoding, so I don't understand why
>> you say it is a way not to decode.
> 
> 8859-1 has no invalid bytes and is a 1-to-1 mapping.  If you have an
> API that always returns unicode but accepts an encoding you can use
> it, then reencode using 8859-1 to get back the original bytes.

I still don't understand. 8859-1 is an encoding, not a datatype.
So how do you propose file names to be represented? "In 8859-1"
is not a valid answer, because you cannot derive an implementation
from that answer (at least, I cannot). Please explain.

Regards,
Martin

From foom at fuhm.net  Tue Sep 30 01:33:47 2008
From: foom at fuhm.net (James Y Knight)
Date: Mon, 29 Sep 2008 19:33:47 -0400
Subject: [Python-3000] [Python-Dev] Filename as byte string in python
	2.6 or 3.0?
In-Reply-To: <aac2c7cb0809291623ke994058qcdbe9837695f3b0c@mail.gmail.com>
References: <200809271404.25654.victor.stinner@haypocalc.com>
	<48DE705E.6050405@v.loewis.de>
	<52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com>
	<48DFF382.7020006@v.loewis.de>
	<52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com>
	<96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net>
	<aac2c7cb0809290032x336951e3pd430e464607c4fb0@mail.gmail.com>
	<loom.20080929T110713-396@post.gmane.org>
	<aac2c7cb0809291457t2bdee45fj497d0596da41797b@mail.gmail.com>
	<48E16155.1040209@v.loewis.de>
	<aac2c7cb0809291623ke994058qcdbe9837695f3b0c@mail.gmail.com>
Message-ID: <47C8A10B-1D2F-4DCB-BACE-BE2D513A11D3@fuhm.net>

On Sep 29, 2008, at 7:23 PM, Adam Olsen wrote:
> An ugly hack, but more correct than UTF-8b or any similar attempt to
> do "unicode but not quite unicode"; either it's lossy, or it's not
> unicode.  There's no in between.

Promoting the use of 8859-1 to decode mostly-utf-8 data seems like a  
very poor way forward. I don't see how you can claim it's more  
correct. It's correct in no case except for pure ASCII on a utf-8  
system.

I still like the UTF-8b proposal, but if you want to push against  
that, I don't see any sensible alternative but to move back towards a  
bytestring API. Having two parallel APIs or a mixture of data types is  
confusing, so, just toss the Unicode APIs entirely. That'd be much  
much nicer than having everyone use 8859-1, incorrectly, for their  
platform encoding.

On Windows, the platform-native Unicode strings could simply be  
encoded into utf-8 when entering Python-land, and decoded back to  
Unicode when leaving pythonland, to keep the API consistently  
bytestring oriented on both platforms.

James

From rhamph at gmail.com  Tue Sep 30 01:34:43 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Mon, 29 Sep 2008 17:34:43 -0600
Subject: [Python-3000] [Python-Dev] Filename as byte string in python
	2.6 or 3.0?
In-Reply-To: <48E1653F.2050308@v.loewis.de>
References: <200809271404.25654.victor.stinner@haypocalc.com>
	<48DFF382.7020006@v.loewis.de>
	<52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com>
	<96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net>
	<aac2c7cb0809290032x336951e3pd430e464607c4fb0@mail.gmail.com>
	<loom.20080929T110713-396@post.gmane.org>
	<aac2c7cb0809291457t2bdee45fj497d0596da41797b@mail.gmail.com>
	<48E16155.1040209@v.loewis.de>
	<aac2c7cb0809291623ke994058qcdbe9837695f3b0c@mail.gmail.com>
	<48E1653F.2050308@v.loewis.de>
Message-ID: <aac2c7cb0809291634t516f7660u1e424275856838b1@mail.gmail.com>

On Mon, Sep 29, 2008 at 5:31 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>>> ISTM that 8859-1 is all about decoding, so I don't understand why
>>> you say it is a way not to decode.
>>
>> 8859-1 has no invalid bytes and is a 1-to-1 mapping.  If you have an
>> API that always returns unicode but accepts an encoding you can use
>> it, then reencode using 8859-1 to get back the original bytes.
>
> I still don't understand. 8859-1 is an encoding, not a datatype.
> So how do you propose file names to be represented? "In 8859-1"
> is not a valid answer, because you cannot derive an implementation
> from that answer (atleast, I cannot). Please explain.

Decoding UTF-8 using 8859-1 gives you garbage, but it's lossless and
reversible, and that's all a backup program would need.
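
A small sketch of what I mean by lossless and reversible (the sample bytes
are made up for illustration):

```python
raw = b"dossi\xc3\xa9/a\xffb"   # a UTF-8 name followed by an invalid byte

# Decoding as ISO-8859-1 never fails: each byte maps to U+0000..U+00FF.
as_text = raw.decode("iso-8859-1")

# Encoding with the same codec gives the original bytes back unchanged.
assert as_text.encode("iso-8859-1") == raw

# The text is garbage for anything beyond ASCII, which is the price paid.
print(repr(as_text))
```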

-- 
Adam Olsen, aka Rhamphoryncus

From guido at python.org  Tue Sep 30 01:41:36 2008
From: guido at python.org (Guido van Rossum)
Date: Mon, 29 Sep 2008 16:41:36 -0700
Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes
	filename issue
In-Reply-To: <200809300129.24972.victor.stinner@haypocalc.com>
References: <200809291407.55291.victor.stinner@haypocalc.com>
	<gbr0nv$iqu$1@ger.gmane.org>
	<ca471dc20809291006n93de9c6y2aad06d59b22aca3@mail.gmail.com>
	<200809300129.24972.victor.stinner@haypocalc.com>
Message-ID: <ca471dc20809291641m3b563b8ar6521b4ec94f0e120@mail.gmail.com>

On Mon, Sep 29, 2008 at 4:29 PM, Victor Stinner
<victor.stinner at haypocalc.com> wrote:
> On Monday 29 September 2008 19:06:01, Guido van Rossum wrote:
>> >>  - listdir(unicode) -> unicode and raise an error on invalid filename
>>
>> I know I keep flipflopping on this one, but the more I think about it
>> the more I believe it is better to drop those names than to raise an
>> exception. Otherwise a "naive" program that happens to use
>> os.listdir() can be rendered completely useless by a single non-UTF-8
>> filename. Consider the use of os.listdir() by the glob module. If I am
>> globbing for *.py, why should the presence of a file named b'\xff'
>> cause it to fail?
>
> It would be hard for a newbie programmer to understand why he's unable to find
> his very important file ("important r?port.doc") using os.listdir().

*Every* failure in this scenario will be hard to understand for a
newbie programmer. We can just document the fact.

> And yes,
> if your file system is broken, glob(<unicode>) will fail.

Why should it?

> If we choose to support bytes on Linux, a robust and portable program have to
> use only bytes filenames on Linux to always be able to list and open files.

Right. But such robustness is only needed to support certain odd cases
and we cannot demand that most people bother to write robust code all
the time.

> A full example to list files and display filenames:
>
>  import os
>  import os.path
>  import sys
>  if os.path.supports_unicode_filenames:

This is backwards -- the Unicode API is always supported, the bytes
API only on Linux (and perhaps some other Unixes).

>     cwd = getcwd()
>  else:
>     cwd = getcwdb()
>     encoding = sys.getfilesystemencoding()
>  for filename in os.listdir(cwd):
>     if os.path.supports_unicode_filenames:
>        text = str(filename, encoding, "replace)
>     else:
>        text = filename
>     print("=== File {0} ===".format(text))
>     for line in open(filename):
>        ...
>
> We need an "if" to choose the directory. The second "if" is only needed to
> display the filename. Using bytes, it would be possible to write better code
> detect the real charset (eg. ISO-8859-1 in a UTF-8 file system) and so
> display correctly the filename and/or propose to rename the file. Would it
> possible using UTF-8b / PUA hacks?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From rhamph at gmail.com  Tue Sep 30 01:50:32 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Mon, 29 Sep 2008 17:50:32 -0600
Subject: [Python-3000] [Python-Dev] Filename as byte string in python
	2.6 or 3.0?
In-Reply-To: <47C8A10B-1D2F-4DCB-BACE-BE2D513A11D3@fuhm.net>
References: <200809271404.25654.victor.stinner@haypocalc.com>
	<48DFF382.7020006@v.loewis.de>
	<52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com>
	<96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net>
	<aac2c7cb0809290032x336951e3pd430e464607c4fb0@mail.gmail.com>
	<loom.20080929T110713-396@post.gmane.org>
	<aac2c7cb0809291457t2bdee45fj497d0596da41797b@mail.gmail.com>
	<48E16155.1040209@v.loewis.de>
	<aac2c7cb0809291623ke994058qcdbe9837695f3b0c@mail.gmail.com>
	<47C8A10B-1D2F-4DCB-BACE-BE2D513A11D3@fuhm.net>
Message-ID: <aac2c7cb0809291650i5aad3ea3i5df7242096498c15@mail.gmail.com>

On Mon, Sep 29, 2008 at 5:33 PM, James Y Knight <foom at fuhm.net> wrote:
> On Sep 29, 2008, at 7:23 PM, Adam Olsen wrote:
>>
>> An ugly hack, but more correct than UTF-8b or any similar attempt to
>> do "unicode but not quite unicode"; either it's lossy, or it's not
>> unicode.  There's no in between.
>
> Promoting the use of 8859-1 to decode mostly-utf-8 data seems like a very
> poor way forward. I don't see how you can claim it's more correct. It's
> correct in no case except for pure ASCII on a utf-8 system.

It's correct in the sense that it can roundtrip all filenames.  UTF-8b
is lossy, so certain filenames are not roundtripped properly.

It doesn't let you print correctly, but neither would an API that
returns bytes.  8859-1 is just a hack for when you want bytes, but the
API only allows unicode.

> I still like the UTF-8b proposal, but if you want to push against that, I
> don't see any sensible alternative but to move back towards a bytestring
> API. Having two parallel APIs or a mixture of data types is confusing, so,
> just toss the Unicode APIs entirely. That'd be much much nicer than having
> everyone use 8859-1, incorrectly, for their platform encoding.

As a user, I expect all file names to be printable.  That requires
unicode, and any program that creates filenames with arbitrary
bytestrings is just broken.  Not all operating systems enforce this
yet, but returning bytes only means we have to explicitly decode in
the 99% of cases where we'd happily assume it's correct unicode.

I'd rather the 1% of cases that need to handle bad file names make an
explicit effort to do so, via alternate byte APIs or (if necessary)
the 8859-1 hack.

> On Windows, the platform-native Unicode strings could simply be encoded into
> utf-8 when entering Python-land, and decoded back to Unicode when leaving
> pythonland, to keep the API consistently bytestring oriented on both
> platforms.

-- 
Adam Olsen, aka Rhamphoryncus

From victor.stinner at haypocalc.com  Tue Sep 30 02:02:38 2008
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Tue, 30 Sep 2008 02:02:38 +0200
Subject: [Python-3000] New proposition for Python3 bytes filename issue
In-Reply-To: <gbr0nv$iqu$1@ger.gmane.org>
References: <200809291407.55291.victor.stinner@haypocalc.com>
	<gbr0nv$iqu$1@ger.gmane.org>
Message-ID: <200809300202.38574.victor.stinner@haypocalc.com>

On Monday 29 September 2008 18:45:28, Georg Brandl wrote:
> If I had to choose, I'd still argue for the modified UTF-8 as filesystem
> encoding (if it were UTF-8 otherwise), despite possible surprises when a
> such-encoded filename escapes from Python.

If I understand this solution correctly, the idea is to change the default
file system encoding, right? E.g. if your filesystem is UTF-8, use ISO-8859-1
to make sure that the conversion will never fail.

Let's try with an ugly directory on my UTF-8 file system:
$ find
.
./têste
./ô
./a?b
./dossié
./dossié/abc
./dir?name
./dir?name/xyz

Python3 using encoding=ISO-8859-1:
>>> import os; os.listdir(b'.')
[b't\xc3\xaaste', b'\xc3\xb4', b'a\xffb', b'dossi\xc3\xa9', b'dir\xffname']
>>> files=os.listdir('.'); files
['tÃªste', 'Ã´', 'aÿb', 'dossiÃ©', 'dirÿname']
>>> open(files[0]).close()
>>> os.listdir(files[-1])
['xyz']

Ok, I have unicode filenames and I'm able to open a file and list a directory.
The problem now is to display the filenames correctly.

For me "unicode" sounds like "text (characters) encoded in the correct 
charset". In this case, unicode is just a storage for *bytes* in a custom 
charset.

How can we mix <custom unicode (bytes encoded in ISO-8859-1)> with <real
unicode>? E.g. os.path.join('dossiÃ©', "fichié"): the first argument is bytes
decoded as ISO-8859-1 whereas the second argument is real Unicode text. It's
something like this:
   str(b'dossi\xc3\xa9', 'ISO-8859-1') + '/' + 'fichi\xe9'

Whereas the correct (unicode) result should be:
   'dossié/fichié'
as bytes in UTF-8:
   b'dossi\xc3\xa9/fichi\xc3\xa9'
as bytes in ISO-8859-1:
   b'dossi\xe9/fichi\xe9'

Changing the default file system encoding to store bytes in Unicode is like
introducing a new Python type: <fake Unicode for filename hacks>.
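
The mismatch can be reproduced with a few lines. This is only a sketch of the
problem; the variable names are made up for the example.

```python
import posixpath

fake = b"dossi\xc3\xa9".decode("iso-8859-1")   # bytes smuggled into a str
real = "fichi\xe9"                              # genuine text: 'fichié'

mixed = posixpath.join(fake, real)

# No single codec encodes both halves of the path correctly.
as_latin1 = mixed.encode("iso-8859-1")   # directory ok, file name is not UTF-8
as_utf8 = mixed.encode("utf-8")          # file name ok, directory double-encoded

print(as_latin1)
print(as_utf8)
```

Neither result is the name that actually exists on a UTF-8 file system,
which is b'dossi\xc3\xa9/fichi\xc3\xa9'.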

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/

From martin at v.loewis.de  Tue Sep 30 02:07:23 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 30 Sep 2008 02:07:23 +0200
Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes
 filename issue
In-Reply-To: <200809300129.24972.victor.stinner@haypocalc.com>
References: <200809291407.55291.victor.stinner@haypocalc.com>	<gbr0nv$iqu$1@ger.gmane.org>	<ca471dc20809291006n93de9c6y2aad06d59b22aca3@mail.gmail.com>
	<200809300129.24972.victor.stinner@haypocalc.com>
Message-ID: <48E16DBB.1030805@v.loewis.de>

>   import os
>   import os.path
>   import sys
>   if os.path.supports_unicode_filenames:
>      cwd = getcwd()
>   else:
>      cwd = getcwdb()
>      encoding = sys.getfilesystemencoding()
>   for filename in os.listdir(cwd):
>      if os.path.supports_unicode_filenames:
>         text = str(filename, encoding, "replace)
>      else:
>         text = filename
>      print("=== File {0} ===".format(text))
>      for line in open(filename):
>         ...
> 
> We need an "if" to choose the directory. The second "if" is only needed to 
> display the filename. Using bytes, it would be possible to write better code 
> detect the real charset (eg. ISO-8859-1 in a UTF-8 file system) and so 
> display correctly the filename and/or propose to rename the file. Would it 
> possible using UTF-8b / PUA hacks?

Not sure what "it" is: to write the code above using the PUA hack:

for filename in os.listdir(os.getcwd()):
    text = repr(filename)
    print("=== File {0} ===".format(text))
    for line in open(filename):
        ...

If "it" is "display the filename": sure, see above. If "it" is "detect
the real charset": sure, why not?

Regards,
Martin

From rhamph at gmail.com  Tue Sep 30 02:08:41 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Mon, 29 Sep 2008 18:08:41 -0600
Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes
	filename issue
In-Reply-To: <200809300129.24972.victor.stinner@haypocalc.com>
References: <200809291407.55291.victor.stinner@haypocalc.com>
	<gbr0nv$iqu$1@ger.gmane.org>
	<ca471dc20809291006n93de9c6y2aad06d59b22aca3@mail.gmail.com>
	<200809300129.24972.victor.stinner@haypocalc.com>
Message-ID: <aac2c7cb0809291708u2fc03647sa5a09af4a15719ba@mail.gmail.com>

On Mon, Sep 29, 2008 at 5:29 PM, Victor Stinner
<victor.stinner at haypocalc.com> wrote:
> On Monday 29 September 2008 19:06:01, Guido van Rossum wrote:
>> >>  - listdir(unicode) -> unicode and raise an error on invalid filename
>>
>> I know I keep flipflopping on this one, but the more I think about it
>> the more I believe it is better to drop those names than to raise an
>> exception. Otherwise a "naive" program that happens to use
>> os.listdir() can be rendered completely useless by a single non-UTF-8
>> filename. Consider the use of os.listdir() by the glob module. If I am
>> globbing for *.py, why should the presence of a file named b'\xff'
>> cause it to fail?
>
> It would be hard for a newbie programmer to understand why he's unable to find
> his very important file ("important r?port.doc") using os.listdir(). And yes,
> if your file system is broken, glob(<unicode>) will fail.

Imagine a program that lists all files in a dir, as well as their file
size.  If we return bytes we'll print the name wrong.  If we return
lossy unicode we'll be unable to get the size of some files.  If we
return a malformed unicode we'll be unable to print at all (and what
if this is a GUI app?)

The common use cases need unicode, so the best options for them are to
fail outright or skip bad filenames.

The uncommon use cases need bytes, and they could do an explicit lossy
decode for printing, while still keeping the internal file name as
bytes.
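
A sketch of that uncommon case, assuming a bytes-accepting os.listdir() and
os.lstat(); the helper name list_sizes is made up for the example:

```python
import os


def list_sizes(directory=b"."):
    """Yield (raw_name, display_name, size) for each entry in directory."""
    for raw in os.listdir(directory):          # bytes in, bytes out
        path = os.path.join(directory, raw)    # stays bytes, so lstat can find it
        size = os.lstat(path).st_size
        display = raw.decode("utf-8", "replace")   # lossy, for printing only
        yield raw, display, size


if __name__ == "__main__":
    for raw, display, size in list_sizes():
        print("{0:>10}  {1}".format(size, display))
```

The lossy string is never passed back to the OS, so a bad name costs a
replacement character on screen but not the ability to stat the file.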

Failing outright does have the advantage that the resulting exception
should have a half-decent approximation of the bad filename.  (Thanks
to the recent choices on unicode repr() and having stderr do escapes.)

-- 
Adam Olsen, aka Rhamphoryncus

From victor.stinner at haypocalc.com  Tue Sep 30 02:09:33 2008
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Tue, 30 Sep 2008 02:09:33 +0200
Subject: [Python-3000] [Python-Dev] Filename as byte string in python
	2.6 or 3.0?
In-Reply-To: <aac2c7cb0809291631sf25a0cahe355fc6cc5816ff7@mail.gmail.com>
References: <200809271404.25654.victor.stinner@haypocalc.com>
	<48E15B83.9040205@v.loewis.de>
	<aac2c7cb0809291631sf25a0cahe355fc6cc5816ff7@mail.gmail.com>
Message-ID: <200809300209.33636.victor.stinner@haypocalc.com>

On Tuesday 30 September 2008 01:31:45, Adam Olsen wrote:
> The alternative is not be valid unicode, but since we can't use such 
> objects with external libs, can't even print them, we might as well 
> call them something else.  We already have a name for that: bytes.

:-)

From victor.stinner at haypocalc.com  Tue Sep 30 02:47:20 2008
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Tue, 30 Sep 2008 02:47:20 +0200
Subject: [Python-3000] Patch for an initial support of bytes filename in
	Python3
Message-ID: <200809300247.20349.victor.stinner@haypocalc.com>

Hi,

See attached patch: python3_bytes_filename.patch

Using the patch, you will get:
 - open() supports bytes
 - listdir(unicode) -> only unicode, *skip* invalid filenames
   (as asked by Guido)
 - os.getcwdu() is removed
 - os.getcwdb() -> bytes is added
 - glob.glob() supports bytes
 - fnmatch.filter() supports bytes
 - posixpath.join() and posixpath.split() support bytes

Mixing bytes and str is invalid. Examples raising a TypeError:
 - posixpath.join(b'x', 'y')
 - fnmatch.filter([b'x', 'y'], '*')
 - fnmatch.filter([b'x', b'y'], '*')
 - glob.glob1('.', b'*')
 - glob.glob1(b'.', '*')
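
A quick way to check the join() case, as a sketch (mixes_types is a
hypothetical helper written for this example, not part of the patch):

```python
import posixpath


def mixes_types(*parts):
    """Return True if joining parts raises TypeError (hypothetical helper)."""
    try:
        posixpath.join(*parts)
    except TypeError:
        return True
    return False


print(mixes_types(b"x", "y"))    # bytes and str together
print(mixes_types(b"x", b"y"))   # all bytes
print(mixes_types("x", "y"))     # all str
```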

$ diffstat ~/python3_bytes_filename.patch
 Lib/fnmatch.py        |    7 +++-
 Lib/glob.py           |   15 ++++++---
 Lib/io.py             |    2 -
 Lib/posixpath.py      |   20 ++++++++----
 Modules/posixmodule.c |   83 ++++++++++++++++++--------------------------------
 5 files changed, 62 insertions(+), 65 deletions(-)

TODO:
 - review this patch :-)
 - support non-ASCII bytes in fnmatch.filter()
 - fix other functions, eg. posixpath.isabs() and fnmatch.fnmatchcase()
 - fix functions written in C: grep FileSystemDefaultEncoding
 - make sure that mixing bytes and str is rejected

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: python3_bytes_filename.patch
Type: text/x-diff
Size: 6732 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-3000/attachments/20080930/e8998338/attachment.patch>

From stephen at xemacs.org  Tue Sep 30 04:24:29 2008
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 30 Sep 2008 11:24:29 +0900
Subject: [Python-3000] [Python-Dev] New proposition for Python3
	bytes	filename issue
In-Reply-To: <ca471dc20809291641m3b563b8ar6521b4ec94f0e120@mail.gmail.com>
References: <200809291407.55291.victor.stinner@haypocalc.com>
	<gbr0nv$iqu$1@ger.gmane.org>
	<ca471dc20809291006n93de9c6y2aad06d59b22aca3@mail.gmail.com>
	<200809300129.24972.victor.stinner@haypocalc.com>
	<ca471dc20809291641m3b563b8ar6521b4ec94f0e120@mail.gmail.com>
Message-ID: <87prmme5gi.fsf@xemacs.org>

Guido van Rossum writes:
 > On Mon, Sep 29, 2008 at 4:29 PM, Victor Stinner
 > <victor.stinner at haypocalc.com> wrote:

 > > It would be hard for a newbie programmer to understand why he's
 > > unable to find his very important file ("important r?port.doc")
 > > using os.listdir().

 > *Every* failure in this scenario will be hard to understand for a
 > newbie programmer. We can just document the fact.

Guido is absolutely right.  The Emacs/Mule people have been trying to
solve this kind of problem for 20 years, and the best they've come up
with is Martin's strategy: if you need really robust decoding, force
ISO 8859/1 (which for historical reasons uses all 256 octets) to get a
lossless internal text representation, and decode from that and *track
the encoding used* at the application level.  The email-sig/Mailman
people will testify how hard this is to do well, even when you have a
handful of RFCs that specify how it is to be done!

On the other hand, this kind of robustness is almost never needed in
"general newbie programming", except when you are writing a program to
be used to clean up after an undisciplined administration, or some
other system disaster.  Under normal circumstances the system encoding
is well-known and conformance is universal.

The best you can do for a general programming system is to
heuristically determine a single system encoding and raise an error if
the decoding fails.
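
In Python terms that policy might be sketched as follows; the function name
decode_filename is an assumption made for illustration:

```python
import sys


def decode_filename(raw, encoding=None):
    """Decode raw strictly with one system encoding; errors propagate."""
    if encoding is None:
        # One encoding for the whole system, determined once.
        encoding = sys.getfilesystemencoding()
    return raw.decode(encoding, "strict")


print(decode_filename(b"dossi\xc3\xa9", "utf-8"))

try:
    decode_filename(b"a\xffb", "utf-8")
except UnicodeDecodeError as exc:
    # The application decides what to do with the bad name.
    print("cannot decode:", exc)
```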

From stephen at xemacs.org  Tue Sep 30 05:11:12 2008
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 30 Sep 2008 12:11:12 +0900
Subject: [Python-3000] [Python-Dev] Filename as byte string in
	python	2.6 or 3.0?
In-Reply-To: <2040788B-98C7-4AA2-94AD-2E85E7DF07E8@fuhm.net>
References: <200809271404.25654.victor.stinner@haypocalc.com>
	<48DE705E.6050405@v.loewis.de>
	<52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com>
	<48DFF382.7020006@v.loewis.de>
	<52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com>
	<96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net>
	<aac2c7cb0809290032x336951e3pd430e464607c4fb0@mail.gmail.com>
	<2040788B-98C7-4AA2-94AD-2E85E7DF07E8@fuhm.net>
Message-ID: <87od26e3an.fsf@xemacs.org>

James Y Knight writes:
 > On Sep 29, 2008, at 3:32 AM, Adam Olsen wrote:

 > > UTF-8b doesn't work as intended.  It produces an invalid unicode
 > > object (garbage surrogates) that cannot be used with external APIs or
 > > libraries that require unicode.
 > 
 > I'd be interested to hear more detail on what you expect the practical  
 > ramifications of this to be. It doesn't sound likely to be a problem  
 > to me.

That's because you have a specific use case in mind.  Adam clearly has
in mind passing the filename on to a library which might proceed to
signal an error (to him, unexpected) on garbage surrogates.  He
doesn't want to be surprised by that.

The problem is that all of these hacks involve a private encoding that
looks like something else, and standards-conforming external programs
will be confused by them.  You can't prevent them from leaking unless
you store them as a non-text type, which has huge ramifications.

 > > If you don't need unicode then your
 > > code should state so explicitly, and 8859-1 is ideal there.
 > 
 > But, I *do* want unicode. ALL my filenames are encoded in utf8.  

That's not what really is at issue here.  The point is that in the
exceptional case where you get non-Unicode, and are willing to accept
it, ersatz binary (ISO-8859-1) works fine.  The problem is tagging
this as an exceptional filename that doesn't use the usual encoding;
that should be done by the application, I think.  Most applications
won't need it.

 > Except...that one over there. That's the whole point of UTF-8b:  
 > correctly encoded names get decoded correctly and readably, and the  
 > other cases get decoded into something unique that cannot possibly  
 > conflict.

Sure.  But there are lots of other operations besides encoding and
decoding that we do with filenames.  How do you display a filename?
How about concatenating them to make paths?  What do you do when you
want to mix a filename with other, well-formed strings?  If you keep
the filenames internally in UTF-8b, you're going to need what amounts
to a whole string API for dealing with them, aren't you?  If you're
not doing that, how is UTF-8b represented?

And in any case, when you do want to process them as text, the
"something unique" will have to be handled exceptionally.  I don't
think it makes sense to delay that exception; the exception should be
raised as soon as Python fails to make sense of the filename.  What to
do about that exception is a policy matter, as well.  Shouldn't that
policy be decided at the application level, rather than the Python
level?

From brett at python.org  Tue Sep 30 05:14:00 2008
From: brett at python.org (Brett Cannon)
Date: Mon, 29 Sep 2008 20:14:00 -0700
Subject: [Python-3000] [Python-Dev] Patch for an initial support of
	bytes filename in Python3
In-Reply-To: <200809300247.20349.victor.stinner@haypocalc.com>
References: <200809300247.20349.victor.stinner@haypocalc.com>
Message-ID: <bbaeab100809292014y275c801fwa27e98ba8c5066f3@mail.gmail.com>

On Mon, Sep 29, 2008 at 5:47 PM, Victor Stinner
<victor.stinner at haypocalc.com> wrote:
> Hi,
>
> See attached patch: python3_bytes_filename.patch
>

Patches should go on the tracker, not the mailing list. Otherwise it
will just get lost.

-Brett

From martin at v.loewis.de  Tue Sep 30 08:00:55 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 30 Sep 2008 08:00:55 +0200
Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes
 filename issue
In-Reply-To: <200809300202.38574.victor.stinner@haypocalc.com>
References: <200809291407.55291.victor.stinner@haypocalc.com>	<gbr0nv$iqu$1@ger.gmane.org>
	<200809300202.38574.victor.stinner@haypocalc.com>
Message-ID: <48E1C097.8030309@v.loewis.de>

> Change the default file system encoding to store bytes in Unicode is like 
> introducing a new Python type: <fake Unicode for filename hacks>.

Exactly. Seems like the best solution to me, despite your polemics.

Regards,
Martin

From g.brandl at gmx.net  Tue Sep 30 08:22:37 2008
From: g.brandl at gmx.net (Georg Brandl)
Date: Tue, 30 Sep 2008 08:22:37 +0200
Subject: [Python-3000] New proposition for Python3 bytes filename issue
In-Reply-To: <200809300202.38574.victor.stinner@haypocalc.com>
References: <200809291407.55291.victor.stinner@haypocalc.com>	<gbr0nv$iqu$1@ger.gmane.org>
	<200809300202.38574.victor.stinner@haypocalc.com>
Message-ID: <gbsgk6$kc1$1@ger.gmane.org>

Victor Stinner schrieb:
> On Monday 29 September 2008 18:45:28, Georg Brandl wrote:
>> If I had to choose, I'd still argue for the modified UTF-8 as filesystem
>> encoding (if it were UTF-8 otherwise), despite possible surprises when a
>> such-encoded filename escapes from Python.
> 
> If I understand correctly this solution. The idea is to change the default 
> file system encoding, right? Eg. if your filesystem is UTF-8, use ISO-8859-1 
> to make sure that UTF-8 conversion will never fail.

No, that was not what I meant (although it is another possibility). As I wrote,
Martin's proposal that I support here is using the modified UTF-8 codec that
successfully roundtrips otherwise invalid UTF-8 data.

You seem to forget that (disregarding OSX here, since it already enforces
UTF-8) the majority of file names on Posix systems will be encoded correctly.

> Let's try with an ugly directory on my UTF-8 file system:
> $ find
> ..
> ../t?ste
> ../?
> ../a?b
> ../dossi?
> ../dossi?/abc
> ../dir?name
> ../dir?name/xyz
> 
> Python3 using encoding=ISO-8859-1:
>>>> import os; os.listdir(b'.')
> [b't\xc3\xaaste', b'\xc3\xb4', b'a\xffb', b'dossi\xc3\xa9', b'dir\xffname']
>>>> files=os.listdir('.'); files
> ['t??ste', '??', 'a?b', 'dossi??', 'dir?name']
>>>> open(files[0]).close()
>>>> os.listdir(files[-1])
> ['xyz']
> 
> Ok, I have unicode filenames and I'm able to open a file and list a directory. 
> The problem is now to display correctly the filenames.
> 
> For me "unicode" sounds like "text (characters) encoded in the correct 
> charset". In this case, unicode is just a storage for *bytes* in a custom 
> charset.

> How can we mix <custom unicode (bytes encoded in ISO-8859-1)> with <real 
> unicode>? Eg. os.path.join('dossi??', "fichi?") : first argument is encoded 
> in ISO-8859-1 whereas the second argument is encoding in Unicode. It's 
> something like that:
>    str(b'dossi\xc3\xa9', 'ISO-8859-1') + '/' + 'fichi\xe9'
> 
> Whereas the correct (unicode) result should be: 
>    'dossi?/fichi?'
> as bytes in ISO-8859-1:
>    b'dossi\xc3\xa9/fichi\xc3\xa9'
> as bytes in UTF-8:
>    b'dossi\xe9/fichi\xe9'

With the filenames decoded by UTF-8, your files named têste, ô, dossié will
be displayed and handled correctly. The others are *invalid* in the filesystem
encoding UTF-8 and therefore would be represented by something like

u'dir\uXXffname' where XX is some private use Unicode namespace. It won't look
pretty when printed, but then, what do other applications do? They e.g. display
a question mark as you show above, which is not better in terms of readability.

But it will work when given to a filename-handling function. Valid filenames
can be compared to Unicode strings.
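
As a sketch of the round trip: later Python 3 releases ship a
"surrogateescape" error handler that behaves much like this scheme. Note the
assumption here: it escapes invalid bytes to lone surrogates (U+DC80..U+DCFF)
rather than to a private use area, so it is a stand-in for the proposal, not
the proposal itself.

```python
raw = b"dir\xffname"   # invalid in UTF-8

# The undecodable byte 0xFF is escaped to the code point U+DCFF.
text = raw.decode("utf-8", "surrogateescape")
assert text == "dir\udcffname"

# Encoding with the same handler restores the original bytes exactly.
assert text.encode("utf-8", "surrogateescape") == raw

# Correctly encoded names decode to ordinary, comparable text.
valid = b"dossi\xc3\xa9".decode("utf-8", "surrogateescape")
assert valid == "dossi\xe9"
```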

A real-world example: OpenOffice can't open files with invalid bytes in their
name. They are displayed in the "Open file" dialog, but trying to open fails.
This regularly drives me crazy. Let's not make Python fail in the same way,
or, even worse, not even display those filenames.

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.

From rhamph at gmail.com  Tue Sep 30 08:52:21 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Tue, 30 Sep 2008 00:52:21 -0600
Subject: [Python-3000] New proposition for Python3 bytes filename issue
In-Reply-To: <gbsgk6$kc1$1@ger.gmane.org>
References: <200809291407.55291.victor.stinner@haypocalc.com>
	<gbr0nv$iqu$1@ger.gmane.org>
	<200809300202.38574.victor.stinner@haypocalc.com>
	<gbsgk6$kc1$1@ger.gmane.org>
Message-ID: <aac2c7cb0809292352l40a3cca1h64a8046fd7cd4364@mail.gmail.com>

On Tue, Sep 30, 2008 at 12:22 AM, Georg Brandl <g.brandl at gmx.net> wrote:
> Victor Stinner schrieb:
>> On Monday 29 September 2008 18:45:28, Georg Brandl wrote:
>>> If I had to choose, I'd still argue for the modified UTF-8 as filesystem
>>> encoding (if it were UTF-8 otherwise), despite possible surprises when a
>>> such-encoded filename escapes from Python.
>>
>> If I understand correctly this solution. The idea is to change the default
>> file system encoding, right? Eg. if your filesystem is UTF-8, use ISO-8859-1
>> to make sure that UTF-8 conversion will never fail.
>
> No, that was not what I meant (although it is another possibility). As I wrote,
> Martin's proposal that I support here is using the modified UTF-8 codec that
> successfully roundtrips otherwise invalid UTF-8 data.
>
> You seem to forget that (disregarding OSX here, since it already enforces
> UTF-8) the majority of file names on Posix systems will be encoded correctly.
>
>> Let's try with an ugly directory on my UTF-8 file system:
>> $ find
>> .
>> ./têste
>> ./ô
>> ./a?b
>> ./dossié
>> ./dossié/abc
>> ./dir?name
>> ./dir?name/xyz
>>
>> Python3 using encoding=ISO-8859-1:
>>>>> import os; os.listdir(b'.')
>> [b't\xc3\xaaste', b'\xc3\xb4', b'a\xffb', b'dossi\xc3\xa9', b'dir\xffname']
>>>>> files=os.listdir('.'); files
>> ['tÃªste', 'Ã´', 'aÿb', 'dossiÃ©', 'dirÿname']
>>>>> open(files[0]).close()
>>>>> os.listdir(files[-1])
>> ['xyz']
>>
>> Ok, I have unicode filenames and I'm able to open a file and list a directory.
>> The problem is now to display correctly the filenames.
>>
>> For me "unicode" sounds like "text (characters) encoded in the correct
>> charset". In this case, unicode is just a storage for *bytes* in a custom
>> charset.
>
>> How can we mix <custom unicode (bytes encoded in ISO-8859-1)> with <real
>> unicode>? Eg. os.path.join('dossiÃ©', "fichié") : first argument is encoded
>> in ISO-8859-1 whereas the second argument is encoded in Unicode. It's
>> something like that:
>>    str(b'dossi\xc3\xa9', 'ISO-8859-1') + '/' + 'fichi\xe9'
>>
>> Whereas the correct (unicode) result should be:
>>    'dossié/fichié'
>> as bytes in ISO-8859-1:
>>    b'dossi\xe9/fichi\xe9'
>> as bytes in UTF-8:
>>    b'dossi\xc3\xa9/fichi\xc3\xa9'
>
> With the filenames decoded by UTF-8, your files named têste, ô, dossié will
> be displayed and handled correctly. The others are *invalid* in the filesystem
> encoding UTF-8 and therefore would be represented by something like
>
> u'dir\uXXffname' where XX is some private use Unicode namespace. It won't look
> pretty when printed, but then, what do other applications do? They e.g. display
> a question mark as you show above, which is not better in terms of readability.
>
> But it will work when given to a filename-handling function. Valid filenames
> can be compared to Unicode strings.
>
> A real-world example: OpenOffice can't open files with invalid bytes in their
> name. They are displayed in the "Open file" dialog, but trying to open fails.
> This regularly drives me crazy. Let's not make Python work this way too,
> or, even worse, not even display those filenames.

The only way to display that file would be to transform it into some
other valid unicode string.  However, as that string is already valid,
you've just made any files named after it impossible to open.  If you
extend unicode then you're unable to display that extended name[1].

I think Guido's right on this one.  If I have to choose between
openoffice crashing or skipping my file, I'd vastly prefer it skip it.
 A warning would be a nice bonus (from python or from openoffice),
telling me there's a buggered file I should go fix.  Renaming the file
is the end solution.

[1] You could argue that Unicode should add new scalars to handle all
currently invalid UTF-8 sequences.  They could then output to their
original forms if in UTF-8, or a mundane form in UTF-16 and UTF-32.
However, I suspect "we don't want to add validation to linux" will not
be a very persuasive argument.

-- 
Adam Olsen, aka Rhamphoryncus

From solipsis at pitrou.net  Tue Sep 30 11:28:03 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 30 Sep 2008 09:28:03 +0000 (UTC)
Subject: [Python-3000] New proposition for Python3 bytes filename issue
References: <200809291407.55291.victor.stinner@haypocalc.com>
	<gbr0nv$iqu$1@ger.gmane.org>
	<200809300202.38574.victor.stinner@haypocalc.com>
	<gbsgk6$kc1$1@ger.gmane.org>
	<aac2c7cb0809292352l40a3cca1h64a8046fd7cd4364@mail.gmail.com>
Message-ID: <loom.20080930T092131-632@post.gmane.org>

Adam Olsen <rhamph <at> gmail.com> writes:
> 
> The only way to display that file would be to transform it into some
> other valid unicode string.  However, as that string is already valid,
> you've just made any files named after it impossible to open.

Not if those valid sequences are also properly escaped to avoid collisions.
That's what utf-8b claims to do.

My view of utf-8b is that it is not really a new codec, but an escaping phase
added in front of utf-8, such that illegal byte sequences get converted to legal
byte sequences. This is how e.g. XML-escaping works ("&" -> "&amp;", etc.). The
only difficulty being in choosing sufficiently rare escaping sequences, so that
readability is not impacted.

From mal at egenix.com  Tue Sep 30 12:31:51 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 30 Sep 2008 12:31:51 +0200
Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes
 filename issue
In-Reply-To: <48E1C097.8030309@v.loewis.de>
References: <200809291407.55291.victor.stinner@haypocalc.com>	<gbr0nv$iqu$1@ger.gmane.org>	<200809300202.38574.victor.stinner@haypocalc.com>
	<48E1C097.8030309@v.loewis.de>
Message-ID: <48E20017.3020405@egenix.com>

On 2008-09-30 08:00, Martin v. Löwis wrote:
>> Change the default file system encoding to store bytes in Unicode is like 
>> introducing a new Python type: <fake Unicode for filename hacks>.
> 
> Exactly. Seems like the best solution to me, despite your polemics.

Not a bad idea... have os.listdir() return Unicode subclasses that work
like file handles, ie. they have an extra buffer that holds the original
bytes value received from the underlying C API.

Passing these handles to open() would then do the right thing by using
whatever os.listdir() got back from the file system to open the file,
while still providing a sane way to display the filename, e.g. using
question marks for the invalid characters.

The only problem with this approach is concatenation of such handles
to form pathnames, but then perhaps those concatenations could just
work on the bytes value as well (I don't know of any OS that uses non-
ASCII path separators).
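
A minimal sketch of the idea above, under stated assumptions: the class name FileName and its raw attribute are hypothetical, and errors="replace" stands in for the question marks (it produces U+FFFD). It also shows the concatenation problem, since ordinary str operations return a plain str and drop the stored bytes.

```python
class FileName(str):
    """Hypothetical str subclass that remembers the raw bytes it came from."""

    def __new__(cls, raw, encoding="utf-8"):
        # Invalid bytes are shown as U+FFFD in the displayable text.
        self = super().__new__(cls, raw.decode(encoding, "replace"))
        self.raw = raw  # original bytes as received from the file system
        return self


name = FileName(b"dir\xffname")

# str.__add__ knows nothing about the subclass, so the result is a plain
# str and the raw bytes are lost.
joined = name + "/xyz"
```

Any code that slices, joins or formats such an object would need to be taught about the extra buffer for the scheme to hold up.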

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Sep 30 2008)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611

From stephen at xemacs.org  Tue Sep 30 13:24:45 2008
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 30 Sep 2008 20:24:45 +0900
Subject: [Python-3000] New proposition for Python3 bytes filename issue
In-Reply-To: <aac2c7cb0809292352l40a3cca1h64a8046fd7cd4364@mail.gmail.com>
References: <200809291407.55291.victor.stinner@haypocalc.com>
	<gbr0nv$iqu$1@ger.gmane.org>
	<200809300202.38574.victor.stinner@haypocalc.com>
	<gbsgk6$kc1$1@ger.gmane.org>
	<aac2c7cb0809292352l40a3cca1h64a8046fd7cd4364@mail.gmail.com>
Message-ID: <87iqsdev0i.fsf@xemacs.org>

Adam Olsen writes:

 > [1] You could argue that Unicode should add new scalars to handle all
 > currently invalid UTF-8 sequences.

AFAIK there are about 2^31 of these, though!

From rhamph at gmail.com  Tue Sep 30 14:20:28 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Tue, 30 Sep 2008 06:20:28 -0600
Subject: [Python-3000] New proposition for Python3 bytes filename issue
In-Reply-To: <loom.20080930T092131-632@post.gmane.org>
References: <200809291407.55291.victor.stinner@haypocalc.com>
	<gbr0nv$iqu$1@ger.gmane.org>
	<200809300202.38574.victor.stinner@haypocalc.com>
	<gbsgk6$kc1$1@ger.gmane.org>
	<aac2c7cb0809292352l40a3cca1h64a8046fd7cd4364@mail.gmail.com>
	<loom.20080930T092131-632@post.gmane.org>
Message-ID: <aac2c7cb0809300520y571e9flcea42296dfda4a22@mail.gmail.com>

On Tue, Sep 30, 2008 at 3:28 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> Adam Olsen <rhamph <at> gmail.com> writes:
>>
>> The only way to display that file would be to transform it into some
>> other valid unicode string.  However, as that string is already valid,
>> you've just made any files named after it impossible to open.
>
> Not if those valid sequences are also properly escaped to avoid collisions.
> That's what utf-8b claims to do.
>
> My view of utf-8b is that it is not really a new codec, but an escaping phase
> added in front of utf-8, such that illegal byte sequences get converted to legal
> byte sequences. This is how e.g. XML-escaping works ("&" -> "&amp;", etc.). The
> only difficulty being in choosing sufficiently rare escaping sequences, so that
> readability is not impacted.

UTF-8b uses lone surrogates, which are malformed.

You bring up a good point though.  That sort of escaping is lossless, and
a PUA escape character would be unlikely to collide.  It would still
fail if another API was used to open the file (gtk or openoffice?),
and the thought of it creeping into other apps gives me an icky
feeling.
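
A minimal sketch of the lone-surrogate point, assuming a later interpreter (Python 3.1 or newer) whose "surrogateescape" error handler implements this style of escaping; that handler postdates this thread. It shows both the lossless roundtrip and why the intermediate string is not well-formed Unicode.

```python
raw = b"dir\xffname"  # not valid UTF-8

# Each undecodable byte 0xNN is mapped to the lone surrogate U+DCNN.
name = raw.decode("utf-8", "surrogateescape")
roundtrip = name.encode("utf-8", "surrogateescape")

# A strict encoder rejects the lone surrogate, so the string cannot be
# exported as well-formed UTF-8 to another API.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```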

-- 
Adam Olsen, aka Rhamphoryncus

From rhamph at gmail.com  Tue Sep 30 14:36:51 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Tue, 30 Sep 2008 06:36:51 -0600
Subject: [Python-3000] New proposition for Python3 bytes filename issue
In-Reply-To: <87iqsdev0i.fsf@xemacs.org>
References: <200809291407.55291.victor.stinner@haypocalc.com>
	<gbr0nv$iqu$1@ger.gmane.org>
	<200809300202.38574.victor.stinner@haypocalc.com>
	<gbsgk6$kc1$1@ger.gmane.org>
	<aac2c7cb0809292352l40a3cca1h64a8046fd7cd4364@mail.gmail.com>
	<87iqsdev0i.fsf@xemacs.org>
Message-ID: <aac2c7cb0809300536l3004acees8790141f6da04b44@mail.gmail.com>

On Tue, Sep 30, 2008 at 5:24 AM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Adam Olsen writes:
>
>  > [1] You could argue that Unicode should add new scalars to handle all
>  > currently invalid UTF-8 sequences.
>
> AFAIK there are about 2^31 of these, though!

They've promised to never allocate above U+10FFFF (0 to 1114111).  Not
sure that makes new additions easier or harder. ;)
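
For scale, a small sketch of the numbers involved, assuming Python 3.3 or later (where sys.maxunicode is always 0x10FFFF):

```python
import sys

LIMIT = 0x10FFFF          # highest code point Unicode will ever assign
CODE_POINTS = LIMIT + 1   # size of the whole code space

# Python refuses to build a character beyond the limit.
try:
    chr(LIMIT + 1)
    beyond_ok = True
except ValueError:
    beyond_ok = False
```

The whole code space is a little over a million values, far short of the roughly 2^31 sequences mentioned above.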

-- 
Adam Olsen, aka Rhamphoryncus

From solipsis at pitrou.net  Tue Sep 30 11:06:52 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 30 Sep 2008 11:06:52 +0200
Subject: [Python-3000] [Python-Dev] Filename as byte string in python
 2.6 or 3.0?
In-Reply-To: <aac2c7cb0809291650i5aad3ea3i5df7242096498c15@mail.gmail.com>
References: <200809271404.25654.victor.stinner@haypocalc.com>
	<48DFF382.7020006@v.loewis.de>
	<52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com>
	<96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net>
	<aac2c7cb0809290032x336951e3pd430e464607c4fb0@mail.gmail.com>
	<loom.20080929T110713-396@post.gmane.org>
	<aac2c7cb0809291457t2bdee45fj497d0596da41797b@mail.gmail.com>
	<48E16155.1040209@v.loewis.de>
	<aac2c7cb0809291623ke994058qcdbe9837695f3b0c@mail.gmail.com>
	<47C8A10B-1D2F-4DCB-BACE-BE2D513A11D3@fuhm.net>
	<aac2c7cb0809291650i5aad3ea3i5df7242096498c15@mail.gmail.com>
Message-ID: <1222765612.6214.12.camel@fsol>

On Monday 29 September 2008 at 17:50 -0600, Adam Olsen wrote:
> It's correct in the sense that it can roundtrip all filenames.  UTF-8b
> is lossy, so certain filenames are not roundtripped properly.

Why do you say UTF-8b is lossy? From what I've read it claims to be
lossless (i.e. the range of characters used for escaping of invalid
bytes are themselves escaped if they are encountered in the source
sequence).

> As a user, I expect all file names to be printable.  That requires
> unicode, and any program that creates filenames with arbitrary
> bytestrings is just broken.

But if you use iso-8859-1 for decoding, all non-ASCII filenames will be
printed wrongly, not only those with invalid bytestrings. I fail to see
what it brings.
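
A short sketch of that effect, using one of the valid UTF-8 names from the earlier example:

```python
raw = b"t\xc3\xaaste"  # valid UTF-8 bytes for "têste"

as_utf8 = raw.decode("utf-8")         # the name the user expects to see
as_latin1 = raw.decode("iso-8859-1")  # never fails, but shows two characters
                                      # for the single "ê"

# ISO-8859-1 maps every byte to one code point, so the bytes do roundtrip.
back = as_latin1.encode("iso-8859-1")
```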

From guido at python.org  Tue Sep 30 15:50:10 2008
From: guido at python.org (Guido van Rossum)
Date: Tue, 30 Sep 2008 06:50:10 -0700
Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes
	filename issue
In-Reply-To: <gbs7up$1q5$1@ger.gmane.org>
References: <200809291407.55291.victor.stinner@haypocalc.com>
	<gbr0nv$iqu$1@ger.gmane.org>
	<ca471dc20809291006n93de9c6y2aad06d59b22aca3@mail.gmail.com>
	<200809300129.24972.victor.stinner@haypocalc.com>
	<gbs7up$1q5$1@ger.gmane.org>
Message-ID: <ca471dc20809300650i7cb5c9adl38f5cbe7aa9fc18@mail.gmail.com>

On Mon, Sep 29, 2008 at 8:55 PM, Terry Reedy <tjreedy at udel.edu> wrote:
>
>> On Monday 29 September 2008 19:06:01, Guido van Rossum wrote:
>
>>> I know I keep flipflopping on this one, but the more I think about it
>>> the more I believe it is better to drop those names than to raise an
>>> exception. Otherwise a "naive" program that happens to use
>>> os.listdir() can be rendered completely useless by a single non-UTF-8
>>> filename. Consider the use of os.listdir() by the glob module. If I am
>>> globbing for *.py, why should the presence of a file named b'\xff'
>>> cause it to fail?
>
> To avoid silent skipping, is it possible to drop 'unreadable' names, issue a
> warning (instead of exception), and continue to completion?
> "Warning: unreadable filename skipped; see PyWiki/UnreadableFilenames"

That would be annoying as hell in most cases.

I consider the dropping of unreadable names similar to the suppression
of "hidden" files by various operating systems.
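
A rough sketch of the "drop unreadable names" behaviour, applied to a plain list of bytes rather than a real directory; the helper name readable_names is hypothetical and UTF-8 is assumed as the file system encoding.

```python
import fnmatch


def readable_names(raw_names, encoding="utf-8"):
    """Decode each bytes name, silently dropping those that cannot be decoded."""
    names = []
    for raw in raw_names:
        try:
            names.append(raw.decode(encoding))
        except UnicodeDecodeError:
            continue  # skipped, much like a hidden file would be
    return names


listing = [b"setup.py", b"\xff", b"t\xc3\xaaste.py"]
visible = readable_names(listing)

# A glob-style match now completes despite the undecodable entry.
matches = fnmatch.filter(visible, "*.py")
```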

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Sep 30 15:53:09 2008
From: guido at python.org (Guido van Rossum)
Date: Tue, 30 Sep 2008 06:53:09 -0700
Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes
	filename issue
In-Reply-To: <48E1C097.8030309@v.loewis.de>
References: <200809291407.55291.victor.stinner@haypocalc.com>
	<gbr0nv$iqu$1@ger.gmane.org>
	<200809300202.38574.victor.stinner@haypocalc.com>
	<48E1C097.8030309@v.loewis.de>
Message-ID: <ca471dc20809300653m4e79dcd7y818b624f9ecd8f5e@mail.gmail.com>

On Mon, Sep 29, 2008 at 11:00 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> Change the default file system encoding to store bytes in Unicode is like
>> introducing a new Python type: <fake Unicode for filename hacks>.
>
> Exactly. Seems like the best solution to me, despite your polemics.

Martin, I don't understand why you are in favor of storing raw bytes
encoded as Latin-1 in Unicode string objects, which clearly gives rise
to mojibake. In the past you have always been staunchly opposed to API
changes or practices that could lead to mojibake (and you had me quite
convinced).

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From victor.stinner at haypocalc.com  Tue Sep 30 15:54:20 2008
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Tue, 30 Sep 2008 15:54:20 +0200
Subject: [Python-3000]
	=?iso-8859-1?q?=5BPython-Dev=5D_Patch_for_an_initia?=
	=?iso-8859-1?q?l_support_of_bytes_filename_in=09Python3?=
In-Reply-To: <20080930132151.31635.132601277.divmod.xquotient.434@weber.divmod.com>
References: <200809300247.20349.victor.stinner@haypocalc.com>
	<20080930132151.31635.132601277.divmod.xquotient.434@weber.divmod.com>
Message-ID: <200809301554.21222.victor.stinner@haypocalc.com>

Hi,

> This is the most sane contribution I've seen so far :).

Oh thanks.

> Do I understand properly that (listdir(bytes) -> bytes)?

Yes, os.listdir(bytes)->bytes. It's already the current behaviour.

But with Python3 trunk, os.listdir(str) -> str ... or bytes (if unicode 
conversion fails).
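
A small sketch of the "type in, same type out" rule, assuming a current Python 3 (where os.fsencode exists); the mixed str-or-bytes result described above is specific to the 2008 trunk and is not reproduced here.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one ordinary file so there is something to list.
    open(os.path.join(tmp, "a.txt"), "w").close()

    as_str = os.listdir(tmp)                 # str path -> list of str
    as_bytes = os.listdir(os.fsencode(tmp))  # bytes path -> list of bytes
```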

> If so, this seems basically sane to me, since it provides text behavior
> where possible and allows more sophisticated filesystem wrappers (i.e.
> Twisted's FilePath, Will McGugan's "FS") to do more tricky things,
> separating filenames for display to the user and filenames for exchange
> with the FS.

That's the goal of my patch. Let people do what they want with bytes: rename the 
file, try the best charset to display the filename, etc.

> >- remove os.getcwdu()
> >- create os.getcwdb() -> bytes
> >- glob.glob() support bytes
> >- fnmatch.filter() support bytes
> >- posixpath.join() and posixpath.split() support bytes
>
> It sounds like maybe there should be some 2to3 fixers in here somewhere,
> too?

IMHO a programmer should not use bytes for filenames. Only specific programs 
used to fix a broken system (eg. convmv program), a backup program, etc. 
should use bytes. So the "default" type (type and not charset) for filenames 
should be str in Python3.

If my patch is applied, 2to3 will have to replace getcwdu() with getcwd(). 
That's all.
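
For reference, a sketch of how the pair looks in a released Python 3, which is an assumption about the reader's interpreter rather than something this patch guarantees:

```python
import os

cwd_text = os.getcwd()    # str, the default type for filenames
cwd_raw = os.getcwdb()    # bytes, for programs that need the raw name

# The Python 2 spelling no longer exists, so 2to3 has to rewrite it.
has_getcwdu = hasattr(os, "getcwdu")
```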

> Not necessarily as part of this patch, but somewhere related?  I 
> don't know what they would do, but it does seem quite likely that code
> which was previously correct under 2.6 (using bytes) would suddenly be
> mixing bytes and unicode with these APIs.

It looks like 2to3 converts all text '...' or u'...' to unicode (str). So 
converted programs will use str for filenames.

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/

From guido at python.org  Tue Sep 30 15:59:42 2008
From: guido at python.org (Guido van Rossum)
Date: Tue, 30 Sep 2008 06:59:42 -0700
Subject: [Python-3000] New proposition for Python3 bytes filename issue
In-Reply-To: <gbsgk6$kc1$1@ger.gmane.org>
References: <200809291407.55291.victor.stinner@haypocalc.com>
	<gbr0nv$iqu$1@ger.gmane.org>
	<200809300202.38574.victor.stinner@haypocalc.com>
	<gbsgk6$kc1$1@ger.gmane.org>
Message-ID: <ca471dc20809300659g608f8c14g29ba2b30def1be1f@mail.gmail.com>

On Mon, Sep 29, 2008 at 11:22 PM, Georg Brandl <g.brandl at gmx.net> wrote:
> No, that was not what I meant (although it is another possibility). As I wrote,
> Martin's proposal that I support here is using the modified UTF-8 codec that
> successfully roundtrips otherwise invalid UTF-8 data.

I thought that the "successful roundtripping" pretty much stopped as
soon as the unicode data is exported to somewhere else -- doesn't it
contain invalid surrogate sequences?

In general, I'm very reluctant to use utf-8b given that it doesn't
seem to be well documented as a standard anywhere. Providing some
minimal APIs that can process raw-bytes filenames still makes more
sense -- it is mostly analogous to our treatment of text files, where
the underlying binary data is also accessible.
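
A sketch of the text-file analogy, using an in-memory stream in place of a real file:

```python
import io

raw = io.BytesIO(b"caf\xc3\xa9\n")
text_file = io.TextIOWrapper(raw, encoding="utf-8")

line = text_file.readline()    # decoded text for ordinary use
underlying = text_file.buffer  # the binary layer stays reachable
```

Code that only wants text never touches the buffer attribute; code that needs the exact bytes can still get them.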

> You seem to forget that (disregarding OSX here, since it already enforces
> UTF-8) the majority of file names on Posix systems will be encoded correctly.

Apparently under certain circumstances (external FS mounted) OSX can
also have non-UTF-8 filenames.

[...]

> With the filenames decoded by UTF-8, your files named têste, ô, dossié will
> be displayed and handled correctly. The others are *invalid* in the filesystem
> encoding UTF-8 and therefore would be represented by something like
>
> u'dir\uXXffname' where XX is some private use Unicode namespace. It won't look
> pretty when printed, but then, what do other applications do? They e.g. display
> a question mark as you show above, which is not better in terms of readability.
>
> But it will work when given to a filename-handling function. Valid filenames
> can be compared to Unicode strings.
>
> A real-world example: OpenOffice can't open files with invalid bytes in their
> name. They are displayed in the "Open file" dialog, but trying to open fails.
> This regularly drives me crazy. Let's not make Python work this way too,
> or, even worse, not even display those filenames.

How can it *regularly* drive you crazy when "the majority of file names
[...] encoded correctly" (as you assert above)?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Sep 30 16:04:09 2008
From: guido at python.org (Guido van Rossum)
Date: Tue, 30 Sep 2008 07:04:09 -0700
Subject: [Python-3000] New proposition for Python3 bytes filename issue
In-Reply-To: <loom.20080930T092131-632@post.gmane.org>
References: <200809291407.55291.victor.stinner@haypocalc.com>
	<gbr0nv$iqu$1@ger.gmane.org>
	<200809300202.38574.victor.stinner@haypocalc.com>
	<gbsgk6$kc1$1@ger.gmane.org>
	<aac2c7cb0809292352l40a3cca1h64a8046fd7cd4364@mail.gmail.com>
	<loom.20080930T092131-632@post.gmane.org>
Message-ID: <ca471dc20809300704x70e452f0g5931caad9df97825@mail.gmail.com>

On Tue, Sep 30, 2008 at 2:28 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> Adam Olsen <rhamph <at> gmail.com> writes:
>>
>> The only way to display that file would be to transform it into some
>> other valid unicode string.  However, as that string is already valid,
>> you've just made any files named after it impossible to open.
>
> Not if those valid sequences are also properly escaped to avoid collisions.
> That's what utf-8b claims to do.
>
> My view of utf-8b is that it is not really a new codec, but an escaping phase
> added in front of utf-8, such that illegal byte sequences get converted to legal
> byte sequences. This is how e.g. XML-escaping works ("&" -> "&amp;", etc.). The
> only difficulty being in choosing sufficiently rare escaping sequences, so that
> readability is not impacted.

The problem is that there's no way (at least nobody has proposed one
AFAICT) to tell whether the escaping has been applied. When reading
XML, you *know* that you are expected to unescape exactly one level of
& escaping. You would never find XML with the unescaping already done
for you. But the output of utf-8b is indistinguishable from regular
utf-8 so you don't know whether you need to unescape things.
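
The XML half of that comparison can be sketched with the standard library's xml.sax.saxutils helpers; the sample string is made up.

```python
from xml.sax.saxutils import escape, unescape

text = "R&D"
once = escape(text)   # one level of escaping, as XML expects
twice = escape(once)  # escaping applied again by mistake

# Unescaping exactly one level recovers the input only if exactly one
# level was applied.
restored = unescape(once)
still_escaped = unescape(twice)
```

Given only the string "R&amp;D", nothing in the data says whether it is the escaped form of "R&D" or literal text; the format's rules have to say so.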

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From victor.stinner at haypocalc.com  Tue Sep 30 16:11:02 2008
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Tue, 30 Sep 2008 16:11:02 +0200
Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes
	filename issue
In-Reply-To: <ca471dc20809300653m4e79dcd7y818b624f9ecd8f5e@mail.gmail.com>
References: <200809291407.55291.victor.stinner@haypocalc.com>
	<48E1C097.8030309@v.loewis.de>
	<ca471dc20809300653m4e79dcd7y818b624f9ecd8f5e@mail.gmail.com>
Message-ID: <200809301611.03027.victor.stinner@haypocalc.com>

On Tuesday 30 September 2008 15:53:09, Guido van Rossum wrote:
> On Mon, Sep 29, 2008 at 11:00 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> >> Change the default file system encoding to store bytes in Unicode is
> >> like introducing a new Python type: <fake Unicode for filename hacks>.
> >
> > Exactly. Seems like the best solution to me, despite your polemics.
>
> Martin, I don't understand why you are in favor of storing raw bytes
> encoded as Latin-1 in Unicode string objects, which clearly gives rise
> to mojibake. In the past you have always been staunchly opposed to API
> changes or practices that could lead to mojibake (and you had me quite
> convinced).

If I understood correctly, the goal of Python3 is the clear *separation* of 
bytes and characters. Storing bytes in Unicode is practical because the 
existing code doesn't need to change, but it doesn't fix the problem: it just 
postpones problems that will be raised later.

I didn't get an answer to my question: what is the result <bytes (fake 
characters) stored in unicode> + <real unicode>? I guess that the result is 
<mixed "bytes" and characters in unicode> instead of raising an error 
(invalid types). So again: why introduce a new type instead of reusing 
existing Python types?
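
A sketch of the two outcomes in Python 3, reusing the directory name from the earlier example:

```python
raw_dir = b"dossi\xc3\xa9"  # UTF-8 bytes as returned by the file system
real_text = "fichi\xe9"     # genuine Unicode text, "fichié"

# Bytes stored in str via ISO-8859-1: concatenation succeeds and silently
# mixes fake characters with real ones.
fake = raw_dir.decode("iso-8859-1")
mixed = fake + "/" + real_text

# Separate types: the same mistake is reported immediately.
try:
    raw_dir + b"/" + real_text
    type_error = False
except TypeError:
    type_error = True
```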

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/

From guido at python.org  Tue Sep 30 16:05:58 2008
From: guido at python.org (Guido van Rossum)
Date: Tue, 30 Sep 2008 07:05:58 -0700
Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes
	filename issue
In-Reply-To: <48E20017.3020405@egenix.com>
References: <200809291407.55291.victor.stinner@haypocalc.com>
	<gbr0nv$iqu$1@ger.gmane.org>
	<200809300202.38574.victor.stinner@haypocalc.com>
	<48E1C097.8030309@v.loewis.de> <48E20017.3020405@egenix.com>
Message-ID: <ca471dc20809300705x58aa87acn5e3760891e6b57b9@mail.gmail.com>

On Tue, Sep 30, 2008 at 3:31 AM, M.-A. Lemburg <mal at egenix.com> wrote:
> On 2008-09-30 08:00, Martin v. Löwis wrote:
>>> Change the default file system encoding to store bytes in Unicode is like
>>> introducing a new Python type: <fake Unicode for filename hacks>.
>>
>> Exactly. Seems like the best solution to me, despite your polemics.
>
> Not a bad idea... have os.listdir() return Unicode subclasses that work
> like file handles, ie. they have an extra buffer that holds the original
> bytes value received from the underlying C API.
>
> Passing these handles to open() would then do the right thing by using
> whatever os.listdir() got back from the file system to open the file,
> while still providing a sane way to display the filename, e.g. using
> question marks for the invalid characters.
>
> The only problem with this approach is concatenation of such handles
> to form pathnames, but then perhaps those concatenations could just
> work on the bytes value as well (I don't know of any OS that uses non-
> ASCII path separators).

While this seems to work superficially I expect an infinite number of
problems caused by code that doesn't understand this subclass. You are
hinting at this in your last paragraph.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Sep 30 16:32:38 2008
From: guido at python.org (Guido van Rossum)
Date: Tue, 30 Sep 2008 07:32:38 -0700
Subject: [Python-3000] [Python-Dev] Patch for an initial support of
	bytes filename in Python3
In-Reply-To: <20080930132151.31635.132601277.divmod.xquotient.434@weber.divmod.com>
References: <200809300247.20349.victor.stinner@haypocalc.com>
	<20080930132151.31635.132601277.divmod.xquotient.434@weber.divmod.com>
Message-ID: <ca471dc20809300732r456678fcgb8caeb369a6cf349@mail.gmail.com>

On Tue, Sep 30, 2008 at 6:21 AM,  <glyph at divmod.com> wrote:
> On 12:47 am, victor.stinner at haypocalc.com wrote:
>
> This is the most sane contribution I've seen so far :).

Thanks. I'll review it later today (after coffee+breakfast :) and will
apply it assuming the code is reasonably sane, otherwise I'll go
around with Victor until it is to my satisfaction.

>> See attached patch: python3_bytes_filename.patch
>>
>> Using the patch, you will get:
>> - open() support bytes
>> - listdir(unicode) -> only unicode, *skip* invalid filenames
>>  (as asked by Guido)
>
> Forgive me for being a bit dense, but I couldn't find this hunk in the
> patch.  Do I understand properly that (listdir(bytes) -> bytes)?
>
> If so, this seems basically sane to me, since it provides text behavior
> where possible and allows more sophisticated filesystem wrappers (i.e.
> Twisted's FilePath, Will McGugan's "FS") to do more tricky things,
> separating filenames for display to the user and filenames for exchange with
> the FS.
>>
>> - remove os.getcwdu()
>> - create os.getcwdb() -> bytes
>> - glob.glob() support bytes
>> - fnmatch.filter() support bytes
>> - posixpath.join() and posixpath.split() support bytes
>
> It sounds like maybe there should be some 2to3 fixers in here somewhere,
> too?  Not necessarily as part of this patch, but somewhere related?  I don't
> know what they would do, but it does seem quite likely that code which was
> previously correct under 2.6 (using bytes) would suddenly be mixing bytes
> and unicode with these APIs.

Doesn't seem easy for 2to3 to recognize such cases.

If 2.6 weren't pretty much released already I'd ask to add
os.getcwdb() there, as an alias for os.getcwd(), and add a 2to3 fixer
that converts os.getcwdu() to os.getcwd(), leaves os.getcwd() alone
(benefit of the doubt) and leaves os.getcwdb() alone as well (a strong
indication the user meant to get bytes in the 3.x version of their
code). (Similar to using bytes instead of str in 2.6 even though they
mean the same thing there -- they will be properly separated in 3.x.)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From foom at fuhm.net  Tue Sep 30 17:14:12 2008
From: foom at fuhm.net (James Y Knight)
Date: Tue, 30 Sep 2008 11:14:12 -0400
Subject: [Python-3000] [Python-Dev] Filename as byte string in python
	2.6 or 3.0?
In-Reply-To: <aac2c7cb0809291650i5aad3ea3i5df7242096498c15@mail.gmail.com>
References: <200809271404.25654.victor.stinner@haypocalc.com>
	<48DFF382.7020006@v.loewis.de>
	<52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com>
	<96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net>
	<aac2c7cb0809290032x336951e3pd430e464607c4fb0@mail.gmail.com>
	<loom.20080929T110713-396@post.gmane.org>
	<aac2c7cb0809291457t2bdee45fj497d0596da41797b@mail.gmail.com>
	<48E16155.1040209@v.loewis.de>
	<aac2c7cb0809291623ke994058qcdbe9837695f3b0c@mail.gmail.com>
	<47C8A10B-1D2F-4DCB-BACE-BE2D513A11D3@fuhm.net>
	<aac2c7cb0809291650i5aad3ea3i5df7242096498c15@mail.gmail.com>
Message-ID: <3B962D7E-076B-4871-99A8-A3C6220592CD@fuhm.net>

On Sep 29, 2008, at 7:50 PM, Adam Olsen wrote:
> I'd rather the 1% of cases that need to handle bad file names make an
> explicit effort to do so, via alternate byte APIs or (if necessary)
> the 8859-1 hack.

So are you okay with python failing to run properly if the current  
directory has strange bytes in it? What if something odd is on the  
PATH environment variable? So much for being able to access  
os.environ['PATH']? I just don't see how that's okay behavior of a  
programming language to fail so drastically.
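
For comparison, a sketch of reading PATH as bytes in a later Python 3, assuming os.environb, os.supports_bytes_environ and os.fsencode are available (they postdate this thread); the helper name path_entries_as_bytes is hypothetical.

```python
import os


def path_entries_as_bytes():
    """Return the PATH entries as bytes, whatever bytes they contain."""
    if os.supports_bytes_environ:
        # POSIX: the raw environment is available without any decoding.
        raw = os.environb.get(b"PATH", b"")
    else:
        # Elsewhere the environment is natively text; encode it back.
        raw = os.fsencode(os.environ.get("PATH", ""))
    return raw.split(os.fsencode(os.pathsep))


entries = path_entries_as_bytes()
```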

Unless you're proposing that nothing in python itself ever use the  
Unicode file API...but if you're proposing that, it kinda seems silly  
to even have it.

James

From mal at egenix.com  Tue Sep 30 17:20:42 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 30 Sep 2008 17:20:42 +0200
Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes
 filename issue
In-Reply-To: <ca471dc20809300705x58aa87acn5e3760891e6b57b9@mail.gmail.com>
References: <200809291407.55291.victor.stinner@haypocalc.com>	<gbr0nv$iqu$1@ger.gmane.org>	<200809300202.38574.victor.stinner@haypocalc.com>	<48E1C097.8030309@v.loewis.de>
	<48E20017.3020405@egenix.com>
	<ca471dc20809300705x58aa87acn5e3760891e6b57b9@mail.gmail.com>
Message-ID: <48E243CA.1090604@egenix.com>

On 2008-09-30 16:05, Guido van Rossum wrote:
> On Tue, Sep 30, 2008 at 3:31 AM, M.-A. Lemburg <mal at egenix.com> wrote:
>> On 2008-09-30 08:00, Martin v. Löwis wrote:
>>>> Change the default file system encoding to store bytes in Unicode is like
>>>> introducing a new Python type: <fake Unicode for filename hacks>.
>>> Exactly. Seems like the best solution to me, despite your polemics.
>> Not a bad idea... have os.listdir() return Unicode subclasses that work
>> like file handles, ie. they have an extra buffer that holds the original
>> bytes value received from the underlying C API.
>>
>> Passing these handles to open() would then do the right thing by using
>> whatever os.listdir() got back from the file system to open the file,
>> while still providing a sane way to display the filename, e.g. using
>> question marks for the invalid characters.
>>
>> The only problem with this approach is concatenation of such handles
>> to form pathnames, but then perhaps those concatenations could just
>> work on the bytes value as well (I don't know of any OS that uses non-
>> ASCII path separators).
> 
> While this seems to work superficially I expect an infinite number of
> problems caused by code that doesn't understand this subclass. You are
> hinting at this in your last paragraph.

Well, to some extent Unicode objects themselves already implement
such a strategy: the default encoded bytes object basically provides
the low-level interfacing value.

But I agree, the approach is not foolproof.

In the end, I think it's better not to be clever and just return
the filenames that cannot be decoded as bytes objects in os.listdir().

Passing those to open() will then open the files as expected, in most
other cases the application will have to provide explicit conversions
in whatever way best fits the application.
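
One way an application might handle such a mixed result, as a sketch only: the helper name display_name is hypothetical, and UTF-8 is assumed for the fallback decoding.

```python
def display_name(name, encoding="utf-8"):
    """Return text suitable for showing to a user; keep `name` itself for open()."""
    if isinstance(name, bytes):
        # Undecodable bytes become U+FFFD in the displayed form only.
        return name.decode(encoding, "replace")
    return name


mixed_listing = ["setup.py", b"dir\xffname"]
shown = [display_name(n) for n in mixed_listing]
```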

Also note that os.listdir() isn't the only source of filenames. You
often read them from a file, a database, some socket, etc, so letting
the application decide what to do is not asking too much, IMHO.

-- 
Marc-Andre Lemburg
eGenix.com


From janssen at parc.com  Tue Sep 30 17:47:30 2008
From: janssen at parc.com (Bill Janssen)
Date: Tue, 30 Sep 2008 08:47:30 PDT
Subject: [Python-3000] [Python-Dev] Patch for an initial support of
	bytes filename in Python3
In-Reply-To: <200809300247.20349.victor.stinner@haypocalc.com>
References: <200809300247.20349.victor.stinner@haypocalc.com>
Message-ID: <58953.1222789650@parc.com>

Victor Stinner <victor.stinner at haypocalc.com> wrote:

>  - listdir(unicode) -> only unicode, *skip* invalid filenames 
>    (as asked by Guido)

Is there an option listdir(bytes) which will return *all* filenames (as
byte sequences)?  Otherwise, this seems troubling to me; *something*
should be returned for filenames which can't be represented, even if
it's only None.

Bill

From glyph at divmod.com  Tue Sep 30 15:21:51 2008
From: glyph at divmod.com (glyph at divmod.com)
Date: Tue, 30 Sep 2008 13:21:51 -0000
Subject: [Python-3000] [Python-Dev] Patch for an initial support of
	bytes filename in	Python3
In-Reply-To: <200809300247.20349.victor.stinner@haypocalc.com>
References: <200809300247.20349.victor.stinner@haypocalc.com>
Message-ID: <20080930132151.31635.132601277.divmod.xquotient.434@weber.divmod.com>

On 12:47 am, victor.stinner at haypocalc.com wrote:

This is the most sane contribution I've seen so far :).
>See attached patch: python3_bytes_filename.patch
>
>Using the patch, you will get:
>- open() support bytes
>- listdir(unicode) -> only unicode, *skip* invalid filenames
>   (as asked by Guido)

Forgive me for being a bit dense, but I couldn't find this hunk in the 
patch.  Do I understand properly that (listdir(bytes) -> bytes)?

If so, this seems basically sane to me, since it provides text behavior 
where possible and allows more sophisticated filesystem wrappers (i.e. 
Twisted's FilePath, Will McGugan's "FS") to do more tricky things, 
separating filenames for display to the user and filenames for exchange 
with the FS.
>- remove os.getcwdu()
>- create os.getcwdb() -> bytes
>- glob.glob() support bytes
>- fnmatch.filter() support bytes
>- posixpath.join() and posixpath.split() support bytes

It sounds like maybe there should be some 2to3 fixers in here somewhere, 
too?  Not necessarily as part of this patch, but somewhere related?  I 
don't know what they would do, but it does seem quite likely that code 
which was previously correct under 2.6 (using bytes) would suddenly be 
mixing bytes and unicode with these APIs.
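
As a rough illustration of the bytes support listed in the quoted patch
summary, the sketch below keeps names as bytes throughout. It assumes a
POSIX system and a current Python 3, where fnmatch.filter() and
posixpath.split() accept bytes; the sample names are made up.

```python
import fnmatch
import posixpath

# Hypothetical directory listing as raw bytes; b"\xff" is not valid UTF-8.
raw_names = [b"setup.py", b"\xff", b"notes.txt", b"caf\xe9.py"]

# A bytes pattern matches bytes names without decoding them.
python_files = fnmatch.filter(raw_names, b"*.py")

# Splitting a bytes path yields bytes components.
head, tail = posixpath.split(b"/srv/data/caf\xe9.py")
```

Nothing here needs to know the encoding of the names, which is the
point of a bytes-in, bytes-out API.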

From foom at fuhm.net  Tue Sep 30 18:20:00 2008
From: foom at fuhm.net (James Y Knight)
Date: Tue, 30 Sep 2008 12:20:00 -0400
Subject: [Python-3000] [Python-Dev] Filename as byte string in python
	2.6 or 3.0?
In-Reply-To: <87od26e3an.fsf@xemacs.org>
References: <200809271404.25654.victor.stinner@haypocalc.com>
	<48DE705E.6050405@v.loewis.de>
	<52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com>
	<48DFF382.7020006@v.loewis.de>
	<52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com>
	<96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net>
	<aac2c7cb0809290032x336951e3pd430e464607c4fb0@mail.gmail.com>
	<2040788B-98C7-4AA2-94AD-2E85E7DF07E8@fuhm.net>
	<87od26e3an.fsf@xemacs.org>
Message-ID: <6C26CFCA-21E0-4F6B-A314-57358EC08D55@fuhm.net>

On Sep 29, 2008, at 11:11 PM, Stephen J. Turnbull wrote:

>> Except...that one over there. That's the whole point of UTF-8b:
>> correctly encoded names get decoded correctly and readably, and the
>> other cases get decoded into something unique that cannot possibly
>> conflict.
>
> Sure.  But there are lots of other operations besides encoding and
> decoding that we do with filenames.  How do you display a filename?
> How about concatenating them to make paths?  What do you do when you
> want to mix a filename with other, well-formed strings?  If you keep
> the filenames internally in UTF-8b, you're going to need what amounts
> to a whole string API for dealing with them, aren't you?  If you're
> not doing that, how is UTF-8b represented?

No, you keep the filenames internally in a PyUnicode object. All that  
stuff *works* in Python today, with a UTF-8b decoded string.

Displaying a filename is encoding it into some other encoding. Like  
this:
 >>> '\x90\x90'.decode('utf-8b')
u'\udc90\udc90'
 >>> u'\udc90\udc90'.encode('utf-8')
'\xed\xb2\x90\xed\xb2\x90'

So, that seems to work okay. Maybe I should try to display that in a  
web browser. Shows up as 2 "unknown character" glyphs. Perfect.

If you want to mix a filename with other strings, you append them  
together, or use os.path, same as always. You don't need any new  
string API.
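
The round trip described above can be tried in later Python 3 releases
with the "surrogateescape" error handler, which maps each undecodable
byte to a lone surrogate much as UTF-8b does. This is only a sketch
under that assumption: it is not the 'utf-8b' codec used in the session
above, and the file name is made up.

```python
raw = b"report\x90\x90.txt"

# Undecodable bytes become lone surrogates (U+DC90) instead of raising.
name = raw.decode("utf-8", "surrogateescape")

# Ordinary string operations work on the result.
path = "/tmp/" + name

# Encoding with the same handler restores the original bytes exactly.
restored = path.encode("utf-8", "surrogateescape")

# For display, replace what cannot be encoded rather than failing.
shown = name.encode("ascii", "replace").decode("ascii")
```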

Since from what I've tried, things seem to work, I'd really like to  
know what precisely does fail from the opponents of utf-8b.

And again: if utf-8b isn't acceptable, because it does break things in  
some unknown-to-me way, I really can't imagine anything working but  
just going back to byte-string access as the only API. It's really not  
okay for the "obvious" APIs to be totally broken by unexpected input.  
Think os.getcwd(),  sys.argv, os.environ. You can't just ignore bad  
files and call it done.

James

From guido at python.org  Tue Sep 30 18:46:00 2008
From: guido at python.org (Guido van Rossum)
Date: Tue, 30 Sep 2008 09:46:00 -0700
Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes
	filename issue
In-Reply-To: <48E243CA.1090604@egenix.com>
References: <200809291407.55291.victor.stinner@haypocalc.com>
	<gbr0nv$iqu$1@ger.gmane.org>
	<200809300202.38574.victor.stinner@haypocalc.com>
	<48E1C097.8030309@v.loewis.de> <48E20017.3020405@egenix.com>
	<ca471dc20809300705x58aa87acn5e3760891e6b57b9@mail.gmail.com>
	<48E243CA.1090604@egenix.com>
Message-ID: <ca471dc20809300946n275084cep66ba478ac93b9de9@mail.gmail.com>

On Tue, Sep 30, 2008 at 8:20 AM, M.-A. Lemburg <mal at egenix.com> wrote:
> In the end, I think it's better not to be clever and just return
> the filenames that cannot be decoded as bytes objects in os.listdir().

Unfortunately that's going to break most code that is using
os.listdir(), so it's hardly an improved experience.

> Passing those to open() will then open the files as expected, in most
> other cases the application will have to provide explicit conversions
> in whatever way best fits the application.

In most cases the app will try to concatenate a pathname given as a
string and then it will fail.

> Also note that os.listdir() isn't the only source of filenames. You
> often read them from a file, a database, some socket, etc, so letting
> the application decide what to do is not asking too much, IMHO.

In all those cases, the code that reads them is responsible for
picking an encoding or relying on a default encoding, and the
resulting filenames are always expressed as text, not bytes. I don't
think it's the same at all.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Sep 30 18:47:10 2008
From: guido at python.org (Guido van Rossum)
Date: Tue, 30 Sep 2008 09:47:10 -0700
Subject: [Python-3000] [Python-Dev] Patch for an initial support of
	bytes filename in Python3
In-Reply-To: <58953.1222789650@parc.com>
References: <200809300247.20349.victor.stinner@haypocalc.com>
	<58953.1222789650@parc.com>
Message-ID: <ca471dc20809300947q6ba8c4aw1285e84b0e124919@mail.gmail.com>

On Tue, Sep 30, 2008 at 8:47 AM, Bill Janssen <janssen at parc.com> wrote:
> Victor Stinner <victor.stinner at haypocalc.com> wrote:
>
>>  - listdir(unicode) -> only unicode, *skip* invalid filenames
>>    (as asked by Guido)
>
> Is there an option listdir(bytes) which will return *all* filenames (as
> byte sequences)?  Otherwise, this seems troubling to me; *something*
> should be returned for filenames which can't be represented, even if
> it's only None.

Yes, os.listdir() becomes polymorphic -- if you pass it a pathname in
bytes the output is in bytes and it will return everything exactly as
the underlying syscall returns it to you.
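
A minimal sketch of that polymorphism, assuming a Python 3 in which
os.listdir() accepts both str and bytes. os.fsencode() is a later
addition to the os module and not part of the patch under discussion;
the file name is made up.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as directory:
    # Create one ordinary file so the listing is not empty.
    with open(os.path.join(directory, "example.py"), "w"):
        pass

    text_names = os.listdir(directory)               # str in, str out
    byte_names = os.listdir(os.fsencode(directory))  # bytes in, bytes out
```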

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Sep 30 18:57:21 2008
From: guido at python.org (Guido van Rossum)
Date: Tue, 30 Sep 2008 09:57:21 -0700
Subject: [Python-3000] [Python-Dev] Filename as byte string in python
	2.6 or 3.0?
In-Reply-To: <6C26CFCA-21E0-4F6B-A314-57358EC08D55@fuhm.net>
References: <200809271404.25654.victor.stinner@haypocalc.com>
	<48DE705E.6050405@v.loewis.de>
	<52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com>
	<48DFF382.7020006@v.loewis.de>
	<52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com>
	<96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net>
	<aac2c7cb0809290032x336951e3pd430e464607c4fb0@mail.gmail.com>
	<2040788B-98C7-4AA2-94AD-2E85E7DF07E8@fuhm.net>
	<87od26e3an.fsf@xemacs.org>
	<6C26CFCA-21E0-4F6B-A314-57358EC08D55@fuhm.net>
Message-ID: <ca471dc20809300957o61b554e2n9101e0b1078b1647@mail.gmail.com>

On Tue, Sep 30, 2008 at 9:20 AM, James Y Knight <foom at fuhm.net> wrote:
>
> On Sep 29, 2008, at 11:11 PM, Stephen J. Turnbull wrote:
>
>>> Except...that one over there. That's the whole point of UTF-8b:
>>> correctly encoded names get decoded correctly and readably, and the
>>> other cases get decoded into something unique that cannot possibly
>>> conflict.
>>
>> Sure.  But there are lots of other operations besides encoding and
>> decoding that we do with filenames.  How do you display a filename?
>> How about concatenating them to make paths?  What do you do when you
>> want to mix a filename with other, well-formed strings?  If you keep
>> the filenames internally in UTF-8b, you're going to need what amounts
>> to a whole string API for dealing with them, aren't you?  If you're
>> not doing that, how is UTF-8b represented?
>
> No, you keep the filenames internally in a PyUnicode object. All that stuff
> *works* in Python today, with a UTF-8b decoded string.
>
> Displaying a filename is encoding it into some other encoding. Like this:
>  >>> '\x90\x90'.decode('utf-8b')
> u'\udc90\udc90'
>  >>> u'\udc90\udc90'.encode('utf-8')
> '\xed\xb2\x90\xed\xb2\x90'
>
> So, that seems to work okay. Maybe I should try to display that in a web
> browser. Shows up as 2 "unknown character" glyphs. Perfect.

Well browsers are of course the epitome of lenient parsing.

Try incorporating one of these things into an XML file and see if a
standard-conforming XML product likes it.

> If you want to mix a filename with other strings, you append them together,
> or use os.path, same as always. You don't need any new string API.
>
> Since from what I've tried, things seem to work, I'd really like to know
> what precisely does fail from the opponents of utf-8b.

Another problem I have with UTF-8b is its lack of standardization.

> And again: if utf-8b isn't acceptable, because it does break things in some
> unknown-to-me way, I really can't imagine anything working but just going
> back to byte-string access as the only API. It's really not okay for the
> "obvious" APIs to be totally broken by unexpected input. Think os.getcwd(),
>  sys.argv, os.environ. You can't just ignore bad files and call it done.

Actually that is what you *have* to do with the
filesystem-as-a-black-box model. Filesystems reserve the right to fail
occasionally and there's nothing you can do to prevent it -- it would
be unacceptable if the entire disk would stop working because it had
one bad block (unless the bad block is in some kind of master table)
so you just have to deal with it, and you can't wish the problems away
by insisting on a perfect abstraction.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From tjreedy at udel.edu  Tue Sep 30 19:29:01 2008
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 30 Sep 2008 13:29:01 -0400
Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes
	filename issue
In-Reply-To: <ca471dc20809300650i7cb5c9adl38f5cbe7aa9fc18@mail.gmail.com>
References: <200809291407.55291.victor.stinner@haypocalc.com>	<gbr0nv$iqu$1@ger.gmane.org>	<ca471dc20809291006n93de9c6y2aad06d59b22aca3@mail.gmail.com>	<200809300129.24972.victor.stinner@haypocalc.com>	<gbs7up$1q5$1@ger.gmane.org>
	<ca471dc20809300650i7cb5c9adl38f5cbe7aa9fc18@mail.gmail.com>
Message-ID: <gbtnks$r2f$1@ger.gmane.org>

Guido van Rossum wrote:
> On Mon, Sep 29, 2008 at 8:55 PM, Terry Reedy <tjreedy at udel.edu> wrote:
>>> On Monday 29 September 2008 19:06:01, Guido van Rossum wrote:
>>>> I know I keep flipflopping on this one, but the more I think about it
>>>> the more I believe it is better to drop those names than to raise an
>>>> exception. Otherwise a "naive" program that happens to use
>>>> os.listdir() can be rendered completely useless by a single non-UTF-8
>>>> filename. Consider the use of os.listdir() by the glob module. If I am
>>>> globbing for *.py, why should the presence of a file named b'\xff'
>>>> cause it to fail?
>> To avoid silent skipping, is it possible to drop 'unreadable' names, issue a
>> warning (instead of exception), and continue to completion?
>> "Warning: unreadable filename skipped; see PyWiki/UnreadableFilenames"
> 
> That would be annoying as hell in most cases.

OK.  Put one warning in the docs at the top of OS/Files and Directories:

Note: On Unix, illegal filenames (and the files they name) are silently 
ignored by many of the functions below.

-- but perhaps with more specific info, such as what is illegal, which 
functions, and how to fix outside of Python.

> I consider the dropping of unreadable names similar to the suppression
> of "hidden" files by various operating systems.

That is documented, sometimes annoying, and reversible when it is. 
Python should at least document doing something similar.
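
For comparison, the drop-with-a-warning idea quoted above could be
sketched roughly as follows. The function name, the choice of UTF-8 and
the warning category are assumptions made for illustration; this is not
how os.listdir() is implemented.

```python
import warnings


def readable_names(raw_names, encoding="utf-8"):
    """Decode raw filenames, skipping (with a warning) any that fail."""
    names = []
    for raw in raw_names:
        try:
            names.append(raw.decode(encoding))
        except UnicodeDecodeError:
            warnings.warn(
                "unreadable filename skipped: %r" % (raw,), RuntimeWarning
            )
    return names


# Record warnings instead of printing them, to show one is issued.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    names = readable_names([b"a.py", b"\xff", b"b.py"])
```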

tjr

From qrczak at knm.org.pl  Tue Sep 30 19:37:36 2008
From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk)
Date: Tue, 30 Sep 2008 19:37:36 +0200
Subject: [Python-3000] [Python-Dev] Filename as byte string in python
	2.6 or 3.0?
In-Reply-To: <6C26CFCA-21E0-4F6B-A314-57358EC08D55@fuhm.net>
References: <200809271404.25654.victor.stinner@haypocalc.com>
	<48DE705E.6050405@v.loewis.de>
	<52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com>
	<48DFF382.7020006@v.loewis.de>
	<52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com>
	<96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net>
	<aac2c7cb0809290032x336951e3pd430e464607c4fb0@mail.gmail.com>
	<2040788B-98C7-4AA2-94AD-2E85E7DF07E8@fuhm.net>
	<87od26e3an.fsf@xemacs.org>
	<6C26CFCA-21E0-4F6B-A314-57358EC08D55@fuhm.net>
Message-ID: <3f4107910809301037j1c5f7ac0p78fb5e02f4af1f36@mail.gmail.com>

2008/9/30 James Y Knight <foom at fuhm.net>:

>  >>> u'\udc90\udc90'.encode('utf-8')
> '\xed\xb2\x90\xed\xb2\x90'

This is wrong: UTF-8 (like other UTF-x) encodes Unicode scalar values,
not Unicode code points, i.e. surrogates as such are unencodable.
'\xed\xb2\x90' is invalid UTF-8.
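
The strict UTF-8 codec in Python 3 takes the same view, whereas the
Python 2 session quoted above evidently accepted the surrogate. A small
check, assuming a Python 3 interpreter:

```python
lone_surrogate = "\udc90"

# Encoding a lone surrogate with strict UTF-8 is rejected.
try:
    lone_surrogate.encode("utf-8")
    strict_encode_ok = True
except UnicodeEncodeError:
    strict_encode_ok = False

# The byte sequence from the quoted session is rejected when decoding.
try:
    b"\xed\xb2\x90".decode("utf-8")
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False
```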

I've experimentally implemented (not for Python) a different escaping
scheme with a similar goal as UTF-8b: undecodable bytes are prefixed
with U+0000 instead of being converted to unpaired surrogates, and
'\x00' decodes as U+0000 U+0000.
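
A rough Python 3 sketch of such a scheme, written only from the
description above: the handler name "nul-escape" and both function
names are made up, and the original (non-Python) implementation may
differ in its details.

```python
import codecs


def _escape_undecodable(error):
    """Decode-error handler: prefix each undecodable byte with U+0000."""
    if not isinstance(error, UnicodeDecodeError):
        raise error
    bad = error.object[error.start:error.end]
    return "".join("\x00" + chr(byte) for byte in bad), error.end


# "nul-escape" is a made-up handler name for this sketch.
codecs.register_error("nul-escape", _escape_undecodable)


def decode_nul_escaped(raw):
    # A 0x00 byte never occurs inside a multi-byte UTF-8 sequence, so
    # splitting on NUL first is safe; each real NUL becomes U+0000 U+0000.
    pieces = raw.split(b"\x00")
    return "\x00\x00".join(p.decode("utf-8", "nul-escape") for p in pieces)


def encode_nul_escaped(text):
    out = bytearray()
    chars = iter(text)
    for ch in chars:
        if ch == "\x00":
            # The character after U+0000 carries the raw byte value.
            escaped = next(chars, None)
            if escaped is None or ord(escaped) > 0xFF:
                raise ValueError("malformed NUL escape")
            out.append(ord(escaped))
        else:
            out += ch.encode("utf-8")
    return bytes(out)


raw = b"caf\xc3\xa9\x90\x00"
text = decode_nul_escaped(raw)
```

Unlike the surrogate approach, every code point in the decoded string is
a valid Unicode scalar value, at the cost of embedded U+0000 characters.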

Glib provides some functions to convert filenames for display, in a
way which is not necessarily reversible (includes some hex escapes in
ASCII).

-- 
Marcin Kowalczyk
qrczak at knm.org.pl
http://qrnik.knm.org.pl/~qrczak/

From janssen at parc.com  Tue Sep 30 19:41:05 2008
From: janssen at parc.com (Bill Janssen)
Date: Tue, 30 Sep 2008 10:41:05 PDT
Subject: [Python-3000] [Python-Dev] Patch for an initial support of
	bytes filename in Python3
In-Reply-To: <ca471dc20809300947q6ba8c4aw1285e84b0e124919@mail.gmail.com>
References: <200809300247.20349.victor.stinner@haypocalc.com>
	<58953.1222789650@parc.com>
	<ca471dc20809300947q6ba8c4aw1285e84b0e124919@mail.gmail.com>
Message-ID: <61658.1222796465@parc.com>

Guido van Rossum <guido at python.org> wrote:
> On Tue, Sep 30, 2008 at 8:47 AM, Bill Janssen <janssen at parc.com> wrote:
> > Victor Stinner <victor.stinner at haypocalc.com> wrote:
> >
> >>  - listdir(unicode) -> only unicode, *skip* invalid filenames
> >>    (as asked by Guido)
> >
> > Is there an option listdir(bytes) which will return *all* filenames (as
> > byte sequences)?  Otherwise, this seems troubling to me; *something*
> > should be returned for filenames which can't be represented, even if
> > it's only None.
> 
> Yes, os.listdir() becomes polymorphic -- if you pass it a pathname in
> bytes the output is in bytes and it will return everything exactly as
> the underlying syscall returns it to you.

What about everything else?  For instance, if I call
os.path.join(<bytes>, <bytes>), I presume I get back a <bytes> which can
be passed to os.listdir() to retrieve the contents of that directory.

Bill

From guido at python.org  Tue Sep 30 19:45:55 2008
From: guido at python.org (Guido van Rossum)
Date: Tue, 30 Sep 2008 10:45:55 -0700
Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes
	filename issue
In-Reply-To: <gbtnjo$quh$1@ger.gmane.org>
References: <200809291407.55291.victor.stinner@haypocalc.com>
	<gbr0nv$iqu$1@ger.gmane.org>
	<200809300202.38574.victor.stinner@haypocalc.com>
	<gbsgk6$kc1$1@ger.gmane.org>
	<ca471dc20809300659g608f8c14g29ba2b30def1be1f@mail.gmail.com>
	<gbtnjo$quh$1@ger.gmane.org>
Message-ID: <ca471dc20809301045r59251402g3fe947dec3bc7f22@mail.gmail.com>

On Tue, Sep 30, 2008 at 10:28 AM, Georg Brandl <g.brandl at gmx.net> wrote:
>> How can it *regularly* drive you crazy when "the majority of file names
>> [...] encoded correctly" (as you assert above)?
>
> Because Office files are a) often named with long, seemingly descriptive
> filenames, which invariably means umlauts in German, and b) often sent around
> between systems, creating encoding problems.

Gotcha.

> Having seen how much controversy returning an invalid Unicode string sparks,
> and given that it really isn't obvious to the newbie either, I think I now agree
> that dropping filenames when calling a listdir() that returns Unicode filenames
> is the best solution. I'm a little uneasy with having one function for both
> bytes and Unicode return, because that kind of str/unicode mixing I thought we
> had left behind in 2.x, but of course can live with it.

Well, the *current* Py3k behavior where it may return a mix of bytes
and str instances is really messy, and likely to trip up most code
that doesn't expect it in a way that makes it hard to debug. However
the *proposed* behavior (returns bytes if the arg was bytes, and
returns str when the arg was str) is IMO sane, and no different than
the polymorphism found in len() or many builtin operations.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From foom at fuhm.net  Tue Sep 30 19:47:06 2008
From: foom at fuhm.net (James Y Knight)
Date: Tue, 30 Sep 2008 13:47:06 -0400
Subject: [Python-3000] [Python-Dev] Filename as byte string in python
	2.6 or 3.0?
In-Reply-To: <3f4107910809301037j1c5f7ac0p78fb5e02f4af1f36@mail.gmail.com>
References: <200809271404.25654.victor.stinner@haypocalc.com>
	<48DE705E.6050405@v.loewis.de>
	<52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com>
	<48DFF382.7020006@v.loewis.de>
	<52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com>
	<96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net>
	<aac2c7cb0809290032x336951e3pd430e464607c4fb0@mail.gmail.com>
	<2040788B-98C7-4AA2-94AD-2E85E7DF07E8@fuhm.net>
	<87od26e3an.fsf@xemacs.org>
	<6C26CFCA-21E0-4F6B-A314-57358EC08D55@fuhm.net>
	<3f4107910809301037j1c5f7ac0p78fb5e02f4af1f36@mail.gmail.com>
Message-ID: <F96D3B12-F105-4A90-B809-4F9914D1A9FD@fuhm.net>

On Sep 30, 2008, at 1:37 PM, Marcin 'Qrczak' Kowalczyk wrote:
> I've experimentally implemented (not for Python) a different escaping
> scheme with a similar goal as UTF-8b: undecodable bytes are prefixed
> with U+0000 instead of being converted to unpaired surrogates, and
> '\x00' decodes as U+0000 U+0000.
>
> Glib provides some functions to convert filenames for display, in a
> way which is not necessarily reversible (includes some hex escapes in
> ASCII).

This sounds quite promising: 0 is an invalid character in the  
filesystem API, in the environment, and in command lines, yet not in a  
unicode string. Good thinking!

James

From mal at egenix.com  Tue Sep 30 19:50:37 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 30 Sep 2008 19:50:37 +0200
Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes
 filename issue
In-Reply-To: <ca471dc20809300946n275084cep66ba478ac93b9de9@mail.gmail.com>
References: <200809291407.55291.victor.stinner@haypocalc.com>	<gbr0nv$iqu$1@ger.gmane.org>	<200809300202.38574.victor.stinner@haypocalc.com>	<48E1C097.8030309@v.loewis.de>
	<48E20017.3020405@egenix.com>	<ca471dc20809300705x58aa87acn5e3760891e6b57b9@mail.gmail.com>	<48E243CA.1090604@egenix.com>
	<ca471dc20809300946n275084cep66ba478ac93b9de9@mail.gmail.com>
Message-ID: <48E266ED.9020902@egenix.com>

On 2008-09-30 18:46, Guido van Rossum wrote:
> On Tue, Sep 30, 2008 at 8:20 AM, M.-A. Lemburg <mal at egenix.com> wrote:
>> In the end, I think it's better not to be clever and just return
>> the filenames that cannot be decoded as bytes objects in os.listdir().
> 
> Unfortunately that's going to break most code that is using
> os.listdir(), so it's hardly an improved experience.

Right, but this also signals a problem to the application and the
application is in the best position to determine a proper
work-around.

>> Passing those to open() will then open the files as expected, in most
>> other cases the application will have to provide explicit conversions
>> in whatever way best fits the application.
> 
> In most cases the app will try to concatenate a pathname given as a
> string and then it will fail.

True, and that's the right thing to do in those cases.
The application will have to deal with the problem, e.g. convert
the path to bytes and retry the joining, or convert the bytes string
to Latin-1 and then convert the result back to bytes (using Latin-1)
for passing it to open() (which will of course only work if there are
no non-Latin-1 characters in the path dir), or apply a different
filename encoding based on the path and then retry to convert the
bytes filename into Unicode, or ask the user what to do, etc.

There are many possibilities to solve the problem, apply a work-around,
or inform the user of ways to correct it.
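
Two of these work-arounds, sketched under the assumption of a current
Python 3 (os.fsencode() is a later addition to the os module); the
helper names are made up for illustration.

```python
import os
import posixpath


def join_flexible(directory, name):
    """Join directory and name, falling back to bytes if either is bytes."""
    if isinstance(directory, bytes) or isinstance(name, bytes):
        directory = os.fsencode(directory)  # bytes pass through unchanged
        name = os.fsencode(name)
    return posixpath.join(directory, name)


def latin1_detour(raw_name):
    """Carry arbitrary bytes through str and back without loss."""
    # Latin-1 maps every byte to exactly one code point, so this never fails.
    as_text = raw_name.decode("latin-1")
    return as_text, as_text.encode("latin-1")
```

The Latin-1 detour preserves the bytes but not their meaning: the text
form is only a faithful display of the name if the file really was named
in Latin-1.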

>> Also note that os.listdir() isn't the only source of filenames. You
>> often read them from a file, a database, some socket, etc, so letting
>> the application decide what to do is not asking too much, IMHO.
> 
> In all those cases, the code that reads them is responsible for
> picking an encoding or relying on a default encoding, and the
> resulting filenames are always expressed as text, not bytes. I don't
> think it's the same at all.

What I was trying to say is that you run into the same problem
in other places as well. Trying to have os.listdir() implement
some strategy is not going to solve the problem at large.

-- 
Marc-Andre Lemburg
eGenix.com


From guido at python.org  Tue Sep 30 19:54:15 2008
From: guido at python.org (Guido van Rossum)
Date: Tue, 30 Sep 2008 10:54:15 -0700
Subject: [Python-3000] [Python-Dev] Patch for an initial support of
	bytes filename in Python3
In-Reply-To: <61658.1222796465@parc.com>
References: <200809300247.20349.victor.stinner@haypocalc.com>
	<58953.1222789650@parc.com>
	<ca471dc20809300947q6ba8c4aw1285e84b0e124919@mail.gmail.com>
	<61658.1222796465@parc.com>
Message-ID: <ca471dc20809301054x21082592i5a8a58e81223c4f1@mail.gmail.com>

On Tue, Sep 30, 2008 at 10:41 AM, Bill Janssen <janssen at parc.com> wrote:
> Guido van Rossum <guido at python.org> wrote:
>> On Tue, Sep 30, 2008 at 8:47 AM, Bill Janssen <janssen at parc.com> wrote:
>> > Victor Stinner <victor.stinner at haypocalc.com> wrote:
>> >
>> >>  - listdir(unicode) -> only unicode, *skip* invalid filenames
>> >>    (as asked by Guido)
>> >
>> > Is there an option listdir(bytes) which will return *all* filenames (as
>> > byte sequences)?  Otherwise, this seems troubling to me; *something*
>> > should be returned for filenames which can't be represented, even if
>> > it's only None.
>>
>> Yes, os.listdir() becomes polymorphic -- if you pass it a pathname in
>> bytes the output is in bytes and it will return everything exactly as
>> the underlying syscall returns it to you.
>
> What about everything else?  For instance, if I call
> os.path.join(<bytes>, <bytes>), I presume I get back a <bytes> which can
> be passed to os.listdir() to retrieve the contents of that directory.

Yeah, Victor's code at http://bugs.python.org/issue3187 (file
python3_bytes_filename.patch) does this. More needs to be done but
it's a start.
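
In a current Python 3 the behaviour can be checked roughly like this
(posixpath is used so the result does not depend on the platform; the
path components are made up):

```python
import posixpath

# bytes components in, bytes path out; no decoding takes place.
joined = posixpath.join(b"/home/user", b"dir\x90")

# Mixing str and bytes components is refused rather than guessed at.
try:
    posixpath.join("/home/user", b"dir\x90")
    mixing_allowed = True
except TypeError:
    mixing_allowed = False
```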

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Sep 30 19:56:45 2008
From: guido at python.org (Guido van Rossum)
Date: Tue, 30 Sep 2008 10:56:45 -0700
Subject: [Python-3000] [Python-Dev] Patch for an initial support of
	bytes filename in Python3
In-Reply-To: <20080930175932.31635.989735053.divmod.xquotient.478@weber.divmod.com>
References: <200809300247.20349.victor.stinner@haypocalc.com>
	<20080930132151.31635.132601277.divmod.xquotient.434@weber.divmod.com>
	<ca471dc20809300732r456678fcgb8caeb369a6cf349@mail.gmail.com>
	<20080930175932.31635.989735053.divmod.xquotient.478@weber.divmod.com>
Message-ID: <ca471dc20809301056j6800b6e1nca9a9ec5a52e8445@mail.gmail.com>

On Tue, Sep 30, 2008 at 10:59 AM,  <glyph at divmod.com> wrote:
> On 02:32 pm, guido at python.org wrote:
>> If 2.6 weren't pretty much released already I'd ask to add
>> os.getcwdb() there, as an alias for os.getcwd(), and add a 2to3 fixer
>> that converts os.getcwdu() to os.getcwd(), leaves os.getcwd() alone
>> (benefit of the doubt) and leaves os.getcwdb() alone as well (a strong
>> indication the user meant to get bytes in the 3.x version of their
>> code). (Similar to using bytes instead of str in 2.6 even though they
>> mean the same thing there -- they will be properly separated in 3.x.)
>
> In the absence of a 2.6 getcwdb, perhaps the fixer could just drop the
> "benefit of the doubt" case?  It could always be added to 2.7, and the
> parity release of 2to3 could have a --2.7 switch that would modify the
> behavior of this and other fixers.

I'm not sure what you're proposing. *My* proposal is that 2to3 changes
os.getcwdu() calls to os.getcwd() and leaves os.getcwd() calls alone
-- there's no way to tell whether os.getcwdb() would be a better
match, and for portable code, it won't be (since os.getcwdb() is a
Unix-only thing).

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From g.brandl at gmx.net  Tue Sep 30 20:13:49 2008
From: g.brandl at gmx.net (Georg Brandl)
Date: Tue, 30 Sep 2008 20:13:49 +0200
Subject: [Python-3000] New proposition for Python3 bytes filename issue
In-Reply-To: <200809291407.55291.victor.stinner@haypocalc.com>
References: <200809291407.55291.victor.stinner@haypocalc.com>
Message-ID: <gbtq8t$3dl$1@ger.gmane.org>

Victor Stinner wrote:
> Hi,
> 
> After reading the previous discussion, here is a new proposition.
> 
> Python 2.x and Windows are not affected by this issue. Only Python3 on POSIX 
> (eg. Linux or *BSD) is affected.
> 
> Some systems are broken, but Python has to be able to open/copy/move/remove 
> files with an "invalid filename".
> 
> The issue can wait for Python 3.0.1 / 3.1.
> 
> Windows
> -------
> 
> On Windows, we might reject bytes filenames for all file operations: open(), 
> unlink(), os.path.join(), etc. (raise a TypeError or UnicodeError)

Since I've seen no objections to this yet: please no. If we offer a
"lower-level" bytes filename API, it should work for all platforms.

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.

From guido at python.org  Tue Sep 30 20:20:22 2008
From: guido at python.org (Guido van Rossum)
Date: Tue, 30 Sep 2008 11:20:22 -0700
Subject: [Python-3000] New proposition for Python3 bytes filename issue
In-Reply-To: <gbtq8t$3dl$1@ger.gmane.org>
References: <200809291407.55291.victor.stinner@haypocalc.com>
	<gbtq8t$3dl$1@ger.gmane.org>
Message-ID: <ca471dc20809301120y5149d346s31b0027b7bdd529e@mail.gmail.com>

On Tue, Sep 30, 2008 at 11:13 AM, Georg Brandl <g.brandl at gmx.net> wrote:
> Victor Stinner wrote:
>> On Windows, we might reject bytes filenames for all file operations: open(),
>> unlink(), os.path.join(), etc. (raise a TypeError or UnicodeError)
>
> Since I've seen no objections to this yet: please no. If we offer a
> "lower-level" bytes filename API, it should work for all platforms.

I'm not sure either way. I've heard it claimed that Windows filesystem
APIs use Unicode natively. Does Python 3.0 on Windows currently
support filenames expressed as bytes? Are they encoded first before
passing to the Unicode APIs? Using what encoding?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From foom at fuhm.net  Tue Sep 30 20:25:54 2008
From: foom at fuhm.net (James Y Knight)
Date: Tue, 30 Sep 2008 14:25:54 -0400
Subject: [Python-3000] [Python-Dev] Filename as byte string in python
	2.6 or 3.0?
In-Reply-To: <ca471dc20809300957o61b554e2n9101e0b1078b1647@mail.gmail.com>
References: <200809271404.25654.victor.stinner@haypocalc.com>
	<48DE705E.6050405@v.loewis.de>
	<52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com>
	<48DFF382.7020006@v.loewis.de>
	<52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com>
	<96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net>
	<aac2c7cb0809290032x336951e3pd430e464607c4fb0@mail.gmail.com>
	<2040788B-98C7-4AA2-94AD-2E85E7DF07E8@fuhm.net>
	<87od26e3an.fsf@xemacs.org>
	<6C26CFCA-21E0-4F6B-A314-57358EC08D55@fuhm.net>
	<ca471dc20809300957o61b554e2n9101e0b1078b1647@mail.gmail.com>
Message-ID: <B11BFA0C-4238-4623-B040-73BC5358831F@fuhm.net>

On Sep 30, 2008, at 12:57 PM, Guido van Rossum wrote:

>> And again: if utf-8b isn't acceptable, because it does break things  
>> in some
>> unknown-to-me way, I really can't imagine anything working but just  
>> going
>> back to byte-string access as the only API. It's really not okay  
>> for the
>> "obvious" APIs to be totally broken by unexpected input. Think  
>> os.getcwd(),
>> sys.argv, os.environ. You can't just ignore bad files and call it  
>> done.
>
> Actually that is what you *have* to do with the
> filesystem-as-a-black-box model. Filesystems reserve the right to fail
> occasionally and there's nothing you can do to prevent it -- it would
> be unacceptable if the entire disk would stop working because it had
> one bad block (unless the bad block is in some kind of master table)
> so you just have to deal with it, and you can't wish the problems away
> by insisting on a perfect abstraction.

What I meant is that ignoring certain files is not nearly good enough to  
solve the problem.

python -c "import sys; print sys.argv" "$(echo -e 'filename\x90\x90')"  
-> python3 fails to start.

cd "$(echo -e 'dir\x90')" # Assume said dir exists
python -> python3 fails to start.

PATH="$PATH:$(echo -e '/home/user/dir\x90')"
python3 -c "import os; print(os.environ['PATH'])" -> nope, no PATH.

Those aren't good behaviors, and can't be solved simply by pretending  
certain files don't exist.

But please see the U+0000-escape alternative proposed by Marcin. It,  
unlike utf-8b, doesn't depend upon non-standard Unicode, so maybe there  
won't be as much opposition to it.
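
For readers on later Python versions: the utf-8b idea is what Python 3.1
and later expose as the surrogateescape error handler (PEP 383). A minimal
sketch of the round trip for the file name used in the examples above,
assuming an explicit UTF-8 codec rather than whatever the local filesystem
encoding happens to be:

```python
# Undecodable bytes become lone surrogates (U+DC80..U+DCFF) on decoding
# and are turned back into the original bytes on encoding, so arbitrary
# byte strings can round-trip through str.
raw = b"filename\x90\x90"

name = raw.decode("utf-8", "surrogateescape")
restored = name.encode("utf-8", "surrogateescape")

print(ascii(name))       # shows the escaped surrogates
print(restored == raw)   # the original bytes are recovered
```

The resulting str is not valid for strict UTF-8 output, which is the
"non-standard unicode" objection raised in this thread.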

James

From g.brandl at gmx.net  Tue Sep 30 21:41:22 2008
From: g.brandl at gmx.net (Georg Brandl)
Date: Tue, 30 Sep 2008 21:41:22 +0200
Subject: [Python-3000] New proposition for Python3 bytes filename issue
In-Reply-To: <ca471dc20809301120y5149d346s31b0027b7bdd529e@mail.gmail.com>
References: <200809291407.55291.victor.stinner@haypocalc.com>	<gbtq8t$3dl$1@ger.gmane.org>
	<ca471dc20809301120y5149d346s31b0027b7bdd529e@mail.gmail.com>
Message-ID: <gbtvd3$na4$1@ger.gmane.org>

Guido van Rossum schrieb:
> On Tue, Sep 30, 2008 at 11:13 AM, Georg Brandl <g.brandl at gmx.net> wrote:
>> Victor Stinner schrieb:
>>> On Windows, we might reject bytes filenames for all file operations: open(),
>>> unlink(), os.path.join(), etc. (raise a TypeError or UnicodeError)
>>
>> Since I've seen no objections to this yet: please no. If we offer a
>> "lower-level" bytes filename API, it should work for all platforms.
> 
> I'm not sure either way. I've heard it claimed that Windows filesystem
> APIs use Unicode natively. Does Python 3.0 on Windows currently
> support filenames expressed as bytes? Are they encoded first before
> passing to the Unicode APIs? Using what encoding?

Oh, ok. I had assumed Windows just uses a fixed encoding without the problem
of misencoded filenames.

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.

From tjreedy at udel.edu  Tue Sep 30 21:42:23 2008
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 30 Sep 2008 15:42:23 -0400
Subject: [Python-3000] New proposition for Python3 bytes filename issue
In-Reply-To: <ca471dc20809301120y5149d346s31b0027b7bdd529e@mail.gmail.com>
References: <200809291407.55291.victor.stinner@haypocalc.com>	<gbtq8t$3dl$1@ger.gmane.org>
	<ca471dc20809301120y5149d346s31b0027b7bdd529e@mail.gmail.com>
Message-ID: <gbtveu$ocl$1@ger.gmane.org>

Guido van Rossum wrote:
> On Tue, Sep 30, 2008 at 11:13 AM, Georg Brandl <g.brandl at gmx.net> wrote:
>> Victor Stinner schrieb:
>>> On Windows, we might reject bytes filenames for all file operations: open(),
>>> unlink(), os.path.join(), etc. (raise a TypeError or UnicodeError)
>> Since I've seen no objections to this yet: please no. If we offer a
>> "lower-level" bytes filename API, it should work for all platforms.
> 
> I'm not sure either way. I've heard it claimed that Windows filesystem
> APIs use Unicode natively. Does Python 3.0 on Windows currently
> support filenames expressed as bytes? Are they encoded first before
> passing to the Unicode APIs? Using what encoding?

In 3.0rc1, the listdir doc needs updating:
"os.listdir(path)
Return a list containing the names of the entries in the directory. The 
list is in arbitrary order. It does not include the special entries '.' 
and '..' even if they are present in the directory. Availability: Unix, 
Windows.

On Windows NT/2k/XP and Unix, if path is a Unicode object, the result 
will be a list of Unicode objects."

s/Unicode/bytes/ at least for Windows.

 >>> os.listdir(b'.')
[b'countries.txt', b'multeetest.py', b't1.py', b't1.pyc', b't2.py', 
b'tem', b'temp.py', b'temp.pyc', b'temp2.py', b'temp3.py', b'temp4.py', 
b'test.py', b'z', b'z.txt']

The bytes names do not work however:

 >>> t=open(b'tem')
Traceback (most recent call last):
   File "<pyshell#23>", line 1, in <module>
     t=open(b'tem')
   File "C:\Programs\Python30\lib\io.py", line 284, in __new__
     return open(*args, **kwargs)
   File "C:\Programs\Python30\lib\io.py", line 184, in open
     raise TypeError("invalid file: %r" % file)
TypeError: invalid file: b'tem'

Is this what you were asking?

tjr

From qrczak at knm.org.pl  Tue Sep 30 21:46:36 2008
From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk)
Date: Tue, 30 Sep 2008 21:46:36 +0200
Subject: [Python-3000] [Python-Dev] Filename as byte string in python
	2.6 or 3.0?
In-Reply-To: <3f4107910809301037j1c5f7ac0p78fb5e02f4af1f36@mail.gmail.com>
References: <200809271404.25654.victor.stinner@haypocalc.com>
	<52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com>
	<48DFF382.7020006@v.loewis.de>
	<52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com>
	<96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net>
	<aac2c7cb0809290032x336951e3pd430e464607c4fb0@mail.gmail.com>
	<2040788B-98C7-4AA2-94AD-2E85E7DF07E8@fuhm.net>
	<87od26e3an.fsf@xemacs.org>
	<6C26CFCA-21E0-4F6B-A314-57358EC08D55@fuhm.net>
	<3f4107910809301037j1c5f7ac0p78fb5e02f4af1f36@mail.gmail.com>
Message-ID: <3f4107910809301246j62ce1cb7n6401e6f3b303c46@mail.gmail.com>

2008/9/30 Marcin 'Qrczak' Kowalczyk <qrczak at knm.org.pl>:

> I've experimentally implemented (not for Python) a different escaping
> scheme with a similar goal as UTF-8b: undecodable bytes are prefixed
> with U+0000 instead of being converted to unpaired surrogates, and
> '\x00' decodes as U+0000 U+0000.

This was not my idea: mono did that first.
http://go-mono.com/docs/index.aspx?link=T%3AMono.Unix.UnixEncoding
"In short, it's a Glorious Hack. Rejoice. Or something."

Note that there are many people, including the Unicode list, who
consider this evil because they view this as a non-standard
modification of UTF-8. I am undecided on how evil it is.

(My implementation differs from mono by the strictness of what Unicode
sequences can be encoded: mono encodes all and mine does not, OTOH
mine is a bijection and mono is not. Both implementations decode all
byte sequences of course.)
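
A rough sketch of the U+0000-escape idea as described above, for
illustration only. It is neither mono's code nor my implementation: the
names decode_name, encode_name and the handler name nul_escape_sketch are
made up here, and the extra strictness needed to make the mapping a true
bijection is not attempted.

```python
import codecs


def _escape_undecodable(exc):
    # Decode-only error handler: prefix each undecodable byte with U+0000.
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return "".join("\x00" + chr(b) for b in bad), exc.end


codecs.register_error("nul_escape_sketch", _escape_undecodable)


def decode_name(raw):
    # A real NUL byte becomes U+0000 U+0000. NUL never occurs inside a
    # multi-byte UTF-8 sequence, so splitting on it is safe.
    parts = raw.split(b"\x00")
    return "\x00\x00".join(
        p.decode("utf-8", "nul_escape_sketch") for p in parts
    )


def encode_name(text):
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] != "\x00":
            out += text[i].encode("utf-8")
            i += 1
            continue
        if i + 1 >= len(text):
            raise ValueError("dangling U+0000 escape")
        code = ord(text[i + 1])
        # After U+0000 only another U+0000 or an escaped high byte is valid.
        if code != 0 and not 0x80 <= code <= 0xFF:
            raise ValueError("invalid escape after U+0000")
        out.append(code)
        i += 2
    return bytes(out)


name = decode_name(b"filename\x90\x90")
print(ascii(name))
print(encode_name(name) == b"filename\x90\x90")
```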

-- 
Marcin Kowalczyk
qrczak at knm.org.pl
http://qrnik.knm.org.pl/~qrczak/

From martin at v.loewis.de  Tue Sep 30 22:04:42 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 30 Sep 2008 22:04:42 +0200
Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes
 filename issue
In-Reply-To: <ca471dc20809300653m4e79dcd7y818b624f9ecd8f5e@mail.gmail.com>
References: <200809291407.55291.victor.stinner@haypocalc.com>	
	<gbr0nv$iqu$1@ger.gmane.org>	
	<200809300202.38574.victor.stinner@haypocalc.com>	
	<48E1C097.8030309@v.loewis.de>
	<ca471dc20809300653m4e79dcd7y818b624f9ecd8f5e@mail.gmail.com>
Message-ID: <48E2865A.3010404@v.loewis.de>

Guido van Rossum wrote:
> On Mon, Sep 29, 2008 at 11:00 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>>> Changing the default file system encoding to store bytes in Unicode is like
>>> introducing a new Python type: <fake Unicode for filename hacks>.
>> Exactly. Seems like the best solution to me, despite your polemics.
> 
> Martin, I don't understand why you are in favor of storing raw bytes
> encoded as Latin-1 in Unicode string objects, which clearly gives rise
> to mojibake. In the past you have always been staunchly opposed to API
> changes or practices that could lead to mojibake (and you had me quite
> convinced).

True. I try to weigh the need for simplicity in the API against the
need to support all cases. So I see two solutions:

a) support bytes as file names. Supports all cases, but complicates
   the API very much, by pervasively bringing bytes into the status
   of a character data type. IMO, this must be prevented at all costs.

b) make character (Unicode) strings the only string type. Does not
   immediately support all cases, so some hacks are needed. However,
   even with the hacks, it preserves the simplicity of the API; the
   hacks then should ideally be limited to the applications that need
   it. On this side, I see the following approaches:
   1. try to automatically embed non-representable characters into
      the Unicode strings, e.g. by using PUA characters. Reduces
      the amount of moji-bake, but produces a lot of difficult issues.
   2. let applications that so desire access all file names in a
      uniform manner, at the cost of producing tons of moji-bake

In this case, I think moji-bake is unavoidable: it is just a plain
flaw in the POSIX implementations (not the API or specification) that
you can run into file names where you can't come up with the right
rendering. Even for solution a), the resulting data cannot
be displayed "correctly" in all cases.

Currently, I favor b2, but haven't given up on b1, and they don't
exclude each other. b2 is simple to implement, and delegates the
choice between legible file names and universal access to all files
to the application. Given the way Unix works, this is the most sensible
choice, IMO: by default, Python should try to make file names legible,
but stuff like backup applications should be implementable also -
and they don't need legible file names.

I think option a) will haunt us forever. People will ask for more and
more features in the bytes type, eventually asking "give us Python
2.x strings back". It already starts: see #3982, where Benjamin
asks to have .format added to bytes (for a reason unrelated to file
names).

Regards,
Martin

From solipsis at pitrou.net  Tue Sep 30 22:11:41 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 30 Sep 2008 20:11:41 +0000 (UTC)
Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes
	filename issue
References: <200809291407.55291.victor.stinner@haypocalc.com>	
	<gbr0nv$iqu$1@ger.gmane.org>	
	<200809300202.38574.victor.stinner@haypocalc.com>	
	<48E1C097.8030309@v.loewis.de>
	<ca471dc20809300653m4e79dcd7y818b624f9ecd8f5e@mail.gmail.com>
	<48E2865A.3010404@v.loewis.de>
Message-ID: <loom.20080930T201027-854@post.gmane.org>

Martin v. Löwis <martin <at> v.loewis.de> writes:
> 
> True. I try to weigh the need for simplicity in the API against the
> need to support all cases. So I see two solutions:
> 
> a) (...)
> 
> b) (...)

By the way, doesn't all this controversy yearn for a PEP?

From tjreedy at udel.edu  Tue Sep 30 22:12:20 2008
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 30 Sep 2008 16:12:20 -0400
Subject: [Python-3000] New proposition for Python3 bytes filename issue
In-Reply-To: <gbtveu$ocl$1@ger.gmane.org>
References: <200809291407.55291.victor.stinner@haypocalc.com>	<gbtq8t$3dl$1@ger.gmane.org>	<ca471dc20809301120y5149d346s31b0027b7bdd529e@mail.gmail.com>
	<gbtveu$ocl$1@ger.gmane.org>
Message-ID: <gbu173$vnq$1@ger.gmane.org>

Terry Reedy wrote:
> Guido van Rossum wrote:

>> I'm not sure either way. I've heard it claimed that Windows filesystem
>> APIs use Unicode natively. Does Python 3.0 on Windows currently
>> support filenames expressed as bytes? Are they encoded first before
>> passing to the Unicode APIs? Using what encoding?

 > [os.listdir(bytes) returns list of bytes, open(bytes) fails]

More:

The path functions also do not seem to work:

 >>> op.abspath(b'tem')
...
     path = path.replace("/", "\\")
TypeError: expected an object with the buffer interface

The error message is a bit cryptic given that the problem is that the 
arguments to replace should be bytes instead of strings for a bytes path.

.basename fails with
...
    while i and p[i-1] not in '/\\':
TypeError: 'in <string>' requires string as left operand, not int

os.rename, os.stat, os.mkdir, os.rmdir work.  I presume same is true for 
others that normally work on windows.
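
The .basename failure comes from indexing: on a bytes object p[i-1] is an
int, so it cannot be tested against a str of separators. A minimal sketch
of a type-aware version, assuming a hypothetical helper basename_sketch
(this is not the ntpath code), which slices instead of indexing and picks
the separators to match the argument type:

```python
def basename_sketch(p):
    # Choose separators of the same type as the path.
    seps = b"/\\" if isinstance(p, bytes) else "/\\"
    i = len(p)
    # p[i - 1:i] is a length-1 bytes or str, never an int.
    while i and p[i - 1:i] not in seps:
        i -= 1
    return p[i:]


print(basename_sketch("C:\\Programs\\Python30\\tem"))
print(basename_sketch(b"C:/Programs/Python30/tem"))
```

The result has the same type as the argument, which matches the
bytes-in, bytes-out behaviour os.listdir already shows above.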

tjr

From martin at v.loewis.de  Tue Sep 30 22:22:07 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 30 Sep 2008 22:22:07 +0200
Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes
 filename issue
In-Reply-To: <200809301611.03027.victor.stinner@haypocalc.com>
References: <200809291407.55291.victor.stinner@haypocalc.com>	<48E1C097.8030309@v.loewis.de>	<ca471dc20809300653m4e79dcd7y818b624f9ecd8f5e@mail.gmail.com>
	<200809301611.03027.victor.stinner@haypocalc.com>
Message-ID: <48E28A6F.8030604@v.loewis.de>

> I didn't get an answer to my question: what is the result <bytes (fake 
> characters) stored in unicode> + <real unicode>? I guess that the result is 
> <mixed "bytes" and characters in unicode> instead of raising an error 
> (invalid types). So again: why introduce a new type instead of reusing 
> existing Python types?

I didn't mean to introduce a new data type in the strict sense - merely
to pass through undecodable bytes through the regular Unicode type.
So the result of adding them is a regular Unicode string.

Regards,
Martin

From martin at v.loewis.de  Tue Sep 30 22:29:37 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 30 Sep 2008 22:29:37 +0200
Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes
 filename issue
In-Reply-To: <ca471dc20809301045r59251402g3fe947dec3bc7f22@mail.gmail.com>
References: <200809291407.55291.victor.stinner@haypocalc.com>	<gbr0nv$iqu$1@ger.gmane.org>	<200809300202.38574.victor.stinner@haypocalc.com>	<gbsgk6$kc1$1@ger.gmane.org>	<ca471dc20809300659g608f8c14g29ba2b30def1be1f@mail.gmail.com>	<gbtnjo$quh$1@ger.gmane.org>
	<ca471dc20809301045r59251402g3fe947dec3bc7f22@mail.gmail.com>
Message-ID: <48E28C31.6060606@v.loewis.de>

Guido van Rossum wrote:
> However
> the *proposed* behavior (returns bytes if the arg was bytes, and
> returns str when the arg was str) is IMO sane, and no different than
> the polymorphism found in len() or many builtin operations.

My concern still is that it brings the bytes type into the status of
another character string type, which is really bad, and will require
further modifications to Python for the lifetime of 3.x.

This is because applications will then regularly use byte strings for
file names on Unix, and regular strings on Windows, and then expect
the program to work the same without further modifications. The next
question then will be environment variables and command line arguments,
for which we then should provide two versions (e.g. sys.argv and
sys.argvb; for os.environ, os.environ["PATH"] could mean something
different from os.environ[b"PATH"]). And so on (passwd/group file,
Tkinter, ...)

Regards,
Martin

From martin at v.loewis.de  Tue Sep 30 22:45:55 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 30 Sep 2008 22:45:55 +0200
Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes
 filename issue
In-Reply-To: <ca471dc20809301120y5149d346s31b0027b7bdd529e@mail.gmail.com>
References: <200809291407.55291.victor.stinner@haypocalc.com>	<gbtq8t$3dl$1@ger.gmane.org>
	<ca471dc20809301120y5149d346s31b0027b7bdd529e@mail.gmail.com>
Message-ID: <48E29003.1010703@v.loewis.de>

> I'm not sure either way. I've heard it claimed that Windows filesystem
> APIs use Unicode natively. Does Python 3.0 on Windows currently
> support filenames expressed as bytes?

Yes, it does (at least, os.open, os.stat support them, builtin open
doesn't).

> Are they encoded first before
> passing to the Unicode APIs? Using what encoding?

They aren't passed to the Unicode (W) APIs (by Python). Instead, they
are passed to the "ANSI" (A) APIs (i.e. CP_ACP APIs). On Windows NT+,
that API then converts it to Unicode through the CP_ACP (aka "mbcs")
encoding; this is inside the system DLLs.

CP_ACP is a lossy encoding (from Unicode to bytes): Microsoft uses
replacement characters if they can, starting with similarly-looking
characters, and falling back to question marks.
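
A rough illustration of that lossiness, assuming cp1252 as a stand-in for
CP_ACP (the "mbcs" codec itself exists only on Windows). Python's codec
does not reproduce the similar-looking-character substitution done inside
the system DLLs; it only shows the final fallback to question marks:

```python
original = "файл.txt"  # Cyrillic letters have no cp1252 representation

as_bytes = original.encode("cp1252", errors="replace")
round_tripped = as_bytes.decode("cp1252")

print(as_bytes)                   # the Cyrillic letters are replaced by '?'
print(round_tripped == original)  # the original name cannot be recovered
```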

Regards,
Martin

From tciny at dword.org  Tue Sep 30 22:42:59 2008
From: tciny at dword.org (Jan Althaus)
Date: Tue, 30 Sep 2008 21:42:59 +0100
Subject: [Python-3000] Request for documentation: PyModuleDef
Message-ID: <215BD948-5392-4CDD-AF82-7CCCBEEDD9D7@dword.org>

Please correct me if I'm wrong, but it doesn't seem like there is a  
full documentation of PyModuleDef's members available?
While some of them are intuitive, others aren't. The usage of m_size  
in particular isn't clear to me. I understand this is the size of  
additional per-interpreter storage, however I'm not sure how this  
translates to code; both in terms of declaration and functions such  
as PyModule_Create (e.g. how/when are the additional members/storage  
initialised?)
Do you reckon it would make sense to add an example for such a case  
to the Embedding and Extending part of the docs?
Ta!

Jan

From guido at python.org  Tue Sep 30 23:06:31 2008
From: guido at python.org (Guido van Rossum)
Date: Tue, 30 Sep 2008 14:06:31 -0700
Subject: [Python-3000] New proposition for Python3 bytes filename issue
In-Reply-To: <gbtveu$ocl$1@ger.gmane.org>
References: <200809291407.55291.victor.stinner@haypocalc.com>
	<gbtq8t$3dl$1@ger.gmane.org>
	<ca471dc20809301120y5149d346s31b0027b7bdd529e@mail.gmail.com>
	<gbtveu$ocl$1@ger.gmane.org>
Message-ID: <ca471dc20809301406s286e2328u853dbc453f1faf13@mail.gmail.com>

On Tue, Sep 30, 2008 at 12:42 PM, Terry Reedy <tjreedy at udel.edu> wrote:
> Guido van Rossum wrote:
>>
>> On Tue, Sep 30, 2008 at 11:13 AM, Georg Brandl <g.brandl at gmx.net> wrote:
>>>
>>> Victor Stinner schrieb:
>>>>
>>>> On Windows, we might reject bytes filenames for all file operations:
>>>> open(),
>>>> unlink(), os.path.join(), etc. (raise a TypeError or UnicodeError)
>>>
>>> Since I've seen no objections to this yet: please no. If we offer a
>>> "lower-level" bytes filename API, it should work for all platforms.
>>
>> I'm not sure either way. I've heard it claimed that Windows filesystem
>> APIs use Unicode natively. Does Python 3.0 on Windows currently
>> support filenames expressed as bytes? Are they encoded first before
>> passing to the Unicode APIs? Using what encoding?
>
> In 3.0rc1, the listdir doc needs updating:
> "os.listdir(path)
> Return a list containing the names of the entries in the directory. The list
> is in arbitrary order. It does not include the special entries '.' and '..'
> even if they are present in the directory. Availability: Unix, Windows.
>
> On Windows NT/2k/XP and Unix, if path is a Unicode object, the result will
> be a list of Unicode objects."
>
> s/Unicode/bytes/ at least for Windows.
>
>>>> os.listdir(b'.')
> [b'countries.txt', b'multeetest.py', b't1.py', b't1.pyc', b't2.py', b'tem',
> b'temp.py', b'temp.pyc', b'temp2.py', b'temp3.py', b'temp4.py', b'test.py',
> b'z', b'z.txt']
>
> The bytes names do not work however:
>
>>>> t=open(b'tem')
> Traceback (most recent call last):
>  File "<pyshell#23>", line 1, in <module>
>    t=open(b'tem')
>  File "C:\Programs\Python30\lib\io.py", line 284, in __new__
>    return open(*args, **kwargs)
>  File "C:\Programs\Python30\lib\io.py", line 184, in open
>    raise TypeError("invalid file: %r" % file)
> TypeError: invalid file: b'tem'
>
> Is this what you were asking?

No, that's because bytes is missing from the explicit list of
allowable types in io.open. Victor has a one-line trivial patch for
this. Could you try this though?

>>> import _fileio
>>> _fileio._FileIO(b'tem')

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Sep 30 23:22:11 2008
From: guido at python.org (Guido van Rossum)
Date: Tue, 30 Sep 2008 14:22:11 -0700
Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes
	filename issue
In-Reply-To: <48E2865A.3010404@v.loewis.de>
References: <200809291407.55291.victor.stinner@haypocalc.com>
	<gbr0nv$iqu$1@ger.gmane.org>
	<200809300202.38574.victor.stinner@haypocalc.com>
	<48E1C097.8030309@v.loewis.de>
	<ca471dc20809300653m4e79dcd7y818b624f9ecd8f5e@mail.gmail.com>
	<48E2865A.3010404@v.loewis.de>
Message-ID: <ca471dc20809301422u1e797dacm8a19fd9b4e3e74e6@mail.gmail.com>

On Tue, Sep 30, 2008 at 1:04 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> Guido van Rossum wrote:
>> On Mon, Sep 29, 2008 at 11:00 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>>>> Changing the default file system encoding to store bytes in Unicode is like
>>>> introducing a new Python type: <fake Unicode for filename hacks>.
>>> Exactly. Seems like the best solution to me, despite your polemics.
>>
>> Martin, I don't understand why you are in favor of storing raw bytes
>> encoded as Latin-1 in Unicode string objects, which clearly gives rise
>> to mojibake. In the past you have always been staunchly opposed to API
>> changes or practices that could lead to mojibake (and you had me quite
>> convinced).
>
> True. I try to weigh the need for simplicity in the API against the
> need to support all cases. So I see two solutions:
>
> a) support bytes as file names. Supports all cases, but complicates
>   the API very much, by pervasively bringing bytes into the status
>   of a character data type. IMO, this must be prevented at all costs.

That's a matter of opinion. I would also like to point out that it is
in fact already supported by the system calls. io.open() doesn't, but
that's a wrapper around _fileio._FileIO which does support bytes. All
other syscalls already do the right thing (even readlink()!) except
os.listdir(), which returns a mixture of bytes and str values (which
is horrible) and os.getcwd() which needs a bytes equivalent. Victor's
patch addresses all these issues.

Victor's patch also tries to fix glob.py, fnmatch.py, and
posixpath.py. That is more debatable, because this might be the start
of a never-ending project. OTOH we have precedents, e.g. the re module
similarly supports both bytes and unicode (and makes an effort to
avoid mixing them).
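
A short illustration of the re precedent: the same function accepts a str
pattern with str input or a bytes pattern with bytes input, and refuses a
mixture. The file names are taken from Terry's listing; the variable
names are only for this sketch.

```python
import re

# str pattern on str input
str_hits = re.findall(r"\w+\.py", "t1.py t2.py z.txt")

# bytes pattern on bytes input
bytes_hits = re.findall(br"\w+\.py", b"t1.py t2.py z.txt")

# Mixing the two types is rejected rather than silently converted.
try:
    re.findall(br"\w+\.py", "t1.py")
    mixed_error = None
except TypeError as exc:
    mixed_error = exc

print(str_hits, bytes_hits, mixed_error)
```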

> b) make character (Unicode) strings the only string type. Does not
>   immediately support all cases, so some hacks are needed. However,
>   even with the hacks, it preserves the simplicity of the API; the
>   hacks then should ideally be limited to the applications that need
>   it. On this side, I see the following approaches:
>   1. try to automatically embed non-representable characters into
>      the Unicode strings, e.g. by using PUA characters. Reduces
>      the amount of moji-bake, but produces a lot of difficult issues.
>   2. let applications that so desire access all file names in a
>      uniform manner, at the cost of producing tons of moji-bake
>
> In this case, I think moji-bake is unavoidable: it is just a plain
> flaw in the POSIX implementations (not the API or specification) that
> you can run into file names where you can't come up with the right
> rendering. Even for solution a), the resulting data cannot
> be displayed "correctly" in all cases.

But I still like the ultimate solution to displaying names for (a)
better: if it's not decodable, display it as the repr() of a bytes
object. (Which happens to be its str() as well.)
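
A minimal sketch of that display rule, assuming a hypothetical helper
display_name and UTF-8 as the expected encoding (a real application would
use whatever encoding it expects file names to be in):

```python
def display_name(raw, encoding="utf-8"):
    # Show the decoded name when possible; otherwise fall back to the
    # repr() of the bytes object so the name stays unambiguous.
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return repr(raw)


print(display_name(b"countries.txt"))
print(display_name(b"abc\xffz"))
```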

> Currently, I favor b2, but haven't given up on b1, and they don't
> exclude each other. b2 is simple to implement, and delegates the
> choice between legible file names and universal access to all files
> to the application. Given the way Unix works, this is the most sensible
> choice, IMO: by default, Python should try to make file names legible,
> but stuff like backup applications should be implementable also -
> and they don't need legible file names.

I don't believe that an application-wide choice is safe. For example
the tempfile module manipulates filenames (at least for
NamedTemporaryFile) and I think it would be wrong if it were affected
by such a global setting. (E.g. the user could pass a suffix argument
containing Unicode characters outside Latin-1.)

> I think option a) will haunt us forever. People will ask for more and
> more features in the bytes type, eventually asking "give us Python
> 2.x strings back". It already starts: see #3982, where Benjamin
> asks to have .format added to bytes (for a reason unrelated to file
> names).

I'm not so worried about feature requests for the bytes type unrelated
to filesystems; we can either grant them or not, and I am actually in
many cases in favor of granting them -- just like we support bytes in
the re module as I already mentioned above. The bytes and str types
have intentionally similar APIs, because they have similar structure,
and even somewhat similar semantics (b'ABC' and 'ABC' have related
meanings even if there are subtle differences).

I am also encouraged by Glyph's support for (a). He has a lot of
practical experience.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Sep 30 23:24:31 2008
From: guido at python.org (Guido van Rossum)
Date: Tue, 30 Sep 2008 14:24:31 -0700
Subject: [Python-3000] New proposition for Python3 bytes filename issue
In-Reply-To: <gbu173$vnq$1@ger.gmane.org>
References: <200809291407.55291.victor.stinner@haypocalc.com>
	<gbtq8t$3dl$1@ger.gmane.org>
	<ca471dc20809301120y5149d346s31b0027b7bdd529e@mail.gmail.com>
	<gbtveu$ocl$1@ger.gmane.org> <gbu173$vnq$1@ger.gmane.org>
Message-ID: <ca471dc20809301424u523554d4jf322865d143ab638@mail.gmail.com>

On Tue, Sep 30, 2008 at 1:12 PM, Terry Reedy <tjreedy at udel.edu> wrote:
> Terry Reedy wrote:
>>
>> Guido van Rossum wrote:
>
>>> I'm not sure either way. I've heard it claimed that Windows filesystem
>>> APIs use Unicode natively. Does Python 3.0 on Windows currently
>>> support filenames expressed as bytes? Are they encoded first before
>>> passing to the Unicode APIs? Using what encoding?
>
>> [os.listdir(bytes) returns list of bytes, open(bytes) fails]
>
> More:
>
> The path functions also do not seem to work:
>
>>>> op.abspath(b'tem')
> ...
>    path = path.replace("/", "\\")
> TypeError: expected an object with the buffer interface
>
> The error message is a bit cryptic given that the problem is that the
> arguments to replace should be bytes instead of strings for a bytes path.
>
> .basename fails with
> ...
>   while i and p[i-1] not in '/\\':
> TypeError: 'in <string>' requires string as left operand, not int
>
> os.rename, os.stat, os.mkdir, os.rmdir work.  I presume same is true for
> others that normally work on windows.

It looks roughly like the system calls do support bytes (using what
encoding?) but the Python code in os.path doesn't. This is the same as
the status quo on Linux.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From ncoghlan at gmail.com  Tue Sep 30 23:31:34 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 01 Oct 2008 07:31:34 +1000
Subject: [Python-3000] [Python-Dev] Filename as byte string in python
 2.6 or 3.0?
In-Reply-To: <B11BFA0C-4238-4623-B040-73BC5358831F@fuhm.net>
References: <200809271404.25654.victor.stinner@haypocalc.com>	<48DE705E.6050405@v.loewis.de>	<52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com>	<48DFF382.7020006@v.loewis.de>	<52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com>	<96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net>	<aac2c7cb0809290032x336951e3pd430e464607c4fb0@mail.gmail.com>	<2040788B-98C7-4AA2-94AD-2E85E7DF07E8@fuhm.net>	<87od26e3an.fsf@xemacs.org>	<6C26CFCA-21E0-4F6B-A314-57358EC08D55@fuhm.net>	<ca471dc20809300957o61b554e2n9101e0b1078b1647@mail.gmail.com>
	<B11BFA0C-4238-4623-B040-73BC5358831F@fuhm.net>
Message-ID: <48E29AB6.908@gmail.com>

James Y Knight wrote:
> Those aren't good behaviors, and can't be solved simply by pretending
> certain files don't exist.

A couple of output comparisons for two of James's examples (system
Python is 2.5.2; ./python is a 3.0b3+ build):

$ python -V
Python 2.5.2
$ python -c "import sys; print sys.argv" "$(echo -e 'filename\x90\x90')"
['-c', 'filename\x90\x90']
$ python -c "import os; print os.environ['DUMMY']"
filename??

$ ./python -V
Python 3.0b3+
$ ./python -c "import sys; print(sys.argv)" "$(echo -e 'filename\x90\x90')"
Could not convert argument 3 to str
$ ./python -c "import os; print(os.environ['DUMMY'])"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/ncoghlan/devel/py3k/Lib/os.py", line 389, in __getitem__
    return self.data[self.keymap(key)]
KeyError: 'DUMMY'

(Is there a bug report for these yet?)

I'm also starting to wonder if allowing mixed types might be the way to
go for these interfaces - leaving the bytes objects in place if the
Unicode decode operation fails.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
            http://www.boredomandlaziness.org

From martin at v.loewis.de  Tue Sep 30 23:33:40 2008
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Tue, 30 Sep 2008 23:33:40 +0200
Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes
 filename issue
In-Reply-To: <loom.20080930T201027-854@post.gmane.org>
References: <200809291407.55291.victor.stinner@haypocalc.com>		<gbr0nv$iqu$1@ger.gmane.org>		<200809300202.38574.victor.stinner@haypocalc.com>		<48E1C097.8030309@v.loewis.de>	<ca471dc20809300653m4e79dcd7y818b624f9ecd8f5e@mail.gmail.com>	<48E2865A.3010404@v.loewis.de>
	<loom.20080930T201027-854@post.gmane.org>
Message-ID: <48E29B34.5080202@v.loewis.de>

> By the way, doesn't all this controversy yearn for a PEP?

There must be a solution for 3.0 (which *could* be "it's a bug,
don't use Python 3.0 on such broken systems"); we can't wait for
a PEP to resolve this issue for 3.0.

Most likely, the solution for 3.0 arrives through BDFL pronouncement,
in which case no PEP is needed.

Regards,
Martin

From qrczak at knm.org.pl  Tue Sep 30 23:34:37 2008
From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk)
Date: Tue, 30 Sep 2008 23:34:37 +0200
Subject: [Python-3000] New proposition for Python3 bytes filename issue
In-Reply-To: <48E29335.7090102@g.nevcal.com>
References: <200809291407.55291.victor.stinner@haypocalc.com>
	<gbr0nv$iqu$1@ger.gmane.org>
	<ca471dc20809291006n93de9c6y2aad06d59b22aca3@mail.gmail.com>
	<48E29335.7090102@g.nevcal.com>
Message-ID: <3f4107910809301434j2e23d5f5l84ef14a1d248659b@mail.gmail.com>

2008/9/30 Glenn Linderman <v+python at g.nevcal.com>:

> So the problem is that a Unicode file system interface can't deal with
> non-UTF-8 byte streams as file names.
>
> So it seems there are four suggested approaches, all of which have aspects
> that are inconvenient.

Let's not forget what happens when a non-UTF-8 file name is read from
a file or written to a file, under the assumption that the filename is
written to the file directly (which probably breaks for filenames
containing newlines or such).

> 4) Use of bytes APIs on FS interfaces.  This seems to be the "solution"
> adopted by Posix that creates the "problem" encountered by Unicode-native
> applications.  It is cumbersome to deal with within applications that
> attempt to display the names.  What do Posix-style "open file" dialog boxes
> do in this case?

http://library.gnome.org/devel/glib/stable/glib-Character-Set-Conversion.html#g-filename-display-name

I used to observe three different ways to display such filenames
within gedit (including %xx and \xx escapes), but now it is
consistent, probably because it switched to using the above function
everywhere:
$ touch $'abc\xffz'
$ gedit
The Open dialog shows:
   abc?z (invalid encoding)
When the file is open, the window title and the tab title show:
   abc?z
and the same appears in the recent file list.

It has a bug: it appends " (invalid encoding)" even if the filename
contains a correctly encoded U+FFFD character. Nautilus has the same
behavior and the same bug because this is a design bug of that
function, which does not allow one to tell whether the conversion was
successful.

A filename containing a newline is sometimes displayed in two lines,
and sometimes with a U+000A character from a fallback font (hex
character number in a box).

-- 
Marcin Kowalczyk
qrczak at knm.org.pl
http://qrnik.knm.org.pl/~qrczak/

From guido at python.org  Tue Sep 30 23:34:36 2008
From: guido at python.org (Guido van Rossum)
Date: Tue, 30 Sep 2008 14:34:36 -0700
Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes
	filename issue
In-Reply-To: <48E28C31.6060606@v.loewis.de>
References: <200809291407.55291.victor.stinner@haypocalc.com>
	<gbr0nv$iqu$1@ger.gmane.org>
	<200809300202.38574.victor.stinner@haypocalc.com>
	<gbsgk6$kc1$1@ger.gmane.org>
	<ca471dc20809300659g608f8c14g29ba2b30def1be1f@mail.gmail.com>
	<gbtnjo$quh$1@ger.gmane.org>
	<ca471dc20809301045r59251402g3fe947dec3bc7f22@mail.gmail.com>
	<48E28C31.6060606@v.loewis.de>
Message-ID: <ca471dc20809301434u6116391cje5778bcef5048cc9@mail.gmail.com>

On Tue, Sep 30, 2008 at 1:29 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> Guido van Rossum wrote:
>> However
>> the *proposed* behavior (returns bytes if the arg was bytes, and
>> returns str when the arg was str) is IMO sane, and no different than
>> the polymorphism found in len() or many builtin operations.
>
> My concern still is that it brings the bytes type into the status of
> another character string type, which is really bad, and will require
> further modifications to Python for the lifetime of 3.x.

I'd like to understand why this is "really bad". I thought it was by
design that the str and bytes types behave pretty similarly. You can
use both as dict keys.

> This is because applications will then regularly use byte strings for
> file names on Unix, and regular strings on Windows, and then expect
> the program to work the same without further modifications.

It seems that bytes arguments actually *do* work on Windows -- somehow
they get decoded. (Unless Terry's report was from 2.x.)

> The next
> question then will be environment variables and command line arguments,
> for which we then should provide two versions (e.g. sys.argv and
> sys.argvb; for os.environ, os.environ["PATH"] could mean something
> different from os.environ[b"PATH"]).

Actually something like that may not be a bad idea. Ian Bicking's
webob supports similar double APIs for getting the request parameters
out of a request object; I believe request.GET['x'] is a text object
and request.GET_str['x'] is the corresponding uninterpreted bytes
sequence. I would prefer to have os.environb over os.environ[b"PATH"]
though.

> And so on (passwd/group file, Tkinter, ...)

I assume at some point we can stop and have sufficiently low-level
interfaces that everyone can agree are in bytes only. Bytes aren't
going away. How does Java deal with this? Its File class doesn't seem
to deal in bytes at all. What would its listFiles() method do with
undecodable filenames?
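
For illustration only, a minimal sketch of the "bytes in, bytes out;
str in, str out" behavior and of a separate bytes view of the
environment. It assumes a later Python 3 (3.2 or newer), where
os.fsencode(), os.environb and os.supports_bytes_environ exist; none
of that was settled when this message was written, and DEMO_VAR is a
made-up variable name:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    # Create one file so the listing is not empty.
    with open(os.path.join(d, "example.txt"), "w"):
        pass
    str_names = os.listdir(d)                 # str argument, str results
    bytes_names = os.listdir(os.fsencode(d))  # bytes argument, bytes results

print(str_names, bytes_names)

# os.environb is only provided where the native environment is bytes.
if os.supports_bytes_environ:
    os.environ["DEMO_VAR"] = "value"
    # The bytes mapping is kept in sync with the str mapping.
    print(os.environb[b"DEMO_VAR"])
```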

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Sep 30 23:38:15 2008
From: guido at python.org (Guido van Rossum)
Date: Tue, 30 Sep 2008 14:38:15 -0700
Subject: [Python-3000] [Python-Dev] Filename as byte string in python
	2.6 or 3.0?
In-Reply-To: <48E29AB6.908@gmail.com>
References: <200809271404.25654.victor.stinner@haypocalc.com>
	<52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com>
	<96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net>
	<aac2c7cb0809290032x336951e3pd430e464607c4fb0@mail.gmail.com>
	<2040788B-98C7-4AA2-94AD-2E85E7DF07E8@fuhm.net>
	<87od26e3an.fsf@xemacs.org>
	<6C26CFCA-21E0-4F6B-A314-57358EC08D55@fuhm.net>
	<ca471dc20809300957o61b554e2n9101e0b1078b1647@mail.gmail.com>
	<B11BFA0C-4238-4623-B040-73BC5358831F@fuhm.net>
	<48E29AB6.908@gmail.com>
Message-ID: <ca471dc20809301438p4d6761c1m6937859f29bc677@mail.gmail.com>

On Tue, Sep 30, 2008 at 2:31 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> I'm also starting to wonder if allowing mixed types might be the way to
> go for these interfaces - leaving the bytes objects in place if the
> Unicode decode operation fails.

No, no, nooooo!

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at v.loewis.de  Tue Sep 30 23:40:01 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 30 Sep 2008 23:40:01 +0200
Subject: [Python-3000] New proposition for Python3 bytes filename issue
In-Reply-To: <gbtq8t$3dl$1@ger.gmane.org>
References: <200809291407.55291.victor.stinner@haypocalc.com>
	<gbtq8t$3dl$1@ger.gmane.org>
Message-ID: <48E29CB1.5010309@v.loewis.de>

>> On Windows, we might reject bytes filenames for all file operations: open(), 
>> unlink(), os.path.join(), etc. (raise a TypeError or UnicodeError)
> 
> Since I've seen no objections to this yet: please no. If we offer a
> "lower-level" bytes filename API, it should work for all platforms.

Unfortunately, it can't. You cannot represent all possible file names
in a byte string in Windows (just as you can't do so in a Unicode
string on Unix).

So using byte strings on Windows would work for some files, but fail
for others. In particular, listdir might give you a list of file names
which you then can't open/stat/recurse into.

(of course, you could use UTF-8 as the file system encoding on Windows,
but then you will have to rewrite a lot of C code first)

Regards,
Martin

From martin at v.loewis.de  Tue Sep 30 23:42:19 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 30 Sep 2008 23:42:19 +0200
Subject: [Python-3000] New proposition for Python3 bytes filename issue
In-Reply-To: <gbtvd3$na4$1@ger.gmane.org>
References: <200809291407.55291.victor.stinner@haypocalc.com>	<gbtq8t$3dl$1@ger.gmane.org>	<ca471dc20809301120y5149d346s31b0027b7bdd529e@mail.gmail.com>
	<gbtvd3$na4$1@ger.gmane.org>
Message-ID: <48E29D3B.5030900@v.loewis.de>

> Oh, ok. I had assumed Windows just uses a fixed encoding without the problem
> of misencoded filenames.

It's the other way 'round: On Windows, Unicode file names are the
natural choice, and byte strings have limitations. In a sense, Windows
got it right - but then, they started later. Unix missed the opportunity
of declaring that all file APIs are UTF-8 (except for Plan-9 and OS X,
neither being "true" Unix).

Regards,
Martin

From martin at v.loewis.de  Tue Sep 30 23:48:33 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 30 Sep 2008 23:48:33 +0200
Subject: [Python-3000] Request for documentation: PyModuleDef
In-Reply-To: <215BD948-5392-4CDD-AF82-7CCCBEEDD9D7@dword.org>
References: <215BD948-5392-4CDD-AF82-7CCCBEEDD9D7@dword.org>
Message-ID: <48E29EB1.4070800@v.loewis.de>

Jan Althaus wrote:
> Please correct me if I'm wrong, but it doesn't seem like there is a full
> documentation of PyModuleDef's members available?

That's most likely the case, yes.

> While some of them are intuitive, others aren't. The usage of m_size in
> particular isn't clear to me.

See PEP 3121.

> I understand this is the size of
> additional per-interpreter storage, however I'm not sure how this
> translates to code; both in terms of declaration and functions such as
> PyModule_Create (e.g. how/when are the additional members/storage
> initialised?)
> Do you reckon it would make sense to add an example for such a case to
> the Embedding and Extending part of the docs?

Sure! Contributions are welcome.

Regards,
Martin

From martin at v.loewis.de  Tue Sep 30 23:51:18 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 30 Sep 2008 23:51:18 +0200
Subject: [Python-3000] [Python-Dev] Filename as byte string in python
 2.6 or 3.0?
In-Reply-To: <48E29AB6.908@gmail.com>
References: <200809271404.25654.victor.stinner@haypocalc.com>	<48DE705E.6050405@v.loewis.de>	<52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com>	<48DFF382.7020006@v.loewis.de>	<52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com>	<96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net>	<aac2c7cb0809290032x336951e3pd430e464607c4fb0@mail.gmail.com>	<2040788B-98C7-4AA2-94AD-2E85E7DF07E8@fuhm.net>	<87od26e3an.fsf@xemacs.org>	<6C26CFCA-21E0-4F6B-A314-57358EC08D55@fuhm.net>	<ca471dc20809300957o61b554e2n9101e0b1078b1647@mail.gmail.com>	<B11BFA0C-4238-4623-B040-73BC5358831F@fuhm.net>
	<48E29AB6.908@gmail.com>
Message-ID: <48E29F56.7060206@v.loewis.de>

> $ ./python -c "import sys; print(sys.argv)" "$(echo -e 'filename\x90\x90')"
> Could not convert argument 3 to str
> $ ./python -c "import os; print(os.environ['DUMMY'])"
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
>   File "/home/ncoghlan/devel/py3k/Lib/os.py", line 389, in __getitem__
>     return self.data[self.keymap(key)]
> KeyError: 'DUMMY'
> 
> (Is there a bug report for these yet?)
> 
> I'm also starting to wonder if allowing mixed types might be the way to
> go for these interfaces - leaving the bytes objects in place if the
> Unicode decode operation fails.

While I can sympathize with people having non-ASCII file names on their
disks, I can't sympathize with this example. Normal users just don't
put \x90 into their command lines, and those who do deserve the error
message they get.

Regards,
Martin

From foom at fuhm.net  Tue Sep 30 23:59:10 2008
From: foom at fuhm.net (James Y Knight)
Date: Tue, 30 Sep 2008 17:59:10 -0400
Subject: [Python-3000] New proposition for Python3 bytes filename issue
In-Reply-To: <48E29CB1.5010309@v.loewis.de>
References: <200809291407.55291.victor.stinner@haypocalc.com>
	<gbtq8t$3dl$1@ger.gmane.org> <48E29CB1.5010309@v.loewis.de>
Message-ID: <83758335-97EA-441B-A783-05F16EBE6D7A@fuhm.net>

On Sep 30, 2008, at 5:40 PM, Martin v. Löwis wrote:
>>> On Windows, we might reject bytes filenames for all file  
>>> operations: open(),
>>> unlink(), os.path.join(), etc. (raise a TypeError or UnicodeError)
>>
>> Since I've seen no objections to this yet: please no. If we offer a
>> "lower-level" bytes filename API, it should work for all platforms.
>
> Unfortunately, it can't. You cannot represent all possible file names
> in a byte string in Windows (just as you can't do so in a Unicode
> string on Unix).

As you mention in the parenthetical below, of course it can.

> So using byte strings on Windows would work for some files, but fail
> for others. In particular, listdir might give you a list of file names
> which you then can't open/stat/recurse into.
>
> (of course, you could use UTF-8 as the file system encoding on  
> Windows,
> but then you will have to rewrite a lot of C code first)

Yes! If there is a byte-string access method for Windows, pretty  
please make it decode from UTF-8 internally and call the Unicode  
version of the Windows APIs. The non-Unicode Windows APIs are pretty
much just broken -- ideally, Python should never be calling those.

But I still don't like the idea of propagating the "sometimes a
string, sometimes bytes" APIs... One or the other, please. Either
always strings (if and only if there is a method for ensuring that
decoding always succeeds), or always bytes.
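
A rough sketch of the idea, written at the Python level rather than in
the C code where it would really have to live: a bytes entry point
that always decodes from UTF-8 and then calls the text-based API.
stat_utf8 is a hypothetical name, not an existing function:

```python
import os


def stat_utf8(path: bytes) -> os.stat_result:
    """Hypothetical bytes entry point: decode as UTF-8, then use the str API."""
    # A strict decode means a non-UTF-8 byte string fails loudly here
    # instead of silently naming some other file.
    return os.stat(path.decode("utf-8"))


print(stat_utf8(os.getcwd().encode("utf-8")).st_mode)
```

With a fixed encoding like this, the byte string is just another
spelling of the Unicode name, so every file stays reachable.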

James

From solipsis at pitrou.net  Tue Sep 30 23:55:48 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 30 Sep 2008 23:55:48 +0200
Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes
 filename issue
In-Reply-To: <48E29B34.5080202@v.loewis.de>
References: <200809291407.55291.victor.stinner@haypocalc.com>
	<gbr0nv$iqu$1@ger.gmane.org>
	<200809300202.38574.victor.stinner@haypocalc.com>
	<48E1C097.8030309@v.loewis.de>
	<ca471dc20809300653m4e79dcd7y818b624f9ecd8f5e@mail.gmail.com>
	<48E2865A.3010404@v.loewis.de>
	<loom.20080930T201027-854@post.gmane.org>
	<48E29B34.5080202@v.loewis.de>
Message-ID: <1222811748.11841.0.camel@fsol>

On Tuesday 30 September 2008 at 23:33 +0200, "Martin v. Löwis" wrote:
> > By the way, doesn't all this controversy yearn for a PEP?
> 
> There must be a solution for 3.0 (which *could* be "it's a bug,
> don't use Python 3.0 on such broken systems"); we can't wait for
> a PEP to resolve this issue for 3.0.

Yes, I was thinking of a PEP for 3.1, with the solution for 3.0 being
"it's a bug, don't use Python 3.0 on such broken systems" :-)

Regards

Antoine.

From glyph at divmod.com  Tue Sep 30 19:59:32 2008
From: glyph at divmod.com (glyph at divmod.com)
Date: Tue, 30 Sep 2008 17:59:32 -0000
Subject: [Python-3000] [Python-Dev] Patch for an initial support of
	bytes filename in	Python3
In-Reply-To: <ca471dc20809300732r456678fcgb8caeb369a6cf349@mail.gmail.com>
References: <200809300247.20349.victor.stinner@haypocalc.com>
	<20080930132151.31635.132601277.divmod.xquotient.434@weber.divmod.com>
	<ca471dc20809300732r456678fcgb8caeb369a6cf349@mail.gmail.com>
Message-ID: <20080930175932.31635.989735053.divmod.xquotient.478@weber.divmod.com>

On 02:32 pm, guido at python.org wrote:
>On Tue, Sep 30, 2008 at 6:21 AM,  <glyph at divmod.com> wrote:
>>On 12:47 am, victor.stinner at haypocalc.com wrote:

>>It sounds like maybe there should be some 2to3 fixers in here 
>>somewhere,
>>too?  Not necessarily as part of this patch, but somewhere related?  I 
>>don't
>>know what they would do, but it does seem quite likely that code which 
>>was
>>previously correct under 2.6 (using bytes) would suddenly be mixing 
>>bytes
>>and unicode with these APIs.
>
>Doesn't seem easy for 2to3 to recognize such cases.

Actually I think I'm wrong.  As far as dealing with glob(), listdir() 
and friends, I suppose that other bytes/text fixers will already have 
had their opportunity to deal with getting the type to be the 
appropriate thing, and if you have glob(<something that 2to3 understands 
should be bytes>) it will work as expected in 3.0.  (I am really just 
confirming that I have nothing useful to say here, using too many words 
to do it: at least, I hope that nobody will waste further time thinking 
about it as a result.)

>If 2.6 weren't pretty much released already I'd ask to add
>os.getcwdb() there, as an alias for os.getcwd(), and add a 2to3 fixer
>that converts os.getcwdu() to os.getcwd(), leaves os.getcwd() alone
>(benefit of the doubt) and leaves os.getcwdb() alone as well (a strong
>indication the user meant to get bytes in the 3.x version of their
>code). (Similar to using bytes instead of str in 2.6 even though they
>mean the same thing there -- they will be properly separated in 3.x.)

In the absence of a 2.6 getcwdb, perhaps the fixer could just drop the 
"benefit of the doubt" case?  It could always be added to 2.7, and the 
parity release of 2to3 could have a --2.7 switch that would modify the 
behavior of this and other fixers.

From glyph at divmod.com  Tue Sep 30 20:47:51 2008
From: glyph at divmod.com (glyph at divmod.com)
Date: Tue, 30 Sep 2008 18:47:51 -0000
Subject: [Python-3000] [Python-Dev] Patch for an initial support of
	bytes filename in	Python3
In-Reply-To: <ca471dc20809301056j6800b6e1nca9a9ec5a52e8445@mail.gmail.com>
References: <200809300247.20349.victor.stinner@haypocalc.com>
	<20080930132151.31635.132601277.divmod.xquotient.434@weber.divmod.com>
	<ca471dc20809300732r456678fcgb8caeb369a6cf349@mail.gmail.com>
	<20080930175932.31635.989735053.divmod.xquotient.478@weber.divmod.com>
	<ca471dc20809301056j6800b6e1nca9a9ec5a52e8445@mail.gmail.com>
Message-ID: <20080930184751.31635.1484325691.divmod.xquotient.520@weber.divmod.com>

On 05:56 pm, guido at python.org wrote:
>On Tue, Sep 30, 2008 at 10:59 AM,  <glyph at divmod.com> wrote:
>>On 02:32 pm, guido at python.org wrote:

>>In the absence of a 2.6 getcwdb, perhaps the fixer could just drop the
>>"benefit of the doubt" case?  It could always be added to 2.7, and the
>>parity release of 2to3 could have a --2.7 switch that would modify the
>>behavior of this and other fixers.
>
>I'm not sure what you're proposing. *My* proposal is that 2to3 changes
>os.getcwdu() calls to os.getcwd() and leaves os.getcwd() calls alone
>-- there's no way to tell whether os.getcwdb() would be a better
>match, and for portable code, it won't be (since os.getcwdb() is a
>Unix-only thing).

My proposal is simply to change getcwd to getcwdb, and getcwdu to 
getcwd.  This preserves whatever bytes/text behavior you are expecting 
from 2.6 into 3.0.  Granted, the fact that unicode is really always the 
right thing to do on Windows complicates things.
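
To spell the renaming out, here is a toy sketch. It is a plain regex
substitution, not a real lib2to3 fixer, and the names RENAMES and
rename_getcwd are made up for the example. Both renames have to happen
in a single pass, otherwise getcwdu would first become getcwd and then
getcwdb:

```python
import re

# 2.x name -> 3.x name, preserving the bytes/text type of the result.
RENAMES = {"getcwd": "getcwdb", "getcwdu": "getcwd"}

# Try the longer name first; the trailing \b keeps os.getcwdb() untouched.
_PATTERN = re.compile(r"\bos\.(getcwdu|getcwd)\b")


def rename_getcwd(source: str) -> str:
    """Apply both renames in one pass over the source text."""
    return _PATTERN.sub(lambda m: "os." + RENAMES[m.group(1)], source)


print(rename_getcwd("a = os.getcwd(); b = os.getcwdu()"))
```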

I already tend to avoid os.getcwd() though, and this is just one more 
reason to avoid it.  In the rare cases where I really do need it, it 
looks like os.path.abspath(b".") / os.path.abspath(u".") will provide 
the clarity that I want.
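
For reference, a small sketch of how the two spellings behave,
assuming a Python 3 in which os.getcwdb() exists and os.path.abspath()
returns the same type as its argument:

```python
import os

text_cwd = os.getcwd()              # always str
bytes_cwd = os.getcwdb()            # always bytes
abs_text = os.path.abspath(".")     # str argument, str result
abs_bytes = os.path.abspath(b".")   # bytes argument, bytes result

print(type(text_cwd).__name__, type(bytes_cwd).__name__)
print(type(abs_text).__name__, type(abs_bytes).__name__)
```

With abspath() the type of the literal states the intent at the call
site, which is the clarity I am after.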