From starsareblueandfaraway at  Thu May  5 16:37:04 2011
From: starsareblueandfaraway at (Roy Hyunjin Han)
Date: Thu, 5 May 2011 10:37:04 -0400
Subject: [Python-ideas] [Python-Dev] What if replacing items in a
 dictionary returns the new dictionary?
In-Reply-To: <>
References: <>
Message-ID: <>

>> 2011/4/29 Roy Hyunjin Han <starsareblueandfaraway at>:
>> It would be convenient if replacing items in a dictionary returns the
>> new dictionary, in a manner analogous to str.replace().  What do you
>> think?
>>    # Current behavior
>>    x = {'key1': 1}
>>    x.update(key1=3) == None
>>    x == {'key1': 3} # Original variable has changed
>>    # Possible behavior
>>    x = {'key1': 1}
>>    x.replace(key1=3) == {'key1': 3}
>>    x == {'key1': 1} # Original variable is unchanged
> 2011/5/5 Giuseppe Ottaviano <giuott at>:
> In general nothing stops you from using a proxy object that returns itself
> after each method call, something like
> class using(object):
>    def __init__(self, obj):
>        self._wrappee = obj
>    def unwrap(self):
>        return self._wrappee
>    def __getattr__(self, attr):
>        def wrapper(*args, **kwargs):
>            getattr(self._wrappee, attr)(*args, **kwargs)
>            return self
>        return wrapper
> d = dict()
> print using(d).update(dict(a=1)).update(dict(b=2)).unwrap()
> # prints {'a': 1, 'b': 2}
> l = list()
> print using(l).append(1).append(2).unwrap()
> # prints [1, 2]

Cool!  I never thought of that.  That's a great snippet.
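
For readers on Python 3, the quoted proxy can be sketched roughly as follows. This is a minimal sketch, not Giuseppe's exact code: the pass-through for non-callable attributes is an assumption added here, prompted by the caveat raised later in the thread.

```python
class using(object):
    """Proxy whose method calls return the proxy itself, so calls chain."""

    def __init__(self, obj):
        self._wrappee = obj

    def unwrap(self):
        return self._wrappee

    def __getattr__(self, attr):
        target = getattr(self._wrappee, attr)
        if not callable(target):
            # Assumption of this sketch: plain attributes pass through as-is.
            return target

        def wrapper(*args, **kwargs):
            target(*args, **kwargs)  # call for its side effect, drop the result
            return self
        return wrapper


d = {}
chained = using(d).update(dict(a=1)).update(dict(b=2)).unwrap()
print(chained)
```

Note that the proxy mutates the wrapped object in place; `chained` and `d` are the same dictionary.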

I'll forward this to the python-ideas list.  I don't think the
python-dev people want this discussion to continue on their mailing list.

From starsareblueandfaraway at  Thu May  5 16:42:57 2011
From: starsareblueandfaraway at (Roy Hyunjin Han)
Date: Thu, 5 May 2011 10:42:57 -0400
Subject: [Python-ideas] [Python-Dev] What if replacing items in a
 dictionary returns the new dictionary?
In-Reply-To: <>
References: <>
Message-ID: <>

>>    # Possible behavior
>>    x = {'key1': 1}
>>    x.replace(key1=3) == {'key1': 3}
>>    x == {'key1': 1} # Original variable is unchanged
> 2011/5/5 Giuseppe Ottaviano <giuott at>:
> class using(object):
>    def __init__(self, obj):
>        self._wrappee = obj
>    def unwrap(self):
>        return self._wrappee
>    def __getattr__(self, attr):
>        def wrapper(*args, **kwargs):
>            getattr(self._wrappee, attr)(*args, **kwargs)
>            return self
>        return wrapper

The only thing I would add is obj.copy(), to ensure that the original
dictionary is unchanged.

class using(object):
    def __init__(self, obj):
        self._wrappee = obj.copy()

From starsareblueandfaraway at  Thu May  5 17:19:16 2011
From: starsareblueandfaraway at (Roy Hyunjin Han)
Date: Thu, 5 May 2011 11:19:16 -0400
Subject: [Python-ideas] [Python-Dev] What if replacing items in a
 dictionary returns the new dictionary?
In-Reply-To: <>
References: <>
Message-ID: <>

2011/5/5 Giuseppe Ottaviano <giuott at>:
>> The only thing I would add is obj.copy(), to ensure that the original
>> dictionary is unchanged.
>> class using(object):
>>    def __init__(self, obj):
>>        self._wrappee = obj.copy()
> My example was just a proof of concept, there are many other things
> that may need to be taken care of (for example, non-callable
> attributes).
> BTW, the copy should be done outside. If the object is copied, I'd say
> "using" is a poor choice of name for the proxy.

You're right, I would need to do more work to get it to mimic the
underlying object.  I think I will stick with Oleg's suggestion to
subclass dict for now; it's great for unit tests.  Thanks for the
idea, though.

class ReplaceableDict(dict):
   def replace(self, **kwargs):
       'Works for replacing string-based keys'
       return dict(self.items() + kwargs.items())
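
The snippet above relies on Python 2, where items() returns lists that can be concatenated with +. On Python 3, items() returns views, so a rough equivalent (a sketch that keeps the same class and method names, and still returns a plain dict) could look like this:

```python
class ReplaceableDict(dict):
    def replace(self, **kwargs):
        """Return a new dict with the given string keys replaced."""
        merged = dict(self)    # shallow copy; the original stays untouched
        merged.update(kwargs)  # keyword arguments limit this to string keys
        return merged


x = ReplaceableDict(key1=1)
y = x.replace(key1=3)
```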

From moloney at  Thu May  5 23:41:06 2011
From: moloney at (Brendan Moloney)
Date: Thu, 5 May 2011 14:41:06 -0700
Subject: [Python-ideas] Allow 'import star' with namespaces
In-Reply-To: <>
References: <>
Message-ID: <>


I posted this on python-dev, but was told that this is the more appropriate list.

Currently if I do:

$ import pkg

Then the public subpackages/submodules are not automatically pulled into the 'pkg' namespace. I can do:

$ from pkg import *

To get all of the public subpackages/submodules, but that dumps them all into the current namespace. Why not allow:

$ import pkg.*

This would allow easier interactive use (by eliminating the need to import individual subpackages/submodules) while keeping the 'pkg' namespace around.

Brendan Moloney

From benjamin at  Fri May  6 00:00:35 2011
From: benjamin at (Benjamin Peterson)
Date: Thu, 5 May 2011 22:00:35 +0000 (UTC)
Subject: [Python-ideas] Allow 'import star' with namespaces
References: <>
Message-ID: <>

Brendan Moloney <moloney at ...> writes:
> This would allow easier interactive use (by eliminating the need to import
> subpackages/submodules) while keeping the 'pkg' namespace around.

import * is generally frowned upon, so encouraging its use by extending it is
not a good idea.

From moloney at  Fri May  6 00:24:16 2011
From: moloney at (Brendan Moloney)
Date: Thu, 5 May 2011 15:24:16 -0700
Subject: [Python-ideas] Allow 'import star' with namespaces
In-Reply-To: <>
References: <>
Message-ID: <>

Benjamin Peterson [benjamin at] wrote:
> import * is generally frowned upon, so encouraging its use by extending it is
> not a good idea.

Well it is frowned upon precisely because it pollutes the current namespace. This change would eliminate that issue.

From dag.odenhall at  Fri May  6 09:20:26 2011
From: dag.odenhall at (dag.odenhall at
Date: Fri, 6 May 2011 09:20:26 +0200
Subject: [Python-ideas] Allow 'import star' with namespaces
In-Reply-To: <>
References: <>
Message-ID: <>

On 6 May 2011 00:24, Brendan Moloney <moloney at> wrote:
> Benjamin Peterson [benjamin at] wrote:
>> import * is generally frowned upon, so encouraging its use by extending it is
>> not a good idea.
> Well it is frowned upon precisely because it pollutes the current namespace. This change would eliminate that issue.

I like this idea, except it's inconsistent with from-import-star, the
latter which does *not* get you sub-packages or modules.

From g.brandl at  Fri May  6 09:44:02 2011
From: g.brandl at (Georg Brandl)
Date: Fri, 06 May 2011 09:44:02 +0200
Subject: [Python-ideas] Allow 'import star' with namespaces
In-Reply-To: <>
References: <>
Message-ID: <iq08s1$psk$>

On 06.05.2011 09:20, dag.odenhall at wrote:
> On 6 May 2011 00:24, Brendan Moloney <moloney at> wrote:
>> Benjamin Peterson [benjamin at] wrote:
>>> import * is generally frowned upon, so encouraging its use by extending it is
>>> not a good idea.
>> Well it is frowned upon precisely because it pollutes the current namespace. This change would eliminate that issue.
> I like this idea, except it's inconsistent with from-import-star, the
> latter which does *not* get you sub-packages or modules.

And that's for a reason: it's not easy (I think it's even impossible, because
for example individual submodules can change __path__) to determine all
importable submodules of a package.

So ``import pkg.*`` would not have any behavior other than ``import pkg``.
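
For ordinary packages, the standard library can list the submodules that the installed finders are able to see. The sketch below uses pkgutil.iter_modules for that; it is best effort only and does not contradict the point above, since a package can alter __path__ or be served by a finder that offers no enumeration. The helper name visible_submodules is made up for illustration, and xml is used only as a convenient example package.

```python
import pkgutil
import xml  # any regular package works; xml is only a convenient example


def visible_submodules(package):
    """Names of submodules the installed finders can enumerate (best effort)."""
    return sorted(name for _finder, name, _ispkg
                  in pkgutil.iter_modules(package.__path__))


names = visible_submodules(xml)
print(names)
```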


From matt at  Fri May  6 19:51:24 2011
From: matt at (Matt Chaput)
Date: Fri, 06 May 2011 13:51:24 -0400
Subject: [Python-ideas] 1_000_000
Message-ID: <>

Not sure if this has been proposed before: A syntax change to allow 
underscores as thousands separators in literal numbers to improve 
readability, e.g.:

   for i in range(1, 1_000_000):
       pass

I believe D allows this and while it's a small thing it really is much 
more readable.

Worth a PEP?



From janssen at  Fri May  6 21:11:59 2011
From: janssen at (Bill Janssen)
Date: Fri, 6 May 2011 12:11:59 PDT
Subject: [Python-ideas] thoughts on regular expression improvements
Message-ID: <>

I've been doing a lot of RE hacking lately, and some possible
improvements suggest themselves.

1.  Multiple occurrences of a named group

Right now, you can compose RE's with

   x = re.compile("...")
   y = re.compile("..." + x.pattern + "...")

But if x contains named groups, you run into trouble if you have
something like

   z = re.compile("..." + x.pattern + "..." + x.pattern + "...")

which can easily happen if x could occur at various places in z.  The
issue is that a named group is only allowed once, which isn't a bad
error-prevention mechanism, but it would be nice if it could occur more
than once (in alternative subexpressions), perhaps enabled by another
RE flag.
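
The restriction is easy to reproduce with the standard re module. In this small sketch the pattern and the helper name compiles are made up for illustration:

```python
import re

x = re.compile(r"(?P<year>\d{4})")


def compiles(pattern):
    """Return True if the re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


once = compiles("from " + x.pattern)           # group name used once
twice = compiles(x.pattern + "|" + x.pattern)  # same name in two alternatives
print(once, twice)
```

The second pattern is rejected even though the two occurrences sit in different alternatives and could never both match.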

2.  Easier composition.


   y = re.compile("..." + x.pattern + "...")

seems a tad groty, to use a term from my childhood, and affords the RE
engine no purchase on the composition, which can be an issue if the
flags for x are different from the flags for y.

If the first argument to re.compile could be a tuple or list, you could write

   y = re.compile(["...", x, "..."])

and the engine could see that "..." is a string, and that x is a RE, and
could inspect x as necessary.
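
Something close to this can be approximated today in user code. The sketch below is only an illustration of the idea, not a proposal for the actual re.compile signature: the function name compose is hypothetical, and it carries over just the IGNORECASE, MULTILINE and DOTALL flags of each sub-pattern by using scoped inline flags (available in Python 3.6 and later).

```python
import re

# Flags this sketch knows how to carry over as scoped inline flags.
_SCOPED = ((re.IGNORECASE, "i"), (re.MULTILINE, "m"), (re.DOTALL, "s"))


def compose(*parts, flags=0):
    """Join strings and compiled patterns into one compiled pattern."""
    pieces = []
    for part in parts:
        if isinstance(part, str):
            pieces.append(part)
            continue
        letters = "".join(letter for flag, letter in _SCOPED
                          if part.flags & flag)
        # Wrap the sub-pattern in a group that keeps its own flags local.
        pieces.append("(?" + letters + ":" + part.pattern + ")")
    return re.compile("".join(pieces), flags)


word = re.compile(r"[a-z]+", re.IGNORECASE)
y = compose(r"\d+-", word, r"-\d+")
print(y.pattern)
```

Here the case-insensitivity of `word` survives the composition without leaking into the surrounding parts. Named groups inside sub-patterns would still collide if a part were used twice.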

3.  Edit distances.

The RE engine TRE supports fuzzy
matching of strings, using edit distances.

One can write an expression like "(total){~2}" which would match any string
that's "total" with no more than two edit errors.

You can also specify insertions, deletions, and substitution limits
separately with "+", "-", and "#".

That would be nice to have...
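
To make "no more than two edit errors" concrete, here is a plain-Python sketch of the edit (Levenshtein) distance, counting insertions, deletions and substitutions. It compares whole strings only and is not how TRE is implemented; the function name is chosen for illustration.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions and substitutions from a to b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                   # delete ch_a
                current[j - 1] + 1,                # insert ch_b
                previous[j - 1] + (ch_a != ch_b),  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


for candidate in ("total", "totl", "tatol", "subtotals"):
    print(candidate, edit_distance(candidate, "total") <= 2)
```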


From moloney at  Fri May  6 21:49:08 2011
From: moloney at (Brendan Moloney)
Date: Fri, 6 May 2011 12:49:08 -0700
Subject: [Python-ideas] Allow 'import star' with namespaces
In-Reply-To: <iq08s1$psk$>
References: <>
Message-ID: <>

dag.odenhall at wrote:
> I like this idea, except it's inconsistent with from-import-star, the
> latter which does *not* get you sub-packages or modules.

Georg Brandl [g.brandl at] wrote:
> And that's for a reason: it's not easy (I think it's even impossible, because
> for example individual submodules can change __path__) to determine all
> importable submodules of a package.

> So ``import pkg.*`` would not have any behavior other than ``import pkg``.

When I said all _public_ sub-packages and modules I was referring to those listed in the  __all__ attribute of 'pkg'.  Thus it would behave in the exact same way as from-import-star except you don't pollute the current namespace.
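
The behavior described here can be approximated with a helper function today. This is a sketch under stated assumptions: import_public is a hypothetical name, and entries of __all__ that are not importable submodules (functions, classes and so on) are simply skipped.

```python
import importlib


def import_public(package_name):
    """Import a package plus every submodule named in its __all__."""
    package = importlib.import_module(package_name)
    for name in getattr(package, "__all__", ()):
        try:
            # Importing pkg.name also binds name as an attribute of pkg.
            importlib.import_module(package_name + "." + name)
        except ImportError:
            pass  # not a submodule (or it failed to import); skip it
    return package
```

Usage would be along the lines of `pkg = import_public("pkg")`, after which `pkg.sub` is available for each submodule listed in `pkg.__all__`, and nothing is added to the caller's namespace except the name the result is assigned to.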


From dirkjan at  Fri May  6 21:58:36 2011
From: dirkjan at (Dirkjan Ochtman)
Date: Fri, 6 May 2011 21:58:36 +0200
Subject: [Python-ideas] thoughts on regular expression improvements
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, May 6, 2011 at 21:11, Bill Janssen <janssen at> wrote:
> I've been doing a lot of RE hacking lately, and some possible
> improvements suggest themselves.

Have you looked at the regex module?



From ethan at  Fri May  6 22:12:00 2011
From: ethan at (Ethan Furman)
Date: Fri, 06 May 2011 13:12:00 -0700
Subject: [Python-ideas] Allow 'import star' with namespaces
In-Reply-To: <>
References: <>	<>	<>	<>	<>,
Message-ID: <>

Brendan Moloney wrote:
> dag.odenhall at wrote:
>> I like this idea, except it's inconsistent with from-import-star, the
>> latter which does *not* get you sub-packages or modules.
> Georg Brandl [g.brandl at] wrote:
>> And that's for a reason: it's not easy (I think it's even impossible, because
>> for example individual submodules can change __path__) to determine all
>> importable submodules of a package.
>> So ``import pkg.*`` would not have any behavior other than ``import pkg``.
> When I said all _public_ sub-packages and modules I was referring to those
> listed in the __all__ attribute of 'pkg'.  Thus it would behave in the exact
> same way as from-import-star except you don't pollute the current
> namespace.
I'm not catching the vision -- could you put together a short example 
that would illustrate?


From janssen at  Fri May  6 22:28:12 2011
From: janssen at (Bill Janssen)
Date: Fri, 6 May 2011 13:28:12 PDT
Subject: [Python-ideas] thoughts on regular expression improvements
In-Reply-To: <>
References: <>
Message-ID: <>

Dirkjan Ochtman <dirkjan at> wrote:

> On Fri, May 6, 2011 at 21:11, Bill Janssen <janssen at> wrote:
> > I've been doing a lot of RE hacking lately, and some possible
> > improvements suggest themselves.
> Have you looked at the regex module?

From Python 1.4?  Not in a long time...


From janssen at  Fri May  6 22:32:18 2011
From: janssen at (Bill Janssen)
Date: Fri, 6 May 2011 13:32:18 PDT
Subject: [Python-ideas] thoughts on regular expression improvements
In-Reply-To: <>
References: <>
Message-ID: <>

Dirkjan Ochtman <dirkjan at> wrote:

> On Fri, May 6, 2011 at 21:11, Bill Janssen <janssen at> wrote:
> > I've been doing a lot of RE hacking lately, and some possible
> > improvements suggest themselves.
> Have you looked at the regex module?

Ah, you mean the PyPI "regex".  Looks like it has "branch reset", which
might support my #1?  Using the same group name multiple times?

I don't see fuzzy matches, or support for composition, though.


From jsbueno at  Fri May  6 22:42:53 2011
From: jsbueno at (Joao S. O. Bueno)
Date: Fri, 6 May 2011 17:42:53 -0300
Subject: [Python-ideas] Allow 'import star' with namespaces
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, May 6, 2011 at 5:12 PM, Ethan Furman <ethan at> wrote:
> Brendan Moloney wrote:
>> dag.odenhall at wrote:
>>> I like this idea, except it's inconsistent with from-import-star, the
>>> latter which does *not* get you sub-packages or modules.
>> Georg Brandl [g.brandl at] wrote:
>>> And that's for a reason: it's not easy (I think it's even impossible,
>>> because
>>> for example individual submodules can change __path__) to determine all
>>> importable submodules of a package.
>>> So ``import pkg.*`` would not have any behavior other than ``import
>>> pkg``.
>> When I said all _public_ sub-packages and modules I was referring to those
>> listed in the __all__ attribute of 'pkg'.  Thus it would behave in the
>> exact
>> same way as from-import-star except you don't pollute the current
>> namespace.
> I'm not catching the vision -- could you put together a short example that
> would illustrate?

The idea is to be able to get by with a single import in cases where
submodules currently have to be imported explicitly, like
xml.etree.ElementTree:

[gwidion at powerpuff ~]$ python
Python 2.6.1 (r261:67515, Apr 12 2009, 04:14:16)
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import xml
>>> xml.etree
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'etree'
>>> import xml.etree
>>> xml.etree.ElementTree
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'ElementTree'
>>> import xml.etree.ElementTree
>>> xml.etree.ElementTree
<module 'xml.etree.ElementTree' from


From moloney at  Fri May  6 22:50:14 2011
From: moloney at (Brendan Moloney)
Date: Fri, 6 May 2011 13:50:14 -0700
Subject: [Python-ideas] Allow 'import star' with namespaces
In-Reply-To: <>
References: <>
Message-ID: <>

Ethan Furman [ethan at] wrote:

> I'm not catching the vision -- could you put together a short example
> that would illustrate?

The motivation is really just for interactive usage (much like the current from-import-star). 

If 'pkg' contains a number of sub-packages/modules that take a while to import, it makes sense to not automatically import them into the 'pkg' namespace (in the pkg.__init__ module). Putting the sub-package/module names into the __all__ list gives interactive users the ability to import everything in one go using from-import-star. Unfortunately the from-import-star usage pollutes the current namespace, and thus its use is discouraged. 

So really the vision is that developers can make their packages convenient for interactive use (by setting the __all__ attribute) without requiring users to use a discouraged language feature or making regular import of the package slow.


From ericsnowcurrently at  Fri May  6 22:52:09 2011
From: ericsnowcurrently at (Eric Snow)
Date: Fri, 6 May 2011 14:52:09 -0600
Subject: [Python-ideas] Allow 'import star' with namespaces
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, May 6, 2011 at 2:12 PM, Ethan Furman <ethan at> wrote:

> Brendan Moloney wrote:
>> dag.odenhall at wrote:
>>> I like this idea, except it's inconsistent with from-import-star, the
>>> latter which does *not* get you sub-packages or modules.
>> Georg Brandl [g.brandl at] wrote:
>>> And that's for a reason: it's not easy (I think it's even impossible,
>>> because
>>> for example individual submodules can change __path__) to determine all
>>> importable submodules of a package.
>>> So ``import pkg.*`` would not have any behavior other than ``import
>>> pkg``.
>> When I said all _public_ sub-packages and modules I was referring to those
>> listed in the __all__ attribute of 'pkg'.  Thus it would behave in the
>> exact
>> same way as from-import-star except you don't pollute the current
>> namespace.
> I'm not catching the vision -- could you put together a short example that
> would illustrate?

He's saying that the package would be imported like normal.  Then all
"public" sub-modules of the package would automatically be imported and bound
to the namespace of the object that resulted from the import of the package.
The trickery is that the package's __all__ would change meaning
somewhat, and, do you bind the submodules into the package's module object
or something else?

If you have a list of the submodules you want imported then you can already
accomplish this:

import parent
for mod in parent.__all_submodules__:

Of course, this does not bind the submodules to the namespace of the package
module, but I suppose you could try that with one more step.  I am not sure
of the specific import mechanism with regards to name binding, but that
would seem to be a conflict with the way imported names for submodules are



From dag.odenhall at  Fri May  6 22:59:05 2011
From: dag.odenhall at (dag.odenhall at
Date: Fri, 6 May 2011 22:59:05 +0200
Subject: [Python-ideas] Allow 'import star' with namespaces
In-Reply-To: <>
References: <>
Message-ID: <>

On 6 May 2011 22:12, Ethan Furman <ethan at> wrote:
> Brendan Moloney wrote:
>> dag.odenhall at wrote:
>>> I like this idea, except it's inconsistent with from-import-star, the
>>> latter which does *not* get you sub-packages or modules.
>> Georg Brandl [g.brandl at] wrote:
>>> And that's for a reason: it's not easy (I think it's even impossible,
>>> because
>>> for example individual submodules can change __path__) to determine all
>>> importable submodules of a package.
>>> So ``import pkg.*`` would not have any behavior other than ``import
>>> pkg``.
>> When I said all _public_ sub-packages and modules I was referring to those
>> listed in the __all__ attribute of 'pkg'.  Thus it would behave in the
>> exact
>> same way as from-import-star except you don't pollute the current
>> namespace.

If you're going to require listing in __all__ anyway, you might as
well use what already works: import the modules in the package, and
you can then import the package and access the modules as attributes:

from . import mod
import pkg
pkg.mod  #=> pkg/

From dag.odenhall at  Fri May  6 23:06:18 2011
From: dag.odenhall at (dag.odenhall at
Date: Fri, 6 May 2011 23:06:18 +0200
Subject: [Python-ideas] 1_000_000
In-Reply-To: <>
References: <>
Message-ID: <>

On 6 May 2011 19:51, Matt Chaput <matt at> wrote:
> Not sure if this has been proposed before: A syntax change to allow
> underscores as thousands separators in literal numbers to improve
> readability, e.g.:
>   for i in range(1, 1_000_000):
>       pass
> I believe D allows this and while it's a small thing it really is much more
> readable.

Ruby too.

You could also use e-notation[1]: 1e6, in your example. In many
situations it's even more readable because you don't need to "count
the zeros". This is already supported in Python.


From nadeem.vawda at  Fri May  6 23:23:05 2011
From: nadeem.vawda at (Nadeem Vawda)
Date: Fri, 6 May 2011 23:23:05 +0200
Subject: [Python-ideas] 1_000_000
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, May 6, 2011 at 11:06 PM, dag.odenhall at
<dag.odenhall at> wrote:
> You could also use e-notation[1]: 1e6, in your example.

1e6 is a float, though. If you use it in that example, range() complains that
its arguments must be integers.

From solipsis at  Fri May  6 23:24:07 2011
From: solipsis at (Antoine Pitrou)
Date: Fri, 6 May 2011 23:24:07 +0200
Subject: [Python-ideas] 1_000_000
References: <>
Message-ID: <>

On Fri, 6 May 2011 23:06:18 +0200
"dag.odenhall at"
<dag.odenhall at> wrote:
> On 6 May 2011 19:51, Matt Chaput <matt-KKMwxO2wslj3fQ9qLvQP4Q at> wrote:
> > Not sure if this has been proposed before: A syntax change to allow
> > underscores as thousands separators in literal numbers to improve
> > readability, e.g.:
> >
> >   for i in range(1, 1_000_000):
> >       pass
> >
> > I believe D allows this and while it's a small thing it really is much more
> > readable.
> Ruby too.
> You could also use e-notation[1]: 1e6, in your example. In many
> situations it's even more readable because you don't need to "count
> the zeros". This is already supported in Python.

Yes, but it gives a float, not an integer:

>>> for i in range(0, 1e6): pass
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'float' object cannot be interpreted as an integer
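
The difference is easy to check on Python 3. In this short sketch the helper name accepts is made up for illustration; it shows that the float literal is rejected by range() while the integer spellings are fine:

```python
def accepts(stop):
    """Return True if range() takes the value as its stop argument."""
    try:
        range(stop)
    except TypeError:
        return False
    return True


print(type(1e6).__name__, accepts(1e6))      # e-notation yields a float
print(type(10**6).__name__, accepts(10**6))  # integer power stays an int
print(accepts(int(1e6)))                     # explicit conversion also works
```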



From kirubakaran at  Fri May  6 23:25:56 2011
From: kirubakaran at (Kirubakaran)
Date: Fri, 6 May 2011 14:25:56 -0700
Subject: [Python-ideas] 1_000_000
In-Reply-To: <>
References: <>
Message-ID: <>

How about range(10**60) ?

- Kirubakaran.


From kirubakaran at  Fri May  6 23:26:14 2011
From: kirubakaran at (Kirubakaran)
Date: Fri, 6 May 2011 14:26:14 -0700
Subject: [Python-ideas] 1_000_000
In-Reply-To: <>
References: <>
Message-ID: <>

(fixed typo)
How about range(10**6) ?

- Kirubakaran.


From matt at  Fri May  6 23:36:47 2011
From: matt at (Matt Chaput)
Date: Fri, 06 May 2011 17:36:47 -0400
Subject: [Python-ideas] 1_000_000
In-Reply-To: <>
References: <>	<>	<>	<>
Message-ID: <>

On 06/05/2011 5:26 PM, Kirubakaran wrote:
> (fixed typo)
> How about range(10**6) ?

Both 1e6 (if it worked in the example) and 10**6 require a bit of 
work (at least for my non-mathematician brain) to decode as "1 million", 
whereas with 1_000_000 you're not so much counting the zeros in your 
head as counting the *groups* of zeros visually. For me it's much more 
readable at a glance.

Also, obviously the 10**6 trick doesn't work so well if the example is:

   for i in range(47_284_345):
       pass


From kirubakaran at  Fri May  6 23:37:10 2011
From: kirubakaran at (Kirubakaran)
Date: Fri, 6 May 2011 14:37:10 -0700
Subject: [Python-ideas] 1_000_000
In-Reply-To: <>
References: <>
Message-ID: <>

Ah, thanks. Sorry, I don't know how I failed to see that.

On Fri, May 6, 2011 at 2:30 PM, Andre Roberge <andre.roberge at> wrote:

> I believe that the original suggestion was meant to be more general than
> the specific suggestions for powers of 10.  For example, consider the
> following hypothetical:
> for i in range(1, 1_111_111_111, 1024):
>     pass
> where the _ really helps in figuring out the size.
> André

From bruce at  Fri May  6 23:38:19 2011
From: bruce at (Bruce Leban)
Date: Fri, 6 May 2011 14:38:19 -0700
Subject: [Python-ideas] 1_000_000
In-Reply-To: <>
References: <>
Message-ID: <>

None of these answers address the original suggestion. Matt didn't say that
he only wanted this for numbers of the form 10^N; he just gave that as an
example.

Consider these examples instead:

   - 1_234_000
   - 9.876_543_210
   - 0xFEFF_0042

I'm not advocating this change (nor against it); I just think the discussion
should be focused on the actual idea. I do have a question:

Is _ just ignored in numbers or are there more complex rules?

   - 1_2345_6789  (can I use groups of other sizes instead?)
   - 1_2_3_4_5  (ditto)
   - 1_234_6789  (do all the groups need to be the same size?)
   - 1_   (must the _ only be in between 2 digits?)
   - 1__234   (what about multiple _s?)
   - 9.876_543_210   (can it be used to the right of the decimal point?)
   - 0xFEFF_0042   (can it be used in hex, octal or binary numbers?)
   - int('123_456')   (do other functions accept this syntax too?)
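
For what it is worth, Python 3.6 and later do accept underscores in numeric literals (PEP 515), so these questions can be checked directly there. The sketch below assumes such an interpreter; the helper name literal_value is made up, and it returns None for spellings the parser rejects.

```python
import ast

CANDIDATES = [
    "1_2345_6789",    # groups of other sizes
    "1_2_3_4_5",      # single-digit groups
    "1_234_6789",     # mixed group sizes
    "1_",             # trailing underscore
    "1__234",         # doubled underscore
    "9.876_543_210",  # right of the decimal point
    "0xFEFF_0042",    # hexadecimal
]


def literal_value(text):
    """Value of a numeric literal, or None if it is not valid syntax."""
    try:
        return ast.literal_eval(text)
    except (SyntaxError, ValueError):
        return None


results = {text: literal_value(text) for text in CANDIDATES}
for text, value in results.items():
    print(text, "->", value)
print(int("123_456"))  # int() follows the same rule for strings
```

Under those rules an underscore is allowed only between two digits (or right after a base prefix such as 0x), one at a time, with no constraint on group size.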

--- Bruce

From p.f.moore at  Sat May  7 00:04:43 2011
From: p.f.moore at (Paul Moore)
Date: Fri, 6 May 2011 23:04:43 +0100
Subject: [Python-ideas] Allow 'import star' with namespaces
In-Reply-To: <>
References: <>
Message-ID: <>

On 6 May 2011 21:52, Eric Snow <ericsnowcurrently at> wrote:
> He's saying that the package would be imported like normal.  Then all
> "public" sub-modules of the package would automatically imported and bound
> to the namespace of the object that resulted from the import of the package.

There is no means of determining what submodules of a package exist.
Check PEP 302 for details - finders find modules and they can do so
any way they like - there's nothing in the protocol to enumerate
subpackages, so you can't do it (if faced with a general PEP 302


From ethan at  Sat May  7 00:40:06 2011
From: ethan at (Ethan Furman)
Date: Fri, 06 May 2011 15:40:06 -0700
Subject: [Python-ideas] 1_000_000
In-Reply-To: <>
References: <>	<>	<>	<>	<>
Message-ID: <>

Bruce Leban wrote:
> None of these answers address the original suggestion. Matt didn't say 
> that he only wanted this for numbers of the form 10^N; he just gave that 
> as an example.
> Consider these examples instead:
>     * 1_234_000
>     * 9.876_543_210
>     * 0xFEFF_0042
> I'm not advocating this change (nor against it); I just think the 
> discussion should be focused on the actual idea. I do have a question:
> Is _ just ignored in numbers or are there more complex rules?
>     * 1_2345_6789  (can I use groups of other sizes instead?)
>     * 1_2_3_4_5  (ditto)
>     * 1_234_6789  (do all the groups need to be the same size?)
>     * 1_   (must the _ only be in between 2 digits?)
>     * 1__234   (what about multiple _s?)
>     * 9.876_543_210   (can it be used to the right of the decimal point?)
>     * 0xFEFF_0042   (can it be used in hex, octal or binary numbers?)
>     * int('123_456')   (do other functions accept this syntax too?)

I would say it's ignored.  Have the rule be something like
number_string.replace('_','').

The only wrinkle is that currently '_1' is a usable name, and that should
probably be disallowed if the above change took place.

I'm +1 on the idea.
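
The "just ignore it" rule can be tried out on strings without any change to the language. A minimal sketch, assuming a helper named strip_separators (a made-up name, not an existing function):

```python
def strip_separators(number_string):
    """Apply the suggested rule: drop every underscore before parsing."""
    return number_string.replace('_', '')


print(int(strip_separators('1_234_000')))
print(float(strip_separators('9.876_543_210')))
print(int(strip_separators('0xFEFF_0042'), 16) == 0xFEFF0042)

# The rule is permissive: doubled and trailing underscores parse as well.
print(int(strip_separators('1__234')), int(strip_separators('1_')))
```

Because every underscore is dropped, this rule alone does not answer the questions about group sizes, doubled underscores, or leading and trailing underscores; it accepts all of them.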


From alexander.belopolsky at  Sat May  7 00:42:59 2011
From: alexander.belopolsky at (Alexander Belopolsky)
Date: Fri, 6 May 2011 18:42:59 -0400
Subject: [Python-ideas] 1_000_000
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, May 6, 2011 at 6:40 PM, Ethan Furman <ethan at> wrote:
> The only wrinkle is that currently '_1' is usable name, and that should
> probably be disallowed if the above change took place.

-1_000 if _1 becomes invalid as an identifier.

+0 otherwise.

From fdrake at  Sat May  7 00:45:23 2011
From: fdrake at (Fred Drake)
Date: Fri, 6 May 2011 18:45:23 -0400
Subject: [Python-ideas] 1_000_000
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, May 6, 2011 at 6:40 PM, Ethan Furman <ethan at> wrote:
> The only wrinkle is that currently '_1' is usable name, and that should
> probably be disallowed if the above change took place.

Why?  I've never seen a leading thousands separator in practice.  For example,

    ,123,456

isn't generally accepted usage, so why should

    _123_456

be considered acceptable?

(I'm not taking a position on the proposal here; just commenting on the problem
of breaking code by making _1 a number instead of an identifier.)


Fred L. Drake, Jr.    <fdrake at>
"Give me the luxuries of life and I will willingly do without the necessities."
   --Frank Lloyd Wright

From ethan at  Sat May  7 00:58:50 2011
From: ethan at (Ethan Furman)
Date: Fri, 06 May 2011 15:58:50 -0700
Subject: [Python-ideas] 1_000_000
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>
Message-ID: <>

Alexander Belopolsky wrote:
> On Fri, May 6, 2011 at 6:40 PM, Ethan Furman <ethan at> wrote:
> ..
>> The only wrinkle is that currently '_1' is usable name, and that should
>> probably be disallowed if the above change took place.
> -1_000 if _1 becomes invalid as an identifier.
> +0 otherwise.

So you use _8127 style names for your objects* then?


*Okay, avoiding the word 'variables' can make for some slightly odd 
sounding sentences!  ;)

From ethan at  Sat May  7 01:02:08 2011
From: ethan at (Ethan Furman)
Date: Fri, 06 May 2011 16:02:08 -0700
Subject: [Python-ideas] 1_000_000
In-Reply-To: <>
References: <>
Message-ID: <>

Fred Drake wrote:
> On Fri, May 6, 2011 at 6:40 PM, Ethan Furman <ethan at> wrote:
>> The only wrinkle is that currently '_1' is usable name, and that should
>> probably be disallowed if the above change took place.
> Why?  I've never seen a leading thousands separator in practice.  For example,
>     ,123,456
> isn't generally accepted usage, so why should
>     _123_456
> be considered acceptable?
> (I'm not taking a position on the proposal here; just commenting on the problem
> of breaking code by making _1 a number instead of an identifier.)

I see it as a readability issue -- if you have 1_024 and _1025 (etc,
etc), where one is a number and the other a name, confusion can easily
result.


From fdrake at  Sat May  7 00:59:02 2011
From: fdrake at (Fred Drake)
Date: Fri, 6 May 2011 18:59:02 -0400
Subject: [Python-ideas] 1_000_000
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, May 6, 2011 at 6:58 PM, Ethan Furman <ethan at> wrote:
> So you use _8127 style names for your objects* then?

Code generators often use such names, though.  Since _1234 is currently a
legal identifier, you'd be breaking backward compatibility.

I understand the motivation for a thousands separator, at least (though
I'll admit, I don't find it compelling; *all* big numbers in code are
too magical).


Fred L. Drake, Jr.    <fdrake at>
"Give me the luxuries of life and I will willingly do without the necessities."
   --Frank Lloyd Wright

From cs at  Sat May  7 00:51:38 2011
From: cs at (Cameron Simpson)
Date: Sat, 7 May 2011 08:51:38 +1000
Subject: [Python-ideas] 1_000_000
In-Reply-To: <>
References: <>
Message-ID: <>

On 06May2011 15:40, Ethan Furman <ethan at> wrote:
| Bruce Leban wrote:
| >Is _ just ignored in numbers or are there more complex rules?
| >
| >    * 1_2345_6789  (can I use groups of other sizes instead?)
| >    * 1_2_3_4_5  (ditto)
| >    * 1_234_6789  (do all the groups need to be the same size?)
| >    * 1_   (must the _ only be in between 2 digits?)
| >    * 1__234   (what about multiple _s?)
| >    * 9.876_543_210   (can it be used to the right of the decimal point?)
| >    * 0xFEFF_0042   (can it be used in hex, octal or binary numbers?)
| >    * int('123_456')   (do other functions accept this syntax too?)
| I would say it's ignored.  Have the rule be something like
| number_string.replace('_','').
| The only wrinkle is that currently '_1' is usable name, and that
| should probably be disallowed if the above change took place.
| I'm +1 on the idea.

Personally I'd be for ignoring the _ also, save that I would forbid it
at the start or end, so no _1 or 1_.

And I would permit it in hex code etc.

I'm +0.5, myself.

Cameron Simpson <cs at> DoD#743

A strong conviction that something must be done is the parent of many
bad measures.   - Daniel Webster

From python at  Sat May  7 01:41:33 2011
From: python at (MRAB)
Date: Sat, 07 May 2011 00:41:33 +0100
Subject: [Python-ideas] 1_000_000
In-Reply-To: <>
References: <>
Message-ID: <>

On 06/05/2011 23:51, Cameron Simpson wrote:
> On 06May2011 15:40, Ethan Furman<ethan at>  wrote:
> | Bruce Leban wrote:
> |>Is _ just ignored in numbers or are there more complex rules?
> |>
> |>     * 1_2345_6789  (can I use groups of other sizes instead?)
> |>     * 1_2_3_4_5  (ditto)
> |>     * 1_234_6789  (do all the groups need to be the same size?)
> |>     * 1_   (must the _ only be in between 2 digits?)
> |>     * 1__234   (what about multiple _s?)
> |>     * 9.876_543_210   (can it be used to the right of the decimal point?)
> |>     * 0xFEFF_0042   (can it be used in hex, octal or binary numbers?)
> |>     * int('123_456')   (do other functions accept this syntax too?)
> |
> | I would say it's ignored.  Have the rule be something like
> | number_string.replace('_','').
> |
> | The only wrinkle is that currently '_1' is usable name, and that
> | should probably be disallowed if the above change took place.
> |
> | I'm +1 on the idea.
> Personally I'm be for ignoring the _ also, save that I would forbid it
> at the start or end, so no _1 or 1_.
> And I would permit it in hex code etc.
> I'm +0.5, myself.
As far as I remember, Ada also permits it, but has the rule that it can
occur only between digits. If we follow that, then:

     1_2345_6789 => Yes
     1_2_3_4_5 => Yes
     1_234_6789 => Yes
     1_ => No
     _1 => No
     1__234 => No
     9.876_543_210 => Yes
     9._876_543_210 => No
     9_.876_543_210 => No
     0xFEFF_0042 => Yes
     int('123_456') => Yes
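
The "only between digits" rule in the table above can be sketched as a small check on the literal's text. This is an illustration only, assuming a helper named underscores_only_between_digits (made up here) and treating A-F as digits when the text starts with 0x:

```python
import re


def underscores_only_between_digits(literal):
    """Return True if every '_' has a digit directly on both sides."""
    # Hex literals count A-F as digits; everything else uses 0-9 only.
    digits = '0-9A-Fa-f' if literal.lower().startswith('0x') else '0-9'
    # An underscore is misplaced if it is not preceded by a digit
    # or not followed by one (start and end of string included).
    misplaced = re.compile(r'(?<![%s])_|_(?![%s])' % (digits, digits))
    return misplaced.search(literal) is None


for text in ('1_2345_6789', '1_2_3_4_5', '1_234_6789', '1_', '_1',
             '1__234', '9.876_543_210', '9._876_543_210',
             '9_.876_543_210', '0xFEFF_0042'):
    print(text, '=>', 'Yes' if underscores_only_between_digits(text) else 'No')
```

The helper only checks placement of the underscores; it does not validate that the rest of the text is a well-formed number.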

From bruce at  Sat May  7 01:44:21 2011
From: bruce at (Bruce Leban)
Date: Fri, 6 May 2011 16:44:21 -0700
Subject: [Python-ideas] 1_000_000
In-Reply-To: <>
References: <>
Message-ID: <>

I'm opposed to changing int so that int('123_456') ignores the _ as that
will change the behavior of existing code and could break apps.

Alternatively, if you want to change int, how about having
int('123_456', separator='_') ignore the _? That would also admit
int('123,456', separator=',').
--- Bruce
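
The opt-in idea can be approximated today with a wrapper. A minimal sketch, assuming a hypothetical function parse_int; the built-in int() has no separator argument:

```python
def parse_int(text, separator=None, base=10):
    """Parse an integer, ignoring `separator` only when one is given.

    Hypothetical helper: int() itself takes no separator argument.
    """
    if separator is not None:
        text = text.replace(separator, '')
    return int(text, base)


print(parse_int('123_456', separator='_'))
print(parse_int('123,456', separator=','))
print(parse_int('FEFF_0042', separator='_', base=16) == 0xFEFF0042)
```

Since the separator must be passed explicitly, callers that never pass it see no change in behaviour.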

On Fri, May 6, 2011 at 4:41 PM, MRAB <python at> wrote:

> On 06/05/2011 23:51, Cameron Simpson wrote:
>> On 06May2011 15:40, Ethan Furman<ethan at>  wrote:
>> | Bruce Leban wrote:
>> |>Is _ just ignored in numbers or are there more complex rules?
>> |>
>> |>     * 1_2345_6789  (can I use groups of other sizes instead?)
>> |>     * 1_2_3_4_5  (ditto)
>> |>     * 1_234_6789  (do all the groups need to be the same size?)
>> |>     * 1_   (must the _ only be in between 2 digits?)
>> |>     * 1__234   (what about multiple _s?)
>> |>     * 9.876_543_210   (can it be used to the right of the decimal point?)
>> |>     * 0xFEFF_0042   (can it be used in hex, octal or binary numbers?)
>> |>     * int('123_456')   (do other functions accept this syntax too?)
>> |
>> | I would say it's ignored.  Have the rule be something like
>> | number_string.replace('_','').
>> |
>> | The only wrinkle is that currently '_1' is usable name, and that
>> | should probably be disallowed if the above change took place.
>> |
>> | I'm +1 on the idea.
>> Personally I'm be for ignoring the _ also, save that I would forbid it
>> at the start or end, so no _1 or 1_.
>> And I would permit it in hex code etc.
>> I'm +0.5, myself.
> As far as I remember, Ada also permits it, but has the rule that it can
> occur only between digits. If we follow that, then:
>    1_2345_6789 => Yes
>    1_2_3_4_5 => Yes
>    1_234_6789 => Yes
>    1_ => No
>    _1 => No
>    1__234 => No
>    9.876_543_210 => Yes
>    9._876_543_210 => No
>    9_.876_543_210 => No
>    0xFEFF_0042 => Yes
>    int('123_456') => Yes

From ironfroggy at  Sat May  7 01:55:11 2011
From: ironfroggy at (Calvin Spealman)
Date: Fri, 6 May 2011 19:55:11 -0400
Subject: [Python-ideas] 1_000_000
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, May 6, 2011 at 7:44 PM, Bruce Leban <bruce at> wrote:
> I'm opposed to changing int so that int('123_456') ignores the _ as that
> will change the behavior of existing code and could break apps.
> Alternatively, if you want to change int how about int('123_456',
> separator='_') ignores the _. That would also admit int('123,456',
> separator=',')
> --- Bruce
> On Fri, May 6, 2011 at 4:41 PM, MRAB <python at> wrote:
>> On 06/05/2011 23:51, Cameron Simpson wrote:
>>> On 06May2011 15:40, Ethan Furman<ethan at>  wrote:
>>> | Bruce Leban wrote:
>>> |>Is _ just ignored in numbers or are there more complex rules?
>>> |>
>>> |>     * 1_2345_6789  (can I use groups of other sizes instead?)
>>> |>     * 1_2_3_4_5  (ditto)
>>> |>     * 1_234_6789  (do all the groups need to be the same size?)
>>> |>     * 1_   (must the _ only be in between 2 digits?)
>>> |>     * 1__234   (what about multiple _s?)
>>> |>     * 9.876_543_210   (can it be used to the right of the decimal point?)
>>> |>     * 0xFEFF_0042   (can it be used in hex, octal or binary numbers?)
>>> |>     * int('123_456')   (do other functions accept this syntax too?)
>>> |
>>> | I would say it's ignored.  Have the rule be something like
>>> | number_string.replace('_','').
>>> |
>>> | The only wrinkle is that currently '_1' is usable name, and that
>>> | should probably be disallowed if the above change took place.
>>> |
>>> | I'm +1 on the idea.
>>> Personally I'm be for ignoring the _ also, save that I would forbid it
>>> at the start or end, so no _1 or 1_.
>>> And I would permit it in hex code etc.
>>> I'm +0.5, myself.
>> As far as I remember, Ada also permits it, but has the rule that it can
>> occur only between digits. If we follow that, then:
>>    1_2345_6789 => Yes
>>    1_2_3_4_5 => Yes
>>    1_234_6789 => Yes
>>    1_ => No
>>    _1 => No
>>    1__234 => No
>>    9.876_543_210 => Yes
>>    9._876_543_210 => No
>>    9_.876_543_210 => No
>>    0xFEFF_0042 => Yes
>>    int('123_456') => Yes

I am +0 on the whole idea, but +0.5 if it is not an underscore, which I
think is ugly. Would it conflict with any other syntax rules if
numbers allowed a space separator?

for i in range(1 111 111):
    foo(i)

It looks cleaner and in a fixed-font should be just as obvious about
separator placement.

Read my blog! I depend on your acceptance of my opinion! I am interesting!
Follow me if you're into that sort of thing:

From greg.ewing at  Sat May  7 01:56:04 2011
From: greg.ewing at (Greg Ewing)
Date: Sat, 07 May 2011 11:56:04 +1200
Subject: [Python-ideas] 1_000_000
In-Reply-To: <>
References: <>
Message-ID: <>

Matt Chaput wrote:
> Not sure if this has been proposed before: A syntax change to allow 
> underscores as thousands separators in literal numbers to improve 
> readability,

It has, but it received a rather lukewarm response last time.

An alternative would be to allow spaces.


From pjenvey at  Sat May  7 01:59:35 2011
From: pjenvey at (Philip Jenvey)
Date: Fri, 6 May 2011 16:59:35 -0700
Subject: [Python-ideas] 1_000_000
In-Reply-To: <>
References: <>
Message-ID: <>

On May 6, 2011, at 4:41 PM, MRAB wrote:

> On 06/05/2011 23:51, Cameron Simpson wrote:
>> On 06May2011 15:40, Ethan Furman<ethan at>  wrote:
>> | Bruce Leban wrote:
>> |>Is _ just ignored in numbers or are there more complex rules?
>> |>
>> |>     * 1_2345_6789  (can I use groups of other sizes instead?)
>> |>     * 1_2_3_4_5  (ditto)
>> |>     * 1_234_6789  (do all the groups need to be the same size?)
>> |>     * 1_   (must the _ only be in between 2 digits?)
>> |>     * 1__234   (what about multiple _s?)
>> |>     * 9.876_543_210   (can it be used to the right of the decimal point?)
>> |>     * 0xFEFF_0042   (can it be used in hex, octal or binary numbers?)
>> |>     * int('123_456')   (do other functions accept this syntax too?)
>> |
>> | I would say it's ignored.  Have the rule be something like
>> | number_string.replace('_','').
>> |
>> | The only wrinkle is that currently '_1' is usable name, and that
>> | should probably be disallowed if the above change took place.
>> |
>> | I'm +1 on the idea.
>> Personally I'm be for ignoring the _ also, save that I would forbid it
>> at the start or end, so no _1 or 1_.
>> And I would permit it in hex code etc.
>> I'm +0.5, myself.
> As far as I remember, Ada also permits it, but has the rule that it can
> occur only between digits. If we follow that, then:
>    1_2345_6789 => Yes
>    1_2_3_4_5 => Yes
>    1_234_6789 => Yes
>    1_ => No
>    _1 => No
>    1__234 => No
>    9.876_543_210 => Yes
>    9._876_543_210 => No
>    9_.876_543_210 => No
>    0xFEFF_0042 => Yes
>    int('123_456') => Yes

Java 7 also adds this feature. Its rules:

You can place underscores only between digits; you cannot place underscores in the following places:

	* At the beginning or end of a number
	* Adjacent to a decimal point in a floating point literal
	* Prior to an F or L suffix
	* In positions where a string of digits is expected

The following examples demonstrate valid and invalid underscore placements in numeric literals:

float pi1 = 3_.1415F;      // Invalid; cannot put underscores adjacent to a decimal point
float pi2 = 3._1415F;      // Invalid; cannot put underscores adjacent to a decimal point
long socialSecurityNumber1
  = 999_99_9999_L;         // Invalid; cannot put underscores prior to an L suffix

int x1 = _52;              // This is an identifier, not a numeric literal
int x2 = 5_2;              // OK (decimal literal)
int x3 = 52_;              // Invalid; cannot put underscores at the end of a literal
int x4 = 5_______2;        // OK (decimal literal)

int x5 = 0_x52;            // Invalid; cannot put underscores in the 0x radix prefix
int x6 = 0x_52;            // Invalid; cannot put underscores at the beginning of a number
int x7 = 0x5_2;            // OK (hexadecimal literal)
int x8 = 0x52_;            // Invalid; cannot put underscores at the end of a number

int x9 = 0_52;             // OK (octal literal)
int x10 = 05_2;            // OK (octal literal)
int x11 = 052_;            // Invalid; cannot put underscores at the end of a number

(From )

Philip Jenvey

From dholth at  Sat May  7 02:16:21 2011
From: dholth at (Daniel Holth)
Date: Fri, 6 May 2011 20:16:21 -0400
Subject: [Python-ideas] AttributeError: __exit__
Message-ID: <>

I just learned about Python internals from the ZODB transaction module. In
Python < 2.7, the module works as a transaction manager. More or less:

manager = Foo()
__exit__ = manager.__exit__
__enter__ = manager.__enter__

After Python 2.7, it doesn't work.

import transaction
with transaction: pass
>>> AttributeError: __exit__

It should be obvious to even the most casual observer that the exception is
because, after Python 2.7, the with: statement has its own opcode that
bypasses transaction.__getattribute__('__exit__') ->
transaction.__dict__['__exit__']. Instead, CPython calls special_lookup(),
looks for __exit__ on the module type, not the instance, doesn't find it,
and raises the AttributeError.


Instead,

import sys
sys.__exit__
>>> AttributeError: 'module' object has no attribute '__exit__'

The interpreter should at least explain the AttributeError in the same way
as it does when the user triggers it directly.
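
The lookup difference can be reproduced without the transaction module. A minimal sketch, assuming a stand-in module object named fake_transaction (made up for the example); the exact exception type and message depend on the interpreter version, so both likely types are caught:

```python
import types

# A stand-in for a module that tries to act as a context manager
# (hypothetical name; the real 'transaction' module is not used here).
fake = types.ModuleType('fake_transaction')
fake.__enter__ = lambda: fake
fake.__exit__ = lambda *exc_info: False

# Ordinary attribute access finds the attribute on the instance...
print(callable(fake.__exit__))

# ...but 'with' looks the special methods up on type(fake), so it fails.
try:
    with fake:
        pass
except (AttributeError, TypeError) as exc:
    with_failed = True
    print(type(exc).__name__, exc)
else:
    with_failed = False


class Manager(object):
    """Defining the methods on the class is what 'with' expects."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        return False


with Manager() as m:
    entered = isinstance(m, Manager)
print(entered)
```

Wrapping the module-level functions in a small class like Manager, and using an instance of it in the with statement, is one way to keep such code working.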

From guido at  Sat May  7 02:38:05 2011
From: guido at (Guido van Rossum)
Date: Fri, 6 May 2011 17:38:05 -0700
Subject: [Python-ideas] Allow 'import star' with namespaces
In-Reply-To: <>
References: <>
Message-ID: <>

The point is that the pkg should use __all__ to declare what submodules
exist. That's what it was invented for!
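
A minimal sketch of that approach, assuming a throwaway package whose __all__ lists only submodule names (the package name all_demo_pkg and the helper import_public_submodules are made up for the example):

```python
import importlib
import os
import sys
import tempfile


def import_public_submodules(package_name):
    """Import a package, then every submodule named in its __all__.

    Assumes __all__ lists submodule names only; any other name would
    make import_module() raise ImportError.
    """
    package = importlib.import_module(package_name)
    for name in getattr(package, '__all__', []):
        importlib.import_module(package_name + '.' + name)
    return package


# Build a throwaway package on disk (hypothetical name and contents).
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, 'all_demo_pkg')
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, '__init__.py'), 'w') as f:
    f.write("__all__ = ['alpha', 'beta']\n")
for name in ('alpha', 'beta', '_private'):
    with open(os.path.join(pkg_dir, name + '.py'), 'w') as f:
        f.write('NAME = %r\n' % name)

sys.path.insert(0, root)
importlib.invalidate_caches()
pkg = import_public_submodules('all_demo_pkg')

# Importing a submodule binds it as an attribute of the package object.
print(pkg.alpha.NAME, pkg.beta.NAME, hasattr(pkg, '_private'))
```

The package itself decides what counts as public, so no finder has to enumerate anything.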
On May 6, 2011 3:05 PM, "Paul Moore" <p.f.moore at> wrote:
> On 6 May 2011 21:52, Eric Snow <ericsnowcurrently at> wrote:
>> He's saying that the package would be imported like normal.  Then all
>> "public" sub-modules of the package would automatically be imported and
>> bound to the namespace of the object that resulted from the import of the
>> package.
> There is no means of determining what submodules of a package exist.
> Check PEP 302 for details - finders find modules and they can do so
> any way they like - there's nothing in the protocol to enumerate
> subpackages, so you can't do it (if faced with a general PEP 302
> finder).
> Paul.

From guido at  Sat May  7 02:41:33 2011
From: guido at (Guido van Rossum)
Date: Fri, 6 May 2011 17:41:33 -0700
Subject: [Python-ideas] AttributeError: __exit__
In-Reply-To: <>
References: <>
Message-ID: <>

Please file a bug.

On May 6, 2011 5:17 PM, "Daniel Holth" <dholth at> wrote:
> I just learned about Python internals from The ZODB transaction module. In
> Python < 2.7, the module works as a transaction manager. More or less:
> manager = Foo()
> __exit__ = manager.__exit__
> __enter__ = manager.__enter__
> After Python 2.7, it doesn't work.
> import transaction
> with transaction: pass
> >>> AttributeError: __exit__
> It should be obvious to even the most casual observer that the exception
> is because, after Python 2.7, the with: statement has its own opcode that
> bypasses transaction.__getattribute__('__exit__') ->
> transaction.__dict__['__exit__']. Instead, CPython calls special_lookup(),
> looks for __exit__ on the module type, not the instance, doesn't find it,
> and raises the AttributeError.
> Instead,
> import sys
> sys.__exit__
> >>> AttributeError: 'module' object has no attribute '__exit__'
> The interpreter should at least explain the AttributeError in the same way
> as it does when the user triggers it directly.

From ben+python at  Sat May  7 02:44:09 2011
From: ben+python at (Ben Finney)
Date: Sat, 07 May 2011 10:44:09 +1000
Subject: [Python-ideas] 1 246 358 (was: 1_000_000)
References: <> <>
Message-ID: <>

Greg Ewing <greg.ewing at> writes:

> An alternative would be to allow spaces.

I would prefer to allow space between digits in a numeric literal.

    1 2345 6789
    1 2 3 4 5 6789
    1 234 6789
    1  234   567 89
    9.876 543 210
    0xFEFF 0042

This nicely parallels the fact that space can separate chunks of a
string literal.

But that still leaves the following inconsistency:

    int('1 234 567')

That will currently raise a ValueError. Should it continue to do so
under this proposal?
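
For reference, a short sketch of the two current behaviours being compared, with nothing assumed beyond the built-ins:

```python
# Adjacent string literals are joined into one string at compile time.
joined = 'abc' 'def'
print(joined)

# int() does not accept spaces between the digits.
try:
    int('1 234 567')
except ValueError as exc:
    rejected = True
    print('ValueError:', exc)
else:
    rejected = False
```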

 \      "You say 'Carmina', and I say 'Burana', You say 'Fortuna', and |
  `\    I say 'cantata', Carmina, Burana, Fortuna, cantata, Let's Carl |
_o__)                               the whole thing Orff." --anonymous |
Ben Finney

From guido at  Sat May  7 02:54:15 2011
From: guido at (Guido van Rossum)
Date: Fri, 6 May 2011 17:54:15 -0700
Subject: [Python-ideas] 1 246 358 (was: 1_000_000)
In-Reply-To: <>
References: <> <>
Message-ID: <>

Too ambiguous, too hard to parse. I like the _ proposal.
On May 6, 2011 5:45 PM, "Ben Finney" <ben+python at> wrote:
> Greg Ewing <greg.ewing at> writes:
>> An alternative would be to allow spaces.
> I would prefer to allow space between digits in a numeric literal.
> 1 2345 6789
> 1 2 3 4 5 6789
> 1 234 6789
> 1 234 567 89
> 9.876 543 210
> 0xFEFF 0042
> This nicely parallels the fact that space can separate chunks of a
> string literal.
> But that still leaves the following inconsistency:
> int('1 234 567')
> That will currently raise a ValueError. Should it continue to do so
> under this proposal?

From python at  Sat May  7 02:55:52 2011
From: python at (MRAB)
Date: Sat, 07 May 2011 01:55:52 +0100
Subject: [Python-ideas] 1 246 358
In-Reply-To: <>
References: <> <>
Message-ID: <>

On 07/05/2011 01:44, Ben Finney wrote:
> Greg Ewing<greg.ewing at>  writes:
>> An alternative would be to allow spaces.
> I would prefer to allow space between digits in a numeric literal.
>      1 2345 6789
>      1 2 3 4 5 6789
>      1 234 6789
>      1  234   567 89
>      9.876 543 210
>      0xFEFF 0042
> This nicely parallels the fact that space can separate chunks of a
> string literal.
> But that still leaves the following inconsistency:
>      int('1 234 567')
> That will currently raise a ValueError. Should it continue to do so
> under this proposal?
I prefer there not to be whitespace inside tokens. String literals are
an exception, they are explicitly delimited.

From steve at  Sat May  7 04:00:11 2011
From: steve at (Steven D'Aprano)
Date: Sat, 07 May 2011 12:00:11 +1000
Subject: [Python-ideas] 1_000_000
In-Reply-To: <>
References: <>	<>	<>	<>	<>
Message-ID: <>

Bruce Leban wrote:

> Consider these examples instead:
>    - 1_234_000
>    - 9.876_543_210
>    - 0xFEFF_0042
> I'm not advocating this change (nor against it); I just think the discussion
> should be focused on the actual idea. I do have a question:
> Is _ just ignored in numbers or are there more complex rules?
>    - 1_2345_6789  (can I use groups of other sizes instead?)
>    - 1_2_3_4_5  (ditto)
>    - 1_234_6789  (do all the groups need to be the same size?)

+1 on all of these. I don't particularly like the look of _ as a number 
separator, but it's hard to think of any alternatives other than space, 
and some separator is better than long sequences of digits.

I'm -0.5 on spaces even though it looks MUCH better, because it's too 
easy to leave the commas out in lists etc:

L = [1, 2, 3, 4 5, 6, 7, 8, 9, 10]  # oops, wanted 4 & 5 not 45

(Admittedly if the items were strings, the same failure mode applies.)
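
A small sketch of that failure mode as it stands today: a dropped comma between string items merges them silently, while the same slip between integers is caught by the compiler.

```python
# A missing comma between string items silently merges them.
names = ['four' 'five', 'six']
print(len(names), names)

# The same slip between numbers is currently rejected at compile time.
try:
    compile('L = [1, 2, 3, 4 5, 6]', '<example>', 'exec')
except SyntaxError:
    caught = True
else:
    caught = False
print(caught)
```

Allowing spaces inside numeric literals would move the second case into the same silent category as the first.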

>    - 1_   (must the _ only be in between 2 digits?)
>    - 1__234   (what about multiple _s?)

-1 on allowing either _1 or 1_ as numbers.

-0 on allowing doubled underscores.

>    - 9.876_543_210   (can it be used to the right of the decimal point?)
>    - 0xFEFF_0042   (can it be used in hex, octal or binary numbers?)

+1 on these two.

>    - int('123_456')   (do other functions accept this syntax too?)

That's a tricky one... I'd say No, but I'm not entirely sure. It's easy 
enough to say:

int('123_456'.replace('_', ''))

albeit a tad verbose. Also easy to say:

int('123' '456')

which is less verbose. And it will change the behaviour of the int 
function. So I don't think we need to support separators inside strings.

We can always change our mind later and add it in, but it's much harder 
to take it out later.


From steve at  Sat May  7 04:00:43 2011
From: steve at (Steven D'Aprano)
Date: Sat, 07 May 2011 12:00:43 +1000
Subject: [Python-ideas] 1_000_000
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <>

Ethan Furman wrote:

> I see it as a readability issue -- if you have 1_024 and _1025 (etc, 
> etc), where one is a number and the other a name, confusion can easily 
> result.

I don't think there will be *that* much confusion though.

_1025 can occur on the LHS of an assignment, 1_024 cannot. And we 
already distinguish between x1234 and 0x1234 without much confusion.
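
The name-versus-number distinction can be checked mechanically. A minimal sketch using str.isidentifier(), which assumes Python 3:

```python
# A name may start with an underscore or a letter, never with a digit.
for text in ('_1025', '1_024', 'x1234', '0x1234'):
    print(text, text.isidentifier())

# Only the identifier can be an assignment target.
compile('_1025 = 1', '<example>', 'exec')
try:
    compile('1_024 = 1', '<example>', 'exec')
except SyntaxError:
    number_as_target_rejected = True
else:
    number_as_target_rejected = False
print(number_as_target_rejected)
```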


From guido at  Sat May  7 05:45:18 2011
From: guido at (Guido van Rossum)
Date: Fri, 6 May 2011 20:45:18 -0700
Subject: [Python-ideas] 1_000_000
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, May 6, 2011 at 7:00 PM, Steven D'Aprano <steve at> wrote:
> Bruce Leban wrote:
>> Consider these examples instead:
>>   - 1_234_000
>>   - 9.876_543_210
>>   - 0xFEFF_0042
>> I'm not advocating this change (nor against it); I just think the
>> discussion should be focused on the actual idea. I do have a question:
>> Is _ just ignored in numbers or are there more complex rules?
>>   - 1_2345_6789  (can I use groups of other sizes instead?)
>>   - 1_2_3_4_5  (ditto)
>>   - 1_234_6789  (do all the groups need to be the same size?)
> +1 on all of these. I don't particularly like the look of _ as a number
> separator, but it's hard to think of any alternatives other than space, and
> some separator is better than long sequences of digits.
> I'm -0.5 on spaces even though it looks MUCH better, because it's too easy
> to leave the commas out in lists etc:
> L = [1, 2, 3, 4 5, 6, 7, 8, 9, 10]  # oops, wanted 4 & 5 not 45
> (Admittedly if the items were strings, the same failure mode applies.)

And it does sometimes bite. So let's not do more of that. (In
retrospect 'xxx' + 'yyy' would have been good enough.)

>>   - 1_   (must the _ only be in between 2 digits?)
>>   - 1__234   (what about multiple _s?)
> -1 on allowing either _1 or 1_ as numbers.
> -0 on allowing doubled underscores.
>>   - 9.876_543_210   (can it be used to the right of the decimal point?)
>>   - 0xFEFF_0042   (can it be used in hex, octal or binary numbers?)
> +1 on these two.

Steven channels me well so far.

Fine points about _ in floats: IMO the _ should be allowed to appear
between any two digits, or between the last digit and the 'e' in the
exponent, or between the 'e' and a following digit. But not adjacent
to the '.' or to the '+' or '-' in the exponent. So 3.141_593 yes,
3_.14 no.

Fine points about _ in bin/oct/hex literals: 0x_dead_beef yes, 0_xdeadbeef no.

(The overall rule seems to be that it must be internal to alphanumeric
strings, except that leading 0x, 0o or 0b must not be separated --
somehow I find 0_x_dead_beef would be a disservice to human readers.)
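
One way to read these fine points is as a placement check on the literal's text. A minimal sketch, assuming a helper named allowed_by_sketch (made up here) that requires an alphanumeric character on both sides of every underscore and refuses to split the 0x, 0o or 0b prefix; as a side effect it also rejects doubled underscores:

```python
import re

# An underscore right after the leading 0 would split the radix prefix.
_SPLIT_PREFIX = re.compile(r'^0_[xXoObB]')
# An underscore is misplaced unless alphanumerics sit on both sides.
_MISPLACED = re.compile(r'(?<![0-9A-Za-z])_|_(?![0-9A-Za-z])')


def allowed_by_sketch(literal):
    """Return True if the underscores follow the placement rule above."""
    if _SPLIT_PREFIX.match(literal):
        return False
    return _MISPLACED.search(literal) is None


for text in ('3.141_593', '3_.14', '0x_dead_beef', '0_xdeadbeef',
             '0_x_dead_beef', '1_e5', '1e_5', '1e+_5'):
    print(text, allowed_by_sketch(text))
```

The check looks only at underscore placement and does not verify that the text is otherwise a valid number.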

>>   - int('123_456')   (do other functions accept this syntax too?)
> That's a tricky one... I'd say No, but I'm not entirely sure. It's easy
> enough to say:
> int('123_456'.replace('_', ''))
> albeit a tad verbose. Also easy to say:
> int('123' '456')
> which is less verbose.

But that's not how it'll be used. The argument will be provided by the
user of the code.

> And it will change the behaviour of the int function.
> So I don't think we need to support separators inside strings.

I think it's fine, the same reason why we want to write 1_234_567 in
code sometimes applies to input or command line arguments too, and I
see little harm.

> We can always change our mind later and add it in, but it's much harder to
> take it out later.

It seems entirely harmless here. Also for float().

It would also be nice to have an easy way to emit _ in suitable
places. Maybe this could be added to the .format() language for
numbers? It would be nice if you could tell it to emit an _ every N

--Guido van Rossum (
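
The output side can be sketched as a plain function until a format option exists. The helper below, group_digits, is a made-up name and handles integers only:

```python
def group_digits(value, every=3, sep='_'):
    """Return str(value) with `sep` inserted every `every` digits.

    Hypothetical helper for integers; grouping starts from the right.
    """
    sign = '-' if value < 0 else ''
    digits = str(abs(value))
    groups = []
    # Peel groups off the right so the leftmost group is the short one.
    while digits:
        groups.append(digits[-every:])
        digits = digits[:-every]
    return sign + sep.join(reversed(groups))


print(group_digits(1234567))
print(group_digits(123456789, every=4))
print(group_digits(-1000))
```

Keeping the group size a parameter covers both thousands-style grouping and the four-digit grouping that suits hex or binary output.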

From cs at  Sat May  7 06:29:11 2011
From: cs at (Cameron Simpson)
Date: Sat, 7 May 2011 14:29:11 +1000
Subject: [Python-ideas] 1_000_000
In-Reply-To: <>
References: <>
Message-ID: <>

On 06May2011 19:55, Calvin Spealman <ironfroggy at> wrote:
| I am +0 on the whole idea, but +0.5 if is not an underscore, which I
| think is ugly.

I think the underscore is one of the better choices:

  - it is very visible, unlike a dot or comma

  - it is "low" or "flat", not intruding into the glyph space of the
    digits, leaving things easy to read

  - it is already widely used (Perl (sorry), Ada (where I first
    encountered it, now that someone else has mentioned it), etc.),
    i.e. it is a pre-existing idiom with successful use

| Would it conflict with any other syntax rules if
| numbers allowed a space separator?
| for i in range(1 111 111):
|     foo(i)
| It looks cleaner and in a fixed-font should be just as obvious about
| separator placement.

I'm very -1 on this one. Like another recent proposal, it takes a common
typing error and turns it into legal syntax. Code that once would fail
to compile because the author dropped a comma between values now runs,
with silent breakage (the new stuff isn't even the wrong type!).

Cameron Simpson <cs at> DoD#743

It's there as a sop to former Ada programmers.  :-)
        - Larry Wall regarding 10_000_000 in <11556 at jpl-devvax.JPL.NASA.GOV>

From cs at  Sat May  7 06:30:09 2011
From: cs at (Cameron Simpson)
Date: Sat, 7 May 2011 14:30:09 +1000
Subject: [Python-ideas] 1_000_000
In-Reply-To: <>
References: <>
Message-ID: <>

On 07May2011 00:41, MRAB <python at> wrote:
| As far as I remember, Ada also permits it,

That's where I first encountered it myself.

| but has the rule that it can
| occur only between digits. If we follow that, then:
|     1_2345_6789 => Yes
|     1_2_3_4_5 => Yes
|     1_234_6789 => Yes
|     1_ => No
|     _1 => No
|     1__234 => No
|     9.876_543_210 => Yes
|     9._876_543_210 => No
|     9_.876_543_210 => No
|     0xFEFF_0042 => Yes
|     int('123_456') => Yes

+1 to this.

Cameron Simpson <cs at> DoD#743

It is impossible to travel faster than light, and certainly not desirable as
ones hat keeps blowing off.     - Woody Allen

From ben+python at  Sat May  7 07:03:42 2011
From: ben+python at (Ben Finney)
Date: Sat, 07 May 2011 15:03:42 +1000
Subject: [Python-ideas] 1 246 358
References: <> <>
Message-ID: <>

MRAB <python at> writes:

> On 07/05/2011 01:44, Ben Finney wrote:
> > I would prefer to allow space between digits in a numeric literal.

> > This nicely parallels the fact that space can separate chunks of a
> > string literal.

> I prefer there not to be whitespace inside tokens. String literals are
> an exception, they are explicitly delimited.

That's a good justification for the special case. Okay, I withdraw my

 \     "Facts are stubborn things; and whatever may be our wishes, our |
  `\   inclinations, or the dictates of our passion, they cannot alter |
_o__)       the state of facts and evidence." --John Adams, 1770-12-04 |
Ben Finney

From lac at  Sat May  7 07:05:37 2011
From: lac at (Laura Creighton)
Date: Sat, 07 May 2011 07:05:37 +0200
Subject: [Python-ideas] 1_000_000
In-Reply-To: Message from MRAB <> of "Sat,
	07 May 2011 00:41:33 BST." <> 
References: <>
Message-ID: <>

If you disallow variable names of the form _<some number> you will break a 
huge amount of my automatically generated code.  Admittedly, it wouldn't
be hard to change things so that the generated variables are now
X<some number> instead, but that happens to be the way I have written
it now.


From greg.ewing at  Sat May  7 09:29:47 2011
From: greg.ewing at (Greg Ewing)
Date: Sat, 07 May 2011 19:29:47 +1200
Subject: [Python-ideas] 1_000_000
In-Reply-To: <>
References: <>
Message-ID: <>

Ethan Furman wrote:

> So you use _8127 style names for your objects* then?

I can easily imagine a code generator producing names
like that to reduce the chance of collision with a
user's names.


From greg.ewing at  Sat May  7 09:36:07 2011
From: greg.ewing at (Greg Ewing)
Date: Sat, 07 May 2011 19:36:07 +1200
Subject: [Python-ideas] 1_000_000
In-Reply-To: <>
References: <>
Message-ID: <>

Fred Drake wrote:

> I understand the motivation for a thousands separator, at least (though
> I'll admit, I don't find it compelling; *all* big numbers in code are
> too magical).

Bigness is a relative concept. Avogadro's number is fairly
big in absolute terms, but you can hold that many molecules
in your hand quite easily.

Although writing it as 6_020_000_000_000_000_000_000_000_000
probably wouldn't be very helpful.


From greg.ewing at  Sat May  7 09:41:43 2011
From: greg.ewing at (Greg Ewing)
Date: Sat, 07 May 2011 19:41:43 +1200
Subject: [Python-ideas] 1_000_000
In-Reply-To: <>
References: <>
Message-ID: <>

Ethan Furman wrote:

> I see it as a readability issue -- if you have 1_024 and _1025 (etc, 
> etc), where one is a number and the other a name, confusion can easily 
> result.

But probably not much worse than the confusion you can get
today between 1234e6 and _1234e6, or O000001 and 0000001.
There will always be ways of creating confusing-looking
code if you put your mind to it. :-)


From greg.ewing at  Sat May  7 09:46:57 2011
From: greg.ewing at (Greg Ewing)
Date: Sat, 07 May 2011 19:46:57 +1200
Subject: [Python-ideas] 1_000_000
In-Reply-To: <>
References: <>
Message-ID: <>

Bruce Leban wrote:
> I'm opposed to changing int so that int('123_456') ignores the _ as that 
> will change the behavior of existing code and could break apps.

But int('123_456', 0) should perhaps work? (On the grounds that
it parses numbers using the same syntax as Python source.)


From greg.ewing at  Sat May  7 09:51:35 2011
From: greg.ewing at (Greg Ewing)
Date: Sat, 07 May 2011 19:51:35 +1200
Subject: [Python-ideas] 1_000_000
In-Reply-To: <>
References: <>
Message-ID: <>

Philip Jenvey wrote:

> int x4 = 5_______2;        // OK (decimal literal)

Hmmm, that one looks really weird -- maybe it should be
disallowed as well?


From steve at  Sat May  7 10:18:22 2011
From: steve at (Steven D'Aprano)
Date: Sat, 07 May 2011 18:18:22 +1000
Subject: [Python-ideas] 1_000_000
In-Reply-To: <>
References: <>	<>	<>	<>
Message-ID: <>

Greg Ewing wrote:
> Philip Jenvey wrote:
>> int x4 = 5_______2;        // OK (decimal literal)
> Hmmm, that one looks really weird -- maybe it should be
> disallowed as well?

I don't think we need disallow it merely over an aesthetic judgement 
(although it does look weird *grins*). There is precedent with 
separators in collections:

 >>> t = (1,,,,2)
   File "<stdin>", line 1
     t = (1,,,,2)
SyntaxError: invalid syntax

Like consecutive commas, consecutive underscores are likely to indicate 
a typo rather than a deliberate decision. So I'm +1 on strictly 
enforcing a single underscore between digits.


From greg.ewing at  Sat May  7 10:27:14 2011
From: greg.ewing at (Greg Ewing)
Date: Sat, 07 May 2011 20:27:14 +1200
Subject: [Python-ideas] 1_000_000
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

Steven D'Aprano wrote:

> Like consecutive commas, consecutive underscores are likely to indicate 
> a typo rather than a deliberate decision.

Well, yes, that's really the rationale I had in mind.

Although it would provide an amusingly funky way of
introducing dividing line comments into your code:

class A:
     ...
     ...
     ...

0____________________________________0

class B:

You could even decorate it with scissors for a bit
more panache:

0_____8<0_____8<0_____8<0_____8<0_____0

From p.f.moore at  Sat May  7 10:58:56 2011
From: p.f.moore at (Paul Moore)
Date: Sat, 7 May 2011 09:58:56 +0100
Subject: [Python-ideas] Allow 'import star' with namespaces
In-Reply-To: <>
References: <>
Message-ID: <>

On 7 May 2011 01:38, Guido van Rossum <guido at> wrote:
> The point is that the pkg should use __all__ to declare what submodules
> exist. That's what it was invented for!

Hmm, OK. I missed that. But how would that work?


p1/
__all__ = ['p2', 'foo']
def foo(): print ""

p1/p2/
__all__ = ['foo']
def foo(): print ""

If I import p1, p1.__all__ shows me that p2 and foo are public.
p1.foo exists and I can tell it's not a module. p1.p2 doesn't exist in the p1
namespace at the moment, so how do I tell that I need to import it?
Just assume all nonexistent names are subpackages, and import them?
That doesn't seem like a very robust approach.

A proof of concept in the form of a Python implementation (as a
function) would help me understand, I guess. (But I still doubt that
even if it's implementable, the feature is much practical use...)


From dirkjan at  Sat May  7 14:16:48 2011
From: dirkjan at (Dirkjan Ochtman)
Date: Sat, 7 May 2011 14:16:48 +0200
Subject: [Python-ideas] thoughts on regular expression improvements
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, May 6, 2011 at 22:32, Bill Janssen <janssen at> wrote:
> Ah, you mean the PyPI "regex".  Looks like it has "branch reset", which
> might support my #1?  Using the same group name multiple times?
> I don't see fuzzy matches, or support for composition, though.

I might've been more specific: I think MRAB is working on regex as a
playground for new regex-module things (and potentially a replacement
for stdlib re), so it might be a good place to implement these kinds
of things or discuss them.



From guido at  Sat May  7 16:41:55 2011
From: guido at (Guido van Rossum)
Date: Sat, 7 May 2011 07:41:55 -0700
Subject: [Python-ideas] Allow 'import star' with namespaces
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, May 7, 2011 at 1:58 AM, Paul Moore <p.f.moore at> wrote:
> On 7 May 2011 01:38, Guido van Rossum <guido at> wrote:
>> The point is that the pkg should use __all__ to declare what submodules
>> exist. That's what it was invented for!
> Hmm, OK. I missed that. But how would that work?
> p1/
> __all__ = ['p2', 'foo']
> def foo(): print ""
> p1/p2/
> __all__ = ['foo']
> def foo(): print ""
> If I import p1, p1.__all__ shows me that p2 and foo are public.
> exists and I can tell it's not a module. p1.p2 doesn't exist in the p1
> namespace at the moment, so how do I tell that I need to import it?
> Just assume all nonexistent names are subpackages, and import them?
> That doesn't seem like a very robust approach.

Do whatever "from pkg import *" does today.

Though the recursive application is new. I think (if we do this) it
should be recursive. The implementation is straightforward, though the
consequences may not be (think cyclic imports).

> A proof of concept in the form of a Python implementation (as a
> function) would help me understand, I guess. (But I still doubt that
> even if it's implementable, the feature is much practical use...)

It deviates from "import what you use" for sure. OTOH it is a better
alternative to "from pkg import *" because it does not pollute the
namespace. I believe Java users are used to this.

--Guido van Rossum (

From python at  Sat May  7 17:32:53 2011
From: python at (MRAB)
Date: Sat, 07 May 2011 16:32:53 +0100
Subject: [Python-ideas] 1_000_000
In-Reply-To: <>
References: <>	<>	<>	<>
Message-ID: <>

On 07/05/2011 08:46, Greg Ewing wrote:
> Bruce Leban wrote:
>> I'm opposed to changing int so that int('123_456') ignores the _ as
>> that will change the behavior of existing code and could break apps.
> But int('123_456', 0) should perhaps work? (On the grounds that
> it parses numbers using the same syntax as Python source.)
There's also the argument that if you forbid it then the programmer
may have to write:

     int(string.replace("_", ""))

in order to let the user include underscores, which would make it too
permissive. If the user entered "_10", the above code would accept it.
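
The concern can be sketched as follows. parse_permissive mirrors the one-liner above; parse_strict is a hypothetical helper, introduced only for this illustration, that accepts an underscore only between digits.

```python
import re


def parse_permissive(text):
    # Strips every underscore before converting, so malformed input
    # such as "_10" or "1__0" is silently accepted.
    return int(text.replace("_", ""))


# Assumed rule for this sketch: groups of digits joined by single underscores.
_STRICT = re.compile(r"[0-9]+(?:_[0-9]+)*")


def parse_strict(text):
    # Hypothetical helper: validate first, then strip and convert.
    if not _STRICT.fullmatch(text):
        raise ValueError("invalid integer literal: %r" % (text,))
    return int(text.replace("_", ""))


print(parse_permissive("_10"))
print(parse_strict("123_456"))
```

Validating before stripping is what keeps "_10" from being accepted.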

From g.brandl at  Sat May  7 18:11:21 2011
From: g.brandl at (Georg Brandl)
Date: Sat, 07 May 2011 18:11:21 +0200
Subject: [Python-ideas] 1_000_000
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <iq3qv9$po9$>

On 07.05.2011 10:27, Greg Ewing wrote:
> Steven D'Aprano wrote:
>> Like consecutive commas, consecutive underscores are likely to indicate 
>> a typo rather than a deliberate decision.
> Well, yes, that's really the rationale I had in mind.
> Although it would provide an amusingly funky way of
> introducing dividing line comments into your code:
> class A:
>      ...
>      ...
>      ...
> 0____________________________________0



From g.brandl at  Sat May  7 18:12:06 2011
From: g.brandl at (Georg Brandl)
Date: Sat, 07 May 2011 18:12:06 +0200
Subject: [Python-ideas] Allow 'import star' with namespaces
In-Reply-To: <>
References: <>
Message-ID: <iq3r0m$po9$>

On 06.05.2011 21:49, Brendan Moloney wrote:
> dag.odenhall at wrote:
>> I like this idea, except it's inconsistent with from-import-star, the 
>> latter which does *not* get you sub-packages or modules.
> Georg Brandl [g.brandl at] wrote:
>> And that's for a reason: it's not easy (I think it's even impossible,
>> because for example individual submodules can change __path__) to determine
>> all importable submodules of a package.
>> So ``import pkg.*`` would not have any behavior other than ``import pkg``.
> When I said all _public_ sub-packages and modules I was referring to those
> listed in the  __all__ attribute of 'pkg'.  Thus it would behave in the exact
> same way as from-import-star except you don't pollute the current namespace.

Right -- I forgot about __all__.


From dholth at  Sat May  7 19:15:07 2011
From: dholth at (Daniel Holth)
Date: Sat, 7 May 2011 13:15:07 -0400
Subject: [Python-ideas] AttributeError: __exit__
In-Reply-To: <>
References: <>
Message-ID: <>

OK. I will reopen the related bug that was immediately closed with a
suggestion to check with the python-ideas mailing list.

From dholth at  Sat May  7 19:49:58 2011
From: dholth at (Daniel Holth)
Date: Sat, 7 May 2011 13:49:58 -0400
Subject: [Python-ideas] proposal: module-level __init__
Message-ID: <>

__all__ is very useful when doing import *, which is frowned upon. As an
alternative, allow modules to contain a function called __init__ that
defines that module's exported symbols by way of the global statement. By
importing modules that are used, but not intended to be exported, inside the
__init__ function, programmers avoid cases such as the unintentional
'somemodule.sys' (referring to a module by its non-canonical name) that
makes it harder to refactor larger projects.


Before:

__all__ = ['a', 'b']
import sys
def a(): pass
def b(): pass
def c(): pass

After:

def __init__():
    global a, b
    import sys
    def a(): pass
    def b(): pass
    def c(): pass

__init__()


From mal at  Sat May  7 19:56:50 2011
From: mal at (M.-A. Lemburg)
Date: Sat, 07 May 2011 19:56:50 +0200
Subject: [Python-ideas] proposal: module-level __init__
In-Reply-To: <>
References: <>
Message-ID: <>

Daniel Holth wrote:
> __all__ is very useful when doing import *, which is frowned upon. As an
> alternative, allow modules to contain a function called __init__ that
> defines that module's exported symbols by way of the global statement. By
> importing modules that are used, but not intended to be exported, inside the
> __init__ function, programmers avoid cases such as the unintentional
> 'somemodule.sys' (referring to a module by its non-canonical name) that
> makes it harder to refactor larger projects.
> Before:
> __all__ = ['a', 'b']
> import sys
> def a(): pass
> def b(): pass
> def c(): pass
> After:
> def __init__():
>     global a, b
>     import sys
>     def a(): pass
>     def b(): pass
>     def c(): pass
> __init__()

This is already possible and used in modules where you don't
want to clutter up the global namespace. Where's the novelty ?
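
A minimal sketch of that existing behaviour, assuming the module body is the "After" example quoted above. The module name "somemodule" and the use of types.ModuleType with exec are choices made for this illustration only; a real module would simply be imported from a file.

```python
import types

# Source of a module that exports names only through the global statement.
SOURCE = '''
def __init__():
    global a, b
    import sys          # local to __init__, never becomes somemodule.sys
    def a(): pass
    def b(): pass
    def c(): pass       # local as well, so it is not exported
__init__()
'''

mod = types.ModuleType("somemodule")
exec(SOURCE, mod.__dict__)

# Names without leading double underscores are the visible exports.
public = sorted(name for name in vars(mod) if not name.startswith("__"))
print(public)
print(hasattr(mod, "sys"), hasattr(mod, "c"))
```

Only the names declared global end up in the module namespace; sys and c stay local to the function call.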

Marc-Andre Lemburg

Professional Python Services directly from the Source  (#1, May 07 2011)
>>> Python/Zope Consulting and Support ...
>>> mxODBC.Zope.Database.Adapter ...   
>>> mxODBC, mxDateTime, mxTextTools ...
2011-06-20: EuroPython 2011, Florence, Italy               44 days to go

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::

    Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611

From fdrake at  Sat May  7 21:26:22 2011
From: fdrake at (Fred Drake)
Date: Sat, 7 May 2011 15:26:22 -0400
Subject: [Python-ideas] 1_000_000
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Sat, May 7, 2011 at 4:27 AM, Greg Ewing <greg.ewing at> wrote:
> You could even decorate it with scissors for a bit
> more panache:
> 0_____8<0_____8<0_____8<0_____8<0_____0

Heh.  Thanks for the swell tip, Martha Stewart!


Fred L. Drake, Jr.    <fdrake at>
"Give me the luxuries of life and I will willingly do without the necessities."
   --Frank Lloyd Wright

From eric at  Sat May  7 21:51:36 2011
From: eric at (Eric Smith)
Date: Sat, 07 May 2011 15:51:36 -0400
Subject: [Python-ideas] 1_000_000
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>
Message-ID: <>

On 05/06/2011 11:45 PM, Guido van Rossum wrote:

> It would also be nice to have an easy way to emit _ in suitable
> places. Maybe this could be added to the .format() language for
> numbers? It would be nice if you could tell it to emit an _ every N
> positions.

We already support commas (PEP 378). Adding underscores in the same way
would be easy. However, you can't specify N, it's always 3.
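
For reference, a short sketch of the existing comma support and of a workaround for underscores. The variable names are illustrative only.

```python
n = 1234567

# PEP 378: "," in the format specification groups digits in threes.
with_commas = format(n, ",")

# Workaround sketch: swap the separator afterwards. The group size stays 3.
with_underscores = with_commas.replace(",", "_")

print(with_commas)
print(with_underscores)
```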


From guido at  Sat May  7 23:06:12 2011
From: guido at (Guido van Rossum)
Date: Sat, 7 May 2011 14:06:12 -0700
Subject: [Python-ideas] 1_000_000
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, May 7, 2011 at 12:51 PM, Eric Smith <eric at> wrote:
> On 05/06/2011 11:45 PM, Guido van Rossum wrote:
>> It would also be nice to have an easy way to emit _ in suitable
>> places. Maybe this could be added to the .format() language for
>> numbers? It would be nice if you could tell it to emit an _ every N
>> positions.
> We already support commas (PEP 378). Adding underscores in the same way
> would be easy. However, you can't specify N, it's always 3.

Which would suck for non-decimal formats. :-( Also there seem to be
some countries where the conventions for formatting currency uses
groupings other than 1000. E.g. (though specifying
N wouldn't be enough there).
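
A minimal sketch of what "an _ every N positions" could look like as a plain helper. group_digits is a hypothetical function written for this illustration, not an existing or proposed API, and it handles only a fixed group size.

```python
def group_digits(digits, n, sep="_"):
    """Insert sep every n characters, counting from the right."""
    if n <= 0:
        raise ValueError("group size must be positive")
    # Any short group goes on the left, as with thousands separators.
    head = len(digits) % n
    parts = [digits[:head]] if head else []
    parts.extend(digits[i:i + n] for i in range(head, len(digits), n))
    return sep.join(parts)


print(group_digits(format(0xFEFF0042, "X"), 4))
print(group_digits(str(1234567), 3))
```

Conventions that mix group sizes within one number would need more than a single N, which is the limitation noted above.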

--Guido van Rossum (

From jeanpierreda at  Sun May  8 00:38:47 2011
From: jeanpierreda at (Devin Jeanpierre)
Date: Sat, 7 May 2011 18:38:47 -0400
Subject: [Python-ideas] 1_000_000
In-Reply-To: <>
References: <>
Message-ID: <>

>> On 05/06/2011 11:45 PM, Guido van Rossum wrote:
> Which would suck for non-decimal formats. :-( Also there seem to be
> some countries where the conventions for formatting currency uses
> groupings other than 1000. E.g.
> (though specifying
> N wouldn't be enough there).

Wouldn't something like that be the job of locale.currency()?

Devin Jeanpierre

From jeanpierreda at  Sun May  8 02:57:29 2011
From: jeanpierreda at (Devin Jeanpierre)
Date: Sat, 7 May 2011 20:57:29 -0400
Subject: [Python-ideas] Rename python.exe to python3.exe on Windows
Message-ID: <>


On most *nix systems, Python 3.x is available as the python3
executable, and Python 2.x as the 'python' executable. This lets both
exist side-by-side and be usable from the command-line. The
alternative (used by Arch), is to name Python 2.x 'python2', and 3.x
'python'. The Windows distribution of Python does neither, it names
them both 'python.exe', meaning that you can't install and use both at
once. Moreover, if you install Python 2.7 and then Python 3.2, the
default handler for .py files is set to Python 3.2, and changing it to
2.7 is difficult because of a quirk in Explorer that forces you to
choose between two non-distinguishable "python.exe"s. This is made
much more difficult if in fact you installed five or so different
Python versions. Also any automated tests using something like Cram
that use python3 will not work, and any batch scripts that use
python.exe will work differently depending on the host system.

(It wouldn't be awful to get python-X.Y.exe executables, either).

The downside of this is that any code that tries to use
C:\Python3Y\python.exe breaks. Such code is probably broken anyway,
there are multiple Ys around, and Python can be installed in My
Documents or wherever. PEP 397 should relieve the issues with opening
.py files, making some of this unnecessary with that change, as well.

I'm guessing that it would also be appropriate to rename pythonw.exe
to python3w.exe. I doubt that particular change matters at all, it's
solely to do with opening .pyw files, and that should be handled by
PEP 397.

I'd appreciate any thoughts or comments you might have.

Thanks for your time,
Devin Jeanpierre

From ben+python at  Sun May  8 03:21:52 2011
From: ben+python at (Ben Finney)
Date: Sun, 08 May 2011 11:21:52 +1000
Subject: [Python-ideas] Rename python.exe to python3.exe on Windows
References: <>
Message-ID: <>

Devin Jeanpierre <jeanpierreda at> writes:

> On most *nix systems, Python 3.x is available as the python3
> executable, and Python 2.x as the 'python' executable. This lets both
> exist side-by-side and be usable from the command-line.

More importantly, it ensures that programs written for older Python 2.x
will continue to run with the default "python".

If the default "python" were Python 3.x, programs expecting Python 2.x
would most likely break due to backward incompatibility. So it's best if
the "python" program invokes only Python 2.x.

 \             "To label any subject unsuitable for comedy is to admit |
  `\                                         defeat." -- Peter Sellers |
_o__)                                                                  |
Ben Finney

From steve at  Sun May  8 04:28:07 2011
From: steve at (Steven D'Aprano)
Date: Sun, 08 May 2011 12:28:07 +1000
Subject: [Python-ideas] Rename python.exe to python3.exe on Windows
In-Reply-To: <>
References: <>
Message-ID: <>

Ben Finney wrote:

> If the default "python" were Python 3.x, programs expecting Python 2.x
> would most likely break due to backward incompatibility. So it's best if
> the "python" program invokes only Python 2.x.

The first sentence is true. The second is a value judgement, not a 
statement of fact, and the people behind Arch Linux disagree with you.

I say, good on 'em.

I wish I could find the quote somebody made about Arch being the distro 
that makes Gentoo seem cautious and conservative... something about Arch 
moving forward so the Gentoo folks know which mistakes not to make?


From stephen at  Mon May  9 12:39:17 2011
From: stephen at (Stephen J. Turnbull)
Date: Mon, 09 May 2011 19:39:17 +0900
Subject: [Python-ideas] Rename python.exe to python3.exe on Windows
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

Steven D'Aprano writes:

 > I wish I could find the quote somebody made about Arch being the distro 
 > that makes Gentoo seem cautious and conservative... something about Arch 
 > moving forward so the Gentoo folks know which mistakes not to make?

The only thing history teaches us is that nobody learns from
others' history:

$ python
Python 3.1.3 (r313:86834, Feb 22 2011, 18:52:21) 
[GCC 4.3.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.

There are a couple of ebuilds that break because of this.

From ncoghlan at  Mon May  9 16:04:16 2011
From: ncoghlan at (Nick Coghlan)
Date: Tue, 10 May 2011 00:04:16 +1000
Subject: [Python-ideas] Allow 'import star' with namespaces
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, May 7, 2011 at 6:52 AM, Eric Snow <ericsnowcurrently at> wrote:

> If you have a list of the submodules you want imported then you can already
> accomplish this:
> import parent
> for mod in parent.__all_submodules__:
>     __import__("parent.{}".format(mod))

> Of course, this does not bind the submodules to the namespace of the package
> module

It actually does, as binding the submodule name in the parent package
namespace is part of the responsibility of __import__():

>>> import logging
>>> logging.handlers
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'handlers'
>>> __import__("logging.handlers")
<module 'logging' from '/usr/lib/python2.7/logging/__init__.pyc'>
>>> logging.handlers
<module 'logging.handlers' from '/usr/lib/python2.7/logging/handlers.pyc'>
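
The same behaviour can be checked in a script. This sketch also contrasts __import__ with importlib.import_module, which returns the submodule rather than the top-level package.

```python
import importlib
import logging

# __import__ with a dotted name returns the top-level package...
top = __import__("logging.handlers")

# ...while importlib.import_module returns the submodule itself.
sub = importlib.import_module("logging.handlers")

# Either way, the submodule is now bound as an attribute of its parent.
print(top is logging)
print(sub is logging.handlers)
```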

This is one of the reasons circular imports are such a pain - we
pre-bind them in sys.modules, and remove them again if the import
fails, but we don't currently do that in the parent package namespace,
so circular imports sometimes work and sometimes break depending not
only on which names are accessed but also *how* they're accessed (e.g.
in a/b/, "import a.b.c" will work, "import a.b.c; c = a.b.c" will
fail with AttributeError and "from a.b import c" will fail with
ImportError).

> I am not sure
> of the specific import mechanism with regards to name binding, but that
> would seem to be a conflict with the way imported names for submodules are
> bound.

Nope, it's basically the same as what happens automatically when the
modules are imported normally. Indeed, as near as I can tell, this
request amounts to asking for syntactic sugar that does something
roughly along the lines of:

import importlib

def _subnames(pkg_name, subnames):
    # Build fully qualified names such as "pkg.sub".
    for subname in subnames:
        yield ".".join((pkg_name, subname))

def import_all(pkg):
    try:
        pkg_all = pkg.__all__
    except AttributeError:
        pass
    else:
        names = list(_subnames(pkg.__name__, pkg_all))
        # names grows while we iterate, so nested packages are visited too.
        for name in names:
            mod = importlib.import_module(name)
            try:
                mod_all = mod.__all__
            except AttributeError:
                pass
            else:
                names.extend(_subnames(mod.__name__, mod_all))

I can see a case being made to provide that as a function in pkgutil
(or perhaps importlib itself), but I don't see any reason to give it
dedicated syntax.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From grosser.meister.morti at  Mon May  9 18:43:13 2011
From: grosser.meister.morti at (=?ISO-8859-1?Q?Mathias_Panzenb=F6ck?=)
Date: Mon, 09 May 2011 18:43:13 +0200
Subject: [Python-ideas] Rename python.exe to python3.exe on Windows
In-Reply-To: <>
References: <>	<>
Message-ID: <>

I would say in every Python installation there should be a binary with the version number attached. 
I think in most (all?) Linux distributions this is already the case. E.g. there is python2.7 and 
python3.2. There is also python2, that links to some python2.x, and python3 that links to some 
python3.x, and then there is python, that links to any of the above.

Under Linux/Mac OS X we already add a line like this to our scripts:
#!/usr/bin/env python

Or better:
#!/usr/bin/env python3

I say it should be documented that the first is deprecated and the latter form shall be used. Using 
"#!/usr/bin/env python" should mean "this script is written so that it can be run in *any* python 
version", which is pretty unrealistic. "#!/usr/bin/env python3" should mean "this script is written 
so that it can be run in any python 3.x version" and so on.

Of course there are scripts that do not use this right. They should be considered as broken and be 
fixed. (Maybe print deprecation warnings if possible?)

Now on Windows there is no #! mechanism. I think it would be worthwhile to fix this and implement a 
python-dispatcher for Windows. This would then parse the #!-line, drop the "/usr/bin/env" part (if 
it exists) and look up the right Python binary from a registry variable. I don't know if there are 
any registry variables set in a Windows Python installation that let you find the binary of a 
certain version, but I think it would be a good thing.

This way correct scripts would just work under Unix (Linux, Mac, BSD) and Windows. And under Windows 
you would not have any problems with file type associations. *.py and *.pyw files just have to be 
associated with the dispatcher. It should not matter if the dispatcher is from a Python 2.x or 
Python 3.x installation.
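
The parsing step of such a dispatcher could be sketched roughly as below. Everything here is hypothetical: shebang_command, resolve, and the installed mapping (which stands in for whatever registry lookup a real dispatcher would perform) are names invented for this illustration.

```python
def shebang_command(first_line):
    """Return the interpreter name requested by a #! line, or None."""
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    if not parts:
        return None
    # Drop a leading ".../env" so "#!/usr/bin/env python3" yields "python3".
    if parts[0].endswith("/env") and len(parts) > 1:
        parts = parts[1:]
    # Keep only the final path component, e.g. "/usr/bin/python2.7" -> "python2.7".
    return parts[0].rsplit("/", 1)[-1]


def resolve(first_line, installed):
    """Map a #! line to an executable path using a caller-supplied table."""
    command = shebang_command(first_line)
    return installed.get(command) if command else None


# Example table with made-up install locations.
installed = {
    "python2": r"C:\Python27\python.exe",
    "python3": r"C:\Python32\python.exe",
}
print(resolve("#!/usr/bin/env python3", installed))
```

A script without a recognisable #! line resolves to None here; what a dispatcher should do in that case is a separate design question.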


On 05/09/2011 12:39 PM, Stephen J. Turnbull wrote:
> Steven D'Aprano writes:
>   >  I wish I could find the quote somebody made about Arch being the distro
>   >  that makes Gentoo seem cautious and conservative... something about Arch
>   >  moving forward so the Gentoo folks know which mistakes not to make?
> The only thing history teaches us is that nobody learns from
> others' history:
> $ python
> Python 3.1.3 (r313:86834, Feb 22 2011, 18:52:21)
> [GCC 4.3.5] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> $
> There are a couple of ebuilds that break because of this.

From ncoghlan at  Mon May  9 18:55:32 2011
From: ncoghlan at (Nick Coghlan)
Date: Tue, 10 May 2011 02:55:32 +1000
Subject: [Python-ideas] Rename python.exe to python3.exe on Windows
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Tue, May 10, 2011 at 2:43 AM, Mathias Panzenböck
<grosser.meister.morti at> wrote:
> Now on Windows there is no #! mechanism. I think it would be worthwhile to
> fix this and implement a python-dispatcher for Windows. This would then
> parse the #!-line, drop the "/usr/bin/env" part (if it exists) and look up
> the right Python binary from a registry variable. I don't know if there are
> any registry variables set in a Windows Python installation that let you
> find the binary of a certain version, but I think it would be a good thing.
> This way correct scripts would just work under Unix (Linux, Mac, BSD) and
> Windows. And under Windows you would not have any problems with file type
> associations. *.py and *.pyw files just have to be associated with the
> dispatcher. It should not matter if the dispatcher is from a Python 2.x or
> Python 3.x installation.

Since this came up not all that long ago, I'll point people to PEP 394
(for the current draft recommendation regarding symlinks on *nix
systems) and PEP 397 (for proposed Windows launcher semantics).


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From guido at  Mon May  9 19:02:59 2011
From: guido at (Guido van Rossum)
Date: Mon, 9 May 2011 10:02:59 -0700
Subject: [Python-ideas] Allow 'import star' with namespaces
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, May 9, 2011 at 7:04 AM, Nick Coghlan <ncoghlan at> wrote:
> This is one of the reasons circular imports are such a pain - we
> pre-bind them in sys.modules, and remove them again if the import
> fails, but we don't currently do that in the parent package namespace,
> so circular imports sometimes work and sometime break depending not
> only on which names are accessed but also *how* they're accessed (e.g.
> in a/b/, "import a.b.c" will work, "import a.b.c; c = a.b.c" will
> fail with AttributeError and "from a.b import c" will fail with
> ImportError).

Maybe that's something we could strive to fix?

> I can see a case being made to provide that as a function in pkgutil
> (or perhaps importlib itself), but I don't see any reason to give it
> dedicated syntax.


--Guido van Rossum (

From ncoghlan at  Mon May  9 19:16:52 2011
From: ncoghlan at (Nick Coghlan)
Date: Tue, 10 May 2011 03:16:52 +1000
Subject: [Python-ideas] Allow 'import star' with namespaces
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, May 10, 2011 at 3:02 AM, Guido van Rossum <guido at> wrote:
> On Mon, May 9, 2011 at 7:04 AM, Nick Coghlan <ncoghlan at> wrote:
>> This is one of the reasons circular imports are such a pain - we
>> pre-bind them in sys.modules, and remove them again if the import
>> fails, but we don't currently do that in the parent package namespace,
>> so circular imports sometimes work and sometime break depending not
>> only on which names are accessed but also *how* they're accessed (e.g.
>> in a/b/, "import a.b.c" will work, "import a.b.c; c = a.b.c" will
>> fail with AttributeError and "from a.b import c" will fail with
>> ImportError).
> Maybe that's something we could strive to fix?

The relevant bug is still open:

My recollection is that the division of responsibility between the
core import code and PEP 302 loaders gets a little confused on this
point (although I don't recall if that's a real confusion or just an
artefact of the structure of the legacy import code).

It will hopefully be a little easier to fix once importlib takes over
from import.c and the pre-PEP 302 legacy stuff goes away.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ericsnowcurrently at  Mon May  9 20:55:20 2011
From: ericsnowcurrently at (Eric Snow)
Date: Mon, 9 May 2011 12:55:20 -0600
Subject: [Python-ideas] Allow 'import star' with namespaces
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, May 9, 2011 at 8:04 AM, Nick Coghlan <ncoghlan at> wrote:

> On Sat, May 7, 2011 at 6:52 AM, Eric Snow <ericsnowcurrently at>
> wrote:
> > If you have a list of the submodules you want imported then you can
> already
> > accomplish this:
> > import parent
> > for mod in parent.__all_submodules__:
> >     __import__("parent.{}".format(mod))
> > Of course, this does not bind the submodules to the namespace of the
> package
> > module
> It actually does, as binding the submodule name in the parent package
> namespace is part of the responsibility of __import__():
> >>> import logging
> >>> logging.handlers
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
> AttributeError: 'module' object has no attribute 'handlers'
> >>> __import__("logging.handlers")
> <module 'logging' from '/usr/lib/python2.7/logging/__init__.pyc'>
> >>> logging.handlers
> <module 'logging.handlers' from '/usr/lib/python2.7/logging/handlers.pyc'>
Well, dang it.  Not sure how I missed this before:

$ python3
>>> import temp
>>> dir(temp)
['__builtins__', '__cached__', '__doc__', '__file__', '__name__',
'__package__', '__path__']

$ python3
>>> import temp.mod
>>> dir(temp)
['__builtins__', '__cached__', '__doc__', '__file__', '__name__',
'__package__', '__path__', 'mod']

So the sub-module name binding mechanism is simply to bind the package
module and then bind the submodules to it.  However, "import temp.mod as
something_else" and "from temp import mod" don't do this, which makes sense.

> This is one of the reasons circular imports are such a pain - we
> pre-bind them in sys.modules, and remove them again if the import
> fails, but we don't currently do that in the parent package namespace,
> so circular imports sometimes work and sometime break depending not
> only on which names are accessed but also *how* they're accessed (e.g.
> in a/b/, "import a.b.c" will work, "import a.b.c; c = a.b.c" will
> fail with AttributeError and "from a.b import c" will fail with
> ImportError).
> > I am not sure
> > of the specific import mechanism with regards to name binding, but that
> > would seem to be a conflict with the way imported names for submodules
> are
> > bound.
> Nope, it's basically the same as what happens automatically when the
> modules are imported normally. Indeed, as near as I can tell, this
> request amounts to asking for syntactic sugar that does something
> roughly along the lines of:
> import importlib
> def _subnames(pkg_name, subnames):
>    for subname in subnames:
>        # str.join takes a single iterable, so pass a tuple
>        yield ".".join((pkg_name, subname))
> def import_all(pkg):
>    try:
>        pkg_all = pkg.__all__
>    except AttributeError:
>        pass
>    else:
>        names = list(_subnames(pkg.__name__, pkg_all))
>        # names grows while we iterate, so nested packages are visited too
>        for name in names:
>            mod = importlib.import_module(name)
>            try:
>                mod_all = mod.__all__
>            except AttributeError:
>                pass
>            else:
>                names.extend(_subnames(mod.__name__, mod_all))
This works as long as __all__ only contains submodule names, right?

> I can see a case being made to provide that as a function in pkgutil
> (or perhaps importlib itself), but I don't see any reason to give it
> dedicated syntax.


> Cheers,
> Nick.
> --
> Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From mwm at  Tue May 10 16:47:54 2011
From: mwm at (Mike Meyer)
Date: Tue, 10 May 2011 10:47:54 -0400
Subject: [Python-ideas] Minor tweak to PEP 8?
Message-ID: <>

PEP eight has an interesting omission in the "Code Layout" section. It
doesn't say how to indent continuation lines when code is wrapped to
comply with the line length limits. It has examples, but no textual
guides. Which means you can do a rock-stupid word wrap (with no
indentation on the continuation lines), point at the resulting mess,
and say "See? If we follow this part of the PEP, we get really ugly
code!". Mail doing just that is what prompted this suggestion.

I therefore propose adding a sentence or two to this section,
something along the lines of:

    The continuation line(s) should be indented to reflect the
    structure of the statement being continued. This should be at
    least one space beyond the first open parenthesis that is not
    closed on the continued line, if present.

Nothing hard and fast, just a requirement to use good sense and the
minimal indent resulting from doing so.

Mike Meyer <mwm at>
Independent Software developer/SCM consultant, email for more information.

O< ascii ribbon campaign - stop html mail -

From mikegraham at  Tue May 10 17:51:33 2011
From: mikegraham at (Mike Graham)
Date: Tue, 10 May 2011 11:51:33 -0400
Subject: [Python-ideas] Minor tweak to PEP 8?
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, May 10, 2011 at 10:47 AM, Mike Meyer <mwm at> wrote:
> PEP eight has an interesting omission in the "Code Layout" section. It
> doesn't say how to indent continuation lines when code is wrapped to
> comply with the line length limits. It has examples, but no textual
> guides. Which means you can do a rock-stupid word wrap (with no
> indentation on the continuation lines), point at the resulting mess,
> and say "See? If we follow this part of the PEP, we get really ugly
> code!". Mail doing just that is what prompted this suggestion.
> I therefore propose adding a sentence or two to this section,
> something along the lines of:
>    The continuation line(s) should be indented to reflect the
>    structure of the statement being continued. This should be at
>    least one space beyond the first open parenthesis that is not
>    closed on the continued line, if present.
> Nothing hard and fast, just a requirement to use good sense and the
> minimal indent resulting from doing so.
>    <mike

It is not shocking that you can be in technical compliance with PEP8
and have hideous code. PEP8 doesn't attempt to specify every existing
case nor should it, which would be long and pedantic. I'm not sure
anyone has bad enough taste for this omission to be problematic, so
I'm -0 on the proposal in general.

For this actual rule, I am -1, as I think this is too limiting.
Sometimes the indentation is too far and the best style is

     self.other_thing.some_long_method_name(
        foo,
        barMightBeSortOfLongNaturally,
        baz........

This approach also has the advantage of working well with
variable-width typefaces.

There are too many cases here that it would be silly to enumerate what
style might be best and when.


From ben+python at  Tue May 10 23:35:16 2011
From: ben+python at (Ben Finney)
Date: Wed, 11 May 2011 07:35:16 +1000
Subject: [Python-ideas] Minor tweak to PEP 8?
References: <>
Message-ID: <>

Mike Graham <mikegraham at> writes:

> For this actual rule, I am -1, as I think this is too limiting.

And often results in hideous code :-)

I'm -1 also. Please don't make the indentation of continuation lines
dependent on the content of the opening line.

> Sometimes the indentation is too far and the best style is
>      self.other_thing.some_long_method_name(
>         foo,
>         barMightBeSortOfLongNaturally,
>         baz........

I assume you meant a four-column (not three-column) additional indent.

+1 if so, this matches the indentation style I advocate for continuation
lines.

 \       "I believe in making the world safe for our children, but not |
  `\    our children's children, because I don't think children should |
_o__)                                     be having sex." -Jack Handey |
Ben Finney

From sklass at  Wed May 11 04:59:16 2011
From: sklass at (Steven Klass)
Date: Tue, 10 May 2011 19:59:16 -0700
Subject: [Python-ideas] Minor tweak to PEP 8?
In-Reply-To: <>
References: <>
Message-ID: <>

+1 for this method

    keyword = foobar)


On Tue, May 10, 2011 at 2:35 PM, Ben Finney <ben+python at> wrote:

> Mike Graham <mikegraham at> writes:
> > For this actual rule, I am -1, as I think this is too limiting.
> And often results in hideous code :-)
> I'm -1 also. Please don't make the indentation of continuation lines
> dependent on the content of the opening line.
> > Sometimes the indentation is too far and the best style is
> >
> >      self.other_thing.some_long_method_name(
> >         foo,
> >         barMightBeSortOfLongNaturally,
> >         baz........
> I assume you meant a four-column (not three-column) additional indent.
> +1 if so, this matches the indentation style I advocate for continuation
> lines.


Steven M. Klass

1 (480) 225-1112
sklass at

From cmjohnson.mailinglist at  Wed May 11 05:19:29 2011
From: cmjohnson.mailinglist at (Carl M. Johnson)
Date: Tue, 10 May 2011 17:19:29 -1000
Subject: [Python-ideas] Minor tweak to PEP 8?
In-Reply-To: <>
References: <>
Message-ID: <>

Can we all at least agree that continuation lines should always be at least
one space more indented than the parent line? So, for example, this would be
right out:

for item in items:
    modified_item = self.frobincation_with_spengulizer(

The arguments should at least line up with the o in modified, if not the f.

From ben+python at  Wed May 11 05:50:42 2011
From: ben+python at (Ben Finney)
Date: Wed, 11 May 2011 13:50:42 +1000
Subject: [Python-ideas] Minor tweak to PEP 8?
References: <>
Message-ID: <>

"Carl M. Johnson"
<cmjohnson.mailinglist at> writes:

> Can we all at least agree that continuation lines should always be at
> least one space more indented than the parent line?

At least one standard (four-column) indentation level further than the
opening line.

If you think you need to break that so your multi-line string will have
the right content, think again: use the 'textwrap.dedent' function

 \           "Drop your trousers here for best results." -dry cleaner, |
  `\                                                           Bangkok |
_o__)                                                                  |
Ben Finney

From mwm at  Wed May 11 08:00:22 2011
From: mwm at (Mike Meyer)
Date: Wed, 11 May 2011 02:00:22 -0400
Subject: [Python-ideas] Minor tweak to PEP 8?
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, 11 May 2011 13:50:42 +1000
Ben Finney <ben+python at> wrote:

> "Carl M. Johnson"
> <cmjohnson.mailinglist at> writes:
> > Can we all at least agree that continuation lines should always be at
> > least one space more indented than the parent line?
> At least one standard (four-column) indentation level further than the
> opening line.

Still overly strict. Consider:

    f(long_named_argument_one, calculated_value_two(with_arguments),
      another_argument)

The two-space indent is perfectly reasonable here, as it aligns the
first element (a function argument) with the same function's argument
above it. In some cases, a similar one-space indent is also

I stand by my second proposal (reworded):

  Continuation lines should be indented to reflect the structure of
  the code. The indentation should either align with similar elements
  or match the surrounding source.


Mike Meyer <mwm at>
Independent Software developer/SCM consultant, email for more information.

O< ascii ribbon campaign - stop html mail -

From ben+python at  Wed May 11 10:05:12 2011
From: ben+python at (Ben Finney)
Date: Wed, 11 May 2011 18:05:12 +1000
Subject: [Python-ideas] Minor tweak to PEP 8?
References: <>
Message-ID: <>

Mike Meyer <mwm at> writes:

> On Wed, 11 May 2011 13:50:42 +1000
> Ben Finney <ben+python at> wrote:
> > At least one standard (four-column) indentation level further than the
> > opening line.
> Still overly strict. Consider:
>     f(long_named_argument_one, calculated_value_two(with_arguments),
>       another_argument)
> The two-space indent is perfectly reasonable here

Maybe so; I'm not saying it's unreasonable. I'm saying it's *more*
reasonable to not have the indentation level depend on the opening line.

This generally involves breaking the opening line at a bracketing
token, such as '"""', '(', '[', etc., as Carl's suggestion showed, so
there's no parameter on that line for lining up.

Also, that function needs to be renamed to something more descriptive

 \           "Kissing a smoker is like licking an ashtray." -anonymous |
  `\                                                                   |
_o__)                                                                  |
Ben Finney

From palla74 at  Wed May 11 11:10:54 2011
From: palla74 at (Palla)
Date: Wed, 11 May 2011 11:10:54 +0200
Subject: [Python-ideas] EuroPython: Early Bird will end in 2 days!
Message-ID: <>

Hi all,

If you plan to attend, you could save quite a bit on registration
fees! Buy your ticket now!

Early bird registration ends on May 12th, Friday, 23:59:59 CEST. We'd like
to ask you to forward this post to anyone that you feel may be

We have an amazing lineup of tutorials, events and talks. We have some
excellent keynote speakers and a very complete partner program... but
early bird registration ends in 2 days!

Right now, you still get discounts on talks and tutorials so if you
plan to attend Register Now:

While you are booking, remember to have a look at the partner program
and our offer for a prepaid, data+voice+tethering SIM.

All the best,


From ncoghlan at  Wed May 11 14:21:35 2011
From: ncoghlan at (Nick Coghlan)
Date: Wed, 11 May 2011 22:21:35 +1000
Subject: [Python-ideas] Allow 'import star' with namespaces
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, May 10, 2011 at 4:55 AM, Eric Snow <ericsnowcurrently at> wrote:
> So the sub-module name binding mechanism is simply to bind the package
> module and then bind the submodules to it.  However, "import temp.mod as
> something_else" and "from temp import mod" don't do this, which makes sense.

Not quite - both of the latter options change the name binding
behaviour in the *current* module, but temp.mod will be set to the
imported module regardless. It's part of the import process, whereas
the name binding in the current module happens later (the underlying
complexity of all this is why importlib.import_module() was added to
replace direct invocation of __import__(). The latter has quite a
weird signature in order to support the various incarnations of the
import statement).
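
To make the point above concrete, here is a minimal sketch. It builds a
throwaway package on disk (the names temp_pkg_demo and mod are made up
for this illustration, they are not from the thread) and checks that
"from pkg import mod as other_name" still sets the attribute on the
parent package, even though only other_name is bound in the current
namespace.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package "temp_pkg_demo" containing one submodule "mod".
# Both names are hypothetical and exist only for this illustration.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "temp_pkg_demo")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("")
with open(os.path.join(pkg_dir, "mod.py"), "w") as f:
    f.write("VALUE = 42\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

# Importing only the package does not import or bind the submodule.
import temp_pkg_demo
bound_before = hasattr(temp_pkg_demo, "mod")

# This binds "something_else" in the current namespace, but the import
# machinery also sets temp_pkg_demo.mod as part of the import itself.
from temp_pkg_demo import mod as something_else
bound_after = hasattr(temp_pkg_demo, "mod")
same_object = (
    temp_pkg_demo.mod is something_else is sys.modules["temp_pkg_demo.mod"]
)
```

The local name and the package attribute end up referring to the same
module object that is cached in sys.modules.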


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From p.f.moore at  Wed May 11 15:27:54 2011
From: p.f.moore at (Paul Moore)
Date: Wed, 11 May 2011 14:27:54 +0100
Subject: [Python-ideas] Minor tweak to PEP 8?
In-Reply-To: <>
References: <>
Message-ID: <>

On 11 May 2011 04:19, Carl M. Johnson <cmjohnson.mailinglist at> wrote:
> Can we all at least agree that continuation lines should always be at least
> one space more indented than the parent line?

Like it or not,

    some_string = """\
Text to be
used for something
incredibly exciting!"""

is not uncommon. I know about textwrap.dedent, but having to use a
Python function call to code a literal has always made me

I'm not saying it's right or wrong, just that there are reasonable
arguments why it might be reasonable.

What's wrong with just saying that continuation lines should be
formatted as appropriate to ensure readability, and leave it at that?
I know people have various standards of readability, but I'm willing
to assume that PEP 8 is targeted at people with some level of common
sense (anyone who is arguing "letter of the law" over something daft
like the example that started the thread is clearly trolling and could
find loopholes in anything, so why bother trying to convince them?)


From guido at  Wed May 11 16:23:33 2011
From: guido at (Guido van Rossum)
Date: Wed, 11 May 2011 07:23:33 -0700
Subject: [Python-ideas] Minor tweak to PEP 8?
In-Reply-To: <>
References: <>
Message-ID: <>

At Google we use the following rule (from

Yes:  # Aligned with opening delimiter
       foo = long_function_name(var_one, var_two,
                               var_three, var_four)

      # 4-space hanging indent; nothing on first line
      foo = long_function_name(
           var_one, var_two, var_three,

No:   # Stuff on first line forbidden
      foo = long_function_name(var_one, var_two,
           var_three, var_four)

      # 2-space hanging indent forbidden
       foo = long_function_name(
        var_one, var_two, var_three,

I propose we somehow incorporate these two allowed alternatives into PEP 8.
They both serve a purpose.
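
As a runnable sketch of the two allowed layouts (assuming a monospaced
font; long_function_name and its arguments are placeholders taken from
the example above, with a made-up body so the code executes):

```python
def long_function_name(var_one, var_two, var_three, var_four):
    """Placeholder function, defined only so both layouts can run."""
    return (var_one, var_two, var_three, var_four)


# Layout 1: continuation line aligned with the first character after
# the opening delimiter.
foo = long_function_name(1, 2,
                         3, 4)

# Layout 2: 4-space hanging indent, with nothing after the opening
# delimiter on the first line.
bar = long_function_name(
    1, 2, 3,
    4)
```

Both calls are identical to the interpreter; the difference is purely
visual.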

--Guido van Rossum (

From steve at  Wed May 11 16:54:32 2011
From: steve at (Steven D'Aprano)
Date: Thu, 12 May 2011 00:54:32 +1000
Subject: [Python-ideas] Minor tweak to PEP 8?
In-Reply-To: <>
References: <>	<>	<>	<>	<>
Message-ID: <>

Paul Moore wrote:

> What's wrong with just saying that continuation lines should be
> formatted as appropriate to ensure readability, and leave it at that?

+1

I think that specifying exactly how to indent continuation lines, or 
even whether or not to indent them, is way too controlling for my 
tastes. I don't believe it makes that much difference. Like the brace 
wars, if there actually was any objective, meaningful, consistent 
benefit of one style over the others, there would be no argument about 
it. Instead, it's all subjective, vague, and far from consistent.


From mwm at  Wed May 11 18:24:38 2011
From: mwm at (Mike Meyer)
Date: Wed, 11 May 2011 12:24:38 -0400
Subject: [Python-ideas] Minor tweak to PEP 8?
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, 12 May 2011 00:54:32 +1000
Steven D'Aprano <steve at> wrote:

> Paul Moore wrote:
> > What's wrong with just saying that continuation lines should be
> > formatted as appropriate to ensure readability, and leave it at that?
> +1


This sentiment is adequately expressed by the "A Foolish Consistency
..." section. It shouldn't need repeating.

> I think that specifying exactly how to indent continuation lines, or 
> even whether or not to indent them, is way too controlling for my 
> tastes. I don't believe it makes that much difference. Like the brace 
> wars, if there actually was any objective, meaningful, consistent 
> benefit of one style over the others, there would be no argument about 
> it. Instead, it's all subjective, vague, and far from consistent.

If you don't believe it makes much difference "whether or not to indent
them", I suggest you align all continuation lines on the left hand
side of the page in code you have to maintain and then report back to
us.

As for there being no benefit for one choice over another - that's
true about almost everything in the PEP (four space indent instead of
tabs?  80-character or 79-character lines? spaces around = with
exceptions?  No spaces before/after "." and after open parens or
before close parens?  etc.). The goal is consistency. The important
thing isn't so much what we choose as that we choose something so
it'll be consistent when it doesn't make any difference.

Mike Meyer <mwm at>
Independent Software developer/SCM consultant, email for more information.

O< ascii ribbon campaign - stop html mail -

From steve at  Wed May 11 18:55:08 2011
From: steve at (Steven D'Aprano)
Date: Thu, 12 May 2011 02:55:08 +1000
Subject: [Python-ideas] Minor tweak to PEP 8?
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>
Message-ID: <>

Mike Meyer wrote:

>> I think that specifying exactly how to indent continuation lines, or 
>> even whether or not to indent them, is way too controlling for my 
>> tastes. I don't believe it makes that much difference. Like the brace 
>> wars, if there actually was any objective, meaningful, consistent 
>> benefit of one style over the others, there would be no argument about 
>> it. Instead, it's all subjective, vague, and far from consistent.
> If you don't believe it makes much difference "whether or not to indent
> them", I suggest you align all continuation lines on the left hand
> side of the page in code you have to maintain and then report back to
> us.

In general, that would be an *outdent*, rather than not indenting.

As a matter of fact, there is at least one situation where I don't 
indent continuation lines:

         if condition:
             do_something("some long piece of text, most likely"
             " but not always an error message, which uses implicit"
             " concatenation over multiple lines blah blah blah blah"
             % spam)
             # and it's perfectly maintainable, thanks for asking.

The fact that I have bare strings (with a leading space) and/or a binary 
operator is more than enough clue that the lines form a block. Indenting 
would be superfluous, and counter-productive, as it would reduce the 
space available on each line.
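
A small sketch of why this layout is legal and behaves as expected,
assuming a stand-in do_something that simply returns its argument (the
function body and the value of spam are invented for the example).
Inside brackets Python ignores the indentation of continuation lines,
and adjacent string literals are joined before the % operator is
applied, so the format covers the whole message.

```python
def do_something(text):
    """Stand-in for the real call; it just hands the text back."""
    return text


spam = "eggs"

# Continuation lines are deliberately not indented past the call.
# The two literals are concatenated first, then % formats the result.
result = do_something("some long piece of text, most likely"
" but not always an error message: %s"
% spam)
```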


From mat at  Wed May 11 18:44:04 2011
From: mat at (Matthias Lehmann)
Date: Wed, 11 May 2011 18:44:04 +0200
Subject: [Python-ideas] triple-quoted strings and indendation
Message-ID: <iqeeck$u8l$>

Hi all,

Twice in one day I read about the problems of triple-quoted strings
and indentation (once on Stack Overflow, once on this list).
Python is well known for its readability and its use of indentation to
this end. But with triple-quoted strings, nice indentation is not
possible without post-processing the resulting string.


Most often, the desired result of

 >>> some_string = """Hello
...                  World."""

is simply


instead of



What about the idea of using a string flag to indicate that the
triple-quoted string is to be trimmed? Like:

 >>> some_string = t"""Hello
...                   World."""

This would blend in with the 'u' and 'r' flags that already exist.

The triple-quoted string is trimmed to remove all whitespace up to the 
column where the first line of the string started OR all common 
whitespace of the subsequent lines, if the subsequent lines start on a 
column before the first line. The second rule makes it possible to also 

 >>> some_string = t"""Hello
...     World."""


The advantages above textwrap.dedent are:

1) textwrap.dedent only removes whitespace common to ALL lines, so to 
achieve the desired result, one has to add an additional newline

 >>> some_string = """
...     Hello
...     World."""
 >>> result = textwrap.dedent(some_string)[1:]

2) Also, it does not work, if one actually does want some common 
whitespace before all lines:

 >>> some_string = """
...                      Hello
...                      World."""
 >>> result = textwrap.dedent(some_string)[1:]

gives again


which is not what I wanted. But

 >>> some_string = t"""    Hello
...                       World."""

would give


3) And finally to quote a post from earlier today
"I know about textwrap.dedent, but having to use a
Python function call to code a literal has always made me
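
The textwrap.dedent behaviour described in points 1) and 2) can be
checked with a short sketch. The variable names are made up for the
example; only textwrap.dedent itself is real standard library API.

```python
import textwrap

# Text starting on the opening line: the first line has no leading
# whitespace, so the common margin is empty and nothing is removed.
first_line_inline = """Hello
    World."""
unchanged = textwrap.dedent(first_line_inline)

# Point 1: start the text on the next line, dedent, then slice off the
# leading newline.
on_next_line = """
    Hello
    World."""
trimmed = textwrap.dedent(on_next_line)[1:]

# Point 2: deeper common indentation is removed completely as well, so
# there is no way to keep some of it with dedent alone.
deeply_indented = """
                Hello
                World."""
also_trimmed = textwrap.dedent(deeply_indented)[1:]
```

In all three cases the result depends only on the whitespace shared by
every non-blank line, not on the column where the literal starts.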


Common indentation style for triple-quoted strings (as far as I know) is

 >>> foo = """blubber
...       bla""" (align to first quote-char)

but with this auto-trimming, it would look better to use

 >>> foo = t"""blubber
...           bar""" (align to first char after triple quotes)

The other style would still work, though - as long as one does not want
to preserve leading whitespace.

Maybe the t flag could also cause a leading and trailing newline to be 
removed, so that

 >>> foo = t"""
...        Hello
...        World.
...        """

would also result in


Maybe something like this has been proposed before - please be kind, if 
it is an old hat.


From phd at  Wed May 11 23:27:48 2011
From: phd at (Oleg Broytman)
Date: Thu, 12 May 2011 01:27:48 +0400
Subject: [Python-ideas] triple-quoted strings and indendation
In-Reply-To: <iqeeck$u8l$>
References: <iqeeck$u8l$>
Message-ID: <>

On Wed, May 11, 2011 at 06:44:04PM +0200, Matthias Lehmann wrote:
> What about the idea, to use a string-flag to indicate, that the
> triple-quoted string is to be trimmed. Like:

   PEP 295 was rejected in 2002.

     Oleg Broytman              phd at
           Programmers don't die, they just GOSUB without RETURN.

From steve at  Wed May 11 23:44:44 2011
From: steve at (Steven D'Aprano)
Date: Thu, 12 May 2011 07:44:44 +1000
Subject: [Python-ideas] Minor tweak to PEP 8?
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>
Message-ID: <>

Guido van Rossum wrote:
> At Google we use the following rule (from
> Yes:  # Aligned with opening delimiter
>        foo = long_function_name(var_one, var_two,
>                                var_three, var_four)

I cringe whenever I see that. If people are going to bother lining 
things up other than at 4-space indents, they should at least line them 
up in a visually attractive place. The delimiter should surround the 
arguments, not line up with them:

         foo = long_function_name(var_one, var_two,
                                  var_three, var_four)

although the effect may be spoiled if you're reading this in a 
non-monospaced font. This is analogous to the way that professional 
typesetters use hanging punctuation:

    "Li Europan lingues es membres del sam familie. Lor separat
     existentie es un myth. Por scientie, musica, sport etc, litot
     Europa usa li sam vocabular. Li lingues differe solmen in li
     grammatica, li pronunciation e li plu commun vocabules."

compared to:

     "Li Europan lingues es membres del sam familie. Lor separat
     existentie es un myth. Por scientie, musica, sport etc, litot
     Europa usa li sam vocabular. Li lingues differe solmen in li
     grammatica, li pronunciation e li plu commun vocabules."

On the other hand, there's a good argument for not spending the time to 
neatly line up blocks of code (other than at the usual multiples of four 
spaces), whether it is to the delimiter or not. It's the same argument 
against doing this:

fee_fi_fo_fum = "something"       # Align the equals
foo           = "something else"  # and/or the hashes.

When actively changing code lined up like that, you can easily spend 
more time aligning things than programming.

I have a hard time reconciling the advice in PEP 8 against such 
alignments with the current suggestion.


From mal at  Wed May 11 23:50:43 2011
From: mal at (M.-A. Lemburg)
Date: Wed, 11 May 2011 23:50:43 +0200
Subject: [Python-ideas] Minor tweak to PEP 8?
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>
Message-ID: <>

Steven D'Aprano wrote:
> Guido van Rossum wrote:
>> At Google we use the following rule (from
>> Yes:  # Aligned with opening delimiter
>>        foo = long_function_name(var_one, var_two,
>>                                var_three, var_four)
> I cringe whenever I see that. If people are going to bother lining
> things up other than at 4-space indents, they should at least line them
> up in a visually attractive place. The delimiter should surround the
> arguments, not line up with them:
>         foo = long_function_name(var_one, var_two,
>                                  var_three, var_four)

See the link Guido posted: that's what they use. Looks like the
MUA dropped a blank or there was a tab/space issue involved.
Whitespace tends to be mysterious sometimes ;-)

Marc-Andre Lemburg


From tjreedy at  Thu May 12 00:17:40 2011
From: tjreedy at (Terry Reedy)
Date: Wed, 11 May 2011 18:17:40 -0400
Subject: [Python-ideas] triple-quoted strings and indendation
In-Reply-To: <iqeeck$u8l$>
References: <iqeeck$u8l$>
Message-ID: <iqf1u5$k3s$>

On 5/11/2011 12:44 PM, Matthias Lehmann wrote:
> Hi all,
> two times in one day I read about the problems of triple-quoted strings
> and indentation (one time on stackoverflow, one time on this list).
> Python is well known for its readability and its use of indentation to
> this end. But with triple-quoted strings, nice indentation is not
> possible without the need to post-process the resulting string.

Three partial solutions:

1. Strings are constants. Define them at the top of the module, in 
global scope. I remember seeing this promoted as a good coding practice 
once -- easy to find, modify, translate.

text = '''\
LIne 1
linklnlsf 2

and finally, we are done.
'''

I would consider this for strings, at least long strings, displayed to 
end-users, with mnemonic names.

2. For doc strings, especially for top level classes, do not worry.

def whip_up(**args):
     '''Return some delicious munchies made from inputs.
     The keyword values should be edible and preferably yummy.
     Whip_up will do the best it can with what you give it.
     '''

Having help(whip_up) print

Return some delicious munchies made from inputs.
     The keyword values should be edible and preferably yummy.
     Whip_up will do the best it can with what you give it.

is not a problem. It might even be a virtue.

3. 'It is not necessarily so bad.' I have a test function with several 
tests that compares an expected string, given as a literal, to actual 
output captured with StringIO. Since this is a test_main in the file, 
run with __name__ == '__main__', I do not want to put the strings in the 
main part of the file (1 above). At first, the following bothered me.

     expected = '''\
Line 1
Line 2
'''

I like Python's indentation! But it does not bother me so much anymore. 
IDLE colors the literals green, so they can be semi-ignored. Having the 
full screen width available can be a plus.

If I were using textwrap.dedent much, I might give it a short nickname
like 'de', which would be visible when I want to see it but ignorable when
I do not. If one wants a custom dedent rule, like the one you described,
write a custom function.
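
Before writing a custom function, it may be worth checking whether
inspect.cleandoc from the standard library is close enough. This is
only a sketch: cleandoc is not the rule proposed at the start of the
thread (it cannot keep common leading whitespace), but it does strip
the first line, remove the indentation shared by the following lines,
and drop blank lines at both ends. The function and variable names
below are reused or invented for illustration.

```python
import inspect


def whip_up(**args):
    '''Return some delicious munchies made from inputs.
    The keyword values should be edible and preferably yummy.
    '''


# Clean an indented docstring without touching the source layout.
cleaned_doc = inspect.cleandoc(whip_up.__doc__)

# The same call works on any indented literal; indentation beyond the
# common margin (two extra columns on the second line) is preserved.
expected = inspect.cleandoc('''
    Line 1
      Line 2 (indented two extra columns)
    ''')
```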

Terry Jan Reedy

From greg.ewing at  Thu May 12 00:22:27 2011
From: greg.ewing at (Greg Ewing)
Date: Thu, 12 May 2011 10:22:27 +1200
Subject: [Python-ideas] triple-quoted strings and indendation
In-Reply-To: <iqeeck$u8l$>
References: <iqeeck$u8l$>
Message-ID: <>

I have an idea of my own concerning multi-line strings.

Many of the problems of triple-quoted strings stem from
the fact that they're trying to be expressions that
sit in-line with the rest of the code. As we've seen
with all the attempts to fit multi-line function bodies
into lambdas, that doesn't really work.

So instead of a multi-line string *expression*, I think
we need a *statement*.

string advertisement:
     |         Python Egg Incubator!
     |  Hatch your eggs in half the time. Get yours
     |  today for only $39.99!


From guido at  Thu May 12 00:22:21 2011
From: guido at (Guido van Rossum)
Date: Wed, 11 May 2011 15:22:21 -0700
Subject: [Python-ideas] Minor tweak to PEP 8?
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, May 11, 2011 at 2:44 PM, Steven D'Aprano <steve at> wrote:
> Guido van Rossum wrote:
>> At Google we use the following rule (from
>> Yes:  # Aligned with opening delimiter
>>        foo = long_function_name(var_one, var_two,
>>                                var_three, var_four)
> I cringe whenever I see that. If people are going to bother lining things up
> other than at 4-space indents, they should at least line them up in a
> visually attractive place. The delimiter should surround the arguments, not
> line up with them:
>         foo = long_function_name(var_one, var_two,
>                                  var_three, var_four)
> although the effect may be spoiled if you're reading this in a
> non-monospaced font.

I used rich text in gmail and it looks aligned to me. Sorry if it
doesn't for you; as MAL said, follow the link to see how it's supposed
to look.

> This is analogous to the way that professional
> typesetters use hanging punctuation:
>   "Li Europan lingues es membres del sam familie. Lor separat
>    existentie es un myth. Por scientie, musica, sport etc, litot
>    Europa usa li sam vocabular. Li lingues differe solmen in li
>    grammatica, li pronunciation e li plu commun vocabules."
> compared to:
>    "Li Europan lingues es membres del sam familie. Lor separat
>    existentie es un myth. Por scientie, musica, sport etc, litot
>    Europa usa li sam vocabular. Li lingues differe solmen in li
>    grammatica, li pronunciation e li plu commun vocabules."
> On the other hand, there's a good argument for not spending the time to
> neatly line up blocks of code (other than at the usual multiples of four
> spaces), whether it is to the delimiter or not.

Emacs automatically does this for me. I spend zero time aligning code.

> It's the same argument
> against doing this:
> fee_fi_fo_fum = "something"       # Align the equals
> foo           = "something else"  # and/or the hashes.
> When actively changing code lined up like that, you can easily spend more
> time aligning things than programming.
> I have a hard time reconciling the advice in PEP 8 against such alignments
> with the current suggestion.

Hardly; that is about spaces *between* tokens. This is about
indentation. The amount of degradation in non-monospace fonts is quite
different. Indentation still looks indented, just not aligned with
[the first character inside] the open parenthesis, whereas internal
spaces look completely jumbled.

If PEP 8 were still mine I would add this specific rule from the Google
style guide. If people want to bikeshed it to death, go ahead, I will
probably mute the thread.

--Guido van Rossum (

From greg.ewing at  Thu May 12 00:40:45 2011
From: greg.ewing at (Greg Ewing)
Date: Thu, 12 May 2011 10:40:45 +1200
Subject: [Python-ideas] triple-quoted strings and indendation
In-Reply-To: <iqf1u5$k3s$>
References: <iqeeck$u8l$> <iqf1u5$k3s$>
Message-ID: <>

Terry Reedy wrote:
> If I were using textwrap.dedent much, I might give it a short nickname 
> like 'de'

Wild idea: make the unary + operator on strings do
textwrap.dedent() on them.


From bruce at  Thu May 12 00:57:32 2011
From: bruce at (Bruce Leban)
Date: Wed, 11 May 2011 15:57:32 -0700
Subject: [Python-ideas] triple-quoted strings and indendation
In-Reply-To: <>
References: <iqeeck$u8l$> <iqf1u5$k3s$>
Message-ID: <>

On Wed, May 11, 2011 at 3:40 PM, Greg Ewing <greg.ewing at>wrote:
> Wild idea: make the unary + operator on strings do
> textwrap.dedent() on them.
Wouldn't the unary - operator make more sense since it's removing spaces?

But I would prefer that it use a slightly friendlier form of dedent:

import textwrap

def dedent_for_literal(s):
    # Drop one leading and one trailing newline, then remove the common indent.
    if s and s[0] == '\n':
        s = s[1:]
    if s and s[-1] == '\n':
        s = s[:-1]
    return textwrap.dedent(s)

That said, is this such a wart on the language that it's worth changing?

--- Bruce

From lists at  Thu May 12 01:58:05 2011
From: lists at (Christian Heimes)
Date: Thu, 12 May 2011 01:58:05 +0200
Subject: [Python-ideas] Threading hooks and disable gc per thread
Message-ID: <>


Today I've spent several hours debugging a segfault in JCC [1]. JCC is a
framework to wrap Java code for Python. It's most prominently used in
PyLucene [2]. You can read more about my debugging in [3]

With JCC every Python thread must be registered at the JVM through JCC.
An unattached thread, that accesses a wrapped Java object, leads to
errors and may even cause a segfault. Accessing also includes garbage
collection. A code line like

   a = {}

or

   "a b c".split()

can segfault since the allocation of a dict or a bound method runs
through _PyObject_GC_New(), which may trigger a cyclic garbage
collection run. If the current thread isn't attached to the JVM but
triggers a gc.collect() with some Java objects in a cycle, the
interpreter crashes. It's quite complicated and hard to "fix" third
party tools to attach all threads created in the third party library.

The issue could be solved with a simple on_thread_start hook in the
threading module. However there is more to it. In order to free memory
threads must also be detached from the JVM, when a thread has ended. A
second on_thread_stop hook isn't enough since the bound methods may also
lead to a gc.collect() run after the thread is detached.

I propose three changes to Python in order to fix the issue:

on thread start hook

Similar to the atexit module, third party modules can register a
callable with *args and **kwargs. The functions are called inside the
newly created thread just before the target is called. The best place
for the hook list is threading.Thread._bootstrap_inner() right before
the try: except: block. Exceptions are ignored during the
call but reported to the user at the end (same as atexit's

on thread end hook

Same as on thread start hook but the callables are called inside the
dying thread after

gc.disable_thread(), gc.enable_thread(), gc.isenabled_thread()

Right now almost any code can trigger a gc.collect() run
non-deterministically. Some applications like JCC want to control
whether gc.collect() may run at the thread level. This could be solved
with a new flag in PyThreadState. PyThreadState->gc_enabled is enabled by
default. When the flag is false, _PyObject_GC_Malloc() doesn't start a
gc.collect() run for that thread. The collection is delayed until
another thread or the main thread triggers it.

The three functions should also have a C equivalent so C code can
prevent gc in a thread.
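
A rough pure-Python sketch of the start and end hooks described above,
assuming an atexit-like registration API. The names on_thread_start,
on_thread_stop and HookedThread are invented for illustration; nothing
like them exists in the threading module. The sketch only covers threads
created through the subclass, which is exactly the limitation the
proposal wants to remove for third-party libraries.

```python
import threading

_start_hooks = []  # (callable, args, kwargs), run inside each new thread
_stop_hooks = []   # same, run inside the thread just before it ends


def on_thread_start(func, *args, **kwargs):
    """Hypothetical registration function, modelled on atexit.register."""
    _start_hooks.append((func, args, kwargs))


def on_thread_stop(func, *args, **kwargs):
    _stop_hooks.append((func, args, kwargs))


def _run_hooks(hooks):
    # Exceptions do not stop the remaining hooks; they are collected instead.
    errors = []
    for func, args, kwargs in hooks:
        try:
            func(*args, **kwargs)
        except Exception as exc:
            errors.append(exc)
    return errors


class HookedThread(threading.Thread):
    def run(self):
        self.hook_errors = _run_hooks(_start_hooks)
        try:
            super().run()
        finally:
            self.hook_errors += _run_hooks(_stop_hooks)


events = []
on_thread_start(events.append, "attach")
on_thread_stop(events.append, "detach")
t = HookedThread(target=lambda: events.append("work"))
t.start()
t.join()
print(events)
```

The per-thread gc switch has no pure-Python equivalent, so it is not
sketched here.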




From ben+python at  Thu May 12 04:41:02 2011
From: ben+python at (Ben Finney)
Date: Thu, 12 May 2011 12:41:02 +1000
Subject: [Python-ideas] Minor tweak to PEP 8?
References: <>
Message-ID: <>

Guido van Rossum <guido at> writes:

> Yes:  # Aligned with opening delimiter
>       foo = long_function_name(var_one, var_two,
>                                var_three, var_four)

This is needlessly dependent on the content of the opening line; if that
changes, the rest need to change. It begs for the indentation to get
mis-aligned when other lines are edited.

>       # 4-space hanging indent; nothing on first line
>       foo = long_function_name(
>           var_one, var_two, var_three,
>           var_four)

This one doesn't have the previous problem, which is why it's what I

I would be happy to see the latter explicitly recommended in PEP 8. If
the price of that is to have the former also recommended, I'd grumble
but it would be an improvement.

> No:   # Stuff on first line forbidden
>       foo = long_function_name(var_one, var_two,
>           var_three, var_four)
>       # 2-space hanging indent forbidden
>       foo = long_function_name(
>         var_one, var_two, var_three,
>         var_four)

I agree with pointing to both of these as bad examples.

 \      "People demand freedom of speech to make up for the freedom of |
  `\   thought which they avoid." -Soren Aabye Kierkegaard (1813-1855) |
_o__)                                                                  |
Ben Finney

From mat at  Thu May 12 09:24:53 2011
From: mat at (Matthias Lehmann)
Date: Thu, 12 May 2011 09:24:53 +0200
Subject: [Python-ideas] triple-quoted strings and indendation
In-Reply-To: <>
References: <iqeeck$u8l$>
Message-ID: <iqg206$1s2$>

>     PEP 295 was rejected in 2002.
> Oleg.
Oh, thanks for the link, I was almost sure that something like that was 
proposed before - sorry I didn't thoroughly search the PEPs beforehand.

I still think that indentation of triple-quoted strings is a wart of the 
language - a small one, but still a wart. But it's been discussed and 
rejected before - and probably with good reasons.


From mat at  Thu May 12 09:39:43 2011
From: mat at (Matthias Lehmann)
Date: Thu, 12 May 2011 09:39:43 +0200
Subject: [Python-ideas] triple-quoted strings and indendation
In-Reply-To: <>
References: <iqeeck$u8l$> <iqf1u5$k3s$>
Message-ID: <iqg2rv$62h$>

> Wild idea: make the unary + operator on strings do
> textwrap.dedent() on them.
The disadvantage compared to a string flag is that this unary operator 
has no knowledge of the current indentation level within the code - so 
this solution looks similar in code

x = +"""  foo


x = t"""  foo

the result is different, though.
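
To illustrate why anything applied to the finished string object cannot
know where the literal started, here is a small sketch using the
existing textwrap.dedent (the t""" prefix is only the proposal and is
not shown). The function name example is invented for the demonstration.

```python
import textwrap


def example():
    # The literal begins partway along this source line, but that column
    # is not recorded anywhere in the resulting string object.
    x = """  foo
        bar"""
    return x


raw = example()
dedented = textwrap.dedent(raw)
print(repr(raw))
print(repr(dedented))
```

dedent() only sees the two leading spaces of the first line as the
common margin, so the second line keeps most of its source indentation.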




From phd at  Thu May 12 12:15:57 2011
From: phd at (Oleg Broytman)
Date: Thu, 12 May 2011 14:15:57 +0400
Subject: [Python-ideas] triple-quoted strings and indendation
In-Reply-To: <iqg206$1s2$>
References: <iqeeck$u8l$>
Message-ID: <>

On Thu, May 12, 2011 at 09:24:53AM +0200, Matthias Lehmann wrote:
> >    PEP 295 was rejected in 2002.

> Oh, thanks for the link, I was almost sure that something like that
> was proposed before - sorry I didn't thoroughly search the PEPs
> beforehand.
> I still think that indentation of triple-quoted strings is a wart of
> the language - a small one, but still a wart. But it's been
> discussed and rejected before - and probably with good reasons.

   My opinion is:
-- I don't think it's a wart;
-- If it's a wart it's quite small;
-- It's very easy to fix by calling dedent();
-- Fixing it by changing the language means to change the language for
   very little gain; changing the language must not be done lightly.

     Oleg Broytman              phd at
           Programmers don't die, they just GOSUB without RETURN.

From p.f.moore at  Thu May 12 12:18:03 2011
From: p.f.moore at (Paul Moore)
Date: Thu, 12 May 2011 11:18:03 +0100
Subject: [Python-ideas] triple-quoted strings and indendation
In-Reply-To: <iqeeck$u8l$>
References: <iqeeck$u8l$>
Message-ID: <>

On 11 May 2011 17:44, Matthias Lehmann <mat at> wrote:
> 3) And finally to quote a post from earlier today
> "I know about textwrap.dedent, but having to use a
> Python function call to code a literal has always made me
> uncomfortable."

As the writer of that comment, I'd like to add a -1 to this proposal :-)

My intent was to point out that I'm willing to have indentation
oddities rather than use dedent. In my view, the problem isn't
important enough to warrant extra syntax.

Sorry, :-)

From mat at  Thu May 12 12:32:49 2011
From: mat at (Matthias Lehmann)
Date: Thu, 12 May 2011 12:32:49 +0200
Subject: [Python-ideas] triple-quoted strings and indendation
In-Reply-To: <>
References: <iqeeck$u8l$>
Message-ID: <iqgd0h$uiu$>

On 12.05.2011 12:18, Paul Moore wrote:
> On 11 May 2011 17:44, Matthias Lehmann<mat at>  wrote:
>> 3) And finally to quote a post from earlier today
>> "I know about textwrap.dedent, but having to use a
>> Python function call to code a literal has always made me
>> uncomfortable."
> As the writer of that comment, I'd like to add a -1 to this proposal :-)
> My intent was to point out that I'm willing to have indentation
> oddities rather than use dedent. In my view, the problem isn't
> important enough to warrant extra syntax.
> Sorry, :-)
> Paul.

I didn't mean to misuse your comment - I hope this is not your perception.

Thanks for your feedback.


From amaramrahul at  Thu May 12 15:44:00 2011
From: amaramrahul at (Rahul Amaram)
Date: Thu, 12 May 2011 19:14:00 +0530
Subject: [Python-ideas] Suggestion for Style Guide for Python Code PEP 8
Message-ID: <>

I was wondering if the following programming recommendation would be 
added to the Style Guide for Python Code (PEP 8) page.

The preferred way for checking if a key (k) exists in a dictionary (d) 
is "if k in d". This is faster than "if k in d.keys()" and this has 
superseded "d.has_key(k)"
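
A short illustration of the idiom, with made-up keys and values:

```python
d = {'spam': 1, 'eggs': 2}

found = 'spam' in d        # membership test on the dict itself
missing = 'ham' in d       # no exception for an absent key
fallback = d.get('ham', 0) # lookup with a default when the key may be absent

print(found, missing, fallback)
```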


From p.f.moore at  Thu May 12 15:54:07 2011
From: p.f.moore at (Paul Moore)
Date: Thu, 12 May 2011 14:54:07 +0100
Subject: [Python-ideas] triple-quoted strings and indendation
In-Reply-To: <iqgd0h$uiu$>
References: <iqeeck$u8l$>
Message-ID: <>

On 12 May 2011 11:32, Matthias Lehmann <mat at> wrote:
> I didn't mean to misuse your comment - I hope this is not your perception.

Not at all. I understood your message, just wanted to clarify the
thinking behind my original statement. Your quote was entirely fair.

From solipsis at  Thu May 12 19:26:45 2011
From: solipsis at (Antoine Pitrou)
Date: Thu, 12 May 2011 19:26:45 +0200
Subject: [Python-ideas] PEP-3151 pattern-matching
References: <>
Message-ID: <>

On Fri, 08 Apr 2011 10:59:16 +0200
"M.-A. Lemburg" <mal at> wrote:
> > 
> > I think EnvironmentError, WindowsError, VMSError, OSError, mmap.error
> > and select.error should definitely all be merged with IOError, as they
> > aren't used consistently enough to make handling them differently
> > reliable even in current code.
> Their use may be inconsistent in a few places, but those cases
> are still well-defined by the implementation, so code relying
> on that well-defined behavior will break in subtle ways.

Another quirk occurred to me today: select.error doesn't derive from
EnvironmentError, and so it doesn't have the errno attribute (even
though the select module "correctly" instantiates it with a (errno,
message) tuple). Also, its str() is borked:

>>> e = select.error(4, "interrupted")
>>> str(e)
"(4, 'interrupted')"
>>> raise e
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
select.error: (4, 'interrupted')



From mal at  Thu May 12 21:06:18 2011
From: mal at (M.-A. Lemburg)
Date: Thu, 12 May 2011 21:06:18 +0200
Subject: [Python-ideas] PEP-3151 pattern-matching
In-Reply-To: <>
References: <>	<>	<>	<>
Message-ID: <>

Antoine Pitrou wrote:
> On Fri, 08 Apr 2011 10:59:16 +0200
> "M.-A. Lemburg" <mal at> wrote:
>>> I think EnvironmentError, WindowsError, VMSError, OSError, mmap.error
>>> and select.error should definitely all be merged with IOError, as they
>>> aren't used consistently enough to make handling them differently
>>> reliable even in current code.
>> Their use may be inconsistent in a few places, but those cases
>> are still well-defined by the implementation, so code relying
>> on that well-defined behavior will break in subtle ways.
> Another quirk occurred to me today: select.error doesn't derive from
> EnvironmentError, and so it doesn't have the errno attribute (even
> though the select module "correctly" instantiates it with a (errno,
> message) tuple). Also, its str() is borked:
>>>> e = select.error(4, "interrupted")
>>>> str(e)
> "(4, 'interrupted')"
>>>> raise e
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> select.error: (4, 'interrupted')

Works fine in Python 2.7:

>>> import select
>>> e = select.error(4, "intr")
>>> e
error(4, 'intr')
>>> try:
...  raise e
... except select.error, x:
...  code, text = x
...  print code,text
4 intr

Note that existing code will not look for an attribute that
doesn't exist :-) It'll unwrap the tuple and work from there
or use the .args attribute to get at the constructor args.

Marc-Andre Lemburg


From solipsis at  Thu May 12 21:22:11 2011
From: solipsis at (Antoine Pitrou)
Date: Thu, 12 May 2011 21:22:11 +0200
Subject: [Python-ideas] PEP-3151 pattern-matching
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <1305228131.3548.4.camel@localhost.localdomain>

On Thursday 12 May 2011 at 21:06 +0200, M.-A. Lemburg wrote:
> Note that existing code will not look for an attribute that
> doesn't exist :-)

True. My point is that not having "errno" makes it even more obscure how
to check for different kinds of select errors. Also, given that other
"environmental" errors will have an "errno" giving the POSIX error code,
it's easy to get surprised.
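
For readers on a later Python: the sketch below assumes Python 3.3 or
newer, where PEP 3151 was implemented and select.error became an alias
of OSError. It is not how the 2.7 and 3.2 interpreters discussed in
this thread behave.

```python
import errno
import select

# Assumes Python 3.3+, where select.error is OSError itself.
e = select.error(errno.EINTR, "interrupted")

code = e.errno      # the attribute that was missing on the old select.error
message = str(e)    # formatted like other OS-level errors

print(code, message)
```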



From g.brandl at  Thu May 12 22:12:47 2011
From: g.brandl at (Georg Brandl)
Date: Thu, 12 May 2011 22:12:47 +0200
Subject: [Python-ideas] Suggestion for Style Guide for Python Code PEP 8
In-Reply-To: <>
References: <>
Message-ID: <iqhevp$glv$>

On 12.05.2011 15:44, Rahul Amaram wrote:
> Hi,
> I was wondering if the following programming recommendation would be 
> added to the Style Guide for Python Code (PEP 8) page.
> The preferred way for checking if a key (k) exists in a dictionary (d) 
> is "if k in d". This is faster than "if k in d.keys()" and this has 
> superseded "d.has_key(k)"

While "k in d" is certainly the right way, this is not the sort of thing
that should be added to PEP 8.  There must be dozens of such little
idioms and anti-idioms, and listing them all is way beyond the PEP's scope.

(And has_key is gone in py3k anyway.)


From g.brandl at  Thu May 12 22:15:07 2011
From: g.brandl at (Georg Brandl)
Date: Thu, 12 May 2011 22:15:07 +0200
Subject: [Python-ideas] PEP-3151 pattern-matching
In-Reply-To: <>
References: <>	<>	<>	<>
	<> <>
Message-ID: <iqhf45$glv$>

On 12.05.2011 21:06, M.-A. Lemburg wrote:
> Antoine Pitrou wrote:
>> On Fri, 08 Apr 2011 10:59:16 +0200
>> "M.-A. Lemburg" <mal at> wrote:
>>>> I think EnvironmentError, WindowsError, VMSError, OSError, mmap.error
>>>> and select.error should definitely all be merged with IOError, as they
>>>> aren't used consistently enough to make handling them differently
>>>> reliable even in current code.
>>> Their use may be inconsistent in a few places, but those cases
>>> are still well-defined by the implementation, so code relying
>>> on that well-defined behavior will break in subtle ways.
>> Another quirk occurred to me today: select.error doesn't derive from
>> EnvironmentError, and so it doesn't have the errno attribute (even
>> though the select module "correctly" instantiates it with a (errno,
>> message) tuple). Also, its str() is borked:
>>>>> e = select.error(4, "interrupted")
>>>>> str(e)
>> "(4, 'interrupted')"
>>>>> raise e
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> select.error: (4, 'interrupted')
> Works fine in Python 2.7:
>>>> import select
>>>> e = select.error(4, "intr")
>>>> e
> error(4, 'intr')

Note that this is the repr(), while Antoine showed the str().

But the str() looks correct to me as well (for an exception that doesn't
derive from EnvironmentError).


From solipsis at  Thu May 12 23:17:29 2011
From: solipsis at (Antoine Pitrou)
Date: Thu, 12 May 2011 23:17:29 +0200
Subject: [Python-ideas] PEP-3151 pattern-matching
References: <>
	<> <>
	<> <iqhf45$glv$>
Message-ID: <>

On Thu, 12 May 2011 22:15:07 +0200
Georg Brandl <g.brandl at> wrote:
> But the str() looks correct to me as well (for an exception that doesn't
> derive from EnvironmentError).

It's technically correct, sure. The point is that "technically correct"
translates to "humanly bogus" here, because of the broken I/O
exception hierarchy.



From g.brandl at  Fri May 13 07:07:56 2011
From: g.brandl at (Georg Brandl)
Date: Fri, 13 May 2011 07:07:56 +0200
Subject: [Python-ideas] PEP-3151 pattern-matching
In-Reply-To: <>
References: <>
	<> <>
	<> <iqhf45$glv$>
Message-ID: <iqieb6$2r0$>

On 12.05.2011 23:17, Antoine Pitrou wrote:
> On Thu, 12 May 2011 22:15:07 +0200
> Georg Brandl <g.brandl at> wrote:
>> But the str() looks correct to me as well (for an exception that doesn't
>> derive from EnvironmentError).
> It's technically correct, sure. The point is that "technically correct"
> translates to "humanly bogus" here, because of the broken I/O
> exception hierarchy.

Yep, and I'm all for fixing it with PEP 3151 :)


From clockworksaint at  Fri May 13 14:34:50 2011
From: clockworksaint at (Weeble)
Date: Fri, 13 May 2011 13:34:50 +0100
Subject: [Python-ideas] Suggestion for Style Guide for Python Code PEP 8
Message-ID: <>

On 12.05.2011 15:44, Rahul Amaram wrote:
> The preferred way for checking if a key (k) exists in a dictionary (d)
> is "if k in d". This is faster than "if k in d.keys()" and this has
> superseded "d.has_key(k)"

While 'k in d' is the right way to do it, I feel the claim it's faster
than 'k in d.keys()' is somewhat weak. While this is technically true,
it's a constant overhead, not some cost linear in the size of the
collection - at least in Python 3 - because .keys() returns a view.

>>> timeit("'1234567' in d", "d=dict((str(x),x) for x in range(5000000))")
>>> timeit("'1234567' in d.keys()", "d=dict((str(x),x) for x in range(5000000))")
>>> timeit("'1234567' in dkeys", "dkeys=dict((str(x),x) for x in range(5000000)).keys()")

So "x in d.keys()" is slower than "x in d", but only by the cost of a
method lookup. I don't see any reason ever to recommend using "x in
d.keys()", but I think it's misleading to say that this is because of
performance reasons, assuming that we are talking about Python 3.
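
A self-contained way to repeat the comparison, as a sketch: the
dictionary size and repeat count below are arbitrary choices, smaller
than in the calls above so that it runs quickly, and no timings are
claimed here since they vary by machine and Python version.

```python
from timeit import timeit

setup = "d = dict((str(x), x) for x in range(50000))"
n = 100000  # number of repetitions per measurement

t_in = timeit("'12345' in d", setup, number=n)
t_keys = timeit("'12345' in d.keys()", setup, number=n)
t_view = timeit("'12345' in dkeys", setup + "; dkeys = d.keys()", number=n)

print("in d:        %.4f s" % t_in)
print("in d.keys(): %.4f s" % t_keys)
print("in dkeys:    %.4f s" % t_view)
```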

(I also completely agree with everything Georg said, FWIW.)

From stephen at  Fri May 13 15:13:19 2011
From: stephen at (Stephen J. Turnbull)
Date: Fri, 13 May 2011 22:13:19 +0900
Subject: [Python-ideas] triple-quoted strings and indendation
In-Reply-To: <iqg2rv$62h$>
References: <iqeeck$u8l$> <iqf1u5$k3s$>
	<> <iqg2rv$62h$>
Message-ID: <>

Matthias Lehmann writes:
 > > Wild idea: make the unary + operator on strings do
 > > textwrap.dedent() on them.
 > >
 > The disadvantage compared to a string flag is, that this unary operator 
 > has no knowledge of the current indendation level within the code

But then your complaint is against textwrap.dedent, not against Python
syntax.  (That's no reason you can't have 2 complaints, of course.)

From mat at  Fri May 13 16:13:57 2011
From: mat at (Matthias Lehmann)
Date: Fri, 13 May 2011 16:13:57 +0200
Subject: [Python-ideas] triple-quoted strings and indendation
In-Reply-To: <>
References: <iqeeck$u8l$>
	<iqf1u5$k3s$>	<>
Message-ID: <iqjeb5$j0u$>

On 13.05.2011 15:13, Stephen J. Turnbull wrote:
> Matthias Lehmann writes:
>   >  >  Wild idea: make the unary + operator on strings do
>   >  >  textwrap.dedent() on them.
>   >  >
>   >  The disadvantage compared to a string flag is, that this unary operator
>   >  has no knowledge of the current indendation level within the code
> But then your complaint is against textwrap.dedent, not against Python
> syntax.  (That's no reason you can't have 2 complaints, of course.)

Well, it's not the fault of textwrap.dedent that it has no notion of 
the indentation level of its argument. As far as I know, that is 
something only the parser knows (not that I know anything about the 
Python parser).


From stephen at  Fri May 13 18:14:29 2011
From: stephen at (Stephen J. Turnbull)
Date: Sat, 14 May 2011 01:14:29 +0900
Subject: [Python-ideas] triple-quoted strings and indendation
In-Reply-To: <iqjeb5$j0u$>
References: <iqeeck$u8l$> <iqf1u5$k3s$>
	<> <iqg2rv$62h$>
Message-ID: <>

Matthias Lehmann writes:
 > On 13.05.2011 15:13, Stephen J. Turnbull wrote:
 > > Matthias Lehmann writes:
 > >   >  >  Wild idea: make the unary + operator on strings do
 > >   >  >  textwrap.dedent() on them.
 > >   >  >
 > >   >  The disadvantage compared to a string flag is, that this unary operator
 > >   >  has no knowledge of the current indendation level within the code
 > >
 > > But then your complaint is against textwrap.dedent, not against Python
 > > syntax.  (That's no reason you can't have 2 complaints, of course.)
 > Well, it's not the fault of textwrap.dedent, that is has no notion of 
 > the indendation-level of its argument. As far as I know, that is 
 > something, only the parser knows (not that I know anything about the 
 > Python parser).

Oh, I thought you were referring to the indentation within the string
(on the first line), not where the string begins.  Sorry!

But I think there's real trouble here, because there are different
styles of indentation, as we've seen.  You'd have to enforce one for
triple-quoted strings, but that's likely to conflict with many
developers' ideas about the matter.  That's really not something the
parser should be doing ....

From bruce at  Fri May 13 19:17:01 2011
From: bruce at (Bruce Leban)
Date: Fri, 13 May 2011 10:17:01 -0700
Subject: [Python-ideas] triple-quoted strings and indendation
In-Reply-To: <>
References: <iqeeck$u8l$> <iqf1u5$k3s$>
	<> <iqg2rv$62h$>
Message-ID: <>

On Fri, May 13, 2011 at 9:14 AM, Stephen J. Turnbull <stephen at>wrote:

> Matthias Lehmann writes:
>  > Well, it's not the fault of textwrap.dedent, that is has no notion of
>  > the indendation-level of its argument. As far as I know, that is
>  > something, only the parser knows (not that I know anything about the
>  > Python parser).
> Oh, I thought you were referring to the indentation within the string
> (on the first line), not where the string begins.  Sorry!
> But I think there's real trouble here, because there are different
> styles of indentation, as we've seen.  You'd have to enforce one for
> triple-quoted strings, but that's likely to conflict with many
> developers' ideas about the matter.  That's really not something the
> parser should be doing ....

If this feature were to be added, we would surely want to ignore the
indentation on the first line regardless of the previous line since it
shouldn't depend on whether or not I use two or four space indents:

    fun_func(-"""
        multiple
        lines
""")
#   ^^^^ don't want these spaces in my string

but unless we force people to follow the convention that you must have a
line break after the opening """ we would need to ignore indentation
starting with the second line for people who use this style:

    fun_func(-"""foo
                 bar
                 more""")

Now personally, I'd probably follow that first style but if this were a
language feature I wouldn't think it should only work for one style. Here's
pseudo-code:

if s[0] == '\n':  # style = first case above
    strip first character and strip indentation starting with first line
elif s[0] == ' ':
    strip indentation starting with first line   # style = """\
else:
    strip indentation starting with second line  # style = second case above
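
One possible runnable reading of the three cases, as a sketch only:
strip_literal_indent is an invented name, and using textwrap.dedent for
"strip indentation" is an assumption about what the pseudo-code means.

```python
import textwrap


def strip_literal_indent(s):
    """Dedent a triple-quoted literal according to how it starts."""
    if not s:
        return s
    if s[0] == '\n':
        # Line break right after the opening quotes: drop it, dedent the rest.
        return textwrap.dedent(s[1:])
    if s[0] == ' ':
        # Text starts indented (opening quotes followed by backslash-newline).
        return textwrap.dedent(s)
    # Text starts right after the quotes: only later lines carry indentation.
    first, sep, rest = s.partition('\n')
    return first + sep + textwrap.dedent(rest)


a = strip_literal_indent("\n        multiple\n        lines\n")
b = strip_literal_indent("    one\n      two")
c = strip_literal_indent("foo\n                 bar\n                 more")
print(repr(a), repr(b), repr(c))
```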

--- Bruce

From mat at  Fri May 13 23:00:35 2011
From: mat at (Matthias Lehmann)
Date: Fri, 13 May 2011 23:00:35 +0200
Subject: [Python-ideas] triple-quoted strings and indendation
In-Reply-To: <>
References: <iqeeck$u8l$>
	<iqf1u5$k3s$>	<>
	<iqg2rv$62h$>	<>	<iqjeb5$j0u$>	<>
Message-ID: <iqk65j$iv$>

On 13.05.2011 19:17, Bruce Leban wrote:
> If this feature were to be added, we would surely want to ignore the
> indentation on the first line regardless of the previous line since it
> shouldn't depend on whether or not I use two or four space indents:
>      fun_func(-"""
>          multiple
>          lines
> """)
> #   ^^^^ don't want these spaces in my string
> but unless we force people to follow the convention that you must have a
> line break after the opening """ we would need to ignore indentation
> starting with the second line for people who use this style:
>      fun_func(-"""foo
>                   bar
>                   more""")
> Now personally, I'd probably follow that first style but if this were a
> language feature I wouldn't think it should only work for one style.
> Here's pseudo-code:
>     if s[0] == '\n':  # style = first case above
>          strip first character and strip indentation starting with first
>     line
>     else if s[0] == ' ':
>          strip indentation starting with first line   # style = """\
>     else:
>          strip indentation starting with second line  # style = second
>     case above

The prototype code for trimming triple-quoted strings as I proposed was:

def trim(start_column, lines):
    """ start_column: start-column of first line of the
                      triple-quoted string
        lines: the lines of the string
    """
    n = start_column
    for line in lines[1:]:
        m = get_index_of_first_non_whitespace_char(line)
        n = min(n, m)
    result = []
    if len(lines[0]) > 0:
        result.append(lines[0])   # body lost in the archive; assumed
    for line in lines[1:]:
        result.append(line[n:])   # body lost in the archive; assumed
    if len(lines[-1]) == 0:
        result = result[:-1]
    return '\n'.join(result)

The crux is to have the start_column available to the function, 
everything else could be done just with a function.
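
A self-contained variant of the trimming idea, as a sketch under stated
assumptions: first_non_whitespace_index stands in for the helper that is
not defined in the message, blank lines are ignored when computing the
margin, and a whitespace-only last line is dropped. These choices are
guesses, not the original prototype.

```python
def first_non_whitespace_index(line):
    # Hypothetical helper: number of leading whitespace characters.
    return len(line) - len(line.lstrip())


def trim(start_column, lines):
    """Trim lines using the column where the string content starts."""
    n = start_column
    for line in lines[1:]:
        if line.strip():  # assumption: blank lines do not affect the margin
            n = min(n, first_non_whitespace_index(line))
    result = []
    if lines[0]:
        result.append(lines[0])  # the first line is kept exactly as written
    result.extend(line[n:] for line in lines[1:])
    if result and not result[-1].strip():
        result = result[:-1]     # drop a closing line holding only the quotes
    return '\n'.join(result)


# Content starting at column 8, continuation lines aligned with it.
kept = trim(8, ["  foo", "        bar", "          baz"])
# Line break after the opening quotes, closing quotes on their own line.
block = trim(13, ["", "    foo", "    bar", "    "])
print(repr(kept), repr(block))
```

In the first call the leading spaces of "  foo" survive because the
margin comes from start_column rather than from the string's contents,
which is what a plain function without parser support cannot know.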

With this, the following indentation styles are possible:




All this would be possible with a function, too. The start_column is 
really only needed to support cases like this:

func(t"""   keep

From rrr at  Sat May 14 03:28:49 2011
From: rrr at (Ron Adam)
Date: Fri, 13 May 2011 20:28:49 -0500
Subject: [Python-ideas] triple-quoted strings and indendation
In-Reply-To: <>
References: <iqeeck$u8l$> <>
Message-ID: <iqklsi$8id$>

On 05/11/2011 05:22 PM, Greg Ewing wrote:
> I have an idea of my own concerning multi-line strings.
> Many of the problems of triple-quoted strings stem from
> the fact that they're trying to be expressions that
> sit in-line with the rest of the code. As we've seen
> with all the attempts to fit multi-line function bodies
> into lambdas, that doesn't really work.
> So instead of a multi-line string *expression*, I think
> we need a *statement*.
> string advertisement:
>     | Python Egg Incubator!
>     |
>     | Hatch your eggs in half the time. Get yours
>     | today for only $39.99!

If in the above, '|' is used as the start of a line terminated string, it 
would be a nicer way of typing...

string advertisement:
     " Python Egg Incubator!\n"
     " Hatch your eggs in half the time. Get yours\n"
     " today for only $39.99!\n"

I think that would only require a small patch to tokenizer.c.  It would 
result in a blank line being added to the end of the paragraph, but maybe 
that's not so bad.

The hard parts are finding the best symbol, '|' is already used, and 
whether or not to try to handle raw and byte strings would be a concern as 
well.

We don't want to allow quotes to go unterminated as that is usually an 
error that needs to be caught.

Whether or not it's desirable to do this is another thing.  ;-)


From stephen at  Sat May 14 03:50:48 2011
From: stephen at (Stephen J. Turnbull)
Date: Sat, 14 May 2011 10:50:48 +0900
Subject: [Python-ideas] triple-quoted strings and indendation
In-Reply-To: <iqk65j$iv$>
References: <iqeeck$u8l$> <iqf1u5$k3s$>
	<> <iqg2rv$62h$>
Message-ID: <>

Matthias Lehmann writes:

 > func(t"""
 >    foo
 >    bar
 >    more
 >    """)

This style is possible without help from the parser, by taking the
last line as a hint for the indent to trim.
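
A sketch of that approach as an ordinary function, assuming the closing
quotes sit on their own whitespace-only line. The name
trim_using_last_line and the handling of lines that are indented less
than the hint are choices made for this example.

```python
def trim_using_last_line(s):
    """Remove the indentation found on the final, whitespace-only line."""
    lines = s.split('\n')
    indent = lines[-1]
    if len(lines) < 2 or indent.strip():
        # No whitespace-only closing line to use as a hint: leave unchanged.
        return s
    body = lines[:-1]
    if body and body[0] == '':
        body = body[1:]  # drop the newline right after the opening quotes
    return '\n'.join(
        line[len(indent):] if line.startswith(indent) else line.lstrip()
        for line in body)


text = trim_using_last_line("""
   foo
   bar
   more
   """)
print(repr(text))
```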

From amaramrahul at  Sat May 14 06:50:33 2011
From: amaramrahul at (Rahul Amaram)
Date: Sat, 14 May 2011 10:20:33 +0530
Subject: [Python-ideas] Suggestion for Style Guide for Python Code PEP 8
In-Reply-To: <iqhevp$glv$>
References: <>
Message-ID: <>

Thanks for the replies, Georg and Weeble. It would be nice if these kinds
of minor programming guidelines were also included on some page, perhaps
titled "Extended Python Guidelines" :). The reason is that novice
programmers in Python who have worked in other languages tend to use
the same style of coding as in those languages. So, for instance, to
check for the existence of a key in a dictionary, it is extremely likely
that they'd either look for a has_key method or get a list of all the
keys and search in it. Anyway, as you said, there might be a lot of such
small idioms in Python, which may not make sense to cover in PEP 8, but
if they are really the recommended way of doing the operation, then we
probably should have them documented in one place.


On Friday 13 May 2011 01:42 AM, Georg Brandl wrote:
> On 12.05.2011 15:44, Rahul Amaram wrote:
>> Hi,
>> I was wondering if the following programming recommendation would be
>> added to the Style Guide for Python Code (PEP 8) page.
>> The preferred way for checking if a key (k) exists in a dictionary (d)
>> is "if k in d". This is faster than "if k in d.keys()" and this has
>> superseded "d.has_key(k)"
> While "k in d" is certainly the right way, this is not the sort of thing
> that should be added to PEP 8.  There must be dozens of such little
> idioms and anti-idioms, and listing them all is way beyond the PEP's scope.
> (And has_key is gone in py3k anyway.)
> Georg

From greg.ewing at  Sat May 14 13:15:50 2011
From: greg.ewing at (Greg Ewing)
Date: Sat, 14 May 2011 23:15:50 +1200
Subject: [Python-ideas] triple-quoted strings and indendation
In-Reply-To: <iqklsi$8id$>
References: <iqeeck$u8l$> <>
Message-ID: <>

Ron Adam wrote:
> On 05/11/2011 05:22 PM, Greg Ewing wrote:
>> string advertisement:
>>     | Python Egg Incubator!
>>     |
>>     | Hatch your eggs in half the time. Get yours
>>     | today for only $39.99!
> I think that would only require a small patch to tokenizer.c.  It would 
> result in a blank line being added to the end of the paragraph,

No, the idea is that a newline wouldn't be added to the last line.
If you wanted that, you would have to add an empty line at the end:

string foo:
     | This line ends with a newline.
     |

> The hard parts are finding the best symbol, '|' is already used,

In a different context, though. There shouldn't be any ambiguity.
I'd much rather use '|' than anything else, because it makes such
a nice vertical boundary line.
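
To make the intended newline behaviour concrete, here is a sketch that
mimics it with a plain function. The string statement itself is
hypothetical syntax; parse_string_block is an invented helper, and
dropping one space after the bar is an assumption about the layout.

```python
def parse_string_block(source):
    """Join the '|'-prefixed lines of a block, with no newline after the last."""
    parts = []
    for raw in source.splitlines():
        stripped = raw.lstrip()
        if not stripped.startswith('|'):
            raise ValueError("expected '|' at start of line: %r" % raw)
        text = stripped[1:]
        if text.startswith(' '):
            text = text[1:]  # drop the single space that follows the bar
        parts.append(text)
    return '\n'.join(parts)


ad = parse_string_block(
    "    | Python Egg Incubator!\n"
    "    |\n"
    "    | Hatch your eggs in half the time. Get yours\n"
    "    | today for only $39.99!")
foo = parse_string_block(
    "    | This line ends with a newline.\n"
    "    |")
print(repr(ad))
print(repr(foo))
```

Only the second block ends with a newline, because it finishes with an
empty '|' line.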


From g.brandl at  Sat May 14 16:17:11 2011
From: g.brandl at (Georg Brandl)
Date: Sat, 14 May 2011 16:17:11 +0200
Subject: [Python-ideas] Suggestion for Style Guide for Python Code PEP 8
In-Reply-To: <>
References: <>
Message-ID: <iqm2t7$3i7$>

On 14.05.2011 06:50, Rahul Amaram wrote:
> Thanks for the replies, Georg and Weeble. It would be nice if these kinds
> of minor programming guidelines were also included on some page, perhaps
> titled "Extended Python Guidelines" :). The reason is that novice
> programmers in Python who have worked in other languages tend to use
> the same style of coding as in those languages. So, for instance, to
> check for the existence of a key in a dictionary, it is extremely likely
> that they'd either look for a has_key method or get a list of all the
> keys and search in it. Anyway, as you said, there might be a lot of such
> small idioms in Python, which may not make sense to cover in PEP 8, but
> if they are really the recommended way of doing the operation, then we
> probably should have them documented in one place.

I'd hope that simple things like "k in d" are already in every tutorial
on Python that's worth anything...
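
For readers who have not seen the idiom, a minimal sketch (the
dictionary and its keys are made up for illustration):

```python
stock = {'spam': 3, 'eggs': 0}

# Membership test on the dict itself; no has_key() and no list of keys.
has_spam = 'spam' in stock
has_ham = 'ham' in stock

# When a default is wanted instead of a boolean, get() avoids a KeyError.
ham_count = stock.get('ham', 0)
```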


From grosser.meister.morti at  Sat May 14 19:57:32 2011
From: grosser.meister.morti at (=?UTF-8?B?TWF0aGlhcyBQYW56ZW5iw7Zjaw==?=)
Date: Sat, 14 May 2011 19:57:32 +0200
Subject: [Python-ideas] a few decorator recipes
In-Reply-To: <>
References: <> <>
Message-ID: <>

So there is a standard place to store such metadata.

See how I use it here (scroll all the way down):


On 04/30/2011 09:17 PM, Benjamin Peterson wrote:
> Mathias Panzenböck<grosser.meister.morti at ...>  writes:
>> def annotations(**annots):
>> 	def deco(obj):
>> 		if hasattr(obj,'__annotations__'):
>> 			obj.__annotations__.update(annots)
>> 		else:
>> 			obj.__annotations__ = annots
>> 		return obj
>> 	return deco
> Why would you want to do that?
>> def setannot(obj, key, value):
> I don't see the point.
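
As a usage illustration, here is a sketch of the quoted decorator
applied to a function under Python 3. The function name and the
annotation keys are assumptions chosen for the example.

```python
def annotations(**annots):
    def deco(obj):
        # Merge into existing annotations if there are any, else create them.
        if hasattr(obj, '__annotations__'):
            obj.__annotations__.update(annots)
        else:
            obj.__annotations__ = annots
        return obj
    return deco


@annotations(value=int, returns=str)
def describe(value):
    return str(value)
```

After decoration, describe.__annotations__ holds both entries, so tools
that read the standard attribute see the metadata without any extra
registry.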

From greg at  Sat May 14 20:51:37 2011
From: greg at (Gregory P. Smith)
Date: Sat, 14 May 2011 11:51:37 -0700
Subject: [Python-ideas] Minor tweak to PEP 8?
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, May 11, 2011 at 7:23 AM, Guido van Rossum <guido at> wrote:
> At Google we use the following rule (from
> Yes:  # Aligned with opening delimiter
>       foo = long_function_name(var_one, var_two,
>                                var_three, var_four)
>       # 4-space hanging indent; nothing on first line
>       foo = long_function_name(
>           var_one, var_two, var_three,
>           var_four)

and note that this should be "8-space hanging indent" if it goes into
pep8.  The rule is really "double your code indentation hanging
indent" so that you can never confuse the two visually.  It works
well.
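
A small sketch of why the doubled indent helps, assuming 4-space code
indentation. The function names are placeholders; only the layout
matters.

```python
def long_function_name(var_one, var_two, var_three, var_four):
    return (var_one, var_two, var_three, var_four)


# Aligned with the opening delimiter.
foo = long_function_name(1, 2,
                         3, 4)

# Hanging indent with nothing on the first line.
foo = long_function_name(
    1, 2, 3,
    4)


# In a def, a hanging indent of 8 spaces keeps the parameters visually
# separate from the 4-space body that follows.
def another_long_function_name(
        var_one, var_two, var_three,
        var_four):
    return long_function_name(var_one, var_two, var_three, var_four)
```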

> No:   # Stuff on first line forbidden
>       foo = long_function_name(var_one, var_two,
>           var_three, var_four)
>       # 2-space hanging indent forbidden
>       foo = long_function_name(
>         var_one, var_two, var_three,
>         var_four)
> I propose we somehow incorporate these two allowed alternatives into PEP 8.
> They both serve a purpose.
> --
> --Guido van Rossum (
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at

From guido at  Sat May 14 21:01:50 2011
From: guido at (Guido van Rossum)
Date: Sat, 14 May 2011 12:01:50 -0700
Subject: [Python-ideas] Minor tweak to PEP 8?
In-Reply-To: <>
References: <>
Message-ID: <>

Indeed. Somebody update PEP 8 please!

On Sat, May 14, 2011 at 11:51 AM, Gregory P. Smith <greg at> wrote:
> On Wed, May 11, 2011 at 7:23 AM, Guido van Rossum <guido at> wrote:
>> At Google we use the following rule (from
>> Yes:  # Aligned with opening delimiter
>>       foo = long_function_name(var_one, var_two,
>>                                var_three, var_four)
>>       # 4-space hanging indent; nothing on first line
>>       foo = long_function_name(
>>           var_one, var_two, var_three,
>>           var_four)
> and note that this should be "8-space hanging indent" if it goes into
> pep8.  The rule is really "double your code indentation hanging
> indent" so that you can never confuse the two visually.  it works
> well.
>> No:   # Stuff on first line forbidden
>>       foo = long_function_name(var_one, var_two,
>>           var_three, var_four)
>>       # 2-space hanging indent forbidden
>>       foo = long_function_name(
>>         var_one, var_two, var_three,
>>         var_four)
>> I propose we somehow incorporate these two allowed alternatives into PEP 8.
>> They both serve a purpose.
>> --
>> --Guido van Rossum (

--Guido van Rossum (

From greg at  Sat May 14 21:21:09 2011
From: greg at (Gregory P. Smith)
Date: Sat, 14 May 2011 12:21:09 -0700
Subject: [Python-ideas] Threading hooks and disable gc per thread
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, May 11, 2011 at 4:58 PM, Christian Heimes <lists at> wrote:
> Hello,
> today I've spent several hours debugging a segfault in JCC [1]. JCC is a
> framework to wrap Java code for Python. It's most prominently used in
> PyLucene [2]. You can read more about my debugging in [3]
> With JCC every Python thread must be registered at the JVM through JCC.
> An unattached thread, that accesses a wrapped Java object, leads to
> errors and may even cause a segfault. Accessing also includes garbage
> collection. A code line like
>   a = {}
> or
>   "a b c".split()
> can segfault since the allocation of a dict or a bound method runs
> through _PyObject_GC_New(), which may trigger a cyclic garbage
> collection run. If the current thread isn't attached to the JVM but
> triggers a gc.collect() with some Java objects in a cycle, the
> interpreter crashes. It's quite complicated and hard to "fix" third
> party tools to attach all threads created in the third party library.
> The issue could be solved with a simple on_thread_start hook in the
> threading module. However there is more to it. In order to free memory
> threads must also be detached from the JVM, when a thread has ended. A
> second on_thread_stop hook isn't enough since the bound methods may also
> lead to a gc.collect() run after the thread is detached.
> I propose three changes to Python in order to fix the issue:
> on thread start hook
> --------------------
> Similar to the atexit module, third party modules can register a
> callable with *args and **kwargs. The functions are called inside the
> newly created thread just before the target is called. The best place
> for the hook list is threading.Thread._bootstrap_inner() right before
> the try: except: block. Exceptions are ignored during the
> call but reported to the user at the end (same as atexit's
> atexit_callfunc())
> on thread end hook
> ------------------
> Same as on thread start hook but the callables are called inside the
> dying thread after

Makes sense to me.

Something that needs clarifying: when the process dies (main python
thread has exited and all remaining python threads are daemon threads)
the on thread end hook will _not_ be called.


This is really two separate feature requests.  The above thread hooks
and the below gc hooks.

> gc.disable_thread(), gc.enable_thread(), gc.isenabled_thread()
> --------------------------------------------------------------
> Right now almost any code can trigger a gc.collect() run
> non-deterministically. Some applications like JCC want to control if
> gc.collect() is wanted on a thread level. This could be solved with a
> new flag in PyThreadState. PyThreadState->gc_enabled is enabled by
> default. When the flag is false, _PyObject_GC_Malloc() doesn't start a
> gc.collect() run for that thread. The collection is delayed until
> another thread or the main thread triggers it.
> The three functions should also have a C equivalent so C code can
> prevent gc in a thread.

This also sounds useful since we are a long long way from concurrent
gc.  (and whenever we gain that, we'd need a way to control when it
can or can't happen or to register the gc threads with anything
that needs to know about 'em, JCC, etc..)
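
For context, the switches that exist today (gc.disable(), gc.enable(),
gc.isenabled()) are process-wide rather than per thread. A minimal
sketch of pausing automatic collection around a critical section with
them follows; the helper name automatic_gc_paused is made up for this
example, and unlike the proposal it affects every thread.

```python
import gc
from contextlib import contextmanager


@contextmanager
def automatic_gc_paused():
    """Disable automatic cyclic GC for the whole process, then restore it."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        # Only re-enable if collection was on when we started.
        if was_enabled:
            gc.enable()
```

Used as "with automatic_gc_paused(): ...", allocations inside the block
no longer trigger automatic collection, but another thread calling
gc.enable() or gc.collect() would still defeat it, which is the gap the
per-thread flag is meant to close.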



From lists at  Sun May 15 03:04:28 2011
From: lists at (Christian Heimes)
Date: Sun, 15 May 2011 03:04:28 +0200
Subject: [Python-ideas] Threading hooks and disable gc per thread
In-Reply-To: <>
References: <>
Message-ID: <>

Am 14.05.2011 21:21, schrieb Gregory P. Smith:
> Makes sense to me.
> Something that needs clarifying: when the process dies (main python
> thread has exited and all remaining python threads are daemon threads)
> the on thread end hook will _not_ be called.

Good catch! This gotcha should be mentioned in the docs. A daemon thread
can end at any point in its life cycle. It's not an issue for my use
case. For JCC the hook just frees some resources that are freed anyway
when the process ends. Other use cases may need a more deterministic
cleanup, but that's out of the scope for my proposal. Users can get
around the issue with an atexit hook, though.

> This also sounds useful since we are a long long way from concurrent
> gc.  (and whenever we gain that, we'd need a way to control when it
> can or can't happen or to register the gc threads with anything
> that needs to know about 'em, JCC, etc..)

I thought of a concurrent GC, too. A dedicated GC thread could improve
response time of a GUI or web application if we could separate the
cyclic garbage detection into two steps. Even on a fast machine, a full
GC sweep with millions of objects in gen2 can take a long time, up to a
second, during which the interpreter is locked. I assume that scanning a
million objects takes most of the time. If it were possible to have
a scan without the GIL held and then remove the objects in a second step
with the GIL acquired, response time could improve. However that would
require a major redesign of the traverse and visit slots.

Back to my proposal. My initial proposal was missing one feature. It
should be possible to alter the default setting for
PyThreadState->gc_enabled, too. JCC could use the additional API to make
sure non-attached threads don't run the GC.

Example how JCC could use the feature:
lucene.initVM() initializes the Java VM and attaches the current thread.
This is usually done in the main thread before any other thread is
started. The function would call PyThread_set_gc_enabled(0) to set the
default value for new thread states and to prevent any new thread from
starting a cyclic GC collect.

lucene.getVM().attachCurrentThread() creates some thread local objects
in a TLS and registers the current thread at the Java VM. This would run
PyObject_GC_set_thread_enabled(1) to allow GC collect in the current thread.

lucene.getVMEnv().detachCurrentThread() cleans up the TLS and
unregisters the thread, so a PyObject_GC_set_thread_enabled(0) is required.

The implementation is rather simple:
 - a new static int variable for the default setting and a new flag in
the PyThreadState struct
 - check PyThreadState_Get()->gc_enabled in _PyObject_GC_Malloc()
 - four small functions to set and get the default and thread setting
 - three Python functions in the gc module to enable, disable and get
the flag from the current PyThreadState
 - a function to get the global flag. I'm not sure if we should expose
the global switch for Python code.

The attached patch already has all C functionality. If I hear more +1,
then I'll write two small PEPs for both feature requests.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: gc_thread.diff
Type: text/x-patch
Size: 3331 bytes
Desc: not available
URL: <>

From ncoghlan at  Sun May 15 12:40:31 2011
From: ncoghlan at (Nick Coghlan)
Date: Sun, 15 May 2011 20:40:31 +1000
Subject: [Python-ideas] Suggestion for Style Guide for Python Code PEP 8
In-Reply-To: <iqm2t7$3i7$>
References: <>
Message-ID: <>

On Sun, May 15, 2011 at 12:17 AM, Georg Brandl <g.brandl at> wrote:
> I'd hope that simple things like "k in d" are already in every tutorial
> on Python that's worth anything...

In this particular case, the official docs are already quite explicit:

"""has_key() is deprecated in favor of key in d."""


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Sun May 15 13:13:56 2011
From: ncoghlan at (Nick Coghlan)
Date: Sun, 15 May 2011 21:13:56 +1000
Subject: [Python-ideas] Threading hooks and disable gc per thread
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, May 12, 2011 at 9:58 AM, Christian Heimes <lists at> wrote:
> on thread start hook
> --------------------
> Similar to the atexit module, third party modules can register a
> callable with *args and **kwargs. The functions are called inside the
> newly created thread just before the target is called. The best place
> for the hook list is threading.Thread._bootstrap_inner() right before
> the try: except: block. Exceptions are ignored during the
> call but reported to the user at the end (same as atexit's
> atexit_callfunc())
> on thread end hook
> ------------------
> Same as on thread start hook but the callables are called inside the
> dying thread after

So the plan is to have threading.Thread support the hooks, while
_thread.start_new_thread and creation of thread states at the C level
(including via PyGILState_Ensure) will bypass them?

That actually sounds reasonable to me (+0), but the PEP should at
least discuss the rationale for the choice of level for the new
feature. I also suggest storing the associated hook lists at the
threading.Thread class object level rather than at the threading
module level (supporting such modularity of state being a major
advantage of only providing this feature at the higher level).

The PEP should also go into detail as to why having these hooks in a
custom Thread subclass isn't sufficient (e.g. needing to support
threads created by third party libraries, but note that such a
rationale has a problem due to the _thread.start_new_thread loophole).

Composability through inheritance should also be discussed - the hook
invocation should probably walk the MRO so it is easy to create Thread
subclasses that include class specific hooks without inadvertently
skipping the hooks installed on threading.Thread.
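
To make the class-level and MRO points concrete, here is a rough
sketch written as an ordinary Thread subclass. Everything in it
(HookedThread, register_start_hook, register_end_hook, hook_errors) is
hypothetical and not an existing threading API. It also shares the
loophole mentioned above: threads started through
_thread.start_new_thread never run these hooks.

```python
import threading


class HookedThread(threading.Thread):
    """Hypothetical sketch: per-class start/end hooks run inside the thread."""

    @classmethod
    def _hooks(cls, name):
        # Each class gets its own list, so hook state stays modular.
        if name not in cls.__dict__:
            setattr(cls, name, [])
        return cls.__dict__[name]

    @classmethod
    def register_start_hook(cls, func, *args, **kwargs):
        cls._hooks("_start_hooks").append((func, args, kwargs))

    @classmethod
    def register_end_hook(cls, func, *args, **kwargs):
        cls._hooks("_end_hooks").append((func, args, kwargs))

    def _call_hooks(self, name, classes):
        for klass in classes:
            for func, args, kwargs in klass.__dict__.get(name, ()):
                try:
                    func(*args, **kwargs)
                except Exception as exc:
                    # Keep going; failures are collected for later reporting.
                    self.hook_errors.append(exc)

    def run(self):
        self.hook_errors = []
        mro = type(self).__mro__
        # Base-class hooks first on start, last on end.
        self._call_hooks("_start_hooks", reversed(mro))
        try:
            super().run()
        finally:
            self._call_hooks("_end_hooks", mro)
```

Walking the MRO means a subclass can add its own hooks without
skipping the ones registered on its bases. Passing exception
information to the end hooks is left out of this sketch.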

The possibility of passing exception information to thread_end hooks
(ala __exit__ methods) should be considered, along with the general
relationship between the threading hooks and the context management

> gc.disable_thread(), gc.enable_thread(), gc.isenabled_thread()
> --------------------------------------------------------------
> Right now almost any code can trigger a gc.collect() run
> non-deterministically. Some applications like JCC want to control if
> gc.collect() is wanted on a thread level. This could be solved with a
> new flag in PyThreadState. PyThreadState->gc_enabled is enabled by
> default. When the flag is false, _PyObject_GC_Malloc() doesn't start a
> gc.collect() run for that thread. The collection is delayed until
> another thread or the main thread triggers it.
> The three functions should also have a C equivalent so C code can
> prevent gc in a thread.

The default setting for this should go in the interpreter state object
rather than in a static variable (subinterpreters can then inherit the
state of their parent interpreter when they are first created).

Otherwise sounds reasonable. (+0)


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From dag.odenhall at  Tue May 17 16:14:34 2011
From: dag.odenhall at (dag.odenhall at
Date: Tue, 17 May 2011 16:14:34 +0200
Subject: [Python-ideas] PEP-3151 pattern-matching
In-Reply-To: <iqieb6$2r0$>
References: <>
	<> <>
	<> <iqhf45$glv$>
	<> <iqieb6$2r0$>
Message-ID: <>

Excuse me if this has already been discussed, but couldn't
__instancecheck__ be used to add exception types that match with more

From pyideas at  Wed May 18 00:41:40 2011
From: pyideas at (Chris Rebert)
Date: Tue, 17 May 2011 15:41:40 -0700
Subject: [Python-ideas] PEP-3151 pattern-matching
In-Reply-To: <>
References: <>
	<> <>
	<> <iqhf45$glv$>
	<> <iqieb6$2r0$>
Message-ID: <>

On Tue, May 17, 2011 at 7:14 AM, dag.odenhall at
<dag.odenhall at> wrote:
> Excuse me if this has already been discussed, but couldn't
> __instancecheck__ be used to add exception types that match with more
> precision?

Somewhat related bug:


From dag.odenhall at  Wed May 18 12:07:28 2011
From: dag.odenhall at (dag.odenhall at
Date: Wed, 18 May 2011 12:07:28 +0200
Subject: [Python-ideas] PEP-3151 pattern-matching
In-Reply-To: <>
References: <>
	<> <>
	<> <iqhf45$glv$>
	<> <iqieb6$2r0$>
Message-ID: <>

On 18 May 2011 00:41, Chris Rebert <pyideas at> wrote:
> On Tue, May 17, 2011 at 7:14 AM, dag.odenhall at
> <dag.odenhall at> wrote:
>> Excuse me if this has already been discussed, but couldn't
>> __instancecheck__ be used to add exception types that match with more
>> precision?
> Somewhat related bug:

Interesting. If that is intentional I'd advocate against it unless
there's a strong argument for it.

Another idea (also likely already proposed) would be to match against
instances as well, by the 'args' attribute:

except IOError(32):  # isinstance IOError and .args == (32,)

If this seems crazy consider that it's (to some extent) similar to the
behavior of 'raise'.

From jeanpierreda at  Wed May 18 14:24:04 2011
From: jeanpierreda at (Devin Jeanpierre)
Date: Wed, 18 May 2011 08:24:04 -0400
Subject: [Python-ideas] PEP-3151 pattern-matching
In-Reply-To: <>
References: <>
	<> <>
	<> <iqhf45$glv$>
	<> <iqieb6$2r0$>
Message-ID: <>

On Wed, May 18, 2011 at 6:07 AM, dag.odenhall at
<dag.odenhall at> wrote:
> Interesting. If that is intentional I'd advocate against it unless
> there's a strong argument for it.
> Another idea (also likely already proposed) would be to match against
> instances as well, by the 'args' attribute:
> try:
>    ...
> except IOError(32):  # isinstance IOError and .args == (32,)
>    ...
> If this seems crazy consider that it's (to some extent) similar to the
> behavior of 'raise'.

Unfortunately, as described it wouldn't match IOError(32, 'Blah blah
blah'). Although maybe it makes sense to create an Anything builtin,
which is equal to everything, such that IOError(X, Y) == IOError(X,
Anything) == IOError(X, Z) for all X, Y, and Z (except stupid X like X
= nan).

I do like it.
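
A rough sketch of the Anything idea, done as a plain helper rather than
new except syntax. The names Anything and matches are hypothetical, and
the pattern is passed as a tuple of args instead of an exception
instance to keep the example simple.

```python
class _AnythingType:
    """Wildcard that compares equal to every object."""

    def __eq__(self, other):
        return True

    def __ne__(self, other):
        return False

    def __repr__(self):
        return "Anything"


Anything = _AnythingType()


def matches(exc, exc_type, args_pattern):
    # Tuple equality compares element by element, so the wildcard can
    # stand in for any single position, but not for a missing one.
    return isinstance(exc, exc_type) and exc.args == args_pattern
```

With this, matches(IOError(32, 'Blah blah blah'), IOError, (32, Anything))
is true, while IOError(32) alone does not match because the lengths
differ.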

Devin Jeanpierre

From jeanpierreda at  Wed May 18 14:46:29 2011
From: jeanpierreda at (Devin Jeanpierre)
Date: Wed, 18 May 2011 08:46:29 -0400
Subject: [Python-ideas] Rename python.exe to python3.exe on Windows
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

I think going the Arch route might be unrealistic, because there will
be no Python 2.8, and thus no chance to rename python.exe to
python2.exe. Arch can do it because they have their own distribution
of Python, Microsoft Windows does not. All I can think of is having
Python 3 installers also install symlinks or batch scripts for Python
2 installations. This could break because you might install a Python 2
installation after Python 3 (and in an unexpected place, if the
installer tries to predict things).

If 3.3 has its executable renamed, the worst situation is that python
refers to 3.1 or 3.2, and python3 to 3.3. This could be resolved by
removing 3.2 or 3.1 from the PATH (and if you still wanted to access
them, adding python31.exe and python32.exe symlinks somewhere on the
PATH). This situation is increasingly unlikely to occur as time goes
on and fewer people put 3.2 or 3.1 on the PATH at all.

Devin Jeanpierre

On Sat, May 7, 2011 at 10:28 PM, Steven D'Aprano <steve at> wrote:
> Ben Finney wrote:
>> If the default "python" were Python 3.x, programs expecting Python 2.x
>> would most likely break due to backward incompatibility. So it's best if
>> the "python" program invokes only Python 2.x.
> The first sentence is true. The second is a value judgement, not a statement
> of fact, and the people behind Arch Linux disagree with you.
> I say, good on 'em.
> I wish I could find the quote somebody made about Arch being the distro that
> makes Gentoo seem cautious and conservative... something about Arch moving
> forward so the Gentoo folks know which mistakes not to make?
> --
> Steven

From ethan at  Wed May 18 22:10:15 2011
From: ethan at (Ethan Furman)
Date: Wed, 18 May 2011 13:10:15 -0700
Subject: [Python-ideas] Python 3.x and bytes
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>
Message-ID: <>

As those who have to work with byte strings know, when retrieving a 
single character from a byte string, what you get back is not a byte 
string, but an int -- a rather important distinction from unicode 
strings (str).

This has the frustrating side-effect of

b'abc'[2] == b'c'

being False.

It is far too late to change that particular behavior of the byte string 
(returning int's, that is) -- however, it may not be too late for a 
backwards-compatible change:

have the bytes class' __eq__ method be modified so that it
    1) checks to see if the bytes instance is length 1
    2) checks to see if
       a) the other object is an int, and
       b) 0 <= other_obj < 256
    3) if 1 and 2, make the comparison between the int and its
         single element instead of returning NotImplemented?

This makes sense to me -- after all, the bytes class is an array of ints
in range(256);  it is a special case, but doesn't feel any more special
than passing an int into bytes() giving a string of that many null
bytes; and it would get rid of the, in my opinion ugly, idiom of

some_var[i:i+1] == b'd'

It would also not require a new literal syntax.
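
For reference, a short sketch of the current Python 3 behaviour being
discussed (variable names are illustrative only):

```python
data = b'abc'

last = data[2]                     # indexing a bytes object yields an int
same_as_bytes = (last == b'c')     # int vs. bytes: never equal today
same_as_int = (last == ord('c'))   # comparing against the code point works
slice_matches = (data[2:3] == b'c')  # the slicing idiom mentioned above

three_nulls = bytes(3)             # an int argument gives that many null bytes
```

Here last is 99, same_as_bytes is False, and the other two comparisons
are True; three_nulls is b'\x00\x00\x00'.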



From ethan at  Wed May 18 23:11:10 2011
From: ethan at (Ethan Furman)
Date: Wed, 18 May 2011 14:11:10 -0700
Subject: [Python-ideas] [Python-Dev] Python 3.x and bytes
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>
	<> <>
Message-ID: <>

Martin v. Löwis wrote [from python-dev]:
> Immutable objects that compare equal should hash equal;
> so we would also have to change the hashing of byte strings. Not sure
> whether that, in turn, has undesirable consequences.

I thought it was the other-way-round -- if they hash equal, they should
compare equal?  Or is this just for immutables?

> In addition, equality should be transitive, so b'A' == 65.0.

I'm not sure what you're getting at...  we could certainly have step 2
check for a number instead of an int, and then step 3 could extract the
one element, giving an int, and then let that int compare itself with
the other number, whether it be int, float, fraction, what-have-you.


From fdrake at  Wed May 18 23:04:13 2011
From: fdrake at (Fred Drake)
Date: Wed, 18 May 2011 17:04:13 -0400
Subject: [Python-ideas] [Python-Dev] Python 3.x and bytes
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<> <>
Message-ID: <>

On Wed, May 18, 2011 at 5:11 PM, Ethan Furman <ethan at> wrote:
> I thought it was the other-way-round -- if they hash equal, they should
> compare equal?  Or is this just for immutables?

Two values that compare equal must have equal hashes.

Having equal hashes does not imply equality.
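
A minimal sketch of both directions. The class Collides is made up for
the example and deliberately uses a constant hash.

```python
class Collides:
    """Every instance hashes to 0, but equality still looks at the value."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, Collides) and self.value == other.value

    def __hash__(self):
        return 0


a, b = Collides(1), Collides(2)

# Equal hashes, unequal objects: legal, just slower in dicts and sets.
distinct = {a, b}

# Equal objects must hash equal, which is why 1 and 1.0 are one key.
merged = {1, 1.0}
```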


Fred L. Drake, Jr.    <fdrake at>
"Give me the luxuries of life and I will willingly do without the necessities."
   --Frank Lloyd Wright

From greg.ewing at  Thu May 19 00:13:09 2011
From: greg.ewing at (Greg Ewing)
Date: Thu, 19 May 2011 10:13:09 +1200
Subject: [Python-ideas] PEP-3151 pattern-matching
In-Reply-To: <>
References: <>
	<> <>
	<> <iqhf45$glv$>
	<> <iqieb6$2r0$>
Message-ID: <>

Devin Jeanpierre wrote:
> On Wed, May 18, 2011 at 6:07 AM, dag.odenhall at
> <dag.odenhall at> wrote:
>>except IOError(32):  # isinstance IOError and .args == (32,)
>>   ...
> Unfortunately, as described it wouldn't match IOError(32, 'Blah blah
> blah').

Also it's a bit magical -- normally one doesn't expect
Haskell-like pattern matching in Python.

Maybe something more explicit would be better:

   except IOError as e with e.errno == 32:


From tjreedy at  Thu May 19 05:10:01 2011
From: tjreedy at (Terry Reedy)
Date: Wed, 18 May 2011 23:10:01 -0400
Subject: [Python-ideas] Python 3.x and bytes
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>
Message-ID: <ir21ma$72p$>

On 5/18/2011 4:10 PM, Ethan Furman wrote:
> As those who have to work with byte strings know, when retrieving a
> single character from a byte string, what you get back is not a byte
> string, but an int -- a rather important distinction from unicode
> strings (str).

For all sequences, slicing (if it works at all) returns a subsequence 
(possibly of length 0, which is why slicing can work with out-of-bounds 
slice points). For all (built-in) sequences except for strings, indexing 
returns a member of the sequence (which is why it raises an exception 
for out-of-bounds indexes). Leaving aside extension and user-defined 
sequences, strings are unique in instead returning a length-1 
subsequence. So bytes are normal while strings are anomalous!
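
A short sketch of the distinction, with illustrative values:

```python
items = [10, 20, 30]
text = 'abc'
raw = b'abc'

# Slicing always gives a subsequence, even when out of bounds.
empty_list = items[5:]
empty_text = text[5:]

# Indexing a list or bytes gives a member, which has a different type.
member = items[0]
byte_value = raw[0]

# Indexing a str gives another str, which can be indexed again forever.
char = text[0]
still_char = text[0][0][0]

# Out-of-bounds indexing raises instead of returning something empty.
try:
    items[5]
    raised = False
except IndexError:
    raised = True
```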

Why that anomaly? The immediate reason is that Python does not have a 
separate character type. Why not? Guido might best answer (but he might 
say 'my gut instinct'), but I can think of a few reasons.

1. That is how it is in the (math) theory of strings. 'A' is both a char 
and a string of length one. There is no separate 'char' type that cannot 
be added (concatenated) to other strings of whatever length.

2. (Related) This pragmatically works best for Python.

3. Python follows Occam's principle by not introducing types without 
necessity. And a separate char type is not *necessary*.

4. Text strings are homogeneous arrays (like the arrays in the array 
module), unlike heterogeneous tuples and lists. So they need not be 
sequences of Python objects, and for efficiency, would not be even if 
there were a character type. Like other arrays, they contain the 
information needed to produce Python objects on demand without actually 
containing such objects in the way tuples, lists, and dicts do.

I do, however, understand the tendency to think of bytes as strings 
because of both Python's history and the remnant string interface.

For people using non-Latin (non-ascii) alphabets, the 'convenience' of 
replacing some bytes with ascii-chars might be less convenient.

Terry Jan Reedy

From jeanpierreda at  Thu May 19 07:02:32 2011
From: jeanpierreda at (Devin Jeanpierre)
Date: Thu, 19 May 2011 01:02:32 -0400
Subject: [Python-ideas] Python 3.x and bytes
In-Reply-To: <ir21ma$72p$>
References: <>
	<> <>
	<> <>
Message-ID: <>

On Wed, May 18, 2011 at 11:10 PM, Terry Reedy <tjreedy at> wrote:
> For all sequences, slicing (if it works at all) returns a subsequence
> (possibly of length 0, which is why slicing can work with out-of-bounds
> slice points). For all (built-in) sequences except for strings, indexing
> returns a member of the sequence (which is why it raises an exception for
> out-of-bounds indexes). Leaving aside extension and user-defined sequences,
> strings are unique in instead returning a length-1 subsequence. So bytes are
> normal while strings are anomalous!

I don't see the necessity of saying that length-1 strings aren't
members of strings. For all definitions I can think of for "member of
the sequence", they are. You get them when you iterate over them, you
get them when you use index access, they work with .index(). They have
a sort of infinite regress / cycle to them ("it's strings all the way
down"), but you can get that with lists too (x = []; x.append(x); y =
x + x -- compare with x = 'a'; y = x + x).

> 1. That is how it is in the (math) theory of strings. 'A' is both a char and
> a string of length one. There is no separate 'char' type that cannot be
> added (concatenated) to other strings of whatever length.

At least in the context of formal language theory (e.g. Sipser's
Introduction to the Theory of Computation), characters (symbols) are a
separate thing from strings. You have your alphabet, Sigma, which is
an arbitrary set, and strings are finite sequences of elements from

In Python's case, it's chosen an alphabet where all elements are
length-1 strings in the alphabet. I don't think that's really
well-formed using this definition of string and ZFC, and the usual
definitions of finite sequences (functions or linked-lists). It
doesn't really matter, you can model it in something else.

> I do, however, understand the tendency to think of bytes as strings because
> of both Python's history and the remnant string interface.

I would add the syntax of bytes literals to the list of similarities.
br'\foo' versus r'\foo' makes them very similar.

> For people using non-Latin (non-ascii) alphabets, the 'convenience' of
> replacing some bytes with ascii-chars might be less convenient.

Eh, actually I think what was suggested was having e.g. b'\x42' ==
0x42 by making singleton bytes objects equal to the appropriate
integer. This would work for all bytes, not just those smaller than

Devin Jeanpierre

From dag.odenhall at  Thu May 19 10:33:11 2011
From: dag.odenhall at (dag.odenhall at
Date: Thu, 19 May 2011 10:33:11 +0200
Subject: [Python-ideas] PEP-3151 pattern-matching
In-Reply-To: <>
References: <>
	<> <>
	<> <iqhf45$glv$>
	<> <iqieb6$2r0$>
Message-ID: <>

On 19 May 2011 00:13, Greg Ewing <greg.ewing at> wrote:
> Devin Jeanpierre wrote:
>> On Wed, May 18, 2011 at 6:07 AM, dag.odenhall at
>> <dag.odenhall at> wrote:
>>> except IOError(32):  # isinstance IOError and .args == (32,)
>>>  ...
>> Unfortunately, as described it wouldn't match IOError(32, 'Blah blah
>> blah').
> Also it's a bit magical -- normally one doesn't expect
> Haskell-like pattern matching in Python.
> Maybe something more explicit would be better:
>  try:
>    ...
>  except IOError as e with e.errno == 32:
>    ...

Then we're back where we started in which case I prefer 'if' as the keyword.

From andrew at  Thu May 19 14:27:57 2011
From: andrew at (andrew cooke)
Date: Thu, 19 May 2011 08:27:57 -0400
Subject: [Python-ideas] Type Metadata (and related ideas)
Message-ID: <>


I just finished working on a project that plays around with ABCs and
function annotations.  The idea was to allow for more declarative code
by adding tools to describe Python data in more detail.

While I don't think the result is suitable for adding to Python (it's
way too big a change, and it's not yet proven), the process of making
something consistent involved working through a lot of ideas about
"types in Python" that I recorded at

Part of that paper (pages 8 and 9) describes some issues that caused
particular problems, including:

* The lack of annotations on type generators makes it hard to use
  annotations as a way of completely describing types.

* There seems to be a missing ABC for __getitem__ (which unites lists,
  dicts and tuples).

* As ever, mutability is complicated :o) If we had copy on write lists
  (which already exist), perhaps we could hash instances efficiently?
  (OK, this may be already discussed, but I had to mention it)

* Given duck typing, shouldn't AttributeError be a TypeError (or vice
  versa)?

* For this particular use-case, an __instancehook__ (which would work
  much like __subclasshook__) for ABCMeta would have been useful (as
  described in the paper, polymorphism in Python occurs at the
  instance level, so asking about the types of instances makes a
  surprising amount of sense, if done right).
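
To illustrate the "missing ABC for __getitem__" point, here is a
minimal sketch using the existing __subclasshook__ mechanism. The name
Indexable is hypothetical; it is not an ABC from the collections
module or from the project described here.

```python
from abc import ABCMeta, abstractmethod


class Indexable(metaclass=ABCMeta):
    """Hypothetical ABC: anything that defines __getitem__."""

    @abstractmethod
    def __getitem__(self, key):
        raise KeyError(key)

    @classmethod
    def __subclasshook__(cls, C):
        if cls is Indexable:
            # Structural check: look for __getitem__ anywhere in the MRO.
            if any('__getitem__' in B.__dict__ for B in C.__mro__):
                return True
        return NotImplemented
```

With this, lists, dicts and tuples all pass isinstance(x, Indexable)
while sets do not. The check is still made per class; the
__instancehook__ suggested above would let it be made per instance.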

Anyway, apologies if some or all of this is old news or inappropriate.
I just thought people here might find it interesting (you can do
things like type check functions and use dynamic dispatch by type -
all in a fairly pythonic way... (imho))


PS The project home and more docs are at
; the code is at ; pypi page is

From jackdied at  Fri May 20 06:46:14 2011
From: jackdied at (Jack Diederich)
Date: Fri, 20 May 2011 00:46:14 -0400
Subject: [Python-ideas] function defaults and an empty() builtin
Message-ID: <>

During a code review I got asked a question about a "pythonic" idiom
I've been asked about before.  The code was like this:

def func(optional=None):
  if optional is None:
    optional = []

The question was why the optional value wasn't set to an empty list in
the first place.  The answer is that Really Bad Things can happen if
someone actually goes and manipulates that empty list because all
future callers will see the modified version.
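
A minimal demonstration of the problem, with made-up function names:

```python
def append_shared(item, bucket=[]):
    # The default list is created once, when the def statement runs.
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    # The None sentinel gives every call its own new list.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first = append_shared(1)
second = append_shared(2)   # same list object as first
fresh_one = append_fresh(1)
fresh_two = append_fresh(2)
```

After these calls first and second are the same object and both read
[1, 2], while fresh_one is [1] and fresh_two is [2].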

I don't think this defensive programming practice is yet passe - I can
think of lots of unit tests that wouldn't trigger bad behavior.  You
would have to intentionally provoke it by adding some unit tests to be

What would make my life a little easier is a builtin container named
"empty()" that emulates all builtin containers and raises an exception
for any add/subtract manipulations.  Something like:

class empty():
    def _bad_user(self, *args):
      raise ValueError("empty objects are empty")
    # every mutating or indexing operation is rejected
    append = pop = __getitem__ = add = setdefault = \
        __ior__ = __iand__ = _bad_user
    def _empty(self, *args):
      return []
    items = keys = values = get = _empty

return nothing when asked for something and raise a ValueError when
any attempt is made to add/remove items.


From stephen at  Fri May 20 07:44:30 2011
From: stephen at (Stephen J. Turnbull)
Date: Fri, 20 May 2011 14:44:30 +0900
Subject: [Python-ideas] Python 3.x and bytes
In-Reply-To: <ir21ma$72p$>
References: <>
	<> <>
	<> <>
Message-ID: <>

Terry Reedy writes:

 > 3. Python follows Occam's principle by not introducing types without 
 > necessity. And a separate char type is not *necessary*.

Well, neither are floats and integers; Decimal should do, no?

 > For people using non-Latin (non-ascii) alphabets, the 'convenience' of 
 > replacing some bytes with ascii-chars might be less convenient.

For us, the convenience remains.  Japanese mail is transmitted via
SMTP, and the control function "hello" is still spelled "EHLO" in
Japanese mail.  Farsi web pages are formatted by HTML, and the control
function "new line" is spelled "<BR>" in Farsi, of course.

It's the pain that comes from the inevitable mixing of binary protocol
that looks like text with real text, turning the whole into an
unintelligible garble, that hurts so much harder for people who can't
properly write their names in ASCII.

???????????????-ly y'rs,

From steve at  Fri May 20 07:57:04 2011
From: steve at (Steven D'Aprano)
Date: Fri, 20 May 2011 15:57:04 +1000
Subject: [Python-ideas] function defaults and an empty() builtin
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, 20 May 2011 02:46:14 pm Jack Diederich wrote:
> During a code review I got asked a question about a "pythonic" idiom
> I've been asked about before.  The code was like this:
> def func(optional=None):
>   if optional is None:
>     optional = []
> The question was why the optional value wasn't set to an empty list
> in the first place.  The answer is that Really Bad Things can happen
> if someone actually goes and manipulates that empty list because all
> future callers will see the modified version.

Assuming that this behaviour is not intended. However, I agree that, in 
general, the behaviour of mutable defaults in Python is a Gotcha.

> I don't think this defensive programming practice is yet passe - I
> can think of lots of unit tests that wouldn't trigger bad behavior. 
> You would have to intentionally provoke it by adding some unit tests
> to be sure.

Er, yes... how is that different from any other behaviour, good or bad? 
You have to write the unit tests to test the behaviour you want to test 
for, or else it won't be tested.

> What would make my life a little easier is a builtin container named
> "empty()" that emulates all builtin containers and raises an
> exception for any add/subtract manipulations.  Something like:

I don't think that this idea will actually be as useful as you think it 
will, but in any case, why does it need to be a built-in?

def func(optional=empty()):

works just as well whether empty is built-in or not.

But as I said, I don't think this will fly. What's the point? If you 
don't pass an argument for optional, and get a magic empty list, your 
function will raise an exception as soon as it tries to do something 
with the list. To my mind, that makes it rather useless. If you want 
the function to raise an exception if the default value is used, surely 
it's better to just make the argument non-optional.

But perhaps I've misunderstood something.

Steven D'Aprano

From masklinn at  Fri May 20 08:37:30 2011
From: masklinn at (Masklinn)
Date: Fri, 20 May 2011 08:37:30 +0200
Subject: [Python-ideas] function defaults and an empty() builtin
In-Reply-To: <>
References: <>
Message-ID: <>

On 2011-05-20, at 07:57 , Steven D'Aprano wrote:
> But as I said, I don't think this will fly. What's the point? If you 
> don't pass an argument for optional, and get a magic empty list, your 
> function will raise an exception as soon as it tries to do something 
> with the list. To my mind, that makes it rather useless. If you want 
> the function to raise an exception if the default value is used, surely 
> it's better to just make the argument non-optional.
> But perhaps I've misunderstood something.
That Jack's object would be an empty, immutable collection. Not an
arbitrary object.

The idea is rather similar to Java's Collections.empty* (emptySet,
emptyMap and emptyList), which could be fine solutions to this issue
indeed. There is already `frozenset` for sets, but there is no way to
instantiate an immutable list or dict in Python right now, as far as I
know (tuples don't work as several mixed list&tuple operations yield
an error, and I don't like using tuples as sequences personally).

The ability to either make a collection (list or dict, maybe via
separate functions) immutable or to create a special immutable empty
variant thereof would work nicely.
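
A rough sketch of what read-only empty defaults could look like. It
leans on two assumptions: a tuple is acceptable as the empty sequence
despite the reservations above, and types.MappingProxyType is
available (it was added to the standard library in Python 3.3, after
this thread). EMPTY_SEQ, EMPTY_MAP and summarize are hypothetical
names.

```python
from types import MappingProxyType

EMPTY_SEQ = ()                      # immutable empty sequence
EMPTY_MAP = MappingProxyType({})    # read-only view of an empty dict


def summarize(items=EMPTY_SEQ, options=EMPTY_MAP):
    # Reading from the defaults works exactly as with list/dict.
    return len(items), options.get("verbose", False)


def mutation_is_rejected():
    # Writing through the read-only view raises TypeError.
    try:
        EMPTY_MAP["verbose"] = True
    except TypeError:
        return True
    return False
```

Both defaults can be shared safely between calls because nothing can
be added to them.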

From tjreedy at  Fri May 20 10:28:39 2011
From: tjreedy at (Terry Reedy)
Date: Fri, 20 May 2011 04:28:39 -0400
Subject: [Python-ideas] Python 3.x and bytes
In-Reply-To: <>
References: <>	<>	<>	<>	<>
	<>	<>
	<>	<ir21ma$72p$>
Message-ID: <ir58nn$8tl$>

On 5/20/2011 1:44 AM, Stephen J. Turnbull wrote:

>   >  For people using non-Latin (non-ascii) alphabets, the 'convenience' of
>   >  replacing some bytes with ascii-chars might be less convenient.
> For us, the convenience remains.

I understood the thrust of this thread being that doing text
manipulation with bytes sometimes bites -- because bytes are not text.
Someone writing email or html bodies in Japanese or Farsi will not even
try that, but will use str (unicode) and encode to bytes only when done,
most likely transparently.

As far as I noticed, Ethan did not explain why he was extracting single
bytes and comparing to a constant, so it is hard to know if he was even
using them properly.

>  Japanese mail is transmitted via
> SMTP, and the control function "hello" is still spelled "EHLO" in
> Japanese mail.

I am not familiar with that control function, but if it is part of the
SMTP protocol, it has nothing to do with the language of the payload.
For programming a wire protocol that encodes abstract functions in ascii
chars, the ascii char representation of bytes is convenient. That
is why it was chosen as the default.

> Farsi web pages are formatted by HTML, and the control
> function "new line" is spelled "<BR>" in Farsi, of course.

When writing the html *text* body, sure. But I presume browsers decode
encoded bytes to unicode *before* parsing the text. If so, it does not
really matter that '<br>' gets encoded to b'<br>'.

> It's the pain that comes from the inevitable mixing of binary protocol
> that looks like text with real text, turning the whole into an
> unintelligible garble, that hurts so much harder for people who can't
> properly write their names in ASCII.
> ???????????????-ly y'rs,

Terry Jan Reedy

From steve at  Fri May 20 13:54:30 2011
From: steve at (Steven D'Aprano)
Date: Fri, 20 May 2011 21:54:30 +1000
Subject: [Python-ideas] function defaults and an empty() builtin
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, 20 May 2011 04:37:30 pm you wrote:
> On 2011-05-20, at 07:57 , Steven D'Aprano wrote:
> > But as I said, I don't think this will fly. What's the point? If
> > you don't pass an argument for optional, and get a magic empty
> > list, your function will raise an exception as soon as it tries to
> > do something with the list. To my mind, that makes it rather
> > useless. If you want the function to raise an exception if the
> > default value is used, surely it's better to just make the argument
> > non-optional.
> >
> > But perhaps I've misunderstood something.
> That Jack's object would be an empty, immutable collection. Not an
> arbitrary object.

Yes, I get that, but what's the point? What's an actual use-case for it?
What's the point of having an immutable collection that has the same 
methods as a list, but raises an exception if you use them? 

Most importantly, why single out an *empty* immutable list for special 
treatment, instead of providing a general immutable list type?

It seems to me that all this suggested pattern does is use a too-clever 
and round-about way of turning a buggy function into an exception for 
the caller, possibly a long way from where the error actually exists.

I can't think of any reason I would use this special empty() value as a 
default instead of either:

- fix the function to not use the same default list; or
- if using a default value causes problems, don't use a default value

> The ability to either make a collection (list or dict, maybe via
> separate functions) immutable or to create a special immutable empty
> variant thereof would work nicely.

These are two different issues. Being able to freeze an object would be 
handy, but a special dedicated empty immutable list strikes me as 
completely pointless.

Steven D'Aprano

From masklinn at  Fri May 20 14:14:09 2011
From: masklinn at (Masklinn)
Date: Fri, 20 May 2011 14:14:09 +0200
Subject: [Python-ideas] function defaults and an empty() builtin
In-Reply-To: <>
References: <>
Message-ID: <>

On 2011-05-20, at 13:54 , Steven D'Aprano wrote:
> On Fri, 20 May 2011 04:37:30 pm you wrote:
>> On 2011-05-20, at 07:57 , Steven D'Aprano wrote:
>>> But as I said, I don't think this will fly. What's the point? If
>>> you don't pass an argument for optional, and get a magic empty
>>> list, your function will raise an exception as soon as it tries to
>>> do something with the list. To my mind, that makes it rather
>>> useless. If you want the function to raise an exception if the
>>> default value is used, surely it's better to just make the argument
>>> non-optional.
>>> But perhaps I've misunderstood something.
>> That Jack's object would be an empty, immutable collection. Not an
>> arbitrary object.
> Yes, I get that, but what's the point? What's an actual use-case for it?
> What's the point of having an immutable collection that has the same 
> methods as a list, but raises an exception if you use them? 
Not if you use them, if you *modify* them.

I'm guessing the point is to be able to avoid the `if collection is None`
dance when the collection is not *supposed* to be modified: an immutable
collection would immediately raise on modification, acting as a
precondition/invariant and ensuring mutation is not introduced on the
original collection.

> Most importantly, why single out an *empty* immutable list for special 
> treatment, instead of providing a general immutable list type?
I'm pretty sure I mentioned that as a good idea in the following two
paragraphs of my comment.

But in Jack's case, I'm guessing it's because the Python bug of
collections-as-default-values is most generally encountered with empty
collections.

> I can't think of any reason I would use this special empty() value as a 
> default instead of either:
> - fix the function to not use the same default list; or
> - if using a default value causes problems, don't use a default value
But that's the very issue: the mutable-collection-default is a common
bug, and one which may be quite hard to debug in the long term (not just
that, but it may not even be visible as a bug -- even though data is
corrupted -- and manifest itself as an even harder to track memory leak).

By making that default-empty-collection immutable, mutations of the
default-argument collection become obvious (they blow up), and the
function can be fixed. It's much easier to track this down than a
strange memory leak.

On 2011-05-20, at 13:54 , Steven D'Aprano wrote:
>> The ability to either make a collection (list or dict, maybe via
>> separate functions) immutable or to create a special immutable empty
>> variant thereof would work nicely.
> These are two different issues. Being able to freeze an object would be 
> handy, but a special dedicated empty immutable list strikes me as 
> completely pointless.

An immutable empty collection can be a system-wide singleton, and
extremely cheap to use. It makes for a good default value or default
object member when you expect the collection to never be modified.

Using `collections.empty_list` is also more readable and clearer than,
say, `collections.freeze([])`.

From ethan at  Fri May 20 15:05:37 2011
From: ethan at (Ethan Furman)
Date: Fri, 20 May 2011 06:05:37 -0700
Subject: [Python-ideas] Python 3.x and bytes
In-Reply-To: <ir58nn$8tl$>
References: <>	<>	<>	<>	<>	<>	<>	<>	<ir21ma$72p$>	<>
Message-ID: <>

Terry Reedy wrote:
> As far as I noticed, Ethan did not explain why he was extracting single
> bytes and comparing to a constant, so it is hard to know if he was even
> using them properly.

The header of a .dbf file details the field composition such as name, 
size, type, etc.  The type is C for character, L for logical, etc, and 
the end of the field definition block is signaled by a CR byte.

So in one spot of my code I (used to) have a comparison

if hdr[0] == b'\x0d': # end of fields

which I have changed to

if hdr[0] == 0x0d:

and elsewhere:

field_type = hdr[11]

which is now

field_type = chr(hdr[11])

since the first 127 positions of unicode are ASCII.

However, I can see this silently producing errors for values between 128 
and 255 -- consider:

--> chr(0xa1)
'¡'
--> b'\xa1'.decode('cp1251')
'\u040e'

So because my single element access to the byte string lost its bytes 
type, I may no longer get the correct result.


From dsdale24 at  Fri May 20 15:14:41 2011
From: dsdale24 at (Darren Dale)
Date: Fri, 20 May 2011 09:14:41 -0400
Subject: [Python-ideas] Python 3.x and bytes
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<ir58nn$8tl$> <>
Message-ID: <>

On Fri, May 20, 2011 at 9:05 AM, Ethan Furman <ethan at> wrote:
> Terry Reedy wrote:
>> As far as I noticed, Ethan did not explain why he was extracting single
>> bytes and comparing to a constant, so it is hard to know if he was even
>> using them properly.
> The header of a .dbf file details the field composition such as name, size,
> type, etc.  The type is C for character, L for logical, etc, and the end of
> the field definition block is signaled by a CR byte.
> So in one spot of my code I (used to) have a comparison
> if hdr[0] == b'\x0d': # end of fields
> which I have changed to
> if hdr[0] == 0x0d:
> and elsewhere:
> field_type = hdr[11]
> which is now
> field_type = chr(hdr[11])
> since the first 127 positions of unicode are ASCII.
> However, I can see this silently producing errors for values between 128 and
> 255 -- consider:
> --> chr(0xa1)
> '¡'
> --> b'\xa1'.decode('cp1251')
> '\u040e'
> So because my single element access to the byte string lost its bytes type,
> I may no longer get the correct result.

Can you use a single element stride as a workaround?

>>> b'01234'
b'01234'
>>> b'01234'[0]
48
>>> b'01234'[0:1]
b'0'

From p.f.moore at  Fri May 20 15:28:19 2011
From: p.f.moore at (Paul Moore)
Date: Fri, 20 May 2011 14:28:19 +0100
Subject: [Python-ideas] Python 3.x and bytes
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<ir58nn$8tl$> <>
Message-ID: <>

On 20 May 2011 14:05, Ethan Furman <ethan at> wrote:
> Terry Reedy wrote:
>> As far as I noticed, Ethan did not explain why he was extracting single
>> bytes and comparing to a constant, so it is hard to know if he was even
>> using them properly.
> The header of a .dbf file details the field composition such as name, size,
> type, etc.  The type is C for character, L for logical, etc, and the end of
> the field definition block is signaled by a CR byte.
> So in one spot of my code I (used to) have a comparison
> if hdr[0] == b'\x0d': # end of fields
> which I have changed to
> if hdr[0] == 0x0d:

This seems to me to be an improvement, regardless...

> and elsewhere:
> field_type = hdr[11]
> which is now
> field_type = chr(hdr[11])
> since the first 127 positions of unicode are ASCII.

That seems reasonable, if you have a fixed set of known-ASCII values
that are field types. If you care about detecting invalid files, then
do a field_type in 'CL...' test to validate and you're fine.

> However, I can see this silently producing errors for values between 128 and
> 255 -- consider:
> --> chr(0xa1)
> '¡'
> --> b'\xa1'.decode('cp1251')
> '\u040e'

But those aren't valid field codes, so why do you care? And why are
you using cp1251? I thought you said they were ASCII? As I said, if
you're checking for error values, just start with either a check for
specific values, or simply check the field type is <128.

> So because my single element access to the byte string lost its bytes type,
> I may no longer get the correct result.

I still don't see your problem here...


From ncoghlan at  Fri May 20 17:03:04 2011
From: ncoghlan at (Nick Coghlan)
Date: Sat, 21 May 2011 01:03:04 +1000
Subject: [Python-ideas] function defaults and an empty() builtin
In-Reply-To: <>
References: <>
Message-ID: <>

I share Steve's puzzlement as to the intended use case.

To get value from the magic empty immutable list, you will have to
explicitly test that calling your function with the default value does
the right thing.

But if you're writing an explicit test, having that test call the
function *twice* to confirm correct use of the 'is None' idiom will
work just as well.
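
A sketch of such a test, assuming a hypothetical function collect()
that uses the 'is None' idiom. If collect() were written with
seen=[] instead, the second assertion would fail.

```python
import unittest


def collect(item, seen=None):
    # Fresh list per default call, thanks to the 'is None' idiom.
    if seen is None:
        seen = []
    seen.append(item)
    return seen


class DefaultIsFreshTest(unittest.TestCase):
    def test_two_default_calls_do_not_share_state(self):
        # Calling twice is what exposes a shared mutable default.
        self.assertEqual(collect("a"), ["a"])
        self.assertEqual(collect("b"), ["b"])
```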

There are limits to how much we can help people that don't test their code.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Fri May 20 17:16:46 2011
From: ncoghlan at (Nick Coghlan)
Date: Sat, 21 May 2011 01:16:46 +1000
Subject: [Python-ideas] Python 3.x and bytes
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<ir58nn$8tl$> <>
Message-ID: <>

On Fri, May 20, 2011 at 11:05 PM, Ethan Furman <ethan at> wrote:
> which is now
> field_type = chr(hdr[11])

This is definitely a modelling problem, and exactly the kind of
thinking that the bytes model in Py3k is intended to combat.

Bytes are not text, even when you're dealing primarily with ASCII. The
world where that mindset worked consistently and reliably is ancient
history (and many non-English speakers still suffer annoying software
glitches due to the fact that English speakers have been able to get
by with only ASCII for so long).

If you want a subscript on a bytes object to create another bytes
object, then slice it, just as you would a list. If you want the
integer value, index it.
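
To make the distinction concrete, here is a small sketch in Python 3.
The two-byte value of hdr is invented for the example and is not a
real .dbf header.

```python
hdr = b"\x0dC"        # made-up sample bytes

as_int = hdr[0]       # indexing yields an int
as_bytes = hdr[0:1]   # slicing yields a length-1 bytes object

is_end_marker = (as_int == 0x0D)        # compare ints with ints
is_end_marker_too = (as_bytes == b"\r") # compare bytes with bytes
```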

> So because my single element access to the byte string lost its bytes type, I may no longer get the correct result.

Umm, no. You may not get the correct result because you're telling
Python to interpret a value as a Unicode code point when it is
actually no such thing (given your example, I assume it is actually
cp1251 encoded text). Therefore, instead of:

chr(hdr[11]) # Only makes sense for a sequence of Unicode code points

you want something like:

hdr[11:12].decode('cp1251') # Makes sense for a cp1251 encoded byte sequence
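
A short sketch of the difference for the byte 0xa1 from the example
above, assuming the data really is cp1251 encoded (the variable names
are illustrative only):

```python
raw = b"\xa1"

# chr() treats the integer as a Unicode code point: U+00A1.
as_code_point = chr(raw[0])

# decode() treats the byte as cp1251-encoded text: U+040E.
as_cp1251 = raw[0:1].decode("cp1251")

same_character = (as_code_point == as_cp1251)
```

The two results differ, which is why chr() gives a wrong answer for
bytes above 127 unless the data happens to be Latin-1.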


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ethan at  Fri May 20 18:31:04 2011
From: ethan at (Ethan Furman)
Date: Fri, 20 May 2011 09:31:04 -0700
Subject: [Python-ideas] function defaults and an empty() builtin
In-Reply-To: <>
References: <>	<>	<>	<>
Message-ID: <>

Masklinn wrote:
> I'm guessing the point is to be able to avoid the `if collection is None`
> dance when the collection is not *supposed* to be modified: an immutable
> collection would immediately raise on modification, acting as a
> precondition/invariant and ensuring mutation is not introduced on the
> original collection.

If the function can't proceed properly without an actual parameter, why 
supply a default?  Make it required, and then the function will blow up 
when it's called without one.

I suppose there could be a case where one is going to iterate through a 
collection, and useful work may still happen if said collection is 
empty, and one is feeling too lazy to create an empty one on the spot 
where the function is called and so relies on the immutable empty 
default... but if one knows all that one should be able to not call any 
mutating methods.

But the original problem is that an empty list is used as the default 
because an actual list is expected.  I think the problem has been 
misunderstood -- it's not *if* the list gets modified, but *when* -- so 
you would have the same dance, only instead of None, you're now saying

if default == empty():
     default = []

So you haven't saved a thing, and still don't really get the purpose 
behind mutable defaults.


From janssen at  Fri May 20 19:35:12 2011
From: janssen at (Bill Janssen)
Date: Fri, 20 May 2011 10:35:12 PDT
Subject: [Python-ideas] Python 3.x and bytes
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<ir58nn$8tl$> <>
Message-ID: <>

Nick Coghlan <ncoghlan at> wrote:

> On Fri, May 20, 2011 at 11:05 PM, Ethan Furman <ethan at> wrote:
> > which is now
> >
> > field_type = chr(hdr[11])
> This is definitely a modelling problem, and exactly the kind of
> thinking that the bytes model in Py3k is intended to combat.
> Bytes are not text, even when you're dealing primarily with ASCII. The

To me, that's the crux of this issue, and that's the reason this will
keep coming up again and again, and that's the reason people will
continue to want to "improve" the 'bytes' type to be more 'string-like'.

The problem, of course, is that bytes often *are* text, in the sense
that the byte sequence contains an encoded string, and the programmer
both knows that and wants that.  Even for non-ASCII strings.  Because
Python is widely used for processing encoded strings of various kinds,
and programmers hate to decode/encode just to work on them *as* strings.

Mind you, that's exactly the wrong thing to do, in my opinion.  It just
gets us back to the bad old days of Python 2, where strings were often
kept in a sequence of bytes which had no way of indicating what encoding
it had.  But changing the mindset of programmers?  Hard to do, very hard
to do.

Personally, I think a more realistic approach might be to (a) improve
the implementation of 'str()' so that it avoids unnecessary
decode/encode operations, decoding only when necessary (yes, that means
there would be multiple C-level representations for a 'str'), and then
(b) making 'bytes' less useful as strings.


From masklinn at  Fri May 20 19:51:30 2011
From: masklinn at (Masklinn)
Date: Fri, 20 May 2011 19:51:30 +0200
Subject: [Python-ideas] function defaults and an empty() builtin
In-Reply-To: <>
References: <>	<>	<>	<>
Message-ID: <>

On 2011-05-20, at 18:31 , Ethan Furman wrote:
> Masklinn wrote:
>> I'm guessing the point is to be able to avoid the `if collection is None`
>> dance when the collection is not *supposed* to be modified: an immutable
>> collection would immediately raise on modification, acting as a
>> precondition/invariant and ensuring mutation is not introduced on the
>> original collection.
> If the function can't proceed properly without an actual parameter, why supply a default?
It can, where did you get the idea that it could not? That's the point of the default parameter.

> But the original problem is that an empty list is used as the default because an actual list is expected.  I think the problem has been misunderstood -- it's not *if* the list gets modified, but *when* -- so you would have the same dance, only instead of None, your now saying
> if default == empty():
>    default = []
> So you haven't saved a thing, and still don't really get the purpose behind mutable defaults.
No, the point of empty() (or whatever it would be called) would very much be to forbid mutation of the default parameter. I used the word *if* because that is precisely what I meant: if the default parameter is modified, an error has been introduced into the function.

empty() is both an empty list (because the code iterates over a list for instance, or maps it, or what have you) and an assertion that this list is *not* to be modified.

From masklinn at  Fri May 20 20:15:18 2011
From: masklinn at (Masklinn)
Date: Fri, 20 May 2011 20:15:18 +0200
Subject: [Python-ideas] function defaults and an empty() builtin
In-Reply-To: <>
References: <>
Message-ID: <>

On 2011-05-20, at 17:03 , Nick Coghlan wrote:
> I share Steve's puzzlement as the intended use case.
> To get value from the magic empty immutable list, you will have to
> explicitly test that calling your function with the default value does
> the right thing.
Why is that? The value of the empty immutable list (there's nothing magic
to it) would be an eternal assertion that an incorrect behavior (trying
to mutate the default parameter) can not be introduced in the function.

It is no different than adding `assert` calls in the code.

> But if you're writing an explicit test, having that test call the
> function *twice* to confirm correct use of the 'is None' idiom will
> work just as well.
But that's the point: do you *always* use the `is None` idiom? And do
you really love it? When you know the function body you just wrote
does not perform any modification to the collection?

There are 17 functions or methods with list default parameters and
133 with dict default parameters in the Python standard library.

Surely some of them legitimately make use of a mutable default
parameter as some kind of process-wide cache or accumulator, but
I would doubt the majority does (why would SMTP.sendmail need to
accumulate data in its mail_options parameter across runs?)

Do you know for sure that no mutation of these 150+ parameters will
ever be introduced, that all of these functions and methods are
sufficiently tested, called often enough that the introduction of
a mutation of the default parameter in themselves or one of their
callees would *never* be able to pass muster?

From ethan at  Fri May 20 20:56:48 2011
From: ethan at (Ethan Furman)
Date: Fri, 20 May 2011 11:56:48 -0700
Subject: [Python-ideas] function defaults and an empty() builtin
In-Reply-To: <>
References: <>	<>	<>	<>
Message-ID: <>

Masklinn wrote:
> On 2011-05-20, at 18:31 , Ethan Furman wrote:
>> Masklinn wrote:
>>> I'm guessing the point is to be able to avoid the `if collection is None`
>>> dance when the collection is not *supposed* to be modified: an immutable
>>> collection would immediately raise on modification, acting as a
>>> precondition/invariant and ensuring mutation is not introduced on the
>>> original collection.
>> If the function can't proceed properly without an actual parameter, why supply
>> a default?
> It can, where did you get the idea that it could not? That's the point of the
> default parameter.

Yes, I am aware.  And the point of providing an empty list as a default 
is so you have a list to add things to -- so what have you gained by 
providing an empty frozen list as a default?  Seems to me all you have 
now is a built-in time bomb -- every call = a blow up.

>> But the original problem is that an empty list is used as the default because
>> an actual list is expected.  I think the problem has been misunderstood -- it's
>> not *if* the list gets modified, but *when* -- so you would have the same dance,
>> only instead of None, your now saying
>> if default == empty():
>>    default = []
>> So you haven't saved a thing, and still don't really get the purpose behind
>> mutable defaults.
> No, the point of empty() (or whatever it would be called) would very much be to
> forbid mutation of the default parameter. I used the word *if* because that is
> precisely what I meant: if the default parameter is modified, an error has been
> introduced into the function.
> empty() is both an empty list (because the code iterates over a list for instance,
> or maps it, or what have you) and an assertion that this list is *not* to be
> modified.

So what happens when you provide a *real* list, that is to be modified?
Not modify it?  Or have code that is constantly checking to see if
it's okay to modify the list because it might be the immutable empty()
object?


From masklinn at  Fri May 20 20:48:15 2011
From: masklinn at (Masklinn)
Date: Fri, 20 May 2011 20:48:15 +0200
Subject: [Python-ideas] function defaults and an empty() builtin
In-Reply-To: <>
References: <>	<>	<>	<>
Message-ID: <>

On 2011-05-20, at 20:56 , Ethan Furman wrote:

> Masklinn wrote:
>> On 2011-05-20, at 18:31 , Ethan Furman wrote:
>>> Masklinn wrote:
>>>> I'm guessing the point is to be able to avoid the `if collection is None`
>>>> dance when the collection is not *supposed* to be modified: an immutable
>>>> collection would immediately raise on modification, acting as a
>>>> precondition/invariant and ensuring mutation is not introduced on the
>>>> original collection.
>>> If the function can't proceed properly without an actual parameter, why supply
>>> a default?
>> It can, where did you get the idea that it could not? That's the point of the
>> default parameter.
> Yes, I am aware.  And the point of providing an empty list as a default is so you have a list to add things to
Not at all, you may just want to iterate on it, or accumulate it. There are cases of exactly this in the standard library itself.

> -- so what have you gained by providing an empty frozen list as a default?  Seems to me all you have now is a built-in time bomb -- every call = a blow up.
See above, your assumption is flawed and all reasoning following it is nonsense.

>>> But the original problem is that an empty list is used as the default because
>>> an actual list is expected.  I think the problem has been misunderstood -- it's
>>> not *if* the list gets modified, but *when* -- so you would have the same dance,
>>> only instead of None, your now saying
>>> if default == empty():
>>>   default = []
>>> So you haven't saved a thing, and still don't really get the purpose behind
>>> mutable defaults.
>> No, the point of empty() (or whatever it would be called) would very much be to
>> forbid mutation of the default parameter. I used the word *if* because that is
>> precisely what I meant: if the default parameter is modified, an error has been
>> introduced into the function.
>> empty() is both an empty list (because the code iterates over a list for instance,
>> or maps it, or what have you) and an assertion that this list is *not* to be
>> modified.
> So what happens when you provide a *real* list, that is to be modified?
Again, this default parameter is for functions which *are not supposed to* modify
collections they were provided as parameters (which is the vast majority of functions,

>  Not modify it?  Or have code that is constantly checking to see if it's okay to modify the list because it might be the immutable empty() object?
This "objection" is absolutely nonsensical. Please cease.

From ethan at  Fri May 20 21:12:45 2011
From: ethan at (Ethan Furman)
Date: Fri, 20 May 2011 12:12:45 -0700
Subject: [Python-ideas] function defaults and an empty() builtin
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>
Message-ID: <>

Masklinn wrote:
> On 2011-05-20, at 17:03 , Nick Coghlan wrote:
>> I share Steve's puzzlement as the intended use case.
>> To get value from the magic empty immutable list, you will have to
>> explicitly test that calling your function with the default value does
>> the right thing.
> Why is that? The value of the empty immutable list (there's nothing magic
> to it) would be an eternal assertion that an incorrect behavior (trying
> to mutate the default parameter) can not be introduced in the function.
> It is no different than adding `assert` calls in the code.
>> But if you're writing an explicit test, having that test call the
>> function *twice* to confirm correct use of the 'is None' idiom will
>> work just as well.
> But that's the point: do you *always* use the `is None` idiom? And do
> you really love it? When you know the function body you just wrote
> does not perform any modification to the collection?

In this scenario:

def func(mylist=empty()):

mylist is the empty() object, you *know* func() does not modify mylist, 
you are wrong (heh) and it does... but your program always calls func() 
with an actual list -- how is empty() going to save you then?

Hint:  it won't.

And if you're thinking a unittest would catch that -- yes it would, but 
it would also catch it without empty() (make a copy first, call the 
func(), compare afterwards -- different?  Mutation!)
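
A minimal sketch of that copy-call-compare check. The helper name
`assert_does_not_mutate` and the two sample functions are hypothetical,
chosen only for illustration:

```python
import copy


def assert_does_not_mutate(func, arg):
    """Call func(arg) and fail if arg was changed in place."""
    before = copy.deepcopy(arg)
    func(arg)
    assert arg == before, "Mutation!"


def well_behaved(mylist):
    return sum(mylist)


def leaky(mylist):
    mylist.append(0)  # the bug the test is meant to catch
    return sum(mylist)


assert_does_not_mutate(well_behaved, [1, 2, 3])  # passes silently

caught = None
try:
    assert_does_not_mutate(leaky, [1, 2, 3])
except AssertionError as exc:
    caught = str(exc)
```

The check works whether or not the function has a default argument at
all, which is the point being made: the test does not depend on empty().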


From tjreedy at  Fri May 20 21:05:17 2011
From: tjreedy at (Terry Reedy)
Date: Fri, 20 May 2011 15:05:17 -0400
Subject: [Python-ideas] Python 3.x and bytes
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<ir21ma$72p$>	<>	<ir58nn$8tl$>
Message-ID: <ir6e1c$hvj$>

On 5/20/2011 9:05 AM, Ethan Furman wrote:

> The header of a .dbf file details the field composition such as name,
> size, type, etc. The type is C for character, L for logical, etc, and
> the end of the field definition block is signaled by a CR byte.

At the level of bytes, these are small int codes. For English speakers, 
it is convenient that most map to ascii chars that are the first letters 
of an English name of the type. This convenience is somewhat lost for 
non-English non-latin-alphabet speakers who cannot do the same.

> So in one spot of my code I (used to) have a comparison
> if hdr[0] == b'\x0d': # end of fields
> which I have changed to
> if hdr[0] == 0x0d:

Some people dislike magic constants in code and would suggest defining 
them at the top of the file (or even in a separate module) with comments 
that define and explain the protocol.

# Field type codes
T_log = ... # Logical field with T or F <or whatever>
T_char= ... # Variable length char field <or whatever>
T_efdb= 0x0d # End of field definition block

Take your pick of how to define the constants:
 >>> 0x0d == 13 == 0o15 == 0b1101 == ord(b'\r') == ord('\r') == b'\r'[0]

In 3.x, the identifiers and comments can use any characters and language, 
so this works for everyone.
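
A small sketch of the named-constant approach, assuming Python 3, where
indexing a bytes object yields an int. The constant name and the sample
header bytes are made up for illustration; only the 0x0d value comes from
the .dbf description above:

```python
# Hypothetical constant name; 0x0d is the CR byte that ends the
# field definition block.
END_OF_FIELDS = 0x0d

# Made-up header bytes, just enough to exercise the comparison.
hdr = b"\r" + b"remaining header bytes"

# Indexing bytes gives an int in Python 3, so compare against an int.
is_end = hdr[0] == END_OF_FIELDS

# Slicing keeps the bytes type, so this comparison also works.
is_end_by_slice = hdr[0:1] == b"\r"

# Comparing an int to a bytes object is never equal in Python 3.
int_equals_bytes = hdr[0] == b"\r"
```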

Terry Jan Reedy

From ethan at  Fri May 20 21:22:52 2011
From: ethan at (Ethan Furman)
Date: Fri, 20 May 2011 12:22:52 -0700
Subject: [Python-ideas] function defaults and an empty() builtin
In-Reply-To: <>
References: <>	<>	<>	<>
Message-ID: <>

Masklinn wrote:
> On 2011-05-20, at 20:56 , Ethan Furman wrote:
>> Masklinn wrote:
 >>> Ethan wrote:
>>>> If the function can't proceed properly without an actual parameter, why supply
>>>> a default?
>>> It can, where did you get the idea that it could not? That's the point of the
>>> default parameter.
>> Yes, I am aware.  And the point of providing an empty list as a default is so you have a list to add things to
> Not at all, you may just want to iterate on it, or accumulate it. There are cases of exactly this in the standard library itself.

Um, isn't accumulating modifying?  Or do you mean accumulating in a 
global or class instance? And why would you iterate over an empty list? 
  If you have an example from the stdlib I'd love to see it (seriously 
-- I'm always up for learning something).


From masklinn at  Fri May 20 21:10:25 2011
From: masklinn at (Masklinn)
Date: Fri, 20 May 2011 21:10:25 +0200
Subject: [Python-ideas] function defaults and an empty() builtin
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>
Message-ID: <>

On 2011-05-20, at 21:12 , Ethan Furman wrote:
> In this scenario:
> def func(mylist=empty()):
>   do_some_stuff_with_mylist
> mylist is the empty() object, you *know* func() does not modify mylist, you are wrong (heh) and it does... but your program always calls func() with an actual list -- how is empty() going to save you then?
It will not until one day func() is called without an actual list, and then you get a clear and immediate error instead of silent data corruption, or a memory leak, which are generally the result of an improperly mutated default argument collection and much harder to spot.

As I wrote in part of the message you quoted (but ignored), empty() acts as an assertion. The assertion is that the default parameter will never be modified, and if the assertion fails, an error is generated.

That's it. That's a pretty common bug in Python, and it solves it. No more, and no less.
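
A rough sketch of how such an object might behave. No empty() builtin
exists; the class below is a hypothetical stand-in, and raising ValueError
on mutation follows the description in the original proposal:

```python
class _Empty(object):
    """Hypothetical stand-in for the proposed empty() object."""

    def __iter__(self):
        return iter(())

    def __len__(self):
        return 0

    def __contains__(self, item):
        return False

    def get(self, key, default=None):
        return default

    def items(self):
        return ()

    def _refuse(self, *args, **kwargs):
        raise ValueError("the shared empty default must not be modified")

    # Every mutating method funnels into the same error.
    append = extend = insert = remove = pop = clear = _refuse
    update = setdefault = __setitem__ = __delitem__ = _refuse


_EMPTY = _Empty()


def empty():
    return _EMPTY


def func(mylist=empty()):
    mylist.append(1)  # bug: mutates its argument


func([])  # a real list hides the bug

message = None
try:
    func()  # the default exposes it immediately
except ValueError as exc:
    message = str(exc)
```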

From tjreedy at  Fri May 20 21:11:12 2011
From: tjreedy at (Terry Reedy)
Date: Fri, 20 May 2011 15:11:12 -0400
Subject: [Python-ideas] function defaults and an empty() builtin
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>
Message-ID: <ir6ece$hvj$>

On 5/20/2011 1:51 PM, Masklinn wrote:

I am as puzzled as other people.

> empty() is both an empty list (because the code iterates over a list
> for instance, or maps it, or what have you) and an assertion that
> this list is *not* to be modified.

So use () as the default. It has all the methods of [] except for the 
mutation methods.
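
A short sketch of that suggestion; the function names are illustrative
only:

```python
def total(values=()):
    # Reading works the same for a tuple and a list.
    return sum(v for v in values)


def buggy(values=()):
    values.append(1)  # tuples have no append method
    return values


default_total = total()
list_total = total([1, 2, 3])

failed_fast = False
try:
    buggy()
except AttributeError:
    failed_fast = True
```

The accidental mutation fails on the first call that uses the default,
rather than silently changing a shared list.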

Terry Jan Reedy

From bruce at  Fri May 20 21:10:55 2011
From: bruce at (Bruce Leban)
Date: Fri, 20 May 2011 12:10:55 -0700
Subject: [Python-ideas] function defaults and an empty() builtin
In-Reply-To: <>
References: <>
Message-ID: <>

It seems to me that a better way of doing this is:

    def func(optional_list=[]):
        optional_list = freeze(optional_list)

That is, if I expect the list to be immutable when it's empty, why wouldn't
I expect it to be immutable when it's not empty? The only case where the
immutability of the empty list matters is if there's a bug that changes the
list (or a called function changes its signature when that wasn't expected).
Wouldn't it be worth protecting against that when the list isn't empty as
well?

Of course PEP 351 was rejected so there is no freeze() builtin. (Although
personally, I don't agree with all the arguments against it. For example, I
don't think that a frozen dict has to be hashable. I also think that
immutable objects are useful in unit testing, where they allow me to easily
be sure that a passed-in dict isn't changed by a function.)

Anyway, in the case of a list I suspect that this is pretty close to what
you want:

    def func(optional_list=[]):
        optional_list = tuple(optional_list)

--- Bruce

On Fri, May 20, 2011 at 11:15 AM, Masklinn <masklinn at> wrote:

> On 2011-05-20, at 17:03 , Nick Coghlan wrote:
> > I share Steve's puzzlement as the intended use case.
> >
> > To get value from the magic empty immutable list, you will have to
> > explicitly test that calling your function with the default value does
> > the right thing.
> Why is that? The value of the empty immutable list (there's nothing magic
> to it) would be an eternal assertion that an incorrect behavior (trying
> to mutate the default parameter) can not be introduced in the function.
> It is no different than adding `assert` calls in the code.
> > But if you're writing an explicit test, having that test call the
> > function *twice* to confirm correct use of the 'is None' idiom will
> > work just as well.
> But that's the point: do you *always* use the `is None` idiom? And do
> you really love it? When you know the function body you just wrote
> does not perform any modification to the collection?
> There are 17 functions or methods with list default parameters and
> 133 with dict default parameters in the Python standard library.
> Surely some of them legitimately make use of a mutable default
> parameter as some kind of process-wide cache or accumulator, but
> I would doubt the majority does (why would SMTP.sendmail need to
> accumulate data in its mail_options parameter across runs?)
> Do you know for sure that no mutation of these 150+ parameters will
> ever be introduced, that all of these functions and methods are
> sufficiently tested, called often enough that the introduction of
> a mutation of the default parameter in themselves or one of their
> callees would *never* be able to pass muster?

From masklinn at  Fri May 20 21:18:30 2011
From: masklinn at (Masklinn)
Date: Fri, 20 May 2011 21:18:30 +0200
Subject: [Python-ideas] function defaults and an empty() builtin
In-Reply-To: <>
References: <>	<>	<>	<>
Message-ID: <>

On 2011-05-20, at 21:22 , Ethan Furman wrote:
> Masklinn wrote:
>> On 2011-05-20, at 20:56 , Ethan Furman wrote:
>>> Masklinn wrote:
> >>> Ethan wrote:
>>>>> If the function can't proceed properly without an actual parameter, why supply
>>>>> a default?
> >>>
>>>> It can, where did you get the idea that it could not? That's the point of the
>>>> default parameter.
> >>
>>> Yes, I am aware.  And the point of providing an empty list as a default is so you have a list to add things to
> >
>> Not at all, you may just want to iterate on it, or accumulate it. There are cases of exactly this in the standard library itself.
> Um, isn't accumulating modifying?
No. `reduce` does not alter the list in place, nor does `sum`, `any` or iterating on the list.

>  Or do you mean accumulating in a global or class instance?
Accumulating can be done in anything.

> And why would you iterate over an empty list?
Because you're iterating period, and that it's an empty list has no influence on your behavior. You'll simply do nothing during your iteration, because the iteration count will be 0. Why special-case empty lists when there is no need to?

Same with dict, `get` works on empty dicts as well as on any other such collection.

>  If you have an example from the stdlib I'd love to see it (seriously -- I'm always up for learning something).

mailcap does that at line 170: `subst` takes an empty list as a default parameter and forwards that parameter to `findparam`, which iterates over the list to try to find the param.

If it can't find the param in the list, it simply returns an empty string.

An empty list is simply a case where it will never find the param, and it will Just Work. No need to create a special case.
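
A sketch of the pattern described here. This is not the mailcap source;
`find_param` is a hypothetical function written to match the description
(iterate, return the match, else return an empty string):

```python
def find_param(name, plist):
    """Return the value of a 'name=value' entry in plist, or ''."""
    prefix = name.lower() + "="
    for entry in plist:
        if entry.lower().startswith(prefix):
            return entry[len(prefix):]
    return ""


found = find_param("charset", ["format=flowed", "charset=utf-8"])
missing = find_param("charset", ["format=flowed"])
from_empty = find_param("charset", [])  # loop body never runs
```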

Have you really never done such a thing?

From masklinn at  Fri May 20 21:27:42 2011
From: masklinn at (Masklinn)
Date: Fri, 20 May 2011 21:27:42 +0200
Subject: [Python-ideas] function defaults and an empty() builtin
In-Reply-To: <>
References: <>
Message-ID: <>

On 2011-05-20, at 21:10 , Bruce Leban wrote:
> It seems to me that a better way of doing this is:
>    def func(optional_list=[]):
>        optional_list = freeze(optional_list)
> That is, if I expect the list to be immutable when it's empty, why wouldn't
> I expect it to be immutable when it's not empty? The only case where the
> immutability of the empty list matters is if there's a bug that changes the
> list (or a called function changes its signature when that wasn't expected).
> Wouldn't it be worth protecting against that when the list isn't empty as
> well?

Absolutely, but the mutation of the default parameters seems to be the main
problem (historically): it's a memory leak, and it's a global data corruption,
where modifying a provided parameter is a local data corruption (unless the
object passed in is global of course).

Ideally, you could just add a decorator or an annotation doing that for you
without additional work and name mutation within the function.

> Anyway, in the case of a list I suspect that this is pretty close to what
> you want:
>    def func(optional_list=[]):
>        optional_list = tuple(optional_list)
But does not necessarily work depending on what callees demand (a callee
may be trying to concatenate that to a list of its own, and concatenating
lists and tuples does not work).

Plus, it does not help with dicts, which can expose the same issue.
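
The concatenation problem can be shown in a few lines. `callee` is a
hypothetical function standing in for any code that assumes it was
handed a list:

```python
def callee(items):
    # Assumes items is a list.
    return ["header"] + items


frozen = tuple(["a", "b"])

error = None
try:
    callee(frozen)  # list + tuple raises TypeError
except TypeError as exc:
    error = exc

# Converting back at the call boundary works, at the cost of a copy.
result = callee(list(frozen))
```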

From bruce at  Fri May 20 21:53:06 2011
From: bruce at (Bruce Leban)
Date: Fri, 20 May 2011 12:53:06 -0700
Subject: [Python-ideas] function defaults and an empty() builtin
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, May 20, 2011 at 12:27 PM, Masklinn <masklinn at> wrote:

> Absolutely, but the mutation of the default parameters seems to be the main
> problem (historically): it's a memory leak, and it's a global data
> corruption,
> where modifying a provided parameter is a local data corruption (unless the
> object passed in is global of course).
Agreed, but that wasn't how I interpreted Jack's question. I think there are
two issues (with the first one the one that I think Jack was targeting):

(1) I want to make sure a parameter is immutable so I don't accidentally
change it (and none of the functions I call can do that). Akin to declaring
a parameter const in C-like languages. For example, is_ip_blocked(ip_address,
blocked_ip_list). (That's not just ip_address in blocked_ip_list if the list
contains CIDR addresses.) We could add code to make the values immutable or
use annotations:

    def func(x : const, y : const = []):

Personally, I like declaring the contract that a parameter is not being
modified explicitly and I would like a shallow freeze() function.

(2) The gotcha that the default value is the same value every time rather
than a new value. Lots of ways to deal with this but none of them work
without educating people how the feature works. For example:

    def func(x : copy, y : copy = []):
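
One possible shape for the second idea, sketched as a decorator rather
than annotations. `fresh_defaults` is a hypothetical name, the sketch
assumes Python 3 (`inspect.signature`), and it makes a shallow copy of
every default, which is a simplification of the per-parameter `copy`
annotation shown above:

```python
import copy
import functools
import inspect


def fresh_defaults(func):
    """Give each call its own shallow copy of any omitted default."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, param in sig.parameters.items():
            has_default = param.default is not inspect.Parameter.empty
            if name not in bound.arguments and has_default:
                bound.arguments[name] = copy.copy(param.default)
        return func(*bound.args, **bound.kwargs)

    return wrapper


@fresh_defaults
def collect(item, bucket=[]):
    bucket.append(item)
    return bucket


first = collect(1)
second = collect(2)  # does not see the 1 from the previous call
```

Without the decorator the second call would return [1, 2], which is the
gotcha being described.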

--- Bruce

From stutzbach at  Fri May 20 22:21:02 2011
From: stutzbach at (Daniel Stutzbach)
Date: Fri, 20 May 2011 13:21:02 -0700
Subject: [Python-ideas] function defaults and an empty() builtin
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, May 19, 2011 at 9:46 PM, Jack Diederich <jackdied at> wrote:

> return nothing when asked for something and raise a ValueError when
> any attempt is made to add/remove items.

Couldn't you just use the empty immutable version for whatever type the
optional might be?  For sequences, use ().  For sets, use frozenset().  For
dicts, use ... oh.  Crap.
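
For the dict case, later Python versions (3.3 and up) provide
`types.MappingProxyType`, a read-only view of a mapping. It was not
available when this thread was written. A sketch, with hypothetical
names:

```python
from types import MappingProxyType

# A read-only view over an empty dict, shared safely as a default.
EMPTY_DICT = MappingProxyType({})


def count_filters(include=EMPTY_DICT):
    return len(list(include.items()))


rejected = False
try:
    EMPTY_DICT["field"] = [1]  # item assignment is not supported
except TypeError:
    rejected = True
```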

Daniel Stutzbach

From ericsnowcurrently at  Fri May 20 22:24:08 2011
From: ericsnowcurrently at (Eric Snow)
Date: Fri, 20 May 2011 14:24:08 -0600
Subject: [Python-ideas] function defaults and an empty() builtin
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, May 20, 2011 at 1:53 PM, Bruce Leban <bruce at> wrote:

> On Fri, May 20, 2011 at 12:27 PM, Masklinn <masklinn at> wrote:
>> Absolutely, but the mutation of the default parameters seems to be the
>> main
>> problem (historically): it's a memory leak, and it's a global data
>> corruption,
>> where modifying a provided parameter is a local data corruption (unless
>> the
>> object passed in is global of course).
> Agreed, but that wasn't how I interpreted Jack's question. I think there
> are two issues (with the first one the one that I think Jack was targeting):
> (1) I want to make sure a parameter is immutable so I don't accidentally
> change it (and none of the functions I call can do that). Akin to declaring
> a parameter const in C-like languages. For example, is_ip_blocked(ip_address,
> blocked_ip_list). (That's not just ip_address in blocked_ip_list if the
> list contains CIDR addresses.) We could add code to make the values
> immutable or use annotations:
>     @const
>     def func(x : const, y : const = []):
>         pass
> Personally, I like declaring the contract that a parameter is not being
> modified explicitly and I would like a shallow freeze() function.
> (2) The gotcha that the default value is the same value every time rather
> than a new value. Lots of ways to deal with this but none of them work
> without educating people how the feature works. For example:
>     @copy
>     def func(x : copy, y : copy = []):
>         pass
One bad solution to both would be to have the language enforce that default
values cannot be of mutable type.  Though not a valid solution, that idea
highlights the only two reasons I can see for using mutable defaults:

- caching across function calls
- having your default be of the same type as your expected argument (a
documentation of sorts)

The idiom of using None that Jack originally described is an acceptable
alternative to using mutable defaults.  It identifies no expectations on the
type of the argument.  It explicitly indicates that None is a valid
argument.  It implies that it will be special-cased in the function/class.
 It is easy to be consistent using None regardless of the expected type of
the argument.

The advantage of Jack's original proposal is that in cases where you are not
modifying the argument you won't need to plug the correct object in for
None, so that if statement he included would not be necessary.



From jackdied at  Sat May 21 01:10:01 2011
From: jackdied at (Jack Diederich)
Date: Fri, 20 May 2011 19:10:01 -0400
Subject: [Python-ideas] function defaults and an empty() builtin
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, May 20, 2011 at 11:03 AM, Nick Coghlan <ncoghlan at> wrote:
> I share Steve's puzzlement as the intended use case.
> To get value from the magic empty immutable list, you will have to
> explicitly test that calling your function with the default value does
> the right thing.
> But if you're writing an explicit test, having that test call the
> function *twice* to confirm correct use of the 'is None' idiom will
> work just as well.
> There are limits to how much we can help people that don't test their code.

The use case isn't very fancy, it's to have a generic empty iterable
as a place holder in function defs instead of doing the "if x is None"
dance.  Unit tests make using a real empty iterable less likely to
trigger bad behavior, but because the behavior of real empty iterables
in function defs is tricky and non-intuitive, unit tests for some of
those functions would need some extra boilerplate.

That might not be a bad tradeoff compared to adding extra "if x is
None" checks for each optional arg to a function.

FYI, here is the code that triggered the query.  The "if None" check
is mostly habit with a small dose of pedagogical reinforcement for
other devs that would read it.

def query_sphinx(search_text, include=None, exclude=None):
    if include is None:
        include =  {}
    if exclude is None:
        exclude = {}

    query = sphinxapi.client()

    for field, values in include.items():
        query.SetFilter(field, values)
    for field, values in exclude.items():
        query.SetFilter(field, values, exclude=True)

    return query.query(search_text)


From ericsnowcurrently at  Sat May 21 02:03:17 2011
From: ericsnowcurrently at (Eric Snow)
Date: Fri, 20 May 2011 18:03:17 -0600
Subject: [Python-ideas] __implements__ on arguments to ABCMeta.register
Message-ID: <>

ABCMeta.register is great.  It adds the cls argument to the _abc_registry of
the ABC.  However, the class that was passed in does not get touched.  If
you then want to find out to which classes a class has been registered, you
can't find out from that class.  Whereas you can find out from an abstract
base class which classes have been registered to it.

I propose having ABCMeta.register add/update a special method __implements__
to the class that is getting registered.  This would not be done to
builtin/extension types.  It adds the ABC to the __implements__ of the
subclass that is getting registered.  Something along these lines, right
before the final return in the method:

        if not hasattr(subclass, "__implements__"):
            try:
                subclass.__implements__ = {cls}
            except TypeError:
                pass

This is a small addition, but I realize it [potentially] adds another
special method to classes, so it's not trivial.

The use case is that I want to be able to validate that a class implements
all of the abstract methods of all the classes to which it has been
registered.  I don't have a programmatic way of discovering that set without
asking every class out there.  This is an easy way to accomplish this (for
non-extension/non-builtin types).  An alternative is to subclass ABCMeta and
tack this on, but that only works for my ABCs.  Another is to use a class
decorator to do this any place I do a register (or even to do the register
too), but again, only for the places that I do the registration.

Anyway, if it's useful to me then it may be useful to others, so I wanted to
put this out there.  I expect this has come up before, particularly during
discussions about PEP 3119.  However, I wasn't able to track down anything
specifically about doing this sort of "reverse registration".  And, of
course, I may be overestimating the value of this functionality.  If this
does not seem that valuable to anyone else, then no big deal.  :)
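
A sketch of the class-decorator alternative mentioned above, assuming
Python 3.4 or later for `abc.ABC`. The decorator `implements`, the helper
`missing_abstract_methods`, and the sample classes are all hypothetical;
nothing here changes ABCMeta.register itself:

```python
from abc import ABC, abstractmethod


def implements(*abcs):
    """Register the class with each ABC and record the reverse link."""
    def decorate(cls):
        for abc_cls in abcs:
            abc_cls.register(cls)
        known = set(getattr(cls, "__implements__", ()))
        cls.__implements__ = known | set(abcs)
        return cls
    return decorate


def missing_abstract_methods(cls):
    """Names of abstract methods the class fails to provide."""
    missing = set()
    for abc_cls in getattr(cls, "__implements__", ()):
        for name in abc_cls.__abstractmethods__:
            if not callable(getattr(cls, name, None)):
                missing.add(name)
    return missing


class Shape(ABC):
    @abstractmethod
    def area(self):
        """Return the area."""

    @abstractmethod
    def perimeter(self):
        """Return the perimeter."""


@implements(Shape)
class Square(object):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


gaps = missing_abstract_methods(Square)
```

As noted, this only covers registrations made through the decorator.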


From dbaker3448 at  Sat May 21 02:57:23 2011
From: dbaker3448 at (Dan Baker)
Date: Fri, 20 May 2011 19:57:23 -0500
Subject: [Python-ideas] Filtered "for" loop with list-comprehension-like
Message-ID: <>

One common pattern I run across when parsing plain text data files is
that I want to skip over blank lines when processing. If I wanted to
build a list of all non-blank lines in the file, I could simply do:

lines = [line for line in input_file if line.strip()]

But as a loop, it almost invariably gets written as:

for line in input_file:
   if not line.strip():
      continue
   # do the real processing

It seems odd that "for x in y if z" is allowed in comprehensions but
not in a regular for loop. Why not let

for x in y if z:
   do_stuff(x)

be a shorthand for

for x in y:
   if not z:
      continue
   do_stuff(x)
Similarly, I occasionally have multiple sections that need to be
handled differently. One way to write this is:
for line in input_file:
   if is_section_delimiter(line):
      break
   do_stuff_1(line)
for line in input_file: # this picks up where the last one left off
   if is_section_delimiter(line):
       break
   do_stuff_2(line)
etc.
(This is a little bit of a weird idiom with files since repeated
iteration over them remembers where it left off, at least in 2.7.)

It would be nice to have this shorthand for it:
for line in input_file while not is_section_delimiter(line):
   do_stuff_1(line)
for line in input_file while not is_section_delimiter(line):
   do_stuff_2(line)
etc.

This makes it more immediately clear (to me, at least) that it stops
at the end of the section. This could also be added to comprehensions;
it's somewhat tricky to emulate in comprehensions now. I think the
easiest way to do the equivalent of [f(x) for x in y while z] with a
comprehension is
a = [(f(x) if z else None) for x in y]
try:
   idx = a.index(None)
except ValueError: # no None found
   pass
else: # truncate before first None
   a = a[:idx]
but even that fails if None is a potentially valid result of f(x) (or
if you forget to use the try/except block and z was always True), and
it processes the entire list even though it may throw out a sizable
chunk of it immediately after. The only totally safe way I can think
of to do it now is by unpacking it into a loop:
a = []
for x in y:
   if not z:
      break
   a.append(f(x))

I think adding these would make such idioms a little more readable,
but it might not be enough of a gain to justify a syntax addition.

Thoughts?
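
For comparison, the standard library's `itertools.takewhile` already
covers the "stop at the first failure" case when the condition can be
written as a function of the item. A sketch with made-up data and
helper names:

```python
from itertools import takewhile


def f(x):
    return x * x


def is_positive(x):
    return x > 0


y = [1, 2, 3, -1, 4]

# Roughly the proposed [f(x) for x in y while x > 0]
a = [f(x) for x in takewhile(is_positive, y)]


def not_delimiter(line):
    return line != "---"


# On a shared iterator, each takewhile consumes the delimiter it stops
# on, so the next one starts with the following section.
lines = iter(["a", "b", "---", "c", "d"])
section_1 = list(takewhile(not_delimiter, lines))
section_2 = list(takewhile(not_delimiter, lines))
```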

Dan Baker

From ben+python at  Sat May 21 03:51:51 2011
From: ben+python at (Ben Finney)
Date: Sat, 21 May 2011 11:51:51 +1000
Subject: [Python-ideas] Filtered "for" loop with list-comprehension-like
References: <>
Message-ID: <>

Dan Baker <dbaker3448 at> writes:

> It seems odd that "for x in y if z" is allowed in comprehensions but
> not in a regular for loop. Why not let
> for x in y if z:
>    do_stuff(x)
> be a shorthand for
> for x in y:
>    if not z:
>       continue
>    do_stuff(x)

This can already be spelled:

    for x in (w for w in y if z):
        do_stuff(x)

Which is not to forestall discussion of the proposed language change,
but only to point out that there is an existing idiom for this.

> Similarly, I occasionally have multiple sections that need to be
> handled differently. One way to write this is:
> for line in input_file:
>    if is_section_delimiter(line):
>       break
>    do_stuff_1(line)
> for line in input_file: # this picks up where the last one left off
>    if is_section_delimiter(line):
>        break
>    do_stuff_2(line)
> etc.

That looks like it would be better modelled with an explicit state
transition when the condition is encountered, without stopping the
loop:

    handlers = [do_stuff_1, do_stuff_2, do_stuff_3]
    handle_line = handlers.pop(0)
    for line in input_file:
        if is_section_delimiter(line):
            handle_line = handlers.pop(0)
            continue
        handle_line(line)

 \       "Philosophy is questions that may never be answered. Religion |
  `\             is answers that may never be questioned." --anonymous |
_o__)                                                                  |
Ben Finney

From guido at  Sat May 21 04:10:13 2011
From: guido at (Guido van Rossum)
Date: Fri, 20 May 2011 19:10:13 -0700
Subject: [Python-ideas] Filtered "for" loop with list-comprehension-like
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, May 20, 2011 at 5:57 PM, Dan Baker <dbaker3448 at> wrote:
> One common pattern I run across when parsing plain text data files is
> that I want to skip over blank lines when processing. If I wanted to
> build a list of all non-blank lines in the file, I could simply do:
> lines = [line for line in input_file if line.strip()]
> But as a loop, it almost invariably gets written as:
> for line in input_file:
>   if not line.strip():
>      continue
>   # do the real processing
> It seems odd that "for x in y if z" is allowed in comprehensions but
> not in a regular for loop. Why not let
> for x in y if z:
>   do_stuff(x)
> be a shorthand for
> for x in y:
>   if not z:
>      continue
>   do_stuff(x)

Yes, "why not" indeed.

Because you can already do that in any number of different ways; you
showed one (two if you count the comprehension), another is

for x in y:
  if z:
    do_stuff(x)
Do we really need more ways to spell the same thing?

(Hint: this is a rhetorical question. I recommend you study the zen of
Python before replying.)

> Similarly, I occasionally have multiple sections that need to be
> handled differently. One way to write this is:
> for line in input_file:
>   if is_section_delimiter(line):
>      break
>   do_stuff_1(line)
> for line in input_file: # this picks up where the last one left off
>   if is_section_delimiter(line):
>       break
>   do_stuff_2(line)
> etc.
> (This is a little bit of a weird idiom with files since repeated
> iteration over them remembers where it left off, at least in 2.7.)
> It would be nice to have this shorthand for it:
> for line in input_file while not is_section_delimiter(line):
>   do_stuff_1(line)
> for line in input_file while not is_section_delimiter(line):
>   do_stuff_2(line)
> etc.
> This makes it more immediately clear (to me, at least)

Ay, there's the rub.

More syntactical options means more things to learn for every single
Python user. One of the attractions of Python is that it is relatively
small and simple. Let's keep it that way!

> that it stops
> at the end of the section. This could also be added to comprehensions;
> it's somewhat tricky to emulate in comprehensions now. I think the
> easiest way to do the equivalent of [f(x) for x in y while z] with a
> comprehension is
> a = [(f(x) if z else None) for x in y]
> try:
>   idx = a.index(None)
> except ValueError: # no None found
>   pass
> else: # truncate before first None
>   a = a[:idx]
> but even that fails if None is a potentially valid result of f(x) (or
> if you forget to use the try/except block and z was always True), and
> it processes the entire list even though it may throw out a sizable
> chunk of it immediately after. The only totally safe way I can think
> of to do it now is by unpacking it into a loop:
> a = []
> for x in y:
>   if not z:
>      break
>   a.append(f(x))
> I think adding these would make such idioms a little more readable,
> but it might not be enough of a gain to justify a syntax addition.
> Thoughts?

Indeed it is not enough.

--Guido van Rossum (

From steve at  Sat May 21 04:13:21 2011
From: steve at (Steven D'Aprano)
Date: Sat, 21 May 2011 12:13:21 +1000
Subject: [Python-ideas] function defaults and an empty() builtin
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, 21 May 2011 04:15:18 am you wrote:
> On 2011-05-20, at 17:03 , Nick Coghlan wrote:
> > I share Steve's puzzlement as the intended use case.
> >
> > To get value from the magic empty immutable list, you will have to
> > explicitly test that calling your function with the default value
> > does the right thing.
> Why is that? The value of the empty immutable list (there's nothing
> magic to it) would be an eternal assertion that an incorrect behavior
> (trying to mutate the default parameter) can not be introduced in the
> function.

It's not the caller's responsibility to avoid mangling the internals of 
the function. It is the function's responsibility to avoid exposing 
those internals.

This suggestion seems crazy to me. Let me try to explain from the point 
of view of the caller. Suppose I call func and get a list back:

x = func(a)

So I can treat x as a list, because that's what it is:

x.append(None)  # works fine

But if I fail to pass an argument, and the default empty() is used, I 
get something that looks like a list:

y = func()
hasattr(y, "append")  # returns True

but blows up when I try to use it:

y.append(None)  # raise an exception

All because the function author doesn't want me modifying the return 
result. And why does the author care what I do with the result? Because 
he's exposing the default function value in such a way that the caller 
can mangle it.

If it causes problems when the function returns the default value, stop 
returning the default value! Don't push the burden onto the caller by 
dropping a landmine into their code.

Now you can "fix" this, for some definition of "fix", by documenting the 
fact that not passing the argument will result in something other than 
a list:

"If you don't pass an argument, and use the default, then you will get 
back an immutable empty sequence that has the same API as a list but 
that will raise an exception if you try to mutate it."

This is downright awful API design. As the caller, I simply don't care 
about the function author's difficulties in ensuring that the default 
value is not modified. That's Not My Problem. Fix your own buggy code. 
(Not that it is actually difficult: the idiom for mutable default 
values is two simple lines.)

What Is My Problem is that rather than fix his function, the author has 
dumped the problem in my lap. Now I have this immutable empty sequence 
that is useless to me. I either have to detect it and change it myself:

result = func(*args)  # args could be empty
if result is empty():
    # Fix stupid design flaw in func
    result = []

or I have to remember to never, under any circumstances, call func() 
without supplying an argument.

> There are 17 functions or methods with list default parameters and
> 133 with dict default parameters in the Python standard library.
> Surely some of them legitimately make use of a mutable default
> parameter as some kind of process-wide cache or accumulator, but
> I would doubt the majority does (why would SMTP.sendmail need to
> accumulate data in its mail_options parameter across runs?)
> Do you know for sure that no mutation of these 150+ parameters will
> ever be introduced, that all of these functions and methods are
> sufficiently tested, called often enough that the introduction of
> a mutation of the default parameter in themselves or one of their
> callees would *never* be able to pass muster?

Fine, you've discovered 150 potentially buggy functions in the standard 
library.

If the authors didn't remember to use the default=None idiom in their 
functions, what makes you think that they'd remember to use 
default=empty() instead?

This suggested idiom is counterproductive. The function author doesn't 
save any work -- he still has to remember not to write default=[] in 
his functions. The author's burden is increased, because now he has to 
choose between three idioms instead of two:

# use this when default is like a cache
def func(default=[]):
    ...

# use this when you need to mutate default within the function
def func(default=None):
    if default is None:
        default = []

# use this when you want to return the default value but don't want 
# the caller to mutate it
def func(default=empty()):
    ...

And the caller's burden is increased, because now he has to deal with 
this immutable list instead of a real list.

Steven D'Aprano

From steve at  Sat May 21 05:12:58 2011
From: steve at (Steven D'Aprano)
Date: Sat, 21 May 2011 13:12:58 +1000
Subject: [Python-ideas] function defaults and an empty() builtin
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, 21 May 2011 05:10:25 am Masklinn wrote:
> On 2011-05-20, at 21:12 , Ethan Furman wrote:
> > In this scenario:
> >
> > def func(mylist=empty()):
> >   do_some_stuff_with_mylist
> >
> > mylist is the empty() object, you *know* func() does not modify
> > mylist, you are wrong (heh) and it does... but your program always
> > calls func() with an actual list -- how is empty() going to save
> > you then?
> It will not until one day func() is called without an actual list,
> and then you get a clear and immediate error instead of silent data
> corruption, or a memory leak, 

I wouldn't call it a memory leak. As I understand it, a memory leak is 
normally understood to mean that your program is assigning memory in 
such a way that neither you, nor the compiler, can free it, not that 
you merely haven't noticed that you're assigning memory. Since the 
default value is exposed, either the function or the caller can free 
that memory.

> which are generally the result of an 
> improperly mutated default argument collection and much harder to
> spot.

You are simply wrong there. There is no reason to imagine that the 
caller will *immediately* attempt to modify the result:

y = func()  # returns immutable empty list

The attempt to mutate y might not happen until much later, in some 
distant part of the code, in another function, or module, or thread, or 
even another process. There is no limit to how distant in time or space 
the exception could be.

It's a landmine waiting to blow up, not an assertion.

y = func()
# ... much later
params = {'key': y}
# ... much later still
response = connect('something', params)

def connect(x, params):
    a = params.get('key', [])
    a.append(x)  # blows up here if a is the immutable empty()

When connect fails, there's nothing to associate the error with the 
mistake of calling func() without supplying an argument.
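
A rough, runnable version of that scenario is sketched below. Since empty() does not exist, an empty tuple stands in for the proposed immutable object; `EMPTY` and the body of `func` are assumptions made only for the demonstration.

```python
EMPTY = ()  # stand-in for the proposed immutable empty() object


def func(items=EMPTY):
    # Returns its argument untouched, so a call with no argument
    # hands the immutable default straight back to the caller.
    return items


def connect(x, params):
    a = params.get('key', [])
    a.append(x)          # fails here if `a` is the immutable default
    return a


y = func()               # the "mistake" happens on this line
params = {'key': y}

try:
    connect('something', params)
except AttributeError as exc:
    # The traceback points into connect(), not at the func() call.
    error = exc
else:
    error = None
```

The exception is raised inside connect(), several steps removed from the call that produced the immutable value.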

But note that calling func() without an argument is supposed to be 
legal. Why is it a mistake? It's only a mistake because func exposes 
internal data to the caller, and then compounds that bug by punishing 
the caller for inadvertently modifying that internal data rather than 
not exposing it in the first place.

This is *astonishingly* awful design.

> That's it. That's a pretty common bug in Python, and it solves it. No
> more, and no less.

This doesn't solve the problem, it just creates a new one. If people 
can't remember to use the "if default is None" idiom, what makes you 
think they will remember to use empty()? And if they do remember, 
they're just disguising their bug as the caller's mistake.

Steven D'Aprano

From steve at  Sat May 21 05:19:49 2011
From: steve at (Steven D'Aprano)
Date: Sat, 21 May 2011 13:19:49 +1000
Subject: [Python-ideas] function defaults and an empty() builtin
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, 21 May 2011 05:18:30 am Masklinn wrote:
> Why special-case empty lists when there is no need to?

That's a remarkable statement. What is empty() except a special case for 
empty lists?

If you and Jack are serious about this proposal, it would require at 
least two such functions, emptylist and emptydict, not just empty(). 
And even if you are right that it solves the problem of default=[] 
(which you aren't, but for the sake of the argument let's pretend), it 
doesn't solve the general issue of mutable defaults.

As I said, having a freeze() function that creates an immutable list 
might be a good idea, although not for the default argument issue. (I'm 
not entirely sure how that differs from tuple, but that's another 
issue...) But special casing a frozen empty list seems silly, and the 
use-case given by the OP, and defended by you, is actively harmful.

Steven D'Aprano

From greg.ewing at  Sat May 21 05:57:30 2011
From: greg.ewing at (Greg Ewing)
Date: Sat, 21 May 2011 15:57:30 +1200
Subject: [Python-ideas] function defaults and an empty() builtin
In-Reply-To: <>
References: <>
Message-ID: <>

Masklinn wrote:

> Again, this default parameter is for functions which *are not supposed to* modify
> collections they were provided as parameters (which is the vast majority of functions,
> really).

Your empty() default would do nothing to catch attempts to
modify a passed-in list. If you're worried about the function
erroneously modifying the default value, you should be just as
worried about that.
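
A small sketch of that point, using a tuple default as a stand-in for empty(); the function `total_and_clear` is invented for the example and deliberately contains the kind of bug under discussion.

```python
def total_and_clear(values=()):
    # Meant to be read-only, but it mutates its input by mistake.
    result = sum(values)
    if values:
        values.clear()   # never reached for the empty default
    return result


data = [1, 2, 3]
result = total_and_clear(data)        # silently empties the caller's list
default_result = total_and_clear()    # the immutable default never complains
```

The immutable default protects nothing here: the caller's list is emptied, and the call without arguments runs cleanly, so the bug goes unnoticed.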


From dbaker3448 at  Sat May 21 06:06:04 2011
From: dbaker3448 at (Dan Baker)
Date: Fri, 20 May 2011 23:06:04 -0500
Subject: [Python-ideas] Filtered "for" loop with list-comprehension-like
In-Reply-To: <>
References: <>
Message-ID: <>

I had a feeling that might be the answer. Sometimes a little syntactic
sugar isn't bad, but even reasonable people won't always agree on
which kinds - and too far down that road lies Perl. Thanks anyway.


From greg.ewing at  Sat May 21 06:10:55 2011
From: greg.ewing at (Greg Ewing)
Date: Sat, 21 May 2011 16:10:55 +1200
Subject: [Python-ideas] function defaults and an empty() builtin
In-Reply-To: <>
References: <>
Message-ID: <>

Masklinn wrote:

> Absolutely, but the mutation of the default parameters seems to be the main
> problem (historically):

The most common problem regarding default parameters is mutation
of them by functions which *are* supposed to modify the parameter.

IMO you're trying to solve an almost-nonexistent problem.

> it's a global data corruption,
> where modifying a provided parameter is a local data corruption

It's just as damaging, though -- the program still produces
incorrect results.


From guido at  Sat May 21 06:14:51 2011
From: guido at (Guido van Rossum)
Date: Fri, 20 May 2011 21:14:51 -0700
Subject: [Python-ideas] function defaults and an empty() builtin
In-Reply-To: <>
References: <>
Message-ID: <>

Please end this thread. The original empty() proposal is clearly not
working, and no modification of it is going to work. The recommended
pattern is very clear and matches the Zen of Python: Explicit is
better than implicit.

--Guido van Rossum (

From cs at  Sat May 21 07:25:04 2011
From: cs at (Cameron Simpson)
Date: Sat, 21 May 2011 15:25:04 +1000
Subject: [Python-ideas] function defaults and an empty() builtin
In-Reply-To: <ir6ece$hvj$>
References: <ir6ece$hvj$>
Message-ID: <>

On 20May2011 15:11, Terry Reedy <tjreedy at> wrote:
| On 5/20/2011 1:51 PM, Masklinn wrote:
| I am as puzzled as other people.
| >empty() is both an empty list (because the code iterates over a list
| >for instance, or maps it, or what have you) and an assertion that
| >this list is *not* to be modified.
| So use () as the default. It has all the methods of [] except for
| the mutation methods.

You're missing the point.

This thread is about providing a complex solution to a common problem.

Your technique of providing a simple solution to the problem doesn't
help the thread persist.

[ Hmm, I see my random sig quoter has hit the money again:-)
  Truly, the quote below was pot luck! ]

Cameron Simpson <cs at> DoD#743

If you can keep your head while all those about you are losing theirs,
perhaps you don't understand the situation.
        - Paul Wilson <Paul_Wilson.DBS at>

From tjreedy at  Sat May 21 23:06:01 2011
From: tjreedy at (Terry Reedy)
Date: Sat, 21 May 2011 17:06:01 -0400
Subject: [Python-ideas] function defaults and an empty() builtin
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <ir99fm$fij$>

On 5/20/2011 3:18 PM, Masklinn wrote:
> Because you're iterating period, and that it's an empty list has no
> influence on your behavior. You'll simply do nothing during your
> iteration, because the iteration count will be 0. Why special-case
> empty lists when there is no need to?

An empty tuple () works fine for that. You never explained in any way I 
could remotely understand why you want something else.

Terry Jan Reedy

From tjreedy at  Sat May 21 23:48:55 2011
From: tjreedy at (Terry Reedy)
Date: Sat, 21 May 2011 17:48:55 -0400
Subject: [Python-ideas] function defaults and an empty() builtin
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>
Message-ID: <ir9c06$u0j$>

On 5/20/2011 7:10 PM, Jack Diederich wrote:

> The use case isn't very fancy, it's to have a generic empty iterable

If () and frozenset() are not generic enough for *you*, try this:

def empty():
     return
     yield  # unreachable, but makes empty() a generator function
emp = empty()
for i in emp: print('something')
for i in emp: print('something')
# prints nothing, both times.

Actually, iter(()), iter([]), iter({}) behave the same when iterated.
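
As a quick check of that claim, here is a minimal sketch; the variable names are arbitrary.

```python
empties = [iter(()), iter([]), iter({}), iter(frozenset())]

# Each iterator yields nothing on the first pass...
first_pass = [list(it) for it in empties]

# ...and an exhausted iterator stays exhausted, so a second pass
# over the very same objects is empty as well.
second_pass = [list(it) for it in empties]
```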

> as a place holder in function defs instead of doing the "if x is None"

That idiom is for a completely different use case:
when one wants a new empty *mutable* on every call, that will be filled 
with values and, typically, returned.

> FYI, here is the code that triggered the query.  The "if None" check
> is mostly habit with a small dose of pedagogical reinforcement for
> other devs that would read it.
> def query_sphinx(search_text, include=None, exclude=None):
>      if include is None:
>          include =  {}
>      if exclude is None:
>          exclude = {}

Since you are not mutating include and exclude, there is no point to 
this noise. In fact, I consider it wrong because it actually *misleads* 
other devs who would expect something put into each of them and 
returned. The proper way to write this is

def query_sphinx(search_text, include={}, exclude={}):

which documents that the parameters should be dicts (or similar) and 
that they are read only. Your version implies that it would be ok to 
write to them, which is wrong.

>      query = sphinxapi.client()
>      for field, values in include.items():
>          query.SetFilter(field, values)
>      for field, values in exclude.items():
>          query.SetFilter(field, values, exclude=True)
>      return query.query(search_text)

Terry Jan Reedy

From tjreedy at  Sat May 21 23:57:44 2011
From: tjreedy at (Terry Reedy)
Date: Sat, 21 May 2011 17:57:44 -0400
Subject: [Python-ideas] function defaults and an empty() builtin
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>
Message-ID: <ir9cgm$qe$>

On 5/20/2011 3:10 PM, Bruce Leban wrote:
> It seems to me that a better way of doing this is:
>      def func(optional_list=[])
>          optional_list = freeze(optional_list)

If a function only needs a read-only sequence, it should not require, or 
be said to require, a list: "def func(optional_seq = ()):". If it only 
iterates through an input collection, it should only require an 
iterable: "def func(optional_iter=()): it = iter(optional_iter)". If you 
are paranoid and want the function to raise on any attempt to do much of 
anything with the input, replace '()' with 'iter(())'.

Terry Jan Reedy

From tjreedy at  Sun May 22 00:53:14 2011
From: tjreedy at (Terry Reedy)
Date: Sat, 21 May 2011 18:53:14 -0400
Subject: [Python-ideas] function defaults and an empty() builtin
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>
Message-ID: <ir9fop$gq4$>

On 5/20/2011 2:15 PM, Masklinn wrote:
> There are 17 functions or methods with list default parameters and
> 133 with dict default parameters in the Python standard library.

Could you give the re or whatever that you used to find these?
I might want to look and possibly change a few.

> Surely some of them legitimately make use of a mutable default
> parameter as some kind of process-wide cache or accumulator, but
> I would doubt the majority does (why would SMTP.sendmail need to
> accumulate data in its mail_options parameter across runs?)

I suspect that nearly all of these uses are for read-only inputs. 
In Python 3, () could replace [] in such cases, as tuples now have all 
the read-only sequence methods (they once had no methods). Of course, 
even that does not protect against perhaps crazy code like

if input: <mutate it> # skips empty args, default or not

That suggests that all functions that are supposed to only read an input 
sequence should be tested with tuples. Actually, if only an iterable is 
needed, then such functions should be tested with non-sequence iterables. 
Those are very easy to produce: for instance, iter((1,2,3)).
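
To illustrate why such a test is useful, here is a hedged sketch with two invented functions: one quietly assumes a sequence, the other needs only an iterable. Neither comes from the standard library.

```python
def mean_indexing(values=()):
    # Assumes a sequence: relies on len() and indexing.
    return sum(values[i] for i in range(len(values))) / len(values)


def mean_iterating(values=()):
    # Needs only an iterable: a single pass, no len(), no indexing.
    total = 0
    count = 0
    for v in values:
        total += v
        count += 1
    return total / count if count else 0.0


from_tuple = mean_indexing((1, 2, 3))
from_iterator = mean_iterating(iter((1, 2, 3)))

try:
    mean_indexing(iter((1, 2, 3)))
except TypeError:
    # A plain iterator has no len(), so the hidden assumption surfaces.
    indexing_rejects_iterator = True
else:
    indexing_rejects_iterator = False
```

Feeding an iterator to both functions shows immediately which one depends on more than iteration.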

Thinking about it more, if the only use of an arg is to iterate through 
key,value pairs, then the default could be 'iter({})' instead of '{}' 
to better document the usage.

> Do you know for sure that no mutation of these 150+ parameters will
> ever be introduced, that all of these functions and methods are
> sufficiently tested, called often enough that the introduction of
> a mutation of the default parameter in themselves or one of their
> callees would *never* be able to pass muster?

No. However, anyone qualified for push access to the central source 
should know that defaults should be treated as read-only unless 
documented otherwise. This is especially true for {}. So I consider it a 
somewhat paranoid worry, in the absence of cases where revisers *have* 
introduced mutation where not present before.

That aside, developers have and are improving the test suite. That was 
the focus of the recent post-PyCon sprint. It continues with a test 
improvement most every day. If you want to join us volunteers to improve 
tests further, please do.

Terry Jan Reedy

From tjreedy at  Sun May 22 01:01:36 2011
From: tjreedy at (Terry Reedy)
Date: Sat, 21 May 2011 19:01:36 -0400
Subject: [Python-ideas] Filtered "for" loop with list-comprehension-like
In-Reply-To: <>
References: <>
Message-ID: <ir9g8g$ite$>

On 5/20/2011 8:57 PM, Dan Baker wrote:

> It seems odd that "for x in y if z" is allowed in comprehensions but
> not in a regular for loop.

Comprehensions are expressions and therefore need everything packed into 
them that is needed.

For statements and if statements are statements, and both can be followed 
in their suite with as many statements as needed, so there is no *need* 
to pack more than is necessary into the header line. Even doc strings, 
which are conceptually part of the header, were put down into the suite.
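
For comparison, the spellings that exist today can be sketched as follows; the data and the variable names are made up for the example.

```python
numbers = range(10)

# Comprehension: the filter has to live inside the expression.
evens = [n for n in numbers if n % 2 == 0]

# Statement form: the filter simply goes into the suite.
evens_loop = []
for n in numbers:
    if n % 2 == 0:
        evens_loop.append(n)

# A generator expression in the header comes close to the requested
# syntax without any change to the language.
evens_gen = []
for n in (m for m in numbers if m % 2 == 0):
    evens_gen.append(n)
```

All three produce the same list; they differ only in where the condition is written.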

Terry Jan Reedy

From tjreedy at  Sun May 22 01:41:20 2011
From: tjreedy at (Terry Reedy)
Date: Sat, 21 May 2011 19:41:20 -0400
Subject: [Python-ideas] Use iter() for defaults (was Re: function defaults
 and an empty() builtin)
In-Reply-To: <ir9c06$u0j$>
References: <>	<>	<>	<>	<>	<>	<>
Message-ID: <ir9iiv$tm2$>

On 5/21/2011 5:48 PM, Terry Reedy wrote:

>> def query_sphinx(search_text, include=None, exclude=None):
>>     if include is None:
>>         include = {}
>>     if exclude is None:
>>         exclude = {}
> Since you are not mutating include and exclude, there is no point to
> this noise. In fact, I consider it wrong because it actually *misleads*
> other devs who would expect something put into each of them and
> returned. The proper way to write this is
> def query_sphinx(search_text, include={}, exclude={}):
> which documents that the parameters should be dicts (or similar) and
> that they are read only. Your version implies that it would be ok to
> write to them, which is wrong.
>>     query = sphinxapi.client()
>>     for field, values in include.items():
>>         query.SetFilter(field, values)
>>     for field, values in exclude.items():
>>         query.SetFilter(field, values, exclude=True)
>>     return query.query(search_text)

Here is a back-compatible rewrite that expands the domain for 'include' 
and 'exclude' to iterables of key-value pairs. It both documents and 
ensures that query_sphinx() will do nothing but iterate through 
key-value pairs from the last two args.

def query_sphinx(search_text,
     if isinstance(include, dict):
         include = include.items()
     if isinstance(exclude, dict):
         exclude = exclude.items()

     query = sphinxapi.client()
     for field, values in include:
         query.SetFilter(field, values)
     for field, values in exclude:
         query.SetFilter(field, values, exclude=True)

     return query.query(search_text)

Lifting effectively constant expressions out of a loop is a standard 
technique. In this case, the 'loop' is whatever would cause repeated 
calls to the function without explicit args. The small define-time cost 
of the extra calls would eventually be saved at runtime by reusing the 
dict_valueiterators instead of creating equivalent ones over and over.

Terry Jan Reedy

From rob.cliffe at  Sun May 22 13:41:45 2011
From: rob.cliffe at (Rob Cliffe)
Date: Sun, 22 May 2011 12:41:45 +0100
Subject: [Python-ideas] function defaults and an empty() builtin
In-Reply-To: <>
References: <>	<>	<>
Message-ID: <>

On 21/05/2011 04:19, Steven D'Aprano wrote:
> On Sat, 21 May 2011 05:18:30 am Masklinn wrote:
>> Why special-case empty lists when there is no need to?
> That's a remarkable statement. What is empty() except a special case for
> empty lists?
> If you and Jack are serious about this proposal, it would require at
> least two such functions, emptylist and emptydict, not just empty().
> And even if you are right that it solves the problem of default=[]
> (which you aren't, but for the sake of the argument let's pretend), it
> doesn't solve the general issue of mutable defaults.
Or all collections could have a "mutable" attribute which, once it has 
been set to False, can never subsequently be reset to True. Then you 
could merge lists and tuples into a single type, ditto sets and frozen 
sets, and you get immutable dictionaries as well.  Plus a considerable 
simplification of the language.

From stephen at  Sun May 22 17:46:20 2011
From: stephen at (Stephen J. Turnbull)
Date: Mon, 23 May 2011 00:46:20 +0900
Subject: [Python-ideas] Python 3.x and bytes
In-Reply-To: <ir58nn$8tl$>
References: <>
	<> <>
	<> <>
Message-ID: <>

Terry Reedy writes:

 > As far as I noticed, Ethan did not explain why he was extracting single
 > bytes and comparing to a constant, so it is hard to know if he was even
 > using them properly.

It doesn't really matter whether Ethan is using them properly.  It's
clear there are such uses, though I don't know how important they are,
so we may as well assume Ethan's is one such.

 > > Japanese mail is transmitted via SMTP, and the control function
 > > "hello" is still spelled "EHLO" in Japanese mail.
 > I am not familiar with that control function, but if it is part of
 > the SMTP protocol, it has nothing to do with the language of the
 > payload.

Precisely my point.  Therefore a payload represented as bytes should
be treated as *uninterpreted* bytes, except where interpretations are
defined for those bytes.  This works for SMTP, because RFC 822
*deliberately* specifies headers to be encoded in ASCII (not
"ASCII-compatible") in order that the payload (header) manipulations
specified by RFC 821 and friends be guaranteed correct.

Nevertheless, people frequently request mail processing features that
require manipulations of MIME part bodies and even plain RFC 822
message bodies.  These cannot be guaranteed correct unless done by
decoding and reencoding, but bytes-oriented manipulations generally
"work" in monolingual contexts (or seem to, and any problems can
always be blamed on MS Outlook).  There are several such features that
come up over and over again on Mailman lists and sometimes in the
Python Email SIG, and I'm sure the same is true for web protocols.

 > > Farsi web pages are formatted by HTML, and the control
 > > function "new line" is spelled "<BR>" in Farsi, of course.
 > When writing the html *text* body, sure. But I presume browsers decode
 > encoded bytes to unicode *before* parsing the text. If so, it does not
 > really matter that '<br>' gets encoded to b'<br>'.

HTML is not exclusively processed by browsers.  It is often processed
by servers and middleware that don't know they're speaking HTML, and
according to several experts' testimony, they're in a freakin' hurry
to push bytes out the door, there's no time for Unicode (decoding and
encoding, OMG how inefficient!)

Such developers want to write their libraries using bytes *and*
literals that can be used both for binary protocols and for text
protocols (urlparse seems to be the canonical example).

The convenience of using bytes in a string-like way (eg, the b''
literal) in manipulating many binary protocols is clear.  That
convenience is just as great for people who are at substantial risk of
mojibake if bytes are used to do text manipulations on the encoded
form, as well as for people who face little risk (eg, those who use
only American English).

The question is how far to go with polymorphism, etc.  I think that
Nick's urlparse work gets the balance about right, and see only danger
in more stringlike bytes (eg, by returning b'b' for b'bytes'[0]).
OTOH, there are some changes that might be useful but seem very
low-risk, such as a c'b' literal that means 98, not b'b'.

From ncoghlan at  Mon May 23 07:46:05 2011
From: ncoghlan at (Nick Coghlan)
Date: Mon, 23 May 2011 15:46:05 +1000
Subject: [Python-ideas] __implements__ on arguments to ABCMeta.register
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, May 21, 2011 at 10:03 AM, Eric Snow <ericsnowcurrently at> wrote:
> This is a small addition, but I realize it [potentially] adds another
> special method to classes, so it's not trivial.
> The use case is that I want to be able to validate that a class implements
> all of the abstract methods of all the classes to which it has been
> registered.  I don't have a programmatic way of discovering that set without
> asking every class out there.  This is an easy way to accomplish this (for
> non-extension/non-builtin types).  An alternative is to subclass ABCMeta and
> tack this on, but that only works for my ABCs.  Another is to use a class
> decorator to do this any place I do a register (or even to do the register
> too), but again, only for the places that I do the registration.
> Anyway, if it's useful to me then it may be useful to others, so I wanted to
> put this out there.  I expect this has come up before, particularly during
> discussions about PEP 3119.  However, I wasn't able to track down anything
> specifically about doing this sort of "reverse registration".  And, of
> course, I may be overestimating the value of this functionality.  If this
> does not seem that valuable to anyone else, then no big deal.  :)

An alternative approach to the same idea was to be able to register
callbacks with ABCs to track registration and deregistration
operations on that ABC and any subclasses. This has the advantage of
working with arbitrary objects, including those without mutable
__dict__ attributes. Such an approach would start by building a type
map (via ABC.__subclasses__) and then using the callback hooks to keep
the mapping up to date.

I believe there is an open tracker item for that concept, but I can't
currently find a reference to it.
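
To make the use case concrete, here is a hedged sketch of the check for a single, already-known ABC. `Shape`, `Square` and `missing_abstract_methods` are hypothetical names. The sketch does not solve the discovery problem raised in the thread (finding every ABC a class was registered with); it only shows that register() itself validates nothing.

```python
import abc


class Shape(abc.ABC):
    @abc.abstractmethod
    def area(self):
        ...

    @abc.abstractmethod
    def perimeter(self):
        ...


class Square:
    # Deliberately incomplete: perimeter() is missing.
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


# Registration creates a virtual subclass without checking anything.
Shape.register(Square)


def missing_abstract_methods(cls, abc_cls):
    # Names declared abstract on the ABC that cls does not provide.
    return sorted(
        name for name in abc_cls.__abstractmethods__
        if not callable(getattr(cls, name, None))
    )


missing = missing_abstract_methods(Square, Shape)
```

After registration, issubclass(Square, Shape) is true even though `missing` still lists perimeter.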


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Mon May 23 08:02:18 2011
From: ncoghlan at (Nick Coghlan)
Date: Mon, 23 May 2011 16:02:18 +1000
Subject: [Python-ideas] Python 3.x and bytes
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

On Mon, May 23, 2011 at 1:46 AM, Stephen J. Turnbull <stephen at> wrote:
> The question is how far to go with polymorphism, etc.  I think that
> Nick's urlparse work gets the balance about right, and see only danger
> in more stringlike bytes (eg, by returning b'b' for b'bytes'[0]).
> OTOH, there are some changes that might be useful but seem very
> low-risk, such as a c'b' literal that means 98, not b'b'.

If we did go with an ord() literal, I would actually favour something
more like 0'b'.

However, as Maciej pointed out off-list, adding a new literal type
because calls to builtin functions have a relatively high overhead in
CPython even with constant arguments probably isn't a good idea.
Better to just write "ord('b')" and use PyPy to make it fast
(Alternative for use with -O rather than PyPy: "ordb = 98; assert ordb
== ord('b')").
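
For readers following along, the Python 3 behaviour under discussion can be sketched as below. The constant name `ORD_B` is an arbitrary choice mirroring the "ordb" trick above.

```python
data = b'bytes'

first = data[0]          # indexing bytes gives an int in Python 3
first_slice = data[0:1]  # slicing gives a bytes object of length 1

# The named-constant workaround: spell the number once and let an
# assert document (and check) where it comes from.
ORD_B = 98
assert ORD_B == ord('b')

matches = first == ORD_B
```

Under -O the assert is stripped, which leaves a plain integer constant with no function call at runtime.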


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From stefan_ml at  Mon May 23 09:19:26 2011
From: stefan_ml at (Stefan Behnel)
Date: Mon, 23 May 2011 09:19:26 +0200
Subject: [Python-ideas] Python 3.x and bytes
In-Reply-To: <>
References: <>	<>	<>	<>	<>
	<>	<>
	<>	<ir21ma$72p$>	<>	<ir58nn$8tl$>	<>
Message-ID: <ird1pu$453$>

Nick Coghlan, 23.05.2011 08:02:
> On Mon, May 23, 2011 at 1:46 AM, Stephen J. Turnbull wrote:
>> The question is how far to go with polymorphism, etc.  I think that
>> Nick's urlparse work gets the balance about right, and see only danger
>> in more stringlike bytes (eg, by returning b'b' for b'bytes'[0]).
>> OTOH, there are some changes that might be useful but seem very
>> low-risk, such as a c'b' literal that means 98, not b'b'.
> If we did go with an ord() literal, I would actually favour something
> more like 0'b'.
> However, as Maciej pointed out off-list, adding a new literal type
> because calls to builtin functions have a relatively high overhead in
> CPython even with constant arguments probably isn't a good idea.
> Better to just write "ord('b')" and use PyPy to make it fast

Even CPython could optimise b'x'[0] into a constant, if people ever find 
this to be a bottleneck.


From lists at  Mon May 23 15:40:39 2011
From: lists at (Christian Heimes)
Date: Mon, 23 May 2011 15:40:39 +0200
Subject: [Python-ideas] Threading hooks and disable gc per thread
In-Reply-To: <>
References: <>
Message-ID: <>

Am 15.05.2011 13:13, schrieb Nick Coghlan:

(Sorry for the delay, I was swamped with work again)

> So the plan is to have threading.Thread support the hooks, while
> _thread.start_new_thread and creation of thread states at the C level
> (including via PyGILState_Ensure) will bypass them?
> That actually sounds reasonable to me (+0), but the PEP should at
> least discuss the rationale for the choice of level for the new
> feature. I also suggest storing the associated hook lists at the
> threading.Thread class object level rather than at the threading
> module level (supporting such modularity of state being a major
> advantage of only providing this feature at the higher level).

I've considered both places, too. _thread.start_new_thread() as well as
PyGILState_Ensure() would require a considerable amount of C coding for
a feature that won't affect performance in a noticeable way.

This is my answer against a C implementation in
_thread.start_new_thread(). It's far too much work for a feature that
can be implemented in Python easily. An implementation in the pure
Python threading module will work on PyPy, IronPython and Jython
instantly. I consider any library, that bypasses the threading module,
broken, too.

PyGILState_Ensure() or PyThreadState_New() are a different beast. I
concur, it would be the best place for the hooks if I could think of a way
to implement the on-thread-stop hook. I don't see a way to execute some
code at the end of a thread without cooperation from the calling code.

> The PEP should also go into detail as to why having these hooks in a
> custom Thread subclass isn't sufficient (e.g. needing to support
> threads created by third party libraries, but note that such a
> rationale has a problem due to the _thread.start_new_thread loophole).


> Composability through inheritance should also be discussed - the hook
> invocation should probably walk the MRO so it is easy to create Thread
> subclasses that include class specific hooks without inadvertently
> skipping the hooks installed on threading.Thread.

Good idea!

Do you think it's sufficient to have hook methods like

class Thread:
    _start_hooks = []

    def on_thread_starting(self):
        for hook, args, kwargs in self._start_hooks:
            hook(*args, **kwargs)

? Subclasses of threading.Thread can easily overwrite the hook method
and call its parent's on_thread_starting().

> The possibility of passing exception information to thread_end hooks
> (ala __exit__ methods) should be considered, along with the general
> relationship between the threading hooks and the context management
> protocol.

That's an interesting idea! I'll consider it.

>> gc.disable_thread(), gc.enable_thread(), gc.isenabled_thread()
>> --------------------------------------------------------------
> The default setting for this should go in the interpreter state object
> rather than in a static variable (subinterpreters can then inherit the
> state of their parent interpreter when they are first created).
> Otherwise sounds reasonable. (+0)

A subinterpreter flag isn't enough. All subinterpreters share a common
GC list. A gc.collect() inside a subinterpreter run affects the entire
interpreter and not just the one subinterpreter. I've to think about the
issue of subinterpreters ...
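
As background for the proposal, the existing switch in the gc module is shared by all threads in CPython. A minimal sketch of that current behaviour follows; it uses only the existing gc.enable(), gc.disable() and gc.isenabled() calls, not the proposed per-thread functions, which do not exist.

```python
import gc
import threading

was_enabled = gc.isenabled()
gc.enable()


def worker():
    # Runs in a second thread, but flips the one shared switch.
    gc.disable()


t = threading.Thread(target=worker)
t.start()
t.join()

enabled_in_main = gc.isenabled()   # reflects the worker's call

# Restore whatever state we started with.
if was_enabled:
    gc.enable()
else:
    gc.disable()
```

The main thread observes the collector as disabled after the worker finishes, which is the behaviour a per-thread flag would change.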

If I understand the code correctly, gc.get_objects() punches a hole in
the subinterpreter isolation. It returns all tracked objects of the
current process -- from all subinterpreters. Is this a design issue? The
fact isn't mentioned in


From ncoghlan at  Mon May 23 16:27:04 2011
From: ncoghlan at (Nick Coghlan)
Date: Tue, 24 May 2011 00:27:04 +1000
Subject: [Python-ideas] Threading hooks and disable gc per thread
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, May 23, 2011 at 11:40 PM, Christian Heimes <lists at> wrote:
> Am 15.05.2011 13:13, schrieb Nick Coghlan:
>> Composability through inheritance should also be discussed - the hook
>> invocation should probably walk the MRO so it is easy to create Thread
>> subclasses that include class specific hooks without inadvertently
>> skipping the hooks installed on threading.Thread.
> Good idea!
> Do you think, it's sufficient to have hook methods like
> class Thread:
>     _start_hooks = []
>     def on_thread_starting(self):
>         for hook, args, kwargs in self._start_hooks:
>             hook(*args, **kwargs)
> ? Subclasses of threading.Thread can easily overwrite the hook method
> and call its parent's on_thread_starting().

I was actually thinking of making life even easier for subclasses:

class Thread:
  start_hooks = []

  @classmethod
  def _on_thread_starting(cls):
    # Hooks in parent classes are called before hooks in child classes
    hook_sources = reversed(cls.__mro__)
    for hook_source in hook_sources:
      # Arguable design decision here: only look at Thread subclasses,
      # not any mixins
      if not issubclass(hook_source, Thread):
        continue
      hooks = hook_source.__dict__.get("start_hooks", ())
      for hook, args, kwargs in hooks:
        hook(*args, **kwargs)

With the parent method explicitly walking the whole MRO in reverse,
any subclass hooks will naturally be invoked after any parent hooks
without any particular effort on the part of the subclass implementor
- they just need to provide and populate a "start_hooks" attribute.

The alternative would mean that overriding "_start_hooks" in a
subclass would block ready access to the main hooks in Thread.
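
The MRO-walking idea can be tried out today with an ordinary subclass. The sketch below is only an approximation of the proposal: `HookedThread`, `Worker` and the `calls` list are invented for the demonstration, and the hooks are triggered from run() because threading.Thread has no such hook point.

```python
import threading

calls = []


class HookedThread(threading.Thread):
    start_hooks = [(calls.append, ("base",), {})]

    @classmethod
    def _on_thread_starting(cls):
        # Walk the MRO in reverse so parent hooks run before child hooks.
        for klass in reversed(cls.__mro__):
            if not issubclass(klass, HookedThread):
                continue  # skip object, threading.Thread and any mixins
            # Look only at hooks defined directly on this class.
            for hook, args, kwargs in klass.__dict__.get("start_hooks", ()):
                hook(*args, **kwargs)

    def run(self):
        self._on_thread_starting()
        super().run()


class Worker(HookedThread):
    # Defining start_hooks here does not hide the parent's hooks.
    start_hooks = [(calls.append, ("worker",), {})]


t = Worker(target=calls.append, args=("target",))
t.start()
t.join()
```

After the thread finishes, `calls` holds the base hook, then the subclass hook, then the thread's own target, in that order.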

>>> gc.disable_thread(), gc.enable_thread(), gc.isenabled_thread()
>>> --------------------------------------------------------------
>> The default setting for this should go in the interpreter state object
>> rather than in a static variable (subinterpreters can then inherit the
>> state of their parent interpreter when they are first created).
>> Otherwise sounds reasonable. (+0)
> A subinterpreter flag isn't enough. All subinterpreters share a common
> GC list. A gc.collect() inside a subinterpreter run affects the entire
> interpreter and not just the one subinterpreter. I've to think about the
> issue of subinterpreters ...
> If I understand the code correctly, gc.get_objects() punches a hole in
> the subinterpreter isolation. It returns all tracked objects of the
> current process -- from all subinterpreters. Is this a design issue? The
> fact isn't mentioned in

It's quite possible - there's a reason that heavy use of
subinterpreters has a "this may fail in unexpected ways" rider
attached. Still, this is the kind of thing a PEP will hopefully do a
reasonable job of flushing out and resolving.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From sturla at  Mon May 23 18:39:07 2011
From: sturla at (Sturla Molden)
Date: Mon, 23 May 2011 18:39:07 +0200
Subject: [Python-ideas] [Python-Dev] CPython optimization: storing
 reference counters outside of objects
In-Reply-To: <>
References: <>
Message-ID: <>

Den 23.05.2011 06:59, skrev "Martin v. Löwis":
> My expectation is that your approach would likely make the issues
> worse in a multi-CPU setting. If you put multiple reference counters
> into a contiguous block of memory, unrelated reference counters will
> live in the same cache line. Consequentially, changing one reference
> counter on one CPU will invalidate the cached reference counters of
> that cache line on other CPU, making your problem a) actually worse.

In a multi-threaded setting with concurrent threads accessing reference
counts, this would certainly worsen the situation.

In a single-threaded setting, this will likely be an improvement.

CPython, however, has a GIL. Thus there is only one concurrently active 
thread with access to reference counts. On a thread switch in the 
interpreter, I think the performance result will depend on the nature of 
the Python code: If threads share a lot of objects, it could help to 
reduce the number of dirty cache lines. If threads mainly work on 
private objects, it would likely have the effect you predict. Which will 
dominate is hard to tell.

Instead, we could use multiple heaps:

Each Python thread could manage its own heap for malloc and free (cf.
HeapAlloc and HeapFree in Windows). Objects local to one thread only
reside in the locally managed heap.

When an object becomes shared by several Python threads, it is moved
from a local heap to the global heap of the process. Some objects, such
as modules, would be stored directly onto the global heap.

This way, objects used by only one thread would never dirty cache
lines used by other threads.

This would also be a way to reduce the CPython dependency on the GIL. 
Only the global heap would need to be protected by the GIL, whereas the 
local heaps would not need any global synchronization.

(I am setting follow-up to the Python Ideas list, it does not belong on 
Python dev.)

Sturla Molden

From fuzzyman at  Mon May 23 20:16:44 2011
From: fuzzyman at (Michael Foord)
Date: Mon, 23 May 2011 19:16:44 +0100
Subject: [Python-ideas] Implementing __dir__ (moving dir implementation to
 object.__dir__?)
Message-ID: <>

Hello all,

I'm looking at implementing __dir__ for a class (mock.Mock as it happens) to
include some dynamically added attributes, the canonical use case according
to the documentation:

What I would like to do is report all the "standard attributes", and then
add any dynamically created attributes.

So the question is, how do I obtain the "standard list" (the list that dir
would normally report in the absence of a custom __dir__ implementation)?

There is no object.__dir__ (despite the fact that this is how it is
documented...) and obviously calling dir(self) within __dir__ is doomed to
failure.

The best I have come up with is:

def __dir__(self):
    return (dir(type(self)) + list(self.__dict__) +
            self._get_dynamic_attributes())

This works (absent multiple inheritance), but it would be nice to just be
able to do:

def __dir__(self):
    standard = super().__dir__()
    return standard + self._get_dynamic_attributes()

Moving the relevant parts of the implementation of dir into object.__dir__
would be one way to solve that.
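
For illustration, this is roughly what the wished-for version looks like on
an interpreter where object.__dir__ exists (Python 3.3 and later, to my
knowledge). The class name Dynamic and the attribute names it reports are
invented for this sketch and are not mock's actual implementation:

```python
class Dynamic:
    """Toy class that advertises extra attribute names via __dir__."""

    def __init__(self):
        self._dynamic = ["assert_called_with", "return_value"]

    def _get_dynamic_attributes(self):
        # Hypothetical helper: names that are created on the fly.
        return list(self._dynamic)

    def __dir__(self):
        # Relies on object.__dir__ being available; list() guards
        # against a non-list return value.
        standard = list(super().__dir__())
        # set() removes duplicates, sorted() gives a stable order.
        return sorted(set(standard + self._get_dynamic_attributes()))


d = Dynamic()
names = dir(d)
```

Because the standard list comes from super(), this also cooperates with
multiple inheritance, which the dir(type(self)) workaround does not.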

All the best,

Michael Foord


May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing

From amauryfa at  Mon May 23 20:22:14 2011
From: amauryfa at (Amaury Forgeot d'Arc)
Date: Mon, 23 May 2011 20:22:14 +0200
Subject: [Python-ideas] CPython optimization: storing reference counters
	outside of objects
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>


2011/5/23 Sturla Molden <sturla at>:
> Instead, we could use multiple heaps:
> Each Python thread could manage its own heap for malloc and free (cf.
> HeapAlloc and HeapFree in Windows). Objects local to one thread only reside
> in the locally managed heap.
> When an object becomes shared by several Python threads, it is moved from a
> local heap to the global heap of the process. Some objects, such as modules,
> would be stored directly onto the global heap.

Does this mean that the PyObject* address would change?
How would you update all the places that store moved references?

Amaury Forgeot d'Arc

From fuzzyman at  Mon May 23 22:52:23 2011
From: fuzzyman at (Michael Foord)
Date: Mon, 23 May 2011 21:52:23 +0100
Subject: [Python-ideas] Implementing __dir__ (moving dir implementation
	to object.__dir__?)
In-Reply-To: <>
References: <>
Message-ID: <>

On 23 May 2011 19:16, Michael Foord <fuzzyman at> wrote:

> Hello all,
> I'm looking at implementing __dir__ for a class (mock.Mock as it happens)
> to include some dynamically added attributes, the canonical use case
> according to the documentation:
> What I would like to do is report all the "standard attributes", and then
> add any dynamically created attributes.
> So the question is, how do I obtain the "standard list" (the list that dir
> would normally report in the absence of a custom __dir__ implementation)?
> There is no object.__dir__ (despite the fact that this is how it is
> documented...) and obviously calling dir(self) within __dir__ is doomed to
> failure.
> The best I have come up with is:
> def __dir__(self):
>     return (dir(type(self)) + list(self.__dict__) +
>             self._get_dynamic_attributes())

Better version which orders and removes duplicates:

        return sorted(set(dir(type(self)) + list(self.__dict__) +
                          self._get_dynamic_attributes()))

> This works (absent multiple inheritance), but it would be nice to just be
> able to do:
> def __dir__(self):
>     standard = super().__dir__()
>     return standard + self._get_dynamic_attributes()
> Moving the relevant parts of the implementation of dir into object.__dir__
> would be one way to solve that.
> All the best,
> Michael Foord

From sturla at  Mon May 23 23:29:18 2011
From: sturla at (Sturla Molden)
Date: Mon, 23 May 2011 23:29:18 +0200
Subject: [Python-ideas] CPython optimization: storing reference counters
 outside of objects
In-Reply-To: <>
References: <>	<>
	<>	<>
Message-ID: <>

Den 23.05.2011 20:22, skrev Amaury Forgeot d'Arc:
> Does this mean that the PyObject* address would change?
> How would you update all the places that store moved references?

That is a good point. How does the generational GC of .NET and Java deal 
with object relocation?

Perhaps we don't need to allocate new memory and memcpy. A heap is 
called a "heap" because it is a priority queue of contiguous memory 
buffers -- free size being the criterion for partial sorting. So we pop 
the buffer (or parts of it?) containing the PyObject off one heap and 
paste it to another, the PyObject* will not change. This might not be 
efficient for cache lines however.

Also, there is the question of attributes. Preferably a Python object 
and its attributes should reside on the same cache line.


From greg.ewing at  Mon May 23 23:33:35 2011
From: greg.ewing at (Greg Ewing)
Date: Tue, 24 May 2011 09:33:35 +1200
Subject: [Python-ideas] Python 3.x and bytes
In-Reply-To: <ird1pu$453$>
References: <>
	<> <>
	<> <>
Message-ID: <>

Stefan Behnel wrote:

> Even CPython could optimise b'x'[0] into a constant, if people ever find 
> this to be a bottleneck.

The need to write such circumlocutions would still be a
nuisance, though.


From alexander.belopolsky at  Tue May 24 00:03:38 2011
From: alexander.belopolsky at (Alexander Belopolsky)
Date: Mon, 23 May 2011 18:03:38 -0400
Subject: [Python-ideas] Python 3.x and bytes
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<ird1pu$453$> <>
Message-ID: <>

On Mon, May 23, 2011 at 5:33 PM, Greg Ewing <greg.ewing at> wrote:
> Stefan Behnel wrote:
>> Even CPython could optimise b'x'[0] into a constant, if people ever find
>> this to be a bottleneck.
> The need to write such circumlocutions would still be a
> nuisance, though.

Not a nuisance enough to warrant a syntax change, IMO.  Note that one
of the proposed alternatives, 0'b' visually is very similar to
b'x'[0].  There are plenty of other options available to users.  My
own favorite is probably,

if bytesdata[i] == 98: # ord('b')

In some cases, when single-byte values have protocol mnemonics, it may
be more appropriate to give them descriptive names:

quit_code = ord('q')
if bytesdata[i] == quit_code:

Finally, I find it rare to have single-byte codes at fixed positions
in protocols.  More often such codes are found after splitting the
bytes data on some kind of separator.
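
A small sketch of the options mentioned above, for anyone who wants to try
them. The sample data and the name QUIT_CODE are made up for this example:

```python
data = b"quit"

# Indexing bytes in Python 3 yields an int, not a length-1 bytes object.
first = data[0]

# A named constant computed once with ord(); b"q"[0] is the same integer.
QUIT_CODE = ord("q")
same_value = QUIT_CODE == b"q"[0]

is_quit = first == QUIT_CODE

# Slicing, by contrast, keeps the bytes type.
head = data[:1]

# After splitting on a separator, fields are bytes objects and can be
# compared with bytes literals directly, with no integer codes needed.
fields = b"q:payload".split(b":")
command_is_quit = fields[0] == b"q"
```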

From steve at  Tue May 24 01:58:23 2011
From: steve at (Steven D'Aprano)
Date: Tue, 24 May 2011 09:58:23 +1000
Subject: [Python-ideas] Implementing __dir__ (moving dir implementation
	to object.__dir__?)
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, 24 May 2011 04:16:44 am Michael Foord wrote:
> Hello all,
> I'm looking at implementing __dir__ for a class (mock.Mock as it
> happens) to include some dynamically added attributes, the canonical
> use case according to the documentation:
> Moving the relevant parts of the implementation of dir into
> object.__dir__ would be one way to solve that.

I haven't yet needed to write a custom __dir__, but your proposal makes 
sense to me.


Steven D'Aprano

From benjamin at  Tue May 24 02:10:24 2011
From: benjamin at (Benjamin Peterson)
Date: Tue, 24 May 2011 00:10:24 +0000 (UTC)
Subject: [Python-ideas]
References: <>
Message-ID: <>

Michael Foord <fuzzyman at ...> writes:
> Moving the relevant parts of the implementation of dir into object.__dir__
> would be one way to solve that.

Sounds fine to me. Do file a bug report.

From bruce at  Tue May 24 02:18:41 2011
From: bruce at (Bruce Leban)
Date: Mon, 23 May 2011 17:18:41 -0700
Subject: [Python-ideas] Python 3.x and bytes
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<ird1pu$453$> <>
Message-ID: <>

I like c'x'. It's easy to read and very explicitly constant and clear what
the value is 'x'. (Some other letter instead of 'c' would be fine as well.)

I don't like this:

> if bytesdata[i] == 121: # ord('x')

because it looks a heck of a lot like:

> if bytesdata[i] == 120: # ord('x')

and only one of those is correct. That's a very easy bug to miss. I like it
even less without the comment.

I don't care for:

> if bytesdata[i] == ord('x'):

because while ord is a builtin, it's not invulnerable to being changed. In
contrast, string constants and numbers are truly constant.

I recognize that the compiler can optimize:

> if bytesdata[i] == b'x'[0]:

but that looks like chicken scratches to me.

Someone suggested using 0'x' which I don't quite get. It looks too much like
0x to me and I've always read the leading zero to mean 'this is a
number'.

Also, this was raised in the context of bytes and not all characters fit in
a byte. So


work but



Is there a learning curve? Yes, but minor IMHO and if you don't know it,
it's obvious when you see it that you don't know it.

--- Bruce

From alexander.belopolsky at  Tue May 24 02:40:51 2011
From: alexander.belopolsky at (Alexander Belopolsky)
Date: Mon, 23 May 2011 20:40:51 -0400
Subject: [Python-ideas] Python 3.x and bytes
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<ird1pu$453$> <>
Message-ID: <>

2011/5/23 Bruce Leban <bruce at>:
> I like c'x'. It's easy to read and very explicitly constant and clear what
> the value is 'x'. (Some other letter instead of 'c' would be fine as well.)

-0 from me

Mainly because unlike b'..' or r'..' constructs, no meaning is
proposed for c'xyz'.

BTW, is it too soon to assign new meaning to back-quotes?  In py3k
they no longer stand for repr(), so we can probably reuse them for
ord()?  On the other hand, this is likely to be a bad idea for the
same reasons as syntax for repr() was.

From stutzbach at  Tue May 24 02:54:49 2011
From: stutzbach at (Daniel Stutzbach)
Date: Mon, 23 May 2011 17:54:49 -0700
Subject: [Python-ideas] __implements__ on arguments to ABCMeta.register
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, May 20, 2011 at 5:03 PM, Eric Snow <ericsnowcurrently at>wrote:
> The use case is that I want to be able to validate that a class implements
> all of the abstract methods of all the classes to which it has been
> registered.

If you're going down that road, would you be willing to write a patch for along the way?

> I don't have a programmatic way of discovering that set without asking
> every class out there.

I agree it would be nice to have a way to ask a class "which ABCs do you
implement?"  It would be handy for introspection and debugging purposes.

Daniel Stutzbach

From ericsnowcurrently at  Tue May 24 03:21:05 2011
From: ericsnowcurrently at (Eric Snow)
Date: Mon, 23 May 2011 19:21:05 -0600
Subject: [Python-ideas] __implements__ on arguments to ABCMeta.register
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, May 23, 2011 at 6:54 PM, Daniel Stutzbach <stutzbach at>wrote:

> On Fri, May 20, 2011 at 5:03 PM, Eric Snow <ericsnowcurrently at>wrote:
>> The use case is that I want to be able to validate that a class implements
>> all of the abstract methods of all the classes to which it has been
>> registered.
> If you're going down that road, would you be willing to write a patch for
> along the way?
Interesting.  I was motivated in a similar situation to write a validator in
the same vein [1].  In fact, working on that is where I got thinking about
something like __implements__.  The class I wrote would work with registered
classes in addition to subclasses, if there were such a mechanism.



>> I don't have a programmatic way of discovering that set without asking
>> every class out there.
> I agree it would be nice to have a way to ask a class "which ABCs do you
> implement?"  It would be handy for introspection and debugging purposes.
> --
> Daniel Stutzbach

From ericsnowcurrently at  Tue May 24 03:30:00 2011
From: ericsnowcurrently at (Eric Snow)
Date: Mon, 23 May 2011 19:30:00 -0600
Subject: [Python-ideas] __implements__ on arguments to ABCMeta.register
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, May 22, 2011 at 11:46 PM, Nick Coghlan <ncoghlan at> wrote:

> On Sat, May 21, 2011 at 10:03 AM, Eric Snow <ericsnowcurrently at>
> wrote:
> > This is a small addition, but I realize it [potentially] adds another
> > special method to classes, so it's not trivial.
> > The use case is that I want to be able to validate that a class
> > implements all of the abstract methods of all the classes to which it
> > has been registered.  I don't have a programmatic way of discovering
> > that set without asking every class out there.  This is an easy way to
> > accomplish this (for non-extension/non-builtin types).  An alternative
> > is to subclass ABCMeta and tack this on, but that only works for my
> > ABCs.  Another is to use a class decorator to do this any place I do a
> > register (or even to do the register too), but again, only for the
> > places that I do the registration.
> > Anyway, if it's useful to me then it may be useful to others, so I
> > wanted to put this out there.  I expect this has come up before,
> > particularly during discussions about PEP 3119.  However, I wasn't able
> > to track down anything specifically about doing this sort of "reverse
> > registration".  And, of course, I may be overestimating the value of
> > this functionality.  If this does not seem that valuable to anyone
> > else, then no big deal.  :)
> An alternative approach to the same idea was to be able to register
> callbacks with ABCs to track registration and deregistration
> operations on that ABC and any subclasses. This has the advantage of
> working with arbitrary objects, including those without mutable
> __dict__ attributes. Such an approach would start by building a type
> map (via ABC.__subclasses__) and then using the callback hooks to keep
> the mapping up to date.
That would be pretty cool.  A simple __implements__ like I described it
would definitely be less flexible.
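
For reference, the "subclass ABCMeta and tack this on" alternative could
look something like the rough sketch below. TrackingABCMeta,
missing_methods, Shape and Square are all hypothetical names for this
example; nothing like them exists in the abc module, and it only sees ABCs
that use this metaclass:

```python
from abc import ABCMeta, abstractmethod


class TrackingABCMeta(ABCMeta):
    """ABCMeta variant that remembers what was registered (hypothetical)."""

    # Maps a registered class to the set of ABCs it was registered with.
    # Kept outside the classes so builtins such as list can be tracked too.
    implements = {}

    def register(cls, subclass):
        result = super().register(subclass)
        TrackingABCMeta.implements.setdefault(subclass, set()).add(cls)
        return result


def missing_methods(klass):
    """Return the abstract method names that klass fails to provide."""
    missing = set()
    for abc_cls in TrackingABCMeta.implements.get(klass, ()):
        for name in abc_cls.__abstractmethods__:
            if not callable(getattr(klass, name, None)):
                missing.add(name)
    return missing


class Shape(metaclass=TrackingABCMeta):
    @abstractmethod
    def area(self):
        ...

    @abstractmethod
    def perimeter(self):
        ...


class Square:
    def area(self):
        return 1


Shape.register(Square)
print(missing_methods(Square))
```

Registration still succeeds even though Square lacks perimeter(), which is
exactly the gap the validation is meant to expose.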

> I believe there is an open tracker item for that concept, but I can't
> currently find a reference to it.
I believe you are talking about


> Cheers,
> Nick.
> --
> Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From guido at  Tue May 24 04:21:53 2011
From: guido at (Guido van Rossum)
Date: Mon, 23 May 2011 19:21:53 -0700
Subject: [Python-ideas] Python 3.x and bytes
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<ird1pu$453$> <>
Message-ID: <>

2011/5/23 Bruce Leban <bruce at>:
> I like c'x'. It's easy to read and very explicitly constant and clear what
> the value is 'x'. (Some other letter instead of 'c' would be fine as well.)

We shouldn't add any new notation to create integers from characters
to the language. It's too small a use case for adding new syntax. I
would focus on agreeing on the notation that is most readable;
personally I vote for ord('x').

--Guido van Rossum

From stephen at  Tue May 24 04:40:46 2011
From: stephen at (Stephen J. Turnbull)
Date: Tue, 24 May 2011 11:40:46 +0900
Subject: [Python-ideas] Python 3.x and bytes
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<ird1pu$453$> <>
Message-ID: <>

Bruce Leban writes:

 > I recognize that the compiler can optimize:
 > > if bytesdata[i] == b'x'[0]:
 > but that looks like chicken scratches to me.

Using named constants should fix that, and is better style anyway.

 > Someone suggested using 0'x' which I don't quite get. It looks too much like
 > 0x to me

True but minor, IMO YMMV.

 > and the I've always read the leading zero to mean 'this is a
 > number'.

That's precisely Nick's point in suggesting it!

From ncoghlan at  Tue May 24 07:13:53 2011
From: ncoghlan at (Nick Coghlan)
Date: Tue, 24 May 2011 15:13:53 +1000
Subject: [Python-ideas] __implements__ on arguments to ABCMeta.register
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, May 24, 2011 at 11:30 AM, Eric Snow <ericsnowcurrently at> wrote:
> On Sun, May 22, 2011 at 11:46 PM, Nick Coghlan <ncoghlan at> wrote:
>> I believe there is an open tracker item for that concept, but I can't
>> currently find a reference to it.
> I believe you are talking about

That's the one (my tracker-fu failed me when I was trying to find it).
I added a link from that issue back to the archive of this thread on


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Tue May 24 07:19:27 2011
From: ncoghlan at (Nick Coghlan)
Date: Tue, 24 May 2011 15:19:27 +1000
Subject: [Python-ideas] Python 3.x and bytes
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<ird1pu$453$> <>
Message-ID: <>

On Tue, May 24, 2011 at 12:40 PM, Stephen J. Turnbull
<stephen at> wrote:
> Bruce Leban writes:
> > and the I've always read the leading zero to mean 'this is a
> > number'.
> That's precisely Nick's point in suggesting it!

Indeed :)

Still, I've come around to the point of view that the simplest and
clearest way to write it is simply "ord('x')", and if that is in a
time-critical inner loop, save the value in a named variable.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From fuzzyman at  Tue May 24 11:24:43 2011
From: fuzzyman at (Michael Foord)
Date: Tue, 24 May 2011 10:24:43 +0100
Subject: [Python-ideas] Implementing __dir__ (moving dir implementation
	to object.__dir__?)
In-Reply-To: <>
References: <>
Message-ID: <>

On 24 May 2011 01:10, Benjamin Peterson <benjamin at> wrote:

> Michael Foord <fuzzyman at ...> writes:
> > Moving the relevant parts of the implementation of dir into
> > object.__dir__ would be one way to solve that.
> Sounds fine to me. Do file a bug report.

All the best,



From stephen at  Tue May 24 14:47:07 2011
From: stephen at (Stephen J. Turnbull)
Date: Tue, 24 May 2011 21:47:07 +0900
Subject: [Python-ideas] Python 3.x and bytes
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<ird1pu$453$> <>
Message-ID: <>

Nick Coghlan writes:

 > Still, I've come around to the point of view that the simplest and
 > clearest way to write it is simply "ord('x')", and if that is in a
 > time-critical inner loop, save the value in a named variable.

+1.  Actually, I prefer the latter.  I feel that the former is just a
complicated and expensive magic number in almost all cases.

From songofacandy at  Wed May 25 19:29:48 2011
From: songofacandy at (INADA Naoki)
Date: Thu, 26 May 2011 02:29:48 +0900
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
Message-ID: <>

Hi, all.

There are some real-world situations where I want to use bytes as a string.
(I use 'bstr' for bytes-as-a-string below.)

Sadly, Python 3's bytes is not a bytestring.
For example, when I want to make 'cat -n' that is transparent to encoding,
Python 3 doesn't permit b'{0:6d}'.format(n) and
is circuitous way against simple requirements.

I think the best way to handle such situations in Python 3 is to use the
'latin1' codec. For example, an encoding-transparent 'cat -n' is:

import sys
fin = open(sys.stdin.fileno(), 'r', encoding='latin1')
fout = open(sys.stdout.fileno(), 'w', encoding='latin1')
for n, L in enumerate(fin):
    fout.write('{0:5d}\t{1}'.format(n, L))

If using 'latin1' is the Pythonic way to handle encoding-transparent strings,
I think Python should provide another alias like 'bytes'.

Any thoughts?

INADA Naoki <songofacandy at>

From tjreedy at  Thu May 26 03:58:58 2011
From: tjreedy at (Terry Reedy)
Date: Wed, 25 May 2011 21:58:58 -0400
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <>
References: <>
Message-ID: <irkc54$gih$>

On 5/25/2011 1:29 PM, INADA Naoki wrote:

> Sadly, Python 3's bytes is not bytestring.

By intention.

> import sys
> fin = open(sys.stdin.fileno(), 'r', encoding='latin1')
> fout = open(sys.stdout.fileno(), 'w', encoding='latin1')
> for n, L in enumerate(fin):
>      fout.write('{0:5d}\t{1}'.format(n, L))
> If using 'latin1' is Pythonic way to handle encoding transparent string,
> I think Python should provide another alias like 'bytes'.

I presume that you mean you would like to write
fin = open(sys.stdin.fileno(), 'r', encoding='bytes')
fout = open(sys.stdout.fileno(), 'w', encoding='bytes')

If such a thing were added, the 256 bytes should directly map to the 
first 256 codepoints. I don't know if 'latin1' does that or not. In any 
case, one can rewrite the above without decoding input lines.

with open('', 'rb') as fin, open('tem2.txt', 'wb') as fout:
   for n, L in enumerate(fin):
     fout.write('{0:5d}\t'.format(n).encode('ascii'))
     fout.write(L)

(sys.x.fileno raises fileno AttributeError in IDLE.)

Terry Jan Reedy

From songofacandy at  Thu May 26 04:57:24 2011
From: songofacandy at (INADA Naoki)
Date: Thu, 26 May 2011 11:57:24 +0900
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <irkc54$gih$>
References: <>
Message-ID: <>

On Thu, May 26, 2011 at 10:58 AM, Terry Reedy <tjreedy at> wrote:
> On 5/25/2011 1:29 PM, INADA Naoki wrote:
>> Sadly, Python 3's bytes is not bytestring.
> By intention.

Yes, I know. But I feel sad because it causes a lot of confusion.
Bytes supports some string methods.

>>> b"foo".capitalize()  # Oh,
>>> b"foo".isalpha()   # alphabets in not-string?
>>> b"foo%d" % 3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for %: 'bytes' and 'int'

>> import sys
>> fin = open(sys.stdin.fileno(), 'r', encoding='latin1')
>> fout = open(sys.stdout.fileno(), 'w', encoding='latin1')
>> for n, L in enumerate(fin):
>>     fout.write('{0:5d}\t{1}'.format(n, L))
>> If using 'latin1' is Pythonic way to handle encoding transparent string,
>> I think Python should provide another alias like 'bytes'.
> I presume that you mean you would like to write
> fin = open(sys.stdin.fileno(), 'r', encoding='bytes')
> fout = open(sys.stdout.fileno(), 'w', encoding='bytes')
> If such a thing were added, the 256 bytes should directly map to the first
> 256 codepoints. I don't know if 'latin1' does that or not. In any case,

Yes, 'latin1' directly maps 256 bytes to 256 codepoints.

> one
> can rewrite the above without decoding input lines.
> with open('', 'rb') as fin, open('tem2.txt', 'wb') as fout:
>   for n, L in enumerate(fin):
>     fout.write('{0:5d}\t'.format(n).encode('ascii'))
>     fout.write(L)
> (sys.x.fileno raises fileno AttributeError in IDLE.)

There are 2 problems.

1) Binary mode doesn't support line buffering, so I would have to disable
   buffering, and this may cause a performance regression.

2) Requiring .encode('ascii') is less attractive when using Python as a
   scripting language on Unix.

But the latin1 approach has disadvantages in performance and memory usage.

I think Python 3 doesn't provide an easy and efficient way to implement an
encoding-transparent command like 'cat -n'. It's very sad.

INADA Naoki <songofacandy at>

From tjreedy at  Thu May 26 06:09:42 2011
From: tjreedy at (Terry Reedy)
Date: Thu, 26 May 2011 00:09:42 -0400
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <>
References: <>	<irkc54$gih$>
Message-ID: <irkjq7$it9$>

On 5/25/2011 10:57 PM, INADA Naoki wrote:
> Bytes supports some string methods.
As exactly specified in 4.6.5. Bytes and Byte Array Methods
There is really no need to repeat what everyone reading this knows.

 > I wrote
>> with open('', 'rb') as fin, open('tem2.txt', 'wb') as fout:
>>   for n, L in enumerate(fin):
>>     fout.write('{0:5d}\t'.format(n).encode('ascii'))
>>     fout.write(L)
>> (sys.x.fileno raises fileno AttributeError in IDLE.)
> There are 2 problems.
> 1) binary mode doesn't support line buffering. So I should disable buffering
>      and this may cause performance regression.

*nix already has a c-coded cat command; Windows has copy commands.
So there is no need to design Python for this.
Cat is usually used with files rather than terminals and screens.
When it is used with terminals and screens, the extra encode/decode does
not matter.
Realistic Python programs that actually do something with the text need
to decode with the actual encoding, regardless of byte source.
So I do not think we need a bytes alias for latin_1.
The docs might mention that it is essentially a do-nothing codec.
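
The binary-mode version quoted above can be tried without touching real
files. In this sketch, number_lines is just a name made up for the example
and io.BytesIO objects stand in for the input and output files:

```python
import io


def number_lines(fin, fout):
    """Copy binary lines from fin to fout, prefixing each with its index."""
    for n, line in enumerate(fin):
        # Only the ASCII prefix is encoded; the bytes of the line itself
        # pass through untouched, whatever their encoding is.
        fout.write('{0:5d}\t'.format(n).encode('ascii'))
        fout.write(line)


# One UTF-8 line and one line that is not valid UTF-8 at all.
src = io.BytesIO(b'caf\xc3\xa9\n\xff\xfe raw\n')
dst = io.BytesIO()
number_lines(src, dst)
print(dst.getvalue())
```

No codec is applied to the payload, so there is nothing to mis-decode.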

Terry Jan Reedy

From serge.hulne at  Thu May 26 07:29:47 2011
From: serge.hulne at (Serge Hulne)
Date: Thu, 26 May 2011 07:29:47 +0200
Subject: [Python-ideas] Suggestion: Integrate the script "" as
 standard command for formatting pyhton code
Message-ID: <>

Suggestion: Integrate the script "" as standard command for
formatting Python code

Here is the link;

Pindent stands for "Python indent":

Goal :

   1. It provides block delimiters (end of blocks) in the form of comments
   (like "#end if" or "#end for" etc ... )
   2. This allows one to check / restore the indentation of Python code, in
   cases where:
      1. A copy/paste went wrong
      2. The indentation of a Python source got corrupted when the script
      was posted on a web page, sent via email etc ...
      3. Standardise (fix) sources which happily mix whitespaces and tabs
      4. Make Python code more readable for developers used to end-of-block
      delimiters (Ruby, C, C++, C#, Java, etc ...)

 Basically the idea is the same as the Go language "gofmt" (Go format).


- Before using pindent:

#!/usr/bin/env python

i = 0
for c in "hello world":
    if c == 'l':
        print "number of occurrences of `l` :", i

- After using pindent:

#!/usr/bin/env python

i = 0
for c in "hello world":
    if c == 'l':
        print "number of occurrences of `l` :", i
    # end if
# end for

Serge Hulne

From ncoghlan at  Thu May 26 07:42:18 2011
From: ncoghlan at (Nick Coghlan)
Date: Thu, 26 May 2011 15:42:18 +1000
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, May 26, 2011 at 3:29 AM, INADA Naoki <songofacandy at> wrote:
> There are some situation that I want to use bytes as a string in real world.

Breaking the bytes-are-text mental model is something we deliberately
set out to do with Python 3 (because it is wrong). In today's global
environment, programmers *need* to learn about text encoding issues as
treating bytes as text without finding out the encoding first is a
surefire way to get unintelligible mojibake. If "What does 'latin-1'
mean?" is a question that gets them there, then that's fine.

You *cannot* transparently handle data in arbitrary encodings, as the
meanings of the bytes change based on the encoding (this is especially
true when dealing with non-ASCII compatible encodings).

That said, decoding and reencoding via 'ascii' (strict 7-bit) or
'latin-1' (full 8-bit) is the easiest way to handle both strings and
bytes input reasonably efficiently. See urllib.parse for examples on
how to do that.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From stefan_ml at  Thu May 26 11:15:19 2011
From: stefan_ml at (Stefan Behnel)
Date: Thu, 26 May 2011 11:15:19 +0200
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <irkc54$gih$>
References: <>
Message-ID: <irl5n7$l1q$>

Terry Reedy, 26.05.2011 03:58:
> If such a thing were added, the 256 bytes should directly map to the first
> 256 codepoints. I don't know if 'latin1' does that or not.

Yes, Unicode was specifically designed to support that. The first 128 code 
points are identical with the ASCII encoding, the first 256 code points are 
identical with the Latin-1 encoding.
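
This is easy to check for yourself; a small sketch (the variable names are
arbitrary):

```python
all_bytes = bytes(range(256))

# Decoding with latin-1 maps byte value n to code point n.
text = all_bytes.decode('latin-1')
code_points = [ord(ch) for ch in text]

# Encoding again restores the original bytes, so nothing is lost.
round_trip = text.encode('latin-1')

# ASCII covers only the first 128 values, and agrees with latin-1 there.
ascii_text = all_bytes[:128].decode('ascii')
```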

See also PEP 393, which exploits this feature.

That being said, I don't see the point of aliasing "latin-1" to "bytes" in 
the codecs. That sounds confusing to me.


From cmjohnson.mailinglist at  Thu May 26 12:53:36 2011
From: cmjohnson.mailinglist at (Carl M. Johnson)
Date: Thu, 26 May 2011 00:53:36 -1000
Subject: [Python-ideas] Suggestion: Integrate the script "" as
 standard command for formatting pyhton code
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, May 25, 2011 at 7:29 PM, Serge Hulne <serge.hulne at> wrote:

>  Basically the idea is the same as the Go language "gofmt" (Go format).

Something like gofmt is imaginable for Python. Block delimiters are not.
Never gonna happen.

From cmjohnson.mailinglist at  Thu May 26 12:59:58 2011
From: cmjohnson.mailinglist at (Carl M. Johnson)
Date: Thu, 26 May 2011 00:59:58 -1000
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <irl5n7$l1q$>
References: <>
	<irkc54$gih$> <irl5n7$l1q$>
Message-ID: <>

On Wed, May 25, 2011 at 11:15 PM, Stefan Behnel <stefan_ml at> wrote:

> Yes, Unicode was specifically designed to support that. The first 128 code
> points are identical with the ASCII encoding, the first 256 code points are
> identical with the Latin-1 encoding.
> See also PEP 393, which exploits this feature.
> That being said, I don't see the point of aliasing "latin-1" to "bytes" in
> the codecs. That sounds confusing to me.

"bytes" is probably the wrong name for it, but I think using some name to
signal "I'm not really using this encoding, I just need to be able to pass
these bytes into and out of a string without losing any bits" might be
better than using "latin-1" if we're forced to take up this hack. (My gut
feeling is that it would be better if we could avoid using the "latin-1"
hack all together, but apparently wiser minds than me have decided we have
no other choice.) Maybe we could call it "passthrough"? And we could add a
documentation note that if you use "passthrough" to decode some bytes you
must, must, must use it to encode them later, since the string you
manipulate won't really contain unicode codepoints, just a transparent byte
encoding?

-- Carl

From mal at  Thu May 26 13:13:29 2011
From: mal at (M.-A. Lemburg)
Date: Thu, 26 May 2011 13:13:29 +0200
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <>
References: <>	<irkc54$gih$>
Message-ID: <>

Carl M. Johnson wrote:
> On Wed, May 25, 2011 at 11:15 PM, Stefan Behnel <stefan_ml at> wrote:
>> Yes, Unicode was specifically designed to support that. The first 128 code
>> points are identical with the ASCII encoding, the first 256 code points are
>> identical with the Latin-1 encoding.
>> See also PEP 393, which exploits this feature.
>> That being said, I don't see the point of aliasing "latin-1" to "bytes" in
>> the codecs. That sounds confusing to me.
> "bytes" is probably the wrong name for it, but I think using some name to
> signal "I'm not really using this encoding, I just need to be able to pass
> these bytes into and out of a string without losing any bits" might be
> better than using "latin-1" if we're forced to take up this hack. (My gut
> feeling is that it would be better if we could avoid using the "latin-1"
> hack all together, but apparently wiser minds than me have decided we have
> no other choice.) Maybe we could call it "passthrough"? And we could add a
> documentation note that if you use "passthrough" to decode some bytes you
> must, must, must use it to encode them later, since the string you
> manipulate won't really contain unicode codepoints, just a transparent byte
> encoding?

If you really wish to carry around binary data in a Unicode
object, then you should use a codec that maps the 256 code
points in a byte to either a private code point area or
use a hack like the surrogateescape approach defined in
PEP 383:

By using 'latin-1' you can potentially have the binary data
leak into other text data of your application, or worse,
have it converted to a different encoding on output, e.g.
when sending the data to a UTF-8 pipe.

In any case, this is bound to create hard to detect problems.

Better use bytes to begin with.
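The difference between the two approaches can be sketched as follows (the payload is an arbitrary illustrative value). With 'latin-1' the data passes for ordinary text and is silently altered by a UTF-8 encoder; with the surrogateescape error handler the same mistake raises an error.

```python
payload = b'\x00\xff\x80binary'

# The latin-1 hack: the result looks like ordinary text, so nothing stops it
# from being written through a UTF-8 encoder, which changes the bytes.
fake_text = payload.decode('latin-1')
leaked = fake_text.encode('utf-8')

# surrogateescape (PEP 383): undecodable bytes become lone surrogates, which
# a strict encoder refuses, so the leak is detected instead of going unnoticed.
escaped = payload.decode('ascii', errors='surrogateescape')
try:
    escaped.encode('utf-8')
    leak_detected = False
except UnicodeEncodeError:
    leak_detected = True

# Encoding with the same error handler restores the original bytes.
restored = escaped.encode('ascii', errors='surrogateescape')
```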

Marc-Andre Lemburg


From serge.hulne at  Thu May 26 13:15:37 2011
From: serge.hulne at (Serge Hulne)
Date: Thu, 26 May 2011 13:15:37 +0200
Subject: [Python-ideas] Suggestion: Integrate the script "" as
 standard command for formatting pyhton code
In-Reply-To: <>
References: <>
Message-ID: <>

Actually these are "fake" block delimiters (in the shape of comments, see
the example in the original post).

By this I mean they are used by the formatting tool (pindent) only, not by
the language (Python itself).

They are generated and used by pindent for the sake of being able to
fix the indent level in Python code when:

   1. A copy / paste went bad (e.g. the last line of a for block has been
   "pasted at the wrong indentation level").
   2. A source file lost all indentation when being mailed because, say, the
   tabs have been stripped.
   3. etc...

I do not see how there can be an equivalent of gofmt if there is no
*indication* of the end of the blocks (independent of the indentation, that
is).
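To illustrate the idea only: the sketch below is not pindent's actual implementation, and the function name `reindent` is hypothetical. It rebuilds indentation from "# end ..." comments and deliberately ignores else/elif/except, continuation lines, and multi-line strings.

```python
def reindent(lines, width=4):
    """Rebuild indentation of flattened code using '# end ...' comments."""
    level = 0
    out = []
    for raw in lines:
        text = raw.strip()
        # A block-closing comment dedents before it is emitted.
        if text.startswith('# end'):
            level = max(level - 1, 0)
        out.append(' ' * (width * level) + text if text else '')
        # A statement ending in ':' opens a new block for the next line.
        if text.endswith(':') and not text.startswith('#'):
            level += 1
    return out


flat = [
    "i = 0",
    "for c in 'hello world':",
    "if c == 'l':",
    "i += 1",
    "# end if",
    "# end for",
]
fixed = reindent(flat)
print('\n'.join(fixed))
```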

It is my feeling that without such a tool Python is inherently very
vulnerable to glitches occurring at editing time:

   1. Copy / paste glitch that passes unnoticed, does not generate an
      exception but alters the logic of the program.
   2. Tab key inadvertently hit.
   3. Difficulty in assessing the target indentation level when a part of
      a block has to be pasted in a different part of the code.

Serge Hulne.

On Thu, May 26, 2011 at 7:29 AM, Serge Hulne <serge.hulne at> wrote:

> Suggestion: Integrate the script "" as standard command for
> formatting pyhton code
> Here is the link;
> Pindent stands for "Pyton indent":
> Goal :
>    1. It provides bloc delimiters (end of blocks) in the for of comments
>    (like "#end if" or "#end for" etc ... )
>    2. This allows one to check / restore the indentation of Python code,
>    in cases where>
>       1. A copy/paste went wrong
>       2. The indentation of a Python source got corrupted when the script
>       was posted on web page, send via email etc ...
>       3. Standardise (fix) sources which happily mix whitespaces and tabs
>       4. Make Python code more readable for developers used to end of
>       blocs delimiters (Ruby, C, C++, C#,Java, etc ...)
>  Basically the idea is the same as the Go language "gofmt" (Go format).
> Example:
> #-------------------
> - Before using pindent:
> #!/usr/bin env python
> i = 0
> for c in "hello world":
>     if c == 'l':
>         i+=1
>         print "number of occurrences of `l` :", i
> #------------------
> - After using indent:
> #!/usr/bin env python
> i = 0
> for c in "hello world":
>     if c == 'l':
>         i+=1
>         print "number of occurrences of `l` :", i
>     # end if
> # end for
> Serge Hulne

From masklinn at  Thu May 26 13:17:07 2011
From: masklinn at (Masklinn)
Date: Thu, 26 May 2011 13:17:07 +0200
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <>
References: <>
	<irkc54$gih$> <irl5n7$l1q$>
Message-ID: <>

On 2011-05-26, at 12:59 , Carl M. Johnson wrote:
> On Wed, May 25, 2011 at 11:15 PM, Stefan Behnel <stefan_ml at> wrote:
>> Yes, Unicode was specifically designed to support that. The first 128 code
>> points are identical with the ASCII encoding, the first 256 code points are
>> identical with the Latin-1 encoding.
>> See also PEP 393, which exploits this feature.
>> That being said, I don't see the point of aliasing "latin-1" to "bytes" in
>> the codecs. That sounds confusing to me.
> "bytes" is probably the wrong name for it, but I think using some name to
> signal "I'm not really using this encoding, I just need to be able to pass
> these bytes into and out of a string without losing any bits" might be
> better than using "latin-1" if we're forced to take up this hack. (My gut
> feeling is that it would be better if we could avoid using the "latin-1"
> hack all together, but apparently wiser minds than me have decided we have
> no other choice.) Maybe we could call it "passthrough"? And we could add a
> documentation note that if you use "passthrough" to decode some bytes you
> must, must, must use it to encode them later, since the string you
> manipulate won't really contain unicode codepoints, just a transparent byte
> encoding?

Considering the original use case, which seems to be mostly about being able to use .format, would it make more sense to be able to create "byte patterns", with formats similar to those of str.format but not identical (e.g. better control of layout would be nice, something similar to Erlang's bit syntax for putting binaries together)?

This would be useful to put together byte sequences from existing values to e.g. output binary formats.

From jimjjewett at  Thu May 26 15:45:51 2011
From: jimjjewett at (Jim Jewett)
Date: Thu, 26 May 2011 09:45:51 -0400
Subject: [Python-ideas] Suggestion: Integrate the script "" as
 standard command for formatting pyhton code
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, May 26, 2011 at 7:15 AM, Serge Hulne <serge.hulne at> wrote:
> Actually these are "fake" bloc delimiters (in the shape of comments, see
> example in the original post).

They are inherently bad, because they are extra noise.  The question
is whether they add enough value to make up for that.

> A copy / paste went bad (e.g. the last line of a for bloc has been "pasted
> at the wrong indentation level").

For me, it is usually either the entire bloc, or just the first line
that is wrong.

> A source file lost all indentation when been mailed because, say, the tabs
> have been stripped
> etc...

This has been an annoyance on the python lists lately; I'm not sure
why, but a lot of the recent code has come through (at least on my
gmail account) without indentation at all.

The catch is, I have usually been able to figure out where the
indents/dedents should go; if I can't, it is a sign that the function
is too long.  And these extra comments only make the functions longer.


From brian.curtin at  Thu May 26 16:06:44 2011
From: brian.curtin at (Brian Curtin)
Date: Thu, 26 May 2011 09:06:44 -0500
Subject: [Python-ideas] Suggestion: Integrate the script "" as
 standard command for formatting pyhton code
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, May 26, 2011 at 00:29, Serge Hulne <serge.hulne at> wrote:

> Suggestion: Integrate the script "" as standard command for
> formatting pyhton code
> Here is the link;
> Pindent stands for "Pyton indent":
> Goal :
>    1. It provides bloc delimiters (end of blocks) in the for of comments
>    (like "#end if" or "#end for" etc ... )
>    2. This allows one to check / restore the indentation of Python code,
>    in cases where>
>       1. A copy/paste went wrong
>       2. The indentation of a Python source got corrupted when the script
>       was posted on web page, send via email etc ...
>       3. Standardise (fix) sources which happily mix whitespaces and tabs
>       4. Make Python code more readable for developers used to end of
>       blocs delimiters (Ruby, C, C++, C#,Java, etc ...)
>  Basically the idea is the same as the Go language "gofmt" (Go format).
> Example:
> #-------------------
> - Before using pindent:
> #!/usr/bin env python
> i = 0
> for c in "hello world":
>     if c == 'l':
>         i+=1
>         print "number of occurrences of `l` :", i
> #------------------
> - After using indent:
> #!/usr/bin env python
> i = 0
> for c in "hello world":
>     if c == 'l':
>         i+=1
>         print "number of occurrences of `l` :", i
>     # end if
> # end for

This is already included in the Python source tree, so I'm not sure what
further inclusion/integration you are suggesting. I don't find this style
necessary nor is it really a good style to promote, especially because
Python isn't Ruby, C++, or any of the languages you listed.

The only time I've found it sort-of ok to do this is if a block nested in
other blocks spans more than the height of one monitor view, which isn't
often. Even then, most IDEs and editors handle this by having optional
guides for block beginning and ending.

From g.rodola at  Thu May 26 16:12:55 2011
From: g.rodola at (Giampaolo Rodolà)
Date: Thu, 26 May 2011 16:12:55 +0200
Subject: [Python-ideas] Suggestion: Integrate the script "" as
 standard command for formatting pyhton code
In-Reply-To: <>
References: <>
Message-ID: <>

Brian Curtin <brian.curtin at>:
> This is already included in the Python source tree, so I'm not sure what
> further inclusion/integration you are suggesting.

Really? I honestly fail to understand why one would want to use such a
tool at all.
It always assumes the worst scenario (bad indentation / mixed tab
spaces / copy & paste went bad) and tries to solve it by adding
unnecessary cruft.

2011/5/26 Serge Hulne <serge.hulne at>:
> Make Python code more readable for developers used to end of blocs
> delimiters (Ruby, C, C++, C#,Java, etc ...)

Unless the block code is very long and/or not nicely written it's
*less* readable.

--- Giampaolo

From ncoghlan at  Thu May 26 16:55:55 2011
From: ncoghlan at (Nick Coghlan)
Date: Fri, 27 May 2011 00:55:55 +1000
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <>
References: <>
	<irkc54$gih$> <irl5n7$l1q$>
Message-ID: <>

On Thu, May 26, 2011 at 9:17 PM, Masklinn <masklinn at> wrote:
> Considering the original use case, which seems to be mostly about being able to use .format, would it make more sense to be able to create "byte patterns", with formats similar to those of str.format but not identical (e.g. better control on layout would be nice, something similar to Erlang's bit syntax for putting binaries together).
> This would be useful to put together byte sequences from existing values to e.g. output binary formats.

We already have an entire module dedicated to the task of handling
binary formats:

"format(n, '6d').encode('ascii')" is the right way to get the string
representation of a number as ASCII bytes. However, the programmer
needs to be aware that concatenating those bytes with an encoding that
is not ASCII compatible (such as UTF-16, UTF-32, or many of the Asian
encodings) will result in a sequence of unusable garbage. It is far,
far safer to transform everything into the text domain, work with it
there, then encode back when the manipulation is complete.
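A short sketch of both points. It assumes the module referred to above is struct (the reply below names it); the field layout and sample values are illustrative only.

```python
import struct

# Binary layout: big-endian unsigned 16-bit length followed by 4 raw bytes.
packet = struct.pack('>H4s', 4, b'data')

# Textual number as ASCII bytes, as in the expression quoted above.
n = 42
field = format(n, '6d').encode('ascii')

# Concatenating ASCII bytes with data in an encoding that is not ASCII
# compatible produces bytes that no longer decode to the intended text.
utf16_line = 'log entry'.encode('utf-16-le')
mixed = field + utf16_line
garbled = mixed.decode('utf-16-le')

# Working in the text domain and encoding once at the end avoids the problem.
clean = (format(n, '6d') + 'log entry').encode('utf-16-le')
```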


Nick Coghlan | ncoghlan at | Brisbane, Australia

From masklinn at  Thu May 26 17:56:48 2011
From: masklinn at (Masklinn)
Date: Thu, 26 May 2011 17:56:48 +0200
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <>
References: <>
	<irkc54$gih$> <irl5n7$l1q$>
Message-ID: <>

On 2011-05-26, at 16:55 , Nick Coghlan wrote:

> On Thu, May 26, 2011 at 9:17 PM, Masklinn <masklinn at> wrote:
>> Considering the original use case, which seems to be mostly about being able to use .format, would it make more sense to be able to create "byte patterns", with formats similar to those of str.format but not identical (e.g. better control on layout would be nice, something similar to Erlang's bit syntax for putting binaries together).
>> This would be useful to put together byte sequences from existing values to e.g. output binary formats.
> We already have an entire module dedicated to the task of handling
> binary formats:
Sure, but:

1. It does not matter overly much, there are many cases where this did not stop the core team from agreeing the problem was insufficiently well solved (latest instance: string formatting, the current builtin solution being predated by another builtin and at least one previous stdlib solution).

2. struct suffers from a bunch of issues
  - it ranks low in discoverability, people who have not bit-twiddled much in C may not realize that a struct (in C) is just an interpretation pattern on a byte string, and it's advertised as an interaction between Python and C structs, not arbitrary bytes patterns/building
  - struct format strings are "wonky" (in that they're nothing like those of str.format)
  - struct format strings simply can't deal with mixing literal "character bytes" and format specs, making formats with fixed ascii structures significantly less readable

> "format(n, '6d').encode('ascii')" is the right way to get the string
> representation of a number as ASCII bytes. However, the programmer
> needs to be aware that concatenating those bytes with an encoding that
> is not ASCII compatible (such as UTF-16, UTF-32, or many of the Asian
> encodings) will result in a sequence of unusable garbage. It is far,
> far safer to transform everything into the text domain, work with it
> there, then encode back when the manipulation is complete.
Sure, but as you noted this is not even always done in the stdlib, so why would third-party developers be expected to be in a better situation?

And between jumping through a semi-arbitrary decode/encode cycle whose semantics are completely ignored and being able to just specify a bytes pattern, which seems stranger?

And I'm probably overstating its importance, but erlang seems to do rather well with its bit syntax. Which is much closer to str.format than to struct.pack (in API, in looks, in complexity, ...)

From benjamin at  Thu May 26 22:19:37 2011
From: benjamin at (Benjamin Peterson)
Date: Thu, 26 May 2011 20:19:37 +0000 (UTC)
Subject: [Python-ideas] Suggestion: Integrate the script "" as
	standard command for formatting pyhton code
References: <>
Message-ID: <>

Serge Hulne <serge.hulne at ...> writes:

> Suggestion: Integrate the script ""

A more useful script in my opinion is "".

> A copy/paste went wrong
> The indentation of a Python source got corrupted when the script was posted on
> web page, send via email etc ...
> Standardise (fix) sources which happily mix whitespaces and tabs 

Since it does just this and nothing else.

From tjreedy at  Fri May 27 00:26:21 2011
From: tjreedy at (Terry Reedy)
Date: Thu, 26 May 2011 18:26:21 -0400
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <>
References: <>	<irkc54$gih$>
	<irl5n7$l1q$>	<>
Message-ID: <irmk2f$ess$>

On 5/26/2011 7:17 AM, Masklinn wrote:
> Considering the original use case,

to prefix ascii-encoded numbers to lines in an unknown but 
ascii-compatible encoding*,
and considering the responses since my last post, I have changed from -0 
to -1 to the alias proposal.

1. The use case does not need the fake decoding and is better off 
without it.
2. I suspect the use cases where fake decoding is both needed and 
sufficient are relatively rare.
3. Fake decoding is dangerous (Lemburg).
4. People who know enough to use it safely should already know about how 
latin-1 relates to unicode, and therefore do not need an alias.
5. Other people should not be encouraged to use it as a fake.

*I meant to ask earlier whether there are ascii-incompatible encodings 
for which the original code and my revision would not work. I gather 
from the responses that yes, there are some.
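For reference, a minimal sketch of the use case without any fake decoding. The code under discussion is not reproduced in this part of the thread, so the function name `number_lines` and its shape are assumptions; only the counter is encoded, and the line bytes are never interpreted.

```python
def number_lines(lines, start=1):
    """Prefix each bytes line with an ASCII-encoded counter.

    The line content is passed through untouched, so this works for any
    ASCII-compatible encoding but not for e.g. UTF-16 input.
    """
    return [str(i).encode('ascii') + b' ' + line
            for i, line in enumerate(lines, start)]


numbered = number_lines([b'caf\xe9\n', b'\xe3\x81\x82\n'])
```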

Terry Jan Reedy

From greg.ewing at  Fri May 27 01:28:47 2011
From: greg.ewing at (Greg Ewing)
Date: Fri, 27 May 2011 11:28:47 +1200
Subject: [Python-ideas] Suggestion: Integrate the script "" as
 standard command for formatting pyhton code
In-Reply-To: <>
References: <>
Message-ID: <>

Serge Hulne wrote:

> It is my feeling that without such a tool Python is inherently very 
> vulnerable to glitches occurring at editing time:
>          1. Copy / paste glitch that passes unnoticed, does not generate
>             an exception but alters the logic of the program.
>          2. Tab key inadvertently hit.
>          3. Difficulty in assessing the target indentation level when a
>             part of a bloc has to be pasted in a different part of the
>             code.

How much actual experience have you had writing and editing
Python code? While it might seem from a theoretical viewpoint
that these problems should exist, in my experience they occur
very rarely, if at all.

Even sending Python code by email seems to be fine most of
the time as long as you indent it with spaces, unless there
is some particularly braindamaged piece of software in the
way. All the Python mailing lists and newsgroups I frequent
seem to handle space-indented Python just fine.

I don't think any tool to add block-delimiting comments is
going to gain much adoption, because the uglification of the
code that it results in is grossly out of proportion to the
actual magnitude of the problem.


From greg.ewing at  Fri May 27 01:34:51 2011
From: greg.ewing at (Greg Ewing)
Date: Fri, 27 May 2011 11:34:51 +1200
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <>
References: <>
	<irkc54$gih$> <irl5n7$l1q$>
Message-ID: <>

Masklinn wrote:

> would it make more sense to be able to create "byte patterns",
> with formats similar to those of str.format but not identical (e.g. better
> control on layout would be nice, something similar to Erlang's bit syntax for
> putting binaries together).

Sounds a lot like struct.pack. Maybe struct.pack and struct.unpack
could be made available as methods of bytes?

I don't think this would address the OP's use case, though, because
he seems to actually want a textual format whose output is encoded
in ascii.


From songofacandy at  Fri May 27 04:02:52 2011
From: songofacandy at (INADA Naoki)
Date: Fri, 27 May 2011 11:02:52 +0900
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <irmk2f$ess$>
References: <>
	<irkc54$gih$> <irl5n7$l1q$>
Message-ID: <>

On Fri, May 27, 2011 at 7:26 AM, Terry Reedy <tjreedy at> wrote:
> On 5/26/2011 7:17 AM, Masklinn wrote:
>> Considering the original use case,
> to prefix ascii-encoded numbers to lines in an unknown but ascii-compatible
> encoding*,
> and considering the responses since my last post, I have changed from -0 to
> -1 to the alias proposal.
> 1. The use case does not need the fake decoding and is better off without
> it.
> 2. I suspect the uses cases where fake decoding is both needed and
> sufficient are relatively rare.
> 3. Fake decoding is dangerous (Lemburg).
> 4. People who know enough to use it safely should already know about how
> latin-1 relates to unicode, and therefore do not need an alias.
> 5. Other people should not be encouraged to use it as a fake.

OK, I understand that using 'latin1' is just a hack and not Pythonic way.

Then, I hope bytes has a fast and efficient "format" method like:
>>> b'{0} {1}'.format(23, b'foo')  # accepts int, float, bytes, bool, None
23 foo
>>> b'{0}'.format('foo')  # raises TypeError for other types.
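No such bytes method exists in the Python being discussed, so the behaviour wished for above can only be sketched with a helper. The name `bformat` and its rules are hypothetical: it handles plain `{N}` fields only, encodes numbers, bool and None as ASCII, and rejects str.

```python
import re

_FIELD = re.compile(rb'\{(\d+)\}')


def bformat(template, *args):
    """Substitute positional {N} fields in a bytes template."""
    def render(match):
        value = args[int(match.group(1))]
        if isinstance(value, (bytes, bytearray)):
            return bytes(value)              # bytes are inserted unchanged
        if value is None or isinstance(value, (bool, int, float)):
            return str(value).encode('ascii')
        raise TypeError('unsupported type: %s' % type(value).__name__)
    return _FIELD.sub(render, template)


result = bformat(b'{0} {1}', 23, b'foo')
```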

And line buffering in binary mode is also nice.

> *I meant to ask earlier whether there are ascii-incompatible encodings for
> which the original code and my revision would not work. I gather from the
> responses that yes, there are some.
> --
> Terry Jan Reedy

INADA Naoki <songofacandy at>

From stephen at  Fri May 27 04:59:58 2011
From: stephen at (Stephen J. Turnbull)
Date: Fri, 27 May 2011 11:59:58 +0900
Subject: [Python-ideas]  Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <>
References: <>
Message-ID: <>

INADA Naoki writes:

 > Any thoughts?

-1 TOOWTDI.  No alias, please.  It's just an idiom people who need the
functionality will need to learn (but see comment on urllib.parse below).

As Terry says, it's hard to believe that use of the latin1 codec and
str for internal processing is going to be a bottleneck in practical
applications.

I wonder if it would be possible to generalize Nick's work on
urllib.parse to a more general class.

From ncoghlan at  Fri May 27 06:41:04 2011
From: ncoghlan at (Nick Coghlan)
Date: Fri, 27 May 2011 14:41:04 +1000
Subject: [Python-ideas] Suggestion: Integrate the script "" as
 standard command for formatting pyhton code
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, May 27, 2011 at 9:28 AM, Greg Ewing <greg.ewing at> wrote:
> Even sending Python code by email seems to be fine most of
> the time as long as you indent it with spaces, unless there
> is some particularly braindamaged piece of software in the
> way. All the Python mailing lists and newsgroups I frequent
> seem to handle space-indented Python just fine.

Email is generally fine, but quite a few commenting systems are
braindead when it comes to handling whitespace correctly. Even there,
a simple leading dot on each line can generally resolve the issue, or
else you put the code on a code pasting site and just link to it from
the comment.


Nick Coghlan | ncoghlan at | Brisbane, Australia

From ncoghlan at  Fri May 27 07:11:41 2011
From: ncoghlan at (Nick Coghlan)
Date: Fri, 27 May 2011 15:11:41 +1000
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, May 27, 2011 at 12:59 PM, Stephen J. Turnbull
<stephen at> wrote:
> I wonder if it would be possible to generalize Nick's work on
> urllib.parse to a more general class.

I thought about that when I was implementing it, and I don't really
think so. The decode/encode cycle in urllib.parse is based on a few
key elements:

1. The URL standard itself mandates a 7-bit ASCII bytestream. The
implicit conversion accordingly uses the ascii codec with strict error
handling, so if you want to handle malformed URLs, you still have to
do your own decoding and pass in already decoded text strings rather
than the raw bytes (as there is no way for the library to guess an
appropriate encoding for any non-ASCII bytes it encounters).
2. The affected urllib.parse APIs are all stateless - the output is
determined by the inputs. Accordingly, it was fairly straightforward
to coerce all of the arguments to strings and also create a "coerce
result" callable that is either a no-op that just returns its argument
(string inputs) or calls .encode() on its input and returns that
(bytes/bytearray inputs)
3. All of the operations that returned tuples were updated to return
namedtuple subclasses with an encode() method that passed the encoding
command down to the individual tuple elements. These subclasses all
came in matched pairs (one that held only strings, another that held
only bytes).

The argument coercion function could probably be extracted and placed
in the string module, but it isn't all that useful on its own - it's
adequate if you're only returning single strings, but needs to be
matched with an appropriately designed class hierarchy if you're
returning anything more complicated.
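The three elements above can be sketched roughly as follows. This is not the urllib.parse source; `coerce_args`, `SplitStr`, `SplitBytes` and `split_once` are hypothetical names chosen for illustration.

```python
from collections import namedtuple


def coerce_args(*args):
    """Return the arguments as str plus a callable that coerces the result."""
    if all(isinstance(a, str) for a in args):
        return args + (lambda result: result,)      # no-op for str input
    if any(isinstance(a, str) for a in args):
        raise TypeError('cannot mix str and non-str arguments')
    # Strict 7-bit ASCII: malformed input fails here rather than being guessed.
    decoded = tuple(a.decode('ascii', 'strict') for a in args)
    return decoded + (lambda result: result.encode('ascii'),)


class SplitStr(namedtuple('SplitStr', 'head tail')):
    def encode(self, encoding='ascii', errors='strict'):
        # Pass the encoding command down to the individual elements.
        return SplitBytes(*(part.encode(encoding, errors) for part in self))


class SplitBytes(namedtuple('SplitBytes', 'head tail')):
    def decode(self, encoding='ascii', errors='strict'):
        return SplitStr(*(part.decode(encoding, errors) for part in self))


def split_once(value, sep):
    """Stateless example API: same operation for str and bytes input."""
    value, sep, coerce_result = coerce_args(value, sep)
    head, _, tail = value.partition(sep)
    return coerce_result(SplitStr(head, tail))


text_result = split_once('key=value', '=')
bytes_result = split_once(b'key=value', b'=')
```

The matched pair of tuple classes is what keeps the return type consistent with the input type; the coercion function alone would only cover single-string results.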

I believe RDM used a similar design pattern of parallel bytes and
string based return types to get the email package into a more usable
state for 3.2.


Nick Coghlan | ncoghlan at | Brisbane, Australia

From ncoghlan at  Fri May 27 07:24:25 2011
From: ncoghlan at (Nick Coghlan)
Date: Fri, 27 May 2011 15:24:25 +1000
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <>
References: <>
	<irkc54$gih$> <irl5n7$l1q$>
Message-ID: <>

On Fri, May 27, 2011 at 12:02 PM, INADA Naoki <songofacandy at> wrote:
> Then, I hope bytes has a fast and efficient "format" method like:
>>>> b'{0} {1}'.format(23, b'foo')  # accepts int, float, bytes, bool, None
> 23 foo
>>>> b'{0}'.format('foo')  # raises TypeError for other types.
> TypeError

What method is invoked to convert the numbers to text? What encoding
is used to convert those numbers to text? How does this operation
avoid also converting the *bytes* object to text and then reencoding
it?

Bytes are not text. Struggling against that is a recipe for making
life hard for yourself in Python 3.

That said, there *may* still be a place for bytes.format(). However,
proper attention needs to be paid to the encoding issues, and the
question of how arbitrary types can be supported (including how to
handle the fast path for existing bytes() and bytearray() objects).
The pedagogic cost of making it even harder than it already is to
convince people that bytes are not text would also need to be
considered.

> And line buffering in binary mode is also nice.

The Python 3 IO stack already provides b'\n' based line buffering for
binary files.


Nick Coghlan | ncoghlan at | Brisbane, Australia

From songofacandy at  Fri May 27 08:14:57 2011
From: songofacandy at (INADA Naoki)
Date: Fri, 27 May 2011 15:14:57 +0900
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <>
References: <>
	<irkc54$gih$> <irl5n7$l1q$>
Message-ID: <>

On Fri, May 27, 2011 at 2:24 PM, Nick Coghlan <ncoghlan at> wrote:
> On Fri, May 27, 2011 at 12:02 PM, INADA Naoki <songofacandy at> wrote:
>> Then, I hope bytes has a fast and efficient "format" method like:
>>>>> b'{0} {1}'.format(23, b'foo')  # accepts int, float, bytes, bool, None
>> 23 foo
>>>>> b'{0}'.format('foo')  # raises TypeError for other types.
>> TypeError
> What method is invoked to convert the numbers to text?

It doesn't invoke any methods. Please imagine stdio's printf.

> What encoding
> is used to convert those numbers to text?
> How does this operation
> avoid also converting the *bytes* object to text and then reencoding
> it?

I wrote a wrong example.

>>>>> b'{0} {1}'.format(23, b'foo')  # accepts int, float, bytes, bool, None
>> 23 foo

This should be b'23 foo'. Numbers are encoded as ASCII.

> Bytes are not text. Struggling against that is a recipe for making
> life hard for yourself in Python 3.

I love unicode and use unicode when I can use it.
But this is a problem in the real world.
For example, Python 2 is convenient for analyzing line based logs
containing some different encodings. Python 3

> That said, there *may* still be a place for bytes.format(). However,
> proper attention needs to be paid to the encoding issues, and the
> question of how arbitrary types can be supported (including how to
> handle the fast path for existing bytes() and bytearray() objects).
> The pedagogic cost of making it even harder than it already is to
> convince people that bytes are not text would also need to be
> considered.
>> And line buffering in binary mode is also nice.
> The Python 3 IO stack already provides b'\n' based line buffering for
> binary files.

But the doc says that "1 to select line buffering (only usable in text mode),"

> Cheers,
> Nick.
> --
> Nick Coghlan | ncoghlan at | Brisbane, Australia

INADA Naoki <songofacandy at>

From ncoghlan at  Fri May 27 08:37:31 2011
From: ncoghlan at (Nick Coghlan)
Date: Fri, 27 May 2011 16:37:31 +1000
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <>
References: <>
	<irkc54$gih$> <irl5n7$l1q$>
Message-ID: <>

On Fri, May 27, 2011 at 4:14 PM, INADA Naoki <songofacandy at> wrote:
> But the doc says that "1 to select line buffering (only usable in text mode),"

True, I was thinking about the public API (readline/readlines) rather
than the underlying buffering.
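For example, with an in-memory binary stream (io.BytesIO stands in for a file opened in 'rb' mode; the data is illustrative):

```python
import io

stream = io.BytesIO(b'first\nsecond\xff\nlast')

first = stream.readline()    # reads up to and including the next b'\n'
rest = stream.readlines()    # remaining lines, still as bytes
```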


Nick Coghlan | ncoghlan at | Brisbane, Australia

From ncoghlan at  Fri May 27 08:45:13 2011
From: ncoghlan at (Nick Coghlan)
Date: Fri, 27 May 2011 16:45:13 +1000
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <>
References: <>
	<irkc54$gih$> <irl5n7$l1q$>
Message-ID: <>

On Fri, May 27, 2011 at 4:14 PM, INADA Naoki <songofacandy at> wrote:
> I love unicode and use unicode when I can use it.
> But this is a problem in the real world.
> For example, Python 2 is convenient for analyzing line based logs
> containing some different encodings. Python 3

...deliberately makes that difficult because it is *wrong*.

Binary files containing a mixture of encodings cannot be safely
treated as text. The closest it is possible to get is to support only
ASCII compatible encodings by decoding it as ASCII with the
"surrogateescape" error handler so that bytes with the high order bit
set can be faithfully reproduced on reencoding. However, such code
will potentially fail once it encounters a non-ASCII compatible
encoding, such as UTF-16 or -32.
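
A minimal sketch of the round trip described above, assuming Python 3 and a
made-up log line. It also shows why a non-ASCII compatible encoding defeats
the approach: the decode succeeds, but the result is not usable as text.

```python
# Sample log line: ASCII text plus one byte (0xE9) of unknown encoding.
raw = b"GET /caf\xe9 HTTP/1.1"

text = raw.decode("ascii", errors="surrogateescape")
# The undecodable byte is smuggled through as a lone surrogate (U+DCE9) ...
smuggled = text[8]
# ... and the same error handler turns it back into the original byte.
round_tripped = text.encode("ascii", errors="surrogateescape")

# UTF-16 data decodes without error but cannot be processed as ASCII text:
utf16 = "GET".encode("utf-16-le")            # b"G\x00E\x00T\x00"
garbled = utf16.decode("ascii", errors="surrogateescape")
found = "GET" in garbled                     # NUL characters separate the letters
```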


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From stephen at  Fri May 27 10:46:48 2011
From: stephen at (Stephen J. Turnbull)
Date: Fri, 27 May 2011 17:46:48 +0900
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <>
References: <>
	<irkc54$gih$> <irl5n7$l1q$>
Message-ID: <>

Nick Coghlan writes:
 > On Fri, May 27, 2011 at 12:02 PM, INADA Naoki <songofacandy at> wrote:

 > > Then, I hope bytes has a fast and efficient "format" method like:

I still don't see a use case for a fast and efficient bytes.format()
method.  The latin-1 codec is O(n) with a very small coefficient.

It seems to me this is "really" all about TOOWTDI: we'd like to be
able to interpolate data received as arguments into a data stream
using the same idiom everywhere, whether the stream consists of text,
bytes, or class Froooble instances.  (I admit I don't offhand know how
you'd spell "{0}" in a Froooble stream.)  OK, so at present only bytes
is a plausible application, but I'm willing to go there.  Then, if it
turns out that the latin-1 codec imposes too high overhead on .format()
in some application, the concerned parties can optimize it.

 > >>>> b'{0} {1}'.format(23, b'foo')  # accepts int, float, bytes, bool, None

I don't see a use case for accepting bool or None.  I hadn't thought
about float, but are you really gonna need it?  On-the-fly generation
of CSS "'{0}em'.format(0.5)" or something like that, I guess?

 > > 23 foo
 > >>>> b'{0}'.format('foo')  # raises TypeError for other types.

Philip Eby has a use case for accepting str as long as the ascii codec
in strict error mode works on the particular instances of str.
Although I'm not sure he would consider a .format() method efficient
enough, ISTR he wanted the compiler to convert literals.

 > > TypeError
 > What method is invoked to convert the numbers to text? What encoding
 > is used to convert those numbers to text? How does this operation
 > avoid also converting the *bytes* object to text and then reencoding
 > it?

OTOH, Nick, aren't you making this harder than it needs to be?  After
all,

 > Bytes are not text.

Precisely.  So bytes.format() need not handle *all* text-like
manipulations, just protocol magic that puns ASCII-encoded text.

If a bytes object is displayed sorta like text, then it *is* *all*
bytes in the ASCII repertoire (not even the right half of Latin-1 is
allowed).  In bytes.format(), bytes are bytes, they don't get encoded,
they just get interpolated into the bytes object being created.  For
other stuff, especially integers, if there is a conventional
representation for it in ASCII, it *might* be an appropriate conversion
for bytes.format() (but see above for my reservations about several
common Python types).

str (Unicode) might be converted via the ascii codec in strict errors
mode, although the purist in me really would rather not go there.

AFAICS, this handles all use cases presented so far.
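
To make the rules above concrete, here is a rough sketch written as a plain
function.  The name interpolate_bytes and the restriction to "{N}" fields are
assumptions made for illustration; nothing like this exists in the stdlib.
Bytes are interpolated untouched, ints get their conventional ASCII decimal
form, and str goes through the ascii codec in strict mode.

```python
import re

_FIELD = re.compile(rb"\{(\d+)\}")   # only positional "{N}" fields are handled


def interpolate_bytes(template, *args):
    """Hypothetical helper: fill {N} fields in a bytes template."""
    def render(match):
        value = args[int(match.group(1))]
        if isinstance(value, (bytes, bytearray)):
            return bytes(value)                 # bytes are bytes: no encoding step
        if isinstance(value, bool):
            raise TypeError("bool is not supported")   # bool is an int subclass
        if isinstance(value, int):
            return str(value).encode("ascii")   # conventional ASCII representation
        if isinstance(value, str):
            return value.encode("ascii")        # strict: non-ASCII text raises
        raise TypeError("unsupported type: " + type(value).__name__)
    return _FIELD.sub(render, template)


print(interpolate_bytes(b"{0} {1}", 23, b"foo"))
```

Everything else, including None and float, is rejected with TypeError in this
sketch; whether float deserves a place is exactly the open question.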

 > The pedagogic cost of making it even harder than it already is to
 > convince people that bytes are not text would also need to be
 > considered.

This bothers me quite a bit, but my sense is that practicality is
going to beat purity (into a bloody pulp :-P) once again.

From ncoghlan at  Fri May 27 11:27:54 2011
From: ncoghlan at (Nick Coghlan)
Date: Fri, 27 May 2011 19:27:54 +1000
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <>
References: <>
	<irkc54$gih$> <irl5n7$l1q$>
Message-ID: <>

On Fri, May 27, 2011 at 6:46 PM, Stephen J. Turnbull <stephen at> wrote:
>  > What method is invoked to convert the numbers to text? What encoding
>  > is used to convert those numbers to text? How does this operation
>  > avoid also converting the *bytes* object to text and then reencoding
>  > it?
> OTOH, Nick, aren't you making this harder than it needs to be?  After
> all,

To me, the defining feature of str.format() over str.__mod__() is the
ability for types to provide their own __format__ methods, rather than
being limited to a predefined set of types known to the interpreter.
If bytes were to reuse the same name, then I'd want to see similar
flexibility.
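
For readers unfamiliar with that hook, a minimal sketch follows.  The Version
class and its "short" format spec are invented for illustration; the point is
only that str.format() hands the spec to the type and expects a str back.

```python
class Version:
    """Hypothetical user-defined type that controls its own formatting."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def __format__(self, spec):
        # str.format() passes the text after ":" as spec and expects a str.
        if spec == "short":
            return str(self.major)
        return "{}.{}".format(self.major, self.minor)


full = "Python {0}".format(Version(3, 2))
short = "Python {0:short}".format(Version(3, 2))
print(full, "/", short)
```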

Now, a *different* bytes method (bytes.interpolate, perhaps?), limited
to specific types may make sense, but such an alternative *shouldn't*
be conflated with the text formatting API.

However, proponents of such an addition need to clearly articulate
their use cases and proposed solution in a PEP to make it clear that
they aren't merely trying to perpetuate the bytes/text confusion that
plagues 2.x 8-bit strings.

We can almost certainly do better when it comes to constructing byte
sequences from component parts, but simply saying "oh, just add a
format() method to bytes objects" doesn't cut it, since the associated
magic methods for str.format are all string based, and bytes
interpolation also needs to address encoding issues for anything that
isn't already a byte sequence.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From masklinn at  Fri May 27 11:41:32 2011
From: masklinn at (Masklinn)
Date: Fri, 27 May 2011 11:41:32 +0200
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <>
References: <>
	<irkc54$gih$> <irl5n7$l1q$>
Message-ID: <>

On 2011-05-27, at 11:27 , Nick Coghlan wrote:
> On Fri, May 27, 2011 at 6:46 PM, Stephen J. Turnbull <stephen at> wrote:
>>  > What method is invoked to convert the numbers to text? What encoding
>>  > is used to convert those numbers to text? How does this operation
>>  > avoid also converting the *bytes* object to text and then reencoding
>>  > it?
>> OTOH, Nick, aren't you making this harder than it needs to be?  After
>> all,
> To me, the defining feature of str.format() over str.__mod__() is the
> ability for types to provide their own __format__ methods, rather than
> being limited to a predefined set of types known to the interpreter.
> If bytes were to reuse the same name, then I'd want to see similar
> flexibility.
> Now, a *different* bytes method (bytes.interpolate, perhaps?), limited
> to specific types may make sense, but such an alternative *shouldn't*
> be conflated with the text formatting API.
> However, proponents of such an addition need to clearly articulate
> their use cases and proposed solution in a PEP to make it clear that
> they aren't merely trying to perpetuate the bytes/text confusion that
> plagues 2.x 8-bit strings.
> We can almost certainly do better when it comes to constructing byte
> sequences from component parts, but simply saying "oh, just add a
> format() method to bytes objects" doesn't cut it, since the associated
> magic methods for str.format are all string based, and bytes
> interpolation also needs to address encoding issues for anything that
> isn't already a byte sequence.

I don't see anything I could disagree with. Especially not in the last

From theller at  Fri May 27 12:04:40 2011
From: theller at (Thomas Heller)
Date: Fri, 27 May 2011 12:04:40 +0200
Subject: [Python-ideas] Threading hooks and disable gc per thread
In-Reply-To: <>
References: <>
Message-ID: <>

Am 12.05.2011 01:58, schrieb Christian Heimes:
> Hello,
> today I've spent several hours debugging a segfault in JCC [1]. JCC is a
> framework to wrap Java code for Python. It's most prominently used in
> PyLucene [2]. You can read more about my debugging in [3]
> With JCC every Python thread must be registered at the JVM through JCC.
> An unattached thread, that accesses a wrapped Java object, leads to
> errors and may even cause a segfault. Accessing also includes garbage
> collection. A code line like
>     a = {}
> or
>     "a b c".split()
> can segfault since the allocation of a dict or a bound method runs
> through _PyObject_GC_New(), which may trigger a cyclic garbage
> collection run. If the current thread isn't attached to the JVM but
> triggers a gc.collect() with some Java objects in a cycle, the
> interpreter crashes. It's quite complicated and hard to "fix" third
> party tools to attach all threads created in the third party library.

I have a somewhat similar problem and just noticed this thread.
In our software, we have multiple threads, and we use a lot of COM
objects.  COM objects also have the requirement that they must only be
used in the same thread (in the same apartment, to be exact) that
created them.
This also applies to cleaning up with the garbage collector.

Ok, when the com object is part of some Python structures that include
reference cycles, then the cycle gc tries to clean up the ref cycle and
cleans up the COM object.  This can happen in ANY thread, and in some 
cases the program crashes or the thread hangs.

Here is my idea to fix this from within Python:
The COM objects, when created, keep the name of the currently executing
thread. In the __del__ method, where the cleanup of the COM object
happens by calling the COM .Release() method, a check is made if the
current thread is the allowed one or not.  If it is the wrong thread,
the COM object is kept alive by appending it to some list. The list is
stored in a global dictionary indexed by the thread name.

The remaining goal is to clear the lists in the dict inside the valid
thread - which is done on every creation of a COM object, on every
destruction of a COM object, and in the CoUninitialize function that
every thread using COM must call before it is ending.  At least that's
my plan.

Maybe you can use a similar approach?


From stephen at  Fri May 27 12:20:24 2011
From: stephen at (Stephen J. Turnbull)
Date: Fri, 27 May 2011 19:20:24 +0900
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <>
References: <>
	<irkc54$gih$> <irl5n7$l1q$>
Message-ID: <>

Nick Coghlan writes:
 > On Fri, May 27, 2011 at 4:14 PM, INADA Naoki <songofacandy at> wrote:
 > > I love unicode and use unicode when I can use it.
 > > But this is a problem in the real world.
 > > For example, Python 2 is convenient for analyzing line based logs
 > > containing some different encodings.

Where's the use case for bytes here?

 > > Python 3
 > ...deliberately makes that difficult because it is *wrong*.

Nick, you should have stopped there. :-)  I can see very little
difference between Python 2 and Python 3 in this use case, except that
Python 2 makes it much easier to write easily crashable programs.  In
both versions, the safe thing to do for such a program is either to
slurp the whole log with open(log, encoding=<whatever>,
errors=<something nonfatal>) (that's Python 3 code; Python 2 makes
this more tedious, in fact).  But no need for reading as bytes in
Python 3 visible here, move along, people!
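
For concreteness, a sketch of the slurp-it-all approach with a made-up
three-line log whose second line is Latin-1 rather than UTF-8.  The file
name and contents are assumptions for the demo only.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mixed.log")
    with open(path, "wb") as f:
        f.write(b"plain line\n")
        f.write("caf\u00e9\n".encode("latin-1"))    # wrong encoding for this file
        f.write("na\u00efve\n".encode("utf-8"))

    # errors="replace" is one nonfatal handler; "surrogateescape" is another.
    with open(path, encoding="utf-8", errors="replace") as f:
        lines = f.read().splitlines()

print(lines)
```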

Alternatively, one could write a function that reads lines from the
log as bytes, and tries different encodings for each line (perhaps
interacting with the user) and eventually uses some default encoding
and a nonfatal error handler to get *something*.  This requires
reading as bytes, but it's no easier to write in Python 2 AFAICS.
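
A sketch of such a function, minus the user interaction.  The name
decode_line, the candidate list, and the fallback are all assumptions; a
real log analyzer would choose encodings that match its data.

```python
def decode_line(raw, candidates=("utf-8", "euc_jp"), default="ascii"):
    """Try each candidate encoding strictly, then fall back without failing."""
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Nothing fit: get *something* using a nonfatal error handler.
    return raw.decode(default, errors="replace")


japanese = "\u65e5\u672c\u8a9e"     # three kanji
for sample in (japanese.encode("utf-8"), japanese.encode("euc_jp"), b"\xff"):
    print(ascii(decode_line(sample)))
```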

Granted, such a function will not easily be portable between Python 2
and 3, but that's a different problem.

> Binary files containing a mixture of encodings cannot be safely
> treated as text.

"Safety" is use-case-dependent.  I suppose Inada-san considers using
Python 2 strs to receive file input safe enough for his log analyzer.
While we shouldn't encourage that (and either errors='ignore' or
errors='surrogateescape' should be easy enough for him in the log
analysis case[1]), I don't think we should demand GIGO with 100%
fidelity in all use cases, either.

[1]  In new code.  Again, a port of existing Python 2 code to Python 3
might not be trivial, depending on how he handles unexpected encodings
and how pervasively they are manipulated in his program.

From stephen at  Fri May 27 13:07:42 2011
From: stephen at (Stephen J. Turnbull)
Date: Fri, 27 May 2011 20:07:42 +0900
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <>
References: <>
	<irkc54$gih$> <irl5n7$l1q$>
Message-ID: <>

Nick Coghlan writes:

 > To me, the defining feature of str.format() over str.__mod__() is the
 > ability for types to provide their own __format__ methods,

Ah, so you object to the _spelling_, not the requested functionality.
(At least, not all of it.)  All is clear now!

OK, I retract my suggestion, but I'll let you beat up on anybody who
dredges it up in the future.  Specifically, I think that calling it
"bytes.format" (a) is discoverable and (b) it is not obvious to me
that __format_bytes__ functionality for arbitrary types is a bad
thing, although I personally have no use case and am unlikely to catch
one for a while (thus at most I'm now -0, and could easily be
persuaded to lower that).

 > bytes interpolation also needs to address encoding issues for
 > anything that isn't already a byte sequence.

Sure, but my proposal here still stands: whatever the API is, and
whatever types it supports, the assumption is that interpolation uses
the conventional ASCII representation for the given type (and for
interpolations implemented in stdlib there had better be universal
agreement on what that convention is).

From steve at  Fri May 27 13:21:00 2011
From: steve at (Steven D'Aprano)
Date: Fri, 27 May 2011 21:21:00 +1000
Subject: [Python-ideas] Suggestion: Integrate the script "" as
	standard command for formatting pyhton code
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, 26 May 2011 09:15:37 pm Serge Hulne wrote:
> It is my feeling that without such a tool Python is inherently very
> vulnerable to glitches occurring at editing time:

I can't think of any language that is invulnerable to the errors you 
list. All languages are vulnerable to glitches occurring at edit time. 
Picking your second example:

>       2. Tab key inadvertently hit.

If you inadvertently hit the tab key in the middle of a line:

    n = le	n(mylist)  # oops, hit the tab key!

do you expect it to keep working? No. Then why treat the start of the 
line any different? There might be some places that, *by chance*, an 
extra tab won't break the code:

    n = len(	mylist)

but you shouldn't rely on that. In general, you should expect ANY and 
EVERY mutation of source code could break your code, and avoid tools or 
practices that insert arbitrary changes you didn't intend. Don't let 
your cat walk on the keyboard while editing source code, don't put your 
code through a tool that turns text into fake Swedish, and don't use 
tools that mangle whitespace. It is commonsense really.
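
The three tab positions discussed above can be checked with the built-in
compile().  This is only a sketch; the helper name compiles is made up.

```python
def compiles(source):
    """Return True if source is syntactically valid Python."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:      # IndentationError and TabError are subclasses
        return False
    return True


mid_name = compiles("n = le\tn(mylist)\n")       # tab splits the identifier
after_paren = compiles("n = len(\tmylist)\n")    # tab where whitespace is allowed
line_start = compiles("x = 1\n\ty = 2\n")        # tab adds unexpected indentation
print(mid_name, after_paren, line_start)
```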

There are broken tools out there -- especially web forum software -- 
that arbitrarily mutate whitespace in source code. Those tools are 
broken, and should be avoided. If you can't avoid them, you have my 
sympathy, but that's your problem, not Python's, and Python doesn't 
need to be integrated with a tool for fixing broken source code.

Steven D'Aprano

From ncoghlan at  Fri May 27 13:51:53 2011
From: ncoghlan at (Nick Coghlan)
Date: Fri, 27 May 2011 21:51:53 +1000
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <>
References: <>
	<irkc54$gih$> <irl5n7$l1q$>
Message-ID: <>

On Fri, May 27, 2011 at 9:07 PM, Stephen J. Turnbull <stephen at> wrote:
> Nick Coghlan writes:
>  > To me, the defining feature of str.format() over str.__mod__() is the
>  > ability for types to provide their own __format__ methods,
> Ah, so you object to the _spelling_, not the requested functionality.
> (At least, not all of it.)  All is clear now!
> OK, I retract my suggestion, but I'll let you beat up on anybody who
> dredges it up in the future.  Specifically, I think that calling it
> "bytes.format" (a) is discoverable and (b) it is not obvious to me
> that __format_bytes__ functionality for arbitrary types is a bad
> thing, although I personally have no use case and am unlikely to catch
> one for a while (thus at most I'm now -0, and could easily be
> persuaded to lower that).

In the specific case of adding bytes.format(), it's the weight of the
backing machinery that bothers me - the PEP 3101 implementation isn't
small, and providing a parallel API for bytes without slowing down the
existing string implementation would be problematic (code re-use would
likely slow down the common case even further, while avoiding re-use
would likely end up duplicating a lot of code). However, *if* a solid
set of use cases for direct bytes interpolation can be identified (and
that's a big if), then it may be possible to devise a narrower, more
focused API that doesn't require such a heavy back end to support it.

But the use cases have to come first, and ones that are better
expressed via techniques such as ASCII decoding with the
surrogateescape error handler to support round-tripping don't count.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From donspauldingii at  Fri May 27 15:35:58 2011
From: donspauldingii at (Don Spaulding)
Date: Fri, 27 May 2011 08:35:58 -0500
Subject: [Python-ideas] Suggestion: Integrate the script "" as
 standard command for formatting pyhton code
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, May 27, 2011 at 6:21 AM, Steven D'Aprano <steve at> wrote:

>  and Python doesn't
> need to be integrated with a tool for fixing broken source code.

Doesn't it?  I thought something like this was already integrated.  At
least, since switching to Python, my source code looks a lot less broken.  I
don't know about this "pindent" script, but don't take out whatever it is in
Python that makes my source code look so good.

From jimjjewett at  Fri May 27 16:01:32 2011
From: jimjjewett at (Jim Jewett)
Date: Fri, 27 May 2011 10:01:32 -0400
Subject: [Python-ideas] Threading hooks and disable gc per thread
In-Reply-To: <>
References: <> <>
Message-ID: <>

On Fri, May 27, 2011 at 6:04 AM, Thomas Heller <theller at> wrote:
> Here is my idea to fix this from within Python:
> The COM objects, when created, keep the name of the currently executing
> thread. In the __del__ method, where the cleanup of the COM object
> happens by calling the COM .Release() method, a check is made if the
> current thread is the allowed one or not.  If it is the wrong thread,
> the COM object is kept alive by appending it to some list. The list is
> stored in a global dictionary indexed by the thread name.

Of course, this means that multiple COM objects in the same cycle
become uncollectable, which again argues for the __close__ idiom.
(Just like __del__ except that it can be run more than once, and if
there are multiples in a cycle, they are run in arbitrary order
instead of deferred.)  Alternatively, you might get away with some
wonky proxy objects as part of the COM wrapping.


From stephen at  Fri May 27 17:18:39 2011
From: stephen at (Stephen J. Turnbull)
Date: Sat, 28 May 2011 00:18:39 +0900
Subject: [Python-ideas] Suggestion: Integrate the script "" as
 standard command for formatting pyhton code
In-Reply-To: <>
References: <>
Message-ID: <>

Don Spaulding writes:

 > At least, since switching to Python, my source code looks a lot
 > less broken.


From ronaldoussoren at  Fri May 27 13:28:51 2011
From: ronaldoussoren at (Ronald Oussoren)
Date: Fri, 27 May 2011 13:28:51 +0200
Subject: [Python-ideas] Suggestion: Integrate the script "" as
 standard command for formatting pyhton code
In-Reply-To: <>
References: <>
Message-ID: <>

On 26 May, 2011, at 13:15, Serge Hulne wrote:
> It is my feeling that without such a tool Python is inherently very vulnerable to glitches occurring at editing time:
> Copy / paste glitch that passes unnoticed, does not generate an exception but alters the logic of the program.
> Tab key inadvertently hit.
> Difficulty in assessing the target indentation level when a part of a block has to be pasted in a different part of the code. 
You seem to be arguing for the addition of block delimiters to the language (even if only in comments), you might want to try "from __future__ import braces".


From greg.ewing at  Sat May 28 02:55:58 2011
From: greg.ewing at (Greg Ewing)
Date: Sat, 28 May 2011 12:55:58 +1200
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <>
References: <>
	<irkc54$gih$> <irl5n7$l1q$>
Message-ID: <>

Nick Coghlan wrote:

> The pedagogic cost of making it even harder than it already is to
> convince people that bytes are not text would also need to be
> considered.

I think that boat was missed some time ago. If there were
ever a serious intention to teach people that bytes are not
text by limiting the feature set of bytes, it would have
been better served by not giving bytes *any* features that
assumed a particular encoding.

As it is, bytes has quite a lot of features that implicitly
treat it as ascii-encoded text: the literal and repr()
forms, capitalize(), expandtabs(), lower(), splitlines(),
swapcase(), title(), upper(), and all the is*() methods.
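
A few of these in action on Python 3 (sample values are made up).  Note that
only bytes in the ASCII range are affected; 0xE9 passes through untouched.

```python
data = b"caf\xe9 au lait"         # 0xE9 is not valid ASCII

upper = data.upper()              # only bytes in the ASCII a-z range change
title = b"hello world".title()
digits = b"2011".isdigit()
shown = repr(b"abc\xff")          # ASCII where possible, \x escapes otherwise

print(upper, title, digits, shown)
```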

Accepting all of that, and then saying "Oh, no, we couldn't
possibly provide a format() method, because bytes are not
text" seems a tad inconsistent.


From ncoghlan at  Sat May 28 03:16:14 2011
From: ncoghlan at (Nick Coghlan)
Date: Sat, 28 May 2011 11:16:14 +1000
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <>
References: <>
	<irkc54$gih$> <irl5n7$l1q$>
Message-ID: <>

On Sat, May 28, 2011 at 10:55 AM, Greg Ewing
<greg.ewing at> wrote:
> Nick Coghlan wrote:
>> The pedagogic cost of making it even harder than it already is to
>> convince people that bytes are not text would also need to be
>> considered.
> I think that boat was missed some time ago. If there were
> ever a serious intention to teach people that bytes are not
> text by limiting the feature set of bytes, it would have
> been better served by not giving bytes *any* features that
> assumed a particular encoding.
> As it is, bytes has quite a lot of features that implicitly
> treat it as ascii-encoded text: the literal and repr()
> forms, capitalize(), expandtabs(), lower(), splitlines(),
> swapcase(), title(), upper(), and all the is*() methods.
> Accepting all of that, and then saying "Oh, no, we couldn't
> possibly provide a format() method, because bytes are not
> text" seems a tad inconsistent.

Originally we didn't have all of that - more and more of it crept back
in at the behest of several binary protocol folks (including me, if I
recall correctly).

The urllib.parse experience has convinced me that giving in to that
pressure was a mistake. We went for a premature optimisation, and
screwed up the bytes API as a result. Yes, there is a potential
performance issue with the decode/process/encode model, but simply
keeping a bunch of string methods in the bytes API was the wrong
answer (and something that isn't actually all that useful in practice,
for the reasons brought up in this and other recent threads).

Perhaps it is time to resurrect the idea of an explicit 'ascii' type?
Add a'' literals, support the full string API as well as the bytes
API, deprecate all string APIs on bytes and bytearray objects. The
other thing I have learned in trying to deal with some of these issues
is that ASCII-encoded text really *is* special, compared to all other
encodings, due to its widespread use in a multitude of networking
protocols and other formats.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From greg.ewing at  Sat May 28 04:00:13 2011
From: greg.ewing at (Greg Ewing)
Date: Sat, 28 May 2011 14:00:13 +1200
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <>
References: <>
	<irkc54$gih$> <irl5n7$l1q$>
Message-ID: <>

Nick Coghlan wrote:

> Perhaps it is time to resurrect the idea of an explicit 'ascii' type?
> Add a'' literals, support the full string API as well as the bytes
> API, deprecate all string APIs on bytes and bytearray objects.

That sounds like an idea worth pursuing. Maybe also introduce an
x'...' literal for bytes at the same time, with a view to eventually
deprecating and removing the b'...' syntax.

I don't think I would remove *all* the string methods from bytes,
only the ones that assume ascii encoding. Searching and replacing
substrings etc. still makes sense on arbitrary bytes.
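
For instance (a sketch with a made-up packet), none of the following
assumes any encoding at all:

```python
packet = b"\x00\xff\x10\xff\x00"

index = packet.find(b"\x10")                   # position of a marker byte
patched = packet.replace(b"\xff", b"\x7f")     # substitute one byte value
parts = packet.split(b"\x10")                  # split on the marker

print(index, patched, parts)
```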

How would ascii behave when mixed with unicode strings? Should it
automatically coerce to unicode, or should an explicit decode()
be required?


From ethan at  Sat May 28 04:23:43 2011
From: ethan at (Ethan Furman)
Date: Fri, 27 May 2011 19:23:43 -0700
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <>
References: <>	<irkc54$gih$>
	<irl5n7$l1q$>	<>	<>	<irmk2f$ess$>	<>	<>	<>	<>
Message-ID: <>

Greg Ewing wrote:
> Nick Coghlan wrote:
>> Perhaps it is time to resurrect the idea of an explicit 'ascii' type?
>> Add a'' literals, support the full string API as well as the bytes
>> API, deprecate all string APIs on bytes and bytearray objects.
> That sounds like an idea worth pursuing. Maybe also introduce an
> x'...' literal for bytes at the same time, with a view to eventually
> deprecating and removing the b'...' syntax.
> I don't think I would remove *all* the string methods from bytes,
> only the ones that assume ascii encoding. Searching and replacing
> substrings etc. still makes sense on arbitrary bytes.
> How would ascii behave when mixed with unicode strings? Should it
> automatically coerce to unicode, or should an explicit decode()
> be required?

And what happens when a char > 127 hits the ascii stream?

As for unicode interoperation, I'm inclined to let it be implicit, since 
ascii directly overlaps unicode.  Depending, of course, on the answer to 
the above question.
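
One data point, not an answer for the proposed type: the existing ascii
codec in strict mode rejects anything above 127 in both directions.  The
helper name try_decode is made up for this sketch.

```python
def try_decode(raw):
    """Decode as strict ASCII, reporting where it fails instead of raising."""
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError as exc:
        return "rejected byte at offset {}".format(exc.start)


ok = try_decode(b"abc")
bad = try_decode(b"ab\x80")

try:
    "caf\u00e9".encode("ascii")
    encoded = True
except UnicodeEncodeError:
    encoded = False

print(ok, "|", bad, "|", encoded)
```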


From eric at  Sat May 28 11:43:54 2011
From: eric at (Eric Smith)
Date: Sat, 28 May 2011 05:43:54 -0400
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <>
References: <>	<irkc54$gih$>
	<irl5n7$l1q$>	<>	<>	<irmk2f$ess$>	<>	<>	<>	<>	<>
Message-ID: <>

On 5/27/2011 7:51 AM, Nick Coghlan wrote:
> In the specific case of adding bytes.format(), it's the weight of the
> backing machinery that bothers me - the PEP 3101 implementation isn't
> small, and providing a parallel API for bytes without slowing down the
> existing string implementation would be problematic (code re-use would
> likely slow down the common case even further, while avoiding re-use
> would likely end up duplicating a lot of code). However, *if* a solid
> set of use cases for direct bytes interpolation can be identified (and
> that's a big if), then it may be possible to devise a narrower, more
> focused API that doesn't require such a heavy back end to support it.

In Python 2.x str.format() and unicode.format() share the same
implementation, using the Objects/stringlib mechanism of #defines and
multiple includes. So while you do get the compiled code included twice,
there's only one source file that implements them both. I don't think
there's any concern about performance issues.

And Python 3.x has the exact same implementation, although it's only
included for unicode strings. It would not be difficult to add .format()
for bytes.

There have been various discussions over the years of how to actually do
that. I think the most recent one was to add an __bformat__ method.

I'm not saying any of this is a good idea or desirable. I'm just saying
it would be easy to do and wouldn't hurt the performance of
unicode.format().


From ncoghlan at  Sat May 28 12:29:46 2011
From: ncoghlan at (Nick Coghlan)
Date: Sat, 28 May 2011 20:29:46 +1000
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <>
References: <>
	<irkc54$gih$> <irl5n7$l1q$>
Message-ID: <>

On Sat, May 28, 2011 at 7:43 PM, Eric Smith <eric at> wrote:
> There have been various discussions over the years of how to actually do
> that. I think the most recent one was to add an __bformat__ method.

Python 2.x was different, as the automatic unicode coercion meant
class developers still only needed to provide __str__ (or __unicode__
if they wanted to return non-ASCII data).

__bformat__ (and similar ideas) are somewhat different beasts due to
the encoding issues involved. Those aren't insurmountable, but they're
things that don't come up with pure unicode handling (2.x unicode, 3.x
str) or data that is essentially assumed to be latin-1 encoded in many
cases (2.x str)

> I'm not saying any of this is a good idea or desirable. I'm just saying
> it would be easy to do and wouldn't hurt the performance of
> unicode.format().

I'm still not sure about that, since the 2.x str.format() pretty much
ignores the associated encoding problems, and I don't believe
perpetuating that behaviour would be appropriate for 3.x bytes.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Sat May 28 12:47:48 2011
From: ncoghlan at (Nick Coghlan)
Date: Sat, 28 May 2011 20:47:48 +1000
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <>
References: <>
	<irkc54$gih$> <irl5n7$l1q$>
	<> <>
Message-ID: <>

On Sat, May 28, 2011 at 12:23 PM, Ethan Furman <ethan at> wrote:
> Greg Ewing wrote:
>> How would ascii behave when mixed with unicode strings? Should it
>> automatically coerce to unicode, or should an explicit decode()
>> be required?
> And what happens when a char > 127 hits the ascii stream?

These are the kinds of questions that make it clear that the answer
here is far from being as simple as merely adding more string methods
to the existing bytes type. The underlying data model is simply
*wrong* for working with bytes as if they were text.

For a previous, more flexible, incarnation of this idea, Barry's post
is the earliest record I found of the idea of a byte sequence oriented
type that carried its encoding metadata along with it:

However, supporting multi-byte codes (and other stateful codecs like
ShiftJIS) poses problems for slicing operations (just as it does for
us already in Unicode slicing).
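
A minimal sketch of the slicing problem, assuming UTF-8 as the multi-byte
encoding (the message does not name one; the sample word is arbitrary):

```python
# Illustration only: slicing encoded bytes can cut a character in half.
text = "na\u00efve"              # the third character is non-ASCII
data = text.encode("utf-8")      # that character takes two bytes in UTF-8

head = data[:3]                  # b'na' plus only the first byte of the pair
try:
    head.decode("utf-8")
    split_ok = True
except UnicodeDecodeError:
    split_ok = False             # the slice ended in the middle of a character

# Slicing the decoded text first never splits a character.
safe_head = text[:3].encode("utf-8")
```

A bytes-with-encoding type would have to detect or forbid slices like
`data[:3]`, which is the cost being pointed at here.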

Hence the possibility of strictly limiting this to 7-bit ASCII - the
main problem with most bytes-as-text suggestions is that they don't
work for arbitrary subsets of the codecs available in the standard
library and it generally isn't entirely clear which codecs will work
and which ones won't.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From paul at  Sun May 29 20:55:21 2011
From: paul at (Paul Colomiets)
Date: Sun, 29 May 2011 21:55:21 +0300
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <>
References: <>
	<irkc54$gih$> <irl5n7$l1q$>
Message-ID: <>

On Sat, May 28, 2011 at 12:43 PM, Eric Smith <eric at> wrote:
> And Python 3.x has the exact same implementation, although it's only
> included for unicode strings. It would not be difficult to add .format()
> for bytes.
> There have been various discussions over the years of how to actually do
> that. I think the most recent one was to add an __bformat__ method.

Well, that's actually a great idea, I think. A format method on bytes could
produce data that is not ASCII, and eventually become struct.pack on
steroids. struct.pack has plenty of problems:

* unable to use named fields, which are useful for describing big structures
* all fields are fixed-length, which is unfortunate given today's trend
toward variable-length integers
* can't specify separators between fields

I also use the str(intvalue).encode('ascii') idiom a lot. So I'd probably
suggest having something like __bformat__, with format values somewhat
similar to the ones struct.pack has along with str-like ones for integers.
It might also be useful to have a `!len` conversion for bytes fields, for
easier encoding of length-prefixed

To show an example, here is how a two-chunk PNG file can be encoded:

    s1=section1, crc1=crc(section1),
    s2=section2, crc2=crc(section2)))
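
For comparison, here is a sketch of what one length-prefixed PNG chunk
takes with struct.pack as it exists today. This is not the proposed
__bformat__ syntax (the example above is incomplete in the archive); the
helper name png_chunk is made up for illustration:

```python
import struct
import zlib

def png_chunk(chunk_type, data):
    """Return one PNG chunk: length, type, data, CRC-32 of type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    # Length and CRC are 4-byte big-endian unsigned integers.
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)

# The empty IEND chunk that terminates every PNG file.
chunk = png_chunk(b"IEND", b"")
```

The length has to be computed and packed by hand for every field, which
is the kind of bookkeeping a `!len` conversion would be meant to absorb.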


From stephen at  Mon May 30 04:39:45 2011
From: stephen at (Stephen J. Turnbull)
Date: Mon, 30 May 2011 11:39:45 +0900
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <>
References: <>
	<irkc54$gih$> <irl5n7$l1q$>
Message-ID: <>

Greg Ewing writes:

 > How would ascii behave when mixed with unicode strings? Should it
 > automatically coerce to unicode,

Definitely not!  Bytes are not text, and the programmer must say when
they want those bytes decoded.  The Python translator must not be
asked to guess.

 > or should an explicit decode() be required?


But IMHO worth considering is an implicit coercion of Unicode to ascii
via encode() with strict errors.  Remember, Unicode is an invertible
mapping of characters to abstract integers, which may be represented
in various different ways, such as bytes, 32-bit words, or UTF-8.  So
in some sense there is no violation of the Unicode type here.  Sorry,
I can't explain more clearly at the moment, but I have a strong sense
that coercion (ASCII) bytes -> Unicode *changes* or maybe even
"destroys" the type of the byte, while the coercion (ASCII) Unicode ->
bytes takes an abstract type "Unicode" and refines to a concrete type
"bytes".  Among other things, this is always reversible.

This takes into account the common usage of punning natural language
encoded in ASCII on binary protocol magic numbers.

Then one could write stuff like

    my_pipe.write('HELO ' + my_fqdn)

while true pedants would of course write

    my_pipe.write(b'HELO ' + my_fqdn)

This doesn't explain how to make it easy to ensure that my_fqdn is
bytes, of course, and that makes me uneasy about whether this would
actually be useful, or merely confusing.  (However, there are use
cases where it is claimed that 'HELO ' is needed both as str and as
bytes.)
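
Without any implicit coercion, the same effect can be had explicitly. A
minimal sketch, assuming a hypothetical helper named ascii_bytes and an
illustrative host name:

```python
def ascii_bytes(value):
    """Hypothetical helper: return bytes, encoding str as strict ASCII."""
    if isinstance(value, str):
        # Strict errors: non-ASCII text raises UnicodeEncodeError.
        return value.encode("ascii", "strict")
    return bytes(value)

my_fqdn = "mail.example.org"              # illustrative value
line = b"HELO " + ascii_bytes(my_fqdn)    # works for str or bytes input
```

The helper makes the str-to-bytes step visible at the call site, at the
price of having to remember to call it.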

From ncoghlan at  Mon May 30 06:45:10 2011
From: ncoghlan at (Nick Coghlan)
Date: Mon, 30 May 2011 14:45:10 +1000
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <>
References: <>
	<irkc54$gih$> <irl5n7$l1q$>
Message-ID: <>

On Mon, May 30, 2011 at 12:39 PM, Stephen J. Turnbull
<stephen at> wrote:
> (However, there are use
> cases where it is claimed that 'HELO ' is needed both as str and as
> bytes.)

My current opinion is that all of this still needs more
experimentation outside the core before we start fiddling any further
with the builtins (we blinked once in the lead-up to 3.0 by allowing
bytes and bytearray to retain a lot of string methods that assume an
ASCII compatible encoding, and I now have my doubts about the wisdom
of even that step). I don't have a good answer on how to deal with the
real world situations where the *use case* blurs the bytes/text
distinction (typically by embedding ASCII text inside an otherwise
binary protocol), and given the potential to backslide into the bad
old days of 8-bit strings, I'm not prepared to guess, either.

3.x has largely cleared the decks to allow a better solution to evolve
in this space by making it harder to blur the line accidentally, and
decode()/manipulate/encode() already nicely covers many stateless use
cases. If it turns out we need another type, or some other API, to
deal gracefully with any use cases where that isn't enough, then so be
it. However, I think we need to let the status quo run for a while
longer and see what people actually using the current types in
production come up with. The bytes/text division in Python 3 is by far
the biggest conceptual change between the two languages, so it's going
to take some time before we can figure out how many of the problems
encountered are real issues with the split model not covering some use
cases and how many are just people (including us) taking time to get
used to the sharp division between the two worlds.
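
As a concrete instance of the decode()/manipulate/encode() pattern, here
is a minimal sketch that normalises one ASCII header line; the header
content is made up for illustration:

```python
raw = b"content-type: text/plain\r\n"

text = raw.decode("ascii")                      # bytes -> str at the boundary
name, _, value = text.strip().partition(":")    # manipulate as text
text = "{}: {}\r\n".format(name.title(), value.strip())
out = text.encode("ascii")                      # str -> bytes on the way out
```

This works because the whole line is known to be ASCII; it does not help
when ASCII fragments are interleaved with arbitrary binary data.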


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From raymond.hettinger at  Mon May 30 06:58:52 2011
From: raymond.hettinger at (Raymond Hettinger)
Date: Sun, 29 May 2011 21:58:52 -0700
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <>
References: <>
	<irkc54$gih$> <irl5n7$l1q$>
Message-ID: <>

On May 29, 2011, at 9:45 PM, Nick Coghlan wrote:

> On Mon, May 30, 2011 at 12:39 PM, Stephen J. Turnbull
> <stephen at> wrote:
>> (However, there are use
>> cases where it is claimed that 'HELO ' is needed both as str and as
>> bytes.)
> My current opinion is that all of this still needs more
> experimentation outside the core before we start fiddling any further
> with the builtins (we blinked once in the lead-up to 3.0 by allowing
> bytes and bytearray to retain a lot of string methods that assume an
> ASCII compatible encoding, and I now have my doubts about the wisdom
> of even that step). I don't have a good answer on how to deal with the
> real world situations where the *use case* blurs the bytes/text
> distinction (typically by embedding ASCII text inside an otherwise
> binary protocol), and given the potential to backslide into the bad
> old days of 8-bit strings, I'm not prepared to guess, either.



From tjreedy at  Mon May 30 22:04:36 2011
From: tjreedy at (Terry Reedy)
Date: Mon, 30 May 2011 16:04:36 -0400
Subject: [Python-ideas] Bytes formatting (was Re: Adding 'bytes' as alias
	for 'latin_1' codec)
In-Reply-To: <>
References: <>	<irkc54$gih$>
	<irl5n7$l1q$>	<>	<>	<irmk2f$ess$>	<>	<>	<>
Message-ID: <is0t8l$nln$>

Changing the subject to what it has actually become.

On 5/27/2011 5:27 AM, Nick Coghlan wrote:

> We can almost certainly do better when it comes to constructing byte
> sequences from component parts, but simply saying "oh, just add a
> format() method to bytes objects" doesn't cut it, since the associated
> magic methods for str.format are all string based,


 From a modern and Python viewpoint, string formatting is about 
interpolating text representations of objects into a text template. By 
default, the text representation is str(object).

Exception 1. str.format has an optional conversion specifier "!s/r/a" to 
specify repr(object) or ascii(object) instead of str(object). (It can 
also be used to override exception 2.) This is not relevant to bytes

Exception 2. str.format, like % formatting, does special processing of
numbers. Electronic computing was originally used only to compute 
numbers and text formatting was originally about formatting numbers, 
usually in tables, with optional text decoration. That is why the 
maximum field size for string interpolation is still called 'precision'. 
There are numerous variations in number formatting and most of the 
complication of format specifications arises therefrom.


If the desired result consists entirely of text encoded with one 
encoding, the current recommended method is to construct the text and 
encode. I think this is the proper method and do not think that anything 
we add should be aimed at this use case.

There are two other current methods to assemble bytes from pieces. One 
is concatenation; it has the same advantages and disadvantages as string
concatenation. Another, overlooked in the current discussion so far, is
in-place editing of a bytearray by index and slice assignment. It has 
the disadvantage of having to know the correct indexes and slice points.
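
A minimal sketch of both methods, building the same packet. The layout
(one tag byte, a two-byte big-endian length, then the payload) is
invented for illustration:

```python
import struct

payload = b"hello"

# 1. Concatenation: simple, but every + builds a new bytes object.
packet = b"\x01" + struct.pack(">H", len(payload)) + payload

# 2. In-place editing: preallocate, then assign by index and slice.
buf = bytearray(3 + len(payload))
buf[0] = 0x01
buf[1:3] = struct.pack(">H", len(payload))
buf[3:] = payload
```

The second form needs the offsets 1, 3 and the total size to be worked
out by hand, which is the disadvantage noted above.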

If we add another bytes formatting function or method, I think it should 
be about interpolating bytes into a bytes template. The use cases would 
be anything other than mono-encoded text -- text with multiple encodings 
or non-text bytes possibly intermixed with encoded text.

> and bytes interpolation also needs to address encoding issues
 > for anything that isn't already a byte sequence.

As indicated above, I disagree if 'encoding' means 'text encoding'.
Let .encode handle encoding issues.


A bytes template uses b'{' and b'}' to mark interpolation fields and 
other ascii bytes within as needed. It uses the ascii equivalent of the 
string field_name spec. It does not have a conversion spec. The 
format_spec should have the minimum needed for existing public 
protocols. How much more is up for discussion. We need use cases.

One possibility to keep in mind is that a bytes template could be
constructed by an ascii-compatible encoding of formatted text. Specs for
bytes fields can be protected in a text template by doubling the braces.

 >>> '{} {{byte-field-spec}}'.format(1).encode()
b'1 {byte-field-spec}'

A major issue is what to do with numbers. Sometimes they need to be
ascii encoded, sometimes binary encoded. The baseline is to do nothing
extra and require all args to be bytes. I think this may be appropriate 
for floats as they are seldom specifically used in protocols. I think 
the same may be true for ints with signs. So I think we mainly need to 
consider counts (unsigned ints) for possible exceptional processing.

Option 0. As stated, no special number specs.

Option 1. Use a subset of the current int spec to produce ascii 
encodings; use struct.pack for binary encodings. (How many of the 
current integer presentation types would be needed?)

Option 2. Use an adaptation of the struct.pack mini-language to produce 
binary encodings; use encoded str.format for ascii encodings. (The 
latter might be done as part of a text-to-bytes-template process as 
indicated above.)

Option 3. Combine options 1 and 2. This might best be done by replacing 
the omitted 'conversion' field with a 'number-encoding' field, b'!a' or 
b'!b', to indicate ascii or binary conversion and corresponding 
interpretation of the format spec. (In other words, do not try to 
combine the number to text and number to binary mini-languages, but add 
a 'prefix' to specify which is being used.)
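
To make the two kinds of number encoding concrete, here is a sketch of
how each is produced with existing tools (no bytes formatting involved);
the value 258 is arbitrary:

```python
import struct

count = 258

# ASCII encoding of a count: format as text, then encode.
ascii_form = "{:d}".format(count).encode("ascii")    # decimal digits
hex_form = "{:04x}".format(count).encode("ascii")    # zero-padded hex digits

# Binary encoding of the same count: big-endian unsigned 16-bit.
binary_form = struct.pack(">H", count)
```

Options 1 to 3 differ only in which of these two paths the bytes
format_spec would take over and which would stay outside it.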

Terry Jan Reedy

From guido at  Mon May 30 22:27:05 2011
From: guido at (Guido van Rossum)
Date: Mon, 30 May 2011 13:27:05 -0700
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <>
References: <>
	<irkc54$gih$> <irl5n7$l1q$>
Message-ID: <>

On Sun, May 29, 2011 at 9:45 PM, Nick Coghlan <ncoghlan at> wrote:
> On Mon, May 30, 2011 at 12:39 PM, Stephen J. Turnbull
> <stephen at> wrote:
>> (However, there are use
>> cases where it is claimed that 'HELO ' is needed both as str and as
>> bytes.)
> My current opinion is that all of this still needs more
> experimentation outside the core before we start fiddling any further
> with the builtins (we blinked once in the lead-up to 3.0 by allowing
> bytes and bytearray to retain a lot of string methods that assume an
> ASCII compatible encoding, and I now have my doubts about the wisdom
> of even that step). I don't have a good answer on how to deal with the
> real world situations where the *use case* blurs the bytes/text
> distinction (typically by embedding ASCII text inside an otherwise
> binary protocol), and given the potential to backslide into the bad
> old days of 8-bit strings, I'm not prepared to guess, either.
> 3.x has largely cleared the decks to allow a better solution to evolve
> in this space by making it harder to blur the line accidentally, and
> decode()/manipulate/encode() already nicely covers many stateless use
> cases. If it turns out we need another type, or some other API, to
> deal gracefully with any use cases where that isn't enough, then so be
> it. However, I think we need to let the status quo run for a while
> longer and see what people actually using the current types in
> production come up with. The bytes/text division in Python 3 is by far
> the biggest conceptual change between the two languages, so it's going
> to take some time before we can figure out how many of the problems
> encountered are real issues with the split model not covering some use
> cases and how many are just people (including us) taking time to get
> used to the sharp division between the two worlds.

Well said, Nick. We ought to attempt to live with the current
situation for quite a bit longer before stirring the pot again.

My feeling is that one of the main reasons why this topic keeps coming
up is simply that it is different from Python 2 -- this is "the year
of Python 3" so more people than ever before are discovering the
differences between Python 2 and 3. Most people's minds probably
haven't switched over, and the solutions and attitudes that worked in
Python 2 don't always work so well in Python 3.

Let's also remember that while Python is not exactly blazing a new
trail here, it is also not following the most conservative course.
Most languages of Python's vintage or older are still using a model
that blurs the line between text and binary data, representing Unicode
text as bytes that happen to be encoded in some encoding. Even if the
language assumes a default encoding this doesn't mean that all data
manipulated is actually text encoded in that encoding -- it just means
that you may get nonsense when you use text operations on data that
uses some other encoding, just as you get nonsense when you use text
operations on binary data (e.g. using readlines() on a JPEG file).

Python lets you do this too, to some extent, with some of the text
operations on bytes data, and this is definitely a compromise. I hope
that we have built in just enough friction to remind people that this
is not the best way to deal with text most of the time, while still
allowing advanced users who are writing e.g. parsers for Internet
protocols to stay at the bytes layer at a reasonable cost. Personally
I think we got this close enough to right that we won't have to
rethink the whole thing, even if small tweaks might be possible; but
there's no need to rush.

--Guido van Rossum (

From greg.ewing at  Tue May 31 02:38:07 2011
From: greg.ewing at (Greg Ewing)
Date: Tue, 31 May 2011 12:38:07 +1200
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <>
References: <>
	<irkc54$gih$> <irl5n7$l1q$>
Message-ID: <>

Stephen J. Turnbull wrote:
> Greg Ewing writes:
>  > How would ascii behave when mixed with unicode strings? Should it
>  > automatically coerce to unicode,
> Definitely not!  Bytes are not text, and the programmer must say when
> they want those bytes decoded.

But the proposed 'ascii' type *is* text, though. Whether it's
a good idea to auto-coerce I'm not sure, but it's not obviously
wrong to do so.


From python at  Tue May 31 04:11:59 2011
From: python at (MRAB)
Date: Tue, 31 May 2011 03:11:59 +0100
Subject: [Python-ideas] Bytes formatting (was Re: Adding 'bytes' as
 alias for 'latin_1' codec)
In-Reply-To: <is0t8l$nln$>
References: <>	<irkc54$gih$>	<irl5n7$l1q$>	<>	<>	<irmk2f$ess$>	<>	<>	<>	<>
Message-ID: <>

On 30/05/2011 21:04, Terry Reedy wrote:
> Changing the subject to what it has actually become.
> A bytes template uses b'{' and b'}' to mark interpolation fields and
> other ascii bytes within as needed. It uses the ascii equivalent of the
> string field_name spec. It does not have a conversion spec. The
> format_spec should have the minimum needed for existing public
> protocols. How much more is up for discussion. We need use cases.
> One possibility to keep in mind is that a bytes template could be
> constructed by an ascii-compatible encoding of formatted text. Specs for
> bytes fields can be protected in a text template by doubling the braces.
>  >>> '{} {{byte-field-spec}}'.format(1).encode()
> b'1 {byte-field-spec}'
> A major issue is what to do with numbers. Sometimes they need to be
> ascii encoded, sometimes binary encoded. The baseline is to do nothing
> extra and require all args to be bytes. I think this may be appropriate
> for floats as they are seldom specifically used in protocols. I think
> the same may be true for ints with signs. So I think we mainly need to
> consider counts (unsigned ints) for possible exceptional processing.
> Option 0. As stated, no special number specs.
> Option 1. Use a subset of the current int spec to produce ascii
> encodings; use struct.pack for binary encodings. (How many of the
> current integer presentation types would be needed?)
> Option 2. Use an adaptation of the struct.pack mini-language to produce
> binary encodings; use encoded str.format for ascii encodings. (The
> latter might be done as part of a text-to-bytes-template process as
> indicated above.)
> Option 3. Combine options 1 and 2. This might best be done by replacing
> the omitted 'conversion' field with a 'number-encoding' field, b'!a' or
> b'!b', to indicate ascii or binary conversion and corresponding
> interpretation of the format spec. (In other words, do not try to
> combine the number to text and number to binary mini-languages, but add
> a 'prefix' to specify which is being used.)
Perhaps something like this:

# Format int as byte.
b"{:b}".format(128) returns b"\x80"

# Format int as double-byte.
b"{:2b}".format(0x100) returns b"\x00\x01" or b"\x01\x00"

# Format int as double-byte, little-endian.
b"{:<2b}".format(0x100) returns b"\x00\x01"

# Format int as double-byte, big-endian.
b"{:>2b}".format(0x100) returns b"\x01\x00"

# Format list of ints as signed bytes.
b"{:s}".format([1, -2, 3]) returns b"\x01\xFE\x03"

# Format list of ints as unsigned bytes.
b"{:u}".format([1, 254, 3]) returns b"\x01\xFE\x03"

# Format ASCII-only string as bytes.
b"{:a}".format("abc") returns b"abc"
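
None of the format codes above exist; they are a proposal. For
reference, a sketch of how the same results can be produced with calls
that do exist in Python 3.2 and later:

```python
import struct

one_byte = (128).to_bytes(1, "big")         # int as a single byte
little = (0x100).to_bytes(2, "little")      # double-byte, little-endian
big = (0x100).to_bytes(2, "big")            # double-byte, big-endian
signed = struct.pack("3b", 1, -2, 3)        # list of ints as signed bytes
unsigned = bytes([1, 254, 3])               # list of ints as unsigned bytes
ascii_only = "abc".encode("ascii")          # ASCII-only string as bytes
```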

From stephen at  Tue May 31 07:51:47 2011
From: stephen at (Stephen J. Turnbull)
Date: Tue, 31 May 2011 14:51:47 +0900
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <>
References: <>
	<irkc54$gih$> <irl5n7$l1q$>
Message-ID: <>

Greg Ewing writes:
 > Stephen J. Turnbull wrote:
 > > Greg Ewing writes:
 > > 
 > >  > How would ascii behave when mixed with unicode strings? Should it
 > >  > automatically coerce to unicode,
 > > 
 > > Definitely not!  Bytes are not text, and the programmer must say when
 > > they want those bytes decoded.
 > But the proposed 'ascii' type *is* text, though.

If it's intended that the 'ascii' type *be* text, I don't see the
point.  It *is* Unicode (with a restricted range), and no coercion is
necessary between str and 'ascii', just a change of representation.
This can be done completely transparently[1], no need for a new type,
except that some effort on the part of implementer can be saved by
imposing ongoing annoyance on the application programmer.

But even as a separate type, 'ascii' still can't mix with bytes
safely, for the same reason that str can't mix with bytes: 'ascii' and
str have a known fixed encoding (Unicode), and bytes have an unknown,
variable encoding (possibly the non-encoding 'binary').  YAGNI...

[1]  For some use cases it might be useful to allow specifying the
representation in advance, as a micro-optimization.

From greg.ewing at  Tue May 31 09:32:18 2011
From: greg.ewing at (Greg Ewing)
Date: Tue, 31 May 2011 19:32:18 +1200
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <>
References: <>
	<irkc54$gih$> <irl5n7$l1q$>
Message-ID: <>

Stephen J. Turnbull wrote:

> But even as a separate type, 'ascii' still can't mix with bytes
> safely,

Yes, it can, because it's also bytes. :-)

If you're using the special ascii type at all, rather
than an ordinary str, it's precisely because you want
to mix it with bytes. Making that part hard would
defeat the purpose,


From ncoghlan at  Tue May 31 10:24:30 2011
From: ncoghlan at (Nick Coghlan)
Date: Tue, 31 May 2011 18:24:30 +1000
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <>
References: <>
	<irkc54$gih$> <irl5n7$l1q$>
Message-ID: <>

On Tue, May 31, 2011 at 5:32 PM, Greg Ewing <greg.ewing at> wrote:
> Stephen J. Turnbull wrote:
>> But even as a separate type, 'ascii' still can't mix with bytes
>> safely,
> Yes, it can, because it's also bytes. :-)
> If you're using the special ascii type at all, rather
> than an ordinary str, it's precisely because you want
> to mix it with bytes. Making that part hard would
> defeat the purpose,

Indeed, the specific use case here is working with ASCII snippets
embedded within ASCII compatible encodings (or otherwise demarcated
from the 8-bit data).

As I stated elsewhere, we still need more usage of Python 3 in
production before we can find out whether or not this is a significant
enough use case to require builtin support, or if third party
libraries will be up to the task.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From stephen at  Tue May 31 11:08:06 2011
From: stephen at (Stephen J. Turnbull)
Date: Tue, 31 May 2011 18:08:06 +0900
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <>
References: <>
	<irkc54$gih$> <irl5n7$l1q$>
Message-ID: <>

Greg Ewing writes:
 > Stephen J. Turnbull wrote:
 > > But even as a separate type, 'ascii' still can't mix with bytes
 > > safely,
 > Yes, it can, because it's also bytes. :-)

To the extent that's safe, you may as well just use str and force
encoding with the ascii codec and strict errors (as I suggested
earlier).  AFAICS, the argument that the visual signal of the special
literal syntax helps is bogus.  It doesn't help with variables;
variables aren't typed in Python.  It's still just as possible to type
a'?????', although it might make the mistake a little more visible.
And in most cases, the use case for this feature will be very
stylized, with a very small vocabulary of ASCII puns, written as
literals at the point of combination with a bytes object.  Anything
else I can think of should be handled as text, via conversion to str.

I just don't see a use case for an 'ascii' type, vs. coercing str to
bytes and raising an error if the str is not all-ASCII.

 > If you're using the special ascii type at all, rather
 > than an ordinary str, it's precisely because you want
 > to mix it with bytes. Making that part hard would
 > defeat the purpose,

Indeed.  Most alleged use cases for "mixing" *should* be made hard to
do by operating on bytes directly.  Cf. the mixed-encoding log file

From janssen at  Tue May 31 18:16:46 2011
From: janssen at (Bill Janssen)
Date: Tue, 31 May 2011 09:16:46 PDT
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: <>
References: <>
	<irkc54$gih$> <irl5n7$l1q$>
Message-ID: <>

Nick Coghlan <ncoghlan at> wrote:

> Perhaps it is time to resurrect the idea of an explicit 'ascii' type?
> Add a'' literals, support the full string API as well as the bytes
> API, deprecate all string APIs on bytes and bytearray objects. The
> other thing I have learned in trying to deal with some of these issues
> is that ASCII-encoded text really *is* special, compared to all other
> encodings, due to its widespread use in a multitude of networking
> protocols and other formats.

I like the deprecations you suggest, but I'd prefer to see a more
general solution: the 'str' type extended so that it had two possible
representations for strings, the current format and an "encoded" format,
which would be kept as an array of bytes plus an encoding.  It would
transcode only as necessary -- for example, the 're' module might
require the current Unicode encoding.  An explicit method would be added
to allow the user to force transcoding.

This would complicate life at the C level, to be sure.  Though, perhaps
not so much, given the proper macrology.


From tjreedy at  Tue May 31 20:08:33 2011
From: tjreedy at (Terry Reedy)
Date: Tue, 31 May 2011 14:08:33 -0400
Subject: [Python-ideas] Bytes formatting (was Re: Adding 'bytes' as
 alias for 'latin_1' codec)
In-Reply-To: <>
References: <>	<irkc54$gih$>
	<irl5n7$l1q$>	<>	<>	<irmk2f$ess$>	<>	<>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <is3ar1$5e3$>

On 5/31/2011 4:24 AM, Nick Coghlan wrote:
> On Tue, May 31, 2011 at 5:32 PM, Greg Ewing
>> If you're using the special ascii type at all, rather
>> than an ordinary str, it's precisely because you want
>> to mix it with bytes. Making that part hard would
>> defeat the purpose,
> Indeed, the specific use case here is working with ASCII snippets
> embedded within ASCII compatible encodings (or otherwise demarcated
> from the 8-bit data).

My proposal for a function that interpolates bytes into bytes covers 
this case. There is no need for a new class at all. I agree that 
experience and experimentation are needed before adding anything to the
stdlib. But here is a baseline version in Python:

from itertools import zip_longest
import re
field = re.compile(b'{}')

def bformat(template, *inserts):
     temlits = re.split(field, template) # template literals
     res = bytearray()
     # Alternate template literals and inserts; pad the shorter with b''.
     for t,i in zip_longest(temlits, inserts, fillvalue=b''):
          res.extend(t)
          res.extend(i)
     return res

print(bformat(b'xxx{}yyy{}zzz', b'help', b'me'))

# bytearray(b'xxxhelpyyymezzz')

This is, of course, not limited to the ascii subset of bytes.

print(bformat(b'xx\xaa{}yy\xbb{}zzz', b'h\xeeelp', b'm\xeee'))

The next step would be to change the field re to allow a field spec 
between {} and add capturing parens so that re.split keeps the field 
specs. Then use those to format the inserted bytes or, later, ints.
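
A minimal sketch of that next step. The field-spec meaning used here (a
decimal minimum width, padded on the right with NUL bytes) is purely an
assumption for illustration, not something proposed in this thread:

```python
import re

# Capturing group, so re.split keeps each field spec in the result.
field = re.compile(br'\{([^{}]*)\}')

def bformat(template, *inserts):
    parts = field.split(template)   # [literal, spec, literal, ..., literal]
    literals, specs = parts[0::2], parts[1::2]
    if len(specs) != len(inserts):
        raise ValueError('expected {} inserts, got {}'.format(
            len(specs), len(inserts)))
    res = bytearray(literals[0])
    for spec, insert, literal in zip(specs, inserts, literals[1:]):
        if spec:
            # Hypothetical spec: minimum width, NUL-padded on the right.
            insert = insert.ljust(int(spec.decode('ascii')), b'\x00')
        res += insert
        res += literal
    return bytes(res)

result = bformat(b'xxx{}yyy{4}zzz', b'help', b'me')
```

Unlike the baseline, this version raises ValueError when the number of
fields and inserts differ, rather than silently padding with b''.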

Terry Jan Reedy

From tjreedy at  Tue May 31 20:18:03 2011
From: tjreedy at (Terry Reedy)
Date: Tue, 31 May 2011 14:18:03 -0400
Subject: [Python-ideas] Bytes formatting (was Re: Adding 'bytes' as
 alias for 'latin_1' codec)
In-Reply-To: <>
References: <>	<irkc54$gih$>	<irl5n7$l1q$>	<>	<>	<irmk2f$ess$>	<>	<>	<>	<>	<is0t8l$nln$>
Message-ID: <is3bcq$97t$>

On 5/30/2011 10:11 PM, MRAB wrote:
> On 30/05/2011 21:04, Terry Reedy wrote:

>> Option 3. Combine options 1 and 2. This might best be done by replacing
>> the omitted 'conversion' field with a 'number-encoding' field, b'!a' or
>> b'!b', to indicate ascii or binary conversion and corresponding
>> interpretation of the format spec. (In other words, do not try to
>> combine the number to text and number to binary mini-languages, but add
>> a 'prefix' to specify which is being used.)

Unless someone has a better idea of how to combine than I do ;-).

> Perhaps something like this:
> # Format int as byte.
> b"{:b}".format(128) returns b"\x80"
> # Format int as double-byte.
> b"{:2b}".format(0x100) returns b"\x00\x01" or b"\x01\x00"
> # Format int as double-byte, little-endian.
> b"{:<2b}".format(0x100) returns b"\x00\x01"
> # Format int as double-byte, big-endian.
> b"{:>2b}".format(0x100) returns b"\x01\x00"
> # Format list of ints as signed bytes.
> b"{:s}".format([1, -2, 3]) returns b"\x01\xFE\x03"
> # Format list of ints as unsigned bytes.
> b"{:u}".format([1, 254, 3]) returns b"\x01\xFE\x03"
> # Format ASCII-only string as bytes.
> b"{:a}".format("abc") returns b"abc"

Interesting. The core ideas of my proposal are

* There are bytes construction cases not sensibly handled by text
interpolation followed by encoding. Bytes concatenation and bytearray
manipulation may be awkward, or follow patterns that can usefully be
captured in a new function.

* Bytes interpolation should only deal with bytes and maybe ints and 
have nothing to do with text encoding.

* Design details should be based on use cases and experimentation with 
suggestions such as the above by people who would be the users of such a 

Experimental functions should be uploaded to pypi.

Terry Jan Reedy