From gokoproject at  Mon Jun  1 01:32:56 2015
From: gokoproject at (John Wong)
Date: Sun, 31 May 2015 19:32:56 -0400
Subject: [Python-ideas] npm-style venv-aware launcher
In-Reply-To: <>
References: <>
Message-ID: <>

Sorry I am on mobile but I'd chime in a concern:

David wrote:
> The problem that I have with virtualenv is that it requires quite a bit
of configuration and a great deal of awareness by the user of what is going
on and how things are configured.

This is his response to Stephen's comment. I'd like to point out that
personally I haven't found a use case in development where I would have
trouble running source bin/activate and continuing with my life. But in
various places, including the #python channel, I hear helpers strongly
advise running /PATH/TO/VIRTUALENV/bin/python directly, which seems like a
great idea, especially when writing an app startup script. So I'm not sure
how autoenv and virtualenvwrapper are likeable if they configure something
on behalf of users, and if users run into trouble they still have to
unfold the docs to find the "Ohhh" moment. What I'm suggesting is that these
recommendations seem kind of contradictory. Maybe I am not convinced that
source activate is bad yet because I have not really seen the pain in a
concrete example, just always someone telling me that it is a bad idea.

Frankly, I never like to poison my source directory with node_modules and
have to reconfigure where to save npm modules. So I still don't gain much.
But having pip recognize that requirements.txt is there and install from it
could be helpful. But helpful has to come with a price... Is it better for
users to learn what they are doing now, or to have them enjoy an easy ride
and then find a mud hole?
On Sunday, May 31, 2015, David Townshend <aquavitae69 at> wrote:

> On Sun, May 31, 2015 at 9:00 PM, Andrew Barnert <abarnert at> wrote:
>> On May 31, 2015, at 09:19, David Townshend <aquavitae69 at> wrote:
>>> The default for npm is that your package dir is attached directly to the
>>> project. You can get more flexibility by setting an environment variable or
>>> creating a symlink, but normally you don't. It has about the same
>>> flexibility as virtualenvwrapper, with about the same amount of effort. So
>>> if virtualenvwrapper isn't flexible enough for you, my guess is that your
>>> take on npm won't be flexible enough either, it'll just come preconfigured
>>> for your own idiosyncratic use and everyone else will have to adjust...
>> You have a point.  Maybe lack of flexibility is not actually the issue -
>> it's too much flexibility.
>> I think Python needs that kind of flexibility, because it's used in a
>> much wider range of use cases, from binary end-user applications to OS
>> components to "just run this script against your system environment" to
>> conda packages, not just web apps managed by a deployment team and other
>> things that fall into the same model. And it needs to be backward
>> compatible with the different ways people have come up with for handling
>> all those models.
>> While it's possible to rebuild all of those models around the npm model,
>> and the node community is gradually coming up with ways of doing so
>> (although notice that much of the node community is instead relying on
>> docker or VMs...), you'd have to be able to transparently replace all of
>> the current Python use cases today if you wanted to change Python today.
>> Also, as Nick pointed out, making things easier for the developer comes
>> at the cost of making things harder for the user--which is acceptable when
>> the user is the developer himself or a deployment team that sits at the
>> next set of cubicles, but may not be acceptable when the user is someone
>> who just wants to run a script he found online. Again, the Node community
>> is coming to terms with this, but they haven't got to the same level as the
>> Python community, and, even if they had, it still wouldn't work as a
>> drop-in replacement without a lot of work.
>> What someone _could_ do is make it easier to set up a dev-friendly
>> environment based on virtualenvwrapper and virtualenvwrapperhelper.
>> Currently, you have to know what you're looking for and find a blog page
>> somewhere that tells you how to install and configure all the tools and
>> follow three or four steps. That's obviously less than ideal. It would be
>> nice if there were a single "pip install envstuff" that got you ready out
>> of the box (including working for Windows cmd and PowerShell), and if links
>> to that were included in the basic Python docs. It would also be nice if
>> there were a way to transfer your own custom setup to a new machine. But I
>> don't see why that can't all be built as improvements on the existing tools
>> (and a new package that just included requirements and configuration and no
>> new tools).
>> The problem that I have with virtualenv is that it requires quite a bit
>> of configuration and a great deal of awareness by the user of what is going
>> on and how things are configured. As stated on its home page While there
>> is nothing specifically wrong with this, I usually just want a way to do
>> something in a venv without thinking too much about where it is or when or
>> how to activate it.
>> But again, if that's what you want, that's what you have with
>> virtualenvwrapper or autoenv. You just cd into the directory (whether a new
>> one you just created with the wrapper or an old one you just pulled from
>> git) and it's set up for you. And setting up a new environment or cloning
>> an existing one is just a single command, too. Sure, you can make your
>> configuration more complicated than that, but if you don't want to, you
>> don't have to.
>> If you've had a look at the details of the sort of tool I'm proposing, it
>> is completely transparent.  Perhaps the preconfiguration is just to my own
>> idiosyncrasies, but if it serves its use 90% of the time then maybe that is
>> good enough.
>> Some of what I'm proposing could be incorporated in to pip (i.e. better
>> requirements) and some could possibly be incorporated into
>> virtualenvwrapper (although I still think that my proposal for handling
>> venvs is just too different from that of virtualenvwrapper to be worth
>> pursuing that course), but one of the main aims is to merge it all into one
>> tool that manages both the venv and the requirements.
>> There are major advantages in not splitting the Python community between
>> two different sets of tools. We've only recently gotten past easy_install
>> vs. pip and distribute vs. setuptools, which has finally enabled a clean
>> story for everyone who wants to distribute packages to get it right, which
>> has finally started to happen (although there are people still finding and
>> following blog posts that tell them to install distribute or not to use
>> virtualenv because it doesn't play nice with py2app or whatever).
>> I'm quite sure that this proposal is not going to be accepted without a
>> trial period on pypi, so maybe that will be the test of whether this is
>> useful.
>> Is this the right place for this, or would distutils-sig be better?
>> Other people have made the case for both sides of that earlier in the
>> thread and I'm not sure which one is more compelling...
>> Also, the pure pip enhancement of coming up with something better than
>> freeze/-r may belong on distutils-sig while the environment-aware launcher
>> and/or environment-managing tools may belong here. (Notice that Python
>> includes venv and the py launcher, but doesn't include setuptools or pip...)
> Just to be clear, I'm not suggesting changing the python executable
> itself, or any of the other tools already in existence.  My proposal is a
> separate wrapper around existing python, pip and venv which would not
> change anything about the way it works currently.  A dev environment set up
> using it could still be deployed in the same way it would be now, and there
> would still be the option of using virtualenvwrapper, or something else for
> those that want to.  It is obviously way too early to try to get it
> included in the next python release (apart from anything else, pip would
> need to be added first), so really this proposal is meant more to gauge
> interest in the concept so that if it is popular I can carry on developing
> it and preparing it for inclusion in the stdlib, or at least a serious
> discussion about including it, once it is mature.
> That said, Andrew's arguments have convinced me that much could be done to
> improve existing tools before creating a new one, although I still don't
> believe virtualenvwrapper can be squashed into the shape I'm aiming for
> without fundamental changes.  Also, from the other responses so far it
> seems that the general feeling is that handling of requirements could
> definitely be improved, but that anything too prescriptive with venvs would
> be problematic.  Unfortunately for my proposal, if something like what I'm
> suggesting were officially supported via inclusion in the stdlib it would
> quickly become, at best, the "strongly recommended" way of working and at
> worst the One Obvious Way.  With all this in mind, I'll withdraw my
> proposal, but continue development on my version and see if it goes
> anywhere.  I'll also see how much of its functionality I can put into
> other tools (specifically pip's requirements handling) instead.

Sent from Jeff Dean's printf() mobile console
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <>

From surya.subbarao1 at  Mon Jun  1 04:25:46 2015
From: surya.subbarao1 at (u8y7541 The Awesome Person)
Date: Sun, 31 May 2015 19:25:46 -0700
Subject: [Python-ideas] Python Float Update
Message-ID: <>

Dear Python Developers:

I will be presenting a modification to the float class, which will improve
its speed and accuracy (reduce floating point errors). This is applicable
because Python uses a numerator and denominator rather than a sign and
mantissa to represent floats.

First, I propose that a float's integer ratio should be accurate. For
example, (1 / 3).as_integer_ratio() should return (1, 3). Instead, it
returns (6004799503160661, 18014398509481984).

Second of all, even though 1 * 3 = 3 (last example), 6004799503160661 * 3
does not equal 18014398509481984. Instead, it equals 18014398509481983,
one less than the value in the ratio. This means the ratio is inaccurate,
as well as completely not simplified.

[image: Inline image 1]

Even if the value displayed for a float is a rounded value, the internal
numerator and denominator should divide to give a completely accurate value.

Thanks for considering this improvement!

-------------- next part --------------
A non-text attachment was scrubbed...
Name: pythonfloats.PNG
Type: image/png
Size: 16278 bytes
Desc: not available
URL: <>

From njs at  Mon Jun  1 04:37:14 2015
From: njs at (Nathaniel Smith)
Date: Sun, 31 May 2015 19:37:14 -0700
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On May 31, 2015 7:26 PM, "u8y7541 The Awesome Person" <
surya.subbarao1 at> wrote:
> Dear Python Developers:
> I will be presenting a modification to the float class, which will
improve its speed and accuracy (reduce floating point errors). This is
applicable because Python uses a numerator and denominator rather than a
sign and mantissa to represent floats.

Python's floats are in fact IEEE 754 floats, using sign/mantissa/exponent,
as provided by all popular CPU floating point hardware. This is why you're
getting the results you see -- 1/3 cannot be exactly represented as a
float, so it gets rounded to the closest representable float, and then
as_integer_ratio shows you an exact representation of this rounded value.
It sounds like you're instead looking for an exact fraction representation,
which in Python is available in the standard "fractions" module.
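
As a quick check of the numbers in this thread, here is a minimal sketch
(not part of the original message, and assuming CPython with IEEE 754
binary64 floats). It shows that the pair from as_integer_ratio is exact for
the rounded float, while fractions.Fraction(1, 3) keeps 1/3 itself:

```python
from fractions import Fraction

x = 1 / 3
num, den = x.as_integer_ratio()
print(num, den)

# The ratio reproduces the stored (rounded) float exactly.
stored = Fraction(num, den)
print(stored == Fraction(x))

# Three times the numerator is one short of the denominator,
# which is the discrepancy noticed in the original post.
print(num * 3 == den - 1)

# Fraction(1, 3) is the true rational; the float falls short of it
# by a tiny, exactly representable amount.
error = Fraction(1, 3) - stored
print(error)
```

The denominator is a power of two (2**54 here), which is why the pair cannot
be reduced any further.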


From rosuav at  Mon Jun  1 04:48:12 2015
From: rosuav at (Chris Angelico)
Date: Mon, 1 Jun 2015 12:48:12 +1000
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jun 1, 2015 at 12:25 PM, u8y7541 The Awesome Person
<surya.subbarao1 at> wrote:
> I will be presenting a modification to the float class, which will improve its speed and accuracy (reduce floating point errors). This is applicable because Python uses a numerator and denominator rather than a sign and mantissa to represent floats.
> First, I propose that a float's integer ratio should be accurate. For example, (1 / 3).as_integer_ratio() should return (1, 3). Instead, it returns(6004799503160661, 18014398509481984).

I think you're misunderstanding the as_integer_ratio method. That
isn't how Python works internally; that's a service provided for
parsing out float internals into something more readable. What you
_actually_ are working with is IEEE 754 binary64. (Caveat: I have no
idea what Python-the-language stipulates, nor what other Python
implementations use, but that's what CPython uses, and you did your
initial experiments with CPython. None of this discussion applies *at
all* if a Python implementation doesn't use IEEE 754.) So internally,
1/3 is stored as:

0 <-- sign bit (positive)
01111111101 <-- exponent (1021)
0101010101010101010101010101010101010101010101010101 <-- mantissa (52
bits, repeating)

The exponent is offset by 1023, so this means 1.010101.... divided by
2^2; the original repeating value is exactly equal to 4/3, so this is
correct, but as soon as it's squeezed into a finite-sized mantissa, it
gets rounded - in this case, rounded down.
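
The three fields above can be pulled out of a float directly. This is a
minimal sketch, not from the original message: float_fields is a
hypothetical helper name, and the reconstruction at the end is only valid
for normal (not subnormal, infinite or NaN) binary64 values.

```python
import struct
from fractions import Fraction

def float_fields(x):
    """Split a binary64 float into (sign, biased exponent, mantissa bits)."""
    # Reinterpret the 8 bytes of the double as one unsigned 64-bit integer.
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    mantissa = bits & ((1 << 52) - 1)
    return sign, exponent, mantissa

sign, exponent, mantissa = float_fields(1 / 3)
print(sign, format(exponent, "011b"), format(mantissa, "052b"))

# Rebuild the exact value: (1 + mantissa / 2**52) * 2**(exponent - 1023).
rebuilt = Fraction(2**52 + mantissa, 2 ** (52 + 1023 - exponent))
print(rebuilt == Fraction(1 / 3))
```

The rebuilt fraction has 2**54 as its denominator, which matches the pair
that as_integer_ratio reports for 1/3.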

That's where your result comes from. It's been rounded such that it
fits inside IEEE 754, and then converted back to a fraction
afterwards. You're never going to get an exact result for anything
with a denominator that isn't a power of two. Fortunately, Python does
offer a solution: store your number as a pair of integers, rather than
as a packed floating point value, and all calculations truly will be
exact (at the cost of performance):

>>> import fractions
>>> one_third = fractions.Fraction(1, 3)
>>> one_eighth = fractions.Fraction(1, 8)
>>> one_third + one_eighth
Fraction(11, 24)

This is possibly more what you want to work with.


From random832 at  Mon Jun  1 05:14:06 2015
From: random832 at (random832 at
Date: Sun, 31 May 2015 23:14:06 -0400
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, May 31, 2015, at 22:25, u8y7541 The Awesome Person wrote:
> First, I propose that a float's integer ratio should be accurate. For
> example, (1 / 3).as_integer_ratio() should return (1, 3). Instead, it
> returns(6004799503160661, 18014398509481984).

Even though he's mistaken about the core premise, I do think there's a
kernel of a good idea here - it would be nice to have a method (maybe
as_integer_ratio, maybe with some parameter added, maybe a different
method) to return with the smallest denominator that would result in
exactly the original float if divided out, rather than merely the
smallest power of two.
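
To make the idea concrete, here is a brute-force sketch (not from the
original message). The name smallest_denominator_ratio and the
max_denominator bound are assumptions for illustration only; it simply tries
denominators in increasing order until dividing out gives back the same
float, and it expects a finite float and Python 3 true division.

```python
def smallest_denominator_ratio(x, max_denominator=10**6):
    """Return (n, d) with the smallest d found such that n / d == x."""
    for d in range(1, max_denominator + 1):
        # The best candidate numerator for this denominator.
        n = round(x * d)
        if n / d == x:
            return n, d
    # Give up and fall back to the exact power-of-two ratio.
    return x.as_integer_ratio()

print(smallest_denominator_ratio(1 / 3))
print(smallest_denominator_ratio(0.1))
print(smallest_denominator_ratio(1.2))
```

This is linear in the size of the answer's denominator, so it is only a way
to pin down what is being asked for, not a practical algorithm.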

From jim.witschey at  Mon Jun  1 05:21:36 2015
From: jim.witschey at (Jim Witschey)
Date: Mon, 01 Jun 2015 03:21:36 +0000
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

Teachable moments about the implementation of floating-point aside,
something in this neighborhood has been considered and rejected before, in
PEP 240. However, that was in 2001 - it was apparently created the same day
as PEP 237, which introduced transparent conversion of machine ints to
bignums in the int type.

I think hiding hardware number implementations has been a success for
integers - it's a far superior API. It could be for rationals as well.

Has something like this thread's original proposal - interpreting
decimal-number literals as fractional values and using fractions as the
result of integer arithmetic - been seriously discussed more recently than
PEP 240? If so, why haven't they been implemented? Perhaps enough has
changed that it's worth reconsidering.

On Sun, May 31, 2015 at 22:49 Chris Angelico <rosuav at> wrote:

> On Mon, Jun 1, 2015 at 12:25 PM, u8y7541 The Awesome Person
> <surya.subbarao1 at> wrote:
> >
> > I will be presenting a modification to the float class, which will
> improve its speed and accuracy (reduce floating point errors). This is
> applicable because Python uses a numerator and denominator rather than a
> sign and mantissa to represent floats.
> >
> > First, I propose that a float's integer ratio should be accurate. For
> example, (1 / 3).as_integer_ratio() should return (1, 3). Instead, it
> returns(6004799503160661, 18014398509481984).
> >
> I think you're misunderstanding the as_integer_ratio method. That
> isn't how Python works internally; that's a service provided for
> parsing out float internals into something more readable. What you
> _actually_ are working with is IEEE 754 binary64. (Caveat: I have no
> idea what Python-the-language stipulates, nor what other Python
> implementations use, but that's what CPython uses, and you did your
> initial experiments with CPython. None of this discussion applies *at
> all* if a Python implementation doesn't use IEEE 754.) So internally,
> 1/3 is stored as:
> 0 <-- sign bit (positive)
> 01111111101 <-- exponent (1021)
> 0101010101010101010101010101010101010101010101010101 <-- mantissa (52
> bits, repeating)
> The exponent is offset by 1023, so this means 1.010101.... divided by
> 2^2; the original repeating value is exactly equal to 4/3, so this is
> correct, but as soon as it's squeezed into a finite-sized mantissa, it
> gets rounded - in this case, rounded down.
> That's where your result comes from. It's been rounded such that it
> fits inside IEEE 754, and then converted back to a fraction
> afterwards. You're never going to get an exact result for anything
> with a denominator that isn't a power of two. Fortunately, Python does
> offer a solution: store your number as a pair of integers, rather than
> as a packed floating point value, and all calculations truly will be
> exact (at the cost of performance):
> >>> one_third = fractions.Fraction(1, 3)
> >>> one_eighth = fractions.Fraction(1, 8)
> >>> one_third + one_eighth
> Fraction(11, 24)
> This is possibly more what you want to work with.
> ChrisA
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at
> Code of Conduct:

From mertz at  Mon Jun  1 05:27:47 2015
From: mertz at (David Mertz)
Date: Sun, 31 May 2015 20:27:47 -0700
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, May 31, 2015 at 8:14 PM, <random832 at> wrote:

> Even though he's mistaken about the core premise, I do think there's a
> kernel of a good idea here - it would be nice to have a method (maybe
> as_integer_ratio, maybe with some parameter added, maybe a different
> method) to return with the smallest denominator that would result in
> exactly the original float if divided out, rather than merely the
> smallest power of two.

What is the computational complexity of a hypothetical
float.as_simplest_integer_ratio() method?  How hard that is to find is not
obvious to me (probably it should be, but I'm not sure).

Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.

From random832 at  Mon Jun  1 05:37:17 2015
From: random832 at (random832 at
Date: Sun, 31 May 2015 23:37:17 -0400
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, May 31, 2015, at 23:21, Jim Witschey wrote:
> I think hiding hardware number implementations has been a success for
> integers - it's a far superior API. It could be for rationals as well.

I'd worry about unbounded complexity. For rationals, unlike integers,
values don't have to be large for their bignum representation to be
large.

> Has something like this thread's original proposal - interpreting
> decimal-number literals as fractional values and using fractions as the
> result of integer arithmetic - been seriously discussed more recently
> than PEP 240? If so, why haven't they been implemented? Perhaps enough
> has changed that it's worth reconsidering.

Also, it raises a question of string representation. Granted, "1/3"
becomes much more defensible as the repr of Fraction(1, 3) if it in fact
evaluates to that value, but how much do you like "6/5" as the repr of
1.2? Or are we going to use Fractions for integer division and Decimals
for literals? And, what of decimal division? Right now you can't even
mix Fraction and Decimal in arithmetic operations.

And are we going to add %e %f and %g support for both types? Directly
so, without any detour to float and its limitations (i.e. %.100f gets
you 100 true decimal digits of precision)?

Current reality:
>>> from fractions import Fraction
>>> from decimal import Decimal
>>> '%.50f' % Fraction(1, 3)
>>> '%.50f' % Decimal('0.3333333333333333333333333333333333333')
>>> '{:.50f}'.format(Fraction(1, 3))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: non-empty format string passed to object.__format__
>>> '{:.50f}'.format(Decimal('0.3333333333333333333333333333333333'))

Okay, that's one case right out of four.
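
For comparison, true decimal digits can be produced from a Fraction without
any detour through float by using integer arithmetic. This is a minimal
sketch, not from the original message: fraction_to_fixed is a hypothetical
helper, not an existing stdlib function, and it rounds ties to the even
digit because that is what round() does for Fraction.

```python
from fractions import Fraction

def fraction_to_fixed(value, places):
    """Render a Fraction with `places` exact decimal digits (places >= 1)."""
    if places < 1:
        raise ValueError("places must be at least 1")
    # Scale by 10**places and round to the nearest integer (ties to even).
    scaled = round(value * 10**places)
    sign = "-" if scaled < 0 else ""
    # Left-pad so there is always at least one digit before the point.
    digits = str(abs(scaled)).rjust(places + 1, "0")
    return f"{sign}{digits[:-places]}.{digits[-places:]}"

print(fraction_to_fixed(Fraction(1, 3), 50))
print(fraction_to_fixed(Fraction(6, 5), 3))
```

The first call prints fifty 3s after the decimal point, which is what one
would hope a float-free '%.50f' would give for Fraction(1, 3).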

From bussonniermatthias at  Mon Jun  1 05:46:03 2015
From: bussonniermatthias at (Matthias Bussonnier)
Date: Sun, 31 May 2015 20:46:03 -0700
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

> On May 31, 2015, at 20:21, Jim Witschey <jim.witschey at> wrote:
> Teachable moments about the implementation of floating-point aside, something in this neighborhood has been considered and rejected before, in PEP 240. However, that was in 2001 - it was apparently created the same day as PEP 237, which introduced transparent conversion of machine ints to bignums in the int type.
> I think hiding hardware number implementations has been a success for integers - it's a far superior API. It could be for rationals as well.
> Has something like this thread's original proposal - interpreting decimal-number literals as fractional values and using fractions as the result of integer arithmetic - been seriously discussed more recently than PEP 240? If so, why haven't they been implemented? Perhaps enough has changed that it's worth reconsidering.

While I see the interest, does it really belong in core Python? What would be the advantages?

IIRC (during | after) the language summit at PyCon this year, it was said that maybe the stdlib should get
fewer features, not more.

Side note: SymPy has an IPython AST hook that will wrap all your integers into SymPy Integers and hence
give you rationals of whatever you like, if you want to SymPy-plify your life.

But for the majority of uses, will it be useful? What would be the performance costs?

If you start storing rationals, then why not continued fractions, as they are just N-tuples instead of 2-tuples?
But then you are limited to finite continued fractions. So you improve by using generators...
I love Python for doing science and math, but please stay away from putting too much in the standard lib,
or we will end up with Cholesky matrix decomposition in Python 4.0 like Julia does, and I'm not sure it is a good idea.

I would much rather have a core set of libraries "blessed" by CPython that provide features like this one,
that are deemed "important".

From ron3200 at  Mon Jun  1 05:55:43 2015
From: ron3200 at (Ron Adam)
Date: Sun, 31 May 2015 23:55:43 -0400
Subject: [Python-ideas] Explicitly shared objects with sub modules vs
In-Reply-To: <mkclvj$390$>
References: <mkclvj$390$>
Message-ID: <mkgl40$ht6$>

On 05/30/2015 11:45 AM, Ron Adam wrote:
> The solution I found was to call a function to explicitly set the shared
> items in the imported module.

A bit of an improvement...

     def export_to(module, **d):
         """ Explicitly share objects with imported module.

             Use this_module.item in the sub-module
             after item is exported to it.
         """
         from collections import namedtuple
         namespace = namedtuple("exported", d.keys())
         for k, v in d.items():
             setattr(namespace, k, v)
         # Not sure about this.  Possibly sys.get_frame would be better.
         # __name__ is the name of the module that defines export_to.
         setattr(module, __name__, namespace)

And used like this.

     import sub_mod
     export_to(sub_mod, foo=foo)

Then functions in sub-mod can access the objects as if the sub-module 
imported the parent module, but only the exported items are visible to the 
sub module.

Again, this is for closely dependent modules that can't easily be split by 
moving common objects into a mutually imported file, or if it is desired to 
split a larger module by functionality rather than dependency.

There are some limitations, but I think they are actually desirable 
features.  The sub-module can't use exported objects at the top level, and 
it can't alter the parent modules name space directly.
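
The pattern can be tried without any files on disk. This is a minimal
sketch, not the author's code: it swaps the namedtuple for
types.SimpleNamespace, passes the attribute name explicitly instead of using
__name__, and builds a throwaway stand-in for sub_mod with types.ModuleType.
The names parent and double_foo are made up for the demonstration.

```python
import types

def export_to(module, name, **shared):
    """Attach the given objects to `module` under the attribute `name`."""
    setattr(module, name, types.SimpleNamespace(**shared))

# Stand-in for a real imported sub-module (hypothetical, for demonstration).
sub_mod = types.ModuleType("sub_mod")
exec(
    "def double_foo():\n"
    "    # 'parent' is looked up when the function runs, not at import time.\n"
    "    return parent.foo * 2\n",
    sub_mod.__dict__,
)

foo = 21
export_to(sub_mod, "parent", foo=foo)
print(sub_mod.double_foo())
```

Because the lookup happens at call time, the sub-module's functions see the
exported names, while its top-level code (which runs before export_to is
called) does not, which matches the limitation described above.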

Of course, it could just be my own preferences.  I like the pattern of 
control (the specifying of what gets imported/shared) flowing from the top 


From tjreedy at  Mon Jun  1 06:37:53 2015
From: tjreedy at (Terry Reedy)
Date: Mon, 01 Jun 2015 00:37:53 -0400
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <mkgnjt$ftf$>

On 5/31/2015 11:21 PM, Jim Witschey wrote:

> Has something like this thread's original proposal - interpreting
> decimal-number literals as fractional values and using fractions as the
> result of integer arithmetic - been seriously discussed more recently
> than PEP 240?

The competing proposal is to treat decimal literals as decimal.Decimal 

Terry Jan Reedy

From casevh at  Mon Jun  1 06:39:03 2015
From: casevh at (Case Van Horsen)
Date: Sun, 31 May 2015 21:39:03 -0700
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, May 31, 2015 at 8:14 PM,  <random832 at> wrote:
> On Sun, May 31, 2015, at 22:25, u8y7541 The Awesome Person wrote:
>> First, I propose that a float's integer ratio should be accurate. For
>> example, (1 / 3).as_integer_ratio() should return (1, 3). Instead, it
>> returns(6004799503160661, 18014398509481984).
> Even though he's mistaken about the core premise, I do think there's a
> kernel of a good idea here - it would be nice to have a method (maybe
> as_integer_ratio, maybe with some parameter added, maybe a different
> method) to return with the smallest denominator that would result in
> exactly the original float if divided out, rather than merely the
> smallest power of two.

The gmpy2 library already supports such a method.

>>> import gmpy2
>>> gmpy2.version()
>>> a=gmpy2.mpfr(1)/3
>>> a.as_integer_ratio()
(mpz(6004799503160661), mpz(18014398509481984))
>>> a.as_simple_fraction()

gmpy2 uses a version of the Stern-Brocot algorithm to find the shortest
fraction that, when converted to a floating point value, will return the same
value as the original floating point value. The implementation was
originally done by Alex Martelli; I have just maintained it over the years.

The algorithm is quite fast. If there is a consensus to add this method
to Python, I would be willing to help implement it.



From cs at  Mon Jun  1 06:37:23 2015
From: cs at (Cameron Simpson)
Date: Mon, 1 Jun 2015 14:37:23 +1000
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On 31May2015 20:27, David Mertz <mertz at> wrote:
>On Sun, May 31, 2015 at 8:14 PM, <random832 at> wrote:
>> Even though he's mistaken about the core premise, I do think there's a
>> kernel of a good idea here - it would be nice to have a method (maybe
>> as_integer_ratio, maybe with some parameter added, maybe a different
>> method) to return with the smallest denominator that would result in
>> exactly the original float if divided out, rather than merely the
>> smallest power of two.
>What is the computational complexity of a hypothetical
>float.as_simplest_integer_ratio() method?  How hard that is to find is not
>obvious to me (probably it should be, but I'm not sure).

Probably the same as Euclid's greatest common factor method. About log(n), I
think. Take as_integer_ratio, find greatest common factor, divide both by that.

Cameron Simpson <cs at>

In the desert, you can remember your name,
'cause there ain't no one for to give you no pain.      - America

From random832 at  Mon Jun  1 07:11:10 2015
From: random832 at (random832 at
Date: Mon, 01 Jun 2015 01:11:10 -0400
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jun 1, 2015, at 00:37, Cameron Simpson wrote:
> Probably the same as Euclid's greatest common factor method. About
> log(n), I think. Take as_integer_ratio, find greatest common factor,
> divide both by that.

Er, no, because (6004799503160661, 18014398509481984) are already
mutually prime, and we want (1, 3). This is a different problem from
finding a reduced fraction. There are algorithms, I know, for
constraining the denominator to a specific range
(Fraction.limit_denominator does this), but that's not *quite* the same
as finding the lowest one that will still convert exactly to the
original float.
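
The difference between the two problems can be checked with the stdlib
alone. A minimal sketch, not from the original message, assuming binary64
floats; the bound of 1000 passed to limit_denominator is an arbitrary choice
for the example:

```python
import math
from fractions import Fraction

num, den = (1 / 3).as_integer_ratio()

# The pair is already in lowest terms, so dividing by the gcd changes nothing.
print(math.gcd(num, den))

# limit_denominator finds the closest fraction under a bound chosen by the
# caller; it does not look for the smallest denominator that round-trips.
approx = Fraction(1 / 3).limit_denominator(1000)
print(approx)

# In this case the bounded result does convert back to the same float.
print(float(approx) == 1 / 3)
```

Here the bounded search happens to land on 1/3, but only because the caller
picked a suitable bound; the method being asked for would have to find that
denominator on its own.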

From jim.witschey at  Mon Jun  1 07:19:26 2015
From: jim.witschey at (Jim Witschey)
Date: Mon, 1 Jun 2015 01:19:26 -0400
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, May 31, 2015 at 11:37 PM,  <random832 at> wrote:

> I'd worry about unbounded complexity. For rationals, unlike integers,
> values don't have to be large for their bignum representation to be
> large.

I'd expect rational representations to be reasonably small until a
value was operated on many times, in which case you're using more
space, but representing the result very precisely. It's a tradeoff,
but with a small cost in the common case.
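
How quickly the representation grows can be seen with a short experiment.
This is a sketch, not from the original message; summing 1/k for k up to 100
is just an arbitrary example of a value "operated on many times":

```python
from fractions import Fraction

total = Fraction(0)
for k in range(1, 101):
    total += Fraction(1, k)

# The value itself is small...
print(float(total))

# ...but its exact denominator no longer fits in a 64-bit machine word.
print(total.denominator.bit_length())
```

The result is only a little over 5, yet storing it exactly takes far more
than one machine word for each of numerator and denominator.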

I'm no expert, though -- am I not considering some case?

> how much do you like "6/5" as the repr of 1.2?

6/5 is an ugly representation of 1.2, but consider the current state of affairs:

>>> 1.2

"1.2" is imprecisely interpreted as 1.1999999999999999556 * (2^0), which
is then imprecisely represented as 1.2. I recognize this is the way
we've dealt with non-integer numbers for a long time, but "1.2" =>
SomeKindOfRational(6, 5) => "6/5" is conceptually cleaner.
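
The exact value behind the literal can be inspected with the stdlib. A
minimal sketch, not from the original message, assuming CPython's binary64
floats:

```python
from decimal import Decimal
from fractions import Fraction

# Converting the float shows the value that is actually stored.
print(Decimal(1.2))
print(Fraction(1.2))

# Parsing the same digits as a string gives the rational that was meant.
print(Fraction("1.2"))
```

Fraction("1.2") is 6/5, while Fraction(1.2) has a denominator of 2**52,
which is the gap between the literal as written and the float it becomes.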

> Or are we going to use Fractions for integer division and Decimals
> for literals?

I had been thinking of rationals built on bignums all around, a la
Haskell. Is Fraction as it exists today up to it? I don't know.

I agree that some principled decisions would have to be made for,
e.g., interpretation by format strings.

From casevh at  Mon Jun  1 07:42:21 2015
From: casevh at (Case Van Horsen)
Date: Sun, 31 May 2015 22:42:21 -0700
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, May 31, 2015 at 8:27 PM, David Mertz <mertz at> wrote:

> What is the computational complexity of a hypothetical
> float.as_simplest_integer_ratio() method?  How hard that is to find is not
> obvious to me (probably it should be, but I'm not sure).
Here is a (barely tested) implementation based on the Stern-Brocot tree:

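def as_simple_integer_ratio(x):
    x = abs(float(x))
    # Bracket x between floor(x)/1 and the "infinite" fraction 1/0.
    left = (int(x), 1)
    right = (1, 0)
    # The loop only tests mediants, so handle integral x up front.
    if left[0] == x:
        return left
    while True:
        # The mediant always lies strictly between left and right.
        mediant = (left[0] + right[0], left[1] + right[1])
        test = mediant[0] / mediant[1]
        print(left, right, mediant, test)
        if test == x:
            return mediant
        elif test < x:
            left = mediant
        else:
            right = mediant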


The approximations are printed so you can watch the convergence.


From cs at  Mon Jun  1 07:27:45 2015
From: cs at (Cameron Simpson)
Date: Mon, 1 Jun 2015 15:27:45 +1000
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On 01Jun2015 01:11, random832 at <random832 at> wrote:
>On Mon, Jun 1, 2015, at 00:37, Cameron Simpson wrote:
>> Probably the same as Euclid's greatest common factor method. About
>> log(n) I think. Take as_integer_ratio, find greatest common factor,
>> divide both by that.
>Er, no, because (6004799503160661, 18014398509481984) are already
>mutually prime, and we want (1, 3). This is a different problem from
>finding a reduced fraction.

Ah, you want the simplest fraction that _also_ gives the same float 

>There are algorithms, I know, for
>constraining the denominator to a specific range
>(Fraction.limit_denominator does this), but that's not *quite* the same
>as finding the lowest one that will still convert exactly to the
>original float


Thanks for this clarification.
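
To make the distinction concrete, here is a brute-force sketch, assuming
Python 3. The helper name simplest_ratio and its max_tries parameter are
hypothetical; it simply retries Fraction.limit_denominator with growing
bounds until the result converts back to the same float. It is meant to
illustrate the problem, not to be efficient, and it has not been checked
against edge cases such as values sitting exactly on a power of two.

```python
from fractions import Fraction


def simplest_ratio(x, max_tries=10**6):
    """Search for a small-denominator fraction that converts back to x."""
    exact = Fraction(x)  # the exact binary value; raises for nan and inf
    for bound in range(1, max_tries + 1):
        candidate = exact.limit_denominator(bound)
        if float(candidate) == x:
            return candidate
    raise ValueError("no match with denominator <= %d" % max_tries)


print(Fraction(1 / 3))        # the reduced power-of-two ratio
print(simplest_ratio(1 / 3))  # 1/3
print(simplest_ratio(0.1))    # 1/10
```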

Cameron Simpson <cs at>

The Design View editor of Visual InterDev 6.0 is currently incompatible
with Compatibility Mode, and may not function correctly.
- George Politis <george at>, 22apr1999,

From jim.witschey at  Mon Jun  1 07:59:39 2015
From: jim.witschey at (Jim Witschey)
Date: Mon, 1 Jun 2015 01:59:39 -0400
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, May 31, 2015 at 11:46 PM, Matthias Bussonnier
<bussonniermatthias at> wrote:
> IIRC (during | after) the language summit at PyCon this year, it was said that maybe the stdlib should get
> fewer features, not more.

Rationals (and Decimals) already exist in the standard library. The
original proposal (as I read it, anyway) is more about the default
interpretation of, e.g., integer division and decimal-number literals.

> Side note, SymPy has an IPython ast-hook that will wrap all your integers into SymPy Integers and hence
> give you rationals of whatever you like, if you want to SymPy-plify your life.

Thank you for the pointer -- that's really cool.

> But for the majority of uses, will it be useful?

I believe interpreting "0.1" as 1/10 is more ergonomic than
representing it as 1.600000023841858 * (2^-4). I see it as being more
useful -- a better fit -- in most use cases because it's simpler, more
precise, and more understandable.

> What would be the performance costs ?

I don't know. Personally, I'd be willing to pay a performance penalty
to avoid reasoning about floating-point arithmetic most of the time,
then "drop into" floats when I need the speed.

From jim.witschey at  Mon Jun  1 08:02:21 2015
From: jim.witschey at (Jim Witschey)
Date: Mon, 1 Jun 2015 02:02:21 -0400
Subject: [Python-ideas] Python Float Update
In-Reply-To: <mkgnjt$ftf$>
References: <>
Message-ID: <>

On Mon, Jun 1, 2015 at 12:37 AM, Terry Reedy <tjreedy at> wrote:
> The competing proposal is to treat decimal literals as decimal.Decimal
> values.

Is that an existing PEP? I couldn't find any such proposal.

From nicholas.chammas at  Mon Jun  1 08:27:57 2015
From: nicholas.chammas at (Nicholas Chammas)
Date: Mon, 01 Jun 2015 06:27:57 +0000
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

> I don't know. Personally, I'd be willing to pay a performance penalty
> to avoid reasoning about floating-point arithmetic most of the time,
> then "drop into" floats when I need the speed.

This is perhaps a bit off topic for the thread, but +9000 for this.

Having decimal literals or something similar by default, though perhaps
problematic from a backwards compatibility standpoint, is a) user friendly,
b) easily understandable, and c) not surprising to beginners. None of these
qualities apply to float literals.

I always assumed that float literals were mostly an artifact of history or
of some performance limitations. Free of those, why would a language choose
them over decimal literals? When does someone ever expect floating-point
madness, unless they are doing something that is almost certainly not
common, or unless they have been burned in the past?

Every day another programmer gets bitten by floating point stupidities like
this one <>. It would be a big win
to kill this lame "programmer rite of passage" and give people numbers that
work more like how they learned them in school.

> The competing proposal is to treat decimal literals as decimal.Decimal
> values.

I'm interested in learning more about such a proposal.



From ncoghlan at  Mon Jun  1 08:37:27 2015
From: ncoghlan at (Nick Coghlan)
Date: Mon, 1 Jun 2015 16:37:27 +1000
Subject: [Python-ideas] Explicitly shared objects with sub modules vs
In-Reply-To: <mkgl40$ht6$>
References: <mkclvj$390$>
Message-ID: <>

On 1 June 2015 at 13:55, Ron Adam <ron3200 at> wrote:
> Of course, it could just be my own preferences.  I like the pattern of
> control (the specifying of what gets imported/shared) flowing from the top
> down.

This is actually how we bootstrap the import system in 3.3+ (we inject
the sys and os modules *after* the top level execution of the
bootstrap module is done, since the "import" statement doesn't work
yet at the point where that module is running).

However, this trick is never a desirable answer, just sometimes the
least wrong choice out of multiple bad options :)


P.S. Python 3.5 is also more tolerant of circular imports than has
historically been the case:

Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From abarnert at  Mon Jun  1 08:39:34 2015
From: abarnert at (Andrew Barnert)
Date: Sun, 31 May 2015 23:39:34 -0700
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On May 31, 2015, at 20:37, random832 at wrote:
> Also, it raises a question of string representation. Granted, "1/3"
> becomes much more defensible as the repr of Fraction(1, 3) if it in fact
> evaluates to that value, but how much do you like "6/5" as the repr of
> 1.2? Or are we going to use Fractions for integer division and Decimals
> for literals?

That's the big problem. There's no one always-right answer.

If you interpret the literal 1.20 as a Fraction, it's going to be more confusing, not less, to people who are just trying to add up dollars and cents. Do a long financial computation and, instead of $691.05 as you expected or $691.0500000237 as you get today, you've got 10215488088 / 14782560. Not to mention that financial calculations often tend to involve things like e or exponentiation to non-integral powers, and what happens then? And then of course there's the unbounded size issue. If you do a long chain of operations that can theoretically be represented exactly followed by one that can't, you're wasting a ton of time and space for those intermediate values (and, unlike Haskell, Python can't look at the whole expression in advance and determine what the final type will be).

On the other hand, if you interpret 1.20 as a Decimal, now you can't sensibly mix 1.20 * 3/4 without coming up with a rule for how decimal and fraction types should interact. (OK, there's an obvious right answer for multiplication, but what about for addition?)
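
A minimal sketch of how the two existing types interact today, assuming
Python 3 and only the standard library (the names price and share are
illustrative): mixing them directly is rejected, so one side has to be
converted explicitly.

```python
from decimal import Decimal
from fractions import Fraction

price = Decimal("1.20")
share = Fraction(3, 4)

# Direct mixing is not defined between the two types.
try:
    price * share
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Converting one side by hand works, but the caller picks the result type.
result = Fraction(price) * share

print(mixed_ok)  # False
print(result)    # 9/10
```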

And either one leads to people asking why the code they ported from Java or Ruby is broken on Python.

You could make it configurable, so integer division is your choice of float, fraction, or decimal and decimal literals are your separate choice of the same three (and maybe also let fraction exponentiation be your choice of decimal and float), but then which setting is the default? Also, where do you set that? It has to be available at compile time, unless you want to add new types like "decimal literal" at compile time that are interpreted appropriately at runtime (which some languages do, and it works, but it definitely adds complexity).

Maybe the answer is just to make it easier to be explicit, using something like C++ literal suffixes, so you can write, e.g., 1.20d or 1/3f (and I guess 1.2f) instead of Decimal('1.20') or Fraction(1, 3) (and Fraction(12, 10)).

> And, what of decimal division? Right now you can't even
> mix Fraction and Decimal in arithmetic operations.
> And are we going to add %e %f and %g support for both types? Directly
> so, without any detour to float and its limitations (i.e. %.100f gets
> you 100 true decimal digits of precision)?

At least here I think the answer is clear. %-substitution is printf-like, and shouldn't change. If you want formatting that can be overloaded by the type, you use {}, which already works.
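
A short sketch of that difference with the decimal module as it exists,
assuming Python 3: %-formatting with a float code goes through float,
while str.format lets Decimal do its own formatting.

```python
from decimal import Decimal

d = Decimal("0.1")

# printf-style: the value is converted to a binary float first.
via_percent = "%.20f" % d

# {}-style: Decimal.__format__ handles the digits itself.
via_format = "{:.20f}".format(d)

print(via_percent)
print(via_format)  # 0.10000000000000000000
```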

From ncoghlan at  Mon Jun  1 09:08:40 2015
From: ncoghlan at (Nick Coghlan)
Date: Mon, 1 Jun 2015 17:08:40 +1000
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On 1 June 2015 at 16:27, Nicholas Chammas <nicholas.chammas at> wrote:
> I always assumed that float literals were mostly an artifact of history or
> of some performance limitations. Free of those, why would a language choose
> them over decimal literals?

In a world of binary computers, no programming language is free of
those constraints - if you choose decimal literals as your default,
you take a *big* performance hit, because computers are designed as
binary systems. (Some languages, like IBM's REXX, do choose to use
decimal integers by default)

For CPython, we offer C-accelerated decimal support by default since
3.3 (available as pip install cdecimal in Python 2), but it still
comes at a high cost in speed:

$ python3 -m timeit -s "n = 1.0; d = 3.0" "n / d"
10000000 loops, best of 3: 0.0382 usec per loop
$ python3 -m timeit -s "from decimal import Decimal as D; n = D(1); d = D(3)" "n / d"
10000000 loops, best of 3: 0.129 usec per loop

And this isn't even like the situation with integers, where the
semantics of long integers are such that native integers can be used
transparently as an optimisation technique - IEEE754 (which defines
the behaviour of native binary floats) and the General Decimal
Arithmetic Specification (which defines the behaviour of the decimal
module) are genuinely different ways of doing floating point
arithmetic, since the choice of base 2 or base 10 has far reaching
ramifications for the way various operations work and how various
errors accumulate.

We aren't even likely to see widespread proliferation of hardware
level decimal arithmetic units, because the "binary arithmetic is
easier to implement than decimal arithmetic" consideration extends
down to the hardware layer as well - a decimal arithmetic unit takes
more silicon, and hence more power, than a similarly capable binary
unit. With battery conscious mobile device design and environmentally
conscious data centre design being two of the most notable current
trends in CPU design, this makes it harder than ever to justify
providing hardware support for both in general purpose computing

For some use cases (e.g. financial math), it's worth paying the price
in speed to get the base 10 arithmetic semantics, or the cost in
hardware to accelerate it, but for most other situations, we end up
being better off teaching humans to cope with the fact that binary
logic is the native language of our computational machines.

Binary vs decimal floating point is a lot like the Unicode bytes/text
distinction in that regard: while Unicode is a better model for
representing human communications, there's no avoiding the fact that
that text eventually has to be rendered as a bitstream in order to be
saved or transmitted.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From stefan_ml at  Mon Jun  1 09:10:24 2015
From: stefan_ml at (Stefan Behnel)
Date: Mon, 01 Jun 2015 09:10:24 +0200
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <mkh0h0$jg4$>

random832 at wrote on 01.06.2015:
> On Sun, May 31, 2015, at 22:25, u8y7541 The Awesome Person wrote:
>> First, I propose that a float's integer ratio should be accurate. For
>> example, (1 / 3).as_integer_ratio() should return (1, 3). Instead, it
>> returns (6004799503160661, 18014398509481984).
> Even though he's mistaken about the core premise, I do think there's a
> kernel of a good idea here - it would be nice to have a method (maybe
> as_integer_ratio, maybe with some parameter added, maybe a different
> method) to return with the smallest denominator that would result in
> exactly the original float if divided out, rather than merely the
> smallest power of two.

The fractions module seems the obvious place to put this. Consider opening
a feature request. Target version would be Python 3.6.


From mal at  Mon Jun  1 11:05:14 2015
From: mal at (M.-A. Lemburg)
Date: Mon, 01 Jun 2015 11:05:14 +0200
Subject: [Python-ideas] npm-style venv-aware launcher
In-Reply-To: <>
References: <>	<>	<>	<>
Message-ID: <>

On 31.05.2015 18:19, David Townshend wrote:
>> The default for npm is that your package dir is attached directly to the
>> project. You can get more flexibility by setting an environment variable or
>> creating a symlink, but normally you don't. It has about the same
>> flexibility as virtualenvwrapper, with about the same amount of effort. So
>> if virtualenvwrapper isn't flexible enough for you, my guess is that your
>> take on npm won't be flexible enough either, it'll just come preconfigured
>> for your own idiosyncratic use and everyone else will have to adjust...
> You have a point.  Maybe lack of flexibility is not actually the issue -
> it's too much flexibility.  The problem that I have with virtualenv is that
> it requires quite a bit of configuration and a great deal of awareness by
> the user of what is going on and how things are configured.  As stated on
> its home page While there is nothing specifically wrong with this, I
> usually just want a way to do something in a venv without thinking too much
> about where it is or when or how to activate it.  If you've had a look at
> the details of the sort of tool I'm proposing, it is completely
> transparent.  Perhaps the preconfiguration is just to my own
> idiosyncrasies, but if it serves its use 90% of the time then maybe that is
> good enough.

If you want to have a system that doesn't require activation,
you may want to take a look at what we've done with PyRun:

It basically takes the "virtual" out of virtualenvs. Instead
of creating a local symlinked copy of your host Python installation,
you create a completely separate Python installation (which isn't
much heavier than a virtualenv due to the way this is done).

Once installed, everything works relative to the PyRun binary,
so you don't need to activate anything when running code inside
your installation: you just need to run the right PyRun binary
and this automatically gives you access to everything else
you installed in your environment.

In our latest release, we've added requirements.txt support
to the installation helper install-pyrun, so that you can

install-pyrun -r requirements.txt .

to bootstrap a complete project environment with one command.

Marc-Andre Lemburg


From steve at  Mon Jun  1 14:09:02 2015
From: steve at (Steven D'Aprano)
Date: Mon, 1 Jun 2015 22:09:02 +1000
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, May 31, 2015 at 11:37:17PM -0400, random832 at wrote:
> On Sun, May 31, 2015, at 23:21, Jim Witschey wrote:
> > I think hiding hardware number implementations has been a success for
> > integers - it's a far superior API. It could be for rationals as well.
> I'd worry about unbounded complexity. For rationals, unlike integers,
> values don't have to be large for their bignum representation to be
> large.

You and Guido both. ABC used exact integer fractions as its numeric 
type, and Guido has spoken many times about the cost in both time and 
space (memory) of numeric calculations using rationals.


From random832 at  Mon Jun  1 14:59:47 2015
From: random832 at (random832 at
Date: Mon, 01 Jun 2015 08:59:47 -0400
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jun 1, 2015, at 02:39, Andrew Barnert wrote:
> At least here I think the answer is clear. %-substitution is printf-like,
> and shouldn't change. If you want formatting that can be overloaded by
> the type, you use {}, which already works.

The original proposal was for *getting rid of* float as we know it.
Which, unless the floating format specifiers for % are likewise removed,
means their semantics have to be defined in terms of types that still

From liik.joonas at  Mon Jun  1 16:52:35 2015
From: liik.joonas at (Joonas Liik)
Date: Mon, 1 Jun 2015 17:52:35 +0300
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

Having some sort of decimal literal would have some advantages of its own;
for one, it could help against this silliness:

>>> Decimal(1.3)

>>> Decimal('1.3')

I'm not saying that the actual data type needs to be a decimal (it
might well be a float, but, say, shove the string repr next to it so it can
be accessed when needed)

...but this is one really common pitfall for new users. I know it's easy to
fix the code above, but this behavior is very unintuitive: you essentially
get a really expensive float when you do the obvious thing.
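
A runnable sketch of the pitfall, assuming Python 3 (variable names are
illustrative): constructing a Decimal from a float literal preserves the
binary approximation, while constructing it from text preserves what was
typed.

```python
from decimal import Decimal

from_float = Decimal(1.3)        # exact value of the binary double
from_text = Decimal("1.3")       # exactly what was written
from_repr = Decimal(repr(1.3))   # a common workaround for existing floats

print(from_float)
print(from_text)                 # 1.3
print(from_float == from_text)   # False
print(from_repr == from_text)    # True
```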

Not sure if this is worth the effort but it would help smooth some corners

From steve at  Mon Jun  1 16:58:06 2015
From: steve at (Steven D'Aprano)
Date: Tue, 2 Jun 2015 00:58:06 +1000
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jun 01, 2015 at 06:27:57AM +0000, Nicholas Chammas wrote:

> Having decimal literals or something similar by default, though perhaps
> problematic from a backwards compatibility standpoint, is a) user friendly,
> b) easily understandable, and c) not surprising to beginners. None of these
> qualities apply to float literals.

I wish this myth about Decimals would die, because it isn't true. The 
only advantage of base-10 floats over base-2 floats -- and I'll admit it 
can be a big advantage -- is that many of the numbers we commonly care 
about can be represented in Decimal exactly, but not as base-2 floats. 
In every other way, Decimals are no more user friendly, understandable, 
or unsurprising than floats. Decimals violate all the same rules of 
arithmetic that floats do. This should not come as a surprise, since 
decimals *are* floats, they merely use base 10 rather than base 2.

In the past, I've found that people are very resistant to this fact, so 
I'm going to show a few examples of how Decimals violate the fundamental 
laws of mathematics just as floats do. For those who already know this, 
please forgive me belabouring the obvious.

In mathematics, adding anything other than zero to a number must give 
you a different number. Decimals violate that expectation just as 
readily as binary floats:

py> from decimal import Decimal as D
py> x = D(10)**30
py> x == x + 100  # should be False

Apart from zero, multiplying a number by its inverse should always give 
one. Again, violated by decimals:

py> one_third = 1/D(3)
py> 3*one_third == 1

Inverting a number twice should give the original number back:

py> 1/(1/D(7)) == 7

Here's a violation of the Associativity Law, which states that (a+b)+c 
should equal a+(b+c) for any values a, b, c:

py> a = D(1)/17
py> b = D(5)/7
py> c = D(12)/13
py> (a + b) + c == a + (b+c)

(For the record, it only took me two attempts, and a total of about 30 
seconds, to find that example, so it's not particularly difficult to 
come across such violations.)

Here's a violation of the Distributive Law, which states that a*(b+c) 
should equal a*b + a*c:

py> a = D(15)/2
py> b = D(15)/8
py> c = D(1)/14
py> a*(b+c) == a*b + a*c

(I'll admit that was a bit trickier to find.)
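
The checks above can be collected into one script. This is a sketch,
assuming Python 3 and a 28-digit context with default rounding; it pins
the precision locally so the results do not depend on whatever context is
active. Only the first three results are annotated, since those are easy
to confirm by hand; the last two are the examples given in this message.

```python
from decimal import Decimal as D, localcontext

with localcontext() as ctx:
    ctx.prec = 28  # the module default, set explicitly for clarity
    x = D(10) ** 30
    a1, b1, c1 = D(1) / 17, D(5) / 7, D(12) / 13
    a2, b2, c2 = D(15) / 2, D(15) / 8, D(1) / 14
    checks = {
        "x == x + 100": x == x + 100,              # True: the 100 is lost
        "3 * (1/3) == 1": 3 * (1 / D(3)) == 1,     # False
        "1 / (1/7) == 7": 1 / (1 / D(7)) == 7,     # False
        "associativity": (a1 + b1) + c1 == a1 + (b1 + c1),
        "distributivity": a2 * (b2 + c2) == a2 * b2 + a2 * c2,
    }

for name, holds in checks.items():
    print(name, holds)
```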

This one is a bit subtle, and to make it easier to see what is going on 
I will reduce the number of digits used. When you take the average of 
two numbers x and y, mathematically the average must fall *between* x 
and y. With base-2 floats, we can't guarantee that the average will be 
strictly between x and y, but we can be sure that it will be either 
between the two values, or equal to one of them. 

But base-10 Decimal floats cannot even guarantee that. Sometimes the 
calculated average falls completely outside of the inputs.

py> from decimal import getcontext
py> getcontext().prec = 3
py> x = D('0.516')
py> y = D('0.518')
py> (x+y)/2  # should be 0.517

This one is even worse:

py> getcontext().prec = 1
py> x = D('51.6')
py> y = D('51.8')
py> (x+y)/2  # should be 51.7

Instead of the correct answer of 51.7, Decimal calculates the answer as 
50 exactly.
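
Both averaging examples can be reproduced without touching the global
context. A sketch, assuming Python 3 and default rounding; the helper
name average is hypothetical:

```python
from decimal import Decimal as D, localcontext


def average(x, y, prec):
    """Average x and y with arithmetic limited to prec significant digits."""
    with localcontext() as ctx:
        ctx.prec = prec
        return (x + y) / 2


low = average(D("0.516"), D("0.518"), prec=3)
worse = average(D("51.6"), D("51.8"), prec=1)

print(low)    # 0.515, below both inputs
print(worse)  # 5E+1
```

In the first case the sum 1.034 is rounded to 1.03 before the division,
which is where the result slips below the smaller input.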

> I always assumed that float literals were mostly an artifact of history or
> of some performance limitations. Free of those, why would a language choose
> them over decimal literals? 

Performance and accuracy will always be better for binary floats. Binary 
floats are faster, and have stronger error bounds and slower-growing 
errors. Decimal floats suffer from the same problems as binary floats, 
only more so, and are slower to boot.

> When does someone ever expect floating-point
> madness, unless they are doing something that is almost certainly not
> common, or unless they have been burned in the past?
> Every day another programmer gets bitten by floating point stupidities like
> this one <>. It would be a big win
> to kill this lame "programmer rite of passage" and give people numbers that
> work more like how they learned them in school.

There's a lot wrong with that.

- The sorts of errors we see with floats are not "madness", but the 
completely logical consequences of what happens when you try to do 
arithmetic in anything less than the full mathematical abstraction.

- And they aren't rare either -- they're incredibly common. Fortunately, 
most of the time they don't matter, or aren't obvious, or both.

- Decimals don't behave like the numbers you learn in school either. 
Floats are not real numbers, regardless of which base you use. And in 
fact, the smaller the base, the smaller the errors. Binary floats are 
better than decimals in this regard.

(Decimals *only* win out due to human bias: we don't care too much that 
1/7 cannot be expressed exactly as a float using *either* binary or 
decimal, but we do care about 1/10. And we conveniently ignore the case 
of 1/3, because familiarity breeds contempt.)

- Being at least vaguely aware of floating point issues shouldn't be 
difficult for anyone who has used a pocket calculator. And yet every day 
brings in another programmer surprised by floats.

- It's not really a rite of passage, that implies that it is arbitrary 
and imposed culturally. Float issues aren't arbitrary, they are baked 
into the very nature of the universe.

You cannot hope to perform infinitely precise real-number arithmetic 
using just a finite number of bits of storage, no matter what system you 
use. Fixed-point maths has its own problems, as does rational maths.

All you can do is choose to shift the errors from some calculations to 
other calculations, you cannot eliminate them altogether. 


From mertz at  Mon Jun  1 16:54:13 2015
From: mertz at (David Mertz)
Date: Mon, 1 Jun 2015 07:54:13 -0700
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

Decimal literals are far from as obvious as suggested.  We *have* the
`decimal` module after all, and it defines all sorts of parameters on
precision, rounding rules, etc. that one can provide context for.
decimal.ROUND_HALF_DOWN is "the obvious way" for some users,
while decimal.ROUND_CEILING is "the obvious way" for others.

I like decimals, but they don't simply make all the mathematical answers
result in what all users would consider "do what I mean" either.
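
A small sketch of how much the "obvious" answer depends on the rounding
rule, assuming Python 3 and the rounding constants that the decimal module
already defines (the sample value 2.5 is just an illustration):

```python
from decimal import (
    Decimal,
    ROUND_CEILING,
    ROUND_HALF_DOWN,
    ROUND_HALF_EVEN,
    ROUND_HALF_UP,
)

value = Decimal("2.5")
modes = (ROUND_HALF_DOWN, ROUND_HALF_EVEN, ROUND_HALF_UP, ROUND_CEILING)

# Round to a whole number under each rule.
rounded = {mode: value.quantize(Decimal("1"), rounding=mode) for mode in modes}

for mode, result in rounded.items():
    print(mode, result)
# ROUND_HALF_DOWN 2
# ROUND_HALF_EVEN 2
# ROUND_HALF_UP 3
# ROUND_CEILING 3
```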


Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.

From liik.joonas at  Mon Jun  1 17:12:52 2015
From: liik.joonas at (Joonas Liik)
Date: Mon, 1 Jun 2015 18:12:52 +0300
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

I'm sorry..

What I meant was not a literal that results in a Decimal; what I meant was
a special literal proxy object that usually acts like a float, except you
can ask for its original string form.


flit = 1.3
flit*3 == float(flit)*3
str(flit) == '1.3'

thus in cases where the intermediate float conversion loses precision you
can get at the original string that the programmer actually typed in.

Decimal constructors are one case that would probably like to use the
original string whenever possible to avoid conversion losses,
but by no means are they the only ones.
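
A rough sketch of such a proxy, assuming Python 3. The class name
LiteralFloat is hypothetical, and since there is no literal syntax for it,
the source text has to be passed in by hand here:

```python
from decimal import Decimal


class LiteralFloat(float):
    """Hypothetical float that remembers the text it was written as."""

    def __new__(cls, text):
        self = super().__new__(cls, text)
        self.text = text  # keep the original spelling alongside the value
        return self

    def __str__(self):
        return self.text


flit = LiteralFloat("1.3")

print(flit * 3 == float(flit) * 3)                # True
print(str(flit) == "1.3")                         # True
print(Decimal(str(flit)) == Decimal("1.3"))       # True
```

Note that arithmetic returns a plain float, so the remembered text is only
available on the untouched literal itself, and Decimal would still need to
be taught to look for it.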

From p.f.moore at  Mon Jun  1 17:19:37 2015
From: p.f.moore at (Paul Moore)
Date: Mon, 1 Jun 2015 16:19:37 +0100
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On 1 June 2015 at 15:58, Steven D'Aprano <steve at> wrote:
> (Decimals *only* win out due to human bias: we don't care too much that
> 1/7 cannot be expressed exactly as a float using *either* binary or
> decimal, but we do care about 1/10. And we conveniently ignore the case
> of 1/3, because familiarity breeds contempt.)

There is one other "advantage" to decimals - they behave like
electronic calculators (which typically used decimal arithmetic). This
is a variation of "human bias" - we (if we're of a certain age, maybe
today's youngsters are less used to the vagaries of electronic
calculators :-)) are used to seeing 1/3 displayed as 0.33333333, and
showing that 1/3*3 = 0.99999999 was a "fun calculator fact" when I was
at school.
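
That calculator behaviour is easy to imitate with the decimal module. A
sketch, assuming Python 3 and an 8-digit display:

```python
from decimal import Decimal, localcontext

with localcontext() as ctx:
    ctx.prec = 8  # pretend we are an 8-digit pocket calculator
    third = Decimal(1) / Decimal(3)
    back = third * 3

print(third)  # 0.33333333
print(back)   # 0.99999999
```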


From rosuav at  Mon Jun  1 18:20:04 2015
From: rosuav at (Chris Angelico)
Date: Tue, 2 Jun 2015 02:20:04 +1000
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jun 2, 2015 at 12:58 AM, Steven D'Aprano <steve at> wrote:
> This one is even worse:
> py> getcontext().prec = 1
> py> x = D('51.6')
> py> y = D('51.8')
> py> (x+y)/2  # should be 51.7
> Decimal('5E+1')
> Instead of the correct answer of 51.7, Decimal calculates the answer as
> 50 exactly.

To be fair, you've actually destroyed precision so much that your
numbers start out effectively equal:

>>> from decimal import Decimal as D, getcontext
>>> getcontext().prec = 1
>>> x = D('51.6')
>>> y = D('51.8')
>>> x == y
>>> x + 0 == y + 0

They're not actually showing up as equal, but only because the
precision setting doesn't (apparently) apply to the constructor. If
adding zero to both sides of an equation makes it equal when it wasn't
before, something seriously screwy is going on.

(Actually, this behaviour of decimal.Decimal reminds me very much of
REXX. Since there are literally no data types in REXX (everything is a
string), the numeric precision setting ("NUMERIC DIGITS n") applies
only to arithmetic operations, so the same thing of adding zero to
both sides can happen.)

So what you're really doing here is averaging 5E+1 and 5E+1, with an
unsurprising result of... 5E+1. Your other example is more significant
here, because your numbers actually do fit inside the precision limits
- and then the end result slips outside the bounds.
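
A short sketch of the point above, under the assumption that the default
rounding mode is in effect: the context precision is applied by arithmetic
(including unary plus), not by the Decimal constructor. The variable names
are illustrative only.

```python
from decimal import Decimal as D, localcontext

with localcontext() as ctx:
    ctx.prec = 1
    x = D('51.6')                      # constructor keeps every digit
    y = D('51.8')
    constructed_equal = (x == y)       # compares the exact stored values
    rounded_x = +x                     # unary plus rounds to the context
    rounded_equal = (+x == +y)         # both operands are rounded first
    average = (x + y) / 2              # every intermediate step is rounded

print(constructed_equal, rounded_x, rounded_equal, average)
```

Unary plus is a cheap way to see what a value looks like once the context
has been applied to it.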


From random832 at  Mon Jun  1 18:43:43 2015
From: random832 at (random832 at
Date: Mon, 01 Jun 2015 12:43:43 -0400
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jun 1, 2015, at 10:58, Steven D'Aprano wrote:
> I wish this myth about Decimals would die, because it isn't true. The 
> only advantage of base-10 floats over base-2 floats -- and I'll admit it 
> can be a big advantage -- is that many of the numbers we commonly care 
> about can be represented in Decimal exactly, but not as base-2 floats. 
> In every other way, Decimals are no more user friendly, understandable, 
> or unsurprising than floats. Decimals violate all the same rules of 
> arithmetic that floats do.

But people have been learning about those rules, as apply to decimals,
since they were small children. They know intuitively that 2/3 rounds to
...6667 at some point because they've done exactly that by hand. "user
friendly" and "understandable to beginners" don't arise in a vacuum.

From techtonik at  Mon Jun  1 17:46:34 2015
From: techtonik at (anatoly techtonik)
Date: Mon, 1 Jun 2015 18:46:34 +0300
Subject: [Python-ideas] Why decode()/encode() name is harmful
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, May 30, 2015 at 3:18 AM, Steven D'Aprano <steve at> wrote:
> As far as I can see, he has been given the solution, or at least a
> potential solution, on python-list, but as far as I can tell he either
> hasn't read it, or doesn't like the solutions offered and so is
> ignoring them.

Let me update you on this. There was no solution given. Only the
pointers to go read some pointers on the internets again. So, yes,
I read replies. But I have very little time to analyse and follow up.

The idea I wanted to convey in this thread is that encode/decode
is confusing, so if you agree with that, I can start to propose
alternatives.

And just to make you understand the importance of the question
with translating from bytes to unicode and back, let me just tell
that this question is the third one voted with 221k views on SO in
Python 3 tag.

anatoly t.

From nicholas.chammas at  Mon Jun  1 19:24:32 2015
From: nicholas.chammas at (Nicholas Chammas)
Date: Mon, 1 Jun 2015 13:24:32 -0400
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

Well, I learned a lot about decimals today. :)

On Mon, Jun 1, 2015 at 3:08 AM, Nick Coghlan ncoghlan at
<http://mailto:ncoghlan at> wrote:

> In a world of binary computers, no programming language is free of
> those constraints - if you choose decimal literals as your default,
> you take a *big* performance hit, because computers are designed as
> binary systems. (Some languages, like IBM's REXX, do choose to use
> decimal integers by default)

I guess it's a non-trivial tradeoff. But I would lean towards considering
people likely to be affected by the performance hit as doing something "not
common". Like, if they are doing that many calculations that it matters,
perhaps it makes sense to ask them to explicitly ask for floats vs.
decimals, in exchange for giving the majority who wouldn't notice a
performance difference a better user experience.

On Mon, Jun 1, 2015 at 10:58 AM, Steven D'Aprano steve at
<http://mailto:steve at> wrote:

> I wish this myth about Decimals would die, because it isn't true.

Your email had a lot of interesting information about decimals that would
make a good blog post, actually. Writing one up will perhaps help kill this
myth in the long run :)

> In the past, I've found that people are very resistant to this fact, so
> I'm going to show a few examples of how Decimals violate the fundamental
> laws of mathematics just as floats do.

How many of your examples are inherent limitations of decimals vs. problems
that can be improved upon?

Admittedly, the only place where I've played with decimals extensively is
on Microsoft's SQL Server (where they are the default literal
<>). I've stumbled in
the past on my own decimal gotchas
<>, but looking at your examples
and trying them on SQL Server I suspect that most of the problems you show
are problems of precision and scale.

Perhaps Python needs better rules for how precision and scale are affected
by calculations (here are SQL Server's
<>, for example), or
better defaults when they are not specified?

Anyway, here's what happens on SQL Server for some of the examples you

Adding 100:

py> from decimal import Decimal as D
py> x = D(10)**30
py> x == x + 100 # should be False

DECLARE @x DECIMAL(38,0) = '1' + REPLICATE(0, 30);

IF @x = @x + 100
  SELECT 'equal' AS adding_100
ELSE
  SELECT 'not equal' AS adding_100

Gives "not equal" <!6/9eecb7db59d16c80417c72d1/1645/0>.
Leaving out the precision when declaring @x (i.e. going with the default
precision of 18 <>)
immediately yields an understandable data truncation error.
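
For comparison, the same "adding 100" check can be run in Python under two
precisions. This is only a sketch: 28 is the decimal module's default
precision, 38 is picked to mirror the DECIMAL(38,0) declaration above, and
`plus_100_is_noop` is a made-up helper name.

```python
from decimal import Decimal as D, localcontext


def plus_100_is_noop(prec):
    """Return True if 10**30 + 100 compares equal to 10**30 at this precision."""
    with localcontext() as ctx:
        ctx.prec = prec
        x = D(10) ** 30
        return x == x + 100            # the sum is rounded to `prec` digits


at_default = plus_100_is_noop(28)      # too few digits to hold the sum
at_38 = plus_100_is_noop(38)           # enough digits, the sum is exact
print(at_default, at_38)
```

The difference from SQL Server is that Python rounds silently by default
rather than raising an error; the precision itself is just a context
setting.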


py> a = D(1)/17
py> b = D(5)/7
py> c = D(12)/13
py> (a + b) + c == a + (b+c)

DECIMAL = 12.0/13;

IF (@a + @b) + @c = @a + (@b + @c)
  SELECT 'equal' AS associative
ELSE
  SELECT 'not equal' AS associative

Gives "equal" <!6/9eecb7db59d16c80417c72d1/1656/0>.


py> a = D(15)/2
py> b = D(15)/8
py> c = D(1)/14
py> a*(b+c) == a*b + a*c

DECIMAL = 1.0/14;

IF @a * (@b + @c) = @a*@b + @a*@c
  SELECT 'equal' AS distributive
ELSE
  SELECT 'not equal' AS distributive

Gives "equal" <!6/9eecb7db59d16c80417c72d1/1655/0>.

I think some of the other decimal examples you provide, though definitely
not 100% beginner friendly, are still way more human-friendly because they
are explainable in terms of precision and scale, which we can understand
more simply ("there aren't enough decimal places to carry the result") and
which have parallels in other areas of life as Paul pointed out.

> The sorts of errors we see with floats are not "madness", but the
> completely logical consequences of what happens when you try to do
> arithmetic in anything less than the full mathematical abstraction.

I don't mean madness as in incorrect, I mean madness as in difficult to
predict and difficult to understand.

Your examples do show that it isn't all roses and honey with decimals, but
do you find it easier to understand and explain all the weirdness of floats vs.
decimals?

Understanding float weirdness (and disclaimer: I don't) seems to require
understanding some hairy stuff, and even then it is not predictable because
there are platform dependent issues. Understanding decimal "weirdness"
seems to require only understanding precision and scale, and after that it
is mostly predictable.


On Mon, Jun 1, 2015 at 11:19 AM Paul Moore <p.f.moore at> wrote:

On 1 June 2015 at 15:58, Steven D'Aprano <steve at> wrote:
> > (Decimals *only* win out due to human bias: we don't care too much that
> > 1/7 cannot be expressed exactly as a float using *either* binary or
> > decimal, but we do care about 1/10. And we conveniently ignore the case
> > of 1/3, because familiarity breeds contempt.)
> There is one other "advantage" to decimals - they behave like
> electronic calculators (which typically used decimal arithmetic). This
> is a variation of "human bias" - we (if we're of a certain age, maybe
> today's youngsters are less used to the vagaries of electronic
> calculators :-)) are used to seeing 1/3 displayed as 0.33333333, and
> showing that 1/3*3 = 0.99999999 was a "fun calculator fact" when I was
> at school.
> Paul

From carl at  Mon Jun  1 19:58:42 2015
From: carl at (Carl Meyer)
Date: Mon, 01 Jun 2015 11:58:42 -0600
Subject: [Python-ideas] npm-style venv-aware launcher
In-Reply-To: <>
References: <>	<>	<>	<>
Message-ID: <>


On 06/01/2015 03:05 AM, M.-A. Lemburg wrote:
> If you want to have a system that doesn't require activation,
> you may want to take a look at what we've done with PyRun:

Virtualenv doesn't require activation either.

Activation is a convenience for running repeated commands in the
virtualenv context, but all it does is change your shell PATH; you can
explicitly specify the virtualenv's python binary and never use
activation, if you wish.

> It basically takes the "virtual" out of virtualenvs. Instead
> of creating a local symlinked copy of your host Python installation,
> you create a completely separate Python installation (which isn't
> much heavier than a virtualenv due to the way this is done).

Virtualenv doesn't create "a local symlinked copy of your host Python
installation." It copies the binary, symlinks a few key stdlib modules
that are necessary to bootstrap, and then its custom
finds the host Python's stdlib directory and adds it to `sys.path`.

> Once installed, everything works relative to the PyRun binary,
> so you don't need to activate anything when running code inside
> your installation: you just need to run the right PyRun binary
> and this automatically gives you access to everything else
> you installed in your environment.

This is exactly how virtualenv (and pyvenv in Python 3.3+) works.
Everything is relative to the Python binary in the virtualenv (this
behavior is built into the Python executable, actually). You can just
directly run the virtualenv's Python binary (or any script with that
Python binary in its shebang, which includes all pip or easy-installed
scripts in the virtualenv's bin/ dir), without ever activating anything.
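
As a rough illustration of running an environment's interpreter directly,
here is a sketch with two made-up helpers, `venv_python` and
`in_virtual_env`. The bin/ versus Scripts\ layout and the use of
sys.base_prefix are assumptions about venv-style environments on
Python 3.3+; older virtualenv releases signal the environment differently.

```python
import os
import sys


def venv_python(env_dir):
    """Return the path of the interpreter inside an environment directory."""
    if os.name == "nt":
        return os.path.join(env_dir, "Scripts", "python.exe")
    return os.path.join(env_dir, "bin", "python")


def in_virtual_env():
    """Return True when the running interpreter belongs to a venv-style env."""
    # Inside such an environment sys.prefix points at the environment while
    # sys.base_prefix points at the host installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


# Hypothetical usage, with no activation step ("/path/to/env" is a placeholder):
#   subprocess.check_call([venv_python("/path/to/env"), "-c", "import sys; print(sys.prefix)"])
print(venv_python("env"), in_virtual_env())
```

Nothing in the shell environment is consulted here; the interpreter's own
location decides which site-packages it sees.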

It seems the main difference between virtualenv and PyRun is in how much
of the standard library is bundled with each environment, and that I
guess PyRun doesn't come with any convenience activation shell script?
But the method by which "activation" actually occurs is identical (at
least as far as you've described it here.)


> In our latest release, we've added requirements.txt support
> to the installation helper install-pyrun, so that you can
> run
> install-pyrun -r requirements.txt .
> to bootstrap a complete project environment with one command.


From breamoreboy at  Mon Jun  1 20:32:03 2015
From: breamoreboy at (Mark Lawrence)
Date: Mon, 01 Jun 2015 19:32:03 +0100
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <mki8f6$cgl$>

On 01/06/2015 15:52, Joonas Liik wrote:
> Having some sort of decimal literal would have some advantages of its
> own, for one it could help against this silliness:
>  >>> Decimal(1.3)
> Decimal('1.3000000000000000444089209850062616169452667236328125')
>  >>> Decimal('1.3')
> Decimal('1.3')
> I'm not saying that the actual data type needs to be a decimal (
> might well be a float but say shove the string repr next to it so it can
> be accessed when needed)
> ..but this is one really common pitfall for new users, I know it's easy
> to fix the code above,
> but this behavior is very unintuitive.. you essentially get a really
> expensive float when you do the obvious thing.
> Not sure if this is worth the effort but it would help smooth some
> corners potentially..
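
The pitfall quoted above, and the usual ways around it, can be sketched as
follows. The variable names are illustrative; going through repr() is one
common workaround, not the only one.

```python
from decimal import Decimal

from_float = Decimal(1.3)          # exact value of the binary float nearest 1.3
from_string = Decimal('1.3')       # exactly 1.3, no float involved
via_repr = Decimal(repr(1.3))      # shortest string that round-trips the float

print(from_float)
print(from_string == via_repr)
```

The float-based constructor is not wrong: it converts the float exactly,
and that exactness is what exposes the binary rounding that happened when
the literal 1.3 was parsed.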

Far easier to point them to and/or

My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

From mal at  Mon Jun  1 20:32:56 2015
From: mal at (M.-A. Lemburg)
Date: Mon, 01 Jun 2015 20:32:56 +0200
Subject: [Python-ideas] npm-style venv-aware launcher
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>
Message-ID: <>

On 01.06.2015 19:58, Carl Meyer wrote:
> Hi,
> On 06/01/2015 03:05 AM, M.-A. Lemburg wrote:
>> If you want to have a system that doesn't require activation,
>> you may want to take a look at what we've done with PyRun:
> Virtualenv doesn't require activation either.
> Activation is a convenience for running repeated commands in the
> virtualenv context, but all it does is change your shell PATH; you can
> explicitly specify the virtualenv's python binary and never use
> activation, if you wish.

Ok, I was always under the impression that the activation
script also does other magic to have the virtualenv Python
find the right settings.

That's good to know, thanks.

>> It basically takes the "virtual" out of virtualenvs. Instead
>> of creating a local symlinked copy of your host Python installation,
>> you create a completely separate Python installation (which isn't
>> much heavier than a virtualenv due to the way this is done).
> Virtualenv doesn't create "a local symlinked copy of your host Python
> installation." It copies the binary, symlinks a few key stdlib modules
> that are necessary to bootstrap, and then its custom
> finds the host Python's stdlib directory and adds it to `sys.path`.

Well, this is what I call a symlinked copy :-) It still points
to the system installed Python for the stdlib, shared
mods and include files.

>> Once installed, everything works relative to the PyRun binary,
>> so you don't need to activate anything when running code inside
>> your installation: you just need to run the right PyRun binary
>> and this automatically gives you access to everything else
>> you installed in your environment.
> This is exactly how virtualenv (and pyvenv in Python 3.3+) works.
> Everything is relative to the Python binary in the virtualenv (this
> behavior is built into the Python executable, actually). You can just
> directly run the virtualenv's Python binary (or any script with that
> Python binary in its shebang, which includes all pip or easy-installed
> scripts in the virtualenv's bin/ dir), without ever activating anything.
> It seems the main difference between virtualenv and PyRun is in how much
> of the standard library is bundled with each environment, 

The main difference is that PyRun is a stand-alone Python
runtime which doesn't depend on the system Python installation
at all.

We created it to no longer have to worry about supporting
dozens of different Python installation variants on Unix
platforms and it turned out to be small enough to just always
use instead of virtualenv.

> and that I
> guess PyRun doesn't come with any convenience activation shell script?
> But the method by which "activation" actually occurs is identical (at
> least as far as you've described it here.)

After what you've explained, the sys.path setup is indeed
very similar (well, PyRun doesn't really need much of it
since almost the whole Python stdlib is baked into the binary).

What virtualenv doesn't appear to do is update sysconfig to
point to the virtualenv environment instead of the host

> Carl
>> In our latest release, we've added requirements.txt support
>> to the installation helper install-pyrun, so that you can
>> run
>> install-pyrun -r requirements.txt .
>> to bootstrap a complete project environment with one command.

Marc-Andre Lemburg


From surya.subbarao1 at  Mon Jun  1 21:13:44 2015
From: surya.subbarao1 at (u8y7541 The Awesome Person)
Date: Mon, 1 Jun 2015 12:13:44 -0700
Subject: [Python-ideas] Python-ideas Digest, Vol 103, Issue 2
In-Reply-To: <>
References: <>
Message-ID: <>

Floats internally use numerator and denominator (float.as_integer_ratio().)
It makes no sense to have sign and mantissa while displaying numerator and
denominator. Perhaps a redo of the class? I believe fractions should be the
standard, and just keep the ieee754 floats as a side option.

On Sun, May 31, 2015 at 7:37 PM, <python-ideas-request at> wrote:

> Today's Topics:
>    1. Python Float Update (u8y7541 The Awesome Person)
>    2. Re: Python Float Update (Nathaniel Smith)
> ----------------------------------------------------------------------
> Message: 1
> Date: Sun, 31 May 2015 19:25:46 -0700
> From: u8y7541 The Awesome Person <surya.subbarao1 at>
> To: python-ideas at
> Subject: [Python-ideas] Python Float Update
> Message-ID:
>         <
> CA+o1fZONRG_dZzct_VZwcUvtwD-5rJ6zOFxNR34jWFrpUXiw9Q at>
> Content-Type: text/plain; charset="utf-8"
> Dear Python Developers:
> I will be presenting a modification to the float class, which will improve
> its speed and accuracy (reduce floating point errors). This is applicable
> because Python uses a numerator and denominator rather than a sign and
> mantissa to represent floats.
> First, I propose that a float's integer ratio should be accurate. For
> example, (1 / 3).as_integer_ratio() should return (1, 3). Instead, it
> returns (6004799503160661, 18014398509481984).
> Second of all, even though 1 * 3 = 3 (last example), 6004799503160661 * 3
> does not equal 18014398509481984. Instead, it equals 18014398509481983,
> one less than the value in the ratio. This means the ratio is inaccurate,
> as well as completely not simplified.
> Even if the value displayed for a float is a rounded value, the internal
> numerator and denominator should divide to equal to completely accurate
> value.
> Thanks for considering this improvement!
> Sincerely,
> u8y7541
> ------------------------------
> Message: 2
> Date: Sun, 31 May 2015 19:37:14 -0700
> From: Nathaniel Smith <njs at>
> To: u8y7541 The Awesome Person <surya.subbarao1 at>
> Cc: python-ideas at
> Subject: Re: [Python-ideas] Python Float Update
> Message-ID:
>         <CAPJVwBnJqygpOktfD+qRy0J_UZE9MD0hkR+R2=
> wE1PWDSrquzg at>
> Content-Type: text/plain; charset="utf-8"
> On May 31, 2015 7:26 PM, "u8y7541 The Awesome Person" <
> surya.subbarao1 at> wrote:
> >
> > Dear Python Developers:
> >
> > I will be presenting a modification to the float class, which will
> improve its speed and accuracy (reduce floating point errors). This is
> applicable because Python uses a numerator and denominator rather than a
> sign and mantissa to represent floats.
> Python's floats are in fact ieee754 floats, using sign/mantissa/exponent,
> as provided by all popular CPU floating point hardware. This is why you're
> getting the results you see -- 1/3 cannot be exactly represented as a
> float, so it gets rounded to the closest representable float, and then
> as_integer_ratio shows you an exact representation of this rounded value.
> It sounds like you're instead looking for an exact fraction representation,
> which in python is available in the standard "fractions" module:
> -n

-Surya Subbarao

From surya.subbarao1 at  Mon Jun  1 21:22:40 2015
From: surya.subbarao1 at (u8y7541 The Awesome Person)
Date: Mon, 1 Jun 2015 12:22:40 -0700
Subject: [Python-ideas] Python-ideas Digest, Vol 103, Issue 3
In-Reply-To: <>
References: <>
Message-ID: <>

Maybe we could make a C implementation of the Fraction module? That would
be nice.

On Sun, May 31, 2015 at 8:28 PM, <python-ideas-request at> wrote:

> Today's Topics:
>    1. Re: Python Float Update (Chris Angelico)
>    2. Re: Python Float Update (random832 at
>    3. Re: Python Float Update (Jim Witschey)
>    4. Re: Python Float Update (David Mertz)
> ----------------------------------------------------------------------
> Message: 1
> Date: Mon, 1 Jun 2015 12:48:12 +1000
> From: Chris Angelico <rosuav at>
> Cc: python-ideas <python-ideas at>
> Subject: Re: [Python-ideas] Python Float Update
> Message-ID:
>         <CAPTjJmr=LurRUoKH3KVVYpMFc=
> W5h6etG5TscV5uU6zWhxVbgQ at>
> Content-Type: text/plain; charset=UTF-8
> On Mon, Jun 1, 2015 at 12:25 PM, u8y7541 The Awesome Person
> <surya.subbarao1 at> wrote:
> >
> > I will be presenting a modification to the float class, which will
> improve its speed and accuracy (reduce floating point errors). This is
> applicable because Python uses a numerator and denominator rather than a
> sign and mantissa to represent floats.
> >
> > First, I propose that a float's integer ratio should be accurate. For
> example, (1 / 3).as_integer_ratio() should return (1, 3). Instead, it
> returns (6004799503160661, 18014398509481984).
> >
> I think you're misunderstanding the as_integer_ratio method. That
> isn't how Python works internally; that's a service provided for
> parsing out float internals into something more readable. What you
> _actually_ are working with is IEEE 754 binary64. (Caveat: I have no
> idea what Python-the-language stipulates, nor what other Python
> implementations use, but that's what CPython uses, and you did your
> initial experiments with CPython. None of this discussion applies *at
> all* if a Python implementation doesn't use IEEE 754.) So internally,
> 1/3 is stored as:
> 0 <-- sign bit (positive)
> 01111111101 <-- exponent (1021)
> 0101010101010101010101010101010101010101010101010101 <-- mantissa (52
> bits, repeating)
> The exponent is offset by 1023, so this means 1.010101.... divided by
> 2²; the original repeating value is exactly equal to 4/3, so this is
> correct, but as soon as it's squeezed into a finite-sized mantissa, it
> gets rounded - in this case, rounded down.
> That's where your result comes from. It's been rounded such that it
> fits inside IEEE 754, and then converted back to a fraction
> afterwards. You're never going to get an exact result for anything
> with a denominator that isn't a power of two. Fortunately, Python does
> offer a solution: store your number as a pair of integers, rather than
> as a packed floating point value, and all calculations truly will be
> exact (at the cost of performance):
> >>> one_third = fractions.Fraction(1, 3)
> >>> one_eighth = fractions.Fraction(1, 8)
> >>> one_third + one_eighth
> Fraction(11, 24)
> This is possibly more what you want to work with.
> ChrisA
> ------------------------------
> Message: 2
> Date: Sun, 31 May 2015 23:14:06 -0400
> From: random832 at
> To: python-ideas at
> Subject: Re: [Python-ideas] Python Float Update
> Message-ID:
>         <1433128446.31560.283106753.6D60F98F at>
> Content-Type: text/plain
> On Sun, May 31, 2015, at 22:25, u8y7541 The Awesome Person wrote:
> > First, I propose that a float's integer ratio should be accurate. For
> > example, (1 / 3).as_integer_ratio() should return (1, 3). Instead, it
> > returns (6004799503160661, 18014398509481984).
> Even though he's mistaken about the core premise, I do think there's a
> kernel of a good idea here - it would be nice to have a method (maybe
> as_integer_ratio, maybe with some parameter added, maybe a different
> method) to return with the smallest denominator that would result in
> exactly the original float if divided out, rather than merely the
> smallest power of two.
> ------------------------------
> Message: 3
> Date: Mon, 01 Jun 2015 03:21:36 +0000
> From: Jim Witschey <jim.witschey at>
> To: Chris Angelico <rosuav at>
> Cc: python-ideas <python-ideas at>
> Subject: Re: [Python-ideas] Python Float Update
> Message-ID:
>         <CAF+a8-q6kbOwcWk3F47+9PXf2vKgM9ao1uh5=qBw10jqzTC=
> kg at>
> Content-Type: text/plain; charset="utf-8"
> Teachable moments about the implementation of floating-point aside,
> something in this neighborhood has been considered and rejected before, in
> PEP 240. However, that was in 2001 - it was apparently created the same day
> as PEP 237, which introduced transparent conversion of machine ints to
> bignums in the int type.
> I think hiding hardware number implementations has been a success for
> integers - it's a far superior API. It could be for rationals as well.
> Has something like this thread's original proposal - interpreting
> decimal-number literals as fractional values and using fractions as the
> result of integer arithmetic - been seriously discussed more recently than
> PEP 240? If so, why haven't they been implemented? Perhaps enough has
> changed that it's worth reconsidering.
> ------------------------------
> Message: 4
> Date: Sun, 31 May 2015 20:27:47 -0700
> From: David Mertz <mertz at>
> To: random832 at
> Cc: python-ideas <python-ideas at>
> Subject: Re: [Python-ideas] Python Float Update
> Message-ID:
>         <
> CAEbHw4biWOxjYR0vtu8ykwSt63y9dcsR2-FtLPGfyUvqx0GgsQ at>
> Content-Type: text/plain; charset="utf-8"
> On Sun, May 31, 2015 at 8:14 PM, <random832 at> wrote:
> > Even though he's mistaken about the core premise, I do think there's a
> > kernel of a good idea here - it would be nice to have a method (maybe
> > as_integer_ratio, maybe with some parameter added, maybe a different
> > method) to return with the smallest denominator that would result in
> > exactly the original float if divided out, rather than merely the
> > smallest power of two.
> >
> What is the computational complexity of a hypothetical
> float.as_simplest_integer_ratio() method?  How hard that is to find is not
> obvious to me (probably it should be, but I'm not sure).
> --
> Keeping medicines from the bloodstreams of the sick; food
> from the bellies of the hungry; books from the hands of the
> uneducated; technology from the underdeveloped; and putting
> advocates of freedom in prisons.  Intellectual property is
> to the 21st century what the slave trade was to the 16th.

-Surya Subbarao

From stefan_ml at  Mon Jun  1 22:18:49 2015
From: stefan_ml at (Stefan Behnel)
Date: Mon, 01 Jun 2015 22:18:49 +0200
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <mkiena$lbh$>

u8y7541 The Awesome Person schrieb am 01.06.2015 um 21:22:
> Maybe we could make a C implementation of the Fraction module? That would
> be nice.

See the quicktions module:


From random832 at  Mon Jun  1 23:09:40 2015
From: random832 at (random832 at
Date: Mon, 01 Jun 2015 17:09:40 -0400
Subject: [Python-ideas] Python-ideas Digest, Vol 103, Issue 2
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jun 1, 2015, at 15:13, u8y7541 The Awesome Person wrote:
> Floats internally use numerator and denominator
> (float.as_integer_ratio().)

The fact that this method exists is not actually evidence that this form
is used internally. This is a utility method, provided, I suspect, for
the use of the fractions.Fraction constructor (which class does use a
numerator and denominator).
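
To make the distinction concrete, here is a small sketch that pulls the
stored fields out of a float and compares them with what as_integer_ratio()
reports. It assumes an implementation whose floats are IEEE 754 binary64
(as in CPython on common hardware); `float_fields` is a made-up helper name.

```python
import struct
from fractions import Fraction


def float_fields(x):
    """Split a binary64 float into (sign, biased exponent, 52-bit mantissa)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    mantissa = bits & ((1 << 52) - 1)
    return sign, exponent, mantissa


sign, exponent, mantissa = float_fields(1 / 3)
ratio = (1 / 3).as_integer_ratio()     # derived from the fields, not stored
exact_third = Fraction(1, 3)           # this class really stores two integers

print(sign, exponent, bin(mantissa))
print(ratio, Fraction(*ratio) == exact_third)
```

The ratio is simply (2**52 + mantissa) over a power of two, which is why
its denominator can never be 3.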

From abarnert at  Mon Jun  1 23:56:46 2015
From: abarnert at (Andrew Barnert)
Date: Mon, 1 Jun 2015 14:56:46 -0700
Subject: [Python-ideas] Why decode()/encode() name is harmful
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 1, 2015, at 08:46, anatoly techtonik <techtonik at> wrote:
>> On Sat, May 30, 2015 at 3:18 AM, Steven D'Aprano <steve at> wrote:
>> As far as I can see, he has been given the solution, or at least a
>> potential solution, on python-list, but as far as I can tell he either
>> hasn't read it, or doesn't like the solutions offered and so is
>> ignoring them.
> Let me update you on this. There was no solution given. Only the
> pointers to go read some pointers on the internets again. So, yes,
> I read replies. But I have very little time to analyse and follow up.

Hold on. You had a question, you don't have time to read the answers you were given, so instead you think Python needs to change?

> The idea I wanted to convey in this thread is that encode/decode
> is confusing, so if you agree with that, I can start to propose
> alternatives.
> And just to make you understand the importance of the question
> with translating from bytes to unicode and back, let me just tell
> that this question is the third one voted with 221k views on SO in
> Python 3 tag.

First, as multiple people including the OP say in the comments to that question, what's confusing to novices is that subprocess pipes are the first thing they've used that are binary by default instead of text by default. (For other novices that will instead happen with sockets. But it will eventually happen somewhere.) So, maybe the subprocess docs need a prominent link to, say, the Unicode HOWTO, which is what the OP of that question seems to be proposing. Or maybe it should just be easier to open subprocess pipes in text mode, as it is for files.
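
For what it's worth, text mode is already one keyword argument away. A minimal sketch, assuming Python 3 and using the interpreter itself as a stand-in child process (the command is only an illustration, not anything from that SO question):

```python
import subprocess
import sys

# Hypothetical child process: the current interpreter printing one line.
cmd = [sys.executable, "-c", "print('hello')"]

raw = subprocess.check_output(cmd)                            # bytes (binary pipe)
text = subprocess.check_output(cmd, universal_newlines=True)  # str (text pipe)

# Decoding by hand also works, but only if you know the child's encoding.
# Plain ASCII output like this decodes the same way under UTF-8.
decoded = raw.decode("utf-8")
```

The novice's surprise is the first call returning bytes; the second call is the "text by default" behaviour they were expecting from files.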

But I don't see how renaming the methods could possibly help anything. The problem is not that the OP saw the answer and didn't understand or believe it, it's that he didn't know how to search for it. When told the right answer, he immediately said "Thanks, that does it" not "Whatchootalkinbout Willis, I don't have any crypto here". I've never heard of anyone besides you having that reaction.

Also, your own answer there is a really bad idea. It was an intentional part of the design of UTF-8 that decoding non-UTF-8 non-ASCII text as if it were UTF-8 will almost always signal an error. It's not a good thing to silently get mojibake instead of getting an error--it just pushes the problem back further, to somewhere it's harder to understand, find, and debug. In the worst case, it just pushes the problem all the way to the end user, who's even less equipped to deal with it than you when his Russian characters get turned into box graphics. If you have bytes and you want text, the only solution to that is to find out the encoding and decode it. That's not a problem with Python, it's a problem with the proliferation of incompatible encodings that people have used without any in-band or out-of-band indications over the past few decades.

Of course there are cases where you want to smuggle bytes with text, or degrade as gracefully as possible on errors, or whatever. That's why decode takes an error handler. But in the usual case, if you try to interpret something as UTF-8 when it's really cp1252, or interpret something as Big5 when it's really Shift-JIS, or whatever, an error is exactly what you should hope for, to tell you that you guessed wrong. That's why it's the default.
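
A small sketch of what I mean, with made-up sample data (one accented word encoded as cp1252, which is not valid UTF-8):

```python
data = "caf\u00e9".encode("cp1252")    # b'caf\xe9'

# Default handler is errors="strict": a wrong guess fails loudly.
try:
    data.decode("utf-8")
    guessed_wrong = False
except UnicodeDecodeError:
    guessed_wrong = True

# Degrade gracefully: the bad byte becomes U+FFFD and is lost for good.
lossy = data.decode("utf-8", errors="replace")

# Smuggle the bytes through text: the bad byte survives a round trip.
smuggled = data.decode("utf-8", errors="surrogateescape")
round_tripped = smuggled.encode("utf-8", errors="surrogateescape")

# The actual fix: find out the real encoding and use it.
correct = data.decode("cp1252")
```

Only the last line gives you the text that was actually written; the handlers just choose how to cope when you can't.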

> -- 
> anatoly t.

From abarnert at  Tue Jun  2 00:15:16 2015
From: abarnert at (Andrew Barnert)
Date: Mon, 1 Jun 2015 15:15:16 -0700
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 1, 2015, at 10:24, Nicholas Chammas <nicholas.chammas at> wrote:
> Well, I learned a lot about decimals today. :)
> On Mon, Jun 1, 2015 at 3:08 AM, Nick Coghlan ncoghlan at wrote:
> In a world of binary computers, no programming language is free of
> those constraints - if you choose decimal literals as your default,
> you take a big performance hit, because computers are designed as
> binary systems. (Some languages, like IBM's REXX, do choose to use
> decimal integers by default)
> I guess it's a non-trivial tradeoff. But I would lean towards considering people likely to be affected by the performance hit as doing something "not common". Like, if they are doing that many calculations that it matters, perhaps it makes sense to ask them to explicitly ask for floats vs. decimals, in exchange for giving the majority who wouldn't notice a performance difference a better user experience.
> On Mon, Jun 1, 2015 at 10:58 AM, Steven D'Aprano steve at wrote:
> I wish this myth about Decimals would die, because it isn't true.
> Your email had a lot of interesting information about decimals that would make a good blog post, actually. Writing one up will perhaps help kill this myth in the long run :)
> In the past, I've found that people are very resistant to this fact, so
> I'm going to show a few examples of how Decimals violate the fundamental
> laws of mathematics just as floats do.
> How many of your examples are inherent limitations of decimals vs. problems that can be improved upon?
> Admittedly, the only place where I've played with decimals extensively is on Microsoft's SQL Server (where they are the default literal). I've stumbled in the past on my own decimal gotchas, but looking at your examples and trying them on SQL Server I suspect that most of the problems you show are problems of precision and scale.
> Perhaps Python needs better rules for how precision and scale are affected by calculations (here are SQL Server's, for example), or better defaults when they are not specified?
> Anyway, here's what happens on SQL Server for some of the examples you provided.
> Adding 100:
> py> from decimal import Decimal as D
> py> x = D(10)**30
> py> x == x + 100 # should be False
> True
> DECLARE @x DECIMAL(38,0) = '1' + REPLICATE(0, 30);
> IF @x = @x + 100
>   SELECT 'equal' AS adding_100
> ELSE
>   SELECT 'not equal' AS adding_100
> Gives "not equal". Leaving out the precision when declaring @x (i.e. going with the default precision of 18) immediately yields an understandable data truncation error.
Obviously if you know the maximum precision needed before you start and explicitly set it to something big enough (or 7 places bigger than needed) you won't have any problem. Steven chose a low precision just to make the problems easy to see and understand; he could just as easily have constructed examples for a precision of 18.

Unfortunately, even in cases where it is both possible and sufficiently efficient to work out and set the precision high enough to make all of your calculations exact, that's not something most people know how to do reliably. In the fully general case, it's as hard as calculating error propagation.

As for the error: Python's decimal flags that too; the difference is that the flag is ignored by default. You can change it to warn or error instead. Maybe the solution is to make that easier--possibly just changing the docs. If you read the whole thing you will eventually learn that the default context ignores most such errors, but a one-liner gets you a different context that acts like SQL Server, but who reads the whole module docs (especially when they already believe they understand how decimal arithmetic works)? Maybe moving that up near the top would be useful?
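
Roughly what I have in mind, as a sketch rather than a recommendation. The helper name is made up, and trapping Inexact is just one way to get an error here (I build x with the constructor, which is always exact, instead of Steven's D(10)**30):

```python
from decimal import Decimal, Inexact, localcontext

def add_100(trap_inexact):
    """Return (x, x + 100, inexact_flag) for x = 10**30 at 28 digits."""
    with localcontext() as ctx:
        ctx.prec = 28                      # the module's documented default
        ctx.traps[Inexact] = trap_inexact
        ctx.clear_flags()
        x = Decimal(10 ** 30)              # constructor is exact, ignores context
        y = x + 100                        # the 100 does not fit in 28 digits
        return x, y, bool(ctx.flags[Inexact])

# Default behaviour: the flag is set, nothing is raised, and x == x + 100.
x, y, flagged = add_100(trap_inexact=False)

# With the trap enabled, the same addition raises instead.
try:
    add_100(trap_inexact=True)
    raised = False
except Inexact:
    raised = True
```

So the information is there either way; the default context just doesn't act on it.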


From tjreedy at  Tue Jun  2 00:20:19 2015
From: tjreedy at (Terry Reedy)
Date: Mon, 01 Jun 2015 18:20:19 -0400
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <mkils0$elr$>

On 6/1/2015 2:02 AM, Jim Witschey wrote:
> On Mon, Jun 1, 2015 at 12:37 AM, Terry Reedy <tjreedy at> wrote:
>> The competing proposal is to treat decimal literals as decimal.Decimal
>> values.
> Is that an existing PEP? I couldn't find any such proposal.

No, it is an idea presented here and other python lists.  Example: just 
today, Laura Creighton wrote on python-list (Re: What is considered an 
"advanced" topic in Python?)

 > But I am a bad arguer.
 > When incompatibilities were going into Python 3.0 I wanted
 > y = 1.3  to give you a decimal, not a float.
 > If you wanted a float you would have to write y = 1.3f or something.
 > I lost that one too.  I still think it would be great.
 > But, hell, I write accounting and bookkeeping systems.  Your mileage
 > may vary. :)

There is no PEP AFAIK because no one has bothered to write one sure to 
be rejected.

Terry Jan Reedy

From tjreedy at  Tue Jun  2 00:26:46 2015
From: tjreedy at (Terry Reedy)
Date: Mon, 01 Jun 2015 18:26:46 -0400
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <mkim83$km8$>

On 6/1/2015 10:52 AM, Joonas Liik wrote:
> Having some sort of decimal literal would have some advantages of its
> own, for one it could help against this silliness:
>  >>> Decimal(1.3)
> Decimal('1.3000000000000000444089209850062616169452667236328125')
>  >>> Decimal('1.3')
> Decimal('1.3')
> I'm not saying that the actual data type needs to be a decimal (
> might well be a float but say shove the string repr next to it so it can
> be accessed when needed)
> ..but this is one really common pitfall for new users, i know its easy
> to fix the code above,
> but this behavior is very unintuitive.. you essentially get a really
> expensive float when you do the obvious thing.
> Not sure if this is worth the effort but it would help smooth some
> corners potentially..

Since Decimal is designed specifically for money calculations, $ could 
be used as a generic money suffix.
1.3$ == Decimal('1.3')
.0101$ (money multiplier, as with interest)

Terry Jan Reedy

From abarnert at  Tue Jun  2 00:30:08 2015
From: abarnert at (Andrew Barnert)
Date: Mon, 1 Jun 2015 15:30:08 -0700
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 1, 2015, at 08:12, Joonas Liik <liik.joonas at> wrote:
> I'm sorry..
> what i meant was not a literal that results in a Decimal, what i meant was a special literal proxy object that usually acts like a float except you can ask for its original string form.

This is essentially what I was saying with new "literal constant" types. Swift is probably the most prominent language with this feature. is a good description of how it works. Many of the reasons Swift needed this don't apply in Python. For example, in Swift, it's how you can build a Set at compile time from an ArrayLiteral instead of building an Array and converting it to Set at compile time. Or how you can use 0 as a default value for a non-integer type without getting a TypeError or a runtime conversion. Or how you can build an Optional that acts like a real ADT but assign it nil instead of a special enumeration value. Or how you can decode UTF-8 source text to store in UTF-16 or UTF-32 or grapheme-cluster at compile time. And so on.

> eg:
> flit = 1.3
> flit*3 == float(flit)*3
> str(flit) == '1.3'
> thus in cases where the intermediate float conversion loses precision you can get at the original string that the programmer actually typed in.
> Decimal constructors are one case that would probably like to use the original string whenever possible to avoid conversion losses,
> but by no means are they the only ones.

From abarnert at  Tue Jun  2 00:40:55 2015
From: abarnert at (Andrew Barnert)
Date: Mon, 1 Jun 2015 15:40:55 -0700
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

Sorry, I accidentally sent that before it was done...

Sent from my iPhone

> On Jun 1, 2015, at 15:30, Andrew Barnert via Python-ideas <python-ideas at> wrote:
>> On Jun 1, 2015, at 08:12, Joonas Liik <liik.joonas at> wrote:
>> I'm sorry..
>> what i meant was not a literal that results in a Decimal, what i meant was a special literal proxy object that usually acts like a float except you can ask for its original string form.
> This is essentially what I was saying with new "literal constant" types. Swift is probably the most prominent language with this feature. is a good description of how it works. Many of the reasons Swift needed this don't apply in Python. For example, in Swift, it's how you can build a Set at compile time from an ArrayLiteral instead of building an Array and converting it to Set at compile time. Or how you can use 0 as a default value for a non-integer type without getting a TypeError or a runtime conversion. Or how you can build an Optional that acts like a real ADT but assign it nil instead of a special enumeration value. Or how you can decode UTF-8 source text to store in UTF-16 or UTF-32 or grapheme-cluster at compile time. And so on.

Anyway, my point was that the Swift feature is complicated, and has some controversial downsides (e.g., see the example at the end of silently using a string literal as if it were a URL by accessing an attribute of the NSURL class--which works given the Smalltalk-derived style of OO, but many people still find it confusing). But the basic idea can be extracted out and Pythonified:

The literal 1.23 no longer gives you a float, but a FloatLiteral, which is either a subclass of float, or an unrelated class that has a __float__ method. Doing any calculation on it gives you a float. But as long as you leave it alone as a FloatLiteral, it has its literal characters available for any function that wants to distinguish FloatLiteral from float, like the Decimal constructor.
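
A toy version of the idea, just to show the shape of it. Nothing here exists in Python: FloatLiteral and to_decimal are hypothetical names, and a real implementation would have the compiler create the object from the source text instead of taking a string:

```python
from decimal import Decimal

class FloatLiteral(float):
    """Hypothetical float subclass that remembers its source text."""

    def __new__(cls, text):
        self = super().__new__(cls, text)
        self.text = text
        return self

    def __str__(self):
        return self.text

def to_decimal(value):
    """Stand-in for a Decimal constructor that knows about FloatLiteral."""
    if isinstance(value, FloatLiteral):
        return Decimal(value.text)     # use the characters as typed
    return Decimal(value)              # ordinary float: exact binary value

lit = FloatLiteral("1.3")
product = lit * 3                      # any arithmetic decays to a plain float
from_literal = to_decimal(lit)
from_float = to_decimal(1.3)
```

Here `product` is an ordinary float, while `from_literal` is Decimal('1.3') and `from_float` is the long binary expansion people keep tripping over.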

The problem that Python faces that Swift doesn't is that Python doesn't use static typing and implicit compile-time conversions. So in Python, you'd be passing around these larger values and doing the slow conversions at runtime. That may or may not be unacceptable; without actually building it and testing some realistic programs it's pretty hard to guess.

The advantage of C++-style user-defined literal suffixes is that the absence of a suffix is something the compiler can see, so 1.23d might still require a runtime call, but 1.23 just is compiled as a float constant the same as it's been since Python 1.x.

From surya.subbarao1 at  Tue Jun  2 00:46:46 2015
From: surya.subbarao1 at (u8y7541 The Awesome Person)
Date: Mon, 1 Jun 2015 15:46:46 -0700
Subject: [Python-ideas] Python-ideas Digest, Vol 103, Issue 10
In-Reply-To: <>
References: <>
Message-ID: <>

Thanks Stefan for quicktions!

On Mon, Jun 1, 2015 at 1:18 PM, <python-ideas-request at> wrote:

> Send Python-ideas mailing list submissions to
>         python-ideas at
> To subscribe or unsubscribe via the World Wide Web, visit
> or, via email, send a message with subject or body 'help' to
>         python-ideas-request at
> You can reach the person managing the list at
>         python-ideas-owner at
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Python-ideas digest..."
> Today's Topics:
>    1. Re: Python-ideas Digest, Vol 103, Issue 3
>       (u8y7541 The Awesome Person)
>    2. Re: Python Float Update (Stefan Behnel)
> ----------------------------------------------------------------------
> Message: 1
> Date: Mon, 1 Jun 2015 12:22:40 -0700
> From: u8y7541 The Awesome Person <surya.subbarao1 at>
> To: python-ideas at
> Subject: Re: [Python-ideas] Python-ideas Digest, Vol 103, Issue 3
> Message-ID:
>         <CA+o1fZMHxNybOjMWr7Uqk8=
> AKBncpAKXy+QO+vhF8ENtAXL6qg at>
> Content-Type: text/plain; charset="utf-8"
> Maybe we could make a C implementation of the Fraction module? That would
> be nice.
> On Sun, May 31, 2015 at 8:28 PM, <python-ideas-request at> wrote:
> > Today's Topics:
> >
> >    1. Re: Python Float Update (Chris Angelico)
> >    2. Re: Python Float Update (random832 at
> >    3. Re: Python Float Update (Jim Witschey)
> >    4. Re: Python Float Update (David Mertz)
> >
> >
> > ----------------------------------------------------------------------
> >
> > Message: 1
> > Date: Mon, 1 Jun 2015 12:48:12 +1000
> > From: Chris Angelico <rosuav at>
> > Cc: python-ideas <python-ideas at>
> > Subject: Re: [Python-ideas] Python Float Update
> > Message-ID:
> >         <CAPTjJmr=LurRUoKH3KVVYpMFc=
> > W5h6etG5TscV5uU6zWhxVbgQ at>
> > Content-Type: text/plain; charset=UTF-8
> >
> > On Mon, Jun 1, 2015 at 12:25 PM, u8y7541 The Awesome Person
> > <surya.subbarao1 at> wrote:
> > >
> > > I will be presenting a modification to the float class, which will
> > improve its speed and accuracy (reduce floating point errors). This is
> > applicable because Python uses a numerator and denominator rather than a
> > sign and mantissa to represent floats.
> > >
> > > First, I propose that a float's integer ratio should be accurate. For
> > example, (1 / 3).as_integer_ratio() should return (1, 3). Instead, it
> > returns(6004799503160661, 18014398509481984).
> > >
> >
> > I think you're misunderstanding the as_integer_ratio method. That
> > isn't how Python works internally; that's a service provided for
> > parsing out float internals into something more readable. What you
> > _actually_ are working with is IEEE 754 binary64. (Caveat: I have no
> > idea what Python-the-language stipulates, nor what other Python
> > implementations use, but that's what CPython uses, and you did your
> > initial experiments with CPython. None of this discussion applies *at
> > all* if a Python implementation doesn't use IEEE 754.) So internally,
> > 1/3 is stored as:
> >
> > 0 <-- sign bit (positive)
> > 01111111101 <-- exponent (1021)
> > 0101010101010101010101010101010101010101010101010101 <-- mantissa (52
> > bits, repeating)
> >
> > The exponent is offset by 1023, so this means 1.010101.... divided by
> > 2²; the original repeating value is exactly equal to 4/3, so this is
> > correct, but as soon as it's squeezed into a finite-sized mantissa, it
> > gets rounded - in this case, rounded down.
> >
> > That's where your result comes from. It's been rounded such that it
> > fits inside IEEE 754, and then converted back to a fraction
> > afterwards. You're never going to get an exact result for anything
> > with a denominator that isn't a power of two. Fortunately, Python does
> > offer a solution: store your number as a pair of integers, rather than
> > as a packed floating point value, and all calculations truly will be
> > exact (at the cost of performance):
> >
> > >>> one_third = fractions.Fraction(1, 3)
> > >>> one_eighth = fractions.Fraction(1, 8)
> > >>> one_third + one_eighth
> > Fraction(11, 24)
> >
> > This is possibly more what you want to work with.
> >
> > ChrisA
> >
> >
> > ------------------------------
> >
> > Message: 2
> > Date: Sun, 31 May 2015 23:14:06 -0400
> > From: random832 at
> > To: python-ideas at
> > Subject: Re: [Python-ideas] Python Float Update
> > Message-ID:
> >         <1433128446.31560.283106753.6D60F98F at
> >
> > Content-Type: text/plain
> >
> > On Sun, May 31, 2015, at 22:25, u8y7541 The Awesome Person wrote:
> > > First, I propose that a float's integer ratio should be accurate. For
> > > example, (1 / 3).as_integer_ratio() should return (1, 3). Instead, it
> > > returns(6004799503160661, 18014398509481984).
> >
> > Even though he's mistaken about the core premise, I do think there's a
> > kernel of a good idea here - it would be nice to have a method (maybe
> > as_integer_ratio, maybe with some parameter added, maybe a different
> > method) to return with the smallest denominator that would result in
> > exactly the original float if divided out, rather than merely the
> > smallest power of two.
> >
> >
> > ------------------------------
> >
> > Message: 3
> > Date: Mon, 01 Jun 2015 03:21:36 +0000
> > From: Jim Witschey <jim.witschey at>
> > To: Chris Angelico <rosuav at>
> > Cc: python-ideas <python-ideas at>
> > Subject: Re: [Python-ideas] Python Float Update
> > Message-ID:
> >         <CAF+a8-q6kbOwcWk3F47+9PXf2vKgM9ao1uh5=qBw10jqzTC=
> > kg at>
> > Content-Type: text/plain; charset="utf-8"
> >
> > Teachable moments about the implementation of floating-point aside,
> > something in this neighborhood has been considered and rejected before,
> in
> > PEP 240. However, that was in 2001 - it was apparently created the same
> day
> > as PEP 237, which introduced transparent conversion of machine ints to
> > bignums in the int type.
> >
> > I think hiding hardware number implementations has been a success for
> > integers - it's a far superior API. It could be for rationals as well.
> >
> > Has something like this thread's original proposal - interpeting
> > decimal-number literals as fractional values and using fractions as the
> > result of integer arithmetic - been seriously discussed more recently
> than
> > PEP 240? If so, why haven't they been implemented? Perhaps enough has
> > changed that it's worth reconsidering.
> >
> >
> > ------------------------------
> >
> > Message: 4
> > Date: Sun, 31 May 2015 20:27:47 -0700
> > From: David Mertz <mertz at>
> > To: random832 at
> > Cc: python-ideas <python-ideas at>
> > Subject: Re: [Python-ideas] Python Float Update
> > Message-ID:
> >         <
> > CAEbHw4biWOxjYR0vtu8ykwSt63y9dcsR2-FtLPGfyUvqx0GgsQ at>
> > Content-Type: text/plain; charset="utf-8"
> >
> > On Sun, May 31, 2015 at 8:14 PM, <random832 at> wrote:
> >
> > > Even though he's mistaken about the core premise, I do think there's a
> > > kernel of a good idea here - it would be nice to have a method (maybe
> > > as_integer_ratio, maybe with some parameter added, maybe a different
> > > method) to return with the smallest denominator that would result in
> > > exactly the original float if divided out, rather than merely the
> > > smallest power of two.
> > >
> >
> > What is the computational complexity of a hypothetical
> > float.as_simplest_integer_ratio() method?  How hard that is to find is
> not
> > obvious to me (probably it should be, but I'm not sure).
> >
> > --
> > Keeping medicines from the bloodstreams of the sick; food
> > from the bellies of the hungry; books from the hands of the
> > uneducated; technology from the underdeveloped; and putting
> > advocates of freedom in prisons.  Intellectual property is
> > to the 21st century what the slave trade was to the 16th.
> --
> -Surya Subbarao
> ------------------------------
> Message: 2
> Date: Mon, 01 Jun 2015 22:18:49 +0200
> From: Stefan Behnel <stefan_ml at>
> To: python-ideas at
> Subject: Re: [Python-ideas] Python Float Update
> Message-ID: <mkiena$lbh$1 at>
> Content-Type: text/plain; charset=utf-8
> u8y7541 The Awesome Person schrieb am 01.06.2015 um 21:22:
> > Maybe we could make a C implementation of the Fraction module? That would
> > be nice.
> See the quicktions module:
> Stefan
> ------------------------------
> Subject: Digest Footer
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at
> ------------------------------
> End of Python-ideas Digest, Vol 103, Issue 10
> *********************************************

-Surya Subbarao

From ncoghlan at  Tue Jun  2 00:58:14 2015
From: ncoghlan at (Nick Coghlan)
Date: Tue, 2 Jun 2015 08:58:14 +1000
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On 2 Jun 2015 01:04, "David Mertz" <mertz at> wrote:
> Decimal literals are far from as obvious as suggested.  We *have* the
`decimal` module after all, and it defines all sorts of parameters on
precision, rounding rules, etc. that one can provide context for.
 decimal.ROUND_HALF_DOWN is "the obvious way" for some users,
while decimal.ROUND_CEILING is "the obvious way" for others.
> I like decimals, but they don't simply make all the mathematical answers
result in what all users would would consider "do what I mean" either.

The last time we had a serious discussion about decimal literals, we
realised the fact their behaviour is context dependent posed a significant
problem for providing a literal form. With largely hardware provided
IEEE754 semantics, binary floats are predictable, albeit somewhat
surprising if you're expecting abstract math behaviour (i.e. no rounding
errors), or finite base 10 representation behaviour.

By contrast, decimal arithmetic deliberately allows for configurable
contexts, presumably because financial regulations sometimes place strict
constraints on how arithmetic is to be handled (e.g. "round half even" is
also known as "banker's rounding", since it eliminates statistical bias in
rounding financial transactions to the smallest supported unit of currency).

That configurability makes decimal more fit for its primary intended use
case (i.e. financial math), but also makes local reasoning harder - the
results of some operations (even something as simple as unary plus) may
vary based on the configured context (the precision, in particular).
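
A small sketch of that context dependence, using arbitrary sample values and a made-up helper name (round_to_unit):

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP, localcontext

value = Decimal("1.23456789")

with localcontext() as ctx:
    ctx.prec = 4
    plus_at_4 = +value          # unary plus rounds to the context precision

with localcontext() as ctx:
    ctx.prec = 28
    plus_at_28 = +value         # same expression, different result

def round_to_unit(amount, rounding):
    """Round to a whole unit using the given context rounding mode."""
    with localcontext() as ctx:
        ctx.rounding = rounding
        return amount.quantize(Decimal("1"))

half_even = round_to_unit(Decimal("2.5"), ROUND_HALF_EVEN)   # banker's rounding
half_up = round_to_unit(Decimal("2.5"), ROUND_HALF_UP)
```

The same source expression gives Decimal('1.235') in one context and the unchanged value in the other, and 2.5 rounds to 2 or to 3 depending on the mode. A literal form would have to pick one of those behaviours or inherit whatever context happens to be active.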


From nicholas.chammas at  Tue Jun  2 00:53:33 2015
From: nicholas.chammas at (Nicholas Chammas)
Date: Mon, 01 Jun 2015 22:53:33 +0000
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jun 1, 2015 at 6:15 PM Andrew Barnert abarnert at
<http://mailto:abarnert at> wrote:

> Obviously if you know the maximum precision needed before you start and
> explicitly set it to something big enough (or 7 places bigger than needed)
> you won't have any problem. Steven chose a low precision just to make the
> problems easy to see and understand; he could just as easily have
> constructed examples for a precision of 18.
> Unfortunately, even in cases where it is both possible and sufficiently
> efficient to work out and set the precision high enough to make all of your
> calculations exact, that's not something most people know how to do
> reliably. In the fully general case, it's as hard as calculating error
> propagation.
> As for the error: Python's decimal flags that too; the difference is that
> the flag is ignored by default. You can change it to warn or error instead.
> Maybe the solution is to make that easier--possibly just changing the docs.
> If you read the whole thing you will eventually learn that the default
> context ignores most such errors, but a one-liner gets you a different
> context that acts like SQL Server, but who reads the whole module docs
> (especially when they already believe they understand how decimal
> arithmetic works)? Maybe moving that up near the top would be useful?
This angle of discussion is what I was getting at when I wrote:

Perhaps Python needs better rules for how precision and scale are affected
by calculations (here are SQL Server's
<>, for example), or
better defaults when they are not specified?

It sounds like there are perhaps several improvements that can be made to
how decimals are handled, documented, and configured by default, that could
possibly address the majority of gotchas for the majority of people in a
more user friendly way than can be accomplished with floats.

For all the problems presented with decimals by Steven and others, I'm not
seeing how overall they are supposed to be *worse* than the problems with
floats.

We can explain precision and scale to people when they are using decimals
and give them a basic framework for understanding how they affect
calculations, and we can pick sensible defaults so that people won't hit
nasty gotchas easily. So we have some leverage there for making the
experience better for most people most of the time.

What's our leverage for improving the experience of working with floats?
And is the result really something better than decimals?


From abarnert at  Tue Jun  2 01:26:08 2015
From: abarnert at (Andrew Barnert)
Date: Mon, 1 Jun 2015 16:26:08 -0700
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 1, 2015, at 15:53, Nicholas Chammas <nicholas.chammas at> wrote:
>> On Mon, Jun 1, 2015 at 6:15 PM Andrew Barnert abarnert at wrote:
>> Obviously if you know the maximum precision needed before you start and explicitly set it to something big enough (or 7 places bigger than needed) you won't have any problem. Steven chose a low precision just to make the problems easy to see and understand; he could just as easily have constructed examples for a precision of 18.
>> Unfortunately, even in cases where it is both possible and sufficiently efficient to work out and set the precision high enough to make all of your calculations exact, that's not something most people know how to do reliably. In the fully general case, it's as hard as calculating error propagation.
>> As for the error: Python's decimal flags that too; the difference is that the flag is ignored by default. You can change it to warn or error instead. Maybe the solution is to make that easier--possibly just changing the docs. If you read the whole thing you will eventually learn that the default context ignores most such errors, but a one-liner gets you a different context that acts like SQL Server, but who reads the whole module docs (especially when they already believe they understand how decimal arithmetic works)? Maybe moving that up near the top would be useful?
> This angle of discussion is what I was getting at when I wrote:
> Perhaps Python needs better rules for how precision and scale are affected by calculations (here are SQL Server's, for example), or better defaults when they are not specified?
I definitely agree that some edits to the decimal module docs, plus maybe a new HOWTO, and maybe some links to outside resources that explain things to people who are used to decimals in MSSQLServer or REXX or whatever, would be helpful. The question is, who has the sufficient knowledge, skill, and time/inclination to do it?
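
As a concrete illustration of the flag behaviour discussed in the quoted text, here is a minimal sketch using only the standard decimal module. The precision of 5 is an arbitrary choice to keep the numbers short, and the variable names are made up for the example. It shows an inexact division first being silently flagged, then raising once the Inexact trap is enabled:

```python
from decimal import Decimal, Inexact, localcontext

# With the trap disabled (the default), an inexact result only sets a flag.
with localcontext() as ctx:
    ctx.prec = 5                     # small precision, purely for illustration
    ctx.traps[Inexact] = False
    ctx.clear_flags()
    quotient = Decimal(1) / Decimal(3)
    flag_was_set = bool(ctx.flags[Inexact])

# Enabling the trap turns the same condition into an exception.
with localcontext() as ctx:
    ctx.prec = 5
    ctx.traps[Inexact] = True
    try:
        Decimal(1) / Decimal(3)
        trap_fired = False
    except Inexact:
        trap_fired = True

print(quotient, flag_was_set, trap_fired)
```

Whether trapping Inexact is a sensible default for a given program is a separate question; many ordinary calculations are inexact on purpose.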

> It sounds like there are perhaps several improvements that can be made to how decimals are handled, documented, and configured by default, that could possibly address the majority of gotchas for the majority of people in a more user friendly way than can be accomplished with floats.
> For all the problems presented with decimals by Steven and others, I'm not seeing how overall they are supposed to be worse than the problems with floats.
They're not worse than the problems with floats, they're the same problems... But the _effect_ of those problems can be worse, because:

 * The magnitude of the rounding errors is larger.

 * People mistakenly think they understand everything relevant about decimals, and the naive tests they try work out, so the problems may blindside them.

 * Being much more detailed and configurable means the best solution may be harder to find.

 * There's a lot of correct but potentially-misleading information out there. For example, any StackOverflow answer that says "you can solve this particular problem by using Decimal instead of float" can be very easily misinterpreted as applying to a much wider range of problems than it actually does.

 * Sometimes performance matters.

On the other hand, the effect can also be less bad, because:

 * Once people do finally understand a given problem, at least for many people and many problems, working out a solution is easier in decimal. For some uses (in particular, many financial uses, and some kinds of engineering problems), it's even trivial.

 * Being more detailed and more configurable means the best solution may be better than any solution involving float.

I don't think there's any obvious answer to the tradeoff, short of making it easier for people to choose appropriately: a good HOWTO, decimal literals or Swift-style float-convertibles, making it easier to find/construct decimal64 or DECIMAL(18) or Money types, speeding up decimal (already done, but maybe more could be done), etc.


From ncoghlan at  Tue Jun  2 02:08:37 2015
From: ncoghlan at (Nick Coghlan)
Date: Tue, 2 Jun 2015 10:08:37 +1000
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On 2 Jun 2015 08:44, "Andrew Barnert via Python-ideas"
<python-ideas at> wrote:
>But the basic idea can be extracted out and Pythonified:
> The literal 1.23 no longer gives you a float, but a FloatLiteral, which is either a subclass of float, or an unrelated class that has a __float__ method. Doing any calculation on it gives you a float. But as long as you leave it alone as a FloatLiteral, it has its literal characters available for any function that wants to distinguish FloatLiteral from float, like the Decimal constructor.
> The problem that Python faces that Swift doesn't is that Python doesn't use static typing and implicit compile-time conversions. So in Python, you'd be passing around these larger values and doing the slow conversions at runtime. That may or may not be unacceptable; without actually building it and testing some realistic programs it's pretty hard to guess.

Joonas's suggestion of storing the original text representation passed
to the float constructor is at least a novel one - it's only the idea
of actual decimal literals that was ruled out in the past.

Aside from the practical implementation question, the main concern I
have with it is that we'd be trading the status quo for a situation
where "Decimal(1.3)" and "Decimal(13/10)" gave different answers.

It seems to me that a potentially better option might be to adjust the
implicit float->Decimal conversion in the Decimal constructor to use
the same algorithm as we now use for float.__repr__ [1], where we look
for the shortest decimal representation that gives the same answer
when rendered as a float. At the moment you have to indirect through
str() or repr() to get that behaviour:

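 >>> from decimal import Decimal as D
 >>> 1.3
 1.3
 >>> D('1.3')
 Decimal('1.3')
 >>> D(1.3)
 Decimal('1.3000000000000000444089209850062616169452667236328125')
 >>> D(str(1.3))
 Decimal('1.3')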



From abarnert at  Tue Jun  2 03:27:32 2015
From: abarnert at (Andrew Barnert)
Date: Mon, 1 Jun 2015 18:27:32 -0700
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 1, 2015, at 17:08, Nick Coghlan <ncoghlan at> wrote:
> On 2 Jun 2015 08:44, "Andrew Barnert via Python-ideas"
> <python-ideas at> wrote:
>> But the basic idea can be extracted out and Pythonified:
>> The literal 1.23 no longer gives you a float, but a FloatLiteral, which is either a subclass of float, or an unrelated class that has a __float__ method. Doing any calculation on it gives you a float. But as long as you leave it alone as a FloatLiteral, it has its literal characters available for any function that wants to distinguish FloatLiteral from float, like the Decimal constructor.
>> The problem that Python faces that Swift doesn't is that Python doesn't use static typing and implicit compile-time conversions. So in Python, you'd be passing around these larger values and doing the slow conversions at runtime. That may or may not be unacceptable; without actually building it and testing some realistic programs it's pretty hard to guess.
> Joonas's suggestion of storing the original text representation passed
> to the float constructor is at least a novel one - it's only the idea
> of actual decimal literals that was ruled out in the past.

I actually built about half an implementation of something like Swift's LiteralConvertible protocol back when I was teaching myself Swift. But I think I have a simpler version that I could implement much more easily.

Basically, FloatLiteral is just a subclass of float whose __new__ stores its constructor argument. Then decimal.Decimal checks for that stored string and uses it instead of the float value if present. Then there's an import hook that replaces every Num with a call to FloatLiteral.

This design doesn't actually fix everything; in effect, 1.3 actually compiles to FloatLiteral(str(float('1.3'))) (because by the time you get to the AST it's too late to avoid that first conversion). Which does actually solve the problem with 1.3, but doesn't solve everything in general (e.g., just feed in a number that has more precision than a double can hold but less than your current decimal context can...).

But it just lets you test whether the implementation makes sense and what the performance effects are, and it's only an hour of work, and doesn't require anyone to patch their interpreter to play with it. If it seems promising, then hacking the compiler so 2.3 compiles to FloatLiteral('2.3') may be worth doing for a test of the actual functionality.
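
A minimal sketch of the FloatLiteral idea described above, not the actual implementation mentioned in this message. The attribute name literal_text and the helper to_decimal are hypothetical; to_decimal stands in for a Decimal constructor that checks for the stored string, since the real decimal.Decimal does no such thing:

```python
from decimal import Decimal


class FloatLiteral(float):
    """A float that remembers the text it was built from (hypothetical)."""

    def __new__(cls, text):
        self = super().__new__(cls, text)
        self.literal_text = text      # hypothetical attribute name
        return self


def to_decimal(value):
    """Stand-in for a Decimal constructor that honours the stored text."""
    text = getattr(value, "literal_text", None)
    return Decimal(text) if text is not None else Decimal(value)


x = FloatLiteral("1.3")
print(to_decimal(x))      # uses the stored text
print(to_decimal(1.3))    # plain float: exact binary value
print(type(x + 1))        # arithmetic falls back to an ordinary float
```

Because arithmetic is inherited from float, any calculation drops the stored text, which matches the "doing any calculation on it gives you a float" behaviour.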

I'll be glad to hack it up when I get a chance tonight. But personally, I think decimal literals are a better way to go here. Decimal(1.20) magically doing what you want still has all the same downsides as 1.20d (or implicit decimal literals), plus it's more complex, adds performance costs, and doesn't provide nearly as much benefit. (Yes, Decimal(1.20) is a little nicer than Decimal('1.20'), but only a little--and nowhere near as nice as 1.20d).

> Aside from the practical implementation question, the main concern I
> have with it is that we'd be trading the status quo for a situation
> where "Decimal(1.3)" and "Decimal(13/10)" gave different answers.

Yes, to solve that you really need Decimal(13)/Decimal(10)... Which implies that maybe the simplification in Decimal(1.3) is more misleading than helpful. (Notice that this problem also doesn't arise for decimal literals--13/10d is int vs. Decimal division, which is correct out of the box. Or, if you want prefixes, d13/10 is Decimal vs. int division.)
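
For reference, a short sketch of how these spellings behave in current Python (standard library only; the variable names are just for the example). Division of two floats or ints happens in binary floating point before Decimal ever sees it, so only the all-Decimal division yields exactly 1.3:

```python
from decimal import Decimal

exact_binary = Decimal(1.3)                # exact value of the binary float
via_repr = Decimal(repr(1.3))              # shortest repr, i.e. '1.3'
from_float_division = Decimal(13 / 10)     # 13 / 10 is already a float here
true_ratio = Decimal(13) / Decimal(10)     # division done in decimal

print(exact_binary == from_float_division)
print(via_repr == true_ratio)
print(exact_binary == true_ratio)
```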

> It seems to me that a potentially better option might be to adjust the
> implicit float->Decimal conversion in the Decimal constructor to use
> the same algorithm as we now use for float.__repr__ [1], where we look
> for the shortest decimal representation that gives the same answer
> when rendered as a float. At the moment you have to indirect through
> str() or repr() to get that behaviour:
>>>> from decimal import Decimal as D
>>>> 1.3
> 1.3
>>>> D('1.3')
> Decimal('1.3')
>>>> D(1.3)
> Decimal('1.3000000000000000444089209850062616169452667236328125')
>>>> D(str(1.3))
> Decimal('1.3')
> Cheers,
> Nick.
> [1]

From abarnert at  Tue Jun  2 04:00:48 2015
From: abarnert at (Andrew Barnert)
Date: Mon, 1 Jun 2015 19:00:48 -0700
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 1, 2015, at 18:27, Andrew Barnert via Python-ideas <python-ideas at> wrote:
>> On Jun 1, 2015, at 17:08, Nick Coghlan <ncoghlan at> wrote:
>> On 2 Jun 2015 08:44, "Andrew Barnert via Python-ideas"
>> <python-ideas at> wrote:
>>> But the basic idea can be extracted out and Pythonified:
>>> The literal 1.23 no longer gives you a float, but a FloatLiteral, which is either a subclass of float, or an unrelated class that has a __float__ method. Doing any calculation on it gives you a float. But as long as you leave it alone as a FloatLiteral, it has its literal characters available for any function that wants to distinguish FloatLiteral from float, like the Decimal constructor.
>>> The problem that Python faces that Swift doesn't is that Python doesn't use static typing and implicit compile-time conversions. So in Python, you'd be passing around these larger values and doing the slow conversions at runtime. That may or may not be unacceptable; without actually building it and testing some realistic programs it's pretty hard to guess.
>> Joonas's suggestion of storing the original text representation passed
>> to the float constructor is at least a novel one - it's only the idea
>> of actual decimal literals that was ruled out in the past.
> I actually built about half an implementation of something like Swift's LiteralConvertible protocol back when I was teaching myself Swift. But I think I have a simpler version that I could implement much more easily.
> Basically, FloatLiteral is just a subclass of float whose __new__ stores its constructor argument. Then decimal.Decimal checks for that stored string and uses it instead of the float value if present. Then there's an import hook that replaces every Num with a call to FloatLiteral.
> This design doesn't actually fix everything; in effect, 1.3 actually compiles to FloatLiteral(str(float('1.3')) (because by the time you get to the AST it's too late to avoid that first conversion). Which does actually solve the problem with 1.3, but doesn't solve everything in general (e.g., just feed in a number that has more precision than a double can hold but less than your current decimal context can...).
> But it just lets you test whether the implementation makes sense and what the performance effects are, and it's only an hour of work,

Make that 15 minutes.

> and doesn't require anyone to patch their interpreter to play with it. If it seems promising, then hacking the compiler so 2.3 compiles to FloatLiteral('2.3') may be worth doing for a test of the actual functionality.
> I'll be glad to hack it up when I get a chance tonight. But personally, I think decimal literals are a better way to go here. Decimal(1.20) magically doing what you want still has all the same downsides as 1.20d (or implicit decimal literals), plus it's more complex, adds performance costs, and doesn't provide nearly as much benefit. (Yes, Decimal(1.20) is a little nicer than Decimal('1.20'), but only a little--and nowhere near as nice as 1.20d).
>> Aside from the practical implementation question, the main concern I
>> have with it is that we'd be trading the status quo for a situation
>> where "Decimal(1.3)" and "Decimal(13/10)" gave different answers.
> Yes, to solve that you really need Decimal(13)/Decimal(10)... Which implies that maybe the simplification in Decimal(1.3) is more misleading than helpful. (Notice that this problem also doesn't arise for decimal literals--13/10d is int vs. Decimal division, which is correct out of the box. Or, if you want prefixes, d13/10 is Decimal vs. int division.)
>> It seems to me that a potentially better option might be to adjust the
>> implicit float->Decimal conversion in the Decimal constructor to use
>> the same algorithm as we now use for float.__repr__ [1], where we look
>> for the shortest decimal representation that gives the same answer
>> when rendered as a float. At the moment you have to indirect through
>> str() or repr() to get that behaviour:
>>>>> from decimal import Decimal as D
>>>>> 1.3
>> 1.3
>>>>> D('1.3')
>> Decimal('1.3')
>>>>> D(1.3)
>> Decimal('1.3000000000000000444089209850062616169452667236328125')
>>>>> D(str(1.3))
>> Decimal('1.3')
>> Cheers,
>> Nick.
>> [1]
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at
> Code of Conduct:

From steve at  Tue Jun  2 03:58:09 2015
From: steve at (Steven D'Aprano)
Date: Tue, 2 Jun 2015 11:58:09 +1000
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jun 02, 2015 at 10:08:37AM +1000, Nick Coghlan wrote:

> It seems to me that a potentially better option might be to adjust the
> implicit float->Decimal conversion in the Decimal constructor to use
> the same algorithm as we now use for float.__repr__ [1], where we look
> for the shortest decimal representation that gives the same answer
> when rendered as a float. At the moment you have to indirect through
> str() or repr() to get that behaviour:

Apart from the questions of whether such a change would be allowed by 
the Decimal specification, and the breaking of backwards compatibility, 
I would really hate that change for another reason.

At the moment, a good, cheap way to find out what a binary float "really 
is" (in some sense) is to convert it to Decimal and see what you get:

Decimal(1.3)
-> Decimal('1.3000000000000000444089209850062616169452667236328125')

If you want conversion from repr, then you can be explicit about it:

Decimal(repr(1.3))
-> Decimal('1.3')

("Explicit is better than implicit", as they say...)

Although in fairness I suppose that if this change happens, we could 
keep the old behaviour in the from_float method:

# hypothetical future behaviour
Decimal(1.3)
-> Decimal('1.3')
Decimal.from_float(1.3)
-> Decimal('1.3000000000000000444089209850062616169452667236328125')

But all things considered, I don't think we're doing people any favours 
by changing the behaviour of float->Decimal conversions to implicitly 
use the repr() instead of being exact. I expect this strategy is like 
trying to flatten a bubble under wallpaper: all you can do is push the 
gotchas and surprises to somewhere else.

Oh, another thought... Decimals could gain yet another conversion 
method, one which implicitly uses the float repr, but signals if it was 
an inexact conversion or not. Explicitly calling repr can never signal, 
since the conversion occurs outside of the Decimal constructor and 
Decimal sees only the string:

Decimal(repr(1.3)) cannot signal Inexact.

But:


Decimal.from_nearest_float(1.5)  # exact
Decimal.from_nearest_float(1.3)  # signals Inexact

That might be useful, but probably not to beginners.
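
A rough sketch of what such a method might do, written as a free function because Decimal.from_nearest_float does not exist. It assumes finite inputs, and it emulates "signalling" by setting the Inexact flag on the current context and raising only if that trap is enabled:

```python
from decimal import Decimal, Inexact, getcontext, localcontext


def from_nearest_float(value):
    """Hypothetical helper: convert via repr(), signalling Inexact when the
    result differs from the float's exact binary value."""
    shortest = Decimal(repr(value))
    if shortest != Decimal(value):
        ctx = getcontext()
        ctx.flags[Inexact] = True
        if ctx.traps[Inexact]:
            raise Inexact("repr() differs from the float's exact value")
    return shortest


with localcontext() as ctx:
    ctx.traps[Inexact] = False
    ctx.clear_flags()
    exact_result = from_nearest_float(1.5)
    flag_after_exact = bool(ctx.flags[Inexact])
    inexact_result = from_nearest_float(1.3)
    flag_after_inexact = bool(ctx.flags[Inexact])

print(exact_result, flag_after_exact)
print(inexact_result, flag_after_inexact)
```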


From steve at  Tue Jun  2 03:37:48 2015
From: steve at (Steven D'Aprano)
Date: Tue, 2 Jun 2015 11:37:48 +1000
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jun 01, 2015 at 05:52:35PM +0300, Joonas Liik wrote:

> Having some sort of decimal literal would have some advantages of its own,
> for one it could help against this sillyness:
> >>> Decimal(1.3)
> Decimal('1.3000000000000000444089209850062616169452667236328125')

Why is that silly? That's the actual value of the binary float 1.3 
converted into base 10. If you want 1.3 exactly, you can do this:

> >>> Decimal('1.3')
> Decimal('1.3')

Is that really so hard for people to learn? 

> I'm not saying that the actual data type needs to be a decimal (
> might well be a float but say shove the string repr next to it so it can be
> accessed when needed)

You want Decimals to *lie* about what value they have?

I think that's a terrible idea, one which would lead to a whole set of 
new and exciting surprises when using Decimal. Let me try to predict a 
few of the questions on Stackoverflow which would follow this change...

  Why is equality so inaccurate in Python?

  py> x = Decimal(1.3)
  py> y = Decimal('1.3')
  py> x, y
  (Decimal('1.3'), Decimal('1.3'))
  py> x == y
  False

  Why does Python insert extra digits into numbers when I multiply?

  py> x = Decimal(1.3)
  py> x
  Decimal('1.3')
  py> y = 10000000000000000*x
  py> y - 13000000000000000

> ..but this is one really common pitfall for new users, i know its easy to
> fix the code above,
> but this behavior is very unintuitive.. you essentially get a really
> expensive float when you do the obvious thing.

Then don't do the obvious thing.

Sometimes there really is no good alternative to actually knowing what 
you are doing. Floating point maths is inherently hard, but that's not 
the problem. There are all sorts of things in programming which are 
hard, and people learn how to deal with them. The problem is that people 
*imagine* that floating point is simple, when it is not and can never 
be. We don't do them any favours by enabling that delusion.

If your needs are light, then you can ignore the complexities of 
floating point. You really can go a very long way by just rounding the 
results of your calculations when displaying them. But for anything more 
than that, we cannot just paper over the floating point complexities 
without creating new complexities that will burn people.
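
A small sketch of the "round when displaying" strategy, using the well-known 0.1 + 0.2 case. The two-decimal format and the use of math.isclose for comparison are illustrative choices, not the only reasonable ones:

```python
import math

total = 0.1 + 0.2

# Keep full precision internally; round only for presentation.
displayed = f"{total:.2f}"

# Compare with a tolerance instead of testing for exact equality.
close_enough = math.isclose(total, 0.3)

print(displayed, total == 0.3, close_enough)
```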

You don't have to become a floating point guru, but it really isn't 
onerous to expect people who are programming to learn a few basic 
programming skills, and that includes a few basic coping strategies for 
floating point.


From abarnert at  Tue Jun  2 04:21:47 2015
From: abarnert at (Andrew Barnert)
Date: Mon, 1 Jun 2015 19:21:47 -0700
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 1, 2015, at 19:00, Andrew Barnert <abarnert at> wrote:
>> On Jun 1, 2015, at 18:27, Andrew Barnert via Python-ideas <python-ideas at> wrote:
>>> On Jun 1, 2015, at 17:08, Nick Coghlan <ncoghlan at> wrote:
>>> On 2 Jun 2015 08:44, "Andrew Barnert via Python-ideas"
>>> <python-ideas at> wrote:
>>>> But the basic idea can be extracted out and Pythonified:
>>>> The literal 1.23 no longer gives you a float, but a FloatLiteral, which is either a subclass of float, or an unrelated class that has a __float__ method. Doing any calculation on it gives you a float. But as long as you leave it alone as a FloatLiteral, it has its literal characters available for any function that wants to distinguish FloatLiteral from float, like the Decimal constructor.
>>>> The problem that Python faces that Swift doesn't is that Python doesn't use static typing and implicit compile-time conversions. So in Python, you'd be passing around these larger values and doing the slow conversions at runtime. That may or may not be unacceptable; without actually building it and testing some realistic programs it's pretty hard to guess.
>>> Joonas's suggestion of storing the original text representation passed
>>> to the float constructor is at least a novel one - it's only the idea
>>> of actual decimal literals that was ruled out in the past.
>> I actually built about half an implementation of something like Swift's LiteralConvertible protocol back when I was teaching myself Swift. But I think I have a simpler version that I could implement much more easily.
>> Basically, FloatLiteral is just a subclass of float whose __new__ stores its constructor argument. Then decimal.Decimal checks for that stored string and uses it instead of the float value if present. Then there's an import hook that replaces every Num with a call to FloatLiteral.
>> This design doesn't actually fix everything; in effect, 1.3 actually compiles to FloatLiteral(str(float('1.3')) (because by the time you get to the AST it's too late to avoid that first conversion). Which does actually solve the problem with 1.3, but doesn't solve everything in general (e.g., just feed in a number that has more precision than a double can hold but less than your current decimal context can...).
>> But it just lets you test whether the implementation makes sense and what the performance effects are, and it's only an hour of work,
> Make that 15 minutes.

And as it turns out, hacking the tokens is no harder than hacking the AST (in fact, it's a little easier; I'd just never done it before), so now it does that, meaning you really get the actual literal string from the source, not the repr of the float of that string literal.
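
For anyone curious what token-level rewriting can look like, here is a rough sketch using the standard tokenize module. It is not the implementation referred to above: the names wrap_float_literals and _is_float_text are invented for this example, the float-detection rule is simplistic, and there is no import hook, just a string-to-string transform followed by exec:

```python
import io
import tokenize


class FloatLiteral(float):
    """A float that remembers its source text (hypothetical)."""

    def __new__(cls, text):
        self = super().__new__(cls, text)
        self.literal_text = text
        return self


def _is_float_text(text):
    """Crude check: a decimal literal with a point or exponent, not complex."""
    lowered = text.lower()
    if lowered.startswith(("0x", "0b", "0o")) or lowered.endswith("j"):
        return False
    return "." in lowered or "e" in lowered


def wrap_float_literals(source):
    """Rewrite each float literal NUMBER token as FloatLiteral('<text>')."""
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NUMBER and _is_float_text(tok.string):
            result.extend([
                (tokenize.NAME, "FloatLiteral"),
                (tokenize.OP, "("),
                (tokenize.STRING, repr(tok.string)),
                (tokenize.OP, ")"),
            ])
        else:
            result.append((tok.type, tok.string))
    return tokenize.untokenize(result)


namespace = {"FloatLiteral": FloatLiteral}
exec(wrap_float_literals("x = 1.30\nn = 7\n"), namespace)
print(namespace["x"].literal_text, type(namespace["n"]).__name__)
```

Working on tokens keeps the literal exactly as typed, including trailing zeros, which the AST would already have discarded.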

Turning this into a real implementation would obviously be more than half an hour's work, but not more than a day or two. Again, I don't think anyone would actually want this, but now people who think they do have an implementation to play with to prove me wrong.

>> and doesn't require anyone to patch their interpreter to play with it. If it seems promising, then hacking the compiler so 2.3 compiles to FloatLiteral('2.3') may be worth doing for a test of the actual functionality.
>> I'll be glad to hack it up when I get a chance tonight. But personally, I think decimal literals are a better way to go here. Decimal(1.20) magically doing what you want still has all the same downsides as 1.20d (or implicit decimal literals), plus it's more complex, adds performance costs, and doesn't provide nearly as much benefit. (Yes, Decimal(1.20) is a little nicer than Decimal('1.20'), but only a little--and nowhere near as nice as 1.20d).
>>> Aside from the practical implementation question, the main concern I
>>> have with it is that we'd be trading the status quo for a situation
>>> where "Decimal(1.3)" and "Decimal(13/10)" gave different answers.
>> Yes, to solve that you really need Decimal(13)/Decimal(10)... Which implies that maybe the simplification in Decimal(1.3) is more misleading than helpful. (Notice that this problem also doesn't arise for decimal literals--13/10d is int vs. Decimal division, which is correct out of the box. Or, if you want prefixes, d13/10 is Decimal vs. int division.)
>>> It seems to me that a potentially better option might be to adjust the
>>> implicit float->Decimal conversion in the Decimal constructor to use
>>> the same algorithm as we now use for float.__repr__ [1], where we look
>>> for the shortest decimal representation that gives the same answer
>>> when rendered as a float. At the moment you have to indirect through
>>> str() or repr() to get that behaviour:
>>>>>> from decimal import Decimal as D
>>>>>> 1.3
>>> 1.3
>>>>>> D('1.3')
>>> Decimal('1.3')
>>>>>> D(1.3)
>>> Decimal('1.3000000000000000444089209850062616169452667236328125')
>>>>>> D(str(1.3))
>>> Decimal('1.3')
>>> Cheers,
>>> Nick.
>>> [1]

From steve at  Tue Jun  2 05:00:40 2015
From: steve at (Steven D'Aprano)
Date: Tue, 2 Jun 2015 13:00:40 +1000
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>


Your email client appears not to be quoting the text you reply to. It is
conventional to use a leading > for quoting; perhaps you could configure
your mail program to do so? The good ones even have a "Paste As Quote"
function.

On with the substance of your post...

On Mon, Jun 01, 2015 at 01:24:32PM -0400, Nicholas Chammas wrote:

> I guess it's a non-trivial tradeoff. But I would lean towards considering
> people likely to be affected by the performance hit as doing something "not
> common". Like, if they are doing that many calculations that it matters,
> perhaps it makes sense to ask them to explicitly ask for floats vs.
> decimals, in exchange for giving the majority who wouldn't notice a
> performance difference a better user experience.

Changing from binary floats to decimal floats by default is a big, 
backwards incompatible change. Even if it's a good idea, we're 
constrained by backwards compatibility: I would imagine we wouldn't want 
to even introduce this feature until the majority of people are using 
Python 3 rather than Python 2, and then we'd probably want to introduce 
it using a "from __future__ import decimal_floats" directive.

So I would guess this couldn't happen until probably 2020 or so.

But we could introduce a decimal literal, say 1.1d for Decimal("1.1"). 
The first prerequisite is that we have a fast Decimal implementation, 
which we now have. Next we would have to decide how the decimal literals 
would interact with the decimal module. Do we include full support of 
the entire range of decimal features, including globally configurable 
precision and other modes? Or just a subset? How will these decimals 
interact with other numeric types, like float and Fraction? At the 
moment, Decimal isn't even part of the numeric tower.
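
To illustrate the interaction question, here is a small sketch of how Decimal currently relates to the numbers ABCs and to mixed arithmetic. The helper can_add is made up for the example; everything else is standard library:

```python
import numbers
from decimal import Decimal
from fractions import Fraction

d = Decimal("1.1")

# Decimal is registered as a Number, but not as a Real.
is_number = isinstance(d, numbers.Number)
is_real = isinstance(d, numbers.Real)


def can_add(a, b):
    """Report whether a + b is supported without raising TypeError."""
    try:
        a + b
    except TypeError:
        return False
    return True


mixes = {
    "int": can_add(d, 1),
    "float": can_add(d, 1.5),
    "Fraction": can_add(d, Fraction(1, 2)),
}
print(is_number, is_real, mixes)
```

Any decimal literal proposal would need to decide whether those TypeErrors stay, since literals make accidental mixing much more likely.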

There's a lot of ground to cover, it's not a trivial change, and will 
definitely need a PEP.

> How many of your examples are inherent limitations of decimals vs. problems
> that can be improved upon?

In one sense, they are inherent limitations of floating point numbers 
regardless of base. Whether binary, decimal, hexadecimal as used in some 
IBM computers, or something else, you're going to see the same problems. 
Only the specific details will vary, e.g. 1/3 cannot be represented 
exactly in base 2 or base 10, but if you constructed a base 3 float, it 
would be exact.

In another sense, Decimal has a big advantage that it is much more 
configurable than Python's floats. Decimal lets you configure the 
precision, rounding mode, error handling and more. That's not inherent 
to base 10 calculations, you can do exactly the same thing for binary 
floats too, but Python doesn't offer that feature for floats, only for
Decimals.

But no matter how you configure Decimal, all you can do is shift the 
gotchas around. The issue really is inherent to the nature of the 
problem, and you cannot defeat the universe. Regardless of what 
base you use, binary or decimal or something else, or how many digits 
precision, you're still trying to simulate an uncountably infinite 
continuous, infinitely divisible number line using a finite, 
discontinuous set of possible values. Something has to give.

(For the record, when I say "uncountably infinite", I don't just mean 
"too many to count", it's a technical term. To oversimplify horribly, it 
means "larger than infinity" in some sense. It's off-topic for here, 
but if anyone is interested in learning more, you can email me off-list, 
or google for "countable vs uncountable infinity".)

Basically, you're trying to squeeze an infinite number of real numbers 
into a finite amount of memory. It can't be done. Consequently, there 
will *always* be some calculations where the true value simply cannot be 
calculated and the answer you get is slightly too big or slightly too 
small. All the other floating point gotchas follow from that simple
fact.

> Admittedly, the only place where I've played with decimals extensively is
> on Microsoft's SQL Server (where they are the default literal). I've
> stumbled in the past on my own decimal gotchas, but looking at your examples
> and trying them on SQL Server I suspect that most of the problems you show
> are problems of precision and scale.

No. Change the precision and scale, and some *specific* problems go
away, but they reappear with other numbers.

Besides, at the point that you're talking about setting the precision, 
we're really not talking about making things easy for beginners any
more.

And not all floating point issues are related to precision and scale in 
decimal. You cannot divide a cake into exactly three equal pieces in 
Decimal any more than you can divide a cake into exactly three equal 
pieces in binary. All you can hope for is to choose a precision where the
rounding errors in one part of your calculation will be cancelled by the 
rounding errors in another part of your calculation. And that precision 
will be different for any two arbitrary calculations.
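
The cake example can be checked directly. This sketch (standard library only; the function name and the three precisions are arbitrary choices for illustration) divides 1 by 3 in Decimal at several precisions and tests whether three of the pieces make a whole again, with Fraction as the exact comparison:

```python
from decimal import Decimal, localcontext
from fractions import Fraction


def thirds_recombine_exactly(precision):
    """Return True if (1/3) * 3 == 1 in Decimal at the given precision."""
    with localcontext() as ctx:
        ctx.prec = precision
        third = Decimal(1) / Decimal(3)
        return third * 3 == Decimal(1)


results = {p: thirds_recombine_exactly(p) for p in (5, 28, 100)}
fraction_is_exact = Fraction(1, 3) * 3 == 1

print(results, fraction_is_exact)
```

Raising the precision shrinks the error but never removes it; only an exact rational type avoids it here, at the cost of unbounded numerators and denominators.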


From abarnert at  Tue Jun  2 05:10:29 2015
From: abarnert at (Andrew Barnert)
Date: Mon, 1 Jun 2015 20:10:29 -0700
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 1, 2015, at 18:58, Steven D'Aprano <steve at> wrote:
>> On Tue, Jun 02, 2015 at 10:08:37AM +1000, Nick Coghlan wrote:
>> It seems to me that a potentially better option might be to adjust the
>> implicit float->Decimal conversion in the Decimal constructor to use
>> the same algorithm as we now use for float.__repr__ [1], where we look
>> for the shortest decimal representation that gives the same answer
>> when rendered as a float. At the moment you have to indirect through
>> str() or repr() to get that behaviour:
> Apart from the questions of whether such a change would be allowed by 
> the Decimal specification,

As far as I know, GDAS doesn't specify anything about implicit conversion from floats, so such a change should be allowed as long as the required explicit conversion function (which I think is from_float?) exists and does the required thing.

As a side note, has anyone considered whether it's worth switching to IEEE-754-2008 as the controlling specification? There may be a good reason not to do so; I'm just curious whether someone has thought it through and made the case.

> and the breaking of backwards compatibility, 
> I would really hate that change for another reason.
> At the moment, a good, cheap way to find out what a binary float "really 
> is" (in some sense) is to convert it to Decimal and see what you get:
> Decimal(1.3) 
> -> Decimal('1.3000000000000000444089209850062616169452667236328125')
> If you want conversion from repr, then you can be explicit about it:
> Decimal(repr(1.3))
> -> Decimal('1.3')
> ("Explicit is better than implicit", as they say...)
> Although in fairness I suppose that if this change happens, we could 
> keep the old behaviour in the from_float method:
> # hypothetical future behaviour
> Decimal(1.3) 
> -> Decimal('1.3')
> Decimal.from_float(1.3) 
> -> Decimal('1.3000000000000000444089209850062616169452667236328125')
> But all things considered, I don't think we're doing people any favours 
> by changing the behaviour of float->Decimal conversions to implicitly 
> use the repr() instead of being exact. I expect this strategy is like 
> trying to flatten a bubble under wallpaper: all you can do is push the 
> gotchas and surprises to somewhere else.
> Oh, another thought... Decimals could gain yet another conversion 
> method, one which implicitly uses the float repr, but signals if it was 
> an inexact conversion or not. Explicitly calling repr can never signal, 
> since the conversion occurs outside of the Decimal constructor and 
> Decimal sees only the string:
> Decimal(repr(1.3)) cannot signal Inexact.
> But:
> Decimal.from_nearest_float(1.5)  # exact
> Decimal.from_nearest_float(1.3)  # signals Inexact
> That might be useful, but probably not to beginners.

I think this might be worth having whether the default constructor is changed or not.

I can't think of too many programs where I'm pretty sure I have an exactly-representable decimal as a float but want to check to be sure... but for interactive use in IPython (especially when I'm specifically trying to explain to someone why just using Decimal instead of float will/will not solve their problem) I could see using it.
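
A minimal sketch of what such a method might do, written as a free function. The name from_nearest_float comes from Steven's message; everything else here is an assumption, including the choice to set the Inexact flag by hand, since the decimal module has no public call for raising a signal from outside.

```python
from decimal import Decimal, Inexact, getcontext, localcontext

def from_nearest_float(f):
    """Hypothetical helper: convert via repr(), flagging inexact conversions."""
    exact = Decimal(f)          # exact binary value of the float
    nearest = Decimal(repr(f))  # shortest decimal string that round-trips
    if exact != nearest:
        ctx = getcontext()
        ctx.flags[Inexact] = True   # record the signal on the current context
        if ctx.traps[Inexact]:      # raise only if the caller asked to trap it
            raise Inexact("%r is not exactly representable" % f)
    return nearest

with localcontext() as ctx:
    ctx.clear_flags()
    from_nearest_float(1.5)
    flag_after_exact = ctx.flags[Inexact]    # 1.5 is exact in binary
    from_nearest_float(1.3)
    flag_after_inexact = ctx.flags[Inexact]  # 1.3 is not
```

With the default context, Inexact is not trapped, so the second call only sets the flag; enabling the trap turns it into an exception.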

From surya.subbarao1 at  Tue Jun  2 05:48:51 2015
From: surya.subbarao1 at (u8y7541 The Awesome Person)
Date: Mon, 1 Jun 2015 20:48:51 -0700
Subject: [Python-ideas] Python-ideas Digest, Vol 103, Issue 14
In-Reply-To: <>
References: <>
Message-ID: <>

That patch sounds nice; I don't have to edit my Python distribution! We'll
have to make do with this.
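
For context, the idea described in the quoted digest below (a float subclass that stores its constructor argument, which Decimal then prefers) might look roughly like this. It is a minimal sketch and not the actual patch: the `literal` attribute and the `to_decimal` helper are hypothetical names, and the import hook that rewrites literals is left out.

```python
from decimal import Decimal

class FloatLiteral(float):
    """A float that remembers the text it was created from (hypothetical)."""

    def __new__(cls, text):
        self = super().__new__(cls, text)
        self.literal = str(text)  # keep the original characters
        return self

def to_decimal(value):
    """Stand-in for a Decimal constructor that prefers the stored literal."""
    if isinstance(value, FloatLiteral):
        return Decimal(value.literal)
    return Decimal(value)  # plain floats keep the exact conversion

x = FloatLiteral("1.3")
as_literal = to_decimal(x)      # uses the stored text "1.3"
as_float = to_decimal(1.3)      # exact binary expansion
after_math = type(x + 1)        # arithmetic drops back to plain float
```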

On Mon, Jun 1, 2015 at 7:03 PM, <python-ideas-request at> wrote:

> Send Python-ideas mailing list submissions to
>         python-ideas at
> To subscribe or unsubscribe via the World Wide Web, visit
> or, via email, send a message with subject or body 'help' to
>         python-ideas-request at
> You can reach the person managing the list at
>         python-ideas-owner at
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Python-ideas digest..."
> Today's Topics:
>    1. Re: Python Float Update (Nick Coghlan)
>    2. Re: Python Float Update (Andrew Barnert)
>    3. Re: Python Float Update (Andrew Barnert)
>    4. Re: Python Float Update (Steven D'Aprano)
> ----------------------------------------------------------------------
> Message: 1
> Date: Tue, 2 Jun 2015 10:08:37 +1000
> From: Nick Coghlan <ncoghlan at>
> To: Andrew Barnert <abarnert at>
> Cc: python-ideas <python-ideas at>
> Subject: Re: [Python-ideas] Python Float Update
> Message-ID:
>         <
> CADiSq7fjhS_XrKe3QfF58hXdhLSSbX6NvsFZZKjRq-+OLOQ-eQ at>
> Content-Type: text/plain; charset=UTF-8
> On 2 Jun 2015 08:44, "Andrew Barnert via Python-ideas"
> <python-ideas at> wrote:
> >But the basic idea can be extracted out and Pythonified:
> >
> > The literal 1.23 no longer gives you a float, but a FloatLiteral, which
> is either a subclass of float, or an unrelated class that has a __float__
> method. Doing any calculation on it gives you a float. But as long as you
> leave it alone as a FloatLiteral, it has its literal characters available
> for any function that wants to distinguish FloatLiteral from float, like
> the Decimal constructor.
> >
> > The problem that Python faces that Swift doesn't is that Python doesn't
> use static typing and implicit compile-time conversions. So in Python,
> you'd be passing around these larger values and doing the slow conversions
> at runtime. That may or may not be unacceptable; without actually building
> it and testing some realistic programs it's pretty hard to guess.
> Joonas's suggestion of storing the original text representation passed
> to the float constructor is at least a novel one - it's only the idea
> of actual decimal literals that was ruled out in the past.
> Aside from the practical implementation question, the main concern I
> have with it is that we'd be trading the status quo for a situation
> where "Decimal(1.3)" and "Decimal(13/10)" gave different answers.
> It seems to me that a potentially better option might be to adjust the
> implicit float->Decimal conversion in the Decimal constructor to use
> the same algorithm as we now use for float.__repr__ [1], where we look
> for the shortest decimal representation that gives the same answer
> when rendered as a float. At the moment you have to indirect through
> str() or repr() to get that behaviour:
>  >>> from decimal import Decimal as D
>  >>> 1.3
>  1.3
>  >>> D('1.3')
>  Decimal('1.3')
>  >>> D(1.3)
>  Decimal('1.3000000000000000444089209850062616169452667236328125')
>  >>> D(str(1.3))
>  Decimal('1.3')
> Cheers,
> Nick.
> [1]
> ------------------------------
> Message: 2
> Date: Mon, 1 Jun 2015 18:27:32 -0700
> From: Andrew Barnert <abarnert at>
> To: Nick Coghlan <ncoghlan at>
> Cc: python-ideas <python-ideas at>
> Subject: Re: [Python-ideas] Python Float Update
> Message-ID: <EBB58361-19F4-4275-B6A6-E5AF2F77EF9C at>
> Content-Type: text/plain;       charset=us-ascii
> On Jun 1, 2015, at 17:08, Nick Coghlan <ncoghlan at> wrote:
> >
> > On 2 Jun 2015 08:44, "Andrew Barnert via Python-ideas"
> > <python-ideas at> wrote:
> >> But the basic idea can be extracted out and Pythonified:
> >>
> >> The literal 1.23 no longer gives you a float, but a FloatLiteral, which
> is either a subclass of float, or an unrelated class that has a __float__
> method. Doing any calculation on it gives you a float. But as long as you
> leave it alone as a FloatLiteral, it has its literal characters available
> for any function that wants to distinguish FloatLiteral from float, like
> the Decimal constructor.
> >>
> >> The problem that Python faces that Swift doesn't is that Python doesn't
> use static typing and implicit compile-time conversions. So in Python,
> you'd be passing around these larger values and doing the slow conversions
> at runtime. That may or may not be unacceptable; without actually building
> it and testing some realistic programs it's pretty hard to guess.
> >
> > Joonas's suggestion of storing the original text representation passed
> > to the float constructor is at least a novel one - it's only the idea
> > of actual decimal literals that was ruled out in the past.
> I actually built about half an implementation of something like Swift's
> LiteralConvertible protocol back when I was teaching myself Swift. But I
> think I have a simpler version that I could implement much more easily.
> Basically, FloatLiteral is just a subclass of float whose __new__ stores
> its constructor argument. Then decimal.Decimal checks for that stored
> string and uses it instead of the float value if present. Then there's an
> import hook that replaces every Num with a call to FloatLiteral.
> This design doesn't actually fix everything; in effect, 1.3 actually
> compiles to FloatLiteral(str(float('1.3'))) (because by the time you get to
> the AST it's too late to avoid that first conversion). Which does actually
> solve the problem with 1.3, but doesn't solve everything in general (e.g.,
> just feed in a number that has more precision than a double can hold but
> less than your current decimal context can...).
> But it just lets you test whether the implementation makes sense and what
> the performance effects are, and it's only an hour of work, and doesn't
> require anyone to patch their interpreter to play with it. If it seems
> promising, then hacking the compiler so 2.3 compiles to FloatLiteral('2.3')
> may be worth doing for a test of the actual functionality.
> I'll be glad to hack it up when I get a chance tonight. But personally, I
> think decimal literals are a better way to go here. Decimal(1.20) magically
> doing what you want still has all the same downsides as 1.20d (or implicit
> decimal literals), plus it's more complex, adds performance costs, and
> doesn't provide nearly as much benefit. (Yes, Decimal(1.20) is a little
> nicer than Decimal('1.20'), but only a little--and nowhere near as nice as
> 1.20d).
> > Aside from the practical implementation question, the main concern I
> > have with it is that we'd be trading the status quo for a situation
> > where "Decimal(1.3)" and "Decimal(13/10)" gave different answers.
> Yes, to solve that you really need Decimal(13)/Decimal(10)... Which
> implies that maybe the simplification in Decimal(1.3) is more misleading
> than helpful. (Notice that this problem also doesn't arise for decimal
> literals--13/10d is int vs. Decimal division, which is correct out of the
> box. Or, if you want prefixes, d13/10 is Decimal vs. int division.)
> > It seems to me that a potentially better option might be to adjust the
> > implicit float->Decimal conversion in the Decimal constructor to use
> > the same algorithm as we now use for float.__repr__ [1], where we look
> > for the shortest decimal representation that gives the same answer
> > when rendered as a float. At the moment you have to indirect through
> > str() or repr() to get that behaviour:
> >
> > >>> from decimal import Decimal as D
> > >>> 1.3
> > 1.3
> > >>> D('1.3')
> > Decimal('1.3')
> > >>> D(1.3)
> > Decimal('1.3000000000000000444089209850062616169452667236328125')
> > >>> D(str(1.3))
> > Decimal('1.3')
> >
> > Cheers,
> > Nick.
> >
> > [1]
> ------------------------------
> Message: 3
> Date: Mon, 1 Jun 2015 19:00:48 -0700
> From: Andrew Barnert <abarnert at>
> To: Andrew Barnert <abarnert at>
> Cc: Nick Coghlan <ncoghlan at>, python-ideas
>         <python-ideas at>
> Subject: Re: [Python-ideas] Python Float Update
> Message-ID: <90691306-98E3-421B-ABEB-BA2DE05962C6 at>
> Content-Type: text/plain;       charset=us-ascii
> On Jun 1, 2015, at 18:27, Andrew Barnert via Python-ideas <
> python-ideas at> wrote:
> >
> >> On Jun 1, 2015, at 17:08, Nick Coghlan <ncoghlan at> wrote:
> >>
> >> On 2 Jun 2015 08:44, "Andrew Barnert via Python-ideas"
> >> <python-ideas at> wrote:
> >>> But the basic idea can be extracted out and Pythonified:
> >>>
> >>> The literal 1.23 no longer gives you a float, but a FloatLiteral,
> which is either a subclass of float, or an unrelated class that has a
> __float__ method. Doing any calculation on it gives you a float. But as
> long as you leave it alone as a FloatLiteral, it has its literal characters
> available for any function that wants to distinguish FloatLiteral from
> float, like the Decimal constructor.
> >>>
> >>> The problem that Python faces that Swift doesn't is that Python
> doesn't use static typing and implicit compile-time conversions. So in
> Python, you'd be passing around these larger values and doing the slow
> conversions at runtime. That may or may not be unacceptable; without
> actually building it and testing some realistic programs it's pretty hard
> to guess.
> >>
> >> Joonas's suggestion of storing the original text representation passed
> >> to the float constructor is at least a novel one - it's only the idea
> >> of actual decimal literals that was ruled out in the past.
> >
> > I actually built about half an implementation of something like Swift's
> LiteralConvertible protocol back when I was teaching myself Swift. But I
> think I have a simpler version that I could implement much more easily.
> >
> > Basically, FloatLiteral is just a subclass of float whose __new__ stores
> its constructor argument. Then decimal.Decimal checks for that stored
> string and uses it instead of the float value if present. Then there's an
> import hook that replaces every Num with a call to FloatLiteral.
> >
> > This design doesn't actually fix everything; in effect, 1.3 actually
> compiles to FloatLiteral(str(float('1.3'))) (because by the time you get to
> the AST it's too late to avoid that first conversion). Which does actually
> solve the problem with 1.3, but doesn't solve everything in general (e.g.,
> just feed in a number that has more precision than a double can hold but
> less than your current decimal context can...).
> >
> > But it just lets you test whether the implementation makes sense and
> what the performance effects are, and it's only an hour of work,
> Make that 15 minutes.
> > and doesn't require anyone to patch their interpreter to play with it.
> If it seems promising, then hacking the compiler so 2.3 compiles to
> FloatLiteral('2.3') may be worth doing for a test of the actual
> functionality.
> >
> > I'll be glad to hack it up when I get a chance tonight. But personally,
> I think decimal literals are a better way to go here. Decimal(1.20)
> magically doing what you want still has all the same downsides as 1.20d (or
> implicit decimal literals), plus it's more complex, adds performance costs,
> and doesn't provide nearly as much benefit. (Yes, Decimal(1.20) is a little
> nicer than Decimal('1.20'), but only a little--and nowhere near as nice as
> 1.20d).
> >
> >> Aside from the practical implementation question, the main concern I
> >> have with it is that we'd be trading the status quo for a situation
> >> where "Decimal(1.3)" and "Decimal(13/10)" gave different answers.
> >
> > Yes, to solve that you really need Decimal(13)/Decimal(10)... Which
> implies that maybe the simplification in Decimal(1.3) is more misleading
> than helpful. (Notice that this problem also doesn't arise for decimal
> literals--13/10d is int vs. Decimal division, which is correct out of the
> box. Or, if you want prefixes, d13/10 is Decimal vs. int division.)
> >
> >> It seems to me that a potentially better option might be to adjust the
> >> implicit float->Decimal conversion in the Decimal constructor to use
> >> the same algorithm as we now use for float.__repr__ [1], where we look
> >> for the shortest decimal representation that gives the same answer
> >> when rendered as a float. At the moment you have to indirect through
> >> str() or repr() to get that behaviour:
> >>
> >> >>> from decimal import Decimal as D
> >> >>> 1.3
> >> 1.3
> >> >>> D('1.3')
> >> Decimal('1.3')
> >> >>> D(1.3)
> >> Decimal('1.3000000000000000444089209850062616169452667236328125')
> >> >>> D(str(1.3))
> >> Decimal('1.3')
> >>
> >> Cheers,
> >> Nick.
> >>
> >> [1]
> > _______________________________________________
> > Python-ideas mailing list
> > Python-ideas at
> >
> > Code of Conduct:
> ------------------------------
> Message: 4
> Date: Tue, 2 Jun 2015 11:58:09 +1000
> From: Steven D'Aprano <steve at>
> To: python-ideas at
> Subject: Re: [Python-ideas] Python Float Update
> Message-ID: <20150602015809.GE932 at>
> Content-Type: text/plain; charset=us-ascii
> On Tue, Jun 02, 2015 at 10:08:37AM +1000, Nick Coghlan wrote:
> > It seems to me that a potentially better option might be to adjust the
> > implicit float->Decimal conversion in the Decimal constructor to use
> > the same algorithm as we now use for float.__repr__ [1], where we look
> > for the shortest decimal representation that gives the same answer
> > when rendered as a float. At the moment you have to indirect through
> > str() or repr() to get that behaviour:
> Apart from the questions of whether such a change would be allowed by
> the Decimal specification, and the breaking of backwards compatibility,
> I would really hate that change for another reason.
> At the moment, a good, cheap way to find out what a binary float "really
> is" (in some sense) is to convert it to Decimal and see what you get:
> Decimal(1.3)
> -> Decimal('1.3000000000000000444089209850062616169452667236328125')
> If you want conversion from repr, then you can be explicit about it:
> Decimal(repr(1.3))
> -> Decimal('1.3')
> ("Explicit is better than implicit", as they say...)
> Although in fairness I suppose that if this change happens, we could
> keep the old behaviour in the from_float method:
> # hypothetical future behaviour
> Decimal(1.3)
> -> Decimal('1.3')
> Decimal.from_float(1.3)
> -> Decimal('1.3000000000000000444089209850062616169452667236328125')
> But all things considered, I don't think we're doing people any favours
> by changing the behaviour of float->Decimal conversions to implicitly
> use the repr() instead of being exact. I expect this strategy is like
> trying to flatten a bubble under wallpaper: all you can do is push the
> gotchas and surprises to somewhere else.
> Oh, another thought... Decimals could gain yet another conversion
> method, one which implicitly uses the float repr, but signals if it was
> an inexact conversion or not. Explicitly calling repr can never signal,
> since the conversion occurs outside of the Decimal constructor and
> Decimal sees only the string:
> Decimal(repr(1.3)) cannot signal Inexact.
> But:
> Decimal.from_nearest_float(1.5)  # exact
> Decimal.from_nearest_float(1.3)  # signals Inexact
> That might be useful, but probably not to beginners.
> --
> Steve
> ------------------------------
> Subject: Digest Footer
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at
> ------------------------------
> End of Python-ideas Digest, Vol 103, Issue 14
> *********************************************

-Surya Subbarao

From surya.subbarao1 at  Tue Jun  2 05:55:55 2015
From: surya.subbarao1 at (u8y7541 The Awesome Person)
Date: Mon, 1 Jun 2015 20:55:55 -0700
Subject: [Python-ideas] Python-ideas Digest, Vol 103, Issue 16
In-Reply-To: <>
References: <>
Message-ID: <>

Thanks for making that patch!

On Mon, Jun 1, 2015 at 8:48 PM,  <python-ideas-request at> wrote:
> Today's Topics:
>    1. Re: Python-ideas Digest, Vol 103, Issue 14
>       (u8y7541 The Awesome Person)
> ----------------------------------------------------------------------
> Message: 1
> Date: Mon, 1 Jun 2015 20:48:51 -0700
> From: u8y7541 The Awesome Person <surya.subbarao1 at>
> To: python-ideas at, abarnert at
> Subject: Re: [Python-ideas] Python-ideas Digest, Vol 103, Issue 14
> Message-ID:
>         <CA+o1fZNoPAmTP_ZwPnqtEEdyBL8Gq=CjLRvohoeT7aeW+n0sEg at>
> Content-Type: text/plain; charset="utf-8"
> That patch sounds nice; I don't have to edit my Python distribution! We'll
> have to make do with this.
> On Mon, Jun 1, 2015 at 7:03 PM, <python-ideas-request at> wrote:
>> Send Python-ideas mailing list submissions to
>>         python-ideas at
>> To subscribe or unsubscribe via the World Wide Web, visit
>> or, via email, send a message with subject or body 'help' to
>>         python-ideas-request at
>> You can reach the person managing the list at
>>         python-ideas-owner at
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of Python-ideas digest..."
>> Today's Topics:
>>    1. Re: Python Float Update (Nick Coghlan)
>>    2. Re: Python Float Update (Andrew Barnert)
>>    3. Re: Python Float Update (Andrew Barnert)
>>    4. Re: Python Float Update (Steven D'Aprano)
>> ----------------------------------------------------------------------
>> Message: 1
>> Date: Tue, 2 Jun 2015 10:08:37 +1000
>> From: Nick Coghlan <ncoghlan at>
>> To: Andrew Barnert <abarnert at>
>> Cc: python-ideas <python-ideas at>
>> Subject: Re: [Python-ideas] Python Float Update
>> Message-ID:
>>         <
>> CADiSq7fjhS_XrKe3QfF58hXdhLSSbX6NvsFZZKjRq-+OLOQ-eQ at>
>> Content-Type: text/plain; charset=UTF-8
>> On 2 Jun 2015 08:44, "Andrew Barnert via Python-ideas"
>> <python-ideas at> wrote:
>> >But the basic idea can be extracted out and Pythonified:
>> >
>> > The literal 1.23 no longer gives you a float, but a FloatLiteral, which
>> is either a subclass of float, or an unrelated class that has a __float__
>> method. Doing any calculation on it gives you a float. But as long as you
>> leave it alone as a FloatLiteral, it has its literal characters available
>> for any function that wants to distinguish FloatLiteral from float, like
>> the Decimal constructor.
>> >
>> > The problem that Python faces that Swift doesn't is that Python doesn't
>> use static typing and implicit compile-time conversions. So in Python,
>> you'd be passing around these larger values and doing the slow conversions
>> at runtime. That may or may not be unacceptable; without actually building
>> it and testing some realistic programs it's pretty hard to guess.
>> Joonas's suggestion of storing the original text representation passed
>> to the float constructor is at least a novel one - it's only the idea
>> of actual decimal literals that was ruled out in the past.
>> Aside from the practical implementation question, the main concern I
>> have with it is that we'd be trading the status quo for a situation
>> where "Decimal(1.3)" and "Decimal(13/10)" gave different answers.
>> It seems to me that a potentially better option might be to adjust the
>> implicit float->Decimal conversion in the Decimal constructor to use
>> the same algorithm as we now use for float.__repr__ [1], where we look
>> for the shortest decimal representation that gives the same answer
>> when rendered as a float. At the moment you have to indirect through
>> str() or repr() to get that behaviour:
>>  >>> from decimal import Decimal as D
>>  >>> 1.3
>>  1.3
>>  >>> D('1.3')
>>  Decimal('1.3')
>>  >>> D(1.3)
>>  Decimal('1.3000000000000000444089209850062616169452667236328125')
>>  >>> D(str(1.3))
>>  Decimal('1.3')
>> Cheers,
>> Nick.
>> [1]
>> ------------------------------
>> Message: 2
>> Date: Mon, 1 Jun 2015 18:27:32 -0700
>> From: Andrew Barnert <abarnert at>
>> To: Nick Coghlan <ncoghlan at>
>> Cc: python-ideas <python-ideas at>
>> Subject: Re: [Python-ideas] Python Float Update
>> Message-ID: <EBB58361-19F4-4275-B6A6-E5AF2F77EF9C at>
>> Content-Type: text/plain;       charset=us-ascii
>> On Jun 1, 2015, at 17:08, Nick Coghlan <ncoghlan at> wrote:
>> >
>> > On 2 Jun 2015 08:44, "Andrew Barnert via Python-ideas"
>> > <python-ideas at> wrote:
>> >> But the basic idea can be extracted out and Pythonified:
>> >>
>> >> The literal 1.23 no longer gives you a float, but a FloatLiteral, which
>> is either a subclass of float, or an unrelated class that has a __float__
>> method. Doing any calculation on it gives you a float. But as long as you
>> leave it alone as a FloatLiteral, it has its literal characters available
>> for any function that wants to distinguish FloatLiteral from float, like
>> the Decimal constructor.
>> >>
>> >> The problem that Python faces that Swift doesn't is that Python doesn't
>> use static typing and implicit compile-time conversions. So in Python,
>> you'd be passing around these larger values and doing the slow conversions
>> at runtime. That may or may not be unacceptable; without actually building
>> it and testing some realistic programs it's pretty hard to guess.
>> >
>> > Joonas's suggestion of storing the original text representation passed
>> > to the float constructor is at least a novel one - it's only the idea
>> > of actual decimal literals that was ruled out in the past.
>> I actually built about half an implementation of something like Swift's
>> LiteralConvertible protocol back when I was teaching myself Swift. But I
>> think I have a simpler version that I could implement much more easily.
>> Basically, FloatLiteral is just a subclass of float whose __new__ stores
>> its constructor argument. Then decimal.Decimal checks for that stored
>> string and uses it instead of the float value if present. Then there's an
>> import hook that replaces every Num with a call to FloatLiteral.
>> This design doesn't actually fix everything; in effect, 1.3 actually
>> compiles to FloatLiteral(str(float('1.3')) (because by the time you get to
>> the AST it's too late to avoid that first conversion). Which does actually
>> solve the problem with 1.3, but doesn't solve everything in general (e.g.,
>> just feed in a number that has more precision than a double can hold but
>> less than your current decimal context can...).
>> But it just lets you test whether the implementation makes sense and what
>> the performance effects are, and it's only an hour of work, and doesn't
>> require anyone to patch their interpreter to play with it. If it seems
>> promising, then hacking the compiler so 2.3 compiles to FloatLiteral('2.3')
>> may be worth doing for a test of the actual functionality.
>> I'll be glad to hack it up when I get a chance tonight. But personally, I
>> think decimal literals are a better way to go here. Decimal(1.20) magically
>> doing what you want still has all the same downsides as 1.20d (or implicit
>> decimal literals), plus it's more complex, adds performance costs, and
>> doesn't provide nearly as much benefit. (Yes, Decimal(1.20) is a little
>> nicer than Decimal('1.20'), but only a little--and nowhere near as nice as
>> 1.20d).
>> > Aside from the practical implementation question, the main concern I
>> > have with it is that we'd be trading the status quo for a situation
>> > where "Decimal(1.3)" and "Decimal(13/10)" gave different answers.
>> Yes, to solve that you really need Decimal(13)/Decimal(10)... Which
>> implies that maybe the simplification in Decimal(1.3) is more misleading
>> than helpful. (Notice that this problem also doesn't arise for decimal
>> literals--13/10d is int vs. Decimal division, which is correct out of the
>> box. Or, if you want prefixes, d13/10 is Decimal vs. int division.)
>> > It seems to me that a potentially better option might be to adjust the
>> > implicit float->Decimal conversion in the Decimal constructor to use
>> > the same algorithm as we now use for float.__repr__ [1], where we look
>> > for the shortest decimal representation that gives the same answer
>> > when rendered as a float. At the moment you have to indirect through
>> > str() or repr() to get that behaviour:
>> >
>> >>>> from decimal import Decimal as D
>> >>>> 1.3
>> > 1.3
>> >>>> D('1.3')
>> > Decimal('1.3')
>> >>>> D(1.3)
>> > Decimal('1.3000000000000000444089209850062616169452667236328125')
>> >>>> D(str(1.3))
>> > Decimal('1.3')
>> >
>> > Cheers,
>> > Nick.
>> >
>> > [1]
>> ------------------------------
>> Message: 3
>> Date: Mon, 1 Jun 2015 19:00:48 -0700
>> From: Andrew Barnert <abarnert at>
>> To: Andrew Barnert <abarnert at>
>> Cc: Nick Coghlan <ncoghlan at>, python-ideas
>>         <python-ideas at>
>> Subject: Re: [Python-ideas] Python Float Update
>> Message-ID: <90691306-98E3-421B-ABEB-BA2DE05962C6 at>
>> Content-Type: text/plain;       charset=us-ascii
>> On Jun 1, 2015, at 18:27, Andrew Barnert via Python-ideas <
>> python-ideas at> wrote:
>> >
>> >> On Jun 1, 2015, at 17:08, Nick Coghlan <ncoghlan at> wrote:
>> >>
>> >> On 2 Jun 2015 08:44, "Andrew Barnert via Python-ideas"
>> >> <python-ideas at> wrote:
>> >>> But the basic idea can be extracted out and Pythonified:
>> >>>
>> >>> The literal 1.23 no longer gives you a float, but a FloatLiteral,
>> which is either a subclass of float, or an unrelated class that has a
>> __float__ method. Doing any calculation on it gives you a float. But as
>> long as you leave it alone as a FloatLiteral, it has its literal characters
>> available for any function that wants to distinguish FloatLiteral from
>> float, like the Decimal constructor.
>> >>>
>> >>> The problem that Python faces that Swift doesn't is that Python
>> doesn't use static typing and implicit compile-time conversions. So in
>> Python, you'd be passing around these larger values and doing the slow
>> conversions at runtime. That may or may not be unacceptable; without
>> actually building it and testing some realistic programs it's pretty hard
>> to guess.
>> >>
>> >> Joonas's suggestion of storing the original text representation passed
>> >> to the float constructor is at least a novel one - it's only the idea
>> >> of actual decimal literals that was ruled out in the past.
>> >
>> > I actually built about half an implementation of something like Swift's
>> LiteralConvertible protocol back when I was teaching myself Swift. But I
>> think I have a simpler version that I could implement much more easily.
>> >
>> > Basically, FloatLiteral is just a subclass of float whose __new__ stores
>> its constructor argument. Then decimal.Decimal checks for that stored
>> string and uses it instead of the float value if present. Then there's an
>> import hook that replaces every Num with a call to FloatLiteral.
>> >
>> > This design doesn't actually fix everything; in effect, 1.3 actually
>> compiles to FloatLiteral(str(float('1.3')) (because by the time you get to
>> the AST it's too late to avoid that first conversion). Which does actually
>> solve the problem with 1.3, but doesn't solve everything in general (e.g.,
>> just feed in a number that has more precision than a double can hold but
>> less than your current decimal context can...).
>> >
>> > But it just lets you test whether the implementation makes sense and
>> what the performance effects are, and it's only an hour of work,
>> Make that 15 minutes.
>> > and doesn't require anyone to patch their interpreter to play with it.
>> If it seems promising, then hacking the compiler so 2.3 compiles to
>> FloatLiteral('2.3') may be worth doing for a test of the actual
>> functionality.
>> >
>> > I'll be glad to hack it up when I get a chance tonight. But personally,
>> I think decimal literals are a better way to go here. Decimal(1.20)
>> magically doing what you want still has all the same downsides as 1.20d (or
>> implicit decimal literals), plus it's more complex, adds performance costs,
>> and doesn't provide nearly as much benefit. (Yes, Decimal(1.20) is a little
>> nicer than Decimal('1.20'), but only a little--and nowhere near as nice as
>> 1.20d).
>> >
>> >> Aside from the practical implementation question, the main concern I
>> >> have with it is that we'd be trading the status quo for a situation
>> >> where "Decimal(1.3)" and "Decimal(13/10)" gave different answers.
>> >
>> > Yes, to solve that you really need Decimal(13)/Decimal(10)... Which
>> implies that maybe the simplification in Decimal(1.3) is more misleading
>> than helpful. (Notice that this problem also doesn't arise for decimal
>> literals--13/10d is int vs. Decimal division, which is correct out of the
>> box. Or, if you want prefixes, d13/10 is Decimal vs. int division.)
>> >
>> >> It seems to me that a potentially better option might be to adjust the
>> >> implicit float->Decimal conversion in the Decimal constructor to use
>> >> the same algorithm as we now use for float.__repr__ [1], where we look
>> >> for the shortest decimal representation that gives the same answer
>> >> when rendered as a float. At the moment you have to indirect through
>> >> str() or repr() to get that behaviour:
>> >>
>> >>>>> from decimal import Decimal as D
>> >>>>> 1.3
>> >> 1.3
>> >>>>> D('1.3')
>> >> Decimal('1.3')
>> >>>>> D(1.3)
>> >> Decimal('1.3000000000000000444089209850062616169452667236328125')
>> >>>>> D(str(1.3))
>> >> Decimal('1.3')
>> >>
>> >> Cheers,
>> >> Nick.
>> >>
>> >> [1]
>> > _______________________________________________
>> > Python-ideas mailing list
>> > Python-ideas at
>> >
>> > Code of Conduct:
>> ------------------------------
>> Message: 4
>> Date: Tue, 2 Jun 2015 11:58:09 +1000
>> From: Steven D'Aprano <steve at>
>> To: python-ideas at
>> Subject: Re: [Python-ideas] Python Float Update
>> Message-ID: <20150602015809.GE932 at>
>> Content-Type: text/plain; charset=us-ascii
>> On Tue, Jun 02, 2015 at 10:08:37AM +1000, Nick Coghlan wrote:
>> > It seems to me that a potentially better option might be to adjust the
>> > implicit float->Decimal conversion in the Decimal constructor to use
>> > the same algorithm as we now use for float.__repr__ [1], where we look
>> > for the shortest decimal representation that gives the same answer
>> > when rendered as a float. At the moment you have to indirect through
>> > str() or repr() to get that behaviour:
>> Apart from the questions of whether such a change would be allowed by
>> the Decimal specification, and the breaking of backwards compatibility,
>> I would really hate that change for another reason.
>> At the moment, a good, cheap way to find out what a binary float "really
>> is" (in some sense) is to convert it to Decimal and see what you get:
>> Decimal(1.3)
>> -> Decimal('1.3000000000000000444089209850062616169452667236328125')
>> If you want conversion from repr, then you can be explicit about it:
>> Decimal(repr(1.3))
>> -> Decimal('1.3')
>> ("Explicit is better than implicit", as they say...)
>> Although in fairness I suppose that if this change happens, we could
>> keep the old behaviour in the from_float method:
>> # hypothetical future behaviour
>> Decimal(1.3)
>> -> Decimal('1.3')
>> Decimal.from_float(1.3)
>> -> Decimal('1.3000000000000000444089209850062616169452667236328125')
>> But all things considered, I don't think we're doing people any favours
>> by changing the behaviour of float->Decimal conversions to implicitly
>> use the repr() instead of being exact. I expect this strategy is like
>> trying to flatten a bubble under wallpaper: all you can do is push the
>> gotchas and surprises to somewhere else.
>> Oh, another thought... Decimals could gain yet another conversion
>> method, one which implicitly uses the float repr, but signals if it was
>> an inexact conversion or not. Explicitly calling repr can never signal,
>> since the conversion occurs outside of the Decimal constructor and
>> Decimal sees only the string:
>> Decimal(repr(1.3)) cannot signal Inexact.
>> But:
>> Decimal.from_nearest_float(1.5)  # exact
>> Decimal.from_nearest_float(1.3)  # signals Inexact
>> That might be useful, but probably not to beginners.
>> --
>> Steve
>> ------------------------------
>> ------------------------------
>> End of Python-ideas Digest, Vol 103, Issue 14
>> *********************************************
> --
> -Surya Subbarao
> ------------------------------
> End of Python-ideas Digest, Vol 103, Issue 16
> *********************************************

-Surya Subbarao

From stephen at  Tue Jun  2 06:21:48 2015
From: stephen at (Stephen J. Turnbull)
Date: Tue, 02 Jun 2015 13:21:48 +0900
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

Joonas Liik writes:

 > Having some sort of decimal literal would have some advantages of its own,
 > for one it could help against this silliness:

 > I'm not saying that the actual data type needs to be a decimal (
 > might well be a float but say shove the string repr next to it so
 > it can be accessed when needed)

That *would* be a different type from float.  You may as well go all
the way to Decimal.

 > ..but this is one really common pitfall for new users,

To fix it, you really need to change the parser, i.e., make Decimal
the default type for non-integral numbers.  "Decimal('1.3')" isn't
that much harder to remember than "1.3$" (although it's quite a bit
more to type).  But people are going to continue writing things like

    pennies = 13
    pennies_per_dollar = 100
    dollars = pennies / pennies_per_dollar
    # Much later ...
    future_value = dollars * Decimal('1.07')

And in real applications you're going to be using Decimal in code like

    from decimal import Decimal

    def inputDecimals(file):
        # Build the matrix row by row; values never pass through float.
        matrix = []
        for line in file:
            matrix.append([Decimal(value) for value in line.strip().split()])
        return matrix


    def what_if():
        principal = Decimal(input("Principal ($): "))
        rate = Decimal(input("Interest rate (%): "))
        print("Future value is ",
              principal * (1 + rate/100),
              ".", sep="")

and the whole issue evaporates.
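
For illustration, here is a minimal sketch of both halves of that
argument, assuming Python 3 and the standard decimal module. The names
mixed_ok, exact_dollars and future_value are hypothetical. It shows that
dividing two ints gives a binary float, that multiplying that float by a
Decimal raises TypeError, and that keeping the operands as Decimal from
the start keeps the arithmetic exact.

```python
from decimal import Decimal

pennies = 13
pennies_per_dollar = 100

# True division of two ints yields a binary float, not a Decimal.
dollars = pennies / pennies_per_dollar
assert isinstance(dollars, float)

# Mixing that float with a Decimal in arithmetic raises TypeError.
try:
    dollars * Decimal('1.07')
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Starting from Decimal keeps every step exact (ints mix freely with Decimal).
exact_dollars = Decimal(pennies) / pennies_per_dollar
future_value = exact_dollars * Decimal('1.07')
```

Here mixed_ok ends up False, so a decimal literal on the last line alone
would not rescue code that has already gone through float.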

From guido at  Tue Jun  2 06:31:47 2015
From: guido at (Guido van Rossum)
Date: Mon, 1 Jun 2015 21:31:47 -0700
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jun 1, 2015 at 9:21 PM, Stephen J. Turnbull <stephen at> wrote:

> Joonas Liik writes:
>  > Having some sort of decimal literal would have some advantages of its
> own,
>  > for one it could help against this silliness:
>  > I'm not saying that the actual data type needs to be a decimal (
>  > might well be a float but say shove the string repr next to it so
>  > it can be accessed when needed)
> That *would* be a different type from float.

Shudder indeed.

> You may as well go all
> the way to Decimal.

Or perhaps switch to decimal64 ( (Or its
bigger cousin, decimal128)

--Guido van Rossum (

From stephen at  Tue Jun  2 06:41:02 2015
From: stephen at (Stephen J. Turnbull)
Date: Tue, 02 Jun 2015 13:41:02 +0900
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

Nick Coghlan writes:

 > the main concern I have with [a FloatLiteral that carries the
 > original repr around] is that we'd be trading the status quo for a
 > situation where "Decimal(1.3)" and "Decimal(13/10)" gave different
 > answers.

Yeah, and that kills the deal for me.  Either Decimal is the default
representation for non-integers, or this is a no-go.  And that isn't
going to happen.
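
A minimal sketch of the status quo Nick refers to, assuming current
Python 3 behaviour and the standard decimal module (the variable names
are hypothetical): today both spellings pass through the same binary
float and agree with each other, and neither equals the decimal value
1.3. Only doing the division in Decimal gives that.

```python
from decimal import Decimal

# Both spellings are built from the same binary float, so they agree.
from_literal = Decimal(1.3)
from_division = Decimal(13 / 10)
assert from_literal == from_division

# Neither of them is the decimal value 1.3.
assert from_literal != Decimal('1.3')

# Doing the division itself in Decimal gives exactly 1.3.
exact = Decimal(13) / Decimal(10)
assert exact == Decimal('1.3')
```

A FloatLiteral that carried its source text would change only the first
spelling, which is the inconsistency being objected to.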

From random832 at  Tue Jun  2 06:47:02 2015
From: random832 at (random832 at
Date: Tue, 02 Jun 2015 00:47:02 -0400
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jun 2, 2015, at 00:31, Guido van Rossum wrote:
> Or perhaps switch to decimal64 (
> (Or its
> bigger cousin, decimal128)

Does anyone know if any common computer architectures have any hardware
support for this? Are there any known good implementations for all the
functions in math/cmath for these types?

Moving to a fixed-size floating point type does have the advantage of
not requiring making all these decisions about environments and
precision and potentially unbounded growth etc.
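
As a rough sketch of what a fixed-size type would pin down, the decimal
module can already emulate decimal64's limits in software. This assumes
the decimal64 interchange parameters (16 significant digits, exponents
from -383 to 384, round-half-even); the name DECIMAL64 is hypothetical,
and none of this says anything about hardware support or speed.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

# A context mimicking decimal64: fixed precision and exponent range.
# clamp=1 keeps exponents within what the fixed-width format can encode.
DECIMAL64 = Context(prec=16, Emin=-383, Emax=384,
                    rounding=ROUND_HALF_EVEN, clamp=1)

# Arithmetic done through the context is rounded to 16 digits.
third = DECIMAL64.divide(Decimal(1), Decimal(3))

# Inputs with more digits are rounded when created through the context.
rounded = DECIMAL64.create_decimal('1.23456789012345678')
```

With the context fixed like this, there are no per-program decisions
left about precision or growth, which is the advantage described above.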

From ncoghlan at  Tue Jun  2 07:10:22 2015
From: ncoghlan at (Nick Coghlan)
Date: Tue, 2 Jun 2015 15:10:22 +1000
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On 2 June 2015 at 13:10, Andrew Barnert via Python-ideas
<python-ideas at> wrote:
> On Jun 1, 2015, at 18:58, Steven D'Aprano <steve at> wrote:
>> Apart from the questions of whether such a change would be allowed by
>> the Decimal specification,
> As far as I know, GDAS doesn't specify anything about implicit conversion from floats. As long as the required explicit conversion function (which I think is from_float?) exists and does the required thing.
> As a side note, has anyone considered whether it's worth switching to IEEE-754-2008 as the controlling specification? There may be a good reason not to do so; I'm just curious whether someone has thought it through and made the case.

As far as I know, nobody has looked into it. If there aren't any
meaningful differences, we should just switch, if there are
differences, we should probably switch anyway, but it will be more
work (and hence will require volunteers willing to do that work).
Either way, the starting point would be an assessment of what the
differences are, and whether or not they have any implications for the
decimal module and cdecimal.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From abarnert at  Tue Jun  2 07:15:35 2015
From: abarnert at (Andrew Barnert)
Date: Mon, 1 Jun 2015 22:15:35 -0700
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 1, 2015, at 21:47, random832 at wrote:
>> On Tue, Jun 2, 2015, at 00:31, Guido van Rossum wrote:
>> Or perhaps switch to decimal64 (
>> (Or its
>> bigger cousin, decimal128)
> Does anyone know if any common computer architectures have any hardware
> support for this?

IBM's RS/POWER architecture supports decimal32, 64, and 128. The PowerPC and Cell offshoots only support them in some models, not all. Is that common enough? (Is _anything_ common enough besides x86, x86_64, ARM7, ARM8, and various less-capable things like embedded 68k variants?)

> Are there any known good implementations for all the
> functions in math/cmath for these types?

Intel wrote a reference implementation for IEEE 754-2008 as part of the standardization process. And since then, they've focused on improvements geared at making it possible to write highly-optimized financial applications in C or C++ that run on x86_64 hardware. And I think it's BSD-licensed. It's available somewhere on netlib, but searching that repo is no fun on my phone (plus, most of Intel's code, you can't see the license or the detailed README until you unpack it...), so I'll leave it to someone else to find it.

Of course 754-2008 isn't necessarily identical to GDAS (which is what POWER implements, and Python's decimal module).

> Moving to a fixed-size floating point type does have the advantage of
> not requiring making all these decisions about environments and
> precision and potentially unbounded growth etc.

From random832 at  Tue Jun  2 07:23:57 2015
From: random832 at (random832 at
Date: Tue, 02 Jun 2015 01:23:57 -0400
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jun 2, 2015, at 01:10, Nick Coghlan wrote:
> Either way, the starting point would be an assessment of what the
> differences are, and whether or not they have any implications for the
> decimal module and cdecimal.

Does IEEE even have anything about arbitrary-precision decimal types
(which are what decimal/cdecimal are)?

From abarnert at  Tue Jun  2 07:22:07 2015
From: abarnert at (Andrew Barnert)
Date: Mon, 1 Jun 2015 22:22:07 -0700
Subject: [Python-ideas] Python-ideas Digest, Vol 103, Issue 15
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 1, 2015, at 20:41, u8y7541 The Awesome Person <surya.subbarao1 at> wrote:
> I think you're right. I was also considering ... "editing" my Python distribution. If they didn't implement my suggestion for correcting floats, at least they can fix this, instead of making people hack Python for good results!

If you're going to reply to digests, please learn how to reply inline instead of top-posting (and how to trim out all the irrelevant stuff). It's next to impossible to tell which part of which of the messages you're replying to even in simple cases like this one, with only 4 messages in the digest.

>> On Mon, Jun 1, 2015 at 8:10 PM, <python-ideas-request at> wrote:
>> Today's Topics:
>>    1. Re: Python Float Update (Steven D'Aprano)
>>    2. Re: Python Float Update (Andrew Barnert)
>>    3. Re: Python Float Update (Steven D'Aprano)
>>    4. Re: Python Float Update (Andrew Barnert)
>> ----------------------------------------------------------------------
>> Message: 1
>> Date: Tue, 2 Jun 2015 11:37:48 +1000
>> From: Steven D'Aprano <steve at>
>> To: python-ideas at
>> Subject: Re: [Python-ideas] Python Float Update
>> Message-ID: <20150602013748.GD932 at>
>> Content-Type: text/plain; charset=us-ascii
>> On Mon, Jun 01, 2015 at 05:52:35PM +0300, Joonas Liik wrote:
>> > Having some sort of decimal literal would have some advantages of its own,
>> > for one it could help against this silliness:
>> >
>> > >>> Decimal(1.3)
>> > Decimal('1.3000000000000000444089209850062616169452667236328125')
>> Why is that silly? That's the actual value of the binary float 1.3
>> converted into base 10. If you want 1.3 exactly, you can do this:
>> > >>> Decimal('1.3')
>> > Decimal('1.3')
>> Is that really so hard for people to learn?
>> > I'm not saying that the actual data type needs to be a decimal (
>> > might well be a float but say shove the string repr next to it so it can be
>> > accessed when needed)
>> You want Decimals to *lie* about what value they have?
>> I think that's a terrible idea, one which would lead to a whole set of
>> new and exciting surprises when using Decimal. Let me try to predict a
>> few of the questions on Stackoverflow which would follow this change...
>>   Why is equality so inaccurate in Python?
>>   py> x = Decimal(1.3)
>>   py> y = Decimal('1.3')
>>   py> x, y
>>   (Decimal('1.3'), Decimal('1.3'))
>>   py> x == y
>>   False
>>   Why does Python insert extra digits into numbers when I multiply?
>>   py> x = Decimal(1.3)
>>   py> x
>>   Decimal('1.3')
>>   py> y = 10000000000000000*x
>>   py> y - 13000000000000000
>>   Decimal('0.444089209850062616169452667236328125')
>> > ..but this is one really common pitfall for new users, I know it's easy to
>> > fix the code above,
>> > but this behavior is very unintuitive.. you essentially get a really
>> > expensive float when you do the obvious thing.
>> Then don't do the obvious thing.
>> Sometimes there really is no good alternative to actually knowing what
>> you are doing. Floating point maths is inherently hard, but that's not
>> the problem. There are all sorts of things in programming which are
>> hard, and people learn how to deal with them. The problem is that people
>> *imagine* that floating point is simple, when it is not and can never
>> be. We don't do them any favours by enabling that delusion.
>> If your needs are light, then you can ignore the complexities of
>> floating point. You really can go a very long way by just rounding the
>> results of your calculations when displaying them. But for anything more
>> than that, we cannot just paper over the floating point complexities
>> without creating new complexities that will burn people.
>> You don't have to become a floating point guru, but it really isn't
>> onerous to expect people who are programming to learn a few basic
>> programming skills, and that includes a few basic coping strategies for
>> floating point.
>> --
>> Steve
>> ------------------------------
>> Message: 2
>> Date: Mon, 1 Jun 2015 19:21:47 -0700
>> From: Andrew Barnert <abarnert at>
>> To: Andrew Barnert <abarnert at>
>> Cc: Nick Coghlan <ncoghlan at>, python-ideas
>>         <python-ideas at>
>> Subject: Re: [Python-ideas] Python Float Update
>> Message-ID: <5E8271BF-183E-496D-A556-81C407977FFE at>
>> Content-Type: text/plain;       charset=us-ascii
>> On Jun 1, 2015, at 19:00, Andrew Barnert <abarnert at> wrote:
>> >
>> >> On Jun 1, 2015, at 18:27, Andrew Barnert via Python-ideas <python-ideas at> wrote:
>> >>
>> >>> On Jun 1, 2015, at 17:08, Nick Coghlan <ncoghlan at> wrote:
>> >>>
>> >>> On 2 Jun 2015 08:44, "Andrew Barnert via Python-ideas"
>> >>> <python-ideas at> wrote:
>> >>>> But the basic idea can be extracted out and Pythonified:
>> >>>>
>> >>>> The literal 1.23 no longer gives you a float, but a FloatLiteral, which is either a subclass of float, or an unrelated class that has a __float__ method. Doing any calculation on it gives you a float. But as long as you leave it alone as a FloatLiteral, it has its literal characters available for any function that wants to distinguish FloatLiteral from float, like the Decimal constructor.
>> >>>>
>> >>>> The problem that Python faces that Swift doesn't is that Python doesn't use static typing and implicit compile-time conversions. So in Python, you'd be passing around these larger values and doing the slow conversions at runtime. That may or may not be unacceptable; without actually building it and testing some realistic programs it's pretty hard to guess.
>> >>>
>> >>> Joonas's suggestion of storing the original text representation passed
>> >>> to the float constructor is at least a novel one - it's only the idea
>> >>> of actual decimal literals that was ruled out in the past.
>> >>
>> >> I actually built about half an implementation of something like Swift's LiteralConvertible protocol back when I was teaching myself Swift. But I think I have a simpler version that I could implement much more easily.
>> >>
>> >> Basically, FloatLiteral is just a subclass of float whose __new__ stores its constructor argument. Then decimal.Decimal checks for that stored string and uses it instead of the float value if present. Then there's an import hook that replaces every Num with a call to FloatLiteral.
>> >>
>> >> This design doesn't actually fix everything; in effect, 1.3 actually compiles to FloatLiteral(str(float('1.3'))) (because by the time you get to the AST it's too late to avoid that first conversion). Which does actually solve the problem with 1.3, but doesn't solve everything in general (e.g., just feed in a number that has more precision than a double can hold but less than your current decimal context can...).
>> >>
>> >> But it just lets you test whether the implementation makes sense and what the performance effects are, and it's only an hour of work,
>> >
>> > Make that 15 minutes.
>> >
>> >
>> And as it turns out, hacking the tokens is no harder than hacking the AST (in fact, it's a little easier; I'd just never done it before), so now it does that, meaning you really get the actual literal string from the source, not the repr of the float of that string literal.
>> Turning this into a real implementation would obviously be more than half an hour's work, but not more than a day or two. Again, I don't think anyone would actually want this, but now people who think they do have an implementation to play with to prove me wrong.
>> >> and doesn't require anyone to patch their interpreter to play with it. If it seems promising, then hacking the compiler so 2.3 compiles to FloatLiteral('2.3') may be worth doing for a test of the actual functionality.
>> >>
>> >> I'll be glad to hack it up when I get a chance tonight. But personally, I think decimal literals are a better way to go here. Decimal(1.20) magically doing what you want still has all the same downsides as 1.20d (or implicit decimal literals), plus it's more complex, adds performance costs, and doesn't provide nearly as much benefit. (Yes, Decimal(1.20) is a little nicer than Decimal('1.20'), but only a little--and nowhere near as nice as 1.20d).
>> >>
>> >>> Aside from the practical implementation question, the main concern I
>> >>> have with it is that we'd be trading the status quo for a situation
>> >>> where "Decimal(1.3)" and "Decimal(13/10)" gave different answers.
>> >>
>> >> Yes, to solve that you really need Decimal(13)/Decimal(10)... Which implies that maybe the simplification in Decimal(1.3) is more misleading than helpful. (Notice that this problem also doesn't arise for decimal literals--13/10d is int vs. Decimal division, which is correct out of the box. Or, if you want prefixes, d13/10 is Decimal vs. int division.)
>> >>
>> >>> It seems to me that a potentially better option might be to adjust the
>> >>> implicit float->Decimal conversion in the Decimal constructor to use
>> >>> the same algorithm as we now use for float.__repr__ [1], where we look
>> >>> for the shortest decimal representation that gives the same answer
>> >>> when rendered as a float. At the moment you have to indirect through
>> >>> str() or repr() to get that behaviour:
>> >>>
>> >>>>>> from decimal import Decimal as D
>> >>>>>> 1.3
>> >>> 1.3
>> >>>>>> D('1.3')
>> >>> Decimal('1.3')
>> >>>>>> D(1.3)
>> >>> Decimal('1.3000000000000000444089209850062616169452667236328125')
>> >>>>>> D(str(1.3))
>> >>> Decimal('1.3')
>> >>>
>> >>> Cheers,
>> >>> Nick.
>> >>>
>> >>> [1]
>> ------------------------------
>> Message: 3
>> Date: Tue, 2 Jun 2015 13:00:40 +1000
>> From: Steven D'Aprano <steve at>
>> To: python-ideas at
>> Subject: Re: [Python-ideas] Python Float Update
>> Message-ID: <20150602030040.GF932 at>
>> Content-Type: text/plain; charset=utf-8
>> Nicholas,
>> Your email client appears to not be quoting text you quote. It is
>> conventional to use a leading > for quoting, perhaps you could configure
>> your mail program to do so? The good ones even have a "Paste As Quote"
>> command.
>> On with the substance of your post...
>> On Mon, Jun 01, 2015 at 01:24:32PM -0400, Nicholas Chammas wrote:
>> > I guess it's a non-trivial tradeoff. But I would lean towards considering
>> > people likely to be affected by the performance hit as doing something "not
>> > common". Like, if they are doing that many calculations that it matters,
>> > perhaps it makes sense to ask them to explicitly ask for floats vs.
>> > decimals, in exchange for giving the majority who wouldn't notice a
>> > performance difference a better user experience.
>> Changing from binary floats to decimal floats by default is a big,
>> backwards incompatible change. Even if it's a good idea, we're
>> constrained by backwards compatibility: I would imagine we wouldn't want
>> to even introduce this feature until the majority of people are using
>> Python 3 rather than Python 2, and then we'd probably want to introduce
>> it using a "from __future__ import decimal_floats" directive.
>> So I would guess this couldn't happen until probably 2020 or so.
>> But we could introduce a decimal literal, say 1.1d for Decimal("1.1").
>> The first prerequisite is that we have a fast Decimal implementation,
>> which we now have. Next we would have to decide how the decimal literals
>> would interact with the decimal module. Do we include full support of
>> the entire range of decimal features, including globally configurable
>> precision and other modes? Or just a subset? How will these decimals
>> interact with other numeric types, like float and Fraction? At the
>> moment, Decimal isn't even part of the numeric tower.
>> There's a lot of ground to cover, it's not a trivial change, and will
>> definitely need a PEP.
>> > How many of your examples are inherent limitations of decimals vs. problems
>> > that can be improved upon?
>> In one sense, they are inherent limitations of floating point numbers
>> regardless of base. Whether binary, decimal, hexadecimal as used in some
>> IBM computers, or something else, you're going to see the same problems.
>> Only the specific details will vary, e.g. 1/3 cannot be represented
>> exactly in base 2 or base 10, but if you constructed a base 3 float, it
>> would be exact.
>> In another sense, Decimal has a big advantage that it is much more
>> configurable than Python's floats. Decimal lets you configure the
>> precision, rounding mode, error handling and more. That's not inherent
>> to base 10 calculations, you can do exactly the same thing for binary
>> floats too, but Python doesn't offer that feature for floats, only for
>> Decimals.
>> But no matter how you configure Decimal, all you can do is shift the
>> gotchas around. The issue really is inherent to the nature of the
>> problem, and you cannot defeat the universe. Regardless of what
>> base you use, binary or decimal or something else, or how many digits
>> precision, you're still trying to simulate an uncountably infinite
>> continuous, infinitely divisible number line using a finite,
>> discontinuous set of possible values. Something has to give.
>> (For the record, when I say "uncountably infinite", I don't just mean
>> "too many to count", it's a technical term. To oversimplify horribly, it
>> means "larger than infinity" in some sense. It's off-topic for here,
>> but if anyone is interested in learning more, you can email me off-list,
>> or google for "countable vs uncountable infinity".)
>> Basically, you're trying to squeeze an infinite number of real numbers
>> into a finite amount of memory. It can't be done. Consequently, there
>> will *always* be some calculations where the true value simply cannot be
>> calculated and the answer you get is slightly too big or slightly too
>> small. All the other floating point gotchas follow from that simple
>> fact.
>> > Admittedly, the only place where I've played with decimals extensively is
>> > on Microsoft's SQL Server (where they are the default literal
>> > <>). I've stumbled in
>> > the past on my own decimal gotchas
>> > <>, but looking at your examples
>> > and trying them on SQL Server I suspect that most of the problems you show
>> > are problems of precision and scale.
>> No. Change the precision and scale, and some *specific* problems go
>> away, but they reappear with other numbers.
>> Besides, at the point that you're talking about setting the precision,
>> we're really not talking about making things easy for beginners any
>> more.
>> And not all floating point issues are related to precision and scale in
>> decimal. You cannot divide a cake into exactly three equal pieces in
>> Decimal any more than you can divide a cake into exactly three equal
>> pieces in binary. All you can hope for is to choose a precision where the
>> rounding errors in one part of your calculation will be cancelled by the
>> rounding errors in another part of your calculation. And that precision
>> will be different for any two arbitrary calculations.
>> --
>> Steve
>> ------------------------------
>> Message: 4
>> Date: Mon, 1 Jun 2015 20:10:29 -0700
>> From: Andrew Barnert <abarnert at>
>> To: Steven D'Aprano <steve at>
>> Cc: "python-ideas at" <python-ideas at>
>> Subject: Re: [Python-ideas] Python Float Update
>> Message-ID: <79C16144-8BF7-4260-A356-DD4E8D97BAAD at>
>> Content-Type: text/plain;       charset=us-ascii
>> On Jun 1, 2015, at 18:58, Steven D'Aprano <steve at> wrote:
>> >
>> >> On Tue, Jun 02, 2015 at 10:08:37AM +1000, Nick Coghlan wrote:
>> >>
>> >> It seems to me that a potentially better option might be to adjust the
>> >> implicit float->Decimal conversion in the Decimal constructor to use
>> >> the same algorithm as we now use for float.__repr__ [1], where we look
>> >> for the shortest decimal representation that gives the same answer
>> >> when rendered as a float. At the moment you have to indirect through
>> >> str() or repr() to get that behaviour:
>> >
>> > Apart from the questions of whether such a change would be allowed by
>> > the Decimal specification,
>> As far as I know, GDAS doesn't specify anything about implicit conversion from floats. As long as the required explicit conversion function (which I think is from_float?) exists and does the required thing.
>> As a side note, has anyone considered whether it's worth switching to IEEE-754-2008 as the controlling specification? There may be a good reason not to do so; I'm just curious whether someone has thought it through and made the case.
>> > and the breaking of backwards compatibility,
>> > I would really hate that change for another reason.
>> >
>> > At the moment, a good, cheap way to find out what a binary float "really
>> > is" (in some sense) is to convert it to Decimal and see what you get:
>> >
>> > Decimal(1.3)
>> > -> Decimal('1.3000000000000000444089209850062616169452667236328125')
>> >
>> > If you want conversion from repr, then you can be explicit about it:
>> >
>> > Decimal(repr(1.3))
>> > -> Decimal('1.3')
>> >
>> > ("Explicit is better than implicit", as they say...)
>> >
>> > Although in fairness I suppose that if this change happens, we could
>> > keep the old behaviour in the from_float method:
>> >
>> > # hypothetical future behaviour
>> > Decimal(1.3)
>> > -> Decimal('1.3')
>> > Decimal.from_float(1.3)
>> > -> Decimal('1.3000000000000000444089209850062616169452667236328125')
>> >
>> > But all things considered, I don't think we're doing people any favours
>> > by changing the behaviour of float->Decimal conversions to implicitly
>> > use the repr() instead of being exact. I expect this strategy is like
>> > trying to flatten a bubble under wallpaper: all you can do is push the
>> > gotchas and surprises to somewhere else.
>> >
>> > Oh, another thought... Decimals could gain yet another conversion
>> > method, one which implicitly uses the float repr, but signals if it was
>> > an inexact conversion or not. Explicitly calling repr can never signal,
>> > since the conversion occurs outside of the Decimal constructor and
>> > Decimal sees only the string:
>> >
>> > Decimal(repr(1.3)) cannot signal Inexact.
>> >
>> > But:
>> >
>> > Decimal.from_nearest_float(1.5)  # exact
>> > Decimal.from_nearest_float(1.3)  # signals Inexact
>> >
>> > That might be useful, but probably not to beginners.
>> I think this might be worth having whether the default constructor is changed or not.
>> I can't think of too many programs where I'm pretty sure I have an exactly-representable decimal as a float but want to check to be sure... but for interactive use in IPython (especially when I'm specifically trying to explain to someone why just using Decimal instead of float will/will not solve their problem) I could see using it.
>> ------------------------------
>> ------------------------------
>> End of Python-ideas Digest, Vol 103, Issue 15
>> *********************************************
> -- 
> -Surya Subbarao

From abarnert at  Tue Jun  2 08:40:34 2015
From: abarnert at (Andrew Barnert)
Date: Tue, 2 Jun 2015 06:40:34 +0000 (UTC)
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On Monday, June 1, 2015 10:23 PM, "random832 at" <random832 at> wrote:

>Does IEEE even have anything about arbitrary-precision decimal types
>(which are what decimal/cdecimal are)?

Yes.

When many people say "IEEE float" they still mean 754-1985. This is what C90 was designed to "support without quite supporting", and what C99 explicitly supports, and what many consumer FPUs support (or, in the case of the 8087 and its successors, a preliminary version of the 1985 standard). That standard did not cover either arbitrary precision or decimals; both of those were only part of the companion standard 854 (which isn't complete enough to base an implementation on).

But the current version of the standard, 754-2008, does cover arbitrary-precision decimal types.

If I understand the relationship between the standards: 754-2008 was designed to merge 754-1985 and 854-1987, fill in the gaps, and fix any bugs; GDAS was a major influence (the committee chair was GDAS's author); and since 2009 GDAS has gone from being a de facto independent standard to being a more-specific specification of the relevant subset of 754-2008. IBM's hardware and Java library implement GDAS (and therefore implicitly the relevant part of 754-2008); Itanium (partly), C11, the gcc extensions, and Intel's C library implement 754-2008 (or IEC 60559, which is just a republished 754-2008).

So, my guess is that GDAS makes perfect sense to follow unless Python wants to expose C11's native fixed decimals, or the newer math.h functions from C99/C11/C14, or the other parts of 754-2008 that it doesn't support (like arbitrary-precision binary). My question was just whether someone had actually made that decision, or whether decimal is following GDAS just because that was the obvious decision to make in 2003.

From mal at  Tue Jun  2 09:53:03 2015
From: mal at (M.-A. Lemburg)
Date: Tue, 02 Jun 2015 09:53:03 +0200
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On 02.06.2015 08:40, Andrew Barnert via Python-ideas wrote:
> On Monday, June 1, 2015 10:23 PM, "random832 at" <random832 at> wrote:
>> Does IEEE even have anything about arbitrary-precision decimal types
>> (which are what decimal/cdecimal are)?
> Yes.
> When many people say "IEEE float" they still mean 754-1985. This is what C90 was designed to "support without quite supporting", and what C99 explicitly supports, and what many consumer FPUs support (or, in the case of the 8087 and its successors, a preliminary version of the 1985 standard). That standard did not cover either arbitrary precision or decimals; both of those were only part of the companion standard 854 (which isn't complete enough to base an implementation on).
> But the current version of the standard, 754-2008, does cover arbitrary-precision decimal types.
> If I understand the relationship between the standards: 754-2008 was designed to merge 754-1985 and 854-1987, fill in the gaps, and fix any bugs; GDAS was a major influence (the committee chair was GDAS's author); and since 2009 GDAS has gone from being a de facto independent standard to being a more-specific specification of the relevant subset of 754-2008. IBM's hardware and Java library implement GDAS (and therefore implicitly the relevant part of 754-2008); Itanium (partly), C11, the gcc extensions, and Intel's C library implement 754-2008 (or IEC 60559, which is just a republished 754-2008).
> So, my guess is that GDAS makes perfect sense to follow unless Python wants to expose C11's native fixed decimals, or the newer math.h functions from C99/C11/C14, or the other parts of 754-2008 that it doesn't support (like arbitrary-precision binary). My question was just whether someone had actually made that decision, or whether decimal is following GDAS just because that was the obvious decision to make in 2003.

The IBM decimal implementation by Mike Cowlishaw was chosen
as basis for the Python's decimal implementation at the time,
so yes, this was an explicit design choice at the time:

According to the PEP, decimal implements IEEE 854-1987 (with some

Marc-Andre Lemburg

Professional Python Services directly from the Source  (#1, Jun 02 2015)
>>> Python Projects, Coaching and Consulting ...
>>> mxODBC Plone/Zope Database Adapter ...
>>> mxODBC, mxDateTime, mxTextTools ...

::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611

From p.f.moore at  Tue Jun  2 10:13:39 2015
From: p.f.moore at (Paul Moore)
Date: Tue, 2 Jun 2015 09:13:39 +0100
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On 2 June 2015 at 02:37, Steven D'Aprano <steve at> wrote:
> Sometimes there really is no good alternative to actually knowing what
> you are doing.


From mal at  Tue Jun  2 10:19:39 2015
From: mal at (M.-A. Lemburg)
Date: Tue, 02 Jun 2015 10:19:39 +0200
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>
Message-ID: <>

On 02.06.2015 03:37, Steven D'Aprano wrote:
> On Mon, Jun 01, 2015 at 05:52:35PM +0300, Joonas Liik wrote:
>> Having some sort of decimal literal would have some advantages of its own,
>> for one it could help against this sillyness:
>>>>> Decimal(1.3)
>> Decimal('1.3000000000000000444089209850062616169452667236328125')
> Why is that silly? That's the actual value of the binary float 1.3 
> converted into base 10. If you want 1.3 exactly, you can do this:
>>>>> Decimal('1.3')
>> Decimal('1.3')
> Is that really so hard for people to learn? 

Joonas, I think you're approaching this from the wrong angle.

People who want to get an exact decimal from a literal will
use the string representation to define it, not a float
representation.

In practice, you typically read the data from some file or stream
anyway, so it already comes as a string value, and if you want to
convert an actual float to a decimal, this will most likely
not be done in a literal way, but instead by passing it in to
the Decimal constructor as a variable, so there's no literal
involved.

It may be good to provide some alternative ways of converting
a float to a decimal, e.g. one which uses the float repr logic
to overcome things like repr(float(1.1)) == '1.1000000000000001'
instead of a direct conversion:

>>> Decimal(1.1)
Decimal('1.100000000000000088817841970012523233890533447265625')
>>> Decimal(repr(1.1))
Decimal('1.1')

These could be added as parameter to the Decimal constructor.
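
A minimal sketch of the two conversions being compared, using only the
existing Decimal constructor. The helper name decimal_from_float_repr is
hypothetical, not an existing API, and on current Python versions
repr(1.1) already yields the short form '1.1'.

```python
from decimal import Decimal

def decimal_from_float_repr(x):
    """Hypothetical helper: convert a float via its repr string."""
    return Decimal(repr(x))

exact = Decimal(1.1)                   # exact value of the binary float
short = decimal_from_float_repr(1.1)   # value of the repr string

print(exact)
print(short)
print(exact == short)                  # the two values differ
```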

Marc-Andre Lemburg

From dennis at  Tue Jun  2 10:23:35 2015
From: dennis at (Dennis Kaarsemaker)
Date: Tue, 02 Jun 2015 10:23:35 +0200
Subject: [Python-ideas] Explicitly shared objects with sub modules vs
In-Reply-To: <mkclvj$390$>
References: <mkclvj$390$>
Message-ID: <>

On za, 2015-05-30 at 11:45 -0400, Ron Adam wrote:

> The solution I found was to call a function to explicitly set the shared 
> items in the imported module.

This reminds me of my horrible april fools hack of 2013 to make Python
look more like perl:
Dennis Kaarsemaker

From tritium-list at  Tue Jun  2 11:38:47 2015
From: tritium-list at (Alexander Walters)
Date: Tue, 02 Jun 2015 05:38:47 -0400
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>
 <> <>
Message-ID: <>

I think there is another discussion to have here, and that is making 
Decimal part of the language (__builtin(s)__) vs. part of the library 
(which implementations can freely omit).  If it were part of the 
language, then maybe, just maybe, a literal syntax should be considered.

As it stands, Decimal and Fraction are libraries - implementations of 
python are free to omit them (as I think some of the embedded platform 
implementations do), and it currently does not make a lick of sense to 
add syntax for something that is only in the library.

On 6/2/2015 04:19, M.-A. Lemburg wrote:
> On 02.06.2015 03:37, Steven D'Aprano wrote:
>> On Mon, Jun 01, 2015 at 05:52:35PM +0300, Joonas Liik wrote:
>>> Having some sort of decimal literal would have some advantages of its own,
>>> for one it could help against this sillyness:
>>>>>> Decimal(1.3)
>>> Decimal('1.3000000000000000444089209850062616169452667236328125')
>> Why is that silly? That's the actual value of the binary float 1.3
>> converted into base 10. If you want 1.3 exactly, you can do this:
>>>>>> Decimal('1.3')
>>> Decimal('1.3')
>> Is that really so hard for people to learn?
> Joonas, I think you're approaching this from the wrong angle.
> People who want to get an exact decimal from a literal, will
> use the string representation to define it, not a float
> representation.
> In practice, you typically read the data from some file or stream
> anyway, so it already comes as string value and if you want to
> convert an actual float to a decimal, this will most likely
> not be done in a literal way, but instead by passed in to
> the Decimal constructor as variable, so there's no literal
> involved.
> It may be good to provide some alternative ways of converting
> a float to a decimal, e.g. one which uses the float repr logic
> to overcome things like repr(float(1.1)) == '1.1000000000000001'
> instead of a direct conversion:
>>>> Decimal(1.1)
> Decimal('1.100000000000000088817841970012523233890533447265625')
>>>> Decimal(repr(1.1))
> Decimal('1.1')
> These could be added as parameter to the Decimal constructor.

From ncoghlan at  Tue Jun  2 14:34:30 2015
From: ncoghlan at (Nick Coghlan)
Date: Tue, 2 Jun 2015 22:34:30 +1000
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
 <> <>
Message-ID: <>

On 2 June 2015 at 19:38, Alexander Walters <tritium-list at> wrote:
> I think there is another discussion to have here, and that is making Decimal
> part of the language (__builtin(s)__) vs. part of the library (which
> implementations can freely omit).  If it were part of the language, then
> maybe, just maybe, a literal syntax should be considered.

For decimal, the issues that keep it from becoming a literal are
similar to those that keep it from becoming a builtin: configurable
contexts are a core part of the decimal module's capabilities, and
making a builtin type context dependent causes various problems when
it comes to reasoning about a piece of code based on purely local
information. Those problems affect human readers regardless, but once
literals enter the mix, they affect all compile time processing as

On that front, I also finally found the (mammoth) thread from last
year about the idea of using base 10 for floating point values by

One of the things we eventually realised in that thread is that the
context dependence problem, while concerning for a builtin type, is an
absolute deal breaker for literals, because it means you *can't
constant fold them* by calculating the results of expressions at
compile time and store the result directly into the code object

This problem is illustrated by asking the following question: What is
the result of "Decimal('1.0') + Decimal('1e70')"?

Correct answer? Insufficient data (since we don't know the current
decimal precision).
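
As a rough illustration with the current decimal module (the function
name add_with_precision is made up for this sketch), the same expression
gives different values under different context precisions:

```python
from decimal import Decimal, localcontext

def add_with_precision(prec):
    """Evaluate the same expression under a given context precision."""
    with localcontext() as ctx:
        ctx.prec = prec
        return Decimal('1.0') + Decimal('1e70')

low = add_with_precision(28)    # the 1.0 is rounded away
high = add_with_precision(80)   # the sum fits and is kept exactly

print(low)
print(high)
print(low == high)              # False
```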

With the current decimal module, the configurable rounding behaviour
is something you just need to learn about as part of adopting the
module. Since that configurability is one of the main reasons for
using it over binary floating point, that's generally not a big deal.

It becomes a much bigger deal when the question being asked is: What
is the result of "1.0d + 1e70d"?

Those look like they should be numeric constants, and hence the
compiler should be able to constant fold them at compile time. That's
possible if we were to pick a single IEEE decimal type as a builtin
(either decimal64 or decimal128), but not possible if we tried to use
the current variable precision decimal type.
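
For comparison, here is a small sketch of constant folding as it already
happens for binary float literals. This is CPython-specific behaviour
and an implementation detail, so treat the result as an assumption about
the interpreter in use rather than a language guarantee.

```python
# Compile an expression made only of float literals and look at the
# constants stored in the resulting code object.
code = compile("2.0 * 3.5", "<example>", "eval")

print(code.co_consts)            # on CPython the folded product is stored
folded = 7.0 in code.co_consts   # True when the compiler folded the product
print(folded)
```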

One of the other "fun" discrepancies introduced by the context
sensitive processing in decimals is that unary plus and minus are
context-sensitive, which means that any literal format can't express
arbitrary negative decimal values without a parser hack to treat the
minus sign as part of the trailing literal. This is one of the other
main reasons why decimal64 or decimal128 are better candidates for a
builtin decimal type than decimal.Decimal as it exists today (as well
as being potentially more amenable to hardware acceleration on some


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From abarnert at  Tue Jun  2 14:40:42 2015
From: abarnert at (Andrew Barnert)
Date: Tue, 2 Jun 2015 05:40:42 -0700
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
 <> <>
Message-ID: <>

On Jun 2, 2015, at 02:38, Alexander Walters <tritium-list at> wrote:
> I think there is another discussion to have here, and that is making Decimal part of the language (__builtin(s)__) vs. part of the library (which implementations can freely omit).

I don't think there is any such distinction in Python.

Neither the language reference nor the library reference claims to be a specification.

The library documentation specifically says that it "describes the standard library that is distributed with Python" and "also describes some of the optional components that are commonly included in Python  distributions", which implies that, except for the handful of modules that are described as optional or platform-specific, everything should always be there. (There is special dispensation for Unix systems to split up Python into separate packages, but even that is specifically limited to "some or all of the optional components".)

Historically, implementations that haven't included the entire stdlib also haven't included parts of the language (Jython 2.2 and early 2.5 versions, early versions of PyPy, the various browser-based implementations, MicroPython and PyMite, etc.).

Also, both the builtins module and the actual built-in functions, constants, types, and exceptions it contains are documented as part of the library, just like decimal, not as part of the language.

So, Python isn't like C, with separate specifications for "freestanding" vs. "hosted" implementations, and it doesn't have a separate specification for an "embedded" subset like C++ used to.

> If it were part of the language, then maybe, just maybe, a literal syntax should be considered.

Since there is no such distinction between language and library, I think we're free to define a literal syntax for decimals and fractions.

From a practical point of view (which beats purity, of course), it's probably not reasonable for CPython to define such literals unless there's a C implementation that defines the numeric type slot (and maybe even has a C API concrete type interface, although maybe not), and which can be "frozen" at build time. (See past discussions on adding an OrderedDict literal for why these things are important.) 

That's currently true for Decimal, but not for Fraction. So, that might be an argument against fraction literals, or for providing a C implementation of the fraction module.

> As it stands, Decimal and Fraction are libraries - implementations of python are free to omit them (as I think some of the embedded platform implementations do), and it currently does not make a lick of sense to add syntax for something that is only in the library.

Even besides the four library sections on the various kinds of built-in things, plenty of other things are syntax for something that's "only in the library". The import statement is defined in terms of functionality in importlib, and (at least in CPython) actually implemented that way.
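
As a small illustration of syntax backed by library machinery, the
import statement and importlib.import_module hand back the same module
object:

```python
import importlib
import decimal

# The statement above and the library call below resolve to one object.
same_module = importlib.import_module("decimal") is decimal
print(same_module)   # True
```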

In fact, numeric values, as defined in the data model section of the language reference, are defined in terms of types from the library docs, both in stdtypes and  in the numbers module. Defining decimal values in terms of types defined in the decimal module library section would be no different. (Numeric _literals_ don't seem to have their semantics defined anywhere, just their syntax, but it's pretty obvious from the wording that they're intended to have int, float, and complex values as defined by the data model--which, again, means as defined by the library.)

So, while there are potentially compelling arguments against a decimal literal (how it interacts with contexts may be confusing, the idea may be too bikesheddable to come up with one true design that everyone will like, or may be an attractive nuisance, it may add too much complexity to the implementation for the benefit, etc.), "decimal is only a library" doesn't seem to be one.

>> On 6/2/2015 04:19, M.-A. Lemburg wrote:
>>> On 02.06.2015 03:37, Steven D'Aprano wrote:
>>>> On Mon, Jun 01, 2015 at 05:52:35PM +0300, Joonas Liik wrote:
>>>> Having some sort of decimal literal would have some advantages of its own,
>>>> for one it could help against this sillyness:
>>>>>>> Decimal(1.3)
>>>> Decimal('1.3000000000000000444089209850062616169452667236328125')
>>> Why is that silly? That's the actual value of the binary float 1.3
>>> converted into base 10. If you want 1.3 exactly, you can do this:
>>>>>>> Decimal('1.3')
>>>> Decimal('1.3')
>>> Is that really so hard for people to learn?
>> Joonas, I think you're approaching this from the wrong angle.
>> People who want to get an exact decimal from a literal, will
>> use the string representation to define it, not a float
>> representation.
>> In practice, you typically read the data from some file or stream
>> anyway, so it already comes as string value and if you want to
>> convert an actual float to a decimal, this will most likely
>> not be done in a literal way, but instead by passed in to
>> the Decimal constructor as variable, so there's no literal
>> involved.
>> It may be good to provide some alternative ways of converting
>> a float to a decimal, e.g. one which uses the float repr logic
>> to overcome things like repr(float(1.1)) == '1.1000000000000001'
>> instead of a direct conversion:
>>>>> Decimal(1.1)
>> Decimal('1.100000000000000088817841970012523233890533447265625')
>>>>> Decimal(repr(1.1))
>> Decimal('1.1')
>> These could be added as parameter to the Decimal constructor.

From random832 at  Tue Jun  2 14:44:02 2015
From: random832 at (random832 at
Date: Tue, 02 Jun 2015 08:44:02 -0400
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
 <> <>
Message-ID: <>

On Tue, Jun 2, 2015, at 08:34, Nick Coghlan wrote:
> For decimal, the issues that keep it from becoming a literal are
> similar to those that keep it from becoming a builtin: configurable
> contexts are a core part of the decimal module's capabilities, and
> making a builtin type context dependent causes various problems when
> it comes to reasoning about a piece of code based on purely local
> information. Those problems affect human readers regardless, but once
> literals enter the mix, they affect all compile time processing as
> well.

Why do contexts exist? Why isn't this an issue for float, despite
floating point contexts being something that exists in IEEE 754?

As for constant folding - well, maybe python needs a -ffast-math

From abarnert at  Tue Jun  2 14:50:42 2015
From: abarnert at (Andrew Barnert)
Date: Tue, 2 Jun 2015 05:50:42 -0700
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
 <> <>
Message-ID: <>

On Jun 2, 2015, at 05:44, random832 at wrote:
>> On Tue, Jun 2, 2015, at 08:34, Nick Coghlan wrote:
>> For decimal, the issues that keep it from becoming a literal are
>> similar to those that keep it from becoming a builtin: configurable
>> contexts are a core part of the decimal module's capabilities, and
>> making a builtin type context dependent causes various problems when
>> it comes to reasoning about a piece of code based on purely local
>> information. Those problems affect human readers regardless, but once
>> literals enter the mix, they affect all compile time processing as
>> well.
> Why do contexts exist? Why isn't this an issue for float, despite
> floating point contexts being something that exists in IEEE 754?

The issue here isn't really binary vs. decimal, but rather that float implements a specific fixed-precision (binary) float type, and Decimal implements a configurable-precision (decimal) float type.

As Nick explained elsewhere in that message, decimal64 or decimal128 wouldn't have the context problem.

And similarly, a binary.Binary type designed like decimal.Decimal would have the context problem.

(This is a slight oversimplification; there's also the fact that Decimal implements the full set of 754-2008 context features, while float implements a subset of 754-1985 features, and even that only if the underlying C lib does so, and nobody ever uses them anyway.)
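
A minimal sketch of the difference, assuming decimal64 can be
approximated with a fixed decimal.Context (16 digits, exponent range
-383..384). The names DECIMAL64 and add64 are made up here; this is an
emulation, not a real decimal64 type.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN, localcontext

# Assumed emulation of decimal64: fixed precision and exponent range.
DECIMAL64 = Context(prec=16, Emin=-383, Emax=384, rounding=ROUND_HALF_EVEN)

def add64(a, b):
    """Add using the fixed context, ignoring the thread's current one."""
    return DECIMAL64.add(a, b)

results = []
for prec in (5, 28, 80):
    with localcontext() as ctx:
        ctx.prec = prec           # changing the ambient context...
        results.append(add64(Decimal('1.0'), Decimal('1e70')))

print(results)                    # ...does not change the result
```

Because the precision is part of the type rather than of the ambient
state, the result can be worked out from the expression alone.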

From abarnert at  Tue Jun  2 15:05:32 2015
From: abarnert at (Andrew Barnert)
Date: Tue, 2 Jun 2015 06:05:32 -0700
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
 <> <>
Message-ID: <>

On Jun 2, 2015, at 05:34, Nick Coghlan <ncoghlan at> wrote:
> This is one of the other
> main reasons why decimal64 or decimal128 are better candidates for a
> builtin decimal type than decimal.Decimal as it exists today (as well
> as being potentially more amenable to hardware acceleration on some
> platforms).

OK, so what are the stumbling blocks to adding decimal32/64/128 (or just one of the three), either in builtins/stdtypes or in decimal, and then adding literals for them?

I can imagine a few: someone has to work out exactly what features to support (the same things as float, or everything in the standard?), how it interacts with Decimal and float (which is defined by the standard, but translating that to Python isn't quite trivial), how it fits into the numeric tower ABCs, and what syntax to use for the literals, and if/how it fits into things like array/struct/ctypes and into math, and whether we need decimal complex values, and what the C API looks like (it would be nice if PyDecimal64_AsDecimal64 worked as expected on C11 platforms, but you could still use decimal64 on C90 platforms and just not get such functions...); then write a PEP; then write an implementation; and after all that work, the result may be seen as too much extra complexity (either in the language or in the implementation) for the benefits. But is that it, or is there even more that I'm missing?

(Of course while we're at it, it would be nice to have arbitrary-precision IEEE binary floats as well, modeled on the decimal module, and to add all the missing 754-2008/C11 methods/math functions for the existing float type, but those seem like separate proposals from fixed-precision decimal floats.)

From oscar.j.benjamin at  Tue Jun  2 16:05:07 2015
From: oscar.j.benjamin at (Oscar Benjamin)
Date: Tue, 2 Jun 2015 15:05:07 +0100
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
 <> <>
Message-ID: <>

On 2 June 2015 at 14:05, Andrew Barnert via Python-ideas
<python-ideas at> wrote:
> On Jun 2, 2015, at 05:34, Nick Coghlan <ncoghlan at> wrote:
>> This is one of the other
>> main reasons why decimal64 or decimal128 are better candidates for a
>> builtin decimal type than decimal.Decimal as it exists today (as well
>> as being potentially more amenable to hardware acceleration on some
>> platforms).
> OK, so what are the stumbling blocks to adding decimal32/64/128 (or just one of the three), either in builtins/stdtypes or in decimal, and then adding literals for them?
> I can imagine a few: someone has to work out exactly what features to support (the same things as float, or everything in the standard?),

I would argue that it should be as simple as float. If someone wants
the rest of it they've got the Decimal module which is more than
enough for their needs.

> how it interacts with Decimal and float (which is defined by the standard, but translating that to Python isn't quite trivial),

Interaction between decimalN and Decimal coerces to Decimal.
Interaction with floats is a TypeError.
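
For reference, this is how the existing Decimal type already treats
mixed arithmetic, which is the behaviour I have in mind for floats.
Nothing here is a proposed API, just today's decimal module:

```python
from decimal import Decimal

mixed_error = None
try:
    Decimal('1.1') + 1.1          # Decimal and float do not mix
except TypeError as exc:
    mixed_error = exc
    print("TypeError:", exc)

with_int = Decimal('1.1') + 1     # ints are accepted
print(with_int)                   # 2.1
```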

> how it fits into the numeric tower ABCs,

Does anyone really use these for anything? I haven't really found them
to be very useful since no third-party numeric types use them and they
don't really define the kind of information that you might really want
in any carefully written numerical algorithm. I don't see any gain in
adding any decimal types to e.g. Real as the ABCs seem irrelevant to
me.

> and what syntax to use for the literals, and if/how it fits into things like array/struct/ctypes

It's not essential to incorporate them here. If they become commonly
used in C then it would be good to have these for binary
compatibility.

> and into math, and whether we need decimal complex values,

It's not common to use the math-style functions with the decimal
module unless you're using it as a multi-precision library and then
you'd really want the full Decimal type. There's no advantage in using
decimal for e.g. sin, cos etc. so there's not much really lost in
converting to binary and back. It's in the simple arithmetic where it
makes a difference so I'd say that decimal should stick to that.

As for complex decimals this would only really be worth it if the
ultimate plan was to have decimals become the default floating point
type. Laura suggested that earlier and I probably agree that it would
have been a good idea at some earlier time but it's a bit late for
that.

> and what the C API looks like (it would be nice if PyDecimal64_AsDecimal64 worked as expected on C11 platforms, but you could still use decimal64 on C90 platforms and just not get such functions...);

Presumably CPython would have to write its own implementation e.g.:

   PyDecimal64_FromIntExponentAndLongSignificand

... or something like that.

> then write a PEP; then write an implementation; and after all that work, the result may be seen as too much extra complexity (either in the language or in the implementation) for the benefits. But is that it, or is there even more that I'm missing?

I don't think anyone has proposed to add all of the things that you
suggested. Of course if there are decimal literals and a fixed-width
decimal type then over time people will suggest some of the other
things. That doesn't mean that they'd need to be incorporated though.

A year ago I said I'd write a PEP for decimal literals but then I got
clobbered at work and a number of other things happened so that I
didn't even have time to read threads like this. Maybe it's worth
revisiting...


From abarnert at  Tue Jun  2 17:14:14 2015
From: abarnert at (Andrew Barnert)
Date: Tue, 2 Jun 2015 08:14:14 -0700
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
 <> <>
Message-ID: <>

On Jun 2, 2015, at 07:05, Oscar Benjamin <oscar.j.benjamin at> wrote:
> On 2 June 2015 at 14:05, Andrew Barnert via Python-ideas
> <python-ideas at> wrote:
>> On Jun 2, 2015, at 05:34, Nick Coghlan <ncoghlan at> wrote:
>>> This is one of the other
>>> main reasons why decimal64 or decimal128 are better candidates for a
>>> builtin decimal type than decimal.Decimal as it exists today (as well
>>> as being potentially more amenable to hardware acceleration on some
>>> platforms).
>> OK, so what are the stumbling blocks to adding decimal32/64/128 (or just one of the three), either in builtins/stdtypes or in decimal, and then adding literals for them?
>> I can imagine a few: someone has to work out exactly what features to support (the same things as float, or everything in the standard?),
> I would argue that it should be as simple as float. If someone wants
> the rest of it they've got the Decimal module which is more than
> enough for their needs.

But decimal64 and Decimal are not the same types. So, if you want to, e.g., get the next decimal64 value after the current value, how would you do that? (Unless you're suggesting there should be a builtin decimal64 and a separate decimal.decimal64 or something, but I don't think you are.)
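
To make the question concrete: with today's Decimal you can ask for the
next representable value, but only relative to a context. The DECIMAL64
context below is an assumed emulation (16 digits, exponents -383..384),
not an existing type.

```python
from decimal import Context, Decimal

DECIMAL64 = Context(prec=16, Emin=-383, Emax=384)  # assumed emulation

one = Decimal(1)
# Smallest value above 1 that is representable with 16 digits.
after_one = one.next_plus(context=DECIMAL64)
print(after_one)
```

A real decimal64 type would need its own equivalent, since its precision
would not come from a context.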

Also, with float, we can get away with saying we're supporting the 1985 standard and common practice among C90 implementations; with decimal64, the justification for arbitrarily implementing part of the 2008 standard but not the rest is not as clear-cut.

>> how it interacts with Decimal and float (which is defined by the standard, but translating that to Python isn't quite trivial),
> Interaction between decimalN and Decimal coerces to Decimal.

Even when the current decimal context is too small to hold a decimalN? Does that raise any flags?
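
Here is a sketch of what today's Decimal does in the analogous
situation, with an operand wider than the current context. What a
decimalN-to-Decimal coercion should do is exactly the open question;
this only shows the existing behaviour.

```python
from decimal import Decimal, Inexact, Rounded, localcontext

wide = Decimal('1234567890123456')    # 16 significant digits

with localcontext() as ctx:
    ctx.prec = 5                      # context narrower than the operand
    ctx.clear_flags()
    narrowed = wide + 0               # arithmetic rounds to the context
    was_inexact = bool(ctx.flags[Inexact])
    was_rounded = bool(ctx.flags[Rounded])

print(narrowed, was_inexact, was_rounded)
```

The value is silently rounded and the Inexact and Rounded flags are set,
but nothing is raised unless those signals are trapped.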

> Interaction with floats is a TypeError.
>> how it fits into the numeric tower ABCs,
> Does anyone really use these for anything? I haven't really found them
> to be very useful since no third-party numeric types

The NumPy native types do. (Of course they also subclass int and float where relevant.)

> use them and they
> don't really define the kind of information that you might really want
> in any carefully written numerical algorithm. I don't see any gain in
> adding any decimal types to e.g Real as the ABCs seem irrelevant to
> me.

Even if they are completely irrelevant, unless they're deprecated they pretty much have to be supported by any new types. There might be a good argument that decimal64 doesn't fit into the numeric tower, but you'd have to make that argument.

>> and what syntax to use for the literals, and if/how it fits into things like array/struct/ctypes
> It's not essential to incorporate them here. If they become commonly
> used in C then it would be good to have these for binary
> compatibility.

For ctypes, sure (although even there, ctypes is a relatively simple way to share values between pure-Python child processes with multiprocessing.shared_ctypes). But for array, that's generally not about compatibility with existing C code, it's about efficiently packing zillions of homogenous simple values into as little memory as possible.

>> and into math, and whether we need decimal complex values,
> It's not common to use the math-style functions with the decimal
> module

Well, math is mostly about double functions from the C90 stdlib, so it's not common to use them with decimal. But that doesn't mean you wouldn't want decimal64 implementations of some of the functions in math. 

> unless you're using it as a multi-precision library and then
> you'd really want the full Decimal type.

But again, the full Decimal type isn't just an expansion on decimal64, it's a completely different type, with context-sensitive precision.

> There's no advantage in using
> decimal for e.g. sin, cos etc.
> so there's not much really lost in
> converting to binary and back.

There's still rounding error. Sure, usually that won't make a difference--but when it does, it will be surprising and frustrating if you didn't explicitly ask for it.
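
A rough sketch of the effect, comparing a square root computed in
decimal with one that goes through binary float and back (default
28-digit context assumed):

```python
import math
from decimal import Decimal

two = Decimal(2)
native = two.sqrt()                          # computed in decimal
via_float = Decimal(math.sqrt(float(two)))   # exact value of the binary result

print(native)
print(via_float)
print(native == via_float)                   # False
```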

> It's in the simple arithmetic where it
> makes a difference so I'd say that decimal should stick to that.
> As for complex decimals this would only really be worth it if the
> ultimate plan was to have decimals become the default floating point
> type.


> Laura suggested that earlier and I probably agree that it would
> have been a good idea at some earlier time but it's a bit late for
> that.
>> and what the C API looks like (it would be nice if PyDecimal64_AsDecimal64 worked as expected on C11 platforms, but you could still use decimal64 on C90 platforms and just not get such functions...);
> Presumably CPython would have to write it's own implementation e.g.:
>    PyDecimal64_FromIntExponentAndLongSignificand
> ... or something like that.

Sure, if you want a C API for C90 platforms at all. But you may not even need that. When would you need to write C code that deals with decimal64 values as exponent and significand? Dealing with them as abstract numbers, general Python objects, native decimal64, and maybe even opaque values that I can pass around in C without being able to interpret them, I can see, but what C code needs the exponent and significand?

>> then write a PEP; then write an implementation; and after all that work, the result may be seen as too much extra complexity (either in the language or in the implementation) for the benefits. But is that it, or is there even more that I'm missing?
> I don't think anyone has proposed to add all of the things that you
> suggested.

I think in many (but maybe not all) of these cases the simplest answer is the best, but a PEP would have to actually make that case for each thing.

> Of course if there are decimal literals and a fixed-width
> decimal type then over time people will suggest some of the other
> things. That doesn't mean that they'd need to be incorporated though.
> A year ago I said I'd write a PEP for decimal literals but then I got
> clobbered at work and a number of other things happened so that I
> didn't even have time to read threads like this. Maybe it's worth
> revisiting...

Maybe we need a PEP for the decimalN type(s) first, then if someone has time and inclination they can write a PEP for literals for those types, either as a companion or as a followup. That would probably cut out 30-50% of the work, and maybe even more of the room for argument and bikeshedding.

From tjreedy at  Tue Jun  2 19:29:14 2015
From: tjreedy at (Terry Reedy)
Date: Tue, 02 Jun 2015 13:29:14 -0400
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
 <> <>
Message-ID: <mkkp69$h9a$>

On 6/2/2015 9:05 AM, Andrew Barnert via Python-ideas wrote:

> OK, so what are the stumbling blocks to adding decimal32/64/128 (or
> just one of the three), either in builtins/stdtypes or in decimal,
> and then adding literals for them?

A compelling rationale.  Python exposes the two basic number types used 
by the kinds of computers it runs on: integers (extended) and floats 
(binary in practice, though the language definition would allow a decimal 
float machine).

The first killer app for Python was scientific numerical computing.  The 
first numerical package developed for this exposed the entire gamut of 
integer and float types available in C.  Numpy is the third numerical 
package. (Even so, none of the packages have been distributed with 
CPython -- and properly so.)

Numbers pre-wrapped as dates, times, and datetimes with specialized 
methods are not essential (Python once managed without) but are 
enormously useful in a wide variety of application areas.

Decimals, another class of pre-wrapped numbers, greatly simplify money 
calculations, including those that must follow legal or contractual 
rules.  It is no accident that the decimal specification is a product of 
what was once International Business Machines. Contexts and specialized 
rounding rules are an essential part of fulfilling the purpose of the
module.

What application area would be opened up by adding a fixed-precision 
float?  The only thing I have seen presented is making interactive 
python act even more* like a generic (decimal) calculator, so that 
newbies will find python floats less surprising than those of other 
languages.  (Of course, a particular decimal## might not exactly match any 
existing calculator.)

*The int division change solved the biggest discrepancy: 1/10 is now .1 
instead of 0. Representation changes improved things also.

Terry Jan Reedy

From abarnert at  Tue Jun  2 21:03:25 2015
From: abarnert at (Andrew Barnert)
Date: Tue, 2 Jun 2015 12:03:25 -0700
Subject: [Python-ideas] User-defined literals
Message-ID: <>

This is a straw-man proposal for user-defined literal suffixes, similar to the design in C++.

In the thread on decimals, a number of people suggested that they'd like to have decimal literals. Nick Coghlan explained why decimal.Decimal literals don't make sense in general (primarily, but not solely, because they're inherently context-sensitive), so unless we first add a fixed type like decimal64, that idea is a non-starter. However, there was some interest in either having Swift-style convertible literals or C++-style user-defined literals. Either one would allow users who want decimal literals for a particular app where it makes sense (because there's a single fixed context, and the performance cost of Decimal('1.2') vs. a real constant is irrelevant) to add them without too much hassle or hackery.

I explored the convertible literals a while ago, and I'm pretty sure that doesn't work in a duck-typed language. But the C++ design does work, as long as you're willing to have the conversion (including the lookup of the conversion function itself) done at runtime.

Any number or string token followed by a name (identifier) token is currently illegal. This would change so that, if there's no whitespace between them, it's legal, and equivalent to a call to a function named `literal_{name}({number-or-string})`. For example, `1.2d` becomes `literal_d('1.2')`, `1.2_dec` becomes `literal_dec('1.2')`, `"1.2"d` also becomes `literal_d('1.2')`.
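
To make the translation concrete, here is a rough regex-based sketch of the rewriting step. It is not the toy implementation mentioned later in this message; the names `rewrite_suffixes` and `_SUFFIXED` are invented for illustration, and it assumes plain decimal numbers only (no exponents, hex, complex, or string literals).

```python
import re
from decimal import Decimal

# A plain decimal number immediately followed by an identifier.
# The lookbehind keeps names such as `x1d` from being rewritten.
# An optional leading underscore in the suffix is dropped (1.2_dec -> dec).
_SUFFIXED = re.compile(r"(?<![\w.])(\d+(?:\.\d*)?)_?([A-Za-z]\w*)")


def rewrite_suffixes(source):
    """Turn `1.2d` into `literal_d('1.2')` wherever it appears in source."""
    return _SUFFIXED.sub(r"literal_\2('\1')", source)


def literal_d(text):
    # Hypothetical conversion function: receives the literal's source text.
    return Decimal(text)


expression = rewrite_suffixes("1.2d + 3d")
value = eval(expression, {"literal_d": literal_d})
print(expression)
print(value)
```

Because the conversion function receives the source text rather than a float, `1.2d` never passes through binary floating point.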

Of course `0x12decimal` becomes `literal_imal('0x12dec')`, and `21jump` becomes `literal_ump('21j')`, which are not at all useful, and potentially confusing, but I don't think that would be a serious problem in practice.
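
The odd splits come from the tokenizer taking the longest possible number first. A small sketch with simplified patterns (the regexes and the helper `split_suffix` are illustrative assumptions, not the tokenizer's real rules) shows where the cut falls:

```python
import re

# Simplified stand-ins for the hex and imaginary number rules.
HEX = re.compile(r"0[xX][0-9a-fA-F]+")
IMAGINARY = re.compile(r"\d+[jJ]")


def split_suffix(token, number_pattern):
    """Split token into (longest number prefix, leftover suffix)."""
    match = number_pattern.match(token)
    if match is None:
        raise ValueError(f"{token!r} does not start with a number")
    return match.group(), token[match.end():]


print(split_suffix("0x12decimal", HEX))
print(split_suffix("21jump", IMAGINARY))
```

Since `d`, `e`, and `c` are all valid hex digits, they are swallowed by the number, and only `imal` is left as the suffix.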

Unlike C++, the lookup of that literal function happens at runtime, so `1.2z3` is no longer a SyntaxError, but a NameError on `literal_z3`. Also, this means `literal_d` has to be in scope in every module you want decimal literals in, which often means a `from ? import` (or something worse, like monkeypatching builtins). C++ doesn't have that problem because of argument-dependent lookup, but that doesn't work for any other language. I think this is the biggest flaw in the proposal.
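
A minimal sketch of what runtime lookup means in practice, assuming `1.2d` has already been rewritten to a `literal_d('1.2')` call. The helper `evaluate` is hypothetical and only exists to show the two outcomes side by side:

```python
from decimal import Decimal


def evaluate(expression, namespace):
    """Evaluate an already-rewritten expression, reporting a missing converter."""
    try:
        return eval(expression, dict(namespace))
    except NameError as exc:
        return f"NameError: {exc}"


# Only literal_d is in scope, as if the module had imported it.
module_scope = {"literal_d": Decimal}

found = evaluate("literal_d('1.2')", module_scope)
missing = evaluate("literal_z3('1.2')", module_scope)
print(found)
print(missing)
```

The second call fails only when the expression is evaluated, which is the difference from the compile-time SyntaxError that `1.2z3` produces today.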

Also unlike C++, there's no overloading on different kinds of literals; the conversion function has no way of knowing whether the user actually typed a string or a number. This could easily be changed (e.g., by using different names, or just by passing the repr of the string instead of the string itself), but I don't think it's necessary.
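
For illustration, the "pass the repr" variant could look roughly like this. It assumes the compiler hands over the raw source text of a number and the repr of a string; the `(kind, value)` return value is only there to make the distinction visible.

```python
import ast
from decimal import Decimal


def literal_d(source_text):
    """Hypothetical converter that can tell `"1.2"d` from `1.2d`."""
    if source_text[:1] in {"'", '"'}:
        # A string literal arrives with its quotes; evaluate it safely.
        kind, digits = "string", ast.literal_eval(source_text)
    else:
        kind, digits = "number", source_text
    return kind, Decimal(digits)


print(literal_d("1.2"))        # as if written 1.2d
print(literal_d(repr("1.2")))  # as if written "1.2"d
```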

Similarly, this idea could be extended to handle all literal types, so you can do `{'spam': 1, 'eggs': 2}_o` to create an OrderedDict literal, but I think that's ugly enough to not be worth proposing. (A prefix looks better there... but a prefix doesn't work for numbers or strings. And I'm not sure it's unambiguously parseable even for list/set/dict.) Plus, there's the problem that comprehensions and actual literals are both parsed as displays, but you wouldn't want user-defined comprehensions.

I've built a quick&dirty toy implementation (at Unlike the real proposal, this only handles numbers, and allows whitespace between the numbers and the names, and is a terrible hack. But it's enough to play with the idea, and you don't need to patch and recompile CPython to use it.

My feeling is that this would be useful, but the problems are not surmountable without much bigger changes, and there's no obvious better design that avoids them. But I'm interested to see what others think. 

From me at  Tue Jun  2 21:33:46 2015
From: me at (Florian Bruhin)
Date: Tue, 2 Jun 2015 21:33:46 +0200
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <20150602193346.GG26357@tonks>

* Andrew Barnert via Python-ideas <python-ideas at> [2015-06-02 12:03:25 -0700]:
> This is a straw-man proposal for user-defined literal suffixes, similar to the design in C++.

I actually had the exact same thing in mind recently, and never
brought it up because it seemed too crazy to me. It seems I'm not the
only one! :D

> Any number or string token followed by a name (identifier) token is currently illegal. This would change so that, if there's no whitespace between them, it's legal, and equivalent to a call to a function named `literal_{name}({number-or-string})`. For example, `1.2d` becomes `literal_d('1.2')`, `1.2_dec` becomes `literal_dec('1.2')`, `"1.2"d` also becomes `literal_d('1.2')`.

I think a big issue is that it's non-obvious syntactic sugar. You
wouldn't expect 1.2x to actually be a function call, and for newcomers
this might be rather confusing...

> Similarly, this idea could be extended to handle all literal types, so you can do `{'spam': 1, 'eggs': 2}_o` to create an OrderedDict literal, but I think that's ugly enough to not be worth proposing. (A prefix looks better there... but a prefix doesn't work for numbers or strings. And I'm not sure it's unambiguously parseable even for list/set/dict.) Plus, there's the problem that comprehensions and actual literals are both parsed as displays, but you wouldn't want user-defined comprehensions.

That actually was the use-case I had in mind. I think

    {'spam': 1, 'eggs': 2}_o

is less ugly (and less error-prone!) than

    OrderedDict([('spam', 1), ('eggs', 2)])

Also, it's immediately apparent that it is some kind of dict.

> I've built a quick&dirty toy implementation (at Unlike the real proposal, this only handles numbers, and allows whitespace between the numbers and the names, and is a terrible hack. But it's enough to play with the idea, and you don't need to patch and recompile CPython to use it.

Wow! I'm always amazed at how malleable Python is.


-- | me at (Mail/XMPP)
   GPG: 916E B0C8 FD55 A072 |
         I love long mails! |

From rosuav at  Tue Jun  2 21:40:01 2015
From: rosuav at (Chris Angelico)
Date: Wed, 3 Jun 2015 05:40:01 +1000
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Jun 3, 2015 at 5:03 AM, Andrew Barnert via Python-ideas
<python-ideas at> wrote:
> Of course `0x12decimal` becomes `literal_imal('0x12dec')`, and `21jump` becomes `literal_ump('21j')`, which are not at all useful, and potentially confusing, but I don't think that would be a serious problem in practice.

There's probably no solution to the literal_imal problem, but the
easiest fix for literal_ump is to have 21j be parsed the same way -
it's a 21 modified by j, same as 21jump is a 21 modified by jump.

> Unlike C++, the lookup of that literal function happens at runtime, so `1.2z3` is no longer a SyntaxError, but a NameError on `literal_z3`. Also, this means `literal_d` has to be in scope in every module you want decimal literals in, which often means a `from ? import` (or something worse, like monkeypatching builtins). C++ doesn't have that problem because of argument-dependent lookup, but that doesn't work for any other language. I think this is the biggest flaw in the proposal.

I'd much rather see it be done at compile time. Something like this:

compile("x = 1d/10", "<>", "exec")

would immediately call literal_d("1") and embed its return value in
the resulting code as a literal. (Since the peephole optimizer
presumably doesn't currently understand Decimals, this would probably
keep the division, but if it got enhanced, this could end up
constant-folding to Decimal("0.1") before returning the code object.)
So it's only the compilation step that needs to know about all those
literal_* functions. Should there be a way to globally register them
for default usage, or is this too much action-at-a-distance?
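
A rough sketch of the compile-time idea, under some loud assumptions: the source has already been rewritten to `literal_d('1')` calls, and the names `LiteralFolder` and `compile_with_literals` are invented here. CPython's compile() rejects a Decimal inside an ast.Constant node, so this sketch parks each converted value under a generated global name instead of embedding it in the code object.

```python
import ast
from decimal import Decimal


class LiteralFolder(ast.NodeTransformer):
    """Run literal_* converters once, while compiling, not on every execution."""

    def __init__(self, converters):
        self.converters = converters  # e.g. {"literal_d": Decimal}
        self.constants = {}           # generated name -> converted value

    def visit_Call(self, node):
        self.generic_visit(node)
        func = node.func
        if (isinstance(func, ast.Name) and func.id in self.converters
                and len(node.args) == 1 and not node.keywords
                and isinstance(node.args[0], ast.Constant)
                and isinstance(node.args[0].value, str)):
            name = f"_folded_{len(self.constants)}"
            self.constants[name] = self.converters[func.id](node.args[0].value)
            # Replace the call with a lookup of the precomputed value.
            return ast.copy_location(ast.Name(id=name, ctx=ast.Load()), node)
        return node


def compile_with_literals(source, converters):
    folder = LiteralFolder(converters)
    tree = ast.fix_missing_locations(folder.visit(ast.parse(source)))
    return compile(tree, "<literals>", "exec"), folder.constants


code, constants = compile_with_literals(
    "x = literal_d('1') / 10", {"literal_d": Decimal})
namespace = dict(constants)  # literal_d itself is not needed at run time
exec(code, namespace)
print(namespace["x"])
```

The division still happens at run time here, matching the point above that constant-folding it would need an optimizer that understands Decimal.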

> Also unlike C++, there's no overloading on different kinds of literals; the conversion function has no way of knowing whether the user actually typed a string or a number. This could easily be changed (e.g., by using different names, or just by passing the repr of the string instead of the string itself), but I don't think it's necessary.

I'd be inclined to simply always provide a string. The special case
would be that the quotes can sometimes be omitted, same as redundant
parens on genexprs can sometimes be omitted. Otherwise, 1.2d might
still produce wrong results.

> Similarly, this idea could be extended to handle all literal types, so you can do `{'spam': 1, 'eggs': 2}_o` to create an OrderedDict literal, but I think that's ugly enough to not be worth proposing. (A prefix looks better there... but a prefix doesn't work for numbers or strings. And I'm not sure it's unambiguously parseable even for list/set/dict.) Plus, there's the problem that comprehensions and actual literals are both parsed as displays, but you wouldn't want user-defined comprehensions.

I thought there was no such thing as a dict/list/set literal, only
display syntax? In any case, that can always be left for a future
extension to the proposal.


From njs at  Tue Jun  2 21:40:50 2015
From: njs at (Nathaniel Smith)
Date: Tue, 2 Jun 2015 12:40:50 -0700
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jun 2, 2015 at 12:03 PM, Andrew Barnert via Python-ideas
<python-ideas at> wrote:
> This is a straw-man proposal for user-defined literal suffixes, similar to the design in C++.
> In the thread on decimals, a number of people suggested that they'd like to have decimal literals. Nick Coghlan explained why decimal.Decimal literals don't make sense in general (primarily, but not solely, because they're inherently context-sensitive), so unless we first add a fixed type like decimal64, that idea is a non-starter. However, there was some interest in either having Swift-style convertible literals or C++-style user-defined literals. Either one would allow users who want decimal literals for a particular app where it makes sense (because there's a single fixed context, and the performance cost of Decimal('1.2') vs. a real constant is irrelevant) to add them without too much hassle or hackery.

Are there any use cases besides decimals? Wouldn't it be easier to
just add, say, a fixed "0d" prefix for decimals?

0x1001  # hex
0b1001  # binary
0d1.001 # decimal

> Similarly, this idea could be extended to handle all literal types, so you can do `{'spam': 1, 'eggs': 2}_o` to create an OrderedDict literal, but I think that's ugly enough to not be worth proposing. (A prefix looks better there... but a prefix doesn't work for numbers or strings. And I'm not sure it's unambiguously parseable even for list/set/dict.) Plus, there's the problem that comprehensions and actual literals are both parsed as displays, but you wouldn't want user-defined comprehensions.

Also there's the idea floating around of making *all* dicts ordered
(as PyPy has done), which would be much cleaner if it can be managed,
so I'm guessing that would have to be tried and fail before any new
syntax would be added for this use case.


Nathaniel J. Smith --

From ckaynor at  Tue Jun  2 22:30:27 2015
From: ckaynor at (Chris Kaynor)
Date: Tue, 2 Jun 2015 13:30:27 -0700
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jun 2, 2015 at 12:40 PM, Nathaniel Smith <njs at> wrote:

> On Tue, Jun 2, 2015 at 12:03 PM, Andrew Barnert via Python-ideas
> <python-ideas at> wrote:
> > This is a straw-man proposal for user-defined literal suffixes, similar
> to the design in C++.
> >
> > In the thread on decimals, a number of people suggested that they'd like
> to have decimal literals. Nick Coghlan explained why decimal.Decimal
> literals don't make sense in general (primarily, but not solely, because
> they're inherently context-sensitive), so unless we first add a fixed type
> like decimal64, that idea is a non-starter. However, there was some
> interest in either having Swift-style convertible literals or C++-style
> user-defined literals. Either one would allow users who want decimal
> literals for a particular app where it makes sense (because there's a
> single fixed context, and the performance cost of Decimal('1.2') vs. a real
> constant is irrelevant) to add them without too much hassle or hackery.
> Are there any use cases besides decimals? Wouldn't it be easier to
> just add, say, a fixed "0d" prefix for decimals?
> 0x1001  # hex
> 0b1001  # binary
> 0d1.001 # decimal

In terms of other included useful options, you also have fractions.

There could also be benefit of using such a system for cases of numbers
with units, such as having the language understand 23.49MB.

That said, very similar results could be achieved in most cases by merely
using a normal function, without the need for special syntax. Decimal and
Fraction are probably the only two major cases where you will see any
actual benefit, though there may be libraries that may provide other number
formats that could benefit (perhaps a base-3 number?).

> Similarly, this idea could be extended to handle all literal types, so
> you can do `{'spam': 1, 'eggs': 2}_o` to create an OrderedDict literal, but
> I think that's ugly enough to not be worth proposing. (A prefix looks
> better there... but a prefix doesn't work for numbers or strings. And I'm
> not sure it's unambiguously parseable even for list/set/dict.) Plus,
> there's the problem that comprehensions and actual literals are both parsed
> as displays, but you wouldn't want user-defined comprehensions.
> Also there's the idea floating around of making *all* dicts ordered
> (as PyPy has done), which would be much cleaner if it can be managed,
> so I'm guessing that would have to be tried and fail before any new
> syntax would be added for this use case.

One benefit of the proposal is that it can be readily generalized to all
literal syntax, so custom behaviors for native support of ordered dicts,
trees, ordered sets, multi-sets, counters, and so forth could all be added
via libraries, with little to no additional need for Python to be updated
to support them directly.

All-in-all, I'd be very mixed on such a feature. I can see plenty of cases
where it would provide benefit, however it also adds quite a bit of
complexity to the language, and could easily result in code with nasty
action-at-a-distance issues. If such a feature were implemented, Python
would probably also want to reserve some set of the names for future
language features, similar to how dunder names are reserved.

From njs at  Tue Jun  2 23:26:22 2015
From: njs at (Nathaniel Smith)
Date: Tue, 2 Jun 2015 14:26:22 -0700
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 2, 2015 1:32 PM, "Chris Kaynor" <ckaynor at> wrote:
> On Tue, Jun 2, 2015 at 12:40 PM, Nathaniel Smith <njs at> wrote:
>> Are there any use cases besides decimals? Wouldn't it be easier to
>> just add, say, a fixed "0d" prefix for decimals?
>> 0x1001  # hex
>> 0b1001  # binary
>> 0d1.001 # decimal
> In terms of other included useful options, you also have fractions.
> There could also be benefit of using such a system for cases of numbers
> with units, such as having the language understand 23.49MB.

The unit libraries I've seen just spell this as "23.49 * MB" (or "22.49 *
km / h" for a speed, say). And crucially they don't have any need to
override the parsing rules for float literals.
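
As a minimal sketch of that spelling (not any particular library's API: the `Unit` class is hypothetical and returns plain numbers, where real unit libraries typically return quantity objects), ordinary operator overloading is enough:

```python
class Unit:
    """A named scale factor that multiplies plain numbers."""

    def __init__(self, name, factor):
        self.name = name
        self.factor = factor

    def __rmul__(self, number):
        # Called for `number * unit`, because int and float do not know Unit.
        return number * self.factor


MB = Unit("MB", 10**6)  # assumption: decimal megabytes

size_in_bytes = 23.49 * MB
print(size_in_bytes)
```

The float literal `23.49` is parsed exactly as usual; only the multiplication is customized.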


From ron3200 at  Tue Jun  2 23:46:13 2015
From: ron3200 at (Ron Adam)
Date: Tue, 02 Jun 2015 17:46:13 -0400
Subject: [Python-ideas] Explicitly shared objects with sub modules vs
In-Reply-To: <>
References: <mkclvj$390$>
Message-ID: <mkl876$eul$>

On 06/02/2015 04:23 AM, Dennis Kaarsemaker wrote:
> On za, 2015-05-30 at 11:45 -0400, Ron Adam wrote:
>> The solution I found was to call a function to explicitly set the shared
>> items in the imported module.
> This reminds me of my horrible april fools hack of 2013 to make Python
> look more like perl:

It's the reverse of what this suggestion does, if I'm reading it correctly. 
  It allows called code to alter the callers frame.  Obviously that 
wouldn't be good to do.

I think what makes the suggestion in this thread "not good", is that 
modules have no formal order of dependency.  If they did, then it could be 
restricted to only work in one direction, which means sub-modules couldn't 
affect parent modules.  But python isn't organised that way.  All modules 
are at the same level.  Which means they can import from each other... and 
possibly export to each other too.  So it's up to the programmer to 
restrict what parts affect other parts as if it did have a formal 
dependency order.


From surya.subbarao1 at  Wed Jun  3 00:28:33 2015
From: surya.subbarao1 at (u8y7541 The Awesome Person)
Date: Tue, 2 Jun 2015 15:28:33 -0700
Subject: [Python-ideas] Python-ideas Digest, Vol 103, Issue 15
In-Reply-To: <>
References: <>
Message-ID: <>

What do you mean by replying inline?

On Mon, Jun 1, 2015 at 10:22 PM, Andrew Barnert <abarnert at> wrote:
> On Jun 1, 2015, at 20:41, u8y7541 The Awesome Person
> <surya.subbarao1 at> wrote:
> I think you're right. I was also considering ... "editing" my Python
> distribution. If they didn't implement my suggestion for correcting floats,
> at least they can fix this, instead of making people hack Python for good
> results!
> If you're going to reply to digests, please learn how to reply inline
> instead of top-posting (and how to trim out all the irrelevant stuff). It's
> next to impossible to tell which part of which of the messages you're
> replying to even in simple cases like this one, with only 4 messages in the
> digest.
> On Mon, Jun 1, 2015 at 8:10 PM, <python-ideas-request at> wrote:
>> Send Python-ideas mailing list submissions to
>>         python-ideas at
>> To subscribe or unsubscribe via the World Wide Web, visit
>> or, via email, send a message with subject or body 'help' to
>>         python-ideas-request at
>> You can reach the person managing the list at
>>         python-ideas-owner at
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of Python-ideas digest..."
>> Today's Topics:
>>    1. Re: Python Float Update (Steven D'Aprano)
>>    2. Re: Python Float Update (Andrew Barnert)
>>    3. Re: Python Float Update (Steven D'Aprano)
>>    4. Re: Python Float Update (Andrew Barnert)
>> ----------------------------------------------------------------------
>> Message: 1
>> Date: Tue, 2 Jun 2015 11:37:48 +1000
>> From: Steven D'Aprano <steve at>
>> To: python-ideas at
>> Subject: Re: [Python-ideas] Python Float Update
>> Message-ID: <20150602013748.GD932 at>
>> Content-Type: text/plain; charset=us-ascii
>> On Mon, Jun 01, 2015 at 05:52:35PM +0300, Joonas Liik wrote:
>> > Having some sort of decimal literal would have some advantages of its
>> > own,
>> > for one it could help against this silliness:
>> >
>> > >>> Decimal(1.3)
>> > Decimal('1.3000000000000000444089209850062616169452667236328125')
>> Why is that silly? That's the actual value of the binary float 1.3
>> converted into base 10. If you want 1.3 exactly, you can do this:
>> > >>> Decimal('1.3')
>> > Decimal('1.3')
>> Is that really so hard for people to learn?
>> > I'm not saying that the actual data type needs to be a decimal (
>> > might well be a float but say shove the string repr next to it so it can
>> > be
>> > accessed when needed)
>> You want Decimals to *lie* about what value they have?
>> I think that's a terrible idea, one which would lead to a whole set of
>> new and exciting surprises when using Decimal. Let me try to predict a
>> few of the questions on Stackoverflow which would follow this change...
>>   Why is equality so inaccurate in Python?
>>   py> x = Decimal(1.3)
>>   py> y = Decimal('1.3')
>>   py> x, y
>>   (Decimal('1.3'), Decimal('1.3'))
>>   py> x == y
>>   False
>>   Why does Python insert extra digits into numbers when I multiply?
>>   py> x = Decimal(1.3)
>>   py> x
>>   Decimal('1.3')
>>   py> y = 10000000000000000*x
>>   py> y - 13000000000000000
>>   Decimal('0.444089209850062616169452667236328125')
>> > ..but this is one really common pitfall for new users, i know its easy
>> > to
>> > fix the code above,
>> > but this behavior is very unintuitive.. you essentially get a really
>> > expensive float when you do the obvious thing.
>> Then don't do the obvious thing.
>> Sometimes there really is no good alternative to actually knowing what
>> you are doing. Floating point maths is inherently hard, but that's not
>> the problem. There are all sorts of things in programming which are
>> hard, and people learn how to deal with them. The problem is that people
>> *imagine* that floating point is simple, when it is not and can never
>> be. We don't do them any favours by enabling that delusion.
>> If your needs are light, then you can ignore the complexities of
>> floating point. You really can go a very long way by just rounding the
>> results of your calculations when displaying them. But for anything more
>> than that, we cannot just paper over the floating point complexities
>> without creating new complexities that will burn people.
>> You don't have to become a floating point guru, but it really isn't
>> onerous to expect people who are programming to learn a few basic
>> programming skills, and that includes a few basic coping strategies for
>> floating point.
>> --
>> Steve
>> ------------------------------
>> Message: 2
>> Date: Mon, 1 Jun 2015 19:21:47 -0700
>> From: Andrew Barnert <abarnert at>
>> To: Andrew Barnert <abarnert at>
>> Cc: Nick Coghlan <ncoghlan at>, python-ideas
>>         <python-ideas at>
>> Subject: Re: [Python-ideas] Python Float Update
>> Message-ID: <5E8271BF-183E-496D-A556-81C407977FFE at>
>> Content-Type: text/plain;       charset=us-ascii
>> On Jun 1, 2015, at 19:00, Andrew Barnert <abarnert at> wrote:
>> >
>> >> On Jun 1, 2015, at 18:27, Andrew Barnert via Python-ideas
>> >> <python-ideas at> wrote:
>> >>
>> >>> On Jun 1, 2015, at 17:08, Nick Coghlan <ncoghlan at> wrote:
>> >>>
>> >>> On 2 Jun 2015 08:44, "Andrew Barnert via Python-ideas"
>> >>> <python-ideas at> wrote:
>> >>>> But the basic idea can be extracted out and Pythonified:
>> >>>>
>> >>>> The literal 1.23 no longer gives you a float, but a FloatLiteral,
>> >>>> which is either a subclass of float, or an unrelated class that has a
>> >>>> __float__ method. Doing any calculation on it gives you a float. But as long
>> >>>> as you leave it alone as a FloatLiteral, it has its literal characters
>> >>>> available for any function that wants to distinguish FloatLiteral from
>> >>>> float, like the Decimal constructor.
>> >>>>
>> >>>> The problem that Python faces that Swift doesn't is that Python
>> >>>> doesn't use static typing and implicit compile-time conversions. So in
>> >>>> Python, you'd be passing around these larger values and doing the slow
>> >>>> conversions at runtime. That may or may not be unacceptable; without
>> >>>> actually building it and testing some realistic programs it's pretty hard to
>> >>>> guess.
>> >>>
>> >>> Joonas's suggestion of storing the original text representation passed
>> >>> to the float constructor is at least a novel one - it's only the idea
>> >>> of actual decimal literals that was ruled out in the past.
>> >>
>> >> I actually built about half an implementation of something like Swift's
>> >> LiteralConvertible protocol back when I was teaching myself Swift. But I
>> >> think I have a simpler version that I could implement much more easily.
>> >>
>> >> Basically, FloatLiteral is just a subclass of float whose __new__
>> >> stores its constructor argument. Then decimal.Decimal checks for that stored
>> >> string and uses it instead of the float value if present. Then there's an
>> >> import hook that replaces every Num with a call to FloatLiteral.
>> >>
>> >> This design doesn't actually fix everything; in effect, 1.3 actually
>> >> compiles to FloatLiteral(str(float('1.3'))) (because by the time you get to
>> >> the AST it's too late to avoid that first conversion). Which does actually
>> >> solve the problem with 1.3, but doesn't solve everything in general (e.g.,
>> >> just feed in a number that has more precision than a double can hold but
>> >> less than your current decimal context can...).
>> >>
>> >> But it just lets you test whether the implementation makes sense and
>> >> what the performance effects are, and it's only an hour of work,
>> >
>> > Make that 15 minutes.
>> >
>> >
>> And as it turns out, hacking the tokens is no harder than hacking the AST
>> (in fact, it's a little easier; I'd just never done it before), so now it
>> does that, meaning you really get the actual literal string from the source,
>> not the repr of the float of that string literal.
>> Turning this into a real implementation would obviously be more than half
>> an hour's work, but not more than a day or two. Again, I don't think anyone
>> would actually want this, but now people who think they do have an
>> implementation to play with to prove me wrong.
>> >> and doesn't require anyone to patch their interpreter to play with it.
>> >> If it seems promising, then hacking the compiler so 2.3 compiles to
>> >> FloatLiteral('2.3') may be worth doing for a test of the actual
>> >> functionality.
>> >>
>> >> I'll be glad to hack it up when I get a chance tonight. But personally,
>> >> I think decimal literals are a better way to go here. Decimal(1.20)
>> >> magically doing what you want still has all the same downsides as 1.20d (or
>> >> implicit decimal literals), plus it's more complex, adds performance costs,
>> >> and doesn't provide nearly as much benefit. (Yes, Decimal(1.20) is a little
>> >> nicer than Decimal('1.20'), but only a little--and nowhere near as nice as
>> >> 1.20d).
>> >>
>> >>> Aside from the practical implementation question, the main concern I
>> >>> have with it is that we'd be trading the status quo for a situation
>> >>> where "Decimal(1.3)" and "Decimal(13/10)" gave different answers.
>> >>
>> >> Yes, to solve that you really need Decimal(13)/Decimal(10)... Which
>> >> implies that maybe the simplification in Decimal(1.3) is more misleading
>> >> than helpful. (Notice that this problem also doesn't arise for decimal
>> >> literals--13/10d is int vs. Decimal division, which is correct out of the
>> >> box. Or, if you want prefixes, d13/10 is Decimal vs. int division.)
>> >>
>> >>> It seems to me that a potentially better option might be to adjust the
>> >>> implicit float->Decimal conversion in the Decimal constructor to use
>> >>> the same algorithm as we now use for float.__repr__ [1], where we look
>> >>> for the shortest decimal representation that gives the same answer
>> >>> when rendered as a float. At the moment you have to indirect through
>> >>> str() or repr() to get that behaviour:
>> >>>
>> >>>>>> from decimal import Decimal as D
>> >>>>>> 1.3
>> >>> 1.3
>> >>>>>> D('1.3')
>> >>> Decimal('1.3')
>> >>>>>> D(1.3)
>> >>> Decimal('1.3000000000000000444089209850062616169452667236328125')
>> >>>>>> D(str(1.3))
>> >>> Decimal('1.3')
>> >>>
>> >>> Cheers,
>> >>> Nick.
>> >>>
>> >>> [1]
>> >> _______________________________________________
>> >> Python-ideas mailing list
>> >> Python-ideas at
>> >>
>> >> Code of Conduct:
>> ------------------------------
>> Message: 3
>> Date: Tue, 2 Jun 2015 13:00:40 +1000
>> From: Steven D'Aprano <steve at>
>> To: python-ideas at
>> Subject: Re: [Python-ideas] Python Float Update
>> Message-ID: <20150602030040.GF932 at>
>> Content-Type: text/plain; charset=utf-8
>> Nicholas,
>> Your email client appears to not be quoting text you quote. It is
>> conventional to use a leading > for quoting; perhaps you could configure
>> your mail program to do so? The good ones even have a "Paste As Quote"
>> command.
>> On with the substance of your post...
>> On Mon, Jun 01, 2015 at 01:24:32PM -0400, Nicholas Chammas wrote:
>> > I guess it's a non-trivial tradeoff. But I would lean towards
>> > considering
>> > people likely to be affected by the performance hit as doing something
>> > 'not
>> > common'. Like, if they are doing that many calculations that it matters,
>> > perhaps it makes sense to ask them to explicitly ask for floats vs.
>> > decimals, in exchange for giving the majority who wouldn't notice a
>> > performance difference a better user experience.
>> Changing from binary floats to decimal floats by default is a big,
>> backwards incompatible change. Even if it's a good idea, we're
>> constrained by backwards compatibility: I would imagine we wouldn't want
>> to even introduce this feature until the majority of people are using
>> Python 3 rather than Python 2, and then we'd probably want to introduce
>> it using a "from __future__ import decimal_floats" directive.
>> So I would guess this couldn't happen until probably 2020 or so.
>> But we could introduce a decimal literal, say 1.1d for Decimal("1.1").
>> The first prerequisite is that we have a fast Decimal implementation,
>> which we now have. Next we would have to decide how the decimal literals
>> would interact with the decimal module. Do we include full support of
>> the entire range of decimal features, including globally configurable
>> precision and other modes? Or just a subset? How will these decimals
>> interact with other numeric types, like float and Fraction? At the
>> moment, Decimal isn't even part of the numeric tower.
>> There's a lot of ground to cover, it's not a trivial change, and will
>> definitely need a PEP.
>> > How many of your examples are inherent limitations of decimals vs.
>> > problems
>> > that can be improved upon?
>> In one sense, they are inherent limitations of floating point numbers
>> regardless of base. Whether binary, decimal, hexadecimal as used in some
>> IBM computers, or something else, you're going to see the same problems.
>> Only the specific details will vary, e.g. 1/3 cannot be represented
>> exactly in base 2 or base 10, but if you constructed a base 3 float, it
>> would be exact.
>> In another sense, Decimal has a big advantage that it is much more
>> configurable than Python's floats. Decimal lets you configure the
>> precision, rounding mode, error handling and more. That's not inherent
>> to base 10 calculations, you can do exactly the same thing for binary
>> floats too, but Python doesn't offer that feature for floats, only for
>> Decimals.
>> But no matter how you configure Decimal, all you can do is shift the
>> gotchas around. The issue really is inherent to the nature of the
>> problem, and you cannot defeat the universe. Regardless of what
>> base you use, binary or decimal or something else, or how many digits
>> precision, you're still trying to simulate an uncountably infinite
>> continuous, infinitely divisible number line using a finite,
>> discontinuous set of possible values. Something has to give.
>> (For the record, when I say "uncountably infinite", I don't just mean
>> "too many to count", it's a technical term. To oversimplify horribly, it
>> means "larger than infinity" in some sense. It's off-topic for here,
>> but if anyone is interested in learning more, you can email me off-list,
>> or google for "countable vs uncountable infinity".)
>> Basically, you're trying to squeeze an infinite number of real numbers
>> into a finite amount of memory. It can't be done. Consequently, there
>> will *always* be some calculations where the true value simply cannot be
>> calculated and the answer you get is slightly too big or slightly too
>> small. All the other floating point gotchas follow from that simple
>> fact.
>> > Admittedly, the only place where I've played with decimals extensively
>> > is on Microsoft's SQL Server (where they are the default literal
>> > <>). I've stumbled in the past on my own decimal gotchas
>> > <>, but looking at your examples and trying them on SQL Server I
>> > suspect that most of the problems you show are problems of precision
>> > and scale.
>> No. Change the precision and scale, and some *specific* problems go
>> away, but they reappear with other numbers.
>> Besides, at the point that you're talking about setting the precision,
>> we're really not talking about making things easy for beginners any
>> more.
>> And not all floating point issues are related to precision and scale in
>> decimal. You cannot divide a cake into exactly three equal pieces in
>> Decimal any more than you can divide a cake into exactly three equal
>> pieces in binary. All you can hope for is to choose a precision where the
>> rounding errors in one part of your calculation will be cancelled by the
>> rounding errors in another part of your calculation. And that precision
>> will be different for any two arbitrary calculations.
>> --
>> Steve
>> ------------------------------
>> Message: 4
>> Date: Mon, 1 Jun 2015 20:10:29 -0700
>> From: Andrew Barnert <abarnert at>
>> To: Steven D'Aprano <steve at>
>> Cc: "python-ideas at" <python-ideas at>
>> Subject: Re: [Python-ideas] Python Float Update
>> Message-ID: <79C16144-8BF7-4260-A356-DD4E8D97BAAD at>
>> Content-Type: text/plain;       charset=us-ascii
>> On Jun 1, 2015, at 18:58, Steven D'Aprano <steve at> wrote:
>> >
>> >> On Tue, Jun 02, 2015 at 10:08:37AM +1000, Nick Coghlan wrote:
>> >>
>> >> It seems to me that a potentially better option might be to adjust the
>> >> implicit float->Decimal conversion in the Decimal constructor to use
>> >> the same algorithm as we now use for float.__repr__ [1], where we look
>> >> for the shortest decimal representation that gives the same answer
>> >> when rendered as a float. At the moment you have to indirect through
>> >> str() or repr() to get that behaviour:
>> >
>> > Apart from the questions of whether such a change would be allowed by
>> > the Decimal specification,
>> As far as I know, GDAS doesn't specify anything about implicit conversion
>> from floats. As long as the required explicit conversion function (which I
>> think is from_float?) exists and does the required thing.
>> As a side note, has anyone considered whether it's worth switching to
>> IEEE-754-2008 as the controlling specification? There may be a good reason
>> not to do so; I'm just curious whether someone has thought it through and
>> made the case.
>> > and the breaking of backwards compatibility,
>> > I would really hate that change for another reason.
>> >
>> > At the moment, a good, cheap way to find out what a binary float "really
>> > is" (in some sense) is to convert it to Decimal and see what you get:
>> >
>> > Decimal(1.3)
>> > -> Decimal('1.3000000000000000444089209850062616169452667236328125')
>> >
>> > If you want conversion from repr, then you can be explicit about it:
>> >
>> > Decimal(repr(1.3))
>> > -> Decimal('1.3')
>> >
>> > ("Explicit is better than implicit", as they say...)
>> >
>> > Although in fairness I suppose that if this change happens, we could
>> > keep the old behaviour in the from_float method:
>> >
>> > # hypothetical future behaviour
>> > Decimal(1.3)
>> > -> Decimal('1.3')
>> > Decimal.from_float(1.3)
>> > -> Decimal('1.3000000000000000444089209850062616169452667236328125')
>> >
>> > But all things considered, I don't think we're doing people any favours
>> > by changing the behaviour of float->Decimal conversions to implicitly
>> > use the repr() instead of being exact. I expect this strategy is like
>> > trying to flatten a bubble under wallpaper: all you can do is push the
>> > gotchas and surprises to somewhere else.
>> >
>> > Oh, another thought... Decimals could gain yet another conversion
>> > method, one which implicitly uses the float repr, but signals if it was
>> > an inexact conversion or not. Explicitly calling repr can never signal,
>> > since the conversion occurs outside of the Decimal constructor and
>> > Decimal sees only the string:
>> >
>> > Decimal(repr(1.3)) cannot signal Inexact.
>> >
>> > But:
>> >
>> > Decimal.from_nearest_float(1.5)  # exact
>> > Decimal.from_nearest_float(1.3)  # signals Inexact
>> >
>> > That might be useful, but probably not to beginners.
>> I think this might be worth having whether the default constructor is
>> changed or not.
>> I can't think of too many programs where I'm pretty sure I have an
>> exactly-representable decimal as a float but want to check to be sure... but
>> for interactive use in IPython (especially when I'm specifically trying to
>> explain to someone why just using Decimal instead of float will/will not
>> solve their problem) I could see using it.
> --
> -Surya Subbarao

-Surya Subbarao
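
The Decimal points in the quoted digest can be checked interactively. The
following is a minimal sketch, not part of the original messages. Note that
from_nearest_float() here is a hypothetical helper modelled on the method
floated in the thread; no such method exists on Decimal, and instead of
signalling Inexact it simply returns a flag.

```python
from decimal import Decimal, Inexact, localcontext

# Exact conversion shows the binary value actually stored for 1.3.
exact = Decimal(1.3)
# Going through repr() gives the shortest string that round-trips.
shortest = Decimal(repr(1.3))
print(exact)
print(shortest)


def from_nearest_float(x):
    """Hypothetical helper: convert via repr() and report exactness.

    Returns (decimal, was_exact) rather than signalling Inexact.
    """
    nearest = Decimal(repr(x))
    return nearest, nearest == Decimal(x)


print(from_nearest_float(1.5))  # 1.5 is exactly representable in binary
print(from_nearest_float(1.3))  # 1.3 is not

# Raising the precision does not make 1/3 exact; it only moves the error.
results = []
with localcontext() as ctx:
    for prec in (5, 28, 50):
        ctx.prec = prec
        ctx.clear_flags()
        third = Decimal(1) / Decimal(3)
        results.append((prec, bool(ctx.flags[Inexact]), third * 3 == 1))
print(results)
```

At every precision tried, the division sets the Inexact flag and
multiplying back by 3 does not give 1, which is the "you can only shift
the gotchas around" point made in the quoted text.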

From ethan at  Wed Jun  3 00:32:21 2015
From: ethan at (Ethan Furman)
Date: Tue, 02 Jun 2015 15:32:21 -0700
Subject: [Python-ideas] Python-ideas Digest, Vol 103, Issue 15
In-Reply-To: <>
References: <>
Message-ID: <>

On 06/01/2015 10:22 PM, Andrew Barnert via Python-ideas wrote:
> On Jun 1, 2015, at 20:41, u8y7541 The Awesome Person <surya.subbarao1 at> wrote:
>> I think you're right. I was also considering ... "editing" my Python distribution. If they didn't implement my suggestion for correcting floats, at least they can fix this, instead of making people
>> hack Python for good results!
> If you're going to reply to digests, please learn how to reply inline instead of top-posting (and how to trim out all the irrelevant stuff). It's next to impossible to tell which part of which of the
> messages you're replying to even in simple cases like this one, with only 4 messages in the digest.

This would have been a better example had you trimmed the cruft yourself.  ;)


From phd at  Wed Jun  3 00:44:20 2015
From: phd at (Oleg Broytman)
Date: Wed, 3 Jun 2015 00:44:20 +0200
Subject: [Python-ideas] Meta: Email netiquette
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jun 02, 2015 at 03:28:33PM -0700, u8y7541 The Awesome Person <surya.subbarao1 at> wrote:
> What do you mean by replying inline?

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?

> On Mon, Jun 1, 2015 at 10:22 PM, Andrew Barnert <abarnert at> wrote:
> > On Jun 1, 2015, at 20:41, u8y7541 The Awesome Person
> > <surya.subbarao1 at> wrote:
> >
> > I think you're right. I was also considering ... "editing" my Python
> > distribution. If they didn't implement my suggestion for correcting floats,
> > at least they can fix this, instead of making people hack Python for good
> > results!
> >
> >
> > If you're going to reply to digests, please learn how to reply inline
> > instead of top-posting (and how to trim out all the irrelevant stuff). It's
> > next to impossible to tell which part of which of the messages you're
> > replying to even in simple cases like this one, with only 4 messages in the
> > digest.

     Oleg Broytman              phd at
           Programmers don't die, they just GOSUB without RETURN.

From tjreedy at  Wed Jun  3 02:05:42 2015
From: tjreedy at (Terry Reedy)
Date: Tue, 02 Jun 2015 20:05:42 -0400
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <mklgdl$ctb$>

On 6/2/2015 3:40 PM, Chris Angelico wrote:
> On Wed, Jun 3, 2015 at 5:03 AM, Andrew Barnert via Python-ideas

>> Similarly, this idea could be extended to handle all literal types, so you can do `{'spam': 1, 'eggs': 2}_o` to create an OrderedDict literal, but I think that's ugly enough to not be worth proposing. (A prefix looks better there... but a prefix doesn't work for numbers or strings. And I'm not sure it's unambiguously parseable even for list/set/dict.) Plus, there's the problem that comprehensions and actual literals are both parsed as displays, but you wouldn't want user-defined comprehensions.
> I thought there was no such thing as a dict/list/set literal, only
> display syntax?

Correct.  Only number and string literals. Displays are atomic runtime
expressions.  'expression_list' and 'comprehension' are alternate
contents of a display. See the language reference, section 6.2.4,
"Displays for lists, sets and dictionaries".
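
One way to see the distinction is to ask the parser what node type each
form produces. This is an illustrative sketch added for clarity; the
node_kind() helper is an assumption of the example, and the Constant node
name applies to Python 3.8 and later.

```python
import ast


def node_kind(source):
    """Return the AST node class name for a single expression."""
    return type(ast.parse(source, mode="eval").body).__name__


examples = [
    "'spam'",                # string literal
    "1.2",                   # number literal
    "[1, 2]",                # list display holding an expression list
    "{'spam': 1}",           # dict display
    "[x for x in y]",        # list display holding a comprehension
    "{k: v for k, v in z}",  # dict display holding a comprehension
]
for source in examples:
    print(f"{source:24} -> {node_kind(source)}")
```

Only the first two come back as constants; the displays get their own
node types, with comprehensions separate from plain displays.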
Terry Jan Reedy

From abarnert at  Wed Jun  3 02:36:22 2015
From: abarnert at (Andrew Barnert)
Date: Tue, 2 Jun 2015 17:36:22 -0700
Subject: [Python-ideas] User-defined literals
In-Reply-To: <20150602193346.GG26357@tonks>
References: <>
Message-ID: <>

On Jun 2, 2015, at 12:33, Florian Bruhin <me at> wrote:
> * Andrew Barnert via Python-ideas <python-ideas at> [2015-06-02 12:03:25 -0700]:
>> This is a straw-man proposal for user-defined literal suffixes, similar to the design in C++.
> I actually had the exact same thing in mind recently, and never
> brought it up because it seemed too crazy to me. It seems I'm not the
> only one! :D
>> Any number or string token followed by a name (identifier) token is currently illegal. This would change so that, if there's no whitespace between them, it's legal, and equivalent to a call to a function named `literal_{name}({number-or-string})`. For example, `1.2d` becomes `literal_d('1.2')`, `1.2_dec` becomes `literal_dec('1.2')`, `"1.2"d` also becomes `literal_d('1.2')`.
> I think a big issue is that it's non-obvious syntactic sugar. You
> wouldn't expect 1.2x to actually be a function call, and for newcomers
> this might be rather confusing...

Well, newcomers won't be creating user-defined literals, so they won't have to even know there's a function call (unless whoever wrote the library that supplies them has a bug).
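
For concreteness, here is roughly what the desugaring would amount to,
written out by hand. This is only a sketch of the straw-man semantics:
the suffix syntax itself is not valid Python, and literal_d is a
hypothetical handler a library might supply.

```python
from decimal import Decimal


def literal_d(text):
    """Hypothetical handler for a `d` suffix; receives the source text."""
    return Decimal(text)


# Under the proposal, `1.2d` and `"1.2"d` would both mean this call:
value = literal_d('1.2')
print(value + Decimal('0.1'))

# The handler is looked up at runtime, so an unknown suffix such as
# `1.2z3` would surface as a NameError rather than a SyntaxError.
try:
    eval("literal_z3('1.2')", {})
except NameError as exc:
    error_name = type(exc).__name__
    print(error_name, exc)
```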

>> Similarly, this idea could be extended to handle all literal types, so you can do `{'spam': 1, 'eggs': 2}_o` to create an OrderedDict literal, but I think that's ugly enough to not be worth proposing. (A prefix looks better there... but a prefix doesn't work for numbers or strings. And I'm not sure it's unambiguously parseable even for list/set/dict.) Plus, there's the problem that comprehensions and actual literals are both parsed as displays, but you wouldn't want user-defined comprehensions.
> That actually was the use-case I had in mind. I think
>    {'spam': 1, 'eggs': 2}_o
> is less ugly (and less error-prone!) than
>    OrderedDict([('spam', 1), ('eggs', 2)])

Well, I suppose that's one advantage of the literals being user-defined: you can use _o in your project, and I can not use it. :)

But you still have to deal with the other issue I mentioned if you want to extend it to collection literals: again, they aren't really literals, or even easy to define except as "displays that aren't comprehensions". A quick hack like this is actually pretty easy to write (especially because in a quick hack, who cares whether using it on a comprehension gives the wrong error, or accidentally "works"); a real design and implementation may be harder.

> Also, it's immediately apparent that it is some kind of dict.

That is a good point. Not that it isn't immediately apparent that OrderedDict(...) is some kind of dict as well... But compared to Swift using ArrayLiteralConvertible to define sets or C++ using array-like initializer lists to do the same thing, this is definitely not as bad.

From abarnert at  Wed Jun  3 02:47:03 2015
From: abarnert at (Andrew Barnert)
Date: Tue, 2 Jun 2015 17:47:03 -0700
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 2, 2015, at 12:40, Chris Angelico <rosuav at> wrote:
> On Wed, Jun 3, 2015 at 5:03 AM, Andrew Barnert via Python-ideas
> <python-ideas at> wrote:
>> Of course `0x12decimal` becomes `literal_imal('0x12dec')`, and `21jump` becomes `literal_ump('21j')`, which are not at all useful, and potentially confusing, but I don't think that would be a serious problem in practice.
> There's probably no solution to the literal_imal problem, but the
> easiest fix for literal_ump is to have 21j be parsed the same way -
> it's a 21 modified by j, same as 21jump is a 21 modified by jump.

Thanks; I should have thought of that--especially since that's exactly how C++ solves similar problems. (Although reserving all suffixes that don't start with an underscore for the implementation's use doesn't hurt...)

>> Unlike C++, the lookup of that literal function happens at runtime, so `1.2z3` is no longer a SyntaxError, but a NameError on `literal_z3`. Also, this means `literal_d` has to be in scope in every module you want decimal literals in, which often means a `from ... import` (or something worse, like monkeypatching builtins). C++ doesn't have that problem because of argument-dependent lookup, but that doesn't work for any other language. I think this is the biggest flaw in the proposal.
> I'd much rather see it be done at compile time. Something like this:
> compile("x = 1d/10", "<>", "exec")
> would immediately call literal_d("1") and embed its return value in
> the resulting code as a literal. (Since the peephole optimizer
> presumably doesn't currently understand Decimals, this would probably
> keep the division, but if it got enhanced, this could end up
> constant-folding to Decimal("0.1") before returning the code object.)
> So it's only the compilation step that needs to know about all those
> literal_* functions. Should there be a way to globally register them
> for default usage, or is this too much action-at-a-distance?

It would definitely be nicer to have it done at compile time if possible. I'm just not sure there's a good design that makes it possible.

In particular, with your suggestion (which I considered), it seems a bit opaque to me that 1.2d is an error unless you _or some other module_ first imported decimalliterals; it's definitely more explicit if you (not some other module) have to from decimalliterals import literal_d. (And when you really want to be implicit, you can inject it into other modules or into builtins, the same as any other rare case where you really want to be implicit.)

But many real projects are either complex enough to need centralized organization or simple enough to fit in one script, so maybe it wouldn't turn out too "magical" in practice.
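
To make the compile-time alternative concrete, here is a toy sketch of the
registry approach. Everything in it is an assumption for illustration:
REGISTRY, compile_with_literals(), and the `digits_name` spelling are
invented here, and the regular expression ignores strings and comments,
so this is nowhere near a real design.

```python
import re
from decimal import Decimal

# Hypothetical global registry mapping suffixes to handler functions.
REGISTRY = {"d": Decimal}

# Toy pattern: a plain decimal number followed by `_name`.
_SUFFIXED = re.compile(r"(?<![\w.])(\d+(?:\.\d+)?)_([A-Za-z][A-Za-z0-9]*)\b")


def compile_with_literals(source, filename="<literals>"):
    """Evaluate suffixed literals once, before compile(), via the registry."""
    constants = {}

    def replace(match):
        text, suffix = match.groups()
        name = f"_lit{len(constants)}"
        # The handler runs here, at "compile time", not when the code runs.
        constants[name] = REGISTRY[suffix](text)
        return name

    code = compile(_SUFFIXED.sub(replace, source), filename, "exec")
    return code, constants


code, constants = compile_with_literals("x = 1_d / 10\ny = 1.2_d + 1")
namespace = dict(constants)
exec(code, namespace)
print(namespace["x"], namespace["y"])
```

The "magic" is visible here: the source being compiled never mentions
where `d` comes from; it works only because something registered it
beforehand.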

>> Also unlike C++, there's no overloading on different kinds of literals; the conversion function has no way of knowing whether the user actually typed a string or a number. This could easily be changed (e.g., by using different names, or just by passing the repr of the string instead of the string itself), but I don't think it's necessary.
> I'd be inclined to simply always provide a string. The special case
> would be that the quotes can sometimes be omitted, same as redundant
> parens on genexprs can sometimes be omitted.

Yes, that's what I thought too. The only real use case C++ has for this is allowing the same suffix to mean different things for different types, which I think would be more of a bug magnet than a feature if anyone actually did it...

> Otherwise, 1.2d might
> still produce wrong results.
>> Similarly, this idea could be extended to handle all literal types, so you can do `{'spam': 1, 'eggs': 2}_o` to create an OrderedDict literal, but I think that's ugly enough to not be worth proposing. (A prefix looks better there... but a prefix doesn't work for numbers or strings. And I'm not sure it's unambiguously parseable even for list/set/dict.) Plus, there's the problem that comprehensions and actual literals are both parsed as displays, but you wouldn't want user-defined comprehensions.
> I thought there was no such thing as a dict/list/set literal, only
> display syntax?

That's what I meant in the last sentence: technically, there's no such thing as a dict literal, just a dict display that isn't a comprehension. I don't think you want user-defined suffixes on comprehensions, and coming up with a principled and simply-implementable way to make them work on literal-type displays but not comprehension-type displays doesn't seem like an easy problem.

> In any case, that can always be left for a future
> extension to the proposal.
> ChrisA

From rosuav at  Wed Jun  3 03:05:09 2015
From: rosuav at (Chris Angelico)
Date: Wed, 3 Jun 2015 11:05:09 +1000
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Jun 3, 2015 at 10:47 AM, Andrew Barnert <abarnert at> wrote:
>>> Similarly, this idea could be extended to handle all literal types, so you can do `{'spam': 1, 'eggs': 2}_o` to create an OrderedDict literal, but I think that's ugly enough to not be worth proposing. (A prefix looks better there... but a prefix doesn't work for numbers or strings. And I'm not sure it's unambiguously parseable even for list/set/dict.) Plus, there's the problem that comprehensions and actual literals are both parsed as displays, but you wouldn't want user-defined comprehensions.
>> I thought there was no such thing as a dict/list/set literal, only
>> display syntax?
> That's what I meant in the last sentence: technically, there's no such thing as a dict literal, just a dict display that isn't a comprehension. I don't think you want user-defined suffixes on comprehensions, and coming up with a principled and simply-implementable way to make them work on literal-type displays but not comprehension-type displays doesn't seem like an easy problem.

Yeah. The significance is that literals get snapshotted into the code
object as constants and simply called up when they're needed, but
displays are executable code:

>>> dis.dis(lambda: "Literal")
  1           0 LOAD_CONST               1 ('Literal')
              3 RETURN_VALUE
>>> dis.dis(lambda: ["List","Display"])
  1           0 LOAD_CONST               1 ('List')
              3 LOAD_CONST               2 ('Display')
              6 BUILD_LIST               2
              9 RETURN_VALUE
>>> dis.dis(lambda: ("Tuple","Literal"))
  1           0 LOAD_CONST               3 (('Tuple', 'Literal'))
              3 RETURN_VALUE

My understanding of "literal" is something which can be processed
entirely at compile time, and retained in the code object, just like
strings are. Once the code's finished being compiled, there's no
record of what type of string literal was used (raw, triple-quoted,
etc), only the type of string object (bytes/unicode). Custom literals
could be the same - come to think of it, it might be nice to have
pathlib.Path literals, represented as p"/home/rosuav" or something. In
any case, they'd be evaluated using only compile-time information, and
would then be saved as constants.

That implies that only immutables should have literal syntaxes. I'm
not sure whether that's significant or not.
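
The same distinction can be read off the code object's constants without
disassembling. A small sketch, assuming CPython (other implementations may
store constants differently); the consts() helper is just for the example.

```python
def consts(func):
    """Return the constants stored in a function's code object."""
    return func.__code__.co_consts


tuple_consts = consts(lambda: ("Tuple", "Literal"))
list_consts = consts(lambda: ["List", "Display"])
print(tuple_consts)
print(list_consts)

# The whole tuple is stored as one constant...
has_tuple = ("Tuple", "Literal") in tuple_consts
# ...but a list is never stored; it is rebuilt on every call.
has_list = any(isinstance(c, list) for c in list_consts)
print(has_tuple, has_list)
```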


From abarnert at  Wed Jun  3 03:35:09 2015
From: abarnert at (Andrew Barnert)
Date: Tue, 2 Jun 2015 18:35:09 -0700
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 2, 2015, at 12:40, Nathaniel Smith <njs at> wrote:
> On Tue, Jun 2, 2015 at 12:03 PM, Andrew Barnert via Python-ideas
> <python-ideas at> wrote:
>> This is a straw-man proposal for user-defined literal suffixes, similar to the design in C++.
>> In the thread on decimals, a number of people suggested that they'd like to have decimal literals. Nick Coghlan explained why decimal.Decimal literals don't make sense in general (primarily, but not solely, because they're inherently context-sensitive), so unless we first add a fixed type like decimal64, that idea is a non-starter. However, there was some interest in either having Swift-style convertible literals or C++-style user-defined literals. Either one would allow users who want decimal literals for a particular app where it makes sense (because there's a single fixed context, and the performance cost of Decimal('1.2') vs. a real constant is irrelevant) to add them without too much hassle or hackery.
> Are there any use cases besides decimals?

> Wouldn't it be easier to
> just add, say, a fixed "0d" prefix for decimals?

I suggested that on the other thread, but go back and read the first paragraph of this thread. We don't want a standard literal syntax for decimal.Decimal. Some users may want it for some projects, but they should have to do something explicit to get it.

Meanwhile, a literal syntax for decimal64 would be very useful, but there's no such type in the stdlib, so anyone who wants it has to go get it on PyPI, which means the PyPI module, not Python itself, would have to supply the literal.

And, since I don't know of any implementation of decimal64 without decimal32 and decimal128, I can easily imagine wanting separate literals for all three.

And f or r for fraction came up in the other thread.

Beyond that? I don't know. If you look at the C++ proposal (N2750) and the various blog posts written around 2008-2012, here's what comes up repeatedly, in (to me) decreasing order of usefulness in Python:

 * One or more decimal float types.

 * Custom string types, like a string that iterates grapheme clusters instead of code units (Java and Swift have these; I don't know of an implementation for Python), or a mutable rope-based implementation, or the bytes-that-knows-its-encoding type that Nick Coghlan suggested some time last year.

 * Integers specified in arbitrary bases.

 * Quaternions or other number-like types beyond complex.

 * Points or vectors represented as 3x + 4z.

 * Units. Which I'm not sure is a good idea. (200*km seems just as readable to me as 200km, and only the former extends in an obvious way to 200*km/sec...) And I think the same goes for similar things like CSS units (1.2*em seems as good as 1.2_em to me).

 * Various things Python already has (real string objects instead of char*, real Unicode strings, binary integers, arbitrary-precision integers, etc.).

 * Cases where a constructor call would actually be just as nice, except for some other deficiency of C++ (e.g., you can't use a constexpr constructor expression as a template argument in C++11).

 * Blatantly silly things, like integral radians or integral halves (which people keep saying physicists could use, only for physicists to ask "where would I use that?").
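
On the units item: the 200*km spelling needs no new syntax, only operator
overloading. A minimal sketch, with a made-up Quantity class that does no
real dimensional analysis:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """Toy value-with-unit type; purely illustrative."""
    value: float
    unit: str

    def __rmul__(self, number):
        # Handles `200 * km`: the int defers to the right-hand operand.
        return Quantity(number * self.value, self.unit)

    def __truediv__(self, other):
        # Handles `(200 * km) / sec`.
        return Quantity(self.value / other.value, f"{self.unit}/{other.unit}")


km = Quantity(1, "km")
sec = Quantity(1, "sec")

speed = 200 * km / sec
print(speed)
```

A suffix form like 200km has no equally obvious way to express the
"/sec" part, which is the objection raised in that bullet.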

> 0x1001  # hex
> 0b1001  # binary
> 0d1.001 # decimal
>> Similarly, this idea could be extended to handle all literal types, so you can do `{'spam': 1, 'eggs': 2}_o` to create an OrderedDict literal, but I think that's ugly enough to not be worth proposing. (A prefix looks better there... but a prefix doesn't work for numbers or strings. And I'm not sure it's unambiguously parseable even for list/set/dict.) Plus, there's the problem that comprehensions and actual literals are both parsed as displays, but you wouldn't want user-defined comprehensions.
> Also there's the idea floating around of making *all* dicts ordered
> (as PyPy has done), which would be much cleaner if it can be managed,
> so I'm guessing that would have to be tried and fail before any new
> syntax would be added for this use case.

Well, OrderedDict isn't the only container class, even in the stdlib. But the real point would be types outside the stdlib. You could construct a sorted dict using blist or SortedContainers without having to first construct a dict in arbitrary order and then copy-sort it. Or build a NumPy array without building a list. And so on.

But, again, I think the additional problems with container literals (which, again, aren't really literals) mean it would be worth leaving this out of any 1.0 proposal (and if containers are the only good selling point for the whole thing, that may mean the whole thing isn't worth having).

From bruce at  Wed Jun  3 03:50:53 2015
From: bruce at (Bruce Leban)
Date: Tue, 2 Jun 2015 18:50:53 -0700
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jun 2, 2015 at 12:03 PM, Andrew Barnert via Python-ideas <
python-ideas at> wrote:

> Any number or string token followed by a name (identifier) token is
> currently illegal. This would change so that, if there's no whitespace
> between them, it's legal, and equivalent to a call to a function named
> `literal_{name}({number-or-string})`. For example, `1.2d` becomes
> `literal_d('1.2')`, `1.2_dec` becomes `literal_dec('1.2')`, `"1.2"d` also
> becomes `literal_d('1.2')`.
> Of course `0x12decimal` becomes `literal_imal('0x12dec')`, and `21jump`
> becomes `literal_ump('21j')`, which are not at all useful, and potentially
> confusing, but I don't think that would be a serious problem in practice.

You seem to suggest that the token should start with an underscore when you
write 1.2_dec and  {...}_o but not when you write 1.2d and 1.2jump.
Requiring the underscore solves the ambiguity and would make literals more
readable. I would also require an alphabetic character after the _ and
prohibit _ inside the name to avoid confusion.

1.2_d        => literal_d('1.2')
1.2j_ump     => literal_ump('1.2j')
1.2_jump     => literal_jump('1.2')

0x12dec_imal => literal_imal('0x12dec')

0x12_decimal => literal_decimal('0x12')

"1.2"_ebcdic => literal_ebcdic('1.2')

1.2d         => error
0x12decimal  => error

1_a_b        => error

1_2          => error
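
The rules in the table above are simple enough to check mechanically. The
following sketch is one possible reading of them, not a proposed
implementation: desugar() and the patterns are assumptions of the example,
and the number and string patterns are deliberately simplified (no string
escapes, no prefixes).

```python
import re

# Simplified token patterns: hex integers, decimal/float/imaginary
# numbers, and quoted strings without escapes.
_NUMBER = r"0[xX][0-9a-fA-F]+|(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?[jJ]?"
_STRING = r"\"[^\"\\]*\"|'[^'\\]*'"

# Suffix rule: an underscore, then a letter, then letters or digits only.
_TOKEN = re.compile(
    rf"(?P<body>{_NUMBER}|{_STRING})_(?P<name>[A-Za-z][A-Za-z0-9]*)"
)


def desugar(token):
    """Return the call a suffixed literal would stand for, as source text."""
    match = _TOKEN.fullmatch(token)
    if match is None:
        raise SyntaxError(f"invalid suffixed literal: {token!r}")
    body = match.group("body")
    if body[0] in "\"'":
        body = body[1:-1]  # the handler always receives a plain string
    return f"literal_{match.group('name')}({body!r})"


for token in ["1.2_d", "1.2j_ump", "0x12dec_imal", "0x12_decimal",
              '"1.2"_ebcdic', "1.2d", "0x12decimal", "1_a_b", "1_2"]:
    try:
        print(f"{token:14} => {desugar(token)}")
    except SyntaxError:
        print(f"{token:14} => error")
```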

I do think the namespace thing is an issue but requiring me to write

from literals import literal_jump

isn't necessarily that bad. Without an explicit import, how would I go
about tracking down what exactly 21_jump means?

The use of _o on a dict is strange since the thing you're attaching it to
isn't a literal. I think there needs to be some more thought here if you
want to apply it to anything other than a simple value:

(1, 3, 4)_xyzspace
{'a': 1 + 2}_o
{'a', 'b': 3}_o

("abc", "def")_x
"abc" "def"_x

("abc" "def")_x
("abc" "def",)_x
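
Part of what makes these cases awkward is how the unsuffixed forms
already behave today. A short sketch of current behaviour (the is_valid()
helper is only for the example):

```python
# Adjacent string literals are concatenated by the compiler.
joined = "abc" "def"
# Parentheses alone do not make a tuple, so this is still one string...
one_string = ("abc" "def")
# ...while a trailing comma makes a one-element tuple.
one_tuple = ("abc" "def",)
print(repr(joined), repr(one_string), repr(one_tuple))


def is_valid(source):
    """Report whether `source` compiles as a single expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# A dict display may hold arbitrary expressions as values...
print(is_valid("{'a': 1 + 2}"))
# ...but mixing set-style and dict-style items is already a syntax error.
print(is_valid("{'a', 'b': 3}"))
```

So a suffix on "abc" "def" would have to say whether it applies to the
last piece or to the concatenated result, and a suffix on a display would
be attached to something containing code that runs at runtime.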

--- Bruce

From abarnert at  Wed Jun  3 03:56:13 2015
From: abarnert at (Andrew Barnert)
Date: Tue, 2 Jun 2015 18:56:13 -0700
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 2, 2015, at 18:05, Chris Angelico <rosuav at> wrote:
> On Wed, Jun 3, 2015 at 10:47 AM, Andrew Barnert <abarnert at> wrote:
>>>> Similarly, this idea could be extended to handle all literal types, so you can do `{'spam': 1, 'eggs': 2}_o` to create an OrderedDict literal, but I think that's ugly enough to not be worth proposing. (A prefix looks better there... but a prefix doesn't work for numbers or strings. And I'm not sure it's unambiguously parseable even for list/set/dict.) Plus, there's the problem that comprehensions and actual literals are both parsed as displays, but you wouldn't want user-defined comprehensions.
>>> I thought there was no such thing as a dict/list/set literal, only
>>> display syntax?
>> That's what I meant in the last sentence: technically, there's no such thing as a dict literal, just a dict display that isn't a comprehension. I don't think you want user-defined suffixes on comprehensions, and coming up with a principled and simply-implementable way to make them work on literal-type displays but not comprehension-type displays doesn't seem like an easy problem.
> Yeah. The significance is that literals get snapshotted into the code
> object as constants and simply called up when they're needed, but
> displays are executable code:
>>>> dis.dis(lambda: "Literal")
>  1           0 LOAD_CONST               1 ('Literal')
>              3 RETURN_VALUE
>>>> dis.dis(lambda: ["List","Display"])
>  1           0 LOAD_CONST               1 ('List')
>              3 LOAD_CONST               2 ('Display')
>              6 BUILD_LIST               2
>              9 RETURN_VALUE
>>>> dis.dis(lambda: ("Tuple","Literal"))
>  1           0 LOAD_CONST               3 (('Tuple', 'Literal'))
>              3 RETURN_VALUE
> My understanding of "literal" is something which can be processed
> entirely at compile time, and retained in the code object, just like
> strings are.

The problem is that Python doesn't really define what it means by "literal" anywhere, and the documentation is not consistent. There are at least two places (not counting tutorial and howtos) that Python 3.4 refers to list or dict literals. (That's not based on a search; someone wrote a StackOverflow question asking what those two places meant.)

Which I don't actually think is much of a problem. It means that in cases like this proposal, you have to be explicit about exactly what you mean by "literal" because Python doesn't do it for you. And it comes up when teaching people about how the parser and compiler work. And... That's about it. You can (as the docs do) loosely use "literal" to include non-comprehension displays in some places but not others, or even to include -2 or 1+2j in some places but not others, and nobody gets confused, except in those special contexts where you're going to have to get into the details anyway.
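
The looseness is easy to demonstrate with the stdlib itself. In this
sketch (node_kind() is a helper for the example; behaviour shown is for
current CPython), the parser treats -2 and 1+2j as operator expressions,
yet ast.literal_eval happily accepts them, along with list displays:

```python
import ast


def node_kind(source):
    """Return the AST node class name for a single expression."""
    return type(ast.parse(source, mode="eval").body).__name__


# Grammatically these are operators applied to literals...
print(node_kind("-2"))    # UnaryOp
print(node_kind("1+2j"))  # BinOp

# ...but literal_eval's notion of "literal" includes them, and displays.
print(ast.literal_eval("-2"))
print(ast.literal_eval("1+2j"))
print(ast.literal_eval("[1, 2]"))
```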

This is similar to the fact that Python doesn't actually define the semantics of numeric literals anywhere. It's still obvious to anyone what they're supposed to be. The Python docs are a language reference manual, not a rigorous specification, and that's fine.

> Once the code's finished being compiled, there's no
> record of what type of string literal was used (raw, triple-quoted,
> etc), only the type of string object (bytes/unicode). Custom literals
> could be the same

But how? Without magic (like a registry or something similarly not locally visible in the source), how does the compiler know about user-defined literals at compile time? Python (unlike C++) doesn't have an extensible notion of "compile-time computation" to hook into here.

And why do you actually care that it happens at compile time? If it's for optimization, that may be premature and irrelevant. (Certainly 1.2d isn't going to be any _worse_ than Decimal('1.2'), it just may not be better.) If it's because you want to reflect on code objects or something, that's not normal end-user code. Why should a normal user ever even know, much less care, whether 1.2d is stored as a constant or an expression in memory or in a .pyc file?

> - come to think of it, it might be nice to have
> pathlib.Path literals, represented as p"/home/rosuav" or something. In
> any case, they'd be evaluated using only compile-time information, and
> would then be saved as constants.
> That implies that only immutables should have literal syntaxes. I'm
> not sure whether that's significant or not.

But pathlib.Path isn't immutable.

Meanwhile, that reminds me: one of the frequent selling points for Swift's related feature is for NSURL literals (which Cocoa uses for local paths as well as remote resources); I should go through the Swift selling points to see if they've found other things that the C++ community hasn't (but that can be ported to the C++ design, and that don't depend on peculiarities of Cocoa to be interesting).

From steve at  Wed Jun  3 04:52:07 2015
From: steve at (Steven D'Aprano)
Date: Wed, 3 Jun 2015 12:52:07 +1000
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jun 02, 2015 at 12:03:25PM -0700, Andrew Barnert via Python-ideas wrote:

> I explored the convertible literals a while ago, and I'm pretty sure 
> that doesn't work in a duck-typed language. But the C++ design does 
> work, as long as you're willing to have the conversion (including the 
> lookup of the conversion function itself) done at runtime.

I'm torn. On the one hand, some sort of extensible syntax for literals 
would be nice. I say "nice" rather than useful because there are 
advantages and disadvantages and there's no way of really knowing 
which outweighs the other.

But, really, your proposal is in no way, shape or form syntax for 
*literals*, it's a new syntax for a unary postfix operator or function.
The whole point of something being a literal is that it is parsed and 
converted at compile time. Now you might (and do) say that worrying 
about this is "premature optimization", but call me a pedant if you 
like, I don't think we should call something a literal if it's a 
runtime function call. Otherwise, we might as well say that 

    from fractions import Fraction

is a literal, in which case I can say your proposal is unnecessary as we 
already have user-specified literals in Python.

I can think of some interesting uses for postfix operators, or literals, 
or whatever we want to call them:


I've deliberately not explained what I mean by each of them. You can 
probably guess some, or all, but I hope it demonstrates one problem with 
this suggestion. Like operator overloading, it risks making code less 
clear rather than more.


From rosuav at  Wed Jun  3 05:12:52 2015
From: rosuav at (Chris Angelico)
Date: Wed, 3 Jun 2015 13:12:52 +1000
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Jun 3, 2015 at 11:56 AM, Andrew Barnert <abarnert at> wrote:
> On Jun 2, 2015, at 18:05, Chris Angelico <rosuav at> wrote:
>> My understanding of "literal" is something which can be processed
>> entirely at compile time, and retained in the code object, just like
>> strings are.
> The problem is that Python doesn't really define what it means by "literal" anywhere, and the documentation is not consistent. There are at least two places (not counting tutorial and howtos) that Python 3.4 refers to list or dict literals. (That's not based on a search; someone wrote a StackOverflow question asking what those two places meant.)
> Which I don't actually think is much of a problem. It means that in cases like this proposal, you have to be explicit about exactly what you mean by "literal" because Python doesn't do it for you. And it comes up when teaching people about how the parser and compiler work. And... That's about it. You can (as the docs do) loosely use "literal" to include non-comprehension displays in some places but not others, or even to include -2 or 1+2j in some places but not others, and nobody gets confused, except in those special contexts where you're going to have to get into the details anyway.
> This is similar to the fact that Python doesn't actually define the semantics of numeric literals anywhere. It's still obvious to anyone what they're supposed to be. The Python docs are a language reference manual, not a rigorous specification, and that's fine.

Yes, it's a bit tricky. Part of the confusion comes from the peephole
optimizer; "1+2j" looks like a constant, but it's actually a
compile-time expression. It wouldn't be a big problem to have an
uber-specific definition of "literal" that cuts out things like that;
for the most part, it's not going to be a problem (eg if you define a
fractions.Fraction literal, you could use "1/2frac" or "1frac/2" and
you'd get back Fraction(1, 2) either way, simply because division of
Fraction and int works correctly; you could even have a "mixed number
literal" like "1+1/2frac" and it'd evaluate just fine).
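
The arithmetic claim here can be checked today with a stand-in function. This is a minimal sketch, assuming a hypothetical `frac()` helper that plays the role of the proposed `frac` suffix (so `2frac` is written `frac(2)`); it does not implement the suffix syntax itself.

```python
from fractions import Fraction


def frac(n):
    # Hypothetical stand-in for the proposed suffix: `2frac` -> frac(2).
    return Fraction(n)


half_a = 1 / frac(2)     # spelled "1/2frac" in the proposal
half_b = frac(1) / 2     # spelled "1frac/2"
mixed = 1 + 1 / frac(2)  # spelled "1+1/2frac"

print(half_a, half_b, mixed)
```

Both spellings of one half give the same Fraction because int and Fraction operands mix in either order.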

>> Once the code's finished being compiled, there's no
>> record of what type of string literal was used (raw, triple-quoted,
>> etc), only the type of string object (bytes/unicode). Custom literals
>> could be the same
> But how? Without magic (like a registry or something similarly not locally visible in the source), how does the compiler know about user-defined literals at compile time? Python (unlike C++) doesn't have an extensible notion of "compile-time computation" to hook into here.

Well, an additional parameter to compile() would do it. I've no idea
how hard it is to write an import hook, but my notion was that you
could do it that way and alter the behaviour of the compilation
process. But I haven't put a lot of thought into implementation, nor
do I know enough of the internals to know what's plausible and what
isn't.

> And why do you actually care that it happens at compile time? If it's for optimization, that may be premature and irrelevant. (Certainly 1.2d isn't going to be any _worse_ than Decimal('1.2'), it just may not be better.) If it's because you want to reflect on code objects or something, that's not normal end-user code. Why should a normal user ever even know, much less care, whether 1.2d is stored as a constant or an expression in memory or in a .pyc file?

It's to do with expectations. A literal should simply be itself,
nothing else. When you have a string literal in your code, nothing can
change what string that represents; at compilation time, it turns into
a string object, and there it remains. Shadowing the name 'str' won't
affect it. But if something that looks like a literal ends up being a
function call, it could get extremely confusing - name lookups
happening at run-time when the name doesn't occur in the code. Imagine
the traceback:

def calc_profit(hex):
    decimal = int(hex, 16)
    return 0.2d * decimal

>>> calc_profit("1E2A")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in calc_profit
AttributeError: 'int' object has no attribute 'Decimal'

Uhh... what? Sure, I shadowed the module name there, but I'm not
*using* the decimal module! I'm just using a decimal literal! It's no
problem to shadow the built-in function 'hex' there, because I'm not
using the built-in function!

Whatever name you use, there's the possibility that it'll have been
changed at run-time, and that will cause no end of confusion. A
literal shouldn't cause surprise function calls and name lookups.

>> - come to think of it, it might be nice to have
>> pathlib.Path literals, represented as p"/home/rosuav" or something. In
>> any case, they'd be evaluated using only compile-time information, and
>> would then be saved as constants.
>> That implies that only immutables should have literal syntaxes. I'm
>> not sure whether that's significant or not.
> But pathlib.Path isn't immutable.

Huh, it isn't? That's a pity. In that case, I guess you can't have a
path literal. In any case, I'm sure there'll be other string-like
things that people can come up with literal syntaxes for.


From abarnert at  Wed Jun  3 05:57:28 2015
From: abarnert at (Andrew Barnert)
Date: Tue, 2 Jun 2015 20:57:28 -0700
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 2, 2015, at 18:50, Bruce Leban <bruce at> wrote:
>> On Tue, Jun 2, 2015 at 12:03 PM, Andrew Barnert via Python-ideas <python-ideas at> wrote:
>> Any number or string token followed by a name (identifier) token is currently illegal. This would change so that, if there's no whitespace between them, it's legal, and equivalent to a call to a function named `literal_{name}({number-or-string})`. For example, `1.2d` becomes `literal_d('1.2')`, `1.2_dec` becomes `literal_dec('1.2')`, `"1.2"d` also becomes `literal_d('1.2')`.
>> Of course `0x12decimal` becomes `literal_imal('0x12dec')`, and `21jump` becomes `literal_ump('21j')`, which are not at all useful, and potentially confusing, but I don't think that would be a serious problem in practice.
> You seem to suggest that the token should start with an underscore when you write 1.2_dec and  {...}_o but not when you write 1.2d and 1.2jump.

Well, I was suggesting leaving it up to the user who defines the literals. Sure, it's possible to come up with confusing suffixes, but if we can trust end users to name a variable that holds an XML tree "root" instead of "_12rawbytes", can't we trust library authors to name their suffixes appropriately? I think you will _often_ want the preceding underscore, at least for multi-character suffixes, and you will _almost never_ want multiple underscores, or strings of underscores and digits without letters, etc. But that seems more like something for PEP8 and other style guides and checkers than something the language would need to enforce.

However, I noticed that I left off the extra underscore in literal__dec, and it really does look pretty ugly that way, so... Maybe you have a point here.

> I do think the namespace thing is an issue but requiring me to write
> from literals import literal_jump
> isn't necessarily that bad. Without an explicit import, how would I go about tracking down what exactly 21_jump means?

Thanks; that's the argument I was trying to make and not making very well.

> The use of _o on a dict is strange since the thing you're attaching it to isn't a literal. I think there needs to be some more thought here if you want to apply it to anything other than a simple value:

At least two people suggested that it's better to just explicitly put that whole question of collection "literals" off for the future (assuming the basic idea of numeric and string literal suffixes is worth considering at all), and I think they're right.

From tjreedy at  Wed Jun  3 06:48:47 2015
From: tjreedy at (Terry Reedy)
Date: Wed, 03 Jun 2015 00:48:47 -0400
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <mkm10d$tub$>

On 6/2/2015 9:56 PM, Andrew Barnert via Python-ideas wrote:

> The problem is that Python doesn't really define what it means by
> "literal" anywhere,

The reference manual seems quite definite to me. The definitive section
is "Section 2.4. Literals". It should have all the information needed to
write a new implementation. It starts "Literals are notations for
constant values of some built-in types."

The relevant subsections are:
2.4.1. String and Bytes literals
2.4.2. String literal concatenation
2.4.3. Numeric literals
2.4.4. Integer literals
2.4.5. Floating point literals
2.4.6. Imaginary literals

> and the documentation is not consistent.

I'd call it a bit sloppy in places.

> There
> are at least two places (not counting tutorial and howtos) that
> Python 3.4 refers to list or dict literals. (That's not based on a
> search; someone wrote a StackOverflow question asking what those two
> places meant.)

Please open a tracker issue to correct the sloppiness and reference the 
SO issue as evidence that it confuses people.

> Which I don't actually think is much of a problem. It means that in
> cases like this proposal, you have to be explicit about exactly what
> you mean by "literal" because Python doesn't do it for you.

Again, the Language Reference seems sufficiently explicit and detailed 
to write another implementation. 2.4.3 says

"There are three types of numeric literals: integers, floating point 
numbers, and imaginary numbers. There are no complex literals (complex 
numbers can be formed by adding a real number and an imaginary number).
Note that numeric literals do not include a sign; a phrase like -1 is
actually an expression composed of the unary operator '-' and the
literal 1." I will let you read the three specific subsections.

> This is similar to the fact that Python doesn't actually define the
> semantics of numeric literals anywhere.

I am again puzzled by your claim. There are 3 builtin number classes:
int, float, and complex.  There are 3 types of numeric literals: integer,
float, and imaginary. "An imaginary literal yields a complex number with
a real part of 0.0."  Anyone capable of programming Python should be
able to match 'integer' with 'int' and 'float' with 'float'.
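
The mapping from the three kinds of numeric literal to the three builtin classes can be confirmed directly. A short sketch; the dictionary keys are only labels chosen for this example.

```python
# One value per kind of numeric literal named in section 2.4.3.
samples = {"integer": 42, "floating point": 1.2, "imaginary": 2j}
kinds = {name: type(value).__name__ for name, value in samples.items()}
print(kinds)

# An imaginary literal has a real part of 0.0; a complex value with a
# non-zero real part is built by an addition expression, not by a literal.
z = 1 + 2j
print((2j).real, z.real, z.imag)
```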

Terry Jan Reedy

From drekin at  Wed Jun  3 16:29:47 2015
From: drekin at (drekin at
Date: Wed, 03 Jun 2015 07:29:47 -0700 (PDT)
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
Message-ID: <>

Stephen J. Turnbull writes:

> Nick Coghlan writes:
> > the main concern I have with [a FloatLiteral that carries the
> > original repr around] is that we'd be trading the status quo for a
> > situation where "Decimal(1.3)" and "Decimal(13/10)" gave different
> > answers.
> Yeah, and that kills the deal for me.  Either Decimal is the default
> representation for non-integers, or this is a no-go.  And that isn't
> going to happen.

What if also 13/10 yielded a fraction? Anyway, what are the objections to integer division returning a fraction? They are coerced to floats when mixed with them. Also, the repr of Fraction class could be altered so repr(13 / 10) == "13 / 10" would hold.

Regards, Drekin

From drekin at  Wed Jun  3 16:44:06 2015
From: drekin at (drekin at
Date: Wed, 03 Jun 2015 07:44:06 -0700 (PDT)
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
Message-ID: <>

Andrew Barnert wrote:

>> Oh, another thought... Decimals could gain yet another conversion 
>> method, one which implicitly uses the float repr, but signals if it was 
>> an inexact conversion or not. Explicitly calling repr can never signal, 
>> since the conversion occurs outside of the Decimal constructor and 
>> Decimal sees only the string:
>> Decimal(repr(1.3)) cannot signal Inexact.
>> But:
>> Decimal.from_nearest_float(1.5)  # exact
>> Decimal.from_nearest_float(1.3)  # signals Inexact
>> That might be useful, but probably not to beginners.
> I think this might be worth having whether the default constructor is changed or not.
> I can't think of too many programs where I'm pretty sure I have an exactly-representable decimal as a float but want to check to be sure... but for interactive use in IPython (especially when I'm specifically trying to explain to someone why just using Decimal instead of float will/will not solve their problem) I could see using it.

How about a more general Decimal.from_exact that does the same for an argument of any type: float, int, Decimal object with possibly different precision, fraction, string. Just convert the argument to Decimal and signal if it cannot be done losslessly. The same constructor with the same semantics could be added to int, float, Fraction as well.
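
Something close to this can be approximated with the existing decimal machinery. The following is only a sketch under stated assumptions: `decimal_from_exact` is a hypothetical module-level function (no such constructor exists on Decimal), the default precision of 28 digits is an arbitrary choice, and "exact" for a float means its exact binary value, not its repr.

```python
from decimal import Context, Decimal, Inexact
from fractions import Fraction


def decimal_from_exact(value, prec=28):
    """Hypothetical helper: convert to Decimal, raising Inexact on any loss."""
    # A private context that traps Inexact, so rounding raises an exception
    # instead of silently setting a flag.
    ctx = Context(prec=prec, traps=[Inexact])
    if isinstance(value, Fraction):
        return ctx.divide(Decimal(value.numerator), Decimal(value.denominator))
    if isinstance(value, float):
        return ctx.create_decimal_from_float(value)
    # int, str and Decimal go through the context-aware constructor.
    return ctx.create_decimal(value)


print(decimal_from_exact(1.5))             # exactly representable
print(decimal_from_exact(Fraction(1, 4)))  # terminating decimal expansion
try:
    decimal_from_exact(1.3)  # the binary value of 1.3 needs more than 28 digits
except Inexact:
    print("1.3 cannot be converted losslessly at this precision")
```

The same pattern (a context with Inexact trapped) would also reject Fraction(1, 3), whose decimal expansion never terminates.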

Regards, Drekin

From drekin at  Wed Jun  3 17:00:36 2015
From: drekin at (drekin at
Date: Wed, 03 Jun 2015 08:00:36 -0700 (PDT)
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
Message-ID: <>

> On Mon, Jun 01, 2015 at 06:27:57AM +0000, Nicholas Chammas wrote:
>> Having decimal literals or something similar by default, though perhaps
>> problematic from a backwards compatibility standpoint, is a) user friendly,
>> b) easily understandable, and c) not surprising to beginners. None of these
>> qualities apply to float literals.
> I wish this myth about Decimals would die, because it isn't true. The 
> only advantage of base-10 floats over base-2 floats -- and I'll admit it 
> can be a big advantage -- is that many of the numbers we commonly care 
> about can be represented in Decimal exactly, but not as base-2 floats. 
> In every other way, Decimals are no more user friendly, understandable, 
> or unsurprising than floats. Decimals violate all the same rules of 
> arithmetic that floats do. This should not come as a surprise, since 
> decimals *are* floats, they merely use base 10 rather than base 2.

You are definitely right in "float vs. Decimal as representation of a real", but there is also a syntactical point that interpreting a float literal as Decimal rather than binary float is more natural since the literal itself *is* decimal.

There would be no counterpart of the following situation if the float literal were interpreted as Decimal rather than binary float.
>>> 0.1
0.1
>>> Decimal(0.1)
Decimal('0.1000000000000000055511151231257827021181583404541015625')

Regards, Drekin

From drekin at  Wed Jun  3 18:08:17 2015
From: drekin at (drekin at
Date: Wed, 03 Jun 2015 09:08:17 -0700 (PDT)
Subject: [Python-ideas] Python Float Update
In-Reply-To: <mkh0h0$jg4$>
Message-ID: <>

Stefan Behnel wrote:

> random832 at schrieb am 01.06.2015 um 05:14:
>> On Sun, May 31, 2015, at 22:25, u8y7541 The Awesome Person wrote:
>>> First, I propose that a float's integer ratio should be accurate. For
>>> example, (1 / 3).as_integer_ratio() should return (1, 3). Instead, it
>>> returns (6004799503160661, 18014398509481984).
>> Even though he's mistaken about the core premise, I do think there's a
>> kernel of a good idea here - it would be nice to have a method (maybe
>> as_integer_ratio, maybe with some parameter added, maybe a different
>> method) to return with the smallest denominator that would result in
>> exactly the original float if divided out, rather than merely the
>> smallest power of two.
> The fractions module seems the obvious place to put this. Consider opening
> a feature request. Target version would be Python 3.6.
> Stefan

This makes sense for any floating point number, for example Decimal. It could also be a constructor of Fraction.

>>> Fraction.simple_from(0.1)
Fraction(1, 10)
>>> Fraction.simple_from(Decimal(1) / Decimal(3))
Fraction(1, 3)

Regards, Drekin

From steve at  Wed Jun  3 18:23:28 2015
From: steve at (Steven D'Aprano)
Date: Thu, 4 Jun 2015 02:23:28 +1000
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <mkh0h0$jg4$>
Message-ID: <>

On Wed, Jun 03, 2015 at 09:08:17AM -0700, drekin at wrote:

> This makes sense for any floating point number, for example Decimal. 
> It could be also a constructor of Fraction.
> >>> Fraction.simple_from(0.1)
> Fraction(1, 10)

Guido's time machine strikes again:

py> Fraction(0.1).limit_denominator(1000)
Fraction(1, 10)

> >>> Fraction.simple_from(Decimal(1) / Decimal(3))
> Fraction(1, 3)

py> Fraction(Decimal(1)/Decimal(3)).limit_denominator(100)
Fraction(1, 3)
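
Both calls need a caller-chosen bound (1000 and 100 here). A rough sketch of how the bound could be searched for automatically, assuming a hypothetical function named `simple_from` after the proposal; it accepts finite floats only, and because the bound grows in steps of ten it returns a small denominator that converts back to the same float, not a provably minimal one.

```python
from fractions import Fraction


def simple_from(x):
    """Hypothetical helper: a small-denominator Fraction that converts back to x."""
    exact = Fraction(x)  # exact binary value of the float
    bound = 1
    while True:
        candidate = exact.limit_denominator(bound)
        if float(candidate) == x:
            return candidate
        # The loop ends at the latest once bound >= exact.denominator,
        # because limit_denominator then returns `exact` itself.
        bound *= 10


print(simple_from(0.1))
print(simple_from(1 / 3))
```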


From abarnert at  Wed Jun  3 18:55:21 2015
From: abarnert at (Andrew Barnert)
Date: Wed, 3 Jun 2015 09:55:21 -0700
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 2, 2015, at 20:12, Chris Angelico <rosuav at> wrote:
>> On Wed, Jun 3, 2015 at 11:56 AM, Andrew Barnert <abarnert at> wrote:
>>> On Jun 2, 2015, at 18:05, Chris Angelico <rosuav at> wrote:
>>> Once the code's finished being compiled, there's no
>>> record of what type of string literal was used (raw, triple-quoted,
>>> etc), only the type of string object (bytes/unicode). Custom literals
>>> could be the same
>> But how? Without magic (like a registry or something similarly not locally visible in the source), how does the compiler know about user-defined literals at compile time? Python (unlike C++) doesn't have an extensible notion of "compile-time computation" to hook into here.
> Well, an additional parameter to compile() would do it.

I don't understand what you mean. Sure, you can pass the magic registry as a separate argument instead of leaving it in the local/global environment, but that doesn't really change anything.

> I've no idea
> how hard it is to write an import hook, but my notion was that you
> could do it that way and alter the behaviour of the compilation
> process.

It's not _that_ hard to write an import hook. But what are you going to do in that hook? 

If you're trying to change the syntax of Python by adding a new literal suffix, you have to rewrite the parser. (My hack gets around that by tokenizing, modifying the token stream, untokenizing, and compiling. But you don't want to do that in real life.)

So I assume your idea means something like: first we parse 2.3d into something like a new UserLiteral AST node, then if no hook translates that into something else before the AST is compiled, it's a SyntaxError?

But that still means:

 * If you want to use a user-defined literal, you can't import it; you need another module to first import that literal's import hook and then import your module.

 * Your .pyc file won't get updated when that other module changes the hooks in place when your module gets imported.

 * That's a significant amount of boilerplate for each module that wants to offer a new literal.

 * While it isn't actually that hard, it is something most module developers have no idea how to write. (A HOWTO could maybe help here....)

 * Every import has to be hooked and transformed once for each literal you want to be available.

Meanwhile, what exactly could the hook _do_ at compile time? It could generate the expression `Decimal('1.2')`, but that's no more "literal" than `literal_d('1.2')`, and now it means your script has to import `Decimal` into its scope instead. I suppose your import hook could push that import into the top of the script, but that seems even more magical. Or maybe you could generate an actual Decimal object, pickle it, compile in the expression `pickle.loads(b'cdecimal\nDecimal\np0\n(V1.2\np1\ntp2\nRp3\n.')`, and push in a pickle import, but that doesn't really solve anything.
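
The pickle variant can be tried out directly. A small sketch that builds the payload with `pickle.dumps` at protocol 0 (the text-based protocol) instead of hard-coding the bytes; the variable names are illustrative only.

```python
import pickle
from decimal import Decimal

# What a compile-time hook could precompute and embed in the module.
payload = pickle.dumps(Decimal("1.2"), protocol=0)
print(payload)

# What the embedded expression would still have to do at run time:
# look up pickle.loads, which in turn looks up decimal.Decimal and calls it.
value = pickle.loads(payload)
print(value == Decimal("1.2"))
```

The run-time half is still a function call with name lookups, which is why this does not make the result any more of a constant than a direct constructor call.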

Really, trying to force something into a "compile-time computation" in a language that doesn't have a full compile-time sub-language is a losing proposition. C++03 had a sort of accidental minimal compile-time sub-language based on template expansion and required constant folding for integer and pointer arithmetic, and that really wasn't sufficient, which is why C++11 and D both added ways to use most of the language explicitly at compile time (and C++11 still didn't get it right, which is why C++14 had to redo it).

In Python, it's perfectly fine that -2 and 1+2j and (1, 2) are all compiled into expressions, so why isn't it fine that 1.2d is compiled into an expression? And, once you accept that, what's wrong with the expression being `literal_d('1.2')` instead of `Decimal('1.2')`?
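
The parser's view of those three forms can be inspected with the ast module. A minimal sketch; `node_kind` is a helper name made up for this example.

```python
import ast


def node_kind(source):
    """Return the AST node class name for a single expression."""
    return type(ast.parse(source, mode="eval").body).__name__


for source in ("2", "-2", "1+2j", "(1, 2)"):
    print(source, "->", node_kind(source))
```

Only the bare `2` parses as a single constant node; the other three parse as a unary operation, a binary operation, and a tuple display.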

> But I haven't put a lot of thought into implementation, nor
> do I know enough of the internals to know what's plausible and what
> isn't.
>> And why do you actually care that it happens at compile time? If it's for optimization, that may be premature and irrelevant. (Certainly 1.2d isn't going to be any _worse_ than Decimal('1.2'), it just may not be better.) If it's because you want to reflect on code objects or something, that's not normal end-user code. Why should a normal user ever even know, much less care, whether 1.2d is stored as a constant or an expression in memory or in a .pyc file?
> It's to do with expectations. A literal should simply be itself,
> nothing else. When you have a string literal in your code, nothing can
> change what string that represents; at compilation time, it turns into
> a string object, and there it remains. Shadowing the name 'str' won't
> affect it. But if something that looks like a literal ends up being a
> function call, it could get extremely confusing - name lookups
> happening at run-time when the name doesn't occur in the code. Imagine
> the traceback:
> def calc_profit(hex):
>    decimal = int(hex, 16)
>    return 0.2d * decimal
>>>> calc_profit("1E2A")
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
>  File "<stdin>", line 3, in calc_profit
> AttributeError: 'int' object has no attribute 'Decimal'

But that _can't_ happen with my design: the `0.2d` is compiled to `literal_d('0.2')`. The call to `decimal.Decimal` is in that function's scope, so nothing you do in your function can interfere with it.

Sure, you can still redefine `literal_d`, but (a) why would you, and (b) even if you do, the problem will be a lot more obvious (especially since you had to explicitly `from decimalliterals import literal_d` at the top of the script, while you didn't have to even mention `decimal` or `Decimal` anywhere).

But your design, or any design that does the translation at compile time, _would_ have this problem. If you compile `0.2d` directly into `decimal.Decimal('0.2')`, then it's `decimal` that has to be in scope.
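
The difference between the two translations can be reproduced with ordinary function calls. This is a sketch under stated assumptions: `literal_d` is written out by hand in the same file (in the proposal it would be imported from a separate module), and the two `calc_profit_*` functions are illustrative names for what each translation of `0.2d` would amount to.

```python
import decimal


def literal_d(text):
    # `decimal` is looked up in this function's own global scope,
    # so a local variable in the caller cannot shadow it.
    return decimal.Decimal(text)


def calc_profit_via_function(hex):
    decimal = int(hex, 16)  # shadows the module name locally
    return literal_d('0.2') * decimal


def calc_profit_inlined(hex):
    decimal = int(hex, 16)  # shadows the module name locally
    # Equivalent to translating 0.2d straight into a constructor call:
    return decimal.Decimal('0.2') * decimal


print(calc_profit_via_function("1E2A"))
try:
    calc_profit_inlined("1E2A")
except AttributeError as exc:
    print("inlined translation failed:", exc)
```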

Also, notice that my design leaves the door open for later coming up with a special bytecode to look up translation functions following different rules (a registry, an explicit global lookup that ignores local shadowing, etc.); translating into a normal constructor expression doesn't.

> Uhh... what? Sure, I shadowed the module name there, but I'm not
> *using* the decimal module! I'm just using a decimal literal! It's no
> problem to shadow the built-in function 'hex' there, because I'm not
> using the built-in function!
> Whatever name you use, there's the possibility that it'll have been
> changed at run-time, and that will cause no end of confusion. A
> literal shouldn't cause surprise function calls and name lookups.
>>> - come to think of it, it might be nice to have
>>> pathlib.Path literals, represented as p"/home/rosuav" or something. In
>>> any case, they'd be evaluated using only compile-time information, and
>>> would then be saved as constants.
>>> That implies that only immutables should have literal syntaxes. I'm
>>> not sure whether that's significant or not.
>> But pathlib.Path isn't immutable.
> Huh, it isn't? That's a pity. In that case, I guess you can't have a
> path literal.

I don't understand why you think this is important.

Literal values, compile-time-computable/accessible values, and run-time-constant values are certainly not unrelated, but they're not the same thing. Other languages don't try to force them to be the same. In C++, for example, a literal has to evaluate into a compile-time-computable expression that only uses constant compile-time-accessible values, but the value doesn't have to be constant at runtime. In fact, it's quite common for it not to be.

> In any case, I'm sure there'll be other string-like
> things that people can come up with literal syntaxes for.

From abarnert at  Wed Jun  3 20:26:54 2015
From: abarnert at (Andrew Barnert)
Date: Wed, 3 Jun 2015 11:26:54 -0700
Subject: [Python-ideas] User-defined literals
In-Reply-To: <mkm10d$tub$>
References: <>
 <> <mkm10d$tub$>
Message-ID: <>

I think this is off-topic, but it's important enough to answer anyway.

On Jun 2, 2015, at 21:48, Terry Reedy <tjreedy at> wrote:
>> On 6/2/2015 9:56 PM, Andrew Barnert via Python-ideas wrote:
>> The problem is that Python doesn't really define what it means by
>> "literal" anywhere,
> The reference manual seems quite definite to me. The definitive section is "Section 2.4. Literals". It should have all the information needed to write a new implementation.

No, that defines what literals mean for the purpose of lexical analysis.

> It starts "Literals are notations for constant values of some built-in types."

By the rules in this section, ..., None, True, and False are not literals, even though they are called literals everywhere else they appear in the documentation except for the Lexical Analysis chapter. In fact, even within that chapter, in 2.6 Delimiters, it explains that "A sequence of three periods has a special meaning as an ellipsis literal."

By the rules in this section, "-2" is not a literal, even though, e.g., in the data model section it says "co_consts is a tuple containing the literals used by the bytecode", and in every extant Python implementation -2 will be stored in co_consts.
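
The co_consts behaviour is easy to observe. A minimal sketch, assuming CPython, where constant folding turns the unary-minus expression into a stored constant; the exact contents of the tuple vary between versions, and the function name is made up for this example.

```python
def negative_two():
    return -2


# Lexically, "-2" is the operator '-' applied to the literal 2, but the
# compiled code object carries the folded value.
consts = negative_two.__code__.co_consts
print(consts)
print(-2 in consts)
```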

By the rules in this section, "()" and "{}" are not literals, even though, e.g., in the set displays section it says "An empty set cannot be constructed with {}; this literal constructs an empty dictionary."

And so on.

And that's fine. None of those things are literals for the purpose of lexical analysis, even though they are things that represent literal values.

And using the word "literal" somewhat loosely isn't confusing anywhere. Where a more specific definition is needed, as when documenting the lexical analysis phase of the language, a specific definition is given.

And this is what allows ast.literal_eval to refer to "the following Python literal structures: strings, bytes, numbers, tuples, dicts, sets, booleans, and None" instead of having to say "the following Python literal structures: strings, bytes, and numbers; the negation of a literal number; the addition or subtraction of a non-imaginary literal number and an imaginary literal number; expression lists containing at least one comma; empty parentheses; the following container displays when not containing comprehensions: lists, dicts, sets; the keywords True, False, and None".
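
A short sketch of that looser notion in practice: each input below is accepted by `ast.literal_eval` although several of them are expressions or displays, not literals in the lexical sense. The sample list is chosen for illustration.

```python
import ast

# Loosely "literal" inputs that the lexer would not all call literals.
samples = ["-2", "1+2j", "(1, 2)", "{'a': [1, 2]}", "{1, 2}", "None", "True"]
results = {text: ast.literal_eval(text) for text in samples}
for text, value in results.items():
    print(text, "->", repr(value))

# Anything outside that set is rejected rather than evaluated.
try:
    ast.literal_eval("2*3")
    rejected = False
except ValueError:
    rejected = True
print("2*3 rejected:", rejected)
```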

I don't think that's a bad thing. If you want to know what the "literal structure... None" means, it's easy to find out, and the fact that None is tokenized as a keyword rather than as a literal does not hamper you in any way. If you actually need to write a tokenizer, then the fact that None is tokenized as a keyword makes a difference--and you can find that out easily as well.

> > and the documentation is not consistent.
> I'd call it a bit sloppy in places.

I wouldn't call it sloppy. I'd call it somewhat loose and informal in places, but that's often a good thing.

>> There
>> are at least two places (not counting tutorial and howtos) that
>> Python 3.4 refers to list or dict literals. (That's not based on a
>> search; someone wrote a StackOverflow question asking what those two
>> places meant.)
> Please open a tracker issue to correct the sloppiness and reference the SO issue as evidence that it confuses people.

But it doesn't confuse people in any relevant way.

The user who asked that question had no problem figuring out how to interpret code that includes a (), or even how that code should be and is compiled. He could have written a Python interpreter with the knowledge he had. Maybe he couldn't have written a specification, but who cares? He doesn't need to.

>> This is similar to the fact that Python doesn't actually define the
>> semantics of numeric literals anywhere.
> I am again puzzled by your claim. There are 3 builtin number classes: int, float, and complex.  There are 3 types of numeric literals: integer, float, and imaginary. "An imaginary literal yields a complex number with a real part of 0.0."  Anyone capable of programming Python should be able to match 'integer' with 'int' and 'float' with 'float'.

Yes, and they should also be able to tell that the integer literal "42" should evaluate to an int whose value is equal to 42, and that "the value may be approximated in the case of floating point" means that the literal "1.2" should evaluate to the float whose value is closest to 1.2 rather than some different approximation, and so on.
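
The "closest float" expectation can be checked empirically. A sketch assuming CPython on an IEEE 754 platform and Python 3.9 or later (for `math.nextafter`); it compares the float produced by the literal with its two neighbours, using exact rational arithmetic.

```python
import math
from fractions import Fraction

x = 1.2
target = Fraction("1.2")  # the exact decimal value written in the source

# The representable floats immediately below and above x.
below = math.nextafter(x, -math.inf)
above = math.nextafter(x, math.inf)

error = abs(Fraction(x) - target)
print(error <= abs(Fraction(below) - target))  # no worse than the lower neighbour
print(error <= abs(Fraction(above) - target))  # no worse than the upper neighbour
print(Fraction(x) == target)                   # but not exactly 1.2
```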

But the documentation doesn't actually define any of that. It doesn't have to, because it assumes it's being read by a non-idiot who's capable of programming Python (and won't deliberately make stupid decisions in interpreting it just because he's technically allowed to).

The C++ specification defines all of that, and more (that the digits are interpreted with the leftmost as most significant, that the runtime value of an integer literal is not an lvalue, that it counts as a compile-time constant value, and so on). It attempts to make no assumptions at all (and there have been cases where C++ compiler vendors _have_ made deliberately obtuse interpretations just to make a point about the standard).

That's exactly why reference documentation is more useful than a specification: because it leaves out the things that should be obvious to anyone capable of programming Python. To learn how integer literals work in Python, I need to look at two short and accessible paragraphs; to learn how integer literals work in C++, I have to read 2 full-page sections plus parts of at least 2 others, all written in impenetrable legalese.

From abarnert at  Wed Jun  3 21:43:00 2015
From: abarnert at (Andrew Barnert)
Date: Wed, 3 Jun 2015 12:43:00 -0700
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 2, 2015, at 19:52, Steven D'Aprano <steve at> wrote:
>> On Tue, Jun 02, 2015 at 12:03:25PM -0700, Andrew Barnert via Python-ideas wrote:
>> I explored the convertible literals a while ago, and I'm pretty sure 
>> that doesn't work in a duck-typed language. But the C++ design does 
>> work, as long as you're willing to have the conversion (including the 
>> lookup of the conversion function itself) done at runtime.
> I'm torn. On the one hand, some sort of extensible syntax for literals 
> would be nice. I say "nice" rather than useful because there are 
> advantages and disadvantages and there's no way of really knowing 
> which outweighs the other.

That's exactly why I came up with something I could hack up without any changes to the interpreter. It means anyone can try it out and see whether the advantages outweigh the disadvantages for them. (Of course there are additional disadvantages to the hack in efficiency, hackiness, and possibly debuggability, so it may unfairly bias people who don't keep that in mind--but if so, it can only bias them in the conservative direction of rejecting the idea, which I think is ok.)

> But, really, your proposal is in no way, shape or form syntax for 
> *literals*,

It's a syntax for things that are somewhat like `2`, more like `-2`, even more like `(2,)`, but still not exactly the same as even that. If you don't like using the word "literal" for that, you can come up with a different word. I called it a "literal" because "user-defined literals" is what people were asking for when they asked for `2.3d`, and it has clear parallels with a very similar feature with the same name in other languages. But I'm fine calling it something different, as long as people who are looking for it will know how to find it.

> it's a new syntax for an unary postfix operator

That's fair; C++ in fact defines its user literal syntax in terms of special constexpr operator overloads, and points out the similarities to postfix operator++ in a note.

> or function. 
> The whole point of something being a literal is that it is parsed and 
> converted at compile time.
> Now you might (and do) say that worrying 
> about this is "premature optimization", but call me a pedant if you 
> like, I don't think we should call something a literal if it's a 
> runtime function call.

I don't think this is the right distinction.

A literal is a notation for expressing some value that means what it says in a sufficiently simple way. That concept has significant overlap with "compile-time evaluable", and with "constant", but they're not the same concepts.

And this is especially true for a language that doesn't define any compile-time computation phase. In Python, `-2` may be compiled to UNARY_NEGATIVE on the compiled-in constant value 2, or just to the compiled-in constant value -2, depending on what the implementation wants to optimize. Do you want to call it a literal in some implementations but not others? No reasonable user code that isn't reflecting on the internals is going to care, or even know, what the implementation is doing.
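
One way to see the split between syntax and implementation, as a sketch assuming Python 3.8 or later (where constants appear as ast.Constant nodes): the parser reports `-2` as a negation applied to the constant 2, whatever the bytecode compiler later decides to fold.

```python
import ast

# Parse the source text "-2" as an expression.
tree = ast.parse("-2", mode="eval")
node = tree.body

# Syntactically this is unary minus applied to the literal 2.
parsed_as_negation = (
    isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub)
)
operand = node.operand.value

# Evaluating the tree gives -2 whether or not the compiler folded it.
value = eval(compile(tree, "<demo>", "eval"))

print(parsed_as_negation, operand, value)
```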

Being "user-defined" means that the "sufficiently simple way" the notation gets its meaning has to involve user code. In a language with a compile-time computation phase like C++, that can mean "constexpr" user code, but Python doesn't define a "constexpr"-like phase.

At any rate, again, if you want to call it something different, that's fine, as long as people looking for "what does `1.2d` mean in this program" or "how do I do the Python equivalent of a C++ user-defined literal" will be able to understand it.

> Otherwise, we might as well say that 
>    from fractions import Fraction
>    Fraction(2)
> is a literal, in which case I can say your proposal is unnecessary as we 
> already have user-specified literals in Python.

In C++, a constructor expression like Fraction(2) may be evaluable at compile time, and may evaluate to something that's constant at both compile time and runtime, and yet it's still not a literal. Why? Because their rule for what counts as "sufficiently simple" includes constexpr postfix user-literal operators, but not constexpr function or constructor calls. I don't know of anyone who's confused by that. It's a useful (and intuitively useful) distinction, separate from the "constexpr" and "const" distinctions.

> I can think of some interesting uses for postfix operators, or literals, 
> or whatever we want to call them:
> 45?
> 10!!
> 23.5d
> 3d6
> 35'24"
> 15ell
> I've deliberately not explained what I mean by each of them. You can 
> probably guess some, or all, but I hope it demonstrates one problem with 
> this suggestion. Like operator overloading, it risks making code less 
> clear rather than more.

Sure. In fact, it's very closely analogous--both of them are ways to allow a user-defined type to act more like a builtin type, which can be abused to do completely different things instead. The C++ proposal specifically pointed out this comparison.

I think the risk is lower in Python than in C++ just because Python idiomatically discourages magical or idiosyncratic programming much more strongly in general, and that means operator overloading is already used more consistently and less confusingly than in C++, so the same is more likely to be true with this new feature. But of course the risk isn't zero. 

Again, I'm hoping people will play around with it, come up with example code they can show to other people for impressions, etc., rather than trying to guess, or come up with some abstract argument. It's certainly possible that everything that looks like a good example when you think of it will look too magical to anyone who reads your code. Then the idea can be rejected, and if anyone thinks of a similar idea in the future, they can be pointed to the existing examples and asked, "Can your idea solve these problems?"

From abarnert at  Wed Jun  3 22:17:09 2015
From: abarnert at (Andrew Barnert)
Date: Wed, 3 Jun 2015 13:17:09 -0700
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 3, 2015, at 07:29, drekin at wrote:
> Stephen J. Turnbull writes:
>> Nick Coghlan writes:
>>> the main concern I have with [a FloatLiteral that carries the
>>> original repr around] is that we'd be trading the status quo for a
>>> situation where "Decimal(1.3)" and "Decimal(13/10)" gave different
>>> answers.
>> Yeah, and that kills the deal for me.  Either Decimal is the default
>> representation for non-integers, or this is a no-go.  And that isn't
>> going to happen.
> What if also 13/10 yielded a fraction?

That was raised near the start of the thread. In fact, I think the initial proposal was that 13/10 evaluated to Fraction(13, 10) and 1.2 evaluated to something like Fraction(12, 10).

> Anyway, what are the objections to integer division returning a fraction? They are coerced to floats when mixed with them.

As mentioned earlier in the thread, the language that inspired Python, ABC, used exactly this design: computations were kept as exact rationals until you mixed them with floats or called irrational functions like root. So it's not likely Guido didn't think of this possibility; he deliberately chose not to do things this way. He even wrote about this a few years ago; search for "integer division" on his Python-history blog. 

So, what are the problems?

When you stay with exact rationals through a long series of computations, the result can grow to be huge in memory, and processing time. (I'm ignoring the fact that CPython doesn't even have a fast fraction implementation, because one could be added easily. It's still going to be orders of magnitude slower to add two fractions with gigantic denominators than to add the equivalent floats or decimals.)
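
A small sketch of that growth, using the stdlib fractions module. The harmonic sum is just a convenient example chosen here, not something from the thread.

```python
from fractions import Fraction


def harmonic(n):
    """Return 1/1 + 1/2 + ... + 1/n as an exact Fraction."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


exact = harmonic(200)
# The exact result needs a denominator with many digits...
denominator_digits = len(str(exact.denominator))

# ...while the float version stays a fixed-size value throughout.
approx = sum(1.0 / k for k in range(1, 201))

print(denominator_digits)
print(float(exact), approx)
```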

Plus, it's not always obvious when you've lost exactness. For example, exponentiation between rationals is exact only if the power simplifies to a whole fraction (and hasn't itself become a float somewhere along the way). Since the fractions module doesn't have IEEE-style flags for inexactness/rounding, it's harder to notice when this happens.
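
A short sketch of how that shows up with the stdlib Fraction type: an integral exponent keeps the result exact, while a fractional exponent silently hands back a float.

```python
from fractions import Fraction

base = Fraction(3, 2)

# Integral exponents (int, or a Fraction with denominator 1) stay exact.
squared = base ** 2
squared_again = base ** Fraction(2)

# A fractional exponent falls back to float arithmetic, even when the
# true answer (here 1/2) happens to be rational.
root = Fraction(1, 4) ** Fraction(1, 2)

print(type(squared).__name__, squared)
print(type(root).__name__, root)
```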

Except in very trivial cases, the repr would be much less human-readable and -debuggable, not more. (Or do you find 1728829813 / 2317409 easier to understand than 7460.181958816937?)

Fractions and Decimals can't be mixed or interconverted directly.
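
A sketch of what this looks like with the stdlib types. One qualification: in current Python versions the Fraction constructor does accept a Decimal, so it is the arithmetic mixing and the Fraction-to-Decimal direction that fail.

```python
from decimal import Decimal
from fractions import Fraction

third = Fraction(1, 3)
tenth = Decimal("0.1")

# Mixing the two types in arithmetic raises TypeError.
try:
    third + tenth
    mixing_error = None
except TypeError as exc:
    mixing_error = exc

# Decimal() does not accept a Fraction either.
try:
    Decimal(third)
    conversion_error = None
except TypeError as exc:
    conversion_error = exc

# Fraction -> Decimal has to be spelled out, and the division rounds to
# the current context precision.
as_decimal = Decimal(third.numerator) / Decimal(third.denominator)

# Decimal -> Fraction is exact.
as_fraction = Fraction(tenth)

print(mixing_error)
print(conversion_error)
print(as_decimal, as_fraction)
```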

There are definitely cases where a rational type is the right thing to use (it wouldn't be in the stdlib otherwise), but I think they're less common than the cases where a floating-point type (whether binary or decimal) is the right thing to use. (And even many cases where you think you want rationals, what you actually want is SymPy-style symbolic computation--which can give you exact results for things with roots or sins or whatever as long as they cancel out in the end.)

From rosuav at  Wed Jun  3 23:48:30 2015
From: rosuav at (Chris Angelico)
Date: Thu, 4 Jun 2015 07:48:30 +1000
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Jun 4, 2015 at 2:55 AM, Andrew Barnert <abarnert at> wrote:
> In Python, it's perfectly fine that -2 and 1+2j and (1, 2) are all compiled into expressions, so why isn't it fine that 1.2d is compiled into an expression? And, once you accept that, what's wrong with the expression being `literal_d('1.2')` instead of `Decimal('1.2')`?

That's exactly the thing: 1.2d should be atomic. It should not be an
expression. The three examples you gave are syntactically expressions,
but they act very much like literals thanks to constant folding:

>>> dis.dis(lambda: -2)
  1           0 LOAD_CONST               2 (-2)
              3 RETURN_VALUE
>>> dis.dis(lambda: 1+2j)
  1           0 LOAD_CONST               3 ((1+2j))
              3 RETURN_VALUE
>>> dis.dis(lambda: (1, 2))
  1           0 LOAD_CONST               3 ((1, 2))
              3 RETURN_VALUE

which means they behave the way people expect them to. There is no way
for run-time changes to affect what any of those expressions yields.
Whether you're talking about shadowing the name Decimal or the name
literal_d, the trouble is that it's happening at run-time. Here's
another confusing case:

import decimal
from fractionliterals import literal_fr
# oops, forgot to import literal_d

# If we miss off literal_fr, we get an immediate error, because
# 1/2fr gets evaluated at def time.
def do_stuff(x, y, portion=1/2fr):
    try: result = decimal.Decimal(x*y*portion)
    except OverflowError: return 0.0d

You won't know that your literal has failed until something actually
triggers the error. That is extremely unobvious, especially since the
token "literal_d" doesn't occur anywhere in do_stuff(). Literals look
like atoms, and if they behave like expressions, sooner or later
there'll be a ton of Stack Overflow questions saying "Why doesn't my
code work? I just changed this up here, and now I get this weird
error". Is that how literals should work? No.
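
The timing difference can be reproduced today with ordinary function calls standing in for the proposed suffixes. This is only a sketch: literal_fr and literal_d are deliberately left undefined here, and the `1/2fr` syntax itself does not exist.

```python
# A default argument is evaluated when the def statement runs, so a
# missing helper fails immediately.
try:
    def with_default(portion=literal_fr("1/2")):  # hypothetical helper
        return portion
    def_time_error = None
except NameError as exc:
    def_time_error = exc


# A name used only in the body is looked up when that line executes.
def do_stuff(x):
    if x > 10:
        return literal_d("0.0")  # hypothetical helper, never defined
    return x


small = do_stuff(1)  # this path never touches literal_d

try:
    do_stuff(11)
    call_time_error = None
except NameError as exc:
    call_time_error = exc

print(def_time_error)
print(small, call_time_error)
```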


From surya.subbarao1 at  Thu Jun  4 00:05:24 2015
From: surya.subbarao1 at (u8y7541 The Awesome Person)
Date: Wed, 3 Jun 2015 15:05:24 -0700
Subject: [Python-ideas] Python-ideas Digest, Vol 103, Issue 29
In-Reply-To: <>
References: <>
Message-ID: <>

> Stephen J. Turnbull writes:
>> Nick Coghlan writes:
>>> the main concern I have with [a FloatLiteral that carries the
>>> original repr around] is that we'd be trading the status quo for a
>>> situation where "Decimal(1.3)" and "Decimal(13/10)" gave different
>>> answers.
>> Yeah, and that kills the deal for me.  Either Decimal is the default
>> representation for non-integers, or this is a no-go.  And that isn't
>> going to happen.
> What if also 13/10 yielded a fraction?

Yeah, either Decimal becomes default or 13/10 is a fraction. If
Decimal becomes default, we could have Decimal(13 / 10) = Decimal(13)
/ Decimal(10). We would have "expected" results.


>Fractions and Decimals can't be mixed or interconverted directly.

If Decimals are default, Fractions can have a .divide() method which
returns Decimal(Numerator) / Decimal(Denominator), which is used when
Fractions and Decimals are mixed.

-Surya Subbarao

From surya.subbarao1 at  Thu Jun  4 00:17:23 2015
From: surya.subbarao1 at (u8y7541 The Awesome Person)
Date: Wed, 3 Jun 2015 15:17:23 -0700
Subject: [Python-ideas] [Python Ideas] Python Float Update
Message-ID: <>

> I'm going to show a few examples of how Decimals violate the fundamental
> laws of mathematics just as floats do.

Decimal also uses sign and mantissa, except it's Base 10. I think
Decimal should use numerators and denominators, because they are more
accurate. That's why even Decimal defies the laws of mathematics.

From surya.subbarao1 at  Thu Jun  4 00:18:51 2015
From: surya.subbarao1 at (u8y7541 The Awesome Person)
Date: Wed, 3 Jun 2015 15:18:51 -0700
Subject: [Python-ideas] Python Float Update
Message-ID: <>

> I'm going to show a few examples of how Decimals violate the fundamental
> laws of mathematics just as floats do.

Decimal also uses sign and mantissa, except it's Base 10. I think
Decimal should use numerators and denominators, because they are more
accurate. That's why even Decimal defies the laws of mathematics.

-Surya Subbarao

From breamoreboy at  Thu Jun  4 00:23:17 2015
From: breamoreboy at (Mark Lawrence)
Date: Wed, 03 Jun 2015 23:23:17 +0100
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <mknuoq$762$>

On 03/06/2015 23:18, u8y7541 The Awesome Person wrote:
>> I'm going to show a few examples of how Decimals violate the fundamental
>> laws of mathematics just as floats do.
> Decimal is also uses sign and mantissa, except it's Base 10. I think
> Decimal should use numerators and denominators, because they are more
> accurate. That's why even Decimal defies the laws of mathematics.
> -Surya Subbarao

Defying the laws of mathematics isn't a key issue here as practicality 
beats purity.  Try beating the laws of the BDFL and the core devs and 
it's the Comfy Chair, terribly sorry and all that.

My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

From abarnert at  Thu Jun  4 00:35:34 2015
From: abarnert at (Andrew Barnert)
Date: Wed, 3 Jun 2015 15:35:34 -0700
Subject: [Python-ideas] [Python Ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 3, 2015, at 15:17, u8y7541 The Awesome Person <surya.subbarao1 at> wrote:

>> I'm going to show a few examples of how Decimals violate the fundamental
>> laws of mathematics just as floats do.
> Decimal is also uses sign and mantissa, except it's Base 10. I think
> Decimal should use numerators and denominators, because they are more
> accurate.

So sqrt(2) should be represented as an exact fraction? Do you have infinite RAM?

> That's why even Decimal defies the laws of mathematics.

From surya.subbarao1 at  Thu Jun  4 00:46:00 2015
From: surya.subbarao1 at (u8y7541 The Awesome Person)
Date: Wed, 3 Jun 2015 15:46:00 -0700
Subject: [Python-ideas] [Python Ideas] Python Float Update
Message-ID: <>

On Wed, Jun 3, 2015 at 3:35 PM, Andrew Barnert <abarnert at> wrote:
> On Jun 3, 2015, at 15:17, u8y7541 The Awesome Person <surya.subbarao1 at> wrote:
>>> I'm going to show a few examples of how Decimals violate the fundamental
>>> laws of mathematics just as floats do.
>> Decimal is also uses sign and mantissa, except it's Base 10. I think
>> Decimal should use numerators and denominators, because they are more
>> accurate.
> So sqrt(2) should be represented as an exact fraction? Do you have infinite RAM?

You can't represent sqrt(2) exactly with sign and mantissa either.
When Decimal detects a non-repeating decimal, it should round it, and
assign it a numerator and denominator something like 14142135623730951
/ 10000000000000000 simplified. That's better than sign and mantissa
errors.

Or an alternative could be a hybrid of sign and mantissa and fraction
representation... I don't think that's a good idea though.

-Surya Subbarao

From abarnert at  Thu Jun  4 01:03:35 2015
From: abarnert at (Andrew Barnert)
Date: Wed, 3 Jun 2015 16:03:35 -0700
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 3, 2015, at 14:48, Chris Angelico <rosuav at> wrote:
>> On Thu, Jun 4, 2015 at 2:55 AM, Andrew Barnert <abarnert at> wrote:
>> In Python, it's perfectly fine that -2 and 1+2j and (1, 2) are all compiled into expressions, so why isn't it fine that 1.2d is compiled into an expression? And, once you accept that, what's wrong with the expression being `literal_d('1.2')` instead of `Decimal('1.2')`?
> That's exactly the thing: 1.2d should be atomic. It should not be an
> expression. The three examples you gave are syntactically expressions,
> but they act very much like literals thanks to constant folding:
>>>> dis.dis(lambda: -2)
>  1           0 LOAD_CONST               2 (-2)
>              3 RETURN_VALUE
>>>> dis.dis(lambda: 1+2j)
>  1           0 LOAD_CONST               3 ((1+2j))
>              3 RETURN_VALUE
>>>> dis.dis(lambda: (1, 2))
>  1           0 LOAD_CONST               3 ((1, 2))
>              3 RETURN_VALUE
> which means they behave the way people expect them to.

But that's not something that's guaranteed by Python. It's something that implementations are allowed to do, and that CPython happens to do. If user code actually relied on that optimization, that code would be nonportable.

But the reason Python allows that optimization in the first place is that user code actually doesn't care whether these expressions are evaluated "atomically" or at compile time, so it's ok to do so behind users' backs. It's not surprising because no one is going to monkeypatch int.__neg__ between definition time and call time (which CPython doesn't allow, but some implementations do), or call dis and read the bytecode if they don't even understand what a compile-time optimization is, and so on.

> There is no way
> for run-time changes to affect what any of those expressions yields.
> Whether you're talking about shadowing the name Decimal or the name
> literal_d, the trouble is that it's happening at run-time. Here's
> another confusing case:
> import decimal
> from fractionliterals import literal_fr
> # oops, forgot to import literal_d
> # If we miss off literal_fr, we get an immediate error, because
> # 1/2fr gets evaluated at def time.
> def do_stuff(x, y, portion=1/2fr):
>    try: result = decimal.Decimal(x*y*portion)
>    except OverflowError: return 0.0d
> You won't know that your literal has failed until something actually
> triggers the error.

If that's a problem, then you're using the wrong language. You also won't know that you've typo'd OvreflowError or reslt, or called d.sqrt() instead of decimal.sqrt(d), or all kinds of other errors until something actually triggers the error. Which means either executing the code, or running a static linter. Which would be exactly the same for 1.2d.

> That is extremely unobvious, especially since the
> token "literal_d" doesn't occur anywhere in do_stuff().

This really isn't going to be confusing in real life. You get an error saying you forgot to define literal_d. You say, "Nuh uh, I did define it right at the top, same way I did literal_fr, in this imp... Oops, looks like I forgot to import it".

> Literals look
> like atoms, and if they behave like expressions, sooner or later
> there'll be a ton of Stack Overflow questions saying "Why doesn't my
> code work? I just changed this up here, and now I get this weird
> error".

Can you come up with an actual example where changing this up here gives this weird error somewhere else? If not, I doubt even the intrepid noobs of StackOverflow will come up with one.

Neither of the examples so far qualifies--the first one is an error that the design can never produce, and the second one is not weird or confusing any more than any other error in any dynamic language.

And if you're going to suggest "what if I just redefine literal_d for no reason", ask yourself who would ever do that? Redefining decimal makes sense, because that's a reasonable name for a variable; redefining literal_d is as silly as redefining __name__. (But if you think those are different because double underscores are special, I suppose __literal_d__ doesn't bother me.)
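
For anyone who wants something concrete to poke at, here is a toy sketch of the desugaring under discussion, where `1.2d` turns into `literal_d('1.2')`. It is not the actual hack: the regex rewrite, the rewrite() name, and this literal_d are all assumptions made for illustration, and a real version would work on tokens rather than raw text.

```python
import re
from decimal import Decimal

# Match a decimal number directly followed by the suffix "d".
_SUFFIX_D = re.compile(r"(?<![\w.])(\d+(?:\.\d+)?)d\b")


def rewrite(source):
    """Turn each NUMBERd in the source text into literal_d('NUMBER')."""
    return _SUFFIX_D.sub(r"literal_d('\1')", source)


def literal_d(text):
    """Hypothetical handler for the 'd' suffix."""
    return Decimal(text)


rewritten = rewrite("1.2d + 3d")

# The handler is an ordinary name, looked up when the expression runs.
result = eval(rewritten, {"literal_d": literal_d})

print(rewritten)
print(result)
```

Evaluating the rewritten text in a namespace that lacks literal_d raises NameError, which is the failure mode argued over above.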

From abarnert at  Thu Jun  4 01:20:15 2015
From: abarnert at (Andrew Barnert)
Date: Wed, 3 Jun 2015 16:20:15 -0700
Subject: [Python-ideas] [Python Ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 3, 2015, at 15:46, u8y7541 The Awesome Person <surya.subbarao1 at> wrote:
>> On Wed, Jun 3, 2015 at 3:35 PM, Andrew Barnert <abarnert at> wrote:
>> On Jun 3, 2015, at 15:17, u8y7541 The Awesome Person <surya.subbarao1 at> wrote:
>>>> I'm going to show a few examples of how Decimals violate the fundamental
>>>> laws of mathematics just as floats do.
>>> Decimal is also uses sign and mantissa, except it's Base 10. I think
>>> Decimal should use numerators and denominators, because they are more
>>> accurate.
>> So sqrt(2) should be represented as an exact fraction? Do you have infinite RAM?
> You can't represent sqrt(2) exactly with sign and mantissa either.

That's exactly the point: Decimal never _pretends_ to be exact, and therefore there's no problem when it can't be.

By the way, it's not just "sign and mantissa" (that just gives you an integer, or maybe a fixed-point number), it's sign, mantissa, _and exponent_.

> When Decimal detects a non-repeating decimal, it should round it, and
> assign it a numerator and denominator something like 14142135623730951
> / 10000000000000000 simplified.
> That's better than sign and mantissa
> errors.

No, that's exactly the same value as mantissa 1.4142135623730951 and exponent 0, and therefore it has exactly the same error. You haven't gained anything over using Decimal.
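
A quick check of that equivalence with the stdlib types. Note that Python's Decimal stores the digits as an integer coefficient with exponent -16, which is the same thing as mantissa 1.4142135623730951 with exponent 0.

```python
from decimal import Decimal
from fractions import Fraction

as_decimal = Decimal("1.4142135623730951")
as_fraction = Fraction(14142135623730951, 10**16)

# Decimal keeps a sign, a tuple of digits, and an exponent.
sign, digits, exponent = as_decimal.as_tuple()

# Converting the Decimal to a Fraction is exact, and gives the same value.
same_value = Fraction(as_decimal) == as_fraction

# The numerator is not divisible by 2 or 5, so "simplified" changes nothing.
unchanged = as_fraction.denominator == 10**16

print(sign, len(digits), exponent)
print(same_value, unchanged)
```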

And meanwhile, you've lost some efficiency (it takes twice as much memory because you have to store all those zeroes, where in Decimal they're implied by the exponent), and you've lost the benefit of a well-designed standard to follow (how many digits should you keep? what rounding rule should you use? should there be some way to optionally signal the user that rounding has occurred? and so on...). 

And, again, you've made things more surprising, not less, because now you have a type that's always exact, except when it isn't.

Meanwhile, when you asked about the problems, I gave you a whole list of them. Have you thought about the others, or only the third one on the list? For example, do you really want adding up a long string of simple numbers to give you a value that takes 500x as much memory to store and 500x as long to calculate with if you don't need the exactness? Or is there going to be another rounding rule that when the fraction gets "too big" you truncate it to a smaller approximation?

And meanwhile, if you do need the exactness, why don't you need to be able to carry around exact rational multiples of pi or an exact representation of 2 ** 0.5 (both of which SymPy can do for you, by representing numbers symbolically, the way humans do when they need to)?


From rosuav at  Thu Jun  4 01:40:13 2015
From: rosuav at (Chris Angelico)
Date: Thu, 4 Jun 2015 09:40:13 +1000
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Jun 4, 2015 at 9:03 AM, Andrew Barnert <abarnert at> wrote:
> Can you come up with an actual example where changing this up here gives this weird error somewhere else? If not, I doubt even the intrepid noobs of StackOverflow will come up with one.
> Neither of the examples so far qualifies--the first one is an error that the design can never produce, and the second one is not weird or confusing any more than any other error in any dynamic languages.

Anything that causes a different code path to be executed can do this.


From surya.subbarao1 at  Thu Jun  4 02:19:01 2015
From: surya.subbarao1 at (u8y7541 The Awesome Person)
Date: Wed, 3 Jun 2015 17:19:01 -0700
Subject: [Python-ideas] [Python Ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

>(it takes twice as much memory because you have to store all those zeroes, where in Decimal they're implied by the exponent), and you've lost the benefit of a well-designed standard to follow (how many digits should you keep? what rounding rule should you use? should there be some way to optionally signal the user that rounding has occurred? and so on...).

You are right about memory...
LOL, I just thought about having something like representing it as a
float / float for numerator / denominator! But that would be slower...

There's got to be a workaround for those zeros. Especially if I'm
dealing with stuff like 57 / 10^100 (57 is prime!).

-Surya Subbarao

From rosuav at  Thu Jun  4 02:24:05 2015
From: rosuav at (Chris Angelico)
Date: Thu, 4 Jun 2015 10:24:05 +1000
Subject: [Python-ideas] [Python Ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Jun 4, 2015 at 10:19 AM, u8y7541 The Awesome Person
<surya.subbarao1 at> wrote:
> You are right about memory...
> LOL, I just thought about having something like representing it as a
> float / float for numerator / denominator! But that would be slower...

How would that even help?


From guido at  Thu Jun  4 03:01:38 2015
From: guido at (Guido van Rossum)
Date: Wed, 3 Jun 2015 18:01:38 -0700
Subject: [Python-ideas] [Python Ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

At this point I feel compelled to explain why I'm against using
fractions/rationals to represent numbers given as decimals.

From 1982 till 1886 I participated in the implementation of ABC ( which did implement numbers as
arbitrary precision fractions. (An earlier prototype implemented them as
fractions of two floats, but that was wrong for many other reasons -- two
floats are not better than one. :-)

The design using arbitrary precision fractions was intended to avoid newbie
issues with decimal numbers (these threads have elaborated plenty on those
newbie issues). For reasons that should also be obvious by now, we
converted these fractions back to decimal before printing them.

But there was a big issue that we didn't anticipate. During the course of a
simple program it was quite common for calculations to slow down
dramatically, because numbers with ever-larger numerators and denominators
were being computed (and rational arithmetic quickly slows down as those
get bigger). So e.g. you might be computing your taxes with a precision of
a million digits -- only to be rounding them down to dollars for display.

These issues were quite difficult to debug because the normal approach to
debugging ("just use print statements") didn't work -- unless you came up
with the idea of printing the numbers as a fraction.
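
The effect is easy to reproduce with the stdlib Fraction type. This is a toy sketch, not ABC code: the starting value and the update rule are picked here only because they make the denominator square on every step.

```python
from fractions import Fraction

x = Fraction(1, 3)
rate = Fraction(7, 2)
digit_counts = []

for _ in range(10):
    # Each step multiplies x by (1 - x), so the denominator is squared.
    x = rate * x * (1 - x)
    digit_counts.append(len(str(x.denominator)))

# What a user would normally print: two decimal places.
shown = round(float(x), 2)

print(digit_counts)
print(shown)
```

Printing `shown` gives no hint of the size of the number behind it. Only printing the fraction itself reveals the problem.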

For this reason I think that it's better not to use rational arithmetic by

FWIW the same reasoning does *not* apply to using Decimal or something like
decimal128. But then again those don't really address most issues with
floating point -- the rounding issue exists for decimal as well as for
binary. Anyway, that's a separate discussion to have.

--Guido van Rossum (

From python at  Thu Jun  4 03:06:19 2015
From: python at (MRAB)
Date: Thu, 04 Jun 2015 02:06:19 +0100
Subject: [Python-ideas] [Python Ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On 2015-06-04 02:01, Guido van Rossum wrote:
> At this point I feel compelled to explain why I'm against using
> fractions/rationals to represent numbers given as decimals.
>  From 1982 till 1886 I participated in the implementation of ABC
> ( which did implement numbers as
> arbitrary precision fractions. (An earlier prototype implemented them as
> fractions of two floats, but that was wrong for many other reasons --
> two floats are not better than one. :-)
Was that when the time machine was first used? :-)


From abarnert at  Thu Jun  4 03:03:43 2015
From: abarnert at (Andrew Barnert)
Date: Wed, 3 Jun 2015 18:03:43 -0700
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 3, 2015, at 16:40, Chris Angelico <rosuav at> wrote:
>> On Thu, Jun 4, 2015 at 9:03 AM, Andrew Barnert <abarnert at> wrote:
>> Can you come up with an actual example where changing this up here gives this weird error somewhere else? If not, I doubt even the intrepid noobs of StackOverflow will come up with one.
>> Neither of the examples so far qualifies--the first one is an error that the design can never produce, and the second one is not weird or confusing any more than any other error in any dynamic languages.
> Anything that causes a different code path to be executed can do this.

Well, any expression causes a different code path to be executed than any different expression, or what would be the point? But how is this relevant here?

Is there an example where 1.2d would lead to "changing this up here gives this weird error somewhere else" that doesn't apply just as well to spam.eggs (or that's relevant or likely to come up or whatever in the case of 1.2d but not in the case of spam.eggs)?

Otherwise, you're just presenting an argument against dynamic languages--or maybe even against programming languages full stop (after all, the same kinds of things can happen in Haskell or C++, they just often happen at compile time, so you get to debug the same "weird error" earlier).

From stephen at  Thu Jun  4 10:33:16 2015
From: stephen at (Stephen J. Turnbull)
Date: Thu, 04 Jun 2015 17:33:16 +0900
Subject: [Python-ideas] Python-ideas Digest, Vol 103, Issue 29
In-Reply-To: <>
References: <>
Message-ID: <>

u8y7541 The Awesome Person writes:

 > Yeah, either Decimal becomes default or 13/10 is a fraction. If
 > Decimal becomes default, we could have Decimal(13 / 10) =
 > Decimal(13) / Decimal(10). We would have "expected" results.

I gather you haven't read anybody's replies, because, no, you don't
get expected results with Decimal: Decimal can still violate all the
invariants that binary floats can.  Binary has better approximation
properties if you care about the *degree* of inexactness rather than
the *frequency*[1] of inexactness.  Fractions can quickly become very
inefficient.  Of course you could try approximate fractions with fixed
slash or floating slash calculations[2] to get bounds on the
complexity of "simple" arithmetic, but then you're back in a world
with approximations.
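
A minimal sketch of one such broken invariant, with the precision pinned to 28 digits (the usual default) so the result does not depend on the ambient context:

```python
from decimal import Decimal, localcontext
from fractions import Fraction

with localcontext() as ctx:
    ctx.prec = 28
    # 1/3 is rounded to 28 digits, so multiplying back does not give 1.
    decimal_round_trip = Decimal(1) / Decimal(3) * 3

# Binary floats break a different, more familiar identity.
float_sum = 0.1 + 0.2

# Decimal gets this one right, which is the "frequency" point.
decimal_sum = Decimal("0.1") + Decimal("0.2")

# Exact rationals keep both, at the cost discussed elsewhere.
fraction_round_trip = Fraction(1, 3) * 3

print(decimal_round_trip, float_sum, decimal_sum, fraction_round_trip)
```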

[1]  In some sense.  One important sense is "how often humans would
care or even notice", which is likely to be even less frequent than
how often inexactness is introduced, for Decimal.  But that varies by

[2]  It's in Knuth, Seminumerical Algorithms IIRC.

From steve at  Thu Jun  4 14:08:36 2015
From: steve at (Steven D'Aprano)
Date: Thu, 4 Jun 2015 22:08:36 +1000
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Jun 03, 2015 at 12:43:00PM -0700, Andrew Barnert wrote:
> On Jun 2, 2015, at 19:52, Steven D'Aprano <steve at> wrote:
> > But, really, your proposal is in no way, shape or form syntax for 
> > *literals*,
> It's a syntax for things that are somewhat like `2`, more like `-2`, 
> even more like `(2,)`, but still not exactly the same as even that.

Not really. It's a syntax for something that is not very close to *any* 
of those examples. Unlike all of those examples, it is a syntax for 
calling a function at runtime.

Let's take (-2, 1+3j) as an example. As you point out in another post, 
Python may constant-fold it, but isn't required to. Python 3.3 compiles 
it to a single constant:

  LOAD_CONST               6 ((-2, (1+3j)))

but Python 1.5 compiles it to a series of byte-code operations:

  LOAD_CONST          0 (2)
  UNARY_NEGATIVE
  LOAD_CONST          1 (1)
  LOAD_CONST          2 (3j)
  BINARY_ADD
  BUILD_TUPLE         2

But that's just implementation detail. Whether Python 3.3 or 1.5, both 
expressions have something in common: the *operation* is immutable (I 
don't mean the object itself); there is nothing you can do, from pure 
python code, to make the literal (-2, 1+3j) something other than a 
two-tuple consisting of -2 and 1+3j. You can shadow int, complex and 
tuple, and it won't make a lick of difference. For lack of a better 
term, I'm going to call this a "static operation" (as opposed to dynamic 
operations like calling len(x), which can be shadowed or monkey-patched).
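
A small sketch of the distinction, assuming nothing beyond the built-ins
(the function name shadowing_demo and the decoy lambdas are made up for
the illustration): shadowing int, complex and tuple leaves the tuple
display untouched, while shadowing len changes what len(x) does.

```python
def shadowing_demo():
    # Shadow the built-in names locally; all three now point at a decoy.
    int = complex = tuple = lambda *args: "shadowed"
    len = lambda obj: 42

    value = (-2, 1+3j)            # still an ordinary two-tuple
    return value, len([1, 2, 3])  # len() now resolves to the decoy

value, fake_len = shadowing_demo()
print(value, type(value).__name__, fake_len)
```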

I don't wish to debate the definition of "literal", as that may be very 
difficult. For example, is 2+3j actually a literal, or an expression 
containing only literals? If a literal, how about 2*3**4/5 for that 
matter? As soon as Python compilers start doing compile-time constant 
folding, the boundary between literals and constant expressions becomes 
fuzzy. But that boundary is actually not very interesting. What is 
interesting is that every literal shares at least the property that I 
refer to above, that you cannot redefine the result of that literal at 
runtime by shadowing or monkey-patching.
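
One way to see how fuzzy the boundary is, sketched with the ast module
(the helper name node_kind is mine, and the exact node class names are
those of recent CPython versions): the parser treats only the bare
tokens as constants, and everything else is an operator or display
built from them, whether or not a later pass folds it.

```python
import ast

def node_kind(source):
    """Return the class name of the top-level AST node for an expression."""
    return type(ast.parse(source, mode="eval").body).__name__

for source in ("2", "3j", "-2", "2+3j", "(2,)", "2*3**4/5"):
    print(f"{source!r:12} -> {node_kind(source)}")
```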

Coming from that perspective, a literal *defined* at runtime as you 
suggest is a contradiction in terms. I don't care so much if the actual 
operation that evaluates the literal happens at runtime, so long as it 
is static in the above sense. If it's dynamic, then it's not a literal, 
it's just a function call with ugly syntax.

> If 
> you don't like using the word "literal" for that, you can come up with 
> a different word. I called it a "literal" because "user-defined 
> literals" is what people were asking for when they asked for `2.3d`, 

If you asked for a turkey and cheese sandwich on rye bread, and I said 
"Well, I haven't got any turkey, or rye, but I can give you a slice of 
cheese on white bread and we'll just call it a turkey and cheese rye 
sandwich", you probably wouldn't be impressed :-)

> A literal is a notation for expressing some value that means what it 
> says in a sufficiently simple way.

I don't think that works. "Sufficiently simple" is a problematic 
concept. If "123_d" is sufficiently simple, surely "d(123)" is equally 
simple? It's only one character more, and it's a much more familiar 
and conventional syntax.

Especially since *_d ends up calling a function, which might as well be 
called d(). And if it is called d, why not a more_meaningful_name() 
instead? I would hope that the length of the function name is not the 
defining characteristic of "sufficiently simple"? (Consider 

I don't wish to argue about other languages, but I think for Python, the 
important characteristic of "literals" is that they are static, as 
above, not "simple". An expression with nested containers isn't 
necessarily simple:

    {0: [1, 2, {3, 4, (5, 6)}]}  # add arbitrary levels of complexity

nor is it necessarily constructed as a compile-time constant, but it is 
static in the above sense. 

> > Otherwise, we might as well say that 
> > 
> >    from fractions import Fraction
> >    Fraction(2)
> > 
> > is a literal, in which case I can say your proposal is unnecessary as we 
> > already have user-specified literals in Python.
> In C++, a constructor expression like Fraction(2) may be evaluable at 
> compile time, and may evaluate to something that's constant at both 
> compile time and runtime, and yet it's still not a literal. Why? 
> Because their rule for what counts as "sufficiently simple" includes 
> constexpr postfix user-literal operators, but not constexpr function 
> or constructor calls.

What is the logic for that rule? If it is just an arbitrary decision 
that "literals cannot include parentheses" then I equally arbitrarily 
dismiss that rule and say "of course they can, the C++ standard 
notwithstanding, and the fact that Fraction(2) is a constant evaluated at 
compile time is proof of that fact".

In any case, this is Python, and arguing over definitions from C++ is 
not productive. Our understanding of what makes a literal can be 
informed by other languages, but cannot be defined by other languages -- 
if for no other reason than that other languages may not all agree on what 
is and isn't a literal.


From drekin at  Thu Jun  4 14:52:10 2015
From: drekin at (=?UTF-8?B?QWRhbSBCYXJ0b8Wh?=)
Date: Thu, 4 Jun 2015 14:52:10 +0200
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

Thank you very much for a detailed explanation.

Regards, Drekin

On Wed, Jun 3, 2015 at 10:17 PM, Andrew Barnert <abarnert at> wrote:

> On Jun 3, 2015, at 07:29, drekin at wrote:
> >
> > Stephen J. Turnbull writes:
> >
> >> Nick Coghlan writes:
> >>
> >>> the main concern I have with [a FloatLiteral that carries the
> >>> original repr around] is that we'd be trading the status quo for a
> >>> situation where "Decimal(1.3)" and "Decimal(13/10)" gave different
> >>> answers.
> >>
> >> Yeah, and that kills the deal for me.  Either Decimal is the default
> >> representation for non-integers, or this is a no-go.  And that isn't
> >> going to happen.
> >
> > What if also 13/10 yielded a fraction?
> That was raised near the start of the thread. In fact, I think the initial
> proposal was that 13/10 evaluated to Fraction(13, 10) and 1.2 evaluated to
> something like Fraction(12, 10).
> > Anyway, what are the objections to integer division returning a
> > fraction? They are coerced to floats when mixed with them.
> As mentioned earlier in the thread, the language that inspired Python,
> ABC, used exactly this design: computations were kept as exact rationals
> until you mixed them with floats or called irrational functions like root.
> So it's not likely Guido didn't think of this possibility; he deliberately
> chose not to do things this way. He even wrote about this a few years ago;
> search for "integer division" on his Python-history blog.
> So, what are the problems?
> When you stay with exact rationals through a long series of computations,
> the result can grow to be huge in memory, and processing time. (I'm
> ignoring the fact that CPython doesn't even have a fast fraction
> implementation, because one could be added easily. It's still going to be
> orders of magnitude slower to add two fractions with gigantic denominators
> than to add the equivalent floats or decimals.)
> Plus, it's not always obvious when you've lost exactness. For example,
> exponentiation between rationals is exact only if the power simplifies to a
> whole fraction (and hasn't itself become a float somewhere along the way).
> Since the fractions module doesn't have IEEE-style flags for
> inexactness/rounding, it's harder to notice when this happens.
> Except in very trivial cases, the repr would be much less human-readable
> and -debuggable, not more. (Or do you find 1728829813 / 2317409 easier to
> understand than 7460.181958816937?)
> Fractions and Decimals can't be mixed or interconverted directly.
> There are definitely cases where a rational type is the right thing to use
> (it wouldn't be in the stdlib otherwise), but I think they're less common
> than the cases where a floating-point type (whether binary or decimal) is
> the right thing to use. (And even many cases where you think you want
> rationals, what you actually want is SymPy-style symbolic
> computation--which can give you exact results for things with roots or sins
> or whatever as long as they cancel out in the end.)

From p.f.moore at  Thu Jun  4 15:06:12 2015
From: p.f.moore at (Paul Moore)
Date: Thu, 4 Jun 2015 14:06:12 +0100
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On 4 June 2015 at 13:08, Steven D'Aprano <steve at> wrote:
> I don't wish to argue about other languages, but I think for Python, the
> important characteristic of "literals" is that they are static, as
> above, not "simple". An expression with nested containers isn't
> necessarily simple:
>     {0: [1, 2, {3, 4, (5, 6)}]}  # add arbitrary levels of complexity
> nor is it necessarily constructed as a compile-time constant, but it is
> static in the above sense.

I think that the main reason that people keep asking for things like
1.2d in place of D('1.2') is basically that the use of a string
literal, for some reason "feels different". It's not a technical
issue, nor is it one of compile time constants or static values - it's
simply about not wanting to *think* of the process as passing a string
literal to a function. They want "a syntax for a decimal" rather than
"a means of getting a decimal from a string" because that's how they
think of what they are doing.

People aren't asking for decimal literals because they don't know that
they can do D('1.2'). They want to avoid the quotes because they don't
"feel right", that's all. That's why the common question is "why
doesn't D(1.2) do what I expect?" rather than "how do I include a
decimal constant in my program?"
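
For anyone who hasn't hit this, a quick illustration of why D(1.2)
surprises people (the alias D follows the usage above; the variable
names are just for the example):

```python
from decimal import Decimal as D

from_float = D(1.2)     # the float 1.2 is rounded to binary before D sees it
from_string = D('1.2')  # D parses the decimal digits itself

print(from_float)
print(from_string)
print(from_float == from_string)
```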

"Literal" syntax is about taking a chunk of the source code as a
string, and converting it into a runtime object. For built in types
the syntax is known to the lexer and the compiler knows how to create
the runtime constants (that applies as much to Python as to C or any
other language). The fundamental question here is whether there is a
Pythonic way of extending that to user-defined forms. That would have
to be handled at runtime, so the *syntax* would need to be immutable,
but the *semantics* could be defined in terms of runtime, without
violating the spirit of the request.

Such a syntax could be used for lots of things - regular expressions
are a common type that gets dedicated syntax (Javascript, Perl).

As a straw man how about a new syntax (this won't work as written,
because it'll clash with the "<" operator, but the basic idea works):

    LITERAL_CALL = PRIMARY "<" <any source character except right angle bracket>* ">"

which is a new option for PRIMARY alongside CALL. This translates
directly into PRIMARY(str) where str is a string composed of the
source characters within <...>.

Decimal "literals" would then be

    from decimal import Decimal as D
    x = D<1.2>

Code objects could be


Regular expressions could be

    from re import compile as RE
    regex = RE<a.*([bc]+)$>

As you can see the potential for line noise and unreadable code is
there, but regular expressions always have that problem :-) Also, this
proposal gives a "literal syntax" that works with existing features,
rather than being a specialised add-on. Maybe that's a benefit (or
maybe it's over-generalisation).
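
To show the intended translation, here is a rough sketch that fakes it
with a textual rewrite before exec. It is only an approximation of the
straw man: the regex, the name expand_literal_calls and the choice to
match a bare identifier rather than a full PRIMARY are all assumptions
made for the illustration, not part of any real parser change.

```python
import re
from decimal import Decimal as D

# Hypothetical rewrite rule: NAME<text>  ->  NAME('text')
LITERAL_CALL = re.compile(r"\b([A-Za-z_]\w*)<([^>]*)>")

def expand_literal_calls(source):
    """Rewrite every NAME<...> in source into a call with a string argument."""
    return LITERAL_CALL.sub(lambda m: f"{m.group(1)}({m.group(2)!r})", source)

expanded = expand_literal_calls("x = D<1.2>")
print(expanded)

namespace = {"D": D}
exec(expanded, namespace)
print(namespace["x"])
```

The same rewrite would also mangle an ordinary comparison chain such as
"a<b and c>d", which is exactly the clash with the "<" operator
mentioned above.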


From ncoghlan at  Thu Jun  4 15:48:43 2015
From: ncoghlan at (Nick Coghlan)
Date: Thu, 4 Jun 2015 23:48:43 +1000
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On 4 June 2015 at 23:06, Paul Moore <p.f.moore at> wrote:
> As a straw man how about a new syntax (this won't work as written,
> because it'll clash with the "<" operator, but the basic idea works):
>     LITERAL_CALL = PRIMARY "<" <any source character except right angle bracket>* ">"

The main idea I've had for compile time metaprogramming that I figured
I might be able to persuade Guido not to hate is:

   python_ast, names2cells, unbound_names = !(this_is_an_arbitrary_python_expression)

As suggested by the assignment target names, the default behaviour
would be to compile the expression to a Python AST, and then at
runtime provide some relevant information about the name bindings
referenced from it. (I haven't even attempted to implement this,
although I've suggested it to some of the SciPy folks as an idea they
might want to explore to make R style lazy evaluation easier)

By using the prefix+delimiters notation, it would become possible to
later have variants that were similarly transparent to the compiler,
but *called* a suitably registered callable at compile time to do the
conversion to runtime Python objects. For example:

   !sh(shell command)
   !format(format string with implicit interpolation)
   !sql(SQL query)

So for custom numeric types, you could register:

    d = !decimal(1.2)
    r = !rational(22/7)

This isn't an idea I'm likely to have time to pursue myself any time
soon (if ever), but I think it addresses the key concern with syntax
customisation by ensuring that customisations have *names*, and that
they're clearly distinguished from normal code.


From p.f.moore at  Thu Jun  4 16:25:58 2015
From: p.f.moore at (Paul Moore)
Date: Thu, 4 Jun 2015 15:25:58 +0100
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On 4 June 2015 at 14:48, Nick Coghlan <ncoghlan at> wrote:
> On 4 June 2015 at 23:06, Paul Moore <p.f.moore at> wrote:
>> As a straw man how about a new syntax (this won't work as written,
>> because it'll clash with the "<" operator, but the basic idea works):
>>     LITERAL_CALL = PRIMARY "<" <any source character except right angle bracket>* ">"
> The main idea I've had for compile time metaprogramming that I figured
> I might be able to persuade Guido not to hate is:
>    python_ast, names2cells, unbound_names =
> !(this_is_an_arbitrary_python_expression)

The fundamental difference between this proposal and mine is (I think)
that you're assuming an arbitrary Python expression in there (which is
parsed), whereas I'm proposing an *unparsed* string.

For example, your suggestion of !decimal(1.2) would presumably pass to
the "decimal" function, an AST consisting of a literal float node for
1.2. Which has the same issues as anything else that parses 1.2 before
the decimal constructor gets its hands on it - you've already lost the
original that the people wanting decimal literals need access to. And
I don't think your shell script example works - something like
!sh(echo $PATH) would be a syntax error, surely?
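
Both points can be checked with the existing parser. This is only a
sketch using ast.parse on the text that would sit inside the
parentheses (it assumes a Python recent enough that numeric literals
parse to ast.Constant); it says nothing about how a real "!" prefix
would be implemented.

```python
import ast

# The parser turns the token into a float; the original spelling is gone.
node = ast.parse("1.20", mode="eval").body
print(type(node).__name__, node.value)   # the value is the float, not the text "1.20"

# Shell text is not a Python expression, so parsing it fails outright.
try:
    ast.parse("sh(echo $PATH)", mode="eval")
    parsed = True
except SyntaxError:
    parsed = False
print(parsed)
```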

My proposal is specifically about allowing access to the *unevaluated*
source string, to allow the runtime function to take control of the
parsing. We have various functions already that take string
representations and parse them to objects (Decimal, re.compile,
compile...) - all I'm suggesting is a lighter-weight syntax than
("...") for "call with a string value". It's very hard to justify
this, as it doesn't add any new functionality, and it doesn't add that
much brevity. But it seems to me that it *does* add a strong measure
of "doing what people expect" - something that's hard to quantify, but
once you go looking for examples, it's applicable to a *lot* of
longstanding requests. The more I look, the more uses I can think of
(e.g., Windows paths via pathlib - Path<C:\Windows>).

The main issue I see with my proposal (other than "Guido could well
hate it" :-)) is that it has no answer to the fact that you can't
include the closing delimiter in the string - as soon as you try to
work around that, the syntax starts to lose its elegant simplicity
*very* fast. (Raw strings have similar problems - the rules on
backslashes in raw strings are clumsy at best).

Like you, though, I don't have time to work on this, so it's just an
idea if anyone else wants to pick up on it.


From steve at  Thu Jun  4 17:11:39 2015
From: steve at (Steven D'Aprano)
Date: Fri, 5 Jun 2015 01:11:39 +1000
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Jun 03, 2015 at 03:18:51PM -0700, u8y7541 The Awesome Person wrote:
> > I'm going to show a few examples of how Decimals violate the fundamental
> > laws of mathematics just as floats do.
> Decimal also uses sign and mantissa, except it's Base 10. I think
> Decimal should use numerators and denominators, because they are more
> accurate. That's why even Decimal defies the laws of mathematics.

The decimal module is an implementation of the decimal floating point 
arithmetic based on the General Decimal Arithmetic Specification:

and IEEE standard 854-1987:

The decimal module is not free to do whatever we want. It can only do 
what is specified by those standards. If you want to modify the decimal 
module to behave as you suggest, you are free to copy the module's 
source code and modify it. (It is open source, like all of Python.) This 
would be an interesting experiment for somebody to do.


From mertz at  Thu Jun  4 17:21:09 2015
From: mertz at (David Mertz)
Date: Thu, 4 Jun 2015 08:21:09 -0700
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 4, 2015 8:12 AM, "Steven D'Aprano" <steve at> wrote:
> The decimal module is not free to do whatever we want. It can only do
> what is specified by those standards.

That's not quite true. The class decimal.Decimal must obey those standards.
We could easily add decimal32/64/128 types to the module for those
different objects (and I think we probably should).

For that matter, it wouldn't violate the standards to add
decimal.PythonDecimal with some other behaviors. But I can't really think
of any desirable behaviors that are mathematically possible to put there.
There isn't going to be a decimal.NeverSurprisingDecimal class in there.

From abarnert at  Thu Jun  4 21:14:38 2015
From: abarnert at (Andrew Barnert)
Date: Thu, 4 Jun 2015 12:14:38 -0700
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 4, 2015, at 05:08, Steven D'Aprano <steve at> wrote:
>> On Wed, Jun 03, 2015 at 12:43:00PM -0700, Andrew Barnert wrote:
>> On Jun 2, 2015, at 19:52, Steven D'Aprano <steve at> wrote:
> [...]
>>> But, really, your proposal is in no way, shape or form syntax for 
>>> *literals*,
>> It's a syntax for things that are somewhat like `2`, more like `-2`, 
>> even more like `(2,)`, but still not exactly the same as even that.
> Not really. It's a syntax for something that is not very close to *any* 
> of those examples. Unlike all of those examples, it is a syntax for 
> calling a function at runtime.
> Let's take (-2, 1+3j) as an example. As you point out in another post, 
> Python may constant-fold it, but isn't required to. Python 3.3 compiles 
> it to a single constant:
>  LOAD_CONST               6 ((-2, (1+3j)))
> but Python 1.5 compiles it to a series of byte-code operations:
>  LOAD_CONST          0 (2)
>  UNARY_NEGATIVE
>  LOAD_CONST          1 (1)
>  LOAD_CONST          2 (3j)
>  BINARY_ADD
>  BUILD_TUPLE         2
> But that's just implementation detail. Whether Python 3.3 or 1.5, both 
> expressions have something in common: the *operation* is immutable (I 
> don't mean the object itself); there is nothing you can do, from pure 
> python code, to make the literal (-2, 1+3j) something other than a 
> two-tuple consisting of -2 and 1+3j. You can shadow int, complex and 
> tuple, and it won't make a lick of difference. For lack of a better 
> term, I'm going to call this a "static operation" (as opposed to dynamic 
> operations like calling len(x), which can be shadowed or monkey-patched).

But this isn't actually true. That BINARY_ADD opcode looks up the addition method at runtime and calls it. And that means that if you monkeypatch complex.__radd__, your method will get called.

As an implementation-specific detail, CPython 3.4 doesn't let you modify the complex type. Python allows this, but doesn't require it, and some other implementations do let you modify it.

So, if it's important to your code that 1+3j is a "static operation", then your code is non-portable at best. But once again, I suspect that the reason you haven't thought about this is that you've never written any code that actually cares what is or isn't a static operation. It's a typical "consenting adults" case.
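
For reference, this is the kind of check I mean. It is a sketch, and
the result is implementation-specific by design: the helper name
try_patch_complex is made up, and the comment about the refusal
describes CPython's behaviour only.

```python
def try_patch_complex():
    """Attempt to replace complex.__radd__; report whether it was allowed."""
    try:
        complex.__radd__ = lambda self, other: "patched"
    except TypeError:
        return False  # CPython refuses to modify built-in types
    return True

patched = try_patch_complex()
print(patched)
print(1 + 3j)
```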

> I don't wish to debate the definition of "literal", as that may be very 
> difficult. For example, is 2+3j actually a literal, or an expression 
> containing only literals? If a literal, how about 2*3**4/5 for that 
> matter? As soon as Python compilers start doing compile-time constant 
> folding, the boundary between literals and constant expressions becomes 
> fuzzy. But that boundary is actually not very interesting. What is 
> interesting is that every literal shares at least the property that I 
> refer to above, that you cannot redefine the result of that literal at 
> runtime by shadowing or monkey-patching.

What you're arguing here, and for the rest of the message, can be summarized in one sentence: the difference between user-defined literals and implementation-defined literals is that the former are user-defined. To which I have no real answer.

>> If 
>> you don't like using the word "literal" for that, you can come up with 
>> a different word. I called it a "literal" because "user-defined 
>> literals" is what people were asking for when they asked for `2.3d`,
> If you asked for a turkey and cheese sandwich on rye bread, and I said 
> "Well, I haven't got any turkey, or rye, but I can give you a slice of 
> cheese on white bread and we'll just call it a turkey and cheese rye 
> sandwich", you probably wouldn't be impressed :-)

But if I asked for a turkey and cheese hoagie, and you said I have turkey and cheese and a roll, but that doesn't count as a hoagie by my definition so you can't have it, I'd say just put the turkey and cheese on the roll and call it whatever you want to call it.

If people are asking for user-defined literals like 2.3d, and your argument is not that we can't or shouldn't do it, but that the term "user-defined literal" is contradictory, then the answer is the same: just call it something different.

I don't know how else to put this. I already said, in two different ways, that if you want to call it something different that's fine. You replied by saying you don't want to argue about the definition of literals, followed by multiple paragraphs arguing about the definition of literals.

>> A literal is a notation for expressing some value that means what it 
>> says in a sufficiently simple way.
> I don't think that works. "Sufficiently simple" is a problematic 
> concept. If "123_d" is sufficiently simple, surely "d(123)" is equally 
> simple? It's only one character more, and it's a much more familiar 
> and conventional syntax.

If you're talking about APL or J, the number of characters might be a relevant measure of simplicity. But in the vast majority of languages, including Python, it has very little relevance. Of course "simple" is inherently a vague concept, and it will be different in different languages and contexts. But it's still one of the most important concepts. That's why language design is an art, and why we have a Zen of Python and not an Assembly Manual of Python. Trying to reduce it to something the wc program can measure means reducing it to the point of meaninglessness.

Let's give a different example. I could claim that currying makes higher-order expressions simpler. You could rightly point out that it makes the simplest function calls less simple. If we disagree on those points, or on the relative importance of them, we might draw up a bunch of examples to look at the human readability and writability or computer parsability of different expressions, in the context of idiomatic code in the language we were designing. If the rest of the language were a lot like Haskell, we'd probably agree that curried functions were simpler; if it were a lot like Python, we'd probably agree on the reverse. But at no point would the fact that f(1,2) is one character shorter than f(1)(2) come into the discussion. The closest we'd reasonably get might be a discussion of the fact that the parens feel "big" and "get in the way" of reading the "more important" parts of the expression, or encourage the reader to naturally partition up the expression in a way that isn't appropriate to the intended meaning, or other such things. (See the "grit on Tim's monitor" appeal.) But those are still vague and subjective things. There's no objective measure to appeal to. Otherwise, for every language proposal, Guido would just run the objective simplicity measurement program and it would say yes or no.

>> In C++, a constructor expression like Fraction(2) may be evaluable at 
>> compile time, and may evaluate to something that's constant at both 
>> compile time and runtime, and yet it's still not a literal. Why? 
>> Because their rule for what counts as "sufficiently simple" includes 
>> constexpr postfix user-literal operators, but not constexpr function 
>> or constructor calls.
> What is the logic for that rule?

In the case of C++, a committee actually sat down and hammered out a rigorous definition that codified the intuitive sense they were going for; if you want to read it, you can. But that isn't going to apply to anything but C++. And if you want to argue about it, the place to do so is the C++17 ISO committee. Just declaring that the C++ standard definition of literals doesn't define what you want to call literals doesn't really accomplish anything.

From abarnert at  Thu Jun  4 21:49:49 2015
From: abarnert at (Andrew Barnert)
Date: Thu, 4 Jun 2015 12:49:49 -0700
Subject: [Python-ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 4, 2015, at 08:21, David Mertz <mertz at> wrote:
> On Jun 4, 2015 8:12 AM, "Steven D'Aprano" <steve at> wrote:
> > The decimal module is not free to do whatever we want. It can only do
> > what is specified by those standards.
> That's not quite true. The class decimal.Decimal must obey those standards. We could easily add decimal32/64/128 types to the module for those different objects (and I think we probably should).
If we add decimal32/64/128 types, presumably they'd act like the types of the same names as specified in the same standards. Otherwise, that would be very surprising.

Also, I'm not entirely sure we want to add those types to decimal in the stdlib. It would be a lot less work to implement them in terms of an existing C implementation (maybe using the native types if they exist, Intel's library if they don't). But I don't think that's necessarily desirable for the stdlib. Is this a big enough win to justify CPython being written in C11 instead of C90 (or, worse, sometimes one and sometimes the other, or a combination of the two), or to add a dependency on a library that isn't preinstalled on most systems and takes longer to build than all of current CPython 3.4? For a PyPI library, none of those things matter (in fact, a PyPI library could use C++ on platforms where the native types from the C++ TR exist but the C ones don't, or use Cython or CFFI or Boost::Python instead of native code, etc.), and the stdlib's decimal docs could just point to that PyPI library.
> For that matter, it wouldn't violate the standards to add decimal.PythonDecimal with some other behaviors. But I can't really think of any desirable behaviors that are mathematically possible to put there.
I think it would still make sense to put it somewhere else. A module that declares that its behavior corresponds to a standard that adds a little bit of standard-irrelevant behavior is fine. For example, a conversion from Fraction to Decimal that worked "in the spirit of the standards" and documented that it wasn't specified by either standard. But adding a whole new class as complex as Decimal means half the module is now standard-irrelevant, which seems a lot more potentially confusing to me.
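
As a sketch of what such a conversion might look like (the function
fraction_to_decimal and its prec parameter are hypothetical, not
anything in the decimal module today), one option is to divide the
numerator by the denominator under an explicit context, since the two
types refuse to mix directly:

```python
from decimal import Decimal, localcontext
from fractions import Fraction

def fraction_to_decimal(frac, prec=28):
    """Divide numerator by denominator, rounding to prec significant digits."""
    with localcontext() as ctx:
        ctx.prec = prec
        return Decimal(frac.numerator) / Decimal(frac.denominator)

# Mixing the two types directly is rejected.
try:
    Decimal(1) + Fraction(1, 3)
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(mixed_ok)
print(fraction_to_decimal(Fraction(1, 4)))
print(fraction_to_decimal(Fraction(1, 3), prec=10))
```

The rounding here follows whatever the local context says, which is the
part neither standard specifies for rationals.
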
> There isn't going to be a decimal.NeverSurprisingDecimal class in there.
Start with your favorite axioms for the integers and your favorite construction of the reals, then negate the successor axiom. Now you have a fully-consistent, never-surprising, easily-implementable real number type that follows all the usual mathematical laws over its entire set of values, {0}. :)


From guido at  Thu Jun  4 21:49:06 2015
From: guido at (Guido van Rossum)
Date: Thu, 4 Jun 2015 12:49:06 -0700
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Jun 4, 2015 at 12:14 PM, Andrew Barnert via Python-ideas <
python-ideas at> wrote:

> But this isn't actually true. That BINARY_ADD opcode looks up the addition
> method at runtime and calls it. And that means that if you monkeypatch
> complex.__radd__, your method will get called.

Wrong. You can't monkeypatch complex.__radd__. That's a feature of the
language.
--Guido van Rossum (

From abarnert at  Thu Jun  4 22:18:56 2015
From: abarnert at (Andrew Barnert)
Date: Thu, 4 Jun 2015 13:18:56 -0700
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 4, 2015, at 12:49, Guido van Rossum <guido at> wrote:
>> On Thu, Jun 4, 2015 at 12:14 PM, Andrew Barnert via Python-ideas <python-ideas at> wrote:
>> But this isn't actually true. That BINARY_ADD opcode looks up the addition method at runtime and calls it. And that means that if you monkeypatch complex.__radd__, your method will get called.
> Wrong. You can't monkeypatch complex.__radd__. That's a feature of the language.

I may well have missed it, but I went looking through the Built-in Types library documentation, the Data Model and other chapters of the language reference documentation, and every relevant PEP I could think of, and I can't find anything that says this is true. 

The best I can find is the rationale section for PEP 3119 saying "there are good reasons to keep the built-in types immutable", which is why PEP 3141 was changed to not require mutating the built-in types. But "there are good reasons to allow implementations to forbid it" isn't the same thing as "all implementations must forbid it".

And at least some implementations do allow it, like Brython and one of the two embedded pythons. (And the rationale in PEP 3119 doesn't apply to them--Brython doesn't share built-in types between different Python interpreters in different browser windows, even if they're in the same address space.)

From guido at  Thu Jun  4 23:05:29 2015
From: guido at (Guido van Rossum)
Date: Thu, 4 Jun 2015 14:05:29 -0700
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

OK, you can attribute that to lousy docs. The intention is that builtin
types are immutable.

On Thu, Jun 4, 2015 at 1:18 PM, Andrew Barnert <abarnert at> wrote:

> On Jun 4, 2015, at 12:49, Guido van Rossum <guido at> wrote:
> On Thu, Jun 4, 2015 at 12:14 PM, Andrew Barnert via Python-ideas <
> python-ideas at> wrote:
>> But this isn't actually true. That BINARY_ADD opcode looks up the
>> addition method at runtime and calls it. And that means that if you
>> monkeypatch complex.__radd__, your method will get called.
> Wrong. You can't monkeypatch complex.__radd__. That's a feature of the
> language.
> I may well have missed it, but I went looking through the Built-in Types
> library documentation, the Data Model and other chapters of the language
> reference documentation, and every relevant PEP I could think of, and I
> can't find anything that says this is true.
> The best I can find is the rationale section for PEP 3119 saying "there
> are good reasons to keep the built-in types immutable", which is why PEP
> 3141 was changed to not require mutating the built-in types. But "there are
> good reasons to allow implementations to forbid it" isn't the same thing as
> "all implementations must forbid it".
> And at least some implementations do allow it, like Brython and one of the
> two embedded pythons. (And the rationale in PEP 3119 doesn't apply to
> them--Brython doesn't share built-in types between different Python
> interpreters in different browser windows, even if they're in the same
> address space.)

--Guido van Rossum (

From rosuav at  Thu Jun  4 23:23:41 2015
From: rosuav at (Chris Angelico)
Date: Fri, 5 Jun 2015 07:23:41 +1000
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jun 5, 2015 at 6:18 AM, Andrew Barnert via Python-ideas
<python-ideas at> wrote:
> The best I can find is the rationale section for PEP 3119 saying "there are
> good reasons to keep the built-in types immutable", which is why PEP 3141
> was changed to not require mutating the built-in types. But "there are good
> reasons to allow implementations to forbid it" isn't the same thing as "all
> implementations must forbid it".
> And at least some implementations do allow it, like Brython and one of the
> two embedded pythons. (And the rationale in PEP 3119 doesn't apply to
> them--Brython doesn't share built-in types between different Python
> interpreters in different browser windows, even if they're in the same
> address space.)

Huh. Does that imply that Brython has to construct a brand-new integer
object for absolutely every operation and constant, in case someone
monkeypatched something? Once integers (and other built-in types) lose
their immutability, they become distinguishable:

x = 2
monkey_patch(x)
y = 2
In CPython (and, I think, in the Python spec), the two 2s in x and y
will be utterly indistinguishable, like fermions. CPython goes further
and uses the exact same object for both 2s, *because it can*. Is there
something you can do inside monkey_patch() that will "mark" one of
those 2s such that it's somehow different (add an attribute, change a
dunder method, etc)? And does Brython guarantee that id(x)!=id(y)
because of that?
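
A small sketch of the CPython side of that question, assuming CPython semantics only (the attribute name "mark" is arbitrary): the two 2s are the same object there, and an individual int cannot be marked because int instances accept no new attributes.

```python
x = 2
y = 2

# CPython implementation detail: small ints are cached, so both names
# refer to one object. The language does not promise this.
same_object = x is y

try:
    x.mark = "patched"        # try to make this particular 2 distinguishable
    marked = True
except AttributeError:
    marked = False            # int instances have no per-object attribute storage

print(same_object, marked)
```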


From random832 at  Thu Jun  4 23:34:53 2015
From: random832 at (random832 at
Date: Thu, 04 Jun 2015 17:34:53 -0400
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Jun 4, 2015, at 17:23, Chris Angelico wrote:
> Huh. Does that imply that Brython has to construct a brand-new integer
> object for absolutely every operation and constant, in case someone
> monkeypatched something? Once integers (and other built-in types) lose
> their immutability, they become distinguishable:
> x = 2
> monkey_patch(x)
> y = 2

Er, we're talking about monkey-patching the int *class* (well, the
complex class, but the same idea applies), not an individual int object.

From rosuav at  Thu Jun  4 23:45:39 2015
From: rosuav at (Chris Angelico)
Date: Fri, 5 Jun 2015 07:45:39 +1000
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jun 5, 2015 at 7:34 AM,  <random832 at> wrote:
> On Thu, Jun 4, 2015, at 17:23, Chris Angelico wrote:
>> Huh. Does that imply that Brython has to construct a brand-new integer
>> object for absolutely every operation and constant, in case someone
>> monkeypatched something? Once integers (and other built-in types) lose
>> their immutability, they become distinguishable:
>> x = 2
>> monkey_patch(x)
>> y = 2
> Er, we're talking about monkey-patching the int *class* (well, the
> complex class, but the same idea applies), not an individual int object.

Ah okay. Even so, it would be very surprising if "1+2" could evaluate
to anything other than 3.


From at  Fri Jun  5 00:13:50 2015
From: at (Yury Selivanov)
Date: Thu, 04 Jun 2015 18:13:50 -0400
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On 2015-06-04 5:23 PM, Chris Angelico wrote:
> Huh. Does that imply that Brython has to construct a brand-new integer
> object for absolutely every operation and constant, in case someone
> monkeypatched something?
FWIW, numbers (as well as strings) are immutable in JavaScript.
And there is Object.freeze to make things immutable where you
need that.


From ncoghlan at  Fri Jun  5 00:31:57 2015
From: ncoghlan at (Nick Coghlan)
Date: Fri, 5 Jun 2015 08:31:57 +1000
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On 5 Jun 2015 00:25, "Paul Moore" <p.f.moore at> wrote:
> On 4 June 2015 at 14:48, Nick Coghlan <ncoghlan at> wrote:
> > On 4 June 2015 at 23:06, Paul Moore <p.f.moore at> wrote:
> >> As a straw man how about a new syntax (this won't work as written,
> >> because it'll clash with the "<" operator, but the basic idea works):
> >>
> >>     LITERAL_CALL = PRIMARY "<" <any source character except right
> >> angle bracket>* ">"
> >
> > The main idea I've had for compile time metaprogramming that I figured
> > I might be able to persuade Guido not to hate is:
> >
> >    python_ast, names2cells, unbound_names =
> > !(this_is_an_arbitrary_python_expression)
> The fundamental difference between this proposal and mine is (I think)
> that you're assuming an arbitrary Python expression in there (which is
> parsed), whereas I'm proposing an *unparsed* string.

No, when you supplied a custom parser, the parser would have access to the
raw string (as well as the name -> cell mapping for the current scope).

The "quoted AST parser" would just be the default one.


From alexander.belopolsky at  Fri Jun  5 00:40:11 2015
From: alexander.belopolsky at (Alexander Belopolsky)
Date: Thu, 4 Jun 2015 18:40:11 -0400
Subject: [Python-ideas] [Python Ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Jun 3, 2015 at 9:01 PM, Guido van Rossum <guido at> wrote:

> But there was a big issue that we didn't anticipate. During the course of
> a simple program it was quite common for calculations to slow down
> dramatically, because numbers with ever-larger numerators and denominators
> were being computed (and rational arithmetic quickly slows down as those
> get bigger).

The problem of unlimited growth can be solved by rounding, but the result
is in many ways worse than floating point
numbers.  One obvious problem is that unlike binary floating point where
all bit patterns represent different numbers,
only about 60% of fractions with limited numerators and denominators
represent unique values.  The rest are
reducible by dividing the numerator and denominator by the GCD.

Furthermore, the fractions with limited numerators are distributed very
unevenly on the number line.  This problem
is present in binary floats as well: floats between 1 and 2 are twice as
dense as floats between 2 and 4, but with
fractions it is much worse.  Since a/b - c/d = (ad-bc)/(bd), a fraction
nearest to a/b is at a distance of 1/(bd) from it.
So if the denominators are limited by D (|b| < D and |d| < D), for small
b's the nearest fraction to a/b is at distance
~ 1/D, but if b ~ D, it is at a distance of 1/D^2.  For example, if we
limit denominators to 10 decimal digits, the gaps
between fractions can vary from ~ 10^(-10) to ~ 10^(-20) even if the
fractions are of similar magnitude - say between
1 and 2.
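
Both effects can be checked numerically on a small scale. The following is only a sketch: the limits 100 and 50 are arbitrary small stand-ins for "10 decimal digits", and the helper names are made up. It counts how many numerator/denominator pairs are already in lowest terms (the share tends to 6/pi^2, about 61%) and measures the gaps between neighbouring fractions in [1, 2].

```python
from fractions import Fraction
from math import gcd, pi

def reduced_share(limit):
    """Share of pairs (a, b), 1 <= a, b <= limit, that are in lowest terms."""
    coprime = sum(1 for a in range(1, limit + 1)
                  for b in range(1, limit + 1) if gcd(a, b) == 1)
    return coprime / limit ** 2

def gaps_between_1_and_2(max_den):
    """Gaps between neighbouring distinct fractions in [1, 2], denominators <= max_den."""
    values = sorted({Fraction(a, b)
                     for b in range(1, max_den + 1)
                     for a in range(b, 2 * b + 1)})
    return [hi - lo for lo, hi in zip(values, values[1:])]

share = reduced_share(100)
gaps = gaps_between_1_and_2(50)

print(share, 6 / pi ** 2)       # both close to 0.61
print(max(gaps), min(gaps))     # widest gap ~ 1/D, narrowest ~ 1/D^2
```

With D = 50 the widest gap is 1/50 (next to 1 and 2) and the narrowest is 1/(50*49), which is the 1/D versus 1/D^2 spread described above.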

These two problems rule out the use of fractions as a general purpose

From abarnert at  Fri Jun  5 00:43:53 2015
From: abarnert at (Andrew Barnert)
Date: Thu, 4 Jun 2015 15:43:53 -0700
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 4, 2015, at 14:45, Chris Angelico <rosuav at> wrote:
>> On Fri, Jun 5, 2015 at 7:34 AM,  <random832 at> wrote:
>>> On Thu, Jun 4, 2015, at 17:23, Chris Angelico wrote:
>>> Huh. Does that imply that Brython has to construct a brand-new integer
>>> object for absolutely every operation and constant, in case someone
>>> monkeypatched something? Once integers (and other built-in types) lose
>>> their immutability, they become distinguishable:
>>> x = 2
>>> monkey_patch(x)
>>> y = 2
>> Er, we're talking about monkey-patching the int *class* (well, the
>> complex class, but the same idea applies), not an individual int object.
> Ah okay. Even so, it would be very surprising if "1+2" could evaluate
> to anything other than 3.

It's surprising that int('3') could evaluate to 4, or that print(1+2) could print 4, or that adding today and a 1-day timedelta could give you a date in 1918, or that accessing sys.stdout could play a trumpet sound and then read a 300MB file over the network, but there's nothing in the language stopping you from shadowing or replacing or monkeypatching any of those things, there's just your own common sense, and your trust in the common sense of other people working on the code with you.
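
As a concrete (and deliberately silly) sketch of the first case: a local name shadowing int is enough to make int('3') evaluate to 4. The function name demo is made up, and the shadowing is kept inside it so nothing else is affected.

```python
import builtins

def demo():
    # A local name "int" shadows the built-in for the rest of this function.
    def int(text):
        return builtins.int(text) + 1
    return int('3')

print(demo())       # the shadowed int, not the built-in, is called
```

Nothing in the language objects to this; only the reader does.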

And, getting this back on point: That also means there would be nothing stopping you from accidentally or maliciously redefining literal_d to play a trumpet sound and then read a 300MB file over the network instead of giving you a Decimal value, but that's not a problem the language has to solve, any more than it's a problem that you can replace int or print or sys.__getattr__.

The fact that people might overuse user-defined literals (e.g., I think using it for units, like the _ms suffix that C++'s timing library uses, is a bad idea), that's potentially a real problem. The fact that people might stupidly or maliciously interfere with some-other-user's-defined literals is not. Yes, you can surprise people that way, but Python already gives you a lot of much easier ways to surprise people. Python doesn't have a secure loader or enforced privates and constants or anything of the sort; it's designed to be used by consenting adults, and that works everywhere else, so why wouldn't it work here?

From random832 at  Fri Jun  5 01:02:18 2015
From: random832 at (random832 at
Date: Thu, 04 Jun 2015 19:02:18 -0400
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Jun 4, 2015, at 18:13, Yury Selivanov wrote:
> FWIW, numbers (as well as strings) are immutable in JavaScript.

numbers and strings are, but Numbers and Strings aren't. Remember, in
Javascript, the former aren't objects.

From abarnert at  Fri Jun  5 01:03:16 2015
From: abarnert at (Andrew Barnert)
Date: Thu, 4 Jun 2015 16:03:16 -0700
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 4, 2015, at 06:48, Nick Coghlan <ncoghlan at> wrote:
>> On 4 June 2015 at 23:06, Paul Moore <p.f.moore at> wrote:
>> As a straw man how about a new syntax (this won't work as written,
>> because it'll clash with the "<" operator, but the basic idea works):
>>    LITERAL_CALL = PRIMARY "<" <any source character except right
>> angle bracket>* ">"
> The main idea I've had for compile time metaprogramming that I figured
> I might be able to persuade Guido not to hate is:
>   python_ast, names2cells, unbound_names =
> !(this_is_an_arbitrary_python_expression)
> As suggested by the assignment target names, the default behaviour
> would be to compile the expression to a Python AST, and then at
> runtime provide some relevant information about the name bindings
> referenced from it. (I haven't even attempted to implement this,
> although I've suggested it to some of the SciPy folks as an idea they
> might want to explore to make R style lazy evaluation easier)
> By using the prefix+delimiters notation, it would become possible to
> later have variants that were similarly transparent to the compiler,
> but *called* a suitably registered callable at compile time to do the
> conversion to runtime Python objects. For example:
>   !sh(shell command)
>   !format(format string with implicit interpolation)
>   !sql(SQL query)
> So for custom numeric types, you could register:
>    d = !decimal(1.2)
>    r = !rational(22/7)

But what would that get you?

If it's meant to be a "compile-time decimal value"... What kind of value is that? What ends up in your co_consts? An instance of decimal.Decimal? How does that get marshaled?
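
The marshal question can be checked directly. A minimal sketch, using only the stdlib marshal and decimal modules: a float round-trips, while a Decimal instance is rejected.

```python
import decimal
import marshal

float_bytes = marshal.dumps(1.2)            # floats are a supported constant type

try:
    marshal.dumps(decimal.Decimal("1.2"))   # Decimal is not a type marshal knows
    decimal_ok = True
except ValueError:
    decimal_ok = False

print(marshal.loads(float_bytes), decimal_ok)
```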

Also, what's the point of it being compile-time? Unless there's some way to write arbitrary code that operates at compile time (like Lisp special forms, or C++ constexpr functions), what code is going to care about the difference between a compile-time decimal value and a run-time decimal value?

Also, where and how do you define sh, decimal, sql, etc.? I'm having a hard time seeing how you have any different options than my proposal does. You could have a function named bang_decimal that's looked up normally, or some way to register_bang_function('decimal', my_decimal_parser), or any of the other options mentioned in this thread, but what's the difference (other than there being a default "no-name" function that does an AST parse and name binding, which doesn't really seem related to any of the non-default examples)?

> This isn't an idea I'm likely to have time to pursue myself any time
> soon (if ever), but I think it addresses the key concern with syntax
> customisation by ensuring that customisations have *names*, and that
> they're clearly distinguished from normal code.
> Cheers,
> Nick.

From abarnert at  Fri Jun  5 01:20:34 2015
From: abarnert at (Andrew Barnert)
Date: Thu, 4 Jun 2015 16:20:34 -0700
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 4, 2015, at 14:05, Guido van Rossum <guido at> wrote:
> OK, you can attribute that to lousy docs. The intention is that builtin types are immutable.

I can go file bugs against those other implementations, but first, what's the rationale?

The ABC PEP, the numbers PEP discussion, and the type/class unification tutorial all use the same reason: In CPython, different interpreters in the same memory space (as with mod_python) share the same built-in types. From the numbers discussion, it sounds like this was the only reason to reject the idea of just patching float.__bases__.

But most other Python implementations don't have process-wide globals like that to worry about; patching int in one interpreter can't possibly affect any other interpreter.

"Because CPython can't do it, nobody else should do it, to keep code portable" might be a good enough rationale for something this fundamental, but if that's not the one you're thinking of, I don't want to put those words in your mouth.

>> On Thu, Jun 4, 2015 at 1:18 PM, Andrew Barnert <abarnert at> wrote:
>>> On Jun 4, 2015, at 12:49, Guido van Rossum <guido at> wrote:
>>>> On Thu, Jun 4, 2015 at 12:14 PM, Andrew Barnert via Python-ideas <python-ideas at> wrote:
>>>> But this isn't actually true. That BINARY_ADD opcode looks up the addition method at runtime and calls it. And that means that if you monkeypatch complex.__radd__, your method will get called.
>>> Wrong. You can't monkeypatch complex.__radd__. That's a feature of the language.
>> I may well have missed it, but I went looking through the Built-in Types library documentation, the Data Model and other chapters of the language reference documentation, and every relevant PEP I could think of, and I can't find anything that says this is true. 
>> The best I can find is the rationale section for PEP 3119 saying "there are good reasons to keep the built-in types immutable", which is why PEP 3141 was changed to not require mutating the built-in types. But "there are good reasons to allow implementations to forbid it" isn't the same thing as "all implementations must forbid it".
>> And at least some implementations do allow it, like Brython and one of the two embedded pythons. (And the rationale in PEP 3119 doesn't apply to them--Brython doesn't share built-in types between different Python interpreters in different browser windows, even if they're in the same address space.)
> -- 
> --Guido van Rossum (

From greg.ewing at  Fri Jun  5 01:44:37 2015
From: greg.ewing at (Greg Ewing)
Date: Fri, 05 Jun 2015 11:44:37 +1200
Subject: [Python-ideas] [Python Ideas] Python Float Update
In-Reply-To: <>
References: <>
Message-ID: <>

MRAB wrote:
> On 2015-06-04 02:01, Guido van Rossum wrote:
>>  From 1982 till 1886 I participated in the implementation of ABC
> Was that when the time machine was first used? :-)

Must have been a really big project if you had to
give yourself nearly 100 years of development time!


From at  Fri Jun  5 04:11:17 2015
From: at (Yury Selivanov)
Date: Thu, 04 Jun 2015 22:11:17 -0400
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On 2015-06-04 7:02 PM, random832 at wrote:
> On Thu, Jun 4, 2015, at 18:13, Yury Selivanov wrote:
>> >FWIW, numbers (as well as strings) are immutable in JavaScript.
> numbers and strings are, but Numbers and Strings aren't. Remember, in
> Javascript, the former aren't objects.

I know. Still, you can't mutate the inner value of Number or
String objects; you can only attach properties.


From surya.subbarao1 at  Fri Jun  5 04:12:31 2015
From: surya.subbarao1 at (u8y7541 The Awesome Person)
Date: Thu, 4 Jun 2015 19:12:31 -0700
Subject: [Python-ideas] Python Float Update
Message-ID: <>

> But there was a big issue that we didn't anticipate. During the course of a
> simple program it was quite common for calculations to slow down
> dramatically, because numbers with ever-larger numerators and denominators
> were being computed (and rational arithmetic quickly slows down as those
> get bigger). So e.g. you might be computing your taxes with a precision of
> a million digits -- only to be rounding them down to dollars for display.

(Quote by Guido van Rossum)

> Decimal can still violate all the
> invariants that binary floats can.  Binary has better approximation
> properties if you care about the *degree* of inexactness rather than
> the *frequency*[1] of inexactness.  Fractions can quickly become very
> inefficient.

(Quote by Stephen J. Turnbull)

I have a solution. To add, we convert to decimal and then add, then
change back to fraction. If we have hard-to-represent decimals like 1
/ 3, we can add with numerators. Hybrid additions. This could really
speed up things...
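
A rough sketch of the two behaviours in play, using the stdlib fractions and decimal modules with the default decimal context (an assumption; the proposal above does not say which precision it has in mind). Exact sums grow large denominators, and a round trip through Decimal does not give 1/3 back.

```python
from decimal import Decimal
from fractions import Fraction

# Exact arithmetic: the denominator grows as terms are added.
total = sum(Fraction(1, k) for k in range(1, 31))

# Round trip through Decimal: 1/3 is cut off at the context precision.
third = Fraction(1, 3)
via_decimal = Fraction(Decimal(1) / Decimal(3))

print(total.denominator)
print(via_decimal == third)
```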

-Surya Subbarao

From guettliml at  Fri Jun  5 07:36:59 2015
From: guettliml at (Thomas Güttler)
Date: Fri, 05 Jun 2015 07:36:59 +0200
Subject: [Python-ideas] Better Type Hinting
Message-ID: <>

It would be great to have better type hinting in IDEs.

My usecase:

logger = logging.getLogger(__name__)
try:
    ...
except FooException, exc:
    logger.warn('...')

I remember there was a way to show the traceback via logger.warn().

I could use my favorite search engine, but a short cut via the IDE would
be much easier.

How can the IDE know what kind of duck "logger" is?

Many IDEs parse the docstrings, but a lot of code does not provide it.

How can this be improved?

   Thomas Güttler

PS: I don't mention the name of my IDE intentionally :-)
  It does not matter for this question.

From stefan_ml at  Fri Jun  5 07:59:58 2015
From: stefan_ml at (Stefan Behnel)
Date: Fri, 05 Jun 2015 07:59:58 +0200
Subject: [Python-ideas] Better Type Hinting
In-Reply-To: <>
References: <>
Message-ID: <mkrdsu$7nv$>

Thomas Güttler wrote on 05.06.2015 at 07:36:
> It would be great to have better type hinting in IDEs.

Sounds more like a topic for python-list than python-ideas.

> My usecase:
> logger = logging.getLogger(__name__)
> try:
>     ...
> except FooException, exc:
>     logger.warn('...')
> I remember there was a way to show the traceback via logger.warn().
> I could use my favorite search engine, but a short cut via the IDE would
> be much easier.
> How can the IDE know what kind of duck "logger" is?
> Many IDEs parse the docstrings, but a lot of code does not provide it.
> How can this be improved?
> PS: I don't mention the name of my IDE intentionally :-)
>  It does not matter for this question.

Yes it does. It sounds like you want to use an IDE instead that supports
the above. Or install a plugin for the one you're using that improves its
capabilities for type introspection. There are a couple of IDE plugins that
embed jedi, for example.


From abarnert at  Fri Jun  5 08:01:14 2015
From: abarnert at (Andrew Barnert)
Date: Thu, 4 Jun 2015 23:01:14 -0700
Subject: [Python-ideas] Better Type Hinting
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 4, 2015, at 22:36, Thomas Güttler <guettliml at> wrote:
> It would be great to have better type hinting in IDEs.

Is PEP 484 not sufficient for this purpose?

Of course you'll have to wait until 3.5 or use an external backport (I think the goal is for MyPy to include stub files for 2.7/3.3/3.4, and/or for them to be published separately on PyPI?), and even longer for every library you depend on to get on board. And of course your favorite IDE has to actually do something with these type hints, and integrate an inference engine (whether that means running something like MyPy in the background or implementing something themselves). But I don't think there's anything Python itself can do to speed any of that up.

And meanwhile, I suppose it's possible that the PEP 484 design will turn out to be insufficient, but there's no way we're going to know that until the IDEs try to use it and fail.
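
To make that concrete, here is a minimal sketch of the kind of annotation PEP 484 describes. The helper make_logger is hypothetical; the point is only that the return annotation is machine-readable, so a tool could learn what kind of object "logger" is without running the body.

```python
import logging
from typing import get_type_hints

def make_logger(name: str) -> logging.Logger:
    """Hypothetical helper whose annotations say what it returns."""
    return logging.getLogger(name)

# An IDE or checker can read the same information statically.
hints = get_type_hints(make_logger)
print(hints)
```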

> My usecase:
> logger = logging.getLogger(__name__)
> try:
>    ...
> except FooException, exc:
>    logger.warn('...')

Is there a reason you're using syntax that's deprecated since Python 2.6 and doesn't work in 3.x? Any proposal for the future of the Python language isn't going to help you if you're still using 2.5.
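
For comparison, a sketch of the same snippet in the spelling that works on 2.6+ and 3.x. FooException is defined here only as a stand-in for whatever the original code raises, and the messages list exists just to make the result visible.

```python
import logging

class FooException(Exception):
    """Stand-in for the exception type named in the original snippet."""

logger = logging.getLogger(__name__)
messages = []

try:
    raise FooException("boom")
except FooException as exc:          # "as" replaces the old comma form
    messages.append(str(exc))
    logger.warning("caught: %s", exc)
```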

From me+python at  Fri Jun  5 08:19:30 2015
From: me+python at (Stephen Hansen)
Date: Thu, 04 Jun 2015 23:19:30 -0700
Subject: [Python-ideas] Better Type Hinting
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Jun 4, 2015, at 11:01 PM, Andrew Barnert via Python-ideas wrote:
> On Jun 4, 2015, at 22:36, Thomas Güttler <guettliml at>
> wrote:
> > 
> > It would be great to have better type hinting in IDEs.
> Is PEP 484 not sufficient for this purpose?

It's really not.

For one thing, PEP 484 isn't going to result in the standard library
being hinted all up (though I assume someone may make stubs). But
really, the specific issue that the OP is running into is because of the
signature of logging.warn --  msg, *args, **kwargs.

These kinds of signatures are very useful in certain circumstances but
they are also completely opaque. They're intentionally taking "anything"
and passing it off to another function. PEP 484 doesn't say anything
about the realities of logging.warn, as all the work is being done in
the private _log where we can examine and learn that it takes two
optional keyword parameters named "exc_info" and "extra", or what those
mean or what valid values are for them.
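
For the record, a short sketch of those two keyword arguments in use. The logger name, the "request_id" field and the StringIO handler are choices made for this example only; exc_info and extra themselves are documented parameters of the Logger methods.

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
# "request_id" is not a standard record attribute; it is supplied via extra.
handler.setFormatter(logging.Formatter("%(levelname)s %(request_id)s %(message)s"))

logger = logging.getLogger("sketch.exc_info")
logger.setLevel(logging.WARNING)
logger.addHandler(handler)
logger.propagate = False

try:
    1 / 0
except ZeroDivisionError:
    # exc_info=True appends the current traceback to the log entry.
    logger.warning("division failed for %s", "job-7",
                   exc_info=True, extra={"request_id": "r42"})

output = stream.getvalue()
print(output)
```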

All my preferred IDE tells me is, "msg, *args, **kwargs", which leaves
me befuddled if I don't remember the signature or have the docs in hand.
If it were to display the docstring too, I'd know "exc_info" is a valid
keyword argument that does something useful, but I'd still have no idea
about "extra" (and I actually have no idea what extra does and am not
looking it up right now on purpose).
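
What an introspection-based tool has to work with can be seen through inspect.signature. A minimal sketch, assuming CPython's stdlib logging; the private _log is looked at only for illustration and is not a supported API.

```python
import inspect
import logging

public = inspect.signature(logging.Logger.warning)
names = list(public.parameters)

# The useful keywords only show up on the private helper.
private = inspect.signature(logging.Logger._log)

print(public)
print(private)
```

The public signature says nothing about exc_info or extra, which is exactly the opacity being described.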

I don't really think that this is a problem for Python the language, but
maybe the style guide: don't use *args or **kwargs unless you either
document the details of what those should be, ooor, maybe include a
@functools.passes (fictional device) that in some fashion documents
"look at this other function for the things I'm passing along blindly".

The problem the OP is demonstrating is really completely out of scope
for what PEP 484 is addressing, I think. It has little to do with type
hinting and more to do with, IMHO, "should the stdlib provide more
introspectable signatures" (which then IDE's could use).

  Stephen Hansen
  m e @ i x o k a i . i o

From ncoghlan at  Fri Jun  5 09:06:09 2015
From: ncoghlan at (Nick Coghlan)
Date: Fri, 5 Jun 2015 17:06:09 +1000
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On 5 June 2015 at 09:03, Andrew Barnert <abarnert at> wrote:
> On Jun 4, 2015, at 06:48, Nick Coghlan <ncoghlan at> wrote:
>>> On 4 June 2015 at 23:06, Paul Moore <p.f.moore at> wrote:
>>> As a straw man how about a new syntax (this won't work as written,
>>> because it'll clash with the "<" operator, but the basic idea works):
>>>    LITERAL_CALL = PRIMARY "<" <any source character except right
>>> angle bracket>* ">"
>> The main idea I've had for compile time metaprogramming that I figured
>> I might be able to persuade Guido not to hate is:
>>   python_ast, names2cells, unbound_names =
>> !(this_is_an_arbitrary_python_expression)
>> As suggested by the assignment target names, the default behaviour
>> would be to compile the expression to a Python AST, and then at
>> runtime provide some relevant information about the name bindings
>> referenced from it. (I haven't even attempted to implement this,
>> although I've suggested it to some of the SciPy folks as an idea they
>> might want to explore to make R style lazy evaluation easier)
>> By using the prefix+delimiters notation, it would become possible to
>> later have variants that were similarly transparent to the compiler,
>> but *called* a suitably registered callable at compile time to do the
>> conversion to runtime Python objects. For example:
>>   !sh(shell command)
>>   !format(format string with implicit interpolation)
>>   !sql(SQL query)
>> So for custom numeric types, you could register:
>>    d = !decimal(1.2)
>>    r = !rational(22/7)
> But what would that get you?
> If it's meant to be a "compile-time decimal value"... What kind of value is that? What ends up in your co_consts? An instance of decimal.Decimal? How does that get marshaled?
> Also, what's the point of it being compile-time? Unless there's some way to write arbitrary code that operates at compile time (like Lisp special forms, or C++ constexpr functions), what code is going to care about the difference between a compile-time decimal value and a run-time decimal value?
> Also, where and how do you define sh, decimal, sql, etc.? I'm having a hard time seeing how you have any different options than my proposal does. You could have a function named bang_decimal that's looked up normally, or some way to register_bang_function('decimal', my_decimal_parser), or any of the other options mentioned in this thread, but what's the difference (other than there being a default "no-name" function that does an AST parse and name binding, which doesn't really seem related to any of the non-default examples)?

The larger idea (again, keeping in mind I haven't actually fully
thought through how to implement this) is to give the parsers access
to the surrounding namespace, which means that the compiler needs to
be made aware of any *actual* name references, and the *way* names are
referenced would be parser dependent (shell variables, format string
interpolation, SQL interpolation, etc).

So, for example:

    print(!format(The {item} cost {amount} {units}))

Would roughly translate to:

    print("The {item} cost {amount} {units}".format(item=item,
amount=amount, units=units))
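
That translation can be approximated at runtime today. A rough sketch, not the proposed compile-time mechanism: the helper format_from_scope is made up, it handles plain field names only, and it takes the namespace as an explicit mapping.

```python
import string

def format_from_scope(template, scope):
    """Fill {name} fields in template from the given mapping of names."""
    names = {field for _, field, _, _ in string.Formatter().parse(template)
             if field}
    return template.format(**{name: scope[name] for name in names})

item, amount, units = "widget", 3, "dollars"
text = format_from_scope("The {item} cost {amount} {units}",
                         {"item": item, "amount": amount, "units": units})
print(text)
```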

It seemed relevant in this context, as a compile time AST
transformation would let folks define their own pseudo-literals. Since
marshal wouldn't know how to handle them, the AST produced at compile
time would still need to be for a runtime constructor call rather than
for a value to be stored in co_consts. These cases:

    d = !decimal(1.2)
    r = !rational(22/7)

Might simply translate directly to the following as the runtime code:

    d = decimal.Decimal("1.2")
    r = fractions.Fraction(22, 7)

With the difference being that the validity of the passed in string
would be checked at compile time rather than at runtime, so you could
only use it for literal values, not to construct values from

As far as registration goes, yes, there'd need to be a way to hook the
compiler to notify it of the existence of these compile time AST
generation functions. Dave Malcolm's patch to allow parts of the
compiler to be written in Python rather than C
( ) might be an interesting place to
start on that front.
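
As a very rough approximation of the !decimal case with what exists today, an ast.NodeTransformer can validate the text before the code runs and rewrite it into a runtime constructor call. Everything named here is an assumption for the sketch: the call spelling bang_decimal("...") stands in for the hypothetical !decimal(...) syntax, and compile_with_decimal_literals is a made-up driver, not a real compiler hook.

```python
import ast
import decimal

class DecimalLiteralRewriter(ast.NodeTransformer):
    """Rewrite bang_decimal("1.2") into decimal.Decimal("1.2")."""

    def visit_Call(self, node):
        self.generic_visit(node)
        is_marker = (isinstance(node.func, ast.Name)
                     and node.func.id == "bang_decimal"
                     and len(node.args) == 1 and not node.keywords
                     and isinstance(node.args[0], ast.Constant)
                     and isinstance(node.args[0].value, str))
        if not is_marker:
            return node
        text = node.args[0].value
        try:
            decimal.Decimal(text)               # validate before runtime
        except decimal.InvalidOperation:
            raise SyntaxError("invalid decimal literal: %r" % text) from None
        call = ast.Call(
            func=ast.Attribute(value=ast.Name(id="decimal", ctx=ast.Load()),
                               attr="Decimal", ctx=ast.Load()),
            args=[ast.Constant(value=text)], keywords=[])
        return ast.copy_location(call, node)

def compile_with_decimal_literals(source):
    tree = DecimalLiteralRewriter().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return compile(tree, "<sketch>", "exec")

namespace = {"decimal": decimal}
exec(compile_with_decimal_literals('d = bang_decimal("1.2")'), namespace)
print(namespace["d"])
```

A malformed literal such as bang_decimal("1.2.3") is rejected by compile_with_decimal_literals before any of the code executes, which is the difference from a plain runtime call.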


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Fri Jun  5 09:10:56 2015
From: ncoghlan at (Nick Coghlan)
Date: Fri, 5 Jun 2015 17:10:56 +1000
Subject: [Python-ideas] Better Type Hinting
In-Reply-To: <>
References: <>
Message-ID: <>

On 5 June 2015 at 16:19, Stephen Hansen <me+python at> wrote:
> On Thu, Jun 4, 2015, at 11:01 PM, Andrew Barnert via Python-ideas wrote:
>> On Jun 4, 2015, at 22:36, Thomas Güttler <guettliml at>
>> wrote:
>> >
>> > It would be great to have better type hinting in IDEs.
>> Is PEP 484 not sufficient for this purpose?
> It's really not.
> For one thing, PEP 484 isn't going to result in the standard library
> being hinted all up (though I assume someone may make stubs).

Doing exactly that is a core part of the PEP 484 effort, since it's
needed to assist in Python 2 -> 3 migrations:

One of the advantages of that is that more specific signatures can be
added to stdlib stubs and benefit IDEs in existing Python versions,
rather than having to wait for more explicit signatures in future
Python versions.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From guettliml at  Fri Jun  5 09:21:33 2015
From: guettliml at (Thomas Güttler)
Date: Fri, 05 Jun 2015 09:21:33 +0200
Subject: [Python-ideas] Better Type Hinting
In-Reply-To: <>
References: <>
Message-ID: <>

On 05.06.2015 at 08:19, Stephen Hansen wrote:
> On Thu, Jun 4, 2015, at 11:01 PM, Andrew Barnert via Python-ideas wrote:
>> On Jun 4, 2015, at 22:36, Thomas Güttler <guettliml at>
>> wrote:
>>> It would be great to have better type hinting in IDEs.
>> Is PEP 484 not sufficient for this purpose?
> It's really not.
> For one thing, PEP 484 isn't going to result in the standard library
> being hinted all up (though I assume someone may make stubs). But
> really, the specific issue that the OP is running into is because of the
> signature of logging.warn --  msg, *args, **kwargs.

I am using logger.warn() not logging.warn().

The question is: How to know which kind of duck "logger" is?

"logger" was created by "logging.getLogger(__name__)"

It is not the question how to implement better guessing in the IDE.

The basics need to be solved. Everything else is "toilet paper programming"
(Ah, smell inside, ... let's write a wrapper ...)

   Thomas Güttler

From cory at  Fri Jun  5 09:36:18 2015
From: cory at (Cory Benfield)
Date: Fri, 5 Jun 2015 08:36:18 +0100
Subject: [Python-ideas] Better Type Hinting
In-Reply-To: <>
References: <>
Message-ID: <>

> On 5 Jun 2015, at 08:21, Thomas Güttler <guettliml at> wrote:
> I am using logger.warn() not logging.warn().
> The question is: How to know which kind of duck "logger" is?
> "logger" was created by "logging.getLogger(__name__)"
> It is not the question how to implement better guessing in the IDE.
> The basics need to be solved. Everything else is "toilet paper programming"
> (Ah, smell inside, ... let's write a wrapper ...)

This question is unanswerable unless you actually execute the code at runtime under the exact same conditions as you expect to encounter it.

Because Python allows for monkey patching at runtime by any other code running in the process you can make no assumptions about what kind of duck that will be. Even without monkey patching you can't know, because someone may have adjusted sys.path ahead of time, causing you to import an entirely unexpected module called "logging".
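
A small sketch of the monkey-patching case. The Impostor class is invented for the demonstration, and the original function is restored afterwards; the same call expression yields a different kind of duck depending on what ran earlier in the process.

```python
import logging

class Impostor:
    """Made-up object that merely quacks like a logger."""
    def warn(self, msg):
        return "quack: " + msg

original = logging.getLogger
logging.getLogger = lambda name=None: Impostor()    # some other code did this
try:
    logger = logging.getLogger(__name__)
    kind = type(logger).__name__
finally:
    logging.getLogger = original                    # undo the patch

print(kind)
```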

In certain specialised cases, if you limit yourself to special rules, you *might* be able to statically assert the type of this object, but in the general case it simply cannot be done.

So the only way to improve this is to implement better guessing in the IDE. Hence the improvements that were proposed to you.
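
A minimal sketch of the monkey-patching point above. `FakeLogger` and `patched_get_logger` are made-up names used only for illustration; the sketch just shows that the object behind the name `logger` depends on what has run earlier in the process:

```python
import logging

class FakeLogger:
    """Hypothetical stand-in that shares no base class with logging.Logger."""
    def warn(self, *args, **kwargs):
        return "not a real logger"

original_get_logger = logging.getLogger

def patched_get_logger(name=None):
    # Any code running in the same process could install a replacement like this.
    return FakeLogger()

logging.getLogger = patched_get_logger
try:
    logger = logging.getLogger(__name__)
    patched_type = type(logger).__name__
finally:
    logging.getLogger = original_get_logger  # always undo the patch

restored = logging.getLogger("example")
print(patched_type, isinstance(restored, logging.Logger))
```

The source line `logger = logging.getLogger(__name__)` is identical in both cases, which is why a tool that only reads the source has to guess.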

From me+python at  Fri Jun  5 09:41:37 2015
From: me+python at (Stephen Hansen)
Date: Fri, 05 Jun 2015 00:41:37 -0700
Subject: [Python-ideas] Better Type Hinting
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jun 5, 2015, at 12:10 AM, Nick Coghlan wrote:
> On 5 June 2015 at 16:19, Stephen Hansen <me+python at> wrote:
> > On Thu, Jun 4, 2015, at 11:01 PM, Andrew Barnert via Python-ideas wrote:
> >> On Jun 4, 2015, at 22:36, Thomas Güttler <guettliml at>
> >> wrote:
> >> >
> >> > It would be great to have better type hinting in IDEs.
> >>
> >> Is PEP 484 not sufficient for this purpose?
> >
> > It's really not.
> >
> > For one thing, PEP 484 isn't going to result in the standard library
> > being hinted all up (though I assume someone may make stubs).
> Doing exactly that is a core part of the PEP 484 effort, since it's
> needed to assist in Python 2 -> 3 migrations:

How so? Nothing in PEP 484 addresses signatures which take *args and
**kwargs arguments. Please, correct me if I'm wrong, but my
understanding is you can uselessly specify types for a Mapping, but you
can't specify what actual keys are valid in that mapping. That's the
problem with logging's specification, its functions take "anything" and
pass on "anything", in their signature... In reality they take up to two
keyword arguments -- exc_info and extra. Args really is "anything" as
it's just formatted against the string.

Unless I'm missing something, PEP484 only allows defining *types* of
specific arguments -- but this isn't about types. This is about what
arguments are valid (and then, after you have that bit of knowing, what
types come next). When used with API's that take *args and **kwargs, I
don't see how PEP484 is useful at all. 

I'm not arguing against PEP 484, but it has nothing at all to do with the
specific problem mentioned here.

Dynamic API's that take "any args" and "any kwargs" are opaque things it
doesn't tell anything about. Logger.warn (and, logging.warn) is one such

On Fri, Jun 5, 2015, at 12:21 AM, Thomas Güttler wrote:
> Am 05.06.2015 um 08:19 schrieb Stephen Hansen:
> > On Thu, Jun 4, 2015, at 11:01 PM, Andrew Barnert via Python-ideas wrote:
> >> On Jun 4, 2015, at 22:36, Thomas Güttler <guettliml at>
> >> wrote:
> >>>
> >>> It would be great to have better type hinting in IDEs.
> >>
> >> Is PEP 484 not sufficient for this purpose?
> >
> > It's really not.
> >
> > For one thing, PEP 484 isn't going to result in the standard library
> > being hinted all up (though I assume someone may make stubs). But
> > really, the specific issue that the OP is running into is because of the
> > signature of logging.warn --  msg, *args, **kwargs.
> I am using logger.warn() not logging.warn().

Same difference. logging.warn is just a thin wrapper around the root
logger's warn(). They still have the same completely opaque signature,
of msg, *args, **kwargs that it passes along, which is why your IDE
can't report a useful signature that tells you that exc_info=True is
what you want. 
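
The opaque signature can be inspected directly. This is only a sketch: it uses `Logger.warning` because `Logger.warn` is a deprecated alias, and it reports whatever the installed CPython declares:

```python
import inspect
import logging

# Ask the interpreter what the method declares; this is all an IDE can read
# from the function object itself.
signature = inspect.signature(logging.Logger.warning)
kinds = {name: param.kind.name for name, param in signature.parameters.items()}

print(signature)
print(kinds)
print("exc_info" in signature.parameters)
```

The keyword `exc_info` is accepted at call time, but it does not appear as a named parameter, so there is nothing for a signature popup to show.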


From abarnert at  Fri Jun  5 10:38:32 2015
From: abarnert at (Andrew Barnert)
Date: Fri, 5 Jun 2015 01:38:32 -0700
Subject: [Python-ideas] Better Type Hinting
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 5, 2015, at 00:21, Thomas Güttler <guettliml at> wrote:
>> Am 05.06.2015 um 08:19 schrieb Stephen Hansen:
>>> On Thu, Jun 4, 2015, at 11:01 PM, Andrew Barnert via Python-ideas wrote:
>>> On Jun 4, 2015, at 22:36, Thomas Güttler <guettliml at>
>>> wrote:
>>>> It would be great to have better type hinting in IDEs.
>>> Is PEP 484 not sufficient for this purpose?
>> It's really not.
>> For one thing, PEP 484 isn't going to result in the standard library
>> being hinted all up (though I assume someone may make stubs). But
>> really, the specific issue that the OP is running into is because of the
>> signature of logging.warn --  msg, *args, **kwargs.
> I am using logger.warn() not logging.warn().
> The question is: How to know which kind of duck "logger" is?

That is _exactly_ what PEP 484 addresses.

If `logging.getLogger` is annotated or stubbed to specify that it returns a `logging.Logger` (which it will be), then a static type checker (whether MyPy or a competing checker or custom code in the IDE) can trivially infer that `logger` is a `logging.Logger`.

If you needed to annotate exactly which subclass of `Logger` was returned (unlikely, but not impossible--maybe you conditionally do a `logging.setLoggerClass`, and _you_ know what the type is going to be even though a static analyzer can't infer it, and your subclass has a different API than the base class), then you can use a variable type comment.

As Stephen Hansen argues, that may still not solve all of your problems. But it definitely does solve the "how to know which kind of duck" problem you're asking about.
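
A small sketch of the two mechanisms just described. `get_logger` is a hypothetical annotated wrapper standing in for a stubbed `logging.getLogger`; the last assignment shows the variable type comment form:

```python
import logging
from typing import get_type_hints

def get_logger(name: str) -> logging.Logger:
    """Hypothetical annotated wrapper; a stub would describe getLogger itself."""
    return logging.getLogger(name)

# A static checker can infer logging.Logger here from the return annotation.
logger = get_logger(__name__)

# Variable type comment, for when only the programmer knows the type.
special = logging.getLogger("special")  # type: logging.Logger

hints = get_type_hints(get_logger)
print(hints["return"])
```

Nothing here changes runtime behaviour; the annotation and the comment are only information for tools.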

> "logger" was created by "logging.getLogger(__name__)"
> The question is not how to implement better guessing in the IDE.
> The basics need to be solved. Everything else is "toilet paper programming"

Have you read PEP 484? What part of the basics do you think it's not solving? Because it sounds an awful lot like you're just demanding that someone write something exactly like PEP 484.

From abarnert at  Fri Jun  5 10:47:43 2015
From: abarnert at (Andrew Barnert)
Date: Fri, 5 Jun 2015 01:47:43 -0700
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 5, 2015, at 00:06, Nick Coghlan <ncoghlan at> wrote:
>> On 5 June 2015 at 09:03, Andrew Barnert <abarnert at> wrote:
>>> On Jun 4, 2015, at 06:48, Nick Coghlan <ncoghlan at> wrote:
>>>> On 4 June 2015 at 23:06, Paul Moore <p.f.moore at> wrote:
>>>> As a straw man how about a new syntax (this won't work as written,
>>>> because it'll clash with the "<" operator, but the basic idea works):
>>>>   LITERAL_CALL = PRIMARY "<" <any source character except right
>>>> angle bracket>* ">"
>>> The main idea I've had for compile time metaprogramming that I figured
>>> I might be able to persuade Guido not to hate is:
>>>  python_ast, names2cells, unbound_names =
>>> !(this_is_an_arbitrary_python_expression)
>>> As suggested by the assignment target names, the default behaviour
>>> would be to compile the expression to a Python AST, and then at
>>> runtime provide some relevant information about the name bindings
>>> referenced from it. (I haven't even attempted to implement this,
>>> although I've suggested it to some of the SciPy folks as an idea they
>>> might want to explore to make R style lazy evaluation easier)
>>> By using the prefix+delimiters notation, it would become possible to
>>> later have variants that were similarly transparent to the compiler,
>>> but *called* a suitably registered callable at compile time to do the
>>> conversion to runtime Python objects. For example:
>>>  !sh(shell command)
>>>  !format(format string with implicit interpolation)
>>>  !sql(SQL query)
>>> So for custom numeric types, you could register:
>>>   d = !decimal(1.2)
>>>   r = !rational(22/7)
>> But what would that get you?
>> If it's meant to be a "compile-time decimal value"... What kind of value is that? What ends up in your co_consts? An instance of decimal.Decimal? How does that get marshaled?
>> Also, what's the point of it being compile-time? Unless there's some way to write arbitrary code that operates at compile time (like Lisp special forms, or C++ constexpr functions), what code is going to care about the difference between a compile-time decimal value and a run-time decimal value?
>> Also, where and how do you define sh, decimal, sql, etc.? I'm having a hard time seeing how you have any different options than my proposal does. You could have a function named bang_decimal that's looked up normally, or some way to register_bang_function('decimal', my_decimal_parser), or any of the other options mentioned in this thread, but what's the difference (other than there being a default "no-name" function that does an AST parse and name binding, which doesn't really seem related to any of the non-default examples)?
> The larger idea (again, keeping in mind I haven't actually fully
> thought through how to implement this) is to give the parsers access
> to the surrounding namespace, which means that the compiler needs to
> be made aware of any *actual* name references, and the *way* names are
> referenced would be parser dependent (shell variables, format string
> interpolation, SQL interpolation, etc).
> So, for example:
>    print(!format(The {item} cost {amount} {units}))
> Would roughly translate to:
>    print("The {item} cost {amount} {units}".format(item=item,
> amount=amount, units=units))
> It seemed relevant in this context, as a compile time AST
> transformation would let folks define their own pseudo-literals. Since
> marshal wouldn't know how to handle them, the AST produced at compile
> time would still need to be for a runtime constructor call rather than
> for a value to be stored in co_consts. These cases:
>    d = !decimal(1.2)
>    r = !rational(22/7)
> Might simply translate directly to the following as the runtime code:
>    d = decimal.Decimal("1.2")
>    r = fractions.Fraction(22, 7)
> With the difference being that the validity of the passed in string
> would be checked at compile time rather than at runtime, so you could
> only use it for literal values, not to construct values from
> variables.

Note that, as discussed earlier in this thread, it is far easier to accidentally shadow `decimal` than something like `literal_decimal` or `bang_parser_decimal`, so there's a cost to doing this half-way at compile time, not just a benefit.

Also, a registry is definitely more "magical" than an explicit import: something some other module imported that isn't even visible in this module has changed the way this module is run, and even compiled. Of course that's true for import hooks as well, but I think in the case of import hooks there's really no avoiding the magic; in this case, there is. Obviously explicit vs. implicit isn't the only factor in usability/readability, so it's possible it would be better anyway, but I'm not sure it is.

At any rate, although you haven't shown how you expect these functions to be implemented, I think this proposal ends up being roughly equivalent to mine. Sure, the `bang_parser_decimal` function can compile the source to an AST and look up names in some way, but `literal_decimal` can do that too. And presumably whatever helper functions you were imagining to make that easier could still be written. So it's ultimately just bikeshedding the syntax, and whether you use a registry vs. normal lookup.
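
To make the `!format(...)` translation quoted above concrete, here is a runtime stand-in. `expand_format` is a made-up helper, not part of either proposal, and it only handles plain names such as `{item}` (no attribute or index lookups):

```python
import string

def expand_format(template, namespace):
    """Hypothetical runtime equivalent of the proposed !format(...) form."""
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion).
    names = [field for _, field, _, _ in string.Formatter().parse(template) if field]
    return template.format(**{name: namespace[name] for name in names})

values = {"item": "widget", "amount": 3, "units": "USD"}
message = expand_format("The {item} cost {amount} {units}", values)
print(message)
```

In the proposal the name lookup would be resolved by the compiler against the surrounding scope; the explicit `namespace` argument here is the part a compile-time hook would remove.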

> As far as registration goes, yes, there'd need to be a way to hook the
> compiler to notify it of the existence of these compile time AST
> generation functions. Dave Malcolm's patch to allow parts of the
> compiler to be written in Python rather than C
> ( ) might be an interesting place to
> start on that front.
> Cheers,
> Nick.
> -- 
> Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From abarnert at  Fri Jun  5 11:29:43 2015
From: abarnert at (Andrew Barnert)
Date: Fri, 5 Jun 2015 02:29:43 -0700
Subject: [Python-ideas] Hooking between lexer and parser
Message-ID: <>

Compiling a module has four steps:

 * bytes->str (based on encoding declaration or default)
 * str->token stream
 * token stream->AST
 * AST->bytecode

You can very easily hook at every point in that process except the token stream.
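
The four steps can be walked through with the standard library, as a rough sketch. Note that step 3 is fed the text again rather than the tokens from step 2, which is exactly the missing hook:

```python
import ast
import io
import tokenize

raw = b"x = 1 + 2\n"

# 1. bytes -> str (encoding declaration or the UTF-8 default)
encoding, _ = tokenize.detect_encoding(io.BytesIO(raw).readline)
text = raw.decode(encoding)

# 2. str -> token stream (inspectable, but nothing downstream accepts it)
tokens = list(tokenize.generate_tokens(io.StringIO(text).readline))

# 3. str -> AST; ast.parse takes source text, not the tokens from step 2
tree = ast.parse(text)

# 4. AST -> bytecode
code = compile(tree, "<example>", "exec")

namespace = {}
exec(code, namespace)
print(encoding, len(tokens), type(tree).__name__, namespace["x"])
```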

There _is_ a workaround: re-encode the text to bytes, wrap it in a BytesIO, call tokenize, munge the token stream, call untokenize, re-decode back to text, then pass that to compile or ast.parse. But, besides being a bit verbose and painful, that means your line and column numbers get screwed up. So, while it's fine for a quick&dirty toy like my user-literal-hack, it's not something you'd want to do in a real import hook for use in real code.
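
A minimal sketch of that workaround. `replace_name` and `rewrite` are illustrative names, and the munging step (swapping the name `answer` for the number `42`) is a toy, not what user-literal-hack does:

```python
import io
import tokenize

def replace_name(tokens, old, new):
    """Yield (type, string) pairs, swapping NAME `old` for NUMBER `new`."""
    for tok in tokens:
        if tok.type == tokenize.NAME and tok.string == old:
            yield tokenize.NUMBER, new
        else:
            yield tok.type, tok.string

def rewrite(source):
    data = source.encode("utf-8")                           # re-encode to bytes
    tokens = tokenize.tokenize(io.BytesIO(data).readline)   # tokenize
    munged = replace_name(tokens, "answer", "42")           # munge the stream
    return tokenize.untokenize(munged).decode("utf-8")      # untokenize, decode

new_source = rewrite("x = answer + 1\n")
namespace = {}
exec(compile(new_source, "<munged>", "exec"), namespace)
print(repr(new_source), namespace["x"])
```

Because only (type, string) pairs are passed back, untokenize regenerates its own spacing, so the columns in `new_source` no longer match the original text.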

This could be solved by just changing ast.parse to accept an iterable of tokens or tuples as well as a string, and likewise for compile.

That isn't exactly a trivial change, because under the covers the _ast module is written in C, partly auto-generated, and expects as input a CST, which is itself created from a different tokenizer written in C with a similar but different API (since C doesn't have iterators). And adding a PyTokenizer_FromIterable or something seems like it might raise some fun bootstrapping issues that I haven't thought through yet. But I think it ought to be doable without having to reimplement the whole parser in pure Python. And I think it would be worth doing.

While we're at it, a few other (much smaller) changes would be nice:

 * Allow tokenize to take a text file instead of making it take a binary file and repeat the encoding detection.
 * Allow tokenize to take a file instead of its readline method.
 * Allow tokenize to take a str/bytes instead of requiring a file.
 * Add flags to compile to stop at any stage (decoded text, tokens, AST, or bytecode) instead of just the last two.
(The funny thing is that the C tokenizer actually already does support strings and bytes and file objects.)

I realize that doing all of these changes would mean that compile can now get an iterable and not know whether it's a file or a token stream until it tries to iterate it. So maybe that isn't the best API; maybe it's better to explicitly call tokenize, then ast.parse, then compile instead of calling compile repeatedly with different flags.

From stefan at  Fri Jun  5 12:52:24 2015
From: stefan at (s.krah)
Date: Fri, 05 Jun 2015 10:52:24 +0000
Subject: [Python-ideas] Decimal facts
Message-ID: <>

> Also, I'm not entirely sure we want to add those types to decimal in the stdlib. It would be a lot less work to implement them in terms of an existing C implementation (maybe using the native types if they exist, Intel's library if they don't). But I don't think that's necessarily desirable for the stdlib.

Wrong.  The specification we're following is almost IEEE 754-2008.  Mike Cowlishaw
evolved the spec while he was on the IEEE committee.

The Intel library is very slow for decimal128, and last time I looked it did not
promise correct rounding!  That fact was deeply buried in the docs.

As an aside, I'm not sure how serious the "Float Update" thread was,
given that the OP tried to sneak the Grothendieck prime past us.

Stefan Krah


From p.f.moore at  Fri Jun  5 14:09:20 2015
From: p.f.moore at (Paul Moore)
Date: Fri, 5 Jun 2015 13:09:20 +0100
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On 4 June 2015 at 23:31, Nick Coghlan <ncoghlan at> wrote:
>> The fundamental difference between this proposal and mine is (I think)
>> that you're assuming an arbitrary Python expression in there (which is
>> parsed), whereas I'm proposing an *unparsed* string.
> No, when you supplied a custom parser, the parser would have access to the
> raw string (as well as the name -> cell mapping for the current scope).
> The "quoted AST parser" would just be the default one.

Ah, I see now what you meant. Apologies, I'd not fully understood what
you were proposing. In which case yes, your proposal is strictly more
powerful than mine.

You still have the same problem as me, that what's inside !xxx(...)
cannot contain a ")" character. (Or maybe can't contain an unmatched
")", or an unescaped ")", depending on what restrictions you feel like
putting on the form of the unparsed expression...) But I think that's
fundamental to any form of syntax embedding, so it's not exactly a


From p.f.moore at  Fri Jun  5 14:18:43 2015
From: p.f.moore at (Paul Moore)
Date: Fri, 5 Jun 2015 13:18:43 +0100
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On 5 June 2015 at 00:03, Andrew Barnert <abarnert at> wrote:
> If it's meant to be a "compile-time decimal value"... What kind of value is that? What ends up in your co_consts? An instance of decimal.Decimal? How does that get marshaled?

Well, Python bytecode has no way of holding any form of constant
Decimal value, so if that's what you want you need a change to the
bytecode (and hence the interpreter). I'm not sure how that qualifies
as "user-defined".
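
The limitation is easy to observe with the marshal module, which is what serialises code object constants. A short sketch:

```python
import decimal
import marshal

float_bytes = marshal.dumps(1.2)  # floats are a supported constant type

try:
    marshal.dumps(decimal.Decimal("1.2"))
    decimal_supported = True
except ValueError:
    # marshal rejects types it does not know how to serialise
    decimal_supported = False

print(len(float_bytes) > 0, decimal_supported)
```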

We seem to be talking at cross purposes here. The questions you're
asking are ones I would direct at you (assuming it's you that's after
a compile-time value, I'm completely lost as to who is arguing for
what any more :-() My position is that "compile-time" user-defined
literals don't make sense in Python, what people actually want is
probably more along the lines of "better syntax for writing constant
values of user-defined types".

Oh, and just as a point of reference see - C++ user
defined literals translate into a *runtime* function call. So even
static languages don't work the way you suggest in the comment above.


From random832 at  Fri Jun  5 14:28:46 2015
From: random832 at (random832 at
Date: Fri, 05 Jun 2015 08:28:46 -0400
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Jun 4, 2015, at 22:11, Yury Selivanov wrote:
> I know. Although you can't mutate the inner-value of Number or
> String objects, you can only attach properties.

You can shadow valueOf, which gets you close enough for many purposes.

From encukou at  Fri Jun  5 14:30:06 2015
From: encukou at (Petr Viktorin)
Date: Fri, 5 Jun 2015 14:30:06 +0200
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jun 5, 2015 at 2:09 PM, Paul Moore <p.f.moore at> wrote:
> On 4 June 2015 at 23:31, Nick Coghlan <ncoghlan at> wrote:
>>> The fundamental difference between this proposal and mine is (I think)
>>> that you're assuming an arbitrary Python expression in there (which is
>>> parsed), whereas I'm proposing an *unparsed* string.
>> No, when you supplied a custom parser, the parser would have access to the
>> raw string (as well as the name -> cell mapping for the current scope).
>> The "quoted AST parser" would just be the default one.
> Ah, I see now what you meant. Apologies, I'd not fully understood what
> you were proposing. In which case yes, your proposal is strictly more
> powerful than mine.
> You still have the same problem as me, that what's inside !xxx(...)
> cannot contain a ")" character. (Or maybe can't contain an unmatched
> ")", or an unescaped ")", depending on what restrictions you feel like
> putting on the form of the unparsed expression...) But I think that's
> fundamental to any form of syntax embedding, so it's not exactly a
> showstopper.

Parsing consumes tokens. The tokenizer already tracks parentheses (for
ignoring indentation between them), so unmatched parens would throw off
the tokenizer itself.
It'd be reasonable to require !macros to only contain valid Python
tokens, and have matched parentheses tokens (i.e. ignoring parens in
comments/string literals.)
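
A quick sketch of the paren tracking, using the pure-Python interface to the tokenizer. `token_names` is just a helper for this example:

```python
import io
import tokenize

def token_names(source):
    """Return the token type names the stdlib tokenizer produces for source."""
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type] for tok in tokenize.generate_tokens(readline)]

# Ragged indentation inside parentheses produces NL tokens but no INDENT.
inside_parens = token_names("f(\n    1,\n        2)\n")

# The same kind of indentation in a block does produce INDENT.
block = token_names("if x:\n    y\n")

print("INDENT" in inside_parens, "INDENT" in block)
```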

From stefan_ml at  Fri Jun  5 15:11:47 2015
From: stefan_ml at (Stefan Behnel)
Date: Fri, 05 Jun 2015 15:11:47 +0200
Subject: [Python-ideas] Better Type Hinting
In-Reply-To: <>
References: <>
Message-ID: <mks76j$uun$>

Cory Benfield schrieb am 05.06.2015 um 09:36:
>> On 5 Jun 2015, at 08:21, Thomas Güttler wrote:
>> I am using logger.warn() not logging.warn().
>> The question is: How to know which kind of duck "logger" is?
>> "logger" was created by "logging.getLogger(__name__)"
>> The question is not how to implement better guessing in the IDE.
>> The basics need to be solved. Everything else is "toilet paper
>> programming" (Ah, smell inside, ... let's write a wrapper ...)
> This question is unanswerable unless you actually execute the code at
> runtime under the exact same conditions as you expect to encounter it.

That doesn't mean that it's impossible to find enough type information to
make an IDE present something helpful to a user. In all interesting cases,
the object returned by logging.getLogger() will be a logger instance with a
well-known interface, and tools can just know that.

Tools like Jedi and PyCharm show that this is definitely possible.


From guettliml at  Fri Jun  5 15:32:36 2015
From: guettliml at (=?UTF-8?B?VGhvbWFzIEfDvHR0bGVy?=)
Date: Fri, 05 Jun 2015 15:32:36 +0200
Subject: [Python-ideas] Better Type Hinting
In-Reply-To: <>
References: <>
Message-ID: <>

Am 05.06.2015 um 10:38 schrieb Andrew Barnert:
> On Jun 5, 2015, at 00:21, Thomas Güttler <guettliml at> wrote:
>>> Am 05.06.2015 um 08:19 schrieb Stephen Hansen:
>>>> On Thu, Jun 4, 2015, at 11:01 PM, Andrew Barnert via Python-ideas wrote:
>>>> On Jun 4, 2015, at 22:36, Thomas Güttler <guettliml at>
>>>> wrote:
>>>>> It would be great to have better type hinting in IDEs.
>>>> Is PEP 484 not sufficient for this purpose?
>>> It's really not.
>>> For one thing, PEP 484 isn't going to result in the standard library
>>> being hinted all up (though I assume someone may make stubs). But
>>> really, the specific issue that the OP is running into is because of the
>>> signature of logging.warn --  msg, *args, **kwargs.
>> I am using logger.warn() not logging.warn().
>> The question is: How to know which kind of duck "logger" is?
> That is _exactly_ what PEP 484 addresses.

Now I read it and it does exactly what I was looking for.

> If `logging.getLogger` is annotated or stubbed to specify that it returns a `logging.Logger` (which it will be),
> then a static type checker (whether MyPy or a competing checker or custom code in the IDE)
> can trivially infer that `logger` is a `logging.Logger`.

Unfortunately we still use Python2.7, but maybe it is time for change ...

Just one thing left:

> **If** `logging.getLogger` is annotated ....

What is the policy of the standard library? Will there be type hints
for methods like logging.getLogger() in the standard library in the future?

Since it is quite easy to add them, will patches be accepted?

   Thomas Güttler

From ncoghlan at  Fri Jun  5 16:53:27 2015
From: ncoghlan at (Nick Coghlan)
Date: Sat, 6 Jun 2015 00:53:27 +1000
Subject: [Python-ideas] Better Type Hinting
In-Reply-To: <>
References: <>
Message-ID: <>

On 5 Jun 2015 23:34, "Thomas Güttler" <guettliml at> wrote:
> Am 05.06.2015 um 10:38 schrieb Andrew Barnert:
>> On Jun 5, 2015, at 00:21, Thomas Güttler <guettliml at> wrote:
>>>> Am 05.06.2015 um 08:19 schrieb Stephen Hansen:
>>>>> On Thu, Jun 4, 2015, at 11:01 PM, Andrew Barnert via Python-ideas wrote:
>>>>> On Jun 4, 2015, at 22:36, Thomas Güttler <guettliml at>
>>>>> wrote:
>>>>>> It would be great to have better type hinting in IDEs.
>>>>> Is PEP 484 not sufficient for this purpose?
>>>> It's really not.
>>>> For one thing, PEP 484 isn't going to result in the standard library
>>>> being hinted all up (though I assume someone may make stubs). But
>>>> really, the specific issue that the OP is running into is because of the
>>>> signature of logging.warn --  msg, *args, **kwargs.
>>> I am using logger.warn() not logging.warn().
>>> The question is: How to know which kind of duck "logger" is?
>> That is _exactly_ what PEP 484 addresses.
> Now I read it and it does exactly what I was looking for.

>> If `logging.getLogger` is annotated or stubbed to specify that it
>> returns a `logging.Logger` (which it will be), then a static type
>> checker (whether MyPy or a competing checker or custom code in the IDE)
>> can trivially infer that `logger` is a `logging.Logger`.
> Unfortunately we still use Python2.7, but maybe it is time for change ...

The typeshed project provides stubs for both Python 2 & 3.

Type hinting your own code where appropriate would be easier in Python 3
(since you can use inline type hints)

> Just one thing left:
> > **If** `logging.getLogger` is annotated ....
> What is the policy of the standard library? Will there be type hints
> for methods like logging.getLogger() in the standard library in the
> future?

The standard library won't be getting native annotations any time soon, but
the typeshed annotations are expected to fill the gap.

> Since it is quite easy to add them, will patches be accepted?

Contributions to the typeshed stubs would be preferable.


From abarnert at  Fri Jun  5 17:45:52 2015
From: abarnert at (Andrew Barnert)
Date: Fri, 5 Jun 2015 08:45:52 -0700
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 5, 2015, at 05:18, Paul Moore <p.f.moore at> wrote:
>> On 5 June 2015 at 00:03, Andrew Barnert <abarnert at> wrote:
>> If it's meant to be a "compile-time decimal value"... What kind of value is that? What ends up in your co_consts? An instance of decimal.Decimal? How does that get marshaled?
> Well, Python bytecode has no way of holding any form of constant
> Decimal value, so if that's what you want you need a change to the
> bytecode (and hence the interpreter). I'm not sure how that qualifies
> as "user-defined".

That's the point I was making. Nick proposed this syntax in reply to a message where I said that being a compile-time value is both irrelevant and impossible, so I thought he was claiming that this syntax somehow solved that problem where mine didn't.

> We seem to be talking at cross purposes here. The questions you're
> asking are ones I would direct at you (assuming it's you that's after
> a compile-time value, I'm completely lost as to who is arguing for
> what any more :-() My position is that "compile-time" user-defined
> literals don't make sense in Python, what people actually want is
> probably more along the lines of "better syntax for writing constant
> values of user-defined types".

Be careful of that word "constant". Python doesn't really have a distinction between constant and non-constant values. There are values of immutable and mutable types, and there are read-only attributes and members of immutable collections, but there's no such thing as a constant list value or a non-constant decimal value. So people can't be asking to create constant decimal values when they ask for literal decimal values.

So, what does "literal" mean, if it's neither the same thing as "compile-time" nor the same thing as "constant" but just happens to overlap those perfectly in the simplest cases? Well, I think the sense in which these things should "act like literals" is intuitively obvious, but very hard to nail down precisely. Hence the intentionally vague "sufficiently simple" definition I gave. But it doesn't _need_ to be nailed down precisely, because a proposal can be precise, and you can then check it against the cases people intuitively want, and see if they do the right thing.

Notice that the C++ committee didn't start out by trying to define "literal" so they could define "user-defined literal"; they started with a vague notion that 1.2d could be a literal in the same sense that 0x1F is, came up with a proposal for that, hashed out that proposal through a series of revisions, translated the proposal into standardese, and then pointed at it and defined "literal" in terms of that. They could have instead decided "You know what, we don't like the term 'literal' for this after all" and called it something different in the final standard, and it still would have served the same needs, and I'm fine if people want to take that tack with Python. A name isn't meaningless, but it's not the most important part of the meaning; the semantics of the feature and the idiomatic uses of it are what matter.

> Oh, and just as a point of reference see
> - C++ user
> defined literals translate into a *runtime* function call.

No, if you define the operator constexpr, and it returns a value constructed with a constexpr constructor, 1.2d is a compile-time value that can be used in further compile-time computation.

That's the point I made earlier in the thread: the notion of "compile-time value" only really makes sense if you have a notion of "compile-time computation"; otherwise, it's irrelevant to any (non-reflective) computation. Therefore, the fact that my proposal leaves that part out of the C++ feature doesn't matter.

(Of course Python doesn't quite have _no_ compile-time computation; it has optional constant folding. But if you try to build on top of that without biting the bullet and just declaring the whole language accessible at compile time, you end up with the mess that was C++03, where compile-time code is slow, clumsy, and completely different from runtime code, which is a large part of why we have C++11, and also why we have D and various other languages. I don't think Python should add _anything_ new at compile time. You can always simulate compile time with import time, where the full language is available, so there's no compelling reason to make the same mistake C++ did.)
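
The optional constant folding mentioned above can be seen in CPython's code objects, and the import-time alternative is just an ordinary module-level assignment. This is a sketch of a CPython implementation detail; other implementations need not fold anything, and `ONE_POINT_TWO` is an illustrative name:

```python
from decimal import Decimal

# CPython typically folds 2 * 3 at compile time, leaving 6 in the constants.
code = compile("x = 2 * 3", "<folding>", "exec")
folded = 6 in code.co_consts

# "Simulating compile time with import time": the value is computed once,
# when the module body runs, with the full language available.
ONE_POINT_TWO = Decimal("1.2")

print(folded, ONE_POINT_TWO)
```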

From p.f.moore at  Fri Jun  5 17:55:40 2015
From: p.f.moore at (Paul Moore)
Date: Fri, 5 Jun 2015 16:55:40 +0100
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On 5 June 2015 at 16:45, Andrew Barnert <abarnert at> wrote:
> So, what does "literal" mean, if it's neither the same thing as "compile-time" nor the same thing as "constant" but just happens to overlap those perfectly in the simplest cases? Well, I think the sense in which these things should "act like literals" is intuitively obvious, but very hard to nail down precisely. Hence the intentionally vague "sufficiently simple" definition I gave. But it doesn't _need_ to be nailed down precisely, because a proposal can be precise, and you can then check it against the cases people intuitively want, and see if they do the right thing.

OK, my apologies, we're basically agreeing violently, then.

IMO, people typically *actually* want a nicer syntax for Decimal
values known at source-code-writing time. They probably don't actually
really think much about whether the value could be affected by
monkeypatching, or runtime changes, because they won't actually do
that in practice. So just documenting a clear, sane and suitably
Pythonic behaviour should be fine in practice (it won't stop the
bikeshedding of course :-)) And "it's the same as Decimal('1.2')" is
likely to be sufficiently clear, sane and Pythonic, even if it isn't
actually a "literal" in any real sense. That's certainly true for me -
I'd be happy with a syntax that worked like that.
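
For reference, this is the behaviour such a syntax would be shorthand for, and why going through a float literal is not a substitute (a small illustration only):

```python
from decimal import Decimal

from_string = Decimal("1.2")  # the value as written in the source
from_float = Decimal(1.2)     # the binary float nearest to 1.2, converted exactly

same = from_string == from_float
exact_sum = from_string + Decimal("0.1") == Decimal("1.3")

print(same, exact_sum)
```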


From abarnert at  Fri Jun  5 18:13:37 2015
From: abarnert at (Andrew Barnert)
Date: Fri, 5 Jun 2015 09:13:37 -0700
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 5, 2015, at 08:55, Paul Moore <p.f.moore at> wrote:
>> On 5 June 2015 at 16:45, Andrew Barnert <abarnert at> wrote:
>> So, what does "literal" mean, if it's neither the same thing as "compile-time" nor the same thing as "constant" but just happens to overlap those perfectly in the simplest cases? Well, I think the sense in which these things should "act like literals" is intuitively obvious, but very hard to nail down precisely. Hence the intentionally vague "sufficiently simple" definition I gave. But it doesn't _need_ to be nailed down precisely, because a proposal can be precise, and you can then check it against the cases people intuitively want, and see if they do the right thing.
> OK, my apologies, we're basically agreeing violently, then.
> IMO, people typically *actually* want a nicer syntax for Decimal
> values known at source-code-writing time. They probably don't actually
> really think much about whether the value could be affected by
> monkeypatching, or runtime changes, because they won't actually do
> that in practice. So just documenting a clear, sane and suitably
> Pythonic behaviour should be fine in practice (it won't stop the
> bikeshedding of course :-)) And "it's the same as Decimal('1.2')" is
> likely to be sufficiently clear, sane and Pythonic, even if it isn't
> actually a "literal" in any real sense. That's certainly true for me -
> I'd be happy with a syntax that worked like that.

Thank you; I think you've just stated exactly my rationale in one paragraph better than all my longer attempts. :)

Well, I think it actually _is_ a literal in some useful sense, but I don't see much point in arguing about that. As long as the syntax and semantics are useful, and the name is something I can remember well enough to search for and tell other people about, I'm happy.

Anyway, the important question for me is whether people want this for any other type than Decimal (or, really, for decimal64, but unfortunately they don't have that option). That's why I created a hacky implementation, so anyone who thinks they have a good use case for fractions or a custom string type* or whatever can play with it and see if the code actually reads well to themselves and others. If it really is only Decimal that people want, we're better off with something specific rather than general.

(* My existing hack doesn't actually handle strings. Once I realized I'd left that out, I was hoping someone would bring it up, so I'd know someone was actually playing with it, at which point I can add it in a one-liner change. But apparently none of the people who downloaded it has actually tried it beyond running the included tests on 1.2d...)

From p.f.moore at  Fri Jun  5 19:42:03 2015
From: p.f.moore at (Paul Moore)
Date: Fri, 5 Jun 2015 18:42:03 +0100
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On 5 June 2015 at 17:13, Andrew Barnert <abarnert at> wrote:
> Anyway, the important question for me is whether people want this for any other type than Decimal

Personally, I don't use decimals enough to care. But I like Nick's
generalised version, and I can easily imagine using that for a number
of things: unevaluated code objects or SQL snippets, for example. I'd
like to be able to use it as a regex literal, as well, but I don't
think it lends itself to that (I suspect a bare regex would choke the
Python lexer far too much).

But yes, the big question is whether it would be used sufficiently to
justify the work. And of course, it'd be Python 3.6+ only, so people
doing single-source code supporting older versions wouldn't be able to
use it for some time anyway. That's a high bar for *any* new syntax,
though, not specific to this.


From mistersheik at  Fri Jun  5 22:38:27 2015
From: mistersheik at (Neil Girdhar)
Date: Fri, 5 Jun 2015 13:38:27 -0700 (PDT)
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

Actually CPython has another step between the AST and the bytecode, which 
validates the AST to block out trees that violate various rules that were 
not easily incorporated into the LL(1) grammar.  This means that when you 
want to change parsing, you have to change: the grammar, the AST library, 
the validation library, and Python's exposed parsing module.

Modern parsers do not separate the grammar from tokenizing, parsing, and 
validation.  All of these are done in one place, which not only simplifies 
changes to the grammar, but also protects you from possible 
inconsistencies.  It was really hard for me when I was making changes to 
the parser to keep my conception of these four things synchronized.

So in my opinion, if you're going to modernize the parsing, then put it all 
together into one simple library that deals with all of it.  It seems like 
what you're suggesting would add complexity, whereas a merged solution 
would simplify the code.  If it's hard to write a fast parser, then 
consider writing a parser generator in Python that generates the C code you 
want.

Best,

Neil

On Friday, June 5, 2015 at 5:30:23 AM UTC-4, Andrew Barnert via 
Python-ideas wrote:
> Compiling a module has four steps: 
>  * bytes->str (based on encoding declaration or default) 
>  * str->token stream 
>  * token stream->AST 
>  * AST->bytecode 
> You can very easily hook at every point in that process except the token 
> stream. 
> There _is_ a workaround: re-encode the text to bytes, wrap it in a 
> BytesIO, call tokenize, munge the token stream, call untokenize, re-decode 
> back to text, then pass that to compile or ast.parse. But, besides being a 
> bit verbose and painful, that means your line and column numbers get 
> screwed up. So, while it's fine for a quick&dirty toy like my 
> user-literal-hack, it's not something you'd want to do in a real import 
> hook for use in real code. 
> This could be solved by just changing ast.parse to accept an iterable of 
> tokens or tuples as well as a string, and likewise for compile. 
> That isn't exactly a trivial change, because under the covers the _ast 
> module is written in C, partly auto-generated, and expects as input a CST, 
> which is itself created from a different tokenizer written in C with a 
> similar but different API (since C doesn't have iterators). And adding a 
> PyTokenizer_FromIterable or something seems like it might raise some fun 
> bootstrapping issues that I haven't thought through yet. But I think it 
> ought to be doable without having to reimplement the whole parser in pure 
> Python. And I think it would be worth doing. 
> While we're at it, a few other (much smaller) changes would be nice: 
>  * Allow tokenize to take a text file instead of making it take a binary 
> file and repeat the encoding detection. 
>  * Allow tokenize to take a file instead of its readline method. 
>  * Allow tokenize to take a str/bytes instead of requiring a file. 
>  * Add flags to compile to stop at any stage (decoded text, tokens, AST, 
> or bytecode) instead of just the last two. 
> (The funny thing is that the C tokenizer actually already does support 
> strings and bytes and file objects.) 
> I realize that doing all of these changes would mean that compile can now 
> get an iterable and not know whether it's a file or a token stream until it 
> tries to iterate it. So maybe that isn't the best API; maybe it's better to 
> explicitly call tokenize, then ast.parse, then compile instead of calling 
> compile repeatedly with different flags. 

From luciano at  Sat Jun  6 00:55:24 2015
From: luciano at (Luciano Ramalho)
Date: Fri, 5 Jun 2015 19:55:24 -0300
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jun 5, 2015 at 5:38 PM, Neil Girdhar <mistersheik at> wrote:
> Modern parsers do not separate the grammar from tokenizing, parsing, and
> validation.  All of these are done in one place, which not only simplifies
> changes to the grammar, but also protects you from possible inconsistencies.

Hi, Neil, thanks for that!

Having studied only ancient parsers, I'd love to learn new ones. Can
you please post references to modern parsing? Actual parsers, books,
papers, anything you may find valuable.

I have a hunch you're talking about PEG parsers, but maybe something
else, or besides?




Luciano Ramalho
|  Author of Fluent Python (O'Reilly, 2015)
|  Professor at:
|  Twitter: @ramalhoorg

From abarnert at  Sat Jun  6 00:58:07 2015
From: abarnert at (Andrew Barnert)
Date: Fri, 5 Jun 2015 15:58:07 -0700
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 5, 2015, at 13:38, Neil Girdhar <mistersheik at> wrote:
> Actually CPython has another step between the AST and the bytecode, which validates the AST to block out trees that violate various rules that were not easily incorporated into the LL(1) grammar.  

Yes, and it also builds CST nodes before building the AST, and there's a step after AST validation and before bytecode generation where the symbol table for the scope is built and LEGB rules are applied. But none of those are things that seem particularly useful to hook. Maybe that's just a failure of imagination, but I've never wanted to do it.
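
For reference, here is a rough sketch of the stages under discussion
(bytes -> str -> tokens -> AST -> bytecode), using only public stdlib
calls. The sample source and variable names are made up for illustration.
Note that step 3 has to start from the text again, because ast.parse has
no entry point that accepts a token stream.

```python
import ast
import io
import tokenize

raw = b"# -*- coding: utf-8 -*-\nanswer = 6 * 7\n"

# 1. bytes -> str, based on the encoding declaration or the default.
encoding, _ = tokenize.detect_encoding(io.BytesIO(raw).readline)
text = raw.decode(encoding)

# 2. str -> token stream (pure-Python tokenizer from the stdlib).
tokens = list(tokenize.generate_tokens(io.StringIO(text).readline))

# 3. text -> AST. The tokens from step 2 cannot be passed in here.
tree = ast.parse(text)

# 4. AST -> bytecode.
code = compile(tree, "<example>", "exec")

namespace = {}
exec(code, namespace)
print(encoding, len(tokens), type(tree).__name__, namespace["answer"])
```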

Hooking the token stream, on the other hand, has pretty obvious uses. For example, in the user-defined literal thread, Paul Moore suggested that for Nick Coghlan's "compile-time expression" idea, requiring valid Python syntax would be way too restrictive, but requiring valid Python tokens is probably OK, and it automatically solves the quoting problem, and it would usually be easier than parsing text. I think he's right about all three parts of that, but unfortunately you can't implement it that way in an import hook because an import hook can't get access to the token stream.

And of course my hack for simulating user-defined literals relies on a workaround to fake hooking the token stream; it would be a whole lot harder without that, while it would be a little easier and a whole lot cleaner if I could just hook the token stream.
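
The workaround (re-encode to bytes, tokenize, munge, untokenize, decode,
compile) could look roughly like the following. This is not the actual
user-literal hack, just a minimal sketch along the lines of the example in
the tokenize docs. `decimalize` is a made-up name, and the transformation
(rewriting float-looking NUMBER tokens as Decimal calls) was chosen only
for illustration.

```python
import io
import tokenize
from decimal import Decimal

def decimalize(source):
    """Rewrite NUMBER tokens containing '.' as Decimal('...') calls."""
    out = []
    readline = io.BytesIO(source.encode("utf-8")).readline
    for tok in tokenize.tokenize(readline):
        if tok.type == tokenize.NUMBER and "." in tok.string:
            out.extend([
                (tokenize.NAME, "Decimal"),
                (tokenize.OP, "("),
                (tokenize.STRING, repr(tok.string)),
                (tokenize.OP, ")"),
            ])
        else:
            out.append((tok.type, tok.string))
    # With 2-tuples, untokenize discards the original positions, which is
    # where the line and column numbers get lost.
    return tokenize.untokenize(out).decode("utf-8")

munged = decimalize("x = 1.1 + 2.2\n")
namespace = {"Decimal": Decimal}
exec(compile(munged, "<munged>", "exec"), namespace)
print(repr(munged), namespace["x"])
```

The munged text compiles and runs, but its spacing no longer matches the
original source, so any error positions reported by the compiler refer to
the regenerated text rather than to what the user wrote.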

> This means that when you want to change parsing, you have to change: the grammar, the AST library, the validation library, and Python's exposed parsing module.

And the code generation and other post-validation steps (unless you're just trying to create a do-nothing construct--which can be useful to give you something new to, e.g., feed to MacroPy, but it's not something you're going to check in to core, or use in your own production code).

So yes, changing the grammar is painful. Which is just one more reason that being able to hack on Python without having to hack on Python is very useful. And as of 3.4, all of the pieces are there to do that, and dead-easy to use, and robust enough for production code--as long as the level you want to hack on is source text, AST, or bytecode, not token stream.
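
As a point of comparison, hooking at the AST level is already easy. The
sketch below uses ast.NodeTransformer; it assumes a recent Python where
numeric literals appear as ast.Constant (in 3.4 they were ast.Num), and
`FloatToDecimal` is a name invented for this example.

```python
import ast
from decimal import Decimal

class FloatToDecimal(ast.NodeTransformer):
    """Replace float constants with Decimal('<repr of the float>') calls."""

    def visit_Constant(self, node):
        if isinstance(node.value, float):
            call = ast.Call(
                func=ast.Name(id="Decimal", ctx=ast.Load()),
                args=[ast.Constant(value=repr(node.value))],
                keywords=[],
            )
            return ast.copy_location(call, node)
        return node

tree = ast.parse("x = 1.1 + 2.2\ny = 1.10\n")
tree = ast.fix_missing_locations(FloatToDecimal().visit(tree))

namespace = {"Decimal": Decimal}
exec(compile(tree, "<ast-hook>", "exec"), namespace)
print(namespace["x"], namespace["y"])
```

By the time the AST exists the literal is already a float, so the original
spelling is gone: `1.10` comes out as Decimal('1.1'), not Decimal('1.10').
That loss is one reason a token-level hook is more attractive for this
kind of transformation.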

> Modern parsers do not separate the grammar from tokenizing, parsing, and validation.  All of these are done in one place, which not only simplifies changes to the grammar, but also protects you from possible inconsistencies.  It was really hard for me when I was making changes to the parser to keep my conception of these four things synchronized.
> So in my opinion, if you're going to modernize the parsing, then put it all together into one simple library that deals with all of it.  It seems like what you're suggesting would add complexity, whereas a merged solution would simplify the code.  

Rewriting the entire parsing mechanism from scratch might simplify things, but it also means rewriting the entire parsing mechanism from scratch. I'm sure you could implement a GLR parser generator that takes a complete declarative grammar and generates something that goes right from source code to a SAX- or iterparse-style pre-validated AST, and that would be a really cool thing. But besides being a lot of work, it would also be a huge amount of risk. You'd almost certainly end up with new bugs, new places where syntax errors are harder to diagnose, new places where compiling is slower than it used to be, etc.

Also, Python is defined as having a separate lexical analysis phase, and a module named tokenize that does the same thing as this phase, and tests that test it, and so on. So, you can't just throw all that out and remain backward compatible.

Meanwhile, adding functions to create a token state struct out of a Python iterable, drive it, and expose that functionality to Python is a lot less work, very unlikely to have any effect on the existing default mechanism (if you don't hook the token stream, the existing code runs the same as today, except for an if check inside the next-token function), and much easier to make exactly compatible with existing behavior even when you do hook the token stream (if creating and driving the token state works, all the other code is the same as it ever was). And it has no backward compat implications.

> If it's hard to write a fast parser, then consider writing a parser generator in Python that generates the C code you want.

It's not that it's hard to write a _fast_ parser in Python, but that it's hard to write a parser that does the exact same thing as Python's own parser (and even more so one that does the exact same thing as whatever version of Python you're running under's parser).

From ncoghlan at  Sat Jun  6 01:31:09 2015
From: ncoghlan at (Nick Coghlan)
Date: Sat, 6 Jun 2015 09:31:09 +1000
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On 6 Jun 2015 01:45, "Andrew Barnert" <abarnert at> wrote:
> On Jun 5, 2015, at 05:18, Paul Moore <p.f.moore at> wrote:
> >
> >> On 5 June 2015 at 00:03, Andrew Barnert <abarnert at> wrote:
> >> If it's meant to be a "compile-time decimal value"... What kind of
value is that? What ends up in your co_consts? An instance of
decimal.Decimal? How does that get marshaled?
> >
> > Well, Python bytecode has no way of holding any form of constant
> > Decimal value, so if that's what you want you need a change to the
> > bytecode (and hence the interpreter). I'm not sure how that qualifies
> > as "user-defined".
> That's the point I was making. Nick proposed this syntax in reply to a
message where I said that being a compile-time value is both irrelevant and
impossible, so I thought he was claiming that this syntax somehow solved
that problem where mine didn't.
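
A quick check of the marshaling point, as a sketch: marshal (the format
used for code object constants in .pyc files) handles a float but refuses
a Decimal instance. `can_marshal` is a helper name made up for this
example.

```python
import marshal
from decimal import Decimal

def can_marshal(value):
    """Return True if marshal.dumps accepts the value."""
    try:
        marshal.dumps(value)
    except ValueError:
        return False
    return True

print(can_marshal(1.2))             # floats are a supported type
print(can_marshal(Decimal('1.2')))  # Decimal instances are not
```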

I was mainly replying to Paul's angle bracket syntax proposal, not
specifically to anything you proposed. The problem I have with your
original suggestion is purely syntactic - I don't *want* user-defined
syntax to look like language-defined syntax, because it makes it too hard
for folks to know where to look things up, and I especially don't want a
suffix like "j" to mean "this is a complex literal" while "k" means "this
is a different way of spelling a normal function call that accepts a single
string argument".

I didn't say anything about my preferred syntactic idea *only* being usable
for a compile time construct, I just only consider it *interesting* if
there's a compile time AST transformation component, as that lets the hook
parse a string and break it down into its component parts to make it
transparent to the compiler, including giving it the ability to influence
the compiler's symbol table construction pass. That extra power above and
beyond a normal function call is what would give the construct its
rationale for requesting new syntax - it would be a genuinely new
capability to integrate "not Python code" with the Python compilation
toolchain, rather than an alternate spelling for existing features.

I've also been pondering the idea of how you'd notify the compiler of such
hooks, since I agree you'd want them declared inline in the module that
used them. For that, I think the idea of a "bang import" construct might
work, where a module level line of the form "from x import !y" would not
only be a normal runtime import of "y", but also allow "!y(implicitly
quoted input)" as a compile time construct.

There'd still be some tricky questions to resolve from a pragmatic
perspective, as you'd likely need a way for the bang import to make
additional runtime data available to the rendered AST produced by the bang
calls, without polluting the module global namespace, but it might suffice
to pass in a cell reference that is then populated at runtime by the bang
import step.

> > We seem to be talking at cross purposes here. The questions you're
> > asking are ones I would direct at you (assuming it's you that's after
> > a compile-time value, I'm completely lost as to who is arguing for
> > what any more :-()

That confusion is likely at least partly my fault - while this thread
provided the name, the bang call concept is one I've been pondering in
various forms (most coherently with some of the folks at SciPy last year)
since the last time we discussed switch statements (and the related "once"
statement), and it goes far beyond just defining pseudo-literals.

I brought it up here, because *as a side-effect*, it would provide
pseudo-literals by way of compile time constructs that didn't have any
variable references in the generated AST (other than constructor

> (Of course Python doesn't quite have _no_ compile-time computation; it
has optional constant folding. But if you try to build on top of that
without biting the bullet and just declaring the whole language accessible
at compile time, you end up with the mess that was C++03, where
compile-time code is slow, clumsy, and completely different from runtime
code, which is a large part of why we have C++11, and also why we have D
and various other languages. I don't think Python should add _anything_ new
at compile time. You can always simulate compile time with import time,
where the full language is available, so there's no compelling reason to
make the same mistake C++ did.)
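
The optional constant folding mentioned here can be observed on CPython
with a sketch like the one below. This is implementation behaviour rather
than a language guarantee, so other implementations (or future CPython
versions) may not show the folded value among the constants.

```python
# Compile a module whose only statement is a constant expression.
code = compile("seconds_per_day = 60 * 60 * 24", "<folding>", "exec")

# On CPython the folded result is typically stored as a constant.
print(code.co_consts)

namespace = {}
exec(code, namespace)
print(namespace["seconds_per_day"])
```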

Updated with the bang import idea to complement the bang calls, my vague
notion would actually involve adding two pieces:

* a compile time hook that lets you influence both the symbol table pass
and the AST generation pass (bang import & bang call working together)
* an import time hook that lets you reliably provide required data (like
references to type constructors and other functions) to the AST generated
in step 1 (probably through bang import populating a cell made available to
the corresponding bang call invocations)


From mistersheik at  Fri Jun  5 22:40:04 2015
From: mistersheik at (Neil Girdhar)
Date: Fri, 5 Jun 2015 13:40:04 -0700 (PDT)
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

While we're at it, we can also fix "(1if 0 else 2)" :)

On Friday, June 5, 2015 at 4:38:27 PM UTC-4, Neil Girdhar wrote:
> Actually CPython has another step between the AST and the bytecode, which 
> validates the AST to block out trees that violate various rules that were 
> not easily incorporated into the LL(1) grammar.  This means that when you 
> want to change parsing, you have to change: the grammar, the AST library, 
> the validation library, and Python's exposed parsing module.
> Modern parsers do not separate the grammar from tokenizing, parsing, and 
> validation.  All of these are done in one place, which not only simplifies 
> changes to the grammar, but also protects you from possible 
> inconsistencies.  It was really hard for me when I was making changes to 
> the parser to keep my conception of these four things synchronized.
> So in my opinion, if you're going to modernize the parsing, then put it 
> all together into one simple library that deals with all of it.  It seems 
> like what you're suggesting would add complexity, whereas a merged solution 
> would simplify the code.  If it's hard to write a fast parser, then 
> consider writing a parser generator in Python that generates the C code you 
> want.
> Best,
> Neil
> On Friday, June 5, 2015 at 5:30:23 AM UTC-4, Andrew Barnert via 
> Python-ideas wrote:
>> Compiling a module has four steps: 
>>  * bytes->str (based on encoding declaration or default) 
>>  * str->token stream 
>>  * token stream->AST 
>>  * AST->bytecode 
>> You can very easily hook at every point in that process except the token 
>> stream. 
>> There _is_ a workaround: re-encode the text to bytes, wrap it in a 
>> BytesIO, call tokenize, munge the token stream, call untokenize, re-decode 
>> back to text, then pass that to compile or ast.parse. But, besides being a 
>> bit verbose and painful, that means your line and column numbers get 
>> screwed up. So, while it's fine for a quick&dirty toy like my 
>> user-literal-hack, it's not something you'd want to do in a real import 
>> hook for use in real code. 
>> This could be solved by just changing ast.parse to accept an iterable of 
>> tokens or tuples as well as a string, and likewise for compile. 
>> That isn't exactly a trivial change, because under the covers the _ast 
>> module is written in C, partly auto-generated, and expects as input a CST, 
>> which is itself created from a different tokenizer written in C with a 
>> similar but different API (since C doesn't have iterators). And adding a 
>> PyTokenizer_FromIterable or something seems like it might raise some fun 
>> bootstrapping issues that I haven't thought through yet. But I think it 
>> ought to be doable without having to reimplement the whole parser in pure 
>> Python. And I think it would be worth doing. 
>> While we're at it, a few other (much smaller) changes would be nice: 
>>  * Allow tokenize to take a text file instead of making it take a binary 
>> file and repeat the encoding detection. 
>>  * Allow tokenize to take a file instead of its readline method. 
>>  * Allow tokenize to take a str/bytes instead of requiring a file. 
>>  * Add flags to compile to stop at any stage (decoded text, tokens, AST, 
>> or bytecode) instead of just the last two. 
>> (The funny thing is that the C tokenizer actually already does support 
>> strings and bytes and file objects.) 
>> I realize that doing all of these changes would mean that compile can now 
>> get an iterable and not know whether it's a file or a token stream until it 
>> tries to iterate it. So maybe that isn't the best API; maybe it's better to 
>> explicitly call tokenize, then ast.parse, then compile instead of calling 
>> compile repeatedly with different flags. 

From guido at  Sat Jun  6 02:28:32 2015
From: guido at (Guido van Rossum)
Date: Fri, 5 Jun 2015 17:28:32 -0700
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Jun 4, 2015 at 4:20 PM, Andrew Barnert <abarnert at> wrote:

> On Jun 4, 2015, at 14:05, Guido van Rossum <guido at> wrote:
> OK, you can attribute that to lousy docs. The intention is that builtin
> types are immutable.
> I can go file bugs against those other implementations, but first, what's
> the rationale?
> The ABC PEP, the numbers PEP discussion, and the type/class unification
> tutorial all use the same reason: In CPython, different interpreters in the
> same memory space (as with mod_python) share the same built-in types. From
> the numbers discussion, it sounds like this was the only reason to reject
> the idea of just patching float.__bases__.
> But most other Python implementations don't have process-wide globals like
> that to worry about; patching int in one interpreter can't possibly affect
> any other interpreter.
> "Because CPython can't do it, nobody else should do it, to keep code
> portable" might be a good enough rationale for something this fundamental,
> but if that's not the one you're thinking of, I don't want to put those
> words in your mouth.

Why do you need a better rationale?

The builtins are shared between all modules in a way that other things
aren't. Nothing good can come from officially recognizing the ability to
monkey-patch the builtin types -- it would just lead to paranoia amongst
library developers.
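
For anyone following along, this is how the restriction shows up on
CPython today (a sketch; as noted in this thread, other implementations
may behave differently). `try_patch` and `MyInt` are names made up for the
example.

```python
def try_patch(cls):
    """Try to add a method to cls; report whether it was allowed."""
    try:
        cls.shout = lambda self: str(self).upper()
    except TypeError:
        return False
    return True

class MyInt(int):
    pass

print(try_patch(int))    # CPython refuses to set attributes on int
print(try_patch(MyInt))  # a user-defined subclass can be patched freely
print(MyInt(3).shout())
```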

--Guido van Rossum

From rymg19 at  Sat Jun  6 02:50:56 2015
From: rymg19 at (Ryan Gonzalez)
Date: Fri, 05 Jun 2015 19:50:56 -0500
Subject: [Python-ideas] User-defined literals
In-Reply-To: <>
References: <>
Message-ID: <>

On June 5, 2015 7:28:32 PM CDT, Guido van Rossum <guido at> wrote:
>On Thu, Jun 4, 2015 at 4:20 PM, Andrew Barnert <abarnert at> wrote:
>> On Jun 4, 2015, at 14:05, Guido van Rossum <guido at> wrote:
>> OK, you can attribute that to lousy docs. The intention is that builtin
>> types are immutable.
>> I can go file bugs against those other implementations, but first, what's
>> the rationale?
>> The ABC PEP, the numbers PEP discussion, and the type/class unification
>> tutorial all use the same reason: In CPython, different interpreters in the
>> same memory space (as with mod_python) share the same built-in types. From
>> the numbers discussion, it sounds like this was the only reason to reject
>> the idea of just patching float.__bases__.
>> But most other Python implementations don't have process-wide globals like
>> that to worry about; patching int in one interpreter can't possibly affect
>> any other interpreter.
>> "Because CPython can't do it, nobody else should do it, to keep code
>> portable" might be a good enough rationale for something this fundamental,
>> but if that's not the one you're thinking of, I don't want to put those
>> words in your mouth.
>Why do you need a better rationale?
>The builtins are shared between all modules in a way that other things
>aren't. Nothing good can come from officially recognizing the ability to
>monkey-patch the builtin types -- it would just lead to paranoia amongst
>library developers.

Like javascript:void hacks to avoid undefined being re-defined.

Sent from my Android device with K-9 Mail. Please excuse my brevity.

From mistersheik at  Sat Jun  6 04:08:03 2015
From: mistersheik at (Neil Girdhar)
Date: Fri, 5 Jun 2015 22:08:03 -0400
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jun 5, 2015 at 6:58 PM, Andrew Barnert <abarnert at> wrote:

> On Jun 5, 2015, at 13:38, Neil Girdhar <mistersheik at> wrote:
> >
> > Actually CPython has another step between the AST and the bytecode,
> which validates the AST to block out trees that violate various rules that
> were not easily incorporated into the LL(1) grammar.
> Yes, and it also builds CST nodes before building the AST, and there's a
> step after AST validation and before bytecode generation where the symbol
> table for the scope is built and LEGB rules are applied. But none of those
> are things that seem particularly useful to hook. Maybe that's just a
> failure of imagination, but I've never wanted to do it.
> Hooking the token stream, on the other hand, has pretty obvious uses. For
> example, in the user-defined literal thread, Paul Moore suggested that for
> Nick Coghlan's "compile-time expression" idea, requiring valid Python
> syntax would be way too restrictive, but requiring valid Python tokens is
> probably OK, and it automatically solves the quoting problem, and it would
> usually be easier than parsing text. I think he's right about all three
> parts of that, but unfortunately you can't implement it that way in an
> import hook because an import hook can't get access to the token stream.
> And of course my hack for simulating user-defined literals relies on a
> workaround to fake hooking the token stream; it would be a whole lot harder
> without that, while it would be a little easier and a whole lot cleaner if
> I could just hook the token stream.

Yes, I think I understand your motivation.  Can you help me understand what
the hook you would write would look like?

> > This means that when you want to change parsing, you have to change: the
> grammar, the AST library, the validation library, and Python's exposed
> parsing module.
> And the code generation and other post-validation steps (unless you're
> just trying to create a do-nothing construct--which can be useful to give
> you something new to, e.g., feed to MacroPy, but it's not something you're
> going to check in to core, or use in your own production code).
> So yes, changing the grammar is painful. Which is just one more reason
> that being able to hack on Python without having to hack on Python is very
> useful. And as of 3.4, all of the pieces are there to do that, and
> dead-easy to use, and robust enough for production code--as long as the
> level you want to hack on is source text, AST, or bytecode, not token
> stream.


> > Modern parsers do not separate the grammar from tokenizing, parsing, and
> validation.  All of these are done in one place, which not only simplifies
> changes to the grammar, but also protects you from possible
> inconsistencies.  It was really hard for me when I was making changes to
> the parser to keep my conception of these four things synchronized.
> >
> > So in my opinion, if you're going to modernize the parsing, then put it
> all together into one simple library that deals with all of it.  It seems
> like what you're suggesting would add complexity, whereas a merged solution
> would simplify the code.
> Rewriting the entire parsing mechanism from scratch might simplify things,
> but it also means rewriting the entire parsing mechanism from scratch. I'm
> sure you could implement a GLR parser generator that takes a complete
> declarative grammar and generates something that goes right from source
> code to a SAX- or iterparse-style pre-validated AST, and that would be a
> really cool thing. But besides being a lot of work, it would also be a huge
> amount of risk. You'd almost certainly end up with new bugs, new places
> where syntax errors are harder to diagnose, new places where compiling is
> slower than it used to be, etc.

> Also, Python is defined as having a separate lexical analysis phase, and
> a module named tokenize that does the same thing as this phase, and tests
> that test it, and so on. So, you can't just throw all that out and remain
> backward compatible.

I don't see why that is.  The lexical "phase" would just become new parsing
rules, and so it would be supplanted by the parser.  Then you wouldn't need
to add special hooks for lexing.  You would merely have hooks for parsing.

> Meanwhile, adding functions to create a token state struct out of a Python
> iterable, drive it, and expose that functionality to Python is a lot less
> work, very unlikely to have any effect on the existing default mechanism
> (if you don't hook the token stream, the existing code runs the same as
> today, except for an if check inside the next-token function), and much
> easier to make exactly compatible with existing behavior even when you do
> hook the token stream (if creating and driving the token state works, all
> the other code is the same as it ever was). And it has no backward compat
> implications.
> > If it's hard to write a fast parser, then consider writing a parser
> generator in Python that generates the C code you want.
> It's not that it's hard to write a _fast_ parser in Python, but that it's
> hard to write a parser that does the exact same thing as Python's own
> parser (and even more so one that does the exact same thing as whatever
> version of Python you're running under's parser).

I think it's worth exploring having the whole parser in one place, rather
than repeating the same structures in at least four places.  With every
change to the Python grammar, you pay for this forced repetition.

From mistersheik at  Sat Jun  6 04:21:08 2015
From: mistersheik at (Neil Girdhar)
Date: Fri, 5 Jun 2015 22:21:08 -0400
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

Back in the day, I remember Lex and Yacc, then came Flex and Bison, and
then ANTLR, which unified lexing and parsing under one common language.  In
general, I like the idea of putting everything together.  I think that
because of Python's separation of lexing and parsing, it accepts weird text
like "(1if 0else 2)", which is crazy.

Here's what I think I want in a parser:

Along with the grammar, you also give it code that it can execute as it
matches each symbol in a rule.  In Python for example, as it matches each
argument passed to a function, it would keep track of the count of *args,
**kwargs, and  keyword arguments, and regular arguments, and then raise a
syntax error if it encounters anything out of order.  Right now that check
is done in validate.c, which is really annoying.
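
A rough sketch of what such per-symbol code could look like, written as plain
Python rather than as part of any real parser generator.  The function name
check_call_arguments and the four "kind" labels are made up for illustration,
and the ordering rules are a simplified subset of what CPython enforces:

```python
from collections import Counter

def check_call_arguments(kinds):
    """Validate argument order as each argument is matched.

    `kinds` lists the arguments in source order, each one of
    'positional', 'keyword', 'star' (*args) or 'starstar' (**kwargs).
    Returns a Counter of how many of each kind were seen.
    """
    counts = Counter()
    for kind in kinds:
        # Each check only looks at what has been counted so far, so it can
        # run at the moment the parser matches the argument.
        if kind == 'positional' and counts['starstar']:
            raise SyntaxError(
                'positional argument follows keyword argument unpacking')
        if kind == 'positional' and counts['keyword']:
            raise SyntaxError(
                'positional argument follows keyword argument')
        if kind == 'star' and counts['starstar']:
            raise SyntaxError(
                'iterable argument unpacking follows keyword argument unpacking')
        counts[kind] += 1
    return counts
```

Calling check_call_arguments(['keyword', 'positional']) raises SyntaxError on
the second argument, which is the point: the error is reported while the rule
is being matched, not in a separate validation pass.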

I want to specify the lexical rules in the same way that I specify the
parsing rules.  And I think (after Andrew elucidates what he means by
hooks) I want the parsing hooks to be the same thing as lexing hooks, and I
agree with him that hooking into the lexer is useful.

I want the parser module to be automatically-generated from the grammar if
that's possible (I think it is).

Typically each grammar rule is implemented using a class.  I want the code
generation to be a method on that class.  This makes changing the AST
easy.  For example, it was suggested that we might change the grammar to
include a starstar_expr node.  This should be an easy change, but because
of the way every node validates its children, which it expects to have a
certain tree structure, it would be a big task with almost no payoff.

There's also a question of which parsing algorithm you use.  I wish I knew
more about the state-of-art parsers.  I was interested because I wanted to
use Python to parse my LaTeX files.  I got the impression that were state of the art, but I'm
not sure.

I'm curious what other people will contribute to this discussion as I think
having no great parsing library is a huge hole in Python.  Having one would
definitely allow me to write better utilities using Python.

On Fri, Jun 5, 2015 at 6:55 PM, Luciano Ramalho <luciano at> wrote:

> On Fri, Jun 5, 2015 at 5:38 PM, Neil Girdhar <mistersheik at> wrote:
> > Modern parsers do not separate the grammar from tokenizing, parsing, and
> > validation.  All of these are done in one place, which not only simplifies
> > changes to the grammar, but also protects you from possible inconsistencies.
> Hi, Neil, thanks for that!
> Having studied only ancient parsers, I'd love to learn new ones. Can
> you please post references to modern parsing? Actual parsers, books,
> papers, anything you may find valuable.
> I have a hunch you're talking about PEG parsers, but maybe something
> else, or besides?
> Thanks!
> Best,
> Luciano
> --
> Luciano Ramalho
> |  Author of Fluent Python (O'Reilly, 2015)
> |
> |  Professor em:
> |  Twitter: @ramalhoorg

From rymg19 at  Sat Jun  6 04:55:20 2015
From: rymg19 at (Ryan Gonzalez)
Date: Fri, 05 Jun 2015 21:55:20 -0500
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

IMO, lexer and parser separation is sometimes great. It also makes hand-written parsers much simpler.

"Modern" parsing with no lexer and EBNF can sometimes be slower than the classics, especially if one is using an ultra-fast lexer generator such as re2c.


Sent from my Android device with K-9 Mail. Please excuse my brevity.

From mistersheik at  Sat Jun  6 04:57:54 2015
From: mistersheik at (Neil Girdhar)
Date: Fri, 5 Jun 2015 22:57:54 -0400
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

I don't see why it makes anything simpler.  Your lexing rules just live
alongside your parsing rules.  And I also don't see why it has to be faster
to do the lexing in a separate part of the code.  Wouldn't the parser
generator realize that some of the rules don't use the stack and so
they would end up just as fast as any lexer?

On Fri, Jun 5, 2015 at 10:55 PM, Ryan Gonzalez <rymg19 at> wrote:

> IMO, lexer and parser separation is sometimes great. It also makes
> hand-written parsers much simpler.
> "Modern" parsing with no lexer and EBNF can sometimes be slower than the
> classics, especially if one is using an ultra-fast lexer generator such as
> re2c.

From random832 at  Sat Jun  6 05:24:25 2015
From: random832 at (random832 at
Date: Fri, 05 Jun 2015 23:24:25 -0400
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jun 5, 2015, at 22:21, Neil Girdhar wrote:
> Back in the day, I remember Lex and Yacc, then came Flex and Bison, and
> then ANTLR, which unified lexing and parsing under one common language.  In
> general, I like the idea of putting everything together.  I think that
> because of Python's separation of lexing and parsing, it accepts weird text
> like "(1if 0else 2)", which is crazy.

I don't think this really has anything to do with separation of lexing
and parsing. C rejects this (where "this" is "integer followed by
arbitrary alphabetic token") purely due to the lexing stage
(specifically, 1if or 0else would be a single "preprocessor number"
token, with no valid meaning. Of course, this has its own quirks, for
example 0xE+1 is invalid in C.)
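
To make that concrete, here is a toy sketch in Python of a lexer that takes
numbers the way C's preprocessor does: grab the longest "preprocessor number"
first, then check whether it is a valid literal.  The names PP_NUMBER,
VALID_NUMBER and lex_number are made up for this example, and the
validity check only knows about decimal and hex integers:

```python
import re

# C's pp-number: an optional '.', a digit, then any run of digits, letters,
# underscores, dots, or an exponent letter followed by a sign.
PP_NUMBER = re.compile(r'\.?[0-9](?:[eEpP][+-]|[0-9A-Za-z_.])*')

# Deliberately tiny notion of a "valid" literal: decimal or hex integers.
VALID_NUMBER = re.compile(r'0[xX][0-9a-fA-F]+|[0-9]+')

def lex_number(text, pos=0):
    """Return the number token starting at `pos`, or None if there is none."""
    match = PP_NUMBER.match(text, pos)
    if match is None:
        return None
    token = match.group()
    # The whole greedy token must be a valid literal; no backing up.
    if VALID_NUMBER.fullmatch(token) is None:
        raise SyntaxError('invalid number token: %r' % token)
    return token
```

With this rule, lex_number('1if 0 else 2') fails in the lexer because "1if" is
one token, and lex_number('0xE+1') fails for the same reason as in C: "E+"
is swallowed as if it were an exponent.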

From guido at  Sat Jun  6 06:27:14 2015
From: guido at (Guido van Rossum)
Date: Fri, 5 Jun 2015 21:27:14 -0700
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jun 5, 2015 at 7:57 PM, Neil Girdhar <mistersheik at> wrote:

> I don't see why it makes anything simpler.  Your lexing rules just live
> alongside your parsing rules.  And I also don't see why it has to be faster
> to do the lexing in a separate part of the code.  Wouldn't the parser
> generator realize that some of the rules don't use the stack and so
> they would end up just as fast as any lexer?

You're putting a lot of faith in "modern" parsers. I don't know if PLY
qualifies as such, but it certainly is newer than Lex/Yacc, and it unifies
the lexer and parser. However I don't think it would be much better for a
language the size of Python.

We are using PLY at Dropbox to parse a medium-sized DSL, and while at the
beginning it was convenient to have the entire language definition in one
place, there were a fair number of subtle bugs in the earlier stages of the
project due to the mixing of lexing and parsing. In order to get this right
it seems you actually have to *think* about the lexing and parsing stages
differently, and combining them in one tool doesn't actually help you to
think more clearly.

Also, this approach doesn't really do much for the later stages -- you can
easily construct a parse tree but it's a fairly direct representation of
the grammar rules, and it offers no help in managing a symbol table or
generating code.

--Guido van Rossum (

From ncoghlan at  Sat Jun  6 07:04:00 2015
From: ncoghlan at (Nick Coghlan)
Date: Sat, 6 Jun 2015 15:04:00 +1000
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

On 6 June 2015 at 15:00, Nick Coghlan <ncoghlan at> wrote:
> On 6 June 2015 at 12:21, Neil Girdhar <mistersheik at> wrote:
>> I'm curious what other people will contribute to this discussion as I think
>> having no great parsing library is a huge hole in Python.  Having one would
>> definitely allow me to write better utilities using Python.
> The design of *Python's* grammar is deliberately restricted to being
> parsable with an LL(1) parser. There are a great many static analysis
> and syntax highlighting tools that are able to take advantage of that
> simplicity because they only care about the syntax, not the full
> semantics.
> Anyone actually doing their *own* parsing of something else *in*
> Python, would be better advised to reach for PLY
> ( ). PLY is the parser underlying
>, and hence the highly regarded
> CFFI library,

For the later stages of the pipeline (i.e. AST -> code generation),
CPython now uses Eli Bendersky's asdl_parser:

More background on that:


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Sat Jun  6 07:00:03 2015
From: ncoghlan at (Nick Coghlan)
Date: Sat, 6 Jun 2015 15:00:03 +1000
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

On 6 June 2015 at 12:21, Neil Girdhar <mistersheik at> wrote:
> I'm curious what other people will contribute to this discussion as I think
> having no great parsing library is a huge hole in Python.  Having one would
> definitely allow me to write better utilities using Python.

The design of *Python's* grammar is deliberately restricted to being
parsable with an LL(1) parser. There are a great many static analysis
and syntax highlighting tools that are able to take advantage of that
simplicity because they only care about the syntax, not the full
semantics.

Anyone actually doing their *own* parsing of something else *in*
Python, would be better advised to reach for PLY
( ). PLY is the parser underlying, and hence the highly regarded
CFFI library,

Other notable parsing alternatives folks may want to look at include and (both of which allow you to use
Python code to define your grammar, rather than having to learn a
formal grammar notation).


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From mistersheik at  Sat Jun  6 07:29:21 2015
From: mistersheik at (Neil Girdhar)
Date: Sat, 6 Jun 2015 01:29:21 -0400
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Jun 6, 2015 at 1:00 AM, Nick Coghlan <ncoghlan at> wrote:

> On 6 June 2015 at 12:21, Neil Girdhar <mistersheik at> wrote:
> > I'm curious what other people will contribute to this discussion as I think
> > having no great parsing library is a huge hole in Python.  Having one would
> > definitely allow me to write better utilities using Python.
> The design of *Python's* grammar is deliberately restricted to being
> parsable with an LL(1) parser. There are a great many static analysis
> and syntax highlighting tools that are able to take advantage of that
> simplicity because they only care about the syntax, not the full
> semantics.

Given the validation that happens, it's not actually LL(1) though.  It's
mostly LL(1) with some syntax errors that are raised for various illegal

Anyway, no one is suggesting changing the grammar.

> Anyone actually doing their *own* parsing of something else *in*
> Python, would be better advised to reach for PLY
> ( ). PLY is the parser underlying
>, and hence the highly regarded
> CFFI library,
> Other notable parsing alternatives folks may want to look at include
> and
> (both of which allow you to use
> Python code to define your grammar, rather than having to learn a
> formal grammar notation).

I looked at ply and pyparsing, but I found it impossible to parse even simple
LaTeX, because I couldn't tell the parser to consume the right number of
arguments given the name of the function.  When the parser sees a function
definition, it learns how many arguments that function needs.  When it sees a
function call \a{1}{2}{3}, if "\a" takes 2 arguments, then it should consume
only 1 and 2 as arguments, and leave 3 as a regular text token. In other
words, I should be able to tell the parser what to expect in code that lives
on the rule edges.
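
Here is a minimal hand-written sketch of the behaviour I mean, not something
built on ply or pyparsing.  The names TOKEN and parse and the arity table are
invented for the example; it assumes macro names are a backslash followed by
letters and that the table of argument counts is already known (a real
version would update it on each \newcommand):

```python
import re

# Macro names, braces, or runs of ordinary text.
TOKEN = re.compile(r'\\[A-Za-z]+|[{}]|[^\\{}]+')

def parse(text, arity):
    """Parse `text`; `arity` maps a macro name to its argument count."""
    tokens = TOKEN.findall(text)
    pos = 0

    def parse_group():
        nonlocal pos
        pos += 1                      # skip the opening '{'
        items = parse_sequence()
        if pos >= len(tokens) or tokens[pos] != '}':
            raise SyntaxError('unclosed group')
        pos += 1                      # skip the closing '}'
        return items

    def parse_sequence():
        nonlocal pos
        items = []
        while pos < len(tokens) and tokens[pos] != '}':
            tok = tokens[pos]
            if tok == '{':
                items.append(('group', parse_group()))
            elif tok.startswith('\\'):
                pos += 1
                args = []
                # The macro name decides how many groups to consume.
                for _ in range(arity.get(tok, 0)):
                    if pos >= len(tokens) or tokens[pos] != '{':
                        raise SyntaxError('missing argument for ' + tok)
                    args.append(parse_group())
                items.append((tok, args))
            else:
                pos += 1
                items.append(tok)
        return items

    result = parse_sequence()
    if pos != len(tokens):
        raise SyntaxError("unbalanced '}'")
    return result
```

With arity = {r'\a': 2}, parse(r'\a{1}{2}{3}', arity) attaches {1} and {2} to
\a and leaves {3} behind as an ordinary group.  The loop over arity.get(tok, 0)
is the "code on the rule edge" that I couldn't express in those tools.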

The parsing tools you listed work really well until you need to do
something like (1) the validation step that happens in Python, or (2)
figuring out exactly where the syntax error is (line and column number) or
(3) ensuring that whitespace separates some tokens even when it's not
required to disambiguate different parse trees.  I got the impression that
they wanted to make these languages simple for the simple cases, but they
were made too simple and don't allow you to do everything in one simple



> Regards,
> Nick.
> --
> Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From abarnert at  Sat Jun  6 07:30:58 2015
From: abarnert at (Andrew Barnert)
Date: Sat, 6 Jun 2015 05:30:58 +0000 (UTC)
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

First, I think your idea is almost completely tangential to mine. Yes, if you completely replaced both the interface and the implementation of the parser, you could do just about anything you wanted. But assuming nobody is going to completely replace the way Python does parsing today, I think it's still useful to add the one missing useful hook to the existing system. But let's continue.

On Friday, June 5, 2015 7:08 PM, Neil Girdhar <mistersheik at> wrote:
On Fri, Jun 5, 2015 at 6:58 PM, Andrew Barnert <abarnert at> wrote:
>On Jun 5, 2015, at 13:38, Neil Girdhar <mistersheik at> wrote:

>>Hooking the token stream, on the other hand, has pretty obvious uses. For example, in the user-defined literal thread, Paul Moore suggested that for Nick Coghlan's "compile-time expression" idea, requiring valid Python syntax would be way too restrictive, but requiring valid Python tokens is probably OK, and it automatically solves the quoting problem, and it would usually be easier than parsing text. I think he's right about all three parts of that, but unfortunately you can't implement it that way in an import hook because an import hook can't get access to the token stream.
>>And of course my hack for simulating user-defined literals relies on a workaround to fake hooking the token stream; it would be a whole lot harder without that, while it would be a little easier and a whole lot cleaner if I could just hook the token stream.

>Yes, I think I understand your motivation.  Can you help me understand what the hook you would write would look like?

That's a fair question.

First, let's look at the relevant parts of an AST hook that transforms every float literal into a Decimal constructor call:

    import ast
    import importlib.util

    class FloatNodeWrapper(ast.NodeTransformer):
        def visit_Num(self, node):
            if isinstance(node.n, float):
                return ast.Call(func=ast.Name(id='Decimal', ctx=ast.Load()),
                                args=[ast.Str(s=repr(node.n))], keywords=[])
            return node

    # ...

    def source_to_code(self, data, path, *, _optimize=-1):
        source = importlib.util.decode_source(data)
        tree = ast.parse(source)
        tree = FloatNodeWrapper().visit(tree)
        # New nodes carry no line/column info; fill it in before compiling.
        ast.fix_missing_locations(tree)
        return compile(tree, path, 'exec', dont_inherit=True, optimize=_optimize)

Now, here's what I'd like to write for a token hook that does the same thing at the token level:

    def retokenize_float(tokens):
        for num, val, *loc in tokens:
            if num == tokenize.NUMBER and ('.' in val or 'e' in val or 'E' in val):
                yield tokenize.NAME, 'Decimal', *loc
                yield tokenize.OP, '(', *loc
                yield tokenize.STRING, repr(val), *loc
                yield tokenize.OP, ')', *loc
            else:
                yield num, val, *loc

    # ...

    def source_to_code(self, data, path, *, _optimize=-1):
        source = importlib.util.decode_source(data)
        tokens = tokenize.tokenize(source)
        tokens = retokenize_float(tokens)
        # Wished-for API: compile() does not accept a token stream today.
        return compile(tokens, path, 'exec', dont_inherit=True, optimize=_optimize)

Of course I don't want to do the same thing, I want to do something that you can't do at the AST level; see my user literal hack for an example. But this shows the parallels and differences between the two. If you want more background, see (which I wrote to explain to someone else how floatliteralhack works).

Of course I'm not presenting this as an ideal design if I were starting Python from scratch, but as the best design given what Python is already producing and consuming (a stream of tokens that's guaranteed to be equivalent to what you get out of the tokenize module).
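
For comparison, here is a sketch of the kind of workaround that runs today:
tokenize the source, rewrite the tokens, turn them back into text with
tokenize.untokenize, and compile that text.  The helper names
is_float_literal and compile_with_decimal_floats are invented for this
example, and the float test is a heuristic rather than a full literal
grammar:

```python
import io
import tokenize
from decimal import Decimal

def is_float_literal(val):
    # Hex literals may contain 'e'; a trailing 'j' marks an imaginary literal.
    if val[:2].lower() == '0x' or val[-1] in 'jJ':
        return False
    return any(c in val for c in '.eE')

def retokenize_float(tokens):
    # Yield (type, string) pairs so untokenize() rebuilds the spacing itself.
    for tok in tokens:
        if tok.type == tokenize.NUMBER and is_float_literal(tok.string):
            yield tokenize.NAME, 'Decimal'
            yield tokenize.OP, '('
            yield tokenize.STRING, repr(tok.string)
            yield tokenize.OP, ')'
        else:
            yield tok.type, tok.string

def compile_with_decimal_floats(source, path='<string>'):
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    # Round-trip through text, because compile() cannot take tokens.
    new_source = tokenize.untokenize(retokenize_float(tokens))
    return compile(new_source, path, 'exec')

namespace = {'Decimal': Decimal}
exec(compile_with_decimal_floats('x = 0.1 + 0.2\n'), namespace)
print(namespace['x'])
```

The round trip through untokenize loses the original column positions and
makes the compiler tokenize the text a second time, which is the part a real
token hook would avoid.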

>>Also, Python is defined as having a separate lexical analysis phase, and a module named tokenize that does the same thing as this phase, and tests that test it, and so on. So, you can't just throw all that out and remain backward compatible.

>I don't see why that is.  The lexical "phase" would just become new parsing rules, and so it would be supplanted by the parser.  Then you wouldn't need to add special hooks for lexing.  You would merely have hooks for parsing.

Callbacks from a tree builder and using a user-modifiable grammar are clearly not backward compatible with ast.NodeTransformer. They're a completely different way of doing things.

Is it a better way? Maybe. Plenty of people are using OMeta/JS every day. Of course plenty of people are cursing the fact that OMeta/JS sometimes generates exponential-time backtracking and it's never clear which part of your production rules are to blame, or the fact that you can't get a useful error message out of it, etc.

And I'm pretty sure you could design something with most of the strengths of OMeta without its weaknesses (just using a standard packrat PEG parser instead of an extended PEG parser seems like it would turn most of the exponential productions into explicit errors in the grammar?). Or you could go the opposite way and use GLR and bottom-up callbacks instead of PEG and top-down. Something like that would be a great idea for a research project. But it's probably not a great idea for a proposal to change the core of CPython, at least until someone does that research project.

>>It's not that it's hard to write a _fast_ parser in Python, but that it's hard to write a parser that does the exact same thing as Python's own parser (and even more so one that does the exact same thing as whatever version of Python you're running under's parser).
>I think it's worth exploring having the whole parser in one place, rather than repeating the same structures in at least four places.  With every change to the Python grammar, you pay for this forced repetition.

The repetition is really a different issue. A different implementation of the same basic design Python already has could make it so you only have to write explicit code for the 3 places the CST->AST node doesn't follow the same rules as everywhere else and the dozen or so places where the AST has to be post-validated, instead of having to write explicit code for both sides of every single node type. And that kind of cleanup could be done without breaking backward compatibility, because the interfaces on each side of the code would be unchanged. But that's also a lot less fun of a change than writing a whole new parser, so I wouldn't be surprised if nobody ever did it?

From mistersheik at  Sat Jun  6 07:50:28 2015
From: mistersheik at (Neil Girdhar)
Date: Sat, 6 Jun 2015 01:50:28 -0400
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Jun 6, 2015 at 1:30 AM, Andrew Barnert <abarnert at> wrote:

> First, I think your idea is almost completely tangential to mine. Yes, if
> you completely replaced both the interface and the implementation of the
> parser, you could do just about anything you wanted. But assuming nobody is
> going to completely replace the way Python does parsing today, I think it's
> still useful to add the one missing useful hook to the existing system. But
> let's continue.
> On Friday, June 5, 2015 7:08 PM, Neil Girdhar <mistersheik at>
> wrote:
> On Fri, Jun 5, 2015 at 6:58 PM, Andrew Barnert <abarnert at> wrote:
> >
> >On Jun 5, 2015, at 13:38, Neil Girdhar <mistersheik at> wrote:
> >>Hooking the token stream, on the other hand, has pretty obvious uses.
> For example, in the user-defined literal thread, Paul Moore suggested that
> for Nick Coghlan's "compile-time expression" idea, requiring valid Python
> syntax would be way too restrictive, but requiring valid Python tokens is
> probably OK, and it automatically solves the quoting problem, and it would
> usually be easier than parsing text. I think he's right about all three
> parts of that, but unfortunately you can't implement it that way in an
> import hook because an import hook can't get access to the token stream.
> >>
> >>And of course my hack for simulating user-defined literals relies on a
> workaround to fake hooking the token stream; it would be a whole lot harder
> without that, while it would be a little easier and a whole lot cleaner if
> I could just hook the token stream.
> >
> >Yes, I think I understand your motivation.  Can you help me understand
> what the hook you would write would look like?
> That's a fair question.
> First, let's look at the relevant parts of an AST hook that transforms
> every float literal into a Decimal constructor call:
>     class FloatNodeWrapper(ast.NodeTransformer):
>         def visit_Num(self, node):
>             if isinstance(node.n, float):
>                 return ast.Call(func=ast.Name(id='Decimal', ctx=ast.Load()),
>                                 args=[ast.Str(s=repr(node.n))], keywords=[])
>             return node
>     # ...
>     def source_to_code(self, data, path, *, _optimize=-1):
>         source = importlib.decode_source(data)
>         tree = ast.parse(source)
>         tree = FloatNodeWrapper().visit(tree)
>         ast.fix_missing_locations(tree)
>         return compile(tree, path, 'exec', dont_inherit=True, optimize=_optimize)
> Now, here's what I'd like to write for a token hook that does the same
> thing at the token level:
>     def retokenize_float(tokens):
>         for num, val, *loc in tokens:
>             if num == tokenize.NUMBER and ('.' in val or 'e' in val or 'E' in val):
>                 yield tokenize.NAME, 'Decimal', *loc
>                 yield tokenize.OP, '(', *loc
>                 yield tokenize.STRING, repr(val), *loc
>                 yield tokenize.OP, ')', *loc
>             else:
>                 yield num, val, *loc
>     # ...
>     def source_to_code(self, data, path, *, _optimize=-1):
>         source = importlib.decode_source(data)
>         tokens = tokenize.tokenize(source)
>         tokens = retokenize(tokens)
>         return compile(tokens, path, 'exec', dont_inherit=True, optimize=_optimize)
> Of course I don't want to do the same thing, I want to do something that
> you can't do at the AST level; see my user literal hack for an example. But
> this shows the parallels and differences between the two. If you want more
> background, see
> (which I wrote to explain to someone else how floatliteralhack works).

Yes.  I want to point out that if the lexer rules were alongside the parser,
they would be generating AST nodes, so the hook for calling Decimal for
all floating point tokens would be doable in the same way as your AST
hook.  For the new tokens that you want, the ideal solution I think is to
modify the Python parsing grammar before it parses the text.

> Of course I'm not presenting this as an ideal design if I were starting
> Python from scratch, but as the best design given what Python is already
> producing and consuming (a stream of tokens that's guaranteed to be
> equivalent to what you get out of the tokenize module).

This is like saying "I want to change some things, but not other things".
I want the best long-term solution, whatever that is.  (I don't know what
it is.)  In the long run, moving towards the best solution tends to be the
least total work.  Specifically, if lexing hooks are implemented
differently than parsing hooks, then that change is probably going to be
backed out and replaced with the "ideal", eventually, maybe in five years
or ten or twenty.  And when it's removed, there are deprecation periods and
upgrade pains.  At least let's explore what the "ideal" solution is?

> >>Also, Python is defined as having a separate lexical analysis phase,
> and a module named tokenize that does the same thing as this phase, and
> tests that test it, and so on. So, you can't just throw all that out and
> remain backward compatible.
> >
> >I don't see why that is.  The lexical "phase" would just become new
> parsing rules, and so it would be supplanted by the parser.  Then you
> wouldn't need to add special hooks for lexing.  You would merely have hooks
> for parsing.
> Callbacks from a tree builder and using a user-modifiable grammar are
> clearly not backward compatible with ast.NodeTransformer. They're a
> completely different way of doing things.
> Is it a better way? Maybe. Plenty of people are using OMeta/JS every day.
> Of course plenty of people are cursing the fact that OMeta/JS sometimes
> generates exponential-time backtracking and it's never clear which part of
> your production rules are to blame, or the fact that you can't get a useful
> error message out of it, etc.

I don't know about OMeta, but the Earley parsing algorithm is worst-case
cubic time, "quadratic time for unambiguous grammars, and linear time for
almost all LR(k) grammars".
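
For anyone who hasn't seen it, this is roughly what an Earley recognizer
looks like.  It is a bare sketch that only answers yes or no, builds no
tree, and assumes the grammar has no empty productions (handling those
needs extra care).  The name earley_recognize and the grammar encoding (a
dict from nonterminal to a list of tuples of symbols) are my own choices
for the example:

```python
def earley_recognize(grammar, start, tokens):
    """Return True if `tokens` can be derived from `start`.

    An item is (head, body, dot, origin): rule head -> body, with `dot`
    symbols matched so far, started at input position `origin`.
    """
    chart = [set() for _ in range(len(tokens) + 1)]
    for body in grammar[start]:
        chart[0].add((start, body, 0, 0))

    for i in range(len(tokens) + 1):
        worklist = list(chart[i])
        while worklist:
            head, body, dot, origin = worklist.pop()
            if dot == len(body):
                # Complete: advance every item that was waiting for `head`.
                new_items = [(h, b, d + 1, o)
                             for h, b, d, o in chart[origin]
                             if d < len(b) and b[d] == head]
            elif body[dot] in grammar:
                # Predict: start every rule for the expected nonterminal.
                new_items = [(body[dot], b, 0, i) for b in grammar[body[dot]]]
            else:
                # Scan: a matching terminal moves the item to the next set.
                if i < len(tokens) and tokens[i] == body[dot]:
                    chart[i + 1].add((head, body, dot + 1, origin))
                new_items = []
            for item in new_items:
                if item not in chart[i]:
                    chart[i].add(item)
                    worklist.append(item)

    return any(head == start and dot == len(body) and origin == 0
               for head, body, dot, origin in chart[len(tokens)])

# An ambiguous, left-recursive grammar: E -> E + E | n
GRAMMAR = {'E': [('E', '+', 'E'), ('n',)]}
```

Note that the grammar above is both ambiguous and left-recursive, and the
recognizer still terminates on it; earley_recognize(GRAMMAR, 'E',
['n', '+', 'n', '+', 'n']) succeeds without any grammar rewriting.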

> And I'm pretty sure you could design something with most of the strengths
> of OMeta without its weaknesses (just using a standard packrat PEG parser
> instead of an extended PEG parser seems like it would turn most of the
> exponential productions into explicit errors in the grammar?). Or you could
> go the opposite way and use GLR and bottom-up callbacks instead of PEG and
> top-down. Something like that would be a great idea for a research project.
> But it's probably not a great idea for a proposal to change the core of
> CPython, at least until someone does that research project.

Yes, totally agree with you.  So if it were me doing this work, I would put
my energy in the research project to write an amazing parser in Python.
And then  I would try to convince the Python team to use that.  I guess we
don't disagree at all.

> >>It's not that it's hard to write a _fast_ parser in Python, but that
> it's hard to write a parser that does the exact same thing as Python's own
> parser (and even more so one that does the exact same thing as whatever
> version of Python you're running under's parser).
> >
> >I think it's worth exploring having the whole parser in one place, rather
> than repeating the same structures in at least four places.  With every
> change to the Python grammar, you pay for this forced repetition.
> >
> The repetition is really a different issue. A different implementation of
> the same basic design Python already has could make it so you only have to
> write explicit code for the 3 places the CST->AST node doesn't follow the
> same rules as everywhere else and the dozen or so places where the AST has
> to be post-validated, instead of having to write explicit code for both
> sides of every single node type. And that kind of cleanup could be done
> without breaking backward compatibility, because the interfaces on each
> side of the code would be unchanged. But that's also a lot less fun of a
> change than writing a whole new parser, so I wouldn't be surprised if
> nobody ever did it?

Cool, I didn't know it was even possible.

From mistersheik at  Sat Jun  6 08:04:51 2015
From: mistersheik at (Neil Girdhar)
Date: Sat, 6 Jun 2015 02:04:51 -0400
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Jun 6, 2015 at 12:27 AM, Guido van Rossum <guido at> wrote:

> On Fri, Jun 5, 2015 at 7:57 PM, Neil Girdhar <mistersheik at>
> wrote:
>> I don't see why it makes anything simpler.  Your lexing rules just live
>> alongside your parsing rules.  And I also don't see why it has to be faster
>> to do the lexing in a separate part of the code.  Wouldn't the parser
>> generator realize that some of the rules don't use the stack and so
>> they would end up just as fast as any lexer?
> You're putting a lot of faith in "modern" parsers. I don't know if PLY
> qualifies as such, but it certainly is newer than Lex/Yacc, and it unifies
> the lexer and parser. However I don't think it would be much better for a
> language the size of Python.

I agree with you.  I think the problem might be that the parser that I'm
dreaming of doesn't exist for Python.  In another message, I wrote what I


Along with the grammar, you also give it code that it can execute as it
matches each symbol in a rule.  In Python, for example, as it matches each
argument passed to a function, it would keep track of the counts of *args,
**kwargs, keyword arguments, and regular arguments, and then raise a
syntax error if it encounters anything out of order.  Right now that check
is done in validate.c, which is really annoying.
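
A minimal sketch of what such a per-symbol action could look like, assuming
a hypothetical parser that calls on_argument() each time it matches one call
argument.  The class name, the kind labels, and the three ordering rules are
illustrative assumptions, not CPython's actual checks:

```python
class ArgumentOrderChecker:
    """Hypothetical rule action: counts argument kinds and rejects bad order."""

    KINDS = ("positional", "keyword", "star", "double_star")

    def __init__(self):
        self.counts = {kind: 0 for kind in self.KINDS}

    def on_argument(self, kind):
        # Called by the (assumed) parser as soon as one argument is matched.
        if kind not in self.counts:
            raise ValueError("unknown argument kind: %r" % (kind,))
        if kind == "positional" and self.counts["keyword"]:
            raise SyntaxError("positional argument follows keyword argument")
        if kind == "positional" and self.counts["double_star"]:
            raise SyntaxError(
                "positional argument follows keyword argument unpacking")
        if kind == "star" and self.counts["double_star"]:
            raise SyntaxError(
                "iterable argument unpacking follows keyword argument unpacking")
        self.counts[kind] += 1


def check_call(kinds):
    """Feed a sequence of argument kinds through the checker."""
    checker = ArgumentOrderChecker()
    for kind in kinds:
        checker.on_argument(kind)
    return checker.counts
```

The point of the sketch is only that the error is raised at the moment the
offending argument is matched, so the check lives next to the rule instead of
in a separate validation pass.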

I want to specify the lexical rules in the same way that I specify the
parsing rules.  And I think (after Andrew elucidates what he means by
hooks) I want the parsing hooks to be the same thing as lexing hooks, and I
agree with him that hooking into the lexer is useful.

I want the parser module to be automatically-generated from the grammar if
that's possible (I think it is).

Typically each grammar rule is implemented using a class.  I want the code
generation to be a method on that class.  This makes changing the AST
easy.  For example, it was suggested that we might change the grammar to
include a starstar_expr node.  This should be an easy change, but because
of the way every node validates its children, which it expects to have a
certain tree structure, it would be a big task with almost no payoff.
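
As a rough illustration of "code generation as a method on the rule class",
here is a toy sketch.  Number, BinaryOp, the tuple-based instruction format,
and run() are all hypothetical names made up for this example; nothing here
reflects how CPython's compiler is organized:

```python
class Number:
    """Grammar rule for a numeric literal."""

    def __init__(self, value):
        self.value = value

    def generate(self):
        # Each rule knows how to emit its own instructions.
        return [("PUSH", self.value)]


class BinaryOp:
    """Grammar rule for 'left op right'."""

    OPCODES = {"+": "ADD", "*": "MUL"}

    def __init__(self, left, op, right):
        self.left, self.op, self.right = left, op, right

    def generate(self):
        # Children generate their own code; the parent only adds its opcode.
        return (self.left.generate()
                + self.right.generate()
                + [(self.OPCODES[self.op], None)])


def run(program):
    """Tiny stack machine used only to check the generated instructions."""
    stack = []
    for opcode, arg in program:
        if opcode == "PUSH":
            stack.append(arg)
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if opcode == "ADD" else left * right)
    return stack.pop()


tree = BinaryOp(Number(2), "+", BinaryOp(Number(3), "*", Number(4)))
result = run(tree.generate())
```

Adding a new node type in this style means adding one class with its own
generate() method, rather than touching a separate validator and a separate
code generator.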


I don't think this is possible with Ply.

> We are using PLY at Dropbox to parse a medium-sized DSL, and while at the
> beginning it was convenient to have the entire language definition in one
> place, there were a fair number of subtle bugs in the earlier stages of the
> project due to the mixing of lexing and parsing. In order to get this right
> it seems you actually have to *think* about the lexing and parsing stages
> differently, and combining them in one tool doesn't actually help you to
> think more clearly.

That's interesting.  I can understand wanting to separate them mentally,
but two problems with separating at a fundamental programmatic level are:
(1) you may want to change a lexical token like number to, in some cases,
be LL(1) for who knows what reason; or (2) you would have to implement
lexical hooks differently than parsing hooks.  In some of Andrew's code
below, the tokenize hook looks so different from the parser hook, and I
think that's unfortunate.

> Also, this approach doesn't really do much for the later stages -- you can
> easily construct a parse tree but it's a fairly direct representation of
> the grammar rules, and it offers no help in managing a symbol table or
> generating code.

It would be nice to generate the code in methods on the classes that
implement the grammar rules.  This would allow you to use memos that were
filled in as you were parsing and validating to generate code.

> --
> --Guido van Rossum (

From ncoghlan at  Sat Jun  6 08:21:14 2015
From: ncoghlan at (Nick Coghlan)
Date: Sat, 6 Jun 2015 16:21:14 +1000
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

On 6 June 2015 at 15:30, Andrew Barnert via Python-ideas
<python-ideas at> wrote:
> The repetition is really a different issue. A different implementation of the same basic design Python already has could make it so you only have to write explicit code for the 3 places the CST->AST node doesn't follow the same rules as everywhere else and the dozen or so places where the AST has to be post-validated, instead of having to write explicit code for both sides of every single node type. And that kind of cleanup could be done without breaking backward compatibility, because the interfaces on each side of the code would be unchanged. But that's also a lot less fun of a change than writing a whole new parser, so I wouldn't be surprised if nobody ever did it?

Eugene Toder had a decent go at introducing more autogeneration into
the code generation code a few years ago as part of building out an
AST level optimiser:

The basic concepts Eugene introduced still seem sound to me, there'd
just be some work in bringing the patches up to date to target 3.6.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From abarnert at  Sat Jun  6 09:17:47 2015
From: abarnert at (Andrew Barnert)
Date: Sat, 6 Jun 2015 00:17:47 -0700
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

> On Jun 5, 2015, at 22:50, Neil Girdhar <mistersheik at> wrote:
>> On Sat, Jun 6, 2015 at 1:30 AM, Andrew Barnert <abarnert at> wrote:
>> First, I think your idea is almost completely tangential to mine. Yes, if you completely replaced both the interface and the implementation of the parser, you could do just about anything you wanted. But assuming nobody is going to completely replace the way Python does parsing today, I think it's still useful to add the one missing useful hook to the existing system. But let's continue.
>> On Friday, June 5, 2015 7:08 PM, Neil Girdhar <mistersheik at> wrote:
>> On Fri, Jun 5, 2015 at 6:58 PM, Andrew Barnert <abarnert at> wrote:
>> >
>> If you want more background, see
>> (which I wrote to explain to someone else how floatliteralhack works).
> Yes.  I want to point out that if the lexer rules were alongside the parser, they would be generating AST nodes, so the hook for calling Decimal for all floating point tokens would be doable in the same way as your AST hook.

No. The way Python currently exposes things, the AST hook runs on an already-generated AST and transforms it into another one, to hand off to the code generator. That means it can only be used to handle things that parse as legal Python syntax (unless you replace the entire parser).
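
For concreteness, a small sketch of that kind of AST hook, using the standard
ast.NodeTransformer.  It assumes Python 3.8 or later (where literals are
ast.Constant nodes), and FloatToDecimal and compile_with_decimals are
hypothetical names for this example, not the floatliteralhack code:

```python
import ast
from decimal import Decimal


class FloatToDecimal(ast.NodeTransformer):
    """Rewrite every float constant into a Decimal('...') call."""

    def visit_Constant(self, node):
        if isinstance(node.value, float):
            call = ast.Call(
                func=ast.Name(id="Decimal", ctx=ast.Load()),
                # Only the float value is available here, not the source text.
                args=[ast.Constant(value=repr(node.value))],
                keywords=[],
            )
            return ast.copy_location(call, node)
        return node


def compile_with_decimals(source):
    """Parse, transform the finished AST, then hand it to the compiler."""
    tree = ast.parse(source)
    tree = ast.fix_missing_locations(FloatToDecimal().visit(tree))
    return compile(tree, "<ast-hook>", "exec")


namespace = {"Decimal": Decimal}
exec(compile_with_decimals("x = 0.1 + 0.2"), namespace)
```

Note that the transformer only ever sees source that already parsed, and the
literal has already been converted to a float by the time it runs.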

What I want is a way to similarly take an already-generated token stream and transform it into another one, to hand off to the parser. That will allow it to be used to handle things that lex as legal Python tokens but don't parse as legal Python syntax, like what Paul suggested. Merging lexing into parsing not only doesn't give me that, it makes that impossible.
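
A sketch of what a token-stream transformation can look like today, done by
hand with the stdlib tokenize module (modeled on the Decimal-substitution
example in the tokenize documentation).  decimalize is a hypothetical helper
name, and floats written only with an exponent (such as 1e5) are skipped for
brevity:

```python
import io
import tokenize
from decimal import Decimal


def decimalize(source):
    """Return source text with float literals wrapped in Decimal('...')."""
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        is_float = (tok.type == tokenize.NUMBER
                    and "." in tok.string
                    and not tok.string.lower().endswith("j"))
        if is_float:
            # Replace one NUMBER token with four tokens: Decimal ( '...' )
            result.extend([
                (tokenize.NAME, "Decimal"),
                (tokenize.OP, "("),
                (tokenize.STRING, repr(tok.string)),
                (tokenize.OP, ")"),
            ])
        else:
            result.append((tok.type, tok.string))
    return tokenize.untokenize(result)


namespace = {"Decimal": Decimal}
exec(decimalize("x = 1.10000000000000000001\n"), namespace)
```

Because the hook sees the literal's original text, the extra digits survive,
which is not possible once the value has become a float in the AST.  What is
missing is a supported way to feed the transformed tokens straight to the
parser instead of round-tripping through source text.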

> For the new tokens that you want, the ideal solution I think is to modify the python parsing grammar before it parses the text.

But I don't want any new tokens. I just want to change the way existing tokens are interpreted.

Just as with an AST hook like PyMacro, I don't want any new nodes, I just want to change the way existing nodes are interpreted.

>> Of course I'm not presenting this as an ideal design if I were starting Python from scratch, but as the best design given what Python is already producing and consuming (a stream of tokens that's guaranteed to be equivalent to what you get out of the tokenize module).
> This is like saying "I want to change some things, but not other things".

That's exactly what I'm saying. In particular, I want to change as few things as possible, to get what I want, without breaking stuff that already demonstrably works and has worked for decades.

> I don't know about OMeta, but the Earley parsing algorithm is worst-case cubic time, "quadratic time for unambiguous grammars, and linear time for almost all LR(k) grammars".

I don't know why you'd want to use Earley for parsing a programming language. IIRC, it was the first algorithm that could handle rampant ambiguity in polynomial time, but that isn't relevant to parsing programming languages (especially one like Python, which was explicitly designed to be simple to parse), and it isn't relevant to natural languages if you're not still in the 1960s, except in learning the theory and history of parsing. GLR does much better in almost-unambiguous/almost-deterministic languages; CYK can be easily extended with weights (which propagate sensibly, so you can use them for a final judgment, or to heuristically prune alternatives as you go); Valiant is easier to reason about mathematically; etc. And that's just among the parsers in the same basic family as Earley.

Also, the point of OMeta is that it's not just a parsing algorithm, it's a complete system that's been designed and built and is being used to write DSLs, macros, and other language extensions in real-life code in languages like JavaScript and C#. So you don't have to imagine what kind of interface you could present or what it might be like to use it in practice, you can use it and find out. And I think it's in the same basic direction as the kind of interface you want for Python's parser.

>> And I'm pretty sure you could design something with most of the strengths of OMeta without its weaknesses (just using a standard packrat PEG parser instead of an extended PEG parser seems like it would turn most of the exponential productions into explicit errors in the grammar?). Or you could go the opposite way and use GLR and bottom-up callbacks instead of PEG and top-down. Something like that would be a great idea for a research project. But it's probably not a great idea for a proposal to change the core of CPython, at least until someone does that research project.
> Yes, totally agree with you.  So if it were me doing this work, I would put my energy in the research project to write an amazing parser in Python.   And then  I would try to convince the Python team to use that.  I guess we don't disagree at all.

Well, I think we disagree about the value of our time, and about the cost of disruptive changes.

If I have a relatively low-work, almost completely non-disruptive way to definitely get everything I actually need, and a high-work, hugely-disruptive way to probably get what I actually need and also probably get a whole bunch of other useful stuff that I might be able to sell everyone else on if I also did a lot of additional work, that seems like a no-brainer to me.

In fact, even if I wanted to write an amazing parser library for Python (and I kind of do, but I don't know if I have the time), I still don't think I'd want to suggest it as a replacement for the parser in CPython. Writing all the backward-compat adapters and porting the Python parser over with all its quirks intact and building the tests to prove that its performance and error handling were strictly better and so on wouldn't be nearly as much fun as other things I could do with it.

From stefan at  Sat Jun  6 15:36:28 2015
From: stefan at (s.krah)
Date: Sat, 06 Jun 2015 13:36:28 +0000
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

Neil Girdhar <mistersheik at> wrote:
> Along with the grammar, you also give it code that it can execute as it matches each symbol in a rule.  In Python for example, as it matches each argument passed to a function, it would keep track of the count of *args, **kwargs, and  keyword arguments, and regular arguments, and then raise a syntax error if it encounters anything out of order.  Right now that check is done in validate.c, which is really annoying.

Agreed.  For 3.4 it was possible to encode these particular semantics into the grammar
itself, but it would no longer be LL(1).

If I understood correctly, you wanted to handle lexing and parsing together.  How
would the INDENT/DEDENT tokens be generated?

For my private ast generator, I did the opposite: I wanted to formalize the token
preprocessing step, so I have:

    lexer -> parser1 (generates INDENT/DEDENT) -> parser2 (generates the ast directly)

It isn't slower than what is in Python right now and you can hook into the token stream
at any place.
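
To illustrate the middle stage of such a pipeline, here is a minimal sketch
of a filter that turns leading whitespace into INDENT/DEDENT tokens with an
indentation stack.  indent_tokens and the tuple token format are assumptions
for this example (it works on whole lines and spaces only), not the private
generator described above:

```python
def indent_tokens(lines):
    """Yield ("INDENT", n), ("DEDENT", n) and ("LINE", text) tokens."""
    stack = [0]  # indentation widths of the enclosing blocks
    for line in lines:
        stripped = line.lstrip(" ")
        if not stripped.strip():
            continue  # blank lines never change the indentation level
        width = len(line) - len(stripped)
        if width > stack[-1]:
            stack.append(width)
            yield ("INDENT", width)
        while width < stack[-1]:
            stack.pop()
            yield ("DEDENT", width)
        if width != stack[-1]:
            raise IndentationError(
                "unindent does not match any outer indentation level")
        yield ("LINE", stripped.rstrip("\n"))
    while len(stack) > 1:
        # Close any blocks still open at end of input.
        stack.pop()
        yield ("DEDENT", 0)


tokens = list(indent_tokens(["if x:\n", "    y\n", "z\n"]))
```

Since each stage is an ordinary generator over tokens, another filter can be
inserted before or after it without touching the grammar.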

Stefan Krah


From guettliml at  Sat Jun  6 17:28:32 2015
From: guettliml at (Thomas Güttler)
Date: Sat, 06 Jun 2015 17:28:32 +0200
Subject: [Python-ideas] Next steps to get type hinting become reality?
Message-ID: <>

Based on the thread "Better type hinting" here are new questions:

We have: PEP 484, Python2, the standard library and a dream called "type hinting".

Is it possible to get type hinting for the standard library of Python2?

If not, how to get type hinting for the standard library of Python3?

What can students with some spare time do, to improve the current situation?

  Thomas Güttler


From mistersheik at  Sat Jun  6 18:18:49 2015
From: mistersheik at (Neil Girdhar)
Date: Sat, 6 Jun 2015 12:18:49 -0400
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

Maybe if every production has a link to its parent, then the spaces after a
newline followed by a statement reduce to indentation followed by a statement,
which reduces to indent or dedent or nothing followed by a statement, based
on the parent's indentation level?  In other words, the parent (a file_input,
e.g.) has active control of the grammar of its children?

On Sat, Jun 6, 2015 at 9:36 AM, s.krah <stefan at> wrote:

> *Neil Girdhar <mistersheik at>* wrote:
> > Along with the grammar, you also give it code that it can execute as it
> matches each symbol in a rule.  In Python for example, as it matches each
> argument passed to a function, it would keep track of the count of *args,
> **kwargs, and  keyword arguments, and regular arguments, and then raise a
> syntax error if it encounters anything out of order.  Right now that check
> is done in validate.c, which is really annoying.
> Agreed.  For 3.4 it was possible to encode these particular semantics into
> the grammar
> itself, but it would no longer be LL(1).
> If I understood correctly, you wanted to handle lexing and parsing
> together.  How
> would the INDENT/DEDENT tokens be generated?
> For my private ast generator, I did the opposite: I wanted to formalize
> the token
> preprocessing step, so I have:
>     lexer -> parser1 (generates INDENT/DEDENT) -> parser2 (generates the
> ast directly)
> It isn't slower than what is in Python right now and you can hook into the
> token stream
> at any place.
> Stefan Krah

From mistersheik at  Sat Jun  6 18:23:03 2015
From: mistersheik at (Neil Girdhar)
Date: Sat, 6 Jun 2015 12:23:03 -0400
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Jun 6, 2015 at 3:17 AM, Andrew Barnert <abarnert at> wrote:

> On Jun 5, 2015, at 22:50, Neil Girdhar <mistersheik at> wrote:
> On Sat, Jun 6, 2015 at 1:30 AM, Andrew Barnert <abarnert at> wrote:
>> First, I think your idea is almost completely tangential to mine. Yes, if
>> you completely replaced both the interface and the implementation of the
>> parser, you could do just about anything you wanted. But assuming nobody is
>> going to completely replace the way Python does parsing today, I think it's
>> still useful to add the one missing useful hook to the existing system. But
>> let's continue.
>> On Friday, June 5, 2015 7:08 PM, Neil Girdhar <mistersheik at>
>> wrote:
>> On Fri, Jun 5, 2015 at 6:58 PM, Andrew Barnert <abarnert at>
>> wrote:
>> >
>> If you want more background, see
>> (which I wrote to explain to someone else how floatliteralhack works).
> Yes.  I want to point out that if the lexer rules were alongside the parser,
> they would be generating AST nodes, so the hook for calling Decimal for
> all floating point tokens would be doable in the same way as your AST hook.
> No. The way Python currently exposes things, the AST hook runs on an
> already-generated AST and transforms it into another one, to hand off to
> the code generator. That means it can only be used to handle things that
> parse as legal Python syntax (unless you replace the entire parser).
> What I want is a way to similarly take an already-generated token stream
> and transform it into another one, to hand off to the parser. That will
> allow it to be used to handle things that lex as legal Python tokens but
> don't parse as legal Python syntax, like what Paul suggested. Merging
> lexing into parsing not only doesn't give me that, it makes that impossible.

Yes, and what I was suggesting is for the lexer to return AST nodes, so
it would be fine to process those nodes in the same way.

> For the new tokens that you want, the ideal solution I think is to modify
> the python parsing grammar before it parses the text.
> But I don't want any new tokens. I just want to change the way existing
> tokens are interpreted.
> Just as with an AST hook like PyMacro, I don't want any new nodes, I just
> want to change the way existing nodes are interpreted.
Yes, I see *how* you're trying to solve your problem, but my preference is
to have one kind of hook rather than two kinds by unifying lexing and
parsing.  I think that's more elegant.

> Of course I'm not presenting this as an ideal design if I were starting
>> Python from scratch, but as the best design given what Python is already
>> producing and consuming (a stream of tokens that's guaranteed to be
>> equivalent to what you get out of the tokenize module).
> This is like saying "I want to change some things, but not other things".
> That's exactly what I'm saying. In particular, I want to change as few
> things as possible, to get what I want, without breaking stuff that already
> demonstrably works and has worked for decades.
> I don't know about OMeta, but the Earley parsing algorithm is worst-case
> cubic time, "quadratic time for unambiguous grammars, and linear time for
> almost all LR(k) grammars".
> I don't know why you'd want to use Earley for parsing a programming
> language. IIRC, it was the first algorithm that could handle rampant
> ambiguity in polynomial time, but that isn't relevant to parsing
> programming languages (especially one like Python, which was explicitly
> designed to be simple to parse), and it isn't relevant to natural languages
> if you're not still in the 1960s, except in learning the theory and history
> of parsing. GLR does much better in almost-unambiguous/almost-deterministic
> languages; CYK can be easily extended with weights (which propagate
> sensibly, so you can use them for a final judgment, or to heuristically
> prune alternatives as you go); Valiant is easier to reason about
> mathematically; etc. And that's just among the parsers in the same basic
> family as Earley.

I suggested Earley to mitigate this fear of "exponential backtracking"
since that won't happen in Earley.

> Also, the point of OMeta is that it's not just a parsing algorithm, it's a
> complete system that's been designed and built and is being used to write
> DSLs, macros, and other language extensions in real-life code in languages
> like JavaScript and C#. So you don't have to imagine what kind of interface
> you could present or what it might be like to use it in practice, you can
> use it and find out. And I think it's in the same basic direction as the
> kind of interface you want for Python's parser.
> And I'm pretty sure you could design something with most of the strengths
>> of OMeta without its weaknesses (just using a standard packrat PEG parser
>> instead of an extended PEG parser seems like it would turn most of the
>> exponential productions into explicit errors in the grammar?). Or you could
>> go the opposite way and use GLR and bottom-up callbacks instead of PEG and
>> top-down. Something like that would be a great idea for a research project.
>> But it's probably not a great idea for a proposal to change the core of
>> CPython, at least until someone does that research project.
> Yes, totally agree with you.  So if it were me doing this work, I would
> put my energy in the research project to write an amazing parser in Python.
>   And then  I would try to convince the Python team to use that.  I guess
> we don't disagree at all.
> Well, I think we disagree about the value of our time, and about the cost
> of disruptive changes.
> If I have a relatively low-work, almost completely non-disruptive way to
> definitely get everything I actually need, and a high-work,
> hugely-disruptive way to probably get what I actually need and also
> probably get a whole bunch of other useful stuff that I might be able to
> sell everyone else on if I also did a lot of additional work, that seems
> like a no-brainer to me.
> In fact, even if I wanted to write an amazing parser library for Python
> (and I kind of do, but I don't know if I have the time), I still don't
> think I'd want to suggest it as a replacement for the parser in CPython.
> Writing all the backward-compat adapters and porting the Python parser over
> with all its quirks intact and building the tests to prove that its
> performance and error handling were strictly better and so on wouldn't be
> nearly as much fun as other things I could do with it.

If you ever decide to write that amazing parser library for Python and want
any help please feel free to let me know.



From guido at  Sat Jun  6 19:13:59 2015
From: guido at (Guido van Rossum)
Date: Sat, 6 Jun 2015 10:13:59 -0700
Subject: [Python-ideas] Next steps to get type hinting become reality?
In-Reply-To: <>
References: <>
Message-ID: <>

The plan is to have volunteers produce stubs for the stdlib, and to
contribute those to typeshed: (that's a
shared resource and will eventually be transferred to the PSF, i.e.

If you want to use type annotations in Python 2, there's this hack:

On Sat, Jun 6, 2015 at 8:28 AM, Thomas Güttler <guettliml at> wrote:

> Based on the thread "Better type hinting" here are new questions:
> We have: PEP 484, Python2, the standard library and a dream called "type
> hinting".
> Is it possible to get type hinting for the standard library of Python2?
> If not, how to get type hinting for the standard library of Python3?
> What can students with some spare time do, to improve the current
> situation?
> Regards,
>   Thomas Güttler

--Guido van Rossum (

From rymg19 at  Sat Jun  6 19:52:31 2015
From: rymg19 at (Ryan Gonzalez)
Date: Sat, 06 Jun 2015 12:52:31 -0500
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

On June 6, 2015 12:29:21 AM CDT, Neil Girdhar <mistersheik at> wrote:
>On Sat, Jun 6, 2015 at 1:00 AM, Nick Coghlan <ncoghlan at> wrote:
>> On 6 June 2015 at 12:21, Neil Girdhar <mistersheik at> wrote:
>> > I'm curious what other people will contribute to this discussion as I
>> > think having no great parsing library is a huge hole in Python.  Having
>> > one would definitely allow me to write better utilities using Python.
>> The design of *Python's* grammar is deliberately restricted to being
>> parsable with an LL(1) parser. There are a great many static analysis
>> and syntax highlighting tools that are able to take advantage of that
>> simplicity because they only care about the syntax, not the full
>> semantics.
>Given the validation that happens, it's not actually LL(1) though.  It's
>mostly LL(1) with some syntax errors that are raised for various illegal
>constructs.
>Anyway, no one is suggesting changing the grammar.
>> Anyone actually doing their *own* parsing of something else *in*
>> Python, would be better advised to reach for PLY
>> ( ). PLY is the parser underlying
>>, and hence the highly regarded
>> CFFI library,
>> Other notable parsing alternatives folks may want to look at include
>> and
>> (both of which allow you to use
>> Python code to define your grammar, rather than having to learn a
>> formal grammar notation).
>I looked at ply and pyparsing, but it was impossible to simply parse LaTeX
>because I couldn't explain to suck up the right number of arguments given
>the name of the function.  When it sees a function, it learns how many
>arguments that function needs.  When it sees a function call \a{1}{2}{3},
>if "\a" takes 2 arguments, then it should only suck up 1 and 2 as
>arguments, and leave 3 as a regular text token. In other words, I should be
>able to tell the parser what to expect in code that lives on the rule edges.

Can't you just hack it into the lexer? When the slash is detected, the lexer can treat the following identifier as a function, look up the number of required arguments, and push it onto some sort of stack. Whenever a left bracket is encountered and another argument is needed by the TOS, it returns a special argument opener token.
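
A minimal sketch of that lexer hack, assuming a caller-supplied arity table
that maps command names to argument counts.  The token names (ARG_OPEN,
GROUP_OPEN, and so on), the lex() function, and the simplified LaTeX
tokenization are all assumptions made for this example:

```python
import re

TOKEN_RE = re.compile(
    r"\\(?P<command>[A-Za-z]+)|(?P<open>\{)|(?P<close>\})|(?P<text>[^\\{}]+)"
)


def lex(source, arity):
    """Yield (kind, text) tokens, marking braces that open command arguments."""
    needs = [0]   # per nesting level: arguments still wanted by the last command
    braces = []   # for each unclosed "{": True if it opened an argument
    for match in TOKEN_RE.finditer(source):
        kind = match.lastgroup
        if kind == "command":
            name = match.group("command")
            needs[-1] = arity.get(name, 0)  # look up the required arguments
            yield ("COMMAND", name)
        elif kind == "open":
            is_argument = needs[-1] > 0
            if is_argument:
                needs[-1] -= 1
            needs.append(0)  # braces nested inside start with nothing pending
            braces.append(is_argument)
            yield ("ARG_OPEN" if is_argument else "GROUP_OPEN", "{")
        elif kind == "close":
            if not braces:
                raise SyntaxError("unbalanced '}'")
            needs.pop()
            yield ("ARG_CLOSE" if braces.pop() else "GROUP_CLOSE", "}")
        else:
            text = match.group("text")
            if text.strip():
                needs[-1] = 0  # real text ends the command's argument list
            yield ("TEXT", text)


kinds = [kind for kind, _ in lex(r"\a{1}{2}{3}", {"a": 2})]
```

With "\a" declared as taking 2 arguments, the first two brace pairs come out
as argument tokens and the third as an ordinary group, so the parser's
grammar can stay static.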

>The parsing tools you listed work really well until you need to do
>something like (1) the validation step that happens in Python, or (2)
>figuring out exactly where the syntax error is (line and column number) or
>(3) ensuring that whitespace separates some tokens even when it's not
>required to disambiguate different parse trees.  I got the impression that
>they wanted to make these languages simple for the simple cases, but they
>were made too simple and don't allow you to do everything in one simple
>pass.
>> Regards,
>> Nick.
>> --
>> Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

Sent from my Android device with K-9 Mail. Please excuse my brevity.

From mistersheik at  Sat Jun  6 20:27:14 2015
From: mistersheik at (Neil Girdhar)
Date: Sat, 6 Jun 2015 14:27:14 -0400
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>


On Sat, Jun 6, 2015 at 1:52 PM, Ryan Gonzalez <rymg19 at> wrote:

> On June 6, 2015 12:29:21 AM CDT, Neil Girdhar <mistersheik at>
> wrote:
> >On Sat, Jun 6, 2015 at 1:00 AM, Nick Coghlan <ncoghlan at>
> >wrote:
> >
> >> On 6 June 2015 at 12:21, Neil Girdhar <mistersheik at> wrote:
> >> > I'm curious what other people will contribute to this discussion as I
> >> > think having no great parsing library is a huge hole in Python.  Having
> >> > one would definitely allow me to write better utilities using Python.
> >>
> >> The design of *Python's* grammar is deliberately restricted to being
> >> parsable with an LL(1) parser. There are a great many static analysis
> >> and syntax highlighting tools that are able to take advantage of that
> >> simplicity because they only care about the syntax, not the full
> >> semantics.
> >>
> >
> >Given the validation that happens, it's not actually LL(1) though.  It's
> >mostly LL(1) with some syntax errors that are raised for various illegal
> >constructs.
> >
> >Anyway, no one is suggesting changing the grammar.
> >
> >
> >> Anyone actually doing their *own* parsing of something else *in*
> >> Python, would be better advised to reach for PLY
> >> ( ). PLY is the parser underlying
> >>, and hence the highly regarded
> >> CFFI library,
> >>
> >> Other notable parsing alternatives folks may want to look at include
> >> and
> >> (both of which allow you to use
> >> Python code to define your grammar, rather than having to learn a
> >> formal grammar notation).
> >>
> >>
> >I looked at ply and pyparsing, but it was impossible to simply parse LaTeX
> >because I couldn't explain to suck up the right number of arguments given
> >the name of the function.  When it sees a function, it learns how many
> >arguments that function needs.  When it sees a function call \a{1}{2}{3},
> >if "\a" takes 2 arguments, then it should only suck up 1 and 2 as
> >arguments, and leave 3 as a regular text token. In other words, I should be
> >able to tell the parser what to expect in code that lives on the rule edges.
> Can't you just hack it into the lexer? When the slash is detected, the
> lexer can treat the following identifier as a function, look up the number
> of required arguments, and push it onto some sort of stack. Whenever a left
> bracket is encountered and another argument is needed by the TOS, it
> returns a special argument opener token.

Your solution is right, but I would implement it in the parser since I want
that kind of generic functionality of dynamic grammar rules to be available

> >
> >The parsing tools you listed work really well until you need to do
> >something like (1) the validation step that happens in Python, or (2)
> >figuring out exactly where the syntax error is (line and column number) or
> >(3) ensuring that whitespace separates some tokens even when it's not
> >required to disambiguate different parse trees.  I got the impression that
> >they wanted to make these languages simple for the simple cases, but they
> >were made too simple and don't allow you to do everything in one simple
> >pass.
> >
> >Best,
> >
> >Neil
> >
> >
> >> Regards,
> >> Nick.
> >>
> >> --
> >> Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia
> >>
> >
> >
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <>

From rymg19 at  Sat Jun  6 20:31:46 2015
From: rymg19 at (Ryan Gonzalez)
Date: Sat, 06 Jun 2015 13:31:46 -0500
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

On June 6, 2015 1:27:14 PM CDT, Neil Girdhar <mistersheik at> wrote:
>On Sat, Jun 6, 2015 at 1:52 PM, Ryan Gonzalez <rymg19 at> wrote:
>> On June 6, 2015 12:29:21 AM CDT, Neil Girdhar <mistersheik at>
>> wrote:
>> >On Sat, Jun 6, 2015 at 1:00 AM, Nick Coghlan <ncoghlan at>
>> >wrote:
>> >
>> >> On 6 June 2015 at 12:21, Neil Girdhar <mistersheik at>
>> >> > I'm curious what other people will contribute to this discussion
>> >I
>> >> think
>> >> > having no great parsing library is a huge hole in Python. 
>> >one
>> >> would
>> >> > definitely allow me to write better utilities using Python.
>> >>
>> >> The design of *Python's* grammar is deliberately restricted to
>> >> parsable with an LL(1) parser. There are a great many static
>> >> and syntax highlighting tools that are able to take advantage of
>> >> simplicity because they only care about the syntax, not the full
>> >> semantics.
>> >>
>> >
>> >Given the validation that happens, it's not actually LL(1) though.
>> >It's
>> >mostly LL(1) with some syntax errors that are raised for various
>> >illegal
>> >constructs.
>> >
>> >Anyway, no one is suggesting changing the grammar.
>> >
>> >
>> >> Anyone actually doing their *own* parsing of something else *in*
>> >> Python, would be better advised to reach for PLY
>> >> ( ). PLY is the parser underlying
>> >>, and hence the highly
>> >> CFFI library,
>> >>
>> >> Other notable parsing alternatives folks may want to look at
>> >> and
>> >> (both of which allow you to use
>> >> Python code to define your grammar, rather than having to learn a
>> >> formal grammar notation).
>> >>
>> >>
>> >I looked at ply and pyparsing, but it was impossible to simply parse LaTeX
>> >because I couldn't explain to suck up the right number of arguments given
>> >the name of the function.  When it sees a function, it learns how many
>> >arguments that function needs.  When it sees a function call \a{1}{2}{3},
>> >if "\a" takes 2 arguments, then it should only suck up 1 and 2 as
>> >arguments, and leave 3 as a regular text token. In other words, I should be
>> >able to tell the parser what to expect in code that lives on the rule
>> >edges.
>> Can't you just hack it into the lexer? When the slash is detected, the
>> lexer can treat the following identifier as a function, look up the number
>> of required arguments, and push it onto some sort of stack. Whenever a left
>> bracket is encountered and another argument is needed by the TOS, it
>> returns a special argument opener token.
>Your solution is right, but I would implement it in the parser since I want
>that kind of generic functionality of dynamic grammar rules to be available

Unless the parsing library doesn't support that. Like PLY. I believe pycparser also uses the lexer to manage type names.
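
To make the lexer-side idea concrete, here is a minimal hand-rolled sketch. It does not use PLY or pycparser. The arity table, the token names (FUNC, ARG_OPEN, ARG_CLOSE, TEXT) and the rule that plain text ends a run of arguments are all assumptions made for illustration.

```python
import re

# Hypothetical arity table: \a takes two arguments.
ARITY = {'a': 2}

# A control word, a single brace, or a run of ordinary text.
TOKEN_RE = re.compile(r'\\([A-Za-z]+)|([{}])|([^\\{}]+)')


def lex(text, arity=ARITY):
    """Yield (kind, value) tokens, marking only the braces a function consumes."""
    pending = [0]  # per nesting level: arguments still owed to the last function
    kinds = []     # for each open brace: 'ARG' or 'TEXT'
    for m in TOKEN_RE.finditer(text):
        name, brace, chunk = m.groups()
        if name is not None:
            pending[-1] = arity.get(name, 0)
            yield ('FUNC', name)
        elif brace == '{':
            if pending[-1] > 0:
                pending[-1] -= 1
                kinds.append('ARG')
                yield ('ARG_OPEN', '{')
            else:
                kinds.append('TEXT')
                yield ('TEXT', '{')
            pending.append(0)  # nothing is owed inside the new group yet
        elif brace == '}':
            kind = kinds.pop() if kinds else 'TEXT'
            if len(pending) > 1:
                pending.pop()
            if kind == 'ARG':
                yield ('ARG_CLOSE', '}')
            else:
                yield ('TEXT', '}')
        else:
            pending[-1] = 0  # assumption: plain text ends the run of arguments
            yield ('TEXT', chunk)
```

With this table, `list(lex(r'\a{1}{2}{3}'))` marks the first two brace groups as arguments and leaves `{3}` as ordinary text tokens.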

>> >
>> >The parsing tools you listed work really well until you need to do
>> >something like (1) the validation step that happens in Python, or (2)
>> >figuring out exactly where the syntax error is (line and column number) or
>> >(3) ensuring that whitespace separates some tokens even when it's not
>> >required to disambiguate different parse trees.  I got the impression that
>> >they wanted to make these languages simple for the simple cases, but they
>> >were made too simple and don't allow you to do everything in one simple
>> >pass.
>> >
>> >Best,
>> >
>> >Neil
>> >
>> >
>> >> Regards,
>> >> Nick.
>> >>
>> >> --
>> >> Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia
>> >>

Sent from my Android device with K-9 Mail. Please excuse my brevity.

From mistersheik at  Sat Jun  6 20:44:38 2015
From: mistersheik at (Neil Girdhar)
Date: Sat, 6 Jun 2015 14:44:38 -0400
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

Ryan: I'm trying to figure out how the parsing library should be done, not
trying to work around other designs.
Stefan: maybe this is a better answer to your question.

So thinking about this more, this is how I think it should be done:

Each grammar rule is expressed as an Iterable.

class FileInput:
    def __init__(self):
        self.indent_level = None

    def match(self):
        while True:
            matched = yield Disjunction(
                [Whitespace(self.indent_level, indent=False), Statement()])
            if matched == '\n':
                break
        yield EndOfFile()

class Suite:
    def __init__(self, indent_level):
        self.indent_level = indent_level

    def match(self):
        yield Disjunction(
            ['\n', Whitespace(self.indent_level, indent=True),
             ...])  # remainder of the rule omitted
        # dedent is not required because the next statement knows its indent
        # level.

On Sat, Jun 6, 2015 at 9:36 AM, s.krah <stefan at> wrote:

> *Neil Girdhar <mistersheik at>* wrote:
> > Along with the grammar, you also give it code that it can execute as it
> matches each symbol in a rule.  In Python for example, as it matches each
> argument passed to a function, it would keep track of the count of *args,
> **kwargs, and  keyword arguments, and regular arguments, and then raise a
> syntax error if it encounters anything out of order.  Right now that check
> is done in validate.c, which is really annoying.
> Agreed.  For 3.4 it was possible to encode these particular semantics into
> the grammar
> itself, but it would no longer be LL(1).
> If I understood correctly, you wanted to handle lexing and parsing
> together.  How
> would the INDENT/DEDENT tokens be generated?
> For my private ast generator, I did the opposite: I wanted to formalize
> the token
> preprocessing step, so I have:
>     lexer -> parser1 (generates INDENT/DEDENT) -> parser2 (generates the
> ast directly)
> It isn't slower than what is in Python right now and you can hook into the
> token stream
> at any place.
> Stefan Krah

From mertz at  Sat Jun  6 21:02:50 2015
From: mertz at (David Mertz)
Date: Sat, 6 Jun 2015 12:02:50 -0700
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jun 5, 2015 at 9:27 PM, Guido van Rossum <guido at> wrote:

> You're putting a lot of faith in "modern" parsers. I don't know if PLY
> qualifies as such, but it certainly is newer than Lex/Yacc, and it unifies
> the lexer and parser. However I don't think it would be much better for a
> language the size of Python.

PLY doesn't really "unify" the lexer and parser; it just provides both of
them in the same Python package (and uses somewhat similar syntax and
conventions for each).

I wrote a project at my last consulting position to process a fairly
complex DSL (used for code generation to several targets, Python, C++,
Verilog, etc.).  I like PLY, and decided to use that tool; but after a
short while I gave up on the parser part of it, and only used the lexing,
leaving parsing to "hand rolled" code.

I'm sure I *could* have managed to shoehorn in the entire EBNF stuff into
the parsing component of PLY.  But for my own purpose, I found it more
important to do various simplifications and modifications of the token
stream before generating the data structures that defined the eventual
output parameters.  So in this respect, what I did is something like a
simpler version of Python's compilation pipeline.

Actually, what I did was probably terrible practice for parsing purists,
but felt to me like the best "practicality beats purity" approach.  There
was a finite number of constructs in the DSL, and I would simply scan
through the token stream, in several passes, trying to identify a
particular construct, then pulling it out into the relevant data structure
type, and just marking those tokens as "used".  Other passes would look for
other constructs, and in some cases I'd need to resolve a reference to one
kind of construct that wasn't generated until a later pass in a
"unification" step.  There was a bit of duct tape and baling wire involved
in all of this, but it actually seemed to keep the code as simple as
possible by isolating the code to generate each type of construct.

None of which is actually relevant to what Python should do in its parsing,
just a little bit of rambling thoughts.

Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.

From guettliml at  Sat Jun  6 22:22:15 2015
From: guettliml at (=?UTF-8?B?VGhvbWFzIEfDvHR0bGVy?=)
Date: Sat, 06 Jun 2015 22:22:15 +0200
Subject: [Python-ideas] Next steps to get type hinting become reality?
In-Reply-To: <>
References: <>
Message-ID: <>

Am 06.06.2015 um 19:13 schrieb Guido van Rossum:
> The plan is to have volunteers produce stubs for the stdlib, and to
> contribute those to typeshed: (that's a
> shared resource and will eventually be transferred to the PSF, i.e.
> If you want to use type annotations in Python 2, there's this hack:

typeshed is referenced in the PEP. But something like your above answer to my question
is missing in the PEP 484.

Why not add a new chapter to the PEP which explains this roadmap?

Should I open an issue for pep 484?

  Thomas Güttler


From abarnert at  Sun Jun  7 00:52:22 2015
From: abarnert at (Andrew Barnert)
Date: Sat, 6 Jun 2015 15:52:22 -0700
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 6, 2015, at 09:23, Neil Girdhar <mistersheik at> wrote:
>> On Sat, Jun 6, 2015 at 3:17 AM, Andrew Barnert <abarnert at> wrote:
>>> On Jun 5, 2015, at 22:50, Neil Girdhar <mistersheik at> wrote:
>>>> On Sat, Jun 6, 2015 at 1:30 AM, Andrew Barnert <abarnert at> wrote:
>>>> First, I think your idea is almost completely tangential to mine. Yes, if you completely replaced both the interface and the implementation of the parser, you could do just about anything you wanted. But assuming nobody is going to completely replace the way Python does parsing today, I think it's still useful to add the one missing useful hook to the existing system. But let's continue.
>>>> On Friday, June 5, 2015 7:08 PM, Neil Girdhar <mistersheik at> wrote:
>>>> On Fri, Jun 5, 2015 at 6:58 PM, Andrew Barnert <abarnert at> wrote:
>>>> >
>>>> If you want more background, see
>>>> (which I wrote to explain to someone else how floatliteralhack works).
>>> Yes.  I want to point out that if the lexer rules were alongside the parser, they would be generating AST nodes, so the hook for calling Decimal for all floating point tokens would be doable in the same way as your AST hook.
>> No. The way Python currently exposes things, the AST hook runs on an already-generated AST and transforms it into another one, to hand off to the code generator. That means it can only be used to handle things that parse as legal Python syntax (unless you replace the entire parser).
>> What I want is a way to similarly take an already-generated token stream and transform it into another one, to hand off to the parser. That will allow it to be used to handle things that lex as legal Python tokens but don't parse as legal Python syntax, like what Paul suggested. Merging lexing into parsing not only doesn't give me that, it makes that impossible.
> Yes, and what I was suggesting is for the lexer to return AST nodes, so it would be fine to process those nodes in the same way.


Tokens don't form a tree, they form a list. Yes, every linked list is just a degenerate tree, so you could have every "node" just include the next one as a child. But why? Do you want to then turn the input text into a tree of character nodes?

Python has all kinds of really good tools for dealing with iterables; why take away those tools and force me to work with a more complicated abstraction that Python doesn't have any tools for dealing with?

In the case of the user-defined literal hack, for example, I can use the adjacent-pairs recipe from itertools and my transformation becomes trivial. I did it more explicitly in the hack I uploaded, using a generator function with a for statement, just to make it blindingly obvious what's happening. But if I had to deal with a tree, I'd either have to write explicit lookahead or store some state explicitly on the tree or the visitor. That isn't exactly _hard_, but it's certainly _harder_, and for no benefit.
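
As a rough illustration of what such a token-stream transformation looks like with today's tools, here is a minimal sketch built on the standard-library tokenize module. It is not the hack referred to above: the function name decimalize is made up for the example, and it only handles float literals written with a decimal point.

```python
import io
import tokenize
from decimal import Decimal


def decimalize(source):
    """Return `source` with float literals rewritten as Decimal('...') calls."""
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NUMBER and '.' in tok.string:
            # Replace one NUMBER token with four tokens: Decimal ( '1.1' )
            result.extend([
                (tokenize.NAME, 'Decimal'),
                (tokenize.OP, '('),
                (tokenize.STRING, repr(tok.string)),
                (tokenize.OP, ')'),
            ])
        else:
            result.append((tok.type, tok.string))
    # With (type, string) pairs, untokenize rebuilds valid source, though the
    # spacing differs from the input.
    return tokenize.untokenize(result)


namespace = {'Decimal': Decimal}
exec(decimalize("x = 1.1 + 2.2\n"), namespace)
print(namespace['x'])  # 3.3, computed exactly with Decimal
```

The transformation is a plain loop over an iterable of tokens, which is the point being made: no tree walking or stored visitor state is needed.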

Also, if we got my change, I could write code that cleanly hooks parsing in 3.6+, but uses the tokenize/untokenize hack for 2.7 and 3.5, so people can at least use it, and all of the relevant and complicated code would be shared between the two versions. With your change, I'd have to write code that was completely different for 3.6+ than what I could backport, meaning I'd have to write, debug, and maintain two completely different implementations. And again, for no benefit.

And finally, once again: we already have a token stream as part of the process, we already expose every other interesting step in the process, exposing the token stream as it exists today in a way that fits into everything else as it exists today is clearly the easiest and least disruptive thing to do. Sometimes it's worth doing harder or more disruptive things because they provide more benefit, but you haven't yet shown any benefit.

You asked me for examples, and I provided them. Why don't you try writing a couple of actual examples--user literals, the LINQ-ish example from MacroPy, whatever--using your proposed design to show us how they could be simpler, or more elegant, or open up further possibilities. Or come up with an example of something your design could do that the existing one (even with my small proposed change) can't.

>>> For the new tokens that you want, the ideal solution I think is to modify the python parsing grammar before it parses the text.
>> But I don't want any new tokens. I just want to change the way existing tokens are interpreted.
>> Just as with an AST hook like MacroPy, I don't want any new nodes, I just want to change the way existing nodes are interpreted.
> Yes, I see *how* you're trying to solve your problem, but my preference is to have one kind of hook rather than two kinds by unifying lexing and parsing.  I think that's more elegant.

I'm trying to find a way to interpret this that makes sense. I think you're suggesting that we should throw out the idea of letting users write and install simple post-processing hooks in Python, because that will force us to find a way to instead make the entire parsing process user-customizable at runtime, which will force users to come up with "more elegant" solutions involving changing the grammar instead of post-processing it macro-style. 

If so, I think that's a very bad idea. Decades of practice in both Python and many other languages (especially those with built-in macro facilities) shows that post-processing at the relevant level is generally simple and elegant. Even if we had a fully-runtime-customizable parser, something like OMeta but "closing the loop" and implementing the language in the programmable metalanguage, many things are still simpler and more elegant written post-processing style (as used by existing import hooks, including MacroPy, and in other languages going all the way back to Lisp), and there's a much lower barrier to learning them, and there's much less risk of breaking the compiler/interpreter being used to run your hook in the first place. And, even if none of that were true, and your new and improved system really were simpler in every case, and you had actually built it rather than just envisioning it, there's still backward compatibility to think of. Do you really want to break working, documented functionality that people have written things like MacroPy on top of, even if forcing them to redesign and rewrite everything from scratch would force them to come up with a "more elegant" solution? And finally, the added flexibility of such a system is a cost as well as a benefit--the fact that Arc makes it as easy as possible to "rewrite the language into one that makes writing your application trivial" also means that one Arc programmer can't understand another's code until putting in a lot of effort to learn his idiosyncratic language.

>>> I don't know about OMeta, but the Earley parsing algorithm is worst-case cubic time, "quadratic time for unambiguous grammars, and linear time for almost all LR(k) grammars".
>> I don't know why you'd want to use Earley for parsing a programming language. IIRC, it was the first algorithm that could handle rampant ambiguity in polynomial time, but that isn't relevant to parsing programming languages (especially one like Python, which was explicitly designed to be simple to parse), and it isn't relevant to natural languages if you're not still in the 1960s, except in learning the theory and history of parsing. GLR does much better in almost-unambiguous/almost-deterministic languages; CYK can be easily extended with weights (which propagate sensibly, so you can use them for a final judgment, or to heuristically prune alternatives as you go); Valiant is easier to reason about mathematically; etc. And that's just among the parsers in the same basic family as Earley.
> I suggested Earley to mitigate this fear of "exponential backtracking" since that won't happen in Earley.

I already explained that using standard PEG with a packrat parser instead of extended PEG with an OMeta-style parser gives you linear time. Why do you think telling me about a decades-older cubic-time algorithm designed for parsing natural languages that's a direct ancestor to two other algorithms I also already mentioned is going to be helpful? Do you not understand the advantages of PEG or GLR over Earley?

From guido at  Sun Jun  7 01:04:19 2015
From: guido at (Guido van Rossum)
Date: Sat, 6 Jun 2015 16:04:19 -0700
Subject: [Python-ideas] Next steps to get type hinting become reality?
In-Reply-To: <>
References: <>
Message-ID: <>

If you really think this should be added to the PEP please submit a pull
request. Don't mention the PY2 thing though (it's unofficial).
On Jun 6, 2015 1:22 PM, "Thomas Güttler" <guettliml at> wrote:

> Am 06.06.2015 um 19:13 schrieb Guido van Rossum:
> > The plan is to have volunteers produce stubs for the stdlib, and to
> > contribute those to typeshed:
> (that's a
> > shared resource and will eventually be transferred to the PSF, i.e.
> >
> >
> > If you want to use type annotations in Python 2, there's this hack:
> >
> typeshed is referenced in the PEP. But something like your above answer to
> my question
> is missing in the PEP 484.
> Why not add a new chapter to the PEP with explains this roadmap?
> Should I open an issue for pep 484?
> Regards,
>   Thomas Güttler

From cgbeutler at  Sun Jun  7 05:03:38 2015
From: cgbeutler at (Cory Beutler)
Date: Sat, 6 Jun 2015 21:03:38 -0600
Subject: [Python-ideas] If branch merging
Message-ID: <>

I recently (1 year ago) realized that 'if', 'elif', and 'else' provide easy
branching for code, but there is no easy way to merge branches of code back
together. One fix for this would be introduction of two new keywords:
'also' and 'alif' (also if)

Here are the definitions of if-chain keywords with the addition of 'also'
and 'alif':
*if   *- execute code if this condition is met
*else *- execute code if no previous condition is met
*elif *- execute code if no previous condition is met and this condition is
met
*also *- execute code if any previous condition is met
*alif *- execute code if any previous condition is met and this condition
is met
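
The definitions above can be emulated in current Python to see how a chain would behave. This is only a sketch of one reading of the proposal: run_chain and the (keyword, condition, action) triples are hypothetical names, and treating an executed 'else' block as a met condition is an assumption.

```python
def run_chain(blocks):
    """Run (keyword, condition, action) triples under the definitions above.

    `condition` is a zero-argument callable; it is ignored (and may be None)
    for 'else' and 'also'.
    """
    any_met = False  # has any earlier block in this chain run?
    for keyword, condition, action in blocks:
        if keyword == 'if':
            run = condition()
        elif keyword == 'elif':
            run = not any_met and condition()
        elif keyword == 'else':
            run = not any_met
        elif keyword == 'alif':
            run = any_met and condition()
        elif keyword == 'also':
            run = any_met
        else:
            raise ValueError(f'unknown keyword: {keyword!r}')
        if run:
            action()
            any_met = True


out = []
a, b, c, d = 1, 2, 2, 3
run_chain([
    ('if',   lambda: a == b, lambda: out.append('a == b')),
    ('elif', lambda: b == c, lambda: out.append('b == c')),
    ('elif', lambda: c == d, lambda: out.append('c == d')),
    ('also', None,           lambda: out.append('foo()')),
])
print(out)  # ['b == c', 'foo()']
```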

This would simplify some logic expressions by allowing the merging of
branched code.

*Examples of use:*
*Duplicate code in if-chains may be reduced:*
# Old Version
if a == b:
    print ('a == b')
    foo()             # <-- duplicate code
elif b == c:
    print ('b == c')
    foo()             # <-- duplicate code
elif c == d:
    print ('c == d')
    foo()             # <-- duplicate code

# New Version
if a == b:
    print ('a == b')
elif b == c:
    print ('b == c')
elif c == d:
    print ('c == d')
also:
    foo()            # <-- No longer duplicated

*Many nested 'if' statements could now be a more linear style:*
# Old Version
if a == b:
    print ('a == b')
    if b == c:
        print ('b == c')
    print ('end if')

# New Version
if a == b:
    print ('a == b')
alif b == c:
    print ('b == c')
also:
    print ('end if')

These two examples are the most common ways this will help code. I have
been writing code samples using these keywords and have found that it
simplifies many other things as well. It does take a bit of getting used
to, though. I have found that it is best to use 'also' and 'alif'
sparingly, as overuse can make some code less flexible and more confusing.

*Selective Branch merging:*
One limitation of the 'also' and 'alif' keywords is the restriction to the
"all of the above" checking. What I mean by that is that there is no way to
pick and choose which branches to merge back together. When using 'also'
and 'alif' you are catching all previous if-branches. One easy way to solve
this would be to allow for named branching. The most simple way to do this
is to save the conditions of each branch into a variable with a name. Here
is an example of merging only select branches together:
# Old Version
if a == b:
    print ('a == b')
elif a == c:
    print ('a == c')
elif a == d:
    print ('a == d')
if (a == b) or (a == d):
    print ('a == b and a == d')

# New Version using 'as' keyword
if a == b as aisb:
    print ('a == b')
elif a == c:
    print ('a == c')
elif a == d as aisd:
    print ('a == d')
alif aisd or aisb:
    print ('a == b and a == d')

NOTE: In the old version, it may be necessary to save off the boolean
expression beforehand if the variables being used are changed in the first
set of conditions. This would not be required in the new version, making
the difference in total code written even larger.

I realize that using the 'as' keyword may not be the best. Using 'as' seems
to suggest that it will only be used in the following block. An alternative
to using the 'as' keyword could be assigning the 'if' to a variable like so:
aisb = if a == b:
This looks a bit gross to me, though. If you think of a better one, I would
love to see it.

The next logical question to ask is that of speed. Will this slow down my
code at all? I happily submit to you that it shouldn't. If done right, this
may even speed things up by a cycle or two (yeah. I know. That is not
enough to fuss over, but I view it as a side benefit.)
When just using 'also' and 'alif', there should be no difference in speed.
The same number of jumps and checks should be done as before. Naming
branches may add an additional assignment operation, but this should be
more than made up for by not having to calculate the condition more than
once. There may be a few cases where this would be slower, but those can be
optimized to result in old-style code.

I am currently learning how the python parser and lexer work in hopes of
making a custom version containing these features. Because of my lack
of knowledge here, I cannot say how it should be implemented in python
specifically. Here is how each if code block should work in basic English:

on True:
    1. Execute code block
    2. Jump to next 'alif', 'also', or end of if-chain
on False:
    1. jump to next 'elif', 'else', or end of if-chain

Note: 'also' blocks can be useful in more places than just the end of a
chain:
if a == b:
    print ('a == b')
elif a == c:
    print ('a == c')
also:
    print ('a == b == c')
else:
    print ('a != b and a != c')

It could also be useful to have more than one 'also' in an if chain.

In contrast, having an 'else' before the end is not so useful. For example,
placing one before an also makes the also a little pointless:
if a == b:
    print ('a == b')
elif a == c:
    print ('a == c')
else:
    print ('a != b and a != c')
also:
    print ('This will always execute')

It would therefore be good to still require that 'else' be the last item in
a chain.

*The End*
Thank you for humoring my idea. I am new to this mailing list, so sorry if
this seems out of line or something. I am currently working with ANSI C to
try to get something similar into the 'C' language, but it could be 20
years before anything comes of that. The gears move awfully slow over there.
Anyway, I look forward to hearing your feedback.

-Cory Beutler

From random832 at  Sun Jun  7 05:40:55 2015
From: random832 at (random832 at
Date: Sat, 06 Jun 2015 23:40:55 -0400
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Jun 6, 2015, at 23:03, Cory Beutler wrote:
> # New Version using 'as' keyword
> if a == b as aisb:
>     print ('a == b')

> I realize that using the 'as' keyword may not be the best. Using 'as'
> seems
> to suggest that it will only be used in the following block. An
> alternative
> to using the 'as' keyword could be assigning the 'if' to a variable like
> so:
> aisb = if a == b:
> This looks a bit gross to me, though. If think of a better one, I would
> love to see it.

Well you could always go with if aisb = a == b.

I'm not sure there is a convincing reason to allow your case (assign
within an if statement) that doesn't also work just as well as a general
argument for assignment expressions.

From ben+python at  Sun Jun  7 06:17:36 2015
From: ben+python at (Ben Finney)
Date: Sun, 07 Jun 2015 14:17:36 +1000
Subject: [Python-ideas] If branch merging
References: <>
Message-ID: <>

Cory Beutler <cgbeutler at> writes:

> This would simplify some logic expressions by allowing the merging of
> branched code.

I don't think you've made the case for that assertion. Your description
is clear, but I can't see how real code would be simplified.

I would like to see some real examples of code that is using existing
syntax, and your proposed syntax, so the merits can be discussed in

Can you provide some real-world code examples, that you believe would be
improved by this change?

 \         "All my life I've had one dream: to achieve my many goals." |
  `\                                           --Homer, _The Simpsons_ |
_o__)                                                                  |
Ben Finney

From abarnert at  Sun Jun  7 06:29:35 2015
From: abarnert at (Andrew Barnert)
Date: Sat, 6 Jun 2015 21:29:35 -0700
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 6, 2015, at 20:03, Cory Beutler <cgbeutler at> wrote:
> I recently(1 year ago) realized that 'if', 'elif', and 'else' provide easy branching for code, but there is no easy way to merge branches of code back together. One fix for this would be introduction of two new keywords: 'also' and 'alif' (also if)

Can you provide a realistic use case for when you'd want this, instead of just a toy example where you check meaningless variables for equality?

Because in practice, almost every time I've wanted complicated elif chains, deeply nested ifs, or anything else like it, it's been easy to either refactor the code into a function, replace the conditionals with a dict, or both. That isn't _always_ true, but I'm having a hard time coming up with an example where I really do need complicated elif chains and your new syntax would help.

> on True:
>     1. Execute code block
>     2. Jump to next 'alif', 'also', or end of if-chain
> on False:
>     1. jump to next 'elif', 'else', or end of if-chain
> Note: 'also' blocks can be useful in more places than just the end of a chain:
> if a == b:
>     print ('a == b')
> elif a == c:
>     print ('a == c')
> also:
>     print ('a == b == c')
> else:
>     print ('a != b and a != c')

This example seems to do the wrong thing. If a=b=2 and c=3, or a=c=3 and b=2, you're going to print "a == b == c" even though that isn't true. 

This implies that maybe it isn't as easy to think through the logic and keep the conditions in your head as you expected, even in relatively simple cases.

From steve at  Sun Jun  7 07:19:41 2015
From: steve at (Steven D'Aprano)
Date: Sun, 7 Jun 2015 15:19:41 +1000
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Jun 06, 2015 at 09:03:38PM -0600, Cory Beutler wrote:

> This would simplify some logic expressions by allowing the merging of
> branched code.
> *Examples of use:*
> *Duplicate code in if-chains may be reduced:*
> # Old Version
> if a == b:
>     print ('a == b')
>     foo()             # <-- duplicate code
> elif b == c:
>     print ('b == c')
>     foo()             # <-- duplicate code
> elif c == d:
>     print ('c == d')
>     foo()             # <-- duplicate code

if a == b:
    print('a == b')
elif b == c:
    print('b == c')
elif c == d:
    print('c == d')

No new syntax required.
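
For readers who want the shared call to run only when one of the branches matched, one way to write it once in today's syntax is a flag. This is a minimal sketch; the function name report and the variable matched are illustrative and not taken from the thread.

```python
def report(a, b, c, d, foo):
    """Print which comparison matched, then call foo() once if any did."""
    matched = True
    if a == b:
        print('a == b')
    elif b == c:
        print('b == c')
    elif c == d:
        print('c == d')
    else:
        matched = False  # no branch ran
    if matched:
        foo()  # written once, runs after whichever branch matched
```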

> *Many nested 'if' statements could now be a more linear style:*
> # Old Version
> if a == b:
>     print ('a == b')
>     if b == c:
>         print ('b == c')
>     print ('end if')

What's wrong with that code? Nesting the code like that follows the 
logic of the code: the b==c test *only* occurs if a==b.

> # New Version
> if a == b:
>     print ('a == b')
> alif b == c:
>     print ('b == c')
> also:
>     print ('end if')

I consider this significantly worse. It isn't clear that the comparison 
between b and c is only made if a == b, otherwise it is entirely 

> These two examples are the most common ways this will help code. I have
> been writing code samples using these keywords and have found that it
> simplifies many other things as well. It does take a bit of getting used
> to, though. I have found that is is best to use 'also' and 'alif'
> sparingly, as overuse can make some code less flexible and more confusing.

You don't say.

Are you aware of any languages with this construct?

What are the rules for combining various if...elif...alif...also...else 

> *Selective Branch merging:*
> One limitation of the 'also' and 'alif' keywords is the restriction to the
> "all of the above" checking. What I mean by that is that there is no way to
> pick and choose which branches to merge back together. When using 'also'
> and 'alif' you are catching all previous if-branches. One easy way to solve
> this would be to allow for named branching. The most simple way to do this
> is to save the conditions of each branch into a variable with a name. Here
> is an example of merging only select branches together:
> # Old Version
> if a == b:
>     print ('a == b')
> elif a == c:
>     print ('a == c')
> elif a == d:
>     print ('a == d')
> if (a == b) or (a == d):
>     print ('a == b and a == d')

That code is wrong. Was that an intentional error? The final branch 
prints that a == b == d, but that's not correct, it runs when either 
a == b or a == d, not just when both are true.

Personally, I would write that as:

if a == b or a == d:
    if a == b:
        print('a == b')
    else:
        print('a == d')
    print('a == b or a == d')
elif a == c:
    print('a == c')

You do end up comparing a and b for equality twice, but worrying about 
that is likely to be premature optimization. It isn't worth adding 
syntax to the language just for the one time in a million that actually 

> # New Version using 'as' keyword
> if a == b as aisb:
>     print ('a == b')
> elif a == c:
>     print ('a == c')
> elif a == d as aisd:
>     print ('a == d')
> alif aisd or aisb:
>     print ('a == b and a == d')

With this "as" proposal, there's no need for "alif":

if a == b as aisb:
    print('a == b')
elif a == c:
    print('a == c')
elif a == d as aisd:
    print('a == d')
if aisd or aisb:
    print('a == b or a == d')

I think this has been proposed before. 
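
For comparison, here is a minimal sketch of the same selective merging in current Python, binding each condition to an ordinary name before the chain starts. The function name report and the returned list of messages are illustrative assumptions; only the names aisb and aisd come from the proposal.

```python
def report(a, b, c, d):
    """Return the messages the if-chain would print, as a list."""
    messages = []
    # Bind the conditions to plain names up front; no new syntax involved.
    aisb = a == b
    aisd = a == d
    if aisb:
        messages.append('a == b')
    elif a == c:
        messages.append('a == c')
    elif aisd:
        messages.append('a == d')
    # The "merged" branch simply reuses the saved booleans.
    if aisb or aisd:
        messages.append('a == b or a == d')
    return messages
```

One difference from the proposed syntax: both comparisons are evaluated eagerly here, even when the first branch is taken, which only matters if a comparison is expensive or has side effects.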

> I realize that using the 'as' keyword may not be the best. Using 'as' seems
> to suggest that it will only be used in the following block.

Not necessarily. Consider "import module as spam".

> I am currently working with ansi C to
> try to get something similar into the 'C' language, but it could be 20
> years before anything comes of that. The gears move awfully slow over there.

One advantage of Python is that the language does evolve more quickly, 
but still, Python is a fairly conservative language. We don't typically 
add new syntax features unless they solve a problem in an elegant 
fashion that cannot be solved easily without it.


From steve at  Sun Jun  7 07:25:00 2015
From: steve at (Steven D'Aprano)
Date: Sun, 7 Jun 2015 15:25:00 +1000
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Jun 06, 2015 at 11:40:55PM -0400, random832 at wrote:

> Well you could always go with if aisb = a == b.

No, that is a terrible design and a source of many bugs in languages 
that allow it.

if a = expr:  ...

Oops, I meant to compare a == expr, instead I assigned the result of the 
expression to a.
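
As a small illustration of the point, Python rejects this spelling at compile time, so the typo cannot slip through silently. The helper name is_valid_python is made up for this sketch; compile() is the standard built-in.

```python
def is_valid_python(source):
    """Return True if the source compiles, False on a SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


# Assignment with "=" is a statement, so it is not accepted as a condition.
typo_accepted = is_valid_python("if a = 1:\n    pass\n")
comparison_accepted = is_valid_python("if a == 1:\n    pass\n")
```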

I'm not convinced that we need to allow name binding in if/elif clauses, 
but if we do, the Pythonic syntax would be

if a == b as aeqb: ...


From bruce at  Sun Jun  7 07:50:52 2015
From: bruce at (Bruce Leban)
Date: Sat, 6 Jun 2015 22:50:52 -0700
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Jun 6, 2015 at 8:03 PM, Cory Beutler <cgbeutler at> wrote:

> *also *- execute code if any previous condition is met
>     <snip>
> Thank you for humoring my idea. I am new to this mailing list, so sorry if
> this seems out of line or something.

Seeing many posts on this list which are repeats of ideas seen many times,
it's nice to see a new idea. I think the difficulty of making this work is
how often you want something only when *all* of the previous conditions are
true yet can't conveniently do it another way (e.g., setting a flag).
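
A minimal sketch of the flag approach mentioned above, emulating the proposed "also" (run extra code if any earlier branch matched). The function name check and the callback parameter foo are assumptions made for the example.

```python
def check(a, b, c, d, foo):
    """Run foo() once if any branch of the chain matched."""
    matched = True
    if a == b:
        print('a == b')
    elif b == c:
        print('b == c')
    elif c == d:
        print('c == d')
    else:
        # No branch matched, so the shared code must be skipped.
        matched = False
    if matched:
        foo()  # plays the role of the proposed "also:" block
    return matched
```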

Your point about writing conditions multiple times is legitimate and happens
frequently. Here's an example of something where there is a similar
difficulty in writing simple code:

if foo.a == 0:
    ...
elif foo.a == 1 and foo.b == 0:
    ...
elif foo.a >= 1 and foo.b >= 0 and foo.c == 0:
    ...
elif ...

This is a generic example but I've written code like this many times and
there is no simple way to say that all the foo.x values don't need to be
computed more than once. Here it is rewritten to avoid recomputation:

foo_a = foo.a
if foo_a == 0:
    ...
else:
    foo_b = foo.b
    if foo_a == 1 and foo_b == 0:
        ...
    else:
        foo_c = foo.c
        if foo_a >= 1 and foo_b >= 0 and foo_c == 0:
            ...
        else:
            ...

Much harder to follow the logic. A simpler example where the same
recomputation happens is:

x = a and a.b and a.b.c and a.b.c.d

which becomes

x = a and a.b

if x: x = x.c

if x: x = x.d
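
The same short-circuit chain can also be wrapped in a small helper so each attribute is looked up only once. This is just a sketch; and_chain is a hypothetical name, and SimpleNamespace is used only to build sample objects.

```python
from types import SimpleNamespace


def and_chain(obj, *names):
    """Mimic `obj and obj.n1 and obj.n1.n2 ...` with one lookup per step."""
    for name in names:
        if not obj:
            # Same result as `and`: stop at the first falsy value.
            return obj
        obj = getattr(obj, name)
    return obj


a = SimpleNamespace(b=SimpleNamespace(c=SimpleNamespace(d=42)))
x = and_chain(a, 'b', 'c', 'd')
```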


--- Bruce

From ncoghlan at  Sun Jun  7 07:59:13 2015
From: ncoghlan at (Nick Coghlan)
Date: Sun, 7 Jun 2015 15:59:13 +1000
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

On 7 June 2015 at 08:52, Andrew Barnert via Python-ideas
<python-ideas at> wrote:
> Also, if we got my change, I could write code that cleanly hooks parsing in
> 3.6+, but uses the tokenize/untokenize hack for 2.7 and 3.5, so people can
> at least use it, and all of the relevant and complicated code would be
> shared between the two versions. With your change, I'd have to write code
> that was completely different for 3.6+ than what I could backport, meaning
> I'd have to write, debug, and maintain two completely different
> implementations. And again, for no benefit.

I don't think I've said this explicitly yet, but I'm +1 on the idea of
making it easier to "hack the token stream". As Andrew has noted, there
are two reasons this is an interesting level to work at for certain
kinds of modifications:

1. The standard Python tokeniser has already taken care of converting
the byte stream into Unicode code points, and the code point stream
into tokens (including replacing leading whitespace with the
structural INDENT/DEDENT tokens)

2. You get to work with a linear stream of tokens, rather than a
precomposed tree of AST nodes that you have to traverse and keep

If all you're wanting to do is token rewriting, or to push the token
stream over a network connection in preference to pushing raw source
code or fully compiled bytecode, a bit of refactoring of the existing
tokeniser/compiler interface to be less file based and more iterable
based could make that easier to work with.
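
For readers unfamiliar with the existing tokenize/untokenize workaround mentioned in the quoted text, here is a rough sketch of token rewriting with the standard tokenize module. The function rename_name and the sample source are assumptions for illustration; this is not the hook being proposed in the thread.

```python
import io
import tokenize


def rename_name(source, old, new):
    """Rewrite every NAME token equal to `old` and return new source text."""
    readline = io.StringIO(source).readline
    rewritten = []
    for tok in tokenize.generate_tokens(readline):
        if tok.type == tokenize.NAME and tok.string == old:
            rewritten.append((tok.type, new))
        else:
            rewritten.append((tok.type, tok.string))
    # With 2-tuples, untokenize ignores positions and regenerates spacing.
    return tokenize.untokenize(rewritten)


hooked = rename_name("x = spam + 1\n", "spam", "eggs")
namespace = {"eggs": 41}
exec(hooked, namespace)
```

The regenerated text is valid Python but its whitespace differs from the original, which is one reason column information is lost with this approach.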


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Sun Jun  7 08:15:14 2015
From: ncoghlan at (Nick Coghlan)
Date: Sun, 7 Jun 2015 16:15:14 +1000
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On 7 June 2015 at 15:50, Bruce Leban <bruce at> wrote:
> This is a generic example but I've written code like this many times and
> there is no simple way to say that all the foo.x values don't need to be
> computed more than once.

This is one of the key powers of JIT compilers like PyPy and Numba -
they can detect that a calculation is repeated and avoid repeating it
when the compiler knows the input values haven't changed.

There is no way any syntax addition can compete for clarity with using
existing already clear syntax and speeding its execution up

> Here it is rewritten to avoid recomputation:
> foo_a = foo.a
> if foo_a == 0:
>    ...
> else:
>     foo_b = foo.b
>     if foo_a == 1 and foo_b == 0:
>         ...
>     else:
>         foo_c = foo.c
>         if foo_a >= 1 and foo_b >= 0 and foo_c == 0:
>             ...
>         else:
>             ...
> Much harder to follow the logic.

It's hard to reason about whether or not logic is difficult to follow
when using metasyntactic variables, as they're never self-documenting.
There's also the fact that this *specific* example is why having
expensive-to-calculate values accessed as attributes without some form
of caching is a bad idea - it encourages folks to make their code
harder to read too early in the development process.
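
A minimal sketch of the caching idea, assuming Python 3.8 or later for functools.cached_property. The class Foo and its counter are invented for the example; the point is only that repeated attribute access in an if-chain does not repeat the computation.

```python
from functools import cached_property


class Foo:
    def __init__(self):
        self.computations = 0

    @cached_property
    def a(self):
        # Stand-in for an expensive calculation; runs once per instance.
        self.computations += 1
        return 1


foo = Foo()
# Three reads of foo.a, as in a chain of if/elif tests.
results = [foo.a == 0, foo.a == 1, foo.a >= 1]
```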


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From stefan at  Sun Jun  7 12:20:29 2015
From: stefan at (s.krah)
Date: Sun, 07 Jun 2015 10:20:29 +0000
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

Steven D'Aprano <steve at> wrote:

On Sat, Jun 06, 2015 at 11:40:55PM -0400, random832 at wrote:
>> Well you could always go with if aisb = a == b.
> No, that is a terrible design and a source of many bugs in languages
> that allow it.
> if a = expr: ...
> Oops, I meant to compare a == expr, instead I assigned the result of the
> expression to a.

In C I've mistyped this perhaps twice, in which case you get a compiler
warning.

It's a complete non-issue (and the construct *is* very handy).

Stefan Krah


From random832 at  Sun Jun  7 12:30:48 2015
From: random832 at (random832 at
Date: Sun, 07 Jun 2015 06:30:48 -0400
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Jun 7, 2015, at 01:59, Nick Coghlan wrote:
> 1. The standard Python tokeniser has already taken care of converting
> the byte stream into Unicode code points, and the code point stream
> into tokens (including replacing leading whitespace with the
> structural INDENT/DEDENT tokens)

Remember that balanced brackets are important for this INDENT/DEDENT
transformation. What should the parser do with indentation in the
presence of a hook that consumes a sequence containing unbalanced or
mixed brackets?

From abarnert at  Sun Jun  7 14:18:04 2015
From: abarnert at (Andrew Barnert)
Date: Sun, 7 Jun 2015 05:18:04 -0700
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 7, 2015, at 03:20, s.krah <stefan at> wrote:
> Steven D'Aprano <steve at> wrote:
> On Sat, Jun 06, 2015 at 11:40:55PM -0400, random832 at wrote: 
> >> Well you could always go with if aisb = a == b. 
> > No, that is a terrible design and a source of many bugs in languages 
> > that allow it. 
> > if a = expr: ... 
> > Oops, I meant to compare a == expr, instead I assigned the result of the 
> > expression to a. 
> In C I've mistyped this perhaps twice

Then you must be an amazing programmer. Or maybe you don't code in C very much. Look through the commit history of any major early C project and you'll find plenty of these errors. Dennis Ritchie made this mistake more than two times just in the Unix source.

Why do you think compilers added the warning? If this really were a non-issue that nobody ever faces in real life, no compiler vendor would have bothered to write a warning that will annoy people far more often than it helps. Or, if someone did just to satisfy some rare clumsy user, nobody else would have copied it.

> in which case you get a compiler warning.

Of course you also get the compiler warning when you use this feature _intentionally_, which means it's actually not usable syntax (unless you like to ignore warnings from the compiler, or pepper your code with pragmas). 

Most compilers let you use some variant on the syntax, typically throwing meaningless extra parentheses around the assignment, to make the warning go away. But this implies that C chose the wrong syntax in the first place.

If I were designing a new C-like language, I'd allow declarations, but not assignments, in the if condition (with the variable only live inside the if statement's scope). That would handle what you want 90% of the time, and usually better than the current rule, and would have no chance of confusing an assignment with a comparison, so the compiler warning would go away.

But of course this is irrelevant to Python, which doesn't have variable declarations (or sub-function scopes). In Python, I think not allowing assignment in an if condition was the right choice.

From rosuav at  Sun Jun  7 14:20:54 2015
From: rosuav at (Chris Angelico)
Date: Sun, 7 Jun 2015 22:20:54 +1000
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Jun 7, 2015 at 8:20 PM, s.krah <stefan at> wrote:
> Steven D'Aprano <steve at> wrote:
> On Sat, Jun 06, 2015 at 11:40:55PM -0400, random832 at wrote:
>>> Well you could always go with if aisb = a == b.
>> No, that is a terrible design and a source of many bugs in languages
>> that allow it.
>> if a = expr: ...
>> Oops, I meant to compare a == expr, instead I assigned the result of the
>> expression to a.
> In C I've mistyped this perhaps twice, in which case you get a compiler
> warning.
> It's a complete non-issue (and the construct *is* very handy).

That's as may be, but Steven's still correct that the Pythonic way to
do it would be with "as". In C, assignment is an expression ("=" is an
operator that mutates its LHS and yields a value), but in Python, it
simply isn't, and making it possible to do assignment in an 'if'
condition would be a big change.

with expr as name:
except expr as name:
if expr as name:

Three parallel ways to do something and capture it. It makes
reasonable sense, if someone can come up with a really compelling
use-case. Personally, I'd be more inclined to seek the same thing for
a while loop:

while get_next_value() as value:
# equivalent to
while True:
    value = get_next_value()
    if not value: break

as that's a somewhat more common idiom; but neither is hugely common.
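
Without new syntax, the loop-and-capture idiom can be packaged as a generator. This is only a sketch; values_until_falsy is a hypothetical helper name, and the sample data stands in for whatever get_next_value would return.

```python
def values_until_falsy(get_next_value):
    """Yield results of get_next_value() until it returns a falsy value."""
    while True:
        value = get_next_value()
        if not value:
            break
        yield value


# Example source of values: successive items of a list.
source = iter([3, 2, 1, 0, 5]).__next__
collected = list(values_until_falsy(source))
```

The for loop over the generator then reads much like the proposed "while ... as value:" form.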


From stefan at  Sun Jun  7 14:55:23 2015
From: stefan at (s.krah)
Date: Sun, 07 Jun 2015 12:55:23 +0000
Subject: [Python-ideas] Wg: Re:  If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

Andrew Barnert abarnert at  wrote:

>> In C I've mistyped this perhaps twice

> Then you must be an amazing programmer. Or maybe you don't code in C very much.

Or maybe I don't pontificate on mailing lists all day long.

> Look through the commit history of any major early C project and you'll find plenty of these errors.

Look through the commit history of CPython ...

Stefan Krah


From stefan at  Sun Jun  7 15:08:10 2015
From: stefan at (s.krah)
Date: Sun, 07 Jun 2015 13:08:10 +0000
Subject: [Python-ideas] Wg: Re:  If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

Chris Angelico <rosuav at> wrote:

>>> Oops, I meant to compare a == expr, instead I assigned the result of the
>>> expression to a.
>> In C I've mistyped this perhaps twice, in which case you get a compiler
>> warning.
>> It's a complete non-issue (and the construct *is* very handy).
> That's as may be, but Steven's still correct that the Pythonic way to
> do it would be with "as".

I agree.  I was mainly responding to the claim that it's a "major source
of bugs" in languages that allow it.

Stefan Krah


From abarnert at  Sun Jun  7 15:05:26 2015
From: abarnert at (Andrew Barnert)
Date: Sun, 7 Jun 2015 06:05:26 -0700
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 6, 2015, at 22:59, Nick Coghlan <ncoghlan at> wrote:
> On 7 June 2015 at 08:52, Andrew Barnert via Python-ideas
> <python-ideas at> wrote:
>> Also, if we got my change, I could write code that cleanly hooks parsing in
>> 3.6+, but uses the tokenize/untokenize hack for 2.7 and 3.5, so people can
>> at least use it, and all of the relevant and complicated code would be
>> shared between the two versions. With your change, I'd have to write code
>> that was completely different for 3.6+ than what I could backport, meaning
>> I'd have to write, debug, and maintain two completely different
>> implementations. And again, for no benefit.
> I don't think I've said this explicitly yet, but I'm +1 on the idea of
> making it easier to "hack the token stream". As Andrew has noted, there
> are two reasons this is an interesting level to work at for certain
> kinds of modifications:
> 1. The standard Python tokeniser has already taken care of converting
> the byte stream into Unicode code points, and the code point stream
> into tokens (including replacing leading whitespace with the
> structural INDENT/DEDENT tokens)

Actually, as I discovered while trying to hack in the change this afternoon, the C tokenizer doesn't actually take care of converting the byte stream. It does take care of detecting the encoding, but what it hands to the parsetok function is still encoded bytes.

The Python wrapper does transparently decode for you (in 3.x), but that actually just makes it harder to feed the output back into the parser, because the parser wants encoded bytes. (Also, as I mentioned before, it would be nice if the Python wrapper could just take Unicode in the first place, because the most obvious place to use this is in an import hook, where you can detect and decode the bytes yourself in a single line, and it's easier to just use the string than to encode it to UTF-8 so the tokenizer can detect UTF-8 so either the Python tokenizer wrapper or the C parser can decode it again...).

Anyway, this part was at least easy to temporarily work around; the stumbling block that prevented me from finishing a working implementation this afternoon is a bit hairier. The C tokenizer hands the parser the current line (which can actually be multiple lines) and start and end pointers to characters within that line. It also hands it the current token string, but the parser ignores that and just reads from line+start to line+end. The Python tokenizer, on the other hand, gives you line number and (Unicode-based) column numbers for start and end. Converting those to encoded-bytes offsets isn't _that_ hard... but those are offsets into the original (encoded) line, so the parser is going to see the value of the original token rather than the token value(s) you're trying to substitute, which defeats the entire purpose.
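
To illustrate the column conversion described above, here is a minimal sketch that turns a Unicode column number into an offset within the encoded line. The function name byte_offset is an assumption, and the sketch deliberately ignores the harder problem (substituted tokens that no longer match the original line).

```python
def byte_offset(line, column, encoding="utf-8"):
    """Convert a character column in `line` to an offset in its encoded bytes."""
    return len(line[:column].encode(encoding))


line = "é = 'x'\n"
# Column 4 is where the string literal starts; "é" takes two bytes in UTF-8.
offset = byte_offset(line, 4)
```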

I was able to implement a hacky workaround using untokenize to fake the current line and provide offsets within that, but that means you get garbage from SyntaxErrors, and all your column numbers--and, worse, all your line numbers, if you add in a multi-line token--are off within the AST and bytecode. (And there may be other problems; those are just the ones I saw immediately when I tried it...)

I think what I'm going to try next is to fork the whole parsetok function and write a version that uses the token's string instead of the substring of the line, and start and stop as offsets instead of pointers. I'm still not sure whether the token string and line should be in tok->encoding, UTF-8, UTF-32, or a PyUnicode object, but I'll figure that out as I do it.... Once I get that working for the wrapped-up token iterator, then I can see if I can reunify it with the existing version for the C tokenizer (without any performance penalty, and without breaking pgen). I'd hate to have two copies of that giant function to keep in sync.

Meanwhile, I'm not sure what to do about tokens that don't have the optional start/stop/line values. Maybe just not allow them (just because untokenize can handle it doesn't mean ast.parse has to), or maybe just untokenize a fake line (and if any SyntaxErrors are ugly and undebuggable, well, don't skip those values...). The latter might be useful if for some reason you wanted to generate tokens on the fly instead of just munging a stream of tokens from source text you have available.

I'm also not sure what to do about a few error cases. For example, if you feed the parser something that isn't iterable, or whose values aren't an iterable of iterables of length 2 to 5 with the right types, that really feels more like a TypeError than a SyntaxError (and that would also be a good way to signal the end user that the bug is in the token stream transformer rather than in the source code...), but raising a TypeError from within the parser requires a bit more refactoring (the tokenizer can't tell you what error to raise, just that the current token is an error along with a tokenizer error code--although I could add an E_NOTOK error code that the parser interprets as "raise a TypeError instead of a SyntaxError"), and I'm not sure whether that would affect any other stuff. Anyway, for the first pass I may just leave it as a SyntaxError, just to get something working.

Finally, it might be nice if it were possible to generate a SyntaxError that showed the original source line but also told you that the tokens don't match the source (again, to signal the end user that he should look at what the hook did to his code, not just his code), but I'm not sure how necessary that is, or how easy it will be (it depends on how I end up refactoring parsetok).

> If all you're wanting to do is token rewriting, or to push the token
> stream over a network connection in preference to pushing raw source
> code or fully compiled bytecode

I didn't think about that use case at all, but that could be very handy.

From abarnert at  Sun Jun  7 15:24:27 2015
From: abarnert at (Andrew Barnert)
Date: Sun, 7 Jun 2015 06:24:27 -0700
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 7, 2015, at 03:30, random832 at wrote:
>> On Sun, Jun 7, 2015, at 01:59, Nick Coghlan wrote:
>> 1. The standard Python tokeniser has already taken care of converting
>> the byte stream into Unicode code points, and the code point stream
>> into tokens (including replacing leading whitespace with the
>> structural INDENT/DEDENT tokens)
> Remember that balanced brackets are important for this INDENT/DEDENT
> transformation. What should the parser do with indentation in the
> presence of a hook that consumes a sequence containing unbalanced or
> mixed brackets?

I'm pretty sure that just doing nothing special here means you get a SyntaxError from the parser. Although I probably need more test cases.

Anyway, this is one of those cases I mentioned where the SyntaxError can't actually show you what's wrong with the code, because the actual source doesn't have an error in it, only the transformed token stream. But there are easier ways to get that--just replace a `None` with a `with` in the token stream and you get an error that shows you a perfectly valid line, with no indication that a hook has screwed things up for you.

I think we can at least detect that the tokens don't match the source line and throw in a note to go look for an installed token-transforming hook. It would be even nicer if we could show what the untokenized line looks like, so the user can see why it's an error. Something like this:

      File "<input>", line 1
        if spam is None:
    SyntaxError: invalid syntax
    Tokens do not match input, parsed as
        if spam is with :    

Of course in the specific case you mentioned of unbalanced parens swallowing a dedent, the output still wouldn't be useful, but I'm not sure what we could show usefully in that case anyway.
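
A rough sketch of how such a mismatch could be detected at the Python level, using the line and position fields that tokenize already attaches to each token. The helper mismatched_tokens is hypothetical and only handles single-line, non-whitespace tokens; it says nothing about how the C parser would do this.

```python
import io
import tokenize


def mismatched_tokens(tokens):
    """Return tokens whose string differs from the source text they point at."""
    bad = []
    for tok in tokens:
        (start_row, start_col), (end_row, end_col) = tok.start, tok.end
        # Skip multi-line tokens and structural ones (NEWLINE, INDENT, ...).
        if start_row != end_row or not tok.string.strip():
            continue
        if tok.line[start_col:end_col] != tok.string:
            bad.append(tok)
    return bad


source = "if spam is None: pass\n"
original = list(tokenize.generate_tokens(io.StringIO(source).readline))
# Simulate a hook that swaps `None` for `with` but keeps the old positions.
hooked = [tok._replace(string="with") if tok.string == "None" else tok
          for tok in original]
```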

From random832 at  Sun Jun  7 16:53:01 2015
From: random832 at (random832 at
Date: Sun, 07 Jun 2015 10:53:01 -0400
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Jun 7, 2015, at 09:24, Andrew Barnert wrote:
> I'm pretty sure that just doing nothing special here means you get a
> SyntaxError from the parser. Although I probably need more test cases.

I'm actually talking about what happens if the _untransformed_ stream
contains an unbalanced bracket that the hook is supposed to eliminate
(and/or supply the balancing one). My mental model of this idea was that
the "lexer" generates the entire untransformed (but including
indent/dedent magic etc) token sequence, then supplies it to the hook.

From random832 at  Sun Jun  7 17:04:03 2015
From: random832 at (random832 at
Date: Sun, 07 Jun 2015 11:04:03 -0400
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Jun 7, 2015, at 08:20, Chris Angelico wrote:
> with expr as name:
> except expr as name:
> if expr as name:
> Three parallel ways to do something and capture it. It makes
> reasonable sense

The problem is, "with" and "except" create a variable whose scope is
limited to the enclosed block. The proposed "if... as..." does not, so
it's misleading.

If we don't like spelling it as = then invent a new operator, maybe :=
or something. Maybe even as.

My wider point, though, was that there's no argument for the
_functionality_ of allowing an assignment of the boolean condition of an
if statement that can't be generalized to allowing inline assignment of
anything (why not e.g. "if (expr as foo) > 5"? That, unlike the boolean,
might even be something that would be useful within the enclosed block.)

From rosuav at  Sun Jun  7 17:29:01 2015
From: rosuav at (Chris Angelico)
Date: Mon, 8 Jun 2015 01:29:01 +1000
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jun 8, 2015 at 1:04 AM, <random832 at> wrote:
> On Sun, Jun 7, 2015, at 08:20, Chris Angelico wrote:
> > with expr as name:
> > except expr as name:
> > if expr as name:
> >
> > Three parallel ways to do something and capture it. It makes
> > reasonable sense
> The problem is, "with" and "except" create a variable whose scope is
> limited to the enclosed block. The proposed "if... as..." does not, so
> it's misleading.

Actually, they don't. A with block tends to create a broad expectation
that the object will be used within that block, but it isn't scoped,
and sometimes there's "one last use" of something outside of the block
- for instance, a Timer context manager which uses __enter__ to start
timing, __exit__ to stop timing, and then has attributes "wall" and
"cpu" to tell you how much wall time and CPU time were used during
that block. There are a lot of context managers that might as well
have disappeared at the end of the with block (open files, psycopg2
cursors (but not connections), thread locks, etc), but they are
technically still around.
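
A minimal sketch of the Timer idea described above, using time.perf_counter and time.process_time from the standard library. The class itself is an illustration, not an existing library API; the attribute names wall and cpu follow the description.

```python
import time


class Timer:
    """Context manager that records wall and CPU time for its block."""

    def __enter__(self):
        self._wall_start = time.perf_counter()
        self._cpu_start = time.process_time()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.wall = time.perf_counter() - self._wall_start
        self.cpu = time.process_time() - self._cpu_start
        return False  # never swallow exceptions


with Timer() as timer:
    total = sum(range(100000))

# "One last use" after the block: the name is still bound.
elapsed = timer.wall
```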

The "except" case is a slightly different one. Yes, the name is valid
only within that block - but it's not a matter of scope, it's a
deliberate unsetting.

>>> e = 2.718281828
>>> try: 1/0
... except Exception as e: pass
>>> e
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'e' is not defined
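
The same contrast can be checked inside functions. In this sketch (function names invented for the example), the name bound by "except ... as" is gone after the block, while the name bound by "with ... as" is still usable.

```python
import io


def except_name_still_bound():
    e = 2.718281828
    try:
        1 / 0
    except Exception as e:
        pass
    try:
        e  # the except clause deleted this name on exit
    except NameError:  # UnboundLocalError is a subclass of NameError
        return False
    return True


def with_name_still_bound():
    with io.StringIO("data") as f:
        pass
    # f is closed, but the name is still bound after the block.
    return f.closed
```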

There's no concept of nested/limited scope here, although I'm sure
that this particular case could be turned into a subscope without
breaking anyone's code (I honestly cannot imagine there being ANY code
that depends on the name getting unset!), if Python ever grows support
for subscopes that aren't associated with nested functions.
Comprehensions do actually create their own scopes, but that's
actually implemented with a function:

>>> e = 2.718281828
>>> [e*2 for e in range(3)]
[0, 2, 4]
>>> e
2.718281828
>>> import dis
>>> dis.dis(lambda: [e*2 for e in range(3)])
  1           0 LOAD_CONST               1 (<code object <listcomp> at 0x7fa4f1dae5d0, file "<stdin>", line 1>)
              3 LOAD_CONST               2 ('<lambda>.<locals>.<listcomp>')
              6 MAKE_FUNCTION            0
              9 LOAD_GLOBAL              0 (range)
             12 LOAD_CONST               3 (3)
             15 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             18 GET_ITER
             19 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             22 RETURN_VALUE

Hence, the most logical way to handle conditions with 'as' clauses is
to have them in the same scope. The special case for exceptions is
because tracebacks would create refloops with the locals, and catching
exceptions is extremely common. Nothing else needs that special case,
so everything else can follow the 'with' block model and leave the
name bound.


From ncoghlan at  Sun Jun  7 17:54:18 2015
From: ncoghlan at (Nick Coghlan)
Date: Mon, 8 Jun 2015 01:54:18 +1000
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On 8 June 2015 at 01:29, Chris Angelico <rosuav at> wrote:
> There's no concept of nested/limited scope here, although I'm sure
> that this particular case could be turned into a subscope without
> breaking anyone's code (I honestly cannot imagine there being ANY code
> that depends on the name getting unset!), if Python ever grows support
> for subscopes that aren't associated with nested functions.

The unsetting of bound exceptions is also merely a language quirk
introduced to cope with the implicit exception chaining introduced in
PEP 3134. The circular reference from the traceback frame back to the
bound exception caused a lot of uncollectable cycles that were
resolved by automatically dropping the frame's reference to the bound


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Sun Jun  7 18:12:28 2015
From: ncoghlan at (Nick Coghlan)
Date: Mon, 8 Jun 2015 02:12:28 +1000
Subject: [Python-ideas] Wg: Re: If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On 7 June 2015 at 22:55, s.krah <stefan at> wrote:
> Andrew Barnert abarnert at  wrote:
>>> In C I've mistyped this perhaps twice
>> Then you must be an amazing programmer. Or maybe you don't code in C very
>> much.
> Or maybe I don't pontificate on mailing lists all day long.

No need for that, folks. (And Stefan, you're definitely on the first
half of Andrew's either/or statement there - not everyone is going to
be aware that you wrote cdecimal, redesigned the memoryview
implementation, etc).

The C/C++ embedded assignment construct *is* a major source of bugs
unless you really know what you're doing, which is why the compiler
developers eventually relented and introduced a warning for it. It's a
construct that trades clarity for brevity, so using it isn't always a
clear win from a maintainability perspective. It's also heavily
reliant on C's behaviour where assignment expressions have a truth
value that matches that of the RHS of the assignment expression, and
the resulting longstanding conventions that have built up to take
advantage of that fact.

In Python, we've never had those conventions, so the alignment between
"truth value you want to test" and "value you want to work with" isn't
anywhere near as strong. In particular, testing for None via bool() is
considered incorrect, while testing for a NULL pointer in C++ with a
truth test is far more common.
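
A small sketch of why the truth test and the None test differ in Python. The function find_index is made up for the example; it returns None for "not found", and a legitimate result of 0 is falsy.

```python
def find_index(items, target):
    """Return the position of target in items, or None if absent."""
    for index, item in enumerate(items):
        if item == target:
            return index
    return None


index = find_index(['a', 'b'], 'a')    # found at position 0
found_by_truth = bool(index)            # wrong answer: 0 is falsy
found_by_identity = index is not None   # the test that was intended
```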

To get the same kind of utility as C/C++ embedded assignments provide,
you really do need arbitrary embedded assignments. Only being able to
name the result of an if or while clause would be too limiting to be
useful, since you'd still need to find other ways to handle the cases
where the value to be tested and the value you want to work with
aren't quite the same.

That's why this particular idea is oft-discussed-but-never-accepted -
there are certain looping and conditional constructs that *do* become
more convoluted in its absence, but the benefit of leaving it out is
that those more convoluted constructs are needed to handle the general
case anyway, so the special case would be an additional thing to learn
that wouldn't actually make the language substantially more expressive
than it already is.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From rosuav at  Sun Jun  7 18:11:21 2015
From: rosuav at (Chris Angelico)
Date: Mon, 8 Jun 2015 02:11:21 +1000
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jun 8, 2015 at 1:54 AM, Nick Coghlan <ncoghlan at> wrote:
> On 8 June 2015 at 01:29, Chris Angelico <rosuav at> wrote:
>> There's no concept of nested/limited scope here, although I'm sure
>> that this particular case could be turned into a subscope without
>> breaking anyone's code (I honestly cannot imagine there being ANY code
>> that depends on the name getting unset!), if Python ever grows support
>> for subscopes that aren't associated with nested functions.
> The unsetting of bound exceptions is also merely a language quirk
> introduced to cope with the implicit exception chaining introduced in
> PEP 3134. The circular reference from the traceback frame back to the
> bound exception caused a lot of uncollectable cycles that were
> resolved by automatically dropping the frame's reference to the bound
> exception.

Right. I know the reason for it, and it's a special case for
exceptions because of the traceback. (Though I'm not sure why
exception chaining causes this. Was that the first place where the
traceback - with its reference to locals - was made a part of the
exception object?) If Python had a concept of nested scopes within
functions, it'd make equal sense to have the "except X as e:" subscope
shadow, rather than overwriting and unsetting, the outer "e". Since
neither that nor the list comprehension is implemented with nested
scopes, I think it's safe to say that "if cond as e:" wouldn't be


From robertc at  Mon Jun  8 00:19:05 2015
From: robertc at (Robert Collins)
Date: Mon, 8 Jun 2015 10:19:05 +1200
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

On 6 June 2015 at 17:00, Nick Coghlan <ncoghlan at> wrote:
> On 6 June 2015 at 12:21, Neil Girdhar <mistersheik at> wrote:
>> I'm curious what other people will contribute to this discussion as I think
>> having no great parsing library is a huge hole in Python.  Having one would
>> definitely allow me to write better utilities using Python.
> The design of *Python's* grammar is deliberately restricted to being
> parsable with an LL(1) parser. There are a great many static analysis
> and syntax highlighting tools that are able to take advantage of that
> simplicity because they only care about the syntax, not the full
> semantics.
> Anyone actually doing their *own* parsing of something else *in*
> Python, would be better advised to reach for PLY
> ( ). PLY is the parser underlying
>, and hence the highly regarded
> CFFI library,
> Other notable parsing alternatives folks may want to look at include
> and
> (both of which allow you to use
> Python code to define your grammar, rather than having to learn a
> formal grammar notation).

Let me just pimp here - I have
written languages in both Parsely (a simple packaging metadata
language) and its predecessor pymeta (in which I wrote pybars -
handlebars.js for python) - and both were good implementations of


Robert Collins <rbtcollins at>
Distinguished Technologist
HP Converged Cloud

From greg.ewing at  Mon Jun  8 00:21:57 2015
From: greg.ewing at (Greg Ewing)
Date: Mon, 08 Jun 2015 10:21:57 +1200
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

random832 at wrote:
> I'm actually talking about what happens if the _untransformed_ stream
> contains an unbalanced bracket that the hook is supposed to eliminate

I'm of the opinion that designing an input language
to require or allow unmatched brackets is a bad
idea. If nothing else, it causes grief for editors
that do bracket matching.


From cgbeutler at  Mon Jun  8 03:06:42 2015
From: cgbeutler at (Cory Beutler)
Date: Sun, 7 Jun 2015 19:06:42 -0600
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

Thank you all for your responses. I didn't realize how much support this
mailing list had.

In response to several responses:

It appears I have hit a soft spot with the 'as' keyword. It seems clear to
me that inlining an assignment confuses scope. With any inline solution,
that confusion will exist. Now, I will say that I do not like 'if aisb = a
== b' because of the potential errors, as others have mentioned. A language
should be written as much for the beginners as the experts, or it will
never live very long. Avoiding absentminded mistakes is always good to do.
There are many other possible solutions from a comma, as in "if a == b,
aisb:", to a custom language addition of a new keyword or operator.
Irregardless of how inline assignment is written, the scope issue will
still exist. As such, it is more important to decide if it is needed first.
The fact that this idea has been brought up before means that it deserves
some research. Perhaps I can do some analytics and return with more info on
where it could be used and if it will actually provide any speed benefits.

Ok, that was a bit of a shotgun response to many remarks. Hopefully it will
suffice. Thanks again for all the feedback.

I would now like to respond to Steven's response directly:

On Sat, Jun 6, 2015 at 11:19 PM, Steven D'Aprano <steve at> wrote:

> On Sat, Jun 06, 2015 at 09:03:38PM -0600, Cory Beutler wrote:
> [...]
> > This would simplify some logic expressions by allowing the merging of
> > branched code.
> >
> > *Examples of use:*
> > *Duplicate code in if-chains may be reduced:*
> > # Old Version
> > if a == b:
> >     print ('a == b')
> >     foo()             # <-- duplicate code
> > elif b == c:
> >     print ('b == c')
> >     foo()             # <-- duplicate code
> > elif c == d:
> >     print ('c == d')
> >     foo()             # <-- duplicate code
> if a == b:
>     print('a == b')
> elif b == c:
>     print('b == c')
> elif c == d:
>     print('c == d')
> foo()
> No new syntax required.
The functionality is not the same. In your example 'foo' gets called even if
none of the conditions are true. The above example only runs 'foo' if it
enters one of the if-blocks.
This basic layout is useful for various parsing and reading operations. It
is nice to check if something fits various conditions, then after the
specific handling, add on finishing details in a 'foo'-like context.
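
For comparison, a minimal sketch of how the "only run foo if a branch was entered" behaviour can be written without new syntax, assuming the chain is wrapped in a function (the names classify and label are hypothetical):

```python
def classify(a, b, c, d, foo):
    """Run foo() only when one of the branches matched."""
    if a == b:
        label = 'a == b'
    elif b == c:
        label = 'b == c'
    elif c == d:
        label = 'c == d'
    else:
        return None      # no branch matched, so foo() is skipped
    foo()                # shared finishing step for every matched branch
    return label

calls = []
print(classify(1, 1, 2, 3, lambda: calls.append('foo')), calls)
print(classify(1, 2, 3, 4, lambda: calls.append('foo')), calls)
```

The early return in the else branch plays the role that the proposed 'also' block would play.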

> > *Many nested 'if' statements could now be a more linear style:*
> > # Old Version
> > if a == b:
> >     print ('a == b')
> >     if b == c:
> >         print ('b == c')
> >     print ('end if')
> What's wrong with that code? Nesting the code like that follows the
> logic of the code: the b==c test *only* occurs if a==b.
> > # New Version
> > if a == b:
> >     print ('a == b')
> > alif b == c:
> >     print ('b == c')
> > also:
> >     print ('end if')
> I consider this significantly worse. It isn't clear that the comparison
> between b and c is only made if a == b, otherwise it is entirely
> skipped.
It may only be worse because you are not used to reading it. This type of
syntax looks simple once you know how the pieces work. I mean, you know
that having multiple if-elif statements will result in only checking
conditions until one passes. The 'also' mentality would be the same, but
backwards.

> > *Selective Branch merging:*
> > One limitation of the 'also' and 'alif' keywords is the restriction to the
> > "all of the above" checking. What I mean by that is that there is no way to
> > pick and choose which branches to merge back together. When using 'also'
> > and 'alif' you are catching all previous if-branches. One easy way to solve
> > this would be to allow for named branching. The most simple way to do this
> > is to save the conditions of each branch into a variable with a name. Here
> > is an example of merging only select branches together:
> > # Old Version
> > if a == b:
> >     print ('a == b')
> > elif a == c:
> >     print ('a == c')
> > elif a == d:
> >     print ('a == d')
> > if (a == b) or (a == d):
> >     print ('a == b and a == d')
> That code is wrong. Was that an intentional error? The final branch
> prints that a == b == d, but that's not correct, it runs when either
> a == b or a == d, not just when both are true.

Yeah, that was a mistype. That is why I shouldn't program fake code late at
night. It does warm my heart to see that 2 people have corrected my fake
code. That means it is easy to learn and understand.

> Personally, I would write that as:
> if a == b or a == d:
>     if a == b:
>         print('a == b')
>     else:
>         print('a == d')
>     print('a == b or a == d')
> elif a == c:
>     print('a == c')
> You do end up comparing a and b for equality twice, but worrying about
> that is likely to be premature optimization. It isn't worth adding
> syntax to the language just for the one time in a million that actually
> matters.
With that rearrangement, you could write it with 'also':
if a == b:
    print('a == b')
elif a == d:
    print('a == d')
also:
    print('a == b or a == d')
elif a == c:
    print('a == c')

but that does not demonstrate the selective branch merging. It would not
work so well to combine the next two 'elif' statements, as any other 'also'
blocks would capture the 'a == b', 'a == d', and first 'also' branches. I
guess what I mean to say is that this example is a little too dumbed down.
I think you know what I am after, though. If 'a == b' is a heavy duty
calculation, it would be nice to be able to store that inline.
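
A minimal sketch of storing the comparison results with ordinary assignments, assuming it is acceptable to evaluate both comparisons up front (the names report, a_is_b and a_is_d are hypothetical):

```python
def report(a, b, c, d):
    """Collect the messages the if-chain would print."""
    out = []
    a_is_b = a == b      # each comparison is evaluated exactly once
    a_is_d = a == d      # note: evaluated eagerly, even if never needed
    if a_is_b:
        out.append('a == b')
    elif a == c:
        out.append('a == c')
    elif a_is_d:
        out.append('a == d')
    if a_is_b or a_is_d:
        out.append('a == b or a == d')
    return out

print(report(1, 1, 0, 0))
print(report(1, 0, 0, 1))
```

The cost is that 'a == d' is computed even when an earlier branch matches, which is exactly what an inline form would avoid.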

Thank you, Steven, for your objective view of things. It has been useful
to see an outside perspective. I look forward to your future input.

From random832 at  Mon Jun  8 03:37:50 2015
From: random832 at (random832 at
Date: Sun, 07 Jun 2015 21:37:50 -0400
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Jun 7, 2015, at 18:21, Greg Ewing wrote:
> random832 at wrote:
> > I'm actually talking about what happens if the _untransformed_ stream
> > contains an unbalanced bracket that the hook is supposed to eliminate
> I'm of the opinion that designing an input language
> to require or allow unmatched brackets is a bad
> idea. If nothing else, it causes grief for editors
> that do bracket matching.

Suppose one of the brackets is quoted somehow, or mismatched bracket
pairs such as [this), perhaps including tokens that are not normally
considered brackets, are somehow meaningful.

From random832 at  Mon Jun  8 03:40:00 2015
From: random832 at (random832 at
Date: Sun, 07 Jun 2015 21:40:00 -0400
Subject: [Python-ideas] If branch merging
Message-ID: <>

On Sun, Jun 7, 2015, at 21:06, Cory Beutler wrote:
> Thank you all for your responses. I didn't realize how much support this
> mailing list had.
> In response to several responses:
> It appears I have hit a soft spot with the 'as' keyword. 

I don't have an issue with the as keyword, I was just pointing out that
it disguises the fact that what you're really asking for seems to be
general assignment expressions, since there is no particular rationale
to constrain it to the boolean condition of if statements.

From ncoghlan at  Mon Jun  8 04:08:46 2015
From: ncoghlan at (Nick Coghlan)
Date: Mon, 8 Jun 2015 12:08:46 +1000
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On 8 Jun 2015 11:07, "Cory Beutler" <cgbeutler at> wrote:
> Thank you all for your responses. I didn't realize how much support this
> mailing list had.
> In response to several responses:
> It appears I have hit a soft spot with the 'as' keyword. It seems clear
> to me that inlining an assignment confuses scope. With any inline solution,
> that confusion will exist.

Not really, as we have a number of inline assignment and renaming
constructs, and they all use "as" (import, with statements, exception
handlers). For loops, function definitions and class definitions also help
establish the behaviour of name bindings in compound statement header lines
affecting the containing scope rather than only affecting the internal

The exception handler case is the odd one out, since that includes an
implied "del" whenever execution leaves the contained suite.
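
A minimal sketch of that difference under Python 3 semantics (the function names are hypothetical). A name bound by a for loop header stays bound in the containing scope, while a name bound by "except ... as" is unbound again when the handler exits:

```python
def name_after_for_loop():
    for item in [1, 2, 3]:
        pass
    return item                  # still bound after the loop

def name_after_except():
    e = "outer value"
    try:
        raise ValueError("boom")
    except ValueError as e:
        pass                     # implicit "del e" when this suite exits
    try:
        return e
    except UnboundLocalError:    # the earlier binding is gone too
        return "unbound"

print(name_after_for_loop())
print(name_after_except())
```

Note that the handler does not shadow the outer binding of "e"; it overwrites it and then removes it.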

Any form of inline assignment that doesn't use "as NAME" will need a good

(It's also worth noting that "as" clauses are specifically for binding to a
name, while the LHS of an assignment statement allows attributes, indexing,
slicing and tuple unpacking)

> Avoiding absentminded mistakes is always good to do. There are many other
> possible solutions from a comma, as in "if a == b, aisb:", to a custom
> language addition of a new keyword or operator.

Commas are generally out, due to the ambiguity with tuple construction.

> Irregardless of how inline assignment is written, the scope issue will
> still exist. As such, it is more important to decide if it is needed first.
> The fact that this idea has been brought up before means that it deserves
> some research. Perhaps I can do some analytics and return with more info on
> where it could be used and if it will actually provide any speed benefits.

In this particular case, the variant that has always seemed most attractive
to me in past discussions is a general purpose "named subexpression"
construct that's just a normal local name binding operation affecting
whatever namespace the expression is executed in.

In the simple if statement case, it wouldn't be much different from having
a separate assignment statement before the if statement, but in a while
loop it would be executed on each iteration, in an elif it could make the
results of subcalculations available to subsequent elif clauses without
additional nesting, and in the conditional expression and comprehension
cases it could make part of the condition calculation available to the
result calculation.
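
A minimal sketch of the comprehension case as it has to be written without such a construct, assuming a stand-in calculation (the names expensive and values are hypothetical). Sharing part of the condition with the result means threading it through a nested generator expression:

```python
def expensive(x):
    """Stand-in for a calculation we only want to run once per item."""
    return x * x - 4

values = [1, 2, 3]

# The inner generator names the intermediate result "y" so that both the
# filter condition and the output expression can use it.
results = [y for y in (expensive(x) for x in values) if y > 0]
print(results)
```

A named subexpression would let the condition bind "y" directly instead of requiring the extra generator layer.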

It would certainly be possible for folks to go overboard with such a
construct and jam way too much into a single expression for it to be
readable, but that's already the case today, and the way to handle it would
remain the same: refactoring the relevant code to make it easier for
readers to follow and hence maintain.


From mistersheik at  Mon Jun  8 04:20:06 2015
From: mistersheik at (Neil Girdhar)
Date: Sun, 7 Jun 2015 22:20:06 -0400
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Jun 6, 2015 at 6:52 PM, Andrew Barnert <abarnert at> wrote:

> On Jun 6, 2015, at 09:23, Neil Girdhar <mistersheik at> wrote:
> On Sat, Jun 6, 2015 at 3:17 AM, Andrew Barnert <abarnert at> wrote:
>> On Jun 5, 2015, at 22:50, Neil Girdhar <mistersheik at> wrote:
>> On Sat, Jun 6, 2015 at 1:30 AM, Andrew Barnert <abarnert at>
>> wrote:
>>> First, I think your idea is almost completely tangential to mine. Yes,
>>> if you completely replaced both the interface and the implementation of the
>>> parser, you could do just about anything you wanted. But assuming nobody is
>>> going to completely replace the way Python does parsing today, I think it's
>>> still useful to add the one missing useful hook to the existing system. But
>>> let's continue.
>>> On Friday, June 5, 2015 7:08 PM, Neil Girdhar <mistersheik at>
>>> wrote:
>>> On Fri, Jun 5, 2015 at 6:58 PM, Andrew Barnert <abarnert at>
>>> wrote:
>>> >
>>> If you want more background, see
>>> (which I wrote to explain to someone else how floatliteralhack works).
>> Yes.  I want to point out that if the lexer rules were alongside the parser,
>> they would be generating ast nodes, so the hook for calling Decimal for
>> all floating point tokens would be doable in the same way as your AST hook.
>> No. The way Python currently exposes things, the AST hook runs on an
>> already-generated AST and transforms it into another one, to hand off to
>> the code generator. That means it can only be used to handle things that
>> parse as legal Python syntax (unless you replace the entire parser).
>> What I want is a way to similarly take an already-generated token stream
>> and transform it into another one, to hand off to the parser. That will
>> allow it to be used to handle things that lex as legal Python tokens but
>> don't parse as legal Python syntax, like what Paul suggested. Merging
>> lexing into parsing not only doesn't give me that, it makes that impossible.
> Yes, and what I was suggesting is for the lexer to return AST nodes, so
> it would be fine to process those nodes in the same way.
> Seriously?
> Tokens don't form a tree, they form a list. Yes, every linked list is just
> a degenerate tree, so you could have every "node" just include the next one
> as a child. But why? Do you want to then turn the input text into a tree of
> character nodes?

When the work done by the lexer is done by the parser, the characters
contained in a lexical node will be siblings.  Typically, in Python a tree
is represented as nodes with iterable children, so the characters would
just be a string.

> Python has all kinds of really good tools for dealing with iterables; why
> take away those tools and force me to work with a more complicated
> abstraction that Python doesn't have any tools for dealing with?

The stream would still be just an iterable.

> In the case of the user-defined literal hack, for example, I can use the
> adjacent-pairs recipe from itertools and my transformation becomes trivial.
> I did it more explicitly in the hack I uploaded, using a generator function
> with a for statement, just to make it blindingly obvious what's happening.
> But if I had to deal with a tree, I'd either have to write explicit
> lookahead or store some state explicitly on the tree or the visitor. That
> isn't exactly _hard_, but it's certainly _harder_, and for no benefit.
> Also, if we got my change, I could write code that cleanly hooks parsing
> in 3.6+, but uses the tokenize/untokenize hack for 2.7 and 3.5, so people
> can at least use it, and all of the relevant and complicated code would be
> shared between the two versions. With your change, I'd have to write code
> that was completely different for 3.6+ than what I could backport, meaning
> I'd have to write, debug, and maintain two completely different
> implementations. And again, for no benefit.
> And finally, once again: we already have a token stream as part of the
> process, we already expose every other interesting step in the process,
> exposing the token stream as it exists today in a way that fits into
> everything else as it exists today is clearly the easiest and least
> disruptive thing to do. Sometimes it's worth doing harder or more
> disruptive things because they provide more benefit, but you haven't yet
> shown any benefit.
You would still be able to do all this stuff.
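
For reference, a minimal sketch of the token-stream post-processing being discussed, modelled on the Decimal example in the standard tokenize documentation. The function name decimalize is hypothetical, and this is the tokenize/untokenize round trip rather than a real hook between lexer and parser; it only catches float literals that contain a ".":

```python
import io
import tokenize
from decimal import Decimal

def decimalize(source):
    """Rewrite float literals in `source` as Decimal('...') calls."""
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NUMBER and "." in tok.string:
            # Replace one NUMBER token with four tokens: Decimal ( '...' )
            result.extend([
                (tokenize.NAME, "Decimal"),
                (tokenize.OP, "("),
                (tokenize.STRING, repr(tok.string)),
                (tokenize.OP, ")"),
            ])
        else:
            result.append((tok.type, tok.string))
    return tokenize.untokenize(result)

namespace = {"Decimal": Decimal}
exec(decimalize("total = 0.1 + 0.2\n"), namespace)
print(namespace["total"])
```

The transformation itself is a plain loop over an iterable of tokens, which is the property the token-stream proposal relies on.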

> You asked me for examples, and I provided them. Why don't you try writing
> a couple of actual examples--user literals, the LINQ-ish example from
> MacroPy, whatever--using your proposed design to show us how they could be
> simpler, or more elegant, or open up further possibilities. Or come up with
> an example of something your design could do that the existing one (even
> with my small proposed change) can't.
If I find time, I'll do that.  I will explain my solution in another

> For the new tokens that you want, the ideal solution I think is to modify
>> the python parsing grammar before it parses the text.
>> But I don't want any new tokens. I just want to change the way existing
>> tokens are interpreted.
>> Just as with an AST hook like PyMacro, I don't want any new nodes, I just
>> want to change the way existing nodes are interpreted.
> Yes, I see *how* you're trying to solve your problem, but my preference is
> to have one kind of hook rather than two kinds by unifying lexing and
> parsing.  I think that's more elegant.
> I'm trying to find a way to interpret this that makes sense. I think
> you're suggesting that we should throw out the idea of letting users write
> and install simple post-processing hooks in Python, because that will force
> us to find a way to instead make the entire parsing process
> user-customizable at runtime, which will force users to come up with "more
> elegant" solutions involving changing the grammar instead of
> post-processing it macro-style.
> If so, I think that's a very bad idea. Decades of practice in both Python
> and many other languages (especially those with built-in macro facilities)
> shows that post-processing at the relevant level is generally simple and
> elegant. Even if we had a fully-runtime-customizable parser, something like
> OMeta but "closing the loop" and implementing the language in the
> programmable metalanguage, many things are still simpler and more elegant
> written post-processing style (as used by existing import hooks, including
> MacroPy, and in other languages going all the way back to Lisp), and
> there's a much lower barrier to learning them, and there's much less risk
> of breaking the compiler/interpreter being used to run your hook in the
> first place. And, even if none of that were true, and your new and improved
> system really were simpler in every case, and you had actually built it
> rather than just envisioning it, there's still backward compatibility to
> think of. Do you really want to break working, documented functionality
> that people have written things like MacroPy on top of, even if forcing
> them to redesign and rewrite everything from scratch would force them to
> come up with a "more elegant" solution? And finally, the added flexibility
> of such a system is a cost as well as a benefit--the fact that Arc makes it
> as easy as possible to "rewrite the language into one that makes writing
> your application trivial" also means that one Arc programmer can't
> understand another's code until putting in a lot of effort to learn his
> idiosyncratic language.

I understand that you are motivated by a specific problem.  However, your
solution does not solve the general problem.  If you only allow
transformations of the token stream, the token set is fixed.
Transformations of the token stream also hide, even in your example, the
fact that you're actually building what is conceptually a subtree.

It makes more sense to me to solve the problem in general, once and for
all.  (1) Make it easy to change the grammar, and (2) make lexing part of
the grammar.  Now, you don't have to change the grammar to solve some
problems.  Sometimes, you can just use AST transformers to accomplish what
you were doing with a lexical transformer.  That's nice because it's one
less thing to learn.

Sometimes, you need the power of changing the grammar.  That is already
coming with the popularity of languages like Theano.  I really want to
transform Python code into Theano, and for that it may be more elegant to
change the grammar.

> I don't know about OMeta, but the Earley parsing algorithm is worst-case
>> cubic time, "quadratic time for unambiguous grammars, and linear time for
>> almost all LR(k) grammars".
>> I don't know why you'd want to use Earley for parsing a programming
>> language. IIRC, it was the first algorithm that could handle rampant
>> ambiguity in polynomial time, but that isn't relevant to parsing
>> programming languages (especially one like Python, which was explicitly
>> designed to be simple to parse), and it isn't relevant to natural languages
>> if you're not still in the 1960s, except in learning the theory and history
>> of parsing. GLR does much better in almost-unambiguous/almost-deterministic
>> languages; CYK can be easily extended with weights (which propagate
>> sensibly, so you can use them for a final judgment, or to heuristically
>> prune alternatives as you go); Valiant is easier to reason about
>> mathematically; etc. And that's just among the parsers in the same basic
>> family as Earley.
> I suggested Earley to mitigate this fear of "exponential backtracking"
> since that won't happen in Earley.
> I already explained that using standard PEG with a packrat parser instead
> of extended PEG with an OMeta-style parser gives you linear time. Why do
> you think telling me about a decades-older cubic-time algorithm designed
> for parsing natural languages that's a direct ancestor to two other
> algorithms I also already mentioned is going to be helpful? Do you not
> understand the advantages of PEG or GLR over Earley?

From steve at  Mon Jun  8 04:18:30 2015
From: steve at (Steven D'Aprano)
Date: Mon, 8 Jun 2015 12:18:30 +1000
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Jun 07, 2015 at 07:06:42PM -0600, Cory Beutler wrote:

> The functionality is not the same. In your example 'foo' gets called even if
> none of the conditions are true. The above example only runs 'foo' if it
> enters one of the if-blocks.

Ah yes, of course you are correct.

> > > *Many nested 'if' statements could now be a more linear style:*
> > > # Old Version
> > > if a == b:
> > >     print ('a == b')
> > >     if b == c:
> > >         print ('b == c')
> > >     print ('end if')
> >
> > What's wrong with that code? Nesting the code like that follows the
> > logic of the code: the b==c test *only* occurs if a==b.
> >
> >
> > > # New Version
> > > if a == b:
> > >     print ('a == b')
> > > alif b == c:
> > >     print ('b == c')
> > > also:
> > >     print ('end if')
> >
> > I consider this significantly worse. It isn't clear that the comparison
> > between b and c is only made if a == b, otherwise it is entirely
> > skipped.
> >
> >
> It may only be worse because you are not used to reading it. This type of
> syntax looks simple once you know how the pieces work. I mean, you know
> that having multiple if-elif statements will result in only checking
> conditions until one passes. The 'also' mentality would be the same, but
> backwards.

In the first case, your b==c test only occurs if a==b, which can be 
easily seen from the structure of the code:

if a == b:
    everything here occurs only when a == b
    including the b == c test

In the second case, there is no hint from the structure:

if a == b:
alif b == c:

As you read down the left hand column, you see "if a == b" and you can 
mentally say "that block only occurs if a == b" and move on. But when 
you get to the alif block, you have to stop reading forward and go back 
up to understand whether it runs or not.

It's not like elif, which is unaffected by any previous if or elif
clauses. Each if/elif clause is independent. The test is always made 
(assuming execution reaches that line of code at all), and you can 
decide whether the block is entered or not by looking at the if/elif 
line alone:

elif some_condition():

Here, nothing above the "elif" line matters. If I reach that line, 
some_condition() *must* be evaluated, and the block entered if it 
evaluates to a truthy value. It's easy to understand. But:

alif some_condition():

I cannot even tell whether some_condition() is called or not. The 
structure gives no hint as to whether the alif line is reachable. It 
looks like it is at the same semantic level as the distant "if" 
line somewhere far above it, but it isn't. Whether it runs or 
not is dependent on the distant "if" and "elif" lines above it.

By its nature, this cannot be simple, since it introduces coupling
between the alif line you are reading and one or more distant lines 
above it, while disguising the structure of the code by aligning the 
alif with the if even though it is conceptually part of the if block.


From mistersheik at  Mon Jun  8 04:23:59 2015
From: mistersheik at (Neil Girdhar)
Date: Sun, 7 Jun 2015 22:23:59 -0400
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Jun 7, 2015 at 1:59 AM, Nick Coghlan <ncoghlan at> wrote:

> On 7 June 2015 at 08:52, Andrew Barnert via Python-ideas
> <python-ideas at> wrote:
> > Also, if we got my change, I could write code that cleanly hooks parsing
> in
> > 3.6+, but uses the tokenize/untokenize hack for 2.7 and 3.5, so people
> can
> > at least use it, and all of the relevant and complicated code would be
> > shared between the two versions. With your change, I'd have to write code
> > that was completely different for 3.6+ than what I could backport,
> meaning
> > I'd have to write, debug, and maintain two completely different
> > implementations. And again, for no benefit.
> I don't think I've said this explicitly yet, but I'm +1 on the idea of
> making it easier to "hack the token stream". As Andrew has noted, there
> are two reasons this is an interesting level to work at for certain
> kinds of modifications:
> 1. The standard Python tokeniser has already taken care of converting
> the byte stream into Unicode code points, and the code point stream
> into tokens (including replacing leading whitespace with the
> structural INDENT/DEDENT tokens)

I will explain in another message how to replace the indent and dedent
tokens so that the lexer loses most of its "magic" and becomes just like
the parser.
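
For context, a minimal sketch showing the structural INDENT/DEDENT tokens that the standard tokenizer currently produces (the helper name indent_tokens is hypothetical):

```python
import io
import tokenize

def indent_tokens(source):
    """Return the names of the INDENT/DEDENT tokens found in `source`."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [
        tokenize.tok_name[tok.type]
        for tok in tokens
        if tok.type in (tokenize.INDENT, tokenize.DEDENT)
    ]

print(indent_tokens("if x:\n    y = 1\nz = 2\n"))
```

The leading whitespace never reaches the parser as characters; it arrives only as these two token types.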

> 2. You get to work with a linear stream of tokens, rather than a
> precomposed tree of AST nodes that you have to traverse and keep
> consistent

The AST nodes would contain within them the linear stream of tokens that
you are free to work with.  The AST also encodes the structure of the
tokens, which can be very useful if only to debug how the tokens are being
parsed.  You might find yourself, when doing a more complicated lexical
transformation, trying to reverse engineer where the parse tree nodes begin
and end in the token stream.  This would be a nightmare.  This is the main
problem with trying to process the token stream "blind" to the parse tree.

> If all you're wanting to do is token rewriting, or to push the token
> stream over a network connection in preference to pushing raw source
> code or fully compiled bytecode, a bit of refactoring of the existing
> tokeniser/compiler interface to be less file based and more iterable
> based could make that easier to work with.

You can still do all that with the tokens included in the parse tree.

> Cheers,
> Nick.
> --
> Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From stephen at  Mon Jun  8 04:33:59 2015
From: stephen at (Stephen J. Turnbull)
Date: Mon, 08 Jun 2015 11:33:59 +0900
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

Cory Beutler writes:

 > It may only be worse because you are not used to reading it. This
 > type of syntax looks simple once you know how the pieces work. I
 > mean, you know that having multiple if-elif statements will result
 > in only checking conditions until one passes. The 'also' mentality
 > would be the same, but backwards.

And that inversion is what underlies Steven's point, I think.  I see
your point, *but only if 'elif' goes away*.  Currently the
"hangindent" formatting of if ... elif ... else signals a series of
alternatives, as similar formatting does (as a convention, rather than
syntax) in many other languages.  This makes scanning either actions
or conditions fairly easy; you don't have to actually read the "elif"s
to understand the alternative structure.  With also and alif, you now
have to not only read the keywords, you have to parse the code to
determine what conditions are actually in force.  This is definitely a
readability minus, a big one.

It doesn't help that "else" and "also" and "elif" and "alif" are
rather visually confusable pairs, but at this point that's a bikeshed
painting issue (except that as proponent you might want to paint it a
different color for presentation).

There's also the "dangling also" issue: I would suppose that also has
all the problems of "dangling else", and some new ones besides.  For
example, since "elif" really is "else if" (not a C-like "case"), it's
easy to imagine situations where you'd like to have one also or alif
for the first three cases, and one for the next two, etc.

Python, being a language for grownups, could always add a convention
that you should generally only use also and alif at the end of an if
... elif ... else series or something like that, but I think that
would seriously impair the usefulness of these constructs.

I'm definitely -1 on the also, alif syntax at this point.  On the
other hand, having done a lot of C programming in my misspent youth, I
do miss anaphoric conditionals, so I too would like to see the
possibility of "if cond as var: do_something_with_var" explored.  Of
course Nick is right that automatic common subexpression elimination
(CSE) is the big win, but manual CSE can improve readability.
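
For concreteness, here is a minimal sketch of the manual CSE pattern that
an "if cond as var" form would abbreviate.  The names VERSION_RE and
parse_version are made up for illustration; this is ordinary current
Python, not proposed syntax.

```python
import re

# Hypothetical pattern and function names, for illustration only.
VERSION_RE = re.compile(r"(\d+)\.(\d+)")

def parse_version(text):
    # Manual common subexpression elimination: bind the match once,
    # then test and use the same name.
    m = VERSION_RE.match(text)
    if m:
        return int(m.group(1)), int(m.group(2))
    return None
```

The hypothetical spelling would fold the binding into the test, roughly
"if VERSION_RE.match(text) as m:", leaving the body unchanged.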

From mistersheik at  Mon Jun  8 04:37:36 2015
From: mistersheik at (Neil Girdhar)
Date: Sun, 7 Jun 2015 22:37:36 -0400
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

To my eyes, the best parsing library I could find in Python is modgrammar:

It's GLR I think.  The documentation isn't bad and the syntax isn't too bad.

The major change that I want to make to it is to replace the grammar class
variables with regular instance generator methods, and to replace the
components of the grammar return value, which are currently classes, with
constructed objects.  That way, a whitespace object that represents a block
continuation can be constructed to know how much whitespace it must match.
Similarly, a "suite" can include a constructed whitespace object that
includes extra space.  After it's matched, it can be queried for its size,
and the grammar generator method can construct whitespace objects with the
appropriate size.  This eliminates the need for INDENT and DEDENT tokens.

This kind of dynamic grammar generation is desirable for all kinds of other
language related problems, like the LaTeX one I discussed, and it also
allows us to merge all of the validation code into the parsing code, which
follows "Don't Repeat Yourself".  I think it's a better design.
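
A rough sketch of the idea, written by hand rather than with modgrammar
(none of these names come from that library): each suite is parsed by a
call that is told how much leading whitespace it must match, and the
width of a nested suite is measured once and passed down.  It assumes
space-only indentation and no blank lines, and it simply stops at a line
whose indentation matches no enclosing suite.

```python
def indent_of(line):
    """Return the number of leading spaces on a line."""
    return len(line) - len(line.lstrip(" "))

def parse_suite(lines, pos, width):
    """Parse consecutive lines indented by exactly `width` spaces.

    Returns (nodes, next_pos).  Each node is (text, children), where
    children is the nested suite found under that line.
    """
    nodes = []
    while pos < len(lines) and indent_of(lines[pos]) == width:
        text = lines[pos].strip()
        pos += 1
        children = []
        # A deeper line opens a nested suite; its width is measured
        # once and then passed down as the width that suite must match.
        if pos < len(lines) and indent_of(lines[pos]) > width:
            children, pos = parse_suite(lines, pos, indent_of(lines[pos]))
        nodes.append((text, children))
    return nodes, pos

source = [
    "if a:",
    "    b",
    "    if c:",
    "        d",
    "e",
]
tree, end = parse_suite(source, 0, 0)
```

No INDENT or DEDENT markers appear anywhere; the nesting falls out of
the width each call was constructed with.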

I will try to find time to build a demo of this this week.

Ultimately, my problem with "token transformers" is, if I'm understanding
correctly, that we want to change Python so that not only will 3.5 have
Token transformers, but every Python after that has to support this.  This
risks constraining the development of the elegant solution.  And for what
major reason do we even need token transformers so soon?  For a toy example
on python ideas about automatic Decimal instances?  Why can't a user define
a one character function "d(x)" to do the conversion everywhere?  I prefer
to push for the better design even if it means waiting a year.



On Sun, Jun 7, 2015 at 6:19 PM, Robert Collins <robertc at> wrote:

> On 6 June 2015 at 17:00, Nick Coghlan <ncoghlan at> wrote:
> > On 6 June 2015 at 12:21, Neil Girdhar <mistersheik at> wrote:
> >> I'm curious what other people will contribute to this discussion as I
> think
> >> having no great parsing library is a huge hole in Python.  Having one
> would
> >> definitely allow me to write better utilities using Python.
> >
> > The design of *Python's* grammar is deliberately restricted to being
> > parsable with an LL(1) parser. There are a great many static analysis
> > and syntax highlighting tools that are able to take advantage of that
> > simplicity because they only care about the syntax, not the full
> > semantics.
> >
> > Anyone actually doing their *own* parsing of something else *in*
> > Python, would be better advised to reach for PLY
> > ( ). PLY is the parser underlying
> >, and hence the highly regarded
> > CFFI library,
> >
> > Other notable parsing alternatives folks may want to look at include
> > and
> > (both of which allow you to use
> > Python code to define your grammar, rather than having to learn a
> > formal grammar notation).
> Let me just pimp here - I have
> written languages in both Parsely (a simple packaging metadata
> language) and its predecessor pymeta (in which I wrote pybars -
> handlebars.js for python) - and both were good implementations of
> OMeta, IMNSHO.
> -Rob
> --
> Robert Collins <rbtcollins at>
> Distinguished Technologist
> HP Converged Cloud

From stephen at  Mon Jun  8 04:42:23 2015
From: stephen at (Stephen J. Turnbull)
Date: Mon, 08 Jun 2015 11:42:23 +0900
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

Nick Coghlan writes:

 > (It's also worth noting that "as" clauses are specifically for binding to a
 > name, while the LHS of an assignment statement allows attributes, indexing,
 > slicing and tuple unpacking)

+1 (and the point that it's a *binding*, not an assignment, deserves a
lot more than a parenthesized aside).

 > In this particular case, the variant that has always seemed most attractive
 > to me in past discussions is a general purpose "named subexpression"
 > construct that's just a normal local name binding operation affecting
 > whatever namespace the expression is executed in.

Yes, please!

From ncoghlan at  Mon Jun  8 04:42:31 2015
From: ncoghlan at (Nick Coghlan)
Date: Mon, 8 Jun 2015 12:42:31 +1000
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

On 8 June 2015 at 12:23, Neil Girdhar <mistersheik at> wrote:
> On Sun, Jun 7, 2015 at 1:59 AM, Nick Coghlan <ncoghlan at> wrote:
>> On 7 June 2015 at 08:52, Andrew Barnert via Python-ideas
>> <python-ideas at> wrote:
>> > Also, if we got my change, I could write code that cleanly hooks parsing
>> > in
>> > 3.6+, but uses the tokenize/untokenize hack for 2.7 and 3.5, so people
>> > can
>> > at least use it, and all of the relevant and complicated code would be
>> > shared between the two versions. With your change, I'd have to write
>> > code
>> > that was completely different for 3.6+ than what I could backport,
>> > meaning
>> > I'd have to write, debug, and maintain two completely different
>> > implementations. And again, for no benefit.
>> I don't think I've said this explicitly yet, but I'm +1 on the idea of
>> making it easier to "hack the token stream". As Andrew has noted, there
>> are two reasons this is an interesting level to work at for certain
>> kinds of modifications:
>> 1. The standard Python tokeniser has already taken care of converting
>> the byte stream into Unicode code points, and the code point stream
>> into tokens (including replacing leading whitespace with the
>> structural INDENT/DEDENT tokens)
> I will explain in another message how to replace the indent and dedent
> tokens so that the lexer loses most of its "magic" and becomes just like the
> parser.

I don't dispute that this *can* be done, but what would it let me do
that I can't already do today? In addition, how will I be able to
continue to do all the things that I can do today with the separate
tokenisation step?

*Adding* steps to the compilation toolchain is doable (one of the
first things I was involved with in CPython core development was the
introduction of the AST based parser in Python 2.5), but taking them
*away* is much harder.

You appear to have an idealised version of what a code generation
toolchain "should" be, and would like to hammer CPython's code
generation pipeline specifically into that mould. That's not the way
this works - we don't change the code generator for the sake of it, we
change it to solve specific problems with it.

Introducing the AST layer solved a problem. Introducing an AST
optimisation pass would solve a problem. Making the token stream
easier to manipulate would solve a problem.

Merging the lexer and the parser doesn't solve any problem that we have.

>> 2. You get to work with a linear stream of tokens, rather than a
>> precomposed tree of AST nodes that you have to traverse and keep
>> consistent
> The AST nodes would contain within them the linear stream of tokens that you
> are free to work with.  The AST also encodes the structure of the tokens,
> which can be very useful if only to debug how the tokens are being parsed.
> You might find yourself, when doing a more complicated lexical
> transformation, trying to reverse engineer where the parse tree nodes begin
> and end in the token stream.  This would be a nightmare.  This is the main
> problem with trying to process the token stream "blind" to the parse tree.

Anything that cares about the structure to that degree shouldn't be
manipulating the token stream - it should be working on the parse
tree.

>> If all you're wanting to do is token rewriting, or to push the token
>> stream over a network connection in preference to pushing raw source
>> code or fully compiled bytecode, a bit of refactoring of the existing
>> tokeniser/compiler interface to be less file based and more iterable
>> based could make that easier to work with.
> You can still do all that with the tokens included in the parse tree.

Not as easily, because I have to navigate the parse tree even when I
don't care about that structure, rather than being able to just look
at the tokens in isolation.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From rosuav at  Mon Jun  8 04:45:49 2015
From: rosuav at (Chris Angelico)
Date: Mon, 8 Jun 2015 12:45:49 +1000
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jun 8, 2015 at 12:33 PM, Stephen J. Turnbull <stephen at> wrote:
> I'm definitely -1 on the also, alif syntax at this point.  On the
> other hand, having done a lot of C programming in my misspent youth, I
> do miss anaphoric conditionals, so I too would like to see the
> possibility of "if cond as var: do_something_with_var" explored.  Of
> course Nick is right that automatic common subexpression elimination
> (CSE) is the big win, but manual CSE can improve readability.

Part of the trouble with depending on CSE is that Python is so dynamic
that you can't depend on things having no side effects... but the more
important part, in my opinion, is that duplication is a source code
maintenance problem. Bruce suggested this:

x = a and a.b and a.b.c and a.b.c.d
# which becomes
x = a and a.b
if x: x = x.c
if x: x = x.d

and frankly, I'd be more worried about a subsequent edit missing
something than I would be about the performance of all the repeated
lookups. Of course, Python does have an alternative, and that's to use
attribute absence rather than falsiness:

try: x = a.b.c.d
except AttributeError: x = None

But that won't always be an option. And any kind of expression that
says "the thing on the left, if it's false, otherwise the thing on the
left modified by this operator" is likely to get messy in anything
more than trivial cases; it looks great here:

x = a?.b?.c?.d

but now imagine something more complicated, and it's a lot more messy.
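
To make the difference between the two styles concrete, here is a small
sketch.  The helper names chain_truthy and chain_present are invented
for this example; they are not an existing API.

```python
from functools import reduce
from types import SimpleNamespace

def chain_truthy(obj, *names):
    """Follow attributes while each value is truthy (the 'and' chain)."""
    for name in names:
        if not obj:
            return obj
        obj = getattr(obj, name)
    return obj

def chain_present(obj, *names, default=None):
    """Follow attributes, returning default if any one is absent."""
    try:
        return reduce(getattr, names, obj)
    except AttributeError:
        return default

full = SimpleNamespace(b=SimpleNamespace(c=SimpleNamespace(d=42)))
stub = SimpleNamespace(b=0)
```

With stub, chain_truthy(stub, "b", "c", "d") stops at the falsy 0 and
returns it, while chain_present(stub, "b", "c", "d") returns None
because an int has no attribute c.  One caveat with the second form: it
also swallows an AttributeError raised from inside a property.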


From abarnert at  Mon Jun  8 04:44:24 2015
From: abarnert at (Andrew Barnert)
Date: Sun, 7 Jun 2015 19:44:24 -0700
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 7, 2015, at 19:18, Steven D'Aprano <steve at> wrote:
> As you read down the left hand column, you see "if a == b" and you can 
> mentally say "that block only occurs if a == b" and move on. But when 
> you get to the alif block, you have to stop reading forward and go back 
> up to understand whether it runs or not.

Thanks for putting it this way. I knew there was a more fundamental problem, but I couldn't see it until your message.

The proposal is closely analogous to trying to define a Boolean predicate in a list GUI instead of a tree. And that means it has the exact same problems that the early MS Office and Visual C++ Find in File dialogs had. Besides the obvious fact that mixing conjunctions and disjunctions without grouping (via nesting) is insufficiently powerful for many real-life predicates (which is exactly why the proposal needs the assignment-like add-on), even in the simple cases where it works, it's not readable (which is why the examples had at least one mistake, and at least one person misread one of the other examples). If your eye has to travel back upwards to the last also, but the alsos are flush against the left with the elifs instead of nested differently, you have to make an effort to parse each clause in your head, which is not true for a flat chain of elifs.

At any rate, as two people (I think Stephen and Nick) suggested, the second half of the proposal (the as-like binding) nearly eliminates the need for the first half, and doesn't have the same problem. The biggest problem it has is that you want the same syntax in other places besides if conditions, which is a better problem to have.

From mistersheik at  Mon Jun  8 04:47:16 2015
From: mistersheik at (Neil Girdhar)
Date: Sun, 7 Jun 2015 22:47:16 -0400
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Jun 7, 2015 at 10:42 PM, Nick Coghlan <ncoghlan at> wrote:

> On 8 June 2015 at 12:23, Neil Girdhar <mistersheik at> wrote:
> >
> >
> > On Sun, Jun 7, 2015 at 1:59 AM, Nick Coghlan <ncoghlan at> wrote:
> >>
> >> On 7 June 2015 at 08:52, Andrew Barnert via Python-ideas
> >> <python-ideas at> wrote:
> >> > Also, if we got my change, I could write code that cleanly hooks
> parsing
> >> > in
> >> > 3.6+, but uses the tokenize/untokenize hack for 2.7 and 3.5, so people
> >> > can
> >> > at least use it, and all of the relevant and complicated code would be
> >> > shared between the two versions. With your change, I'd have to write
> >> > code
> >> > that was completely different for 3.6+ than what I could backport,
> >> > meaning
> >> > I'd have to write, debug, and maintain two completely different
> >> > implementations. And again, for no benefit.
> >>
> >> I don't think I've said this explicitly yet, but I'm +1 on the idea of
> >> making it easier to "hack the token stream". As Andrew has noted, there
> >> are two reasons this is an interesting level to work at for certain
> >> kinds of modifications:
> >>
> >> 1. The standard Python tokeniser has already taken care of converting
> >> the byte stream into Unicode code points, and the code point stream
> >> into tokens (including replacing leading whitespace with the
> >> structural INDENT/DEDENT tokens)
> >
> >
> > I will explain in another message how to replace the indent and dedent
> > tokens so that the lexer loses most of its "magic" and becomes just like
> the
> > parser.
> I don't dispute that this *can* be done, but what would it let me do
> that I can't already do today? In addition, how will I be able to
> continue to do all the things that I can do today with the separate
> tokenisation step?
> *Adding* steps to the compilation toolchain is doable (one of the
> first things I was involved in CPython core development was the
> introduction of the AST based parser in Python 2.5), but taking them
> *away* is much harder.
> You appear to have an idealised version of what a code generation
> toolchain "should" be, and would like to hammer CPython's code
> generation pipeline specifically into that mould. That's not the way
> this works - we don't change the code generator for the sake of it, we
> change it to solve specific problems with it.
> Introducing the AST layer solved a problem. Introducing an AST
> optimisation pass would solve a problem. Making the token stream
> easier to manipulate would solve a problem.
> Merging the lexer and the parser doesn't solve any problem that we have.

You're right.  And as usual, Nick, your analysis is spot on.  My main
concern is that the idealized way of parsing the language is not precluded
by any change.  Does adding token manipulation promise forwards
compatibility?  Will a Python 3.9 have to have the same kind of token
manipulator exposed?  If not, then I'm +1 on token manipulation. :)

> >> 2. You get to work with a linear stream of tokens, rather than a
> >> precomposed tree of AST nodes that you have to traverse and keep
> >> consistent
> >
> > The AST nodes would contain within them the linear stream of tokens that
> you
> > are free to work with.  The AST also encodes the structure of the tokens,
> > which can be very useful if only to debug how the tokens are being
> parsed.
> > You might find yourself, when doing a more complicated lexical
> > transformation, trying to reverse engineer where the parse tree nodes
> begin
> > and end in the token stream.  This would be a nightmare.  This is the
> main
> > problem with trying to process the token stream "blind" to the parse
> tree.
> Anything that cares about the structure to that degree shouldn't be
> manipulating the token stream - it should be working on the parse
> tree.
> >> If all you're wanting to do is token rewriting, or to push the token
> >> stream over a network connection in preference to pushing raw source
> >> code or fully compiled bytecode, a bit of refactoring of the existing
> >> tokeniser/compiler interface to be less file based and more iterable
> >> based could make that easier to work with.
> >
> > You can still do all that with the tokens included in the parse tree.
> Not as easily, because I have to navigate the parse tree even when I
> don't care about that structure, rather than being able to just look
> at the tokens in isolation.

I think any extra burden would be outweighed by the bugs you'd avoid by
being able to check that the parse tree structure is what you think it
is.  It's a matter of intuition I guess.

> Regards,
> Nick.
> --
> Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Mon Jun  8 04:52:43 2015
From: ncoghlan at (Nick Coghlan)
Date: Mon, 8 Jun 2015 12:52:43 +1000
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

On 8 June 2015 at 12:37, Neil Girdhar <mistersheik at> wrote:
> Ultimately, my problem with "token transformers" is, if I'm understanding
> correctly, that we want to change Python so that not only will 3.5 have
> Token transformers, but every Python after that has to support this.  This
> risks constraining the development of the elegant solution.  And for what
> major reason do we even need token transformers so soon?  For a toy example
> on python ideas about automatic Decimal instances?  Why can't a user define
> a one character function "d(x)" to do the conversion everywhere?  I prefer
> to push for the better design even if it means waiting a year.

Neil, you're the only one proposing major structural changes to the
code generation pipeline, and nobody at all is proposing anything for
Python 3.5 (the feature freeze deadline for that has already passed).

Andrew is essentially only proposing relatively minor tweaks to the
API of the existing tokenizer module to make it more iterable based
and less file based (while still preserving the file based APIs).
Eugene Toder's and Dave Malcolm's patches from a few years ago make
the existing AST -> bytecode section of the toolchain easier to modify
and experiment with (and are ideas worth exploring for 3.6 if anyone
is willing and able to invest the time to bring them back up to date).
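
As a rough illustration of what "more iterable based" could mean in
practice, the existing tokenize.generate_tokens can already be driven
from any iterable of lines with a small adapter.  The helper name
tokens_from_lines is invented here; it is not part of the tokenize
module or of Andrew's proposal.

```python
import tokenize

def tokens_from_lines(lines):
    """Tokenize an iterable of source lines (str, each ending in a newline)."""
    it = iter(lines)
    # generate_tokens wants a readline-style callable; an empty string
    # signals end of input, just as a file object would.
    return tokenize.generate_tokens(lambda: next(it, ""))

names = [tok.string
         for tok in tokens_from_lines(["x = spam(1)\n", "y = x + eggs\n"])
         if tok.type == tokenize.NAME]
```

Here names collects every identifier in the two lines without any file
object being involved.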

However, if you're specifically wanting to work on an "ideal parser
API", then the reference interpreter for a 24 year old established
language *isn't* the place to do it - the compromises necessitated by
the need to align with an extensive existing ecosystem will actively
work against your goal for a clean, minimalist structure. That's thus
something better developed *independently* of CPython, and then
potentially considered at some point in the future when it's better
established what concrete benefits it would offer over the status quo
for both the core development team and Python's end users.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From mistersheik at  Mon Jun  8 04:56:55 2015
From: mistersheik at (Neil Girdhar)
Date: Sun, 7 Jun 2015 22:56:55 -0400
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Jun 7, 2015 at 10:52 PM, Nick Coghlan <ncoghlan at> wrote:

> On 8 June 2015 at 12:37, Neil Girdhar <mistersheik at> wrote:
> > Ultimately, my problem with "token transformers" is, if I'm understanding
> > correctly, that we want to change Python so that not only will 3.5 have
> > Token transformers, but every Python after that has to support this.
> This
> > risks constraining the development of the elegant solution.  And for what
> > major reason do we even need token transformers so soon?  For a toy
> example
> > on python ideas about automatic Decimal instances?  Why can't a user
> define
> > a one character function "d(x)" to do the conversion everywhere?  I
> prefer
> > to push for the better design even if it means waiting a year.
> Neil, you're the only one proposing major structural changes to the
> code generation pipeline, and nobody at all is proposing anything for
> Python 3.5 (the feature freeze deadline for that has already passed).
> Andrew is essentially only proposing relatively minor tweaks to the
> API of the existing tokenizer module to make it more iterable based
> and less file based (while still preserving the file based APIs).
> Eugene Toder's and Dave Malcolm's patches from a few years ago make
> the existing AST -> bytecode section of the toolchain easier to modify
> and experiment with (and are ideas worth exploring for 3.6 if anyone
> is willing and able to invest the time to bring them back up to date).
> However, if you're specifically wanting to work on an "ideal parser
> API", then the reference interpreter for a 24 year old established
> language *isn't* the place to do it - the compromises necessitated by
> the need to align with an extensive existing ecosystem will actively
> work against your goal for a clean, minimalist structure. That's thus
> something better developed *independently* of CPython, and then
> potentially considered at some point in the future when it's better
> established what concrete benefits it would offer over the status quo
> for both the core development team and Python's end users.

That's not what I'm doing.  All I'm suggesting is that changes to Python
that *preclude* the "ideal parser API" be avoided.  I'm not trying to make
the ideal API happen today.  I'm just keeping the path to that rosy future
free of obstacles.

> Regards,
> Nick.
> --
> Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Mon Jun  8 05:03:09 2015
From: ncoghlan at (Nick Coghlan)
Date: Mon, 8 Jun 2015 13:03:09 +1000
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

On 8 June 2015 at 12:47, Neil Girdhar <mistersheik at> wrote:
> You're right.  And as usual, Nick, your analysis is spot on.  My main
> concern is that the idealized way of parsing the language is not precluded
> by any change.  Does adding token manipulation promise forwards
> compatibility?  Will a Python 3.9 have to have the same kind of token
> manipulator exposed?  If not, then I'm +1 on token manipulation. :)

That may have been the heart of the confusion, as token manipulation
is *already* a public feature:

The tokenize module has been a public part of Python for longer than
I've been a Pythonista (first documented in 1.5.2 in 1999):

As a result, token stream manipulation is already possible, you just
have to combine the tokens back into a byte stream before feeding them
to the compiler. Any future Python interpreter would be free to fall
back on implementing a token based API that way, if the CPython code
generator itself were to gain a native token stream interface.
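
A minimal sketch of that round trip, in the spirit of the example in the
tokenize documentation: float literals are rewritten as Decimal calls at
the token level, the tokens are glued back into source text, and the
result goes to the ordinary compiler.  The function name decimalize is
invented for this sketch.

```python
import io
import tokenize
from decimal import Decimal

def decimalize(source):
    """Rewrite float literals in source as Decimal('...') calls."""
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NUMBER and "." in tok.string:
            result.extend([
                (tokenize.NAME, "Decimal"),
                (tokenize.OP, "("),
                (tokenize.STRING, repr(tok.string)),
                (tokenize.OP, ")"),
            ])
        else:
            result.append((tok.type, tok.string))
    # Glue the tokens back into source text for the compiler.
    return tokenize.untokenize(result)

namespace = {"Decimal": Decimal}
exec(compile(decimalize("total = 1.1 + 2.2\n"), "<tokens>", "exec"), namespace)
```

After this runs, namespace["total"] is an exact Decimal sum rather than
a binary float.  The spacing of the regenerated source is not preserved,
which is fine for compilation but not for display.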


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From mistersheik at  Mon Jun  8 05:04:07 2015
From: mistersheik at (Neil Girdhar)
Date: Sun, 7 Jun 2015 23:04:07 -0400
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

Okay, well I'm sorry for the trouble then!!

On Sun, Jun 7, 2015 at 11:03 PM, Nick Coghlan <ncoghlan at> wrote:

> On 8 June 2015 at 12:47, Neil Girdhar <mistersheik at> wrote:
> > You're right.  And as usual, Nick, your analysis is spot on.  My main
> > concern is that the idealized way of parsing the language is not
> precluded
> > by any change.  Does adding token manipulation promise forwards
> > compatibility?  Will a Python 3.9 have to have the same kind of token
> > manipulator exposed?  If not, then I'm +1 on token manipulation. :)
> That may have been the heart of the confusion, as token manipulation
> is *already* a public feature:
> The tokenizer module has been a public part of Python for longer than
> I've been a Pythonista (first documented in 1.5.2 in 1999):
> As a result, token stream manipulation is already possible, you just
> have to combine the tokens back into a byte stream before feeding them
> to the compiler. Any future Python interpreter would be free to fall
> back on implementing a token based API that way, if the CPython code
> generator itself were to gain a native token stream interface.
> Cheers,
> Nick.
> --
> Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From abarnert at  Mon Jun  8 05:05:49 2015
From: abarnert at (Andrew Barnert)
Date: Sun, 7 Jun 2015 20:05:49 -0700
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 7, 2015, at 19:45, Chris Angelico <rosuav at> wrote:

> Part of the trouble with depending on CSE is that Python is so dynamic
> that you can't depend on things having no side effects... but the more
> important part, in my opinion, is that duplication is a source code
> maintenance problem. Bruce suggested this:
> x = a and a.b and a.b.c and a.b.c.d
> # which becomes
> x = a and a.b
> if x: x = x.c
> if x: x = x.d
> and frankly, I'd be more worried about a subsequent edit missing
> something than I would be about the performance of all the repeated
> lookups. Of course, Python does have an alternative, and that's to use
> attribute absence rather than falsiness:
> try: x = a.b.c.d
> except AttributeError: x = None
> But that won't always be an option.

I don't have a link, but one of the Swift development blogs shows a number of good examples where it isn't an option. When deciding whether they wanted Smalltalk-style nil chaining or Python-style AttributeError/LookupError, all the simple cases looked just as good both ways. So they went out looking for real-life code in multiple languages to find examples that couldn't be translated to the other style. They found plenty of nil-chaining examples that were clumsy to translate to exceptions, but almost all of the exception examples that were clumsy to translate to nil chaining could be solved if they just had multiple levels of nil. So, if they could find a way to provide something like Haskell's Maybe, but without forcing you to think about monads and pattern matching, that would be better than exceptions. So that's what they did. (I'm not sure it's 100% successful, because there are rare times when you really do want to check for Just Nothing, and by hiding things under the covers they made that difficult... But in simple cases it definitely does work.)

Anyway, their language design choice isn't directly relevant here (I assume nobody wants a.b.c.d to be None if a.b is missing, or wants to add a?.b?.c?.d syntax to Python), but the examples probably are.

> And any kind of expression that
> says "the thing on the left, if it's false, otherwise the thing on the
> left modified by this operator" is likely to get messy in anything
> more than trivial cases; it looks great here:
> x = a?.b?.c?.d
> but now imagine something more complicated, and it's a lot more messy.

It's surprising how often it doesn't get messy in Swift. But when it does, I really miss being able to pattern match Just Nothing, and there's no way around that without two clumsy assignment statements before the conditional (or defining and calling an extra function), which is even worse than the one that Python often needs...

From robertc at  Mon Jun  8 05:10:17 2015
From: robertc at (Robert Collins)
Date: Mon, 8 Jun 2015 15:10:17 +1200
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

On 8 June 2015 at 14:56, Neil Girdhar <mistersheik at> wrote:
> On Sun, Jun 7, 2015 at 10:52 PM, Nick Coghlan <ncoghlan at> wrote:

>> However, if you're specifically wanting to work on an "ideal parser
>> API", then the reference interpreter for a 24 year old established
>> language *isn't* the place to do it - the compromises necessitated by
>> the need to align with an extensive existing ecosystem will actively
>> work against your goal for a clean, minimalist structure. That's thus
>> something better developed *independently* of CPython, and then
>> potentially considered at some point in the future when it's better
>> established what concrete benefits it would offer over the status quo
>> for both the core development team and Python's end users.
> That's not what I'm doing.  All I'm suggesting is that changes to Python
> that *preclude* the "ideal parser API" be avoided.  I'm not trying to make
> the ideal API happen today.  I'm just keeping the path to that rosy future
> free of obstacles.

I've used that approach in projects before, and in hindsight I realise
that I caused significant disruption doing that. The reason boils down
to - without consensus that the rosy future is all of:
 - the right future
 - worth doing eventually
 - more important to reach than solving problems that appear on the way

then you end up frustrating folk that have problems now, without
actually adding value to anyone: the project gets to choose between a
future that [worst case, fails all three tests] might not be right,
might not be worth doing, and is less important than actual problems
which it is stopping solutions for.

In this particular case, given Nick's comments about why we change the
guts here, I'd say that 'worth doing eventually' is not in consensus,
and I personally think that anything that is 'in the indefinite
future' is almost never more important than problems affecting people
today, because it's a possible future benefit vs a current cost.
There's probably an economics theorem to describe that, but I'm not an
economist :)

Pragmatically, I propose that the existing structure already has
significant friction around any move to a unified (but still
multi-pass I presume) parser infrastructure, and so adding a small
amount of friction for substantial benefits will not substantially
impact the future work.

Concretely: a multi-stage parser with unified language for both lexer
and parser should be quite amenable to calling out to a legacy token
hook, without onerous impact. Failing that, we can follow the
deprecation approach when someone finds we can't do that, and after a
reasonable time remove the old hook. But right now, I think the onus
is on you to show that a shim wouldn't be possible, rather than
refusing to support adding a tokeniser hook because a shim isn't
*obviously possible*.
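
One possible shape for such a shim, sketched with today's tools.  The
names compile_with_token_hook and rename are hypothetical; no such hook
exists in CPython.  The hook is any callable that takes an iterable of
tokens and yields (type, string) pairs.

```python
import io
import tokenize

def compile_with_token_hook(source, hook, filename="<hooked>"):
    """Tokenize source, pass the tokens through hook, then compile."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return compile(tokenize.untokenize(hook(tokens)), filename, "exec")

def rename(old, new):
    """Build a hook that replaces the identifier old with new."""
    def hook(tokens):
        for tok in tokens:
            if tok.type == tokenize.NAME and tok.string == old:
                yield (tok.type, new)
            else:
                yield (tok.type, tok.string)
    return hook

ns = {"value": 21}
exec(compile_with_token_hook("result = answer * 2\n",
                             rename("answer", "value")), ns)
```

A unified parser could keep calling a hook of this shape on whatever
token sequence it produces internally, which is the sense in which the
shim looks cheap.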


Robert Collins <rbtcollins at>
Distinguished Technologist
HP Converged Cloud

From abarnert at  Mon Jun  8 05:18:36 2015
From: abarnert at (Andrew Barnert)
Date: Sun, 7 Jun 2015 20:18:36 -0700
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 7, 2015, at 19:52, Nick Coghlan <ncoghlan at> wrote:
> Andrew is essentially only proposing relatively minor tweaks to the
> API of the existing tokenizer module to make it more iterable based
> and less file based (while still preserving the file based APIs).

And also a patch to the existing ast module to allow it to handle tokenizers from Python as well as from C. The tokenizer tweaks themselves are just to make that easier (and to make using tokenizer a little simpler even if you don't feed it directly to the parser).

(It surprised me that the C-level tokenizer actually can take C strings and string objects rather than file objects, but once you think about how the high-level C API stuff like being able to exec a single line must work, it's pretty obvious why that was added...)

> Eugene Toder's and Dave Malcolm's patches from a few years ago make
> the existing AST -> bytecode section of the toolchain easier to modify
> and experiment with (and are ideas worth exploring for 3.6 if anyone
> is willing and able to invest the time to bring them back up to date).

I got a chance to take a look at this, and, while it seems completely orthogonal to what I'm trying to do, it also seems very cool. If someone got the patches up to date for the trunk and fixed the minor issues involved in the last review (both of which look pretty simple), what are the chances of getting it reviewed for 3.6? (I realize this is probably a better question for the issue tracker or the -dev list than buried in the middle of a barely-relevant -ideas thread, but I'm on my phone here, and you brought it up. :)

From ncoghlan at  Mon Jun  8 05:41:01 2015
From: ncoghlan at (Nick Coghlan)
Date: Mon, 8 Jun 2015 13:41:01 +1000
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On 8 June 2015 at 12:45, Chris Angelico <rosuav at> wrote:
> On Mon, Jun 8, 2015 at 12:33 PM, Stephen J. Turnbull <stephen at> wrote:
>> I'm definitely -1 on the also, alif syntax at this point.  On the
>> other hand, having done a lot of C programming in my misspent youth, I
>> do miss anaphoric conditionals, so I too would like to see the
>> possibility of "if cond as var: do_something_with_var" explored.  Of
>> course Nick is right that automatic common subexpression elimination
>> (CSE) is the big win, but manual CSE can improve readability.
> Part of the trouble with depending on CSE is that Python is so dynamic
> that you can't depend on things having no side effects... but the more
> important part, in my opinion, is that duplication is a source code
> maintenance problem.

Yes, this is the part of the problem definition I agree with, which is
why I think named subexpressions are the most attractive alternative
presented in the past discussions.

Our typical answer is "pull the named subexpression out to a separate
assignment statement and give it a name", but there are a range of
constructs where that poses a problem. For example:

    x = a.b if a.b else a.c

    while a.b:
        x = a.b

    [a.b for a in iterable if a.b]

Eliminating the duplication with named subexpressions would be
straightforward (I'd suggest making the parentheses mandatory for this
construct, which would also avoid ambiguity in the with statement and
exception handler clause cases):

    x = b if (a.b as b) else a.c

    while (a.b as x):
        ...

    [b for a in iterable if (a.b as b)]

By contrast, eliminating the duplication *today* requires switching to
very different structures based on the underlying patterns otherwise
hidden behind the syntactic sugar:

    x = a.b
    if not x:
        x = a.c

    while True:
        x = a.b
        if not x:
            break
        ...

    result = []
    for a in iterable:
        b = a.b
        if b:
            result.append(b)
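
For anyone who wants to run the three present-day rewrites, here is a minimal sketch. The SimpleNamespace objects and the iterator standing in for a changing "a.b" are assumptions made purely for illustration; only the attribute names b and c come from the examples above.

```python
from types import SimpleNamespace

# Hypothetical stand-in object; only the attribute names b and c
# are taken from the examples in this message.
a = SimpleNamespace(b=0, c="fallback")

# Conditional expression rewrite: read a.b once, fall back to a.c if falsy.
x_default = a.b
if not x_default:
    x_default = a.c

# While loop rewrite: bind first, test second, break when the value is falsy.
# An iterator plays the role of an attribute whose value changes over time.
source = iter([2, 1, 0, 5])
seen = []
while True:
    x = next(source)
    if not x:
        break
    seen.append(x)

# Filtered comprehension rewrite: explicit loop so item.b is read once per item.
iterable = [SimpleNamespace(b=1), SimpleNamespace(b=0), SimpleNamespace(b=3)]
result = []
for item in iterable:
    b = item.b
    if b:
        result.append(b)
```

Each rewrite evaluates the subexpression once, at the cost of a structure that no longer resembles the one-line original.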

The main *problem* with named subexpressions (aside from the potential
for side effects introduced by deliberately letting the name bindings
leak into the surrounding namespace) is that they introduce a
redundancy at the single assignment level, since an expression
statement that names the expression would be equivalent to a simple
assignment statement:

   x = a
   (a as x)

On the other hand, there's a similar existing redundancy between
function definitions and binding lambda expressions to a name:

    f = lambda: None
    def f(): pass

And for that, we just have a PEP 8 style guideline recommending the
latter form. Something similar would likely work for saying "only use
named subexpressions in cases where using a normal assignment
statement instead would require completely restructuring the code".


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Mon Jun  8 05:55:17 2015
From: ncoghlan at (Nick Coghlan)
Date: Mon, 8 Jun 2015 13:55:17 +1000
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

On 8 June 2015 at 13:18, Andrew Barnert <abarnert at> wrote:
> On Jun 7, 2015, at 19:52, Nick Coghlan <ncoghlan at> wrote:
>> Eugene Toder's and Dave Malcolm's patches from a few years ago make
>> the existing AST -> bytecode section of the toolchain easier to modify
>> and experiment with (and are ideas worth exploring for 3.6 if anyone
>> is willing and able to invest the time to bring them back up to date).
> I got a chance to take a look at this, and, while it seems completely orthogonal to what I'm trying to do, it also seems very cool. If someone got the patches up to date for the trunk and fixed the minor issues involved in the last review (both of which look pretty simple), what are the chances of getting it reviewed for 3.6? (I realize this is probably a better question for the issue tracker or the -dev list than buried in the middle of a barely-relevant -ideas thread, but I'm on my phone here, and you brought it up.:)

I'm still interested in the underlying ideas, and while it's always
possible to get completely surprised by life's twists and turns, I'm
optimistic I'd be able to find the time to provide feedback on it
myself for 3.6, and hopefully encourage other folks with experience
with the compiler internals to review it as well.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Mon Jun  8 06:18:15 2015
From: ncoghlan at (Nick Coghlan)
Date: Mon, 8 Jun 2015 14:18:15 +1000
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

On 8 June 2015 at 13:10, Robert Collins <robertc at> wrote:
> In this particular case, given Nick's comments about why we change the
> guts here, I'd say that 'worth doing eventually' is not in consensus,
> and I personally think that anything that is 'in the indefinite
> future' is almost never more important than problems affecting people
> today, because it's a possible future benefit vs a current cost.
> There's probably an economics theorem to describe that, but I'm not an
> economist :)

I don't know about economics, but for anyone that hasn't encountered
it before, the phrase YAGNI is a good one to know: You Ain't Gonna
Need It.

YAGNI applies to deciding *to* do something when you're faced with
the following choice:

* Making a particular change solves an immediate problem, but would
make another possible change more complex in the future
* Not making a change preserves the simplicity of the possible future
change, but also doesn't solve the immediate problem

Sometimes you'll get lucky and someone will figure out a third path
that both addresses the immediate concern *and* leaves your future
options open for other changes. More often though, you'll have to
decide between these options, and in those cases "YAGNI" argues in
favour of heavily discounting the potential increase in difficulty for
a change you may never make anyway.


P.S. This tension between considering the long term implications of
changes without allowing that consideration to block near term
progress is what I personally see in the following two lines of the
Zen of Python:

    Now is better than never.
    Although never is often better than *right* now.

Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From mistersheik at  Mon Jun  8 06:23:43 2015
From: mistersheik at (Neil Girdhar)
Date: Mon, 8 Jun 2015 00:23:43 -0400
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

Yes, but in this case the near term "problem" was as far as I can tell just
parsing floats as decimals, which is easily done with a somewhat noisy
function call.  I don't see why it's important.

The way that CPython does parsing is more than just annoying.  It's a mess
of repetition and tests that try to make sure that all of the phases are
synchronized.  I don't think that CPython is the future of Python.  One
day, someone will write a Python interpreter in Python that includes a
clean one-pass parser.  I would prefer to make that as easy to realize as
possible.  You might think it's far-fetched.  I don't think it is.




From abarnert at  Mon Jun  8 06:24:36 2015
From: abarnert at (Andrew Barnert)
Date: Sun, 7 Jun 2015 21:24:36 -0700
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 7, 2015, at 20:41, Nick Coghlan <ncoghlan at> wrote:
>> On 8 June 2015 at 12:45, Chris Angelico <rosuav at> wrote:
>>> On Mon, Jun 8, 2015 at 12:33 PM, Stephen J. Turnbull <stephen at> wrote:
>>> I'm definitely -1 on the also, alif syntax at this point.  On the
>>> other hand, having done a lot of C programming in my misspent youth, I
>>> do miss anaphoric conditionals, so I too would like to see the
>>> possibility of "if cond as var: do_something_with_var" explored.  Of
>>> course Nick is right that automatic common subexpression elimination
>>> (CSE) is the big win, but manual CSE can improve readability.
>> Part of the trouble with depending on CSE is that Python is so dynamic
>> that you can't depend on things having no side effects... but the more
>> important part, in my opinion, is that duplication is a source code
>> maintenance problem.
> Yes, this is the part of the problem definition I agree with, which is
> why I think named subexpressions are the most attractive alternative
> presented in the past discussions.

The problem with general named subexpressions is that they inherently mean a side effect buried in the middle of an expression. While it's not _impossible_ to do that in Python today (e.g., you can always call a mutating method in a comprehension's if clause or in the third argument to a function), it's not common or idiomatic.

You could say this is a consenting-adults issue and you should only use it in cases where it's not deep inside an expression--but the deeply nested cases are the actual motivating ones, the ones where just "pull it out into a named assignment" won't work. In fact, one of our three examples is:

>    [b for a in iterable if (a.b as b)]

That's exactly the kind of place that you'd call non-idiomatic with a mutating method call, so why is a binding not even worse?

Maybe something more like a let expression, where the binding goes as far left as possible instead of as far right would look better, 

From abarnert at  Mon Jun  8 06:27:05 2015
From: abarnert at (Andrew Barnert)
Date: Sun, 7 Jun 2015 21:27:05 -0700
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

Sorry, early send...

> On Jun 7, 2015, at 21:24, Andrew Barnert via Python-ideas <python-ideas at> wrote:
>> On Jun 7, 2015, at 20:41, Nick Coghlan <ncoghlan at> wrote:
>>> On 8 June 2015 at 12:45, Chris Angelico <rosuav at> wrote:
>>>> On Mon, Jun 8, 2015 at 12:33 PM, Stephen J. Turnbull <stephen at> wrote:
>>>> I'm definitely -1 on the also, alif syntax at this point.  On the
>>>> other hand, having done a lot of C programming in my misspent youth, I
>>>> do miss anaphoric conditionals, so I too would like to see the
>>>> possibility of "if cond as var: do_something_with_var" explored.  Of
>>>> course Nick is right that automatic common subexpression elimination
>>>> (CSE) is the big win, but manual CSE can improve readability.
>>> Part of the trouble with depending on CSE is that Python is so dynamic
>>> that you can't depend on things having no side effects... but the more
>>> important part, in my opinion, is that duplication is a source code
>>> maintenance problem.
>> Yes, this is the part of the problem definition I agree with, which is
>> why I think named subexpressions are the most attractive alternative
>> presented in the past discussions.
> The problem with general named subexpressions is that it inherently means a side effect buried in the middle of an expression. While it's not _impossible_ to do that in Python today (e.g., you can always call a mutating method in a comprehension's if clause or in the third argument to a function), but it's not common or idiomatic.
> You could say this is a consulting-adults issue and you shouldn't use it in cases where it's not deep inside an expression--but those are the actual motivating cases, the ones where just "pull it out into a named assignment" won't work. In fact, one of our three examples is:
>>    [b for a in iterable if (a.b as b)]
> That's exactly the kind of place that you'd call non-idiomatic with a mutating method call, so why is a binding not even worse?
> Maybe something more like a let expression, where the binding goes as far left as possible instead of as far right would look better, 

... but I can't even begin to think of a way to fit that into Python's syntax that isn't horribly ugly and clunky, and "as" already has lots of precedent, so I think that's not worth exploring.

From abarnert at  Mon Jun  8 06:31:21 2015
From: abarnert at (Andrew Barnert)
Date: Sun, 7 Jun 2015 21:31:21 -0700
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 7, 2015, at 21:23, Neil Girdhar <mistersheik at> wrote:
> Yes, but in this case the near term "problem" was as far as I can tell just parsing floats as decimals, which is easily done with a somewhat noisy function call.  I don't see why it's important.

This isn't the only case anyone's ever wanted. The tokenize module has been there since at least 1.5, and presumably it wasn't added for no good reason, or made to work with 3.x just for fun. And it has an example use in the docs. 

The only thing that's changed is that, now that postprocessing the AST has become a lot easier and less hacky because of the ast module and the succession of changes to the import process, the fact that tokenize is still clumsy and hacky is more noticeable.
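
For reference, the kind of token-level rewrite under discussion can be sketched with today's tokenize module. This is a minimal sketch in the spirit of the docs example, not a proposal for the new API; the function name float_literals_to_decimal is made up for illustration, and the check for "." deliberately ignores literals such as 1e5.

```python
import io
import tokenize
from decimal import Decimal


def float_literals_to_decimal(source):
    """Return source with each dotted float literal wrapped in Decimal('...')."""
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NUMBER and "." in tok.string:
            # Replace the NUMBER token with the tokens for Decimal('<literal>').
            result.extend([
                (tokenize.NAME, "Decimal"),
                (tokenize.OP, "("),
                (tokenize.STRING, repr(tok.string)),
                (tokenize.OP, ")"),
            ])
        else:
            result.append((tok.type, tok.string))
    # With 2-tuples, untokenize() regenerates source with approximate spacing.
    return tokenize.untokenize(result)


namespace = {"Decimal": Decimal}
exec(float_literals_to_decimal("total = 1.1 + 2.2\n"), namespace)
```

The clumsy part is visible even here: the rewrite has to round-trip through source text before the compiler ever sees it, rather than handing the token stream straight to the parser.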

From stefan_ml at  Mon Jun  8 06:43:52 2015
From: stefan_ml at (Stefan Behnel)
Date: Mon, 08 Jun 2015 06:43:52 +0200
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <ml36i9$ud6$>

Cory Beutler schrieb am 07.06.2015 um 05:03:
> *Examples of use:*
> *Duplicate code in if-chains may be reduced:*
> # Old Version
> if a == b:
>     print ('a == b')
>     foo()             # <-- duplicate code
> elif b == c:
>     print ('b == c')
>     foo()             # <-- duplicate code
> elif c == d:
>     print ('c == d')
>     foo()             # <-- duplicate code
> # New Version
> if a == b:
>     print ('a == b')
> elif b == c:
>     print ('b == c')
> elif c == d:
>     print ('c == d')
> also:
>     foo()            # <-- No longer duplicated

I think this is best done by extracting it into a function, e.g.

    def good_name_needed():
        if a == b:
            print('a == b')
        elif b == c:
            print('b == c')
        elif c == d:
            print('c == d')
        else:
            # nothing to do here
            return
        foo()            # <-- No longer duplicated

Usually, in real code, the call to foo() will have some kind of relation to
the previous if-chain anyway, so a good name for the whole function
shouldn't be all too difficult to find.


From ncoghlan at  Mon Jun  8 07:01:06 2015
From: ncoghlan at (Nick Coghlan)
Date: Mon, 8 Jun 2015 15:01:06 +1000
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

On 8 June 2015 at 14:23, Neil Girdhar <mistersheik at> wrote:
> Yes, but in this case the near term "problem" was as far as I can tell just
> parsing floats as decimals, which is easily done with a somewhat noisy
> function call.  I don't see why it's important.

No, the problem to be solved is making it easier for people to "play"
with Python's syntax and try out different ideas in a format that can
be shared easily.

The more people that are able to tinker and play with something, and
share the results of their work, the more opportunities there are for
good ideas to be had, and shared, eventually building up to the point
of becoming a coherent proposal for change.

The 3.4 dis module included several enhancements to make playing with
bytecode easier and more fun.

3.4 also added the source_to_code() hook in importlib to make it easy
to tweak the compilation pass without having to learn all the other
intricacies of the import system.

MacroPy and Hylang are interesting examples of ways to manipulate the
AST in order to use the CPython VM without relying solely on the
native language syntax, while byteplay and Numba are examples of
manipulating things at the bytecode level.
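
As a rough illustration of how those two pieces fit together, here is a minimal sketch assuming a recent Python 3 (it relies on ast.Constant, which did not exist in 3.4). The PlayLoader and DoubleIntegers names, the "<play>" filename and the "play_demo" module name are all hypothetical; only source_to_code() and dis.get_instructions() are the hooks being discussed.

```python
import ast
import dis
import importlib.abc
import importlib.util


class DoubleIntegers(ast.NodeTransformer):
    """Toy AST transform: rewrite every integer literal n as 2 * n."""

    def visit_Constant(self, node):
        if isinstance(node.value, int) and not isinstance(node.value, bool):
            return ast.copy_location(ast.Constant(value=node.value * 2), node)
        return node


class PlayLoader(importlib.abc.SourceLoader):
    """Hypothetical in-memory loader that overrides only the compile step."""

    def __init__(self, source):
        self._source = source

    def get_filename(self, fullname):
        return "<play>"

    def get_data(self, path):
        return self._source.encode("utf-8")

    def source_to_code(self, data, path="<string>"):
        # The hook: parse, tweak the AST, then compile as usual.
        tree = DoubleIntegers().visit(ast.parse(data, filename=path))
        ast.fix_missing_locations(tree)
        return compile(tree, path, "exec")


loader = PlayLoader("answer = 21\n")
spec = importlib.util.spec_from_loader("play_demo", loader)
module = importlib.util.module_from_spec(spec)
loader.exec_module(module)

# Inspect the resulting bytecode with the iterator API added in 3.4.
opnames = [ins.opname for ins in dis.get_instructions(loader.get_code("play_demo"))]
```

After this runs, module.answer holds the transformed value, and opnames lists the instruction names for the module body (the exact names vary between CPython versions).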

> The way that CPython does parsing is more than just annoying.  It's a mess
> of repetition and tests that try to make sure that all of the phases are
> synchronized.  I don't think that CPython is the future of Python.  One day,
> someone will write a Python interpreter in Python that includes a clean
> one-pass parser.  I would prefer to make that as easy to realize as
> possible.  You might think it's far-fetched.  I don't think it is.

While the structure of CPython's code generation toolchain certainly
poses high incidental barriers to entry, those barriers are trivial
compared to the *inherent* barriers to entry involved in successfully
making the case for a change like introducing a matrix multiplication
operator or more clearly separating coroutines from generators through
the async/await keywords (both matrix multiplication support and
async/await landed for 3.5).

If someone successfully makes the case for a compelling change to the
language specification, then existing core developers are also ready,
willing and able to assist in actually making the change to CPython.

As a result, making that final step of *implementing* a syntactic
change in CPython easier involves changing something that *isn't the
bottleneck in the process*, so it would have no meaningful impact on
the broader Python community.

By contrast, making *more steps* of the existing compilation process
easier for pure Python programmers to play with, preferably in an
implementation independent way, *does* impact two of the bottlenecks:
the implementation of syntactic ideas in executable form, and sharing
those experiments with others. Combining that level of syntactic play
with PyPy's ability to automatically generate JIT compilers offers an
extraordinarily powerful platform for experimentation, with the
standardisation process ensuring that accepted experiments also scale
down to significantly more constrained environments like MicroPython.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From floyd at  Mon Jun  8 09:56:41 2015
From: floyd at (floyd)
Date: Mon, 08 Jun 2015 09:56:41 +0200
Subject: [Python-ideas] difflib.SequenceMatcher quick_ratio
Message-ID: <>

Hi *

I use this python line quite a lot in some projects:

if difflib.SequenceMatcher(None, a, b).quick_ratio() >= threshold:

I realized that this is performance-wise not optimal, therefore wrote a
method that will return much faster in a lot of cases by using the
length of "a" and "b" to calculate the upper bound for "threshold":

if difflib.SequenceMatcher(None, a, b).quick_ratio_ge(threshold):
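
A minimal sketch of the idea as a standalone function, not the actual patch: the length-based bound is the same one difflib already uses in real_quick_ratio(), and the function name and signature here are only an assumption for illustration.

```python
import difflib


def quick_ratio_ge(a, b, threshold):
    """Return True if SequenceMatcher(None, a, b).quick_ratio() >= threshold."""
    total = len(a) + len(b)
    if total == 0:
        # difflib defines the ratio of two empty sequences as 1.0.
        return 1.0 >= threshold
    # At most min(len(a), len(b)) elements can match, which caps the ratio.
    upper_bound = 2.0 * min(len(a), len(b)) / total
    if upper_bound < threshold:
        return False  # cheap exit: no need to count elements at all
    return difflib.SequenceMatcher(None, a, b).quick_ratio() >= threshold
```

The early return only helps when the lengths differ enough for the bound to fall below the threshold; otherwise the cost is the same as calling quick_ratio() directly.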

I'd say we could include it into the stdlib, but maybe it should only be
a python code recipe?

I would say this is one of the most frequent use cases for difflib, but
maybe that's just my biased opinion :) . What's yours?



From storchaka at  Mon Jun  8 10:44:38 2015
From: storchaka at (Serhiy Storchaka)
Date: Mon, 08 Jun 2015 11:44:38 +0300
Subject: [Python-ideas] difflib.SequenceMatcher quick_ratio
In-Reply-To: <>
References: <>
Message-ID: <ml3klm$jm4$>

On 08.06.15 10:56, floyd wrote:
> I use this python line quite a lot in some projects:
> if difflib.SequenceMatcher(None, a, b).quick_ratio() >= threshold:
> I realized that this is performance-wise not optimal, therefore wrote a
> method that will return much faster in a lot of cases by using the
> length of "a" and "b" to calculate the upper bound for "threshold":
> if difflib.SequenceMatcher(None, a, b).quick_ratio_ge(threshold):
> I'd say we could include it into the stdlib, but maybe it should only be
> a python code recipe?
> I would say this is one of the most frequent use cases for difflib, but
> maybe that's just my biased opinion :) . What's yours?

If such a function is added, I think it needs a better name, e.g.
difflib.isclose(a, b, threshold).

From ncoghlan at  Mon Jun  8 12:32:44 2015
From: ncoghlan at (Nick Coghlan)
Date: Mon, 8 Jun 2015 20:32:44 +1000
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On 8 June 2015 at 14:24, Andrew Barnert <abarnert at> wrote:
> On Jun 7, 2015, at 20:41, Nick Coghlan <ncoghlan at> wrote:
> On 8 June 2015 at 12:45, Chris Angelico <rosuav at> wrote:
> On Mon, Jun 8, 2015 at 12:33 PM, Stephen J. Turnbull <stephen at>
> wrote:
> I'm definitely -1 on the also, alif syntax at this point.  On the
> other hand, having done a lot of C programming in my misspent youth, I
> do miss anaphoric conditionals, so I too would like to see the
> possibility of "if cond as var: do_something_with_var" explored.  Of
> course Nick is right that automatic common subexpression elimination
> (CSE) is the big win, but manual CSE can improve readability.
> Part of the trouble with depending on CSE is that Python is so dynamic
> that you can't depend on things having no side effects... but the more
> important part, in my opinion, is that duplication is a source code
> maintenance problem.
> Yes, this is the part of the problem definition I agree with, which is
> why I think named subexpressions are the most attractive alternative
> presented in the past discussions.
> The problem with general named subexpressions is that it inherently means a
> side effect buried in the middle of an expression. While it's not
> _impossible_ to do that in Python today (e.g., you can always call a
> mutating method in a comprehension's if clause or in the third argument to a
> function), but it's not common or idiomatic.
> You could say this is a consulting-adults issue and you shouldn't use it in
> cases where it's not deep inside an expression--but those are the actual
> motivating cases, the ones where just "pull it out into a named assignment"
> won't work. In fact, one of our three examples is:
>    [b for a in iterable if (a.b as b)]
> That's exactly the kind of place that you'd call non-idiomatic with a
> mutating method call, so why is a binding not even worse?

Ah, but that's one of the interesting aspects of the idea: since
comprehensions and generator expressions *already* define their own
nested scope in Python 3 in order to keep the iteration variable from
leaking, their named subexpressions wouldn't leak either :)
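
The existing scoping behaviour this relies on is easy to check today. A small sketch (the variable names are arbitrary):

```python
a = "outer"
values = [a * 2 for a in range(3)]  # this 'a' lives in the comprehension's own scope
after_comprehension = a             # the outer binding is untouched

for i in range(3):
    pass
after_loop = i                      # a for statement's variable does leak
```

A named subexpression inside the comprehension would presumably be bound in that same nested scope, which is why it would not leak either.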

For if/elif clauses and while loops, the leaking would be a desired
feature in order to make the subexpression available for use inside
the following suite body.

That would leave conditional expressions as the main suggested use
case where leaking the named subexpressions might not be desirable.
Without any dedicated syntax, the two ways that first come to mind for
doing expression local named subexpressions would be:

    x = (lambda a=a: b if (a.b as b) else a.c)()
    x = next((b if (a.b as b) else a.c) for a in (a,))

Neither of which would be a particularly attractive option.

The other possibility that comes to mind is to ask the question: "What
happens when a named subexpression appears as part of an argument list
to a function call, or as part of a subscript operation, or as part of
a container display?", as in:

    x = func(b if (a.b as b) else a.c)
    x = y[b if (a.b as b) else a.c]
    x = (b if (a.b as b) else a.c),
    x = [b if (a.b as b) else a.c]
    x = {b if (a.b as b) else a.c}
    x = {'k': b if (a.b as b) else a.c}

Having *those* subexpressions leak seems highly questionable, so it
seems reasonable to suggest that in order for this idea to be workable
in practice, there would need to be some form of implicit scoping rule
where using a named subexpression turned certain constructs into
"scoped subexpressions" that implicitly created a function object and
called it, rather than being evaluated inline as normal. (The dual
pass structure of the code generator should make this technically
feasible - it would be similar to the existing behaviour where the
presence of a yield expression changes the way a containing "def"
statement is handled)

However, that complication is significant enough to make me wonder how
feasible the idea really is - yes, it handles simple cases nicely, but
figuring out how to keep the side effect implications to a manageable
level without making the scoping rules impossibly hard to follow would
be a non-trivial challenge. Without attempting to implement it, I'm
honestly not sure how hard it would be to introduce more comprehension
style implicit scopes to bound the propagation of named subexpressions.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From abarnert at  Mon Jun  8 13:24:33 2015
From: abarnert at (Andrew Barnert)
Date: Mon, 8 Jun 2015 04:24:33 -0700
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 8, 2015, at 03:32, Nick Coghlan <ncoghlan at> wrote:
> On 8 June 2015 at 14:24, Andrew Barnert <abarnert at> wrote:
>> The problem with general named subexpressions is that it inherently means a
>> side effect buried in the middle of an expression. While it's not
>> _impossible_ to do that in Python today (e.g., you can always call a
>> mutating method in a comprehension's if clause or in the third argument to a
>> function), but it's not common or idiomatic.
>> You could say this is a consulting-adults issue and you shouldn't use it in
>> cases where it's not deep inside an expression--but those are the actual
>> motivating cases, the ones where just "pull it out into a named assignment"
>> won't work. In fact, one of our three examples is:
>>   [b for a in iterable if (a.b as b)]
>> That's exactly the kind of place that you'd call non-idiomatic with a
>> mutating method call, so why is a binding not even worse?
> Ah, but that's one of the interesting aspects of the idea: since
> comprehensions and generator expressions *already* define their own
> nested scope in Python 3 in order to keep the iteration variable from
> leaking, their named subexpressions wouldn't leak either :)
> For if/elif clauses and while loops, the leaking would be a desired
> feature in order to make the subexpression available for use inside
> the following suite body.

Except it would also make the subexpression available for use _after_ the suite body. And it would give you a way to accidentally replace rather than shadow a variable from earlier in the function. So it really is just as bad as any other assignment or other mutation inside a condition.

> That would leave conditional expressions as the main suggested use
> case where leaking the named subexpressions might not be desirable.
> Without any dedicated syntax, the two ways that first come to mind for
> doing expression local named subexpressions would be:
>    x = (lambda a=a: b if (a.b as b) else a.c)()
>    x = next((b if (a.b as b) else a.c) for a in (a,))
> Neither of which would be a particularly attractive option.

Especially since if you're willing to introduce an otherwise-unnecessary scope, you don't even need this feature:

    x = (lambda b: b if b else a.c)(a.b)
    x = (lambda b=a.b: b if b else a.c)()

Or, of course, you can just define a reusable ifelse function somewhere:

    def defaultify(val, defaultval):
        return val if val else defaultval

    x = defaultify(a.b, a.c)
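
To make the difference concrete, here is a small runnable sketch. The Record class and its read counter are hypothetical, added only so the number of a.b lookups can be observed; defaultify is repeated so the block runs on its own.

```python
class Record:
    """Hypothetical object whose .b property counts how often it is read."""

    def __init__(self, b, c):
        self._b = b
        self.c = c
        self.reads = 0

    @property
    def b(self):
        self.reads += 1
        return self._b


def defaultify(val, defaultval):
    return val if val else defaultval


a = Record(b="primary", c="fallback")

duplicated = a.b if a.b else a.c               # reads a.b twice when it is truthy
reads_duplicated = a.reads

a.reads = 0
via_lambda = (lambda b: b if b else a.c)(a.b)  # reads a.b once
reads_lambda = a.reads

a.reads = 0
via_helper = defaultify(a.b, a.c)              # reads a.b once, but always evaluates a.c
reads_helper = a.reads
```

One caveat with the helper: both arguments are evaluated before the call, so unlike the conditional expression it cannot skip evaluating a.c.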

> The other possibility that comes to mind is to ask the question: "What
> happens when a named subexpression appears as part of an argument list
> to a function call, or as part of a subscript operation, or as part of
> a container display?", as in:
>    x = func(b if (a.b as b) else a.c)
>    x = y[b if (a.b as b) else a.c]
>    x = (b if (a.b as b) else a.c),
>    x = [b if (a.b as b) else a.c]
>    x = {b if (a.b as b) else a.c}
>    x = {'k': b if (a.b as b) else a.c}
> Having *those* subexpressions leak seems highly questionable, so it
> seems reasonable to suggest that in order for this idea to be workable
> in practice, there would need to be some form of implicit scoping rule
> where using a named subexpression turned certain constructs into
> "scoped subexpressions" that implicitly created a function object and
> called it, rather than being evaluated inline as normal.

Now you really _are_ reinventing let. A let expression like this:

    x = let b=a.b in (b if b else a.c)

... is effectively just syntactic sugar for the lambda above. And it's a lot more natural and easy to reason about than letting b escape one step out to the conditional expression but not any farther. (Or to the rest of the complete containing expression? Or the statement? What does "x[(a.b as b)] = b" mean, for example? Or "x[(b if (a.b as b) else a.c) + (b if (d.b as b) else d.c)]"? Or "x[(b if (a.b as b) else a.c) + b]"?)

As a side note, the initial proposal here was to improve performance by not repeating the a.b lookup; I don't think adding an implicit comprehension-like function definition and call will be faster than a getattr except in very uncommon cases. However, I think there are reasonable cases where it's more about correctness than performance (e.g., the real expression you want to avoid evaluating twice is next(spam) or f.readline(), not a.b), so I'm not too concerned there. Also, I'm pretty sure a JIT could effectively inline a function definition plus call more easily than it could CSE an expression that's hard to prove is static.

From steve at  Mon Jun  8 14:12:28 2015
From: steve at (Steven D'Aprano)
Date: Mon, 8 Jun 2015 22:12:28 +1000
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jun 08, 2015 at 04:24:33AM -0700, Andrew Barnert via Python-ideas wrote:
> > For if/elif clauses and while loops, the leaking would be a desired
> > feature in order to make the subexpression available for use inside
> > the following suite body.
> Except it would also make the subexpression available for use _after_ 
> the suite body. And it would give you a way to accidentally replace 
> rather than shadow a variable from earlier in the function. So it 
> really is just as bad as any other assignment or other mutation inside 
> a condition.

I don't know why you think this will be a bad thing. Or rather, even if 
it is a bad thing, it's the Python Way. Apart from classes and functions 
themselves, indented blocks are *not* new scopes as they may be in some 
other languages. They are part of the existing scope, and the issues you 
raise above are already true today:

    x = 1
    if some_condition():
        x = 2  # replaces, rather than shadows, the earlier x
        y = 3  # y may be available for use after the suite body

So I don't see the following as any more of a problem:

    x = 1
    if (some_condition() as x) or (another_condition() as y):
    # x is replaced, and y is available

The solution to replacing a variable is, use another name. And if you 
really care about y escaping from the if-block, just use del y at the 
end of the block. (I can't imagine why anyone would bother.)
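
The existing behaviour described here can be checked directly. This is a small sketch; the function names `demo` and `leak` are made up for the example.

```python
def demo(flag):
    x = 1
    if flag:
        x = 2  # rebinds the function-local x; the if block is not a new scope
        y = 3  # y is an ordinary function-local, visible after the block
    # y is only bound if the branch actually ran
    return x, (y if flag else None)


def leak(flag):
    if flag:
        y = 3
    return y  # raises UnboundLocalError when flag is false


print(demo(True))   # (2, 3)
print(demo(False))  # (1, None)
```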

> > The other possibility that comes to mind is to ask the question: "What
> > happens when a named subexpression appears as part of an argument list
> > to a function call, or as part of a subscript operation, or as part of
> > a container display?", as in:
> > 
> >    x = func(b if (a.b as b) else a.c)
> >    x = y[b if (a.b as b) else a.c]
> >    x = (b if (a.b as b) else a.c),
> >    x = [b if (a.b as b) else a.c]
> >    x = {b if (a.b as b) else a.c}
> >    x = {'k': b if (a.b as b) else a.c}
> > 
> > Having *those* subexpressions leak seems highly questionable,

I agree with that in regard to the function call. It just feels wrong 
and icky for a binding to occur inside a function call like that. But 
I don't think I agree with respect to the rest. To answer Andrew's later
question:

> What does "x[(a.b as b)] = b" mean

surely it simply means the same as:

    b = a.b
    x[b] = b

Now we could apply the same logic to a function call:

    # func(a.b as b)
    b = a.b
    func(b)

but I think the reason this feels wrong for function calls is that it 
looks like the "as b" binding should be inside the function's scope 
rather than in the caller's scope. (At least that's what it looks like 
to me.) But that doesn't apply to the others. (At least for me.)

But frankly, I think I would prefer to have b escape from the function 
call than to have to deal with a bunch of obscure, complicated and 
unintuitive "as" scoping rules. Simplicity and predictability counts for 
a lot.


From liik.joonas at  Mon Jun  8 14:26:35 2015
From: liik.joonas at (Joonas Liik)
Date: Mon, 8 Jun 2015 15:26:35 +0300
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

I just got this funny feeling reading the last few posts.

# this
f(i+1 as i)

# feels a lot like..

# but really

From ncoghlan at  Mon Jun  8 14:26:28 2015
From: ncoghlan at (Nick Coghlan)
Date: Mon, 8 Jun 2015 22:26:28 +1000
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On 8 June 2015 at 21:24, Andrew Barnert <abarnert at> wrote:
> Now you really _are_ reinventing let. A let expression like this:
>     x = let b=a.b in (b if b else a.c)
> ... is effectively just syntactic sugar for the lambda above.

Sure, I've thought a *lot* about adding let-type syntax - hence PEPs
403 (@in) and 3150 (given) for a couple of variations on statement
level local variables.

The problem with a let expression is that you still end up having to
jumble up the order of things, just as you do with the trick of
defining and calling a function, rather than being able to just name
the subexpression on first execution and refer back to it by name
later rather than repeating the calculation.

Thus a let expression doesn't actually help all that much with
improving the flow of reading or writing code - you still have the
step of pulling the subexpression out and declaring both its name and
value first, before proceeding on with the value of the calculation.
That's not only annoying when writing, but also increases the
cognitive load when reading, since the subexpressions are introduced
in a context free fashion.

When the named subexpressions are inlined, they work more like the way
pronouns in English work:

  When the (named subexpressions as they) are inlined, they work more
like the way pronouns in English work.

It's a matter of setting up a subexpression for a subsequent
backreference, rather than pulling it out into a truly independent

> And it's a lot more natural and easy to reason about than letting b escape one step out to the conditional expression but not any farther. (Or to the rest of the complete containing expression? Or the statement? What does "x[(a.b as b)] = b" mean, for example? Or "x[(b if (a.b as b) else a.c) + (b if (d.b as b) else d.c)]"? Or "x[(b if (a.b as b) else a.c) + b]"?)

Exactly, that's the main problem with named subexpressions - if you
let them *always* leak, you get some very confusing consequences, and
if you *never* let them leak, then you don't address the if statement
and while loop use cases.

So to make them work as desired, you have to say they "sometimes"
leak, and then define what that means in a comprehensible way.

One possible way to do that would be to say that they *never* leak by
default (i.e. using a named subexpression always causes the expression
containing them to be executed in its own scope), and then introduce
some form of special casing into if statements and while loops to
implicitly extract named subexpressions.

> As a side note, the initial proposal here was to improve performance by not repeating the a.b lookup; I don't think adding an implicit comprehension-like function definition and call will be faster than a getattr except in very uncommon cases. However, I think there are reasonable cases where it's more about correctness than performance (e.g., the real expression you want to avoid evaluating twice is next(spam) or f.readline(), not a.b), so I'm not too concerned there. Also, I'm pretty sure a JIT could effectively inline a function definition plus call more easily than it could CSE an expression that's hard to prove is static.

Yes, I'm not particularly interested in speed here - I'm personally
interested in maintainability and expressiveness. (That's also why I
consider this a very low priority project for me personally, as it's
very, very hard to make a programming language easier to use by
*adding* concepts to it. You really want to be giving already emergent
patterns names and syntactic sugar, since you're then replacing
learning a pattern that someone would have eventually had to learn
anyway with learning the dedicated syntax).


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Mon Jun  8 14:38:58 2015
From: ncoghlan at (Nick Coghlan)
Date: Mon, 8 Jun 2015 22:38:58 +1000
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On 8 June 2015 at 22:12, Steven D'Aprano <steve at> wrote:

[In relation to named subexpressions leaking to the surrounding
namespace by default]
> I agree with that in regard to the function call. It just feels wrong
> and icky for a binding to occur inside a function call like that. But
> I don't think I agree with respect to the rest. To answer Andrew's later
> question:
>> What does "x[(a.b as b)] = b" mean
> surely it simply means the same as:
>     b = a.b
>     x[b] = b

Right, but it reveals the execution order jumping around in a way that
is less obvious in the absence of side effects. That is, for side
effect free functions, the order of evaluation in:

    x[a()] = b()

doesn't matter. Once side effects are in play, the order matters a lot more.

> Now we could apply the same logic to a function call:
>     # func(a.b as b)
>     b = a.b
>     func(b)
> but I think the reason this feels wrong for function calls is that it
> looks like the "as b" binding should be inside the function's scope
> rather than in the caller's scope. (At least that's what it looks like
> to me.) But that doesn't apply to the others. (At least for me.)
> But frankly, I think I would prefer to have b escape from the function
> call than to have to deal with a bunch of obscure, complicated and
> unintuitive "as" scoping rules. Simplicity and predictability counts for
> a lot.

Hence the ongoing absence of named subexpressions as a feature - the
simple cases look potentially interesting, but without careful
consideration, the complex cases would inevitably end up depending on
CPython specific quirks in subexpression execution order.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From abarnert at  Mon Jun  8 15:21:40 2015
From: abarnert at (Andrew Barnert)
Date: Mon, 8 Jun 2015 06:21:40 -0700
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 8, 2015, at 05:26, Nick Coghlan <ncoghlan at> wrote:
> The problem with a let expression is that you still end up having to
> jumble up the order of things, just as you do with the trick of
> defining and calling a function, rather than being able to just name
> the subexpression on first execution and refer back to it by name
> later rather than repeating the calculation.

But notice that in two of your three use cases--and, significantly, the ones that are expressions--the place of first execution comes lexically _after_ the reference, so in normal reading order, you're referring _forward_ to it by name. 

He can front a clause without swapping the pronoun and its referent if Nick intends that special emphasis, but otherwise he wouldn't do that in English. That's a valid English sentence, but you have to think for a second to parse it, and then think again to guess what the odd emphasis is supposed to connote.

Sometimes you actually do want that odd emphasis (it seems like a major point of your given proposal), but that's not the case here. It's the temporary name "b" that's unimportant, not its definition; the only reason you need the name at all is to avoid evaluating "a.b" twice. So having it come halfway through the expression is a little weird.

Of course the same thing does happen in comprehensions, but (a) those are one of the few things in Python that are intended to read as much like math as like English, and (b) it almost always _is_ the expression rather than the loop variable that's the interesting part of a comprehension; that isn't generally true for a conditional.

From abarnert at  Mon Jun  8 15:31:34 2015
From: abarnert at (Andrew Barnert)
Date: Mon, 8 Jun 2015 06:31:34 -0700
Subject: [Python-ideas] difflib.SequenceMatcher quick_ratio
In-Reply-To: <>
References: <>
Message-ID: <>

If this really is needed as a performance optimization, surely you want to do something faster than loop over dozens of comparisons to decide whether you can skip the actual work?

I don't know if this is something you can calculate analytically, but if not, you're presumably doing this on zillions of lines, and instead of repeating the loop every time, wouldn't it be better to just do it once and then just check the ratio each time? (You could hide that from the caller by just factoring out the loop to a function _get_ratio_for_threshold and decorating it with @lru_cache. But I don't know if you really need to hide it from the caller.)

Also, do the extra checks for 0, 1, and 0.1 and for empty strings actually speed things up in practice?
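
The implementation under discussion is not reproduced in this thread, so the following is only a sketch of the length-based idea, under the assumption that the bound meant is the one `SequenceMatcher.real_quick_ratio()` already computes: 2 * min(len(a), len(b)) / (len(a) + len(b)). The name `quick_ratio_ge` is taken from the proposal; `length_upper_bound` and both function bodies are hypothetical.

```python
import difflib


def length_upper_bound(a, b):
    # Same value as SequenceMatcher.real_quick_ratio(), computed without
    # building a matcher: no more than min(len(a), len(b)) elements can match.
    total = len(a) + len(b)
    return 2.0 * min(len(a), len(b)) / total if total else 1.0


def quick_ratio_ge(a, b, threshold):
    # Cheap rejection first: if even the upper bound is below the
    # threshold, quick_ratio() cannot reach it either.
    if length_upper_bound(a, b) < threshold:
        return False
    return difflib.SequenceMatcher(None, a, b).quick_ratio() >= threshold


print(quick_ratio_ge("a", "abcdefghij", 0.5))  # False, rejected by length alone
print(quick_ratio_ge("spam", "maps", 0.9))     # True
```

Because the bound is a closed-form expression, there is no loop to cache in this version; whether it pays off in practice would still need measuring.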

> On Jun 8, 2015, at 00:56, floyd <floyd at> wrote:
> Hi *
> I use this python line quite a lot in some projects:
> if difflib.SequenceMatcher(None, a, b).quick_ratio() >= threshold:
> I realized that this is performance-wise not optimal, therefore wrote a
> method that will return much faster in a lot of cases by using the
> length of "a" and "b" to calculate the upper bound for "threshold":
> if difflib.SequenceMatcher.quick_ratio_ge(None, a, b, threshold):
> I'd say we could include it into the stdlib, but maybe it should only be
> a python code recipe?
> I would say this is one of the most frequent use cases for difflib, but
> maybe that's just my biased opinion :) . What's yours?
> See
> cheers,
> floyd
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at
> Code of Conduct:

From ncoghlan at  Mon Jun  8 16:11:26 2015
From: ncoghlan at (Nick Coghlan)
Date: Tue, 9 Jun 2015 00:11:26 +1000
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On 8 June 2015 at 23:21, Andrew Barnert <abarnert at> wrote:
> On Jun 8, 2015, at 05:26, Nick Coghlan <ncoghlan at> wrote:
>> The problem with a let expression is that you still end up having to
>> jumble up the order of things, just as you do with the trick of
>> defining and calling a function, rather than being able to just name
>> the subexpression on first execution and refer back to it by name
>> later rather than repeating the calculation.
> But notice that in two of your three use cases--and, significantly, the ones that are expressions--the place of first execution comes lexically _after_ the reference, so in normal reading order, you're referring _forward_ to it by name.

Right, but as you note later, that jumping around in execution order
is inherent in the way conditional expressions and comprehensions are
constructed, and the named subexpressions track execution order rather
than lexical order. It's also worth noting that the comprehension case
causes the same problem for a let expression that "pull it out to a
separate statement" does for while loops:

    # This works
    x = a.b
    if x:
        # use x

    # This doesn't
    x = a.b
    while x:
        # use x

And similarly:

    # This could work
    x = (let b = a.b in (b if b else a.c))

    # This can't be made to work
    x = (let b = a.b in (b for a in iterable if b))

By contrast, these would both be possible:

    x = b if (a.b as b) else a.c
    x = (b for a in iterable if (a.b as b))

If it's accepted that letting subexpressions of binary, ternary and
quaternary expressions refer to each other is a desirable design goal,
then a new scope definition expression can't handle that requirement -
cross-references require a syntax that can be interleaved with the
existing constructs and track their execution flow, rather than a
syntax that wraps them in a new larger expression.
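
For comparison, the closest spelling of the generator expression case in current Python is a nested generator. This is a sketch; the sample data and the use of `SimpleNamespace` are assumptions for the example, and the `as` form in the comment is the hypothetical syntax from this thread.

```python
from types import SimpleNamespace

iterable = [SimpleNamespace(b=v) for v in (0, "spam", None, "eggs")]

# Hypothetical:  x = (b for a in iterable if (a.b as b))
# Today: an inner generator computes a.b once per item and the outer one
# filters on it, at the cost of having to read the expression inside-out.
x = (b for b in (a.b for a in iterable) if b)

result = list(x)
print(result)  # ['spam', 'eggs']
```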

> He can front a clause without swapping the pronoun and its referent if Nick intends that special emphasis, but otherwise he wouldn't do that in English. That's a valid English sentence, but you have to think for a second to parse it, and then think again to guess what the odd emphasis is supposed to connote.

Yeah, I didn't adequately think through the way the out-of-order
execution weakened the pronoun-and-back-reference analogy.

> Sometimes you actually do want that odd emphasis (it seems like a major point of your given proposal),

It's just a consequence of tracking execution order rather than
lexical order. The *reason* for needing to track execution order is
because it's the only way to handle loops properly (by rebinding the
name to a new value on each iteration).

It's also possible to get a conditional expression to use a back
reference instead of a forward reference by inverting the check:

    x = a.c if not (a.b as b) else b

Or by using the existing "pull the subexpression out to a separate
statement" trick:

    b = a.b
    x = b if b else a.c

You'd never be *forced* to use a forward reference if you felt it made
the code less readable.

The forward reference would be mandatory in comprehensions, but that's
already at least somewhat familiar due to the behaviour of the
iteration variable.

> but that's not the case here. It's the temporary name "b" that's unimportant, not its definition; the only reason you need the name at all is to avoid evaluating "a.b" twice. So having it come halfway through the expression is a little weird.

I'd consider elif clauses, while loops, comprehensions and generator
expressions to be the most useful cases - they're all situations where
pulling the subexpression out to a preceding assignment statement
doesn't work due to the conditional execution of the clause (elif) or
the repeated execution (while loops, comprehensions, generator

For other cases, the semantics would need to be clearly *defined* in
any real proposal, but I would expect a preceding explicit assignment
statement to be clearer most of the time.
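
As a reference point, these are the workarounds current Python needs for the while and elif cases. This is a sketch; `read_lines`, `classify`, and the sample patterns are made up for the example.

```python
import io
import re


def read_lines(stream):
    # "Loop and a half": the call is written once, but the loop condition
    # moves into the body because a preceding assignment would only run once.
    lines = []
    while True:
        line = stream.readline()
        if not line:
            break
        lines.append(line.rstrip("\n"))
    return lines


def classify(text):
    match = re.match(r"\d+", text)
    if match:
        return "number", match.group()
    else:
        # An elif cannot be preceded by an assignment statement, so the
        # second test has to be nested one level deeper.
        match = re.match(r"[a-z]+", text)
        if match:
            return "word", match.group()
    return "other", None


print(read_lines(io.StringIO("first\nsecond\n")))  # ['first', 'second']
print(classify("abc1"))                            # ('word', 'abc')
```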


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From taleinat at  Mon Jun  8 17:15:29 2015
From: taleinat at (Tal Einat)
Date: Mon, 8 Jun 2015 18:15:29 +0300
Subject: [Python-ideas] difflib.SequenceMatcher quick_ratio
In-Reply-To: <ml3klm$jm4$>
References: <> <ml3klm$jm4$>
Message-ID: <>

On Mon, Jun 8, 2015 at 11:44 AM, Serhiy Storchaka <storchaka at> wrote:

> If such function will be added, I think it needs better name. E.g.
> difflib.isclose(a, b, threshold).

Indeed, this is somewhat similar in concept to the recently-added
math.isclose() function, and could use a similar signature, i.e.:
difflib.SequenceMatcher.isclose(a, b, rel_tol=None, abs_tol=None).

However, the real issue here is whether this is important enough to be
included in the stdlib.

Are there any places in the stdlib where this would be useful?

Can anyone other than the OP confirm that they would find having this
in the stdlib particularly useful?

Why should this be in the stdlib vs. a recipe?

- Tal Einat

From wolfram.hinderer at  Mon Jun  8 17:24:04 2015
From: wolfram.hinderer at (Wolfram Hinderer)
Date: Mon, 08 Jun 2015 17:24:04 +0200
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

Am 08.06.2015 um 14:38 schrieb Nick Coghlan:
 > On 8 June 2015 at 22:12, Steven D'Aprano <steve at> wrote:
 > [In relation to named subexpressions leaking to the surrounding
 > namespace by default]
 >>> What does "x[(a.b as b)] = b" mean
 >> surely it simply means the same as:
 >>     b = a.b
 >>     x[b] = b
 > Right, but it reveals the execution order jumping around in a way that
 > is less obvious in the absence of side effects.

I'm lost. The evaluation order of today (right hand side first)
would make "x[(a.b as b)] = b" mean

     x[a.b] = b
     b = a.b

(assuming looking up a.b has no side effects).
Would the introduction of named subexpressions change that, and how?

From bussonniermatthias at  Mon Jun  8 17:39:59 2015
From: bussonniermatthias at (Matthias Bussonnier)
Date: Mon, 8 Jun 2015 08:39:59 -0700
Subject: [Python-ideas] difflib.SequenceMatcher quick_ratio
In-Reply-To: <>
References: <> <ml3klm$jm4$>
Message-ID: <>

> On Jun 8, 2015, at 08:15, Tal Einat <taleinat at> wrote:
> On Mon, Jun 8, 2015 at 11:44 AM, Serhiy Storchaka <storchaka at> wrote:
>> If such function will be added, I think it needs better name. E.g.
>> difflib.isclose(a, b, threshold).
> Indeed, this is somewhat similar in concept to the recently-added
> math.isclose() function, and could use a similar signature, i.e.:
> difflib.SequenceMatcher.isclose(a, b, rel_tol=None, abs_tol=None).
> However, the real issue here is whether this is important enough to be
> included in the stdlib.

One thing I found is that the stdlib trying to be smart (producing a human-friendly diff) makes it horribly slow[1].

>  SequenceMatcher tries to compute a "human-friendly diff" between two sequences.

If you are only interested in a quick ratio, especially on long sequences, I would suggest using another algorithm
that is not n^3 in the worst case.

On some sequences, computing the diff was several orders of magnitude faster for me[2] (pure Python),
to the point where quick_ratio was not needed.

Note also that SequenceMatcher(None, a, b) might not give the same result/ratio as SequenceMatcher(None, b, a):

>>> SequenceMatcher(None,'aba', 'bca').get_matching_blocks()
 [Match(a=0, b=2, size=1), Match(a=3, b=3, size=0)]   # 1 common char

>>> SequenceMatcher(None,'bca','aba').get_matching_blocks()
 [Match(a=0, b=1, size=1), Match(a=2, b=2, size=1), Match(a=3, b=3, size=0)] # 2 common chars.
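
The effect on the ratio itself can be checked with the same two strings. This is a small sketch; the variable names are made up for the example.

```python
from difflib import SequenceMatcher

forward = SequenceMatcher(None, "aba", "bca")
backward = SequenceMatcher(None, "bca", "aba")

# ratio() is 2*M/T, where M is the total size of the matching blocks and
# T is the combined length of both sequences (6 here).
print(forward.ratio())   # one matched character:  2*1/6
print(backward.ratio())  # two matched characters: 2*2/6

# quick_ratio() only compares element counts, so it ignores order and
# gives the same upper bound in both directions.
print(forward.quick_ratio() == backward.quick_ratio())  # True
```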


[1] For my application, I don't really care about having Human Friendly diff, but the actual minimal diff.
     I do understand the need for this algorithm though. 
[2] Well chosen Benchmark : <>


From stephen at  Mon Jun  8 17:42:34 2015
From: stephen at (Stephen J. Turnbull)
Date: Tue, 09 Jun 2015 00:42:34 +0900
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

Robert Collins writes:

 > There's probably an economics theorem to describe that, but I'm not an
 > economist :)

I'd like to refuse the troll, but this one is too good to pass up.

The book which is the authoritative source on the theorems you're
looking for is Nancy Stokey's "The Economics of Inaction".  'Nuff said
on this topic?<wink/>

From robertc at  Mon Jun  8 20:35:48 2015
From: robertc at (Robert Collins)
Date: Tue, 9 Jun 2015 06:35:48 +1200
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <>
References: <>
Message-ID: <>

On 9 June 2015 at 03:42, Stephen J. Turnbull <stephen at> wrote:
> Robert Collins writes:
>  > There's probably an economics theorem to describe that, but I'm not an
>  > economist :)
> I'd like to refuse the troll, but this one is too good to pass up.
> The book which is the authoritative source on the theorems you're
> looking for is Nancy Stokey's "The Economics of Inaction".  'Nuff said
> on this topic?<wink/>

Thanks; wasn't a troll - fishing perhaps :)

I've bought it for kindle and shall have a read.


Robert Collins <rbtcollins at>
Distinguished Technologist
HP Converged Cloud

From ncoghlan at  Mon Jun  8 23:26:09 2015
From: ncoghlan at (Nick Coghlan)
Date: Tue, 9 Jun 2015 07:26:09 +1000
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On 9 June 2015 at 01:24, Wolfram Hinderer
<wolfram.hinderer at> wrote:
> Am 08.06.2015 um 14:38 schrieb Nick Coghlan:
>> On 8 June 2015 at 22:12, Steven D'Aprano <steve at> wrote:
>> [In relation to named subexpressions leaking to the surrounding
>> namespace by default]
>>>> What does "x[(a.b as b)] = b" mean
>>> surely it simply means the same as:
>>>     b = a.b
>>>     x[b] = b
>> Right, but it reveals the execution order jumping around in a way that
>> is less obvious in the absence of side effects.
> I'm lost. The evaluation order of today (right hand side first)
> would make "x[(a.b as b)] = b" mean
>     x[a.b] = b
>     b = a.b
> (assuming looking up a.b has no side effects).

That assumption that the LHS evaluation has no side effects is the one
that gets revealed by named subexpressions:

>>> def subscript():
...     print("Subscript called")
...     return 0
>>> def value():
...     print("Value called")
...     return 42
>>> def target():
...     print("Target called")
...     return [None]
>>> target()[subscript()] = value()
Value called
Target called
Subscript called


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ceridwen.mailing.lists at  Tue Jun  9 16:29:36 2015
From: ceridwen.mailing.lists at (Cara)
Date: Tue, 09 Jun 2015 10:29:36 -0400
Subject: [Python-ideas] Python-ideas Digest, Vol 103, Issue 49
In-Reply-To: <>
References: <>
Message-ID: <>

This isn't directed at Andrew in particular but the discussion in
general, because it wasn't clear to me from how everyone was using the
words that the distinction Andrew mentions was clear.  PEGs are a class
of grammars analogous to context-free grammars, regular grammars, or (to
choose a more obscure example) Boolean grammars
( ).  PEGs are probably not
comparable to CFGs, while Boolean grammars are a strict superset of
CFGs.  Like other grammars, PEGs admit multiple parsing algorithms.  As
Andrew says and as far as I know, OMeta uses a top-down
recursive-descent algorithm with backtracking for parsing PEGs, which is
why it can go exponential on some inputs.  Packrat parsing is the
algorithm that Bryan Ford published at the same time as he introduced
PEGs, and it can parse PEGs in linear time by using memoization instead
of backtracking.  However, it's not the only algorithm that can parse
PEGs, and while I don't know any work on this, it seems plausible that
algorithms for parsing general CFGs like GLL or Earley could be adapted
to parsing PEGs.  Likewise, memoization can be used to avoid
backtracking in a top-down recursive-descent parser for CFGs, though
it's highly unlikely that any algorithm could achieve linear time for
ambiguous CFGs.
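
To make the memoization point concrete, here is a minimal packrat-style sketch for a toy PEG. It is not OMeta's or Ford's implementation; the grammar, the function names, and the use of `functools.lru_cache` as the memo table are all assumptions chosen for the example.

```python
from functools import lru_cache


def parse(text):
    """Evaluate text with a tiny packrat parser; return None if it fails."""

    def peek(pos):
        return text[pos] if pos < len(text) else ""

    # Every rule maps a start position to (value, next_position) or None.
    # lru_cache memoizes that mapping, so each (rule, position) pair is
    # computed at most once even when a failed alternative is retried.
    @lru_cache(maxsize=None)
    def expr(pos):  # Expr <- Term '+' Expr / Term
        left = term(pos)
        if left is not None and peek(left[1]) == "+":
            right = expr(left[1] + 1)
            if right is not None:
                return left[0] + right[0], right[1]
        return term(pos)  # second alternative, answered from the cache

    @lru_cache(maxsize=None)
    def term(pos):  # Term <- Atom '*' Term / Atom
        left = atom(pos)
        if left is not None and peek(left[1]) == "*":
            right = term(left[1] + 1)
            if right is not None:
                return left[0] * right[0], right[1]
        return atom(pos)  # second alternative, answered from the cache

    @lru_cache(maxsize=None)
    def atom(pos):  # Atom <- [0-9]+ / '(' Expr ')'
        end = pos
        while peek(end) != "" and peek(end) in "0123456789":
            end += 1
        if end > pos:
            return int(text[pos:end]), end
        # Ordered choice: tried only because the first alternative failed.
        if peek(pos) == "(":
            inner = expr(pos + 1)
            if inner is not None and peek(inner[1]) == ")":
                return inner[0], inner[1] + 1
        return None

    result = expr(0)
    if result is None or result[1] != len(text):
        return None
    return result[0]


print(parse("(2+3)*4"))  # 20
print(parse("2+"))       # None
```

Without the cache, each level of parentheses would be re-parsed once per failed alternative in both Expr and Term, which is where the exponential behaviour of plain backtracking comes from.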

> > I don't know about OMeta, but the Earley parsing algorithm is
> worst-case cubic time, "quadratic time for unambiguous grammars, and
> linear time for almost all LR(k) grammars".
> I don't know why you'd want to use Earley for parsing a programming
> language. IIRC, it was the first algorithm that could handle rampant
> ambiguity in polynomial time, but that isn't relevant to parsing
> programming languages (especially one like Python, which was
> explicitly designed to be simple to parse), and it isn't relevant to
> natural languages if you're not still in the 1960s, except in learning
> the theory and history of parsing. GLR does much better in
> almost-unambiguous/almost-deterministic languages; CYK can be easily
> extended with weights (which propagate sensibly, so you can use them
> for a final judgment, or to heuristically prune alternatives as you
> go); Valiant is easier to reason about mathematically; etc. And that's
> just among the parsers in the same basic family as Earley.

Do you have a source for the assertion that Earley is slower than GLR?
I've heard many people say this, but I've never seen any formal
comparisons made since Masaru Tomita's 1985 book, "Efficient parsing for
natural language: A fast algorithm for practical systems."

As far as I know, this is emphatically not true for asymptotic
complexity.  In 1991 Joop Leo published "A General Context-free Parsing
Algorithm Running in Linear Time on Every LR(K) Grammar Without Using
Lookahead," a modification to Earley's algorithm that makes it run in
linear time on LR-regular grammars.  The LR-regular grammars include the
LR(k) grammars for all k and are in fact a strict superset of the
deterministic grammars.  Again, as far as I know, GLR parsers run in
linear time on grammars depending on the LR table they're using, which
means in most cases LR(1) or something similar.  There are many
variations of the GLR algorithm now, though, so it's possible there's
one I don't know about that doesn't have this limitation.

As for constant factors, it's possible that GLR is better than Earley.
I'm reluctant to assume that Tomita's findings still hold, though,
because hardware has changed radically since 1985 and because both the
original versions of both GLR and Earley had bugs.  While there are now
versions of both algorithms that fix the bugs, the fixes may have
changed the constant factors.

> In fact, even if I wanted to write an amazing parser library for
> Python (and I kind of do, but I don't know if I have the time), I
> still don't think I'd want to suggest it as a replacement for the
> parser in CPython. Writing all the backward-compat adapters and
> porting the Python parser over with all its quirks intact and building
> the tests to prove that its performance and error handling were
> strictly better and so on wouldn't be nearly as much fun as other
> things I could do with it.

I'm working on a new parser library for Python at that I intend to have some of
the features that have been discussed here (it can be used for
scanner-less/lexer-less parsing, if one wants; a general CFG algorithm
rather than some subset thereof; linear-time parsing on a reasonable
subset of CFGs; integrated semantic actions; embedding in Python rather
than having a separate DSL like EBNF) and some others that haven't been.
The research alone has been a lot of work, and the implementation no
less so.  I'd barely call what I have at the moment pre-alpha.  I'm
exploring an implementation based on a newer general CFG parsing
algorithm, GLL
( ), though I'd like to compare it to Earley on constant factors.


From ethan at  Tue Jun  9 20:00:57 2015
From: ethan at (Ethan Furman)
Date: Tue, 09 Jun 2015 11:00:57 -0700
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On 06/07/2015 07:18 PM, Steven D'Aprano wrote:

> It's not like elif, which is unaffected by any previous if or elif
> clauses. Each if/elif clause is independent.

This is simply not true: each "elif" encountered is only evaluated if all the previous if/elif lines failed, so you have to pay attention to those previous lines to know if execution will even get 
this far.
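
A quick way to see this is to record which conditions actually run. This is a small sketch; the `check` helper is made up for the example.

```python
calls = []


def check(name, result):
    calls.append(name)  # record that this condition was evaluated
    return result


if check("first", True):
    branch = "if"
elif check("second", True):
    branch = "elif"

print(branch, calls)  # if ['first']
```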

>  The test is always made
> (assuming execution reaches that line of code at all),



From wolfram.hinderer at  Tue Jun  9 20:54:00 2015
From: wolfram.hinderer at (Wolfram Hinderer)
Date: Tue, 09 Jun 2015 20:54:00 +0200
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <>

Am 08.06.2015 um 23:26 schrieb Nick Coghlan:
> On 9 June 2015 at 01:24, Wolfram Hinderer
> <wolfram.hinderer at> wrote:
>> Am 08.06.2015 um 14:38 schrieb Nick Coghlan:
>>> On 8 June 2015 at 22:12, Steven D'Aprano <steve at> wrote:
>>> [In relation to named subexpressions leaking to the surrounding
>>> namespace by default]
>>>>> What does "x[(a.b as b)] = b" mean
>>>> surely it simply means the same as:
>>>>      b = a.b
>>>>      x[b] = b
>>> Right, but it reveals the execution order jumping around in a way that
>>> is less obvious in the absence of side effects.
>> I'm lost. The evaluation order of today (right hand side first)
>> would make "x[(a.b as b)] = b" mean
>>      x[a.b] = b
>>      b = a.b
>> (assuming looking up a.b has no side effects).
> That assumption that the LHS evaluation has no side effects is the one
> that gets revealed by named subexpressions:
>>>> def subscript():
> ...     print("Subscript called")
> ...     return 0
> ...
>>>> def value():
> ...     print("Value called")
> ...     return 42
> ...
>>>> def target():
> ...     print("Target called")
> ...     return [None]
> ...
>>>> target()[subscript()] = value()
> Value called
> Target called
> Subscript called
Hm, that's my point, isn't it?
The evaluation of subscript() happens after the evaluation of value().
The object that the RHS evaluates to (i.e. value()) is determined before 
subscript() is evaluated. Side effects of subscript() may mutate this
object, but can't change *which* object is assigned.
But if

     x[(a.b as b)] = b

means

     b = a.b
     x[b] = b

then the evaluation of the LHS *does* change which object is assigned. 
That's why I asked for clarification.

(I mentioned the thing about a.b not having side effects only because in 
my alternative

     x[a.b] = b
     b = a.b

a.b is called twice, so it's no exact representation of what is going on either. But it's a lot closer, at least the right object is assigned ;-) )

From ncoghlan at  Wed Jun 10 01:46:03 2015
From: ncoghlan at (Nick Coghlan)
Date: Wed, 10 Jun 2015 09:46:03 +1000
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On 10 Jun 2015 05:00, "Wolfram Hinderer" <wolfram.hinderer at>

> Hm, that's my point, isn't it?
> The evaluation of subscript() happens after the evaluation of value().
> The object that the RHS evaluates to (i.e. value()) is determined before
> subscript() is evaluated. Side effects of subscript() may mutate this
> object, but can't change *which* object is assigned.
> But if
>     x[(a.b as b)] = b
> means
>     b = a.b
>     x[b] = b

That would be:

    x[b] = (a.b as b)

> then the evaluation of the LHS *does* change which object is assigned.
> That's why I asked for clarification.

Execution order wouldn't change, so it would mean the following:

    _temp = b
    b = a.b
    x[b] = _temp

This means you'd get the potentially surprising behaviour where the name
binding would still happen even if the subscript assignment fails.
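
For readers on Python 3.8 or later, the ordering described above can be
observed today with the assignment expression operator `:=` (PEP 572). It is
used here only as a stand-in for the hypothetical "(a.b as b)" syntax under
discussion; the objects `a`, `x` and `y` are illustrative assumptions, not
part of the proposal.

```python
from types import SimpleNamespace

a = SimpleNamespace(b="new")

# Case 1: the right-hand side is evaluated before the subscript expression.
b = "old"
x = {}
x[(b := a.b)] = b   # RHS reads the *old* b, then the subscript rebinds b
print(x)            # {'new': 'old'}
print(b)            # new

# Case 2: the name binding survives a failed subscript assignment.
b = "old"
y = ()              # tuples do not support item assignment
try:
    y[(b := a.b)] = b
except TypeError:
    pass
print(b)            # new
```

The second case is the "potentially surprising" one: the store into `y`
fails, but `b` has already been rebound by the time it does.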

However if name bindings *didn't* leak out of their containing expression
by default, and while/if/elif code generation instead gained machinery to
retrieve the name bindings for any named subexpressions in the condition,
that would eliminate most of the potentially bizarre edge cases.


From abarnert at  Wed Jun 10 02:20:38 2015
From: abarnert at (Andrew Barnert)
Date: Tue, 9 Jun 2015 17:20:38 -0700
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 9, 2015, at 16:46, Nick Coghlan <ncoghlan at> wrote:
> However if name bindings *didn't* leak out of their containing expression by default, and while/if/elif code generation instead gained machinery to retrieve the name bindings for any named subexpressions in the condition, that would eliminate most of the potentially bizarre edge cases.

I don't think there's any consistent way to define "containing expression" that makes any sense for while/if statements. But "containing _statement_", that's easy.

In addition to the function local scope that exists today, add a statement local scope. Only an as-binding expression creates a new statement-local binding, and it does so in the smallest containing statement (so, e.g., in a while statement's condition, it's the whole while statement, suite and else suite as well as the rest of the condition). These bindings shadow outer as-bindings and function-locals. Assignments inside a statement that as-binds the variable change the statement-local variable, rather than creating a function-local. Two as-bindings within the same statement are treated like an as-binding followed by assignment in the normal (possibly implementation-dependent) evaluation order (which should rarely be relevant, unless you're deliberately writing pathological code). Of course this is much more complex than Python's current rules. But it's not that hard to reason about. In particular, even in silly cases akin to "x[(a.b as b)] = b" and "x[b] = (a.b as b)", either it does what you'd naively expect or raises an UnboundLocalError; it never uses any outer value of b. And, I think, in all of the cases you actually want people to use, it means what you want it to. It even handles cases where you put multiple as bindings for the same name in different subexpressions of an expression in the same part of a statement.

Now, for implementation: any statement that contains an as expression anywhere is compiled to a function definition and a call to that function. The only trick is that any free variables have to be compiled as nonlocals in the inner function and as captured locals in the real function. (This trick doesn't have to apply to lambdas or comprehensions, because they can't have assignment statements inside them, but a while statement can.) I believe this scales to nested statements with as-bindings, and to as-bindings inside explicit local functions and vice-versa.
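
As a rough illustration of that compilation strategy, the desugaring can be
written out by hand in today's Python. This is only a sketch under
assumptions: the `while (... as chunk)` form in the comment is the
hypothetical syntax, and `total_length` and `_stmt` are made-up names for the
enclosing function and the generated inner function.

```python
def total_length(chunks):
    total = 0
    it = iter(chunks)

    # Hypothetical source being desugared:
    #     while (next(it, None) as chunk) is not None:
    #         total += len(chunk)
    def _stmt():
        nonlocal total              # assigned in the suite, so declared nonlocal
        chunk = next(it, None)      # statement-local binding lives in _stmt
        while chunk is not None:
            total += len(chunk)
            chunk = next(it, None)

    _stmt()
    return total


print(total_length(["ab", "cde"]))                    # 5
print("chunk" in total_length.__code__.co_varnames)   # False
```

The name `chunk` never becomes a local of the outer function, which is the
statement-local behaviour being described; the cost is an extra function
object and call per statement.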

The question is, is that the behavior you'd intuitively want, or is escaping to the rest of the smallest statement sometimes unacceptable, or are the rules about assignments inside a controlled suite wrong in some case?

From rosuav at  Wed Jun 10 02:54:55 2015
From: rosuav at (Chris Angelico)
Date: Wed, 10 Jun 2015 10:54:55 +1000
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Jun 10, 2015 at 10:20 AM, Andrew Barnert via Python-ideas
<python-ideas at> wrote:
> Now, for implementation: any statement that contains an as expression anywhere is compiled to a function definition and a call to that function. The only trick is that any free variables have to be compiled as nonlocals in the inner function and as captured locals in the real function. (This trick doesn't have to apply to lambdas or comprehensions, because they can't have assignment statements inside them, but a while statement can.) I believe this scales to nested statements with as-bindings, and to as-bindings inside explicit local functions and vice-versa.

I'd actually rather see this implemented the other way around: instead
of turning this into a function call, actually have a real concept of
nested scoping. Nested functions imply changes to tracebacks and such,
which scoping doesn't require.

How hard would it be to hack the bytecode compiler to treat two names
as distinct despite appearing the same? Example:

def f(x):
    e = 2.718281828
    try:
        return e/x
    except ZeroDivisionError as e:
        raise ContrivedCodeException from e

Currently, f.__code__.co_varnames is ('x', 'e'), and all the
references to e are working with slot 1; imagine if, instead,
co_varnames were ('x', 'e', 'e') and the last two lines used slot 2
instead. Then the final act of the except clause would be to unbind
its local name e (slot 2), and then any code after the except block
would use slot 1 for e, and the original value would "reappear".

The only place that would need to "know" about the stack of scopes is
the compilation step; everything after that just uses the slots. Is
this feasible?
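
For reference, the current behaviour that this idea would change can be
checked directly. The sketch below assumes a trivial definition of
ContrivedCodeException (it is not defined in the example above) and adds a
second, made-up function `g` to show the implicit "del e" at the end of an
except clause.

```python
class ContrivedCodeException(Exception):
    """Placeholder exception, assumed here only so the example runs."""


def f(x):
    e = 2.718281828
    try:
        return e / x
    except ZeroDivisionError as e:
        raise ContrivedCodeException from e


# Today both uses of the name share a single local slot.
print(f.__code__.co_varnames)   # ('x', 'e')


def g(x):
    e = 2.718281828
    try:
        1 / x
    except ZeroDivisionError as e:
        pass
    return e   # the except clause has already unbound e if it ran


print(g(1))    # 2.718281828
try:
    g(0)
except UnboundLocalError:
    print("e was unbound by the except clause")
```

With separate slots for the two names, `g(0)` would instead return the
original float, which is the "reappear" behaviour described above.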


From abarnert at  Wed Jun 10 03:58:15 2015
From: abarnert at (Andrew Barnert)
Date: Tue, 9 Jun 2015 18:58:15 -0700
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 9, 2015, at 17:54, Chris Angelico <rosuav at> wrote:
> On Wed, Jun 10, 2015 at 10:20 AM, Andrew Barnert via Python-ideas
> <python-ideas at> wrote:
>> Now, for implementation: any statement that contains an as expression anywhere is compiled to a function definition and a call to that function. The only trick is that any free variables have to be compiled as nonlocals in the inner function and as captured locals in the real function. (This trick doesn't have to apply to lambdas or comprehensions, because they can't have assignment statements inside them, but a while statement can.) I believe this scales to nested statements with as-bindings, and to as-bindings inside explicit local functions and vice-versa.
> I'd actually rather see this implemented the other way around: instead
> of turning this into a function call, actually have a real concept of
> nested scoping. Nested functions imply changes to tracebacks and such,
> which scoping doesn't require.
> How hard would it be to hack the bytecode compiler to treat two names
> as distinct despite appearing the same?

Here's a quick&dirty idea that might work: Basically, just gensym a name like .0 for the second e (as is done for comprehensions), compile as normal, then rename the .0 back to e in the code attributes.

The problem is how to make this interact with all kinds of other stuff. What if someone calls locals()? What if the outer e was nonlocal or global? What if either e is referenced by an inner function? What if another statement re-rebinds e inside the first statement? What if you do this inside a class (or at top level)?

I think for a quick hack to play with this, you don't have to worry about any of those issues; just say that's illegal, and whatever happens (even a segfault) is your own fault for trying it. (And obviously the same if some C extension calls PyFrame_LocalsToFast or equivalent.) But for a real implementation, I'm not even sure what the rules should be, much less how to implement them. (I'm guessing the implementation could either involve having a stack of symbol tables, or tagging things at the AST level while we've still got a tree and using that info in the last step, but I think there's still a problem telling the machinery how to set up closure cells to link inner functions' free variables.)

Also, all of this assumes that none of the machinery, even for tracebacks and debugging, cares about the name of the variable, just its index. Is that true?

It might be better to not start off worrying about how to get there from here, and instead first try to design the complete scoping rules for a language that's like Python but with nested scopes, and then identify all the places that it would differ from Python, and then decide which parts of the existing machinery you can hack up and which parts you have to completely replace. (Maybe, for example, it would be easier with new bytecodes to replace LOAD_CLOSURE, LOAD_DEREF, MAKE_CLOSURE, etc. than trying to modify the data to make those bytecodes work properly.)

> Example:
> def f(x):
>    e = 2.718281828
>    try:
>        return e/x
>    except ZeroDivisionError as e:
>        raise ContrivedCodeException from e
> Currently, f.__code__.co_varnames is ('x', 'e'), and all the
> references to e are working with slot 1; imagine if, instead,
> co_varnames were ('x', 'e', 'e') and the last two lines used slot 2
> instead. Then the final act of the except clause would be to unbind
> its local name e (slot 2),
> and then any code after the except block
> would use slot 1 for e, and the original value would "reappear".

I don't think that "unbind" is a real step that needs to happen. The names have to get mapped to slot numbers at compile time anyway, so if all code outside of the except clause was compiled to LOAD_FAST 1 instead of LOAD_FAST 2, it doesn't matter that slot 2 has the same name. The only thing you need to do is the existing implicit "del e" on slot 2. (If you somehow managed to do another LOAD_FAST 2 after that, it would just be an UnboundLocalError, which is fine. But no code outside the except clause can compile to that anyway, unless there's a bug in your idea of its implementation or someone does some byteplay stuff).

> The only place that would need to "know" about the stack of scopes is
> the compilation step; everything after that just uses the slots. Is
> this feasible?
> ChrisA

From rosuav at  Wed Jun 10 05:27:38 2015
From: rosuav at (Chris Angelico)
Date: Wed, 10 Jun 2015 13:27:38 +1000
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Jun 10, 2015 at 11:58 AM, Andrew Barnert <abarnert at> wrote:
>> How hard would it be to hack the bytecode compiler to treat two names
>> as distinct despite appearing the same?
> Here's a quick&dirty idea that might work: Basically, just gensym a name like .0 for the second e (as is done for comprehensions), compile as normal, then rename the .0 back to e in the code attributes.

That's something like what I was thinking of, yeah.

> The problem is how to make this interact with all kinds of other stuff. What if someone calls locals()?

Ow, that one I have no idea about. Hmm. That could be majorly
problematic; if you call locals() inside the inner scope, and then use
that dictionary outside it, you should expect it to work. This would
be hard.

> What if the outer e was nonlocal or global?

The inner e will always get its magic name, and it doesn't matter what
the outer e is. That's exactly the same as would happen if there were
no shadowing:

>>> def f(x):
...     global e
...     try: 1/x
...     except ZeroDivisionError as e: pass
...     return e**x
>>> e=2.718281828
>>> f(3)
>>> f(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in f
NameError: name 'e' is not defined

If x is nonzero, the except clause doesn't happen, and no shadowing
happens. With this theory, the same would happen if x is zero - the
"as e" would effectively be "as <e.0>" or whatever the magic name is,
and then "e**x" would use the global e.

It would have to be an error to use a global or nonlocal statement
*inside* the as-governed block:

def f(x):
    try: 1/x
    except Exception as e:
        global e # SyntaxError

I can't imagine that this would be a problem to anyone. The rule is
that "as X" makes X into a statement-local name, and that's
incompatible with a global declaration.

> What if either e is referenced by an inner function?

I don't know about internals and how hard it'd be, but I would expect
that the as-name propagation should continue into the function. A
quick check with dis.dis() suggests that CPython uses a
LOAD_DEREF/STORE_DEREF bytecode to work with nonlocals, so that one
might have to become scope-aware too. (It would be based on
definition, not call, so it should be able to be compiled in somehow,
but I can't say for sure.)

> What if another statement re-rebinds e inside the first statement?

As in, something like this?

def f(x):
    e = 2.718
    try: 1/0
    except Exception as e:
        e = 1

The "e = 1" would assign to <e.0>, because it's in a scope where the
local name e translates into that. Any use of that name, whether
rebinding or referencing, will use the inner scope. But I would expect
this sort of thing to be unusual.

> What if you do this inside a class (or at top level)?

At top level, it would presumably have to create another global. If
you call a function from inside that block, it won't see your
semi-local, though I'm not sure what happens if you _define_ a
function inside a block like that:

with open("spam.log", "a") as logfile:
    def log(x):
        logfile.write(x)

Given that this example wouldn't work anyway (the file would get
closed before the function gets called), and I can't think of any
non-trivial examples where you'd actually want this, I can't call what
ought to happen.

> I think for a quick hack to play with this, you don't have to worry about any of those issues; just say that's illegal, and whatever happens (even a segfault) is your own fault for trying it. But for a real implementation, I'm not even sure what the rules should be, much less how to implement them.

Sure, for a quick-and-dirty. I think some will be illegal long-term too.

> (I'm guessing the implementation could either involve having a stack of symbol tables, or tagging things at the AST level while we've still got a tree and using that info in the last step, but I think there's still a problem telling the machinery how to set up closure cells to link inner functions' free variables.)

I have no idea about the CPython internals, but my broad thinking is
something like this: You start with an empty stack, and add to it
whenever you hit an "as" clause. Whenever you look up a name, you
proceed through the stack from newest to oldest; if you find the name,
you use the mangled name from that stack entry. Otherwise, you use the
same handling as current.
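
That lookup rule can be modelled in a few lines of Python. This is a toy
sketch only, not CPython compiler code: the class name SubscopeStack, its
methods, and the "name.N" mangling scheme are all assumptions made for
illustration.

```python
class SubscopeStack:
    """Toy model of compile-time name resolution with "as" subscopes."""

    def __init__(self):
        self._entries = []   # (original_name, mangled_name), oldest first
        self._counter = 0

    def push(self, name):
        """Enter an "as name" clause and return the mangled name."""
        mangled = f"{name}.{self._counter}"
        self._counter += 1
        self._entries.append((name, mangled))
        return mangled

    def pop(self):
        """Leave the most recently entered "as" clause."""
        return self._entries.pop()

    def resolve(self, name):
        """Search newest to oldest; fall back to the plain name."""
        for original, mangled in reversed(self._entries):
            if original == name:
                return mangled
        return name


scopes = SubscopeStack()
print(scopes.resolve("e"))   # e
scopes.push("e")
print(scopes.resolve("e"))   # e.0
scopes.push("e")
print(scopes.resolve("e"))   # e.1
scopes.pop()
print(scopes.resolve("e"))   # e.0
scopes.pop()
print(scopes.resolve("e"))   # e
```

Because a dot cannot appear in an identifier, a mangled name of this shape
cannot collide with anything the programmer writes.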

> Also, all of this assumes that none of the machinery, even for tracebacks and debugging, cares about the name of the variable, just its index. Is that true?

I'm not entirely sure, but I think that tracebacks etc will start with
the index and then look it up. Having duplicate names in co_varnames
would allow them to look correct. Can someone confirm?

>> Example:
>> def f(x):
>>    e = 2.718281828
>>    try:
>>        return e/x
>>    except ZeroDivisionError as e:
>>        raise ContrivedCodeException from e
>> Currently, f.__code__.co_varnames is ('x', 'e'), and all the
>> references to e are working with slot 1; imagine if, instead,
>> co_varnames were ('x', 'e', 'e') and the last two lines used slot 2
>> instead. Then the final act of the except clause would be to unbind
>> its local name e (slot 2),
>> and then any code after the except block
>> would use slot 1 for e, and the original value would "reappear".
> I don't think that "unbind" is a real step that needs to happen. The names have to get mapped to slot numbers at compile time anyway, so if all code outside of the except clause was compiled to LOAD_FAST 1 instead of LOAD_FAST 2, it doesn't matter that slot 2 has the same name. The only thing you need to do is the existing implicit "del e" on slot 2. (If you somehow managed to do another LOAD_FAST 2 after that, it would just be an UnboundLocalError, which is fine. But no code outside the except clause can compile to that anyway, unless there's a bug in your idea of its implementation or someone does some byteplay stuff).

The unbind is there to prevent a reference loop from causing problems.
And yes, it's effectively the implicit "del e" on slot 2.


From abarnert at  Wed Jun 10 06:03:28 2015
From: abarnert at (Andrew Barnert)
Date: Tue, 9 Jun 2015 21:03:28 -0700
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 9, 2015, at 20:27, Chris Angelico <rosuav at> wrote:
> with open("spam.log", "a") as logfile:
>    def log(x):
>        logfile.write(x)
> Given that this example wouldn't work anyway (the file would get
> closed before the function gets called), and I can't think of any
> non-trivial examples where you'd actually want this, I can't call what
> ought to happen.

The obvious one is:

    with open("spam.log", "a") as logfile:
        def log(x):
            logfile.write(x)
        do_lots_of_stuff(logfunc=log)

Of course in this case you could just pass logfile.write instead of a function, but more generally, anywhere you create a helper or callback as a closure to use immediately (e.g., in a SAX parser) instead of later (e.g., in a network server or GUI) it makes sense to put a closure inside a with statement.
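
A runnable version of that pattern, as it behaves in current Python, might
look like the sketch below. It assumes io.StringIO in place of a real log
file so that nothing is written to disk, and do_lots_of_stuff is given a
made-up body purely for illustration.

```python
import io


def do_lots_of_stuff(logfunc):
    # Stand-in for real work that reports progress through a callback.
    for i in range(3):
        logfunc(f"step {i}\n")


with io.StringIO() as logfile:
    def log(x):
        logfile.write(x)

    do_lots_of_stuff(logfunc=log)      # closure used while the file is open
    contents = logfile.getvalue()

print(contents, end="")

# Using the closure after the with block fails, because the file is closed.
try:
    log("too late\n")
except ValueError as exc:
    print("closed:", exc)
```

Today `logfile` and `log` are ordinary names that outlive the block; under
the statement-local proposal, the open question is how the closure's
reference to `logfile` would be resolved.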

Also, remember that the whole point here is to extend as-binding so it works in if and while conditions, and maybe arbitrary expressions, and those cases it's even more obvious why you'd want to create a closure.

Anyway, I think I know what all the compiled bytecode and code attributes for that case could look like (although I'd need to think through the edge cases), I'm just not sure if the code that compiles it today will be able to handle things without some rename-and-rename-back hack. I suppose the obvious answer is for someone to just try writing it and see. :)

But I think your quick&dirty hack may be worth playing with even if it bans this possibility and a few others, and may not be that hard to do if you make that decision, so if I were you I'd try that first.

From rosuav at  Wed Jun 10 08:17:21 2015
From: rosuav at (Chris Angelico)
Date: Wed, 10 Jun 2015 16:17:21 +1000
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Jun 10, 2015 at 2:03 PM, Andrew Barnert <abarnert at> wrote:
> On Jun 9, 2015, at 20:27, Chris Angelico <rosuav at> wrote:
>> with open("spam.log", "a") as logfile:
>>    def log(x):
>>        logfile.write(x)
>> Given that this example wouldn't work anyway (the file would get
>> closed before the function gets called), and I can't think of any
>> non-trivial examples where you'd actually want this, I can't call what
>> ought to happen.
> The obvious one is:
>     with open("spam.log", "a") as logfile:
>         def log(x):
>             logfile.write(x)
>         do_lots_of_stuff(logfunc=log)
> Of course in this case you could just pass logfile.write instead of a function, but more generally, anywhere you create a helper or callback as a closure to use immediately (e.g., in a SAX parser) instead of later (e.g., in a network server or GUI) it makes sense to put a closure inside a with statement.

Sure. In this example, there'd have to be some kind of "thing" that
exists as a global, and can be referenced by the log function. That's
not too hard; the usage all starts and ends inside the duration of the
"as" effect; any other global named "logfile" would simply be
unavailable. The confusion would come if you try to span the boundary
in some way - when it would be possible to call log(logfile) and have
it write to the log file defined by the with block, but have its
argument come from outside.

At very least, that would want to be strongly discouraged for reasons
of readability.

> Also, remember that the whole point here is to extend as-binding so it works in if and while conditions, and maybe arbitrary expressions, and those cases it's even more obvious why you'd want to create a closure.
> Anyway, I think I know what all the compiled bytecode and code attributes for that case could look like (although I'd need to think through the edge cases), I'm just not sure if the code that compiles it today will be able to handle things without some rename-and-rename-back hack. I suppose the obvious answer is for someone to just try writing it and see. :)
> But I think your quick&dirty hack may be worth playing with even if it bans this possibility and a few others, and may not be that hard to do if you make that decision, so if I were you I'd try that first.

Okay. I'll start poking around with CPython and see what I can do.

I'm reminded of that spectacular slide from David Beazley's talk on
CPython and PyPy tinkering, where he has that VW called CPython, and
then talks about patches, extensions, PEPs... and python-ideas (at about
the four minute mark).


From rosuav at  Wed Jun 10 17:06:26 2015
From: rosuav at (Chris Angelico)
Date: Thu, 11 Jun 2015 01:06:26 +1000
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Jun 10, 2015 at 4:17 PM, Chris Angelico <rosuav at> wrote:
>> But I think your quick&dirty hack may be worth playing with even if it bans this possibility and a few others, and may not be that hard to do if you make that decision, so if I were you I'd try that first.
> Okay. I'll start poking around with CPython and see what I can do.

Here's a gross, disgusting, brutal hack. It applies only to try/except
(but can easily be expanded to other places; it's just a matter of
calling one function at top and bottom), and it currently assumes that
you're in a function scope (not at top level, not directly in a class;
methods are supported).

(Should I create a tracker issue? It's not even at proof-of-concept at
this point.)

Here's how it works: As an 'except' block is entered (at compilation
stage), a new subscope is defined. At the end of the except block,
after the "e = None; del e" opcodes get added in, the subscope is
popped off and disposed of. So long as there is a subscope attached to
the current compilation unit, any name lookups will be redirected
through it. Finally, when co_varnames is populated, names get
de-mangled, thus (possibly) making duplicates in the tuple, but more
importantly, getting tracebacks and such looking correct.

The subscope is a tiny thing that just says "this name now becomes
that mangled name", where the mangled name is the original name dot
something (eg mangle "e" and get back "e.0x12345678"); they're stored
in a linked list in the current compiler_unit.

Currently, locals() basically ignores the magic. If there is no
"regular" name to be shadowed, then it correctly picks up the interior
one; if there are both forms, I've no idea how it picks which one to
put into the dictionary, but it certainly can't logically retain both.
The fact that it manages to not crash and burn is, in my opinion, pure
luck :)

Can compiler_nameop() depend on all names being interned? I have a
full-on PyObject_RichCompareBool() to check for name equality; if
they're all interned, I could simply do a pointer comparison instead.

Next plan: Change compiler_comprehension_generator() to use subscopes
rather than a full nested function, and then do performance testing.
Currently, this can only have slowed things down. Removing the
function call overhead from list comps could give that speed back.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: scope_hack.patch
Type: text/x-patch
Size: 5783 bytes
Desc: not available
URL: <>
-------------- next part --------------
A non-text attachment was scrubbed...
Type: text/x-python
Size: 813 bytes
Desc: not available
URL: <>

From rymg19 at  Wed Jun 10 17:12:19 2015
From: rymg19 at (Ryan Gonzalez)
Date: Wed, 10 Jun 2015 10:12:19 -0500
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

Maybe it's just me, but has several junk characters at the end (^@).

On June 10, 2015 10:06:26 AM CDT, Chris Angelico <rosuav at> wrote:
>On Wed, Jun 10, 2015 at 4:17 PM, Chris Angelico <rosuav at>
>>> But I think your quick&dirty hack may be worth playing with even if
>it bans this possibility and a few others, and may not be that hard to
>do if you make that decision, so if I were you I'd try that first.
>> Okay. I'll start poking around with CPython and see what I can do.
>Here's a gross, disgusting, brutal hack. It applies only to try/except
>(but can easily be expanded to other places; it's just a matter of
>calling one function at top and bottom), and it currently assumes that
>you're in a function scope (not at top level, not directly in a class;
>methods are supported).
>(Should I create a tracker issue? It's not even at proof-of-concept at
>this point.)
>Here's how it works: As an 'except' block is entered (at compilation
>stage), a new subscope is defined. At the end of the except block,
>after the "e = None; del e" opcodes get added in, the subscope is
>popped off and disposed of. So long as there is a subscope attached to
>the current compilation unit, any name lookups will be redirected
>through it. Finally, when co_varnames is populated, names get
>de-mangled, thus (possibly) making duplicates in the tuple, but more
>importantly, getting tracebacks and such looking correct.
>The subscope is a tiny thing that just says "this name now becomes
>that mangled name", where the mangled name is the original name dot
>something (eg mangle "e" and get back "e.0x12345678"); they're stored
>in a linked list in the current compiler_unit.
>Currently, locals() basically ignores the magic. If there is no
>"regular" name to be shadowed, then it correctly picks up the interior
>one; if there are both forms, I've no idea how it picks which one to
>put into the dictionary, but it certainly can't logically retain both.
>The fact that it manages to not crash and burn is, in my opinion, pure
>luck :)
>Can compiler_nameop() depend on all names being interned? I have a
>full-on PyObject_RichCompareBool() to check for name equality; if
>they're all interned, I could simply do a pointer comparison instead.
>Next plan: Change compiler_comprehension_generator() to use subscopes
>rather than a full nested function, and then do performance testing.
>Currently, this can only have slowed things down. Removing the
>function call overhead from list comps could give that speed back.

Sent from my Android device with K-9 Mail. Please excuse my brevity.

From rosuav at  Wed Jun 10 17:15:19 2015
From: rosuav at (Chris Angelico)
Date: Thu, 11 Jun 2015 01:15:19 +1000
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Jun 11, 2015 at 1:12 AM, Ryan Gonzalez <rymg19 at> wrote:
> Maybe it's just me, but has several junk characters at the
> end (^@).

Hmm, I just redownloaded it, and it appears correct. The end of the
file has some triple-quoted strings, the last one ends with three
double quote characters and then a newline, then that's it. But maybe
that's Gmail being too smart and just giving me back what I sent.


From joejev at  Wed Jun 10 17:33:08 2015
From: joejev at (Joseph Jevnik)
Date: Wed, 10 Jun 2015 11:33:08 -0400
Subject: [Python-ideas] slice.literal notation
Message-ID: <>

I was told in the thread that it might be a good idea to bring this up on
python discussions. Here is a link to the proposed patch and some existing
comments:

I often find that when working with pandas and numpy I want to store slice
objects in variables to pass around and re-use; however, the syntax for
constructing a slice literal outside of an indexer is very different from
the syntax used inside of a subscript. This patch proposes the following
change:

    slice.literal

This would be a singleton instance of a class that looks like:

class sliceliteral(object):
    def __getitem__(self, key):
        return key

The basic idea is to provide an alternative constructor to 'slice' that
uses the subscript syntax. This allows people to write more understandable
code.

Consider the following examples:

reverse = slice(None, None, -1)
reverse = slice.literal[::-1]

all_rows_first_col = slice(None), 0
all_rows_first_col = slice.literal[:, 0]

first_row_all_cols_but_last = 0, slice(None, -1)
first_row_all_cols_but_last = slice.literal[0, :-1]

Again, this is not intended to make the code shorter, instead, it is
designed to make it more clear what the slice object you are constructing
looks like.
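
Since `slice.literal` does not exist in any released Python, the idea can be
tried out with a standalone helper. The sketch below uses the made-up names
SliceLiteral and `literal`; it illustrates the proposed behaviour and is not
the patch itself.

```python
class SliceLiteral:
    """Return whatever the subscript syntax produces, unchanged."""

    def __getitem__(self, key):
        return key


literal = SliceLiteral()

reverse = literal[::-1]
print(reverse)                 # slice(None, None, -1)
print("abcdef"[reverse])       # fedcba
print(literal[:, 0])           # (slice(None, None, None), 0)
print(literal[0, :-1])         # (0, slice(None, -1, None))
```

Note that a plain integer in the subscript stays an integer; only the colon
forms become slice objects.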

Another feature of the new `literal` object is that it is not limited to
just the creation of `slice` instances; instead, it is designed to mix
slices and other types together. For example:

>>> slice.literal[0]
0
>>> slice.literal[0, 1]
(0, 1)
>>> slice.literal[0, 1:]
(0, slice(1, None, None))
>>> slice.literal[:, ..., ::-1]
(slice(None, None, None), Ellipsis, slice(None, None, -1))

These examples show that sometimes the subscript notation is much clearer
than the non-subscript notation.
I believe that while this is trivial, it is very convenient to have on the
slice type itself so that it is quickly available. This also prevents
everyone from rolling their own version that is accessible in different ways
(think Py_RETURN_NONE).
Another reason that I chose this approach is that it requires no change to the
syntax to support.

There is a second change proposed here and that is to 'slice.__repr__'.
This change makes the repr of a slice object match the new literal syntax
to make it easier to read.

>>> slice.literal[:]
slice.literal[:]
>>> slice.literal[1:]
slice.literal[1:]
>>> slice.literal[1:-1]
slice.literal[1:-1]
>>> slice.literal[:-1]
slice.literal[:-1]
>>> slice.literal[::-1]
slice.literal[::-1]

This change actually affects old behaviour so I am going to upload it as a
separate patch. I understand that the change to repr may be less desirable
than the addition of 'slice.literal'.

From taleinat at  Wed Jun 10 18:01:49 2015
From: taleinat at (Tal Einat)
Date: Wed, 10 Jun 2015 19:01:49 +0300
Subject: [Python-ideas] slice.literal notation
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Jun 10, 2015 at 6:33 PM, Joseph Jevnik <joejev at> wrote:
> I was told in the thread that it might be a good idea to bring this up on
> python discussions. Here is a link to the proposed patch and some existing
> comments:
> I often find that when working with pandas and numpy I want to store slice
> objects in variables to pass around and re-use; however, the syntax for
> constructing a slice literal outside of an indexer is very different from
> the syntax used inside of a subscript. This patch proposes the following
> change:
>     slice.literal
> This would be a singleton instance of a class that looks like:
> class sliceliteral(object):
>     def __getitem__(self, key):
>         return key
> The basic idea is to provide an alternative constructor to 'slice' that uses
> the subscript syntax. This allows people to write more understandable code.
> Consider the following examples:
> reverse = slice(None, None, -1)
> reverse = slice.literal[::-1]
> all_rows_first_col = slice(None), 0
> all_rows_first_col = slice.literal[:, 0]
> first_row_all_cols_but_last = 0, slice(None, -1)
> first_row_all_cols_but_last = slice.literal[0, :-1]
> Again, this is not intended to make the code shorter, instead, it is
> designed to make it more clear what the slice object your are constructing
> looks like.
> Another feature of the new `literal` object is that it is not limited to
> just the creation of `slice` instances; instead, it is designed to mix
> slices and other types together. For example:
>>>> slice.literal[0]
> 0
>>>> slice.literal[0, 1]
> (0, 1)
>>>> slice.literal[0, 1:]
> (0, slice(1, None, None)
>>>> slice.literal[:, ..., ::-1]
> (slice(None, None, None), Ellipsis, slice(None, None, -1)
> These examples show that sometimes the subscript notation is much more clear
> that the non-subscript notation.
> I believe that while this is trivial, it is very convinient to have on the
> slice type itself so that it is quickly available. This also prevents
> everyone from rolling their own version that is accesible in different ways
> (think Py_RETURN_NONE).
> Another reason that chose this aproach is that it requires no change to the
> syntax to support.

With regard to the first suggestion, this has already been mentioned
on the tracker but is important enough to repeat here: this already
exists in NumPy as IndexExpression, used via numpy.s_ or
numpy.index_exp. For details, see:

- Tal Einat

From random832 at  Wed Jun 10 18:03:06 2015
From: random832 at (random832 at
Date: Wed, 10 Jun 2015 12:03:06 -0400
Subject: [Python-ideas] slice.literal notation
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Jun 10, 2015, at 11:33, Joseph Jevnik wrote:
> ...

What about slice[...]?

From joejev at  Wed Jun 10 18:10:47 2015
From: joejev at (Joseph Jevnik)
Date: Wed, 10 Jun 2015 12:10:47 -0400
Subject: [Python-ideas] slice.literal notation
In-Reply-To: <>
References: <>
Message-ID: <>

I considered `slice[...]`; however, this would change some existing
behaviour. It would mean putting a metaclass on slice, and then
`type(slice) is type` would no longer be true. Also, with 3.5's typing
work, we are overloading the meaning of indexing a type object. Adding
slice.literal does not break anything or conflict with any syntax.
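
The metaclass point can be sketched with a stand-in class (the built-in
slice cannot be modified from Python code, so SubscriptMeta and myslice
below are hypothetical names used only for illustration):

```python
class SubscriptMeta(type):
    """Hypothetical metaclass that makes the class object subscriptable."""

    def __getitem__(cls, key):
        # The subscript is handed over unchanged, as in the proposal.
        return key


class myslice(metaclass=SubscriptMeta):
    """Stand-in for a slice type that supports myslice[...]."""


print(myslice[1:2])
print(type(slice) is type)      # the built-in has the plain metaclass
print(type(myslice) is type)    # the stand-in no longer does
```

Supporting the bracket form on the class itself is what forces the
custom metaclass, and with it the change to `type(...) is type`.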

On Wed, Jun 10, 2015 at 12:03 PM, <random832 at> wrote:

> On Wed, Jun 10, 2015, at 11:33, Joseph Jevnik wrote:
> > ...
> What about slice[...]?
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at
> Code of Conduct:

From solipsis at  Wed Jun 10 18:20:45 2015
From: solipsis at (Antoine Pitrou)
Date: Wed, 10 Jun 2015 18:20:45 +0200
Subject: [Python-ideas] slice.literal notation
References: <>
Message-ID: <20150610182045.01eb4ee7@fsol>

On Wed, 10 Jun 2015 19:01:49 +0300
Tal Einat <taleinat at> wrote:
> >
> > >>> slice.literal[0]
> > 0
> > >>> slice.literal[0, 1]
> > (0, 1)
> > >>> slice.literal[0, 1:]
> > (0, slice(1, None, None))
> > >>> slice.literal[:, ..., ::-1]
> > (slice(None, None, None), Ellipsis, slice(None, None, -1))
> >
> > These examples show that sometimes the subscript notation is much clearer
> > than the non-subscript notation.

Agreed.

> > I believe that while this is trivial, it is very convenient to have on the
> > slice type itself so that it is quickly available. This also prevents
> > everyone from rolling their own version that is accessible in different ways
> > (think Py_RETURN_NONE).
> > Another reason I chose this approach is that it requires no change to the
> > syntax.
> With regard to the first suggestion, this has already been mentioned
> on the tracker but is important enough to repeat here: this already
> exists in NumPy as IndexExpression, used via numpy.s_ or
> numpy.index_exp.

Probably, but it looks useful enough to integrate into the standard
library.
Another possible place for it would be the ast module.

Regards

Antoine.



From joejev at  Wed Jun 10 18:23:32 2015
From: joejev at (Joseph Jevnik)
Date: Wed, 10 Jun 2015 12:23:32 -0400
Subject: [Python-ideas] slice.literal notation
In-Reply-To: <20150610182045.01eb4ee7@fsol>
References: <>
Message-ID: <>

I am not sure if this makes sense in the ast module only because it does
not generate _ast.Slice objects and instead returns the keys.

On Wed, Jun 10, 2015 at 12:20 PM, Antoine Pitrou <solipsis at> wrote:

> On Wed, 10 Jun 2015 19:01:49 +0300
> Tal Einat <taleinat at> wrote:
> > >
> > > >>> slice.literal[0]
> > > 0
> > > >>> slice.literal[0, 1]
> > > (0, 1)
> > > >>> slice.literal[0, 1:]
> > > (0, slice(1, None, None))
> > > >>> slice.literal[:, ..., ::-1]
> > > (slice(None, None, None), Ellipsis, slice(None, None, -1))
> > >
> > > These examples show that sometimes the subscript notation is much
> > > clearer than the non-subscript notation.
> Agreed.
> > > I believe that while this is trivial, it is very convenient to have on
> > > the slice type itself so that it is quickly available. This also
> > > prevents everyone from rolling their own version that is accessible in
> > > different ways (think Py_RETURN_NONE).
> > > Another reason I chose this approach is that it requires no change to
> > > the syntax.
> >
> > With regard to the first suggestion, this has already been mentioned
> > on the tracker but is important enough to repeat here: this already
> > exists in NumPy as IndexExpression, used via numpy.s_ or
> > numpy.index_exp.
> Probably, but it looks useful enough to integrate into the standard
> library.
> Another possible place for it would be the ast module.
> Regards
> Antoine.

From solipsis at  Wed Jun 10 18:26:42 2015
From: solipsis at (Antoine Pitrou)
Date: Wed, 10 Jun 2015 18:26:42 +0200
Subject: [Python-ideas] slice.literal notation
References: <>
Message-ID: <20150610182642.27fdf6a5@fsol>

On Wed, 10 Jun 2015 12:23:32 -0400
Joseph Jevnik <joejev at> wrote:

> I am not sure if this makes sense in the ast module only because it does
> not generate _ast.Slice objects and instead returns the keys.

There's already ast.literal_eval() there, so that was why I thought it
could be related. Though literal_eval() *compiles* its input, which
slice.literal wouldn't, so the relationship is quite distant...
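
A short sketch of that difference, using only ast.literal_eval() from the
standard library (the helper name rejects is made up for this example):
literal_eval() takes source text and accepts literals only, so neither a
bare slice nor a call to slice() gets through it.

```python
import ast

# literal_eval() parses a string that contains only literal syntax.
print(ast.literal_eval("(0, 1)"))


def rejects(text):
    """Return True if ast.literal_eval() refuses the given source text."""
    try:
        ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return True
    return False


# A bare slice is not an expression, and a call is not a literal.
print(rejects("1:2"))
print(rejects("slice(1, 2)"))
```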



From mertz at  Wed Jun 10 21:05:38 2015
From: mertz at (David Mertz)
Date: Wed, 10 Jun 2015 12:05:38 -0700
Subject: [Python-ideas] slice.literal notation
In-Reply-To: <>
References: <>
Message-ID: <>


+1

This is an elegant improvement that doesn't affect backward compatibility.
Obviously, the difference between the spelling 'sliceliteral[::-1]' and
'slice.literal[::-1]' isn't that big, but having it attached to the slice
type itself rather than a user class feels more natural.

On Wed, Jun 10, 2015 at 8:33 AM, Joseph Jevnik <joejev at> wrote:

> I was told in the thread that it might be a good idea to bring this up on
> python discussions. Here is a link to the proposed patch and some existing
> comments:
> I often find that when working with pandas and numpy I want to store slice
> objects in variables to pass around and re-use; however, the syntax for
> constructing a slice literal outside of an indexer is very different from
> the syntax used inside of a subscript. This patch proposes the following
> change:
>     slice.literal
> This would be a singleton instance of a class that looks like:
> class sliceliteral(object):
>     def __getitem__(self, key):
>         return key
> The basic idea is to provide an alternative constructor to 'slice' that
> uses the subscript syntax. This allows people to write more understandable
> code.
> Consider the following examples:
> reverse = slice(None, None, -1)
> reverse = slice.literal[::-1]
> all_rows_first_col = slice(None), 0
> all_rows_first_col = slice.literal[:, 0]
> first_row_all_cols_but_last = 0, slice(None, -1)
> first_row_all_cols_but_last = slice.literal[0, :-1]
> Again, this is not intended to make the code shorter; instead, it is
> designed to make it clearer what the slice object you are constructing
> looks like.
> Another feature of the new `literal` object is that it is not limited to
> just the creation of `slice` instances; instead, it is designed to mix
> slices and other types together. For example:
> >>> slice.literal[0]
> 0
> >>> slice.literal[0, 1]
> (0, 1)
> >>> slice.literal[0, 1:]
> (0, slice(1, None, None))
> >>> slice.literal[:, ..., ::-1]
> (slice(None, None, None), Ellipsis, slice(None, None, -1))
> These examples show that sometimes the subscript notation is much
> clearer than the non-subscript notation.
> I believe that while this is trivial, it is very convenient to have on the
> slice type itself so that it is quickly available. This also prevents
> everyone from rolling their own version that is accessible in different ways
> (think Py_RETURN_NONE).
> Another reason I chose this approach is that it requires no change to
> the syntax.
> There is a second change proposed here and that is to 'slice.__repr__'.
> This change makes the repr of a slice object match the new literal syntax
> to make it easier to read.
> >>> slice.literal[:]
> slice.literal[:]
> >>> slice.literal[1:]
> slice.literal[1:]
> >>> slice.literal[1:-1]
> slice.literal[1:-1]
> >>> slice.literal[:-1]
> slice.literal[:-1]
> >>> slice.literal[::-1]
> slice.literal[::-1]
> This change actually affects old behaviour, so I am going to upload it as a
> separate patch. I understand that the change to repr may be less desirable
> than the addition of 'slice.literal'.

Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.

From ethan at  Wed Jun 10 21:16:42 2015
From: ethan at (Ethan Furman)
Date: Wed, 10 Jun 2015 12:16:42 -0700
Subject: [Python-ideas] slice.literal notation
In-Reply-To: <>
References: <>
Message-ID: <>

On 06/10/2015 08:33 AM, Joseph Jevnik wrote:

> The basic idea is to provide an alternative constructor to 'slice' that uses
>  the subscript syntax. This allows people to write more understandable code.

+1


> There is a second change proposed here and that is to 'slice.__repr__'. This
>  change makes the repr of a slice object match the new literal syntax to make
>  it easier to read.

-1


Having the old repr makes it possible to see what the equivalent slice() spelling is.


From joejev at  Wed Jun 10 21:18:24 2015
From: joejev at (Joseph Jevnik)
Date: Wed, 10 Jun 2015 15:18:24 -0400
Subject: [Python-ideas] slice.literal notation
In-Reply-To: <>
References: <>
Message-ID: <>

Ethan, I am also not 100% on the new repr; I just wanted to propose this
change. In the issue, I have separated that change into its own patch to
make it easier to apply slice.literal without the repr update.

On Wed, Jun 10, 2015 at 3:16 PM, Ethan Furman <ethan at> wrote:

> On 06/10/2015 08:33 AM, Joseph Jevnik wrote:
>> The basic idea is to provide an alternative constructor to 'slice' that
>> uses the subscript syntax. This allows people to write more understandable
>> code.
> +1
>> There is a second change proposed here and that is to 'slice.__repr__'.
>> This change makes the repr of a slice object match the new literal syntax
>> to make it easier to read.
> -1
> Having the old repr makes it possible to see what the equivalent slice()
> spelling is.
> --
> ~Ethan~

From taleinat at  Wed Jun 10 22:21:29 2015
From: taleinat at (Tal Einat)
Date: Wed, 10 Jun 2015 23:21:29 +0300
Subject: [Python-ideas] slice.literal notation
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Jun 10, 2015 at 10:05 PM, David Mertz <mertz at> wrote:
> On Wed, Jun 10, 2015 at 8:33 AM, Joseph Jevnik <joejev at> wrote:
>> I was told in the thread that it might be a good idea to bring this up on
>> python discussions. Here is a link to the proposed patch and some existing
>> comments:
>> I often find that when working with pandas and numpy I want to store slice
>> objects in variables to pass around and re-use; however, the syntax for
>> constructing a slice literal outside of an indexer is very different from
>> the syntax used inside of a subscript. This patch proposes the following
>> change:
>>     slice.literal
>> This would be a singleton instance of a class that looks like:
>> class sliceliteral(object):
>>     def __getitem__(self, key):
>>         return key
>> The basic idea is to provide an alternative constructor to 'slice' that
>> uses the subscript syntax. This allows people to write more understandable
>> code.
>> Consider the following examples:
>> reverse = slice(None, None, -1)
>> reverse = slice.literal[::-1]
>> all_rows_first_col = slice(None), 0
>> all_rows_first_col = slice.literal[:, 0]
>> first_row_all_cols_but_last = 0, slice(None, -1)
>> first_row_all_cols_but_last = slice.literal[0, :-1]
>> Again, this is not intended to make the code shorter; instead, it is
>> designed to make it clearer what the slice object you are constructing
>> looks like.
>> Another feature of the new `literal` object is that it is not limited to
>> just the creation of `slice` instances; instead, it is designed to mix
>> slices and other types together. For example:
>> >>> slice.literal[0]
>> 0
>> >>> slice.literal[0, 1]
>> (0, 1)
>> >>> slice.literal[0, 1:]
>> (0, slice(1, None, None))
>> >>> slice.literal[:, ..., ::-1]
>> (slice(None, None, None), Ellipsis, slice(None, None, -1))
>> These examples show that sometimes the subscript notation is much
>> clearer than the non-subscript notation.
>> I believe that while this is trivial, it is very convenient to have on the
>> slice type itself so that it is quickly available. This also prevents
>> everyone from rolling their own version that is accessible in different ways
>> (think Py_RETURN_NONE).
>> Another reason I chose this approach is that it requires no change to
>> the syntax.
> +1
> This is an elegant improvement that doesn't affect backward compatibility.
> Obviously, the difference between the spelling 'sliceliteral[::-1]' and
> 'slice.literal[::-1]' isn't that big, but having it attached to the slice
> type itself rather than a user class feels more natural.

I dislike adding this to the slice class since many use cases don't
result in a slice at all. For example:

[0] -> int
[...] -> Ellipsis
[0:1, 2:3] -> 2-tuple of slice objects

I like NumPy's name of IndexExpression, perhaps we can stick to that?

As for where it would reside, some possibilities are:

* the operator module
* as part of the abstract base class
* the types module
* builtins

- Tal

From random832 at  Wed Jun 10 22:26:52 2015
From: random832 at (random832 at
Date: Wed, 10 Jun 2015 16:26:52 -0400
Subject: [Python-ideas] slice.literal notation
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Jun 10, 2015, at 15:16, Ethan Furman wrote:
> On 06/10/2015 08:33 AM, Joseph Jevnik wrote:
> > There is a second change proposed here and that is to 'slice.__repr__'. This
> >  change makes the repr of a slice object match the new literal syntax to make
> >  it easier to read.
> -1
> Having the old repr makes it possible to see what the equivalent slice()
> spelling is.

How about a separate method, slice.as_index_syntax() that just returns

From greg.ewing at  Thu Jun 11 00:16:55 2015
From: greg.ewing at (Greg Ewing)
Date: Thu, 11 Jun 2015 10:16:55 +1200
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

Chris Angelico wrote:
> How hard would it be to hack the bytecode compiler to treat two names
> as distinct despite appearing the same?

Back when list comprehensions were changed to not leak
the variable, it was apparently considered too hard to
be worth the effort, since we ended up with the nested
function implementation.


From greg.ewing at  Thu Jun 11 00:29:01 2015
From: greg.ewing at (Greg Ewing)
Date: Thu, 11 Jun 2015 10:29:01 +1200
Subject: [Python-ideas] slice.literal notation
In-Reply-To: <>
References: <>
Message-ID: <>

David Mertz wrote:

> This
>     patch proposes the following change:
>         slice.literal

It would be even nicer if the slice class itself
implemented the [] syntax:

    myslice = slice[1:2]


From joejev at  Thu Jun 11 00:30:22 2015
From: joejev at (Joseph Jevnik)
Date: Wed, 10 Jun 2015 18:30:22 -0400
Subject: [Python-ideas] slice.literal notation
In-Reply-To: <>
References: <>
Message-ID: <>

Do you think that dropping '.literal' is worth the change in behaviour?

On Wed, Jun 10, 2015 at 6:29 PM, Greg Ewing <greg.ewing at> wrote:

> David Mertz wrote:
>  This
>>     patch proposes the following change:
>>         slice.literal
> It would be even nicer if the slice class itself
> implemented the [] syntax:
>    myslice = slice[1:2]
> --
> Greg

From rosuav at  Thu Jun 11 01:38:55 2015
From: rosuav at (Chris Angelico)
Date: Thu, 11 Jun 2015 09:38:55 +1000
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Jun 11, 2015 at 8:16 AM, Greg Ewing <greg.ewing at> wrote:
> Chris Angelico wrote:
>> How hard would it be to hack the bytecode compiler to treat two names
>> as distinct despite appearing the same?
> Back when list comprehensions were changed to not leak
> the variable, it was apparently considered too hard to
> be worth the effort, since we ended up with the nested
> function implementation.

Yeah. I now have a brutal hack that does exactly that, so I'm fully
expecting someone to point out "Uhh, this isn't going to work


From tjreedy at  Thu Jun 11 01:43:13 2015
From: tjreedy at (Terry Reedy)
Date: Wed, 10 Jun 2015 19:43:13 -0400
Subject: [Python-ideas] slice.literal notation
In-Reply-To: <>
References: <>
Message-ID: <mlai3l$eot$>

On 6/10/2015 11:33 AM, Joseph Jevnik wrote:

> I often find that when working with pandas and numpy I want to store
> slice objects in variables to pass around and re-use; however, the
> syntax for constructing a slice literal outside of an indexer is very
> different from the syntax used inside of a subscript. This patch
> proposes the following change:
>      slice.literal
> This would be a singleton instance of a class that looks like:
> class sliceliteral(object):
>      def __getitem__(self, key):
>          return key

Alternate constructors are implemented as class methods.

class slice:
     ...
     @classmethod
     def literal(cls, key):
         if isinstance(key, cls):
             return key
         else:
             raise ValueError('slice literal must be slice')

They are typically named fromxyz or from_xyz.

Tal Einat pointed out that not all keys are slices:

 > [0] -> int
 > [...] -> Ellipsis
 > [0:1, 2:3] -> 2-tuple of slice objects

I think the first two cases should be value errors. The third might be
debated, but if allowed, this would not be a slice constructor.
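
One way to picture that stricter behaviour while keeping the subscript
spelling is sketched below. StrictSliceLiteral is a hypothetical name, and
rejecting the tuple case is an assumption made for the sketch, not
something settled in this thread:

```python
class StrictSliceLiteral:
    """Hypothetical helper: subscript syntax in, slice objects only out."""

    def __getitem__(self, key):
        if isinstance(key, slice):
            return key
        # Integers, Ellipsis and tuples of keys are all refused here.
        raise ValueError("slice literal must be a slice, got %r" % (key,))


strict = StrictSliceLiteral()
print(strict[::-1])

for make_key in (lambda: strict[0], lambda: strict[...],
                 lambda: strict[0:1, 2:3]):
    try:
        make_key()
    except ValueError as exc:
        print("rejected:", exc)
```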

Terry Jan Reedy

From joejev at  Thu Jun 11 01:45:34 2015
From: joejev at (Joseph Jevnik)
Date: Wed, 10 Jun 2015 19:45:34 -0400
Subject: [Python-ideas] slice.literal notation
In-Reply-To: <mlai3l$eot$>
References: <>
Message-ID: <>

We cannot use a class method here because slice.literal(:) is a syntax
error.
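
The grammar restriction can be checked without running anything, since
compile() only parses the text. A small sketch (the helper name compiles
is made up for this example):

```python
def compiles(source):
    """Return True if the text parses as a Python expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# Slice notation is only accepted inside square brackets.
print(compiles("slice.literal(:)"))
print(compiles("slice.literal[:]"))
print(compiles("x[1:2, ::-1]"))
```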

On Wed, Jun 10, 2015 at 7:43 PM, Terry Reedy <tjreedy at> wrote:

> On 6/10/2015 11:33 AM, Joseph Jevnik wrote:
>  I often find that when working with pandas and numpy I want to store
>> slice objects in variables to pass around and re-use; however, the
>> syntax for constructing a slice literal outside of an indexer is very
>> different from the syntax used inside of a subscript. This patch
>> proposes the following change:
>>      slice.literal
>> This would be a singleton instance of a class that looks like:
>> class sliceliteral(object):
>>      def __getitem__(self, key):
>>          return key
> Alternate constructors are implemented as class methods.
> class slice:
>     ...
>     @classmethod
>     def literal(cls, key):
>         if isinstance(key, cls):
>             return key
>         else:
>             raise ValueError('slice literal must be slice')
> They are typically named fromxyz or from_xyz.
> Tal Einat pointed out that not all keys are slices:
> > [0] -> int
> > [...] -> Ellipsis
> > [0:1, 2:3] -> 2-tuple of slice objects
> I think the first two cases should be value errors. The third might be
> debated, but if allowed, this would not be a slice constructor.
> --
> Terry Jan Reedy

From ncoghlan at  Thu Jun 11 02:54:46 2015
From: ncoghlan at (Nick Coghlan)
Date: Thu, 11 Jun 2015 10:54:46 +1000
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On 10 June 2015 at 10:20, Andrew Barnert <abarnert at> wrote:
> On Jun 9, 2015, at 16:46, Nick Coghlan <ncoghlan at> wrote:
>> However if name bindings *didn't* leak out of their containing expression by default, and while/if/elif code generation instead gained machinery to retrieve the name bindings for any named subexpressions in the condition, that would eliminate most of the potentially bizarre edge cases.
> I don't think there's any consistent way to define "containing expression" that makes any sense for while/if statements.

Sure there is. The gist of the basic "no leak" behaviour could be
something like:

1. Any expression containing a named subexpression would automatically
be converted to a lambda expression that is defined and called inline
(expressions that already implicitly define their own scope,
specifically comprehensions and generator expressions, would
terminate the search for the "containing expression" node and allow
this step to be skipped).
2. Any name references from within the expression that are not
references to named subexpressions or comprehension iteration
variables would be converted to parameter names for the implicitly
defined lambda expression, and thus resolved in the containing scope
rather than the nested scope.

In that basic mode, the only thing made available from the implicitly
created scope would be the result of the lambda expression. Something
like:

    x = (250 as a)*a + b

would be equivalent to:

    x = (lambda b: ((250 as a)*a + b))(b)
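
Since "(250 as a)" is not valid Python, the effect can only be
approximated. The sketch below is an emulation under that assumption, not
the proposed implementation: the outer lambda takes the free name b as a
parameter, and the named subexpression becomes the parameter of an inner
lambda, so nothing but the result escapes.

```python
b = 3

# Proposed spelling (not valid Python):  x = (250 as a)*a + b
# Emulation: 'a' exists only inside the inner lambda's scope.
x = (lambda b: (lambda a: a * a + b)(250))(b)
print(x)

# The name 'a' was never bound in the enclosing scope.
try:
    a
except NameError:
    leaked = False
else:
    leaked = True
print(leaked)
```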

if/elif/while clauses would define the behaviour of their conditional
expressions slightly differently: for those, the values of any named
subexpressions would also be passed back out, allowing them to be
bound appropriately in the outer scope (requiring compatibility with
class and module namespaces means it wouldn't be possible to use cell
references here).

Whether there should be a separate "bindlocal" statement for lifting
named subexpressions out of an expression and binding them all locally
would be an interesting question - I can't think of a good *use case*
for that, but it would be a good hook for explaining the difference
between the default behaviour of named subexpressions and the variant
used in if/elif/while conditional expressions.

> But "containing _statement_", that's easy.

No, it's not, because statements already contain name binding
operations that persist beyond the scope of the statement. In addition
to actual assignment statements, there are also for loops, with
statements, class definitions and function definitions.

Having the presence of a named subexpression magically change the
scope of the statement level name binding operations wouldn't be
acceptable, and having some name bindings propagate but not others
gets very tricky in the general case. (PEPs 403 and 3150 go into some
of the complexities.)


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Thu Jun 11 03:16:00 2015
From: ncoghlan at (Nick Coghlan)
Date: Thu, 11 Jun 2015 11:16:00 +1000
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On 10 June 2015 at 10:54, Chris Angelico <rosuav at> wrote:
> On Wed, Jun 10, 2015 at 10:20 AM, Andrew Barnert via Python-ideas
> <python-ideas at> wrote:
>> Now, for implementation: any statement that contains an as expression anywhere is compiled to a function definition and a call to that function. The only trick is that any free variables have to be compiled as nonlocals in the inner function and as captured locals in the real function. (This trick doesn't have to apply to lambdas or comprehensions, because they can't have assignment statements inside them, but a while statement can.) I believe this scales to nested statements with as-bindings, and to as-bindings inside explicit local functions and vice-versa.
> I'd actually rather see this implemented the other way around: instead
> of turning this into a function call, actually have a real concept of
> nested scoping. Nested functions imply changes to tracebacks and such,
> which scoping doesn't require.
> How hard would it be to hack the bytecode compiler to treat two names
> as distinct despite appearing the same?

I tried to do this when working with Georg Brandl to implement the
Python 3 change to hide the iteration variable in comprehensions and
generator expressions, and I eventually gave up and used an implicit
local function definition:

This earlier post from just before we started working on that covers
some of the approaches I tried, as well as noting why this problem is
much harder than it might first seem:

One of the other benefits that I don't believe came up in either of
those threads is that using real frames for implicit scoping means
that *other tools* already know how to cope with it - pdb, gdb,
inspect, dis, traceback, etc, are all able to deal with what's going
on. If you introduce a new *kind* of scope, rather than just
implicitly using another level of our *existing* scoping rules, then
there's a whole constellation of tools (including other interpreter
implementations) that will need adjusting to model an entirely new
semantic concept, rather than another instance of an existing concept.
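
The observable behaviour, and roughly what an implicit local function
amounts to, can be sketched as follows. The function and variable names
are made up for the example, and the second function is only a hand-written
picture of the idea, not what any particular interpreter version generates:

```python
def comprehension_leaks():
    """Report whether the iteration variable is visible afterwards."""
    doubled = [item_ * 2 for item_ in range(3)]
    try:
        item_
    except NameError:
        return doubled, False
    return doubled, True


def explicit_equivalent():
    """The same computation with the nested function written out by hand."""
    def _listcomp(iterable):
        result = []
        for item_ in iterable:
            result.append(item_ * 2)
        return result
    return _listcomp(range(3))


print(comprehension_leaks())
print(explicit_equivalent())
```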


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Thu Jun 11 03:30:37 2015
From: ncoghlan at (Nick Coghlan)
Date: Thu, 11 Jun 2015 11:30:37 +1000
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On 11 June 2015 at 11:16, Nick Coghlan <ncoghlan at> wrote:
> On 10 June 2015 at 10:54, Chris Angelico <rosuav at> wrote:
>> On Wed, Jun 10, 2015 at 10:20 AM, Andrew Barnert via Python-ideas
>> <python-ideas at> wrote:
>>> Now, for implementation: any statement that contains an as expression anywhere is compiled to a function definition and a call to that function. The only trick is that any free variables have to be compiled as nonlocals in the inner function and as captured locals in the real function. (This trick doesn't have to apply to lambdas or comprehensions, because they can't have assignment statements inside them, but a while statement can.) I believe this scales to nested statements with as-bindings, and to as-bindings inside explicit local functions and vice-versa.
>> I'd actually rather see this implemented the other way around: instead
>> of turning this into a function call, actually have a real concept of
>> nested scoping. Nested functions imply changes to tracebacks and such,
>> which scoping doesn't require.
>> How hard would it be to hack the bytecode compiler to treat two names
>> as distinct despite appearing the same?
> I tried to do this when working with Georg Brandl to implement the
> Python 3 change to hide the iteration variable in comprehensions and
> generator expressions, and I eventually gave up and used an implicit
> local function definition:

Re-reading that post, I found this:

I don't think anyone has yet tried speeding up simple function level
cases at the peephole optimiser stage of the code generation pipeline
(at module and class level, the nested function is already often a
speed increase due to the use of optimised local variable access in
the implicitly created function scope).

However, I'm not sure our pattern matching is really up to the task of
detecting this at bytecode generation time - doing something about it in
a JIT-compiled runtime like PyPy, Numba or Pyston might be more

Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Thu Jun 11 03:54:03 2015
From: ncoghlan at (Nick Coghlan)
Date: Thu, 11 Jun 2015 11:54:03 +1000
Subject: [Python-ideas] slice.literal notation
In-Reply-To: <>
References: <>
Message-ID: <>

On 11 June 2015 at 06:21, Tal Einat <taleinat at> wrote:
> On Wed, Jun 10, 2015 at 10:05 PM, David Mertz <mertz at> wrote:
>> This is an elegant improvement that doesn't affect backward compatibility.
>> Obviously, the difference between the spelling 'sliceliteral[::-1]' and
>> 'slice.literal[::-1]' isn't that big, but having it attached to the slice
>> type itself rather than a user class feels more natural.
> I dislike adding this to the slice class since many use cases don't
> result in a slice at all. For example:
> [0] -> int
> [...] -> Ellipsis
> [0:1, 2:3] -> 2-tuple of slice object
> I like NumPy's name of IndexExpression, perhaps we can stick to that?
> As for where it would reside, some possibilities are:
> * the operator module
> * as part of the abstract base class
> * the types module
> * builtins

I'm with Tal here - I like the concept, don't like the spelling
because it may return things other than slice objects.

While the formal name of the operation denoted by trailing square
brackets "[]" is "subscript" (with indexing and slicing being only two
of its several use cases), the actual *protocol* involved in
implementing that operation is getitem/setitem/delitem, so using the
formal name would count as "non-obvious" in my view.

Accordingly, I'd suggest putting this in under the name
"operator.itemkey" (no underscore because the operator module
traditionally omits them).

zero = operator.itemkey[0]

ellipsis = operator.itemkey[...]

reverse = slice(None, None, -1)
reverse = operator.itemkey[::-1]

all_rows_first_col = slice(None), 0
all_rows_first_col = operator.itemkey[:, 0]

first_row_all_cols_but_last = 0, slice(None, -1)
first_row_all_cols_but_last = operator.itemkey[0, :-1]

Documentation would say that indexing into this object produces the
result of the key transformation step of getitem/setitem/delitem


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From abarnert at  Thu Jun 11 10:38:24 2015
From: abarnert at (Andrew Barnert)
Date: Thu, 11 Jun 2015 01:38:24 -0700
Subject: [Python-ideas] slice.literal notation
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 10, 2015, at 18:54, Nick Coghlan <ncoghlan at> wrote:
> I'm with Tal here - I like the concept, don't like the spelling
> because it may return things other than slice objects.
> While the formal name of the operation denoted by trailing square
> brackets "[]" is "subscript" (with indexing and slicing being only two
> of its several use cases), the actual *protocol* involved in
> implementing that operation is getitem/setitem/delitem, so using the
> formal name would count as "non-obvious" in my view.
> Accordingly, I'd suggest putting this in under the name
> "operator.itemkey" (no underscore because the operator module
> traditionally omits them).

That name seems a little odd. Normally by "key", you mean the thing you subscript a mapping with, as opposed to an index, the thing you subscript a sequence with (either specifically an integer, or the broader sense of an integer, a slice, an ellipsis, or a tuple of indices recursively).

(Of course you _can_ use this with a mapping key, but then it just returns the same key you passed in, which isn't very useful, except in allowing generic code that doesn't know whether it has a key or an index and wants to pass it on to a mapping or sequence, which obviously isn't the main use here.)

"itemindex" avoids the main problem with "itemkey", but it still shares the secondary problem of burying the fact that this is about slices (and tuples of plain indices and slices), not just (or even primarily) plain indices.

I agree with you that "subscript" isn't a very good name either.

I guess "lookup" is another possibility, and it parallels "LookupError" being the common base class of "IndexError" and "KeyError", but that sounds even less meaningful than "subscript" to me.

So, I don't have a good name to offer.

One last thing: Would it be worth adding bracket syntax to itemgetter, to make it easier to create slicing functions? (That wouldn't remove the need for this function, or vice versa, but since we're in operator and adding a thing that gets "called" with brackets...)

> zero = operator.itemkey[0]
> ellipsis = operator.itemkey[...]
> reverse = slice(None, None, -1)
> reverse = operator.itemkey[::-1]
> all_rows_first_col = slice(None), 0
> all_rows_first_col = operator.itemkey[:, 0]
> first_row_all_cols_but_last = 0, slice(None, -1)
> first_row_all_cols_but_last = operator.itemkey[0, :-1]
> Documentation would say that indexing into this object produces the
> result of the key transformation step of getitem/setitem/delitem
> Cheers,
> Nick.
> -- 
> Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at
> Code of Conduct:

From rosuav at  Thu Jun 11 12:56:35 2015
From: rosuav at (Chris Angelico)
Date: Thu, 11 Jun 2015 20:56:35 +1000
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Jun 11, 2015 at 1:06 AM, Chris Angelico <rosuav at> wrote:
> Next plan: Change compiler_comprehension_generator() to use subscopes
> rather than a full nested function, and then do performance testing.
> Currently, this can only have slowed things down. Removing the
> function call overhead from list comps could give that speed back.

Or maybe the next plan is to hack in a "while cond as name:" handler.
It works! And the name is bound only within the scope of the while
block and any else block (so when you get a falsey result, you can see
precisely _what_ falsey result it was).

The surprising part, in my opinion, is that this actually appears to
work outside a function. The demangling doesn't, but the original
mangling does.

It doesn't play ideally with locals() or globals(); the former appears
to take the first one that it sees, and ignore the others (though I
wouldn't promise that; certainly it takes exactly one local of any
given name). With globals(), you get the mangled name:

while input("Spam? ") as spam:
    print(globals())
    break

Spam? yes
{... 'spam.0x7f2080260228': 'yes'...}

With brand new syntax like "while cond as name:", it won't break
anything to use a mangled name, but this is a backward-incompatible
change as regards exception handling and globals(). Still, it's a fun

Aside from being a fun exercise for me, building a Volkswagen
Helicopter, is this at all useful to anybody?


From taleinat at  Thu Jun 11 14:57:02 2015
From: taleinat at (Tal Einat)
Date: Thu, 11 Jun 2015 15:57:02 +0300
Subject: [Python-ideas] slice.literal notation
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Jun 11, 2015 at 11:38 AM, Andrew Barnert <abarnert at> wrote:
> On Jun 10, 2015, at 18:54, Nick Coghlan <ncoghlan at> wrote:
>> I'm with Tal here - I like the concept, don't like the spelling
>> because it may return things other than slice objects.
>> While the formal name of the operation denoted by trailing square
>> brackets "[]" is "subscript" (with indexing and slicing being only two
>> of its several use cases), the actual *protocol* involved in
>> implementing that operation is getitem/setitem/delitem, so using the
>> formal name would count as "non-obvious" in my view.
>> Accordingly, I'd suggest putting this in under the name
>> "operator.itemkey" (no underscore because the operator module
>> traditionally omits them).
> That name seems a little odd. Normally by "key", you mean the thing you subscript a mapping with, as opposed to an index, the thing you subscript a sequence with (either specifically an integer, or the broader sense of an integer, a slice, an ellipsis, or a tuple of indices recursively).
> (Of course you _can_ use this with a mapping key, but then it just returns the same key you passed in, which isn't very useful, except in allowing generic code that doesn't know whether it has a key or an index and wants to pass it on to a mapping or sequence, which obviously isn't the main use here.)
> "itemindex" avoids the main problem with "itemkey", but it still shares the secondary problem of burying the fact that this is about slices (and tuples of plain indices and slices), not just (or even primarily) plain indices.
> I agree with you that "subscript" isn't a very good name either.
> I guess "lookup" is another possibility, and it parallels "LookupError" being the common base class of "IndexError" and "KeyError", but that sounds even less meaningful than "subscript" to me.
> So, I don't have a good name to offer.
> One last thing: Would it be worth adding bracket syntax to itemgetter, to make it easier to create slicing functions? (That wouldn't remove the need for this function, or vice versa, but since we're in operator and adding a thing that gets "called" with brackets...)

I actually think "subscript" is quite good a name. It makes the
explicit distinction between subscripts, indexes and slices.

As for itemgetter, with X (placeholder for name we choose), you would
just do itemgetter(X[::-1]), so I don't see a need to change
itemgetter.

- Tal

From ram at  Thu Jun 11 19:23:57 2015
From: ram at (Ram Rachum)
Date: Thu, 11 Jun 2015 20:23:57 +0300
Subject: [Python-ideas] Making -m work for scripts
Message-ID: <>


What do you think about making `python -m whatever` work also for installed
scripts and not just for modules? I need this now because I've installed
pypy on Linux, and I'm not sure how to run the `nosetests` of PyPy (in
contrast to the `nosetests` of the system Python.) It's sometimes a mess to
find where Linux installed the scripts related with each version of Python.
But if I could do `pypy -m nosetests` then I'd have a magic solution. What
do you think?


From solipsis at  Thu Jun 11 19:37:30 2015
From: solipsis at (Antoine Pitrou)
Date: Thu, 11 Jun 2015 19:37:30 +0200
Subject: [Python-ideas] Making -m work for scripts
References: <>
Message-ID: <20150611193730.141204b8@fsol>

On Thu, 11 Jun 2015 20:23:57 +0300
Ram Rachum <ram at> wrote:
> Hi,
> What do you think about making `python -m whatever` work also for installed
> scripts and not just for modules? I need this now because I've installed
> pypy on Linux, and I'm not sure how to run the `nosetests` of PyPy (in
> contrast to the `nosetests` of the system Python.) It's sometimes a mess to
> find where Linux installed the scripts related with each version of Python.
> But if I could do `pypy -m nosetests` then I'd have a magic solution. What
> do you think?

How would Python know where to find the script?
Note: if "python -m nose" doesn't already work, it is probably a nice
feature request for nose.



From ram at  Thu Jun 11 19:39:50 2015
From: ram at (Ram Rachum)
Date: Thu, 11 Jun 2015 20:39:50 +0300
Subject: [Python-ideas] Making -m work for scripts
In-Reply-To: <20150611193730.141204b8@fsol>
References: <>
Message-ID: <>

I know little about package management, but I assume that the information
on where scripts are installed exists somewhere? Some central place that
Python might be able to access?

On Thu, Jun 11, 2015 at 8:37 PM, Antoine Pitrou <solipsis at> wrote:

> On Thu, 11 Jun 2015 20:23:57 +0300
> Ram Rachum <ram at> wrote:
> > Hi,
> >
> > What do you think about making `python -m whatever` work also for
> installed
> > scripts and not just for modules? I need this now because I've installed
> > pypy on Linux, and I'm not sure how to run the `nosetests` of PyPy (in
> > contrast to the `nosetests` of the system Python.) It's sometimes a mess
> to
> > find where Linux installed the scripts related with each version of
> Python.
> > But if I could do `pypy -m nosetests` then I'd have a magic solution.
> What
> > do you think?
> How would Python know where to find the script?
> Note: if "python -m nose" doesn't already work, it is probably a nice
> feature request for nose.
> Regards
> Antoine.

From ethan at  Thu Jun 11 21:10:11 2015
From: ethan at (Ethan Furman)
Date: Thu, 11 Jun 2015 12:10:11 -0700
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On 06/11/2015 03:56 AM, Chris Angelico wrote:

> while input("Spam? ") as spam:
>      print(globals())
>      break
> Spam? yes
> {... 'spam.0x7f2080260228': 'yes'...}

Having names not leak from listcomps and genexps is a good thing.

Having names not leak from try/except blocks is a necessary thing.

Having names not leak from if/else or while is confusing and irritating: there is no scope there, and at least 'while' should be similar to 'for', which also does a name binding and does /not/ unset it at the end.

> Aside from being a fun exercise for me, building a Volkswagen
> Helicopter, is this at all useful to anybody?

I would find the 'as NAME' portion very useful as long as it wasn't shadowing nor unset.


From rosuav at  Thu Jun 11 21:19:29 2015
From: rosuav at (Chris Angelico)
Date: Fri, 12 Jun 2015 05:19:29 +1000
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jun 12, 2015 at 5:10 AM, Ethan Furman <ethan at> wrote:
> On 06/11/2015 03:56 AM, Chris Angelico wrote:
>> while input("Spam? ") as spam:
>>      print(globals())
>>      break
>> Spam? yes
>> {... 'spam.0x7f2080260228': 'yes'...}
> Having names not leak from listcomps and genexps is a good thing.
> Having names not leak from try/except blocks is a necessary thing.
> Having names not leak from if/else or while is confusing and irritating:
> there is no scope there, and at least 'while' should be similar to 'for'
> which also does a name binding and does /not/ unset it at the end.
>> Aside from being a fun exercise for me, building a Volkswagen
>> Helicopter, is this at all useful to anybody?
> I would find the 'as NAME' portion very useful as long as it wasn't
> shadowing nor unset.

Sure. Removing the scoping from the "while cond as target" rule is
simple. Just delete a couple of lines of code (one at the top, one at
the bottom), and it'll do a simple name binding.

On the subject of try/except unbinding, though, there's a surprising
thing in the code: the last action in an except clause is to assign
None to the name, and *then* del it:

try: suite
except Something as e:
    try:
        except_block
    finally:
        e = None
        del e

Why set it to None just before delling it? It's clearly no accident,
so it must have a reason for existing. (With CPython sources, it's
always safest to assume intelligent design.)
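
The visible effect of that generated code can be checked from plain
Python. This is only a sketch of the observable behaviour, with
made-up function names; it does not show the compiler output itself.
One consequence of the extra assignment is that the implicit del still
has a bound name to remove even if the handler body already deleted it:

```python
def name_bound_after_except():
    """Return True if 'e' is still bound once the except clause ends."""
    try:
        raise ValueError("boom")
    except ValueError as e:
        pass
    try:
        e
    except NameError:  # UnboundLocalError is a subclass of NameError
        return False
    return True


def delete_inside_handler():
    """An explicit 'del e' in the handler does not upset the cleanup."""
    try:
        raise ValueError("boom")
    except ValueError as e:
        del e
    return "ok"


print(name_bound_after_except(), delete_inside_handler())
```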


From abarnert at  Thu Jun 11 22:31:54 2015
From: abarnert at (Andrew Barnert)
Date: Thu, 11 Jun 2015 13:31:54 -0700
Subject: [Python-ideas] Making -m work for scripts
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 11, 2015, at 10:23, Ram Rachum <ram at> wrote:
> What do you think about making `python -m whatever` work also for installed scripts and not just for modules?

The current packaging system already makes it easy for packages to make everything easy for you. Assuming nosetests is already using setuptools to create console script entry points automatically, all they have to do is make the top-level code in the module/ in the package call the same function specified as the entry point, and `pypy -m nosetests` now does the same thing as `/path/to/pypy/scripts/nosetests` no matter how the user has configured things. That's probably a one-line change. (If they're not using entry points, they may need to first factor out a `main` function to be put somewhere that can be used by both, but that still isn't very hard, and is worth doing just so they can switch to using entry points anyway.) Many packages already do this. The obvious problem is that some don't, and as an end-user your only option is to file enhancement requests with those that don't.
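
To make that concrete, here is a rough, self-contained sketch. Everything in it is made up for illustration: the package name `demo_tool`, its `cli.main` function, and the entry point string in the comment are assumptions, not anything nose actually ships. It writes a throwaway package to a temp directory and runs it with -m under whichever interpreter is executing the script:

```python
import os
import subprocess
import sys
import tempfile
import textwrap


def build_demo_package(root):
    """Create a hypothetical package whose __main__.py reuses main()."""
    pkg = os.path.join(root, "demo_tool")
    os.makedirs(pkg)
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        f.write("")
    with open(os.path.join(pkg, "cli.py"), "w") as f:
        f.write(textwrap.dedent('''\
            import sys

            def main(argv=None):
                # The same function a console-script entry point would
                # name, e.g. "demo-tool = demo_tool.cli:main".
                argv = sys.argv[1:] if argv is None else argv
                print("demo_tool ran with", argv)
                return 0
        '''))
    with open(os.path.join(pkg, "__main__.py"), "w") as f:
        f.write(textwrap.dedent('''\
            import sys
            from demo_tool.cli import main

            sys.exit(main())
        '''))
    return pkg


def run_with_dash_m(root, *args):
    """Run 'python -m demo_tool' with the current interpreter."""
    result = subprocess.run(
        [sys.executable, "-m", "demo_tool"] + list(args),
        cwd=root, stdout=subprocess.PIPE,
        universal_newlines=True, check=True)
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as root:
    build_demo_package(root)
    output = run_with_dash_m(root, "--verbose")
    print(output)
```

Swapping sys.executable for a different interpreter's path is all it takes to run the same code under that interpreter, which is the point of going through -m rather than an installed script.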

I don't think there's any obvious way to make your idea work. Python doesn't keep track of what got installed where. (I believe it could import distutils and determine the default install location for new scripts, but even that doesn't really help. Especially since you can end up with multiple Python installations all sharing an install location like /usr/local/bin unless one of those installations does something to avoid it--for example, on a Mac, if you use Apple's pre-installed 2.7 and also a default install from, they'll both install new scripts there.)

Of course the pseudo-database of installed scripts could be migrated from pip to Python core, or pip could grow its own script runner that you use in place of Python, or there are probably other radical changes you could make that would enable this.

But if we're going to do something big, I'd prefer to just make it easier to specify a custom script suffix in distutils.cfg and encourage distros/users/third-party Pythons/etc. to use that if they want to make multiple Pythons easy to use, instead of using different directories. Then you'd just run `nosetests_pypy3` vs. `nosetests3` or `nosetests_jy2` vs. `nosetests2` or `nosetests_cust3.4` vs. `nosetests3.4` or whatever. (Mainly because that's what I do manually; I have a wrapper around pip that symlinks any installed scripts into /usr/local/bin with a suffix that depends on the wrapper's name, and symlink a new name for each Python I install. I'm not sure if anyone else would like that.)

From ron3200 at  Fri Jun 12 01:23:03 2015
From: ron3200 at (Ron Adam)
Date: Thu, 11 Jun 2015 19:23:03 -0400
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <mld58o$bnt$>

On 06/11/2015 03:10 PM, Ethan Furman wrote:
> On 06/11/2015 03:56 AM, Chris Angelico wrote:
>> while input("Spam? ") as spam:
>>      print(globals())
>>      break
>> Spam? yes
>> {... 'spam.0x7f2080260228': 'yes'...}
> Having names not leak from listcomps and genexps is a good thing.

In a way this makes sense because you can think of them as a type of 
function literal.
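
A quick sketch of that behaviour, assuming Python 3 semantics (in
Python 2 the list comprehension variable did leak); the function names
are only illustrative:

```python
def name_visible_after_listcomp():
    """Return True if the comprehension variable is visible afterwards."""
    squares = [n * n for n in range(3)]
    try:
        n
    except NameError:
        return False
    return True


def name_visible_after_genexp():
    """Same check for a generator expression."""
    total = sum(n for n in range(3))
    try:
        n
    except NameError:
        return False
    return True


print(name_visible_after_listcomp(), name_visible_after_genexp())
```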

> Having names not leak from if/else or while is confusing and irritating:
> there is no scope there, and at least 'while' should be similar to 'for'
> which also does a name binding and does /not/ unset it at the end.

Having a group of statements share a set of values is fairly easy to think 
about.  Having them share some values at some times, and not others at 
other times is not so easy to think about.

I also get the feeling the solution is more complex than the problem. 
Ummm... to clarify that.  The inconvenience of not having the solution to 
the apparent problem, is less of a problem than the possible problems I 
think might arise with the solution.

It's kind of like parsing that sentence.

From ethan at  Fri Jun 12 02:12:32 2015
From: ethan at (Ethan Furman)
Date: Thu, 11 Jun 2015 17:12:32 -0700
Subject: [Python-ideas] If branch merging
In-Reply-To: <mld58o$bnt$>
References: <>
 <> <mld58o$bnt$>
Message-ID: <>

On 06/11/2015 04:23 PM, Ron Adam wrote:
> On 06/11/2015 03:10 PM, Ethan Furman wrote:
>> On 06/11/2015 03:56 AM, Chris Angelico wrote:
>>> while input("Spam? ") as spam:
>>>      print(globals())
>>>      break
>>> Spam? yes
>>> {... 'spam.0x7f2080260228': 'yes'...}
>> Having names not leak from listcomps and genexps is a good thing.
> In a way this makes sense because you can think of them as a type of function literal.
>> Having names not leak from if/else or while is confusing and irritating:
>> there is no scope there, and at least 'while' should be similar to 'for'
>> which also does a name binding and does /not/ unset it at the end.
> Having a group of statements share a set of values is fairly easy to think about.

But that is not how Python works.  When you bind a name, that name stays until the scope is left (with one notable exception).

> Having them share some values at some times, and not others at other times is not so
>  easy to think about.

Which is why I would not have the pseudo-scope on any of them.  The only place where that currently happens is in a try/except clause, and that should remain the only exception.


From abarnert at  Fri Jun 12 04:21:29 2015
From: abarnert at (Andrew Barnert)
Date: Thu, 11 Jun 2015 19:21:29 -0700
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
 <> <mld58o$bnt$>
Message-ID: <>

On Jun 11, 2015, at 17:12, Ethan Furman <ethan at> wrote:
>> On 06/11/2015 04:23 PM, Ron Adam wrote:
>>> On 06/11/2015 03:10 PM, Ethan Furman wrote:
>>>> On 06/11/2015 03:56 AM, Chris Angelico wrote:
>>>> while input("Spam? ") as spam:
>>>>     print(globals())
>>>>     break
>>>> Spam? yes
>>>> {... 'spam.0x7f2080260228': 'yes'...}
>>> Having names not leak from listcomps and genexps is a good thing.
>> In a way this makes sense because you can think of them as a type of function literal.
>>> Having names not leak from if/else or while is confusing and irritating:
>>> there is no scope there, and at least 'while' should be similar to 'for'
>>> which also does a name binding and does /not/ unset it at the end.
>> Having a group of statements share a set of values is fairly easy to think about.
> But that is not how Python works.  When you bind a name, that name stays until the scope is left (with one notable exception).

What Nick was proposing was to explicitly change the way Python works. And what Chris hacked up was (part of) what Nick proposed. So you're just pointing out that this change to the way Python works would be a change to the way Python works. Well, of course it would. The question is whether it would be a good change.

Nick's point was that they tried a similar change to implement comprehensions without needing to "fake it" with a hidden function, and it makes the implementation far too complex, so it doesn't even matter if it's a well-designed and desirable change. Of course it's also possible that it's not a desirable change (e.g., the current scoping rules are simple enough to keep things straight in your head while reading any function that isn't already too long to be a function, but more complex rules wouldn't be), or that it's possibly desirable but not as designed (e.g., I still think Nick's idea of binding within the expression or the statement in a somewhat complex way is more confusing than just binding within the statement). But Chris's attempt to show that the implementation problems might be resolvable, and/or to give people a hack they can play with instead of having to guess, is still a reasonable response to Nick's point.

I agree with your implied point that a language with two kinds of locality, one nested by block and the other function-wide, is probably not as good a design as one with only the first kind (like C) or only the second (like Python), and that's even more true in a language with closures or implicit declarations (both of which Python has), so I think any design is going to be a mess (definitely including my own straw-man design, and Nick's, and what Chris's hack implements). But it's certainly a _possible_ design, and there's nothing about Python 3.5 that means it would be impossible or backward-incompatible (as opposed to just a bad idea) to have such a design for Python 3.6.

From ncoghlan at  Fri Jun 12 08:53:42 2015
From: ncoghlan at (Nick Coghlan)
Date: Fri, 12 Jun 2015 16:53:42 +1000
Subject: [Python-ideas] slice.literal notation
In-Reply-To: <>
References: <>
Message-ID: <>

On 11 June 2015 at 22:57, Tal Einat <taleinat at> wrote:
> I actually think "subscript" is quite good a name. It makes the
> explicit distinction between subscripts, indexes and slices.

Yeah, I've warmed to it myself:

    zero = operator.subscript[0]

    ellipsis = operator.subscript[...]

    reverse = slice(None, None, -1)
    reverse = operator.subscript[::-1]

    all_rows_first_col = slice(None), 0
    all_rows_first_col = operator.subscript[:, 0]

    first_row_all_cols_but_last = 0, slice(None, -1)
    first_row_all_cols_but_last = operator.subscript[0, :-1]

I realised the essential problem with using "item" in the name is that
the "item" in the method names refers to the *result*, not to the
input. Since the unifying term for the different kinds of input is
indeed "subscript" (covering indices, slices, multi-dimensional
slices, key lookups, content addressable data structures, etc), it
makes sense to just use it rather than inventing something new.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From stephen at  Fri Jun 12 08:55:52 2015
From: stephen at (Stephen J. Turnbull)
Date: Fri, 12 Jun 2015 15:55:52 +0900
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

Ethan Furman writes:

 > I would find the 'as NAME' portion very useful as long as it wasn't
 > shadowing nor unset.

I don't understand the "not shadowing" requirement.  If you're not
going to create a new scope, then

    from foo import *

    if expr as val:
        use(val)

    bar(val)

might very well shadow foo.val and break the invocation of bar.  Is
use of the identifier "val" in this context an error?  Or what?

From ncoghlan at  Fri Jun 12 09:13:21 2015
From: ncoghlan at (Nick Coghlan)
Date: Fri, 12 Jun 2015 17:13:21 +1000
Subject: [Python-ideas] Making -m work for scripts
In-Reply-To: <>
References: <>
Message-ID: <>

On 12 June 2015 at 06:31, Andrew Barnert via Python-ideas
<python-ideas at> wrote:
> But if we're going to do something big, I'd prefer to just make it easier to specify a custom script suffix in distutils.cfg and encourage distros/users/third-party Pythons/etc. to use that if they want to make multiple Pythons easy to use, instead of using different directories. Then you'd just run `nosetests_pypy3` vs. `nosetests3` or `nosetests_jy2` vs. `nosetests2` or `nosetests_cust3.4` vs. `nosetests3.4` or whatever. (Mainly because that's what I do manually; I have a wrapper around pip that symlinks any installed scripts into /usr/local/bin with a suffix that depends on the wrapper's name, and symlink a new name for each Python I install. I'm not sure if anyone else would like that.)

Armin Ronacher's pipsi should already make it possible to do:

    pypy -m pip install pipsi
    pypy -m pipsi install nose

In theory, that should give you a nosetests in ~/.local/bin that runs
in a PyPy virtualenv (I haven't actually tried it though).

"Should the default Python on Linux be a per-user configuration
setting rather than a system wide symlink?" was also a topic that came
up at this year's language summit (see, especially the straw poll results at
the end ), so it's likely a proposal along those lines will happen at
some point during the 3.6 development cycle.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From tjreedy at  Fri Jun 12 09:41:55 2015
From: tjreedy at (Terry Reedy)
Date: Fri, 12 Jun 2015 03:41:55 -0400
Subject: [Python-ideas] slice.literal notation
In-Reply-To: <>
References: <>
Message-ID: <mle2h9$ooe$>

On 6/12/2015 2:53 AM, Nick Coghlan wrote:
> On 11 June 2015 at 22:57, Tal Einat <taleinat at> wrote:
>> I actually think "subscript" is quite good a name. It makes the
>> explicit distinction between subscripts, indexes and slices.
> Yeah, I've warmed to it myself:
>      zero = operator.subscript[0]
>      ellipsis = operator.subscript[...]
>      reverse = slice(None, None, -1)
>      reverse = operator.subscript[::-1]
>      all_rows_first_col = slice(None), 0
>      all_rows_first_col = operator.subscript[:, 0]
>      first_row_all_cols_but_last = 0, slice(None, -1)
>      first_row_all_cols_but_last = operator.subscript[0, :-1]
> I realised the essential problem with using "item" in the name is that
> the "item" in the method names refers to the *result*, not to the
> input. Since the unifying term for the different kinds of input is
> indeed "subscript" (covering indices, slices, multi-dimensional
> slices, key lookups, content addressable data structures, etc), it
> makes sense to just use it rather than inventing something new.

If the feature is added, this looks pretty good to me.

Terry Jan Reedy

From ram at  Fri Jun 12 10:40:26 2015
From: ram at (Ram Rachum)
Date: Fri, 12 Jun 2015 11:40:26 +0300
Subject: [Python-ideas] Making -m work for scripts
In-Reply-To: <>
References: <>
Message-ID: <>


On Fri, Jun 12, 2015 at 10:13 AM, Nick Coghlan <ncoghlan at> wrote:

> On 12 June 2015 at 06:31, Andrew Barnert via Python-ideas
> <python-ideas at> wrote:
> > But if we're going to do something big, I'd prefer to just make it
> easier to specify a custom script suffix in distutils.cfg and encourage
> distros/users/third-party Pythons/etc. to use that if they want to make
> multiple Pythons easy to use, instead of using different directories. Then
> you'd just run `nosetests_pypy3` vs. `nosetests3` or `nosetests_jy2` vs.
> `nosetests2` or `nosetests_cust3.4` vs. `nosetests3.4` or whatever. (Mainly
> because that's what I do manually; I have a wrapper around pip that
> symlinks any installed scripts into /usr/local/bin with a suffix that
> depends on the wrapper's name, and symlink a new name for each Python I
> install. I'm not sure if anyone else would like that.)
> Armin Ronacher's pipsi should already make it possible to do:
>     pypy -m pip install pipsi
>     pypy -m pipsi install nose
> In theory, that should give you a nosetests in ~/.local/bin that runs
> in a PyPy virtualenv (I haven't actually tried it though).
> "Should the default Python on Linux be a per-user configuration
> setting rather than a system wide symlink?" was also a topic that came
> up at this year's language summit (see
>, especially the straw poll results at
> the end ), so it's likely a proposal along those lines will happen at
> some point during the 3.6 development cycle.
> Cheers,
> Nick.
> --
> Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ned at  Fri Jun 12 13:03:56 2015
From: ned at (Ned Batchelder)
Date: Fri, 12 Jun 2015 07:03:56 -0400
Subject: [Python-ideas] Making -m work for scripts
In-Reply-To: <>
References: <>
Message-ID: <>

On 6/11/15 1:23 PM, Ram Rachum wrote:
> Hi,
> What do you think about making `python -m whatever` work also for 
> installed scripts and not just for modules? I need this now because 
> I've installed pypy on Linux, and I'm not sure how to run the 
> `nosetests` of PyPy (in contrast to the `nosetests` of the system 
> Python.) It's sometimes a mess to find where Linux installed the 
> scripts related with each version of Python. But if I could do `pypy 
> -m nosetests` then I'd have a magic solution. What do you think?

This works today:

     $ python -m nose


From ram at  Fri Jun 12 13:06:17 2015
From: ram at (Ram Rachum)
Date: Fri, 12 Jun 2015 14:06:17 +0300
Subject: [Python-ideas] Making -m work for scripts
In-Reply-To: <>
References: <>
Message-ID: <>

Thanks Ned!

On Fri, Jun 12, 2015 at 2:03 PM, Ned Batchelder <ned at>

> On 6/11/15 1:23 PM, Ram Rachum wrote:
>> Hi,
>> What do you think about making `python -m whatever` work also for
>> installed scripts and not just for modules? I need this now because I've
>> installed pypy on Linux, and I'm not sure how to run the `nosetests` of
>> PyPy (in contrast to the `nosetests` of the system Python.) It's sometimes
>> a mess to find where Linux installed the scripts related with each version
>> of Python. But if I could do `pypy -m nosetests` then I'd have a magic
>> solution. What do you think?
> This works today:
>     $ python -m nose
> --Ned.

From taleinat at  Fri Jun 12 15:27:35 2015
From: taleinat at (Tal Einat)
Date: Fri, 12 Jun 2015 16:27:35 +0300
Subject: [Python-ideas] slice.literal notation
In-Reply-To: <mle2h9$ooe$>
References: <>
Message-ID: <>

On Fri, Jun 12, 2015 at 10:41 AM, Terry Reedy <tjreedy at> wrote:
> On 6/12/2015 2:53 AM, Nick Coghlan wrote:
>> On 11 June 2015 at 22:57, Tal Einat <taleinat at> wrote:
>>> I actually think "subscript" is quite good a name. It makes the
>>> explicit distinction between subscripts, indexes and slices.
>> Yeah, I've warmed to it myself:
>>      zero = operator.subscript[0]
>>      ellipsis = operator.subscript[...]
>>      reverse = slice(None, None, -1)
>>      reverse = operator.subscript[::-1]
>>      all_rows_first_col = slice(None), 0
>>      all_rows_first_col = operator.subscript[:, 0]
>>      first_row_all_cols_but_last = 0, slice(None, -1)
>>      first_row_all_cols_but_last = operator.subscript[0, :-1]
>> I realised the essential problem with using "item" in the name is that
>> the "item" in the method names refers to the *result*, not to the
>> input. Since the unifying term for the different kinds of input is
>> indeed "subscript" (covering indices, slices, multi-dimensional
>> slices, key lookups, content addressable data structures, etc), it
>> makes sense to just use it rather than inventing something new.
> If the feature is added, this looks pretty good to me.

It looks good to me as well.

+1 for adding this as described and naming it operator.subscript.

- Tal Einat

From ethan at  Fri Jun 12 16:14:05 2015
From: ethan at (Ethan Furman)
Date: Fri, 12 Jun 2015 07:14:05 -0700
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

On 06/11/2015 11:55 PM, Stephen J. Turnbull wrote:
> Ethan Furman writes:
>> I would find the 'as NAME' portion very useful as long as it wasn't
>> shadowing nor unset.
> I don't understand the "not shadowing" requirement.  If you're not
> going to create a new scope, then
>   from foo import *
>   if expr as val:
>       use(val)
>   bar(val)
> might very well shadow foo.val and break the invocation of bar.  Is
> use of the identifier "val" in this context an error?  Or what?


    for val in some_iterator:


will shadow foo.val and break bar; yet for loops do not create their own scopes.

    with open('somefile') as val:
       stuff =


will also shadow foo.val and break bar, yet with contexts do not create their own scopes.

And let's not forget:

    val = some_func()


Again -- no micro-scope, and foo.val is shadowed.
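
A small sketch of the same point using constructs that exist today (the function name is made up, and io.StringIO stands in for a real file so the example is self-contained):

```python
import io


def bindings_survive_their_blocks():
    """Names bound by 'for' and 'with' stay bound after the block ends."""
    for val in [1, 2, 3]:
        pass
    after_for = val  # still bound to the last item

    with io.StringIO("text") as val:
        pass
    after_with = val.closed  # still bound, now to the closed stream

    return after_for, after_with


print(bindings_survive_their_blocks())
```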


From techtonik at  Fri Jun 12 15:14:47 2015
From: techtonik at (anatoly techtonik)
Date: Fri, 12 Jun 2015 16:14:47 +0300
Subject: [Python-ideas] Meta: Email netiquette
In-Reply-To: <>
References: <>
Message-ID: <>

I would gladly use forum interface like
if there was any linked in message footers. Bottom posting is not
automated in Gmail and takes a lot of energy and dumb repeated
keypresses to complete.

On Wed, Jun 3, 2015 at 1:44 AM, Oleg Broytman <phd at> wrote:
> On Tue, Jun 02, 2015 at 03:28:33PM -0700, u8y7541 The Awesome Person <surya.subbarao1 at> wrote:
>> What do you mean by replying inline?
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?
> A: Top-posting.
> Q: What is the most annoying thing in e-mail?
>> On Mon, Jun 1, 2015 at 10:22 PM, Andrew Barnert <abarnert at> wrote:
>> > On Jun 1, 2015, at 20:41, u8y7541 The Awesome Person
>> > <surya.subbarao1 at> wrote:
>> >
>> > I think you're right. I was also considering ... "editing" my Python
>> > distribution. If they didn't implement my suggestion for correcting floats,
>> > at least they can fix this, instead of making people hack Python for good
>> > results!
>> >
>> >
>> > If you're going to reply to digests, please learn how to reply inline
>> > instead of top-posting (and how to trim out all the irrelevant stuff). It's
>> > next to impossible to tell which part of which of the messages you're
>> > replying to even in simple cases like this one, with only 4 messages in the
>> > digest.
> Oleg.
> --
>      Oleg Broytman              phd at
>            Programmers don't die, they just GOSUB without RETURN.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at
> Code of Conduct:

anatoly t.

From techtonik at  Fri Jun 12 15:34:03 2015
From: techtonik at (anatoly techtonik)
Date: Fri, 12 Jun 2015 16:34:03 +0300
Subject: [Python-ideas] Using semantic web (RDF triples) to link
 replacements for modules in standard library
Message-ID: <>

I failed to bring any attention to
to crowdsource collecting data about
modules in stdlib, and was even warned not to
distract people in lists to work on
it, so I thought about a different way to
decentralize the work on standard library.

We have the data about top Python modules
that need a redesign
but no way to propose alternatives for each

So, I propose to use wikidata for collecting
that info:
1. define concept for Python stdlib
2. define concept of module for Python stdlib
3. add a property "replacement of"
4. provide a web interface for viewing that info

There is no entity interested in sponsoring the
development of the model and workflow in my
region, and the job contracts that are available
to me right now are unlikely to allow me to work on
this stuff, so I hope that this idea will gain some
traction among people interested in putting the semantic
web to some good use, and won't be affected
by the "dead by origin" curse.
anatoly t.

From rymg19 at  Fri Jun 12 17:02:16 2015
From: rymg19 at (Ryan Gonzalez)
Date: Fri, 12 Jun 2015 10:02:16 -0500
Subject: [Python-ideas] Meta: Email netiquette
In-Reply-To: <>
References: <>
Message-ID: <>

On June 12, 2015 8:14:47 AM CDT, anatoly techtonik <techtonik at> wrote:
>I would gladly use forum interface like
>if there was any linked in message footers. Bottom posting is not
>automated in Gmail and takes a lot of energy and dumb repeated
>keypresses to complete.

Not really. I just did it right now.

>On Wed, Jun 3, 2015 at 1:44 AM, Oleg Broytman <phd at> wrote:
>> On Tue, Jun 02, 2015 at 03:28:33PM -0700, u8y7541 The Awesome Person <surya.subbarao1 at> wrote:
>>> What do you mean by replying inline?
>> A: Because it messes up the order in which people normally read text.
>> Q: Why is top-posting such a bad thing?
>> A: Top-posting.
>> Q: What is the most annoying thing in e-mail?
>>> On Mon, Jun 1, 2015 at 10:22 PM, Andrew Barnert <abarnert at> wrote:
>>> > On Jun 1, 2015, at 20:41, u8y7541 The Awesome Person
>>> > <surya.subbarao1 at> wrote:
>>> >
>>> > I think you're right. I was also considering ... "editing" my Python
>>> > distribution. If they didn't implement my suggestion for correcting floats,
>>> > at least they can fix this, instead of making people hack Python for good
>>> > results!
>>> >
>>> >
>>> > If you're going to reply to digests, please learn how to reply inline
>>> > instead of top-posting (and how to trim out all the irrelevant stuff). It's
>>> > next to impossible to tell which part of which of the messages you're
>>> > replying to even in simple cases like this one, with only 4 messages in the
>>> > digest.
>> Oleg.
>> --
>>      Oleg Broytman              phd at
>>            Programmers don't die, they just GOSUB without RETURN.

Sent from my Android device with K-9 Mail. Please excuse my brevity.

From stephen at  Fri Jun 12 17:14:14 2015
From: stephen at (Stephen J. Turnbull)
Date: Sat, 13 Jun 2015 00:14:14 +0900
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

Ethan Furman writes:

 > Likewise:
 >     for val in some_iterator:
 >        use(val)
 >     bar(val)
 > will shadow foo.val

Yes, I understand that.  What I don't understand is your statement
that you would like "if expr as val:" if it *doesn't* shadow.

From abarnert at  Fri Jun 12 19:09:11 2015
From: abarnert at (Andrew Barnert)
Date: Fri, 12 Jun 2015 10:09:11 -0700
Subject: [Python-ideas] Meta: Email netiquette
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 12, 2015, at 08:02, Ryan Gonzalez <rymg19 at> wrote:
>> On June 12, 2015 8:14:47 AM CDT, anatoly techtonik <techtonik at> wrote:
>> I would gladly use forum interface like
>> if there was any linked in message footers. Bottom posting is not
>> automated in Gmail and takes a lot of energy and dumb repeated
>> keypresses to complete.
> Not really. I just did it right now.

And if that's too much work, there are Greasemonkey scripts to do it for you. (And, unlike Yahoo, Gmail doesn't seem to go out of their way to break user scripts every two weeks.)

But really, in non-trivial cases, you usually want your reply interleaved with parts of the original; just jumping to the very end of the message (past the signature and list footer) and replying to the whole thing in bulk isn't much better than top-posting. And, while some MUAs do have tools to help with that better than Gmail's, there's really no way to automate it; you have to put the effort into selecting the parts you want to keep or the parts you want to remove and putting your cursor after each one.

(If you're not willing to do that, and instead assume anyone who needs the context can reassemble it themselves with the help of an MUA with proper threading support--which is pretty much all of them today--then why quote at all?)

From ethan at  Fri Jun 12 19:21:39 2015
From: ethan at (Ethan Furman)
Date: Fri, 12 Jun 2015 10:21:39 -0700
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <>

On 06/12/2015 08:14 AM, Stephen J. Turnbull wrote:
> Ethan Furman writes:
>> Likewise:
>>     for val in some_iterator:
>>        use(val)
>>     bar(val)
>> will shadow foo.val
> Yes, I understand that.  What I don't understand is your statement
> that you would like "if expr as val:" if it *doesn't* shadow.

Ah, I think I see your point.  My use of the word "shadow" was in relation to the micro-scope and the previously existing name being shadowed and then un-shadowed when the micro-scope was destroyed. 
If we are at module-level (not class nor function) then there should be no shadowing, but a rebinding of the name.  Even try/except blocks don't "shadow", but rebind and then delete the name used to 
catch the exception.
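
A short sketch of that last point, assuming Python 3 semantics (the variable names are illustrative only):

```python
e = "previous binding"

try:
    raise ValueError("boom")
except ValueError as e:    # rebinds e for the duration of the handler
    caught = str(e)

# Python 3 deletes the name when the handler exits, so the earlier
# binding is not restored; the name is simply gone.
try:
    e
except NameError:
    name_survived = False
else:
    name_survived = True

print(caught, name_survived)
```

So the handler neither shadows nor un-shadows anything: the old value of `e` is lost, and the name is unbound afterwards.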


From ron3200 at  Fri Jun 12 20:25:20 2015
From: ron3200 at (Ron Adam)
Date: Fri, 12 Jun 2015 14:25:20 -0400
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<CAPTjJmof8QTqixupHxh>	<>	<>	<>
 <> <>
Message-ID: <mlf86h$4ka$>

On 06/12/2015 01:21 PM, Ethan Furman wrote:
> On 06/12/2015 08:14 AM, Stephen J. Turnbull wrote:
>> Ethan Furman writes:
>>> Likewise:
>>>     for val in some_iterator:
>>>        use(val)
>>>     bar(val)
>>> will shadow foo.val
>> Yes, I understand that.  What I don't understand is your statement
>> that you would like "if expr as val:" if it *doesn't* shadow.
> Ah, I think I see your point.  My use of the word "shadow" was in relation
> to the micro-scope and the previously existing name being shadowed and then
> un-shadowed when the micro-scope was destroyed. If we are at module-level
> (not class nor function) then there should be no shadowing, but a rebinding
> of the name.  Even try/except blocks don't "shadow", but rebind and then
> delete the name used to catch the exception.

The problem can be turned around/over.  Instead of specifying a name to be 
shadowed, the names to be shared can be specified.  Then it translates to 
a function with specified nonlocals.

     a = 1             # will be shared
     b = 2             # will be shadowed

     def do_loop_with_shared_items():
         nonlocal a                      # a is a shared value.
         for b in some_iterator:
             a = use(b)

     print(a)     # changed by loop
     print(b)     # print 2. Not changed by loop

That might be expressed as...

     a = 1
     b = 2

     with nonlocal a:                 # a is shared
         for b in some_iterator:
             a = use(b)               # other values (b) are local to block.

     print(a)     # changed by loop
     print(b)     # prints 2. Not changed by loop

And with this, the "as" modifier isn't needed, just don't list the item as 
a nonlocal.

     with nonlocal:
         a =          # a as in this block scope only.

This has the advantage of not complicating other statements and keeps the 
concept in a separate mental box.

I like this better, but am still -0.5.  I'd need to see some examples where 
it would be "worth it".  It still feels like a solution looking for a 
problem to me.
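
For reference, the function translation above can be run today if it is wrapped in an enclosing function, since `nonlocal` is only accepted for names in an enclosing function scope (at module level one would need `global` instead). This is a sketch under that assumption; `demo` and the body of `use` are hypothetical stand-ins.

```python
def use(x):
    # Hypothetical stand-in for the use() in the examples above.
    return x * 10


def demo(some_iterator):
    a = 1             # will be shared
    b = 2             # will be shadowed

    def do_loop_with_shared_items():
        nonlocal a                # a is shared with the enclosing scope
        for b in some_iterator:   # this b is local to the inner function
            a = use(b)

    do_loop_with_shared_items()
    return a, b


print(demo([1, 2, 3]))
```

The outer `b` keeps its value of 2 because the loop variable lives only in the inner function, while `a` reflects the last iteration.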


From abarnert at  Fri Jun 12 20:41:28 2015
From: abarnert at (Andrew Barnert)
Date: Fri, 12 Jun 2015 11:41:28 -0700
Subject: [Python-ideas] If branch merging
In-Reply-To: <mlf86h$4ka$>
References: <>
 <> <>
 <> <>
 <> <mlf86h$4ka$>
Message-ID: <>

On Jun 12, 2015, at 11:25, Ron Adam <ron3200 at> wrote:
>> On 06/12/2015 01:21 PM, Ethan Furman wrote:
>>> On 06/12/2015 08:14 AM, Stephen J. Turnbull wrote:
>>> Ethan Furman writes:
>>>> Likewise:
>>>>    for val in some_iterator:
>>>>       use(val)
>>>>    bar(val)
>>>> will shadow foo.val
>>> Yes, I understand that.  What I don't understand is your statement
>>> that you would like "if expr as val:" if it *doesn't* shadow.
>> Ah, I think I see your point.  My use of the word "shadow" was in relation
>> to the micro-scope and the previously existing name being shadowed and then
>> un-shadowed when the micro-scope was destroyed. If we are at module-level
>> (not class nor function) then there should be no shadowing, but a rebinding
>> of the name.  Even try/except blocks don't "shadow", but rebind and then
>> delete the name used to catch the exception.
> The problem can be turned around/over.  Instead of specifying a name to be shadowed, the names to be shared can be specified.  Then it translates to function with specified nonlocals.

I really like making it explicit.

I'm not sure about the turning-it-around bit. That means inside a with-nonlocal block, things don't work the same as in any other block, and that won't be at all obvious.

But even without that, the idea works; to make something nested-local, you write:

    with local b:
        for b in some_iterator:
            a = use(b)

That leaves function-local as the default, and defines statement-local in a way that's as similar as possible to the other alternatives, environment-nonlocal and global; the only real difference is that it has a suite, which is pretty much implicit in the fact that it's defining something as local to the suite.

Either way seems better than the quasi-magic scoping (both my version and Nick's took a couple paragraphs to explain...) caused by as expressions and/or clauses. And that's in addition to the advantages you suggested of not complicating the syntax and keeping separate concepts separate.

> I like this better, but am still -0.5.  I'd need to see some examples where it would be "worth it".  It still feels like a solution looking for a problem to me.

Agreed. I think everyone (including myself) has put thought into this just because it's an interesting puzzle, not necessarily because the language needs it...

From taleinat at  Fri Jun 12 22:59:04 2015
From: taleinat at (Tal Einat)
Date: Fri, 12 Jun 2015 23:59:04 +0300
Subject: [Python-ideas] slice.literal notation
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jun 12, 2015 at 11:45 PM, Joseph Jevnik <joejev at> wrote:
> I can update my patch to move it to the operator module

Please do.

Further discussion should take place on the issue tracker.

From joejev at  Fri Jun 12 22:45:33 2015
From: joejev at (Joseph Jevnik)
Date: Fri, 12 Jun 2015 16:45:33 -0400
Subject: [Python-ideas] slice.literal notation
In-Reply-To: <>
References: <>
Message-ID: <>

I can update my patch to move it to the operator module
On Jun 12, 2015 9:28 AM, "Tal Einat" <taleinat at> wrote:

> On Fri, Jun 12, 2015 at 10:41 AM, Terry Reedy <tjreedy at> wrote:
> > On 6/12/2015 2:53 AM, Nick Coghlan wrote:
> >>
> >> On 11 June 2015 at 22:57, Tal Einat <taleinat at> wrote:
> >>>
> >>>
> >>> I actually think "subscript" is quite good a name. It makes the
> >>> explicit distinction between subscripts, indexes and slices.
> >>
> >>
> >> Yeah, I've warmed to it myself:
> >>
> >>      zero = operator.subscript[0]
> >>
> >>      ellipsis = operator.subscript[...]
> >>
> >>      reverse = slice(None, None, -1)
> >>      reverse = operator.subscript[::-1]
> >>
> >>      all_rows_first_col = slice(None), slice(0)
> >>      all_rows_first_col = operator.subscript[:, 0]
> >>
> >>      first_row_all_cols_but_last = slice(0), slice(None, -1)
> >>      first_row_all_cols_but_last = operator.subscript[0, :-1]
> >>
> >> I realised the essential problem with using "item" in the name is that
> >> the "item" in the method names refers to the *result*, not to the
> >> input. Since the unifying term for the different kinds of input is
> >> indeed "subscript" (covering indices, slices, multi-dimensional
> >> slices, key lookups, content addressable data structures, etc), it
> >> makes sense to just use it rather than inventing something new.
> >
> >
> > If the feature is added, this looks pretty good to me.
> It looks good to me as well.
> +1 for adding this as described and naming it operator.subscript.
> - Tal Einat

From rosuav at  Sat Jun 13 01:50:31 2015
From: rosuav at (Chris Angelico)
Date: Sat, 13 Jun 2015 09:50:31 +1000
Subject: [Python-ideas] Meta: Email netiquette
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jun 12, 2015 at 11:14 PM, anatoly techtonik <techtonik at> wrote:
> I would gladly use forum interface like
> if there was any linked in message footers. Bottom posting is not
> automated in Gmail and takes a lot of energy and dumb repeated
> keypresses to complete.

Small trick: If Gmail has all the quoted text buried behind a "click
to expand" button, you don't have to grab the mouse - just press
Ctrl-A to select all, and it'll expand the quoted section into actual

Works in Chrome, not tested recently in Firefox but should work there
too. Can't speak for other browsers.


From stephen at  Sat Jun 13 03:28:58 2015
From: stephen at (Stephen J. Turnbull)
Date: Sat, 13 Jun 2015 10:28:58 +0900
Subject: [Python-ideas] If branch merging
In-Reply-To: <>
References: <>
Message-ID: <>

Ethan Furman writes:

 > > Yes, I understand that.  What I don't understand is your statement
 > > that you would like "if expr as val:" if it *doesn't* shadow.
 > Ah, I think I see your point.  My use of the word "shadow" was in
 > relation to the micro-scope and the previously existing name being
 > shadowed and then un-shadowed when the micro-scope was destroyed.

I see.  Your use of "shadow" implies later "unshadowing", which can
only happen with scope.  Mine doesn't, I just associate "shadow" with
rebinding.  I think your usage is more accurate.  Especially in
Python, which has a much flatter (and more formalized) use of scopes
than, say, Lisp.

Thank you for your explanation, it helped (me, anyway).

From stephen at  Sat Jun 13 03:53:23 2015
From: stephen at (Stephen J. Turnbull)
Date: Sat, 13 Jun 2015 10:53:23 +0900
Subject: [Python-ideas] Meta: Email netiquette
In-Reply-To: <>
References: <>
Message-ID: <>

Andrew Barnert via Python-ideas writes:

 > But really, in non-trivial cases, you usually want your reply
 > interleaved with parts of the original; just jumping to the very
 > end of the message (past the signature and list footer) and
 > replying to the whole thing in bulk isn't much better than
 > top-posting.

Wrong. :-)  *Bottom*-posting is *much* worse.

For better or worse, top-posting is here to stay.  It doesn't work
very well in forums like this one, but it's not too bad if you do it
the way Guido does (which of course is one of the reasons we can't get
rid of it<wink/>).  The basic rules:

1.  Don't top-post, and you're done.  If you "must" top-post, then

2.  Only top-post short comments that make sense with minimal context
    (typically the subject should be enough context).  If it's not
    going to be short, don't top-post (no exceptions -- if you've got
    time and the equipment to write a long post, you also are not
    inconvenienced by using the interlinear style).  If it requires
    specific context presented accurately, don't top-post (no
    exceptions, as your top post will certainly be misunderstood and
    generate long threads of explaining what you thought didn't need
    explanation, wasting copious amounts of everybody's time and
    attention, and probably burying your contribution in the process).

3.  Indicate in your text that you top-posted -- preferably with a
    sincere apology.  If you're so self-centered that sincerity is
    impossible, an insincere apology is recommended (otherwise you'll
    probably end up in a few killfiles for being Beavis's friend).

From breamoreboy at  Sat Jun 13 10:39:53 2015
From: breamoreboy at (Mark Lawrence)
Date: Sat, 13 Jun 2015 09:39:53 +0100
Subject: [Python-ideas] Enabling access to the AST for Python code
In-Reply-To: <>
References: <>
Message-ID: <mlgq94$peq$>

On 22/05/2015 02:18, Ben Hoyt wrote:
> Hi Python Ideas folks,

[snipped to death]

> -Ben

You might find this interesting

The introduction states "Baron is a Full Syntax Tree (FST) library for 
Python. By opposition to an AST which drops some syntax information in 
the process of its creation (like empty lines, comments, formatting), a 
FST keeps everything and guarantees the operation 
fst_to_code(code_to_fst(source_code)) == source_code.".
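
To illustrate the difference the introduction describes, here is a small sketch using only the standard library rather than Baron itself (I have not reproduced Baron's API here). It shows that `ast` discards a comment while the token stream from `tokenize` still carries it; the sample source strings are made up for the example.

```python
import ast
import io
import tokenize

with_comment = "x = 1  # the answer\n"
without_comment = "x = 1\n"

# The AST is identical whether or not the comment is present.
same_tree = ast.dump(ast.parse(with_comment)) == ast.dump(ast.parse(without_comment))

# The token stream still carries the comment text.
comments = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(with_comment).readline)
    if tok.type == tokenize.COMMENT
]

print(same_tree, comments)
```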

My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

From guido at  Sat Jun 13 11:09:17 2015
From: guido at (Guido van Rossum)
Date: Sat, 13 Jun 2015 02:09:17 -0700
Subject: [Python-ideas] Enabling access to the AST for Python code
In-Reply-To: <mlgq94$peq$>
References: <>
Message-ID: <>

On Sat, Jun 13, 2015 at 1:39 AM, Mark Lawrence <breamoreboy at> wrote:

> You might find this interesting
> The introduction states "Baron is a Full Syntax Tree (FST) library for
> Python. By opposition to an AST which drops some syntax information in the
> process of its creation (like empty lines, comments, formatting), a FST
> keeps everything and guarantees the operation
> fst_to_code(code_to_fst(source_code)) == source_code.".

There's one like this in the stdlib too! It's in lib2to3 and even preserves
comments and whitespace. It's used as the basis for the 2to3 fixers.

--Guido van Rossum (

From Steve.Dower at  Sat Jun 13 21:54:37 2015
From: Steve.Dower at (Steve Dower)
Date: Sat, 13 Jun 2015 19:54:37 +0000
Subject: [Python-ideas] Meta: Email netiquette
In-Reply-To: <>
References: <>
Message-ID: <>

FWIW, from my phone the previous email is completely ineditable and I can't even delete it. Hence my signature, putting the blame firmly where it belongs ;)

That said, we basically always use top posting (with highlighting - a perk of formatting in emails) at work so I'm fairly used to it. The way that I read the mailing lists (oldest to newest by thread) also works better with top posting, since I'm permanently trying to catch up rather than carry on a conversation. The problems then are people who don't snip (though people who do snip are also problems... can't win here) and especially those who reply to one part in the middle of an epic and don't sign off, leaving me scrolling right to the end to figure out if they have anything to say.</complaining>


Top-posted from my Windows Phone
From: Stephen J. Turnbull<mailto:stephen at>
Sent: 6/12/2015 18:53
To: Andrew Barnert<mailto:abarnert at>
Cc: python-ideas<mailto:python-ideas at>
Subject: Re: [Python-ideas] Meta: Email netiquette

Andrew Barnert via Python-ideas writes:

 > But really, in non-trivial cases, you usually want your reply
 > interleaved with parts of the original; just jumping to the very
 > end of the message (past the signature and list footer) and
 > replying to the whole thing in bulk isn't much better than
 > top-posting.

Wrong. :-)  *Bottom*-posting is *much* worse.

For better or worse, top-posting is here to stay.  It doesn't work
very well in forums like this one, but it's not too bad if you do it
the way Guido does (which of course is one of the reasons we can't get
rid of it<wink/>).  The basic rules:

1.  Don't top-post, and you're done.  If you "must" top-post, then

2.  Only top-post short comments that make sense with minimal context
    (typically the subject should be enough context).  If it's not
    going to be short, don't top-post (no exceptions -- if you've got
    time and the equipment to write a long post, you also are not
    inconvenienced by using the interlinear style).  If it requires
    specific context presented accurately, don't top-post (no
    exceptions, as your top post will certainly be misunderstood and
    generate long threads of explaining what you thought didn't need
    explanation, wasting copious amounts of everybody's time and
    attention, and probably burying your contribution in the process).

3.  Indicate in your text that you top-posted -- preferably with a
    sincere apology.  If you're so self-centered that sincerity is
    impossible, an insincere apology is recommended (otherwise you'll
    probably end up in a few killfiles for being Beavis's friend).


From ben+python at  Sat Jun 13 22:05:51 2015
From: ben+python at (Ben Finney)
Date: Sun, 14 Jun 2015 06:05:51 +1000
Subject: [Python-ideas] Meta: Email netiquette
References: <>
Message-ID: <>

Steve Dower <Steve.Dower at>
writes:

> FWIW, from my phone the previous email is completely ineditable and I
> can't even delete it. Hence my signature, putting the blame firmly
> where it belongs ;)

This speaks to a deficiency in the tool. Rather than apologising for bad
etiquette, surely the better course is not to use a tool that needs such

In other words: if your tool can't compose messages properly, don't
apologise for it; instead, stop using that tool for composing messages.

 \       "A lot of people are afraid of heights. Not me, I'm afraid of |
  `\                                         widths." -- Steven Wright |
_o__)                                                                  |
Ben Finney

From ncoghlan at  Sun Jun 14 03:24:31 2015
From: ncoghlan at (Nick Coghlan)
Date: Sun, 14 Jun 2015 11:24:31 +1000
Subject: [Python-ideas] Meta: Email netiquette
In-Reply-To: <>
References: <>
Message-ID: <>

On 14 Jun 2015 06:06, "Ben Finney" <ben+python at> wrote:
> Steve Dower <Steve.Dower at>
> writes:
> > FWIW, from my phone the previous email is completely ineditable and I
> > can't even delete it. Hence my signature, putting the blame firmly
> > where it belongs ;)
> This speaks to a deficiency in the tool. Rather than apologising for bad
> etiquette, surely the better course is not to use a tool that needs such
> apology?
> In other words: if your tool can't compose messages properly, don't
> apologise for it; instead, stop using that tool for composing messages.

Far easier said than done, especially in an institutional context, as there
are currently *zero* readily available email clients out there that
adequately cover hybrid operation for folks bridging the gap between the
open source community and the world of enterprise collaboration suites.

Gmail (and its associated Android app) at least attains "not entirely awful
at it" status, but I'd expect Microsoft's clients to still be assuming
Outlook/Exchange style models.


> --
>  \       "A lot of people are afraid of heights. Not me, I'm afraid of |
>   `\                                         widths." -- Steven Wright |
> _o__)                                                                  |
> Ben Finney

From steve at  Sun Jun 14 06:10:02 2015
From: steve at (Steven D'Aprano)
Date: Sun, 14 Jun 2015 14:10:02 +1000
Subject: [Python-ideas] Meta: Email netiquette
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Jun 13, 2015 at 10:53:23AM +0900, Stephen J. Turnbull wrote:
> Andrew Barnert via Python-ideas writes:
>  > But really, in non-trivial cases, you usually want your reply
>  > interleaved with parts of the original; just jumping to the very
>  > end of the message (past the signature and list footer) and
>  > replying to the whole thing in bulk isn't much better than
>  > top-posting.
> Wrong. :-)  *Bottom*-posting is *much* worse.

Agreed. The worst case I personally ever saw was somebody replying to a 
digest on a high-volume mailing list where they added "I agree!" and 
their signature to the very end. I actually counted how many pages of 
quoting there were: 29 pages, based on ~60 lines per page. (That's full 
A4 pages mind, it was about 50 keypresses in mutt to page through it 
a screen at a time.) Naturally there was no indication of which of the 
two dozen messages they agreed with.

> For better or worse, top-posting is here to stay.  It doesn't work
> very well in forums like this one, but it's not too bad if you do it
> the way Guido does (which of course is one of the reasons we can't get
> rid of it<wink/>).  The basic rules:

This is the best description of good top-posting practice I've ever 
seen, thanks.

For what it's worth, I think inline posters also need to follow good 
practice too: if the reader cannot see new content (i.e. what you wrote) 
within the first screen full of text, you're probably quoting too much. 
This rule does not apply to readers trying to read email on a phone that 
shows only a handful of lines at a time. If you are reading email on a 
screen the size of a credit card (or smaller), you cannot expect others 
to accommodate your choice of technology in a discussion group like this.

I can't think of *any* good reason to bottom-post without trimming the 
quoted content. 


From ncoghlan at  Sun Jun 14 07:25:08 2015
From: ncoghlan at (Nick Coghlan)
Date: Sun, 14 Jun 2015 15:25:08 +1000
Subject: [Python-ideas] PEP 432 modular bootstrap status update
Message-ID: <>

I took some time today to resync my PEP 432 modular bootstrap branch
with the current default branch of CPython.

While it's still very much a work in progress (with outright test
failures for at least isolated mode and subinterpreter support at the
moment), it represents a significant milestone, as for the first time I was able
to replace the current call in Py_Main that initialises the hash
randomisation with the new Py_BeginInitialization API and still build
an interpreter that at least basically works.

The full details of what that branch is about are in, but the general idea is to
split the interpreter bootstrapping sequence up into two distinct
phases:

* Initializing: the eval loop works, builtin data types work, frozen
and builtin imports work, the compiler works, but most operating
system interfaces (including external module imports) don't work
* Initialized: fully configured interpreter

My main goal here is actually to make the startup code easier to hack
on, by having more of it take place with the interpreter in a
well-defined state, rather than having a lot of APIs that may or may
not work depending on exactly where we are in the bootstrapping
process.

The main potential benefit for end users is that these changes should make it
easier to embed the CPython runtime in other applications (including
command line applications), and *skip the initialisation steps you
don't need*.

A secondary potential benefit is this should make it easier to have
subinterpreters that are *configured differently* from the main
interpreter (so, for example, you could have a subinterpreter that had
no import system configured), which opens the door to various
improvements in the way subinterpreters work in general.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From techtonik at  Sun Jun 14 13:08:31 2015
From: techtonik at (anatoly techtonik)
Date: Sun, 14 Jun 2015 14:08:31 +0300
Subject: [Python-ideas] Meta: Email netiquette
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jun 12, 2015 at 6:02 PM, Ryan Gonzalez <rymg19 at> wrote:
> On June 12, 2015 8:14:47 AM CDT, anatoly techtonik <techtonik at> wrote:
>>I would gladly use forum interface like
>>if there was any linked in message footers. Bottom posting is not
>>automated in Gmail and takes a lot of energy and dumb repeated
>>keypresses to complete.
> Not really. I just did it right now.


> Sent from my Android device with K-9 Mail. Please excuse my brevity.


From techtonik at  Sun Jun 14 13:16:43 2015
From: techtonik at (anatoly techtonik)
Date: Sun, 14 Jun 2015 14:16:43 +0300
Subject: [Python-ideas] Meta: Email netiquette
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jun 12, 2015 at 8:09 PM, Andrew Barnert <abarnert at> wrote:
> On Jun 12, 2015, at 08:02, Ryan Gonzalez <rymg19 at> wrote:
>>> On June 12, 2015 8:14:47 AM CDT, anatoly techtonik <techtonik at> wrote:
>>> I would gladly use forum interface like
>>> if there was any linked in message footers. Bottom posting is not
>>> automated in Gmail and takes a lot of energy and dumb repeated
>>> keypresses to complete.
>> Not really. I just did it right now.
> And if that's too much work, there are Greasemonkey scripts to do it for you. (And, unlike Yahoo, Gmail doesn't seem to go out of their way to break user scripts every two weeks.)

And where are those scripts? I find references dated 2008 on some
mirror sites, and I doubt that they work. Another problem is that I
don't use the same Pip-Boy all the time.

The process that needs to be automated for Gmail is
"Down->Enter->Del->Del->Down->Down" when I enter reply mode. This is
the annoying combo that I have to write every time to enter bottom
posting insert mode.
anatoly t.

From brett at  Sun Jun 14 15:11:27 2015
From: brett at (Brett Cannon)
Date: Sun, 14 Jun 2015 13:11:27 +0000
Subject: [Python-ideas] Meta: Email netiquette
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Jun 14, 2015, 00:10 Steven D'Aprano <steve at> wrote:

> On Sat, Jun 13, 2015 at 10:53:23AM +0900, Stephen J. Turnbull wrote:
> > Andrew Barnert via Python-ideas writes:
> >  > But really, in non-trivial cases, you usually want your reply
> >  > interleaved with parts of the original; just jumping to the very
> >  > end of the message (past the signature and list footer) and
> >  > replying to the whole thing in bulk isn't much better than
> >  > top-posting.
> > Wrong. :-)  *Bottom*-posting is *much* worse.
>
> Agreed. The worst case I personally ever saw was somebody replying to a
> digest on a high-volume mailing list where they added "I agree!" and
> their signature to the very end. I actually counted how many pages of
> quoting there were: 29 pages, based on ~60 lines per page. (That's full
> A4 pages mind, it was about 50 keypresses in mutt to page through it
> a screen at a time.) Naturally there was no indication of which of the
> two dozen messages they agreed with.

+1 from me as well.

> > For better or worse, top-posting is here to stay.  It doesn't work
> > very well in forums like this one, but it's not too bad if you do it
> > the way Guido does (which of course is one of the reasons we can't get
> > rid of it<wink/>).  The basic rules:
>
> This is the best description of good top-posting practice I've ever
> seen, thanks.
>
> For what it's worth, I think inline posters also need to follow good
> practice too: if the reader cannot see new content (i.e. what you wrote)
> within the first screen full of text, you're probably quoting too much.
> This rule does not apply to readers trying to read email on a phone that
> shows only a handful of lines at a time. If you are reading email on a
> screen the size of a credit card (or smaller), you cannot expect others
> to accommodate your choice of technology in a discussion group like this.

Well, we will see how long that lasts. With mobile now the predominant
platform for consumption we might be approaching a point where mobiles are
actually how most of us follow mailing lists (says the man writing this
email from a tablet). And I bet for a lot of people it is becoming more
common to follow things like this list in their spare time on their phones
when they have a moment here and there.

> I can't think of *any* good reason to bottom-post without trimming the
> quoted content.


From mertz at  Sun Jun 14 17:36:15 2015
From: mertz at (David Mertz)
Date: Sun, 14 Jun 2015 08:36:15 -0700
Subject: [Python-ideas] Meta: Email netiquette
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Jun 14, 2015 at 4:16 AM, anatoly techtonik <techtonik at> wrote:

> The process that needs to be automated for Gmail is
> "Down->Enter->Del->Del->Down->Down" when I enter reply mode. This is
> the annoying combo that I have to write every time to enter bottom
> posting insert mode.

How is it that Anatoly has such great difficulty editing email to
intersperse responses in Gmail, and yet I find the process entirely easy
and quick? Oh yeah... I guess I answered my own question by mentioning the
name of the complainer.

Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.

From tyler at  Sun Jun 14 18:00:51 2015
From: tyler at (Tyler Crompton)
Date: Sun, 14 Jun 2015 11:00:51 -0500
Subject: [Python-ideas] Meta: Email netiquette
In-Reply-To: <>
References: <>
Message-ID: <>

May we please continue this discussion elsewhere? I neither feel that this conversation will lead to anything constructive nor see this to be fitting for Python-ideas.

From p.f.moore at  Sun Jun 14 18:51:29 2015
From: p.f.moore at (Paul Moore)
Date: Sun, 14 Jun 2015 17:51:29 +0100
Subject: [Python-ideas] Meta: Email netiquette
In-Reply-To: <>
References: <>
Message-ID: <>

On 14 June 2015 at 12:16, anatoly techtonik <techtonik at> wrote:
> The process that needs to be automated for Gmail is
> "Down->Enter->Del->Del->Down->Down" when I enter reply mode. This is
> the annoying combo that I have to write every time to enter bottom
> posting insert mode.

And you consider that the time it takes you to press six keys is worth
more than the time it takes all the people who want to read your
message and understand its context, to scroll down, read the message
from the bottom up, and then *fix* your unhelpful quoting style if
they wish to quote your comment in the context of a reply?

Enough said.


From schesis at  Sun Jun 14 19:30:42 2015
From: schesis at (Zero Piraeus)
Date: Sun, 14 Jun 2015 14:30:42 -0300
Subject: [Python-ideas] Meta: Email netiquette
In-Reply-To: <>
References: <>
Message-ID: <20150614173042.GA2629@piedra>


On Sun, Jun 14, 2015 at 02:10:02PM +1000, Steven D'Aprano wrote:
> For what it's worth, I think inline posters also need to follow good 
> practice too: if the reader cannot see new content (i.e. what you wrote) 
> within the first screen full of text, you're probably quoting too much. 

While I agree in principle, others' failure to trim is a handy
time-saving measure for me: if there's nothing but quoted text on the
first screenful, I take it as a signal that a similar lack of thought
went into the original content, and skip to the next message.

If there *was* something worth reading after all, there's at least a
chance someone who actually knows how to write email replied to it, so
I'll see it anyway.


Zero Piraeus: respice finem

From techtonik at  Mon Jun 15 03:45:44 2015
From: techtonik at (anatoly techtonik)
Date: Mon, 15 Jun 2015 04:45:44 +0300
Subject: [Python-ideas] Meta: Email netiquette
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Jun 14, 2015 at 4:11 PM, Brett Cannon <brett at> wrote:
> Well, we will see how long that lasts. With mobile now the predominant
> platform for consumption we might be approaching a point where mobiles are
> actually how most of us follow mailing lists (says the man writing this
> email from a tablet). And I bet for a lot of people it is becoming more
> common to follow things like this list in their spare time on their phones
> when they have a moment here and there.

Can Mailman provide statistics about those mobile devices?

anatoly t.

From techtonik at  Mon Jun 15 03:57:09 2015
From: techtonik at (anatoly techtonik)
Date: Mon, 15 Jun 2015 04:57:09 +0300
Subject: [Python-ideas] Meta: Email netiquette
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Jun 14, 2015 at 6:36 PM, David Mertz <mertz at> wrote:
> On Sun, Jun 14, 2015 at 4:16 AM, anatoly techtonik <techtonik at>
> wrote:
>> The process that needs to be automated for Gmail is
>> "Down->Enter->Del->Del->Down->Down" when I enter reply mode. This is
>> the annoying combo that I have to write every time to enter bottom
>> posting insert mode.
> How is it that Anatoly has such great difficulty editing email to
> intersperse responses in Gmail, and yet I find the process entirely easy and
> quick? Oh yeah... I guess I answered my own question by mentioning the name
> of the complainer.

The requirements of the machine <-> human interface are very subjective and
depend on the age of a person being introduced to enabling communication
technology. I doubt that people younger than 25 are considering email as a
communication method at all, and I know that annoying interfaces are the
reason why people pretend not to use them. So, it may happen that
discussions are limited to a certain age group because of that.

anatoly t.

From rymg19 at  Mon Jun 15 04:17:32 2015
From: rymg19 at (Ryan Gonzalez)
Date: Sun, 14 Jun 2015 21:17:32 -0500
Subject: [Python-ideas] Meta: Email netiquette
In-Reply-To: <>
References: <>
Message-ID: <>

On June 14, 2015 8:57:09 PM CDT, anatoly techtonik <techtonik at> wrote:
>On Sun, Jun 14, 2015 at 6:36 PM, David Mertz <mertz at> wrote:
>> On Sun, Jun 14, 2015 at 4:16 AM, anatoly techtonik <techtonik at>
>> wrote:
>>> The process that needs to be automated for Gmail is
>>> "Down->Enter->Del->Del->Down->Down" when I enter reply mode. This is
>>> the annoying combo that I have to write every time to enter bottom
>>> posting insert mode.
>> How is it that Anatoly has such great difficulty editing email to
>> intersperse responses in Gmail, and yet I find the process entirely
>> easy and quick? Oh yeah... I guess I answered my own question by
>> mentioning the name of the complainer.
>The requirements of the machine <-> human interface are very subjective
>and depend on the age of a person being introduced to enabling
>communication technology. I doubt that people younger than 25 are
>considering email as a communication method at all, and I know that
>annoying interfaces are the reason why people pretend not to use them.
>So, it may happen that discussions are limited to a certain age group
>because of that.

I'm under 25! It's not *annoying interfaces*; everything has some annoying interface aspect. That's very subjective. Personally, I prefer email to IM.

Remember, you're referring to the nerdy youth, not the normal ones. :)

Sent from my Android device with K-9 Mail. Please excuse my brevity.

From ncoghlan at  Mon Jun 15 07:03:10 2015
From: ncoghlan at (Nick Coghlan)
Date: Mon, 15 Jun 2015 15:03:10 +1000
Subject: [Python-ideas] Meta: Email netiquette
In-Reply-To: <>
References: <>
Message-ID: <>

On 15 June 2015 at 12:17, Ryan Gonzalez <rymg19 at> wrote:
> I'm under 25! It's not *annoying interfaces*; everything has some annoying interface aspect. That's very subjective. Personally, I prefer email to IM.
> Remember, you're referring to the nerdy youth, not the normal ones. :)

Folks, before we continue further down the rabbit hole, please take
into account that Anatoly has already been banned from
and the core-mentorship mailing list for being demonstrably unable to
adjust his own behaviour to appropriately account for the needs and
interests of other members of the Python community, especially the
CPython core development team. (Some additional background on the

This isn't a case of "poor communication practices from someone that
doesn't yet understand our expectations of appropriate behaviour when
it comes to considering the needs and perspectives of others", it's
"poor communication practices from someone that actively refuses to
meet our standards of expected behaviour despite years of collective
coaching from a range of core developers".

Making a request to the respective list moderators for Anatoly's ban
to be extended to also cover python-dev and python-ideas as the main
core development lists (as has clearly become necessary) has been
approved by the Python Software Foundation Board, but actually putting
that request into effect is unfortunately a somewhat complicated
distributed task with Mailman 2 so it's still a work in progress at
this point in time. (It's my understanding that Mailman 3 improves the
tooling made available to list moderators, but the migration of the
mailing lists from Mailman 2 to Mailman 3 is going to be a
time consuming infrastructure maintenance task in and of itself)


P.S. Pondering the broader question of managing counterproductive
attempts at contributing to a community, my view on the key reason
that Anatoly's ongoing attempts to "help" the core development team
have proven to be a particularly challenging situation to address
relates to the fact that there are two key aspects to effecting change
in a community or organisation:

* vocally criticising it from the outside (allowing the existing
community leaders to decide whether or not they agree with the
concerns raised, and subsequently come up with their preferred
approach to addressing them)
* working to change it from the inside (often by gaining personal
credibility through contribution in non-controversial areas before
pushing for potentially controversial changes in other areas of

Making significant structural changes to a community or organisation
usually requires a combination of both activities, as influential
insiders echo and amplify the voices of critical outsiders that they
have come to agree with, as former insiders leave and adopt the role
of critical outsider, and as formerly critical outsiders are brought
into the fold as new insiders to help address the problems they noted.

Refusing to listen to criticism at all is a recipe for stagnation and
decline, so folks are understandably wary of shutting out critical
voices in the general case.

However, these "critical outsider" and "influential insider" roles in
advocating for structural change are also largely mutually exclusive -
for an outsider, "how could we make such a change in practice?" isn't
their problem, while for insiders, agreeing with a diagnosis of a
problem or concern is only the first step, as actually addressing the
concern is then a complex exercise in determining where the time and
energy to address the issue is going to come from, and how the
particular concern stacks up against all the other problems and
challenges that need to be addressed. Expanding the available pool of
contributor time and energy doesn't necessarily eliminate the latter
requirement for collaborative prioritisation, as collective ambition
often grows right along with the size of the contributor base.

It *is* possible for skilled communicators to pursue both roles at the
same time (which can be an amazingly effective approach when handled
well), but it's a difficult task that requires context-dependent
moderation of their own behaviour, such that when they're using
community specific communication channels, they operate primarily in
"influential insider" mode, and largely reserve "critical outsider"
mode for raising their concerns on their own platforms (e.g. a
personal blog). More commonly, folks will choose one approach or the
other based on their current level of engagement with the community

By contrast, someone using community specific communication channels
while persisting in operating in "critical outsider" mode counts as
deliberately disruptive behaviour, as it involves privileging our own
personal view of what we think the group's collective priorities
*should* be and *forcing* a discussion on those topics, rather than
gracefully accepting that there's almost always going to be a gap
between our personal priorities and the collective priorities of the
communities we choose to participate in, so we need to adjust our
expectations accordingly.

The combination of "insists on being directly involved in a particular
community" and "refuses to address feedback they receive on the
inappropriateness of their behaviour in that community" is fortunately
rare - most folks will either move on voluntarily once it is made
clear that their priorities and the group's priorities aren't aligned,
or else they will adjust their behaviour to be in line with community
norms whilst participating in that community.

For veterans of entirely unmoderated Usenet newsgroups, the historical
answer to the problematic "doesn't leave voluntarily when it is made
clear that their behaviour is not welcome" pattern has been to adopt
personal filters that automatically delete messages from particularly
unhelpful group participants. One of the realisations that has come
with the growth of online community management as a field of expertise
is that this individual filtering based approach is hostile to new
participants - newcomers don't know who to avoid yet, so they attempt
to engage productively with folks that aren't actually interested in
collaborative discussion (whether they know it or not). In the absence
of enforced bans, experienced group participants then face a choice
between appearing generally hostile (if they warn the newcomers away
from the participants known to regularly exhibit toxic behaviour), or
generally uncaring (if they leave the newcomers to their own devices).

Commercially backed open source communities have actually led the way
in addressing this, as they're generally far more comfortable with
asserting their authority to ban folks in the interests of fostering a
more effective collaborative environment, and critical voices like
Model View Culture [1] have also had a major part to play in pointing
out the problems resulting from the historical approach of leaving
folks to find their own means of coping with toxic behaviour.

Continuing this meta-discussion here would be taking us even further
off-topic for python-ideas, though, so if anyone would like to
continue, I would suggest the comment thread on as
a possible venue.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From amber.yust at  Wed Jun 17 20:58:00 2015
From: amber.yust at (Amber Yust)
Date: Wed, 17 Jun 2015 18:58:00 +0000
Subject: [Python-ideas] Keyword-only arguments?
Message-ID: <>

One thing that has been a source of bugs and frustration in the past is the
inability to designate a named keyword argument that cannot be passed as a
positional argument (short of **kwargs and then keying into the dict
directly). Has there been any previous discussion on the possibility of a
means to designate named arguments as explicitly non-positional?

Not a solid proposal, but to capture the essential difference of what I'm
thinking of, along the lines of...

    def foo(bar, baz=None, qux: None):

where bar is a required positional argument, baz is an optional argument
that can have a value passed positionally or by name, and qux is an
optional argument that must always be passed by keyword.

Such a means would help avoid cases where a misremembered function
signature results in a subtle and likely unnoticed bug due to unintended
parameter/argument mismatch.

(It's possible that this has been discussed before - a cursory search of
python-ideas didn't bring up any direct discussion, but I may have missed
something. If you have a link to prior discussion, please by all means
point me at it!)
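
The **kwargs workaround mentioned above can be sketched roughly as
follows. This is only an illustration, assuming a function shaped like
the foo in the proposal; the return value and the error message are
made up for the example.

```python
def foo(bar, baz=None, **kwargs):
    # Hypothetical example: pull the keyword-only option out by hand.
    qux = kwargs.pop("qux", None)
    if kwargs:
        # Reject leftover names so a misspelled keyword does not pass silently.
        raise TypeError(
            "unexpected keyword arguments: %s" % ", ".join(sorted(kwargs))
        )
    return bar, baz, qux
```

With this shape, foo(1, 2, qux=3) works, while foo(1, 2, 3) raises
TypeError because only two positional parameters exist. The cost is
that qux no longer appears in the visible signature.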

From joejev at  Wed Jun 17 21:00:39 2015
From: joejev at (Joseph Jevnik)
Date: Wed, 17 Jun 2015 15:00:39 -0400
Subject: [Python-ideas] Keyword-only arguments?
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Jun 17, 2015 at 2:58 PM, Amber Yust <amber.yust at> wrote:

> One thing that has been a source of bugs and frustration in the past is
> the inability to designate a named keyword argument that cannot be passed
> as a positional argument (short of **kwargs and then keying into the dict
> directly). Has there been any previous discussion on the possibility of a
> means to designate named arguments as explicitly non-positional?
> Not a solid proposal, but to capture the essential difference of what I'm
> thinking of, along the lines of...
>     def foo(bar, baz=None, qux: None):
> where bar is a required positional argument, baz is an optional argument
> that can have a value passed positionally or by name, and qux is an
> optional argument that must always be passed by keyword.
> Such a means would help avoid cases where a misremembered function
> signature results in a subtle and likely unnoticed bug due to unintended
> parameter/argument mismatch.
> (It's possible that this has been discussed before - a cursory search of
> python-ideas didn't bring up any direct discussion, but I may have missed
> something. If you have a link to prior discussion, please by all means
> point me at it!)

From toddrjen at  Wed Jun 17 21:01:20 2015
From: toddrjen at (Todd)
Date: Wed, 17 Jun 2015 21:01:20 +0200
Subject: [Python-ideas] Keyword-only arguments?
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 17, 2015 8:58 PM, "Amber Yust" <amber.yust at> wrote:
> One thing that has been a source of bugs and frustration in the past is
> the inability to designate a named keyword argument that cannot be passed
> as a positional argument (short of **kwargs and then keying into the dict
> directly). Has there been any previous discussion on the possibility of a
> means to designate named arguments as explicitly non-positional?
> Not a solid proposal, but to capture the essential difference of what I'm
> thinking of, along the lines of...
>     def foo(bar, baz=None, qux: None):
> where bar is a required positional argument, baz is an optional argument
> that can have a value passed positionally or by name, and qux is an
> optional argument that must always be passed by keyword.
> Such a means would help avoid cases where a misremembered function
> signature results in a subtle and likely unnoticed bug due to unintended
> parameter/argument mismatch.
> (It's possible that this has been discussed before - a cursory search of
> python-ideas didn't bring up any direct discussion, but I may have missed
> something. If you have a link to prior discussion, please by all means
> point me at it!)

Already present in python 3:

From ckaynor at  Wed Jun 17 21:01:08 2015
From: ckaynor at (Chris Kaynor)
Date: Wed, 17 Jun 2015 12:01:08 -0700
Subject: [Python-ideas] Keyword-only arguments?
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Jun 17, 2015 at 11:58 AM, Amber Yust <amber.yust at> wrote:

> One thing that has been a source of bugs and frustration in the past is
> the inability to designate a named keyword argument that cannot be passed
> as a positional argument (short of **kwargs and then keying into the dict
> directly). Has there been any previous discussion on the possibility of a
> means to designate named arguments as explicitly non-positional?
> Not a solid proposal, but to capture the essential difference of what I'm
> thinking of, along the lines of...
>     def foo(bar, baz=None, qux: None):
> where bar is a required positional argument, baz is an optional argument
> that can have a value passed positionally or by name, and qux is an
> optional argument that must always be passed by keyword.
> Such a means would help avoid cases where a misremembered function
> signature results in a subtle and likely unnoticed bug due to unintended
> parameter/argument mismatch.
> (It's possible that this has been discussed before - a cursory search of
> python-ideas didn't bring up any direct discussion, but I may have missed
> something. If you have a link to prior discussion, please by all means
> point me at it!)

This feature was added to Python 3 about 9 years ago, see A quick search for "python
keyword only arguments" on Google found it.

Guido's time machine strikes again!
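
For reference, the Python 3 spelling of the signature from the original
post might look like the sketch below. The bare * in the parameter list
marks every parameter after it as keyword-only. The function body is
invented purely for illustration.

```python
def foo(bar, baz=None, *, qux=None):
    # bar: required, positional or keyword
    # baz: optional, positional or keyword
    # qux: optional, keyword-only because it follows the bare *
    return bar, baz, qux
```

Calling foo(1, 2, qux=3) is accepted, while foo(1, 2, 3) raises
TypeError, which is the kind of early failure the original post asks
for.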


From amber.yust at  Wed Jun 17 21:11:23 2015
From: amber.yust at (Amber Yust)
Date: Wed, 17 Jun 2015 19:11:23 +0000
Subject: [Python-ideas] Keyword-only arguments?
In-Reply-To: <>
References: <>
Message-ID: <>

Interesting. I don't think I've ever seen it used, even having looked at
Python 3 code. For those who have worked with more Python 3 code than I
have, do you ever see it used?

On Wed, Jun 17, 2015 at 12:02 PM Chris Kaynor <ckaynor at> wrote:

> On Wed, Jun 17, 2015 at 11:58 AM, Amber Yust <amber.yust at> wrote:
>> One thing that has been a source of bugs and frustration in the past is
>> the inability to designate a named keyword argument that cannot be passed
>> as a positional argument (short of **kwargs and then keying into the dict
>> directly). Has there been any previous discussion on the possibility of a
>> means to designate named arguments as explicitly non-positional?
>> Not a solid proposal, but to capture the essential difference of what I'm
>> thinking of, along the lines of...
>>     def foo(bar, baz=None, qux: None):
>> where bar is a required positional argument, baz is an optional argument
>> that can have a value passed positionally or by name, and qux is an
>> optional argument that must always be passed by keyword.
>> Such a means would help avoid cases where a misremembered function
>> signature results in a subtle and likely unnoticed bug due to unintended
>> parameter/argument mismatch.
>> (It's possible that this has been discussed before - a cursory search of
>> python-ideas didn't bring up any direct discussion, but I may have missed
>> something. If you have a link to prior discussion, please by all means
>> point me at it!)
> This feature was added to Python 3 about 9 years ago, see
> A quick search for "python
> keyword only arguments" on Google found it.
> Guido's time machine strikes again!
> Chris

From ethan at  Wed Jun 17 21:25:22 2015
From: ethan at (Ethan Furman)
Date: Wed, 17 Jun 2015 12:25:22 -0700
Subject: [Python-ideas] Keyword-only arguments?
In-Reply-To: <>
References: <>
Message-ID: <>

On 06/17/2015 12:11 PM, Amber Yust wrote:
> Interesting. I don't think I've ever seen it used, even having looked at Python 3 code. For those who have worked with more Python 3 code than I have, do you ever see it used?

We don't typically go back and modify existing code to use new features, so your best bet to see it used is to find new features in Python 3.

Or, do a grep on the source code:

a85encode(b, *, foldspaces=False, wrapcol=0, pad=False, adobe=False):
a85decode(b, *, foldspaces=False, adobe=False, ignorechars=b' \t\n\r\v'):
        *, _is_text_encoding=None):
        allow_no_value=False, *, delimiters=('=', ':'),
    def get(self, section, option, *, raw=False, vars=None, fallback=_UNSET):
    def _get_conv(self, section, option, conv, *, raw=False, vars=None,
    def getint(self, section, option, *, raw=False, vars=None,
    def getfloat(self, section, option, *, raw=False, vars=None,
    def getboolean(self, section, option, *, raw=False, vars=None,
    def _validate_value_types(self, *, section="", option="", value=""):
    def get(self, option, fallback=None, *, raw=False, vars=None,
standard time.  Since that's what the local clock *does*, we want to map both
        context=False, numlines=5, *, charset='utf-8'):
dis(x=None, *, file=None):
distb(tb=None, *, file=None):
show_code(co, *, file=None):
get_instructions(x, *, first_line=None):
disassemble(co, lasti=-1, *, file=None):
        *, file=None, line_offset=0):
_disassemble_str(source, *, file=None):
    def __init__(self, x, *, first_line=None, current_offset=None):
    def __call__(cls, value, names=None, *, module=None, qualname=None, type=None, start=1):
    def _create_(cls, class_name, names=None, *, module=None, qualname=None, type=None, start=1):
        # an ABC *base*, insert said ABC to its MRO.
glob(pathname, *, recursive=False):
iglob(pathname, *, recursive=False):
(co_*, im_*, tb_*, etc.) in a friendlier fashion.
unwrap(func, *, stop=None):
        # signature: "(*, a='spam', b, c)". Because attempting
_signature_from_callable(obj, *,
    def __init__(self, name, kind, *, default=_empty, annotation=_empty):
    def replace(self, *, name=_void, kind=_void,
    def __init__(self, parameters=None, *, return_annotation=_empty,
    def from_callable(cls, obj, *, follow_wrapped=True):
    def replace(self, *, parameters=_void, return_annotation=_void):
    def _bind(self, args, kwargs, *, partial=False):
        # separator to the parameters list ("foo(arg1, *, arg2)" case)
signature(obj, *, follow_wrapped=True):
    def __init__(self, filename=None, mode="r", *,
open(filename, mode="rb", *,
    optional arguments *format*, *check*, *preset* and *filters*.
    optional arguments *format*, *check* and *filters*.
    def newgroups(self, date, *, file=None):
    def newnews(self, group, date, *, file=None):
    def list(self, group_pattern=None, *, file=None):
    def help(self, *, file=None):
    def head(self, message_spec=None, *, file=None):
    def body(self, message_spec=None, *, file=None):
    def article(self, message_spec=None, *, file=None):
    def xhdr(self, hdr, str, *, file=None):
    def xover(self, start, end, *, file=None):
    def over(self, message_spec, *, file=None):
    def xgtitle(self, group, *, file=None):
See also module 'glob' for expansion of *, ? and [...] in pathnames.
    *, /, abs(), .conjugate, ==, and !=.
def fwalk(top=".", topdown=True, onerror=None, *, follow_symlinks=False, dir_fd=None):
    def __init__(self, file, protocol=None, *, fix_imports=True):
    def __init__(self, file, *, fix_imports=True,
        Optional keyword arguments are *fix_imports*, *encoding* and
        *errors*, which are used to control compatiblity support for
_dump(obj, file, protocol=None, *, fix_imports=True):
_dumps(obj, protocol=None, *, fix_imports=True):
_load(file, *, fix_imports=True, encoding="ASCII", errors="strict"):
_loads(s, *, fix_imports=True, encoding="ASCII", errors="strict"):
load(fp, *, fmt=None, use_builtin_types=True, dict_type=dict):
loads(value, *, fmt=None, use_builtin_types=True, dict_type=dict):
dump(value, fp, *, fmt=FMT_XML, sort_keys=True, skipkeys=False):
dumps(value, *, fmt=FMT_XML, skipkeys=False, sort_keys=True):
See also module 'glob' for expansion of *, ? and [...] in pathnames.
pprint(object, stream=None, indent=1, width=80, depth=None, *,
pformat(object, indent=1, width=80, depth=None, *, compact=False):
    def __init__(self, indent=1, width=80, depth=None, stream=None, *,
browse(port=0, *, open_browser=True):
    *opener* with (*file*, *flags*). *opener* must return an open file
        """Read up to len(b) bytes into *b*, using at most one system call
        object is then obtained by calling opener with (*name*, *flags*).
copyfile(src, dst, *, follow_symlinks=True):
copymode(src, dst, *, follow_symlinks=True):
    def _copyxattr(src, dst, *, follow_symlinks=True):
copystat(src, dst, *, follow_symlinks=True):
copy(src, dst, *, follow_symlinks=True):
copy2(src, dst, *, follow_symlinks=True):
    def makefile(self, mode="r", buffering=None, *,
create_default_context(purpose=Purpose.SERVER_AUTH, *, cafile=None,
_create_unverified_context(protocol=PROTOCOL_SSLv23, *, cert_reqs=None,
    def list(self, verbose=True, *, members=None):
    def add(self, name, arcname=None, recursive=True, exclude=None, *, filter=None):
    def extractall(self, path=".", members=None, *, numeric_owner=False):
    def extract(self, member, path="", set_attrs=True, *, numeric_owner=False):
        *,
    the *width*, it is returned as is.  Otherwise, as many words
        args=(), kwargs=None, *, daemon=None):
main(args=None, *, _wrap_timer=None):
    def __init__(self, filename, lineno, name, *, lookup_line=True,
    def extract(klass, frame_gen, *, limit=None, lookup_lines=True,
    def __init__(self, exc_type, exc_value, exc_traceback, *, limit=None,
    def format(self, *, chain=True):
        If chain is not *True*, *__cause__* and *__context__* will not be formatted.
    def __new__(cls, name, bases, namespace, *, _root=False):
    def __init__(self, *, record=False, module=None):
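
As a rough alternative to grepping, the same information can be pulled
from a live function with the inspect module. The helper name
keyword_only_names below is made up for this sketch; it relies only on
inspect.signature and inspect.Parameter.KEYWORD_ONLY, and assumes the
callable exposes a signature that inspect can read.

```python
import inspect
import pprint
import shutil


def keyword_only_names(func):
    """Return the names of func's keyword-only parameters (hypothetical helper)."""
    params = inspect.signature(func).parameters.values()
    return [p.name for p in params if p.kind is inspect.Parameter.KEYWORD_ONLY]


if __name__ == "__main__":
    # The exact lists depend on the Python version in use.
    print(keyword_only_names(shutil.copyfile))
    print(keyword_only_names(pprint.pformat))
```

The exact names reported vary between Python versions, since later
releases added further keyword-only parameters to some of these
functions.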


From guido at  Wed Jun 17 21:29:31 2015
From: guido at (Guido van Rossum)
Date: Wed, 17 Jun 2015 21:29:31 +0200
Subject: [Python-ideas] Keyword-only arguments?
In-Reply-To: <>
References: <>
Message-ID: <>

It's used all over the asyncio code.

On Wed, Jun 17, 2015 at 9:11 PM, Amber Yust <amber.yust at> wrote:

> Interesting. I don't think I've ever seen it used, even having looked at
> Python 3 code. For those who have worked with more Python 3 code than I
> have, do you ever see it used?
> On Wed, Jun 17, 2015 at 12:02 PM Chris Kaynor <ckaynor at>
> wrote:
>> On Wed, Jun 17, 2015 at 11:58 AM, Amber Yust <amber.yust at>
>> wrote:
>>> One thing that has been a source of bugs and frustration in the past is
>>> the inability to designate a named keyword argument that cannot be passed
>>> as a positional argument (short of **kwargs and then keying into the dict
>>> directly). Has there been any previous discussion on the possibility of a
>>> means to designate named arguments as explicitly non-positional?
>>> Not a solid proposal, but to capture the essential difference of what
>>> I'm thinking of, along the lines of...
>>>     def foo(bar, baz=None, qux: None):
>>> where bar is a required positional argument, baz is an optional argument
>>> that can have a value passed positionally or by name, and qux is an
>>> optional argument that must always be passed by keyword.
>>> Such a means would help avoid cases where a misremembered function
>>> signature results in a subtle and likely unnoticed bug due to unintended
>>> parameter/argument mismatch.
>>> (It's possible that this has been discussed before - a cursory search of
>>> python-ideas didn't bring up any direct discussion, but I may have missed
>>> something. If you have a link to prior discussion, please by all means
>>> point me at it!)
>> This feature was added to Python 3 about 9 years ago, see
>> A quick search for "python
>> keyword only arguments" on Google found it.
>> Guido's time machine strikes again!
>> Chris
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at
>> Code of Conduct:

--Guido van Rossum (

From carl at  Wed Jun 17 21:02:03 2015
From: carl at (Carl Meyer)
Date: Wed, 17 Jun 2015 13:02:03 -0600
Subject: [Python-ideas] Keyword-only arguments?
In-Reply-To: <>
References: <>
Message-ID: <>

Hi Amber,

On 06/17/2015 12:58 PM, Amber Yust wrote:
> One thing that has been a source of bugs and frustration in the past is
> the inability to designate a named keyword argument that cannot be
> passed as a positional argument (short of **kwargs and then keying into
> the dict directly). Has there been any previous discussion on the
> possibility of a means to designate named arguments as explicitly
> non-positional?
> Not a solid proposal, but to capture the essential difference of what
> I'm thinking of, along the lines of...
>     def foo(bar, baz=None, qux: None):
> where bar is a required positional argument, baz is an optional argument
> that can have a value passed positionally or by name, and qux is an
> optional argument that must always be passed by keyword.
> Such a means would help avoid cases where a misremembered function
> signature results in a subtle and likely unnoticed bug due to unintended
> parameter/argument mismatch.
> (It's possible that this has been discussed before - a cursory search of
> python-ideas didn't bring up any direct discussion, but I may have
> missed something. If you have a link to prior discussion, please by all
> means point me at it!)

I can do better than prior discussion - this already exists in Python 3:

Python 3.4.2 (default, Dec 12 2014, 17:46:08)
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> def foo(bar, *, baz=None):
...     print(bar, baz)
...
>>> foo('a', 'b')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: foo() takes 1 positional argument but 2 were given
>>> foo('a', baz='b')
a b
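
As a further illustration of the session above, here is a minimal sketch showing that a keyword-only parameter can also be required (no default), and that inspect.signature reports the parameter kind. The function connect and its parameter names are hypothetical, chosen only for this example.

```python
import inspect


def connect(host, port=5432, *, timeout=10.0, retries):
    """Hypothetical function: 'timeout' and 'retries' are keyword-only."""
    return (host, port, timeout, retries)


# Parameters after the bare '*' are reported as KEYWORD_ONLY.
kinds = {name: param.kind.name
         for name, param in inspect.signature(connect).parameters.items()}

# A keyword-only parameter may be required (no default), as 'retries' is here.
result = connect("db.example", retries=3)

# A third positional argument is rejected instead of silently
# binding to 'timeout'.
try:
    connect("db.example", 5432, 30.0, retries=3)
except TypeError as exc:
    error = str(exc)
else:
    error = None
```

The rejected call is the "misremembered signature" case from the original question: it fails loudly at the call site rather than binding a value to the wrong parameter.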




From njs at  Wed Jun 17 21:32:07 2015
From: njs at (Nathaniel Smith)
Date: Wed, 17 Jun 2015 12:32:07 -0700
Subject: [Python-ideas] Keyword-only arguments?
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 17, 2015 12:11 PM, "Amber Yust" <amber.yust at> wrote:
> Interesting. I don't think I've ever seen it used, even having looked at
> Python 3 code. For those who have worked with more Python 3 code than I
> have, do you ever see it used?

Unfortunately, no, because at this point almost all py3 APIs I see are
still aiming for py2/py3 compatibility, and there's really no good way to
accomplish kw only args in py2. It can be done, but it's very cumbersome;
not like range or print or whatever where you can just import something
from six or add some parentheses.

In retrospect I wish this had been backported to 2.7, because they're super
useful for making better APIs, but that ship has sailed.
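
For readers wondering what the cumbersome Python 2 route looks like, one common pattern is to accept **kwargs and unpack it by hand. This is only a sketch of that general pattern, not code from any particular library; foo, bar and baz are the placeholder names used earlier in the thread.

```python
def foo(bar, **kwargs):
    """Accept 'baz' by keyword only, using syntax that Python 2 also parses."""
    # Pull the keyword-only option out of **kwargs by hand.
    baz = kwargs.pop("baz", None)
    # Anything left over is an unknown keyword; mimic the built-in error.
    if kwargs:
        raise TypeError(
            "foo() got an unexpected keyword argument %r" % sorted(kwargs)[0]
        )
    return (bar, baz)


ok = foo("a", baz="b")

# A second positional argument has nowhere to go, so the call fails.
try:
    foo("a", "b")
except TypeError:
    positional_rejected = True
else:
    positional_rejected = False
```

The cost is that the real parameter names disappear from the signature, so help() and introspection tools no longer show them, and every function has to repeat the unknown-keyword check.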


From joejev at  Wed Jun 17 21:36:45 2015
From: joejev at (Joseph Jevnik)
Date: Wed, 17 Jun 2015 15:36:45 -0400
Subject: [Python-ideas] Keyword-only arguments?
In-Reply-To: <>
References: <>
Message-ID: <>

> In retrospect I wish this had been backported to 2.7, because they're
> super useful for making better APIs, but that ship has sailed

Why would this not be able to be ported now? It does not clash with any
existing python 2 syntax so all current python 2 is still valid and has no
behaviour change.

On Wed, Jun 17, 2015 at 3:32 PM, Nathaniel Smith <njs at> wrote:

> On Jun 17, 2015 12:11 PM, "Amber Yust" <amber.yust at> wrote:
> >
> > Interesting. I don't think I've ever seen it used, even having looked at
> Python 3 code. For those who have worked with more Python 3 code than I
> have, do you ever see it used?
> Unfortunately, no, because at this point almost all py3 APIs I see are
> still aiming for py2/py3 compatibility, and there's really no good way to
> accomplish kw only args in py2. It can be done, but it's very cumbersome;
> not like range or print or whatever where you can just import something
> from six or add some parentheses.
> In retrospect I wish this had been backported to 2.7, because they're
> super useful for making better APIs, but that ship has sailed.
> -n

From joejev at  Wed Jun 17 21:32:36 2015
From: joejev at (Joseph Jevnik)
Date: Wed, 17 Jun 2015 15:32:36 -0400
Subject: [Python-ideas] Keyword-only arguments?
In-Reply-To: <>
References: <>
Message-ID: <>

I use it for all of my python3 only code for all of the reasons that you
mentioned. One of the main reasons that it is not used is that many people
try to make their code work with 2 and 3.

On Wed, Jun 17, 2015 at 3:11 PM, Amber Yust <amber.yust at> wrote:

> Interesting. I don't think I've ever seen it used, even having looked at
> Python 3 code. For those who have worked with more Python 3 code than I
> have, do you ever see it used?
> On Wed, Jun 17, 2015 at 12:02 PM Chris Kaynor <ckaynor at>
> wrote:
>> On Wed, Jun 17, 2015 at 11:58 AM, Amber Yust <amber.yust at>
>> wrote:
>>> One thing that has been a source of bugs and frustration in the past is
>>> the inability to designate a named keyword argument that cannot be passed
>>> as a positional argument (short of **kwargs and then keying into the dict
>>> directly). Has there been any previous discussion on the possibility of a
>>> means to designate named arguments as explicitly non-positional?
>>> Not a solid proposal, but to capture the essential difference of what
>>> I'm thinking of, along the lines of...
>>>     def foo(bar, baz=None, qux: None):
>>> where bar is a required positional argument, baz is an optional argument
>>> that can have a value passed positionally or by name, and qux is an
>>> optional argument that must always be passed by keyword.
>>> Such a means would help avoid cases where a misremembered function
>>> signature results in a subtle and likely unnoticed bug due to unintended
>>> parameter/argument mismatch.
>>> (It's possible that this has been discussed before - a cursory search of
>>> python-ideas didn't bring up any direct discussion, but I may have missed
>>> something. If you have a link to prior discussion, please by all means
>>> point me at it!)
>> This feature was added to Python 3 about 9 years ago, see
>> A quick search for "python
>> keyword only arguments" on Google found it.
>> Guido's time machine strikes again!
>> Chris

From geoffspear at  Wed Jun 17 22:26:55 2015
From: geoffspear at (Geoffrey Spear)
Date: Wed, 17 Jun 2015 16:26:55 -0400
Subject: [Python-ideas] Keyword-only arguments?
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Jun 17, 2015 at 3:36 PM, Joseph Jevnik <joejev at> wrote:

> > In retrospect I wish this had been backported to 2.7, because they're
> super useful for making better APIs, but that ship has sailed
> Why would this not be able to be ported now? It does not clash with any
> existing python 2 syntax so all current python 2 is still valid and has no
> behaviour change.
Python 2.7 is not getting new features (ssl changes notwithstanding), and
there will never be a Python 2.8. There's certainly no desire to add new
syntax so there would be code that will run in Python 2.7.11 only, and not
in earlier 2.7 releases.

From ncoghlan at  Thu Jun 18 00:52:49 2015
From: ncoghlan at (Nick Coghlan)
Date: Thu, 18 Jun 2015 08:52:49 +1000
Subject: [Python-ideas] Keyword-only arguments?
In-Reply-To: <>
References: <>
Message-ID: <>

On 18 Jun 2015 6:27 am, "Geoffrey Spear" <geoffspear at> wrote:
> On Wed, Jun 17, 2015 at 3:36 PM, Joseph Jevnik <joejev at> wrote:
>> > In retrospect I wish this had been backported to 2.7, because they're
>> > super useful for making better APIs, but that ship has sailed
>> Why would this not be able to be ported now? It does not clash with any
>> existing python 2 syntax so all current python 2 is still valid and has no
>> behaviour change.
> Python 2.7 is not getting new features (ssl changes notwithstanding), and
> there will never be a Python 2.8. There's certainly no desire to add new
> syntax so there would be code that will run in Python 2.7.11 only, and not
> in earlier 2.7 releases.

Exactly - it's only in truly exceptional cases like the PEP 466 & 476
network security changes that we'll add features to Python 2.7.

Keyword-only arguments are certainly a nice enhancement, but their absence
isn't actively harmful the way the aging network security capabilities were.



From kaiser.yann at  Sat Jun 20 19:52:03 2015
From: kaiser.yann at (Yann Kaiser)
Date: Sat, 20 Jun 2015 17:52:03 +0000
Subject: [Python-ideas] Keyword-only arguments?
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, 17 Jun 2015 at 21:33 Nathaniel Smith <njs at> wrote:

> there's really no good way to accomplish kw only args in py2. It can be
> done, but it's very cumbersome; not like range or print or whatever where
> you can just import something from six or add some parentheses.
As you correctly point out, it can't be done without friction.

I've attempted backporting kw-only parameters through decorators:

from sigtools import modifiers

@modifiers.kwoargs('kwop')
def func(abc, kwop):
    ...

@modifiers.autokwoargs
def func(abc, kwop=False):
    ...
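
To give a rough idea of how a decorator can enforce this, here is a minimal sketch. It is not how sigtools implements kwoargs; the decorator name keyword_only is hypothetical, and the sketch uses inspect.signature (Python 3 only) for brevity, so a real backport would need a different way to read parameter names.

```python
import functools
import inspect


def keyword_only(*names):
    """Hypothetical decorator: reject positional values for the named parameters."""
    def decorator(func):
        params = list(inspect.signature(func).parameters)
        # Index of the first parameter that must be passed by keyword.
        first = min(params.index(name) for name in names)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if len(args) > first:
                raise TypeError(
                    "%s() takes %d positional argument(s) but %d were given"
                    % (func.__name__, first, len(args))
                )
            return func(*args, **kwargs)
        return wrapper
    return decorator


@keyword_only("kwop")
def func(abc, kwop=False):
    return (abc, kwop)
```

Note that this simple version makes every parameter from the first named one onward keyword-only, and it pays for an extra call on each invocation, which is part of the friction mentioned above.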

From flying-sheep at  Sat Jun 20 20:13:46 2015
From: flying-sheep at (Philipp A.)
Date: Sat, 20 Jun 2015 18:13:46 +0000
Subject: [Python-ideas] Keyword-only arguments?
In-Reply-To: <>
References: <>
Message-ID: <>

OK, i think it's time to finally switch to python 3 instead of writing more
horrible crutches:

# coding: utf-8
from __future__ import absolute_import, division, print_function, unicode_literals
from builtins import *

import trollius
from trollius import From, Return

from other import stuff

@trollius.coroutine
@modifiers.kwoargs('b')
def awesome_stuff(a, b=5):
    res = (yield From(stuff()))
    raise Return(res)

vs.

import asyncio
from other import stuff

@asyncio.coroutine
def awesome_stuff(a, *, b=5):
    res = (yield from stuff())
    return res

or soon:

from other import stuff

async def awesome_stuff(a, *, b=5):
    res = await stuff()
    return res

Yann Kaiser <kaiser.yann at> wrote on Sat, 20 June 2015 at 19:52:

> On Wed, 17 Jun 2015 at 21:33 Nathaniel Smith <njs at> wrote:
>> there's really no good way to accomplish kw only args in py2. It can be
>> done, but it's very cumbersome; not like range or print or whatever where
>> you can just import something from six or add some parentheses.
> As you correctly point out, it can't be done without friction.
> I've attempted backporting kw-only parameters through decorators:
> from sigtools import modifiers
> @modifiers.kwoargs('kwop')
> def func(abc, kwop):
>     ...
> @modifiers.autokwoargs
> def func(abc, kwop=False):
>     ...

From kaiser.yann at  Sat Jun 20 20:28:18 2015
From: kaiser.yann at (Yann Kaiser)
Date: Sat, 20 Jun 2015 18:28:18 +0000
Subject: [Python-ideas] Keyword-only arguments?
In-Reply-To: <>
References: <>
Message-ID: <>

Definitely agree with wanting to move on and focus on Python 3.

I design my stuff with Python 3 first in mind, but reality tells me I need
to keep supporting Python 2 users, even if that means uglifying front-page
examples with things such as sigtools.modifiers.kwoargs. The existence of
this thread and of some of the emails in the "signature documentation"
thread only serves as proof that I'd confuse too many by keeping "kwoargs"
& co a side-note.

Developing on Python 3.4 is great, things like chained tracebacks are
fantastic. If I can do it on Python 3, I will. But what if I want to deploy
to GAE? Stuck with 2.7. So are many users.

It's ugly, it sucks, but so far it is out of my hands and necessary.

On Sat, 20 Jun 2015 at 20:13 Philipp A. <flying-sheep at> wrote:

> OK, i think it's time to finally switch to python 3 instead of writing
> more horrible crutches:
> # coding: utf-8
> from __future__ import absolute_import, division, print_function, unicode_literals
> from builtins import *
> import trollius
> from trollius import From, Return
> from other import stuff
> @trollius.coroutine
> @modifiers.kwoargs('b')
> def awesome_stuff(a, b=5):
>     res = (yield From(stuff()))
>     raise Return(res)
> vs.
> import asyncio
> from other import stuff
> @asyncio.coroutine
> def awesome_stuff(a, *, b=5):
>     res = (yield from stuff())
>     return res
> or soon:
> from other import stuff
> async def awesome_stuff(a, *, b=5):
>     res = await stuff()
>     return res
> Yann Kaiser <kaiser.yann at> wrote on Sat, 20 June 2015 at 19:52:
>> On Wed, 17 Jun 2015 at 21:33 Nathaniel Smith <njs at> wrote:
>>> there's really no good way to accomplish kw only args in py2. It can be
>>> done, but it's very cumbersome; not like range or print or whatever where
>>> you can just import something from six or add some parentheses.
>> As you correctly point out, it can't be done without friction.
>> I've attempted backporting kw-only parameters through decorators:
>> from sigtools import modifiers
>> @modifiers.kwoargs('kwop')
>> def func(abc, kwop):
>>     ...
>> @modifiers.autokwoargs
>> def func(abc, kwop=False):
>>     ...

From flying-sheep at  Sat Jun 20 21:05:52 2015
From: flying-sheep at (Philipp A.)
Date: Sat, 20 Jun 2015 19:05:52 +0000
Subject: [Python-ideas] Keyword-only arguments?
In-Reply-To: <>
References: <>
Message-ID: <>

sure! and having those workarounds if you really need them is great.

it's just really frustrating that so many of you are stuck with 2.7 or even
less without more reason than "sysadmin won't install a SCL".

Yann Kaiser <kaiser.yann at> wrote on Sat, 20 June 2015 at 20:28:

> Definitely agree with wanting to move on and focus on Python 3.
> I design my stuff with Python 3 first in mind, but reality tells me I need
> to keep supporting Python 2 users, even if that means uglifying front-page
> examples with things such as sigtools.modifiers.kwoargs. The existence of
> this thread and of some of the emails in the "signature documentation"
> thread only serves as proof that I'd confuse too many by keeping "kwoargs"
> & co a side-note.
> Developing on Python 3.4 is great, things like chained tracebacks are
> fantastic. If I can do it on Python 3, I will. But what if I want to deploy
> to GAE? Stuck with 2.7. So are many users.
> It's ugly, it sucks, but so far it is out of my hands and necessary.
> On Sat, 20 Jun 2015 at 20:13 Philipp A. <flying-sheep at> wrote:
>> OK, i think it's time to finally switch to python 3 instead of writing
>> more horrible crutches:
>> # coding: utf-8
>> from __future__ import absolute_import, division, print_function, unicode_literals
>> from builtins import *
>> import trollius
>> from trollius import From, Return
>> from other import stuff
>> @trollius.coroutine
>> @modifiers.kwoargs('b')
>> def awesome_stuff(a, b=5):
>>     res = (yield From(stuff()))
>>     raise Return(res)
>> vs.
>> import asyncio
>> from other import stuff
>> @asyncio.coroutine
>> def awesome_stuff(a, *, b=5):
>>     res = (yield from stuff())
>>     return res
>> or soon:
>> from other import stuff
>> async def awesome_stuff(a, *, b=5):
>>     res = await stuff()
>>     return res
>> Yann Kaiser <kaiser.yann at> wrote on Sat, 20 June 2015 at 19:52:
>>> On Wed, 17 Jun 2015 at 21:33 Nathaniel Smith <njs at> wrote:
>>>> there's really no good way to accomplish kw only args in py2. It can be
>>>> done, but it's very cumbersome; not like range or print or whatever where
>>>> you can just import something from six or add some parentheses.
>>> As you correctly point out, it can't be done without friction.
>>> I've attempted backporting kw-only parameters through decorators:
>>> from sigtools import modifiers
>>> @modifiers.kwoargs('kwop')
>>> def func(abc, kwop):
>>>     ...
>>> @modifiers.autokwoargs
>>> def func(abc, kwop=False):
>>>     ...

From ericsnowcurrently at  Sat Jun 20 23:42:33 2015
From: ericsnowcurrently at (Eric Snow)
Date: Sat, 20 Jun 2015 15:42:33 -0600
Subject: [Python-ideas] solving multi-core Python
Message-ID: <>

tl;dr Let's exploit multiple cores by fixing up subinterpreters,
exposing them in Python, and adding a mechanism to safely share
objects between them.

This proposal is meant to be a shot over the bow, so to speak.  I plan
on putting together a more complete PEP some time in the future, with
content that is more refined along with references to the appropriate
online resources.

Feedback appreciated!  Offers to help even more so! :)



Python's multi-core story is murky at best.  Not only can we be more
clear on the matter, we can improve Python's support.  The result of
any effort must make multi-core (i.e. parallelism) support in Python
obvious, unmistakable, and undeniable (and keep it Pythonic).

Currently we have several concurrency models represented via
threading, multiprocessing, asyncio, concurrent.futures (plus others
in the cheeseshop).  However, in CPython the GIL means that we don't
have parallelism, except through multiprocessing which requires
trade-offs. (See Dave Beazley's talk at PyCon US 2015.)

This is a situation I'd like us to solve once and for all for a couple
of reasons.  Firstly, it is a technical roadblock for some Python
developers, though I don't see that as a huge factor.  Regardless,
secondly, it is especially a turnoff to folks looking into Python and
ultimately a PR issue.  The solution boils down to natively supporting
multiple cores in Python code.

This is not a new topic.  For a long time many have clamored for death
to the GIL.  Several attempts have been made over the years and failed
to do it without sacrificing single-threaded performance.
Furthermore, removing the GIL is perhaps an obvious solution but not
the only one.  Others include Trent Nelson's PyParallel, STM, and
other Python implementations.


In some personal correspondence with Nick Coghlan, he summarized my
preferred approach as "the data storage separation of multiprocessing,
with the low message passing overhead of threading".

For Python 3.6:

* expose subinterpreters to Python in a new stdlib module: "subinterpreters"
* add a new SubinterpreterExecutor to concurrent.futures
* add a queue.Queue-like type that will be used to explicitly share
objects between subinterpreters

This is less simple than it might sound, but presents what I consider
the best option for getting a meaningful improvement into Python 3.6.
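
To make the executor item concrete, here is a sketch of the calling pattern such an executor would presumably share with the existing concurrent.futures executors. SubinterpreterExecutor does not exist; ThreadPoolExecutor is used purely as a stand-in, and word_count and documents are made-up names for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(text):
    # Plain function with immutable input (str) and immutable output (int).
    return len(text.split())


documents = ("a b c", "d e", "f")

# ThreadPoolExecutor stands in for the proposed SubinterpreterExecutor;
# only the Executor interface (submit/map) is being illustrated here.
with ThreadPoolExecutor(max_workers=2) as executor:
    counts = list(executor.map(word_count, documents))
```

With the stand-in, the GIL still serializes the work; the point of the proposal is that the same code shape could run on multiple cores without the pickling cost of ProcessPoolExecutor.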

Also, I'm not convinced that the word "subinterpreter" properly
conveys the intent, for which subinterpreters is only part of the
picture.  So I'm open to a better name.


Note that I'm drawing quite a bit of inspiration from elsewhere.  The
idea of using subinterpreters to get this (more) efficient isolated
execution is not my own (I heard it from Nick).  I have also spent
quite a bit of time and effort researching for this proposal.  As part
of that, a number of people have provided invaluable insight and
encouragement as I've prepared, including Guido, Nick, Brett Cannon,
Barry Warsaw, and Larry Hastings.

Additionally, Hoare's "Communicating Sequential Processes" (CSP) has
been a big influence on this proposal.  FYI, CSP is also the
inspiration for Go's concurrency model (e.g. goroutines, channels,
select).  Dr. Sarah Mount, who has expertise in this area, has been
kind enough to agree to collaborate and even co-author the PEP that I
hope comes out of this proposal.

My interest in this improvement has been building for several years.
Recent events, including this year's language summit, have driven me
to push for something concrete in Python 3.6.

The subinterpreter Module

The subinterpreters module would look something like this (a la

      running() -> Task or None
      run(...) -> Task  # wrapper around PyRun_*, auto-calls Task.start()
  Task(...)  # analogous to a CSP process
      # other stuff?

      # for compatibility with threading.Thread:
  Channel(...)  # shared by passing as an arg to the subinterpreter-running func
      # this API is a bit uncooked still...
      poison()  # maybe
  select()  # maybe

Note that Channel objects will necessarily be shared in common between
subinterpreters (where bound).  This sharing will happen when one
or more of the parameters to the function passed to Task() is a
Channel.  Thus the channel would be open to the (sub)interpreter
calling Task() (or and to the new
subinterpreter.  Also, other channels could be fed into such a shared
channel, whereby those channels would then likewise be shared between
the interpreters.
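
The channel-passing pattern described above can be approximated today with threads and queues. The sketch below is only an analogy: threading.Thread stands in for Task, queue.Queue for Channel, and the POISON sentinel for a possible poison() operation. None of it reflects an actual subinterpreters API.

```python
import queue
import threading

POISON = object()  # hypothetical sentinel standing in for Channel.poison()


def worker(inbox, outbox):
    # Receive until the channel is poisoned, sending back one result per item.
    while True:
        item = inbox.get()
        if item is POISON:
            outbox.put(POISON)
            return
        outbox.put(item * item)


# The channels are shared by passing them as arguments to the task function.
inbox, outbox = queue.Queue(), queue.Queue()
task = threading.Thread(target=worker, args=(inbox, outbox))
task.start()

for n in (1, 2, 3):
    inbox.put(n)
inbox.put(POISON)

results = []
while True:
    item = outbox.get()
    if item is POISON:
        break
    results.append(item)
task.join()
```

Unlike threads, subinterpreters would not share arbitrary objects, so the channel would be the only path between the two sides and what may travel over it becomes the central design question.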

I don't know yet if this module should include *all* the essential
pieces to implement a complete CSP library.  Given the inspiration
that CSP is providing, it may make sense to support it fully.  It
would be interesting then if the implementation here allowed the
(complete?) formalisms provided by CSP (thus, e.g. rigorous proofs of
concurrent system models).

I expect there will also be a _subinterpreters module with low-level
implementation-specific details.

Related Ideas and Details Under Consideration

Some of these are details that need to be sorted out.  Some are
secondary ideas that may be appropriate to address in this proposal or
may need to be tabled.  I have some others but these should be
sufficient to demonstrate the range of points to consider.

* further coalesce the (concurrency/parallelism) abstractions between
threading, multiprocessing, asyncio, and this proposal
* only allow one running Task at a time per subinterpreter
* disallow threading within subinterpreters (with legacy support in C)
  + ignore/remove the GIL within subinterpreters (since they would be
* use the GIL only in the main interpreter and for interaction between
subinterpreters (and a "Local Interpreter Lock" for within a
* disallow forking within subinterpreters
* only allow passing plain functions to Task() and (exclude closures, other callables)
* object ownership model
  + read-only in all but 1 subinterpreter
  + RW in all subinterpreters
  + only allow 1 subinterpreter to have any refcounts to an object
(except for channels)
* only allow immutable objects to be shared between subinterpreters
* for better immutability, move object ref counts into a separate table
* freeze (new machinery or memcopy or something) objects to make them
(at least temporarily) immutable
* expose a more complete CSP implementation in the stdlib (or make the
subinterpreters module more compliant)
* treat the main interpreter differently than subinterpreters (or
treat it exactly the same)
* add subinterpreter support to asyncio (the interplay between them
could be interesting)

Key Dependencies

There are a few related tasks/projects that will likely need to be
resolved before subinterpreters in CPython can be used in the proposed
manner.  The proposal could be implemented either way, but it will help
the multi-core effort if these are addressed first.

* fixes to subinterpreter support (there are a couple individuals who
should be able to provide the necessary insight)
* PEP 432 (will simplify several key implementation details)
* improvements to isolation between subinterpreters (file descriptors,
env vars, others)

Beyond those, the scale and technical scope of this project means that
I am unlikely to be able to do all the work myself to land this in
Python 3.6 (though I'd still give it my best shot).  That will require
the involvement of various experts.  I expect that the project is
divisible into multiple mostly independent pieces, so that will help.

Python Implementations

They can correct me if I'm wrong, but from what I understand both
Jython and IronPython already have subinterpreter support.  I'll be
soliciting feedback from the different Python implementors about
subinterpreter support.

C Extension Modules

Subinterpreters already isolate extension modules (and built-in
modules, including sys).  PEP 384 provides some help too.  However,
global state in C can easily leak data between subinterpreters,
breaking the desired data isolation.  This is something that will need
to be addressed as part of the effort.

From at  Sun Jun 21 00:04:47 2015
From: at (Yury Selivanov)
Date: Sat, 20 Jun 2015 18:04:47 -0400
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>


On 2015-06-20 5:42 PM, Eric Snow wrote:
> tl;dr Let's exploit multiple cores by fixing up subinterpreters,
> exposing them in Python, and adding a mechanism to safely share
> objects between them.

This is really great. Big +1 from me, and I'd be glad to help
with the PEP/implementation.


> * only allow immutable objects to be shared between subinterpreters

Even if this is the only thing we have -- an efficient way
for sharing immutable objects (such as bytes, strings, ints,
and, stretching the definition of immutable, FDs) that will
allow us to do a lot.


From njs at  Sun Jun 21 00:08:40 2015
From: njs at (Nathaniel Smith)
Date: Sat, 20 Jun 2015 15:08:40 -0700
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 20, 2015 2:42 PM, "Eric Snow" <ericsnowcurrently at> wrote:
> tl;dr Let's exploit multiple cores by fixing up subinterpreters,
> exposing them in Python, and adding a mechanism to safely share
> objects between them.

This all sounds really cool if you can pull it off, and shared-nothing
threads do seem like the least impossible model to pull off. But "least
impossible" and "possible" are different :-). From your email I can't tell
whether this plan is viable while preserving backcompat and memory safety.

Suppose I have a queue between two subinterpreters, and on this queue I
place a list of dicts of user-defined-in-python objects, each of which
holds a reference to a user-defined-via-the-C-api object. What happens next?


From ericsnowcurrently at  Sun Jun 21 00:54:04 2015
From: ericsnowcurrently at (Eric Snow)
Date: Sat, 20 Jun 2015 16:54:04 -0600
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 20, 2015 4:08 PM, "Nathaniel Smith" <njs at> wrote:
> On Jun 20, 2015 2:42 PM, "Eric Snow" <ericsnowcurrently at> wrote:
> >
> > tl;dr Let's exploit multiple cores by fixing up subinterpreters,
> > exposing them in Python, and adding a mechanism to safely share
> > objects between them.
> This all sounds really cool if you can pull it off, and shared-nothing
> threads do seem like the least impossible model to pull off.


> But "least impossible" and "possible" are different :-). From your email
> I can't tell whether this plan is viable while preserving backcompat and
> memory safety.

I agree that those issues must be clearly solved in the proposal before it
can be approved.  I'm confident the approach I'm pursuing will afford us
the necessary guarantees.  I'll address those specific points directly when
I can sit down and organize my thoughts.

> Suppose I have a queue between two subinterpreters, and on this queue I
> place a list of dicts of user-defined-in-python objects, each of which
> holds a reference to a user-defined-via-the-C-api object. What happens next?

You've hit upon exactly the trickiness involved and why I'm thinking the
best approach initially is to only allow *strictly* immutable objects to
pass between interpreters.  Admittedly, my description of channels is very
vague. :)  There are a number of possibilities with them that I'm still
exploring (CSP has particular opinions...), but immutability is a
characteristic that may provide the simplest *initial* approach.  Going
that route shouldn't preclude adding some sort of support for mutable
objects later.

Keep in mind that by "immutability" I'm talking about *really* immutable,
perhaps going so far as treating the full memory space associated with an
object as frozen.  For instance, we'd have to ensure that "immutable"
Python objects like strings, ints, and tuples do not change (i.e. via the C
API).  The contents of involved tuples/containers would have to be likewise
immutable.  Even changing refcounts could be too much, hence the idea of
moving refcounts out to a separate table.
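
As a rough Python-level illustration of the "contents must be likewise immutable" rule, here is a sketch of a recursive check. The function name is_deeply_immutable and the list of accepted types are assumptions for this example; it says nothing about mutation through the C API or about refcounts, which are the harder parts discussed above.

```python
IMMUTABLE_SCALARS = (type(None), bool, int, float, complex, str, bytes)


def is_deeply_immutable(obj):
    """Return True if obj is built only from immutable built-in types."""
    # Exact type checks: a subclass could add mutable attributes.
    if type(obj) in IMMUTABLE_SCALARS:
        return True
    # Immutable containers qualify only if everything inside them does.
    if type(obj) in (tuple, frozenset):
        return all(is_deeply_immutable(item) for item in obj)
    return False


shareable = is_deeply_immutable((1, "a", (2.0, b"x")))
not_shareable = is_deeply_immutable((1, [2]))
```

A tuple holding a list fails the check even though the tuple itself is immutable, which is the case Nathaniel's question is probing.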

This level of immutability would be something new to Python.  We'll see if
it's necessary.  If it isn't too much work it might be a good idea
regardless of the multi-core proposal.

Also note that Barry has a (rejected) PEP from a number of years ago about
freezing objects...  That idea is likely out of scope as relates to my
proposal, but it certainly factors in the problem space.


From jeanpierreda at  Sun Jun 21 00:54:15 2015
From: jeanpierreda at (Devin Jeanpierre)
Date: Sat, 20 Jun 2015 15:54:15 -0700
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

It's worthwhile to consider fork as an alternative.  IMO we'd get a
lot out of making forking safer, easier, and more efficient. (e.g.
respectively: adding an atfork registration mechanism; separating out
the bits of multiprocessing that use pickle from those that don't;
moving the refcount to a separate page, or allowing it to be frozen
prior to a fork.)

It sounds to me like this approach would use more memory than either
regular threaded code or forking, so its main advantages are being
cross-platform and less bug-prone. Is that right?

Note: I don't count the IPC cost of forking, because at least on
linux, any way to efficiently share objects between independent
interpreters in separate threads can also be ported to independent
interpreters in forked subprocesses, and *should* be.

See also: multiprocessing.Value/Array. This is probably a good
opportunity for that unification you mentioned. :)
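
For reference, a minimal sketch of what multiprocessing.Value/Array look
like in use today.  It assumes a POSIX platform where the "fork" start
method is available; the worker and run_demo names are made up for the
example:

```python
import multiprocessing


def worker(counter, samples):
    # Runs in the forked child; both arguments live in shared memory.
    with counter.get_lock():
        counter.value += 1
    for i in range(len(samples)):
        samples[i] = i * 0.5


def run_demo():
    # Assumption: "fork" is available (POSIX only).
    ctx = multiprocessing.get_context("fork")
    counter = ctx.Value("i", 0)   # shared C int guarded by a lock
    samples = ctx.Array("d", 3)   # shared array of three C doubles
    child = ctx.Process(target=worker, args=(counter, samples))
    child.start()
    child.join()
    # The parent sees the child's writes without any pickling.
    return counter.value, samples[:]
```

Only fixed-size C types can be shared this way, which is roughly the
"immutable or very simple objects only" restriction in another form.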

On Sat, Jun 20, 2015 at 3:04 PM, Yury Selivanov < at> wrote:
> On 2015-06-20 5:42 PM, Eric Snow wrote:
>> * only allow immutable objects to be shared between subinterpreters
> Even if this is the only thing we have -- an efficient way
> for sharing immutable objects (such as bytes, strings, ints,
> and, stretching the definition of immutable, FDs) that will
> allow us to do a lot.

+1, this has a lot of utility, and can be extended naturally to other
types and circumstances.

-- Devin

From ericsnowcurrently at  Sun Jun 21 01:16:37 2015
From: ericsnowcurrently at (Eric Snow)
Date: Sat, 20 Jun 2015 17:16:37 -0600
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 20, 2015 4:55 PM, "Devin Jeanpierre" <jeanpierreda at> wrote:
> It's worthwhile to consider fork as an alternative.  IMO we'd get a
> lot out of making forking safer, easier, and more efficient. (e.g.
> respectively: adding an atfork registration mechanism; separating out
> the bits of multiprocessing that use pickle from those that don't;
> moving the refcount to a separate page, or allowing it to be frozen
> prior to a fork.)

So leverage a common base of code with the multiprocessing module?

> It sounds to me like this approach would use more memory than either
> regular threaded code or forking, so its main advantages are being
> cross-platform and less bug-prone. Is that right?

I would expect subinterpreters to use less memory.  Furthermore creating
them would be significantly faster.  Passing objects between them would be
much more efficient.  And, yes, cross-platform.

> Note: I don't count the IPC cost of forking, because at least on
> linux, any way to efficiently share objects between independent
> interpreters in separate threads can also be ported to independent
> interpreters in forked subprocesses,

How so?  Subinterpreters are in the same process.  For this proposal each
would be on its own thread.  Sharing objects between them through channels
would be more efficient than IPC.  Perhaps I've missed something?

> and *should* be.
> See also: multiprocessing.Value/Array. This is probably a good
> opportunity for that unification you mentioned. :)

I'll look.

> On Sat, Jun 20, 2015 at 3:04 PM, Yury Selivanov < at>
> > On 2015-06-20 5:42 PM, Eric Snow wrote:
> >> * only allow immutable objects to be shared between subinterpreters
> >
> > Even if this is the only thing we have -- an efficient way
> > for sharing immutable objects (such as bytes, strings, ints,
> > and, stretching the definition of immutable, FDs) that will
> > allow us to do a lot.
> +1, this has a lot of utility, and can be extended naturally to other
> types and circumstances.



From rosuav at  Sun Jun 21 02:41:54 2015
From: rosuav at (Chris Angelico)
Date: Sun, 21 Jun 2015 10:41:54 +1000
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Jun 21, 2015 at 7:42 AM, Eric Snow <ericsnowcurrently at> wrote:
> * disallow forking within subinterpreters

I love the idea as a whole (if only because the detractors can be told
"Just use subinterpreters, then you get concurrency"), but this seems
like a tricky restriction. That means no subprocess.Popen, no shelling
out to other applications. And I don't know what other restrictions
might limit any given program. Will it feel like subinterpreters are
"write your code according to these tight restrictions and it'll
work", or will it be more of "most programs will run in parallel just
fine, but there are a few things to be careful of"?


From ericsnowcurrently at  Sun Jun 21 02:58:18 2015
From: ericsnowcurrently at (Eric Snow)
Date: Sat, 20 Jun 2015 18:58:18 -0600
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Jun 20, 2015 at 6:41 PM, Chris Angelico <rosuav at> wrote:
> On Sun, Jun 21, 2015 at 7:42 AM, Eric Snow <ericsnowcurrently at> wrote:
>> * disallow forking within subinterpreters
> I love the idea as a whole (if only because the detractors can be told
> "Just use subinterpreters, then you get concurrency"), but this seems
> like a tricky restriction. That means no subprocess.Popen, no shelling
> out to other applications. And I don't know what other restrictions
> might limit any given program.

This is just something I'm thinking about.  To be honest, forking
probably won't be a problem.  Furthermore, if there were any
restriction it would likely just be on forking Python (a la
multiprocessing).  However, I doubt there will be a need to pursue
such a restriction.  As I said, there are still a lot of open
questions and subtle details to sort out.

> Will it feel like subinterpreters are
> "write your code according to these tight restrictions and it'll
> work", or will it be more of "most programs will run in parallel just
> fine, but there are a few things to be careful of"?

I expect that will be somewhat the case no matter what.  The fewer
restrictions the better, though. :)  It's a balancing act because I
expect that with some initial restrictions we can land the feature
sooner.  Then we could look into how to relax the restrictions.  I
just want to be careful that we don't paint ourselves into a corner in
that regard.


From ncoghlan at  Sun Jun 21 03:28:12 2015
From: ncoghlan at (Nick Coghlan)
Date: Sun, 21 Jun 2015 11:28:12 +1000
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On 21 June 2015 at 10:41, Chris Angelico <rosuav at> wrote:
> On Sun, Jun 21, 2015 at 7:42 AM, Eric Snow <ericsnowcurrently at> wrote:
>> * disallow forking within subinterpreters
> I love the idea as a whole (if only because the detractors can be told
> "Just use subinterpreters, then you get concurrency"), but this seems
> like a tricky restriction. That means no subprocess.Popen, no shelling
> out to other applications. And I don't know what other restrictions
> might limit any given program. Will it feel like subinterpreters are
> "write your code according to these tight restrictions and it'll
> work", or will it be more of "most programs will run in parallel just
> fine, but there are a few things to be careful of"?

To calibrate expectations appropriately, it's worth thinking about the
concept of Python level subinterpreter support as being broadly
comparable to the JavaScript concept of web worker threads. mod_wsgi's
use of the existing CPython specific subinterpreter support when
embedding CPython in Apache httpd means we already know
subinterpreters largely "just work" in the absence of low level C
shenanigans in extension modules, but we also know keeping
subinterpreters clearly subordinate to the main interpreter simplifies
a number of design and implementation aspects (just as having a main
thread simplified various aspects of the threading implementation),
and that there will likely be things the main interpreter can do that
subinterpreters can't.

A couple of possible examples:

* as Eric noted, we don't know yet if we'll be able to safely let
subinterpreters launch subprocesses (especially via fork)
* there may be restrictions on some extension modules that limit them
to "main interpreter only" (e.g. if the extension module itself isn't
thread-safe, then it will need to remain fully protected by the GIL)

The analogous example with web workers is the fact that they don't
have any access to the window object, document object or parent object
in the browser DOM.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From rustompmody at  Sun Jun 21 05:04:44 2015
From: rustompmody at (Rustom Mody)
Date: Sat, 20 Jun 2015 20:04:44 -0700 (PDT)
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Sunday, June 21, 2015 at 6:12:22 AM UTC+5:30, Chris Angelico wrote:
> On Sun, Jun 21, 2015 at 7:42 AM, Eric Snow <ericsnow... at 
> <javascript:>> wrote: 
> > * disallow forking within subinterpreters 
> I love the idea as a whole (if only because the detractors can be told 
> "Just use subinterpreters, then you get concurrency"), but this seems 
> like a tricky restriction. That means no subprocess.Popen, no shelling 
> out to other applications. And I don't know what other restrictions
> might limit any given program. Will it feel like subinterpreters are 
> "write your code according to these tight restrictions and it'll 
> work", or will it be more of "most programs will run in parallel just 
> fine, but there are a few things to be careful of"? 
> ChrisA 

It's good to get our terminology right: Are we talking parallelism or
concurrency?
Some references on the distinction:

Bob Harper:
Rob Pike:

[Or if you prefer the more famous 

From steve at  Sun Jun 21 05:38:14 2015
From: steve at (Steven D'Aprano)
Date: Sun, 21 Jun 2015 13:38:14 +1000
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Jun 20, 2015 at 03:42:33PM -0600, Eric Snow wrote:

> * only allow passing plain functions to Task() and
> (exclude closures, other callables)

That doesn't sound very Pythonic to me. That's going to limit the 
usefulness of these subinterpreters.

> * object ownership model
>   + read-only in all but 1 subinterpreter
>   + RW in all subinterpreters

Isn't that a contradiction? If objects are read-only in all 
subinterpreters (except one), how can they be read/write in all
subinterpreters?

All this talk about subinterpreters reminds me of an interesting blog 
post by Armin Ronacher:

He's quite critical of a number of internal details of the CPython 
interpreter. But what I take from his post is that there could be 
significant advantages to giving the CPython interpreter its own local 
environment, like Lua and Javascript typically do, rather than the 
current model where there is a single process-wide global environment. 
Instead of having multiple subinterpreters all running inside the main 
interpreter, you could have multiple interpreters running in the same 
process, each with their own environment.

I may be completely misinterpreting things here, but as I understand it, 
this would remove the need for the GIL, allowing even plain old threads 
to take advantage of multiple cores. But that's a separate issue.

Armin writes:

    I would like to see an internal interpreter design could be based on 
    interpreters that work independent of each other, with local base 
    types and more, similar to how JavaScript works. This would 
    immediately open up the door again for embedding and concurrency 
    based on message passing. CPUs won't get any faster :)

(He also talks about CPython's tp_slots system, but that's a 
separate issue, I think.)

Now I have no idea if Armin is correct, or whether I am even 
interpreting his post correctly. But I'd like to hear people's thoughts 
on how this might interact with Eric's suggestion.


From ericsnowcurrently at  Sun Jun 21 07:01:20 2015
From: ericsnowcurrently at (Eric Snow)
Date: Sat, 20 Jun 2015 23:01:20 -0600
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 20, 2015 9:38 PM, "Steven D'Aprano" <steve at> wrote:
> On Sat, Jun 20, 2015 at 03:42:33PM -0600, Eric Snow wrote:
> > * only allow passing plain functions to Task() and
> > (exclude closures, other callables)
> That doesn't sound very Pythonic to me. That's going to limit the
> usefulness of these subinterpreters.

It certainly would limit their usefulness.  It's a tradeoff to make the
project tractable.  I'm certainly not opposed to dropping such
restrictions, now or as a follow-up project.  Also keep in mind that the
restriction is only something I'm considering.  It's too early to settle on
many of these details.
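
To make the distinction concrete, here is a minimal sketch of how "plain
function" could be told apart from closures and other callables.  The
helper name is_plain_function is hypothetical and is not part of any
proposed API:

```python
import functools
import types


def is_plain_function(obj):
    """True for Python functions that capture no enclosing variables."""
    return isinstance(obj, types.FunctionType) and obj.__closure__ is None


def double(x):
    return 2 * x


def make_adder(n):
    def add(x):
        return x + n  # 'n' is captured, so 'add' is a closure
    return add


class Doubler:
    def __call__(self, x):
        return 2 * x
```

Here double passes, while make_adder(1), Doubler(), len, and
functools.partial(double, 1) are all rejected.  Note that a "plain"
function can still depend on module globals, so a check like this would
not be sufficient on its own.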

> > * object ownership model
> >   + read-only in all but 1 subinterpreter
> >   + RW in all subinterpreters
> Isn't that a contradiction? If objects are read-only in all
> subinterpreters (except one), how can they be read/write in all
> subinterpreters?

True.  The two statements, like the rest in the section, are summarizing
different details and ideas into which I've been looking.  Several of them
are mutually exclusive.

> All this talk about subinterpreters reminds me of an interesting blog
> post by Armin Ronacher:
> http:// <>

Interesting.  I'd read that before, but not recently.  Armin has some
interesting points but I can't say that I agree with his analysis or his
conclusions.  Regardless...

> He's quite critical of a number of internal details of the CPython
> interpreter. But what I take from his post is that there could be
> significant advantages to giving the CPython interpreter its own local
> environment, like Lua and Javascript typically do, rather than the
> current model where there is a single process-wide global environment.
> Instead of having multiple subinterpreters all running inside the main
> interpreter, you could have multiple interpreters running in the same
> process, each with their own environment.

But that's effectively the goal!  This proposal will not work if the
interpreters are not isolated.  I'm not clear on what Armin thinks is
shared between interpreters.  The only consequential shared piece is the
GIL and my proposal should render the GIL irrelevant for the most part.

> I may be completely misinterpreting things here, but as I understand it,
> this would remove the need for the GIL, allowing even plain old threads
> to take advantage of multiple cores. But that's a separate issue.

If we restrict each subinterpreter to a single thread and are careful with
how objects are shared (and sort out extension modules) then there will be
no need for the GIL *within* each subinterpreter.  However there are a
couple of things that will keep the GIL around for now.

> Armin writes:
>     I would like to see an internal interpreter design could be based on
>     interpreters that work independent of each other, with local base
>     types and more, similar to how JavaScript works. This would
>     immediately open up the door again for embedding and concurrency
>     based on message passing. CPUs won't get any faster :)

That's almost exactly what I'm aiming for. :)


From njs at  Sun Jun 21 07:25:07 2015
From: njs at (Nathaniel Smith)
Date: Sat, 20 Jun 2015 22:25:07 -0700
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 20, 2015 3:54 PM, "Eric Snow" <ericsnowcurrently at> wrote:
> On Jun 20, 2015 4:08 PM, "Nathaniel Smith" <njs at> wrote:
> >
> > On Jun 20, 2015 2:42 PM, "Eric Snow" <ericsnowcurrently at>
> > >
> > > tl;dr Let's exploit multiple cores by fixing up subinterpreters,
> > > exposing them in Python, and adding a mechanism to safely share
> > > objects between them.
> >
> > This all sounds really cool if you can pull it off, and shared-nothing
> > threads do seem like the least impossible model to pull off.
> Agreed.
> > But "least impossible" and "possible" are different :-). From your
> > email I can't tell whether this plan is viable while preserving backcompat
> > and memory safety.
> I agree that those issues must be clearly solved in the proposal before
> it can be approved.  I'm confident the approach I'm pursuing will afford us
> the necessary guarantees.  I'll address those specific points directly when
> I can sit down and organize my thoughts.

I'd love to see just a hand wavy, verbal proof-of-concept walking through
how this might work in some simple but realistic case. To me a single
compelling example could make this proposal feel much more concrete and
achievable.

> > Suppose I have a queue between two subinterpreters, and on this queue I
> > place a list of dicts of user-defined-in-python objects, each of which
> > holds a reference to a user-defined-via-the-C-api object. What happens next?
> You've hit upon exactly the trickiness involved and why I'm thinking the
> best approach initially is to only allow *strictly* immutable objects to
> pass between interpreters.  Admittedly, my description of channels is very
> vague. :)  There are a number of possibilities with them that I'm still
> exploring (CSP has particular opinions...), but immutability is a
> characteristic that may provide the simplest *initial* approach.  Going
> that route shouldn't preclude adding some sort of support for mutable
> objects later.

There aren't really many options for mutable objects, right? If you want
shared nothing semantics, then transmitting a mutable object either needs
to make a copy, or else be a real transfer, where the sender no longer has
it (cf. Rust).

I guess for the latter you'd need some new syntax for send-and-del, that
requires the object to be self contained (all mutable objects reachable
from it are only referenced by each other) and have only one reference in
the sending process (which is the one being sent and then destroyed).
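
The copy option is easy to sketch in today's Python.  The CopyingChannel
class below is purely hypothetical and uses queue.Queue and copy.deepcopy
as stand-ins; the transfer option has no direct equivalent, since nothing
in the language can revoke the sender's reference:

```python
import copy
import queue


class CopyingChannel:
    """Hypothetical channel that deep-copies every object on send."""

    def __init__(self):
        self._queue = queue.Queue()

    def send(self, obj):
        # The receiver gets an independent copy, so later changes made by
        # the sender are invisible on the other side.
        self._queue.put(copy.deepcopy(obj))

    def recv(self):
        return self._queue.get()


channel = CopyingChannel()
message = [{"id": 1}]
channel.send(message)
message[0]["id"] = 2       # the sender mutates its object after sending
received = channel.recv()  # still holds the value as it was when sent
```

The cost is one full copy per message, which is exactly the overhead a
real transfer would avoid.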

> Keep in mind that by "immutability" I'm talking about *really* immutable,
> perhaps going so far as treating the full memory space associated with an
> object as frozen.  For instance, we'd have to ensure that "immutable"
> Python objects like strings, ints, and tuples do not change (i.e. via the C
> API).

This seems like a red herring to me. It's already the case that you can't
legally use the c api to mutate tuples, ints, for any object that's ever
been, say, passed to a function. So for these objects, the subinterpreter
setup doesn't actually add any new constraints on user code.

C code is always going to be *able* to break memory safety so long as
you're using shared-memory threading at the c level to implement this
stuff. We just need to make it easy not to.

Refcnts and garbage collection are another matter, of course.


From ncoghlan at  Sun Jun 21 08:31:33 2015
From: ncoghlan at (Nick Coghlan)
Date: Sun, 21 Jun 2015 16:31:33 +1000
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On 21 June 2015 at 15:25, Nathaniel Smith <njs at> wrote:
> On Jun 20, 2015 3:54 PM, "Eric Snow" <ericsnowcurrently at> wrote:
>> On Jun 20, 2015 4:08 PM, "Nathaniel Smith" <njs at> wrote:
>> >
>> > On Jun 20, 2015 2:42 PM, "Eric Snow" <ericsnowcurrently at>
>> > wrote:
>> > >
>> > > tl;dr Let's exploit multiple cores by fixing up subinterpreters,
>> > > exposing them in Python, and adding a mechanism to safely share
>> > > objects between them.
>> >
>> > This all sounds really cool if you can pull it off, and shared-nothing
>> > threads do seem like the least impossible model to pull off.
>> Agreed.
>> > But "least impossible" and "possible" are different :-). From your email
>> > I can't tell whether this plan is viable while preserving backcompat and
>> > memory safety.
>> I agree that those issues must be clearly solved in the proposal before it
>> can be approved.  I'm confident the approach I'm pursuing will afford us the
>> necessary guarantees.  I'll address those specific points directly when I
>> can sit down and organize my thoughts.
> I'd love to see just a hand wavy, verbal proof-of-concept walking through
> how this might work in some simple but realistic case. To me a single
> compelling example could make this proposal feel much more concrete and
> achievable.

I was one of the folks pushing Eric in this direction, and that's
because it's a possibility that was conceived of a few years back, but
never tried due to lack of time (and inclination for those of us that
are using Python primarily as an orchestration tool and hence spend
most of our time on IO bound problems rather than CPU bound ones):

As mentioned there, I've at least spent some time with Graham
Dumpleton over the past few years figuring out (and occasionally
trying to address) some of the limitations of mod_wsgi's existing
subinterpreter based WSGI app separation:

The fact that mod_wsgi can run most Python web applications in a
subinterpreter quite happily means we already know the core mechanism
works fine, and there don't appear to be any insurmountable technical
hurdles between the status quo and getting to a point where we can
either switch the GIL to a read/write lock where a write lock is only
needed for inter-interpreter communications, or else find a way for
subinterpreters to release the GIL entirely by restricting them

For inter-interpreter communication, the worst case scenario is having
to rely on a memcpy based message passing system (which would still be
faster than multiprocessing's serialisation + IPC overhead), but there
don't appear to be any insurmountable barriers to setting up an object
ownership based system instead (code that accesses PyObject_HEAD
fields directly rather than through the relevant macros and functions
seems to be the most likely culprit for breaking, but I think "don't
do that" is a reasonable answer there).

There's plenty of prior art here (including a system I once wrote in C
myself atop TI's DSP/BIOS MBX and TSK APIs), so I'm comfortable with
Eric's "simple matter of engineering" characterisation of the problem

The main reason that subinterpreters have never had a Python API
before is that they have enough rough edges that having to write a
custom C extension module to access the API is the least of your
problems if you decide you need them. At the same time, not having a
Python API not only makes them much harder to test, which means
various aspects of their operation are more likely to be broken, but
also makes them inherently CPython specific.

Eric's proposal essentially amounts to three things:

1. Filing off enough of the rough edges of the subinterpreter support
that we're comfortable giving them a public Python level API that
other interpreter implementations can reasonably support
2. Providing the primitives needed for safe and efficient message
passing between subinterpreters
3. Allowing subinterpreters to truly execute in parallel on multicore machines

All 3 of those are useful enhancements in their own right, which
offers the prospect of being able to make incremental progress towards
the ultimate goal of native Python level support for distributing
across multiple cores within a single process.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From wes.turner at  Sun Jun 21 08:41:21 2015
From: wes.turner at (Wes Turner)
Date: Sun, 21 Jun 2015 01:41:21 -0500
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>


* other approaches to the problem (with great APIs):
On Jun 20, 2015 5:55 PM, "Eric Snow" <ericsnowcurrently at> wrote:

> On Jun 20, 2015 4:08 PM, "Nathaniel Smith" <njs at> wrote:
> >
> > On Jun 20, 2015 2:42 PM, "Eric Snow" <ericsnowcurrently at>
> wrote:
> > >
> > > tl;dr Let's exploit multiple cores by fixing up subinterpreters,
> > > exposing them in Python, and adding a mechanism to safely share
> > > objects between them.
> >
> > This all sounds really cool if you can pull it off, and shared-nothing
> > threads do seem like the least impossible model to pull off.
> Agreed.
> > But "least impossible" and "possible" are different :-). From your email
> > I can't tell whether this plan is viable while preserving backcompat and
> > memory safety.
> I agree that those issues must be clearly solved in the proposal before it
> can be approved.  I'm confident the approach I'm pursuing will afford us
> the necessary guarantees.  I'll address those specific points directly when
> I can sit down and organize my thoughts.
> >
> > Suppose I have a queue between two subinterpreters, and on this queue I
> > place a list of dicts of user-defined-in-python objects, each of which
> > holds a reference to a user-defined-via-the-C-api object. What happens next?
> You've hit upon exactly the trickiness involved and why I'm thinking the
> best approach initially is to only allow *strictly* immutable objects to
> pass between interpreters.  Admittedly, my description of channels is very
> vague.:)  There are a number of possibilities with them that I'm still
> exploring (CSP has particular opinions...), but immutability is a
> characteristic that may provide the simplest *initial* approach.  Going
> that route shouldn't preclude adding some sort of support for mutable
> objects later.
> Keep in mind that by "immutability" I'm talking about *really* immutable,
> perhaps going so far as treating the full memory space associated with an
> object as frozen.  For instance, we'd have to ensure that "immutable"
> Python objects like strings, ints, and tuples do not change (i.e. via the C
> API).  The contents of involved tuples/containers would have to be likewise
> immutable.  Even changing refcounts could be too much, hence the idea of
> moving refcounts out to a separate table.
> This level of immutability would be something new to Python.  We'll see if
> it's necessary.  If it isn't too much work it might be a good idea
> regardless of the multi-core proposal.
> Also note that Barry has a (rejected) PEP from a number of years ago about
> freezing objects...  That idea is likely out of scope as relates to my
> proposal, but it certainly factors in the problem space.
> -eric
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at
> Code of Conduct:

From guido at  Sun Jun 21 11:11:49 2015
From: guido at (Guido van Rossum)
Date: Sun, 21 Jun 2015 11:11:49 +0200
Subject: [Python-ideas] Keyword-only arguments?
In-Reply-To: <>
References: <>
Message-ID: <>

My approach to this particular case has always been, if I need to port
keyword-only args to older Python versions, I'll just remove the '*' and
make it a documented convention that those arguments must be passed by
keyword only, without enforcement. The code obfuscation is just not worth
it -- it's just rarely super-important to strictly enforce this convention.
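
A small sketch of what that looks like in practice.  The fetch function
and its parameters are hypothetical, chosen only to show the two
spellings side by side:

```python
# Python 3 only: the bare '*' makes 'timeout' and 'retries' keyword-only,
# so fetch("u", 5) raises TypeError.
def fetch(url, *, timeout=10, retries=3):
    return (url, timeout, retries)


# Portable variant: the '*' is removed and the rule lives in the docs.
def fetch_portable(url, timeout=10, retries=3):
    """Fetch url.

    By convention 'timeout' and 'retries' must be passed by keyword;
    this is documented but not enforced.
    """
    return (url, timeout, retries)
```

The portable variant accepts fetch_portable("u", 5) without complaint,
which is the loss of enforcement described above.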

--Guido van Rossum (

From solipsis at  Sun Jun 21 11:48:46 2015
From: solipsis at (Antoine Pitrou)
Date: Sun, 21 Jun 2015 11:48:46 +0200
Subject: [Python-ideas] solving multi-core Python
References: <>
Message-ID: <20150621114846.06bc8dc8@fsol>

On Sun, 21 Jun 2015 16:31:33 +1000
Nick Coghlan <ncoghlan at> wrote:
> For inter-interpreter communication, the worst case scenario is having
> to rely on a memcpy based message passing system (which would still be
> faster than multiprocessing's serialisation + IPC overhead)

And memcpy() updates pointer references to dependent objects magically?
Surely you meant the memdeepcopy() function that's part of every
standard C library!



From solipsis at  Sun Jun 21 11:54:43 2015
From: solipsis at (Antoine Pitrou)
Date: Sun, 21 Jun 2015 11:54:43 +0200
Subject: [Python-ideas] solving multi-core Python
References: <>
Message-ID: <20150621115443.70ddcf28@fsol>

On Sat, 20 Jun 2015 23:01:20 -0600
Eric Snow <ericsnowcurrently at> wrote:
> The only consequential shared piece is the
> GIL and my proposal should render the GIL irrelevant for the most part.

All singleton objects, built-in types are shared and probably a number
of other things hidden in dark closets... Not to mention the memory

By the way, what you're aiming to do is conceptually quite similar to
Trent's PyParallel (though Trent doesn't use subinterpreters, his main
work is around trying to make object sharing safe without any GIL to
trivially protect the sharing), so you may want to pair with him. Of
course, you may end up with a Windows-only Python interpreter :-)

I'm under the impression you're underestimating the task at hand here.
Or perhaps you're not and you're just willing to present it in a
positive way :-)



From ncoghlan at  Sun Jun 21 12:25:47 2015
From: ncoghlan at (Nick Coghlan)
Date: Sun, 21 Jun 2015 20:25:47 +1000
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <20150621114846.06bc8dc8@fsol>
References: <>
Message-ID: <>

On 21 June 2015 at 19:48, Antoine Pitrou <solipsis at> wrote:
> On Sun, 21 Jun 2015 16:31:33 +1000
> Nick Coghlan <ncoghlan at> wrote:
>> For inter-interpreter communication, the worst case scenario is having
>> to rely on a memcpy based message passing system (which would still be
>> faster than multiprocessing's serialisation + IPC overhead)
> And memcpy() updates pointer references to dependent objects magically?
> Surely you meant the memdeepcopy() function that's part of every
> standard C library!

We already have the tools to do deep copies of object trees (although
I'll concede I *was* actually thinking in terms of the classic C/C++
mistake of carelessly copying pointers around when I wrote that
particular message). One of the options for deep copies tends to be a
pickle/unpickle round trip, which will still incur the serialisation
overhead, but not the IPC overhead.
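
For concreteness, the round trip is a one-liner in today's Python.  The
helper name pickle_copy is made up for this sketch:

```python
import pickle


def pickle_copy(obj):
    """Deep-copy obj by serialising it and loading it back."""
    return pickle.loads(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


original = {"numbers": [1, 2, 3], "nested": {"key": ("a", "b")}}
clone = pickle_copy(original)
clone["numbers"].append(4)  # only the copy changes
```

Everything stays inside one process, so the only cost is the
serialisation itself; objects that cannot be pickled would still be
excluded.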

"Faster message passing than multiprocessing" sets the baseline pretty
low, after all.

However, this is also why Eric mentions the notions of object
ownership or limiting channels to less than the full complement of
Python objects. As an *added* feature at the Python level, it's
possible to initially enforce restrictions that don't exist in the C
level subinterpreter API, and then work to relax those restrictions
over time.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From stefan_ml at  Sun Jun 21 12:40:43 2015
From: stefan_ml at (Stefan Behnel)
Date: Sun, 21 Jun 2015 12:40:43 +0200
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <mm64be$eq5$>

Nick Coghlan schrieb am 21.06.2015 um 12:25:
> On 21 June 2015 at 19:48, Antoine Pitrou wrote:
>> On Sun, 21 Jun 2015 16:31:33 +1000 Nick Coghlan wrote:
>>> For inter-interpreter communication, the worst case scenario is having
>>> to rely on a memcpy based message passing system (which would still be
>>> faster than multiprocessing's serialisation + IPC overhead)
>> And memcpy() updates pointer references to dependent objects magically?
>> Surely you meant the memdeepcopy() function that's part of every
>> standard C library!
> We already have the tools to do deep copies of object trees (although
> I'll concede I *was* actually thinking in terms of the classic C/C++
> mistake of carelessly copying pointers around when I wrote that
> particular message). One of the options for deep copies tends to be a
> pickle/unpickle round trip, which will still incur the serialisation
> overhead, but not the IPC overhead.
> "Faster message passing than multiprocessing" sets the baseline pretty
> low, after all.
> However, this is also why Eric mentions the notions of object
> ownership or limiting channels to less than the full complement of
> Python objects. As an *added* feature at the Python level, it's
> possible to initially enforce restrictions that don't exist in the C
> level subinterpreter API, and then work to relax those restrictions
> over time.

If objects can make it explicit that they support sharing (and preferably
are allowed to implement the exact details themselves), I'm sure we'll find
ways to share NumPy arrays across subinterpreters. That feature alone tends
to be a quick way to make a lot of people happy.
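
One mechanism that already lets an object expose its memory explicitly is
the PEP 3118 buffer protocol, which NumPy arrays also implement. The sketch
below uses the stdlib array type as a stand-in for a NumPy array and only
shows zero-copy access within a single interpreter; how such a buffer would
be handed to another subinterpreter is not specified here and remains an
open design question.

```python
from array import array

# Stand-in for a NumPy array: a typed, contiguous block of doubles.
samples = array("d", [0.0, 1.0, 2.0, 3.0])

# memoryview exposes the underlying buffer without copying it.
view = memoryview(samples)

# Slicing a memoryview gives another view onto the same memory.
window = view[1:3]

# Writing through the view changes the original array in place.
window[0] = 10.0
```

After the assignment, samples[1] holds 10.0, because window and samples
refer to the same memory.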


From solipsis at  Sun Jun 21 12:41:05 2015
From: solipsis at (Antoine Pitrou)
Date: Sun, 21 Jun 2015 12:41:05 +0200
Subject: [Python-ideas] solving multi-core Python
References: <>
Message-ID: <20150621124105.52c194c9@fsol>

On Sun, 21 Jun 2015 20:25:47 +1000
Nick Coghlan <ncoghlan at> wrote:
> On 21 June 2015 at 19:48, Antoine Pitrou <solipsis at> wrote:
> > On Sun, 21 Jun 2015 16:31:33 +1000
> > Nick Coghlan <ncoghlan at> wrote:
> >>
> >> For inter-interpreter communication, the worst case scenario is having
> >> to rely on a memcpy based message passing system (which would still be
> >> faster than multiprocessing's serialisation + IPC overhead)
> >
> > And memcpy() updates pointer references to dependent objects magically?
> > Surely you meant the memdeepcopy() function that's part of every
> > standard C library!
> We already have the tools to do deep copies of object trees 
> "Faster message passing than multiprocessing" sets the baseline pretty
> low, after all.

What's the goal? 10% faster? Or 10x? copy.deepcopy() uses similar
internal mechanisms as pickle...



From stefan_ml at  Sun Jun 21 12:54:52 2015
From: stefan_ml at (Stefan Behnel)
Date: Sun, 21 Jun 2015 12:54:52 +0200
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <mm655v$ri7$>

Eric Snow schrieb am 20.06.2015 um 23:42:
> tl;dr Let's exploit multiple cores by fixing up subinterpreters,
> exposing them in Python, and adding a mechanism to safely share
> objects between them.
> [...]
> In some personal correspondence Nick Coghlan, he summarized my
> preferred approach as "the data storage separation of multiprocessing,
> with the low message passing overhead of threading".
> For Python 3.6:
> * expose subinterpreters to Python in a new stdlib module: "subinterpreters"
> * add a new SubinterpreterExecutor to concurrent.futures
> * add a queue.Queue-like type that will be used to explicitly share
> objects between subinterpreters
> [...]
> C Extension Modules
> =================
> Subinterpreters already isolate extension modules (and built-in
> modules, including sys).  PEP 384 provides some help too.  However,
> global state in C can easily leak data between subinterpreters,
> breaking the desired data isolation.  This is something that will need
> to be addressed as part of the effort.

I also had some discussions about these things with Nick before. Not sure
if you really meant PEP 384 (you might have) or rather PEP 489:

I consider that one more important here, as it will eventually allow Cython
modules to support subinterpreters. Unless, as you mentioned, they use
global C state, but only in external C code, e.g. wrapped libraries. Cython
should be able to handle most of the module internal global state on a
per-interpreter basis itself, without too much user code impact.

I'm totally +1 for the idea. I hope that I'll find the time (well, and
money) to work on PEP 489 in Cython soon, so that I can prove it right for
actual real-world code in Python 3.5. We'll then see about subinterpreter
support. That's certainly the next step.


From ncoghlan at  Sun Jun 21 12:57:42 2015
From: ncoghlan at (Nick Coghlan)
Date: Sun, 21 Jun 2015 20:57:42 +1000
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <20150621124105.52c194c9@fsol>
References: <>
Message-ID: <>

On 21 June 2015 at 20:41, Antoine Pitrou <solipsis at> wrote:
> On Sun, 21 Jun 2015 20:25:47 +1000
> Nick Coghlan <ncoghlan at> wrote:
>> On 21 June 2015 at 19:48, Antoine Pitrou <solipsis at> wrote:
>> > On Sun, 21 Jun 2015 16:31:33 +1000
>> > Nick Coghlan <ncoghlan at> wrote:
>> >>
>> >> For inter-interpreter communication, the worst case scenario is having
>> >> to rely on a memcpy based message passing system (which would still be
>> >> faster than multiprocessing's serialisation + IPC overhead)
>> >
>> > And memcpy() updates pointer references to dependent objects magically?
>> > Surely you meant the memdeepcopy() function that's part of every
>> > standard C library!
>> We already have the tools to do deep copies of object trees
> [...]
>> "Faster message passing than multiprocessing" sets the baseline pretty
>> low, after all.
> What's the goal? 10% faster? Or 10x? copy.deepcopy() uses similar
> internal mechanisms as pickle...

I'd want us to eventually aim for zero-copy speed for at least known
immutable values (int, str, float, etc), immutable containers of
immutable values (tuple, frozenset), and for types that support both
publishing and consuming data via the PEP 3118 buffer protocol without
making a copy.

For everything else I'd be fine with a starting point that was at
least no slower than multiprocessing (which shouldn't be difficult,
since we'll at least save the IPC overhead even if there are cases
where communication between subinterpreters falls back to
serialisation rather than doing something more CPU and memory
efficient).

As an implementation strategy, I'd actually suggest starting with
*only* the latter for simplicity's sake, even though it misses out on
some of the potential speed benefits of sharing an address space.
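
As a rough illustration of the "known immutable values" and "immutable
containers of immutable values" categories above, the following sketch
checks whether an object falls into them. The function name and the exact
type lists are assumptions made for this example, not part of any proposed
API, and the check says nothing about refcounts or C-level mutation.

```python
# Types treated as immutable leaves (illustrative list, not exhaustive).
_ATOMIC_TYPES = (int, float, complex, str, bytes, bool, type(None))
# Immutable containers whose contents must be checked recursively.
_CONTAINER_TYPES = (tuple, frozenset)


def is_deeply_immutable(obj):
    """Return True if obj is an immutable value all the way down.

    Exact type checks are used on purpose: a subclass of int or tuple
    could carry mutable attributes, so it is rejected.
    """
    if type(obj) in _ATOMIC_TYPES:
        return True
    if type(obj) in _CONTAINER_TYPES:
        return all(is_deeply_immutable(item) for item in obj)
    return False
```

For example, (1, "a", (2.0, frozenset({3}))) passes, while (1, [2]) is
rejected because of the list inside the tuple.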


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From jean-charles.douet at  Sun Jun 21 13:03:37 2015
From: jean-charles.douet at (jean-charles.douet at
Date: Sun, 21 Jun 2015 13:03:37 +0200 (CEST)
Subject: [Python-ideas] Python-ideas Digest, Vol 103, Issue 100
In-Reply-To: <>
References: <>
Message-ID: <>


My ten cents may be neither very clear nor well argued, but I wanted to let you know that there already seem to be interesting Pythonic studies on how to remove the so-called GIL and, above all, for which pragmatic use cases.

Reading the following article, I discovered that Python itself seems to be not so much a language as a language specification:
Thus PyPy can be seen as a more general approach than CPython, which looks like one particular case of Python, even if it is the most frequently used (?)

Now, there is a specific PyPy study aimed at removing the GIL, called "Software Transactional Memory". Here it is:

Hope it helps.
Best regards,

----- Original Message -----

From: python-ideas-request at
To: python-ideas at
Sent: Sunday, 21 June 2015 08:41:24
Subject: Python-ideas Digest, Vol 103, Issue 100

Today's Topics: 

1. Re: solving multi-core Python (Nathaniel Smith) 
2. Re: solving multi-core Python (Nick Coghlan) 
3. Re: solving multi-core Python (Wes Turner) 


Message: 1 
Date: Sat, 20 Jun 2015 22:25:07 -0700 
From: Nathaniel Smith <njs at> 
To: Eric Snow <ericsnowcurrently at> 
Cc: python-ideas <python-ideas at> 
Subject: Re: [Python-ideas] solving multi-core Python 
<CAPJVwBkjoK31m2-ynrGF_AmYFL0ULL3LdX6r7+d+B7RienQh7A at> 
Content-Type: text/plain; charset="utf-8" 

On Jun 20, 2015 3:54 PM, "Eric Snow" <ericsnowcurrently at> wrote: 
> On Jun 20, 2015 4:08 PM, "Nathaniel Smith" <njs at> wrote: 
> > 
> > On Jun 20, 2015 2:42 PM, "Eric Snow" <ericsnowcurrently at> 
> > > 
> > > tl;dr Let's exploit multiple cores by fixing up subinterpreters, 
> > > exposing them in Python, and adding a mechanism to safely share 
> > > objects between them. 
> > 
> > This all sounds really cool if you can pull it off, and shared-nothing 
threads do seem like the least impossible model to pull off. 
> Agreed. 
> > But "least impossible" and "possible" are different :-). From your 
email I can't tell whether this plan is viable while preserving backcompat 
and memory safety. 
> I agree that those issues must be clearly solved in the proposal before 
it can be approved. I'm confident the approach I'm pursuing will afford us 
the necessary guarantees. I'll address those specific points directly when 
I can sit down and organize my thoughts. 

I'd love to see just a hand wavy, verbal proof-of-concept walking through 
how this might work in some simple but realistic case. To me a single 
compelling example could make this proposal feel much more concrete and
achievable.

> > Suppose I have a queue between two subinterpreters, and on this queue I 
place a list of dicts of user-defined-in-python objects, each of which 
holds a reference to a user-defined-via-the-C-api object. What happens next? 
> You've hit upon exactly the trickiness involved and why I'm thinking the 
best approach initially is to only allow *strictly* immutable objects to 
pass between interpreters. Admittedly, my description of channels is very 
vague.:) There are a number of possibilities with them that I'm still 
exploring (CSP has particular opinions...), but immutability is a 
characteristic that may provide the simplest *initial* approach. Going 
that route shouldn't preclude adding some sort of support for mutable 
objects later. 

There aren't really many options for mutable objects, right? If you want 
shared nothing semantics, then transmitting a mutable object either needs 
to make a copy, or else be a real transfer, where the sender no longer has 
it (cf. Rust). 

I guess for the latter you'd need some new syntax for send-and-del, that 
requires the object to be self contained (all mutable objects reachable 
from it are only referenced by each other) and have only one reference in 
the sending process (which is the one being sent and then destroyed). 

> Keep in mind that by "immutability" I'm talking about *really* immutable, 
perhaps going so far as treating the full memory space associated with an 
object as frozen. For instance, we'd have to ensure that "immutable" 
Python objects like strings, ints, and tuples do not change (i.e. via the C
API).

This seems like a red herring to me. It's already the case that you can't 
legally use the c api to mutate tuples, ints, for any object that's ever 
been, say, passed to a function. So for these objects, the subinterpreter 
setup doesn't actually add any new constraints on user code. 

C code is always going to be *able* to break memory safety so long as 
you're using shared-memory threading at the c level to implement this 
stuff. We just need to make it easy not to. 

Refcnts and garbage collection are another matter, of course. 



Message: 2 
Date: Sun, 21 Jun 2015 16:31:33 +1000 
From: Nick Coghlan <ncoghlan at> 
To: Nathaniel Smith <njs at> 
Cc: Eric Snow <ericsnowcurrently at>, python-ideas 
<python-ideas at> 
Subject: Re: [Python-ideas] solving multi-core Python 
<CADiSq7cv538UBK9BE3e8eAakFB=njwHB-qnMu3m=qzLADzpsOg at> 
Content-Type: text/plain; charset=UTF-8 

On 21 June 2015 at 15:25, Nathaniel Smith <njs at> wrote: 
> On Jun 20, 2015 3:54 PM, "Eric Snow" <ericsnowcurrently at> wrote: 
>> On Jun 20, 2015 4:08 PM, "Nathaniel Smith" <njs at> wrote: 
>> > 
>> > On Jun 20, 2015 2:42 PM, "Eric Snow" <ericsnowcurrently at> 
>> > wrote: 
>> > > 
>> > > tl;dr Let's exploit multiple cores by fixing up subinterpreters, 
>> > > exposing them in Python, and adding a mechanism to safely share 
>> > > objects between them. 
>> > 
>> > This all sounds really cool if you can pull it off, and shared-nothing 
>> > threads do seem like the least impossible model to pull off. 
>> Agreed. 
>> > But "least impossible" and "possible" are different :-). From your email 
>> > I can't tell whether this plan is viable while preserving backcompat and 
>> > memory safety. 
>> I agree that those issues must be clearly solved in the proposal before it 
>> can be approved. I'm confident the approach I'm pursuing will afford us the 
>> necessary guarantees. I'll address those specific points directly when I 
>> can sit down and organize my thoughts. 
> I'd love to see just a hand wavy, verbal proof-of-concept walking through 
> how this might work in some simple but realistic case. To me a single 
> compelling example could make this proposal feel much more concrete and 
> achievable. 

I was one of the folks pushing Eric in this direction, and that's 
because it's a possibility that was conceived of a few years back, but 
never tried due to lack of time (and inclination for those of us that 
are using Python primarily as an orchestration tool and hence spend 
most of our time on IO bound problems rather than CPU bound ones): 

As mentioned there, I've at least spent some time with Graham 
Dumpleton over the past few years figuring out (and occasionally 
trying to address) some of the limitations of mod_wsgi's existing 
subinterpreter based WSGI app separation: 

The fact that mod_wsgi can run most Python web applications in a 
subinterpreter quite happily means we already know the core mechanism 
works fine, and there don't appear to be any insurmountable technical 
hurdles between the status quo and getting to a point where we can 
either switch the GIL to a read/write lock where a write lock is only 
needed for inter-interpreter communications, or else find a way for 
subinterpreters to release the GIL entirely by restricting them 

For inter-interpreter communication, the worst case scenario is having 
to rely on a memcpy based message passing system (which would still be 
faster than multiprocessing's serialisation + IPC overhead), but there 
don't appear to be any insurmountable barriers to setting up an object 
ownership based system instead (code that accesses PyObject_HEAD 
fields directly rather than through the relevant macros and functions 
seems to be the most likely culprit for breaking, but I think "don't 
do that" is a reasonable answer there). 

There's plenty of prior art here (including a system I once wrote in C 
myself atop TI's DSP/BIOS MBX and TSK APIs), so I'm comfortable with 
Eric's "simple matter of engineering" characterisation of the problem 

The main reason that subinterpreters have never had a Python API 
before is that they have enough rough edges that having to write a 
custom C extension module to access the API is the least of your 
problems if you decide you need them. At the same time, not having a 
Python API not only makes them much harder to test, which means 
various aspects of their operation are more likely to be broken, but 
also makes them inherently CPython specific. 

Eric's proposal essentially amounts to three things: 

1. Filing off enough of the rough edges of the subinterpreter support 
that we're comfortable giving them a public Python level API that 
other interpreter implementations can reasonably support 
2. Providing the primitives needed for safe and efficient message 
passing between subinterpreters 
3. Allowing subinterpreters to truly execute in parallel on multicore machines 

All 3 of those are useful enhancements in their own right, which 
offers the prospect of being able to make incremental progress towards 
the ultimate goal of native Python level support for distributing 
across multiple cores within a single process. 


Nick Coghlan | ncoghlan at | Brisbane, Australia 


Message: 3 
Date: Sun, 21 Jun 2015 01:41:21 -0500 
From: Wes Turner <wes.turner at> 
To: Eric Snow <ericsnowcurrently at> 
Cc: Nathaniel Smith <njs at>, Python-Ideas 
<python-ideas at> 
Subject: Re: [Python-ideas] solving multi-core Python 
<CACfEFw_1JVpUmZwFVkye-fbEsfH_NXVSo6WDMi1azhnXdY6PcA at> 
Content-Type: text/plain; charset="utf-8" 


* other approaches to the problem (with great APIs): 
On Jun 20, 2015 5:55 PM, "Eric Snow" <ericsnowcurrently at> wrote: 

> On Jun 20, 2015 4:08 PM, "Nathaniel Smith" <njs at> wrote: 
> > 
> > On Jun 20, 2015 2:42 PM, "Eric Snow" <ericsnowcurrently at> 
> wrote: 
> > > 
> > > tl;dr Let's exploit multiple cores by fixing up subinterpreters, 
> > > exposing them in Python, and adding a mechanism to safely share 
> > > objects between them. 
> > 
> > This all sounds really cool if you can pull it off, and shared-nothing 
> threads do seem like the least impossible model to pull off. 
> Agreed. 
> > But "least impossible" and "possible" are different :-). From your email 
> I can't tell whether this plan is viable while preserving backcompat and 
> memory safety. 
> I agree that those issues must be clearly solved in the proposal before it 
> can be approved. I'm confident the approach I'm pursuing will afford us 
> the necessary guarantees. I'll address those specific points directly when 
> I can sit down and organize my thoughts. 
> > 
> > Suppose I have a queue between two subinterpreters, and on this queue I 
> place a list of dicts of user-defined-in-python objects, each of which 
> holds a reference to a user-defined-via-the-C-api object. What happens next? 
> You've hit upon exactly the trickiness involved and why I'm thinking the 
> best approach initially is to only allow *strictly* immutable objects to 
> pass between interpreters. Admittedly, my description of channels is very 
> vague.:) There are a number of possibilities with them that I'm still 
> exploring (CSP has particular opinions...), but immutability is a 
> characteristic that may provide the simplest *initial* approach. Going 
> that route shouldn't preclude adding some sort of support for mutable 
> objects later. 
> Keep in mind that by "immutability" I'm talking about *really* immutable, 
> perhaps going so far as treating the full memory space associated with an 
> object as frozen. For instance, we'd have to ensure that "immutable" 
> Python objects like strings, ints, and tuples do not change (i.e. via the C 
> API). The contents of involved tuples/containers would have to be likewise 
> immutable. Even changing refcounts could be too much, hence the idea of 
> moving refcounts out to a separate table. 
> This level of immutability would be something new to Python. We'll see if 
> it's necessary. If it isn't too much work it might be a good idea 
> regardless of the multi-core proposal. 
> Also note that Barry has a (rejected) PEP from a number of years ago about 
> freezing objects... That idea is likely out of scope as relates to my 
> proposal, but it certainly factors in the problem space. 
> -eric 



From sturla.molden at  Sun Jun 21 13:41:30 2015
From: sturla.molden at (Sturla Molden)
Date: Sun, 21 Jun 2015 13:41:30 +0200
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <mm67sk$1nv$>

On 20/06/15 23:42, Eric Snow wrote:
> tl;dr Let's exploit multiple cores by fixing up subinterpreters,
> exposing them in Python, and adding a mechanism to safely share
> objects between them.
> This proposal is meant to be a shot over the bow, so to speak.  I plan
> on putting together a more complete PEP some time in the future, with
> content that is more refined along with references to the appropriate
> online resources.
> Feedback appreciated!  Offers to help even more so! :)

From the perspective of software design, it would be good if the
CPython interpreter provided an environment instead of using global 
objects. It would mean that all functions in the C API would need to 
take the environment pointer as their first variable, which will be a 
major rewrite. It would also allow the "one interpreter per thread" 
design similar to tcl and .NET application domains.

However, from the perspective of multi-core parallel computing, I am not 
sure what this offers over using multiple processes.

Yes, you avoid the process startup time, but on POSIX systems a fork is 
very fast. And certainly, forking is much more efficient than serializing
Python objects. It then boils down to a workaround for the fact that 
Windows cannot fork, which makes it particularly bad for running 
CPython. You also have to start up a subinterpreter and a thread, which 
is not instantaneous. So I am not sure there is a lot to gain here over 
calling os.fork.

An invalid argument for this kind of design is that only code which
uses threads for parallel computing is "real" multi-core code. So Python 
does not support multi-cores because multiprocessing or os.fork is just 
faking it. This is an argument that belongs in the intellectual junk 
yard. It stems from the abuse of threads among Windows and Java 
developers, and is rooted in the absence of fork on Windows and the 
formerly slow fork on Solaris. And thus they are only able to think in 
terms of threads. If threading.Thread does not scale the way they want, 
they think multicores are out of reach.

So the question is, how do you want to share objects between 
subinterpreters? And why is it better than IPC, when your idea is to 
isolate subinterpreters like application domains?

If you think avoiding IPC is clever, you are wrong. IPC is very fast; in
fact, programs written to use MPI tend to perform and scale better than
programs written to use OpenMP in parallel computing. Not only is IPC
fast, but you also avoid an issue called "false sharing", which can be
even more detrimental than the GIL: You have parallel code, but it seems
to run in serial, even though there is no explicit serialization
anywhere. And since Murphy's law is working against us, Python
reference counts will be false shared unless we use multiple processes.

The reason IPC in multiprocessing is slow is due to calling pickle, it
is not the IPC in itself. A pipe or a Unix socket (named pipe on
Windows) has the overhead of a memcpy in the kernel, which is equal to
a memcpy plus some tiny constant overhead. And if you need two processes
to share memory, there is something called shared memory. Thus, we can
send data between processes just as fast as between subinterpreters.

All in all, I think we are better off finding a better way to share 
Python objects between processes.
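
For readers who have not used it, shared memory between forked processes
can be sketched with the stdlib alone. The example below assumes a POSIX
system (os.fork is unavailable on Windows); the function name sum_in_child
is invented for illustration. The child writes a fixed-size integer into an
anonymous shared mapping, so nothing is pickled and nothing goes through a
pipe.

```python
import mmap
import os
import struct


def sum_in_child(values):
    """Fork a child that stores sum(values) in memory shared with the parent."""
    # Anonymous mapping; on Unix it is MAP_SHARED by default, so it stays
    # shared between parent and child after fork().
    shared = mmap.mmap(-1, 8)
    pid = os.fork()
    if pid == 0:  # child process
        status = 1
        try:
            shared[:8] = struct.pack("q", sum(values))
            status = 0
        finally:
            os._exit(status)  # never return into the parent's code path
    # Parent: wait for the child, then read the result straight from memory.
    _, wait_status = os.waitpid(pid, 0)
    try:
        if wait_status != 0:
            raise RuntimeError("child process failed")
        return struct.unpack("q", shared[:8])[0]
    finally:
        shared.close()
```

This only moves a raw 8-byte integer. Sharing arbitrary Python objects this
way is the hard part, since their pointers and reference counts live in
each process's own heap.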

P.S. Another thing to note is that with sub-interpreters, you can forget 
about using ctypes or anything else that uses the simplified GIL API 
(e.g. certain Cython generated extensions).


From solipsis at  Sun Jun 21 13:52:36 2015
From: solipsis at (Antoine Pitrou)
Date: Sun, 21 Jun 2015 13:52:36 +0200
Subject: [Python-ideas] solving multi-core Python
References: <>
Message-ID: <20150621135236.6c31b605@fsol>

On Sun, 21 Jun 2015 13:41:30 +0200
Sturla Molden <sturla.molden at> wrote:
> From the perspective of software design, it would be good if the
> CPython interpreter provided an environment instead of using global 
> objects. It would mean that all functions in the C API would need to 
> take the environment pointer as their first variable, which will be a 
> major rewrite. It would also allow the "one interpreter per thread" 
> design similar to tcl and .NET application domains.

From the point of view of API compatibility, it's unfortunately a no-no.

> The reason IPC in multiprocessing is slow is due to calling pickle, it 
> is not the IPC in itself.

No need to be pedantic :-) The "C" means communication, and pickling
objects is part of the communication between Python processes.

> All in all, I think we are better off finding a better way to share 
> Python objects between processes.

Sure. This is however a complex and experimental topic (how to share a
graph of garbage-collected objects between independent processes), with
no guarantees of showing any results at the end.

> P.S. Another thing to note is that with sub-interpreters, you can forget 
> about using ctypes or anything else that uses the simplified GIL API 
> (e.g. certain Cython generated extensions).

Indeed, the PyGILState API is still not subinterpreter-compatible.
There's a proposal on the tracker, IIRC, but the interested parties
never made any progress on it.



From jeanpierreda at  Sun Jun 21 13:55:54 2015
From: jeanpierreda at (Devin Jeanpierre)
Date: Sun, 21 Jun 2015 04:55:54 -0700
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Jun 20, 2015 at 4:16 PM, Eric Snow <ericsnowcurrently at> wrote:
> On Jun 20, 2015 4:55 PM, "Devin Jeanpierre" <jeanpierreda at> wrote:
>> It's worthwhile to consider fork as an alternative.  IMO we'd get a
>> lot out of making forking safer, easier, and more efficient. (e.g.
>> respectively: adding an atfork registration mechanism; separating out
>> the bits of multiprocessing that use pickle from those that don't;
>> moving the refcount to a separate page, or allowing it to be frozen
>> prior to a fork.)
> So leverage a common base of code with the multiprocessing module?

What is this question in response to? I don't understand.

> I would expect subinterpreters to use less memory.  Furthermore creating
> them would be significantly faster.  Passing objects between them would be
> much more efficient.  And, yes, cross-platform.

Maybe I don't understand how subinterpreters work. AIUI, the whole
point of independent subinterpreters is that they share no state. So
if I have a web server, each independent serving thread has to do all
of the initialization (import HTTP libraries, etc.), right? Compare
with forking, where the initialization is all done and then you fork,
and you are immediately ready to serve, using the data structures
shared with all the other workers, which is only copied when it is
written to. So forking starts up faster and uses less memory (due to
shared memory.)
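
A minimal sketch of the fork-after-initialization pattern described above,
assuming a POSIX system. The SQUARES table stands in for expensive setup
such as importing HTTP libraries, and lookup_in_child is a made-up name.
The child inherits the already-built table and sends back a raw 8-byte
result over a pipe, without pickling.

```python
import os
import struct

# "Expensive" initialization, done once in the parent before any fork.
SQUARES = {n: n * n for n in range(10_000)}


def lookup_in_child(key):
    """Fork a worker that answers from state inherited from the parent."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: SQUARES is already there, nothing to re-import
        status = 1
        try:
            os.close(read_fd)
            os.write(write_fd, struct.pack("q", SQUARES[key]))
            status = 0
        finally:
            os._exit(status)
    # Parent: close the unused write end, read the fixed-size reply.
    os.close(write_fd)
    try:
        payload = os.read(read_fd, 8)
    finally:
        os.close(read_fd)
    os.waitpid(pid, 0)
    return struct.unpack("q", payload)[0]
```

Note that in CPython even reading SQUARES in the child updates reference
counts, so some of the inherited pages do get copied; that is the cost the
"refcount on a separate page" idea earlier in this message is aimed at.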

Re passing objects, see below.

I do agree it's cross-platform, but right now that's the only thing I
agree with.

>> Note: I don't count the IPC cost of forking, because at least on
>> linux, any way to efficiently share objects between independent
>> interpreters in separate threads can also be ported to independent
>> interpreters in forked subprocesses,
> How so?  Subinterpreters are in the same process.  For this proposal each
> would be on its own thread.  Sharing objects between them through channels
> would be more efficient than IPC.  Perhaps I've missed something?

You might be missing that memory can be shared between processes, not
just threads, but I don't know.

The reason passing objects between processes is so slow is currently
*nearly entirely* the cost of serialization. That is, it's the fact
that you are passing an object to an entirely separate interpreter,
and need to serialize the whole object graph and so on. If you can
make that fast without serialization, for shared memory threads, then
all the serialization becomes unnecessary, and you can either write to
a pipe (fast, if it's a non-container), or use shared memory from the
beginning (instantaneous). This is possible on any POSIX OS. Linux
lets you go even further.

-- Devin

From jeanpierreda at  Sun Jun 21 14:13:40 2015
From: jeanpierreda at (Devin Jeanpierre)
Date: Sun, 21 Jun 2015 05:13:40 -0700
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Jun 20, 2015 at 11:31 PM, Nick Coghlan <ncoghlan at> wrote:
> For inter-interpreter communication, the worst case scenario is having
> to rely on a memcpy based message passing system (which would still be
> faster than multiprocessing's serialisation + IPC overhead), but there
> don't appear to be any insurmountable barriers to setting up an object
> ownership based system instead (code that accesses PyObject_HEAD
> fields directly rather than through the relevant macros and functions
> seems to be the most likely culprit for breaking, but I think "don't
> do that" is a reasonable answer there).

The comparison is unfair -- if you can share between subinterpreters
using memcpy, then you can share between processes using just a socket
write, and multiprocessing becomes nearly just as fast.

> Eric's proposal essentially amounts to three things:
> 1. Filing off enough of the rough edges of the subinterpreter support
> that we're comfortable giving them a public Python level API that
> other interpreter implementations can reasonably support
> 2. Providing the primitives needed for safe and efficient message
> passing between subinterpreters
> 3. Allowing subinterpreters to truly execute in parallel on multicore machines
> All 3 of those are useful enhancements in their own right, which
> offers the prospect of being able to make incremental progress towards
> the ultimate goal of native Python level support for distributing
> across multiple cores within a single process.

Why is that the goal? Whatever faults processes have, those are the
problems, surely not processes in and of themselves, right?

e.g. if the reason we don't like multiprocessed python is extra memory
use, it's memory use we're opposed to. A solution that gives us
parallel threads, but doesn't decrease memory consumption, doesn't
solve anything. The solution has threads that are remarkably like
processes, so I think it's really important to be careful about the
differences and why this solution has the advantage. I'm not seeing

And remember that we *do* have many examples of people using
parallelized Python code in production. Are you sure you're satisfying
their concerns, or whose concerns are you trying to satisfy?

-- Devin

From ncoghlan at  Sun Jun 21 14:55:42 2015
From: ncoghlan at (Nick Coghlan)
Date: Sun, 21 Jun 2015 22:55:42 +1000
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On 21 June 2015 at 07:42, Eric Snow <ericsnowcurrently at> wrote:
> tl;dr Let's exploit multiple cores by fixing up subinterpreters,
> exposing them in Python, and adding a mechanism to safely share
> objects between them.
> This proposal is meant to be a shot over the bow, so to speak.  I plan
> on putting together a more complete PEP some time in the future, with
> content that is more refined along with references to the appropriate
> online resources.
> Feedback appreciated!  Offers to help even more so! :)

For folks interested in more of the background and design trade-offs
involved here, with Eric's initial post published, I've now extracted
and updated my old answer about the GIL from the Python 3 Q & A page,
and turned it into its own article:


P.S. The entry for the old Q&A answer is still there, but now
redirects to the new article:

Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From stefan_ml at  Sun Jun 21 15:06:57 2015
From: stefan_ml at (Stefan Behnel)
Date: Sun, 21 Jun 2015 15:06:57 +0200
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <mm6ctj$790$>

Nick Coghlan schrieb am 21.06.2015 um 03:28:
> * there may be restrictions on some extension modules that limit them
> to "main interpreter only" (e.g. if the extension module itself isn't
> thread-safe, then it will need to remain fully protected by the GIL)

Just an idea, but C extensions could opt-in to this. Calling into them has
to go through some kind of callable type, usually PyCFunction. We could
protect all calls to extension types and C functions with a global runtime
lock (per process, not per interpreter) and Extensions could set a flag on
their functions and methods (or get it inherited from their extension types
etc.) that says "I don't need the lock". That allows for a very
fine-grained transition.


From ncoghlan at  Sun Jun 21 15:09:35 2015
From: ncoghlan at (Nick Coghlan)
Date: Sun, 21 Jun 2015 23:09:35 +1000
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <mm67sk$1nv$>
References: <>
Message-ID: <>

On 21 June 2015 at 21:41, Sturla Molden <sturla.molden at> wrote:
> On 20/06/15 23:42, Eric Snow wrote:
>> tl;dr Let's exploit multiple cores by fixing up subinterpreters,
>> exposing them in Python, and adding a mechanism to safely share
>> objects between them.
>> This proposal is meant to be a shot over the bow, so to speak.  I plan
>> on putting together a more complete PEP some time in the future, with
>> content that is more refined along with references to the appropriate
>> online resources.
>> Feedback appreciated!  Offers to help even more so! :)
> From the perspective of software design, it would be good if the CPython
> interpreter provided an environment instead of using global objects. It
> would mean that all functions in the C API would need to take the
> environment pointer as their first variable, which will be a major rewrite.
> It would also allow the "one interpreter per thread" design similar to tcl
> and .NET application domains.
> However, from the perspective of multi-core parallel computing, I am not
> sure what this offers over using multiple processes.
> Yes, you avoid the process startup time, but on POSIX systems a fork is very
> fast. And certainly, forking is much more efficient than serializing Python
> objects. It then boils down to a workaround for the fact that Windows cannot
> fork, which makes it particularly bad for running CPython. You also have to
> start up a subinterpreter and a thread, which is not instantaneous. So I am
> not sure there is a lot to gain here over calling os.fork.

Please give Eric and me the courtesy of assuming we know how CPython
works. This article, which is an update of a Python 3 Q&A answer I
wrote some time ago, goes into more detail on the background of this
proposed investigation:

> A non-valid argument for this kind of design is that only code which uses
> threads for parallel computing is "real" multi-core code. So Python does not
> support multi-cores because multiprocessing or os.fork is just faking it.
> This is an argument that belongs in the intellectual junk yard.
> It stems
> from the abuse of threads among Windows and Java developers, and is rooted
> in the absence of fork on Windows and the formerly slow fork on Solaris. And
> thus they are only able to think in terms of threads. If threading.Thread
> does not scale the way they want, they think multicores are out of reach.

Sturla, expressing out and out contempt for entire communities of
capable, competent developers (both the creators of Windows and Java,
and the users of those platforms) has no place on the core Python
mailing lists. Please refrain from casually insulting entire groups of
people merely because you don't approve of their technical choices.

> The reason IPC in multiprocessing is slow is due to calling pickle, it is
> not the IPC in itself. A pipe or a Unix socket (named pipe on Windows) has
> the overhead of a memcpy in the kernel, which is equal to a memcpy plus some
> tiny constant overhead. And if you need two processes to share memory, there
> is something called shared memory. Thus, we can send data between processes
> just as fast as between subinterpreters.

Avoiding object serialisation is indeed the main objective. With
subinterpreters, we have a lot more options for that than we do with
any form of IPC, including shared references to immutable objects, and
the PEP 3118 buffer API.
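
As a small illustration of the difference between serialising and sharing a buffer, here is a sketch that runs inside a single interpreter. How subinterpreters would actually expose buffers is not settled in this thread, so treat this only as a demonstration of the PEP 3118 idea.

```python
import pickle

data = bytearray(b"hello world")

# Serialising and deserialising produces an independent copy.
copied = pickle.loads(pickle.dumps(data))
copied[0:5] = b"HELLO"

# A memoryview (PEP 3118 buffer protocol) exposes the same memory, no copy.
view = memoryview(data)
view[6:11] = b"WORLD"

print(bytes(data))    # the original sees writes made through the view
print(bytes(copied))  # the pickled copy is unaffected by them
```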

> All in all, I think we are better off finding a better way to share Python
> objects between processes.

This is not an either/or question, as other folks remain free to work
on improving multiprocessing's IPC efficiency if they want to. We
don't seem to have folks clamouring at the door to work on that,

> P.S. Another thing to note is that with sub-interpreters, you can forget
> about using ctypes or anything else that uses the simplified GIL API (e.g.
> certain Cython generated extensions).

Those aren't fundamental conceptual limitations, they're incidental
limitations of the current design and implementation of the simplified
GIL state API. One of the benefits of introducing a Python level API
for subinterpreters is that it makes it easier to start testing, and
hence fixing, some of those limitations (I actually just suggested to
Eric off list that adding subinterpreter controls to _testcapi might
be a good place to start, as that's beneficial regardless of what, if
anything, ends up happening from a public API perspective)


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From solipsis at  Sun Jun 21 15:21:12 2015
From: solipsis at (Antoine Pitrou)
Date: Sun, 21 Jun 2015 15:21:12 +0200
Subject: [Python-ideas] solving multi-core Python
References: <>
 <mm67sk$1nv$> <20150621135236.6c31b605@fsol>
Message-ID: <20150621152112.16fedf02@fsol>

On Sun, 21 Jun 2015 13:52:36 +0200
Antoine Pitrou <solipsis at> wrote:
> > P.S. Another thing to note is that with sub-interpreters, you can forget 
> > about using ctypes or anything else that uses the simplified GIL API 
> > (e.g. certain Cython generated extensions).
> Indeed, the PyGILState API is still not subinterpreter-compatible.
> There's a proposal on the tracker, IIRC, but the interested parties
> never made any progress on it.

For reference:



From rosuav at  Sun Jun 21 16:12:24 2015
From: rosuav at (Chris Angelico)
Date: Mon, 22 Jun 2015 00:12:24 +1000
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <mm67sk$1nv$>
References: <>
Message-ID: <>

On Sun, Jun 21, 2015 at 9:41 PM, Sturla Molden <sturla.molden at> wrote:
> However, from the perspective of multi-core parallel computing, I am not
> sure what this offers over using multiple processes.
> Yes, you avoid the process startup time, but on POSIX systems a fork is very
> fast. And certainly, forking is much more efficient than serializing Python
> objects. It then boils down to a workaround for the fact that Windows cannot
> fork, which makes it particularly bad for running CPython. You also have to
> start up a subinterpreter and a thread, which is not instantaneous. So I am
> not sure there is a lot to gain here over calling os.fork.

That's all very well for sending stuff *to* a subprocess. If you fork
for a single job, do the job, and have the subprocess send the result
directly back to the origin (eg over its socket), then terminate, then
sure, you don't need a lot of IPC. But for models where there's
ongoing work, maybe interacting with other subinterpreters
periodically, there could be a lot of benefit. It's very easy to slip
into a CGI style of mentality where requests are entirely fungible and
independent, and all you're doing is parallelization, but not
everything fits into that model :) I run a MUD server, for instance,
where currently every connection gets its own thread; if I wanted to
make use of multiple CPU cores, I would not want to have the
connections handled by separate processes, because they are constantly
interacting with each other, so IPC would get expensive.


From sturla.molden at  Sun Jun 21 17:14:54 2015
From: sturla.molden at (Sturla Molden)
Date: Sun, 21 Jun 2015 15:14:54 +0000 (UTC)
Subject: [Python-ideas] solving multi-core Python
References: <>
Message-ID: <>

Devin Jeanpierre <jeanpierreda at>

> The comparison is unfair -- if you can share between subinterpreters
> using memcpy, then you can share between processes using just a socket
> write, and multiprocessing becomes nearly just as fast.

That is the main issue here. Writing to a pipe or a Unix socket is
implemented with a memcpy in the kernel. So there is just a tiny constant
overhead compared to using memcpy within a process. And with shared memory
as IPC even this tiny overhead can be removed. 

The main overhead in communicating Python objects in multiprocessing is the
serialization with pickle. So there is basically nothing to gain unless
this part can be omitted.

There is an erroneous belief among Windows programmers that "IPC is slow".
But that is because they are using out-proc DCOM server, CORBA, XMLRPC or
something equally atrocious. A plain named pipe transaction is not in any
way slow on Windows.


From sturla.molden at  Sun Jun 21 17:45:05 2015
From: sturla.molden at (Sturla Molden)
Date: Sun, 21 Jun 2015 15:45:05 +0000 (UTC)
Subject: [Python-ideas] solving multi-core Python
References: <>
Message-ID: <>

Nick Coghlan <ncoghlan at> wrote:

> Sturla, expressing out and out contempt for entire communities of
> capable, competent developers (both the creators of Windows and Java,
> and the users of those platforms) has no place on the core Python
> mailing lists. Please refrain from casually insulting entire groups of
> people merely because you don't approve of their technical choices.

I am not sure what you mean. Using threads on Windows and Java comes from a
necessity, not because developers are incompetent. Windows does not provide
a fork and processes are heavy-weight, hence multi-threading is the obvious

> Avoiding object serialisation is indeed the main objective. 


> With
> subinterpreters, we have a lot more options for that than we do with
> any form of IPC, including shared references to immutable objects, and
> the PEP 3118 buffer API.

Perhaps. One could do this with shared memory as well, but a complicating
factor is that the base address must be the same (or corrected for). But
one could probably do low-level magic with memory mapping to work around
this. Particularly on 64-bit it is not really difficult to make sure a page
is mapped to the same address in two processes.

It is certainly easier to achieve within a process. But if the plan for
Erlang-style "share nothing" threads is to pickle and memcpy objects, there
is little or nothing to gain over using multiprocessing.
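
For what it's worth, sharing raw bytes between processes without pickle can be sketched with the multiprocessing.shared_memory module (Python 3.8 and later, so not available when this was written). Both handles live in one process here to keep the example short. It only moves bytes; it does nothing about the base-address problem for real Python objects.

```python
from multiprocessing import shared_memory

# Create a named block of shared memory and write raw bytes into it.
owner = shared_memory.SharedMemory(create=True, size=16)
owner.buf[:5] = b"hello"

# A second handle attaches by name; a different process could do the same.
peer = shared_memory.SharedMemory(name=owner.name)
received = bytes(peer.buf[:5])  # no pickling involved, just a byte copy

peer.close()
owner.close()
owner.unlink()  # release the block once every user is done with it
```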


From sturla.molden at  Sun Jun 21 18:13:01 2015
From: sturla.molden at (Sturla Molden)
Date: Sun, 21 Jun 2015 16:13:01 +0000 (UTC)
Subject: [Python-ideas] solving multi-core Python
References: <>
 <mm67sk$1nv$> <20150621135236.6c31b605@fsol>
Message-ID: <>

Antoine Pitrou <solipsis at> wrote:

>> The reason IPC in multiprocessing is slow is due to calling pickle, it 
>> is not the IPC in itself.
> No need to be pedantic :-) The "C" means communication, and pickling
> objects is part of the communication between Python processes.

Yes, currently it is.

But it does not mean that it has to be. Clearly it is easier to avoid with
multiple interpreters in the same process. But it does not mean it is


From ron3200 at  Sun Jun 21 20:54:48 2015
From: ron3200 at (Ron Adam)
Date: Sun, 21 Jun 2015 14:54:48 -0400
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <mm719p$tpg$>

On 06/20/2015 06:54 PM, Eric Snow wrote:
> Also note that Barry has a (rejected) PEP from a number of years ago about
> freezing objects...  That idea is likely out of scope as relates to my
> proposal, but it certainly factors in the problem space.

How about instead of freezing, just modify a flag or counter if it's 
mutated.  That could be turned off by default.

Then have a way to turn on an ObjectMutated warning or exception if any
object is modified within a routine, code block, or function.

With something like that, small parts of python can be tested and made less 
mutable in small sections at a time.  Possibly working from the inside out.

It doesn't force immutability but instead asks for it.

A small but not quite so impossible step. (?)
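
A toy version of the idea, written as a list subclass rather than as an interpreter feature. ObjectMutated comes from the suggestion above; WatchedList, strict and mutation_count are made-up names, and only append and item assignment are covered.

```python
class ObjectMutated(Exception):
    """Raised when a watched object is modified while strict mode is on."""


class WatchedList(list):
    """A list that counts its mutations and can refuse them on request."""

    strict = False  # off by default, as suggested above

    def __init__(self, *args):
        super().__init__(*args)
        self.mutation_count = 0

    def _note_mutation(self):
        if self.strict:
            raise ObjectMutated("list modified while strict mode is on")
        self.mutation_count += 1

    def append(self, item):
        self._note_mutation()
        super().append(item)

    def __setitem__(self, index, value):
        self._note_mutation()
        super().__setitem__(index, value)


items = WatchedList([1, 2, 3])
items.append(4)          # counted, allowed
items.strict = True      # from here on, mutation is an error
try:
    items[0] = 99
    blocked = False
except ObjectMutated:
    blocked = True
```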


From abarnert at  Sun Jun 21 23:08:09 2015
From: abarnert at (Andrew Barnert)
Date: Sun, 21 Jun 2015 14:08:09 -0700
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

First, a minor question: instead of banning fork entirely within subinterpreters, why not just document that it is illegal to do anything between fork and exec in a subinterpreter, except for a very small (but possibly extensible) subset of Python? For example, after fork, you can no longer access any channels, and you also can't use signals, threads, fork again, imports, assignments to builtins, raising exceptions, or a whole host of other things (but of course if you exec an entirely new Python interpreter, it can do any of those things). C extension modules could just have a flag that marks whether the whole module is fork-safe or not (defaulting to not). So, this allows a subinterpreter to use subprocess (or even multiprocessing, as long as you use the forkserver or spawn mechanism), and it gives code that intentionally wants to do tricky/dangerous things a way to do them, but it avoids all of the problems with accidentally breaking a subinterpreter by forking it and then doing bad things.

Second, a major question: In this proposal, are builtins and the modules map shared, or copied?

If they're copied, it seems like it would be hard to do that even as efficiently as multiprocessing, much less more efficiently. Of course you could fake this with CoW, but I'm not sure how you'd do that, short of CoWing the entire heap (by using clone instead of pthreads on Linux, or by doing a bunch of explicit mmap and related calls on other POSIX systems), at which point you're pretty close to just implementing fork or vfork yourself to avoid calling fork or vfork, and unlikely to get it as efficient or as robust as what's already there.

If they're shared, on the other hand, then it seems like it becomes very difficult to implement subinterpreter-safe code, because it's no longer safe to import a module, set a flag, call a registration function, etc.

From abarnert at  Sun Jun 21 23:24:19 2015
From: abarnert at (Andrew Barnert)
Date: Sun, 21 Jun 2015 14:24:19 -0700
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 21, 2015, at 06:09, Nick Coghlan <ncoghlan at> wrote:
> Avoiding object serialisation is indeed the main objective. With
> subinterpreters, we have a lot more options for that than we do with
> any form of IPC, including shared references to immutable objects, and
> the PEP 3118 buffer API.

It seems like you could provide a way to efficiently copy and share deeper objects than integers and buffers without sharing everything, assuming the user code knows, at the time those objects are created, that they will be copied or shared. Basically, you allocate the objects into a separate arena (along with allocating their refcounts on a separate page, as already mentioned). You can't add a reference to an outside object in an arena-allocated object, although you can copy that outside object into the arena. And then you just pass or clone (possibly by using CoW memory-mapping calls, only falling back to memcpy on platforms that can't do that) entire arenas instead of individual objects (so you don't need the fictitious memdeepcpy function that someone ridiculed earlier in this thread, but you get 90% of the benefits of having one).

This has the same basic advantages of forking, but it's doable efficiently on Windows, and doable less efficiently (but still better than spawn and pass) on even weird embedded platforms, and it forces code to be explicit about what gets shared and copied without forcing it to work through less-natural queue-like APIs.

Also, it seems like you could fake this entire arena API on top of pickle/copy for a first implementation, then just replace the underlying implementation separately.
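
A minimal sketch of that "fake it on top of copy" first implementation. The Arena class and its put/get/clone methods are hypothetical names invented for this example; a real version would allocate into dedicated memory instead of calling copy.deepcopy.

```python
import copy


class Arena:
    """Toy arena: objects placed in it are copied in, never referenced."""

    def __init__(self):
        self._objects = {}

    def put(self, name, obj):
        # Copy on the way in, so the arena never points at outside objects.
        self._objects[name] = copy.deepcopy(obj)

    def get(self, name):
        return self._objects[name]

    def clone(self):
        # Stand-in for passing or CoW-cloning the whole arena in one step.
        twin = Arena()
        twin._objects = copy.deepcopy(self._objects)
        return twin


config = {"workers": 4, "paths": ["/tmp/a", "/tmp/b"]}
arena = Arena()
arena.put("config", config)
config["workers"] = 8            # later changes outside do not leak in

worker_arena = arena.clone()     # what another interpreter would receive
worker_arena.get("config")["paths"].append("/tmp/c")
```

The point of the interface is that user code has to say up front what goes into the arena, so the copy-based backend could later be swapped for a memory-mapped one without changing callers.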

From solipsis at  Mon Jun 22 00:41:57 2015
From: solipsis at (Antoine Pitrou)
Date: Mon, 22 Jun 2015 00:41:57 +0200
Subject: [Python-ideas] solving multi-core Python
References: <>
Message-ID: <20150622004157.52bc3239@fsol>

On Sun, 21 Jun 2015 14:08:09 -0700
Andrew Barnert via Python-ideas
<python-ideas at> wrote:
> First, a minor question: instead of banning fork entirely within subinterpreters, why not just document that it is illegal to do anything between fork and exec in a subinterpreter, except for a very small (but possibly extensible) subset of Python?

It's actually already the case in POSIX that most things are illegal
between fork() and exec(). However, to make fork() practical, many
libraries or frameworks tend to ignore those problems deliberately.



From ncoghlan at  Mon Jun 22 01:31:06 2015
From: ncoghlan at (Nick Coghlan)
Date: Mon, 22 Jun 2015 09:31:06 +1000
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On 22 Jun 2015 01:45, "Sturla Molden" <sturla.molden at> wrote:
> Nick Coghlan <ncoghlan at> wrote:
> > Sturla, expressing out and out contempt for entire communities of
> > capable, competent developers (both the creators of Windows and Java,
> > and the users of those platforms) has no place on the core Python
> > mailing lists. Please refrain from casually insulting entire groups of
> > people merely because you don't approve of their technical choices.
> I am not sure what you mean. Using threads on Windows and Java comes from
> necessity, not because developers are incompetent.

The folks *designing* Windows and Java are also people, and as creators of
development platforms go, it's hard to dispute their success in helping
folks solve real problems. We should be mindful of that when drawing
lessons from their experience.

> Windows does not provide
> a fork and processes are heavy-weight, hence multi-threading is the
> choice.

Windows actually has superior native parallel execution APIs to Linux in
some respects, but open source programming languages tend not to support
them, presumably due to a combination of Microsoft's longstanding hostile
perspective on open source licencing (which seems to finally be moderating
with their new CEO), and the even longer standing POSIX mindset that "fork
and file descriptors ought to be enough for anyone" (even if the workload
in the child processes is wildly different from that in the main process).

asyncio addresses that problem for Python in regards to IOCP vs select (et
al), and the configurable subprocess creation options addressed it for
multiprocessing, but I'm not aware of any efforts to get greenlets to use
fibres when they're available.

> > With
> > subinterpreters, we have a lot more options for that than we do with
> > any form of IPC, including shared references to immutable objects, and
> > the PEP 3118 buffer API.
> Perhaps. One could do this with shared memory as well, but a complicating
> factor is that the base address must be the same (or corrected for). But
> one could probably do low-level magic with memory mapping to work around
> this. Particularly on 64-bit it is not really difficult to make sure a page
> is mapped to the same address in two processes.
> It is certainly easier to achieve within a process. But if the plan for
> Erlang-style "share nothing" threads is to pickle and memcpy objects, there
> is little or nothing to gain over using multiprocessing.

The Python level *semantics* should be as if the objects were being copied
(for ease of use), but the *implementation* should try to avoid actually
doing that (for speed of execution).
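
To make the "copy semantics, shared implementation" distinction concrete, here is a sketch. The transfer function and the list of types treated as immutable are assumptions for the example, not part of any proposed API.

```python
import copy

# Types assumed here to be safe to share by reference (illustrative list).
_IMMUTABLE_SCALARS = (int, float, complex, bool, str, bytes, type(None))


def is_deeply_immutable(obj):
    """True if obj is an immutable scalar or a tuple/frozenset of such."""
    # Exact type checks: subclasses may carry mutable state.
    if type(obj) in _IMMUTABLE_SCALARS:
        return True
    if type(obj) in (tuple, frozenset):
        return all(is_deeply_immutable(item) for item in obj)
    return False


def transfer(obj):
    """Behave like a copy, but skip the copy when nobody could tell."""
    if is_deeply_immutable(obj):
        return obj              # shared reference, behaves like a copy
    return copy.deepcopy(obj)   # mutable: really copy


point = (1, 2.5, "label")
settings = {"depth": 3}
nested = (1, [2])

shared_point = transfer(point)
copied_settings = transfer(settings)
copied_nested = transfer(nested)
```

Apart from identity checks, the receiver cannot distinguish the shared tuple from a copy, which is the property that would let the implementation skip the copy.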

Assuming that can be done effectively *within* a process between
subinterpreters, then the possibility arises of figuring out how to use
shared memory to federate that approach across multiple processes. That
could then provide a significant performance improvement for

But since we have the option of tackling the simpler problem of
subinterpreters *first*, it makes sense to do that before diving into the
cross-platform arcana involved in similarly improving the efficiency of
multiprocessing's IPC.



From solipsis at  Mon Jun 22 01:39:20 2015
From: solipsis at (Antoine Pitrou)
Date: Mon, 22 Jun 2015 01:39:20 +0200
Subject: [Python-ideas] solving multi-core Python
References: <>
Message-ID: <20150622013920.7c678330@fsol>

On Mon, 22 Jun 2015 09:31:06 +1000
Nick Coghlan <ncoghlan at> wrote:
> Windows actually has superior native parallel execution APIs to Linux in
> some respects, but open source programming languages tend not to support
> them, presumably due to a combination of Microsoft's longstanding hostile
> perspective on open source licencing (which seems to finally be moderating
> with their new CEO), and the even longer standing POSIX mindset that "fork
> and file descriptors ought to be enough for anyone" (even if the workload
> in the child processes is wildly different from that in the main process).

Or perhaps the fact that those superior APIs are a PITA.
select() and friends may be crude performance-wise (though, strangely,
we don't see providers migrating massively to Windows in order to
improve I/O throughput), but they are simple to use.



From ncoghlan at  Mon Jun 22 01:47:29 2015
From: ncoghlan at (Nick Coghlan)
Date: Mon, 22 Jun 2015 09:47:29 +1000
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <20150622013920.7c678330@fsol>
References: <>
Message-ID: <>

On 22 Jun 2015 09:40, "Antoine Pitrou" <solipsis at> wrote:
> On Mon, 22 Jun 2015 09:31:06 +1000
> Nick Coghlan <ncoghlan at> wrote:
> >
> > Windows actually has superior native parallel execution APIs to Linux in
> > some respects, but open source programming languages tend not to support
> > them, presumably due to a combination of Microsoft's longstanding hostile
> > perspective on open source licencing (which seems to finally be moderating
> > with their new CEO), and the even longer standing POSIX mindset that "fork
> > and file descriptors ought to be enough for anyone" (even if the workload
> > in the child processes is wildly different from that in the main process).
> Or perhaps the fact that those superior APIs are a PITA.
> select() and friends may be crude performance-wise (though, strangely,
> we don't see providers migrating massively to Windows in order to
> improve I/O throughput), but they are simple to use.

Aye, there's a reason using a smart IDE like Visual Studio, IntelliJ or
Eclipse is pretty much essential for both Windows and Java programming.
These platforms fall squarely on the "tools maven" side of Oliver Steele's
"IDE Divide":

The opportunity I think we have with Python is to put a cross platform text
editor friendly abstraction layer across these kinds of underlying
capabilities :)


From sturla.molden at  Mon Jun 22 02:41:49 2015
From: sturla.molden at (Sturla Molden)
Date: Mon, 22 Jun 2015 02:41:49 +0200
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <20150622013920.7c678330@fsol>
References: <>
Message-ID: <mm7lkd$th0$>

On 22/06/15 01:39, Antoine Pitrou wrote:

> Or perhaps the fact that those superior APIs are a PITA.

Not all of them, no.

HeapAlloc is a good example. Very easy to use, and the "one heap per 
thread" design often gives excellent performance compared to a single 
global heap. But on Linux we only have malloc et al., allocating from 
the global heap. How many Linux programmers have even considered using 
multiple heaps in combination with multi-threading? I can assure you it 
is not common.

A good idea is to look at the Python C API. We have PyMem_Malloc, but 
nothing that compares to Windows' HeapAlloc.

Not only does HeapAlloc remove the contention for the global heap, it 
can also serialize. Instead of serializing an object by traversing all 
references in the object tree, we just serialize the heap from which it 
was allocated.

And as for garbage collection, why not deallocate the whole heap in one 
blow? Is there any reason to pair each malloc with free if one could just
zap the whole heap? That is what HeapDestroy does. On Linux we would 
typically homebrew a memory pool to achieve the same thing. But a memory 
pool needs to traverse a chain of pointers and call free() multiple 
times, each time with contention for the spinlock protecting the global 
heap. And when allocating from a memory pool we also have contention for 
the global heap. It cannot in any way compare to the performance of the 
Win API HeapCreate/HeapDestroy and HeapAlloc/HeapFree.


From solipsis at  Mon Jun 22 02:49:38 2015
From: solipsis at (Antoine Pitrou)
Date: Mon, 22 Jun 2015 02:49:38 +0200
Subject: [Python-ideas] solving multi-core Python
References: <>
Message-ID: <20150622024938.39ffba70@fsol>

On Mon, 22 Jun 2015 09:47:29 +1000
Nick Coghlan <ncoghlan at> wrote:
> >
> > Or perhaps the fact that those superior APIs are a PITA.
> > select() and friends may be crude performance-wise (though, strangely,
> > we don't see providers migrating massively to Windows in order to
> > improve I/O throughput), but they are simple to use.
> Aye, there's a reason using a smart IDE like Visual Studio, IntelliJ or
> Eclipse is pretty much essential for both Windows and Java programming.
> These platforms fall squarely on the "tools maven" side of Oliver Steele's
> "IDE Divide":

It's not about using an IDE, it's the more complex and delicate control
flow that asynchronous IO (IOCP / Overlapped) imposes compared to
non-blocking IO (e.g. select()).
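
For comparison, the readiness-based style looks roughly like this with the stdlib selectors module (a sketch using a local socket pair). Nothing is in flight between the select call and the recv, so there is no pending operation whose buffer has to be kept alive or cancelled.

```python
import selectors
import socket

sel = selectors.DefaultSelector()
left, right = socket.socketpair()
right.setblocking(False)
sel.register(right, selectors.EVENT_READ)

left.sendall(b"ping")

received = b""
# Readiness model: wait until the socket is readable, then read at once.
for key, _events in sel.select(timeout=5):
    received += key.fileobj.recv(1024)

sel.unregister(right)
sel.close()
left.close()
right.close()
```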

Not to mention that lifetime issues are hard to handle safely and
generically before Vista (that is, before CancelIoEx():
s.85%29.aspx -- "The CancelIoEx function allows you to cancel requests
in threads other than the calling thread. The CancelIo function only
cancels requests in the same thread that called the CancelIo function")



From ncoghlan at  Mon Jun 22 03:47:45 2015
From: ncoghlan at (Nick Coghlan)
Date: Mon, 22 Jun 2015 11:47:45 +1000
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On 21 June 2015 at 07:42, Eric Snow <ericsnowcurrently at> wrote:
> tl;dr Let's exploit multiple cores by fixing up subinterpreters,
> exposing them in Python, and adding a mechanism to safely share
> objects between them.
> This proposal is meant to be a shot over the bow, so to speak.  I plan
> on putting together a more complete PEP some time in the future, with
> content that is more refined along with references to the appropriate
> online resources.
> Feedback appreciated!  Offers to help even more so! :)

It occurred to me in the context of another conversation that you (or
someone else!) may be able to prototype some of the public API ideas
for this using Jython and Vert.x:

That idea and some of the initial feedback in this thread also made me
realise that it is going to be essential to keep in mind that there
are key goals at two different layers here:

* design a compelling implementation independent public API for CSP
style programming in Python
* use subinterpreters to implement that API efficiently in CPython

There's a feedback loop between those two goals where limitations on
what's feasible in CPython may constrain the design of the public API,
and the design of the API may drive enhancements to the existing
subinterpreter capability, but we shouldn't lose sight of the fact
that they're *separate* goals.
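
As a sketch of what the first goal could feel like from user code, here is a CSP-flavoured example built on threads and queue.Queue. The Channel class is a hypothetical stand-in: it says nothing about how subinterpreters would implement it, and threads here still share the GIL.

```python
import queue
import threading


class Channel:
    """Minimal channel: the only way the two sides communicate."""

    _CLOSED = object()  # sentinel marking the end of the stream

    def __init__(self):
        self._queue = queue.Queue()

    def send(self, value):
        self._queue.put(value)

    def close(self):
        self._queue.put(self._CLOSED)

    def __iter__(self):
        # Yield values until the sender closes the channel.
        while True:
            value = self._queue.get()
            if value is self._CLOSED:
                return
            yield value


def squarer(inbox, outbox):
    """Worker: read numbers from inbox, send their squares to outbox."""
    for number in inbox:
        outbox.send(number * number)
    outbox.close()


requests, replies = Channel(), Channel()
worker = threading.Thread(target=squarer, args=(requests, replies))
worker.start()

for n in (1, 2, 3):
    requests.send(n)
requests.close()

results = list(replies)
worker.join()
```

Because the worker only ever touches its two channels, the same user code could in principle run over a different backend, which is the separation between the two goals above.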


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From jeanpierreda at  Mon Jun 22 04:16:32 2015
From: jeanpierreda at (Devin Jeanpierre)
Date: Sun, 21 Jun 2015 19:16:32 -0700
Subject: [Python-ideas] Responsive signal handling
Message-ID: <>

On the topic of obscure concurrency trivia, signal handling in Python
is not very friendly, and I'm wondering if anyone is interested in
reviewing changes to fix it.

The world today: handlers only run in the main thread, but if the main
thread is running a bytecode (e.g. a call to a C function), it will
wait for that first. For example, signal handlers don't get run if you
are in the middle of a lock acquisition, thread join, or (sometimes) a
select call, until after the call returns (which may take a very long
time).

This makes it difficult to be responsive to signals without
acrobatics, or without using a library that does those acrobatics for
you (such as Twisted.) Being responsive to SIGTERM and SIGINT is, IMO,
important for running programs in the cloud, since otherwise they may
be forcefully killed by the job manager, causing user-facing errors.
(It's also annoying as a command line user when you can't kill a
process with anything less than SIGKILL.)

I would stress that, by and large, the signal module is a trap that
most people get wrong, and I don't think the stdlib solution should be
the way it is. (e.g. it looks to me like gunicorn gets it wrong, and
certainly asyncio did in early versions.)

One could implement one or both of the following:

- Keep only running signal handlers in the main thread, but allow them
to run even in the middle of a call to a C function for as many C
functions as we can.

This is not possible in general, but it can be made to work for all
blocking operations in the stdlib. Operations that run in C but just
take a long time, or that are part of third-party code, will continue
to inhibit responsiveness.

- Run signal handlers in a dedicated separate thread.

IMO this is generally better than running signal handlers in the main
thread, because it eliminates the separate concept of "async-safe" and
just requires "thread-safe". So you can use regular threading
synchronization primitives for safety, instead of relying on luck /
memorized lists of atomic/re-entrant operations.

Something still needs to run in the main thread though, for e.g.
KeyboardInterrupt, so this is not super straightforward. Also, it
could break any code that really relies on signal handlers running in
the main thread.

Either approach can be turned into a library, albeit potentially
hackily in the first case.

-- Devin

From cs at  Mon Jun 22 09:52:07 2015
From: cs at (Cameron Simpson)
Date: Mon, 22 Jun 2015 17:52:07 +1000
Subject: [Python-ideas] Responsive signal handling
In-Reply-To: <>
References: <>
Message-ID: <>

On 21Jun2015 19:16, Devin Jeanpierre <jeanpierreda at> wrote:
>On the topic of obscure concurrency trivia, signal handling in Python
>is not very friendly, and I'm wondering if anyone is interested in
>reviewing changes to fix it.
>The world today: handlers only run in the main thread, but if the main
>thread is running a bytecode (e.g. a call to a C function), it will
>wait for that first. For example, signal handlers don't get run if you
>are in the middle of a lock acquisition, thread join, or (sometimes) a
>select call, until after the call returns (which may take a very long
>time).
>This makes it difficult to be responsive to signals without
>acrobatics, or without using a library that does those acrobatics for
>you (such as Twisted.) Being responsive to SIGTERM and SIGINT is, IMO,
>important for running programs in the cloud, since otherwise they may
>be forcefully killed by the job manager, causing user-facing errors.
>(It's also annoying as a command line user when you can't kill a
>process with anything less than SIGKILL.)

I agree with all of this, but I do think that handling signals in the main 
program by default is a sensible default: it gives very predictable behaviour.

>- Keep only running signal handlers in the main thread, but allow them
>to run even in the middle of a call to a C function for as many C
>functions as we can.

This feels fragile: it means that formerly one could expect C calls to be
"atomic" from the main thread's point of view, and conversely the C functions
could expect that the main thread (or whatever thread called them) is paused
during their execution. As soon as the calling thread can reactivate, these
guarantees are broken. Suppose the C call is doing things to thread-local
Python variables, for just one scenario.

So I'm -1 on this on the face of it.

>This is not possible in general, but it can be made to work for all
>blocking operations in the stdlib.

Hmm. I'm not sure that you will find this universally so. No, I have no 
examples proving my intuition here.

>Operations that run in C but just
>take a long time, or that are part of third-party code, will continue
>to inhibit responsiveness.
>- Run signal handlers in a dedicated separate thread.
>IMO this is generally better than running signal handlers in the main
>thread, because it eliminates the separate concept of "async-safe" and
>just requires "thread-safe". So you can use regular threading
>synchronization primitives for safety, instead of relying on luck /
>memorized lists of atomic/re-entrant operations.

Yes, I am in favour of this or something like it. Personally I would go for 
either or both of:

  - a stdlib function to specify the thread to handle signals instead of main

  - a stdlib function to declare that signals should immediately place a
    nice descriptive "signal" object on a Queue, and leave it to the user
    to handle the queue (for example, by spawning a thread to consume it)

>Something still needs to run in the main thread though, for e.g.
>KeyboardInterrupt, so this is not super straightforward.

Is this necessarily true?

>Also, it
>could break any code that really relies on signal handlers running in
>the main thread.

Which is why it should never be the default; I am firmly of the opinion that
the changed handling should be requested by the program.

Cameron Simpson <cs at>

Facts do not discourage the conspiracy-minded.
        - Robert Crawford <rawford at>

From mal at  Mon Jun 22 10:16:52 2015
From: mal at (M.-A. Lemburg)
Date: Mon, 22 Jun 2015 10:16:52 +0200
Subject: [Python-ideas] Responsive signal handling
In-Reply-To: <>
References: <>
Message-ID: <>

On 22.06.2015 04:16, Devin Jeanpierre wrote:
> On the topic of obscure concurrency trivia, signal handling in Python
> is not very friendly, and I'm wondering if anyone is interested in
> reviewing changes to fix it.
> The world today: handlers only run in the main thread, but if the main
> thread is running a bytecode (e.g. a call to a C function), it will
> wait for that first. For example, signal handlers don't get run if you
> are in the middle of a lock acquisition, thread join, or (sometimes) a
> select call, until after the call returns (which may take a very long
> time).

IMO, the above can easily be solved by going with an application
design which doesn't use the main thread for any long running
tasks, but instead runs these in separate threads.
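
A minimal sketch of that design, assuming a POSIX platform: the main
thread does nothing except poll for the signal in short sleeps, while the
long-running work happens in a worker thread. The helper name
run_until_signalled and its arguments are hypothetical, chosen only for
illustration.

```python
import signal
import threading
import time


def run_until_signalled(signum, work):
    """Run work(stop_event) in a thread; the main thread only waits for signum."""
    received = []
    stop = threading.Event()
    # The handler only records the signal number; it runs in the main thread.
    previous = signal.signal(signum, lambda number, frame: received.append(number))
    thread = threading.Thread(target=work, args=(stop,))
    thread.start()
    try:
        while not received:
            time.sleep(0.05)  # short sleeps keep the main thread responsive
    finally:
        stop.set()  # ask the worker to finish, then wait for it
        thread.join()
        signal.signal(signum, previous)
    return received[0]
```

The worker is expected to check the stop event regularly; a worker stuck
in a long C call would still delay shutdown, only not signal delivery.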

I don't know what the overall situation is today, but at least in
the past, signal handling only worked reliably across platforms
in the main thread of the application.

Marc-Andre Lemburg


From guido at  Mon Jun 22 10:28:04 2015
From: guido at (Guido van Rossum)
Date: Mon, 22 Jun 2015 10:28:04 +0200
Subject: [Python-ideas] Responsive signal handling
In-Reply-To: <>
References: <>
Message-ID: <>

I would regret losing the behavior where just raising an exception in a
signal handler causes the main thread to be interrupted by that exception.

I agree it would be nice if handlers ran when the main thread is waiting
for I/O.
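
For readers unfamiliar with the behaviour being described, here is a small
sketch of it, assuming a POSIX platform and Python 3.8 or later (for
signal.raise_signal). The Interrupted exception and wait_for_signal helper
are hypothetical names used only for illustration.

```python
import signal
import time


class Interrupted(Exception):
    """Raised from the signal handler to unwind the main thread."""


def _raise_interrupted(signum, frame):
    raise Interrupted(signum)


def wait_for_signal(signum, seconds=5.0):
    """Show that an exception raised in a handler surfaces in the main thread."""
    previous = signal.signal(signum, _raise_interrupted)
    try:
        signal.raise_signal(signum)  # stand-in for a signal sent from outside
        deadline = time.monotonic() + seconds
        while time.monotonic() < deadline:
            time.sleep(0.05)
        return "timed out"
    except Interrupted:
        return "interrupted"
    finally:
        signal.signal(signum, previous)
```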

--Guido van Rossum (

From aquavitae69 at  Mon Jun 22 10:30:26 2015
From: aquavitae69 at (David Townshend)
Date: Mon, 22 Jun 2015 10:30:26 +0200
Subject: [Python-ideas] Pathlib additions & changes
Message-ID: <>


Recently I've been trying out pathlib in some real code, and while it is a
vast improvement on messing around with os, os.path and shutil there are a
couple of suggestions I'd like to make.

** TL;DR: Add Path.copy, add Path.remove (replaces Path.rmdir and
Path.unlink) and add flags to several methods.


1.  Add a copy method, i.e. source_path.copy(target_path), which by default
should behave like shutil.copy2.

2.  Why bother with the distinction between Path.unlink and Path.rmdir?
Obviously they apply to different types of paths, but to a user, either way
you just want to remove whatever is at the path, so perhaps just have a
single Path.remove instead (but see point 3 below).

3.  There are several other minor irritations where a common pattern
requires several lines or the use of a lower-level library such as shutil.
For example:

  *  Mkdir where path exists, but we don't care (common pattern on scripts)
    py> if not path.exists():
            path.mkdir(parents=True)

  *  Recursively remove a directory (no sane way using pathlib alone)
    py> shutil.rmtree(str(path))

  *  Move a file, creating parents if necessary
    py> if not target.parent.exists():
            target.parent.mkdir(parents=True)
        source.rename(target)

There are others, but these are a couple that spring to mind.  There are
three options.  Either we add a bunch of specific functions for each of
these (e.g. Path.rmtree, Path.rename_with_mkdir, etc.), or we add a whole
lot of boolean arguments (e.g. Path.rename(make_parents=True)), or we use
flags, e.g. Path.rename(flags=MAKE_PARENTS).

Using flags is, IMHO, the neatest solution, and could replace some boolean
arguments already included.  What follows is a suggestion of where flags
might be useful, including the new methods suggested above.  I haven't put
a huge amount of thought into these, wanting to just get the general idea
on the table, so I'm sure that upon closer inspection some won't make much
sense or could be better named.

    iterdir: RECURSIVE (maybe not worth it because of globbing)
    lchmod: RECURSIVE (Could be dropped in favour of
    lstat: (Could be dropped in favour of stat(flags=DONT_FOLLOW_SYMLINKS))
    remove: RECURSIVE
    replace: (Could be dropped in favour of rename(flags=OVERWRITE_EXISTING))
    rmdir: (Could be dropped in favour of remove)
    unlink: (Could be dropped in favour of remove)
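
To make the flags idea concrete, here is a rough sketch written as
free functions on top of today's pathlib and shutil. The PathFlag enum and
the remove and rename helpers are hypothetical; nothing like them exists in
pathlib, and the flag names are only the ones floated above.

```python
import enum
import shutil
from pathlib import Path


class PathFlag(enum.Flag):
    """Hypothetical flag set; not part of pathlib."""
    NONE = 0
    RECURSIVE = enum.auto()
    MAKE_PARENTS = enum.auto()


def remove(path, flags=PathFlag.NONE):
    """Remove a file, or a directory (recursively if RECURSIVE is set)."""
    path = Path(path)
    if path.is_dir() and not path.is_symlink():
        if PathFlag.RECURSIVE in flags:
            shutil.rmtree(str(path))
        else:
            path.rmdir()  # still fails on a non-empty directory
    else:
        path.unlink()


def rename(source, target, flags=PathFlag.NONE):
    """Rename source to target, creating parent directories if requested."""
    source, target = Path(source), Path(target)
    if PathFlag.MAKE_PARENTS in flags:
        target.parent.mkdir(parents=True, exist_ok=True)
    source.rename(target)
    return target
```

Flags combine with the usual operator, e.g.
PathFlag.RECURSIVE | PathFlag.MAKE_PARENTS, which is the main thing they
buy over a growing list of boolean arguments.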


From solipsis at  Mon Jun 22 10:41:17 2015
From: solipsis at (Antoine Pitrou)
Date: Mon, 22 Jun 2015 10:41:17 +0200
Subject: [Python-ideas] Responsive signal handling
References: <>
Message-ID: <20150622104117.31b69ddc@fsol>

On Sun, 21 Jun 2015 19:16:32 -0700
Devin Jeanpierre <jeanpierreda at> wrote:
> One could implement one or both of the following:
> - Keep only running signal handlers in the main thread, but allow them
> to run even in the middle of a call to a C function for as many C
> functions as we can.
> This is not possible in general, but it can be made to work for all
> blocking operations in the stdlib.

Are you aware that it is already the case today (perhaps not for "all
blocking operations", but at least for those that return EINTR when
interrupted)?

By the way, have you read ?



From jeanpierreda at  Mon Jun 22 10:42:38 2015
From: jeanpierreda at (Devin Jeanpierre)
Date: Mon, 22 Jun 2015 01:42:38 -0700
Subject: [Python-ideas] Responsive signal handling
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jun 22, 2015 at 12:52 AM, Cameron Simpson <cs at> wrote:
> On 21Jun2015 19:16, Devin Jeanpierre <jeanpierreda at> wrote:
>> This is not possible in general, but it can be made to work for all
>> blocking operations in the stdlib.
> Hmm. I'm not sure that you will find this universally so. No, I have no
> examples proving my intuition here.

If you fix select, everything else can hypothetically follow, as you
can run all C functions in another thread and signal the result is
ready using an fd. (This is my idea for a hacky third-party library.
It only ruins stack traces!)

>> Operations that run in C but just
>> take a long time, or that are part of third-party code, will continue
>> to inhibit responsiveness.
>> - Run signal handlers in a dedicated separate thread.
>> IMO this is generally better than running signal handlers in the main
>> thread, because it eliminates the separate concept of "async-safe" and
>> just requires "thread-safe". So you can use regular threading
>> synchronization primitives for safety, instead of relying on luck /
>> memorized lists of atomic/re-entrant operations.
> Yes, I am in favour of this or something like it. Personally I would go for
> either or both of:
>  - a stdlib function to specify the thread to handle signals instead of main

This just moves the problem to another thread. One can already today
try to keep the main thread free to handle signals, it's just hard.

>  - a stdlib function to declare that signals should immediately place a nice
> descriptive "signal" object on a Queue, and leaves it to the user to handle
> the queue (for example, by spawning a thread to consume it)

I like this. It mirrors Linux's signalfd, too. One small correction,
it can't literally be a Queue, because those aren't safe to use in
signal handlers. (It can be a pipe that is wrapped in a Queue-like
interface, though, and if we do that, we can even use native signalfd
if we want.)
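
A rough sketch of the pipe-behind-a-Queue idea, assuming a POSIX platform
and a call from the main thread. The start_signal_queue name is
hypothetical. The C-level handler only writes a byte to the pipe (via
signal.set_wakeup_fd); an ordinary thread then turns those bytes into
objects on a queue.Queue, where normal thread-safety rules apply.

```python
import os
import queue
import signal
import threading


def start_signal_queue(signums):
    """Route the given signals into a queue.Queue by way of a wakeup pipe."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # set_wakeup_fd needs a non-blocking fd
    events = queue.Queue()

    def pump():
        # Ordinary thread: blocking reads and Queue.put are safe here.
        while True:
            data = os.read(read_fd, 64)
            if not data:
                break
            for number in data:  # each byte is one signal number
                events.put(signal.Signals(number))

    for signum in signums:
        # A Python-level handler must be installed, otherwise the default
        # action runs and nothing is written to the wakeup fd.
        signal.signal(signum, lambda number, frame: None)
    signal.set_wakeup_fd(write_fd)
    threading.Thread(target=pump, daemon=True).start()
    return events
```

Note that the wakeup fd receives a byte for every signal that has a
Python-level handler, so a consumer may also see signals it did not ask for
(SIGINT, for instance).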

It also resolves an unspoken concern I had, which is that silently
starting threads for the user feels icky.

>> Something still needs to run in the main thread though, for e.g.
>> KeyboardInterrupt, so this is not super straightforward.
> Is this necessarily true?

What I mean is that there needs to be a way to raise KeyboardInterrupt
in the main thread from a signal handler. If, as you suggest, the old
behavior stays around, then that's enough.

Another option, if we went with a dedicated signal handling thread,
would be that uncaught exceptions propagate to the main thread when it
gets around to it.

-- Devin

From jeanpierreda at  Mon Jun 22 10:56:54 2015
From: jeanpierreda at (Devin Jeanpierre)
Date: Mon, 22 Jun 2015 01:56:54 -0700
Subject: [Python-ideas] Responsive signal handling
In-Reply-To: <20150622104117.31b69ddc@fsol>
References: <>
Message-ID: <>

On Mon, Jun 22, 2015 at 1:41 AM, Antoine Pitrou <solipsis at> wrote:
> On Sun, 21 Jun 2015 19:16:32 -0700
> Devin Jeanpierre <jeanpierreda at>
> wrote:
>> One could implement one or both of the following:
>> - Keep only running signal handlers in the main thread, but allow them
>> to run even in the middle of a call to a C function for as many C
>> functions as we can.
>> This is not possible in general, but it can be made to work for all
>> blocking operations in the stdlib.
> Are you aware that it is already the case today (perhaps not for "all
> blocking operations", but at least for those that return EINTR when
> interrupted)?

That only applies when the signal arrives during the system call. For
example, if you call select() from Python and a signal is received
during argument parsing, EINTR is not returned because C select() is
not running yet. See .
The only cross-platform fix for this that I am aware of is to make
select() use the self-pipe trick:

 - set up a pipe, and use set_wakeup_fd to make signals write to that pipe
 - check for signals in case any arrived before you called set_wakeup_fd
 - call select() as before, but also select on the pipe

Without something like this, where a signal is handled no matter when
it comes in, even a call which returns EINTR usually can miss signals,
resulting in potentially drastically reduced responsiveness. This
exact trick doesn't work for every blocking call, just for the ones in
the select module.
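
As a pure-Python illustration of the steps above (not the patch itself),
something along these lines could work on POSIX from the main thread. The
select_interruptible name and its return shape are hypothetical. It covers
the pipe setup and the extra fd in select(); the "check for signals that
arrived earlier" step is left to the interpreter running pending handlers,
and this sketch does not claim that fully closes the race.

```python
import os
import select
import signal


def select_interruptible(rlist, timeout=None):
    """select() on rlist plus a wakeup pipe; returns (ready, signalled)."""
    wake_r, wake_w = os.pipe()
    os.set_blocking(wake_r, False)
    os.set_blocking(wake_w, False)  # required by set_wakeup_fd
    old_fd = signal.set_wakeup_fd(wake_w)
    try:
        # A signal with a handler installed writes a byte to wake_w, which
        # makes wake_r readable and ends the select() call early.
        ready, _, _ = select.select(list(rlist) + [wake_r], [], [], timeout)
        signalled = wake_r in ready
        return [fd for fd in ready if fd != wake_r], signalled
    finally:
        signal.set_wakeup_fd(old_fd)
        os.close(wake_r)
        os.close(wake_w)
```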

I provided a patch on that issue which does this. It is atrocious. :(
If I rewrote it, I'd prefer to write it as a pure-python wrapper
around select().

> By the way, have you read ?

I did once, but I reread it now. I think the PEP is focused not on
making signal handling more responsive, but on making EINTR less of a
trap. Although it does mention responsiveness in use case 2, it
doesn't go far enough.

I think the following cases matter:

- Signals that arrive before the system call starts, but after the
  Python function call begins
- Signals that arrive during a call to a blocking function which
  doesn't return EINTR
- Signals that arrive during a call to a C function which doesn't
  block at all, but is just slow

-- Devin

From andrew.svetlov at  Mon Jun 22 12:13:18 2015
From: andrew.svetlov at (Andrew Svetlov)
Date: Mon, 22 Jun 2015 13:13:18 +0300
Subject: [Python-ideas] Responsive signal handling
In-Reply-To: <>
References: <>
Message-ID: <>

IIRC signal handlers may be blocked by threading synchronization
primitives etc. on Python 2.7, but it's not an issue for Python 3.
I don't recall the exact version; it's likely 3.2. Maybe Benjamin
Peterson could provide more info: hg blame says he is the author of
the EINTR processing in _threadmodule.c

On Mon, Jun 22, 2015 at 11:56 AM, Devin Jeanpierre
<jeanpierreda at> wrote:
> On Mon, Jun 22, 2015 at 1:41 AM, Antoine Pitrou <solipsis at> wrote:
>> On Sun, 21 Jun 2015 19:16:32 -0700
>> Devin Jeanpierre <jeanpierreda at>
>> wrote:
>>> One could implement one or both of the following:
>>> - Keep only running signal handlers in the main thread, but allow them
>>> to run even in the middle of a call to a C function for as many C
>>> functions as we can.
>>> This is not possible in general, but it can be made to work for all
>>> blocking operations in the stdlib.
>> Are you aware that it is already the case today (perhaps not for "all
>> blocking operations", but at least for those that return EINTR when
>> interrupted)?
> That only applies when the signal arrives during the system call. For
> example, if you call select() from Python and a signal is received
> during argument parsing, EINTR is not returned because C select() is
> not running yet. See .
> The only cross-platform fix for this that I am aware of is to make
> select() use the self-pipe trick:
>  - set up a pipe, and use set_wakeup_fd to make signals write to that pipe
>  - check for signals in case any arrived before you called set_wakeup_fd
>  - call select() as before, but also select on the pipe
> Without something like this, where a signal is handled no matter when
> it comes in, even a call which returns EINTR usually can miss signals,
> resulting in potentially drastically reduced responsiveness. This
> exact trick doesn't work for every blocking call, just for the ones in
> the select module.
> I provided a patch on that issue which does this. It is atrocious. :(
> If I rewrote it, I'd prefer to write it as a pure-python wrapper
> around select().
>> By the way, have you read ?
> I did once, but I reread it now. I think the PEP is focused not on
> making signal handling more responsive, but on making EINTR less of a
> trap. Although it does mention responsiveness in use case 2, it
> doesn't go far enough.
> I think the following cases matter:
> - Signals that arrive before the system call starts, but after the
>   Python function call begins
> - Signals that arrive during a call to a blocking function which
>   doesn't return EINTR
> - Signals that arrive during a call to a C function which doesn't
>   block at all, but is just slow
> -- Devin

Andrew Svetlov

From rymg19 at  Mon Jun 22 16:30:22 2015
From: rymg19 at (Ryan Gonzalez)
Date: Mon, 22 Jun 2015 09:30:22 -0500
Subject: [Python-ideas] Pathlib additions & changes
In-Reply-To: <>
References: <>
Message-ID: <>

On June 22, 2015 3:30:26 AM CDT, David Townshend <aquavitae69 at> wrote:
>Recently I've been trying out pathlib in some real code, and while it
>is a vast improvement on messing around with os, os.path and shutil
>there are a couple of suggestions I'd like to make.
>** TL;DR: Add Path.copy, add Path.remove (replaces Path.rmdir and
>Path.unlink) and add flags to several methods.
>1.  Add a copy method, i.e. source_path.copy(target_path), which by
>default should behave like shutil.copy2.
>2.  Why bother with the distinction between Path.unlink and Path.rmdir?
>Obviously they apply to different types of paths, but to a user, either
>way you just want to remove whatever is at the path, so perhaps just
>have a single Path.remove instead (but see point 3 below).
>3.  There are several other minor irritations where a common pattern
>requires several lines or the use of a lower-level library such as
>shutil.  For example:
>  *  Mkdir where path exists, but we don't care (common pattern on
>     scripts)
>    if not path.exists():
>        path.mkdir(parents=True)

You can just do:

try:
    path.mkdir(parents=True)
except FileExistsError:
    pass
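
Wrapped up as a helper, the same pattern might look like the sketch below.
The ensure_dir name is hypothetical. It re-raises when the existing path is
not a directory, which a bare "pass" would hide.

```python
from pathlib import Path


def ensure_dir(path):
    """Create path and any missing parents; tolerate an existing directory."""
    path = Path(path)
    try:
        path.mkdir(parents=True)
    except FileExistsError:
        if not path.is_dir():
            raise  # something that is not a directory is in the way
    return path
```

On Python 3.5 and later, path.mkdir(parents=True, exist_ok=True) covers the
same case in a single call.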

>  *  Recursively remove a directory (no sane way using pathlib alone)
>    shutil.rmtree(str(path))
>  *  Move a file, creating parents if necessary
>    py> if not target.parent.exists():
>            target.parent.mkdir(parents=True)
>        source.rename(target)
>There are others, but these are a couple that spring to mind.  There
>are three options.  Either we add a bunch of specific functions for each
>of these (e.g. Path.rmtree, Path.rename_with_mkdir, etc), or we add a
>whole lot of boolean arguments (e.g. Path.rename(make_parents=True)), or
>we use flags, e.g. Path.rename(flags=MAKE_PARENTS)
>Using flags is, IMHO the neatest solution, and could replace some
>boolean arguments already included.  What follows is a suggestion of
>where flags might be useful, including the new methods suggested above.
>I haven't put a huge amount of thought into these, wanting to just get
>the general idea on the table, so I'm sure that upon closer inspection
>some won't make much sense or could be better named.

I prefer keyword-only arguments. Flags aren't really Pythonic, IMO.

>    iterdir: RECURSIVE (maybe not worth it because of globbing)
>    lchmod: RECURSIVE (Could be dropped in favour of
>    lstat: (Could be dropped in favour of stat(flags=DONT_FOLLOW_SYMLINKS))
>    remove: RECURSIVE
>    replace: (Could be dropped in favour of rename(flags=OVERWRITE_EXISTING))
>    rmdir: (Could be dropped in favour of remove)
>    unlink: (Could be dropped in favour of remove)


From rosuav at  Mon Jun 22 16:59:18 2015
From: rosuav at (Chris Angelico)
Date: Tue, 23 Jun 2015 00:59:18 +1000
Subject: [Python-ideas] Pathlib additions & changes
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jun 22, 2015 at 6:30 PM, David Townshend <aquavitae69 at> wrote:
> 3.  There are several other minor irritations where a common pattern
> requires several lines or the use of a lower-level library such as shutil.
> For example:
>   *  Recursively remove a directory (no sane way using pathlib alone)
>     shutil.rmtree(str(path))

I'm not sure shutil should be considered a lower-level library. It's a
separate set of tools aimed at shell-like functionality. Removing a
directory tree seems right for shutil; what if shutil.rmtree() would
accept a Path object as an alternative to a str? That'd make
reasonable sense, and it'd feel like the two modules were working well
together.

(Or can it already?)


From p.f.moore at  Mon Jun 22 17:53:24 2015
From: p.f.moore at (Paul Moore)
Date: Mon, 22 Jun 2015 16:53:24 +0100
Subject: [Python-ideas] Responsive signal handling
In-Reply-To: <>
References: <>
Message-ID: <>

On 22 June 2015 at 09:42, Devin Jeanpierre <jeanpierreda at> wrote:
> On Mon, Jun 22, 2015 at 12:52 AM, Cameron Simpson <cs at> wrote:
>> On 21Jun2015 19:16, Devin Jeanpierre <jeanpierreda at> wrote:
>>> This is not possible in general, but it can be made to work for all
>>> blocking operations in the stdlib.
>> Hmm. I'm not sure that you will find this universally so. No, I have no
>> examples proving my intuition here.
> If you fix select, everything else can hypothetically follow, as you
> can run all C functions in another thread and signal the result is
> ready using an fd. (This is my idea for a hacky third-party library.
> It only ruins stack traces!)

This particular approach presumably only works on Unix? (On Windows,
select is not a general signalling operation, it only works for
sockets). Presumably a cross-platform solution would need to use
appropriate OS-native signalling based on the platform?


From p.f.moore at  Mon Jun 22 17:59:26 2015
From: p.f.moore at (Paul Moore)
Date: Mon, 22 Jun 2015 16:59:26 +0100
Subject: [Python-ideas] Pathlib additions & changes
In-Reply-To: <>
References: <>
Message-ID: <>

On 22 June 2015 at 15:59, Chris Angelico <rosuav at> wrote:
> On Mon, Jun 22, 2015 at 6:30 PM, David Townshend <aquavitae69 at> wrote:
>> 3.  There are several other minor irritations where a common pattern
>> requires several lines or the use of a lower-level library such as shutil.
>> For example:
>>   *  Recursively remove a directory (no sane way using pathlib alone)
>>     shutil.rmtree(str(path))
> I'm not sure shutil should be considered a lower-level library. It's a
> separate set of tools aimed at shell-like functionality. Removing a
> directory tree seems right for shutil; what if shutil.rmtree() would
> accept a Path object as an alternative to a str? That'd make
> reasonable sense, and it'd feel like the two modules were working well
> together.

Agreed, shutil is higher level than pathlib, not lower.

Having more stdlib functions (shutil is the most obvious example, but
there are others) take pathlib.Path objects as well as strings would
be a good change (and would set a nice example for 3rd party file
manipulation modules). I'm sure the usual "patches welcome" applies
:-)

The main irritation about using "higher level" modules with path
objects is the proliferation of str() calls. Accepting path objects
natively fixes that:

    from shutil import rmtree
    rmtree(path)

looks fine to me.
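
A small self-contained check of that usage, assuming a Python version in
which shutil accepts path-like objects (3.6 and later, as far as I know).
The make_and_remove_tree helper is hypothetical and exists only to make the
example runnable.

```python
import shutil
import tempfile
from pathlib import Path


def make_and_remove_tree():
    """Build a small directory tree, then remove it by passing a Path."""
    root = Path(tempfile.mkdtemp())
    nested = root / "a" / "b"
    nested.mkdir(parents=True)
    (nested / "file.txt").write_text("data")
    shutil.rmtree(root)  # Path passed directly, no str() call
    return root.exists()
```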


From aquavitae69 at  Mon Jun 22 18:44:27 2015
From: aquavitae69 at (David Townshend)
Date: Mon, 22 Jun 2015 18:44:27 +0200
Subject: [Python-ideas] Pathlib additions & changes
In-Reply-To: <>
References: <>
Message-ID: <>

On 22 Jun 2015 17:59, "Paul Moore" <p.f.moore at> wrote:
> On 22 June 2015 at 15:59, Chris Angelico <rosuav at> wrote:
> > On Mon, Jun 22, 2015 at 6:30 PM, David Townshend <aquavitae69 at>
> >> 3.  There are several other minor irritations where a common pattern
> >> requires several lines or the use of a lower-level library such as
> >> shutil.  For example:
> >>
> >>   *  Recursively remove a directory (no sane way using pathlib alone)
> >>     shutil.rmtree(str(path))
> >
> > I'm not sure shutil should be considered a lower-level library. It's a
> > separate set of tools aimed at shell-like functionality. Removing a
> > directory tree seems right for shutil; what if shutil.rmtree() would
> > accept a Path object as an alternative to a str? That'd make
> > reasonable sense, and it'd feel like the two modules were working well
> > together.
> Agreed, shutil is higher level than pathlib, not lower.
> Having more stdlib functions (shutil is the most obvious example, but
> there are others) take pathlib.Path objects as well as strings would
> be a good change (and would set a nice example for 3rd party file
> manipulation modules). I'm sure the usual "patches welcome" applies
> :-)
> The main irritation about using "higher level" modules with path
> objects is the proliferation of str() calls. Accepting path objects
> natively fixes that:
>     from shutil import rmtree
>     rmtree(path)
> looks fine to me.
> Paul

I was going on the fact that the PEP talks about possibly including shutil
functions, but I have no problem with making them accept Paths instead. If
that's the best approach I'll see if I can put together a patch.


From greg at  Mon Jun 22 19:03:11 2015
From: greg at (Gregory P. Smith)
Date: Mon, 22 Jun 2015 17:03:11 +0000
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Jun 20, 2015 at 5:42 PM Chris Angelico <rosuav at> wrote:

> On Sun, Jun 21, 2015 at 7:42 AM, Eric Snow <ericsnowcurrently at>
> wrote:
> > * disallow forking within subinterpreters
> I love the idea as a whole (if only because the detractors can be told
> "Just use subinterpreters, then you get concurrency"), but this seems
> like a tricky restriction. That means no subprocess.Popen, no shelling
> out to other applications. And I don't know what of other restrictions
> might limit any given program. Will it feel like subinterpreters are
> "write your code according to these tight restrictions and it'll
> work", or will it be more of "most programs will run in parallel just
> fine, but there are a few things to be careful of"?

It wouldn't disallow use of subprocess, only os.fork(). C extension modules
can always fork. The restriction being placed in this scheme is: "if your
extension module code forks from a subinterpreter, the child process MUST
not return control to Python."

I'm not sure if this restriction would actually be *needed* or not but I
agree with it regardless.


From greg at  Mon Jun 22 19:37:01 2015
From: greg at (Gregory P. Smith)
Date: Mon, 22 Jun 2015 17:37:01 +0000
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Jun 21, 2015 at 4:56 AM Devin Jeanpierre <jeanpierreda at>

> On Sat, Jun 20, 2015 at 4:16 PM, Eric Snow <ericsnowcurrently at>
> wrote:
> >
> > On Jun 20, 2015 4:55 PM, "Devin Jeanpierre" <jeanpierreda at>
> wrote:
> >>
> >> It's worthwhile to consider fork as an alternative.  IMO we'd get a
> >> lot out of making forking safer, easier, and more efficient. (e.g.
> >> respectively: adding an atfork registration mechanism; separating out
> >> the bits of multiprocessing that use pickle from those that don't;
> >> moving the refcount to a separate page, or allowing it to be frozen
> >> prior to a fork.)
> >
> > So leverage a common base of code with the multiprocessing module?
> What is this question in response to? I don't understand.
> > I would expect subinterpreters to use less memory.  Furthermore creating
> > them would be significantly faster.  Passing objects between them would
> > be much more efficient.  And, yes, cross-platform.
> Maybe I don't understand how subinterpreters work. AIUI, the whole
> point of independent subinterpreters is that they share no state. So
> if I have a web server, each independent serving thread has to do all
> of the initialization (import HTTP libraries, etc.), right? Compare
> with forking, where the initialization is all done and then you fork,
> and you are immediately ready to serve, using the data structures
> shared with all the other workers, which is only copied when it is
> written to.

Unfortunately CPython subinterpreters do share some state, though it is not
visible to the running code in many cases.  Thus the other mentions of
"wouldn't it be nice if CPython didn't assume a single global state per
process" (100% agreed, but tangential to this discussion)...

You are correct that some things that could make sense to share, such as
imported modules, would not be shared as they are in a forked environment.

This is an important oddity of subinterpreters: They have to re-import
everything other than extension modules. When you've got a big process with
a ton of modules (like, say, 100s of protocol buffers...), that's going to
be a non-starter (pun intended) for the use of threads+subinterpreters as a
fast form of concurrency if they need to import most of those from each
subinterpreter. startup latency and cpu usage += lots. (possibly uses more
memory as well but given our existing refcount implementation forcing
needless PyObject page writes during a read causing fork to
copy-on-write... impossible to guess)

What this means for subinterpreters in this case is not much different from
starting up multiple worker processes: You need to start them up and wait
for them to be ready to serve, then reuse them as long as feasible before
recycling them to start up a new one. The startup cost is high.
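
For contrast, the fork-based model Devin describes (initialise once, then
fork workers that inherit the result) can be sketched as follows. This
assumes a POSIX platform with os.fork. TABLE and ask_forked_worker are
hypothetical stand-ins for expensive start-up state and a worker request.

```python
import os

# Stand-in for expensive one-time initialisation (imports, lookup tables).
TABLE = {n: n * n for n in range(1000)}


def ask_forked_worker(key):
    """Fork a child that answers from the inherited TABLE, then exits."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: TABLE is already populated, shared copy-on-write.
        status = 1
        try:
            os.close(read_fd)
            os.write(write_fd, str(TABLE[key]).encode())
            status = 0
        finally:
            os._exit(status)  # never return into the parent's call stack
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        payload = reader.read()
    os.waitpid(pid, 0)
    return int(payload)
```

The child pays no import or set-up cost of its own, which is exactly what a
fresh subinterpreter or a freshly started worker process cannot avoid.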

I'm not entirely sold on this overall proposal, but I think a result of it
*could* be to make our subinterpreter support better which would be a good

We have had to turn people away from subinterpreters in the past for use as
part of their multithreaded C++ server where they wanted to occasionally
run some Python code in embedded interpreters as part of serving some
requests. Doing that would suddenly single thread their application
(GIIIIIIL!) for all requests currently executing Python code despite
multiple subinterpreters. The general advice for that: Run multiple Python
processes and make RPCs to those from the C++ code. It allows for
parallelism and ultimately scales better, if ever needed, as it can be
easily spread across machines. Which one is more complex to maintain? Good


> Re passing objects, see below.
> I do agree it's cross-platform, but right now that's the only thing I
> agree with.
> >> Note: I don't count the IPC cost of forking, because at least on
> >> linux, any way to efficiently share objects between independent
> >> interpreters in separate threads can also be ported to independent
> >> interpreters in forked subprocesses,
> >
> > How so?  Subinterpreters are in the same process.  For this proposal each
> > would be on its own thread.  Sharing objects between them through
> > channels would be more efficient than IPC.  Perhaps I've missed something?
> You might be missing that memory can be shared between processes, not
> just threads, but I don't know.
> The reason passing objects between processes is so slow is currently
> *nearly entirely* the cost of serialization. That is, it's the fact
> that you are passing an object to an entirely separate interpreter,
> and need to serialize the whole object graph and so on. If you can
> make that fast without serialization, for shared memory threads, then
> all the serialization becomes unnecessary, and you can either write to
> a pipe (fast, if it's a non-container), or used shared memory from the
> beginning (instantaneous). This is possible on any POSIX OS. Linux
> lets you go even further.
> -- Devin

From ncoghlan at  Tue Jun 23 00:30:13 2015
From: ncoghlan at (Nick Coghlan)
Date: Tue, 23 Jun 2015 08:30:13 +1000
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On 23 Jun 2015 03:37, "Gregory P. Smith" <greg at> wrote:
> On Sun, Jun 21, 2015 at 4:56 AM Devin Jeanpierre <jeanpierreda at>
>> On Sat, Jun 20, 2015 at 4:16 PM, Eric Snow <ericsnowcurrently at>
>> >
>> > On Jun 20, 2015 4:55 PM, "Devin Jeanpierre" <jeanpierreda at>
>> >>
>> >> It's worthwhile to consider fork as an alternative.  IMO we'd get a
>> >> lot out of making forking safer, easier, and more efficient. (e.g.
>> >> respectively: adding an atfork registration mechanism; separating out
>> >> the bits of multiprocessing that use pickle from those that don't;
>> >> moving the refcount to a separate page, or allowing it to be frozen
>> >> prior to a fork.)
>> >
>> > So leverage a common base of code with the multiprocessing module?
>> What is this question in response to? I don't understand.
>> > I would expect subinterpreters to use less memory.  Furthermore creating
>> > them would be significantly faster.  Passing objects between them would be
>> > much more efficient.  And, yes, cross-platform.
>> Maybe I don't understand how subinterpreters work. AIUI, the whole
>> point of independent subinterpreters is that they share no state. So
>> if I have a web server, each independent serving thread has to do all
>> of the initialization (import HTTP libraries, etc.), right? Compare
>> with forking, where the initialization is all done and then you fork,
>> and you are immediately ready to serve, using the data structures
>> shared with all the other workers, which is only copied when it is
>> written to.
> Unfortunately CPython subinterpreters do share some state, though it is
> not visible to the running code in many cases.  Thus the other mentions of
> "wouldn't it be nice if CPython didn't assume a single global state per
> process" (100% agreed, but tangential to this discussion)...
> You are correct that some things that could make sense to share, such as
> imported modules, would not be shared as they are in a forked environment.
> This is an important oddity of subinterpreters: They have to re-import
> everything other than extension modules. When you've got a big process with
> a ton of modules (like, say, 100s of protocol buffers...), that's going to
> be a non-starter (pun intended) for the use of threads+subinterpreters as a
> fast form of concurrency if they need to import most of those from each
> subinterpreter. startup latency and cpu usage += lots. (possibly uses more
> memory as well but given our existing refcount implementation forcing
> needless PyObject page writes during a read causing fork to
> copy-on-write... impossible to guess)
> What this means for subinterpreters in this case is not much different
> from starting up multiple worker processes: You need to start them up and
> wait for them to be ready to serve, then reuse them as long as feasible
> before recycling them to start up a new one. The startup cost is high.

While I don't believe it's clear from the current text in the PEP (mostly
because I only figured it out while hacking on the prototype
implementation), PEP 432 should actually give us much better control over
how subinterpreters are configured, as many more interpreter settings move
out of global variables and into the interpreter state (the global
variables will still exist, but primarily as an input to the initial
configuration of the main interpreter).

The current state of that work can be seen at

While a lot of things are broken there, it's at least to the point where it
can start running the regression test suite under the new 2-phase
initialisation model.


From sturla.molden at  Tue Jun 23 00:50:09 2015
From: sturla.molden at (Sturla Molden)
Date: Mon, 22 Jun 2015 22:50:09 +0000 (UTC)
Subject: [Python-ideas] solving multi-core Python
References: <>
Message-ID: <>

"Gregory P. Smith" <greg at> wrote:

> What this means for subinterpreters in this case is not much different from
> starting up multiple worker processes: You need to start them up and wait
> for them to be ready to serve, then reuse them as long as feasible before
> recycling them to start up a new one. The startup cost is high.

The startup cost for worker processes is high on Windows. It is very small
on nearly any other OS.


From pmiscml at  Tue Jun 23 01:15:30 2015
From: pmiscml at (Paul Sokolovsky)
Date: Tue, 23 Jun 2015 02:15:30 +0300
Subject: [Python-ideas] millisecond and microsecond times without floats
Message-ID: <20150623021530.74ce1ebe@x230>

Hello from MicroPython, a lean Python implementation
scaling down to run even on microcontrollers 

Our target hardware base oftentimes lacks floating point support, and
using software emulation is expensive. So, we would like to have
versions of some timing functions, taking/returning millisecond and/or
microsecond values as integers.

The functionality we're most interested in:

1. Delays
2. Relative time (from an arbitrary starting point, expected to be
   wrapped)
3. Calculating time differences, with immunity to wrap-around.

The first assumption is to use "time.sleep()" for delays and
"time.monotonic()" for relative time as the base. Would somebody give
alternative/better suggestions?

Second question is how to modify their names for
millisecond/microsecond versions. For sleep(), "msleep" and "usleep"
would be concise possibilities, but that doesn't map well to
monotonic(), leading to "mmonotonic". So, a better idea is to use "_ms"
and "_us" suffixes:

sleep_ms()
sleep_us()
monotonic_ms()
monotonic_us()


Point 3 above isn't currently addressed by time module at all. mentions some internal
workaround for overflows/wrap-arounds on some systems. Due to
lean-ness of our hardware base, we'd like to make this matter explicit
to the applications and avoid internal workarounds. Proposed solution
is to have time.elapsed(time1, time2) function, which can take values
as returned by monotonic_ms(), monotonic_us(). Assuming that results of
both functions are encoded and wrap consistently (this is a reasonable
assumption), there's no need for 2 separate elapsed_ms(), elapsed_us()
functions.

So, the above are rough ideas we (well, I) have. We'd like to get wider
Python community feedback on them, see if there're better/alternative
ideas, how Pythonic it is, etc. To clarify, this should not be construed
as proposal to add the above functions to CPython.
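
The wrap-around-immune difference described under point 3 can be sketched
with modular arithmetic. This is only an illustration under stated
assumptions: the 30-bit counter width and the names TICKS_BITS and
elapsed() are hypothetical and not part of the proposal, and the result is
only meaningful if the true interval is shorter than one counter period.

```python
# Hypothetical sketch: the 30-bit counter width is an assumption made for
# illustration; the proposal does not specify one.
TICKS_BITS = 30
TICKS_PERIOD = 1 << TICKS_BITS
TICKS_MASK = TICKS_PERIOD - 1


def elapsed(start, end):
    """Return the ticks from start to end, tolerating one wrap-around.

    Both values must come from the same wrapping counter, and the real
    interval must be shorter than TICKS_PERIOD.
    """
    # Masking the difference folds a negative result (end wrapped past
    # zero) back into the range 0 .. TICKS_PERIOD - 1.
    return (end - start) & TICKS_MASK


print(elapsed(100, 250))              # no wrap: 150
print(elapsed(TICKS_PERIOD - 5, 10))  # counter wrapped once: 15
```

Because the same mask works whatever unit the counter ticks in, one
function can serve both the millisecond and the microsecond counters, as
long as both wrap at the same period.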

Best regards,
 Paul                          mailto:pmiscml at

From greg at  Tue Jun 23 01:29:17 2015
From: greg at (Gregory P. Smith)
Date: Mon, 22 Jun 2015 23:29:17 +0000
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jun 22, 2015 at 3:51 PM Sturla Molden <sturla.molden at>

> "Gregory P. Smith" <greg at> wrote:
> > What this means for subinterpreters in this case is not much different
> > from starting up multiple worker processes: You need to start them up and
> > wait for them to be ready to serve, then reuse them as long as feasible
> > before recycling them to start up a new one. The startup cost is high.
> The startup cost for worker processes is high on Windows. It is very small
> on nearly any other OS.

While I understand that Windows adds some overhead there, startup time for
Python worker processes is high on all OSes.

Python startup is slow in general. It slows down further based on the
modules you must import before you can begin work.


From jeanpierreda at  Tue Jun 23 01:51:34 2015
From: jeanpierreda at (Devin Jeanpierre)
Date: Mon, 22 Jun 2015 16:51:34 -0700
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jun 22, 2015 at 4:29 PM, Gregory P. Smith <greg at> wrote:
> On Mon, Jun 22, 2015 at 3:51 PM Sturla Molden <sturla.molden at>
> wrote:
>> "Gregory P. Smith" <greg at> wrote:
>> > What this means for subinterpreters in this case is not much different
>> > from starting up multiple worker processes: You need to start them up
>> > and wait for them to be ready to serve, then reuse them as long as
>> > feasible before recycling them to start up a new one. The startup cost
>> > is high.
>> The startup cost for worker processes is high on Windows. It is very small
>> on nearly any other OS.
> While I understand that Windows adds some overhead there, startup time for
> Python worker processes is high on all OSes.
> Python startup is slow in general. It slows down further based on the
> modules you must import before you can begin work.

Python does *very* little work on fork, which is what Sturla is
alluding to. (Fork doesn't exist on Windows.)

The only part I've found forking to be slow with is if you need to
delay initialization of a thread pool and everything that depends on a
thread pool until after the fork. This could hypothetically be made
faster with subinterpreters if the thread pool was shared among all
subinterpreters (e.g. if it was written in C.), but I would *expect*
fork to be faster overall.
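
A minimal sketch of the pre-fork pattern being described, assuming a POSIX
system: the expensive imports happen once in the parent, and a forked
child can use them immediately. The function name run_in_forked_worker and
the use of json as a stand-in for a heavy import are illustrative
assumptions only.

```python
import json  # stands in for an expensive import done once, before forking
import os


def run_in_forked_worker(payload):
    """Fork a child that reuses already-imported modules and reports back.

    POSIX only: os.fork() does not exist on Windows.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: everything the parent imported is already in memory,
        # so no initialization is repeated here.
        os.close(read_fd)
        try:
            os.write(write_fd, json.dumps(payload).encode("utf-8"))
        finally:
            os._exit(0)  # never fall back into the parent's code path
    # Parent: read until the child closes its end, then reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        data = reader.read()
    os.waitpid(pid, 0)
    return json.loads(data.decode("utf-8"))


print(run_in_forked_worker({"job": 1}))
```

The child does no import work of its own, which is the sense in which
very little happens at fork time.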

That said, worker startup time is not actually very interesting
anyway, since workers should restart rarely. I think its biggest
impact is probably the time it takes to start your entire task from
scratch.

-- Devin

From njs at  Tue Jun 23 01:59:40 2015
From: njs at (Nathaniel Smith)
Date: Mon, 22 Jun 2015 16:59:40 -0700
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jun 22, 2015 at 10:37 AM, Gregory P. Smith <greg at> wrote:
> This is an important oddity of subinterpreters: They have to re-import
> everything other than extension modules. When you've got a big process with
> a ton of modules (like, say, 100s of protocol buffers...), that's going to
> be a non-starter (pun intended) for the use of threads+subinterpreters as a
> fast form of concurrency if they need to import most of those from each
> subinterpreter. startup latency and cpu usage += lots. (possibly uses more
> memory as well but given our existing refcount implementation forcing
> needless PyObject page writes during a read causing fork to copy-on-write...
> impossible to guess)
> What this means for subinterpreters in this case is not much different from
> starting up multiple worker processes: You need to start them up and wait
> for them to be ready to serve, then reuse them as long as feasible before
> recycling them to start up a new one. The startup cost is high.

One possibility would be for subinterpreters to copy modules from the
main interpreter -- I guess your average module is mostly dicts,
strings, type objects, and functions; strings and functions are
already immutable and could be shared without copying, and I guess
copying the dicts and type objects into the subinterpreter is much
cheaper than hitting the disk etc. to do a real import. (Though
certainly not free.)

This would have interesting semantic implications -- it would give
similar effects to fork(), with subinterpreters starting from a
snapshot of the main interpreter's global state.

> I'm not entirely sold on this overall proposal, but I think a result of it
> could be to make our subinterpreter support better which would be a good
> thing.
> We have had to turn people away from subinterpreters in the past for use as
> part of their multithreaded C++ server where they wanted to occasionally run
> some Python code in embedded interpreters as part of serving some requests.
> Doing that would suddenly single thread their application (GIIIIIIL!) for
> all requests currently executing Python code despite multiple
> subinterpreters.

I've also talked to HPC users who discovered this problem the hard way
(e.g., folks working on the Large Hadron
Collider) -- they've been using Python as an extension language in
some large physics codes but are now porting those bits to C++ because
of the GIL issues. (In this context startup overhead should be easily
amortized, but switching to an RPC model is not going to happen.)


Nathaniel J. Smith --

From rosuav at  Tue Jun 23 01:59:51 2015
From: rosuav at (Chris Angelico)
Date: Tue, 23 Jun 2015 09:59:51 +1000
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jun 23, 2015 at 3:03 AM, Gregory P. Smith <greg at> wrote:
> On Sat, Jun 20, 2015 at 5:42 PM Chris Angelico <rosuav at> wrote:
>> On Sun, Jun 21, 2015 at 7:42 AM, Eric Snow <ericsnowcurrently at>
>> wrote:
>> > * disallow forking within subinterpreters
>> I love the idea as a whole (if only because the detractors can be told
>> "Just use subinterpreters, then you get concurrency"), but this seems
>> like a tricky restriction. That means no subprocess.Popen, no shelling
>> out to other applications. And I don't know what other restrictions
>> might limit any given program. Will it feel like subinterpreters are
>> "write your code according to these tight restrictions and it'll
>> work", or will it be more of "most programs will run in parallel just
>> fine, but there are a few things to be careful of"?
> It wouldn't disallow use of subprocess, only os.fork(). C extension modules
> can always fork. The restriction being placed in this scheme is: "if your
> extension module code forks from a subinterpreter, the child process MUST
> not return control to Python."
> I'm not sure if this restriction would actually be needed or not but I agree
> with it regardless.

Oh! That's fine, then. Sounds good to me!


From rosuav at  Tue Jun 23 02:03:10 2015
From: rosuav at (Chris Angelico)
Date: Tue, 23 Jun 2015 10:03:10 +1000
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jun 23, 2015 at 9:59 AM, Nathaniel Smith <njs at> wrote:
> One possibility would be for subinterpreters to copy modules from the
> main interpreter -- I guess your average module is mostly dicts,
> strings, type objects, and functions; strings and functions are
> already immutable and could be shared without copying, and I guess
> copying the dicts and type objects into the subinterpreter is much
> cheaper than hitting the disk etc. to do a real import. (Though
> certainly not free.)

FWIW, functions aren't immutable, but code objects are.
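
A small illustration of that distinction in CPython; the function name
greet and the variable code_is_mutable are made up for the example.

```python
def greet():
    return "hello"


# Function objects are mutable: arbitrary attributes can be attached and
# even the code they run can be rebound.
greet.calls = 0
greet.__code__ = (lambda: "goodbye").__code__
print(greet())

# Code objects, by contrast, reject attribute assignment.
try:
    greet.__code__.co_name = "renamed"
    code_is_mutable = True
except (AttributeError, TypeError):
    code_is_mutable = False
print(code_is_mutable)
```

So sharing code objects between interpreters looks plausible, while
function objects would need copying or some other form of protection.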


From greg at  Tue Jun 23 02:03:14 2015
From: greg at (Gregory P. Smith)
Date: Tue, 23 Jun 2015 00:03:14 +0000
Subject: [Python-ideas] millisecond and microsecond times without floats
In-Reply-To: <20150623021530.74ce1ebe@x230>
References: <20150623021530.74ce1ebe@x230>
Message-ID: <>

On Mon, Jun 22, 2015 at 4:15 PM Paul Sokolovsky <pmiscml at> wrote:

> Hello from MicroPython, a lean Python implementation
> scaling down to run even on microcontrollers
> Our target hardware base oftentimes lacks floating point support, and
> using software emulation is expensive. So, we would like to have
> versions of some timing functions, taking/returning millisecond and/or
> microsecond values as integers.
> The most functionality we're interested in:
> 1. Delays
> 2. Relative time (from an arbitrary starting point, expected to be
>    wrapped)
> 3. Calculating time differences, with immunity to wrap-around.
> The first assumption is to use "time.sleep()" for delays and
> "time.monotonic()" for relative time as the base. Would somebody give
> alternative/better suggestions?
> Second question is how to modify their names for
> millisecond/microsecond versions. For sleep(), "msleep" and "usleep"
> would be concise possibilities, but that doesn't map well to
> monotonic(), leading to "mmonotonic". So, better idea is to use "_ms"
> and "_us" suffixes:
> sleep_ms()
> sleep_us()
> monotonic_ms()
> monotonic_us()

If you're going to add new function names, going with the _unit suffix
seems best.

Another option to consider: keyword only arguments.

# We could use the long form names milliseconds, microseconds and
# nanoseconds, but I worry that with those people would inevitably confuse
# ms with microseconds, as times and APIs are usually given the standard
# abbreviations rather than spelled out.

time.monotonic(return_int_ns=True) ?
# This seems ugly.  time.monotonic_ns() seems better.

These should be acceptable to add to Python 3.6 for consistency.

I do not think we should have functions for each ms/us/ns unit if adding
functions.  Just choose the most useful high precision unit and let people
do the math as needed for the others.
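
As a rough sketch of "let people do the math", assuming integer
nanoseconds are the single high-precision unit: the helper names below
are hypothetical, and everything stays in integer arithmetic so no
floating point support is needed.

```python
NS_PER_US = 1000
NS_PER_MS = 1000000


def ns_to_us(ns):
    """Whole microseconds in an integer nanosecond count."""
    return ns // NS_PER_US


def ns_to_ms(ns):
    """Whole milliseconds in an integer nanosecond count."""
    return ns // NS_PER_MS


sample_ns = 31415926535
print(ns_to_us(sample_ns))  # 31415926
print(ns_to_ms(sample_ns))  # 31415
```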

> Point 3 above isn't currently addressed by time module at all.
> mentions some internal
> workaround for overflows/wrap-arounds on some systems. Due to
> lean-ness of our hardware base, we'd like to make this matter explicit
> to the applications and avoid internal workarounds. Proposed solution
> is to have time.elapsed(time1, time2) function, which can take values
> as returned by monotonic_ms(), monotonic_us(). Assuming that results of
> both functions are encoded and wrap consistently (this is reasonable
> assumption), there's no need for 2 separate elapsed_ms(), elapsed_us()
> function.

Reading the PEP my takeaway is that wrap-around of underlying deficient
system APIs should be handled by the Python VM for the user. It sounds like
we should explicitly spell this out though.

I don't think time.elapsed() could ever provide any utility in either case,
just use subtraction. time.elapsed() wouldn't know when and where the time
values came from and magically be able to apply wrap around or not to them.


> So, the above are rough ideas we (well, I) have. We'd like to get wider
> Python community feedback on them, see if there're better/alternative
> ideas, how Pythonic it is, etc. To clarify, this should not be construed
> as proposal to add the above functions to CPython.
> --
> Best regards,
>  Paul                          mailto:pmiscml at

From cs at  Tue Jun 23 01:00:41 2015
From: cs at (Cameron Simpson)
Date: Tue, 23 Jun 2015 09:00:41 +1000
Subject: [Python-ideas] Responsive signal handling
In-Reply-To: <>
References: <>
Message-ID: <>

On 22Jun2015 01:42, Devin Jeanpierre <jeanpierreda at> wrote:
>On Mon, Jun 22, 2015 at 12:52 AM, Cameron Simpson <cs at> wrote:
>> On 21Jun2015 19:16, Devin Jeanpierre <jeanpierreda at> wrote:
>>> Operations that run in C but just
>>> take a long time, or that are part of third-party code, will continue
>>> to inhibit responsiveness.
>>> - Run signal handlers in a dedicated separate thread.
>>> IMO this is generally better than running signal handlers in the main
>>> thread, because it eliminates the separate concept of "async-safe" and
>>> just requires "thread-safe". So you can use regular threading
>>> synchronization primitives for safety, instead of relying on luck /
>>> memorized lists of atomic/re-entrant operations.
>> Yes, I am in favour of this or something like it. Personally I would go for
>> either or both of:
>>  - a stdlib function to specify the thread to handle signals instead of main
>This just moves the problem to another thread. One can already today
>try to keep the main thread free to handle signals, it's just hard.

Yes. But it is very easy to ensure that a special purpose Thread is free to 
handle signals. And it is arguably the minimalist change.

>>  - a stdlib function to declare that signals should immediately place a nice
>> descriptive "signal" object on a Queue, and leaves it to the user to handle
>> the queue (for example, by spawning a thread to consume it)
>I like this. It mirrors Linux's signalfd, too. One small correction,
>it can't literally be a Queue, because those aren't safe to use in
>signal handlers. (It can be a pipe that is wrapped in a Queue-like
>interface, though, and if we do that, we can even use native signalfd
>if we want.)
>It also resolves an unspoken concern I had, which is that silently
>starting threads for the user feels icky.

I wasn't proposing silently starting threads. I imagined the former suggestion 
would be handed a thread as the signal target.
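
The "pipe wrapped in a Queue-like interface" idea can be approximated
today with the existing signal.set_wakeup_fd() function. The sketch below
is an assumption-laden illustration, not a proposed stdlib API: the name
start_signal_queue is hypothetical, it is POSIX-oriented, and it must be
called from the main thread. Nothing touches the Queue at signal time;
the interpreter only writes the signal number to the pipe, and an
ordinary thread moves it onto the Queue.

```python
import os
import queue
import signal
import threading


def start_signal_queue(signums):
    """Return a queue.Queue that receives the numbers of delivered signals.

    Hypothetical helper; call it from the main thread.
    """
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # set_wakeup_fd needs a non-blocking fd
    signal.set_wakeup_fd(write_fd)
    for signum in signums:
        # A Python-level handler must be installed, otherwise the default
        # action (often process termination) applies. It does nothing here.
        signal.signal(signum, lambda signum, frame: None)

    events = queue.Queue()

    def consume():
        while True:
            data = os.read(read_fd, 1)  # blocks until a signal byte arrives
            if not data:
                break
            events.put(data[0])  # runs in a normal thread, not a handler

    threading.Thread(target=consume, daemon=True).start()
    return events


events = start_signal_queue([signal.SIGUSR1])
os.kill(os.getpid(), signal.SIGUSR1)
print(events.get(timeout=5) == signal.SIGUSR1)
```

The user owns the consuming side, so no thread is started behind the
user's back beyond the one explicitly created here.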

>>> Something still needs to run in the main thread though, for e.g.
>>> KeyboardInterrupt, so this is not super straightforward.
>> Is this necessarily true?
>What I mean is that there needs to be a way to raise KeyboardInterrupt
>in the main thread from a signal handler. If, as you suggest, the old
>behavior stays around, then that's enough.

I was imagining the old behaviour stayed around by default, not necessarily as 
fixed behaviour. But "KeyboardInterrupt occurs in the main thread" is handy.

Perhaps a better solution here is not to keep KeyboardInterrupt special (i.e.  
always going to the main thread) but to extend "raise" to accept a thread 
argument:

  raise blah in thread

Given that signals are already presented as occurring between Python opcodes, it 
seems reasonable to me that the signal situation could be addressed with a 
common mechanism extended to exceptions.

How often is the question "how do I terminate another thread?" raised on 
python-list? Often. The standard answer is "set a flag and have the thread 
consult it". That is very sensitive to how often the flag is polled: too often 
and it infuses the code with noise (not to mention ungainly loop termination 
logic etc), too infrequently and the response is much like your issue here with 
signals: it can be arbitrarily delayed.
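
For reference, the standard "set a flag and have the thread consult it"
answer looks roughly like this. The names worker, stop_requested and log
are illustrative; the poll interval is the knob that trades code noise
against response latency.

```python
import threading
import time


def worker(stop_requested, poll_interval, log):
    # The loop has to be written around the flag: this is the polling
    # noise that an exception raised from outside would remove.
    while not stop_requested.is_set():
        log.append("tick")
        # wait() doubles as the sleep and the poll; a longer interval
        # means a slower reaction to the stop request.
        stop_requested.wait(poll_interval)
    log.append("stopped")


stop_requested = threading.Event()
log = []
t = threading.Thread(target=worker, args=(stop_requested, 0.01, log))
t.start()
time.sleep(0.05)
stop_requested.set()
t.join(timeout=5)
print(log[-1])
```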

Suppose one could raise signals in another thread? Then the answer becomes 
"raise exception in other_thread". And the other thread will abort as soon as 
the next python opcode would fire.

It has several advantages:

  it removes any need to poll some shared state, or to set up shared state

  it lets the target thread remain nice and pythonic, letting unhandled 
  exceptions simply abort the thread automatically as they would anyway

  it lets the target thread catch the exception and handle it if desired

  it dovetails neatly with our hypothetical special signal handling thread: the 
  handling thread has merely to "raise KeyboardInterrupt in main_thread" to get 
  the behaviour you seek to preserve, _without_ making SIGINT specially handled 
  - the specialness is not an aspect of the handling thread's code, not 

>Another option, if we went with a dedicated signal handling thread,
>would be that uncaught exceptions propagate to the main thread when it
>gets around to it.

Perhaps. But I'd rather not; you _can_ always catch every exception and if we 
have "raise exception in thread" we can implement the above trivially for 
programs which want it.

Cameron Simpson <cs at>

Nothing is impossible for the man who doesn't have to do it.

From alexander.belopolsky at  Tue Jun 23 02:35:45 2015
From: alexander.belopolsky at (Alexander Belopolsky)
Date: Mon, 22 Jun 2015 20:35:45 -0400
Subject: [Python-ideas] millisecond and microsecond times without floats
In-Reply-To: <>
References: <20150623021530.74ce1ebe@x230>
Message-ID: <>

On Mon, Jun 22, 2015 at 8:03 PM, Gregory P. Smith <greg at> wrote:

> # We could use the long form names milliseconds, microseconds and
> nanoseconds but i worry with those that people would inevitably confuse ms
> with microseconds as times and APIs usually given the standard
> abbreviations rather than spelled out.

Note that datetime.timedelta uses long names:

>>> from datetime import timedelta
>>> timedelta(milliseconds=5, microseconds=3)
datetime.timedelta(0, 0, 5003)

From greg at  Tue Jun 23 02:39:14 2015
From: greg at (Gregory P. Smith)
Date: Tue, 23 Jun 2015 00:39:14 +0000
Subject: [Python-ideas] millisecond and microsecond times without floats
In-Reply-To: <>
References: <20150623021530.74ce1ebe@x230>
Message-ID: <>

On Mon, Jun 22, 2015 at 5:35 PM Alexander Belopolsky <
alexander.belopolsky at> wrote:

> On Mon, Jun 22, 2015 at 8:03 PM, Gregory P. Smith <greg at> wrote:
>> # We could use the long form names milliseconds, microseconds and
>> nanoseconds but i worry with those that people would inevitably confuse ms
>> with microseconds as times and APIs usually given the standard
>> abbreviations rather than spelled out.
> Note that datetime.timedelta uses long names:
> >>> timedelta(milliseconds=5, microseconds=3)
> datetime.timedelta(0, 0, 5003)

That is a good vote for consistency with its API...

From ncoghlan at  Tue Jun 23 05:52:47 2015
From: ncoghlan at (Nick Coghlan)
Date: Tue, 23 Jun 2015 13:52:47 +1000
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On 23 June 2015 at 10:03, Chris Angelico <rosuav at> wrote:
> On Tue, Jun 23, 2015 at 9:59 AM, Nathaniel Smith <njs at> wrote:
>> One possibility would be for subinterpreters to copy modules from the
>> main interpreter -- I guess your average module is mostly dicts,
>> strings, type objects, and functions; strings and functions are
>> already immutable and could be shared without copying, and I guess
>> copying the dicts and type objects into the subinterpreter is much
>> cheaper than hitting the disk etc. to do a real import. (Though
>> certainly not free.)
> FWIW, functions aren't immutable, but code objects are.

Anything we come up with for optimised data sharing via channels could
be applied to passing a prebuilt sys.modules dictionary through to
subinterpreters.

The key for me is to start from a well-defined "shared nothing"
semantic model, but then look for ways to exploit the fact that we
actually *are* running in the same address space to avoid copying.

The current reference-counts-embedded-in-the-object-structs memory
layout also plays havoc with the all-or-nothing page level
copy-on-write semantics used by the fork() syscall at the operating
system layer, so some of the ideas we've been considering
(specifically, those related to moving the reference counter
bookkeeping out of the object structs themselves) would potentially
help with that as well (but would also have other hard to predict
performance consequences).
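
A tiny CPython-specific illustration of why embedded reference counts
interact badly with copy-on-write: merely taking another reference to an
object, without changing its data, writes to the object's header. The
variable names are arbitrary.

```python
import sys

obj = object()
before = sys.getrefcount(obj)
alias = obj  # no data is modified, yet the refcount field is written
after = sys.getrefcount(obj)

print(after - before)  # one extra reference
```

In a forked child, a write like this is enough to force the kernel to
copy the whole page that holds the object.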

There's a reason Eric announced this as the *start* of a research
project, rather than as a finished proposal - while it seems
conceptually sound overall, there are a vast number of details to be
considered that will no doubt hold a great many devils :)


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From cs at  Tue Jun 23 06:37:46 2015
From: cs at (Cameron Simpson)
Date: Tue, 23 Jun 2015 14:37:46 +1000
Subject: [Python-ideas] Responsive signal handling
In-Reply-To: <>
References: <>
Message-ID: <>

On 22Jun2015 16:53, Paul Moore <p.f.moore at> wrote:
>On 22 June 2015 at 09:42, Devin Jeanpierre <jeanpierreda at> wrote:
>> On Mon, Jun 22, 2015 at 12:52 AM, Cameron Simpson <cs at> wrote:
>>> On 21Jun2015 19:16, Devin Jeanpierre <jeanpierreda at> wrote:
>>>> This is not possible in general, but it can be made to work for all
>>>> blocking operations in the stdlib.
>>> Hmm. I'm not sure that you will find this universally so. No, I have no
>>> examples proving my intuition here.
>> If you fix select, everything else can hypothetically follow, as you
>> can run all C functions in another thread and signal the result is
>> ready using an fd. (This is my idea for a hacky third-party library.
>> It only ruins stack traces!)
>This particular approach presumably only works on Unix? (On Windows,
>select is not a general signalling operation, it only works for
>sockets). Presumably a cross-platform solution would need to use
>appropriate OS-native signalling based on the platform?


Cameron Simpson <cs at>

The thought of suicide is a comforting one, for with it has come a calm
passage through many a bad night.       - Fred Nieztsche

From cs at  Tue Jun 23 06:44:11 2015
From: cs at (Cameron Simpson)
Date: Tue, 23 Jun 2015 14:44:11 +1000
Subject: [Python-ideas] Responsive signal handling
In-Reply-To: <>
References: <>
Message-ID: <>

On 22Jun2015 10:28, Guido van Rossum <guido at> wrote:
>I would regret losing the behavior where just raising an exception in a
>signal handler causes the main thread to be interrupted by that exception.

I don't think any of us is suggesting losing that as the default situation. I 
can see that losing this could be a side effect of a program choosing one of 
these alternatives.

>I agree it would be nice if handlers ran when the main thread is waiting
>for I/O.

Hmm. That sounds doable (I speak as one totally unfamiliar with CPython's 
internals:-) Is this affected or improved by the recent discussions about I/O 
restarting over a signal?

Cameron Simpson <cs at>

No system, regardless of how sophisticated, can repeal the laws of physics or
overcome careless driving actions.      - Mercedes Benz

From abarnert at  Tue Jun 23 07:42:13 2015
From: abarnert at (Andrew Barnert)
Date: Mon, 22 Jun 2015 22:42:13 -0700
Subject: [Python-ideas] Responsive signal handling
In-Reply-To: <>
References: <>
Message-ID: <>

On Jun 22, 2015, at 16:00, Cameron Simpson <cs at> wrote:
> Perhaps a better solution here is not to keep KeyboardInterrupt special (i.e.  always going to the main thread) but to extend "raise" to accept a thread argument:
> raise blah in thread

Does this need to be syntax? Why not just:


This could even use the same mechanism as signals in 3.6, while possibly being backportable to something hackier in a C extension module for older versions.
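
As a sketch of the "something hackier" route: CPython already exposes a
C API call, PyThreadState_SetAsyncExc, that schedules an exception in
another thread, and it can be reached through ctypes. The helper name
throw_in_thread, the Cancelled exception and demo() are hypothetical;
this is CPython-specific, and the exception is only noticed once the
target thread is back to executing Python bytecode, so a thread blocked
in a long C call will not react until that call returns.

```python
import ctypes
import threading
import time


def throw_in_thread(thread, exc_type):
    """Ask CPython to raise exc_type in a running thread (CPython only)."""
    modified = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_ulong(thread.ident), ctypes.py_object(exc_type))
    if modified != 1:
        raise ValueError("thread not found or not running")


class Cancelled(Exception):
    pass


def demo():
    outcome = []

    def loop():
        try:
            while True:
                time.sleep(0.01)
        except Cancelled:
            # The target thread can catch and handle the exception.
            outcome.append("cancelled")

    t = threading.Thread(target=loop)
    t.start()
    time.sleep(0.05)
    throw_in_thread(t, Cancelled)
    t.join(timeout=5)
    return outcome, t.is_alive()


print(demo())
```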

From random832 at  Tue Jun 23 07:56:50 2015
From: random832 at (random832 at
Date: Tue, 23 Jun 2015 01:56:50 -0400
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <20150622004157.52bc3239@fsol>
References: <>
Message-ID: <>

On Sun, Jun 21, 2015, at 18:41, Antoine Pitrou wrote:
> It's actually already the case in POSIX that most things are illegal
> between fork() and exec(). However, to make fork() practical, many
> libraries or frameworks tend to ignore those problems deliberately.

I'm not _entirely_ sure that this applies to single-threaded programs,
or even to multi-threaded programs that don't use constructs that will
cause problems.

The text is: "A process shall be created with a single thread. If a
multi-threaded process calls fork(), the new process shall contain a
replica of the calling thread and its entire address space, possibly
including the states of mutexes and other resources. Consequently, to
avoid errors, the child process may only execute async-signal-safe
operations until such time as one of the exec functions is called. Fork
handlers may be established by means of the pthread_atfork() function in
order to maintain application invariants across fork() calls."

Note that it uses "may only" (which is ambiguous) rather than "shall
only". It could be read that "only [stuff] until exec" is a suggestion
of what the child process "may" do, under the circumstances described,
to avoid the particular problems being discussed, rather than as a
general prohibition.

And the next paragraph is "When the application calls fork() from a
signal handler and any of the fork handlers registered by
pthread_atfork() calls a function that is not async-signal-safe, the
behavior is undefined." suggesting that the behavior is _not_ likewise
undefined when it was not called from a signal handler.

Now, *vfork* is a ridiculous can of worms, which is why nobody uses it
anymore, and certainly not within Python.

From njs at  Tue Jun 23 08:18:24 2015
From: njs at (Nathaniel Smith)
Date: Mon, 22 Jun 2015 23:18:24 -0700
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jun 22, 2015 at 10:56 PM,  <random832 at> wrote:
> On Sun, Jun 21, 2015, at 18:41, Antoine Pitrou wrote:
>> It's actually already the case in POSIX that most things are illegal
>> between fork() and exec(). However, to make fork() practical, many
>> libraries or frameworks tend to ignore those problems deliberately.
> I'm not _entirely_ sure that this applies to single-threaded programs,
> or even to multi-threaded programs that don't use constructs that will
> cause problems.
> The text is: "A process shall be created with a single thread. If a
> multi-threaded process calls fork(), the new process shall contain a
> replica of the calling thread and its entire address space, possibly
> including the states of mutexes and other resources. Consequently, to
> avoid errors, the child process may only execute async-signal-safe
> operations until such time as one of the exec functions is called. Fork
> handlers may be established by means of the pthread_atfork() function in
> order to maintain application invariants across fork() calls."
> Note that it uses "may only" (which is ambiguous) rather than "shall
> only". It could be read that "only [stuff] until exec" is a suggestion
> of what the child process "may" do, under the circumstances described,
> to avoid the particular problems being discussed, rather than as a
> general prohibition.

Yeah, basically the way this works out is: (a) in practice on
mainstream systems you can get away with forking and then doing
whatever, so long as none of the threads in the parent process were
holding any crucial locks, and the child is prepared for them to have
all disappeared. (b) But, if something does break, then system
builders reserve the right to laugh in your face. You can argue about
things being technically ambiguous or whatever, but that's how it
works.

E.g. if you have a single-threaded program that does a matrix
multiply, then forks, and then the child does a matrix multiply, and
you run it on OS X linked to Apple's standard libraries, then the
child will lock up, and if you report this to Apple they will close it
as not-a-bug.

> And the next paragraph is "When the application calls fork() from a
> signal handler and any of the fork handlers registered by
> pthread_atfork() calls a function that is not async-signal-safe, the
> behavior is undefined." suggesting that the behavior is _not_ likewise
> undefined when it was not called from a signal handler.

I wouldn't read anything into this. pthread_atfork registers three
handlers, and two of them are run in the parent process, where
normally they'd be allowed to call any functions they like.
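
For readers who want to see the three handlers in action from Python: later
CPython versions (3.7+) expose a similar hook as os.register_at_fork(). The
sketch below is only an illustration of which handlers run where; the helper
name and the pipe used to report back from the child are assumptions made
for this example, and it is POSIX-only.

```python
import os


def run_fork_with_handlers():
    """Fork once and report which at-fork handlers ran in each process."""
    events = []
    os.register_at_fork(
        before=lambda: events.append("before"),                    # parent, pre-fork
        after_in_parent=lambda: events.append("after_in_parent"),  # parent, post-fork
        after_in_child=lambda: events.append("after_in_child"),    # child, post-fork
    )

    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: report the handlers it saw, then exit without cleanup.
        os.close(read_fd)
        os.write(write_fd, ",".join(events).encode())
        os._exit(0)

    # Parent: collect the child's report.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        child_events = reader.read().split(",")
    os.waitpid(pid, 0)
    return events, child_events
```

Two of the three handlers ("before" and "after_in_parent") run in the parent,
which matches the point above: only the child-side handler is subject to the
post-fork restrictions.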


Nathaniel J. Smith --

From cs at  Tue Jun 23 08:08:45 2015
From: cs at (Cameron Simpson)
Date: Tue, 23 Jun 2015 16:08:45 +1000
Subject: [Python-ideas] Responsive signal handling
In-Reply-To: <>
References: <>
Message-ID: <>

On 22Jun2015 22:42, Andrew Barnert <abarnert at> wrote:
>On Jun 22, 2015, at 16:00, Cameron Simpson <cs at> wrote:
>> Perhaps a better solution here is not to keep KeyboardInterrupt special (i.e.  always going to the main thread) but to extend "raise" to accept a thread argument:
>> raise blah in thread
>Does this need to be syntax? Why not just:
>    mythread.throw(blah)
>This could even use the same mechanism as signals in 3.6, while possibly being backportable to something hackier in a C extension module for older versions.

Indeed. I think that extending raise's syntax is a little easier on the eye, 
but the advantage is small. Certainly giving threads a throw method would 
function as well.

I was indeed hoping that signals and exceptions could be delivered the same way 
via such a mechanism, which would also allow signals to be delivered to a 
chosen thread.
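
As a rough illustration of what a thread "throw" could look like today,
CPython already has a C-level hook, PyThreadState_SetAsyncExc, reachable
through ctypes. This is the "hackier" route, not the proposed mechanism:
throw_in_thread() and Cancelled are names invented for this sketch, it is
CPython-specific, and the exception is only raised once the target thread
gets back to running Python bytecode.

```python
import ctypes
import threading
import time


class Cancelled(Exception):
    """Hypothetical exception type used only in this sketch."""


def throw_in_thread(thread, exc_type):
    """Ask CPython to raise exc_type asynchronously in the given thread."""
    modified = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_ulong(thread.ident), ctypes.py_object(exc_type))
    return modified == 1  # number of thread states that were updated


def demo():
    seen = []
    started = threading.Event()

    def worker():
        try:
            started.set()
            while True:
                time.sleep(0.01)  # returns to bytecode regularly
        except Cancelled:
            seen.append("cancelled")

    t = threading.Thread(target=worker)
    t.start()
    started.wait()
    delivered = throw_in_thread(t, Cancelled)
    t.join(timeout=5)
    return delivered, seen
```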

Cameron Simpson <cs at>

Thus spake the master programmer: "A well written program is its own heaven;
a poorly-written program its own hell."

From sturla.molden at  Tue Jun 23 13:57:47 2015
From: sturla.molden at (Sturla Molden)
Date: Tue, 23 Jun 2015 13:57:47 +0200
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <mmbhjs$pau$>

On 23/06/15 01:29, Gregory P. Smith wrote:

> While I understand that Windows adds some overhead there, startup time
> for Python worker processes is high on all OSes.

No it is not.

A fork() will clone the process. You don't need to run any 
initialization code after that. You don't need to start a new Python 
interpreter -- you already have one. You don't need to run module 
imports -- they are already imported. You don't need to pickle and build 
Python objects -- they are already there. Everything you had in the 
parent process is ready to use in the child process. This magic happens so 
fast it is comparable to the time it takes Windows to start a thread.

On Windows, CreateProcess starts an "almost empty" process. You 
therefore have a lot of setup code to run. This is what makes starting 
Python processes with multiprocessing so much slower on Windows. Windows
processes are indeed more heavy-weight than threads, but the real issue
is all the setup code you need to run.

On Linux and Mac, you don't need to run any setup code after a fork().
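
A small sketch of that claim, assuming a POSIX system where the "fork" start
method is available. The function name and the size of the table are made up
for the example. The target is a lambda on purpose: it cannot be pickled, so
it only works because the child is a clone of the parent and already has
both the lambda and the table in memory.

```python
import multiprocessing as mp


def demo_fork_inherits_state():
    """Run a child that uses parent state without any pickling or re-import."""
    ctx = mp.get_context("fork")  # POSIX only; not available on Windows

    # Built once, in the parent, before the fork.
    table = {i: str(i) for i in range(100_000)}

    results = ctx.SimpleQueue()
    # The child reads `table` directly from its cloned address space.
    proc = ctx.Process(target=lambda: results.put(len(table)))
    proc.start()
    value = results.get()
    proc.join()
    return value
```

Only the small integer result travels back through the queue; the table
itself is never serialised.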


From j.wielicki at  Tue Jun 23 15:14:25 2015
From: j.wielicki at (Jonas Wielicki)
Date: Tue, 23 Jun 2015 15:14:25 +0200
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <mmbhjs$pau$>
References: <>
Message-ID: <>

On 23.06.2015 13:57, Sturla Molden wrote:
> On 23/06/15 01:29, Gregory P. Smith wrote:
>> While I understand that Windows adds some overhead there, startup time
>> for Python worker processes is high on all OSes.
> No it is not.
> A fork() will clone the process. You don't need to run any
> initialization code after that. You don't need to start a new Python
> interpreter -- you already have one. You don't need to run module
> imports -- they are already imported. You don't need to pickle and build
> Python objects -- they are already there. Everything you had in the
> parent process is ready to use the child process. This magic happens so
> fast it is comparable to the time it takes Windows to start a thread.

To be fair, you will nevertheless get a slowdown when copy-on-write
kicks in on first use of whatever was cloned from the parent. This does
not block execution, but it does slow it down.

That time cannot be measured directly during the fork() call, but I
would still count it as start-up cost.



From trent at  Tue Jun 23 15:53:01 2015
From: trent at (Trent Nelson)
Date: Tue, 23 Jun 2015 09:53:01 -0400
Subject: [Python-ideas] PyParallel update (was: solving multi-core Python)
Message-ID: <>

On Sat, Jun 20, 2015 at 03:42:33PM -0600, Eric Snow wrote:
> Furthermore, removing the GIL is perhaps an obvious solution but not
> the only one.  Others include Trent Nelson's PyParallels, STM, and
> other Python implementations..

So, I've been sprinting relentlessly on PyParallel since Christmas, and
recently reached my v0.0 milestone of being able to handle all the TEFB
tests, plus get the "instantaneous wiki search" thing working too.

The TEFB (Techempower Framework Benchmarks) implementation is here:
    (The aim was to have it compete in this:, but unfortunately they broke their Windows support after round 9, so there's no way to get PyParallel into the official results without fixing that first.)

The wiki thing is here:

I particularly like the wiki example as it leverages a lot of benefits
afforded by PyParallel's approach to parallelism, concurrency and
asynchronous I/O:
    - Load a digital search trie (datrie.Trie) that contains every
      Wikipedia title and the byte-offset within the wiki.xml where
      the title was found.  (Once loaded the RSS of python.exe is about
      11GB; the trie itself has about 16 million items in it.)
    - Load a numpy array of sorted 64-bit integer offsets.  This allows
      us to do a searchsorted() (binary search) against a given offset
      in order to derive the next offset.
    - Once we have a way of getting two byte offsets, we can use ranged
      HTTP requests (and TransmitFile behind the scenes) to efficiently
      read random chunks of the file asynchronously.  (Windows has a
      huge advantage here -- there's simply no way to achieve similar
      functionality on POSIX in a non-blocking fashion (sendfile can
      block, a disk read() can block, a memory reference into a mmap'd
      file that isn't in memory will page fault, which will block).)
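
The offset lookup described in the list above can be sketched roughly as
follows. This is not PyParallel's code: it uses the stdlib bisect module as
a stand-in for numpy's searchsorted(), and byte_range() and range_header()
are hypothetical helper names.

```python
from bisect import bisect_right


def byte_range(sorted_offsets, start):
    """Return (first_byte, last_byte) for the article starting at `start`.

    last_byte is None when `start` is the final offset, meaning "read to
    the end of the file".
    """
    i = bisect_right(sorted_offsets, start)  # index of the next offset
    if i == len(sorted_offsets):
        return start, None
    return start, sorted_offsets[i] - 1


def range_header(first_byte, last_byte):
    """Format the value of an HTTP Range request header."""
    end = "" if last_byte is None else last_byte
    return "bytes={}-{}".format(first_byte, end)
```

For example, with offsets [0, 120, 450, 900], the article at offset 120
spans bytes 120 through 449, so the request would carry "bytes=120-449".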

The performance has far surpassed anything I could have imagined back
during the async I/O discussions in September 2012, so, time to stick a
fork in it and document the experience, which is what I'll be working on
in the coming weeks.

In the mean time:
    - There are installers available here for those that wish to play
      around with the current state of things:
    - I wrote a little helper thing that diffs the hg tree against the
      original v3.3.5 tag I based the work off and committed the diffs
      directly -- this provides a way to review the changes that were
      made in order to get to the current level of functionality:

      (It only includes files that existed in the v3.3.5 tag, I don't
       include diffs for new files I've added.)

It's probably useful reviewing the diffs after perusing pyparallel.h: you'll see lots of guards in place in most of the diffs. E.g.:

Py_GUARD()  -- make sure we never hit this from a parallel context
Px_GUARD()  -- make sure we never hit this from a main thread
Py_GUARD_OBJ(o) -- make sure object o is always a main thread object
Px_GUARD_OBJ(o) -- make sure object o is always a parallel object
PyPx_GUARD_OBJ(o) -- if we're a parallel context, make sure it's a
                     parallel object, if we're a main thread, make
                     sure it's a main thread object.

If you haven't heard of PyParallel before, this might be a good place to

The core concepts haven't really changed since here (re: parallel
contexts, main thread, main thread objects, parallel thread objects):

Basically, if we're a main thread, "do what we normally do", if we're a
parallel thread, "divert to a thread-safe alternative".

And a final note: I like the recent async additions.  I mean, it's
unfortunate that the new keyword clashes with the module name I used to
hide all the PyParallel trickery, but I'm at the point now where calling
something like this from within a parallel context is exactly what I
want:
    async f.write(...)
    async cursor.execute(...)

I've been working on PyParallel on-and-off now for ~2.5 years and have
learned a lot and churned out a lot of code -- documenting it all is
actually somewhat daunting (where do I start?!), so, if anyone has
specific questions about how I addressed certain things, I'm more than
happy to elicit more detail on specifics.


From trent at  Tue Jun 23 16:03:55 2015
From: trent at (Trent Nelson)
Date: Tue, 23 Jun 2015 10:03:55 -0400
Subject: [Python-ideas] PyParallel update (was: solving multi-core Python)
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jun 23, 2015 at 09:53:01AM -0400, Trent Nelson wrote:
> On Sat, Jun 20, 2015 at 03:42:33PM -0600, Eric Snow wrote:
> > Furthermore, removing the GIL is perhaps an obvious solution but not
> > the only one.  Others include Trent Nelson's PyParallels, STM, and
> > other Python implementations..
> So, I've been sprinting relentlessly on PyParallel since Christmas, and
> recently reached my v0.0 milestone of being able to handle all the TEFB
> tests, plus get the "instantaneous wiki search" thing working too.
> The TEFB (Techempower Framework Benchmarks) implementation is here:
>     (The aim was to have it compete in this:, but unfortunately they broke their Windows support after round 9, so there's no way to get PyParallel into the official results without fixing that first.)
> The wiki thing is here:
> I particularly like the wiki example as it leverages a lot of benefits
> afforded by PyParallel's approach to parallelism, concurrency and
> asynchronous I/O:
>     - Load a digital search trie (datrie.Trie) that contains every
>       Wikipedia title and the byte-offset within the wiki.xml where
>       the title was found.  (Once loaded the RSS of python.exe is about
>       11GB; the trie itself has about 16 million items in it.)

Oops, I was off by about 12 million:

    PyParallel 3.3.5 (3.3-px:829ae345012e+, Jun 15 2015, 16:54:16) [MSC v.1600 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import os
    >>> os.chdir('examples\\wiki')
    >>> import wiki as w
    About to load titles trie, this will take a while...
    >>> len(w.titles)

From sturla.molden at  Tue Jun 23 16:55:31 2015
From: sturla.molden at (Sturla Molden)
Date: Tue, 23 Jun 2015 16:55:31 +0200
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
 <mmbhjs$pau$> <>
Message-ID: <mmbs15$cpk$>

On 23/06/15 15:14, Jonas Wielicki wrote:

> To be fair, you will nevertheless get a slowdown when copy-on-write
> kicks in while first using whatever was cloned from the parent. This is
> nothing which blocks execution, but slows down execution.

Yes, particularly because of reference counts. Unfortunately Python
stores refcounts within the PyObject struct, and when a refcount is
updated a copy of the entire 4 KB page is triggered. There would be far
less of this if refcounts were kept in dedicated pages.
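
To make the refcount point concrete: in CPython, simply binding another name
to an object writes to that object's header. The sketch below (function name
invented for the example) shows the count changing; after a fork(), a write
like this is what forces the kernel to copy the page holding the object.

```python
import sys


def refcount_delta_from_alias():
    """Show that taking one more reference changes the stored refcount."""
    obj = object()
    before = sys.getrefcount(obj)
    alias = obj  # no data is modified, but the refcount field is
    after = sys.getrefcount(obj)
    del alias
    return after - before
```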


From barry at  Tue Jun 23 18:01:18 2015
From: barry at (Barry Warsaw)
Date: Tue, 23 Jun 2015 12:01:18 -0400
Subject: [Python-ideas] solving multi-core Python
References: <>
Message-ID: <>

On Jun 23, 2015, at 01:52 PM, Nick Coghlan wrote:

>The current reference-counts-embedded-in-the-object-structs memory
>layout also plays havoc with the all-or-nothing page level
>copy-on-write semantics used by the fork() syscall at the operating
>system layer, so some of the ideas we've been considering
>(specifically, those related to moving the reference counter
>bookkeeping out of the object structs themselves) would potentially
>help with that as well (but would also have other hard to predict
>performance consequences).

A crazy offshoot idea would be something like Emacs' unexec, where during the
build process you could preload a bunch of always-used immutable modules, then
freeze the state in such a way that starting up again later would be much
faster, because the imports (and probably more importantly, the searching)
could be avoided.


From jsbueno at  Tue Jun 23 18:55:09 2015
From: jsbueno at (Joao S. O. Bueno)
Date: Tue, 23 Jun 2015 13:55:09 -0300
Subject: [Python-ideas] millisecond and microsecond times without floats
In-Reply-To: <>
References: <20150623021530.74ce1ebe@x230>
Message-ID: <>

For new functions altogether, maybe namespaces could be a nice option --

from time.milliseconds import sleep, monotonic

Named parameters would be a better way to implement it, though. I just
don't know whether having to go through a function that must be ready
to handle floats anyway would get in the way of the desired
optimization.

On 22 June 2015 at 21:39, Gregory P. Smith <greg at> wrote:
> On Mon, Jun 22, 2015 at 5:35 PM Alexander Belopolsky
> <alexander.belopolsky at> wrote:
>> On Mon, Jun 22, 2015 at 8:03 PM, Gregory P. Smith <greg at> wrote:
>>> # We could use the long form names milliseconds, microseconds and
>>> nanoseconds but i worry with those that people would inevitably confuse ms
>>> with microseconds as times and APIs usually given the standard abbreviations
>>> rather than spelled out.
>> Note that datetime.timedelta uses long names:
>> >>> timedelta(milliseconds=5, microseconds=3)
>> datetime.timedelta(0, 0, 5003)
> That is a good vote for consistency with its API...
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at
> Code of Conduct:

From trent at  Tue Jun 23 20:29:40 2015
From: trent at (Trent Nelson)
Date: Tue, 23 Jun 2015 14:29:40 -0400
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <mm64be$eq5$>
References: <>
Message-ID: <>

On Sun, Jun 21, 2015 at 12:40:43PM +0200, Stefan Behnel wrote:
> Nick Coghlan schrieb am 21.06.2015 um 12:25:
> > On 21 June 2015 at 19:48, Antoine Pitrou wrote:
> >> On Sun, 21 Jun 2015 16:31:33 +1000 Nick Coghlan wrote:
> >>>
> >>> For inter-interpreter communication, the worst case scenario is having
> >>> to rely on a memcpy based message passing system (which would still be
> >>> faster than multiprocessing's serialisation + IPC overhead)
> >>
> >> And memcpy() updates pointer references to dependent objects magically?
> >> Surely you meant the memdeepcopy() function that's part of every
> >> standard C library!
> > 
> > We already have the tools to do deep copies of object trees (although
> > I'll concede I *was* actually thinking in terms of the classic C/C++
> > mistake of carelessly copying pointers around when I wrote that
> > particular message). One of the options for deep copies tends to be a
> > pickle/unpickle round trip, which will still incur the serialisation
> > overhead, but not the IPC overhead.
> > 
> > "Faster message passing than multiprocessing" sets the baseline pretty
> > low, after all.
> > 
> > However, this is also why Eric mentions the notions of object
> > ownership or limiting channels to less than the full complement of
> > Python objects. As an *added* feature at the Python level, it's
> > possible to initially enforce restrictions that don't exist in the C
> > level subinterpeter API, and then work to relax those restrictions
> > over time.
> If objects can make it explicit that they support sharing (and preferably
> are allowed to implement the exact details themselves), I'm sure we'll find
> ways to share NumPy arrays across subinterpreters. That feature alone tends
> to be a quick way to make a lot of people happy.

    FWIW, the following commit was all it took to get NumPy playing
    nicely with PyParallel:

    It uses thread-local buckets instead of static ones, and calls out
    to PyMem_Raw(Malloc|Realloc|Calloc|Free) instead of the normal libc
    counterparts.  This means PyParallel will intercept the call within
    a parallel context and divert it to the per-context heap.

    Example parallel callback using NumPy:

    (Also, datrie is a Cython module, and that seems to work fine as
     well, which is neat, as it means you could sub out the entire
     Python callback with a Cythonized version, including all the
     relatively-slow-compared-to-C http header parsing that happens in


From greg at  Tue Jun 23 21:26:23 2015
From: greg at (Gregory P. Smith)
Date: Tue, 23 Jun 2015 19:26:23 +0000
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jun 23, 2015 at 9:01 AM Barry Warsaw <barry at> wrote:

> On Jun 23, 2015, at 01:52 PM, Nick Coghlan wrote:
> >The current reference-counts-embedded-in-the-object-structs memory
> >layout also plays havoc with the all-or-nothing page level
> >copy-on-write semantics used by the fork() syscall at the operating
> >system layer, so some of the ideas we've been considering
> >(specifically, those related to moving the reference counter
> >bookkeeping out of the object structs themselves) would potentially
> >help with that as well (but would also have other hard to predict
> >performance consequences).
> A crazy offshoot idea would be something like Emacs' unexec, where during
> the
> build process you could preload a bunch of always-used immutable modules,
> then
> freeze the state in such a way that starting up again later would be much
> faster, because the imports (and probably more importantly, the searching)
> could be avoided.

I actually would like something like this for Python, but I want it to work
with hash randomization rather than freezing a single fixed hash seed. That
means you'd need to record the location of all hash tables and cached
hashes and fix them up after loading such a binary image at process start
time, much like processing relocations when loading a binary executable.
Non trivial.
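
A quick way to see why cached hashes would need fixing up: the hash of the
same string depends on the seed the interpreter started with. This sketch
runs fresh interpreters with the PYTHONHASHSEED environment variable set;
the helper name is made up for the example.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(text, seed):
    """Return hash(text) as computed by a new interpreter with a fixed seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env, capture_output=True, text=True, check=True)
    return int(result.stdout)
```

Two runs with the same seed agree, while a different seed gives a different
value, so any dict or set stored in a frozen image would have its entries in
the wrong slots unless it is rebuilt at load time.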


From pmiscml at  Tue Jun 23 22:25:00 2015
From: pmiscml at (Paul Sokolovsky)
Date: Tue, 23 Jun 2015 23:25:00 +0300
Subject: [Python-ideas] millisecond and microsecond times without floats
In-Reply-To: <>
References: <20150623021530.74ce1ebe@x230>
Message-ID: <20150623232500.45efccdf@x230>


On Tue, 23 Jun 2015 00:03:14 +0000
"Gregory P. Smith" <greg at> wrote:


> > sleep_ms()
> > sleep_us()
> > monotonic_ms()
> > monotonic_us()
> >
> If you're going to add new function names, going with the _unit suffix
> seems best.
> Another option to consider: keyword only arguments.
> time.sleep(ms=31416)
> time.sleep(us=31415927)
> time.sleep(ns=31415296536)

That doesn't immediately map to usage for monotonic(), as you mention
below.

Another issue is that keyword arguments on average (and for
MicroPython all the time) are less efficient than positional ones. Put
another way,

t = monotonic_ns()
t = monotonic_ns() - t

is going to give lower number than

t = monotonic(ns=True)
t = monotonic(ns=True) - t

and the closer it is to 0, the better.

> # We could use the long form names milliseconds, microseconds and
> nanoseconds but i worry with those that people would inevitably
> confuse ms with microseconds as times and APIs usually given the
> standard abbreviations rather than spelled out.

Another issue is that full spellings are rather long. Logistically,
while function names can be expected to have autocompletion support,
keyword arguments cannot necessarily.

> time.monotonic(return_int_ns=True) ?
> # This seems ugly.  time.monotonic_ns() seems better.
> These should be acceptable to add to Python 3.6 for consistency.

Well, as I mentioned, I'm personally not looking for this to be
implemented in CPython right away. Ideally, this should be tested by >1
independent "embedded" Python implementation first, and only then, based
on the actual experience, submitted as a PEP. That's rather better than
having "desktop" CPython, which doesn't care about all the subtle
"embedded" aspects, "force" a way to implement it.

> I do not think we should have functions for each ms/us/ns unit if
> adding functions.  Just choose the most useful high precision unit
> and let people do the math as needed for the others.

Well, that's one of the examples of that "desktop" thinking ;-).
Consider for example that 2^32 microseconds is just over an hour, so
expressing everything in microseconds would require arbitrary-precision
integers, which may be just the same kind of burden for an embedded
system as floats.
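
The arithmetic behind "just over an hour" is easy to check. The helper name
below is invented for this sketch; it simply computes how long a counter of
a given width lasts at a given tick rate before it wraps.

```python
def wrap_period_seconds(bits, ticks_per_second):
    """Seconds until a `bits`-wide counter at the given tick rate wraps."""
    return 2 ** bits / ticks_per_second


# A 32-bit microsecond counter: 4294.967296 s, i.e. about 71.6 minutes.
us_period = wrap_period_seconds(32, 1_000_000)

# A 32-bit millisecond counter lasts 1000 times longer, about 49.7 days.
ms_period = wrap_period_seconds(32, 1_000)
```

So the same 32-bit word covers roughly an hour at microsecond resolution but
weeks at millisecond resolution, which is the argument for offering more
than one unit.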

> > Point 3 above isn't currently addressed by time module at all.
> > mentions some internal


> Reading the PEP my takeaway is that wrap-around of underlying
> deficient system APIs should be handled by the Python VM for the
> user. It sounds like we should explicitly spell this out though.

This is another point which is overlooked by "desktop" programmers -
time counters can, will, and do wrap around. Big OSes try hard to
hide this fact, and indeed succeed well enough, so in cases when they
do fail, it has a shattering effect (at least PR-wise) - Y2K, Y2K38
problems. For an embedded programmer, wrapping counters are an objective
reality, and we wouldn't like to hide that fact in MicroPython
(of course, only for these newly introduced real-time precision time
functions).

> I don't think time.elapsed() could ever provide any utility in either
> case, just use subtraction. 

Can't work. Previous value of monotonic_us() is 65530, next value is
10, what does it tell you?

> time.elapsed() wouldn't know when and
> where the time values came from and magically be able to apply wrap
> around or not to them.

Well, as I mentioned, it's an API contract that elapsed() takes values
of the monotonic_ms(), monotonic_us(), etc. functions, and knows the law
by which their values change (most likely, unsigned power-of-2 modular
arithmetic). There's an additional restriction that this change law is
the same for all of monotonic_ms(), monotonic_us(), etc., but I
personally find this an acceptable restriction to not bloat the API even
further. (But it is a restriction: for example, if the nano/microsecond
time source is a 24-bit counter, then millisecond time is limited to 24
bits too.)
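
A minimal sketch of such an elapsed(), assuming the counter is an unsigned
value that wraps at a power of two and that at most one wraparound happened
between the two readings. The `bits` parameter is an assumption for this
example; in the scheme described above the width would be fixed by the
implementation rather than passed in.

```python
def elapsed(new, old, bits=16):
    """Ticks from `old` to `new` on a counter that wraps at 2**bits."""
    mask = (1 << bits) - 1
    return (new - old) & mask


# The example from this thread: 65530 followed by 10 on a 16-bit counter.
wrapped = elapsed(10, 65530)  # 16 ticks, where plain subtraction gives -65520
plain = elapsed(300, 100)     # 200 ticks, same as subtraction when no wrap
```

The masking step is what plain subtraction lacks: it folds the negative
difference back into the counter's range.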

> -gps

Best regards,
 Paul                          mailto:pmiscml at

From solipsis at  Tue Jun 23 22:46:23 2015
From: solipsis at (Antoine Pitrou)
Date: Tue, 23 Jun 2015 22:46:23 +0200
Subject: [Python-ideas] solving multi-core Python
References: <>
 <mm64be$eq5$> <>
Message-ID: <20150623224623.70e98b07@fsol>

Hey Trent,

You may be interested in this PR for Numpy:



>     FWIW, the following commit was all it took to get NumPy playing
>     nicely with PyParallel:

From greg at  Wed Jun 24 01:32:55 2015
From: greg at (Gregory P. Smith)
Date: Tue, 23 Jun 2015 23:32:55 +0000
Subject: [Python-ideas] millisecond and microsecond times without floats
In-Reply-To: <20150623232500.45efccdf@x230>
References: <20150623021530.74ce1ebe@x230>
Message-ID: <>

On Tue, Jun 23, 2015 at 1:25 PM Paul Sokolovsky <pmiscml at> wrote:

> Hello,
> On Tue, 23 Jun 2015 00:03:14 +0000
> "Gregory P. Smith" <greg at> wrote:
> []
> > > sleep_ms()
> > > sleep_us()
> > > monotonic_ms()
> > > monotonic_us()
> > >
> >
> > If you're going to add new function names, going with the _unit suffix
> > seems best.
> >
> > Another option to consider: keyword only arguments.
> >
> > time.sleep(ms=31416)
> > time.sleep(us=31415927)
> > time.sleep(ns=31415296536)
> That doesn't immediately map to usage for monotonic(), as you mention
> below.
> Another issue is that keywords arguments on average (and for
> MicroPython all the time) are less efficient than positional. Put it
> other way,
> t = monotonic_ns()
> t = monotonic_ns() - t
> is going to give lower number than
> t = monotonic(ns=True)
> t = monotonic(ns=True) - t
> , and the closer it to 0, the better.
> > # We could use the long form names milliseconds, microseconds and
> > nanoseconds but i worry with those that people would inevitably
> > confuse ms with microseconds as times and APIs usually given the
> > standard abbreviations rather than spelled out.
> Another issue is that full spellings are rather long. Logistically,
> while function names can be expected to have autocompletion support,
> keyword arguments not necessarily.
> > time.monotonic(return_int_ns=True) ?
> > # This seems ugly.  time.monotonic_ns() seems better.
> >
> > These should be acceptable to add to Python 3.6 for consistency.
> Well, as I mentioned, I'm personally not looking for this to be
> implemented in CPython right away. Ideally, this should be tested by >1
> independent "embedded" Python implementation first, and only then, based
> on the actual experience, submitted as a PEP. That's rather better than
> "desktop" CPython, which doesn't care about all the subtle "embedded"
> aspects "forced" a way to implement it.
> > I do not think we should have functions for each ms/us/ns unit if
> > adding functions.  Just choose the most useful high precision unit
> > and let people do the math as needed for the others.
> Well, that's one of examples of that "desktop" thinking ;-).
> Consider for example that 2^32 microseconds is just over an hour, so
> expressing everything in microseconds would require arbitrary-precision
> integers, which may be just the same kind of burden for an embedded
> system as floats.
I know. I was actually hoping you'd respond on that point because I haven't
used micropython yet. I assumed it had bignum, or at least fixed "big"
64-bit number, support. But if it does not, having specific functions for
the needed resolutions makes a lot of sense.

> > Point 3 above isn't currently addressed by time module at all.
> > > mentions some internal
> []
> > Reading the PEP my takeaway is that wrap-around of underlying
> > deficient system APIs should be handled by the Python VM for the
> > user. It sounds like we should explicitly spell this out though.
> This is another point which is overlooked by "desktop" programmers -
> time counters can, will, and do wrap around. Big OSes try hard to to
> hide this fact, and indeed succeed well enough, so in cases when they
> do fail, it has shattering effect (at least PR-wise) - Y2K, Y2K38
> problems. For an embedded programmer wrapping counters is objective
> reality, and we wouldn't like to hide that fact in MicroPython
> (of course, only for these, newly introduced real-time precision time
> functions).

I still don't see how an elapsed() function taking two arbitrary integer
arguments could work in a meaningful manner.  Even if you assume they are
the same units, the only assumption that can be made is that if the second
int is lower than the first, at least one wraparound occurred.

> I don't think time.elapsed() could ever provide any utility in either
> > case, just use subtraction.
> Can't work. Previous value of monotonic_us() is 65530, next value is
> 10, what does it tell you?

At least one wrap around occurred. Without more information you cannot know
how many.

> time.elapsed() wouldn't know when and
> > where the time values came from and magically be able to apply wrap
> > around or not to them.
> Well, as I mentioned, it's an API contract that elapsed() takes values
> of monotonic_ms(), monotonic_us(), etc. functions, and knows law how
> their values change (likely, apply unsigned power-of-2 modular
> arithmetics). There's additional restriction that this change law for
> all of monotonic_ms(), monotonic_us() is the same, but I personally
> find this an acceptable restriction to not bloat API even further. (But
> it is a restriction, for example, if nano/microsecond time source is
> 24-bit counter, than millisecond time is limited to 24 bits too).

I guess what I'm missing is how you intend to tell elapsed() which of the
_ms vs _us vs _ns functions the values came from. I'm assuming that all
functions are likely to exist at once rather than there being only one high
resolution integer time function.

Given that, yes, you can make elapsed() do what you want.  But I really
think you should call it something more specific than elapsed if the
function is serving as a common source of information on how a particular
type of timer on the system works.  monotonic_elapsed() perhaps?  etc..

Also, agreed, we don't need these in 3.6.  I'm not seeing anything really
objectionable for inclusion in a future 3.x which is all I'm really looking
out for. It sounds like avoiding keyword arguments and adding _ms _us and
_ns variants of functions is the practical solution for micropython.

-gps  (awaiting his WiPys :)

From jeanpierreda at  Wed Jun 24 01:32:47 2015
From: jeanpierreda at (Devin Jeanpierre)
Date: Tue, 23 Jun 2015 16:32:47 -0700
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <mmbs15$cpk$>
References: <>
 <mmbhjs$pau$> <>
Message-ID: <>

On Tue, Jun 23, 2015 at 7:55 AM, Sturla Molden <sturla.molden at> wrote:
> On 23/06/15 15:14, Jonas Wielicki wrote:
>> To be fair, you will nevertheless get a slowdown when copy-on-write
>> kicks in while first using whatever was cloned from the parent. This is
>> nothing which blocks execution, but slows down execution.
> Yes, particularly because of reference counts. Unfortunately Python stores
> refcounts within the PyObject struct. And when a refcount is updated a copy
> of the entire 4 KB page is triggered. There would be fare less of this if
> refcounts was kept in dedicated pages.

A coworker of mine wrote a patch to Python that allows you to freeze
refcounts for all existing objects before forking, if the correct
compile options are set. This adds overhead to incref/decref, but
dramatically changes the python+fork memory usage story. (I haven't
personally played with it much, but it sounds decent.) If there's any
interest I can try to upstream this change, guarded behind a compiler
flag.

We've also tried moving refcounts to their own pages, like you and
Nick suggest, but it breaks a *lot* of third-party code. I can try to
upstream it. If it's guarded by a compiler flag it is probably still
useful, just any users would have to grep through their dependencies
to make sure nothing directly accesses the refcount. (The stdlib can
be made to work.)  It sounds like it would also be useful for the main
project in the topic of this thread, so I imagine there's more
momentum behind it.

-- Devin

From njs at  Wed Jun 24 01:46:05 2015
From: njs at (Nathaniel Smith)
Date: Tue, 23 Jun 2015 16:46:05 -0700
Subject: [Python-ideas] millisecond and microsecond times without floats
In-Reply-To: <>
References: <20150623021530.74ce1ebe@x230>
Message-ID: <>

On Tue, Jun 23, 2015 at 4:32 PM, Gregory P. Smith <greg at> wrote:
> I still don't see how an elapsed() function taking two arbitrary integer
> arguments could work in a meaningful manner.  Even if you assume they are
> the same units, the only assumption that can be made is that if the second
> int is lower than the first, at least one wraparound occurred.

Assuming you have an n-bit clock:

(1) if you have arbitrary storage and the ability to do some sort of
interrupt handling at least once per wraparound period, then you can
reliably measure any duration.

(2) if you don't have that, but can assume that at most one wraparound
has occurred, then you can reliably measure any duration up to 2**n
time units.

(3) if you can't even make that assumption, then you can't reliably
measure any duration whatsoever, so there's no point in even having
the clock.

I guess micropython is targeting platforms that can't afford option
(1), but would like to at least take advantage of option (2)?
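
Option (2) comes down to modular subtraction. A minimal sketch, assuming both
readings come from the same n-bit counter and at most one wraparound happened
in between (the bits parameter is an assumption made for illustration):

```python
def elapsed(start, end, bits):
    """Ticks from start to end on a bits-wide counter, assuming at most one wraparound."""
    return (end - start) % (1 << bits)

# A 16-bit counter read at 65530 and later at 4 has advanced 10 ticks.
print(elapsed(65530, 4, 16))
# Without a wraparound the result is the plain difference.
print(elapsed(100, 250, 16))
```

The result is only meaningful for durations strictly shorter than 2**bits
ticks; anything longer is indistinguishable from a shorter interval.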


Nathaniel J. Smith --

From ericsnowcurrently at  Wed Jun 24 04:18:36 2015
From: ericsnowcurrently at (Eric Snow)
Date: Tue, 23 Jun 2015 20:18:36 -0600
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Jun 20, 2015 at 11:25 PM, Nathaniel Smith <njs at> wrote:
> I'd love to see just a hand wavy, verbal proof-of-concept walking through
> how this might work in some simple but realistic case. To me a single
> compelling example could make this proposal feel much more concrete and
> achievable.

Here's a vague example:


from subinterpreters import Subinterpreter, Channel

def handle_job(val):
    if not isinstance(val, (int, float)):
        raise RuntimeError("{!r} not a valid arg".format(val))
    # something potentially expensive...

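def runner(ch):
    while True:
        value = ch.pop()  # blocks
        if value is None:
            break  # assumed: None is the shutdown sentinel
        handle_job(value)

ch = Channel()

sub = Subinterpreter()
task = sub.run(runner, ch)  # assumed method name

data = get_data()
for immutable_item in data:
    ch.push(immutable_item)  # assumed counterpart to ch.pop()

if task.is_alive():
    ch.push(None)  # assumed: ask the runner to stop
    task.join()    # assumed: wait for it to finish

exc = task.exception()
if exc is not None:
    raise RuntimeError from exc

def verify(data):
    # make sure runner did its job
    ...

task = sub.run(verify, data)  # assumed method name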
# do other stuff while we wait
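
Since the subinterpreters module above does not exist, the same
producer/consumer shape can be tried today with a thread and a queue. This is
only a rough analogue under that assumption: threads share everything and
still contend on the GIL, so it shows the control flow, not the isolation.
The results list and run_jobs helper are made up for the sketch.

```python
import queue
import threading

def handle_job(val):
    if not isinstance(val, (int, float)):
        raise RuntimeError("{!r} not a valid arg".format(val))
    return val * 2  # placeholder for something potentially expensive

def runner(ch, results):
    while True:
        value = ch.get()   # blocks, like ch.pop() in the example above
        if value is None:  # None is used as the shutdown sentinel
            break
        results.append(handle_job(value))

def run_jobs(data):
    ch = queue.Queue()
    results = []
    task = threading.Thread(target=runner, args=(ch, results))
    task.start()
    for immutable_item in data:
        ch.put(immutable_item)
    ch.put(None)   # ask the runner to stop
    task.join()    # wait for it to finish
    return results

print(run_jobs([1, 2.5, 3]))
```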



> There aren't really many options for mutable objects, right? If you want
> shared nothing semantics, then transmitting a mutable object either needs to
> make a copy, or else be a real transfer, where the sender no longer has it
> (cf. Rust).
> I guess for the latter you'd need some new syntax for send-and-del, that
> requires the object to be self contained (all mutable objects reachable from
> it are only referenced by each other) and have only one reference in the
> sending process (which is the one being sent and then destroyed).

Right.  The idea of a self-contained object graph is something we'd
need if we went that route.  That's why initially we should focus on
sharing only immutable objects.

>> Keep in mind that by "immutability" I'm talking about *really* immutable,
>> perhaps going so far as treating the full memory space associated with an
>> object as frozen.  For instance, we'd have to ensure that "immutable" Python
>> objects like strings, ints, and tuples do not change (i.e. via the C API).
> This seems like a red herring to me. It's already the case that you can't
> legally use the c api to mutate tuples, ints, for any object that's ever
> been, say, passed to a function. So for these objects, the subinterpreter
> setup doesn't actually add any new constraints on user code.

Fair enough.

> C code is always going to be *able* to break memory safety so long as you're
> using shared-memory threading at the c level to implement this stuff. We
> just need to make it easy not to.


> Refcnts and garbage collection are another matter, of course.

Agreed. :)


From ericsnowcurrently at  Wed Jun 24 04:37:43 2015
From: ericsnowcurrently at (Eric Snow)
Date: Tue, 23 Jun 2015 20:37:43 -0600
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Jun 21, 2015 at 12:31 AM, Nick Coghlan <ncoghlan at> wrote:
> The fact that mod_wsgi can run most Python web applications in a
> subinterpreter quite happily means we already know the core mechanism
> works fine,

This is a pretty important point.

> and there don't appear to be any insurmountable technical
> hurdles between the status quo and getting to a point where we can
> either switch the GIL to a read/write lock where a write lock is only
> needed for inter-interpreter communications, or else find a way for
> subinterpreters to release the GIL entirely by restricting them
> appropriately.

Proper multi-core operation will require at least some changes
relative to the GIL.  My goal is to execute the least amount of change
at first.  We can build on that.

> For inter-interpreter communication, the worst case scenario is having
> to rely on a memcpy based message passing system (which would still be
> faster than multiprocessing's serialisation + IPC overhead),

By initially focusing on immutable objects we shouldn't need to go
that far.  That said, a memcpy-based solution may very well be a good
next step once the basic goals of the project are met.

> but there
> don't appear to be any insurmountable barriers to setting up an object
> ownership based system instead

Agreed.  That's something we can experiment with once we get the core
of the project working.

> (code that accesses PyObject_HEAD
> fields directly rather than through the relevant macros and functions
> seems to be the most likely culprit for breaking, but I think "don't
> do that" is a reasonable answer there).


> There's plenty of prior art here (including a system I once wrote in C
> myself atop TI's DSP/BIOS MBX and TSK APIs), so I'm comfortable with
> Eric's "simple matter of engineering" characterisation of the problem
> space.

Good. :)

> The main reason that subinterpreters have never had a Python API
> before is that they have enough rough edges that having to write a
> custom C extension module to access the API is the least of your
> problems if you decide you need them. At the same time, not having a
> Python API not only makes them much harder to test, which means
> various aspects of their operation are more likely to be broken, but
> also makes them inherently CPython specific.
> Eric's proposal essentially amounts to three things:
> 1. Filing off enough of the rough edges of the subinterpreter support
> that we're comfortable giving them a public Python level API that
> other interpreter implementations can reasonably support
> 2. Providing the primitives needed for safe and efficient message
> passing between subinterpreters
> 3. Allowing subinterpreters to truly execute in parallel on multicore machines
> All 3 of those are useful enhancements in their own right, which
> offers the prospect of being able to make incremental progress towards
> the ultimate goal of native Python level support for distributing
> across multiple cores within a single process.

Yep.  That sums it up pretty well.  That decomposition should make it
a bit easier to move the project forward.


From ericsnowcurrently at  Wed Jun 24 04:39:32 2015
From: ericsnowcurrently at (Eric Snow)
Date: Tue, 23 Jun 2015 20:39:32 -0600
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Jun 21, 2015 at 12:41 AM, Wes Turner <wes.turner at> wrote:
> Exciting!
> * other approaches to the problem (with great APIs):



From ericsnowcurrently at  Wed Jun 24 05:05:13 2015
From: ericsnowcurrently at (Eric Snow)
Date: Tue, 23 Jun 2015 21:05:13 -0600
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <20150621115443.70ddcf28@fsol>
References: <>
Message-ID: <>

On Sun, Jun 21, 2015 at 3:54 AM, Antoine Pitrou <solipsis at> wrote:
> On Sat, 20 Jun 2015 23:01:20 -0600
> Eric Snow <ericsnowcurrently at>
> wrote:
>> The only consequential shared piece is the
>> GIL and my proposal should render the GIL irrelevant for the most part.
> All singleton objects, built-in types are shared and probably a number
> of other things hidden in dark closets...

Yep.  I expect we'll be able to sort those out under the assumption
that 99% of the time they can be treated as immutable.  We'll then
have to find a way to keep the corner cases from breaking the
subinterpreter isolation.

> Not to mention the memory allocator.

This is a sticky part that I've been considering from almost day 1.
It's not the #1 problem to solve, but it will be an important one if
we want to have truly parallel subinterpreters.

> By the way, what you're aiming to do is conceptually quite similar to
> Trent's PyParallel (though Trent doesn't use subinterpreters, his main
> work is around trying to make object sharing safe without any GIL to
> trivially protect the sharing), so you may want to pair with him. Of
> course, you may end up with a Windows-only Python interpreter :-)

Right.  I read through Trent's work on several occasions and have
gleaned a couple lessons related to object sharing.  I was planning on
getting in touch with Trent in the near future.

> I'm under the impression you're underestimating the task at hand here.
> Or perhaps you're not and you're just willing to present it in a
> positive way :-)

I'd like to think it's the latter. :)

The main reason why I'm hopeful we can make a meaningful change for
3.6 is that I don't foresee any major changes to CPython's internals.
Nearly all the necessary pieces are already there. <handwave/>  I'm
also intent on taking a minimal approach initially.  We can build on
it from there, easing restrictions that allowed us to roll out the
initial implementation more quickly.  All that said, I won't be
surprised if it takes the entire 3.6 dev cycle to get it right.


From ericsnowcurrently at  Wed Jun 24 05:08:36 2015
From: ericsnowcurrently at (Eric Snow)
Date: Tue, 23 Jun 2015 21:08:36 -0600
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Jun 21, 2015 at 4:25 AM, Nick Coghlan <ncoghlan at> wrote:
> We already have the tools to do deep copies of object trees (although
> I'll concede I *was* actually thinking in terms of the classic C/C++
> mistake of carelessly copying pointers around when I wrote that
> particular message). One of the options for deep copies tends to be a
> pickle/unpickle round trip, which will still incur the serialisation
> overhead, but not the IPC overhead.

This does make me wonder if it would be worth pursuing a mechanism for
encapsulating an object graph, such that it would be easier to
manage/copy the graph as a whole.
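
For reference, the pickle/unpickle round trip mentioned above already treats
an object graph as a unit. A minimal sketch (copy_graph is a name made up
here, not an existing API):

```python
import pickle

def copy_graph(obj):
    """Deep-copy an object graph via a pickle/unpickle round trip."""
    return pickle.loads(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))

shared = [1, 2]
graph = {"a": shared, "b": shared}   # two references to the same list
clone = copy_graph(graph)

print(clone == graph)                # equal contents
print(clone["a"] is clone["b"])      # sharing inside the graph is preserved
print(clone["a"] is shared)          # but nothing is shared with the original
```

The cost is the serialisation itself, which is the overhead a dedicated
graph-encapsulation mechanism would be trying to avoid.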

> "Faster message passing than multiprocessing" sets the baseline pretty
> low, after all.
> However, this is also why Eric mentions the notions of object
> ownership or limiting channels to less than the full complement of
> Python objects. As an *added* feature at the Python level, it's
> possible to initially enforce restrictions that don't exist in the C
> level subinterpreter API, and then work to relax those restrictions
> over time.



From ericsnowcurrently at  Wed Jun 24 06:15:10 2015
From: ericsnowcurrently at (Eric Snow)
Date: Tue, 23 Jun 2015 22:15:10 -0600
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <mm64be$eq5$>
References: <>
Message-ID: <>

On Sun, Jun 21, 2015 at 4:40 AM, Stefan Behnel <stefan_ml at> wrote:
> If objects can make it explicit that they support sharing (and preferably
> are allowed to implement the exact details themselves), I'm sure we'll find
> ways to share NumPy arrays across subinterpreters. That feature alone tends
> to be a quick way to make a lot of people happy.

Are you thinking of something along the lines of a dunder method (e.g.


From ericsnowcurrently at  Wed Jun 24 06:19:07 2015
From: ericsnowcurrently at (Eric Snow)
Date: Tue, 23 Jun 2015 22:19:07 -0600
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <mm655v$ri7$>
References: <>
Message-ID: <>

On Sun, Jun 21, 2015 at 4:54 AM, Stefan Behnel <stefan_ml at> wrote:
> I also had some discussions about these things with Nick before. Not sure
> if you really meant PEP 384 (you might have) or rather PEP 489:

I did mean PEP 384, but PEP 489 is certainly related as I expect we'll
make participation in this subinterpreter model by extension modules
opt-in.  Basically they will need to promise that they will work
within the restricted environment.

> I consider that one more important here, as it will eventually allow Cython
> modules to support subinterpreters. Unless, as you mentioned, they use
> global C state, but only in external C code, e.g. wrapped libraries. Cython
> should be able to handle most of the module internal global state on a
> per-interpreter basis itself, without too much user code impact.


> I'm totally +1 for the idea. I hope that I'll find the time (well, and
> money) to work on PEP 489 in Cython soon, so that I can prove it right for
> actual real-world code in Python 3.5. We'll then see about subinterpreter
> support. That's certainly the next step.

That would be super.


From ericsnowcurrently at  Wed Jun 24 06:21:33 2015
From: ericsnowcurrently at (Eric Snow)
Date: Tue, 23 Jun 2015 22:21:33 -0600
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Jun 21, 2015 at 4:57 AM, Nick Coghlan <ncoghlan at> wrote:
> I'd want us to eventually aim for zero-copy speed for at least known
> immutable values (int, str, float, etc), immutable containers of
> immutable values (tuple, frozenset), and for types that support both
> publishing and consuming data via the PEP 3118 buffer protocol without
> making a copy.
> For everything else I'd be fine with a starting point that was at
> least no slower than multiprocessing (which shouldn't be difficult,
> since we'll at least save the IPC overhead even if there are cases
> where communication between subinterpreters falls back to
> serialisation rather than doing something more CPU and memory
> efficient).

Makes sense.
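
As a small illustration of what "without making a copy" means for the PEP 3118
buffer protocol, here is a sketch within a single interpreter using
memoryview. How such a view would be handed to another subinterpreter is not
shown and is left open.

```python
data = bytearray(b"hello world")
view = memoryview(data)        # exports the buffer; no bytes are copied
window = view[6:]              # slicing a memoryview is also zero-copy

window[0:5] = b"WORLD"         # writing through the view...
print(data)                    # ...changes the original bytearray
```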


From storchaka at  Wed Jun 24 06:59:12 2015
From: storchaka at (Serhiy Storchaka)
Date: Wed, 24 Jun 2015 07:59:12 +0300
Subject: [Python-ideas] millisecond and microsecond times without floats
In-Reply-To: <20150623021530.74ce1ebe@x230>
References: <20150623021530.74ce1ebe@x230>
Message-ID: <mmddf0$ko2$>

On 23.06.15 02:15, Paul Sokolovsky wrote:
> Hello from MicroPython, a lean Python implementation
> scaling down to run even on microcontrollers
> (
> Our target hardware base oftentimes lacks floating point support, and
> using software emulation is expensive. So, we would like to have
> versions of some timing functions, taking/returning millisecond and/or
> microsecond values as integers.

What about returning decimals or special fixed-precision numbers 
(internally implemented as 64-bit integer with constant scale 1000 or 

From ericsnowcurrently at  Wed Jun 24 07:01:24 2015
From: ericsnowcurrently at (Eric Snow)
Date: Tue, 23 Jun 2015 23:01:24 -0600
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <mm67sk$1nv$>
References: <>
Message-ID: <>

On Sun, Jun 21, 2015 at 5:41 AM, Sturla Molden <sturla.molden at> wrote:
> From the perspective of software design, it would be good if the CPython
> interpreter provided an environment instead of using global objects. It
> would mean that all functions in the C API would need to take the
> environment pointer as their first variable, which will be a major rewrite.
> It would also allow the "one interpreter per thread" design similar to tcl
> and .NET application domains.

While perhaps a worthy goal, I don't know that it fits in well with my
goals.  I'm aiming for an improved multi-core story with a minimum of
change in the interpreter.

> However, from the perspective of multi-core parallel computing, I am not
> sure what this offers over using multiple processes.
> Yes, you avoid the process startup time, but on POSIX systems a fork is very
> fast. And certainly, forking is much more efficient than serializing Python
> objects.

You still need the mechanism to safely and efficiently share (at least
some) objects between interpreters after forking.  I expect this will
be simpler within the same process.

> It then boils down to a workaround for the fact that Windows cannot
> fork, which makes it particularly bad for running CPython.

We cannot leave Windows out in the cold.

> You also have to
> start up a subinterpreter and a thread, which is not instantaneous. So I am
> not sure there is a lot to gain here over calling os.fork.

One key difference is that with a subinterpreter you are basically
starting with a clean slate.  The isolation between interpreters
extends to the initial state.  That level of isolation is a desirable
feature because you can more clearly reason about the state of the
running tasks.

> A non-valid argument for this kind of design is that only code which uses
> threads for parallel computing is "real" multi-core code. So Python does not
> support multi-cores because multiprocessing or os.fork is just faking it.
> This is an argument that belongs in the intellectual junk yard. It stems
> from the abuse of threads among Windows and Java developers, and is rooted
> in the absence of fork on Windows and the formerly slow fork on Solaris. And
> thus they are only able to think in terms of threads. If threading.Thread
> does not scale the way they want, they think multicores are out of reach.

Well, perception is 9/10ths of the law. :)  If the multi-core problem
is already solved in Python then why does it fail in the court of
public opinion?  The perception that Python lacks a good multi-core
story is real, leads organizations away from Python, and will not
improve without concrete changes.  Contrast that with Go or Rust or
many other languages that make it simple to leverage multiple cores
(even if most people never need to).

> So the question is, how do you want to share objects between
> subinterpreters? And why is it better than IPC, when your idea is to isolate
> subinterpreters like application domains?

In return, my question is, what is the level of effort to get fork+IPC
to do what we want vs. subinterpreters?  Note that we need to
accommodate Windows as more than an afterthought (or second-class
citizen), as well as other execution environments (e.g. embedded)
where we may not be able to fork.

> If you think avoiding IPC is clever, you are wrong. IPC is very fast, in
> fact programs written to use MPI tend to perform and scale better than
> programs written to use OpenMP in parallel computing.

I'd love to learn more about that.  I'm sure there are some great
lessons on efficiently and safely sharing data between isolated
execution environments.  That said, how does IPC compare to passing
objects around within the same process?

> Not only is IPC fast,
> but you also avoid an issue called "false sharing", which can be even more
> detrimental than the GIL: You have parallel code, but it seems to run in
> serial, even though there is no explicit serialization anywhere. And
> since Murphy's law is working against us, Python reference counts will be
> false shared unless we use multiple processes.

Solving reference counts in this situation is a separate issue that
will likely need to be resolved, regardless of which machinery we use
to isolate task execution.

> The reason IPC in multiprocessing is slow is due to calling pickle, it is
> not the IPC in itself. A pipe or a Unix socket (named pipe on Windows) have
> the overhead of a memcpy in the kernel, which is equal to a memcpy plus some
> tiny constant overhead. And if you need two processes to share memory, there
> is something called shared memory. Thus, we can send data between processes
> just as fast as between subinterpreters.

IPC sounds great, but how well does it interact with Python's memory
management/allocator?  I haven't looked closely but I expect that
multiprocessing does not use IPC anywhere.
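
To make the shared-memory point quoted above concrete, here is a sketch using
multiprocessing.shared_memory. Note the assumption: that module was added in
Python 3.8, well after this thread, and the sketch attaches from the same
process only to stay self-contained. It moves raw bytes, not Python objects,
so it says nothing about refcounts or the allocator.

```python
from multiprocessing import shared_memory

def shared_roundtrip(payload):
    """Write bytes into a shared-memory block and read them via a second handle."""
    owner = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        owner.buf[:len(payload)] = payload
        # Another process would attach by name; this sketch attaches from the
        # same process to stay self-contained.
        peer = shared_memory.SharedMemory(name=owner.name)
        try:
            return bytes(peer.buf[:len(payload)])
        finally:
            peer.close()
    finally:
        owner.close()
        owner.unlink()

print(shared_roundtrip(b"no pickle involved"))
```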

> All in all, I think we are better off finding a better way to share Python
> objects between processes.

I expect that whatever solution we would find for subinterpreters
would have a lot in common with the same thing for processes.

> P.S. Another thing to note is that with sub-interpreters, you can forget
> about using ctypes or anything else that uses the simplified GIL API (e.g.
> certain Cython generated extensions).

On the one hand there are some rough edges with subinterpreters that
need to be fixed.  On the other hand, we will have to restrict the
subinterpreter model (at least initially) in ways that would likely
preclude operation of existing extension modules.


From ericsnowcurrently at  Wed Jun 24 07:26:08 2015
From: ericsnowcurrently at (Eric Snow)
Date: Tue, 23 Jun 2015 23:26:08 -0600
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Jun 21, 2015 at 5:55 AM, Devin Jeanpierre
<jeanpierreda at> wrote:
> On Sat, Jun 20, 2015 at 4:16 PM, Eric Snow <ericsnowcurrently at> wrote:
>> On Jun 20, 2015 4:55 PM, "Devin Jeanpierre" <jeanpierreda at> wrote:
>>> It's worthwhile to consider fork as an alternative.  IMO we'd get a
>>> lot out of making forking safer, easier, and more efficient. (e.g.
>>> respectively: adding an atfork registration mechanism; separating out
>>> the bits of multiprocessing that use pickle from those that don't;
>>> moving the refcount to a separate page, or allowing it to be frozen
>>> prior to a fork.)
>> So leverage a common base of code with the multiprocessing module?
> What is this question in response to? I don't understand.

It sounded like you were suggesting that we factor out a common code
base that could be used by multiprocessing and the other machinery and
that only multiprocessing would keep the pickle-related code.

>> I would expect subinterpreters to use less memory.  Furthermore creating
>> them would be significantly faster.  Passing objects between them would be
>> much more efficient.  And, yes, cross-platform.
> Maybe I don't understand how subinterpreters work. AIUI, the whole
> point of independent subinterpreters is that they share no state. So
> if I have a web server, each independent serving thread has to do all
> of the initialization (import HTTP libraries, etc.), right?

Yes.  However, I expect that we could mitigate that cost to some extent.

> Compare
> with forking, where the initialization is all done and then you fork,
> and you are immediately ready to serve, using the data structures
> shared with all the other workers, which is only copied when it is
> written to. So forking starts up faster and uses less memory (due to
> shared memory.)

But we are aiming for a share-nothing model with an efficient
object-passing mechanism.  Furthermore, subinterpreters do not have to
be single-use.  My proposal includes running tasks in an existing
subinterpreter (e.g. executor pool), so that start-up cost is
mitigated in cases where it matters.
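
The amortisation argument can be seen with an ordinary thread pool, used here
only as a stand-in for a pool of subinterpreters (an assumption; no
subinterpreter executor exists). The expensive_setup function is a
hypothetical placeholder for per-worker start-up cost.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

init_calls = []

def expensive_setup():
    # Stand-in for per-worker start-up cost (imports, configuration, ...).
    init_calls.append(threading.get_ident())

def task(n):
    return n * n

with ThreadPoolExecutor(max_workers=1, initializer=expensive_setup) as pool:
    results = list(pool.map(task, range(5)))

print(results)          # five tasks were run...
print(len(init_calls))  # ...but the worker was set up only once
```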

Note that ultimately my goal is to make it obvious and undeniable that
Python (3.6+) has a good multi-core story.  In my proposal,
subinterpreters are a means to an end.  If there's a better solution
then great!  As long as the real goal is met I'll be satisfied. :)
For now I'm still confident that the subinterpreter approach is the
best option for meeting the goal.

> Re passing objects, see below.
> I do agree it's cross-platform, but right now that's the only thing I
> agree with.
>>> Note: I don't count the IPC cost of forking, because at least on
>>> linux, any way to efficiently share objects between independent
>>> interpreters in separate threads can also be ported to independent
>>> interpreters in forked subprocesses,
>> How so?  Subinterpreters are in the same process.  For this proposal each
>> would be on its own thread.  Sharing objects between them through channels
>> would be more efficient than IPC.  Perhaps I've missed something?
> You might be missing that memory can be shared between processes, not
> just threads, but I don't know.
> The reason passing objects between processes is so slow is currently
> *nearly entirely* the cost of serialization. That is, it's the fact
> that you are passing an object to an entirely separate interpreter,
> and need to serialize the whole object graph and so on. If you can
> make that fast without serialization,

That is a worthy goal!

> for shared memory threads, then
> all the serialization becomes unnecessary, and you can either write to
> a pipe (fast, if it's a non-container), or use shared memory from the
> beginning (instantaneous). This is possible on any POSIX OS. Linux
> lets you go even further.

And this is faster than passing objects around within the same
process?  Does it play well with Python's memory model?


From ericsnowcurrently at  Wed Jun 24 07:30:13 2015
From: ericsnowcurrently at (Eric Snow)
Date: Tue, 23 Jun 2015 23:30:13 -0600
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Jun 21, 2015 at 6:13 AM, Devin Jeanpierre
<jeanpierreda at> wrote:
> The solution has threads that are remarkably like
> processes, so I think it's really important to be careful about the
> differences and why this solution has the advantage. I'm not seeing
> that.

Good point.  I still think there are some significant differences (as
already explained).

> And remember that we *do* have many examples of people using
> parallelized Python code in production. Are you sure you're satisfying
> their concerns, or whose concerns are you trying to satisfy?

Another good point.  What would you suggest is the best way to find out?


From ericsnowcurrently at  Wed Jun 24 07:33:23 2015
From: ericsnowcurrently at (Eric Snow)
Date: Tue, 23 Jun 2015 23:33:23 -0600
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <mm6ctj$790$>
References: <>
Message-ID: <>

On Sun, Jun 21, 2015 at 7:06 AM, Stefan Behnel <stefan_ml at> wrote:
> Nick Coghlan schrieb am 21.06.2015 um 03:28:
>> * there may be restrictions on some extension modules that limit them
>> to "main interpreter only" (e.g. if the extension module itself isn't
>> thread-safe, then it will need to remain fully protected by the GIL)
> Just an idea, but C extensions could opt-in to this. Calling into them has
> to go through some kind of callable type, usually PyCFunction. We could
> protect all calls to extension types and C functions with a global runtime
> lock (per process, not per interpreter) and extensions could set a flag on
> their functions and methods (or get it inherited from their extension types
> etc.) that says "I don't need the lock". That allows for a very
> fine-grained transition.

Exactly.  PEP 489 helps facilitate opting in as well, right?
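
A Python-level sketch of the opt-in idea quoted above, purely to show the
shape of it. The real mechanism would live in C around PyCFunction calls;
every name here (lock_free, call_extension, needs_runtime_lock) is
hypothetical.

```python
import threading

_runtime_lock = threading.RLock()   # hypothetical process-wide lock

def lock_free(func):
    """Hypothetical opt-in marker: the function promises it needs no lock."""
    func.needs_runtime_lock = False
    return func

def call_extension(func, *args):
    """Call func, taking the global lock unless it has opted out."""
    if getattr(func, "needs_runtime_lock", True):
        with _runtime_lock:
            return func(*args)
    return func(*args)

def legacy_add(a, b):       # unmarked: protected by default
    return a + b

@lock_free
def safe_add(a, b):         # opted in to running without the lock
    return a + b

print(call_extension(legacy_add, 1, 2), call_extension(safe_add, 1, 2))
```

Defaulting to "needs the lock" is what makes the transition fine-grained:
existing code keeps working and each function opts out individually.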


From ericsnowcurrently at  Wed Jun 24 07:48:00 2015
From: ericsnowcurrently at (Eric Snow)
Date: Tue, 23 Jun 2015 23:48:00 -0600
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Jun 21, 2015 at 3:08 PM, Andrew Barnert <abarnert at> wrote:
> First, a minor question: instead of banning fork entirely within subinterpreters, why not just document that it is illegal to do anything between fork and exec in a subinterpreter, except for a very small (but possibly extensible) subset of Python? For example, after fork, you can no longer access any channels, and you also can't use signals, threads, fork again, imports, assignments to builtins, raising exceptions, or a whole host of other things (but of course if you exec an entirely new Python interpreter, it can do any of those things).

Sure.  I expect the quickest approach, though, will be to initially
have blanket restrictions and then ease them once the core
functionality is complete.

> C extension modules could just have a flag that marks whether the whole module is fork-safe or not (defaulting to not).

That may make sense independently from my proposal.

> So, this allows a subinterpreter to use subprocess (or even multiprocessing, as long as you use the forkserver or spawn mechanism), and it gives code that intentionally wants to do tricky/dangerous things a way to do them, but it avoids all of the problems with accidentally breaking a subinterpreter by forking it and then doing bad things.
> Second, a major question: In this proposal, are builtins and the modules map shared, or copied?
> If they're copied, it seems like it would be hard to do that even as efficiently as multiprocessing, much less more efficiently. Of course you could fake this with CoW, but I'm not sure how you'd do that, short of CoWing the entire heap (by using clone instead of pthreads on Linux, or by doing a bunch of explicit mmap and related calls on other POSIX systems), at which point you're pretty close to just implementing fork or vfork yourself to avoid calling fork or vfork, and unlikely to get it as efficient or as robust as what's already there.
> If they're shared, on the other hand, then it seems like it becomes very difficult to implement subinterpreter-safe code, because it's no longer safe to import a module, set a flag, call a registration function, etc.

I expect that ultimately the builtins will be shared in some fashion.
To some extent they already are.  sys.modules (and the rest of the
import machinery) will mostly not be shared, though I expect that
likewise we will have some form of sharing where we can get away with


From ericsnowcurrently at  Wed Jun 24 07:51:00 2015
From: ericsnowcurrently at (Eric Snow)
Date: Tue, 23 Jun 2015 23:51:00 -0600
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Jun 21, 2015 at 3:24 PM, Andrew Barnert via Python-ideas
<python-ideas at> wrote:
> On Jun 21, 2015, at 06:09, Nick Coghlan <ncoghlan at> wrote:
>> Avoiding object serialisation is indeed the main objective. With
>> subinterpreters, we have a lot more options for that than we do with
>> any form of IPC, including shared references to immutable objects, and
>> the PEP 3118 buffer API.
> It seems like you could provide a way to efficiently copy and share deeper objects than integers and buffers without sharing everything, assuming the user code knows, at the time those objects are created, that they will be copied or shared. Basically, you allocate the objects into a separate arena (along with allocating their refcounts on a separate page, as already mentioned). You can't add a reference to an outside object in an arena-allocated object, although you can copy that outside object into the arena. And then you just pass or clone (possibly by using CoW memory-mapping calls, only falling back to memcpy on platforms that can't do that) entire arenas instead of individual objects (so you don't need the fictitious memdeepcpy function that someone ridiculed earlier in this thread, but you get 90% of the benefits of having one).

Yeah, I've been thinking of something along these lines.  However,
it's not the #1 issue to address so I haven't gotten too far into it.


> This has the same basic advantages of forking, but it's doable efficiently on Windows, and doable less efficiently (but still better than spawn and pass) on even weird embedded platforms, and it forces code to be explicit about what gets shared and copied without forcing it to work through less-natural queue-like APIs.
> Also, it seems like you could fake this entire arena API on top of pickle/copy for a first implementation, then just replace the underlying implementation separately.
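
A rough sketch of what faking the arena API on top of pickle/copy might look
like, as suggested in the quoted text. The Arena class and its put/get/clone
methods are hypothetical names invented for this illustration; a real
implementation would replace the copying with an actual allocator arena.

```python
import copy
import pickle

class Arena:
    """Hypothetical container whose contents are copied in and cloned as a unit."""

    def __init__(self):
        self._objects = {}

    def put(self, name, obj):
        # Outside objects are copied into the arena, never referenced directly.
        self._objects[name] = copy.deepcopy(obj)

    def get(self, name):
        return self._objects[name]

    def clone(self):
        # First implementation: clone the whole arena with one pickle round trip.
        new = Arena()
        new._objects = pickle.loads(pickle.dumps(self._objects))
        return new

outside = {"weights": [1, 2, 3]}
arena = Arena()
arena.put("model", outside)
other = arena.clone()
other.get("model")["weights"].append(4)

print(outside, arena.get("model"), other.get("model"))
```

Only the clone sees the appended value, so code written against this
interface would already be explicit about what is copied and what is shared.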

From ericsnowcurrently at  Wed Jun 24 08:01:31 2015
From: ericsnowcurrently at (Eric Snow)
Date: Wed, 24 Jun 2015 00:01:31 -0600
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Jun 21, 2015 at 7:47 PM, Nick Coghlan <ncoghlan at> wrote:
> It occurred to me in the context of another conversation that you (or
> someone else!) may be able to prototype some of the public API ideas
> for this using Jython and Vert.x:

I'll take a look.

> That idea and some of the initial feedback in this thread also made me
> realise that it is going to be essential to keep in mind that there
> are key goals at two different layers here:
> * design a compelling implementation independent public API for CSP
> style programming in Python
> * use subinterpreters to implement that API efficiently in CPython
> There's a feedback loop between those two goals where limitations on
> what's feasible in CPython may constrain the design of the public API,
> and the design of the API may drive enhancements to the existing
> subinterpreter capability, but we shouldn't lose sight of the fact
> that they're *separate* goals.

Yep.  I've looked at it that way from the beginning.  When I get to
the point of writing an actual PEP, I'm thinking it will actually be
multiple PEPs covering the different pieces.

I've also been considering how to implement that high-level API in
terms of a low-level API (threading vs. _thread) and whether it makes sense
to focus less on subinterpreters in that context.  At this point it
makes sense to me to expose subinterpreters in Python, so for now I
was planning on that for the low-level API.
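
A minimal sketch of what a CSP-style high-level API could look like when
layered over a low-level primitive. Everything here is hypothetical: the
names Channel and go are invented for illustration, and plain threads plus
queue.Queue stand in for subinterpreters and whatever transport a real
design would use.

```python
import queue
import threading


class Channel:
    """Hypothetical CSP-style channel (single consumer).

    A thread-safe queue stands in for a cross-interpreter transport.
    """

    _CLOSED = object()  # sentinel marking the end of the stream

    def __init__(self):
        self._q = queue.Queue()

    def send(self, value):
        self._q.put(value)

    def close(self):
        self._q.put(self._CLOSED)

    def __iter__(self):
        # Block on each item until the sender closes the channel.
        while True:
            item = self._q.get()
            if item is self._CLOSED:
                return
            yield item


def go(func, *args):
    """Run func concurrently; here in a thread, not a subinterpreter."""
    thread = threading.Thread(target=func, args=args)
    thread.start()
    return thread


def squarer(inbox, outbox):
    # The worker shares nothing with its caller except the two channels.
    for n in inbox:
        outbox.send(n * n)
    outbox.close()


numbers, squares = Channel(), Channel()
worker = go(squarer, numbers, squares)
for n in range(5):
    numbers.send(n)
numbers.close()
results = list(squares)
worker.join()
```

The point of the split is that user code only sees Channel and go; the
thread underneath could be swapped for another mechanism without changing
the calling code.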


From ericsnowcurrently at  Wed Jun 24 08:11:16 2015
From: ericsnowcurrently at (Eric Snow)
Date: Wed, 24 Jun 2015 00:11:16 -0600
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jun 22, 2015 at 5:59 PM, Nathaniel Smith <njs at> wrote:
> On Mon, Jun 22, 2015 at 10:37 AM, Gregory P. Smith <greg at> wrote:
>> ...
> One possibility would be for subinterpreters to copy modules from the
> main interpreter -- I guess your average module is mostly dicts,
> strings, type objects, and functions; strings and functions are
> already immutable and could be shared without copying, and I guess
> copying the dicts and type objects into the subinterpreter is much
> cheaper than hitting the disk etc. to do a real import. (Though
> certainly not free.)

Yeah, I think there are a number of mechanisms we can explore to
improve the efficiency of subinterpreter startup (and sharing).

> This would have interesting semantic implications -- it would give
> similar effects to fork(), with subinterpreters starting from a
> snapshot of the main interpreter's global state.
>> I'm not entirely sold on this overall proposal, but I think a result of it
>> could be to make our subinterpreter support better which would be a good
>> thing.
>> We have had to turn people away from subinterpreters in the past for use as
>> part of their multithreaded C++ server where they wanted to occasionally run
>> some Python code in embedded interpreters as part of serving some requests.
>> Doing that would suddenly single thread their application (GIIIIIIL!) for
>> all requests currently executing Python code despite multiple
>> subinterpreters.
> I've also talked to HPC users who discovered this problem the hard way
> (e.g., folks working on the Large Hadron
> Collider) -- they've been using Python as an extension language in
> some large physics codes but are now porting those bits to C++ because
> of the GIL issues. (In this context startup overhead should be easily
> amortized, but switching to an RPC model is not going to happen.)

Would this proposal make a difference for them?


From ericsnowcurrently at  Wed Jun 24 08:12:37 2015
From: ericsnowcurrently at (Eric Snow)
Date: Wed, 24 Jun 2015 00:12:37 -0600
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jun 22, 2015 at 9:52 PM, Nick Coghlan <ncoghlan at> wrote:
> On 23 June 2015 at 10:03, Chris Angelico <rosuav at> wrote:
>> On Tue, Jun 23, 2015 at 9:59 AM, Nathaniel Smith <njs at> wrote:
>>> One possibility would be for subinterpreters to copy modules from the
>>> main interpreter -- I guess your average module is mostly dicts,
>>> strings, type objects, and functions; strings and functions are
>>> already immutable and could be shared without copying, and I guess
>>> copying the dicts and type objects into the subinterpreter is much
>>> cheaper than hitting the disk etc. to do a real import. (Though
>>> certainly not free.)
>> FWIW, functions aren't immutable, but code objects are.
> Anything we come up with for optimised data sharing via channels could
> be applied to passing a prebuilt sys.modules dictionary through to
> subinterpreters.
> The key for me is to start from a well-defined "shared nothing"
> semantic model, but then look for ways to exploit the fact that we
> actually *are* running in the same address space to avoid copying
> objects.


> The current reference-counts-embedded-in-the-object-structs memory
> layout also plays havoc with the all-or-nothing page level
> copy-on-write semantics used by the fork() syscall at the operating
> system layer, so some of the ideas we've been considering
> (specifically, those related to moving the reference counter
> bookkeeping out of the object structs themselves) would potentially
> help with that as well (but would also have other hard to predict
> performance consequences).
> There's a reason Eric announced this as the *start* of a research
> project, rather than as a finished proposal - while it seems
> conceptually sound overall, there are a vast number of details to be
> considered that will no doubt hold a great many devils :)

And they keep multiplying! :)


From ericsnowcurrently at  Wed Jun 24 08:15:58 2015
From: ericsnowcurrently at (Eric Snow)
Date: Wed, 24 Jun 2015 00:15:58 -0600
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jun 23, 2015 at 10:01 AM, Barry Warsaw <barry at> wrote:
> A crazy offshoot idea would be something like Emacs' unexec, where during the
> build process you could preload a bunch of always-used immutable modules, then
> freeze the state in such a way that starting up again later would be much
> faster, because the imports (and probably more importantly, the searching)
> could be avoided.



From ericsnowcurrently at  Wed Jun 24 08:18:39 2015
From: ericsnowcurrently at (Eric Snow)
Date: Wed, 24 Jun 2015 00:18:39 -0600
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
 <mmbhjs$pau$> <>
Message-ID: <>

On Tue, Jun 23, 2015 at 5:32 PM, Devin Jeanpierre
<jeanpierreda at> wrote:
> A coworker of mine wrote a patch to Python that allows you to freeze
> refcounts for all existing objects before forking, if the correct
> compile options are set. This adds overhead to incref/decref, but
> dramatically changes the python+fork memory usage story. (I haven't
> personally played with it much, but it sounds decent.) If there's any
> interest I can try to upstream this change, guarded behind a compiler
> flag.
> We've also tried moving refcounts to their own pages, like you and
> Nick suggest, but it breaks a *lot* of third-party code. I can try to
> upstream it. If it's guarded by a compiler flag it is probably still
> useful, just any users would have to grep through their dependencies
> to make sure nothing directly accesses the refcount. (The stdlib can
> be made to work.)  It sounds like it would also be useful for the main
> project in the topic of this thread, so I imagine there's more
> momentum behind it.

I'd be interested in more info on both the refcount freezing and the
separate refcount pages.
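
For context on why refcounts matter for fork(): in CPython the reference
count is stored in the object itself, so merely taking a new reference
writes to the object's memory. A small sketch of that effect follows;
sys.getrefcount is CPython-specific, and the variable names are only
illustrative.

```python
import sys

payload = object()

before = sys.getrefcount(payload)
alias = payload  # no copy is made; only the stored refcount changes
after = sys.getrefcount(payload)

# The object was never "modified" from the program's point of view, yet
# its header was written to. After fork(), a write like this is enough to
# make the OS copy the whole page that holds the object.
refcount_delta = after - before
```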


From ericsnowcurrently at  Wed Jun 24 08:19:54 2015
From: ericsnowcurrently at (Eric Snow)
Date: Wed, 24 Jun 2015 00:19:54 -0600
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
 <mmbhjs$pau$> <>
Message-ID: <>

On Tue, Jun 23, 2015 at 5:32 PM, Devin Jeanpierre
<jeanpierreda at> wrote:
> We've also tried moving refcounts to their own pages, like you and
> Nick suggest, but it breaks a *lot* of third-party code. I can try to
> upstream it. If it's guarded by a compiler flag it is probably still
> useful, just any users would have to grep through their dependencies
> to make sure nothing directly accesses the refcount. (The stdlib can
> be made to work.)  It sounds like it would also be useful for the main
> project in the topic of this thread, so I imagine there's more
> momentum behind it.

Any indication of the performance impact?


From ericsnowcurrently at  Wed Jun 24 08:21:42 2015
From: ericsnowcurrently at (Eric Snow)
Date: Wed, 24 Jun 2015 00:21:42 -0600
Subject: [Python-ideas] PyParallel update (was: solving multi-core
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jun 23, 2015 at 7:53 AM, Trent Nelson <trent at> wrote:
> On Sat, Jun 20, 2015 at 03:42:33PM -0600, Eric Snow wrote:
>> Furthermore, removing the GIL is perhaps an obvious solution but not
>> the only one.  Others include Trent Nelson's PyParallels, STM, and
>> other Python implementations..
> So, I've been sprinting relentlessly on PyParallel since Christmas, and
> recently reached my v0.0 milestone of being able to handle all the TEFB
> tests, plus get the "instantaneous wiki search" thing working too.

Thanks for the update, Trent.  I've skimmed through it and will be
reading more in-depth when I get a chance.  I'm sure I'll have more
questions for you. :)


From njs at  Wed Jun 24 09:19:47 2015
From: njs at (Nathaniel Smith)
Date: Wed, 24 Jun 2015 00:19:47 -0700
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jun 23, 2015 at 11:11 PM, Eric Snow <ericsnowcurrently at> wrote:
> On Mon, Jun 22, 2015 at 5:59 PM, Nathaniel Smith <njs at> wrote:
>> On Mon, Jun 22, 2015 at 10:37 AM, Gregory P. Smith <greg at> wrote:
>>> We have had to turn people away from subinterpreters in the past for use as
>>> part of their multithreaded C++ server where they wanted to occasionally run
>>> some Python code in embedded interpreters as part of serving some requests.
>>> Doing that would suddenly single thread their application (GIIIIIIL!) for
>>> all requests currently executing Python code despite multiple
>>> subinterpreters.
>> I've also talked to HPC users who discovered this problem the hard way
>> (e.g., folks working on the Large Hadron
>> Collider) -- they've been using Python as an extension language in
>> some large physics codes but are now porting those bits to C++ because
>> of the GIL issues. (In this context startup overhead should be easily
>> amortized, but switching to an RPC model is not going to happen.)
> Would this proposal make a difference for them?

I'm not sure -- it was just a conversation, so I've never seen their
actual code. I'm pretty sure they're still on py2, for one thing :-).
But putting that aside, I *think* it potentially could help -- my
guess is that at a high level they have an API where they basically
want to register a callback once, and then call it in parallel from
multiple threads. This kind of usage would require some extra
machinery, I guess, to spawn a subinterpreter for each thread and
import the relevant libraries so the callback could run, but I can't
see any reason one couldn't build that on top of the mechanisms you're
talking about.


Nathaniel J. Smith --

From solipsis at  Wed Jun 24 09:19:55 2015
From: solipsis at (Antoine Pitrou)
Date: Wed, 24 Jun 2015 09:19:55 +0200
Subject: [Python-ideas] millisecond and microsecond times without floats
References: <20150623021530.74ce1ebe@x230>
Message-ID: <20150624091955.2efef148@fsol>

On Tue, 23 Jun 2015 23:25:00 +0300
Paul Sokolovsky <pmiscml at> wrote:
> Well, that's one of examples of that "desktop" thinking ;-).
> Consider for example that 2^32 microseconds is just over an hour, so
> expressing everything in microseconds would require arbitrary-precision
> integers, which may be just the same kind of burden for an embedded
> system as floats.

I'd like to suggest micropython first acquire the ability to handle
64-bit numbers (or something close to that, e.g. 60-bit, if it likes
to use tags for typing), if it wants to become appropriate for precise
datetime computations.

That should be less of a heavy requirement than arbitrary-precision
ints.



From mal at  Wed Jun 24 09:50:16 2015
From: mal at (M.-A. Lemburg)
Date: Wed, 24 Jun 2015 09:50:16 +0200
Subject: [Python-ideas] millisecond and microsecond times without floats
In-Reply-To: <20150623021530.74ce1ebe@x230>
References: <20150623021530.74ce1ebe@x230>
Message-ID: <>

On 23.06.2015 01:15, Paul Sokolovsky wrote:
> Hello from MicroPython, a lean Python implementation
> scaling down to run even on microcontrollers 
> (
> Our target hardware base oftentimes lacks floating point support, and
> using software emulation is expensive. So, we would like to have
> versions of some timing functions, taking/returning millisecond and/or
> microsecond values as integers.
> The most functionality we're interested in:
> 1. Delays
> 2. Relative time (from an arbitrary starting point, expected to be
>    wrapped)
> 3. Calculating time differences, with immunity to wrap-around.
> The first presented assumption is to use "time.sleep()" for delays,
> "time.monotonic()" for relative time as the base. Would somebody give
> alternative/better suggestions?
> Second question is how to modify their names for
> millisecond/microsecond versions. For sleep(), "msleep" and "usleep"
> would be concise possibilities, but that doesn't map well to
> monotonic(), leading to "mmonotonic". So, better idea is to use "_ms"
> and "_us" suffixes:
> sleep_ms()
> sleep_us()
> monotonic_ms()
> monotonic_us()
> Point 3 above isn't currently addressed by time module at all.
> mentions some internal
> workaround for overflows/wrap-arounds on some systems. Due to
> lean-ness of our hardware base, we'd like to make this matter explicit
> to the applications and avoid internal workarounds. Proposed solution
> is to have time.elapsed(time1, time2) function, which can take values
> as returned by monotonic_ms(), monotonic_us(). Assuming that results of
> both functions are encoded and wrap consistently (this is reasonable
> assumption), there's no need for 2 separate elapsed_ms(), elapsed_us()
> functions.
> So, the above are rough ideas we (well, I) have. We'd like to get wider
> Python community feedback on them, see if there're better/alternative
> ideas, how Pythonic it is, etc. To clarify, this should not be construed
> as proposal to add the above functions to CPython.

You may want to use a similar approach as I have used in
mxDateTime to express date/time values:

It uses an integer to represent days and a float to represent
seconds since midnight (i.e. time of day). The concept has worked
out really well and often makes date/time calculations a lot
easier than trying to stuff everything into a single number and
then having to deal with things like leap seconds and rounding errors.

In your case you'd use integers for both and nanoseconds
as basis for the time of day integer.
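
A rough sketch of the all-integer variant, assuming a timestamp is a plain
(days, nanoseconds_of_day) tuple. The helper names are made up for
illustration and are not the mxDateTime API. Note that nanoseconds of day
go up to 86,400 * 10**9, which does not fit in 32 bits.

```python
NS_PER_DAY = 24 * 60 * 60 * 10**9


def normalize(days, ns):
    """Carry overflow or underflow of the nanosecond part into days."""
    extra_days, ns = divmod(ns, NS_PER_DAY)
    return days + extra_days, ns


def add_ns(stamp, delta_ns):
    """Return stamp shifted by delta_ns nanoseconds (may be negative)."""
    days, ns = stamp
    return normalize(days, ns + delta_ns)


def diff_ns(a, b):
    """Return a - b in nanoseconds."""
    return (a[0] - b[0]) * NS_PER_DAY + (a[1] - b[1])


start = (0, NS_PER_DAY - 1)   # one nanosecond before midnight
later = add_ns(start, 2)      # crosses the day boundary
earlier = add_ns(later, -2)   # and back again
```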

Marc-Andre Lemburg


From ncoghlan at  Wed Jun 24 10:22:42 2015
From: ncoghlan at (Nick Coghlan)
Date: Wed, 24 Jun 2015 18:22:42 +1000
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On 24 June 2015 at 15:33, Eric Snow <ericsnowcurrently at> wrote:
> On Sun, Jun 21, 2015 at 7:06 AM, Stefan Behnel <stefan_ml at> wrote:
>> Nick Coghlan schrieb am 21.06.2015 um 03:28:
>>> * there may be restrictions on some extension modules that limit them
>>> to "main interpreter only" (e.g. if the extension module itself isn't
>>> thread-safe, then it will need to remain fully protected by the GIL)
>> Just an idea, but C extensions could opt-in to this. Calling into them has
>> to go through some kind of callable type, usually PyCFunction. We could
>> protect all calls to extension types and C functions with a global runtime
>> lock (per process, not per interpreter) and Extensions could set a flag on
>> their functions and methods (or get it inherited from their extension types
>> etc.) that says "I don't need the lock". That allows for a very
>> fine-grained transition.
> Exactly.  PEP 489 helps facilitate opting in as well, right?

Yep, as PEP 489 requires subinterpreter compatibility as a
precondition for using multi-phase initialisation :)


P.S. Technically, what it actually requires is support for "multiple
instances of the module existing in the same process at the same
time", as it really recreates the module if you remove it from
sys.modules and import it again, unlike single phase initialisation.
But that's a mouthful, so "must support subinterpreters" is an easier

Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From drekin at  Wed Jun 24 11:00:18 2015
From: drekin at (=?UTF-8?B?QWRhbSBCYXJ0b8Wh?=)
Date: Wed, 24 Jun 2015 11:00:18 +0200
Subject: [Python-ideas] Are there asynchronous generators?
Message-ID: <>


I had a generator producing pairs of values and wanted to feed all the
first members of the pairs to one consumer and all the second members to
another consumer. For example:

def pairs():
    for i in range(4):
        yield (i, i ** 2)

biconsumer(sum, list)(pairs()) -> (6, [0, 1, 4, 9])

The point is I wanted the consumers to be suspended and resumed in a
coordinated manner: The first consumer is invoked, it wants the first
element. The coordinator implemented by biconsumer function invokes
pairs(), gets the first pair and yields its first member to the first
consumer. Then it wants the next element, but now it's the second
consumer's turn, so the first consumer is suspended and the second consumer
is invoked and fed with the second member of the first pair. Then the
second consumer wants the next element, but it's the first consumer's turn,
and so on. In the end, when the stream of pairs is exhausted, StopIteration
is thrown to both consumers and their results are combined.
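
One way to get this behaviour with today's synchronous consumers is to run
each consumer in its own thread and feed it through a queue. This is only a
sketch under that assumption: it uses threads rather than coroutines, the
consumers are kept roughly in step by one-slot queues rather than strictly
alternated, and it would block if a consumer stopped reading early.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of a stream


def _drain(q):
    """Yield items from q until the sentinel arrives."""
    while True:
        item = q.get()
        if item is _DONE:
            return
        yield item


def biconsumer(first, second):
    """Return a callable that splits an iterable of pairs between two consumers."""
    consumers = (first, second)

    def run(pairs):
        # One-slot queues keep the two consumers close to lock-step.
        queues = [queue.Queue(maxsize=1) for _ in consumers]
        results = [None, None]

        def worker(i):
            results[i] = consumers[i](_drain(queues[i]))

        threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
        for t in threads:
            t.start()
        for a, b in pairs:
            queues[0].put(a)
            queues[1].put(b)
        for q in queues:
            q.put(_DONE)
        for t in threads:
            t.join()
        return tuple(results)

    return run


def pairs():
    for i in range(4):
        yield (i, i ** 2)


combined = biconsumer(sum, list)(pairs())
```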

The cooperative asynchronous nature of the execution reminded me of asyncio
and coroutines, so I thought that biconsumer may be implemented using them.
However, it seems that it is impossible to write an "asynchronous generator"
since the "yielding pipe" is already used for the communication with the
scheduler. And even if it was possible to make an asynchronous generator,
it is not clear how to feed it to a synchronous consumer like sum() or
list() function.

With PEP 492 the concepts of generators and coroutines were separated, so
asynchronous generators may be possible in theory. An ordinary function has
just the returning pipe, for returning the result to the caller. A
generator also has a yielding pipe, used for yielding the values during
iteration, and its return pipe is used to finish the iteration. A native
coroutine has a returning pipe, to return the result to a caller just like
an ordinary function, and also an async pipe, used for communication with
a scheduler and execution suspension. An asynchronous generator would just
have both a yielding pipe and an async pipe.

So my question is: was the code like the following considered? Does it make
sense? Or are there not enough use cases for such code? I found only a
short mention in, so possibly
these coroutine-generators are the same idea.

async def f():
    number_string = await fetch_data()
    for n in number_string.split():
        yield int(n)

async def g():
    result = async/await? sum(f())
    return result

async def h():
    the_sum = await g()

As for explanation about the execution of h() by an event loop: h is a
native coroutine called by the event loop, having both returning pipe and
async pipe. The returning pipe leads to the end of the task, the async pipe
is used for communication with the scheduler. Then, g() is called
asynchronously: using the await keyword means that access to the async
pipe is given to the callee. Then g() invokes the asynchronous generator f()
and gives it access to its async pipe, so when f() is yielding values
to sum, it can also yield a future to the scheduler via the async pipe and
suspend the whole task.

Regards, Adam Bartoš

From pmiscml at  Wed Jun 24 12:13:49 2015
From: pmiscml at (Paul Sokolovsky)
Date: Wed, 24 Jun 2015 13:13:49 +0300
Subject: [Python-ideas] millisecond and microsecond times without floats
In-Reply-To: <20150624091955.2efef148@fsol>
References: <20150623021530.74ce1ebe@x230>
 <20150623232500.45efccdf@x230> <20150624091955.2efef148@fsol>
Message-ID: <20150624131349.01ee7634@x230>


On Wed, 24 Jun 2015 09:19:55 +0200
Antoine Pitrou <solipsis at> wrote:

> On Tue, 23 Jun 2015 23:25:00 +0300
> Paul Sokolovsky <pmiscml at> wrote:
> > 
> > Well, that's one of examples of that "desktop" thinking ;-).
> > Consider for example that 2^32 microseconds is just over an hour, so
> > expressing everything in microseconds would require
> > arbitrary-precision integers, which may be just the same kind of
> > burden for an embedded system as floats.
> I'd like to suggest micropython first acquire the ability to handle
> 64-bit numbers (or something close to that, e.g. 60-bit, if it likes
> to use tags for typing), if it wants to become appropriate for precise
> datetime computations.

MicroPython has such support. Long integers can be implemented either
as variable-size arbitrary-precision integers or as the C long long type.
But that doesn't change the fact that 64-bit values still overflow, or
that we don't want to force the need for any kind of long integer on any
particular implementation.

We don't even want to mask the fact that fixed-size (time) counters
overflow - for various reasons, including the fact that we want to
follow Python's tradition of being a nice teaching/learning language, and
learning embedded programming means learning to deal with timers, etc.

So, the question is not how to become "appropriate for precise datetime
computations" - MicroPython inherits that ability by being a Python -
but how to scale in the opposite direction: how to integrate into
stdlib "realtime" time handling, which is simple, fast (getting the timing
value itself is low-overhead) and modular-arithmetic by its nature.
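
To make "modular-arithmetic by its nature" concrete, here is a sketch of a
wrap-safe elapsed() over a fixed-width counter. The 30-bit width and the
helper names are assumptions chosen for illustration, not an existing API.
The result is only meaningful if the real interval is shorter than one
full counter period.

```python
TICKS_BITS = 30                 # assumed counter width
TICKS_PERIOD = 1 << TICKS_BITS
TICKS_MASK = TICKS_PERIOD - 1


def ticks_add(ticks, delta):
    """Advance a counter value by delta, wrapping at the counter width."""
    return (ticks + delta) & TICKS_MASK


def elapsed(start, end):
    """Ticks from start to end, correct even if the counter wrapped once."""
    return (end - start) & TICKS_MASK


# A reading taken 5 ticks before the counter wraps, and one 12 ticks later.
start = TICKS_PERIOD - 5
end = ticks_add(start, 12)
```

Because both readings are reduced modulo the same period, the same
elapsed() works for millisecond and microsecond counters alike, as long as
they use the same width.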

> That should be less of a heavy requirement than arbitrary-precision
> ints.
> Regards
> Antoine.

Best regards,
 Paul                          mailto:pmiscml at

From andrew.svetlov at  Wed Jun 24 12:13:53 2015
From: andrew.svetlov at (Andrew Svetlov)
Date: Wed, 24 Jun 2015 13:13:53 +0300
Subject: [Python-ideas] Are there asynchronous generators?
In-Reply-To: <>
References: <>
Message-ID: <>

Your idea is clean and maybe we will allow `yield` inside `async def`
in Python 3.6.
For PEP 492 it was too big a change.
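
For reference, `yield` inside `async def` did become legal in Python 3.6
(PEP 525). A sketch of the f/g/h example in that form follows. fetch_data
is a stand-in invented here, asyncio.run needs Python 3.7 or later, and
since sum() still cannot consume an asynchronous generator, g() uses an
explicit `async for` loop instead.

```python
import asyncio


async def fetch_data():
    # Stand-in for real I/O; yields control to the event loop once.
    await asyncio.sleep(0)
    return "1 2 3 4"


async def f():
    # Asynchronous generator: uses both await and yield.
    number_string = await fetch_data()
    for n in number_string.split():
        yield int(n)


async def g():
    total = 0
    async for n in f():
        total += n
    return total


async def h():
    return await g()


result = asyncio.run(h())
```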

Andrew Svetlov

From solipsis at  Wed Jun 24 12:40:10 2015
From: solipsis at (Antoine Pitrou)
Date: Wed, 24 Jun 2015 12:40:10 +0200
Subject: [Python-ideas] millisecond and microsecond times without floats
References: <20150623021530.74ce1ebe@x230>
 <20150623232500.45efccdf@x230> <20150624091955.2efef148@fsol>
Message-ID: <20150624124010.24cd3613@fsol>

On Wed, 24 Jun 2015 13:13:49 +0300
Paul Sokolovsky <pmiscml at> wrote:
> So, the question is not how to "appropriate for precise datetime
> computations" - MicroPython inherits that ability by being a Python,
> but how to scale into the opposite direction, how to integrate into
> stdlib "realtime" time handling, which is simple, fast (getting timing
> value itself is low-overhead) and modular-arithmetic by its nature.

I'm sorry, I don't understand. If you have 64-bit ints then why would
you use anything smaller for timestamps?



From pmiscml at  Wed Jun 24 12:59:08 2015
From: pmiscml at (Paul Sokolovsky)
Date: Wed, 24 Jun 2015 13:59:08 +0300
Subject: [Python-ideas] millisecond and microsecond times without floats
In-Reply-To: <20150624124010.24cd3613@fsol>
References: <20150623021530.74ce1ebe@x230>
 <20150623232500.45efccdf@x230> <20150624091955.2efef148@fsol>
 <20150624131349.01ee7634@x230> <20150624124010.24cd3613@fsol>
Message-ID: <20150624135908.4f85d415@x230>


On Wed, 24 Jun 2015 12:40:10 +0200
Antoine Pitrou <solipsis at> wrote:

> On Wed, 24 Jun 2015 13:13:49 +0300
> Paul Sokolovsky <pmiscml at> wrote:
> > 
> > So, the question is not how to "appropriate for precise datetime
> > computations" - MicroPython inherits that ability by being a Python,
> > but how to scale into the opposite direction, how to integrate into
> > stdlib "realtime" time handling, which is simple, fast (getting
> > timing value itself is low-overhead) and modular-arithmetic by its
> > nature.
> I'm sorry, I don't understand. If you have 64-bit ints then why would
> you use anything smaller for timestamps?

Because MicroPython stays close (== may stay close) to hardware and does
not depend on any OS (even those smaller embedded OSes, which are
called RTOS'es). Then, it's the usual case for embedded hardware to have
hardware timers the same size as or smaller than the architecture machine
word. For example, on a 32-bit CPU, timers are usually 32-, 24-, or 16-
bit. On 16-bit CPUs, timers are 16- or 8-bit. Put another way,
there's simply nowhere to get a 64-bit time value from, except by building
software abstractions, and MicroPython does not *require* them (if they
exist - good, they will be helpful for other things, if not -
MicroPython can still run and do a large subset of useful things).
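
To put numbers on those timer widths, a small integer-only calculation of
how long a free-running counter lasts before it wraps, assuming one tick
per microsecond. The function name is illustrative.

```python
US_PER_SECOND = 10**6


def wrap_period_seconds(bits, ticks_per_second=US_PER_SECOND):
    """Whole seconds until a counter of the given width wraps."""
    return (1 << bits) // ticks_per_second


periods = {bits: wrap_period_seconds(bits) for bits in (16, 24, 32, 64)}

# 16-bit: under a second; 24-bit: about 16 seconds;
# 32-bit: 4294 seconds, i.e. just over an hour.
minutes_32 = periods[32] // 60
```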

Another reason is that MicroPython does indeed use a tagged pointer
scheme, and small integers are value, not reference, objects. Dealing
with them is much faster (MicroPython easily beats CPython on
(small) integer performance), and doesn't require memory allocation
(the latter is another important feature for embedded systems).
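
A sketch of how a tagged small integer can live inside a machine word
without any allocation. The layout is an assumption made for illustration
(32-bit word, lowest bit set to 1 marks a small int, leaving a 31-bit
signed payload); the thread does not spell out MicroPython's exact scheme.

```python
WORD_BITS = 32
PAYLOAD_BITS = WORD_BITS - 1            # one bit is spent on the tag
WORD_MASK = (1 << WORD_BITS) - 1
SMALL_MIN = -(1 << (PAYLOAD_BITS - 1))
SMALL_MAX = (1 << (PAYLOAD_BITS - 1)) - 1


def fits_small(n):
    """True if n can be stored inline, without a heap-allocated long."""
    return SMALL_MIN <= n <= SMALL_MAX


def encode_small(n):
    """Pack n into a word: payload shifted left, tag bit 1 at the bottom."""
    if not fits_small(n):
        raise OverflowError("value needs a heap-allocated long integer")
    return ((n << 1) | 1) & WORD_MASK


def is_small(word):
    return word & 1 == 1


def decode_small(word):
    """Recover the signed payload from a tagged word."""
    payload = word >> 1
    if payload >> (PAYLOAD_BITS - 1):   # sign bit of the payload is set
        payload -= 1 << PAYLOAD_BITS
    return payload


round_trip = [decode_small(encode_small(n)) for n in (-1, 0, 5, SMALL_MIN, SMALL_MAX)]
```

Under this assumed layout a full 32-bit microsecond counter value would not
fit inline, which is one reason a narrower, wrapping tick value is
attractive on such targets.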

> Regards
> Antoine.

Best regards,
 Paul                          mailto:pmiscml at

From solipsis at  Wed Jun 24 13:03:38 2015
From: solipsis at (Antoine Pitrou)
Date: Wed, 24 Jun 2015 13:03:38 +0200
Subject: [Python-ideas] millisecond and microsecond times without floats
References: <20150623021530.74ce1ebe@x230>
 <20150623232500.45efccdf@x230> <20150624091955.2efef148@fsol>
 <20150624131349.01ee7634@x230> <20150624124010.24cd3613@fsol>
Message-ID: <20150624130338.0b222ca3@fsol>

On Wed, 24 Jun 2015 13:59:08 +0300
Paul Sokolovsky <pmiscml at> wrote:
> Hello,
> On Wed, 24 Jun 2015 12:40:10 +0200
> Antoine Pitrou <solipsis at> wrote:
> > On Wed, 24 Jun 2015 13:13:49 +0300
> > Paul Sokolovsky <pmiscml at> wrote:
> > > 
> > > So, the question is not how to "appropriate for precise datetime
> > > computations" - MicroPython inherits that ability by being a Python,
> > > but how to scale into the opposite direction, how to integrate into
> > > stdlib "realtime" time handling, which is simple, fast (getting
> > > timing value itself is low-overhead) and modular-arithmetic by its
> > > nature.
> > 
> > I'm sorry, I don't understand. If you have 64-bit ints then why would
> > you use anything smaller for timestamps?
> Because MicroPython stays close (== may stay close) to hardware and does
> not depend on any OS (even those smaller embedded OSes, which are
> called RTOS'es). Then, it's usual case for embedded hardware to have
> hardware timers of the same size or smaller as the architecture machine
> word. For example, on a 32-bit CPU, timers are usually 32-, 24-, or 16-
> bit. On 16-bit CPUs, timers are 16- or 8-bit.

I don't think such timers have a place in the CPython standard library,
though. Don't you have an additional namespace for micropython-specific
features?



From pmiscml at  Wed Jun 24 13:38:08 2015
From: pmiscml at (Paul Sokolovsky)
Date: Wed, 24 Jun 2015 14:38:08 +0300
Subject: [Python-ideas] millisecond and microsecond times without floats
In-Reply-To: <20150624130338.0b222ca3@fsol>
References: <20150623021530.74ce1ebe@x230>
 <20150623232500.45efccdf@x230> <20150624091955.2efef148@fsol>
 <20150624131349.01ee7634@x230> <20150624124010.24cd3613@fsol>
 <20150624135908.4f85d415@x230> <20150624130338.0b222ca3@fsol>
Message-ID: <20150624143808.29844019@x230>


On Wed, 24 Jun 2015 13:03:38 +0200
Antoine Pitrou <solipsis at> wrote:

> > Because MicroPython stays close (== may stay close) to hardware and
> > does not depend on any OS (even those smaller embedded OSes, which
> > are called RTOS'es). Then, it's usual case for embedded hardware to
> > have hardware timers of the same size or smaller as the
> > architecture machine word. For example, on a 32-bit CPU, timers are
> > usually 32-, 24-, or 16- bit. On 16-bit CPUs, timers are 16- or
> > 8-bit.
> I don't think such timers have a place in the CPython standard
> library, though. 

They don't, that was said in the very first message. They do have their
place in MicroPython's stdlib and arguably in any other embedded
Python's stdlib. There are a number of embedded Python ports; I don't
know if they have tried to reach out to the wider Python community regarding
aspects peculiar to them. As you can see, we try to do the homework
on our side.
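
To illustrate the modular arithmetic that such narrow hardware timers imply, here is a minimal sketch. The helper name `wrapped_diff` and its `bits` parameter are hypothetical, chosen for this example only; this is not MicroPython's actual API. It assumes a free-running counter that wraps at 2**bits and that fewer than 2**bits ticks elapse between the two readings.

```python
def wrapped_diff(end, start, bits=16):
    """Elapsed ticks between two readings of a free-running counter.

    Hypothetical helper: the counter is assumed to be `bits` wide and to
    wrap around to 0 after reaching 2**bits - 1.
    """
    mask = (1 << bits) - 1
    # Masking the difference undoes a single wrap-around of the counter.
    return (end - start) & mask


# A 16-bit timer read at 65530 and again, after wrapping, at 5:
elapsed = wrapped_diff(5, 65530)
```

The result is only meaningful while the measured interval is shorter than one full period of the counter, which is the constraint that makes such timers "modular" in the first place.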

> Don't you have an additional namespace for micropython-specific
> features?

I treat it as a good sign that it's ~8th message in the thread and it's
only the first time we get a hint that we should get out with our stuff
into a separate namespace ;-). But of course, digging our own hole and
putting random stuff in there is everyone's first choice. And
MicroPython has its "catch-all" module for random stuff, imaginatively
called "pyb", and in (user-friendly) embedded, the de facto API
standard is Arduino's, so that's what was taken as a base for function

So, MicroPython currently has:


As can be seen, while these deal with time measurement/delays, they have
little in common with how Python does it. And the main question we seek
to answer is: what's more beneficial, to keep digging our own hole or to
try to stay close to Python's API (while still adhering to the
requirements posed by embedded platforms)?

> Regards
> Antoine.

Best regards,
 Paul                          mailto:pmiscml at

From mal at  Wed Jun 24 13:43:55 2015
From: mal at (M.-A. Lemburg)
Date: Wed, 24 Jun 2015 13:43:55 +0200
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>	<>	<>	<mm6ctj$790$>	<>
Message-ID: <>

On 24.06.2015 10:22, Nick Coghlan wrote:
> On 24 June 2015 at 15:33, Eric Snow <ericsnowcurrently at> wrote:
>> On Sun, Jun 21, 2015 at 7:06 AM, Stefan Behnel <stefan_ml at> wrote:
>>> Nick Coghlan schrieb am 21.06.2015 um 03:28:
>>>> * there may be restrictions on some extension modules that limit them
>>>> to "main interpreter only" (e.g. if the extension module itself isn't
>>>> thread-safe, then it will need to remain fully protected by the GIL)
>>> Just an idea, but C extensions could opt-in to this. Calling into them has
>>> to go through some kind of callable type, usually PyCFunction. We could
>>> protect all calls to extension types and C functions with a global runtime
>>> lock (per process, not per interpreter) and Extensions could set a flag on
>>> their functions and methods (or get it inherited from their extension types
>>> etc.) that says "I don't need the lock". That allows for a very
>>> fine-grained transition.
>> Exactly.  PEP 489 helps facilitate opting in as well, right?
> Yep, as PEP 489 requires subinterpreter compatibility as a
> precondition for using multi-phase initialisation :)
> Cheers,
> Nick.
> P.S. Technically, what it actually requires is support for "multiple
> instances of the module existing in the same process at the same
> time", as it really recreates the module if you remove it from
> sys.modules and import it again, unlike single phase initialisation.
> But that's a mouthful, so "must support subinterpreters" is an easier
> shorthand.

Note that extension modules often interface to other C libraries
which typically use some setup logic that is not thread safe,
but is used to initialize the other thread safe parts. E.g.
setting up locks and shared memory for all threads to
use is a typical scenario you find in such libs.

A requirement to be able to import modules multiple times
would pretty much kill the idea for those modules.

That said, I don't think this is really needed. Modules
would only have to be made aware that there is a global
first time setup phase and a later shutdown/reinit phase.

As a result, the module DLL would load only once, but then
use the new module setup logic to initialize its own state
multiple times.
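
A minimal sketch of that split, written in Python for brevity although the real logic would live in the extension's C code. The names `module_init` and `module_shutdown` are hypothetical hooks invented for this illustration; they are not part of PEP 489 or of any existing API.

```python
import threading

_global_lock = threading.Lock()
_global_ready = False   # process-wide state, set up exactly once
_live_instances = 0     # how many module instances currently exist


def module_init():
    """Hypothetical per-import hook: global setup once, fresh state each time."""
    global _global_ready, _live_instances
    with _global_lock:
        if not _global_ready:
            # The wrapped library's one-time, non-thread-safe setup
            # (locks, shared memory, ...) would go here.
            _global_ready = True
        _live_instances += 1
    # State owned by this module instance only.
    return {"cache": {}}


def module_shutdown():
    """Hypothetical counterpart; returns True if global state was torn down."""
    global _global_ready, _live_instances
    with _global_lock:
        _live_instances -= 1
        if _live_instances == 0:
            _global_ready = False
            return True
    return False
```

Each call to `module_init()` hands back independent state, while the guarded section runs only for the first instance and is undone only when the last one goes away.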

That said, I still think the multiple-process approach is a better one (more
robust, more compatible, fewer problems). We'd just need a way more
efficient approach to sharing objects between the Python processes
than using pickle and shared memory or pipes :-)

Marc-Andre Lemburg


From jonathan at  Wed Jun 24 13:54:28 2015
From: jonathan at (Jonathan Slenders)
Date: Wed, 24 Jun 2015 13:54:28 +0200
Subject: [Python-ideas] Are there asynchronous generators?
In-Reply-To: <>
References: <>
Message-ID: <>

In my experience, it's much easier to use asyncio Queues for this.
Instead of yielding, push to a queue. The consumer can then use "await

I think the semantics of the generator become too complicated otherwise, or
maybe impossible.
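
As a rough sketch of the queue approach applied to the pairs example quoted below: the function names, the `_DONE` sentinel and the queue sizes are my own choices for illustration, and it uses `async def` and `asyncio.run()`, so it assumes a Python recent enough to have both.

```python
import asyncio

_DONE = object()  # sentinel marking the end of the stream


async def produce_pairs(first_q, second_q):
    # Instead of yielding (i, i ** 2), push each member to its own queue.
    for i in range(4):
        await first_q.put(i)
        await second_q.put(i ** 2)
    await first_q.put(_DONE)
    await second_q.put(_DONE)


async def consume_sum(queue):
    total = 0
    while True:
        item = await queue.get()
        if item is _DONE:
            return total
        total += item


async def consume_list(queue):
    items = []
    while True:
        item = await queue.get()
        if item is _DONE:
            return items
        items.append(item)


async def biconsume():
    # maxsize=1 keeps producer and consumers roughly in lockstep.
    first_q = asyncio.Queue(maxsize=1)
    second_q = asyncio.Queue(maxsize=1)
    _, total, items = await asyncio.gather(
        produce_pairs(first_q, second_q),
        consume_sum(first_q),
        consume_list(second_q),
    )
    return total, items
```

Running `asyncio.run(biconsume())` gives the same pair of results as the `biconsumer(sum, list)(pairs())` example, at the cost of writing the consumers as coroutines rather than reusing `sum()` and `list()` directly.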
Maybe have a look at this article:


2015-06-24 12:13 GMT+02:00 Andrew Svetlov <andrew.svetlov at>:

> Your idea is clean and maybe we will allow `yield` inside `async def`
> in Python 3.6.
> For PEP 492 it was too big change.
> On Wed, Jun 24, 2015 at 12:00 PM, Adam Bartoš <drekin at> wrote:
> > Hello,
> >
> > I had a generator producing pairs of values and wanted to feed all the
> > first members of the pairs to one consumer and all the second members to
> > another consumer. For example:
> >
> > def pairs():
> >     for i in range(4):
> >         yield (i, i ** 2)
> >
> > biconsumer(sum, list)(pairs()) -> (6, [0, 1, 4, 9])
> >
> > The point is I wanted the consumers to be suspended and resumed in a
> > coordinated manner: The first consumer is invoked, it wants the first
> > element. The coordinator implemented by the biconsumer function invokes
> > pairs(), gets the first pair and yields its first member to the first
> > consumer. Then it wants the next element, but now it's the second
> > consumer's turn, so the first consumer is suspended and the second
> > consumer is invoked and fed with the second member of the first pair.
> > Then the second consumer wants the next element, but it's the first
> > consumer's turn, and so on. In the end, when the stream of pairs is
> > exhausted, StopIteration is thrown to both consumers and their results
> > are combined.
> >
> > The cooperative asynchronous nature of the execution reminded me of
> > asyncio and coroutines, so I thought that biconsumer may be implemented
> > using them. However, it seems that it is impossible to write an
> > "asynchronous generator" since the "yielding pipe" is already used for
> > the communication with the scheduler. And even if it was possible to make
> > an asynchronous generator, it is not clear how to feed it to a
> > synchronous consumer like the sum() or list() function.
> >
> > With PEP 492 the concepts of generators and coroutines were separated, so
> > asynchronous generators may be possible in theory. An ordinary function
> > has just the returning pipe, for returning the result to the caller. A
> > generator also has a yielding pipe, used for yielding the values during
> > iteration, and its return pipe is used to finish the iteration. A native
> > coroutine has a returning pipe, to return the result to a caller just
> > like an ordinary function, and also an async pipe, used for communication
> > with a scheduler and execution suspension. An asynchronous generator
> > would just have both a yielding pipe and an async pipe.
> >
> > So my question is: was code like the following considered? Does it make
> > sense? Or are there not enough use cases for such code? I found only a
> > short mention in
> >, so possibly these coroutine-generators are the same idea.
> >
> > async def f():
> >     number_string = await fetch_data()
> >     for n in number_string.split():
> >         yield int(n)
> >
> > async def g():
> >     result = async/await? sum(f())
> >     return result
> >
> > async def h():
> >     the_sum = await g()
> >
> > As for an explanation of the execution of h() by an event loop: h is a
> > native coroutine called by the event loop, having both a returning pipe
> > and an async pipe. The returning pipe leads to the end of the task, the
> > async pipe is used for communication with the scheduler. Then, g() is
> > called asynchronously: using the await keyword means that access to the
> > async pipe is given to the callee. Then g() invokes the asynchronous
> > generator f() and gives it access to its async pipe, so when f() is
> > yielding values to sum, it can also yield a future to the scheduler via
> > the async pipe and suspend the whole task.
> >
> > Regards, Adam Bartoš
> >
> >
> > _______________________________________________
> > Python-ideas mailing list
> > Python-ideas at
> >
> > Code of Conduct:
> --
> Thanks,
> Andrew Svetlov

From oreilldf at  Wed Jun 24 16:01:45 2015
From: oreilldf at (Dan O'Reilly)
Date: Wed, 24 Jun 2015 14:01:45 +0000
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Jun 24, 2015 at 2:01 AM Eric Snow <ericsnowcurrently at> wrote:

> On Sun, Jun 21, 2015 at 7:47 PM, Nick Coghlan <ncoghlan at> wrote:
> > It occurred to me in the context of another conversation that you (or
> > someone else!) may be able to prototype some of the public API ideas
> > for this using Jython and Vert.x:
> I'll take a look.

Note that Vert.x 3 was just released today, which (at least for now) drops
support for Python. There is work underway to support it under version 3,
but it's using CPython and Py4J, not Jython. You'd need to use Vert.x 2 to
get Jython support:

From techtonik at  Wed Jun 24 15:07:59 2015
From: techtonik at (anatoly techtonik)
Date: Wed, 24 Jun 2015 16:07:59 +0300
Subject: [Python-ideas] natively logging sys.path modifications
Message-ID: <>


sys.path is a rather important thing for troubleshooting. It may be worth
shipping it with a logging mechanism that makes it possible to quickly dump
who added what and (optionally) why.

I see the log as a circular memory buffer of limited size, say
256 entries, that contains tuples in the following format:

path, who, where, why

   path   -- actual path added to sys.path
   who    -- the context - package.module:function or
                         - package.module:class.method or
                         - package.module:__toplevel__
   where  -- full filename and line number of the instruction
   why    -- an advanced API may allow setting this field

anatoly t.

From techtonik at  Wed Jun 24 15:19:46 2015
From: techtonik at (anatoly techtonik)
Date: Wed, 24 Jun 2015 16:19:46 +0300
Subject: [Python-ideas] web API to get a list of all module in stdlib
Message-ID: <>


People from core-workflow are not too active about the idea,
so I finally found time to repost it here. The original is here:

The idea is that the site should export the list
of Python modules shipped in the stdlib for a particular Python
version in a machine-readable format.

There are recipes like these to get the list of modules:

But they give only the modules enabled for a specific
interpreter/platform, not the list of modules that is included in
the de facto standard for this stdlib version.

This is needed for processing information for all Python
versions, so instead of parsing HTML tables, it would be
more useful to directly fetch CSV or JSON. That way anybody
can quickly validate the processing algorithm without
wasting time on extracting and normalizing the data.
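
To make "machine readable" concrete, here is a sketch of what one interpreter could emit for itself. It relies on `sys.stdlib_module_names`, which exists only in Python 3.10 and later (so it postdates this proposal); that attribute is documented as being the same on all platforms. The function name, the JSON field names and the fallback to `sys.builtin_module_names` (flagged as incomplete) are assumptions made for this example, not an agreed format.

```python
import json
import sys


def stdlib_modules_json():
    """Return a JSON document listing stdlib module names for this Python."""
    names = getattr(sys, "stdlib_module_names", None)
    complete = names is not None
    if not complete:
        # Fallback for older interpreters: only the compiled-in modules,
        # which is exactly the platform-specific subset criticised above.
        names = sys.builtin_module_names
    payload = {
        "python": "%d.%d" % sys.version_info[:2],
        "complete": complete,
        "modules": sorted(names),
    }
    return json.dumps(payload, indent=2)
```

This still describes only the running interpreter's version; a web API would have to publish one such document per Python release.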

I see the data as a necessary step to organize work
around an "externally evolving standard library", so a way
to query it should be somewhat sustainable and obvious.

Docs look like an obvious way to do so, like:

anatoly t.

From stephen at  Wed Jun 24 16:59:34 2015
From: stephen at (Stephen J. Turnbull)
Date: Wed, 24 Jun 2015 23:59:34 +0900
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

Barry Warsaw writes:

 > A crazy offshoot idea would be something like Emacs' unexec, where
 > during the build process you could preload a bunch of always-used
 > immutable modules,

XEmacs doesn't do this any more if it can avoid it, we now have a
portable dumper that we use on almost all platforms.  And everybody at
GNU Emacs who works with the unexec code wants to get rid of it.
XEmacs's legacy unexec requires defeating address space randomization
as well as certain optimizations that combine segments.  I believe
Emacs's does too.  From a security standpoint, Emacsen are a child's
garden of diseases and it will take decades, maybe centuries, to fix
that, so those aren't huge problems for us.  But I suppose Python
needs to be able to work and play nicely with high-security
environments, and would like to take advantage of security-oriented OS
facilities like base address randomization.  That kind of thing hugely
complicates unexec -- last I heard it wasn't just "way too much work
to be worth it", the wonks who created the portable dumper didn't know
how to do it and weren't sure it could be done.

XEmacs's default "portable dumper" is a poor man's relocating loader.
I don't know exactly how it works, can't give details.  Unlike the
unexecs of some Lisps, however, this is a "do it once per build
process" design.  There's no explicit provision for keeping multiple
dumpfiles around, although I believe it can be done "by hand" by
someone with a little bit of knowledge.  The reason for this is that
the dumpfile is actually added to the executable.

Regarding performance, the dumper itself is fast enough to be
imperceptible to humans at load time, and doesn't take very long to
build the dump file containing the "frozen" objects when building.  I
suspect Python has applications where it would like to be faster
than that, but I don't have benchmarks so don't know if this approach
would be fast enough.

This approach has the feature (disadvantage?) that some objects can't
be dumped including editing buffers, network connections, and
processes.  I suppose those restrictions are very similar to the
restrictions imposed by pickle.

If somebody wants to know more about the portable dumper, I can
probably connect them with the authors of that feature.

From sturla.molden at  Wed Jun 24 17:26:59 2015
From: sturla.molden at (Sturla Molden)
Date: Wed, 24 Jun 2015 17:26:59 +0200
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <mmei82$q2u$>

On 24/06/15 07:01, Eric Snow wrote:

> In return, my question is, what is the level of effort to get fork+IPC
> to do what we want vs. subinterpreters?  Note that we need to
> accommodate Windows as more than an afterthought

Windows is really the problem. The absence of fork() is especially 
hurtful for an interpreted language like Python, in my opinion.

>> If you think avoiding IPC is clever, you are wrong. IPC is very fast, in
>> fact programs written to use MPI tends to perform and scale better than
>> programs written to use OpenMP in parallel computing.
> I'd love to learn more about that.  I'm sure there are some great
> lessons on efficiently and safely sharing data between isolated
> execution environments.  That said, how does IPC compare to passing
> objects around within the same process?

There are two major competing standards for parallel computing in 
science and engineering: OpenMP and MPI. OpenMP is based on a shared 
memory model. MPI is based on a distributed memory model and uses message
passing (hence its name).

The common implementations of OpenMP (GNU, Intel, Microsoft) are all 
implemented with threads. There are also OpenMP implementations for 
clusters (e.g. Intel), but from the programmer's perspective OpenMP is a 
shared memory model.

The common implementations of MPI (MPICH, OpenMPI, Microsoft MPI) use 
processes instead of threads. Processes can run on the same computer or 
on different computers (aka "clusters"). On localhost shared memory is 
commonly used for message passing, on clusters MPI implementations will 
use networking protocols.

The take-home message is that OpenMP is conceptually easier to use, but
programs written to use MPI tend to be faster and scale better. This is
even true when using a single computer, e.g. a laptop with one multicore
CPU.

Here is the tl;dr explanation:

As for ease of programming, it is easier to create a deadlock or 
livelock with MPI than OpenMP, even though programs written to use MPI 
tend to need fewer synchronization points. There is also less 
boilerplate code to type when using OpenMP, because we do not have to 
code object serialization, message passing, and object deserialization.

For performance, programs written to use MPI seem to have a larger
overhead because they require object serialization and message passing,
whereas OpenMP threads can just share the same objects. The reality is
actually the opposite, and is due to the internals of modern CPUs,
particularly hierarchical memory, branch prediction and long pipelines.

Because of hierarchical memory, the caches used by CPUs and CPU cores
must be kept in sync. Thus when using OpenMP (threads) there will be a
lot of synchronization going on that the programmer does not see, but
which the hardware will do behind the scenes. There will also be a lot
of data passing between various cache levels on the CPU and RAM. If a
core writes to a piece of memory it keeps in a cache line, a cascade of
data traffic and synchronization can be triggered across all CPUs and 
cores. Not only will this stop the CPUs and prompt them to synchronize 
cache with RAM, it also invalidates their branch prediction and they 
must flush their pipelines and throw away work they have already done.
The end result is a program that does not scale or perform very well, 
even though it does not seem to have any explicit synchronization points 
that could explain this. The term "false sharing" is often used to 
describe this problem.

Programs written to use MPI are the opposite. There every instance of 
synchronization and message passing is visible. When a CPU core writes 
to memory kept in a cache line, it will never trigger synchronization 
and data traffic across all the CPUs. The scalability is as the program 
predicts. And even though memory and objects are not shared, there is 
actually much less data traffic going on.

Which to use? Most people find it easier to use OpenMP, and it does not 
require a big runtime environment to be installed. But programs using
MPI tend to be faster and more scalable. If you need to ensure
scalability on multicores, multiple processes are better than multiple 
threads. The scalability of MPI also applies to Python's 
multiprocessing. It is the isolated virtual memory of each process that 
allows the cores to run at full speed.

Another thing to note is that Windows is not a second-class citizen when 
using MPI. The MPI runtime (usually an executable called mpirun or 
mpiexec) starts and manages a group of processes. It does not matter if 
they are started by fork() or CreateProcess().

> Solving reference counts in this situation is a separate issue that
> will likely need to be resolved, regardless of which machinery we use
> to isolate task execution.

As long as we have a GIL, and we need the GIL to update a reference 
count, it does not hurt so much as it otherwise would. The GIL hides 
most of the scalability impact by serializing flow of execution.

> IPC sounds great, but how well does it interact with Python's memory
> management/allocator?  I haven't looked closely but I expect that
> multiprocessing does not use IPC anywhere.

multiprocessing does use IPC. Otherwise the processes could not 
communicate. One example is multiprocessing.Queue, which uses a pipe and 
a semaphore.
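
For readers who have not used it, a minimal sketch of message passing between processes with multiprocessing.Queue. The function names and the chunking scheme are invented for this example; only `multiprocessing.Process` and `multiprocessing.Queue` are real library names.

```python
import multiprocessing as mp


def partial_sum(numbers, out_queue):
    # Runs in a child process; the result is pickled and sent back over the
    # queue's pipe rather than shared in memory.
    out_queue.put(sum(numbers))


def parallel_sum(numbers, workers=2):
    """Sum `numbers` by handing slices to separate processes."""
    chunks = [numbers[i::workers] for i in range(workers)]
    out_queue = mp.Queue()
    procs = [mp.Process(target=partial_sum, args=(chunk, out_queue))
             for chunk in chunks]
    for proc in procs:
        proc.start()
    # Collect one message per worker before joining, so that no child
    # blocks while flushing its queue.
    total = sum(out_queue.get() for _ in procs)
    for proc in procs:
        proc.join()
    return total
```

When run as a script, the call to `parallel_sum()` should sit under an `if __name__ == "__main__":` guard so that it also works with start methods that re-import the main module.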


From rosuav at  Wed Jun 24 17:47:25 2015
From: rosuav at (Chris Angelico)
Date: Thu, 25 Jun 2015 01:47:25 +1000
Subject: [Python-ideas] natively logging sys.path modifications
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Jun 24, 2015 at 11:07 PM, anatoly techtonik <techtonik at> wrote:
> sys.path is kind of important thing for troubleshooting. It may worth
> to ship it with a logging mechanism that allows to quickly dump
> who added what and (optionally) why.
> I see the log as a circular memory buffer of limited size, to say
> 256 entries, that contains tuples in the following format:
> path, who, where, why
>    path   -- actual path added to sys.path
>    who    -- the context - package.module:function or
>                          - package.module:class.method or
>                          - package.module:__toplevel__
>    where  -- full filename and line number to the instruction
>    why    -- advanced API may allow to set this field

It should be possible for you to replace sys.path with an object of
your own invention, a subclass of list that records the above
information whenever it's modified. Install that early, then let all
the other changes get logged. Or have you tried this and found that it
breaks something?
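
Something along these lines, as an untested-in-anger sketch: the class name `LoggedPath` and the `why` keyword are made up here, and only `append()` and `insert()` are intercepted, so `extend()`, `+=` and slice assignment would slip through unlogged.

```python
import collections
import traceback


class LoggedPath(list):
    """List subclass that remembers who appended or inserted entries."""

    def __init__(self, *args, maxlen=256):
        super().__init__(*args)
        # Circular buffer of (path, who, where, why) tuples.
        self.log = collections.deque(maxlen=maxlen)

    def _record(self, path, why=None):
        # The last three stack entries are: caller, append/insert, _record.
        caller = traceback.extract_stack(limit=3)[0]
        where = "%s:%d" % (caller.filename, caller.lineno)
        self.log.append((path, caller.name, where, why))

    def append(self, path, why=None):
        self._record(path, why)
        super().append(path)

    def insert(self, index, path, why=None):
        self._record(path, why)
        super().insert(index, path)
```

It would be installed early with `sys.path = LoggedPath(sys.path)`; whether every piece of import machinery is happy with a list subclass there is exactly the open question above. Note that "who" here is just the function name from the stack frame, not the full package.module:function form from the proposal.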


From breamoreboy at  Wed Jun 24 17:58:06 2015
From: breamoreboy at (Mark Lawrence)
Date: Wed, 24 Jun 2015 16:58:06 +0100
Subject: [Python-ideas] natively logging sys.path modifications
In-Reply-To: <>
References: <>
Message-ID: <mmek2h$s91$>

On 24/06/2015 14:07, anatoly techtonik wrote:
> Hi,
> sys.path is kind of important thing for troubleshooting. It may worth
> to ship it with a logging mechanism that allows to quickly dump
> who added what and (optionally) why.
> I see the log as a circular memory buffer of limited size, to say
> 256 entries, that contains tuples in the following format:
> path, who, where, why
>     path   -- actual path added to sys.path
>     who    -- the context - package.module:function or
>                           - package.module:class.method or
>                           - package.module:__toplevel__
>     where  -- full filename and line number to the instruction
>     why    -- advanced API may allow to set this field

You see the log and somebody else does the work for you as you refuse to 
sign the CLA.  Do you want your bread buttered on both sides, or will 
one side suffice?

My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

From breamoreboy at  Wed Jun 24 18:03:10 2015
From: breamoreboy at (Mark Lawrence)
Date: Wed, 24 Jun 2015 17:03:10 +0100
Subject: [Python-ideas] web API to get a list of all module in stdlib
In-Reply-To: <>
References: <>
Message-ID: <mmekc1$1kd$>

On 24/06/2015 14:19, anatoly techtonik wrote:
> Hi,
> People from core-workflow are not too active about the idea,

If you want to resurrect this please sign the CLA and provide some code. 
  Otherwise please go away permanently, thank you.

My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

From rymg19 at  Wed Jun 24 18:15:49 2015
From: rymg19 at (Ryan Gonzalez)
Date: Wed, 24 Jun 2015 11:15:49 -0500
Subject: [Python-ideas] web API to get a list of all module in stdlib
In-Reply-To: <mmekc1$1kd$>
References: <>
Message-ID: <>

On June 24, 2015 11:03:10 AM CDT, Mark Lawrence <breamoreboy at> wrote:
>On 24/06/2015 14:19, anatoly techtonik wrote:
>> Hi,
>> People from core-workflow are not too active about the idea,
>If you want to resurrect this please sign the CLA and provide some code.
>  Otherwise please go away permanently, thank you.

FYI, there are nicer ways to say that...

Sent from my Android device with K-9 Mail. Please excuse my brevity.

From sturla.molden at  Wed Jun 24 18:28:54 2015
From: sturla.molden at (Sturla Molden)
Date: Wed, 24 Jun 2015 18:28:54 +0200
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <mmels6$sc8$>

On 24/06/15 07:01, Eric Snow wrote:

> Well, perception is 9/10ths of the law. :)  If the multi-core problem
> is already solved in Python then why does it fail in the court of
> public opinion.  The perception that Python lacks a good multi-core
> story is real, leads organizations away from Python, and will not
> improve without concrete changes.

I think it is a combination of FUD and the lack of fork() on Windows. 
There is a lot of utterly wrong information about CPython and its GIL.

The reality is that Python is used on even the largest supercomputers. 
The scalability problem that is seen on those systems is not the GIL, 
but the module import. If we have 1000 CPython processes importing 
modules like NumPy simultaneously, they will do a "denial of service 
attack" on the file system. This happens when the module importer 
generates a huge number of failed open() calls while trying to locate 
the module files.

This is even described in a paper on how to avoid this on an IBM Blue
Gene: "As an example, on Blue Gene P just starting up Python and
importing NumPy and GPAW with 32768 MPI tasks can take 45 minutes!"

And while CPython is being used for massive parallel computing to e.g. 
model the global climate system, there is this FUD that CPython does not 
even scale up on a laptop with a single multicore CPU. I don't know 
where it is coming from, but it is more FUD than truth.

The main answers to FUD about the GIL and Python in scientific computing 
are these:

1. Python in itself generates a 200x to 2000x performance hit compared 
to C or Fortran. Do not write compute kernels in Python, unless you can 
compile with Cython or Numba. If you have need for speed, start by 
moving the performance critical parts to Cython instead of optimizing 
for a few CPU cores.

2. If you can release the GIL, e.g. in Cython code, Python threads scale 
like any other native OS thread. They are real threads, not fake threads 
in the interpreter.

3. The 80-20, 90-10, or 99-1 rule: The majority of the code accounts for 
a small portion of the runtime. It is wasteful to optimize "everything". 
The more speed you need, the stronger this asymmetry will be. Identify 
the bottlenecks with a profiler and optimize those.

4. Using C or Java does not give you a faster hard drive or faster
network connection. You cannot improve on network access by using 
threads in C or Java instead of threads in Python. If your code is i/o 
bound, Python's GIL does not matter. Python threads do execute i/o tasks 
in parallel. (This is the major misunderstanding.)

5. Computationally intensive parts of a program are usually taken care of
in libraries like BLAS, LAPACK, and FFTW. The Fortran code in LAPACK
does not care if you called it from Python. It will be as fast as it can
be, independent of Python. The Fortran code in LAPACK also has no
concept of Python's GIL. LAPACK libraries like Intel MKL can use threads
internally without asking Python for permission.

6. The scalability problem when using Python on a massive supercomputer 
is not the GIL but the module import.

7. When using OpenCL we write kernels as plain text. Python is excellent 
at manipulating text, more so than C. This also applies to using OpenGL 
for computer graphics with GLSL shaders and vertex buffer objects. If you
need the GPU, you can just as well use Python on the CPU.
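
Point 4 is easy to check on any machine. In this small sketch `time.sleep()` stands in for a blocking I/O call (CPython releases the GIL while it waits); the function names are made up for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(delay):
    # Stand-in for a blocking read or network call; the GIL is released
    # for the duration of the wait.
    time.sleep(delay)
    return delay


def run_overlapped(delays):
    """Run one fake I/O task per thread; return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        results = list(pool.map(fake_io, delays))
    return results, time.perf_counter() - start
```

With four tasks of 0.2 seconds each, the elapsed time should come out close to 0.2 seconds rather than 0.8, because the waits overlap.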


From sturla.molden at  Wed Jun 24 18:58:01 2015
From: sturla.molden at (Sturla Molden)
Date: Wed, 24 Jun 2015 18:58:01 +0200
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>	<>	<>	<mm6ctj$790$>	<>
Message-ID: <mmenip$p8f$>

On 24/06/15 13:43, M.-A. Lemburg wrote:

> That said, I still think the multiple-process is a better one (more
> robust, more compatible, fewer problems). We'd just need a way more
> efficient approach to sharing objects between the Python processes
> than using pickle and shared memory or pipes :-)

It is hard to get around shared memory, Unix domain sockets, or pipes. 
There must be some sort of IPC, regardless.

One idea I have played with is to use a specialized queue instead of the 
current multiprocessing.Queue. In scientific computing we often need to 
pass arrays, so it would make sense to have a queue that could bypass 
pickle for NumPy arrays, scalars and dtypes, simply by using the NumPy C 
API to process the data. It could also have specialized code for a 
number of other objects -- at least str, int, float, complex, and PEP 
3118 buffers, but perhaps also simple lists, tuples and dicts with these 
types. I think it should be possible to make a queue that would avoid 
the pickle issue for 99 % of scientific computing. It would be very easy 
to write such a queue with Cython and e.g. have it as a part of NumPy or 

One thing I did some years ago was to have NumPy arrays that would store 
the data in shared memory. And when passed to multiprocessing.Queue they 
would not pickle the data buffer, only the metadata. However this did 
not improve on performance, because the pickle overhead was still there, 
and passing a lot of binary data over a pipe was not comparably 
expensive. So while it would save memory, it did not make programs using 
multiprocessing and NumPy more efficient.
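
The "only pass the metadata" idea can be sketched with the standard library alone. This is not the NumPy code described above: it uses `array` and `multiprocessing.shared_memory` (the latter added in Python 3.8, long after that experiment) as stand-ins, and the function names and the metadata layout are assumptions for illustration.

```python
from array import array
from multiprocessing import shared_memory


def export_array(values):
    """Copy doubles into shared memory; return (segment, small metadata dict)."""
    data = array("d", values)
    nbytes = len(data) * data.itemsize
    shm = shared_memory.SharedMemory(create=True, size=nbytes)
    shm.buf[:nbytes] = data.tobytes()
    # Only this dict would need to be pickled and sent through a queue.
    meta = {"name": shm.name, "typecode": "d", "count": len(data)}
    return shm, meta


def import_array(meta):
    """Attach to the segment named in `meta` and rebuild the array."""
    shm = shared_memory.SharedMemory(name=meta["name"])
    try:
        out = array(meta["typecode"])
        # The segment may be larger than requested, so slice to the exact size.
        nbytes = meta["count"] * out.itemsize
        out.frombytes(bytes(shm.buf[:nbytes]))
        return out
    finally:
        shm.close()
```

The creator keeps the returned segment and is responsible for calling `close()` and `unlink()` on it once every consumer is done. The sketch copies the data out on import for simplicity; a real implementation would wrap the buffer in place.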


From phd at  Wed Jun 24 21:08:48 2015
From: phd at (Oleg Broytman)
Date: Wed, 24 Jun 2015 21:08:48 +0200
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <mmei82$q2u$>
References: <>
Message-ID: <>


On Wed, Jun 24, 2015 at 05:26:59PM +0200, Sturla Molden <sturla.molden at> wrote:
> The absence of fork() is especially
> hurtful for an interpreted language like Python, in my opinion.

   I don't think fork is of major help for interpreted languages. When
most of your "code" is actually data most of your data pages are prone
to copy-on-write slowdown.

> Sturla

     Oleg Broytman              phd at
           Programmers don't die, they just GOSUB without RETURN.

From greg at  Wed Jun 24 21:31:56 2015
From: greg at (Gregory P. Smith)
Date: Wed, 24 Jun 2015 19:31:56 +0000
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <mmei82$q2u$>
References: <>
Message-ID: <>

On Wed, Jun 24, 2015 at 8:27 AM Sturla Molden <sturla.molden at> wrote:

> On 24/06/15 07:01, Eric Snow wrote:
> > In return, my question is, what is the level of effort to get fork+IPC
> > to do what we want vs. subinterpreters?  Note that we need to
> > accommodate Windows as more than an afterthought
> Windows is really the problem. The absence of fork() is especially
> hurtful for an interpreted language like Python, in my opinion.

You cannot assume that fork() is safe on any OS as a general solution for
anything.  This isn't a Windows-specific problem; it simply cannot be
relied upon in a general purpose library at all.  It is incompatible with

The ways fork() can be used safely are in top level application decisions:
There must be a guarantee of no threads running before all forking is done.
 (thus the impossibility of relying on it as a mechanism to do anything
useful in a generic library - you are a library, you don't know what the
whole application is doing or when you were called as part of it)

A concurrency model that assumes that it is fine to fork() and let child
processes continue to execute is not usable by everyone. (ie:
multiprocessing until was implemented).


> >> If you think avoiding IPC is clever, you are wrong. IPC is very fast, in
> >> fact programs written to use MPI tends to perform and scale better than
> >> programs written to use OpenMP in parallel computing.
> >
> > I'd love to learn more about that.  I'm sure there are some great
> > lessons on efficiently and safely sharing data between isolated
> > execution environments.  That said, how does IPC compare to passing
> > objects around within the same process?
> There are two major competing standards for parallel computing in
> science and engineering: OpenMP and MPI. OpenMP is based on a shared
> memory model. MPI is based on a distributed memory model and uses message
> passing (hence its name).
> The common implementations of OpenMP (GNU, Intel, Microsoft) are all
> implemented with threads. There are also OpenMP implementations for
> clusters (e.g. Intel), but from the programmer's perspective OpenMP is a
> shared memory model.
> The common implementations of MPI (MPICH, OpenMPI, Microsoft MPI) use
> processes instead of threads. Processes can run on the same computer or
> on different computers (aka "clusters"). On localhost shared memory is
> commonly used for message passing, on clusters MPI implementations will
> use networking protocols.
> The take-home message is that OpenMP is conceptually easier to use, but
> programs written to use MPI tend to be faster and scale better. This is
> even true when using a single computer, e.g. a laptop with one multicore
> CPU.
> Here is tl;dr explanation:
> As for ease of programming, it is easier to create a deadlock or
> livelock with MPI than OpenMP, even though programs written to use MPI
> tend to need fewer synchronization points. There is also less
> boilerplate code to type when using OpenMP, because we do not have to
> code object serialization, message passing, and object deserialization.
> For performance, programs written to use MPI seem to have a larger
> overhead because they require object serialization and message passing,
> whereas OpenMP threads can just share the same objects. The reality is
> actually the opposite, and is due to the internals of modern CPU,
> particularly hierarchical memory, branch prediction and long pipelines.
> Because of hierarchical memory, the cache used by CPUs and CPU cores
> must be kept in synch. Thus when using OpenMP (threads) there will be a
> lot of synchronization going on that the programmer does not see, but
> which the hardware will do behind the scenes. There will also be a lot
> of data passing between various cache levels on the CPU and RAM. If a
> core writes to a piece of memory it keeps in a cache line, a cascade of
> data traffic and synchronization can be triggered across all CPUs and
> cores. Not only will this stop the CPUs and prompt them to synchronize
> cache with RAM, it also invalidates their branch prediction and they
> must flush their pipelines and throw away work they have already done.
> The end result is a program that does not scale or perform very well,
> even though it does not seem to have any explicit synchronization points
> that could explain this. The term "false sharing" is often used to
> describe this problem.
> Programs written to use MPI are the opposite. There every instance of
> synchronization and message passing is visible. When a CPU core writes
> to memory kept in a cache line, it will never trigger synchronization
> and data traffic across all the CPUs. The scalability is as the program
> predicts. And even though memory and objects are not shared, there is
> actually much less data traffic going on.
> Which to use? Most people find it easier to use OpenMP, and it does not
> require a big runtime environment to be installed. But programs using
> MPI tend to be the faster and more scalable. If you need to ensure
> scalability on multicores, multiple processes are better than multiple
> threads. The scalability of MPI also applies to Python's
> multiprocessing. It is the isolated virtual memory of each process that
> allows the cores to run at full speed.
> Another thing to note is that Windows is not a second-class citizen when
> using MPI. The MPI runtime (usually an executable called mpirun or
> mpiexec) starts and manages a group of processes. It does not matter if
> they are started by fork() or CreateProcess().
> > Solving reference counts in this situation is a separate issue that
> > will likely need to be resolved, regardless of which machinery we use
> > to isolate task execution.
> As long as we have a GIL, and we need the GIL to update a reference
> count, it does not hurt so much as it otherwise would. The GIL hides
> most of the scalability impact by serializing flow of execution.
> > IPC sounds great, but how well does it interact with Python's memory
> > management/allocator?  I haven't looked closely but I expect that
> > multiprocessing does not use IPC anywhere.
> multiprocessing does use IPC. Otherwise the processes could not
> communicate. One example is multiprocessing.Queue, which uses a pipe and
> a semaphore.
> Sturla

From breamoreboy at  Wed Jun 24 22:11:29 2015
From: breamoreboy at (Mark Lawrence)
Date: Wed, 24 Jun 2015 21:11:29 +0100
Subject: [Python-ideas] web API to get a list of all module in stdlib
In-Reply-To: <>
References: <>
 <mmekc1$1kd$> <>
Message-ID: <mmf2tk$385$>

On 24/06/2015 17:15, Ryan Gonzalez wrote:
> On June 24, 2015 11:03:10 AM CDT, Mark Lawrence <breamoreboy at> wrote:
>> On 24/06/2015 14:19, anatoly techtonik wrote:
>>> Hi,
>>> People from core-workflow are not too active about the idea,
>> If you want to resurrect this please sign the CLA and provide some
>> code.
>>   Otherwise please go away permanently, thank you.
> FYI, there are nicer ways to say that...

Regrettably to the OP there aren't, he has no concept of taking other
people into account.  Possibly he's autistic the same as me, who knows? 
  All I do know is that he's driven a highly respected member of the 
community as in Nick Coghlan away from the core workflow mailing list. 
Still like one of the Piranha brothers he used to buy his mother flowers 
and things, so that's okay.

My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

From mal at  Wed Jun 24 22:16:56 2015
From: mal at (M.-A. Lemburg)
Date: Wed, 24 Jun 2015 22:16:56 +0200
Subject: [Python-ideas] web API to get a list of all module in stdlib
In-Reply-To: <mmf2tk$385$>
References: <>	<mmekc1$1kd$>	<>
Message-ID: <>

Please keep discussions on topic and avoid heading off into the
woods - there are snakes out there and those are not the kinds
we're discussing here :-)

Thank you,
Marc-Andre Lemburg
Python Software Foundation

From breamoreboy at  Wed Jun 24 22:49:53 2015
From: breamoreboy at (Mark Lawrence)
Date: Wed, 24 Jun 2015 21:49:53 +0100
Subject: [Python-ideas] web API to get a list of all module in stdlib
In-Reply-To: <>
References: <>	<mmekc1$1kd$>	<>
 <mmf2tk$385$> <>
Message-ID: <mmf55k$7jt$>

On 24/06/2015 21:16, M.-A. Lemburg wrote:
> Please keep discussions on topic and avoid heading off into the
> woods - there are snakes out there and those are not the kinds
> we're discussing here :-)
> Thank you,

Thank you for bringing me so gently back to earth, I most seriously 
appreciate it.

Should any of you ever head into Mudeford, Christchurch, Dorset, UK, 
beers are on me, at the inaugural meeting of the local Python Users 
Group.  This would obviously have to be called MudPy or MudePy :)

My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

From mal at  Wed Jun 24 22:50:58 2015
From: mal at (M.-A. Lemburg)
Date: Wed, 24 Jun 2015 22:50:58 +0200
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <mmenip$p8f$>
References: <>	<>	<>	<mm6ctj$790$>	<>	<>	<>
Message-ID: <>

On 24.06.2015 18:58, Sturla Molden wrote:
> On 24/06/15 13:43, M.-A. Lemburg wrote:
>> That said, I still think the multiple-process is a better one (more
>> robust, more compatible, fewer problems). We'd just need a way more
>> efficient approach to sharing objects between the Python processes
>> than using pickle and shared memory or pipes :-)
> It is hard to get around shared memory, Unix domain sockets, or pipes. There must be some sort of
> IPC, regardless.

Sure, but the current approach of pickling Python objects for
communication is just too much overhead in many cases - it
also duplicates the memory requirements when using the multiple
process approach since you eventually end up having n copies of
the same data in memory (with n = number of parallel workers).

> One idea I have played with is to use a specialized queue instead of the current
> multiprocessing.Queue. In scientific computing we often need to pass arrays, so it would make sense
> to have a queue that could bypass pickle for NumPy arrays, scalars and dtypes, simply by using the
> NumPy C API to process the data. It could also have specialized code for a number of other objects
> -- at least str, int, float, complex, and PEP 3118 buffers, but perhaps also simple lists, tuples
> and dicts with these types. I think it should be possible to make a queue that would avoid the
> pickle issue for 99 % of scientific computing. It would be very easy to write such a queue with
> Cython and e.g. have it as a part of NumPy or SciPy.

The tricky part is managing pointers in those data structures,
e.g. a container type for other Python objects will have to
store all referenced objects in the shared memory segment as
well.

For NumPy arrays using simple types this is a lot easier,
since you don't have to deal with pointers to other objects.

> One thing I did some years ago was to have NumPy arrays that would store the data in shared memory.
> And when passed to multiprocessing.Queue they would not pickle the data buffer, only the metadata.
> However this did not improve on performance, because the pickle overhead was still there, and
> passing a lot of binary data over a pipe was not comparably expensive. So while it would save
> memory, it did not make programs using multiprocessing and NumPy more efficient.

When saying "passing a lot of binary data over a pipe" you mean
the meta-data ?

I had discussed the idea of Python object sharing with Larry
Hastings back in 2013, but decided that trying to get
all references of containers managed in the shared memory
would be too fragile an approach to pursue further.

Still, after some more research later that year, I found that
someone already had investigated the idea in 2003:

Reading the paper on this:

made me wonder why this idea never received more attention in
all these years.

His results are clearly positive and show that the multiple
process approach can provide better scalability than
using threads when combined with shared memory object

Marc-Andre Lemburg


From sturla.molden at  Wed Jun 24 23:41:02 2015
From: sturla.molden at (Sturla Molden)
Date: Wed, 24 Jun 2015 23:41:02 +0200
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>	<>	<>	<mm6ctj$790$>	<>	<>	<>
 <mmenip$p8f$> <>
Message-ID: <mmf85f$m65$>

On 24/06/15 22:50, M.-A. Lemburg wrote:

> The tricky part is managing pointers in those data structures,
> e.g. a container types for other Python objects will have to
> store all referenced objects in the shared memory segment as
> well.

If a container type for Python objects contains some unknown object type 
we would have to use pickle as fallback.

> For NumPy arrays using simple types this is a lot easier,
> since you don't have to deal with pointers to other objects.

The objects we deal with in scientific computing are usually arrays with 
a rather regular structure, not deeply nested Python objects. Even a 
more complex object like scipy.spatial.cKDTree is just a collection of a 
few contiguous arrays under the hood. So we could for most parts squash 
the pickle overhead that anyone will encounter by specializing a queue 
that has knowledge about a small set of Python types.
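
A minimal sketch of that idea, using only the standard library: array.array
stands in for a NumPy array, and the helper names send_array and recv_array
are hypothetical. Only a tiny header goes through pickle; the data buffer
travels as raw bytes.

```python
from array import array
from multiprocessing import Pipe


def send_array(conn, values):
    """Send an array.array as a small pickled header plus its raw buffer."""
    conn.send((values.typecode, len(values)))  # metadata only, pickled
    conn.send_bytes(values.tobytes())          # payload, not pickled


def recv_array(conn):
    """Rebuild the array from the header and the raw bytes."""
    typecode, length = conn.recv()
    result = array(typecode)
    result.frombytes(conn.recv_bytes())
    assert len(result) == length
    return result


# Both ends live in one process here purely to keep the demo small.
reader, writer = Pipe(duplex=False)
samples = array("d", [0.5 * i for i in range(1000)])
send_array(writer, samples)
received = recv_array(reader)
```

A real implementation would dispatch on the object type and fall back to
pickle for anything it does not recognise.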

> When saying "passing a lot of binary data over a pipe" you mean
> the meta-data ?

No, I mean the buffer pointed to by PyArray_DATA(obj) when using the 
NumPy C API. We have to send a lot of raw bytes over an IPC mechanism 
before this communication compares to the pickle overhead.


From sturla.molden at  Wed Jun 24 23:48:42 2015
From: sturla.molden at (Sturla Molden)
Date: Wed, 24 Jun 2015 23:48:42 +0200
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <mmf85f$m65$>
References: <>	<>	<>	<mm6ctj$790$>	<>	<>	<>
 <mmenip$p8f$> <>
Message-ID: <mmf8js$u5l$>

On 24/06/15 23:41, Sturla Molden wrote:

> So we could for most parts squash
> the pickle overhead that anyone will encounter by specializing a queue
> that has knowledge about a small set of Python types.

But this would be very domain specific for scientific and numerical 
computing, it would not be a general improvement for multiprocessing 
with Python.


From wes.turner at  Wed Jun 24 23:57:35 2015
From: wes.turner at (Wes Turner)
Date: Wed, 24 Jun 2015 16:57:35 -0500
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <mmf8js$u5l$>
References: <>
 <> <mmenip$p8f$>
 <> <mmf85f$m65$>
Message-ID: <>

On Jun 24, 2015 4:49 PM, "Sturla Molden" <sturla.molden at> wrote:
> On 24/06/15 23:41, Sturla Molden wrote:
>> So we could for most parts squash
>> the pickle overhead that anyone will encounter by specializing a queue
>> that has knowledge about a small set of Python types.
> But this would be very domain specific for scientific and numerical
> computing, it would not be a general improvement for multiprocessing with
> Python.
Basically C structs like Thrift or Protocol Buffers?

> Sturla

From jeanpierreda at  Thu Jun 25 00:02:04 2015
From: jeanpierreda at (Devin Jeanpierre)
Date: Wed, 24 Jun 2015 15:02:04 -0700
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Jun 24, 2015 at 12:31 PM, Gregory P. Smith <greg at> wrote:
> You cannot assume that fork() is safe on any OS as a general solution for
> anything.  This isn't a Windows specific problem, It simply cannot be relied
> upon in a general purpose library at all.  It is incompatible with threads.
> The ways fork() can be used safely are in top level application decisions:
> There must be a guarantee of no threads running before all forking is done.
> (thus the impossibility of relying on it as a mechanism to do anything
> useful in a generic library - you are a library, you don't know what the
> whole application is doing or when you were called as part of it)
> A concurrency model that assumes that it is fine to fork() and let child
> processes continue to execute is not usable by everyone. (ie:
> multiprocessing until was implemented).

Another way of looking at it is that a concurrency model that assumes
it is fine to thread and let child threads continue to execute is not
usable by everyone.

IMO the lesson here is don't start threads *or* fork processes behind
the scenes without explicitly allowing your callers to override you,
so that the top level app can orchestrate everything appropriately.
This is especially important in Python, where forking is one of the
best ways of getting single-machine multicore processing.
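
As a rough illustration of that lesson, a library function can take the
executor as a parameter instead of creating workers itself. This is only a
sketch; process_all is a hypothetical name, and a thread pool is used here
just because it is the simplest executor to demonstrate.

```python
from concurrent.futures import ThreadPoolExecutor


def process_all(func, items, executor=None):
    """Apply func to every item.

    The library never creates threads or processes on its own: without an
    executor the work runs serially in the caller's thread, and with one the
    caller decides what kind of workers exist and when they start.
    """
    if executor is None:
        return [func(item) for item in items]
    return list(executor.map(func, items))


# The top-level application owns the concurrency decision.
with ThreadPoolExecutor(max_workers=2) as pool:
    doubled = process_all(lambda n: n * 2, [1, 2, 3], executor=pool)
```

The same call works with a process-based executor, provided func and the
items can be pickled.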

Interestingly, the worker threads in OP can probably be made
fork-safe. Not sure that's especially useful, but I can imagine.

-- Devin

From jeanpierreda at  Thu Jun 25 00:10:48 2015
From: jeanpierreda at (Devin Jeanpierre)
Date: Wed, 24 Jun 2015 15:10:48 -0700
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

I'm going to break mail client threading and also answer some of your
other emails here.

On Tue, Jun 23, 2015 at 10:26 PM, Eric Snow <ericsnowcurrently at> wrote:
> It sounded like you were suggesting that we factor out a common code
> base that could be used by multiprocessing and the other machinery and
> that only multiprocessing would keep the pickle-related code.

Yes, I like that idea a lot.

>> Compare
>> with forking, where the initialization is all done and then you fork,
>> and you are immediately ready to serve, using the data structures
>> shared with all the other workers, which is only copied when it is
>> written to. So forking starts up faster and uses less memory (due to
>> shared memory.)
> But we are aiming for a share-nothing model with an efficient
> object-passing mechanism.  Furthermore, subinterpreters do not have to
> be single-use.  My proposal includes running tasks in an existing
> subinterpreter (e.g. executor pool), so that start-up cost is
> mitigated in cases where it matters.
> Note that ultimately my goal is to make it obvious and undeniable that
> Python (3.6+) has a good multi-core story.  In my proposal,
> subinterpreters are a means to an end.  If there's a better solution
> then great!  As long as the real goal is met I'll be satisfied. :)
> For now I'm still confident that the subinterpreter approach is the
> best option for meeting the goal.

Ahead of time: the following is my opinion. My opinions are my own,
and bizarre, unlike the opinions of my employer and coworkers. (Who
are also reading this maybe.)

So there's two reasons I can think of to use threads for CPU parallelism:

- My thing does a lot of parallel work, and so I want to save on
memory by sharing an address space

This only becomes an especially pressing concern if you start running
tens of thousands or more of workers. Fork also allows this.

- My thing does a lot of communication, and so I want fast
communication through a shared address space

This can become a pressing concern immediately, and so is a more
visible issue. However, it's also a non-problem for many kinds of
tasks which just take requests in and put output back out, without
talking with other members of the pool (e.g. writing an RPC server or
HTTP server.)

I would also speculate that once you're on many machines, unless
you're very specific with your design, RPC costs dominate IPC costs to
the point where optimizing IPC doesn't do a lot for you.

On Unix, IPC can be free or cheap due to shared memory.

Threads really aren't all that important, and if we need them, we have
them. When people tell me in #python that multicore in Python is bad
because of the GIL, I point them at fork and at C extensions, but also
at PyPy-STM and Jython. Everything has problems, but then so does this
proposal, right?

> And this is faster than passing objects around within the same
> process?  Does it play well with Python's memory model?

As far as whether it plays with the memory model,
multiprocessing.Value() just works, today. To make it even lower
overhead (not construct an int PyObject* on the fly), you need to
change things, e.g. the way refcounts work. I think it's possibly
feasible. If not, at least the overhead would be negligible.

Same applies to strings and other non-compound datatypes. Compound
datatypes are hard even for the subinterpreter case, just because the
objects you're referring to are not likely to exist on the other end,
so you need a real copy. I'm sure you've thought about this.
multiprocessing.Array has a solution for this, which is to unbox the
contained values. It won't work with tuples.
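
A small sketch of what "unboxing" looks like in practice. It runs in a
single process and passes lock=False to get the bare shared ctypes objects;
the variable names are illustrative only.

```python
from multiprocessing import Array, Value

# lock=False returns the bare shared ctypes object; the default (lock=True)
# wraps it in a synchronizing proxy.
counter = Value("i", 0, lock=False)   # one C int in shared memory
counter.value += 1

# Each element is stored as a raw C double, not as a pointer to a PyObject.
readings = Array("d", [1.5, 2.5, 3.5], lock=False)
readings[0] = 10.0

# A tuple cannot be unboxed into a C double, so this is rejected.
try:
    Array("d", [(1.0, 2.0)], lock=False)
    tuple_accepted = True
except TypeError:
    tuple_accepted = False
```

Reading readings[0] builds a fresh Python float each time, which is the
on-the-fly construction mentioned above.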

> I'd be interested in more info on both the refcount freezing and the
> separate refcounts pages.

I can describe the patches:

- separate refcounts replaces refcount with a pointer to refcount, and
changes incref/decref.
- refcount freezing lets you walk all objects and set the reference
count to a magic value. incref/decref check if the refcount is frozen
before working.

With freezing, unlike this approach to separate refcounts, anyone that
touches the refcount manually will just dirty the page and unfreeze
the refcount, rather than crashing the process.
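
To make the freezing idea concrete, here is a toy model in pure Python. It
is not the actual patch: ToyObject, FROZEN, and the function names are all
invented for illustration, and the real change would live in the C-level
incref/decref macros.

```python
FROZEN = -1  # hypothetical magic value marking a frozen refcount


class ToyObject:
    """Stand-in for a PyObject header with just a refcount field."""

    def __init__(self):
        self.refcount = 1


def incref(obj):
    # A frozen object is left alone, so its memory page is never written.
    if obj.refcount != FROZEN:
        obj.refcount += 1


def decref(obj):
    if obj.refcount != FROZEN:
        obj.refcount -= 1


def freeze_all(objects):
    """Walk every object and pin its refcount to the magic value."""
    for obj in objects:
        obj.refcount = FROZEN


shared = [ToyObject() for _ in range(3)]
freeze_all(shared)
incref(shared[0])   # no effect: the object is frozen
decref(shared[0])   # no effect either
```

In this model, code that bypasses incref/decref and writes the field
directly simply overwrites the magic value, which corresponds to the
"dirty the page and unfreeze" behaviour described above.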

Both of them will decrease performance for non-forking python code,
but for forking code it can be made up for e.g. by increased worker
lifetime and decreased rate of page copying, plus the whole CPU vs
memory tradeoff.

I legitimately don't remember the difference in performance, which is
good because I'm probably not allowed to say what it was, as it was
tested on our actual app and not microbenchmarks. ;)

>> And remember that we *do* have many examples of people using
>> parallelized Python code in production. Are you sure you're satisfying
>> their concerns, or whose concerns are you trying to satisfy?
> Another good point.  What would you suggest is the best way to find out?

I don't necessarily mean that. I mean that this thread feels like you
posed an answer and I'm not sure what the question is. Is it about
solving a real technical problem? What is that, and who does it
affect? A new question I didn't ask before: is the problem with Python
as a whole, or just CPython?

-- Devin

From ericsnowcurrently at  Thu Jun 25 00:19:27 2015
From: ericsnowcurrently at (Eric Snow)
Date: Wed, 24 Jun 2015 16:19:27 -0600
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <mmei82$q2u$>
References: <>
Message-ID: <>

On Wed, Jun 24, 2015 at 9:26 AM, Sturla Molden <sturla.molden at> wrote:
> On 24/06/15 07:01, Eric Snow wrote:
> There are two major competing standards for parallel computing in science
> and engineering: OpenMP and MPI. OpenMP is based on a shared memory model.
> MPI is based on a distributed memory model and uses message passing (hence
> its name).
> [snip]

Thanks for the great explanation!

>> Solving reference counts in this situation is a separate issue that
>> will likely need to be resolved, regardless of which machinery we use
>> to isolate task execution.
> As long as we have a GIL, and we need the GIL to update a reference count,
> it does not hurt so much as it otherwise would. The GIL hides most of the
> scalability impact by serializing flow of execution.

It does hurt in COW situations, e.g. forking.  My expectation is that
we'll at least need to take a serious look into the matter in the
short term (i.e. Python 3.6).
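
The reason is that merely taking a new reference writes to the object's
header. A minimal CPython-specific sketch (refcount_delta is a made-up
helper name):

```python
import sys


def refcount_delta():
    """Return how much an object's refcount changes when it gains an alias."""
    data = list(range(10))
    before = sys.getrefcount(data)
    alias = data              # no data is modified, only a new reference
    after = sys.getrefcount(data)
    del alias
    return after - before


delta = refcount_delta()
```

After a fork, that write lands on a page shared with the parent, so the
kernel has to copy the page even though the program only "read" the object.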

>> IPC sounds great, but how well does it interact with Python's memory
>> management/allocator?  I haven't looked closely but I expect that
>> multiprocessing does not use IPC anywhere.
> multiprocessing does use IPC. Otherwise the processes could not communicate.
> One example is multiprocessing.Queue, which uses a pipe and a semaphore.

Right.  I don't know quite what I was thinking. :)
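
For reference, a minimal single-process sketch of the pipe-based transport
that multiprocessing builds on; the message contents are arbitrary.

```python
from multiprocessing import Pipe

reader, writer = Pipe(duplex=False)

message = {"job": 7, "payload": [1, 2, 3]}
writer.send(message)       # pickles the object and writes it to the pipe
received = reader.recv()   # reads the bytes back and unpickles them
```

The receiver gets an equal but independent copy, which is exactly the
share-nothing behaviour under discussion.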


From ericsnowcurrently at  Thu Jun 25 00:56:17 2015
From: ericsnowcurrently at (Eric Snow)
Date: Wed, 24 Jun 2015 16:56:17 -0600
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <mmels6$sc8$>
References: <>
Message-ID: <>

On Wed, Jun 24, 2015 at 10:28 AM, Sturla Molden <sturla.molden at> wrote:
> On 24/06/15 07:01, Eric Snow wrote:
>> Well, perception is 9/10ths of the law. :)  If the multi-core problem
>> is already solved in Python then why does it fail in the court of
>> public opinion.  The perception that Python lacks a good multi-core
>> story is real, leads organizations away from Python, and will not
>> improve without concrete changes.
> I think it is a combination of FUD and the lack of fork() on Windows. There
> is a lot of utterly wrong information about CPython and its GIL.

Thanks for a clear summary of the common misunderstandings.  While I
agreed with your points, they are mostly the same things we have been
communicating for many years, to no avail.  They are also oriented
toward larger-scale parallelism (which I don't mean to discount).
That makes it easier to misunderstand.

Why?  Because there are enough caveats and performance downsides (see
Dave Beazley's PyCon 2015 talk) that most folks stop trying to
rationalize, throw their hands up, and say "Python concurrency stinks"
and "you can't *really* do multicore on Python".  I have personal
experience with high-profile decision makers where this is exactly
what happened, with adverse consequences to support for Python within
the organizations.

To change this perception we need to give folks a simpler, performant
concurrency model that takes advantage of multiple cores.  My proposal
is all about doing at least *something* that makes Python's multi-core
story obvious and undeniable.

*That* is my entire goal with this proposal.  Clearly I have opinions
on the best approach to achieve that in the 3.6 timeframe. :)
However, I am quite willing to investigate all the options (as I hope
this thread demonstrates).

So, again, thanks for the feedback and insight.  You've provided me
with plenty of food for thought.


From sturla.molden at  Thu Jun 25 01:30:21 2015
From: sturla.molden at (Sturla Molden)
Date: Thu, 25 Jun 2015 01:30:21 +0200
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <mmfeif$pme$>

On 25/06/15 00:10, Devin Jeanpierre wrote:

> So there's two reasons I can think of to use threads for CPU parallelism:
> - My thing does a lot of parallel work, and so I want to save on
> memory by sharing an address space
> This only becomes an especially pressing concern if you start running
> tens of thousands or more of workers. Fork also allows this.

This might not be a valid concern. Sharing address space means sharing 
*virtual memory*. Presumably what they really want is to save *physical 
memory*. Two processes can map the same physical memory into virtual memory.
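
A small sketch of that point using mmap. Both mappings are created in one
process here for brevity; two separate processes mapping the same file or
shared memory segment would see the same effect.

```python
import mmap
import tempfile

PAGE = mmap.PAGESIZE

with tempfile.TemporaryFile() as backing:
    backing.truncate(PAGE)                      # one page of backing store
    view_a = mmap.mmap(backing.fileno(), PAGE)  # first virtual mapping
    view_b = mmap.mmap(backing.fileno(), PAGE)  # second virtual mapping

    view_a[:5] = b"hello"      # write through one mapping ...
    seen = bytes(view_b[:5])   # ... and read it through the other

    view_a.close()
    view_b.close()
```

The two views sit at different virtual addresses but are backed by the same
physical page, so no copy is made.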

> - My thing does a lot of communication, and so I want fast
> communication through a shared address space
> This can become a pressing concern immediately, and so is a more
> visible issue.

This is a valid argument. It is mainly a concern for those who use 
deeply nested Python objects though.

> On Unix, IPC can be free or cheap due to shared memory.

This is also the case on Windows.

IPC mechanisms like pipes, fifos, Unix domain sockets are also very 
cheap on Unix.

Pipes are also very cheap on Windows, as are tcp sockets on localhost. 
Windows named pipes are similar to Unix domain sockets in performance.

> Same applies to strings and other non-compound datatypes. Compound
> datatypes are hard even for the subinterpreter case, just because the
> objects you're referring to are not likely to exist on the other end,
> so you need a real copy.


With a "share nothing" message-passing approach, one will have to make 
deep copies of any mutable object. And even though a tuple can be 
immutable, it could still contain mutable objects. It is really hard to 
get around the pickle overhead with subinterpreters. Since the pickle 
overhead is huge compared to the low-level IPC, there is very little to 
save in this manner.
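
A short illustration of the tuple case; the record contents are arbitrary
example data.

```python
import pickle

record = ("sensor-1", [20.5, 21.0])   # immutable tuple, mutable list inside

# The tuple is not safely shareable: its contents can still change, and it
# is not even hashable because of the list it holds.
try:
    hash(record)
    hashable = True
except TypeError:
    hashable = False

# A pickle round trip is one way to get the deep copy that message passing
# between isolated interpreters would require.
clone = pickle.loads(pickle.dumps(record))
clone[1].append(99.9)   # the original list is unaffected
```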

> - separate refcounts replaces refcount with a pointer to refcount, and
> changes incref/decref.
> - refcount freezing lets you walk all objects and set the reference
> count to a magic value. incref/decref check if the refcount is frozen
> before working.
> With freezing, unlike this approach to separate refcounts, anyone that
> touches the refcount manually will just dirty the page and unfreeze
> the refcount, rather than crashing the process.
> Both of them will decrease performance for non-forking python code,

Freezing has little impact on a modern CPU with branch prediction. On 
GCC we can also use __builtin_expect to make sure the optimal code is
generated.

This is a bit similar to using typed memoryviews and NumPy arrays in 
Cython with and without bounds checking. A pragma like 
@cython.boundscheck(False) has little benefit for the performance
because of the CPU's branch prediction. The CPU knows it can expect the 
bounds check to pass, and only if it fails will it have to flush the 
pipeline. But if the bounds check passes the pipeline need not be 
flushed, and performance wise it will be as if the test were never 
there. This has greatly improved the last decade, particularly because 
processors have been optimized for running languages like Java and .NET 
efficiently. A check for a thawed refcount would be similarly cheap.

Keeping reference counts in extra pages could impair performance, but 
mostly if multiple threads are allowed to access the same page. Because 
of hierarchical memory, the extra pointer lookup should not matter much.
Modern CPUs have evolved to solve the aliasing problem that formerly 
made Fortran code run faster than similar C code. Today C code tends to 
be faster than similar Fortran. This helps if we keep refcounts in a 
separate page, and the compiler cannot know what the pointer actually 
refers to and what it might alias. 10 or 15 years ago it would have been 
a performance killer, but not today.


From sturla.molden at  Thu Jun 25 01:47:07 2015
From: sturla.molden at (Sturla Molden)
Date: Thu, 25 Jun 2015 01:47:07 +0200
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <mmffhu$83h$>

On 25/06/15 00:56, Eric Snow wrote:

> Why?  Because there are enough caveats and performance downsides (see
> Dave Beazley's PyCon 2015 talk) that most folks stop trying to
> rationalize, throw their hands up, and say "Python concurrency stinks"
> and "you can't *really* do multicore on Python".

Yes, that seems to be the case.

> To change this perception we need to give folks a simpler, performant
> concurrency model that takes advantage of multiple cores.  My proposal
> is all about doing at least *something* that makes Python's multi-core
> story obvious and undeniable.

I think the main issue with subinterpreters and a message-passing model 
is that it will be very difficult to avoid deep copies of Python 
objects. And in that case all we have achieved compared to 
multiprocessing is less scalability.

Also you have not removed the GIL, so the FUD about the dreaded GIL will 
still be around. Clearly introducing multiprocessing in the standard 
library did nothing to reduce this.


From njs at  Thu Jun 25 01:55:31 2015
From: njs at (Nathaniel Smith)
Date: Wed, 24 Jun 2015 16:55:31 -0700
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Jun 24, 2015 at 3:10 PM, Devin Jeanpierre
<jeanpierreda at> wrote:
> So there's two reasons I can think of to use threads for CPU parallelism:
> - My thing does a lot of parallel work, and so I want to save on
> memory by sharing an address space
> This only becomes an especially pressing concern if you start running
> tens of thousands or more of workers. Fork also allows this.

Not necessarily true... e.g., see two threads from yesterday (!) on
the pandas mailing list, from users who want to perform queries
against a large data structure shared between threads/processes:!topic/pydata/wOwe21I65-I
("Are we just screwed on windows?")


Nathaniel J. Smith --

From sturla.molden at  Thu Jun 25 02:02:05 2015
From: sturla.molden at (Sturla Molden)
Date: Thu, 25 Jun 2015 02:02:05 +0200
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <mmfgdu$kip$>

On 25/06/15 00:19, Eric Snow wrote:

>>> Solving reference counts in this situation is a separate issue that
>>> will likely need to be resolved, regardless of which machinery we use
>>> to isolate task execution.
>> As long as we have a GIL, and we need the GIL to update a reference count,
>> it does not hurt so much as it otherwise would. The GIL hides most of the
>> scalability impact by serializing flow of execution.
> It does hurt in COW situations, e.g. forking.  My expectation is that
> we'll at least need to take a serious look into the matter in the
> short term (i.e. Python 3.6).


It hurts performance after forking as reference counting will trigger a 
lot of page copies. Keeping reference counts in separate pages and 
replacing the field in the PyObject struct would reduce this problem by 
a factor of up to 512 (64 bit) or 1024 (32 bit).
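
The two factors are plain arithmetic. A minimal sketch, assuming a 4 KiB page (common on x86, but an assumption here) and a reference count that is one machine word wide:

```python
# Assumption: 4 KiB pages; one refcount is one machine word (8 or 4 bytes).
PAGE_SIZE = 4096

def refcounts_per_page(word_bytes, page_size=PAGE_SIZE):
    """Number of word-sized reference counts that fit in one page."""
    return page_size // word_bytes

for bits in (64, 32):
    print(bits, "bit:", refcounts_per_page(bits // 8), "refcounts per page")
```

With the count embedded in each object, touching N objects after a fork can dirty up to N pages. With the counts packed together, the same N updates dirty roughly N/512 (or N/1024) pages, hence "up to".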

It does not hurt performance with multi-threading, as Python threads are 
serialized by the GIL. But if the GIL were removed it would result in a 
lot of false sharing. That is a major reason we need a tracing garbage 
collector instead of reference counting if we are to remove 
the GIL.


From jeanpierreda at  Thu Jun 25 02:09:55 2015
From: jeanpierreda at (Devin Jeanpierre)
Date: Wed, 24 Jun 2015 17:09:55 -0700
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <mmfeif$pme$>
References: <>
Message-ID: <>

On Wed, Jun 24, 2015 at 4:30 PM, Sturla Molden <sturla.molden at> wrote:
> On 25/06/15 00:10, Devin Jeanpierre wrote:
>> So there's two reasons I can think of to use threads for CPU parallelism:
>> - My thing does a lot of parallel work, and so I want to save on
>> memory by sharing an address space
>> This only becomes an especially pressing concern if you start running
>> tens of thousands or more of workers. Fork also allows this.
> This might not be a valid concern. Sharing address space means sharing
> *virtual memory*. Presumably what they really want is to save *physical
> memory*. Two processes can map the same physical memory into virtual memory.

Yeah, physical memory. I agree, processes with shared memory can be
made to work in practice. Although, threads are better for memory
usage, by defaulting to sharing even on write. (Good for memory, maybe
not so good for bug-freedom...)

So from my perspective, this is the hard problem in multicore python.
My views may be skewed by the peculiarities of the one major app I've
worked on.

>> Same applies to strings and other non-compound datatypes. Compound
>> datatypes are hard even for the subinterpreter case, just because the
>> objects you're referring to are not likely to exist on the other end,
>> so you need a real copy.
> Yes.
> With a "share nothing" message-passing approach, one will have to make deep
> copies of any mutable object. And even though a tuple can be immutable, it
> could still contain mutable objects. It is really hard to get around the
> pickle overhead with subinterpreters. Since the pickle overhead is huge
> compared to the low-level IPC, there is very little to save in this manner.

I think this is giving up too easily. Here's a stupid idea for
sharable interpreter-specific objects:

You keep a special heap for immutable object refcounts, where each
thread/process has its own region in the heap. Refcount locations are
stored as offsets into the thread local heap, and incref does
++*(threadlocal_refcounts + refcount_offset);

Then for the rest of a pyobject's memory, we share by default and
introduce a marker for which thread originated it. Any non-threadsafe
operations can check if the originating thread id is the same as the
current thread id, and raise an exception if not, before even reading
the memory at all. So it introduces an overhead to accessing mutable
objects. Also, this won't work with extension objects that don't
check, those just get shared and unsafely mutate and crash.

This also introduces the possibility of sharing mutable objects
between interpreters, if the objects themselves choose to implement
fine-grained locking. And it should work fine with fork if we change
how the refcount heap is allocated, to use mmap or whatever.
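
A toy model of the idea in pure Python, only to make the moving parts concrete. All names here (SharedObject, incref, mutate, ...) are hypothetical, a dict stands in for each thread's refcount region, and the sketch says nothing about when an object may actually be freed:

```python
import threading

class SharedObject:
    """Toy stand-in for a PyObject whose refcount lives outside the object."""
    _next_offset = 0
    _offset_lock = threading.Lock()

    def __init__(self, payload):
        self.payload = payload
        self.owner = threading.get_ident()   # marker: originating thread
        with SharedObject._offset_lock:
            self.refcount_offset = SharedObject._next_offset
            SharedObject._next_offset += 1

_local = threading.local()

def _counts():
    # Each thread lazily gets its own refcount region (a dict here).
    if not hasattr(_local, "refcounts"):
        _local.refcounts = {}
    return _local.refcounts

def incref(obj):
    counts = _counts()
    counts[obj.refcount_offset] = counts.get(obj.refcount_offset, 0) + 1

def decref(obj):
    counts = _counts()
    counts[obj.refcount_offset] = counts.get(obj.refcount_offset, 0) - 1

def local_refcount(obj):
    return _counts().get(obj.refcount_offset, 0)

def mutate(obj, new_payload):
    # Non-threadsafe operation: refuse unless we are the originating thread.
    if obj.owner != threading.get_ident():
        raise RuntimeError("object mutated from a non-owning thread")
    obj.payload = new_payload

# Demo: the main thread and a worker count references independently,
# and the worker is not allowed to mutate the main thread's object.
obj = SharedObject("hello")
incref(obj)
seen = {}

def worker():
    incref(obj)
    incref(obj)
    seen["count"] = local_refcount(obj)
    try:
        mutate(obj, "changed")
    except RuntimeError:
        seen["rejected"] = True

t = threading.Thread(target=worker)
t.start()
t.join()
```

The increments never touch the object's own memory, which is the property that matters for fork and for sharing between interpreters.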

This is probably not acceptable for real, but I just mean to show with
a straw man that the problem can be attacked.

-- Devin

From rosuav at  Thu Jun 25 02:12:07 2015
From: rosuav at (Chris Angelico)
Date: Thu, 25 Jun 2015 10:12:07 +1000
Subject: [Python-ideas] natively logging sys.path modifications
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Jun 25, 2015 at 4:26 AM, anatoly techtonik <techtonik at> wrote:
> That object will be broken if somebody decides to use assignment:
> sys.path = []
> And as far as I know it is not possible to prevent this case or guard
> against this replacement.

So what you want is for the sys module to log all assignments to a
particular attribute, AND for all mutations of that attribute to be
logged as well. That sounds like two completely separate problems to
be solved, but neither is fundamentally impossible (although you'd
need to fiddle with the sys module itself to do the other). I suggest
you investigate ways of solving this that require zero core code
changes, as those ways will work on all existing Python versions. Then
once you run up against an actual limitation, you'll have a better
argument for code changes.
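
As a starting point that needs no core changes, here is a minimal sketch of the mutation half. LoggedPath and install are hypothetical names, and only a few list methods are covered (slicing, +=, del, sort and friends are not):

```python
import logging
import sys

log = logging.getLogger("syspath")

class LoggedPath(list):
    """list subclass that logs the common in-place mutations."""

    def _note(self, action, *args):
        log.info("sys.path.%s%r", action, args)

    def append(self, item):
        self._note("append", item)
        super().append(item)

    def insert(self, index, item):
        self._note("insert", index, item)
        super().insert(index, item)

    def extend(self, items):
        items = list(items)
        self._note("extend", items)
        super().extend(items)

    def remove(self, item):
        self._note("remove", item)
        super().remove(item)

def install():
    """Swap sys.path for a logging copy and return the new object."""
    sys.path = LoggedPath(sys.path)
    return sys.path
```

A plain `sys.path = []` still replaces the object without a trace, which is exactly the assignment half of the problem.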


From sturla.molden at  Thu Jun 25 02:45:24 2015
From: sturla.molden at (Sturla Molden)
Date: Thu, 25 Jun 2015 02:45:24 +0200
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <mmfiv7$qmd$>

On 25/06/15 02:09, Devin Jeanpierre wrote:

> Although, threads are better for memory
> usage, by defaulting to sharing even on write. (Good for memory, maybe
> not so good for bug-freedom...)

I am not sure. Code written to use OpenMP tends to have fewer bugs than 
code written to use MPI. This suggests that shared memory is easier than 
message-passing, which is contrary to the common belief.

My own experience with OpenMP and MPI suggests it is easier to create a 
deadlock with message-passing than accidentally have threads access the 
same address concurrently. This is also what I hear from other people 
who write code for scientific computing.

I see a lot of claims that message-passing is supposed to be "safer" 
than a shared memory model, but that is not what we see with OpenMP and 
MPI. With MPI, the programmer must make sure that the send and receive 
commands are passed in the right order at the right time, in each 
process. This leaves plenty of room for messing up or creating 
unmaintainable spaghetti code, particularly in a complex algorithm. It 
is easier to make sure all shared objects are protected with mutexes 
than to make sure a spaghetti of send and receive messages are in 
correct order.
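
The ordering hazard can be shown without MPI. A minimal sketch using queue.Queue as a stand-in for blocking receives (exchange is a hypothetical helper; the timeout exists only so the demo terminates instead of hanging, and unlike a synchronous MPI send, put() on an unbounded queue never blocks):

```python
import queue
import threading

def exchange(recv_first_a, recv_first_b, timeout=0.5):
    """Two workers swap one message each; returns what each received.

    A worker that receives first cannot send until its peer has sent.
    If both receive first, neither ever sends: an ordering deadlock.
    """
    inbox = {"a": queue.Queue(), "b": queue.Queue()}
    result = {}

    def worker(me, peer, recv_first):
        def send():
            inbox[peer].put("hello from " + me)

        if not recv_first:
            send()
        try:
            result[me] = inbox[me].get(timeout=timeout)
        except queue.Empty:
            # A real blocking receive would hang here forever.
            result[me] = None
            return
        if recv_first:
            send()

    threads = [
        threading.Thread(target=worker, args=("a", "b", recv_first_a)),
        threading.Thread(target=worker, args=("b", "a", recv_first_b)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result
```

exchange(True, True) leaves both workers empty-handed, while exchange(False, True) completes. Nothing in either worker's code looks wrong on its own; the bug is only visible when both orderings are read together.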

It might be that Python's queue method of passing messages leaves less 
room for deadlocking than the socket-like MPI_send and MPI_recv 
functions. But I think message-passing is sometimes overrated as "the 
safe solution" to multi-core programming (cf. Go and Erlang).


From njs at  Thu Jun 25 03:05:17 2015
From: njs at (Nathaniel Smith)
Date: Wed, 24 Jun 2015 18:05:17 -0700
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <mmfiv7$qmd$>
References: <>
Message-ID: <>

On Wed, Jun 24, 2015 at 5:45 PM, Sturla Molden <sturla.molden at> wrote:
> On 25/06/15 02:09, Devin Jeanpierre wrote:
>> Although, threads are better for memory
>> usage, by defaulting to sharing even on write. (Good for memory, maybe
>> not so good for bug-freedom...)
> I am not sure. Code written to use OpenMP tends to have fewer bugs than code
> written to use MPI. This suggests that shared memory is easier than
> message-passing, which is contrary to the common belief.

OpenMP is an *extremely* structured and constrained subset of shared
memory multithreading, and not at all comparable to


Nathaniel J. Smith --

From sturla.molden at  Thu Jun 25 03:31:51 2015
From: sturla.molden at (Sturla Molden)
Date: Thu, 25 Jun 2015 01:31:51 +0000 (UTC)
Subject: [Python-ideas] solving multi-core Python
References: <>
Message-ID: <>

Nathaniel Smith <njs at> wrote:

> OpenMP is an *extremely* structured and constrained subset of shared
> memory multithreading, and not at all comparable to
> pthreads/

If you use "parallel section" it is almost as free as using pthreads
directly. But if you stick to "parallel for", which most do, you have a
rather constrained and more well-behaved subset. I am quite sure MPI can
even be a source of more errors than pthreads used directly. Getting
message passing right inside a complex algorithm is no fun. I would
rather keep my mind focused on which objects to protect with a lock or when
to signal a condition. 


From ericsnowcurrently at  Thu Jun 25 03:57:19 2015
From: ericsnowcurrently at (Eric Snow)
Date: Wed, 24 Jun 2015 19:57:19 -0600
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <mmels6$sc8$>
References: <>
Message-ID: <>

On Wed, Jun 24, 2015 at 10:28 AM, Sturla Molden <sturla.molden at> wrote:
> The reality is that Python is used on even the largest supercomputers. The
> scalability problem that is seen on those systems is not the GIL, but the
> module import. If we have 1000 CPython processes importing modules like
> NumPy simultaneously, they will do a "denial of service attack" on the file
> system. This happens when the module importer generates a huge number of
> failed open() calls while trying to locate the module files.
> This is even described in a paper on how to avoid this on an IBM Blue
> Gene: "As an example, on Blue Gene P just starting up Python and importing
> NumPy and GPAW with 32768 MPI tasks can take 45 minutes!"

I'm curious what difference there is under Python 3.4 (or even 3.3).
Along with being almost entirely pure Python, the import system now
has some optimizations that help mitigate filesystem access
(particularly stats).

Regardless, have there been any attempts to address this situation?
I'd be surprised if there haven't. :)  Is the solution described in
the cited paper sufficient?  Earlier Barry brought up Emacs's unexec as
at least an inspiration for a solution.  I expect there are a number
of approaches.  It would be nice to address this somehow (though
unrelated to my multi-core proposal).  I would expect that it could
also have bearing on interpreter start-up time.  If it's worth
pursuing then consider posting something to import-sig.


From trent at  Thu Jun 25 08:50:52 2015
From: trent at (Trent Nelson)
Date: Thu, 25 Jun 2015 02:50:52 -0400
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Jun 24, 2015 at 04:55:31PM -0700, Nathaniel Smith wrote:
> On Wed, Jun 24, 2015 at 3:10 PM, Devin Jeanpierre
> <jeanpierreda at> wrote:
> > So there's two reasons I can think of to use threads for CPU parallelism:
> >
> > - My thing does a lot of parallel work, and so I want to save on
> > memory by sharing an address space
> >
> > This only becomes an especially pressing concern if you start running
> > tens of thousands or more of workers. Fork also allows this.
> Not necessarily true... e.g., see two threads from yesterday (!) on
> the pandas mailing list, from users who want to perform queries
> against a large data structure shared between threads/processes:
> ("Are we just screwed on windows?")

    Ironically (not knowing anything about Pandas' implementation
    details other than... "Cython... and NumPy"), there should be
    no difference between getting a Pandas DataFrame available to
    PyParallel and a NumPy ndarray or Cythonized C-struct (like

    The situation Ryan describes is literally the exact situation
    that PyParallel excels at: large reference data structures
    accessible in parallel contexts.


From trent at  Thu Jun 25 06:59:04 2015
From: trent at (Trent Nelson)
Date: Thu, 25 Jun 2015 00:59:04 -0400
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jun 23, 2015 at 11:01:24PM -0600, Eric Snow wrote:
> On Sun, Jun 21, 2015 at 5:41 AM, Sturla Molden <sturla.molden at> wrote:
> > From the perspective of software design, it would be good if the CPython
> > interpreter provided an environment instead of using global objects. It
> > would mean that all functions in the C API would need to take the
> > environment pointer as their first variable, which will be a major rewrite.
> > It would also allow the "one interpreter per thread" design similar to tcl
> > and .NET application domains.
> While perhaps a worthy goal, I don't know that it fits in well with my
> goals.  I'm aiming for an improved multi-core story with a minimum of
> change in the interpreter.

This slide and the following two are particularly relevant:

I identify three categories of contemporary problems where efficient
use of multiple cores would be desirable:

    1)  Computationally-intensive work against large data sets (the
        traditional "parallel" HPC/science/engineering space, and
        lately, today's "Big Data" space).

    2a) Serving tens/hundreds of thousands of network clients with
        non-trivial computation required per-request (i.e. more than
        just buffer copying between two sockets); best example being
        the modern day web server, or:

    2b) Serving far fewer clients, but striving for the lowest latency
        possible in an environment with "maximum permitted latency"
        restrictions (or percentile targets, 99s etc).

In all three problem domains, there is a clear inflection point at
which multiple cores would overtake a single core in either:

    1)    Reducing the overall computation time.

    2a|b) Serving a greater number of clients (or being able to perform
          more complex computation per request) before hitting maximum
          permitted latency limits.

For PyParallel, I focused on 2a and 2b.  More specifically, a TCP/IP
socket server that had the ability to dynamically adjust its behavior
(low latency vs concurrency vs throughput[1]), whilst maintaining
optimal usage of underlying hardware[2].  That is: given sufficient
load, you should be able to saturate all I/O channels (network and
disk), or all cores, or both, with *useful* work.  (The next step
after saturation is sustained saturation (given sufficient load),
which can be even harder to achieve, as you need to factor in latencies
for "upcoming I/O" ahead of time if your computation is driven by
the results of a disk read (or database cursor fetch).)

(Sturla commented on the "import-DDoS" that you can run into on POSIX
 systems, which is a good example.  You're saturating your underlying
 hardware, sure, but you're not doing useful work -- it's important
 to distinguish the two.)

    [1] Dynamically adjusting behavior based on low latency vs
        concurrency vs throughput:

    [2] Optimal hardware use:

So, with the focus of PyParallel established (socket server that
could exploit all cores), my hypothesis was that I could find a
new way of doing things that was more performant than the status
quo.  (In particular, I wanted to make sure I had an answer for
"why not just use multiprocessing?" -- which is an important

So, I also made the decision to leverage threads for parallelism and
not processes+IPC, which it sounds like you're leaning toward as
well.  Actually, other than the subinterpreter implementation aspect,
everything you've described is basically on par with PyParallel, more
or less.

Now, going back to your original comment:

> While perhaps a worthy goal, I don't know that it fits in well with my
> goals.  I'm aiming for an improved multi-core story with a minimum of
> change in the interpreter.

That last sentence is very vague as multi-core means different things to
different people.  What is the problem domain you're going to try and
initially target?  Computationally-intensive parallel workloads like in
1), or the network I/O-driven socket server stuff like in 2a/2b?

I'd argue it should be the latter.  Reason being is that you'll rarely
see the former problem tackled solely by pure Python -- e.g. Python may
be gluing everything together, but the actual computation will be handled
by something like NumPy/Numba/Fortran/Cython or custom C stuff, and, as
Sturla's mentioned, OpenMP and MPI usually get involved to manage the
parallel aspect.

For the I/O-driven socket server stuff, though, you already have this
nice delineation of what would be run serially versus what would be
ideal to run in parallel:

    import datrie
    import numpy as np
    import pyodbc
    import async
    from collections import defaultdict
    from async.http.server import (

    # Tell PyParallel to invoke the tp_dealloc method explicitly
    # for these classes when rewinding a heap after a parallel
    # callback has finished.  (Implementation detail: this toggles
    # the Py_TPFLAGS_PX_DEALLOC flag in the TypeObject's tp_flags;
    # when PyParallel intercepts PyObject_NEW/INIT (init_object),
    # classes (PyTypeObject *tp) with this flag set will be tracked
    # in a linked-list that is local to the parallel context being
    # used to service this client.  When the context has its heaps
    # rewound back to the initial state at the time of the snapshot,
    # it will call tp_dealloc() explicitly against all objects of
    # this type that were encountered.)

    # Load 29 million titles.  RSS += ~9.5GB.
    TITLES = datrie.Trie.load('titles.trie')
    # Load 15 million 64-bit offsets. RSS += ~200MB.
    OFFSETS = np.load('offsets.npy')
    XML = 'enwiki-20150205-pages-articles.xml'

    class WikiServer(HttpServer):
        # All of these methods are automatically invoked in
        # parallel.  HttpServer implements a data_received()
        # method which prepares the request object and then
        # calls the relevant method depending on the URL, e.g.
        # http://localhost/user/foo will call the user(request,
        # name='foo').  If we want to "write" to the client,
        # we return a bytes, bytearray or unicode object from
        # our callback (that is, we don't expose a socket.write()
        # to the user).
        # Just before the PyParallel machinery invokes the
        # callback (via a simple PyObject_CallObject), though,
        # it takes a snapshot of its current state, such that
        # the exact state can be rolled back to (termed a socket
        # "rewind") when this callback is complete.  If we don't
        # return a sendable object back, this rewind happens
        # immediately, and then we go straight into a read call.
        # If we do return something sendable, we send it.  When
        # that send completes, *then* we do the rewind, then we
        # issue the next read/recv call.
        # This approach is particularly well suited to parallel
        # callback execution because none of the objects we create
        # as part of the callback are needed when the callback
        # completes.  No garbage can accumulate because nothing
        # can live longer than that callback.  That obviates the
        # need for two things: reference counting against any object
        # in a parallel context, and garbage collection.  Those
        # things are useful for the main thread, but not parallel
        # contexts.
        # What if you do want to keep something around after
        # the callback?  If it's a simple scalar type, the
        # following will work:
        #   class Server:
        #       name = None
        #       @route
        #       def set_name(self, request, name):
        #  = name.upper()
        #           ^^^^^^^^^ we intercept that setattr and make
        #                     a copy of (the result of) name.upper()
        #                     using memory allocation from a different
        #                     heap that persists as long as the client
        #                     stays connected.  (There's actually
        #                     support for alternatively persisting
        #                     the entire heap that the object was
        #                     allocated from, which we could use if
        #                     we were persisting complex, external,
        #                     or container types where simply doing
        #                     a memcpy() of a *base + size_t wouldn't
        #                     be feasible.  However, I haven't wired
        #                     up this logic to the socket context
        #                     logic yet.)
        #       @route
        #       def name(self, request):
        #           return json_serialization(request,
        #                                              ^^^^^^^^^
        #                                   This will return whatever
        #                                   was set in the call above.
        #                                   Once the client disconnects,
        #                                   the value disappears.
        #       (Actually I think if you wanted to persist the object
        #        for the lifetime of the server, you could probably
        #        do ` = xyz`; or at least,
        #        if that doesn't currently work, the required mechanics
        #        definitely exist, so it would just need to be wired
        #        up.)
        # If you want to keep an object around past the lifetime of
        # the connected client and the server, then send it to the main
        # thread where it can be tracked like a normal Python object:
        # USERS = async.dict()
        #         ^^^^^^^^^^^^ shortcut for:
        #                           foo = {}
        #                           async.protect(foo)
        #                      or just:
        #                           foo = async.protect({})
        #         (On the backend, this instruments[3] the object such
        #          that PyParallel can intercept setattr/setitem and
        #          getattr/getitem calls and "do stuff"[4], depending
        #          on the context.)


        # class MyServer(HttpServer):
        #   @route
        #   ^^^^^^ Ignore the mechanics of this, it's just a helper
        #          decorator I used to translate a HTTP GET for
        #          /login/foo to a function call of `login(name='foo')`.
        #          (see the bowels of async.http.server for details).
        #   def login(self, request, name):
        #       @call_from_main_thread
        #       def _save_name(n):
        #           USERS[n] = async.rdtsc()
        #           return len(USERS)
        #       count = _save_name(name)
        #       return json_serialization(request, {'count': count})
        # The @call_from_main_thread decorator will enqueue a work
        # item to the main thread, and then wait on the main thread's
        # response.  The main thread executes the callback and notifies
        # the parallel thread that the call has been completed and the
        # return value (in this case the value of `len(USERS)`).  The
        # parallel thread resumes and finishes the client request.
        # Note that this will implicitly serialize execution; any number
        # of parallel requests can submit main thread work, but the
        # main thread can only call them one at a time.  So, you'd
        # usually try and avoid this, or at least remove it from your
        # application's hot code path.

        connect_string = None
        all_users_sql = 'select * from user'
        one_user_sql = 'select * from user where login = ?'

        secret_key = None

        def wiki(self, request, name):
            # http://localhost/wiki/Python: name = Python
            if name not in TITLES:
                self.error(request, 404)

            # log(n) lookup against a trie with 29 million keys.
            offset = TITLES[name][0]
            # log(n) binary search against a numpy array with 15
            # million int64s.
            ix = OFFSETS.searchsorted(offset, side='right')
            # OFFSETS[ix] = what's the offset after this?
            (start, end) = (ix-7, OFFSETS[ix]-11)
            # -7, +11 = adjust for the fact that all of the offsets
            # were calculated against the '<' of '<title>Foo</title>'.
            range_request = '%d-%d' % (start, end)
            request.range = RangedRequest(range_request)
            request.response.content_type = 'text/xml; charset=utf-8'
            return self.sendfile(request, XML)

        def users(self, request):
            # ODBC driver managers that implement connection pooling
            # behind the scenes play very nicely with our
            # pyodbc.connect() call here, returning a connection
            # from the pool (when able) without blocking.
            con = pyodbc.connect(self.connect_string)

            # The next three odbc calls would all block (in the
            # traditional sense), so this current thread would
            # not be able to serve any other requests whilst
            # waiting for completion -- however, this is far
            # less of a problem for PyParallel than single-threaded
            # land as other threads will keep servicing requests
            # in the mean time.  (ODBC 3.8/SQL Server 2012/Windows 8
            # did introduce async notification, such that we could
            # request that an event be set when the cursor/query/call
            # has completed, which we'd tie in to PyParallel by
            # submitting a threadpool wait (much like we do for async
            # DNS lookup[5], also added in Windows 8), however, it was
            # going to require a bit of modification to the pyodbc
            # module to support the async calling style, so, all the
            # calls stay synchronous for now.)


            cur = con.cursor()
            cur.execute(self.all_users_sql)
            return json_serialization(request, cur.fetchall())

        def user(self, request, login):
            con = pyodbc.connect(self.connect_string)
            cur = con.cursor()
            cur.execute(self.one_user_sql, (login,))
            return json_serialization(request, cur.fetchall())

        def set_secret_key(self, request, key):
            # http://localhost/set_secret_key/foobar
            # An example of persisting a scalar for the lifetime
            # of the thread (that is, until it disconnects or EOFs).
            try:
                self.secret_key = [ key, ]
            except ValueError:
                # This would be hit, because we've got guards in place
                # to assess the "clonability" of an object at this
                # point[6].  (Ok, after reviewing the code, we don't,
                # but at least we'd crash.)
                pass

            # However, this would work fine, essentially memcpy'ing
            # the key object at the time of assignment using a different
            # heap to the one that automatically gets reset at the end
            # of the callback.
            self.secret_key = key

        def secret_key(self, request):
            # http://localhost/secret_key -> 'foobar'
            return json_serialization(request, {'key': self.secret_key})

        def stats(self, request):
            # Handy little json representation of various system stats;
            # active parallel contexts, I/O hogs, memory load, etc.
            stats = {
                'system': dict(sys_stats()),
                'server': dict(socket_stats(request.transport.parent)),
                'memory': dict(memory_stats()),
                'contexts': dict(context_stats()),
                'elapsed': request.transport.elapsed(),
                'thread': async.thread_seq_id(),
            }
            return json_serialization(request, stats)

        def debug(self, request):
            # Don't call print() or any of the sys.std(err|out)
            # methods in a parallel context.  If you want to do some
            # poor man's debugging with print statements in lieu of not
            # being able to attach a pdb debugger (tracing is disabled
            # in parallel threads), then use async.debug().  (On
            # Windows, this writes the message to the debug stream,
            # which you'd monitor via dbgview or VS.)
            async.debug("received request: %s" %

            # Avoid repr() at the moment in parallel threads; it uses
            # PyThreadState_SetDictItem() to control recursion depths,
            # which I haven't made safe to call from a parallel context.

            # If you want to attach Visual Studio debugger at this point
            # though, you can do so via:
            # (That literally just generates an INT 3.)

        def shutdown(self, request):
            # Handy helper for server shutdown (stop listening on the
            # bound IP:PORT, wait for all running client callbacks to
            # complete, then return.  Totally almost works at the
            # moment[7].)

    def main():
        server = async.server('', port)
        protocol = HttpServer
        protocol.connect_string = 'Driver={SQL Server}...'
        async.register(transport=server, protocol=protocol)
        # ^^^^^^^^^^^^ this will create a special 'server' instance
        #              of the protocol, which will issue the bind()
        #              call.  It then creates a configurable number
        #              (currently ncpu * 2) of parallel contexts
        #              and triggers parallel AcceptEx() invocation
        #              (you can prime "pre-accepted" sockets on Windows,
        #              which removes the serialization limits of
        #              accept() on POSIX).

        # If an exception occurs in a parallel thread, it is queued
        # to a special list the main thread has.  The main thread
        # checks this list each time async.run_once() is called, so,
        # we call it here just to propagate any exceptions that
        # may have already occurred (like attempting to bind to an
        # invalid IP, or submitting a protocol that had an error).
        async.run_once()
        return server
        # (This also facilitates interactive console usage whilst
        #  serving request in parallel.)

    if __name__ == '__main__':
        # Run forever.  Returns when there are no active contexts
        # or ctrl-c is pressed.

All of that works *today* with PyParallel.  The main thread preps
everything, does the importing, loads the huge data structures,
establishes all the code objects and then, once is called,
sits there dormant waiting for feedback from the parallel threads.
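
For readers without a PyParallel build, the @call_from_main_thread hand-off described in the listing's comments can be modelled in stock Python. This is a sketch of the pattern only, not PyParallel's implementation; the queue, run_main_loop and demo are hypothetical names:

```python
import queue
import threading
from concurrent.futures import Future

_main_work = queue.Queue()   # work items destined for the "main" thread

def call_from_main_thread(func):
    """Hypothetical stand-in: run func on the main thread, wait for result."""
    def wrapper(*args, **kwargs):
        fut = Future()
        _main_work.put((fut, func, args, kwargs))
        return fut.result()          # the parallel thread blocks here
    return wrapper

def run_main_loop(stop):
    """Main thread: execute queued calls one at a time until stop is set."""
    while not (stop.is_set() and _main_work.empty()):
        try:
            fut, func, args, kwargs = _main_work.get(timeout=0.05)
        except queue.Empty:
            continue
        try:
            fut.set_result(func(*args, **kwargs))
        except Exception as exc:
            fut.set_exception(exc)

USERS = {}

@call_from_main_thread
def save_name(name):
    # Always runs on whichever thread is executing run_main_loop().
    USERS[name] = threading.current_thread().name
    return len(USERS)

def demo(names):
    """Simulate one client thread per name; return the counts they saw."""
    USERS.clear()
    counts = []
    stop = threading.Event()
    workers = [
        threading.Thread(target=lambda n=n: counts.append(save_name(n)))
        for n in names
    ]
    for w in workers:
        w.start()

    def stopper():
        for w in workers:
            w.join()
        stop.set()

    threading.Thread(target=stopper).start()
    run_main_loop(stop)
    return sorted(counts)
```

demo(["a", "b", "c"]) returns [1, 2, 3] regardless of scheduling, because every save_name call is serialized through the one loop. That is the same implicit serialization the comments warn about keeping off the hot path.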

It's not perfect; I haven't focused on clean shutdown yet, so you will
100% crash if you ctrl-C it currently.  That's mainly an issue with
interpreter finalization destroying the GIL, which clears our
Py_MainThreadId, which makes all the instrumented macros like
Py_INCREF/Py_DECREF think they're in a parallel context when they're
not, which... well, you can probably guess what happens after that if
you've got 8 threads still running at the time, pointer-dereferencing
things that aren't what they think they are.

None of the problems are showstoppers though, it's just a matter of
prioritization and engineering effort.  My strategic priorities to date
have been:
    a) no changes to semantics of CPython API
    b) high performance
    c) real-world examples

Now, given that this has been something I've mostly worked on in my own
time, my tactical priority each development session (often started after
an 8 hour work day where I'm operating at reduced brain power) is simply:
    a) forward progress at any cost

The quickest hack I can think of that'll address the immediate problem
is the one that gets implemented.  That hack will last until it stops
working, at which point, the quickest hack I can think of to replace it
wins, and so on.  At no time do I consider the maintainability, quality
or portability of the hack -- as long as it moves the overall needle
forward, perfect; it can be made elegant later.

I think it's important to mention that, because if you're reviewing the
source code, it helps explain things like how I implemented the
persistence of an object within a client session (e.g. intercepting the
setattr/setitem and doing the alternate heap memcpy dance alluded to

Without that bit of code, you'll leak memory; with it, you won't.

I attacked pyodbc a few weeks ago -- it was also leaking memory
when called from parallel callbacks because tp_dealloc wasn't being
called on any of the Connection, Cursor or Row objects, so handles
that were allocated (i.e. SQLAllocHandle()) were never paired with a
SQLFreeHandle() (because we don't refcount in a parallel context, which
means there's never a Py_DECREF that hits 0, which means Py_Dealloc()
never gets called for that object (which works fine for everything
that allocates via PyObject/PyMem facilities, because we intercept those
and roll them back in bulk)), and thus, leak.

Quickest fix I could think of at the time:


Which facilitates this during our interception of PyObject_NEW/INIT:

Which allows us to do this for each heap...

...that we encounter as part of "socket rewinding":

Absolutely horrendous hack from a software engineering perspective, but
is surprisingly effective at solving the problem.



From njs at  Thu Jun 25 10:58:25 2015
From: njs at (Nathaniel Smith)
Date: Thu, 25 Jun 2015 01:58:25 -0700
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Jun 24, 2015 at 9:59 PM, Trent Nelson <trent at> wrote:
> (Sturla commented on the "import-DDoS" that you can run into on POSIX
>  systems, which is a good example.  You're saturating your underlying
>  hardware, sure, but you're not doing useful work -- it's important
>  to distinguish the two.)

To be clear, AFAIU the "import-DDoS" that supercomputers classically
run into has nothing to do with POSIX; it has to do with running systems
that were designed for simulation workloads that go like: generate a
bunch of data from scratch in memory, crunch on it for a while, and
then spit out some summaries. So you end up with $1e11 spent on
increasing the FLOP count, and the absolute minimum spent on the
storage system -- basically just enough to let you load a single
static binary into memory at the start of your computation, and there
might even be some specific hacks in the linker to minimize cost of
distributing that single binary load. (These are really weird
architectures; they usually do not even have shared library support.)
And the result is that when you try spinning up a Python program
instead, the startup sequence produces (number of imports) * (number
of entries in sys.path) * (hundreds of thousands of nodes)
simultaneous stat calls hammering some poor NFS server somewhere and
it falls over and dies. (I think often the network connection to the
NFS server is not even using the ridiculously-fast interconnect mesh,
but rather some plain-old-ethernet that gets saturated.) I could be
wrong, I don't actually work with these systems myself, but that's
what I've picked up.
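
To put a rough number on that product, here is a back-of-the-envelope
sketch. The function name and the input figures are illustrative
assumptions, not measurements from any real machine:

```python
def startup_stat_calls(imports, path_entries, nodes):
    """Rough upper bound on file-system probes issued at startup.

    Follows the product described above: every import may probe every
    sys.path entry, and every node does so at roughly the same time.
    """
    return imports * path_entries * nodes


# Illustrative inputs only (assumed, not measured).
calls = startup_stat_calls(imports=200, path_entries=10, nodes=100_000)
print(f"{calls:,} stat calls")  # prints "200,000,000 stat calls"
```

Even with modest per-process numbers, the node count dominates, which is
why a single file server ends up being the bottleneck.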

Continuing my vague and uninformed impressions, I suspect that this
would actually be relatively easy to fix by hooking the import system
to do something more intelligent, like nominate one node as the leader
and have it do the file lookups and then tell everyone else what it
found (via the existing message-passing systems). Though there is an
interesting problem of how you bootstrap the hook code.
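
A minimal sketch of the "one leader does the lookups" idea, using only
the standard importlib machinery. The names resolve_on_leader and
BroadcastFinder are hypothetical, and the step that ships the table from
the leader to the other nodes is deliberately left out, since it depends
on the message-passing system in use:

```python
import importlib.abc
import importlib.util
import sys


def resolve_on_leader(module_names):
    """Leader side: search the file system once, return name -> file path."""
    table = {}
    for name in module_names:
        spec = importlib.util.find_spec(name)
        # Skip built-in/frozen modules, which have no file to locate.
        if spec is not None and spec.has_location and spec.origin:
            table[name] = spec.origin
    return table


class BroadcastFinder(importlib.abc.MetaPathFinder):
    """Follower side: answer imports from the leader's table, no searching."""

    def __init__(self, table):
        self.table = dict(table)

    def find_spec(self, fullname, path=None, target=None):
        origin = self.table.get(fullname)
        if origin is None:
            return None  # fall back to the normal import machinery
        return importlib.util.spec_from_file_location(fullname, origin)
```

A follower would install the finder with
sys.meta_path.insert(0, BroadcastFinder(table)) once it has received the
table. The file contents are still read by every node; only the path
search is avoided, and the bootstrap problem mentioned above remains.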

But as to whether the new import hook stuff actually helps with
this... I'm pretty sure most HPC centers haven't noticed that Python 3
exists yet. See above re: extremely weird architectures -- many of us
are familiar with "clinging to RHEL 5" levels of conservatism, but
that's nothing on "look there's only one person who ever knew how to
get a working python and numpy using our bespoke compiler toolchain on
this architecture that doesn't support extension module loading (!!),
and they haven't touched it in years either"...

There are lots of smart people working on this stuff right now. But
they are starting from a pretty different place from those of us in
the consumer computing world :-).


Nathaniel J. Smith --

From sturla.molden at  Thu Jun 25 11:35:35 2015
From: sturla.molden at (Sturla Molden)
Date: Thu, 25 Jun 2015 09:35:35 +0000 (UTC)
Subject: [Python-ideas] solving multi-core Python
References: <>
Message-ID: <>

Trent Nelson <trent at> wrote:

>     The situation Ryan describes is literally the exact situation
>     that PyParallel excels at: large reference data structures
>     accessible in parallel contexts.

Back in 2009 I solved this for multiprocessing using a NumPy array that
used shared memory as its backend (Sys V IPC, not BSD mmap, on Mac and
Linux). By monkey-patching the pickling of numpy.ndarray, the contents of
the shared memory buffer were not pickled, only the metadata needed to reopen
the shared memory. After a while it stopped working on Mac (I haven't had
time to fix it -- maybe I should), but it still works on Windows. :(
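
The "pickle only the metadata" trick can be sketched with today's
standard library. This is not the 2009 implementation: it assumes
multiprocessing.shared_memory (Python 3.8+) instead of Sys V IPC, uses a
plain byte buffer instead of a NumPy array, and SharedBuffer is a
hypothetical name:

```python
from multiprocessing import shared_memory


class SharedBuffer:
    """A byte buffer in shared memory that pickles as metadata only."""

    def __init__(self, size, name=None):
        # name=None creates a new segment; a name attaches to an existing one.
        self.size = size
        self.shm = shared_memory.SharedMemory(
            name=name, create=name is None, size=size
        )

    def __reduce__(self):
        # Tell pickle how to reopen the segment; the contents are not copied.
        return (SharedBuffer, (self.size, self.shm.name))
```

With this, pickle.dumps() on a large SharedBuffer produces a payload of a
few dozen bytes, and unpickling in another process attaches to the same
memory. The creator is still responsible for calling shm.close() and
shm.unlink() when the buffer is no longer needed.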

Anyway, there is another library that does something similar called joblib.
It is used for parallel computing in scikit-learn. It creates shared memory
by mmap from /tmp, which means it is only shared memory on Linux. On Mac
and Windows there is no tmpfs so it ends up using a physical file on disk
instead :-(


From sturla.molden at  Thu Jun 25 11:35:34 2015
From: sturla.molden at (Sturla Molden)
Date: Thu, 25 Jun 2015 09:35:34 +0000 (UTC)
Subject: [Python-ideas] solving multi-core Python
References: <>
Message-ID: <>

Nathaniel Smith <njs at> wrote:

> Continuing my vague and uninformed impressions, I suspect that this
> would actually be relatively easy to fix by hooking the import system
> to do something more intelligent, like nominate one node as the leader
> and have it do the file lookups and then tell everyone else what it
> found (via the existing message-passing systems). 

There are two known solutions. One is basically what you describe. The
other, which at least works on IBM blue brain, is to import modules from a
ramdisk. It seems to be sufficient to make sure whatever is serving the
shared disk can deal with the 100k client DDoS.


From mal at  Thu Jun 25 12:00:34 2015
From: mal at (M.-A. Lemburg)
Date: Thu, 25 Jun 2015 12:00:34 +0200
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>	<mm67sk$1nv$>	<>	<>	<>
Message-ID: <>

On 25.06.2015 11:35, Sturla Molden wrote:
> Nathaniel Smith <njs at> wrote:
>> Continuing my vague and uninformed impressions, I suspect that this
>> would actually be relatively easy to fix by hooking the import system
>> to do something more intelligent, like nominate one node as the leader
>> and have it do the file lookups and then tell everyone else what it
>> found (via the existing message-passing systems). 
> There are two known solutions. One is basically what you describe. The
> other, which at least works on IBM blue brain, is to import modules from a
> ramdisk. It seems to be sufficient to make sure whatever is serving the
> shared disk can deal with the 100k client DDoS.

Another way to solve this problem may be to use our eGenix PyRun
which embeds modules right in the binary. As a result, all reading
is done from the mmap'ed binary and automatically shared between
processes by the OS:

I don't know whether this actually works on an IBM Blue Brain with
100k clients - we are not fortunate enough to have access to one of
those machines :-)

Note: Even though the data reading is shared, the resulting
code and module objects are, of course, not shared, so you still
have the overhead of using up memory for this, unless you init
your process cluster using fork() after you've imported all
necessary modules (then you benefit from the copy-on-write
provided by the OS - code objects usually don't change after
they have been created).
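
The fork-after-import pattern looks roughly like this. It is a sketch for
POSIX systems only (os.fork() does not exist on Windows), and
run_forked_workers is a hypothetical helper name:

```python
import os
import pickle


def run_forked_workers(shared, work, n_workers):
    """Fork n_workers children that read `shared` through copy-on-write.

    `shared` stands for whatever was imported or built before forking.
    Each child sends work(shared, index) back to the parent over a pipe.
    """
    children = []
    for index in range(n_workers):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:  # child: inherits `shared` without re-importing it
            status = 1
            try:
                os.close(read_fd)
                with os.fdopen(write_fd, "wb") as out:
                    pickle.dump(work(shared, index), out)
                status = 0
            finally:
                os._exit(status)  # never fall back into the parent's code
        os.close(write_fd)
        children.append((pid, read_fd))

    results = []
    for pid, read_fd in children:
        with os.fdopen(read_fd, "rb") as src:
            results.append(pickle.load(src))
        os.waitpid(pid, 0)
    return results
```

One caveat: CPython updates reference counts inside the objects
themselves, so pages holding objects that the children touch can still
end up being copied even when the objects are never logically modified.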

Marc-Andre Lemburg


From ncoghlan at  Thu Jun 25 14:56:59 2015
From: ncoghlan at (Nick Coghlan)
Date: Thu, 25 Jun 2015 22:56:59 +1000
Subject: [Python-ideas] millisecond and microsecond times without floats
In-Reply-To: <20150624143808.29844019@x230>
References: <20150623021530.74ce1ebe@x230>
 <20150623232500.45efccdf@x230> <20150624091955.2efef148@fsol>
 <20150624131349.01ee7634@x230> <20150624124010.24cd3613@fsol>
 <20150624135908.4f85d415@x230> <20150624130338.0b222ca3@fsol>
Message-ID: <>

On 24 June 2015 at 21:38, Paul Sokolovsky <pmiscml at> wrote:
> On Wed, 24 Jun 2015 13:03:38 +0200
> Antoine Pitrou <solipsis at> wrote:
>> Don't you have an additional namespace for micropython-specific
>> features?
> I treat it as a good sign that it's ~8th message in the thread and it's
> only the first time we get a hint that we should get out with our stuff
> into a separate namespace ;-).

We hadn't previously gotten to the fact that part of your motivation
was helping folks learn the intricacies of low level fixed width time
measurement, though.

That's actually a really cool idea - HC11 assembly programming and TI
C5420 DSP programming are still two of my favourite things I've ever
done, and it would be nice if folks could more easily start exploring
the mindset of the embedded microprocessor world without having to
first deal with the incidental complexity of emulators or actual
embedded hardware (even something like programming an Arduino directly
is more hassle than remote controlling one from a Raspberry Pi or PC).

Unfortunately, I can't think of a good alternative name that isn't
ambiguous at the CPython layer - embedded CPython is very different
from an embedded microprocessor, utime is taken, and microtime is
confusable with microseconds.

I'm tempted to suggest calling it "qtime", and using TI's Q notation
to denote the formats of numbers:
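
As a reminder of what Q notation expresses, here is a small sketch: a
number is stored as an integer with a fixed count of fractional bits.
The helper names are hypothetical, and the float arithmetic in to_q and
from_q is only for illustration on a desktop Python; a float-less target
would work with the raw integers throughout:

```python
def to_q(value, frac_bits):
    """Encode a real number as an integer with frac_bits fractional bits."""
    return round(value * (1 << frac_bits))


def from_q(raw, frac_bits):
    """Decode a Q-format integer back to a float (for display only)."""
    return raw / (1 << frac_bits)


def q_mul(a, b, frac_bits):
    """Multiply two Q-format integers, keeping the same format."""
    return (a * b) >> frac_bits


raw = to_q(1.5, 15)
print(raw, from_q(raw, 15))  # prints "49152 1.5"
```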

That would conflict with your notion of making the APIs agnostic as to
the exact bitwidth used, though, as well as with the meaning of the
"q" prefix in qmath:


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Thu Jun 25 15:24:54 2015
From: ncoghlan at (Nick Coghlan)
Date: Thu, 25 Jun 2015 23:24:54 +1000
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On 24 June 2015 at 21:43, M.-A. Lemburg <mal at> wrote:
> Note that extension modules often interface to other C libraries
> which typically use some setup logic that is not thread safe,
> but is used to initialize the other thread safe parts. E.g.
> setting up locks and shared memory for all threads to
> use is a typical scenario you find in such libs.
> A requirement to be able to import modules multiple times
> would pretty much kill the idea for those modules.

Yep, that's the reason earlier versions of PEP 489 included the notion
of "singleton modules". We ended up deciding to back that out for the
time being, and instead leave those modules using the existing single
phase initialisation model.

> That said, I don't think this is really needed. Modules
> would only have to be made aware that there is a global
> first time setup phase and a later shutdown/reinit phase.
> As a result, the module DLL would load only once, but then
> use the new module setup logic to initialize its own state
> multiple times.

Aye, buying more time to consider alternative designs was the reason
we dropped the "singleton module" idea from multi-phase initialisation
until 3.6 at the earliest. I think your idea here has potential - it
should just require a new Py_mod_setup slot identifier, and a bit of
additional record keeping to track which modules had already had their
setup slots invoked. (It's conceivable there could also be a
process-wide Py_mod_teardown slot, but that gets messy in the embedded
interpreter case where we might have multiple
Py_Initialize/Py_Finalize cycles)


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Thu Jun 25 16:08:07 2015
From: ncoghlan at (Nick Coghlan)
Date: Fri, 26 Jun 2015 00:08:07 +1000
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <mmels6$sc8$>
References: <>
Message-ID: <>

On 25 June 2015 at 02:28, Sturla Molden <sturla.molden at> wrote:
> On 24/06/15 07:01, Eric Snow wrote:
>> Well, perception is 9/10ths of the law. :)  If the multi-core problem
>> is already solved in Python then why does it fail in the court of
>> public opinion.  The perception that Python lacks a good multi-core
>> story is real, leads organizations away from Python, and will not
>> improve without concrete changes.
> I think it is a combination of FUD and the lack of fork() on Windows. There
> is a lot of utterly wrong information about CPython and its GIL.
> The reality is that Python is used on even the largest supercomputers. The
> scalability problem that is seen on those systems is not the GIL, but the
> module import. If we have 1000 CPython processes importing modules like
> NumPy simultaneously, they will do a "denial of service attack" on the file
> system. This happens when the module importer generates a huge number of
> failed open() calls while trying to locate the module files.

Slight tangent, but folks hitting this issue on 2.7 may want to
investigate Eric's importlib2:

It switches from stat-based searching for files to the Python 3.3+
model of directory listing based searches, which can (anecdotally)
lead to a couple of orders of magnitude of improvement in startup for
code loading modules from NFS mounts.
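
The difference between the two search strategies can be sketched as
follows. This is a simplified model, not importlib's actual code: the
suffix list is an assumption, and the real finder also has to notice
when a directory changes:

```python
import os

SUFFIXES = (".py", ".pyc", ".so")  # simplified, assumed candidate list


def find_by_stat(name, search_path):
    """Probe every candidate file in every directory; count the probes."""
    probes = 0
    for directory in search_path:
        for suffix in SUFFIXES:
            probes += 1
            candidate = os.path.join(directory, name + suffix)
            if os.path.exists(candidate):  # one stat() per candidate
                return candidate, probes
    return None, probes


class ListingFinder:
    """List each directory once, then answer lookups from the cache."""

    def __init__(self, search_path):
        self.search_path = list(search_path)
        self.cache = {}
        self.listings = 0

    def find(self, name):
        for directory in self.search_path:
            if directory not in self.cache:
                self.listings += 1
                self.cache[directory] = set(os.listdir(directory))
            for suffix in SUFFIXES:
                if name + suffix in self.cache[directory]:
                    return os.path.join(directory, name + suffix)
        return None
```

With the stat approach, the number of file-system calls grows with every
import; with the listing approach, it is bounded by the number of
directories on the path, however many modules are imported.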

> And while CPython is being used for massive parallel computing to e.g. model
> the global climate system, there is this FUD that CPython does not even
> scale up on a laptop with a single multicore CPU. I don't know where it is
> coming from, but it is more FUD than truth.

Like a lot of things in the vast sprawling Python ecosystem, I think
there are aspects of this that are a discoverability problem more so
than a capability problem. When you're first experimenting with
parallel execution, a lot of the time folks start with computational
problems like executing multiple factorials at once. That's trivial to
do across multiple cores even with a threading model like JavaScript's
worker threads, but can't be done in CPython without reaching for the
multiprocessing module. This is the one place where I'll concede that
folks learning to program on Windows or the JVM and hence getting the
idea that "creating threads is fast, creating processes is slow"
causes problems: folks playing with this kind of thing are far more likely
to go "import threading" than they are "import multiprocessing" (and
likewise for the ThreadPoolExecutor vs the ProcessPoolExecutor if
using concurrent.futures), and their reaction when it doesn't work is
far more likely to be "Python can't do this" than it is "I need to do
this differently in Python from the way I do it in
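
For the factorial example, the only change needed is which executor
class gets used. A minimal sketch (function names are illustrative):

```python
import math
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def factorial_bits(n):
    """CPU-bound work: bit length of n! (pure Python-level computation)."""
    return math.factorial(n).bit_length()


def run_all(numbers, executor_cls, workers=4):
    """Map factorial_bits over numbers with the given executor class."""
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(factorial_bits, numbers))
```

run_all(numbers, ThreadPoolExecutor) and
run_all(numbers, ProcessPoolExecutor) return the same results, but only
the process-based version can be expected to use several cores for this
kind of work. When using ProcessPoolExecutor from a script, the call
should sit under an "if __name__ == '__main__':" guard so that worker
processes can import the module safely.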

> The main answers to FUD about the GIL and Python in scientific computing are
> these:

It generally isn't scientific programmers I personally hit problems
with (although we have to allow for the fact many of the scientists I
know I met *because* they're Pythonistas). For that use case, there's
not only HPC to point to, but a number of papers that talk about
Cython and Numba in the same breath as C, C++ and FORTRAN, which is
pretty spectacular company to be in when it comes to numerical
computation. Being the fourth language Nvidia supported directly for
CUDA doesn't hurt either.

Instead, the folks that I think have a more valid complaint are the
games developers, and the folks trying to use games development as an
educational tool. They're not doing array based programming the way
numeric programmers are (so the speed of the NumPy stack isn't any
help), and they're operating on shared game state and frequently
chattering back and forth between threads of control, so high overhead
message passing poses a major performance problem.

That does suggest to me a possible "archetypal problem" for the work
Eric is looking to do here: a 2D canvas with multiple interacting
circles bouncing around. We'd like each circle to have its own
computational thread, but still be able to deal with the collision
physics when they run into each other. We'll assume it's a teaching
exercise, so "tell the GPU to do it" *isn't* the right answer
(although it might be an interesting entrant in a zoo of solutions).
Key performance metric: frames per second


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Thu Jun 25 16:31:47 2015
From: ncoghlan at (Nick Coghlan)
Date: Fri, 26 Jun 2015 00:31:47 +1000
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On 26 June 2015 at 00:08, Nick Coghlan <ncoghlan at> wrote:
> That does suggest to me a possible "archetypal problem" for the work
> Eric is looking to do here: a 2D canvas with multiple interacting
> circles bouncing around. We'd like each circle to have its own
> computational thread, but still be able to deal with the collision
> physics when they run into each other. We'll assume it's a teaching
> exercise, so "tell the GPU to do it" *isn't* the right answer
> (although it might be an interesting entrant in a zoo of solutions).
> Key performance metric: frames per second

The more I think about it, the more I think this (or at least
something along these lines) makes sense as the archetypal problem to
solve here.

1. It avoids any temptation to consider the problem potentially IO
bound, as the only IO is rendering the computational results to the
display
2. Scaling across multiple machines clearly isn't relevant, since
we're already bound to a single machine due to the fact we're
rendering to a local display
3. The potential for collisions between objects means it isn't an
embarrassingly parallel problem where the different computational
threads can entirely ignore the existence of the other threads
4. "Frames per second" is a nice simple metric that can be compared
across threading, multiprocessing, PyParallel, subinterpreters, mpi4py
and perhaps even the GPU (which will no doubt thump the others
soundly, but the comparison may still be interesting)
5. It's a problem domain where we know Python isn't currently a
popular choice, and there are valid technical reasons (including this
one) for that lack of adoption
6. It's a problem domain we know folks in the educational community
are interested in seeing Python get better at, as building simple
visual animations is often a good way to introduce programming in
general (just look at the design of Scratch)
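
A single-threaded baseline for such a benchmark might look like the
sketch below. Everything here is an assumption made for illustration:
the collision response is a crude equal-mass velocity swap, nothing is
rendered, and the canvas size and frame count are arbitrary:

```python
import time
from dataclasses import dataclass


@dataclass
class Circle:
    x: float
    y: float
    vx: float
    vy: float
    r: float


def step(circles, dt, width, height):
    """Advance one frame: move, bounce off walls, resolve collisions."""
    for c in circles:
        c.x += c.vx * dt
        c.y += c.vy * dt
        if c.x - c.r < 0:
            c.vx = abs(c.vx)
        elif c.x + c.r > width:
            c.vx = -abs(c.vx)
        if c.y - c.r < 0:
            c.vy = abs(c.vy)
        elif c.y + c.r > height:
            c.vy = -abs(c.vy)
    for i, a in enumerate(circles):
        for b in circles[i + 1:]:
            dx, dy = b.x - a.x, b.y - a.y
            touching = dx * dx + dy * dy <= (a.r + b.r) ** 2
            approaching = (b.vx - a.vx) * dx + (b.vy - a.vy) * dy < 0
            if touching and approaching:
                # Crude equal-mass response: swap velocities.
                a.vx, b.vx = b.vx, a.vx
                a.vy, b.vy = b.vy, a.vy


def frames_per_second(circles, frames=200, dt=1 / 60, width=800, height=600):
    """Time `frames` simulation steps (no rendering) and return the rate."""
    start = time.perf_counter()
    for _ in range(frames):
        step(circles, dt, width, height)
    return frames / (time.perf_counter() - start)
```

The pairwise collision loop is the part that keeps the problem from
being embarrassingly parallel: each circle's update depends on the
current state of every other circle.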


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From sturla.molden at  Thu Jun 25 17:18:10 2015
From: sturla.molden at (Sturla Molden)
Date: Thu, 25 Jun 2015 17:18:10 +0200
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <mmh63g$r1o$>

On 25/06/15 16:08, Nick Coghlan wrote:

> It generally isn't scientific programmers I personally hit problems
> with (although we have to allow for the fact many of the scientists I
> know I met *because* they're Pythonistas). For that use case, there's
> not only HPC to point to, but a number of papers that talking about
> Cython and Numba in the same breath as C, C++ and FORTRAN, which is
> pretty spectacular company to be in when it comes to numerical
> computation.

Cython can sometimes give the same performance as C or Fortran, but as 
soon as you start to use classes in the Cython code you run into GIL 
issues. It is not that the GIL is a problem per se, but because Cython 
compiles to C, the GIL is not released until the Cython function 
returns. That is, unless you manually release it inside Cython. This 
e.g. means that the interpreter might be locked for longer durations, 
and if you have a GUI it becomes unresponsive. The GIL is more painful 
in Cython than in Python. Personally I often end up writing a mix of 
Cython and C or C++.

Numba is impressive but still a bit immature. It is an LLVM based JIT 
compiler for CPython that for simple computational tasks can give 
performance similar to C. It can also run Python code on Nvidia GPUs. 
Numba is becoming what the dead swallow should have been.

> Instead, the folks that I think have a more valid complaint are the
> games developers, and the folks trying to use games development as an
> educational tool.

I have not developed games myself, but for computer graphics with OpenGL 
there is certainly no reason to complain. NumPy arrays are great for 
storing vertex and texture data. OpenGL with NumPy is just as fast as 
OpenGL with C arrays. GLSL shaders are just plain text, Python is great 
for that. Cython and Numba are both great if you call glVertex* 
functions the old way, doing this as fast as C. Display lists are also 
equally fast from Python and C. But if you start to call glVertex* 
multiple times from a Python loop, then you're screwed.

> That does suggest to me a possible "archetypal problem" for the work
> Eric is looking to do here: a 2D canvas with multiple interacting
> circles bouncing around. We'd like each circle to have its own
> computational thread, but still be able to deal with the collision
> physics when they run into each other.

There are people doing Monte Carlo simulations with thousands or 
millions of particles, but not with one thread per particle.



From sturla.molden at  Thu Jun 25 17:25:41 2015
From: sturla.molden at (Sturla Molden)
Date: Thu, 25 Jun 2015 17:25:41 +0200
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <mmh6hi$11q$>

On 25/06/15 16:31, Nick Coghlan wrote:

> 3. The potential for collisions between objects means it isn't an
> embarrassingly parallel problem where the different computational
> threads can entirely ignore the existence of the other threads

Well, you can have a loop that updates all particles, e.g. by calling a 
coroutine associated with each particle, and then this loop is an 
embarrassingly parallel problem. You don't need to associate each 
particle with its own thread.

It is bad to teach students to use one thread per particle anyway. 
Suddenly they write a system that has thousands of threads.
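
A sketch of the coroutine-per-particle loop, using generator-based
coroutines in one dimension. The names and the simple wall reflection
are assumptions made for the example:

```python
def particle(x, v, lo=0.0, hi=1.0):
    """Generator-based coroutine: receives dt, yields the new position."""
    dt = yield x
    while True:
        x += v * dt
        if x < lo or x > hi:  # reflect at the walls
            v = -v
            x = min(max(x, lo), hi)
        dt = yield x


def make_particles(states):
    """Create and prime one coroutine per (position, velocity) pair."""
    coros = [particle(x, v) for x, v in states]
    for coro in coros:
        next(coro)  # advance to the first yield
    return coros


def update_all(coros, dt):
    """One frame: a single loop drives every particle, with no threads."""
    return [coro.send(dt) for coro in coros]
```

Because each send() touches only its own particle's state, the list of
coroutines could be split into chunks and handed to a small, fixed
number of workers rather than one thread per particle.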


From random832 at  Thu Jun 25 20:06:38 2015
From: random832 at (random832 at
Date: Thu, 25 Jun 2015 14:06:38 -0400
Subject: [Python-ideas] millisecond and microsecond times without floats
Message-ID: <>

On Mon, Jun 22, 2015, at 19:15, Paul Sokolovsky wrote:
> Hello from MicroPython, a lean Python implementation
> scaling down to run even on microcontrollers 
> (
> Our target hardware base oftentimes lacks floating point support, and
> using software emulation is expensive. So, we would like to have
> versions of some timing functions, taking/returning millisecond and/or
> microsecond values as integers.

What about having a fixed-point decimal numeric type to be used for this
purpose?

Allowing time (and stat, and the relevant functions of the datetime
module) to return any real numeric type rather than being required to
use float would be a useful extension.
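
For reference, later CPython versions (3.7 and up) provide integer
nanosecond clocks such as time.monotonic_ns() and time.time_ns(), which
make both approaches easy to sketch without creating a float. The helper
names below are hypothetical and are not MicroPython's API:

```python
import time
from decimal import Decimal


def ticks_us_int():
    """Monotonic microseconds as an int; no float is created."""
    return time.monotonic_ns() // 1_000


def ticks_ms_int():
    """Monotonic milliseconds as an int."""
    return time.monotonic_ns() // 1_000_000


def elapsed_us(start, end):
    """Elapsed microseconds between two ticks_us_int() readings."""
    return end - start


def now_decimal():
    """Wall-clock seconds as a Decimal with nanosecond resolution."""
    return Decimal(time.time_ns()).scaleb(-9)
```

Python integers do not wrap around, so elapsed_us() is a plain
subtraction here; on hardware with fixed-width counters the difference
would have to account for wrap-around.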

From stefan_ml at  Thu Jun 25 21:00:47 2015
From: stefan_ml at (Stefan Behnel)
Date: Thu, 25 Jun 2015 21:00:47 +0200
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <mmhj4v$n6r$>

Eric Snow schrieb am 24.06.2015 um 06:15:
> On Sun, Jun 21, 2015 at 4:40 AM, Stefan Behnel wrote:
>> If objects can make it explicit that they support sharing (and preferably
>> are allowed to implement the exact details themselves), I'm sure we'll find
>> ways to share NumPy arrays across subinterpreters. That feature alone tends
>> to be a quick way to make a lot of people happy.
> Are you thinking of something along the lines of a dunder method (e.g.
> __reduce__)?

Sure. Should not be the first problem to tackle here, but dunder methods
would be the obvious way to interact with whatever "share/move/copy between
subinterpreters" protocol there will be.


From trent at  Thu Jun 25 09:01:19 2015
From: trent at (Trent Nelson)
Date: Thu, 25 Jun 2015 03:01:19 -0400
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <mmei82$q2u$>
References: <>
Message-ID: <>

On Wed, Jun 24, 2015 at 05:26:59PM +0200, Sturla Molden wrote:
> On 24/06/15 07:01, Eric Snow wrote:
> >In return, my question is, what is the level of effort to get fork+IPC
> >to do what we want vs. subinterpreters?  Note that we need to
> >accommodate Windows as more than an afterthought
> Windows is really the problem. The absence of fork() is especially hurtful
> for an interpreted language like Python, in my opinion.

    UNIX is really the problem.  The absence of tiered interrupt request
    levels, memory descriptor lists, I/O request packets (Irps), thread
    agnostic I/O, non-paged kernel memory, non-overcommitted memory
    management, universal page/buffer cache, better device driver
    architecture and most importantly, a kernel architected around
    waitable events, not processes, is harmful for efficiently solving
    contemporary problems optimally with modern hardware.

    VMS got it right from day one.  UNIX did not.



From wes.turner at  Thu Jun 25 23:51:48 2015
From: wes.turner at (Wes Turner)
Date: Thu, 25 Jun 2015 16:51:48 -0500
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <mmh6hi$11q$>
References: <>
Message-ID: <>

On Thu, Jun 25, 2015 at 10:25 AM, Sturla Molden <sturla.molden at>
wrote:

> On 25/06/15 16:31, Nick Coghlan wrote:
>  3. The potential for collisions between objects means it isn't an
>> embarrassingly parallel problem where the different computational
>> threads can entirely ignore the existence of the other threads
> Well, you can have a loop that updates all particles, e.g. by calling a
> coroutine associated with each particle, and then this loop is an
> embarrassingly parallel problem. You don't need to associate each particle
> with its own thread.
> It is bad to teach students to use one thread per particle anyway.
> Suddenly they write a system that have thousands of threads.

Understood that this is merely an example re: threading, but
BSP seems to be the higher-level algorithm for iterative graphs with

  * (no graphx BSP yet,
* (Erlang, HDFS, Thrift)
* (Python)

Intra-machine optimization could also be useful.

From ethan at  Fri Jun 26 00:11:40 2015
From: ethan at (Ethan Furman)
Date: Thu, 25 Jun 2015 15:11:40 -0700
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <mmh6hi$11q$>
References: <>
Message-ID: <>

On 06/25/2015 08:25 AM, Sturla Molden wrote:
> On 25/06/15 16:31, Nick Coghlan wrote:
>> 3. The potential for collisions between objects means it isn't an
>> embarrassingly parallel problem where the different computational
>> threads can entirely ignore the existence of the other threads
> Well, you can have a loop that updates all particles, e.g. by calling a coroutine associated with each particle, and then this loop is an embarrassingly parallel problem. You don't need to associate
> each particle with its own thread.
> It is bad to teach students to use one thread per particle anyway. Suddenly they write a system that have thousands of threads.

Speaking as a novice to this area, I do understand that what we learn with may not be (and usually isn't) production-ready code, but I do see Nick's suggestion as being one that is easy to understand,
easy to measure, and good for piquing interest.

At least, I'm now interested.  :)  (look ma!  bowling for circles!)


From ncoghlan at  Fri Jun 26 10:00:44 2015
From: ncoghlan at (Nick Coghlan)
Date: Fri, 26 Jun 2015 18:00:44 +1000
Subject: [Python-ideas] millisecond and microsecond times without floats
In-Reply-To: <>
References: <>
Message-ID: <>

On 26 June 2015 at 04:06,  <random832 at> wrote:
> On Mon, Jun 22, 2015, at 19:15, Paul Sokolovsky wrote:
>> Hello from MicroPython, a lean Python implementation
>> scaling down to run even on microcontrollers
>> (
>> Our target hardware base oftentimes lacks floating point support, and
>> using software emulation is expensive. So, we would like to have
>> versions of some timing functions, taking/returning millisecond and/or
>> microsecond values as integers.
> What about having a fixed-point decimal numeric type to be used for this
> purpose?
> Allowing time (and stat, and the relevant functions of the datetime
> module) to return any real numeric type rather than being required to
> use float would be a useful extension.

It isn't the data type that's the problem per se, it's the additional
abstraction layers - the time module assumes it's dealing with
operating system provided timing functionality, rather than accessing
timer hardware directly. Folks tend to think that the os and time
modules are low level, but there's a wonderful saying that asks "How
do you tell the difference between a software developer and a computer
systems engineer?" Answer:

Software developer: "In a *low* level language like C..."
Computer systems engineer: "In a *high* level language like C..."

Paul, what do you think of the idea of trying to come up with a
"hwclock" module for MicroPython that aims to expose very, very low
level timing functionality as you describe, and reserving that name
for independent distribution on PyPI? Such a module could even
eventually grow a plugin system to provide access to various real time
clock modules in addition to the basic counter based clocks you're
interested in right now.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Fri Jun 26 13:12:24 2015
From: ncoghlan at (Nick Coghlan)
Date: Fri, 26 Jun 2015 21:12:24 +1000
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <mmh6hi$11q$>
References: <>
Message-ID: <>

On 26 Jun 2015 01:27, "Sturla Molden" <sturla.molden at> wrote:
> On 25/06/15 16:31, Nick Coghlan wrote:
>> 3. The potential for collisions between objects means it isn't an
>> embarrassingly parallel problem where the different computational
>> threads can entirely ignore the existence of the other threads
> Well, you can have a loop that updates all particles, e.g. by calling a
coroutine associated with each particle, and then this loop is an
embarrassingly parallel problem. You don't need to associate each particle
with its own thread.
> It is bad to teach students to use one thread per particle anyway.
Suddenly they write a system that have thousands of threads.

And hitting that scaling limit is the point at which they need to learn why
this simple approach doesn't scale very well, just as purely procedural
programming doesn't handle increasing structural complexity and just as the
"c10m" problem (like the "c10k" problem before it) is teaching our industry
as a whole some important lessons about scalable hardware and software

There are limits to the degree that education can be front loaded before
all the pre-emptive "you'll understand why this is important later"
concerns become a barrier to learning the fundamentals, rather than a
useful aid. Sometimes folks really do need to encounter a problem
themselves in order to appreciate the value of the more complex solutions
that make it possible to get past those barriers.



From ncoghlan at  Fri Jun 26 13:20:10 2015
From: ncoghlan at (Nick Coghlan)
Date: Fri, 26 Jun 2015 21:20:10 +1000
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
 <mmei82$q2u$> <>
Message-ID: <>

On 26 Jun 2015 05:37, "Trent Nelson" <trent at> wrote:
> On Wed, Jun 24, 2015 at 05:26:59PM +0200, Sturla Molden wrote:
> > On 24/06/15 07:01, Eric Snow wrote:
> >
> > >In return, my question is, what is the level of effort to get fork+IPC
> > >to do what we want vs. subinterpreters?  Note that we need to
> > >accommodate Windows as more than an afterthought
> >
> > Windows is really the problem. The absence of fork() is especially
> > harmful for an interpreted language like Python, in my opinion.
>     UNIX is really the problem.  The absence of tiered interrupt request
>     levels, memory descriptor lists, I/O request packets (Irps), thread
>     agnostic I/O, non-paged kernel memory, non-overcommitted memory
>     management, universal page/buffer cache, better device driver
>     architecture and most importantly, a kernel architected around
>     waitable events, not processes, is harmful for efficiently solving
>     contemporary problems optimally with modern hardware.

Platforms are what they are :)

As a cross-platform, but still platform dependent, language runtime, we're
actually in a pretty good position to help foster some productive
competition between Windows and the *nix platforms.

However, we'll only be able to achieve that if we approach their wildly
divergent execution and development models with respect for their
demonstrated success and seek to learn from their respective strengths,
rather than dismissing them over their respective weaknesses :)


From oscar.j.benjamin at  Fri Jun 26 17:35:51 2015
From: oscar.j.benjamin at (Oscar Benjamin)
Date: Fri, 26 Jun 2015 15:35:51 +0000
Subject: [Python-ideas] solving multi-core Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, 25 Jun 2015 at 02:57 Eric Snow <ericsnowcurrently at> wrote:

> On Wed, Jun 24, 2015 at 10:28 AM, Sturla Molden <sturla.molden at> wrote:
> > The reality is that Python is used on even the largest supercomputers. The
> > scalability problem that is seen on those systems is not the GIL, but the
> > module import. If we have 1000 CPython processes importing modules like
> > NumPy simultaneously, they will do a "denial of service attack" on the file
> > system. This happens when the module importer generates a huge number of
> > failed open() calls while trying to locate the module files.
> >
> > There is even a paper describing how to avoid this on an IBM Blue
> > Brain: "As an example, on Blue Gene P just starting up Python and importing
> > NumPy and GPAW with 32768 MPI tasks can take 45 minutes!"
>
> I'm curious what difference there is under Python 3.4 (or even 3.3).
> Along with being almost entirely pure Python, the import system now
> has some optimizations that help mitigate filesystem access
> (particularly stats).

From the HPC setup that I use there does appear to be some difference.
The number of syscalls required to import numpy is significantly lower with
3.3 than 2.7 in our setup (I don't have 3.4 in there and I didn't compile
either of these myself):

$ strace python3.3 -c "import numpy" 2>&1 | egrep -c '(open|stat)'
$ strace python2.7 -c "import numpy" 2>&1 | egrep -c '(open|stat)'

It doesn't make any perceptible difference when running "time python -c
'import numpy'" on the login node. I'm not going to request 1000 cores in
order to test the difference properly. Also note that profiling in these
setups is often complicated by the other concurrent users of the system.


From pmiscml at  Fri Jun 26 21:14:56 2015
From: pmiscml at (Paul Sokolovsky)
Date: Fri, 26 Jun 2015 22:14:56 +0300
Subject: [Python-ideas] millisecond and microsecond times without floats
In-Reply-To: <>
References: <20150623021530.74ce1ebe@x230>
 <20150623232500.45efccdf@x230> <20150624091955.2efef148@fsol>
 <20150624131349.01ee7634@x230> <20150624124010.24cd3613@fsol>
 <20150624135908.4f85d415@x230> <20150624130338.0b222ca3@fsol>
Message-ID: <20150626221456.211042b4@x230>


On Thu, 25 Jun 2015 22:56:59 +1000
Nick Coghlan <ncoghlan at> wrote:

> On 24 June 2015 at 21:38, Paul Sokolovsky <pmiscml at> wrote:
> > On Wed, 24 Jun 2015 13:03:38 +0200
> > Antoine Pitrou <solipsis at> wrote:
> >> Don't you have an additional namespace for micropython-specific
> >> features?
> >
> > I treat it as a good sign that it's ~8th message in the thread and
> > it's only the first time we get a hint that we should get out with
> > our stuff into a separate namespace ;-).
> We hadn't previously gotten to the fact that part of your motivation
> was helping folks learn the intricacies of low level fixed width time
> measurement, though.

Well, Python is a nice teaching language, and a lot of embedded
programming can be done with just GPIO control (i.e. being able to
control a 1/0 digital signal) and (properly) timed delays (this approach
is known as bit banging). So if we keep it simple and performant, we
let people do more rather than less with that. So, yes,
learning/experimentation is definitely in scope here.

People may ask what all that has to do with a very high-level language
like Python, but I have 2 answers:

1. Languages like JavaScript or Lua have a much more limited type model,
e.g. they don't even have an integer numeric type per se (only float), and
yet their apologists don't feel too shy to push them for
embedded hardware programming. Certainly, Python is not less, only more,
suited for that with its elaborate type model and stricter typing.

2. When I started with Python 1.5, I couldn't imagine there would be
e.g. memoryviews. And yet they're there. So, Python (and the people behind
it) do care about efficiency, so it shouldn't come as a surprise that a
special-purpose Python implementation cares about efficiency in its
niche either.

> That's actually a really cool idea - HC11 assembly programming and TI
> C5420 DSP programming are still two of my favourite things I've ever
> done, and it would be nice if folks could more easily start exploring
> the mindset of the embedded microprocessor world without having to
> first deal with the incidental complexity of emulators or actual
> embedded hardware (even something like programming an Arduino directly
> is more hassle than remote controlling one from a Raspberry Pi or PC).

Thanks, and yes, that's the idea behind MicroPython - that people new to
embedded programming could start more easily, while having a chance to learn
a really cool language, and be able to go into low-level details and
optimize (in my view, Python is as friendly to that as a VHLL may be).
And yet another idea is to make MicroPython friendly to people who know
Python and want to play with embedded. Making it play well for these
2 user groups isn't exactly easy, but we'd like to try.

> Unfortunately, I can't think of a good alternative name that isn't
> ambiguous at the CPython layer - embedded CPython is very different
> from an embedded microprocessor, utime is taken, and microtime is
> confusable with microseconds.

You mean the POSIX utime() function, right? The way we have namespacing
structured currently is that we have a "u" prefix for all important
builtin modules, e.g. uos, utime, etc. They contain the bare minimum, and
then fuller standard modules can be coded in Python. So, formally
speaking, on our side it will go into a separate namespace anyway. It's
just that we treat "utime" as an alias for "time", and wouldn't like to
put there something which couldn't be submitted with a clean conscience
as a PEP (in some distant future).

> I'm tempted to suggest calling it "qtime", and using TI's Q notation
> to denote the formats of numbers:
> That would conflict with your notion of making the APIs agnostic as to
> the exact bitwidth used, though, as well as with the meaning of the
> "q" prefix in qmath:

Yes, the exact fixed-point nature of "Q" numbers doesn't help here, but as
a designator of a special format it's pretty close to the original idea
of using "_ms" and "_us" suffixes, so I treat that as a sign that we're
on the right track.

I was however thinking about our exchange with Antoine, and his
surprise that we don't want to use a 64-bit value. I guess I nailed the
issue: I selected "monotonic()" because it seemed the closest to what we
need, and in my view, our stuff is still "monotonic" in the sense that it
only goes forward at a constant pace. It just wraps around because such is
the physical nature of the underlying fixed-size counter. Apparently,
such an "extended" treatment of "monotonic" is confusing for people who
know time.monotonic() and PEP 418.

So, it looks like we'll need to call our stuff something different. I'm
going to propose ticks_ms() and ticks_us() for MicroPython (hopefully "ticks"
is a well-known embedded term, and intuitive enough for other folks;
at the very least, it's better than the Linux kernel's jiffies ;-) ).

> Cheers,
> Nick.

Best regards,
 Paul                          mailto:pmiscml at

From pmiscml at  Fri Jun 26 21:47:42 2015
From: pmiscml at (Paul Sokolovsky)
Date: Fri, 26 Jun 2015 22:47:42 +0300
Subject: [Python-ideas] millisecond and microsecond times without floats
In-Reply-To: <>
References: <>
Message-ID: <20150626224742.7954cc0b@x230>


On Fri, 26 Jun 2015 18:00:44 +1000
Nick Coghlan <ncoghlan at> wrote:

> On 26 June 2015 at 04:06,  <random832 at> wrote:
> > On Mon, Jun 22, 2015, at 19:15, Paul Sokolovsky wrote:
> >>
> >>
> >> Hello from MicroPython, a lean Python implementation
> >> scaling down to run even on microcontrollers.
> >>
> >> Our target hardware base oftentimes lacks floating point support,
> >> and using software emulation is expensive. So, we would like to
> >> have versions of some timing functions, taking/returning
> >> millisecond and/or microsecond values as integers.
> >
> > What about having a fixed-point decimal numeric type to be used for
> > this purpose?
> >
> > Allowing time (and stat, and the relevant functions of the datetime
> > module) to return any real numeric type rather than being required
> > to use float would be a useful extension.
> It isn't the data type that's the problem per se, it's the additional
> abstraction layers - the time module assumes it's dealing with

Just to clarify, I tried to pose the problem exactly as capturing the right
level of abstraction, but implementation-wise, the data type is important
for us (MicroPython) too. Specifically, a small integer differs from
most other types in that it's a value type, not a reference type, so
it doesn't require memory allocation (it will never trigger garbage
collection => no unpredictable pauses), and it is faster to work with (uPy
has an ahead-of-time machine code compiler -> operations on word-sized
values approach (unoptimized) C performance).

But those are implementation details hidden by the formulation of the
original task - we need integer-based time (Python has integers, so no
problems with that), and that time may and will wrap around at
implementation-specific intervals (so a particular implementation may
choose to represent it with an efficient "small integer" type if it has
one).

Note that I also don't try to bloat the problem space and say "Guys,
why don't we have unsigned integers in Python?" or "Let's have a
generic builtin modular arithmetic module". None of those are needed here.

> operating system provided timing functionality, rather than accessing
> timer hardware directly. Folks tend to think that the os and time
> modules are low level, but there's a wonderful saying that asks "How
> do you tell the difference between a software developer and a computer
> systems engineer?" Answer:
> Software developer: "In a *low* level language like C..."
> Computer systems engineer: "In a *high* level language like C..."
> Paul, what do you think of the idea of trying to come up with a
> "hwclock" module for MicroPython that aims to expose very, very low

Well, so on MicroPython side, having extra modules is expensive
(defining a module costs 100+ bytes, OMG! ;-)) That's why we have
catch-all "pyb" module so far, and see ways to put sensible extra stuff
into existing modules. So, my concern is function, not module, names and
sensible semantics of those functions.

To come up with a general-purpose "hwclock" module, there would need to
be bigger cooperation from various parties and stakeholders. Neither
myself nor other MicroPython developers can lead that effort,
unfortunately. But it's my hope that if someone starts that effort,
they will grep the Python lists first for prior art, and maybe come across
the arguments presented here, and select a compatible API. Then for us,
compatibility will be easy:

--- ---
from utime import *

And even if someone selects another API, we'll know that ours are the most
efficient building blocks we can have on our side, and we can implement
a compatibility layer in their terms.

> level timing functionality as you describe, and reserving that name
> for independent distribution on PyPI? Such a module could even
> eventually grow a plugin system to provide access to various real time
> clock modules in addition to the basic counter based clocks you're
> interested in right now.
> Cheers,
> Nick.


Best regards,
 Paul                          mailto:pmiscml at

From pmiscml at  Fri Jun 26 22:48:30 2015
From: pmiscml at (Paul Sokolovsky)
Date: Fri, 26 Jun 2015 23:48:30 +0300
Subject: [Python-ideas] millisecond and microsecond times without floats
In-Reply-To: <>
References: <>
Message-ID: <20150626234830.02f74c60@x230>


On Thu, 25 Jun 2015 14:06:38 -0400
random832 at wrote:

> > Hello from MicroPython, a lean Python implementation
> > scaling down to run even on microcontrollers.
> > 
> > Our target hardware base oftentimes lacks floating point support,
> > and using software emulation is expensive. So, we would like to have
> > versions of some timing functions, taking/returning millisecond
> > and/or microsecond values as integers.
> What about having a fixed-point decimal numeric type to be used for
> this purpose?

The problem is actually not even in floating point per se. For example,
the reference hardware board for MicroPython is built on a relatively
powerful microcontroller which has hardware floating point, but
only single-precision floating point (IEEE 32-bit). And you won't read
it in the documentation or PEP 418 - it's just
implied - that you don't just need floating point for those functions,
it should be floating point with specific mantissa requirements. Let's
see:
time.time() returns the time in seconds since the epoch (1 Jan 1970) as
a floating point number.

That means that the mantissa already should be at least 32 bits.
Single-precision FP has 23 mantissa bits, so it's already ruled out
as a suitable representation of the time.time() value.

Then, we need 10 extra mantissa bits for each SI decimal subunit (2^10
== 1024). Double-precision FP has 52 mantissa bits. We can store
millisecond precision there, we can store microsecond precision there.
But - oops - going further, we hit the same problem as MicroPython
hit right away: it's not possible to represent the same calendar date
range as the Unix time() call, but with higher resolution than microsecond.
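
The bit counting above can be checked numerically. What follows is only a
minimal sketch: the timestamp is an arbitrary example value from late June
2015 (an assumption, not taken from the thread), and the helper name
to_float32 is hypothetical. It computes the spacing between adjacent
representable values near that timestamp for IEEE single and double
precision.

```python
import math
import struct

# Hypothetical example timestamp (seconds since the epoch, late June 2015).
t = 1435351710.123456

def to_float32(x):
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# frexp returns (m, e) with t == m * 2**e and 0.5 <= m < 1.
_, exponent = math.frexp(t)

# Spacing between adjacent representable values near t:
# singles carry 24 significant bits (23 stored), doubles carry 53 (52 stored).
spacing_single = 2.0 ** (exponent - 24)
spacing_double = 2.0 ** (exponent - 53)

t32 = to_float32(t)
print("single precision spacing:", spacing_single, "seconds")
print("double precision spacing:", spacing_double, "seconds")
print("error after rounding to single:", abs(t32 - t), "seconds")
```

Under these assumptions a single-precision time.time() value could only
move in steps of about two minutes, while a double resolves a fraction of
a microsecond but falls short of nanosecond resolution.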

Of course, PEP 418 provides one way to work around that - by
using other epochs than Jan 1, 1970 with all these new functions like
monotonic() (which is specified as "reference point of the returned
value is undefined"). It still implicitly assumes there are enough bits
that wrap-arounds can be ignored.

My proposal works around the issue in another direction - by embracing the
fact that any fixed-size counter wraps around and preparing to deal with
that. All that in turn enables using just integer values for
representing times.

And why it's useful (implementation-wise) to be able to use integer
values, I elaborated in another recent mail.


Best regards,
 Paul                          mailto:pmiscml at

From ncoghlan at  Sat Jun 27 04:27:55 2015
From: ncoghlan at (Nick Coghlan)
Date: Sat, 27 Jun 2015 12:27:55 +1000
Subject: [Python-ideas] millisecond and microsecond times without floats
In-Reply-To: <20150626221456.211042b4@x230>
References: <20150623021530.74ce1ebe@x230>
 <20150623232500.45efccdf@x230> <20150624091955.2efef148@fsol>
 <20150624131349.01ee7634@x230> <20150624124010.24cd3613@fsol>
 <20150624135908.4f85d415@x230> <20150624130338.0b222ca3@fsol>
Message-ID: <>

On 27 June 2015 at 05:14, Paul Sokolovsky <pmiscml at> wrote:
> I was however thinking about our exchange with Antoine, and his
> surprise that we don't want to use a 64-bit value. I guess I nailed the
> issue: I selected "monotonic()" because it seemed the closest to what we
> need, and in my view, our stuff is still "monotonic" in the sense that it
> only goes forward at a constant pace. It just wraps around because such is
> the physical nature of the underlying fixed-size counter. Apparently,
> such an "extended" treatment of "monotonic" is confusing for people who
> know time.monotonic() and PEP 418.
> So, it looks like we'll need to call our stuff something different. I'm
> going to propose ticks_ms() and ticks_us() for MicroPython (hopefully "ticks"
> is a well-known embedded term, and intuitive enough for other folks;
> at the very least, it's better than the Linux kernel's jiffies ;-) ).

I like it - as you say, ticks is already a common term for this, and
it's clearly distinct from anything else in the time module if we ever
decide to standardise it. It also doesn't hurt that "tick" is the term
both LabVIEW and Simulink use for the concept.

As a terminology/API suggestion, you may want to go with:

    tick_ms() - get the current tick with 1 millisecond between ticks
    tick_overflow_ms() - get the overflow period of the millisecond tick counter
    ticks_elapsed_ms(start, end) - get the number of millisecond ticks
        elapsed between two points in time (assuming at most one tick counter
        overflow between the start and end of the measurement)

    tick_us() - get the current tick with 1 microsecond between ticks
    tick_overflow_us() - get the overflow period of the microsecond tick counter
    ticks_elapsed_us(start, end) - get the number of microsecond ticks
        elapsed between two points in time (assuming at most one tick counter
        overflow between the start and end of the measurement)

The problem I see with "ticks_ms()" and "ticks_us()" specifically is
that the plural in the name implies "ticks elapsed since a given
reference time". Since the tick counter can wrap around, there's no
reference time - the current tick count is merely an opaque token
allowing you to measure elapsed times up to the duration of the tick
counter's overflow period.

I also don't think you want to assume the overflow periods of the
millisecond timer and the microsecond timer are going to be the same,
hence the duplication of the other APIs as well.

Something else you may want to consider is the idea of a "system
tick", distinct from the fixed duration millisecond and microsecond
ticks:

    tick() - get the current tick in system ticks
    tick_overflow() - get the overflow period of the system tick counter
    ticks_elapsed(start, end) - get the number of system ticks elapsed
        between two points in time (assuming at most one tick counter overflow
        between the start and end of the measurement)
    tick_duration() - get the system tick duration in seconds as a
        floating point number

On platforms without a real time clock, the millisecond and
microsecond ticks may then be approximations based on the system tick
counter - that's actually the origin of my suggestion to expose
completely separate APIs for the millisecond and microsecond versions,
as if those are derived by dividing a fast system tick counter
appropriately, they may wrap more frequently than every 2**32 or 2**64
ticks.
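
A small worked example of why the derived counters need their own overflow
periods. The numbers are purely hypothetical (a 32-bit system tick counter
running at 16 MHz), and derived_overflow is an illustrative helper, not a
proposed API.

```python
def derived_overflow(system_overflow, system_tick_hz, unit_hz):
    """Overflow period of a counter derived by scaling a system tick counter.

    system_overflow: number of system ticks before the counter wraps
    system_tick_hz:  system tick frequency in Hz
    unit_hz:         frequency of the derived unit (1000 for ms, 10**6 for us)
    """
    return system_overflow * unit_hz // system_tick_hz

# Hypothetical hardware: 32-bit counter clocked at 16 MHz.
SYSTEM_OVERFLOW = 2 ** 32
SYSTEM_TICK_HZ = 16_000_000

ms_overflow = derived_overflow(SYSTEM_OVERFLOW, SYSTEM_TICK_HZ, 1_000)
us_overflow = derived_overflow(SYSTEM_OVERFLOW, SYSTEM_TICK_HZ, 1_000_000)

print("millisecond counter wraps after", ms_overflow, "ms")
print("microsecond counter wraps after", us_overflow, "us")
```

With these assumed figures both derived counters wrap after roughly 268
seconds, far sooner than a counter that is itself 32 bits wide in
milliseconds, and the two overflow values differ from each other.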

Depending on use case, there may also be value in exposing the
potential degree of jitter in the *_ms() and *_us() tick counters. I'm
not sure if that would be best expressed in absolute or relative
terms, though, so I'd suggest leaving that aspect undefined
unless/until you have a specific use case in mind.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From drekin at  Sun Jun 28 12:02:01 2015
From: drekin at (Adam Bartoš)
Date: Sun, 28 Jun 2015 12:02:01 +0200
Subject: [Python-ideas] Are there asynchronous generators?
In-Reply-To: <>
References: <>
Message-ID: <>

Is there a way for a producer to say that there will be no more items put,
so consumers get something like StopIteration when there are no more items
left afterwards?
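
One common convention for this, shown here only as a sketch and not as an
asyncio feature, is for the producer to put a unique sentinel object on the
queue once it is finished. The names _DONE, producer and consumer are
hypothetical, and asyncio.run() assumes Python 3.7 or later.

```python
import asyncio

# Hypothetical sentinel: a unique object that can never be a real item.
_DONE = object()

async def producer(queue, items):
    """Put every item on the queue, then signal that no more will follow."""
    for item in items:
        await queue.put(item)
    await queue.put(_DONE)

async def consumer(queue):
    """Sum items from the queue until the sentinel arrives."""
    total = 0
    while True:
        item = await queue.get()
        if item is _DONE:
            break
        total += item
    return total

async def main():
    queue = asyncio.Queue()
    # Run producer and consumer concurrently; keep the consumer's result.
    _, total = await asyncio.gather(producer(queue, range(4)), consumer(queue))
    return total

total = asyncio.run(main())
print(total)
```

With several consumers the producer would need to put one sentinel per
consumer, which is one reason this remains a convention rather than a
general answer.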

There is also the problem that one cannot easily feed a queue, asynchronous
generator, or any asynchronous iterator to a simple synchronous consumer
like sum() or list() or "".join(). It would be nice if there was a way to
wrap them to asynchronous ones when needed - something like (async
sum)(asynchronously_produced_numbers()).

On Wed, Jun 24, 2015 at 1:54 PM, Jonathan Slenders <jonathan at> wrote:

> In my experience, it's much easier to use asyncio Queues for this.
> Instead of yielding, push to a queue. The consumer can then use "await
> queue.get()".
> I think the semantics of the generator become too complicated otherwise,
> or maybe impossible.
> Maybe have a look at this article:
> Jonathan
> 2015-06-24 12:13 GMT+02:00 Andrew Svetlov <andrew.svetlov at>:
>> Your idea is clean and maybe we will allow `yield` inside `async def`
>> in Python 3.6.
>> For PEP 492 it was too big change.
>> On Wed, Jun 24, 2015 at 12:00 PM, Adam Bartoš <drekin at> wrote:
>> > Hello,
>> >
>> > I had a generator producing pairs of values and wanted to feed all the first
>> > members of the pairs to one consumer and all the second members to another
>> > consumer. For example:
>> >
>> > def pairs():
>> >     for i in range(4):
>> >         yield (i, i ** 2)
>> >
>> > biconsumer(sum, list)(pairs()) -> (6, [0, 1, 4, 9])
>> >
>> > The point is I wanted the consumers to be suspended and resumed in a
>> > coordinated manner: The first consumer is invoked, it wants the first
>> > element. The coordinator implemented by the biconsumer function invokes pairs(),
>> > gets the first pair and yields its first member to the first consumer. Then
>> > it wants the next element, but now it's the second consumer's turn, so the
>> > first consumer is suspended and the second consumer is invoked and fed with
>> > the second member of the first pair. Then the second consumer wants the next
>> > element, but it's the first consumer's turn, and so on. In the end, when the
>> > stream of pairs is exhausted, StopIteration is thrown to both consumers and
>> > their results are combined.
>> >
>> > The cooperative asynchronous nature of the execution reminded me of asyncio and
>> > coroutines, so I thought that biconsumer might be implemented using them.
>> > However, it seems that it is impossible to write an "asynchronous generator"
>> > since the "yielding pipe" is already used for the communication with the
>> > scheduler. And even if it was possible to make an asynchronous generator, it
>> > is not clear how to feed it to a synchronous consumer like the sum() or list()
>> > function.
>> >
>> > With PEP 492 the concepts of generators and coroutines were separated, so
>> > asynchronous generators may be possible in theory. An ordinary function has
>> > just the returning pipe - for returning the result to the caller. A
>> > generator also has a yielding pipe - used for yielding the values during
>> > iteration, and its return pipe is used to finish the iteration. A native
>> > coroutine has a returning pipe - to return the result to a caller just like
>> > an ordinary function, and also an async pipe - used for communication with a
>> > scheduler and execution suspension. An asynchronous generator would just
>> > have both a yielding pipe and an async pipe.
>> >
>> > So my question is: was code like the following considered? Does it make
>> > sense? Or are there not enough use cases for such code? I found only a
>> > short mention in
>> >, so possibly
>> > these coroutine-generators are the same idea.
>> >
>> > async def f():
>> >     number_string = await fetch_data()
>> >     for n in number_string.split():
>> >         yield int(n)
>> >
>> > async def g():
>> >     result = async/await? sum(f())
>> >     return result
>> >
>> > async def h():
>> >     the_sum = await g()
>> >
>> > As for an explanation of the execution of h() by an event loop: h is a
>> > native coroutine called by the event loop, having both a returning pipe and
>> > an async pipe. The returning pipe leads to the end of the task, the async pipe
>> > is used for communication with the scheduler. Then, g() is called
>> > asynchronously - using the await keyword means that access to the async
>> > pipe is given to the callee. Then g() invokes the asynchronous generator f()
>> > and gives it access to its async pipe, so when f() is yielding values to
>> > sum, it can also yield a future to the scheduler via the async pipe and
>> > suspend the whole task.
>> >
>> > Regards, Adam Bartoš
>> --
>> Thanks,
>> Andrew Svetlov

From andrew.svetlov at  Sun Jun 28 12:07:32 2015
From: andrew.svetlov at (Andrew Svetlov)
Date: Sun, 28 Jun 2015 13:07:32 +0300
Subject: [Python-ideas] Are there asynchronous generators?
In-Reply-To: <>
References: <>
Message-ID: <>

I'm afraid the latter will never be possible -- you cannot push async
coroutines into a synchronous calling convention.
Your example should be converted into `await
async_sum(asynchronously_produced_numbers())`, which is possible right
now. (asynchronously_produced_numbers should be an *iterator* with
__aiter__/__anext__ methods, not a generator with yield expressions
inside.)
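
A minimal sketch of what that could look like, written against current
Python (3.7+, where __aiter__ returns the iterator directly and
asyncio.run() exists). The class AsyncNumbers and its string input are
hypothetical stand-ins for asynchronously_produced_numbers, and async_sum
is not a standard library function; it has to be written by hand.

```python
import asyncio

class AsyncNumbers:
    """Asynchronous iterator yielding the ints parsed from a string."""

    def __init__(self, text):
        self._items = iter(text.split())

    def __aiter__(self):
        return self

    async def __anext__(self):
        # Stand-in for real asynchronous work such as fetching more data.
        await asyncio.sleep(0)
        try:
            return int(next(self._items))
        except StopIteration:
            # Asynchronous iteration ends with StopAsyncIteration.
            raise StopAsyncIteration from None

async def async_sum(aiterable, start=0):
    """Asynchronous counterpart of sum(), consuming an async iterable."""
    total = start
    async for value in aiterable:
        total += value
    return total

result = asyncio.run(async_sum(AsyncNumbers("1 2 3 4")))
print(result)
```

The synchronous sum() cannot be reused here because it drives its argument
with next(), whereas async for awaits __anext__() and can therefore suspend
the whole task between items.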

On Sun, Jun 28, 2015 at 1:02 PM, Adam Bartoš <drekin at> wrote:
> Is there a way for a producer to say that there will be no more items put,
> so consumers get something like StopIteration when there are no more items
> left afterwards?
> There is also the problem that one cannot easily feed a queue, asynchronous
> generator, or any asynchronous iterator to a simple synchronous consumer
> like sum() or list() or "".join(). It would be nice if there was a way to
> wrap them to asynchronous ones when needed - something like (async
> sum)(asynchronously_produced_numbers()).

Andrew Svetlov

From drekin at  Sun Jun 28 12:30:20 2015
From: drekin at (Adam Bartoš)
Date: Sun, 28 Jun 2015 12:30:20 +0200
Subject: [Python-ideas] Are there asynchronous generators?
In-Reply-To: <>
References: <>
Message-ID: <>

I understand that it's impossible today, but I thought that if asynchronous
generators were going to be added, some kind of generalized generator
mechanism allowing yielding to multiple different places would be needed
anyway. So in theory no special change to synchronous consumers would be
needed - when the asynchronous generator object is created, it gets a link
to the scheduler from the caller, then it's given as an argument to sum();
when sum wants the next item it calls next() and the asynchronous generator
can either yield the next value to sum or it can yield a future to the
scheduler and suspend execution of the whole task. But since it's a good idea
to be explicit and mark each asynchronous call, some wrapper like (async
sum) would be used.
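
For comparison, the `await async_sum(...)` form suggested in the reply quoted
below can be sketched roughly as follows. This is only an illustration under
assumptions: `AsyncNumbers` and `async_sum` are hypothetical names (nothing
like them ships with asyncio), and the `__aiter__`/`__anext__` protocol is
assumed to behave as in current Python 3, where `__aiter__` returns the
iterator directly.

```python
import asyncio


class AsyncNumbers:
    """Hypothetical asynchronous iterator producing 0 .. n-1."""

    def __init__(self, n):
        self._it = iter(range(n))

    def __aiter__(self):
        return self

    async def __anext__(self):
        # Suspend once per item, as a real producer waiting on I/O would.
        await asyncio.sleep(0)
        try:
            return next(self._it)
        except StopIteration:
            raise StopAsyncIteration from None


async def async_sum(aiterable, start=0):
    """Asynchronous counterpart of sum(): the consumer itself is a coroutine."""
    total = start
    async for value in aiterable:
        total += value
    return total


result = asyncio.run(async_sum(AsyncNumbers(4)))
print(result)
```

The consumer has to be rewritten as a coroutine here; the builtin sum() cannot
be reused unchanged, which is exactly the limitation being discussed.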

On Sun, Jun 28, 2015 at 12:07 PM, Andrew Svetlov <andrew.svetlov at> wrote:

> I'm afraid the last will never be possible -- you cannot push async
> coroutines into a synchronous calling convention.
> Your example should be converted into `await
> async_sum(asynchronously_produced_numbers())` which is possible right
> now. (asynchronously_produced_numbers should be an *iterator* with
> __aiter__/__anext__ methods, not a generator with yield expressions
> inside.)
> On Sun, Jun 28, 2015 at 1:02 PM, Adam Bartoš <drekin at> wrote:
> > Is there a way for a producer to say that there will be no more items put,
> > so consumers get something like StopIteration when there are no more items
> > left afterwards?
> >
> > There is also the problem that one cannot easily feed a queue, asynchronous
> > generator, or any asynchronous iterator to a simple synchronous consumer
> > like sum() or list() or "".join(). It would be nice if there was a way to
> > wrap them to asynchronous ones when needed - something like (async
> > sum)(asynchronously_produced_numbers()).

From stefan_ml at  Sun Jun 28 12:58:32 2015
From: stefan_ml at (Stefan Behnel)
Date: Sun, 28 Jun 2015 12:58:32 +0200
Subject: [Python-ideas] Are there asynchronous generators?
In-Reply-To: <>
References: <>
Message-ID: <mmok0q$ugg$>

[Fixing the messed-up reply quoting order]

Adam Bartoš wrote on 28.06.2015 at 12:30:
> On Sun, Jun 28, 2015 at 12:07 PM, Andrew Svetlov wrote:
>> On Sun, Jun 28, 2015 at 1:02 PM, Adam Bartoš wrote:
>>> There is also the problem that one cannot easily feed a queue,
>>> asynchronous
>>> generator, or any asynchronous iterator to a simple synchronous consumer
>>> like sum() or list() or "".join(). It would be nice if there was a way to
>>> wrap them to asynchronous ones when needed - something like (async
>>> sum)(asynchronously_produced_numbers()).
>> I'm afraid the last will never be possible -- you cannot push async
>> coroutines into a synchronous calling convention.
>> Your example should be converted into `await
>> async_sum(asynchronously_produced_numbers())` which is possible right
>> now. (asynchronously_produced_numbers should be an *iterator* with
>> __aiter__/__anext__ methods, not a generator with yield expressions
>> inside.)
> I understand that it's impossible today, but I thought that if asynchronous
> generators were going to be added, some kind of generalized generator
> mechanism allowing yielding to multiple different places would be needed
> anyway. So in theory no special change to synchronous consumers would be
> needed - when the asynchronous generator object is created, it gets a link
> to the scheduler from the caller, then it's given as an argument to sum();
> when sum wants next item it calls next() and the asynchronous generator can
> either yield the next value to sum or it can yield a future to the
> scheduler and suspend execution of whole task. But since it's a good idea
> to be explicit and mark each asynchronous call, some wrapper like (async
> sum) would be used.

Stackless might eventually support something like that.

That being said, note that by design, the scheduler (or I/O loop, if that's
what you're using) always lives *outside* of the whole asynchronous call
chain, at its very end, but can otherwise be controlled by arbitrary code
itself, and that is usually synchronous code. In your example, it could
simply be moved between the first async function and its synchronous
consumer ("sum" in your example). Doing that is entirely possible. What is
not possible (unless you're using a design like Stackless) is that this
scheduler controls its own controller, e.g. that it starts interrupting the
execution of the synchronous code that called it.
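
As a rough sketch of the arrangement described above, the loop can be driven
from synchronous code, one item at a time, so that it sits between the
asynchronous producer and the synchronous consumer. The names `AsyncNumbers`
and `drive` are made up for this illustration, and the code assumes it is
called from plain synchronous code, not from inside an already running loop.

```python
import asyncio


class AsyncNumbers:
    """Hypothetical asynchronous iterator producing 0 .. n-1."""

    def __init__(self, n):
        self._it = iter(range(n))

    def __aiter__(self):
        return self

    async def __anext__(self):
        await asyncio.sleep(0)  # a real producer would wait on I/O here
        try:
            return next(self._it)
        except StopIteration:
            raise StopAsyncIteration from None


def drive(aiterator, loop):
    """Synchronous generator: runs the loop until the next item is ready."""
    while True:
        try:
            yield loop.run_until_complete(aiterator.__anext__())
        except StopAsyncIteration:
            return


loop = asyncio.new_event_loop()
try:
    # sum() stays an ordinary synchronous consumer and controls the loop.
    total = sum(drive(AsyncNumbers(4), loop))
finally:
    loop.close()
print(total)
```

The synchronous caller is in charge here: the loop only runs while sum() is
waiting for its next item, and it never interrupts sum() itself.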


From gmludo at  Sun Jun 28 17:49:56 2015
From: gmludo at (Ludovic Gasc)
Date: Sun, 28 Jun 2015 17:49:56 +0200
Subject: [Python-ideas] Fwd: [Python-Dev] An yocto change proposal in
 logging module to simplify structured logs support
In-Reply-To: <>
References: <>
Message-ID: <>

2015-05-25 22:08 GMT+02:00 Andrew Barnert <abarnert at>:

> On Monday, May 25, 2015 6:57 AM, Ludovic Gasc <gmludo at> wrote:
> >2015-05-25 4:19 GMT+02:00 Steven D'Aprano <steve at>:
> >>At the other extreme, there is the structlog module:
> >>
> >>
> >
> >Thank you for the link, it's an interesting project, it's like "logging"
> >module but on steroids, some good logging ideas inside.
> >However, in fact, if I understand correctly, it's the same approach that
> >the previous recipe: Generate a log file with JSON content, use
> >logstash-forwarder to reparse the JSON content, to finally send the
> >structure to logstash, for the query part:
> >>How does your change compare to those?
> >>
> >
> >
> >In the use case of structlog, drop the logstash-forwarder step to
> >interconnect directly Python daemon with structured log daemon.
> >Even if logstash-forwarder should be efficient, why to have an additional
> >step to rebuild a structure you have at the beginning ?

Sorry for the delay, I have been very busy for the past month.

> You can't send a Python dictionary over the wire, or store a Python
> dictionary in a database. You need to encode it to some transmission and/or
> storage format; there's no way around that. And what's wrong with using
> JSON as that format?

Maybe I should be clearer about my objective: I'm trying to build the
simplest architecture for logging, based on the existing Python logging
features because all Python libraries use them, and with features similar
to those of ELK (Elasticsearch, Logstash, Kibana).

On paper, the features of ELK are very interesting for a sysadmin and,
with my sysadmin hat on, I strongly agree with that.
The issue is that, based on my experience where I work (sorry to any
ELK-lovers on this ML), it's very complicated to set up, to maintain
and to keep scalable when you have a lot of logs: we spent a lot of time
getting ELK to work, and finally we dropped it because the cost of
maintenance was too high for us compared to using grep on rsyslog logs.
Maybe we aren't smart enough to maintain ELK, it's possible.
However, if we're not smart enough to do that, certainly some other people
have the same issue as us.
In fact, the issue shouldn't be our brains, but it was clearly a
time-consuming task, and we have too much directly paid work to take care of.

Don't get me wrong: I'm not saying that ELK doesn't work, only that it's
time-consuming with a high volume of logs.
I'm pretty sure that a lot of people are happy with ELK, it's cool for them ;-)

It's like Oracle and PostgreSQL databases: where with Oracle you need a
full-time DBA, with PostgreSQL: apt-get install postgresql
With this last sentence I'm totally caricaturing, but only to show where I
see an issue that should be fixed, at least for us.
(FYI, in a previous professional life, I maintained Oracle, MySQL and
PostgreSQL servers for several clients, so I know the subject a little bit.)

From my point of view, the features in journald are enough to replace most
usages of ELK, at least for us, and, contrary to ELK, journald is already
installed in all the latest Linux distributions, even in Debian Jessie. You
have almost no maintenance cost.

> More importantly, when you drop logstash-forwarder, how are you intending
> to get the messages to the upstream server? You don't want to make your log
> calls synchronously wait for acknowledgement before returning. So you need
> some kind of buffering. And just buffering in memory doesn't work: if your
> service shuts down unexpectedly, you've lost the last batch of log messages
> which would tell you why it went down (plus, if the network goes down
> temporarily, your memory use becomes unbounded). You can of course buffer
> to disk, but then you've just reintroduced the same need for some kind of
> intermediate storage format you were trying to eliminate - and it doesn't
> really solve the problem, because if your service shuts down, the last
> messages won't get sent until it starts up again. So you could write a
> separate simple store-and-forward daemon that either reads those file
> buffers or listens on localhost UDP... but then you've just recreated
> logstash-forwarder.

In the past, we used a local rsyslog directly to play this role on each VM,
connected directly to the Python daemons via a datagram UNIX socket.
See a logging config file example:

Now, it's journald that plays this role, also via a datagram UNIX socket.
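
To illustrate the datagram UNIX socket transport, here is a minimal sketch
using the standard library's logging.handlers.SysLogHandler. It is not the
config file referred to above: the socket is a throwaway one bound by the
script itself as a stand-in for the local daemon (on a real system the address
would typically be something like /dev/log), and the logger name is made up.
It assumes a platform with AF_UNIX support.

```python
import logging
import logging.handlers
import os
import socket
import tempfile

# Stand-in for the local log daemon: a datagram UNIX socket we bind ourselves.
sock_dir = tempfile.mkdtemp()
sock_path = os.path.join(sock_dir, "log.sock")
server = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
server.bind(sock_path)
server.settimeout(2.0)

# The Python daemon side: a syslog handler talking to that socket.
handler = logging.handlers.SysLogHandler(address=sock_path,
                                         socktype=socket.SOCK_DGRAM)
logger = logging.getLogger("unix_dgram_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

logger.info("service started")
datagram = server.recv(4096)  # what the daemon would receive
print(datagram)

# Clean up the temporary socket.
logger.removeHandler(handler)
handler.close()
server.close()
os.unlink(sock_path)
os.rmdir(sock_dir)
```

Each log call is a single send on a local socket, so the application does not
wait for any upstream server; buffering and forwarding are left to the daemon.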

> And even if you wanted to do all that, I don't see why you couldn't do it
> all with structlog. They recommend using an already-working workflow
> instead of designing a different one from scratch, but it's just a
> recommendation.

You're right: Don't reinvent the wheel.
However, if I follow your argument in another context: instead of creating
AsyncIO, should Guido have integrated Twisted into Python?
As an end-user of Twisted and AsyncIO, it wasn't for pleasure or to be
fancy that we migrated from Twisted to AsyncIO ;-)
To me, the expression should be: "Don't reinvent the wheel, unless you
can provide a more efficient wheel".

Now, in the context of logging: please let me try another approach,
maybe I'll waste my time, or maybe I'll find an interesting gold nugget,
who knows before digging?
You may think I'm only trying to be different from the "ELK" standard, and
it's possible, who knows?

If I revive this thread, it isn't to troll you, but because I'm interested
in your opinion.
I may have found a better approach that doesn't need a CPython patch and is
more powerful.

In the source code of the logging package, I've found this:
BTW, this approach should be promoted more: I didn't know you can use a
dict to replace text in a log message, I thought only strings.

Now, instead of using the extra parameter, I use this feature directly.

For the developer, instead of writing this:

logger.debug('Receive a create_or_update request from "%s" account',

he writes this:

logger.debug('Receive a create_or_update request from "%(account_id)s"
            {'request_id': request.request_id,
            'account_id': account_id,
            'aiohttp_request': request,
            'payload': payload})

With that, you can write logs as usual in your source code, and use the
handler you want.
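
As a small self-contained illustration of this logging feature: when the only
positional argument is a non-empty dict, logging uses it for %(key)s
substitution and keeps it on the record as record.args, so a handler can pick
up the structured fields. The handler class, logger name and field values
below are invented for the example; this is not the systemDream code.

```python
import logging


class DictCapturingHandler(logging.Handler):
    """Hypothetical handler keeping the message and the dict passed as args."""

    def __init__(self):
        super().__init__()
        self.entries = []

    def emit(self, record):
        # A real handler would forward these fields to a structured log
        # daemon; here they are only stored for inspection.
        fields = dict(record.args) if isinstance(record.args, dict) else {}
        self.entries.append((record.getMessage(), fields))


logger = logging.getLogger("structured_demo")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False
handler = DictCapturingHandler()
logger.addHandler(handler)

logger.debug('Receive a create_or_update request from "%(account_id)s" account',
             {'request_id': 'req-1', 'account_id': 42})

message, fields = handler.entries[0]
print(message)
print(fields)
```

Any other handler attached to the same logger still sees an ordinary formatted
message, so the call site does not depend on which handler is in use.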

However, if you use the systemDream handler, all the metadata of your log
will be sent to journald:

Another bonus of this approach is that you can use an element of your
dict to improve your log message.
With my previous approach with the extra parameter, you must pass the values
twice.
The cherry on the cake is that extra can be used for something else.
And the bonus of bonuses: developers who already use this logging
feature are already journald compliant without knowing it.

I see no drawbacks to this approach, except for developers who already
use this feature: they must be consistent in the key names of the dict for
it to be useful with journald.

I'm very interested in your feedback, maybe I've missed something.

If nobody finds an issue, I'll also push this pattern into the
official Python binding of journald; systemDream is only my laboratory to
experiment with systemd/journald (and secondarily, it's impossible to
set up the official Python binding of systemd/journald in a pyvenv, at least
for me).

I'll also publish a step-by-step tutorial for newcomers on my blog.

Thanks for your attention.
Ludovic Gasc (GMLudo)

From pmiscml at  Sun Jun 28 17:52:49 2015
From: pmiscml at (Paul Sokolovsky)
Date: Sun, 28 Jun 2015 18:52:49 +0300
Subject: [Python-ideas] Are there asynchronous generators?
In-Reply-To: <>
References: <>
Message-ID: <20150628185249.61624b82@x230>


On Sun, 28 Jun 2015 12:02:01 +0200
Adam Bartoš <drekin at> wrote:

> Is there a way for a producer to say that there will be no more items
> put, so consumers get something like StopIteration when there are no
> more items left afterwards?

Sure, just designate a sentinel value of your liking (the StopIteration class
itself seems an obvious choice) and use it for that purpose.
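
A minimal sketch of that sentinel convention with an asyncio.Queue follows.
The names `DONE`, `producer` and `consumer` are made up for the example, and a
dedicated sentinel object is used here in place of the StopIteration class;
either works as long as both sides agree on it.

```python
import asyncio

DONE = object()  # hypothetical sentinel meaning "no more items will be put"


async def producer(queue):
    for i in range(4):
        await queue.put(i)
    await queue.put(DONE)  # tell the consumer the stream has ended


async def consumer(queue):
    total = 0
    while True:
        item = await queue.get()
        if item is DONE:
            break
        total += item
    return total


async def main():
    queue = asyncio.Queue()
    # Run both sides concurrently; gather returns results in argument order.
    _, total = await asyncio.gather(producer(queue), consumer(queue))
    return total


total = asyncio.run(main())
print(total)
```

With several consumers on one queue, the producer would need to put one
sentinel per consumer, or each consumer would need to put it back.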

> There is also the problem that one cannot easily feed a queue,
> asynchronous generator, or any asynchronous iterator to a simple
> synchronous consumer like sum() or list() or "".join(). It would be
> nice if there was a way to wrap them to asynchronous ones when needed
> - something like (async sum)(asynchronously_produced_numbers()).

All that is easily achievable with classical Python coroutines, not
with the asyncio garden variety of coroutines, which were lately cast
into the language level with the async/await disablers:

def coro1():
    yield 1
    yield 2
    yield 3

def coro2():
    yield from coro1()
    yield 4
    yield 5

print(sum(coro2()))


And back to your starter question, it's also possible - and also only
with classical Python coroutines. I mentioned not just the possibility, but
the necessity of that in my independent "reverse engineering" of how yield
from works
(point 9 there). That's a simplistic presentation, and in the presence of a
"syscall main loop", the example there would need to be:

class MyValueWrapper:
    def __init__(self, v):
        self.v = v

def pump(ins, outs):
    for chunk in gen(ins):
        if isinstance(chunk, MyValueWrapper):
            # if value we got from a coro is of
            # type we expect, process it
            yield from outs.write(chunk.v)
        else:
            # anything else is simply not for us,
            # re-yield it to higher levels (ultimately, mainloop)
            yield chunk

def gen(ins):
    yield MyValueWrapper("<b>")
    # Assume read_in_chunks() already yields MyValueWrapper objects
    yield from ins.read_in_chunks(1000*1000*1000)
    yield MyValueWrapper("</b>")

Best regards,
 Paul                          mailto:pmiscml at

From at  Mon Jun 29 00:14:20 2015
From: at (Yury Selivanov)
Date: Sun, 28 Jun 2015 18:14:20 -0400
Subject: [Python-ideas] Are there asynchronous generators?
In-Reply-To: <20150628185249.61624b82@x230>
References: <>
Message-ID: <>

On 2015-06-28 11:52 AM, Paul Sokolovsky wrote:
>> There is also the problem that one cannot easily feed a queue,
>> asynchronous generator, or any asynchronous iterator to a simple
>> synchronous consumer like sum() or list() or "".join(). It would be
>> nice if there was a way to wrap them to asynchronous ones when needed
>> - something like (async sum)(asynchronously_produced_numbers()).
> All that is easily achievable with classical Python coroutines, not
> with asyncio garden variety of coroutines, which lately were casted
> into a language level with async/await disablers:
> def coro1():
>      yield 1
>      yield 2
>      yield 3
> def coro2():
>      yield from coro1()
>      yield 4
>      yield 5
> print(sum(coro2()))

You have easily achieved combining two generators with 'yield from' and
feeding that to the 'sum' builtin.

There is no way to combine synchronous loops with asynchronous
coroutines; by definition, the entire process will block while you are
iterating through them.


From ncoghlan at  Mon Jun 29 03:09:14 2015
From: ncoghlan at (Nick Coghlan)
Date: Mon, 29 Jun 2015 11:09:14 +1000
Subject: [Python-ideas] Fwd: [Python-Dev] An yocto change proposal in
 logging module to simplify structured logs support
In-Reply-To: <>
References: <>
Message-ID: <>

On 29 Jun 2015 1:50 am, "Ludovic Gasc" <gmludo at> wrote:
> In fact, the issue shouldn't be our brains, but it was clearly a time
> consuming task, and we have too much directly paid-work to take care.
> Don't be wrong: I don't say that ELK doesn't work, only it's time
> consuming with a high level of logs.
> I'm pretty sure that a lot of people are happy with ELK, it's cool for
> them ;-)
> It's like Oracle and PostgreSQL databases: Where with Oracle you need a
> full-time DBA, with PostgreSQL: apt-get install postgresql
> With this last sentence, I'm totally caricatural, but only to show where
> I see an issue that should be fixed, at least for us.
> (FYI, in a previous professional life, I've maintained Oracle, MySQL and
> PostgreSQL servers for several clients, I know a little bit the subject).

This discrepancy in manageability between services like PostgreSQL & more
complex setups like the ELK stack is why Red Hat started working on
Nulecule as part of Project Atomic:

There's still some work to be done making sure the related tools support
Debian and derivatives properly, but "the ELK stack is too hard to install
& maintain" is a distro level software management problem to be solved,
rather than something to try to work around at the language level.



From pmiscml at  Mon Jun 29 08:44:39 2015
From: pmiscml at (Paul Sokolovsky)
Date: Mon, 29 Jun 2015 09:44:39 +0300
Subject: [Python-ideas] Are there asynchronous generators?
In-Reply-To: <>
References: <>
 <20150628185249.61624b82@x230> <>
Message-ID: <20150629094439.4f8a8efa@x230>


On Sun, 28 Jun 2015 18:14:20 -0400
Yury Selivanov < at> wrote:

> On 2015-06-28 11:52 AM, Paul Sokolovsky wrote:
> >> There is also the problem that one cannot easily feed a queue,
> >> asynchronous generator, or any asynchronous iterator to a simple
> >> synchronous consumer like sum() or list() or "".join(). It would
> >> be nice if there was a way to wrap them to asynchronous ones when
> >> needed - something like (async
> >> sum)(asynchronously_produced_numbers()).
> > All that is easily achievable with classical Python coroutines, not
> > with asyncio garden variety of coroutines, which lately were casted
> > into a language level with async/await disablers:
> >
> > def coro1():
> >      yield 1
> >      yield 2
> >      yield 3
> >
> > def coro2():
> >      yield from coro1()
> >      yield 4
> >      yield 5
> >
> > print(sum(coro2()))
> You have easily achieved combining two generators with 'yield from'
> and feeding that to 'sum' builtin.

Right, the point here was that PEP 492, by banning the use of "yield" in
coroutines, doesn't help with such simple and basic usage of them. And
then I can again say what I said during the initial discussion of PEP 492:
I have mixed feelings about it. The promise of making coroutines easier
and more user friendly deserves full support, but the step of limiting
basic language usage in them doesn't seem good. What other people and I
can do then is just trust that you guys know what you're doing and that
PEP 492 will be just the first step. But the bottom line is that I
personally don't find async/await worth using for now - it's better to
stick to good old yield from, until the promise of truly better
coroutines is delivered.

> There is no way to combine synchronous loops with asynchronous 
> coroutines; by definition, the entire process will block while you
> are iterating through them.

Indeed, solving this issue requires an "inversion of inversion of
control" pattern. A typical real-world example is that someone has
their own (unwise) main loop and wants us to do callback-mess programming
with it, but we don't want them to call us; we want to call them, at
controlled intervals, to do a controlled amount of work.

The solution would be to pass a callback which looks like a
normal function, but which is actually a coroutine. Then the foreign main
loop, calling it, would suspend it and pass control to "us", and we can
allow another iteration of the foreign main loop by resuming that
coroutine.

The essence of this approach lies in having a coroutine "look like" a
usual function, or more exactly, in being able to resume a coroutine
from the context of a normal function. And that's explicitly not what
Python coroutines are - they require lexical marking of each site where
coroutine suspension may happen (for good reasons which have been
described here on the list many times).

During the previous phase of the discussion, I gave a classification of
different types of coroutines to grasp/structure all this stuff better:

Best regards,
 Paul                          mailto:pmiscml at

From ncoghlan at  Mon Jun 29 10:32:56 2015
From: ncoghlan at (Nick Coghlan)
Date: Mon, 29 Jun 2015 18:32:56 +1000
Subject: [Python-ideas] Are there asynchronous generators?
In-Reply-To: <20150629094439.4f8a8efa@x230>
References: <>
 <20150628185249.61624b82@x230> <>
Message-ID: <>

On 29 June 2015 at 16:44, Paul Sokolovsky <pmiscml at> wrote:
> Hello,
> On Sun, 28 Jun 2015 18:14:20 -0400
> Yury Selivanov < at> wrote:
>> On 2015-06-28 11:52 AM, Paul Sokolovsky wrote:
>> >> There is also the problem that one cannot easily feed a queue,
>> >> asynchronous generator, or any asynchronous iterator to a simple
>> >> synchronous consumer like sum() or list() or "".join(). It would
>> >> be nice if there was a way to wrap them to asynchronous ones when
>> >> needed - something like (async
>> >> sum)(asynchronously_produced_numbers()).
>> > All that is easily achievable with classical Python coroutines, not
>> > with asyncio garden variety of coroutines, which lately were casted
>> > into a language level with async/await disablers:
>> >
>> > def coro1():
>> >      yield 1
>> >      yield 2
>> >      yield 3
>> >
>> > def coro2():
>> >      yield from coro1()
>> >      yield 4
>> >      yield 5
>> >
>> > print(sum(coro2()))
>> You have easily achieved combining two generators with 'yield from'
>> and feeding that to 'sum' builtin.
> Right, the point here was that PEP492, banning usage of "yield" in
> coroutines, doesn't help with such simple and basic usage of them. And
> then I again can say what I said during initial discussion of PEP492:
> I have dual feeling about it: promise of making coroutines easier and
> more user friendly is worth all support, but step of limiting basic
> language usage in them doesn't seem good. What me and other people can
> do then is just trust that you guys know what you do and PEP492 will
> be just first step. But bottom line is that I personally don't find
> async/await worthy to use for now - it's better to stick to old good
> yield from, until the promise of truly better coroutines is delivered.

The purpose of PEP 492 is to fundamentally split the asynchronous IO
use case away from traditional generators. If you're using native
coroutines, you MUST have an event loop, or at least be using
something like the asyncio event loop's run_until_complete() method
(which spins up a scheduler for the duration). If you're using
generators without @types.coroutine or @asyncio.coroutine (or the
equivalent for tulip, Twisted, etc), then you're expecting a
synchronous driver rather than an asynchronous one.

This isn't an accident, or something that will change at some point in
the future, it's the entire point of the exercise: having it be
obvious both how you're meant to interact with something based on the
way it's defined, and how you factor outside subcomponents of the
algorithm. Asynchronous driver? Use a coroutine. Synchronous driver?
Use a generator.
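
A minimal sketch of that split, assuming Python 3.5 syntax. The names
numbers() and fetch_number() are illustrative only and do not come from
the PEP: the generator is consumed by a synchronous driver (sum), while
the native coroutine only produces a value once an event loop drives it.

```python
import asyncio

def numbers():
    # Plain generator: expects a synchronous driver such as sum() or list().
    yield 1
    yield 2
    yield 3

async def fetch_number():
    # Native coroutine: expects an asynchronous driver (an event loop).
    await asyncio.sleep(0)
    return 4

# Synchronous driver consuming the generator directly.
sync_total = sum(numbers())

# Asynchronous driver: an event loop runs the coroutine to completion.
loop = asyncio.new_event_loop()
try:
    async_result = loop.run_until_complete(fetch_number())
finally:
    loop.close()
```

Calling fetch_number() on its own only creates a coroutine object; nothing
in its body runs until a loop (or another coroutine awaiting it) drives it.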

What we *don't* have are consumption functions that have an implied
"async for" inside them - functions like sum(), any(), all(), etc are
all synchronous drivers.

The other key thing we don't have yet? Asynchronous comprehensions.

A peek at the various options for parallel execution described in
documentation helps illustrate why: once we're talking about applying
reduction functions to asynchronous iterables we're getting into
full-blown language-level-support-for-MapReduce territory. Do the
substeps still need to be executed in series? Or can the substeps be
executed in parallel, and either accumulated in iteration order or as
they become available? Does it perhaps make sense to *require* that
the steps be executable in parallel, such that we could write the
following:

    result = sum(x*x for async x in coro)

Where the reduction step remains synchronous, but we can mark the
comprehension/map step as asynchronous, and have that change the
generated code to create an implied lambda for the "lambda x: x*x"
calculation, dispatch all of those to the scheduler at once, and then
produce the results one at a time?

The answer to that is "quite possibly, but we don't really know yet".
PEP 492 is enough to address some major comprehensibility challenges
that exist around generators-as-coroutines. It *doesn't* bring
language level support for parallel MapReduce to Python, but it *does*
bring some interesting new building blocks for folks to play around
with in that regard (in particular, figuring out what we want the
comprehension level semantics of "async for" to be).


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From guido at  Mon Jun 29 11:33:21 2015
From: guido at (Guido van Rossum)
Date: Mon, 29 Jun 2015 11:33:21 +0200
Subject: [Python-ideas] Are there asynchronous generators?
In-Reply-To: <>
References: <>
 <20150628185249.61624b82@x230> <>
Message-ID: <>

Not following this in detail, but want to note that async isn't a good
model for parallelization (except I/O) because the expectation of
coroutines is single threading. The event loop serializes callbacks.
Changing this would break expectations and code.
On Jun 29, 2015 10:33 AM, "Nick Coghlan" <ncoghlan at> wrote:

> On 29 June 2015 at 16:44, Paul Sokolovsky <pmiscml at> wrote:
> > Hello,
> >
> > On Sun, 28 Jun 2015 18:14:20 -0400
> > Yury Selivanov < at> wrote:
> >
> >>
> >> On 2015-06-28 11:52 AM, Paul Sokolovsky wrote:
> >> >> There is also the problem that one cannot easily feed a queue,
> >> >> asynchronous generator, or any asynchronous iterator to a simple
> >> >> synchronous consumer like sum() or list() or "".join(). It would
> >> >> be nice if there was a way to wrap them to asynchronous ones when
> >> >> needed - something like (async
> >> >> sum)(asynchronously_produced_numbers()).
> >> > All that is easily achievable with classical Python coroutines, not
> >> > with asyncio garden variety of coroutines, which lately were casted
> >> > into a language level with async/await disablers:
> >> >
> >> > def coro1():
> >> >      yield 1
> >> >      yield 2
> >> >      yield 3
> >> >
> >> > def coro2():
> >> >      yield from coro1()
> >> >      yield 4
> >> >      yield 5
> >> >
> >> > print(sum(coro2()))
> >>
> >>
> >> You have easily achieved combining two generators with 'yield from'
> >> and feeding that to 'sum' builtin.
> >
> > Right, the point here was that PEP492, banning usage of "yield" in
> > coroutines, doesn't help with such simple and basic usage of them. And
> > then I again can say what I said during initial discussion of PEP492:
> > I have dual feeling about it: promise of making coroutines easier and
> > more user friendly is worth all support, but step of limiting basic
> > language usage in them doesn't seem good. What me and other people can
> > do then is just trust that you guys know what you do and PEP492 will
> > be just first step. But bottom line is that I personally don't find
> > async/await worthy to use for now - it's better to stick to old good
> > yield from, until the promise of truly better coroutines is delivered.
> The purpose of PEP 492 is to fundamentally split the asynchronous IO
> use case away from traditional generators. If you're using native
> coroutines, you MUST have an event loop, or at least be using
> something like asyncio.run_until_complete() (which spins up a
> scheduler for the duration). If you're using generators without
> @types.coroutine or @asyncio.coroutine (or the equivalent for tulip,
> Twisted, etc), then you're expecting a synchronous driver rather than
> an asynchronous one.
> This isn't an accident, or something that will change at some point in
> the future, it's the entire point of the exercise: having it be
> obvious both how you're meant to interact with something based on the
> way it's defined, and how you factor outside subcomponents of the
> algorithm. Asynchronous driver? Use a coroutine. Synchronous driver?
> Use a generator.
> What we *don't* have are consumption functions that have an implied
> "async for" inside them - functions like sum(), any(), all(), etc are
> all synchronous drivers.
> The other key thing we don't have yet? Asynchronous comprehensions.
> A peak at the various options for parallel execution described in
> documentation helps illustrate why: once we're talking about applying
> reduction functions to asynchronous iterables we're getting into
> full-blown language-level-support-for-MapReduce territory. Do the
> substeps still need to be executed in series? Or can the substeps be
> executed in parallel, and either accumulated in iteration order or as
> they become available? Does it perhaps make sense to *require* that
> the steps be executable in parallel, such that we could write the
> following:
>     result = sum(x*x for async x in coro)
> Where the reduction step remains synchronous, but we can mark the
> comprehension/map step as asynchronous, and have that change the
> generated code to create an implied lambda for the "lambda x: x*x"
> calculation, dispatch all of those to the scheduler at once, and then
> produce the results one at a time?
> The answer to that is "quite possibly, but we don't really know yet".
> PEP 492 is enough to address some major comprehensibility challenges
> that exist around generators-as-coroutines. It *doesn't* bring
> language level support for parallel MapReduce to Python, but it *does*
> bring some interesting new building blocks for folks to play around
> with in that regard (in particular, figuring out what we want the
> comprehension level semantics of "async for" to be).
> Cheers,
> Nick.
> --
> Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From p.f.moore at  Mon Jun 29 12:57:58 2015
From: p.f.moore at (Paul Moore)
Date: Mon, 29 Jun 2015 11:57:58 +0100
Subject: [Python-ideas] Are there asynchronous generators?
In-Reply-To: <>
References: <>
 <20150628185249.61624b82@x230> <>
Message-ID: <>

On 29 June 2015 at 09:32, Nick Coghlan <ncoghlan at> wrote:
> What we *don't* have are consumption functions that have an implied
> "async for" inside them - functions like sum(), any(), all(), etc are
> all synchronous drivers.

Note that this requirement to duplicate big chunks of functionality in
sync and async forms is a fundamental aspect of the design. It's not
easy to swallow (hence the fact that threads like this keep coming up)
as it seems to badly violate DRY principles, but it is deliberate.

There are a number of blog posts that discuss this "two separate
worlds" approach, some positive, some negative. Links have been posted
recently in one of these threads, but I'm afraid I don't have them to
hand right now.


From pmiscml at  Mon Jun 29 13:09:28 2015
From: pmiscml at (Paul Sokolovsky)
Date: Mon, 29 Jun 2015 14:09:28 +0300
Subject: [Python-ideas] Are there asynchronous generators?
In-Reply-To: <>
References: <>
 <20150628185249.61624b82@x230> <>
Message-ID: <20150629140928.7e767753@x230>


On Mon, 29 Jun 2015 11:57:58 +0100
Paul Moore <p.f.moore at> wrote:

> On 29 June 2015 at 09:32, Nick Coghlan <ncoghlan at> wrote:
> > What we *don't* have are consumption functions that have an implied
> > "async for" inside them - functions like sum(), any(), all(), etc
> > are all synchronous drivers.
> Note that this requirement to duplicate big chunks of functionality in
> sync and async forms is a fundamental aspect of the design. It's not
> easy to swallow (hence the fact that threads like this keep coming up)
> as it seems to badly violate DRY principles, but it is deliberate.
> There are a number of blog posts that discuss this "two separate
> worlds" approach, some positive, some negative. Links have been posted
> recently in one of these threads, but I'm afraid I don't have them to
> hand right now.

Maybe not the links you meant, but definitely discussing a split-world
problem designers of other languages and APIs face:

What Color is Your Function?

Red and Green Callbacks

> Paul

Best regards,
 Paul                          mailto:pmiscml at

From ncoghlan at  Mon Jun 29 13:23:52 2015
From: ncoghlan at (Nick Coghlan)
Date: Mon, 29 Jun 2015 21:23:52 +1000
Subject: [Python-ideas] Are there asynchronous generators?
In-Reply-To: <>
References: <>
 <20150628185249.61624b82@x230> <>
Message-ID: <>

On 29 Jun 2015 7:33 pm, "Guido van Rossum" <guido at> wrote:
> Not following this in detail, but want to note that async isn't a good
> model for parallelization (except I/O) because the expectation of
> coroutines is single threading. The event loop serializes callbacks.
> Changing this would break expectations and code.

Yeah, it's a bad idea - I realised after reading your post that because
submission for scheduling and waiting for a result can already be separated
it should be possible in Py 3.5 to write a "parallel" asynchronous iterator
that eagerly consumes the awaitables produced by another asynchronous
iterator, schedules them all, then produces the awaitables in order.

(That idea is probably as clear as mud without code to show what I mean...)


From jonathan at  Mon Jun 29 16:51:26 2015
From: jonathan at (Jonathan Slenders)
Date: Mon, 29 Jun 2015 16:51:26 +0200
Subject: [Python-ideas] Make os.pipe() return a namedtuple.
Message-ID: <>

Could we do that? Is there a reason it's not already a namedtuple?

I always forget which is the read end and which is the write end of the
pipe, and I use it quite regularly.


From joejev at  Mon Jun 29 20:00:24 2015
From: joejev at (Joseph Jevnik)
Date: Mon, 29 Jun 2015 14:00:24 -0400
Subject: [Python-ideas] Make os.pipe() return a namedtuple.
In-Reply-To: <>
References: <>
Message-ID: <>

Maybe make it a structsequence?

On Mon, Jun 29, 2015 at 10:51 AM, Jonathan Slenders <jonathan at> wrote:

> Could we do that? Is there is reason it's not already a namedtuple?
> I always forget what the read-end and what the write-end of the pipe is,
> and I use it quite regularly.
> Jonathan

From ron3200 at  Mon Jun 29 23:51:10 2015
From: ron3200 at (Ron Adam)
Date: Mon, 29 Jun 2015 17:51:10 -0400
Subject: [Python-ideas] Are there asynchronous generators?
In-Reply-To: <>
References: <>
 <20150628185249.61624b82@x230> <>
Message-ID: <mmsekf$mh8$>

On 06/29/2015 07:23 AM, Nick Coghlan wrote:
> On 29 Jun 2015 7:33 pm, "Guido van Rossum"
> <guido at
> <mailto:guido at>> wrote:
>  >
>  > Not following this in detail, but want to note that async isn't a good
> model for parallelization (except I/O) because the expectation of
> coroutines is single threading. The event loop serializes callbacks.
> Changing this would break expectations and code.
> Yeah, it's a bad idea - I realised after reading your post that because
> submission for scheduling and waiting for a result can already be separated
> it should be possible in Py 3.5 to write a "parallel" asynchronous iterator
> that eagerly consumes the awaitables produced by another asynchronous
> iterator, schedules them all, then produces the awaitables in order.
> (That idea is probably as clear as mud without code to show what I mean...)

Only the parts concerning "schedules them all", and "produces awaitables in 
order".   ;-)

Async IO is mainly about recapturing idle cpu time while waiting for 
relatively slow io.  But it could also be a way to organise asynchronous code.

In the earlier example with circles, with each object having its own
thread and the number of objects running into the thousands, it can be
rearranged a bit if each scheduler has its own thread.  Then objects can
be assigned to schedulers instead of threads.  (Or something like that.)

Of course that's still clear as mud at this point, but maybe a different 
colour of mud.  ;-)


From cs at  Tue Jun 30 02:12:42 2015
From: cs at (Cameron Simpson)
Date: Tue, 30 Jun 2015 10:12:42 +1000
Subject: [Python-ideas] Make os.pipe() return a namedtuple.
In-Reply-To: <>
References: <>
Message-ID: <>

On 29Jun2015 16:51, Jonathan Slenders <jonathan at> wrote:
>Could we do that? Is there is reason it's not already a namedtuple?
>I always forget what the read-end and what the write-end of the pipe is,
>and I use it quite regularly.

The ordering is the same as for the default process file descriptors. A normal
process has stdin as fd 0 and stdout as fd 1. So the return from pipe() has the
read end as index 0 and the write end as index 1.
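
As a minimal sketch of that ordering (the variable names read_fd and
write_fd are illustrative, not anything the os module defines), bytes
written to the second descriptor come back out of the first:

```python
import os

# os.pipe() returns a plain 2-tuple: index 0 is the read end,
# index 1 is the write end.
read_fd, write_fd = os.pipe()
try:
    os.write(write_fd, b"ping")
    received = os.read(read_fd, 4)
finally:
    # Both descriptors must be closed explicitly.
    os.close(read_fd)
    os.close(write_fd)
```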

Cameron Simpson <cs at>

From njs at  Tue Jun 30 02:45:47 2015
From: njs at (Nathaniel Smith)
Date: Mon, 29 Jun 2015 17:45:47 -0700
Subject: [Python-ideas] Make os.pipe() return a namedtuple.
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jun 29, 2015 at 7:51 AM, Jonathan Slenders <jonathan at> wrote:
> Could we do that? Is there is reason it's not already a namedtuple?
> I always forget what the read-end and what the write-end of the pipe is, and
> I use it quite regularly.

Sounds like a good idea to me.


Nathaniel J. Smith --

From steve at  Tue Jun 30 03:50:04 2015
From: steve at (Steven D'Aprano)
Date: Tue, 30 Jun 2015 11:50:04 +1000
Subject: [Python-ideas] Make os.pipe() return a namedtuple.
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jun 30, 2015 at 10:12:42AM +1000, Cameron Simpson wrote:
> On 29Jun2015 16:51, Jonathan Slenders <jonathan at> wrote:
> >Could we do that? Is there is reason it's not already a namedtuple?
> >
> >I always forget what the read-end and what the write-end of the pipe is,
> >and I use it quite regularly.
> The ordering is the same as for the default process file descriptors. A 
> normal process has stdin as fd 0 and stdout as fd 1. So the return from 
> pipe() has the read end as index 0 and the write end as fd 1.

Yeah, I always forget which is fd 0 and which is fd 1 too.

Having nice descriptive names rather than using numbered indexes is 
generally better practice, and I don't think there is any serious 
downside to using a namedtuple. A minor enhancement like this shouldn't 
require an extended discussion here on python-ideas.

Jonathan, would you be so kind as to raise an enhancement request on the 
bug tracker? I don't think it's too late for 3.5.


From zachary.ware+pyideas at  Tue Jun 30 04:00:22 2015
From: zachary.ware+pyideas at (Zachary Ware)
Date: Mon, 29 Jun 2015 21:00:22 -0500
Subject: [Python-ideas] Make os.pipe() return a namedtuple.
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jun 29, 2015 at 8:50 PM, Steven D'Aprano <steve at> wrote:
> Jonathan, would you be so kind as to raise an enhancement request on the
> bug tracker? I don't think it's too late for 3.5.

3.5 is feature-frozen without a special exemption from Larry Hastings,
and I think the window for those is already past too.  3.6 is open for
development already, though.


From cs at  Tue Jun 30 04:05:29 2015
From: cs at (Cameron Simpson)
Date: Tue, 30 Jun 2015 12:05:29 +1000
Subject: [Python-ideas] Make os.pipe() return a namedtuple.
In-Reply-To: <>
References: <>
Message-ID: <>

On 30Jun2015 11:50, Steven D'Aprano <steve at> wrote:
>On Tue, Jun 30, 2015 at 10:12:42AM +1000, Cameron Simpson wrote:
>> On 29Jun2015 16:51, Jonathan Slenders <jonathan at> wrote:
>> >Could we do that? Is there is reason it's not already a namedtuple?
>> >
>> >I always forget what the read-end and what the write-end of the pipe is,
>> >and I use it quite regularly.
>> The ordering is the same as for the default process file descriptors. A
>> normal process has stdin as fd 0 and stdout as fd 1. So the return from
>> pipe() has the read end as index 0 and the write end as fd 1.
>Yeah, I always forget which is fd 0 and which is fd 1 too.

Shrug. I use the rationale above. stdin==0 is extremely easy to remember. 
However, I have no cogent objection to a named tuple myself.

>Having nice descriptive names rather than using numbered indexes is
>generally better practice, and I don't think there is any serious
>downside to using a namedtuple. A minor enhancement like this shouldn't
>require an extended discussion here on python-ideas.

But... what shall we call the attributes? Surely an extended bikeshed is required.

Cameron Simpson <cs at>

I will be a speed bump on the information super-highway.
        - jvogel at (jeff vogel)

From ben+python at  Tue Jun 30 04:10:22 2015
From: ben+python at (Ben Finney)
Date: Tue, 30 Jun 2015 12:10:22 +1000
Subject: [Python-ideas] Make os.pipe() return a namedtuple.
References: <>
Message-ID: <>

Steven D'Aprano <steve at> writes:

> Yeah, I always forget which is fd 0 and which is fd 1 too.
> Having nice descriptive names rather than using numbered indexes is
> generally better practice

I definitely prefer to use, and promote, the explicit names ‘stdin’,
‘stdout’, and ‘stderr’ rather than the file descriptor numbers.

On the point of confusing them though: I find it easy enough to remember
that the two streams for output stay together, and the input one comes
first at 0.

> and I don't think there is any serious downside to using a namedtuple.
> A minor enhancement like this shouldn't require an extended discussion
> here on python-ideas.

+1, let's just get the standard names there as attributes of a
namedtuple.

One more set of magic numbers to relegate to implementation detail,
encapsulated where they belong!

 \         Fry: “Take that, poor people!”  Leela: “But Fry, you’re not |
  `\     rich.”  Fry: “No, but I will be someday, and then people like |
_o__)                                  me better watch out!” -Futurama |
Ben Finney

From rosuav at  Tue Jun 30 04:14:17 2015
From: rosuav at (Chris Angelico)
Date: Tue, 30 Jun 2015 12:14:17 +1000
Subject: [Python-ideas] Make os.pipe() return a namedtuple.
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jun 30, 2015 at 12:10 PM, Ben Finney <ben+python at> wrote:
> Steven D'Aprano <steve at> writes:
>> Yeah, I always forget which is fd 0 and which is fd 1 too.
>> Having nice descriptive names rather than using numbered indexes is
>> generally better practice
> I definitely prefer to use, and promote, the explicit names ‘stdin’,
> ‘stdout’, and ‘stderr’ rather than the file descriptor numbers.
> On the point of confusing them though: I find it easy enough to remember
> that the two streams for output stay together, and the input one comes
> first at 0.
>> and I don't think there is any serious downside to using a namedtuple.
>> A minor enhancement like this shouldn't require an extended discussion
>> here on python-ideas.
> +1, let's just get the standard names there as attributes of a
> namedtuple.

Except that this isn't about stdin/stdout - that just happens to make
a neat mnemonic. This is about a pipe, which has a reading end and a
writing end. If you pass one of those to another process to use as its
stdout, you'll be reading from the reading end; calling it "stdin"
would be confusing, since you're getting what the process wrote to
stdout.

How about just "read" and "write"?

Yep, Cameron was right...


From ncoghlan at  Tue Jun 30 06:08:19 2015
From: ncoghlan at (Nick Coghlan)
Date: Tue, 30 Jun 2015 14:08:19 +1000
Subject: [Python-ideas] Are there asynchronous generators?
In-Reply-To: <mmsekf$mh8$>
References: <>
 <20150628185249.61624b82@x230> <>
Message-ID: <>

On 30 June 2015 at 07:51, Ron Adam <ron3200 at> wrote:
> On 06/29/2015 07:23 AM, Nick Coghlan wrote:
>> On 29 Jun 2015 7:33 pm, "Guido van Rossum"
>> <guido at
>> <mailto:guido at>> wrote:
>>  >
>>  > Not following this in detail, but want to note that async isn't a good
>> model for parallelization (except I/O) because the expectation of
>> coroutines is single threading. The event loop serializes callbacks.
>> Changing this would break expectations and code.
>> Yeah, it's a bad idea - I realised after reading your post that because
>> submission for scheduling and waiting for a result can already be
>> separated
>> it should be possible in Py 3.5 to write a "parallel" asynchronous
>> iterator
>> that eagerly consumes the awaitables produced by another asynchronous
>> iterator, schedules them all, then produces the awaitables in order.
>> (That idea is probably as clear as mud without code to show what I
>> mean...)
> Only the parts concerning "schedules them all", and "produces awaitables in
> order".   ;-)

Some completely untested conceptual code that may not even compile,
let alone run, but hopefully conveys what I mean better than English
does:

    import asyncio

    def get_awaitables(async_iterable):
        """Gets a list of awaitables from an asynchronous iterator"""
        asynciter = async_iterable.__aiter__()
        awaitables = []
        while True:
            try:
                # Only terminates if __anext__() raises StopAsyncIteration
                # directly, rather than from the awaitable it returns
                awaitables.append(asynciter.__anext__())
            except StopAsyncIteration:
                break
        return awaitables

    async def wait_for_result(awaitable):
        """Simple coroutine to wait for a single result"""
        return await awaitable

    def iter_coroutines(async_iterable):
        """Produces coroutines to wait for each result from an
        asynchronous iterator"""
        for awaitable in get_awaitables(async_iterable):
            yield wait_for_result(awaitable)

    def iter_tasks(async_iterable, eventloop=None):
        """Schedules event loop tasks to wait for each result from an
        asynchronous iterator"""
        if eventloop is None:
            eventloop = asyncio.get_event_loop()
        for coroutine in iter_coroutines(async_iterable):
            yield eventloop.create_task(coroutine)

    class aiter_parallel:
        """Asynchronous iterator to wait for several asynchronous
        operations in parallel"""
        def __init__(self, async_iterable):
            # Concurrent evaluation of future results is launched immediately
            self._tasks = tasks = list(iter_tasks(async_iterable))
            self._taskiter = iter(tasks)
        def __aiter__(self):
            return self
        def __anext__(self):
            # Each task is itself awaitable, so it can be returned as-is
            try:
                return next(self._taskiter)
            except StopIteration:
                raise StopAsyncIteration

    # Example reduction function
    async def sum_async(async_iterable, start=0):
        tally = start
        async for x in aiter_parallel(async_iterable):
            tally += x
        return tally

    # Parallel sum from synchronous code:
    result = asyncio.get_event_loop().run_until_complete(sum_async(async_iterable))

    # Parallel sum from asynchronous code:
    result = await sum_async(async_iterable)

As the definition of "aiter_parallel" shows, we don't offer any nice
syntactic sugar for defining asynchronous iterators yet (hence the
question that started this thread). Hopefully the above helps
illustrate the complexity hidden behind such a deceptively simple
question :)
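
The core idea above (schedule everything first, then await the results in
submission order) can also be sketched without the asynchronous iterator
machinery. This is a simplified illustration, not the code above: it
assumes the operations are already available as a plain iterable of
coroutines, and square_later() and sum_parallel() are hypothetical names.

```python
import asyncio

async def square_later(x, delay=0.01):
    """Stand-in for a slow asynchronous operation."""
    await asyncio.sleep(delay)
    return x * x

async def sum_parallel(coroutines, start=0):
    # Scheduling every coroutine as a task first lets their waits overlap.
    tasks = [asyncio.ensure_future(coro) for coro in coroutines]
    tally = start
    # Awaiting the tasks in submission order keeps the results ordered.
    for task in tasks:
        tally += await task
    return tally

loop = asyncio.new_event_loop()
try:
    result = loop.run_until_complete(
        sum_parallel(square_later(x) for x in range(4)))
finally:
    loop.close()
```

Everything still runs on a single thread; only the waiting overlaps, which
matches the point that the event loop serializes callbacks.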


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From cs at  Tue Jun 30 08:07:56 2015
From: cs at (Cameron Simpson)
Date: Tue, 30 Jun 2015 16:07:56 +1000
Subject: [Python-ideas] Make os.pipe() return a namedtuple.
In-Reply-To: <>
References: <>
Message-ID: <>

On 30Jun2015 12:14, Chris Angelico <rosuav at> wrote:
>On Tue, Jun 30, 2015 at 12:10 PM, Ben Finney <ben+python at> wrote:
>> Steven D'Aprano <steve at> writes:
>>> and I don't think there is any serious downside to using a namedtuple.
>>> A minor enhancement like this shouldn't require an extended discussion
>>> here on python-ideas.
>> +1, let's just get the standard names there as attributes of a
>> namedtuple.
>Except that this isn't about stdin/stdout - that just happens to make
>a neat mnemonic. This is about a pipe, which has a reading end and a
>writing end. If you pass one of those to another process to use as its
>stdout, you'll be reading from the reading end; calling it "stdin"
>would be confusing, since you're getting what the process wrote to
>stdout.
>How about just "read" and "write"?

+1 for "read" and "write" for me. And -1 on "stdin" and "stdout" for the same 
reason as outlined above.

Cameron Simpson <cs at>

From jonathan at  Tue Jun 30 09:02:05 2015
From: jonathan at (Jonathan Slenders)
Date: Tue, 30 Jun 2015 09:02:05 +0200
Subject: [Python-ideas] Make os.pipe() return a namedtuple.
In-Reply-To: <>
References: <>
Message-ID: <>

If we use "read" and "write" as names, it means that we often end up
writing code like this:

os.write(our_pipe.write, data)

Is that ok? I mean, it's not confusing that the one is a method, while
the other is an attribute.


2015-06-30 8:07 GMT+02:00 Cameron Simpson <cs at>:

> On 30Jun2015 12:14, Chris Angelico <rosuav at> wrote:
>> On Tue, Jun 30, 2015 at 12:10 PM, Ben Finney <ben+python at>
>> wrote:
>>> Steven D'Aprano <steve at> writes:
>>>> and I don't think there is any serious downside to using a namedtuple.
>>>> A minor enhancement like this shouldn't require an extended discussion
>>>> here on python-ideas.
>>> +1, let's just get the standard names there as attributes of a
>>> namedtuple.
>> Except that this isn't about stdin/stdout - that just happens to make
>> a neat mnemonic. This is about a pipe, which has a reading end and a
>> writing end. If you pass one of those to another process to use as its
>> stdout, you'll be reading from the reading end; calling it "stdin"
>> would be confusing, since you're getting what the process wrote to
>> stdout.
>> How about just "read" and "write"?
> +1 for "read" and "write" for me. And -1 on "stdin" and "stdout" for the
> same reason as outlined above.
> Cheers,
> Cameron Simpson <cs at>

From rosuav at  Tue Jun 30 09:03:23 2015
From: rosuav at (Chris Angelico)
Date: Tue, 30 Jun 2015 17:03:23 +1000
Subject: [Python-ideas] Make os.pipe() return a namedtuple.
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jun 30, 2015 at 5:02 PM, Jonathan Slenders <jonathan at> wrote:
> If we use "read" and "write" as names, it means that we will often end up
> writing code like this:
>
> os.write(our_pipe.write, data)
>
> Is that ok? I mean, isn't it confusing that os.write is a method, while
> our_pipe.write is an attribute?

I'd much rather that than the converse. You always put read with read,
you always put write with write.


From njs at  Tue Jun 30 09:11:39 2015
From: njs at (Nathaniel Smith)
Date: Tue, 30 Jun 2015 00:11:39 -0700
Subject: [Python-ideas] Make os.pipe() return a namedtuple.
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jun 30, 2015 at 12:03 AM, Chris Angelico <rosuav at> wrote:
> On Tue, Jun 30, 2015 at 5:02 PM, Jonathan Slenders <jonathan at> wrote:
>> If we use "read" and "write" as names, it means that we will often end up
>> writing code like this:
>>
>> os.write(our_pipe.write, data)
>>
>> Is that ok? I mean, isn't it confusing that os.write is a method, while
>> our_pipe.write is an attribute?
> I'd much rather that than the converse. You always put read with read,
> you always put write with write.

It also appears to be the way that everyone is already naming their variables:

I see returns named "r, w", "rout, wout", "rfd, wfd", "rfd,
self.writepipe", "readfd, writefd", "p2cread, p2cwrite", etc.

Maybe readfd/writefd or read_fileno/write_fileno would be a little
better than plain read/write, both to remind the user that these are
fds rather than file objects and to make the names nouns instead of
verbs. But really read/write is fine too.
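
A small sketch of the readfd/writefd variant, mainly to show that existing "r, w = os.pipe()" style code would keep working if the result were a namedtuple. PipePair, make_pipe and check_compat are hypothetical names used only for this illustration.

```python
import os
from collections import namedtuple

# Hypothetical result type with noun-style field names.
PipePair = namedtuple("PipePair", ["readfd", "writefd"])


def make_pipe():
    # Wrap the plain tuple returned by os.pipe().
    return PipePair(*os.pipe())


def check_compat():
    pair = make_pipe()
    try:
        # Tuple unpacking and indexing behave exactly as before.
        rfd, wfd = pair
        return (
            isinstance(pair, tuple)
            and rfd == pair.readfd == pair[0]
            and wfd == pair.writefd == pair[1]
        )
    finally:
        os.close(pair.readfd)
        os.close(pair.writefd)
```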


Nathaniel J. Smith --

From jonathan at  Tue Jun 30 09:22:24 2015
From: jonathan at (Jonathan Slenders)
Date: Tue, 30 Jun 2015 09:22:24 +0200
Subject: [Python-ideas] Make os.pipe() return a namedtuple.
In-Reply-To: <>
References: <>
Message-ID: <>

I created an issue:

readfd/writefd sounds like a good choice, but it's still open for
discussion.

2015-06-30 9:11 GMT+02:00 Nathaniel Smith <njs at>:

> On Tue, Jun 30, 2015 at 12:03 AM, Chris Angelico <rosuav at> wrote:
> > On Tue, Jun 30, 2015 at 5:02 PM, Jonathan Slenders <jonathan at> wrote:
> >> If we use "read" and "write" as names, it means that we will often end up
> >> writing code like this:
> >>
> >> os.write(our_pipe.write, data)
> >>
> >> Is that ok? I mean, isn't it confusing that os.write is a method, while
> >> our_pipe.write is an attribute?
> >
> > I'd much rather that than the converse. You always put read with read,
> > you always put write with write.
> It also appears to be the way that everyone is already naming their
> variables:
> I see returns named "r, w", "rout, wout", "rfd, wfd", "rfd,
> self.writepipe", "readfd, writefd", "p2cread, p2cwrite", etc.
> Maybe readfd/writefd or read_fileno/write_fileno would be a little
> better than plain read/write, both to remind the user that these are
> fds rather than file objects and to make the names nouns instead of
> verbs. But really read/write is fine too.
> -n
> --
> Nathaniel J. Smith --

From niki.spahiev at  Tue Jun 30 09:32:22 2015
From: niki.spahiev at (Niki Spahiev)
Date: Tue, 30 Jun 2015 10:32:22 +0300
Subject: [Python-ideas] Make os.pipe() return a namedtuple.
In-Reply-To: <>
References: <>
Message-ID: <>

On 30.06.2015 09:07, Cameron Simpson wrote:
> +1 for "read" and "write" for me. And -1 on "stdin" and "stdout" for the
> same reason as outlined above.

This is commonly found code:

if not hasattr(src, 'read'): src = open(src)

same for write.

I think read_fd and write_fd are better.
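
A minimal sketch of the concern, assuming two hypothetical namedtuple types (ReadWrite and ReadWriteFd are invented names): a result with a field called "read" passes the usual hasattr() duck-typing test even though it is not a file object, while one with "read_fd" does not.

```python
import os
from collections import namedtuple

# Hypothetical result types, for illustration only.
ReadWrite = namedtuple("ReadWrite", ["read", "write"])
ReadWriteFd = namedtuple("ReadWriteFd", ["read_fd", "write_fd"])


def looks_like_file(obj):
    # The common duck-typing test from the snippet above.
    return hasattr(obj, "read")


def demo():
    r, w = os.pipe()
    try:
        # First value: the "read"/"write" naming is mistaken for a file.
        # Second value: the "_fd" naming is not.
        return (
            looks_like_file(ReadWrite(r, w)),
            looks_like_file(ReadWriteFd(r, w)),
        )
    finally:
        os.close(r)
        os.close(w)
```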
